

Hello Linda, I have several questions concerning growth models in Mplus:

(1) I have estimated a two-level multilevel model with the code below. Is there an example that shows how to estimate the parameters in the SEM framework, and if so, does the SEM framework give all the same parameter estimates as the multilevel modeling framework?

(2) Why is Mplus able to “handle” missing data at both level one and level two, as opposed to other programs such as HLM and SAS PROC MIXED, which can only “handle” missing data at level one? Is this due to the optimization algorithm? Is this true both in the

(3) I’m trying to understand what default algorithm Mplus uses to estimate parameters in the presence of missing data. I’m fitting a two-level model with random intercepts and slopes. My understanding is that the MLR estimator is used along with the EM algorithm and that FIML is not being used?

Best, Tom

TITLE: This is an example of a one-level growth model for a continuous outcome (two-level analysis)
DATA: FILE IS willet_mplus_missing.dat; format=free;
VARIABLE: NAMES = id covar y time; USEVARIABLES = id y time; WITHIN = time; CLUSTER = id; MISSING= y (99);
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL: %WITHIN% s | y ON time; %BETWEEN% y s; y WITH s;


1. Example 6.1 with residual variances held equal would be the same.
2. It is due to the optimization procedure.
3. MLR is FIML. The same parameter estimates are obtained as with ML; only the standard errors and chi-square differ. In most cases, Quasi-Newton is used. In some cases, EM is used.
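For readers comparing the two setups, the following is a minimal sketch of what a wide-format (SEM) counterpart in the style of Example 6.1 might look like, with the residual variances held equal through a shared label. The file name, the variable names y1-y4, and the time scores 0 to 3 are assumptions for illustration only; they are not taken from Tom's data, and the time scores would have to match the values of his time variable.

```
TITLE: wide-format linear growth model (sketch with hypothetical names)
DATA: FILE IS growth_wide.dat;       ! hypothetical file, one row per person
VARIABLE: NAMES = id y1-y4;
  USEVARIABLES = y1-y4;
  MISSING = y1-y4 (99);
MODEL:
  i s | y1@0 y2@1 y3@2 y4@3;   ! assumed time scores
  y1-y4 (1);                   ! residual variances held equal
```

The label (1) constrains the four residual variances to a single value, which corresponds to the single within-level residual variance of the two-level specification.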


Thank you! I’m still struggling with the algorithm being used. The MS-DOS window indicates that EM is largely being invoked when there is no missing data. When there is missing data, the window indicates that EM is mostly being used along with QN. Also, I thought FIML was a direct method, in that model parameters and standard errors are estimated directly from the data, whereas the EM algorithm is an indirect ML method, in that it provides ML estimates of the covariance matrix and mean vector that are then used for further analysis. So my confusion is that FIML is being used along with the EM algorithm. Also: (1) Is missing data handled the same way, in terms of ML methods, for hierarchical growth data in the multilevel and SEM frameworks? (2) Can auxiliary variables be incorporated into the Mplus approach that uses multilevel modeling?


One more question. What about the algorithm allows missingness to be modeled at level 2? Is there a reference that discusses this? Is it related to the multivariate approach to growth modeling that Mplus uses? 


One source of confusion is that "EM" is often thought of only as a method to estimate a mean vector and covariance matrix when there is missing data. This is too narrow a view. EM is an algorithm that is used with maximum-likelihood estimation in general. It is true that it is typically used to estimate an unrestricted mean vector and covariance matrix, which with model fit testing provides the "H1" model estimates to which the fit of the "H0" model is compared. But EM can also be used for H0 models, that is, the model you are ultimately interested in. It is used with missing data in a general sense, which includes latent variables such as the random intercepts and slopes that you have in your multilevel example. EM for H0 models can be slow and is therefore often accelerated by algorithms such as Quasi-Newton, Fisher Scoring, etc. When that is done, Mplus shows it in the technical output for the iterations.

The term "FIML" is also confusing and unfortunate. Full-information ML is the usual ML, but for some reason the FI prefix is added in missing data contexts. And ML with missing data, using any algorithm, is as you say estimating the H0 parameters directly, not doing imputations of the missing data first as in Bayesian multiple imputation. So FIML is what Mplus does when ML is requested with missing data, again using various algorithms.

(1) The ML missing data principle is not different for single-level and multilevel models.
(2) I don't think so, but that's a support question.

Regarding your question about missing data and the algorithm for two-level modeling, the Mplus group is trying to finish up a paper describing this.
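As a rough illustration of where the choice of algorithm appears in an input file: the ANALYSIS command has an ALGORITHM option (its settings include EM and the accelerated EMA), and TECH8 in the OUTPUT command prints the optimization history. The fragment below is a sketch for the two-level model in this thread under those assumptions, not a recommendation to override the default.

```
ANALYSIS:
  TYPE = TWOLEVEL RANDOM;
  ESTIMATOR = MLR;
  ALGORITHM = EM;     ! plain EM; omit this line to keep the default
OUTPUT:
  TECH8;              ! prints the iteration history
```

Comparing the TECH8 history with and without the ALGORITHM line is one way to see which steps the optimizer reports taking.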


Thank you Bengt for your detailed answer! Best, Tom 


Hello Linda and Bengt: I have a couple more questions concerning this just so I'm clear. Is FIML being used for both the random and fixed effects? Also, I get the error message below and I'm not sure what to make of it because it sounds like listwise deletion. Best, Tom

Data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis.
Number of cases with missing on all variables except x-variables: 10
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS


Regarding your FIML (that is, ML) question, I can interpret that in two ways. (1) You are asking if ML is used both for models with fixed effects and models with random effects. Then the answer is yes. (2) You are asking if ML is used both for parameter estimation and for estimating the random effects of each individual in the sample. Then the answer is no. ML is used for parameter estimation. The random effects are not parameters of the model (their means and variances are), but the individual effects can be estimated after the model has been estimated. The individual effects are estimated using the posterior distribution (given the model and the data), typically using the expected value of this distribution. This is also referred to as Empirical Bayes.

Regarding the x-variable missingness: yes, such cases are deleted. Missing data handling for the x's needs to make assumptions about the distribution of the x's, and this is not part of the original model. You can, however, bring the x's into the model (by mentioning their means or variances), in which case the default is to add a normality assumption, and missingness is handled as we have discussed for dependent variables.
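A minimal sketch of what "mentioning the variance" can look like in the MODEL command. The growth model and the names y1-y4 and x are hypothetical and used only for illustration.

```
MODEL:
  i s | y1@0 y2@1 y3@2 y4@3;
  i s ON x;
  x;        ! mentioning the variance of x brings x into the model
```

With the last line present, x is treated as part of the model under the added normality assumption, so cases with missing values on x are no longer dropped for that reason.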


Thank you Bengt! As always you and Linda provide timely and lucid responses. Best, Tom 

Hemant Kher posted on Wednesday, November 03, 2010  4:35 am



Hello Professors Muthen & Muthen, I have a question on LGM analysis. I have 2 repeated variables in my data set (y1-y4 and x1-x4), each recorded at 4 occasions. There’s missing data in my sample. When I fit a growth model to each, n is 230; with a dual growth model, n is again 230. But when I fit a growth model for y1-y4 with x1-x4 as a TVC, n shrinks to 102. This coincides with the fact that 102 students provided data at all 4 points. The Mplus message I get is: "Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 128". Any insights on why the sample shrinks in the different models? Thanks for your time; much appreciated. Hemant

Hemant Kher posted on Wednesday, November 03, 2010  5:29 am



Hello again Professors Muthen & Muthen, I would like to take my question (above) back. On the discussion board there is a suggestion from you to include the Xs in the model by mentioning their variances. I did this and the problem of the shrinking sample size appears resolved. Now the output lists a WITH for each X variable (e.g. X1 with I, S; X2 with I, S, etc.). So a follow-up question in this case: does the presence of the above-stated WITH estimates, which I did not request, alter the interpretation of the TVC model? Thanks for your time as always.


If you do not want x1 correlated with i, say: x1 WITH i@0; 
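Extended to all four time-varying covariates and both growth factors, the same idea might be sketched as follows. The variable names and time scores are assumed from the description in the posts above, not taken from an actual input file.

```
MODEL:
  i s | y1@0 y2@1 y3@2 y4@3;   ! assumed time scores
  y1 ON x1;
  y2 ON x2;
  y3 ON x3;
  y4 ON x4;
  x1-x4;                       ! variances mentioned, so the x's enter the model
  i s WITH x1-x4@0;            ! covariances with the growth factors fixed at zero
```

Whether these zero restrictions are reasonable is a substantive modeling question; the fragment only shows the syntax for imposing them.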

Hemant Kher posted on Wednesday, November 03, 2010  1:11 pm



Thank you for the quick response, Dr. Muthen. When I include statements like the one above, the fit deteriorates significantly. Thus keeping the x-variables in the model and letting them correlate with the growth factors seems the better option. My only concern is whether the interpretations are affected by the presence of the correlations; my inclination is that they are not. Hemant


Bringing the time-varying covariates into the model like this highlights the shortcomings of the typical assumptions made in multilevel modeling. For instance, it is likely that a time-varying covariate influences the slope growth factor. Now, a time-varying covariate measured at time t (t>1) could of course not influence the growth intercept if that is defined at time 1, as is customarily done. But a time-varying covariate measured at time 1 could at least be correlated with the growth intercept. See also the handout from our Topic 3 course on growth modeling (slide 138 and on), where data on math development is analyzed by alternative models.

Dex posted on Wednesday, October 14, 2015  8:57 am



Hi Dr. Muthens, I was wondering: when using FIML to deal with missing data in SEM, does Mplus 7 produce robust S.E.s by default? Thanks


Use Estimator = mlr; 
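In an input file this setting belongs in the ANALYSIS command. A minimal fragment, with the rest of the input omitted:

```
ANALYSIS:
  ESTIMATOR = MLR;   ! ML estimates with robust standard errors
```

The parameter estimates are the ML estimates; what changes with MLR is the standard errors and the chi-square test statistic.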


Hello, I wanted to ask a question about MLR and missing data. How much missing data is too much for MLR? I am doing CFA, if the type of analysis matters. Thank you!


It depends on too many factors to give a general statement. The more missing data you have, the more you rely on distributional and other assumptions and the less you can rely on your data. You are typically better off with, for example, less than 20% missing and worse off with, for example, 40% missing. The coverage information we give also shows the missingness (or rather its opposite, coverage) not only for each variable but also for pairs of variables, which is relevant for factor analysis since information on the estimates largely comes from pairs (covariances/correlations). Have a look at chapter 10 of our RMA book to learn the ins and outs.
