Missing data
Mplus Discussion: Multilevel Data/Complex Sample
 Allison Tracy posted on Wednesday, February 13, 2002 - 10:48 am
Version 2.02 allows missing data modeling when a latent mixture model is fit to data with a complex sampling design. Can missing data be handled with other models using complex data? I thought I read in the manual that it can't, but maybe the missing data feature has been added to other models in v. 2.02.
 Linda K. Muthen posted on Wednesday, February 13, 2002 - 2:50 pm
No, missing data cannot be handled by the regular TYPE=COMPLEX or TYPE=TWOLEVEL. However, you can use TYPE=MIXTURE COMPLEX MISSING with one class and thereby get missing data handling for COMPLEX.
 Allison Tracy posted on Friday, February 15, 2002 - 10:02 am
Can multiple groups be analyzed using the TYPE=MIXTURE COMPLEX MISSING?
 Linda K. Muthen posted on Friday, February 15, 2002 - 10:35 am
For TYPE=MIXTURE, the training data feature can be used to define groups.
 Maggie posted on Friday, April 02, 2004 - 5:01 am
I am designing a two-level SEM and I have a lot of missing data on the independent variables at the within level. But there is no missing data on the between-level independent variables. Can this model still be handled by Mplus Version 3, or should I replace all of the missing data at the within level before running the program? Thanks for any suggestions.
 Linda K. Muthen posted on Friday, April 02, 2004 - 5:54 am
This can be handled by Mplus Version 3. You should treat the x variables as y variables. The normality assumption changes from normality given x to overall normality. You change them to y's by mentioning their variances in the MODEL command. Then use TYPE = TWOLEVEL MISSING;
 Maggie posted on Monday, April 05, 2004 - 2:01 am
So you mean that I can add the variances of the within-level X variables to the within part of the MODEL command, i.e.:

fw BY Y1-Y3;
fw ON X1;
fw ON X2;
x1-x2; ! mentioning the variances of x1 and x2 brings them into the model

and for the between part of the model, I don't need to add the variances of the X variables (given that there is no missing data in the between part). Am I correct?

Thanks again.
 Linda K. Muthen posted on Monday, April 05, 2004 - 7:43 am
Yes. What you did is correct.
 Blair Beadnell posted on Saturday, April 24, 2004 - 8:56 am
We are running a structural equation model with clustered data (teenagers clustered within schools) using TYPE=COMPLEX modeling. We have missing data for some schools. Are there any special concerns we should consider when the missing data is at the second level, other than the usual things, like coverage and that missingness is at least MAR, when using TYPE=COMPLEX MISSING?
 bmuthen posted on Saturday, April 24, 2004 - 9:03 am
No special concerns, just the usual ones.
 Maggie posted on Tuesday, May 11, 2004 - 10:00 am
Questions again from a new user of Mplus 3. Could I return to the previous question posted on April 05? As you suggested, I added the variances of the Xs in the MODEL command, but the output suggests that I use ALGORITHM = INTEGRATION; INTEGRATION = MONTECARLO in the ANALYSIS command. I referred to the Mplus short course example of the multilevel regression model (page 52): its input has no specification of the variances of variables with missing data and no ALGORITHM option, although missing data is mentioned in the VARIABLE command. So in general, in which situations should I add the variances of Xs with missing data? In fact, after I add ALGORITHM = INTEGRATION;
INTEGRATION = MONTECARLO to the analysis, no output comes out; it only shows "INPUT READING TERMINATED NORMALLY" (I set the OUTPUT options to SAMPSTAT TECH8).

In this case, is it necessary to run a Monte Carlo simulation to generate the missing data?

If possible, could you please suggest a complete example of a two-level random model that deals with missing data? This might let me ask you fewer questions about similar issues.

Thanks in advance for your kind response.
 Linda K. Muthen posted on Tuesday, May 11, 2004 - 10:36 am
We don't have examples that show MISSING. You just need to add it to the TYPE option of the ANALYSIS command. I suspect that your outcomes are not continuous and that is why numerical integration is required. Please send your output to support@statmodel.com if you want me to look at it.
 Mpduser1 posted on Wednesday, September 28, 2005 - 11:51 am
I'm building a multilevel SEM with two endogenous variables, Y1 and Y2, both of which are prone to missingness, and both of which have WITHIN and BETWEEN sources of variation. The missing data rate for Y2 is much higher than the missing data rate for Y1. Y2 is categorical, Y1 is ordinal.

My question is this: Does Mplus 3.13 use information from both the WITHIN and BETWEEN portions of the model when adjusting the maximum likelihood calculations to account for the missing data?

I ask because this could greatly influence my variable selection / modeling strategy.

Thank you.
 bmuthen posted on Wednesday, September 28, 2005 - 9:06 pm
The answer is yes. That is how maximum-likelihood estimation under the standard "MAR" assumption works.
 anonymous posted on Monday, January 16, 2006 - 10:13 pm
Hi there-

I am running a multinomial logistic regression analysis (nominal dv; using missing and complex estimation) and wish to test whether two of my three-way interaction betas are significantly different from one another. For example, I have a 3-level dv (one level is the reference) and a 3-way interaction that is statistically significant when comparing the first level to the reference group and not significant when comparing the second level to the reference. I wish to know whether the 2 betas are significantly different from one another. Any ideas?
 bmuthen posted on Tuesday, January 17, 2006 - 10:54 am
You compare the loglikelihood (LL) of your model with that of a model where you constrain your betas to be equal (using the usual Mplus approach to equality constraints). Then use 2 times the LL difference as a likelihood-ratio (LRT) chi-square test of the equality, with df = the difference in the number of parameters of the two models.
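The test just described is plain arithmetic on the two loglikelihoods printed in the Mplus output. Below is a minimal Python sketch (not Mplus syntax) of that arithmetic under ordinary ML; the function names are hypothetical and the loglikelihood values in the usage lines are made up. It assumes both models were fit to the same cases. If the models were estimated with MLR, as is typical with TYPE=COMPLEX, the raw difference should be rescaled with the scaling correction factors before it is referred to a chi-square distribution.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail p-value of a chi-square statistic with df degrees of freedom.

    Computes 1 - P(df/2, stat/2), where P is the regularized lower
    incomplete gamma function, evaluated with its power series.
    """
    if stat <= 0.0:
        return 1.0
    s, x = df / 2.0, stat / 2.0
    term = 1.0 / s           # n = 0 term of the series
    total = term
    n = 1
    while term > total * 1e-16:
        term *= x / (s + n)  # x^n / (s (s+1) ... (s+n))
        total += term
        n += 1
    lower = math.exp(s * math.log(x) - x - math.lgamma(s)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def lr_test(ll_constrained: float, npar_constrained: int,
            ll_free: float, npar_free: int):
    """Likelihood-ratio test of an equality constraint (ML estimates)."""
    stat = 2.0 * (ll_free - ll_constrained)  # 2 times the LL difference
    df = npar_free - npar_constrained         # difference in free parameters
    return stat, df, chi2_sf(stat, df)


# Hypothetical loglikelihoods, for illustration only (stat is 9.0, df is 1).
stat, df, p = lr_test(-1234.5, 10, -1230.0, 11)
print(stat, df, round(p, 4))
```

Here the constrained model (betas held equal) has one fewer free parameter, so the test has 1 df.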
 Leif Edvard Aarø posted on Thursday, May 11, 2006 - 4:12 pm
Dear Bengt and Linda,

I have developed an SEM with TYPE COMPLEX (cluster data), and ESTIMATOR = MLR. Since I specified MISSING ARE ALL (-9), I assume that there has been listwise deletion of cases. The n varies nicely with the number of variables (with missing) that are used in the analyses.

Since I have missing, and would like to use a method equivalent to FIML, I have tried to specify TYPE = MISSING H1. Mplus gives no error message or warning, but simply responds with silence.

The relevant commands look like this:

TYPE = complex;
TYPE = missing h1;

I have tested out various ways, for instance this one:

TYPE = complex missing h1;

Nothing seems to help.

Any advice?

Best regards

 Linda K. Muthen posted on Thursday, May 11, 2006 - 4:46 pm
I am not sure what you mean by nothing seems to help. H1 is not used with MLR. So you would say:

 Leif Edvard Aarø posted on Thursday, May 11, 2006 - 10:26 pm
Hello again, Linda,

Same lack of response from Mplus.

Here are all the relevant commands. All functions well until I insert the word "missing" on the "TYPE =" line.

Any advice?



TYPE = complex missing;
CONVERGENCE = 0.00005;


 Linda K. Muthen posted on Friday, May 12, 2006 - 10:40 am
I need to know what you mean by lack of response and nothing seems to help. These don't tell me what you expect to happen that is not happening.
 Leif Edvard Aarø posted on Sunday, May 14, 2006 - 1:53 pm
Dear Linda,

Sorry for not providing sufficient information in my previous question. I have been able to solve the problem by rewriting the syntax.

Thanks for your patience!

 Daniel E Bontempo posted on Thursday, January 18, 2007 - 5:44 pm
I thought v4.0 supported MISSING for TYPE=COMPLEX (as opposed to using the MIXTURE approach mentioned above).

However there seems to be a listwise deletion of cases where one of my predictors is missing.

TITLE: Effect of Clustering;

DATA: FILE = "c:\projects\PAYS CTC-YS\Select_PAYS.dat";

VARIABLE: NAMES = ID u4 u6 fr4 ip5 schoolid Year Grade CTCstat Poverty ;

usevariables = u4 u6 ctcstat poverty;
useobservations are grade==6 and year==2003;
categorical are u4 u6;

cluster = schoolid;
idvariable = id;
missing are all (99);

ANALYSIS: TYPE=complex missing;

MODEL:
u4 on ctcstat poverty;
u6 on ctcstat poverty;

Output: stand;

Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 482
 Linda K. Muthen posted on Friday, January 19, 2007 - 10:33 am
There will always be listwise deletion of cases with missing on covariates because the model is estimated conditioned on the covariates. Means, variances, and covariances of the covariates are not estimated as part of the model. No missing data theory exists for covariates. If you don't want cases with missing on the covariates to be deleted, you need to bring the covariates into the model by mentioning their variances in the MODEL command. Means, variances, and covariances will then be estimated for them. In addition, distributional assumptions will be made about them as for any dependent variable.
 student07 posted on Friday, July 27, 2007 - 8:33 am
I'd like to ask how Mplus deals with missing values for x-variables (covariates) which are measured only on the between-level when using TYPE= twolevel?

Thanks in advance!
 Linda K. Muthen posted on Friday, July 27, 2007 - 10:10 am
Any observation with a missing value on a covariate is eliminated from the analysis.
 student07 posted on Monday, July 30, 2007 - 7:01 am
Thank you very much for your response to my earlier question. I have now found that when using "type = twolevel missing", no chi-square statistic, CFI, or TLI is reported in the output. Am I doing something wrong here, or is there any way to request CFI and TLI when using "type = twolevel missing"?

Many thanks for your response.
 Linda K. Muthen posted on Monday, July 30, 2007 - 7:59 am
Because means, variances, and covariances are not sufficient statistics for model estimation with multilevel missing, chi-square and related fit statistics are not available.
 student07 posted on Monday, July 30, 2007 - 8:17 am
Thanks, Linda. One more question: Is there any standard protocol for reporting the adequacy of models estimated using 'type = twolevel missing'?
 Linda K. Muthen posted on Monday, July 30, 2007 - 9:13 am
When fit statistics like chi-square are not available, nested models can be compared using -2 times the loglikelihood difference for the two nested models.
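The plain -2 times loglikelihood difference is appropriate for ML. Two-level models with missing data are often run with the MLR estimator, in which case Mplus prints a scaling correction factor next to each loglikelihood and the difference has to be rescaled before being referred to a chi-square distribution. A minimal Python sketch of that rescaling (the Satorra-Bentler-type difference test for MLR loglikelihoods described on the Mplus website) follows; the function name and the numbers are hypothetical, and the models must be nested and fit to the same data.

```python
def scaled_lr_test(ll0: float, npar0: int, scale0: float,
                   ll1: float, npar1: int, scale1: float):
    """Scaled loglikelihood difference test for MLR estimates.

    Model 0 is the nested (more restricted) model and model 1 the less
    restricted one. scale0 and scale1 are the scaling correction factors
    printed with each MLR loglikelihood.
    """
    df = npar1 - npar0
    if df <= 0:
        raise ValueError("model 1 must have more free parameters than model 0")
    # Scaling correction for the difference test
    cd = (npar0 * scale0 - npar1 * scale1) / (npar0 - npar1)
    if cd <= 0:
        raise ValueError("non-positive scaling correction; scaled test not usable")
    trd = -2.0 * (ll0 - ll1) / cd  # refer to chi-square with df degrees of freedom
    return trd, df


# Hypothetical values, for illustration only.
trd, df = scaled_lr_test(-1234.5, 10, 1.2, -1230.0, 11, 1.3)
```

With both scaling factors equal to 1 the result reduces to the unscaled -2 times loglikelihood difference.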
 Ronny Scherer posted on Thursday, October 28, 2010 - 7:32 am
Dear Mplus developers and experts,

I'm trying to carry out a twolevel analysis of data from a pre-post-follow-up design in an intervention study. There are three groups (one control group and two treatment groups) on level 2 (operationalized as two dummy variables that predict the dependent variable on level 2).
My question is: How can I do a twolevel analysis with taking missing data into account? Is there something like a syntax such as "TYPE=MISSING" for the twolevel approach?

Best regards,
 Linda K. Muthen posted on Thursday, October 28, 2010 - 10:29 am
The default since Version 5 is TYPE=MISSING for all analyses.
 Kätlin Peets posted on Thursday, February 17, 2011 - 11:58 am
I have a question. My model looks like this:


Laused2 on sugu ;
Laused2 on Reading0 ;
Laused2 ON Math0;
Laused2 ON Avoid0;

reading0 avoid0 math0 AAA;

Laused2 on Reading0 ;
Laused2 on Math0;
Laused2 on Avoid0;
Laused2 ON AAA;! between-level predictor

Thus, I specify reading0, avoid0, math0, and AAA as part of the model in order not to lose cases with missing values on covariates. Modification indices suggest that I should specify correlations/covariances among avoid0, reading0, and math0. However, when I do so, my model parameters (especially the between-level slopes) change. Why is that?
 Bengt O. Muthen posted on Thursday, February 17, 2011 - 4:32 pm
Not including those correlations may give a strongly misfitting model - and as such its parameter estimates are not trustworthy.
 Peggy Clements posted on Saturday, March 12, 2011 - 12:58 pm
Does the MISSING default in version 5 handle missing data differently for


than for a

TYPE = GENERAL analysis?

I've used Mplus for years, but always for SEM or LGM. I'm trying to analyze data for a school-level randomized control trial, in which students have a pre-test and a post-test. However, the output includes the following warnings:

Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 327
Data set contains cases with missing on all variables except
x-variables. These cases were not included in the analysis.
Number of cases with missing on all variables except x-variables: 56

Why is it excluding these cases if I do not have LISTWISE = ON?
 Linda K. Muthen posted on Monday, March 14, 2011 - 4:20 pm
In GENERAL prior to Version 6, the model was not estimated conditioned on the observed exogenous variables as is done with TWOLEVEL RANDOM. Starting with Version 6, all models are estimated conditioned on the observed exogenous variables.

Missing data theory applies only to dependent variables. This is why observations with missing on observed exogenous variables are excluded. See the 6.1 Version History for further information.
 Kätlin Peets posted on Saturday, April 09, 2011 - 6:00 pm
I specify all the possible covariances between my covariates (at the within and between level) to be able to include all the cases in my analyses (when I mention only variances of x-s instead of covariances, the model fit is very bad). However, I get an error message:




Can I just ignore it?
 Linda K. Muthen posted on Sunday, April 10, 2011 - 10:16 am
We do not know the impact of having more parameters than clusters. This has not been studied. Certainly you don't want more between parameters than clusters because the number of clusters is the number of independent units.
 Kätlin Peets posted on Sunday, April 10, 2011 - 11:27 am
But I understood that the parameters might be untrustworthy if I don't include the covariances. What could I do? Could I just leave out some covariances (and examine the model fit)?
 Linda K. Muthen posted on Monday, April 11, 2011 - 9:12 am
If you include the covariates in the model, you must estimate the means, variances, and covariances of these variables. Perhaps you would be better off losing the observations that have missing data on the covariates.
 Kätlin Peets posted on Monday, April 11, 2011 - 10:39 am
I could, but my sample size decreases by 30%. I considered using MI. However, I need to know covariances for my parameter estimates (Tech 3 output gives a covariance matrix for each of my imputed data sets) to estimate simple slopes. And, I did not know how to get such an estimate.
 Kätlin Peets posted on Monday, April 11, 2011 - 12:04 pm
I have another question. Why are the cases with missing values on y deleted?

I get the following error message:

Data set contains cases with missing on all variables except
x-variables. These cases were not included in the analysis.
Number of cases with missing on all variables except x-variables:
 Linda K. Muthen posted on Monday, April 11, 2011 - 2:20 pm
Missing data theory applies to dependent variables. If an observation has missing data for all dependent variables, that observation contributes nothing to the analysis.
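To make the two deletion rules in these answers concrete, here is a small Python sketch (an illustration of the rules as stated, not how Mplus is implemented) that sorts cases into the three groups behind the warnings: cases with a missing covariate, cases missing on every dependent variable, and cases that enter the ML estimation. None marks a missing value and all names are hypothetical; which count a case lands in when it meets both conditions is an assumption.

```python
from typing import Dict, List, Optional, Tuple

Case = Dict[str, Optional[float]]  # None marks a missing value


def screen_cases(cases: List[Case], x_vars: List[str],
                 y_vars: List[str]) -> Tuple[List[Case], List[Case], List[Case]]:
    """Sort cases into: used, dropped for missing x, dropped for all-missing y."""
    used: List[Case] = []
    missing_x: List[Case] = []
    missing_all_y: List[Case] = []
    for case in cases:
        if any(case.get(x) is None for x in x_vars):
            # The model is conditioned on the covariates, so the case
            # cannot contribute without them.
            missing_x.append(case)
        elif all(case.get(y) is None for y in y_vars):
            # Nothing observed on the dependent variables: no information.
            missing_all_y.append(case)
        else:
            # Partially observed y's are used under the MAR assumption.
            used.append(case)
    return used, missing_x, missing_all_y


data = [
    {"x1": 1.0, "y1": None, "y2": 2.0},   # used
    {"x1": None, "y1": 1.0, "y2": 1.0},   # dropped: missing covariate
    {"x1": 2.0, "y1": None, "y2": None},  # dropped: missing on all y's
]
used, miss_x, miss_y = screen_cases(data, ["x1"], ["y1", "y2"])
```

Mentioning a covariate's variance in the MODEL command corresponds, in this sketch, to moving it from `x_vars` to `y_vars`: calling `screen_cases(data, [], ["x1", "y1", "y2"])` keeps all three cases, at the cost of the distributional assumptions that then apply to x1.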
 Stephen Short posted on Sunday, April 24, 2011 - 2:39 pm
I'm using the Monte Carlo feature of Mplus to generate data from a 2-level model with 3 level-1 predictors (2 fixed and 1 random) and 1 level-2 predictor.

I'm interested in creating 10% and 30% missingness across either the level 1 predictors, the level 2 predictor, or across both.

When I use the PATMISS and PATPROBS options, Mplus informs me that for TYPE = TWOLEVEL RANDOM I must use Monte Carlo integration. However, when I use this integration, I get several errors in the TECH9 output.

I've attempted using the missing= and MODEL MISSING: commands, but have not had much success.

What would be the best way to create 10% and 30% missingness on my multilevel data?
Thank you for your time.
 Linda K. Muthen posted on Monday, April 25, 2011 - 6:18 am
Please send your output and license number to support@statmodel.com so I can see what you are doing and the errors you are receiving.
 Lisa M. Yarnell posted on Monday, August 15, 2011 - 9:55 am
Hello Drs. Muthen,

I have some variables measuring depression and activities of daily living, which I believe have some missing data. I will be creating percentages based on the total scores of these scales because they are frequency scales (not truly continuous). The depression scale ranges from 0 to 3 for each of 9 items; the activities of daily living scale ranges from 0 to 2 for each of 5 items.

If I use the DEFINE statement at the beginning of my program, as below, will Mplus, by default, replace missing items with the maximum-likelihood-estimated value for that item? Or should I handle missing data in SAS prior to exporting my data to Mplus for analysis? Thanks for your help!

DEFINE: depress = (dep1 + dep2 + dep3 + dep4 + dep5 + dep6 + dep7 + dep8 + dep9)/27;
daily = (daily1 + daily2 + daily3 + daily4 + daily5)/10;
 Lisa M. Yarnell posted on Monday, August 15, 2011 - 10:00 am
P.S. I also see that I can use this "DEFINE" function:
variable = SUM(list of variables);

I just wonder how Mplus will handle missing data in doing this sum.
 Linda K. Muthen posted on Monday, August 15, 2011 - 10:04 am
Any observation that has a missing value on one or more of the variables being summed is assigned a missing value on the sum variable.
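Under the rule just stated, a single missing item makes the whole summed score missing. The Python sketch below mimics that rule outside Mplus, for example to count how many cases would lose their scale score before deciding whether to handle item-level missingness elsewhere (such as in SAS). The helper names are hypothetical, None stands for a missing value, and the divisor assumes every item shares the same maximum (3 for the nine depression items, 2 for the five daily-living items).

```python
from typing import Optional, Sequence


def define_sum(items: Sequence[Optional[float]]) -> Optional[float]:
    """Sum that is missing (None) if any item is missing."""
    if any(v is None for v in items):
        return None
    return float(sum(items))


def percent_score(items: Sequence[Optional[float]],
                  max_per_item: float) -> Optional[float]:
    """Total score as a proportion of the maximum possible total."""
    total = define_sum(items)
    if total is None:
        return None
    return total / (max_per_item * len(items))
```

For the depression scale, `percent_score(items, 3)` divides by 3 * 9 = 27; for the daily-living scale, `percent_score(items, 2)` divides by 2 * 5 = 10.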
 Lisa M. Yarnell posted on Monday, August 15, 2011 - 11:16 am
Thanks, Dr. Muthen. I could be making a silly mistake, but when I use this DEFINE command, I get no variance on the resulting variable. I summed across the depression items, then divided by the total possible score of 3*9=27 to create a percentage, which could then be divided into four categories. (For this project, we wanted four categories for depression.) But I end up with a DEPRESSC variable that has no variance, so Mplus won't run the model for depression.

There WAS variance on the original DEPRESS sum variable, and not all persons would fall into category 1. Is there some obvious mistake that I am making?

My code:

H3SP12 H3SP13)/27;
IF .25 <= DEPRESS < .50 THEN DEPRESSC = 2;
IF .50 <= DEPRESS < .75 THEN DEPRESSC = 3;

The error message:
One or more variables have a variance of zero.
Check your data and format statement.

Continuous     Number of
Variable       Observations   Variance

PERC_HL3       9419           0.737
**DEPRESSC     9388           0.000
 Linda K. Muthen posted on Monday, August 15, 2011 - 11:30 am
I think the problem is that your statements are not being parsed because they are not stated correctly. It should be:

IF (depress GE 0 AND depress LT .25) THEN depressc = 1;
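As a cross-check on the intended cut points, here is the same four-category coding written in Python. It assumes the four bins are [0, .25), [.25, .50), [.50, .75) and [.75, 1]; the first bin follows the IF statement above, the middle two follow the original post, and the top bin is inferred from the description of four categories. As with GE and LT in the IF statements, each lower bound is included and each upper bound excluded, so a score exactly at a cut point goes to the higher category. The function name is hypothetical.

```python
from typing import Optional


def depress_category(depress: Optional[float]) -> Optional[int]:
    """Map a 0-1 proportion score to categories 1-4 (None if missing or out of range)."""
    if depress is None or not 0.0 <= depress <= 1.0:
        return None
    if depress < 0.25:
        return 1   # [0, .25)
    if depress < 0.50:
        return 2   # [.25, .50)
    if depress < 0.75:
        return 3   # [.50, .75)
    return 4       # [.75, 1], assumed top bin
```

Tabulating this recoding on the exported data and comparing it with DEPRESSC is one way to confirm that the DEFINE statements are being applied.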
 Lisa M. Yarnell posted on Monday, August 15, 2011 - 12:39 pm
Thanks, Dr. Muthen, I'll try this!
 Jaime Puccioni posted on Tuesday, December 06, 2011 - 3:55 pm

I am running a latent growth curve model using complex survey data (ECLS-K). I received this warning:

Data set contains unknown or missing values for GROUPING,
PATTERN, COHORT and/or CLUSTER variables.
Number of cases with unknown or missing values: 2175

I reviewed the data, and yes, there are 2175 observations with missing data on the strata and PSU variables. These observations also have a weight of 0. It must have something to do with the sampling design of ECLS-K.

Is there anything I can or should do to make sure these observations are included in the analysis? Based on the output, it appears that they are not included.

thank you,

 Linda K. Muthen posted on Wednesday, December 07, 2011 - 11:49 am
I would contact ECLS-K to see why they have weights of zero.
 Kätlin Peets posted on Monday, June 11, 2012 - 11:46 am
I am conducting multilevel modeling with random slopes. Let's say I regress y on x and z. And, y on x is treated as random (varies between classrooms). However, I have missing data on my y. I have heard that I could potentially regress z on x to include more cases in my analyses (using FIML). I tried it and it worked. Is this allowed?

Thank you,
 Linda K. Muthen posted on Monday, June 11, 2012 - 5:00 pm
FIML requires more than one dependent variable. That is why your second model used FIML and your first model did not.
 Paul Tremblay posted on Saturday, November 24, 2012 - 9:14 pm
I have an aggression variable at the within level and I want to create an average cluster aggression score to use at the between level. I understand that Mplus does this automatically (by not specifying this variable as within or between). My question is how does Mplus handle missing observations at the within level (e.g., level-1 aggression scores missing for a few individuals within each cluster). More specifically, is the average value based simply on the average of the non-missing observations or are the missing observations somehow estimated first using the standard ML missing procedure?

Related to the above, when would someone use Define Cluster_Mean instead of having Mplus calculate the between level values automatically?

Thank you.
 Linda K. Muthen posted on Sunday, November 25, 2012 - 9:19 am
When you don't put an individual-level variable on the WITHIN list, an average cluster score is not created, a latent variable decomposition is done. See Examples 9.1 and 9.2. To create an average cluster score, use the CLUSTER_MEAN option in DEFINE. For each cluster, the value is the average of the non-missing values in each cluster. If all values are missing in a cluster, the value is missing.
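The CLUSTER_MEAN rule stated above (average the non-missing values in each cluster; missing if every value in the cluster is missing) can be sketched in Python as follows. This only illustrates the stated rule, not Mplus's implementation, and it does not reproduce the latent variable decomposition, which treats the between-level part as a latent variable instead of an observed average. The names and data are hypothetical; each member of a cluster would then be assigned that cluster's value.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple


def cluster_means(
    data: Iterable[Tuple[Hashable, Optional[float]]]
) -> Dict[Hashable, Optional[float]]:
    """Average of the non-missing values in each cluster (None if all missing)."""
    sums: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    for cluster, value in data:
        sums.setdefault(cluster, 0.0)
        counts.setdefault(cluster, 0)
        if value is not None:  # missing values are skipped, not imputed
            sums[cluster] += value
            counts[cluster] += 1
    return {c: (sums[c] / counts[c] if counts[c] else None) for c in sums}


aggression = [("s1", 1.0), ("s1", None), ("s1", 3.0), ("s2", None), ("s3", 5.0)]
means = cluster_means(aggression)
```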
 Paul Tremblay posted on Sunday, November 25, 2012 - 11:11 am
Thank you. Would you say that the latent variable decomposition is a better approach than using cluster_mean option? Is one procedure better than the other with missing data in level 1 observations?

As an aside, I find the new diagrammer in Version 7 extremely useful for preparing course slides to present multiple examples.
 Linda K. Muthen posted on Monday, November 26, 2012 - 9:44 am
I don't think missing data handling is the deciding factor here. See the following paper which is available on the website:

Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229.
 Katerina Gk posted on Wednesday, October 16, 2013 - 3:54 am
Dear Linda,

I have a TYPE = TWOLEVEL RANDOM model with missing data:
Missing are all (999);

indpara1 | par_b XWITH a1_b ;
indpara2 | par_b XWITH a2_b ;
indpara3 | par_b XWITH a3_b ;

When I don't have the interaction, I use TYPE = TWOLEVEL and ESTIMATOR = WLSMV: at the beginning the program reads the model quickly, then takes some time to converge, but it gives the output.
BUT now, after adding TYPE = RANDOM and the interaction and changing the estimator to ML, Mplus reads the model very slowly, printing the iterations one by one, so I was thinking that something is wrong because of the missing data and ESTIMATOR = ML.

1) Am I right that the program should read the model more quickly at the beginning?

2) If yes, could you please recommend me something to fix the error.

I hope I have made my problem clear!

Thank you very much for your help
 Linda K. Muthen posted on Wednesday, October 16, 2013 - 3:59 pm
With ML and categorical outcomes, numerical integration is required. I would test the interactions one at a time and keep only those that are significant.