

Are missing data now allowed in, for example, multilevel models with sampling weights in Mplus 3? Thanks in advance. Best, Kajsa.


Yes. 

CW posted on Tuesday, August 24, 2004  7:02 pm



Hello, I am fitting a CFA with 2 factors, 20 ordinal indicators, and missing data, with version 3. Though I don't fully understand the missing data methods in v.3 for categorical indicators, I think I prefer the EM ML variety over WLSMV with pairwise deletion. I've read the descriptions of the ML, MLR and MLF estimators on p. 401 of the manual, but I don't know how to pick which one to use. Do you have any suggestions for how to choose? 


I would use MLR. 

CW posted on Tuesday, August 24, 2004  9:51 pm



OK, thanks for the reply. Why that one? Also, I tried to fit a one-factor model with correlated errors using MLR and it said: *** FATAL ERROR RECIPROCAL INTERACTION PROBLEM. What does this mean?


MLR gives robust standard errors. I would have to see your output to answer the question about the error message. Please send it to support@statmodel.com. 

Anonymous posted on Tuesday, November 02, 2004  1:46 pm



Is there an Mplus technical report and/or Muthén paper available that details how Mplus handles MAR missing data in an MLSEM?


Technical Appendix 6, which is available at the Mplus website, discusses missing data. See also: Little, R. J., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: John Wiley & Sons.

Anonymous posted on Tuesday, November 02, 2004  3:03 pm



Thank you. It sounds like, in short, Mplus uses the conventional EM fix: substituting in the sufficient statistics during the E-step, iterating, resubstituting, etc. I assume this same procedure is used for missing data at both Level 1 and Level 2 of an MLSEM?


Yes, it is. 
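The thread does not spell out Mplus's internal algorithm. To make the E-step/M-step cycle concrete, here is a minimal single-level sketch in Python (NumPy). It assumes a bivariate normal in which y1 is always observed and y2 is missing at random given y1. The function `em_bivariate` and the simulated data are hypothetical illustrations, not Mplus code.

```python
import numpy as np

def em_bivariate(y1, y2, n_iter=500, tol=1e-10):
    """EM estimates of (mu, Sigma) for a bivariate normal where y1 is fully
    observed and y2 has missing values (np.nan), assumed MAR given y1."""
    miss = np.isnan(y2)
    # Starting values: observed-data moments (any reasonable start works).
    mu = np.array([y1.mean(), np.nanmean(y2)])
    sigma = np.cov(np.vstack([y1[~miss], y2[~miss]]))
    for _ in range(n_iter):
        # E-step: conditional mean and variance of y2 given y1 under current estimates.
        beta = sigma[0, 1] / sigma[0, 0]
        resid_var = sigma[1, 1] - beta * sigma[0, 1]
        e_y2 = np.where(miss, mu[1] + beta * (y1 - mu[0]), y2)
        e_y2sq = np.where(miss, e_y2 ** 2 + resid_var, y2 ** 2)
        # M-step: recompute mu and Sigma from the expected sufficient statistics.
        new_mu = np.array([y1.mean(), e_y2.mean()])
        s12 = np.mean(y1 * e_y2) - new_mu[0] * new_mu[1]
        new_sigma = np.array([
            [np.mean(y1 ** 2) - new_mu[0] ** 2, s12],
            [s12, np.mean(e_y2sq) - new_mu[1] ** 2],
        ])
        change = max(np.max(np.abs(new_mu - mu)), np.max(np.abs(new_sigma - sigma)))
        mu, sigma = new_mu, new_sigma
        if change < tol:
            break
    return mu, sigma

# Simulated example: y2 tends to be missing when y1 is high (MAR given y1).
rng = np.random.default_rng(0)
n = 5000
y1 = rng.normal(0.0, 1.0, n)
y2 = 1.0 + 0.8 * y1 + rng.normal(0.0, 0.6, n)   # true mean of y2 is 1.0
p_miss = 1.0 / (1.0 + np.exp(-2.0 * y1))
y2_obs = np.where(rng.uniform(size=n) < p_miss, np.nan, y2)

mu_hat, sigma_hat = em_bivariate(y1, y2_obs)
print("complete-case mean of y2:", np.nanmean(y2_obs))
print("EM estimate of mean of y2:", mu_hat[1])
```

The complete-case mean is pulled down because high-y1 (and so high-y2) cases are missing. The EM estimate uses the y1 information from those cases and lands near the true value.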


In my RCT, our participants are assessed at waves 1, 3, and 4. Missingness was not a problem from w1 to w3. But now, at w4 (follow-up), all the "problematic" children in the control group have been lost, leaving us with a "super control group" to which our experimental group is being compared. Both wave 1 and wave 3 problem scales predict dropout in the control group (the greater the problems, the more likely they are to drop out) but are unrelated to dropout in the experimental group. Please help me with the specific Mplus command that takes this "uneven missingness" into consideration. Do I have to apply weights to anything? If so, how do I calculate the weights? Thank you!


In principle, the MAR assumption of TYPE=MISSING handles this, because missingness is a function of observed outcomes at prior time points.
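A small simulation can show why dropout that depends only on earlier observed scores is covered by MAR. It assumes hypothetical wave-3 (`w3`) and wave-4 (`w4`) scores for one group, with made-up parameters. For a bivariate normal with `w3` always observed, the ML estimate of the `w4` mean under MAR has a closed form (the factored likelihood): regress `w4` on `w3` among completers, then average the fitted line over all cases. This is a sketch of the principle, not what Mplus runs internally.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
w3 = rng.normal(10.0, 2.0, n)                     # wave-3 problem score, always observed
w4 = 2.0 + 0.8 * w3 + rng.normal(0.0, 1.5, n)     # wave-4 score; true mean = 10.0

# Hypothetical MAR dropout: higher wave-3 problems -> more likely to drop out.
p_drop = 1.0 / (1.0 + np.exp(-(w3 - 11.0)))
observed = rng.uniform(size=n) >= p_drop

# Completers-only mean: biased, because high-problem cases are missing at wave 4.
cc_mean = w4[observed].mean()

# ML under MAR via the factored likelihood: fit w4 on w3 among completers,
# then evaluate the regression at the mean of w3 over *all* cases.
slope, intercept = np.polyfit(w3[observed], w4[observed], 1)
ml_mean = intercept + slope * w3.mean()

print(f"completers-only mean: {cc_mean:.2f}, ML (MAR) mean: {ml_mean:.2f}")
```

No weights are involved. The correction comes from keeping the earlier wave's data for the dropouts in the likelihood. This only works if dropout depends on observed values, not on the unobserved wave-4 score itself.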


Hello. I am analyzing decline in word recall scores in an elderly population over a decade, with measurements taken every two years. I am attempting to use a Wu-Carroll selection model to control for the relationship between the outcome and dropout. All individuals were alive at the baseline (t0) measurement. When I regress dropout (t1-t5) on the intercept and slope, the model does not converge. When I regress dropout (t2-t5) on the intercept and slope, the model converges. Why would the inclusion of dropout at t1 cause problems? Thank you.
dd(t): 0 = observed, 1 = dropout at time t, 99 = dropout at a previous time
i s | totrec98@0 totrec00@.2 totrec02@.4 totrec04@.6 totrec06@.8 totrec08@1;
[i*8.607 s*2.802];
dd00-dd08 on i s;


For the variable dd98, what is the proportion of zeroes and ones? 


Hi Linda, For dd98, there were no missing cases. For dd00, 90% were observed (0's) and about 10% of cases were missing (1's). For dd02, 77% were observed (0's), 13% went missing (1's), and 10% were missing from previous wave (99's). 


If there are no dropouts at the first time point, don't use that dummy variable. 
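The dropout dummies described above (0 = observed, 1 = dropout at time t, 99 = already dropped out) can be built mechanically from the outcome's missingness pattern. Below is a minimal Python sketch. It assumes monotone dropout, meaning that once a person is missing they stay missing, and uses a hypothetical helper `dropout_dummies`. It also flags waves whose dummy has no variation, like the 1998 wave here, which carry no dropout information.

```python
import numpy as np

def dropout_dummies(y):
    """Dropout indicators from an (n_subjects, n_waves) array with np.nan for
    missing, assuming monotone dropout: 0 = observed, 1 = first missing wave,
    99 = dropped out at an earlier wave."""
    n, t = y.shape
    dd = np.zeros((n, t), dtype=int)
    for i in range(n):
        missing = np.flatnonzero(np.isnan(y[i]))
        if missing.size:
            first = missing[0]
            dd[i, first] = 1
            dd[i, first + 1:] = 99
    return dd

# Hypothetical recall scores for three people over four waves.
y = np.array([
    [10.0, 9.0, 8.0, 7.0],
    [11.0, 10.0, np.nan, np.nan],
    [12.0, np.nan, np.nan, np.nan],
])
dd = dropout_dummies(y)
# Waves with no dropouts at all (here the first) have a constant dummy; leave them out.
no_variation = [j for j in range(dd.shape[1]) if np.all(dd[:, j] == 0)]
print(dd)
print(no_variation)
```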


Hi Linda, I am not using the dropout indicators for the first wave of measurement (taken in 1998). I am attempting to use the dropout indicators from 2000-2008, which I was able to do successfully using the Diggle-Kenward selection model. When attempting to use the Wu-Carroll selection model, I was not able to get the model to converge using the dropout indicators from 2000-2008. However, when I did not include the dropout dummy for 2000 and only included the dropout indicators from 2002-2008, the model converged successfully.


Please send the outputs and your license number to support@statmodel.com. 


Dr. Muthen, What is the best way to determine the degree to which the covariance matrix estimated using MLR reproduces the original matrix? Is there a certain fit index I should be referring to? Thank you, Suzanne Elgendy


I assume that by original matrix you mean the sample covariance matrix. If means, variances, and covariances are the statistics used for model estimation, all fit statistics examine this. 
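With complete data, the standard ML fit function compares the sample covariance matrix S with the model-implied matrix Sigma: F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p. It is zero when the two match exactly. The likelihood-ratio chi-square is a multiple of its minimum, and MLR reports a robust (scaled) version of that test. Below is a minimal Python sketch of the formula only. It covers the covariance structure alone, `f_ml` is a hypothetical helper, and it does not reproduce Mplus's missing-data (H1) computations.

```python
import numpy as np

def f_ml(S, Sigma):
    """ML discrepancy between sample covariance S and model-implied Sigma
    (covariance structure only, complete data)."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
perfect = f_ml(S, S)              # model reproduces S exactly
independence = f_ml(S, np.eye(2)) # model that sets the covariance to zero
print(perfect, independence)
```

The independence model misses the 0.5 covariance, so its discrepancy is positive (here it equals -ln 0.75).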


Hello, I am currently using a number of NMAR models (Diggle-Kenward, Wu-Carroll, and pattern-mixture) to control for nonrandom missing data on the Y variable in my study. When implementing NMAR models, what options do I have for handling missing data on my X variables? I am losing a large number of observations because respondents have Y observations without X observations, and the missing data on X seem to be handled through listwise deletion. Thank you. Nicholas Bishop, Arizona State University


You could use DATA IMPUTATION to impute values for the missing x's or you could include the variances of the x's in the MODEL command. Doing this means they are treated as dependent variables and distributional assumptions are made about them. 


Thank you Linda. When I include the X's with missing data in the model command, I receive an error "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY...". The warning points me to the PSI matrix diagonal for one of the dichotomous x's that I included in the model command (it was not defined as categorical). Do you think the problem is arising due to the use of the binary variable in maximum likelihood estimation? I can also send my information if that will help. Thanks. 


The mean and variance of a binary variable are not orthogonal and can generate a message about nonidentification when the variable is included in the model and is not identified as categorical. I can't say more without seeing the full output at support@statmodel.com. 
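The reason a binary x causes trouble when treated as continuous can be shown in a few lines. For a 0/1 variable the variance is completely determined by the mean: var = p(1 - p). A normal-theory model gives x a free mean and a free variance, but the data only carry one piece of information, p. This Python sketch uses made-up data to illustrate the point. It is not taken from the poster's output.

```python
import numpy as np

x = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 1])  # hypothetical binary covariate
p = x.mean()   # sample mean = proportion of ones
v = x.var()    # ML (divide-by-n) variance

# For a 0/1 variable the variance is a function of the mean, so the two
# are not separate pieces of information.
print(p, v, p * (1 - p))
```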


Dear Drs. Muthén, I have estimated a model with complete data on the predictor side (7 indicators, a nested-factor model, N = 1187) and a lot of missing data on the criterion side (11 indicators, a g-factor model, N = 79). The standardized factor loadings of my 11 indicators on their latent variable on the criterion side (where data are only available for N = 79) are really high in the SEM, mainly between .90 and .99. When I estimate just a g-factor CFA for those 79 people with those 11 indicators, the standardized factor loadings range between .56 and .83. Why do the estimates change so much? What is the best way to deal with this problem? Thanks for your help!


It sounds like the sample statistics are very different for the two samples. 


The 79 people are a subsample of those 1187. The mean of those 79 on the seven indicators on the predictor side is much higher because a majority of them were selected on the basis of their scores on those indicators. Variances are similar. Since I only have data on the criterion side for those 79 people I can't estimate sample statistics for the indicators on the criterion side for the other ones. So the different means on the indicators on the predictor side are the reason why factor loadings on the criterion side change so much in the SEM? Are my estimates reliable? What would be the best way to handle this situation? Thanks a lot! 


The analysis of the criterion variables only (for n = 79) is a submodel of the full model that includes the predictor variables. If your model estimates for the criterion variables are very different in the submodel and full-model analyses, this probably means that the assumptions of the full model don't fully hold. In particular, the covariances between the predictor variables and the criterion variables may not be fully captured by the relationships between their sets of factors. Look for big modification indices in the full-model analysis. Because you selected the n = 79 based on the predictor variables, it would seem that MAR may hold, so in principle the full-model analysis should give you the right answer.


Dear Dr. Muthén, thanks for your reply. So far, I've used the option for auxiliary variables (m), with which Mplus does not give modification indices. Hence, I decided to drop the auxiliary option in order to get modification indices. Now, for my measurement model on the criterion side, this leads to "normal" factor loadings for my 11 indicators. However, one of the path coefficients from the two latent predictor variables changes a lot, from .59 to .81 (the other one remains more or less the same). This seems pretty unrealistic, and I'm not sure what to do with this result. In this model, the highest modification indices are for residual correlations on the predictor side (where there are no missing data), not for covariances between predictor and criterion variables. Would you have any suggestions on how to proceed? Thank you very much!


Perhaps your factor variances changed as well, so that in a standardized metric the change wasn't that big. In any case, I would trust this solution. 


The factor loadings that changed were already the standardized factor loadings. So far, I've fixed my factor variances to one in all models in order to identify the model. Would you suggest estimating those variances freely and fixing the factor loading of the first indicator at 1, and then checking whether the variance of the factor on the criterion side changes from the measurement model (n = 79) to the structural model (N = 1187)?


No need to do that. Note that you haven't fixed the factor variance at 1 for the dependent (criterion) factors; you fixed the residual variance, but that's OK too. All these choices give the same standardized solution.
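The invariance of the standardized solution to the identification choice can be checked numerically for a one-factor model. Below is a minimal Python sketch with hypothetical loadings and residual variances. It compares "factor variance fixed at 1" with "first loading fixed at 1", where the factor variance absorbs the rescaling. Both give the same model-implied covariance matrix and so the same standardized loadings. `standardized_loadings` is an illustrative helper, not an Mplus function.

```python
import numpy as np

def standardized_loadings(lam, psi, theta):
    """One-factor model: lam_std_j = lam_j * sqrt(psi) / sqrt(lam_j**2 * psi + theta_j)."""
    lam = np.asarray(lam, dtype=float)
    theta = np.asarray(theta, dtype=float)
    return lam * np.sqrt(psi) / np.sqrt(lam ** 2 * psi + theta)

# Parameterization A: factor variance fixed at 1 (hypothetical values).
lam_a = np.array([0.8, 0.6, 0.7])
theta = np.array([0.36, 0.64, 0.51])
std_a = standardized_loadings(lam_a, 1.0, theta)

# Parameterization B: first loading fixed at 1; the factor variance rescales.
scale = lam_a[0]
lam_b = lam_a / scale
psi_b = 1.0 * scale ** 2
std_b = standardized_loadings(lam_b, psi_b, theta)

print(std_a, std_b)
```

The same logic applies to a dependent factor whose residual variance is fixed: the total factor variance, not the fixed value, enters the standardization.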


I'm attempting to impute data from a cross-sectional cohort-sequential study. I received the following message in the output: PROBLEM OCCURRED DURING THE DATA IMPUTATION. YOU MAY BE ABLE TO RESOLVE THIS PROBLEM BY SPECIFYING THE USEVARIABLES OPTION TO REDUCE THE NUMBER OF VARIABLES USED IN THE IMPUTATION MODEL. YOU MAY ALSO BE ABLE TO RESOLVE THIS PROBLEM BY INCREASING THE NUMBER OF ITERATIONS USING THE THIN OR BITERATIONS OPTIONS OF THE ANALYSIS COMMAND. SPECIFYING A DIFFERENT IMPUTATION MODEL MAY ALSO RESOLVE THE PROBLEM. I attempted to decrease the number of variables used to impute the data and received the same message. I would be grateful for some guidance on how to proceed. Thank you!


As a first step, take a look at the Version 7 UG ex 11.5, page 397, and its use of USEVARIABLES, AUXILIARY, and IMPUTE. As a second step, have a look at the 14 practical tips in Section 4 of the paper on our website: Asparouhov, T. & Muthén, B. (2010). Multiple imputation with Mplus. Technical Report. Version 2. Click here to view Mplus inputs, data, and outputs used in this paper. If you still have problems, please send data, input, output and license number to Support@statmodel.com. 


Dr. Muthen, Thank you very much! 


I have an overall sample of 1600 with missing data throughout. I am using the MLR estimator for growth modelling. I get these warning messages in the output:
WARNING Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 772
*** WARNING Data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis. Number of cases with missing on all variables except x-variables: 34
Would it be possible to get some clarification on what MLR does in relation to missing data and what these warning messages mean? Many thanks


MLR handles missing data under the MAR assumption, using what is commonly called "FIML". That applies to people with missing values on some but not all DVs, within the subset of people who don't have any missing values on IVs and don't have all DVs missing. If you want to include those with missing values on IVs, you can mention those variable names in the MODEL command, thereby extending the model and making stronger (normality) assumptions for the IVs.
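"FIML" here means each case contributes the normal log-density of only the values it actually has. Below is a minimal Python sketch of that casewise log-likelihood. It assumes multivariate normal variables and that `mu` and `Sigma` are the model-implied mean vector and covariance matrix. `fiml_loglik` is a hypothetical helper, not Mplus internals. A case missing on everything contributes nothing, which matches the second warning above.

```python
import numpy as np

def fiml_loglik(Y, mu, Sigma):
    """Sum of casewise normal log-likelihoods using only each row's observed
    entries (np.nan = missing). Rows with everything missing contribute 0."""
    total = 0.0
    for row in Y:
        obs = ~np.isnan(row)
        k = int(obs.sum())
        if k == 0:
            continue  # no information from this case
        d = row[obs] - mu[obs]
        S = Sigma[np.ix_(obs, obs)]          # observed sub-block of Sigma
        _, logdet = np.linalg.slogdet(S)
        total += -0.5 * (k * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(S, d))
    return total

mu = np.zeros(2)
Sigma = np.eye(2)
Y_complete = np.array([[0.5, -0.2],
                       [1.0, 0.3]])
Y_incomplete = Y_complete.copy()
Y_incomplete[1, 1] = np.nan                  # one value missing

ll_complete = fiml_loglik(Y_complete, mu, Sigma)
ll_incomplete = fiml_loglik(Y_incomplete, mu, Sigma)
print(ll_complete, ll_incomplete)
```

The incomplete data simply have one fewer log-density term, and that term is negative here, so the incomplete-data log-likelihood comes out higher. Log-likelihoods computed on different observed data are therefore not directly comparable in size.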


Thank you very much for this information. What code is needed to tell Mplus to include cases with missing values on the IVs? Many thanks


If x is an IV, just say x; in the Model command. 


OK, thank you very much. I have added this and the missing data are being estimated, but now I get the following warning: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.479D-11. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 35, APFY9. It names a different variable in each model. What could this mean? Many thanks


That could be due to an x variable that is binary, in which case it is ignorable. 


Dear Dr. Muthén, I am reading the Muthén & Muthén 2002 paper and I have a basic question. I generated 200 complete data sets with the Mplus simulation facility, used an external program to create MAR missingness in the same data sets, and ran an SEM in Mplus on the data with and without missing values. The following pattern was seen: 1) The same sample size is used in the two analyses. This is expected, because an inspection of the data showed that no individual has missing values on all variables. 2) The log-likelihood, however, differs, with a lower mean and a smaller SD for the complete data (e.g., M = -15476.595, SD = 48.47 for the complete data and M = -14471.77, SD = 71.57 for the incomplete data). I was expecting the log-likelihood to differ, because different information is used despite the equal sample size. However, I wasn't expecting consistently higher log-likelihood values for the incomplete data (I repeated this experiment with different models and the same pattern is always observed). How can we explain this result? Could this be because missing information introduces a level of uncertainty in the data, which results in variability in the log-likelihood? Do you have any other explanation? Thank you for your help. Regards, Sam


You may want to ask this on SEMNET. 
