
yawen posted on Thursday, November 12, 2009  5:04 am



Dear Dr. Muthen,

I am doing a basic latent class analysis of cross-sectional data. The sample size is 2000. Some independent variables have missing values, but none of the latent class indicators is missing. In the VARIABLE command I typed "Missing are all (9999);" and in the ANALYSIS command I typed "type=mixture". I ran a couple of models. Each one has a different number of observations because I use a different set of independent variables in each. The User's Guide says: "The default is to estimate the model under missing data theory using all available data." I have the following questions.

1) Are all available data, including variables with missing values, used when I specify the commands this way? I would like all of the available information to be used.
2) My models have different numbers of observations, all of them smaller than my sample size. Should I report both the number of observations and the sample size in my tables?
3) I run different models to see whether the coefficient of a specific variable is stable. Is it appropriate to report models with different numbers of observations (from the same data set) in one paper? Or should I drop all cases with missing values so that the number of observations is fixed?

Thank you in advance.
yawen


You can report the results for the different sample sizes, or you can bring the covariates into the model and treat them as dependent variables. This is done by mentioning their variances in the MODEL command. This is similar to multiple imputation.
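Why each model ends up with its own number of observations can be sketched with a small counting function. This is Python for illustration, not Mplus code; the helper name n_used and the toy data are hypothetical, and the exclusion rules are an assumption based on the replies in this thread: a case missing on any covariate is dropped unless the covariate variances are mentioned in the MODEL command, and a case missing on every analysis variable is always dropped.

```python
# Illustration only: not Mplus code. The exclusion rules are an assumption
# based on the replies in this thread.
MISSING = 9999  # missing-value flag, as in "Missing are all (9999);"


def n_used(rows, covariates, outcomes, covariates_in_model=False):
    """Count the cases an analysis would use under the assumed rules."""
    n = 0
    for row in rows:
        analysis_vars = covariates + outcomes
        if all(row[v] == MISSING for v in analysis_vars):
            continue  # nothing observed on any analysis variable
        if not covariates_in_model and any(row[c] == MISSING for c in covariates):
            continue  # model conditional on x: a case with a missing x is dropped
        n += 1
    return n


# Toy data: u1, u2 are latent class indicators (never missing); x1, x2 are covariates.
rows = [
    {"u1": 1, "u2": 0, "x1": 2.0, "x2": 1.0},
    {"u1": 0, "u2": 1, "x1": MISSING, "x2": 0.5},
    {"u1": 1, "u2": 1, "x1": 1.5, "x2": MISSING},
    {"u1": 0, "u2": 0, "x1": 0.7, "x2": 2.2},
]

print(n_used(rows, ["x1"], ["u1", "u2"]))                                  # 3
print(n_used(rows, ["x1", "x2"], ["u1", "u2"]))                            # 2
print(n_used(rows, ["x1", "x2"], ["u1", "u2"], covariates_in_model=True))  # 4
```

Under these rules, adding a covariate to a conditional model can only keep or shrink N, while bringing the covariates into the model keeps every case that has at least one observed analysis variable.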


Dear Drs. Muthén,

I have a question about the proper treatment of missing data due to panel attrition. I have a two-variable, two-wave cross-lagged model with repeated measures of X and Y, i.e., X1 and Y1 (at t1) are associated with both X2 and Y2 (at t2). Some of the missingness at t2 is due to panel attrition. My questions:
1. Is it sensible to use a FIML approach with panel data?
2. What N am I supposed to report? Giving the full sample size reported by Mplus looks odd because it suggests that there is no panel attrition.

I would appreciate any suggestions. Thank you in advance,
Oliver Arránz Becker


1. Yes.
2. Report the full sample size along with the fact that you are using FIML. Also give the coverage, that is, the proportion of data present for each variable and each pair of variables.
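When data are missing, Mplus prints a covariance coverage table: the proportion of cases observed for each variable and for each pair of variables. The Python sketch below reproduces that bookkeeping on a toy two-wave panel with attrition. It assumes missing values are stored as None; the function name and the data are made up for illustration.

```python
from itertools import combinations


def coverage(rows, variables):
    """Proportion of cases observed on each variable and each pair of variables.

    Missing values are assumed to be stored as None.
    """
    n = len(rows)
    cov = {}
    for v in variables:
        cov[(v, v)] = sum(r[v] is not None for r in rows) / n
    for a, b in combinations(variables, 2):
        both = sum(r[a] is not None and r[b] is not None for r in rows) / n
        cov[(a, b)] = cov[(b, a)] = both
    return cov


# Toy two-wave panel: the second case dropped out before t2.
rows = [
    {"x1": 1.0, "y1": 2.0, "x2": 1.5, "y2": 2.5},
    {"x1": 0.5, "y1": 1.0, "x2": None, "y2": None},
    {"x1": 2.0, "y1": 0.5, "x2": 2.1, "y2": None},
    {"x1": 1.2, "y1": 1.1, "x2": 1.0, "y2": 0.9},
]
cov = coverage(rows, ["x1", "y1", "x2", "y2"])
print(cov[("x1", "x1")], cov[("x2", "x2")], cov[("x2", "y2")])  # 1.0 0.75 0.5
```

Reporting the full N together with these proportions tells readers both how many cases entered the FIML analysis and how much attrition occurred at t2.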


Thanks a lot for your advice on my last questions. In the same paper, we also present a discrete-time event history analysis (with a dichotomous event indicator as the DV and several covariates). Based on other postings on this board, we think there are two options to avoid listwise deletion of cases with missing values:
1. a probit model (WLS) that includes the predictors by mentioning their variances, or
2. estimating the model via ML with Monte Carlo integration.

My questions:
1. Which of the two methods would you recommend (if you know of any references, that would be great)?
2. Is the second approach essentially FIML estimation? If not, how should we describe the method in the paper?
3. Can you provide a reference explaining why Monte Carlo integration is necessary in this case?

Thank you so much for your help, best regards,
O. Arránz Becker


You must use maximum likelihood to estimate a discrete-time survival model. This means that the missing data estimation is done using FIML. If you don't want observations with missing values on observed exogenous variables removed from the analysis, you can mention the variances in the MODEL command. They will then be treated as dependent variables and distributional assumptions will be made about them. Monte Carlo integration is required when there are missing data on mediators.


I guess I misstated my previous question. We do not use a latent variable framework for the discrete-time event history analysis; rather, we have just one event indicator (i.e., there are only two time points: covariates are measured at t1 and the event at t2). So the analysis is essentially a path analysis with a dichotomous outcome. What is the difference between a FIML and a WLS estimator in this case, and which one is preferable? How are missing values treated in WLS if the exogenous variables are mentioned in the MODEL command? Thank you so much for your support, best regards, O. Arránz Becker


With weighted least squares, missing data are handled using pairwise present information. Given the choice, I would use FIML. 


There appears to be a difference between Mplus 6 and Mplus 5 in how the "Number of observations" is reported when there are missing values. In Mplus 6, my "Number of observations" is reduced to only those individuals with complete x-variables, while Mplus 5 gives the total N in the data set regardless of which model is being fit. Whereas observations with missing x-variables used to be counted in the "Number of missing data patterns", they are now not included in the analysis at all. Hence, for a basic SEM, I get different results in Mplus 5 and Mplus 6, presumably because of this difference in how missing data are handled. Is there a way in Mplus 6 to include individuals with missing x-variables in FIML without resorting to multiple imputation?


If you specify the variances of the covariates in the MODEL command, you will obtain the Version 5 results. In Version 6, with all continuous outcomes and maximum likelihood estimation, we started estimating the model conditional on the covariates, to be in line with the rest of Mplus and with regular regression. With all continuous outcomes, maximum likelihood, and no missing data, estimating the model conditional on x or jointly gives the same results.
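For the simplest case, one continuous outcome y regressed on one covariate x with (x, y) bivariate normal, the last point can be sketched with the usual factorization of the joint log-likelihood. This is a textbook illustration with assumed notation (α, β, σ², μ_x, σ_x²), not Mplus's internal formulation:

$$
\ell(\theta) = \sum_{i=1}^{n} \log f(y_i \mid x_i;\, \alpha, \beta, \sigma^2) + \sum_{i=1}^{n} \log f(x_i;\, \mu_x, \sigma_x^2)
$$

With complete data, the second sum does not involve (α, β, σ²), so maximizing the joint likelihood gives the same regression estimates as maximizing the conditional likelihood alone. A case with x_i missing, however, contributes only the marginal density of y_i, which depends on the regression parameters as well as on the parameters of x:

$$
f(y_i) = \int f(y_i \mid x;\, \alpha, \beta, \sigma^2)\, f(x;\, \mu_x, \sigma_x^2)\, dx, \qquad y_i \sim N\!\left(\alpha + \beta\mu_x,\; \beta^2\sigma_x^2 + \sigma^2\right)
$$

Keeping such cases (the joint model) or dropping them (the conditional model) can therefore give different estimates once x has missing values.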


Hi Linda,

I did as you suggested and specified the variances of the covariates in the MODEL command simply by adding another line listing them, followed by a semicolon, e.g.

    x1 x2 x3;

I see this change in the default from Mplus 5 to Mplus 6 as a substantial change in the handling of missing data. If I understand correctly, by default missing values on x-variables are now handled with listwise deletion, while missing values on y-variables are handled with FIML. Previously, all variables were handled with FIML. Perhaps I missed it, but it might be useful to make this change more prominent on your UPDATES in Mplus 6 page.


The handling of missing data has not changed. Missing data theory applies to dependent variables only; it has never applied to independent variables. Prior to Version 6, for models with all continuous variables and maximum likelihood estimation, all variables were treated as dependent variables for the reason stated above.

JEP posted on Wednesday, October 24, 2012  5:41 pm



I have a data set with missing data from a survey I conducted (N=610). I ran a CFA with 12 binary variables and the WLSMV estimator:

    f1 by y1 y2 y3;
    f2 by y4 y5 y6 y7 y8 y9;
    f3 by y10 y11 y12;

My resulting N for the CFA was 572. I followed up my CFA with the following:

    f1 by y1 y2 y3;
    f2 by y4 y5 y6 y7 y8 y9;
    f3 by y10 y11 y12;
    f1 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
    f2 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
    f3 on x1 x2 x3 x4 x5 x6 x7 x8 x9;

This results in N=418 because there are 192 cases with missing values on the x-variables. However, I receive an error that says:

    THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL.

When I rerun it like this (per advice I received):

    f1 by y1 y2 y3;
    f2 by y4 y5 y6 y7 y8 y9;
    f3 by y10 y11 y12;
    f1 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
    f2 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
    f3 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
    x1 x2 x3 x4 x5 x6 x7 x8 x9;

my N=610. This model runs successfully and has good model fit. However, I'm not sure I can justify presenting a CFA with 572 cases followed by a structural model with 610 cases. Or can I? Is the latter method appropriate and justifiable? I am new to SEM and Mplus, so any thoughts on this matter would be greatly appreciated. Thank you.


I would use 572 for both analyses. This would eliminate the observations that have missing values on all of the y's and data only on the x's.

JEP posted on Thursday, October 25, 2012  10:22 am



Hi Dr. Muthen, that would be fantastic. I'm not at a computer with Mplus right now, but would the syntax be:

    f1 by y1 y2 y3;
    f2 by y4 y5 y6 y7 y8 y9;
    f3 by y10 y11 y12;
    f1 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
    f2 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
    f3 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
    y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11 y12;

Thank you.


You would need to use the USEOBSERVATIONS option to exclude observations with missing values on all of the y's, and then use the following:

    f1 by y1 y2 y3;
    f2 by y4 y5 y6 y7 y8 y9;
    f3 by y10 y11 y12;
    f1 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
    f2 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
    f3 on x1 x2 x3 x4 x5 x6 x7 x8 x9;
    x1 x2 x3 x4 x5 x6 x7 x8 x9;
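Before running the restricted analysis, it can help to count how many cases satisfy the selection rule in the reply, so the N in the output can be checked against the CFA. The Python sketch below keeps a case only if at least one of y1-y12 is observed. It does not show USEOBSERVATIONS syntax; the missing-value flag 9999, the helper name, and the toy rows are assumptions for illustration.

```python
# Illustration only: selects the cases the reply describes
# (at least one y observed); this is not USEOBSERVATIONS syntax.
MISSING = 9999  # assumed missing-value flag

INDICATORS = [f"y{i}" for i in range(1, 13)]  # y1-y12 from the CFA above


def has_any_indicator(row, indicators=INDICATORS):
    """True if the case has at least one observed factor indicator."""
    return any(row[y] != MISSING for y in indicators)


rows = [
    {**{y: 1 for y in INDICATORS}, "x1": 0.3},        # complete y data
    {**{y: MISSING for y in INDICATORS}, "x1": 1.2},  # data only on x
    {**{y: (MISSING if y == "y5" else 0) for y in INDICATORS}, "x1": MISSING},
]

kept = [r for r in rows if has_any_indicator(r)]
print(len(kept))  # 2
```

Note that the third case is kept even though its x1 is missing: once the x variances are mentioned in the MODEL command, such a case stays in the analysis.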

John Perry posted on Friday, April 11, 2014  4:06 am



Dear Drs Muthen, hopefully I am missing something obvious here... I have a sample with 342 participants and no missing data. However, in a simple ML SEM, I am getting 171 observations in the output (exactly half). I'm really not sure what I'm doing wrong here because I've not come across this before. John 


Probably some data reading issue. Please send data, input, output, and license number to Support. 
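An exact halving is the kind of result a mismatch between the variable list and the data file can produce. As an assumption, not a diagnosis of this particular case: if a free-format file is read as a continuous stream of values, a NAMES list with twice as many variables as the file has columns makes each record consume two lines. The hypothetical reader below only illustrates that arithmetic; it is not how Mplus is implemented.

```python
# Hypothetical free-format reader: values are taken in order and line breaks
# are ignored, so each record consumes len(names) values.
def read_stream(lines, names):
    tokens = [tok for line in lines for tok in line.split()]
    k = len(names)
    n_records = len(tokens) // k  # incomplete trailing values are ignored
    return [dict(zip(names, tokens[i * k:(i + 1) * k])) for i in range(n_records)]


lines = ["1 2 3"] * 342  # 342 participants, 3 values per line

print(len(read_stream(lines, ["a", "b", "c"])))                 # 342
print(len(read_stream(lines, ["a", "b", "c", "d", "e", "f"])))  # 171
```

Comparing the number of names in the input with the number of columns in the data file is a quick first check before sending files to Support.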
