Hi, I have a question about your missing data modeling analysis. If I run an LGM with missing data, am I able to test for MAR? In other words, how can I test whether my data are missing at random versus missing not at random? Would failure to reject the null hypothesis mean my data are at minimum MAR? Is your missing data analysis option similar to what I read in Duncan and Duncan (1995), "Modeling the process of development via latent variable growth curve methodology"?


There is no statistical test for whether missingness is MAR. See the book by Little and Rubin for a discussion of the missing-data approach that is used in Mplus.


Thanks. Also, are you planning on designing routines that would allow modeling with missing data with categorical observed variables? 


Yes, this will be available in Version 3. 


Thanks so much! You and your husband are very good to the Mplus users!

wendy posted on Wednesday, July 19, 2006  6:55 pm



Hi, Dr. Muthen. I am currently conducting research on growth modeling with one covariate predicting both the latent intercept and the latent slope; that is, the exogenous variable Z predicts I and S. I fit two models: a full growth model and an equivalent model with a special pattern of missing data. I would like to compare the parameter estimates, bias, and power of the two models. One unusual result is that the power to detect the effect of S regressed on Z is lower in the full growth model than in the model with missing data, and the direct effect of S on Z is nonsignificant. I would expect direct effects to have less power in the model with missing data. Is that right? Thank you.


When you say "pattern of missing data", I associate it with the "pattern-mixture" approach to nonignorable missing data analysis, that is, using dummy covariates to represent missing data patterns so that model parameter values can vary across missing data patterns. If that is the correct association, I would think that the differences in results are due to MAR not holding, so that the nonignorable missing data approach is needed. That doesn't seem to be a matter of power alone but also of bias.


I am modeling student examinee effort scores across seven time points (sample size = 1434). I have missing data (MAR) and am using full information maximum likelihood. My univariate skew and kurtosis are normal; however, my Mardia's is large, suggesting multivariate nonnormality (but could be due to large sample size). The MLM estimator appears to require listwise deletion (which makes sense, given that FIML assumes multivariate normality, correct?). In this instance, do I ignore the Mardia's and use MLR? Thank you for any advice or recommendations. Sincerely, Jeanne 


MLR is robust to nonnormality so I don't see that there is a problem. 


We are running an LGM on unbalanced panel data spanning the years 1984 to 2009. Since we are interested in trajectories across the whole life course, we rearranged the data so that there are age-specific indicators for the dependent variable, from ages 18 to 65 (wide format). The problem is that, obviously, no respondent is observed across the whole life course (we only have 12% valid measurements on average); hence, Mplus gives an error that the minimum covariance coverage was not fulfilled (even after setting COVERAGE=.01), and the model does not converge. Does that mean that it is not possible to run the LGM in wide data format, so that we have to switch to a multilevel framework? Thanks in advance, Oliver


You might consider the multiple group multiple cohort approach shown in Example 6.18. 
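As a rough illustration of why the coverage check fails for data like these: covariance coverage is the proportion of cases observed on each pair of variables, so two ages that no respondent shares have a coverage of zero. The sketch below uses a hypothetical helper name and toy data, and is not how Mplus itself computes the check.

```python
def covariance_coverage(rows):
    """Return a matrix whose [j][k] entry is the share of cases observed on both j and k."""
    n = len(rows)
    p = len(rows[0])
    coverage = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            both = sum(1 for r in rows if r[j] is not None and r[k] is not None)
            coverage[j][k] = both / n
    return coverage


# Hypothetical toy data: 4 respondents, outcome at three ages (None = not observed).
data = [
    [1.0, 2.0, None],
    [1.5, None, None],
    [None, None, 3.0],
    [None, None, 2.5],
]
coverage = covariance_coverage(data)
for row in coverage:
    print(row)
```

In this toy example the first and third ages are never observed together, so that covariance has no data behind it at all, which is the situation a multiple-cohort setup is meant to avoid.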

Carolin posted on Wednesday, June 29, 2011  5:00 am



Dear Mr. and Mrs. Muthen, I have some questions concerning FIML with GMM. After reading many papers I conclude that it is better to use FIML instead of LD (I hope I am right?!). First question: in GMM the default is FIML, right? Second: Is it OK to use FIML even if I have nonnormal data (I read that FIML has the assumption of multivariate normality)? I actually thought that nonnormality is often assumed in GMM (because of unobserved distinct classes). Thanks a lot! Carolin


In most cases you are better off using FIML. Yes, it is the default. For mixture models, the assumption of normality is not for the data. The assumption is that there is normality within classes. 

Carolin posted on Thursday, June 30, 2011  12:26 am



Thanks a lot for your answers! Two related questions: 1) In which cases would you recommend not to use FIML? 2) If there should be normality within classes, it could however be that there is nonnormality in the data. If this would be the case, can I still use FIML (besides the assumption of multivariate normality for FIML)? Do you have any experiences, suggestions? Thanks! 

Carolin posted on Thursday, June 30, 2011  12:40 am



... concerning my second question: or did you mean that the assumption of normality of FIML is related to the classes instead of to the original data set in GMM? 


I would always use FIML because most of the time it is the best and it is not possible to know the infrequent times when it is not the best. The odds are in your favor. The assumption of normality does not apply to the data overall. 

Carolin posted on Friday, July 01, 2011  12:42 am



Thanks a lot, you are really helpful! What do you mean by your last statement? I'm not sure! Do you mean I can relax the assumption in GMM?


There is no normality assumption made about the observed data. The model assumes normality within classes which implies overall nonnormality. 
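To see why normality within classes implies nonnormality overall, one can work out the skewness and excess kurtosis of a mixture of normal distributions from its central moments. The sketch below does this analytically; the function name, class proportions, means, and standard deviations are made up for illustration.

```python
def mixture_skew_kurtosis(weights, means, sds):
    """Skewness and excess kurtosis of a mixture of univariate normals."""
    mu = sum(w * m for w, m in zip(weights, means))
    m2 = m3 = m4 = 0.0
    for w, m, s in zip(weights, means, sds):
        d = m - mu  # distance of the class mean from the overall mean
        # Central moments of d + s*Z with Z standard normal.
        m2 += w * (d ** 2 + s ** 2)
        m3 += w * (d ** 3 + 3 * d * s ** 2)
        m4 += w * (d ** 4 + 6 * d ** 2 * s ** 2 + 3 * s ** 4)
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


# One class: an ordinary normal distribution, so both values are zero.
one_class = mixture_skew_kurtosis([1.0], [5.0], [2.0])

# Two hypothetical classes, each normal, with different means.
two_class = mixture_skew_kurtosis([0.7, 0.3], [0.0, 3.0], [1.0, 1.0])

print(one_class)
print(two_class)
```

With two well-separated classes the overall distribution is clearly skewed even though each class is exactly normal.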


Dear Mr and Mrs Muthen, I'm trying to estimate a growth model using FIML for the missing values; however, I get a warning message saying that cases with missing data are excluded (see below). When I add covariates to the model, more cases are excluded. Is it possible to specify the model in such a way that all cases are included? What can be the reason that FIML excludes cases in a growth model (with a metric dependent variable) like this? (A search in the FAQ only turned up an example of a count model, but this is another kind of model.)

*** WARNING
Data set contains cases with missing on all variables.
These cases were not included in the analysis.
Number of cases with missing on all variables: 1

*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 285


Missing data theory applies to dependent variables only because the model is estimated conditioned on the covariates. If you don't want to lose cases missing on covariates, you can include the variances of the covariates in the MODEL command. They will then be treated as dependent variables and distributional assumptions will be made about them. Cases with missing on all variables cannot be included because they have nothing to contribute. 
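The two warnings can be mimicked with a small tally of which cases fall into each category. This is a simplified sketch with hypothetical variable names; it reproduces only the two exclusion rules quoted in the warnings, not everything Mplus checks.

```python
def classify_cases(rows, y_cols, x_cols):
    """Count cases by the two exclusion rules quoted in the warnings."""
    counts = {"all_missing": 0, "x_missing": 0, "retained": 0}
    for row in rows:
        y_vals = [row[c] for c in y_cols]
        x_vals = [row[c] for c in x_cols]
        if all(v is None for v in y_vals + x_vals):
            counts["all_missing"] += 1  # nothing to contribute
        elif any(v is None for v in x_vals):
            counts["x_missing"] += 1    # model is conditioned on x, so x must be observed
        else:
            counts["retained"] += 1     # partly missing outcomes are still usable
    return counts


# Hypothetical toy data (None = missing).
cases = [
    {"y1": 2.0, "y2": None, "x": 1.0},
    {"y1": 1.0, "y2": 3.0, "x": None},
    {"y1": None, "y2": None, "x": None},
    {"y1": None, "y2": 4.0, "x": 0.0},
]
counts = classify_cases(cases, ["y1", "y2"], ["x"])
print(counts)
```

Bringing the covariates into the model, as described above, would move the second kind of case from excluded to retained, at the cost of distributional assumptions about the covariates.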


Hello, I've run a growth mixture model on longitudinal data and determined the number of classes. I then tried to add auxiliary variables to test for mean differences/proportions across classes, using DCON and DCAT in version 7.11. The auxiliary variables had missing data, and the output contained the following message: WARNING: LISTWISE DELETION IS APPLIED TO THE AUXILIARY VARIABLE IN THE ANALYSIS. TO AVOID LISTWISE DELETION, DATA IMPUTATION CAN BE USED FOR THE AUXILIARY VARIABLES FOLLOWED BY ANALYSIS WITH TYPE=IMPUTATION. So I imputed the data for the auxiliary variables and reran the model using TYPE=IMPUTATION. Now the output still includes the mean and S.E. for each class for each variable, but it no longer includes the test for mean differences, i.e. the chi-square and p-values. Is there a way I can get this information with imputed data for auxiliary variables?


Further to my previous question, if it's not possible to get the DCON and DCAT tests with imputed variables, is there a way to see how many people are left in each class after listwise deletion? The output tells me the total number of observed and the total number deleted, but doesn't give me a breakdown by class. Many thanks. 


Rachel, the current version of Mplus doesn't compute these. You can obtain the mean differences by hand if you run all the imputed data sets one at a time and combine the results as in http://sites.stat.psu.edu/~jls/mifaq.html#howto. The chi-square is much harder but in principle can be done the same way. For the second question: people don't really change classes. They are in the same class as in the model without the auxiliary variable. If you want to confirm that, TECH8 contains such information (the end of TECH8 is the last iteration from the DCON analysis, with the class allocation). Tihomir
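For combining by hand, the usual rules for multiple imputation (Rubin's rules) pool the per-data-set estimates and standard errors as sketched below. This assumes those are the rules the linked page describes; the function name and the numbers are made up for illustration.

```python
import math


def pool_rubin(estimates, std_errors):
    """Pool estimates from m imputed data sets using Rubin's rules."""
    m = len(estimates)
    pooled = sum(estimates) / m                                  # pooled point estimate
    within = sum(se ** 2 for se in std_errors) / m               # average sampling variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)  # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical class mean differences and S.E.s from three imputed data sets.
pooled_est, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_est, pooled_se)
```

The pooled standard error is larger than any single-data-set S.E. whenever the estimates differ across imputations, which reflects the extra uncertainty due to the missing values.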


Hello, I’m planning to use LGM to model how a mother’s age at childbirth affects her offspring’s sex. I know that I can accommodate the differing ages at childbirth between mothers by using the AT option, but I’m wondering whether missing data will be a problem here, because mothers also differed in how many offspring they had during their lifetime (e.g. 1-15). In other words, does it matter if I assign a missing value for offspring sex for those offspring that were never born? Can such models be fit in Mplus? Thanks a lot in advance! Best, Samuli


I would put the data in long format to deal with the missing values. Then cluster size would vary depending on the number of children. See Example 9.16. 
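The restructuring to long format can be sketched as follows: each offspring that was actually born becomes one row, and unborn "slots" are simply dropped, so each mother forms a cluster of varying size. The field names and data are hypothetical, and this shows only the data preparation, not the Mplus input.

```python
def wide_to_long(wide):
    """Turn one record per mother into one row per observed birth."""
    long_rows = []
    for mother_id, births in wide.items():
        for age, sex in births:
            if age is None or sex is None:
                continue  # offspring never born: no row, so no missing value to code
            long_rows.append({"mother": mother_id, "age_at_birth": age, "sex": sex})
    return long_rows


# Hypothetical wide data: up to three births per mother as (age at birth, offspring sex).
wide = {
    "m1": [(22, 1), (25, 0), (None, None)],
    "m2": [(30, 1), (None, None), (None, None)],
}
long_rows = wide_to_long(wide)

# Cluster size = number of offspring per mother.
cluster_sizes = {}
for row in long_rows:
    cluster_sizes[row["mother"]] = cluster_sizes.get(row["mother"], 0) + 1
print(cluster_sizes)
```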

Leo Young posted on Wednesday, June 17, 2015  4:27 am



Hi, Linda. I am running an LGM in Mplus 7.2 with the following model:

Model:
i s | s1@0 s2@1 s3@2;
i s ON gender;
s1 ON income1;
s2 ON income2;
s3 ON income3;

My income covariates have many missing values because some cases died before the later waves, so many cases are dropped when modeling. You mentioned in a previous answer: "If you don't want to lose cases missing on covariates, you can include the variances of the covariates in the MODEL command." My question is how to write this in the MODEL command. Thank you very much.


Don't use the missing data flag for income. Because your outcome is missing when income is missing, there is no effect of the income on the outcome for the dropout occasions. If you really want to include the covariates, you simply mention their means or variances. 


Hi Dr. Muthen, Is there a way to find the amount of data present for latent variables and for predictors in a growth curve model? Thanks! Hillary 


For observed variables, use the PATTERNS option of the OUTPUT command. 
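Outside Mplus, the same kind of summary can be approximated by tabulating which variables are observed for each case. This is a minimal sketch with hypothetical data, and its layout is not the format of the Mplus output.

```python
from collections import Counter


def missing_patterns(rows):
    """Count cases per missing data pattern ('x' = observed, '.' = missing)."""
    return Counter(
        tuple("x" if value is not None else "." for value in row) for row in rows
    )


# Hypothetical data: three repeated measures per case (None = missing).
data = [
    [1.0, 2.0, 3.0],
    [1.0, None, 3.0],
    [2.0, 2.5, 3.5],
    [1.0, None, None],
]
patterns = missing_patterns(data)
for pattern, count in patterns.most_common():
    print("".join(pattern), count)
```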


Thank you for your response Dr. Muthen! Any suggestions for latent variables? Hillary 


I don't know what you mean by that. 


Sorry for being unclear! Is there a command for finding the amount of data used for a latent variable? Or does a latent variable take all of the data into account? For instance, if a model has 1149 observations, is the latent variable based on all 1149 observations?


The default in Mplus is to use all available information to estimate the model. 


Dear Mplus users, I'd like to do LGM on data from 5 daily measures of employees. However, certain employees do not work every day, simply because of their working schedule. For example, some employees work 5 days a week and some work 3. So the first group has 5 chances to show growth during a week (and their growth score "2" occurs on day 3, Wednesday) while the latter group has 3 chances to show growth and their growth score "2" occurs later on in the week, and this is already the end of their growth. In other words, the latter group will never reach a growth score of "4". Basically I would need a growth model that models each time wave to be +1 higher than the previous nonmissing wave. Can that happen in Mplus? Thank you! Paris 


Try using a 2-level growth model where you represent the individually-varying times of observation with a level-1 time variable. The UG has an example of such a 2-level growth model.
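One way to build such a level-1 time variable is to put the data in long format and let the time score count the person's own nonmissing days, so that each observed wave is one unit later than the previous observed wave. The helper below is a hypothetical sketch of that data step only.

```python
def time_scores(daily_values):
    """Return (day, time_score, value) for each nonmissing day of one person."""
    rows = []
    score = 0
    for day, value in enumerate(daily_values, start=1):
        if value is None:
            continue  # non-working day: no row, and the time score does not advance
        rows.append((day, score, value))
        score += 1
    return rows


# Hypothetical employees: one works all 5 days, one works Monday, Wednesday, Friday.
five_day = time_scores([3.0, 3.2, 3.5, 3.6, 4.0])
three_day = time_scores([3.0, None, 4.0, None, 5.0])
print(three_day)
```

For the three-day employee, the time score reaches 2 on day 5 rather than on day 3, which matches the coding described in the question.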


Hi, a question regarding missing data: does using a nonignorable missing data model instead of the default (MAR) require more power to estimate the model, or does that not change? Thanks


It may in that such models are more complex. 

Yaqiong Wang posted on Tuesday, September 04, 2018  8:44 pm



This may be a basic question: my data have 4 time points. I wonder for those cases that are completely missing for one or two time points, should I exclude them from the analysis? Or keep them and use Mplus settings to address missing data? 


You should keep people that have some nonmissing data. Only people missing at all four time points should be eliminated. 


Dear Dr. Muthen, I am currently performing a four-wave LGM for a two-order model with categorical indicators (5-point Likert-type scales). I use a cluster variable and the TYPE=COMPLEX option. Even though there is a lot of missing data, the model terminates normally with the following fit indices: RMSEA=0.026, CFI=.968 and TLI=.967. My first question is: do you think I need to use a method for imputing data, or will the results be totally different if I impute data? How do I impute data with categorical variables and a cluster variable? Should I use MLR (but without the cluster variable)? Another thing: when I perform a CFA for each wave, I am also thinking of saving factor scores for the latent variables in the model for a subsequent LGM. However, in the new .dat file there are only factor scores for the observed variables. How do I save the factor scores for the latent factors built up by the observed variables?


Multiple imputation usually gives similar results to ML, so I would not bother with that unless ML is computationally demanding. Chapter 11 of the UG shows multilevel imputation with categorical variables. Factor scores are given only for latent variables. You can send your output and resulting file to support along with your license number.
