Missing observations in LCA
 Lenna Nepomnyaschy posted on Friday, October 14, 2011 - 4:01 am

I am having trouble with LCA models with binary variables when there are missing observations. If I drop all cases with missing observations before converting to an mplus dataset (from Stata) everything works fine. But, if I leave in cases with missing observations, Mplus seems to think that my binary variables have three categories. It is treating the missings as actual values.

In the file that reads the data into mplus, it seems that Mplus understands that they are missing because it computes the right means for all the variables, but then in the LCA models something goes wrong. And when I look at the saved raw data it changes all the binary 0/1 variables to three-category 0,1,2 variables with 0s being those that are missing. I have tried changing missings to periods (.) in the LCA input file, but the same thing happens.

However, if I change the missing values to periods in the original file that reads the data into Mplus, the means of all the variables come out completely wrong.

Am I doing something wrong?
Thank you.
 Linda K. Muthen posted on Friday, October 14, 2011 - 12:27 pm
It sounds like you have blanks in your data set for missing values and that you are reading the data with free format. Blanks are not allowed with free format. If this does not help, please send the input, data, output, and your license number to support@statmodel.com.
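For readers with the same symptom, here is a minimal sketch of an input that reads free-format data with an explicit numeric missing-value code instead of blanks. The file name lca.dat, the indicator names u1-u4, the code -9999, and the two-class model are all assumptions made for illustration; none of them come from the original post.

```
TITLE: LCA with binary indicators and missing data (sketch)
DATA:
  FILE = lca.dat;            ! hypothetical file; free format, no blanks
VARIABLE:
  NAMES = u1-u4;             ! hypothetical binary 0/1 indicators
  USEVARIABLES = u1-u4;
  CATEGORICAL = u1-u4;
  MISSING = ALL (-9999);     ! assumed missing-value code used in the file
  CLASSES = c(2);            ! assumed number of classes
ANALYSIS:
  TYPE = MIXTURE;
```

The idea is that every missing value is written to the data file as the same numeric code before export, and that code is declared with the MISSING option in every input that reads the file. If a period is used as the code instead, the declaration would be MISSING = .; in each of those inputs.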
 Sebastian Daza posted on Tuesday, April 03, 2012 - 9:57 am
I am running some LCA and LTA models with covariates. Some covariates have missing data. How should I handle that? Does anyone know of references on dealing with this in Mplus (e.g., examples, syntax)?

Thank you in advance,
 Linda K. Muthen posted on Tuesday, April 03, 2012 - 4:17 pm
You can bring all of the covariates into the model by mentioning their means, variances, or covariances in the MODEL command. When you do this, you make distributional assumptions about them and they are treated as dependent variables. You could also use multiple imputation.
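The multiple-imputation route can be sketched roughly as follows. This is only an illustration under assumptions: the file name, variable names, missing code, number of imputed data sets, and save name are hypothetical, and the options should be checked against the DATA IMPUTATION section of the User's Guide for your Mplus version.

```
TITLE: Impute missing covariate values before the LCA (sketch)
DATA:
  FILE = lca.dat;            ! hypothetical file name
VARIABLE:
  NAMES = u1-u4 x1 x2;       ! hypothetical indicators and covariates
  USEVARIABLES = u1-u4 x1 x2;
  MISSING = ALL (-9999);     ! assumed missing-value code
DATA IMPUTATION:
  IMPUTE = x1 x2;            ! covariates to impute
  NDATASETS = 20;            ! assumed number of imputed data sets
  SAVE = lcaimp*.dat;        ! asterisk is replaced by the data set number
ANALYSIS:
  TYPE = BASIC;
```

The imputed data sets would then be analyzed in a second run by pointing the FILE option at the list file that Mplus saves alongside them and adding TYPE = IMPUTATION; to the DATA command.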
 Sebastian Daza posted on Tuesday, April 03, 2012 - 7:57 pm
Thank you, Linda. Do you know where I could find examples of how to implement that in Mplus?

Thank you,
 Linda K. Muthen posted on Wednesday, April 04, 2012 - 6:30 am
You just mention the variances of the covariates in the MODEL command. If your covariates are x1 and x2, you would say

x1 x2;
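In context, that statement would sit in the overall part of the mixture model. The following is a sketch only: the indicator names u1-u4, the class variable c, the missing code, the file name, and the regression of c on the covariates are assumptions for illustration.

```
TITLE: LCA with covariates that have missing data (sketch)
DATA:
  FILE = lca.dat;            ! hypothetical file name
VARIABLE:
  NAMES = u1-u4 x1 x2;       ! hypothetical indicators and covariates
  USEVARIABLES = u1-u4 x1 x2;
  CATEGORICAL = u1-u4;
  MISSING = ALL (-9999);     ! assumed missing-value code
  CLASSES = c(2);            ! assumed number of classes
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  c ON x1 x2;                ! class membership regressed on covariates
  x1 x2;                     ! mentioning the variances brings x1 and x2
                             ! into the model
```

Because x1 and x2 are now treated as dependent variables, the model makes distributional assumptions about them (normality for continuous variables), and cases with missing covariate values are kept rather than deleted. Depending on the model, Mplus may ask for numerical integration to be requested in the ANALYSIS command.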
 ruthjlee posted on Thursday, May 14, 2015 - 10:26 pm

I would like to run an LCA using parent reports on aspects of children’s home environment, with age as a covariate. The questionnaire was administered to parents who made a one-off visit to a lab with their children. It was used over a period of around seven years.
The questions changed somewhat during that period, so the majority of cases have missing data.

1. We suspect that some of our missing data are NMAR. For instance, a question about young children's use of computers at home was only added when it became common for young children to use computers. Here, missingness probably depends on the unseen observations themselves: in years when the question was not asked, this was because the expectation at that time was that the vast majority of parents would indicate no computer use at home.

- If we are right to consider these data NMAR, is it realistic to model this in the LCA, or would we be better served by simply removing the question?

- If the former: I am not sure whether growth model examples 11.1, 11.3 or 11.4 in the manual would be most appropriate, or whether some other approach would be a better fit for an LCA.

2. Some of our items have around one-third MAR data, and some have more cases with missing data than with valid data. Is there a rule of thumb regarding Mplus's tolerance, i.e. a level of missingness at which we should remove the item?

Many thanks in advance.
 Bengt O. Muthen posted on Friday, May 15, 2015 - 5:40 pm
NMAR modeling can be done in many different model contexts and is a bit complex. See e.g. my paper:

Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.

It is not the Mplus tolerance that is pushed when too much data are missing, but it is the reliance on model assumptions rather than data information that is the problem. With more missing than present you have a big problem, but the problem exists even for less missing than that. Nobody can give a rule of thumb.