I am having trouble with LCA models with binary variables when there are missing observations. If I drop all cases with missing observations before converting to an Mplus dataset (from Stata), everything works fine. But if I leave in cases with missing observations, Mplus seems to think that my binary variables have three categories. It is treating the missings as actual values.
In the file that reads the data into Mplus, it seems that Mplus understands that they are missing, because it computes the right means for all the variables, but then in the LCA models something goes wrong. And when I look at the saved raw data, all the binary 0/1 variables have been changed to three-category 0/1/2 variables, with 0 being the code for those that are missing. I have tried changing missings to periods (.) in the LCA input file, but the same thing happens.
However, if I change missings to periods in the original file that reads the data into Mplus, then all the means of the variables are completely wrong.
It sounds like you have blanks in your data set for missing values and that you are reading the data with free format. Blanks are not allowed with free format. If this does not help, please send the input, data, output, and your license number to firstname.lastname@example.org.
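To illustrate the point about blanks, here is a minimal sketch of an LCA input that reads a free-format file in which every missing value was recoded to a numeric flag before export. The file name lca.dat, the variable names u1-u4, the two-class solution, and the flag -999 are all assumptions for illustration, not details from the post above.

```
TITLE:    LCA with binary indicators and missing data (sketch);
DATA:     FILE = lca.dat;          ! hypothetical free-format file, no blanks
VARIABLE: NAMES = u1-u4;           ! hypothetical binary indicators
          CATEGORICAL = u1-u4;
          CLASSES = c(2);          ! assumed number of classes
          MISSING = ALL (-999);    ! assumed numeric missing-value flag
ANALYSIS: TYPE = MIXTURE;
```

If the file uses periods instead of a numeric flag, the corresponding statement would be MISSING = .; and it needs to appear in every input that reads that file, including the one that estimates the LCA.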
Hello, I am running some LCA and LTA models with covariates. Some covariates have missing data. How should I handle that? Does anyone know of references on how to deal with this in Mplus (e.g. examples, syntax)?
You can bring all of the covariates into the model by mentioning their means, variances, or covariances in the MODEL command. When you do this, you make distributional assumptions about them and they are treated as dependent variables. You could also use multiple imputation.
You just mention the variances of the covariates in the MODEL command. If your covariates are x1 and x2, you would say
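A minimal sketch of such a MODEL command, assuming a two-class LCA with hypothetical indicators u1-u4, a hypothetical data file, and an assumed missing-value flag of -999 (only x1 and x2 come from the reply above):

```
DATA:     FILE = lca.dat;          ! hypothetical file name
VARIABLE: NAMES = u1-u4 x1 x2;     ! u1-u4 are hypothetical indicators
          CATEGORICAL = u1-u4;
          CLASSES = c(2);          ! assumed number of classes
          MISSING = ALL (-999);    ! assumed missing-value flag
ANALYSIS: TYPE = MIXTURE;
MODEL:    %OVERALL%
          c ON x1 x2;              ! class membership regressed on covariates
          x1 x2;                   ! mentioning the variances brings x1, x2 into the model
```

Listing x1 and x2 on their own line refers to their variances, so cases with missing covariate values are retained under the distributional assumptions described above. Depending on the model, Mplus may ask for ALGORITHM = INTEGRATION in the ANALYSIS command.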
ruthjlee posted on Thursday, May 14, 2015 - 10:26 pm
I would like to run an LCA using parent reports on aspects of children’s home environment, with age as a covariate. The questionnaire was administered to parents who made a one-off visit to a lab with their children. It was used over a period of around seven years. The questions changed somewhat during that period, so the majority of cases have missing data.
1. We suspect that some of our missing data are NMAR. For instance, a question about young children's use of computers at home was only added when it became common for young children to use computers. Here, missingness probably depends on the unseen observations themselves: in years when the question was not asked, this was because the expectation at that time was that the vast majority of parents would indicate no computer use at home.
- If we are right to consider these data NMAR, is it realistic to model this in the LCA, or would we be better served by simply removing the question?
- If the former: I am not sure whether growth model examples 11.1, 11.3 or 11.4 in the manual would be most appropriate, or whether some other approach would be a better fit for an LCA.
2. Some of our items have around one-third MAR data, and some have more cases with missing data than with valid data. Is there a rule of thumb regarding Mplus's tolerance, i.e. a level of missingness at which we should remove the item?
NMAR modeling can be done in many different model contexts and is a bit complex. See e.g. my paper:
Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.
When too much data are missing, the problem is not Mplus's tolerance but the reliance on model assumptions rather than on information in the data. With more missing than present you have a big problem, but the problem exists even with less missingness than that. Nobody can give a rule of thumb.