I am carrying out a latent class analysis using 5 multiply imputed datasets. I have managed to carry out the analysis using TYPE=IMPUTATION.
I then reran the analysis with all the parameters fixed at the average estimates from the multiple imputation analysis, to obtain the latent class probabilities. This gives me the class probabilities.
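For reference, the standard way to combine one parameter across imputed datasets is Rubin's rules: the pooled estimate is the mean of the per-dataset estimates, and its variance adds the average within-imputation variance to an inflated between-imputation variance. Mplus's TYPE=IMPUTATION output pools estimates in this spirit. A minimal sketch, assuming you have copied each dataset's estimate and standard error by hand; the numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one parameter's estimates across m imputed datasets (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)                  # pooled point estimate
    w = mean(se ** 2 for se in std_errors)   # average within-imputation variance
    b = variance(estimates)                  # between-imputation variance (n - 1 denominator)
    t = w + (1 + 1 / m) * b                  # total variance of the pooled estimate
    return q_bar, math.sqrt(t)


# Hypothetical estimates of one threshold from 5 imputed datasets
est = [1.20, 1.25, 1.18, 1.30, 1.22]
se = [0.10, 0.11, 0.10, 0.12, 0.10]
print(pool_rubin(est, se))
```

Note that this pools parameters only; it gives no pooled log-likelihood, which is why likelihood-based fit criteria are not directly available.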
My question is how I should assess model fit. Because the results are averaged across the 5 datasets, the usual criteria I would use are not available.
Sorry, I have another question. How can I obtain the results of the bootstrap likelihood ratio test for the different models I run? When I request TECH14 for the analysis with the fixed values, the result differs depending on which imputed dataset I use. Does this matter? The significance level of the models seems to remain the same across the different datasets.
Jon Heron posted on Wednesday, April 15, 2015 - 11:47 pm
There are many unanswered questions when it comes to MI in mixture modelling.
It sounds like you are attempting to determine the number of classes across your imputed datasets. My suggestion would be that you make this decision first, prior to analysing the imputed data, perhaps relying on likelihood-based methods to deal with missingness in the class indicators at that point, or alternatively using some theory.
By the way, your imputation model should contain information regarding the classes, so this is a bit of a chicken-and-egg problem. The first mention of this I have seen is:
Colder CR, Mehta P, Balanda K, et al. Identifying trajectories of adolescent smoking: an application of latent growth mixture modeling. Health Psychol. 2001;20(2):127-35.
You'll find that some people, including me, have neglected to do this and have assumed a single population for the purpose of imputation. This is likely to attenuate any differences between the classes.
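To see why a single-population imputation tends to attenuate class differences, here is a small simulation sketch with hypothetical settings: two classes whose covariate means differ by 1, 40% of covariate values missing completely at random, and each missing value drawn from one normal distribution fit to all observed values, ignoring class. This is a deliberately crude stand-in for a class-blind imputation model, not what any particular software does.

```python
import random
from statistics import mean, stdev


def simulate(n_per_class=5000, gap=1.0, p_missing=0.4, seed=1):
    rng = random.Random(seed)
    # True covariate values: class 0 ~ N(0, 1), class 1 ~ N(gap, 1)
    data = [(c, rng.gauss(gap * c, 1.0)) for c in (0, 1) for _ in range(n_per_class)]
    # Make values missing completely at random
    observed = [(c, x if rng.random() >= p_missing else None) for c, x in data]
    obs_values = [x for _, x in observed if x is not None]
    mu, sd = mean(obs_values), stdev(obs_values)
    # Class-blind imputation: every missing value is drawn from one pooled normal
    imputed = [(c, x if x is not None else rng.gauss(mu, sd)) for c, x in observed]

    def class_gap(rows):
        return mean(x for c, x in rows if c == 1) - mean(x for c, x in rows if c == 0)

    return class_gap(data), class_gap(imputed)


true_gap, imputed_gap = simulate()
print(f"class difference, complete data: {true_gap:.2f}")
print(f"class difference, after class-blind imputation: {imputed_gap:.2f}")
```

Because the imputed values have the same mean in both classes, the class difference shrinks roughly in proportion to the share of missing values.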
Thanks for your reply. If I understand correctly, you are suggesting running the LCA on the data with missing values and then using multiple imputation to impute class membership?
Jon Heron posted on Thursday, April 16, 2015 - 12:59 am
No, I didn't mean that. I'll try again now that I am caffeinated.
You don't need imputation to deal with missing values in the indicators of class membership, so presumably you are using imputation because you have missing covariate information.
My suggestion is that you determine the number of classes before considering imputation, and then use imputation to deal with covariate missingness. After doing the imputation you will need to carry out another LCA, but at that point you won't need to worry about BLRT/BIC etc.
You'll see a number of ad-hoc alternatives in the literature. For instance, if your entropy is excellent (>0.9?), then exporting your LCA results, assigning each person to their modal class and then doing the imputation *might* be adequate. It is hard to say whether any bias from doing that would be smaller or greater than the bias from imputing while ignoring the underlying class structure.
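Both quantities in this suggestion come from the posterior class probabilities. A minimal sketch of the relative entropy statistic reported for mixture models, 1 - sum(-p ln p) / (n ln K), and of modal class assignment, assuming the posteriors have been exported (for example with SAVEDATA: SAVE = CPROBABILITIES;) and read into a list of rows; the probabilities below are hypothetical.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy 1 - sum(-p * ln p) / (n * ln K); 1 means perfect class separation."""
    n, k = len(posteriors), len(posteriors[0])
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1 - total / (n * math.log(k))


def modal_class(posteriors):
    """Assign each person to the class with the highest posterior probability (1-based)."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in posteriors]


# Hypothetical posterior probabilities for 4 people and 3 classes
post = [
    [0.98, 0.01, 0.01],
    [0.02, 0.95, 0.03],
    [0.05, 0.05, 0.90],
    [0.60, 0.30, 0.10],
]
print(round(relative_entropy(post), 3), modal_class(post))
```

The fourth person is assigned to class 1 with a posterior of only 0.60; modal assignment treats that as certain, which is why the approach is only defensible when entropy is high.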
Thanks again Jon. I'm probably being really dumb, but why don't I need imputation to deal with indicators of class membership? If some of the variables I am using to determine my classes have missing values, should I not deal with these before using them to determine my classes?
Jon Heron posted on Thursday, April 16, 2015 - 2:37 am
Missing data in class indicators is typically handled using likelihood methods (FIML). Check out some of Craig Enders' work, including Enders and Bandalos. Of course there are assumptions to make, but there always are with missing data.
TECH10 provides univariate and bivariate fit information that you can use to assess model fit.
For determining the number of classes, my preferred method is to run TECH14 on each imputed data set (with all parameters free). Once you have determined the number of classes for each imputed data set, use the number of classes that comes up most often across the imputed data sets.
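The "most often across the imputed data sets" rule is a simple vote. A minimal sketch, assuming you have recorded by hand the number of classes that TECH14 favoured in each imputed data set (the values below are hypothetical). It returns every tied value rather than silently picking one, since a tie calls for a substantive decision.

```python
from collections import Counter


def consensus_num_classes(selected):
    """Return the most frequently selected number(s) of classes and the vote count."""
    counts = Counter(selected)
    top = max(counts.values())
    winners = sorted(k for k, v in counts.items() if v == top)
    return winners, top


# Hypothetical: number of classes favoured by TECH14 in each of 5 imputed data sets
print(consensus_num_classes([3, 3, 4, 3, 2]))
```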
Just to confirm Jon Heron's reply: if you have the raw data with the missing values, you can run the LCA with the ML estimator directly, without doing imputations, and avoid the above complications.
We are running a latent profile analysis (LPA) with 4 indicators in a sample of 371 participants. We identified a 3-class solution and would like to test which covariates predict class membership (using Vermunt's three-step approach). As we have quite a lot of missing data in our dataset, we would like to use multiple imputation for our covariates. However, we read that multiple imputation is not recommended in the context of latent class analysis, because latent class analysis assumes multiple latent subpopulations whereas multiple imputation assumes a single population. Our questions are therefore the following:
1) Does this issue also apply to covariates, or does it mainly concern the imputation of the latent class indicators?
2) Would this issue be somewhat addressed if the covariates were imputed based on, among other variables, the probabilities of membership in each (previously identified) class?
3) We first identified the classes using FIML and would consider multiple imputation only for the covariates. Is this combination of different ways of handling missing data appropriate?
Thank you in advance for your help. Any thoughts would be welcome.
2) Generally, the more information supplied for the imputation, the better.
3) Yes, it is appropriate, and it is preferable, because the missing data for the indicators can easily be handled in stage one with FIML.
You can also consider these two approaches:
A. Treat the covariate as a dependent variable during stage 3 estimation. You would need to use the manual three-step approach for that (see Section 3 in http://statmodel.com/download/webnotes/webnote15.pdf). In the third stage you would need to mention the variance of the covariate. This avoids imputation completely and uses the FIML treatment of the missing data.
B. The covariates can be imputed using H0 imputation in Mplus. Here again the covariate is treated as a dependent variable, and the imputation is based on the mixture model itself. See User's Guide example 11.7 for an example of H0 imputation.
Thank you very much for this quick answer. Two things are not totally clear to me:
1) If I understood correctly, you are advising us to enter our covariates as distal outcomes in the manual three-step approach?
2) For approach A, you say to "mention the variance of the covariate" in the third step, but do you mean estimating the variances, fixing them, or something else? And should this be done in the overall model or in the class-specific models?
1) No. Treat it as a class predictor in a bivariate model (C on X; X;) so that missing data on the covariate are handled with FIML.
2) Estimating the variance. It should be in the overall section. Mplus will not allow you to do it in the class-specific section, since that creates a reciprocal interaction; see the Maddala reference in http://statmodel.com/download/Rec.pdf
Thank you very much for this very clear answer. It works well, and the model is estimated with FIML. However, we noticed that the number of individuals per class changed slightly compared to the step 1 model. Is this normal? Can it be avoided?
Dear Tihomir, if I understood correctly, with the H0 imputation approach one would:
1. Use the LPA identified as the best-fitting solution to impute the missing values of the covariates.
2. Use the imputed values in a second analysis with the three-step approach to test which covariates predict class membership.
Is this correct?
If so, how would the covariates be specified in the imputation model? I tried the following syntax, but it didn’t work (*** ERROR in DATA IMPUTATION command Unknown variable(s) in the IMPUTE option: x1 ).
VARIABLE:
  NAMES = id u1-u4 x1-x15;
  USEVARIABLES = u1-u4;
  MISSING = ALL (-999);
  IDVARIABLE = id;
  CLASSES = c(4);
ANALYSIS:
  ESTIMATOR = BAYES;
  PROCESSORS = 2;
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  x1-x9;  !I mention the variance of the predictors with missing data
  %C#1%
  u1; u2; u3;
  %C#2%
  u1; u2; u3;
  %C#3%
  u1; u2; u3;
  %C#4%
  u1; u2; u3;
DATA IMPUTATION:
  IMPUTE = x1-x10;
  NDATASETS = 20;
  SAVE = imputation*.dat;
SAVEDATA:
  FILE = plausvalues.dat;
I assume this error appears because I do not list my covariates in the USEVARIABLES command, but if I do, the covariates will be used as indicators of the LPA. What would be the best way to set up the imputation syntax? I really appreciate your help.
I think so. Add the variables x1-x15 to the USEVARIABLES command, make them class indicators, and add x1-x15 WITH x1-x15 as well as their covariances with the u indicators. Such a model could be hard to estimate, however, so I would consider simply using H1 imputation with TYPE = BASIC;
Mayra Galvis posted on Monday, September 17, 2018 - 11:34 pm
Dear Tihomir, thank you very much for your answer. If we specify the model the way you suggested, is there a high risk of getting an LPA solution that is significantly different from the original one we identified as the best fitting? Or would there be a way to "protect" the classes we previously identified?
I think you are correct. I think some kind of two-stage imputation can work to protect the LPA solution: first impute the latent class variable (plausible values) from the LPA, and then, for each such imputation, impute the covariates one data set at a time, treating the imputed latent class value as known (KNOWNCLASS).
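The two-stage logic can be sketched outside Mplus to make the order of operations concrete. This is a minimal, hypothetical illustration, not Mplus's plausible-value or KNOWNCLASS machinery: stage 1 draws one class value per person from their posterior probabilities, and stage 2 fills each missing covariate from a normal distribution fit to the observed values within that drawn class. The within-class normal draw is a crude stand-in for a proper imputation model (it does not draw the model parameters), and it assumes at least two observed covariate values per class.

```python
import random
from statistics import mean, stdev


def draw_plausible_classes(posteriors, rng):
    """Stage 1: draw one plausible class value (1-based) per person from their posteriors."""
    classes = list(range(1, len(posteriors[0]) + 1))
    return [rng.choices(classes, weights=row)[0] for row in posteriors]


def impute_within_class(x, cls, rng):
    """Stage 2 (crude stand-in): fill missing x from a normal fit within each drawn class."""
    filled = list(x)
    for k in set(cls):
        obs = [xi for xi, ci in zip(x, cls) if ci == k and xi is not None]
        mu, sd = mean(obs), stdev(obs)  # assumes >= 2 observed values in class k
        for i, (xi, ci) in enumerate(zip(x, cls)):
            if ci == k and xi is None:
                filled[i] = rng.gauss(mu, sd)
    return filled


def two_stage_imputation(posteriors, x, n_imputations=5, seed=0):
    """One class draw per imputed data set, then covariates imputed given that draw."""
    rng = random.Random(seed)
    datasets = []
    for _ in range(n_imputations):
        cls = draw_plausible_classes(posteriors, rng)
        datasets.append((cls, impute_within_class(x, cls, rng)))
    return datasets


# Hypothetical data: 6 people, 2 classes, one covariate with 2 missing values
post = [[0.99, 0.01], [0.95, 0.05], [0.90, 0.10], [0.05, 0.95], [0.02, 0.98], [0.10, 0.90]]
x = [1.0, 1.2, None, 5.0, 5.4, None]
for cls, filled in two_stage_imputation(post, x, n_imputations=2):
    print(cls, [round(v, 2) for v in filled])
```

Because the class draw changes from one imputed data set to the next, uncertainty about class membership is carried into the covariate imputations, while the LPA solution itself is never re-estimated.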