How can we predict latent class membership for a new data set that was not part of the data used for the LCA? For example, if we conduct an LCA based on a male sample, can we get an output file with the predicted class membership for females?
You can do this. It makes the assumption that the new sample comes from the same population as the original sample. This may not be justified in your case.
To do so, you need to fix all of the parameters to their estimated values using the original sample. You can use the SVALUES option of the OUTPUT command to get your MODEL command with those values as starting values. You can do a global replace of * with @ to fix the values. Then estimate the model using the new data set asking for CPROBABILITIES in the SAVEDATA command.
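The global replace can be done in any text editor. As a rough illustration only, here is a minimal Python sketch of the same step. The model excerpt, parameter names and values are hypothetical and are not taken from any real output; the sketch assumes that every starting value appears as `*` directly followed by a number.

```python
import re

# Hypothetical excerpt of the MODEL command printed by OUTPUT: SVALUES;
# the parameter names and numbers are made up for illustration.
svalues_model = """\
%OVERALL%
[ c#1*0.41250 ];
%C#1%
[ u1$1*1.20310 ];
[ u2$1*-0.35480 ];
%C#2%
[ u1$1*-2.01770 ];
[ u2$1*0.88120 ];
"""


def fix_parameters(model_text: str) -> str:
    """Turn starting values (name*value) into fixed values (name@value)."""
    # Replace only an asterisk that is directly followed by a number,
    # so any other asterisk in the text is left alone.
    return re.sub(r"\*(?=-?\.?\d)", "@", model_text)


fixed_model = fix_parameters(svalues_model)
print(fixed_model)
```

The resulting text would then be pasted into the MODEL command of the input used with the new data set.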
I conducted an LCA based on data collected up to 2009 (Sample 1). In 2010, we collected a new sample (Sample 2). I want to investigate whether the class membership predicted for Sample 2 from the Sample 1 LCA model is consistent with the results of an LCA run on Sample 2 itself. If there were no missing data in Sample 2, I would use the formula for the posterior probability to predict class membership. But we do have missing data on some indicators. What you advised makes sense to me.
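For reference, the posterior probability formula mentioned here can be extended to incomplete cases by leaving missing indicators out of the product. The Python sketch below shows the idea under stated assumptions: binary indicators, conditional independence within class, and missingness that can be ignored (missing at random). The class prevalences and item probabilities are hypothetical numbers on the probability scale, not estimates from any real analysis.

```python
import math


def posterior_probs(response, class_probs, item_probs):
    """Posterior class probabilities for one person with binary indicators.

    response    : list of 0, 1, or None (None marks a missing indicator)
    class_probs : class prevalences, one per class
    item_probs  : item_probs[k][j] = P(indicator j = 1 | class k)
    """
    log_joint = []
    for pi_k, probs_k in zip(class_probs, item_probs):
        total = math.log(pi_k)
        for u, p in zip(response, probs_k):
            if u is None:
                continue  # a missing indicator contributes nothing
            total += math.log(p if u == 1 else 1.0 - p)
        log_joint.append(total)
    # Normalise on the log scale for numerical stability.
    m = max(log_joint)
    weights = [math.exp(v - m) for v in log_joint]
    s = sum(weights)
    return [w / s for w in weights]


# Hypothetical Sample 1 estimates: 2 classes, 3 binary indicators.
class_probs = [0.6, 0.4]
item_probs = [[0.9, 0.8, 0.7],
              [0.2, 0.3, 0.1]]

complete = posterior_probs([1, 1, 0], class_probs, item_probs)
partial = posterior_probs([1, None, 0], class_probs, item_probs)
print(complete, partial)
```

The second call drops the missing indicator from the product, so the person is still classified using the two indicators that were observed.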
Do you have any sample Mplus syntax, or a similar example, that I can follow for this prediction?
I think you want to compare the posterior probabilities obtained for Sample 2 in two ways: using the Sample 1 estimates, and using a new analysis of Sample 2. For the first, ask for SVALUES in the analysis of Sample 1. Then change each * to @ and use that input with the Sample 2 data. All parameters in the model should be fixed. Ask for CPROBABILITIES in the SAVEDATA command.
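Once both runs have saved a most likely class for each Sample 2 case, the two classifications can be compared case by case. The Python sketch below is one possible way to do that, not a procedure from this thread. It assumes the class numbering may differ between the two runs (label switching), so it reports the agreement under the best relabeling. The function name and the six example cases are hypothetical.

```python
from itertools import permutations


def best_agreement(classes_a, classes_b, n_classes):
    """Share of cases classified alike, maximised over relabelings of B."""
    best_rate, best_map = -1.0, None
    labels = range(1, n_classes + 1)
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))  # label in B -> label in A
        hits = sum(a == mapping[b] for a, b in zip(classes_a, classes_b))
        rate = hits / len(classes_a)
        if rate > best_rate:
            best_rate, best_map = rate, mapping
    return best_rate, best_map


# Hypothetical most likely classes for six Sample 2 cases.
fixed_run = [1, 1, 2, 2, 1, 2]  # parameters fixed at Sample 1 estimates
free_run = [2, 2, 1, 1, 2, 2]   # fresh LCA on Sample 2
rate, mapping = best_agreement(fixed_run, free_run, n_classes=2)
print(rate, mapping)
```

Trying every permutation is only practical for a small number of classes, which is the usual situation in LCA.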
Elina Dale posted on Tuesday, November 26, 2013 - 10:55 am
Dear Dr. Muthen,
I would like to assign compliance class membership and then use those estimates in my mediation model.
Basically, instead of using a 1-step approach (estimating compliance status & estimating effect of X on Y through M among compliers simultaneously), I'd like to use a 2-step approach.
First, estimate the compliance status for each observation. Second, estimate the indirect effect of X on Y through M based on the observed compliance class.
Could you please help me with the Mplus commands? I couldn't find them in the Mplus User's Guide. Thank you!
If in a first step you have created an observed binary variable of compliance status using Most Likely Class, you can then in a second step simply run a multiple-group analysis, using that binary variable as the GROUPING variable (see the UG Index), with a standard mediation model:
y ON M X; M ON X;
Note, however, that the entropy in the first step should be at least 0.8 for this to give trustworthy results. If it is less than 0.8, you will want to study the 3-step techniques in Web Note 15.
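The entropy value is printed in the mixture output, so no computation is needed. For readers who want to see what the number summarises, the Python sketch below computes it from posterior class probabilities, assuming the usual relative entropy definition, 1 - sum_i sum_k (-p_ik ln p_ik) / (n ln K). The two small probability tables are hypothetical.

```python
import math


def relative_entropy(cprobs):
    """Relative entropy: 1 - sum_i sum_k(-p_ik ln p_ik) / (n ln K)."""
    n = len(cprobs)      # number of cases
    k = len(cprobs[0])   # number of classes
    uncertainty = 0.0
    for row in cprobs:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n * math.log(k))


# Hypothetical posterior probabilities for two cases and two classes.
crisp = [[1.0, 0.0], [0.0, 1.0]]  # every case classified with certainty
fuzzy = [[0.5, 0.5], [0.5, 0.5]]  # classification no better than chance
print(relative_entropy(crisp), relative_entropy(fuzzy))
```

Values near 1 mean the classes are well separated; values near 0 mean the most likely class carries little information.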
Elina Dale posted on Tuesday, November 26, 2013 - 7:50 pm
Thank you, Dr. Muthen, for such a prompt response! That's the thing: I have never created observed binary variables based on compliance status, and I do not know how to create such a variable in Mplus.
I know that once you have them, you use SAVE=CPROBABILITIES; to save them. But how do I create them? I don't see an example in the Mplus User's Guide.
And once I create them, can I actually open the data set and see them (as in Stata, where you can browse your data set)?
SAVE = CPROB not only gives the posterior class probabilities for each person but also gives the most likely class membership (the class for which the person's posterior probability is highest). You can also save any variable you want into this data set by using the AUXILIARY option of the VARIABLE command.
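The most likely class is already in the saved file, so it does not have to be computed by hand. Purely to illustrate the definition, here is a minimal Python sketch with hypothetical posterior probabilities. How ties are broken here (the first class wins) is an assumption of the sketch, not a statement about Mplus.

```python
def most_likely_class(cprobs_row):
    """Return the 1-based class with the highest posterior probability."""
    best = max(range(len(cprobs_row)), key=lambda k: cprobs_row[k])
    return best + 1  # classes are numbered from 1


# Hypothetical posterior probabilities for three people, two classes.
rows = [[0.857, 0.143], [0.120, 0.880], [0.500, 0.500]]
assigned = [most_likely_class(r) for r in rows]
print(assigned)
```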
Elina Dale posted on Friday, November 29, 2013 - 4:49 pm
Dear Dr. Muthen,
I have tried using SAVE=CPROB but I am not sure how to do the next step. What is the name of the new variable that was created under CPROB?
I want to specify it under GROUPING IS but am not sure how to refer to it. When I asked Mplus in step 1 to estimate the most likely class membership and save it, which variable contains it?
I know the dat file doesn't contain variable names, it just shows values, so I am not sure what to do.
At the end of your output you see the heading SAVEDATA INFORMATION. That shows you which variables are where. It is the "c" variable that contains most likely class.
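To browse the saved file outside Mplus, the variable order listed under SAVEDATA INFORMATION can be attached to the columns by hand. The Python sketch below assumes a whitespace-delimited file and an asterisk as the missing-value flag; both should be checked against your own output. The column names, the helper name `read_saved_data` and the two data lines are hypothetical.

```python
import io


def read_saved_data(handle, names, missing_flag="*"):
    """Read whitespace-delimited rows into dicts keyed by variable name."""
    records = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(names):
            raise ValueError("row does not match the list of variable names")
        records.append({
            name: (None if value == missing_flag else float(value))
            for name, value in zip(names, fields)
        })
    return records


# Hypothetical variable order, copied by hand from SAVEDATA INFORMATION.
names = ["U1", "U2", "U3", "CPROB1", "CPROB2", "C"]

# Stand-in for open("saved_file.dat"); the file content is made up.
saved = io.StringIO(
    " 1.000 1.000 0.000 0.857 0.143 1.000\n"
    " 1.000 * 0.000 0.692 0.308 1.000\n"
)
records = read_saved_data(saved, names)
classes = [int(r["C"]) for r in records]
print(classes)
```

In the second step, the same variable order would be listed in the NAMES option so that the most likely class column can be named in GROUPING.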
Elina Dale posted on Friday, November 29, 2013 - 5:22 pm
Thank you! I see now it's after TECH outputs!
Elina Dale posted on Sunday, December 01, 2013 - 10:44 pm
Dear Dr. Muthen,
I followed a 3-step approach in modeling a mediation model with non-compliance and a latent mediator: 1. obtained factor scores for my latent variable factors (my M variable in the final model); 2. obtained compliance class probabilities (CPROB); 3. used compliance class as an observed grouping variable.
In step 3, however, I am getting "perfect" fit indices (RMSEA=0.000, CFI=1.00, TLI=1.00) and it is a bit disturbing to me.
Can one get perfect values for all 3 fit indices? Should I be worried? Wondering if there is something wrong with the model.
Thank you, Elina
Elina Dale posted on Sunday, December 01, 2013 - 10:53 pm
One more thing to add to the above. While my fit indices look perfect, I have a very large residual variance for my Y, so it doesn't seem like the model explains much of the variance in the outcome. I know fit indices are not directly comparable to residual variance or the amount of variance explained, but it still seems strange to have such perfect fit in this situation.