socrates posted on Saturday, November 11, 2006 - 8:30 am
Dear Dr. Muthén
With an unconditional GMM I identified five latent classes in a longitudinal dataset. These trajectories agree with theoretical expectations. Subsequently, I entered time-invariant covariates to check whether they predict growth parameter variance within these latent classes. While I found some significant predictors with this procedure, some of the resulting trajectories look quite different from those of the unconditional GMM. How should I interpret this?
The following paper discusses this issue. It can be downloaded from the website; see Recent Papers.
Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.
1. I am using unconditional GMM to generate classes, exporting them, and then examining covariates as well as outcomes, i.e. I am treating 'class' like any other categorical variable. Many posts talk about this 'leading to distorted results', but intuitively this is what makes sense to me: using the naturally occurring patterns (regardless of who populates them), then exploring who is in what class, and what the consequence of being in that class is. Is there a problem with this approach, i.e. estimating unconditionally and then dealing with covariates subsequently?
2. Unrelated to the above: when I use GMM (linear), I get the expected number of classes (4) and patterns, in line with a priori theory; if I fix the variance, it is still similar. If I fix the variance AND the intercept, I no longer get the patterns I anticipated, just more and more parallel lines. I know that there is a lot of within-person variation, and in my data I have 20 time points. So it works well with GMM but not with LCGA. Is this a problem? My feeling is that I have to allow within-class variation in order for the classes to emerge.
1.) If you export to another program, class membership is based on the "most likely" class, and fuzzy boundaries (that is, class membership based on posterior probabilities) are not taken into account in further analyses. In contrast, if you model the covariates in your LGMM (within Mplus), the covariate effects are adjusted for this class uncertainty. Another point is that the latter kind of model reduces the risk of misspecification (often you need direct effects of covariates on the indicators). However, if your classification quality is very good (entropy above .90 and similarly high average posterior probabilities), you can probably stick with saving most likely class membership, because class uncertainty is very low in this case. I also often check whether there are very few "borderline cases", i.e. individuals with nearly the same posterior probability of being assigned to each class.
2.) Interesting issue. However, I always run an LCGA first and then free the growth factor variances in a stepwise fashion (first intercept, then slope), and finally I try to let these variances differ across classes. If your growth factor variances are significant within classes, I would leave them in the model, because this is closer to reality. Additionally, in my experience, one tends to overestimate the number of classes when the growth factor variances are restricted.
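The classification-quality checks mentioned in point 1.) (entropy, average posterior probabilities, borderline cases) can be sketched outside Mplus from saved posterior probabilities. This is only an illustration: the function names, the example probabilities, and the 0.2 margin for flagging borderline cases are assumptions, not anything taken from this thread or from Mplus itself. The entropy function uses the relative entropy formula commonly reported for mixture models, 1 - sum(-p ln p) / (n ln K).

```python
import math

def relative_entropy(post):
    """Relative entropy in [0, 1]; values near 1 mean clear classification."""
    n, k = len(post), len(post[0])
    total = 0.0
    for row in post:
        for p in row:
            if p > 0:  # p * ln(p) is taken as 0 when p == 0
                total += -p * math.log(p)
    return 1.0 - total / (n * math.log(k))

def average_posterior_by_class(post):
    """Mean posterior probability of the assigned (most likely) class, per class."""
    k = len(post[0])
    sums, counts = [0.0] * k, [0] * k
    for row in post:
        m = max(range(k), key=lambda j: row[j])  # most likely class
        sums[m] += row[m]
        counts[m] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]

def borderline_cases(post, margin=0.2):
    """Indices of cases whose two highest posteriors differ by less than margin.

    The margin is a hypothetical cutoff chosen for illustration.
    """
    flagged = []
    for idx, row in enumerate(post):
        top, second = sorted(row, reverse=True)[:2]
        if top - second < margin:
            flagged.append(idx)
    return flagged

# Made-up posterior probabilities for four individuals and three classes.
example_post = [
    [0.98, 0.01, 0.01],
    [0.02, 0.97, 0.01],
    [0.45, 0.50, 0.05],  # nearly tied between classes 1 and 2
    [0.01, 0.01, 0.98],
]
print(relative_entropy(example_post))
print(average_posterior_by_class(example_post))
print(borderline_cases(example_post))
```

If the entropy and the per-class averages are high and few or no cases are flagged, working with most likely class membership loses little; otherwise the class uncertainty argues for keeping the covariates in the mixture model.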
For the second part, if GMM makes more sense both statistically and substantively because you have variation within classes, I would use GMM.
Tracie B posted on Wednesday, May 13, 2009 - 1:46 am
Thank you very much, this was a great help!
Youngoh Jo posted on Friday, December 16, 2011 - 2:51 am
Using unconditional models I found 3 groups. When I use conditional models, I put the following commands:

data: file is "E:\data\w1-w6.csv";
variable: NAMES ARE ID SEX sc1-sc6 pa1-pa5 mo1-mo5 ab1-ab5 ta1-ta5 dp1-dp5 ne2-ne5;
USEVARIABLES ARE SEX sc2-sc6;
MISSING ARE ALL (999);
classes = c (3);
ANALYSIS: TYPE = MIXTURE;
starts = 20 2;
model:
%overall%
i s | sc2@0 sc3@1 sc4@2 sc5@3 sc6@4;
i s on sex;
c on sex;
OUTPUT: tech1 tech8;

and I got the following error message:
*** ERROR in Model command Unknown variable(s) in an ON statement: C
Stine Hoj posted on Thursday, June 26, 2014 - 1:37 am
I have specified a 4-class GMM with individually varying times of observation and now wish to predict class membership probabilities from covariates. I have two questions about this:
(1) What is the most effective way to determine whether I need direct effects between the covariates and growth factors? For instance, in a cross-sectional LCA you can constrain direct effects between the covariate and each item to zero (Y1 on X@0, Y2 on X@0, etc) then use modification indices to see which paths should be freed. However, modification indices don't seem to appear for paths between the covariate and the growth factors (I on X@0).
I have considered using MODEL TEST to determine whether each path (I on Xk, S on Xk) differs significantly from zero but this seems like it may become a lengthy process with many predictors.
(2) To include time-varying covariates, I believe I just extend the LGCM approach and regress Yt on each Xt, while refraining from modelling these covariates as predictors of the latent class variable.
Is this the correct approach? If so, how is it applied in a 3-step procedure where covariates are introduced in the third step, at which point information pertaining to the specific values of Yt is dropped?
(1) One way is to compare BIC when including all such paths versus none.
I would have thought Modindices would show up.
(2) Right. Not sure that 3-step works here, because in step 1 you say that the class indicators are not influenced by the time-varying covariates and in the last step you say they are. That case seems to fall in the "direct effects" category of our 3-step paper, where 3-step is not a good choice.
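The BIC comparison suggested in (1) can be sketched as follows. BIC is computed as -2 log L + p ln(n), and the model with the lower value is preferred. The log-likelihoods, parameter counts, and sample size below are made-up numbers for illustration; in practice they would be read from the two outputs (all covariate-to-growth-factor paths fixed at zero versus all freed).

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p ln(n); lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical values, not taken from any real output.
n_obs = 500
bic_none = bic(loglik=-4210.0, n_params=20, n_obs=n_obs)  # paths fixed at zero
bic_all = bic(loglik=-4195.0, n_params=24, n_obs=n_obs)   # paths freed

preferred = "all paths" if bic_all < bic_none else "no paths"
print(bic_none, bic_all, preferred)
```

Because the penalty grows with ln(n) per added parameter, freeing the paths is favoured only when the gain in log-likelihood outweighs the extra parameters.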