I'm looking at both GMM and LCGA of depression scores at 4 time points, using cubic growth (neither linear nor quadratic fits well). For the GMM I am allowing only the intercept to vary. As expected, LCGA yields more classes for about the same fit (BIC): 7-8 vs. 3-4 for GMM. For LCGA some of the classes have roughly parallel, flat curves with different starting points. This has an appealing interpretation: e.g., "chronically depressed" (consistently high scores) or "chronically happy" (consistently low scores). In the GMM these are merged into one centrally located class, which I think makes sense since the intercepts are random instead of fixed. The entropy for the GMM is higher than for the LCGA (0.84 vs. 0.70). However, the interpretation seems less useful, because the depressed and the non-depressed folks are lumped into one class that can only be described as "stable" regardless of severity.
It seems that such a GMM will always tend to lump parallel trajectories, and different classes would be primarily distinguished by shape. Is this correct? If so, it seems that an LCGA model is more appropriate for us, since average depression level is as important as trajectory shape in our characterization. However, a 7/8-class LCGA model is a bit unwieldy. Is there an alternative way I could/should be thinking about this that allows integration of average depression status as well as shape? Thanks.
Yes, GMM primarily distinguishes classes by curve shape. The within-class variation, captured by a variance for a growth factor, picks up variations on such themes. But I wouldn't agree that "an LCGA is more appropriate for us, since average depression level is as important as trajectory shape" for two reasons. First, one should ask which model fits the data better; if GMM is significantly better, LCGA shouldn't be used. Second, the importance of the level can also be acknowledged in a GMM framework, for instance by predicting a distal outcome not only from the latent class but also from the growth factor itself.
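To make the contrast concrete, here is a sketch of the two models in the setup described in the question (cubic growth, only the intercept random). The symbols are notation assumed for illustration, not taken from the posts: a_t is the time score, c_i the latent class, alpha_jk the class-specific growth factor means, and psi_00 the intercept variance.

```latex
\begin{aligned}
y_{it} &= \eta_{0i} + \eta_{1i} a_t + \eta_{2i} a_t^2 + \eta_{3i} a_t^3 + \varepsilon_{it}, \\
\eta_{0i} \mid (c_i = k) &= \alpha_{0k} + \zeta_{0i}, \qquad \zeta_{0i} \sim N(0, \psi_{00}), \\
\eta_{ji} \mid (c_i = k) &= \alpha_{jk}, \qquad j = 1, 2, 3, \\
\text{LCGA:}\quad \psi_{00} &= 0 .
\end{aligned}
```

With psi_00 > 0, trajectories that are parallel but differ in level can be absorbed by zeta_0i within a single class. With psi_00 = 0, the same spread in level can only be represented by separate classes with different alpha_0k.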
...well, after a bit more thought, "GMM will always tend to lump parallel trajectories" is probably not accurate. At any rate, though, at least with our depression data there does seem to be this continuum of average depression status that results in several parallel-trajectory classes under LCGA, which get lumped into one class in GMM. I am very new to this but have gleaned that LCGA sometimes can't differentiate between a normal-mixture approximation to a non-normal distribution (which the depression scores are) and different classes. How do I reconcile this with the fact that the LCGA classes have greater interpretability in terms of average depression status, which gets lost in the GMM model? Thanks so much.
Thanks--very helpful, although I'm still struggling with this issue of average level. In some sense we are interested in predicting a patient's trajectory given where they start. I'm not sure of the best way to formulate this in the model though. My initial and perhaps naive thought was to run separate GMM models by subgroups defined by the baseline score (e.g. 0-4: no depression, 5-9: mild, 10-14: moderate, etc.). One potential problem is that the baseline score, thus stratified, is not normally distributed and can no longer be treated as censored normal, as I do with the other three time points. Perhaps I should use only the follow-up time points then in identifying the trajectories? Do you have other thoughts about how to approach this?
On another note, is there any problem with estimating cubic curves using only 4 time points? We often see rapid change followed by plateauing, which neither linear nor quadratic fits very well. I have been admitting variances for the intercept +/- slope but have been holding the quadratic and cubic terms fixed.
Apologies for so many questions and tremendous thanks for your suggestions. We have a contingent of folks signed up for the March workshop and are looking forward to it.
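On the cubic question, one point can be checked directly: a cubic has four coefficients, so with four time points it passes exactly through any four means. Within a class, the cubic therefore places no restriction on the mean curve. The sketch below illustrates this with exact arithmetic. The time scores 0 to 3 and the mean values are hypothetical, chosen only to mimic a rapid change followed by a plateau.

```python
from fractions import Fraction


def cubic_through_four_points(times, means):
    """Solve for (b0, b1, b2, b3) so that b0 + b1*t + b2*t^2 + b3*t^3 equals each mean."""
    # Augmented system: one row [1, t, t^2, t^3 | mean] per time point.
    rows = [[Fraction(t) ** p for p in range(4)] + [Fraction(m)]
            for t, m in zip(times, means)]
    n = len(rows)
    for col in range(n):
        # Move a row with a nonzero entry in this column into the pivot position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row to 1, then clear this column in every other row.
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# Hypothetical class means: rapid early drop, then a plateau.
times = [0, 1, 2, 3]
means = [14, 8, 7, 7]

coefs = cubic_through_four_points(times, means)
fitted = [sum(c * Fraction(t) ** p for p, c in enumerate(coefs)) for t in times]
residuals = [f - m for f, m in zip(fitted, means)]  # all exactly zero
```

Because the means are reproduced exactly, the remaining information is in the variances and covariances, which is consistent with keeping the quadratic and cubic terms fixed rather than random when only four time points are available.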
Usually, the initial score contributes to the class formation by the fact that the initial status mean [i] varies across classes. But one can also specify "c on i" - but that is advanced (i.e. something you do after having studied the topic a long time).
Estimated time scores are discussed in our Short Courses, "Day 2", covering growth modeling - you can get handouts from that. Here is an example:
Hello, I am running LCGA in randomized trials of smoking at 6 time points. I am wondering if it is correct to have a direct path from treatment to slope. I know that GMM can have the path, but I am not sure about LCGA. In my model, should the path from treatment be related to class (not slope) because LCGA does not have within-class variation?
I also have covariates. Can they also be mapped only to class (not intercept and slope)?
I am sorry for this beginner's question. I will appreciate your help. Thanks!
It's a good question. LCGA as defined by Nagin does not have within-class variation as you say, and covariates cannot influence growth within classes. But in Mplus you can let treatment and other covariates change the slope mean within classes, using zero residual variance. Note that you don't want the treatment to influence the class membership because the class membership is typically thought of as a pre-treatment variable.
But why restrict the slope residual variance to zero? I think you should use GMM, not LCGA, and, as you say, let the within-class slope be influenced by treatment.
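The two options discussed here can be written as one slope equation. The notation is assumed for illustration: x_i is the treatment indicator, s_i the slope growth factor, and c_i the latent class.

```latex
\begin{aligned}
s_i \mid (c_i = k) &= \alpha_{sk} + \gamma_{sk}\, x_i + \zeta_{si}, \\
\text{GMM:}\quad \operatorname{Var}(\zeta_{si}) &= \psi_{ss} > 0, \\
\text{LCGA-style:}\quad \operatorname{Var}(\zeta_{si}) &= 0 .
\end{aligned}
```

In both versions the treatment shifts the slope mean within class k through gamma_sk, and class membership itself is not regressed on x_i, in line with treating class as a pre-treatment variable.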
mari posted on Wednesday, April 27, 2011 - 7:24 am
Thank you very much for your response. It helps me a lot.
I'm looking for different classes of fatigue trajectories in a sample of osteoarthritis patients (n = 1000) using both GMM and LCGA (with linear and quadratic growth factors).
At this moment I'm in a bit of a conundrum. Not surprisingly, in the case of LCGA I've been able to find more distinct classes than with GMM (6-7 vs. 3-4). With LCGA the entropy scores exceed the 0.8 level, although these classes only differ in terms of initial level (i.e. intercept). The GMMs, on the other hand, have consistently shown much better fit (e.g. BIC, BLRT). The entropy scores of these GMMs, however, do not exceed the 0.6 level. Thus GMM doesn't seem to be able to classify individual trajectories properly, although the average latent class probabilities on the diagonal are mostly ~0.75.
As the LCGA models show worse fit and the trajectory plots show obvious within-class differences (at least in some classes) regarding individual trajectories, I'm inclined to go for GMM. But if I do, I lose the distinction between individual trajectories concerning initial level.
In deciding on the number of classes, I would not focus on entropy. I would look at model fit and the substantive meaning of the classes. See the Topic 6 course handout and video where we give a strategy for deciding on the number of classes using various measures. It is well known that LCGA extracts many similar classes because of not allowing for within-class variability.
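For readers who want to see what the two classification summaries in this exchange measure, here is a minimal sketch. It assumes the usual relative entropy formula, 1 - sum(-p ln p) / (n ln K), and computes the average posterior probabilities by most likely class. The function names and the posterior probabilities are hypothetical, not output from any model in this thread.

```python
import math


def relative_entropy(post):
    """Relative entropy in [0, 1]: 1 - sum(-p * ln p) / (n * ln K)."""
    n, k = len(post), len(post[0])
    # Terms with p == 0 contribute nothing, so they are skipped.
    total = -sum(p * math.log(p) for row in post for p in row if p > 0)
    return 1 - total / (n * math.log(k))


def average_class_probabilities(post):
    """Row a: individuals whose most likely class is a; column c: their mean probability for class c."""
    k = len(post[0])
    sums = [[0.0] * k for _ in range(k)]
    counts = [0] * k
    for row in post:
        assigned = max(range(k), key=lambda c: row[c])
        counts[assigned] += 1
        for c in range(k):
            sums[assigned][c] += row[c]
    return [[s / counts[a] if counts[a] else float("nan") for s in sums[a]]
            for a in range(k)]


# Hypothetical posterior probabilities: four individuals, two classes.
sharp = [[0.99, 0.01], [0.98, 0.02], [0.03, 0.97], [0.01, 0.99]]
fuzzy = [[0.75, 0.25], [0.70, 0.30], [0.30, 0.70], [0.25, 0.75]]

entropy_sharp = relative_entropy(sharp)
entropy_fuzzy = relative_entropy(fuzzy)
diag_fuzzy = [average_class_probabilities(fuzzy)[a][a] for a in range(2)]
```

Both summaries are functions of the posterior probabilities only. They describe how cleanly individuals are assigned to classes, not how well the model reproduces the data, which is why they are a separate question from BIC or BLRT.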
I have run a 2-class LCGA and interestingly, the BIC increases from the unconditional 1 class model. However, when I run a full GMM with 2 classes (though the entropy is only .68) I get a lower BIC from the 1 class model and the BLRT: p<.01. How might this happen? I have always read that LCGA is going to extract more classes than the GMM and that LCGA might be considered a first step to determining the number of classes. Is this a sign that there are no classes in the data even though the GMM might indicate there are?
You say that you "get a lower BIC from the 1-class model" when using GMM. If that means that your 2-class GMM has a higher BIC than a 1-class GMM, then that points to 1 class also for the GMM. We want to find the model with the smallest BIC.
Seems ok to me. A 2-class LCGA is more restrictive than a 2-class GMM, so the extra parameters for LCGA when going from 1 to 2 classes may not be worth it, but they are for GMM.
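The "worth it" trade-off can be sketched with the BIC formula, BIC = -2 logL + p ln(n). The parameter counts below rest on assumptions that are not stated in the posts: a linear growth model, one residual variance per time point held equal across classes, and, for GMM, a class-invariant intercept variance, slope variance and covariance. The function names are hypothetical.

```python
import math


def bic(loglik, n_params, n):
    """BIC = -2 * logL + p * ln(n); smaller is better."""
    return -2 * loglik + n_params * math.log(n)


def n_params_linear(k, t, random_effects):
    """Free parameters for a k-class linear growth model under the assumptions above."""
    means = 2 * k                            # intercept and slope mean in each class
    residuals = t                            # one residual variance per time point
    factor_cov = 3 if random_effects else 0  # intercept var, slope var, covariance (GMM only)
    class_probs = k - 1                      # class proportions sum to one
    return means + residuals + factor_cov + class_probs


def min_loglik_gain(extra_params, n):
    """Log-likelihood improvement needed before the extra parameters lower BIC."""
    return extra_params * math.log(n) / 2


# Parameters added when going from 1 to 2 classes, with 4 time points.
extra_lcga = n_params_linear(2, 4, False) - n_params_linear(1, 4, False)
extra_gmm = n_params_linear(2, 4, True) - n_params_linear(1, 4, True)
```

Under these assumptions the second class costs the same number of parameters in both models, so the BIC penalty is identical. What differs is how much the log-likelihood improves: the second class lowers BIC only if the gain exceeds `min_loglik_gain(extra, n)`, and that can happen for GMM without happening for LCGA.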
John Woo posted on Saturday, November 14, 2015 - 2:17 pm
Dear Dr. Muthen, my two-class GMM and LCGA models yield similar-looking trajectories, but the class membership shifts quite a bit between the two models. Because my sample is relatively small (n=117), even a slight shift in class membership makes a sizable difference in the results pertaining to distal outcomes. I want to go ahead with the LCGA because the result in terms of the distal outcomes fits the expectation better, but I am not sure how to make a good justification for using LCGA instead of GMM. The GMM model converges fine, with the growth factor variances statistically significant at the 0.1 level (but not at the 0.05 level). The entropy and average class probabilities are higher for LCGA than for GMM, but AIC, BIC, and SABIC are better for GMM (but not significantly better). Would a small sample size be a factor in deciding between GMM and LCGA? If the extent to which you relax model assumptions indicates how much closer the model is to reality, then GMM seems like a better choice, but the actual results align better with LCGA in terms of what I know from the literature. Any suggestion would be appreciated. Thank you.
Hard to say. BIC doesn't work too well for such a small sample size. And you don't have much power for getting a small SE for the growth factor variances at that n. I think you need to report both models to honestly portray the situation and say that only a new, larger sample can decide on this.
I have a similar situation. I have 6 time points. GMM models produce more interesting curve shapes, and the fit statistics (BIC, etc.) are better, but entropy and classification probabilities are lower (~.6). LCGA gives me better entropy and class probabilities (>.8), but the BIC is higher and of course I lose the interesting curve shapes. However, if I'm predicting distal outcomes, isn't it a problem to have low entropy and class probabilities? Also, wouldn't I need some theoretical justification for choosing an LCGA when there is significant variance within classes?
I think BIC and entropy to some extent play the roles of regular SEM's chi-square and R-square. You can have great chi-square fit and a poor R-square for key relations - or the reverse. They are really two different things and you wouldn't interpret the R-square unless the chi-square was good.
So with that analogy, I don't think entropy should take priority over BIC. Furthermore, I think the coefficients in the prediction of the distal outcome as a function of the latent class variable can be well estimated even without high entropy.