GMM vs. LCGA - interpretation of classes
 Philip Jones posted on Monday, December 11, 2006 - 8:52 am
I'm looking at both GMM and LCGA of depression scores at 4 time points, using cubic growth (neither linear nor quadratic fits well). For the GMM I am allowing only the intercept to vary. As expected, LCGA yields more classes for about the same fit (BIC): 7-8 vs. 3-4 for GMM. For LCGA some of the classes have roughly parallel, flat curves with different starting points. This has an appealing interpretation: e.g., "chronically depressed" (consistently high scores) or "chronically happy" (consistently low scores). In the GMM these are merged into one centrally located class, which I think makes sense since the intercepts are random instead of fixed. The entropy for the GMM is higher than for the LCGA (0.84 vs. 0.70). However, the interpretation seems less useful, because the depressed and the non-depressed folks are lumped into the one class that can only be described as "stable" regardless of severity.

It seems that such a GMM will always tend to lump parallel trajectories, and different classes would be primarily distinguished by shape. Is this correct? If so, it seems that an LCGA model is more appropriate for us, since average depression level is as important as trajectory shape in our characterization. However, a 7/8-class LCGA model is a bit unwieldy. Is there an alternative way I could/should be thinking about this that allows integration of average depression status as well as shape? Thanks.
 Bengt O. Muthen posted on Monday, December 11, 2006 - 9:04 am
Yes, GMM primarily distinguishes classes by curve shape. The within-class variation, captured by a variance for a growth factor, picks up variations on such themes. But I wouldn't agree that "an LCGA is more appropriate for us, since average depression level is as important as trajectory shape" for two reasons. First, one should ask which model fits the data better; if GMM is significantly better, LCGA shouldn't be used. Second, the importance of the level can be acknowledged also in a GMM framework - for instance predicting a distal outcome from not only the latent class but also the growth factor itself.
 Philip Jones posted on Monday, December 11, 2006 - 9:10 am
...well, after a bit more thought, "GMM will always tend to lump parallel trajectories" is probably not accurate. At any rate, at least with our depression data there does seem to be a continuum of average depression status that results in several parallel-trajectory classes under LCGA, which get lumped into one class in GMM. I am very new to this but have gleaned that LCGA sometimes can't differentiate between a normal-mixture approximation to a non-normal distribution (which the depression scores have) and genuinely different classes. How do I reconcile this with the fact that the LCGA classes have greater interpretability in terms of average depression status, which gets lost in the GMM model? Thanks so much.
 Bengt O. Muthen posted on Monday, December 11, 2006 - 9:43 am
You may want to take a look at the recent GMM paper on the Mplus web site (see Papers)

Kreuter, F. & Muthen, B. (2006). Analyzing criminal trajectory profiles: Bridging multilevel and group-based approaches using growth mixture modeling.
 Philip Jones posted on Tuesday, December 12, 2006 - 6:36 am
Thanks--very helpful, although I'm still struggling with this issue of average level. In some sense we are interested in predicting a patient's trajectory given where they start. I'm not sure of the best way to formulate this in the model though. My initial and perhaps naive thought was to run separate GMM models by subgroups defined by the baseline score (e.g. 0-4: no depression, 5-9: mild, 10-14: moderate, etc.). One potential problem is that the baseline score, thus stratified, is not normally distributed and can no longer be treated as censored normal, as I do with the other three time points. Perhaps I should then use only the follow-up time points in identifying the trajectories? Do you have other thoughts about how to approach this?

On another note, is there any problem with estimating cubic curves using only 4 time points? We often see rapid change followed by plateauing, which neither linear nor quadratic curves fit very well. I have been admitting variances for the intercept (with or without the slope) but have been holding the quadratic and cubic terms fixed.

Apologies for so many questions and tremendous thanks for your suggestions. We have a contingent of folks signed up for the March workshop and are looking forward to it.
 Bengt O. Muthen posted on Tuesday, December 12, 2006 - 11:49 am
Regarding your 1st paragraph on the importance of the starting point, for a growth model like

i s | y1@0 y2@1 ....

have you considered doing

s on i; ?

By setting y1@0 you define i as the initial status which is in line with your interests.

Regarding the 2nd paragraph, I wonder if a model with only i and s (as above) can use estimated time scores that reflect leveling off from the line.
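
For illustration only, the first suggestion might be written out as the MODEL command below. This is a sketch under stated assumptions, not something given in the thread: it assumes four outcomes named y1-y4 and fixed linear time scores of 0, 1, 2, 3 (the post above elides the scores after y2@1), and it is written for a single-class growth model.

```mplus
MODEL:
  ! y1@0 places the time origin at the first occasion,
  ! so i is interpreted as initial status
  i s | y1@0 y2@1 y3@2 y4@3;   ! y3@2 and y4@3 are assumed scores
  ! regress the slope growth factor on initial status
  s ON i;
```

The coefficient for s ON i then describes how the rate of change depends on where a person starts.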
 Philip Jones posted on Tuesday, December 12, 2006 - 12:36 pm
I have so much to learn. Thank you!

Thoughts on first suggestion: That's conditional on class membership, right? What if we wanted to predict which class a patient was likely to be in based on their initial score?

Thoughts on second suggestion: none, meaning I never thought of that. Can you give or point me to an example of how to do that?
 Bengt O. Muthen posted on Tuesday, December 12, 2006 - 12:47 pm
Usually, the initial score contributes to the class formation by the fact that the initial status mean [i] varies across classes. But one can also specify "c on i" - but that is advanced (i.e. something you do after having studied the topic a long time).

Estimated time scores are discussed in our Short Courses, "Day 2", covering growth modeling - you can get handouts from that. Here is an example:

i s | y1@0 y2@1 y3*2 y4*3;
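
As a hedged reading of this example: in Mplus syntax `@` fixes a parameter at the given value and `*` frees it with the given starting value, so the first two time scores are fixed at 0 and 1 and the last two are estimated, starting from 2 and 3. A minimal sketch of a surrounding input file follows. The data file name is hypothetical, and the outcomes are treated as plain continuous variables for simplicity (the censored-normal treatment mentioned earlier in the thread is left out).

```mplus
TITLE:    Growth model with estimated time scores (sketch)
DATA:     FILE = depression.dat;     ! hypothetical file name
VARIABLE: NAMES = y1-y4;             ! assumed variable names
MODEL:
  ! time scores for y3 and y4 are free parameters
  i s | y1@0 y2@1 y3*2 y4*3;
```

If the estimated scores for y3 and y4 come out well below 2 and 3, that would suggest change is leveling off relative to a straight line.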
 mari posted on Monday, April 25, 2011 - 8:42 pm
Hello, I am running LCGA in randomized trials of smoking at 6 time points. I am wondering if it is correct to have a direct path from treatment to slope. I know that GMM can have the path, but I am not sure about LCGA. In my model, should the path from treatment be related to class (not slope) because LCGA does not have within-class variation?

I also have covariates. Can they also be mapped only to class (not intercept and slope)?

I am sorry for this beginner's question. I will appreciate your help. Thanks!
 Bengt O. Muthen posted on Tuesday, April 26, 2011 - 6:22 pm
It's a good question. LCGA as defined by Nagin does not have within-class variation as you say, and covariates cannot influence growth within classes. But in Mplus you can let treatment and other covariates change the slope mean within classes, using zero residual variance. Note that you don't want the treatment to influence the class membership because the class membership is typically thought of as a pre-treatment variable.

But why restrict the slope residual variance to zero - I think you should use GMM, not LCGA and as you say let the within-class slope be influenced by treatment.
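
A rough sketch of what this could look like in Mplus input is given below. It is only an illustration under assumptions not stated in the thread: the outcome names y1-y6, the treatment dummy name tx, continuous outcomes, linear time scores, and two classes are all made up for the example.

```mplus
VARIABLE:
  NAMES = y1-y6 tx;          ! tx = treatment dummy (assumed name)
  CLASSES = c(2);            ! number of classes is arbitrary here
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  i s | y1@0 y2@1 y3@2 y4@3 y5@4 y6@5;
  s ON tx;                   ! treatment shifts the slope mean
  ! there is no "c ON tx" statement, so treatment
  ! is not used to predict class membership
  ! LCGA variant: uncomment the next line to fix the
  ! intercept variance and slope residual variance at zero
  ! i@0 s@0;
  %c#1%
  s ON tx;                   ! class-specific mention lets the
  %c#2%
  s ON tx;                   ! treatment effect differ by class
```

With the `i@0 s@0;` line left commented out, this is the GMM version in which the within-class slope is influenced by treatment.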
 mari posted on Wednesday, April 27, 2011 - 7:24 am
Thank you very much for your response. It helps me a lot.
 J. Botterman posted on Wednesday, June 05, 2013 - 4:27 am

I'm looking for different classes of fatigue trajectories in a sample of osteoarthritis patients (n = 1000) using both GMM and LCGA (with linear and quadratic growth factors).

At this moment I'm in a bit of a conundrum. Not surprisingly, in case of LCGA I've been able to find more distinct classes than with GMM (6-7 vs. 3-4). With LCGA the entropy scores exceed the 0.8 level, although these classes differ only in terms of initial level (i.e. intercept). The GMMs on the other hand have consistently shown much better fit (e.g. BIC, BLRT). The entropy scores of these GMMs, however, do not exceed the 0.6 level. Thus GMM doesn't seem to be able to classify individual trajectories properly, although the average latent class probabilities on the diagonal are mostly ~0.75.

As the LCGA models show worse fit and the trajectory plots show obvious within-class differences (at least in some classes) regarding individual trajectories, I'm inclined to go for GMM. But if I do, I lose the distinction between individual trajectories concerning initial level.

What is an advisable thing to do? Thanks.
 J. Botterman posted on Wednesday, June 05, 2013 - 8:03 am
Forgot to add that the outcome variable was measured at 7 (equally spaced) time points.
 Linda K. Muthen posted on Wednesday, June 05, 2013 - 12:26 pm
In deciding on the number of classes, I would not focus on entropy. I would look at model fit and the substantive meaning of the classes. See the Topic 6 course handout and video where we give a strategy for deciding on the number of classes using various measures. It is well known that LCGA extracts many similar classes because of not allowing for within-class variability.
 Ketan posted on Saturday, July 19, 2014 - 5:30 pm
I have run a 2-class LCGA and interestingly, the BIC increases from the unconditional 1 class model. However, when I run a full GMM with 2 classes (though the entropy is only .68) I get a lower BIC from the 1 class model and the BLRT: p<.01. How might this happen? I have always read that LCGA is going to extract more classes than the GMM and that LCGA might be considered a first step in determining the number of classes. Is this a sign that there are no classes in the data even though the GMM might indicate there are?
 Bengt O. Muthen posted on Sunday, July 20, 2014 - 6:30 am
You say that you "get a lower BIC from the 1-class model" when using GMM. If that means that your 2-class GMM has a higher BIC than a 1-class GMM, then that points to 1 class also for the GMM. We want to find the model with the smallest BIC.
 Ketan posted on Sunday, July 20, 2014 - 9:55 am
I apologize. I meant to say that I get a lower BIC for the 2-class GMM model than the 1 class GMM. Yet this is not true for the LCGA. I get a higher BIC for the 2-class LCGA than the 1-class model.
 Bengt O. Muthen posted on Monday, July 28, 2014 - 4:23 pm
Seems ok to me. A 2-class LCGA is more restrictive than a 2-class GMM, so the extra parameters for LCGA when going from 1 to 2 classes may not be worth it, but they are for GMM.
 John Woo posted on Saturday, November 14, 2015 - 2:17 pm
Dear Dr. Muthen, my two-class GMM and LCGA models yield similar-looking trajectories, but the class membership shifts quite a bit between the two models. Because my sample is relatively small (n=117), even a slight shift in class membership makes a sizable difference in the results pertaining to distal outcomes. I want to go ahead with the LCGA because the result in terms of the distal outcomes fits the expectation better, but I am not sure how to make a good justification for using LCGA instead of GMM. The GMM model converges fine, with the variances of the growth factors statistically significant at the 0.1 level (but not at the 0.05 level). The entropy and average class probabilities are higher for LCGA than for GMM, but AIC, BIC, and SABIC are better for GMM (though not significantly better). Would a small sample size be a factor in deciding between GMM and LCGA? If the extent to which you relax model assumptions indicates how much closer the model is to reality, then GMM seems like a better choice, but the actual results align better with LCGA in terms of what I know from the literature. Any suggestion would be appreciated. Thank you.
 Bengt O. Muthen posted on Saturday, November 14, 2015 - 2:57 pm
Hard to say. BIC doesn't work too well for such a small sample size. And you don't have much power for getting a small SE for the growth factor variances at that n. I think you need to report both models to honestly portray the situation and say that only a new, larger sample can decide on this.
 Katharine Buek posted on Tuesday, January 23, 2018 - 8:33 am
I have a similar situation. I have 6 time points. GMM models produce more interesting curve shapes, and the fit statistics (BIC, etc.) are better, but entropy and classification probabilities are lower (~.6). LCGA gives me better entropy and class probs (>.8) but the BIC is higher and of course I lose the interesting curve shapes. However, if I'm predicting distal outcomes, isn't it a problem to have low entropy and class probs? Also, wouldn't I need some theoretical justification for choosing an LCGA when there is significant variance within classes??
 Bengt O. Muthen posted on Tuesday, January 23, 2018 - 5:45 pm
I think BIC and entropy to some extent play the roles of regular SEM's chi-square and R-square. You can have great chi-square fit and a poor R-square for key relations - or the reverse. They are really two different things and you wouldn't interpret the R-square unless the chi-square was good.

So with that analogy, I don't think entropy should take priority over BIC. Furthermore, I think the coefficients in the prediction of the distal outcome as a function of the latent class variable can be well estimated even without high entropy.
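
For reference, the two quantities being contrasted here are usually defined as follows. These are the standard textbook definitions, not something stated in the thread; the symbols are introduced only for this note: $L$ is the maximized likelihood, $p$ the number of free parameters, $n$ the sample size, $K$ the number of classes, and $\hat{p}_{ik}$ the estimated posterior probability that individual $i$ belongs to class $k$.

```latex
\mathrm{BIC} = -2 \ln L + p \ln n
\qquad
E_K = 1 - \frac{\sum_{i=1}^{n} \sum_{k=1}^{K} \left( -\hat{p}_{ik} \ln \hat{p}_{ik} \right)}{n \ln K}
```

BIC trades off fit against the number of parameters (smaller is better), while the relative entropy $E_K$ lies between 0 and 1 and only summarizes how sharply individuals are assigned to classes, which is why the two can point in different directions.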
 Nour Azhari posted on Wednesday, April 11, 2018 - 3:50 pm
Good afternoon professor,

I ran an LCGA with a linear model and found a 3-class model. However, my estimates of the intercept and slope for each class are as follows:

Class 1: i:8.822 s: -0.672
Class 2: i: 1.724 s: 0.194
Class 3: i: 1.708 s: -0.136

Classes 2 and 3 have very close intercepts. Is that an issue, meaning the model is giving me erroneous classes, or is it okay to have classes with one similar growth parameter?

Also, how do we know if the difference is significant?

 Bengt O. Muthen posted on Wednesday, April 11, 2018 - 4:38 pm
Q1: Yes, it's okay. Your slopes are very different for class 2 and class 3.

Q2: Use Model Test where you express differences between these means using parameter labels in the Model command. See the User's Guide for how to use Model Test.
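
A minimal sketch of how such a test might be set up for the class 2 and class 3 intercept means is shown below. The variable names y1-y4, the four time points with linear time scores, and the label names i2, s2, i3, s3 are assumptions made for illustration, not details taken from the post.

```mplus
VARIABLE:
  NAMES = y1-y4;             ! assumed variable names
  CLASSES = c(3);
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  i s | y1@0 y2@1 y3@2 y4@3;
  i@0 s@0;                   ! LCGA: no within-class variance
  %c#2%
  [i] (i2);                  ! label the class 2 intercept mean
  [s] (s2);                  ! label the class 2 slope mean
  %c#3%
  [i] (i3);                  ! label the class 3 intercept mean
  [s] (s3);                  ! label the class 3 slope mean
MODEL TEST:
  0 = i2 - i3;               ! Wald test of equal intercept means
```

Replacing the restriction with `0 = s2 - s3;` would test the slope means instead, and listing both restrictions gives a joint Wald test of the two.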
 Nour Azhari posted on Sunday, April 22, 2018 - 7:16 am
Thank you. I have a follow-up question. I am modeling daily alcohol consumption during a treatment trial for AUD. However, my sample size is very small (N=40), so as you can imagine I have a lot of convergence problems. Therefore I am using a linear model in the LCGA, because adding a quadratic term just doesn't work.

So the classes above are the 3 classes I found. However, I am having a hard time interpreting the classes because of the linear shape. Class 2, for example, has worsening of symptoms and the other 2 benefit, although one benefits more than the other (steeper slope). However, at what time point should I make my interpretation regarding the "outcome of the trial" (moderate drinking, abstinence, binge drinking, etc.)? Theoretically, those benefiting will always reach "abstinence", because a negative-slope line will always reach 0, and those worsening will linearly get worse over time to the point that the value of alcohol consumption isn't even meaningful.

Should I interpret the means of alcohol consumption at the last time point? (In my case it represents the end of treatment.)

Let me know your thoughts on how to deal with this.

Thank you

 Bengt O. Muthen posted on Sunday, April 22, 2018 - 10:16 am
That is a very small sample.

I would think the whole process as affected by the treatment is of interest, e.g. the speed of improvement. You may want to ask on SEMNET.