I am trying to model the growth trajectories for subgroups of (approx 800) Vietnam veterans who completed a treatment program. The vets were assessed on a measure of trauma at 4 time points: intake, 6 months, 12 months and 24 months. Previous analyses (latent profile analysis) of the data suggest that there may be subgroups within the sample. But the reviewers were unhappy that I didn't use growth modeling techniques. Is there a procedure that enables me to (1) model the data in such a way as to get Mplus to produce different numbers of trajectories, (2) calculate the fits of the different solutions (i.e., different numbers of trajectories), and (3) determine the intercept and slope for each trajectory? I want to be able to nominate the number of trajectories that most parsimoniously describes the data.
Yes, this is what growth mixture modeling accomplishes. For references, please see the Mplus home page at
especially papers 82, 85, 86.
Koen Luyckx posted on Friday, February 18, 2005 - 7:58 am
I'm trying to identify different trajectories of perceived parenting using five measurement waves each six months apart. When I use LCGA (thus allowing no within-class variability on the growth parameters), the solution with three classes is the best one in terms of BIC and entropy values. Moreover, these three classes have meaningful differential relationships with external variables. On the other hand, if I just estimate one class and allow within-class variability, the BIC is slightly better than the three-class solution with the Nagin approach (as in Muthen, 2001). If I use the Nagin approach as starting values and do "classical" mixture modeling, the BIC and entropy worsen significantly and the model does not converge well. Would it be useful to continue working with the three-class solution according to the Nagin approach because these classes are theoretically expected and have differential relations with external variables, despite the fact that they appear to originate from one population in which there is significant variation (as evidenced by a better BIC for the 1-class solution with variability)?
bmuthen posted on Friday, February 18, 2005 - 9:06 am
Interpretability is a very important aspect. But perhaps you can consider the following. Do you use Version 3 so that you can get a sufficient search over many starting values to avoid local maxima (using the starts option)? You say that the regular growth model (1-class with variation) has a better BIC than the 3-class LCGA (a la Nagin) - what about BICs for a 2- and a 3-class model with within-class variation - is that what you refer to when you say "classical mixture modeling"? Have you tried that with Mplus-generated starting values using starts = 50 5, say? Also, you may want to consider ABIC (sample-size adjusted BIC) as well as LMR (Tech11) to guide you. If you have continuous outcomes, allowing for within-class variation typically gives better model fit, but not necessarily so with say dichotomous outcomes.
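For anyone comparing solutions by hand, BIC and the sample-size adjusted BIC mentioned above can be computed from a model's loglikelihood, its number of free parameters, and the sample size. This is only a minimal sketch, assuming the usual definitions (BIC = -2LL + p ln n, and the adjusted version replacing n with (n + 2)/24). The loglikelihoods, parameter counts, and sample size below are made-up illustration values, not results from any model in this thread.

```python
import math

def bic(loglik, n_params, n):
    """Bayesian information criterion: -2*LL + p*ln(n). Lower is better."""
    return -2.0 * loglik + n_params * math.log(n)

def abic(loglik, n_params, n):
    """Sample-size adjusted BIC: same penalty form, with n replaced by (n + 2) / 24."""
    return -2.0 * loglik + n_params * math.log((n + 2.0) / 24.0)

# Hypothetical values for illustration only: (loglikelihood, free parameters)
candidates = {
    "1-class growth model with variation": (-5230.0, 10),
    "3-class LCGA": (-5215.0, 13),
}
n = 400  # hypothetical sample size

for name, (ll, p) in candidates.items():
    print(f"{name}: BIC = {bic(ll, p, n):.1f}, ABIC = {abic(ll, p, n):.1f}")
```

Because the adjusted BIC penalizes each parameter less, the two criteria can rank models differently, which is one reason to look at both together with LMR.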
Anonymous posted on Monday, February 21, 2005 - 1:38 am
Many thanks for the quick reply and valuable suggestions. I have an additional question. The variable I am using is significantly positively skewed. Does this make the variable less suitable to use in GMM?
bmuthen posted on Saturday, February 26, 2005 - 5:48 pm
You can handle that by using the MLR estimator, which gives non-normality-robust SEs, unless you have a strong floor or ceiling effect, in which case you might want to consider two-part modeling.
Anonymous posted on Friday, April 01, 2005 - 1:59 pm
Hi, can you tell me what model/analysis command could be used for using latent class membership and covariates to predict binary outcomes, as in Muthen and Shedden's Biometrics paper? Thanks a lot
BMuthen posted on Saturday, April 02, 2005 - 8:54 pm
You would use the CATEGORICAL option of the VARIABLE command to specify that the outcomes are binary. The ON option is used to specify the regression of the binary outcome on covariates. The influence of latent class membership on the binary outcome is captured by the class-varying thresholds for the binary outcomes (this is the default).
Anonymous posted on Sunday, April 10, 2005 - 2:10 pm
I am running a 2-class mixture model and I am trying to see if class membership predicts a distal outcome (a continuous variable). When I add the command:
outcome on c#1;
I get an error message. What do I need to do to get this?
This is not how you include a distal outcome in the model. See distal outcome in the index of the Mplus User's Guide for such an example. I do not have a user's guide handy so I cannot send you to the exact example. The idea is that the thresholds or means of the distal outcome vary across classes. This captures the regression of the outcome on class.
Anonymous posted on Monday, April 11, 2005 - 2:21 pm
hello, I am running a mixture model for longitudinal data. here is the command file:
data: file is "D:\My DocumentsFiles\completedata.txt";
variable: names are rowname x1-x7 y1-y12;
  usevariables are y7-y12 x2 x7;
  classes=c(3);
analysis: type=mixture;
  starts=100,2;
model: %overall%
  i s q | y7@0 y8@1 y9@2 y10@3 y11@4 y12@5;
  i s q on x2 x7;
  c#1 on x2 x7;
  c#2 on x2 x7;
output: tech1;
I kept getting error messages like the following: WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IN CLASS 1 IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE S.
WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IN CLASS 2 IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE S.
WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IN CLASS 3 IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE S.
do you have any comments on what might have gone wrong? Thanks a lot.
Anonymous posted on Monday, April 11, 2005 - 6:21 pm
Hello Linda/Bengt, I have a question about introducing a distal outcome into the mixture analysis for multiple trajectories. I want to know what kinds of distal outcome Mplus can handle so far besides binary. What about a continuous distal outcome?
BMuthen posted on Wednesday, April 13, 2005 - 11:52 pm
Re: anonymous at 2:21, this means that you have a zero or negative within-class residual variance for s or that s correlates one or greater with another growth factor. See TECH4 output to help determine the problem. This does not mean that anything went wrong but simply that when you have several classes, the within-class variation can be very small because the differences in class means take care of most of the variation. The solution would be to fix the variances to zero.
Re: anonymous at 6:21, distal outcomes can be categorical, continuous, censored, or counts.
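The two conditions named in the reply to the 2:21 post (a variance at or below zero, or a correlation of one or greater in absolute value) can be checked directly on a within-class growth factor covariance matrix copied from the output. This is a rough sketch only: the function name and the example matrix are hypothetical, and the pairwise check covers just those two conditions, not every way a matrix can fail to be positive definite.

```python
import math

def check_growth_factor_cov(names, cov):
    """List problems in a covariance matrix: non-positive variances and |correlation| >= 1."""
    problems = []
    k = len(names)
    for a in range(k):
        if cov[a][a] <= 0:
            problems.append(f"{names[a]}: variance {cov[a][a]} is zero or negative")
    for a in range(k):
        for b in range(a + 1, k):
            # A correlation is only defined when both variances are positive
            if cov[a][a] > 0 and cov[b][b] > 0:
                r = cov[a][b] / math.sqrt(cov[a][a] * cov[b][b])
                if abs(r) >= 1:
                    problems.append(f"{names[a]} with {names[b]}: correlation {r:.3f} is out of bounds")
    return problems

# Hypothetical within-class estimates for i, s, q (illustration values only)
psi = [
    [1.20, 0.10, 0.01],
    [0.10, -0.02, 0.00],
    [0.01, 0.00, 0.05],
]
for line in check_growth_factor_cov(["i", "s", "q"], psi):
    print(line)
```

If a check like this flags only the variance of s, that matches the situation described above, where fixing that variance to zero is the suggested remedy.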
Anonymous posted on Friday, June 17, 2005 - 9:15 am
Hello Linda/Bengt, I am trying to do a simple growth curve analysis with mixtures. I used the commands: classes = c(3); analysis: type = mixture; and model: i s q | y1-y5; Currently I am not including the covariates in the model. In the output, I did get final counts in each class and estimates of the mean and variance. I wonder if there is a way to retrieve the posterior probability for each individual, i.e. can it display or calculate the probability that each subject is in each class? Thanks a lot.
I am trying to model growth trajectories for 7 groups (cohorts). I can get the estimates for one group at a time, using USEOBS. When I try to model them altogether, however, my model does not converge. Must I specify group-specific models to reach convergence?
You should send your output, data, and license number to email@example.com so we have more information. This sounds like a data-model dependent problem.
Anonymous posted on Friday, August 26, 2005 - 12:45 pm
Hi, I was reading Muthen and Shedden's 1999 Biometrics paper. The part on identifiability is rather short in the paper. I wonder if anything other than specifying lambda_y is needed for the model to be identifiable, and especially whether it has anything to do with the type and levels of the covariates. Is there any reference paper that can help?
bmuthen posted on Friday, August 26, 2005 - 3:54 pm
There is no directly/practically useful writing on identification with mixtures. The way I have found it useful to think about it is to ask if the model in each class would be identified if you knew the class membership, that is analyzed data on people in that class in a regular, non-mixture analysis. Therefore, identification rules from confirmatory factor analysis and regular growth modeling (repeated measures analysis) help. Covariates help identification in those models and therefore in mixture modeling.
Hello Bengt and Linda, I'm running a GGMM where latent trajectory classes predict a distal outcome. My question is: is it possible to include more than one distal outcome, whereby one distal continuous outcome interacts with latent class membership to influence an even more distal continuous outcome? If this is possible, how does one specify this in the model command? Thanks.
Thank you Bengt and Linda, for your helpful advice on running a GGMM model with an interaction between latent class membership and a distal categorical outcome (u1), as it influences a second distal continuous outcome (u2).
However, I'm having trouble getting the model to run when I include the u2 ON u1 relationship in each class. My model statements (with error message) are:
I am trying to include a continuous distal outcome in a growth mixture model. Could you let me know which Mplus commands to use to do that? I found an example for a categorical distal outcome but can't find one for a continuous one. Thanks.
You do it the same way as for a categorical distal outcome. The only difference is that you don't put the variable on the CATEGORICAL list and means change across classes rather than thresholds.
Nigel posted on Thursday, November 15, 2007 - 2:58 pm
Hello, a simulation study presented at the 2006 CILVR modeling conference showed that including a covariate in a growth mixture model reduced power to detect the appropriate number of classes. You seem to say that the issue is not well understood rather than saying include the covariate or exclude the covariate. I'm wondering what your latest thinking on this issue is. Thanks.
Have you looked at my 2004 chapter in the Kaplan handbook on our web site?
My current thinking is that one should first do the analysis without covariates as a practical matter, but then not be surprised if the class enumeration comes out differently due to adding covariates - per the example given in this chapter.
Nigel posted on Friday, November 16, 2007 - 9:18 am
I am new to MPLUS, and have to test something similar to example 8.8. (i.e. black versus white) and longitudinally measured health behaviors, and a distal outcome of health. I am hoping to see if different latent classes not only emerge from longitudinally observed data, but also whether race changes these latent class trajectories, and eventually, whether these trajectories predict health 10 years later. So, it's a mix of 8.6 and 8.8 really.
What is confusing me is that I don't see anything differentiating 8.6 from 8.1; nowhere in 8.6 is u in the MODEL. My outcome is continuous, so I really don't see where I would put that. Would I simply add a line at the end saying u ON c ?
All of the variables on the NAMES or USEVARIABLES list, if there is one, are included in the analysis. The difference between 8.1 and 8.6 is that u is on the NAMES list in 8.6. The effect you are looking at is found in the varying of the thresholds of u across classes. It is not explicitly specified.
Weiwei Liu posted on Monday, March 28, 2011 - 7:41 am
In a growth mixture model, when estimating a class-specific intervention effect on a nominal distal outcome, the intercepts of the distal outcome categories are constrained to be the same across classes by default. This puzzles me especially because when estimating a class-specific intervention effect on a binary distal outcome, the threshold of that outcome is set to be different across classes by default. Could you explain the rationale behind these two different defaults?
Hi, I’m doing a multiple group LGM analysis on 3 time points. I have a three-group variable (n1=no treatment, n2=low treatment, n3=high treatment). What is the minimum number of observations needed in each group? Are there some references about it? Just to let you know: n1=100, n2=70 and n3=24.
I am running a GMM with known class based on example 8.8. I have 3 known classes and 2 classes for y1-y4.
Now, I wonder how to set up model constraints for testing whether the intercept, slope, and also co-vars are invariant across the 6 classes.
I have looked at chapter 14, page 428, to see how to set up constraints for the intercept and slope first. I got error messages saying that the program cannot recognize model g1: or model class1: or model cg#1.c#1:
And I am not sure whether the following syntax would work for comparing each pair of classes? i s | y1@0 y2@1 y3@2 y4@3; [y1-y4@]; [INT@0 SLP@0];
(1) In the output, I and S for my covariates are the same for every class (x1 S for c1 is the same as x1 S for c2). This happens for all covariates, also in 3 and 4 class solutions. I want the GMM to take the information and differences of covariates into account when building the classes.
(2) DEFINE In Chapter 18 of the MANUAL it says variables should be redefined if they have larger numbers than 10. How important is this really? Most psychological questionnaires' sum scores range up to 30 or 40, and I have one ranging up to 80. If I have to rescore, what would you recommend?
EFried posted on Tuesday, January 10, 2012 - 9:12 pm
(3) 999.00 I have lots of 999 (not able to compute) in my output. Under model results, I and S of my 5 measurement points of the dependent variable are all 999, and so are the Intercepts. Is that normal?
(4) Covariates Can I run the GMM with all covariates, kick the ones that don't contribute to classes or affect intercepts/slopes, and rerun GMMs until I only have "meaningful" covariates in the model?
(5) Random I and S for individual classes Adding %c#1% i s; %c#2% i s; I get the error "Unknown class label: %C#1% I S;"
Please limit your question to one screen. We have a limit to keep questions manageable.
1. The default is to hold these coefficients equal across classes. You can relax the equality by mentioning the regressions in the class-specific parts of the MODEL command.
2. We recommend keeping variances of the variables between one and ten. It can sometimes cause convergence problems when variances are large.
3. I would have to see your output and license number at firstname.lastname@example.org to answer this.
4. I would keep all of the covariates in the model. I would not overtrim the model.
5. Perhaps you need to put i s; on the next line after the class label.
AJB posted on Monday, February 06, 2012 - 10:40 am
Dear Drs. Muthen:
I am extending Singer & Willett's (2003) longitudinal data analysis into a mixture, i.e. Table 5.7 Model C (p. 163), a growth model, with data in long format with time-invariant and time-varying covariates on intercepts and slopes (students in schools). Questions:
1) I would prefer to use the wide format. However, Example 6.10 in Mplus 6.1 appears to require that time-varying covariates be on the y's. From the S&W perspective, I'd like to have the time-varying covariates on initial status of each trajectory and then interact each with time to put them on the "change over time" slopes.
2) In long format students are able to "jump" between classes over time. Is there a way in long format to hold them in each class?
Data is in long format:
Define: a1TIME = a1*YEAR;
Variable: Names = ID SCHID SCORE YEAR x1 a1 URBAN;
IDVARIABLE = ID;
CLUSTER = SCHID;
USEVARIABLES = SCORE YEAR x1 a1 URBAN;
CLASSES = c(3);
WITHIN = YEAR SCORE x1 a1;
BETWEEN = URBAN RURAL;
ANALYSIS: TYPE = MIXTURE TWOLEVEL RANDOM;
MODEL:
%WITHIN%
%OVERALL%
s | SCORE ON YEAR;
SCORE ON x1 a1 a1TIME;
c ON x1 a1; !Not including "by time" on c
%C#1%
SCORE ON x1 a1 a1TIME;
%C#2%
SCORE ON x1 a1 a1TIME;
%C#3%
SCORE ON x1 a1 a1TIME;
%BETWEEN%
%OVERALL%
s c ON URBAN;
SCORE WITH s;
1) I don't know that you can change the slope as a function of time-varying covariates easily in wide format. S & W do it in long format which you can do in Mplus.
2) In long format the latent classes refer to subjects, that is, level 2. This means that you have to put the latent class variable on the Between= list.
AJB posted on Wednesday, February 08, 2012 - 9:53 am
Thank you for the response. I have a few follow-up questions.
1) Placing c BETWEEN requires that level 1 time-varying covariates are the same across the time trajectory classes since WITHIN variables cannot be mentioned again at BETWEEN. Conceptually though, with say scores across time nested within students (in long format), a time-varying covariate (such as attendance and attendance*time) should be allowed to differ across latent trajectory classes on the intercepts and slopes of scores over time. At least that's my reading of extending S&W into a mixture using long-format data. Is there a way to allow these to vary across classes?
2) The other issue with c BETWEEN is that I am unable to regress c on time-varying covariates, since they're specified WITHIN. It would be nice to see if the probability of being in each trajectory class differed by WITHIN variables, such as attendance, but I'm not sure how or if it can happen. Singer & Willett require time-varying covariates to be WITHIN, and having a probability of inclusion in a latent class vary over time seems odd now that I type it out. Is this possible? Or would it be better to aggregate time-varying data and place it on c at level 2?
Those are research questions that I don't know the answer to off-hand. You can have a within-level c (in addition to a between-level c), you just have to remember that with growth modeling in long format the within level refers to variation across time.
I am modeling multiple growth trajectories and have a correlation between 2 slopes that is greater than 1.0. I have constrained the correlation to 1.0 (and equated associated covariances), but unfortunately these constraints result in a significant detriment to model fit (based on difftest, because I am using WLSMV).
Do I have other options for keeping my estimates within bounds? Or is it appropriate to simply acknowledge that the model fit would be better without these constraints, but opt to use them anyway in order to keep admissible estimates?
I have specified a latent growth model for a couples' intervention using a similar format to the twin model that Dr. Muthen presented in one of the short courses, estimating different slopes for men and women. Additionally, I included the treatment condition as a time-varying covariate. The treatment effect was not significantly different between genders, and in this case it is customary to pool effects across genders in order to present a more parsimonious interpretation. In analogous multilevel models, I know this is done by writing custom hypothesis tests; however, I am not aware of similar functionality in Mplus. Can I simply compute the estimates by pooling standard errors with typical formulas, or do I need a different formula specific to model-based estimates? I am also curious whether this is a reasonable approach, or whether I'm missing a better solution. Thanks very much for your time.
I am trying to assess power for a GMM that I have proposed. I want to find out the power of my sample to detect stable hi, stable lo, increasing, and decreasing levels of an outcome variable. Below is how I have approached this, but I suspect this is not correct. In addition, how do you specify expectations about the proportions of the various trajectories?
(b) Got the "MODEL" bit, but omitted due to post length constraints.
(c) Can't find how to do this. In other posts I read mysterious things like "If I want to create 30:70 class proportion using the code [c#1*-0.8472];". No idea what -.8472 has to do with setting the likely class proportions to 30:70.
(d) This is very helpful. So the bottom line is that this analytic approach is probably not going to be very useful for a sample limited to 250 people unless the group are very strongly differentiated by I and S.
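On the -0.8472 in (c): assuming the class means [c#k] are multinomial-logit intercepts with the last class as the reference, the value is just ln(0.30/0.70), which is about -0.847. A small sketch of the conversion in both directions follows; the function names are my own and not anything from Mplus.

```python
import math

def class_logits(proportions):
    """Logit intercepts for classes 1..K-1, using the last class as the reference."""
    ref = proportions[-1]
    return [math.log(p / ref) for p in proportions[:-1]]

def class_proportions(logits):
    """Inverse mapping: the reference (last) class has an implicit logit of 0."""
    expd = [math.exp(v) for v in logits] + [1.0]
    total = sum(expd)
    return [e / total for e in expd]

# Two classes in a 30:70 split give one logit for c#1
print(class_logits([0.30, 0.70]))

# Four equally likely trajectory classes give three logits, all zero
print(class_logits([0.25, 0.25, 0.25, 0.25]))
```

For the four expected trajectories (stable hi, stable lo, increasing, decreasing), the same function gives the three values to use for c#1 to c#3 once you decide on the proportions you expect.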
I conducted a multivariate GLM for three variables (y1-y6, z1-z6, & x1-x6):
i1 s1 | y1@0 y2@1 y3@2 y4@3 y5@4 y6@5;
i2 s2 | z1@0 z2@1 z3@2 z4@3 z5@4 z6@5;
i3 s3 | x1@0 x2@1 x3@2 x4@3 x5@4 x6@5;
I aim to examine a mediation model in which s2 mediates the relationship between s1 and s3.
1) I am wondering whether my analysis below is correct?
s2 on i1;
s2 on s1 (a);
s3 on i2;
s3 on s2 (b);
MODEL CONSTRAINT:
NEW(inS);
inS = a*b;
2) The above model fits the data significantly and the mediation effect is significant. However, when I add the path “s3 on i1 s1” then the mediation is no longer significant (s3 has a significant regression on s1). Should I keep this path in the model, or is the above model sufficient?
I have some questions regarding fitting growth models to my data (N=75, 3 time points, using type=complex). I first attempted to fit a general growth curve with i s and q. However, I get a warning regarding the Fisher information matrix, with the problem involving q with s. A model with only i and s gives problems involving a non-positive theta or psi. Indeed, sometimes (I have several variables I want to model independently) I have a negative residual variance. Other times I have an estimated correlation of .999 between i and s in the tech4 output.
Could the problem be that there is too much variation within the growth model and that a LCGA would be better? The LCGA's I have tried did produce models without warnings, but entropy and model fit didn't seem too good. Would it help to include some theoretically relevant predictors or outcomes? Or would it be better to start by identifying trajectories, before adding other variables?
Thank you for your reply. I will then only include i and s in my models.
However, when I have a model with only i and s, I still get warnings, namely that I have a non-positive theta or psi (depending on the variable I am modelling). Indeed, the models either have a negative residual variance or have an estimated correlation of .999 between i and s in the tech4 output.
What does this mean? Is there a way I can remedy this problem?