

I am trying to model the growth trajectories of subgroups of (approximately 800) Vietnam veterans who completed a treatment program. The vets were assessed on a measure of trauma at 4 time points: intake, 6 months, 12 months, and 24 months. Previous analyses of the data (latent profile analysis) suggest that there may be subgroups within the sample, but the reviewers were unhappy that I didn't use growth modeling techniques. Is there a procedure that enables me to (1) model the data in such a way that Mplus produces different numbers of trajectories, (2) calculate the fit of the different solutions (i.e., different numbers of trajectories), and (3) determine the intercept and slope for each trajectory? I want to be able to nominate the number of trajectories that most parsimoniously describes the data.

bmuthen posted on Monday, June 03, 2002  8:30 am



Yes, this is what growth mixture modeling accomplishes. For references, please see the Mplus home page at www.statmodel.com especially papers 82, 85, 86. 

Koen Luyckx posted on Friday, February 18, 2005  7:58 am



Hello, I'm trying to identify different trajectories of perceived parenting using five measurement waves, each six months apart. When I use LCGA (thus allowing no within-class variability in the growth parameters), the solution with three classes is the best one in terms of BIC and entropy values. Moreover, these three classes have meaningful differential relationships with external variables. On the other hand, if I estimate just one class and allow within-class variability, the BIC is slightly better than that of the three-class solution with the Nagin approach (as in Muthen, 2001). If I use the Nagin-approach solution as starting values and do "classical" mixture modeling, the BIC and entropy worsen significantly and the model does not converge well. Would it be useful to continue working with the three-class solution from the Nagin approach, because these classes are theoretically expected and have differential relations with external variables, despite the fact that they appear to originate from one population with significant variation (as evidenced by a better BIC for the 1-class solution with variability)? Many thanks, Koen

bmuthen posted on Friday, February 18, 2005  9:06 am



Interpretability is a very important aspect. But perhaps you can consider the following. Do you use Version 3, so that you can get a sufficient search over many starting values to avoid local maxima (using the STARTS option)? You say that the regular growth model (1 class with variation) has a better BIC than the 3-class LCGA (a la Nagin). What about BICs for 2- and 3-class models with within-class variation? Is that what you refer to when you say "classical mixture modeling"? Have you tried that with Mplus-generated starting values using, say, starts = 50 5? Also, you may want to consider ABIC (sample-size adjusted BIC) as well as LMR (Tech11) to guide you. If you have continuous outcomes, allowing for within-class variation typically gives better model fit, but not necessarily so with, say, dichotomous outcomes.
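For readers comparing class solutions as suggested above, here is a small sketch of how BIC and the sample-size adjusted BIC (ABIC) follow from the loglikelihood and number of free parameters that Mplus prints. It assumes the usual definitions, BIC = -2logL + p ln(n) and ABIC with n replaced by (n + 2)/24; the log-likelihoods and parameter counts are made-up numbers for 1-, 2- and 3-class models on n = 800, not results from any post in this thread.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return BIC and sample-size adjusted BIC (ABIC) for one fitted model.

    BIC  = -2*logL + p*ln(n)
    ABIC = -2*logL + p*ln(n*), where n* = (n + 2) / 24
    Lower values indicate a better balance of fit and parsimony.
    """
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    abic = -2.0 * loglik + n_params * math.log((n_obs + 2) / 24.0)
    return {"BIC": bic, "ABIC": abic}


# Hypothetical (log-likelihood, number of free parameters) per number of classes.
fits = {1: (-5230.4, 9), 2: (-5190.2, 13), 3: (-5178.9, 17)}
n = 800

table = {k: information_criteria(ll, p, n) for k, (ll, p) in fits.items()}
for k, ic in table.items():
    print(f"{k} class(es): BIC = {ic['BIC']:.1f}, ABIC = {ic['ABIC']:.1f}")

best_bic = min(table, key=lambda k: table[k]["BIC"])
best_abic = min(table, key=lambda k: table[k]["ABIC"])
print("lowest BIC:", best_bic, "| lowest ABIC:", best_abic)
```

With these illustrative numbers BIC favors two classes while ABIC, which penalizes parameters less, favors three; that kind of disagreement is one reason to weigh several indices together with LMR and interpretability.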

Anonymous posted on Monday, February 21, 2005  1:38 am



Many thanks for the quick reply and valuable suggestions. I have an additional question. The variable I am using is significantly positively skewed. Does this make the variable less suitable to use in GMM? 

bmuthen posted on Saturday, February 26, 2005  5:48 pm



You can handle that by using the MLR estimator, which gives non-normality-robust SEs, unless you have a strong floor or ceiling effect, in which case you might want to consider two-part modeling.

Anonymous posted on Friday, April 01, 2005  1:59 pm



Hi, can you tell me what model/analysis command could be used to predict binary outcomes from latent class membership and covariates, as in Muthen and Shedden's Biometrics paper? Thanks a lot

BMuthen posted on Saturday, April 02, 2005  8:54 pm



You would use the CATEGORICAL option of the VARIABLE command to specify that the outcomes are binary. The ON option is used to specify the regression of the binary outcome on covariates. The influence of latent class membership on the binary outcome is captured by the class-varying thresholds of the binary outcome (this is the default).

Anonymous posted on Sunday, April 10, 2005  2:10 pm



I am running a 2-class mixture model and I am trying to see if class membership predicts a distal outcome (a continuous variable). When I add the command outcome on c#1; I get an error message. What do I need to do to get this? Thanks


This is not how you include a distal outcome in the model. See "distal outcome" in the index of the Mplus User's Guide for such an example. I do not have a User's Guide handy, so I cannot point you to the exact example. The idea is that the thresholds or means of the distal outcome vary across classes. This captures the regression of the outcome on class.

Anonymous posted on Monday, April 11, 2005  2:21 pm



Hello, I am running a mixture model for longitudinal data. Here is the command file:
data: file is "D:\My DocumentsFiles\completedata.txt";
variable: names are rowname x1-x7 y1-y12;
usevariables are y7-y12 x2 x7;
classes=c(3);
analysis: type=mixture;
starts=100,2;
model: %overall%
i s q | y7@0 y8@1 y9@2 y10@3 y11@4 y12@5;
i s q on x2 x7;
c#1 on x2 x7;
c#2 on x2 x7;
output: tech1;
I kept getting error messages like the following:
WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IN CLASS 1 IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE S.
WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IN CLASS 2 IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE S.
WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IN CLASS 3 IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE S.
Do you have any comments on what might have gone wrong? Thanks a lot.

Anonymous posted on Monday, April 11, 2005  6:21 pm



Hello Linda/Bengt, I have a question about introducing a distal outcome into a mixture analysis of multiple trajectories. What kinds of distal outcomes can Mplus handle so far besides binary? What about a continuous distal outcome? Many thanks,

BMuthen posted on Wednesday, April 13, 2005  11:52 pm



Re: anonymous at 2:21, this means that you have a zero or negative within-class residual variance for s, or that s correlates one or greater with another growth factor. See the TECH4 output to help determine the problem. This does not mean that anything went wrong, but simply that when you have several classes, the within-class variation can be very small because the differences in class means take care of most of the variation. The solution would be to fix the variances to zero.
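The warning discussed above can be checked by hand from a class's estimated growth-factor covariance matrix (the kind of information TECH4 reports). The sketch below is a minimal illustration with made-up numbers, not the output from the 2:21 post: it flags non-positive variances and implied correlations of magnitude one or more, and tests positive definiteness through the eigenvalues. It assumes NumPy is available.

```python
import itertools

import numpy as np


def diagnose_psi(psi, names):
    """Flag the usual causes of a non-positive-definite growth-factor matrix."""
    psi = np.asarray(psi, dtype=float)
    problems = []
    # 1) Zero or negative variances on the diagonal
    for name, var in zip(names, np.diag(psi)):
        if var <= 0:
            problems.append(f"{name}: variance {var:.3f} <= 0")
    # 2) Implied correlations with absolute value of one or more
    for i, j in itertools.combinations(range(len(names)), 2):
        if psi[i, i] > 0 and psi[j, j] > 0:
            r = psi[i, j] / np.sqrt(psi[i, i] * psi[j, j])
            if abs(r) >= 1:
                problems.append(f"{names[i]} with {names[j]}: correlation {r:.3f}")
    # 3) A symmetric matrix is positive definite iff all eigenvalues are > 0
    positive_definite = bool(np.all(np.linalg.eigvalsh(psi) > 0))
    return positive_definite, problems


# Hypothetical within-class estimates for i, s, q in one class.
psi_class1 = [[1.20, 0.40, -0.05],
              [0.40, 0.10, 0.01],
              [-0.05, 0.01, 0.02]]
pd_ok, issues = diagnose_psi(psi_class1, ["i", "s", "q"])
print(pd_ok, issues)
```

Here the s variance is small relative to its covariance with i, so the implied i-s correlation exceeds one. As the reply notes, with several classes this often just means the class means absorb most of the slope variation, and fixing that variance at zero (e.g., s@0;) is one option.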


Re: anonymous at 6:21, distal outcomes can be categorical, continuous, censored, or counts. 

Anonymous posted on Friday, June 17, 2005  9:15 am



Hello Linda/Bengt, I am trying to do a simple growth curve analysis with mixtures. I used the commands CLASSES = c(3); ANALYSIS: TYPE = MIXTURE; and MODEL: i s q | y1-y5; Currently I am not including covariates in the model. In the output, I did get final counts for each class and estimates of the means and variances. I wonder if there is a way to retrieve the posterior probability for each individual, i.e., can it display or calculate the probability that each subject is in each class? Thanks a lot.


See the CPROBABILITIES option of the SAVEDATA command. This allows you to save the posterior probabilities. 
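Once the posterior probabilities are saved (one row per person, one column per class), they can be summarized outside Mplus. A minimal sketch, assuming the probabilities have already been read into an array (the column layout of the saved file is listed in the SAVEDATA information of the output; the small array here is hypothetical). It derives each person's most likely class and the relative entropy, 1 - sum(-p ln p) / (n ln K), the classification-quality index commonly reported for mixture models.

```python
import numpy as np


def summarize_posteriors(post):
    """Modal class (1-based) and relative entropy from an n x K posterior matrix."""
    p = np.asarray(post, dtype=float)
    n, k = p.shape
    modal_class = p.argmax(axis=1) + 1
    # Treat 0 * ln(0) as 0 so that certain assignments contribute nothing.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, -p * np.log(p), 0.0)
    entropy = 1.0 - terms.sum() / (n * np.log(k))
    return modal_class, entropy


# Hypothetical posterior probabilities for four people and three classes.
posteriors = [[0.90, 0.08, 0.02],
              [0.10, 0.85, 0.05],
              [0.05, 0.15, 0.80],
              [0.40, 0.35, 0.25]]
classes, ent = summarize_posteriors(posteriors)
print(classes, round(ent, 3))
```

Entropy is 1 when every person is assigned to a class with certainty and 0 when all posteriors are equal, so the fourth, ambiguous person above pulls the value down.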

Stacey posted on Tuesday, June 21, 2005  2:56 pm



Linda (or Bengt), I am trying to model growth trajectories for 7 groups (cohorts). I can get the estimates for one group at a time using USEOBS. When I try to model them all together, however, my model does not converge. Must I specify group-specific models to reach convergence?


You should send your output, data, and license number to support@statmodel.com so we have more information. This sounds like a data- and model-dependent problem.

Anonymous posted on Friday, August 26, 2005  12:45 pm



Hi, I was reading Muthen and Shedden's 1999 Biometrics paper. The section on identifiability is rather short. I wonder whether anything other than specifying lambda_y is needed for the model to be identifiable, in particular whether it has anything to do with the type and levels of the covariates. Is there a reference paper that can help?

bmuthen posted on Friday, August 26, 2005  3:54 pm



There is no directly or practically useful writing on identification with mixtures. The way I have found it useful to think about it is to ask whether the model in each class would be identified if you knew the class membership, that is, if you analyzed data on the people in that class in a regular, non-mixture analysis. Therefore, identification rules from confirmatory factor analysis and regular growth modeling (repeated measures analysis) help. Covariates help identification in those models and therefore in mixture modeling.


Hello Bengt and Linda, I'm running a GGMM where latent trajectory classes predict a distal outcome. My question is: is it possible to include more than one distal outcome, whereby one distal continuous outcome interacts with latent class membership to influence an even more distal continuous outcome? If this is possible, how does one specify this in the model command? Thanks. 


If you have one distal outcome predicting another one, you can use ON to specify this relationship, for example, u2 ON u1; 


And the interaction with class is obtained by letting this u2 on u1 regression be mentioned in each class. 


Thank you, Bengt and Linda, for your helpful advice on running a GGMM with an interaction between latent class membership and a categorical distal outcome (u1) as it influences a second, continuous distal outcome (u2). However, I'm having trouble getting the model to run when I include the u2 ON u1 relationship in each class. My model statements (with the error message) are:
CATEGORICAL IS u1;
MODEL:
%OVERALL%
i s | pr1@0 pr2@1 pr3@2 pr4@3 pr5@4 pr6@5;
i s ON age1 female;
c#1 ON age1 female;
u2 ON u1;
%c#1%
[u1$1*1 u2]
u2 ON u1$1;
%c#2%
[u1$1*0 u2]
u2 ON u1;
*** ERROR in Model command
Unknown variable(s) in an ON statement: u2
What am I doing wrong? Thanks.


You have u2 ON u1$1; but you want u2 ON u1; instead. Also, make sure that u2 is on the USEV list.


Hello, I am trying to include a continuous distal outcome in a growth mixture model. Could you tell me which Mplus commands to use? I found an example for a categorical distal outcome but can't find one for a continuous one. Thanks.


You do it the same way as for a categorical distal outcome. The only difference is that you don't put the variable on the CATEGORICAL list and means change across classes rather than thresholds. 

Nigel posted on Thursday, November 15, 2007  2:58 pm



Hello, a simulation study presented at the 2006 CILVR modeling conference showed that including a covariate in a growth mixture model reduced the power to detect the appropriate number of classes. You seem to say that the issue is not well understood, rather than saying to include or exclude the covariate. I'm wondering what your latest thinking on this issue is. Thanks.


Have you looked at my 2004 chapter in the Kaplan handbook on our web site? My current thinking is that one should first do the analysis without covariates as a practical matter, but then not be surprised if the class enumeration comes out differently after adding covariates, as in the example given in this chapter.

Nigel posted on Friday, November 16, 2007  9:18 am



Thanks for this... Nigel


I am new to Mplus and have to test something similar to Example 8.8 (i.e., black versus white), with longitudinally measured health behaviors and a distal outcome of health. I am hoping to see not only whether different latent classes emerge from the longitudinally observed data, but also whether race changes these latent class trajectories, and eventually whether these trajectories predict health 10 years later. So it's really a mix of 8.6 and 8.8. What is confusing me is that I don't see anything differentiating 8.6 from 8.1: nowhere in 8.6 is u in the MODEL command. My outcome is continuous, so I really don't see where I would put it. Would I simply add a line at the end saying u ON c? Thanks for your help!


All of the variables on the NAMES list, or on the USEVARIABLES list if there is one, are included in the analysis. The difference between 8.1 and 8.6 is that u is on the NAMES list in 8.6. The effect you are looking at is found in the variation of the thresholds of u across classes. It is not explicitly specified.

Weiwei Liu posted on Monday, March 28, 2011  7:41 am



In a growth mixture model, when estimating a class-specific intervention effect on a nominal distal outcome, the intercepts of the distal outcome categories are constrained to be the same across classes by default. This puzzles me, especially because when estimating a class-specific intervention effect on a binary distal outcome, the threshold of that outcome is allowed to differ across classes by default. Could you explain the rationale behind these two different defaults? Thank you very much!


I just ran a GMM with a distal outcome treating the variable as categorical in one case and nominal in the other and the thresholds are free across classes in both cases. 


Hi, I'm doing a multiple-group LGM analysis with 3 time points. I have a three-group variable (n1 = no treatment, n2 = low treatment, n3 = high treatment). What is the minimum sample size in each group? Could you also suggest some references on this? Just to let you know: n1 = 100, n2 = 70, and n3 = 24. Thank you.


Hi Drs. Muthen, I am running a GMM with KNOWNCLASS based on Example 8.8. I have 3 known classes and 2 latent classes for y1-y4. Now I wonder how to set up model constraints to test whether the intercepts, slopes, and covars are invariant across the 6 classes. I have looked at Chapter 14, page 428, to see how to set up constraints for the intercept and slope first. I got error messages saying the program cannot recognize MODEL g1: or MODEL class1: or MODEL cg#1.c#1:. And I am not sure whether the following syntax would work for comparing each pair of classes:
i s | y1@0 y2@1 y3@2 y4@3;
[y1-y4@];
[INT@0 SLP@0];
Thank you!


Annie: It is impossible to say what sample size is needed without doing a Monte Carlo study based on the data situation involved. See the following paper, which is on the website: Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620.


ChienTi: Please send your output and license number to support@statmodel.com so I can see the full input and your results. 

EFried posted on Tuesday, January 10, 2012  9:10 pm



Hello. I have a couple of questions regarding a longitudinal GMM I am currently running. Dataset: N = 800; 5 measurement points of one dependent variable (from 5% missing at the first to 40% missing at the 5th measurement point); 4 time-invariant covariates; and 1 time-varying covariate.
Syntax:
...
classes=c(2);
ANALYSIS: type = MIXTURE;
Starts = 50 10;
STITERATIONS = 10;
MODEL:
i s | y0@0 y1@1 y2@2 y3@3 y4@4;
i-s@0;
i s ON x1 x2 x3 x4;
c ON x1 x2 x3 x4;
y1 ON x6;
y2 ON x7;
y3 ON x8;
y4 ON x9;
Questions:
(1) In the output, the regressions of I and S on my covariates are the same in every class (the x1 coefficient for S in class 1 equals that in class 2). This happens for all covariates, also in the 3- and 4-class solutions. I want the GMM to take the information and differences of the covariates into account when building the classes.
(2) DEFINE: Chapter 18 of the MANUAL says variables should be redefined if they have values larger than 10. How important is this really? Most psychological questionnaires' sum scores range up to 30 or 40, and I have one ranging up to 80. If I have to rescore, what would you recommend? ...

EFried posted on Tuesday, January 10, 2012  9:12 pm



... (3) 999.00: I have many 999 values ("not able to compute") in my output. Under MODEL RESULTS, the I and S entries for my 5 measurement points of the dependent variable are all 999, and so are the intercepts. Is that normal?
(4) Covariates: Can I run the GMM with all covariates, drop the ones that don't contribute to class membership or affect intercepts/slopes, and rerun GMMs until only "meaningful" covariates remain in the model?
(5) Random I and S for individual classes: Adding %c#1% i s; %c#2% i s; I get the error "Unknown class label: %C#1% I S;". Probably a syntax error, but I can't fix it. Thank you! T


Please limit your question to one screen. We have a limit to keep questions manageable.
1. The default is to hold these coefficients equal across classes. You can relax the equality by mentioning the regressions in the class-specific parts of the MODEL command.
2. We recommend keeping the variances of the variables between one and ten. Large variances can sometimes cause convergence problems.
3. I would have to see your output and license number at support@statmodel.com to answer this.
4. I would keep all of the covariates in the model. I would not over-trim the model.
5. Perhaps you need to put i s; on the next line after the class label.
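The advice on keeping variances between one and ten can be put into numbers: dividing a variable by a constant c divides its variance by c squared, so a sum score with a variance in the hundreds can be brought into range. In Mplus this would be a DEFINE statement such as y0 = y0/10; applied with the same constant to every time point, so the growth metric stays comparable across waves. The sketch below uses hypothetical scores and a hypothetical list of "convenient" divisors, and picks the smallest divisor that brings the largest variance across waves to at most 10.

```python
import statistics

# Hypothetical set of "convenient" divisors to try, smallest first.
CANDIDATE_DIVISORS = (1, 2, 4, 5, 10, 20, 25, 50, 100)


def common_divisor(waves, max_var=10.0):
    """Smallest candidate divisor that brings every wave's variance to <= max_var."""
    largest = max(statistics.variance(w) for w in waves)
    for c in CANDIDATE_DIVISORS:
        if largest / c**2 <= max_var:
            return c
    raise ValueError("no candidate divisor is large enough")


# Hypothetical sum scores (range 0-80) at three waves.
waves = [[12, 35, 48, 60, 22, 71, 15, 40],
         [10, 30, 45, 55, 20, 65, 18, 38],
         [8, 28, 40, 50, 25, 60, 12, 35]]
c = common_divisor(waves)
rescaled = [[score / c for score in w] for w in waves]
print("divisor:", c)
print("variances after rescaling:",
      [round(statistics.variance(w), 2) for w in rescaled])
```

Because every wave is divided by the same constant, growth factor means and slopes come out in units of score/c and variances in units of (score/c) squared, so they can be converted back by multiplying by c or c squared.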

AJB posted on Monday, February 06, 2012  10:40 am



Dear Drs. Muthen: I am extending Singer & Willett's (2003) longitudinal data analysis into a mixture, i.e., Table 5.7, Model C (p. 163): a growth model with data in long format and time-invariant and time-varying covariates on intercepts and slopes (students in schools). Questions: 1) I would prefer to use the wide format. However, Example 6.10 in Mplus 6.1 appears to require that time-varying covariates be on the y's. From the S&W perspective, I'd like to have the time-varying covariates on the initial status of each trajectory and then interact each with time to put them on the "change over time" slopes. 2) In long format, students are able to "jump" between classes over time. Is there a way in long format to hold them in one class? The data are in long format:
Define: a1TIME = a1*YEAR;
Variable: Names = ID SCHID SCORE YEAR x1 a1 URBAN;
IDVARIABLE = ID;
CLUSTER = SCHID;
USEVARIABLES = SCORE YEAR x1 a1 URBAN;
CLASSESS = c(3);
WITHIN = YEAR SCORE x1 a1;
BETWEEN = URBAN RURAL;
ANALYSIS: TYPE = MIXTURE TWOLEVEL RANDOM;
MODEL:
%WITHIN%
%OVERALL%
s | SCORE ON YEAR;
SCORE ON x1 a1 a1TIME;
c ON x1 a1; !Not including "by time" on c
%C#1%
SCORE ON x1 a1 a1TIME;
%C#2%
SCORE ON x1 a1 a1TIME;
%C#3%
SCORE ON x1 a1 a1TIME;
%BETWEEN%
%OVERALL%
s c ON URBAN;
SCORE WITH s;
Thank you!


1) I don't know that you can easily change the slope as a function of time-varying covariates in wide format. S & W do it in long format, which you can do in Mplus. 2) In long format the latent classes refer to subjects, that is, level 2. This means that you have to put the latent class variable on the BETWEEN list.

AJB posted on Wednesday, February 08, 2012  9:53 am



Dr. Muthen: Thank you for the response. I have a few follow-up questions. 1) Placing c on the BETWEEN list requires that the level-1 time-varying covariates be the same across the trajectory classes, since WITHIN variables cannot be mentioned again at BETWEEN. Conceptually, though, with scores across time nested within students (in long format), a time-varying covariate (such as attendance and attendance*time) should be allowed to differ across latent trajectory classes in its effects on the intercepts and slopes of scores over time. At least that's my reading of extending S&W into a mixture using long-format data. Is there a way to allow these to vary across classes? 2) The other issue with c on BETWEEN is that I am unable to regress c on the time-varying covariates, since they're specified as WITHIN. It would be nice to see whether the probability of being in each trajectory class differed by WITHIN variables, such as attendance, but I'm not sure how, or whether, this can be done. Singer & Willett require time-varying covariates to be WITHIN, and having the probability of inclusion in a latent class vary over time seems odd now that I type it out. Is this possible? Or would it be better to aggregate the time-varying data and regress c on the aggregate at level 2? Thank you!


Those are research questions that I don't know the answer to offhand. You can have a within-level c (in addition to a between-level c); you just have to remember that with growth modeling in long format the within level refers to variation across time.


I am modeling multiple growth trajectories and have a correlation between 2 slopes that is greater than 1.0. I have constrained the correlation to 1.0 (and equated the associated covariances), but unfortunately these constraints result in a significant deterioration in model fit (based on DIFFTEST, because I am using WLSMV). Do I have other options for keeping my estimates within bounds? Or is it appropriate to simply acknowledge that the model fit would be better without these constraints, but opt to use them anyway in order to keep the estimates admissible? Thanks!


Having correlations greater than one makes the model inadmissible. Try adding covariances between the residuals of the processes at each time point. 


I have specified a latent growth model for a couples' intervention using a format similar to the twin model that Dr. Muthen presented in one of the short courses, estimating different slopes for men and women. Additionally, I included the treatment condition as a time-varying covariate. The treatment effect was not significantly different between genders, and in this case it is customary to pool effects across genders to present a more parsimonious interpretation. In analogous multilevel models, I know this is done by writing custom hypothesis tests; however, I am not aware of similar functionality in Mplus. Can I simply compute the estimates by pooling standard errors with the typical formulas, or do I need a different formula specific to model-based estimates? I am also curious whether this is a reasonable approach, or whether I'm missing a better solution. Thanks very much for your time.


In Mplus, this would be equivalent to holding the coefficient equal across groups. 
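The point of the reply is that pooling happens inside the model: holding the coefficient equal across groups makes Mplus estimate one common effect with its own SE, so no hand-pooling of standard errors is needed. Before imposing that equality, one way to check it is a Wald test of the difference between the two freely estimated coefficients (Mplus also offers Wald tests through MODEL TEST). A minimal sketch, assuming the estimates, their SEs, and their covariance (the covariance of parameter estimates is available from TECH3) are typed in by hand; the numbers are hypothetical.

```python
import math


def wald_equality(b1, b2, se1, se2, cov12=0.0):
    """Wald chi-square (1 df) and p-value for H0: b1 == b2."""
    var_diff = se1**2 + se2**2 - 2.0 * cov12
    w = (b1 - b2) ** 2 / var_diff
    # Upper tail of chi-square(1): P(X > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


# Hypothetical treatment effects for men and women from one joint model.
w, p = wald_equality(b1=0.42, b2=0.35, se1=0.10, se2=0.11, cov12=0.002)
print(f"Wald = {w:.3f}, p = {p:.3f}")
```

If the difference is not significant, giving both coefficients the same parameter label in parentheses in the MODEL command holds them equal, and the resulting single estimate is the pooled effect.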


That is very helpful. I've spent an embarrassingly long time trying to figure this out. This constraint also yielded a huge improvement in the BIC. Thank you! 

Danli Li posted on Thursday, April 16, 2015  2:03 am



Hi, in my study, both fathers and children were measured simultaneously across six waves. I want to explore whether the father's intercept and slope predict the child's intercept and slope. Which model fits this?


See the Topic 8 course handout on the website starting at Slide 53. 


Hello, I'm having trouble with a quadratic 4-class mixture model (2 time-invariant and 3 time-varying covariates). With the time-invariant covariates added, the model runs smoothly. When I add the time-varying ones:
MODEL:
%overall%
i s q | C1@0 C2@1 C3@2 C4@3 C5@4 C6@5 C7@6;
C4 WITH C3;
S@1;
i s q ON Male1 CPuber1;
stvc | C1 ON C_SEstC1;
stvc | C2 ON C_SEstC2;
stvc | C3 ON C_SEstC3;
stvc | C4 ON C_SEstC4;
stvc | C5 ON C_SEstC5;
stvc | C6 ON C_SEstC6;
stvc | C7 ON C_SEstC7;
[C_SEstC1@0.013 C_SEstC2@0.049 C_SEstC3@0.065 C_SEstC4@0.024 C_SEstC5@0.047 C_SEstC6@0.072 C_SEstC7@0.107]
%class#1%
i s q ON Male1 CPuber1;
...(same for Class 2, 3, 4)...
and try to run it, I get the following error:
*** ERROR in MODEL command
Unknown variable(s): ON in line: I S Q ON MALE1 CPUBER1
Any idea what's going wrong? Or is the input incorrect? Thank you very much!


Please send the full output and your license number to support@statmodel.com. 


I am trying to assess power for a GMM that I have proposed. I want to find out the power of my sample to detect stable high, stable low, increasing, and decreasing levels of an outcome variable. Below is how I have approached this, but I suspect it is not correct. In addition, how do you specify expectations about the proportions of the various trajectories?
MONTECARLO:
NAMES ARE a1-a3 b1-b3 c1-c3;
NOBSERVATIONS = 250;
NREPS = 500;
SEED = 134;
classes = eibp(4);
genclasses = eibp(4);
SAVE = lpa.sav;
ANALYSIS:
Type = Mixture;
ESTIMATOR = ML;
MODEL MONTECARLO:
%OVERALL%
t1 by a1*.6 b1*.6 c1*.6;
t2 by a2*.6 b2*.6 c2*.6;
t3 by a3*.6 b3*.6 c3*.6;
t1@1 t2@1 t3@1;
i s | t1@0 t2@1 t3@2;
i*.5; s*.1;
i WITH s*0;
a1-a3*.5; b1-b3*.5; c1-c3*.5;
%eibp#1%
[i*0 s*0];
%eibp#2%
[i*0 s*.2];
%eibp#3%
[i*.2 s*0];
%eibp#4%
[i*.2 s*.2];


MODEL MONTECARLO should be replaced by MODEL POPULATION, in line with UG Chapter 12. You want to include a MODEL part as well. You give the class proportions in the %OVERALL% part using [c#...]. The growth factor means don't differ very much across classes relative to their variances; differences of at least 2 SD are needed for good performance, particularly with small samples.
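To see why the posted population values give weak separation, the class differences can be expressed in within-class standard-deviation units. A quick sketch using the values from the posted MODEL MONTECARLO block (i*.5 and s*.1 as within-class variances; class means of 0 or .2), plus the mean difference that would correspond to the roughly 2 SD rule of thumb in the reply:

```python
import math


def separation_sd(mean_a, mean_b, within_var):
    """Absolute mean difference in within-class standard deviations."""
    return abs(mean_a - mean_b) / math.sqrt(within_var)


def difference_for(n_sd, within_var):
    """Mean difference needed for a separation of n_sd standard deviations."""
    return n_sd * math.sqrt(within_var)


i_var, s_var = 0.5, 0.1  # from i*.5; s*.1; in the posted syntax
print("intercept separation (SD):", round(separation_sd(0.0, 0.2, i_var), 2))
print("slope separation (SD):", round(separation_sd(0.0, 0.2, s_var), 2))
print("intercept difference for 2 SD:", round(difference_for(2.0, i_var), 3))
print("slope difference for 2 SD:", round(difference_for(2.0, s_var), 3))
```

Both separations are well below one SD; under these variances a 2 SD separation would need intercept means about 1.41 apart and slope means about 0.63 apart.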


(a) I used an old example syntax, thanks. (b) I got the "MODEL" part, but omitted it due to post length constraints. (c) I can't find how to do this. In other posts I read mysterious things like "If I want to create a 30:70 class proportion using the code [c#1*0.8472];". I have no idea what .8472 has to do with setting the likely class proportions to 30:70. (d) This is very helpful. So the bottom line is that this analytic approach is probably not going to be very useful for a sample limited to 250 people unless the groups are very strongly differentiated by I and S.


(d) Right. (c) [c#1*0.8472] is a logit, and logits can be translated to probabilities as we describe in Chapter 14: P = 1/[1 + exp(-L)].
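The logit-to-probability conversion can be checked numerically. With K classes, Mplus treats the last class as the reference (its logit is 0), so class k has probability exp(L_k) / sum_j exp(L_j); with two classes this reduces to P = 1/[1 + exp(-L)] for class 1. The sketch below goes both ways and shows that [c#1*0.8472] corresponds to about 70% in class 1 and 30% in class 2. The function names are illustrative, not part of Mplus.

```python
import math


def logits_to_probs(logits):
    """Class probabilities from the K-1 logits [c#1], ..., [c#K-1].

    The last class is the reference category with its logit fixed at 0.
    """
    full = list(logits) + [0.0]
    denom = sum(math.exp(v) for v in full)
    return [math.exp(v) / denom for v in full]


def probs_to_logits(probs):
    """Logits log(p_k / p_K) to use as [c#k*value] population values."""
    ref = probs[-1]
    return [math.log(p / ref) for p in probs[:-1]]


print([round(p, 3) for p in logits_to_probs([0.8472])])
print([round(v, 4) for v in probs_to_logits([0.2, 0.3, 0.5])])
```

For a 3-class design with proportions 20:30:50, for example, the population statements would use the two logits printed by the second call for [c#1] and [c#2].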


I conducted a multivariate GLM for three variables (y1-y6, z1-z6, and x1-x6):
i1 s1 | y1@0 y2@1 y3@2 y4@3 y5@4 y6@5;
i2 s2 | z1@0 z2@1 z3@2 z4@3 z5@4 z6@5;
i3 s3 | x1@0 x2@1 x3@2 x4@3 x5@4 x6@5;
I aim to examine a mediation model in which s2 mediates the relationship between s1 and s3. 1) Is the analysis below correct?
s2 on i1;
s2 on s1 (a);
s3 on i2;
s3 on s2 (b);
MODEL CONSTRAINT:
NEW(inS);
inS = a*b;
2) The above model fits the data and the mediation effect is significant. However, when I add the path "s3 on i1 s1", the mediation is no longer significant (s3 has a significant regression on s1). Should I keep this path in the model, or is the above model sufficient?


Yes, if significant, you should add s3 on s1. I would add s3 on i1 as well and just report that it is insignificant. 


Thank you. Do you mean to report the insignificant mediation? Because when I add s3 on s1 and i1, then the mediation is no longer significant. 


I was referring to s3 on i1 being insignificant. But, yes, I would also report the indirect effect is insignificant. 
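For the mediation question above, the NEW parameter inS = a*b is a product-of-coefficients indirect effect. The sketch below shows, under stated assumptions, how such an estimate and a first-order delta-method standard error are formed from a, b, their SEs, and their covariance (from TECH3); to our understanding this is the kind of SE Mplus reports for MODEL CONSTRAINT parameters by default, although bootstrap intervals are often preferred for products. The numbers are hypothetical. They illustrate why adding s3 ON s1 can change the conclusion: b becomes a partial effect once the direct path is in the model, and a smaller b shrinks a*b.

```python
import math


def indirect_effect(a, b, se_a, se_b, cov_ab=0.0):
    """Indirect effect a*b with a first-order delta-method standard error.

    Var(a*b) ~= b^2 Var(a) + a^2 Var(b) + 2ab Cov(a, b)
    """
    est = a * b
    se = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2 + 2.0 * a * b * cov_ab)
    return est, se, est / se


# Hypothetical estimates: b before and after adding the direct path s3 ON s1.
for label, b, se_b in [("without s3 ON s1", 0.40, 0.10), ("with s3 ON s1", 0.15, 0.10)]:
    est, se, z = indirect_effect(a=0.50, b=b, se_a=0.10, se_b=se_b)
    print(f"{label}: a*b = {est:.3f}, SE = {se:.3f}, z = {z:.2f}")
```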
