
Anonymous posted on Monday, June 03, 2002  1:10 pm



Hello: I am working on a latent class growth analysis (LCGA) of binary indicators, similar to the one in Example 25.11 of the Version 2 manual. The only differences are (a) three timepoints instead of four, and (b) a nonlinear growth pattern represented with Helmert contrast coefficients (i.e., comparing each timepoint to the mean of those following). In performing this analysis, I have encountered two points of confusion. The first is general, the second more specific to my model. First, what is the scale of the "Factor Means" (e.g., intercept and slope factors)? Are these logits, or something else? Second, the 2-class model runs normally (though the fit is unexceptional), but I begin to encounter estimation problems with a 3-class solution. These typically involve the Fisher Information Matrix in some way. The program cautions me that my start values may be poor, or the model may be unidentified, or both. So far, changing the start values has changed the details of the intermediate solution, but hasn't solved the problem. Furthermore, the error messages frequently cite "parameter #8," which led me to notice that, with 3 binary indicators, the underlying contingency table would have 8 "patterns", only seven of which represent unique pieces of information. Is the model, therefore, underidentified in the absolute sense (you can't have more parameters in the model than the number of unique pieces of information, period)? Or is this more likely a sign of empirical underidentification (the model is OK in principle but isn't right for the data)? I would appreciate any insights from those with more experience in LCGA than I have! Thank you.

bmuthen posted on Tuesday, June 04, 2002  6:43 am



Yes, the scale of the factor means is logits. The best way to check for identification is to request the MLF estimator (this is done automatically in Mplus version 2.1). For identification of all parameters you have to have at least as many pieces of information as parameters, so from this point of view you can only have 7 parameters. 
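The counting argument in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes two growth factors (the intercept and slope mentioned in the question), one free mean per growth factor per class, K - 1 class-probability logits, and no other free parameters. The function names are hypothetical and are not part of Mplus.

```python
import math


def inv_logit(x: float) -> float:
    """Convert a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def independent_cells(n_binary_items: int) -> int:
    """Number of response patterns minus one (the proportions sum to 1)."""
    return 2 ** n_binary_items - 1


def lcga_free_parameters(n_classes: int, n_growth_factors: int) -> int:
    """Assumed parameterization: one free mean per growth factor per class,
    plus n_classes - 1 logits for the class probabilities."""
    return n_classes * n_growth_factors + (n_classes - 1)


info = independent_cells(3)  # three binary indicators
for k in (2, 3):
    p = lcga_free_parameters(k, 2)
    print(f"{k} classes: {p} parameters, {info} pieces of information, "
          f"within limit: {p <= info}")
```

Under these assumptions the 2-class model uses 5 parameters and the 3-class model 8, which exceeds the 7 available pieces of information. `inv_logit` is included only to show how a factor mean on the logit scale maps to a probability.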

Dustin posted on Friday, February 04, 2005  2:48 pm



I am interested in understanding the similarities and differences in latent growth trajectories that are identified using various statistical approaches. Is LCGA with binary variables in Mplus identical to Nagin's Proc Traj modeling program in terms of extracting groups? In addition, would you expect to obtain different trajectory groups using GGMM in comparison to the previously mentioned techniques? I know a lot of it has to do with stopping points, but if you chose to extract the same number of latent classes, would you anticipate obtaining the same results across the different procedures?

bmuthen posted on Friday, February 04, 2005  2:57 pm



See Muthen (2004) from the Kaplan-edited book, which is available as a PDF on our web site.


I would expect the same number of classes with Mplus or Proc Traj if the models are the same, that is, LCGA. I would expect a different number of classes with GGMM, given that within-class variability is allowed, unlike in LCGA.

Dustin posted on Saturday, February 05, 2005  10:07 am



I cannot find the examples for the GGMM model referenced in the Muthen (2004) chapter of the Kaplan-edited book. Have they not been posted yet?

bmuthen posted on Saturday, February 05, 2005  1:45 pm



Correct. If you have a specific example you are interested in, I could send it to you. 

Dustin posted on Sunday, February 06, 2005  10:31 am



That would be great. I am interested in the delinquency GGMM model that was specified in the chapter. I am attempting to do a GGMM using a binary indicator of delinquency across 13 time points from age 7 to 19 with missing data (using linear and quadratic curve components). In your experience, is this model too computationally complex for GGMM? I notice that you fixed the quadratic slope variance to zero to help simplify the book chapter model. Was this done due to the computational complexity of the model? I was planning to let the linear and quadratic slope variances be freely estimated, given that both parameters are significant in the conventional one-class growth model. My email is dap38@pitt.edu

bmuthen posted on Sunday, February 06, 2005  11:55 am



Three random effects are computationally heavy with growth mixtures. You can reduce the burden by using integration = 5. But typically you don't need a random quadratic and can set its variance at zero, because you have classes that pick up much of the variation. I would do an LCGA model first (all variances zero), then add intercept variance, then add linear slope variance.

Dustin posted on Monday, February 07, 2005  3:59 am



Sounds good. Thanks for the advice. I look forward to receiving your syntax. 

bmuthen posted on Monday, February 07, 2005  5:08 pm



The syntax for the LCGA is simply Model: %overall% i s q | u1@0 u2@1 ....; i-q@0;

Dustin posted on Wednesday, February 09, 2005  10:39 am



I would actually like the syntax for the GGMM model that was specified in the book. Sorry for the misunderstanding. I have already run the LCGA with my data without any problems. It would be interesting to see the model and output for the different class solutions using GGMM in the book, if they are available. Again, my email is dap38@pitt.edu

bmuthen posted on Wednesday, February 09, 2005  10:52 am



Will send it to you. 

Anonymous posted on Monday, March 07, 2005  4:55 am



I have performed LCGA on 3 identity dimensions simultaneously and LCGA on 2 adjustment dimensions simultaneously. The first LCGA revealed four classes and the second LCGA revealed three classes. Now I want to examine how the four identity classes relate to the three adjustment classes, but I am a bit puzzled about how to proceed, because group membership is probabilistic. With known groups, one can simply perform chi-square analyses by cross-tabulating both group memberships, but I do not know if this can be done using latent classes. Similarly, I wonder if one can save the group membership obtained using LCGA and use it as a basis for conducting a multigroup analysis. Many thanks in advance for all helpful suggestions!


You can save the posterior probabilities and most likely class membership. But using most likely class membership in a subsequent analysis would introduce error, as each observation has a probability of being in each class. The error would be greater as entropy declines. You can have a model with more than one categorical latent variable. I would suggest that. See Example 7.14 in the Mplus User's Guide. Then you can test a set of nested models.

Anonymous posted on Tuesday, March 08, 2005  1:07 am



Hello, Is TECH11 (LMR test) appropriate for determining the optimal number of classes in multivariate LCGA (LCGA performed simultaneously on two or more variables at a time), in addition to BIC values? Thank you for your help.

Anonymous posted on Tuesday, March 08, 2005  5:22 am



I have a question concerning LCGA. I have performed LCGA on two variables and found a number of classes for variable a and a number of classes for variable b. Is it a viable option to save group membership probabilities for all classes obtained and to correlate them among each other? A positive correlation, for instance, would then indicate that having a high probability of belonging to the first class for variable a is associated with having a high probability of belonging to the first class for variable b. Thanks in advance for your comment.


TECH11 is one piece of information to use in deciding on the number of classes. This issue is discussed in the following paper: Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications. It can be downloaded from the following link: http://www.statmodel.com/recpapers.html


Using most likely class membership in this way would introduce errors, given that observations are in all classes proportionally during model estimation. I suggest a model that contains two categorical latent variables, and looking at the correlation between them.

Anonymous posted on Friday, March 25, 2005  11:57 am



Hi, I want to use a growth mixture model to analyse a longitudinal dataset which contains 12 waves of a continuous response variable. The problem is that many subjects only have a few waves of data; some have fewer than 4. Can Mplus handle this situation? If yes, what approach is used? Are missing data filled in first? Also, in Muthen and Shedden's paper, binary outcomes (u) can be predicted using class membership and covariates. Has it become a procedure in Mplus yet? Many thanks,

bmuthen posted on Friday, March 25, 2005  3:54 pm



Yes, Mplus handles missing data by the standard approach of MAR under ML (see the introductory paragraphs for the Missing Data section of Mplus Discussion, which are taken from the User's Guide). Yes, everything that is in Muthen-Shedden is in Mplus, along with much more general models. See for example the 2002 Biostatistics paper by Muthen et al. on the Mplus web site.

D. Baliunas posted on Wednesday, October 26, 2005  8:42 am



Hello, Similar to a situation described above, I am trying to use a GMM to analyze a longitudinal data set that contains 44 waves of a continuous variable. There is quite a lot of missing data (I think the worst case has about half the data points missing). From several answers I have seen on this discussion board, and from reading the Mplus manual, I understand that Mplus handles missing data. I'm new to Mplus, and am having trouble setting up my model correctly. I have followed Example 8.1 from the User's Guide (Version 3). When I modify that example's dataset to introduce missing data but keep the syntax from the book, I am able to run the analysis. However, when I apply this simple syntax to my own dataset, I get the following message:
*** ERROR in Model command
Growth factor indicators must be all observed or all latent.
What does this mean? Is there an example (data, syntax) that you can recommend that would demonstrate a GMM or LCGA that incorporates missing data? Regards,


The error message means that you are using both observed and latent variables in the same growth model. If you have missing data and want to estimate the model taking this into account, you add TYPE=MISSING; to the ANALYSIS command. If this is not sufficient information to help you, please send your input, data, output, and license number to support@statmodel.com so we can see exactly what the problem is. 

Anonymous posted on Wednesday, January 18, 2006  9:06 am



Dear Drs. Muthen, I am using GMM to model trajectories as a function of 5 classes and then predict latent trajectory class from covariates. The examples in the manual that I've found deal with two trajectory classes, and I understand class regressed on a predictor as a logistic regression (class 1 vs. 2). However, in my case, where there are 5 trajectory classes and I've regressed latent trajectory class #1 on a predictor with c#1 ON x; I only get one regression coefficient in the output. I guess I expected to see four, one each for predicting class 1 vs. 2, 1 vs. 3, 1 vs. 4, and 1 vs. 5, like one would get with ordinary multinomial regression. What does the regression coefficient from the above statement mean when there are more than two classes? Is it the reference class vs. all others? Thanks!


You need: c#1 ON x; c#2 ON x; c#3 ON x; c#4 ON x; 

Anonymous posted on Wednesday, January 18, 2006  9:42 am



Thank you. From the above, I surmise that class #5 is being treated as the reference class in *each one* of the four regressions. Is that correct? 


Yes. 
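As an illustration of what the four statements imply, here is a minimal Python sketch of a multinomial logistic regression with the last class as reference. The intercepts and slopes are made-up numbers, not estimates from any model in this thread, and `class_probabilities` is a hypothetical helper.

```python
import math


def class_probabilities(intercepts, slopes, x):
    """Class probabilities from K - 1 logits. The last class is the
    reference class, so its logit is fixed at 0."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    largest = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - largest) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values standing in for c#1..c#4 intercepts and ON x slopes.
intercepts = [0.5, -0.2, 0.1, -1.0]
slopes = [0.8, 0.0, -0.4, 0.3]

probs = class_probabilities(intercepts, slopes, x=1.0)
print([round(p, 3) for p in probs])
```

Each slope is the change in the log odds of that class versus class 5 per unit of x, so the ratio `probs[0] / probs[4]` equals `exp(0.5 + 0.8 * x)` in this example.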

Anonymous posted on Wednesday, January 18, 2006  10:02 am



Great. One additional question on GMM with a covariate. The class membership probabilities seem to change a bit with the addition of covariates. There are missing data, so it makes sense to me that this could be the case. The covariates are contributing additional information on the scores used as indicators of the growth factors. Further, it seems to me that the class probabilities generated from models with additional covariates are probably more reliable because there is more information in the covariance matrix. Is this a valid interpretation, or should I be concerned about the differing class membership probabilities? Thanks again for all your help!


For a discussion of this issue, see the Muthen chapter in the book edited by Kaplan. You can download it from the website. 

Anonymous posted on Wednesday, January 18, 2006  12:45 pm



With reference to c#1 ON x; c#2 ON x; c#3 ON x; c#4 ON x; from above... I am interested in shifting which class is the reference class for some analyses. For example, I set up the following regressions to use trajectory class #2 as the reference class (it's the only one excluded from the regression statements): c#1 ON x; c#3 ON x; c#4 ON x; c#5 ON x; However, Mplus says that I can't include the highest-numbered class in regression analyses. Is there any way around this, so that I can use some class other than the highest-numbered one (5 in this case) as the reference? Thank you.


If you want to change the last class to a different class, use the parameter estimates of the class that you want to be the last class as starting values for the last class. 
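As a complement to reordering the classes with starting values, the point estimates relative to a different reference class can be obtained by subtraction, since multinomial logit coefficients are differences from the reference. The sketch below uses hypothetical slope values and a hypothetical helper, `rereference`. It gives point estimates only; standard errors for the new contrasts are not obtained this way and would still require rerunning the model.

```python
def rereference(coefs, new_ref):
    """coefs: logit coefficients for classes 1..K-1, with class K as the
    reference (coefficient 0). Returns the K coefficients expressed
    relative to class new_ref (1-based)."""
    full = list(coefs) + [0.0]
    base = full[new_ref - 1]
    return [c - base for c in full]


# Hypothetical slopes for c#1..c#4 ON x, with class 5 as reference.
slopes_vs_5 = [0.8, 0.2, -0.4, 0.3]

# The same slopes expressed with class 2 as the reference.
slopes_vs_2 = rereference(slopes_vs_5, new_ref=2)
print([round(s, 3) for s in slopes_vs_2])
```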


Is the warning message (pasted below) equivalent to negative variance, or would this model be appropriate to interpret? Thanks!
ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL VARIABLES IN THE MODEL.
THE FOLLOWING PARAMETERS WERE FIXED: 14 22


It would depend on which parameters are fixed and the type of model. The best thing is to send your input, data, output, and license number to support@statmodel.com. I would not ignore such a message unless I had more information. 


I'm trying to run an LCGA on crime rates over an 11-year span (n=4000). I've run several variants of the model below, but continue to get warning messages similar to those below. I've set different starting values, but still get error statements. Am I missing something obvious? Is there a sound way to determine which starting values to use?
......
Classes = ndxcls(3);
Analysis:
  Type = Mixture;
  Starts = 100 10;
  Estimator = MLF;
Model:
  %Overall%
  Intrcpt Slope | ndx87rt@0 ndx88rt@1 ndx89rt@2 ndx90rt@3 ndx91rt@4 ndx92rt@5 ndx93rt@6 ndx94rt@7 ndx95rt@8 ndx96rt@9 ndx97rt@10;
WARNING: MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF NONIDENTIFICATION. THE CONDITION NUMBER IS 0.106D-17.
THE STANDARD ERRORS OF THE MODEL ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO STARTING VALUES BUT MAY ALSO BE AN INDICATION OF NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 15.


This is a data dependent problem. Please send your input, data, output, and license number to support@statmodel.com and we will see if we can help you. 

Tony Jung posted on Wednesday, September 06, 2006  9:53 am



I'm running an LCGA using a zero-inflated Poisson model. How do you control for Time 1? I tried:
Model:
  %overall%
  i s q | u2@-2 u3@-1 u4@0 u5@1 u6@2 ;
  i s q ON u1 ;
But the ON part of the model is giving me error messages:
*** WARNING in Model command
All variables are uncorrelated with all other variables within class. Check that this is what is intended.
*** ERROR
The following MODEL statements are ignored:
* Statements in the OVERALL class:
  I ON U1
  S ON U1
  Q ON U1
*** ERROR
One or more MODEL statements were ignored. These statements may be incorrect or are only supported by ALGORITHM=INTEGRATION.
What do I need to do to control for Time 1?

Tony Jung posted on Wednesday, September 06, 2006  11:21 am



I forgot to add that I am controlling for Time 1 because I have experimental and control groups. Thanks. 


Sounds like your model has no random effects as stated; see the UG ex 6.7. You need to add algorithm = integration; and decide on which growth factors are random. As the error message says, I believe algo=int also allows e.g. i on u1; But if you want to control for baseline, I think a better way is to center at time 1 so you can work with latent (true) baseline i. Then you can regress s and q on i. See also the Muthen-Curran 1997 article in Psych Meth.

Tony Jung posted on Wednesday, September 13, 2006  2:12 pm



Thank you for your helpful suggestions. I have a few more follow-up questions: First, when I add ALGORITHM=INTEGRATION, it seems to also require INTEGRATION=MONTECARLO when I specify I S Q on U1 COND SEX ; iqi@0; This has increased the computation time considerably. Is this the correct syntax if I want to go with this approach? Second, in your chapter, "Latent Variable Analysis" in the Kaplan book, you underscore the importance of including covariates for LCGA. In my case, I take it to mean I should at least include the intervention (COND). However, by doing this, I can no longer get the graphs of the different classes that I used to get by doing: PLOT: SERIES = u1-u6 (s); TYPE = PLOT3 ; How can I get the estimated means plot of the different classes when doing a conditioned LCGA? Or is this only available for the unconditioned model?

Tony Jung posted on Wednesday, September 13, 2006  2:13 pm



Following previous post: A related question is, if I want to use only waves 2-6 and condition it on wave 1 for the unconditioned LCGA, can I just do: i s q | u2@-2 u3@-1 u4@0 u5@1 u6@2 ; i s q | u1 ; So what I'm asking is, what's the difference between doing i s q | u1 and i s q ON u1? Am I not accounting for Time 1 either way? My final question is regarding the stacked model (multiple group) approach outlined in the Muthen & Curran (1997) paper. If I want to apply the approach to my model, do I need to run a separate model for each path that I want to test, i.e., free one path at a time and run the stacked model? And how do I interpret the output, i.e., check for significance? Do I just subtract the two "chi-square contribution" values from the control and intervention groups, and compare to the chi-square (1 df) criterion?


Hi, I want to know how Mplus can tell me how many latent classes I have in my data. Can you tell me which example does that, and where in the output Mplus reveals the number of latent classes? Thank you, and sorry for my French accent!!


Answer to Jung's Sept 13 2:12 questions: It seems like you don't need to declare u1 as a categorical variable because it is an IV. This would then avoid the Monte Carlo integration requirement, I think. Plot is not available in the case you refer to. Answer to Jung's Sept 13 2:13 questions: See the UG chapter 16 for explanations of growth model statements using the bar ("|") approach. You will see there that the bar statement does not estimate regression slopes. Note also that you cannot identify a quadratic growth model by a single outcome (as in i s q | u1). Regarding your final questions, I would simply use group as a dummy covariate.


I am sorry if this post is a repeat of a message I posted earlier (my computer crashed and so I am not sure if it was sent). I wish to analyse the patterns of longitudinal poverty (a binary measure) over 15 waves of data. I would like to know what the main differences are between LCGA and latent Markov analysis (of Langeheine and van de Pol). Many thanks.


LCGA is similar in spirit to regular random effects growth modeling, where the idea is that a growth process affects the development over time. Latent Markov is an autoregressive type of model where the status at one time point influences the status at the next time point. So the former model type does not specify direct influence between the outcomes over time (although they are certainly allowed to be highly correlated), while the latter does. The substantive application may be more suitable to one or the other model type. 


Dear Dr Muthen, Many thanks for your response. This is most helpful. Regards Sara 


I would like to run a latent class growth analysis to examine differing trajectories of change across 7 waves of data. I would like to use individually-varying time points, as there is a great deal of variation in time of interview at each wave. Is it possible to run latent class analyses using individually-varying time points? Would I be able to compare models with different numbers of classes?


Yes and yes. You use TYPE=RANDOM; and the AT option for individually-varying times of observation.


Thanks! This set me on the right path, and I ended up using TYPE=MIXTURE RANDOM MISSING. I am having some trouble interpreting my results, however. 1) Without the “means” for intercept and slope, where should I be looking to interpret the overall slope for each class? 2) My results give me negative values for the intercepts of the intercept factor, and positive values for the intercepts of the slope factor. I had been expecting a negative trend for the slope, and do not know how to interpret a negative intercept.


It sounds like you have covariates in the model. Otherwise, you would obtain a mean and variance for all growth factors. 


Thanks, taking out the covariate fixed it. 


Am I supposed to use the same number of sets of starting values and the same number of optimizations when testing a k-class model against a (k-1)-class model when using the STARTS option in the ANALYSIS command? Thanks a lot.


You don't need the same number of starts for models with a different number of classes. You need enough starts so that the best loglikelihood is replicated. 
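One way to make the "best loglikelihood is replicated" check concrete is sketched below in Python. The loglikelihood values are hypothetical, and the tolerance used to decide that two values count as the same solution is an assumption, not an Mplus setting.

```python
def best_loglikelihood_replicated(loglikelihoods, tol=0.01):
    """Return True if the best (largest) final-stage loglikelihood is
    reached by at least two sets of starting values. `tol` is an assumed
    tolerance for treating two values as the same solution."""
    best = max(loglikelihoods)
    hits = sum(1 for ll in loglikelihoods if abs(ll - best) <= tol)
    return hits >= 2


# Hypothetical final-stage loglikelihoods copied by hand from an output file.
final_stage = [-1523.417, -1523.417, -1523.418, -1530.902, -1541.336]

print(best_loglikelihood_replicated(final_stage))
```

If the check fails, the usual response is to increase the number of starts and rerun, separately for each number of classes.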


I am doing latent class growth analysis on a dataset of 206 individuals, where the outcome is adherence to endurance exercise (minutes/week) measured at 4 time points (4, 6, 8 and 12 months) following a 3-month rehabilitation program. My first objective is to determine the number of latent trajectory classes, without covariates. I have several basic questions: 1) My understanding is that for entropy, a higher value is better. What is an acceptable value? 2) For the Lo-Mendell-Rubin test, a low p-value indicates the model with one less class is rejected in favour of the estimated model. What is considered a low p-value: < 0.05, < 0.1? 3) Given that the outcome was measured at 4, 6, 8 and 12 months, should the model syntax specifying the time scores be: endmth4@0 endmth6@2 endmth8@4 endmth12@8 OR endmth4@4 endmth6@6 endmth8@8 endmth12@12? I have tried it both ways. The entropy and classifications are the same; however, the estimates are slightly different. Which is correct? Thanks very much.


1. Entropy ranges from 0 to 1, so you would want a fairly high value like .8. However, entropy is a summary measure. It may be that some classes distinguish well and if those are the classes you are most interested in, then I wouldn't give so much importance to a summary measure. 2. Less than .05. 3. Your time scores should reflect time between your measurement occasions, for example, 0 1 2 4.
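Two of the points in this reply can be illustrated with a short Python sketch. The entropy function uses the commonly cited relative entropy formula for mixture models, 1 - sum(-p ln p) / (n ln K); it is assumed here to correspond to the reported entropy, and the posterior probabilities are made up. The time-score helper simply rescales measurement occasions; the two-month unit is an assumption that reproduces the 0 1 2 4 example.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy in [0, 1] from an n-by-K table of posterior class
    probabilities; values near 1 indicate clear classification."""
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # p * ln(p) tends to 0 as p tends to 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))


def time_scores(months, unit):
    """Time scores measured from the first occasion, in multiples of `unit`."""
    first = months[0]
    return [(m - first) / unit for m in months]


# Hypothetical posterior probabilities for four individuals, two classes.
posteriors = [[0.95, 0.05], [0.90, 0.10], [0.10, 0.90], [0.60, 0.40]]
print(round(relative_entropy(posteriors), 3))
print(time_scores([4, 6, 8, 12], unit=2))
```

Changing the origin or unit of the time scores changes the scale of the intercept and slope estimates but not the spacing between occasions, which is consistent with the poster seeing the same classification under both codings.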


Thanks very much. Some additional questions re: latent class growth analysis with 4 time points. 1) In the output, 3 classifications are given: i) final class counts and proportions for the latent class patterns based on the estimated model, ii) final class counts and proportions for the latent class patterns based on estimated posterior probabilities, and iii) classification of individuals based on their most likely latent class membership. i) and ii) appear to be the same, and are the classification used in the plot of sample and estimated means. iii) appears to differ from i) and ii) in some models, and is the classification saved when save=cprob is requested. My question is: why is one classification used for the plot, and another saved in the .dat file? 2) I have tried running my model with homogeneous and heterogeneous (default) variance across time points. Is either acceptable? When should one be used versus the other? 3) From the web course on multilevel modeling, it was mentioned briefly that piecewise modeling can be used when there are several time points per growth phase. Can I assume that my growth model, which has a TOTAL of 4 time points, would NOT be appropriate for piecewise modeling?


1. The estimated posterior probabilities and most likely class membership are all saved. Estimated means in the plots use estimated probabilities. Observed individual trajectories are based on most likely class membership. 2. We recommend allowing heterogeneous residual variances across time. 3. Four time points is typically too few time points for piecewise.
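The difference between class counts based on posterior probabilities and counts based on most likely class membership can be seen in a small Python sketch. The posterior probabilities below are invented for illustration, and the function names are hypothetical.

```python
def counts_from_posteriors(posteriors):
    """Class counts obtained by summing posterior probabilities, so each
    individual contributes fractionally to every class."""
    k = len(posteriors[0])
    return [sum(row[j] for row in posteriors) for j in range(k)]


def counts_from_most_likely_class(posteriors):
    """Class counts obtained by assigning each individual entirely to the
    class with the highest posterior probability."""
    k = len(posteriors[0])
    counts = [0] * k
    for row in posteriors:
        counts[row.index(max(row))] += 1
    return counts


# Hypothetical posterior probabilities for four individuals, two classes.
posteriors = [[0.90, 0.10], [0.60, 0.40], [0.55, 0.45], [0.20, 0.80]]

print(counts_from_posteriors(posteriors))
print(counts_from_most_likely_class(posteriors))
```

The two sets of counts agree closely when classification is sharp (high entropy) and drift apart when many individuals have posterior probabilities near the middle.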


Thanks. For the analysis described above: Because the outcome has a skewed distribution, I have categorized it into 5 categories (0-4). The categories are 0: 0 minutes/wk, 1: 1-60 mins/wk, 2: 61-100 mins/wk, etc. I have then treated the outcome as continuous in a 2-class LCGA model with 4 time-independent covariates. Below is a portion of the output:
Categorical Latent Variables
                 Estimate    S.E.  Est./S.E.
 C#1 ON
    RFEV10PT        0.444   0.177      2.516
    MCIEJHD         2.533   0.436      5.810
    EXACFU          1.126   0.392      2.870
    O2EX            1.347   0.569      2.367
 Intercepts
    C#1            -2.287   1.032     -2.215
My questions are about the interpretation of parameter estimates. For covariate EXACFU (binary), is the interpretation that individuals with exacfu=1 are exp(1.126)=3.08 times more likely to be in class 1 versus class 2? I am unsure of the interpretation for the estimate -2.287 for Intercepts C#1. Is this the inverse natural log (or odds) of being in class #1 vs class 2, adjusted for the 4 covariates, i.e., exp(-2.287)=0.102? If I am incorrect, please explain how these estimates are interpreted.


The odds of being in class 1 versus class 2, adjusted for the other covariates, are 3.08 times higher for those with exacfu = 1 than for those with exacfu = 0. The intercept is the logit when all covariates are zero.
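The arithmetic behind this interpretation can be checked with a few lines of Python. The slope 1.126 is taken from the posted output. The intercept is taken as -2.287, which is an assumption: it is the sign implied by the poster's reported value of 0.102.

```python
import math

# Slope for EXACFU from the posted output (log odds, class 1 vs. class 2).
slope_exacfu = 1.126
odds_ratio = math.exp(slope_exacfu)

# Intercept assumed to be -2.287 (the sign implied by the reported 0.102).
intercept = -2.287
baseline_odds = math.exp(intercept)                  # odds when all covariates are 0
baseline_prob = baseline_odds / (1 + baseline_odds)  # same quantity as a probability

print(round(odds_ratio, 2), round(baseline_odds, 3), round(baseline_prob, 3))
```

So the intercept converts to odds of about 0.10, or a probability of about 0.09, of being in class 1 rather than class 2 for someone with all four covariates equal to zero.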


For the above analysis, I have a sample of 206 individuals, measured at 4 time points. Can you provide any guidelines or references for sample size requirements for latent class growth analysis?


The sample size needed depends on several factors so it is hard to say without doing a simulation study. In general, one needs fewer observations with repeated measures. See the following paper which you might find helpful: Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 4, 599-620.


I have a question. I am doing growth mixture modeling. Is it OK to compare two different solutions (e.g., 2 classes vs 3 classes) when I have made modifications to one of the models (e.g., fixed the variance of the growth factor to zero in one of the classes, let the residual variances be freely estimated in different classes...)? Thank you! 

sara khanum posted on Wednesday, October 31, 2007  12:03 pm



Hi, I am learning about LCGA in Mplus and would be very grateful for your advice about data format. I have a large panel data set in long format and wish to work in this format in Mplus (I will be adding many covariates for a 15-wave panel and so it is easier). Most of the examples I have seen are in wide format and I would be grateful if you could help me modify the syntax below (which uses a wide structure) to accommodate this. Do I need to include my ID and wave (time) indicators? Regards Sara
DATA: FILE IS longdat.dat;
VARIABLE: NAMES ARE id gen sup pov1 pov2 pov3 pov4 gpa1 gpa2 gpa3 gpa4;
  USEVARIABLES ARE pov1 pov2 pov3 pov4;
  CLASSES = c (2);
ANALYSIS: TYPE = mixture;
  STARTS = 100 10;
  ESTIMATOR = MLR;
  MITERATION = 5000;
MODEL: %OVERALL%
  i s | pov1@0 pov2@1 pov3@2 pov4@3;
  i@0; s@0;
OUTPUT: STANDARDIZED MOD; TECH11;
SAVEDATA: FILE = lcga_pov.dat;
  SAVE = CPROBABILITIES;
PLOT: TYPE = PLOT2;
  SERIES = pov1 (1) pov2 (2) pov3 (3) pov4 (4);


See Example 9.16. 

sara khanum posted on Wednesday, October 31, 2007  4:08 pm



Dear Linda, Thank you for your prompt response. I looked at 9.16 but I couldn't find any LCGA examples for data that are already in long format. I tried to modify the above syntax for my data but couldn't get it to work. I guess I am not clear where/if to specify the id and wave variables and the growth factors. I did the following:
  USEVARIABLES ARE pid wave pov1 pov2 pov3 pov4;
  CLASSES = c (2);
ANALYSIS: TYPE = mixture;
  IDVARIABLE = person;
  REPETITION = time;
  STARTS = 100 10;
  ESTIMATOR = MLR;
  MITERATION = 5000;
MODEL: %OVERALL%
  i s | pov1@0 pov2@1 pov3@2 pov4@3 ON wave;
  i@0; s@0;
OUTPUT: STANDARDIZED MOD; TECH11;
SAVEDATA: FILE = lcga_pov.dat;
  SAVE = CPROBABILITIES;
PLOT: TYPE = PLOT2;
  SERIES = pov;
In your experience, is long data more difficult/inflexible for longitudinal growth analysis? Your guidance would be most appreciated. Regards Sara

sara khanum posted on Wednesday, October 31, 2007  4:29 pm



Dear Professor Muthen, Sorry to bother you again! I have looked at 9.16 again. In order to use long data, does one have to specify a multilevel model? I couldn't adapt 9.16 to my data as I wasn't intending to do a multilevel specification. Regards Sara


Example 9.16 is the only example we have in long format, although you can translate any one into this. You can compare the MODEL command from this example to the | statement that specifies the growth model in Example 6.1 to see how they compare. The wide specification of the growth model is actually more flexible than the long specification. Residual variances can be estimated for each time point and residual covariances can be included in the model. The only time the long format might be desirable is with a very long time series. I would use an LCGA example from Chapter 8.


When using LCGA to determine the best number of classes, how do you reconcile an earlier comparison with a decreasing BIC and a nonsignificant LMR-LRT (e.g., two versus three classes), and a subsequent comparison with a decreasing BIC and a significant LMR-LRT (e.g., three versus four classes)? I've run into this many times when testing a priori across a range of possible classes (e.g., from 1 to 6).


The various statistics that are examined to determine the number of classes do not always agree. If the 3, 4, and 5 class solutions are suggested, one can see which makes the most substantive sense. 
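For the BIC side of such comparisons, the standard formula is BIC = -2 logL + p ln(n), with lower values preferred. The Python sketch below tabulates it for several class solutions. All loglikelihoods and parameter counts are hypothetical; only the sample size of 206 echoes a figure mentioned elsewhere in this thread.

```python
import math


def bic(loglikelihood, n_parameters, n_observations):
    """Bayesian information criterion: -2 logL + p ln(n). Lower is better."""
    return -2.0 * loglikelihood + n_parameters * math.log(n_observations)


# Hypothetical results: classes -> (loglikelihood, number of free parameters).
results = {2: (-2510.4, 9), 3: (-2478.9, 12), 4: (-2466.2, 15), 5: (-2463.8, 18)}
n = 206

bics = {k: bic(ll, p, n) for k, (ll, p) in results.items()}
for k in sorted(bics):
    print(k, round(bics[k], 1))
print("lowest BIC:", min(bics, key=bics.get), "classes")
```

A table like this only summarizes one criterion; as the reply notes, when the statistics disagree the substantive interpretability of the candidate solutions has to carry the decision.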


I am doing 3-class LCGA with 4 time points. When I include the syntax below, I get exactly the same output as when I don't include it.
%c#1%
[i] (i1);
[s] (s1);
%c#2%
[i] (i2);
[s] (s2);
My questions are: 1) Is this the correct syntax to specify linear slopes? 2) What is the default (e.g., linear, spline) if I do not include this syntax?


The syntax you show specifies that the means of the intercept and slope growth factors are free across classes. This is the default so it has no impact. You have also given different labels to the four parameters which also has no impact on model estimation. 


Thanks. Follow-up questions: 1) Is linear growth the default? 2) Is it possible to specify quadratic growth? If so, what is the syntax?


There is no default growth model. The growth model has to be specified. See Example 6.1 for a linear growth model. See Example 6.9 for a quadratic growth model. See the discussion of growth modeling in Chapter 16 under The | Symbol.


According to the Mplus web training ‘Growth Modeling with Latent Variables using Mplus’ (slide 39), the number of free parameters in the H1 unrestricted model is 14 for a linear growth model with 4 time points and no covariates (14 = 4 means + 4(5)/2). My questions are: How is the number of free parameters calculated for the H1 unrestricted model, with 4 time points, but with a quadratic growth factor? Does this calculation apply to latent class growth analysis?


The number of parameters for the H1 model is the same for a linear or quadratic growth model. Only the H0 model changes. This calculation does not apply to latent class growth analysis. 
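The counts behind this answer can be reproduced with a short Python sketch. It assumes a single-class growth model for continuous outcomes in which the H0 model has free growth factor means, free growth factor variances and covariances, and one free residual variance per time point. Those H0 assumptions and the function names are illustrative; only the H1 count of 14 comes from the slide quoted above.

```python
def h1_parameters(n_outcomes):
    """Unrestricted model: all means plus all variances and covariances."""
    return n_outcomes + n_outcomes * (n_outcomes + 1) // 2


def h0_growth_parameters(n_outcomes, n_growth_factors):
    """Assumed H0 growth model: growth factor means, their variances and
    covariances, and one residual variance per outcome."""
    means = n_growth_factors
    covariance_terms = n_growth_factors * (n_growth_factors + 1) // 2
    residual_variances = n_outcomes
    return means + covariance_terms + residual_variances


h1 = h1_parameters(4)
for label, factors in (("linear", 2), ("quadratic", 3)):
    h0 = h0_growth_parameters(4, factors)
    print(label, "H1:", h1, "H0:", h0, "df:", h1 - h0)
```

The H1 count stays at 14 for both models, while the H0 count rises from 9 to 13 when the quadratic factor is added, so only the degrees of freedom change.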


Thank you. I am trying to determine the degrees of freedom for different LCGA models. 1) Is degrees of freedom = #parameters for the H1 model minus #parameters estimated in the H0 model? 2) The Mplus output gives the 'number of free parameters'; is this the number of parameters being estimated in the H0 model? 3) How is the number of parameters for the H1 model determined for a 3-class LCGA linear growth model with 4 time points, with intercept and slope variances constrained to zero? 4) Would the calculation be the same for a model that includes a quadratic growth factor (variance set to zero)?


Degrees of freedom are relevant when means, variances, and covariances are sufficient statistics for model estimation. In this case, the degrees of freedom are equal to the difference between the parameters in the H0 and H1 models. When they are not, the number of parameters is used instead. The number of parameters given is for the H0 model. 

jemila seid posted on Friday, January 02, 2009  7:35 am



Dear Profs. Muthen, I am looking for a paper on genetics applications of growth mixture models for a journal club presentation. Could you please recommend a recent paper in this area? I had the opportunity to read one of your GAW16 contributions and I am looking for a similar paper. Thanks a lot for your help. Best regards, Jemila


Beyond the GAW16 paper, I don't recall having seen a GMM paper with a genetics application yet, except that Irene Rebollo may have done some work on that; you may want to google her at the Vrije Univ of Amsterdam. I have done a cross-sectional version of such a genetics analysis in a twin setting using "factor mixture modeling"; see Muthén, B., Asparouhov, T. & Rebollo, I. (2006). Advances in behavioral genetics modeling using Mplus: Applications of factor mixture modeling to twin data. Twin Research and Human Genetics, 9, 313-324, in our Genetics Section: http://www.statmodel.com/geneticstopic.shtml

jemila seid posted on Friday, January 02, 2009  10:06 am



Thanks a lot, Prof. Muthen, for your prompt reply. I appreciate it. I will have a look at Irene Rebollo's work. Would it be possible to use your GAW16 contribution as an example in my presentation? If so, may I get a complete version of your contribution? Thanks once again. Best regards, Jemila


Hi, I am trying to replicate in Mplus a LCGA model with a censored normal outcome that successfully ran using Proc Traj, but I cannot get it to run in Mplus. The Mplus code is in the next message. My question is whether I am specifying the model correctly to replicate the Proc Traj analysis. I have read about the differences between the two approaches, but I'm not sure what would be causing this problem. I know numerical integration is computationally intensive, and I've tried all sorts of approaches, but nothing works. Is there a way to specify the estimation to do the same thing Proc Traj does? And, if I'm already doing this correctly, do you have suggestions for why the model converges in Proc Traj but not Mplus? Thanks for your help. 


Code to replicate proc traj (part 1): classes= group(5); censored= lcwkav1-lcwkav35 (b); analysis: type=mixture; ALGORITHM=integration; integration=5;


Code (part 2) %overall% i s q c | lcwkav1@0...lcwkav35@34; %group#1% [i*1.4 s*.22 q*0 c*0]; %group#2% [i*5 s*.05 q*0 c*0]; %group#3% [i*.5 s*.68 q*.03 c*0]; %group#4% [i*1.25 s*.1 q*.02 c*0]; %group#5% [i*1.7 s*.02 q*0 c*0]; i-c@0;


You should delete your statements ALGORITHM=integration; integration=5; because they initiate the use of random effects in the model, which is not part of the TRAJ model. Also, divide your time scores by 10 to make things run more smoothly. 
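To see why rescaling the time scores helps, it is enough to look at the size of the polynomial loadings. The following Python sketch assumes the 35 time scores 0 to 34 and the cubic model (i s q c) from the posted input; the helper name `polynomial_loadings` is hypothetical and is not part of Mplus.

```python
def polynomial_loadings(time_scores, degree):
    """Loadings 1, t, t^2, ..., t^degree for each time score."""
    return [[t ** power for power in range(degree + 1)] for t in time_scores]


raw_scores = list(range(35))                # 0, 1, ..., 34 as in the input
scaled_scores = [t / 10 for t in raw_scores]  # 0, 0.1, ..., 3.4

# Largest cubic loading (last time point, power 3) before and after rescaling.
max_raw_cubic = polynomial_loadings(raw_scores, 3)[-1][3]
max_scaled_cubic = polynomial_loadings(scaled_scores, 3)[-1][3]

print(max_raw_cubic, max_scaled_cubic)
```

With the raw scores, the cubic loadings run into the tens of thousands while the intercept loading stays at 1, so the growth factor means end up on very different scales. Dividing the time scores by 10 shrinks the cubic loadings by a factor of 1000 without changing the fitted curves.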


Dear Drs Muthén, I am plotting graphs obtained from an LCGA, and although I am fitting a linear model with three classes, the growth curves for two of the three classes are not linear but have a curvature, as if I were fitting a quadratic model. Does this have anything to do with the use of a censored (b) distribution? Thank you in advance!


Yes. The growth is linear in the underlying (uncensored) normal variable, but not in the observed variable. The same holds for categorical outcomes.
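The curvature can be illustrated with the mean of a normal variable censored from below. This is a minimal Python sketch, not Mplus output: the latent means, the residual standard deviation of 1, and the censoring limit of 0 are all hypothetical values chosen for illustration.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def censored_below_mean(mu, sigma, limit=0.0):
    """Expected value of max(limit, y*) when y* is normal(mu, sigma^2)."""
    z = (mu - limit) / sigma
    return limit + (mu - limit) * normal_cdf(z) + sigma * normal_pdf(z)


# Hypothetical class with a linear trajectory for the latent variable.
latent_means = [-1.0 + 1.0 * t for t in range(4)]
observed_means = [censored_below_mean(mu, 1.0) for mu in latent_means]

latent_steps = [b - a for a, b in zip(latent_means, latent_means[1:])]
observed_steps = [b - a for a, b in zip(observed_means, observed_means[1:])]

print(latent_steps)    # constant steps: a straight line
print(observed_steps)  # growing steps: a curved trajectory
```

The latent means rise by the same amount at every step, while the expected observed values rise slowly near the censoring limit and faster once most of the distribution is above it. A plot of the observed means therefore bends even though the growth model is linear.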

Amery Wu posted on Wednesday, April 08, 2009  11:44 am



Dear Dr. Muthen, I am running a piecewise general growth mixture model. I selected a 4-class model based on the unconditional model (without the auxiliary variables), which has sound statistical fit and substantive interpretability. I would like to proceed to add the auxiliary variable to the same 4-class model extracted by the unconditional model, so I fixed the growth factor means to those of the unconditional model while running the conditional model. The results make sense, although the class distribution changed a bit. I also tried freeing the growth factor means for the conditional model; the growth factor means then changed a lot due to the addition of the auxiliary variables, and the classes are very different from those of the unconditional model. My question is: Is my approach of fixing the growth factor means justifiable, especially in terms of the estimation? Or should I report and interpret the growth factor means of the conditional model? Thanks a lot, Amery Wu


This is a big topic. Please see the following two papers, both of which are available on the website: Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications. Clark, S. & Muthén, B. (2009). Relating latent class analysis results to variables not included in the analysis. Submitted for publication.

Amery Wu posted on Wednesday, April 08, 2009  8:29 pm



Many Thanks, Dr. Muthen. I'll reread the 2004 paper, and read the 2009 manuscript, and ask your advice again if necessary. Amery Wu 


Dear Dr. Muthen, I'm writing a master's thesis in Statistics about Latent Class Growth Analysis using Mplus. My outcomes are categorical (4 categories), with 8 time points and N=1000. Can you please recommend materials that will help me in my work? Thank you in advance!


I would suggest looking at the Topic 5 and 6 course handouts, videos, and the references therein in addition to papers on the website. 


I am trying to fit a latent class growth mixture model to describe different classes of individuals in terms of their pubertal trajectory. The goal is to identify individuals according to membership in "early" and "late" trajectory classes, plus some number of intermediate classes depending on the best solution. There are ~400 individuals who were measured across ages 10 through 16. Scores are continuous (based on an average of 5 subindices), and there is wide variation among individuals in the number of missing scores. I ran LCGMM for linear, freed linear, quadratic, and cubic slopes, each with either one, two, three, four, or five classes. My strategy for choosing the best model has been to look for the lowest BIC, based on Nylund, Asparouhov, and Muthén (2007), and to also consider the percentage of individuals in each class, avoiding solutions with less than 5% of individuals in the smallest class. Is the BIC criterion only meant to be used within a particular slope model, or can it be used to compare, say, the best quadratic versus the best linear class solutions? I assumed that the slope and number of classes should be considered simultaneously. Alternatively, should I first decide on the slope category using longitudinal growth models (based on chi-square), and then decide on the number of classes using latent class growth mixture modeling (based on BIC and % class membership)?


You can use BIC to compare models as long as the set of observed variables is the same. Remember that if you use chi-square to compare nested models and the model has any variances fixed at zero, the difference test may not be distributed as chi-square. I would take theory and an examination of individual trajectories to find the most complex model that might need to be fitted, for example, a quadratic model. I would then use that model and extract classes. If the mean and variance of the growth factors in some classes are not significant, I would adjust the growth models for these classes.
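As a reminder of what is being compared, BIC can be computed from the loglikelihood, the number of free parameters, and the sample size. The Python sketch below uses the standard formula BIC = -2 logL + k ln(n); the loglikelihoods and parameter counts are hypothetical and only illustrate comparing a linear and a quadratic solution fitted to the same observed variables, with n = 400 taken from the question.

```python
import math


def bic(loglikelihood, n_free_parameters, sample_size):
    """Bayesian information criterion; lower values are preferred."""
    return -2.0 * loglikelihood + n_free_parameters * math.log(sample_size)


# Hypothetical results: (loglikelihood, number of free parameters).
candidates = {
    "linear, 3 classes": (-2510.4, 8),
    "quadratic, 3 classes": (-2495.1, 11),
}

n = 400
bic_values = {name: bic(ll, k, n) for name, (ll, k) in candidates.items()}
best = min(bic_values, key=bic_values.get)

for name, value in bic_values.items():
    print(f"{name}: BIC = {value:.2f}")
print("lowest BIC:", best)
```

Because the penalty depends only on the parameter count and the sample size, the comparison is meaningful across growth shapes and numbers of classes, provided every model is fitted to the same set of observed variables.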


Thanks so much for your response. I am very new to Mplus and have to ask some very basic follow-up questions. 1) I am using Example 8.6 as my model without the u parameter. Where is the information on the significance of means and variances? 2) In terms of making adjustments, I would like to explore models in which class 1 is linear, class 2 is quadratic, class 3 is cubic, etc. How can I specify different classes having different slopes in the same model? 3) In the tech4 output for several of my models, I have small negative variances (e.g., -.008) in the estimated covariance matrix of latent variables (among my intercepts). I read in another comment that these may be the reason for non-positive definite psi matrix warnings, and that a solution is to set these to zero. What code specifies that this should be zero in the model?


1. If you are asking about model results, the third column of the output is a z-score (the estimate divided by its standard error), and the p-value of this score is given in the fourth column. 2. The most complex model should be specified in the %OVERALL% part of the MODEL command. The class-specific parts of the MODEL command should fix the growth factor variances to zero for the components not part of that class, for example, cubic@0; 3. Is the negative value on the diagonal or off-diagonal?
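The relation between the estimate, its standard error, the z-score, and the two-sided p-value can be checked by hand. This is a minimal Python sketch with hypothetical numbers; the function names are illustrative and nothing here is read from Mplus output.

```python
import math


def z_score(estimate, standard_error):
    """Estimate divided by its standard error."""
    return estimate / standard_error


def two_sided_p_value(z):
    """Two-sided p-value of a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical growth factor mean and standard error.
z = z_score(0.5, 0.25)
print(z, two_sided_p_value(z))
```

A parameter is conventionally called significant at the 5% level when the absolute z-score exceeds about 1.96, which is the point where the two-sided p-value drops below 0.05.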

Josh Bricker posted on Wednesday, February 17, 2010  3:23 pm



3. The negative value is on the diagonal.

Josh Bricker posted on Wednesday, February 17, 2010  3:44 pm



2. To be more specific about the coding, would that mean something like this? VARIABLE: NAMES ARE ...; USEVARIABLES Y1 Y2 Y3 Y4; CLASSES = C(3); MODEL: %OVERALL% i s q | Y1@0 Y2@1 Y3@2 Y4@3; MODEL c1: quadratic@0 linear@0; MODEL c2: quadratic@0;

Josh Bricker posted on Wednesday, February 17, 2010  4:19 pm



2. Okay, I figured out 2 as follows: VARIABLE: NAMES ARE ...; USEVARIABLES Y1 Y2 Y3 Y4; CLASSES = C(3); MODEL: %OVERALL% i s q | Y1@0 Y2@1 Y3@2 Y4@3; %c#1% [q@0 l@0]; %c#2% [q@0];


You want to fix not only the class-specific growth factor mean to zero but also the class-specific growth factor variance to zero. Unless you have a strong theory for different curve shapes in different classes, a more common approach is to fit the most general shape (here quadratic) in all classes; then you will find out whether a growth factor mean/variance is zero in certain classes.

Josh Bricker posted on Wednesday, February 17, 2010  8:22 pm



Oh, so my example code was only setting the means to zero. How do I also set the variances to zero? I am basing my exploration of different curve shapes for different classes on Linda's advice above to first do an overall model and then to make adjustments informed by the output about specific class means/variances. I also wanted to ask (#3 above) how to set small negative variances (on the diagonal) to zero in the estimated covariance matrix of latent variables (among my intercepts). What code specifies that this should be zero in the model?

Josh Bricker posted on Wednesday, February 17, 2010  10:58 pm



A separate additional question in response to Bengt's comment "You want to fix not only the class-specific growth factor mean to zero but also the class-specific growth factor variance to zero": Is the decision to fix a parameter to zero always made for both means and variances together, even if there is conflicting evidence from the p-values for the means and variances? Sometimes I find that the mean for the quadratic factor in a particular class is no different from 0 while its variance is significant, or vice versa: the mean is small but significant while the variance is not different from 0. Are you saying that in either case, both the mean and variance should be set to 0?


If you have a quadratic model in the overall part of the MODEL command and want a linear model in one class, fix the mean and variance of the quadratic growth factor to zero in that class, for example, [q@0]; q@0; I would not fix a mean to zero without also fixing the variance to zero. If a variance is significant and a mean is not, I would leave the mean free. I would not overfit the model to the sample data. 

Averdijk posted on Friday, July 16, 2010  3:27 am



Dear Drs Muthen, I performed a 5-group LCGA with multinomial logistic regression in both SAS Proc Traj and Mplus. Although the overall trajectories look similar in both programs, counts for trajectory memberships are somewhat different, and moreover the results for the multinomial logit are different (I defined the same reference category). I must have made a mistake in the syntax. Would you have any suggestions on how to change the syntax? Many thanks in advance. usevar = taggr1-taggr4 Emoattr Sex ISEI mighh2 Stablfam Socdes; Censored taggr1-taggr4 (b); missing = all (999); CLASSES = c(5); Analysis: type = MIXTURE; Estimator = ML; Starts = 500 20; Stiterations = 20; LRTSTARTS = 2 1 50 15; Model: %OVERALL% i s | taggr1@0.1 taggr2@0.2 taggr3@0.3 taggr4@0.4; c on Emoattr Sex ISEI mighh2 Stablfam Socdes; OUTPUT: sampstat TECH14;


They should give the same results if the models are specified to be the same. Check the number of parameters and the loglikelihood. Also, be sure that data are the same. I would think you would want one of your time scores to be zero. 

Jamie Vaske posted on Friday, August 06, 2010  6:26 am



Hello, A colleague and I are estimating an LCGA with observed variables. Our observed scales, though, have very poor reliability and we were wondering whether a CFA can be incorporated into an LCGA framework. If it can, are we still modeling absolute change in our variable over time, or are we modeling change in one's factor scores over time? If this analysis is possible, do you know of a good reference/article that discusses this type of analysis? Thank you for your time and advice! Jamie 


If you have a growth model on the factors, you are modeling change in the factors over time. I don't know of a reference, but the Topic 4 course handout and video go through the steps to do this in detail.

Jahun Kim posted on Thursday, February 03, 2011  12:11 pm



Hello, I'm trying to do GMM to identify classes of mother's support and to examine whether these classes predict kid's risk behavior, depression, and drug use. All three outcome variables are continuous and I want to add them, one by one, to the GMM. I realized that my model is similar to Example 8.6 in the Mplus manual (except my outcomes are continuous). But it doesn't say how to write the statement when classes predict outcomes (hrb). I used an 'on' statement like the one below, but it did not work. q5hrb on c#1; The warning statements I got: *** WARNING in Model command Variable is uncorrelated with all other variables within class: Q5HRB *** WARNING in Model command All least one variable is uncorrelated with all other variables within class. Check that this is what is intended. *** ERROR The following MODEL statements are ignored: * Statements in the OVERALL class: Q5HRB ON C#1 Could you help me fix these problems?


Just add Q5HRB to the USEVARIABLES list but don't use an ON statement. The results are found in how the means of Q5HRB vary across classes.


Drs. Muthen, I am running an LCGA model with a continuous distal outcome (Memory). I can find in the output the mean of Memory for each class but don't know where to look for parameter estimates of the regression of Memory on C. In case my code is the problem, I am using the following: usevar = ArousD1-ArousD5 Age APACHE Memory; Classes = c(2); Analysis: type = Mixture; Model: %OVERALL% i s | ArousD1@0 ArousD2@1 ArousD3@2 ArousD4@3 ArousD5@4; i-s@0; c#1 on Age APACHE; Output: sampstat standardized tech1 TECH11 TECH14 Thank you for your guidance and for this forum.


Your input is fine. The regression of Memory on C is expressed by the class-specific means of Memory. Mplus does not allow Memory ON C, and it is not needed. This is analogous to linear regression of Y on a dummy X variable: the different X categories give different Y means. There is no other coefficient. You can test mean differences of Memory using MODEL TEST.
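The dummy-variable analogy can be verified numerically. The Python sketch below treats class membership as a known 0/1 variable, which is a simplifying assumption (in the mixture model membership is latent), and uses made-up outcome values.

```python
def ols_simple(x, y):
    """Intercept and slope of a least-squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical outcome values for two known groups.
y_group0 = [10.0, 12.0, 14.0]
y_group1 = [20.0, 22.0]

x = [0] * len(y_group0) + [1] * len(y_group1)
y = y_group0 + y_group1

intercept, slope = ols_simple(x, y)
mean0 = sum(y_group0) / len(y_group0)
mean1 = sum(y_group1) / len(y_group1)

print(intercept, mean0)        # intercept equals the mean in group 0
print(slope, mean1 - mean0)    # slope equals the difference in means
```

The regression carries no information beyond the two group means, which is why reporting class-specific means of the distal outcome is equivalent to reporting the regression on the class variable.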


Thank you Dr. Muthen. An additional question on the MODEL TEST syntax. Since Memory does not explicitly appear in the MODEL command: Model: %OVERALL% i s | ArousD1@0 ArousD2@1 ArousD3@2 ArousD4@3 ArousD5@4; i-s@0; c#1 on Age APACHE; How do I specify the class-specific means of Memory in MODEL TEST? I have tried different variations on the NEW option of the MODEL CONSTRAINT command without success. Thank you


You must specifically mention the means by using a bracket statement. If you continue to have problems, please send your output and license number to support@statmodel.com. 

Dana Wood posted on Wednesday, October 05, 2011  1:34 pm



Hello, Using a sample size of 1,495, I am estimating a growth mixture model. Comparing BIC, AIC, and BLRT points to selection of a 4-class solution over the 3-class solution. However, the 4-class solution has one very tiny class (4% of sample). When I try to run the 5-class solution, it turns out to be inestimable. Is this likely because I have one very tiny class in the 4-class solution (making it difficult or implausible to further split the sample)? Thank you!


I think the 5-class solution is possible in principle because it could split a large class in the 4-class solution. A class with 4% isn't that small in a sample of your size. So the fact that 5 classes is inestimable is another indication of 4 classes being best.


Hello, I'm interested in possibly using a LCGA model and have some conceptual questions that I was hoping the topic 6 lecture video would answer (Berlin 2009). It seems like part 2 of the video ends at slide 69 and part 3 of the video starts at slide 95. Do you know where I can find the portion covering LCGA? Thanks, Diana 


That portion of the video may have been lost. You need to see the materials in the handout.


Hello again, Unfortunately the slides alone don't answer my question, so I'm hoping you can help. I have a categorical variable measured over time. I would like to create "trajectories" that represent patterns over time. However, I don't really want to impose a structure on the trajectories by forcing them to be lines, per se. I want them to be able to have any shape. Does LCGA do this, or should I just use LCA? I know that LCGA differs from GGMM in that the variances within classes are 0. I'm not sure whether this has any effect on the possible trajectory shapes. I apologize if this doesn't make sense. I have read a few articles on LCGA and GGMM but I'm still confused. I have also gotten different answers from different people when asking about this, so I would greatly appreciate your input. Thank you! Diana

Jon Heron posted on Wednesday, February 15, 2012  12:09 am



Hi Diana, if your data is binary, you have fixed time points, and you're not modelling variance, then you can get the same answer with an LCA that you'd get with an equivalent LCGA. For instance, if you have 4 time points then you can fit an LCA or a cubic polynomial LCGA. I would imagine that a piecewise linear LCGA would also give the same answer, although I haven't tried this. They all use four parameters within each class to describe the four probabilities. Once you get above 4 time points the equivalence should still remain; however, estimation becomes more difficult, as the loadings you'd need to apply in a quintic LCGA get a bit on the large side. I always start off with an LCA and then move towards a potentially more parsimonious LCGA if the patterns look well behaved.


Hi Jon, Thanks for your response. Just to clarify, are you saying that to get the same answer (or to have equally flexible models) I will need to add a polynomial term to the LCGA model every time I add a time point? I have 15 time points, so I imagine that's not a realistic option. I had thought that they might also give different answers (even if they are equally flexible) because the LCA is disregarding time. Have you found that that isn't true? Thanks! Diana

Jon Heron posted on Wednesday, February 15, 2012  6:31 am



Hi Diana, you'd get the SAME answer with four time points and either an LCA or a cubic LCGA; they're different parameterizations of the same model (if your data is binary etc.). In theory you could fit a polynomial of degree 14 to your data and that would give the same answer as an LCA, but that's probably impossible to estimate. You might still be able to fit a model with 14 piecewise linear slopes, and that should once again agree with the LCA. The only benefit to including time in the model is that you can then fit simpler shapes (as you then know the ordering of the observations). With the 4 time point example, the cubic polynomial would be the same as joining the dots, so there's no simplification there. My recommendation would be to start with an LCA and examine the patterns. You might decide at that point that a cubic or quartic or perhaps 3 or 4 piecewise linear segments would do the job just as well. bw, Jon
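The "joining the dots" point can be checked directly: with four time points, any four class-specific logits are reproduced exactly by some cubic curve, because the 4 x 4 matrix of loadings is invertible. The Python sketch below assumes time scores of 0, 1, 2.166, and 3.33 and uses made-up logits for one class; the solver is a plain Gaussian elimination written for this illustration, not anything Mplus does internally.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


time_scores = [0.0, 1.0, 2.166, 3.33]          # assumed time scores
loadings = [[t ** p for p in range(4)] for t in time_scores]

# Hypothetical logits for one latent class at the four time points.
class_logits = [-2.0, -0.5, 0.3, 1.2]

# Growth factor means (i, s, q, cub) that reproduce those logits exactly.
factor_means = solve_linear_system(loadings, class_logits)
reproduced = [sum(l * f for l, f in zip(row, factor_means)) for row in loadings]

print(factor_means)
print(reproduced)
```

Since the mapping between the four logits and the four growth factor means is one to one, the two models have the same number of parameters per class and the same maximum loglikelihood. Time only buys something when a lower-order curve is imposed.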

Jon Heron posted on Wednesday, February 15, 2012  6:36 am



Some "proof" (I did this as a teaching example once)
LCA:
categorical = msmk21 msmk33 msmk47 msmk61;
usevariables = msmk21 msmk33 msmk47 msmk61;
MODEL:
%overall%
[c#1 c#2];
%c#1%
[msmk21$1 msmk33$1 msmk47$1 msmk61$1];
%c#2%
[msmk21$1 msmk33$1 msmk47$1 msmk61$1];
%c#3%
[msmk21$1 msmk33$1 msmk47$1 msmk61$1];
%c#4%
[msmk21$1 msmk33$1 msmk47$1 msmk61$1];
Final stage loglikelihood values at local maxima, seeds, and initial stage start numbers:
-7435.680 299700 932
LCGA:
categorical = msmk21 msmk33 msmk47 msmk61;
usevariables = msmk21 msmk33 msmk47 msmk61;
Model:
%overall%
i s q cub | msmk21@0 msmk33@1 msmk47@2.166 msmk61@3.33;
[msmk21$1@0 msmk33$1@0 msmk47$1@0 msmk61$1@0];
[c#1 c#2];
%c#1%
[i s q cub];
%c#2%
[i s q cub];
%c#3%
[i s q cub];
Final stage loglikelihood values at local maxima, seeds, and initial stage start numbers:
-7435.680 887676 22

Jon Heron posted on Wednesday, February 15, 2012  6:38 am



Oh moo, I've pasted the wrong syntax and thus proved nothing  doh :S 

Jon Heron posted on Wednesday, February 15, 2012  6:44 am



Ignore this bit "%c#4% [msmk21$1 msmk33$1 msmk47$1 msmk61$1];" and those two models are identical.


Okay, great, this is very helpful. Thank you! 

Jon Heron posted on Wednesday, February 15, 2012  7:57 am



No probs :) 


Hello, I am interested in the gender-specific trajectories of comorbid syndromes during and in the course of a depressive disorder in a sample of N=222. The data are longitudinal with three follow-ups after the first occurrence of a depression. We used the SCL-90, which consists of nine dimensional subscales (Likert scale). I am a real beginner with Mplus and thought the appropriate analysis would be an LCGA or a GMM (both with a covariate sex), depending on how I handle the outcome (binary or metric). Is that correct? My second question concerns the number of indicators: Can I perform an LCGA using all nine scales simultaneously at each point of measurement (0,1,2,3)? The syntax would look like this: u1@0 u2@0....u9@0 u1@1 u2@1... Or would you recommend performing the analysis separately for each subscale? If so, how can I bring together the results after I have identified the latent classes for the nine scales separately? Or is there a more appropriate methodological approach to answer my research question? Thank you in advance for your answer! Kind regards, Stephanie


I would do each scale separately as a start. 


Thank you for your answer. And what would you do in the next step? Thanks Stephanie 


There are possibilities to do parallel process growth mixture analysis of more than one subscale at a time. In this case you can see if you need one latent class variable in common for the processes or if each process needs its own latent class variable, where the latent class variables are correlated. But cross that bridge after you've done GMM for each process. 


Dear Dr. Muthen, I have a question on growth mixture modeling. Currently, I am testing a GMM with a continuous outcome, and I want to test whether each class predicts the continuous outcome. As you suggested above, I just added my outcomes to the USEVARIABLES list and found the means for each class. As I understand it, these means work in the same manner as regression coefficients with dummy variables. To test mean differences, you suggested the model test. However, I am a little confused about whether you meant that the model test is the model constraint using chi-square. If you have any advice on my question, please let me know. I very much appreciate your advice in advance and look forward to hearing back from you. Have a nice day


I meant MODEL TEST. 


Dear Dr. Muthen, Thank you for your quick response. However, I have a follow-up question on your answer. If you meant that the model test is the model constraint using mean equality, I think I already tried to do this test. However, I couldn't find any chi-square value for the model fit in the results. I only got the loglikelihood, BIC, and AIC. Thus, if possible, can I ask you for your advice on how to do the model test with this information? If you have any suggestion, please let me know. I very much appreciate your advice on my situation and look forward to hearing back from you. Have a nice day.


If you use MODEL TEST, the Wald test results are with the fit statistics. If you use MODEL CONSTRAINT, z-tests for the new parameters come at the end of the results. If you need further help on this, please send the output and your license number to support@statmodel.com.
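For a single equality such as m1 = m2, the Wald statistic and the z-test are two views of the same quantity: the Wald chi-square with 1 degree of freedom is the square of the z-score for the difference. The Python sketch below is a hand calculation under stated assumptions; the estimates, standard errors, and zero covariance are hypothetical, and the function names are not Mplus options.

```python
import math


def wald_statistic(est1, est2, se1, se2, cov=0.0):
    """Wald chi-square (1 df) for the hypothesis est1 = est2."""
    variance_of_difference = se1 ** 2 + se2 ** 2 - 2.0 * cov
    return (est1 - est2) ** 2 / variance_of_difference


def chi_square_1df_p_value(statistic):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical class-specific means of a distal outcome.
w = wald_statistic(2.0, 1.0, 0.5, 0.5)
print(w, chi_square_1df_p_value(w))
```

In practice the covariance between the two estimates is usually not zero, so a hand calculation from the printed standard errors alone is only approximate; the test reported by the program uses the full estimated covariance matrix of the parameters.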


Dear Dr. Muthen, Thank you for your help. If you allow me to ask just one more question, I would like to ask for your advice on how to add a control variable for outcomes in a growth mixture model. Currently, I am considering using a continuous control variable for the outcome. In this case, I think the model looks like an ANCOVA model. That is, y(distal outcome) = intercept + class*b1 + control variable(Continuous)*b2 + e. If I am correct, could I ask you how to add this control variable (covariate) in the program? For this situation, do I only need to add the normal regression syntax in the %overall% area of the model specification (e.g., y on x)? For your information, my syntax is as follows: MODEL: %overall% I S | dep90@0 dep91@1 dep92@2 dep94@4; c on bt0sex fep90 fdep90 mdep90 mmi90 fmi90; gh on s(covariate); %c#1% i; [gh](m1); %c#2% i@0; [gh](m2); %c#3% i; [gh](m3); model test: m1=m2; I appreciate your help so much. Have a nice day.


This is correct. You may also want to use s as a covariate in the c ON statement. 

F Lamers posted on Friday, March 30, 2012  2:22 pm



I'm running an LCGA with 5 time points in a sample of 804 persons. Initially, I ran the model without my covariate. This went smoothly and it seemed that a quadratic model fit the data best. However, after adding the dichotomous covariate, I'm running into problems. Models with 1, 2, 3 or 5 classes run OK, but models with 4, 6, and 7 classes give me the following error: 'THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.349D-11. PROBLEM INVOLVING PARAMETER 40.' The parameter mentioned in all of these errors involves my covariate (TAU (U)), but I'm not sure what that means. Can you tell me what this means?


Please send your output and license number to support@statmodel.com. 

ywang posted on Thursday, July 19, 2012  8:18 am



Dear Drs. Muthen: For a variable measuring lifetime frequency of drug use (responses on a Likert scale from none to 40 or more), with about half of the responses being 0 (none), can latent growth modeling be used? If not, can latent class growth modeling be used? Thanks!


You may try "two-part" growth modeling. See our Topic 4 handout and papers on our web site.

IYH Boon posted on Tuesday, July 24, 2012  1:44 pm



In an LCGA, when Mplus gives the "one or more parameters were fixed to avoid singularity of the information matrix" error, what are the parameters being fixed to? According to TECH1, in the 4-class model that I'm estimating, the mean of the quadratic term for the second latent class is being fixed. Is it being fixed to 0?


You see the fixed value in the regular output. 

Sar posted on Monday, July 30, 2012  12:35 am



Hi, I have estimated an LCGM and wish to start freeing the variance of the intercept and slope. However, whenever I do this for particular classes the class structure (i.e., the order of classes) changes, and therefore what I have freed up is not always in the same class. I have tried to use starting values based on the intercept and slope means from the LCGM to specify the order of the latent classes, something like this for each class: %c#1% [i*.212 s*.114]; i s; It doesn't seem to have any effect on the order of the latent classes and the same problem is occurring. Is this the code I should be using to freely estimate the variance for specific classes?


When you free a variance in one class, the program puts the free variance in the class that provides the best overall loglikelihood. You can try freeing all variances or use the SVALUES option of the OUTPUT command to get input statements with starting values and free just one variance. This may help the classes stay in the same order. 


I would like to use Latent Class Growth Analysis (LCGA) to analyze longitudinal Add Health data, which has a complex sampling design. Is it possible to specify an LCGA model with Add Health strata, cluster, and sampling weight variables? If so, could you provide a syntax example? 


Yes. See pages 499-505 of the Mplus User's Guide.


I am trying to use the AUXILIARY option for latent class growth analysis with covariates on the auxiliary variable. In other words, I want to predict an outcome from the classes, but control for other potentially preexisting differences. My current output results in: *** ERROR in MODEL command Unknown variable(s) in an ON statement: BIN_TM54 Here's the syntax I am trying. usevariables = zaff01 zaff15 zaff36 zaff54 bin_tm54; classes = c(5); AUXILIARY = bin_tm54 (de3step); ANALYSIS: type = mixture; !ESTIMATOR = ML; starts = 1000; stiterations = 50; LRTSTARTS = 100 100 500 500; Model: %overall% ip sp | zaff01@0 zaff15@1 zaff36@2 zaff54@3; ip-sp@0; bin_tm54 on incntm01 male ark cal kan nhamp penn temple va wa wcar wisc meducm01 mage_m01 permar_1 mcrabm01 psyagg2 other white black trad_m01 neurtm05 stdscm36 married1; Any help is appreciated.


A variable cannot appear on both the USEVARIABLES list and the AUXILIARY list. The variables on the AUXILIARY list should not be analysis variables. All variables on the USEVARIABLES list are used in the analysis. 


I have tried without the bin_tm54 variable on USEVARIABLES too, but my error is then: *** ERROR in MODEL command Unknown variable(s) in an ON statement: BIN_TM54 Again, I want to add covariates to the distal outcome (bin_tm54). Is this possible? 

Jon Heron posted on Thursday, September 27, 2012  11:42 pm



Am I right in thinking that ideally you would want NONE of the other variables in your analysis to impact on the estimation of your latent classes? You may need to do this the old-fashioned way. I'm doing just this at the minute: assessing the effect of a 4-cat latent class measure on a binary outcome whilst controlling for confounders. If Linda agrees this can't yet be done the "new way" then I'm happy to send you some code (it's a bit long to post here).


You should not have the BIN_TM54 variable in your MODEL (i.e., BIN_TM54 ON...); that's what Mplus complains about (you can have it on the USEV list, but it will be removed due to being on the AUXILIARY list). It looks like you want to control for covariates influencing your distal outcome BIN_TM54. In that case, you may want to do the 3-step "by hand" as described in our Web Note 15 posted on our web site.


I keep receiving the following error:

*** ERROR
The following MODEL statements are ignored:
* Statements in Class %OVERALL% of MODEL:
WJAPWCX5 ON C1#1
WJAPWCX5 ON C2#1
WJAPWCX5 ON C2#2
*** ERROR
One or more MODEL statements were ignored. Note that ON statements must appear in the OVERALL class before they can be modified in class-specific models. Some statements are only supported by ALGORITHM=INTEGRATION.

However, my syntax is as follows and seems to follow the examples provided in the User's Guide:

Model:
%overall%
ip sp | zppov1@0 zppov2@1 zppov3@2;
ip-sp@0;
ipa spa | zaaff1@0 zaaff2@1 zaaff3@2;
ipa-spa@0;
wjapwcx5 on c1 c2;

How can I fix the syntax to get it to run? Much appreciated!


You cannot use c1 and c2 as covariates. The effect you are looking for is found in the varying of the means of wjapwcx5 across classes. Remove the ON statement.


I am not sure where to post my question... I am investigating the emotional influence of one group member on other group members across time (4 measurement moments). In order to test for this influence, I used a fully cross-lagged model. I find that group members influence each other's emotions. But I also want to know what the path looks like. Are they influencing each other upwardly or downwardly? How can I test for this? I was thinking about running a growth model for two parallel processes (example 6.13 from the manual). Would this be the right thing to do? Are there other alternatives that look at influence patterns and paths across time at the same time? Thank you for your help!


This sounds like a question better served by a general discussion forum like SEMNET. 

IYH Boon posted on Tuesday, December 11, 2012  9:39 am



I realize that LCGA and GMMs are specifically designed for use with longitudinal data, but I'm wondering about LCA. Are there any specific drawbacks (or benefits) to using LCA when the observed indicators are time-ordered variables (e.g., repeated measures of poverty status)? Wagmiller et al. (2006) did it in the (widely cited) paper referenced below, but I haven't seen many other examples. Any thoughts or information would be very much appreciated. Thanks in advance. IYH

Wagmiller, Robert L., Mary Clare Lennon, Li Kuang, Philip M. Alberti, J. Lawrence Aber. 2006. "The Dynamics of Economic Disadvantage and Children's Life Chances." American Sociological Review 71(5):847-866.


There is no drawback to using LCA with longitudinal data other than the fact that your findings will be very similar to LCGA and LCGA is a more parsimonious model. LCA is a great way to see the growth shape for use in an LCGA. See the Topic 6 course handout on the website starting at Slide 76.


Dear Drs Muthen, I'm conducting some LCGA with 4 indicators using a piecewise model (one slope is defined by 2 indicators only with variance fixed at zero). I would like to identify my trajectories while controlling for some covariates. My issue is that I am not specifically interested in the effects of the covariates on the trajectory membership. Instead, I need to account for the effects of these covariates while estimating my models because they are related to my indicators. Thus, my question is: should I introduce my covariates on the class membership (e.g., c#1 on cov) OR should I better control for the effects of the covariates on each indicator? Thanks! 


I wonder how you know that the covariates influence your indicators and not the class membership. Covariates that influence the class membership have an indirect influence on the indicators, so they are not uncorrelated. 


Thanks for your reply Bengt! I actually do not know whether my covariates influence the class membership but, as you say, I assume they do. However, because I am not specifically interested in these effects I was wondering what was the best way to estimate trajectories while controlling for the effects of these covariates (as I know that they affect my indicators). From your reply, it seems to me like I should end up with highly similar results, right? Thanks!!! Matteo 


I would just regress the class membership on your covariates. 

ywang posted on Monday, March 18, 2013  9:00 am



Dear Drs. Muthen: I have a question about the parallel latent class growth modeling. Can we get the output of Odds Ratio instead of proportion (for example, the OR of the m class for outcome Y1, given the n class of outcome Y2)? If so, what is the syntax? Thanks! 


Not automatically. But you can form the odds ratios you want in MODEL CONSTRAINT, creating NEW parameters. 

ywang posted on Wednesday, March 27, 2013  8:38 pm



Dear Drs. Muthen: I would like to follow up on the question regarding the odds ratio for parallel latent class growth modeling. The output is shown below. We have three classes for C1 and three classes for C2. We specified "C2 on C1", so it is a multinomial logistic regression. I am not sure how to interpret it. For example, the regression coefficient of C2#1 on C1#1 is -3.318, and exp(-3.318) is 0.036. Can it be interpreted as meaning that the relative risk for the kids to stay in the C2#1 class versus the C2#3 class is much smaller (only 0.036 times) for the kids in the C1#1 class compared to the kids in the C1#3 class? If this is not correct and the relative risk ratio cannot be achieved by directly exponentiating the regression coefficient, how can we get the relative risk ratio? Thanks a lot for your help in advance!

Categorical Latent Variables
          Estimate  S.E.   Est./S.E.  P-Value
C2#1 ON
  C1#1    -3.318    1.650  -2.011     0.044
  C1#2    3.246     1.588  2.044      0.041
C2#2 ON
  C1#1    3.345     1.505  2.223      0.026
  C1#2    3.834     1.382  2.773      0.006
Means
  C1#1    1.748     0.359  4.865      0.000
  C1#2    2.053     0.373  5.501      0.000
  C2#1    0.102     1.471  0.069      0.945
  C2#2    1.335     1.386  0.963      0.335


The exp of c2 ON c1 is the odds ratio. As Wikipedia says: Relative risk is different from the odds ratio, although it asymptotically approaches it for small probabilities. 
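To make the distinction concrete, here is a minimal Python sketch. The coefficient -3.318 is the one quoted in the question; the class probabilities in the second half are hypothetical values chosen only to show that the odds ratio and the relative risk need not agree, and the function names are illustrative, not part of Mplus.

```python
import math

# Coefficient of C2#1 ON C1#1 quoted in the question (a log odds ratio).
log_odds_ratio = -3.318
odds_ratio = math.exp(log_odds_ratio)
print(round(odds_ratio, 3))  # about 0.036, the value quoted in the question


def odds_ratio_vs_reference(p_target_a, p_ref_a, p_target_b, p_ref_b):
    """Odds of the target class versus the reference class, group A relative to group B."""
    return (p_target_a / p_ref_a) / (p_target_b / p_ref_b)


def relative_risk(p_target_a, p_target_b):
    """Ratio of the target-class probabilities, group A relative to group B."""
    return p_target_a / p_target_b


# Hypothetical conditional class probabilities (not taken from the thread):
# group A: P(C2#1) = 0.02, P(C2#3) = 0.60; group B: P(C2#1) = 0.30, P(C2#3) = 0.35
or_ab = odds_ratio_vs_reference(0.02, 0.60, 0.30, 0.35)
rr_ab = relative_risk(0.02, 0.30)
print(or_ab, rr_ab)  # the two ratios differ
```

In a multinomial logistic regression the exponentiated coefficient always corresponds to the first quantity (odds against the reference class), so a relative risk has to be computed from the estimated class probabilities themselves.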

Zihan Wei posted on Saturday, May 04, 2013  7:14 am



Dear Drs. Muthen, I have two independent variables, X1 and X2 (both binary), and a dependent variable measured repeatedly at 10 time points, Y1-Y10 (continuous). I'm interested in how Y develops and how X1 and X2 affect the development of Y. It was suggested that Y is heterogeneous and may have subgroups, so I chose GMM to analyze my data. I followed the User's Guide and some other papers and wrote my main model syntax as follows:

ANALYSIS:
TYPE = MIXTURE;
starts=10 2;
stiterations=10;
MODEL:
%OVERALL%
i s | y1@0 y2@1 y3@2 y4@3 y5@4 y6@5 y7@6 y8@7 y9@8 y10@9;
i-s@0;
i s on x1 x2;
c on x1 x2;

I'm totally new to Mplus, so I have some questions about the analysis. (1) Is this a correct model to solve my research problem? (2) I found that the path coefficient estimates for the regression of the slope and the intercept on the predictors X1 and X2 are the same in every class. But I suppose the coefficients may differ across classes. So, if I want to estimate the path coefficients differently in different classes, is that possible? Thanks a lot! Regards, Zihan Wei


To relax the equality constraint, mention the regression in the class-specific part of the MODEL command, for example,

%c#1%
i s ON x1 x2;

ywang posted on Tuesday, July 30, 2013  12:19 pm



Dear Drs. Muthen: I would like to follow up on your response regarding the p-values for the Lo-Mendell-Rubin test a few years ago. In one of your responses, you mentioned that "For the Lo-Mendell-Rubin test, p-value<0.05 indicates the model with one less class is rejected in favour of the estimated model." I am wondering if I can use p<0.1 as a cutoff. I am working on a paper and have a p-value of 0.07 for the 3-class vs. 2-class model. However, there is no variation in slopes (both slopes not significantly different from 0) in the 2-class model, while there is variation in slopes in the 3-class model (some slopes significantly different from 0, and some not). The BOOTSTRAPPED LIKELIHOOD RATIO TEST shows p-values less than 0.001 for either 3 compared to 2 or 4 compared to 3. BIC keeps dropping. I would like to choose the 3-class model instead of the 2-class model based on (1) the LMR LRT p<0.1 and (2) the fact that there are differences in slopes across classes. Do you think I can use p-value<0.10 as justification for my selection of the number of classes? Thanks!


No, I wouldn't fiddle with the alpha level. You could instead argue that the p-value is suggesting 2 classes, but that the 3-class solution adds a substantively meaningful class, while a 4th class does not. Or, you could investigate why BIC keeps dropping; that is often a sign that a different model is needed.
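For readers tracking BIC across class solutions, the following Python sketch shows the standard definition, BIC = -2 logL + p ln(n), where p is the number of free parameters and n the sample size. The log-likelihoods, parameter counts, and sample size are hypothetical placeholders, not results from this thread.

```python
import math


def bic(log_likelihood, n_free_parameters, n_observations):
    """Bayesian information criterion: -2*logL + p*ln(n). Lower is better."""
    return -2.0 * log_likelihood + n_free_parameters * math.log(n_observations)


# Hypothetical fits: number of classes -> (log-likelihood, free parameters).
hypothetical_fits = {2: (-2510.4, 8), 3: (-2489.9, 11), 4: (-2481.0, 14)}
sample_size = 500  # hypothetical

bic_by_classes = {
    k: bic(loglik, n_par, sample_size)
    for k, (loglik, n_par) in hypothetical_fits.items()
}
for k in sorted(bic_by_classes):
    print(k, round(bic_by_classes[k], 1))
```

If BIC continues to fall with every added class, the penalty term is never outweighing the likelihood gain, which is the pattern the reply flags as a hint that the within-class model may be misspecified.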

ywang posted on Tuesday, July 30, 2013  2:14 pm



Thanks a lot for the response. It is a great idea to argue that the 3-class model includes an additional meaningful class. I have a follow-up question. You mention that a different model might be needed when BIC keeps dropping (for both LCGA and GMM). Do you mean a different model such as one including a quadratic slope, or do you mean some other alternative model?


Quadratic, or, for instance, a free growth factor variance in a class that needs it. 


Dear Muthen & Muthen, I am running LCGA and GMM models. I want to compare how parameters and trajectories differ between genders. However, running a multigroup analysis gives very different results. I assume this is because the smallest trajectory classes in the full sample do not contain enough cases of each gender. I am therefore wondering what the best method is. Should I simply add gender as a covariate to the analysis? Many thanks Lina


If you want to compare trajectories by gender, ideally you will do an LCA by gender as a first step to determine whether you find the same classes. If you regress c ON gender, you assume there are no direct effects from gender to the outcome and the intercept and slope growth factors. 


Hi, I am running a latent growth curve model with a binary outcome and logit link. I have 4 time points. The code for my model is summarized below:

CATEGORICAL are SMOKE1 ... SMOKE4;
MODEL:
i s | SMOKE1@1 SMOKE2@2 SMOKE3@3 SMOKE4@4;
i on race_h race_w race_b;
s on race_h race_w race_b;

In the output, the latent variable “I” for the intercept has mean 0 and I have a threshold of 2.3. As far as I know, the threshold is the same as the intercept times -1, right? I want Mplus to output an intercept instead of a threshold, so I added the following line to my code:

[SMOKE1$1@-15 SMOKE2$1@-15 SMOKE3$1@-15 SMOKE4$1@-15 i];

This successfully gives the latent variable “I” a mean which corresponds to the intercept I was looking for, and all the thresholds are -15, which I thought meant that I don’t have the thresholds anymore. However, the results are not exactly the same. The intercept I get is 19.2, but I thought it should be close to 2.3. Also, the coefficient of i on race_h is very different, but everything else seems to be very close with both codes. Is this the appropriate way to ask Mplus to use an intercept instead of a threshold? Thank you for your time. Sebastian

Jon Heron posted on Wednesday, April 16, 2014  8:07 am



Hi Sebastian, I think you should be fixing your thresholds to zero rather than minus 15 to transfer that value to the intercept. To get this working, I would be inclined to temporarily remove the covariates from the model. Once you include covariates for i and s, Mplus no longer quotes you the latent variable means. Confusingly, you are now given an intercept for your Intercept. This will only equal the mean if your covariates are centred. best, Jon 


Hi Jon, Thank you for your response. Just to make sure I am understanding this correctly, what I should include in my code is:

CATEGORICAL are SMOKE1 ... SMOKE4;
MODEL:
i s | SMOKE1@1 SMOKE2@2 SMOKE3@3 SMOKE4@4;
i on race_h race_w race_b;
s on race_h race_w race_b;
[SMOKE1$1@0 SMOKE2$1@0 SMOKE3$1@0 SMOKE4$1@0];

Is this correct? Thank you for being so helpful! Sebastian


That's correct. Except you want to add [i]; to free the intercept mean. You also want to have a time score 0 to clearly define i, so for instance:

i s | SMOKE1@0 SMOKE2@1 SMOKE3@2 SMOKE4@3;


Thank you! 


Hi, I decided just to go with the default threshold instead of trying to create an intercept.

CATEGORICAL are SMOKE1 ... SMOKE4;
MODEL:
i s | SMOKE1@0 SMOKE2@1 SMOKE3@2 SMOKE4@3;
i on race_h race_w race_b;
s on race_h race_w race_b;

I get some output that looks like this (not showing SD or p-values):

I ON
RACE_H 4.345
RACE_W 2.167
RACE_B 4.767
S ON
RACE_H 0.6
RACE_W 4.471
RACE_B 0.77
THRESHOLD 2.128
INTERCEPTS
I 0
S 19.353

Let's say I want to plot the probabilities over time for Hispanics (race_h is a dummy for Hispanics). Does the following make sense, where P(t) is the probability over time:

I = 2.128 - 4.345
S = 19.353 - 0.6
P(t) = 1/(1 + exp(I - S*time)) ?

I have seen some examples of this and I believe this is how they do it, but I wanted to confirm it. Thank you


This gives the probability at the means of i and s. So a conditional probability, not the marginal. The marginal needs to be integrated over the distribution of i and s given covariates. This is what is shown in the plots that Mplus makes (Adjusted estimated means). 
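The difference between the two probabilities can be sketched numerically in Python. The sketch assumes the logit threshold parameterization P(u=1 | i, s) = 1/(1 + exp(threshold - i - s*time)) and normally distributed growth factors; all parameter values are hypothetical, the function names are illustrative, and the simple midpoint rule stands in for whatever integration Mplus uses internally.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def conditional_probability(threshold, i_value, s_value, time):
    """P(u=1) for one specific pair of growth factor values, e.g. their means."""
    return logistic(-threshold + i_value + s_value * time)


def marginal_probability(threshold, i_mean, s_mean, i_var, s_var, is_cov,
                         time, n_points=4000):
    """P(u=1) averaged over the normal distribution of i + s*time."""
    eta_mean = i_mean + s_mean * time
    eta_var = i_var + 2.0 * time * is_cov + time ** 2 * s_var
    if eta_var <= 0.0:
        return logistic(-threshold + eta_mean)
    eta_sd = math.sqrt(eta_var)
    # Midpoint rule over +/- 8 standard deviations of the linear predictor.
    lower, upper = -8.0, 8.0
    width = (upper - lower) / n_points
    total = 0.0
    for k in range(n_points):
        z = lower + (k + 0.5) * width
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += logistic(-threshold + eta_mean + eta_sd * z) * density * width
    return total


# Hypothetical values, not estimates from this thread.
p_cond = conditional_probability(2.0, 0.0, 0.5, time=0)
p_marg = marginal_probability(2.0, 0.0, 0.5, i_var=4.0, s_var=0.0,
                              is_cov=0.0, time=0)
p_no_var = marginal_probability(2.0, 0.0, 0.5, i_var=0.0, s_var=0.0,
                                is_cov=0.0, time=0)
print(p_cond, p_marg, p_no_var)
```

With a nonzero growth factor variance the marginal probability is pulled toward 0.5 relative to the probability evaluated at the factor means; with zero variance the two coincide.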


Hi, Thanks for your previous response. This makes sense now. I come from a mixed models background, and displaying probabilities as I mentioned above wouldn't make sense without integrating out the 'random effects'. I have added the following lines to my code:

Plot:
type = plot2;
series = SMOKE1(1) SMOKE2(2) SMOKE3(3) SMOKE4(4);

When I open the graph and get the window that says 'Select plot to view', the only options I get are 'Sample proportions', 'Item characteristic curves' and 'Information curves'. Nowhere can I select 'Adjusted Estimated Means'. Could you help me with this please? Thank you. Sebastian


Please send the output, graph file, and your license number to support@statmodel.com. 


Dear Dr. Muthen, I have a longitudinal data set with four waves. At wave 1, the participants' baseline ages range from 12 to 21. I want to estimate trajectory memberships based on age, not wave. That is the cohort-sequential design. Is it fine if I restructure the data and create age-based variables, and then use LCGA to identify the trajectories? Does this approach ignore the effect of cohort? Thanks!!!


You can restructure the data so time is age. When you do this you make the assumption that all cohorts come from the same population. You can also take a multiple group approach. See Example 6.18. 
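The restructuring step can be sketched in Python as follows. The sketch assumes the four waves are one year apart, which the thread does not state, and the function and argument names are hypothetical. Each person's wave-ordered values are moved into age-indexed slots, and ages at which the person was not observed stay missing.

```python
def wave_to_age(baseline_age, wave_values, years_between_waves=1,
                min_age=12, max_age=24):
    """Map one person's wave-ordered values onto age-indexed variables.

    Assumes equally spaced waves (one year apart by default). Ages at which
    the person was not observed are left as None, i.e. missing by design.
    """
    by_age = {age: None for age in range(min_age, max_age + 1)}
    for wave_index, value in enumerate(wave_values):
        age = baseline_age + wave_index * years_between_waves
        if age in by_age:
            by_age[age] = value
    return by_age


# Two hypothetical participants from different cohorts.
youngest = wave_to_age(12, [1.0, 2.0, 3.0, 4.0])
oldest = wave_to_age(21, [5.0, 6.0, 7.0, 8.0])
print(youngest)
print(oldest)
```

Because each cohort contributes only four of the age slots, the age-based variables contain a large amount of planned missingness, and pooling them rests on the assumption noted in the reply that all cohorts come from the same population.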

RuoShui posted on Thursday, March 05, 2015  3:24 pm



Dear DRs. Muthen, I am conducting growth mixture modelling for two big groups A and B. I found the same number and similar pattern of trajectory classes. I can compare the mean differences in academic adjustment among subgroups such as A1, A2, A3 and A4 in Mplus. But is there a way to also compare the mean differences between similar subgroup such as A1 and B1 in Mplus? Thank you very much! 


Statistically, you can do any mean difference estimation using Model Constraint. But perhaps you are asking when it is substantively ok to compare across subgroups. 

RuoShui posted on Tuesday, March 10, 2015  8:43 pm



Dear Dr. Muthen, Thank you very much for your response above. My question is more of a technical question. I was able to test whether the residuals of distal outcomes (auxiliary variable: (e)) differ across latent classes within Group A or B. But I couldn't figure out how to simultaneously test in Mplus whether the residuals of distal outcomes differ across groups A and B for the similar latent classes (such as between A1 and B1, and between A2 and B2). Do you have an example that I could follow? Thank you very much for your time.


I don't understand your setting. Are you doing an auxiliary (e) analysis? If so, how do you test residuals of those aux(e) variables? Send output to support if this is hard to explain without it. 


I am running an LCGA model using the 3-step procedure and attempting to look at an outcome specified as a latent variable. The model freely estimates the means and variances of the latent factor for all 4 groups. This produces an error saying that the mean of the latent factor is fixed to zero for the last class (a default solution), but the variance is estimated for that factor. Does this happen because the model is not identified unless the latent factor mean for one group serves as the contrast (i.e., fixed to 0)? When this is done, the estimated means for the other latent factors are difference scores relative to that group. The significance test for the other group means in turn indicates whether they are significantly different from the class where the mean is fixed to zero (i.e., relative difference in means).


Correct on all counts. 


Thank you for the feedback. Building on the previous question (4/2/2015 at 11:13 AM), we are wondering about the most appropriate way to get estimated means/intercepts and standard errors for each class. If no group is specified as the contrast group (no means are fixed to 0), the mean for the last class is automatically fixed and there is no standard error estimated for this class (for clarity, we’ll call this Model 1 here). Is it possible to generate or calculate the standard error for the last class? We tried fixing the mean of the latent outcome for class 1 at the value provided in the output from Model 1. This allows the mean of the latent factor to be freely estimated for the last class, which allows a standard error to be estimated for this class. However, when we do this, the standard errors associated with the means of the latent factor for the other classes are different from what they were in Model 1 (although the means and other parameter estimates are basically unchanged). Is it okay to get the standard error for the last class this way? Alternatively, would it be better to take the square root of the residual variances for the latent outcome variable and present these values with the means (instead of the standard errors)?


You should not want an estimate and SE for the factor mean in the reference group where it is fixed at zero. There is no more information to be obtained. Factors have an arbitrary scale and their metric needs to be set. Mean zero and variance 1 is one way. Only in comparisons across groups or time can you estimate factor means, and then only in comparison with a reference group/timepoint where the mean is fixed (say at zero). There is no disadvantage in this.


Thanks for the reply Bengt. What we are trying to do is set up a table that summarizes group differences on a latent variable within the context of a 5-class LCGA model. We wanted to set it up similar to a traditional ANOVA table that has means and SDs for each of the groups. We can definitely indicate which group has a factor mean set to 0 so the reader understands the estimated mean factor scores are relative to this group. However, we also wanted to include some metric indicating the dispersion around this mean value for each class. Is it appropriate to take the square root of the estimated factor variance to get an SD about the mean scores, even for the contrast group? Our model includes covariates, so the variance estimates are after adjusting for these factors. Thanks.


Yes. 
