

Hi, I am not a statistician, but I am conducting longitudinal research with categorical dependent variables. My variables have 5 different classes. I am having some difficulty understanding the threshold concept. Could someone clear this up for me, and explain how I would use thresholds in my Mplus command sequence? 

bmuthen posted on Monday, September 09, 2002  6:38 pm



Fundamental issues related to thresholds for ordered polytomous (ordinal) dependent variables in regression are described in Agresti (1990; pp. 322-324), as given in the User's Guide. You may view this situation as wanting to measure a continuous dependent variable y*, which you can only observe crudely as y = 0, 1, 2, ..., where a category is observed if y* falls between the corresponding thresholds. Think of y* as the specific ability needed to solve a math item correctly. You solve it if y* exceeds a threshold. For an Mplus input example of a threshold structure in growth models, see page 211 of the User's Guide. 
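To make the threshold idea concrete, here is a minimal Python sketch (not Mplus input) of how a continuous y* cut at ordered thresholds yields category probabilities. It assumes a standard normal y* (a probit-type model); the threshold values are hypothetical and chosen only for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probabilities(thresholds, mean=0.0, cdf=normal_cdf):
    """P(y = k) for an ordinal y obtained by cutting y* at ordered thresholds."""
    # Cumulative probabilities P(y* <= tau_k), padded with 0 and 1 at the ends.
    cumulative = [0.0] + [cdf(tau - mean) for tau in thresholds] + [1.0]
    # Each category probability is the mass of y* between adjacent thresholds.
    return [cumulative[k + 1] - cumulative[k] for k in range(len(cumulative) - 1)]

# Hypothetical thresholds: a 5-category variable (0..4) has 4 thresholds.
taus = [-1.0, 0.0, 0.8, 1.5]
probs = category_probabilities(taus)
for k, p in enumerate(probs):
    print(f"P(y = {k}) = {p:.3f}")
```

With five categories there are four thresholds, and the five probabilities always sum to one. Shifting the mean of y* moves probability mass toward the higher or lower categories without changing the thresholds.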


Now, my ordered categorical variable has categories ranging from 0 to 4. I have four thresholds. Is my first threshold to be set equal to zero or one? 

bmuthen posted on Tuesday, September 10, 2002  8:00 am



The parameterization used in the User's Guide on page 211 is to hold thresholds equal across time. That is, no threshold is fixed at zero. Instead, the intercept factor mean is set at zero. Other parameterizations include the one where the first threshold is instead fixed at zero and the intercept factor mean is free. 


I'm using LCGA and GMM to examine social mobility in a longitudinal dataset by looking at the different career trajectories respondents typically follow. My variables are categorical (ordinal), recording social class at 4 different time points. I would be grateful if you could confirm whether the syntax I am using is correct. My reading of the Version 3 User's Guide is that the only difference between the specification of the LCGA and GMM models is the line 'ALGORITHM=INTEGRATION' in GMM models. Is this right, and can I assume that the slope and intercept variances are zero in the LCGA model when no variance is included in my LCGA output? The reason I ask is because the Version 3 syntax appears to be much simpler than the syntax associated with the Muthen and Muthen article 'Integrating Person-Centered and Variable-Centered Analysis' (Alcoholism: Clinical and Experimental Research, Vol 24:6) and other articles I've printed off the Mplus website. Any enlightenment would be much appreciated! 

bmuthen posted on Sunday, November 14, 2004  12:39 pm



It sounds like your syntax is right. In addition to looking at your regular results, you can request and check Tech1 where the "Psi" matrix will show you if you have any free variances. If the parameters of Psi all have integers 0, then Psi parameters are not involved and you have LCGA. 

Huabin Luo posted on Tuesday, May 17, 2005  4:15 pm



Hi, I am working on a two-part semicontinuous model; the outcome variable has a preponderance of zeros, similar to the model introduced in Lecture 10 on this Mplus website. I have 2506 observations in the sample, and the outcome variable (hospital subacute beds) was measured at 5 timepoints (a 5-year period). Out of the 2506 hospitals, 538 reported "0" subacute beds in each year of the five-year period; the other hospitals reported different patterns of zeros. I decomposed the outcome variable into a Y part and a U part, following the coding method introduced in Lecture 10 on this website, and I coded the outcome variable as missing wherever the U part has a value of "0". My initial run was to estimate the unconditional model with the whole sample (2506 cases); the code is attached below. It did not converge, but it converged when I put in some covariates. Today, I deleted those 538 observations and estimated the model again (now with 1968 cases in the sample) with the same code (without any covariates), and it converged. I am wondering why this happened, and I would appreciate your suggestions. Sincerely, Huabin

The following is the command code I used:

Data: file is may9.dat;
variable: Names are hsa system owner u1-u5 fteh92-fteh96 y1-y5 hmo mimic;
categorical=u1-u5;
usev=u1-u5 y1-y5;
missing=all(99);
analysis: type=missing;
estimator=mlr;
miterations=1000;
algorithm=integration;
coverage=.09;
model:
iu su | u1@0 u2@1 u3@2 u4@3 u5@4;
iy sy | y1@0 y2@1 y3@2 y4@3 y5@4;
iu with sy@0;
iy with su@0;
su with sy@0; 

bmuthen posted on Tuesday, May 17, 2005  4:35 pm



Hard to say without looking at it; please send output and data to support@statmodel.com together with your license number. 

Vic chen posted on Thursday, October 20, 2005  12:44 pm



Hi, Professors, I am running a growth mixture model with 4-category ordinal indicators across 4 time points. I found that when I ran a LCCA model, the output contained RESULTS IN PROBABILITY SCALE. Is this the same information that we can get from the PLOT command under estimated probabilities? After the LCCA model, I ran a GMM model by adding ALGORITHM=INTEGRATION. The RESULTS IN PROBABILITY SCALE section was missing, but I can still get the estimated probabilities plot. Is there a way to get the RESULTS IN PROBABILITY SCALE information under the GMM model? Thanks for your help! 

BMuthen posted on Thursday, October 20, 2005  6:37 pm



Results in probability scale refers to the conditional probability of an item given the latent class. In LCA, the plotted probabilities are the same as these. In GMM with categorical outcomes, the conditional item probabilities require numerical integration. In this situation, conditional probabilities are not computed. The values in the PLOT command are estimated probabilities computed at the mean of the growth factors, so they are not item probabilities conditional on class. 

Thrasher posted on Wednesday, October 26, 2005  5:00 pm



I've been working on a multiple indicator growth curve model with ordinal indicators, and in order to make the model run, I had to deviate a little from the syntax you suggest in example 6.15 on page 94 of the Mplus manual. Although I can find no text explaining it and no other examples of its use in the other growth curve examples in the book, I assume that the second to last line of syntax (i.e., "{u11-u31@1 u12-u33};") fixes to 1 the factor loadings for the scaling indicators of the first order factors, while leaving the other loadings to be freely estimated. After repeatedly getting error messages reporting problems with one of the indicator parameters, I got rid of the statement in the {}’s and tried mimicking your specification by other means. In other words, I used the "@1" command to fix the loadings within the model statement for the factors, as shown in the syntax below for the first indicator associated with each first order factor. The resulting model runs and provides reasonable output; however, I worry that by doing this, I am misspecifying the growth model. Am I? 
variable:
names=weight strata canada uk us aust age male other edu income innoads1 innoads2 innoads3 inreg1 inreg2 inreg3 intruth1 intruth2 intruth3 inresp1 inresp2 inresp3 proads1 antiads1 warlab1 smkbans1 hlthcon1 functn1 qtint3_4 quitrct1 quit3 smkstat1 hvysmk1;
usevariables=innoads1-inresp3;
categorical=innoads1-inresp3;
missing are .;
stratification=strata;
weight=weight;
analysis:
type=meanstructure complex missing;
model:
wave1 by innoads1@1 inreg1 (1) intruth1 (2) inresp1 (3);
wave2 by innoads2@1 inreg2 (1) intruth2 (2) inresp2 (3);
wave3 by innoads3@1 inreg3 (1) intruth3 (2) inresp3 (3);
[innoads1$1 innoads2$1 innoads3$1] (4);
[innoads1$2 innoads2$2 innoads3$2] (5);
[innoads1$3 innoads2$3 innoads3$3] (6);
[innoads1$4 innoads2$4 innoads3$4] (7);
[inreg1$1 inreg2$1 inreg3$1] (8);
[inreg1$2 inreg2$2 inreg3$2] (9);
[inreg1$3 inreg2$3 inreg3$3] (10);
[inreg1$4 inreg2$4 inreg3$4] (11);
[intruth1$1 intruth2$1 intruth3$1] (12);
[intruth1$2 intruth2$2 intruth3$2] (13);
[intruth1$3 intruth2$3 intruth3$3] (14);
[intruth1$4 intruth2$4 intruth3$4] (15);
[inresp1$1 inresp2$1 inresp3$1] (16);
[inresp1$2 inresp2$2 inresp3$2] (17);
[inresp1$3 inresp2$3 inresp3$3] (18);
[inresp1$4 inresp2$4 inresp3$4] (19);
i s | wave1@0 wave2@1 wave3@2;
innoads1 with innoads2 innoads3;
innoads2 with innoads3;
inreg1 with inreg2 inreg3;
inreg2 with inreg3;
intruth1 with intruth2 intruth3;
intruth2 with intruth3;
inresp1 with inresp2 inresp3;
inresp2 with inresp3; 


The statement {u11-u31@1 u12-u33} refers to the scale factors, not the factor loadings. Go back to the syntax in Example 6.15 and then send your input, data, output, and license number to support@statmodel.com and I will try to help you. 

Anonymous posted on Saturday, November 05, 2005  2:49 pm



I am interested in knowing the commands behind the growth model syntax when using categorical variables. In particular, I'm interested in knowing how to free up thresholds in order to test for threshold invariance across time. While keeping the first two thresholds equivalent across time, I've attempted to free up the others by stating the following after the standard growth model syntax: [y1$3 y2$3 y3$3]; I've also tried inserting asterisks in order to free them up: [y1$3* y2$3* y3$3*]; In either case, the default of equivalent thresholds across time holds up. Is there any way to free these parameters while using the standard growth model syntax? If not, could you let me know the commands behind the syntax so that I can free up these parameters? 


I would need to see your input, data, output, and license number at support@statmodel.com to answer this. I am not sure why this is happening. 

HeeJin Jun posted on Wednesday, August 08, 2007  11:40 am



Dear Linda, I am doing GMM with categorical outcomes (famdin96, famdin97 and famdin98 are the categorical outcomes, each with 3 ordered categories). I modified Example 8.4 and added the PLOT command.

VARIABLE: NAMES ARE boy famdin96 famdin97 famdin98 idm kidid mplusid;
USEV = famdin96 famdin97 famdin98; !boy;
CLASSES = c (2);
CATEGORICAL = famdin96 famdin97 famdin98;
!CLUSTER = idm;
IDVARIABLE=mplusid;
MISSING are famdin96 famdin97 famdin98(999);
ANALYSIS: TYPE = MIXTURE MISSING; !COMPLEX;
STARTS = 100 50;
STITERATIONS = 20;
ESTIMATOR = MLR;
ALGORITHM = INTEGRATION;
MODEL: %OVERALL%
i s | famdin96@0 famdin97@1 famdin98@2;
!i s ON boy;
!c#1 ON boy;
PLOT: TYPE IS PLOT3;
SERIES IS famdin96 famdin97 famdin98 (s) ;

What I got from the graph menu are: Histograms, Scatterplots, Sample proportions, Estimated probabilities, Item characteristic curves, and Information curves. Also, when I included i s ON boy; c#1 ON boy; I didn't even get the options of Sample proportions and Estimated probabilities. Could you show me how to create trajectory curves from these results? Thank you for your help. HeeJin 


These plots are not available when numerical integration is used and covariates are included in the model. 


Hello, I am new to MPLUS and I am also running a GMM analysis with categorical outcomes similar to the example cited directly above. I was wondering if you could help me with the interpretation of the results if they are not in probability scale. For example, how would you interpret the beta estimate of the covariate 'boy' above? Also, what type of model is used in the statement 'c#1 ON boy'? Is it logistic regression? Do you have a paper using the method or describing the method that I may use for reference? Further, could this model include time-varying covariates? 


The regression of a categorical latent variable on a covariate or set of covariates is a multinomial logistic regression. See Calculating Probabilities From Logistic Regression Coefficients in Chapter 13 for a discussion of multinomial logistic regression. 
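As an illustration of how logistic regression coefficients for a latent class variable translate into class probabilities, here is a small Python sketch. It assumes the usual convention that the last class is the reference class with its logit fixed at zero; the intercept and slope values are hypothetical, not estimates from any model in this thread.

```python
import math

def class_probabilities(intercepts, slopes, x):
    """Class probabilities from multinomial logit coefficients.

    intercepts and slopes hold one entry per non-reference class;
    the last class is the reference, with its logit fixed at 0.
    """
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    denom = sum(math.exp(l) for l in logits)
    return [math.exp(l) / denom for l in logits]

# Hypothetical 2-class example: intercept of c#1 = 0.5, slope of c#1 ON boy = -0.8.
p_girls = class_probabilities([0.5], [-0.8], x=0)
p_boys = class_probabilities([0.5], [-0.8], x=1)
print("girls:", [round(p, 3) for p in p_girls])
print("boys: ", [round(p, 3) for p in p_boys])
```

With two classes this reduces to ordinary binary logistic regression, so the exponentiated slope is the odds ratio for membership in class 1 versus class 2 when comparing boys with girls.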


Dear Dr. Muthen, I am running a growth model with binary responses across 4 time points. I have a basic question about how to interpret the unconditional model. For instance, how should I interpret level and trend? What importance should I give to thresholds? Thank you for your time. 


With a categorical outcome, individual development across time is expressed in terms of probabilities rather than values, as it is with continuous outcomes. Chapter 8 of the Bollen and Curran book Latent Curve Models discusses this topic. 

Emily Blood posted on Friday, November 28, 2008  9:10 am



I am using the MONTECARLO feature to repeatedly fit a conditional latent growth model with repeated binary outcomes (y), repeated predictors (z), and a repeated mediator (m) at 6 timepoints. I am interested in the total effect of each z on each y. I use MLR and a logit link. I have generated the data with a logit link, so I do not want to use a probit link (for which the same setup with WLSMV and a probit link works with the INDIRECT option). I get the error "MODEL INDIRECT is not available for analysis with ALGORITHM=INTEGRATION.". Can I not use the INDIRECT option when I use MLR and a logit link? 


No, but indirect effects can be evaluated as products of slopes defined in Model Constraint. You should also read the mediational paper MacKinnon, D.P., Lockwood, C.M., Brown, C.H., Wang, W., & Hoffman, J.M. (2007). The intermediate endpoint effect in logistic and probit regression. Clinical Trials, 4, 499-513, which is on our web site. 
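The arithmetic behind "product of slopes" can be sketched in Python as follows. This is an illustration only, not what Mplus does internally: it uses the first-order delta-method (Sobel) standard error and assumes the two slope estimates are uncorrelated. All numbers are hypothetical.

```python
import math

def indirect_effect(a, se_a, b, se_b):
    """Indirect effect a*b with a first-order delta-method (Sobel) SE.

    Assumes the estimates of a and b are uncorrelated.
    """
    estimate = a * b
    se = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    return estimate, se

# Hypothetical slopes: z -> m (a) and m -> y (b), with their standard errors.
est, se = indirect_effect(a=0.5, se_a=0.1, b=0.4, se_b=0.2)
print(f"indirect = {est:.3f}, SE = {se:.3f}, z = {est / se:.2f}")
```

With a logit link the product is on the logit scale, so its interpretation is less direct than in a linear model, which is the issue the MacKinnon et al. paper addresses.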

Emily Blood posted on Monday, December 01, 2008  6:45 am



Thank you, Dr. Muthen. I will read this paper. However, since I am running a Monte Carlo study, I don't get all of the regression coefficients for each replication, so I can't calculate each indirect and total effect; I could only calculate the product of two mean coefficients. 

Emily Blood posted on Monday, December 01, 2008  6:57 am



I will try the MODEL CONSTRAINT command to obtain the total effect. Currently I am not using that. Thank you. 


I am running growth models where the variables at each wave of measurement are binary (0,1) categorical variables. Do I interpret the unstandardized parameter estimates the same way as I would in an ordinary logistic regression (i.e., exponentiate the b coefficient to get an odds ratio)? 


If you are using maximum likelihood estimation, the default is logistic regression so parameter estimates can be exponentiated. If you are using the default weighted least squares estimator, you obtain probit regression coefficients which should not be exponentiated. 
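For the logit case, the exponentiation step looks like this in Python. The coefficient and standard error are hypothetical; the confidence interval is the usual Wald interval computed on the logit scale and then exponentiated.

```python
import math

def odds_ratio(b, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logit coefficient and its SE."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# Hypothetical logit coefficient and standard error.
or_est, or_low, or_high = odds_ratio(b=0.40, se=0.15)
print(f"OR = {or_est:.2f} (95% CI {or_low:.2f} to {or_high:.2f})")
```

This applies only to coefficients from a logistic regression of a categorical outcome. Probit coefficients, and regressions in which the dependent variable is a continuous growth factor, are not odds ratios after exponentiation.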

ywang posted on Tuesday, September 29, 2009  10:27 am



Dear Drs. Muthen: I am new to LGM with categorical variables. I have a beginner's question. The LGM model without covariates showed no significant variance for either i or s. However, when I included the regression of both i and s on several covariates in the LGM, significant relationships were found between some covariates and i or s. In this case, can I report the relationship between the covariates and i or s in the paper? Or, since there was no significant variance for i or s in the model without covariates, should I not further explore the LGM by including any covariates to predict i or s? Thank you very much for your help in advance! 

ywang posted on Tuesday, September 29, 2009  11:36 am



Dear Drs. Muthen: I have another question about LGM with categorical variables. When I used ML or WLSE, the significance of the coefficients was very different. For example, ML shows no significant relationship between depression and s; however, WLSE shows a significant relationship (shown below). I used SAS software to double-check, and it seems that the SAS GEE model is consistent with the Mplus ML estimation. My questions are (1) why is there an inconsistency between ML and WLSE, and (2) which one should I trust? Thanks a lot!

WLSE: S ON STNBDI_B 0.708 0.176 4.019 0.000
ML: S ON STNBDI_B 0.708 0.679 1.044 0.297 


Regarding your first question, adding covariates may increase the power to detect variation in i and s. I would report the results with covariates. Regarding your second question, I assume you mean WLSMV, not WLSE. It is hard to answer you without looking at the outputs. However, with categorical outcomes, the default growth model used with WLSMV is more general than the model used with ML: it allows residual variances for latent response variables to be different across time. SAS GEE probably uses the ML model. If you want a more precise answer, support needs to look at your ML and WLSMV runs, so please send input, output, data and license number to support@statmodel.com. 


Dear Linda, I am trying to run a LGM model for offending measured as a categorical variable (there are 6 categories in the offending scale) and test whether victimization measured repeatedly across six waves of the study has both distal and proximal effects on offending. In order to address these questions, I ran 1) parallel process LGMs and 2) autoregression models for 6 different age cohorts of adolescents in the sample. Next, I would like to include several covariates simultaneously in all of my models. All my covariates are time-invariant and categorical (e.g., race, family income, urbanity, etc.). When I include them, I get the warning statement “ALGORITHM=INTEGRATION is not available for multiple group analysis. Try using the KNOWNCLASS option for TYPE=MIXTURE”. Could you please advise me on how to solve this? 


It sounds like you are putting your covariates on the CATEGORICAL list. You should not do this. It is for dependent variables only. If your covariates are nominal, you need to create a set of dummy variables. Covariates should be binary or continuous. 


Dear Linda, I did not include covariates on the CATEGORICAL list. I included offending (dv) on this list only. 


With a categorical dependent variable and maximum likelihood estimation, for multiple group analysis you need to use TYPE=MIXTURE and the KNOWNCLASS option. This is multiple group analysis. You can use the GROUPING option with the default WLSMV estimator. 


Dear Linda, Thank you very much for your help. I was able to include race as a covariate using the KNOWNCLASS option for TYPE=MIXTURE. I cannot figure out how to include several covariates simultaneously. Can I use knownclass option for a list of covariates? Thank you. 

Jon Heron posted on Tuesday, July 13, 2010  3:07 am



Hi Arina, I expect you'll need to derive a single KNOWNCLASS variable for your grouping, e.g. two binary variables give you a 4-level categorical variable. I'd be wary of going too far down this route though; you're likely to run into estimation problems if you have too many groups. From your initial post I'm wondering why you're using groups at all rather than just regressing your growth factors on your covariates. 


I agree. KNOWNCLASS is for multiple group analysis. It is not a way to specify covariates. 


Hello! I'm running GMM with categorical variables. It seems that 5 trajectories fit the data best. However, in the output it says that "one or more parameters were fixed to avoid singularity.." and it shows that 11 parameters were fixed (but the model estimation terminated normally). Why is this and can I trust the results? For example, some of the slopes were fixed which makes the interpretation of significant changes confusing. Thank you! 


It depends on what is fixed. Send the full output and your license number to support@statmodel.com. 


Dear Dr. Muthen, We are trying to fit a linear growth model with three points in time. Mplus tells us that the number of df is negative. Do we need at least four time points to fit a linear growth model in Mplus? Thank you, Fernando 


For a continuous outcome, you will have one degree of freedom for a conventional growth model. The model will not be identified if the outcome is categorical. Then you will need four time points. 
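The one degree of freedom for the continuous case can be checked by counting. The Python sketch below assumes what is usually meant by a conventional linear growth model: outcome intercepts fixed at zero, two growth factor means, two growth factor variances, one covariance, and a free residual variance at each time point. It does not cover the categorical case, where the counting is different.

```python
def linear_growth_df(n_timepoints):
    """Degrees of freedom of a conventional linear growth model, continuous outcome."""
    t = n_timepoints
    # Sample statistics: t means plus t(t+1)/2 variances and covariances.
    n_statistics = t + t * (t + 1) // 2
    # Parameters: 2 factor means, 2 factor variances, 1 factor covariance,
    # and t residual variances.
    n_parameters = 5 + t
    return n_statistics - n_parameters

for t in (2, 3, 4):
    print(f"{t} time points: df = {linear_growth_df(t)}")
```

With three time points there are 9 sample statistics and 8 parameters, leaving one degree of freedom; with two time points the count goes negative.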


Thank you. Yes, the outcomes are categorical. One more question: we are fitting this model with categorical indicators:

MH4 BY nervw4r downw4r deprw4r;
MH6 BY nervw6r downw6r deprw6r;
MH8 BY nervw8r downw8r deprw8r;
i s | MH4@0 MH6@0.1 MH8@0.2;

We get this message:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 28. THE CONDITION NUMBER IS 0.468D-16. FACTOR SCORES WILL NOT BE COMPUTED DUE TO NONCONVERGENCE OR NONIDENTIFIED MODEL.

Is the message also due to the fact that we are working with categorical indicators? We also correlated the errors across time and got the same message (e.g. nervw4r with nervw6r; nervw6r with nervw8r; nervw4r with nervw8r;). Fernando 


It would seem so. If you want a definitive answer, you would need to send the full output and your license number to support@statmodel.com. 

Yuchun Peng posted on Wednesday, February 16, 2011  9:11 am



Hi, I have a basic question. Could you help explain the main difference between the item conditional probability and the estimated probability? As you posted above, the estimated probability is computed at the mean of the growth factors. But given that the mean intercept for the last cluster was set at 0, how can the estimated probability for this cluster be computed? Thanks, Vicky 


If you have a latent class model, the probability is the sum over the classes of the product of the class probability and the conditional item probability. If you have a latent trait model (= factor model, = IRT model), then the sum is replaced by an integral over the range of the normal factor. 
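For the latent class case, the sum described above is a plain weighted average. A minimal Python sketch with hypothetical class probabilities and conditional item probabilities:

```python
def marginal_item_probability(class_probs, conditional_item_probs):
    """Sum over classes of P(class) * P(item = 1 | class)."""
    return sum(pi * p for pi, p in zip(class_probs, conditional_item_probs))

# Hypothetical two-class model.
class_probs = [0.3, 0.7]             # P(c = 1), P(c = 2)
conditional_item_probs = [0.9, 0.2]  # P(u = 1 | c = 1), P(u = 1 | c = 2)

p_marginal = marginal_item_probability(class_probs, conditional_item_probs)
print(f"P(u = 1) = {p_marginal:.2f}")
```

In the latent trait case the same idea applies, with the weighted sum over classes replaced by numerical integration over the normal distribution of the factor.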

gibbon lab posted on Tuesday, October 11, 2011  12:05 pm



Hi Professors, I am running a LGA with four ordinal categorical variables. I followed the example given in the short course on growth modeling, but Mplus 5.21 gives me an error message. The code and output are below. Thanks.

categorical are t3smklast3bin t4smk3cat t5smk3cat t6smk3bin ;
MISSING ARE ALL (99);
analysis: parameterization is theta;
MODEL:
smk_i by t3smklast3bin@1 t4smk3cat@1 t5smk3cat@1 t6smk3bin@1;
smk_s1 by t3smklast3bin@0 t4smk3cat@3 t5smk3cat@6 t6smk3bin@8;
[t3smklast3bin-t6smk3bin] (1);
[smk_i@0 smk_s1];

The output says:

*** ERROR The following MODEL statements are ignored:
* Statements in the GENERAL group:
[ T3SMKLAST3BIN ]
[ T4SMK3CAT ]
[ T5SMK3CAT ]
[ T6SMK3BIN ] 


For categorical variables, you refer to thresholds, not means. The statement should be: [t3smklast3bin$1-t6smk3bin$1] (1); I suggest using the growth language because the defaults are better for growth models. The BY statement defaults are for factor analysis. Your model statement would then be: smk_i smk_s1 | t3smklast3bin@0 t4smk3cat@3 t5smk3cat@6 t6smk3bin@8; Everything else would be done as the default. 

Mark Litt posted on Tuesday, February 07, 2012  7:26 am



Dear Dr. Muthen, I am trying to run GMM or LCGA with categorical DVs measured at unequal time intervals over 14 months. The basic LCGA model is as follows:

USEVARIABLES ARE pda0900 abst1 abst2 abst3 abst4 abst5;
CLASSES = AbstC (4);
CATEGORICAL = abst1 abst2 abst3 abst4 abst5;
ANALYSIS: TYPE = MIXTURE;
STARTS = 1000 2;
STITERATIONS = 20;
MODEL: %OVERALL%
i s q | abst1@1 abst2@2 abst3@3 abst4@4 abst5@5;
abstc on pda0900;
i s q on pda0900;

The 'abst' variables are abstinence from drug use. Since no one was abstinent at time 0, I started at time 1, and used a continuous baseline variable 'pda0900' as a covariate. No matter how I adjust the model, I get a message like that below. The dataset has almost no missing data, so that's not causing the identification issue. Any suggestions?

WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS. ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. 


You say the measurement times were not equidistant but the time scores you give are equidistant. A zero time score defines the intercept growth factor. See the Topic 3 course handout and video on the website where time scores are discussed. 
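One way to derive time scores that reflect unequal spacing is to subtract the time of the occasion chosen as the origin from each measurement time, so that occasion gets a time score of zero. The Python sketch below only illustrates that arithmetic; the month values are hypothetical, since the actual measurement times are not given in the post.

```python
def time_scores(times, origin_index=0, unit=1.0):
    """Time scores relative to a chosen origin, in multiples of `unit`."""
    origin = times[origin_index]
    return [(t - origin) / unit for t in times]

# Hypothetical measurement occasions, in months since baseline.
months = [1, 2, 4, 8, 14]

scores = time_scores(months)
print(scores)  # the first occasion gets time score 0
```

The resulting values would then be used after the @ signs in the growth statement. Rescaling by `unit` (for example, dividing by 12 to work in years) changes the scale of the slope but not the fit of the model.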

EFried posted on Wednesday, February 08, 2012  6:35 am



A quick question about GMM with 5 measurement points and a categorical outcome variable (values 0, 1, 2, 3): The papers I find and your videos mostly refer to continuous outcome variables when it comes to fit indices of GMM. For categorical outcome variables, (1) Is the BIC also the most important index when it comes to model fit, and (2) are t11 and t14 indices also the best to determine number of classes? Or do I have to apply different indices for categorical outcome variables? Thank you so much T. 

EFried posted on Wednesday, February 08, 2012  6:39 am



(Sorry, I forgot one question: the MPLUS manual seems to add "algorithm = integration" to all categorical GMMs. Is that correct? Thank you!) 

Mark Litt posted on Wednesday, February 08, 2012  12:50 pm



Dear Dr. Muthen, Thank you for pointing out my problems with time scores. I decided a safer bet was to use contin vars. I tried to run this causal chain model, where PDA is a contin var, equally spaced:

ANALYSIS: TYPE = MIXTURE;
STARTS = 300 2;
STITERATIONS = 20;
ALGORITHM=INTEGRATION;
MODEL: %OVERALL%
i s q | pda0900@0 pda0601@1 pda0902@2 pda0903@3 pda0904@4 pda0905@5;
S@0;
i q on zmjse1 zcsstot1;
pdac on zmjse1 zcsstot1;
zmjse1 on Attend ContAbst zmjse0;
zcsstot1 on attend contabst zcsstot0;
attend contabst on cond2 cond3;

When run, I get FATAL ERROR RECIPROCAL INTERACTION PROBLEM. Thanks, //M Litt 


Please send your output and license number to support@statmodel.com. 


EFried: You can also use TECH10 with categorical outcomes. See the following paper, which is available on the website, for further information: Muthén, B. & Asparouhov, T. (2009). Growth mixture modeling: Analysis with non-Gaussian random effects. In Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G. (eds.), Longitudinal Data Analysis, pp. 143-165. Boca Raton: Chapman & Hall/CRC Press. 

EFried posted on Friday, February 10, 2012  6:57 am



Thank you for the tech10 recommendation. The paper (Muthén, B. & Asparouhov, T. (2009)) states: "With categorical and count outcomes, model fit may be investigated with respect to univariate and bivariate frequency tables, as well as frequencies for response patterns that do not have too small expected counts." Is there some information on how to read the tech10 output? I don't quite get how to deduce model fit from it. Also, I see some cells as "deleted" in tech10; I guess that refers to this warning: "Of the 2580 cells in the latent class indicator table, 6 were deleted in the calculation of chi-square due to extreme values." Is that normal or does it imply corrupt data? Thank you, Eiko 


See the crime example in the paper. They discuss TECH10. 

jpmv posted on Wednesday, May 30, 2012  7:46 am



Dear all, I’m conducting some latent growth models using binary outcomes at 6 time points. In an unconditional model using a maximum likelihood estimator I obtained significant variances around the intercept and slope. Thus, I’ve conducted some conditional models and found a significant unstandardized estimate for one covariate on the intercept growth factor equal to 4.006 (SE = .876). Now, I’m having concerns about interpreting this estimate, which would correspond to an OR equal to 55 (CI = 9.81-303). How can I interpret such a large effect? Also, do you know any good empirical paper that used latent growth curves with binary outcomes? Any suggestions are welcome. Thank you in advance for your answer! 


The effect of the covariate on the intercept growth factor should not be exponentiated because you are considering a linear regression: your DV is a continuous variable (the intercept growth factor), not a binary one. For examples, you can look at the Raudenbush-Bryk book. 

jpmv posted on Friday, June 08, 2012  7:03 am



Dear Bengt, Thank you for your previous post! I would like to probe the effect of a covariate on the slope of my model. By default, the mean of the intercept growth factor is fixed at zero and the thresholds are estimated. Yet, to probe the effect of a covariate on the mean of the linear growth factor, I guess I would also need to estimate the mean of the intercept growth factor. So, my questions are:
- Can I estimate an intercept growth factor mean if my outcomes are binary and, if so, do I need to fix the thresholds?
- What would be the main difference between these two models (with thresholds and fixed intercept vs. with intercept and fixed thresholds)?
- If I estimate the intercept growth factor, does the following equation, Y = a + b*time + c*a + c*b*time, represent a conditional latent growth model in which Y is the logit of the behavior (I’m using MLR and algorithm integration) and the other terms are, respectively, the intercept, the slope, the effect of the covariate on the intercept, and the effect of the covariate on the slope?
Thank you in advance for your answer! 


The two models you mention fit the data identically; they are simply reparameterizations of each other. We discuss this in Topic 4. See Topics 3 and 4 to answer your last question. 
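A small numeric check of why the two parameterizations are equivalent for a binary outcome with a logit link: the probability depends only on the growth factor values minus the threshold, so a free threshold with the intercept mean at zero gives the same probabilities as a zero threshold with a free intercept mean. The Python sketch below evaluates the probability at the growth factor means; all parameter values are hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def prob_y1(threshold, intercept_mean, slope_mean, time_score):
    """P(y = 1) at the growth factor means: logistic(-tau + alpha_i + alpha_s * t)."""
    return logistic(-threshold + intercept_mean + slope_mean * time_score)

time_points = [0, 1, 2, 3]

# Parameterization A: threshold free (1.2), intercept factor mean fixed at 0.
probs_a = [prob_y1(1.2, 0.0, 0.3, t) for t in time_points]
# Parameterization B: threshold fixed at 0, intercept factor mean free (-1.2).
probs_b = [prob_y1(0.0, -1.2, 0.3, t) for t in time_points]

for t, pa, pb in zip(time_points, probs_a, probs_b):
    print(f"t = {t}: A = {pa:.4f}, B = {pb:.4f}")
```

Because the shift is a constant, the same equality holds for any individual's growth factor values, which is why the two models fit identically.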

Raed Ali posted on Monday, April 08, 2013  10:59 pm



Hi, I have a continuous outcome measured over six waves. The patients were randomized into four treatment categories coded as 1, 2, 3, and 4. I want to test the growth among the four categories (predictor variable, X) of patients to see which category has the best growth. I know that I should create 3 dummy variables for my categorical predictor but I don't know what is Mplus syntax to do that? Also, what should be the syntax for the MODEL command? I want to include a gender (coded 0 and 1) as a predictor of growth, what should be the syntax for the MODEL command? Thank you. 


You can use the DEFINE command to create the three dummy variables. Or you could do a multiple group analysis if the group sizes are large enough. See Example 6.10. 
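The logic of the three dummy variables, independent of Mplus syntax, can be sketched in Python. The choice of category 4 as the reference group is an assumption made for illustration; any one category can serve as the reference.

```python
def dummy_code(value, categories, reference):
    """0/1 indicators for every category except the reference category."""
    return [1 if value == c else 0 for c in categories if c != reference]

categories = [1, 2, 3, 4]  # four treatment groups
for treat in categories:
    d1, d2, d3 = dummy_code(treat, categories, reference=4)
    print(f"Treat = {treat}: D1 = {d1}, D2 = {d2}, D3 = {d3}")
```

Each growth factor regression coefficient on a dummy variable is then the difference between that treatment group and the reference group. Every case needs a defined 0 or 1 on each dummy, and exactly one category must be left out.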

Raed Ali posted on Tuesday, April 09, 2013  10:24 am



I saw the example you referred to. My predictor (Treat) is coded as 1, 2, 3, and 4, representing four different treatment categories. I have one continuous outcome variable measured over 6 waves (Log1-Log6). This is the syntax I am using:

VARIABLE: names are id Treat Age Male Log1-Log6;
usevariables are Log1-Log6 Treat Male;
missing are all (99999);
Define: if (Treat eq 1) then ????
MODEL: i s | Log1@0 Log2@1 Log3@2;
int lin on Treat;

I don't know how to do the dummy variables or the multiple group analysis (the sample size is large for each category). I would deeply appreciate it if you could provide me with the complete syntax for the two conditions (dummy vars using DEFINE and multiple group analysis). Thanks 

Raed Ali posted on Tuesday, April 09, 2013  11:23 am



Referring to the previous problem, here is my syntax:

VARIABLE: names are id Treat Age Male Log1-Log6;
USEVARIABLES are Log1 Log2 Log3 D1 D2 D3;
missing are all (99999);
DEFINE:
IF (Treat == 1) THEN D1 = 1 ;
IF (Treat > 1) THEN D1 = 0 ;
IF (Treat == 2) THEN D2 = 1 ;
IF (Treat /= 2) THEN D2 = 0 ;
IF (Treat == 4) THEN D3 = 1;
IF (Treat /= 4) THEN D3 = 0;
MODEL: int lin | Log1@2 Log2@1 Log3@0;
int lin on D1 D2 D3;

I got the following WARNING:

THE MODEL ESTIMATION TERMINATED NORMALLY WARNING: THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR AN OBSERVED VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO OBSERVED VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO OBSERVED VARIABLES. CHECK THE RESULTS SECTION FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE LOG3.

1) How can I solve this problem? 2) If I keep it as it is, are my results still valid? I mean, can I use the Mplus output in this case and draw valid conclusions? Thanks! 


Please send your output and license number to support@statmodel.com. Please keep your posts to one window. 

Weiwei Liu posted on Friday, September 20, 2013  6:57 am



In a paper examining changes in maternal drinking at 6 time points (preconception, last 3 months of pregnancy, 9 months, 2 years, 4 years and 5 years after delivery), we estimated a conventional growth model with ordinal outcomes using 4-level categorical measures (0 drinks; less than 1 drink; 1-3 drinks; 4 or more drinks per week). While model fit points to a model with intercept, slope and quadratic slope, some of the standardized residuals are fairly large (especially for the pregnancy time point), leading us to think that the measurement of alcohol use may be different for the pregnancy time point, i.e. the quadratic slope alone does not capture the drastic reduction in alcohol intake during pregnancy. Alcohol use may mean something entirely different during pregnancy (in a sense it might be perceived as “deviant”). A piecewise growth model with two slopes did not resolve the issue either. When freeing the thresholds for the pregnancy time point, the solution improves, and in fact alcohol consumption seems different during pregnancy. We seem to have a combination of nonlinear change and measurement variance. I understand the default is to have time-invariant thresholds (longitudinal measurement invariance). Is it legitimate to test for longitudinal measurement variance? Is a model where the thresholds for one time point are different from the thresholds at the other time points defensible? 


I think you can let a threshold for a certain time point be free if you have a good interpretation for it. You are then deviating from a standard growth model, although you can think of it as an effect of a time-varying covariate shifting the level of the growth curve at that point.
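
To illustrate the time-varying covariate reading, here is a minimal Python sketch under a probit link. All numbers (the thresholds, the latent level `eta`, the shift `d`) are hypothetical and are not taken from the poster's model. It shows that shifting every threshold at one time point by the same amount `d` gives the same category probabilities as keeping the thresholds invariant and lowering the level of the growth curve by `d` at that time point. If the freed thresholds move by different amounts, this equivalence no longer holds exactly.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(eta, thresholds):
    """P(y = k | eta) for an ordinal probit with ordered thresholds.

    P(y <= k) = Phi(tau_{k+1} - eta), so each category probability is
    the difference of two adjacent cumulative probabilities.
    """
    cum = [0.0] + [normal_cdf(tau - eta) for tau in thresholds] + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]


# Hypothetical values: three thresholds for a 4-category drinking item.
taus = [-0.5, 0.4, 1.3]
eta = 0.2  # latent level implied by the growth curve at this time point
d = 0.8    # hypothetical upward shift of all thresholds at this time point

shifted_thresholds = category_probs(eta, [tau + d for tau in taus])
shifted_level = category_probs(eta - d, taus)

for k, (a, b) in enumerate(zip(shifted_thresholds, shifted_level)):
    print(f"category {k}: {a:.4f} (thresholds + d) vs {b:.4f} (level - d)")
```

The two columns agree, which is the sense in which a uniform threshold shift at one time point acts like a covariate effect on the level at that time point.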

Weiwei Liu posted on Monday, September 23, 2013  6:26 am



Thank you so much Bengt! This is really good to know. 


Hi all, I'm running a latent growth curve model (4 observations). What estimator does one use in an LGCM if the latent variable at one time point is made up entirely of categorical indicators, but at other time points the same latent variable is made up of continuous indicators? It seems WLSMV with Theta parameterization would misestimate the latent variables with continuous indicators, while an MLR estimator would misestimate the latent variable with categorical indicators. Thanks!


Both WLSMV and ML can handle a combination of categorical and continuous variables. Note that ML doesn't mean treating variables as continuous. A bigger issue is how you can claim measurement invariance over time for your latent variable indicators so that the latent variable is on the same scale at the different timepoints as required for growth modeling. 


Ah, yes. So if I can find a continuous variable to mix into the categorical time point that makes theoretical sense and is on a common scale with the later indicator variables, then there wouldn't be a problem? I had assumed it would be a problem to have mixed continuous and categorical variables.


You are fine as long as you have some variables (categorical or continuous doesn't matter) at the different time points for which you can apply measurement equality constraints.


wonderful, thank you so much. 


I am running a growth curve with 4 binary indicators over time. My categorical model used weighted least squares by default, and though I got decent model fit, when I added any predictors I got Heywood cases. The data I am using is complex (and this is specified in the model), so I tried changing the default estimator to the MLR estimator instead, which increased my model fit and got rid of my Heywood cases. I just want to double check a few things to make sure that my analysis is correct: 1) Is it acceptable to use the MLR estimator with a binary growth curve using complex data? 2) Do I interpret the effects of covariates on the latent intercept and slope as logit coefficients? 3) Does this model still assume that the latent intercept and slope are continuous latent variables, or that they are binary? Thank you very much for your advice on how to proceed. 


1. Yes. 2. Growth factors are continuous variables. The regressions of growth factors on covariates are linear regressions. 3. They are continuous. 


I am running a growth mixture model with an ordinal substance use variable using a WLSMV estimator from early adolescence into adulthood. I can't include all of the assessments in the model because there are zero cells in some of the more frequent use categories at the earliest assessments. I was wondering if there is a strategy for working around this problem. I considered creating a "dummy" case that endorsed the highest use category across the early use assessments to get the model to run, but this does not seem like an appropriate fix. Collapsing the categories is not appropriate because it ignores important variability that occurs as use escalates into adulthood. Suggestions?


You can't use WLSMV for a mixture model. You must use maximum likelihood. See the * setting of the CATEGORICAL option in the user's guide. This allows different categories across time with maximum likelihood estimation. 


Hi Linda & Bengt! I am running a latent growth curve model with a binary DV (marriage), conditional on college status (3 categories, dummy coded). I am not sure how to interpret the results with a binary/categorical DV. Here's what I got for my model:
                Estimate    S.E.   Est./S.E.   P-Value
I ON
    COLL0          0.164   1.571       0.105     0.917
    COLL1          1.598   1.234       1.294     0.196
S ON
    COLL0          0.174   0.665       0.262     0.793
    COLL1          0.541   0.549       0.985     0.325
Q ON
    COLL0          0.055   0.089       0.622     0.534
    COLL1          0.011   0.074       0.150     0.881
...
Intercepts
    I              0.000   0.000     999.000   999.000
    S              2.530   0.832       3.042     0.002
    Q              0.142   0.109       1.304     0.192
If I wanted to write an equation to estimate the probability of being married at any given time point, how would I do that? Thanks!


Are you using WLSMV or ML estimation? And if ML, are you using the default logit link, or probit? 


Thanks Bengt! I am using ML, and I believe the default link. Here is my input, in case that's helpful: Data: File = "S:\PSYC\RESEARCH\Lauren\140307\marriagelist.dat"; type = imputation; Variable: Names are coll0 coll1 w9 w10 w11 w12 w13 w14 w15; missing are all (9); usevariables are coll0 coll1 w9 w10 w11 w12 w13 w14 w15; categorical are w9-w15; Analysis: estimator = ml; MODEL: i s q | w9@0 w10@1 w11@2 w12@3 w13@5 w14@6 w15@7; i s q on coll0 coll1; Plot: Type is PLOT2; Series is w9-w15 (*);


You can look at our Topic 3 video and handout, slides 192-212, where we discuss binary growth modeling and also give references to work where the technical details are given. The general interpretation is the same as with continuous outcomes: for example, a positive slope mean implies that the probability of U=1 (as opposed to U=0) goes up over time. To compute the probability you need to integrate over the growth factor distributions (see those references, which explicate that integration), so that is not something you want to do yourself. Mplus does it for you when it plots the estimated probabilities over the time points using the SERIES option in the PLOT command.


Dear Professors, I am currently modeling an ordinal growth curve with the default WLSMV estimator (so I assume it's an ordered probit). The model seems fine, but I'd like to output the covariate-adjusted curves (probabilities, etc.). When I look at my PLOT3 outputs, there is no option to view something akin to the "adjusted mean" curve that I get when I run a similar model with a continuous outcome. Is there any way to get Mplus to plot the adjusted probabilities for this model, and/or would there be if I chose to run it as an ordered logit? Thanks, Miles


This is not available for WLSMV. You can try ML but it requires numerical integration and I don't think that plot is available when numerical integration is involved. 


OK, thanks Linda. Will you please remind me of the correct formula to obtain the probabilities by hand given the tau's or factor means, latent intercept, and latent linear slope? Or please maybe refer me to a citation that includes the correct formula for Mplus? Thanks, Miles 


See the Topic 2 course handout on the website, Slides 162-164.


Hi, I have a nonsignificant chi-square for a growth curve with three time points. Does it make sense to add predictors? If I did add predictors, can I trust my model? I did add predictors and 2 out of 6 were significant. Thanks! Hillary


Q1. Yes. Q2. If the model still fits.


Hi Dr. Muthen, In a similar growth curve model with three time points and 11 predictors (as follows), I asked for STDY Standardization (due to the categorical nature of the data). However, I just realized that in my model results section I only have "MODEL RESULTS", not "STDY Standardization." In my other similar models, I was given output for both "MODEL RESULTS" and "STDY Standardization." Is the model results section here in STDY standardization? I did not see any related warnings. Below is my syntax for the Model and Output: MODEL: GAff@1; HAff@1; IAff@1; i s | GAff@0 HAff@1 IAff@2; i s ON NA EX EC HRP NLE; i s ON int1 int2 int3 int4 int5 int6; !s@0; !i@0; PLOT: Series=GAff Haff IAff (s); TYPE=PLOT3; OUTPUT: STDY; Thanks! Hillary Gorin


Please send your output along with your license number to Support. 


Hi Dr. Muthen, Are missing data being handled differently in MPLUS Version 7.0 versus MPLUS Version 7.4? Specifically, when running a growth curve model with categorical data and WLSMV estimation, are missing data being managed differently? Thanks! Hillary 


Not that I am aware of. If you think you see a problem, send the outputs and your license number to support@statmodel.com. 

JD sun posted on Thursday, September 01, 2016  2:18 am



Hello Dr. Muthen, I am constructing a GC model with ordered categorical outcomes, and defined them as "categorical" in Mplus. The variables have excessive zeros at each time point. In this case, 1. Do I need to take into account the excessive zeros (i.e., skewness) in my outcome variables even if I set these variables as categorical (ordinal)? 2. Do you have any suggestions for different types of GC model in this case (e.g., a two-part model for count or continuous outcomes with excessive zeros)? Thank you very much for your help in advance! JD


Skewness is not an issue for categorical variables. Floor or ceiling effects are handled by categorical data methodology. 


Hi Dr. Muthen, I'm sorry my previous question was unclear. I have no concerns about problems with my data or with Mplus. I am trying to understand better what Mplus is doing with missing data in version 7.4. More specifically, I have been told that, in version 7.0, all cases with missing data were deleted when using maximum likelihood estimation. I have the impression that, in version 7.4, if I use maximum likelihood estimation to fit a growth curve, Mplus uses a full-information procedure in which all available data are used and weighted based on what data are present. My question is this: Does Mplus use a similar full-information procedure with WLSMV estimation in a growth curve in version 7.4? And if so, how is the missing data handling different for WLSMV than for ML? Thanks! Hillary


It is not true that all missing data were deleted in 7.0 (or 7.4) when using ML. A full-information approach is used. Mplus WLSMV does not use a full-information approach. See also our FAQ: Estimator choices with categorical outcomes


Hello Dr. Muthen, Thank you very much for this information! It was very helpful! One more question, with an example to be clear: Suppose WLSMV estimation is used for invariance testing and then for a growth curve model for latent variable X measured at 3 time points (T1, T2, and T3). In the growth curve model, 3 predictors (A, B, and C) are used to predict the slope and intercept of X. The predictors are observed variables that are not treated as categorical (not included in the CATEGORICAL statement). X is treated as categorical. I have complete data on X for the same number of participants at each of the 3 time points. Why would I get 1250 observations in the invariance testing analyses but only 1148 observations in the growth curve analysis? Thanks! Hillary


You probably have missing data on your 3 predictors. The output will say how many subjects are deleted and why. 

Sooyong Lee posted on Sunday, February 04, 2018  4:06 pm



Hi. I'm studying latent growth modeling (LGM) with binary indicators, and I have a question about a plotting issue with Mplus. While I was running the LGM with six binary indicators, I found that there was a difference between the Mplus plot (type = plot3) and my plot, where I manually calculated the probability of Y being 1 by transforming logits to expected probabilities using growth parameter estimates (intercept and slope factors). More specifically, my plot showed a positive linear trend over time (intercept = 1.240, slope = 0.188, with probabilities of 0.775, 0.806, 0.834, 0.858, 0.879, 0.898 at each time point), whereas the Mplus plot presented a flat (or slightly decreasing) trend over time (the probabilities at each time point: 0.747, 0.756, 0.755, 0.749, 0.7421, 0.734). According to Masyn, Petras, and Liu (2013), I understand that the process of plotting the growth trajectory of an LGM with categorical data is different from that with continuous data. In this case, should I report and interpret the growth parameter estimates and growth trajectory using the Mplus plot (type = Plot3) or my plot? If you have any other suggestions, please let me know. Thank you for your time and help in advance.


If you have growth factors with variances, the outcome probabilities using logit have to be computed using numerical integration; this is what Mplus gives you. You can't compute them by hand. 
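
The gap between the two sets of numbers can be sketched in Python. This is only an illustration, not the computation Mplus performs internally. It assumes time scores 0 to 5 and a logit at time t equal to i + s*t; the growth factor variances and covariance below are hypothetical, since the post does not report them. The first column evaluates the logistic function at the growth factor means (the hand calculation), and the second averages the logistic function over a normal distribution of the growth factors with a simple grid rule.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def marginal_prob(mean, variance, n_points=2001, width=8.0):
    """E[logistic(eta)] for eta ~ N(mean, variance), by a simple grid rule."""
    sd = math.sqrt(variance)
    step = 2.0 * width / (n_points - 1)
    zs = [-width + j * step for j in range(n_points)]
    weights = [math.exp(-0.5 * z * z) for z in zs]  # unnormalized normal density
    total = sum(weights)
    return sum(w * logistic(mean + sd * z) for w, z in zip(weights, zs)) / total


# Growth factor means reported in the post.
mean_i, mean_s = 1.240, 0.188
# Hypothetical growth factor (co)variances; the post does not report them.
var_i, var_s, cov_is = 4.0, 1.0, 0.0

hand = []      # logistic function evaluated at the factor means
marginal = []  # logistic function averaged over the factor distribution
for t in range(6):
    mu_t = mean_i + mean_s * t
    var_t = var_i + 2.0 * t * cov_is + t * t * var_s
    hand.append(logistic(mu_t))
    marginal.append(marginal_prob(mu_t, var_t))
    print(f"t={t}: at the means {hand[-1]:.3f}, integrated {marginal[-1]:.3f}")
```

The first column reproduces the hand-calculated values in the post up to rounding. The integrated probabilities are pulled toward 0.5, and more so at time points where the variance of i + s*t is larger, so the integrated curve may be flatter than the curve at the means even when the slope mean is positive.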

Sooyong Lee posted on Monday, February 05, 2018  1:11 pm



Thank you for your reply. Given that the plot Mplus showed is flat but the slope parameter is positive (0.188), how could I interpret the result? I mean, I'm wondering whether the trend is increasing or doesn't change over time. 


Please send your output and gh5 file to Support along with your license number. 


Hi there, I am looking at whether children met the expected grade at 4 time points: KS1 (age 7), KS2 (age 11), KS3 (age 14), KS4 (age 16), coded 1 = yes, 0 = no. The YouTube videos on Topics 3 and 6 regarding LGM with categorical variables have been very useful. I have fitted a quadratic term in my LGM, and this fitted better than just a linear slope, i.e. lower BIC and fewer outliers in terms of z-scores using TECH10, although the chi-square model fit and LRT were significant (sample size n = 10,000). However, I am not sure which factor score should be used as a dependent variable for mediation analysis. I see that I, S, Q and their SEs are available. Which factor score do you deem the most appropriate for use as a DV? Many thanks in advance, Emily Code: CATEGORICAL ARE ks1bi18 ks2bi30 ks3bi36 ks44acem; USEVARIABLES ARE ks1bi18 ks2bi30 ks3bi36 ks44acem; MISSING ARE ALL(99); IDVARIABLE IS unqid; ANALYSIS: ESTIMATOR = MLR; MODEL: i s q | ks1bi18@7 ks2bi30@11 ks3bi36@14 ks44acem@16; OUTPUT: TECH10; SAVE: SAVE=FS; FILE IS schooltrajectory_quadratic.dat.


Which one you choose depends on your research question. Are you interested in the level at a certain time point such as the end, or are you interested in the change over time?


Probably change over time for analyses, as I am interested in the educational trajectory. However, it would be very helpful if you could explain which factor score tells you what if possible! Many thanks, Emily 


If you have a linear growth model, the slope s would be the one to focus on. With a quadratic growth model, it is complex because s and q interact in their determination of growth. You don't need to get factor scores but can do the analysis in one step. If you want more input, this general question is suitable for SEMNET. 
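
A small Python sketch of why s and q have to be read together. The time scores 7, 11, 14 and 16 come from the input above; the growth factor values for the two children are hypothetical. With a quadratic model the latent level at time score t is i + s*t + q*t^2, so the rate of change at t is s + 2*q*t rather than s alone.

```python
def level(i, s, q, t):
    """Latent level at time score t under a quadratic growth model."""
    return i + s * t + q * t * t


def rate_of_change(s, q, t):
    """Derivative of the latent level with respect to the time score."""
    return s + 2.0 * q * t


# Hypothetical growth factor values for two children.
child_a = {"i": -2.0, "s": 0.60, "q": -0.020}
child_b = {"i": -2.0, "s": 0.30, "q": 0.005}

time_scores = [7, 11, 14, 16]  # ages used as time scores in the input above

for name, c in (("A", child_a), ("B", child_b)):
    rates = [rate_of_change(c["s"], c["q"], t) for t in time_scores]
    print(name, [round(r, 3) for r in rates])

rate_a_16 = rate_of_change(child_a["s"], child_a["q"], 16)
rate_b_16 = rate_of_change(child_b["s"], child_b["q"], 16)
```

In this made-up case child A has the larger s but is declining at time score 16, while child B is still rising. Note also that with time scores 7 to 16, s on its own is the rate of change at time score 0, which lies outside the observed age range.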


Hi, Thanks for getting back to me. OK, that is useful to know regarding linear models. I am assuming you mean fitting the indirect effects together with the LGM in one overall model. Interesting thought! I'll have a look at SEMNET. Thanks again!

Dela posted on Monday, April 08, 2019  2:33 pm



Hi Dr. Muthen, I have longitudinal data with 4 time points. I have identified the intercepts and slopes, and now I want to predict a categorical and a continuous outcome from the intercept and slope of my independent variable. When I specify ESTIMATOR = ML or MLR, I receive a warning: "MODINDICES option is not available for ALGORITHM=INTEGRATION." Also, I do not get an estimate for the slope predicting my categorical outcome variable: INT 0.150 SLP ********* I also tried ESTIMATOR = WLSMV, but that did not give me the odds ratio. I would greatly appreciate your advice. I am including my syntax. Thanks so much! USEVARIABLES = STEM_GPA T1_ant T3_ant T4_ant T5_ant MAJ_STEM; MISSING ARE ALL (99); CATEGORICAL = MAJ_STEM; ANALYSIS: ESTIMATOR = ML; MODEL: [T1_ant@0]; [T3_ant@0]; [T4_ant@0]; [T5_ant@0]; int slp | T1_ant@0 T3_ant@1.3 T4_ant@2.3 T5_ant@3.3; [int slp]; STEM_GPA on int slp; MAJ_STEM on int slp; OUTPUT: SAMPSTAT MODINDICES (3.84) STDYX TECH1 TECH4;


We need to see your full output  send to Support along with your license number. 


Dear Drs Muthen, *apologies if this or a similar issue has been previously discussed and answered. I was not able to find a directly relevant answer when searching the discussion board* I am running an LGM with a categorical outcome and 4 time points. When I use the default WLSMV estimator (with default Delta parameterization), the residual covariance matrix (theta) is not positive definite, due to a negative residual variance. The same problem occurs when I use the WLS or WLSM estimators. However, when I switch to any of the ML estimators (and use either probit or logit link), the model estimation terminates normally and there are no more issues with negative residual variances. My question is: Would this be a good reason to choose ML rather than WLS in this case? Importantly, the model results differ considerably when using WLS vs. ML estimators, so my decision has consequences for the substantive conclusion I draw from the model. I would appreciate any advice you can offer to help me decide on the best approach here. Many thanks, Pete


There is a contradiction in what you say here: The Delta parameterization does not give Theta residual variances  Delta gives scale factors. The Theta param'n gives Theta res vars. The ML model is more restrictive than the WLSMV model because it has no free scale factors or residual variances. 

Peter Koval posted on Wednesday, May 08, 2019  5:50 pm



Dear Dr Muthen, Thanks for your reply. Forgive me if I've confused things. I was referring to the following warning message, which I get when running a latent growth model with a categorical outcome, and Estimator = WLSMV; Parameterization = DELTA; WARNING: THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR AN OBSERVED VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO OBSERVED VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO OBSERVED VARIABLES. CHECK THE RESULTS SECTION FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE Y0. The scale factor estimates look OK: the estimate for Y0 is fixed to 1, and the estimates for Y1-Y3 are all positive. So the problem is not there. Instead, under R-SQUARE, the observed variable Y0 has a negative residual variance. More generally, I understand that the ML model is more restrictive than WLSMV. So, when (if ever) is it advisable to use the ML estimator for a categorical LGM? Thanks very much. Pete


I see what you mean now. With Delta, the residual variance is computed as a remainder; it is not a model parameter. But it is a problem if it is negative, so you may want to explore why. The more restrictive model used with ML is common in mainstream statistics and could be used. Nevertheless, you want to know why you have the WLSMV problem. For instance, you can run with Param=Theta in two ways: with the Theta variances free for all but one time point, and with them all fixed at 1. If you fix them all at 1, you get the ML model. You can also fix the last time point's scale factor (for Delta) or residual variance (for Theta) to explore if that changes things.
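
A minimal Python sketch of how a residual variance obtained as a remainder can turn negative. It assumes the usual Delta relation in which the scale factor is the inverse standard deviation of the latent response y*, so that the residual variance is 1/delta^2 minus the variance implied by the growth factors. All estimates below are hypothetical and are not taken from Pete's output.

```python
def implied_residual_variance(scale_factor, time_score, var_i, var_s, cov_is):
    """Residual variance of y* computed as a remainder under Delta.

    Total variance of y* is 1 / scale_factor**2; the growth factors
    contribute Var(i + s * time_score).
    """
    total_var = 1.0 / scale_factor ** 2
    explained = var_i + 2.0 * time_score * cov_is + time_score ** 2 * var_s
    return total_var - explained


# Hypothetical estimates: intercept variance slightly above 1.
var_i, var_s, cov_is = 1.15, 0.10, -0.05
scale_factors = [1.0, 0.85, 0.75, 0.65]  # first time point fixed at 1
time_scores = [0, 1, 2, 3]

residuals = [
    implied_residual_variance(d, t, var_i, var_s, cov_is)
    for d, t in zip(scale_factors, time_scores)
]
for t, r in zip(time_scores, residuals):
    print(f"time score {t}: implied residual variance {r:.3f}")
```

With the first scale factor fixed at 1 and a time score of 0, the remainder at the first time point is 1 minus the intercept factor variance, so an intercept variance estimate above 1 is one way a negative value at Y0 could arise.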
