
Anonymous posted on Friday, October 29, 1999  11:47 am



What is the minimum number of timepoints needed to estimate a growth model? 


A minimum of four timepoints is recommended for growth models for two reasons. First, with fewer than four timepoints it is not possible to identify enough parameters in the growth model to make the model flexible. Second, four timepoints give more power. With four timepoints for continuous outcomes, the H1 model has 14 free parameters: four means, four variances, and six covariances. With three timepoints, the H1 model has 9 free parameters: three means, three variances, and three covariances. With four timepoints, there are 9 parameters in the H0 model that need to be identified: the means of the intercept and slope growth factors (2), the variances of the intercept and slope growth factors (2), the covariance between the two growth factors (1), and the residual variances of the outcome variables (4). There are five other parameters that could be used for model modification purposes: the residual covariances among the outcome variables (3) and the time scores (2). No model modification would be allowed with three timepoints unless certain other restrictions were placed on the growth model.
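The parameter counts above can be checked with a small calculation. This is a minimal sketch in Python (not Mplus), assuming a linear growth model with a random intercept and slope and a separate residual variance at each timepoint; the function names are illustrative only.

```python
def h1_parameters(timepoints):
    """Free parameters in the unrestricted (H1) model: means plus variances/covariances."""
    means = timepoints
    variances_and_covariances = timepoints * (timepoints + 1) // 2
    return means + variances_and_covariances


def h0_parameters(timepoints):
    """Free parameters in a basic linear growth (H0) model (assumed specification)."""
    factor_means = 2        # intercept and slope means
    factor_variances = 2    # intercept and slope variances
    factor_covariance = 1   # intercept-slope covariance
    residual_variances = timepoints
    return factor_means + factor_variances + factor_covariance + residual_variances


def degrees_of_freedom(timepoints):
    """Degrees of freedom left over for model modification."""
    return h1_parameters(timepoints) - h0_parameters(timepoints)


for t in (3, 4):
    print(t, h1_parameters(t), h0_parameters(t), degrees_of_freedom(t))
```

Under these assumptions, four timepoints leave five degrees of freedom and three timepoints leave one.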


I have been running a growth curve model with four timepoints. The first timepoint is at the beginning of a stay in a rehab clinic, the second at the end of that stay (ca. 4 weeks later), the third three months later, and the fourth six months later. I started my analysis with fixed time scores, centered at the first timepoint (0,1,4,7). The analysis did not work very well, so I started freeing time scores. The final analysis resulted in the following time scores: 0 1 .8 1.7. Here is my question: Does it make sense to have negative time scores? Thanks! Hanno


First of all, have you plotted your observed variable means and also looked at a random set of individual growth curves to get an idea of the shape of your growth curve and to see whether there is variability in growth curve shape in your sample? If there is a lot of variability in the growth curve shapes in the sample, perhaps you need to take this heterogeneity into account. If your growth curve looks linear, I think the appropriate time scores would be 0, 1, 3, 6 if the observations were at the first timepoint and then at one month, three months, and six months. If your average growth curve does not look linear, its shape should be a guide to how the time scores will come out. The estimated time scores above imply that after a linear start (0 to 1), the curve goes down. Perhaps you need to add a quadratic growth factor to your model.


Along the lines of questions pertaining to timepoints in growth curve modeling... I have previously used SAS PROC MIXED to run growth curve models. With a 3 timepoint longitudinal survey study I found that there was a good deal of variation around the interview dates at each wave. I wanted to account for that variation in my model. I was able to scale time continuously by calculating time from initial baseline event to each interview point for each person. For the questions I was investigating this rescaled time variable improved my model when compared to using a straight 3 timepoint model and it also made it a bit easier to interpret my results in the context of 'years since event' as opposed to change from wave to wave. My question is: can I do this type of growth curve time scaling in Mplus version 2? 


This can be handled in Mplus for continuous outcomes by using TYPE=MISSING if there are not too many unique timepoints. The dataset will contain missing values for individuals not measured at particular timepoints. For example, the data would be as follows:

person    3    5   11   12   28   30
1        55    *    *   75   85    *
2         *   25   45    *    *   70

where 3, 5, etc. represent the measurement occasions, and person 1 was measured at times 3, 12, and 28 while person 2 was measured at times 5, 11, and 30.
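The data layout described above can be produced from a long-format file with a small script. This is a minimal sketch in Python, assuming the data arrive as (person, time, value) records; the function name and the "*" missing-value flag are illustrative choices.

```python
def long_to_wide(records, missing="*"):
    """Turn (person, time, value) records into one row per person.

    Every distinct measurement time becomes a column; occasions at which a
    person was not measured are filled with the missing-value flag.
    """
    times = sorted({time for _, time, _ in records})
    rows = {}
    for person, time, value in records:
        rows.setdefault(person, {})[time] = value
    wide = {
        person: [observed.get(time, missing) for time in times]
        for person, observed in rows.items()
    }
    return times, wide


# The two persons from the example, in long format.
records = [
    (1, 3, 55), (1, 12, 75), (1, 28, 85),
    (2, 5, 25), (2, 11, 45), (2, 30, 70),
]
times, wide = long_to_wide(records)
print("person", *times)
for person, row in wide.items():
    print(person, *row)
```

The number of columns grows with the number of unique measurement times, which is why this approach is only practical when there are not too many of them.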

anonymous posted on Tuesday, July 31, 2001  6:48 pm



Is it possible/acceptable that the slope is regressed on covariates measured at the second or later timepoints? 

bmuthen posted on Wednesday, August 01, 2001  3:48 pm



If the intercept is defined at the first time point, it seems that potential predictors of the slope should be measured before the second time point because the slope describes the change between the first and the second time point. 

Anonymous posted on Thursday, August 02, 2001  2:54 pm



If the intercept is defined at the last time point, is it acceptable?


Yes. 

Anonymous posted on Monday, August 27, 2001  9:02 am



I have a quadratic growth model with 10 time points that are unequally spaced. I would like to code time:

s BY ret1@0 ret2@.5 ret3@1 ret4@3 ret5@3.5 ret6@4 ret7@4.5 ret8@5 ret9@5.5 ret10@9;
q BY ret1@0 ret2@.25 ret3@1 ret4@9 ret5@12.25 ret6@16 ret7@20.25 ret8@25 ret9@30.25 ret10@81;

Do you see any problem with the second time point at .5 (and .25) instead of 1?


No. However, remember when you interpret the mean of the slope growth factor, it is the change between time score 0 and time score 1. 
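The quadratic time scores in the question are simply the squares of the linear time scores, which is easy to verify. This is a minimal sketch in Python; the helper that prints a BY statement is a hypothetical convenience for this example, not part of Mplus.

```python
linear_scores = [0, 0.5, 1, 3, 3.5, 4, 4.5, 5, 5.5, 9]
quadratic_scores = [score ** 2 for score in linear_scores]


def as_by_statement(factor, variables, scores):
    """Format fixed time scores as a BY statement like the one in the question."""
    terms = " ".join(f"{name}@{score:g}" for name, score in zip(variables, scores))
    return f"{factor} BY {terms};"


variables = [f"ret{i}" for i in range(1, 11)]
print(as_by_statement("s", variables, linear_scores))
print(as_by_statement("q", variables, quadratic_scores))
```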

Rina Eiden posted on Thursday, December 20, 2001  8:27 am



Hi, I am trying to model changes in fathers' drinking over 4 time points (child ages of 12 months, 18 months, 24 months, and 36 months). I ran the original analyses with just a linear component with time scores of 0, 1, 2, and 4. The model fit was very poor. When I looked at the overall sample means, they are as follows: 3.74, 4.05, 1.82, and 2.11, so definitely not linear. There is a great deal of variability in the individual growth curves, with some looking cubic and some quadratic. I reran the model adding a quadratic component, and the slope for the quadratic factor is significant, but the model fit is still poor. I added the quadratic component by adding quad by alc1@0 alc2@1 alc3@4 and alc4@16. Is this right? Is there any way to model a cubic curve? I am going to add covariates next, but wanted to make sure I had modeled the curve of the slope sufficiently well before doing that. Thanks so much.

bmuthen posted on Thursday, December 20, 2001  12:50 pm



First, a substantive comment. Is there a theory for this development? Are there substantive reasons for the sample mean to go up, then down, and then up again? This up-and-down development is difficult to capture in a growth model. Next, some technical comments. The quad specification is correct, and a cubic could be added as well. But there may be other reasons for your model misfit. For example, adjacent time points may have correlated errors, or a time point may deviate from the quadratic curve as would seem to be the case given your sample mean development. You may detect this using modification indices. You may also have data that are far from normal, in which case using MLM can give you a better chi-square fit assessment.


I am looking for ways to model longitudinal data with variable time points using a latent growth curve. An attractive feature of the Mx program is that it can accommodate individual slope loadings via implementation of definition variables, special data columns containing fixed parameter values for each individual (Hamagami, 1997; Neale et al., 1999). Operationally, this involves creating a set of slope factor loadings unique to each individual. In the special case of longitudinal research involving age, this is referred to as scaling age across individuals (Mehta & West, 2000). A sample Mx script demonstrating this technique is included in Appendix A of Mehta and West's (2000) paper. Does Mplus have a similar feature? Yes, I know this is easier to do as a multilevel model, but what about in the LGM context? 


I'm not familiar with the Mehta & West paper, but it sounds like you mean a growth model where individuals are measured at different times, that is, a model with individually-varying times of observation. Mplus Version 2.1 can estimate this type of model. Example 5 on page 15 of the Addendum to the Mplus User's Guide deals with this model. The addendum can be downloaded from www.statmodel.com under Product Support. If this is not the model you mean, let me know and I will take a look at the article.

Anonymous posted on Monday, March 10, 2003  11:23 pm



Assume the same number of assessments and the same timing of assessments across cases. If the data collection intervals are unequal [say 0, 1mo, 2mo, and 9mo], what values should be used to fix the slope parameters to test a linear model? Can 0 1 2 3 be used, or does this imply equal intervals?


0, 1, 2, 3 implies equal intervals. You can use 0, 1, 2, 9 to represent the unequal intervals. 

anonymous posted on Tuesday, July 08, 2003  2:30 pm



I am modeling an LGC and I want to include time-varying covariates. Is there a maximum number of time-varying covariates that one can include given the number of measurement occasions in the model? That is, with 4 timepoints, is there a maximum number of TVC's one should include?


There is no maximum number of covariates you can include as far as the model being identified. For each covariate you add, you bring a variance and covariances with the other observed variables in the model. And you are estimating only a regression coefficient. 

anonymous posted on Tuesday, July 08, 2003  5:49 pm



Thank you for your response. I'm estimating the regression coefficients of the tvc's, but I also have several "person level predictors" of the intercept and slope. I'm asking because I'm trying to understand how an LGC with TVC's in SEM relates to an equivalent model using Proc Mixed. I'm wondering if having only a few measurement occasions (say 3 or 4) nested within individuals presents a problem when trying to assess the impact of TVC's. It is my understanding that when fitting this type of model in HLM or Proc Mixed, the level 1 model is fit for each individual. It seems to me that it would be problematic to have as many TVC's as measurement occasions. Any insight would be greatly appreciated. Thank you!

bmuthen posted on Wednesday, July 09, 2003  11:39 am



You are right that conceptually, the level 1 model is fit for each individual, letting the intercept and slopes vary across individuals. However, the statistical procedure makes use of the fact that these intercepts and slopes come from a single population common to all the individuals so that the statistical procedure does not estimate model parameters for each individual, but instead means and variances across individuals. After model estimates have been obtained, the individual estimates are obtained from the model and the individual's data by a separate empirical Bayes (factor score) step. In conclusion, it is ok to have many tvc's per time point even when having only 3 or 4 occasions. 

Lori Weight posted on Thursday, September 11, 2003  10:03 am



I have a follow-up question in regard to the one posted on July 8 (regarding the number of time-varying covariates). I have a similar model and I'm wondering if with just 3 timepoints I can model the linear trend and two additional time-varying covariates. This model converges for me. I've also run the model in Proc MIXED and it converges there as well. Can I assume that both the SEM and Proc Mixed models are NOT overparameterized? I'm concerned that they might be with only three time points even though the estimation of them looks good.

Lori Weight posted on Thursday, September 11, 2003  10:19 am



I should also mention that I'm treating all of the level one effects in the Proc Mixed model as random. 

bmuthen posted on Thursday, September 11, 2003  10:22 am



Yes, this model is identified in its standard form. With 3 time points the growth model part is identified even without the extra information provided by the covariances between the 2 time-varying covariates (tvc's) and the outcomes. In SEM terms, without the tvc's, you have 3*(3+1)/2 + 3 = 9 pieces of information and 8 parameters. The 2 slopes you add when adding the tvc's are identified by the covariance of the tvc and the outcome. If the model was not identified, the SEM approach would tell you so by not being able to compute SEs.

bmuthen posted on Thursday, September 11, 2003  10:24 am



So, the slopes for the 2 tvc's are random. And probably held equal for the 2 tvc's in Proc Mixed. That is also fine in Mplus. Conventional SEM cannot handle a random slope for the tvc's. 

Lori Weight posted on Thursday, September 11, 2003  11:34 am



Thanks for the quick response. I'm not sure what you mean by "held equal for the 2 tvc's". In proc mixed I have four random effects (int, slp, tvc1, and tvc2) and each is estimated. Is this a problem? Also, do I need to do something extra in mplus in order to treat my tvc's as random, i.e. specify the hierarchical structure (measurement occasion nested within subject)? Thank you. 

bmuthen posted on Thursday, September 11, 2003  11:40 am



Often, the same random effect is used for tvc1 and tvc2. Mplus can have them the same or different. Having them different is not a problem. The Mplus specification is a standard, single-level model (so no nesting, but a "multivariate approach") specifying analysis type = random to be able to use the random slopes s2 and s3 for the tvc's:

s2 | y2 on x2;
s3 | y3 on x2;

Lori posted on Thursday, September 11, 2003  2:53 pm



OK, so if I understand correctly, then an RC model specified in Proc Mixed with 3 time points and 4 random effects (int., lin., tvc1, tvc2) is not overparameterized. Is this correct? Thanks so much for helping me with all of this.

bmuthen posted on Thursday, September 11, 2003  2:56 pm



No, I don't think it is overparameterized. Another matter is whether all 4 random effects have nonzero variance, but the estimation will tell you that.

Anonymous posted on Thursday, October 30, 2003  2:53 pm



I have a 4-point growth trajectory. The data were collected at time 1 and then 3.5 years later, 4.5 years after time one, and 5.5 years after time one. What would be the best way to code time?


I would use time scores of 0, 3.5, 4.5, and 5.5. 

Anonymous posted on Friday, October 31, 2003  8:04 am



In response to yesterday: That is initially what I used, but when I calculated the curve for the group with a value of "1" on a predictor, the resulting curve doesn't map well onto the raw means for those with a value of 1 on the predictor. I thought the problem might be the factor loadings. For example, the raw means start around 3 and increase to around 3.9 (see below). The estimated curve with the above factor loadings is:

Intercept + predictor(1) = 2.348 + 2.13 = 4.478
Slope + predictor(1) = -.106 + -.562(1) = -.668

The curve for predictor = 1 is intercept + slope (factor loading) = estimated curve value:

4.418 + -0.668 (0) = 4.4
4.418 + -0.668 (3.5) = 2.1
4.418 + -0.668 (4.5) = 1.4
4.418 + -0.668 (5.5) = 0.7

Raw means for X group:

t1 = 3.0
t2 = 3.2
t3 = 3.5
t4 = 3.9

I'm not sure how to deal with these results. They don't make sense substantively or map well to the raw means (I do have time-varying covariates, but they are not significant).
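The estimated curve in this post can be recomputed directly from the intercept, the slope, and the time scores. This is a minimal sketch in Python, assuming the group's slope is negative (-0.668), which is what the decreasing estimated values imply, and using the intercept value 4.418 that appears in the curve calculation; the function name is illustrative.

```python
def implied_means(intercept, slope, time_scores):
    """Model-implied mean at each occasion: intercept + slope * time score."""
    return [intercept + slope * score for score in time_scores]


time_scores = [0, 3.5, 4.5, 5.5]
estimated = implied_means(4.418, -0.668, time_scores)
raw_means = [3.0, 3.2, 3.5, 3.9]

for score, est, raw in zip(time_scores, estimated, raw_means):
    print(f"time {score}: estimated {est:.3f}, raw mean {raw}")
```

Laying the two columns side by side makes the mismatch plain: the estimated curve falls while the raw means rise.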


I would have to see the full output to be able to comment. You can send it to support@statmodel.com. 

Anonymous posted on Thursday, November 13, 2003  10:03 am



I have specified a simple linear growth model where individuals have different times of observation in two ways in Mplus. In the usual way, I have specified the time variable with all possible times and used the MISSING option to deal with the fact that not everyone was observed at every time. I have also specified it using TSCORES and the RANDOM analysis type. The usual way prints the usual statistics (chi-square, etc.) to assess fit. The TSCORES approach does not provide the conventional fit indices. What is the difference? Is the answer the reason why multilevel model programs do not provide those statistics?


Yes, the difference is the reason that multilevel programs do not print fit statistics. In SEM models, there is one variance/covariance matrix to compare to. With individually-varying times of observation entered as data, you have a random slope multiplying an observed covariate. When such random slopes are involved, the residual variance of the outcome given the covariates varies as a function of the covariates. Therefore, there is no single variance/covariance matrix against which to test model fit.
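To see why the variance changes with the covariate, consider a linear growth model with a random intercept and a random slope multiplying the observed time t. The variance of the outcome at time t is then var(i) + 2*t*cov(i,s) + t^2*var(s) + residual variance. This is a minimal sketch in Python; all numeric values are hypothetical and chosen only for illustration.

```python
def outcome_variance(t, var_i, var_s, cov_is, resid_var):
    """Variance of the outcome at time t under a random intercept and random slope."""
    return var_i + 2 * t * cov_is + t ** 2 * var_s + resid_var


# Hypothetical variance components.
components = dict(var_i=1.0, var_s=0.2, cov_is=0.1, resid_var=0.5)

for t in (0, 1, 2, 3.7):
    print(t, round(outcome_variance(t, **components), 3))
```

Because each person can have a different set of t values, each person has a different implied covariance matrix, so there is no single sample covariance matrix to compare the model against.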

Anonymous posted on Thursday, November 13, 2003  11:49 am



What if there are no covariates involved, as was the case with my model?

bmuthen posted on Thursday, November 13, 2003  11:53 am



When you use the TSCORES option you are using a covariate, namely time. So that case falls into the category of changing variances and lack of overall model tests of fit. 

Anonymous posted on Saturday, November 15, 2003  8:07 pm



I am trying to compare the results of a Proc Mixed model to an MPlus model. The time points are continuous (t1,t2,t3) with corresponding outcomes (y1,y2,y3). I'm starting with one time-invariant predictor variable (x). I set up the model as follows:

Int Slope | Y1 Y2 Y3 AT T1 T2 T3;
Int Slope ON X;
Int WITH Slope;
Y1 (1);
Y2 (1);
Y3 (1);

I am trying to generate a covariance and a single residual variance. When I ran this model without the x variable, it converged after a long time. With the x variable, it crashed. Have I incorrectly specified this model? Thank you,


I would need your input and data from Mplus and your SAS output to answer that. You can send them to support@statmodel.com. From what you show, it should not take any time to run. There must be some other complication. 


Thank you for sending your data and input. When I run this with or without the x variable, it finishes in less than one second. I notice that you are using Version 2.12. I suggest that you download Version 2.14, which is the current version of Mplus. Note that Mplus and SAS use different approaches to weighting. I would do my comparison without weights to be sure you have the model set up correctly (it looks like you do) and then I would add weights.

Mel Dal posted on Sunday, April 11, 2004  1:26 pm



Hi, I am using Proc Mixed to determine time trends in my dependent variable. However, my observations are unequally spaced at 0, 2 years, 5 years and 7 years. Can I use time as 0, 2, 5 and 7, respectively, if I want to consider time as a continuous variable? Also, if I intend to model within-subject correlation over time, can I use AR(1) to model dependency over time if I have unequally spaced intervals? Thanks, Mel

bmuthen posted on Sunday, April 11, 2004  2:52 pm



With unequally spaced intervals, perhaps you want to simply correlate adjacent residuals, letting those correlations be different. You may also consider the Ornstein-Uhlenbeck process, which is suitable for irregularly spaced observations.
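Under an Ornstein-Uhlenbeck (continuous-time AR(1)) structure, the correlation between two residuals depends on the elapsed time between them: corr = rho ** |t_i - t_j|, where rho is the correlation at a lag of one time unit. This is a minimal sketch in Python for the observation times in the question; the value rho = 0.8 is hypothetical.

```python
def ou_correlation_matrix(times, rho_per_unit):
    """Residual correlations rho ** |t_i - t_j| for irregularly spaced times."""
    return [[rho_per_unit ** abs(a - b) for b in times] for a in times]


times = [0, 2, 5, 7]  # years, as in the question
matrix = ou_correlation_matrix(times, rho_per_unit=0.8)

for row in matrix:
    print(" ".join(f"{value:.3f}" for value in row))
```

Unlike a standard AR(1) on the wave index, which treats every adjacent pair as one step apart, this gives the 3-year gap between the second and third occasions a weaker correlation than the 2-year gaps.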

Wim Beyers posted on Thursday, May 20, 2004  1:15 am



I have 5-wave data, however not equally spaced in time (particularly the last measure). LGC modeling using intercept (all 1's) and slope (0,1,2,3,7) factors does not fit the data well. Adding a quadratic trend (0,1,4,9,49) makes the fit slightly better, but still not enough. From a repeated measures ANOVA on the same data, a cubic trend was suggested. So, I would like to add a cubic trend. Questions:
- Are the loadings of the cubic factor fixed at (1,1,1,1 etc.)?
- And, what about the last measure, do I use a 1 or a 1 there (given the fact that it was taken 4 years later than the second last)?
- And, does modeling a cubic trend also require modeling a quadratic trend on the data? I mean, there clearly is a cubic trend, but all results point to the fact that the quadratic trend is not important (both mean and variance of this trend not significant).
Thanks


The loadings for the cubic growth factor are the loadings of the linear growth factor to the power of three. I would include the quadratic trend as long as the mean and variance of the quadratic growth factor are not zero. You might want to think about a model with some free time scores instead of a cubic model. 

Wim Beyers posted on Friday, May 21, 2004  1:51 am



Ok, thanks for the suggestion to take some free time scores. I hesitated to do this at first, but it seems a much better solution, because I do not really have an interpretation for a 'cubic' trend. One more question on these free time scores. I know that I need to fix at least two of them: 0 (at the first measure, if I want to center at the initial level), and another one, for instance 1 (at the second measurement). So: [0,1,free,free,free]. With respect to the latter, in terms of fit it does not seem to matter which time score I fix, e.g. [0,free,2,free,free] or [0,free,free,3,free] gives me exactly the same chi-square (and I need to do this because otherwise my model sometimes does not converge). However, the mean (and variance) of the slope factor then seems to change when I fix other time scores. Does this mean that the meaning of the slope factor (originally the rate of change between two measures) is also different now? Thanks for all help.


You will get the same fit with 0 1 free free, 0 free 1 free, or 0 free free 1. These are just different parameterizations of the same model. Note that you should use 0 and 1 for the two fixed time scores for reasons of interpretability. The mean and variance of the slope growth factor change because they refer to the slope between time scores 0 and 1, and that slope changes depending on where you fix the 0 and 1.
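The equivalence of these parameterizations can be illustrated numerically: moving the fixed 1 to another occasion rescales the time scores and the slope mean in opposite directions, so the model-implied means do not change. This is a minimal sketch in Python; the time scores and growth factor means are hypothetical.

```python
def implied_means(intercept_mean, slope_mean, time_scores):
    """Model-implied outcome mean at each occasion."""
    return [intercept_mean + slope_mean * score for score in time_scores]


def reanchor(time_scores, slope_mean, anchor):
    """Rescale so the occasion at index `anchor` has time score 1.

    The slope mean is multiplied by the same factor the scores are divided by
    (the slope variance would be multiplied by the square of that factor).
    """
    scale = time_scores[anchor]
    return [score / scale for score in time_scores], slope_mean * scale


# Hypothetical solution with the second occasion fixed at 1.
scores_a, slope_a = [0, 1, 1.6, 2.0, 2.4], 2.0
# Same model with the third occasion fixed at 1 instead.
scores_b, slope_b = reanchor(scores_a, slope_a, anchor=2)

print(implied_means(10.0, slope_a, scores_a))
print(implied_means(10.0, slope_b, scores_b))
```

The slope mean changes from 2.0 to 3.2 because it now describes the change between the first and third occasions rather than the first and second.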

Anonymous posted on Monday, July 12, 2004  5:11 am



I have fitted a linear growth model to (generated) data with individually varying points of observation. I used SPLUS, SPSS, and MPLUS. I used Maximum Likelihood estimation, and restricted the variances of the error terms to be equal across time points. The results of SPSS were exactly equal to those of SPLUS. The results of MPLUS differed from those obtained with the other two packages. For fixed parameters, the second digit differed. For random parameters, the first digit differed. As I expected the results of MPLUS to be equal to those for SPLUS and SPSS (as was the case for fixed points of observation), I wonder what may have caused the difference. 


I could only answer this question by seeing how you generated the data and the output from the three programs. You can send them to support@statmodel.com. 


Mplus reports the residual variances of the outcome, the intercept growth factor, and the slope growth factor. Splus and SPSS report the standard deviations. So to compare, you must square their estimates or take the square root of the Mplus estimates. They give the standardized value of the covariance between i and s. Mplus gives the raw value. When these differences in reporting are taken into account, the results do not differ in any important way. 
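The conversion described above is a matter of squaring the standard deviations and turning the correlation back into a covariance. This is a minimal sketch in Python; the function name and the input values are hypothetical.

```python
def to_variance_metric(sd_intercept, sd_slope, sd_residual, corr_is):
    """Convert standard deviations and a correlation to variances and a covariance."""
    return {
        "intercept_variance": sd_intercept ** 2,
        "slope_variance": sd_slope ** 2,
        "residual_variance": sd_residual ** 2,
        # covariance = correlation * SD(intercept) * SD(slope)
        "intercept_slope_covariance": corr_is * sd_intercept * sd_slope,
    }


converted = to_variance_metric(sd_intercept=2.0, sd_slope=0.5, sd_residual=1.5, corr_is=-0.3)
for name, value in converted.items():
    print(name, round(value, 4))
```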

Anonymous posted on Tuesday, July 13, 2004  5:23 pm



Could someone clarify the relationship between the single regression coefficient associated with a level 1, time-varying covariate in a multilevel growth model, as one might estimate such a model in a program like SAS Proc Mixed, and the multiple coefficients produced in a latent growth curve model, as one might estimate such a model using Mplus, where the separate time point values of the dependent variable are regressed on the corresponding time point values of the time-varying covariate?

bmuthen posted on Tuesday, July 13, 2004  5:36 pm



There is a difference here between what you can do in conventional SEM programs and what you can do in Mplus. Mplus allows you to do what SEM can do as well as what you can do in conventional multilevel programs such as SAS. SEM cannot have the slope for the time-varying covariate be randomly varying across individuals, but can let it vary across time. In SAS, the slope can vary across individuals (and can also be made to vary across time). Again, Mplus can be used for either.

Shige Song posted on Sunday, October 24, 2004  1:22 am



Hi Linda, I am very interested in this discussion and want to probe a bit more. You mentioned that four timepoints are needed because otherwise "it is not possible to identify enough parameters in the growth model to make the model flexible"; in addition, four timepoints give more power. Can you explain in more detail what kinds of parameters are not identified and what kinds of flexibility will be missed? I ask this question because I am doing growth modeling and unfortunately I have only three time points. I remember that Muthen (2000) uses only two time points to identify a random intercept model. With three time points, it is probably OK to identify a model with both random intercept and random slope. Since the literature on growth modeling is so big and I am very new to this area, it would be greatly appreciated if you could provide some more information (references, explanations, etc.). Thanks! Best, Shige

Reference: Muthen, Bengt O. 2000. "Methodological Issues in Random Coefficient Growth Modeling Using a Latent Variable Framework: Applications to the Development of Heavy Drinking in Ages through 18 to 37." Pp. 113-140 in Multivariate Applications in Substance Use Research: New Methods for New Questions, edited by Jennifer S. Rose. Mahwah, N.J.: Lawrence Erlbaum Associates.


With four timepoints and a continuous outcome, the H1 model has 14 free parameters: four means and 10 variances/covariances. The basic H0 model has 9 free parameters: the means and variances of the intercept and slope growth factors, the covariance between the intercept and slope growth factors, and four residual variances for the outcome. This leaves 5 degrees of freedom to use to modify the model. With three timepoints, the H1 model has 9 free parameters and the H0 model has 8 free parameters. This leaves only one degree of freedom to modify the model. So four timepoints allow a lot more flexibility to add residual covariances and/or free time scores.

Anonymous posted on Friday, March 25, 2005  11:05 am



What exactly is the technique used in Mplus to handle missing data, say in mixture growth curve analysis?

bmuthen posted on Friday, March 25, 2005  3:39 pm



I copy the following from the User's Guide. This applies to all models. Mplus has several options for the estimation of models with missing data. Mplus provides maximum likelihood (ML) estimation under MCAR (missing completely at random) and MAR (missing at random; Little & Rubin, 2002) for continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types. MAR means that missingness can be a function of observed covariates and observed outcomes. For censored and categorical outcomes using the weighted least squares estimators, missingness is allowed to be a function of the observed covariates but not the observed outcomes. Nonignorable missing data modeling is possible using maximum likelihood where categorical outcomes represent indicators of missingness and where missingness may be influenced by continuous and categorical latent variables (Muthén et al., 2003). Multiple data sets generated using multiple imputation (Schafer, 1997) can be analyzed using a special feature of Mplus. Parameter estimates are averaged over the set of analyses, and standard errors are computed using the average of the standard errors over the set of analyses and the between analysis parameter estimate variation. In all models, missingness is not allowed for the observed covariates because they are not part of the model. The outcomes are modeled conditional on the covariates and the covariates have no distributional assumption. Covariate missingness can be modeled if the covariates are explicitly brought into the model and given a distributional assumption. With missing data, the standard errors for the parameter estimates are computed using the observed rather than the expected information matrix (Kenward & Molenberghs, 1998). Bootstrap standard errors and confidence intervals are also available with missing data. 
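The combination of results across imputed data sets mentioned above can be sketched with Rubin's standard combining rules, which work with squared standard errors: the pooled variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance of the estimates. This is a minimal sketch in Python; whether Mplus uses exactly this formula is an assumption here, and the estimates are made up.

```python
import math
from statistics import mean, variance


def pool_estimates(estimates, standard_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])  # average squared SE
    between = variance(estimates)                       # sample variance, divisor m - 1
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total_variance)


# Hypothetical slope-mean estimates from five imputed data sets.
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
standard_errors = [0.10, 0.10, 0.10, 0.10, 0.10]
pooled, pooled_se = pool_estimates(estimates, standard_errors)
print(round(pooled, 4), round(pooled_se, 4))
```

The pooled standard error is larger than any single-analysis standard error because it also reflects the variation in the estimate from one imputed data set to the next.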

Anonymous posted on Tuesday, March 29, 2005  9:18 am



Is there any limitation for the number of covariates? Should the covariates be independent with each other? Thanks 


No and no. 

Anonymous posted on Saturday, April 16, 2005  6:40 am



Assuming that appropriate command and data files have been created, can Mplus v3 be called in "batch" mode to run single analyses? If so, what are the appropriate command line options? Alternatively, could one use the RUNALL utility to perform the analyses where only a single data set (i.e., no replicates) is specified as input? Thanks! Richard


Yes, you can run Mplus in batch mode. The command line specification is: C:> Mplus inputfile [outputfile] where the outputfile is optional. If not given, the output will be set with the same filename as the input but with the OUT extension. The RUNALL utility can be used to run analyses with any number of data sets. 
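When many input files need to be run this way, the command line can be assembled by a script. This is a minimal sketch in Python that only builds the command described above; it assumes the executable is reachable as "Mplus" on the system path and that the default output file uses a lowercase .out extension, and the file names are hypothetical.

```python
from pathlib import Path


def mplus_command(input_file, output_file=None, executable="Mplus"):
    """Build the argument list: Mplus inputfile [outputfile]."""
    command = [executable, str(input_file)]
    if output_file is not None:
        command.append(str(output_file))
    return command


def default_output_name(input_file):
    """Name the output would get when no output file is given (assumed .out)."""
    return str(Path(input_file).with_suffix(".out"))


command = mplus_command("growth.inp")
print(command, "->", default_output_name("growth.inp"))
# To actually run it, one could pass the list to subprocess.run(command, check=True).
```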


I am running a growth model with 6 time points that are not equally spaced. I am coding time as: iw sw | cigs301@0 cigs302@.5 cigs303@1 cigs304 cigs305 cigs306 [freeing the last 3 times]. I keep receiving these messages:

THE LOGLIKELIHOOD DECREASED IN THE LAST EM ITERATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.

I would appreciate it very much if you can suggest a way of addressing this issue. Thank you. Below is the input I am using.

TITLE: 10th Grade Growth Model
DATA: FILE IS "C:\Documents and Settings\INSTHEALTHSA5\Desktop\Growth10.dat";
  FORMAT IS 203F8.2;
VARIABLE: NAMES ARE
  Crsswlk transfer treatms hs9dist y1prt y1pot y2follow y3prstat y3postat
  y4sustat prisk7 rhsfree rhsred sex AGE white Latino Black Asian Amind
  othrace misrace T1SRPEP7 SRPREV7 SRPREV8 SRPREV9 SRPREV10
  rprog7th rprog8th rprog9th R7TH R8TH R9TH
  refuset1 refuset2 refuset3 refuset4 refuset5
  descale1 descale2 descale3 meandes4 meandes5 meandes6
  comsk1 comsk2 comsk4 comsk5 comsk6
  meanq51 meanq52 meanq53 meanq54 meanq55 meanq56
  meanq61 meanq62 meanq63 meanq64 meanq65 meanq66
  meanq91 meanq92 meanq93 meanq94 meanq95 meanq96
  meanq111 meanq112 meanq113 meanq114 meanq115 meanq116
  meanq271 meanq272 meanq273 meanq274 meanq275 meanq276
  posexp1 posexp2 posexp3 posexp4 posexp5 posexp6
  intcig1 intcig2 intcig3 intcig4 intcig5 intcig6
  intalc1 intalc2 intalc3 intalc4 intalc5 intalc6
  intpot1 intpot2 intpot3 intpot4 intpot5 intpot6
  cigs301 cigs302 cigs303 cigs304 cigs305 cigs306
  alc301 alc302 alc303 alc304 alc305 alc306
  drnk301 drnk302 drnk303 drnk304 drnk305 drnk306
  pot301 pot302 pot303 pot304 pot305 pot306
  huff301 huff302 huff303 huff304 huff305 huff306
  bng301 bng302 bng303 bng304 bng305 bng306
  other303 oth304 oth305 oth306
  pres303 pres304 pres305 pres306
  smoket1 smoket2 smoket3 smoket4 smoket5 smoket6
  drinkt1 drinkt2 drinkt3 drinkt4 drinkt5 drinkt6
  drunkt1 drunkt2 drunkt3 drunkt4 drunkt5 drunkt6
  binget1 binget2 binget3 binget4 binget5 binget6
  pot1 pot2 pot3 pot4 pot5 pot6
  inhalet1 inhalet2 inhalet3 inhalet4 inhalet5 inhalet6
  othert3 othert4 othert5 othert6
  prest3 prest4 prest5 prest6
  t1useall t2useall t3useall t4useall t5useall t6useall
  catuse1 catuse2 catuse3 catuse4 catuse5 catuse6;
  USEVARIABLES ARE treatms hs9dist cigs301-cigs306;
  BETWEEN IS treatms;
  CLUSTER IS hs9dist;
  MISSING IS blank;
ANALYSIS: TYPE IS TWOLEVEL;
MODEL:
  %WITHIN%
  iw sw | cigs301@0 cigs302@.5 cigs303@1 cigs304 cigs305 cigs306;
  cigs301-cigs306 (1);
  %BETWEEN%
  ib sb | cigs301@0 cigs302@.5 cigs303@1 cigs304 cigs305 cigs306;
  cigs301-cigs306@0;
  ib sb ON treatms;


You should send your input, data, output, and license number to support@statmodel.com. 

Daniel posted on Wednesday, September 28, 2005  11:31 am



A reviewer asked my colleague and me about "period effects" in our longitudinal data collection strategy. Essentially, the reviewer was concerned that by collecting repeated measures each spring semester, we may be encountering effects that are specific to spring. Do you have any suggestions for how we may counter this argument? Is there a name for this effect, other than period effect, that can aid my search? Do you believe it is better to model with random instead of fixed timepoints? I'd appreciate any guidance.


The only thing that comes to mind is that in education, students may do better in the Spring than in the Fall, because a loss of knowledge over the Summer lowers their Fall scores. I imagine it would depend on what you are measuring. Perhaps someone else has some knowledge of this. 

bmuthen posted on Wednesday, September 28, 2005  9:11 pm



Just to add to this discussion, I wonder why you are thinking of random instead of fixed time points. Random time points would mean that different students are measured at different times, irregularly. It seems better to me to measure a couple of times a year, say Fall and Spring; this would still be using fixed time points. I wonder if anything is written under the rubric of seasonal effects in non-educational contexts. 


I wonder what would be the maximum number of timepoints that can reasonably be used in LGM? Are there any guidelines for this? For example, if I have data from 100 individuals across 60 measurement points, is it possible to use LGM to analyze the data, or should I rather be satisfied with time series analyses? 

bmuthen posted on Tuesday, January 17, 2006  6:27 am



I think 60 time points would be a lot for the multivariate, wide data, single-level approach to growth that is usually used in latent growth curve modeling. But you can formulate this as a long data, 2-level analysis as in the multilevel approach to growth. Then cluster = person id and you regress the outcome on time with a random slope. With a single outcome you then have only a univariate problem, where the 60 time points are 60 members of a cluster. 
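
As a rough sketch of the wide-to-long restructuring described above, assuming each person's repeated measures sit in one list and missing occasions are coded as None (the function name and data layout are illustrative, not part of Mplus):

```python
from typing import Dict, List, Optional, Tuple


def wide_to_long(
    wide: Dict[str, List[Optional[float]]],
    times: List[float],
) -> List[Tuple[str, float, float]]:
    """Turn one row per person (wide) into one row per person-occasion (long).

    Each output row is (person_id, time, outcome). The person id then plays
    the role of the cluster variable, and the outcome is regressed on time.
    """
    long_rows: List[Tuple[str, float, float]] = []
    for person_id, outcomes in wide.items():
        if len(outcomes) != len(times):
            raise ValueError(f"{person_id}: expected {len(times)} outcomes")
        for t, y in zip(times, outcomes):
            if y is not None:  # a missing occasion simply contributes no row
                long_rows.append((person_id, float(t), y))
    return long_rows


# Hypothetical data: two people, three occasions, one missing value.
example = wide_to_long({"p1": [1.0, 2.0, None], "p2": [0.5, 0.7, 0.9]}, [0, 1, 2])
```

In the long layout a person with 60 occasions contributes 60 rows to one cluster, which is why the problem becomes univariate.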

Russell Ecob posted on Thursday, February 02, 2006  1:00 pm



I have data (ordered categorical) with 5 time points with a range of explanatory variables. I cannot decide whether to model with up to quadratic terms in the growth model and free some time points (e.g. the last two – I am particularly interested in relationships between variables near the beginning of the sequence) or whether to increase the degree of the polynomial (to cubic). What considerations would decide you on one course or the other? I assume that polynomials with degree larger than cubic cannot be fitted (at present) in Mplus. 

bmuthen posted on Thursday, February 02, 2006  6:22 pm



You can fit polynomials with degree larger than cubic in Mplus, but you have to revert from the growth language back to the approach using BY statements. Adding free time scores to a quadratic sounds a bit complex, although perhaps necessary. Maybe the quadratic works better with some transform of age such as log age? We have found that to be the case in some data where the process increases faster than it decreases. 

Ann Helgman posted on Wednesday, February 15, 2006  12:05 pm



I have only 3 time points (equal and balanced) but would like to fit a second-degree growth model (with parameters for the intercept, linear, and quadratic growth). However, I am only interested in making the intercept random. Conceptually, this model makes more sense than a linear model because I do expect subjects to initially increase and then decrease. I know it seems a bit of a stretch to fit a quadratic growth model with only 3 time points, but it makes sense, and (in HLM at least) the likelihood ratio test indicates that adding TimeSquared to the model improves the fit to the data (or is that intuitively obvious anyway?). I would very much like to proceed with quadratic growth, with only the intercept random: is there a reason not to do so? Many thanks, Ann 

bmuthen posted on Thursday, February 16, 2006  6:08 am



That should work. It often makes sense that only the intercept factor is random. 

Ann H. posted on Thursday, February 16, 2006  8:55 am



Great! I'm confused about what to call something: when only the intercept is random, should I still be calling this approach latent growth trajectory modelling? I think that both the intercept and growth factors must be random in order to call this approach "individual growth", so I was wondering what would be the correct way to refer to a model with only the intercept random, but with fixed linear and quadratic growth terms: should this be a "population growth trajectory" or something like that, or can it still be referred to as individual growth, albeit constrained so that everyone is assumed to have the same linear and quadratic rates? Thanks so much for this board! Ann 


I would call it a growth model with a random intercept growth factor and fixed slope growth factors. 


I have longitudinal clinical data on symptoms of cancer treatment. Individuals were measured several days before the first treatment and then about 48 hours after the second and third treatments. I understand that three time points is not ideal, but it is what I am stuck with. The length of time between treatments (cycle length) can vary but it was typically 14, 21 or 28 days depending upon the chemotherapy regimen. My intention is to center around the second chemotherapy treatment, thus my first time point will be negative. Within patients with the same treatment cycle length, the intervals between time points will be fairly consistent, BUT the intervals will be very different across treatments with different lengths (e.g. 14 day vs. 21, vs. 28 days). My questions are, Does the variability in the intervals between time points threaten the validity of the linear growth model? AND would I be better to fix the first time point at 0 and then let the other two time points vary across individuals. 

bmuthen posted on Tuesday, February 28, 2006  3:47 pm



You might want to approach this as a multiple-group growth analysis, where groups differ based on treatment regimen. You would then have different time scores for the different groups. If you believe the same linear model holds for the groups, you can impose that restriction, holding growth factor means, variances, and covariance equal across groups. This way, the different timings due to different regimens will not be a problem. For growth factors to be the same across groups, however, you have to center at the same time point, so it seems best to center at the pre-treatment time point. 


I will be working with a data set of 550 mother-adolescent dyads measured at 4 time points over four years. I have the dates of the interviews and thus I can compute the intervals between each observation. I anticipate that I will model the data with individually varying time points. I have 3 questions: My reading of the Singer and Willett text tells me that using this technique (as opposed to forcing the times to be the same for all participants) will reduce error in the slope. Is this correct? Also, I assume that my reliability of change (Willett 1989) will be higher than with fixed time points. I think this follows from the reduced error in the slopes AND from the larger SST (sum of squares for time). Finally, I wonder if you can point me to some published work which has modeled growth using individually varying time points. 


I would agree with Singer and Willett on the first point. I believe the second point is correct. I don't know of any published work that uses individually-varying times of observation although I'm sure there must be many. This question might best be posted on Multilevelnet. 


I want to analyze data from a longitudinal study measured at three time points (each nine months apart) in six centres. The data are about the influence of therapy (three different therapy types; utilization in the six months before each interview date) on substance use (in the last 30 days before the three time points, for four substances) in opioid addicts. All variables are continuous (no. of days). About 50 patients per centre have a full data set. I'm thinking about a growth model with intercepts and slopes for each substance and therapy type, with regressions of the substance use developments on the intercepts of the therapies. I'm afraid the n of each centre is too small for it to be a covariate. Additionally, the growth curve is not quite a linear one. Does LGM seem like a possible solution for this? 


Seems like growth modeling is possible here. n=50 would seem large enough for center as a covariate. If your study is a randomized trial, you will find several relevant growth modeling articles on our web site. 


In an earlier posting you indicated that "A minimum of four timepoints is recommended for growth models for two reasons. First of all, with less than four timepoints it is not possible to identify enough parameters in the growth model to make the model flexible. Secondly, four timepoints give more power." Is there an empirical citation you recommend that supports these two reasons? 


I don't know of any citation, but if you look at the degrees of freedom for a three timepoint model and compare it to the degrees of freedom for a four timepoint model, you will see the lack of flexibility for three timepoints versus four. A standard three timepoint growth model has one degree of freedom while a standard four timepoint growth model has 5 degrees of freedom. 
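
The degrees-of-freedom comparison above can be reproduced with a small counting sketch. It assumes a standard linear growth model for continuous outcomes with free residual variances and no residual covariances, as in the parameter count given at the top of this thread; the function names are illustrative.

```python
def h1_parameters(timepoints: int) -> int:
    """Unrestricted (H1) model: T means plus T(T+1)/2 variances and covariances."""
    return timepoints + timepoints * (timepoints + 1) // 2


def linear_growth_parameters(timepoints: int) -> int:
    """Linear growth (H0) model: 2 factor means, 2 factor variances,
    1 factor covariance, and T residual variances."""
    return 2 + 2 + 1 + timepoints


def linear_growth_df(timepoints: int) -> int:
    """Degrees of freedom = H1 parameters minus H0 parameters."""
    return h1_parameters(timepoints) - linear_growth_parameters(timepoints)


df_three = linear_growth_df(3)  # 9 - 8
df_four = linear_growth_df(4)   # 14 - 9
```

With three timepoints the single degree of freedom leaves no room for freeing a time score or adding a residual covariance without imposing other restrictions.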


To add to this, the Muthen and Curran (1997) Psych Methods article gives power charts that could be taken as arguments for a substantial power increase going from 3 to 4 time points. 


Just a conceptual question here, but I was wondering if you could explain why, if the linear trend coefficients are fixed at, say, 0, 1, 2, and 3, the intercept coefficients are all fixed @ 1? (E.g., the parallel processes growth curve modeling example on the website, cont5.std.) If the assumption is that the intercept represents the initial levels (i.e., estimated T1 values) of the variable, shouldn't they be fixed @ "0"? Thanks. 


The intercepts of the outcome variables are fixed at zero. It is the loadings for the intercept growth factor that are fixed at one, and this follows from the formulas for the growth model. 
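
A minimal sketch of why the intercept loadings are all one, assuming a linear growth model in which the expected outcome at each occasion is 1 * (intercept factor) + (time score) * (slope factor). The factor means used here are made-up numbers.

```python
from typing import List


def growth_loadings(time_scores: List[float]) -> List[List[float]]:
    """One row per occasion: [intercept factor loading, slope factor loading].

    The intercept loading is 1 at every occasion because the intercept factor
    contributes equally to every repeated measure; only the slope loading
    (the time score) changes over time.
    """
    return [[1.0, float(t)] for t in time_scores]


def implied_means(mean_i: float, mean_s: float, time_scores: List[float]) -> List[float]:
    """Model-implied outcome means, with the outcome intercepts fixed at zero."""
    return [row[0] * mean_i + row[1] * mean_s for row in growth_loadings(time_scores)]


loadings = growth_loadings([0, 1, 2, 3])
means = implied_means(mean_i=5.0, mean_s=0.5, time_scores=[0, 1, 2, 3])
```

Because the first time score is 0, the first implied mean equals the intercept factor mean, which is what makes the intercept factor interpretable as initial status. Fixing the intercept loadings at 0 instead would remove the intercept factor from the model altogether.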


I have two questions: 1. Would a cubic growth model be identified if the outcome variable was observed at 4 time points and the growth factors are regressed on covariates, or do you need more time points to achieve identification? 2. How do you interpret a model with free time scores? Suppose your model is MODEL: %OVERALL% i s | out1@0 out2@2 out3* out4*; The output will give you the values of i and s for each individual, and therefore you obtain the intercept of the estimated curve and the slope or change between the first two time points, but how do you calculate the estimated outcome for the third and fourth timepoints? 


1. No, a cubic growth model cannot be identified with four time points. 2. The mean of the slope growth factor is the average growth for the fixed time scores. You can also check if the free time scores deviate from linearity by dividing the difference between the estimated time score and the linear time score by the standard error of the estimated time score. If this is significant, the estimated time score deviates from linearity. The output does not give individual scores. If you want these, you need to ask for FSCORES in the SAVEDATA command. I'm not sure what you mean by estimated outcomes for the third and fourth time points. If you mean estimated time scores, they are part of the results. 
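
The linearity check described in point 2 amounts to a z-type ratio. A minimal sketch, assuming the ratio is referred to a standard normal distribution; the estimate and standard error below are made-up numbers, not results from any real output.

```python
import math


def time_score_z(estimate: float, linear_value: float, std_error: float) -> float:
    """(estimated time score - linear time score) / SE of the estimated time score."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return (estimate - linear_value) / std_error


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical: third time score estimated at 2.6 (SE 0.2); linear value is 3.
z = time_score_z(estimate=2.6, linear_value=3.0, std_error=0.2)
p = two_sided_p(z)
```

A ratio well beyond about 2 in absolute value would suggest that the free time score departs from the linear time score.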


Hi Linda, many thanks for your quick answers. My question about free time scores has to do with the fact that I would like to plot, for some of the individuals in the study, the model-fitted and the observed curves and compare them by calculating residuals. Even though I can use the intercept and slope given in FSCORES to obtain the values of the individual's fitted curve up to the second timepoint (the first time point was zero and the second timepoint 2), it is not clear to me how to obtain values of the fitted curve after the second time point if free time scores were used in the model for the third and fourth times. Hope I made myself clear now. Thanks again! 


You would use the following formula: yhat(ti) = ihat(i) + lambda(t) * shat(i), where ihat(i) is the intercept growth factor score for individual i, lambda(t) is the estimated time score for time t, and shat(i) is the slope growth factor score for individual i. 
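
As a sketch of how that formula could be applied once the factor scores are saved, assuming the intercept and slope scores and the estimated time scores have been read into plain Python numbers (all values below are invented for illustration):

```python
from typing import List


def fitted_curve(i_hat: float, s_hat: float, time_scores: List[float]) -> List[float]:
    """yhat(t, i) = ihat(i) + lambda(t) * shat(i) for every occasion t."""
    return [i_hat + lam * s_hat for lam in time_scores]


def residuals(observed: List[float], fitted: List[float]) -> List[float]:
    """Observed minus fitted values, occasion by occasion."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    return [y - yhat for y, yhat in zip(observed, fitted)]


# Hypothetical individual: factor scores 10 and 2; time scores 0 and 2 fixed,
# the third and fourth (2.8 and 3.5) standing in for estimated values.
lambdas = [0.0, 2.0, 2.8, 3.5]
fitted = fitted_curve(i_hat=10.0, s_hat=2.0, time_scores=lambdas)
resid = residuals([10.5, 13.0, 16.0, 17.5], fitted)
```

The free time scores are used exactly like the fixed ones: the estimated value for the third or fourth occasion simply takes the place of lambda(t).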

Laney Sims posted on Wednesday, February 07, 2007  8:13 am



1) I am working on a latent growth model with two measures over three time points, so I have two slope variables and two intercept variables, and regression relationships between them. I would like to isolate individual cases to see if they behave the way I expect (i.e. choose a case with a high intercept for one measure and then look at the slope for the other measure). Is there a way to specify an individual case and look at the values of all the latent variables for that case? 2) Is there a way to find the frequencies of the original measures (e.g., the number of cases with a response of 1)? 


1. You can ask for factor scores in the SAVEDATA command. You can include an id variable in the VARIABLE command. Then you can select people who are high on the intercept factor and save their scores and look at them. You can also plot the intercept and slope growth factors and see which individual each dot represents by holding the mouse on that dot. 2. I assume you mean categorical outcomes. The proportions for each categorical outcome are given automatically. You can also use TYPE=BASIC; for basic sample statistics. 


I have a several part question: (1) Is it appropriate to run a four wave model similar to example 6.10 in the user's guide, but without any time invariant covariates? If not, please explain. (2) If this is fine, what information (other than the fit indexes) do I need to include in a results section? For example, if I presented a figure like the one graphically representing ex 6.10, which standardized coefficient values should I include? I am asking because the output shows relations that I didn't specify in the input. To illustrate, if I were to run ex 6.10 (without the time invariant covariates) the output would show a31 with i; a31 with s; a32 with i; a32 with s, etc. Why is this? Do I need to include these paths in a figure? (3) Continuing to use 6.10 as an example, one of my models shows that y11 ON a31 and y12 on a32 are significant/have large effects, but y13 on a33 and y14 on a34 are not as large. if the fit indexes are good, are the findings worth reporting? Or do all the effects from the time varying covariates to the outcome variables need to be large/significant? Thank you for your time! Renee 


1. Yes. 2. Take the time-varying covariates off of the USEVARIABLES list. 3. Report both significant and non-significant effects. 


Thanks, Linda! However, I don't think I understand your answer to #2. If I take the time-varying covariates off of the USEVAR list, then the input won't run. In regards to #3, I understand that I should report both significant and non-significant findings. However, I do not have a theoretical explanation (e.g., intervention between time 2 & 3) for why the time three and time four covariates have different relations with the time three and time four outcome variables (compared to time 1 and 2). Given this, does it make sense to report any of these findings? In other words, do the good fit indexes make the two significant relations worth talking about? Thank you again! 


I must have misread your post. I thought it was time-varying covariates. The path diagram should reflect the parameters in the model. Yes. 


Thank you for your responses, Linda. In growth curve modeling, are outcomes at early time points used to predict later time points? I'm asking because several of my models have great fits but the paths from the covariates to the outcome variables are not significant. 


The standard random coefficient growth curve model does not have direct relationships from the outcome at time one to the outcomes at time two etc. A model with such relationships is an autoregressive growth model. I think there has been some work combining these two models. I don't see the fact that the time-varying covariates behave differently at the different time points as a problem. It may just be that there are different covariates that are important at different time points. 

Carl Hauser posted on Tuesday, April 10, 2007  8:19 pm



I hope that this is not too much of an off the wall question. I was recently introduced in a very cursory way to the concept of quantile regression. If I'm understanding it correctly, this would be a very useful method for using individual (person) growth data in a descriptive/diagnostic way. The particular application I saw used this procedure for creating academic growth reference charts, like the ones pediatricians use for determining percentiles of height and weight across time. In searching the posts here, I found none that refer to this. As a very new user of Mplus and of this site, I don't know if this is appropriate to ask, BUT have you given any serious consideration to this (quantile regression) as a feature in Mplus, OR can it be done in the current version even though it isn't referenced? 


I don't think we can do that in Mplus. You are the first to add this to the wish list for future Mplus development. 

Hemant Kher posted on Saturday, April 14, 2007  3:40 pm



My question is about the appropriateness of using LGMs for data I have. My unit of analysis is automobiles, and my sample contains 132 automobiles (e.g. Camry, Corolla, Crown Victoria, etc.), with reliability data for models 2001-05. Reliability data comes from annual surveys conducted from 2001-06. For each model year, data comes from the survey conducted from the model year till 06. Thus, for models made in 2001, we have data from the surveys from 2001-06. For models made in 2002, data comes from surveys from 2002-06 and so on. Finally, for models made in 2005, data comes from surveys from 2005 and 2006. Mean reliability for the samples plotted over time shows linear changes, and LGM models fit ... but is the application of LGMs appropriate? 


Growth modeling considers changes in individual subjects (such as cars) over time, so you need to have individual-specific measurements at the different time points. I may misunderstand, but it sounds like you have a reliability measure obtained not on each of your 132 cars, but obtained from an annual survey of owners of such cars. If so, that would mean that every Camry made in the same year in your sample has the same outcome in a given year, so no individual variation. So if I am reading this right, this would not be a situation for which growth modeling is applicable. 


Hi Linda and Bengt, I'm trying to evaluate a primary prevention program for alcohol misuse in schools. (1) I have non-equidistant measurement points: pretest, posttest (0.5 years after pretest), follow up 1 (1.5 years after pretest) and follow up 2 (2.5 years after pretest). To assume a linear trend, would "0 1 3 5" be the right choice (one time score unit = 0.5 years)? (2) If it doesn't fit well (as I would predict), is it possible to free the last two measurement points, even in this case of non-equidistance? (3) Is it possible to add a quadratic trend to this existing trend with free time scores? Thanks! 


1. The time scores look correct. 2. You can free two time scores. 3. I would not free time scores in a quadratic model as a first step. 


I am doing two-part models for semicontinuous data. When I ran the regular latent growth curve models, results suggested that a quadratic model fits the data better than a linear model. How do I specify quadratic growth in the two-part models if my time points are not equally spaced? So linear with the unequal distance between data points is specified as: iu su | bin3@0 bin4@1.5 bin5@2.5 bin6@3.5; iy sy | cont3@0 cont4@1.5 cont5@2.5 cont6@3.5; Would I just let the time points be free to estimate a quadratic model: iu su | bin3@0 bin4@1.5 bin5* bin6*; iy sy | cont3@0 cont4@1.5 cont5* cont6*; Thank you 


You specify a quadratic model by adding the name of the quadratic growth factor on the left-hand side of the | symbol: i s q | y1@0 y2@1 y3@2 y4@3; 


Thank you. I wasn't sure if it was the same for the two-part models. 


Hello. I would like to model trajectories of change in lung function across four timepoints in a sample of factory workers. The difficulty I face is that lung function was not measured in waves. For example, one worker may have had his lung function measured once one year after he was hired, and then three times during the following year. Another worker may have had his lung function measured at four equally spaced intervals beginning the year that he was hired. I'm not sure what would be the best way to model these data, or whether it is even possible. Thank you. Denise 


It sounds like you have individually-varying times of observation. See Example 6.12 and the AT option. 


Thank you. 


Dr. Muthen, Thank you, again, for your response. I tried the approach you suggested. The model ran, but I received the following error message: THE ESTIMATED COVARIANCE MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE. COMPUTATION COULD NOT BE COMPLETED. THE VARIANCE OF S APPROACHES 0. FIX THIS VARIANCE AND THE CORRESPONDING COVARIANCES TO 0 OR DECREASE THE MINIMUM VARIANCE. 


Just say s@0; which fixes the slope variance to zero, i.e. the slope is a fixed instead of a random effect (everybody has the same slope value). 


Hello, a free time scores model fitted my 4 wave data best. Due to interpretation issues I decided to reparameterize from 0 1 * * to 0 * * 1. I want to test these free time scores with regard to deviance from linear and quadratic scores. Is this possible with 0 * * 1? How would one interpret this compared with such a deviance test in 0 1 * *? E.g., in the latter parameterization I would test deviance from linearity in the development from T2 to T3 to T4, but how would one interpret this testing in 0 * * 1? In addition, and maybe in the same direction of thinking: Is it possible to plot the means of 0 * * 1 as in 0 1 * *? Regards, michael 


The test of an estimated time score versus its linear counterpart is the difference between the two divided by the standard error of the estimated time score. You should be able to obtain plots in both situations. If you have a problem, send it and your license number to support@statmodel.com. 


Thanks, but I already found the formula in this forum. My question was whether one can test this deviance from linearity only in the "normal" parameterization "0 1 * *" or also in the "0 * * 1" parameterization (or is this silly?). I know how to interpret the second parameterization, but another question arises. You once stated that one could interpret a quadratic factor more as "later development". Does this also hold for the second parameterization, where the means of the slopes describe the average increase or decrease between T1 and T4? 


You can test against linearity in either specification. No, later development is relevant only when a quadratic growth factor is involved or a piecewise model. 


OK, my last post regarding this topic: I asked because I get different deviance results for the two parameterizations. The cause of all these questions is that a free time score model fitted as well as a quadratic model, and both fitted better than a linear model (binary part of a two-part model). In terms of parsimony (2 df more in a free score model) I decided on free scores (with the ensuing interpretation problems if you add covariates, and the needed reparameterizations). But in the full two-part model, I instead have one df more with a quadratic model in the binary part as compared to free scores in the binary part (due to needed specifications for the entire model), and my parsimony argument is no longer valid. So at the moment I'm thinking of going back to a quadratic model in the binary part and getting rid of all the interpretation problems. Am I right? 


Or, to put it more generally: if a free score model fits as well as a quadratic model, which model should one choose? 


I would use free time score models very sparingly and therefore go with the quadratic in your case. 


I have a parallel growth model, in which one development seems linear and the other exponential. Measures are at 0, 3 and 7 months. Can I model one development as linear with time points 0, 3, 7 and the other as exponential? Would this then be 1, e^3, e^7? I'm not sure how to explain then that both developmental processes are on the same timescale. I've used the article of Nilam Ram and Kevin Grimm (Using simple and complex growth models to articulate developmental change). Thanks, Renske 


If you go to http://www.statmodel.com/trainhandouts.shtml, you can find our short course handouts. See Topic 3 slide 103 for an example of how to express the time scores for exponential growth. 


Hello, A simple question: I have unequal time between my measurement points (8 months between 1st and 2nd, 12 months between 2nd and 3rd, and 12 months between 3rd and 4th). My time scores are the following: i s | MEASURE1@0 MEASURE2@1 MEASURE3@2.5 MEASURE4@4; What does the mean of the slope indicate (average growth from time1 to time2 or average growth considering all the time points)? Thank you! 


It is for a one unit change in time, from 0 to 1 or 1 to 2 etc. 
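
To make the "one unit" concrete, here is a small sketch of how fixed time scores follow from the elapsed time once a unit is chosen. The 8-month unit matches the time scores 0, 1, 2.5, 4 in the question above; the slope mean of 1.6 is a made-up value.

```python
from typing import List


def time_scores(elapsed: List[float], unit: float) -> List[float]:
    """Time scores for a linear model: elapsed time since the first occasion,
    expressed in multiples of the chosen unit."""
    if unit <= 0:
        raise ValueError("unit must be positive")
    start = elapsed[0]
    return [(t - start) / unit for t in elapsed]


def slope_per_original_unit(slope_mean: float, unit: float) -> float:
    """Convert growth per time-score unit into growth per original time unit."""
    return slope_mean / unit


# Occasions at 0, 8, 20 and 32 months, with one time-score unit = 8 months.
scores_months = time_scores([0, 8, 20, 32], unit=8)
# Occasions at 0, 0.5, 1.5 and 2.5 years, with one unit = 0.5 years.
scores_years = time_scores([0, 0.5, 1.5, 2.5], unit=0.5)
# A hypothetical slope mean of 1.6 per 8-month unit, expressed per month.
per_month = slope_per_original_unit(1.6, unit=8)
```

So with these time scores the slope mean is the average change per 8 months across the whole study period, not only the change between the first two occasions.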


Can someone recommend an article that presents a complete example of example 6.10 (pp. 95-96) in the Mplus manual, fourth edition? This example is "Linear Growth Model for a Continuous Outcome with Time-Invariant and Time-Varying Covariates." I've been exploring this with 5 time points and it would help me to read a presentation of this approach with real data. 


I would suggest that you ask this question on SEMNET or MULTILEVELNET. You are likely to get more suggestions. 


We are conducting an LGM with 9 waves of yearly data. We first ran a univariate unconditional model looking at linear and polynomial growth factors. We also examined a model where all of the parameters were freed (0 * * *...1). The freely estimated model provided a much better fit compared to the linear-only model. The quadratic also provided a better fit than the linear-only. We retained the freely estimated model. In our LGMMs, we then used the freely estimated growth factors. A theoretically coherent 4-class solution best fit the data, with covariates predicting class membership in expectable ways. I am now wondering whether our use of freely estimated parameters is justified. Using the polynomial growth factors, we obtained a very similar 4-class solution to the one with the freely estimated model. Is there a way to adjudicate between the use of the quadratic and freely estimated growth parameters here? Thanks again in advance. 


I think either model is defensible, but a polynomial such as a quadratic is more parsimonious which should be reflected in a lower BIC. 


Ok, great. Very much appreciated. 


I am doing growth curve modeling with three time points. I have some of the predictors measured just before the last time point. In order for me to be able to include the predictors in the model, should my intercept reflect the last timepoint? And, I guess there is no way I can predict the trend (considering the time when the predictors were measured)? Thank you for your help. 


I agree on both counts. But you may want to allow the slope to correlate with the predictors. 

Bill Dudley posted on Tuesday, April 14, 2009  1:19 pm



I have data on how knee joints move and muscles are activated upon landing (imagine a basketball player landing after a jump to catch a rebound). These continuous level data are captured at 1 millisecond intervals, and the most interesting part of the landing typically takes about 150 milliseconds to play out. In the recent Baltimore workshop Bengt indicated that after a few dozen measures, the multivariate approach for growth curve modeling becomes unwieldy. In this case I have over 100 time points (the number of time points varies across individuals). In addition, each person has 5 trials. Thus I am thinking that I need to use a multilevel approach (with two levels: time points within trials and trials within individuals). Will this work within Mplus (I have the combo version), and if so can you point me to examples or slides within the last two topics that provide an example of how I might model this? 


I think you are right to take a long, two-level approach in this case. Take a look at the UG ex 9.16. 

ywang posted on Tuesday, September 08, 2009  10:47 am



Dear Dr. Muthen: I am new to Mplus and have a basic question. I have three time points for Latent Growth Modeling for dummy variables. What is the maximum number of covariates I can have for the model to be identified? Thank you very much for your help in advance! 

ywang posted on Tuesday, September 08, 2009  10:50 am



Dear Dr. Muthen: I forgot to ask another question regarding three-time-point latent growth modeling. Can I do parallel latent growth modeling for three time points? If so, can I add covariates and how many covariates can I add into the parallel model? Thank you! 


The number of covariates that you can have is not a function of how many time points you have. Each covariate adds information in the form of covariances with the 3 outcomes, so if you have 2 growth factors (linear growth) your net result is positive. The answer to your second post is yes and see answer above. 
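
The counting argument above can be sketched as follows. This is only an illustration for continuous outcomes with a standard linear growth model in which each covariate influences the outcomes solely through the intercept and slope growth factors; the count differs for categorical outcomes, and the function name is made up.

```python
def linear_growth_df_with_covariates(timepoints: int, covariates: int) -> int:
    """Degrees of freedom for a linear growth model with time-invariant covariates.

    Without covariates: T + T(T+1)/2 sample moments minus (5 + T) parameters.
    Each covariate adds T covariances with the outcomes but only 2 new
    parameters (one regression slope for each growth factor).
    """
    if timepoints < 3:
        raise ValueError("this sketch assumes at least three timepoints")
    base_moments = timepoints + timepoints * (timepoints + 1) // 2
    base_parameters = 5 + timepoints
    net_gain_per_covariate = timepoints - 2
    return (base_moments - base_parameters) + covariates * net_gain_per_covariate


df_no_covariates = linear_growth_df_with_covariates(3, 0)
df_two_covariates = linear_growth_df_with_covariates(3, 2)
```

With three timepoints each covariate adds three pieces of information and two parameters, so the net result is one extra degree of freedom per covariate.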

Dale Glaser posted on Wednesday, February 17, 2010  1:46 pm



Dear Dr. Muthen(s); I wanted to follow up on the 2007 inquiry and ask whether the most recent version of Mplus has the capability of conducting longitudinal quantile regression. I have a project that may necessitate the use of such a technique, and being an SPSS user, my only option is to use the quantreg package from R. Any feedback would be most appreciated. Dale 


What is longitudinal quantile regression? 

Dale Glaser posted on Thursday, February 18, 2010  3:16 pm



Good question, Bengt! A faculty member at one of the local universities requested help with this technique, and though I know what Quantile Regression is (see Hao and Naiman, 2007) for cross-sectional studies, I am blissfully unaware of this approach to longitudinal studies. 

Sara Vargas posted on Wednesday, March 31, 2010  1:59 pm



I am doing LGM using 3 time points (0, 6, 12 months) to look for condition effects (control vs intervention) on the slope and intercept of an outcome measure. I estimated the 3rd time point (estimated at 8 months after baseline). The syntax: Intercept by baseline@1 time2@1 time3@1; Slope by baseline@0 time2@6 time3*; [baseline@0 time2@0 time3@0]; [Intercept Slope]; baseline(1); time2(1); time3(1); Intercept on condition; Slope on condition; I want estimated means and standard errors for each group (control and intervention) at each time point. I have recentered the slope at time 2 and at time 3 and recorded the intercept as the mean. I run into problems when I center on a time 3 that has been estimated. I am first setting the slope @ 0, 6, *, and then recentering on time 2 such that the slope is set @ -6, 0, *. What is the proper way to recenter at time 3? Also, is there a better way to get estimated means and standard errors for two conditions over three time points? 


Instead of recentering, I would express the means in terms of labeled model parameters using the "new" option of Model Constraint. This gives you the estimate and its SE. So for instance, the mean at time 3 for condition=0 is: Model Constraint: New(t3mean); t3mean = iint + lam3*sint; where iint/sint is the label for the intercept of the intercept/slope growth factor in its regression on condition, and lam3 is the label for the estimated time score at time 3. 
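
The arithmetic behind that expression can be sketched for both conditions. This only illustrates the point estimates (Model Constraint also supplies the standard errors); the regression-slope arguments and all numeric values are hypothetical stand-ins for labeled parameter estimates.

```python
def implied_mean(
    i_int: float,
    i_on_cond: float,
    s_int: float,
    s_on_cond: float,
    time_score: float,
    condition: int,
) -> float:
    """Model-implied outcome mean at one occasion for one condition.

    The growth factor means for a condition are the intercepts of their
    regressions on condition plus the regression slopes times the condition
    code; the outcome mean is then intercept mean + time score * slope mean.
    """
    mean_i = i_int + i_on_cond * condition
    mean_s = s_int + s_on_cond * condition
    return mean_i + time_score * mean_s


# Hypothetical estimates: iint = 20, sint = 0.5, lam3 = 8.
t3_control = implied_mean(20.0, 1.0, 0.5, -0.25, time_score=8.0, condition=0)
t3_intervention = implied_mean(20.0, 1.0, 0.5, -0.25, time_score=8.0, condition=1)
```

For condition=0 this reduces to iint + lam3*sint, the expression given above; the same pattern with time scores 0 and 6 gives the means at the first two occasions.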

Sara Vargas posted on Sunday, April 18, 2010  10:33 am



Thank you for your response. I am still a little confused about how I am supposed to be labeling the intercepts of the intercept/slope growth factors. I tried to do it as noted below, and I keep getting error messages. I also tried numerous variations based on your response and I continue to get error messages. MODEL: Intercept by baseline@1 time2@1 time3@1; Slope by baseline@0 time2@6 time3*; [baseline@0 time2@0 time3@0]; [Intercept Slope]; baseline(1); time2(1); time3(1); Intercept on condition; Slope on condition; MODEL CONSTRAINT: New(t2mean); t2mean = [Intercept] + 6* [Slope]; New(t3mean); t3mean = [Intercept] + 8.2* [Slope]; 


You label the parameters in the MODEL command, for example, MODEL: [intercept] (p1); [slope] (p2); and use the labels in MODEL CONSTRAINT, for example, MODEL CONSTRAINT: p1 = p2; See the user's guide for further information. If you continue to have problems, send the output and your license number to support@statmodel.com. 


I'm having issues with modeling a latent growth curve with achievement data. The time periods of assessment are approximately 0, 2, 4, 6, and 9 years apart, but when I enter those values for the slope, the psi matrix is always NPD. Looking at the curve itself, it appears there is very little improvement from year 6 to 9 in particular (a ceiling effect). So I free the last 3 years, which runs, but the values I get for the slope are not intuitive (from about 2.5 to 3.2 to 3.6). Is it problematic to report or use this as a final model when it doesn't represent the actual timing of developmental growth? Being a developmentalist, it just irks me a bit to use this as the final model. Or is this the best because it does represent the real development in these skills? Many Thanks 


Fixed time scores should reflect the differences between the measurement occasions. If you free a time score, the estimated value shows the deviation from linearity. It looks like you have freed several time scores. Perhaps you should instead consider a different model for your data, for example, a piecewise model. 
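One common way to code a two-piece linear model is sketched below in Python. The knot at year 6 is an assumption prompted by the poster's remark about a ceiling after year 6, not something stated in the reply, and the sketch says nothing about whether such a model is identified for these data.

```python
def piecewise_time_scores(times, knot):
    """Return time scores for two linear pieces joined at `knot`.

    The first slope factor grows until the knot and then stays flat;
    the second slope factor is zero until the knot and grows afterwards.
    """
    start = times[0]
    first_piece = [min(t, knot) - start for t in times]
    second_piece = [max(t - knot, 0) for t in times]
    return first_piece, second_piece


# Measurement occasions from the question (years), hypothetical knot at year 6.
first, second = piecewise_time_scores([0, 2, 4, 6, 9], knot=6)
print(first)   # [0, 2, 4, 6, 6]
print(second)  # [0, 0, 0, 0, 3]
```

Each list would be used as the fixed time scores of one slope growth factor, so the scores still reflect the actual spacing of the occasions.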

Regan posted on Thursday, August 19, 2010  12:49 pm



Dear Professors, I am new to SEM models. I understand that for growth models, you recommend having at least 4 time points. My main interest is to look at an adult predictor on an adult outcome, controlling for measurements in the predictor taken at one and possibly two antecedent (adolescent) periods. Now, is just putting the predictor variable in the model and specifying paths as usual enough, or is there something else that I need to do to adjust for the 'repeated measures' in the model? Example if F1 = adolescent predictor F2 = adult predictor F3 = adultoutcomebehavior can I just use the code: F3 on F2 F1; F2 on F1; (Of course there are other covariates in the model, I just wanted to be simple here) I have been told by someone that if I do this, I am not accounting for repeated measures and that I need to do a multilevel model or growth model. Is this correct? Thank you! 


It does not sound like you have repeated measures. Repeated measures is the measurement of the same variable at multiple time points. It sounds like you have several variables for each person which is fine. Multivariate analysis handles this. 

Regan posted on Friday, August 20, 2010  10:56 am



I am sorry, I wasn't clear. F1 and F2 are the same variable measured first during the adolescent period and then again during the adult period. In that case the same variable (let's say depression) is measured at two time points. Do I need to do anything different than the notation given previously? Thank you and sorry I was not clear the first time. 


With only two repeated measures, only an intercept-only growth model would be available. It would seem your regression model would be fine. 

Regan posted on Friday, August 20, 2010  3:12 pm



Thank you! 

Nary Shin posted on Sunday, September 19, 2010  4:34 pm



I have a 3 time point model which is not linear. The trajectory looks like a V, and this is actually theoretically correct. Because of this characteristic, I fixed only the time 1 and time 2 scores and used a free time score for time 3. Does this sound ok? Thanks in advance. 


You can do this but you really need more than three time points to understand nonlinearity. 

ywang posted on Monday, September 20, 2010  12:56 pm



I have 3 time points of repeated measures. Can I conduct a parallel growth modeling? Do I need 4 or more time points for the model? Thanks a lot! 


Each process can have three time points. It is desirable to have at least four time points so the model has enough degrees of freedom to allow for modifications of the model. 

ywang posted on Wednesday, December 08, 2010  12:00 pm



Dear Linda, It is easier to conduct the parallel growth modeling, but what are the steps for parallel growth mixture modeling? Are there any handouts or papers for this? Thanks! 


You need to extend a regular parallel process model like the one shown in the Topic 3 course handout to mixtures. You may find the following paper, which is on the website, helpful: Kim, Y.K. & Muthén, B. (2009). Two-part factor mixture modeling: Application to an aggressive behavior measurement instrument. Structural Equation Modeling, 16, 602-624. 


Dear Linda, We offered parenting groups to 90 parents (for an average of 8 parents per group) and we would like to show that parenting practices and child's psychological health improved from T1 to T2. I've read above that to do a growth model, I need 4 time points. Unfortunately, T3 and T4 are not yet available. If I run a simple MANOVA (I have 5 dependent variables), reviewers will say that I did not take the nested nature of my data into account. Is there any way that I can compare T1 to T2 on five dependent variables using multivariate hierarchical analyses? (We were originally planning to conduct a piecewise growth HLM model with 4 time points.) I thank you for your time; it is really appreciated. 


Multivariate analysis takes into account the lack of independence due to having several variables per person. Therefore MANOVA would take this into account. You could also look at an intercept only growth model. 


Dear Linda, Let's say that I have just one outcome. Can you confirm that this model would not be identified:
Level 1 (Repeated measures): Outcome = n0 + n1(tprepost) + e
Level 2 (Parent level): n0 = b00 + r0; n1 = b10 + r1
Level 3 (Group level): b00 = y000 + u00; b10 = y100 + u10
where n1 = coefficient representing change from baseline to posttest for each parent (tprepost = measurement time, dummy coded as follows: baseline=0, posttest=1). I'm only interested in y100; could I interpret it? If so, could I add covariates? Thank you again. 


That model is indeed not identified because you don't have repeated Level-1 observations on that post-on-pre regression (they repeat on level 2). Two alternatives: One alternative is to instead write your model as a growth model with 2 time points, y1 for pre and y2 for post. But it is a very weak growth model where you cannot identify both intercept and slope variance across parents, but can, say, fix the slope variance at zero, only estimating its mean. The intercept variance across groups can also be estimated. If you are particularly interested in the mean of the n1 slope (what you called y100), you can work with a 2-level model where parent is level 1 and group is level 2:
Level 1 (Parent level): Outcome = n0 + n1(tprepost) + e
Level 2 (Group level): n0 = b00 + r0; n1 = b10 + r1
You would then focus on b10. 


Thank you very much. This is very, very helpful. Geneviève 

Kesinee posted on Friday, June 17, 2011  9:04 am



Hi, I have 4 time points with a single level. I would like to run an analysis (growth model or autoregressive model) as in Figure 3 or 4 of the paper “10 LATENT VARIABLE MODELING OF LONGITUDINAL AND MULTILEVEL DATA by Bengt Muthén.” Any suggestions would be appreciated. Sincerely yours, Kesinee 


See Chapter 6 for growth modeling examples. A simple autoregressive model would be: y4 ON y3; y3 ON y2; y2 ON y1; 


Hello Linda, I am doing growth mixture modeling on pain measurements taken on hospital patients. Measurements 1 and 2 were taken at fixed time points but the last measurement was taken at time of discharge which is different for each individual. How do I include time in my analysis of trajectories of pain? 


You should create time score variables for the three time points and use the AT option. See Example 6.12 and adapt that to GMM. 


Thanks Linda. I assumed the tscores work with type=random only. This is helpful. i will try it. Thanks. 


Hello Linda, I am getting the following error message: *** ERROR in VARIABLE command TSCORES option is only available with TYPE = RANDOM. Selahadin 


Ok. sorry Linda. I can use type = mixture with random. 

Erin posted on Tuesday, November 15, 2011  2:32 pm



Hello, I have a design with unequally spaced assessments (baseline, 3 months, 4 months, and 9 months). Can I code time as: baseline: 0; 3 months: 1; 4 months: 1.33; 9 months: 3? Thus, one time score unit = 3 months, and .33 units of time = 1 month. Does this make sense? Thanks so much! 


Yes, this makes sense. 
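The coding in the question amounts to dividing months since baseline by the chosen unit. A minimal Python sketch (the function name is made up for illustration):

```python
def months_to_time_scores(months, unit=3.0):
    """Convert months since baseline to time scores, one unit = `unit` months."""
    return [m / unit for m in months]


scores = months_to_time_scores([0, 3, 4, 9])
print([round(s, 2) for s in scores])  # [0.0, 1.0, 1.33, 3.0]
```

Any other unit (for example 1 month, giving 0 3 4 9) describes the same spacing; only the scale of the slope growth factor changes.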


Dear, I am fitting a growth curve model. The means of my 3 measurements of the concept are: 2.47 (time 1), 2.45 (time 2), 2.57 (time 3). The difference between time 1 and time 2 is not significant. The differences between time 1 and time 3 and between time 2 and time 3 are significant. I want to fit a growth model that specifies that there is no growth between time 1 and time 2 and growth between time 2 and time 3. Is this possible? I was thinking about this specification: int slp | time1@0 time2@0 time3@1; But such a model is not identified. How do you fit a model in which you hypothesize that there is no growth? Thank you! 


To identify your model, you either need more time points or to fix the variance of slp to zero. 


Excuse me for again another question on LCM. How do you fit a model without average growth (insignificant slope) but with slope variance. So the average of the growth over three time points is zero, but between the individuals there is significant variation in growth but they cancel each other out. Thank you 


You would estimate both the mean and variance of the slope growth factor. 

Mark Schultz posted on Wednesday, February 08, 2012  9:53 am



Hello: I have longitudinal data on 512 subjects, but only 2 time points. I've plotted the data and it looks like there are several homogeneous subgroups that vary in both intercept and slope. Can longitudinal mixture modeling do anything for me or maybe some other clustering technique? Thanks, Mark 

Vanessa posted on Wednesday, February 08, 2012  4:44 pm



Hello. I have four time points. If I specify the slope by T1@0 T2* T3* T4@1, is it correct that this allows for the possibility of either a linear slope (ie. if T2 estimated at .333 and T3 estimated at .6666) or a nonlinear slope? hence, to determine shape of curve, instead of modelling a linear slope and comparing with eg. T1@0 T2* T3* T4@1, I can just model the latter to see which is the best description? 


Mark: You could model the two variables as an intercept only growth model or just use them both as latent class indicators. It's hard to see much of a developmental trend with only two time points. 


Vanessa: When you have fixed and free time scores, the mean of the slope growth factor is the average linear slope between t1 and t4. 

Vanessa posted on Wednesday, February 08, 2012  5:15 pm



Thanks Linda. But what I am wondering is, is it correct to describe this specification (T1@0 T2* T3* T4@1) as allowing for either a linear or nonlinear slope? 


I would not say that. The model is a nonlinear model in that there are free time scores. But the mean of the slope growth factor describes the average linear slope from time one to time 4. 

Vanessa posted on Thursday, February 09, 2012  2:12 pm



But the estimated time scores in (T1@0 T2* T3* T4@1) could turn out to be .333 and .666 (so T1@0, T2 .333, T3 .666, T4@1), thus allowing a linear model. Is this correct? 


Are the measurement occasions equidistant? 

Vanessa posted on Friday, February 10, 2012  11:18 pm



Linda, yes the measurement occasions are equidistant... 


Then if your estimated values are .333 and .666, you have no need for free time scores. You should fix the time scores at 0 1 2 3 or 0 .333 .666 1. 

Vanessa posted on Monday, February 13, 2012  4:48 pm



It was more a hypothetical situation and question. I am not sure of the shape of the slope (it is an additional slope present only in one group (intervention scenario)); so instead of trying to fit both a linear and nonlinear (estimated time scores) slope and then comparing to see which is best, I thought that a simpler solution is to allow for free time scores (T1@0 T2* T3* T4@1), providing this allows for either a linear or nonlinear slope to be estimated... 


See the Topic 3 course handout and video on the website. This covers the topics you are interested in. 

YKim posted on Sunday, February 26, 2012  7:52 pm



Dear Dr. Muthen, I am trying to compare the results of LGM with time-invariant covariates between AMOS and Mplus. The parameter estimates are almost the same up to the first two decimal places, but Mplus didn't provide the model fit and standard errors for parameters, with the following message: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 8. THE CONDITION NUMBER IS 0.564D-10. AMOS, in contrast, did provide all related information. Would it be possible to send the data and input file along with the AMOS output to you? Thank you 


Please send the AMOS output, the Mplus output, and your license number to support@statmodel.com. 

Sar posted on Friday, July 27, 2012  6:21 am



Hi, I am working on a longitudinal study with three time points and have been reading about growth mixture modeling. I have recently read several papers that have used three time points to estimate a growth mixture model however noticed that you recommended a minimum of four time points for this analysis. I also note this is the standard for your examples throughout the Mplus manual. Is there a rule of thumb for the number of time points when using Mplus? Also in which ways will I be restricted (if any) if I use only three time points as has been done in the papers I have been reading? Any advice would be greatly appreciated. 


The reason we advise at least four time points is because with three, there is only one degree of freedom for making model modifications unless other parameters are fixed or constrained to be equal. This is shown in the Topic 3 course handout on Slides 50 and 52. 
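The parameter counts behind this advice can be checked with a few lines of Python. The sketch assumes a continuous outcome and a linear growth model with free residual variances, as in the count given at the top of this thread; the function names are illustrative.

```python
def h1_parameters(timepoints):
    """Free parameters of the unrestricted (H1) model: means, variances, covariances."""
    return timepoints + timepoints * (timepoints + 1) // 2


def linear_growth_parameters(timepoints):
    """Free parameters of a linear growth (H0) model.

    2 growth factor means + 2 growth factor variances + 1 covariance
    + one residual variance per time point.
    """
    return 2 + 2 + 1 + timepoints


def degrees_of_freedom(timepoints):
    return h1_parameters(timepoints) - linear_growth_parameters(timepoints)


for t in (3, 4):
    print(t, h1_parameters(t), linear_growth_parameters(t), degrees_of_freedom(t))
# 3 9 8 1
# 4 14 9 5
```

With three time points a single degree of freedom remains, so freeing one more parameter (for example a time score) leaves nothing to test the model with unless other parameters are fixed or held equal.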

L. Siemons posted on Tuesday, July 31, 2012  6:33 am



Hello, I’m not very familiar with growth mixture modeling yet, but I’m going more deeply into the analysis at the moment. I’m planning to perform such an analysis on my data, but I do have a question about the timepoints which are allowed to use. I have data available from hospital patients. As you can imagine, the measurement times vary a lot across patients, because they are not measured at the same time. I did read that this is not a problem for the GMM analysis, because I can enter time variables individually for each case in the dataset. However, I also have a different number of measurements per patients. Some patients have 10 measurements spread over a period of a year, while other patients have only 5 measurement points over 5 months, for example.  Is it a problem that patients do not all cover the same time period? (Some patients are measured over a longer time period than other patients.)  Is it a problem that some patients have for instance 5 measurements, whereas other patients have 10 or even more? If these are problems, why are they? I cannot find anything about this in the literature. Do you perhaps also know a reference in which this is further explained? Thank you very much! 


You can deal with individually-varying times of observation by using the AT option in conjunction with the TSCORES option. See Example 6.12. This is not a mixture model but can be extended to the mixture case. Regarding a different number of measurement occasions, for the outcome this is handled by using missing data. Time scores cannot have missing data, but I believe you can assign any value for time scores when the outcome is missing because these don't come into the estimation. You can check this by doing the analysis twice, once using one number and a second time using another, to verify that the results do not change. 


hello, i am working on a higher order LGM with 5 time points and a quadratic slope. two different codings of time are: 0 1 2 3 4 (model a) and 4 3 2 1 0 (model b). estimated means for the latent variables are: 2.999 2.690 2.601 2.567 2.643. a substantial difference (change of sign) appears for the slope factor means. model a: I 2.999 S 0.221 Q 0.038 model b: I 2.643 S 0.130 Q 0.038 since the slope mean represents the instantaneous rate of change, what would you say does the positive sign indicate in model b? something like the rate of change "looking backwards" in time? thanks for your help. 


It is hard to disentangle the linear and quadratic effects. They work together. 


actually my question is related to the change of the sign. how can the linear component switch from + to -? 


When you change the centering and have a quadratic model, because the linear and quadratic components are entangled, what will happen is not predictable. 
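Looking at the mean structure alone, the entanglement can be made concrete: in a quadratic model the linear coefficient is the rate of change at the time point scored zero, so it depends on both the slope and the quadratic term once the origin moves. The Python sketch below reverses the time axis (new score = last score minus old score) for hypothetical growth factor means; it is an algebraic illustration under that assumption, not a description of what Mplus does internally, and the numbers are not the poster's.

```python
def reverse_time_coding(i_mean, s_mean, q_mean, last_score):
    """Re-express i + s*t + q*t**2 in terms of t_new = last_score - t."""
    new_i = i_mean + s_mean * last_score + q_mean * last_score ** 2
    new_s = -(s_mean + 2.0 * q_mean * last_score)
    new_q = q_mean
    return new_i, new_s, new_q


def curve(i_mean, s_mean, q_mean, t):
    return i_mean + s_mean * t + q_mean * t ** 2


# Hypothetical growth factor means with time scores 0 1 2 3 4.
i_a, s_a, q_a = 3.0, -0.5, 0.05
i_b, s_b, q_b = reverse_time_coding(i_a, s_a, q_a, last_score=4)

# Both codings describe the same mean curve.
for t in range(5):
    assert abs(curve(i_a, s_a, q_a, t) - curve(i_b, s_b, q_b, 4 - t)) < 1e-9

print(s_a < 0, s_b > 0)  # True True
```

In this example the linear coefficient is negative under one coding and positive under the other, while the quadratic coefficient is unchanged.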


Hello, I am working with an unusual nested dataset. The structure is multiple detention stays nested within juveniles. It is unbalanced in that not every juvenile has the same number of stays. The problem is that while each stay is unique, sometimes the outcome attached to each stay spans multiple stays. That is, my outcome could be the exact same outcome for Stay 1 and Stay 2. Does this pose a problem when analyzing the parameters of a Multilevel Growth Curve Model? 


No, this should not be a problem. 

L. Siemons posted on Wednesday, October 24, 2012  8:36 am



Hello, after performing a GMM analysis on my data, I plotted the sample and estimated means of my 3 groups over time. However, I would like to plot 95% confidence intervals around these curves also (or 95% error bars at the different measurement points for each curve), to enable a better interpretation of the results. Is this possible? If so, how can I do that? Do I have to add something to my syntax? If this is not possible within the plot, is it then possible to get the intervals in the output, so I can still make this plot on my own? Or do I have to compute the confidence intervals myself? In that case, can you please tell me how do I do this and what kind of output I need for this purpose? Thank you for your time! 


We don't automatically give confidence intervals for this plot. You can use MODEL CONSTRAINT to define the model estimated means at each time point and obtain standard errors and confidence intervals to plot in this way. 

L. Siemons posted on Wednesday, December 12, 2012  7:51 am



Dear Linda Muthen, Thank you for the quick response to my question. However, I still do not fully understand how I should use the MODEL CONSTRAINT command to get confidence intervals. I searched for an example in the user's guide, but I could not find any. Can you please tell me how I should adapt my syntax to get the confidence intervals? TITLE: GROWTH MIXTURE MODELING DATA: file is DAS28OverTijd036912_6wks_FINAL.dat; VARIABLE: names are pnr das0 das3 das6 das9 das12; usevar = das0-das12; missing = all (999); CLASSES = c(3); AUXILIARY = pnr; ANALYSIS: type = MIXTURE missing; STARTS = 100 10; STITERATIONS = 10; MODEL: %OVERALL% i s q | das0@0 das3@1 das6@2 das9@3 das12@4; %c#1% i s q; %c#2% i s q; %c#3% i s q; SAVEDATA: file is 3classGMMquadratic.txt; save = cprob; OUTPUT: sampstat standardized CINTERVAL TECH1 TECH8 TECH11 TECH14; PLOT: SERIES = das0-das12 (s); TYPE = PLOT3; Thank you! 


You don't use MODEL CONSTRAINT to get a confidence interval. You use it to get a standard error to use to create a confidence interval of the parameter estimate plus and minus 1.96 times the standard error. 

L. Siemons posted on Friday, December 14, 2012  2:19 am



Dear Linda Muthen, okay, I understand that I won't get the confidence intervals immediately, but I will be able to calculate them. But where and, maybe more important, how do I include that in my syntax? How do I define this? Kind regards and many thanks! 


You do them manually. There is no syntax. A confidence interval is the parameter estimate plus and minus 1.96 times the standard error. 
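The hand calculation described above is short enough to script. A minimal Python sketch, with a made-up estimate and standard error:

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Symmetric interval: estimate plus and minus z times the standard error."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical estimated mean and SE for one class at one time point.
lower, upper = confidence_interval(2.0, 0.5)
print(round(lower, 2), round(upper, 2))  # 1.02 2.98
```

Applying this to the estimate and SE of each NEW parameter gives the error bars for each class at each time point.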

L. Siemons posted on Friday, January 11, 2013  7:33 am



Dear Linda Muthen, can you please tell me how I can use MODEL CONSTRAINT to define the model estimated means at each time point? My syntax is as follows: TITLE: GROWTH MIXTURE MODELING DATA: file is DAS28OverTijd036912_6wks_FINAL_final.dat; VARIABLE: names are pnr das0 das3 das6 das9 das12; usevar = das0-das12; missing = all (999); CLASSES = c(3); AUXILIARY = pnr; ANALYSIS: type = MIXTURE missing; STARTS = 100 10; STITERATIONS = 10; MODEL: %OVERALL% i s q | das0@0 das3@1 das6@2 das9@3 das12@4; %c#1% i s q; %c#2% i s q; %c#3% i s q; SAVEDATA: file is 3classGMMquadratic.txt; save = cprob; OUTPUT: sampstat standardized CINTERVAL TECH1 TECH8 TECH11 TECH14; PLOT: SERIES = das0-das12 (s); TYPE = PLOT3; And my mean values are:
Timepoint  Class1   Class2   Class3
0          4.84786  5.06631  4.0073
1          3.14396  4.56802  2.83109
2          2.69506  4.22247  2.42962
3          3.11937  3.74058  2.25646
4          4.75657  3.0839   2.23159
How do I specify the MODEL CONSTRAINT in my syntax so I can get the standard errors of the means? Can you tell me how to do it or maybe give an example? Kind regards and many thanks! 


The formulas for the means of the outcomes can be found on Slide 98 of the Topic 3 course handout on the website. You can specify these formulas in MODEL CONSTRAINT. 
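For a quadratic growth model with a continuous outcome, the mean at each time point is usually written as the intercept factor mean plus the linear term plus the quadratic term. The Python sketch below uses that standard form, which is presumably what the handout formulas express; the growth factor means are hypothetical, and the time scores 0 to 4 are the ones in the poster's MODEL command.

```python
def quadratic_means(i_mean, s_mean, q_mean, time_scores):
    """Model-implied outcome means: i + s*t + q*t**2 at each time score t."""
    return [i_mean + s_mean * t + q_mean * t ** 2 for t in time_scores]


# Hypothetical growth factor means for one latent class.
means = quadratic_means(5.0, -1.0, 0.25, [0, 1, 2, 3, 4])
print(means)  # [5.0, 4.25, 4.0, 4.25, 5.0]
```

In MODEL CONSTRAINT the same expression would be written once per class and time point, using labels given to the class-specific growth factor means, so that each implied mean comes with its own standard error.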


Hello, I am having trouble with a simple growth model with unequally spaced time points. I have data from 4 time points. There is a wide range (10 years) in the age of the participants at the start of the study. I have specified a model using TSCORES defined as the participant's age at each time point: MODEL: I S | y1-y4 AT age1-age4; I S on x; ANALYSIS: TYPE = RANDOM; I keep getting the error "*** ERROR in MODEL command The number of fixed time scores is not sufficient for model identification in the following growth process: I S" I have tried reducing the number of different ages, first by rounding to integers, but got the same error, then tried rounding in 2s and then 5s, but still have the same error. Any advice would be most welcome. Thank you! 


Please send the output and your license number to support@statmodel.com. 


Hello, I have only three time points and the data do not fit a linear model. The time points are not equidistant. Measurements were taken when children were 6, 8 and 9 years old. So this is how I modeled it: i s | x1@0 x2@2 x3@3; But to fit a nonlinear model, I wanted to try freeing the second time point; should I model it as i s | x1@0 x2@* x3@1; or x1@0 x2@* x3@3; Also do you have an article/reference suggestion for nonlinear models with three time points? Thank you very much. 


This is difficult to do because a model with three timepoints has only one degree of freedom unless you place other constraints on the model, for example, holding the residual variances equal over time. The choice of which time point to free should be determined by the part of the model that is of interest. Are you more concerned with the growth from 6 to 8 or from 6 to 9? The growth factors will be describing the development you choose. 


Thank you so much for the answer. We are interested in 6 to 9 years. After setting other constraints for the nonlinear model, such as holding residual variances equal, should I free time scores as 0 * 3 or 0 * 1? That part gets me confused. Also, is there a specific article you would recommend on nonlinear models? I greatly appreciate your help. 


You would use 0 * 1. I don't know of any specific article on nonlinear models. 

Cathy Nylin posted on Sunday, March 10, 2013  11:41 pm



I have a nonlinear model and want to free the time points. I keep getting the error message "*** ERROR Error in parsing line: var3@0 var4@* var5@* var6@1". Is it an error in my syntax? MODEL: I1 S1 | var3@0 var4@* var5@* var6@1; I2 S2 | con3@0 con4@* con5@* con6@1; 


Are var3@0 and var6@1 actually underlined? and the same for the con variables? 


Actually, I think the problem is @*. You should just have *. @ means fixed at. * means free. 

Cathy Nylin posted on Thursday, March 14, 2013  9:44 am



Thank you so much, the variables are not actually underlined even though they show that way here. It was the presence of the @ symbol. 


Hi, I want to look at the variability across time (trajectory) of a specific behavior measured repeatedly across time using multilevel modeling. First of all, I want to know if the univariate approach (long format, HLM) is recommended when we have a small sample (n < 100) and time-varying measurements (not equally spaced). Furthermore, I have three different populations (or groups) in my sample and I want to look at the differences between these groups. I have the same data for each of them. Should I use group as a "between" predictor, or should I use a multiple group function to compare the growth of each group? Thank you for your help (and I'm sorry for my poor English, I am French). 


Unless you have very many time points or individually-varying times of observations (not just non-equally spaced), I would recommend using the wide approach because it can not only do the same growth models as the two-level approach but more general growth models. When you have a small sample, I would recommend using group as a covariate, so using 2 dummy variables for the 3 groups. 
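The "2 dummy variables for the 3 groups" coding can be sketched in Python as follows. The group names and the choice of reference group are hypothetical; in practice the dummies would be created in the data file or with the DEFINE command before being used as covariates.

```python
def dummy_code(group, levels, reference):
    """Return one 0/1 indicator per non-reference level, in the order of `levels`."""
    return [1 if group == level else 0 for level in levels if level != reference]


levels = ["A", "B", "C"]  # hypothetical group names, "A" as reference
for g in levels:
    print(g, dummy_code(g, levels, reference="A"))
# A [0, 0]
# B [1, 0]
# C [0, 1]
```

With this coding, the regression of a growth factor on each dummy estimates that group's difference from the reference group.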


Thank you for your help. Do you recommend to use bayesian estimation in a latent growth model to handle a small sample size? 


Bayes can have advantages with a small sample size, for instance when the estimate distribution is not close to normal due to the small sample size or when the chi2 test of model fit is not reliable due to the small sample size. 

tmh2013 posted on Thursday, May 09, 2013  2:30 pm



I have four time points of data. I ran an unconditional LGM to see if these reports changed over time. I would now like to see if gender differences exist in these reports (perhaps the trajectory is different for boys vs girls). Is it okay to run a conditional LGM that includes gender (0=girls, 1=boys) as a time-invariant covariate? How would running a multigroup SEM be different? Which would be the better approach? 


If you want to compare growth models for boys and girls, you should first do them separately for each gender to see if the same model holds for each group. If, for example, both models have linear growth, then you can do a multiple group analysis to compare growth parameters. If not, it does not make sense to compare them. 


I'm a beginner using LGC modeling. I have 4-point data, collected in 1986, 1989, 1994, and 2001. So, I tried time scores of 0, 3, 8, 15 (and 0, .3, .8, and 1.5), but my model failed to converge. But when I did a random trial with 0, 1, 2, 3, I had a converged solution. What does this mean regarding fixing time scores? Also, I read an article by Dr. Bengt Muthen with Curran and Harford (Curran et al. 1998. J. Stud. Alcohol 59:647-658), where it says, "The final factor loadings on the growth factor were 0, 1, 1.54, and 2.46 (where the first two values were fixed and the second two values were estimated from the data and significant at p < .01)" (p. 650). Was it programmed as "I S | VAR1@0 VAR2@1 VAR3 VAR4"? 


Please send the output with the correct time scores and your license number to support@statmodel.com. 


I'm sorry, but please ignore the above message I posted. 


Hello, I performed a GMM analysis on my data and plotted the sample and estimated means of 3 groups over time. Now, I would like to plot 95% confidence intervals around these curves, to enable a better interpretation of the results. I read from the forum that you cannot automatically get confidence intervals for this plot and that I should use MODEL CONSTRAINT to define the model estimated means at each time point and obtain standard errors and confidence intervals to plot in this way. However, I already tried many things but I wasn’t successful. Can you help me out? I have to submit a revision soon and the reviewer wants to see confidence intervals around the trajectories. I can send you my syntax if you want to, so maybe you can modify it the way it should be? I’m really stuck with this, I hope you can help me solve this. Many thanks! 


Please send your output and license number to support@statmodel.com. 


Hello! I have three questions: (1) When I test a nonlinear growth curve model, do I always need the linear slope variable (e.g., for exponential or logarithmic curves)? Or do I only need the linear slope variable when I analyse a quadratic curve? (2) I have 7 time points with the following means: T1=8.33, T2=15.61, T3=20.22, T4=16.23, T5=23.75, T6=20.02, T7=21.02. SPSS tells me that an S-curve would fit these data best. Now I don't know (i) which parameter fixations I should set and (ii) whether I need a linear slope variable as well. I used some freely chosen fixations and the model fits well. However, the fixations are not based on any calculations. Here is my model: i by b2-b8@1; scurve by b2@0 b3@2 b4@3 b5@3.8 b6@4.6 b7@5.2 b8@5.8; [b2-b8@0]; [i scurve]; (3) For the model in (2) I get a good model fit; however, I get the following warning: WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. THIS COULD ... CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE I. Furthermore, I get no standardized values. How can I handle this problem? Thank you very much indeed! 


1-2. You need only one slope growth factor as you have specified. 3. You probably have a negative variance for one of your two growth factors. 


Thank you very much for your reply. Concerning 2: don't you think that a reviewer will object when I choose the fixations of the growth curve freely myself rather than having the program estimate them? 


That might be. Although if the fitting of the means gives good model fit and the shape is substantively motivated, it may be accepted. If you want to do it right, you may want to use a nonlinear growth model. See 2 articles by Grimm on our website under Papers, Growth Modeling, showing how to do nonlinear growth modeling in Mplus. 


I am fitting a latent change model with two measurement occasions (Duncan, Duncan, & Strycker, 2010). My goal is testing whether the change factors, intercept and slope, predict a dichotomous outcome. I estimate the model without any problem. However, when I compare the regression estimates from intercept to the DV, they don't match the results that I obtained from a binary logistic regression where I use the same DV and the T1 measurement of the growth indicator, either in Mplus or SPSS. In the latent change models, the intercept is a significant predictor of my DV (p<.001). In the binary logistic models, the indicator measured at T1 is not a significant predictor of the DV. I got the same pattern for 8 different models, and even when I varied the error variances of the measures over a range of 0.00 to 0.90. Could this inconsistency be a result of Mplus's default model specifications? 


This makes sense if your intercept growth factor is defined by centering time at T1. In that case, the outcome at T1 is the intercept plus a residual. This means that regression on the T1 outcome has an error in the predictor and therefore the regression slope is attenuated relative to regressing on the intercept itself. The lower the R-square for the T1 outcome as explained by the intercept, the larger the attenuation. 
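The size of the attenuation is easiest to see in the linear regression analogue, where the slope on an error-prone predictor equals the slope on the true predictor multiplied by the predictor's reliability (here the R-square of the T1 outcome as explained by the intercept factor). The Python sketch below uses that classical result with made-up numbers; for a logistic regression, as in the question, the attenuation goes in the same direction but does not follow this formula exactly.

```python
def reliability(factor_variance, residual_variance):
    """Share of the T1 outcome variance explained by the intercept factor."""
    return factor_variance / (factor_variance + residual_variance)


def attenuated_slope(true_slope, factor_variance, residual_variance):
    """Linear-regression slope on the observed T1 score (classical error model)."""
    return true_slope * reliability(factor_variance, residual_variance)


# Hypothetical values: slope on the intercept factor 0.8, equal factor and residual variance.
print(reliability(2.0, 2.0))            # 0.5
print(attenuated_slope(0.8, 2.0, 2.0))  # 0.4
```

With an R-square of 0.5 the slope on the observed T1 score is half the slope on the intercept factor, which can be enough to turn a significant effect into a non-significant one.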


Is it right to say that I can't estimate a growth model when I have 3 timepoints (t1 t2 t3) and the mean peaks at t2 (e.g. means t1-t3: 50 100 40)? Many thanks! 


Three time points are not sufficient to understand a nonlinear trend. 

Yan Quan posted on Wednesday, September 11, 2013  9:11 am



Hi Linda, I have binary longitudinal data measured on a daily basis (270 days and 300 subjects). I would like to group the whole population based on their trajectory patterns. I am wondering whether to use GMM with a logistic model, but I found that most of the examples have fewer than 10 time points. Do you have any experience or suggestions on grouping daily longitudinal data? Thank you so much! Yan 


I have not done this, but one could make it into a 2-level data set and do growth modeling the long way (see UG ex 9.16) and then add trajectory mixtures by having a level-2 latent class variable (the UG has several examples of this). Scale the time scores to lie between 0 and 1. 
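
As a rough sketch of the data preparation this implies, the snippet below turns wide records (one row per person) into long records (one row per person per day) and rescales the time score to lie between 0 and 1. The record layout and the function name are assumptions made for illustration; the resulting rows would still need to be written to a file that Mplus can read.

```python
def wide_to_long(wide_rows, n_times):
    """Convert (person_id, [y_day1, ..., y_dayT]) records to long format.

    Returns (person_id, time_score, outcome) tuples, with time_score
    scaled so the first occasion is 0 and the last occasion is 1.
    Missing outcomes (None) are dropped, as is usual in long format.
    """
    long_rows = []
    for person_id, outcomes in wide_rows:
        for t, y in enumerate(outcomes):
            if y is None:
                continue
            time_score = t / (n_times - 1)
            long_rows.append((person_id, time_score, y))
    return long_rows


# Hypothetical binary outcomes for two people over four days.
wide = [(1, [0, 1, None, 1]), (2, [1, 1, 0, 0])]
long_data = wide_to_long(wide, n_times=4)
for row in long_data:
    print(row)
```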


Hi, I am running a growth model with several time points (9) after the baseline: 1 week, 2 weeks, 3 weeks and 4 weeks, then 6 weeks, 8 weeks, 10 and 12 weeks. I was wondering if I have to adjust the model specification for the difference in spacing between the first time points and the others, namely a 1-week difference between the first 5 and a 2-week difference between the other 4. If yes, could you suggest how? Thanks and Happy Holidays to you both! Andrea 


The time scores should reflect the distance between the time points, for example, 0 1 2 3 5 7 9 11 
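
A minimal sketch of how such time scores follow from the measurement schedule: subtract the first measurement week from every measurement week. The helper name is hypothetical. Note that the scores in the reply correspond to centering at the 1-week measurement; if the baseline (week 0) is also included as an outcome, the same rule gives one more score.

```python
def time_scores(weeks):
    """Time scores centered at the first measurement occasion."""
    first = weeks[0]
    return [w - first for w in weeks]


followups = [1, 2, 3, 4, 6, 8, 10, 12]
print(time_scores(followups))        # [0, 1, 2, 3, 5, 7, 9, 11]
print(time_scores([0] + followups))  # [0, 1, 2, 3, 4, 6, 8, 10, 12]
```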


Thank you! 

Andrea posted on Tuesday, March 11, 2014  12:20 pm



Hello: I am running a multiple group LGM. I would like to free time scores, but would like to constrain the estimated scores to be equal across groups. Is there a way to do this? The error I am receiving is: Equality/parameter labels are not allowed in a growth statement. Problem for the process: I S Q in the line: var1@0 var2-var6* var7@1 (A1-A7) Andrea 


You would need to specify the model using the BY option. See pages 677-682 of the user's guide. 

Matt K posted on Saturday, July 05, 2014  5:31 pm



Suppose there is a quadratic conditional model with an outcome measured at 6 non-equidistant time points. Is it a requirement that each case have outcome data for at least 4 time points for the model to be identified? 


No, each person does not need to have data for at least 4 time points. 


I am currently wanting to estimate 3 models using different outcomes for change across time using seven predictors taken prior to the first data point. The first model has 3 time points and I have undertaken analyses on Change difference scores. However the other 2 measures only have 2 time points, may I ask what type of analysis I could use in MPlus? 


With two time points, an intercept only model can be identified. 

Anne Chan posted on Monday, June 08, 2015  2:40 am



Hello, I specified the time loadings of a linear latent growth curve with 4 measurement points as 0, 1, 2, 3. Then I recentered the time loadings as 3, 2, 1, 0. The slopes of the two models are the same, which makes sense. However, if I do the same for a curvilinear growth curve (growth factors: I S and Q), the quadratic terms of the two models (one with time loadings 0, 1, 2, 3 and another with 3, 2, 1, 0) are the same but the slopes are different. May I ask why the slopes are different? 


In the first case you do a linear transformation of time in a linear model so that will give you the same slope (with opposite sign). In the second case, you can't interpret the linear part separately from the quadratic. But you will get the same estimated means for the outcomes. 
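
The algebra behind this reply can be sketched as follows. Substituting t = last - u into a + b*t + c*t^2 gives new coefficients (a + b*last + c*last^2, -(b + 2*c*last), c): the quadratic coefficient is unchanged, while the linear coefficient picks up a term that depends on c. The coefficient values below are hypothetical growth factor means, not estimates from the poster's data.

```python
def reverse_time(a, b, c, last):
    """Coefficients of the same curve expressed in reversed time u = last - t."""
    return (a + b * last + c * last**2, -(b + 2 * c * last), c)


def curve(a, b, c, t):
    """Model-implied mean at time score t."""
    return a + b * t + c * t**2


# Linear case (c = 0): only the sign of the slope changes.
print(reverse_time(10.0, 2.0, 0.0, last=3))   # (16.0, -2.0, 0.0)

# Quadratic case: same quadratic term, but a different linear term.
print(reverse_time(10.0, 2.0, -0.5, last=3))  # (11.5, 1.0, -0.5)

# The implied outcome means are identical under both codings.
original = [curve(10.0, 2.0, -0.5, t) for t in (0, 1, 2, 3)]
reversed_coding = [curve(11.5, 1.0, -0.5, 3 - t) for t in (0, 1, 2, 3)]
print(original, reversed_coding)
```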


Dear all, I have conducted an LGCM together with survival analysis: i s | y11@0 y12@.2 y13@.5 y14@.8; f by u2-u4@1; f on i s; f@0; Findings indicate a) that the mean of the slope was not significant and b) a significant effect of the slope on f. I was wondering if I can interpret the regression of f on the slope (change predicts mortality) even though there is no significant increase in y for a one unit increase in time? Thank you very much! Stefan 


If s has variability, this is fine. It is the variance of s not the mean that you should look at. 

sojung park posted on Sunday, October 18, 2015  4:34 pm



I have 7 time points. Does each person have to participate in at least 3 or 4 timepoints in a growth curve model? In a stat class in the past I was told that this is the requirement, but I was not sure. 


No, not as long as a good portion of your sample is observed at several time points. 

benedetta posted on Wednesday, January 20, 2016  2:34 am



Dear professors, I am running LGM for a continuous outcome observed at 6 equally spaced points in time, and I want to study the effect of an exposure only measured at baseline. The outcome is first measured at baseline (same time of observation as the exposure). I first specified the growth model as follows: i s | mh1@1 mh2@2 mh3@3 mh4@4 mh5@5 mh6@6; then I changed it to i s | mh1@0 mh2@1 mh3@2 mh4@3 mh5@4 mh6@5; The linearity of growth I am assuming is consistent in the two models and results only differ for the intercept coefficients, although the difference is very small. I just wanted to ask for a theoretical explanation for that. Thank you very much! 


See the handout for our Topic 3 short course on our website regarding how the time scores define intercept and slope growth factors. 
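
As a rough illustration of why only intercept-related results change, the sketch below applies the standard recentering algebra for a linear growth model: the intercept is the expected outcome at time score zero, so moving the zero point by `shift` time units gives a new intercept factor equal to the old intercept plus `shift` times the slope. The function name and the numeric values are hypothetical.

```python
def recenter_linear(mean_i, mean_s, var_i, var_s, cov_is, shift):
    """Growth factor moments after moving time score zero to the
    old time score `shift` (new intercept = old intercept + shift * slope)."""
    new_mean_i = mean_i + shift * mean_s
    new_var_i = var_i + 2 * shift * cov_is + shift**2 * var_s
    new_cov_is = cov_is + shift * var_s
    # Slope mean and slope variance are unaffected by the recentering.
    return new_mean_i, mean_s, new_var_i, var_s, new_cov_is


# Going from time scores 1..6 to 0..5 moves the zero point one unit later.
print(recenter_linear(mean_i=5.0, mean_s=0.5, var_i=1.0, var_s=0.25,
                      cov_is=0.0, shift=1))
```

With a small slope mean, the intercept mean changes only slightly, which matches the small differences reported in the question.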


Dear professors, I am trying to investigate the following questions in a sample of 35 people with 40 measurement points: is variance in X (mood) within a person predicted by Y (self-esteem, a baseline variable on the between level)? And is variance in X across people predicted by Y? We don't expect X to be linear within the person and across timepoints but rather to fluctuate (mood during a one-week period). I was considering a multilevel approach but I have difficulties defining it since the predictor is on the between level and I actually want to see what the effect is on the within level and the between level. When I try with wide format data and LGM I get the error message that I have more parameters than cases, so I assume there is a problem with too few cases. Do you have any suggestions for how best to analyze this type of data? Thank you! 


Take a look at Section 7 in http://statmodel.com/download/NCME_revision2.pdf 


Hi! Thank you so much, that sounds right, is there any practical guide how to do this in Mplus? That would be very helpful, thanks in advance! 


We have all the examples in http://statmodel.com/download/all_tables.zip You might also find useful slides 18-35 of https://www.statmodel.com/download/handouts/MuthenV7Part2.pdf and example 9.19 from the user's guide 


Dear Linda: I am trying to estimate a growth curve model and ran into some difficulties. I first estimated an unconditional model (model without covariates) and encountered no problems; the estimates looked plausible. I next estimated the model with time-invariant covariates and it converged without any problem; the estimates also look plausible. However, once I include time-varying covariates in the model, it does not converge. I have tried everything that I know but the model will not converge. I even reduced the time points for the time-varying covariates from 10 to 5, but the model still failed to converge. I will appreciate any suggestions on how to proceed. I'd planned to include my model/syntax here, but that would cause my post to exceed the size limit for messages. I can submit it in a subsequent post if that is necessary. 


Please send the output that does not converge and your license number to support@statmodel.com. 


Hello, I am currently doing multivariate latent growth curve analysis examining relations between technology use (IV), sleep (mediator) and life satisfaction (DV) as per MacKinnon model (page 212 in Introduction to Statistical Mediation Analysis). Measures for technology use and sleep have been taken in year 8, year 9 and year 10 and life satisfaction in year 9, year 10 and year 11 so the DV is lagged by one year. Slopes controlled for their intercept. I have bootstrapped the indirect effects and am mainly examining the intercept to intercept to intercept paths (similar to cross sectional mediation analysis with intercept for DV one year later) and the slope to slope to slope indirect paths (with trajectory one year later than IV and mediator) Do you see any problems with coding the time @0, @1 and @2 for the IV and Mediator LGC and then @1, @2 and @3 for DV. 


Hello, For my multivariate LGC mediation model I was asked to determine the size of the indirect effect so I calculated the indirect effect as a proportion of the total effect (if the direct effect was negative and indirect is positive the absolute values were used, page 82 MacKinnon). When I do this I still have the cross paths in the model e.g. from intercept (IV) to intercept (mediator) to Slope (DV) and other combinations of cross paths (intercept to slope to slope). Some of these paths are not significant. I am getting large proportions, for example the indirect effect for Intercepts is 0.206, direct effect is 0.009 and therefore the ratio is .206/.215 = .96 so the mediated effect of sleep explains about 96% of the total effect of levels of technology use on levels of life satisfaction. 2. Should I have taken out the cross paths so I am looking at the total effects just for intercepts to intercept to intercept? When I do this the total effects increases and therefore the proportion decreases as per previous example the indirect effect for Intercepts is now 0.221, direct effect is 0.080 and therefore the ratio is .221/.301 = .73 so the mediated effect of sleep explains about 73% of the total effect of levels of technology use on levels of life satisfaction (without cross paths say from intercept to intercept to slope). Many thanks 


First post: Because the 3 variables (IV, mediator, and DV) have different growth models, I would choose the time scores to give easy interpretation. So for the DV I would use 0, 1, 2. Having 0 for its first time point makes the intercept interpretation clear. Second post: This is a good question for SEMNET. 


Dear Bengt, Many thanks for your support. I have gone back to my models and redone them, thanks. When I interpret the intercepts I just note that IV and Med levels are associated with levels of the DV one year later. For slopes, changes in IV and Med are associated with subsequent changes in the DV. Is this correct? Thanks again for taking the time. I have posted on SEMNET but haven't heard anything yet. Kind Regards, Lyn 


Try SEMNET also for these kinds of interpretations. 

Nick posted on Friday, May 06, 2016  2:14 pm



This may seem very basic, but I'll ask anyways, as I'm a bit of a LGM neophyte. I'm putting together various PPLGM's. While fitting the unconditional models, I've set the slope growth factor for the mediators at [0,1,2,3,4]. This is because there are 5 evenly spaced timepoints and I expect early growth that continues linearly. For the outcomes, I've set the slopes at [0,0,1,2,3], because I expect later change that grows linearly. Am I doing this correctly, or should they both be [0,1,2,3,4]? Thanks in advance. 


It sounds like you have 2 growth processes analyzed together. You don't have to have the same time scores for them. 

Nick posted on Monday, May 09, 2016  3:25 pm



Yes, thank you. I expect that an improvement in the mediator will precede an improvement in the outcome. It's a parallel process model. So, would I be correct in understanding that a [0,0,1,2,3] timescore would reflect this prediction? So that I expect no significant change from baseline to time 2, but linear change after that? 


Seems reasonable. 

Djangou C posted on Sunday, October 30, 2016  5:38 pm



Dear Dr Muthen, I would like to cite: Videos and Handouts for Mplus Short Courses (topic 3). I was wondering how to do this? I was also thinking that it would be easier to include this information somewhere within Videos and Handouts for Mplus Short Courses. Thank you. 

Jon Heron posted on Monday, October 31, 2016  8:33 am



I've referenced Topic 3 before in a publication. And am pretty sure I asked the same question on here before I did: Muthén LK, Muthén BO (2010) Mplus short courses (Topic 4). Growth modeling with latent variables using Mplus: advanced growth models, survival analysis, and missing data. Available from https://www.statmodel.com/download/Topic%204new.pdf 

Jon Heron posted on Monday, October 31, 2016  8:33 am



Hmm, that is blatantly topic 4. You get the idea though. 


Looks good to us. 


Can variables in the model be used to replace the values specifying time points, that is, X1@y1 instead of X1@0 or X1@*, as a way of calculating intra-subject correlations between two lists of variables? I am looking for an easy way to compute within-subject correlations for data that is structured on a single record. I can find no macros for this in SPSS and hand coding the computations is daunting. The context is this. A single subject evaluates 10 physical symptoms on the extent to which they are bothered by it, and the extent to which they believe others are bothered by it. It is theoretically interesting to know if people believe the things they have more of are more common than the things they report having less of. 


The @ and * option cannot be followed by another variable name. It must be a number. 


Hi, I'm wondering if you have any recommendations for carrying out a GMM with continuous time. I have repeated measures at the day level over a long period of time resulting in 1,705 repeated measures with lots of missing data. Can you provide any recommendations for the best way to approach this many repeated measures? Thank you very much for any advice you can provide, Becky 


You can do GMM as a two-level, long format analysis (time and person) in which case the latent class variable should be declared on the Between= list given that level 2 is person. Or, you can wait for Version 8, but that is a long wait. 


Thanks for your help and can't wait to check out Version 8!! Can you by any chance recommend a good reference for the 2-level model with the latent variable declared on the between line, just so I can read about it/follow along a little more? Thanks again! 

Joey Fung posted on Wednesday, February 08, 2017  11:56 am



Hi, I am wondering if there is a maximum number of timepoints for running growth mixture modeling. We have approximately 80 participants who completed a daily diary for 14 days. Is it feasible to run a GMM with 14 timepoints? If not, what are your recommendations for alternative ways to handle the data? Thank you very much. 


Yes, 14 time points is fine for GMM. 

chioma nwaru posted on Thursday, February 09, 2017  12:34 am



Dear Bengt & Linda, I am using a longitudinal study with 4 waves of responses from the same cohort on disability, which is on a scale of 0-10 at each wave. I am trying to find the latent classes for disability. I just wanted to know if I am using the right model, i.e. LCGM. The disability variables of the different waves are coded as: dis1 (1988) n=5500, dis2 (1995) n=4500, dis3 (2002) n=3500 and dis4 (2014) n=2900. Should I use any other variable here? If I am using a wrong model or command, could you suggest a right one (or one from the Mplus guide)? t1 t2 t3 t4 are the time in years for each wave respectively; female age occup are baseline variables (from 1983). (Every time I run the below command it gives the results, but it also says input reading terminated normally.) Thanking you! With regards, 


This general analysis question is better suited for SEMNET. 

M. Howland posted on Sunday, March 05, 2017  1:12 pm



Hello, I am modeling scores from pre to post experiment (so only two time points), with the primary hypothesis being that scores will change from pre to post based on several time-invariant predictors. Unfortunately there are pretty substantial ceiling effects in the pre/post test scores, so I was hoping to account for this. I've been looking into fitting a linear growth model for a censored outcome, and predicting intercept and slope from several time-invariant predictor variables. I understand from this thread that a minimum of 4 timepoints is recommended to estimate a growth model, so is it impossible to run the model I am suggesting? Do I have any other options to look at change over time with censored data? Thanks in advance. 


2 time points doesn't really afford a growth model. You can do a very limited version of a growth model where the linear slope has zero variance (and therefore zero covariance with the intercept), that is, everybody changes the same amount. 

M. Howland posted on Sunday, March 05, 2017  3:38 pm



The problem of course is that is our main question of interest, e.g., what variables explain variance in change from t1 to t2. Are there any options which would allow me to model change between two time points while accounting for ceiling effects? 


I have a question regarding the number of time points necessary to estimate a growth model. More specifically, I want to estimate nonlinear growth models with free time scores for four different variables (separately). My models are, for example, specified as follows: I S | EL_W1@0 EL_W2@1 EL_W3*; When running the models I encounter different problems. First of all, my models are saturated. Secondly, some models give errors. For example: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 9, S Because I encounter all these problems while estimating the nonlinear growth curves with free time scores, I wonder whether it is possible/justified to run such models with only three time scores? I read that no model modification would be allowed with three timepoints unless certain other restrictions were placed on the growth model. Which restrictions could be justified? Can you give some suggestions? Thank you very much in advance! 


It seems that your H0 model would have 9 parameters and your H1 model also 9 parameters (check that this is the case). So I don't see offhand what the problem is. If you like, you can send the output to Support along with your license number. 
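
The parameter counts in this reply can be checked with a short sketch. It assumes continuous outcomes, a linear growth model with free residual variances and no residual covariances, and counts each freely estimated time score as one extra parameter. The function names are hypothetical.

```python
def h1_parameters(n_times):
    """Unrestricted (H1) model: means, variances, and covariances."""
    return n_times + n_times * (n_times + 1) // 2


def h0_linear_parameters(n_times, free_time_scores=0):
    """Linear growth (H0) model with optional free time scores."""
    growth_means = 2          # intercept and slope means
    growth_variances = 2      # intercept and slope variances
    growth_covariance = 1     # intercept-slope covariance
    residual_variances = n_times
    return (growth_means + growth_variances + growth_covariance
            + residual_variances + free_time_scores)


# Three time points with one free time score: 9 vs 9, so zero degrees
# of freedom (a saturated model, as the poster observed).
print(h1_parameters(3), h0_linear_parameters(3, free_time_scores=1))

# Four time points with fixed time scores: 14 vs 9, five degrees of freedom.
print(h1_parameters(4), h0_linear_parameters(4))
```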


Hello, I am trying to run a growth mixture model. I have three time points. Which restrictions can I impose to get an overidentified model? Can I analyze a GMM with latent variables as time points of measurement (three time points with four indicators each)? Thank you in advance 


Q1: Run the GMM and see what your restrictions are (TECH1 shows this most clearly). Q2: Yes. 


Hello. I am trying to follow Ex. 6.12 in the User's Guide to estimate a model with individually-varying times of observation. This is a very basic question, but what exactly would the TSCORES (a11-a14) be? I have participants who varied in age at baseline, and were followed approximately every 6 months, but there was a range in this and I'm trying to account for the differences in the follow-up length. If I centered on average age at baseline, then a11 would have nonzero values. 


As an example, a11 would be a participant's value on the centered age variable at baseline. 

Yoon Oh posted on Monday, June 19, 2017  10:34 am



I'm trying to specify latent growth models with individually-varying times of observation. But using TSCORES does not provide the usual fit statistics such as RMSEA and SRMR. In this case, what would be the best way to compare model fit? Should I just use the loglikelihood value? 


Use the logL to compare with neighboring models, that is, models that relax some key restrictions of the model you are considering. 


Hi all, I have a dataset of 230 people with a baseline survey and 24 monthly follow-ups, so a possible total of 25 time points. The DV has a range from 0-42. The mean is My ultimate goal is to model change in the DV over time, and eventually to use baseline info to predict that change. As such, I am using a generalized linear mixed effects model. When I make a spaghetti plot of the individual growth curves, I see lots of heterogeneity. I know that this is a problem, because it is difficult to combine so many different lines in a single estimate (or a few, in the case of fixed slope, random intercept, random slope, quadratic term, cubic term, etc). How should one approach a mixed effects model with so much heterogeneity and so many time points? Is it feasible to do this? Should I pick specific time points (0,4,8,12,16, for example), and model the change between those time points? I would prefer not to, as I have significant amounts of missing data and want to use all cases with at least 3 time points. Should I be trying a different approach? Thanks, Jon 


This general analysis strategy question is a good one for SEMNET. 


I have 200 people with 17,000 observations total with a maximum of 240 observations nested within a person. I want to see if the slope decreases significantly over time. Should I use Ex. 9.30 in the User's guide and read the "Intercepts" part of the output for S in order to get the mean of S? 


Note that S in ex9.30 is the random autoregressive slope. It is not the trend over time that it sounds like you are after. For trends, ex 9.39 is more to the point, but before delving into that I would recommend that you watch the "Dynamic Structural Equation Modeling (DSEM)" videos that we recently posted from our August workshops. Parts 3 and 4 deal with trends. See Topic 12 of our Short Course web page at http://www.statmodel.com/course_materials.shtml 


Thank you Bengt, that is really helpful! 

Jan posted on Wednesday, October 25, 2017  4:14 pm



I have a general question on historical data, e.g. hospitalizations or services people received in hospitals. Such data have long format, nested within individuals, and all timepoints are irregular. People have different dates of different services or procedures that they underwent, and dates are unique to individuals. How can this type of data format currently be analyzed with Mplus? Also, when one outcome is longitudinal (e.g. in this historical dataset people had the same blood test, and there were lots of different, irregular occasions), how can one model a continuous result of the test simultaneously with a censored outcome and test for covariance between the continuous and the censored time-to-event variable? All dates are nested within people and unique for each individual. Finally, will there be Bayes estimation of time-to-event outcomes? 


Long-format, two-level modeling as a function of time is available in Mplus. Time is represented as a variable and the outcome regressed on time can have a random intercept and random slope which vary over subjects on level 2. Two-level continuous-time survival analysis is also available and can be combined with another outcome with a residual correlation between the two processes accommodated by a factor influencing both. Bayes for survival analysis is on our to-do list. 

Jan posted on Wednesday, October 25, 2017  5:14 pm



This is absolutely fantastic, especially the last one. Thank you very much. 


"Bayes for survival analysis is on our to-do list." I am curious: a joint growth + discrete-time survival model using Bayes estimation converged without issues. Do you suggest treating model results with caution until Bayes for survival analysis is officially supported? 


That statement was referring to Bayes for continuous-time survival. Discrete-time survival is just mixture modeling. 

Stefan Kamin posted on Wednesday, November 08, 2017  5:39 am



I see, thank you! 


Hi everyone, I recently collected data on white blood cell release of 3 separate proteins (that are highly related) at 4 time points: Protein 1: 2, 24, 48, 72 hrs. Protein 2: 2, 24, 48, 72 hrs. Protein 3: 2, 24, 48, 72 hrs. As I am not concerned about change over time for this analysis, I was wondering if it would be appropriate to use TYPE = COMPLEX and treat each participant as a cluster, such that there would be 4 measurements of each protein for each individual. Next, I would then load each of the 3 proteins onto a latent construct of overall protein release. Model fit statistics suggest adequate fit when I do this, but I want to make sure my results are valid. Thanks! 


This is fine. 


Thank you! 


Greetings, I ran a two-level growth curve model. In my first level of analysis I included no covariates. I am interested in getting estimated means of the intercept and slope for both the within and between cluster components of the analysis. However, the output only prints means of the intercept (ib) and slope (sb) for the between cluster component. Is there anything I can do to obtain means for the intercept and slope for the within cluster component of the analysis as well? 


There is only one set of means and we put them on the between level. This is in line with regular multilevel growth modeling like in the Raudenbush-Bryk book. 


I am running a 2 class model with covariates. The output provides odds ratios for class 1 only. How can I obtain odds ratios for the opposite class? 


You can express any odds ratio using Model Constraint with parameter labels, first expressing the outcome probabilities, then the odds and then the odds ratios. Or you can switch the classes in a second run by giving starting values based on the first run, with rearranged classes. 


Thank you for the response above. For the second option, can you give an example of what the syntax would look like for a second run that switches classes? 


Use SVALUES in your first run to save the estimates. Then switch class-specific estimates around for a second run. For instance, if your latent class indicators are binary/continuous and you have 2 classes where the first class has indicators with high threshold/high mean estimates and the second class low estimates, you switch by letting the second class have starting values from the SVALUES section corresponding to the high values and the first class the low values. In this second run you use Starts = 0 and you make sure that the loglikelihood value is the same as in the first run. There are UG examples for LCA that show you how to give starting values. 


Thank you, this worked! 


Hi, I am trying to run a linear and quadratic growth model with the following syntax; MODEL: i s | ls_08 AT TSneg4 ls_09 AT TSneg3 ls_10 AT TSneg2 ls_12 AT TSneg1 ls_13 AT TS0 ls_14 AT TS1 ls_16 AT TS2 ls_17 AT TS3; sq | ls_08 AT TSneg4sq ls_09 AT TSneg3sq ls_10 AT TSneg2sq ls_12 AT TSneg1sq ls_13 AT TS0sq ls_14 AT TS1sq ls_16 AT TS2sq ls_17 AT TS3sq; !means [ls_08-ls_17@0]; [i*]; [s*]; [sq*]; !variance and covariance i* s* sq@0; i WITH s; i WITH sq@0; s WITH sq@0; !specific variance ls_08-ls_17*; The TSCORE variables have been computed to reflect the individually varying time scores. TSneg4sq etc. reflect the squared TSCORES. However, the model is not running. I keep getting warning messages to inform me that the model estimation did not terminate normally. Any help would be appreciated! Thanks, Sophie 


In your statement sq | ls_08 AT TSneg4sq ls_09 AT TSneg3sq ls_10 AT TSneg2sq ls_12 AT TSneg1sq ls_13 AT TS0sq ls_14 AT TS1sq ls_16 AT TS2sq ls_17 AT TS3sq; sq will not be interpreted as a quadratic growth factor. It needs to appear in 3rd place before the |. Like: i s q | ... 


Hello, In which case, how am I able to set the quadratic growth factor to individually varying time points? Thank you, Sophie 


Just say i s q | ls_08 AT TSneg4 ls_09 AT TSneg3 ls_10 AT TSneg2 ls_12 AT TSneg1 ls_13 AT TS0 ls_14 AT TS1 ls_16 AT TS2 ls_17 AT TS3; 


Ah great, this has worked! I was wondering if it were possible to get fit indices (e.g., Chi Square, RMSEA, SRMR) when using this script; ANALYSIS: TYPE = RANDOM; ESTIMATOR=MLR; H1ITERATIONS=10000; ITERATIONS=20000; COVERAGE=0; MODEL: !define level factors i s q | ls_08 AT TSneg4 ls_09 AT TSneg3 ls_10 AT TSneg2 ls_12 AT TSneg1 ls_13 AT TS0 ls_14 AT TS1 ls_16 AT TS2 LS2017 AT TS3; Again, any help would be appreciated! Sophie 


These fit indices are not available for models such as this because the covariance matrix and mean vector are not sufficient statistics. Instead, you can work with neighboring models (models that are slightly less restricted) and see if extra parameters are significant. 


Okay, thank you very much for your help! 


Hi, I understand the basis for not being able to identify a latent growth model with a quadratic function when there are only 3 discrete time points (unless imposing some restrictions). However, out of curiosity I ran a model with a quadratic using TSCORES and the model was identified (i.e. i s q | y1 y2 y3 at t1 t2 t3). Is the model spurious? Or am I missing something obvious in terms of why it was identified with TSCORES though there are only 3 repeated measures of the dependent variable? Also, in a growth model of this nature is any attention given to the condition number when considering competing non-nested models? 


Perhaps the TSCORES are not the same for all subjects which implies more than 3 time points. Condition numbers less than 10 to the power of -10 suggest ill-defined models (little information about the parameters in the data). 


Thanks Bengt. To follow up: each person has three repeated measures and has been assigned a TSCORE in years based on the time elapsed from the first subject's assessment, e.g. subject 1 has TSCORES of 0.0, 4.1, 8.5y while another subject has TSCORES of 1.2, 4.4, 9.7y ... does this seem intuitive, and is it okay to code the TSCORES like this? I had wondered about the condition number as the "I and S" model using TSCORES was identified with a condition number of 0.246E-03, while adding a "Q" to this model gave this condition number: 0.118E-06 ... however the model with the addition of the "Q" gave a lower AIC, BIC, and adjusted BIC than the "I" and "S" only model 


Q1: Seems fine. Q2: The comparison of condition numbers is not really informative; you want to only look out for very small ones (and Mplus then warns you). 


Thanks for this Bengt 


I am running a 2 class longitudinal model (4 timepoints) with covariates (I posted earlier about this analysis). I wanted to check some of my findings. I was originally doing this analysis using Version 8. I tried to rerun my syntax using Version 8.1. For some reason, Mplus is now not including the missing data on my dependent variables (FIML was used to handle missing data). I looked at the Addendum for Version 8.1 and tried to follow the instructions to correct this issue by including the statement below, but it did not work. y2sasall y3sasall y4sasall ON y1sasall; What modifications do I need to make to my syntax to bring in the missing data? 


Send your output to Support along with your license number. 


I am running a GMM with free variances and using a variable which we measured on 12 timepoints (every 3 months), starting from quarter 7 to quarter 18. As the first time point is quarter 7, I included: quarter7@7, quarter8@8 and so on. If I change the input to quarter7@0, quarter8@1 and so on, I notice that the model changes. Could you explain to me how this happens and what I should put in my input? (start from quarter7@0 or @7)? 


The choice of the time point with the zero time score influences the interpretation of the growth factors. See the video and handout for our Topic 3 Short Course on our website. 


I will check this video and handout on your website thanks! If I use timepoints 7,8,9,10 until 18, (instead of starting from 0), my model (GMM free variances) seems stronger and more plausible than starting from 0. Is it justified to start from 7 instead of 0? 


That muddles the interpretation of the intercept and growth factors. 


Hi, A quick question: we are using time-varying and time-invariant predictors of change on a latent variable growth model. Additionally, we are using TSCORES to account for the individually varying timepoints. Earlier in this feed you state "When you use the TSCORES option you are using a covariate, namely time." Can you explain how this model is taking the TSCORE variable and using it as a covariate? I cannot find an explanation of this. Thank you! 


Take a look at UG ex 9.16. The time variable in that example plays the role of TSCORES in that the time variable values vary across subjects. 


Got it! Thank you. 


I am running a 3 class model with covariates. The output provides odds ratios for class 1 and class 2 in relation to class 3. I would like to obtain odds ratios for class 1 and 3 in relation to class 2. How can I obtain these alternative odds ratios? 


Send output from version 8.1 to Support along with your license number. 


As stated above, I am running a 3 class model with covariates. The output provides odds ratios for class 1 and class 2 in relation to class 3. I would like to obtain odds ratios for class 1 and 3 in relation to class 2. How do I manually change the reference class in my syntax to obtain these? 


This will be available in version 8.2. Until then, you can either use SVALUES to give start values that reflect the order of the classes you are interested in (using STARTS=0 and making sure you get the same best logL), or use the printed logit estimates to compute OR CIs using the FAQ on our website: Odds ratio confidence interval from logOR estimate and SE 
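
The second route can be sketched as follows. This is a minimal illustration of the usual back-transformation of a logit estimate and its SE, not a copy of the FAQ; the function name and the numeric values are hypothetical.

```python
import math


def odds_ratio_ci(log_or, se, z=1.96):
    """Return (OR, lower, upper) from a logit estimate and its standard error."""
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Hypothetical logit estimate and SE for a covariate effect, class 1 vs. class 3.
or_13, lo_13, hi_13 = odds_ratio_ci(log_or=0.40, se=0.15)

# Reversing the comparison (class 3 vs. class 1) flips the sign of the logit and
# keeps the same SE, so the OR and its limits are reciprocals of the ones above.
or_31, lo_31, hi_31 = odds_ratio_ci(log_or=-0.40, se=0.15)

print(or_13, lo_13, hi_13)
print(or_31, lo_31, hi_31)
```

Comparing class 1 with class 2 when both are printed relative to class 3 is less direct: the logit is the difference of the two printed estimates, and its SE also needs the covariance between them, so the sign flip shown here is not enough for that case.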


As instructed by your Support line, I used the SVALUES option of the OUTPUT command to get input with ending values as starting values. I modified this input so that class 3 has a class 1 label and class 1 has a class 3 label. I used this input with STARTS = 0; I didn't use the part of the input for the means/intercepts of the categorical latent variables in the model. This worked. Thank you. 


We are fitting a latent growth model to a longitudinal data set with 14 time points in which the first 9 points are each separated by approximately 6 months from the preceding time point and in which the remaining 5 time points are separated by 12 months from the preceding time point. When we use # of months from the beginning of the study to code the linear slope (0, 6, 12, 18, etc.) and include a quadratic factor in the model, the model won't converge. When we use # of years from the beginning of the study to code the linear slope (0, .5, 1, 1.5, etc.), the model will converge even with a quadratic factor in the model. We are fine with the latter coding of time but are curious as to why the former coding of time doesn't work. Any insight you can share about this will be most appreciated. 


High values for the time scores together with a quadratic model squaring those values can lead to numerical difficulties that you can avoid by doing what you did or by dividing the time scores by 10. Your coding is easiest to interpret. 
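
To give a sense of the magnitudes involved, here is a short sketch that builds the time scores described in the question (the exact spacing is taken from the poster's description as "approximately" 6 and 12 months, so the values are idealized) and squares them as a quadratic growth factor would.

```python
# Time scores described in the question: 9 occasions 6 months apart,
# followed by 5 occasions 12 months apart (14 occasions in total).
months = [6 * k for k in range(9)] + [48 + 12 * k for k in range(1, 6)]
years = [m / 12 for m in months]

# Loadings on the quadratic growth factor are the squared time scores.
quad_months = [m ** 2 for m in months]
quad_years = [y ** 2 for y in years]

print(max(quad_months), max(quad_years))  # 11664 81.0
```

Coding time in months makes the largest quadratic loading 144 times larger than coding it in years, which is the kind of scale imbalance the reply refers to.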

Kiki posted on Tuesday, January 08, 2019  6:01 am



I read that a minimum of 4 measurement points in longitudinal growth modelling is recommended because (1) with less than four timepoints it is not possible to identify enough parameters in the growth model to make the model flexible and (2) four timepoints give more power. Question 1. Are 4 measurement points a 'rigorous' minimum, or can longitudinal mixture modelling also be employed with 3 measurement points when just those 3 are available? I understand that another advantage of at least 4 measurement points is that it enables one to take into account quadratic growth. Question 2. What are the advantages of taking into account quadratic growth, beyond linear growth? 


Q1: At least 4 time points is desirable but 3 works if one can assume that the model doesn't need much addition. Q2: The substantive phenomenon that you are studying may require a quadratic growth model. 
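
The parameter counting behind this recommendation (given at the top of the thread for continuous outcomes) can be sketched as below. The function names are hypothetical, and the growth model counted here is the standard one with free factor means, free factor variances and covariances, and one residual variance per timepoint.

```python
def h1_parameters(t):
    """Free parameters of the unrestricted (H1) model: t means,
    t variances, and t*(t-1)/2 covariances."""
    return t + t * (t + 1) // 2


def growth_parameters(t, quadratic=False):
    """Free parameters of a growth model (H0) with t residual variances."""
    n_factors = 3 if quadratic else 2
    factor_means = n_factors
    factor_covariance_matrix = n_factors * (n_factors + 1) // 2
    return factor_means + factor_covariance_matrix + t


for t in (3, 4):
    for quadratic in (False, True):
        df = h1_parameters(t) - growth_parameters(t, quadratic)
        print(t, "quadratic" if quadratic else "linear", df)
```

With three timepoints the linear model leaves a single degree of freedom (9 minus 8) and the quadratic model asks for 12 parameters when only 9 are available; with four timepoints the linear model has 5 degrees of freedom (14 minus 9) and the quadratic model still has 1.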

Lucien Xu posted on Tuesday, May 07, 2019  12:55 pm



Hi, I am running a latent growth model with a distal categorical outcome. I fitted two models with the latent intercept fixed at different timepoints and used the latent intercept and slope to predict my outcome. I am wondering why the odds ratios I got for the latent intercept from the two models are the same while the odds ratios for the slope are different. Shouldn't the odds ratio for the slope be the same in the two models? A part of the syntax is: (model 1) i s | t1@0 t2@1 t3@2.25; so on i s; (model 2) i s | t1@2.25 t2@1.25 t3@0; so on i s; Many thanks 


We need to see your full outputs to say; send them to Support along with your license number. 
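
Without the outputs, one possible algebraic reason can still be sketched, assuming the two models are equivalent re-expressions of the same linear growth model and that only the first three time scores shown in the question matter for the argument. The coefficient values are hypothetical. Model 2's time scores equal 2.25 minus model 1's, so its slope factor is the negative of model 1's and its intercept factor is the status at the last occasion.

```python
# Model 1 time scores: 0, 1, 2.25; model 2 time scores: 2.25, 1.25, 0.
# The same individual line y = i1 + s1 * t is then written with
#   i2 = i1 + c * s1   (status where model 2 has time score 0)
#   s2 = -s1           (time runs in the opposite direction)
c = 2.25


def to_model2(i1, s1):
    """Map model 1 growth factor values to the equivalent model 2 values."""
    return i1 + c * s1, -s1


# Hypothetical logit coefficients for the distal outcome in model 1.
b0, b_i1, b_s1 = -1.0, 0.8, 0.3

# Substituting i1 = i2 + c * s2 and s1 = -s2 into the model 1 logit
# gives the coefficients that model 2 would have to reproduce.
b_i2 = b_i1
b_s2 = c * b_i1 - b_s1


def logit_model1(i1, s1):
    return b0 + b_i1 * i1 + b_s1 * s1


def logit_model2(i2, s2):
    return b0 + b_i2 * i2 + b_s2 * s2


i2, s2 = to_model2(1.5, 0.4)
print(logit_model1(1.5, 0.4), logit_model2(i2, s2))
```

Under these assumptions the intercept coefficient, and hence its odds ratio, carries over unchanged, while the slope coefficient becomes 2.25 times the intercept coefficient minus the original slope coefficient. Whether this is what happened in the poster's runs can only be confirmed from the outputs.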

jhodel posted on Friday, May 17, 2019  8:22 am



Dear all, I would like to perform GMM on longitudinal data about persons undergoing clinical rehabilitation. The study design includes 4 repeated measurement time points at 1 month (tp1), 3 months (tp2), and 6 months (tp3) post diagnosis, and at discharge (tp4). Two issues arise from this design: 1) The measurement time points are “windows” of several days and they are unequally spaced. Fortunately, I also have the exact dates of measurement. Therefore, I would like to perform GMM with individually varying time scores. Furthermore, I plan to use the 3-step approach to identify predictors of trajectories. Are these two approaches (individually varying time scores and the 3-step approach) compatible? 2) Many persons are already discharged within the measurement windows tp1, tp2, or tp3. Therefore, besides the trajectories with four measurement time points, I also have trajectories that include only one, two, or three time points (not due to missingness, but due to the study design). Is there a way to handle this issue in GMM in Mplus without deleting patients with fewer than four measurement time points? In other words: is it possible to analyse trajectories with 4, 3, and even 2 (and 1) time points simultaneously and, if yes, how? I would be very happy for your recommendations! 


1) Yes, you can either use single-level, wide format GMM using the TSCORES option or two-level, long format GMM with mixtures (latent class variable) declared as Between and using time as a covariate. 2) See the paper on our web site: Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with nonignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33. 
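
As a rough illustration of the difference between the two data layouts mentioned in 1), the sketch below reshapes a wide record (one row per person) into long records (one row per person and occasion) with time as an ordinary column. The variable names, the missing-value code 999, and the data values are all hypothetical.

```python
MISSING = 999  # missing-value code assumed for this sketch


def wide_to_long(rows, n_occasions=4):
    """Turn rows with keys id, days_t1..days_tN, y_t1..y_tN into
    one record per observed occasion with keys id, time, y."""
    long_rows = []
    for row in rows:
        for t in range(1, n_occasions + 1):
            y = row[f"y_t{t}"]
            days = row[f"days_t{t}"]
            if y == MISSING or days == MISSING:
                continue  # unobserved occasions simply contribute no record
            long_rows.append({"id": row["id"], "time": days, "y": y})
    return long_rows


wide = [
    {"id": 1, "days_t1": 30, "days_t2": 92, "days_t3": 181, "days_t4": 240,
     "y_t1": 12, "y_t2": 15, "y_t3": 17, "y_t4": 18},
    {"id": 2, "days_t1": 28, "days_t2": 95, "days_t3": 999, "days_t4": 999,
     "y_t1": 9, "y_t2": 11, "y_t3": 999, "y_t4": 999},
]

long_data = wide_to_long(wide)
for record in long_data:
    print(record)
```

In the long layout a person observed on only two occasions contributes two records, and the individually varying time of observation is just the value in the time column.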

jhodel posted on Monday, May 20, 2019  6:53 am



Dear Prof. Muthén, thank you very much for the prompt response. I would like to ask some follow-up questions on the two issues and your corresponding answers: 1) What is the difference between the two options you recommended (single-level wide format vs. two-level long format)? Is there a specific setting in which one is recommended over the other? 2) I tried to follow the description in the paper you recommended and started to fit GMM under MAR for single- and multiple-class models using the single-level wide format option. Unfortunately, I am able to plot neither the observed nor the estimated trajectories. At the moment, my code is the following:
VARIABLE: NAMES ARE ID age days_t1 days_t2 days_t3 days_t4 y_t1 y_t2 y_t3 y_t4;
USEVARIABLES ARE ID days_t1 days_t2 days_t3 days_t4 y_t1 y_t2 y_t3 y_t4;
MISSING = all (999);
IDVARIABLE IS ID;
TSCORES = days_t1 days_t2 days_t3 days_t4;
CLASSES = c(3);
ANALYSIS: TYPE = MIXTURE RANDOM;
MODEL: %OVERALL%
i s | y_t1 y_t2 y_t3 y_t4 AT days_t1 days_t2 days_t3 days_t4;
OUTPUT: SAMPSTAT STANDARDIZED TECH1 TECH8;
PLOT: SERIES = y_t1 y_t2 y_t3 y_t4 (s); TYPE = PLOT3;
Is there a way to plot observed and estimated trajectories and even include their confidence intervals? Thanks again very much and best wishes! 


1) Wide format allows for more flexible modeling but can't handle as many time points as long format. See also the videos and handouts of our Short Course Topics 3 and 4. 2) All the plots in the paper are done in Mplus. We need to see your full output to see the problem you have; send it to Support along with your license number. 


Hello, I am attempting to run a latent growth curve model with 4 time points (the model currently contains no other variables besides the data entered in the curve). I am receiving the "THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE." error message, and the problematic parameter is the intercept. I suspect that the issue is with the data from the first time point, as the model converges when this time point is not included. Do you have any suggestions for how to solve this so that I might be able to include the earlier time point? 


We need to see your full output to say; send it to Support along with your license number. 
