A minimum of four timepoints is recommended for growth models for two reasons. First, with fewer than four timepoints it is not possible to identify enough parameters in the growth model to make the model flexible. Second, four timepoints give more power.
With four timepoints for continuous outcomes, the H1 model has 14 free parameters: four means, four variances, and six covariances. With three timepoints, the H1 model has 9 free parameters: three means, three variances, and three covariances. With four timepoints, there are 9 parameters in the H0 model that need to be identified: the means of the intercept and slope growth factors (2), the variances of the intercept and slope growth factors (2), the covariance between the two growth factors (1) and the residual variances of the outcome variables (4). There are five other parameters that could be used for model modification purposes: the residual covariances among the outcome variables (3) and the time scores (2). No model modification would be allowed with three timepoints unless certain other restrictions were placed on the growth model.
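The counting above can be checked with a few lines of Python. This is only a sketch for the basic linear growth model with fixed time scores and a continuous outcome; the function names are made up for illustration.

```python
def h1_parameters(timepoints: int) -> int:
    """Free parameters of the unrestricted (H1) model: means plus variances/covariances."""
    means = timepoints
    variances_and_covariances = timepoints * (timepoints + 1) // 2
    return means + variances_and_covariances


def h0_linear_growth_parameters(timepoints: int) -> int:
    """Free parameters of the basic linear growth (H0) model with fixed time scores."""
    growth_factor_means = 2        # intercept and slope
    growth_factor_variances = 2    # intercept and slope
    growth_factor_covariance = 1   # intercept with slope
    residual_variances = timepoints
    return (growth_factor_means + growth_factor_variances
            + growth_factor_covariance + residual_variances)


def degrees_of_freedom(timepoints: int) -> int:
    """Parameters left over for model modification."""
    return h1_parameters(timepoints) - h0_linear_growth_parameters(timepoints)


for t in (3, 4):
    print(t, h1_parameters(t), h0_linear_growth_parameters(t), degrees_of_freedom(t))
```

With three timepoints this leaves 1 degree of freedom, with four timepoints 5.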
I have been running a growth curve model with four timepoints. The first time point is at the beginning of a stay in a rehab clinic, the second at the end of that stay (ca. 4 weeks later), the third is three months later and the fourth six months later. I started my analysis with fixed time scores, centered at the first time point (0,1,4,7). The analysis did not work very well, so I started freeing time scores. The final analysis resulted in the following time scores: 0 1 .8 -1.7. Here is my question: Does it make sense to have negative time scores?
First of all, have you plotted your observed variable means and also looked at a random set of individual growth curves to get an idea of what the shape of your growth curve is and to see whether there is variability in your sample regarding the growth curve shape? If there is a lot of variability in the growth curve shapes in the sample, perhaps you need to take this heterogeneity into account. If your growth curve looks linear, I think the appropriate time scores would be 0, 1, 3, 6 if the observations were at the first timepoint and then at one month, three months, and six months. If your average growth curve does not look linear, its shape should be a guide to how the time scores will come out. The estimated time scores above imply that after a linear start (0 to 1), the curve goes down. Perhaps you need to add a quadratic growth factor to your model.
Along the lines of questions pertaining to timepoints in growth curve modeling...
I have previously used SAS PROC MIXED to run growth curve models. With a 3 timepoint longitudinal survey study I found that there was a good deal of variation around the interview dates at each wave. I wanted to account for that variation in my model. I was able to scale time continuously by calculating time from initial baseline event to each interview point for each person. For the questions I was investigating this rescaled time variable improved my model when compared to using a straight 3 timepoint model and it also made it a bit easier to interpret my results in the context of 'years since event' as opposed to change from wave to wave. My question is: can I do this type of growth curve time scaling in Mplus version 2?
This can be handled in Mplus for continuous outcomes by using TYPE=MISSING if there are not too many unique timepoints. The dataset will contain missing values for individuals not measured at particular timepoints. For example, the data would be as follows:
where 3, 5, etc. represent the measurement occasions, and person 1 was measured at times 3, 12, and 28 while person 2 was measured at times 5, 11, and 30.
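As a rough illustration of that layout, here is a small Python sketch that spreads long-format records into one column per unique measurement time, using the times for persons 1 and 2 mentioned above. The outcome values and the -999 missing-value flag are made up for illustration; they are not from the original example.

```python
# Hypothetical long-format records: (person, time of measurement, outcome).
# Times follow the post (person 1: 3, 12, 28; person 2: 5, 11, 30);
# the outcome values are invented for illustration only.
long_records = [
    (1, 3, 10.0), (1, 12, 11.5), (1, 28, 14.0),
    (2, 5, 9.0), (2, 11, 10.5), (2, 30, 13.0),
]

MISSING = -999  # assumed missing-value flag, chosen for this sketch


def to_wide(records, missing=MISSING):
    """Return the sorted unique times and one row per person with a column per time."""
    times = sorted({t for _, t, _ in records})
    people = sorted({p for p, _, _ in records})
    lookup = {(p, t): y for p, t, y in records}
    rows = [[p] + [lookup.get((p, t), missing) for t in times] for p in people]
    return times, rows


times, rows = to_wide(long_records)
print("id", *[f"y{t}" for t in times])
for row in rows:
    print(*row)
```

Each person has a value only in the columns for the times at which that person was measured; every other column holds the missing-value flag. With many unique times the number of columns grows quickly, which is why this approach only suits data without too many unique timepoints.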
anonymous posted on Tuesday, July 31, 2001 - 6:48 pm
Is it possible/acceptable that the slope is regressed on covariates measured at the second or later timepoints?
bmuthen posted on Wednesday, August 01, 2001 - 3:48 pm
If the intercept is defined at the first time point, it seems that potential predictors of the slope should be measured before the second time point because the slope describes the change between the first and the second time point.
Anonymous posted on Thursday, August 02, 2001 - 2:54 pm
If the intercept is defined at the last time point, is it acceptable?
No. However, remember when you interpret the mean of the slope growth factor, it is the change between time score 0 and time score 1.
Rina Eiden posted on Thursday, December 20, 2001 - 8:27 am
Hi - I am trying to model changes in fathers' drinking over 4 time points (child ages of 12 months, 18 months, 24 months, and 36 months). I ran the original analyses with just a linear component with time scores of 0, 1, 2, and 4. The model fit was very poor. When I looked at the overall sample means, they are as follows: 3.74, 4.05, 1.82, and 2.11 - definitely not linear. There is a great deal of variability in the individual growth curves - with some looking cubic and some quadratic. I re-ran the model adding a quadratic component and the slope for the quadratic factor is significant - but the model fit is still poor. I added the quadratic component by adding quad by alc1@0 alc2@1 alc3@4 and alc4@16. Is this right? Is there any way to model a cubic curve? I am going to add covariates next, but wanted to make sure I had modeled the curve of the slope sufficiently well before doing that. Thanks so much.
bmuthen posted on Thursday, December 20, 2001 - 12:50 pm
First, a substantive comment. Is there a theory for this development? Are there substantive reasons for the sample mean to go up, then down, and then up again? This up-and-down development is difficult to capture in a growth model.
Next, some technical comments. The quad specification is correct, and a cubic could be added as well. But there may be other reasons for your model misfit. For example, adjacent time points may have correlated errors, or a time point may deviate from the quadratic curve as would seem to be the case given your sample mean development. You may detect this using modification indices. You may also have data that are far from normal, in which case using MLM can give you a better chi-square fit assessment.
I am looking for ways to model longitudinal data with variable time points using a latent growth curve. An attractive feature of the Mx program is that it can accommodate individual slope loadings via implementation of definition variables, special data columns containing fixed parameter values for each individual (Hamagami, 1997; Neale et al., 1999). Operationally, this involves creating a set of slope factor loadings unique to each individual. In the special case of longitudinal research involving age, this is referred to as scaling age across individuals (Mehta & West, 2000). A sample Mx script demonstrating this technique is included in Appendix A of Mehta and West's (2000) paper.
Does Mplus have a similar feature? Yes, I know this is easier to do as a multilevel model, but what about in the LGM context?
I'm not familiar with the Mehta & West paper but it sounds like you mean a growth model where individuals are measured at different times, that is, a model with individually-varying times of observation. Mplus Version 2.1 can estimate this type of model. Example 5 on page 15 of the Addendum to the Mplus User's Guide deals with this model. The addendum can be downloaded from www.statmodel.com under Product Support.
If this is not the model you mean, let me know and I will take a look at the article.
Anonymous posted on Monday, March 10, 2003 - 11:23 pm
Assuming same number of assessments, same timing of assessments across cases.
If the data collection intervals are unequal [say 0, 1 mo, 2 mo, and 9 mo], what values should be used to fix the slope parameters to test a linear model? Can 0 1 2 3 be used, or does this imply equal intervals?
0, 1, 2, 3 implies equal intervals. You can use 0, 1, 2, 9 to represent the unequal intervals.
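To see what the two choices imply, here is a small Python sketch of the model-implied means under a linear growth model, where the mean at each occasion is the intercept mean plus the slope mean times the time score. The intercept and slope values are invented for illustration.

```python
def implied_means(intercept_mean, slope_mean, time_scores):
    """Model-implied outcome means for a linear growth model."""
    return [intercept_mean + slope_mean * t for t in time_scores]


# Illustrative growth factor means, not estimates from any data set.
intercept_mean, slope_mean = 10.0, 0.5

unequal = implied_means(intercept_mean, slope_mean, [0, 1, 2, 9])  # scores in months
equal = implied_means(intercept_mean, slope_mean, [0, 1, 2, 3])    # equal spacing

print(unequal)
print(equal)
```

With 0, 1, 2, 9 the slope is the change per month and the last occasion sits seven months beyond the third. With 0, 1, 2, 3 the last occasion is treated as only one unit beyond the third, so the same slope implies a much smaller change by the final measurement.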
anonymous posted on Tuesday, July 08, 2003 - 2:30 pm
I am modeling a LGC and I want to include time-varying covariates. Is there a maximum number of time-varying covariates that one can include given the number of measurement occasions in the model? That is, with 4 time points is there a maximum number of TVCs one should include?
There is no maximum number of covariates you can include as far as the model being identified. For each covariate you add, you bring a variance and covariances with the other observed variables in the model. And you are estimating only a regression coefficient.
anonymous posted on Tuesday, July 08, 2003 - 5:49 pm
Thank you for your response. I'm estimating the regression coefficients of the TVCs, but I also have several "person level predictors" of the intercept and slope. I'm asking because I'm trying to understand how a LGC with TVCs in SEM relates to an equivalent model using Proc Mixed. I'm wondering if having only a few measurement occasions (say 3 or 4) nested within individuals presents a problem when trying to assess the impact of TVCs. It is my understanding that when fitting this type of model in HLM or Proc Mixed, the level 1 model is fit for each individual. It seems to me that it would be problematic to have as many TVCs as measurement occasions. Any insight would be greatly appreciated. Thank you!
bmuthen posted on Wednesday, July 09, 2003 - 11:39 am
You are right that conceptually, the level 1 model is fit for each individual, letting the intercept and slopes vary across individuals. However, the statistical procedure makes use of the fact that these intercepts and slopes come from a single population common to all the individuals so that the statistical procedure does not estimate model parameters for each individual, but instead means and variances across individuals. After model estimates have been obtained, the individual estimates are obtained from the model and the individual's data by a separate empirical Bayes (factor score) step. In conclusion, it is ok to have many tvc's per time point even when having only 3 or 4 occasions.
Lori Weight posted on Thursday, September 11, 2003 - 10:03 am
I have a follow-up question in regard to the one posted on July 8 (regarding the number of time-varying covariates). I have a similar model and I'm wondering if with just 3 timepoints I can model the linear trend and two additional time-varying covariates. This model converges for me. I've also run the model in Proc MIXED and it converges there as well. Can I assume that both the SEM and Proc Mixed models are NOT overparameterized? I'm concerned that they might be with only three time points even though the estimation of them looks good.
Lori Weight posted on Thursday, September 11, 2003 - 10:19 am
I should also mention that I'm treating all of the level one effects in the Proc Mixed model as random.
bmuthen posted on Thursday, September 11, 2003 - 10:22 am
Yes, this model is identified in its standard form. With 3 time points the growth model part is identified even without the extra information provided by the covariances between the 2 time-varying covariates (tvc's) and the outcomes. In SEM terms, without the tvc's, you have 3*(3+1)/2 + 3 = 9 pieces of information and 8 parameters. The 2 slopes you add when adding the tvc's are identified by the covariance of the tvc and the outcome. If the model were not identified, the SEM approach would tell you so by not being able to compute SEs.
bmuthen posted on Thursday, September 11, 2003 - 10:24 am
So, the slopes for the 2 tvc's are random. And probably held equal for the 2 tvc's in Proc Mixed. That is also fine in Mplus. Conventional SEM cannot handle a random slope for the tvc's.
Lori Weight posted on Thursday, September 11, 2003 - 11:34 am
Thanks for the quick response. I'm not sure what you mean by "held equal for the 2 tvc's". In proc mixed I have four random effects (int, slp, tvc1, and tvc2) and each is estimated. Is this a problem? Also, do I need to do something extra in mplus in order to treat my tvc's as random, i.e. specify the hierarchical structure (measurement occasion nested within subject)? Thank you.
bmuthen posted on Thursday, September 11, 2003 - 11:40 am
Often, the same random effect is used for tvc1 and tvc2. Mplus can have them the same or different. Having them different is not a problem. The Mplus specification is a standard, single-level model (so no nesting, but a "multivariate approach") specifying analysis type = random to be able to use the random slopes s2 and s3 for the tvc's:
s2 | y2 on x2; s3 | y3 on x3;
Lori posted on Thursday, September 11, 2003 - 2:53 pm
OK, so if I understand correctly, then a RC model specified in Proc Mixed with 3 time points and 4 random effects (int., lin., tvc1, tvc2) is not overparameterized. Is this correct? Thanks so much for helping me with all of this.
bmuthen posted on Thursday, September 11, 2003 - 2:56 pm
No, I don't think it is overparameterized. Another matter is whether all 4 have non-zero variance, but the estimation will tell you that.
Anonymous posted on Thursday, October 30, 2003 - 2:53 pm
I have a 4-point growth trajectory. The data were collected at time 1 and then 3.5 years later, 4.5 years after time one, and 5.5 years after time one. What would be the best way to code time?
Anonymous posted on Friday, October 31, 2003 - 8:04 am
In response to yesterday: That is initially what I used, but when I calculated the curve for the group with a value of "1" on a predictor, the resulting curve doesn't map well onto the raw means for those with a value of 1 on the predictor. I thought the problem might be the factor loadings. For example:
Raw means start around 3 and increase to around 3.9 (see below). The estimated curve with the above factor loadings is:
Anonymous posted on Thursday, November 13, 2003 - 10:03 am
I have specified a simple linear growth model where individuals have different times of observation in two ways in Mplus. In the usual way, I have specified the time variable with all possible times and used the MISSING option to deal with the fact that not everyone had every time. I have also specified it using TSCORES and the RANDOM analysis type. The usual way prints the usual statistics, chi-square, etc. to assess fit. The TSCORES approach does not provide the conventional fit indices. What is the difference? Is the answer the reason why multilevel model programs do not provide those statistics?
Yes, the difference is the reason that multilevel does not print fit statistics. In SEM models, there is one variance/covariance matrix to compare to. With individually-varying times of observation entered as data, you have a random slope multiplying an observed covariate. When such random slopes are involved, the residual variance of the outcome given the covariates varies as a function of the covariates. Therefore, there is no single variance/covariance matrix against which to test model fit.
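A short Python sketch may make the last point concrete. For a linear growth model with a random intercept and a random slope on an observed time variable t, the outcome variance at time t is var_i + 2*t*cov_is + t^2*var_s + residual variance. The parameter values below are invented for illustration.

```python
def outcome_variance(t, var_i, var_s, cov_is, resid_var):
    """Variance of the outcome at observed time t under a random intercept
    and a random slope on t (linear growth model)."""
    return var_i + 2.0 * t * cov_is + t ** 2 * var_s + resid_var


# Illustrative parameter values, not estimates from any data set.
params = dict(var_i=1.0, var_s=0.25, cov_is=0.1, resid_var=0.5)

# Two people measured at different times have different implied variances.
for t in (0.0, 1.0, 2.0, 3.5):
    print(t, round(outcome_variance(t, **params), 4))
```

Because the variance changes with each person's own value of t, there is no single model-implied covariance matrix to compare with a sample covariance matrix.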
Anonymous posted on Thursday, November 13, 2003 - 11:49 am
What if there are no covariates involved, as was the case with my model?
bmuthen posted on Thursday, November 13, 2003 - 11:53 am
When you use the TSCORES option you are using a covariate, namely time. So that case falls into the category of changing variances and lack of overall model tests of fit.
Anonymous posted on Saturday, November 15, 2003 - 8:07 pm
I am trying to compare the results of a Proc Mixed model to an MPlus model. The time points are continuous (t1,t2,t3) with corresponding outcomes (y1,y2,y3). I'm starting with one time-invariant predictor variable (x).
I set up the model as follows:
Int Slope | Y1 Y2 Y3 AT T1 T2 T3; Int Slope ON X; Int WITH Slope;
Y1 (1); Y2 (1); Y3 (1);
I am trying to generate a covariance and a single residual variance.
When I ran this model without the x variable, it converged after a long time. With the x variable, it crashed. Have I incorrectly specified this model?
I would need your input and data from Mplus and your SAS output to answer that. You can send them to email@example.com. From what you show, it should not take any time to run. There must be some other complication.
Thank you for sending your data and input. When I run this with or without the x variable, it finishes in less than one second. I notice that you are using Version 2.12. I suggest that you download Version 2.14 which is the current version of Mplus. Note that Mplus and SAS use different approaches to weighting. I would do my comparison without weights to be sure you have the model set up correctly (it looks like you do) and then I would add weights.
Mel Dal posted on Sunday, April 11, 2004 - 1:26 pm
I am using Proc Mixed to determine time trends in my dependent variable. However, my observations are unequally spaced at 0, 2 years, 5 years and 7 years. Can I use time as 0, 2, 5 and 7, respectively, if I want to consider time as a continuous variable? Also, if I intend to model within-subject correlation over time, can I use AR(1) to model dependency over time if I have unequally spaced intervals?
bmuthen posted on Sunday, April 11, 2004 - 2:52 pm
With unequally spaced intervals, perhaps you want to simply correlate adjacent residuals, letting those correlations be different.
You may also consider the Ornstein-Uhlenbeck process, which is suitable for irregularly spaced observations.
Wim Beyers posted on Thursday, May 20, 2004 - 1:15 am
I have 5-wave data, however not equally spaced in time (particularly the last measure). LGC modeling using intercept (all 1's) and slope (0,1,2,3,7) factors does not fit the data well. Adding a quadratic trend (0,1,4,9,49) makes the fit slightly better, but still not enough. From a repeated measures ANOVA on the same data, a cubic trend was suggested. So, I would like to add a cubic trend. Questions: - Are the loadings of the cubic factor fixed at (1,-1,1,-1 etc.)? - And, what about the last measure, do I use a -1 or a 1 there (given the fact that it was taken 4 years later than the second last)? - And, does modeling a cubic trend require to model also a quadratic trend on the data? I mean, there clearly is a cubic trend, but all results point to the fact that the quadratic trend is not important (both mean and variance of this trend not significant). Thanks
The loadings for the cubic growth factor are the loadings of the linear growth factor to the power of three. I would include the quadratic trend as long as the mean and variance of the quadratic growth factor are not zero. You might want to think about a model with some free time scores instead of a cubic model.
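As a quick check of that rule, the following Python sketch raises the linear time scores from the question (0, 1, 2, 3, 7) to the second and third power. The helper name is made up for illustration.

```python
def polynomial_time_scores(linear_scores, degree):
    """Loadings for a polynomial growth factor: linear time scores raised to `degree`."""
    return [t ** degree for t in linear_scores]


linear = [0, 1, 2, 3, 7]
quadratic = polynomial_time_scores(linear, 2)
cubic = polynomial_time_scores(linear, 3)

print(quadratic)
print(cubic)
```

The quadratic loadings match the 0, 1, 4, 9, 49 already used in the question, and the cubic loadings follow the same pattern rather than alternating between 1 and -1.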
Wim Beyers posted on Friday, May 21, 2004 - 1:51 am
Ok, thanks for the suggestion to take some free time scores. I hesitated to do this at first, but it seems a much better solution, because I do not really have an interpretation for a 'cubic' trend. One more question on these free time scores. I know that I need to fix at least two of them: 0 (at the first measure, if I want to center at the initial level), and another one, for instance 1 (at the second measurement). So: [0,1,free,free,free]. With respect to the latter, in terms of fit it does not seem to matter which time score I fix, e.g. [0,free,2,free,free] or [0,free,free,3,free] gives me exactly the same chi-square (and I need to do this because otherwise my model sometimes does not converge).
However, the mean (and variance) of the slope-factor then seems to change when I fix other time scores. Does this mean that the meaning of the slope factor (originally the rate of change between two measures) is also different now? Thanks for all help.
You will get the same fit with 0 1 free free, 0 free 1 free, or 0 free free 1. These are just different parameterizations of the same model. Note that you should use 0 and 1 for the two fixed time scores for reasons of interpretability. The reason that the mean and variance of the slope growth factor change is that they refer to the slope between 0 and 1, and that slope changes depending on where you fix the 0 and 1.
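Here is a small Python sketch of why the fit is unchanged. Moving the fixed 1 to another occasion divides all time scores by a constant and multiplies the slope mean by that constant (and the slope variance by its square), so the implied means stay the same. All numbers are invented for illustration.

```python
import math


def implied_means(intercept_mean, slope_mean, time_scores):
    """Model-implied outcome means: intercept mean + slope mean * time score."""
    return [intercept_mean + slope_mean * t for t in time_scores]


def move_fixed_one(time_scores, slope_mean, slope_var, anchor_index):
    """Re-express the model so that the time score at `anchor_index` equals 1."""
    scale = time_scores[anchor_index]
    new_scores = [t / scale for t in time_scores]
    return new_scores, slope_mean * scale, slope_var * scale ** 2


# Illustrative solution with time scores 0, 1, free, free.
scores = [0.0, 1.0, 1.8, 2.5]
intercept_mean, slope_mean, slope_var = 5.0, 2.0, 0.4

# Same model with the fixed 1 placed at the third occasion instead.
new_scores, new_slope_mean, new_slope_var = move_fixed_one(
    scores, slope_mean, slope_var, anchor_index=2)

old = implied_means(intercept_mean, slope_mean, scores)
new = implied_means(intercept_mean, new_slope_mean, new_scores)
print(new_scores, new_slope_mean, new_slope_var)
print(all(math.isclose(a, b) for a, b in zip(old, new)))
```

The slope mean changes from 2.0 to about 3.6 because it now describes the change between the first and third occasions, while the implied means are identical.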
Anonymous posted on Monday, July 12, 2004 - 5:11 am
I have fitted a linear growth model to (generated) data with individually varying points of observation. I used SPLUS, SPSS, and MPLUS. I used Maximum Likelihood estimation, and restricted the variances of the error terms to be equal across time points.
The results of SPSS were exactly equal to those of SPLUS.
The results of MPLUS differed from those obtained with the other two packages. For fixed parameters, the second digit differed. For random parameters, the first digit differed.
As I expected the results of MPLUS to be equal to those for SPLUS and SPSS (as was the case for fixed points of observation), I wonder what may have caused the difference.
Mplus reports the residual variances of the outcome, the intercept growth factor, and the slope growth factor. S-plus and SPSS report the standard deviations. So to compare, you must square their estimates or take the square root of the Mplus estimates. They give the standardized value of the covariance between i and s. Mplus gives the raw value. When these differences in reporting are taken into account, the results do not differ in any important way.
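The conversion described above can be sketched in Python as follows. The input numbers are invented for illustration, and the function name is hypothetical.

```python
def to_variance_scale(sd_intercept, sd_slope, sd_residual, corr_is):
    """Convert standard deviations and an intercept-slope correlation
    into variances and a raw covariance."""
    return {
        "var_intercept": sd_intercept ** 2,
        "var_slope": sd_slope ** 2,
        "var_residual": sd_residual ** 2,
        "cov_is": corr_is * sd_intercept * sd_slope,
    }


# Illustrative values as they might appear in output that reports
# standard deviations and a correlation.
converted = to_variance_scale(sd_intercept=2.0, sd_slope=0.5,
                              sd_residual=1.5, corr_is=-0.3)
for name, value in converted.items():
    print(name, value)
```

Going the other way, taking square roots of the variances and dividing the covariance by the product of the two standard deviations puts the variance-scale results on the standard deviation and correlation scale.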
Anonymous posted on Tuesday, July 13, 2004 - 5:23 pm
Could someone clarify the relationship between the single regression coefficient associated with a level 1, time-varying covariate in a multilevel growth model as one might estimate such a model in a program like sas' proc mixed and the multiple coefficients produced in latent growth curve model as one might estimate such a model using mplus where the separate time point values of the dependent variable are regressed on the corresponding time point values of the time-varying covariate?
bmuthen posted on Tuesday, July 13, 2004 - 5:36 pm
There is a difference here between what you can do in conventional SEM programs and what you can do in Mplus. Mplus allows you to do what SEM can do as well as what you can do in conventional multilevel programs such as SAS. SEM cannot have the slope for the time-varying covariate be randomly varying across individuals, but can let it vary across time. In SAS, the slope can vary across individuals (and can also be made to vary across time). Again, Mplus can be used for either.
Shige Song posted on Sunday, October 24, 2004 - 1:22 am
I am very interested in this discussion and want to probe a bit more.
You mentioned that four timepoints are needed because otherwise "it is not possible to identify enough parameters in the growth model to make the model flexible"; in addition, four timepoints give more power. Can you explain in some more detail what kind of parameters are not identified and what kind of flexibility will be missed?
I ask this question because I am doing growth modeling and unfortunately I have only three time points. I remember that Muthen (2000) uses only two time points to identify a random intercept model. With three time points, it is probably ok to identify a model with both random intercept and random slope. Since the literature on growth modeling is so big and I am very new to this area, it would be greatly appreciated if you could provide some more information (references, explanations, etc.). Thanks!
Muthen, Bengt O. 2000. "Methodological Issues in Random Coefficient Growth Modeling Using a Latent Variable Framework: Applications to the Development of Heavy Drinking in Ages through 18 to 37." Pp. 113-140 in Multivariate Applications in Substance Use Research: New Methods for New Questions, edited by Jennifer S. Rose. Mahwah, N.J.: Lawrence Erlbaum Associates.
With four timepoints and a continuous outcome, the H1 model has 14 free parameters -- four means and 10 variances/covariances. The basic H0 model has 9 free parameters: the means and variances of the intercept and slope growth factors, the covariance between the intercept and slope growth factors, and four residual variances for the outcome. This leaves 5 degrees of freedom to use to modify the model.
With three timepoints, the H1 model has 9 free parameters and the H0 model has 8 free parameters. This leaves only one degree of freedom to modify the model. So four timepoints allow a lot more flexibility to add residual covariances and/or free time scores.
Anonymous posted on Friday, March 25, 2005 - 11:05 am
What exactly is the technique used in Mplus to handle missing data, say in mixture growth curve analysis?
bmuthen posted on Friday, March 25, 2005 - 3:39 pm
I copy the following from the User's Guide. This applies to all models.
Mplus has several options for the estimation of models with missing data. Mplus provides maximum likelihood (ML) estimation under MCAR (missing completely at random) and MAR (missing at random; Little & Rubin, 2002) for continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types. MAR means that missingness can be a function of observed covariates and observed outcomes. For censored and categorical outcomes using the weighted least squares estimators, missingness is allowed to be a function of the observed covariates but not the observed outcomes. Non-ignorable missing data modeling is possible using maximum likelihood where categorical outcomes represent indicators of missingness and where missingness may be influenced by continuous and categorical latent variables (Muthén et al., 2003). Multiple data sets generated using multiple imputation (Schafer, 1997) can be analyzed using a special feature of Mplus. Parameter estimates are averaged over the set of analyses, and standard errors are computed using the average of the standard errors over the set of analyses and the between analysis parameter estimate variation.
In all models, missingness is not allowed for the observed covariates because they are not part of the model. The outcomes are modeled conditional on the covariates and the covariates have no distributional assumption. Covariate missingness can be modeled if the covariates are explicitly brought into the model and given a distributional assumption. With missing data, the standard errors for the parameter estimates are computed using the observed rather than the expected information matrix (Kenward & Molenberghs, 1998). Bootstrap standard errors and confidence intervals are also available with missing data.
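For readers who want to see how results from multiply imputed data sets are typically combined, here is a Python sketch of the standard combining rules usually attributed to Rubin. That the passage above refers to exactly these rules is an assumption on my part, and the estimates and standard errors below are invented.

```python
import math
from statistics import mean, variance


def pool(estimates, standard_errors):
    """Combine one parameter across m imputed data sets (Rubin's rules).

    Assumption: total variance = average squared SE
                                 + (1 + 1/m) * between-imputation variance.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total)


# Invented results for one parameter from three imputed data sets.
estimate, se = pool([1.0, 1.2, 0.8], [0.2, 0.2, 0.2])
print(round(estimate, 4), round(se, 4))
```

The pooled standard error is larger than any single-analysis standard error because it also reflects how much the estimate varies from one imputed data set to the next.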
Anonymous posted on Tuesday, March 29, 2005 - 9:18 am
Is there any limit on the number of covariates? Should the covariates be independent of each other? Thanks
I am running a growth model with 6 time points that are not equally spaced. I am coding time as: iw sw | cigs301@0 cigs302@.5 cigs303@1 cigs304 cigs305 cigs306 [freeing the last 3 times]. I keep receiving these messages:
THE LOGLIKELIHOOD DECREASED IN THE LAST EM ITERATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.
I would appreciate it very much if you can suggest a way of addressing this issue. Thank you. Below is the input I am using.
TITLE: 10th Grade Growth Model
DATA: FILE IS "C:\Documents and Settings\INSTHEALTHSA5\Desktop\Growth10.dat"; FORMAT IS 203F8.2;
Daniel posted on Wednesday, September 28, 2005 - 11:31 am
A reviewer asked my colleague and me about "period effects" in our longitudinal data collection strategy. Essentially, the reviewer was concerned that by collecting repeated measures each spring semester, we may be encountering effects that are specific to spring. Do you have any suggestions for how we may counter this argument? Is there a name for this effect, other than period effect, that can aid my search? Do you believe it is better to model with random instead of fixed timepoints? I'd appreciate any guidance.
The only thing that comes to mind is that in education, students may do better in the Spring than in the following Fall because they lose knowledge over the Summer, so their scores drop in the Fall. I imagine it would depend on what you are measuring. Perhaps someone else has some knowledge of this.
bmuthen posted on Wednesday, September 28, 2005 - 9:11 pm
Just to add to this discussion, I wonder why you are thinking of random instead of fixed time points. Random time points would mean that different students are measured at different times irregularly. It seems better to me to measure a couple of times a year, say Fall and Spring - this would still be using fixed time points. I wonder if anything is written under the rubric of seasonal effects in non-educational contexts.
I wonder what would be the maximum number of timepoints that can reasonably be used in LGM? Are there any guidelines for this? For example, if I have data from 100 individuals across 60 measurement points, is it possible to use LGM to analyze the data, or should I rather be satisfied with time series analyses?
bmuthen posted on Tuesday, January 17, 2006 - 6:27 am
I think 60 time points would be a lot for the multivariate, wide data, single-level approach to growth that is usually used in latent growth curve modeling. But you can formulate this as a long data, 2-level analysis as in the multilevel approach to growth. Then cluster = person id and you regress the outcome on time with a random slope. With a single outcome you then have only a univariate problem, where the 60 time points are 60 members of a cluster.
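The restructuring from wide to long data can be sketched in Python as below. The example uses 3 time points instead of 60 to keep it short; the values are invented, and None stands in for a missing measurement.

```python
def wide_to_long(wide_rows, n_timepoints):
    """Turn rows of (person_id, y_1, ..., y_T) into (person_id, time, y) records.

    Missing measurements (None) are dropped, so each person contributes
    one record per observed time point.
    """
    long_rows = []
    for person_id, *outcomes in wide_rows:
        if len(outcomes) != n_timepoints:
            raise ValueError(f"person {person_id}: expected {n_timepoints} outcomes")
        for time_index, y in enumerate(outcomes):
            if y is not None:
                long_rows.append((person_id, time_index, y))
    return long_rows


# Invented wide data: one row per person, one column per time point.
wide = [
    (101, 4.0, 4.5, 5.1),
    (102, 3.2, None, 4.0),
]

long_data = wide_to_long(wide, n_timepoints=3)
for record in long_data:
    print(record)
```

In the long file the person id plays the role of the cluster variable, time is an ordinary covariate, and the outcome is a single variable, which is what makes the problem univariate.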
Russell Ecob posted on Thursday, February 02, 2006 - 1:00 pm
I have data (ordered categorical) with 5 time points with a range of explanatory variables. I cannot decide whether to model with up to quadratic terms in the growth model and free some time points (e.g. the last two – I am particularly interested in relationships between variables near the beginning of the sequence) or whether to increase the degree of the polynomial (to cubic). What considerations would decide you on one course or the other? I assume that polynomials with degree larger than cubic cannot be fitted (at present) in mplus.
bmuthen posted on Thursday, February 02, 2006 - 6:22 pm
You can fit polynomials with degree larger than cubic in Mplus, but you have to revert from the growth language back to the approach using BY statements.
Adding free time scores to a quadratic sounds a bit complex, although perhaps necessary. Maybe the quadratic works better with some transform of age such as log age? We have found that to be the case in some data where the process increases faster than it decreases.
Ann Helgman posted on Wednesday, February 15, 2006 - 12:05 pm
I have only 3 time points (equal and balanced) but would like to plot a two order growth model (with parameters for the intercept, linear, and quadratic growth). However, I am only interested in making the intercept random. Conceptually, this model makes more sense than a linear model because I do expect subjects to initially increase and then decrease. I know it seems a bit stretched to have only 3 time points and to plot a quadratic growth, but it makes sense, and (in HLM at least) the likelihood ratio test indicates that adding TimeSquared to the model improves the fit of the data- or is that intuitively obvious anyway? I would very much like to proceed with a quadratic growth, with only the intercept random: is there a reason not to do so? Many thanks, Ann
bmuthen posted on Thursday, February 16, 2006 - 6:08 am
That should work. It often makes sense that only the intercept factor is random.
Ann H. posted on Thursday, February 16, 2006 - 8:55 am
Great- I'm confused about what to call something: when only the intercept is random, should I still be calling this approach latent growth trajectory modelling? I think that both the intercept and growth factors must be random in order to call this approach "individual growth"; so I was wondering what would be the correct way to refer to a model with only the intercept random, but with fixed linear and quadratic growth terms: should this be a "population growth trajectory" or something like that, or can it still be referred to as individual growth, albeit constrained so that everyone is assumed to have the same linear and quadratic rates? Thanks so much for this board! Ann
I have longitudinal clinical data on symptoms of cancer treatment. Individuals were measured several days before the first treatment and then about 48 hours after the second and third treatments. I understand that three time points is not ideal, but it is what I am stuck with. The length of time between treatments (cycle length) can vary but it was typically 14, 21 or 28 days depending upon the chemotherapy regimen. My intention is to center around the second chemotherapy treatment, thus my first time point will be negative. Within patients with the same treatment cycle length, the intervals between time points will be fairly consistent, BUT the intervals will be very different across treatments with different lengths (e.g. 14 day vs. 21, vs. 28 days).
My questions are: Does the variability in the intervals between time points threaten the validity of the linear growth model? And would I be better off fixing the first time point at 0 and then letting the other two time points vary across individuals?
bmuthen posted on Tuesday, February 28, 2006 - 3:47 pm
You might want to approach this as a multiple-group growth analysis, where groups differ based on treatment regimen. You would then have different time scores for the different groups. If you believe the same linear model holds for the groups you can impose that restriction, holding growth factor means, variances, and covariance equal across groups. This way, the different timings due to different regimens will not be a problem. For growth factors to be the same across groups, however, you have to center at the same time point, so it seems best to center at the pre-treatment time point.
I will be working with a data set of 550 mother adolescent dyads measured at 4 time points over four years. I have the date of the interviews and thus I can compute the intervals between each observation. I anticipate that I will model the data with individually varying time points.
I have 3 questions: My reading of the Singer and Willett text tells me that using this technique (as opposed to forcing the times to be the same for all participants) will reduce error in the slope. Is this correct?
Also I assume that my reliability of change (Willett 1989) will be higher than with fixed time points. I think this follows from the reduced error in the slopes AND from the larger SST (sum of squares for time).
Finally I wonder if you can point me to some published work which has modeled growth using individually varying time points.
I would agree with Singer and Willett on the first point. I believe the second point is correct. I don't know of any published work that uses individually-varying times of observation although I'm sure there must be many. This question might best be posted on Multilevelnet.
I want to analyze data from a longitudinal study measured at three time points (each nine months apart) in six centres; the data are about the influence of therapy (three different therapy types; utilization in the six months before each interview date) on substance use (in the last 30 days before the three time points for four substances) in opioid addicts; all variables are continuous (no. of days). About 50 patients per centre have a full data set. I'm thinking about a growth model with intercepts and slopes for each substance and therapy type with regressions of the substance use developments on the intercept of the therapies. I'm afraid the n of each centre is too small for it to be a covariate. Additionally the growth curve is not quite a linear one. Does LGM seem like a possible solution for this?
Seems like growth modeling is possible here. n=50 would seem large enough for center as a covariate. If your study is a randomized trial, you will find several relevant growth modeling articles on our web site.
In an earlier posting you indicated that "A minimum of four timepoints is recommended for growth models for two reasons. First of all, with less than four timepoints it is not possible to identify enough parameters in the growth model to make the model flexible. Secondly, four timepoints give more power."
Is there an empirical citation you recommend that supports these two reasons?
I don't know of any citation but if you look at the degrees of freedom for a three timepoint model and compare it to the degrees of freedom of a four timepoint model, you will see the lack of flexibility for three timepoints versus four. A standard three timepoint growth model has one degree of freedom while a standard four timepoint growth model has 5 degrees of freedom.
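The degrees-of-freedom comparison in the reply above can be checked with a short count. This is only a sketch in Python, assuming the "standard" model is a linear growth model for continuous outcomes with free residual variances at every timepoint, as counted at the start of this thread; the function names are made up for illustration.

```python
def h1_parameters(n_timepoints: int) -> int:
    """Unrestricted (H1) model: one mean and one variance per outcome, plus all covariances."""
    n_covariances = n_timepoints * (n_timepoints - 1) // 2
    return n_timepoints + n_timepoints + n_covariances


def linear_growth_parameters(n_timepoints: int) -> int:
    """Linear growth (H0) model: 2 growth factor means, 2 growth factor variances,
    1 growth factor covariance, and one residual variance per outcome."""
    return 2 + 2 + 1 + n_timepoints


def degrees_of_freedom(n_timepoints: int) -> int:
    """Degrees of freedom = H1 parameters minus H0 parameters."""
    return h1_parameters(n_timepoints) - linear_growth_parameters(n_timepoints)


for t in (3, 4):
    print(t, h1_parameters(t), linear_growth_parameters(t), degrees_of_freedom(t))
```

With three timepoints the count is 9 minus 8, leaving a single degree of freedom; with four it is 14 minus 9, leaving five.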
Just a conceptual question here, but I was wondering if you could explain why, if the linear trend coefficients are fixed at, say, 0, 1, 2, and 3, the intercept coefficients are all fixed @ 1? (E.g., the parallel processes growth curve modeling example on the website, cont5.std.) If the assumption is that the intercept represents the initial levels (i.e., estimated T1 values) of the variable, shouldn't they be fixed @ "0"?
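One way to look at the question above is to write out what the loadings do. In a linear growth model each outcome is the intercept factor times its loading plus the slope factor times the time score, so the loading is a multiplier on the intercept factor, not the value of the intercept. The Python sketch below uses hypothetical growth factor values (10 and 2) purely for illustration.

```python
def implied_values(intercept, slope, intercept_loadings, time_scores):
    """Model-implied outcome at each timepoint: loading * intercept + time score * slope."""
    return [a * intercept + lam * slope
            for a, lam in zip(intercept_loadings, time_scores)]


time_scores = [0, 1, 2, 3]

# Loadings fixed at 1: the intercept factor contributes to every timepoint,
# and because the first time score is 0 it equals the implied value at T1.
with_ones = implied_values(10.0, 2.0, [1, 1, 1, 1], time_scores)

# Loadings fixed at 0 (hypothetical alternative): the intercept factor drops out entirely.
with_zeros = implied_values(10.0, 2.0, [0, 0, 0, 0], time_scores)

print(with_ones)
print(with_zeros)
```

It is the time score of 0 at the first occasion, not the intercept loading, that makes the intercept factor represent the initial level.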
I have two questions: 1. Would a cubic growth model be identified if the outcome variable was observed at 4 time points and the growth factors are regressed on covariates or do you need more time points to achieve identification?
2. how do you interpret a model with free time scores?
Suppose your model is MODEL: %OVERALL% i s | out1@0 out2@2 out3* out4*;
The output will give you the values of i and s for each individual, and therefore, you obtain the intercept of the estimated curve and the slope or change between the two first time points, but how do you calculate the estimated outcome for the third and fourth timepoints?
1. No, a cubic growth model cannot be identified with four time points.
2. The mean of the slope growth factor is the average growth for the fixed time scores. You can also check if the free time scores deviate from linearity by dividing the difference between the estimated time score and the linear time score by the standard error of the estimated time score. If this is significant, the estimated time score deviates from linearity.
The output does not give individual scores. If you want these, you need to ask for FSCORES in the SAVEDATA command. I'm not sure what you mean by estimated outcomes for the third and fourth time points. If you mean estimated time scores, they are part of the results.
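The deviation-from-linearity check described in point 2 is a simple ratio. A minimal Python sketch follows; the estimate, the linear time score, and the standard error are made-up numbers rather than output from any real run, and the 1.96 cutoff is the conventional two-sided 5% normal critical value, which is an assumption here and not something stated in the reply.

```python
def linearity_z(estimated_score: float, linear_score: float, standard_error: float) -> float:
    """(estimated time score - linear time score) / SE of the estimated time score."""
    return (estimated_score - linear_score) / standard_error


# Hypothetical example: with time scores fixed at 0 and 2 for the first two occasions,
# equally spaced linear growth would put the third occasion at 4.
z = linearity_z(estimated_score=3.0, linear_score=4.0, standard_error=0.4)

# Assumed conventional cutoff for a two-sided test at the 5% level.
deviates = abs(z) > 1.96
print(z, deviates)
```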
Hi Linda, many thanks for your quick answers. My question about free scores has to do with the fact that I would like to plot, for some of the individuals in the study, the model-fitted and the observed curves and compare them by calculating residuals. Even though I can use the intercept and slope given in FSCORES to obtain the values of the individual's fitted curve up to the second timepoint (the first timepoint was zero and the second timepoint 2), it is not clear to me how to obtain values of the fitted curve after the second timepoint if free scores were used in the model for the third and fourth times.
The fitted value for individual i at time t is ihat(i) + lambda(t)*shat(i), where
ihat(i) is the intercept growth factor score for individual i
lambda(t) is the estimated time score for time t
shat(i) is the slope growth factor score for individual i
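The three quantities defined above combine into a fitted value of ihat(i) + lambda(t)*shat(i) at each timepoint, including the timepoints with free time scores. A minimal Python sketch, assuming the factor scores have been saved and the estimated time scores read off the results; every number below is hypothetical.

```python
def fitted_curve(i_hat, s_hat, time_scores):
    """Fitted value at each timepoint: intercept factor score + time score * slope factor score."""
    return [i_hat + lam * s_hat for lam in time_scores]


def residuals(observed, fitted):
    """Observed minus fitted value at each timepoint."""
    return [y - f for y, f in zip(observed, fitted)]


# Hypothetical time scores: the first two fixed at 0 and 2, the last two estimated.
time_scores = [0.0, 2.0, 3.5, 4.5]

# Hypothetical factor scores and observed outcomes for one individual.
fitted = fitted_curve(i_hat=10.0, s_hat=1.5, time_scores=time_scores)
resid = residuals([9.5, 13.5, 15.0, 17.0], fitted)

print(fitted)
print(resid)
```

The same estimated time scores are used for every individual; only the two factor scores change from person to person.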
Laney Sims posted on Wednesday, February 07, 2007 - 8:13 am
1) I am working on a latent growth model with two measures over three time points, so I have two slope variables and two intercept variables, and regression relationships between them. I would like to isolate individual cases to see if they behave the way I expect (i.e. choose a case with a high intercept for one measure and then look at the slope for the other measure). Is there a way to specify an individual case and look at the values of all the latent variables for that case?
2) Is there a way to find the frequencies of the original measures (# of cases with a response of 1, eg)?
1. You can ask for factor scores in the SAVEDATA command. You can include an id variable in the VARIABLE command. Then you can select people who are high on the intercept factor and save their scores and look at them. You can also plot the intercept and slope growth factors and see which individual each dot represents by holding the mouse on that dot.
2. I assume you mean categorical outcomes. The proportions of each categorical outcome are given automatically. You can also use TYPE=BASIC; for basic sample statistics.
I have a several part question: (1) Is it appropriate to run a four wave model similar to example 6.10 in the user's guide, but without any time invariant covariates? If not, please explain.
(2) If this is fine, what information (other than the fit indexes) do I need to include in a results section? For example, if I presented a figure like the one graphically representing ex 6.10, which standardized coefficient values should I include? I am asking because the output shows relations that I didn't specify in the input. To illustrate, if I were to run ex 6.10 (without the time invariant covariates) the output would show a31 with i; a31 with s; a32 with i; a32 with s, etc. Why is this? Do I need to include these paths in a figure?
(3) Continuing to use 6.10 as an example, one of my models shows that y11 ON a31 and y12 ON a32 are significant/have large effects, but y13 ON a33 and y14 ON a34 are not as large. If the fit indexes are good, are the findings worth reporting? Or do all the effects from the time varying covariates to the outcome variables need to be large/significant?
Thanks, Linda! However, I don't think I understand your answer to #2. If I take the time-varying covariates off of the USEVAR list, then the input won't run.
In regards to #3, I understand that I should report both significant and nonsignificant findings. However, I do not have a theoretical explanation (e.g., intervention between time 2 & 3) for why the time three and time four covariates have different relations with the time three and time four outcome variables (compared to time 1 and 2). Given this, does it make sense to report any of these findings? In other words, do the good fit indexes make the two significant relations worth talking about?
Thank you for your responses, Linda. In growth curve modeling, are outcomes at early time points used to predict later time points? I'm asking because several of my models have great fits but the paths from the covariates to the outcome variables are not significant.
The standard random coefficient growth curve model does not have direct relationships from the outcome at time one to the outcomes at time two etc. A model with such relationships is an autoregressive growth model. I think there has been some work combining these two models.
I don't see the fact that the time-varying covariates behave differently at the different time points as a problem. It may just be that there are different covariates that are important at different time points.
Carl Hauser posted on Tuesday, April 10, 2007 - 8:19 pm
I hope that this is not too much of an off the wall question. I was recently introduced in a very cursory way to the concept of quantile regression. If I'm understanding it correctly, this would be a very useful method for using individual (person) growth data in a descriptive/ diagnostic way. The particular application I saw used this procedure for creating academic growth reference charts - like pediatricians use for determining percentiles of height and weight across time. In searching the posts here, there are none that refer to this. As a very new user to MPlus and to this site, I don't know if this is appropriate to ask BUT, have you given any serious consideration to this (quantile regression) as a feature in MPlus OR can it be done in the current version even though it isn't referenced?
I don't think we can do that in Mplus. You are the first to add this to the wish list for future Mplus development.
Hemant Kher posted on Saturday, April 14, 2007 - 3:40 pm
My question is about the appropriateness of using LGMs for data I have. My unit of analysis is automobiles, and my sample contains 132 automobiles (e.g. Camry, Corolla, Crown Victoria, etc.), with reliability data for models 2001-05. Reliability data comes from annual surveys conducted from 2001-06. For each model year, data comes from the survey conducted from the model year till 06. Thus, for models made in 2001, we have data from the surveys from 2001-06. For models made in 2002, data comes from surveys from 2002-06 and so on. Finally, for models made in 2005, data comes from surveys from 2005 and 2006. Mean reliability for the samples plotted over time shows linear changes, and LGM models fit ... but is the application of LGMs appropriate?
Growth modeling considers changes in individual subjects (such as cars) over time, so you need to have individual-specific measurements at the different time points. I may misunderstand, but it sounds like you have a reliability measure obtained not on each of your 132 cars, but obtained from an annual survey of owners of such cars. If so, that would mean that every Camry made in the same year in your sample has the same outcome in a given year, so no individual variation. So if I am reading this right, this would not be a situation for which growth modeling is applicable.
I'm trying to evaluate a primary prevention program for alcohol misuse in schools. (1) I have non-equidistant measurement points: pre-test, post-test (0.5 years after pre-test), follow-up 1 (1.5 years after pre-test) and follow-up 2 (2.5 years after pre-test). To assume a linear trend, would "0 1 3 5" be the right choice (one time score unit = 0.5 years)? (2) If it doesn't fit well (as I would predict), is it possible to free the last two measurement points, even in this case of non-equidistance? (3) Is it possible to add a quadratic trend to this existing trend with free time scores?
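For part (1) of the question above, linear time scores only need to be proportional to the elapsed time since the first occasion. The Python sketch below does that conversion for the measurement times given in the post; the function name and the choice of unit are illustrative.

```python
def linear_time_scores(times, unit):
    """Time scores proportional to elapsed time since the first occasion.

    times: measurement times in any consistent unit (here, years since pre-test)
    unit:  length of one time score unit in that same unit
    """
    first = times[0]
    return [(t - first) / unit for t in times]


# Pre-test, post-test, follow-up 1, follow-up 2, in years after pre-test.
measurement_times = [0.0, 0.5, 1.5, 2.5]

half_year_scores = linear_time_scores(measurement_times, unit=0.5)
yearly_scores = linear_time_scores(measurement_times, unit=1.0)

print(half_year_scores)
print(yearly_scores)
```

With a half-year unit the scores come out as 0, 1, 3, 5; with a one-year unit they are 0, 0.5, 1.5, 2.5. Both describe the same linear trend, and only the scale of the slope factor differs.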
I am doing two-part models for semicontinuous data. When I ran the regular latent growth curve models, results suggested that a quadratic model fits the data better than a linear model. How do I specify quadratic growth in the two-part models if my time points are not equally spaced?
I would like to model trajectories of change in lung function across four timepoints in a sample of factory workers. The difficulty I face is that lung function was not measured in waves. For example, one worker may have had his lung function measured once one year after he was hired, and then three times during the following year. Another worker may have had his lung function measured at four equally spaced intervals beginning the year that he was hired. I'm not sure what would be the best way to model these data, or whether it is even possible.
Thank you, again, for your response. I tried the approach you suggested. The model ran, but I received the following error message:
THE ESTIMATED COVARIANCE MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE. COMPUTATION COULD NOT BE COMPLETED. THE VARIANCE OF S APPROACHES 0. FIX THIS VARIANCE AND THE CORRESPONDING COVARIANCES TO 0 OR DECREASE THE MINIMUM VARIANCE.
Hello, a free time score model fitted my 4-wave data best. Due to interpretation issues I decided to reparameterize from 0 1 * * to 0 * * 1. I want to test these free time scores for deviation from linear and quadratic scores. Is this possible with 0 * * 1? How would one interpret this compared with such a deviation test in 0 1 * *? E.g., in the latter parameterization I would test deviation from linearity in the development from T2 to T3 to T4, but how would one interpret this test in 0 * * 1? In addition, and maybe in the same direction of thinking: is it possible to plot the means of 0 * * 1 as in 0 1 * *? Regards, michael
Thanks, but I already found the formula in this forum. My question was whether one can test this deviation from linearity only in the "normal" parameterization "0 1 * *" or also in the "0 * * 1" parameterization (or is this silly?). I know how to interpret the second parameterization, but another question arises. You once stated that one could interpret a quadratic factor more as "later development". Does this also hold for the second parameterization, where means of slopes describe average increase or decrease between T1 and T4?
OK, my last post regarding this topic: I asked because I get different deviation results for the two parameterizations. The cause of all these questions is that a free time score model fitted as well as a quadratic model, and both fitted better than a linear model (binary part of a two-part model). In terms of parsimony (2 df more in a free score model) I decided on free scores (with the ensuing interpretation problems if you add covariates and need reparameterizations). But in the full two-part model, I instead have one df more with a quadratic model in the binary part than with free scores in the binary part (due to needed specifications for the entire model), and my parsimony argument is no longer valid. So at the moment I'm thinking of going back to a quadratic model in the binary part and getting rid of all the interpretation problems. Am I right?
I have a parallel growth model, in which one development seems linear and the other exponential. Measures are at 0, 3 and 7 months. Can I model one development as linear with time points 0, 3, 7 and the other as exponential? Would this then be 1, e^-3, e^-7? I'm not sure how to explain then that both developmental processes are on the same time-scale.
I've used the article of Nilam Ram and Kevin Grimm (Using simple and complex growth models to articulate developmental change).
Can someone recommend an article that presents a complete example of example 6.10 (pp. 95-96) in the mplus manual, fourth edition? This example is "Linear Growth Model for a Continuous Outcome with Time-Invariant and Time-Varying Covariates."
I've been exploring this with 5 time points and it would help me to read a presentation of this approach with real data.
We are conducting a LGM model with 9 waves of yearly data.
We first ran a univariate unconditional model looking at linear and polynomial growth factors. We also examined a model where all of the parameters were freed (0 * * *...1). The freely estimated model provided a much better fit compared to the linear-only model. The quadratic also provided a better fit than the linear-only. We retained the freely estimated model.
In our LGMMs, we then used the freely estimated growth factors. A theoretically coherent 4-class solution best fit the data, with covariates predicting class membership in expectable ways.
I am now wondering whether our use of freely estimated parameters is justified. Using the polynomial growth factors, we obtained a very similar 4-class solution to the one with the freely estimated model.
Is there a way to adjudicate between the use of the quadratic and freely estimated growth parameters here?
I am doing growth curve modeling with three time points. I have some of the predictors measured just before the last time point. In order for me to be able to include the predictors in the model, should my intercept reflect the last time point? And, I guess there is no way I can predict the trend (considering the time when the predictors were measured)? Thank you for your help.
I agree on both counts. But you may want to allow the slope to correlate with the predictors.
Bill Dudley posted on Tuesday, April 14, 2009 - 1:19 pm
I have data on how knee joints move and muscles are activated upon landing (imagine a basketball player landing after a jump to catch a rebound). These continuous level data are captured at 1 millisecond intervals and the most interesting part of the landing typically takes about 150 milliseconds to play out.
In the recent Baltimore workshop Bengt indicated that after a few dozen measures, the multivariate approach for growth curve modeling becomes unwieldy. In this case I have over 100 time points (the number of time points varies across individuals). In addition, each person has 5 trials. Thus I am thinking that I need to use a multilevel approach (with two levels: time points within trials and trials within individuals). Will this work within Mplus (I have the combo version) and if so can you point me to examples or slides within the last two topics that provide an example of how I might model this?
I think you are right to take a long, two-level approach in this case. Take a look at the UG ex 9.16.
ywang posted on Tuesday, September 08, 2009 - 10:47 am
Dear Dr. Muthen:
I am new to Mplus and have a basic question. I have three time points for Latent Growth Modeling for dummy variables. How many covariates can I have at most for the model to be identified?
Thank you very much for your help in advance!
ywang posted on Tuesday, September 08, 2009 - 10:50 am
Dear Dr. Muthen:
I forgot to ask another question regarding three-time point latent growth modeling. Can I do parallel latent growth modeling for three time points? If so, can I add covariates and how many covariates can I add into the parallel model?
The number of covariates that you can have is not a function of how many time points you have. Each covariate adds information in the form of covariances with the 3 outcomes, so if you have 2 growth factors (linear growth) your net result is positive.
The answer to your second post is yes; for the covariates, see the answer above.
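The counting argument in the reply about covariates can be made explicit. This Python sketch assumes each covariate brings one covariance with each outcome as new information and costs one regression coefficient per growth factor, with the covariates' own means, variances, and covariances not counted as model parameters; that bookkeeping is an assumption made for illustration.

```python
def net_information_gain(n_outcomes: int, n_growth_factors: int, n_covariates: int) -> int:
    """New covariances with the outcomes minus new growth-factor-on-covariate regressions."""
    added_covariances = n_outcomes * n_covariates
    added_regressions = n_growth_factors * n_covariates
    return added_covariances - added_regressions


# Three outcomes and linear growth (intercept and slope factors).
for k in (1, 2, 5):
    print(k, net_information_gain(n_outcomes=3, n_growth_factors=2, n_covariates=k))
```

Under these assumptions each added covariate contributes three covariances and uses two coefficients, so the net gain is one per covariate and no upper limit follows from the number of time points.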
Dale Glaser posted on Wednesday, February 17, 2010 - 1:46 pm
Dear Dr. Muthen(s); I wanted to follow up on the 2007 inquiry and ask whether the most recent version of Mplus has the capability of conducting longitudinal quantile regression. I have a project that may necessitate the use of such a technique, and being an SPSS user, my only option is to use the quantreg package from R. Any feedback would be most appreciated.
Dale Glaser posted on Thursday, February 18, 2010 - 3:16 pm
Good question Bengt! A faculty member at one of the local universities requested help with this technique, and though I know what Quantile Regression is (see Hao and Naiman, 2007) for cross-sectional studies, I am blissfully unaware of this approach to longitudinal studies.
Sara Vargas posted on Wednesday, March 31, 2010 - 1:59 pm
I am doing LGM using 3 time points (0, 6, 12 months) to look for condition effects (control vs intervention) on the slope and intercept of an outcome measure. I estimated the time score for the 3rd time point (it was estimated at about 8 months after baseline). The syntax:
Intercept by baseline@1 time2@1 time3@1;
Slope by basline@0 time2@6 time3*;
[baseline@0 time2@0 time3@0];
[Intercept Slope];
baseline(1); time2(1); time3(1);
Intercept on condition;
Slope on condition;
I want estimated means and standard errors for each group (control and intervention) at each time point. I have re-centered the slope at time 2 and at time 3 and recorded the intercept as the mean. I run into problems when I center on a time 3 that has been estimated. I am first setting the slope @ 0, 6, *, and then re-centering on time 2 such that the slope is set @ -6, 0, *. What is the proper way to re-center at time 3? Also, is there a better way to get estimated means and standard errors for two conditions over three time points?
Instead of re-centering, I would express the means in terms of labeled model parameters using the "new" option of Model Constraint. This gives you the estimate and its SE.
So for instance, the mean at time 3 for condition=0 is:
t3mean = iint + lam3*sint;
where iint/sint is the label for the intercept of the intercept/slope growth factor in its regression on condition.
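The arithmetic behind this suggestion can be sketched outside Mplus. In the Python below, iint and sint follow the labels in the reply; ib and sb are hypothetical labels for the regression slopes of the intercept and slope factors on condition, and every numeric value is made up. Standard errors are not computed here; the reply notes that Model Constraint reports them.

```python
def implied_mean(iint, sint, ib, sb, lam, condition):
    """Model-implied outcome mean at a time score lam for a given condition (0 or 1).

    Intercept factor mean: iint + ib * condition
    Slope factor mean:     sint + sb * condition
    """
    intercept_mean = iint + ib * condition
    slope_mean = sint + sb * condition
    return intercept_mean + lam * slope_mean


# Hypothetical estimates; the third time score stands in for the freely estimated one.
time_scores = [0.0, 6.0, 8.0]
estimates = dict(iint=20.0, sint=0.5, ib=1.0, sb=0.25)

control_means = [implied_mean(lam=lam, condition=0, **estimates) for lam in time_scores]
intervention_means = [implied_mean(lam=lam, condition=1, **estimates) for lam in time_scores]

print(control_means)
print(intervention_means)
```

For condition = 0 the expression reduces to iint + lam*sint, which is the t3mean formula given above when lam is the third time score.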
Sara Vargas posted on Sunday, April 18, 2010 - 10:33 am
Thank you for your response. I am still a little confused about how I am supposed to be labeling the intercept of the intercept/slope growth factors. I tried to do it as noted below, and I keep getting error messages. I also tried numerous derivations based on your response and I continue to get error messages.
MODEL:
Intercept by baseline@1 time2@1 time3@1;
Slope by basline@0 time2@6 time3*;
[baseline@0 time2@0 time3@0];
[Intercept Slope];
baseline(1); time2(1); time3(1);
Intercept on condition;
Slope on condition;
MODEL CONSTRAINT:
New(t2mean);
t2mean = [Intercept] + 6* [Slope];
New(t3mean);
t3mean = [Intercept] + 8.2* [Slope];
I'm having issues with modeling a latent growth curve with achievement data. The time periods of assessment are approximately 0, 2, 4, 6, and 9 years apart, but when I enter those values for the slope, the psi matrix is always NPD. Looking at the curve itself, it appears there is very little improvement from year 6 to 9 in particular (a ceiling effect). So I free the last 3 years, which runs, but the values I get for the slope are not intuitive (from about 2.5 to 3.2 to 3.6). Is it problematic to report or use this as a final model when it doesn't represent the actual timing of developmental growth? Being a developmentalist, it just irks me a bit to use this as the final model. Or is this the best because it does represent the real development in these skills? Many Thanks-
Fixed time scores should reflect the differences between the measurement occasions. If you free a time score, the estimated value shows the deviation from linearity. It looks like you have freed several time scores. Perhaps you should instead consider a different model for your data, for example, a piecewise model.
Regan posted on Thursday, August 19, 2010 - 12:49 pm
Dear Professors, I am new to SEM models. I understand that for growth models, you recommend having at least 4 time points. My main interest is to look at an adult predictor on an adult outcome, controlling for measurements in the predictor taken at one and possibly two antecedent (adolescent) periods. Now, is just putting the predictor variable in the model and specifying paths as usual enough, or is there something else that I need to do to adjust for the 'repeated measures' in the model?
Example: if F1 = adolescent predictor, F2 = adult predictor, F3 = adult outcome behavior,
can I just use the code:
F3 on F2 F1; F2 on F1;
(Of course there are other covariates in the model, I just wanted to be simple here)
I have been told by someone that if I do this, I am not accounting for repeated measures and that I need to do a multilevel model or growth model. Is this correct? Thank you!
It does not sound like you have repeated measures. Repeated measures is the measurement of the same variable at multiple time points. It sounds like you have several variables for each person which is fine. Multivariate analysis handles this.
Regan posted on Friday, August 20, 2010 - 10:56 am
I am sorry, I wasn't clear. F1 and F2 are the same variable measured first during the adolescent period and then again during the adult period. In that case the same variable (let's say depression) is measured at two time points. Do I need to do anything different than the notation given previously? Thank you and sorry I was not clear the first time.
Nary Shin posted on Sunday, September 19, 2010 - 4:34 pm
I have a 3 time point model which is not linear. The trajectory looks like a V and actually it is theoretically correct. Due to this characteristic, I fixed only the time 1 and 2 scores, and used a free time score for time 3. Does it sound ok? Thanks in advance.
We offered parenting groups to 90 parents (for an average of 8 parents per group) and we would like to show that parenting practices and child's psychological health improved from T1 to T2. I've read above that to do a growth model, I need 4 time points. Unfortunately, T3 and T4 are not yet available. If I run a simple MANOVA (I have 5 dependent variables), reviewers will say that I did not take the nested nature of my data into account.
Is there any way that I can compare T1 to T2 on five dependent variables using multivariate hierarchical analyses? (We were originally planning to conduct a piecewise growth HLM model with 4 time points)
I thank you for your time, it is really appreciated.
Multivariate analysis takes into account the lack of independence due to having several variables per person. Therefore MANOVA would take this into account. You could also look at an intercept only growth model.
That model is indeed not identified because you don't have repeated Level-1 observations on that post-on-pre regression (they repeat on level 2). Two alternatives:
One alternative is to instead write your model as a growth model with 2 time points, y1 for pre and y2 for post. But it is a very weak growth model where you cannot identify both intercept and slope variance across parents, but can, say, fix the slope variance at zero, only estimating its mean. The intercept variance across groups can also be estimated.
If you are particularly interested in the mean of the n1 slope (what you called y100), you can work with a 2-level model where parent is level 1 and group is level 2:
Level 1(Parent level): Outcome= n0 + n1(tprepost) + e
Hi, I have 4 time points with a single level. I would like to analyze them (growth model or autoregressive model) as in figure 3 or 4 in the paper “10 LATENT VARIABLE MODELING OF LONGITUDINAL AND MULTILEVEL DATA by Bengt Muthén.” Any suggestion would be appreciated.
I am doing growth mixture modeling on pain measurements taken on hospital patients. Measurements 1 and 2 were taken at fixed time points but the last measurement was taken at time of discharge which is different for each individual. How do I include time in my analysis of trajectories of pain?
I am fitting a growth curve model. The means of my 3 measurements of the concept are: 2.47 (time 1) 2.45 (time 2) 2.57 (time 3)
The difference between time 1 and time 2 is not significant. The differences between time 1 and time 3 and between time 2 and time 3 are significant. I want to fit a growth model that specifies that there is no growth between time 1 and time 2 and growth between time 2 and time 3. Is this possible? I was thinking about this specification: int slp | time1@0 time2@0 time3@1; But such a model is not identified.
How do you fit a model in which you hypothesize that there is no growth?
How do you fit a model without average growth (insignificant slope) but with slope variance? That is, the average growth over three time points is zero, but between individuals there is significant variation in growth that cancels out.
You would estimate both the mean and variance of the slope growth factor.
Mark Schultz posted on Wednesday, February 08, 2012 - 9:53 am
Hello: I have longitudinal data on 512 subjects, but only 2 time points. I've plotted the data and it looks like there are several homogeneous subgroups that vary in both intercept and slope. Can longitudinal mixture modeling do anything for me or maybe some other clustering technique?
Vanessa posted on Wednesday, February 08, 2012 - 4:44 pm
I have four time points.
If I specify the slope by T1@0 T2* T3* T4@1, is it correct that this allows for the possibility of either a linear slope (i.e., if T2 is estimated at .333 and T3 at .666) or a non-linear slope?
Hence, to determine the shape of the curve, instead of modelling a linear slope and comparing it with e.g. T1@0 T2* T3* T4@1, can I just model the latter to see which is the best description?
Then if your estimated values are .333 and .666, you have no need for free time scores. You should fix the time scores at 0 1 2 3 or 0 .333 .666 1.
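The equivalence of the two fixed codings mentioned in the reply can be checked by rescaling. This Python sketch uses a made-up helper name and simply maps any set of time scores onto a 0-to-1 span.

```python
def rescale_to_unit_span(time_scores):
    """Shift and scale time scores so the first is 0 and the last is 1."""
    first, last = time_scores[0], time_scores[-1]
    return [(score - first) / (last - first) for score in time_scores]


unit_span = rescale_to_unit_span([0, 1, 2, 3])
print([round(score, 3) for score in unit_span])
```

Equally spaced scores 0 1 2 3 map to thirds of the 0-to-1 span, so free time scores estimated near .333 and .666 describe the same straight line as the fixed linear coding.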
Vanessa posted on Monday, February 13, 2012 - 4:48 pm
It was more a hypothetical situation and question.
I am not sure of the shape of the slope (it is an additional slope present only in one group (intervention scenario)); so instead of trying to fit both a linear and non-linear (estimated time scores) slope and then comparing to see which is best, I thought that a simpler solution is to allow for free time scores (T1@0 T2* T3* T4@1), providing this allows for either a linear or non-linear slope to be estimated...
See the Topic 3 course handout and video on the website. This covers the topics you are interested in.
YKim posted on Sunday, February 26, 2012 - 7:52 pm
Dear Dr. Muthen,
I am trying to compare the results of LGM with time-invariant covariates between AMOS and Mplus. The parameter estimates are almost the same to the first two decimal places, but Mplus didn't provide the model fit and standard errors for parameters, giving the following message: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 8.
THE CONDITION NUMBER IS 0.564D-10.
, while AMOS did provide all related information.
Would it be possible to send the data and input file along with AMOS output to you?
Hi, I am working on a longitudinal study with three time points and have been reading about growth mixture modeling. I have recently read several papers that used three time points to estimate a growth mixture model; however, I noticed that you recommend a minimum of four time points for this analysis. I also note this is the standard for your examples throughout the Mplus manual. Is there a rule of thumb for the number of time points when using Mplus? Also, in which ways will I be restricted (if any) if I use only three time points, as has been done in the papers I have been reading? Any advice would be greatly appreciated.
The reason we advise at least four time points is because with three, there is only one degree of freedom for making model modifications unless other parameters are fixed or constrained to be equal. This is shown in the Topic 3 course handout on Slides 50 and 52.
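The parameter counts behind this advice can be checked with a short sketch. It assumes continuous outcomes and a linear growth model with two growth factors, as in the counts given at the top of this thread; the function names are illustrative and not part of Mplus.

```python
def h1_parameters(n_timepoints: int) -> int:
    """Free parameters of the unrestricted (H1) model: means, variances, covariances."""
    t = n_timepoints
    return t + t * (t + 1) // 2


def linear_growth_parameters(n_timepoints: int, n_free_time_scores: int = 0) -> int:
    """Free parameters of a linear growth (H0) model for continuous outcomes.

    2 growth factor means + 2 growth factor variances + 1 covariance
    + one residual variance per timepoint + any freed time scores.
    """
    return 5 + n_timepoints + n_free_time_scores


def degrees_of_freedom(n_timepoints: int, n_free_time_scores: int = 0) -> int:
    """Degrees of freedom left over for model modification."""
    return h1_parameters(n_timepoints) - linear_growth_parameters(
        n_timepoints, n_free_time_scores
    )


for t in (3, 4):
    print(t, h1_parameters(t), linear_growth_parameters(t), degrees_of_freedom(t))
```

With three timepoints the single degree of freedom is used up as soon as one time score is freed (`degrees_of_freedom(3, 1)` is 0), which is why further restrictions, such as equal residual variances, are needed before anything else can be modified.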
L. Siemons posted on Tuesday, July 31, 2012 - 6:33 am
I’m not very familiar with growth mixture modeling yet, but I’m going more deeply into the analysis at the moment. I’m planning to perform such an analysis on my data, but I have a question about which timepoints I am allowed to use.
I have data available from hospital patients. As you can imagine, the measurement times vary a lot across patients, because they are not measured at the same time. I read that this is not a problem for the GMM analysis, because I can enter time variables individually for each case in the dataset. However, I also have a different number of measurements per patient. Some patients have 10 measurements spread over a period of a year, while other patients have only 5 measurement points over 5 months, for example.
- Is it a problem that patients do not all cover the same time period? (Some patients are measured over a longer time period than other patients.)
- Is it a problem that some patients have for instance 5 measurements, whereas other patients have 10 or even more?
If these are problems, why are they?
I cannot find anything about this in the literature. Do you perhaps also know a reference in which this is further explained?
You can deal with individually-varying times of observations by using the AT option in conjunction with the TSCORES option. See Example 6.12. This is not a mixture model but can be extended to the mixture case. Regarding a different number of measurement occasions, for the outcome this is done by using missing data. Time scores cannot have missing data but I believe you can assign any value for time scores when the outcome is missing because these don't come into the estimation. You can try this by doing the analysis twice, once using one number and a second time using another to test that there is no difference in the results when you do this.
I am working on a higher order LGM with 5 time points and a quadratic slope. Two different codings of time are: 0 1 2 3 4 (model a) and -4 -3 -2 -1 0 (model b). Estimated means for the latent variables are: 2.999 2.690 2.601 2.567 2.643.
A substantial difference (change of sign) appears for the slope factor means. Model a: I 2.999, S -0.221, Q 0.038. Model b: I 2.643, S 0.130, Q 0.038.
since the slope mean represents the instantaneous rate of change, what would you say does the positive sign indicate in model b? something like the rate of change "looking backwards" in time?
I am working with an unusual nested dataset. The structure is multiple detention stays nested within juveniles. It is unbalanced in that not every juvenile has the same number of stays. The problem is that while each stay is unique, sometimes the outcome attached to each stay spans multiple stays. That is, my outcome could be the exact same outcome for Stay 1 and Stay 2. Does this pose a problem when analyzing the parameters of a Multilevel Growth Curve Model?
L. Siemons posted on Wednesday, October 24, 2012 - 8:36 am
after performing a GMM analysis on my data, I plotted the sample and estimated means of my 3 groups over time. However, I would like to plot 95% confidence intervals around these curves also (or 95% error bars at the different measurement points for each curve), to enable a better interpretation of the results. Is this possible? If so, how can I do that? Do I have to add something to my syntax?
If this is not possible within the plot, is it then possible to get the intervals in the output, so I can still make this plot on my own? Or do I have to compute the confidence intervals myself? In that case, can you please tell me how do I do this and what kind of output I need for this purpose?
We don't automatically give confidence intervals for this plot. You can use MODEL CONSTRAINT to define the model estimated means at each time point and obtain standard errors and confidence intervals to plot in this way.
L. Siemons posted on Wednesday, December 12, 2012 - 7:51 am
Dear Linda Muthen,
Thank you for the quick response to my question. However, I still do not fully understand how I should use the MODEL CONSTRAINT command to get confidence intervals. I searched for an example in the user's guide, but I could not find any. Can you please tell me how I should adapt my syntax to get the confidence intervals?
TITLE: GROWTH MIXTURE MODELING
DATA: file is DAS28OverTijd036912_6wks_FINAL.dat;
VARIABLE: names are pnr das0 das3 das6 das9 das12;
  usevar = das0-das12;
  missing = all (999);
  CLASSES = c(3);
  AUXILIARY = pnr;
ANALYSIS: type = MIXTURE missing;
  STARTS = 100 10;
  STITERATIONS = 10;
MODEL: %OVERALL%
  i s q | das0@0 das3@1 das6@2 das9@3 das12@4;
  %c#1%
  i s q;
  %c#2%
  i s q;
  %c#3%
  i s q;
SAVEDATA: file is 3classGMMquadratic.txt;
  save = cprob;
OUTPUT: sampstat standardized CINTERVAL TECH1 TECH8 TECH11 TECH14;
PLOT: SERIES = das0-das12 (s);
  TYPE = PLOT3;
You don't use MODEL CONSTRAINT to get a confidence interval. You use it to get a standard error to use to create a confidence interval of the parameter estimate plus and minus 1.96 times the standard error.
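The arithmetic described in this reply can be sketched as follows. This is a minimal illustration, not Mplus output: the estimate and standard error are hypothetical numbers standing in for a model-estimated mean and the standard error that MODEL CONSTRAINT reports for it.

```python
def quadratic_mean(i: float, s: float, q: float, time_score: float) -> float:
    """Model-estimated mean at one time score for a growth model with i, s and q."""
    return i + s * time_score + q * time_score ** 2


def wald_interval(estimate: float, standard_error: float, z: float = 1.96):
    """Estimate plus and minus z times its standard error (z = 1.96 for 95%)."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical values for one class at one time point.
estimated_mean = 3.2   # assumed estimate of i + s*t + q*t**2
standard_error = 0.15  # assumed standard error of that estimated mean
lower, upper = wald_interval(estimated_mean, standard_error)
print(f"{estimated_mean:.3f} [{lower:.3f}, {upper:.3f}]")
```

The standard error has to belong to the estimated mean itself. It cannot be assembled by hand from the separate standard errors of i, s and q, because the growth factor mean estimates are correlated.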
L. Siemons posted on Friday, December 14, 2012 - 2:19 am
Dear Linda Muthen,
okay, I understand that I won't get the confidence intervals immediately, but I will be able to calculate them. But where and, maybe more important, how do I include that in my syntax? How do I define this?
Hello, I am having trouble with a simple growth model with unequally spaced time points. I have data from 4 time points. There is a wide range (10 years) in the age of the participants at the start of the study. I have specified a model using TSCORES defined as the participant's age at each time point:
MODEL: I S | y1-y4 AT age1-age4; I S on x; ANALYSIS: TYPE = RANDOM;
I keep getting the error "*** ERROR in MODEL command The number of fixed time scores is not sufficient for model identification in the following growth process: I S"
I have tried reducing the number of different ages, first by rounding to integers, but got the same error, then tried rounding in 2s and then 5s, but still have the same error.
Hello, I have only three time points and the data do not fit a linear model. The time points are not equidistant. Measurements were taken when children were 6, 8 and 9 years old. So this is how I modeled it: i s | x1@0 x2@2 x3@3;
But to fit a nonlinear model, I wanted to try freeing the second time point; should I model it as i s | x1@0 x2@* x3@1; or x1@0 x2@* x3@3;
Also do you have an article/reference suggestion for nonlinear models with three time points?
This is difficult to do because a model with three time-points has only one degree of freedom unless you place other constraints on the model, for example, holding the residual variances equal over time.
The choice of which time point to free should be determined by the part of the model that is of interest. Are you more concerned with the growth from 6 to 8 or from 6 to 9? The growth factors will be describing the development you choose.
Thank you so much for the answer. We are interested in 6 to 9 years. After setting other constraints for the nonlinear model, such as holding residual variances equal, should I free time scores as 0 * 3 or 0 * 1? That part gets me confused.
Also is there a specific article you would recommend on nonlinear models?
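On the 0 * 1 versus 0 * 3 question, a small sketch may help. It assumes that the middle time score is free in both codings, so that one coding is simply a rescaling of the other; all numbers are hypothetical.

```python
import math


def implied_means(intercept_mean: float, slope_mean: float, time_scores):
    """Model-implied outcome means for a growth model with one slope factor."""
    return [intercept_mean + slope_mean * t for t in time_scores]


# Coding 0 * 1: the slope mean is the total change from age 6 to age 9.
# The freed middle time score is assumed to be estimated at 0.8.
means_0_to_1 = implied_means(10.0, 6.0, [0.0, 0.8, 1.0])

# Coding 0 * 3: every time score is multiplied by 3 and the slope mean is
# divided by 3, so the slope mean becomes the average change per year.
means_0_to_3 = implied_means(10.0, 2.0, [0.0, 2.4, 3.0])

same = all(math.isclose(a, b) for a, b in zip(means_0_to_1, means_0_to_3))
print(same)
```

Under that assumption the two codings describe the same curve, and the choice only changes the unit in which the slope factor is expressed.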
Hi, I want to look at the variability across time (trajectory) of a specific behavior measured repeatedly across time using multilevel modeling. First of all, I want to know whether the univariate approach (long format HLM) is recommended when we have a small sample (n < 100) and time-varying measurements (non-equally spaced). Furthermore, I have three different populations (or groups) in my sample and I want to look at the difference between these groups. I have the same data for each of them. Should I use the groups as a "between" predictor or should I use a multiple group function to compare the growth of each group?
Thank you for your help (and I'm sorry for my poor English, I am French).
Unless you have very many time points or individually-varying times of observations (not just non-equally spaced), I would recommend using the wide approach because it can not only do the same growth models as the two-level approach but more general growth models.
When you have a small sample, I would recommend using group as a covariate, so using 2 dummy variables for the 3 groups.
Bayes can have advantages with a small sample size, for instance when the distribution of the estimates is not close to normal due to the small sample size or when the chi-square test of model fit is not reliable due to the small sample size.
tmh2013 posted on Thursday, May 09, 2013 - 2:30 pm
I have four time points of data. I ran an unconditional LGM to see if these reports changed over time. I would now like to see if gender differences exist in these reports (perhaps the trajectory is different for boys vs girls). Is it okay to run a conditional LGM that includes gender (0=girls, 1=boys) as a time-invariant covariate? How would running a multigroup SEM be different? Which would be the better approach?
If you want to compare growth models for boys and girls, you should first do them separately for each gender to see if the same model holds for each group. If, for example, both models have linear growth, then you can do a multiple group analysis to compare growth parameters. If not, it does not make sense to compare them.
I'm a beginner using LGC modeling. I have 4-point data, collected in 1986, 1989, 1994, and 2001. So, I tried time scores of 0, 3, 8, 15 (and 0, .3, .8, and 1.5), but my model failed to converge. But, when I did a random trial with 0, 1, 2, 3, I had a converged solution. What does this mean regarding fixing time scores? Also, I read an article, which Dr. Bengt Muthen did with Curran and Harford (Curran et al. 1998. J. Stud. Alcohol 59:647-658), where it says, "The final factor loadings on the growth factor were 0, 1, 1.54, and 2.46 (where the first two values were fixed and the second two values were estimated from the data and significant at p < .01)" (p. 650). Was it programmed as "I S | VAR1@0 VAR2@1 VAR3 VAR4|"?
I performed a GMM analysis on my data and plotted the sample and estimated means of 3 groups over time. Now, I would like to plot 95% confidence intervals around these curves, to enable a better interpretation of the results.
I read from the forum that you cannot automatically get confidence intervals for this plot and that I should use MODEL CONSTRAINT to define the model estimated means at each time point and obtain standard errors and confidence intervals to plot in this way. However, I already tried many things but I wasn’t successful.
Can you help me out? I have to submit a revision soon and the reviewer wants to see confidence intervals around the trajectories. I can send you my syntax if you want to, so maybe you can modify it the way it should be? I’m really stuck with this, I hope you can help me solve this.
Hello! I have three questions: (1) When I test a nonlinear growth curve model, do I always need the linear slope variable (e.g., for exponential or logarithmic curves)? Or do I only need the linear slope variable when I analyse a quadratic curve? (2) I have 7 time points with the following means: T1 = 8.33, T2 = 15.61, T3 = 20.22, T4 = 16.23, T5 = 23.75, T6 = 20.02, T7 = 21.02. SPSS tells me that an s-curve would fit these data best. Now I don't know (i) which parameter fixations I should set and (ii) whether I need a linear slope variable as well. I used some freely chosen fixations and the model fits well. However, the fixations are not based on any calculations. Here is my model: i by b2-b8@1; scurve by b2@0 b3@2 b4@[...]@5.8; [b2-b8@0]; [i scurve]; (3) For the model in (2) I get a good model fit; however, I get the following warning: WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. THIS COULD ... CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE I. Furthermore, I get no standardized values. How can I handle this problem?
That might be. Although if the fitting of the means gives good model fit and the shape is substantively motivated, it may be accepted. If you want to do it right, you may want to use a non-linear growth model. See 2 articles by Grimm on our website under Papers, Growth Modeling, showing how to do non-linear growth modeling in Mplus.
I am fitting a latent change model with two measurement occasions (Duncan, Duncan, & Strycker, 2010). My goal is to test whether the change factors, intercept and slope, predict a dichotomous outcome. I estimate the model without any problem. However, when I compare the regression estimates from intercept to the DV, they don't match the results that I obtained from a binary logistic regression where I use the same DV and the T1 measurement of the growth indicator, either in Mplus or SPSS. In the latent change models, the intercept is a significant predictor of my DV (p<.001). In the binary logistic models, the indicator measured at T1 is not a significant predictor of the DV. I got the same pattern for 8 different models, even when I varied the error variances of the measures over a range of 0.00 to 0.90.
Could this inconsistency be a result of the MPlus's default model specifications?
This makes sense if your intercept growth factor is defined by centering time at T1. In that case, the outcome at T1 is the intercept plus a residual. This means that regression on the T1 outcome has an error in the predictor and therefore the regression slope is attenuated relative to regressing on the intercept itself.
The lower the R-square for the T1 outcome as explained by the intercept, the larger the attenuation.
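The attenuation argument can be illustrated with the classical errors-in-predictor result for linear regression. This is only an analogue: the question concerns a logistic regression, where the attenuation is not given exactly by this formula, and the variances below are hypothetical.

```python
def attenuated_slope(true_slope: float,
                     intercept_variance: float,
                     residual_variance: float) -> float:
    """Expected linear regression slope when the T1 outcome replaces the intercept factor.

    The T1 outcome equals the intercept factor plus a residual, so its
    R-square (reliability) is var(I) / (var(I) + var(residual)), and the
    slope shrinks by that factor.
    """
    r_square = intercept_variance / (intercept_variance + residual_variance)
    return true_slope * r_square


# Hypothetical variances: the lower the R-square, the stronger the attenuation.
for residual_variance in (0.25, 1.0, 4.0):
    print(residual_variance, attenuated_slope(0.5, 1.0, residual_variance))
```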
Three time points is not sufficient to understand a non-linear trend.
Yan Quan posted on Wednesday, September 11, 2013 - 9:11 am
Hi Linda, I have binary longitudinal data measured on a daily basis (270 days and 300 subjects). I would like to group the whole population based on their trajectory pattern. I am wondering whether to use GMM with a logistic model, but I found that most of the examples have fewer than 10 time points. Do you have any experience or suggestions on grouping daily longitudinal data?
I have not done this, but one could make it into a 2-level data set and do growth modeling the long way (see UG ex 9.16) and then add trajectory mixtures by having a level-2 latent class variable (the UG has several examples of this). Scale the time scores to lie between 0 and 1.
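Rescaling the time scores to the 0 to 1 range, as suggested at the end of this reply, is a simple linear transformation. A minimal sketch, assuming the time variable is the day number 1 to 270 from the question:

```python
def scale_time_scores(times):
    """Linearly rescale time values so the first is 0 and the last is 1."""
    first, last = min(times), max(times)
    if last == first:
        raise ValueError("at least two distinct time values are needed")
    return [(t - first) / (last - first) for t in times]


days = list(range(1, 271))          # day 1 ... day 270
scaled = scale_time_scores(days)
print(scaled[0], scaled[1], scaled[-1])
```

Keeping the time variable in a small range keeps the slope variance on a scale comparable to the intercept variance, which tends to help estimation.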
I am running a growth model with several time points (9) after the baseline: 1 week, 2 weeks, 3 weeks and 4 weeks, then 6 weeks, 8 weeks, 10 and 12 weeks.
I was wondering if I have to adjust the model specification for the difference in spacing between the earlier time points and the later ones, namely a 1-week difference between the first 5 and a 2-week difference between the other 4. If yes, could you suggest how?
You would need to specify the model using the BY option. See pages 677-682 of the user's guide.
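Whichever specification is used, the unequal spacing is carried by the fixed time scores themselves. The sketch below builds one set of scores from the measurement weeks in the question; the variable names y1 to y9 are hypothetical, and the choice of weeks as the time unit is an assumption.

```python
# Weeks since baseline for each measurement occasion (baseline = 0).
weeks = [0, 1, 2, 3, 4, 6, 8, 10, 12]

# Hypothetical outcome variable names, one per occasion.
names = [f"y{k}" for k in range(1, len(weeks) + 1)]

# Slope loadings fixed at elapsed time, so a one-unit change in the
# slope factor corresponds to one week regardless of the gap between waves.
loadings = " ".join(f"{name}@{week}" for name, week in zip(names, weeks))
print(loadings)
```

Dividing every score by the same constant (for example 12) changes only the unit of the slope, not the shape of the implied trajectory.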
Matt K posted on Saturday, July 05, 2014 - 5:31 pm
Suppose there is a quadratic conditional model with an outcome measured at 6 non-equidistant time points. Is it a requirement that each case have outcome data for at least 4 time points for the model to be identified?
I am currently wanting to estimate 3 models using different outcomes for change across time using seven predictors taken prior to the first data point.
The first model has 3 time points and I have undertaken analyses on Change difference scores. However the other 2 measures only have 2 time points, may I ask what type of analysis I could use in MPlus?
With two time points, an intercept only model can be identified.
Anne Chan posted on Monday, June 08, 2015 - 2:40 am
Hello, I specified the time loadings of a linear latent growth curve with 4 measurement points as 0, 1, 2, 3. Then I recentered the time loadings as 3, 2, 1, 0. The slopes of the two models are the same, which makes sense.
However, if I do the same for curvilinear growth curve (growth factors: I S and Q), the quadratic term of the two models (one with time loading as 0,1,2,3 and another as 3,2,1,0) are the same but the slopes are different. May I ask why the slopes are different?
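The algebra behind this question can be sketched directly. Substituting t = 3 - u into i + s*t + q*t**2 shows that the quadratic coefficient is unchanged while the linear coefficient becomes -(s + 2*q*3), the negative of the rate of change at the occasion now coded 0. The growth factor means below are hypothetical.

```python
import math


def reverse_time_coding(i: float, s: float, q: float, last_score: float):
    """Re-express mean(t) = i + s*t + q*t**2 in terms of u = last_score - t."""
    new_i = i + s * last_score + q * last_score ** 2
    new_s = -(s + 2 * q * last_score)
    new_q = q
    return new_i, new_s, new_q


i, s, q = 2.0, 0.5, -0.1                      # hypothetical means, coding 0 1 2 3
new_i, new_s, new_q = reverse_time_coding(i, s, q, last_score=3)

# Both codings imply the same mean at every occasion.
for t in range(4):
    u = 3 - t
    original = i + s * t + q * t ** 2
    reversed_coding = new_i + new_s * u + new_q * u ** 2
    assert math.isclose(original, reversed_coding)

print(new_i, new_s, new_q)
```

With q = 0 the expression reduces to new_s = -s, so in the linear model only the sign of the slope changes, whereas in the quadratic model the slope refers to a different point on the curve.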
I have 7 time points. Does each person have to participate in at least 3 or 4 timepoints in a growth curve model? In one stat class in the past I was told that this is the requirement, but I was not sure.
No, not as long as a good portion of your sample is observed at several time points.
benedetta posted on Wednesday, January 20, 2016 - 2:34 am
Dear professors, I am running LGM for a continuous outcome observed at 6 equally spaced points in time, and I want to study the effect of an exposure only measured at baseline. Outcome is first measured at baseline (same time of observation of exposure). I firstly specified the growth model as follows: i s | mh1@1 mh2@2 mh3@3 mh4@4 mh5@5 mh6@6; then I changed it into i s | mh1@0 mh2@1 mh3@2 mh4@3 mh5@4 mh6@5;
The linearity of growth I am assuming is consistent in the two models and results only differ for intercept coefficients, although the difference is very small. I just wanted to ask for a theoretical explanation for that. Thank you very much!
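A short sketch of why only the intercept results move. With time scores 1 to 6 the intercept refers to a point one time unit before the first measurement. With 0 to 5 it refers to the first measurement itself, so the two intercepts differ by exactly one slope unit. The numbers are hypothetical.

```python
def shift_origin(intercept: float, slope: float, shift: float) -> float:
    """Intercept after subtracting `shift` from every time score.

    mean(t) = intercept + slope * t, and with u = t - shift this becomes
    (intercept + slope * shift) + slope * u.
    """
    return intercept + slope * shift


# Hypothetical growth factor means under the 1..6 coding.
intercept_1_to_6, slope = 20.0, -0.5
intercept_0_to_5 = shift_origin(intercept_1_to_6, slope, shift=1)
print(intercept_0_to_5)
```

The same relation applies to the regression of the intercept on the exposure: its coefficient changes by the coefficient of the slope on the exposure, which would be small when the exposure has little effect on the slope.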
Dear Linda: I am trying to estimate a growth curve model and ran into some difficulties. I first estimated an unconditional model (model without covariates) and encountered no problems; the estimates looked plausible. I next estimated the model with time-invariant covariates and it converged without any problem; the estimates also look plausible. However, once I include time-varying covariates in the model, it does not converge. I have tried everything that I know but the model will not converge. I even reduced the time points for the time-varying covariates from 10 to 5, but the model still failed to converge. I will appreciate any suggestions on how to proceed. I'd planned to include my model/syntax here, but that will cause my post to exceed the size limit for messages. I can submit it in a subsequent post if that is necessary.
Hello, I am currently doing multivariate latent growth curve analysis examining relations between technology use (IV), sleep (mediator) and life satisfaction (DV) as per MacKinnon model (page 212 in Introduction to Statistical Mediation Analysis). Measures for technology use and sleep have been taken in year 8, year 9 and year 10 and life satisfaction in year 9, year 10 and year 11 so the DV is lagged by one year. Slopes controlled for their intercept. I have bootstrapped the indirect effects and am mainly examining the intercept to intercept to intercept paths (similar to cross sectional mediation analysis with intercept for DV one year later) and the slope to slope to slope indirect paths (with trajectory one year later than IV and mediator)
Do you see any problems with coding the time @0, @1 and @2 for the IV and Mediator LGC and then @1, @2 and @3 for DV.
Hello, For my multivariate LGC mediation model I was asked to determine the size of the indirect effect so I calculated the indirect effect as a proportion of the total effect (if the direct effect was negative and indirect is positive the absolute values were used, page 82 MacKinnon). When I do this I still have the cross paths in the model e.g. from intercept (IV) to intercept (mediator) to Slope (DV) and other combinations of cross paths (intercept to slope to slope). Some of these paths are not significant. I am getting large proportions, for example the indirect effect for Intercepts is 0.206, direct effect is 0.009 and therefore the ratio is .206/.215 = .96 so the mediated effect of sleep explains about 96% of the total effect of levels of technology use on levels of life satisfaction. 2. Should I have taken out the cross paths so I am looking at the total effects just for intercepts to intercept to intercept? When I do this the total effects increases and therefore the proportion decreases as per previous example the indirect effect for Intercepts is now 0.221, direct effect is 0.080 and therefore the ratio is .221/.301 = .73 so the mediated effect of sleep explains about 73% of the total effect of levels of technology use on levels of life satisfaction (without cross paths say from intercept to intercept to slope). Many thanks
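The two ratios quoted in this post can be reproduced with a few lines. The sketch follows the convention described in the post (absolute values when the signs differ); it only checks the arithmetic and does not address which model is preferable.

```python
def proportion_mediated(indirect: float, direct: float) -> float:
    """Indirect effect as a share of the total effect, using absolute values."""
    total = abs(indirect) + abs(direct)
    return abs(indirect) / total


with_cross_paths = proportion_mediated(0.206, 0.009)
without_cross_paths = proportion_mediated(0.221, 0.080)
print(round(with_cross_paths, 2), round(without_cross_paths, 2))
```

The first ratio is large mainly because the direct effect in the denominator is close to zero, so small changes in the direct effect move the proportion considerably.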
Because the 3 variables (IV, mediator, and DV) have different growth models, I would choose the time scores to give easy interpretation. So for the DV I would use 0, 1, 2. Having 0 for its first time point makes the intercept interpretation clear.
Dear Bengt, Many thanks for your support. I have gone back to my models and redone them, thanks. When I interpret the intercepts, I just note that IV and Med levels are associated with levels of the DV one year later. For slopes, changes in IV and Med are associated with subsequent changes in the DV. Is this correct?
Thanks again for taking the time and posted on SEMNET.... haven't heard anything yet.
This may seem very basic, but I'll ask anyways, as I'm a bit of a LGM neophyte. I'm putting together various PPLGM's. While fitting the unconditional models, I've set the slope growth factor for the mediators at [0,1,2,3,4]. This is because there are 5 evenly spaced timepoints and I expect early growth that continues linearly. For the outcomes, I've set the slopes at [0,0,1,2,3], because I expect later change that grows linearly. Am I doing this correctly, or should they both be [0,1,2,3,4]?
Yes, thank you. I expect that an improvement in the mediator will precede an improvement in the outcome. It's a parallel process model. So, would I be correct in understanding that a [0,0,1,2,3] timescore would reflect this prediction? So that I expect no significant change from baseline to time 2, but linear change after that?
Djangou C posted on Sunday, October 30, 2016 - 5:38 pm
Dear Dr Muthen, I would like to cite: Videos and Handouts for Mplus Short Courses (topic 3). I was wondering how to do this? I was also thinking that it would be easier to include this information somewhere within Videos and Handouts for Mplus Short Courses. Thank you.
Jon Heron posted on Monday, October 31, 2016 - 8:33 am
I've referenced Topic 3 before in a publication. And am pretty sure I asked the same question on here before I did:
Can variables in the model be used to replace the values specifying time points, that is, X1@y1 instead of X1@0 or X1@*, as a way of calculating intrasubject correlations between two lists of variables?
I am looking for an easy way to compute within-subject correlations for data that is structured on a single record. I can find no macros for this in SPSS and hand coding the computations is daunting.
The context is this. A single subject evaluates 10 physical symptoms on the extent to which they are bothered by it, and the extent to which they believe others are bothered by it. It is theoretically interesting to know if people believe the things they had more of are more common than the things they report having less of.
I'm wondering if you have any recommendations for carrying out a GMM with continuous time. I have repeated measures at the day-level over a long period of time-- resulting in 1,705 repeated measures with lots of missing data. Can you provide any recommendations for the best way to approach this many repeated measures?
Thank you very much for any advice you can provide, Becky
Thanks for your help and can't wait to check out Version 8!! Can you by any chance recommend a good reference for the 2-level model with the latent variable declared on the between line, just so I can read about it/follow along a little more?
Joey Fung posted on Wednesday, February 08, 2017 - 11:56 am
I am wondering if there is a maximum number of timepoints for running growth mixture modeling. We have approximately 80 participants who completed daily diary for 14 days. Is it feasible to run a GMM with 14 timepoints? If not, what is your recommendations for alternative ways to handle the data? Thank you very much.
chioma nwaru posted on Thursday, February 09, 2017 - 12:34 am
Dear Bengt & Linda,
I am using a longitudinal study with 4 wave response of a same cohort on disability which is on the scale of 0-10 on each wave, I am trying to find the latent classes for disability. I just wanted to know if I am using the right model, i.e. LCGM. Disability variable of different waves are coded as: dis1 (1988) n=5500, dis2 (1995) n=4500, dis3 (2002) n=3500 and dis4 (2014) n=2900 and should I use any other variable here? If I am using a wrong model or command, could you suggest a right one (or from Mplus guide). t1 t2 t3 t4 are time in years for each waves respectively; female age occup are baseline variables (from 1983). (Every time I run the below command it gives the results, but it also says input reading terminated normally)
This general analysis question is better suited for SEMNET.
M. Howland posted on Sunday, March 05, 2017 - 1:12 pm
I am modeling scores from pre to post experiment (so only two time points), with the primary hypothesis being that scores will change from pre to post based on several time-invariant predictors. Unfortunately there are pretty substantial ceiling effects in the pre/post test scores, so I was hoping to account for this. I've been looking into fitting a linear growth model for a censored outcome, and predicting intercept and slope from several time-invariant predictor variables. I understand from this thread that a minimum of 4 timepoints is recommended to estimate a growth model, so is it impossible to run the model I am suggesting? Do I have any other options to look at change over time with censored data?
2 time points doesn't really afford a growth model. You can do a very limited version of a growth model where the linear slope has zero variance (and therefore zero covariance with the intercept), that is, everybody changes the same amount.
M. Howland posted on Sunday, March 05, 2017 - 3:38 pm
The problem of course is that is our main question of interest, e.g., what variables explain variance in change from t1 to t2. Are there any options which would allow me to model change between two time points while accounting for ceiling effects?
I have a question regarding the necessary number of time points to estimate a growth model. More specifically, I want to estimate nonlinear growth models with free time scores for four different variables (separately). My models are, for example, specified as follows:
When running the models I encounter different problems. First of all, my models are saturated. Secondly, some models give errors. For example:
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 9, S
Because I encounter all these problems while estimating the non linear growth curves with free time scores, I wonder whether it is possible/justified to run such models with only three time scores? I read that no model modification would be allowed with three timepoints unless certain other restrictions were placed on the growth model. Which restrictions could be justified? Can you give some suggestions?
It seems that your H0 model would have 9 parameters and your H1 model also 9 parameters (check that this is the case). So I don't see off-hand what the problem is. If you like, you can send the output to Support along with your license number.
Hello. I am trying to follow Ex. 6.12 in the User's Guide to estimate a model with individually-varying times of observation.
This is a very basic question, but what exactly would the TSCORES (a11-a14) be? I have participants who varied in age at baseline, and were followed approximately every 6 months, but there was range in this and I'm trying to account for the differences in the follow-up length. If I centered on average age at baseline, then a11 would have non-zero values.
I have a dataset of 230 people with a baseline survey and 24 monthly follow-ups, so a possible total of 25 time points. The DV has a range from 0-42. The mean is My ultimate goal is to model change in the DV over time, and eventually to use baseline info to predict that change. As such, I am using a generalized linear mixed effects model. When I make a spaghetti plot of the individual growth curves, I see lots of heterogeneity. I know that this is a problem, because it is difficult to combine so many different lines in a single estimate (or a few, in the case of fixed slope, random intercept, random slope, quadratic term, cubic term, etc).
How should one approach a mixed effects model with so much heterogeneity and so many time points? Is it feasible to do this? Should I pick specific time points (0,4,8,12,16, for example), and model the change between those time points? I would prefer not to, as I have significant amounts of missing data and want to use all cases with at least 3 time points. Should I be trying a different approach?
I have 200 people with 17,000 observations total with a maximum of 240 observations nested within a person. I want to see if the slope decreases significantly over time. Should I use Ex. 9-30 in the User's guide and read the "Intercepts" part of the output for S in order to get the mean of S?
Note that S in ex9.30 is the random auto-regressive slope. It is not the trend over time that it sounds like you are after. For trends, ex 9.39 is more to the point, but before delving into that I would recommend that you watch the "Dynamic Structural Equation Modeling (DSEM)" videos that we recently posted from our August workshops. Parts 3 and 4 deal with trends. See Topic 12 of our Short Course web page at
Jan posted on Wednesday, October 25, 2017 - 4:14 pm
I have a general question on historical data. E.g. hospitalizations or services people received in hospitals. Such data have long format, nested within individuals, and all time-points are irregular. People have different dates of different services or procedures that they underwent, and dates are unique to individuals. How can this type of data format be currently analyzed with MPlus?
Also, when one outcome is longitudinal e.g in this historical dataset people had the same blood test, and there were lots of different, irregular occasions, how to model a continuous result of the test simultaneously with a censored outcome and testing for covariance between the continuous and the censored time to event variable? all dates are nested within people and unique for each individual.
Finally, will there be Bayes estimation of time-to-event outcomes?
Long-format, two-level modeling as a function of time is available in Mplus. Time is represented as a variable and the outcome regressed on time can have a random intercept and random slope which vary over subjects on level 2.
Two-level continuous-time survival analysis is also available and can be combined with another outcome with a residual correlation between the two processes accommodated by a factor influencing both.
Bayes for survival analysis is on our to-do list.
Jan posted on Wednesday, October 25, 2017 - 5:14 pm
This is absolutely fantastic, especially the last one. Thank you very much.
"Bayes for survival analysis is on our to-do list."
I am curious: a joint growth + discrete-time survival model using Bayes estimation converged without issues. Do you suggest treating the model results with caution until Bayes for survival analysis is officially supported?
I recently collected data on white blood cell release of 3 separate proteins (that are highly related) at 4 time points:
Protein 1: 2, 24, 48, 72 hrs. Protein 2: 2, 24, 48, 72 hrs. Protein 3: 2, 24, 48, 72 hrs.
As I am not concerned about change over time for this analysis, I was wondering if it would be appropriate to use TYPE = COMPLEX and treat each participant as a cluster, such that there would be 4 measurements of each protein for each individual. Next, I would then load each of the 3 proteins onto a latent construct of overall protein release.
Model fit statistics suggest adequate fit when I do this, but I want to make sure my results are valid.
Greetings, I am running a two-level growth curve model. In my first level of analysis I included no covariates. I am interested in getting estimated means of the intercept and slope for both the within and between cluster components of the analysis. However, the output only prints means of the intercept (ib) and slope (sb) for the between cluster component. Is there anything I can do to obtain means for the intercept and slope for the within cluster component of the analysis as well?
Use SVALUES in your first run to save the estimates. Then switch class-specific estimates around for a second run. For instance, if your latent class indicators are binary/continuous and you have 2 classes where the first class has indicators with high threshold/high mean estimates and the second class low estimates, you switch by letting the second class have starting values from the SVALUES section corresponding to the high values and the first class the low values. In this second run you use Starts = 0 and you make sure that the loglikelihood value is the same as in the first run.
There are UG examples for LCA that show you how to give starting values.
!variance and covariance
i* s* sq@0;
i WITH s;
i WITH sq@0;
s WITH sq@0;
!specific variance
ls_08-ls_17*;
The TSCORES variables have been computed to reflect the individually varying time scores. TSneg4sq etc. reflect the squared time scores. However, the model is not running. I keep getting warning messages informing me that the model estimation did not terminate normally.
These fit indices are not available for models such as this because the covariance matrix and mean vector are not sufficient statistics. Instead, you can work with neighboring models - models that are slightly less restricted - and see if extra parameters are significant.
Hi - I understand the basis for not being able to identify a latent growth model with a quadratic function when there are only three discrete time points (unless some restrictions are imposed). However, out of curiosity I ran a model with a quadratic using TSCORES and the model was identified (i.e. i s q | y1 y2 y3 AT t1 t2 t3). Is the model spurious? Or am I missing something obvious about why it was identified with TSCORES even though there are only three repeated measures of the dependent variable? Also, in a growth model of this nature, is any attention given to the condition number when considering competing non-nested models?
Thanks Bengt - to follow up - each person has three repeated measures and has been assigned a time score in years based on the time elapsed from the first subject's assessment, e.g. subject 1 has time scores of 0.0, 4.1, 8.5 years while another subject has time scores of 1.2, 4.4, 9.7 years ... does this seem intuitive, and is it okay to code the time scores like this?
I had wondered about the condition number as the "I and S" model using TScores identified with a condition number of 0.246E-03, while adding a "Q" to this model gave this condition number: 0.118E-06 ... however the model with the addition of the "Q" gave a lower AIC, BIC, and adjusted BIC than the "I" and "S" only model
I am running a 2 class longitudinal model (4 timepoints) with covariates (I posted earlier about this analysis). I wanted to check some of my findings. I was originally doing this analysis using Version 8. I tried to rerun my syntax using Version 8.1. For some reason, Mplus is now not including the missing data on my dependent variables (FIML was used to handle missing data). I looked at the Addendum for Version 8.1 and tried to follow the instructions to correct this issue by including the statement below, but it did not work. y2sasall y3sasall y4sasall ON y1sasall; What modifications do I need to make to my syntax to bring in the missing data?
I am running a GMM with free variances and using a variable which we measured on 12 timepoints (every 3 months), starting from quarter 7 to quarter 18. As the first time point is quarter 7, I included: quarter7@7, quarter8@8 and so on. If I change the input to quarter7@0, quarter8@1 and so on, I notice that the model changes.
Could you explain to me how this happens and what I should put in my input? (start from quarter7@0 or @7)?
I will check this video and handout on your website thanks! If I use timepoints 7,8,9,10 until 18, (instead of starting from 0), my model (GMM free variances) seems stronger and more plausible than starting from 0. Is it justified to start from 7 instead of 0?
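One way to see what the choice of zero point does is to write out the algebra for a single-class linear growth model (an assumption here; the posted GMM may differ). If the time origin moves forward by c units, the intercept factor becomes i' = i + c*s, so its mean, its variance, and its covariance with the slope are re-expressed while the slope factor itself is unchanged. The sketch below uses made-up parameter values, and `recenter` is a hypothetical helper, not an Mplus feature.

```python
def recenter(mean_i, mean_s, var_i, var_s, cov_is, c):
    """Re-express linear growth parameters when the time origin moves forward by c units.

    The new intercept factor is i' = i + c * s; the slope factor is unchanged.
    """
    new_mean_i = mean_i + c * mean_s
    new_var_i = var_i + 2 * c * cov_is + c ** 2 * var_s
    new_cov_is = cov_is + c * var_s
    return new_mean_i, mean_s, new_var_i, var_s, new_cov_is

# Made-up values for a model whose time scores are 7, 8, ..., 18,
# so the intercept refers to an extrapolated "quarter 0".
at_zero = dict(mean_i=10.0, mean_s=0.5, var_i=4.0, var_s=0.04, cov_is=-0.2)

# The same trajectories expressed with scores 0, 1, ..., 11 (intercept at quarter 7).
at_seven = recenter(**at_zero, c=7)
print(at_seven)
```

When all of these parameters are free, the two codings describe the same trajectories. Restrictions such as a fixed intercept-slope covariance or class-invariant intercept variances do not carry over from one coding to the other, which is one possible reason for results to differ between @7 and @0.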
A quick question: we are using time-varying and time-invariant predictors of change on a latent variable growth model. Additionally, we are using TSCORES to account for the individually varying time-points.
Earlier in this feed you state "When you use the TSCORES option you are using a covariate, namely time."
Can you explain how this model takes the TSCORES variable and uses it as a covariate? I cannot find an explanation of this.
I am running a 3 class model with covariates. The output provides odds ratios for class 1 and class 2 in relation to class 3. I would like to obtain odds ratios for class 1 and 3 in relation to class 2. How can I obtain these alternative odds ratios?
As stated above, I am running a 3 class model with covariates. The output provides odds ratios for class 1 and class 2 in relation to class 3. I would like to obtain odds ratios for class 1 and 3 in relation to class 2. How do I manually change the reference class in my syntax to obtain these?
This will be available in version 8.2. Until then, you can either use SVALUES to give starting values that reflect the class order you are interested in (using STARTS=0 and making sure to get the same best logL), or use the printed logit estimates to compute OR CIs using the FAQ on our website:
Odds ratio confidence interval from logOR estimate and SE
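The usual calculation behind an interval of this kind forms the confidence limits on the log-odds scale and then exponentiates them. The Python sketch below shows that calculation. It is assumed, not confirmed, to match what the FAQ describes; the function name `odds_ratio_ci` and the numbers are made up for illustration.

```python
import math

def odds_ratio_ci(log_or, se, z=1.96):
    """Return (OR, lower, upper) from a logit estimate and its standard error.

    The interval is formed on the log-odds scale and then exponentiated.
    """
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper

# Made-up logit estimate and SE for a covariate effect on class membership.
or_est, lo, hi = odds_ratio_ci(log_or=0.50, se=0.20)
print(round(or_est, 3), round(lo, 3), round(hi, 3))
```

Note that this does not by itself change the reference class. The logit for class 1 versus class 2 is the difference between the two printed logits (each relative to class 3), but the SE of that difference also depends on their covariance, so it cannot be recovered from the two printed SEs alone.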
As instructed by your Support line, I used the SVALUES option of the OUTPUT command to get input with ending values as starting values. I modified this input so that class 3 has a class 1 label and class 1 has a class 3 label. I used this input with STARTS = 0; I didn't use the part of the input for the means/intercepts of the categorical latent variables in the model. This worked. Thank you.
We are fitting a latent growth model to a longitudinal data set with 14 time points in which the first 9 points are each separated by approximately 6 months from the preceding time point and the remaining 5 time points are each separated by 12 months from the preceding time point. When we use the number of months from the beginning of the study to code the linear slope (0, 6, 12, 18, etc.) and include a quadratic factor in the model, the model won't converge. When we use the number of years from the beginning of the study to code the linear slope (0, .5, 1, 1.5, etc.), the model converges even with a quadratic factor in the model. We are fine with the latter coding of time but are curious as to why the former coding of time doesn't work. Any insight you can share about this will be most appreciated.
High values for the time scores together with a quadratic model squaring those values can lead to numerical difficulties that you can avoid by doing what you did or by dividing the time scores by 10. Your coding is easiest to interpret.
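To make the scale difference concrete, the Python sketch below builds the time scores for the design described in the question (9 occasions 6 months apart, then 5 occasions 12 months apart) and compares the largest squared score under each coding. The variable and function names are hypothetical, and the sketch only illustrates the size of the quadratic loadings; it does not reproduce the estimation itself.

```python
# Time scores for 14 occasions: the first 9 are 6 months apart, the last 5 are 12 months apart.
months = [6 * k for k in range(9)]            # 0, 6, ..., 48
months += [48 + 12 * k for k in range(1, 6)]  # 60, 72, ..., 108

years = [m / 12 for m in months]              # 0, 0.5, ..., 9
tenths = [m / 10 for m in months]             # months divided by 10

def quadratic_scores(scores):
    """Squared time scores, i.e. the loadings of the quadratic growth factor."""
    return [t ** 2 for t in scores]

for label, scores in [("months", months), ("years", years), ("months/10", tenths)]:
    print(label, max(scores), max(quadratic_scores(scores)))
```

With months, the largest quadratic loading is in the thousands, while coding in years or dividing by 10 keeps it around 100 or below, which is the kind of range the reply recommends.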
Kiki posted on Tuesday, January 08, 2019 - 6:01 am
I read that a minimum of 4 measurement points in longitudinal growth modelling is recommended because (1) with less than four timepoints it is not possible to identify enough parameters in the growth model to make the model flexible and (2) four timepoints give more power.
Question 1. Are 4 measurement points a 'rigorous' minimum, or can longitudinal mixture modelling also be employed with 3 measurement points when just those 3 are available?
I understand that another advantage of having at least 4 measurement points is that it enables one to take quadratic growth into account.
Question 2. What are the advantages of taking into account quadratic growth, beyond linear growth?
Q1: At least 4 time points is desirable, but 3 works if one can assume that the model doesn't need many additional parameters.
Q2: The substantive phenomenon that you are studying may require a quadratic growth model.
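The parameter counting behind the four-timepoint recommendation can be checked with a few lines of Python. This is a sketch for a single-class model with continuous outcomes, assuming all growth factor means, variances, and covariances are free, residual variances differ across time, and there are no residual covariances. The function names are hypothetical.

```python
def h1_parameters(t):
    """Free parameters of the unrestricted (H1) model: t means, t variances, t(t-1)/2 covariances."""
    return t + t + t * (t - 1) // 2

def growth_parameters(t, n_factors):
    """Free parameters of a growth model with n_factors growth factors and t residual variances.

    Assumes all factor means, variances, and covariances are free and that
    residual variances differ across time, with no residual covariances.
    """
    factor_cov = n_factors * (n_factors + 1) // 2
    return n_factors + factor_cov + t

for t in (3, 4):
    for name, k in (("linear", 2), ("quadratic", 3)):
        df = h1_parameters(t) - growth_parameters(t, k)
        print(t, name, h1_parameters(t), growth_parameters(t, k), df)
```

With 3 time points the linear model leaves a single degree of freedom and the quadratic model asks for more parameters than H1 provides, so it needs restrictions. With 4 time points the linear model has 9 parameters against 14 in H1, and the quadratic model still fits within that budget. A non-negative count is a necessary condition for identification, not a guarantee.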
Lucien Xu posted on Tuesday, May 07, 2019 - 12:55 pm
I am running a latent growth model with a distal categorical outcome. I ran two models with the latent intercept centered at different timepoints and used the latent intercept and slope to predict my outcome. I am wondering why the odds ratios for the latent intercept are the same in the two models while the odds ratios for the slope are different. Shouldn't the odds ratio for the slope be the same in both models?
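A possible explanation for this pattern, assuming a linear growth model and a logistic regression of the outcome on both growth factors: moving the intercept time point forward by c units gives a new intercept i' = i + c*s, so b_i*i + b_s*s = b_i*i' + (b_s - c*b_i)*s. The coefficient on the intercept is unchanged, while the slope coefficient absorbs the shift. The Python sketch below checks this with made-up coefficients; `recentered_coefficients` is a hypothetical helper.

```python
import math

def recentered_coefficients(b_i, b_s, c):
    """Logit coefficients after moving the intercept time point forward by c units.

    With i' = i + c * s, the predictor b_i * i + b_s * s equals
    b_i * i' + (b_s - c * b_i) * s.
    """
    return b_i, b_s - c * b_i

# Made-up logit coefficients with the intercept defined at the first time point.
b_i, b_s = 0.40, 1.50
new_b_i, new_b_s = recentered_coefficients(b_i, b_s, c=2)

# The linear predictor is identical for any individual's (i, s) values.
i, s = 2.0, -0.5
assert math.isclose(b_i * i + b_s * s, new_b_i * (i + 2 * s) + new_b_s * s)

print(math.exp(b_i), math.exp(new_b_i))  # odds ratios for the intercept
print(math.exp(b_s), math.exp(new_b_s))  # odds ratios for the slope
```

Under these assumptions the two models are the same model in different coordinates: the slope coefficient is the effect of the slope holding the intercept at the chosen time point constant, and that conditional effect changes when the time point changes.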
Dear all, I would like to perform GMM on longitudinal data about persons undergoing clinical rehabilitation. The study design includes 4 repeated measurement time points: at 1 month (tp1), 3 months (tp2), and 6 months (tp3) post diagnosis, and at discharge (tp4). Two issues currently arise from this design:
1) The measurement time points are “windows” of several days and they are unequally spaced. Fortunately, I have also the exact dates of measurements. Therefore, I would like to perform GMM with individually varying time scores. Furthermore, I plan to use the 3 step approach to identify predictors of trajectories. Are these two approaches (individually varying time scores and 3 step approach) compatible?
2) A lot of persons are already discharged within the measurement windows tp1, tp2, or tp3. Therefore, besides the trajectories with four measurement time points, I also have trajectories that include only one, two or three time points (not due to missingness, but due to the study design). Is there a way to handle this issue in GMM in Mplus without deleting patients with fewer than four measurement time points? In other words: is it possible to analyse trajectories with 4, 3, and even 2 (and 1) time points simultaneously and, if yes, how?
1) Yes, you can either use single-level wide format GMM using the TSCORES option or two-level, long format GMM with mixtures (latent class variable) declared as Between and using time as a covariate.
2) See the paper on our web site:
Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.
Dear Prof. Muthén, thank you very much for the prompt response. I would like to ask some follow-up questions on the two issues and your corresponding answers:
1) What is the difference between the two options you recommended (single-level wide format vs. two-level long format)? Is there a specific setting in which one is recommended over the other?
2) I tried to follow the description in your recommended paper and started to fit GMM under MAR for single- and multiple-class models using the single-level wide format option. Unfortunately, I am neither able to plot the observed, nor the estimated trajectories. At the moment, my code is the following:
VARIABLE:
  NAMES ARE ID age days_t1 days_t2 days_t3 days_t4 y_t1 y_t2 y_t3 y_t4;
  USEVARIABLES ARE ID days_t1 days_t2 days_t3 days_t4 y_t1 y_t2 y_t3 y_t4;
  MISSING = all (999);
  IDVARIABLE IS ID;
  TSCORES = days_t1 days_t2 days_t3 days_t4;
  CLASSES = c(3);
ANALYSIS:
  TYPE = MIXTURE RANDOM;
MODEL:
  %OVERALL%
  i s | y_t1 y_t2 y_t3 y_t4 AT days_t1 days_t2 days_t3 days_t4;
OUTPUT:
  SAMPSTAT STANDARDIZED TECH1 TECH8;
PLOT:
  SERIES = y_t1 y_t2 y_t3 y_t4 (s);
  TYPE = PLOT3;
Is there a way to plot observed and estimated trajectories and even include their confidence intervals?
Hello, I am attempting to run a latent growth curve model with 4 time points (the model currently contains no other variables besides the repeated measures in the curve). I am receiving the "THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE." error message, and the problematic parameter is the intercept.
I suspect that the issue is with the data from the first time point, as the model converges when this time point is not included. Do you have any suggestions for how to solve this so that I might be able to include the earlier time point?