
Anonymous posted on Monday, April 12, 2004  9:04 pm



When I use the mcohort function, how does Mplus calculate degrees of freedom?

bmuthen posted on Tuesday, April 13, 2004  12:16 pm



As in regular modeling. 

guest posted on Tuesday, October 18, 2005  12:51 am



I have longitudinal achievement data containing math and reading scores from 5th, 6th, 7th, and 9th grade. There were four data files: one each for 2002, 2003, 2004, and 2005. Each data file had 5th, 6th, 7th, (not 8th) and 9th grade math and reading scores. I merged the 4 data files into one big file (a long stacked file) and created a cohort variable.

Cohort 1: 5th in 2002, 6th in 2003, and 7th in 2004
Cohort 2: 6th in 2002, 7th in 2003, and 9th in 2005
Cohort 3: 7th in 2002, and 9th in 2004
Cohort 4: 9th in 2002
Cohort 5: 5th in 2003, 6th in 2004, and 7th in 2005
Cohort 6: 6th in 2003 and 7th in 2004
Cohort 7: 7th in 2003 and 9th in 2005
Cohort 8: 9th in 2003
Cohort 9: 5th in 2004 and 6th in 2005
Cohort 10: 6th in 2004 and 7th in 2005
Cohort 11: 7th in 2004
Cohort 12: 9th in 2004
Cohort 13: 5th in 2005
Cohort 14: 6th in 2005
Cohort 15: 7th in 2005
Cohort 16: 9th in 2005

A subset of the stacked data file looks like below.

StudentID  Cohort  year  SchoolID  grade  math  reading
13641      1       2002  12        5      648   620
13641      1       2003  12        6      698   635
13641      1       2004  28        7      719   661
14293      4       2002  21        9      709   671
11116      4       2002  28        9      794   709
11118      4       2002  32        9      689   663
11918      8       2003  33        9      758   700
13294      8       2003  33        9      716   665
15989      4       2002  28        9      705   700
17117      3       2002  27        7      746   726
17117      3       2004  36        9      739   674
19197      2       2002  19        6      654   639
19197      2       2003  23        7      652   648
19197      2       2005  35        9      705   665

After that, I looked at the Mplus manual and this website to find out how to set up an Mplus input file for modeling growth. I haven't been able to figure it out yet. Could you help me out? Thank you.


Chapter 6 of the Mplus User's Guide contains many examples of growth models for data in the wide format, where data collected for each individual at different time points are represented by columns in the data set. All of these examples and data come on the Mplus CD and are also available on our website under excerpts from the user's guide. It looks like you have set your data up in the long format, where data collected for each individual at each time point are represented by a different record. Growth models for data in the long format are estimated in Mplus using TYPE=TWOLEVEL; with CLUSTER=ID;. In Chapter 13, there is a description of how to handle multiple cohort data under the section called Missing. I'm not sure if this is what you are interested in but you can read that also.
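For readers who want to move from the long stacked file to the wide layout used in the Chapter 6 examples, here is a minimal sketch in Python rather than Mplus. It is only an illustration under assumptions: the column names (`math5`, `read9`, and so on) and the missing-data flag -999 are hypothetical choices, not anything prescribed by Mplus, and only a few of the rows posted above are used.

```python
# Rows copied from the stacked file shown above:
# (StudentID, Cohort, year, SchoolID, grade, math, reading)
long_rows = [
    (13641, 1, 2002, 12, 5, 648, 620),
    (13641, 1, 2003, 12, 6, 698, 635),
    (13641, 1, 2004, 28, 7, 719, 661),
    (17117, 3, 2002, 27, 7, 746, 726),
    (17117, 3, 2004, 36, 9, 739, 674),
]

GRADES = (5, 6, 7, 9)
MISSING = -999  # hypothetical missing-data flag, chosen for this sketch


def long_to_wide(rows):
    """Return one record per student with one math/reading column per grade."""
    wide = {}
    for sid, cohort, _year, _school, grade, math, reading in rows:
        if sid not in wide:
            # Start every grade column at the missing flag.
            record = {"cohort": cohort}
            for g in GRADES:
                record[f"math{g}"] = MISSING
                record[f"read{g}"] = MISSING
            wide[sid] = record
        wide[sid][f"math{grade}"] = math
        wide[sid][f"read{grade}"] = reading
    return wide


wide = long_to_wide(long_rows)
```

Each student then occupies a single record, and grades that a cohort never reached simply carry the missing flag.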


Does Mplus allow for Growth Mixture Modelling with longitudinal data collected from multiple cohorts? I have annual data over five years collected from 8-, 10-, 12-, 14-, and 16-year-olds at time 1. I was hoping to look at trajectories of count outcomes using the full dataset. Any suggestions would be greatly appreciated. Thank you.

bmuthen posted on Monday, January 02, 2006  2:36 pm



Yes. There are 2 ways to do this. One is to do a multiple-group analysis, which in the case of mixtures uses KNOWNCLASS, where cohort is group. The other is to string out the data across all ages represented so that data from each cohort are missing at some ages.
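The second approach, stringing the data out across ages, can be sketched as a data-preparation step. The following Python snippet is an illustration only; it assumes five annual waves starting at each cohort's time 1 age (8, 10, 12, 14, or 16), and the function name and example scores are hypothetical.

```python
def string_out_by_age(baseline_age, scores, ages, missing=None):
    """Place one person's yearly scores into columns indexed by age.

    Ages the person was never observed at keep the missing marker,
    which is the missingness by design described above.
    """
    row = {age: missing for age in ages}
    for wave, score in enumerate(scores):
        row[baseline_age + wave] = score
    return row


# Assumption: baseline ages 8..16 and five annual waves give ages 8..20.
ALL_AGES = range(8, 21)

# A hypothetical member of the 12-year-old cohort with five yearly counts.
person = string_out_by_age(12, [3, 0, 1, 4, 2], ALL_AGES)
```

Every person ends up with the same 13 age columns, but each cohort fills only its own five.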

Sarah Dauber posted on Wednesday, November 08, 2006  12:40 pm



Hello, I am interested in conducting growth curve analysis with cohort data. Each person was measured at 3 timepoints, and there are 10 different cohorts. So, altogether I have data from age 12 to age 28. I would like to use the approach of stringing out the data so all ages are represented, rather than using the multiple group method, but there is a lot of missing data this way and I am getting very low covariance coverage rates. Is there a coverage rate that is considered optimal? Also, can you point me to some readings that would provide more info on using Mplus with cohort data? Thank you, Sarah Dauber


Covariance coverage of zero when the missingness is by design is fine. Other than that, I would recommend no less than .8. I know of no references where Mplus has been used with cohort data. 
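For readers unsure what covariance coverage measures: for each pair of variables it is the proportion of cases with both variables observed. A minimal Python sketch follows, using made-up data in which cohorts overlap only at adjacent ages; the function name and the data are hypothetical.

```python
def covariance_coverage(rows, missing=None):
    """coverage[i][j] = share of cases with both variable i and j observed."""
    n = len(rows)
    p = len(rows[0])
    return [
        [
            sum(r[i] is not missing and r[j] is not missing for r in rows) / n
            for j in range(p)
        ]
        for i in range(p)
    ]


# Four hypothetical cases, columns are scores at four successive ages.
rows = [
    (1, 2, None, None),
    (3, 4, None, None),
    (None, 5, 6, None),
    (None, None, 7, 8),
]
coverage = covariance_coverage(rows)
```

Here the first and last ages are never observed together, so that cell of the coverage matrix is zero purely by design.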


Hi. I want to do growth modeling with data from Add Health, in which youth were recruited in an age span from 11-20 (or thereabouts) and interviewed at years 0, 1, and 7. I know this is a situation for the cohort analysis, but there is substantial missingness that is not by design, which, if I understand it, is treated by listwise deletion using the cohort commands in Mplus. I'm trying the knownclass = age approach, but the 6 year gap seems to be causing problems. So there is a lot of coverage of zero, as no respondent was measured at age 11 along with any of age 13, 14, 15, 16, or 17. Help? Thanks.


You might try a single-group approach using the AT option to allow individually-varying ages at the 3 occasions.


Didn't think of that, thanks. I'm really hoping for a solution that takes advantage of the accelerated longitudinal design, though, so I can get beyond linear growth. I think I'd still be limited, wouldn't I? Thanks. 


You can do up to cubic with AT. I used this recently for a growth model. 


Great, thanks - I'll give it a try!

Matt Moehr posted on Monday, March 19, 2007  10:25 am



I used the following code in an accelerated design. X1-X3 are the same variables measured at successive time points. Cohort tells the program when the person started, and then I keep it as type = missing. I think you would need to change the lines where I specified the cohort models, e.g.:

Model Cohort13: i s | x1@0 x2@1 x3@7;

**** Begin Code ****
TITLE: GROWTH CURVE MODEL;
DATA: file is ...;
VARIABLE: names are id cohort x1-x3;
  usevariables are x1-x3 cohort;
  grouping is cohort (1=30 2=39 3=46);
  missing are all(99);
ANALYSIS: type = mgroup meanstructure missing;
  estimator = ml;
  iterations = 1000;
MODEL: i s | x1@0 x2@1 x3@2;
  [x1-x3@0];!x1-x3(1);
  ![i](2);i(3);
  ![s](4);s@0;
  !i WITH s(6);
MODEL 30: i s | x1@0 x2@1 x3@2;
MODEL 39: i s | x1@1 x2@2 x3@3;
MODEL 46: i s | x1@2 x2@3 x3@4;
OUTPUT: standardized mod(1) tech1 tech4;
PLOT: type = plot1 plot2 plot3;
  series = x1-x3(*);


Are you trying to do a multiple group approach to cohort analysis or do you want the program to string the data out over time? 


Following up on our exchange on 3/17: Does having so few actual measurement variables have an impact on identification? We got the growth model working, but adding predictors of the growth parameters has been causing nothing but trouble. Thanks. 


I think you said you had 3 occasions but with a lot of missing data. Growth modeling with random intercept and slope should still be supported if enough people have observations on all 3 occasions. I assume you are handling individually-varying ages of observation using the AT option per our earlier communication, and holding residual variances equal across time (since different people have different ages at a given occasion). I don't see offhand how adding covariates would make it more difficult unless the covariates have a lot of missingness too.


Thanks, Bengt. Yes, this is with the AT syntax. The outcomes are 3-category ordinal; the thresholds are constrained to be equal across time. Is there anything else we should do about that? The covariates are all x-variables, so there is listwise deletion. The sample size is still upwards of 10,000. Is it time to send this to support? Thanks.


Please send the input, data, output, and your license number to support@statmodel.com. 


Hello, I am trying to model growth in depression scores over time using the Add Health data. People were assessed 3 times: baseline, one year later, and five years later. However, ages ranged from 12-21 at baseline, so that stringing the data out using age as the time variable would theoretically allow you to look at growth from age 12 to 28. When I try to run this (with age as the time variable, data strung out over time), the model doesn't converge b/c there is so much missing data. I have also tried modeling it with just the 3 variables for each person (t1 t2 and t3) and using AT to indicate individually varying times of observation. The model converges this way, but I can't get a plot of the curve across all ages. Is this the correct way to model change in growth across all ages? And if so, how would I get a plot of the curve across all ages? Thanks so much, Sarah Dauber


The two approaches you are using are the same except with the wide format the residual variances are free across time. With AT they are held equal across time. I would try holding them equal across time in the wide analysis and see if that helps. 


Hello, I am using DATA COHORT to rearrange my longitudinal data so that I can investigate development over age. My data were collected at 6-month intervals (baseline, 6m, 12m, 18m, 24m). Mplus requires birth year and measurement year, and the actual ages that I'd like represented are 13-18. However, since the waves are at 6-month intervals, I'm not sure how to do that. Therefore, I have set up some arbitrary integers that actually present the data in the age range of 13-20, and I'll just redo the graph so that the X axis shows the correct age. I'm wondering if there's another way of correctly capturing the 6-month intervals or if I'm doing it the correct way? Here's the current code I'm using:

USEVARIABLES ARE copeblw-copefw boys;
DATA COHORT:
  COHORT IS ageby (73 74 75 76);
  TIMEMEASURES = copeblw (89) copecw (90) copedw (91) copeew (92) copefw (93);
  TNAMES = copew;
ANALYSIS: TYPE=missing H1 meanstructure;
MODEL: I S Q | copew13@0 copew14@1 copew15@2 copew16@3 copew17@4 copew18@5 copew19@6 copew20@7;
  I S Q ON boys;


I can't think of any other way to do this. 


Thanks! Instead of working in years (see the code above), I worked in months by multiplying all the year values by 12. This way I captured all the 6 month intervals in age growth from 13 to 18 years old (cope13 cope13_5 cope14 cope14_5 cope15 ...). 

Matthew Cole posted on Saturday, September 01, 2007  7:28 pm



I've been using the DATA COHORT to rearrange my longitudinal data, and I received the nonconvergence message below. I am curious if there is a way to get my data to converge so that I can plot the means. If not, at least Mplus is providing the savedata file so I'll be able to put a figure together using another program. THE MISSING DATA EM ALGORITHM FOR THE H1 MODEL HAS NOT CONVERGED WITH RESPECT TO THE PARAMETER ESTIMATES. THIS MAY BE DUE TO SPARSE DATA LEADING TO A SINGULAR COVARIANCE MATRIX ESTIMATE. INCREASE THE NUMBER OF H1 ITERATIONS. NOTE THAT THE NUMBER OF H1 PARAMETERS(MEANS, VARIANCES, AND COVARIANCES) IS GREATER THAN THE NUMBER OF OBSERVATIONS. NUMBER OF H1 PARAMETERS : 209 NUMBER OF OBSERVATIONS : 166 


The message refers to the convergence of the H1 model not the H0 model. This means that you do not get fit statistics. You could try the suggestion of increasing the H1 iterations. It sounds like your sample may be too small for the intended modeling. 
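The H1 parameter count in the message can be checked by hand: an unrestricted model for p observed variables has p means plus p(p+1)/2 variances and covariances. A small Python sketch follows; note that the figure of 19 variables is inferred from the reported count of 209, not stated in the post.

```python
def h1_parameter_count(p):
    """Means plus variances and covariances for p observed variables."""
    return p + p * (p + 1) // 2


# Assumption: 19 analysis variables, which reproduces the reported 209.
n_h1 = h1_parameter_count(19)
n_obs = 166  # number of observations reported in the message
too_sparse = n_h1 > n_obs
```

With more unrestricted parameters than observations, it is not surprising that the H1 estimation struggles even though the H0 growth model itself may converge.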


Thanks Linda. That did it! I set H1iterations=10000 and it finally fit. Fortunately with the new Duo processors and setting process=2 the run doesn't take that long. 


I’ve done a multiple cohorts growth curve. In particular I have two cohorts:

Older Cohort: 15 years old 1998 / 17 yo 2000 / 19 yo 2002 / 21 yo 2004
Younger Cohort: 16 years old 1998 / 18 yo 2000 / 20 yo 2002 / 22 yo 2004

I now want to add predictors and outcomes. Can I add them even if each variable is not measured at the same age? For example, I want to add as a predictor “qda” measured in ’98 (15 years old for the younger cohort and 16 for the older one). Is this procedure correct? This is part of the input:

USEV ARE sex md98 md20 md02 md04 qdr98 qdd98;
Im Sm | md98@0 md20@2 md02@4 md04@6;
im with sm (21);
Sm (22);
im (23);
[sm] (24);
[im] (25);
sm on sex (26);
im on sex (27);
qda by qdd98 qdr98;
im on qda (28);
sm on qda (29);
qda on sex (30);
model older: Im Sm | md98@1 md20@3 md02@5 md04@7;

Could you help me? Thanks.


I think you have a good start here. Note that the introduction to your message has older and younger reversed (those who are 16 in 1998 are older than those who are 15 in 1998). You have it right in the model. Your equality restrictions look right. Since you don't measure your qda indicators at the same age for the 2 cohorts, you may want to test that the equalities across cohorts related to this factor actually fit well by also running the model with them unequal. Even if they are unequal, the parameters related to the growth factors may be equal. It is of interest to test if parameters related to the growth factors are invariant across cohorts.


THANK YOU VERY MUCH!!! I'VE DONE AGAIN THIS MODEL WITH THE EQUALITIES AND ADDING THE OUTCOME. ALL THE PARAMETERS ARE INVARIANT ACROSS COHORTS. IN THIS WAY I CAN DISCUSS ALSO THE IMPACT ON THE OUTCOME CONSIDERING THE PREDICTOR.

USEV ARE sex md98 md20 md02 md04 qdd98 qdr98 VAS04R;
missing are all (99.00);
grouping is coorte (1=younger 2=older);
Analysis: Type = MEANSTRUCTURE MISSING ;
  ESTIMATOR = mlR;
model: Im Sm | md98@0 md20@2 md02@4 md04@6;
  im with sm (21);
  Sm (22);
  im (23);
  [sm] (24);
  [im] (25);
  sm on sex (26);
  im on sex (27);
  qda by qdd98 ;
  qda BY qdr98 (31) ;
  VAS04R ON IM (32);
  VAS04R ON SM (33);
  VAS04R ON SEX (34);
  VAS04R ON QDA (35);
  im on qda (28);
  sm on qda (29);
  qda on sex (30);
model older: Im Sm | md98@1 md20@3 md02@5 md04@7;


Hi Bengt and Linda, I am working on a multiple cohorts growth curve, but there is substantial missingness that is not by design. I know your recommendation is to do listwise deletion first, but when I do that I will only keep about 30% of my cases, which is not really an option. My questions are:

1. Is it really necessary to do listwise deletion first?
2. With both missingness by design and missingness at random, will Mplus estimate the model incorrectly (though it runs properly)? In what way then?
3. If listwise deletion is the only solution, does it matter if I perform listwise deletion myself in SPSS first, or is it better (for model estimation) to let Mplus do that using the DATA COHORT command?

I know TSCORES is a good way to overcome this problem (that also runs properly), but then I am not able to make a plot. Thanks. Sylvana


I'm not sure why you think we recommend listwise deletion. We don't. You can use the multiple group multiple cohort approach shown in the new Example 6.18 or you can string the data out by age and not use TSCORES. 


I mean that when I string the data out by age, I will get missingness by design next to missingness at random. Is this allowed? (or is listwise deletion necessary?) Using the DATA COHORT command, each observation that does not have complete data is deleted from the data set (page 350 userguide version 4.1), that was the reason why I thought you recommend listwise deletion. I hope you can clear this up for me. Thanks in advance. Sylvana 


You can string the data out with or without doing listwise deletion within each pattern of variables. This is your choice. If you don't want listwise deletion within each pattern, you would have to do the analysis in two steps. In the first step, save the data without using the MISSING option. In the second step, use the MISSING option and do the analysis. 


Example 6.18 is quite helpful; however, then I get separate growth curves for all cohorts instead of 1 overall growth curve. The two-step analysis you mention is maybe a better solution. But what do you mean by saving the data without using the missing option? What is the difference then between that file and the raw data file? Sorry if these are stupid questions, but I just don't get what you're saying. Thanks in advance.


You do not get separate growth curves for each cohort. If you look at the example, you will see that each cohort contributes to part of the growth curve for which you obtain one intercept and one slope growth factor mean, variance, and covariance due to the equalities that are imposed on these parameters. Take a moment to thoroughly go through the input and also look at the output to see which parameters are estimated. If you do not use the MISSING option in the first step where you use DATA COHORT to string out the data, then there will be no listwise deletion because no value will be considered missing.


Hi, I’ve done a growth model for two parallel processes for continuous outcomes, with regressions among the random effects and predictors, using a multiple cohorts growth curve approach. In particular I have two cohorts:

Younger Cohort: 15 years old 1998 / 17 yo 2000 / 19 yo 2002
Older Cohort: 16 years old 1998 / 18 yo 2000 / 20 yo 2002

Thus the two growth curves are from age 15 to 20 years. Could you suggest some references in which this approach was used, or references that can help me describe the results? Thank you.


We used a multiple cohort approach in the following paper: Muthén, B. & Muthén, L. (2000). The development of heavy drinking and alcohol-related problems from ages 18 to 37 in a U.S. national sample. Journal of Studies on Alcohol, 61, 290-300.


Thank you!!!!! 

J.Reef posted on Friday, May 02, 2008  8:07 am



I would like to use a multiple group multiple cohort growth model for two parallel processes. Is that possible? If so, could you suggest some references in which this model was used? Thanks in advance.


Yes, this is possible. I don't know of any papers where this has been done. 

Reef posted on Wednesday, May 07, 2008  12:52 am



Dear Bengt and Linda, I would like to do a multiple group multiple cohort growth model for two parallel processes. The reason that I want to use parallel processes is that I want to estimate a model for two different instruments (child and adolescent). I want to use multiple cohorts because I have an accelerated design. This means that for each cohort, the measurement points for the parallel processes will differ. In some cohorts, not all variables will have observations because the cohort is ‘too old’ for the instrument. Is this possible with this analysis in Mplus? Thank you.


You should be able to do this. You would need to have the same set of variables in each group which may result in a problem with zero variances. I think this can be overcome by including VARIANCES=NOCHECK in the DATA command. 


I have data for a cohort-sequential design with 4 groups. See below for the spaces between the months.

Cohort 1: 0, 2, 6
Cohort 2: 0, 4, 6
Cohort 3: 1, 3, 7
Cohort 4: 1, 5, 7

Does my syntax look correct?

MODEL:
  acpic_i by acpic@1 bcpic@1 ccpic@1;
  acpic_s by acpic@0 bcpic@2 ccpic@6;
  acpic_i with acpic_s (20);
  acpic_i (21);
  acpic_s;
  [acpic_i] (23);
  [acpic_s] (24);
MODEL fall9:
  acpic_i by acpic@1 bcpic@1 ccpic@1;
  acpic_s by acpic@0 bcpic@4 ccpic@6;
Model spring8:
  acpic_i by acpic@0 bcpic@0 ccpic@0;
  acpic_s by acpic@1 bcpic@3 ccpic@7;
Model spring9:
  acpic_i by acpic@0 bcpic@0 ccpic@0;
  acpic_s by acpic@1 bcpic@5 ccpic@7;


This looks correct with the exception that in the last two groups you have the intercept growth factor loadings fixed to zero instead of one. I would also use the special | growth language instead of BY because the defaults are more appropriate to a growth model.


Thanks for taking a look, but I guess I'm still a bit unclear on whether changing those final two models to have intercept factor loadings of 1 allows us to maximize the strength of our cohort-sequential design. Even though we only sample each participant three times, we obtain 8 data points across the four cohorts (0, 1, 2, 3, 4, 5, 6, 7). Is it possible to model growth AS IF there were 8 points across all the participants?


In a growth model, the loadings for the intercept growth factor are one. That is part of the model parameterization. The way you have the model set up is as if the data are across 8 timepoints. See Example 6.18 for a full description of the multiple group multiple cohort model. 
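To see why the four cohorts jointly cover 8 timepoints, the cohort schedules from the question can be tabulated. The following Python sketch is illustrative only: it builds the slope time scores per cohort as text and collects the distinct months; the helper name and dictionary keys are hypothetical, while the variable names acpic, bcpic, and ccpic come from the posted syntax.

```python
# Measurement months per cohort, as listed in the question above.
cohort_months = {
    "cohort1": (0, 2, 6),
    "cohort2": (0, 4, 6),
    "cohort3": (1, 3, 7),
    "cohort4": (1, 5, 7),
}

OUTCOMES = ("acpic", "bcpic", "ccpic")


def slope_time_scores(months):
    """Format the slope time scores for one cohort, e.g. 'acpic@1 bcpic@3 ccpic@7'."""
    return " ".join(f"{name}@{month}" for name, month in zip(OUTCOMES, months))


# Intercept loadings stay at 1 in every cohort; only slope scores differ.
per_cohort = {c: slope_time_scores(m) for c, m in cohort_months.items()}

# Distinct months covered when all cohorts share one growth curve.
distinct_months = sorted({m for months in cohort_months.values() for m in months})
```

Because the growth factor means, variances, and covariance are held equal across groups, each cohort informs the same curve at its own three months.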

C. Sullivan posted on Friday, December 12, 2008  12:59 pm



I have three cohorts measured at three waves each and would like to be able to also assess neighborhood variance (and potential effects) on growth factors. Is it possible to estimate a multiple cohort growth model within a multilevel framework? Specifically, could a model like that shown in example 6.18 be run in the multilevel framework (like ex. 9.12)? 


The GROUPING option is available with TYPE=TWOLEVEL when outcomes are continuous. 

C. Sullivan posted on Tuesday, December 23, 2008  12:13 pm



Two other quick questions on the multiple cohort, multilevel growth model In the time structuring...if I have two ages with no coverage, would I just set the rest of the scores as usual (i.e., y1@0, y2@2, y3@3 if there was no coverage at the second interval)? I'm trying to run a MC model for the Twolevel, grouped growth model, but I'm not getting any estimates and the output is telling me that none of the repetitions that I requested were completed. Would that more likely be the result of a setting being incorrect or model misspecification? Thanks. 


Yes, regarding the time scores. Please send your output and license number to support@statmodel.com. 


Dear Dr. Muthen, I have a quick question. I am trying to conduct a multiple group multiple cohort model. Example 6.18 is a good example. However, I wonder whether I can estimate random slopes for time-varying covariates for continuous outcomes (just like Example 6.12) in a multiple group multiple cohort model. Thanks. Mark


Yes but you would need to use TYPE=MIXTURE and the KNOWNCLASS option to do this. The GROUPING option is not available with TYPE=RANDOM; 

csulliva posted on Thursday, June 17, 2010  3:22 pm



1. Is the known class multiple cohort approach to growth modeling equivalent to the cohort group based approach (ex. 6.18)? I received some warnings on the model’s identification with the latter but not the former, and was a bit unclear on the potential source for that discrepancy.
2. Also, in response to a question above it is mentioned that “each cohort contributes to part of the growth curve for which you obtain one intercept and one slope growth factor…” Does this mean that it is appropriate to plot the outcome across age for the full sample, as opposed to a series of separate plotted lines for each known class (cohort)?


1. Yes. 2. Yes. 

Nicolas M posted on Saturday, July 17, 2010  4:47 pm



Dear Professors, I'm doing a growth curve analysis on 9-wave panel data. Individuals in it have very different ages (going from 16 to 80). I think I'm using what you call the "wide" format, where for each individual I have

age1  outcome1  age2  outcome2  age3  outcome3  ...
20    1         21    5         22    6

I defined the ages as TSCORES using "TSCORES are age1-age9;". However, I have convergence problems. I managed to solve them by standardizing all the age variables using the following operation:

age1standard. = (age1 - mean(all_ages))/sd(all_ages)
age2standard. = (age2 - mean(all_ages))/sd(all_ages)
etc.

Now, the model converges. Do you think this is a proper way to solve this problem? Can you see any reason for not doing that? Thanks in advance for your advice.


I would not standardize. I would divide age by a constant such as ten. 
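The suggestion to divide by a constant can be illustrated with a short Python sketch. It assumes a divisor of ten, as mentioned in the reply; the function name and example ages are hypothetical. Dividing keeps zero at age zero and only changes the unit of time (one unit becomes a decade), whereas standardizing also shifts the origin.

```python
def rescale_ages(ages, divisor=10.0):
    """Express ages in units of `divisor` years; the origin is unchanged."""
    return [age / divisor for age in ages]


# The three ages from the example row above.
scaled = rescale_ages([20, 21, 22])

# Waves one year apart stay equally spaced, now 0.1 units apart.
gaps = [later - earlier for earlier, later in zip(scaled, scaled[1:])]
```

A slope estimated on the rescaled ages is then the expected change per ten years rather than per year.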

Nicolas M posted on Tuesday, July 20, 2010  2:13 pm



Thank you for your answer. I did try to divide the age by a constant, but I still have major convergence problems. I was thinking, as all observations are equally spaced, is it reasonable to use simply

i s | f1@0 f2@1 f3@2 ...

and then control for the starting age:

i s ON age0

instead of using the TSCORES command? Or is it not a good idea? Actually what I like about this method is that Mplus doesn't use the EM algorithm or numerical integration, so it is much faster. The model has a good fit. But I need to be sure it is statistically correct...


I think using TSCORES is preferred. You could also consider multiple group multiple cohort as shown in Example 6.18. If you send the output where you failed with TSCORES and your license number to support@statmodel.com, we can see if we can help. 

Melvin C Y posted on Monday, September 06, 2010  8:39 pm



Dear Dr Muthen, I have similar measures obtained from two cohorts (group1=10-12 years; group2=13-16 years). As there is no common age or linking data between cohorts, would I still be able to use the multiple cohort LGM (i.e., 10-17 years)? Would you suggest a piecewise model instead? Thank you.


You can use piecewise and see if the two pieces align. 

Andy Ross posted on Thursday, February 24, 2011  2:13 am



Dear Prof Muthen, I’d like to estimate a linear growth model for a categorical outcome and wanted to use the multiple cohort option; however, I'm under the impression that this option is not available with categorical outcomes. Is that correct? I'm modelling gang membership over three time points using data that contains young people aged 11, 12, 13, 14, 15, 16, 17 at time point one. I wanted to use this approach as the alternatives appear severely limited by the number of time points - i.e. the standard LGM would only allow a linear model, which is not supported by the data - not to mention the fact that I would like to capture the age crime curve. It would also be useful to estimate different trajectories, i.e. adolescent limited and persistent offenders, as far as they may exist for gang membership - does the multiple cohort option allow this? I did also consider using LCA as an alternative but am limited by the number of classes I can estimate before the model is not identified. Can you offer any suggestions or am I simply asking too much of the data? Many thanks for your support, Andy


Multiple group multiple cohort analysis as shown in Example 6.18 can be used with categorical outcomes. 

Andy Ross posted on Thursday, February 24, 2011  10:09 am



Many thanks. Maybe I'm setting the model up wrong? When I use the VARIABLE command to specify that the newly created observed measures should be categorical, I get the following warning: Observed outcomes in a growth process must be measured on the same scale. Problem with: I S Q


Please send your output and license number to support@statmodel.com. 

Jing Zhang posted on Tuesday, May 03, 2011  12:11 pm



Dear Professor Muthen, You mentioned that there are two ways to handle multiple cohort data: 1) a multiple group approach to cohort analysis; and 2) having the program string the data out over time. I have several questions:

1) If the data are missing by design, e.g. for some cohorts the data were not collected at certain time points of the survey, can I still use a multiple group approach to cohort analysis as indicated in Example 6.18?
2) I am doing a three-level multiple cohort growth curve model for my research. The data are missing by design, and an example follows. Can I still follow Example 6.18, or should I string the data out over time? Do you have an example of the syntax for multilevel multiple cohort growth curve modeling with stringing the data out over time?

Note: cohort 1 does not have data on y1 and y2, and cohort 2 doesn’t have data on y1.

y1  y2  y3  x1  x2  x3  cohort
x   x   7   2   5   3   1
x   x   6   1   3   4   1
x   x   5   2   5   6   1
x   3   5   1   6   7   2
x   2   3   4   3   2   2
x   1   3   6   7   3   2
5   3   2   1   8   9   3
3   4   7   5   1   8   3
3   5   8   2   4   6   3

Thanks, Jing


The multiple-cohort, multiple-group approach is not straightforward in Mplus when the cohorts have different numbers of observed time points. So I would string out the data. Multilevel does not cause any extra difficulty as far as I can see; I don't have an example.


Use DATA LONGTOWIDE to string out the data. See the user's guide. 

Jing Zhang posted on Thursday, May 05, 2011  12:46 pm



Dear Dr Muthen, Thanks for your answers. I have a further question about DATA LONGTOWIDE. My data set is already in wide format. Can I still use DATA LONGTOWIDE to string out the data? Maybe the way I presented my data set caused confusion. The data set is as follows:

subjectNO.  y1  y2  y3  cohort
10001       x   x   7   1
10002       x   x   6   1
10003       x   x   5   1
10004       x   3   7   2
10005       x   5   8   2
10006       x   2   0   2
10007       1   5   7   3
10008       2   8   9   3
10009       5   9   2   3

Note: x represents missing data; y1-y3 represent data collected at three waves.

I wrote the following code:

VARIABLE: NAMES = y1-y3 g;
GROUPING = g (1 = cohort 6, 2 = cohort 9, 3=cohort 12);
COPATTERN = cohort (1=y3 2=y2 y3 3=y1 y2 y3)
TIMEMEASURES= y1(1994) y2(1997) y3(2000)
TNAMES=int

The results keep saying "UNKNOWN OPTION: COPATTERN". I wonder why? Could you give some input? Thanks, Jing


If your data are in the wide format, you don't need to do anything to string them out. The COPATTERN option is part of the DATA COHORT command. See the current user's guide.

csulliva posted on Friday, May 27, 2011  2:43 pm



I conducted a multiple cohort growth model using the known class option (three cohort groups) and found that the model with equality constraints was of poorer fit than the unconstrained model. 1. This would suggest that I would need to account for those groups (cohort) effects throughout my analysis. Is this correct? 2. Does that necessitate freeing the estimates for the growth factors, residual variances/covariances, and any covariate effects across groups? 3. If so, are there any tractability/estimation issues in particular that need attention in this process? I have run a test model freeing those parameters and a covariate effect and have had difficulty with convergence. Would this just be a matter of increasing the number of random starts or MIterations? Thanks in advance for any advice you can offer. 


1. Yes 2. Yes, but you may only have to free a few critical parameters. 3. Try using modification indices in the model with full equality to see which parameters are not equal. 


Drs. Muthen, I have a longitudinal dataset with seven waves (base, six months, 12 months, 18 months, 4.5 years, 5.5 years, 6.5 years), with youth who were aged 13-17 at baseline. I need to use TSCORES to deal with individual variability around each wave, and a zero-inflated Poisson distribution to deal with the high number of zeros in the outcome. I would like to use age, rather than time since baseline, as the time variable, so I created TSCORES variables age0, age1, age2, etc., representing their age at each wave of data collection. In addition I am trying to test for cohort effects given the 13-17 range at baseline. Does the syntax below make sense? In particular, I’m concerned about whether I am structuring the TSCORES correctly. Thanks in advance, Carolyn

…
USEVAR ARE AGECOH AGECEN0 AGECEN1 AGECEN2 AGECEN3 AGECEN4 AGECEN5 AGECEN6
  ALSYCT0 ALSYCT1 ALSYCT2 ALSYCT3 ALSYCT4 ALSYCT5 ALSYCT6;
COUNT=ALSYCT0-ALSYCT6 (i);
MISSING ARE ALL (999);
TSCORES ARE AGECEN0 AGECEN1 AGECEN2 AGECEN3 AGECEN4 AGECEN5 AGECEN6;
ANALYSIS: TYPE= RANDOM MISSING;
  PROCESSORS=2;
MODEL: i s | alsyct0-alsyct6 AT agecen0-agecen6;
  ii si | alsyct0#1-alsyct6#1 AT agecen0-agecen6;
  i ON agecoh;
  s ON agecoh;
  ii ON agecoh;
  si ON agecoh;


I would take the mean of all of the time score variables (AGECEN0 AGECEN1 AGECEN2 AGECEN3 AGECEN4 AGECEN5 AGECEN6) and subtract that mean from each time score variable. 
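The centering step described in the reply amounts to subtracting one grand mean, computed over all time score variables and all people, from every time score. A minimal Python sketch follows, assuming missing ages are coded as None and using made-up ages; the function name is hypothetical.

```python
def center_time_scores(rows, missing=None):
    """Subtract the grand mean of all observed time scores from each score."""
    observed = [v for row in rows for v in row if v is not missing]
    grand_mean = sum(observed) / len(observed)
    centered = [
        [missing if v is missing else v - grand_mean for v in row]
        for row in rows
    ]
    return centered, grand_mean


# Two hypothetical youths, ages at three waves; one wave is unobserved.
ages = [
    [13.0, 13.5, 14.0],
    [17.0, 17.5, None],
]
centered, grand_mean = center_time_scores(ages)
```

Using a single grand mean, rather than a separate mean per wave, keeps the spacing between a person's waves and between people intact; the intercept then refers to the average age in the sample.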


Hi, I am trying to develop a multivariate MLGM (using Bayesian estimation). The challenge is that the data consist of different cohorts.

- I have two cohorts with data at three time points each (school grades 8, 9, and 10).
- I have two cohorts with data at two time points (school grades 8 and 9 OR school grades 9 and 10, so these cohorts have missingness by design).

My idea is that some of the analyses should combine all four cohorts into one analysis of school grades 8 to 10 to increase sample size. I see there are a few options for cohort analyses in Mplus. But I should try to do this wisely. I would want to test for time effects (e.g. simulating the possibility of unknown historical events). This means that a measurement in any of the three specific years where measurements were conducted can affect scores; for one cohort this effect will be in grade 8, for another cohort the effect will be in grade 9, while for yet another cohort the effect will be in grade 10. Anyone on the wonderful Mplus team, or anyone else in the Mplus community, do you have suggestions on how best to develop this model? Kind regards, Christopher Bratt


CORRECTION: I have FOUR measurements (four years where measurements were conducted). These are used for growth models with three time points (maximum number of measurements for one cohort), measurements in the model are equal to school grades 8 to 10. Chris 


I would do a multiple-group analysis, with cohort as group and grade as time axis for the growth model. With Bayes, you do multiple-group via KNOWNCLASS in TYPE=MIXTURE. Testing for time effects may be more tricky. Although an event in a certain year influences subjects in different grades for different cohorts, we don't know if that same event has a different influence for students in different grades. There is a large literature on age-period-cohort analyses. But in principle you can let the event effect be restricted to have the same magnitude in the different cohorts (for the different grades), for instance by letting an intercept of the outcome at that point jump out of line of a linear growth model.
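
To see which outcome intercept would be allowed to "jump" in each cohort, it can help to tabulate which grade each cohort was in during the event year. The sketch below assumes a hypothetical layout of four measurement years and four cohorts (two with three time points, two with two); the cohort labels, the year numbering, and the function name `grade_hit_by_event` are illustrative assumptions, not details from the thread.

```python
# Hypothetical design: calendar year (1-4) in which each cohort was in grade 8.
# Cohort C would have been in grade 8 in year 0, before data collection began,
# so it is observed only in grades 9 and 10. Cohort D is observed in grades 8-9.
GRADE8_YEAR = {"A": 1, "B": 2, "C": 0, "D": 3}
MEASUREMENT_YEARS = [1, 2, 3, 4]
GRADES = [8, 9, 10]


def grade_hit_by_event(cohort, event_year):
    """Return the grade a cohort was in during event_year, or None if unobserved."""
    grade = 8 + (event_year - GRADE8_YEAR[cohort])
    if grade in GRADES and event_year in MEASUREMENT_YEARS:
        return grade
    return None


# An event in year 3 touches a different grade in each cohort.
event_map = {cohort: grade_hit_by_event(cohort, 3) for cohort in GRADE8_YEAR}
```

In this hypothetical layout an event in year 3 falls in grade 10 for cohort A, grade 9 for cohort B, and grade 8 for cohort D, while cohort C is not observed that year. Those are the cohort-specific intercepts that an equality constraint on the event effect would tie together.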


Thanks, Bengt. So you would not (also) do a model with all cohorts and try to account for cohort effects within that model. A multiple-group analysis seems to give me only data for the two cohorts with measurements at all three time points. I wanted to add an analysis with all my data (four cohorts) to check whether increasing the number of cohorts and the sample size changed anything (but this gives missingness by design).


You can do a single-group run of all cohorts as well, although then investigation of cohort differences is not as flexible. The multi-group approach can handle different numbers of observed variables per our FAQ, but I haven't tried something like that with Bayes.


Bengt, just a follow-up on this brief dialogue:
*** WARNING in ANALYSIS command
Estimator BAYES is not allowed with TYPE=TWOLEVEL MIXTURE. Default estimator will be used.
You will know that, but I thought I should add it for other readers.


Hi, I have data from an accelerated cohort design with 4 cohorts, measured at three time points. I want to look at temporal antecedents for my outcome. Is the only way to do this to use X1 and X2 to predict Y2 and Y3, or is there a way I can use the accelerated cohort design too? Thanks 


A multiple cohort design can be used with any model. 


Hello, I am interested in running a multiple-group multiple-cohort model similar to example 6.18, although I would like to use time scores rather than measurement occasion for my time points. In a multiple-group multiple-cohort model, do I need to center the individually-varying time scores (age) for each of the groups (cohorts) at initial measurement, or would I grand-mean center the time scores for all groups? Thanks for your assistance.


You want to center time with respect to the full time range, not specifically for each cohort. 


Thank you for your quick response. To elaborate on my question, would I want to center time on the full time range in both of the following scenarios? a) accelerated cohort design (example 6.18) b) multiple-group LGM using time scores as time points and using cohort as the grouping variable


You would want to center time on the full time range in both a and b. 


In response to a recent post in this thread, Linda suggested that the user should center time scores representing age on the mean age taken across all observation points. I have been under the assumption that time scores representing age should be centered on the mean age at the first observation. Is one of these methods "correct" or are there certain situations where one method is preferable to the other? As always, thank you for your guidance. 


I think centering choice is largely determined substantively. That is, which age do you want the intercept factor to represent? But in some cases the correlations between the growth factors can get uncomfortably close to 1 in which case average age centering can help to make them less correlated. 
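
The effect of the centering point on the correlation between the growth factors follows from simple algebra for a linear growth model: moving the time origin by c redefines the intercept as i* = i + c·s. The sketch below works through that arithmetic with hypothetical variance and covariance estimates; the function name `recentered_corr` and all numbers are illustrative assumptions.

```python
import math


def recentered_corr(var_i, var_s, cov_is, shift):
    """Intercept-slope correlation after moving the time origin by `shift`.

    For a linear growth model the intercept at the new origin is
    i* = i + shift * s, so
      Var(i*)    = var_i + 2*shift*cov_is + shift**2 * var_s
      Cov(i*, s) = cov_is + shift * var_s
    """
    var_i_new = var_i + 2 * shift * cov_is + shift ** 2 * var_s
    cov_new = cov_is + shift * var_s
    return cov_new / math.sqrt(var_i_new * var_s)


# Hypothetical estimates with the intercept defined at the first occasion.
corr_at_first = recentered_corr(4.0, 1.0, -1.9, 0.0)
# Same model with the origin moved 1.5 time units toward the middle of the range.
corr_at_mid = recentered_corr(4.0, 1.0, -1.9, 1.5)
```

With these made-up numbers the correlation is -0.95 at the first occasion and about -0.54 after recentering. The size and direction of the change depend entirely on the estimates, so recentering is not guaranteed to help in every data set.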


Thank you again. I've run a multiple-group multiple-cohort analysis, using KNOWNCLASS due to the use of time scores. I am not sure if there are problems with my model or if I need to rethink my interpretation of group-specific intercepts. My main question is regarding the interpretation of the intercept for each cohort. Currently the time scores are age at each observation centered on the grand mean for age at time 0, with this value divided by 10 to reduce the range of time scores. The problem is that I am receiving estimated intercepts that are outside the range of possible values for the outcomes. For example, the estimated intercept for my outcome for the oldest cohort (cohort 1) was 24, which is far off the possible range of values for the outcome (mean = 3, sd = 3, range = 0-11). The mean value of the time score for the oldest cohort was 1.4. My understanding is that the intercept should be interpreted as the mean value of my outcome at the mean age at time 0. Should the group-specific intercepts be interpreted in this manner? If so, would these intercepts suggest nonlinearity that may require quadratic terms? FYI, here is the mean age of each cohort at time 0: Cohort 1: 79, Cohort 2: 70, Cohort 3: 60, Cohort 4: 52.


You should be estimating one intercept growth factor and one slope growth factor. There should be equalities of these parameters across cohorts. Please see Example 6.18. 


Thank you again for the guidance. I now have a question regarding the class-specific output. When using the KNOWNCLASS option in mixture modeling, my assumption was that the "CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP" output should provide the same group n-counts as the frequency of my cohort grouping variable, but it does not. Is there a simple explanation for this difference?
Observed class counts: a = 4753, b = 2096, c = 9396, d = 2277
Estimated class counts and proportions (n-counts): a = 4480, b = 2239, c = 7596, d = 4206


Please send the output and your license number to support@statmodel.com. 


I take it that a multilevel multiple cohort latent growth model is possible in Mplus. I believe the GROUPING option is available with TYPE=TWOLEVEL RANDOM; is that right? Are there any syntax examples available for this type of model? Thanks. David


If GROUPING is not available, use the KNOWNCLASS option and TYPE=MIXTURE TWOLEVEL RANDOM. The only syntax is the example in Chapter 6. You would need to adapt that.


Thanks, Linda. I am assuming that when I use the "mixture twolevel random" for a growth model and include multiple cohorts, I will assign time scores for each known class/group as shown in Ex. 6.18. But I am not sure what time scores would be assigned in the %WITHIN% / %OVERALL% portion of the model. Thanks for your help with this. Sorry, I am sure there is an obvious answer that I am overlooking here. David 


Yes, you would set it up like Example 6.18 but in a multilevel setting. See Example 9.12. 


I am still unsure about what time scores would replace the question marks below. (I used Ex. 6.18 to figure out the time scores for each of the 4 cohorts.)
...
TYPE = TWOLEVEL MIXTURE RANDOM;
MODEL:
%WITHIN%
%OVERALL%
iw sw | y1@0 y2@.? y3@.? y4@.?;
iw sw ON x1;
y1-y4 (1);
%cg#1%
iw sw | y1@0 y2@.1 y3@.2 y4@.3;
%cg#2%
iw sw | y1@.1 y2@.2 y3@.3 y4@.4;
%cg#3%
iw sw | y1@.2 y2@.3 y3@.4 y4@.5;
%cg#4%
iw sw | y1@.3 y2@.4 y3@.5 y4@.6;
%BETWEEN%
%OVERALL%
ib sb | y1@0 y2@.? y3@.? y4@.?;
y1@0 y2@0 y3@0 y4@0;
...
Thanks. David


You need to follow Example 6.18. This is also described in either the Topic 3 or Topic 4 course handouts. The key is arranging your data by age not cohort. Age is the time variable. The time scores come from this. This is described in Example 6.18. Follow these steps for your example. 
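
The idea that the time scores come from age rather than from measurement occasion can be illustrated with a small calculation. This is a sketch under assumed ages (four cohorts one year apart, four yearly occasions) and an assumed scaling of ten years per time-score unit; the ages and the function name `time_scores_by_age` are hypothetical and are not taken from the User's Guide.

```python
# Hypothetical ages at the four measurement occasions for each cohort.
AGES_BY_COHORT = {
    1: [10, 11, 12, 13],
    2: [11, 12, 13, 14],
    3: [12, 13, 14, 15],
    4: [13, 14, 15, 16],
}


def time_scores_by_age(ages_by_cohort, scale=10):
    """Time score = (age - youngest age in the whole design) / scale."""
    origin = min(min(ages) for ages in ages_by_cohort.values())
    return {
        cohort: [round((age - origin) / scale, 4) for age in ages]
        for cohort, ages in ages_by_cohort.items()
    }


scores = time_scores_by_age(AGES_BY_COHORT)
```

Under these assumptions cohort 1 gets 0, .1, .2, .3 and cohort 4 gets .3, .4, .5, .6, so the same age always carries the same time score regardless of which cohort or occasion it comes from.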


Hello, I've created a multiple-group growth model with grand-mean-centered time scores and want to test for differences in intercept and slope across groups, so I'm not imposing equalities on these parameters as described in ex. 6.18. As I understand, the resulting intercepts tell me each group's estimated intercept at the grand mean age, around 65 years of age. I would assume that the slope would tell me the estimated change in each group as age increases from the intercept (in my case 10 years, so from age 65-74). How can I use this information to explain the intercept and slope of each age group relative to their group mean age? Rather than estimating what my youngest group would look like (mean age 50 at t1) between the ages of 65-74, I want to know what their estimated change would be from age 50-59. With this said, what issues would arise if I centered the time scores on group mean age?


After reading my post over, I believe I was able to answer my initial question. If I wanted to know what the estimated value for a 55-year-old would be, I would subtract the slope estimate, representing 10 years of change, from the intercept estimated at age 65. I would still be interested to know what issues arise when centering time scores on group mean age versus grand mean age.
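
The arithmetic in this post generalizes to any age: with a linear model, the model-implied mean is the intercept plus the slope times the time score for that age. A minimal sketch, assuming time is centered at age 65 and scaled so that one unit equals 10 years as described above; the estimates 3.0 and 0.8 and the function name `predicted_mean` are hypothetical.

```python
def predicted_mean(intercept, slope, age, center_age=65.0, years_per_unit=10.0):
    """Model-implied mean at `age` for a linear growth model.

    The time score is (age - center_age) / years_per_unit, so one slope unit
    corresponds to `years_per_unit` years.
    """
    time_score = (age - center_age) / years_per_unit
    return intercept + slope * time_score


# Hypothetical group estimates: intercept 3.0 at age 65, slope 0.8 per decade.
at_55 = predicted_mean(3.0, 0.8, 55)   # intercept minus one slope unit
at_50 = predicted_mean(3.0, 0.8, 50)   # intercept minus 1.5 slope units
```

Values computed this way are extrapolations whenever the age lies outside the range a group was actually observed at, which is worth keeping in mind when comparing groups.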


This type of general question is more appropriate for a general discussion forum like SEMNET. 


I'm hoping to run a cohort-sequential, piecewise LGM using Add Health data to determine the trajectory of sedentary behavior from adolescence to adulthood (ages 13-32). The piecewise model will have 3 segments: ages 13-18, 19-22, and 23-32. Data were collected at 4 waves (1994-5, 1996, 2001-02 and 2008). I have a few questions regarding my analyses.
1a. My plan is to use a multiple-group multiple-cohort approach and apply this to piecewise LGM. Is this appropriate or should I use DATA COHORT?
1b. What would be the limitations of the DATA COHORT approach and are there any particular references you can point to that would elaborate on these limitations? I haven't come across anything in my search for this.
2. If the multiple-group multiple-cohort method is appropriate, how would I go about specifying the time scores for each age cohort to ensure I model the 3 different segments of the piecewise model? Participants who are in the 13-year-old cohort contribute sedentary behavior data for ages 13, 14, 19 and 26. Those in the age 14 cohort contribute data for ages 14, 15, 20 and 27, etc. I understand that the time scores associated with sedentary behavior at a particular age will be the same across cohorts.
Any recommendations on how to integrate the piecewise component into this would be much appreciated!


1a, b. Definitely use the multiple-group approach so you have the flexibility of testing across-group restrictions.
2. See our UG ex 6.18, which discusses the details of time scores. Piecewise just takes a little care in getting things right.
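
One common way to code piecewise time scores is to let each slope factor accumulate the years spent inside its own segment. The sketch below applies that coding to the ages in the question, assuming knots at ages 18 and 22 and time measured in years from age 13; the knot placement, the scaling, and the function name `piecewise_scores` are assumptions, since the thread does not specify them.

```python
KNOTS = (13, 18, 22, 32)  # assumed segment boundaries in years of age


def piecewise_scores(age, knots=KNOTS):
    """Years accumulated within each segment by the given age."""
    return [
        max(0, min(age, upper) - lower)
        for lower, upper in zip(knots[:-1], knots[1:])
    ]


# Ages contributed by the cohort aged 13 at wave 1, per the question above.
cohort13 = {age: piecewise_scores(age) for age in (13, 14, 19, 26)}
```

Under this coding, age 19 gets scores 5, 1, 0 and age 26 gets 5, 4, 4. Each age has one fixed set of scores, so a cohort that reaches the same age at a different wave would reuse the same values.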

YUN HWAN KIM posted on Thursday, November 05, 2015  8:54 pm



Dr. Muthen, Using accelerated longitudinal data, I am trying to examine the growth pattern of the variable of my interest. Unlike example 6.18 in the Mplus User's Guide, however, my dataset has different assessment points for different cohorts. Specifically, I have five cohorts as below.
             Wave I    Wave II   Wave III  Wave IV   Wave V
Cohort I     Age 13    Age 14    Age 15    Age 16    Age 17
Cohort II    Age 16    Age 17    Age 18    (NoData)  Age 20
Cohort III   (NoData)  Age 20    (NoData)  Age 22    (NoData)
Cohort IV    Age 22    (NoData)  Age 24    (NoData)  Age 26
Cohort V     (NoData)  Age 26    (NoData)  Age 28    (NoData)
To me, it seems that, in order to use example 6.18 as a guide, each of the five cohorts should have the same assessment points. So, I am wondering how I should handle the different assessment points between different cohorts. I would really appreciate it if you could guide me. Best, Yunhwan


Regarding my question above, I realized that a similar question was already posed with the answer from Dr. Muthen, "The multiple-cohort, multiple-group approach is not straightforward in Mplus when the cohorts have different number of observed time points. So I would string out the data. Multilevel does not cause any extra difficulty as far as I can see; I don't have an example." Please discard the above question!
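
"Stringing out" the data means moving from one record per person to one record per person per observed occasion, with age carried as a variable. The following is a rough Python sketch of that reshaping, using the ages from the cohort-by-wave design in the question above; the person ID, the outcome values, and the function name `to_long` are hypothetical.

```python
# Ages at each wave by cohort, from the design in the question above.
# None marks a wave with no data for that cohort.
AGE_AT_WAVE = {
    "I":   [13, 14, 15, 16, 17],
    "II":  [16, 17, 18, None, 20],
    "III": [None, 20, None, 22, None],
    "IV":  [22, None, 24, None, 26],
    "V":   [None, 26, None, 28, None],
}


def to_long(person_id, cohort, wave_values):
    """Turn one person's wide record into (id, age, y) rows, skipping gaps."""
    rows = []
    for age, y in zip(AGE_AT_WAVE[cohort], wave_values):
        if age is not None and y is not None:
            rows.append((person_id, age, y))
    return rows


# Hypothetical person in Cohort II with scores at waves I-III and V.
long_rows = to_long(101, "II", [2.0, 2.5, 3.1, None, 3.4])
```

In the long file the person ID would serve as the cluster variable, so cohorts with different numbers of assessment points simply contribute different numbers of rows.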


Hi Linda and Bengt, I am considering EXAMPLE 6.18: MULTIPLE GROUP MULTIPLE COHORT GROWTH MODEL or else DATA COHORT for modeling growth on items that are categorical and zero-inflated count in nature (2 different sets of variables).
1. Why is DATA COHORT available only for continuous outcomes?
2. Example 6.18 models linear growth using the time scores 0, .1, .2, .3, .4 and so on. Can these be squared in order to model quadratic growth (option A)? (I could then compare the fit of the linear and quadratic models. These would, I think, be nested models, correct?) Or else, quadratic growth could be examined in the model itself using the "i s q | y1@0 y2@.1 y3@.2 y4@.3;" form (option B). However, the latter will not work with only 3 time points, correct? That is why I inquire about option A.
Thank you sincerely, Lisa M. Yarnell


1. We've found the multiple cohort approach much more useful than the DATA COHORT approach and have therefore not developed the latter further.
2. The only correct alternative is option B. You can fix the variance of q; I think that makes it just-identified with 3 time points.
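
The "just-identified" remark can be checked by counting sample moments against free parameters. The sketch below does this for continuous outcomes only, assuming free growth factor means, one free residual variance per time point, outcome intercepts fixed at zero, and that fixing the variance of q also fixes its covariances with i and s at zero. The count would differ for categorical or count outcomes, and the function name `growth_df` is hypothetical.

```python
def growth_df(n_timepoints, n_factors, n_random_factors):
    """Degrees of freedom for a growth model with continuous outcomes.

    Assumes free growth factor means, free variances and covariances among
    the `n_random_factors` factors that have a variance, one free residual
    variance per time point, and outcome intercepts fixed at zero.
    """
    t = n_timepoints
    moments = t + t * (t + 1) // 2          # means plus variances/covariances
    r = n_random_factors
    free = n_factors + r * (r + 1) // 2 + t
    return moments - free


df_full = growth_df(3, 3, 3)      # i, s, q all random
df_fixed_q = growth_df(3, 3, 2)   # variance of q (and its covariances) fixed at 0
```

Under these assumptions the fully random quadratic model has 12 free parameters against 9 moments (df = -3), while fixing the variance of q leaves 9 parameters and df = 0, which is consistent with the reply above.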


Thanks! 
