Multiple cohort
 Anonymous posted on Monday, April 12, 2004 - 9:04 pm
When I use the mcohort function, how does Mplus calculate degrees of freedom?
 bmuthen posted on Tuesday, April 13, 2004 - 12:16 pm
As in regular modeling.
 guest posted on Tuesday, October 18, 2005 - 12:51 am
I have longitudinal achievement data containing math and reading scores from 5th, 6th, 7th, and 9th grade.

There were four data files: one each for 2002, 2003, 2004, and 2005.

Each data file had 5th, 6th, 7th, (not 8th) and 9th grade math and reading scores. I merged 4 data files into one big file (a long stacked file) and created a cohort variable.

Cohort 1: 5th in 2002, 6th in 2003, and 7th in 2004
Cohort 2: 6th in 2002, 7th in 2003, and 9th in 2005
Cohort 3: 7th in 2002, and 9th in 2004
Cohort 4: 9th in 2002

Cohort 5: 5th in 2003, 6th in 2004, and 7th in 2005
Cohort 6: 6th in 2003 and 7th in 2004
Cohort 7: 7th in 2003 and 9th in 2005
Cohort 8: 9th in 2003

Cohort 9: 5th in 2004 and 6th in 2005
Cohort 10: 6th in 2004 and 7th in 2005
Cohort 11: 7th in 2004
Cohort 12: 9th in 2004

Cohort 13: 5th in 2005
Cohort 14: 6th in 2005
Cohort 15: 7th in 2005
Cohort 16: 9th in 2005

A subset of the stacked data file looks like below.

StudentID  Cohort  year  SchoolID  grade  math  reading
13641      1       2002  12        5      648   620
13641      1       2003  12        6      698   635
13641      1       2004  28        7      719   661
14293      4       2002  21        9      709   671
11116      4       2002  28        9      794   709
11118      4       2002  32        9      689   663
11918      8       2003  33        9      758   700
13294      8       2003  33        9      716   665
15989      4       2002  28        9      705   700
17117      3       2002  27        7      746   726
17117      3       2004  36        9      739   674
19197      2       2002  19        6      654   639
19197      2       2003  23        7      652   648
19197      2       2005  35        9      705   665

After that, I looked at the Mplus manual and this website to find out how to set up an Mplus input file for modeling growth, but I haven't been able to figure it out yet.

Could you help me out?

Thank you.
 Linda K. Muthen posted on Tuesday, October 18, 2005 - 10:02 am
Chapter 6 of the Mplus User's Guide contains many examples of growth models for data in the wide format where data collected for each individual at different time points is represented by a column in the data set. All of these examples and data come on the Mplus CD and are also available on our website under excerpts from the user's guide.

It looks like you have set your data up in the long format where data collected for each individual is represented by a different record. Growth models for data in the long format are estimated in Mplus using TYPE=TWOLEVEL; with CLUSTER=ID;.

In Chapter 13, there is a description of how to handle multiple cohort data under the section called Missing. I'm not sure if this is what you are interested in but you can read that also.
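
To make the long-versus-wide distinction concrete, here is a minimal Python sketch (outside Mplus) that reshapes a few rows of the stacked file shown earlier in the thread into one record per student. The names `long_to_wide`, `GRADES`, and the `-99` missing flag are illustrative assumptions, not anything Mplus requires.

```python
from collections import defaultdict

# Rows copied from the stacked file posted above:
# (StudentID, Cohort, year, SchoolID, grade, math, reading)
long_rows = [
    (13641, 1, 2002, 12, 5, 648, 620),
    (13641, 1, 2003, 12, 6, 698, 635),
    (13641, 1, 2004, 28, 7, 719, 661),
    (17117, 3, 2002, 27, 7, 746, 726),
    (17117, 3, 2004, 36, 9, 739, 674),
    (14293, 4, 2002, 21, 9, 709, 671),
]

GRADES = (5, 6, 7, 9)  # grade 8 was never tested
MISSING = -99          # hypothetical missing-value flag


def long_to_wide(rows):
    """Return {StudentID: record} with one math/reading column per grade."""
    by_student = defaultdict(dict)
    for student, cohort, _year, _school, grade, math, reading in rows:
        by_student[student]["cohort"] = cohort
        by_student[student][f"math{grade}"] = math
        by_student[student][f"read{grade}"] = reading

    wide = {}
    for student, rec in by_student.items():
        row = {"cohort": rec["cohort"]}
        for g in GRADES:
            # Grades a student was never observed at get the missing flag.
            row[f"math{g}"] = rec.get(f"math{g}", MISSING)
            row[f"read{g}"] = rec.get(f"read{g}", MISSING)
        wide[student] = row
    return wide


wide = long_to_wide(long_rows)
print(wide[17117])
```

In the wide layout each grade becomes its own column, so a student observed only at grades 7 and 9 carries the missing flag at grades 5 and 6.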
 Derek Kreager posted on Monday, January 02, 2006 - 9:21 am
Does Mplus allow for Growth Mixture Modelling with longitudinal data collected from multiple cohorts?

I have annual data over five years collected from 8,10,12,14,and 16 year-olds at time 1. I was hoping to look at trajectories of count outcomes using the full dataset. Any suggestions would be greatly appreciated.

Thank you.
 bmuthen posted on Monday, January 02, 2006 - 2:36 pm
Yes.

There are 2 ways to do this. One is to do a multiple-group analysis - which in the case of mixtures uses Knownclass - where cohort is group. The other is to string out the data across all ages represented so that data from each cohort has missing at some ages.
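
The second approach can be pictured with a small Python sketch, assuming the design described in the question (cohorts aged 8, 10, 12, 14, and 16 at time 1, five annual waves). The function names and the example scores are hypothetical.

```python
BASELINE_AGES = (8, 10, 12, 14, 16)  # cohort ages at time 1, from the question
N_WAVES = 5                          # annual waves


def observed_ages(baseline_age, n_waves=N_WAVES):
    """Ages at which a cohort is actually measured."""
    return [baseline_age + wave for wave in range(n_waves)]


# Every age represented by at least one cohort.
all_ages = sorted({a for b in BASELINE_AGES for a in observed_ages(b)})


def string_out(baseline_age, scores, missing=None):
    """Place one person's wave scores into age-indexed columns."""
    by_age = dict(zip(observed_ages(baseline_age), scores))
    return [by_age.get(age, missing) for age in all_ages]


print(all_ages)
print(string_out(10, [3, 1, 0, 2, 4]))  # made-up counts for one person
```

Each person then has one column per age from 8 to 20, with values only at the five ages their cohort was observed and missing everywhere else.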
 Sarah Dauber posted on Wednesday, November 08, 2006 - 12:40 pm
Hello,
I am interested in conducting growth curve analysis with cohort data. Each person was measured at 3 timepoints, and there are 10 different cohorts. So, altogether I have data from age 12 to age 28. I would like to use the approach of stringing out the data so all ages are represented, rather than using the multiple group method, but there is a lot of missing data this way and I am getting very low covariance coverage rates. Is there a coverage rate that is considered optimal?
Also, can you point me to some readings that would provide more info on using MPlus with cohort data?

Thank you,
Sarah Dauber
 Linda K. Muthen posted on Thursday, November 09, 2006 - 10:07 am
Covariance coverage of zero when the missingness is by design is fine. Other than that, I would recommend no less than .8. I know of no references where Mplus has been used with cohort data.
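
For readers who want to check coverage by hand, the following Python sketch computes, for every pair of variables, the proportion of cases with both observed. The toy data set (two cohorts, four age columns, `None` for missing) is invented for illustration.

```python
def covariance_coverage(rows):
    """Proportion of cases with both variables present, for every pair."""
    n = len(rows)
    p = len(rows[0])
    coverage = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            both = sum(
                1 for r in rows if r[i] is not None and r[j] is not None
            )
            coverage[i][j] = both / n
    return coverage


# Columns: age12, age13, age14, age15.
# Cohort A is measured at 12-14, cohort B at 13-15.
rows = [
    (1.0, 2.0, 3.0, None),
    (1.5, None, 2.5, None),   # one value missing that is not by design
    (None, 2.0, 2.5, 3.0),
    (None, 1.0, 2.0, 2.5),
]

coverage = covariance_coverage(rows)
for line in coverage:
    print(["%.2f" % value for value in line])
```

In this toy example the age12 and age15 columns never share a case, so their coverage is zero by design, while the age13 and age14 pair drops below 1 only because of the one unplanned missing value.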
 Patrick Malone posted on Saturday, March 17, 2007 - 10:27 am
Hi. I'm wanting to do growth modeling working with data from Add Health, in which youth were recruited in an agespan from 11-20 (or thereabouts), and interviewed at years 0, 1, and 7.

I know this is a situation for the cohort analysis, but there is substantial missingness that is not by design, which, if I understand it, is treated by listwise deletion using the cohort commands in Mplus.

I'm trying the knownclass = age approach, but the 6 year gap seems to be causing problems. So there is a lot of coverage of zero, as no respondent was measured at age 11 along with any of age 13, 14, 15, 16, or 17.

Help? Thanks.
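
The zero-coverage pattern follows directly from the design and can be enumerated with a short Python sketch. It assumes baseline ages of exactly 11 through 20 and interviews at years 0, 1, and 7, as described above; `partners` is a hypothetical helper name.

```python
from itertools import combinations

BASELINE_AGES = range(11, 21)  # roughly 11-20 at recruitment, per the post
WAVE_OFFSETS = (0, 1, 7)       # interviews at years 0, 1, and 7

# Every pair of ages at which the same respondent can be observed.
co_observed = set()
for start in BASELINE_AGES:
    ages = [start + offset for offset in WAVE_OFFSETS]
    co_observed.update(combinations(ages, 2))


def partners(age):
    """Ages that can share at least one respondent with `age`."""
    return sorted(
        {b if a == age else a for a, b in co_observed if age in (a, b)}
    )


print(partners(11))
print(partners(13))
```

Under these assumptions, age 11 can only be paired with ages 12 and 18, which is why every covariance between age 11 and ages 13 through 17 has no data behind it.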
 Bengt O. Muthen posted on Saturday, March 17, 2007 - 10:41 am
You might try a single-group approach using the AT option to allow individually-varying ages at the 3 occasions.
 Patrick Malone posted on Saturday, March 17, 2007 - 1:28 pm
Didn't think of that, thanks. I'm really hoping for a solution that takes advantage of the accelerated longitudinal design, though, so I can get beyond linear growth. I think I'd still be limited, wouldn't I?

Thanks.
 Bengt O. Muthen posted on Saturday, March 17, 2007 - 2:34 pm
You can do up to cubic with AT. I used this recently for a growth model.
 Patrick Malone posted on Saturday, March 17, 2007 - 3:19 pm
Great, thanks -- I'll give it a try!
 Matt Moehr posted on Monday, March 19, 2007 - 10:25 am
I used the following code in an accelerated design. X1-X3 are the same variables measured at successive time points. Cohort tells the program when the person started, and then I keep it as type = missing. I think you would need to change the lines where I specified the cohort models, e.g.:

Model Cohort13: i s | x1@0 x2@1 x3@7;

**** Begin Code ****
TITLE: GROWTH CURVE MODEL;

DATA: file is ...;

VARIABLE: names are id cohort x1-x3 ;
usevariables are x1-x3 cohort;
grouping is cohort (1=3-0 2=3-9 3=4-6);
missing are all(-99);

ANALYSIS: type = mgroup meanstructure missing ;
estimator = ml;
iterations = 1000;

MODEL: i s | x1@0 x2@1 x3@2;
[x1-x3@0];!x1-x3(1);
![i](2);i(3);
![s](4);s@0;
!i WITH s(6);

MODEL 3-0: i s | x1@0 x2@1 x3@2;
MODEL 3-9: i s | x1@1 x2@2 x3@3;
MODEL 4-6: i s | x1@2 x2@3 x3@4;

OUTPUT: standardized mod(1) tech1 tech4;

PLOT: type = plot1 plot2 plot3;
series = x1-x3(*);
 Linda K. Muthen posted on Monday, March 19, 2007 - 11:22 am
Are you trying to do a multiple group approach to cohort analysis or do you want the program to string the data out over time?
 Patrick Malone posted on Friday, April 13, 2007 - 5:22 am
Following up on our exchange on 3/17:

Does having so few actual measurement variables have an impact on identification? We got the growth model working, but adding predictors of the growth parameters has been causing nothing but trouble.

Thanks.
 Bengt O. Muthen posted on Friday, April 13, 2007 - 9:07 am
I think you said you had 3 occasions but with a lot of missing data. Growth modeling with random intercept and slope should still be supported if enough people have observations on all 3 occasions. I assume you are handling individually-varying ages of observation using the AT option per our earlier communication. And holding residual variances equal across time (since different ages for different people at a given occasion). I don't see off hand how adding covariates would make it more difficult unless covariates have a lot of missingness too.
 Patrick Malone posted on Saturday, April 14, 2007 - 9:58 am
Thanks, Bengt.

Yes, this is with the AT syntax.

The outcomes are 3-category ordinal; the thresholds are constrained to be equal across time. Is there anything else we should do about that?

The covariates are all x-variables, so there is listwise deletion. The sample size is still upwards of 10,000.

Is it time to send this to support?

Thanks.
 Linda K. Muthen posted on Saturday, April 14, 2007 - 10:26 am
Please send the input, data, output, and your license number to support@statmodel.com.
 Sarah Dauber posted on Thursday, August 02, 2007 - 12:11 pm
Hello,
I am trying to model growth in depression scores over time using the Add Health data. People were assessed 3 times, baseline, one year later, and five years later. However, ages ranged from 12-21 at baseline, so that stringing the data out using age as the time variable would theoretically allow you to look at growth from age 12 to 28. When I try to run this (with age as the time variable, data strung out over time), the model doesn't converge b/c there is so much missing data. I have also tried modeling it with just the 3 variables for each person (t1 t2 and t3) and using AT to indicate individually varying times of observation. The model converges this way, but I can't get a plot of the curve across all ages. Is this the correct way to model change in growth across all ages? And if so, how would I get a plot of the curve across all ages?

Thanks so much,

sarah Dauber
 Linda K. Muthen posted on Friday, August 03, 2007 - 10:17 am
The two approaches you are using are the same except with the wide format the residual variances are free across time. With AT they are held equal across time. I would try holding them equal across time in the wide analysis and see if that helps.
 Matthew Cole posted on Saturday, August 11, 2007 - 12:29 pm
Hello,

I am using the DATA COHORT to rearrange my longitudinal data so that I can investigate development over age. My data were collected at 6-month intervals (baseline, 6m, 12m, 18m, 24m). Mplus requires birth year and measurement year, and the actual ages that I'd like represented are 13-18. However, since the waves are in 6 months intervals, I'm not sure how to do that. Therefore, I have set up some arbitrary integers that actually present the data in the age range of 13-20. Therefore, I'll just redo the graph so that the X axis shows the correct age. I'm wondering if there's another way of correctly capturing the 6-month intervals or if I'm doing the correct way? Here's the current code I'm using:

USEVARIABLES ARE copeblw-copefw boys;

DATA COHORT:
COHORT IS ageby (73 74 75 76);
TIMEMEASURES = copeblw (89) copecw (90) copedw (91)
copeew (92) copefw (93);
TNAMES = copew;


ANALYSIS:

TYPE=missing H1 meanstructure;

MODEL:
I S Q | copew13@0 copew14@1 copew15@2 copew16@3 copew17@4
copew18@5 copew19@6 copew20@7;
I S Q ON boys;
 Linda K. Muthen posted on Tuesday, August 14, 2007 - 3:03 pm
I can't think of any other way to do this.
 Matthew Cole posted on Tuesday, August 14, 2007 - 6:04 pm
Thanks!

Instead of working in years (see the code above), I worked in months by multiplying all the year values by 12. This way I captured all the 6 month intervals in age growth from 13 to 18 years old (cope13 cope13_5 cope14 cope14_5 cope15 ...).
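
A small Python sketch of the bookkeeping behind this trick, assuming ages are expressed in whole months on a 6-month grid. The helper `age_label` and the month values are hypothetical; only the naming pattern (cope13, cope13_5, ...) comes from the post.

```python
def age_label(months, stem="cope"):
    """Variable name for the measurement taken at a given age in months."""
    years, remainder = divmod(months, 12)
    if remainder not in (0, 6):
        raise ValueError("ages are assumed to fall on a 6-month grid")
    return f"{stem}{years}" + ("_5" if remainder == 6 else "")


# Hypothetical person who is exactly 13 years (156 months) old at baseline
# and is measured at baseline, 6m, 12m, 18m, and 24m.
wave_ages = [156 + 6 * wave for wave in range(5)]
labels = [age_label(m) for m in wave_ages]

# Full grid of half-year ages from 13 to 18 years.
grid = [age_label(m) for m in range(13 * 12, 18 * 12 + 1, 6)]

# Time scores in years relative to age 13, for use as growth loadings.
time_scores = [(m - 156) / 12 for m in wave_ages]

print(labels)
print(time_scores)
```

Working in months keeps every age an integer, while the half-year spacing is preserved in the time scores.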
 Matthew Cole posted on Saturday, September 01, 2007 - 7:28 pm
I've been using the DATA COHORT to rearrange my longitudinal data, and I received the nonconvergence message below. I am curious if there is a way to get my data to converge so that I can plot the means. If not, at least Mplus is providing the savedata file so I'll be able to put a figure together using another program.

THE MISSING DATA EM ALGORITHM FOR THE H1 MODEL HAS NOT CONVERGED WITH RESPECT TO THE PARAMETER ESTIMATES. THIS MAY BE DUE TO SPARSE DATA LEADING TO A SINGULAR COVARIANCE MATRIX ESTIMATE.

INCREASE THE NUMBER OF H1 ITERATIONS.

NOTE THAT THE NUMBER OF H1 PARAMETERS(MEANS, VARIANCES, AND COVARIANCES) IS GREATER THAN THE NUMBER OF OBSERVATIONS.
NUMBER OF H1 PARAMETERS : 209
NUMBER OF OBSERVATIONS : 166
 Linda K. Muthen posted on Sunday, September 02, 2007 - 7:52 am
The message refers to the convergence of the H1 model not the H0 model. This means that you do not get fit statistics. You could try the suggestion of increasing the H1 iterations.

It sounds like your sample may be too small for the intended modeling.
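
The size of the H1 model can be checked with a few lines of Python, assuming the count in the message is p means plus p(p+1)/2 variances and covariances for p analysis variables. The function name is illustrative.

```python
def h1_parameter_count(p):
    """Free parameters in an unrestricted mean and covariance model."""
    means = p
    variances_and_covariances = p * (p + 1) // 2
    return means + variances_and_covariances


n_observations = 166   # from the Mplus message
reported = 209         # from the Mplus message

# Find the number of variables that reproduces the reported count.
p = next(k for k in range(1, 100) if h1_parameter_count(k) == reported)
print(p, h1_parameter_count(p) > n_observations)
```

Under that assumption, 209 parameters corresponds to 19 variables, so the unrestricted model asks for more parameters than there are observations.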
 Matthew Cole posted on Sunday, September 02, 2007 - 3:09 pm
Thanks Linda. That did it! I set H1iterations=10000 and it finally fit. Fortunately with the new Duo processors and setting process=2 the run doesn't take that long.
 giovanna caprara posted on Wednesday, October 24, 2007 - 7:04 am
I’ve done a multiple cohorts growth curve.
In particular I have two cohorts:

Older Cohort: 15 years old 1998/ 17 yo 2000/ 19 yo 2002/ 21 yo 2004
Younger Cohort: 16 years old 1998/ 18 yo 2000/ 20 yo 2002/ 22 yo 2004

I now want to add predictors and outcomes.
Can I add them even if each variable is not measured at the same age?
For example I want to add as predictor “qda” measured in ’98 (15 year old for the younger cohort and 16 for the older one). Is this procedure correct?

This is part of the input

USEV ARE
sex md98 md20 md02 md04 qdr98 qdd98;

Im Sm | md98@0 md20@2 md02@4 md04@6;
im with sm (21);
Sm (22);
im (23);
[sm] (24);
[im] (25);
sm on sex (26);
im on sex (27);
qda by qdd98 qdr98;

im on qda (28);
sm on qda (29);
qda on sex (30);
model older:
Im Sm | md98@1 md20@3 md02@5 md04@7;


Could you help me???

thanks
 Bengt O. Muthen posted on Wednesday, October 24, 2007 - 8:16 am
I think you have a good start here. Note that the introduction to your message has older and younger reversed (those who are 16 in 1998 are older than those who are 15 in 1998). You have it right in the model. Your equality restrictions look right.

Since you don't measure your qda indicators at the same age for the 2 cohorts, you may want to test that the equalities across cohorts related to this factor actually fit well by also running the model with them unequal. Even if they are unequal, the parameters related to the growth factors may be equal. It is of interest to test if parameters related to the growth factors are invariant across cohorts.
 giovanna caprara posted on Wednesday, October 24, 2007 - 10:01 am
THANK YOU VERY MUCH!!!

I'VE DONE AGAIN THIS MODEL WITH THE EQUALITIES AND ADDING THE OUTCOME.
ALL THE PARAMETERS ARE INVARIANT ACROSS COHORTS. IN THIS WAY I CAN ALSO DISCUSS THE IMPACT ON THE OUTCOME CONSIDERING THE PREDICTOR.

USEV ARE
sex md98 md20 md02 md04 qdd98 qdr98 VAS04R;
missing are all (99.00);
grouping is coorte (1=younger 2=older);
Analysis:
Type = MEANSTRUCTURE MISSING ;
ESTIMATOR = mlR;
model:

Im Sm | md98@0 md20@2 md02@4 md04@6;
im with sm (21);
Sm (22);
im (23);
[sm] (24);
[im] (25);
sm on sex (26);
im on sex (27);
qda by qdd98 ;
qda BY qdr98 (31) ;
VAS04R ON IM (32);
VAS04R ON SM (33);
VAS04R ON SEX (34);
VAS04R ON QDA (35);
im on qda (28);
sm on qda (29);
qda on sex (30);
model older:
Im Sm | md98@1 md20@3 md02@5 md04@7;
 Sylvana Robbers posted on Wednesday, February 06, 2008 - 1:42 am
Hi Bengt and Linda,

I am working on a multiple cohorts growth curve, but there is substantial missingness that is not by design. I know your recommendation is to do listwise deletion first, but when I do that I will only keep about 30% of my cases, which is not really an option.

My questions are:
1. is it really necessary to do listwise deletion first?
2. with both missingness by design and missingness at random, will Mplus estimate the model incorrectly (even though it runs properly)? In what way then?
3. if listwise deletion is the only solution, does it matter if I perform listwise myself in SPSS first, or is it better (for model estimation) to let Mplus do that using the DATA COHORT command?

I know TSCORES is a good way to overcome this problem (that also runs properly), but then I am not able to make a plot.


Thanks.

Sylvana
 Linda K. Muthen posted on Wednesday, February 06, 2008 - 10:42 am
I'm not sure why you think we recommend listwise deletion. We don't. You can use the multiple group multiple cohort approach shown in the new Example 6.18 or you can string the data out by age and not use TSCORES.
 Sylvana Robbers posted on Thursday, February 07, 2008 - 8:09 am
I mean that when I string the data out by age, I will get missingness by design next to missingness at random. Is this allowed? (or is listwise deletion necessary?)

Using the DATA COHORT command, each observation that does not have complete data is deleted from the data set (page 350 userguide version 4.1), that was the reason why I thought you recommend listwise deletion.

I hope you can clear this up for me. Thanks in advance.

Sylvana
 Linda K. Muthen posted on Thursday, February 07, 2008 - 8:34 am
You can string the data out with or without doing listwise deletion within each pattern of variables. This is your choice. If you don't want listwise deletion within each pattern, you would have to do the analysis in two steps. In the first step, save the data without using the MISSING option. In the second step, use the MISSING option and do the analysis.
 Sylvana Robbers posted on Friday, February 08, 2008 - 3:41 am
Example 6.18 is quite helpful, however then I get separate growth curves for all cohorts instead of 1 overall growth curve.
The two-step analysis you mention is maybe a better solution. But what do you mean by saving the data without using the missing option? What is the difference then between that file and the raw data file? Sorry if these are stupid questions, but I just don't get what you're saying.

Thanks in advance.
 Linda K. Muthen posted on Friday, February 08, 2008 - 6:10 am
You do not get separate growth curves for each cohort. If you look at the example, you will see that each cohort contributes to part of the growth curve for which you obtain one intercept and one slope growth factor mean, variance, and covariance due to the equalities that are imposed on these parameters. Take a moment to thoroughly go through the input and also look at the output to see which parameters are estimated.

If you do not use the MISSING option in the first step where you use DATA COHORT to string out the data, then there will be no listwise deletion because no value will be considered missing.
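
The effect of the two-step idea can be mimicked in Python. This is only a sketch of the logic as described in this thread, not of how Mplus implements it; the data, the `-99` flag, and the function `string_out` are all hypothetical.

```python
MISSING_FLAG = -99  # hypothetical missing-value code

YEARS = (2000, 2001, 2002)    # measurement years
AGES = list(range(14, 18))    # ages covered by the two birth cohorts below

# (birth year, scores at the three measurement years)
people = [
    (1985, (10, 12, 13)),
    (1985, (11, MISSING_FLAG, 14)),  # missing that is not by design
    (1986, (9, 10, 12)),
]


def string_out(people, flag_is_missing):
    """Rearrange wave scores into age columns.

    If flag_is_missing is True, a record with any flagged value is dropped
    (listwise deletion within the cohort pattern). If False, the flag is
    treated as an ordinary number and every record is kept.
    """
    rows = []
    for birth_year, scores in people:
        if flag_is_missing and MISSING_FLAG in scores:
            continue
        by_age = {year - birth_year: s for year, s in zip(YEARS, scores)}
        rows.append([by_age.get(age, MISSING_FLAG) for age in AGES])
    return rows


kept_with_deletion = string_out(people, flag_is_missing=True)
kept_without_deletion = string_out(people, flag_is_missing=False)
print(len(kept_with_deletion), len(kept_without_deletion))
print(kept_without_deletion[1])
```

In the second step the saved file would be analyzed with the flag declared as missing, so the incomplete record still contributes the values it does have.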
 giovanna caprara posted on Wednesday, February 20, 2008 - 7:57 am
hi,
I’ve done a growth model for two parallel processes for continuous outcomes with regression among the random effects and predictors using a multiple cohorts growth curve approach.
In particular I have two cohorts:

Younger Cohort: 15 years old 1998/ 17 yo 2000/ 19 yo 2002/
Older Cohort: 16 years old 1998/ 18 yo 2000/ 20 yo 2002/

thus the two growth curves are from age 15 to 20 years.

Could you suggest some references in which this approach was used, or references that can help me describe the results?

thank you
 Linda K. Muthen posted on Wednesday, February 20, 2008 - 8:18 am
We used a multiple cohort approach in the following paper:

Muthén, B. & Muthén, L. (2000). The development of heavy drinking and alcohol-related problems from ages 18 to 37 in a U.S. national sample. Journal of Studies on Alcohol, 61, 290-300.
 giovanna caprara posted on Wednesday, February 20, 2008 - 8:35 am
Thank you!!!!!
 J.Reef posted on Friday, May 02, 2008 - 8:07 am
I would like to use a multiple group multiple cohort growth model for two parallel processes..is that possible?

If so, could you suggest some references in which this model was used?

Thanks in advance.
 Linda K. Muthen posted on Saturday, May 03, 2008 - 9:16 am
Yes, this is possible. I don't know of any papers where this has been done.
 Reef posted on Wednesday, May 07, 2008 - 12:52 am
Dear Bengt and Linda,
I would like to do a multiple group multiple cohort growth model for two parallel processes.

The reason that I want to use parallel processes is that
· I want to estimate a model for two different instruments (child and adolescent).

I want to use multiple cohort because
· I have an accelerated design.

This means that for each cohort, the measurement points for the parallel processes will differ. In some cohorts, not all variables will have observations because the cohort is ‘too old’ for the instrument.

Is this possible with this analysis in Mplus?

Thank you.
 Linda K. Muthen posted on Wednesday, May 07, 2008 - 8:21 am
You should be able to do this. You would need to have the same set of variables in each group which may result in a problem with zero variances. I think this can be overcome by including VARIANCES=NOCHECK in the DATA command.
 Jeff Cookston posted on Wednesday, May 07, 2008 - 2:47 pm
I have data for a cohort-sequential design with 4 groups. See below for the spacing of the measurements in months.
Cohort 1: 0, 2, 6
Cohort 2: 0, 4, 6
Cohort 3: 1, 3, 7
Cohort 4: 1, 5, 7

Does my syntax look correct?

MODEL:
acpic_i by acpic@1 bcpic@1 ccpic@1;
acpic_s by acpic@0 bcpic@2 ccpic@6;
acpic_i with acpic_s (20);
acpic_i (21);
acpic_s;
[acpic_i] (23);
[acpic_s] (24);

MODEL fall9:
acpic_i by acpic@1 bcpic@1 ccpic@1;
acpic_s by acpic@0 bcpic@4 ccpic@6;

Model spring8:
acpic_i by acpic@0 bcpic@0 ccpic@0;
acpic_s by acpic@1 bcpic@3 ccpic@7;

Model spring9:
acpic_i by acpic@0 bcpic@0 ccpic@0;
acpic_s by acpic@1 bcpic@5 ccpic@7;
 Linda K. Muthen posted on Wednesday, May 07, 2008 - 5:54 pm
This looks correct with the exception that in the last two groups you have the intercept growth factor loadings fixed to zero instead of one. I would also use the special | growth language instead of BY because the defaults are more appropriate to a growth model.
 Jeff Cookston posted on Thursday, May 08, 2008 - 2:03 pm
Thanks for taking a look, but I guess I'm still a bit unclear on whether changing those final two models to have intercept factor loadings of 1 allows us to maximize the strength of our cohort-sequential design. Even though we only sample each participant three times, we obtain 8 data points across the four cohorts (0, 1, 2, 3, 4, 5, 6, 7). Is it possible to model growth AS IF there were 8 points across all the participants?
 Linda K. Muthen posted on Thursday, May 08, 2008 - 2:28 pm
In a growth model, the loadings for the intercept growth factor are one. That is part of the model parameterization. The way you have the model set up is as if the data are across 8 timepoints. See Example 6.18 for a full description of the multiple group multiple cohort model.
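
The parameterization can be laid out in a short Python sketch, assuming the four cohorts and month spacings from the question. The group name `fall8` for the first cohort is a guess, since only fall9, spring8, and spring9 are named in the posted input.

```python
# Months at which each cohort is measured (from the question).
COHORT_MONTHS = {
    "fall8": (0, 2, 6),    # hypothetical name for the first group
    "fall9": (0, 4, 6),
    "spring8": (1, 3, 7),
    "spring9": (1, 5, 7),
}


def growth_loadings(months):
    """Loadings for a linear growth model: intercept always 1, slope = time."""
    intercept = [1] * len(months)
    slope = list(months)
    return intercept, slope


loadings = {name: growth_loadings(m) for name, m in COHORT_MONTHS.items()}

# Distinct timepoints covered once all cohorts are combined.
timepoints = sorted({m for months in COHORT_MONTHS.values() for m in months})

for name, (intercept, slope) in loadings.items():
    print(name, intercept, slope)
print(timepoints)
```

Only the slope loadings differ between cohorts; with the growth factor parameters held equal across groups, the four sets of three occasions together span months 0 through 7.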
 C. Sullivan posted on Friday, December 12, 2008 - 12:59 pm
I have three cohorts measured at three waves each and would like to be able to also assess neighborhood variance (and potential effects) on growth factors. Is it possible to estimate a multiple cohort growth model within a multilevel framework? Specifically, could a model like that shown in example 6.18 be run in the multilevel framework (like ex. 9.12)?
 Linda K. Muthen posted on Sunday, December 14, 2008 - 10:38 am
The GROUPING option is available with TYPE=TWOLEVEL when outcomes are continuous.
 C. Sullivan posted on Tuesday, December 23, 2008 - 12:13 pm
Two other quick questions on the multiple cohort, multilevel growth model

In the time structuring...if I have two ages with no coverage, would I just set the rest of the scores as usual (i.e., y1@0, y2@2, y3@3 if there was no coverage at the second interval)?

I'm trying to run a MC model for the Twolevel, grouped growth model, but I'm not getting any estimates and the output is telling me that none of the repetitions that I requested were completed. Would that more likely be the result of a setting being incorrect or model misspecification?

Thanks.
 Linda K. Muthen posted on Tuesday, December 23, 2008 - 12:38 pm
Yes, regarding the time scores.

Please send your output and license number to support@statmodel.com.
 Hsien-Yuan Hsu posted on Wednesday, February 25, 2009 - 2:02 pm
Dear Dr. Muthen

I have a quick question.
I am trying to conduct a multiple group multiple cohort model. EX.6.18 is a good example.

However, I wonder whether I can estimate random slopes for time-varying covariates for continuous outcomes (just like EX 6.12) in a multiple group multiple cohort model.

Thanks.
Mark
 Linda K. Muthen posted on Thursday, February 26, 2009 - 10:00 am
Yes but you would need to use TYPE=MIXTURE and the KNOWNCLASS option to do this. The GROUPING option is not available with TYPE=RANDOM;
 csulliva posted on Thursday, June 17, 2010 - 3:22 pm
1. Is the known class multiple cohort approach to growth modeling equivalent to the cohort group based approach (ex. 6.18)? I received some warnings on the model’s identification with the latter-- but not the former--and was a bit unclear on the potential source for that discrepancy.
2. Also, in response to a question above it is mentioned that “each cohort contributes to part of the growth curve for which you obtain one intercept and one slope growth factor…” Does this mean that it is appropriate to plot the outcome across age for the full sample—as opposed to a series of separate plotted lines for each known class (cohort)?
 Bengt O. Muthen posted on Thursday, June 17, 2010 - 3:39 pm
1. Yes.

2. Yes.
 Nicolas M posted on Saturday, July 17, 2010 - 4:47 pm
Dear Professors,

I'm doing a growth curve analysis on 9-wave panel data. Individuals in it have very different ages (going from 16 to 80). I think I'm using what you call the "wide" format, where for each individual I have
age1 outcome1 age2 outcome2 age3 outcome3 ...
20 1 21 5 22 6

I defined the ages as TSCORES using "TSCORES are age1-age9;". However, I have convergence problems. I managed to solve them by standardizing all the age variables using the following operation:
age1standard. = (age1 - mean(all_ages))/sd(all_ages)
age2standard. = (age2 - mean(all_ages))/sd(all_ages)
etc.

Now, the model converges.
Do you think this is a proper way to solve this problem? Can you see any reason for not doing that?

Thanks in advance for your advice.
 Linda K. Muthen posted on Saturday, July 17, 2010 - 5:41 pm
I would not standardize. I would divide age by a constant such as ten.
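
A minimal Python sketch of the difference between the two rescalings, using the example row from the question. The pools of ages are invented, and the sketch only shows one visible difference between the options; it is not a statement of why one converges better.

```python
from statistics import mean, pstdev

ages = [20, 21, 22]  # the example row from the question


def rescale(values, constant=10):
    """Divide by a fixed constant; zero stays zero, spacing is preserved."""
    return [v / constant for v in values]


def standardize(values, pool):
    """Center and scale using the mean and SD of a pool of ages."""
    m, s = mean(pool), pstdev(pool)
    return [(v - m) / s for v in values]


# Two hypothetical pools of ages the same person could be standardized against.
pool_a = [16, 20, 21, 22, 80]
pool_b = [20, 21, 22]

print(rescale(ages))
print(standardize(ages, pool_a))
print(standardize(ages, pool_b))
```

Dividing by ten gives the same time scores for a person regardless of who else is in the sample, whereas standardized scores shift with the sample mean and SD.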
 Nicolas M posted on Tuesday, July 20, 2010 - 2:13 pm
Thank you for your answer.
I did try to divide the age by a constant, but I still have major convergence problems.
I was thinking, as all observations are equally spaced, is it reasonable to simply use:

i s | f1@0 f2@1 f3@2 ...
and then controlling for the starting age :
i s ON age0

instead of using the TSCORES command? Or is it not a good idea?
Actually what I like with this method is that mplus doesn't use the EM algorithm for numerical integration, so it is much faster.
The model has a good fit. But I need to be sure it is statistically correct...
 Linda K. Muthen posted on Wednesday, July 21, 2010 - 10:10 am
I think using TSCORES is preferred. You could also consider multiple group multiple cohort as shown in Example 6.18. If you send the output where you failed with TSCORES and your license number to support@statmodel.com, we can see if we can help.
 Melvin C Y posted on Monday, September 06, 2010 - 8:39 pm
Dear Dr Muthen,

I have similar measures obtained from two cohorts (group1=10-12 years; group2=13-16 years). As there is no common age or linking data between cohorts, would I still be able to use the multiple cohort LGM (i.e., 10-17 years)? Would you suggest piecewise model instead?

Thank you.
 Bengt O. Muthen posted on Tuesday, September 07, 2010 - 8:22 am
You can use piece-wise and see if the two pieces align.
 Andy Ross posted on Thursday, February 24, 2011 - 2:13 am
Dear Prof Muthen

I’d like to estimate a linear growth model for a categorical outcome and wanted to use the multiple cohort option; however, I'm under the impression that this option is not available with categorical outcomes. Is that correct?

I'm modelling gang membership over three time points using data that contains young people aged 11, 12, 13, 14, 15, 16, 17 at time point one. I wanted to use this approach as the alternatives appear severely limited by the number of time points - i.e. the standard LGM would only allow a linear model which is not supported by the data - not to mention the fact that I would like to capture the age crime curve.

It would also be useful to estimate different trajectories, i.e. adolescent limited and persistent offenders, as far as they may exist for gang membership - does the multiple cohort option allow this? I did also consider using LCA as an alternative but am limited by the number of classes I can estimate before the model is not identified.

Can you offer any suggestions or am I simply asking too much of the data?

Many thanks for your support

Andy
 Linda K. Muthen posted on Thursday, February 24, 2011 - 9:47 am
Multiple group multiple cohort analysis as shown in Example 6.18 can be used with categorical outcomes.
 Andy Ross posted on Thursday, February 24, 2011 - 10:09 am
Many thanks

Maybe I'm setting the model up wrong? When I use the VARIABLE command to specify that the newly created observed measures should be categorical, I get the following warning:

Observed outcomes in a growth process must be measured on the same scale.
Problem with: I S Q
 Linda K. Muthen posted on Thursday, February 24, 2011 - 10:36 am
Please send your output and license number to support@statmodel.com.
 Jing Zhang posted on Tuesday, May 03, 2011 - 12:11 pm
Dear Professor Muthen,

You mentioned that there are two ways to handle multiple cohort data: 1) a multiple group approach to cohort analysis; and 2) having the program string the data out over time.

I have several questions:
1) If the data are missing by design, e.g. for some cohorts, the data were not collected at certain time points of the survey, can I still use a multiple group approach to cohort analysis as indicated in example 6.18?
2) I am doing a three-level multiple cohort growth curve model for my research. The data are missing by design; an example is shown below. Can I still follow Example 6.18, or should I string the data out over time? Do you have an example of the syntax for multilevel multiple cohort growth curve modeling with the data strung out over time?

Note: cohort 1 does not have data on y1 and y2, and cohort 2 doesn’t have data on y2.

y1 y2 y3 x1 x2 x3 cohort
x x 7 2 5 3 1
x x 6 1 3 4 1
x x 5 2 5 6 1
x 3 5 1 6 7 2
x 2 3 4 3 2 2
x 1 3 6 7 3 2
5 3 2 1 8 9 3
3 4 7 5 1 8 3
3 5 8 2 4 6 3

Thanks,
Jing
 Bengt O. Muthen posted on Tuesday, May 03, 2011 - 3:04 pm
The multiple-cohort, multiple-group approach is not straightforward in Mplus when the cohorts have different number of observed time points. So I would string out the data. Multilevel does not cause any extra difficulty as far as I can see; I don't have an example.
 Linda K. Muthen posted on Tuesday, May 03, 2011 - 5:24 pm
Use DATA LONGTOWIDE to string out the data. See the user's guide.
 Jing Zhang posted on Thursday, May 05, 2011 - 12:46 pm
Dear Dr Muthen,

Thanks for your answers. I have a further question about DATA LONGTOWIDE.

My data set is already in wide format. Can I still use DATA LONGTOWIDE to string out the data? Maybe the way I present my data set caused confusion. The data set is as follows:

subjectNO. y1 y2 y3 cohort
10001 x x 7 1
10002 x x 6 1
10003 x x 5 1
10004 x 3 7 2
10005 x 5 8 2
10006 x 2 0 2
10007 1 5 7 3
10008 2 8 9 3
10009 5 9 2 3

note: x represents missing data, y1-y3 represent data collected at three waves

I wrote the following code:

VARIABLE: NAMES = y1-y3 g;
GROUPING = g (1 = cohort 6, 2 = cohort 9, 3=cohort 12);
COPATTERN = cohort (1=y3 2=y2 y3 3=y1 y2 y3)
TIMEMEASURES= y1(1994) y2(1997) y3(2000)
TNAMES=int

The results keep saying "UNKNOWN OPTION: COPATTERN".

I wonder why? Could you give some inputs?

Thanks,
Jing
 Linda K. Muthen posted on Thursday, May 05, 2011 - 1:39 pm
If your data are in the wide format, you don't need to do anything to string them out.

The COPATTERN option is part of the DATA COHORT command. See the current user's guide.
 csulliva posted on Friday, May 27, 2011 - 2:43 pm
I conducted a multiple cohort growth model using the known class option (three cohort groups) and found that the model with equality constraints was of poorer fit than the unconstrained model.

1. This would suggest that I would need to account for those groups (cohort) effects throughout my analysis. Is this correct?

2. Does that necessitate freeing the estimates for the growth factors, residual variances/covariances, and any covariate effects across groups?

3. If so, are there any tractability/estimation issues in particular that need attention in this process? I have run a test model freeing those parameters and a covariate effect and have had difficulty with convergence. Would this just be a matter of increasing the number of random starts or MIterations?

Thanks in advance for any advice you can offer.
 Bengt O. Muthen posted on Friday, May 27, 2011 - 4:51 pm
1. Yes

2. Yes, but you may only have to free a few critical parameters.

3. Try using modification indices in the model with full equality to see which parameters are not equal.
 Carolyn Tompsett posted on Tuesday, September 20, 2011 - 7:55 am
Drs. Muthen,
I have a longitudinal dataset with seven waves (base, six months, 12 months, 18 months, 4.5 years, 5.5 years, 6.5 years), with youth who were aged 13-17 at baseline. I need to use TSCORES to deal with individual variability around each wave, and a zero-inflated Poisson distribution to deal with the high number of zeros in the outcome. I would like to use age, rather than time since baseline, as the time variable, so I created TSCORES variables age0, age1, age2, etc… representing their age at each wave of data collection. In addition, I am trying to test for cohort effects given the 13-17 range at baseline. Does the syntax below make sense? In particular, I’m concerned about whether I am structuring the TSCORES correctly.
Thanks in advance,
Carolyn


USEVAR ARE AGECOH AGECEN0 AGECEN1 AGECEN2 AGECEN3 AGECEN4 AGECEN5 AGECEN6
ALSYCT0 ALSYCT1 ALSYCT2 ALSYCT3 ALSYCT4 ALSYCT5 ALSYCT6;

COUNT=ALSYCT0-ALSYCT6 (i);

MISSING ARE ALL (999);

TSCORES ARE AGECEN0 AGECEN1 AGECEN2 AGECEN3 AGECEN4 AGECEN5 AGECEN6;

ANALYSIS: TYPE= RANDOM MISSING;
PROCESSORS=2;

MODEL:
i s | alsyct0-alsyct6 AT agecen0-agecen6;
ii si | alsyct0#1-alsyct6#1 AT agecen0-agecen6;
i ON agecoh;
s ON agecoh;
ii ON agecoh;
si ON agecoh;
 Linda K. Muthen posted on Tuesday, September 20, 2011 - 10:15 am
I would take the mean of all of the time score variables (AGECEN0 AGECEN1 AGECEN2 AGECEN3 AGECEN4 AGECEN5 AGECEN6) and subtract that mean from each time score variable.
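The centering step suggested above could be done in a data-preparation script before the file is read into Mplus. This is a minimal Python sketch under one reading of the advice: pool every observed age across all subjects and waves, then subtract that single mean from each time score. The ages and the function name are hypothetical; the flag 999 follows the MISSING statement in the posted input.

```python
MISSING = 999  # same flag as MISSING ARE ALL (999) in the posted input


def grand_mean_center(rows, missing=MISSING):
    """Subtract the pooled mean of all observed time scores from each score.

    rows: one list of time scores (e.g. ages at each wave) per subject.
    Missing entries are left untouched.
    """
    observed = [v for row in rows for v in row if v != missing]
    grand_mean = sum(observed) / len(observed)
    centered = [
        [v if v == missing else v - grand_mean for v in row]
        for row in rows
    ]
    return grand_mean, centered


# Hypothetical ages at three waves for three subjects.
ages = [
    [13.0, 13.5, 14.0],
    [15.0, 15.5, MISSING],
    [17.0, 17.5, 18.0],
]

grand_mean, centered_ages = grand_mean_center(ages)
print(grand_mean)        # pooled mean of the eight observed ages
print(centered_ages[0])  # first subject's centered time scores
```

With this centering, a time score of zero corresponds to the average age over all observation points, so the intercept growth factor refers to that age.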
 Christopher Bratt posted on Wednesday, February 22, 2012 - 7:28 am
Hi

I am trying to develop a multivariate MLGM (using Bayesian estimation). The challenge is that data consist of different cohorts.

- I have two cohorts with data at three time points each (school grades 8, 9, and 10).
- I have two cohorts with data at two time points (school grades 8 and 9 OR school grades 9 and 10, so these cohorts have missingness by design).

My idea is that some of the analyses should combine all four cohorts into one analysis of school grades 8 to 10 to increase sample size.

I see there are a few options for cohort analyses in Mplus. But I should try to do this wisely. I would want to test for time effects (e.g. simulating the possibility of unknown historical events). This means that a measurement in any of the three specific years where measurements were conducted can affect scores, for one cohort this effect will be in grade 8, for another cohort the effect will be in grade 9 while for yet another cohort the effect will be in grade 10.

Anyone of the wonderful Mplus team -- or any other in the Mplus community -- do you have suggestions on how best to develop this model?

Kind regards,
Christopher Bratt
 Christopher Bratt posted on Wednesday, February 22, 2012 - 7:32 am
CORRECTION:

I have FOUR measurements (four years where measurements were conducted).

These are used for growth models with three time points (maximum number of measurements for one cohort), measurements in the model are equal to school grades 8 to 10.

Chris
 Bengt O. Muthen posted on Wednesday, February 22, 2012 - 10:09 am
I would do a multiple-group analysis, with cohort as group and grade as time axis for the growth model. With Bayes, you do multiple-group via Knownclass in Type=Mixture. Testing for time effects may be more tricky. Although an event in a certain year influences subjects in different grades for different cohorts, we don't know if that same event has a different influence for students in different grades. There is a large literature on age-period-cohort analyses. But in principle you can let the event effect be restricted to have the same magnitude in the different cohorts (for the different grades), for instance by letting an intercept of the outcome at that point jump out of line of a linear growth model.
 Christopher Bratt posted on Wednesday, February 22, 2012 - 10:27 am
Thanks, Bengt. So you would not (also) do a model with all cohorts and try to account for cohort effects within that model. A multiple-group analysis seems to give me only data for the two cohorts with measurements at all three time points. I wanted to add an analysis with all my data (four cohorts) to check whether increasing the number of cohorts and the sample size changed anything (but this gives missingness by design).
 Bengt O. Muthen posted on Wednesday, February 22, 2012 - 11:07 am
You can do a single-group run of all cohorts as well, although then investigation of cohort differences is not as flexible. The multi-group approach can handle different numbers of observed variables per our FAQ, but I haven't tried something like that with Bayes.
 Christopher Bratt posted on Thursday, February 23, 2012 - 2:40 pm
Bengt, just a follow up on this brief dialogue:

*** WARNING in ANALYSIS command
Estimator BAYES is not allowed with TYPE=TWOLEVEL MIXTURE.
Default estimator will be used.

You will know that. But I thought I should add it for other readers.
 Matt Easterbrook posted on Thursday, May 10, 2012 - 9:34 am
Hi,

I have data from an accelerated cohort design with 4 cohorts, measured at three time points. I want to look at temporal antecedents for my outcome. Is the only way to do this to use X1 and X2 to predict Y2 and Y3, or is there a way I can use the accelerated cohort design too?

Thanks
 Linda K. Muthen posted on Thursday, May 10, 2012 - 1:20 pm
A multiple cohort design can be used with any model.
 Nicholas Bishop posted on Monday, September 10, 2012 - 6:28 pm
Hello,
I am interested in running a multiple-group multiple-cohort model similar to example 6.18, although I would like to use time scores rather than measurement occasion for my time points. In a multiple-group multiple-cohort model, do I need to center the individually-varying time-scores (age) for each of the groups (cohorts) at initial measurement, or would I grand-mean center the time scores for all groups? Thanks for your assistance.
 Bengt O. Muthen posted on Monday, September 10, 2012 - 6:58 pm
You want to center time with respect to the full time range, not specifically for each cohort.
 Nicholas Bishop posted on Wednesday, September 12, 2012 - 11:29 am
Thank you for your quick response. To elaborate on my question, would I want to center time on the full time range in both of the following scenarios?

a) accelerated cohort design (example 6.18)
b) multiple group LGM using time scores as time points and using cohort as the grouping variable
 Linda K. Muthen posted on Thursday, September 13, 2012 - 10:47 am
You would want to center time on the full time range in both a and b.
 Nicholas Bishop posted on Sunday, September 16, 2012 - 3:42 pm
In response to a recent post in this thread, Linda suggested that the user should center time scores representing age on the mean age taken across all observation points. I have been under the assumption that time scores representing age should be centered on the mean age at the first observation. Is one of these methods "correct" or are there certain situations where one method is preferable to the other? As always, thank you for your guidance.
 Bengt O. Muthen posted on Monday, September 17, 2012 - 7:55 am
I think centering choice is largely determined substantively. That is, which age do you want the intercept factor to represent? But in some cases the correlations between the growth factors can get uncomfortably close to 1 in which case average age centering can help to make them less correlated.
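Both points above can be illustrated for a linear growth model with standard algebra: moving the zero point of the time scores changes what the intercept factor represents, and it also changes the intercept-slope covariance. The Python sketch below applies those identities to made-up numbers. The parameter values and function names are assumptions for illustration, not estimates from any model in this thread.

```python
def recenter(mean_i, mean_s, var_i, var_s, cov_is, shift):
    """Re-express linear growth factor moments after moving the time origin.

    shift = new_center - old_center, in time-score units.
    The slope factor is unchanged; the new intercept is i + shift * s.
    """
    new_mean_i = mean_i + shift * mean_s
    new_var_i = var_i + 2 * shift * cov_is + shift ** 2 * var_s
    new_cov_is = cov_is + shift * var_s
    return new_mean_i, new_var_i, new_cov_is


def correlation(var_i, var_s, cov_is):
    """Correlation between the intercept and slope factors."""
    return cov_is / (var_i * var_s) ** 0.5


# Hypothetical growth factor moments with a high intercept-slope correlation.
mean_i, mean_s, var_i, var_s, cov_is = 3.0, 0.5, 4.0, 1.0, 1.8
corr_before = correlation(var_i, var_s, cov_is)

# Move the time origin by -1.8 time-score units (an arbitrary illustrative shift).
new_mean_i, new_var_i, new_cov_is = recenter(
    mean_i, mean_s, var_i, var_s, cov_is, shift=-1.8
)
corr_after = correlation(new_var_i, var_s, new_cov_is)

print(corr_before, corr_after)
```

In this made-up case the shift was chosen to remove the covariance entirely; in practice the center is usually picked for substantive reasons, with the correlation as a secondary consideration.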
 Nicholas Bishop posted on Tuesday, September 18, 2012 - 4:38 pm
Thank you again.

I've run a multiple group multiple cohort analysis, using KNOWNCLASS due to the use of time scores. I am not sure if there are problems with my model or if I need to rethink my interpretation of group specific intercepts.

My main question is regarding the interpretation of the intercept for each cohort. Currently the time scores are age at each observation centered on the grand mean for age at time 0, with this value divided by 10 to reduce the range of time scores.

The problem is that I am receiving estimated intercepts that are outside the range of possible values for the outcomes. For example, the estimated intercept for my outcome for the oldest cohort (cohort 1) was 24, which is far off the possible range of values for the outcome (mean = 3, sd = 3, range = 0-11). The mean value of the time score for the oldest cohort was 1.4. My understanding is that the intercept should be interpreted as the mean value of my outcome at the mean age at time 0. Should the group-specific intercepts be interpreted in this manner? If so, would these intercepts suggest non-linearity that may require quadratic terms?


FYI, here is the mean age of each cohort at time 0:

Cohort 1: 79
Cohort 2: 70
Cohort 3: 60
Cohort 4: 52
 Linda K. Muthen posted on Wednesday, September 19, 2012 - 11:57 am
You should be estimating one intercept growth factor and one slope growth factor. There should be equalities of these parameters across cohorts. Please see Example 6.18.
 Nicholas Bishop posted on Thursday, September 27, 2012 - 7:09 am
Thank you again for the guidance. I now have a question regarding the class specific output. When using the KNOWNCLASS option in mixture modeling, my assumption was that the "CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP" output should provide the same group n-counts as the frequency of my cohort grouping variable, but it does not. Is there a simple explanation for this difference?

Observed class counts: a = 4753 b = 2096 c = 9396 d = 2277

Estimated class counts and proportions n-counts: a = 4480, b = 2239, c = 7596, d = 4206
 Linda K. Muthen posted on Thursday, September 27, 2012 - 8:35 am
Please send the output and your license number to support@statmodel.com.
 David Sikkink posted on Sunday, April 07, 2013 - 2:59 pm
I take it that a multilevel multiple cohort latent growth model is possible in Mplus. The GROUPING option is available with TYPE=TWOLEVEL RANDOM. (Or?)

Are there any syntax examples available for this type of model?

Thanks.

--David
 Linda K. Muthen posted on Monday, April 08, 2013 - 12:02 pm
If GROUPING is not available, use the KNOWNCLASS option and TYPE=MIXTURE TWOLEVEL RANDOM.

The only syntax is the example in Chapter 6. You would need to adapt that.
 David Sikkink posted on Monday, April 08, 2013 - 5:29 pm
Thanks, Linda.

I am assuming that when I use the "mixture twolevel random" for a growth model and include multiple cohorts, I will assign time scores for each known class/group as shown in Ex. 6.18.

But I am not sure what time scores would be assigned in the %WITHIN% / %OVERALL% portion of the model.

Thanks for your help with this. Sorry, I am sure there is an obvious answer that I am overlooking here.

--David
 Linda K. Muthen posted on Tuesday, April 09, 2013 - 1:54 pm
Yes, you would set it up like Example 6.18 but in a multilevel setting. See Example 9.12.
 David Sikkink posted on Tuesday, April 09, 2013 - 6:05 pm
I am still unsure about what time scores would replace the question marks below.

(I used Ex. 6.18 to figure out the time score for each of 4 cohorts.)
...

TYPE = TWOLEVEL MIXTURE RANDOM;

MODEL:

%WITHIN%
%OVERALL%

iw sw | y1@0 y2@.? y3@.? y4@.? ;

iw sw ON x1 ;
y1-y4 (1) ;

%cg#1%
iw sw | y1@0 y2@.1 y3@.2 y4@.3 ;

%cg#2%
iw sw | y1@.1 y2@.2 y3@.3 y4@.4 ;

%cg#3%
iw sw | y1@.2 y2@.3 y3@.4 y4@.5 ;

%cg#4%
iw sw | y1@.3 y2@.4 y3@.5 y4@.6 ;

%BETWEEN%
%OVERALL%

ib sb | y1@0 y2@.? y3@.? y4@.? ;
y1@0 y2@0 y3@0 y4@0 ;
...
Thanks. --David
 Linda K. Muthen posted on Wednesday, April 10, 2013 - 7:02 am
You need to follow Example 6.18. This is also described in either the Topic 3 or Topic 4 course handouts. The key is arranging your data by age not cohort. Age is the time variable. The time scores come from this. This is described in Example 6.18. Follow these steps for your example.
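The idea that "the time scores come from age" can be sketched outside Mplus. The Python sketch below assumes four cohorts whose first measurements are one year apart, yearly waves, and time scores expressed in decades from the youngest starting age. Under those assumptions it reproduces the class-specific time scores in the input posted above. The starting ages (10 to 13), the scale of 10, and the function name are hypothetical choices, not values taken from the thread.

```python
def cohort_time_scores(start_age, n_waves, reference_age,
                       years_between_waves=1, scale=10):
    """Time score at each wave = (age at wave - reference age) / scale."""
    return [
        # round() only tidies floating-point noise in the printed values
        round((start_age + w * years_between_waves - reference_age) / scale, 10)
        for w in range(n_waves)
    ]


# Hypothetical ages at the first wave for cohorts 1 through 4.
start_ages = [10, 11, 12, 13]
reference_age = min(start_ages)  # time score 0 = youngest age observed

time_scores = [
    cohort_time_scores(age, n_waves=4, reference_age=reference_age)
    for age in start_ages
]
for cohort, scores in enumerate(time_scores, start=1):
    print(f"cohort {cohort}:", scores)
```

Because every cohort is placed on the same age axis, a single intercept and slope describe all cohorts, and each cohort contributes only the segment of the age range it was observed in.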