Lcgm and binary outcome

Mplus Discussion > Growth Modeling of Longitudinal Data >

Scott C. Roesch posted on Saturday, May 26, 2007 - 5:55 am

I ran an unconditional growth model with both a linear and a quadratic growth factor. My primary outcome is categorical, and I am treating it as such in the input file using the code below. With this code, the intercept was not estimated. I am curious why this happens, since it is estimated when I treat the categorical observed variables as continuous.

CATEGORICAL ARE t1smkst t2smkst t3smkst t4smkst;

Analysis:
TYPE = MEANSTRUCTURE MISSING H1;

MODEL:
i s q | t1smkst@0 t2smkst@1 t3smkst@2 t4smkst@3;
i with s@0;
i with q@0;
s with q@0;
s@0;
q@0;

OUTPUT: SAMPSTAT RESIDUAL STANDARDIZED;

Linda K. Muthen posted on Saturday, May 26, 2007 - 6:36 am

The growth model parameterization with categorical outcomes is to hold thresholds equal and not estimate the mean of the intercept growth factor. The two equivalent but different growth model parameterizations are shown in Chapter 16 where growth models are discussed. This one is used with categorical outcomes because it offers more flexibility with ordinal variables.

Mario Mueller posted on Thursday, July 29, 2010 - 1:40 am

Hello,

I have a small sample (N=86) and 3 time points with binary outcomes. I plan to run an LCGM to find trajectory classes. The interval between T1 and T2 varies across individuals from 10 to 50 years; the interval between T2 and T3 is the same for everyone (14 years).

My idea is to center time at T2 (slope loading = 0), use a loading of 1 for T3, and leave the loading of T1 free.

Example for a 3-class solution:

classes = c(3);
categorical = t1 t2 t3;
missing are t1 t2 t3 (99);

analysis: type=mixture;
STARTS= 500 50;

model:
%overall%
i s | t1 t2@0 t3@1;

My questions:
1. Are these conditions (time points, sample size) sufficient for this method?
2. What do you suggest is the best way to specify time?

Thank you!

Jon Heron posted on Thursday, July 29, 2010 - 7:51 am

Hi Mario,

I think you're going to need to go long-format here, or wide format but model time with the AT command.

With 3 fixed time points and binary data you only have 8 pieces of information - the number of occurrences of 000, 001, 011, 010, 100, 110, 101 and 111.

Doesn't sound enough to me to fit 3 classes (2 parameters), estimate means for intercept and slope in each one (another 6 parameters) and also estimate one of your loadings.
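
The counting argument above can be checked with a few lines of arithmetic. This is a minimal Python sketch rather than Mplus syntax; the function names are hypothetical, and the parameter tally simply follows the count given in the post (a different parameterization would change the total slightly).

```python
from itertools import product

def response_patterns(n_timepoints):
    """All possible 0/1 response patterns across the given number of waves."""
    return ["".join(map(str, p)) for p in product((0, 1), repeat=n_timepoints)]

def independent_frequencies(n_timepoints):
    """Pattern frequencies sum to N, so one of them is redundant."""
    return len(response_patterns(n_timepoints)) - 1

def lcgm_parameters(n_classes, n_growth_means=2, n_free_loadings=1):
    """Tally used in the post: class proportions, growth factor means per class, free loadings."""
    class_probs = n_classes - 1                  # proportions sum to 1
    growth_means = n_classes * n_growth_means    # intercept and slope mean in each class
    return class_probs + growth_means + n_free_loadings

print(response_patterns(3))
print(independent_frequencies(3), "independent frequencies vs",
      lcgm_parameters(3), "parameters")
```

With three binary waves this leaves 7 independent pattern frequencies against 9 parameters, which is the shortfall described above.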

Mario Mueller posted on Thursday, July 29, 2010 - 8:40 am

Hi Jon,

thank you for your response.
When comparing with model 8.9, is it because of only 3 time points? Is there another way to test for or to confirm trajectories of these binary data (diagnosis vs. none) across three time points?

Jon Heron posted on Thursday, July 29, 2010 - 8:59 am

Hi Mario,

the method I obscurely referred to as "AT" is shown in example 6.12. Ignore the stuff with "ST" in it.

This allows for varying ages at each observation, e.g. you might have an intended data collection when the subjects are age 12 but due to a number of reasons the actual age is scattered across 11.5 - 12.5.

You can get an equivalent model by stacking the dataset into long format and fitting an MLwiN-style growth model.

This would seem a reasonable approach if your ages were spread in the way I describe; however, I am a little bit wary in your case as you say the spread of ages is 10-50 years. Can anything be modelled in a linear fashion across such a range?
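
For readers unsure what "stacking into long format" involves, here is a minimal Python sketch of the reshaping step only. The column names (`id`, `y1`..`y3`, `age1`..`age3`) and the missing-value code -9999 are assumptions made for illustration, not taken from anyone's dataset.

```python
def wide_to_long(rows, n_waves, missing=-9999):
    """Stack one record per subject into one record per subject-occasion.

    Each wide row is a dict with keys id, y1..yK (outcome) and age1..ageK
    (age at measurement). These key names are hypothetical.
    """
    long_rows = []
    for row in rows:
        for k in range(1, n_waves + 1):
            y, age = row[f"y{k}"], row[f"age{k}"]
            if y == missing or age == missing:
                continue  # skip occasions that were not observed
            long_rows.append({"id": row["id"], "y": y, "age": age})
    return long_rows

wide = [
    {"id": 1, "y1": 0, "y2": 1, "y3": 1, "age1": 11.6, "age2": 12.4, "age3": 13.1},
    {"id": 2, "y1": 0, "y2": -9999, "y3": 0, "age1": 12.0, "age2": -9999, "age3": 13.5},
]
for record in wide_to_long(wide, n_waves=3):
    print(record)
```

Each subject contributes as many rows as observed occasions, and the age column then plays the role of the time variable in the long-format model.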

Mario Mueller posted on Friday, July 30, 2010 - 9:16 am

Hello Jon,

Thank you... okay, I reshaped the dataset to long format and now it contains ID, "age" (0 at baseline) and individually varying ages for t1 and t2 as well as the associated diagnosis status (0/1).
Is there no way to run such a model in Mplus? What about Stata?
Or did I misunderstand you?

Linda K. Muthen posted on Friday, July 30, 2010 - 9:40 am

Example 9.16 shows the model specification for a growth model using the long format.

Mario Mueller posted on Monday, August 02, 2010 - 1:02 am

Hello Linda,

I studied the instructions for Example 9.16 but do not fully understand parts of it (time, a3). Is there a topic handout available?

My aim is to find and/or confirm classes of trajectories over 3 time points with binary outcomes without any covariates but individually varying time intervals.

Thank you for your support
Mario

Jon Heron posted on Monday, August 02, 2010 - 2:58 am

Hi Mario,

a3 is the time-varying covariate.
This might be useful. I've fitted a growth model to 3 measures of weight (wt1-wt3) with variable times of measurement. Both these bits of syntax give the same result (wide and then long format). Should be simple to extend to multiple classes.

! Wide format: one record per subject, individually varying ages via TSCORES and AT
VARIABLE:
names=
wt1 wt2 wt3 agewks1 agewks2 agewks3;
missing are all (-9999) ;
tscores = agewks1 agewks2 agewks3;
ANALYSIS:
proc = 2 ;
type=random ;
MODEL:
i s | wt1 wt2 wt3 at agewks1 agewks2 agewks3;
i s ;
i with s ;
wt1 wt2 wt3 (equalvar) ;

! Long format: one record per subject-occasion, fitted as a two-level model
VARIABLE:
names=
id wt agewks ;
missing are all (-9999) ;
cluster=id ;
within=agewks ;
ANALYSIS:
type= twolevel random ;
algorithm = integration ;
integration=100 ;
miterations=1000 ;
MODEL:
%within%
s | wt on agewks ;
%between%
wt with s ;

Mario Mueller posted on Tuesday, August 03, 2010 - 4:01 am

Hello Jon,

thanks for these examples...

when I specify the corresponding model:

variable: names are id dia0 dia1 dia2 zeit0 zeit1 zeit2;
usevariables dia0 dia1 dia2 zeit0 zeit1 zeit2;
tscores = zeit0 zeit1 zeit2;
categorical = dia0 dia1 dia2;

analysis: type=random;
proc = 2;

model: i s | dia0 dia1 dia2 at zeit0 zeit1 zeit2;
i s ;
i with s ;
dia0 dia1 dia2 (equalvar) ;

...I get an error message saying that "proc" is unknown. Furthermore, how can I extend this to classes (type=mixture?)? Is (equalvar) part of the model? I didn't find any explanation of it.

thanks
Mario

Jon Heron posted on Tuesday, August 03, 2010 - 5:32 am

1] Proc problem. Have you got a really old version of Mplus - v4 perhaps??

2] Equalvar. Anything in round brackets is a label, and parameters that share a label are constrained to be equal. This is simply saying that the residual variances are constrained equal across time. Not necessary for the model, but needed to obtain the same answer as long-format.

3] Add "type = mixture;" to the analysis, "classes = c(2);" to the variable section and "%overall%" immediately after "model:"

Mario Mueller posted on Tuesday, August 03, 2010 - 9:43 am

yes, I use v4! What is it for and can I use an equivalent command in v4?

Jon Heron posted on Wednesday, August 04, 2010 - 12:43 am

It's to utilise multiple processors if your PC has more than one. Means your programs will run faster - random starts are shared across the processors. Guess it was introduced into version 5.

Your sample is small so I don't think it's a big problem here, but it makes a big difference to me as I have 100 times your sample size.

I should point out that that's not the only benefit to upgrading - new routines, new estimators, better efficiency. I'll stop now cos I don't work for Mplus' sales dept ;-)

Mario Mueller posted on Thursday, August 05, 2010 - 2:03 am

Thank you, Jon! I hope it's okay that I ask all these things, but I've only just begun to work with Mplus.

I followed all your suggestions but two errors occurred:

1. tscores: are these relevant for the model? tscores are not compatible with mixture models.
2. the known error message: Categorical variable DIA1 contains less than 2 categories. ...although I double-checked it!

Thanks,
Mario

Jon Heron posted on Thursday, August 05, 2010 - 4:54 am

1] Yes, you need them for allowing varying times of observation if you're modelling in wide-format. Sounds like long-format is the only option then.

2] Has your dataset been read in correctly? If you fit a very simple model e.g.

i | dia0@0 dia1@1 dia2@2;

then the first bit of output will be the distribution of each of your categorical measures. Does that look like it should do?

Jon Heron posted on Friday, August 06, 2010 - 1:07 am

Mario,

I'm struggling to estimate a 2-class growth model in long-format using my data so am wondering if you have any other options to keep things simple.

For simpler wide-format analysis you need more degrees of freedom, so how about 3-level instead of binary repeated measures, is that possible with your data?

Mario Mueller posted on Friday, August 06, 2010 - 7:36 am

Hi Jon,

my outcome is strictly binary... you are either below or above the diagnostic cutoff for a stress response syndrome. We used interview data containing different criteria.

I will try your two suggestions from above and let you know.

You are a big help to me. Thank you very much.
Mario

Mario Mueller posted on Friday, August 06, 2010 - 8:06 am

Oh, I mixed up dia2 with time0 (which is always 0), so Mplus didn't read two categories... it's corrected now.

When I specify "mixture" I get the message:

TSCORES option is only available with TYPE = RANDOM.

Otherwise (with "random") I get:

*** WARNING in Variable command
CLASSES option is only available with TYPE=MIXTURE.
CLASSES option will be ignored.
*** ERROR in Model command
Unknown variables: %OVERALL%
in line: %OVERALL%

Should I try your suggested long-format syntax?

Jason Bond posted on Saturday, December 06, 2014 - 11:59 am

Bengt/Linda,

I intend to do age-based (instead of wave-based) analyses using the NLSY dataset. Do all age-based analyses require the use of t-scores? That is, although year of interview may not be exactly linearly related to age, if one simply uses age at first interview and then assumes that time between interviews increases exactly linearly with age, is there a way to do age-based analyses without the TSCORES option? Thanks,

Jason

Bengt O. Muthen posted on Saturday, December 06, 2014 - 2:04 pm

If you have the same distance in time between the repeated measurements for all subjects there is no need for t scores.
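
Whether the spacing really is the same for every subject can be checked from the data before dropping t scores. The following is a minimal Python sketch of such a check; the function name and the toy ages are made up for illustration.

```python
def common_spacing(ages_by_subject, tol=1e-9):
    """Return the shared between-wave intervals, or None if they differ by subject.

    ages_by_subject maps a subject id to the list of ages at each wave.
    """
    reference = None
    for ages in ages_by_subject.values():
        gaps = [later - earlier for earlier, later in zip(ages, ages[1:])]
        if reference is None:
            reference = gaps  # first subject sets the pattern to compare against
        elif len(gaps) != len(reference) or any(
            abs(g - r) > tol for g, r in zip(gaps, reference)
        ):
            return None
    return reference

same = {"a": [21, 22, 24], "b": [30, 31, 33]}
varying = {"a": [21, 22, 24], "b": [30, 32, 33]}
print(common_spacing(same))
print(common_spacing(varying))
```

Subjects may start at different ages and still share the same intervals, as in the first toy dataset; only the second one has individually varying spacing.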

Jason Bond posted on Tuesday, December 16, 2014 - 1:28 pm

I was hoping this was the case. Then how would one implement age-based analyses? My guess is that fixing time to specific values (e.g., @0 for the first wave, etc.) would not be correct? Thanks again...

Bengt O. Muthen posted on Tuesday, December 16, 2014 - 6:30 pm

See UG ex 6.18.

Jason Bond posted on Wednesday, January 21, 2015 - 3:01 pm

Thanks for this. Beyond the question of cohort effects, the endpoint of our analyses, however, is to estimate LCGM/GMM. Assuming no cohort effects are found, is there a way to set up such mixture models for age-based analyses assuming time between waves is constant across individuals not using TSCORES? Thanks again...

Jason

Bengt O. Muthen posted on Wednesday, January 21, 2015 - 4:07 pm

The approach of UG ex 6.18 can be used in a mixture setting as well. Group (cohort) becomes Knownclass.

Jason Bond posted on Tuesday, February 10, 2015 - 12:57 pm

Related... I'm analyzing the NLSY with data from 11 waves. When using TSCORES in, say, a random slope model, I get this error message:

*** ERROR in MODEL command
The number of fixed time scores is not sufficient for model identification
in the following growth process: I S

from the corresponding analysis syntax:

i s | f3_82 f3_83 f3_84 f3_88 f3_89 f3_94 f3_02 f3_06 f3_08 f3_10 f3_12 AT t_82
t_83 t_84 t_88 t_89 t_94 t_02 t_06 t_08 t_10 t_12 ;

Although I'm guessing it doesn't matter, age has been 'centered' around 21 (and the data truncated so that only data from ages 21-51 are analyzed) so that 'times' of measurement indicate how much older than 21 the respondent was at the corresponding wave. Thanks for any input...

Jason
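
The centering and truncation described in the post above can be sketched as follows. This is a minimal Python illustration, not the poster's actual data step; the function name is hypothetical, and treating both age bounds as inclusive and using `None` for out-of-range occasions are assumptions.

```python
def centered_time_scores(ages, center=21, lower=21, upper=51, missing=None):
    """Convert ages at each wave to time scores (age minus center).

    Occasions with an unknown age, or an age outside [lower, upper],
    are returned as `missing`.
    """
    scores = []
    for age in ages:
        if age is None or age < lower or age > upper:
            scores.append(missing)
        else:
            scores.append(age - center)
    return scores

print(centered_time_scores([19, 21, 25, 40, 53]))
```

The resulting time scores run from 0 (age 21) to 30 (age 51).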

Jason Bond posted on Tuesday, February 10, 2015 - 1:12 pm

Related to the above mentioned NLSY data (11 waves and over 10k cases), I'm trying to estimate a single class cubic growth model with random growth coefficients and no other covariates. Quadratic models seem to converge fine but cubic (and higher order ones) give me:

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED
FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES.

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE
DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES
BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION
NUMBER IS 0.204D-10.

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE
COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE
AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR
STARTING VALUES. PROBLEM INVOLVING PARAMETER 1.

I've tried increasing the starts to no avail and the same LL seems to be being reached for a number of starts. With so many time points per respondent available, I would assume that such a model would be estimable. Might you have any advice for me for fitting such a model (other than the typical fixing of parameters)? Thanks much again,

Jason
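
One thing that may be worth checking, although it is only a guess without the output, is the scale of the time scores. If time is measured in years since age 21 (0 to 30, as described in the earlier post), the polynomial terms differ in size by several orders of magnitude, which can contribute to ill-conditioning in general. The Python sketch below only illustrates that arithmetic; the function name is made up, and dividing time by 10 is an arbitrary rescaling chosen for the example.

```python
def max_power_terms(max_time, degree):
    """Largest value taken by time**p for p = 0..degree."""
    return [max_time ** p for p in range(degree + 1)]

raw = max_power_terms(30, 3)          # time in years since age 21
scaled = max_power_terms(30 / 10, 3)  # same span expressed in decades
print(raw)
print(scaled)
```

In years the cubic term reaches 27000 while the constant term is 1; in decades the largest term is 27. Whether this is what causes the messages above cannot be told from the post alone.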

Bengt O. Muthen posted on Tuesday, February 10, 2015 - 2:11 pm

Post 1:

The TSCORES option should be used in conjunction with the AT option of growth modeling.

Post 2:

We need to see the output.