Longitudinal binary data
 Anonymous posted on Tuesday, November 20, 2001 - 11:13 am
Hi, I am pretty new in this field. I have a data set with the following relationship:

X -> Y -> Z

X, Y and Z are all observed binary longitudinal variables. I wonder whether Mplus can handle this type of model and, if it can, which type of analysis (e.g. 'path analysis') should be performed. Many thanks for your help!
 bmuthen posted on Wednesday, November 21, 2001 - 8:26 am
Yes, this can be seen as path analysis. In Mplus language you define y and z as categorical and then say:

z on y;
y on x;
 Silvia Sörensen posted on Sunday, May 18, 2003 - 9:44 am
Is it possible to run a parallel process model for a continuous and a categorical variable? I haven't found any examples that combine the parallel process with the categorical approach.

My continuous variable was measured monthly over 15 months, my categorical (binary) variable at baseline, month 3, month 9 and month 15. I would like to know if one variable "drives" the other.
I'm fairly new at this
 Linda K. Muthen posted on Monday, May 19, 2003 - 8:39 am
Yes, it is possible to do that. Just put together a model where one process is continuous with 15 measures and one is categorical with four measures. You can put it together using Example 22.1 for the continuous outcome and Example 22.1D for the categorical outcome, with or without the covariates.
 Anonymous posted on Monday, December 06, 2004 - 11:28 am
How much variability is necessary for MPlus to estimate a model? I have binary outcome data and some low base rates (in terms of the presence of the outcome). I have 20 participants measured over 18 time points. When I attempt to run the LGCA, I get the following error message:


Unfortunately, I get the same message for about 1/2 of my time points. (I have no missing data - just mostly 0's with a few 1's interspersed throughout the dataset.) Is this error accurate, or might I have an error in my program?
 bmuthen posted on Monday, December 06, 2004 - 11:41 am
You mention doing LGCA (I assume you mean LCGA), that is ML estimation of a mixture model. But the error message seems to refer to weighted least squares analysis - not a mixture analysis. Please clarify or send input, output, and data to support@statmodel.com.
 Anonymous posted on Wednesday, October 12, 2005 - 12:04 am

I am working on an LGM with longitudinal binary variables. It also has time-varying covariates, which are also binary. When I added the time-varying covariates, the program gave me a warning saying that it needs Monte Carlo integration.

Is Monte Carlo integration the way to go in my situation (an LGM with binary variables and time-varying covariates)?

Thanks in advance!
 Linda K. Muthen posted on Wednesday, October 12, 2005 - 8:36 am
I would imagine that without covariates, you were using the default WLSMV estimator and when you added the covariates, maximum likelihood was used. Certain models require Monte Carlo integration. If you need additional information, please send your input, data, output, and license number to support@statmodel.com.
 Gina Allen posted on Friday, October 06, 2006 - 11:30 am
We are working on a project in which we are trying to model five categorical indicators at 6 time points. One variable (retirement) designates a permanent transition (once retired always retired). The rest (work, marriage, etc.) can take any value at each time point. We are planning to run latent class models for each of the five variables over the 6 time points and then do a follow up latent class analysis of the cross-classification of the resulting trajectories. Is there an example we can follow or a more efficient way to program this?
 Bengt O. Muthen posted on Friday, October 06, 2006 - 6:41 pm
I am not sure what you mean when you say "then do a follow up latent class analysis of the cross-classification of the resulting trajectories". I can imagine several approaches - here are two, both are based on an LCA at each time point. One is latent transition analysis where class membership at time t influences class membership at time t+1. There are several examples of that in the UG. The other is latent class growth analysis where the outcome is the latent class variable at each time point - that is a bit cumbersome, however. The latter is the same as doing a joint LCA of all time points, but then structuring the latent class probabilities according to an LCGA.
 Sarah Dauber posted on Monday, October 09, 2006 - 8:54 am
I am interested in using growth curve analysis to analyze data on monthly rates of abstinence from drug use across 18 timepoints. Is this possible to do in MPLUS?

Thank you for your help.
 Linda K. Muthen posted on Tuesday, October 10, 2006 - 8:37 am
Yes, this is possible in Mplus.
 sara hussain posted on Monday, January 29, 2007 - 10:23 am
I am interested in modelling longitudinal poverty (a binary indicator) over 15 waves of data. I am learning about the LCGA approach, however, please could you tell me what the main differences are between this method and latent Markov analysis (of Langeheine & van de Pol)?

Many thanks

 Jungeun Lee posted on Thursday, August 09, 2007 - 12:37 pm

I am running a latent growth mixture model with 4 binary variables. When the number of classes is > 1, Mplus gives me a warning like:


Parameter 3 is the alpha for the slope of the first class and, probably because of that, the S.E. for the slope of the first class is 0. So I can't tell whether it is statistically significant or not. Could you let me know how problematic it is to have a warning like the one above? Could you please also let me know what I can do in a situation like this?

Thanks in advance.
 Linda K. Muthen posted on Tuesday, August 14, 2007 - 6:41 pm
This seems like a problematic message. Please send your input, data, output, and license number to support@statmodel.com.
 Jungeun Lee posted on Tuesday, August 21, 2007 - 4:10 pm
I just emailed them to you. Thanks!
 Emily Blood posted on Wednesday, September 24, 2008 - 7:23 am
I am fitting a latent growth curve with repeated observed binary outcome, a latent continuous intercept and a latent continuous slope and a logit link between the latent intercept and slope and binary outcomes, using the MLR estimation. I just want to be clear on the estimation procedure. Does the likelihood assume normality for the observed outcomes (or the unobserved continuous variable with the threshold:Y*) and use the normal density or is the logistic density used in the likelihood? If you could let me know that would be great.
P.S. I have read the Muthen and Asparouhov 2008 paper, thank you for sending that reference. I still have this one question, though.
 Bengt O. Muthen posted on Wednesday, September 24, 2008 - 9:31 am
The binary growth model with logit link uses regular logistic regression of each binary outcome on the growth factors. Regular logistic regression does not necessitate an underlying y* variable, but simply considers the conditional probability of the binary outcome as a function of the growth factors. However, the logistic regression model can equivalently be expressed in terms of such a y* variable that has a logistic density given the predictors (growth factors).
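
As an illustration of the equivalence described in this reply (a sketch added for readers, not code from the original posters): the direct logistic probability and the probability that a latent y* with a standard logistic residual exceeds a threshold of 0 should agree. The value of `eta` (the linear predictor formed from the growth factors) and the function names are hypothetical.

```python
import math
import random

def logistic_prob(eta):
    """Direct logistic regression: P(y = 1 | eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

def latent_response_prob(eta, n_draws=200_000, seed=0):
    """Same probability via y* = eta + e, with e standard logistic and threshold 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        u = rng.random()
        while u == 0.0:          # guard against log(0)
            u = rng.random()
        e = math.log(u / (1.0 - u))   # inverse-CDF draw from the standard logistic
        if eta + e > 0.0:
            hits += 1
    return hits / n_draws

eta = 0.7  # hypothetical value of the linear predictor
p_direct = logistic_prob(eta)
p_latent = latent_response_prob(eta)
print(round(p_direct, 3), round(p_latent, 3))
```

The two numbers differ only by simulation error, which is why no y* variable is needed to define the model even though one can be written down.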
 Emily Blood posted on Wednesday, September 24, 2008 - 11:30 am
Thank you for the response. So the observed data likelihood in this case is expressed as:
Integral[f(xb) * normal density of random effects] integrated over the random effects? Where f(xb)=p(xb)^y * (1-p(xb))^(1-y)? I'm basing this on the estimation section in the Muthen & Asparouhov, 2008 paper and putting in the likelihoods for this specific case. Hopefully I've understood it correctly?
 Bengt O. Muthen posted on Wednesday, September 24, 2008 - 11:47 am
Yes. You have conditional independence of the y's given the random effects.
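
A minimal numerical sketch of the likelihood discussed above, for one subject's response pattern: the product of Bernoulli terms (conditional independence given the random effects) is integrated over a bivariate normal intercept and slope. This is an illustration only. It uses a plain rectangular grid rather than whatever integration scheme Mplus uses, assumes thresholds fixed at 0 with a logit link, and all parameter values and names are hypothetical.

```python
import math
from itertools import product

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def marginal_likelihood(y, times, mean_i, mean_s, var_i, var_s, cov_is,
                        n_grid=61, width=6.0):
    """Integrate prod_t p_t^y_t (1-p_t)^(1-y_t) over (i, s) ~ bivariate normal."""
    # Cholesky factor of the random-effect covariance matrix
    l11 = math.sqrt(var_i)
    l21 = cov_is / l11
    l22 = math.sqrt(var_s - l21 * l21)
    # Equally spaced nodes for two independent standard normals
    step = 2.0 * width / (n_grid - 1)
    nodes = [-width + k * step for k in range(n_grid)]
    weights = [math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * step
               for z in nodes]
    total = 0.0
    for z1, w1 in zip(nodes, weights):
        i = mean_i + l11 * z1
        for z2, w2 in zip(nodes, weights):
            s = mean_s + l21 * z1 + l22 * z2
            cond = 1.0  # conditional likelihood of the pattern given (i, s)
            for y_t, t in zip(y, times):
                p = expit(i + t * s)
                cond *= p if y_t == 1 else 1.0 - p
            total += w1 * w2 * cond
    return total

# Hypothetical parameter values for a 4-wave linear model
params = dict(mean_i=-0.5, mean_s=0.3, var_i=1.0, var_s=0.2, cov_is=-0.1)
times = [0, 1, 2, 3]
lik = marginal_likelihood([0, 0, 1, 1], times, **params)
# The likelihoods of all 16 possible patterns should sum to about 1
total_prob = sum(marginal_likelihood(list(pat), times, **params)
                 for pat in product([0, 1], repeat=4))
print(lik, total_prob)
```

Each binary outcome adds only a factor inside the integrand; the dimension of integration is set by the number of random effects, here two.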
 Nicholas Bishop posted on Monday, December 14, 2009 - 10:23 am
I am interested in creating a factor-of-curves LGM that utilizes both binary and continuous lower-order curves to estimate the higher order factor. Would this be as simple as defining the lower-order binary curves as categorical (as described in example 6.4 of the user's guide) then creating a model similar to that described by Duncan, Duncan, and Strycker (2006)? Here is a link to their example of the factor-of-curves LGM with only continuous outcomes: http://www.ats.ucla.edu/stat/mplus/examples/ddsla/app52man.inp.txt. Thanks.
 Linda K. Muthen posted on Tuesday, December 15, 2009 - 8:43 am
To change the input from the link to a combination of categorical and continuous variables, add the CATEGORICAL option to specify the variables that are categorical.
 Nicholas Bishop posted on Tuesday, January 05, 2010 - 12:13 pm
I have two questions relating to the factor-of-curves model mentioned above. Is it possible to utilize mixture modeling on the second-order slope and intercept? I would like to examine the heterogeneity in the second order curve. Also, I have missing data related to the outcome variables (smoking and health screening in a sample of older adults). What would be the most efficient way of accounting for data NMAR in the factor-of-curves model (while also utilizing selection modeling with the common I S) ?
 Bengt O. Muthen posted on Tuesday, January 05, 2010 - 3:02 pm
To answer your first question: Yes.

To answer your second question, NMAR is a big topic and is not easy to carry out well; no approach is really "efficient". A first step would be to check if those dropping out have a different mean on the outcome before dropping out than others do. But even if that is so, it doesn't mean that dropout is NMAR; it could still be MAR. I think the pattern-mixture approach is probably the most accessible in terms of exploring the missingness.
 Nicholas Bishop posted on Thursday, January 07, 2010 - 12:01 pm
Thanks Bengt. What potential roadblocks will I face when attempting to do this with categorical outcomes?
 Bengt O. Muthen posted on Friday, January 08, 2010 - 10:47 am
You get more dimensions of integration due to categorical outcomes, so that can make for heavy computations.
 Regan posted on Wednesday, February 03, 2010 - 7:14 pm
I am interested in following up on Mr. Bishop's questions. I have about 2% of my sample that are non-responders on the four indicators that make up my main independent factor variable. I have done some preliminary regression analyses on the variables in the model, and it seems that one could argue that these missing cases violate the MAR assumptions for FIML or MI. Should I drop these cases from analyses, or how do I use the pattern-mixture approach you mentioned in your response to Mr. Bishop?
 Linda K. Muthen posted on Thursday, February 04, 2010 - 7:46 am
With so little missing data, I would estimate the model under MAR.
 Craig Furneaux posted on Thursday, July 22, 2010 - 7:56 pm
Hi Dr Muthen

I am undertaking an analysis of organisational processes, which are all categorical variables, including an array of binary variables related to these processes.

I am interested in the change to processes over time, and wonder whether the MPlus program would be suitable for this purpose?

I am particularly trying to prove / disprove change to these processes over time, based on these categorical variables.

Any help would be greatly appreciated, including any examples of this sort of approach.

 Linda K. Muthen posted on Friday, July 23, 2010 - 11:31 am
You might consider growth modeling, latent transition analysis, or growth mixture modeling. You can read about these in the Topic 3, 4, and 6 course handouts.
 Alain Girard posted on Monday, April 30, 2012 - 10:48 am
I perform a growth model for binary data with a probit link. I have predictors for intercept and slope.

Where can I find the equation to compute the predicted probabilities for given values of the predictors?

 Linda K. Muthen posted on Monday, April 30, 2012 - 2:06 pm
See slide 45 of the Topic 3 course handout in conjunction with slides 162-164 of the Topic 2 course handout.
 Alain Girard posted on Tuesday, May 01, 2012 - 6:57 am
Thanks for your answer. I just want to confirm my computation.

I estimated the model :

usevariables = y1 y2 y3 y4 x1 x2;
categorical = y1 y2 y3 y4;

i s | y1@0 y2@1 y3@2 y4@3;
[y1$1@0]; [y2$1@0]; [y3$1@0]; [y4$1@0];
i on x1 x2;
s on x1 x2;

I want to compute P(yj=1|x1, x2) ; j = 1,2,3,4 for given values of x1 and x2.

I know P(yj=1|i, s, x1, x2) = 1-F(-i-(j-1)*s),
thus P(yj=1|x1, x2) = E(P(yj=1|i, s, x1, x2)).

To compute P(yj=1|x1, x2), I simulate (using R) i and s for given values of x1 and x2 and compute P(yj=1|i, s, x1, x2) for each simulated subject. I obtain P(yj=1|x1, x2) by taking the mean of P(yj=1|i, s, x1, x2).

Alain Girard
University of Montreal
 Bengt O. Muthen posted on Thursday, May 03, 2012 - 10:01 am
It looks like you are doing numerical integration by simulation. When you don't condition on the growth factors, normal factors together with a logit link require numerical integration to get the probabilities. This is computed in the PLOT command, I believe. With a probit link the numerical integration is not needed - instead you have an explicit expression.
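
A sketch of the explicit probit expression mentioned above, compared with the simulation approach from the earlier post (added for illustration, not from the original posters). It assumes thresholds fixed at 0, a residual variance of 1 for the latent response, and normal growth factors. Under those assumptions P(yj=1|x1, x2) = Phi(mu / sqrt(1 + v)), where mu and v are the conditional mean and variance of i + (j-1)*s given the covariates. All coefficient values and names are hypothetical, and other parameterizations would change the scaling.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_marginal_prob(t, mean_i, mean_s, var_i, var_s, cov_is):
    """Closed form: Phi(mu / sqrt(1 + Var(i + t*s)))."""
    mu = mean_i + t * mean_s
    var_eta = var_i + 2.0 * t * cov_is + t * t * var_s
    return norm_cdf(mu / math.sqrt(1.0 + var_eta))

def simulated_marginal_prob(t, mean_i, mean_s, var_i, var_s, cov_is,
                            n_draws=100_000, seed=1):
    """Average of Phi(i + t*s) over simulated growth factors."""
    rng = random.Random(seed)
    l11 = math.sqrt(var_i)
    l21 = cov_is / l11
    l22 = math.sqrt(var_s - l21 * l21)
    total = 0.0
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        i = mean_i + l11 * z1
        s = mean_s + l21 * z1 + l22 * z2
        total += norm_cdf(i + t * s)
    return total / n_draws

# Hypothetical estimates: intercepts and slopes of "i on x1 x2" and "s on x1 x2"
x1, x2 = 1.0, 0.0
mean_i = -0.4 + 0.5 * x1 + 0.2 * x2   # conditional mean of i given x1, x2
mean_s = 0.1 + 0.05 * x1 - 0.1 * x2   # conditional mean of s given x1, x2
resid = dict(var_i=1.0, var_s=0.2, cov_is=-0.1)  # residual (co)variances of i and s

closed = [probit_marginal_prob(t, mean_i, mean_s, **resid) for t in range(4)]
simulated = [simulated_marginal_prob(t, mean_i, mean_s, **resid) for t in range(4)]
for t in range(4):
    print(t, round(closed[t], 3), round(simulated[t], 3))
```

The two columns agree up to simulation error, so with a probit link the simulation step can be skipped.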
 Leslie Roos posted on Wednesday, September 09, 2015 - 10:08 am

I'm conducting a longitudinal growth model of binary data using the ML estimator and have run a basic growth model as well as a growth model with i and s regressed on a number of predictors. I have a couple of questions, as well as an error message question. Thanks for your time!

(1) Which indices of model fit would you recommend reporting for the ML estimator?

(2) Should I compare loglikelihood values between the basic & indicator models to show that the predictor model is a better fit?

(3) Error message ..."THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX." -- the 'problem' parameter changes each time I run the analyses. I also included Monte Carlo integration with the ML estimator after reading a previous post, but I'm not sure about the purpose of this (and I obtain the same error message with or without it).

(4) I've requested CINTERVALS for the log odds ratios but do not obtain any in the output -- why might this be?

Thank you!
 Bengt O. Muthen posted on Wednesday, September 09, 2015 - 12:19 pm
(1) Use TECH10 bivariate fit information.

(2) No, I think it is enough to see that the predictors have significant slopes.

(3-4). Send output and license number to support.
 Leslie Roos posted on Wednesday, September 16, 2015 - 10:29 am
Thank you for this input -- would you be able to direct me to a resource for interpreting / reporting indices of model fit for binary latent growth models from TECH10? Because we have 5 time points in the growth model, it seems that there are multiple options for the 'bivariate fit information.'

i.e. Is the Overall Bivariate Pearson Chi-square or Overall Loglikelihood chi-square most important vs. the multiple Bivariate Pearson Chi-square or Loglikelihood chi-square indices from each of the 10 pairs? What are the chi-square thresholds for these indicators indicating good model fit?

 Bengt O. Muthen posted on Wednesday, September 16, 2015 - 5:33 pm
I would look at each pair. Pearson and likelihood chi-2's should agree - otherwise trust neither. I would use this information in a more descriptive way as for the crime example in the paper on our website:

Muthén, B. & Asparouhov, T. (2009). Growth mixture modeling: Analysis with non-Gaussian random effects. In Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G. (eds.), Longitudinal Data Analysis, pp. 143-165. Boca Raton: Chapman & Hall/CRC Press.
 Leslie Roos posted on Friday, September 25, 2015 - 12:04 pm
Thanks for your response Bengt --

I'm trying to identify which stats from the output should be reported for our basic bivariate growth model, before moving onto describing results from a model with additional covariates.

The Pearson and Log-Likelihood Chi-Squares from TECH10 are very similar for each pair (i.e., 25.3 & 27.0; 11.7 & 12.3; 14.0 & 14.5; 10.5 & 10.8). Would you consider these numbers to 'agree', or are there further stats that should be run here?

The 'Model Fit information' Pearson and Log-Likelihood Chi-Squares agree with each other in that they are both significant. -- Were you referring to these or the tech10 information?

Regarding the paper you reference, we have looked into the crime example, but it seems somewhat different because it concerns LCGA (vs. a basic latent growth model). I'm not clear on whether you were referring to the descriptives in Table 6.1, or on what would make sense to report for our example, as we do not have iterative comparative models.

 Leslie Roos posted on Wednesday, December 02, 2015 - 1:23 pm
Hi Bengt, I wanted to follow up about the question, above, and ask if this would be a more appropriate question to email vs. discussion board post? Thank you!
 Bengt O. Muthen posted on Wednesday, December 02, 2015 - 6:22 pm
Ah, looks like this thread got dropped.

I was referring to bivariate TECH10 tests. Check if those give rejection or not.
 Leslie Roos posted on Monday, December 07, 2015 - 3:31 pm
Hi Bengt,

Thanks for your response! Below is example output from our bivariate TECH10 tests, but I'm having a difficult time with interpretation. Your advice is hugely appreciated.

- Leslie



                              Est Probabilities

Variable      Variable        H1       H0      Std Resid (z-score)
Category 1    Category 1     0.550    0.542         0.738
Category 1    Category 2     0.084    0.111        -4.144
Category 2    Category 1     0.190    0.163         3.522
Category 2    Category 2     0.176    0.184        -0.942

Bivariate Pearson Chi-Square                       26.618
Bivariate Log-Likelihood Chi-Square                27.515

Overall Bivariate Pearson Chi-Square              118.880
Overall Bivariate Log-Likelihood Chi-Square       122.673
 Bengt O. Muthen posted on Monday, December 07, 2015 - 5:59 pm
The last column gives z-tests so you have significant misfit for 2 of the 4 cells in that bivariate table. If other pairs also have several significant spots of misfit you may want to modify your model. For instance, use a quadratic model instead of a linear one.
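
As a small illustration of this reading of the table (a sketch added for readers, not from the original posters), the cells posted above can be screened for |z| > 1.96. The chi-square helper uses the textbook Pearson and likelihood-ratio formulas; that these match the TECH10 computation is an assumption, and the sample size `n` is hypothetical because it is not given in the thread.

```python
import math

# (row category, column category, H1 observed prob., H0 model prob., z-score)
# copied from the TECH10 output posted above
cells = [
    ("Category 1", "Category 1", 0.550, 0.542, 0.738),
    ("Category 1", "Category 2", 0.084, 0.111, -4.144),
    ("Category 2", "Category 1", 0.190, 0.163, 3.522),
    ("Category 2", "Category 2", 0.176, 0.184, -0.942),
]

def flag_misfit(cells, critical=1.96):
    """Return the cells whose standardized residual exceeds the critical value."""
    return [(a, b, z) for a, b, _h1, _h0, z in cells if abs(z) > critical]

def chi_squares(cells, n):
    """Textbook Pearson and likelihood-ratio chi-squares for sample size n."""
    pearson = n * sum((h1 - h0) ** 2 / h0 for _, _, h1, h0, _ in cells)
    lr = 2.0 * n * sum(h1 * math.log(h1 / h0) for _, _, h1, h0, _ in cells)
    return pearson, lr

flagged = flag_misfit(cells)
for a, b, z in flagged:
    print(a, "/", b, "z =", z)

# n = 1000 is a placeholder; only the ratio of the two statistics is scale-free
pearson, lr = chi_squares(cells, n=1000)
print(round(lr / pearson, 3))
```

Two cells are flagged, matching the "2 of the 4 cells" noted in the reply.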
 Leslie Roos posted on Monday, January 11, 2016 - 4:50 pm
Thank you so much for this response. It makes sense that the 2 residual Z scores > 1.96 indicate poor fit. I ran the quadratic term you suggest in the basic model and neither the mean nor the variance is significant.

Would you have any additional suggestions?
 Bengt O. Muthen posted on Wednesday, January 13, 2016 - 12:28 pm
You could try to correlate residuals for time-adjacent outcomes to improve model fit. With binary outcomes and ML this calls for adding factors that make the items correlate beyond the growth factors. With WLSMV and Bayes there is no such need.
 Leslie Roos posted on Monday, January 25, 2016 - 4:47 pm
Got it -- it seems that a simple 'Si_T1 with Si_T2 S1_T3...' doesn't work because the indicators Si_T1, Si_T2 etc are categorical/binary -- is there alternate syntax that we could be directed to in line with "adding factors that make the items correlate beyond the growth factors" ? Thanks!
 Bengt O. Muthen posted on Monday, January 25, 2016 - 6:12 pm
Add a factor behind each pair of items. Fix the factor variance at 1 so you get only one loading parameter to represent the residual covariance.
 Leslie Roos posted on Monday, January 25, 2016 - 9:45 pm
Thanks! Just to confirm, the example syntax for this would be:

f12 BY u1 u2;
f13 BY u1 u3;
f23 BY u2 u3;
 Linda K. Muthen posted on Tuesday, January 26, 2016 - 2:48 pm