Growth Model for Categorical Outcome

Elizabeth Ginexi posted on Thursday, March 29, 2001 - 10:16 am

In the Mplus version 2 user's guide, example 22.1D shows a linear growth model for a categorical outcome with time-invariant and time-varying covariates. Can this model handle missing data in the outcome variables or will my participants with missing data need to be dropped? Chapter 23 that discusses modeling with missing data only mentions that it is available for analyses with continuous outcomes. Thanks for any guidance.

Linda K. Muthen posted on Sunday, April 01, 2001 - 5:43 pm

The TYPE=MISSING option is not available for categorical outcomes. Any observation with one or more missing values on the analysis variables will be dropped.

Shige posted on Saturday, December 31, 2005 - 5:33 pm

Is this still the case in version 3?

Shige

bmuthen posted on Saturday, December 31, 2005 - 5:47 pm

No, this post is quite outdated. With categorical outcomes you have 2 major options, ML estimation or limited information weighted least-squares estimation. With the former, you have the usual MAR facility through Type = Missing. With the latter you can now also allow missing data, but the estimation is only MAR when covariates predict missingness; when outcomes predict missingness, MCAR is needed because of the limited (2nd-order) information approach.

Eisuke Segawa posted on Saturday, July 01, 2006 - 8:01 am

The limited-information weighted least-squares estimator was available only for complete data (no missing values) in older versions of Mplus (ver. 2?). It was extended to data with missing values in newer versions of Mplus. Is there any article describing the extension?

Thank you

Eisuke

Bengt O. Muthen posted on Sunday, July 02, 2006 - 5:30 am

No, only the user's guide. It is pairwise present without covariates and missingness can be predicted by covariates when covariates are part of the model.
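To make "pairwise present" concrete, here is a minimal Python sketch. It assumes the phrase means that each pair of outcomes uses every case observed on both members of the pair. The data and the helper name `pairwise_n` are made up for illustration; this is not Mplus code and not a description of Mplus internals.

```python
from itertools import combinations

def pairwise_n(rows):
    """Count, for each pair of variables, the cases observed on both.

    rows: list of equal-length tuples, with None marking a missing value.
    Returns {(j, k): n_jk} for every variable pair j < k.
    """
    n_vars = len(rows[0])
    counts = {}
    for j, k in combinations(range(n_vars), 2):
        counts[(j, k)] = sum(
            1 for r in rows if r[j] is not None and r[k] is not None
        )
    return counts

# Hypothetical binary outcome at three occasions; None = missing.
data = [
    (1, 0, 1),
    (0, None, 1),
    (1, 1, None),
    (None, 0, 0),
    (0, 0, 0),
]

# Listwise deletion keeps only rows with no missing value at all.
complete_cases = sum(1 for r in data if None not in r)
pairs = pairwise_n(data)
print("complete cases:", complete_cases)
print("cases per pair:", pairs)
```

In this toy data set listwise deletion would keep fewer cases than any single pair uses, which is the practical gain from the pairwise approach.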

Miles Taylor posted on Thursday, January 11, 2007 - 7:01 am

I'm running a growth with random onset model, similar to those in the Albert and Shih 2003 piece and Masyn's work/tutorials. Although my model handles missingness on the outcomes (both the binary onset and continuous growth), the program kicks out anyone who's missing on independent variables. I was planning to use multiple imputation since I could not get this to work, but I want to make sure my code is not incorrect first, especially in regard to the appropriate estimators. If a direct maximum likelihood/FIML type option is available then I'd rather go that route. Here's my basic code in Version 4...

ANALYSIS:
TYPE=MISSING;
ESTIMATOR=ML;

MODEL:
f by on1@1;
f by on2@1;
f by on3@1;
f by on4@1;
iy sy| ytot1@0 ytot2@1 ytot3@2;
f ON x1 x2 x3;
iy-sy ON x1 x2 x3;
f@0;
f WITH iy@0;
f WITH sy@0;

Thanks,
Miles

Linda K. Muthen posted on Thursday, January 11, 2007 - 8:23 am

TYPE=MISSING; with maximum likelihood is FIML. There is no missing data theory for covariates. Estimation is done conditioned on the covariates. Therefore, any observation with a missing value on a covariate is excluded from the analysis. You can include the covariates in the analysis by mentioning their variances in the MODEL command. They are then treated as dependent variables and distributional assumptions are made about them.

Miles Taylor posted on Thursday, January 11, 2007 - 10:18 am

Hi Linda,

Thanks for your quick response. I just wanted to make sure I hadn't missed something or coded incorrectly. I'll most likely just use imputation for the covariates.

Thanks!

Miles

Miles Taylor posted on Thursday, February 15, 2007 - 1:38 pm

Hi Linda,
I can't seem to get model fit indices for the model listed above (growth with random onset). I can run the growth portion separately and get RMSEA, etc., but none of these are reported for the model above or for the onset-only (discrete-time survival) model. I would calculate these by hand but my chi-square stats look suspicious (Chisq=0, 11 df). I need to find some way to get fit stats that reviewers are familiar with, even if I have to run the onset and growth models separately. Any suggestions?

Linda K. Muthen posted on Thursday, February 15, 2007 - 2:55 pm

When means, variances, and covariances are not sufficient statistics for model estimation, chi-square and related statistics cannot be computed. In these cases, nested models are compared using loglikelihoods.
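A minimal Python sketch of the loglikelihood comparison described above, assuming plain ML estimation (with a robust estimator such as MLR the difference would first need a scaling correction, which is not shown). The loglikelihood values and parameter counts are hypothetical, and `lr_test` and `chi2_sf` are illustrative helper names.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (series expansion)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    # Series for the regularized lower incomplete gamma function P(a, x/2).
    while term > total * 1e-16:
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def lr_test(ll_general, k_general, ll_nested, k_nested):
    """Likelihood-ratio test of a nested model against a more general one."""
    stat = 2.0 * (ll_general - ll_nested)
    df = k_general - k_nested
    return stat, df, chi2_sf(stat, df)

# Hypothetical loglikelihoods and free-parameter counts from two runs.
stat, df, p = lr_test(-1520.4, 12, -1525.9, 10)
print(f"LR chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
```

The nested model must be a restricted version of the general one, fitted to the same cases, for the difference to be referred to a chi-square distribution.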

Andrew Mackinnon posted on Thursday, June 21, 2007 - 12:08 am

I would be grateful for your comments on an ordinal growth curve model I am working on.

The variables relate to 6 occasions: one taken beforehand, 4 occasions during a process and one afterwards.

The four 'during' measures used a 4-point scale. An ordinal growth model was fitted to these variables as described in Example 6.4.

The Before and After variables are also ordinal but have unique response categories making it inappropriate to include them in the growth curve. I specified both as categorical in the model and defined a latent continuous variable for Before.

However, this model would not converge until I fixed the first threshold of After. My question is whether this should be necessary and whether this is a reasonable thing to do. Ultimately, I want to elaborate this model by adding more predictors, so I want to be certain that the base model is okay or whether there are better ways of achieving my aims.

Thanks,

Andrew

ANALYSIS:
TYPE=MISSING H1;
MODEL: interc slope | Occ1@0 Occ2@1 Occ3@2 Occ4@3;
B4 by Before; !Create latent continuous variable from ordinal var
b4@1; !Fix variance of latent var for identification
[after$1@0]; !Fix first threshold of ordinal dependent var

interc slope on b4;
after on interc slope;

Bengt O. Muthen posted on Thursday, June 21, 2007 - 8:57 am

It should not be necessary to fix a threshold for the after variable. Perhaps the convergence problem is due to after being very skewed with some very rare categories?

The

B4 by Before;

approach assumes a normal B4 - why not simply treat Before as a continuous covariate?

Andrew Mackinnon posted on Sunday, June 24, 2007 - 9:28 pm

Dear Bengt,

Thanks for your advice. I've experimented with collapsing categories of AFTER to eliminate rare categories. I've also done this with the other variables in the model. None of this results in convergence. Neither does treating AFTER as continuous. The parameters concerned drift further away from plausible values as the number of iterations allowed increases. This led me to impose the original constraint. Do you have any suggestions about next steps?

With regard to creating a normally distributed latent variable (B4) from the ordinal variable (BEFORE), I did this so as to treat this variable consistently with AFTER, which is measured on the same 7-point scale and also with the four 'during' measures. It seemed to me that if these variables warranted special treatment due to their ordinal properties, so did BEFORE. Is this a reasonable view or am I being too precious?

Thanks,

Andrew

Linda K. Muthen posted on Monday, June 25, 2007 - 8:38 am

It seems we need to see the data to understand this. Please send your input, data, output, and license number to support@statmodel.com.

Emily Blood posted on Sunday, October 05, 2008 - 8:38 am

Hi,
Can you explain more about the difference between the Theta and Delta parameterizations when used with a binary growth curve and the WLSMV estimator? I have read the manual on this point (examples 13.2 and 13.3) but am still not clear on when you would use one over the other, and on how allowing the scale factors for the y*'s to be parameters versus allowing the residual variances of the y*'s to be parameters affects the model that is fit. It would be helpful to see how you could get the model results obtained with the DELTA parameterization using the THETA parameterization, or vice versa, to see what each is doing. Is this possible?
Thanks,
Emily

Linda K. Muthen posted on Sunday, October 05, 2008 - 10:19 am

I would use the default Delta unless the model must be estimated using Theta. See Web Note 4 for a full discussion of this topic.
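As a rough aid to the question above, here is a Python sketch of how the two parameterizations can be related. It relies on the statement later in this thread that a scale factor is the inverted SD of the latent response variable y*, plus the assumption that for a linear growth model y* = i + x*s + e, so that V(y*) is the growth-factor part plus the residual variance. All numeric values are hypothetical and the function names are illustrative; Web Note 4 remains the authoritative account.

```python
import math

def factor_variance(x, var_i, var_s, cov_is):
    """Variance of i + x*s, the growth-factor part of y* at time score x."""
    return var_i + 2.0 * x * cov_is + x * x * var_s

def theta_to_delta(factor_var, theta):
    """Scale factor implied by a residual variance: 1 / SD(y*)."""
    return 1.0 / math.sqrt(factor_var + theta)

def delta_to_theta(factor_var, delta):
    """Residual variance implied by a scale factor."""
    if delta <= 0:
        raise ValueError("a scale factor must be positive")
    theta = delta ** -2 - factor_var
    if theta <= 0:
        raise ValueError("implied residual variance is not positive")
    return theta

# Hypothetical growth-factor (co)variances and time scores 0..3,
# with the residual variance held at 1.0 at every time point.
var_i, var_s, cov_is = 0.5, 0.1, -0.05
for x in range(4):
    fv = factor_variance(x, var_i, var_s, cov_is)
    delta = theta_to_delta(fv, theta=1.0)
    print(x, round(fv, 3), round(delta, 3))
```

One possible reading of the simulation result described in the next post: if data are generated with unit residual variances, the generating values are in the metric where the residual variance is 1, whereas fixing a scale factor at 1 puts the estimates in the metric where V(y*) is 1 at the reference time point.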

Emily Blood posted on Sunday, October 05, 2008 - 2:24 pm

I am generating data (outside of Mplus and then fitting the model with Mplus) and using the Theta parameterization is the way that I'm able to recover the parameters I set, if I use the Delta parameterization I don't recover them, but I don't quite understand why that is. That is why I was asking about the full specification of each parameterization and if you can get the results of one by specifying certain constraints in the other. Where is "Web Note 4"?
Thanks,
Emily

Emily Blood posted on Sunday, October 05, 2008 - 2:28 pm

I found Web Note 4 and will read it.
Thanks,
Emily

Arina Gertseva posted on Wednesday, February 25, 2009 - 5:29 pm

Hi,
I am trying to run a mixture model for a binary outcome measured at four occasions. The syntax for the model I am trying to estimate is below.
I would like to know whether this syntax looks O.K., because I receive a message that I have negative df.
My next question concerns the possibility of building a mean trend/trajectory for a categorical outcome. Is it possible? Should I estimate an unconditional latent growth model for the overall sample?

Thank you.

DATA: FILE IS C:\Documents and Settings\arina\Desktop\friend7.txt;
NOBSERVATIONS=3245;
VARIABLE: NAMES ARE alcohol1 alcohol2 alcohol3;
USEVARIABLES ARE alcohol1 alcohol2 alcohol3;
MISSING ARE;
CLASSES=c(4);

CATEGORICAL ARE alcohol1 alcohol2 alcohol3;
ANALYSIS: TYPE = MIXTURE;
MODEL: %OVERALL%
%C#1%
[alcohol1$1*0 alcohol2$1*0 alcohol3$1*0];
%c#2%
[alcohol1$1*0 alcohol2$1*0 alcohol3$1*1];
%c#3%
[alcohol1$1*0 alcohol2$1*1 alcohol3$1*1];
%c#4%
[alcohol1$1*1 alcohol2$1*1 alcohol3$1*1];

OUTPUT: SAMPSTAT MODINDICES(10) STAND RESIDUAL TECH4;
plot: type is plot3;
series alcohol1 (1) alcohol2(2) alcohol3 (3);

Linda K. Muthen posted on Wednesday, February 25, 2009 - 5:53 pm

You have set up an LCA model. With three categorical indicators you cannot identify four classes.

See Example 8.4 for a Growth Mixture Model for a categorical outcome. A good place to start is with an unconditional model specified in the overall part of the MODEL command.
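The negative df message can be reproduced with a simple parameter count. The Python sketch below assumes binary indicators and an unrestricted LCA with one threshold per indicator per class; `lca_df` is an illustrative helper name. A non-negative count is a necessary condition only and does not by itself guarantee identification.

```python
def lca_df(n_items, n_classes):
    """Degrees of freedom for an unrestricted LCA with binary indicators."""
    n_patterns = 2 ** n_items
    free_cells = n_patterns - 1                        # independent pattern proportions
    n_params = (n_classes - 1) + n_classes * n_items   # class probabilities + thresholds
    return free_cells - n_params

# Three binary indicators, as in the input above.
for k in (1, 2, 3, 4):
    print(k, "classes:", lca_df(3, k), "df")
```

With three binary indicators the count already reaches zero at two classes, so three or four classes ask for more parameters than the data can supply.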

Arina Gertseva posted on Wednesday, February 25, 2009 - 7:50 pm

Linda,
Thank you very much for a prompt reply.
Drawing on Example 8.4, I modified the initial model (the syntax is below). Does it look correct? After I run the single-class model, I can try to increase the number of classes, right?

Thank you very much.

DATA: FILE IS C:\Documents and Settings\arina\Desktop\friend7.txt;
NOBSERVATIONS=3245;
VARIABLE: NAMES ARE alcohol1 alcohol2 alcohol3;
USEVARIABLES ARE alcohol1 alcohol2 alcohol3;
MISSING ARE;
CLASSES=c(1);
CATEGORICAL ARE alcohol1 alcohol2 alcohol3;
ANALYSIS: TYPE = MIXTURE;
MODEL: %OVERALL%
i s| alcohol1@0 alcohol2@1 alcohol3@2;

OUTPUT: SAMPSTAT RESIDUAL TECH1 TECH8;
plot: type is plot3;
series alcohol1(1) alcohol2(2) alcohol3 (3);

Linda K. Muthen posted on Thursday, February 26, 2009 - 6:29 am

That looks fine. Yes, the next step would be to increase the number of classes.

Arina Gertseva posted on Monday, March 02, 2009 - 10:24 pm

Linda,
I am still working on the model for a binary outcome measured at three occasions (the syntax is in my previous message). For some reason I cannot get the sample statistics and the plot for the mean trend in my output.
Could you please advise me what to do?

Linda K. Muthen posted on Tuesday, March 03, 2009 - 6:26 am

Please send your files and license number to support@statmodel.com.

J.W. posted on Tuesday, October 13, 2009 - 10:12 am

When Delta parameterization is used in a LGM with ordinal outcomes, usually:
1) the mean of the latent intercept growth factor is fixed at 0.00;
2) the thresholds are constrained to be invariant across time;
3) the scale factors are free, except at a reference time point where the scale factor is fixed.

I have two questions:

1) Should the scale factor always be fixed at 1.0 at a reference time point? I tried to set it at 0.0, it did not work.

2) Alternatively, one can free the intercept factor mean and fix one threshold (e.g., the first threshold) at all time points. In my model with 6 repeated measures, when I fixed the first threshold to 0.00 across time, the estimated intercept factor mean was 0.446; when I fixed the first threshold to 1.00, the estimate became 1.446, while the other estimates remained the same. How do I interpret the results?

Your help will be appreciated!

J.W. posted on Tuesday, October 13, 2009 - 2:09 pm

More info for Question 1 I asked:

The model runs for a positive value specified for the scale factor, but the estimate of the intercept mean varies. How do I interpret the parameter estimate?
Thanks!

Bengt O. Muthen posted on Tuesday, October 13, 2009 - 6:49 pm

1) A scale factor is the inverted SD for a latent response variable so it needs to be positive.

2) For the time point where you have centered the growth model (say time 1), the term that determines the binary outcome probability is tau - alpha, or in Mplus language [u1$1] - [i]. So you can see why you got the two estimated sets of values. Typically you don't interpret tau or alpha, but merely use them in calculating outcome probabilities. So the choice of parameterization has no real interpretational impact.
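A small Python check of the two points above, under the assumptions of a probit link, a normally distributed y*, and the time point where the slope loading is zero. The values 0.446 and 1.446 come from the question; `prob_u1` and `norm_cdf` are illustrative helper names.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_u1(tau, alpha, delta=1.0):
    """P(u = 1) = P(y* > tau) for normal y* with mean alpha and SD 1/delta."""
    if delta <= 0:
        raise ValueError("a scale factor is an inverted SD, so it must be positive")
    return norm_cdf(delta * (alpha - tau))

# First threshold fixed at 0 versus fixed at 1, as reported in the question.
p_a = prob_u1(tau=0.0, alpha=0.446)
p_b = prob_u1(tau=1.0, alpha=1.446)
print(round(p_a, 4), round(p_b, 4))
```

Both settings give the same probability (about 0.67) because only the difference tau - alpha enters, and a scale factor of 0 is rejected because no SD can have an inverse of zero.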

J.W. posted on Friday, October 16, 2009 - 2:01 pm

I have a couple of questions for a LGM with 4-categories ordinal outcome measures:

1) WLSMV estimator and Delta parameterization are used in modeling: I would like to confirm the interpretation of the probabilities calculated from the Probit regression coefficients using the formula on p.406 in Mplus User's Guide. Instead of being probabilities of being in specific categories, they are: probability of y* > threshold 1 (i.e., probability of being in categories 2-4); probability of y* > threshold 2 (i.e., probability of being in categories 3-4); and probability of y* > threshold 3 (i.e., probability of being in category 4), respectively. In addition, does the covariance between the latent growth factors affect the calculation?

2) WLSMV estimator and Theta parameterization are used in modeling: in the unstandardized solution, means and variances/covariance of the latent intercept and slope factors, as well as thresholds and residual variances, all had very large p values; however, the estimates of means and variances/covariance of the latent intercept and slope factors in the standardized solution are very close to the corresponding figures in Delta parameterization. How to explain these? By the way, the threshold estimates, including p-values, in standardized solution are identical in the two parameterizations.

Bengt O. Muthen posted on Saturday, October 17, 2009 - 12:21 pm

1) Only the mean and the variance of y* play a role.

2) We need to see this - please send input, output, data, and license number to support@statmodel.com.
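On point 1, a Python sketch of how category probabilities could be computed from the mean and variance of y*. It assumes a probit link and a linear growth model y* = i + x*s + e, so the growth-factor covariance enters only through the variance of y*. All parameter values are hypothetical, and `ystar_moments` and `category_probs` are illustrative helper names.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ystar_moments(x, mean_i, mean_s, var_i, var_s, cov_is, resid_var):
    """Mean and variance of y* = i + x*s + e at time score x."""
    mean = mean_i + x * mean_s
    var = var_i + 2.0 * x * cov_is + x * x * var_s + resid_var
    return mean, var

def category_probs(thresholds, mean, var):
    """Probabilities of categories 1..K+1 given K ordered thresholds."""
    sd = math.sqrt(var)
    cum = [norm_cdf((t - mean) / sd) for t in thresholds]  # P(y* <= tau_k)
    bounds = [0.0] + cum + [1.0]
    # Differencing the cumulative probabilities gives one value per category.
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

# Hypothetical estimates for a 4-category outcome at time score 2.
mean, var = ystar_moments(x=2, mean_i=0.0, mean_s=0.2,
                          var_i=0.5, var_s=0.1, cov_is=-0.05, resid_var=0.3)
probs = category_probs([-1.0, 0.0, 1.0], mean, var)
print([round(p, 4) for p in probs])
```

The exceedance probabilities described in the question, P(y* > threshold k), are one minus the cumulative values, so the category probabilities follow by differencing.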

Nicolas Müller posted on Thursday, April 14, 2011 - 10:58 am

I'm fitting an ordinal growth model using the twolevel specification.

I'd like to know if it is possible to test the proportional odds assumption using Mplus. I thought that one way could be to declare my dependent variable as NOMINAL instead of CATEGORICAL, thus having one set of coefficients per category (multinomial regression), in order to check whether this model fits significantly better than the ordinal one where the coefficients are restricted to equality.

Anyway, I get this error when I try to fit the model with a NOMINAL dependent variable: Internal Error Code: MDP1039

Is this because what I'm trying to do makes no sense? Or should I send you my input and data along with my serial number?

Bengt O. Muthen posted on Thursday, April 14, 2011 - 6:04 pm

A growth model with a nominal outcome is a funny model. You don't have a single outcome as you do with an ordinal outcome, in the sense of having a single slope, so it almost seems like you have to have C-1 growth models for a C-category outcome.

In general, it is a little involved to test for ordinality using a nominal outcome.

If you like, you can send your output with the error message to support.

Quintana posted on Monday, November 19, 2012 - 1:09 pm

I am running a growth model of substance use at three time points. It is dichotomous (0= haven't consumed in the past year; 1= have consumed at least once in the past year).

When I run the dichotomous growth curve according to the syntax in the example in the user's guide, the model does not run (non-positive definite error). However, if I run the model with that same data and syntax EXCEPT not specifying the drinking variable as categorical in the syntax, the model runs perfectly.

(1) Can I use the output from the analyses where I don't specify the variables as categorical by creating an odds ratio myself using the betas I get from this model?

(2) If not, do you have any suggestions of what could be wrong or what I could do to get my model to work?

Note: I also get the model to work perfectly when I keep the substance use variable continuous (How often have you consumed in the past year on a scale of 1 to 5). However, the variable is very skewed, even when log transformed, so I would like to get the dichotomous model to work as well.

Thank you

Linda K. Muthen posted on Monday, November 19, 2012 - 1:42 pm

Please send the output where it did not work and your license number to support@statmodel.com.

Sarah Dermody posted on Saturday, March 21, 2015 - 11:53 am

I estimated a linear growth model for a binary outcome (binge drinking) for 6 time points. I am examining moderators (all observed binary moderators) of the effect of an intervention (observed and dummy coded variable) on the latent slope term. I have found significant interaction product terms, but I cannot find a way to probe the interaction to interpret the effect (I've looked in the Mplus manual or forums). If the outcome was continuous, I would plot the growth model adjusting for the covariates at different levels of the interaction. This does not seem to be an option for the categorical growth model. I would appreciate any help in determining how to approach this.

Bengt O. Muthen posted on Saturday, March 21, 2015 - 1:05 pm

There should be an Adjusted means option in the menu of the plot window, so that different probability curves can be plotted as a function of various covariate values.

Sarah Dermody posted on Tuesday, March 24, 2015 - 3:47 am

Thank you for your help. I was able to use Plot 3 to find this for individual weekly values (i.e., "plot estimated probabilities, conditional on a set of covariates" for a single variable) but not for the probabilities across all the time points (i.e., there was no similar option for a line plot for multiple variables in a series). So I am manually getting the probabilities for each week to make the plot. Am I missing a simpler way?

Bengt O. Muthen posted on Tuesday, March 24, 2015 - 8:52 am

Are you using the SERIES option in the Plot command?

maryam salari posted on Monday, October 19, 2015 - 12:03 pm

Hi Bengt
Can I use a correlation matrix for categorical data instead of raw data in a growth curve model? How should I specify that this correlation matrix comes from categorical data?

Your help will be appreciated.

Bengt O. Muthen posted on Monday, October 19, 2015 - 1:15 pm

No, for growth modeling with categorical outcomes you need to use raw data.

maryam salari posted on Tuesday, October 20, 2015 - 11:34 am

Thank you dear professor.
I need the likelihood function of growth curve models for a categorical response. Would you please help me?

Bengt O. Muthen posted on Tuesday, October 20, 2015 - 2:06 pm

Look at the book by Fitzmaurice, Laird and Ware: Applied Longitudinal Analysis, published by Wiley.

maryam salari posted on Tuesday, October 20, 2015 - 9:40 pm

Thank you dear professor

Andrea Norcini Pala posted on Friday, February 12, 2016 - 11:33 am

Hello,

I would like to have your feedback on a Growth Model I am working on. The indicators are ordinal variables (3 categories) but I want to treat them as nominal variables. I want to compare (#1) cat 3 vs cat 2; and (#2) cat 3 vs cat 1 (I have 4 time points). I created a set of dummy variables where (#1) category 1 is coded as missing; and (#2) cat 2 is missing. I am running a two-process model, one process for comparison #1 and one for comparison #2.
Does it make sense?

Thank you in advance for your feedback on this.

Bengt O. Muthen posted on Friday, February 12, 2016 - 5:17 pm

Growth modeling for nominal outcomes is a rather advanced technique. If you split it up into binary outcomes, I would simply run them separately, not in a 2-process model, given the unusual correlations between the processes.

Andrea Norcini Pala posted on Saturday, February 13, 2016 - 1:04 pm

Thank you!

Andrea Norcini Pala posted on Monday, February 22, 2016 - 6:38 pm

I'd appreciate your feedback on the statistical approach I have used - see post of February 12, 2016. As mentioned, I treated 3-category ordinal outcomes (0 No risk; 1 Low risk; and 3 High risk) as nominal variables.

I have run 3 models, separately (Bayes Estimator):
#1 (dummy) 1 High risk vs 0 - Low or No risk;
#2 0 High risk vs 1 Low risk (No risk = Missing);
#3 0 High risk vs 1 No risk (Low risk = Missing).

The three trajectories were then regressed on a binary variable 0/1 to test the difference between the intervention and the control group.

My question is: do you think this approach is valid or weak/debatable?

I also considered running a Hidden Markov Model to test transition from one category to another over time (although I prefer the first approach).

What would you suggest?
Thank you.

Bengt O. Muthen posted on Tuesday, February 23, 2016 - 6:04 am

I think this approach of splitting up the categories is useful if you don't really believe in the "proportional odds" assumption of ordinal regression but instead want to test if the intervention has different effects for different categories. You are in essence treating the outcome as nominal - which you can also do in a single analysis (Nominal = outcome).
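The category-splitting recode discussed above can be sketched in Python as follows. The category labels and the helper name `contrast` are hypothetical; the 0/1 direction follows models #2 and #3 in the question, with the category left out of each contrast set to missing.

```python
def contrast(values, zero_cat, one_cat):
    """Binary contrast of two categories; all other categories become None."""
    coded = []
    for v in values:
        if v == zero_cat:
            coded.append(0)
        elif v == one_cat:
            coded.append(1)
        else:
            coded.append(None)  # treated as missing in this contrast
    return coded

# Hypothetical risk labels for six people at one time point.
risk = ["none", "low", "high", "low", "high", "none"]
high_vs_low = contrast(risk, zero_cat="high", one_cat="low")    # "none" -> missing
high_vs_none = contrast(risk, zero_cat="high", one_cat="none")  # "low" -> missing
print(high_vs_low)
print(high_vs_none)
```

Each contrast drops a different subset of cases, so the separate binary models are estimated on different samples.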

Andrea Norcini Pala posted on Tuesday, February 23, 2016 - 6:35 am

Thank you professor, that is extremely useful.

When you say: "which you can also do in a single analysis (Nominal = outcome)" do you mean that I can specify Nominal are X1-X4 !(the risk index at each time point) and then run LGM (3 month, 6 month, 12 month follow up)?

I S | X1@0 X2@3 X3@6 X4@12;

Thank you so much for your help!

Bengt O. Muthen posted on Tuesday, February 23, 2016 - 6:23 pm

I am thinking of the analogy of factor analysis - or IRT analysis - with nominal items (see the IRT literature). Growth modeling uses a factor analysis model. But I don't think I've seen nominal growth done. Your approach is maybe more down to earth.