In the Mplus version 2 user's guide, example 22.1D shows a linear growth model for a categorical outcome with time-invariant and time-varying covariates. Can this model handle missing data in the outcome variables or will my participants with missing data need to be dropped? Chapter 23 that discusses modeling with missing data only mentions that it is available for analyses with continuous outcomes. Thanks for any guidance.
The TYPE=MISSING option is not available for categorical outcomes. Any observation with one or more missing values on the analysis variables will be dropped.
Shige posted on Saturday, December 31, 2005 - 5:33 pm
Is this still the case in version 3?
bmuthen posted on Saturday, December 31, 2005 - 5:47 pm
No, this post is quite outdated. With categorical outcomes you have 2 major options, ML estimation or limited information weighted least-squares estimation. With the former, you have the usual MAR facility through Type = Missing. With the latter you can now also allow missing data, but the estimation is only MAR when covariates predict missingness; when outcomes predict missingness, MCAR is needed because of the limited (2nd-order) information approach.
The limited-information weighted least-squares estimator was available only for complete data (no missing values) in older versions of Mplus (ver. 2?). It was extended to data with missing values in newer versions of Mplus. Is there any article describing the extension?
I'm running a growth with random onset model, similar to those in the Albert and Shih 2003 piece and Masyn's work/tutorials. Although my model handles missingness on the outcomes (both the binary onset and continuous growth), the program kicks out anyone who's missing on independent variables. I was planning to use multiple imputation since I could not get this to work, but I want to make sure my code is not incorrect first, especially in regard to the appropriate estimators. If a direct maximum likelihood/FIML type option is available then I'd rather go that route. Here's my basic code in Version 4...
TYPE=MISSING; with maximum likelihood is FIML. There is no missing data theory for covariates. Estimation is done conditioned on the covariates. Therefore, any observation with a missing value on a covariate is excluded from the analysis. You can include the covariates in the analysis by mentioning their variances in the MODEL command. They are then treated as dependent variables and distributional assumptions are made about them.
Miles Taylor posted on Thursday, January 11, 2007 - 10:18 am
Thanks for your quick response. I just wanted to make sure I hadn't missed something or coded incorrectly. I'll most likely just use imputation for the covariates.
Miles Taylor posted on Thursday, February 15, 2007 - 1:38 pm
Hi Linda, I can't seem to get model fit indices for the model listed above (growth with random onset). I can run the growth portion separately and get RMSEA, etc., but none of these are reported for the model above or for the onset-only (discrete-time survival) model. I would calculate these by hand but my chi-square statistics look suspicious (chi-square = 0, 11 df). I need to find some way to get fit statistics that reviewers are familiar with, even if I have to run the onset and growth models separately. Any suggestions?
When means, variances, and covariances are not sufficient statistics for model estimation, chi-square and related statistics cannot be computed. In these cases, nested models are compared using loglikelihoods.
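As a rough sketch of such a comparison: the likelihood-ratio statistic is twice the difference in loglikelihoods between the more general and the more restricted model, referred to a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The loglikelihoods and parameter counts below are made up for illustration, the function names are hypothetical, and the sketch assumes plain ML estimation; with a robust estimator such as MLR the difference needs a scaling correction that is not shown here.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= h / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-h + a * math.log(h) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def lr_test(ll_restricted, ll_general, npar_restricted, npar_general):
    """Likelihood-ratio test for two nested models (plain ML assumed)."""
    stat = 2.0 * (ll_general - ll_restricted)
    df = npar_general - npar_restricted
    return stat, df, chi2_sf(stat, df)

# Hypothetical loglikelihoods and free-parameter counts from two nested runs.
stat, df, p = lr_test(-1520.3, -1515.1, 9, 11)
print(f"LR chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
```

The loglikelihood and the number of free parameters for each run would be read off the respective outputs; the restricted model must be nested in the general one for the test to apply.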
I would be grateful for your comments on an ordinal growth curve model I am working on.
The variables relate to 6 occasions: one taken beforehand, 4 occasions during a process and one afterwards.
The four ‘during’ measures used a 4-point scale. An ordinal growth was fitted to these variables as described in Example 6.4.
The Before and After variables are also ordinal but have unique response categories making it inappropriate to include them in the growth curve. I specified both as categorical in the model and defined a latent continuous variable for Before.
However, this model would not converge until I fixed the first threshold of After. My question is whether this should be necessary and whether this is a reasonable thing to do. Ultimately, I want to elaborate this model by adding more predictors, so I want to be certain that the base model is okay or whether there are better ways of achieving my aims.
ANALYSIS: TYPE=MISSING H1;
MODEL:
interc slope | Occ1@0 Occ2@1 Occ3@2 Occ4@3;
B4 by Before; !Create latent continuous variable from ordinal var
b4@1; !Fix variance of latent var for identification
[after$1@0]; !Fix first threshold of ordinal dependent var
Thanks for your advice. I've experimented with collapsing categories of AFTER to eliminate rare categories. I've also done this with the other variables in the model. None of this results in convergence. Neither does treating AFTER as continuous. The parameters concerned drift further away from plausible values as the number of iterations allowed increases. This led me to impose the original constraint. Do you have any suggestions about next steps?
With regard to creating a normally distributed latent variable (B4) from the ordinal variable (BEFORE), I did this so as to treat this variable consistently with AFTER, which is measured on the same 7-point scale and also with the four 'during' measures. It seemed to me that if these variables warranted special treatment due to their ordinal properties, so did BEFORE. Is this a reasonable view or am I being too precious?
It seems we need to see the data to understand this. Please send your input, data, output, and license number to email@example.com.
Emily Blood posted on Sunday, October 05, 2008 - 8:38 am
Hi, Can you explain more about the difference between the Theta and Delta parameterizations when used with a binary growth curve and the WLSMV estimator? I have read the manual on this point (examples 13.2 and 13.3) but am still not clear on when you would use one over the other, and on how allowing the scale factors for the y*'s to be parameters versus allowing the residual variances of the y*'s to be parameters affects the model that is fit. It would be helpful to see how you could get the model results obtained with the DELTA parameterization using the THETA parameterization, or vice versa, to see what each is doing. Is this possible? Thanks, Emily
I would use the default Delta unless the model must be estimated using Theta. See Web Note 4 for a full discussion of this topic.
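A small numerical sketch may help to see how the two parameterizations relate. It assumes the usual relation between them: under Delta the scale factor of y* at time t is the inverse of its standard deviation, and under Theta the residual variance is the parameter, so that Var(y*_t) is the variance implied by the growth factors plus theta_t, and delta_t = 1/sqrt(Var(y*_t)). The linear growth model, the variance values, and the function names below are all assumptions made for illustration, not output from any model in this thread.

```python
import math

def growth_variance(t, var_i, var_s, cov_is):
    """Variance of i + s*t implied by a linear growth model."""
    return var_i + 2.0 * t * cov_is + t * t * var_s

def delta_from_theta(t, var_i, var_s, cov_is, theta_t):
    """Scale factor (inverse SD of y*) implied by a residual variance."""
    return 1.0 / math.sqrt(growth_variance(t, var_i, var_s, cov_is) + theta_t)

def theta_from_delta(t, var_i, var_s, cov_is, delta_t):
    """Residual variance implied by a scale factor."""
    return 1.0 / delta_t ** 2 - growth_variance(t, var_i, var_s, cov_is)

# Hypothetical growth factor (co)variances and residual variances of 1.
var_i, var_s, cov_is = 0.8, 0.1, -0.05
for t in range(4):
    d = delta_from_theta(t, var_i, var_s, cov_is, theta_t=1.0)
    back = theta_from_delta(t, var_i, var_s, cov_is, d)
    print(f"t={t}: delta={d:.4f}, theta recovered={back:.4f}")
```

Data generated with residual variances set on the Theta scale would therefore imply scale factors that differ from 1 on the Delta scale, which is one possible reason the same generating values are not recovered under both parameterizations.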
Emily Blood posted on Sunday, October 05, 2008 - 2:24 pm
I am generating data (outside of Mplus and then fitting the model with Mplus) and using the Theta parameterization is the way that I'm able to recover the parameters I set, if I use the Delta parameterization I don't recover them, but I don't quite understand why that is. That is why I was asking about the full specification of each parameterization and if you can get the results of one by specifying certain constraints in the other. Where is "Web Note 4"? Thanks, Emily
Emily Blood posted on Sunday, October 05, 2008 - 2:28 pm
I found Web Note 4 and will read it. Thanks, Emily
Hi, I am trying to run a mixture model for a binary outcome measured at four occasions. Below is the syntax for the model I am trying to estimate. I would like to know whether this syntax looks O.K., because I receive a message that I have negative df. My next question concerns the possibility of building a mean trend/trajectory for a categorical outcome. Is it possible? Should I estimate an unconditional latent growth model for the overall sample?
DATA: FILE IS C:\Documents and Settings\arina\Desktop\friend7.txt;
NOBSERVATIONS=3245;
VARIABLE: NAMES ARE alcohol1 alcohol2 alcohol3;
USEVARIABLES ARE alcohol1 alcohol2 alcohol3;
MISSING ARE;
CLASSES=c(4);
Linda, Thank you very much for a prompt reply. Drawing on Example 8.4, I modified the initial model (the syntax is below). Does it look correct? After I run the single-class model, I can try to increase the number of classes, right?
Thank you very much.
DATA: FILE IS C:\Documents and Settings\arina\Desktop\friend7.txt;
NOBSERVATIONS=3245;
VARIABLE: NAMES ARE alcohol1 alcohol2 alcohol3;
USEVARIABLES ARE alcohol1 alcohol2 alcohol3;
MISSING ARE;
CLASSES=c(1);
CATEGORICAL ARE alcohol1 alcohol2 alcohol3;
ANALYSIS: TYPE = MIXTURE;
MODEL:
%OVERALL%
i s | alcohol1@0 alcohol2@1 alcohol3@2;
OUTPUT: SAMPSTAT RESIDUAL TECH1 TECH8;
plot: type is plot3;
series alcohol1(1) alcohol2(2) alcohol3 (3);
Linda, I am still working on the model for a binary outcome measured at three occasions(the syntax is in my previous message). For some reason I cannot get the sample statistics and the plot for the mean trend in my output. Could you please advise me what to do?
J.W. posted on Tuesday, October 13, 2009 - 10:12 am
When Delta parameterization is used in an LGM with ordinal outcomes, usually: 1) the mean of the latent intercept growth factor is fixed at 0.00; 2) the thresholds are constrained to be invariant across time; 3) the scale factors are free, except at a reference time point where the scale factor is fixed.
I have two questions:
1) Should the scale factor always be fixed at 1.0 at a reference time point? I tried to set it at 0.0, it did not work.
2) Alternatively, one can free the intercept factor mean and fix one threshold (e.g., the first threshold) at all time points. In my model with 6 repeated measures, when I fixed the first threshold to 0.00 across time, the estimated intercept factor mean was 0.446; when I fixed the first threshold to 1.00, the estimate became 1.446, while the other estimates remained the same. How do I interpret the results?
Your help will be appreciated!
J.W. posted on Tuesday, October 13, 2009 - 2:09 pm
More info for Question 1 I asked:
The model runs when a positive value is specified for the scale factor, but the estimate of the intercept mean varies. How do I interpret the parameter estimate? Thanks!
1) A scale factor is the inverted SD for a latent response variable so it needs to be positive.
2) For the time point where you have centered the growth model (say time 1), the term that determines the binary outcome probability is tau - alpha, or in Mplus language [u1$1] - [i]. So you can see why you got the two sets of estimated values. Typically you don't interpret tau or alpha, but merely use them in calculating outcome probabilities. So the choice of parameterization has no real interpretational impact.
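The point about tau - alpha can be checked numerically with the two sets of values quoted in the question. The sketch below assumes a probit link and, purely for illustration, a standard deviation of 1 for the latent response variable at the centering time point; the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_above_threshold(tau, alpha, sd_ystar=1.0):
    """P(y* > tau) when y* has mean alpha and SD sd_ystar (assumed value)."""
    return normal_cdf((alpha - tau) / sd_ystar)

# Threshold fixed at 0 with estimated mean 0.446, versus
# threshold fixed at 1 with estimated mean 1.446.
p_a = prob_above_threshold(tau=0.0, alpha=0.446)
p_b = prob_above_threshold(tau=1.0, alpha=1.446)
print(p_a, p_b)
```

Both calls depend only on the difference between the threshold and the mean, so they return the same probability, which is why shifting the fixed threshold by 1 shifts the estimated mean by 1 and leaves everything else unchanged.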
I have a couple of questions for a LGM with 4-categories ordinal outcome measures:
1) WLSMV estimator and Delta parameterization are used in modeling: I would like to confirm the interpretation of the probabilities calculated from the Probit regression coefficients using the formula on p.406 in Mplus User's Guide. Instead of being probabilities of being in specific categories, they are: probability of y* > threshold 1 (i.e., probability of being in categories 2-4); probability of y* > threshold 2 (i.e., probability of being in categories 3-4); and probability of y* > threshold 3 (i.e., probability of being in category 4), respectively. In addition, does the covariance between the latent growth factors affect the calculation?
2) WLSMV estimator and Theta parameterization are used in modeling: in the unstandardized solution, means and variances/covariance of the latent intercept and slope factors, as well as thresholds and residual variances, all had very large p values; however, the estimates of means and variances/covariance of the latent intercept and slope factors in the standardized solution are very close to the corresponding figures in Delta parameterization. How to explain these? By the way, the threshold estimates, including p-values, in standardized solution are identical in the two parameterizations.
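On the distinction raised in question 1, a generic ordered-probit sketch shows how the probabilities of exceeding each threshold relate to the probabilities of the individual categories: the category probabilities are differences of adjacent exceedance probabilities. The thresholds below are made up, the mean and standard deviation of y* are simply taken as inputs (how they are obtained from a fitted model is not shown), and the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exceedance_probs(thresholds, mean=0.0, sd=1.0):
    """P(y* > tau_k) for each threshold, with y* ~ N(mean, sd^2)."""
    return [1.0 - normal_cdf((tau - mean) / sd) for tau in thresholds]

def category_probs(thresholds, mean=0.0, sd=1.0):
    """Probability of each ordered category, lowest category first."""
    bounds = [1.0] + exceedance_probs(thresholds, mean, sd) + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]

# Hypothetical thresholds for a 4-category outcome.
taus = [-0.5, 0.4, 1.2]
print("P(y* > tau_k):", exceedance_probs(taus))
print("P(category k):", category_probs(taus))
```

With three thresholds the first list has three entries (categories 2-4, 3-4, and 4) and the second has four entries that sum to one.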
I'm fitting an ordinal growth model using the twolevel specification.
I'd like to know if it is possible to test the proportional odds assumption using Mplus. I thought that one way could be to declare my dependent variable as NOMINAL instead of CATEGORICAL, thus having one set of coefficients per category (multinomial regression), in order to check whether this model fits significantly better than the ordinal one where the coefficients are restricted to equality.
Anyway, I get this error when I try to fit the model with a NOMINAL dependent variable: Internal Error Code: MDP1039
Is this because what I'm trying to do makes no sense? Or should I send you my input and data along with my serial number?
A growth model with a nominal outcome is a funny model. You don't have a single outcome as you do with an ordinal outcome, in the sense of having a single slope, so it almost seems like you have to have C-1 growth models for a C-category outcome.
In general, it is a little involved to test for ordinality using a nominal outcome.
If you like, you can send your output with the error message to support.
Quintana posted on Monday, November 19, 2012 - 1:09 pm
I am running a growth model of substance use at three time points. It is dichotomous (0= haven't consumed in the past year; 1= have consumed at least once in the past year).
When I run the dichotomous growth curve according to the syntax in the example in the user's guide, the model does not run (non-positive definite error). However, if I run the model with that same data and syntax EXCEPT not specifying the drinking variable as categorical in the syntax, the model runs perfectly.
(1) Can I use the output from the analyses where I don't specify the variables as categorical by creating an odds ratio myself using the betas I get from this model?
(2) If not, do you have any suggestions of what could be wrong or what I could do to get my model to work?
Note: I also get the model to work perfectly when I keep the substance use variable continuous (How often have you consumed in the past year on a scale of 1 to 5). However, the variable is very skewed, even when log transformed, so I would like to get the dichotomous model to work as well.
I estimated a linear growth model for a binary outcome (binge drinking) for 6 time points. I am examining moderators (all observed binary moderators) of the effect of an intervention (observed and dummy coded variable) on the latent slope term. I have found significant interaction product terms, but I cannot find a way to probe the interaction to interpret the effect (I've looked in the Mplus manual or forums). If the outcome was continuous, I would plot the growth model adjusting for the covariates at different levels of the interaction. This does not seem to be an option for the categorical growth model. I would appreciate any help in determining how to approach this.
Thank you for your help. I was able to use Plot 3 to find this for individual weekly values (i.e., "plot estimated probabilities, conditional on a set of covariates" for a single variable) but not for the probabilities across all the time points (i.e., there was no similar option for a line plot for multiple variables in a series). So I am manually getting the probabilities for each week to make the plot. Am I missing a simpler way?
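The manual calculation described here could be scripted along the following lines. This is only a sketch under several assumptions that are not stated in the thread: a logit link, a threshold held equal across time, a slope factor regressed on the intervention, the moderator, and their product, and the intercept factor held at a value of 0 (so these are probabilities at that value of the intercept factor, not probabilities averaged over the random effects). All estimates and names below are hypothetical.

```python
import math

def slope_mean(alpha_s, b_tx, b_mod, b_int, tx, mod):
    """Model-implied slope for given intervention and moderator values."""
    return alpha_s + b_tx * tx + b_mod * mod + b_int * tx * mod

def prob_trajectory(tau, slope, times, intercept=0.0):
    """P(u_t = 1) at each time score under a logit link."""
    return [1.0 / (1.0 + math.exp(-(intercept - tau + slope * t)))
            for t in times]

# Hypothetical estimates; replace with values from the model output.
est = dict(alpha_s=0.10, b_tx=-0.15, b_mod=0.05, b_int=-0.20)
tau = 1.2
times = [0, 1, 2, 3, 4, 5]

for mod in (0, 1):
    for tx in (0, 1):
        s = slope_mean(tx=tx, mod=mod, **est)
        probs = prob_trajectory(tau, s, times)
        print(f"moderator={mod}, intervention={tx}:",
              [round(p, 3) for p in probs])
```

Each printed row is one line in the plot, so the four rows together show the intervention effect on the trajectory at each level of the moderator.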
I would like to have your feedback on a growth model I am working on. The indicators are ordinal variables (3 categories) but I want to treat them as nominal variables. I want to compare (#1) cat 3 vs cat 2; and (#2) cat 3 vs cat 1 (I have 4 time points). I created a set of dummy variables where (#1) category 1 is coded as missing; and (#2) cat 2 is coded as missing. I am running a two-process model, one process for comparison #1 and one for comparison #2. Does it make sense?
Growth modeling for nominal outcomes is a rather advanced technique. If you split it up into binary outcomes, I would simply run them separately, not in a 2-process model, given the unusual correlations between the processes.
I'd appreciate your feedback on the statistical approach I have used - see post of February 12, 2016. As mentioned, I treated 3-category ordinal outcomes (0 No risk; 1 Low risk; and 3 High risk) as nominal variables.
I have run 3 models, separately (Bayes Estimator): #1 (dummy) 1 High risk vs 0 - Low or No risk; #2 0 High risk vs 1 Low risk (No risk = Missing); #3 0 High risk vs 1 No risk (Low risk = Missing).
The three trajectories were then regressed on a binary variable 0/1 to test the difference between the intervention and the control group.
My question is: do you think this approach is valid or weak/debatable?
I also considered running a Hidden Markov Model to test transition from one category to another over time (although I prefer the first approach).
I think this approach of splitting up the categories is useful if you don't really believe in the "proportional odds" assumption of ordinal regression but instead want to test if the intervention has different effects for different categories. You are in essence treating the outcome as nominal - which you can also do in a single analysis (Nominal = outcome).
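For concreteness, the splitting described in the question can be written as a small recoding step applied at each time point. The sketch below follows the three contrasts as the poster lists them, uses category labels rather than the original numeric codes, and represents missing values as None; the function and key names are hypothetical.

```python
def split_contrasts(risk):
    """Recode one 3-category risk value into the three binary contrasts.

    risk must be "none", "low", or "high"; None marks a missing value.
    """
    if risk not in ("none", "low", "high"):
        raise ValueError(f"unexpected category: {risk!r}")
    return {
        # 1: 1 = high risk, 0 = low or no risk
        "high_vs_rest": 1 if risk == "high" else 0,
        # 2: 0 = high risk, 1 = low risk, no risk treated as missing
        "high_vs_low": None if risk == "none" else (0 if risk == "high" else 1),
        # 3: 0 = high risk, 1 = no risk, low risk treated as missing
        "high_vs_none": None if risk == "low" else (0 if risk == "high" else 1),
    }

for value in ("none", "low", "high"):
    print(value, split_contrasts(value))
```

Each contrast then yields its own set of repeated binary indicators, one per time point, for a separate growth model.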
When you say "which you can also do in a single analysis (Nominal = outcome)", do you mean that I can specify Nominal are X1-X4 !(the risk index at each time point) and then run an LGM (3-month, 6-month, 12-month follow-up)?
I am thinking of the analogy of factor analysis - or IRT analysis - with nominal items (see the IRT literature). Growth modeling uses a factor analysis model. But I don't think I've seen nominal growth done. Your approach is maybe more down to earth.