Growth Models with Censored Data?
Mplus Discussion > Growth Modeling of Longitudinal Data
 Dan Bauer posted on Tuesday, July 11, 2000 - 10:18 am
I am hoping to use Mplus to do some growth modeling with censored data. The data cover credit debt over 4 time points. The censoring is at the low end of the distribution, at $500. The censoring threshold is invariant across the four time points.

It seems like a growth model of these data would be a relatively straightforward extension of the case where the data are categorical. That is, both analyses would proceed from a polychoric correlation matrix and incorporate information on thresholds.

My question is whether I am correct that growth modeling can be done with censored variables, and, if so, whether Mplus can do it. References to relevant texts or examples would also be appreciated.
 Bengt O. Muthen posted on Tuesday, July 11, 2000 - 3:23 pm
LISCOMP, the precursor to Mplus, allowed censored variables through Tobit modeling, but we felt that this relied on assumptions that were too strong and did not include it in Mplus. Although not optimal, I would suggest treating your variables as ordered polytomous, categorizing debts above $500 into a couple of categories. A related reference (although not in a longitudinal context) is:

Muthén, B., & Speckart, G. (1983). Categorizing skewed, limited dependent variables: Using multivariate probit regression to evaluate the California Civil Addict Program. Evaluation Review, 7, 257-269. (#3)

Future developments will probably handle these kinds of variables in a better way.
 Dan Bauer posted on Tuesday, July 18, 2000 - 5:22 am
Thank you for responding so promptly to my question about censored data. Would you mind expanding on your comments? My colleagues here feel that rendering censored data into polytomous categories loses valuable information. Further, either treatment of the variable (ordinal or censored) assumes an underlying normal distribution. Is there an assumption peculiar to censored data that is particularly suspect?
 bmuthen posted on Tuesday, July 18, 2000 - 7:43 am
The assumption of censored-normal Tobit that I find limiting is that the coefficients for the covariates' influence on the probability of censoring are proportional to the coefficients of the covariates' influence on the amount observed when not censored. The ordered polytomous model would seem to share this limitation in this application. An alternative approach has been taken using two-part (semicontinuous) modeling. You may find it useful to read a recent paper on this by Olsen and Schafer at Penn State. It can be found on the web site
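To make the proportionality point concrete, here is a minimal Python sketch of the censored-normal (Tobit) log-likelihood, assuming censoring from below at a known point c (the $500 floor in the original question) and a single covariate. The same coefficients beta enter both the probability of being censored, P(y* <= c) = Phi((c - mu) / sigma), and the density of the uncensored amounts. Each covariate's effect on the probit index of being above c is therefore its beta divided by one common sigma, so a covariate cannot affect the chance of exceeding the floor without proportionally affecting the amount. Function names are illustrative only; this is not how LISCOMP or Mplus implements the model.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_logpdf(z):
    """Log density of the standard normal."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def tobit_loglik(y, x, beta, sigma, c):
    """Log-likelihood of a Tobit model censored from below at c.

    Latent y* = beta[0] + beta[1] * x + e, e ~ N(0, sigma^2); observed y = max(y*, c).
    """
    ll = 0.0
    for yi, xi in zip(y, x):
        mu = beta[0] + beta[1] * xi
        if yi <= c:
            # Censored case: probability that the latent value is at or below c
            ll += math.log(norm_cdf((c - mu) / sigma))
        else:
            # Uncensored case: normal density of the observed amount
            ll += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return ll

# Toy data: two debts at the $500 floor, two above it
print(tobit_loglik([500.0, 500.0, 800.0, 1200.0], [0.0, 1.0, 2.0, 3.0], (400.0, 200.0), 250.0, 500.0))
```

A two-part model, by contrast, gives the "any debt above the floor" part and the "amount" part their own coefficients.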

This paper is being revised for JASA.
 Anonymous posted on Friday, August 06, 2004 - 8:49 pm

Great bulletin board here! I have a growth modelling question with some survival aspect involved.

I have a continuous variable (say a functional disability score) which is supposed to have been measured at three times, say times 0, 2 and 6. However, some individuals die in between times 2 and 6, and therefore have no measurement. These deaths occurred while there was still function, so I am unwilling to plug in a zero score for them, as has been suggested elsewhere.

I am primarily interested in the traditional growth modelling aspect, in which I would like to characterise the latent growth curves (i.e., linear slopes) across people. My question is, how do I handle the deaths in this analysis?

I guess one way is to assume their time 6 measurements are missing at random (conditional on observed trajectories), and therefore "automatically" accounted for in a likelihood based analysis?

Or is there a more elegant method in Mplus that can jointly model both the longitudinal measurements and the survival process? With this, survivors would be censored at time 6, while those dying would have their time of death recorded exactly but would have no functional disability measurement at that time.

Many thanks in advance.
 bmuthen posted on Sunday, August 08, 2004 - 11:57 am
MAR would seem a good first step, letting the scores from the first 2 time points predict the time 6 missingness.

You could try more advanced, non-ignorable missing data modeling in line with the movie of my Spring 2004 UCLA course:

- see the Lecture 17 handout (under handouts). For instance, you can try to use the growth slope as a predictor of time 6 missingness.

There have been several recent articles on joint survival modeling and growth modeling in the biostatistics literature. Mplus cannot yet do continuous-time survival, but it can do very general discrete-time survival modeling.
 Michael J. Zyphur posted on Saturday, November 19, 2005 - 8:03 pm
Hi Linda (sorry for two posts, but I want to keep things organized under the right categories),

Running a censored LGM with values censored (from above) at 4.0, I am having problems estimating a model with freely estimated slope-factor loadings, i.e.,

i s | 1@0 2@1 3@* 4@*

I am thinking this kind of a model might be difficult to estimate with censored data. Is this true?

Thanks again!

 Linda K. Muthen posted on Sunday, November 20, 2005 - 9:59 am
The way you specify free time scores is as follows:

i s | 1@0 2@1 3* 4*;

I am not sure how the program would interpret @*

This type of model should not be difficult with censored data, unless perhaps the data are not truly censored, in which case the model is not appropriate for the data.
 Xuecheng, Liu posted on Monday, March 06, 2006 - 12:40 pm
Hi, Muthén and Muthén,

Is it possible (with command “CENSORED” in ANALYSIS: TYPE=MIXTURE) that we specify some variables to be censored from below and above SIMULTANEOUSLY?

Many thanks!
 Linda K. Muthen posted on Monday, March 06, 2006 - 1:29 pm
Variables can be censored from above or below but not both.
 Richard Gibson posted on Tuesday, March 25, 2008 - 4:11 am
Is the model unstable? N = 88 facilities; t = 4 (24 months collapsed into 6-month blocks); the measure y is the event rate per 100 beds; the event is rare, so there are many zeros, hence a censored-inflated model.

Model 5
i s | y1@0 y2@1 y3@2 y4@3;
i2 s2 | y1#1@0 y2#1@1 y3#1@2 y4#1@3;
i s on StudyG bt1 bt2;

Parameter Estimate S.E. Est./S.E. P-Value
I ON STUDYG -1.286 0.340 -3.782 0.000
S ON STUDYG 0.374 0.126 2.956 0.003

Model 6
i s s2 on StudyG bt1 bt2;
i2*0.757; s2*0.121;

Parameter Estimate S.E. Est./S.E. P-Value
I ON STUDYG -1.717 0.794 -2.163 0.031
S ON STUDYG 0.504 0.348 1.452 0.147

Model 7
i s i2 s2 on StudyG bt1 bt2;
i2*0.757; s2*0.121;

Parameter Estimate S.E. Est./S.E. P-Value
I ON STUDYG -2.033 1.184 -1.717 0.086
S ON STUDYG 0.623 0.447 1.391 0.164

I am puzzled by the changes in the estimates and standard errors across the three models and would appreciate direction in understanding why they change enough to alter the inference about the intercept and slope regressed on StudyG. I am worried that the instability suggests more than one maximum.
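The Est./S.E. column in Mplus output is a Wald z statistic, and the p-value is its two-tailed normal probability. The short Python sketch below (helper name hypothetical) recomputes both from the printed S ON STUDYG estimates. It shows that the standard error grows much more than the estimate from Model 5 to Model 7 (0.126 to 0.447, against 0.374 to 0.623). That growth, rather than a collapse of the point estimate, is what moves p above 0.05. Small differences from the printed z values come from rounding of the displayed estimates and standard errors.

```python
import math

def wald_test(est, se):
    """Return the Wald z statistic (Est./S.E.) and its two-tailed normal p-value."""
    z = est / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# S ON STUDYG estimates and standard errors as printed for Models 5, 6 and 7
slope_on_studyg = {"Model 5": (0.374, 0.126), "Model 6": (0.504, 0.348), "Model 7": (0.623, 0.447)}

for model, (est, se) in slope_on_studyg.items():
    z, p = wald_test(est, se)
    print(f"{model}: z = {z:.3f}, p = {p:.3f}")
```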
 Linda K. Muthen posted on Tuesday, March 25, 2008 - 9:14 am
When you leave out paths that are not zero, your model may be misspecified. I would trust Model 7.
 Richard Gibson posted on Tuesday, March 25, 2008 - 3:56 pm
Thanks. The earlier models were just steps along the way towards model 7 and the results of model 7 make sense so I am glad that it is the trustworthy one!

Misspecification can certainly have a powerful effect!
 Mariam Dum posted on Monday, April 21, 2008 - 1:55 pm
I was wondering if you could provide me with a good reference on censored data for growth mixture modeling?
 Linda K. Muthen posted on Monday, April 21, 2008 - 4:50 pm
When you say censored, do you mean censored normal data or right censoring as seen in survival models?
 Tabitha McKinley posted on Friday, January 02, 2009 - 5:12 pm
I am working on a dissertation regarding special education student populations and am using data censored from below in order to show ways to display the students' growth (or lack thereof) when they are not adequately answering standardized test items. I have looked at both the censored and censored-inflated models in the Mplus manual, but am having some trouble conceptually comprehending the real expected differences in path values between the two models.
Any help please??? Thanks!
 Bengt O. Muthen posted on Friday, January 02, 2009 - 5:39 pm
Before I answer that, have you considered two-part growth modeling? See the Mplus web site for references (under Papers and under Growth). The Olsen-Schafer article argues for two-part over censored growth modeling.
 Tabitha McKinley posted on Saturday, January 03, 2009 - 5:15 am
I have read the paper, and from what I can determine, it seems that X variables with negative measurement error scores are treated as censored, while those with positive measurement error are treated as noncensored. In addition, their model is estimated using categorical variables, and I am analyzing continuous observations.
 Bengt O. Muthen posted on Sunday, January 04, 2009 - 4:55 pm
I don't think you are reading Olsen-Schafer correctly. Their article is about a continuous outcome which has a strong floor effect (piling up at zero). They turn that into what amounts to a parallel process model for the binary part and the continuous part. See how we describe it in the handout for the short course Topic 4 on our web site. See also the 2 two-part papers under Papers, Two-Part... on our web site.
 Linda K. Muthen posted on Tuesday, February 03, 2009 - 4:28 pm
When you change the zeros to ones, there is more censoring because there were already some ones. This changes the data and therefore the model may have convergence problems.
 Tabitha McKinley posted on Tuesday, February 03, 2009 - 4:53 pm
Thanks for the rapid response. I will be sure to change the entire structure of the data the next time I run the model.
I am also preparing to create a dataset which mimics the one in the example you provided. Is there a threshold for the amount of censored observations that can be included before I experience convergence problems?
 Linda K. Muthen posted on Tuesday, February 03, 2009 - 5:42 pm
Each example in the user's guide has a Monte Carlo counterpart. This would be a good place to start if you are generating data. You can try various amounts of censoring in this way to see when it becomes problematic.
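The same "vary the censoring and watch what happens" idea can be prototyped outside Mplus. The sketch below is plain Python, not the Mplus Monte Carlo facility, and the function name is hypothetical. It draws latent normal values for a single occasion and censors them from below at the quantile that yields a target share at the floor. For an actual study one would instead generate the full growth model through the Monte Carlo counterpart of the user's guide example and fit it at each censoring level.

```python
import random
from statistics import NormalDist

def simulate_censored_below(n, prop_censored, mean=0.0, sd=1.0, seed=1):
    """Draw n latent normal values and censor them from below.

    The censoring point is the prop_censored quantile of the latent normal,
    so roughly that share of observations piles up at the floor.
    """
    rng = random.Random(seed)
    floor = NormalDist(mean, sd).inv_cdf(prop_censored)
    latent = [rng.gauss(mean, sd) for _ in range(n)]
    observed = [max(v, floor) for v in latent]
    return observed, floor

for target in (0.10, 0.30, 0.50, 0.70):
    y, floor = simulate_censored_below(5000, target)
    share = sum(v == floor for v in y) / len(y)
    print(f"target {target:.2f}: floor = {floor:.3f}, share censored = {share:.3f}")
```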
 Tabitha McKinley posted on Wednesday, February 04, 2009 - 8:32 am
In your first post you stated that changing the zeros to ones adds to the amount of censoring and causes convergence problems.
Is there a specified amount of censoring that is preferable when trying to run this model? I was trying to find something about this in the manual and saw nothing specific about how much censoring is "too much" for the model.
 Linda K. Muthen posted on Wednesday, February 04, 2009 - 9:31 am
I don't know how much censoring is too much.
 Tabitha McKinley posted on Wednesday, February 04, 2009 - 10:08 am
OK, thanks. One last question (for today at least): does zero have to be the value for the lowest observation in a floor effect model? Can other "low" values work in this format?
 Linda K. Muthen posted on Wednesday, February 04, 2009 - 11:17 am
The lowest value does not need to be zero.
 Tabitha McKinley posted on Wednesday, February 11, 2009 - 2:47 pm
I have another set of questions as I've been working with the censored-inflated model, and I can't find the answers in the manual. I'm working on this with Debbi Bandalos and she couldn't answer my questions so I decided to come back here:
1) Is the intercept of the model fixed to zero by default of the program?
2) Is the "censored" part of the model due to censoring at specific time points, or censored across time?
3) How does this model select which values are censored? From what I can see, it looks like the model just selects based on the lowest value in the data, but I've not read anything that supports that.
4) Does the censored part have to be an integer, or can it also be a decimal?
Thanks so much for all of your help. I promise to send you the paper when I'm finished with this :-)
 Bengt O. Muthen posted on Wednesday, February 11, 2009 - 6:28 pm
1) No, it is free by default.

2) You have different censoring for different time points.

3) It takes the lowest value unless you give it another value.

4) It can be a real number with decimals.
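Answers 2 to 4 together mean that each time point has its own censoring point, which by default is that variable's lowest observed value and may be a decimal. Below is a small Python sketch of that default rule (hypothetical helper name and made-up data), useful for checking how much piling-up each occasion has before fitting the model.

```python
def floor_summary(columns):
    """For each time point, report the censoring point (lowest observed value)
    and the share of non-missing observations sitting at that value.

    columns maps a variable name to a list of values, with None for missing.
    """
    summary = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        floor = min(observed)
        share = sum(v == floor for v in observed) / len(observed)
        summary[name] = (floor, share)
    return summary

# Made-up scores for two occasions; the floor need not be zero or an integer
print(floor_summary({"y1": [1.5, 1.5, 2.0, 3.25, None], "y2": [1.5, 2.5, 2.5, 4.0, 5.0]}))
```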
 Nicholas Bishop posted on Monday, June 01, 2009 - 5:45 pm
I have a question similar to the one posted by "Anonymous" on Friday, August 06, 2004 - 8:49 pm. I am using a growth curve model that estimates BMI baseline and change over 14 years in an older adult population. The outcomes of interest are BMI slope and intercept, but I need to control for mortality selection in this population. Is there a way I can incorporate a discrete-time survival analysis into the latent growth model to control for mortality selection? Thanks.
 Bengt O. Muthen posted on Monday, June 01, 2009 - 6:47 pm
If I understand you correctly, you are raising an interesting point about possible non-ignorable missingness due to dropout in the form of mortality. That is, the usual MAR assumption that past observed outcome values predict missingness may not hold. There is a whole set of possibilities for handling such modeling in Mplus. Ironically, I am just preparing a talk on this topic for the UK Mplus Users Group to be presented June 8. This talk and Mplus scripts will be posted. I will also try to record this as a Camtasia web talk.

Adding discrete-time survival modeling is one way to handle this. It is related to so-called Diggle-Kenward modeling, which can be done in Mplus. Another way is to use dropout dummy variables in line with Roy (2003) in Biometrics - this is better than pattern-mixture modeling in my view. The Diggle-Kenward approach represents "selection modeling", whereas the Roy approach represents "mixture modeling".
 Nicholas Bishop posted on Friday, June 12, 2009 - 11:31 am
I look forward to the posting of the talk and the Mplus scripts. Please let me know where I can find these resources once they are posted. Thanks.
 Michelle posted on Friday, June 26, 2009 - 7:28 am
Hi - Jumping here from the distal outcomes conversation. I am trying to understand missing vs. censored data in a latent class growth-survival analysis model similar to example 8.15 or 8.16 in the Mplus manual or the Muthen & Masyn (2005) paper. In my case, mortality can be measured either by wave (discrete time) or continuously. We have 4 waves of data over 20 years. We'd like to use latent classes of healthy aging (defined by a score, y1-y4) to predict mortality (u1-u4 in the discrete-time model).

I have missing data due to non-response at various individual waves, and I don't really know how Mplus is handling this. Unlike the examples provided in Muthen & Masyn (2005), these people are not really right-censored, as they come back into the dataset at later waves, or we have data on their deaths later. Does Mplus use the survival variable to differentiate between someone who is missing due to death and someone who is a non-respondent for one wave? A colleague has suggested imputing the non-responses in order to reduce missingness to only one type (death) - is this an appropriate approach? Are there other ways to handle this, either in the data setup or in the modeling approach?

Thanks for any guidance you can give,
 Bengt O. Muthen posted on Friday, June 26, 2009 - 9:47 am
It seems to me that intermittent missingness is more likely to obey the MAR assumption (of missingness being predicted by observed variables rather than by the missing value that was not captured) than dropout (due to death) missingness. So intermittent missingness needs no special action in the modeling because Mplus does ML under MAR. Dropout missingness, on the other hand, may be non-ignorable (NMAR; Not Missing At Random) and may therefore need the extra information on the dropout time to obey MAR (the dropout time provides the observed variable that causes the missingness, which is what MAR requires). So when you add survival modeling to the growth modeling you protect yourself against biases due to dropout missingness.

In short, no imputations are needed, just modeling.
 Michelle posted on Friday, June 26, 2009 - 12:18 pm
Thanks! This is very helpful. The final question I have is: how does Mplus know which instances are intermittent missing and which are missing due to death/dropout? Should these be coded differently in the dataset?

 Bengt O. Muthen posted on Friday, June 26, 2009 - 12:31 pm
Mplus knows. They should not be coded differently. Missing due to dropout simply has missing also for all subsequent values.

In the DTSA part once an event occurs you score u=1 and subsequent u's as missing, which means that this person is no longer part of the risk set for later death. U is the DV in your run so think regression where a person is not part of the computations when missing on the DV.
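Here is a minimal Python sketch of the coding rule described above, with a hypothetical helper name. Before the event u = 0, in the event period u = 1, and afterwards u is missing, so the person drops out of the risk set. Survivors get 0 in every period, which is how they end up censored at the last wave. The None values would be written to the data file as whatever missing-value code the Mplus input declares.

```python
def dtsa_indicators(event_time, n_periods):
    """Code discrete-time survival indicators u1..uT for one person.

    event_time: period (1-based) in which the event (e.g., death) occurred,
    or None if the person survived all n_periods.
    """
    u = []
    for t in range(1, n_periods + 1):
        if event_time is None or t < event_time:
            u.append(0)       # still at risk, no event yet
        elif t == event_time:
            u.append(1)       # event occurs in this period
        else:
            u.append(None)    # after the event: missing, no longer in the risk set
    return u

print(dtsa_indicators(2, 4))     # died in period 2
print(dtsa_indicators(None, 4))  # survived all four periods
```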
 Michelle posted on Monday, June 29, 2009 - 9:53 am
Thanks again! The help is much appreciated as I am very new to Mplus!

 Nicholas Bishop posted on Tuesday, August 25, 2009 - 2:32 pm
Hello, I am using a growth curve model that estimates BMI baseline and change over 14 years in an older adult population. The outcomes of interest are BMI slope and intercept, but I need to control for mortality selection in this population. I am able to create the GCM and the discrete-time survival analysis separately, but I am interested in incorporating both of these into a single model. Is there a better way to control for the non-ignorable missing data caused by mortality selection? Is there syntax available that would be helpful to me? Thank you.
 Nicholas Bishop posted on Tuesday, August 25, 2009 - 2:37 pm
I forgot to ask if there is example syntax available for selection modeling? Thanks again.
 Bengt O. Muthen posted on Tuesday, August 25, 2009 - 6:47 pm
To account for possibly non-ignorable missingness due to dropout, you may want to analyze the growth model and the survival model jointly. These 2 processes should be correlated. One approach to draw on is the so-called Diggle-Kenward selection model. If you email me, I can send you the Mplus input for that approach. I am about to finish an overview paper on these matters.

Selection modeling in the sense of Heckman is not available in Mplus.
 Tim Stump posted on Thursday, September 24, 2009 - 8:54 am
I have a continuous outcome measured at different time points and want to specify a growth model for this process. The outcome is censored above and below. I read in previous posts that you can't specify a variable as censored both above and below. Is this still true? Also, wondering if you have other options or ideas of different approaches for these kind of data.
 Bengt O. Muthen posted on Thursday, September 24, 2009 - 9:29 am
Yes, currently Mplus does not allow censoring from both below and above. Perhaps the variable can be dichotomized? If too much information gets lost that way, perhaps you want to ask why there are these two peaks in the distribution - is this a mixture of different types of people so that mixture modeling is relevant?

In addition, perhaps a combination of censored-normal modeling and two-part modeling is possible. Or maybe we need three-part modeling.
 Tim Stump posted on Thursday, September 24, 2009 - 11:22 am
Dr. Muthen, thanks for responding. Just to provide more info, I'm analyzing the clearance of a certain antigen in urine and serum specimens of patients during a specific treatment. Measurements are taken at baseline and at 1, 3, 5, 8, and 13 weeks. The censoring limits are .6 at the low end and 39 at the high end. Subjects whose assay results fall outside these limits are recorded as <.6 or >39. So they actually have values outside the limits, but I don't know what they are; hence the reason for specifying a censored variable. There is piling up at both ends, and the percentage piling up ranges from 10% to 50% depending on the specimen (urine or serum) and time point.

Given this information, do you think mixture modeling would be relevant? Is three part modeling the same as mixture modeling?
 Bengt O. Muthen posted on Thursday, September 24, 2009 - 2:06 pm
I wonder if your censoring shifts from one end of the scale to the other end over time? If so, you could specify censored-below/above for early/late time points.
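One way to check this suggestion is to tabulate, per time point, the share of results reported as below the lower limit and above the upper limit. The Python sketch below uses hypothetical helper names and made-up entries. It parses entries such as "<.6" and ">39" into the limit value plus a censoring flag. If an occasion has a sizable share at only one end, that occasion could be declared censored from that side alone.

```python
def parse_assay(value):
    """Turn a raw assay entry into (numeric value, censoring flag).

    Entries such as '<.6' or '>39' are set to the limit and flagged 'below'/'above';
    plain numbers are returned unflagged.
    """
    text = value.strip()
    if text.startswith("<"):
        return float(text[1:]), "below"
    if text.startswith(">"):
        return float(text[1:]), "above"
    return float(text), None

def censoring_shares(entries):
    """Share of entries censored below and above at one time point."""
    flags = [parse_assay(e)[1] for e in entries]
    n = len(flags)
    return flags.count("below") / n, flags.count("above") / n

# Made-up entries: piling up at the top early, at the bottom late
print("baseline:", censoring_shares([">39", ">39", "12.5", "<.6"]))
print("week 13: ", censoring_shares(["<.6", "<.6", "<.6", "2.1"]))
```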

If you have censoring from both ends at a given time point, perhaps you have a mixture of patients, for which mixture modeling might be helpful even if it doesn't allow for censoring from above for one class and from below for another.
 Bengt O. Muthen posted on Thursday, September 24, 2009 - 2:11 pm
2 and 3-part modeling is not the same as mixture modeling. We teach on 2-part modeling, mixture modeling, and their differences in our short courses - see handouts at
 Nicholas Bishop posted on Saturday, February 20, 2010 - 6:36 pm
Hello. I am using a growth curve selection model to adjust for NMAR when the y outcome is word recall scores in a sample of elderly adults. I believe the value of y at time t-1 should predict drop out at time t. I have pasted the results for the discrete time portion of the model below.

DD00 ON est s.e. est/s.e. p-value
TOTREC98 0.245 0.043 5.747 0.000
TOTREC00 -0.790 0.090 -8.765 0.000

odds ratio
TOTREC98 1.277
TOTREC00 0.454

Am I correct in regressing death in 2000 (DD00) on word recall scores from 1998 (totrec98) and 2000 (totrec00)? If so, am I correct interpreting the odds ratio as "with a one unit increase in word-recall in 1998, the odds of dropout in 2000 increase by 27.7%"? Thanks for your time.
 Bengt O. Muthen posted on Sunday, February 21, 2010 - 11:18 am
Yes, I think so. It sounds like you are working with the Diggle-Kenward selection model. Note, however, that the separate interpretation of the two coefficients (for y_t and y_{t-1}) is not clear-cut. DK (1994) discuss an alternative parameterization which you can do using Model Constraint.
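The odds ratio for a logistic coefficient b is exp(b). The percentage change in the odds per one-unit increase in that predictor, holding the other predictor fixed, is 100 (exp(b) - 1). The "holding fixed" part is exactly why the two coefficients are hard to interpret separately when y_t and y_{t-1} are correlated. Below is a short Python check of the numbers in the thread (hypothetical helper name); tiny differences from the printed odds ratios reflect rounding of the displayed coefficients.

```python
import math

def odds_ratio_change(coefficient):
    """Odds ratio exp(b) and percent change in the odds per one-unit increase."""
    odds_ratio = math.exp(coefficient)
    return odds_ratio, (odds_ratio - 1.0) * 100.0

# Logistic regression coefficients for DD00 ON ... as printed in the thread
for name, b in {"TOTREC98": 0.245, "TOTREC00": -0.790}.items():
    odds_ratio, pct = odds_ratio_change(b)
    print(f"{name}: OR = {odds_ratio:.3f}, change in odds per unit = {pct:+.1f}%")
```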
 Nicholas Bishop posted on Sunday, February 21, 2010 - 10:03 pm
Dr. Muthen,
I am using the Diggle-Kenward selection model with model constraint. I have pasted the model below. It is not intuitive that higher scores on word recall would be positively associated with mortality, so I will look more into the interpretation of those coefficients.

i s | totrec98@0 totrec00@.2 totrec02@.4 totrec04@.6 totrec06@.8 totrec08@1;

I S ON testfx r4agey_b black hispanic other;

dd00 on totrec98 (beta2)
totrec00 (beta1);
dd02 on totrec00 (beta2)
totrec02 (beta1);
dd04 on totrec02 (beta2)
totrec04 (beta1);
dd06 on totrec04 (beta2)
totrec06 (beta1);
dd08 on totrec06 (beta2)
totrec08 (beta1);

Model constraint:
new(theta1 theta2);
theta1 = (beta1+beta2)/2;
theta2 = (beta1-beta2)/2;
 Bengt O. Muthen posted on Monday, February 22, 2010 - 6:02 pm
Are you referring to the estimates for the beta coefficients or the theta coefficients in your setup? Are the signs different for these two sets? See also Diggle-Kenward (1994) eqns (31) and (32).
 Nicholas Bishop posted on Tuesday, February 23, 2010 - 1:46 pm
Dr. Muthen,
In my previous post (2/20), the estimates and odds ratios I was referring to were the coefficients given for the logistic regression portion of the model. The coefficients for the theta parameters are as follows:

New/Additional Parameters
THETA1 -0.225 0.023 -9.659 0.000
THETA2 -0.465 0.056 -8.254 0.000

I am unclear if I should be interpreting the coefficients for the theta parameters or the logistic regression portion of the model. I appreciate your advice with these matters.
 Nicholas Bishop posted on Tuesday, February 23, 2010 - 2:02 pm
Dr. Muthen,
I have DK (1994) in front of me and I see how the reparameterization in eq 32 leads to more easily interpretable results. Thanks for your help with this. I enjoyed reading recently submitted work on non-ignorable data modeling. Thanks again.
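Assuming, as the labels in the input above indicate, that beta1 multiplies the current word-recall score y_t and beta2 the previous score y_{t-1}, the Model Constraint corresponds to the following rewrite of the dropout logit. Here alpha is a generic intercept introduced only for this illustration, and d_t is the dropout (death) indicator.

$$
\operatorname{logit} P(d_t = 1) = \alpha + \beta_1 y_t + \beta_2 y_{t-1}
= \alpha + \theta_1 \,(y_t + y_{t-1}) + \theta_2 \,(y_t - y_{t-1}),
\qquad
\theta_1 = \tfrac{1}{2}(\beta_1 + \beta_2), \quad \theta_2 = \tfrac{1}{2}(\beta_1 - \beta_2).
$$

Expanding the right-hand side gives back (theta1 + theta2) y_t + (theta1 - theta2) y_{t-1} = beta1 y_t + beta2 y_{t-1}. So theta1 acts on the overall level of the two scores and theta2 on the change between occasions. A positive beta2 alongside a larger negative beta1 is therefore consistent with a decline in recall being associated with higher dropout risk, rather than with higher earlier recall being harmful.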

 Nicholas Bishop posted on Friday, February 26, 2010 - 5:44 am
Dr. Muthen,
What would be the best method for assessing model fit for the DK mixture models?
 Bengt O. Muthen posted on Friday, February 26, 2010 - 9:36 pm
With mixture models one no longer has the mean-covariance model chi-square testing that we are used to in SEM, because there are no sufficient statistics simpler than the raw data. Typically, a model is evaluated in comparison to another, more general model, often by BIC.

In Mplus you can also study the fit of the model relative to the data using TECH7. Here the model-estimated means, variances and covariances are compared to "observed" sample counterparts (the posterior-probability weighted data). When there are no missing data, TECH13 makes it possible to check against 3rd- and 4th-order moments as well.
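The "observed" class-specific moments described here weight each observation by its estimated posterior probability of belonging to the class. Below is a minimal Python sketch of that idea for one class, using plain lists and a hypothetical helper name. It divides by the sum of the weights (an ML-style estimate); the exact divisor and details Mplus uses in TECH7 are not stated in this thread.

```python
def weighted_moments(data, post):
    """Posterior-probability weighted mean vector and covariance matrix for one class.

    data: list of observation vectors (lists of floats)
    post: posterior probability of membership in the class for each observation
    """
    total = sum(post)
    p = len(data[0])
    mean = [sum(w * row[j] for w, row in zip(post, data)) / total for j in range(p)]
    cov = [[sum(w * (row[j] - mean[j]) * (row[k] - mean[k]) for w, row in zip(post, data)) / total
            for k in range(p)]
           for j in range(p)]
    return mean, cov

# Three observations on two outcomes, with posterior probabilities for one class
print(weighted_moments([[1.0, 2.0], [3.0, 4.0], [5.0, 3.0]], [0.9, 0.6, 0.1]))
```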
 Nicholas Bishop posted on Tuesday, March 01, 2011 - 1:23 pm
I am using the DATA MISSING command to create dropout variables for a pattern-mixture model, and have a question regarding the TYPE=DDROPOUT setting. When I look at the variables created by this function, a positive (1) dropout dummy variable is only present for the first period of dropout. For periods following the initial dropout, the dummy variable remains zero. When using the TYPE=SDROPOUT setting, the period of dropout is indicated by a 1, followed by missing data indicators for the remaining periods (discrete-time survival indicators).

When using a pattern-mixture setup, do the DDROPOUT variables control for the fact that an individual remains unobserved after the initial period of dropout? A great deal of the dropout I am dealing with is caused by mortality, so these individuals remain unobserved after their initial dropout.

I am currently using the DDROPOUT setting to create missing data indicators for pattern-mixture models and the SDROPOUT setting to create missing data indicators for selection models.
 Bengt O. Muthen posted on Tuesday, March 01, 2011 - 3:29 pm
DDROPOUT creates a set of dummy variables which together tell you which dropout time category an individual belongs to. So for instance dropping out after time 1 with a total of 3 time points would be scored as

0 1 0

The last zero means that the person is not in the study at the 3rd time point either. So, yes, individuals remain unobserved after their initial dropout. In contrast, you can use Mplus to code intermittent missingness.

DDROPOUT is used for exogenous indicators and SDROPOUT is used for endogenous indicators.
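The distinction between dropout and intermittent missingness drawn in this thread can be reproduced outside Mplus. Dropout means missing at an occasion and at every later one (as noted in the June 2009 answer). The Python sketch below uses hypothetical helper names. It finds each person's dropout occasion from the missing-data pattern, ignores intermittent gaps, and builds dummies in the "0 1 0" style of the example above. Mplus's own DATA MISSING output may differ in how many indicators it creates and which occasions they refer to.

```python
def dropout_time(row):
    """1-based occasion at which a person drops out, or None if never.

    Dropout means missing at that occasion and at every later one;
    an intermittent gap followed by an observed value does not count.
    """
    last_observed = max((t for t, v in enumerate(row, start=1) if v is not None), default=0)
    return last_observed + 1 if last_observed < len(row) else None

def dropout_dummies(row):
    """One dummy per occasion, 1 only at the dropout occasion and 0 elsewhere."""
    d = dropout_time(row)
    return [1 if t == d else 0 for t in range(1, len(row) + 1)]

print(dropout_dummies([5.0, None, None]))  # dropped out after time 1
print(dropout_dummies([5.0, None, 6.0]))   # intermittent missingness only
```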
 Nicholas Bishop posted on Monday, April 11, 2011 - 8:35 am
As far as I am aware, the Mplus DATA MISSING command produces the missing data indicators for the entire sample without distinguishing between causes of missingness. I have two forms of absorbing dropout in my study (mortality and institutionalization), as well as intermittent missingness. Ideally, I would like to create a separate dropout indicator for each of these sources of missing data and then use these to adjust the outcome trajectories separately.

Is it possible for me to use the DATA MISSING command to create these separate missing data indicators? Is there another strategy I could use to properly adjust for these separate forms of dropout and intermittent missingness? Thank you.
 Bengt O. Muthen posted on Monday, April 11, 2011 - 3:00 pm
I don't know that it would be easy to use Data Missing for this - perhaps you might just as well create that information some other way (by Define, saving the new info; or by another program). You could create dropout dummy variables like in pattern-mixture modeling, where you make a distinction between types of dropout, and then see if the growth parameters differ across those dropout times and types. I haven't seen this done, however.
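As a sketch of the "create that information some other way" route, here is a Python example with hypothetical names and cause labels. Given each person's dropout occasion (for example, derived from the missing-data pattern) and the recorded cause, it builds one dummy per combination of cause and occasion. These dummies could then be saved to the data file and used like pattern-mixture dropout dummies to see whether the growth parameters differ across dropout times and types. The sketch assumes dropout can first occur at occasion 2.

```python
def typed_dropout_dummies(dropout_time, cause, n_occasions, causes=("death", "institution")):
    """Pattern-mixture style dummies separating dropout by occasion and cause.

    Returns a dict keyed by (cause, occasion) with 1 for the person's own dropout
    occasion and cause, 0 elsewhere; completers (dropout_time None) get all zeros.
    Occasions start at 2 because dropout is assumed to follow at least one observation.
    """
    return {(c, t): int(dropout_time == t and cause == c)
            for c in causes
            for t in range(2, n_occasions + 1)}

print(typed_dropout_dummies(3, "death", 4))
print(typed_dropout_dummies(None, None, 4))
```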
 Ross Larsen posted on Friday, September 16, 2011 - 7:50 am

I have an unusual case to analyze in growth modeling with censored data.

Children were given a test at three time points. The test is administered in two parts. Children were given the second part only if they achieved a threshold score on the first part.

At time 1, the second part of the test was not administered to ANY children, so the children who would have qualified for it were unable to take it. As a result, time point 1 has artificially low scores when the total test score (part 1 + part 2) is analyzed.

Ordinarily, in SAS I would handle this by creating a censoring variable that tells the procedure which children made the threshold and should therefore be considered censored from above. In Mplus, the CENSORED statement just takes the highest available score as the censoring point. This is a problem because two children could both have made the threshold while one has a lower score and thus would not be considered censored. I thought of including my censoring variable (0 = did not make threshold, 1 = made threshold) as a covariate of the intercept and slope, but I wanted your thoughts on this problem and how you would handle it.

Thank you.
 Bengt O. Muthen posted on Saturday, September 17, 2011 - 10:59 am
Why not view the two parts of the test as two different variables, say x and y for the first and second part? For time 1, only x1 is observed and everyone has missing on y1 so we can ignore y1. For time 2, x2 is observed and y2 is observed for those above the threshold of x1 and missing for the others. Etc for time 3. Missingness fulfills MAR due to missing on y2 being determined by the observed x1 score. So for the first 2 time points the data set has 3 variables: x1, x2, and y2. None of them is censored.
 Jean-Samuel Cloutier posted on Monday, April 02, 2012 - 2:46 pm

I want to build a Multiple indicator growth model.

All my indicators are zero inflated.

What is my best option?

 Jean-Samuel Cloutier posted on Tuesday, April 03, 2012 - 8:33 am
I understand that it is two-part modelling (one binary growth model and one continuous).

I am not sure I understand how to transform my data.

Creating a variable (u) which takes the value 0, or otherwise 1, is easy.

But will the second, continuous variable (y) have missing values for the zeros?
 Linda K. Muthen posted on Tuesday, April 03, 2012 - 10:29 am
See DATA TWOPART in the user's guide.
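In the two-part setup discussed in this thread, zeros contribute only to the binary part: the continuous part is treated as missing whenever u = 0. Below is a minimal Python sketch of that split. The cutpoint of 0 and the optional log transform of the positive values are assumptions of this sketch; the DATA TWOPART section of the user's guide documents the options Mplus actually offers.

```python
import math

def two_part(values, cutpoint=0.0, log_transform=True):
    """Split a semicontinuous variable into a binary part u and a continuous part y.

    u = 1 if the value exceeds the cutpoint, 0 if not; y holds the value
    (optionally log-transformed) when u = 1 and is missing (None) when u = 0.
    Missing input stays missing in both parts.
    """
    u, y = [], []
    for v in values:
        if v is None:
            u.append(None)
            y.append(None)
        elif v > cutpoint:
            u.append(1)
            y.append(math.log(v) if log_transform else v)
        else:
            u.append(0)
            y.append(None)
    return u, y

print(two_part([0, 2.0, None, 0, 1.0]))
```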
 Jean-Samuel Cloutier posted on Thursday, April 12, 2012 - 8:31 am
I have tried to model a growing phenomenon, but certain characteristics of the phenomenon make it hard to model as a regular growth model.
I have a preponderance of zeros.
The structure of the data evolves:
• the first year, some observations are 0 and some are 1
• the second year, some observations are 0, some are 1, and some are 2
• the third year, some observations are 0, some are 1, some are 2, and some are 3
and so on...
Any advices would be welcomed.
 Linda K. Muthen posted on Thursday, April 12, 2012 - 1:38 pm
If the variable is treated as categorical, see the CATEGORICAL * option on page 489 of the user's guide.
 Jean-Samuel Cloutier posted on Thursday, April 12, 2012 - 6:09 pm
It's more a count variable.
The only binary variable in the model is the first year observation.

Maybe I should consider the variable as categorical anyway. Second year would have three categories (0,1,2). ... Seventh year observations would have 8 categories (0,1,2,3,4,5,6,7).

It's like a school with seven grades, where a majority will never finish the first grade, fewer will finish the first, fewer still will finish the second grade, and so on over seven years (a left-censored final distribution with a preponderance of zeros).

I wish I could model this as a growth model.
 Linda K. Muthen posted on Friday, April 13, 2012 - 8:25 am
It sounds like an ordered categorical variable. I would use CATEGORICAL with the * setting.
 matteo giletta posted on Friday, June 15, 2012 - 6:35 am
Dear Linda & Bengt,
I’ve seen in Topic 4 that you presented a zero-inflated Poisson latent growth curve model in which you defined the intercept and slope only for the count part of the model and not for the zero-inflated one. I would like to do the same using a censored inflated growth curve model. I’ve estimated a conditional model in which I only defined intercept and slope (and the effects of covariates) for the continuous part of the model. Now, I wonder whether this is allowed or I would also need to define an intercept and slope for the inflated part of the model. Considering that this is computationally demanding, if allowed, I would like to avoid it.
 Linda K. Muthen posted on Friday, June 15, 2012 - 8:09 pm
You do not need to do a growth model for the inflated part of the model.
 Li Lin posted on Friday, September 28, 2012 - 8:56 am
I have longitudinal data on sexual functioning, measured as a continuous score if the person had sex during a specified recall period; otherwise the score is missing. At each time point, if there was no sexual activity, the reason was recorded as related or unrelated to sexual functioning. We are planning to use a two-part growth model for these longitudinal semicontinuous data. To evaluate the possible effect of an intervention, it would be desirable to exclude the 0s from the u-part when the lack of activity was due to a function-unrelated reason. Where should I put this time-varying "unrelated reason" indicator (1/0) in the model (specifically the u-part), and how should I specify the model? Any reference paper would be great. Thanks!
 Linda K. Muthen posted on Sunday, September 30, 2012 - 10:43 am
It sounds like zero represents two things for the binary variable and that you can distinguish between these two. I would turn the zeroes that you want to exclude into missing values.
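A minimal Python sketch of that recoding for a single time point, assuming hypothetical inputs `score` (the sexual-functioning score, None when there was no activity) and `unrelated` (1 if the reason for no activity was unrelated to functioning). It only illustrates the data-preparation logic for the binary u-part and the continuous part; it is not Mplus code.

```python
def make_two_part(score, unrelated):
    """Return (u, y) for one person at one time point.

    score: hypothetical sexual-functioning score, None if no activity.
    unrelated: hypothetical flag, 1 if no activity was for a
    function-unrelated reason, else 0.
    """
    if score is not None:
        # Activity occurred: binary part is 1, continuous part observed.
        return 1, score
    if unrelated == 1:
        # Zero not informative about functioning: missing in both parts.
        return None, None
    # Function-related non-activity: a genuine zero in the binary part.
    return 0, None


# Three illustrative cases (hypothetical values).
print(make_two_part(12.5, 0), make_two_part(None, 1), make_two_part(None, 0))
```

The None values would then be written out with the missing-data code declared in the Mplus input.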
 Li Lin posted on Tuesday, October 02, 2012 - 1:31 pm
Thanks! I have another question: does the newest version of Mplus support the Bayesian estimator in a two-part model with correlated intercepts and slopes? I am using version 5.21. When I specified "ANALYSIS: ESTIMATOR = BAYES", I got an error saying "*** FATAL ERROR Internal Error Code: VARIANCE COVARIANCE MATRIX NOT SUPPORTED WITH ESTIMATOR=BAYES."
 Bengt O. Muthen posted on Tuesday, October 02, 2012 - 1:48 pm
You need at least Version 6 for Bayes, and preferably Version 7.

Two-part Bayes can be done - see

Muthén, B. (2010). Bayesian analysis in Mplus: A brief introduction. Technical Report.

which is available on our web site together with Mplus scripts.
 Star David posted on Tuesday, November 20, 2012 - 6:01 pm
I'm currently running a GMM with censored-normal data (internalizing problems, sample size 2000). It usually takes 3-4 hours to finish, but if I treat the data as normal it runs quickly. Our PC is fairly high-end (i7, 8 GB RAM). I noticed that UG Ex. 8.3 says "numerical integration becomes increasingly more computationally demanding as the number of factors and the sample size increase." Is there any way to reduce the run time? And is it appropriate to treat censored-normal data as normal when conducting GMM, or would the results differ?
 Bengt O. Muthen posted on Tuesday, November 20, 2012 - 6:32 pm
How many growth factors do you have?

What's the percentage of subjects at the censoring point for each time point?
 Star David posted on Wednesday, November 21, 2012 - 5:35 am
Thanks for replying so quickly!
We have 6 time points, and I've tried both a 4-growth-factor (i s q cu) model and a 3-growth-factor (i s q) model. The latter takes less time but still quite long (about 10,000 s with starts = 20 2; classes = c(3);).

Our DV is continuous (internalizing problems), and at each time point about 25% of the subjects score 0.
 Bengt O. Muthen posted on Wednesday, November 21, 2012 - 4:10 pm
Often in mixture modeling with several classes you do not need all growth factors to be random but can fix the variances for some of them. So for instance if you have i s q, you can fix q@0 (still estimating the q mean) and therefore have only 2 dimensions of integration.
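To see why each random growth factor matters so much, here is a back-of-the-envelope Python sketch. It assumes standard numerical integration on a product grid with a fixed number of points per dimension; 15 points is an illustrative value only. The grid size, and roughly the work per person per likelihood evaluation, is the number of points raised to the number of growth factors with free variances:

```python
def integration_points(points_per_dim, n_random_factors):
    """Quadrature points per person per likelihood evaluation.

    With a product (rectangular) grid the count grows geometrically
    in the number of growth factors that have free variances.
    """
    return points_per_dim ** n_random_factors


# 15 points per dimension is an illustrative value only.
for label, dims in [("i s q cu", 4), ("i s q", 3), ("i s with q@0", 2)]:
    print(label, integration_points(15, dims))
```

In this illustration, going from i s q to i s with q@0 shrinks the grid from 3,375 to 225 points, and in a mixture model that work is roughly multiplied again by the number of classes.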

With only 25% censoring you probably get a reasonable approximation by treating the outcomes as regular continuous variables. You can use this to more quickly search for which growth factors need to have free variances and thereby check my conjecture above.
 Star David posted on Wednesday, November 21, 2012 - 6:10 pm
Thanks for your advice!
 matteo giletta posted on Tuesday, September 17, 2013 - 9:10 am
Dear Drs. Muthen,

I am conducting some growth models with a censored outcome.
What is the best way to compare unconditional models (e.g., linear vs. quadratic)? Can I compare models as nested using loglikelihood difference tests?

Thank you so much!

 Linda K. Muthen posted on Tuesday, September 17, 2013 - 5:43 pm
The models are nested, but because the linear model fixes the variance of the quadratic growth factor at zero, which is on the border of the admissible parameter space, the loglikelihood difference may not be distributed as chi-square. I would compare the models using BIC.
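The BIC comparison itself is simple arithmetic on the loglikelihood and number of free parameters printed in each output, with n the number of subjects. A minimal Python sketch (the function names and any numbers plugged in are hypothetical):

```python
import math


def bic(loglik, n_params, n):
    """Bayesian information criterion: -2 logL + p ln(n). Smaller is better."""
    return -2.0 * loglik + n_params * math.log(n)


def prefer_quadratic(ll_lin, p_lin, ll_quad, p_quad, n):
    """True if the quadratic model has the lower BIC than the linear one."""
    return bic(ll_quad, p_quad, n) < bic(ll_lin, p_lin, n)
```

The ln(n) penalty per extra parameter is what keeps the quadratic model from winning on a small loglikelihood improvement alone.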
 matteo giletta posted on Wednesday, September 18, 2013 - 8:39 am
Thanks so much for your quick reply!
Would the sample-size adjusted BIC be more appropriate than the BIC in this case? And is there any suggested cutoff for considering a decrease in BIC a significant indicator of model improvement?
 Linda K. Muthen posted on Wednesday, September 18, 2013 - 10:35 am
I would use BIC. There is a FAQ on the website about cutoffs and BIC.
 Lisa M. Yarnell posted on Thursday, January 23, 2014 - 9:51 pm
Hello, I want to construct a growth model for count indicators that are censored. This seems to be a blend of Examples 6.3 and 6.7 in the current User's Guide.

Our growth data are symptom counts, which are censored at the time points at which a person is not yet a drinker (and, in fact, censored-inflated as in Ex. 6.3 for the first time point, because many people are not yet drinkers). But as the sample begins to drink over the years, they will have positive symptom counts, and the symptom counts will likely still be somewhat zero-inflated.

The model I would like to run could thus be more similar to Ex. 6.7, where the zeros can arise from two subpopulations (those who are not yet drinkers and hence have a zero symptom count, and those who ARE drinkers but are not showing symptoms). But I do like the idea of explicitly capturing the censoring of the symptom count data at the earlier time points, as in Example 6.3. In the zero-inflated Poisson model, the two sources of zeros are latent, which allows for mixing, whereas we know the source of the zeros and could therefore model it explicitly as censoring.

Yet, the indicators are specified as continuous (not count) in Ex. 6.3.

Can you give suggestions on whether our problem could/should be thought of as more similar to 6.3 or 6.7?

Or, can I run model 6.3 but specify the indicators as count?
 Katherine E. Masyn posted on Friday, January 24, 2014 - 2:30 pm
Hi, Lisa.

You might consider an onset-to-growth model (a.k.a. a launch model) where the onset of drinking marks the beginning of the growth process for a given individual (and, as such, the pre-onset zeros are not even part of the growth outcomes). Then in your growth model, you only need to deal with the pile-up of zeros among those who are drinkers but with a zero symptom count.
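The data-preparation side of an onset-to-growth set-up can be sketched in Python as follows, assuming hypothetical per-person lists `drinker` (0/1 by wave) and `symptoms` (symptom counts by wave). Waves before onset are dropped and time is re-indexed so that 0 is the onset wave; the part of the model that describes the onset itself is not shown.

```python
def align_to_onset(drinker, symptoms):
    """Re-index one person's waves so time 0 is the first drinking wave.

    drinker: hypothetical 0/1 drinker flags by wave.
    symptoms: hypothetical symptom counts by wave (same length).
    Returns [(time_since_onset, count), ...]; [] if onset never occurs.
    """
    try:
        onset = drinker.index(1)
    except ValueError:
        return []  # never a drinker: contributes no growth outcomes
    return [(t - onset, s) for t, s in enumerate(symptoms) if t >= onset]


print(align_to_onset([0, 0, 1, 1, 1], [0, 0, 0, 2, 3]))
```

Any remaining zeros after onset are the "drinker but no symptoms" zeros that the growth part still has to handle.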

Check out this article--not exactly what you need for the growth portion of your model but it will give you a start on the onset-to-growth set-up.

Malone, P. S., Northrup, T. F., Masyn, K. E., Lamis, D. A., & Lamont, A. E. (2012). Initiation and persistence of alcohol use in United States Black, Hispanic, and White male and female youth. Addictive Behaviors, 37, 299-305.

Katherine Masyn
 Tom Booth posted on Monday, April 07, 2014 - 11:53 pm
Dear Bengt & Linda,

I am running a simple 4-wave linear GCM with time varying covariates. Across waves I have drop out due to death or serious illness in an ageing sample. I need to account for this. From reading above, it seems the pattern mixture approach using SDROPOUT is the way to go. Does this seem correct?

The GCM here concerns cognitive variables, so whilst cognitive decline is expected prior to death, I do not think I want to predict mortality from cognitive scores in this instance - hence I think the above would be more suitable than a joint GCM survival approach.


 Linda K. Muthen posted on Tuesday, April 08, 2014 - 9:56 am
You should use DDROPOUT not SDROPOUT. See Examples 11.2 and 11.4.
 Tom Booth posted on Wednesday, April 09, 2014 - 6:53 am
Thanks Linda.
 Carlijn C posted on Wednesday, July 02, 2014 - 3:18 pm

I'm running a latent growth model on multiple groups (2 groups: a control and an experimental group). I have a very skewed variable with a lot of zeros (range 0-13; a lower value is better).
The original means of group 1 were: (1) 1.296 (2) 0.724 (3) 0.960 and in group 2: (1) 0.894 (2) 0.599 (3) 0.361. The slope mean of group 1: -0.144, and group 2: -0.260 (significant).
Now, when I'm using the censored method, the means of group 1 are: (1) -1.147 (2) -0.999 (3) -4.157, and in group 2: (1) -0.619 (2) -0.932 (3) -1.059. The slope mean of group 1 is: -0.841, and in group 2: -0.218.

Without the censored method, it seems to me that group 2, the experimental group, did better than group 1, the control group. However, with the censored method, this no longer seems to be the case. I don't understand the results of the censored method. How should I interpret them?
 Bengt O. Muthen posted on Wednesday, July 02, 2014 - 5:19 pm
With a lot of zeros I would recommend instead looking at the two-part growth modeling approach because it is a richer model that can show you separate treatment effects on the probability of zero and the amount above zero. You have a UG example to start from and we also teach on it in our short course videos and handouts.
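On the censored-model results in the question above: in a censored (Tobit-type) growth model the growth-factor means describe the latent, uncensored outcome, so they can be negative even when every observed score is zero or positive, and they are not on the same scale as means from a model that treats the outcome as normal. A minimal Python sketch, assuming a normal latent outcome censored from below at c, of the observed mean implied by a latent mean and SD, and of how much of a one-unit shift in the latent mean reaches the observed mean:

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_mean(mu, sigma, c=0.0):
    """E[max(y*, c)] for latent y* ~ N(mu, sigma^2), censored from below at c."""
    a = (c - mu) / sigma
    return c * norm_cdf(a) + mu * (1.0 - norm_cdf(a)) + sigma * norm_pdf(a)


def censored_mean_slope(mu, sigma, c=0.0):
    """d E[max(y*, c)] / d mu = P(y* > c): share of a latent shift that is observed."""
    return 1.0 - norm_cdf((c - mu) / sigma)
```

For example, a latent mean of -1 with SD 2 implies an observed mean of roughly 0.40, and only about 31% of a shift in the latent mean shows up in the observed mean at that point.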
 anonymous Z posted on Saturday, October 24, 2015 - 2:46 pm
Dear Drs. Muthen,

I am fitting a model with drug use across five time points as the outcome variable (a lot of zeros because some people don't use, and the variable is continuous for those who use). My independent variables are relationship status (0 = not in a relationship, 1 = in a relationship) and relationship quality (missing at any time point where the participant is not in a relationship), both measured at all five time points. I prefer a multilevel approach rather than a multivariate approach to model the growth curve.

I originally did censored growth modeling (censored from below). Below is the model. I have two questions:
1) Is the interpretation of the beta coefficients from the censored model the same as when drug use is not treated as censored?
2) It seems that you suggest two-part modeling is a better option. Can two-part modeling be done with the multilevel (univariate) approach? If so, what should the syntax look like?

Thanks so much!

CENSORED ARE drug (b);

Drug on relationship_status;
Drug on relationship_quality;
s | Drug on time;

Drug on treatment;
Drug with s;
 Bengt O. Muthen posted on Saturday, October 24, 2015 - 4:50 pm
1) Yes, beta refers to the underlying uncensored variable.

2) That should work, although I don't think I have done it. You can create the 2 variables (so a bivariate, 2-level model) the same way as we show for wide, 1-level. I think you can use DATA TWOPART to do that.
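For reference, the two-part split that DATA TWOPART performs can be sketched in Python for a single value. As I understand the User's Guide, the default cutpoint is 0 and the continuous part is log-transformed by default; treat both as assumptions to verify there. In long (two-level) format the same split would simply be applied row by row, giving one binary and one continuous outcome per row.

```python
import math


def two_part(y, cutpoint=0.0, log_transform=True):
    """Split one semicontinuous value into (binary, continuous) parts.

    binary = 1 if y > cutpoint else 0; continuous = y (logged if
    log_transform) when y > cutpoint, otherwise missing (None).
    A missing y stays missing in both parts.
    """
    if y is None:
        return None, None
    if y > cutpoint:
        return 1, (math.log(y) if log_transform else y)
    return 0, None
```

Each part then needs its own growth model, which is why the output should list both.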
 anonymous Z posted on Monday, October 26, 2015 - 12:14 pm
Hi, Dr. Muthen,

Thanks for your advice. I ran the model as two-part modeling. Below was the syntax I added. I have two questions:

1. The analysis ran without any error message; however, the output didn't list anything about the binary or continuous parts, and I got results very similar to those without "DATA TWOPART." Is something wrong?


2. The second question is about the censored-inflated model. From what I have read, the censored-inflated model is very similar to two-part modeling. What is the difference between them?

Thanks so much,
 anonymous Z posted on Monday, October 26, 2015 - 1:16 pm
To add to my question: when I run the censored-inflated model with the multilevel approach, I get results similar to the regular multilevel approach, i.e., the output doesn't show results for the binary (inflation) part. Is anything missing in my syntax?
 Bengt O. Muthen posted on Monday, October 26, 2015 - 5:50 pm
First post:

1. You need to specify a growth model for both parts - see the UG example for 2-part.

2. The censored-inflated model is a 2-class model. The 2-part model does not use 2 classes. But they often have quite similar fits.
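A minimal Python sketch of why the censored-inflated model is a two-class model, assuming normality with censoring from below at c: an observation at the censoring point can come from either the inflation class (probability pi) or the censored-normal class, and the likelihood mixes the two. The names are hypothetical and this is not Mplus's internal code.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_inflated_loglik(y, pi, mu, sigma, c=0.0):
    """Log-likelihood of one observation, censored from below at c.

    pi is the probability of the inflation ("always at c") class; the
    other class follows a normal distribution censored at c.
    """
    if y <= c:
        # Two routes to the censoring point: the inflation class, or a
        # latent normal value at or below c in the non-inflated class.
        return math.log(pi + (1.0 - pi) * norm_cdf((c - mu) / sigma))
    # Values above c can only come from the non-inflated class.
    return math.log((1.0 - pi) * norm_pdf((y - mu) / sigma) / sigma)
```

In a two-part model, by contrast, a zero has only one source: the binary part gives the probability of being above zero directly and the continuous part describes the values above zero, so no mixing over classes is involved.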

Second post:

You need to model the inflated part as well.
 anonymous Z posted on Tuesday, October 27, 2015 - 7:17 am
Thank you so much! Could you recommend any papers doing censored-inflated models, which I can refer to to help describe the analysis?
 anonymous Z posted on Tuesday, October 27, 2015 - 11:04 am
Hi Dr. Muthen,

Below is my syntax for the censored-inflated model. I got the following error message. How should I fix the problem?

Thanks so much,

*** ERROR in MODEL command
Unknown variable: OPi

CENSORED are OP(bi);

cluster = ID;
within = time REL_STA PARTNER;
Between = TX_2 eth;
analysis: type = twolevel random;

S| OP on time;

Si| OPi on time;

OP ON TX_2 eth;
S ON TX_2 eth;

OPi ON TX_2 eth;
Si ON TX_2;
 Bengt O. Muthen posted on Tuesday, October 27, 2015 - 3:00 pm
Opi is not how the inflation part is referred to. See the UG for examples with censored-inflated.
 Ads posted on Monday, February 20, 2017 - 6:19 am
I am looking at a growth mixture model of viral load in HIV over 20 years. However, there are two types of censoring in the dataset, and I was wondering how these might be addressed in models:

1. Via death of participant
2. Participants were recruited at different timepoints (e.g., some at year 1, some at year 10, some at year 19). So some participants have the possibility of having 20 years of data, while others have the possibility of having only 1 (i.e., those recruited at year 19).

How could Mplus be used to incorporate both of these sources of censoring, given that they arise from very different processes? As far as I know, Mplus cannot handle variables that are censored both above and below, so I wanted to ask whether there is a way to handle this scenario.

This is a naturalistic study (continually recruited participants and tracked them over time) and I would like to scale the time variable as time since baseline visit unless there is a better way (all suggestions are welcome).
 Bengt O. Muthen posted on Monday, February 20, 2017 - 6:20 pm
1. You can try MAR, but see also the paper on our Missing data web page:

Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.

2. With time since baseline, it seems that this censoring would not be a problem.
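With time since baseline, staggered recruitment simply shows up as occasions that were never observed. A minimal Python sketch, assuming hypothetical integer visit years and at most 20 yearly occasions, that places one participant's values on a time-since-baseline grid with None for unobserved occasions:

```python
def to_wide(visit_years, values, n_occasions=20):
    """Place one participant's values on a time-since-baseline grid.

    visit_years: hypothetical integer calendar years of each visit.
    values: outcome at each visit. Occasions never observed stay None.
    """
    base = min(visit_years)
    row = [None] * n_occasions
    for year, value in zip(visit_years, values):
        k = year - base  # time since this participant's baseline visit
        if 0 <= k < n_occasions:
            row[k] = value
    return row
```

Someone recruited in year 19 then has data only at the first one or two occasions, and the remaining occasions are missing by design rather than censored.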
 Ads posted on Tuesday, February 21, 2017 - 8:58 am
Thank you!
 Brandon Goldstein posted on Friday, March 13, 2020 - 12:52 pm
I am trying to run a two part growth model with 4 time points (number of zeros ranges from 26% to 54%).
Following User's Guide Example 6.16, iu and su are the intercept and slope for the binary part, and iy and sy are the intercept and slope for the continuous part. In Example 6.16, su is held at 0 (su@0).
I receive an error message telling me to look at TECH4, which shows that the variance for sy is -0.005.
1. Is it appropriate to run a model with sy@0 as well?
2. What is the interpretation of a two-part growth model in which both sy@0 and su@0? Would that suggest that a growth model is not appropriate for these data?
Thank you!
 Bengt O. Muthen posted on Saturday, March 14, 2020 - 3:03 pm
1. Yes.

2. No - the growth model is fine. It just says that there is no variation across people in the growth. You still get means for the s's.
 Brandon Goldstein posted on Monday, March 16, 2020 - 6:07 am
Thank you
I had forgotten that @0 fixes the variance. My mistake.
A follow-up question: if there is no variation across the sample, is this akin to a "fixed" effect? That is, if a variable is associated with the slope mean, would it be interpreted like a between-subjects effect?
I guess this is really a question about how to interpret associations with the estimated mean for the slope. Thanks again!
 Bengt O. Muthen posted on Monday, March 16, 2020 - 10:25 am
Yes, a fixed effect.