Dan Bauer posted on Tuesday, July 11, 2000  10:18 am



I am hoping to use MPlus to do some growth modeling with censored data. The data cover credit debt over 4 time points. The censoring is on the low end of the distribution, at $500. The censoring threshold is invariant across the four time points. It seems like a growth model of these data would be a relatively straightforward extension of the case where data are categorical. That is, both analyses would proceed from a polychoric correlation matrix and incorporate information on thresholds. My question is whether I am correct that growth modeling can be done with censored variables, and, if so, whether MPlus can do it. References to relevant texts or examples would also be appreciated. 


LISCOMP, the precursor to Mplus, allowed censored variables by Tobit modeling, but we felt that this used too strong assumptions and did not include it in Mplus. Although not optimal, I would suggest treating your variables as ordered polytomous, categorizing debts above $500 into a couple of categories. A reference related to this is (although not in a longitudinal context): Muthén, B., & Speckart, G. (1983). Categorizing skewed, limited dependent variables: Using multivariate probit regression to evaluate the California Civil Addict Program. Evaluation Review, 7, 257-269. (#3) Future developments will probably handle these kinds of variables in a better way.

Dan Bauer posted on Tuesday, July 18, 2000  5:22 am



Thank you for responding so promptly to my question about censored data. I wonder if you would not mind expanding your comments. My colleagues here feel that rendering censored data into polytomous categories is losing valuable information. Further, either treatment of the variable (ordinal or censored) requires the assumption that there is an underlying normal distribution. Is there an assumption peculiar to censored data that is particularly suspect? 

bmuthen posted on Tuesday, July 18, 2000  7:43 am



The assumption of censored-normal Tobit that I find limiting is that the coefficients for the covariates' influence on the probability of censoring are proportional to the coefficients of the covariates' influence on the amount observed when not censored. The ordered polytomous model would seem to share this limitation in this application. An alternative approach has been taken using two-part (semicontinuous) modeling. You may find it useful to read a recent paper on this by Olsen and Schafer at Penn State. It can be found at http://www.stat.psu.edu/~jls/fspaper.pdf and is being revised for JASA.
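The proportionality restriction can be made concrete with a small numerical sketch. It assumes a censored-normal (Tobit) model with latent response y* = b0 + b1*x + e, e ~ N(0, sigma^2), censored from below at c. Under that assumption the probability of censoring is Phi((c - b0 - b1*x) / sigma), so the same b1 that shifts the uncensored amount also drives censoring, with probit-scale coefficient -b1/sigma. The function names and numbers are illustrative and do not come from the thread.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_censoring_probability(x, b0, b1, sigma, c):
    """P(y* <= c | x) for latent y* = b0 + b1*x + e, e ~ N(0, sigma^2)."""
    return normal_cdf((c - b0 - b1 * x) / sigma)


def probit_slope_for_censoring(b1, sigma):
    """Coefficient of x in the probit for 'being censored'.

    It is tied to the amount coefficient b1 through the factor -1/sigma,
    which is the proportionality restriction discussed in the post above.
    """
    return -b1 / sigma


# Hypothetical values: debt censored at $500, covariate effect of $100 per unit.
p0 = tobit_censoring_probability(x=0.0, b0=500.0, b1=100.0, sigma=200.0, c=500.0)
p1 = tobit_censoring_probability(x=1.0, b0=500.0, b1=100.0, sigma=200.0, c=500.0)
slope = probit_slope_for_censoring(b1=100.0, sigma=200.0)
```

A two-part model relaxes this by giving the binary (censored or not) part and the continuous part separate coefficients.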

Anonymous posted on Friday, August 06, 2004  8:49 pm



Hi, Great bulletin board here! I have a growth modelling question with a survival aspect involved. I have a continuous variable (say a functional disability score) which is supposed to have been measured at three times, say times 0, 2 and 6. However, some individuals die between times 2 and 6, and therefore have no measurement. These deaths occurred while there was still function, so I am unwilling to plug in a zero score for them, as has been suggested elsewhere. I am primarily interested in the traditional growth modelling aspect, in which I would like to characterise the latent growth curves (i.e., linear slopes) across people. My question is, how do I handle the deaths in this analysis? I guess one way is to assume their time 6 measurements are missing at random (conditional on observed trajectories), and therefore "automatically" accounted for in a likelihood-based analysis? Or is there a more elegant method in Mplus which can somehow jointly model both the longitudinal measurements and the survival process? With this, survivors would be censored at time 6, but those dying would have their time of death recorded (exactly) but would have no functional disability measurement at that exact death time. Many thanks in advance.

bmuthen posted on Sunday, August 08, 2004  11:57 am



MAR would seem a good first step, letting the scores from the first 2 time points predict the time 6 missingness. You could try more advanced, nonignorable missing data modeling in line with the movie of my Spring 2004 UCLA course: http://www.gseis.ucla.edu/faculty/muthen/courses.htm (see the Lecture 17 handout, under handouts). For instance, you can try to use the growth slope as a predictor of time 6 missingness. There have been several recent articles on joint survival modeling and growth modeling in the biostatistics literature. Mplus cannot yet do continuous-time survival, but can do very general discrete-time survival modeling.


Hi Linda (sorry for two posts, but I want to keep things organized under the right categories), Running a censored LGM with values censored (above) at 4.0, I am having problems running a model with freely estimated slope-factor loadings, i.e., i s | 1@0 2@1 3@* 4@*. I am thinking this kind of model might be difficult to estimate with censored data. Is this true? Thanks again! mike


The way you specify free time scores is as follows: i s | 1@0 2@1 3* 4*; I am not sure how the program would interpret @*. This type of model should not be difficult with censored data unless the data are perhaps not truly censored and therefore that model is not appropriate for the data.


Hi, Muthén and Muthén, Is it possible (with command “CENSORED” in ANALYSIS: TYPE=MIXTURE) that we specify some variables to be censored from below and above SIMULTANEOUSLY? Many thanks! 


Variables can be censored from above or below but not both. 


Is the model unstable? N = 88 facilities; t = 4 (collapsed over 24 months to 6-month blocks); measure y is rate of event per 100 beds; event is rare, so lots of zeros => censored-inflated.

Model 5
i s | y1@0 y2@1 y3@2 y4@3;
i2 s2 | y1#1@0 y2#1@1 y3#1@2 y4#1@3;
i s on StudyG bt1 bt2;
I ON STUDYG 1.286 0.340 3.782 0.000
S ON STUDYG 0.374 0.126 2.956 0.003

Model 6
i s s2 on StudyG bt1 bt2;
i2*0.757; s2*0.121;
I ON STUDYG 1.717 0.794 2.163 0.031
S ON STUDYG 0.504 0.348 1.452 0.147

Model 7
i s i2 s2 on StudyG bt1 bt2;
i2*0.757; s2*0.121;
I ON STUDYG 2.033 1.184 1.717 0.086
S ON STUDYG 0.623 0.447 1.391 0.164

I am puzzled by the changes in the estimates and standard errors between the three models and would appreciate direction in understanding why they change sufficiently to change the inference for the intercept and slope on StudyG. I am worried that the instability suggests more than one maximum.


When you leave out paths that are not zero, your model may be misspecified. I would trust Model 7. 


Thanks. The earlier models were just steps along the way towards model 7 and the results of model 7 make sense so I am glad that it is the trustworthy one! Misspecification can certainly have a powerful effect! 

Mariam Dum posted on Monday, April 21, 2008  1:55 pm



Hello, I was wondering if you can provide me with a good reference on censored data for growth mixture modeling?


When you say censored, do you mean censored normal data or right censoring as seen in survival models? 


Hi, I am working on a dissertation regarding special education student populations and am using data censored from below in order to show ways to display the students' growth (or lack thereof) when they are not adequately answering standardized test items. I have looked at both the censored and censored-inflated models in the Mplus manual, but am having some trouble conceptually comprehending the real expected differences in path values between the two models. Any help please??? Thanks!


Before I answer that, have you considered two-part growth modeling? See the Mplus web site for references (under Papers and under Growth). The Olsen-Schafer article argues for two-part over censored growth modeling.


I have read the paper, and from what I can determine, it seems that X variables with negative measurement error scores are treated as censored, while those with positive measurement error are treated as noncensored. In addition, their model is estimated using categorical variables, and I am analyzing continuous observations. 


I don't think you are reading Olsen-Schafer correctly. Their article is about a continuous outcome which has a strong floor effect (piling up at zero). They turn that into what amounts to a parallel process model for the binary part and the continuous part. See how we describe it in the handout for the short course Topic 4 on our web site. See also the 2 two-part papers under Papers, Two-Part... on our web site.


When you change the zeros to ones, there is more censoring because there were already some ones. This changes the data and therefore the model may have convergence problems. 


Thanks for the rapid response. I will be sure to change the entire structure of the data for the next time I run the model. I am also preparing to create a dataset which mimics the one in the example you provided. Is there a threshold for the amount of censored observations that can be included before I experience convergence problems? 


Each example in the user's guide has a Monte Carlo counterpart. This would be a good place to start if you are generating data. You can try various amounts of censoring in this way to see when it becomes problematic. 


In your first post you stated that changing the zeros to ones adds to the amount of censoring and causes convergence problems. Is there a specified amount of censoring that is preferable when trying to run this model? I was trying to find something about this in the manual and saw nothing specific about how much censoring is "too much" for the model.


I don't know how much censoring is too much. 


Ok thanks. One last question (for today at least): does zero have to be the value for the lowest observation in a floor effect model? Can other "low" values work in this format?


The lowest value does not need to be zero. 


Hi, I have another set of questions as I've been working with the censored-inflated model, and I can't find the answers in the manual. I'm working on this with Debbi Bandalos and she couldn't answer my questions so I decided to come back here: 1) Is the intercept of the model fixed to zero by default of the program? 2) Is the "censored" part of the model due to censoring at specific time points, or censored across time? 3) How does this model select which values are censored? From what I can see, it looks like the model just selects based on the lowest value in the data, but I've not read anything that supports that. 4) Does the censored part have to be an integer, or can it also be a decimal? Thanks so much for all of your help. I promise to send you the paper when I'm finished with this :)


1) No, it is free by default. 2) You have different censoring for different time points. 3) It takes the lowest value unless you give it another value. 4) It can be a real number with decimals.


Hello, I have a similar question as to the one posted "Anonymous posted on Friday, August 06, 2004  8:49 pm". I am using a growth curve model that estimates BMI baseline and change over 14 years in an older adult population. The outcomes of interest are BMI slope and intercept but I need to control for mortality selection in this population. Is there a way I can incorporate a discrete time survival analysis into the latent growth model to control for mortality selection? Thanks. 


If I understand you correctly, you are raising an interesting point about possible nonignorable missingness due to dropout in the form of mortality. That is, the usual MAR assumption that past observed outcome values predict missingness may not hold. There is a whole set of possibilities for handling such modeling in Mplus. Ironically, I am just preparing a talk on this topic for the UK Mplus Users Group to be presented June 8. This talk and Mplus scripts will be posted. I will also try to record this as a Camtasia web talk. Adding discrete-time survival modeling is one way to handle this. It is related to so-called Diggle-Kenward modeling, which can be done in Mplus. Another way is to use dropout dummy variables in line with Roy (2003) in Biometrics; this is better than pattern-mixture modeling in my view. The Diggle-Kenward approach represents "selection modeling", whereas the Roy approach represents "mixture modeling".


I look forward to the posting of the talk and the Mplus scripts. Please let me know where I can find these resources once they are posted. Thanks.

Michelle posted on Friday, June 26, 2009  7:28 am



Hi, jumping here from the distal outcomes conversation. I am trying to understand missing vs censored data in a latent class growth-survival analysis model similar to example 8.15 or 8.16 in the Mplus manual or the Muthen & Masyn (2005) paper. In my case, mortality can either be measured by wave (discrete time) or continuously. We have 4 waves of data over 20 yrs. We'd like to use latent classes of healthy aging (defined by a score (y1-y4)) to predict mortality (u1-u4 in the discrete-time model). I have missing data due to nonresponse at various individual waves, and I don't really know how Mplus is handling this. Unlike the examples provided in Muthen & Masyn (2005), these folks are not really right-censored, as they come back into the dataset at later waves, or we have data on their deaths later. Does Mplus use the survival variable to differentiate between someone who is missing due to death vs someone who is a nonrespondent for one wave? A colleague has suggested imputing the nonresponses in order to reduce missingness to only one type (death). Is this an appropriate approach? Are there other ways to handle this, either in the data setup or the modeling approach? Thanks for any guidance you can give, Michelle


It seems to me that intermittent missingness is more likely to obey the MAR assumption (of missingness being predicted by observed variables rather than the missing value that was not captured) than dropout (due to death) missingness. So intermittent missingness needs no special action in the modeling because Mplus does ML under MAR. Dropout missingness on the other hand may be nonignorable (NMAR; Not Missing At Random) and may therefore need the extra information on the dropout time to obey MAR (the dropout time provides the observed variable that causes missing which MAR needs). So when you add survival modeling to the growth modeling you protect yourself against biases due to dropout missingness. In short, no imputations are needed, just modeling. 

Michelle posted on Friday, June 26, 2009  12:18 pm



Thanks! This is very helpful. The final question I have is, how does MPlus know which instances are intermittent missing and which are missing due to death/dropout? Should these be coded differently in the dataset? Thanks! Michelle 


Mplus knows. They should not be coded differently. Missing due to dropout simply has missing also for all subsequent values. In the DTSA part once an event occurs you score u=1 and subsequent u's as missing, which means that this person is no longer part of the risk set for later death. U is the DV in your run so think regression where a person is not part of the computations when missing on the DV. 
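The discrete-time survival coding described above (u = 1 in the period of the event, missing afterwards) can be sketched as a small helper. This is only an illustration of the coding rule stated in the reply; the function name, the 1-based period index and the use of None for "missing" are assumptions, and the sketch does not cover people who are lost to follow-up before any event.

```python
def dtsa_indicators(event_period, n_periods):
    """Build discrete-time survival indicators u_1..u_T for one person.

    event_period: 1-based period in which the event (e.g. death) occurred,
                  or None if no event was observed in any period.
    Returns a list with 0 before the event, 1 in the event period and
    None (missing) afterwards, so the person leaves the risk set.
    """
    indicators = []
    for period in range(1, n_periods + 1):
        if event_period is None or period < event_period:
            indicators.append(0)
        elif period == event_period:
            indicators.append(1)
        else:
            indicators.append(None)
    return indicators


died_in_wave_2 = dtsa_indicators(event_period=2, n_periods=4)
survivor = dtsa_indicators(event_period=None, n_periods=4)
```

Intermittent nonresponse on the outcome needs no such recoding: the outcome is simply missing at that wave while the survival indicator stays 0.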

Michelle posted on Monday, June 29, 2009  9:53 am



Thanks again! The help is much appreciated as I am very new to MPlus! Michelle 


Hello, I am using a growth curve model that estimates BMI baseline and change over 14 years in an older adult population. The outcomes of interest are BMI slope and intercept but I need to control for mortality selection in this population. I am able to create the GCM and the discrete time survival analysis separately, but I am interested in incorporating both of these into a single model. Is there a better way to control for the nonignorable missing data caused by mortality selection? Is there syntax available that would be helpful to me? Thank you.


I forgot to ask if there is example syntax available for selection modeling? Thanks again. 


To account for possibly nonignorable missingness due to dropout, you may want to analyze the growth model and the survival model jointly. These 2 processes should be correlated. One approach to draw on is the so-called Diggle-Kenward selection model. If you email me, I can send you the Mplus input for that approach. I am about to finish an overview paper on these matters. Selection modeling in the sense of Heckman is not available in Mplus.

Tim Stump posted on Thursday, September 24, 2009  8:54 am



I have a continuous outcome measured at different time points and want to specify a growth model for this process. The outcome is censored above and below. I read in previous posts that you can't specify a variable as censored both above and below. Is this still true? Also, I am wondering if you have other options or ideas of different approaches for this kind of data.


Yes, currently Mplus does not allow censoring from both below and above. Perhaps the variable can be dichotomized? If too much information gets lost that way, perhaps you want to ask why there are these two peaks in the distribution: is this a mixture of different types of people so that mixture modeling is relevant? In addition, perhaps a combination of censored-normal modeling and two-part modeling is possible. Or maybe we need three-part modeling.

Tim Stump posted on Thursday, September 24, 2009  11:22 am



Dr. Muthen, thanks for responding. Just to provide more info, I'm analyzing the clearance of a certain antigen in urine and serum specimens of patients during a specific treatment. Measurements are taken at baseline, 1, 3, 5, 8, and 13 wks. The censoring limits are .6 at the low end and 39 at the high end. Subjects scoring outside these limits from the assay are marked as <.6 or >39. So, they actually have values outside the limits, but I don't know what they are. Hence the reason for specifying a censored variable. There's piling up at either end, but the % of piling up ranges from 10% up to 50% depending on the specimen (urine or serum) and time point. Given this information, do you think mixture modeling would be relevant? Is three-part modeling the same as mixture modeling?


I wonder if your censoring shifts from one end of the scale to the other end over time? If so, you could specify censored-below/above for early/late time points. If you have censoring from both ends at a given time point, perhaps you have a mixture of patients, for which mixture modeling might be helpful even if it doesn't allow for censoring from above for one class and from below for another.


2- and 3-part modeling are not the same as mixture modeling. We teach 2-part modeling, mixture modeling, and their differences in our short courses; see handouts at http://www.statmodel.com/newhandouts.shtml


Hello. I am using a growth curve selection model to adjust for NMAR when the y outcome is word recall scores in a sample of elderly adults. I believe the value of y at time t-1 should predict dropout at time t. I have pasted the results for the discrete-time portion of the model below.

DD00 ON     est     s.e.    est/s.e.  p-value
TOTREC98    0.245   0.043    5.747    0.000
TOTREC00   -0.790   0.090   -8.765    0.000

Odds ratio
DD00 ON
TOTREC98    1.277
TOTREC00    0.454

Am I correct in regressing death in 2000 (DD00) on word recall scores from 1998 (totrec98) and 2000 (totrec00)? If so, am I correct in interpreting the odds ratio as "with a one-unit increase in word recall in 1998, the odds of dropout in 2000 increase by 27.7%"? Thanks for your time.


Yes, I think so. It sounds like you are working with the Diggle-Kenward selection model. Note, however, that the separate interpretation of the two coefficients (for y_t and y_{t-1}) is not clear-cut. DK (1994) discuss an alternative parameterization which you can do using Model Constraint.
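As a quick arithmetic check on the odds ratios quoted in the question, a logit coefficient b corresponds to an odds ratio of exp(b). The sketch below uses the two reported estimates, taking the TOTREC00 coefficient as -0.790 (the sign implied by its reported odds ratio of 0.454). The function names are illustrative.

```python
import math


def odds_ratio(logit_coefficient):
    """Odds ratio for a one-unit increase in the predictor."""
    return math.exp(logit_coefficient)


def percent_change_in_odds(logit_coefficient):
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (math.exp(logit_coefficient) - 1.0) * 100.0


or_totrec98 = odds_ratio(0.245)    # close to the reported 1.277
or_totrec00 = odds_ratio(-0.790)   # close to the reported 0.454
pct_totrec98 = percent_change_in_odds(0.245)
```

Small discrepancies in the third decimal arise because the printed coefficients are themselves rounded. Each odds ratio holds the other recall score fixed, which is part of why the separate interpretation is not clear-cut.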


Dr. Muthen, I am using the Diggle-Kenward selection model with model constraint. I have pasted the model below. It is not intuitive that higher scores on word recall would be positively associated with mortality, so I will look more into the interpretation of those coefficients.

Model:
%overall%
i s | totrec98@0 totrec00@.2 totrec02@.4 totrec04@.6 totrec06@.8 totrec08@1;
i s ON testfx r4agey_b black hispanic other;
dd00 on totrec98 (beta2)
        totrec00 (beta1);
dd02 on totrec00 (beta2)
        totrec02 (beta1);
dd04 on totrec02 (beta2)
        totrec04 (beta1);
dd06 on totrec04 (beta2)
        totrec06 (beta1);
dd08 on totrec06 (beta2)
        totrec08 (beta1);
Model constraint:
new(theta1 theta2);
theta1 = (beta1+beta2)/2;
theta2 = (beta1-beta2)/2;
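The two new parameters in the Model Constraint block have a simple algebraic reading, sketched here under the assumption that beta1 multiplies the current score y_t and beta2 the previous score y_{t-1}, as in the posted input. With theta1 = (beta1 + beta2)/2 and theta2 = (beta1 - beta2)/2, the logit's linear predictor can be rewritten as theta1*(y_t + y_{t-1}) + theta2*(y_t - y_{t-1}), so theta1 attaches to the sum (overall level) of the two scores and theta2 to the change between them. The numeric values are only illustrative.

```python
def reparameterize(beta1, beta2):
    """Map (beta1, beta2) to (theta1, theta2) as in the Model Constraint block."""
    theta1 = (beta1 + beta2) / 2.0
    theta2 = (beta1 - beta2) / 2.0
    return theta1, theta2


def predictor_beta_form(beta1, beta2, y_current, y_previous):
    """Linear predictor written with the original coefficients."""
    return beta1 * y_current + beta2 * y_previous


def predictor_theta_form(theta1, theta2, y_current, y_previous):
    """The same predictor written in terms of the sum and the change."""
    return theta1 * (y_current + y_previous) + theta2 * (y_current - y_previous)


theta1, theta2 = reparameterize(beta1=-0.790, beta2=0.245)
lhs = predictor_beta_form(-0.790, 0.245, y_current=10.0, y_previous=12.0)
rhs = predictor_theta_form(theta1, theta2, y_current=10.0, y_previous=12.0)
```

Both forms give the same dropout probabilities; only the interpretation of the coefficients changes.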


Are you referring to the estimates for the beta coefficients or the theta coefficients in your setup? Are the signs different for these two sets? See also Diggle-Kenward (1994) eqns (31) and (32).


Dr. Muthen, In my previous post (2/20), the estimates and odds ratios I was referring to were the coefficients given for the logistic regression portion of the model. The coefficients for the theta parameters are as follows:

New/Additional Parameters
THETA1 0.225 0.023 9.659 0.000
THETA2 0.465 0.056 8.254 0.000

I am unclear if I should be interpreting the coefficients for the theta parameters or the logistic regression portion of the model. I appreciate your advice with these matters.


Dr. Muthen, I have DK (1994) in front of me and I see how the reparameterization in eq 32 leads to more easily interpretable results. Thanks for your help with this. I enjoyed reading recently submitted work on nonignorable data modeling. Thanks again. Nick 


Dr. Muthen, What would be the best method for assessing model fit for the DK mixture models? 


With mixture models one no longer has the mean-covariance model chi-square testing that we are used to in SEM. There are no simple sufficient statistics less than the raw data. Typically, a model is evaluated in comparison to another more general model, often by BIC. In Mplus you can also study the fit of the model relative to the data using TECH7. Here the model-estimated means, variances and covariances are compared to "observed" sample counterparts (the posterior-probability weighted data). When there is no missing data, TECH13 makes it possible to check against 3rd- and 4th-order moments as well.


Hello, I am using the DATA MISSING command to create dropout variables for a pattern-mixture model, and have a question regarding the TYPE=DDROPOUT setting. When I look at the variables created by this function, a positive (1) dropout dummy variable is only present for the first period of dropout. For periods following the initial dropout, the dummy variable remains zero. When using the TYPE=SDROPOUT setting, the period of dropout is indicated by a 1, followed by missing data indicators for the remaining periods (discrete-time survival indicators). When using a pattern-mixture setup, do the DDROPOUT variables control for the fact that an individual remains unobserved after the initial period of dropout? A great deal of the dropout I am dealing with is caused by mortality, thus these individuals remain unobserved after their initial dropout. I am currently using the DDROPOUT setting to create missing data indicators for pattern-mixture models and the SDROPOUT setting to create missing data indicators for selection models.


DDROPOUT creates a set of dummy variables which together tell you which dropout time category an individual belongs to. So for instance dropping out after time 1 with a total of 3 time points would be scored as 0 1 0. The last zero means that the person is not in the study at the 3rd time point either. So, yes, individuals remain unobserved after their initial dropout. In contrast, you can use Mplus to code intermittent missingness. DDROPOUT is used for exogenous indicators and SDROPOUT is used for endogenous indicators.


Hello, In using the Mplus DATA MISSING command, as I am aware, the facility produces the missing data indicators for the entire sample without distinction between causes of missingness. I have two forms of absorbing dropout in my study (mortality and institutionalization), as well as intermittent missingness. I would ideally like to create a separate dropout indicator for each of these sources of missing data, then use these to adjust the outcome trajectories separately. Is it possible for me to use the DATA MISSING command to create these separate missing data indicators? Is there another strategy I could use to properly adjust for these separate forms of dropout and intermittent missingness? Thank you. 


I don't know that it would be easy to use Data Missing for this. Perhaps you might just as well create that information some other way (by Define, saving the new info; or by another program). You could create dropout dummy variables like in pattern-mixture modeling, where you make a distinction between types of dropout, and then see if the growth parameters differ across those dropout times and types. I haven't seen this done, however.

Ross Larsen posted on Friday, September 16, 2011  7:50 am



Hello, I have an unusual case to analyze in growth modeling with censored data. Children were given a test at three time points. The test is administered in two parts. Children were only given the second part if they achieved a threshold score in the first part. At time 1, the second part of the test was not administered to ANY children. Thus, the children who would have taken the second part of the test were unable to do so. This leads time point 1 to have artificially low scores when analyzing the total test scores (part1+part2). Ordinarily, in SAS I would handle this by creating a censored variable that would tell the procedure which of the children made the threshold and should thus be considered censored from above. In Mplus the censored statement just looks at the highest score available and calls that censored. This is a problem as two children could have both made the threshold but one child has a lower score and thus would not be considered censored. I thought of including my censored variable (0=did not make threshold, 1=made threshold) as a covariate on the intercept and slope, but I wanted your thoughts on this problem and how you would handle it. Thank you.


Why not view the two parts of the test as two different variables, say x and y for the first and second part? For time 1, only x1 is observed and everyone has missing on y1 so we can ignore y1. For time 2, x2 is observed and y2 is observed for those above the threshold of x1 and missing for the others. Etc for time 3. Missingness fulfills MAR due to missing on y2 being determined by the observed x1 score. So for the first 2 time points the data set has 3 variables: x1, x2, and y2. None of them is censored. 


Hello, I want to build a multiple-indicator growth model. All my indicators are zero-inflated. What is my best option? Thanks


I understand that it is two-part modelling (one binary growth model and one continuous). I am not sure I understand how to transform my data. Creating a variable (u) which takes the value 0 or otherwise 1 is easy. But will the second, continuous variable (y) have missing values for the zeros?


See DATA TWOPART in the user's guide. 
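To make the recoding in question concrete, here is a minimal sketch of the split that two-part modeling implies: a binary indicator for "above the cutpoint" and a continuous part that is missing whenever the original value is at the floor. The cutpoint of zero and the log transform of the continuous part are assumptions of this sketch (they are commonly used in two-part modeling); the DATA TWOPART documentation is the place to check what Mplus actually does and which options it offers. The function name is hypothetical.

```python
import math


def two_part_split(y, cutpoint=0.0, log_transform=True):
    """Split one observation into (binary part, continuous part).

    Returns (None, None) if y is missing,
            (0, None)    if y is at or below the cutpoint,
            (1, value)   otherwise, where value is log(y) or y itself.
    """
    if y is None:
        return None, None
    if y > cutpoint:
        return 1, (math.log(y) if log_transform else y)
    return 0, None


at_floor = two_part_split(0.0)
positive = two_part_split(1.0)
missing = two_part_split(None)
untransformed = two_part_split(5.0, log_transform=False)
```

So yes: in this kind of setup the continuous variable is missing exactly where the original variable is zero, and the binary variable carries the information about those zeros.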


I have tried to model a growing phenomenon, but certain characteristics of the phenomenon make it hard to model as a regular growth model. I have a preponderance of 0s. The structure of the data evolves:
• the first year, some observations are 0 and some are 1
• the second year, some observations are 0, some are 1 and some are 2
• the third year, some observations are 0, some are 1, some are 2 and some are 3
and so on... Any advice would be welcome.


If the variable is treated as categorical, see the CATEGORICAL * option on page 489 of the user's guide. 


It's more of a count variable. The only binary variable in the model is the first-year observation. Maybe I should consider the variable as categorical anyway. The second year would have three categories (0,1,2). ... Seventh-year observations would have 8 categories (0,1,2,3,4,5,6,7). It's like a school with seven grades, where a majority will never finish the first grade, fewer will finish the first, fewer still will finish the second grade, etc. over seven years (a left-censored final distribution with a preponderance of zeros). I wish I could model this as a growth model. Thanks


It sounds like an ordered categorical variable. I would use CATEGORICAL with the * setting. 


Dear Linda & Bengt, I’ve seen in Topic 4 that you presented a zero-inflated Poisson latent growth curve model in which you defined the intercept and slope only for the count part of the model and not for the zero-inflated one. I would like to do the same using a censored-inflated growth curve model. I’ve estimated a conditional model in which I only defined intercept and slope (and the effects of covariates) for the continuous part of the model. Now, I wonder whether this is allowed or I would also need to define an intercept and slope for the inflated part of the model. Considering that this is computationally demanding, if allowed, I would like to avoid it. Thanks!!!


You do not need to do a growth model for the inflated part of the model. 

Li Lin posted on Friday, September 28, 2012  8:56 am



I have longitudinal data on sexual functioning, which was measured as a continuous score if the person had sex in the past certain period; otherwise the score is missing. At each time point, if there was no action, the reason for not having action was recorded as sexual-functioning related or not. We are planning to use the two-part growth model for these longitudinal semicontinuous data. To evaluate the possible effect of an intervention, it would be desirable to exclude the 0s from the u-part if the no action was due to a function-unrelated reason. In order to do that, where should I put this time-varying unrelated reason (1/0) in the model (u-part specifically)? How to specify the model? Any reference paper would be great. Thanks!


It sounds like zero represents two things for the binary variable and that you can distinguish between these two. I would turn the zeroes that you want to exclude into missing values. 
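The suggested recoding can be sketched as a data-preparation step done before the analysis. The function and argument names are hypothetical, and None stands in for whatever missing-value code the data file uses.

```python
def binary_part_with_exclusions(had_activity, reason_unrelated):
    """Code the binary (u) part for one person at one time point.

    had_activity:     True if a continuous score was observed.
    reason_unrelated: True if the lack of activity was recorded as
                      unrelated to sexual functioning.
    Returns 1 for activity, 0 for a function-related zero, and None
    (missing) for a zero that should not count in the u-part.
    """
    if had_activity:
        return 1
    if reason_unrelated:
        return None
    return 0


u_active = binary_part_with_exclusions(True, False)
u_related_zero = binary_part_with_exclusions(False, False)
u_excluded = binary_part_with_exclusions(False, True)
```

Treating the excluded zeros as missing means they are handled under the model's missing-data assumptions rather than being counted as function-related inactivity.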

Li Lin posted on Tuesday, October 02, 2012  1:31 pm



Thanks! I have another question – does the newest version of Mplus support the Bayesian estimator in a two-part model with correlated intercepts and slopes? I am using version 5.21. When I specified "ANALYSIS: ESTIMATOR = BAYES", I got an error saying "*** FATAL ERROR Internal Error Code: VARIANCE COVARIANCE MATRIX NOT SUPPORTED WITH ESTIMATOR=BAYES."


You need at least Version 6 for Bayes, and preferably Version 7. Two-part Bayes can be done; see Muthén, B. (2010). Bayesian analysis in Mplus: A brief introduction. Technical Report, which is available on our web site together with Mplus scripts.

Star David posted on Tuesday, November 20, 2012  6:01 pm



I'm currently running a GMM with censored-normal data (internalizing problems, sample size 2000), and it usually takes 3-4 hours to finish the analysis, but if I treat the data as normal it runs fast. I think our PC is top-level (i7, 8 GB RAM). I noticed that UG Ex. 8.3 says "numerical integration becomes increasingly more computationally demanding as the number of factors and the sample size increase". Is there any way to reduce the time that the program takes? And is it appropriate to treat censored-normal data as normal when conducting GMM, or would the results differ?


How many growth factors do you have? What's the percentage of subjects at the censoring point for each time point? 

Star David posted on Wednesday, November 21, 2012  5:35 am



Thanks for replying so quickly! We have 6 time points and I've tried both a 4-growth-factor model (i s q cu) and a 3-growth-factor model (i s q); the latter takes less time but still quite a long time (about 10000s with starts = 20 2; classes = c(3);). Because our DV is continuous (internalizing problems), at each time point about 25% of the subjects score 0.


Often in mixture modeling with several classes you do not need all growth factors to be random but can fix the variances for some of them. So for instance if you have i s q, you can fix q@0 (still estimating the q mean) and therefore have only 2 dimensions of integration. With only 25% censoring you probably get a reasonable approximation by treating the outcomes as regular continuous variables. You can use this to more quickly search for which growth factors need to have free variances and thereby check my conjecture above. 
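
As a rough illustration of the suggestion above, an input fragment for the 3-class model with the quadratic variance fixed at zero might look as follows. The variable names y1-y6 and the equally spaced time scores are assumptions made for the sketch, not taken from the poster's data; STARTS and CLASSES follow the settings quoted in the thread, and the DATA command is omitted.

```text
VARIABLE:
  NAMES = y1-y6;            ! hypothetical names for the 6 measures
  CENSORED ARE y1-y6 (b);   ! censored from below
  CLASSES = c(3);
ANALYSIS:
  TYPE = MIXTURE;
  ALGORITHM = INTEGRATION;
  STARTS = 20 2;
MODEL:
  %OVERALL%
  ! equally spaced time scores are an assumption
  i s q | y1@0 y2@1 y3@2 y4@3 y5@4 y6@5;
  q@0;   ! variance of q fixed at zero; the mean of q is still estimated
```

With q@0 only i and s remain random, so the model needs two dimensions of integration instead of three. For the quick exploratory search, the same input can be run with the CENSORED line removed so that the outcomes are treated as regular continuous variables.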

Star David posted on Wednesday, November 21, 2012  6:10 pm



Thanks for your advice! 


Dear Drs. Muthen, I am conducting some growth models with a censored outcome. What is the best way to compare unconditional models (e.g., linear vs. quadratic)? Can I compare models as nested using loglikelihood difference tests? Thank you so much! Matteo 


The models are nested, but because the variance of the quadratic growth factor is fixed at zero in the linear model, which is on the border of the admissible parameter space, the difference may not be distributed as chi-square. I would compare the models using BIC. 


Thanks so much for your quick reply! Would the sample-size adjusted BIC be more appropriate than the BIC in this case? And is there any suggested cutoff for considering a decrease in the BIC a significant indicator of model improvement? 


I would use BIC. There is a FAQ on the website about cutoffs and BIC. 
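
For reference, the two criteria mentioned here have the following standard definitions, where log L is the maximized loglikelihood, p the number of free parameters, and n the sample size. The symbols are introduced only for this note; the sample-size adjusted version replaces n with (n + 2)/24.

```latex
\mathrm{BIC} = -2\log L + p\,\ln n,
\qquad
\mathrm{BIC}_{\text{adj}} = -2\log L + p\,\ln\!\left(\frac{n+2}{24}\right)
```

In both cases a smaller value indicates the preferred model.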


Hello, I want to construct a growth model for count indicators that are censored. This seems to be a blend of Examples 6.3 and 6.7 in the current Users' Guide. Our growth data are symptom counts, which are censored at the time points for which a person is not yet a drinker (and, in fact, censored-inflated as in Ex. 6.3 for the first time point, because many people are not yet drinkers). But as the sample begins to drink over the years, they will have positive symptom counts, and the symptom counts will likely still be somewhat zero-inflated. The model I would like to run could thus be more similar to Ex. 6.7, where the zeros can arise from two subpopulations (those who are not yet drinkers and hence have a zero symptom count, and those who ARE drinkers but are not showing symptoms). But I do like the idea of explicitly capturing the censoring of the symptom count data at the earlier time points, as in Example 6.3. In the Poisson model, the two sources of zeros are thought to be latent, allowing for mixing, while we do know the source of the zeros and thus could model this explicitly as being censored. Yet the indicators are specified as continuous (not count) in Ex. 6.3. Can you give suggestions on whether our problem could/should be thought of as more similar to 6.3 or 6.7? Or can I run model 6.3 but specify the indicators as count? 


Hi, Lisa. You might consider an onset-to-growth model (a.k.a. a launch model) where the onset of drinking marks the beginning of the growth process for a given individual (and, as such, the pre-onset zeros are not even part of the growth outcomes). Then in your growth model, you only need to deal with the pile-up of zeros among those who are drinkers but with a zero symptom count. Check out this article; it is not exactly what you need for the growth portion of your model, but it will give you a start on the onset-to-growth setup. Malone, P. S., Northrup, T. F., Masyn, K. E., Lamis, D. A., & Lamont, A. E. (2012). Initiation and persistence of alcohol use in United States Black, Hispanic, and White male and female youth. Addictive Behaviors, 37, 299-305. Cheers, Katherine Masyn 

Tom Booth posted on Monday, April 07, 2014  11:53 pm



Dear Bengt & Linda, I am running a simple 4-wave linear GCM with time-varying covariates. Across waves I have dropout due to death or serious illness in an ageing sample, and I need to account for this. From reading above, it seems the pattern-mixture approach using SDROPOUT is the way to go. Does this seem correct? The GCM here concerns cognitive variables, so whilst cognitive decline is expected prior to death, I do not think I want to predict mortality from cognitive scores in this instance. Hence I think the above would be more suitable than a joint GCM-survival approach. Thanks, Tom 


You should use DDROPOUT not SDROPOUT. See Examples 11.2 and 11.4. 

Tom Booth posted on Wednesday, April 09, 2014  6:53 am



Thanks Linda. 

Carlijn C posted on Wednesday, July 02, 2014  3:18 pm



Hello, I'm running a latent growth model on multiple groups (2 groups: a control and an experimental group). I have a very skewed variable with a lot of zeros (range 0-13; a lower value is better). The original means of group 1 were: (1) 1.296 (2) 0.724 (3) 0.960, and of group 2: (1) 0.894 (2) 0.599 (3) 0.361. The slope mean of group 1 was 0.144, and of group 2 0.260 (significant). Now, when I use the censored method, the means of group 1 are: (1) 1.147 (2) 0.999 (3) 4.157, and of group 2: (1) 0.619 (2) 0.932 (3) 1.059. The slope mean of group 1 is 0.841, and of group 2 0.218. Without the censored method, it seems to me that group 2, the experimental group, did better than group 1, the control group. With the censored method, however, this no longer seems to be the case. How should I interpret the outcomes of the censored method? 


With a lot of zeros I would recommend instead looking at the two-part growth modeling approach because it is a richer model that can show you separate treatment effects on the probability of zero and the amount above zero. You have a UG example to start from and we also teach on it in our short course videos and handouts. 
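
A minimal sketch of such a two-part growth setup, loosely patterned on the two-part growth example in the UG, is given below. The variable names (y1-y3, tx), the missing-value flag 999, and the use of a 0/1 treatment covariate instead of a multiple-group specification are all assumptions made for illustration; the DATA command is omitted.

```text
DATA TWOPART:
  NAMES = y1-y3;             ! hypothetical names, 3 time points
  BINARY = bin1-bin3;        ! zero versus above zero
  CONTINUOUS = cont1-cont3;  ! amount above zero, missing at zero
VARIABLE:
  NAMES = y1-y3 tx;          ! tx = hypothetical treatment dummy
  USEVARIABLES = tx bin1-bin3 cont1-cont3;
  CATEGORICAL = bin1-bin3;
  MISSING = ALL (999);       ! hypothetical missing-value flag
ANALYSIS:
  ESTIMATOR = MLR;
  ALGORITHM = INTEGRATION;
MODEL:
  iu su | bin1@0 bin2@1 bin3@2;     ! growth in the binary part
  iy sy | cont1@0 cont2@1 cont3@2;  ! growth in the continuous part
  su@0;                             ! fewer integration dimensions
  iu su iy sy ON tx;                ! separate effects on each part
```

The regressions of iu and su on tx describe the treatment effect on the probability of a nonzero value, while those of iy and sy describe the effect on the amount among nonzero observations.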

anonymous Z posted on Saturday, October 24, 2015  2:46 pm



Dear Drs. Muthen, I am fitting a model with drug use across five time points as the outcome variable (a lot of zeros because some people don't use, and the variable is continuous for those who do). My independent variables are relationship status (0 = not in a relationship, 1 = in a relationship) and relationship quality (missing if the participant is not in a relationship at a given time point) across five time points. I prefer to use a multilevel approach rather than a multivariate approach to model the growth curve. I originally did censored growth modeling (censored from below). Below is the model. I have two questions: 1) Is the interpretation of the beta coefficients I get from the censored model the same as when drug is not censored? 2) It seems that you suggest two-part modeling is a better option. Can two-part modeling be done with the multilevel (univariate) approach? If so, what should the syntax look like? Thanks so much!
VARIABLE: NAMES ARE drug;
CENSORED ARE drug (b);
%within%
Drug on relationship_status;
Drug on relationship_quality;
s | Drug on time;
%between%
Drug on treatment;
Drug with s; 


1) Yes, beta refers to the underlying uncensored variable. 2) That should work, although I don't think I have done it. You can create the 2 variables (so a bivariate, 2-level model) the same way as we show for wide, 1-level. I think you can use DATA TWOPART to do that. 

anonymous Z posted on Monday, October 26, 2015  12:14 pm



Hi, Dr. Muthen, Thanks for your advice. I ran the model as a two-part model. Below is the syntax I added. I have two questions: 1. The analysis ran without any error message; however, the output didn't list anything about binary or continuous parts. I got results very similar to those I got without "DATA TWOPART." Is something wrong with it?
DATA TWOPART: NAMES = AL;
BINARY = binAL;
CONTINUOUS = conAL;
2. The second question is about the censored-inflated model. From what I read, the censored-inflated model is very similar to two-part modeling. What is the difference between them? Thanks so much, 

anonymous Z posted on Monday, October 26, 2015  1:16 pm



To add to my question, when I fit the censored-inflated model with the multilevel approach, I get results similar to the normal multilevel approach, i.e., the output doesn't show separate results for the inflation part. Is anything missing in my syntax? 


First post: 1. You need to specify a growth model for both parts; see the UG example for 2-part. 2. The censored-inflated model is a 2-class model. The 2-part model does not use 2 classes. But they often have quite similar fits. Second post: You need to model the inflated part as well. 

anonymous Z posted on Tuesday, October 27, 2015  7:17 am



Thank you so much! Could you recommend any papers using censored-inflated models that I can refer to when describing the analysis? 

anonymous Z posted on Tuesday, October 27, 2015  11:04 am



Hi Dr. Muthen, Below is my syntax for the censored-inflated model. I got the error message shown. How should I fix the problem? Thanks so much,
*** ERROR in MODEL command Unknown variable: OPi
CENSORED are OP(bi);
cluster = ID;
within = time REL_STA PARTNER;
Between = TX_2 eth;
analysis: type = twolevel random;
MODEL:
%within%
S | OP on time;
OP ON REL_STA PARTNER;
Si | OPi on time;
OPi ON REL_STA PARTNER;
%between%
OP ON TX_2 eth;
S ON TX_2 eth;
OP WITH S;
OPi ON TX_2 eth;
Si ON TX_2;
OPi WITH Si; 


OPi is not how the inflation part is referred to. See the UG for examples with censored-inflated. 
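
In the UG examples, the inflation part of a variable declared as censored-inflated is referred to by adding #1 to the variable name. A minimal within-level sketch using the variable names from the post is shown below; it uses fixed rather than random slopes to keep things simple, which is an assumption of this sketch, and it has not been run against the poster's data.

```text
MODEL:
  %WITHIN%
  OP ON time REL_STA PARTNER;     ! censored (amount) part
  OP#1 ON time REL_STA PARTNER;   ! inflation part, named with #1
```

The remaining statements that mention OPi in the post would presumably be rewritten with OP#1 in the same way, checked against the censored-inflated examples in the UG.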

Ads posted on Monday, February 20, 2017  6:19 am



I am looking at a growth mixture model of viral load in HIV over 20 years. However, there are two types of censoring in the dataset, and I was wondering how these might be addressed in models: 1. Via death of the participant. 2. Participants were recruited at different time points (e.g., some at year 1, some at year 10, some at year 19), so some participants could have up to 20 years of data, while others could have only 1 (i.e., those recruited at year 19). How could Mplus be used to incorporate both of these sources of censoring, as they arise from very different processes? As far as I am aware, Mplus cannot handle variables that are censored both above and below, and I wanted to ask if there is a way to handle this scenario. This is a naturalistic study (participants were continually recruited and tracked over time), and I would like to scale the time variable as time since baseline visit unless there is a better way (all suggestions are welcome). 


1. You can try MAR, but see also the paper on our Missing data web page: http://www.statmodel.com/missingdata.shtml Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with nonignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.
2. With time since baseline, it seems that this censoring would not be a problem. 

Ads posted on Tuesday, February 21, 2017  8:58 am



Thank you! 
