

I have a data set with a censored variable and wonder whether there are any options in Mplus that might let me handle this. Thanks.


There has been a discussion of censored variables on Mplus Discussion this week under the heading of Growth Modeling of Longitudinal Data: Growth Models For Censored Data. I think the information there may answer your questions. 

Anonymous posted on Tuesday, March 22, 2005  2:58 pm



Hi, I am trying to solve a system of 5 simultaneous tobit equations and would like to know whether Mplus can handle this before I buy it. Many thanks.

bmuthen posted on Tuesday, March 22, 2005  3:21 pm



Yes, Tobit is available for general SEM and latent variable models in Mplus, not only by weighted least squares estimation but also by ML. For ML, this covers not only the classic Tobit (classic censored-normal) but also limit-inflated censored-normal as well as 2-part (semicontinuous) modeling a la Olsen & Schafer (2001) in JASA.

Anonymous posted on Tuesday, March 22, 2005  3:40 pm



Thanks for your prompt response. If I may ask another question: what is the maximum number of observations I could use in Mplus?

bmuthen posted on Tuesday, March 22, 2005  4:05 pm



There is no limitation on the maximum number of observations. The practical limit arises when the computing time of the algorithms depends on the sample size. But very large sample sizes (e.g. 100K) should be possible in most cases.

Anonymous posted on Monday, April 11, 2005  3:54 am



Hi, I ran a censored SEM using WLSMV. I am curious to know why the vector of unknown thresholds is said to be estimated but does not appear anywhere. Or is it done internally and cannot be observed? Thank you.

BMuthen posted on Monday, April 11, 2005  5:46 am



I'm not sure what you mean. The censoring points are fixed and not estimated. They are taken from the data. Censored variables do not have thresholds. They have means or intercepts depending on the model. 

Anonymous posted on Monday, April 11, 2005  1:58 pm



OK, thanks. I guess I overlooked that WLSMV handles censored data as well as categorical data with latent variables. Now I understand that the thresholds refer to the latter. Thanks!

Anonymous posted on Monday, June 27, 2005  8:51 am



Hi, I'm estimating an SEM involving an observed censored variable that is dependent on a range of predictors (one latent continuous, and several observed categorical and continuous) and is, in turn, set to predict a categorical outcome. I wonder about the interpretation of path coefficients, for paths both "from" and "to" the censored variable. 1) Are the unstandardized "Estimates" interpretable like linear regression coefficients? (For 1 unit of the independent variable, the dependent variable increases by b units?) How is this adjusted for the censoring? 2) All the "StdYX" coefficients that involve the censored variable are given as "999". Is this because the censored variable does not have a straightforward standard deviation, or does it indicate some flaw in my model? I'm using the WLSMV estimator. 3) Is there a way to compare the magnitude of "STD" coefficients involving a censored outcome or predictor with "STD" coefficients involving uncensored continuous outcomes? In case that's important: my censored variable is "Age at first pregnancy", ranging from 16 to 33, censored above at age 33 years. Thanks!

BMuthen posted on Tuesday, June 28, 2005  8:02 am



1. When you have a dependent censored variable, the regression refers to a linear regression of an underlying latent uncensored variable on its predictors. 2. 999 probably refers to the fact that you have a negative variance estimate. 3. You can compare the STDYX coefficients for a censored and an uncensored outcome.
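To make point 1 concrete, here is a small simulation sketch, not Mplus code and not a tobit estimator. It assumes a hypothetical outcome generated as y* = 31 + 3x + e and censored above at 33, and only illustrates that the slope of 3 belongs to the latent uncensored y*, while a naive OLS fit to the observed (censored) values is pulled toward zero. All names and numbers are invented for illustration.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(2005)   # fixed seed so the run is reproducible
CEILING = 33.0              # hypothetical censoring point (censored from above)
TRUE_SLOPE = 3.0            # slope of the latent uncensored variable y*

x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y_star = [31.0 + TRUE_SLOPE * xi + rng.gauss(0.0, 3.0) for xi in x]
y_obs = [min(v, CEILING) for v in y_star]   # what the data set would contain

slope_latent = ols_slope(x, y_star)     # close to TRUE_SLOPE
slope_observed = ols_slope(x, y_obs)    # attenuated by the ceiling

print(f"slope on latent y*:   {slope_latent:.2f}")
print(f"slope on censored y:  {slope_observed:.2f}")
```

A censored-normal (tobit) model aims to recover the first slope from the censored data; the second number is what ignoring the censoring would give.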


Dear Bengt, Thanks for that. Can I come back on question 2? All variance estimates (including that of the censored variable) in my model are positive. Could there be another source of the "999"? Also, the R-square of the censored variable is returned as zero. Model fit indices are reasonable, and all paths towards the censored variable and all paths from it are significant.


Are you using Version 3.12? If so, please send the output, data, and your license number to support@statmodel.com. 

Anonymous posted on Wednesday, July 20, 2005  6:46 am



Dear Dr., What does it mean when Mplus returns "*******" for Residual Variances between dependent variables of a censored system? Thanks in advance 

bmuthen posted on Wednesday, July 20, 2005  7:20 am



This means that the values are huge, too large to fit in the output space. Either your variable is on a very large scale, which can be adjusted in DEFINE, or the model fitting is not going well.
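As a hedged illustration of the scale point, the sketch below (plain Python, not Mplus; the variable names and values are made up) divides a large-scale outcome by a constant k, as one might do in DEFINE. The variance shrinks by k squared and an unstandardized regression slope for that outcome shrinks by k, so unstandardized estimates are expected to change after rescaling even when the model is the same.

```python
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: an outcome on a very large scale and one predictor.
income = [12000.0, 45000.0, 30000.0, 88000.0, 51000.0]
years = [1.0, 4.0, 3.0, 9.0, 5.0]

SCALE = 1000.0                               # assumed rescaling constant
income_k = [v / SCALE for v in income]       # outcome in thousands

# Variance changes by SCALE**2, the unstandardized slope by SCALE.
var_ratio = statistics.pvariance(income) / statistics.pvariance(income_k)
slope_ratio = ols_slope(years, income) / ols_slope(years, income_k)

print(var_ratio, slope_ratio)
```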

Anonymous posted on Wednesday, July 20, 2005  8:54 am



Thanks very much, Dr.! It turned out they were very large numbers, because they were printed when I used DEFINE. But I am now worried about the meaning and significance of the coefficient estimates, because they are different from the ones I get when I don't use DEFINE. How can I deal with that? Thank you.

bmuthen posted on Wednesday, July 20, 2005  9:27 am



This would need to be looked at in more detail for these 2 runs. Please send their input, output, data and your license number to support@statmodel.com 


Hi Bengt/Linda, I have run a series of tobit regressions in Mplus 3 and was a little unsure about the correct interpretation of the coefficients in my model, which has a censored observed variable regressed on several uncensored observed variables. Question 1. Is the metric of the uncensored latent dependent variable the same as that of the observed dependent variable, which would allow the unstandardized coefficients to be interpreted as in OLS regression? Question 2. I plan to present standardized coefficients for some predictors (psychological scale scores) but was wondering if (and how) researchers typically interpret standardized coefficients when the predictor is dichotomous ordered (e.g. high vs low risk groups) or unordered (e.g. gender)? Thanks in advance for any suggestions. Cheers, Chris Richardson


1. Yes. 2. I would not standardize with respect to such categorical predictor variables. You can still standardize with respect to the dependent variable, that is, use only SD(y) from the usual formula stand(beta) = beta*SD(x)/SD(y), which gives beta/SD(y).
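The two standardizations mentioned in this reply can be written out as a short sketch. The function names and the numbers are hypothetical, and where SD(x) and SD(y) come from (sample or model-estimated) is left as an assumption.

```python
def standardize_xy(beta, sd_x, sd_y):
    """Usual standardized coefficient: beta * SD(x) / SD(y)."""
    return beta * sd_x / sd_y


def standardize_y_only(beta, sd_y):
    """Standardize with respect to the dependent variable only: beta / SD(y)."""
    return beta / sd_y


# Hypothetical values for illustration.
beta = 2.0    # unstandardized slope
sd_x = 1.5    # SD of a continuous predictor (e.g. a scale score)
sd_y = 6.0    # SD of the dependent variable

print(standardize_xy(beta, sd_x, sd_y))    # 0.5
print(standardize_y_only(beta, sd_y))      # about 0.333
```

For a dichotomous predictor such as gender, the y-only version reads as the difference between the two groups expressed in SD(y) units, which avoids attaching meaning to the standard deviation of a 0/1 variable.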


I am running a model which includes three confirmatory factors, all with indicators censored from below (at zero). In several cases these variables appear to be zero-inflated. The proportion of zero values for each variable ranges from about 3% to 68%; a majority of the variables have zero values for about 25% to 30% of their observations. I have three questions: 1) Is there a reasonable rule of thumb or principle for deciding when an inflation term becomes necessary for modeling excess zeros in censored data? 2) In the specification of a confirmatory factor for which censored-inflated variables are indicators, is it necessary to create a second factor on which the inflation terms load, or is it sufficient to simply specify that the variables in the CFA are censored-inflated? 3) Given that, if I recall correctly, the INDIRECT command is not available in the context of ALGORITHM=INTEGRATION (required for censored data), I plan to proceed with an evaluation of mediation using a series of nested model comparisons following the logic of the Baron and Kenny approach. Is the Wald Chi-Square Test of parameter constraints sufficient for doing this, or should I use the difference between the -2 Log Likelihoods? If I use the difference between the -2 Log Likelihoods, how do I incorporate the scaling correction factor (supplied with each -2LL) into this calculation?


When you say "zero-inflated" it sounds like you have count outcomes, but since you also talk about censored-inflated variables, this tells me you have continuous outcomes, so I will assume this. My preference is to use two-part modeling rather than censored or censored-inflated modeling; it is easier to work with and gives more flexible modeling. See the UG examples for two-part modeling. 1) My rule is 25%. 2) No. 3) Use the Wald test and you don't have to worry about the scaling correction. I don't know that one procedure is better than the other.
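For readers who still want the log-likelihood route raised in question 3, the sketch below shows the scaled difference computation commonly used with log-likelihoods that come with a scaling correction factor: cd = (p0*c0 - p1*c1)/(p0 - p1) and TRd = -2*(L0 - L1)/cd, where model 0 is the nested (more restricted) model. The function name and the numbers are illustrative assumptions, not output from the poster's models.

```python
def scaled_loglik_difference(l0, c0, p0, l1, c1, p1):
    """Scaled chi-square difference from two nested models' log-likelihoods.

    l0, c0, p0: log-likelihood, scaling correction factor, and number of
                free parameters of the nested (more restricted) model.
    l1, c1, p1: the same quantities for the less restricted model.
    Returns (difference-test scaling correction cd, test statistic TRd, df).
    """
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)
    trd = -2.0 * (l0 - l1) / cd
    df = p1 - p0
    return cd, trd, df


# Made-up values for illustration only.
cd, trd, df = scaled_loglik_difference(
    l0=-2606.0, c0=1.450, p0=39,
    l1=-2583.0, c1=1.546, p1=47,
)
print(f"cd = {cd:.3f}, TRd = {trd:.3f}, df = {df}")
# cd = 2.014, TRd = 22.840, df = 8
```

TRd is then referred to a chi-square distribution with df degrees of freedom. A negative cd can occur in practice, in which case this computation is not usable as written.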


Thank you. To clarify, the data are not all integers, but they cannot take values less than zero. Does use of the semicontinuous approach become problematic when the mean of the continuous component is relatively low? The danger, as I see it, is that the model could then produce nonsensical predicted values (i.e. values less than zero). Is this issue simply viewed as having such minimal impact that it is generally not addressed? If I do use a semicontinuous representation of the factor indicators, am I correct in assuming that the dichotomous and continuous indicators would then all load on the factor?


The two-part model does not allow the values to go below zero if zero is specified as the floor. The strength of the two-part approach is that the dichotomous and the continuous indicators need not load on the same factor. They can if you specify that, so it is testable.
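The recoding behind a two-part (semicontinuous) representation can be sketched as follows. This is plain Python rather than Mplus syntax, and the log transform of the positive part is an assumption made for this example. Each original variable becomes a binary "any amount" indicator plus a continuous part that is defined only above the floor.

```python
import math


def two_part(values, floor=0.0):
    """Split a semicontinuous variable into a binary part and a continuous part.

    binary:     0 if the value is at the floor, 1 if it is above the floor
    continuous: log of the value if above the floor, otherwise None (missing)
    """
    binary = [1 if v > floor else 0 for v in values]
    continuous = [math.log(v) if v > floor else None for v in values]
    return binary, continuous


# Hypothetical non-negative, non-integer scores with a pile-up at zero.
scores = [0.0, 2.5, 0.0, 1.0, 7.25]
u, y = two_part(scores)

print(u)   # [0, 1, 0, 1, 1]
print(y)   # None where the score is zero, log(score) otherwise
```

With this coding, a predicted value on the log scale maps back to the original scale through exp(), which is always positive, so predictions for the continuous part cannot fall below a floor of zero.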


I've implemented your recommendation regarding the two-part model, which has worked beautifully. Regarding the use of the Wald Chi-Square Test for constraining parameters, I am clearly not understanding something. When I constrain paths that the full model indicates are nonzero, I get a reliable (significant) Wald Chi-Square result. However, when I constrain parameters that the full model suggests are not different from zero, I also get a reliable Wald Chi-Square result. Specifically, I am constraining coefficients that represent the regression of the mediator on the predictor variable. Shouldn't constraining a parameter that is not different from zero result in a nonsignificant Wald Chi-Square?


You will need to send your inputs, data, outputs, and license number to support@statmodel.com so we can take a closer look at this.

Carl Maas posted on Tuesday, May 13, 2008  12:34 pm



I am having trouble running an SEM model that includes variables whose distributions are treated as censored normal as well as a variable that is treated as categorical. Included in the specification of the model is a covariance between a dichotomous variable and a censored normal measured variable. Is it possible to do this? The error message I am getting reads, "Covariances for categorical, censored, count or nominal variables with other observed variables are not defined. Problem with the statement: DATVIOVS WITH BINGE". When I omit the specification of the covariance, the model seems to run okay. Any suggestions?


Please send your output and license number to support@statmodel.com so I can see the entire picture. 

Wei Chun posted on Friday, September 25, 2009  1:24 am



Dear Linda, I am trying to run a regression model in which the value of the dependent variable is between 0 and 1. What regression should I run? Is it Tobit regression? Could I run it in Mplus? Many thanks for your advice.


What exactly is the dependent variable? 

Wei Chun posted on Saturday, September 26, 2009  8:25 pm



Thanks Linda. The dependent variable is productivity (efficiency) change. Actually, it is greater or less than 1, between 0 and 2. Thank you for your help. 


If what you are saying is that you have a variable that ranges between 0 and 2, I would treat it as continuous unless there was a piling up at 0 or 2. 


Dear Linda and Bengt, I have been using 'censored from below' to deal with the skew in my variables for LPA. In order to compare these results with SEMs for the same data, I need the loglikelihood and information criteria, which aren't given in the SEM output for censored data. I was wondering what is actually done to the data when a variable is listed as censored from below, and whether it is a process I could apply to the data before running an SEM? Alternatively, is there an output command I could use to get the information criteria? Thanks very much, Miriam


It sounds like you are using WLSMV for the CFA and MLR for the LPA. Loglikelihoods are not available for WLSMV. Try using MLR for your CFA. 

Lars Bocker posted on Wednesday, February 26, 2014  8:41 am



Dear Linda and Bengt, I find a lot of information about censored dependent variables, but I have a question about censored independent variables. As part of my SEM I am regressing y on predictor X1. However, the effect appears nonlinear: up to a certain value of X1 the effect on y is positive, and above this value it becomes negative. I split X1 into two censored variables, X1a (X1a = X1 below the limit; X1a = limit value above the limit) and X1b (vice versa). Simply regressing y on X1a and X1b simultaneously results in plausible outcomes, but I was wondering whether this procedure is correct, or whether I am overlooking something (e.g. specifying the censored scale of the predictors)? Best, Lars


How about simply making y a function of x and (x-mean)^2?

Lars Bocker posted on Thursday, February 27, 2014  8:48 pm



Thanks, Bengt, for your suggestion. We tried this earlier, but x-squared was nonsignificantly positive rather than significantly negative, and we thought maybe we should identify the limit more precisely with the two censored predictors. We also tried an approach with x-interval dummies, which worked, but we prefer a continuous variable approach.


Sounds like you are doing a piecewise, or spline, approach which seems fine here. 
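The piecewise coding described in this exchange can be sketched as below. The knot value, coefficients, and function names are hypothetical. X1a follows x up to the limit and then stays at the limit, X1b does the reverse, so each regression coefficient acts as the slope on its own side of the limit.

```python
def piecewise_terms(x, knot):
    """Split x into two predictors that are 'censored' at the knot.

    lower: equals x below the knot and stays at the knot above it (X1a)
    upper: stays at the knot below it and equals x above it (X1b)
    """
    return min(x, knot), max(x, knot)


def predict(x, knot, intercept, slope_below, slope_above):
    """Predicted y from a regression on the two piecewise predictors."""
    lower, upper = piecewise_terms(x, knot)
    return intercept + slope_below * lower + slope_above * upper


# Hypothetical knot and coefficients: positive effect below, negative above.
KNOT = 20.0
args = dict(knot=KNOT, intercept=0.0, slope_below=0.5, slope_above=-0.3)

change_below = predict(11.0, **args) - predict(10.0, **args)   # about  0.5
change_above = predict(26.0, **args) - predict(25.0, **args)   # about -0.3
print(change_below, change_above)
```

Because only one of the two predictors varies on each side of the knot, no censoring declaration for the predictors is involved; the coding is an ordinary linear spline with a single knot, up to a constant absorbed by the intercept.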


Hello, I'm running a growth model of suicidality over time, with suicidality measured as frequency on a 5-point Likert scale from "Never" to "At least every day". The scale is not normal (i.e. >80% of participants report "Never" at each time point). I'm trying to determine whether the model should be zero-inflated, Poisson, or censored from below. In running multiple models, censored seems to have the fewest problems with non-identification, but non-identification does repeatedly come up, specifically regarding the intercept parameters. Thank you for your advice! - L


Try the CATEGORICAL option. Categorical data methodology deals with floor and ceiling effects. You might consider collapsing the categories to make it a binary variable if the cell sizes are small. 


Hi Linda, Thanks for this suggestion. I received an error noting that variables can't be both censored and categorical, so I've removed the censored option. To confirm, with this model I would no longer be estimating 2 growth models (i.e. i s, ii si) but would instead estimate a model with a categorical dependent variable and assumptions that permit non-normality? Additionally, if we include time-varying covariates (i.e. substance use) with a similar Likert scale, would you recommend also defining these as categorical? The cell sizes are sufficient that we'd prefer not to make suicidality binary. Best, L


Q1. Right. Q2. A variable type should not be declared for covariates; they are treated as continuous. 


Normality is not a concept I would use with categorical variables. Categorical data methodology does not require variables to not have floor or ceiling effects. Covariates should not be defined as being categorical. In regression, covariates are treated as continuous. 


Dear Muthens, I have a skewed outcome variable (aggression episodes per month in a psychiatric ward, where many patients have zero or one episode). Skewness = 4.11. Could censored regression be used in this case? Or should it only be used when smaller values could exist even if not observed (censored)? Regards, Rolf Gjestad


Sounds like the outcome is a count variable, in which case Poisson or negative binomial (negbin) regression can be used.


Hi I am aware Mplus can handle censored manifest variables, but can it also handle censored latent variables, say for example, in longitudinal confirmatory factor analysis? Regards, David 


To some extent; see the paper Wall, M. M., Guo, J., & Amemiya, Y. (2012). Mixture factor analysis for approximating a nonnormally distributed continuous latent factor with continuous and dichotomous observed variables. Multivariate Behavioral Research, 47:2, 276-313.
