Censored data
Mplus Discussion > Structural Equation Modeling >
 Holmes Finch posted on Wednesday, July 19, 2000 - 12:42 pm
I have a data set with a censored variable, and wonder whether there are any options in Mplus that would let me handle this. Thanks.
 Linda K. Muthen posted on Wednesday, July 19, 2000 - 2:54 pm
There has been a discussion of censored variables on Mplus Discussion this week under the heading of Growth Modeling of Longitudinal Data: Growth Models For Censored Data. I think the information there may answer your questions.
 Anonymous posted on Tuesday, March 22, 2005 - 2:58 pm

I am trying to estimate a system of five simultaneous tobit equations and would like to know whether Mplus can handle this before I buy it. Many thanks.
 bmuthen posted on Tuesday, March 22, 2005 - 3:21 pm
Yes, Tobit is available for general SEM, latent variable models in Mplus. Not only by weighted least squares estimation, but also by ML. And for ML not only the classic Tobit (classic censored-normal), but also limit-inflated censored-normal as well as 2-part (semicontinuous) modeling a la Olsen & Schafer (2001) in JASA.
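For readers unfamiliar with the classic censored-normal (Tobit) model mentioned above, the following is a minimal sketch of its log-likelihood for a single outcome censored from below. It is an illustration only, not Mplus code or Mplus's internal algorithm; the function and argument names (`tobit_loglik`, `lower`, and so on) are hypothetical.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(y, x, intercept, slope, sigma, lower=0.0):
    """Log-likelihood of a regression with y censored from below at `lower`.

    The latent response is y* = intercept + slope * x + e, e ~ N(0, sigma^2),
    and the observed response is y = max(y*, lower).
    """
    total = 0.0
    for yi, xi in zip(y, x):
        mu = intercept + slope * xi
        if yi <= lower:
            # Censored observation: contributes P(y* <= lower).
            total += math.log(norm_cdf((lower - mu) / sigma))
        else:
            # Uncensored observation: contributes the normal density of y.
            total += math.log(norm_pdf((yi - mu) / sigma) / sigma)
    return total
```

Maximizing this function over `intercept`, `slope`, and `sigma` gives the ML Tobit estimates for one equation; the limit-inflated and two-part variants mentioned above add a separate model for the probability of being at the limit.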
 Anonymous posted on Tuesday, March 22, 2005 - 3:40 pm
Thanks for your prompt response. If I may ask another question: what is the maximum number of observations I could use in Mplus?
 bmuthen posted on Tuesday, March 22, 2005 - 4:05 pm
There is no limit on the maximum number of observations. The practical limit arises when the computing time of an algorithm depends on the sample size. But very large sample sizes (e.g. 100K) should be possible in most cases.
 Anonymous posted on Monday, April 11, 2005 - 3:54 am
I run a censored SEM using WLSMV. I am curious to know why the vector of unknown thresholds is said to be estimated and does not appear anywhere. Or is it done internally and cannot be observed?

Thank you
 BMuthen posted on Monday, April 11, 2005 - 5:46 am
I'm not sure what you mean. The censoring points are fixed and not estimated. They are taken from the data. Censored variables do not have thresholds. They have means or intercepts depending on the model.
 Anonymous posted on Monday, April 11, 2005 - 1:58 pm
OK, thanks. I guess I omitted that WLSMV handles censored data as well as categorical data with latent variables. Now I understand that the thresholds refer to the latter. Thanks!
 Anonymous posted on Monday, June 27, 2005 - 8:51 am

I'm estimating an SEM involving an observed censored variable that is dependent on a range of predictors (one latent continuous, and several observed categorical and continuous), and is, in turn, set to predict a categorical outcome. I wonder about the interpretation of path coefficients - of paths both "from" and "to" the censored variable.

1) Are the unstandardized "Estimates" interpretable like linear regression coefficients? (For 1 unit of the independent var, the dependent var increases by b units?) How is this adjusted for the censoring?

2) All the "StdYX" coefficients that involve the censored variable are given as "999". Is this because the censored variable does not have a straightforward standard deviation, or does it indicate some flaw in my model? I'm using the WLSMV estimator.

3) Is there a way to compare the magnitude of "STD" coefficients involving a censored outcome or predictor with "STD" coefficients involving uncensored continuous outcomes?

In case that's important: My censored variable is "Age at first pregnancy", ranging from 16 to 33, censored above at age 33 years.

 BMuthen posted on Tuesday, June 28, 2005 - 8:02 am
1. When you have a dependent censored variable, the regression refers to a linear regression of an underlying latent uncensored variable on its predictors.

2. 999 probably refers to the fact that you have a negative variance estimate.

3. You can compare the STDYX coefficients for a censored and uncensored outcome.
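To illustrate point 1, the sketch below computes the expected observed value of a variable censored from above when the latent uncensored variable is normal. It shows why a one-unit change in the latent mean generally moves the observed mean by less than one unit. This is a textbook censored-normal calculation under an assumed normal latent variable, not Mplus output; the function names are hypothetical.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_observed(mu, sigma, upper):
    """E[min(y*, upper)] for a latent y* ~ N(mu, sigma^2)."""
    a = (upper - mu) / sigma
    return mu * norm_cdf(a) - sigma * norm_pdf(a) + upper * (1.0 - norm_cdf(a))


def observed_slope_factor(mu, sigma, upper):
    """Change in E[observed y] per unit change in the latent mean.

    For censoring from above this equals P(y* < upper), so it lies between
    0 and 1 and shrinks as more of the distribution is censored.
    """
    return norm_cdf((upper - mu) / sigma)
```

With a latent mean far below the censoring point the factor is close to 1 and the coefficient reads like an ordinary linear regression slope for the observed variable; near the censoring point the effect on the observed variable is smaller than the reported coefficient.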
 Peter Martin posted on Wednesday, June 29, 2005 - 1:10 am
Dear Bengt,

Thanks for that. Can I come back on question 2? All variance estimates (including that of the censored variable) in my model are positive. Could there be another source of the "999"? Also, the R-square of the censored variable is returned as zero.

Model fit indices are reasonable, all paths towards the censored variable and all paths from it are significant.
 Linda K. Muthen posted on Wednesday, June 29, 2005 - 7:18 am
Are you using Version 3.12? If so, please send the output, data, and your license number to support@statmodel.com.
 Anonymous posted on Wednesday, July 20, 2005 - 6:46 am
Dear Dr.,

What does it mean when Mplus returns "*******" for Residual Variances between dependent variables of a censored system?

Thanks in advance
 bmuthen posted on Wednesday, July 20, 2005 - 7:20 am
This means that the values are huge - too large to fit in the output space. Either your variable is on a very large scale - which can be adjusted in Define - or the model fitting is not going well.
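As a general illustration of how the scale of a variable affects printed estimates (not a reproduction of the DEFINE command), the sketch below fits a simple regression before and after dividing the outcome by a constant. The data values and the helper `ols_fit` are made up for illustration, and the residual variance uses the ML-style divisor n, which is an assumption.

```python
def ols_fit(x, y):
    """Return (slope, residual variance) of a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid_var = sum(
        (yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)
    ) / n
    return slope, resid_var


x = [1.0, 2.0, 3.0, 4.0]
y = [1200.0, 1900.0, 3100.0, 3800.0]      # outcome on a large scale
y_rescaled = [v / 1000.0 for v in y]      # same outcome divided by 1000

slope_raw, var_raw = ols_fit(x, y)
slope_rescaled, var_rescaled = ols_fit(x, y_rescaled)
```

Dividing the outcome by 1000 divides the slope by 1000 and the residual variance by 1000 squared, so unstandardized estimates are expected to change with the scale while the substantive result does not.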
 Anonymous posted on Wednesday, July 20, 2005 - 8:54 am
Thanks very much Dr.!

It turned out they were very large numbers; they were printed properly once I used DEFINE. But I am now worried about the meaning and significance of the coefficient estimates, because they differ from the ones I get when I don't use DEFINE. How can I deal with that?
Thank you
 bmuthen posted on Wednesday, July 20, 2005 - 9:27 am
This would need to be looked at in more detail for these 2 runs. Please send their input, output, data and your license number to support@statmodel.com
 Chris G Richardson posted on Thursday, March 30, 2006 - 10:08 am
Hi Bengt/Linda,
I have run a series of tobit regressions in Mplus 3 and was a little unsure about the correct interpretation of the coefficients in my model, which has a censored observed variable regressed on several uncensored observed variables.

Question 1: Is the metric of the uncensored latent dependent variable the same as that of the observed dependent variable, which would allow the unstandardized coefficients to be interpreted as in OLS regression?

Question 2: I plan to present standardized coefficients for some predictors (psychological scale scores), but was wondering if (and how) researchers typically interpret standardized coefficients when the predictor is dichotomous and ordered (e.g. high vs. low risk groups) or unordered (e.g. gender)? Thanks in advance for any suggestions.
chris richardson
 Bengt O. Muthen posted on Thursday, March 30, 2006 - 3:20 pm
1. Yes

2. I would not standardize with respect to such categorical predictor variables. You can still standardize with respect to the dependent variable, using only SD(y) in the usual formula:

stand(beta) = beta*SD(x)/SD(y).
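The two standardizations discussed here can be sketched as follows. The function name and dictionary keys are hypothetical; the arithmetic follows the formula above, with the y-only version simply dropping SD(x).

```python
def standardized_coefficients(beta, sd_x, sd_y):
    """Return the coefficient standardized by x and y, and by y only."""
    return {
        # Usual formula: change in y (in SD units) per SD change in x.
        "std_yx": beta * sd_x / sd_y,
        # y-only version: change in y (in SD units) per one-unit change in x,
        # e.g. the difference between the two groups of a 0/1 predictor.
        "std_y": beta / sd_y,
    }


result = standardized_coefficients(beta=2.0, sd_x=3.0, sd_y=4.0)
```

For a dichotomous predictor such as gender, the y-only value reads as the group difference expressed in standard deviations of the dependent variable.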
 Charles Green posted on Wednesday, April 12, 2006 - 9:31 pm
I am running a model which includes three confirmatory factors, all with indicators censored from below (at zero). In several cases these variables appear to be zero-inflated. The proportion of zero values for each variable ranges from about 3% to 68%; a majority of the variables have zero values for about 25% to 30% of their observations. I have three questions:

1) Is there a reasonable rule of thumb or principle for deciding when an inflation term becomes necessary for modeling excess zeros in censored data?

2) In the specification of a confirmatory factor for which censor-inflated variables are indicators, is it necessary to create a second factor on which the inflation terms load, or is it sufficient to simply specify that the variables in the CFA are censor-inflated?

3) Given that, if I recall correctly, the INDIRECT command is not available in the context of ALGORITHM=INTEGRATION (required for censored data), I plan to evaluate mediation using a series of nested model comparisons following the logic of the Baron and Kenny approach. Is the Wald Chi-Square Test of parameter constraints sufficient for doing this, or should I use the difference between the -2 Log Likelihoods? If I use the difference between the -2 Log Likelihoods, how do I incorporate the scaling correction factor (supplied with each -2LL) into this calculation?
 Bengt O. Muthen posted on Thursday, April 13, 2006 - 11:03 am
When you say "zero-inflated" it sounds like you have count outcomes, but since you also talk about censored-inflated variables, this tells me you have continuous outcomes so I will assume this. My preference is to use two-part modeling rather than censored or censored-inflated modeling; it is easier to work with and gives more flexible modeling. See the UG examples for two-part modeling.

1) My rule is 25%

2) No

3) Use the Wald test and you don't have to worry about the scaling correction. I don't know that one procedure is better than the other.
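For readers who do want the loglikelihood-based alternative raised in question 3, the sketch below shows the commonly used scaled difference test computed from the loglikelihoods, scaling correction factors, and parameter counts of a nested model (0) and a comparison model (1). The function and argument names are hypothetical, and the input values in the example are made up.

```python
def scaled_loglik_difference(ll0, c0, p0, ll1, c1, p1):
    """Scaled chi-square difference test from two MLR loglikelihoods.

    ll0, c0, p0: loglikelihood, scaling correction factor, and number of
                 free parameters of the nested (more restricted) model.
    ll1, c1, p1: the same quantities for the comparison model.
    Returns (difference test scaling correction, test statistic, df).
    """
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)
    trd = -2.0 * (ll0 - ll1) / cd
    df = p1 - p0
    return cd, trd, df


cd, trd, df = scaled_loglik_difference(
    ll0=-2606.0, c0=1.450, p0=39,
    ll1=-2583.0, c1=1.546, p1=47,
)
```

The statistic `trd` is referred to a chi-square distribution with `df` degrees of freedom. The Wald test recommended above avoids this extra step.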
 Charles Green posted on Thursday, April 13, 2006 - 8:41 pm
Thank you. To clarify, the data are not all integers, but they cannot take values less than zero. Does use of the semi-continuous approach become problematic when the mean of the continuous component is relatively low? The danger, as I see it, is that the model could then produce nonsensical predicted values (i.e. values less than zero). Is this issue simply viewed as having such minimal impact that it is generally not addressed?

If I do use a semicontinuous representation of the factor indicators, am I correct in assuming that the dichotomous and continuous indicators would then all load on the factor?
 Bengt O. Muthen posted on Friday, April 14, 2006 - 5:33 am
The two-part model does not allow the values to go below zero if zero is specified as the floor.

The strength of the two-part approach is that the dichotomous and the continuous indicators need not load on the same factor. They can if you specify that, so it is testable.
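The data structure behind two-part modeling can be sketched as follows: each semicontinuous variable is split into a binary indicator of being above the floor and a continuous part that is missing at the floor. This is an illustration of the idea only, not Mplus's data handling; the function name is hypothetical, and the log transformation of the continuous part is an assumption made here for the example.

```python
import math


def two_part_split(values, floor=0.0, log_transform=True):
    """Split a semicontinuous variable into binary and continuous parts.

    Returns (binary, continuous), where binary[i] is 1 if values[i] is above
    the floor and 0 otherwise, and continuous[i] is the (optionally logged)
    value when above the floor and None (missing) otherwise.
    """
    binary = []
    continuous = []
    for v in values:
        if v > floor:
            binary.append(1)
            continuous.append(math.log(v) if log_transform else v)
        else:
            binary.append(0)
            continuous.append(None)  # treated as missing in the continuous part
    return binary, continuous


u, y_pos = two_part_split([0.0, 1.0, 2.5, 0.0])
```

Because the continuous part is only defined for values above the floor, the two sets of indicators can be given their own factors or share one, which is what makes the equal-loading question testable.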
 Charles Green posted on Monday, April 17, 2006 - 10:33 am
I've implemented your recommendation regarding the two-part model, which has worked beautifully. Regarding the use of the Wald Chi-Square Test for constraining parameters, I am clearly not understanding something. When I constrain paths that the full model indicates are non-zero, I get a reliable (significant) Wald Chi-Square result. However, when I constrain parameters that the full model suggests are not different from zero, I also get a reliable Wald Chi-Square result. Specifically, I am constraining coefficients that represent the regression of the mediator on the predictor variable. Shouldn't constraining a parameter that is not different from zero result in a non-significant Wald Chi-Square?
 Linda K. Muthen posted on Monday, April 17, 2006 - 11:19 am
You will need to send your inputs, data, outputs, and license number to support@statmodel.com so we can take a closer look at this.
 Carl Maas posted on Tuesday, May 13, 2008 - 12:34 pm
I am having trouble running an SEM that includes variables whose distributions are treated as censored normal as well as a variable that is treated as categorical. Included in the specification of the model is a covariance between a dichotomous variable and a censored-normal measured variable. Is it possible to do this? The error message I am getting reads, "Covariances for categorical, censored, count or nominal variables with other observed variables are not defined. Problem with the statement: DATVIOVS WITH BINGE". When I omit the specification of the covariance, the model seems to run okay. Any suggestions?
 Linda K. Muthen posted on Tuesday, May 13, 2008 - 12:48 pm
Please send your output and license number to support@statmodel.com so I can see the entire picture.
 Wei Chun posted on Friday, September 25, 2009 - 1:24 am
Dear Linda,

I try to run a regression model in which the value of the dependent variable is between 0 and 1. What regression should I run? Is it Tobit regression? Could I run it in Mplus?

Many thanks for your advice.
 Linda K. Muthen posted on Friday, September 25, 2009 - 9:31 am
What exactly is the dependent variable?
 Wei Chun posted on Saturday, September 26, 2009 - 8:25 pm
Thanks Linda.

The dependent variable is productivity (efficiency) change. Actually, it is greater or less than 1, between 0 and 2.

Thank you for your help.
 Linda K. Muthen posted on Monday, September 28, 2009 - 8:41 am
If what you are saying is that you have a variable that ranges between 0 and 2, I would treat it as continuous unless there was a piling up at 0 or 2.
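A quick way to check for piling up at the ends of the range is to compute the share of observations at each limit. The sketch below is a hypothetical helper, and the data values are made up.

```python
def share_at_limits(values, lower, upper):
    """Return the proportions of observations at or beyond each limit."""
    n = len(values)
    at_lower = sum(1 for v in values if v <= lower) / n
    at_upper = sum(1 for v in values if v >= upper) / n
    return at_lower, at_upper


low_share, high_share = share_at_limits([0.0, 0.5, 1.0, 2.0], lower=0.0, upper=2.0)
```

If both shares are small, treating the variable as continuous is consistent with the advice above; a large share at either limit would point toward a censored or two-part specification.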
 Miriam Forbes posted on Thursday, September 27, 2012 - 6:15 pm
Dear Linda and Bengt,

I have been using 'censored from below' to deal with the skew in my variables for LPA. In order to compare these results with SEMs for the same data, I need log-likelihood and information criteria, which aren't given in the SEM output for censored data.

I was wondering what is actually done to the data when a variable is declared as censored from below, and whether it is a process I could apply to the data before running an SEM?

Alternatively, is there an output command I could use to get the information criteria?

Thanks very much,

 Linda K. Muthen posted on Thursday, September 27, 2012 - 6:23 pm
It sounds like you are using WLSMV for the CFA and MLR for the LPA. Loglikelihoods are not available for WLSMV. Try using MLR for your CFA.
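Once both models are estimated by maximum likelihood, the information criteria follow from the loglikelihood, the number of free parameters, and the sample size. The sketch below uses the standard AIC and BIC definitions; the function name and the input values are hypothetical.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC) from a maximized loglikelihood."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return aic, bic


aic, bic = information_criteria(loglik=-100.0, n_params=5, n_obs=100)
```

Such comparisons are only meaningful when the models are fitted to the same outcome variables treated in the same way, for example censored in both the LPA and the CFA.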
 Lars Bocker posted on Wednesday, February 26, 2014 - 8:41 am
Dear Linda and Bengt,

I find a lot of information about censored dependent variables, but I have a question about censored independent variables.

As part of my SEM I am regressing y on predictor X1. However the effect appears nonlinear: i.e. to a certain value of X1 the effect on y is positive, above this value it becomes negative.

I split X1 into two censored variables, X1a (X1a = X1 below the limit; X1a = the limit value above the limit) and X1b (vice versa). Simply regressing y on X1a and X1b simultaneously results in plausible outcomes, but I was wondering whether this procedure is correct, or whether I am overlooking something (e.g. specifying the censored scale of the predictors)?

 Bengt O. Muthen posted on Wednesday, February 26, 2014 - 4:15 pm
How about simply making y a function of x and (x-mean)^2?
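The suggested predictors can be built as in the following sketch, which pairs each x with its mean-centered square. The function name is hypothetical.

```python
def centered_quadratic(x):
    """Return (x, (x - mean)^2) pairs for use as two predictors of y."""
    mean = sum(x) / len(x)
    return [(xi, (xi - mean) ** 2) for xi in x]


terms = centered_quadratic([1.0, 2.0, 3.0])
```

Centering before squaring reduces the correlation between the linear and quadratic terms; a negative coefficient on the squared term corresponds to an effect that rises and then falls.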
 Lars Bocker posted on Thursday, February 27, 2014 - 8:48 pm
Thanks Bengt for your suggestion. We tried this earlier but x-squared was non-significantly positive rather than significantly negative, and we thought maybe we should identify the limit more precisely by the two censored predictors. We also tried an approach with x-interval dummies, which worked, but we prefer a continuous variable approach.
 Bengt O. Muthen posted on Friday, February 28, 2014 - 9:45 am
Sounds like you are doing a piecewise, or spline, approach which seems fine here.
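The piecewise construction described in this exchange can be sketched as follows, where `knot` is a hypothetical name for the limit value at which the effect changes direction.

```python
def piecewise_split(x, knot):
    """Split x into two predictors that change only below or above the knot.

    x_a equals x below the knot and the knot value above it;
    x_b equals the knot value below the knot and x above it.
    """
    x_a = [min(xi, knot) for xi in x]
    x_b = [max(xi, knot) for xi in x]
    return x_a, x_b


x_a, x_b = piecewise_split([1.0, 5.0, 9.0], knot=5.0)
```

When y is regressed on both variables, the coefficient of `x_a` is the slope below the knot and the coefficient of `x_b` is the slope above it, with the two line segments joined at the knot.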
 Leslie Roos posted on Sunday, June 08, 2014 - 11:43 am

I'm running a growth model of suicidality over time, with suicidality measured on a 5-point Likert frequency scale from "Never" to "At least every day".

The scale is not normal (i.e. >80% of participants report "Never" at each time point). I'm trying to determine if the model should be zero-inflated, Poisson, or censored from below.

In running multiple models, Censored seems to have the least problems with non-identification, but non-identification does repeatedly come up, specifically regarding the intercept parameters.

Thank you for your advice!

- L
 Linda K. Muthen posted on Sunday, June 08, 2014 - 1:39 pm
Try the CATEGORICAL option. Categorical data methodology deals with floor and ceiling effects. You might consider collapsing the categories to make it a binary variable if the cell sizes are small.
 Leslie Roos posted on Sunday, June 08, 2014 - 2:26 pm
Hi Linda,

Thanks for this suggestion. I received an error noting that variables can't be both censored and categorical, so I've removed the censored option.

To confirm, with this model: I would no longer be estimating 2 growth models (i.e. i s , ii si) but would instead estimate a model with a categorical dependent variable and assumptions that permit non-normality?

Additionally, if we include time-varying covariates (e.g. substance use) with a similar Likert scale, would you recommend also defining these as categorical?

The cell sizes are sufficient that we'd prefer to not make suicidality binary.

 Bengt O. Muthen posted on Sunday, June 08, 2014 - 4:12 pm
Q1. Right.

Q2. A variable type should not be declared for covariates; they are treated as continuous.
 Linda K. Muthen posted on Sunday, June 08, 2014 - 4:13 pm
Normality is not a concept I would use with categorical variables. Categorical data methodology does not require variables to be free of floor or ceiling effects.

Covariates should not be defined as being categorical. In regression, covariates are treated as continuous.
 Rolf Gjestad posted on Friday, August 19, 2016 - 4:18 am
Dear Muthens

I have a skewed outcome variable (aggression episodes per month in a psychiatric ward; many patients have zero or one episode). Skewness = 4.11. Could censored regression be used in this case? Or should it only be used when smaller values could exist even if not observed (censored)?

Rolf Gjestad
 Bengt O. Muthen posted on Friday, August 19, 2016 - 12:00 pm
Sounds like the outcome is a count variable in which case Poisson or negbin regression can be used.
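A simple descriptive check that can inform the choice between Poisson and negative binomial regression is the ratio of the sample variance to the sample mean, since a Poisson variable has equal mean and variance. The sketch below is a hypothetical helper with made-up counts; it ignores covariates, so it is only a rough first look.

```python
def dispersion_index(counts):
    """Return the sample variance divided by the sample mean of the counts."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


ratio = dispersion_index([0, 0, 0, 4])
```

A ratio well above 1 indicates overdispersion, which the negative binomial model accommodates and the Poisson model does not.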
 David Jendryczko posted on Monday, January 22, 2018 - 4:07 am

I am aware Mplus can handle censored manifest variables, but can it also handle censored latent variables, say for example, in longitudinal confirmatory factor analysis?

 Bengt O. Muthen posted on Monday, January 22, 2018 - 10:49 am
To some extent - see the paper

Wall, M. M., Guo, J., & Amemiya, Y. (2012). Mixture factor analysis for approximating a nonnormally distributed continuous latent factor with continuous and dichotomous observed variables. Multivariate Behavioral Research, 47:2, 276-313.