There has been a discussion of censored variables on Mplus Discussion this week under the heading of Growth Modeling of Longitudinal Data: Growth Models For Censored Data. I think the information there may answer your questions.
Anonymous posted on Tuesday, March 22, 2005 - 2:58 pm
I am trying to solve a system of 5 simultaneous tobit equations and would like to know if Mplus can handle this before I buy it. Many thanks
bmuthen posted on Tuesday, March 22, 2005 - 3:21 pm
Yes, Tobit is available for general SEM, latent variable models in Mplus. Not only by weighted least squares estimation, but also by ML. And for ML not only the classic Tobit (classic censored-normal), but also limit-inflated censored-normal as well as 2-part (semicontinuous) modeling a la Olsen & Schafer (2001) in JASA.
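For readers unfamiliar with the classic Tobit (censored-normal) likelihood mentioned above, here is a minimal sketch in Python. This is an illustration of the general method, not Mplus code: an uncensored observation contributes the normal density, while an observation at the censoring point contributes the normal probability mass below that point.

```python
import math

def tobit_loglik(y, mu, sigma, lower=0.0):
    """Log-likelihood contribution of one observation under a classic
    left-censored (Tobit) normal model with censoring point `lower`.

    Uncensored (y > lower): log of the normal density at y.
    Censored (y == lower):  log of the normal CDF at the censoring point,
                            i.e. the probability of falling at or below it.
    """
    if y > lower:
        z = (y - mu) / sigma
        # log of the normal pdf
        return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z
    # log of the normal cdf evaluated at the censoring point
    zc = (lower - mu) / sigma
    return math.log(0.5 * (1.0 + math.erf(zc / math.sqrt(2.0))))
```

With mu = 0 and sigma = 1, an observation sitting exactly at a censoring point of 0 contributes log(0.5), since half the latent distribution lies below zero.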
Anonymous posted on Tuesday, March 22, 2005 - 3:40 pm
Thanks for your prompt response---if I may ask another question: what is the maximum number of observations I could use in Mplus?
bmuthen posted on Tuesday, March 22, 2005 - 4:05 pm
There is no limit on the maximum number of observations. The practical constraint is computation time, since the time consumed by the algorithms depends on the sample size. But very large sample sizes (e.g., 100K) should be possible in most cases.
Anonymous posted on Monday, April 11, 2005 - 3:54 am
Hi, I ran a censored SEM using WLSMV. I am curious why the vector of unknown thresholds is said to be estimated yet does not appear anywhere in the output. Or is this done internally, so it cannot be observed?
BMuthen posted on Monday, April 11, 2005 - 5:46 am
I'm not sure what you mean. The censoring points are fixed and not estimated. They are taken from the data. Censored variables do not have thresholds. They have means or intercepts depending on the model.
Anonymous posted on Monday, April 11, 2005 - 1:58 pm
Ok, thanks--I guess I overlooked that WLSMV handles censored data as well as categorical data with latent variables. Now I understand that the thresholds refer to the latter. Thanks!
Anonymous posted on Monday, June 27, 2005 - 8:51 am
I'm estimating an SEM involving an observed censored variable that is dependent on a range of predictors (one latent continuous, and several observed categorical and continuous), and is, in turn, set to predict a categorical outcome. I wonder about the interpretation of path coefficients - of paths both "from" and "to" the censored variable.
1) Are the unstandardized "Estimates" interpretable like linear regression coefficients? (For 1 unit of the independent var, the dependent var increases by b units?) How is this adjusted for the censoring?
2) All the "StdYX" coefficients that involve the censored variable are given as "999". Is this because the censored variable does not have a straightforward standard deviation, or does it indicate some flaw in my model? I'm using the WLSMV estimator.
3) Is there a way to compare the magnitude of "STD" coefficients involving a censored outcome or predictor with "STD" coefficients involving uncensored continuous outcomes?
In case that's important: My censored variable is "Age at first pregnancy", ranging from 16 to 33, censored above at age 33 years.
BMuthen posted on Tuesday, June 28, 2005 - 8:02 am
1. When you have a dependent censored variable, the regression refers to a linear regression of an underlying latent uncensored variable on its predictors.
2. 999 probably refers to the fact that you have a negative variance estimate.
3. You can compare the STDYX coefficients for a censored and uncensored outcome.
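To see why a negative variance estimate breaks StdYX, recall the definition: the raw slope rescaled by the ratio of standard deviations of the predictor and the outcome. A hypothetical sketch (the function name is mine, not Mplus output), showing how a negative variance leaves the coefficient undefined:

```python
import math

def stdyx(b, var_x, var_y):
    """StdYX standardization of a linear slope: b * SD(X) / SD(Y).

    A negative variance estimate makes the standard deviation undefined,
    which is one reason software may print a missing-value code
    (e.g. 999) in place of the standardized coefficient.
    """
    if var_x < 0 or var_y < 0:
        return None  # undefined: negative variance estimate
    return b * math.sqrt(var_x) / math.sqrt(var_y)
```

For example, a raw slope of 2 with predictor variance 4 and outcome variance 16 standardizes to 2 * 2 / 4 = 1.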
Thanks for that. Can I come back on question 2? All variance estimates (including that of the censored variable) in my model are positive. Could there be another source of the "999"? Also, the R-square of the censored variable is returned as zero.
Model fit indices are reasonable, all paths towards the censored variable and all paths from it are significant.
Are you using Version 3.12? If so, please send the output, data, and your license number to email@example.com.
Anonymous posted on Wednesday, July 20, 2005 - 6:46 am
What does it mean when Mplus returns "*******" for the residual variances of dependent variables in a censored system?
Thanks in advance
bmuthen posted on Wednesday, July 20, 2005 - 7:20 am
This means that the values are huge - too large to fit in the output space. Either your variable is on a very large scale - which can be adjusted in Define - or the model fitting is not going well.
Anonymous posted on Wednesday, July 20, 2005 - 8:54 am
Thanks very much Dr.!
It turned out they were very large numbers produced when I used DEFINE. But I am now worried about the meaning and significance of the coefficient estimates, because they differ from the ones I get when I don't use DEFINE. How can I deal with that? Thank you
bmuthen posted on Wednesday, July 20, 2005 - 9:27 am
This would need to be looked at in more detail for these 2 runs. Please send their input, output, data and your license number to firstname.lastname@example.org
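As background on the DEFINE rescaling discussed above: in a plain linear regression, dividing a variable by a constant changes the raw slope in a predictable way, while standardized slopes are unaffected. A sketch of that algebra (my own illustration of the general principle, not Mplus-specific behavior):

```python
def rescaled_slope(b, k_y=1.0, k_x=1.0):
    """If y is replaced by y / k_y and x by x / k_x in a linear
    regression, the slope of y on x changes from b to b * k_x / k_y.
    The model fit and standardized slopes are unchanged by such
    rescaling; only the raw metric of the coefficient changes.
    """
    return b * k_x / k_y
```

So if an outcome on a very large scale is divided by 1000 in DEFINE, a raw slope of 3 should become 0.003; differences beyond this predictable rescaling would point to a genuine estimation difference between the two runs.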
Hi Bengt/Linda, I have run a series of tobit regressions in Mplus 3 and was a little unsure about the correct interpretation of the coefficients in my model, which has a censored observed variable regressed on several uncensored observed variables.

Question 1: Is the metric of the uncensored latent dependent variable the same as that of the observed dependent variable, which would allow the unstandardized coefficients to be interpreted as in OLS regression?

Question 2: I plan to present standardized coefficients for some predictors (psychological scale scores), but was wondering whether (and how) researchers typically interpret standardized coefficients when the predictor is dichotomous, whether ordered (e.g., high vs. low risk groups) or unordered (e.g., gender)?

Thanks in advance for any suggestions. Cheers, Chris Richardson
I am running a model which includes three confirmatory factors, all with indicators censored from below (at zero). In several cases these variables appear to be zero-inflated. The proportion of zero values for each variable ranges from about 3% to 68%; a majority of the variables have zero values for about 25% to 30% of their observations. I have three questions:
1) Is there a reasonable rule of thumb or principle for deciding when an inflation term becomes necessary to model excess zeros in censored data?
2) When specifying a confirmatory factor whose indicators are censored-inflated variables, is it necessary to create a second factor on which the inflation terms load, or is it sufficient simply to declare the variables censored-inflated?
3) Given that, if I recall correctly, the INDIRECT command is not available in the context of ALGORITHM=INTEGRATION (required for censored data), I plan to proceed with an evaluation of mediation using a series of nested model comparisons following the logic of the Baron and Kenny approach. Is the Wald chi-square test of parameter constraints sufficient for doing this, or should I use the difference between the -2 log-likelihoods? If I use the difference between the -2 log-likelihoods, how do I incorporate the scaling correction factor (supplied with each -2LL) into the calculation?
When you say "zero-inflated" it sounds like you have count outcomes, but since you also talk about censored-inflated variables, this tells me you have continuous outcomes so I will assume this. My preference is to use two-part modeling rather than censored or censored-inflated modeling; it is easier to work with and gives more flexible modeling. See the UG examples for two-part modeling.
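The two-part (semicontinuous) setup splits each observation into a binary "any vs. none" indicator and a continuous amount that is treated as missing when the binary part is zero. A minimal sketch of that data split (Mplus can do this internally via the DATA TWOPART command; the log transform used here is one common choice, not a requirement):

```python
import math

def two_part_split(y):
    """Split a non-negative semicontinuous outcome into the two parts
    used in two-part modeling (Olsen & Schafer, 2001):

      u: 1 if y > 0, else 0          -- the binary "any vs. none" part
      m: log(y) if y > 0, else None  -- the continuous part, missing
                                        whenever the binary part is 0
    """
    u = 1 if y > 0 else 0
    m = math.log(y) if y > 0 else None
    return u, m
```

In the factor model the two parts are then modeled jointly, with the binary indicators and the continuous indicators each contributing information about the outcome.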
1) My rule of thumb is 25%
3) Use the Wald test and you don't have to worry about the scaling correction. I don't know that one procedure is better than the other.
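For the record, the log-likelihood route the poster asked about uses a scaling-corrected chi-square difference test, the form documented for robust ML estimators. A sketch of that calculation, assuming model 0 is nested in model 1 and each model reports a log-likelihood, a free-parameter count, and a scaling correction factor:

```python
def scaled_chisq_diff(ll0, p0, c0, ll1, p1, c1):
    """Scaling-corrected (Satorra-Bentler style) chi-square difference
    for two nested models estimated with a robust ML estimator.

    Model 0 is the nested (restricted) model, model 1 the comparison
    model. ll*: log-likelihoods; p*: numbers of free parameters;
    c*: scaling correction factors reported with each log-likelihood.
    Returns the corrected difference statistic and its df.
    """
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)  # difference-test scaling factor
    trd = -2.0 * (ll0 - ll1) / cd         # corrected chi-square statistic
    df = p1 - p0                          # degrees of freedom
    return trd, df
```

When both scaling factors equal 1 the formula reduces to the ordinary likelihood-ratio test, e.g. log-likelihoods of -105 and -100 with 5 vs. 7 parameters give a chi-square of 10 on 2 df.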
Thank you. To clarify, the data are not all integers, but they cannot take values less than zero. Does the semicontinuous approach become problematic when the mean of the continuous component is relatively low? The danger, as I see it, is that the model could then produce nonsensical predicted values (i.e., values less than zero). Is this issue simply viewed as having such minimal impact that it is generally not addressed?
If I do use a semicontinuous representation of the factor indicators, am I correct in assuming that the dichotomous and continuous indicators would then all load on the factor?
I've implemented your recommendation regarding the two-part model, which has worked beautifully. Regarding the use of the Wald chi-square test for constraining parameters, I am clearly not understanding something. When I constrain paths that the full model indicates are non-zero, I get a reliable (significant) Wald chi-square result. However, when I constrain parameters that the full model suggests are not different from zero, I also get a reliable Wald chi-square result. Specifically, I am constraining coefficients that represent the regression of the mediator on the predictor variable. Shouldn't constraining a parameter that is not different from zero result in a non-significant Wald chi-square?
You will need to send your inputs, data, outputs, and license number to email@example.com so we can take a closer look at this.
Carl Maas posted on Tuesday, May 13, 2008 - 12:34 pm
I am having trouble running an SEM that includes variables whose distributions are treated as censored normal as well as a variable that is treated as categorical. The model specification includes a covariance between a dichotomous variable and a censored-normal measured variable. Is it possible to do this? The error message I am getting reads: "Covariances for categorical, censored, count or nominal variables with other observed variables are not defined. Problem with the statement: DATVIOVS WITH BINGE". When I omit the covariance, the model seems to run okay. Any suggestions?
I have been using 'censored from below' to deal with the skew in my variables for LPA. To compare these results with SEMs for the same data, I need the log-likelihood and information criteria, which aren't given in the SEM output for censored data.
I was wondering what is actually done to the data when a variable is listed as censored from below, and whether it is a process I could apply to the data before running an SEM?
Alternatively, is there an output command I could use to get the information criteria?