Censored & Nonnormal data
 Anonymous posted on Wednesday, September 12, 2001 - 11:53 am
Using Mplus, what is the best way to deal with Censored & Nonnormal data? Do you recommend any estimation method?

I found from a stat book that WLSMV in Mplus is appropriate for nonnormal data. Is it also good for censored data?
 Linda K. Muthen posted on Thursday, September 13, 2001 - 9:40 am
For continuous censored variables, I would recommend the Mplus estimator MLM. For categorical outcomes with floor or ceiling effects, I would recommend the Mplus estimator WLSMV. You might also consider that the censoring or floor and ceiling effects are the result of a mixture of subpopulations and consider mixture modeling.
 Anonymous posted on Thursday, September 13, 2001 - 10:29 am
Thanks, Linda. Is it possible to use a missing data technique together with MLM or WLSMV?
 Linda K. Muthen posted on Thursday, September 13, 2001 - 11:49 am
No, missing is available only for ML. There's a table on page 38 of the Mplus User's Guide which shows which estimators are available for various situations. You may find this helpful.
 Anonymous posted on Sunday, October 26, 2003 - 10:43 am
Does the newest version of Mplus have bootstrapping capabilities?
 Linda K. Muthen posted on Sunday, October 26, 2003 - 10:57 am
Mplus Version 3 will have bootstrapping of standard errors and confidence intervals.
 Liesbeth posted on Wednesday, September 29, 2004 - 4:00 am
Is there a single technique in MPLUS that can deal with missing values while the assumption of normality is violated?
 Linda K. Muthen posted on Wednesday, September 29, 2004 - 3:48 pm
There is no general theory for non-normality robust MAR missing data handling. See Web Note 2.
 Dustin posted on Thursday, December 30, 2004 - 1:40 pm
I am attempting to perform a four-factor CFA, with each factor consisting of approximately 8 items. I have some missing data, and would also like to compare nested models to compare different factor structures. The problem is that the items are from a measure that uses a Likert rating from 0-2.

Here is the question: Can I treat the items as non-normal continuous rather than ordinal and use MLR in Mplus 3? From what I understand, I cannot do nested model chi-square tests using WLSMV for ordinal variables. Are there any articles on when it is appropriate to treat Likert items as continuous non-normal rather than ordinal?

 Linda K. Muthen posted on Thursday, December 30, 2004 - 1:44 pm
The DIFFTEST option of Version 3 allows chi-square difference testing with WLSMV.
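The two-step DIFFTEST procedure mentioned above can be sketched roughly as follows. This is only an illustration, not input from the thread: the file names (items.dat, deriv.dat), the item names u1-u32, and the restriction tested in step 2 are all hypothetical. The part that matters is the DIFFTEST option in SAVEDATA (first run) and in ANALYSIS (second run).

```mplus
! Run 1: less restrictive (H1) model; save the derivatives
DATA:     FILE = items.dat;          ! hypothetical file
VARIABLE: NAMES = u1-u32;
          CATEGORICAL = u1-u32;
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:    f1 BY u1-u8;
          f2 BY u9-u16;
          f3 BY u17-u24;
          f4 BY u25-u32;
SAVEDATA: DIFFTEST = deriv.dat;      ! hypothetical file name

! Run 2 (a separate input file): nested (H0) model
DATA:     FILE = items.dat;
VARIABLE: NAMES = u1-u32;
          CATEGORICAL = u1-u32;
ANALYSIS: ESTIMATOR = WLSMV;
          DIFFTEST = deriv.dat;      ! file saved in run 1
MODEL:    f1 BY u1-u8;
          f2 BY u9-u16;
          f3 BY u17-u24;
          f4 BY u25-u32;
          f1 WITH f2@0;              ! example restriction only
```

The chi-square difference test is reported in the output of the second run.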
 Dustin posted on Friday, December 31, 2004 - 4:55 am
That is great. Out of curiosity, do you have any opinion on when it is appropriate to treat Likert scale items as continuous non-normal rather than ordinal? There seems to be a dearth of research on the subject, so references in this regard would also be useful.

Thanks again. I look forward to attending some of your courses in the future.
 bmuthen posted on Friday, December 31, 2004 - 5:49 am
See the two Muthen-Kaplan references in the Mplus Reference section under SEM.
 Dustin posted on Friday, December 31, 2004 - 7:33 am
Last question regarding this issue.

The first step of my study involves a CFA with ordinal item indicators (0-2) to test a hypothesized four-factor solution. The second step involves relating these factors to longitudinal data (probably a growth curve model) regarding the development of delinquency over 6 follow-up periods. While WLSMV seems appropriate to determine the factor structure, ML estimation seems more appropriate for handling the missing data present on the outcome variable over time (especially since WLSMV uses pairwise deletion).

Would it be appropriate to first run the CFA with WLSMV and save the factor scores, then run an ML growth model using the factor scores as predictors? Any other suggestions would be extremely helpful.
 bmuthen posted on Friday, December 31, 2004 - 7:47 am
It sounds like you have a factor as one of the time-invariant covariates of a growth process. What you suggest is reasonable.

Here are some thoughts on alternatives. Although doable, ML for categorical outcomes leads to heavy computations here given at least 3 latent variables (1 for the factor and 2 more for a growth intercept and slope). And, as you say, WLSMV would work with pairwise deletion. One question is what predicts the missingness on the growth outcome. Is it predicted by observed covariates, the factor covariate, or by the outcome at time 1? If the former, WLSMV might still be ok since WLSMV allows MAR wrt covariates. If one of the latter, WLSMV is not ok.
 Dustin posted on Friday, December 31, 2004 - 8:03 am
There are actually four factors that are assessed prior to the growth process, not just one. As a result, ML for categorical outcomes will not produce a chi-square statistic for testing nested model fit (Mplus says it is too complex to estimate).

The issue of covariates being associated with missingness in the growth process is an interesting one. I am planning to control for a prior history (lifetime) of delinquency at Time 1, while the intercept and slope of the growth factor will be assessed at times 2-7 (delinquency over the last 6 months). It seems like you are saying that this may help adjust for the fact that missingness may be related to delinquency.

Thanks for your helpful comments. As a loyal user of Mplus, I think this website is great.
 Anonymous posted on Friday, March 18, 2005 - 1:24 pm
When doing Poisson regression models in Mplus, is there a way to correct the standard errors for overdispersion?
 Linda K. Muthen posted on Friday, March 18, 2005 - 3:54 pm
Although we have not studied this yet using a simulation study, we think that the MLR standard errors do this. Do you have a reference for a correction for overdispersion that you are thinking about?
 bmuthen posted on Saturday, March 19, 2005 - 4:57 am
Perhaps you are referring to zero-inflated Poisson (ZIP) modeling when you say "overdispersion". If so, yes, Mplus can do ZIP modeling and therefore gets the correct SEs.
 Anonymous posted on Monday, May 30, 2005 - 1:02 pm
If I have non-normal data but a very large sample size (>9000) am I ok if using MLE?
 bmuthen posted on Monday, May 30, 2005 - 3:39 pm
You don't need a very large sample for MLE to give non-normality robust point estimates (and non-normality robust SEs when using the Mplus MLR estimator). But your title mentions "censored" which implies that you have observed variables with a floor or ceiling effect, in which case a standard linear model is probably not appropriate (and large sample size gives no advantage) - in such situations it is better to switch to a non-linear model, such as a censored-normal model, a zero-inflated model, or a two-part model (see the Mplus Version 3 User's Guide).
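As a rough illustration of the non-linear alternatives mentioned above, a censored-normal regression can be requested through the CENSORED option of the VARIABLE command. This is a minimal sketch under assumptions: the data file and the variable names y, x1, and x2 are hypothetical, and the outcome is assumed to have a floor effect.

```mplus
DATA:     FILE = mydata.dat;       ! hypothetical file
VARIABLE: NAMES = y x1 x2;
          CENSORED = y (b);        ! (b) = censored from below (floor)
                                   ! (a) = censored from above (ceiling)
ANALYSIS: ESTIMATOR = MLR;
MODEL:    y ON x1 x2;
```

For a count outcome with excess zeros, the analogous change would be to replace the CENSORED line with COUNT = y (i); which requests a zero-inflated Poisson model; the inflation part is then referred to as y#1 in the MODEL command.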
 Henri Bonnabau posted on Thursday, November 17, 2005 - 3:41 am
Could you suggest some papers or books to consult on estimation with non-normal data that have high skewness and kurtosis?
Thanks in advance.
 bmuthen posted on Thursday, November 17, 2005 - 5:15 am
Search for articles by Mardia.
 Annonymous posted on Monday, January 30, 2006 - 7:57 am
If I can reduce skewness in my dependent variable from 2.03 to 0.056 without having to remove any outliers, is there any advantage to doing this? In other words, is there a degree of skewness after which WLS is no longer an appropriate estimator?
 Linda K. Muthen posted on Monday, January 30, 2006 - 8:22 am
What is the scale of your dependent variable? And how did you reduce the skewness?
 Annonymous posted on Monday, January 30, 2006 - 8:29 am
It is continuous, and the skewness was reduced by transforming the data in SAS with a macro for the Box-Cox approach to transformation.
 Linda K. Muthen posted on Monday, January 30, 2006 - 9:18 am
I would not recommend WLS, which with continuous outcomes is ADF, unless you have a very small model and a very large sample. I also would not transform to avoid skewness unless there is another reason to do so, for example, a substantive reason. Instead I would use the MLR estimator.
 Matt Diemer posted on Wednesday, February 15, 2006 - 12:06 pm
To add a follow-up question to this thread:

How would you all recommend addressing skewness/kurtosis in a complex sample design data set using categorical indicators/variables? [also some missing data]

My intent was to use WLSMV (because of the categorical indicators) and have reviewed Yu's (2002) dissertation re: some of these issues.

any suggestions/recommendations for references would be much appreciated.

Thank you,
 bmuthen posted on Thursday, February 16, 2006 - 6:13 am
With categorical variables, the skewness/kurtosis is not a problem for model assumptions as it is with continuous outcomes where normality is violated. The only issue is the possibility of zero cells in bivariate tables, which can be problematic in that information on correlations between variables is limited. The additional feature of complex sample design is incorporated in Mplus. I have a 1989 Soc Meth & Research article on skewed binary outcomes that might be relevant - see our web site under References for categorical data.
 Nina Zuna posted on Sunday, August 20, 2006 - 2:38 pm
Dear Drs. Muthén and Muthén,

I was reading an older book chapter entitled SEM with Non-normal variables: Problems and Remedies by West, Finch, & Curran, 1995; the authors noted the use of a CVM estimator by Muthén.
1. Am I correct to assume that CVM at that time only referred to WLS, but now there are several estimators available in Mplus to handle non-normal data (e.g., MLR, MLM, WLS)?
Secondly, obviously extra caution should be extended when using Likert scales with <10 response options, particularly with Multiple Group Measurement Invariance testing (Lubke & Muthén,2004).
2. I am doing a multiple-group invariance test, have missing data, and am including the means in my model. I noticed I couldn't use MLM or MLMV with missing data. Is it OK to use MLR for multiple-group invariance in this situation (my scale is 1-5), or will my means be biased since they are not addressed in the MLR estimator (or are they)? What are my options?
3. Is a Likert scale considered categorical or continuous, non-normal? If one assumes a Likert scale is categorical as opposed to continuous, is there any need to do the test for multivariate normality or is this test reserved for only continuous variables?

Thank you kindly for your recommendations.
 Bengt O. Muthen posted on Monday, August 21, 2006 - 6:37 am
1. CVM referred to categorical variable methodology, which is represented by WLS, WLSM, and WLSMV, where the latter is the current Mplus default. Mplus now also does CVM using ML and MLR.

MLM refers to analysis with continuous outcomes using non-normality robust ML.

2. Multiple-group analysis with categorical outcomes can be done by WLSMV or by ML(R), where the latter needs to use KNOWNCLASS for the multiple groups. If you declare the dependent variables as categorical, the correct model will be used (irrespective of estimator) and therefore no problems with the estimates.

3. Likert scales can be considered categorical, continuous-normal, or continuous non-normal - it is your choice. Only if they are strongly skewed with pronounced floor or ceiling effects would I use CVM. Normality tests are only for continuous outcomes.
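The KNOWNCLASS approach mentioned in point 2 can be sketched as below. This is a hedged illustration rather than input from the thread: the file name, the item names u1-u6, the grouping variable g, and its codes 1 and 2 are all hypothetical. Any cross-group equality constraints needed for invariance testing would still have to be specified in class-specific parts of the MODEL command.

```mplus
DATA:     FILE = likert.dat;             ! hypothetical file
VARIABLE: NAMES = u1-u6 g;
          USEVARIABLES = u1-u6;
          CATEGORICAL = u1-u6;
          CLASSES = cg (2);
          KNOWNCLASS = cg (g = 1 g = 2); ! groups defined by observed g
ANALYSIS: TYPE = MIXTURE;
          ESTIMATOR = MLR;
          ALGORITHM = INTEGRATION;
MODEL:    %OVERALL%
          f BY u1-u6;
```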
 Scott posted on Tuesday, August 21, 2007 - 11:07 am
I am conducting LGMM on data with a cohort-sequential design. I have missing data (not missing at random but due to the design). The DVs are continuous (index of self report delinquency).
1) Given the missingness, is there a way to test for nonnormality, beside the SK tests (outlined in Muthen, 2003)?

2) Instead, are the MLM and MLR options robust enough to handle most deviations from normality?

3) Also, what is the difference between MLM and MLR for the estimator options?

4) Which should I use in my case?

 Linda K. Muthen posted on Thursday, August 23, 2007 - 12:10 pm
1. I don't know of any way. You could run with ML and also with MLM or MLR and see if results differ thereby deducing whether there is non-normality.
2. Yes.
3. MLM is described in Technical Appendix 4. MLR is described in Technical Appendix 8. MLR uses a sandwich estimator.
4. We use MLR as the default. You should get very close results using either.
 Sofia Diamantopoulou posted on Monday, April 27, 2009 - 6:39 am
Dear Drs Muthen,

I am estimating a path model using MLR estimator and I wonder why chi-square is not included in the results for the tests of model fit.

Thank you in advance.
 Linda K. Muthen posted on Monday, April 27, 2009 - 7:45 am
You must be using this with outcomes that are not continuous. In this case, means, variances, and covariances are not sufficient statistics for model estimation and chi-square and related fit statistics are not available.
 Michin Hong posted on Wednesday, December 29, 2010 - 10:01 am
Dear Drs. Muthen,

I am pretty new to Mplus and working on an SEM model using a data set (n=1837).
For some reason, I got the following warning messages for 8 different variables. They are all continuous variables with more than 5 categories and seem to be okay in terms of normality (skewness and kurtosis are all less than 2).

I already have 5 categorical variables out of 13 variables in the model, including the ordinal endogenous variable with 4 categories. So, it seems that the program tries to treat all my variables as categorical.

FYI, I used WLSMV estimator.

Thank you.
 Linda K. Muthen posted on Wednesday, December 29, 2010 - 10:43 am
Please send your input, data, output, and license number to support@statmodel.com. It sounds like you are not reading the data correctly.
 Grant Bickerton posted on Thursday, February 24, 2011 - 1:07 pm
I am new to Mplus and am looking to use the MLM estimator to help manage some kurtosis in some variables. However, I just cannot seem to get it to work. I have put ESTIMATOR = ML; in the analysis command, as well as LISTWISE=ON; in the data command, but I still get the following error, and the analysis proceeds but using the ML estimator.
*** WARNING in ANALYSIS command
Starting with Version 5, TYPE=MISSING is the default for all analyses.
To obtain listwise deletion, use LISTWISE=ON in the DATA command.

I am sorry for such a basic question, but is there something very obvious that I am not including?

Many thanks.
 Linda K. Muthen posted on Thursday, February 24, 2011 - 2:44 pm
The warning is just to inform you of a change starting with Version 5. The MLM estimator can be used only with complete data. Just put ESTIMATOR=MLM; in the ANALYSIS command to get MLM. You won't get that with ML. The MLR estimator is also robust to non-normality and can be used with incomplete data.
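The two options described above might look like this in an input file. This is a minimal sketch in which the data file name is hypothetical and the VARIABLE and MODEL commands are left out; only the DATA and ANALYSIS settings are the point.

```mplus
! Option 1: MLM (complete data only, so listwise deletion is requested)
DATA:     FILE = mydata.dat;     ! hypothetical file
          LISTWISE = ON;
ANALYSIS: ESTIMATOR = MLM;

! Option 2 (a separate run): MLR, which can be used with incomplete data
DATA:     FILE = mydata.dat;
ANALYSIS: ESTIMATOR = MLR;
```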
 anja koen posted on Tuesday, March 08, 2011 - 3:58 pm
Is the chi-square calculated by the MLR estimator a Satorra-Bentler scaled chi-square? Thanks!
 Bengt O. Muthen posted on Tuesday, March 08, 2011 - 6:36 pm
No. Satorra-Bentler is the MLM chi-2. The MLR chi-2 is asymptotically the same as the Yuan-Bentler (2000) T2* version (see V6 UG, page 533).
 Cameron Hopkin posted on Monday, October 24, 2011 - 12:13 pm
I'm a relative novice in regards to both Mplus and SEM. I've got a skewed count variable -- drug use in the last 30 days -- as my outcome variable, and would like to use zero-inflated Poisson modeling to account for that non-normality, but I've got an interaction variable that is important to my analysis. As I understand it, ZIP modeling introduces a "structural zero" parameter -- how would that parameter function alongside interaction effects? Would I need to include it as a possible interactor as well? I'm not sure if I'm even thinking about this in the right way. Any pointers you could give would be appreciated.
 Bengt O. Muthen posted on Monday, October 24, 2011 - 6:25 pm
You can include the interaction, and other covariates, also in the prediction of the binary part of the ZIP (zero-inflated Poisson), that is, the prediction of being at zero or not.

Another approach is to specify the outcome as negative binomial where the inflation part is "built in" and doesn't need referring to.
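A negative binomial specification of the kind suggested above could be sketched as follows. The variable names (u, x1, x2) and the observed product term are hypothetical and only stand in for the poster's outcome and predictors.

```mplus
DATA:     FILE = druguse.dat;        ! hypothetical file
VARIABLE: NAMES = u x1 x2;
          USEVARIABLES = u x1 x2 x1x2;
          COUNT = u (nb);            ! (nb) = negative binomial
                                     ! (i)  = zero-inflated Poisson
DEFINE:   x1x2 = x1*x2;              ! observed interaction term
MODEL:    u ON x1 x2 x1x2;
```

With (nb) there is no separate inflation part, so no u#1 statements are needed.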
 Cameron Hopkin posted on Saturday, October 29, 2011 - 7:00 am
Thank you. Speaking of the same model, what is the best way to deal with centering the indicators for the latent interaction variable? My advisor is under the impression that Mplus has some centering scheme by default for interactions, but all my data manipulation, including the construction of the AxB indicators, was done in a different program.

If, as I suspect, there is no default centering, would you recommend centering all indicators and including a mean structure, or using residual-centering with no mean structure as advocated in (Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions among latent variables. Structural Equation Modeling, 13, 497–519)? Does the fact that this is a ZIP model have any bearing on that decision?

Thank you for the help!
 Bengt O. Muthen posted on Saturday, October 29, 2011 - 8:33 am
It sounds like you are interested in a latent variable interaction as a predictor. Note that Mplus offers the XWITH approach to that so that it is not necessary to use products of factor indicators. The latent variables entering the XWITH interaction are in typical models centered, that is, have zero means.
 Cameron Hopkin posted on Saturday, October 29, 2011 - 12:13 pm
In attempting to use the XWITH option I seem to have run astray, and am getting a "fatal error: reciprocal interaction problem" message. I think I've reproduced the syntax from the manual's example (5.13) as it applies to my situation (as described directly above), but perhaps I'm wrong. The model follows:

names = .....
usevar = ual30 Imp1 Imp2 Imp3 SS1 SS2 SS3;
missing = blank;
count = ual30 (i);

type = random;

Imp by Imp1 Imp2 Imp3;
SS by SS1 SS2 SS3;
ual30 on Imp SS;
ual30 on IMPxSS;

ual30#1 on Imp SS;
ual30#1 on IMPxSS;

Output: TECH1, TECH8;
 Cameron Hopkin posted on Monday, October 31, 2011 - 7:58 am
Never mind. It was a missing semi-colon on the XWITH statement. Pshh. Don't I feel silly.
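For readers following along, a complete version of the input above with the XWITH statement included might look like the sketch below. Several parts are assumptions: the NAMES list is elided in the post and is guessed here, the data file name is hypothetical, the MISSING option is left out because it depends on the data format, and the command headers (VARIABLE, ANALYSIS, MODEL) are added because they do not appear in the posted fragment.

```mplus
DATA:     FILE = druguse.dat;               ! hypothetical file
VARIABLE: NAMES = ual30 Imp1 Imp2 Imp3 SS1 SS2 SS3;   ! assumed
          USEVARIABLES = ual30 Imp1 Imp2 Imp3 SS1 SS2 SS3;
          COUNT = ual30 (i);                ! zero-inflated Poisson
ANALYSIS: TYPE = RANDOM;
          ALGORITHM = INTEGRATION;
MODEL:    Imp BY Imp1 Imp2 Imp3;
          SS BY SS1 SS2 SS3;
          IMPxSS | Imp XWITH SS;            ! latent interaction
          ual30 ON Imp SS IMPxSS;           ! count part
          ual30#1 ON Imp SS IMPxSS;         ! inflation (zero) part
OUTPUT:   TECH1 TECH8;
```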
 Tracey LaPierre posted on Sunday, November 06, 2011 - 3:08 pm
Dear Drs Muthen,
I am running a path analysis with depressive symptoms as my dependent variable, modeled as a continuous latent variable with 12 observed ordinal indicators that are a count of the number of days in the past week each symptom was experienced. I also have three latent variable mediators with categorical indicators, a categorical independent variable and a number of control variables. I am using WLSMV as my estimator, theta parameterization and calculating bcboot confidence intervals. I have good model fit (measurement & structural). I interpreted the coefficients for the continuous latent dependent variable as regular OLS coefficients. I am trying to respond to a reviewer's concern about the non-normality of my dependent variable (and some of the latent mediators with the same issue). In previous work using Stata I have created a single indicator by summing the 12 indicators and logged it to reduce nonnormality, then tested for violations of the normality assumption. 1) Is there a corresponding assumption of normality for continuous latent variables in SEM? 2) How does one test if this assumption is violated and what are the consequences? 3) Can you transform a latent variable to normalize it? 4) Treating my indicators as categorical produces better model fit than treating them as continuous. Should I be logging them and treating them as continuous instead, even if it results in worse model fit?
 Bengt O. Muthen posted on Sunday, November 06, 2011 - 5:58 pm
The fact that your ordinal indicators have skewed distributions does not mean that the factor behind them has a non-normal distribution. You can have a normal factor influencing ordinal indicators and the reason they are skewed is that you are recording a rare event. The strong non-normality of the indicators is really only a potential problem if you treat them as continuous instead of categorical (or counts). (1) Normality is typically assumed for latent variables in SEM. (2) It is hard to test if this holds. (3) I would not transform the indicators. If they are counts you can also treat them as counts instead of categorical.
 Dennis Föste posted on Wednesday, June 19, 2013 - 6:12 am
Dear Drs. Muthén and Muthén,

is there any equivalent to the Stata SSC censornb (Hilbe, 2005) in MPLUS for a survival parameterization of censored negative binomial regression? The survival parameterization of censoring allows censoring to take place anywhere in the data, not only at cut points (Hilbe 2012: 406).

Thank you in advance!
 Bengt O. Muthen posted on Wednesday, June 19, 2013 - 11:57 am
We have truncated and hurdle negbin, but not survival-parameterized censored negbin.
 Eveline Hoeben posted on Tuesday, July 30, 2013 - 6:59 am
Dear Drs. Muthén and Muthén,

I want to estimate a mediation model with a count dependent variable (negative binomial).

The paper ‘Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.’ was very helpful and I was able to run the example input files in tables 52-54. However, if I apply this syntax to my own data I get the following error message: 'Unknown group name 1 specified in group-specific MODEL command.’
The message shows for count = y(p) as well as for count = y(nb).

Below is (part of) my input file. Can you tell me what I am doing wrong? I use Mplus version 7.

Many thanks,



y ON x*.116(beta2);
beta1 | y ON m1;
beta1 ON x*-.009(beta3);
m1 ON x*.182(gamma1);
 Linda K. Muthen posted on Tuesday, July 30, 2013 - 7:24 am
Please send the output and your license number to support@statmodel.com.
 Matthew Clement posted on Friday, September 27, 2013 - 6:35 am
I'm hoping to use Poisson with a log link and robust standard errors to model a non-negative, positively skewed, continuous dependent variable. Is this possible in Mplus? If so, how is it done? I can't find any information on the discussion board about this.
 Linda K. Muthen posted on Friday, September 27, 2013 - 6:50 am
This is available in Mplus using the COUNT option. This is appropriate for count variables, not for continuous variables.
 Matthew Clement posted on Friday, September 27, 2013 - 7:38 am
Thanks for the response. But why can't Poisson be employed to model continuous variables in Mplus? There is a use for it, and several other statistical software packages do it: http://www.stata.com/meeting/boston10/boston10_nichols.pdf

If you have any solution, it would be greatly appreciated. I'm a grad student, and I've turned to Mplus because of its superior SEM modeling capabilities.
 Tihomir Asparouhov posted on Friday, September 27, 2013 - 10:33 am
You can use the following example as a building block to accommodate GLM with log link.

data: file is 1.dat;
variable: names=x y; constrain=x; usevar=y;
model: [y] (mean);
model constraint: new(b); mean=exp(b*x);

Example 5.23 in the user's guide also illustrates some of these features but that example is unrelated to GLM.
 Matthew Clement posted on Friday, September 27, 2013 - 1:24 pm
Thanks for the response, Tihomir. I've never tried to incorporate a model constraint in Mplus before; I don't know what you're doing. What's "x"? Etc. Could you elaborate or point me to another source that talks about how to estimate generalized linear models in Mplus? Thanks, Matt
 Tihomir Asparouhov posted on Friday, September 27, 2013 - 2:39 pm
X is the predictor variable that comes from the data. In the above model Log(E(Y|X))=b*X which is the GLM model with log link function, i.e,
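For reference, the log-link relationship stated in this reply can be written out as below. The symbols follow the post (b for the slope, Y for the outcome, X for the predictor); the right-hand form is just the left-hand form exponentiated, and it matches the constraint mean=exp(b*x) in the earlier input.

```latex
\[
\log E(Y \mid X) = bX
\quad\Longleftrightarrow\quad
E(Y \mid X) = \exp(bX)
\]
```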
 Matthew Clement posted on Sunday, September 29, 2013 - 7:35 am
Thanks again for your help.

I have the following CONTINUOUS variables:

y1 (poisson, with log link)
y2 (gaussian, with identity link)

Ultimately, I would like to estimate the following nonrecursive model with robust standard errors:

y2 = y1 + x1 + x2
y1 = y2 + x1

How is that done in Mplus? (I don't see how the poisson family of GLMs is specified in your above example.)
 Tihomir Asparouhov posted on Monday, September 30, 2013 - 11:27 am
This should work

names=y1 y2 x1 x2;
usevar=y1 x2 y2d x1d;
define: y2d=y2; x1d=x1;
constrain=y2 x1;
model: [y1] (mean); y2d on y1 x1d x2;
model constraint: new(a b1 b2); mean=exp(a+b1*y2+b2*x1);

I am doubling the variables that need to be in the constraint and the model statement with the input but you can alternatively do that outside of Mplus. Also I forgot the intercept in my previous post.
 Bengt O. Muthen posted on Monday, September 30, 2013 - 12:38 pm
A FAQ on this has now been posted:

GLM with log link and Poisson regression for continuous variables
 Matthew Clement posted on Monday, September 30, 2013 - 4:49 pm

As suggested, I ran the following:

names=y1 y2 x1 x2;
usevar=y1 x2 y2d x1d;
define: y2d=y2; x1d=x1;
constrain=y2 x1;
model: [y1] (mean); y2d on y1 x1d x2;
model constraint: new(a b1 b2); mean=exp(a+b1*y2+b2*x1);

But now I get the error:

An internal error has occurred. This may be caused by an error in the
DEFINE command or in the USEOBSERVATIONS option of the VARIABLE command.
Check these statements in your input.
 Tihomir Asparouhov posted on Tuesday, October 01, 2013 - 9:50 am
Oops, sorry - switch rows 4 and 5
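One plausible reading of this fix is sketched below. It assumes the row count starts from a DATA line like the one in the first example (file 1.dat), so that "rows 4 and 5" are the DEFINE and CONSTRAINT lines; after the switch, CONSTRAINT falls inside the VARIABLE command, ahead of DEFINE. The command headers are added for readability and are not part of the original post, so treat this as an interpretation rather than the author's exact input.

```mplus
DATA:     FILE = 1.dat;              ! file name from the first example
VARIABLE: NAMES = y1 y2 x1 x2;
          USEVARIABLES = y1 x2 y2d x1d;
          CONSTRAINT = y2 x1;        ! now before DEFINE
DEFINE:   y2d = y2; x1d = x1;        ! copies used in the MODEL command
MODEL:    [y1] (mean);
          y2d ON y1 x1d x2;
MODEL CONSTRAINT:
          NEW(a b1 b2);
          mean = exp(a + b1*y2 + b2*x1);
```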
 Matthew Clement posted on Wednesday, October 02, 2013 - 5:43 am

One last question...

If I wanted to add a nonrecursive relationship to this model, what would the model and constraint statements look like? Here are the variables again:

y1 (poisson, with log link)
y2 (gaussian, with identity link)

The model looks like this:
y2 = x1 + x2
y1 = x1
y2 <-> y1
 Tihomir Asparouhov posted on Wednesday, October 02, 2013 - 8:35 am
names=y1 y2 x1 x2;
usevar=y1 y2 x2 x1d;
constrain=x1;
define: x1d=x1;
model: [y1] (mean); y2 on x1d x2; y1 with y2;
model constraint: new(a b1); mean=exp(a+b1*x1);
 Yvonne LEE posted on Thursday, April 17, 2014 - 7:05 pm
I am a novice to SEM using Mplus. I am constructing a model of rape tendency with two indicators. One is a count of official rape offenses in adulthood, which is ordinal (4 levels). The other is a count of self-reported rape behavior. My sample contains 177 offenders, of whom 36 are rapists, so both indicators have many zero counts, giving a U-shaped distribution for the two indicators. Is any special caution needed in modeling? I treated the two indicators as categorical and ran the SEM, which gave an error message: residual covariance matrix is not positive. How can I fix it?
 Bengt O. Muthen posted on Thursday, April 17, 2014 - 8:27 pm
Your model needs to allow for parameter differences among the offenders (rapists/non-rapists).

If you can't understand the error message, send output and license number to support.