
Anonymous posted on Wednesday, September 12, 2001  11:53 am



Using Mplus, what is the best way to deal with Censored & Nonnormal data? Do you recommend any estimation method? I found from a stat book that WLSMV in Mplus is appropriate for nonnormal data. Is it also good for censored data? 


For continuous censored variables, I would recommend the Mplus estimator MLM. For categorical outcomes with floor or ceiling effects, I would recommend the Mplus estimator WLSMV. You might also consider that the censoring or the floor and ceiling effects are the result of a mixture of subpopulations, and consider mixture modeling.

Anonymous posted on Thursday, September 13, 2001  10:29 am



Thanks, Linda. Is it possible to use a missing data technique together with MLM or WLSMV?


No, missing is available only for ML. There's a table on page 38 of the Mplus User's Guide which shows which estimators are available for various situations. You may find this helpful. 

Anonymous posted on Sunday, October 26, 2003  10:43 am



Does the newest version of Mplus have bootstrapping capabilities? 


Mplus Version 3 will have bootstrapping of standard errors and confidence intervals. 
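
For readers on Version 3 or later, the following is a minimal sketch of how bootstrapped standard errors and confidence intervals are typically requested. The file name, variable names, and number of draws are placeholders that do not come from this thread, and the parenthesized CINTERVAL setting follows later editions of the User's Guide, so check the guide for your version.

```mplus
DATA:      FILE IS mydata.dat;        ! hypothetical file name
VARIABLE:  NAMES ARE y x1 x2;         ! hypothetical variables
ANALYSIS:  BOOTSTRAP = 1000;          ! number of bootstrap draws (placeholder)
MODEL:     y ON x1 x2;
OUTPUT:    CINTERVAL (BOOTSTRAP);     ! bootstrap confidence intervals
```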

Liesbeth posted on Wednesday, September 29, 2004  4:00 am



Is there a single technique in MPLUS that can deal with missing values while the assumption of normality is violated? 


There is no general theory for nonnormality-robust MAR missing data handling. See Web Note 2.

Dustin posted on Thursday, December 30, 2004  1:40 pm



I am attempting to perform a four-factor CFA, with each factor consisting of approximately 8 items. I have some missing data, and would also like to compare nested models to compare different factor structures. The problem is that the items are from a measure that uses a Likert rating from 0-2. Here is the question: Can I treat the items as nonnormal continuous rather than ordinal and use MLR in Mplus 3? From what I understand, I cannot do nested model chi-square tests using WLSMV for ordinal variables. Are there any articles on when it is appropriate to treat Likert items as continuous nonnormal rather than ordinal? Thanks!


The DIFFTEST option of Version 3 allows chi-square difference testing with WLSMV.
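
As a rough illustration of the two-step DIFFTEST procedure, here is a sketch assuming 32 ordinal items named u1-u32, a data file mydata.dat, and a derivatives file deriv.dat (all hypothetical names). The less restrictive model is run first and saves its derivatives; the nested model is then run in a separate input file that reads them back.

```mplus
! Step 1 (first input file): less restrictive H1 model
DATA:      FILE IS mydata.dat;
VARIABLE:  NAMES ARE u1-u32;
           CATEGORICAL ARE u1-u32;
ANALYSIS:  ESTIMATOR = WLSMV;
MODEL:     f1 BY u1-u8;
           f2 BY u9-u16;
           f3 BY u17-u24;
           f4 BY u25-u32;
SAVEDATA:  DIFFTEST IS deriv.dat;     ! save derivatives for step 2

! Step 2 (second input file): more restrictive, nested H0 model
DATA:      FILE IS mydata.dat;
VARIABLE:  NAMES ARE u1-u32;
           CATEGORICAL ARE u1-u32;
ANALYSIS:  ESTIMATOR = WLSMV;
           DIFFTEST IS deriv.dat;     ! read derivatives from step 1
MODEL:     f1 BY u1-u8;
           f2 BY u9-u16;
           f3 BY u17-u24;
           f4 BY u25-u32;
           f1 WITH f2@0;              ! example restriction (illustrative only)
```

The chi-square difference test is reported in the output of the second run. The restriction shown is only an example of a nested constraint, not a recommendation for this particular measure.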

Dustin posted on Friday, December 31, 2004  4:55 am



That is great. Out of curiosity, do you have any opinion on when it is appropriate to treat Likert scale items as continuous nonnormal rather than ordinal? There seems to be a dearth of research on the subject, so references in this regard would also be useful. Thanks again. I look forward to attending some of your courses in the future.

bmuthen posted on Friday, December 31, 2004  5:49 am



See the two Muthén-Kaplan references in the Mplus Reference section under SEM.

Dustin posted on Friday, December 31, 2004  7:33 am



Last question regarding this issue. The first step of my study involves a CFA with ordinal item indicators (0-2) to test a hypothesized four-factor solution. The second step involves relating these factors to longitudinal data (probably a growth curve model) on the development of delinquency over 6 follow-up periods. While WLSMV seems appropriate for determining the factor structure, ML estimation seems more appropriate for handling the missing data present on the outcome variable over time (especially since WLSMV uses pairwise deletion). Would it be appropriate to first run the CFA with WLSMV and save the factor scores, and then run an ML growth model using the factor scores as predictors? Any other suggestions would be extremely helpful.

bmuthen posted on Friday, December 31, 2004  7:47 am



It sounds like you have a factor as one of the time-invariant covariates of a growth process. What you suggest is reasonable. Here are some thoughts on alternatives. Although doable, ML for categorical outcomes leads to heavy computations here given at least 3 latent variables (1 for the factor and 2 more for a growth intercept and slope). And, as you say, WLSMV would work with pairwise deletion. One question is what predicts the missingness on the growth outcome. Is it predicted by observed covariates, the factor covariate, or by the outcome at time 1? If the former, WLSMV might still be ok since WLSMV allows MAR with respect to covariates. If one of the latter, WLSMV is not ok.

Dustin posted on Friday, December 31, 2004  8:03 am



There are actually four factors that are assessed prior to the growth process, not just one. As a result, ML for categorical outcomes will not produce a chi-square statistic for testing nested model fit (Mplus says it is too complex to estimate). The issue of covariates being associated with missingness in the growth process is an interesting one. I am planning to control for a prior (lifetime) history of delinquency at Time 1, while the intercept and slope of the growth factor will be assessed at times 2-7 (delinquency over the last 6 months). It seems like you are saying that this may help adjust for the fact that missingness may be related to delinquency. Thanks for your helpful comments. As a loyal user of Mplus, I think this website is great.

Anonymous posted on Friday, March 18, 2005  1:24 pm



When doing Poisson regression models in Mplus, is there a way to correct the standard errors for overdispersion?


Although we have not studied this yet using a simulation study, we think that the MLR standard errors do this. Do you have a reference for a correction for overdispersion that you are thinking about?

bmuthen posted on Saturday, March 19, 2005  4:57 am



Perhaps you are referring to zero-inflated Poisson (ZIP) modeling when you say "overdispersion". If so, yes, Mplus can do ZIP modeling and therefore gets the correct SEs.

Anonymous posted on Monday, May 30, 2005  1:02 pm



If I have nonnormal data but a very large sample size (>9000) am I ok if using MLE? 

bmuthen posted on Monday, May 30, 2005  3:39 pm



You don't need a very large sample for MLE to give nonnormality-robust point estimates (and nonnormality-robust SEs when using the Mplus MLR estimator). But your title mentions "censored", which implies that you have observed variables with a floor or ceiling effect, in which case a standard linear model is probably not appropriate (and large sample size gives no advantage). In such situations it is better to switch to a nonlinear model, such as a censored-normal model, a zero-inflated model, or a two-part model (see the Mplus Version 3 User's Guide).
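
A minimal sketch of the censored-normal option mentioned above, assuming a single outcome y with a floor effect and two covariates (the file and variable names are hypothetical):

```mplus
DATA:      FILE IS mydata.dat;        ! hypothetical file name
VARIABLE:  NAMES ARE y x1 x2;         ! hypothetical variables
           CENSORED ARE y (b);        ! (b) = censored from below; (a) = from above
ANALYSIS:  ESTIMATOR = MLR;
MODEL:     y ON x1 x2;                ! censored-normal regression
```

The CENSORED option also accepts inflated settings such as (bi) for a censored-inflated model; the two-part approach is set up through separate data transformation options described in the User's Guide.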


Hello, could you suggest some papers or books to consult on estimation with nonnormal data that have high skewness and kurtosis? Thanks in advance.

bmuthen posted on Thursday, November 17, 2005  5:15 am



Search for articles by Mardia. 

Annonymous posted on Monday, January 30, 2006  7:57 am



If I can reduce skewness in my dependent variable from 2.03 to 0.056 without having to remove any outliers, is there any advantage to doing this? In other words, is there a degree of skewness after which WLS is no longer an appropriate estimator? 


What is the scale of your dependent variable? And how did you reduce the skewness? 

Annonymous posted on Monday, January 30, 2006  8:29 am



It is continuous, and the skewness was reduced by transforming the data in SAS with a macro for the Box-Cox approach to transformation.


I would not recommend WLS, which with continuous outcomes is ADF, unless you have a very small model and a very large sample. I also would not transform to avoid skewness unless there is another reason to do so, for example, a substantive reason. Instead I would use the MLR estimator.

Matt Diemer posted on Wednesday, February 15, 2006  12:06 pm



To add a follow-up question to this thread: How would you all recommend addressing skewness/kurtosis in a complex sample design data set using categorical indicators/variables? [also some missing data] My intent was to use WLSMV (because of the categorical indicators) and I have reviewed Yu's (2002) dissertation regarding some of these issues. Any suggestions/recommendations for references would be much appreciated. Thank you,

bmuthen posted on Thursday, February 16, 2006  6:13 am



With categorical variables, the skewness/kurtosis is not a problem for model assumptions as it is with continuous outcomes where normality is violated. The only issue is the possibility of zero cells in bivariate tables, which can be problematic in that information on correlations between variables is limited. The additional feature of complex sample design is incorporated in Mplus. I have a 1989 Soc Meth & Research article on skewed binary outcomes that might be relevant; see our web site under References for categorical data.

Nina Zuna posted on Sunday, August 20, 2006  2:38 pm



Dear Drs. Muthén and Muthén, I was reading an older book chapter entitled "SEM with Nonnormal Variables: Problems and Remedies" by West, Finch, & Curran (1995); the authors noted the use of a CVM estimator by Muthén.

1. Am I correct to assume that CVM at that time referred only to WLS, but now there are several estimators available in Mplus to handle nonnormal data (e.g., MLR, MLM, WLS)? Secondly, extra caution is obviously needed when using Likert scales with <10 response options, particularly with multiple-group measurement invariance testing (Lubke & Muthén, 2004).

2. I am doing a multiple-group invariance test, have missing data, and am including the means in my model. I noticed I couldn't use MLM or MLMV with missing data. Is it OK to use MLR for multiple-group invariance in this situation (my scale is 1-5), or will my means be biased since they are not addressed in the MLR estimator (or are they)? What are my options?

3. Is a Likert scale considered categorical or continuous, nonnormal? If one assumes a Likert scale is categorical as opposed to continuous, is there any need to do the test for multivariate normality, or is this test reserved for continuous variables only?

Thank you kindly for your recommendations.


1. CVM referred to categorical variable methodology, which is represented by WLS, WLSM, and WLSMV, where the last is the current Mplus default. Mplus now also does CVM using ML and MLR. MLM refers to analysis with continuous outcomes using nonnormality-robust ML.

2. Multiple-group analysis with categorical outcomes can be done by WLSMV or by ML(R), where the latter needs to use KNOWNCLASS for the multiple groups. If you declare the dependent variables as categorical, the correct model will be used (irrespective of estimator) and therefore there are no problems with the estimates.

3. Likert scales can be considered categorical, continuous-normal, or continuous-nonnormal; it is your choice. Only if they are strongly skewed with pronounced floor or ceiling effects would I use CVM. Normality tests are only for continuous outcomes.
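
A sketch of the WLSMV route in point 2, assuming six ordinal items u1-u6 and a grouping variable g coded 1 and 2 (all names and labels hypothetical). The ML(R) route with KNOWNCLASS uses a mixture setup instead and is not shown here.

```mplus
DATA:      FILE IS mydata.dat;             ! hypothetical file name
VARIABLE:  NAMES ARE u1-u6 g;              ! hypothetical variables
           CATEGORICAL ARE u1-u6;          ! declare Likert items as ordinal
           GROUPING IS g (1 = grp1 2 = grp2);
ANALYSIS:  ESTIMATOR = WLSMV;
MODEL:     f BY u1-u6;                     ! same factor model in both groups
```

With this specification Mplus applies its default across-group equality constraints for the measurement parameters; group-specific MODEL commands can be added to relax them when testing partial invariance.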

Scott posted on Tuesday, August 21, 2007  11:07 am



I am conducting LGMM on data with a cohort-sequential design. I have missing data (not missing at random but due to the design). The DVs are continuous (index of self-reported delinquency).

1) Given the missingness, is there a way to test for nonnormality, besides the SK tests (outlined in Muthen, 2003)?
2) Instead, are the MLM and MLR options robust enough to handle most deviations from normality?
3) Also, what is the difference between MLM and MLR for the estimator options?
4) Which should I use in my case?

Thanks.


1. I don't know of any way. You could run with ML and also with MLM or MLR and see if results differ, thereby deducing whether there is nonnormality.
2. Yes.
3. MLM is described in Technical Appendix 4. MLR is described in Technical Appendix 8. MLR uses a sandwich estimator.
4. We use MLR as the default. You should get very close results using either.


Dear Drs Muthen, I am estimating a path model using the MLR estimator and I wonder why chi-square is not included in the results for the tests of model fit. Thank you in advance.


You must be using this with outcomes that are not continuous. In this case, means, variances, and covariances are not sufficient statistics for model estimation, and chi-square and related fit statistics are not available.

Michin Hong posted on Wednesday, December 29, 2010  10:01 am



Dear Drs. Muthen, I am pretty new to Mplus and working on a SEM model using a data set (n=1837). For some reason, I got the following warning messages for 8 different variables. They are all continuous variables with more than 5 categories and seem to be okay in terms of normality (skewness and kurtosis are all less than 2). I already have 5 categorical variables out of 13 variables in the model, including the ordinal endogenous variable with 4 categories. So, it seems that the program tries to treat all my variables as categorical. FYI, I used the WLSMV estimator.

***********
WARNING: VARIABLE CG_TIME MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE STRESS MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE IADL MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE MASTERY MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE SS MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE SVCUSE MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE SATIS MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE BURDEN MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
******

Thank you.


Please send your input, data, output, and license number to support@statmodel.com. It sounds like you are not reading the data correctly.


I am new to Mplus, and looking to use the MLM estimator to help manage some kurtosis in some variables. However, I just cannot seem to get it to work. I have put ESTIMATOR = ML; in the ANALYSIS command, as well as LISTWISE=ON; in the DATA command, but I still get the following warning, and the analysis proceeds but uses the ML estimator.

*** WARNING in ANALYSIS command
Starting with Version 5, TYPE=MISSING is the default for all analyses. To obtain listwise deletion, use LISTWISE=ON in the DATA command.
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

I am sorry for such a basic question, but is there something very obvious that I am not including? Many thanks.


The warning is just to inform you of a change starting with Version 5. The MLM estimator can be used only with complete data. Just put ESTIMATOR=MLM; in the ANALYSIS command to get MLM. You won't get that with ML. The MLR estimator is also robust to nonnormality and can be used with incomplete data. 
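
The two routes described above might look as follows. This is a sketch only: the file name, variable names, and missing-value code are assumptions, and each option belongs in its own input file.

```mplus
! Option A: MLM, which requires complete data (listwise deletion)
DATA:      FILE IS mydata.dat;        ! hypothetical file name
           LISTWISE = ON;
VARIABLE:  NAMES ARE y1-y6;           ! hypothetical variables
           MISSING ARE ALL (-999);    ! hypothetical missing-value code
ANALYSIS:  ESTIMATOR = MLM;
MODEL:     f BY y1-y6;

! Option B (separate input file): MLR, which allows incomplete data
DATA:      FILE IS mydata.dat;
VARIABLE:  NAMES ARE y1-y6;
           MISSING ARE ALL (-999);
ANALYSIS:  ESTIMATOR = MLR;
MODEL:     f BY y1-y6;
```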

anja koen posted on Tuesday, March 08, 2011  3:58 pm



Is the chi-square calculated by the MLR estimator a Satorra-Bentler scaled chi-square? Thanks!


No. Satorra-Bentler is the MLM chi2. The MLR chi2 is asymptotically the same as the Yuan-Bentler (2000) T2* version (see V6 UG, page 533).


I'm a relative novice in regards to both Mplus and SEM. I've got a skewed count variable (drug use in the last 30 days) as my outcome variable, and would like to use zero-inflated Poisson modeling to account for that nonnormality, but I've got an interaction variable that is important to my analysis. As I understand it, ZIP modeling introduces a "structural zero" parameter. How would that parameter function alongside interaction effects? Would I need to include it as a possible interactor as well? I'm not sure if I'm even thinking about this in the right way. Any pointers you could give would be appreciated.


You can include the interaction, and other covariates, also in the prediction of the binary part of the ZIP (zero-inflated Poisson), that is, the prediction of being at zero or not. Another approach is to specify the outcome as negative binomial where the inflation part is "built in" and doesn't need referring to.
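
To make the two parts of the ZIP concrete, here is a sketch with observed predictors a and b and their product formed in DEFINE. All names are hypothetical, and the poster's own model uses a latent interaction instead.

```mplus
DATA:      FILE IS mydata.dat;             ! hypothetical file name
VARIABLE:  NAMES ARE u a b;                ! hypothetical variables
           USEVARIABLES ARE u a b ab;      ! DEFINE-created variables go last
           COUNT ARE u (i);                ! (i) = zero-inflated Poisson
DEFINE:    ab = a*b;                       ! observed product term
MODEL:     u ON a b ab;                    ! count part
           u#1 ON a b ab;                  ! inflation (structural-zero) part
```

For the negative binomial alternative, the count declaration would be COUNT ARE u (nb); and the u#1 statement would be dropped.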


Thank you. Speaking of the same model, what is the best way to deal with centering the indicators for the latent interaction variable? My advisor is under the impression that Mplus has some centering scheme by default for interactions, but all my data manipulation, including the construction of the AxB indicators, was done in a different program. If, as I suspect, there is no default centering, would you recommend centering all indicators and including a mean structure, or using residual-centering with no mean structure as advocated in Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions among latent variables. Structural Equation Modeling, 13, 497–519? Does the fact that this is a ZIP model have any bearing on that decision? Thank you for the help!


It sounds like you are interested in a latent variable interaction as a predictor. Note that Mplus offers the XWITH approach to that so that it is not necessary to use products of factor indicators. The latent variables entering the XWITH interaction are in typical models centered, that is, have zero means. 


In attempting to use the XWITH option I seem to have run astray, and am getting a "fatal error: reciprocal interaction problem" message. I think I've reproduced the syntax from the manual's example (5.13) as it applies to my situation (as described directly above), but perhaps I'm wrong. The model follows:

VARIABLE: names = .....
usevar = ual30 Imp1 Imp2 Imp3 SS1 SS2 SS3;
missing = blank;
count = ual30 (i);
ANALYSIS: type = random;
MODEL: Imp by Imp1 Imp2 Imp3;
SS by SS1 SS2 SS3;
IMPxSS | Imp XWITH SS
ual30 on Imp SS;
ual30 on IMPxSS;
ual30#1 on Imp SS;
ual30#1 on IMPxSS;
Output: TECH1, TECH8;


Never mind. It was a missing semicolon on the XWITH statement. Pshh. Don't I feel silly. 
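
For reference, the MODEL part with the semicolon restored would read roughly as below. The ALGORITHM = INTEGRATION line is not in the poster's input; it is added here as an assumption, following the User's Guide example that the poster cites.

```mplus
ANALYSIS:  TYPE = RANDOM;
           ALGORITHM = INTEGRATION;   ! assumption: as in User's Guide ex. 5.13
MODEL:     Imp BY Imp1 Imp2 Imp3;
           SS BY SS1 SS2 SS3;
           IMPxSS | Imp XWITH SS;     ! the semicolon that was missing
           ual30 ON Imp SS IMPxSS;    ! count part
           ual30#1 ON Imp SS IMPxSS;  ! inflation part
```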


Dear Drs Muthen, I am running a path analysis with depressive symptoms as my dependent variable, modeled as a continuous latent variable with 12 observed ordinal indicators that are a count of the number of days in the past week each symptom was experienced. I also have three latent variable mediators with categorical indicators, a categorical independent variable and a number of control variables. I am using WLSMV as my estimator, theta parameterization, and calculating bcboot confidence intervals. I have good model fit (measurement & structural). I interpreted the coefficients for the continuous latent dependent variable as regular OLS coefficients. I am trying to respond to a reviewer's concern about the nonnormality of my dependent variable (and some of the latent mediators with the same issue). In previous work using Stata I have created a single indicator by summing the 12 indicators and logged it to reduce nonnormality, then tested for violations of the normality assumption.

1) Is there a corresponding assumption of normality for continuous latent variables in SEM?
2) How does one test if this assumption is violated and what are the consequences?
3) Can you transform a latent variable to normalize it?
4) Treating my indicators as categorical produces better model fit than treating them as continuous. Should I be logging them and treating them as continuous instead, even if it results in worse model fit?

Thanks


The fact that your ordinal indicators have skewed distributions does not mean that the factor behind them has a nonnormal distribution. You can have a normal factor influencing ordinal indicators, and the reason they are skewed is that you are recording a rare event. The strong nonnormality of the indicators is really only a potential problem if you treat them as continuous instead of categorical (or counts).

(1) Normality is typically assumed for latent variables in SEM.
(2) It is hard to test if this holds.
(3) I would not transform the indicators. If they are counts you can also treat them as counts instead of categorical.


Dear Drs. Muthén and Muthén, is there any equivalent to the Stata SSC censornb (Hilbe, 2005) in MPLUS for a survival parameterization of censored negative binomial regression? The survival parameterization of censoring allows censoring to take place anywhere in the data, not only at cut points (Hilbe 2012: 406). Thank you in advance! 


We have truncated and hurdle negbin, but not survival-parameterized censored negbin.


Dear Drs. Muthén and Muthén, I want to estimate a mediation model with a count dependent variable (negative binomial). The paper 'Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.' was very helpful and I was able to run the example input files in Tables 52-54. However, if I apply these syntaxes to my own data I get the following error message: 'Unknown group name 1 specified in group-specific MODEL command.' The message shows for count = y(p) as well as for count = y(nb). Below is (part of) my input file; can you tell me what I am doing wrong? I use Mplus version 7. Many thanks, Eveline

ANALYSIS: TYPE = RANDOM;
ESTIMATOR = ML;
MODEL: [y*1.602](beta0);
y ON x*.116(beta2);
beta1 | y ON m1;
beta1 ON x*.009(beta3);
[beta1*.309](beta1);
beta1@0;
[m1*7.997](gamma0);
m1 ON x*.182(gamma1);
m1*5.686(sig);


Please send the output and your license number to support@statmodel.com. 


I'm hoping to use Poisson with a log link and robust standard errors to model a nonnegative, positively skewed, continuous dependent variable. Is this possible in Mplus? If so, how is it done? I can't find any information on the discussion board about this.


This is available in Mplus using the COUNT option. This is appropriate for count variables not for continuous variables. 


Thanks for the response. But why can't Poisson be employed to model continuous variables in Mplus? There is a use for it, and several other statistical software packages do it: http://www.stata.com/meeting/boston10/boston10_nichols.pdf If you have any solution, it would be greatly appreciated. I'm a grad student, and I've turned to Mplus because of its superior SEM modeling capabilities.


You can use the following example as a building block to accommodate GLM with log link.

data: file is 1.dat;
variable: names=x y;
constrain=x;
usevar=y;
model: [y] (mean);
model constraint: new(b);
mean=exp(b*x);

Example 5.23 in the user's guide also illustrates some of these features but that example is unrelated to GLM.


Thanks for the response, Tihomir. I've never tried to incorporate a model constraint in Mplus before; I don't know what you're doing. What's "x"? Etc. Could you elaborate or point me to another source that talks about how to estimate generalized linear models in Mplus? Thanks, Matt 


X is the predictor variable that comes from the data. In the above model Log(E(Y|X))=b*X, which is the GLM model with log link function, i.e., Y~N(exp(bX),v).


Thanks again for your help. I have the following CONTINUOUS variables:

y1 (Poisson, with log link)
y2 (gaussian, with identity link)
x1
x2

Ultimately, I would like to estimate the following nonrecursive model with robust standard errors:

y2 = y1 + x1 + x2
y1 = y2 + x1

How is that done in Mplus? (I don't see how the Poisson family of GLMs is specified in your above example.)


This should work:

variable:
names=y1 y2 x1 x2;
usevar=y1 x2 y2d x1d;
define: y2d=y2; x1d=x1;
constrain=y2 x1;
model: [y1] (mean);
y2d on y1 x1d x2;
model constraint: new(a b1 b2);
mean=exp(a+b1*y2+b2*x1);

I am duplicating the variables that need to appear in both the constraint and the model statement within the input, but you can alternatively do that outside of Mplus. Also, I forgot the intercept in my previous post.


A FAQ on this has now been posted: GLM with log link and Poisson regression for continuous variables 


Thanks. As suggested, I ran the following:

variable:
names=y1 y2 x1 x2;
usevar=y1 x2 y2d x1d;
define: y2d=y2; x1d=x1;
constrain=y2 x1;
model: [y1] (mean);
y2d on y1 x1d x2;
model constraint: new(a b1 b2);
mean=exp(a+b1*y2+b2*x1);

But now I get the error:

*** ERROR
An internal error has occurred. This may be caused by an error in the DEFINE command or in the USEOBSERVATIONS option of the VARIABLE command. Check these statements in your input.


Oops, sorry. Switch rows 4 and 5, so that the constrain= line comes before the define: line.
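
With that swap applied, the input from the earlier post would read roughly as follows. The DATA command is not shown in the thread and is omitted here as well.

```mplus
variable:  names = y1 y2 x1 x2;
           usevar = y1 x2 y2d x1d;
           constrain = y2 x1;          ! now inside the VARIABLE command
define:    y2d = y2;                   ! copies used in the MODEL command
           x1d = x1;
model:     [y1] (mean);
           y2d on y1 x1d x2;
model constraint:
           new(a b1 b2);
           mean = exp(a + b1*y2 + b2*x1);   ! log link for the mean of y1
```

The reason for the order is that constrain= is an option of the VARIABLE command, so it has to appear before the DEFINE command begins.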


Wonderful! One last question... If I wanted to add a nonrecursive relationship to this model, what would the model and constraint statements look like? Here are the variables again:

y1 (Poisson, with log link)
y2 (gaussian, with identity link)
x1
x2

The model looking like this:

y2 = x1 + x2
y1 = x1
y2 <-> y1


variable: names=y1 y2 x1 x2;
usevar=y1 y2 x2 x1d;
constrain=x1;
define: x1d=x1;
model: [y1] (mean);
y2 on x1d x2;
y1 with y2;
model constraint: new(a b1);
mean=exp(a+b1*x1);

Yvonne LEE posted on Thursday, April 17, 2014  7:05 pm



I am a novice to SEM using Mplus. I am constructing a model of rape tendency with two indicators. One is the count of official rape offenses in adulthood, which is ordinal (4 levels). The other is a count of self-reported rape behavior. My sample contains 177 offenders, of whom 36 are rapists, so both indicators have many zero counts, giving a U-shaped distribution for both. Is any special caution needed in modeling? I treated the two indicators as categorical and ran the SEM, which gave an error message: residual covariance matrix is not positive. How can I fix it?


Your model needs to allow for parameter differences among the offenders (rapists/nonrapists). If you can't understand the error message, send output and license number to support. 


Hello, I have a few questions regarding a model using a skewed DV in a cross-lag analysis. I have a continuous DV (a proportion score ranging from 0-1) which is positively skewed, with over 45% of the sample scoring 0. From what I can gather online, I think it would be appropriate to treat the DV as a censored (below) variable. I was wondering:

1.) Is this the correct use of the censored option?
2.) Can censored variables be used in cross-lag analyses?
3.) What is the best way to assess model fit when using a censored variable?

Thanks in advance for your help.


1-2: Yes. 3: WLSMV gives fit measures.
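
A two-wave sketch of such a cross-lag setup, assuming a proportion score p and a second variable x measured at times 1 and 2 (file and variable names are hypothetical). In this simplified version only the time 2 proportion score is declared censored, and the time 1 measures enter as ordinary covariates.

```mplus
DATA:      FILE IS mydata.dat;        ! hypothetical file name
VARIABLE:  NAMES ARE p1 p2 x1 x2;     ! hypothetical variables
           CENSORED ARE p2 (b);       ! floor at 0: censored from below
ANALYSIS:  ESTIMATOR = WLSMV;         ! provides fit measures
MODEL:     p2 ON p1 x1;               ! autoregressive and cross-lagged paths
           x2 ON x1 p1;
           p2 WITH x2;                ! residual covariance at time 2
```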


Greetings, I am unable to figure out how to save factor scores for an exploratory structural equation model (ESEM) in which 4 of 34 variables are censored. The rest are simply continuous variables, and I am using the WLSMV estimator. Specifically, I received the following warning: *** WARNING in SAVEDATA command Factor scores can only be computed for analysis with censored outcomes with TYPE=MIXTURE or ALGORITHM=INTEGRATION. Request to save FSCORES is ignored. In light of this warning message, my understanding is that ESEM is not allowed with the command ALGORITHM=INTEGRATION and, furthermore, mixture analysis only applies to when categorical latent variables are present, which does not apply in this case. Is there some other way I can derive and save factor scores, given that 4 of the variables should be censored? Thank you in advance for your assistance, Clayton 


Please send the output and your license number to support@statmodel.com. 
