For continuous censored variables, I would recommend the Mplus estimator MLM. For categorical outcomes with floor or ceiling effects, I would recommend the Mplus estimator WLSMV. You might also consider that the censoring or the floor and ceiling effects are the result of a mixture of subpopulations, and consider mixture modeling.
Anonymous posted on Thursday, September 13, 2001 - 10:29 am
Thanks, Linda. Is it possible to use a missing data technique together with MLM or WLSMV?
There is no general theory for non-normality robust MAR missing data handling. See Web Note 2.
Dustin posted on Thursday, December 30, 2004 - 1:40 pm
I am attempting to perform a four-factor CFA, with each factor consisting of approximately 8 items. I have some missing data, and would also like to compare nested models to compare different factor structures. The problem is that the items are from a measure that uses a Likert rating from 0-2.
Here is the question: Can I treat the items as non-normal continuous rather than ordinal and use MLR in Mplus 3? From what I understand, I cannot do nested model chi-square tests using WLSMV for ordinal variables. Are there any articles on when it is appropriate to treat Likert items as continuous non-normal rather than ordinal?
The DIFFTEST option of Version 3 allows chi-square difference testing with WLSMV.
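For the MLR route raised in the question, nested models are normally compared with a scaling-corrected (Satorra-Bentler type) difference test rather than a raw difference of the two reported chi-squares. The following is a minimal Python sketch of that hand calculation, assuming each output reports a chi-square value, its degrees of freedom, and a scaling correction factor. The function name and the numbers are illustrative and are not taken from this thread.

```python
def scaled_chi_square_difference(t0, c0, d0, t1, c1, d1):
    """Scaled chi-square difference test for two nested models.

    t0, c0, d0: scaled chi-square, scaling correction factor, and degrees
                of freedom of the nested (more restrictive) model.
    t1, c1, d1: the same quantities for the comparison (less restrictive) model.
    Returns (scaled difference statistic, difference in degrees of freedom).
    """
    df_diff = d0 - d1
    if df_diff <= 0:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test.
    cd = (d0 * c0 - d1 * c1) / df_diff
    if cd <= 0:
        raise ValueError("non-positive difference scaling; the test is undefined here")
    # Multiplying by c0 and c1 recovers the unscaled chi-squares before rescaling.
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, df_diff


# Hypothetical values, for illustration only.
trd, df_diff = scaled_chi_square_difference(120.0, 1.2, 50, 100.0, 1.1, 45)
print(f"scaled difference = {trd:.3f} on {df_diff} df")
```

The statistic is referred to a chi-square distribution with df_diff degrees of freedom. This sketch does not apply to WLSMV, where the DIFFTEST option is used instead.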
Dustin posted on Friday, December 31, 2004 - 4:55 am
That is great. Out of curiosity, do you have any opinion on when it is appropriate to treat Likert scale items as continuous non-normal rather than ordinal? There seems to be a dearth of research on the subject, so references in this regard would also be useful.
Thanks again. I look forward to attending some of your courses in the future.
bmuthen posted on Friday, December 31, 2004 - 5:49 am
See the two Muthen-Kaplan references in the Mplus Reference section under SEM.
Dustin posted on Friday, December 31, 2004 - 7:33 am
Last question regarding this issue.
The first step of my study involves a CFA with ordinal item indicators (0-2) to test a hypothesized four-factor solution. The second step involves relating these factors to longitudinal data (probably a growth curve model) regarding the development of delinquency over 6 follow-up periods. While WLSMV seems appropriate to determine the factor structure, ML estimation seems more appropriate for handling the missing data present on the outcome variable over time (especially since WLSMV uses pairwise deletion).
Would it be appropriate to first run the CFA with WLSMV and save the factor scores. Then run a ML growth model using the factor scores as predictors. Any other suggestions would be extremely helpful.
bmuthen posted on Friday, December 31, 2004 - 7:47 am
It sounds like you have a factor as one of the time-invariant covariates of a growth process. What you suggest is reasonable.
Here are some thoughts on alternatives. Although doable, ML for categorical outcomes leads to heavy computations here given at least 3 latent variables (1 for the factor and 2 more for a growth intercept and slope). And, as you say, WLSMV would work with pairwise deletion. One question is what predicts the missingness on the growth outcome. Is it predicted by observed covariates, the factor covariate, or by the outcome at time 1? If the former, WLSMV might still be ok since WLSMV allows MAR wrt covariates. If one of the latter, WLSMV is not ok.
Dustin posted on Friday, December 31, 2004 - 8:03 am
There are actually four factors that are assessed prior to the growth process, not just one. As a result, ML for categorical outcomes will not produce a chi-square statistic for testing nested model fit (Mplus says it is too complex to estimate).
The issue of covariates being associated with missingness in the growth process is an interesting one. I am planning to control for a prior history (lifetime) of delinquency at Time 1, while the intercept and slope of the growth factor will be assessed at times 2-7 (delinquency over the last 6 months). It seems like you are saying that this may help adjust for the fact that missingness may be related to delinquency.
Thanks for your helpful comments. As a loyal user of Mplus, I find this website great.
Anonymous posted on Friday, March 18, 2005 - 1:24 pm
When doing Poisson regression models in Mplus, is there a way to correct the standard errors for overdispersion?
You don't need a very large sample for MLE to give non-normality robust point estimates (and non-normality robust SEs when using the Mplus MLR estimator). But your title mentions "censored" which implies that you have observed variables with a floor or ceiling effect, in which case a standard linear model is probably not appropriate (and large sample size gives no advantage) - in such situations it is better to switch to a non-linear model, such as a censored-normal model, a zero-inflated model, or a two-part model (see the Mplus Version 3 User's Guide).
Hello, could you suggest some papers or books to consult on estimation with non-normal data that have high skewness and kurtosis? Thanks in advance.
bmuthen posted on Thursday, November 17, 2005 - 5:15 am
Search for articles by Mardia.
Annonymous posted on Monday, January 30, 2006 - 7:57 am
If I can reduce skewness in my dependent variable from 2.03 to 0.056 without having to remove any outliers, is there any advantage to doing this? In other words, is there a degree of skewness after which WLS is no longer an appropriate estimator?
I would not recommend WLS which with continuous outcomes is ADF unless you have a very small model and very large sample. I also would not transform to avoid skewness unless there is another reason to do so, for example, a substantive reason. Instead I would use the MLR estimator.
Matt Diemer posted on Wednesday, February 15, 2006 - 12:06 pm
To add a follow-up question to this thread:
How would you all recommend addressing skewness/kurtosis in a complex sample design data set using categorical indicators/variables? [also some missing data]
My intent was to use WLSMV (because of the categorical indicators) and have reviewed Yu's (2002) dissertation re: some of these issues.
any suggestions/recommendations for references would be much appreciated.
bmuthen posted on Thursday, February 16, 2006 - 6:13 am
With categorical variables, the skewness/kurtosis is not a problem for model assumptions as it is with continuous outcomes where normality is violated. The only issue is the possibility of zero cells in bivariate tables, which can be problematic in that information on correlations between variables is limited. The additional feature of complex sample design is incorporated in Mplus. I have a 1989 Soc Meth & Research article on skewed binary outcomes that might be relevant - see our web site under References for categorical data.
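The zero-cell issue mentioned above can be screened for before fitting the model. The following is a minimal Python sketch, assuming complete data held as a list of rows with one integer category per variable. The function name, variable names, and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def empty_bivariate_cells(rows, names):
    """Return {(var_a, var_b): [(cat_a, cat_b), ...]} for category pairs never observed.

    Only categories that occur at least once for each variable are considered.
    """
    columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}
    report = {}
    for a, b in combinations(names, 2):
        counts = Counter(zip(columns[a], columns[b]))
        cats_a = sorted(set(columns[a]))
        cats_b = sorted(set(columns[b]))
        # A Counter returns 0 for pairs it has never seen.
        empty = [(x, y) for x in cats_a for y in cats_b if counts[(x, y)] == 0]
        if empty:
            report[(a, b)] = empty
    return report


# Toy data: three skewed binary items, six respondents.
names = ["u1", "u2", "u3"]
rows = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 1), (0, 0, 0), (1, 0, 0)]
for pair, cells in empty_bivariate_cells(rows, names).items():
    print(pair, "has empty cells:", cells)
```

Pairs flagged here are the ones where the data carry limited information about the correlation between the two variables.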
Nina Zuna posted on Sunday, August 20, 2006 - 2:38 pm
Dear Drs. Muthén and Muthén,
I was reading an older book chapter entitled "SEM with Non-normal Variables: Problems and Remedies" by West, Finch, & Curran (1995); the authors noted the use of a CVM estimator by Muthén.
1. Am I correct to assume that CVM at that time referred only to WLS, but that there are now several estimators available in Mplus to handle non-normal data (e.g., MLR, MLM, WLS)? Also, extra caution is obviously needed when using Likert scales with <10 response options, particularly with multiple-group measurement invariance testing (Lubke & Muthén, 2004).
2. I am doing a multiple-group invariance test, have missing data, and am including the means in my model. I noticed I couldn't use MLM or MLMV with missing data. Is it OK to use MLR for multiple-group invariance in this situation (my scale is 1-5), or will my means be biased since they are not addressed in the MLR estimator (or are they)? What are my options?
3. Is a Likert scale considered categorical or continuous non-normal? If one assumes a Likert scale is categorical as opposed to continuous, is there any need to do the test for multivariate normality, or is this test reserved for continuous variables only?
1. CVM referred to categorical variable methodology, which is represented by WLS, WLSM, and WLSMV, where the latter is the current Mplus default. Mplus now also does CVM using ML and MLR.
MLM refers to analysis with continuous outcomes using non-normality robust ML.
2. Multiple-group analysis with categorical outcomes can be done by WLSMV or by ML(R), where the latter needs to use KNOWNCLASS for the multiple groups. If you declare the dependent variables as categorical, the correct model will be used (irrespective of estimator) and therefore no problems with the estimates.
3. Likert scales can be considered categorical, continuous non-normal, or continuous normal - it is your choice. Only if they are strongly skewed with pronounced floor or ceiling effects would I use CVM. Normality tests are only for continuous outcomes.
Scott posted on Tuesday, August 21, 2007 - 11:07 am
I am conducting LGMM on data with a cohort-sequential design. I have missing data (not missing at random but due to the design). The DVs are continuous (an index of self-reported delinquency). 1) Given the missingness, is there a way to test for non-normality, besides the SK tests (outlined in Muthen, 2003)?
2) Instead, are the MLM and MLR options robust enough to handle most deviations from normality?
3) Also, what is the difference between MLM and MLR for the estimator options?
1. I don't know of any way. You could run with ML and also with MLM or MLR and see whether the results differ, thereby deducing whether there is non-normality. 2. Yes. 3. MLM is described in Technical Appendix 4. MLR is described in Technical Appendix 8. MLR uses a sandwich estimator. 4. We use MLR as the default. You should get very close results using either.
You must be using this with outcomes that are not continuous. In this case, means, variances, and covariances are not sufficient statistics for model estimation and chi-square and related fit statistics are not available.
Michin Hong posted on Wednesday, December 29, 2010 - 10:01 am
Dear Drs. Muthen,
I am pretty new to Mplus and working on an SEM model using a data set (n=1837). For some reason, I got the following warning messages for 8 different variables. They are all continuous variables with more than 5 categories and seem to be okay in terms of normality (skewness and kurtosis are all less than 2).
I already have 5 categorical variables out of 13 variables in the model, including the ordinal endogenous variable with 4 categories. So, it seems that the program tries to treat all my variables as categorical.
FYI, I used WLSMV estimator.
WARNING: VARIABLE CG_TIME MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE STRESS MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE IADL MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE MASTERY MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE SS MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE SVCUSE MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE SATIS MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
WARNING: VARIABLE BURDEN MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
I am new to Mplus and am looking to use the MLM estimator to help manage kurtosis in some variables. However, I just cannot seem to get it to work. I have put ESTIMATOR = ML; in the ANALYSIS command and LISTWISE=ON; in the DATA command, but I still get the following message, and the analysis proceeds using the ML estimator:
*** WARNING in ANALYSIS command
Starting with Version 5, TYPE=MISSING is the default for all analyses. To obtain listwise deletion, use LISTWISE=ON in the DATA command.
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
I am sorry for such a basic question, but is there something very obvious that I am not including?
The warning is just to inform you of a change starting with Version 5. The MLM estimator can be used only with complete data. Just put ESTIMATOR=MLM; in the ANALYSIS command to get MLM. You won't get that with ML. The MLR estimator is also robust to non-normality and can be used with incomplete data.
anja koen posted on Tuesday, March 08, 2011 - 3:58 pm
Is the chi-square calculated by the MLR estimator a Satorra-Bentler scaled chi-square? Thanks!
I'm a relative novice with regard to both Mplus and SEM. I've got a skewed count variable -- drug use in the last 30 days -- as my outcome variable, and would like to use zero-inflated Poisson modeling to account for that non-normality, but I've got an interaction variable that is important to my analysis. As I understand it, ZIP modeling introduces a "structural zero" parameter -- how would that parameter function alongside interaction effects? Would I need to include it as a possible interactor as well? I'm not sure if I'm even thinking about this in the right way. Any pointers you could give would be appreciated.
Thank you. Speaking of the same model, what is the best way to deal with centering the indicators for the latent interaction variable? My advisor is under the impression that Mplus has some centering scheme by default for interactions, but all my data manipulation, including the construction of the AxB indicators, was done in a different program.
If, as I suspect, there is no default centering, would you recommend centering all indicators and including a mean structure, or using residual-centering with no mean structure as advocated in (Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions among latent variables. Structural Equation Modeling, 13, 497–519)? Does the fact that this is a ZIP model have any bearing on that decision?
It sounds like you are interested in a latent variable interaction as a predictor. Note that Mplus offers the XWITH approach to that so that it is not necessary to use products of factor indicators. The latent variables entering the XWITH interaction are in typical models centered, that is, have zero means.
In attempting to use the XWITH option I seem to have run astray, and am getting a "fatal error: reciprocal interaction problem" message. I think I've reproduced the syntax from the manual's example (5.13) as it applies to my situation (as described directly above), but perhaps I'm wrong. The model follows:
Dear Drs Muthen, I am running a path analysis with depressive symptoms as my dependent variable, modeled as a continuous latent variable with 12 observed ordinal indicators that are a count of the number of days in the past week each symptom was experienced. I also have three latent variable mediators with categorical indicators, a categorical independent variable, and a number of control variables. I am using WLSMV as my estimator with theta parameterization, and calculating bcboot confidence intervals. I have good model fit (measurement and structural). I interpreted the coefficients for the continuous latent dependent variable as regular OLS coefficients. I am trying to respond to a reviewer's concern about the non-normality of my dependent variable (and some of the latent mediators with the same issue). In previous work using Stata I created a single indicator by summing the 12 indicators and logged it to reduce non-normality, then tested for violations of the normality assumption.
1) Is there a corresponding assumption of normality for continuous latent variables in SEM?
2) How does one test if this assumption is violated, and what are the consequences?
3) Can you transform a latent variable to normalize it?
4) Treating my indicators as categorical produces better model fit than treating them as continuous. Should I be logging them and treating them as continuous instead, even if it results in worse model fit?
Thanks
The fact that your ordinal indicators have skewed distributions does not mean that the factor behind them has a non-normal distribution. You can have a normal factor influencing ordinal indicators, and the reason they are skewed is that you are recording a rare event. The strong non-normality of the indicators is really only a potential problem if you treat them as continuous instead of categorical (or counts). (1) Normality is typically assumed for latent variables in SEM. (2) It is hard to test if this holds. (3) I would not transform the indicators. If they are counts you can also treat them as counts instead of categorical.
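The point that a normal factor can sit behind skewed ordinal indicators can be illustrated with a small simulation. The following is a minimal Python sketch under assumed values: a standard normal factor, a single item with loading 0.8, and high thresholds so that the upper categories are rare. All names and numbers are hypothetical.

```python
import random


def skewness(values):
    """Sample skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_item(n=5000, loading=0.8, thresholds=(1.0, 2.0), seed=1):
    """Draw a normal factor and one ordinal item (0, 1, 2) generated from it."""
    rng = random.Random(seed)
    residual_sd = (1.0 - loading ** 2) ** 0.5  # keeps the latent response at unit variance
    factor, item = [], []
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        y_star = loading * eta + rng.gauss(0.0, residual_sd)
        factor.append(eta)
        # The observed category is the number of thresholds exceeded.
        item.append(sum(y_star > t for t in thresholds))
    return factor, item


factor, item = simulate_item()
print(f"skewness of the factor: {skewness(factor):.2f}")
print(f"skewness of the item:   {skewness(item):.2f}")
```

The factor's skewness stays near zero while the item is strongly right-skewed, simply because the higher categories record a rare event.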
Is there any equivalent to the Stata SSC censornb (Hilbe, 2005) in Mplus for a survival parameterization of censored negative binomial regression? The survival parameterization of censoring allows censoring to take place anywhere in the data, not only at cut points (Hilbe 2012: 406).
I want to estimate a mediation model with a count dependent variable (negative binomial).
The paper ‘Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.’ was very helpful and I was able to run the example input files in tables 52-54. However, if I apply this syntax to my own data I get the following error message: ‘Unknown group name 1 specified in group-specific MODEL command.’ The message shows for count = y(p) as well as for count = y(nb).
Below is (part of) my inputfile, can you tell me what I am doing wrong? I use Mplus version 7.
ANALYSIS:
  TYPE = RANDOM;
  ESTIMATOR = ML;
MODEL:
  [y*-1.602](beta0);
  y ON x*.116(beta2);
  beta1 | y ON m1;
  beta1 ON x*-.009(beta3);
  [beta1*.309](beta1);
  beta1@0;
  [m1*7.997](gamma0);
  m1 ON x*.182(gamma1);
  m1*5.686(sig);
I'm hoping to use Poisson with a log-link and robust standard errors to model a non-negative, positively skewed, continuous dependent variable. Is this possible in Mplus? If so, how is it done? I can't find any information on the discussion board about this.
Thanks for the response, Tihomir. I've never tried to incorporate a model constraint in Mplus before; I don't know what you're doing. What's "x"? Etc. Could you elaborate or point me to another source that talks about how to estimate generalized linear models in Mplus? Thanks, Matt
I am doubling the variables that need to be in the constraint and the model statement with the input but you can alternatively do that outside of Mplus. Also I forgot the intercept in my previous post.
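variable:
  names=y1 y2 x1 x2;
  usevar=y1 y2 x2 x1d;
  constrain=x1;          ! x1 is used in model constraint
define:
  x1d=x1;                ! copy of x1 for use in the model command
model:
  [y1] (mean);           ! label the mean of y1
  y2 on x1d x2;
  y1 with y2;
model constraint:
  new(a b1);             ! a is the intercept, b1 the slope for x1
  mean=exp(a+b1*x1);     ! log link for the mean of y1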
Yvonne LEE posted on Thursday, April 17, 2014 - 7:05 pm
I am a novice to SEM using Mplus. I am constructing a model of rape tendency with two indicators. One is the count of official rape offenses in adulthood, which is ordinal (4 levels). The other is a count of self-reported rape behavior. My sample contains 177 offenders, of whom 36 are rapists, so both indicators have many zero counts, giving a U-shaped distribution for each. Is any special caution needed in modeling? I treated the two indicators as categorical and ran the SEM, which gave an error message: residual covariance matrix is not positive. How can I fix it?
I have a few questions regarding a model using a skewed DV in a cross-lag analysis. I have a continuous DV (a proportion score ranging from 0-1) which is positively skewed with over 45% of the sample scoring 0. From what I can gather online, I think it would be appropriate to treat the DV as a censored (below) variable. I was wondering:
1.) Is this the correct use of the censored option? 2.) Can censored variables be used in cross-lag analyses? 3.) What is the best way to assess model fit when using a censored variable?
I am unable to figure out how to save factor scores for an exploratory structural equation model (ESEM) in which 4 of 34 variables are censored. The rest are simply continuous variables, and I am using the WLSMV estimator.
Specifically, I received the following warning:
*** WARNING in SAVEDATA command Factor scores can only be computed for analysis with censored outcomes with TYPE=MIXTURE or ALGORITHM=INTEGRATION. Request to save FSCORES is ignored.
In light of this warning message, my understanding is that ESEM is not allowed with the command ALGORITHM=INTEGRATION and, furthermore, mixture analysis only applies to when categorical latent variables are present, which does not apply in this case.
Is there some other way I can derive and save factor scores, given that 4 of the variables should be censored?
Tashia Abry posted on Tuesday, March 08, 2016 - 10:02 am
Drs. Muthen, I am estimating a type=complex regression in which my two continuous outcomes are both censored (one from above and one from below). Because I want to also model the covariance between the two outcomes, I am using WLSMV rather than MLR. Is this problematic? And if not, may I interpret the coefficients in the same way as an OLS regression coefficient or does the WLSMV estimator necessitate a different interpretation of the parameter estimates (e.g., because it is on a probit metric)? Thank you in advance.
The linear regression interpretation refers to the underlying continuous latent response variable. There are formulas for the expectation of the observed censored outcome as a function of predictors - see books on censored regression.
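The expectation formulas referred to above are the standard censored-normal (tobit) results. The following is a minimal Python sketch, assuming the latent response is normal with mean mu (the linear predictor at given predictor values) and residual standard deviation sigma, and that censoring happens at a known point. The function and argument names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_censored(mu, sigma, point, side="below"):
    """Expected observed outcome when a normal latent response is censored at `point`.

    side="below": observed y = max(point, y*); side="above": observed y = min(point, y*).
    """
    z = (point - mu) / sigma
    if side == "below":
        return point * normal_cdf(z) + mu * (1.0 - normal_cdf(z)) + sigma * normal_pdf(z)
    if side == "above":
        return point * (1.0 - normal_cdf(z)) + mu * normal_cdf(z) - sigma * normal_pdf(z)
    raise ValueError("side must be 'below' or 'above'")


# Illustration with made-up numbers: latent mean 0.5, residual SD 1, floor at 0.
print(expected_censored(0.5, 1.0, 0.0, side="below"))
```

A one-unit change in a predictor shifts mu by the regression coefficient, but the change in the expected observed outcome is smaller the closer mu is to the censoring point, which is why the coefficient is interpreted on the latent response scale.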
Tashia Abry posted on Tuesday, March 08, 2016 - 10:17 am
Thank you for your prompt response. As a follow up question, what is the specific regression interpretation for a continuous censored outcome (and continuous predictor) using WLSMV? Thank you.
It sounds like you are using WLSMV. You need ML to get factor scores for censored outcomes. Add ALGORITHM=INTEGRATION; and ESTIMATOR=ML; to the ANALYSIS command. Remove the BOOTSTRAP command. If that does not work, send the output and your license number to email@example.com.
M.Y posted on Saturday, February 11, 2017 - 9:21 pm
Hi Professor Muthen,
I am running an SEM for my model. Some indicators of my endogenous variable (smoking attitudes), assessed on a 1 to 7 point Likert-type scale, are highly skewed. About 90% of people give a response of 1. So I am wondering: should I dichotomize those indicators and perform a censored variable analysis, or just delete the indicators given that 90% of cases are at this extreme value?
If you have enough items I would delete such items. But SEMNET is a better outlet for these general analysis strategy questions.
Daniel Lee posted on Wednesday, February 07, 2018 - 1:52 pm
Hello Dr. Muthen- I am conducting a latent growth mediation, and the DV (i.e., the latent intercept and slope of ATOD related problems) is zero-inflated. When I include the variables for ATOD related problems within the CENSOR command (i.e., bi), I do not get all the model fit statistics (e.g., RMSEA, CFI) for the mediation model. I do, however, get AIC and BIC.
Is there a way to get RMSEA and CFI for latent growth mediation models when using censor command for the DV? Thank you so much for all of your help!
There is not an overall test of model fit available, but you can for instance look to see if the proportion at the lower censoring point is well estimated - we do that in our book.
Otherwise, one approach is to look at neighboring models and compare using either loglikelihood-based chi-square or using BIC. By neighboring models I mean a model that is a bit less restrictive than the original - e.g. including a direct effect that is not included in the original model.
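The neighboring-model comparison described above can be done by hand from the loglikelihood, the number of free parameters, and the sample size of each run. The following is a minimal Python sketch with made-up numbers. The function name is hypothetical, and the p-value is only computed for the one-parameter case (such as a single added direct effect). With the MLR estimator the loglikelihood difference would also need a scaling correction, which this sketch leaves out.

```python
import math


def compare_neighboring_models(ll_restricted, k_restricted, ll_relaxed, k_relaxed, n):
    """Likelihood-ratio chi-square and BIC for an original model and a less restrictive neighbor."""
    lr = 2.0 * (ll_relaxed - ll_restricted)
    df = k_relaxed - k_restricted
    result = {
        "lr_chi_square": lr,
        "df": df,
        "bic_restricted": -2.0 * ll_restricted + k_restricted * math.log(n),
        "bic_relaxed": -2.0 * ll_relaxed + k_relaxed * math.log(n),
    }
    if df == 1:
        # Upper tail of a chi-square with 1 df, via the normal distribution.
        result["p_value"] = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return result


# Hypothetical loglikelihoods: the neighbor adds one direct effect.
out = compare_neighboring_models(-1000.0, 10, -997.0, 11, n=500)
for key, value in out.items():
    print(key, round(value, 4))
```

A lower BIC favors that model. The two criteria need not agree, so it helps to report both.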