Anonymous posted on Friday, September 06, 2002 - 10:58 pm
I am fitting two path models, one nested in the other, and I want to compare them using a chi-square difference test. Some variables are not normal, so, as I understand it, I will need an estimator with robust statistics. All variables are continuous. Can you recommend which estimators are most appropriate?
The only robust estimator for continuous outcomes that can be used for difference testing is MLM. A scaling correction factor is given in the Mplus output and an explanation of how to do difference testing using MLM is given on the Mplus website.
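As a rough illustration of the difference test mentioned above, the sketch below computes the scaled chi-square difference from the two models' MLM chi-square values, scaling correction factors, and degrees of freedom. The function name and the example numbers are hypothetical; the inputs would be read off the Mplus output for each model.

```python
def scaled_chi2_difference(t0, c0, d0, t1, c1, d1):
    """Scaled chi-square difference test from robust (e.g. MLM) output.

    t0, c0, d0: chi-square, scaling correction factor, and degrees of
                freedom of the nested (more restrictive) model.
    t1, c1, d1: the same quantities for the comparison model.
    Returns (difference statistic, difference in degrees of freedom).
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test.
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    if cd <= 0:
        raise ValueError("non-positive difference scaling correction")
    # Rescale both statistics back to the uncorrected metric, then
    # divide their difference by the difference-test correction.
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1


# Hypothetical values, for illustration only.
trd, df = scaled_chi2_difference(t0=60.0, c0=1.2, d0=12, t1=40.0, c1=1.1, d1=10)
print(round(trd, 3), df)
```

The resulting statistic is referred to a chi-square distribution with the returned degrees of freedom. A plain difference of the two scaled chi-square values is not chi-square distributed, which is why the correction factors are needed.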
Anonymous posted on Monday, November 18, 2002 - 1:21 pm
In psychology, exponentially distributed variables are quite common in practice. If an SEM contains categorical, normal, and exponential variables, is Mplus still useful? Which procedure do you recommend? If not, could you create a special program (for variables that are a mixture of categorical, normal, and exponential) for common use in SEM?
bmuthen posted on Monday, November 18, 2002 - 5:19 pm
I think you are referring to a variable with an exponential distribution. Mplus does not handle such outcomes. Unless a transformation can be done, I am not sure that approximating this as continuous or categorical is recommendable. With enough uses for these kinds of outcomes, the Mplus team might be motivated to add this feature to their list of future expansions.
Mahyar posted on Wednesday, March 26, 2003 - 4:06 pm
I'm new to Mplus and wanted to know whether Mplus has a bootstrap option. If so, what is it? Also, are there any good recent papers on bootstrapping in Mplus?
bmuthen posted on Wednesday, March 26, 2003 - 4:08 pm
The current version of Mplus does not have a bootstrap option, but it is on a future wish list. If many users are interested in this, the Mplus team will consider the request.
I'm interested in learning more about the MLR estimator that is suitable for use with clustered or non-clustered non-normal continuous or categorical observed variables. I've read Yuan and Bentler's 2000 article and understand that MLR traces its origins to White's 1982 sandwich variance estimator as well as Satorra and Bentler's seminal work that led to the widely used T_MLM estimator.
Can you tell me how MLR differs from the usual White sandwich estimator that is used routinely in packages such as SAS (e.g., PROC GENMOD) and Stata (via the -robust- option in most estimation commands)? I know that MLR assumes incomplete data arise from a MAR process whereas the White sandwich estimator assumes incomplete data arise from an MCAR process, but I am curious to learn, in broad strokes, the differences in computation between these two approaches.
Also, if I wanted to compute T_MLR and associated standard errors manually from the outputs of a SEM program such as Mplus, how would I go about doing it?
The MLR formulas are in Technical Appendix 8 on the website. MLR uses the sandwich approach, which is the same as White's. The missing-data aspect comes in by using an observed instead of an expected information matrix in this expression. MAR vs. MCAR affects the parameter estimates, not the standard errors.
I don't think you can easily compute T_MLR and associated standard errors from the output.
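To make the "sandwich" idea concrete, here is a minimal sketch for the simplest possible case: a single regression slope through the origin, where the bread and the meat are scalars. This is not the Mplus computation (which works on the full SEM likelihood and uses the observed information matrix); the function name and data are made up for illustration.

```python
def slope_with_sandwich(x, y):
    """OLS slope through the origin with model-based and sandwich variances.

    Bread: A = sum(x_i^2), the information for the slope up to sigma^2.
    Meat:  B = sum((x_i * e_i)^2), the outer product of the scores.
    Sandwich variance: A^-1 * B * A^-1 (White's HC0 form).
    """
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - beta * xi for xi, yi in zip(x, y)]
    # Model-based variance assumes one common residual variance.
    sigma2 = sum(e * e for e in resid) / (n - 1)
    var_model = sigma2 / sxx
    # Sandwich variance lets each case contribute its own squared residual.
    meat = sum((xi * e) ** 2 for xi, e in zip(x, resid))
    var_sandwich = meat / (sxx * sxx)
    return beta, var_model, var_sandwich


# Hypothetical data, for illustration only.
beta, v_model, v_robust = slope_with_sandwich([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(beta, v_model ** 0.5, v_robust ** 0.5)
```

When the residual variance really is constant, the two variances agree in large samples; they diverge when the model's distributional assumptions fail, which is the situation robust standard errors are meant for.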
Manuel posted on Tuesday, August 05, 2008 - 7:40 am
I noted a slight (but for me important) difference between the third and fifth edition of the users guide. In the third edition it says that "the MLR chi-square test statistic is also referred to as the Yuan-Bentler T2* test statistic" (p. 368), while the fifth edition states that "the MLR chi-square test statistic is ASYMPTOTICALLY equivalent to the Yuan-Bentler T2* test statistic" (p. 484).
a) What is the reason for this change? If in a finite sample MLR chi^2 is NOT equivalent to Yuan-Bentler T2*, how do the two differ?
b) what is the correct reference for MLR chi^2 (Yuan & Bentler 2000?). I checked appendix 8, which contains no reference to Yuan and Bentler.
c) On a related note, am I correct to assume that the MLM chi-square test statistic is equivalent to the Satorra-Bentler chi-square (even non asymptotically)? What is the correct reference for MLM chi^2 (Satorra & Bentler, 1994?).
a) They are supposed to be equivalent, but Mplus and EQS report slightly different results in some cases. I think the main variation in the different computations of the robust statistic is due to the way the information matrix is computed - observed or expected.
b) Full details on MLR are given in Asparouhov, T. and Muthén, B. (2005). Multivariate Statistical Modeling with Survey Data. Proceedings of the Federal Committee on Statistical Methodology (FCSM) Research Conference. http://www.fcsm.gov/05papers/Asparouhov_Muthen_IIA.pdf
Joonmo Son posted on Sunday, May 23, 2010 - 4:48 am
I have an endogenous variable, number of volunteering hours per month, which might be regarded as a censored measure because one cannot spend infinite time on volunteering activities. Is the MLM estimator appropriate for such a censored, non-normal endogenous variable? Or should I use a different, specialized estimator (e.g., tobit)? Please note that I am using multiply-imputed data sets for path analysis.
MLM/MLR are for skewed variables without piling up at the end points. CENSORED is for when there is piling up. The model used by CENSORED with ML is described in the regression literature under Tobit regression (Google it) and in factor analysis I have a paper - see
Muthén, B. (1989). Tobit factor analysis. British Journal of Mathematical and Statistical Psychology, 42, 241-250.
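For readers unfamiliar with the Tobit (censored-normal) model, the sketch below writes out its log-likelihood for a variable censored from below. It is a generic textbook form under assumed names (`tobit_loglik`, `lower`), not the Mplus implementation: uncensored cases contribute a normal density, and cases at the censoring point contribute the probability of falling at or below it.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(y, mu, sigma, lower=0.0):
    """Log-likelihood of a censored-normal (Tobit) model.

    y:     observed values, censored from below at `lower`.
    mu:    predicted means of the latent response, one per case.
    sigma: residual standard deviation of the latent response.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi <= lower:
            # Piled-up case: probability that the latent response <= lower.
            total += math.log(normal_cdf((lower - mi) / sigma))
        else:
            # Uncensored case: normal log-density of the observed value.
            z = (yi - mi) / sigma
            total += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return total


# Hypothetical data: two cases at the floor, two observed above it.
print(tobit_loglik([0.0, 0.0, 1.5, 3.0], [0.2, -0.5, 1.0, 2.5], sigma=1.0))
```

In a regression setting `mu` would be the linear predictor for each case, and the parameters would be chosen to maximize this function.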
Joonmo Son posted on Monday, May 24, 2010 - 1:11 am
I see your point, and thanks for your reference (I found it!).
Would you mind if I raise a further query to make it clear? My dependent variable (volunteering hours per month) is skewed to the left because many respondents (844 out of 1,523, 55% of the respondents) did not volunteer at all as shown below.
However, it should be regarded as a right-censored variable because people have a physical limit on the maximum time they can spend volunteering. But as you can see, there is no piling up at the right end point. Do you think that I can use MLM?
Joonmo Son posted on Monday, May 24, 2010 - 6:25 am
Following on the previous posting: I do not think that my dependent variable involves left-"censoring" because 0 hours of volunteering literally means that the respondents chose not to volunteer. Rather, is it plausible to think of it as a count measure with zero-inflation, for which I may employ negative binomial regression? The variable is indeed overdispersed (mean=6.6, S.E.=13.9).
If the MLM estimator can handle my dependent variable well, that would be great. Thanks in advance for your answer.
You should not treat this variable as a regular continuous variable using MLM or MLR. You have a strong floor effect (844 subjects at 0) which needs a different model than the regular linear model. You can use either censored-normal or count modeling. Treating it as a count variable is probably best. You can use Poisson, Zero-inflated Poisson, or negative binomial. For a discussion of those options, see the Web Talk video on regression with a count dependent variable at
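To show why the number of zeros matters for the choice among these count models, here is a small sketch of the Poisson and zero-inflated Poisson probability functions. The function names and the parameter values in the example are illustrative assumptions, not estimates from the data discussed in this thread.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi_zero):
    """P(Y = k) for a zero-inflated Poisson variable.

    With probability pi_zero the case is a structural zero; otherwise
    the count comes from a Poisson distribution with mean lam.
    """
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base


# A plain Poisson with mean 6.6 implies almost no zeros.
print(poisson_pmf(0, 6.6))
# A zero-inflated Poisson can put extra mass at zero (hypothetical values).
print(zip_pmf(0, 12.0, 0.5))
```

A plain Poisson model with mean 6.6 puts a probability of roughly 0.001 on zero, which is far from the 844 of 1,523 respondents at zero reported above. Zero-inflated and negative binomial models relax this in different ways: the first adds a separate class of zeros, the second adds overdispersion.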
The type 2 model adds a second latent variable y* for an additional process that determines the choice part (0/1):
y1 = 1 if y1* > 0; y1 = 0 if y1* <= 0
y2 = y2* if y1* > 0; y2 = 0 if y1* <= 0
with two separate regressions for y1* and y2*.
It is a bivariate sample selection model, equivalent to a Tobit model with a stochastic threshold, and is also called a probit selection equation. See Cameron, Adrian Colin; Trivedi, Pravin K. (2009): Microeconometrics. Methods and applications. 8. printing. Cambridge: Cambridge Univ. Press. p 547
Maybe I can replicate this with the two-part model?
Don't know the answer, but the two-part thinking is interesting, with a censored-normal (regular Tobit) model for the y2 part instead of the two-part missingness for y2 when y1=0. But you have to check that you get the right likelihood that way - or check with known results for a data set.
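One way to follow the advice about checking against known results is to simulate data from the type 2 model and see whether a candidate specification recovers the generating values. The sketch below only generates such data; the coefficients, the error correlation `rho`, and the function names are all hypothetical choices made for illustration.

```python
import math
import random


def observe(y1_star, y2_star):
    """Apply the type 2 observation rule to a pair of latent responses."""
    if y1_star > 0:
        return 1, y2_star
    return 0, 0.0


def simulate_type2(n, rho, seed=0):
    """Simulate n cases from a bivariate sample selection model.

    rho is the correlation between the two latent error terms.
    Returns a list of (x, y1, y2) tuples.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u1 = rng.gauss(0.0, 1.0)
        # Build a second error with correlation rho to the first.
        u2 = rho * u1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        y1_star = 0.5 * x + u1        # selection equation (assumed values)
        y2_star = 1.0 + 0.8 * x + u2  # outcome equation (assumed values)
        y1, y2 = observe(y1_star, y2_star)
        data.append((x, y1, y2))
    return data


sample = simulate_type2(1000, rho=0.5)
print(sum(y1 for _, y1, _ in sample) / len(sample))  # share with y1 = 1
```

With `rho` set to zero the selection and outcome parts are independent, so comparing fits with and without the error correlation gives a simple check on whether a given specification reproduces the intended likelihood.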
Apologies if the answers can be found elsewhere in these posts - tried to locate them but could not.
I am conducting a path analysis (all observed variables) with continuous predictors and a continuous outcome. Two of my predictors are count variables and are highly skewed (piling up on the left end - most had few negative events and few alcoholic drinks in the past week). My outcome is a hormone concentration and is usually log-transformed to correct for non-normality. There is missing data I need to account for. Several questions regarding Mplus:
1) Can Mplus "handle" my two skewed count predictors (piling up on left end) and if so, which estimator would you recommend?
2) Given the answer to #1, does it matter that the outcome/dependent variable has been log-transformed?
3) Can missing data (MAR or MCAR) be handled by Mplus if the estimator you recommended in #1 is used?
1-2. There are no distributional assumptions made about covariates. For both the count and continuous variables, transformations that yield a linear relationship might be considered. You might consider not transforming the continuous variable but instead treating it as censored. I would use the MLR estimator.
Ted Barker posted on Wednesday, April 25, 2012 - 11:05 am
Is there a range of values of skew in which the MLR estimator functions best?
I know of no articles on this topic. It is a research question.
Tracy Witte posted on Thursday, April 11, 2013 - 9:14 am
Given that no distributional assumptions are made about covariates, is there any difference in Mplus between a model that specifies a predictor variable as a count variable and one that leaves that statement out?
Dear Drs. Muthén and Muthén, I am very new to SEM and Mplus. I am stuck and would really appreciate your help. I am running several mediation models in which the independent variables and the mediators are latent variables. The 4 original outcome variables include (1) 5-category ordered categorical variables and (2) latent variables, for example alcohol problems, where alcohol problems = mean(x1, x2, x3) and x1, x2, x3 are 5-category ordered categorical variables. I converted all outcome variables to a real-valued scale so that they become continuous variables. However, they are very skewed (the skewness is about 9.4 and 7.4 even after a square-root transformation).
My questions: (1) If the outcome variables are very skewed, can I still model them with estimator = MLR? (2) If not, I am also considering zero-inflated negative binomial regression. Is it OK to use this model? If yes, which outcome variables should I use (the original variables or the transformed ones)? If no, please suggest what kind of model I should use. Many thanks and regards,
(1) If your strong skew is due to strong floor or ceiling effects, MLR won't help because you may need a non-linear model instead of the standard linear model.
(2) Using ZINB assumes that you have a count outcome which it doesn't sound like you have. You can treat the outcomes as ordered categorical (ordinal variables) and let them be indicators of a latent variable.
Howard Li posted on Friday, June 03, 2016 - 10:50 am
Dear Madam or Sir: I am using path analysis in my study, evaluating the mediating role of social norm and self-efficacy in the association between community engagement and condom use. The study has four variables, as follows: 1. community engagement, 8 items, dichotomous (yes or no) 2. social norm, 6 items, 5-point Likert scale 3. self-efficacy, 7 items, 5-point Likert scale 4. condom use frequency with different partners, 4 items, each item being its own outcome. My question is: which estimator should I use for this path analysis in Mplus? Thank you so much!!!
1) A co-author asks what estimation method is used in the model. As I understand it, we use maximum likelihood and, because the data are not normally distributed, we chose the Satorra-Bentler corrections. Is that right, or might something else be meant? 2) Which references can we cite? 3) All variables are manifest. Should we call the model OLS regression or SEM? 4) Because the model is "saturated", fit indices will show perfect fit (either zero or 1.0 for most fit indices) and these statistics cannot be used to determine how well the model fits. Which indices should we report? Best regards, Martin