Mplus Discussion >> Appropriate estimator for path model with non-normal data

Mplus Discussion > Structural Equation Modeling

Anonymous posted on Friday, September 06, 2002 - 10:58 pm

I am fitting two path models. One is nested within the other, and I want to compare them using a chi-square difference test. Some variables are not normally distributed, so as I understand it, I will need an estimator with robust statistics. All variables are continuous. Which estimators would you recommend?

Linda K. Muthen posted on Saturday, September 07, 2002 - 9:11 am

The only robust estimator for continuous outcomes that can be used for difference testing is MLM. A scaling correction factor is given in the Mplus output and an explanation of how to do difference testing using MLM is given on the Mplus website.
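A minimal Python sketch of the MLM scaled difference computation, assuming the formula given in the difference-testing explanation on the Mplus website: model 0 is the nested (more restricted) model, T is each model's MLM chi-square, c its scaling correction factor, and d its degrees of freedom. The function name and the example numbers are illustrative, not real Mplus output.

```python
def scaled_chi2_difference(t0, c0, d0, t1, c1, d1):
    """Scaled chi-square difference test from two MLM runs.

    t0, c0, d0: MLM chi-square, scaling correction factor and df of the
                nested (more restricted) model.
    t1, c1, d1: the same quantities for the less restricted model.
    Returns (scaled difference statistic, difference in df).
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test.
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    # t * c recovers each model's uncorrected ML chi-square.
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1
```

For example, `scaled_chi2_difference(30.0, 1.2, 10, 20.0, 1.1, 8)` gives a statistic of about 8.75 on 2 df, which is then referred to a chi-square distribution with 2 df. Note that the raw difference of the two MLM chi-squares is not itself chi-square distributed, which is why the correction is needed.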

Anonymous posted on Monday, November 18, 2002 - 1:21 pm

Hello, Linda,

In psychology, exponentially distributed variables are quite common in practice.
If an SEM contains categorical, normal, and exponentially distributed variables, is Mplus still useful? Which procedure do you recommend? If not, could you create a special program (for variables that are a mixture of categorical, normal, and exponential) for general use in SEM?

Daniel

bmuthen posted on Monday, November 18, 2002 - 5:19 pm

I think you are referring to a variable with an exponential distribution. Mplus does not handle such outcomes. Unless a transformation can be done, I am not sure that approximating this as continuous or categorical is advisable. With enough uses for these kinds of outcomes, the Mplus team might be motivated to add this feature to their list of future expansions.

Mahyar posted on Wednesday, March 26, 2003 - 4:06 pm

I'm new to Mplus and would like to know whether Mplus has a bootstrap option. If so, what is it? Also, are there any good recent papers on bootstrapping in Mplus?

bmuthen posted on Wednesday, March 26, 2003 - 4:08 pm

The current version of Mplus does not have a bootstrap option, but it is on a future wish list. If many users are interested in this, the Mplus team will consider the request.
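For readers wondering what a bootstrap does, a generic nonparametric bootstrap of a standard error can be sketched in Python: resample cases with replacement, recompute the statistic each time, and take the standard deviation of the replicates. This is a textbook illustration with hypothetical names (`bootstrap_se`, `n_boot`), not a description of any Mplus feature.

```python
import random
import statistics

def bootstrap_se(data, stat, n_boot=2000, seed=1):
    """Nonparametric bootstrap standard error of stat(data)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n cases with replacement and recompute the statistic.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(stat(resample))
    # The spread of the replicates estimates the sampling variability.
    return statistics.stdev(replicates)
```

For instance, `bootstrap_se(sample, statistics.mean)` approximates the standard error of the sample mean without assuming normality; the same loop works for any statistic, such as a regression coefficient or an indirect effect.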

Tor Neilands posted on Friday, June 17, 2005 - 9:12 pm

I'm interested in learning more about the MLR estimator that is suitable for use with clustered or non-clustered non-normal continuous or categorical observed variables. I've read Yuan and Bentler's 2000 article and understand that MLR traces its origins to White's 1982 sandwich variance estimator as well as to Satorra and Bentler's seminal work that led to the widely used T_MLM estimator.

Can you tell me how MLR differs from the usual White sandwich estimator that is used routinely in packages such as SAS (e.g., PROC GENMOD) and Stata (via the -robust- option in most estimation commands)? I know that MLR assumes incomplete data arise from a MAR process whereas the White sandwich estimator assumes incomplete data arise from an MCAR process, but I am curious to learn, in broad strokes, the differences in computation between these two approaches.

Also, if I wanted to compute T_MLR and associated standard errors manually from the outputs of a SEM program such as Mplus, how would I go about doing it?

With many thanks,

Tor Neilands

BMuthen posted on Sunday, June 19, 2005 - 5:14 am

The MLR formulas are in Technical Appendix 8 on the website. MLR uses the sandwich approach, which is the same as White's. The missing data aspect comes in by using an observed instead of an expected information matrix in this expression. MAR vs. MCAR affects the parameter estimates, not the standard errors.

I don't think you can easily compute T_MLR and associated standard errors from the output.
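To illustrate the bread-meat-bread structure of the White sandwich estimator mentioned above, here is a minimal Python sketch for the slope of a simple linear regression, comparing the model-based ML standard error with White's HC0 sandwich standard error. This is the textbook linear-regression case with a hypothetical function name, not Mplus's MLR implementation, which is documented in Technical Appendix 8.

```python
import math

def slope_ses(x, y):
    """OLS slope of y on x with model-based and White (HC0) sandwich SEs."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Model-based SE: one common residual variance (ML estimate, divisor n).
    sigma2 = sum(e ** 2 for e in resid) / n
    se_model = math.sqrt(sigma2 / sxx)
    # Sandwich: the "bread" 1/sxx appears on both sides of the "meat",
    # which uses each case's own squared residual instead of sigma2.
    meat = sum((xi - xbar) ** 2 * e ** 2 for xi, e in zip(x, resid))
    se_sandwich = math.sqrt(meat / sxx ** 2)
    return b, se_model, se_sandwich
```

When every squared residual is the same, the two standard errors coincide; when the residual spread grows with the distance of x from its mean, the sandwich standard error is larger, which is the kind of misspecification it protects against.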

Manuel posted on Tuesday, August 05, 2008 - 7:40 am

I noted a slight (but for me important) difference between the third and fifth editions of the User's Guide. The third edition says that "the MLR chi-square test statistic is also referred to as the Yuan-Bentler T2* test statistic" (p. 368), while the fifth edition states that "the MLR chi-square test statistic is ASYMPTOTICALLY equivalent to the Yuan-Bentler T2* test statistic" (p. 484).

a) What is the reason for this change? If in a finite sample MLR chi^2 is NOT equivalent to Yuan-Bentler T2*, how do the two differ?

b) What is the correct reference for MLR chi^2 (Yuan & Bentler, 2000?). I checked Appendix 8, which contains no reference to Yuan and Bentler.

c) On a related note, am I correct to assume that the MLM chi-square test statistic is equivalent to the Satorra-Bentler chi-square (even non-asymptotically)? What is the correct reference for MLM chi^2 (Satorra & Bentler, 1994?).

Thank you very much for any insights!

Linda K. Muthen posted on Tuesday, August 05, 2008 - 9:59 am

a) They are supposed to be equivalent, but Mplus and EQS report slightly different results in some cases. I think the main variation in the different computations of the robust statistic is due to the way the information matrix is computed - observed or expected.

b) Full details on MLR are given in
Asparouhov, T., & Muthén, B. (2005). Multivariate Statistical Modeling with Survey Data. Proceedings of the Federal Committee on Statistical Methodology (FCSM) Research Conference.
http://www.fcsm.gov/05papers/Asparouhov_Muthen_IIA.pdf

c) Yes.

Joonmo Son posted on Sunday, May 23, 2010 - 4:48 am

I employ an endogenous variable, number of volunteering hours per month, which might be regarded as a censored measure because one cannot spend unlimited time on volunteering. Is the MLM estimator appropriate for such a censored, non-normal endogenous variable? Or should I use a different, specialized estimator (e.g., Tobit)? Please note that I am using multiply imputed data sets for the path analysis.

Linda K. Muthen posted on Sunday, May 23, 2010 - 9:50 am

If the variable does not have a piling up of values at one end, I would use MLM or MLR, which are robust to non-normality. If it does, I would use the CENSORED option to obtain Tobit modeling.

Joonmo Son posted on Sunday, May 23, 2010 - 7:27 pm

Thanks for your kind and helpful answer. Would you give me one or two reference(s) dealing with the issue of how MLM takes care of censored continuous dependent variables?

Bengt O. Muthen posted on Sunday, May 23, 2010 - 9:36 pm

MLM/MLR are for skewed variables without piling up at the end points. CENSORED is for when there is piling up. The model used by CENSORED with ML is described in the regression literature under Tobit regression (Google it) and in factor analysis I have a paper - see

Muthén, B. (1989). Tobit factor analysis. British Journal of Mathematical and Statistical Psychology, 42, 241-250.
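The classic Tobit (censored-normal) likelihood referred to above combines a normal density for uncensored observations with a normal cumulative probability for observations at the censoring point. A minimal Python sketch for censoring from below at a floor, assuming the linear predictor value for each case is already computed; the function names are hypothetical and this is not Mplus code.

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(y, mu, sigma, floor=0.0):
    """Log-likelihood of a Tobit model censored from below at `floor`.

    y: observed outcomes; mu: linear predictor for each case;
    sigma: residual standard deviation of the latent y*.
    """
    ll = 0.0
    for yi, mi in zip(y, mu):
        if yi <= floor:
            # Censored case: probability that latent y* is at or below the floor.
            ll += math.log(normal_cdf((floor - mi) / sigma))
        else:
            # Uncensored case: normal log-density of the observed value.
            z = (yi - mi) / sigma
            ll += -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z - math.log(sigma)
    return ll
```

Maximizing this over the regression coefficients inside mu and over sigma gives the Tobit estimates; the sketch has no safeguards against a censoring probability that underflows to zero.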

Joonmo Son posted on Monday, May 24, 2010 - 1:11 am

I see your point, and thanks for your reference (I found it!).

Would you mind if I raise a further question to clarify? My dependent variable (volunteering hours per month) is heavily skewed, with values piled up at the left end, because many respondents (844 of 1,523, or 55%) did not volunteer at all, as shown below.

volhours      N
0           844
1            25
2            62
3            35
4            76
5            44
...
80            3
82            1
110           1
112           1
120           1
125           1
200           1

Total     1,523

However, it should be regarded as a right-censored variable because people have a physical limit on how much of their time they can spend volunteering. But as you can see, there is no piling up at the right end. Do you think I can use MLM?

Joonmo Son posted on Monday, May 24, 2010 - 6:25 am

Following up on my previous post: I do not think that my dependent variable involves left-"censoring", because 0 hours of volunteering literally means that the respondents chose not to volunteer. Rather, is it plausible to think of it as a count measure with zero inflation, for which I may use negative binomial regression? The variable is indeed overdispersed (mean = 6.6, SD = 13.9).

If the MLM estimator can handle my dependent variable well, that would be great. Thanks in advance for your answer.

Bengt O. Muthen posted on Monday, May 24, 2010 - 8:22 am

You should not treat this variable as a regular continuous variable using MLM or MLR. You have a strong floor effect (844 subjects at 0) which needs a different model than the regular linear model. You can use either censored-normal or count modeling. Treating it as a count variable is probably best. You can use Poisson, Zero-inflated Poisson, or negative binomial. For a discussion of those options, see the Web Talk video on regression with a count dependent variable at

http://www.statmodel.com/webtalks.shtml
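A rough Python check of why a plain Poisson model fails for the volunteering data above: using the reported n = 1,523, 844 zeros, and mean 6.6, and assuming the reported 13.9 is a standard deviation, it compares the observed zeros with the zeros implied by a Poisson distribution and by a negative binomial (NB2, variance = mean + alpha * mean^2) matched to the mean and variance. This is a moment-based sketch with hypothetical function names, not a fitted model.

```python
import math

def poisson_p0(mean):
    # P(Y = 0) under a Poisson distribution with the given mean.
    return math.exp(-mean)

def nb2_p0(mean, variance):
    # P(Y = 0) under NB2, with alpha chosen so variance = mean + alpha * mean**2.
    if variance <= mean:
        raise ValueError("NB2 needs variance > mean (overdispersion)")
    alpha = (variance - mean) / mean ** 2
    return (1.0 + alpha * mean) ** (-1.0 / alpha)

n, observed_zeros = 1523, 844
mean, sd = 6.6, 13.9  # sd assumed to be a standard deviation
print(f"observed zeros:         {observed_zeros}")
print(f"Poisson expected zeros: {n * poisson_p0(mean):.1f}")
print(f"NB2 expected zeros:     {n * nb2_p0(mean, sd ** 2):.1f}")
```

The Poisson implies only about 2 zeros, while the moment-matched negative binomial implies roughly 690, still short of 844; a gap of this kind is what zero-inflated or negative binomial count models are designed to address, and the fitted models should be compared formally rather than by this back-of-the-envelope check.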

Alexander Kapeller posted on Sunday, May 15, 2011 - 3:57 am

Hello,
I am following the discussion of Tobit models. Does Mplus estimate only the standard Tobit model, or also the Tobit type 2 model?

How would I specify the latter?

Thanks a lot

Bengt O. Muthen posted on Sunday, May 15, 2011 - 10:17 am

Remind me what Tobit type 2 is. It has been about 30 years since I looked at that. Was that the one estimating the censoring point? Mplus only does the classic Tobit of Tobin.

Alexander Kapeller posted on Sunday, May 15, 2011 - 10:43 am

Hi Bengt,

The type 2 model adds a second latent variable y* for an additional process that determines the choice part (0/1):

y1 = 1 if y1* > 0
y1 = 0 if y1* <= 0

y2 = y2* if y1* > 0
y2 = 0 if y1* <= 0

with two separate regressions for y1* and y2*.

It is a bivariate sample selection model, equivalent to a Tobit model with a stochastic threshold, and it is also called a model with a probit selection equation.
See Cameron, A. C., & Trivedi, P. K. (2009). Microeconometrics: Methods and Applications (8th printing). Cambridge: Cambridge University Press, p. 547.

Maybe I can replicate this with the two-part model?
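To make the type 2 structure described above concrete, here is a small Python simulation sketch. The standard normal covariate, the coefficient values, and the correlation rho between the two errors are assumptions for illustration (correlated bivariate normal errors are the usual textbook assumption for this sample selection model). It only generates data and does not estimate anything.

```python
import math
import random

def simulate_type2(n, gamma, beta, rho, seed=0):
    """Simulate (x, y1, y2) rows from a Tobit type 2 (sample selection) model.

    Selection: y1* = gamma * x + u1, y1 = 1 if y1* > 0 else 0
    Outcome:   y2* = beta * x + u2,  y2 = y2* if y1* > 0 else 0
    u1 and u2 are standard normal with correlation rho (assumed values).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u1 = rng.gauss(0.0, 1.0)
        # Build u2 so that corr(u1, u2) = rho.
        u2 = rho * u1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y1_star = gamma * x + u1
        y2_star = beta * x + u2
        selected = y1_star > 0
        rows.append((x, 1 if selected else 0, y2_star if selected else 0.0))
    return rows
```

Unlike the classic Tobit, where zeros and positive values come from the same latent variable, here the zeros are produced by the separate y1* process, and a selected y2 can even be negative.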

Bengt O. Muthen posted on Sunday, May 15, 2011 - 8:41 pm

Don't know the answer, but the two-part thinking is interesting, with a censored-normal (regular Tobit) model for the y2 part instead of the two-part missingness for y2 when y1=0. But you have to check that you get the right likelihood that way - or check with known results for a data set.

Melissa Hagan posted on Wednesday, November 09, 2011 - 9:21 am

Apologies if the answers can be found elsewhere in these posts - tried to locate them but could not.

I am conducting a path analysis (all observed variables) with continuous predictors and a continuous outcome. Two of my predictors are count variables and are highly skewed (piling up on the left end - most had few negative events and few alcoholic drinks in the past week). My outcome is a hormone concentration and is usually log-transformed to correct for non-normality. There is missing data I need to account for. Several questions regarding Mplus:

1) Can Mplus "handle" my two skewed count predictors (piling up on left end) and if so, which estimator would you recommend?

2) Given the answer to #1, does it matter that the outcome/dependent variable has been log-transformed?

3) Can missing data (MAR or MCAR) be handled by Mplus if the estimator you recommended in #1 is used?

Thanks in advance!

Linda K. Muthen posted on Wednesday, November 09, 2011 - 5:44 pm

1-2. There are no distributional assumptions made about covariates. For both the count and continuous variables, transformations that yield a linear relationship might be considered. You might consider not transforming the continuous variable but instead treating it as censored. I would use the MLR estimator.

3. Yes.

Ted Barker posted on Wednesday, April 25, 2012 - 11:05 am

Hello,

Is there a range of skewness values within which the MLR estimator performs best?

Many thanks!

Ted

Linda K. Muthen posted on Thursday, April 26, 2012 - 1:48 pm

I know of no articles on this topic. It is a research question.

Tracy Witte posted on Thursday, April 11, 2013 - 9:14 am

Given that no distributional assumptions are made about covariates, is there any difference in Mplus between a model that specifies a predictor variable as a count variable and one that leaves that statement out?

Linda K. Muthen posted on Thursday, April 11, 2013 - 1:22 pm

The COUNT option is for dependent variables only. In regression, all covariates are treated as continuous variables.

Pham Bich Diep posted on Thursday, August 07, 2014 - 8:40 pm

Dear Drs. Muthén and Muthén,
I am very new to SEM and Mplus, and I am stuck and really need your help. I am running several mediation models in which the independent variables and the mediators are latent variables. The 4 original outcome variables include (1) a 5-category ordered categorical variable and (2) latent variables, for example alcohol problems, where alcohol problems = mean(x1, x2, x3) and x1, x2, x3 are 5-category ordered categorical variables.
I transformed all outcome variables to a real-valued scale so that they became continuous variables. However, they are very skewed (skewness of about 9.4 and 7.4, even after a square-root transformation).

My questions: (1) If the outcome variables are very skewed, can I still model them with ESTIMATOR = MLR?
(2) If not, I am also considering zero-inflated negative binomial regression. Is it OK to use this model? If so, which outcome variables should I use (the original or the transformed ones)? If not, what kind of model would you suggest?
Many thanks and regards,

Bengt O. Muthen posted on Friday, August 08, 2014 - 12:03 pm

(1) if your strong skew is due to strong floor or ceiling effects MLR won't help because you may need a non-linear model instead of the standard linear.

(2) Using ZINB assumes that you have a count outcome which it doesn't sound like you have. You can treat the outcomes as ordered categorical (ordinal variables) and let them be indicators of a latent variable.

Howard Li posted on Friday, June 03, 2016 - 10:50 am

Dear Madam or Sir:
I am using path analysis in my study to evaluate the mediating roles of social norms and self-efficacy in the association between community engagement and condom use. The study has four variables, as follows:
1. community engagement, 8 items, dichotomous (yes or no)
2. social norm, 6 items, 5-point Likert scale
3. self-efficacy, 7 items, 5-point Likert scale
4. condom use frequency with different partners, 4 items, each item being its own outcome.
My question is: which estimator should I use for this path analysis in Mplus?
Thank you so much!!!

Linda K. Muthen posted on Friday, June 03, 2016 - 3:06 pm

When one or more dependent variables are categorical, WLSMV is the default. You can also use ML.

Martin Ratzmann posted on Wednesday, September 07, 2016 - 12:22 am

Dear Drs. Muthen,
We may use results from Mplus in an article. I think my co-authors and I have some definitional misunderstandings, and you can help clarify them.

First, we have a simple model with five manifest variables:

x1 (number of persons)
x2 (number of doctors)
x3 (a dummy coded variable for operating room)

y1 y2 (concentration of specific particles)

We have included the resulting 2-way and 3-way interaction terms:

x1x3
x2x3
x1x2
x1x2x3

Because the data are not normally distributed, we used the MLM estimator.

ANALYSIS:
ESTIMATOR = MLM;
MODEL:
y1 y2 ON x1 x2 x3
x1x3
x2x3
x1x2
x1x2x3;

1) A co-author asks what estimation method is used in the model. In my understanding, we use maximum likelihood and, because the data are not normally distributed, we chose the Satorra-Bentler corrections. Is that right, or might something else be meant?
2) Which references can we cite?
3) Because all variables are manifest, should we call the model OLS regression or SEM?
4)Because the model is "saturated", fit indices will show perfect fit (either zero or 1.0 for most fit indices) and these statistics cannot be used to determine how well the model fits. Which indexes should we report?
Best regards,
Martin

Bengt O. Muthen posted on Wednesday, September 07, 2016 - 5:50 pm

1) You are right.

2) See references on our webpage:

http://www.statmodel.com/chidiff.shtml

3) I would call it multivariate regression (OLS and ML estimates are the same).

4) I would do the usual regression checks of outliers and plots of residuals.
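As a minimal illustration of the outlier check mentioned in 4), here is a Python sketch that fits a single-predictor regression by OLS and flags cases whose standardized residual (residual divided by the root mean squared error) exceeds a cutoff. The single predictor, the function name, and the cutoff of 3 (a common rule of thumb) are assumptions; with several outcomes the check would be repeated per outcome, and residual plots are still worth inspecting.

```python
import math

def flag_outliers(x, y, cutoff=3.0):
    """Indices of cases whose standardized OLS residual exceeds `cutoff`."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Root mean squared error with n - 2 df (intercept and slope estimated).
    rmse = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    if rmse == 0.0:
        return []  # perfect fit: no residual spread to standardize against
    return [i for i, e in enumerate(resid) if abs(e / rmse) > cutoff]
```

Flagged cases are candidates for a closer look, not automatic exclusions.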

Martin Ratzmann posted on Thursday, September 08, 2016 - 1:48 am

I thank you very much!

Rhyan posted on Thursday, April 12, 2018 - 7:05 am

Hi. I am running a large path analysis with 5 continuous and 2 dichotomous hypothesized mediators between an ordinal exogenous measure and a dichotomous outcome measure.

I'm currently using the Estimator = WLSMV and FIML to account for the different types of variables and missing data.

My question is: 4 of the continuous measures are not normally distributed. I've seen mixed thoughts online about transforming these variables. What would you recommend?

Thank you.

Bengt O. Muthen posted on Thursday, April 12, 2018 - 12:24 pm

The main issue is if the non-normal variables have strong floor or ceiling effects in which case you might want to use the Censored option.

Note also that mediation with categorical mediators and outcomes is optimally handled by counterfactually defined indirect and direct effects. See our Mediation page on our website.

shonnslc posted on Thursday, May 16, 2019 - 7:24 pm

Hi, I am running a path model with a severely non-normal endogenous variable (skewness: 5.45; kurtosis: 36.80). This variable is proportion data (0 to 1), and around 80% of my participants have a value of 0. I found that a cube-root transformation is effective in making the variable relatively normal. At the same time, I wonder whether other models, such as censored regression or censored-inflated regression, would also be appropriate for my data. If so, should I use the transformation or one of these models? Thank you.

Bengt O. Muthen posted on Friday, May 17, 2019 - 11:14 am

Just use the censored approach - transforming doesn't solve it with an 80% floor effect.
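To see why a transformation cannot remove an 80% floor effect, note that any strictly increasing transformation, such as the cube root, maps all the zeros to one common value, so the spike at the floor is unchanged. A minimal Python check with hypothetical proportion data (the helper name is also hypothetical):

```python
def share_at_minimum(values):
    # Fraction of cases sitting exactly at the lowest observed value.
    lowest = min(values)
    return sum(1 for v in values if v == lowest) / len(values)

# Hypothetical proportions: 80 zeros plus 20 distinct positive values.
props = [0.0] * 80 + [i / 100 for i in range(1, 21)]
cube_roots = [v ** (1 / 3) for v in props]  # 0.0 stays 0.0
print(share_at_minimum(props), share_at_minimum(cube_roots))
```

Both shares are 0.8: the transformation reshapes the positive values but leaves 80% of the cases piled at one point, which is exactly what a censored model treats explicitly.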