Mplus Discussion >> Path analysis with logit regression vs. probit regression

Anonymous posted on Wednesday, April 27, 2005 - 6:58 pm

Hi everyone, I am a newbie to Mplus. I have a question about the basic logic behind path analysis with a categorical outcome. Why is probit regression the default model for estimating the path coefficients? Why not use logit regression, since it is preferred in population studies? I checked the user's guide, but it does not explain the reason. Has anyone come across a paper that explains why? Thanks in advance!

Linda K. Muthen posted on Thursday, April 28, 2005 - 8:38 am

The probit model is the default in Mplus because it is a more general model than the logit model for multivariate dependent variables. The logit model is also available in Mplus. There are general statistical references describing multivariate logistic distributions and their restrictions related to correlations.

Achamyeleh Gebremariam posted on Wednesday, October 27, 2010 - 2:32 pm

This is my first time using Mplus as well as doing path analysis.

I have dichotomous outcomes and independent variables that are either dichotomous or nominal (e.g., race).

Here is the setup: u1, u2 and u3 are the dichotomous dependent variables and x1 and x2 are independent variables with x2 representing race categories.

VARIABLE: NAMES ARE u1 u2 u3 x1 x2;
CATEGORICAL IS u1 u2 u3;

ANALYSIS: ESTIMATOR = ML;
MODEL: u1 ON u2 u3 x1 x2;
u3 ON x3 ;
u2 ON u3;

A few questions:
a) Is this a correct set up?
b) How do I specify that x2 (race) is a nominal variable?
c) Can the coefficients provided be interpreted as path coefficients? If not, how do I identify the direct and indirect effects?

My goal is to have path coefficients for the relationships and a corresponding significance test. I appreciate your help.

Thank you.

Linda K. Muthen posted on Wednesday, October 27, 2010 - 4:16 pm

If race is a nominal variable, you need to create a set of dummy variables to use as covariates. With three categories, two dummy variables are needed. You can use DEFINE to create these dummy variables. See a regression textbook if you do not know how to do this.
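A minimal sketch of that DEFINE step, assuming (hypothetically) that x2 codes race as 1, 2, 3 with category 1 as the reference group. The names race2 and race3 are illustrative and do not come from the post above:

```
VARIABLE: NAMES ARE u1 u2 u3 x1 x2;
          ! Variables created in DEFINE are listed last on USEVARIABLES
          USEVARIABLES ARE u1 u2 u3 x1 race2 race3;
          CATEGORICAL ARE u1 u2 u3;
DEFINE:   ! Assumed coding: x2 = 1 (reference), 2, 3
          IF (x2 EQ 1) THEN race2 = 0;
          IF (x2 EQ 2) THEN race2 = 1;
          IF (x2 EQ 3) THEN race2 = 0;
          IF (x2 EQ 1) THEN race3 = 0;
          IF (x2 EQ 2) THEN race3 = 0;
          IF (x2 EQ 3) THEN race3 = 1;
MODEL:    ! The two dummies replace x2 as covariates
          u1 ON u2 u3 x1 race2 race3;
```

Spelling out every category keeps cases with a missing x2 from being silently coded as 0 on the dummies.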

The type of regression coefficient in a path analysis is determined by the scale of the dependent variable and the estimator that is used. With ML and categorical dependent variables, logistic regression coefficients are estimated. Examples 3.11 through 3.17 show various path analyses. Example 3.16 shows how to use MODEL INDIRECT to obtain indirect effects.

Achamyeleh Gebremariam posted on Monday, November 01, 2010 - 6:46 am

Thank you Linda for your quick response. I have been able to run the model after creating the dummy variables.

Now I have two questions.
1) One is regarding indirect effects. I did follow example 3.16 and put the indirect effects as follows:

VARIABLE:NAMES ARE u1 u2 u3 x1 x2 x3 x4 x5 x6;
USEVARIABLES ARE u1 u2 u3 x1 x2 x4 x5 x6;
CATEGORICAL IS u1 u2 u3 ;
ANALYSIS:ESTIMATOR = MLR;
MODEL: u1 ON u2 u3 x1 x2 x4 x5 x6;
u2 ON u3;
u3 ON x2;
MODEL INDIRECT:
u1 IND u2 u3 x2;
u1 IND u3 x2;
u1 IND u2 u3;

I get this error message. Anything I need to do differently?

*** ERROR
MODEL INDIRECT is not available for analysis with ALGORITHM=INTEGRATION.

2) The data comes from a sample survey and I analyzed it separately without the indirect effects and the # of observations it uses (n=1,261) is different from the number of observations I expect to see (n=1,643). Is there anything that may cause the drop in the number of observations?

Thanks again for the help.

Linda K. Muthen posted on Monday, November 01, 2010 - 9:12 am

MODEL CONSTRAINT can be used to compute indirect effects when MODEL INDIRECT is not available. However, in Mplus with maximum likelihood and categorical mediators, this cannot be done. With categorical mediators, indirect effects should be computed using probit regression and weighted least squares estimation.
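A sketch of that probit alternative, reusing the variable names from the input posted above and assuming WLSMV as the weighted least squares estimator. It is meant as an illustration, not a checked run:

```
VARIABLE: NAMES ARE u1 u2 u3 x1 x2 x3 x4 x5 x6;
          USEVARIABLES ARE u1 u2 u3 x1 x2 x4 x5 x6;
          CATEGORICAL ARE u1 u2 u3;
ANALYSIS: ! Weighted least squares gives probit regressions and
          ! does not require numerical integration
          ESTIMATOR = WLSMV;
MODEL:    u1 ON u2 u3 x1 x2 x4 x5 x6;
          u2 ON u3;
          u3 ON x2;
MODEL INDIRECT:
          ! x2 -> u3 -> u2 -> u1
          u1 IND u2 u3 x2;
          ! x2 -> u3 -> u1
          u1 IND u3 x2;
          ! u3 -> u2 -> u1
          u1 IND u2 u3;
```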

If cases are not used, you should have a message to that effect, for example, observations with missing on observed exogenous variables are deleted. If this is not the case, you may be reading the data incorrectly and should send your files and license number to support@statmodel.com.

Achamyeleh Gebremariam posted on Tuesday, November 02, 2010 - 12:44 pm

Thanks again Linda for your help. I did the analysis with MODEL CONSTRAINT, and it works fine. I created the constraints as products of the path coefficients. I have a couple of questions.

a) Would that be correct even in the case with logit models?
b) Can I exponentiate the indirect effect and interpret it as an odds ratio? If not how would I go about describing the extent of indirect effect?

I still have problems with the sample size in the weighted vs. unweighted analysis and have sent the data and the Mplus code. Thanks for your help.

Linda K. Muthen posted on Tuesday, November 02, 2010 - 2:31 pm

You cannot have an indirect effect with a categorical mediator with maximum likelihood estimation. If the mediator is continuous, that is okay. In that case, if the final dependent variable is categorical, you can exponentiate the indirect effect to obtain an odds ratio.
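As a sketch of the case where this does apply, assume a hypothetical model with a continuous mediator m, a binary outcome y, and a covariate x. None of these names come from the posts above:

```
VARIABLE: NAMES ARE y m x;
          ! Only the final outcome is categorical; m is continuous
          CATEGORICAL IS y;
ANALYSIS: ESTIMATOR = ML;
MODEL:    m ON x (a);
          y ON m (b);
          y ON x;
MODEL CONSTRAINT:
          NEW(ind indor);
          ! Indirect effect on the logit scale
          ind = a*b;
          ! Exponentiated to express it as an odds ratio
          indor = EXP(ind);
```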

Achamyeleh Gebremariam posted on Wednesday, November 03, 2010 - 8:12 am

Linda,

Even though I cannot have indirect effect with a categorical mediator under MLE, in my set up it is acceptable to exponentiate indirect effects obtained by multiplying the path coefficients to obtain an odds ratio. Am I understanding your comment right?

If possible, I would appreciate suggestions to any references that address this issue.

Thank you very much for all your help.

Linda K. Muthen posted on Wednesday, November 03, 2010 - 12:38 pm

This would not be correct for the model you show above because the mediator is categorical. The reason is that when the mediator is a dependent variable, it is treated as a latent response variable, whereas when it is an independent variable, it is treated as a continuous variable. I know of no reference for this. The mediator must be treated in the same way both when it is a dependent variable and when it is an independent variable in order to compute an indirect effect.
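One way to picture the mismatch, as a sketch with hypothetical symbols (x a covariate, u_2 the binary mediator, u_1 the binary outcome) rather than formulas taken from the Mplus documentation:

```latex
% Mediator as a dependent variable: the regression is for the latent response u_2^*
u_2^{*} = a\,x + \varepsilon_2, \qquad u_2 = 1 \iff u_2^{*} > \tau_2

% Mediator as a predictor under ML: the observed 0/1 score u_2 enters the logit
\operatorname{logit} P(u_1 = 1 \mid u_2, x) = \beta_0 + b\,u_2 + c\,x

% Under weighted least squares probit, u_2^* appears on both sides
u_1^{*} = b^{*}\,u_2^{*} + c^{*}\,x + \varepsilon_1
```

Under ML, a describes the effect on u_2^* while b describes the effect of the observed u_2, so the product a b mixes two different versions of the mediator. In the probit formulation both a and b^* refer to u_2^*, which is why the product of coefficients is meaningful there.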

Achamyeleh Gebremariam posted on Friday, November 05, 2010 - 6:51 am

You mentioned a few posts ago that with categorical mediators one should use probit regression with WLS to get indirect effects.
a) Would that approach work in my case?
b) Can indirect effects be obtained by multiplying the path coefficients?

Thanks again for all your help.

Linda K. Muthen posted on Friday, November 05, 2010 - 6:57 am

Yes.
Yes.

Cecily Na posted on Sunday, December 12, 2010 - 2:22 pm

Dear Linda,
I did an SEM model with categorical variables. The output has means/intercepts/threshold for each level of every categorical variable. Where can I find the overall intercept for the probit structural equation? What syntax command should I use?

For example,
STD [z score] = intercept + b1*crime + b2* drug; where crime and drug are all categorical variables.

Thanks a lot!

Linda K. Muthen posted on Sunday, December 12, 2010 - 5:31 pm

The intercept for a probit regression is under the heading Threshold.

Melissa Kimber posted on Wednesday, February 01, 2012 - 7:02 am

Hello,
I am trying to run a path analysis with a binary dependent variable.
I keep getting the following message:

**Categorical variable TREAT contains non-integer values.**

TREAT is my dependent variable.
Any thoughts?
Thank you.

***Melissa

Melissa Kimber posted on Wednesday, February 01, 2012 - 7:05 am

Sorry, just to add, all of my values for TREAT are either zero or 1.....so, I am not clear on why I am having a problem.
Do I need to change it from 0 & 1 to 1 & 2?

Thanks

Linda K. Muthen posted on Wednesday, February 01, 2012 - 7:31 am

You are reading the data incorrectly. You probably have blanks in the data set which are not allowed with free format data. If you can't see the problem, send the input, data, output and your license number to support@statmodel.com.

Mario Mueller posted on Monday, April 16, 2012 - 2:34 am

Hello,

I'm running a path model (N=10.000) with a binary (75/25-split) dependent variable, 8 independent variables and indirect effects.

When I specify the MODEL INDIRECT command I get the message...

* The chi-square value for MLM, MLMV, MLR, ULSMV, WLSM and WLSMV cannot be used
for chi-square difference testing in the regular way. MLM, MLR and WLSM
chi-square difference testing is described on the Mplus website. MLMV, WLSMV,
and ULSMV difference testing is done using the DIFFTEST option.

...and when using MODEL CONSTRAINT the output didn't provide any fit indices. Can I ignore this message or what exactly does it mean?

Thanks, Mario

Linda K. Muthen posted on Monday, April 16, 2012 - 6:06 am

The message is telling you what to do if you are doing difference testing of two nested models. It does not sound like you are so you can ignore this message.

Samuli Helle posted on Tuesday, November 13, 2012 - 2:14 am

Hi,

I'm doing a path analysis with two continuous DVs and one binary DV with MLR estimation. All IVs are continuous. For the logit part, I get "reasonable" parameter estimates and their SEs, but the odds ratio estimates show the following:

LOGISTIC REGRESSION ODDS RATIO RESULTS

GYNTULEH ON
VDR 0.000
ODR *********

What's wrong with the odds ratio estimates? For the binary DV, ca. 60 cases have value of 1 and ca. 170 cases have value of 0.

Thanks a lot!

Samuli

Linda K. Muthen posted on Tuesday, November 13, 2012 - 5:40 am

Please send the output and your license number to support@statmodel.com.

Leslie Roos posted on Thursday, June 27, 2013 - 7:18 am

Hello!

I'm running a mediation model with 2 binary mediators and a nominal outcome. I'm interested in determining whether there is significant mediation through each of the mediators for each of the 2 levels of the nominal outcome variable vs. the nominal reference group. My understanding is that with nominal outcome variables and MODEL CONSTRAINT, it is not possible to obtain indirect odds ratios; however, it is possible to test whether the indirect effect is significant.

What would be the best way to determine the significance of the indirect effect of the mediator at each level of the nominal outcome? Below is the relevant part of the model. I was attempting to constrain the nominal outcome (Jail2L) at the first 2 levels.

MODEL:
aax12Nas ON cdum1; (a1);
asub ON cdum1; (a2);
Jail2L@1 ON aax12Nas; (b1);
Jail2L@1 ON asub; (b2);
Jail2L@2 ON aax12Nas; (b10);
Jail2L@2 ON asub; (b20);

MODEL CONSTRAINT:
NEW(eff1 eff2 eff10 eff20);
eff1=a1*b1;
eff2=a2*b2;
eff10=a1*b10;
eff20=a2*b20;

Thank you!
Leslie

Leslie Roos posted on Thursday, June 27, 2013 - 7:36 am

Hi,

I am now trying to use the '#' as a constraint and it seems to have worked. I wanted to check in that I am not violating specific assumptions in categorical / nominal variable modelling?

Thank you again!
Leslie

aax12Nas ON cdum1 (a1);
asub ON cdum1 (a2);
Jail2L#1 ON aax12Nas (b1);
Jail2L#2 ON aax12Nas (b10);
Jail2L#1 ON asub (b2);
Jail2L#2 ON asub (b20);

MODEL CONSTRAINT:
NEW(eff1 eff2 eff10 eff20);
eff1=a1*b1;
eff2=a2*b2;
eff10=a1*b10;
eff20=a2*b20;

Linda K. Muthen posted on Thursday, June 27, 2013 - 11:21 am

Indirect effects cannot be computed as products in your situation. See the following paper which is available on the website:

Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.

Frida Jonsson posted on Tuesday, February 23, 2016 - 10:20 am

Hi!
I'm examining a multiple mediator model through path analysis. I have variables at four time points, and the model includes three IVs (at t1), three DVs at t2 (1 binary and 2 continuous), three DVs at t3 (1 binary and 2 continuous) and one final continuous DV at t4. I have used WLSMV with THETA and examined indirect effects using MODEL INDIRECT together with the BOOTSTRAP analysis command.

I've got a response from a reviewer suggesting that I should use Monte Carlo integration to attain a logit rather than a probit model, since "in a multivariate case these are superior". I'm not sure in what ways this would be beneficial or the more correct way to examine my model.

From what I have read on this forum, using WLSMV rather than MLR with INTEGRATION=MONTECARLO and MODEL CONSTRAINT is the better option if you have categorical mediators. Is this correct? If yes, can you point me to any references supporting this?

Also, if I were to do as the reviewer suggests, then I would not be able to have the residuals of my mediators be correlated (risk of having a misspecified model). Is this an argument I can use to support the WLSMV approach?

Thank you for your time!
/Frida

Bengt O. Muthen posted on Tuesday, February 23, 2016 - 6:28 pm

When you have a mediation model, ML becomes problematic if you have categorical mediators in that you would have to combine linear and non-linear regressions. WLSMV instead works with a latent response variable mediator.

I don't know what the reviewer means by logit being superior for the multivariate case. I would say the opposite, because probit with WLSMV is more flexible in the multivariate case, for example in allowing the correlated residuals you mention. Missing data can be an issue, though. See also our FAQ on estimator choices with categorical variables.

Mark Thomas posted on Thursday, September 08, 2016 - 6:40 am

Hi Dr. Muthen,

I am trying to run a path analysis with logistic regression with subgroup analyses (male vs. female). I have successfully run separate models for the binary outcome with the whole sample, and for the continuous outcome with subgroup analyses. However, I encounter the following issue when I try to do logit with dichotomous variables.

" ALGORITHM=INTEGRATION is not available for multiple group analysis.
Try using the KNOWNCLASS option for TYPE=MIXTURE."

My code:
VARIABLE: NAMES ARE SEX AGE RACE EDU BMI SMK ALCHL KILO
BPhy BVerb BAng BHost CMH BPAQ MAP EPI NOR SLOPE AUC ZMS DIMS;
USEVARIABLES ARE BAng epi nor slope auc map diMS SEX AGE RACE EDU ;
MISSING ARE .;
CATEGORICAL IS DIMS;
GROUPING IS SEX (0=Male, 1=Female);

ANALYSIS: ESTIMATOR = ML;
PROCESS = 2;
BOOTSTRAP = 5000;
INTEGRATION = MONTECARLO;

Thank you.

Linda K. Muthen posted on Thursday, September 08, 2016 - 11:57 am

With ML and categorical outcomes, you must use KNOWNCLASS instead of GROUPING. When classes are known, this is the same as multiple group analysis. See Example 5.33 where this is shown for Bayes. Just use ML instead of Bayes.
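A rough sketch of that change, reusing the variable names from the post above. The class label cg and the small regression in MODEL are hypothetical, since the original MODEL command was not posted, and SEX is left off USEVARIABLES on the assumption that it serves only to define the known classes:

```
VARIABLE: NAMES ARE SEX AGE RACE EDU BMI SMK ALCHL KILO
          BPhy BVerb BAng BHost CMH BPAQ MAP EPI NOR SLOPE AUC ZMS DIMS;
          USEVARIABLES ARE BAng AGE diMS;
          MISSING ARE .;
          CATEGORICAL IS diMS;
          ! Known classes take the place of GROUPING IS SEX
          CLASSES = cg (2);
          KNOWNCLASS = cg (SEX = 0 SEX = 1);
ANALYSIS: TYPE = MIXTURE;
          ESTIMATOR = ML;
MODEL:    %OVERALL%
          diMS ON BAng AGE;
          ! Mentioning the slopes in a class frees them from the
          ! default equality across classes
          %cg#2%
          diMS ON BAng AGE;
```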

Jolien Vleeshouwers posted on Wednesday, January 25, 2017 - 11:35 pm

Hi,

I am running half-longitudinal mediation models using model constraint. I calculated Odds Ratios for the indirect effect, and am now wondering how to interpret the results. Could you please help me interpret the Odds ratios for indirect effect?

Thank you!

Kind regards,
Jolien

Bengt O. Muthen posted on Thursday, January 26, 2017 - 5:05 pm

Answered elsewhere.

Cathy Bae posted on Friday, September 29, 2017 - 11:49 pm

Hi everyone,

This is my first time using Mplus to run path analysis with logistic regression.

I have a dichotomous dependent variable and continuous dependent & independent variables.
(Y4 is dichotomous and Y3, Y2, Y1, and X1 are continuous).
The model is a fully ordered recursive path model.
I wish to obtain unstandardized coefficients, S.E., standardized coefficients, appropriate Model fit results, and R-square.

I have set it up as:

Variable:
Names are
Y4 Y3 Y2 Y1 X1;
Categorical is
Y4;
Analysis: Estimator = ML;
Model:
Y4 on X1 Y1 Y2 Y3;
Y3 on X1 Y1 Y2;
Y2 on X1 Y1;
Y1 on X1;
Output: STDYX;

I have a few questions:
a) Is this a correct set up?
b) I have set the estimator as ML. Should I run it with different estimator?
c) When I run it without a defined estimator, Mplus automatically selects the probit model and returns Chi-Square Test for Model Fit, RMSEA, CFI/TLI, Chi-Square...for Baseline Model and WRMR. Is probit a better tool, or is there a way to obtain these fit statistics for the current setup?
d) How should I interpret AIC, BIC and Adjusted BIC?

I really really appreciate your help.
Thank you very much.

Bengt O. Muthen posted on Sunday, October 01, 2017 - 12:44 pm

You have a binary outcome, which calls for counterfactually-defined effects. But you also have multiple mediators, which makes it complex - see our FAQ:

Mediation: multiple mediators

You can use ML and it will give you indirect effects for Y*, the latent response variable behind the binary outcome. Logit is fine.

You talk about a test but it seems that you have zero degrees of freedom.
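The zero degrees of freedom can be checked with a quick count, as a sketch that looks only at the regression paths among the five variables X1, Y1, Y2, Y3, Y4:

```latex
\underbrace{4 + 3 + 2 + 1}_{\text{paths in the MODEL command}} = 10 = \binom{5}{2}
```

Every pair of variables is connected by a path, so the model is just-identified and leaves nothing for a chi-square test to assess.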

BIC etc. are for model comparison. You can read about them in our new RMA book.