Jim Shaw posted on Monday, November 09, 2009 - 3:11 pm
I am trying to estimate the parameters of a path model where X (independent variable), Y (dependent variable), and Z (mediator) are all binary (ordered categorical) variables.
The MODEL INDIRECT statement can be used to estimate the indirect effect of X on Y via Z when a probit link function is specified in conjunction with the ULS, WLS, WLSM, or WLSMV estimator. I have observed that the WLSMV (and other LS) estimates for X and Z in the model for Y are not equivalent to the corresponding ML estimates. However, the WLSMV and ML estimates for X in the model for Z are the same. Why the discrepancy?
My preference would be to fit the models for Y and Z using the ML estimator. However, indirect effects cannot be estimated when the ML estimator is used (due to the need for numerical integration). It seems to me that I should be able to estimate the indirect effect as the product of (1) the estimate for X in the model for Z and (2) the estimate for Z in the model for Y, regardless of whether a logit (ML) or probit (WLSMV) specification (estimator) is used.
There are three factors involved here: the estimator, the probit/logit model choice, and the z versus z* model choice (z being your observed mediator). ML can do probit, but a remaining difference between the ML probit approach and the WLSMV probit approach is the z versus z* difference, that is, the choice between the observed binary mediator and its underlying continuous latent response variable. This choice has no impact on the regression of z on x. In the regression of y on the mediator, however, ML uses z while WLSMV uses z*. With z as the mediator you cannot multiply the two slopes, because the two regressions involve different link functions.
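On the latent-response (z*) scale that WLSMV works with, the indirect effect is the product of the two probit slopes, and its standard error can be approximated with the delta method (Sobel formula). A minimal sketch; the coefficient and standard-error values below are purely illustrative, not taken from any real output:

```python
import math

# Hypothetical probit path coefficients (illustrative values only):
#   a: slope of X in the probit regression of Z* on X
#   b: slope of Z* in the probit regression of Y* on Z* and X
a, b = 0.50, 0.80
se_a, se_b = 0.10, 0.12

# On the latent-response (z*) scale the indirect effect is the
# product of the two probit slopes.
indirect = a * b

# Delta-method (Sobel) standard error for the product.
se_indirect = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)

print(indirect)      # 0.4
print(se_indirect)   # 0.1 (approximately)
```

This multiplication is only coherent because both slopes live on the same probit scale; mixing a logit slope from one equation with a probit slope from the other would not yield an interpretable product.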
Jim Shaw posted on Thursday, November 12, 2009 - 11:18 am
Since my previous post, I have read that Mplus substitutes the probit model score variable for the observed categorical variable (Z) in the outcome (Y) equation.
In the context of ML estimation, the score is the derivative of the log-likelihood function with respect to the linear predictor x'b (the product of the regressors and the estimated parameters). I am not sure how the score variable is derived following weighted least squares estimation, though I presume it has a similar interpretation.
This leads to two questions:
(1) If the first- and second-stage regressions were estimated using ML probit regression, could one still substitute the score variable from the first-stage regression for Z in the second-stage regression and derive consistent estimates for X (in eq. 1) and Z (in eq. 2), allowing for the computation of the indirect effect?
(2) If the approach discussed in (1) is valid, then could it be extended to nominal variables? Let N be a categorical variable that mediates the relationship between X and Y (as defined previously). Could one fit a ML multinomial probit model to N, generate score variables for each of N's categories (save 1), and then model Y as a function of X and the 3 score variables for N?
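For a binary probit, the derivative of the log-likelihood contribution with respect to the linear predictor eta = x'b has a closed form. A small sketch of how such a score variable could be computed; the function names are my own, chosen only for illustration:

```python
import math

def probit_cdf(eta):
    # Standard normal CDF Phi, via the error function
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def probit_pdf(eta):
    # Standard normal density phi
    return math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)

def probit_score(z, eta):
    # Derivative of the probit log-likelihood contribution
    #   l = z*log(Phi(eta)) + (1 - z)*log(1 - Phi(eta))
    # with respect to eta = x'b:
    #   phi(eta) * (z - Phi(eta)) / (Phi(eta) * (1 - Phi(eta)))
    p = probit_cdf(eta)
    return probit_pdf(eta) * (z - p) / (p * (1.0 - p))

# At eta = 0 the score reduces to +phi(0)/Phi(0) for z = 1
# and -phi(0)/(1 - Phi(0)) for z = 0.
print(probit_score(1, 0.0))   # ~ 0.798
print(probit_score(0, 0.0))   # ~ -0.798
```

In econometrics this quantity is sometimes called the generalized residual; it is positive when the observed z exceeds its predicted probability and negative otherwise.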
Jim Shaw posted on Thursday, November 12, 2009 - 11:46 am
My comments regarding "N" may have been unclear. I meant to say that N is a nominal variable with 4 levels. Thus, 3 score variables would be generated after modeling of N via multinomial probit.
"that Mplus substitutes the probit model score variable for the observed categorical variable (Z) in the outcome (Y) equation."
See my previous post for what Mplus does with the mediator in the Y equation.
Jim Shaw posted on Friday, November 13, 2009 - 11:42 am
In the chapter of his logistic regression text that discusses path analysis with logistic regression, Scott Menard notes that one approach to estimating path coefficients with categorical mediator and outcome variables is to "calculate probit or logit model scores for observed categorical variables and then use these scores as input" for the structural model. He attributes this approach to the chapter you authored in Testing Structural Equation Models.
I interpreted what Dr. Menard wrote as meaning that the score variable predicted from the probit regression of Z on X is substituted for Z in the probit regression of Y on Z and X. That is, Y is regressed on Z* and X instead of Z and X, where Z* is represented by the predicted score for Z from the first regression. With ML estimation, the score is the derivative of the log-likelihood function with respect to x'b.
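Under this reading, the two-stage substitution would look roughly like the following sketch. The first-stage coefficients and the data here are invented purely for illustration, not real estimates:

```python
import math

def probit_cdf(eta):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def probit_pdf(eta):
    # Standard normal density
    return math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)

# Hypothetical first-stage probit estimates for the regression
# of Z on X (illustrative values only).
b0, b1 = -0.2, 0.6

# Toy (x, z) observations.
data = [(0, 0), (0, 1), (1, 0), (1, 1)]

second_stage = []
for x, z in data:
    eta = b0 + b1 * x                 # first-stage linear predictor x'b
    p = probit_cdf(eta)
    # Score variable: derivative of the first-stage log-likelihood
    # contribution with respect to eta.
    score = probit_pdf(eta) * (z - p) / (p * (1.0 - p))
    # In the second stage, Y would then be probit-regressed on
    # (x, score) rather than on (x, z).
    second_stage.append((x, score))

for row in second_stage:
    print(row)
```

The score is positive whenever z = 1 and negative whenever z = 0, so it carries the same directional information as the observed mediator while living on a continuous scale.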