I'm trying to fit two different models. In the first model I have an ordinal variable (3 categories) as the mediating variable and the response is binary. Is it right that the mediating variable should be (or is treated as) continuous? In my second model, both the mediating and response variables are binary. Please let me know.
No, in Mplus, with logistic regression and maximum likelihood estimation, a categorical mediator is treated as a continuous variable even though it is declared categorical. In probit regression with weighted least squares estimation, a categorical mediator is treated as an underlying latent response variable.
Thanks for your response. Another question: I'm trying to use the PLOT command to get the estimated probabilities but all I get is scatterplots and histograms. Could you give me an example, say when your dependent variable (u1) is binary and the two independent variables (x1 and x2) are continuous? Thanks.
I have a path model with two ordered categorical variables, so defined in Mplus. One is purely independent and the other has a mediating role in the model (is both dependent and independent).
Where the ordered categorical variable is purely independent, does Mplus treat it as indicative of any underlying latent variable or does it treat it as, in effect, a numeric variable? Suppose, for instance, the variable is coded: 0,1,2,3,4. Is the coefficient for its effect on a dependent simply that of a similarly coded numeric variable, or is it something different?
Now, suppose the variable is a mediating variable. Regarding its effects on other variables my questions are the same as those in the prior paragraph. In other words, is there any difference from the purely independent situation?
Second, consider the (mediating) variable in its role as a dependent variable. Is it simply a numeric variable or have we now shifted to a probit model, so that interpretation has shifted accordingly (for instance, variability would be determined by the conventions of probit analysis)?
For both the IV and DV situation you have a choice between (1) working with the observed variable, treating the scores as continuous, or (2) working with an underlying continuous latent response variable that relates to the observed via thresholds.
For the IV case, if you don't put the variable on the categorical list, (1) will be used, and if you put it on the categorical list, (2) will be used. I tend to prefer (1) if I think this results in approximately linear regressions.
For the DV case, (1) will be used when applying ML. When applying WLSMV, (2) will be used. The choice doesn't matter when considering the DV regressed on its predictors because then (1) and (2) are equivalent. The choice does matter a bit when considering the equation where the DV is a predictor. The choice depends on the substance of the application.
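To make option (2) concrete, here is a minimal Python sketch of how a continuous latent response variable y* relates to an observed ordinal variable through thresholds. It assumes a probit-type model, y* = beta*x + e with e standard normal. The coefficient and threshold values are hypothetical and chosen only for illustration; none of them come from this thread.

```python
import math

def normal_cdf(v):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def category_probs(x, beta, thresholds):
    """P(y = k | x) for an ordinal y cut from y* = beta*x + e, e ~ N(0, 1).

    y = 0 if y* <= tau_1, y = 1 if tau_1 < y* <= tau_2, and so on.
    """
    linear = beta * x
    # Cumulative probabilities P(y* <= tau_k | x), padded with 0 and 1.
    cum = [0.0] + [normal_cdf(tau - linear) for tau in thresholds] + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]

# Hypothetical values (assumptions for illustration only).
beta = 0.5
thresholds = [-0.5, 0.8]  # two thresholds give three categories

for x in (0.0, 1.0):
    probs = category_probs(x, beta, thresholds)
    print("x =", x, "category probabilities:", [round(p, 3) for p in probs])
```

Under option (1) the scores 0, 1, 2 would instead enter the regression directly as numbers, so a one-unit step means the same thing between any two adjacent categories.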
yang posted on Tuesday, November 11, 2008 - 7:10 am
I have y, m1, m2, m3 and x, all binary. I also have covariates z. I want to estimate the indirect effect of x on y through m1, m2 and m3. I would like the indirect effect to have an odds ratio interpretation. What is the code for doing this? I was trying the following. I appreciate your help.
model: y on m1 z; y on m2 z; y on m3 z; m1 m2 m3 on x z;
model indirect: y ind m1 x; y ind m2 x; y ind m3 x;
I'd like to follow up on Linda's post of March 25, 2008:
"No, in Mplus, with logistic regression and maximum likelihood estimation, a categorical mediator is treated as a continuous variable even though it is declared categorical. In probit regression with weighted least squares estimation, a categorical mediator is treated as an underlying latent response variable."
I am struggling with this. Imagine a system where X -> Y -> Z and Y is categorical. If the intervening variable is categorical, then the indirect effect can only be transmitted from X to Z if Y changes state in response to X. Otherwise no indirect effect can be transmitted from X to Z. It would seem like any calculation of the indirect effect from X to Z through Y would have to be weighted by the probability of Y changing state.
1. h is a continuous latent variable so a linear regression is estimated for h ON x z. If y is binary, a probit regression is estimated if the weighted least squares estimator is used. If the maximum likelihood estimator is used, a logistic regression coefficient is estimated as the default. A probit link is also available.
2. I would only do this with weighted least squares. This product is a linear regression coefficient of the indirect effect of x on y*, where y* is the continuous latent response variable underlying y. For further information, see the MacKinnon et al. article in Clinical Trials, which is on our website.
Hello, we have a model with a binary exogenous predictor (X), several mediators (M, some binary, some continuous) and a binary DV (Y). We used ML estimation, declaring all binary mediators and the DV as categorical. Now we have two questions:
1. We are interested in the residual covariances between all mediators. However, we get an error message ("Covariances for categorical, censored, count or nominal variables with other observed variables are not defined.") when using the corresponding WITH syntax. What can we do about this?
2. We compared two models: (a) X and M are predictors of Y (like in standard logistic regression), and (b) a "causal chain" X -> M -> Y (with an additional direct effect of X on Y). We observed that the effects of M on Y are identical in both models. Could you explain the reason for this?
1. See Example 7.16 for a way to specify residual covariances with categorical indicators and maximum likelihood estimation. Note that in this case, each residual covariance is one dimension of integration. If you want a lot of residual covariances, you should use weighted least squares estimation.
Use variables are: catTime01 catTime12 numTime1 numTime2; categorical are catTime12; model: numTime1 on catTime01; numTime2 on numTime1 catTime12; catTime12 on numTime1 catTime01;
catTime01 and catTime12 take on the values of 0, 1, or 2. numTime1 and numTime2 are quantitative variables. This analysis (or one quite similar) was carried out using the MLR estimator:
Consider the regression on numTime2. In its role as a predictor in this regression, is catTime12 treated as a numeric variable – in other words, is it treated in the same way as catTime01 is treated in the regression on numTime1. Or, on the other hand, is it treated as some kind of latent variable (say as a logistically distributed variable with its variance defined in some particular way)? This comes down to how I interpret the coefficient for catTime12 – if catTime12 is treated as numeric then the interpretation is: as catTime12 increases by 1 unit – either from 0 to 1 or from 1 to 2 – then numTime2 changes by X units. But if it is treated as a latent variable then I need assistance with interpretation.
I apologize for my confusion. I posted a similar question under the categorical mediator thread on October 25, 2008 and received a response but thought I should double check this. See also March 25, 2008 responses on this same thread.
In this regression, the regression coefficient for Z is .5
Would the interpretation of this coefficient be: when Z has the value of 1, the predicted value of Y is .5 units higher than when Z has the value of 0? (In other words, in this regression, is Z treated as a numeric variable?)
I am interested both when Z is purely exogenous and when it is a mediator.
In my actual model, which models several waves of data, Z takes on both of these roles. Even though in the first wave of data Z is exogenous, I nevertheless defined it as categorical. (I didn't think it would do any harm to define a purely exogenous variable as categorical, but tell me if I am wrong.) My actual model is much like the following, where Z1 represents Z at time 1 and Z2 does so at time 2 (and Q1 and Q2 are continuous variables at times 1 and 2):
Use variables are: Z1 Z2 Q1 Q2; categorical are Z1 Z2; model: Q1 on Z1; Q2 on Q1 Z2; Z2 on Q1 Z1;
In regression, covariates can be binary or continuous. In both cases, they are treated as continuous. The binary variable z1 is exogenous. The regression coefficient is the difference in going from category 0 to category 1. Note that z1 is not a mediator and should not be on the categorical list. The binary covariate z2 is a mediator. It is treated as a continuous observed variable using maximum likelihood estimation and as a continuous latent response variable using weighted least squares estimation.
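The statement that the coefficient of a 0/1 covariate is the difference between the two categories can be checked with a small Python sketch. It uses a single-predictor least squares fit on made-up data (the values below are hypothetical, not from any poster's data set). With additional predictors in the model the coefficient becomes an adjusted difference rather than a raw difference in means.

```python
def ols_slope_intercept(x, y):
    """Least squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data: z is a binary covariate scored 0/1, q is a continuous outcome.
z = [0, 0, 0, 1, 1, 1]
q = [1.0, 2.0, 3.0, 2.5, 3.5, 4.5]

slope, intercept = ols_slope_intercept(z, q)

# Group means of q for z = 0 and z = 1.
mean_q0 = sum(qi for zi, qi in zip(z, q) if zi == 0) / z.count(0)
mean_q1 = sum(qi for zi, qi in zip(z, q) if zi == 1) / z.count(1)

print("slope:", slope)
print("difference in group means:", mean_q1 - mean_q0)
print("intercept:", intercept, "mean of q when z = 0:", mean_q0)
```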
gibbon lab posted on Tuesday, April 12, 2011 - 11:07 am
Hi Dr. Muthen,
If I have such a path analysis: z on y; y on x; where z and x are continuous and y is a binary variable (0/1). I declared that y is categorical and used the weighted least squares estimation under THETA parameterization. I am not sure how I should interpret the indirect effect from x to z. Suppose the Mplus output provides a beta coefficient=0.05 for the indirect path x->y->z. Is the 0.05 interpreted as the amount of increase in y if x increases by 1 unit? Thanks.
gibbon lab posted on Tuesday, April 12, 2011 - 11:10 am
Sorry. There is a typo in my last sentence in the above post. It should be "Is the 0.05 interpreted as the amount of increase in z if x increases by 1 unit?" Thanks.
gibbon lab posted on Thursday, April 21, 2011 - 11:22 am
Can you help me understand the interpretation of the indirect effect thru the two direct paths? Here is my thought, if the beta coef=0.5 for the first path x->y. Under THETA parameterization, the link is probit. So the interpretation is that the probability of y=1 increases by 0.5 under probit scale when x increases by 1 unit.
Now if the coef=0.1 for the second path y->z, the interpretation would be that z increases by 0.1 if y changes from 0 to 1 since this is just regular linear regression.
Do we get the beta coef for the indirect path by 0.5*0.1=0.05? It looks strange to me because x only affects the probability of y=1, and the effect of y on z (0.1) has nothing to do with that probability. Thanks.
I think y is your mediator, and it is binary. With WLSMV you then have a probit regression of y on x. This is a linear regression of y* (the continuous latent response variable) on x. Then you have y->z, where z is continuous, which is translated to z on y* in Mplus, also a linear regression. Note that it is not z on y.
To answer your first paragraph, it is not the probability that increases 0.5 but the probit, that is, y*. That then has to be translated into a probability change.
To answer your second paragraph, it is not a matter of y changing from 0 to 1, but y* changing.
The indirect effect is 0.5*0.1 but interpreted as above.
You should also read:
MacKinnon, D.P., Lockwood, C.M., Brown, C.H., Wang, W., & Hoffman, J.M. (2007). The intermediate endpoint effect in logistic and probit regression. Clinical Trials, 4, 499-513.
which is on our web site under Papers, Mediational Modeling.
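The numbers in this exchange (0.5 for y* on x, 0.1 for z on y*) can be worked through in a short Python sketch. The threshold of 0 and the unit residual variance of y* are assumptions made here for illustration; the thread does not report them. The sketch shows that the indirect effect is a change in z per unit of x, transmitted through y*, while the change in P(y=1) for the same one-unit step in x depends on where x starts.

```python
import math

def normal_cdf(v):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

a = 0.5    # slope of y* on x (from the thread)
b = 0.1    # slope of z on y* (from the thread)
tau = 0.0  # threshold for y; hypothetical, not given in the thread

# Indirect effect of x on z through the latent response y*.
indirect = a * b

def prob_y1(x):
    """P(y = 1 | x) = P(y* > tau | x), assuming residual variance 1."""
    return normal_cdf(a * x - tau)

print("indirect effect (z units per unit of x):", indirect)
for x0 in (0.0, 1.0):
    change = prob_y1(x0 + 1.0) - prob_y1(x0)
    print("x from", x0, "to", x0 + 1.0, "changes P(y=1) by", round(change, 4))
```

The same 0.5 shift in y* yields different probability changes at different starting values of x, which is why the 0.5 is read on the probit scale and not as a probability.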
Xu, Man posted on Monday, January 16, 2012 - 10:45 am
Dear Dr. Muthen,
Thank you so much for your recent help on TSCORE and missing data. It has been very, very helpful to me! I have now resolved my analysis regarding these issues. However, a new dilemma arises as I try to look at the mediation effect of an ordinal variable on the intercept and slope parameters of the second-order growth curve model. This growth curve was, as you know, already fitted using the TSCORE function. Now I have started looking at how predictors earlier in life can predict cognitive decline as modelled with the growth curve. This involves mediating variables, and one of them is ordinal (educational attainment).
I noticed that in the model output that regression paths of the earliest variables to this ordinal mediator had logit links, and there were linear regression paths from this ordinal mediator to the later outcomes (the slope and the intercept of the growth curve).
I don't get the mediation effect from the output as it was noted that MODEL INDIRECT is not possible in the case of TYPE=RANDOM.
I was just wondering, how do you think I should specify the model in order to calculate the mediation effects here?
Here are some estimation specifications that the programme has asked for: TYPE=RANDOM; INTEGRATION=MONTECARLO;
The estimator shown in the output was MLR. The link was logit.
You can treat your mediator as continuous and use MODEL CONSTRAINT to create the indirect effect as the product of the two linear regression coefficients. For another option, see the following paper, which is available on the website:
Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.
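As a rough illustration of the product-of-coefficients idea, the following Python sketch computes an indirect effect from two slopes and attaches a first-order delta-method standard error. The delta-method formula is an assumption of this sketch, not something stated in the thread, and all numeric inputs are hypothetical. The function name indirect_effect is made up for the example.

```python
import math

def indirect_effect(a, b, se_a, se_b, cov_ab=0.0):
    """Product a*b with a first-order delta-method standard error.

    a: slope of the mediator on the predictor
    b: slope of the outcome on the mediator
    cov_ab: sampling covariance of the two estimates (0 if unknown)
    """
    estimate = a * b
    variance = (b ** 2) * se_a ** 2 + (a ** 2) * se_b ** 2 + 2.0 * a * b * cov_ab
    return estimate, math.sqrt(variance)

# Hypothetical estimates and standard errors (not from any output in this thread).
est, se = indirect_effect(a=0.4, b=0.3, se_a=0.1, se_b=0.05)
print("indirect effect:", est)
print("approximate standard error:", se)
```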
Xu, Man posted on Saturday, January 21, 2012 - 7:08 am
Thank you! I will first try to have a read at this paper.
Jan Zirk posted on Monday, July 16, 2012 - 8:26 am
Dear Linda or Bengt, I have two competing models: M1) P on S; S on G; G on F; W on F. M2) P on S; S on G; F on G; W on F. G is a binary group variable. First I estimated the two models with MLR and found that model 1), in which G is defined as categorical, performs better on AIC than model 2), in which G is not defined as a categorical variable (because there G is only an independent binary variable). Next I estimated them with a Bayesian approach and again 1) fits better (M1 PPP=.125 vs. M2 PPP=.000), but there is a difference in the number of free parameters between these 2 models (M1: 11, M2: 12). Because of the categorical mediator in M1, DIC for this model is not available. Is a comparison of these 2 models based only on 95%CI_chi2 and PPP correct, even though they have different numbers of free parameters?
I think I may have figured out the cause of the problem (maybe). My x variable is highly skewed. When I converted it to a binary variable and re-ran both analyses (above) I get the same results (i.e., M on X significant in both cases).