Anonymous posted on Tuesday, March 25, 2003 - 2:32 pm
I am testing a mediation model in which my x variable and mediator variable are continuous, but my Y variable is dichotomous. Is there something like a Sobel test that can be used to calculate the mediated effect in this situation? Thanks!
bmuthen posted on Tuesday, March 25, 2003 - 2:39 pm
You can use the Sobel approach to test the mediated effect on the underlying y* variable for your ultimate, dichotomous outcome. You use the same approach as if y was continuous. The necessary variances and covariances for the estimates are found in Tech3.
Are you familiar with David MacKinnon's Psych Methods article from last year (issue 2)? He seems to find that the Sobel test is noticeably underpowered.
bmuthen posted on Wednesday, March 26, 2003 - 10:36 am
Yes, I think that could be a concern with small sample sizes.
Anonymous posted on Monday, November 15, 2004 - 7:19 pm
I ran a mediation model in which x and the mediator are continuous latent variables with categorical indicators and y is an observed categorical outcome. I specified type = missing and parameterization is theta. The WLSMV estimator was used. I have three questions: 1. Is it correct to do mediation using the IND statement in Mplus with a categorical observed dependent variable? 2. If so, in interpreting the mediated effect in Mplus output, am I correct in thinking the probit link function, rather than the logit link function, was used? 3. How do you interpret the parameter estimates Mplus outputs for the indirect and direct effects, given the probit function was used? The manual shows calculations to determine probabilities and odds ratios if the logit link function was used, but not the probit link function.
1. Yes. 2. The probit link is used with WLSMV. The logit link is used with maximum likelihood estimators in Mplus. 3. Any indirect effect that has a probit regression coefficient as part of it is interpreted as a probit regression coefficient. See Technical Appendix 1 for a description of how to calculate probabilities from a probit regression coefficient.
Anonymous posted on Tuesday, November 30, 2004 - 2:17 pm
Hi - thank you for your response. I wanted to follow up to make sure I am not completely misunderstanding mediation with probit regression coefficients. I think I calculated the probabilities correctly, but I am having difficulty understanding what the probabilities for the direct and indirect effects mean.
My independent variable is a continuous latent variable (DELINQ). My mediator is a continuous latent variable (EXPEC). My outcome is an observed binary variable (EVERALC).
The model results are:

                            Est     S.E.    Est/S.E.   Std     StdYX
Specific Indirect DELINQ    0.308   0.006   52.165     0.553   0.339
The threshold for EVERALC is .167
Here is what I did to convert probits to probabilities. I used the formula: Normdist(-Threshold + B1*(Mean on DELINQ))
Direct effect: Normdist (-.167 + .290*0) = .43 What I want this .43 to mean is the probability of EVERALC=1 after controlling for EXPEC and when DELINQ is at its mean. I get confused here because I am not sure that EXPEC is at its mean? Does this matter?
For the direct effect, if I calculate 1 standard deviation below the mean of DELINQ and 1 standard deviation above the mean, respectively, I get: .32 and .55.
Now for the indirect effect: Normdist (-.167 + .308*0) = .43 Mathematically, it makes sense that the answer is the same as the direct effect. But it's confusing conceptually. Am I missing something? I again calculated 1 standard deviation below the mean and 1 standard deviation above the mean on DELINQ and got: .32 and .56.
The model does not give me an estimate for the total effect. But can I assume the total effect is the direct effect plus the indirect effect? (.308 + .290 = .598).
Thank you for your help!
bmuthen posted on Tuesday, November 30, 2004 - 6:05 pm
There are a couple of issues here. First, you should ask for a standardized solution so that you get a printout of the residual variance for the "y*" variable behind Everalc (see tech app 1). This variance should be used in the calculations, dividing the Normdist argument by its standard deviation.
Second, the indirect and direct effects are on the y* variable behind Everalc to make the analogy with the continuous dependent variable case. So, you are asking about the effect on this y* variable of a 1-unit change in Delinq. This analogy is useful regarding your direct effect question, because with continuous dependent variables you would not ask at which value the mediator is held.
Third, note that in terms of probabilities, because of the non-linear probability curve, a certain change in y* has different impact on the probability change depending on where on the y* scale you are. This means that indirect and direct effects are not translatable into simple probability statements.
Others may want to chime in.
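The three points above can be illustrated numerically. What follows is a minimal Python sketch, not Mplus output. It reuses the threshold (.167) and direct-effect slope (.290) quoted earlier in this thread; the residual standard deviation `resid_sd` of y* is set to 1 purely as a placeholder assumption (the real value would come from the standardized solution), and the function name `prob_y1` is hypothetical.

```python
from statistics import NormalDist

def prob_y1(threshold, slope, x, resid_sd=1.0):
    """P(y = 1 | x) for a probit model on the latent response y*.

    resid_sd is the residual standard deviation of y*; 1.0 is a placeholder.
    """
    return NormalDist().cdf((-threshold + slope * x) / resid_sd)

threshold, slope = 0.167, 0.290  # values quoted in the thread

# Probability at the predictor mean (x = 0) and one unit below and above it.
p_low = prob_y1(threshold, slope, -1.0)
p_mid = prob_y1(threshold, slope, 0.0)
p_high = prob_y1(threshold, slope, 1.0)
print(round(p_low, 2), round(p_mid, 2), round(p_high, 2))

# The same one-unit shift in y* changes the probability by different amounts
# depending on where on the y* scale it starts.
step_near_centre = prob_y1(threshold, slope, 1.0) - prob_y1(threshold, slope, 0.0)
step_in_tail = prob_y1(threshold, slope, 6.0) - prob_y1(threshold, slope, 5.0)
print(step_near_centre > step_in_tail)
```

With `resid_sd` left at 1 this reproduces the .32, .43 and .55 reported for the direct effect, and the last comparison shows why an effect on y* does not map onto a single probability change.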
Anonymous posted on Wednesday, December 01, 2004 - 12:02 pm
Is there something I can do with the probit regression coefficients for the direct and indirect effects that would make them understandable/interpretable for the average reader?
bmuthen posted on Wednesday, December 01, 2004 - 3:43 pm
I think similar issues are studied by David MacKinnon at ASU who might chime in later in this discussion - or you can contact him directly.
Some of the problems with mediation in logistic and probit regression are described in MacKinnon and Dwyer (1993) Evaluation Review, 17, 144-158 and also Winship and Mare (1983). American Journal of Sociology, 89, 54-110. We have a paper under review now that more clearly describes the reason for the problem and a solution (MacKinnon, Lockwood, Brown, & Hoffman, 2004).
In general, however, simulation studies of the delta method (a.k.a. Sobel) and related standard errors for the X to M path from OLS and M to Y (adjusted for X) path from logistic or probit regression suggest that this method works in the same way (at least in terms of power and Type I error rates) as for continuous variables. An extensive simulation study was presented at the Society for Prevention Research 2003 meeting on this topic (MacKinnon, Yoon, and Lockwood, 2003). We are about to submit a paper on this. As mentioned by Patrick Malone, the delta method standard error test has lower power than a test based on the distribution of the product and resampling tests, for the categorical Y as well as the continuous Y.
Note that Mplus now has the bootstrap confidence limits for asymmetric CLs, so you could use confidence limits from the Mplus program with the specification of Y as a categorical variable. It seems sensible to me that the IND statement will be accurate with the WLSMV estimator and categorical variables. I believe that the standard errors for these effects are based on the multivariate delta method.
The question regarding the meaning of the probit coefficients is a good one and I don't have a complete answer. In the past, I have interpreted the coefficients as the change in the probit (z) latent variable for a one unit change in X, as described in Bengt's message. In logistic regression, e(coefficient) is the odds ratio, as you know, and that has a very clear interpretation. And computing predicted probabilities as suggested by Linda Muthen is a great way to summarize the overall model results. I suppose you could make different plots of predicted probabilities varying the value of one predictor, as anonymous has done. I think it is important to keep track of the means in this plot as it is usually a good idea to plot values that actually occurred in your data. Another option (but check the research literature first) is to convert the probit coefficients to logistic coefficients using the general formulas in Maddala and other places--then discuss the coefficients in terms of odds ratios. There is probably a study somewhere that evaluated how accurate this was--the accuracy probably depends on whether the probits are in the tails of the distribution where the logit and probit differ the most.
I'm also testing mediation with dichotomous outcomes and I'm at my wit's end trying to figure out how to translate the probit coefficients into probabilities.
My model has a dichotomous outcome - Adult sexual assault between two time points (ASA).
There are three primary predictors: Child sexual abuse (CSA) - dichotomous Psychological Distress (PD) - continuous Sex motives (SM) - continuous
I also have 5 covariates in the model.
I have two indirect paths in my model - CSA to PD to SM to ASA CSA to PD to ASA
I am using WLSMV estimation.
First question - which I believe was answered above by David MacKinnon - are the significance tests for the indirect paths (using model indirect) appropriate with probit coefficients?
Second question: how do I translate the probits into probabilities? Let me tell you what I have done - I have looked at Technical Appendix 1 referenced by Linda Muthen above - this may be enlightening for some people but it is not for me – it seems like equation 9 may be the one to use but it is not clear to me what to plug into it. I have obtained several Sage pubs on this issue to try to understand it - Liao's Interpreting Probability Models is helpful and there is a formula on page 23 demonstrating how to translate probits into probabilities that I understand, and I was able to figure out in SPSS how to use the cumulative normal distribution function to translate a z score into a probability. The problem is that the formula in Liao is as follows:
The problem I am having translating this is that when I look at my printout from my SEM model - I don't have a constant like I would in a traditional regression so this formula must be modified somehow.
I looked at the post above with the formula for the example with everalc but again - I'm missing something - what is the threshold and where do you get it to plug into this equation?
Hopefully my question, and the steps I have taken to try to resolve this on my own, are clear. These analyses are for a manuscript that I am trying to get back ASAP to a journal - although I know that rerunning my analyses as I have with the WLSMV with categorical outcomes is the most appropriate analysis because I have a low base-rate dichotomous outcome, these new analyses were not requested by the reviewers and I am worried that I’m going to have to dump these new analyses and go back to my previous analyses (where ASA was treated as a continuous outcome) if I can't figure out how to report the probits as probabilities so that the reader (and I) can understand them.
I would appreciate any assistance and some hand-holding on these calculations. Thanks in advance.
BMuthen posted on Thursday, January 20, 2005 - 7:55 pm
The general formula for translating probit coefficients into probabilities is:
prob (y=1) = f (-threshold + b1*x1 + b2*x2 ...),
where f is the cumulative normal distribution function. Take the threshold value from the output (it is the negative of the intercept constant that the article used).
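The general formula above can be written out as a short Python sketch. This is only an illustration of the arithmetic, not Mplus code: the function name `probit_probability` and all numeric values are hypothetical, and the residual variance of y* is assumed to be 1.

```python
from statistics import NormalDist

def probit_probability(threshold, coefs, values):
    """prob(y = 1) = f(-threshold + b1*x1 + b2*x2 + ...).

    f is the standard normal cumulative distribution function; coefs are the
    probit slopes and values are the predictor values at which to evaluate.
    """
    if len(coefs) != len(values):
        raise ValueError("need one predictor value per coefficient")
    linear_part = -threshold + sum(b * x for b, x in zip(coefs, values))
    return NormalDist().cdf(linear_part)

# Hypothetical threshold and slopes, for illustration only.
baseline = probit_probability(threshold=0.5, coefs=[0.4, -0.2], values=[0.0, 0.0])
shifted = probit_probability(threshold=0.5, coefs=[0.4, -0.2], values=[1.0, 0.0])
print(round(baseline, 4), round(shifted, 4))
```

Note that the threshold enters with a minus sign, so a positive threshold in the output lowers the probability when all predictors are at zero.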
Thanks for the quick response - I am having trouble locating the threshold value from the output - I don't see it anywhere in my standard output. I then added basically every output option I could (many of which weren't valid anyway) and I still didn't see it anywhere. Sorry. Here is my input:
VARIABLE: names are csa t2agecs t3intv dep hos anx sxc1 sxc2 sxc3 asa
    t1racbnb t1racwnw t2rses0 t2psych nt2phys0;
  usevariables = t2agecs t3intv t1racbnb t1racwnw t2rses0 t2psych nt2phys0
    csa dep hos anx sxc1 sxc2 sxc3 asa;
  categorical are asa;
You have to add MEANSTRUCTURE to the TYPE option of the ANALYSIS command. Then you will find the thresholds in the results under Thresholds. If this doesn't solve the problem, send the full output to email@example.com and briefly explain your problem.
the hyphen in front of threshold - should i treat that as a negative - specifically, the threshold number in my print out is .753 - in this formula - is that then "negative .753" - i just want to be perfectly clear. thank you!!
bmuthen posted on Sunday, January 23, 2005 - 3:31 pm
Yes, "-threshold" means the negative of the threshold.
thank you for clarifying - i'll have a go at the calculations then! thanks!
Mary posted on Wednesday, April 13, 2005 - 8:37 am
I am working on a model in which X is a mediator between A and B. Hence, I am looking at the indirect relation (IND) going from A to B through X. I know that the coefficient of the indirect relation comes from the product of the coefficients of the regressions of 'X ON A' and 'B ON X'. What is the interpretation of the indirect relation? What does it mean if it is negative?
BMuthen posted on Wednesday, April 13, 2005 - 11:17 pm
The indirect effect is interpreted as follows. For a one unit change in a, the variable b changes by the value of the indirect effect. If a increases by one, then b decreases if you have a negative indirect effect.
I have a question about a mediation model within a standard path analysis where I have an ordinal outcome measure with a clear bimodal distribution. My instinct is to treat this variable as a dichotomous variable, although I know that this will cost me variance and is generally discouraged. I have followed this discussion and see that probit regression coefficients are output for dichotomous outcomes and that reporting probabilities is a good general way of summarizing the indirect/direct effects. Does anyone have a suggestion of the best way to proceed with an outcome measure with a bimodal distribution?
BMuthen posted on Tuesday, April 19, 2005 - 11:55 am
It sounds like a reasonable approach to dichotomize the variable as you suggest. If you believe that the two modes correspond to the means of different latent classes of individuals, you could instead try mixture modeling and treat the outcome as ordinal or continuous.
I do believe that the modes correspond to means of two different classes of individuals, but it might be hard for me to justify this theoretically given the lack of data on the population sampled. The ordinal measure ranges from 0-5 with different ranges for each level (2 = 5-10 years, but 4 = greater than 20, but not forever). The modes are 2 (32%) and 4 (29%). If I split the distribution into a dichotomous variable I get a 56.5/43.5 split.
In terms of the mixture modeling suggestion, I think I will try this. However, is it a problem to have 1 observed indicator for a 2 class latent variable or do I treat each level as its own indicator? And would the suggestions in terms of reporting the results of a mediation model still apply?
bmuthen posted on Tuesday, April 19, 2005 - 1:29 pm
You can have 2 classes and only 1 indicator when you have covariates in the model as you do. The mediational aspect of the model gets more complex to report with a mixture although is perhaps more realistic.
Anonymous posted on Saturday, May 21, 2005 - 2:52 pm
In Mplus, how do I test whether indirect effects are significant or not? Could you please provide a formula for the calculation? And, for such a test, is there any difference between the case when the mediator variable is a binary variable and the case when the mediator variable is a continuous variable? Thanks!
I'm not sure if you are aware of MODEL INDIRECT. This is how indirect effects are tested in Mplus. There is a MacKinnon reference in the user's guide under MODEL INDIRECT that describes the Mplus indirect effects computations.
Anonymous posted on Wednesday, August 17, 2005 - 9:26 am
As far as I understand, the coefficients for a relationship with a binomial or categorical outcome are probit coefficients, whereas relationships between two continuous variables are OLS regression coefficients. If a model contains both kinds of relationship, then I have two different kinds of coefficients. Is it allowed to compare the standardized values in the sense of "this variable has a greater relative effect on y than the other one", although they are not the same?
BMuthen posted on Wednesday, August 17, 2005 - 2:14 pm
I don't think standardized probit and standardized linear regression coefficients are comparable. Although they are comparable in terms of the latent response variable underlying the categorical outcome, the former ultimately relate to a probability whereas the latter do not.
I am using mediational analysis with a dichotomous dependent variable and I am not sure which approach (i.e. SEM or logistic regression) I should use. Could you please let me know which is the best approach to use and why?
I'm not sure what you mean by the distinction "between SEM and logistic regression".
In Mplus, there are two estimation options for a dichotomous dependent variable. With the weighted least squares estimator, you can estimate a probit regression. With maximum likelihood, you can estimate a logistic regression.
bmuthen posted on Wednesday, December 07, 2005 - 7:56 am
To add to Linda's answer, here is what I just posted on SEMNET:
Both Colin and Istvan's posts concern a dependent observed variable that is binary. Colin has a factor model predicting discrimination or not and Istvan has a path model where the ultimate dependent variable is survival or not. Istvan's model also has a mediating variable of colouration which perhaps is categorical. These applications gave rise to a discussion of analysis using SEM and logistic regression, where it wasn't clear what was possible. Latent class analysis also came up but is not relevant because there is no latent categorical variable involved.
Because of this, it is worthwhile to make clear that both Colin's and Istvan's examples can be analyzed by Mplus since Version 3 came out in March 2004. If by conventional SEM one means analysis using continuous outcomes, conventional SEM can therefore be combined with logistic regression features, still using maximum-likelihood estimation. The categorical dependent variable can be an ultimate dependent variable or a mediating variable. Several examples of related types are given in the Mplus User's Guide and the examples can be seen at http://www.statmodel.com/ugexcerpts.shtml. There are also free web videos with these kinds of examples to watch at http://www.statmodel.com/trainhandouts.shtml
Annonymous posted on Saturday, February 04, 2006 - 2:33 pm
In Mplus version 3, is the significance test for indirect effects based on the Sobel method [a*b/SQRT(b^2*sa^2 + a^2*sb^2)] or the MacKinnon method (a*b/standard error of a*b)?
bmuthen posted on Sunday, February 05, 2006 - 5:04 pm
Mplus uses the standard error of a*b. I think you refer to the fact that in some writing the covariance between the a and b estimates is excluded - my understanding is that this is done for special simple models where this covariance is in fact zero. Mplus cannot exclude this covariance because it allows for general modeling. Mplus also provides bootstrap standard errors as well as bootstrap confidence intervals for cases where the a*b distribution is not close to normal.
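The role of the covariance term can be seen in a small worked example. The Python sketch below uses the first-order delta-method variance of a product, b^2*var(a) + a^2*var(b) + 2*a*b*cov(a,b), which reduces to the Sobel formula quoted above when the covariance is zero. It is an illustration under assumed inputs, not a reproduction of the Mplus computation: the function name `indirect_effect_test` and every numeric value are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def indirect_effect_test(a, b, se_a, se_b, cov_ab=0.0):
    """First-order delta-method test of the indirect effect a*b.

    With cov_ab = 0 the standard error is the Sobel standard error.
    Returns (indirect effect, standard error, z, two-sided p-value).
    """
    indirect = a * b
    variance = b**2 * se_a**2 + a**2 * se_b**2 + 2 * a * b * cov_ab
    se = sqrt(variance)
    z = indirect / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return indirect, se, z, p_two_sided

# Hypothetical estimates, for illustration only.
sobel = indirect_effect_test(a=0.5, b=0.4, se_a=0.1, se_b=0.1)
with_cov = indirect_effect_test(a=0.5, b=0.4, se_a=0.1, se_b=0.1, cov_ab=0.002)
print(sobel)
print(with_cov)
```

In this made-up case a positive covariance between the two estimates enlarges the standard error and shrinks z, which is one way a hand-computed Sobel value can differ from program output.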
Annonymous posted on Wednesday, February 08, 2006 - 10:29 am
In my model, the outcome is dichotomous, the predictor variable is ordinal, and the mediating variable is continuous latent. The estimator is WLSMV and this is a probit regression. In trying to sort out how to calculate the interpretation of the parameters (given that it is probit not logistic), I have been confused by references to the 'threshold' values - in this type of model, should I not be using the model estimated slope plus the Beta estimates for the covariates (multiplied by the values of X)?
Andrew posted on Wednesday, February 08, 2006 - 1:07 pm
What is the meaning of negative thresholds in models involving ordinal data? I am running a CFA with ordinal items using a WLSMV estimator. Is this something I should be concerned about?
bmuthen posted on Wednesday, February 08, 2006 - 6:26 pm
Answer to Anonymous: To get the indirect effect you should use the product of the slopes. The threshold is the same as a negative intercept and is therefore not used for indirect effect calculations.
bmuthen posted on Wednesday, February 08, 2006 - 6:27 pm
Answer to Andrew. Negative thresholds are fine. The analogy is an intercept - it can have both positive and negative values.
Annonymous posted on Monday, February 13, 2006 - 11:55 am
For the calculation of probit probabilities as outlined in a previous post [prob (y=1) = f (-threshold + b1*x1 + b2*x2 ...)], are the 'b' values the unstandardized parameter estimates?
Yes, the b values are the unstandardized probit regression coefficients.
Annonymous posted on Tuesday, February 14, 2006 - 8:57 am
I have a model in which:
independent is categorical
mediator1 is categorical
mediator2 is latent
dependent is binary
Due to the binary dependent variable, probit regression used. I assume that the parameters associated with dependent variable and mediator1 are probit regressions.
Is the relationship between categorical independent and latent continuous mediator2 simply a linear regression?
if so, using the standardized parameter estimates, can i compare the 'magnitude' of the two indirect pathways given that one is composed of 2 probit paths and the other is part probit part linear regression?
Annonymous posted on Tuesday, February 14, 2006 - 9:26 am
Follow up: for mediator1, the categorical variable, is there any way of testing if the slopes are equal between category levels?
bmuthen posted on Tuesday, February 14, 2006 - 5:08 pm
Answers to your 3 questions:
1. that's a correct assumption.
3. that is a stretch given that coefficients have different meaning.
Regarding your follow-up question, I thought your mediator 1 was an ordered polytomous variable which therefore has only 1 slope.
Annonymous posted on Wednesday, February 15, 2006 - 7:20 am
Ok...re: point number 3, I should just ignore the output that is generated with the model indirect option then?
bmuthen posted on Wednesday, February 15, 2006 - 7:59 am
I would not ignore the indirect output since you can certainly take a separate look at each of the two indirect effects and see how large they are. The "stretch" I mentioned concerns comparing the two indirects, but this is ok if you accept the latent response variable conceptualization for the categorical mediator; then both indirect effects go via a continuous mediator.
Annonymous posted on Friday, February 17, 2006 - 1:31 pm
re: I thought your mediator 1 was an ordered polytomous variable which therefore has only 1 slope.
it is an ordered polytomous variable (4 levels). there are three threshold values listed in the output for this variable, with $1, $2, and $3 appended to the variable name. what i am trying to do is calculate the probability values associated with the mediator (as a dependent variable) in relation to the value of the independent variable.
given that there are three threshold values provided, am I supposed to calculate three separate equations for the probabilities? if so, does this mean that all of the probabilities are 'relative' to the lowest level of the dependent variable, since it appears to have been excluded?
I am assuming you are using weighted least squares estimation.
For an ordered categorical (ordinal) dependent variable with three categories, the probit regression model expresses the probability of u given x using the two thresholds t1 and t2 and the single probit regression coefficient b,
P (u = 0 | x) = F (t1 - b*x),
P (u = 1 | x) = F (t2 - b*x) - F (t1 - b*x),
P (u = 2 | x) = F (- t2 + b*x),
where F is the standard normal distribution function.
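The three-category formulas follow one pattern: each threshold gives a cumulative probability F(t - b*x), and each category probability is the difference between neighbouring cumulative probabilities. A minimal Python sketch of that pattern for any number of ordered categories is given below. The function name `ordinal_probit_probs` and the numeric values are hypothetical, and the sketch assumes the latent response has unit residual variance.

```python
from statistics import NormalDist

def ordinal_probit_probs(thresholds, b, x):
    """Category probabilities P(u = k | x), k = 0..K-1, for an ordinal probit.

    thresholds holds the K-1 ordered thresholds t1 < t2 < ...; b is the single
    probit slope shared by all categories.
    """
    F = NormalDist().cdf
    cumulative = [F(t - b * x) for t in thresholds]  # P(u <= k | x)
    bounds = [0.0] + cumulative + [1.0]
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]

# Hypothetical thresholds and slope, for illustration only.
three_cat = ordinal_probit_probs([-0.5, 0.3], b=0.6, x=1.0)
four_cat = ordinal_probit_probs([-0.5, 0.3, 1.2], b=0.6, x=1.0)
print(three_cat)
print(four_cat)
```

The last category comes out as 1 - F(t - b*x) for the highest threshold, which equals F(-t + b*x), so no category is left out and the probabilities sum to one.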
Annonymous posted on Tuesday, February 21, 2006 - 7:13 am
Ok, thanks. There are four levels to this ordinal variable, though - so is the next level to this equation
p ( u = 4 | x) = F (-t3 + b*x) ?
sorry for all the questions - can you recommend any particularly good resources?
bmuthen posted on Tuesday, February 21, 2006 - 3:19 pm
In that case the u=0 and the u=1 probs are as before, while P (u = 2 | x) = F (t3 - b*x) - F (t2 - b*x) and P (u = 3 | x) = F (- t3 + b*x).
bmuthen posted on Tuesday, February 21, 2006 - 3:20 pm
A good intro ref is
Agresti, A. (1996). An introduction to categorical data analysis. New York: John Wiley & Sons.
annonymous posted on Thursday, February 23, 2006 - 6:13 am
If the outcome variable is dichotomous, and one of the predictor variables is a latent continuous variable derived from 2 categorical variables, what values would I substitute in to calculate probabilities in the case of the latent variable? would it still be in units of 'one'?
I work with Mplus and have a few questions that I would like to inquire about.
(1) Can I compare the direct effects from a model with mediation effects to the ones of the same model but without the mediation variables? If so, can I conclude anything about the mediation effect based on this comparison?
(2) I have a model where the indirect effect ABC is significant but weak, the direct effect AB is non-significant, and the direct effect AC is significant. Can I conclude that a mediation effect exists? What if AB is significant and AC is not?
(3) How should I interpret the following result: a negative indirect effect (ABC) and a positive direct effect (AC). Is this what we call a suppressor effect?
Thank you very much for your help. I promise I will acknowledge you at my dissertation oral defense. Best Regards,
re: post by bmuthen on Wednesday, February 15, 2006 - 7:59 am "I would not ignore the indirect output since you can certainly take a separate look at each of the two indirect effects and see how large they are. The "stretch" I mentioned concerns comparing the two indirects, but this is ok if you accept the latent response variable conceptualization for the categorical mediator; then both indirect effects go via a continuous mediator."
Could you recommend a reference supporting the examination of indirect paths (but not the direct comparison of their coefficients) in a model where the mediating variables have different scales (i.e. latent vs. ordinal)?
Regarding the earlier question about interpreting probit regression coefficients, I have a situation with a latent variable measured by two continuous indicators predicting a dichotomous mediator which in turn predicts a continuous outcome. Because the mediator is dichotomous, we are using the THETA parameterization to fit this model with WLSMV estimation. Would it be appropriate to multiply the probit coefficient by the constant 1.7 and then raise e to the power of the resulting value to obtain an approximate odds ratio for this effect?
This would not be appropriate. By multiplying a probit coefficient by 1.7, you put it on the logit scale. You don't make it a logistic regression coefficient. You cannot turn a probit coefficient into an odds ratio because it does not have constant odds as a logistic regression coefficient does.
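The point about constant odds can be checked numerically. The Python sketch below compares the odds ratio for a one-unit increase in x under a probit model and under a logit model, at several starting values of x. The helper names and the slope and threshold values are hypothetical, chosen only to illustrate the contrast.

```python
from math import exp
from statistics import NormalDist

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1 - p)

def probit_odds_ratio(threshold, b, x):
    """Odds ratio for moving from x to x + 1 under a probit model."""
    F = NormalDist().cdf
    return odds(F(-threshold + b * (x + 1))) / odds(F(-threshold + b * x))

def logit_odds_ratio(threshold, b, x):
    """Odds ratio for moving from x to x + 1 under a logit model."""
    def p(v):
        return 1 / (1 + exp(-(-threshold + b * v)))
    return odds(p(x + 1)) / odds(p(x))

# Hypothetical slope and threshold; evaluate at three starting points.
probit_ors = [probit_odds_ratio(0.0, 0.5, x) for x in (-2, 0, 2)]
logit_ors = [logit_odds_ratio(0.0, 0.5, x) for x in (-2, 0, 2)]
print([round(v, 3) for v in probit_ors])
print([round(v, 3) for v in logit_ors])
```

The logit odds ratios all equal exp(0.5), while the probit ones change with the starting value of x, so no single number summarizes the probit effect as an odds ratio.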
Anna Song posted on Tuesday, December 19, 2006 - 2:54 pm
Dear Muthen and Muthen,
Regarding the formula to calculate probit probabilities:
prob (y=1) = f (-threshold + b1*x1 + b2*x2 ...)
would it be appropriate to apply the formula to generate probability values based on an indirect effect estimated via the probit WLSMV estimator? In other words, is the indirect effect coefficient a probit coefficient?
If it is not appropriate, how would you suggest presenting the indirect effect?
If an indirect effect is the product of two probit regression coefficients, the indirect effect is a probit regression coefficient.
Anna Song posted on Tuesday, December 19, 2006 - 8:12 pm
Thank you so much. Another question we had was how to interpret the model as a whole. We have a categorical predictor variable (x), 2 categ. mediators (m1, m2) & a dichot endogenous variable (y). We used WLSMV. Our results are as follows:
RMSEA = 0.0, WRMR = 0.17

MEANS/INTERCEPTS/THRESHOLDS
Y = .80, M1 = 1.83, M2 = 2.30

Coefficient (S.E.)
M1 on X: 0.13 (0.02)
M2 on X: 0.07 (0.03)
Y on M1: 1.09 (0.09)
Y on M2: 0.99 (0.09)
Intercepts: M1: 1.83 (0.04); M2: 2.30 (0.05)
Thresholds: Y: 5.73 (0.40)

Indirect effects
X, M1, Y: 0.15 (0.03)
X, M2, Y: 0.07 (0.03)
Total: 0.21 (0.05)

We calculate: p|y=1;x=3| = f(-0.8 + 0.21*3) = 43.25. Would this be correct?

Also, we are interpreting these results to mean X increases M1 & M1 increases Y. Y increases 1.09 units on a cumulative normal curve (or z-score units) for each unit increase in M1. X increases M2 & M2 increases Y. Y increases 0.99 z-score units for each unit increase in M2. Y increases 0.15 z-score units for each unit increase in X (due to M1). Y increases 0.07 z-score units due to X on M2. The total indirect effect of X on Y is .21. Would our interpretations be correct? Any feedback on this would be greatly appreciated.
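The arithmetic in this post can be re-done in a few lines of Python. The sketch below only recomputes quantities from the rounded numbers printed above; it makes no claim about which threshold value is the right one to use, and it assumes the formula prob (y=1) = f (-threshold + b*x) with a unit residual variance for y*.

```python
from statistics import NormalDist

# Rounded estimates as printed in the post.
a1, a2 = 0.13, 0.07  # M1 on X, M2 on X
b1, b2 = 1.09, 0.99  # Y on M1, Y on M2

# Each specific indirect effect is the product of its two slopes.
indirect_m1 = a1 * b1
indirect_m2 = a2 * b2
total_indirect = indirect_m1 + indirect_m2

# The poster's probability calculation: threshold 0.8, total indirect 0.21, x = 3.
p = NormalDist().cdf(-0.8 + 0.21 * 3)

print(round(indirect_m1, 2), round(indirect_m2, 2), round(total_indirect, 2))
print(round(p, 4))
```

The probability comes out at about 0.4325, so the 43.25 in the post reads as a percentage. The product for the M1 path from the rounded slopes is about 0.14 rather than the printed 0.15, a gap that may simply reflect rounding of the printed coefficients.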
Boliang Guo posted on Wednesday, December 20, 2006 - 1:43 am
For categorical Y and/or M, you should rescale a and b when computing mediation effects; see Prof. MacKinnon's 1993 paper (Evaluation Review). Maybe one of Muthen's papers (Psychometrika 1984) also mentioned the categorical Y and/or M? Alternatively, you can refer to Huang's 2004 paper published in Statist. Med. (2713-2728).
Tor Neilands posted on Wednesday, December 20, 2006 - 11:30 am
If Anna were to use the standardized coefficients instead of the unstandardized coefficients, would the need to rescale be obviated?
I don't think you need anything but the product of unstandardized probit coefficients, which is what you get from Model Indirect in Mplus. MacKinnon et al. have a paper under review that maybe we can share shortly.
...
CATEGORICAL IS Y1 Y2;
ANALYSIS: PARAMETERIZATION=THETA;
MODEL: Y2 ON X1 X2;
Y1 ON X1 X2 X3 Y2;
Y1 is a binary variable, and Y2 is an ordinal variable. When I interpret the coefficients, can I interpret them as "the changes of the latent continuous variable underlying" Y2 (in the first equation) and Y1 (in the second equation)? For example, if the coefficient of X1, a continuous variable in the first equation, is .2, can I say one unit increase in X1 will lead to .2 increase in the continuous latent variable underlying Y2? Thanks!
Yes, you can say that. You can also look at sign and significance of the parameter estimate and convert it to a probability.
chris dawes posted on Tuesday, February 05, 2008 - 11:14 am
I am working on a mediation model with a random intercept in which the mediator and dependent variables are dichotomous. I am modeling this as:
VARIABLE: NAMES ARE Y M X ID;
CATEGORICAL ARE Y M;
WITHIN IS X;
CLUSTER = ID;
ANALYSIS: TYPE = TWOLEVEL;
ALGORITHM = INTEGRATION;
INTEGRATION = MONTECARLO;
LINK = PROBIT;
MODEL:
%WITHIN%
M ON X;
Y ON X M;
%BETWEEN%
I know I have to calculate the Sobel test by hand, but are the coefficients and standard errors comparable across the two equations or do they need to be rescaled?
They are comparable. I am not, however, aware of an article that discusses the case of mediation when both distal (y) and mediator (m) are categorical, only when y is. MacKinnon at ASU might know - his book is now out.
chris dawes posted on Tuesday, February 05, 2008 - 2:18 pm
Thank you! One quick followup. In the code above, do I need to include M in the WITHIN statement (line 3) since it is an independent variable in the second model?
It sounds like you are asking if you should have M included in your statement
Y ON X M;
if so, yes.
Note, however, that both M and Y are allowed variation on both Within and Between (see UG chapter 9), so you want your Between part of the model to relate these variables. As it is, I think you will find only their variances estimated on Between with covariance=0.
How does Mplus calculate the standard error of estimates of indirect effects? According to one of the messages posted in 2006, Mplus uses MacKinnon's standard error of a*b. Does this mean that Mplus calculates the standard error from the empirical distributions of a*b?
I just realized that the standard error printed in Mplus output is different from the standard error I calculated using Sobel's delta method.
The standard errors for indirect effects are Delta method standard errors unless bootstrap is requested. If you have discrepancies, please send your input, data, output, hand calculations, and license number to firstname.lastname@example.org.
This discussion site is excellent but I have a few questions that I am not entirely clear on.
I am testing a mediation model: X – M – Y
X is a latent variable (3 continuous indicators); M is an observed continuous variable (scale 0-4); Y is dichotomous.
1. Using the delta method and model IND: is the standardised (STDYX) specific indirect estimate the estimate of the z test (Sobel test)?
2. However, the indicators of my latent X variable deviate from normality, so I think I should use bootstrapping. Or is it OK to report the Sobel z, since the WLSMV estimator is used (which is robust to non-normality, I think)?
a. If I apply bootstrap and report the 95% CINTERVAL, how can I determine whether this estimate is significant?
3. Finally, as both my direct and indirect effects are probit coefficients is it best to calculate the probability for (1) mean of X latent variable and plus and minus one standard deviation and (2) mean of M variable and plus and minus one standard deviation when explaining my model?
I mean change the sign. That means if it is positive change it to negative, If it is negative change it to positive.
Emily Blood posted on Monday, December 22, 2008 - 2:00 pm
Hi, I have a latent growth curve with binary outcomes and mediation between repeated predictor and repeated outcome. I am using the logit link and MLR estimation and running a monte carlo simulation (data generated outside of Mplus). The INDIRECT option does not work with this type of estimation so based on an earlier post from Dr. Muthen I'm using the MODEL CONSTRAINT command to obtain the total effect between the predictor and outcome. I do get the mean value of the total effect of the monte carlo sample, but my problem is that the correct population value is not being used in calculating the 95% coverage. I put in the true population values for the direct and indirect path parameters which =0.3, but the true population parameter value is being indicated as 0.5 in the output. Can this be corrected? A starting value (specified with "*0.3" in the MODEL CONSTRAINT command) is ignored. Thanks!
The starting value is given as part of the NEW option, for example,
Emily Blood posted on Tuesday, December 23, 2008 - 7:08 am
Again, I have mediated binary growth curve. The MacKinnon and Dwyer 1993 paper indicates that both the a and b need to be divided by the v(Y*) before being multiplied to obtain the value of the indirect effect, however, above it has been stated that just using the product of the unstandardized a and b is all that is needed to obtain the value of the indirect effect (and that this is what is given by the indirect statement). When this is not done the a*b estimates are all positively biased (Table 2 of their article). Which of these is correct--or when does each apply? The reason I ask is that I have generated data with known a and b and am estimating the total and indirect effect and all of my estimates (obtained from MODEL INDIRECT or MODEL CONSTRAINT depending on if I use probit or logit link) are positively biased. I wonder if I'm not calculating the indirect and total effect correctly? Any insight you could give would be greatly appreciated. Thanks, Emily
MacKinnon, D.P., Lockwood, C.M., Brown, C.H., Wang, W., & Hoffman, J.M. (2007). The intermediate endpoint effect in logistic and probit regression. Clinical Trials, 4, 499-513.
In summary, for two fixed effects the indirect effect is the product of a and b. For two random effects, the indirect effect is the product of a and b plus their covariance. For one fixed effect and one random effect, the indirect effect is the product of a and b.
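The three cases in this summary can be written as one small Python helper. This is only a sketch of the rule as stated above; the function and argument names are made up for illustration.

```python
def indirect_effect(a, b, cov_ab=0.0, both_random=False):
    """Indirect effect following the summary above.

    a, b        : mean (or fixed) path coefficients
    cov_ab      : covariance between the random a and b slopes
    both_random : True only when both a and b are random effects
    """
    if both_random:
        return a * b + cov_ab
    # Two fixed effects, or one fixed and one random: plain product.
    return a * b


# Hypothetical values, for illustration only.
print(indirect_effect(0.3, 0.5))
print(indirect_effect(0.3, 0.5, cov_ab=0.02, both_random=True))
```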
Emily Blood posted on Tuesday, December 23, 2008 - 9:42 am
I have read their more recent article which seems to indicate that I would use the a*b in my case where a and b are the unstandardized regression coefficients. However when doing this, I am getting all positive biases so I was wondering if I was missing something.
I am running a model that perfectly matches the description of PATH ANALYSIS WITH A CATEGORICAL DEPENDENT VARIABLE AND A CONTINUOUS MEDIATING VARIABLE WITH MISSING DATA as mentioned in your manual.
However, about 60 of the 178 cases are missing on the mediating variable. Nonetheless, I notice that the analysis seems to be based on the full sample of 178, as this is the sample size indicated. The ESTIMATOR = MLR and INTEGRATION = MONTECARLO. Are the missing values being imputed somehow? And how reliable are these results with about 1/3 of the data missing? Can I just use this procedure for an empirical publication?
Missing variables are not being imputed in the Mplus ML estimation. The analysis uses the "MAR" assumption of the classic Little & Rubin (2002) missing data book. MAR uses all available information such as individuals with scores on only x and y, not m. The sample size listed is the total number of subjects who contribute to the analysis.
1/3 of the data missing is a lot and means that there can be many reasons that MAR does not hold. This amount of missingness would/should raise questions by a journal and needs to be discussed. As a first step you should see if the means and variances of y and x are different for those with data on m versus the others. And you should ask yourself if the values of x, or y, are related to the missingness on m. It's a big topic, but there is good literature - see also overviews in Psych Methods by Graham, Schafer and others.
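The suggested first step (comparing the means and variances of x and y for cases with and without data on m) can be done outside Mplus. A minimal Python sketch follows; it assumes the data are held as a list of dicts with None marking a missing score. The variable names and the toy data are made up.

```python
from statistics import mean, variance


def compare_by_missingness(rows, missing_key="m", compare_keys=("x", "y")):
    """Mean and sample variance of each compare_key, split by whether
    missing_key is observed or missing (None)."""
    observed = [r for r in rows if r[missing_key] is not None]
    missing = [r for r in rows if r[missing_key] is None]
    summary = {}
    for key in compare_keys:
        obs_vals = [r[key] for r in observed]
        mis_vals = [r[key] for r in missing]
        summary[key] = {
            "m_observed": (mean(obs_vals), variance(obs_vals)),
            "m_missing": (mean(mis_vals), variance(mis_vals)),
        }
    return summary


# Toy data, for illustration only.
rows = [
    {"x": 1.0, "y": 0, "m": 2.0},
    {"x": 2.0, "y": 1, "m": 3.0},
    {"x": 3.0, "y": 1, "m": None},
    {"x": 5.0, "y": 0, "m": None},
    {"x": 2.0, "y": 0, "m": 1.5},
    {"x": 4.0, "y": 1, "m": None},
]
for key, groups in compare_by_missingness(rows).items():
    print(key, groups)
```

Large differences between the two groups would suggest that missingness on m is related to x or y, which is the question raised in the reply above.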
A colleague and I have estimated a mediation model with continuous exogenous variables, mixed (i.e., both binary and continuous) mediators and a dichotomous outcome using Mplus v.5.2. We applied WLSMV estimation in order to get estimates of the indirect effects. A reviewer asks us to explain why we resorted to a probit instead of a logit link.
Two questions: 1. From my understanding of the literature (e.g., the mentioned MacKinnon et al. 2007 paper), computation of indirect effects is basically feasible and valid for both logit and probit regressions. Do you agree? 2. In preparation of our reply to the reviewer, we are wondering why Mplus calculates indirect effects exclusively for WLS but not for ML estimation. Are there computational or technical reasons for this?
Thank you so much in advance, Oliver Arránz Becker
In principle, indirect effects are feasible for both probit and logistic regression. In Mplus, indirect effects are computed for weighted least squares probit models. The reason is that the mediator is treated as a u* latent response variable when it is a dependent variable and when it is an independent variable. This makes the computation of an indirect effect correct.
In Mplus, when maximum likelihood is used for probit and logistic regression, the mediator is treated as u* when it is a dependent variable and u when it is an independent variable making it incorrect in Mplus to create an indirect effect.
We have a recursive path model (all observed) with a dichotomous dependent variable. The IVs are continuous or dummy-coded, and we have one mediating variable that we can operationalise as either continuous or ordinal (preferably the latter).
a) We use the WLSMV estimator. If we choose an ordinal mediator we then have all probit coefficients; if we choose a continuous mediator, we have a mix of OLS and probit coefficients.
b) The probit coefficients in the pathmodel refer to the latent continuous variables, so it is possible to apply standard pathmodeling techniques. I.e. (in)direct effects, multiplying coefficients, ... Is this also the case if we have a mix of OLS and probit coefficients?
c) As there are no latent variables, we can ignore the residual variance and convert the probit coefficients for total, direct and indirect effects to probabilities (if we set all the variables in the pathmodel to a value). This is possible for both the model with the continuous mediator (intercept), as for the one with the latent continuous mediator (by using the thresholds).
d) We can perform mediation analysis using the output of the INDIRECT statement, both if the mediator is manifest continuous or latent continuous (ordinal).
Thanks for the response. A follow-up regarding standardization: my understanding from the UG is that I should use StdY standardization for my dummy-coded IV's and StdYX for continuous IV's (with a dichotomous DV).
Unfortunately I get a warning that StdY is not available when using WLSMV and categorical outcomes. Why is this/how should I handle it ("destandardize" StdYX by dividing by sd(x))?
Thanks again. Hopefully a last followup question: for presenting results, I would like to show the standardized total, direct and indirect effects together with (bootstrapped) CI's around them.
I can do this for the StdYX-standardized total & (in)direct effects, but what with StdY? Is it possible to "destandardize" here also, and how should I handle the CI around the resulting StdY total or (in)direct effect?
although i am not doing the same type of analysis, i thought i would post here because i have a similar situation to maarten in which stdy is not being printed (although i am not sure why and i do not get an error message; it just doesn't print when requested). perhaps it is related to using WLSMV with multiple groups?
in any case, i want to make sure i understand how to destandardize. i understand that i want to divide by the standard deviation of my predictors--does that mean the standard deviation of each predictor to get its own proper STDY standardization? when mplus does not print the variance in the analysis (e.g., for categorical predictors) does this mean finding the variance/SD through some other means and then computing?
so, for example, a binary predictor has a "variance" of .07 and a "standard deviation" of .272. its STDYX for predicting outcome is -.11, so its STDY is -.11/.272, or about -.40. is that correct?
apologies if this question seems quite basic, but i want to make absolutely sure i understand what to do here. thanks in advance for any help!
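The arithmetic in the example above can be checked with a few lines of Python. This only reproduces the division described in the post, with the numbers quoted there; the function name is made up. Note that 0.272 squared is about 0.074, which rounds to the quoted variance of .07.

```python
def stdy_from_stdyx(stdyx, sd_x):
    """Undo the x-standardization of a StdYX coefficient by dividing
    by the standard deviation of the predictor."""
    return stdyx / sd_x


# Numbers quoted in the post above.
stdy = stdy_from_stdyx(stdyx=-0.11, sd_x=0.272)
print(round(stdy, 3))  # about -0.40
```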
thanks, linda, you are absolutely correct. the sample statistics for each group are there; i just didn't see them at first.
Kesinee posted on Tuesday, August 16, 2011 - 1:14 pm
Dear all, I ran a path analysis (all observed variables) for either an ordinal or a binary outcome with ML (logit coefficients). All mediators are continuous variables. The independent variable (X) has 4 categories. My questions: 1) Do I have to report standardized coefficients between X and the Ms and, if so, what type of standardized coefficients? I understand that for the Ms and Y, StdYX should be used; is that correct? 2) Is the indirect effect obtained with MODEL CONSTRAINT a standardized or an unstandardized product? Thank you for your help.
I tried to obtain bootstrap estimates for the indirect effect in a mediation model where x and m are continuous and y is dichotomous. Under WLSMV, my bootstrap standard errors are always very very high, and do not compare at all with those obtained with the Model indirect command in mplus. I am not talking about small discrepancies. Ex. Est/S.E = 3.00 with Model Indirect and 1.38 with bootstrap estimates. My model contains covariates. I was able to replicate this problem with two different models using different samples and using either a sem or a path analysis.
Is there any known issue with using bootstrapping with WLSMV in Mplus? Or am I doing something wrong?
An example of the syntax that I use:
TITLE: Mediation with dichotomous y variable
DATA: FILE IS data1.dat;
VARIABLE: NAMES ARE x1 x2 c1 c2 m1 d1;
  USEVARIABLES ARE x1 x2 c1 c2 m1 d1;
  CATEGORICAL ARE d1;
ANALYSIS: BOOTSTRAP 5000;
MODEL: d1 ON m c1 c2 x1 x2;
  m ON x1 x2 c1 c2;
MODEL INDIRECT: d1 IND m x1;
OUTPUT: standardized; CINTERVAL(BCBOOTSTRAP);
Dear all, I have a recursive path model with a dichotomous x variable, with two dichotomous mediators (u1 & u2), and a continuous outcome (y). I've asked Mplus v6.1 for indirect effects, using WLSMV as the estimator. For example:
MODEL: Y u1 u2 ON X ;
  Y ON u1 u2 ;
MODEL INDIRECT: Y IND X ;
My question is: 1) Are the indirect effects the product of the probit coefficient and the ordinary regression coefficient?
2) If so, is Mplus automatically rescaling the probit and ordinary regression coefficients to be the same scale?
If you can't use MODEL INDIRECT and must define your indirect effect in MODEL CONSTRAINT, you can also standardize it in MODEL CONSTRAINT by multiplying by the standard deviation of the covariate and dividing it by the standard deviation of the final outcome.
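The standardization described here (multiply by the standard deviation of the covariate, divide by the standard deviation of the final outcome) is a one-line calculation. The Python sketch below uses hypothetical numbers; in Mplus itself the same expression would be written in MODEL CONSTRAINT using labeled parameters. For a categorical outcome, the relevant outcome standard deviation would presumably be that of the underlying latent response variable, which is an assumption on my part rather than something stated in the reply.

```python
def standardize_indirect(ab, sd_x, sd_y):
    """Standardized indirect effect: ab * SD(x) / SD(y)."""
    return ab * sd_x / sd_y


# Hypothetical values, for illustration only.
print(standardize_indirect(ab=0.2, sd_x=1.5, sd_y=2.0))
```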
Thach D Tran posted on Tuesday, September 06, 2011 - 4:28 pm
I would like to clarify my understanding of indirect coefficients in the case of binary outcomes. I have run a mediation model of a binary outcome (Y). The independent variable of interest (X) is also binary, but the mediator is a continuous latent variable (M). I want to measure the effect of X on Y via M. The WLSMV estimator was used with MODEL INDIRECT: Y IND X;
In the output:
                     Estimate    S.E.   Est./S.E.   P-Value
Thresholds
  Y$1                   1.620   0.919       1.763     0.078

Effects from X to Y
  Total                 1.391   0.504       2.760     0.006
My questions are:
1. Is 1.391 the probit coefficient?
2. If yes, can I use the formulas below?
Prob (Y=1|X=1) = f(-1.62 + 1.391*1) = .409
Prob (Y=1|X=0) = f(-1.62 + 1.391*0) = .0526
So that I can interpret that the probability of Y=1 when X=0 is .0526 and when X=1 is .409
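The two numbers in this post can be reproduced with the standard normal distribution function. The Python sketch below only checks the arithmetic, reading f as the standard normal CDF; it does not settle whether the formula is appropriate for a model with a latent mediator (a later reply in this thread points out that the mediator residual has to be taken into account).

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal CDF, the "f" in the post

threshold = 1.620     # Y$1 from the quoted output
total_effect = 1.391  # total effect from X to Y from the quoted output

p_x1 = phi(-threshold + total_effect * 1)
p_x0 = phi(-threshold + total_effect * 0)

print(f"P(Y=1 | X=1) = {p_x1:.3f}")  # about .409
print(f"P(Y=1 | X=0) = {p_x0:.4f}")  # about .0526
```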
I have the following model: a binary dependent variable, several independent variables, interactions between the independent variables (all have impacts on the mediators) and multiple latent mediators (all continuous).
I read the paper Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. However, I am not sure, if this approach is applicable for my (more complex) model. I have the following concerns:
1) All the examples in the paper refer to a single mediator. Does the approach also apply to MULTIPLE MEDIATORS? (I also read the paper of Imai et al. 2010, which say that future research should extend the approach to multiple mediators. Has this been done already?)
2) I need to estimate a logit model, but I have multiple latent mediators. On page 25 of the paper, Bengt mentions that the latent mediator approach using logistic regression is not available in MPlus. Do I understand right that you NEVER can compute indirect effects with LOGISTIC regressions in MPlus in the case of LATENT mediators – neither with this new approach?
3) Does this approach also apply to multiple independent variables, which interact with each other and in the case that ALL independent variables and their interactions have an impact on ALL mediators?
1) As I understand it, these formulas apply also here. When mediators influence each other, the general formula needs special explication, but not when they don't.
2) Page 25 talks about a latent mediator behind a categorical observed variable. In contrast, there is no extra difficulty if your mediators are latent factors measured by multiple indicators.
3) Yes, but the results you get are for one IV conditional on some values of the others.
Hoon Lee posted on Tuesday, February 07, 2012 - 8:41 am
I am running a similar model posted at the top of this page, but the DV is ordinal (IV and mediator are continuous). I have two specific questions.
1. I am wondering if I can use the negative log-log link function (lower cases are more likely) instead of the logit or probit function in Mplus. If not, is there another fix for an ordinal dependent variable with an unevenly greater number of lower cases?
2. Can I use a bootstrap method to estimate an indirect effect, even with different types of models (IV to mediator: continuous to continuous; mediator to DV: continuous to ordinal)?
I am not sure, if I understood the question of standardization correctly. For the case of a mediation model, in which the IV is categorical (dummy coded) (x), the mediators are latent continuous (m) and the DV is dichotomous (y):
Is it correct that I have to standardize the coefficients m ON x with StdY and y ON m with StdYX?
And one follow-up question (sorry, I am a bit confused about standardization): What about the indirect effects? Do I have to report the standardized coefficients, too? If yes, which one (Std gives me the same results as under the raw coefficients)?
Hi, I am running a mediation model. My mediator and dependent variable are both dichotomous, and my independent variable is continuous. I have 3 questions which I was hoping you could help with:
1. Is it OK to run the INDIRECT command when the independent variable is continuous and the mediator and the dependent variable are dichotomous?
2. Can I use the estimated indirect effect and confidence intervals that are produced by the INDIRECT command?
3. If no to Q2, what do I need to do to get a reliable measure of the indirect effect in this case?
Thank you very much, Lorraine
See the following paper which is available on the website:
Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.
WEIHAI ZHAN posted on Wednesday, May 02, 2012 - 1:10 pm
Dear Drs. Muthen,
I am running mediation analysis with independent continuous variable X, dichotomized mediator M, and dichotomized outcome Y (n = 1146). M & Y were not rare. I used WLSMV estimator and Delta parameterization. The p-value for the regression coefficient of X was 0.385 in the model X -> M; p-value for the regression coefficient of M was 0.000 in the model M -> Y; and the p-value for the indirect effect of X -> M ->Y was 0.39, non-significant.
However, when I used the ML estimator for the path analysis (it seems ML cannot be used for modeling the indirect effect), the p-value for the regression coefficient of X was 0.000 in the model X -> M, and the p-value for the regression coefficient of M was 0.000 in the model M -> Y. So I suspect there would be a significant indirect effect of X -> M -> Y if the ML estimator could be used in the mediation analysis.
Do you have any suggestions to resolve this discrepancy?
The difference you find is because with WLSMV m*, the latent response variable underlying m, is used whereas with ML m is used. See the following paper which is available on the website for further information:
Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.
Thanks. It seems that the appendix Section was not included in the paper you mentioned. For example, in page 28, you wrote, "The Mplus inputs are shown in the addpendix Section 14.5". However, I could not find it. May I know where I can find the "addpendix Section 14.5"? Thanks in advance.
I realized that the paper (Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus) contains only 110 pages @ http://www.statmodel.com/download/causalmediation.pdf The whole paper may actually contain more than 143 pages based on information at page 86. So is it possible for me to get a complete version of this paper? Thank you very much.
I am using mediational analysis with a categorical dependent variable with three categories. Should I just specify that the dependent variable as categorical or do I have to generate two dummy dependent variables and include both of them in the SEM? If I need to generate two dummy variables, should I specify the correlation between the two dummy dependent variables as 0?
ema budahab posted on Monday, August 27, 2012 - 6:59 am
I have a question about using regression in mediation. I have two categorical variables and one interval variable as independent variables, one interval mediator, and one interval dependent variable. I used Baron and Kenny's four steps. When I ran the regressions, I did not include the categorical variables in the mediation test, because I could not use them in a regression after coding them as 1, 2, 3 rather than as dummies. In addition, I dropped one of the categorical variables (a demographic) after an ANOVA for my first hypothesis showed that it had no relationship with the dependent variable. How can I resolve this mistake now that I am waiting for my viva? Is there any theory you can recommend that I read to justify these mistakes in my viva?
I have still a question regarding transforming the probit coefficients into probabilities:
I understood that I could use the standardized coefficients as well (STDYX, because all my IVs and mediators are continuous) to calculate the probabilities. In one previous post you mentioned that I should then standardize the IVs and mediators as well. Do you mean z-standardize? Then all the variables would have a mean of zero and an SD of 1, which would make the calculations much easier...
(In my output I cannot find an unstandardized solution; there is only STD and STDYX...)
I read the explanation in chapter 14. I only want to be sure that I am doing the right thing:
Usually I can find the raw coefficients in the first table (MODEL RESULTS) of the output.
But, I found that the estimates in the first table (that are supposed to be the raw coefficients) are exactly the same as the STD estimates in the following table. Therefore I thought that I do not get any real "raw" coefficients in my output.
Nancy Hood posted on Thursday, August 30, 2012 - 7:23 am
Hello, I am interested in calculating probabilities from a probit regression model with a 3-level ordinal mediator (and a 3-level ordinal outcome) using WLSMV estimation. I'm not sure what values to plug in to the probability equation for the mediator variable (M) to obtain the following interpretation(s): "the probability that Y=0 given that M = 0 (or 1 or 2) and X is at its mean is..." Can the threshold values for m* be used for this purpose? Would the probability for the middle category of M be the difference between the probabilities for the first and last categories of M?
Thanks in advance for your time!
Malki Stohl posted on Thursday, August 30, 2012 - 11:42 am
I am running a mediation model using dichotomous dependent, independent, and mediator variables. My data also have weights, so I used type=complex and could not use bootstrapping. My problem is that sometimes when I run the model I get different model estimates. If I open and close Mplus, sometimes the model estimates will change as well. The estimates are all similar (usually changes in the hundredth decimal place), but I can't figure out why this is happening or which estimates to use. Thanks.
I think you mean 3-category outcomes rather than 3-level. WLSMV uses the underlying continuous latent response variable m* as the predictor of the distal outcome y, not the ordinal m itself. To answer questions like yours, see
Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.
on our web site under Papers, Mediational Modeling.
We haven't seen this kind of behavior. You would have to send us 2 outputs to show us. Be sure to use Version 6.12. Because you have a binary mediator and distal outcome, you should also see the paper I mention above. If hard to understand, contact the stat consulting at your Univ.
b1 would be an OLS regression coefficient from the regression of M1 on the IV; b2 would be an OLS regression coefficient from a second regression, of M2 on M1; and b3 would be a probit coefficient from a third regression, of the dichotomous DV on M2. Thus, using the formula above, I would be adding OLS and probit coefficients rather than multiplying them, so the result would not be a probit coefficient...
Obviously it is only possible to calculate probabilities for the total effect and not for the single specific indirect effects.
Is that right?
Is there another way to calculate the probability for the DV to be 1 for different values of the IV, M1 and M2 based on the calculation of the specific indirect effects? Because, when I use the formula above for the total effect, I can only calculate probabilities for different values of the IV...
This forum has been super helpful so far! I have one follow up question.
I have run a mediation model with two continuous predictors and a dichotomous outcome. Now I am trying to test moderation of the mediation model (moderator = male, female). I attempted to determine which model better fit the data by running two models. In the first, I specified one class (everyone) with the KNOWNCLASS option, and in the second I specified two separate classes with the KNOWNCLASS option (male and female). I then compared the bayesian estimates to determine which was the better fitting model. Since the model with two classes fit better, I used the GROUPING option to assess whether the indirect paths and bootstrap confidence intervals are significant for both males and females. Does this seem like the right approach? Or is it inappropriate to use Bayesian estimates in this way?
I'm not sure what you mean about using the GROUPING option. It is not available with Bayes.
Tchiki Davis posted on Friday, September 14, 2012 - 10:42 am
Thank you for such a quick response! Let me clarify.
First I used the Bayes estimates to determine which model was a better fit. Since the model with 2 classes fit better, I wanted to determine whether the indirect effects and bootstrap confidence intervals were significant for both boys and girls. Since bootstrap is not allowed with ALGORITHM=INTEGRATION and Bayes estimates are only given for ALGORITHM=INTEGRATION, I wrote new syntax using the GROUPING option instead of the KNOWNCLASS option to follow up on why the two class model was better. So I guess I have two questions: 1. Are Bayes estimates appropriate for determining whether a model with two classes specified is better than a model with one class specified? 2. Does it seem right to assess indirect effects and bootstrap confidence intervals for each gender using the GROUPING option?
Sara posted on Saturday, January 05, 2013 - 6:54 pm
Hi Drs. Muthen,
Thank you for this forum--it has been very helpful for me in terms of learning how to run and interpret various analyses.
I am testing a mediational model with a categorical DV and would like to report standardized estimates. My output provides S.E. and p-value estimates for the unstandardized estimates, but not for the standardized estimates. Additionally, my output also does not provide a p-value for the R^2 values. Is there a way to obtain this information?
It sounds like you are using the WLSMV estimator. You can try ML or Bayes.
Leslie Roos posted on Friday, February 01, 2013 - 11:16 pm
Thank you again in advance, for your continued advice. I am testing a mediation question with binary IV, mediator, and DV using a path analysis with multiple (continuous and categorical) covariates in a complex dataset (strat, cluster & weight).
I have successfully run the analyses using model & model constraint for the indirect effect, and have determined there is a significant partial mediation as indicated by the p-values, log odds, and CIs. but am now stuck on
(1)if it is possible to appropriately determine the path coefficients for binary variables?
A paper with a similar design (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1447389/) reports calculating path coefficients using tetrachoric correlations, but I have been unable to figure out how to produce these with the complex sampling design adjusted for covariates.
(2) Is this possible & could you direct to appropriate analysis design?
(3) Would the 'slopes' produced by the following syntax equate to the coefficients?
With binary mediator and DV, you should use the approach described in
Muthen, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.
which is on our web site with Mplus scripts.
Leslie Roos posted on Saturday, February 02, 2013 - 10:18 am
Thank you for the direction to advice and the scripts -- They are a wonderful resource!
I've run into trouble using the 2 options described in the manuscript for binary mediator & outcome (Monte Carlo & Bayes) as neither seem to be permitted with Complex weight, stratification, cluster variables?
While I believe I am getting reliable Log Odds Ratios and CIs with the syntax above, I was attempting to get path coefficients -- the only paper that I find that does this uses the tetrachoric correlations, and I was attempting to replicate those through the syntax above, but wasn't sure if the "model estimated slopes" were appropriate as path coefficients here?
Thank you again for your time in advising and suggestions!
Bayes does not handle complex survey features, but the ML approach I show does.
You get path coefficients - they are the Model estimated slopes. But the indirect effect is not a*b as in the continuous variable case.
Leslie Roos posted on Saturday, February 02, 2013 - 10:53 am
I see -- thank you! So it is appropriate to utilize the slopes for specific observed variable relationships (M on X, Y on X, Y on M), but not to create an indirect effect path as in continuous mediator variable analyses?
It seems this is due to the non linear nature of logistic functions as well as the Mediator being treated as continuous in one regression & categorical in the other?
lamjas posted on Tuesday, September 03, 2013 - 1:09 am
Hi Dr. Muthen,
My model has one binary outcome (observed), one binary mediator (observed), one continuous mediator (latent), two independent variables (latent continuous), and two covariates (age and gender).
The syntax of the model as follow:
Variable: Usevariables are age gender clm1-clm5 iv1-iv5 y binm;
Categorical are y binm;
Analysis: Estimator is WLSMV;
Model: clm by clm1-clm5; !latent continuous mediator
iv1 by iv1-iv5; !latent continuous IV
iv2 by iv6-iv6; !latent continuous IV
y on binm clm iv1 iv2;
binm clm on iv1 iv2;
Y binm clm iv1 iv2 on gender age;
I read several discussion threads and the "Causal mediation" paper. I want to confirm the following issues:
(1) Should I report unstandardized or standardized coefficients? If I should report standardized coefficients, I understand that I should use StdYX if exogenous variables are continuous. Then, how about the coefficients of Y on binm (both are binary items)?
(2) For indirect effects, is it appropriate to use "Model Indirect" function to obtain when the model has both binary and continuous mediators with a binary outcome?
(3) Is it necessary to use bootstrap methodology to determine the significance of mediation when a binary mediator is in the model?
1. It is your choice which coefficients to report. Use StdY.
2. Yes with WLSMV.
3. Not necessary but you can.
ywang posted on Tuesday, September 03, 2013 - 12:38 pm
This is a follow-up question to the above discussion. Is it possible to use bootstrap methodology with Mplus 7 to determine the significance of mediation in mediated discrete-time survival analyses (y1-y5 are categorical outcome variables, x--M-y1-5)? I remember that I could not do it with an earlier version.
Please send the output of your earlier analysis and your license number to email@example.com so we can see why bootstrap is not allowed.
jiesi guo posted on Wednesday, September 25, 2013 - 5:45 am
Hi Dr. Muthen,
My model has one binary outcome Y (observed), one categorical mediator M (4 cates, 0-4), two independent variables A & B (two latent continuous), and one covariate (age). I also used the latent moderated structural (LMS) equation approach to model the latent interaction between A and B. The syntax of the model is as follows:
Variable: Usevariables are age a1-a5 b1-b5 Y M;
  Categorical are Y M;
define: standardize age;
ANALYSIS: estimator=MLR;
  TYPE=RANDOM;
  link = probit;
  Algorithm = integration;
  INTEGRATION = MONTECARLO;
Model:
  A by a1-a5; !latent continuous IV
  B by b1-b5; !latent continuous IV
  Y on M A B;
  M on A B;
  bXa | A xwith B;
  Y M on bXa;
  Y M A B on age;
I use prob (y=1) = f (-threshold + b1*x1 + b2*x2 ...) to calculate the probability of the indirect effect for combinations of values of A and M (e.g. A = -1sd, M = 1; A = 1sd, M = 3; and so on). I also calculate the probability of the direct effect of A/B on M:
P(Y=1|x)=F(t1-b1*x1)
P(Y=2|x)=F(t2 - b1*x1) - F(t1 - b1*x1)
P(Y=3|x)=F(t3 - b1*x1) - F(t2 - b1*x1)
P(Y=4|x)=F(-t3+b1*x1)
(1) Is this correct (syntax & calculation)?
jiesi guo posted on Wednesday, September 25, 2013 - 5:46 am
Follow-up questions: (2) However, can I use this formula to calculate the interaction effect between A and B on M? E.g., P(Y=1|x)=F(t1-b1*A-b2*B-b3*A*B), P(Y=2|x)=F(t2-b1*A-b2*B-b3*A*B)-F(t1-b1*A-b2*B-b3*A*B), and so on.
(3) If I standardize age, do I need to include the control variable, given that the mean of age becomes zero?
Indirect effects are not computed for different values of the mediator. They are computed for different values of the observed exogenous variable.
Mplus Discussion posts are limited to 1500 characters. In the future, keep your posts within that limit.
jiesi guo posted on Wednesday, September 25, 2013 - 4:22 pm
Dr. Muthen, thank you for your quick feedback, and sorry for the previous posts. If my understanding is correct, prob (y=1) = f (-threshold + b1*x1 + b2*x2 ...) is used to calculate the probability of the indirect effect A->M->Y, where b1 is the path coefficient of A->M, b2 is the path coefficient of M->Y, and x1 & x2 can be combinations of values for A and M (e.g. A = -1sd, M = 1, A=1sd, M = 3 and so on...).
(2) Regarding the direct effect, can I use the following formula to calculate the interaction effect between A and B on M? E.g., P(M=1|A,B)=F(t1-b1*A-b2*B-b3*A*B), P(M=2|A,B)=F(t2-b1*A-b2*B-b3*A*B)-F(t1-b1*A-b2*B-b3*A*B), P(M=3|A,B)=F(t3-b1*A-b2*B-b3*A*B)-F(t2-b1*A-b2*B-b3*A*B), and so on.
(3) If I standardize age (covariate), do I need to include the control variable, given that the mean of age becomes zero?
So I don't understand your formulas. Also, with a binary Y, the probability is not:
P(Y=1|x) = f(-threshold + a*b*x)
because you need to include the effect of the M residual e1. And, furthermore, indirect effects in terms of probabilities should be expressed as in my paper:
Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.
look for instance at section 6.1.
jiesi G posted on Friday, September 27, 2013 - 7:11 am
Hi - thank you for your response. If the effect of the residual e1 is included: indirect effect: P(Y=1|x) = f(-threshold + a*b*x+b*e1); direct effect: P(Y=1|x) = f(-threshold +c*x). (1) Is this correct?
(2) In my model, two IVs (latent variables) and their interaction, using the LMS approach, are included. I have a moderated mediation: M=a1*X+a2*W+a3*XW+e1, Y=b*M+c1*X+c2*W+c3*XW+e2. The conditional indirect effect of X on Y is b(a1+a3W). Can I calculate a probability for this indirect effect?
(1) No, this is not correct. The e1 term is not an observed variable but an unobserved residual that needs to be integrated out. This is what is discussed in Section 6.1 that I referred to.
(2)-(3). Number (1) needs to be clear first.
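To make the point about integrating out e1 concrete, here is a minimal numeric sketch in Python (outside Mplus). It assumes a single continuous mediator M = a0 + a*x + e1 with e1 ~ N(0, sigma2), a probit outcome y* = b*M + c*x + e2 with the residual variance of y* fixed at 1, and Y = 1 when y* exceeds the threshold tau. All names and parameter values are hypothetical, chosen only for illustration. Under those assumptions the integral over e1 has the closed form Phi[(-tau + c*x + b*(a0 + a*x)) / sqrt(b^2*sigma2 + 1)], and the code checks that against brute-force integration.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def p_y1_closed_form(x, tau, a0, a, b, c, sigma2):
    """P(Y=1 | x) with the mediator residual e1 integrated out analytically."""
    index = -tau + c * x + b * (a0 + a * x)
    return norm_cdf(index / math.sqrt(b * b * sigma2 + 1.0))


def p_y1_numeric(x, tau, a0, a, b, c, sigma2, steps=20000, width=8.0):
    """Same probability by midpoint-rule integration over e1 ~ N(0, sigma2)."""
    sd = math.sqrt(sigma2)
    lo = -width * sd
    h = 2.0 * width * sd / steps
    total = 0.0
    for i in range(steps):
        e1 = lo + (i + 0.5) * h
        density = norm_pdf(e1 / sd) / sd  # density of e1 at this grid point
        total += norm_cdf(-tau + c * x + b * (a0 + a * x + e1)) * density * h
    return total


# Hypothetical parameter values, for illustration only.
params = dict(tau=0.5, a0=0.0, a=0.6, b=0.8, c=0.2, sigma2=1.5)
closed = p_y1_closed_form(1.0, **params)
numeric = p_y1_numeric(1.0, **params)
# The expression that ignores e1 (and the direct effect) gives a different value.
naive = norm_cdf(-params["tau"] + params["a"] * params["b"] * 1.0)
```

The denominator sqrt(b^2*sigma2 + 1) is what the naive f(-threshold + a*b*x) expression leaves out, which is why the two probabilities disagree.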
jiesi G posted on Friday, September 27, 2013 - 8:32 pm
Thank you for your quick response. I read Section 6.1 carefully and understood to use the causal indirect effect ¦µ[probit(1, 1)]-¦µ[probit(1, 0)] to calculate the probability difference.
(2) with regard to moderated mediation that I posted previously, Y=¦Â0+ ¦Â1*M+ ¦Â2*X+ ¦Â3*W+ ¦Â4*XW + e1 M= ¦Ã0+ ¦Ã1*X+ ¦Ã2*W+ ¦Ã3*XW+e2= ¦Ã0+ ¦Ã2*W+(¦Ã1+¦Ã3*W)*X Calculate the causal conditional indirect effect of X on Y (i.e., ¦Â1*(¦Ã1+¦Ã3*W))
Your notation came out garbled - it looks like you have the same slope notation for both the Y and the M equation.
jiesi G posted on Saturday, September 28, 2013 - 1:57 pm
Sorry for that. The formula works well on my laptop. Thank you for your quick response. I read Section 6.1 carefully and understood to use the causal indirect effect (cumulative normal distribution [probit(1,1)]-[probit(1,0)]) to calculate the probability difference.
(2) with regard to moderated mediation that I posted previously, Y=b0+ b1*M+ b2*X+ b3*W+ b4*XW + e1 M= a0+a1*X+a2*W+a3*XW+e2=a0+ a2*W+(a1+ a3*W)*X+e2 I am thinking of calculating the causal conditional indirect effect of X on Y (i.e., b1*(a1+a3*W))
Your notation is different from (1) and (2) in my paper. If you change your notation it will be easier for you to see what's wrong.
Note that in my eqn (1), beta3 is the slope for the interaction term x*m. You have the interaction term x*w - again, I would suggest changing to my notation. More importantly, your sqrt(variance) expression is missing the counterpart to my beta3 which you see in my eqn (23).
So changing your notation, you should be able to follow my formulas in (22) and (23).
jiesi G posted on Sunday, September 29, 2013 - 3:16 am
Thank you for your quick response. In my model, two IVs (latent variables X and Z) and their interaction (XZ), using the LMS approach, are included. We have a moderation effect that is mediated by M, the same as Figure 15 in your paper.
Direct effect: X, Z and XZ predict Y (outcome) Indirect effect: X, Z and XZ predict M, and M predicts Y
(1) Can I calculate the probability difference with and without the conditional indirect effect of X on Y (i.e., TIE)? (2) If yes, I think sqrt(variance) is still sqrt(b1^2*sig2+1) because there is no interaction between the IVs and M. I tried to write it down:
probit(1, 1) = [b0 + b1*a0 + b2 + b3*Z + b1*r2*Z + b1*(a1 + a3*Z)]/sqrt(b1^2*sig2+1)
probit(1, 0) = [b0 + b1*a0 + b2 + b3*Z + b1*r2*Z]/sqrt(b1^2*sig2+1)
jiesi G posted on Sunday, September 29, 2013 - 3:19 am
Sorry, I forgot to change my notation.
probit(1, 1) = [b0 + b1*r0 + b2 + b3*Z + b1*r2*Z + b1*(r1 + r3*Z)]/sqrt(b1^2*sig2+1)
probit(1, 0) = [b0 + b1*r0 + b2 + b3*Z + b1*r2*Z]/sqrt(b1^2*sig2+1)
Looks like you have the formulas right, taking into account that your Z is my c and your b3 is not my beta3, and that you have an interaction among your covariates and I have an interaction between one covariate and the mediator.
You can compute both TIE and PIE according to the paper.
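As a rough illustration of how TIE and PIE could be evaluated numerically from the probit(x, x') expressions in this exchange, here is a small Python sketch. It generalizes the poster's probit(1, 1) and probit(1, 0) to arbitrary x and x' so that probit(0, 1) and probit(0, 0) are also available. That generalization, the function names, the optional b4 argument (an X*Z slope in the Y equation, which does not appear in the posted formulas and defaults to 0), and all coefficient values are assumptions made for this example, not output from any real analysis.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_index(x, x_prime, z, b0, b1, b2, b3, r0, r1, r2, r3, sig2, b4=0.0):
    """probit(x, x'): x enters the Y equation, x' enters the M equation.

    b4 is a hypothetical X*Z slope in the Y equation; with b4=0 and
    x = x' = 1 this reproduces the probit(1, 1) formula posted in the thread.
    """
    m_mean = r0 + r2 * z + (r1 + r3 * z) * x_prime
    numerator = b0 + b2 * x + b3 * z + b4 * x * z + b1 * m_mean
    return numerator / math.sqrt(b1 ** 2 * sig2 + 1.0)


def causal_effects(z, **coef):
    """Probability-scale effects at moderator value z."""
    def prob(x, x_prime):
        return norm_cdf(probit_index(x, x_prime, z, **coef))

    return {
        "TIE": prob(1, 1) - prob(1, 0),  # total natural indirect effect
        "PIE": prob(0, 1) - prob(0, 0),  # pure natural indirect effect
        "DE": prob(1, 0) - prob(0, 0),   # pure natural direct effect
        "TE": prob(1, 1) - prob(0, 0),   # total effect
    }


# Hypothetical coefficients, for illustration only.
coef = dict(b0=-0.3, b1=0.5, b2=0.2, b3=0.1,
            r0=0.0, r1=0.4, r2=0.3, r3=0.2, sig2=1.0)
effects_at_z0 = causal_effects(0.0, **coef)
effects_at_z1 = causal_effects(1.0, **coef)
```

Because X is a continuous latent variable here, the contrast 1 versus 0 stands for a one-unit change in X, and the effects have to be re-evaluated at each value of Z of interest.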
I am running a model with a dichotomous independent variable, 6 dichotomous mediators that then predict 1 continuous mediator, and a continuous dependent variable.
1) Is it okay to use ML estimation when some of the mediators are dichotomous? Do I have to specify in the model syntax that some of the mediators are categorical, or is this type of syntax only necessary when the DV is categorical?
2) In running a mediation/path model such as the one I described, I'm getting quite different results when I use bootstrapping vs. not with WLSMV as an estimator. The results using bootstrapping or not do not differ when I use ML without specifying that some of my mediators are dichotomous. I'm not sure how to interpret this or which model is accurate to run. Is running a bias-corrected bootstrap a must?
3) Most of the direct effects in my model are significant, but the vast majority of specific indirect effects are insignificant. I'm having a hard time interpreting why indirect effects would be so consistently insignificant when the direct effects that comprise this indirect path are significant?
With binary mediators you need to use WLSMV in order for the usual indirect effects to be meaningful. The mediators are DVs and therefore need to be specified as categorical. You don't need bootstrap unless your sample is small. Indirect effects are insignificant if the mediators are not the right ones.
Jetty posted on Friday, November 08, 2013 - 1:08 pm
Bengt and Linda, I would really appreciate your help with my analysis. I have 1 focal continuous predictor, 4 continuous mediators, and 1 ordered categorical outcome. All variables are observed. I am trying to decide on the best way to test mediation and the most intuitive way of showing the results. I decided on INDIRECT and bootstrap, which necessitated the WLSMV estimator. But I just realized that the obtained coefficients are probit, not logit, so I lose the interpretability of ORs. Comparing the estimates, SEs, and associated p values between the model run with WLSMV and the model run with MLR shows very little difference, and nothing fluctuates in or out of significance.
I have two questions:
1. Is it acceptable to report the MLR results for the path analysis in order to get ORs, but the WLSMV model when describing the mediators? Alternatively, I used model constraints with MLR to do a Sobel test, but I'd rather stick with bootstrapping.
2. Which fit indices would you recommend I present if I go the MLR route? For WLSMV, I was thinking chi-square, RMSEA, and CFI.
2. There are no absolute fit indices with maximum likelihood and categorical variables. Nested models can be tested using -2 times the loglikelihood difference which is distributed as chi-square.
Jetty posted on Friday, November 08, 2013 - 3:03 pm
Thanks, Linda. I will go with WLSMV but I have trouble finding examples in the literature. Would it be sufficient to report the probit coefficients and interpret them in a general way (e.g., a positive coefficient means that an increase in the predictor leads to an increase in the predicted probability of Y)? I could calculate predicted probabilities, but since they depend on the values of other predictors in the model as well as the thresholds, I am not sure how many predicted probabilities I need to calculate. How are the predicted probabilities interpreted when they refer to indirect effects?
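For readers wondering what "predicted probabilities depend on the other predictors and the thresholds" looks like in practice, here is a minimal Python sketch of the ordered-probit category probabilities, P(Y=k) = F(t_k - eta) - F(t_(k-1) - eta), in the same form as the formulas quoted earlier in this thread. It assumes the residual variance of the latent response is 1; the function name, thresholds, and coefficient values are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(eta, thresholds):
    """Probabilities of each ordered category given the linear predictor eta.

    thresholds must be in increasing order; K thresholds give K+1 categories.
    """
    cumulative = [norm_cdf(t - eta) for t in thresholds]  # P(Y <= k)
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]


# Hypothetical thresholds and slopes, for illustration only.
thresholds = [-1.0, 0.0, 1.0]
slope_x, slope_m = 0.4, 0.3

# Each combination of predictor values gives its own set of probabilities.
probs_low = category_probs(slope_x * -1.0 + slope_m * 0.0, thresholds)
probs_high = category_probs(slope_x * 1.0 + slope_m * 0.0, thresholds)
```

Every choice of values for the other predictors shifts eta and therefore all category probabilities, which is why there is no single probability attached to one probit coefficient.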
WLSMV is a good choice here. Your indirect effect a*b refers to the continuous latent response variable behind your ordinal outcome. I wouldn't worry about probabilities, but if you insist you need to read about "causal effects" in either Valeri-VanderWeele (2013) in Psych Methods or in
Jetty posted on Tuesday, November 12, 2013 - 6:08 pm
Thanks, Bengt. I have another question. As I mentioned, I am running a simple path analysis, with 1 focal continuous predictor, 4 continuous mediators, 4 covariates, and 1 ordered categorical outcome. All variables are observed. I decided on INDIRECT and bootstrap in order to test mediation. I just realized that I am getting different estimates when I include the MODEL INDIRECT command in addition to MODEL. Specifically, when I am not testing mediation via the indirect command, my syntax looks like this:
ANALYSIS: bootstrap=50000;
Model: Y on X M1 M2 M3 M4 C1 C2 C3 C4;
output: CINTERVAL (BCBOOTSTRAP);
Here's when I include MODEL INDIRECT.
ANALYSIS: bootstrap=50000;
Model: M1 M2 M3 M4 on X C1 C2 C3 C4;
Y on X M1 M2 M3 M4 C1 C2 C3 C4;
MODEL INDIRECT:
Y IND M1 X;
Y IND M2 X;
Y IND M3 X;
Y IND M4 X;
output: CINTERVAL (BCBOOTSTRAP);
Why do my estimates for the relationships between Y and all covariates and mediators differ between the two models? The differences are greater than what I see between different bootstrap draws. Thank you!
I am testing a path model where I have a dichotomous IV predicting 5 mediators (4 dichotomous and 1 continuous), these predict 1 continuous variable, which predicts the continuous DV. The dichotomous IV also directly predicts the continuous DV.
(1) The direct relationship between the IV and DV is significant and there are also multiple significant indirect pathways between the two. I want to know the degree to which the relationship between the IV and DV is reduced after considering all the indirect effects/mediators. Is there a way to ask MPLUS for this information? My initial thought was to run the IV on the DV without any of the mediators, see the standardized estimate and R square on the DV, and that would give me an idea of how much the standardized estimate is reduced--but I'm not sure if there's a better way to do this?
(2) The model uses the WLSMV estimator. I am not getting any p-values associated with the standardized estimates, although p-values do appear for the unstandardized estimates. Is there a way for me to ask for the p-values associated with the standardized estimates? Can I report the standardized estimates and along with them the associated p-values for the associated unstandardized estimates?
(3) My DV has a high level of kurtosis and is also skewed. Does the WLSMV estimator correct for this or is it robust enough? Is there another estimator I should use?
Thank you so much for your response. To follow-up:
1) Regarding the direct effect without mediators (c), the direct effect with mediators (c'), and a comparison of the two, do you have the specific reference to the MacKinnon book? Is there a way to get those particular statistics directly from MPLUS?
2) Is it acceptable to report standardized estimates and the p-values from the unstandardized estimates (assuming I do not have access to Version 7.2 to get the standardized p-values)?
1. The reference is in the user's guide. There is no way to get these values directly from Mplus.
2. No. You will need to wait for Version 7.2 if you want these.
jiesi G posted on Saturday, November 16, 2013 - 3:20 am
I am running a SEM model with continuous predictors and binary outcomes, including a latent interaction. I use the MLR estimator and link = probit. The models (with and without the latent interaction) did not report related fit statistics.
I conducted a Chi-Square difference test (TRd) for the nested models (with and without latent interaction) using the loglikelihood. p-value is <.001, BIC and AIC became smaller when including the latent interaction. It seems like goodness of fit is getting better.
I am writing up a paper for the analysis, I want to report some fit statistics (e.g., CFI, TLI). Can I compute them based on these models?
With TYPE=RANDOM and XWITH, means, variances, and covariances are not sufficient statistics for model estimation. Chi-square and related fit statistics are not available. In this case nested models can be compared using -2 times the loglikelihood difference which is distributed as chi-square.
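Since the models in question use MLR, the loglikelihood difference is usually rescaled before it is referred to a chi-square distribution (the TRd statistic mentioned in the question). The Python sketch below shows that arithmetic as I understand the commonly described procedure: cd = (p0*c0 - p1*c1)/(p0 - p1) and TRd = -2*(L0 - L1)/cd, where L, p, and c are the loglikelihood, number of free parameters, and scaling correction factor of the nested (0) and comparison (1) models. The input numbers are invented for illustration, and the small chi-square tail function is a hand-rolled helper, not part of any package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1  # exact tail for df = 1
    else:
        q, k = math.exp(-half), 2             # exact tail for df = 2
    while k < df:
        # Recurrence: Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def scaled_lr_test(ll0, p0, c0, ll1, p1, c1):
    """Scaled loglikelihood difference test of nested (0) vs comparison (1) model."""
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)  # difference-test scaling correction
    trd = -2.0 * (ll0 - ll1) / cd
    df = p1 - p0
    return trd, df, chi2_sf(trd, df)


# Hypothetical values: model without (0) and with (1) the latent interaction.
trd, df, p_value = scaled_lr_test(ll0=-2450.0, p0=30, c0=1.20,
                                  ll1=-2440.0, p1=32, c1=1.25)
```

With these made-up inputs cd is 2.0, so TRd is 10.0 on 2 degrees of freedom. The correction cd can occasionally come out negative, in which case this simple formula is not usable as is.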
jiesi G posted on Sunday, December 15, 2013 - 4:49 am
Hello, Linda, I have a simple mediation model where X is a continuous latent variable, M is a continuous mediator, and Y is a binary outcome variable. I use the MLR estimator with link=probit. X->M is an MLR regression coefficient (a) and M->Y is a probit regression coefficient (b).
(1) Can I just use MODEL CONSTRAINT to calculate the indirect effect a*b?
(2) Is it necessary to use the method introduced in your "causal mediation" paper to calculate the indirect effect, including the effect of the residual variance of the mediator M?
(3)If so, when I have two continuous mediators and one independent variable X1->X2->M->Y in which only Y is binary outcome variable, how can I calculate the indirect effect from X1 to Y including the effect of residual variance of continuous mediators X2 and M?
I’m fairly new to mediation analysis (as are others in my field), so I’m not sure whether the model I’d like to run makes sense (or is doable). The Mplus syntax of my model is shown below. What complicates things is that I have continuous (Y1), count (neg. bin. owing to many zeroes and a skewed distribution, Y3-Y4) and binary (Y2) mediators as well as dependent variables. I also have several covariates influencing these mediators/dependent variables, and I’m not sure how that affects the calculation of indirect effects. Moreover, I use clustering to account for dependence within families.
VARIABLE: CLUSTER IS clus;
CATEGORICAL IS y2;
COUNT ARE y3 y4 (nb);
ANALYSIS: ESTIMATOR = MLR;
INTEGRATION=MONTECARLO(5000);
TYPE = COMPLEX;
MODEL: y1 ON x;
y2 ON x y1;
y3 ON x y1 y2;
y4 ON x y1 y3;
y1-y4 ON cov1-cov10;
Is it sensible to estimate this model and look for how the effect of x on Y4 is potentially mediated by Y1-Y3?
This is a complex setting that requires several considerations to be meaningful. Usually we don't give general analysis advice, but this setting has some novel features requiring new tools.
You probably want to approach the analysis in small steps, analyzing parts of the model first. Here is what comes to mind:
For the part of the model that ends with y2 (so including covariates, x, y1, and y2) you have a binary final outcome y2. This alone calls for special mediation modeling in line with "causal inference" based on "counterfactuals". I have summarized the issues in my paper on our website:
Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.
There is also a simpler, shorter version of this paper under review that goes with a forthcoming Mplus version.
Consider next the part of the model that ends with the count variable y3. The y3 variable is influenced by the binary y2 and my paper discusses the special issues that arise with a mediator (y2) that is binary. Then you have the issue of mediation with a count variable as the final DV; also discussed in my paper.
Next, you have the full model with a final count variable (y4). Here you have to also consider what it means to have a mediating count variable (y3). My paper discusses that.
To simplify I would recommend a series of models where you have only one mediator:
I think I'll skip the causal-assumptions from my model for now... However, one thing about my data troubles me: the binary DV (Y2) is whether an individual married or not, the first count variable (Y3) is how many offspring the individual had (obviously zero if not married) and the second count variable (Y4) is how many of those offspring born survived to adulthood (obviously zero if no offspring were born). Since Y3 and Y4 have many zeroes, I have fitted ZINB model for those outcomes (that fits better than NB only) and used Y2 to predict the zero-inflation part of Y3 and Y3 to predict the zero-inflation in Y4. Does this approach make sense or is there an alternative option in MPlus to handle those excess-zeroes in Y3 and Y4 in this setting?
I have two questions related to a SEM model using mediation with a binary dependent variable.
In my model, I have four dummy independent variables, one continuous latent mediation variable and one binary dependent variable. I have calculated indirect effects from all independent variables over the mediation variable on the dependent variable. I used the IND command for this in a PROBIT regression (estimation WLSMV).
My questions are the following: - What type of mediation does MPlus use in a probit model when using the IND statement? Is it the sobel test mediation? Or another type? - What type of standardization should I specify in order to get the correct standardized regression coefficients? For now, I only used the unstandardized b's, but I would prefer standardized coefficients. Should I use STDYX, STDY, STDX or STD? Or another statement?
Thank you very much in advance for helping me out.
MODEL INDIRECT uses Delta method standard errors. Sobel is a special case of Delta method standard errors.
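To illustrate the relationship between the two, here is a short Python sketch of the first-order Delta method standard error for a product a*b. Setting the covariance between the two estimates to zero gives the familiar Sobel formula sqrt(b^2*se_a^2 + a^2*se_b^2). The function name and the numbers are hypothetical, and the z test shown is the usual normal-theory one.

```python
import math


def indirect_effect_se(a, b, se_a, se_b, cov_ab=0.0):
    """First-order Delta method SE of a*b; cov_ab=0 gives the Sobel SE."""
    variance = b * b * se_a ** 2 + a * a * se_b ** 2 + 2.0 * a * b * cov_ab
    return math.sqrt(variance)


# Hypothetical estimates, for illustration only.
a, b = 0.5, 0.4          # X -> M slope and M -> Y slope
se_a, se_b = 0.1, 0.1    # their standard errors

sobel_se = indirect_effect_se(a, b, se_a, se_b)
z = (a * b) / sobel_se
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
```

When the estimates of a and b are correlated, passing their estimated covariance as cov_ab gives the more general Delta method result.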
With continuous covariates, use StdYX. With binary covariates, use StdY.
Kim posted on Wednesday, April 09, 2014 - 11:04 pm
Dear Professor Muthen,
Thank you very much for this useful response. Still, I have one more question about the standardization. I have both continuous covariates (age) and binary covariates (dummies for education & marital status) in my model. Which standardization should I prefer: STDYX or STDY? Or should I run the model twice, once with STDYX for the continuous covariates and once with STDY for the binary covariates? And does the type of mediation variable matter, as this is a latent factor (so a continuous variable)?
Thank you again for your useful response. Unfortunately, I cannot obtain the STDY standardization for the binary covariates in my model as I am using WLSMV estimation to estimate a PROBIT model. In the MPlus users guide on p.643, it is stated that "For weighted least squares estimation when the model has covariates, STDY and standard errors for standardized estimates are not available".
So, I can only obtain the standardized regression coefficients for the continuous covariates and not for the binary covariates. Is there a solution for this problem? Should I use the STD standardized coefficients for the binary covariates? Or can I calculate the STDY standardized coefficients from the STDYX and STD coefficients?
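On the arithmetic side of that last question, the two standardizations differ only by the standard deviation of the covariate, assuming the usual definitions StdYX = b*SD(x)/SD(y) and StdY = b/SD(y). The Python sketch below shows the conversion. The function names and values are hypothetical, the SD of the binary covariate is taken as sqrt(p*(1-p)) for illustration, and no standard errors are obtained this way.

```python
import math


def stdyx(b, sd_x, sd_y):
    """Slope standardized with respect to both x and y."""
    return b * sd_x / sd_y


def stdy(b, sd_y):
    """Slope standardized with respect to y only (suited to binary x)."""
    return b / sd_y


def stdy_from_stdyx(stdyx_value, sd_x):
    """Recover the y-only standardization by undoing the scaling by SD(x)."""
    return stdyx_value / sd_x


# Hypothetical values: a dummy covariate with 40% ones.
p = 0.4
sd_x = math.sqrt(p * (1.0 - p))
b, sd_y = 0.30, 1.2

converted = stdy_from_stdyx(stdyx(b, sd_x, sd_y), sd_x)
```

The converted value is the change in the outcome, in SD(y) units, when the dummy moves from 0 to 1.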