
Anonymous posted on Tuesday, March 25, 2003  2:32 pm



I am testing a mediation model in which my x variable and mediator variable are continuous, but my Y variable is dichotomous. Is there something like a Sobel test that can be used to calculate the mediated effect in this situation? Thanks! 

bmuthen posted on Tuesday, March 25, 2003  2:39 pm



You can use the Sobel approach to test the mediated effect on the underlying y* variable for your ultimate, dichotomous outcome. You use the same approach as if y was continuous. The necessary variances and covariances for the estimates are found in Tech3. 


Bengt, Are you familiar with David MacKinnon's Psych Methods article from last year (issue 2)? He seems to find that the Sobel test is noticeably underpowered. 

bmuthen posted on Wednesday, March 26, 2003  10:36 am



Yes, I think that could be a concern with small sample sizes. 

Anonymous posted on Monday, November 15, 2004  7:19 pm



Hi, I ran a mediation model in which x and the mediator are continuous latent variables with categorical indicators and y is an observed categorical outcome. I specified type = missing and parameterization is theta. The WLSMV estimator was used. I have three questions:
1. Is it correct to do mediation using the IND statement in Mplus with a categorical observed dependent variable?
2. If so, in interpreting the mediated effect in Mplus output, am I correct in thinking the probit link function, rather than the logit link function, was used?
3. How do you interpret the parameter estimates Mplus outputs for the indirect and direct effects, given the probit function was used? The manual shows calculations to determine probabilities and odds ratios if the logit link function was used, but not the probit link function.
Thank you!


1. Yes. 2. The probit link is used with WLSMV. The logit link is used with maximum likelihood estimators in Mplus. 3. Any indirect effect that has a probit regression coefficient as part of it is interpreted as a probit regression coefficient. See Technical Appendix 1 for a description of how to calculate probabilities from a probit regression coefficient. 

Anonymous posted on Tuesday, November 30, 2004  2:17 pm



Hi, thank you for your response. I wanted to follow up to make sure I am not completely misunderstanding mediation with probit regression coefficients. I think I calculated the probabilities correctly, but I am having difficulty understanding what the probabilities for the direct and indirect effects mean. My independent variable is a continuous latent variable (DELINQ). My mediator is a continuous latent variable (EXPEC). My outcome is an observed binary variable (EVERALC). The model results are:

                      Est     S.E.   Est/S.E.    Std    StdYX
  EXPEC ON
    DELINQ           0.760   0.014    54.235    0.616   0.616
  EVERALC ON
    DELINQ           0.290   0.008    38.121    0.522   0.320
    EXPEC            0.405   0.007    59.748    0.897   0.551
  Specific Indirect
    DELINQ           0.308   0.006    52.165    0.553   0.339

The threshold for EVERALC is .167.

Here is what I did to convert probits to probabilities. I used the formula: Normdist(-Threshold + B1*(Mean on DELINQ)).

Direct effect: Normdist(-.167 + .290*0) = .43. What I want this .43 to mean is the probability of EVERALC=1 after controlling for EXPEC and when DELINQ is at its mean. I get confused here because I am not sure that EXPEC is at its mean. Does this matter? For the direct effect, if I calculate 1 standard deviation below the mean of DELINQ and 1 standard deviation above the mean, respectively, I get .32 and .55.

Now for the indirect effect: Normdist(-.167 + .308*0) = .43. Mathematically, it makes sense that the answer is the same as for the direct effect, but it is confusing conceptually. Am I missing something? I again calculated 1 standard deviation below the mean and 1 standard deviation above the mean on DELINQ and got .32 and .56.

The model does not give me an estimate for the total effect. But can I assume the total effect is the direct effect plus the indirect effect (.308 + .290 = .598)? Thank you for your help!

bmuthen posted on Tuesday, November 30, 2004  6:05 pm



There are a couple of issues here. First, you should ask for a standardized solution so that you get a printout of the residual variance for the "y*" variable behind Everalc (see tech app 1). This variance should be used in the calculations, dividing the Normdist argument by its standard deviation. Second, the indirect and direct effects are on the y* variable behind Everalc to make the analogy with the continuous dependent variable case. So, you are asking about the effect on this y* variable of a 1-unit change in Delinq. This analogy is useful regarding your direct effect question, because with continuous dependent variables you would not ask at which value the mediator is held. Third, note that in terms of probabilities, because of the nonlinear probability curve, a given change in y* has a different impact on the probability depending on where on the y* scale you are. This means that indirect and direct effects are not translatable into simple probability statements. Others may want to chime in.
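To illustrate the third point, here is a minimal Python sketch (not Mplus output) that assumes a standardized probit index, so that P(y = 1) is the standard normal CDF of the index. The function name and the starting values are hypothetical; the only purpose is to show that the same shift in y* produces different probability changes at different starting points.

```python
from statistics import NormalDist


def probability_change(index_start, delta):
    """Change in P(y = 1) when the probit index moves by `delta`.

    Assumes P(y = 1) = Phi(index) with a unit residual SD (an assumption
    made for illustration only).
    """
    phi = NormalDist().cdf
    return phi(index_start + delta) - phi(index_start)


# The same 0.3 shift on the y* scale, applied at different starting points.
for start in (-2.0, -1.0, 0.0, 1.0, 2.0):
    change = probability_change(start, 0.3)
    print(f"index {start:+.1f} -> {start + 0.3:+.1f}: probability change {change:.3f}")
```

The change is largest near an index of zero (probability near .5) and shrinks in the tails, which is why a single probit effect cannot be restated as a single probability difference.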

Anonymous posted on Wednesday, December 01, 2004  12:02 pm



OK, thanks. Is there something I can do with the probit regression coefficients for the direct and indirect effects that would make them understandable and interpretable for the average reader?

bmuthen posted on Wednesday, December 01, 2004  3:43 pm



I think similar issues are studied by David MacKinnon at ASU, who might chime in later in this discussion, or you can contact him directly.


Dear Bengt, Thanks for asking me to comment. Some of the problems with mediation in logistic and probit regression are described in MacKinnon and Dwyer (1993), Evaluation Review, 17, 144-158, and also Winship and Mare (1983), American Journal of Sociology, 89, 54-110. We have a paper under review now that more clearly describes the reason for the problem and a solution (MacKinnon, Lockwood, Brown, & Hoffman, 2004).

In general, however, simulation studies of the delta method (a.k.a. Sobel) and related standard errors for the X to M path from OLS and the M to Y (adjusted for X) path from logistic or probit regression suggest that this method works in the same way (at least in terms of power and Type I error rates) as for continuous variables. An extensive simulation study on this topic was presented at the Society for Prevention Research 2003 meeting (MacKinnon, Yoon, and Lockwood, 2003). We are about to submit a paper on this. As mentioned by Patrick Malone, the delta method standard error test has lower power than a test based on the distribution of the product and resampling tests, for the categorical Y as well as the continuous Y. Note that Mplus now has bootstrap confidence limits, which allow for asymmetric CLs, so you could use confidence limits from the Mplus program with the specification of Y as a categorical variable. It seems sensible to me that the IND statement will be accurate with the WLSMV estimator and categorical variables. I believe that the standard errors for these effects are based on the multivariate delta method.

The question regarding the meaning of the probit coefficients is a good one and I don't have a complete answer. In the past, I have interpreted the coefficients as the change in the probit (z) latent variable for a one-unit change in X, as described in Bengt's message. In logistic regression, exp(coefficient) is the odds ratio, as you know, and that has a very clear interpretation. And computing predicted probabilities as suggested by Linda Muthen is a great way to summarize the overall model results. I suppose you could make different plots of predicted probabilities varying the value of one predictor, as Anonymous has done. I think it is important to keep track of the means in this plot, as it is usually a good idea to plot values that actually occurred in your data. Another option (but check the research literature first) is to convert the probit coefficients to logistic coefficients using the general formulas in Maddala and other places, then discuss the coefficients in terms of odds ratios. There is probably a study somewhere that evaluated how accurate this is; the accuracy probably depends on whether the probits are in the tails of the distribution, where the logit and probit differ the most.
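As a rough illustration of the resampling idea mentioned in this post, the following Python sketch computes a percentile bootstrap interval for the product a*b. It is a simplified stand-in, not the Mplus procedure: it uses simulated data, a continuous Y, and ordinary least squares for both paths, and every name in it is hypothetical. For a categorical Y, the probit or logistic model would have to be refit in each resample, which is not shown here.

```python
import random


def _cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """a*b, with a from M on X and b from Y on M adjusted for X (both OLS)."""
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, draws=500, level=0.95, seed=1):
    """Percentile bootstrap confidence limits for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(draws):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * draws)]
    upper = estimates[int((1 + level) / 2 * draws) - 1]
    return lower, upper


# Simulated example data (assumed true paths: a = 0.5, b = 0.5).
sim = random.Random(0)
x = [sim.gauss(0, 1) for _ in range(300)]
m = [0.5 * xi + sim.gauss(0, 1) for xi in x]
y = [0.5 * mi + 0.2 * xi + sim.gauss(0, 1) for xi, mi in zip(x, m)]

lo, hi = bootstrap_ci(x, m, y)
print(f"indirect effect {indirect_effect(x, m, y):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Because the limits are taken from the empirical distribution of a*b, they need not be symmetric around the point estimate, which is the property the post refers to.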


I'm also testing mediation with dichotomous outcomes and I'm at my wit's end trying to figure out how to translate the probit coefficients into probabilities. My model has a dichotomous outcome: adult sexual assault between two time points (ASA). There are three primary predictors:
Child sexual abuse (CSA) - dichotomous
Psychological Distress (PD) - continuous
Sex motives (SM) - continuous
I also have 5 covariates in the model. I have two indirect paths in my model:
CSA to PD to SM to ASA
CSA to PD to ASA
I am using WLSMV estimation.

First question, which I believe was answered above by David MacKinnon: are the significance tests for the indirect paths (using model indirect) appropriate with probit coefficients?

Second question: how do I translate the probits into probabilities? Let me tell you what I have done. I have looked at Technical Appendix 1 referenced by Linda Muthen above. This may be enlightening for some people but it is not for me; it seems like equation 9 may be the one to use, but it is not clear to me what to plug into it. I have obtained several Sage pubs on this issue to try to understand it. Liao's Interpreting Probability Models is helpful, and there is a formula on page 23 demonstrating how to translate probits into probabilities that I understand, and I was able to figure out in SPSS how to use the cumulative normal distribution function to translate a z score into a probability. The formula in Liao is as follows: Prob(white female)(y = 1) = CND (.106Xconstant - .780*white - .377 * female). The problem I am having translating this is that when I look at the printout from my SEM model, I don't have a constant like I would in a traditional regression, so this formula must be modified somehow. I looked at the post above with the formula for the example with everalc, but again I'm missing something: what is the threshold and where do you get it to plug into this equation?

Hopefully my question is clear, along with the steps I have taken to try to resolve this on my own. These analyses are for a manuscript that I am trying to get back ASAP to a journal. Although I know that rerunning my analyses as I have with WLSMV and categorical outcomes is the most appropriate analysis because I have a low base-rate dichotomous outcome, these new analyses were not requested by the reviewers, and I am worried that I'm going to have to dump them and go back to my previous analyses (where ASA was treated as a continuous outcome) if I can't figure out how to report the probits as probabilities so that the reader (and I) can understand them. I would appreciate any assistance and some hand-holding on these calculations. Thanks in advance.

BMuthen posted on Thursday, January 20, 2005  7:55 pm



The general formula for translating probit coefficients into probabilities is: prob (y=1) = f (-threshold + b1*x1 + b2*x2 ...), where f is the cumulative normal distribution function. Take the threshold value from the output (it is the negative of the intercept constant that the article used).
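The formula can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything produced by Mplus; the function name and the `resid_sd` argument are hypothetical. The `resid_sd` argument reflects the scaling step mentioned earlier in the thread (dividing the argument by the standard deviation of the y* residual) and defaults to 1 as an assumption.

```python
from statistics import NormalDist


def probit_probability(threshold, coefs, xs, resid_sd=1.0):
    """P(y = 1) = f(-threshold + b1*x1 + b2*x2 + ...), f = standard normal CDF.

    `coefs` are unstandardized probit slopes and `xs` the matching predictor
    values. `resid_sd` is an assumed residual SD of y* (1.0 by default).
    """
    index = -threshold + sum(b * x for b, x in zip(coefs, xs))
    return NormalDist().cdf(index / resid_sd)


# Values quoted earlier in the thread: threshold .167, direct-effect slope .290.
for x in (-1.0, 0.0, 1.0):
    p = probit_probability(0.167, [0.290], [x])
    print(f"x = {x:+.1f}: P(y = 1) = {p:.2f}")
```

With those quoted values and x set to -1, 0 and 1, the function returns roughly .32, .43 and .55, the figures reported in the EVERALC post.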


Thanks for the quick response. I am having trouble locating the threshold value from the output; I don't see it anywhere in my standard output. I then added basically every output option I could (many of which weren't valid anyway) and I still didn't see it anywhere. Sorry. Here is my input:

  VARIABLE:
    names are csa t2agecs t3intv dep hos anx sxc1 sxc2 sxc3 asa
      t1racbnb t1racwnw t2rses0 t2psych nt2phys0;
    usevariables = t2agecs t3intv t1racbnb t1racwnw t2rses0 t2psych
      nt2phys0 csa dep hos anx sxc1 sxc2 sxc3 asa;
    categorical are asa;
  Model:
    Distress by dep anx hos;
    sexcope by sxc1 sxc2 sxc3;
    ASA on sexcope distress csa t2agecs t3intv t1racbnb t1racwnw
      t2rses0 t2psych nt2phys0;
    sexcope on distress t2agecs t3intv t1racbnb t1racwnw t2rses0
      t2psych nt2phys0;
    distress on csa t2agecs t3intv t1racbnb t1racwnw t2rses0
      t2psych nt2phys0;
  Model indirect:
    ASA via distress csa;
  ANALYSIS:
    !TYPE IS BASIC;
    ESTIMATOR IS wlsmv;
    ITERATIONS = 1000;
    CONVERGENCE = 0.00005;
  OUTPUT:
    standardized; res; mod; cint; h1se; h1te; pat; fscoef;
    tech1; tech2; tech3; tech4; tech5; tech6; tech7; tech8; tech9;
    tech10; tech11; tech12; tech13;


You have to add MEANSTRUCTURE to the TYPE option of the ANALYSIS command. Then you will find the thresholds in the results under Thresholds. If this doesn't solve the problem, send the full output to support@statmodel.com and briefly explain your problem. 


Yes, Meanstructure did the trick, thank you. One more question though. Bengt wrote: prob (y=1) = f (-threshold + b1*x1 + b2*x2 ...). The hyphen in front of threshold: should I treat that as a negative? Specifically, the threshold number in my printout is .753; in this formula, is that then "negative .753"? I just want to be perfectly clear. Thank you!!

bmuthen posted on Sunday, January 23, 2005  3:31 pm



Yes, "-threshold" means the negative of the threshold.


Thank you for clarifying. I'll have a go at the calculations then! Thanks!

Mary posted on Wednesday, April 13, 2005  8:37 am



Hi, I am working on a model in which X is a mediator between A and B. Hence, I am looking at the indirect relation (IND) going from A to B through X. I know that the coefficient of the indirect relation comes from the product of the coefficients of the regressions 'X ON A' and 'B ON X'. What is the interpretation of the indirect relation? What does it mean if it is negative? Best regards, Mary

BMuthen posted on Wednesday, April 13, 2005  11:17 pm



The indirect effect is interpreted as follows. For a one unit change in a, the variable b changes by the value of the indirect effect. If a increases by one, then b decreases if you have a negative indirect effect. 


I have a question about a mediation model within a standard path analysis where I have an ordinal outcome measure with a clear bimodal distribution. My instinct is to treat this variable as a dichotomous variable, although I know that this will cost me variance and is generally discouraged. I have followed this discussion and see that probit regression coefficients are output for dichotomous outcomes and that reporting probabilities is a good general way of summarizing the indirect/direct effects. Does anyone have a suggestion for the best way to proceed with an outcome measure with a bimodal distribution?

BMuthen posted on Tuesday, April 19, 2005  11:55 am



It sounds like a reasonable approach to dichotomize the variable as you suggest. If you believe that the two modes correspond to the means of different latent classes of individuals, you could instead try mixture modeling and treat the outcome as ordinal or continuous. 


Thank you, Dr. Muthen, for your suggestion. I do believe that the modes correspond to the means of two different classes of individuals, but it might be hard for me to justify this theoretically given the lack of data on the population sampled. The ordinal measure ranges from 0-5 with different ranges for each level (2 = 5-10 years, but 4 = greater than 20, but not forever). The modes are 2 (32%) and 4 (29%). If I split the distribution into a dichotomous variable I get a 56.5/43.5 split. In terms of the mixture modeling suggestion, I think I will try this. However, is it a problem to have 1 observed indicator for a 2-class latent variable, or do I treat each level as its own indicator? And would the suggestions in terms of reporting the results of a mediation model still apply?

bmuthen posted on Tuesday, April 19, 2005  1:29 pm



You can have 2 classes and only 1 indicator when you have covariates in the model as you do. The mediational aspect of the model gets more complex to report with a mixture although is perhaps more realistic. 

Anonymous posted on Saturday, May 21, 2005  2:52 pm



In Mplus, how do I test whether indirect effects are significant or not? Could you please provide a formula for the calculation? And, for such a test, is there any difference between the case when the mediator variable is binary and the case when the mediator variable is continuous? Thanks! David


I'm not sure if you are aware of MODEL INDIRECT. This is how indirect effects are tested in Mplus. There is a MacKinnon reference in the user's guide under MODEL INDIRECT that describes the Mplus indirect effects computations. 

Anonymous posted on Wednesday, August 17, 2005  9:26 am



As far as I understand, the coefficients for a relationship with a binomial or categorical outcome are probit coefficients, whereas relationships between two continuous variables are OLS regression coefficients. If a model contains both kinds of relationship, then I have two different kinds of coefficients. Is it allowed to compare the standardized values in the sense of "this variable has a greater relative effect on y than the other one", although they are not the same?

BMuthen posted on Wednesday, August 17, 2005  2:14 pm



I don't think standardized probit and standardized linear regression coefficients are comparable. Although they are comparable in terms of the latent response variable underlying the categorical outcome, the former ultimately relate to a probability whereas the latter do not. 


I am using mediational analysis with a dichotomous dependent variable and I am not sure which approach (i.e., SEM or logistic regression) I should use. Could you please let me know which is the best approach to use and why? Thanks. Julietta


I'm not sure what you mean by the distinction "between SEM and logistic regression". In Mplus, there are two estimation options for a dichotomous dependent variable. With the weighted least squares estimator, you can estimate a probit regression. With maximum likelihood, you can estimate a logistic regression.

bmuthen posted on Wednesday, December 07, 2005  7:56 am



To add to Linda's answer, here is what I just posted on SEMNET:

Both Colin's and Istvan's posts concern a dependent observed variable that is binary. Colin has a factor model predicting discrimination or not and Istvan has a path model where the ultimate dependent variable is survival or not. Istvan's model also has a mediating variable of colouration which perhaps is categorical. These applications gave rise to a discussion of analysis using SEM and logistic regression, where it wasn't clear what was possible. Latent class analysis also came up but is not relevant because there is no latent categorical variable involved. Because of this, it is worthwhile to make clear that both Colin's and Istvan's examples can be analyzed by Mplus since Version 3 came out in March 2004. If by conventional SEM one means analysis using continuous outcomes, conventional SEM can therefore be combined with logistic regression features, still using maximum-likelihood estimation. The categorical dependent variable can be an ultimate dependent variable or a mediating variable. Several examples of related types are given in the Mplus User's Guide and the examples can be seen at http://www.statmodel.com/ugexcerpts.shtml. There are also free web videos with these kinds of examples to watch at http://www.statmodel.com/trainhandouts.shtml

Bengt Muthen

Annonymous posted on Saturday, February 04, 2006  2:33 pm



In Mplus version 3, is the significance test for indirect effects based on the Sobel method [a*b/SQRT(b^2*sa^2 + a^2*sb^2)] or the MacKinnon method (a*b/standard error of a*b)?

bmuthen posted on Sunday, February 05, 2006  5:04 pm



Mplus uses the standard error of a*b. I think you refer to the fact that in some writing the covariance between the a and b estimates is excluded; my understanding is that this is done for special simple models where this covariance is in fact zero. Mplus cannot exclude this covariance because it allows for general modeling. Mplus also provides bootstrap standard errors as well as bootstrap confidence intervals for cases where the a*b distribution is not close to normal.
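A small Python sketch of the first-order delta-method standard error for a*b may help make the role of the covariance term concrete. It is an illustration only and is not taken from Mplus; the function name is hypothetical. Setting `cov_ab` to zero gives the Sobel formula quoted in the question; a nonzero value adds the term 2*a*b*cov(a, b).

```python
import math


def indirect_effect_test(a, b, se_a, se_b, cov_ab=0.0):
    """First-order delta-method test of the indirect effect a*b.

    var(a*b) is approximated by b^2*var(a) + a^2*var(b) + 2*a*b*cov(a, b).
    With cov_ab = 0 this is the Sobel standard error.
    """
    estimate = a * b
    variance = b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2 + 2 * a * b * cov_ab
    se = math.sqrt(variance)
    return estimate, se, estimate / se


# Path estimates quoted earlier in the thread (EXPEC ON DELINQ, EVERALC ON EXPEC).
# The covariance between the two estimates is not reported there, so it is
# set to zero here purely as an assumption.
est, se, z = indirect_effect_test(0.760, 0.405, 0.014, 0.007)
print(f"a*b = {est:.3f}, SE = {se:.4f}, z = {z:.1f}")
```

Because the covariance is assumed to be zero, the standard error from this sketch should not be expected to match the value printed in that post's output.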

Annonymous posted on Wednesday, February 08, 2006  10:29 am



In my model, the outcome is dichotomous, the predictor variable is ordinal, and the mediating variable is continuous latent. The estimator is WLSMV and this is a probit regression. In trying to sort out how to calculate the interpretation of the parameters (given that it is probit, not logistic), I have been confused by references to the 'threshold' values. In this type of model, should I not be using the model-estimated slope plus the Beta estimates for the covariates (multiplied by the values of X)?

Andrew posted on Wednesday, February 08, 2006  1:07 pm



What is the meaning of negative thresholds in models involving ordinal data? I am running a CFA with ordinal items using a WLSMV estimator. Is this something I should be concerned about?

bmuthen posted on Wednesday, February 08, 2006  6:26 pm



Answer to Anonymous: To get the indirect effect you should use the product of the slopes. The threshold is the same as a negative intercept and is therefore not used for indirect effect calculations. 

bmuthen posted on Wednesday, February 08, 2006  6:27 pm



Answer to Andrew: Negative thresholds are fine. The analogy is an intercept; it can have both positive and negative values.

Annonymous posted on Monday, February 13, 2006  11:55 am



For the calculation of probit probabilities as outlined in a previous post [prob (y=1) = f (-threshold + b1*x1 + b2*x2 ...)], are the 'b' values the unstandardized parameter estimates?


Yes, the b values are the unstandardized probit regression coefficients. 

Annonymous posted on Tuesday, February 14, 2006  8:57 am



I have a model in which:
independent is categorical
mediator1 is categorical
mediator2 is latent
dependent is binary
Due to the binary dependent variable, probit regression is used. I assume that the parameters associated with the dependent variable and mediator1 are probit regressions. Is the relationship between the categorical independent variable and the latent continuous mediator2 simply a linear regression? If so, using the standardized parameter estimates, can I compare the 'magnitude' of the two indirect pathways given that one is composed of 2 probit paths and the other is part probit, part linear regression?

Annonymous posted on Tuesday, February 14, 2006  9:26 am



Follow up: for mediator1, the categorical variable, is there any way of testing if the slopes are equal between category levels? 

bmuthen posted on Tuesday, February 14, 2006  5:08 pm



Answers to your 3 questions: 1. That's a correct assumption. 2. Yes. 3. That is a stretch given that the coefficients have different meanings. Regarding your follow-up question, I thought your mediator 1 was an ordered polytomous variable, which therefore has only 1 slope.

Annonymous posted on Wednesday, February 15, 2006  7:20 am



Ok...re: point number 3, I should just ignore the output that is generated with the model indirect option then? 

bmuthen posted on Wednesday, February 15, 2006  7:59 am



I would not ignore the indirect output since you can certainly take a separate look at each of the two indirect effects and see how large they are. The "stretch" I mentioned concerns comparing the two indirects, but this is ok if you accept the latent response variable conceptualization for the categorical mediator; then both indirect effects go via a continuous mediator. 

Annonymous posted on Friday, February 17, 2006  1:31 pm



re: "I thought your mediator 1 was an ordered polytomous variable which therefore has only 1 slope." It is an ordered polytomous variable (4 levels). There are three threshold values listed in the output for this variable, with $1, $2, and $3 appended to the variable name. What I am trying to do is calculate the probability values associated with the mediator (as a dependent variable) in relation to the value of the independent variable. Given that there are three threshold values provided, am I supposed to calculate three separate equations for the probabilities? If so, does this mean that all of the probabilities are 'relative' to the lowest level of the dependent variable, since it appears to have been excluded?


I am assuming you are using weighted least squares estimation. For an ordered categorical (ordinal) dependent variable with three categories, the probit regression model expresses the probability of u given x using the two thresholds t1 and t2 and the single probit regression coefficient b:

  P(u = 0 | x) = F(t1 - b*x),
  P(u = 1 | x) = F(t2 - b*x) - F(t1 - b*x),
  P(u = 2 | x) = F(-t2 + b*x),

where F is the standard normal distribution function.

Annonymous posted on Tuesday, February 21, 2006  7:13 am



Ok, thanks. There are four levels to this ordinal variable, though, so is the next level to this equation p (u = 4 | x) = F (-t3 + b*x)? Sorry for all the questions. Can you recommend any particularly good resources?

bmuthen posted on Tuesday, February 21, 2006  3:19 pm



In that case the u=0 and the u=1 probs are as before, while

  P(u = 2 | x) = F(t3 - b*x) - F(t2 - b*x),
  P(u = 3 | x) = F(-t3 + b*x).
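The category probabilities for an ordinal probit outcome follow one pattern for any number of categories, which the Python sketch below tries to capture. It is an illustration rather than Mplus code, and the thresholds and slope in the example are made-up values. Each cumulative probability P(u <= k | x) is F(t_k - b*x), and each category probability is the difference between adjacent cumulative probabilities, so the last one equals 1 - F(t_last - b*x) = F(-t_last + b*x).

```python
from statistics import NormalDist


def ordinal_probit_probabilities(thresholds, b, x):
    """Category probabilities P(u = 0 | x), ..., P(u = K | x).

    `thresholds` must be in increasing order (t1 < t2 < ...); `b` is the
    single probit slope shared by all categories.
    """
    F = NormalDist().cdf
    cumulative = [F(t - b * x) for t in thresholds]  # P(u <= k | x)
    bounds = [0.0] + cumulative + [1.0]
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


# Hypothetical four-category example: three thresholds, one slope.
probs = ordinal_probit_probabilities([-0.5, 0.4, 1.2], b=0.6, x=1.0)
for category, p in enumerate(probs):
    print(f"P(u = {category} | x = 1) = {p:.3f}")
```

No category is left out as a reference level: the lowest category gets its own probability, F(t1 - b*x), and the probabilities sum to one.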

bmuthen posted on Tuesday, February 21, 2006  3:20 pm



A good intro ref is Agresti, A. (1996). An introduction to categorical data analysis. New York: John Wiley & Sons. 

annonymous posted on Thursday, February 23, 2006  6:13 am



If the outcome variable is dichotomous, and one of the predictor variables is a latent continuous variable derived from 2 categorical variables, what values would I substitute in to calculate probabilities in the case of the latent variable? Would it still be in units of 'one'?


You could calculate the probability for the mean of the latent variable and plus and minus one standard deviation. 

Annonymous posted on Monday, February 27, 2006  1:38 pm



When t-statistics are generated from the model indirect command, is it a one- or two-tailed value? I am interested in knowing the value associated with an alpha of 0.05.


Two-tailed: 1.96.
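For readers who want to check the cutoff or turn an Est/S.E. ratio into a p-value, here is a short Python sketch. It assumes the ratio is referred to a standard normal distribution; the function name is hypothetical.

```python
from statistics import NormalDist


def two_tailed_p(z):
    """Two-tailed p-value for a z ratio under the standard normal."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Two-tailed critical value for alpha = 0.05.
critical = NormalDist().inv_cdf(1 - 0.05 / 2)
print(f"critical value = {critical:.2f}")
print(f"p-value for z = 2.50: {two_tailed_p(2.5):.4f}")
```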

HWard posted on Tuesday, March 07, 2006  12:28 pm



What source would I reference for the type of mediation significance test that is used in Mplus? 


The Version 4 User's Guide references MacKinnon, D.P., Lockwood, C.M., & Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39, 99-128.


Dear Muthen and Muthen, I work with Mplus and have a few questions that I would like to ask. (1) Can I compare the direct effects from a model with mediation effects to those from the same model without the mediating variables? If so, can I conclude anything about the mediation effect based on this comparison? (2) I have a model where the indirect effect A-B-C is significant but weak, the direct effect A-B is nonsignificant, and the direct effect A-C is significant. Can I affirm that a mediation effect exists? What if A-B is significant and A-C is not? (3) How should I interpret the following result: a negative indirect effect (A-B-C) and a positive direct effect (A-C)? Is this what we call a suppressor effect? Thank you very much for your help. I promise I will acknowledge you at my dissertation oral defense. Best Regards, Barbatruc

HW posted on Tuesday, October 03, 2006  1:12 pm



re: post by bmuthen on Wednesday, February 15, 2006  7:59 am "I would not ignore the indirect output since you can certainly take a separate look at each of the two indirect effects and see how large they are. The "stretch" I mentioned concerns comparing the two indirects, but this is ok if you accept the latent response variable conceptualization for the categorical mediator; then both indirect effects go via a continuous mediator." Could you recommend a reference supporting the examination of indirect paths (but not the direct comparison of their coefficients) in a model where the mediating variables have different scales (i.e. latent vs. ordinal)? 


Please contact David MacKinnon at ASU for writings related to this. 


Hi, Bengt. Regarding the earlier question about interpreting probit regression coefficients, I have a situation with a latent variable measured by two continuous indicators predicting a dichotomous mediator which in turn predicts a continuous outcome. Because the mediator is dichotomous, we are using the THETA parameterization to fit this model with WLSMV estimation. Would it be appropriate to multiply the probit coefficient by the constant 1.7 and then raise e to the power of the resulting value to obtain an approximate odds ratio for this effect? Thanks so much and best regards, Tor 


This would not be appropriate. By multiplying a probit coefficient by 1.7, you put it on the logit scale. You don't make it a logistic regression coefficient. You cannot turn a probit coefficient into an odds ratio because it does not have constant odds as a logistic regression coefficient does. 
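The point about non-constant odds can be checked numerically. The Python sketch below is an illustration with made-up values and a hypothetical function name: it computes the odds ratio implied by a one-unit increase in x under a probit model at several starting points and prints exp(1.7*b) alongside for comparison.

```python
import math
from statistics import NormalDist


def probit_odds_ratio(index, b):
    """Odds ratio for a one-unit increase in x under a probit model.

    `index` is the probit index before the increase and `b` the probit
    slope, so the index after the increase is index + b.
    """
    F = NormalDist().cdf

    def odds(p):
        return p / (1.0 - p)

    return odds(F(index + b)) / odds(F(index))


b = 0.5  # hypothetical probit slope
print(f"exp(1.7*b) = {math.exp(1.7 * b):.2f}")
for start in (-2.0, 0.0, 2.0):
    print(f"starting index {start:+.1f}: odds ratio = {probit_odds_ratio(start, b):.2f}")
```

Under a logistic model the odds ratio would be the same at every starting point; under the probit model it changes with the starting index, so no single odds ratio summarizes the coefficient.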

Anna Song posted on Tuesday, December 19, 2006  2:54 pm



Dear Muthen and Muthen, Regarding the formula to calculate probit probabilities, prob (y=1) = f (-threshold + b1*x1 + b2*x2 ...): would it be appropriate to apply the formula to generate probability values based on an indirect effect estimated via the probit WLSMV estimator? In other words, is the indirect effect coefficient a probit coefficient? If it is not appropriate, how would you suggest presenting the indirect effect? Thank you, Anna


If an indirect effect is the product of two probit regression coefficients, the indirect effect is a probit regression coefficient. 

Anna Song posted on Tuesday, December 19, 2006  8:12 pm



Thank you so much. Another question we had was how to interpret the model as a whole. We have a categorical predictor variable (x), 2 categorical mediators (m1, m2) and a dichotomous endogenous variable (y). We used WLSMV. Our results are as follows:

  RMSEA = 0.0, WRMR = 0.17
  MEANS/INTERCEPTS/THRESHOLDS: Y = .80, M1 = 1.83, M2 = 2.30
  Coefficient (S.E.):
    M1 on X: 0.13 (0.02)
    M2 on X: 0.07 (0.03)
    Y on M1: 1.09 (0.09)
    Y on M2: 0.99 (0.09)
  Intercepts: M1: 1.83 (0.04); M2: 2.30 (0.05)
  Thresholds: Y: 5.73 (0.40)
  Indirect effects:
    X, M1, Y: 0.15 (0.03)
    X, M2, Y: 0.07 (0.03)
    Total: 0.21 (0.05)

We calculate: P(y=1; x=3) = f(-0.8 + 0.21*3) = 43.25%. Would this be correct?

Also, we are interpreting these results to mean X increases M1 and M1 increases Y. Y increases 1.09 units on a cumulative normal curve (or z-score units) for each unit increase in M1. X increases M2 and M2 increases Y. Y increases 0.99 z-score units for each unit increase in M2. Y increases 0.15 z-score units for each unit increase in X (due to M1). Y increases 0.07 z-score units due to X on M2. The total indirect effect of X on Y is .21. Would our interpretations be correct? Any feedback on this would be greatly appreciated. Best, Anna

Boliang Guo posted on Wednesday, December 20, 2006  1:43 am



For categorical Y and/or M, you should rescale a and b when computing mediation effects; see Prof. MacKinnon's 1993 paper (Evaluation Review). Maybe one of Muthen's papers (Psychometrika, 1984) also mentioned the categorical Y and/or M? Alternatively, you can refer to Huang's 2004 paper published in Statist. Med. (2713-2728).

Tor Neilands posted on Wednesday, December 20, 2006  11:30 am



If Anna were to use the standardized coefficients instead of the unstandardized coefficients, would the need to rescale be obviated? 


I don't think you need anything but the product of unstandardized probit coefficients, which is what you get from Model Indirect in Mplus. MacKinnon et al. have a paper under review that maybe we can share shortly.

dm posted on Monday, January 01, 2007  4:30 pm



Hi, I am running a path analysis like this:

  ...
  CATEGORICAL IS Y1 Y2;
  ANALYSIS:
    PARAMETERIZATION=THETA;
  MODEL:
    Y2 ON X1 X2;
    Y1 ON X1 X2 X3 Y2;

Y1 is a binary variable, and Y2 is an ordinal variable. When I interpret the coefficients, can I interpret them as "the changes of the latent continuous variable underlying" Y2 (in the first equation) and Y1 (in the second equation)? For example, if the coefficient of X1, a continuous variable in the first equation, is .2, can I say a one-unit increase in X1 will lead to a .2 increase in the continuous latent variable underlying Y2? Thanks!


Yes, you can say that. You can also look at sign and significance of the parameter estimate and convert it to a probability. 

chris dawes posted on Tuesday, February 05, 2008  11:14 am



I am working on a mediation model with a random intercept in which the mediator and dependent variables are dichotomous. I am modeling this as:

VARIABLE: NAMES ARE Y M X ID;
CATEGORICAL ARE Y M;
WITHIN IS X;
CLUSTER = ID;
ANALYSIS: TYPE = TWOLEVEL;
ALGORITHM = INTEGRATION;
INTEGRATION = MONTECARLO;
LINK = PROBIT;
MODEL: %WITHIN%
M ON X;
Y ON X M;
%BETWEEN%

I know I have to calculate the Sobel test by hand, but are the coefficients and standard errors comparable across the two equations, or do they need to be rescaled? Thanks. 


They are comparable. I am not, however, aware of an article that discusses the case of mediation when both distal (y) and mediator (m) are categorical, only when y is. MacKinnon at ASU might know; his book is now out. 

chris dawes posted on Tuesday, February 05, 2008  2:18 pm



Thank you! One quick follow-up. In the code above, do I need to include M in the WITHIN statement (line 3) since it is an independent variable in the second model? 


It sounds like you are asking if you should have M included in your statement Y ON X M; if so, yes. Note, however, that both M and Y are allowed variation on both Within and Between (see UG chapter 9), so you want your Between part of the model to relate these variables. As it is, I think you will find only their variances estimated on Between with covariance=0. 


Hello, How does Mplus calculate the standard error of estimates of indirect effects? According to one of the messages posted in 2006, Mplus uses MacKinnon's standard error of a*b. Does this mean that Mplus calculates the standard error from the empirical distributions of a*b? I just realized that the standard error printed in Mplus output is different from the standard error I calculated using Sobel's delta method. Thank you. 


The standard errors for indirect effects are Delta method standard errors unless bootstrap is requested. If you have discrepancies, please send your input, data, output, hand calculations, and license number to support@statmodel.com. 
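
For anyone checking a hand calculation against the output, here is a minimal Python sketch of the first-order delta-method (Sobel) standard error for a single indirect effect a*b. The numbers are hypothetical and not taken from any output in this thread, and the function name is made up for illustration. The sketch defaults to zero sampling covariance between a and b; if the two estimates are correlated (the covariances are reported in Tech3), a hand calculation that ignores that term can differ from the printed value.

```python
import math


def sobel_se(a, se_a, b, se_b, cov_ab=0.0):
    """First-order delta-method SE of the product a*b.

    cov_ab is the sampling covariance between the two estimates
    (assumed zero here unless supplied).
    """
    variance = (b ** 2) * (se_a ** 2) + (a ** 2) * (se_b ** 2) + 2 * a * b * cov_ab
    return math.sqrt(variance)


# Hypothetical estimates and standard errors, for illustration only.
a, se_a = 0.5, 0.1   # mediator regressed on x
b, se_b = 0.4, 0.1   # outcome regressed on mediator

indirect = a * b
se = sobel_se(a, se_a, b, se_b)
z = indirect / se    # ratio of estimate to standard error

print(f"indirect = {indirect:.3f}, SE = {se:.4f}, Est./S.E. = {z:.3f}")
```

Passing a nonzero `cov_ab` shows how much the covariance term alone can move the standard error.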


Dear Linda and Bengt, This discussion site is excellent, but I have a few questions that I am not entirely clear on. I am testing a mediation model: X – M – Y. X is a latent variable (3 continuous indicators), M is an observed continuous variable (scale 0-4), and Y is dichotomous.
1. Using the delta method and MODEL INDIRECT: is the standardised (STDYX) specific indirect estimate the estimate of the z test (Sobel test)?
2. The indicators of my latent X variable deviate from normality, so I think I should use bootstrapping. Or is it OK to report the Sobel z, given that the WLSMV estimator is used (which I think is robust to non-normality)?
2a. If I apply bootstrap and report the 95% CINTERVAL, how can I determine whether this estimate is significant?
3. Finally, as both my direct and indirect effects are probit coefficients, is it best to calculate the probability for (1) the mean of the X latent variable and plus and minus one standard deviation and (2) the mean of the M variable and plus and minus one standard deviation when explaining my model?
Thank you for your help. Grainne 


1. It is the raw coefficient. Significance of the raw coefficient is determined by column three of the output, the ratio of the parameter estimate to its standard error.
2. Normality is not a relevant concept for categorical dependent variables. Floor and ceiling effects are taken into account by categorical data modeling procedures.
2a. If the confidence interval covers the hypothesized value, then the hypothesis cannot be rejected.
3. Yes. 


See also MacKinnon, D.P., Lockwood, C.M., Brown, C.H., Wang, W., & Hoffman, J.M. (2007). The intermediate endpoint effect in logistic and probit regression. Clinical Trials, 4, 499-513, which is posted under Papers, Mediational Modeling. 


Thank you for your quick response. Having read MacKinnon's paper I have one final question: is the model IND based on the product of coefficients method? (I think it is but I just want to be sure). Thanks again. 


Yes. 


In relation to translating probit coefficients into probabilities: prob (y=1) = f (threshold + b1*x1 + b2*x2 ...), If the threshold in the output is already negative, is this indicative of a problem with my data or can I continue with the formula using this negative value to calculate probability? Kind Regards Grainne 


You should change the sign of the threshold whether it is positive or negative. 


Thanks Linda. To clarify, when you say 'change the sign...', do you mean change the sign to positive if it is negative in the output, and change it to negative if it is positive? Or, regardless of the sign in the output, use the negative when calculating the probability? Thank you 


I mean change the sign. That means if it is positive, change it to negative. If it is negative, change it to positive. 
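
The sign change can be sketched in a few lines of Python. This is only an illustration with made-up numbers: the function name is hypothetical, and it assumes the probit coefficients are on the standard-normal scale so that the probability is the normal CDF of (-threshold + b1*x1 + b2*x2 ...).

```python
from statistics import NormalDist


def prob_y1(threshold, coefs, xs):
    """P(y = 1 | x) for a binary probit: Phi(-threshold + sum of b*x)."""
    linear = sum(b * x for b, x in zip(coefs, xs))
    # The threshold enters with its sign changed, whatever that sign is.
    return NormalDist().cdf(-threshold + linear)


# Hypothetical values: one predictor with coefficient 0.3, evaluated at x = 1.
p_pos = prob_y1(0.5, [0.3], [1.0])    # positive threshold -> Phi(-0.5 + 0.3)
p_neg = prob_y1(-0.5, [0.3], [1.0])   # negative threshold -> Phi(0.5 + 0.3)

print(round(p_pos, 4), round(p_neg, 4))
```

A negative threshold in the output is therefore not a sign of a data problem; it simply becomes a positive term inside the function.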

Emily Blood posted on Monday, December 22, 2008  2:00 pm



Hi, I have a latent growth curve with binary outcomes and mediation between repeated predictor and repeated outcome. I am using the logit link and MLR estimation and running a monte carlo simulation (data generated outside of Mplus). The INDIRECT option does not work with this type of estimation so based on an earlier post from Dr. Muthen I'm using the MODEL CONSTRAINT command to obtain the total effect between the predictor and outcome. I do get the mean value of the total effect of the monte carlo sample, but my problem is that the correct population value is not being used in calculating the 95% coverage. I put in the true population values for the direct and indirect path parameters which =0.3, but the true population parameter value is being indicated as 0.5 in the output. Can this be corrected? A starting value (specified with "*0.3" in the MODEL CONSTRAINT command) is ignored. Thanks! 


The starting value is given as part of the NEW option, for example, NEW (c*.3); 

Emily Blood posted on Tuesday, December 23, 2008  7:08 am



Again, I have a mediated binary growth curve. The MacKinnon and Dwyer 1993 paper indicates that both a and b need to be divided by the v(Y*) before being multiplied to obtain the value of the indirect effect. However, above it has been stated that just using the product of the unstandardized a and b is all that is needed to obtain the value of the indirect effect (and that this is what is given by the indirect statement). When this rescaling is not done, the a*b estimates are all positively biased (Table 2 of their article). Which of these is correct, or when does each apply? The reason I ask is that I have generated data with known a and b and am estimating the total and indirect effect, and all of my estimates (obtained from MODEL INDIRECT or MODEL CONSTRAINT, depending on whether I use the probit or logit link) are positively biased. I wonder if I'm not calculating the indirect and total effect correctly? Any insight you could give would be greatly appreciated. Thanks, Emily 


Please see their more recent article: MacKinnon, D.P., Lockwood, C.M., Brown, C.H., Wang, W., & Hoffman, J.M. (2007). The intermediate endpoint effect in logistic and probit regression. Clinical Trials, 4, 499-513. In summary, for two fixed effects the indirect effect is the product of a and b. For two random effects, the indirect effect is the product of a and b plus their covariance. For one fixed effect and one random effect, the indirect effect is the product of a and b. 
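
The three cases in this summary reduce to one small formula, sketched here in Python with hypothetical numbers. The function name is made up for illustration; for random effects, a and b are taken to be the means of the random slopes and cov_ab their covariance, which is an assumption about how the summary is meant to be read.

```python
def indirect_effect(a, b, cov_ab=0.0):
    """Indirect effect as a*b, plus cov(a, b) when both slopes are random.

    With two fixed effects, or one fixed and one random effect,
    leave cov_ab at zero.
    """
    return a * b + cov_ab


# Hypothetical values for illustration only.
fixed_case = indirect_effect(0.3, 0.5)                  # product only
random_case = indirect_effect(0.3, 0.5, cov_ab=0.02)    # product plus covariance

print(round(fixed_case, 3), round(random_case, 3))
```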

Emily Blood posted on Tuesday, December 23, 2008  9:42 am



I have read their more recent article which seems to indicate that I would use the a*b in my case where a and b are the unstandardized regression coefficients. However when doing this, I am getting all positive biases so I was wondering if I was missing something. 


What is your evidence that the indirect effects are biased? 


Hi there, I am running a model that perfectly matches the description of PATH ANALYSIS WITH A CATEGORICAL DEPENDENT VARIABLE AND A CONTINUOUS MEDIATING VARIABLE WITH MISSING DATA in your manual. However, about 60 of the 178 cases are missing on the mediating variable. Nonetheless, I notice that the analysis seems to be based on the full sample of 178, as this is the sample size indicated. The ESTIMATOR = MLR and INTEGRATION = MONTECARLO. Are the missing values being imputed somehow? And how reliable are these results with about 1/3 of the data missing? Can I just use this procedure for an empirical publication? Many thanks for your reply, Serge Rijsdijk, Rotterdam, the Netherlands 


Missing variables are not being imputed in the Mplus ML estimation. The analysis uses the "MAR" assumption of the classic Little & Rubin (2002) missing data book. MAR uses all available information, such as individuals with scores on only x and y, not m. The sample size listed is the total number of subjects who contribute to the analysis. 1/3 of the data missing is a lot and means that there can be many reasons that MAR does not hold. This amount of missingness would/should raise questions from a journal and needs to be discussed. As a first step you should see if the means and variances of y and x are different for those with data on m versus the others. And you should ask yourself if the values of x, or y, are related to the missingness on m. It's a big topic, but there is good literature; see also overviews in Psych Methods by Graham, Schafer and others. 


A colleague and I have computed a mediation model with continuous exogenous variables, mixed (i.e., both binary and continuous) mediators and a dichotomous outcome using Mplus v.5.2. We applied WLSMV estimation in order to get estimates of indirect effects. A reviewer asks us to explain why we chose a probit instead of a logit link. Two questions:
1. From my understanding of the literature (e.g., the MacKinnon et al. 2007 paper mentioned above), computation of indirect effects is basically feasible and valid for both logit and probit regressions. Do you agree?
2. In preparation of our reply to the reviewer, we are wondering why Mplus calculates indirect effects exclusively for WLS but not for ML estimation. Are there computational or technical reasons for this?
Thank you so much in advance, Oliver Arránz Becker 


In principle, indirect effects are feasible for both probit and logistic regression. In Mplus, indirect effects are computed for weighted least squares probit models. The reason is that the mediator is treated as a u* latent response variable when it is a dependent variable and when it is an independent variable. This makes the computation of an indirect effect correct. In Mplus, when maximum likelihood is used for probit and logistic regression, the mediator is treated as u* when it is a dependent variable and u when it is an independent variable making it incorrect in Mplus to create an indirect effect. 


We have a recursive path model (all observed) with a dichotomous dependent variable. The IVs are continuous or dummy-coded, and we have one mediating variable that we can operationalise as either continuous or ordinal (preferably the latter).
a) We use the WLSMV estimator. If we choose an ordinal mediator we then have all probit coefficients; if we choose a continuous mediator, we have a mix of OLS and probit coefficients.
b) The probit coefficients in the path model refer to the latent continuous variables, so it is possible to apply standard path modeling techniques, i.e. (in)direct effects, multiplying coefficients, ... Is this also the case if we have a mix of OLS and probit coefficients?
c) As there are no latent variables, we can ignore the residual variance and convert the probit coefficients for total, direct and indirect effects to probabilities (if we set all the variables in the path model to a value). This is possible both for the model with the continuous mediator (intercept) and for the one with the latent continuous mediator (by using the thresholds).
d) We can perform mediation analysis using the output of the INDIRECT statement, whether the mediator is manifest continuous or latent continuous (ordinal).
Is this more or less correct? Thanks in advance. 


This sounds correct. 


Thanks for the response. A follow-up regarding standardization: my understanding from the UG is that I should use StdY standardization for my dummy-coded IVs and StdYX for continuous IVs (with a dichotomous DV). Unfortunately I get a warning that StdY is not available when using WLSMV and categorical outcomes. Why is this, and how should I handle it ("destandardize" StdYX by dividing by sd(x))? 


Yes, destandardize as you say. 


Thanks again. Hopefully a last follow-up question: for presenting results, I would like to show the standardized total, direct and indirect effects together with (bootstrapped) CIs around them. I can do this for the StdYX-standardized total and (in)direct effects, but what about StdY? Is it possible to "destandardize" here also, and how should I handle the CI around the resulting StdY total or (in)direct effect? 


It would not be possible to destandardize and compute bootstrapped confidence intervals. 


although i am not doing the same type of analysis, i thought i would post here because i have a similar situation to maarten in which stdy is not being printed (although i am not sure why and i do not get an error message; it just doesn't print when requested). perhaps it is related to using WLSMV with multiple groups? in any case, i want to make sure i understand how to destandardize. i understand that i want to divide by the standard deviation of my predictors; does that mean the standard deviation of each predictor to get its own proper STDY standardization? when mplus does not print the variance in the analysis (e.g., for categorical predictors) does this mean finding the variance/SD through some other means and then computing? so, for example, a binary predictor has a "variance" of .07 and a "standard deviation" of .272. its STDYX for predicting outcome is .11, so its STDY is .11/.272, or about .40. is that correct? apologies if this question seems quite basic, but i want to make absolutely sure i understand what to do here. thanks in advance for any help! 


Yes, that's it. You get the predictor variances from Sampstat, or a Type=Basic run. 
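
The arithmetic confirmed here, written as a small Python sketch. The function name is made up for illustration; the numbers are the ones from the question (STDYX of .11 and a predictor standard deviation of .272).

```python
def stdy_from_stdyx(stdyx, sd_x):
    """Destandardize with respect to x: StdY = StdYX / sd(x)."""
    if sd_x <= 0:
        raise ValueError("sd_x must be positive")
    return stdyx / sd_x


# Values quoted in the question above.
stdy = stdy_from_stdyx(0.11, 0.272)
print(round(stdy, 2))  # about .40
```

In a multiple-group analysis, the same function would be called once per group with that group's own predictor standard deviation.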


thanks for the confirmation! sampstat gives the overall sample statistic; since it's a multiple group analysis, i want to use the SD/variance for each group, correct? for example, to destandardize a predictor in the "male" group, i would want the SD of that predictor in the "male" group, not the SD of that predictor in the combined males + females sample. assuming i am right about all of the above, it might be nice to see sample stats by group in a future version of mplus. thanks! 


Yes, for each group. If you ask for SAMPSTAT in the OUTPUT command when a GROUPING variable is used, you will get sample statistics for each group. 


thanks, linda, you are absolutely correct. the sample statistics for each group are there; i just didn't see them at first. 

Kesinee posted on Tuesday, August 16, 2011  1:14 pm



Dear all, I ran a path analysis (all observed variables) for either an ordinal or a binary outcome with ML (logit coefficients). All mediators are continuous variables. The independent variable (X) has 4 categories. My questions:
1) Do I have to report standardized coefficients between X and the Ms, and if so, what type of standardized coefficients? I understand that with the Ms and Y, StdYX should be used; is that correct?
2) Which product does MODEL CONSTRAINT give for the indirect effect, the standardized or the unstandardized one?
Thank you for your help. 


1. Many reviewers want standardized coefficients reported. For y ON m use StdYX. For m ON x use StdY. 2. With MODEL CONSTRAINT, you obtain unstandardized indirect effects. 


I tried to obtain bootstrap estimates for the indirect effect in a mediation model where x and m are continuous and y is dichotomous. Under WLSMV, my bootstrap standard errors are always very high and do not compare at all with those obtained with the MODEL INDIRECT command in Mplus. I am not talking about small discrepancies. For example, Est./S.E. = 3.00 with MODEL INDIRECT and 1.38 with bootstrap estimates. My model contains covariates. I was able to replicate this problem with two different models using different samples, and using either a SEM or a path analysis. Is there any known issue with using bootstrapping with WLSMV in Mplus? Or am I doing something wrong? An example of the syntax that I use:

TITLE: Mediation with dichotomous y variable
DATA: FILE IS data1.dat;
VARIABLE: NAMES ARE x1 x2 c1 c2 m1 d1;
USEVARIABLES ARE x1 x2 c1 c2 m1 d1;
CATEGORICAL ARE d1;
ANALYSIS: BOOTSTRAP = 5000;
MODEL: d1 ON m1 c1 c2 x1 x2;
m1 ON x1 x2 c1 c2;
MODEL INDIRECT: d1 IND m1 x1;
OUTPUT: STANDARDIZED CINTERVAL(BCBOOTSTRAP);

Thanks! 


Please send the outputs that show the problem and your license number to support@statmodel.com. 


Dear all, I have a recursive path model with a dichotomous x variable, with two dichotomous mediators (u1 & u2), and a continuous outcome (y). I've asked Mplus v6.1 for indirect effects, using WLSMV as the estimator. For example: MODEL: Y u1 u2 ON X ; Y ON u1 u2 ; MODEL INDIRECT: Y IND X ; My question is: 1) Are the indirect effects the product of the probit coefficient and the ordinary regression coefficient? 2) If so, is Mplus automatically rescaling the probit and ordinary regression coefficients to be the same scale? Thanks! 


1. Yes. 2. No rescaling is necessary. 

Kesinee posted on Sunday, August 28, 2011  7:28 am



Hello. Following your answer from August 17, 2010: How can I obtain a standardized indirect effect from the MODEL CONSTRAINT option? Thank you for your time. 


If you can't use MODEL INDIRECT and must define your indirect effect in MODEL CONSTRAINT, you can also standardize it in MODEL CONSTRAINT by multiplying by the standard deviation of the covariate and dividing it by the standard deviation of the final outcome. 
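
The rescaling described in this reply amounts to one multiplication and one division, sketched below in Python with hypothetical numbers. The function and variable names are made up for illustration, and how the standard deviation of the final outcome is obtained for a particular model is not addressed here.

```python
def standardized_indirect(a, b, sd_x, sd_y):
    """Standardize a*b: multiply by sd of the covariate, divide by sd of the outcome."""
    if sd_y <= 0:
        raise ValueError("sd_y must be positive")
    return a * b * sd_x / sd_y


# Hypothetical unstandardized paths and standard deviations.
a, b = 0.5, 0.4
sd_x, sd_y = 2.0, 1.6

std_ab = standardized_indirect(a, b, sd_x, sd_y)
print(round(std_ab, 3))
```

For a dummy-coded covariate, dropping the multiplication by `sd_x` (passing `sd_x=1.0`) gives the StdY-type version discussed elsewhere in this thread.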

Thach D Tran posted on Tuesday, September 06, 2011  4:28 pm



Dear Muthen & Muthen, I would like to clarify my understanding of indirect coefficients in the case of binary outcomes. I have run a mediation model of a binary outcome (Y). The independent variable of interest (X) is also binary, but the mediator is a continuous latent variable (M). I want to measure the effect of X on Y via M. The WLSMV estimator was used with MODEL INDIRECT: Y IND X; In the output:

….
Thresholds   Estimate   S.E.    Est./S.E.   P-Value
Y$1          1.620      0.919   1.763       0.078
……
Effects from X to Y
             Estimate   S.E.    Est./S.E.   P-Value
Total        1.391      0.504   2.760       0.006

My questions are:
1. Is 1.391 the probit coefficient?
2. If yes, can I use the formulas below?
Prob (Y=1|X=1) = f(-1.62 + 1.391*1) = .409
Prob (Y=1|X=0) = f(-1.62 + 1.391*0) = .0526
so that I can interpret that the probability of Y=1 when X=0 is .0526 and when X=1 is .409. Thank you so much in advance 


Yes on all the above. 
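
The two probabilities in the question can be reproduced with the standard normal CDF, with the sign of the threshold changed as described earlier in the thread. A minimal Python check using the threshold (1.620) and total effect (1.391) quoted in the post; the variable names are made up for illustration:

```python
from statistics import NormalDist

phi = NormalDist().cdf   # standard normal cumulative distribution function

threshold = 1.620        # Y$1 from the posted output
total_effect = 1.391     # total effect of X on Y from the posted output

# The threshold enters with its sign changed.
p_x1 = phi(-threshold + total_effect * 1)
p_x0 = phi(-threshold + total_effect * 0)

print(round(p_x1, 3), round(p_x0, 4))
```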

Heike B. posted on Wednesday, December 14, 2011  3:29 am



I would like to verify my understanding of turning probit coefficients into probabilities. In an earlier posting Linda described the logic

P (u = 0 | x) = F (t1 - b*x),
P (u = 1 | x) = F (t2 - b*x) - F (t1 - b*x),
P (u = 2 | x) = F (-t2 + b*x)

for a dependent variable with three categories and a single predictor x. My understanding is that when I have more than one predictor this would extend to

P (u = 0 | x) = F (t1 - b1*x1 - b2*x2 - ...),
P (u = 1 | x) = F (t2 - b1*x1 - b2*x2 - ...) - F (t1 - b1*x1 - b2*x2 - ...),
P (u = 2 | x) = F (-t2 + b1*x1 + b2*x2 + ...)

with
F: cumulative standard normal distribution function
x1, x2, ...: predictors that can be continuous or categorical
b1, b2, ...: the unstandardized probit coefficients as given in the MODEL RESULTS section
t1, t2, t3: thresholds from the MODEL RESULTS section

I would be glad if you could let me know if this is correct. Also, does it make sense to calculate these probabilities for the standardized coefficients as well? Many thanks in advance. Heike 


This looks correct. To get probabilities for the standardized coefficients, you would need to standardize x and you would get the same results. 
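
For a three-category outcome with several predictors, the formulas in the question can be sketched in Python as follows. The thresholds, coefficients and predictor values are hypothetical, the function name is made up for illustration, and the sketch assumes the coefficients are on the standard-normal scale.

```python
from statistics import NormalDist


def ordinal_probit_probs(thresholds, coefs, xs):
    """Category probabilities for an ordered probit.

    thresholds must be in increasing order; returns one probability
    per category (len(thresholds) + 1 values).
    """
    phi = NormalDist().cdf
    linear = sum(b * x for b, x in zip(coefs, xs))
    # Cumulative probabilities P(u <= k | x) = F(t_k - b1*x1 - b2*x2 - ...)
    cumulative = [phi(t - linear) for t in thresholds]
    bounds = [0.0] + cumulative + [1.0]
    # Each category probability is the difference of adjacent cumulative values.
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


# Hypothetical thresholds t1, t2 and two predictors.
probs = ordinal_probit_probs([-0.5, 0.8], [0.4, -0.2], [1.0, 2.0])
print([round(p, 4) for p in probs])
```

The last category comes out as 1 - F(t2 - b1*x1 - b2*x2), which is the same quantity as F(-t2 + b1*x1 + b2*x2) in the post.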


Dear, I have the following multilevel model I'd like to test: x1 -> x2 and x3 -> u -> y on level 1, and x1_mean -> y on level 2. u is dichotomous and y is a count (nb) variable.
1) I haven't found a way to calculate indirect effects x1 IND y. Can it be done?
2) Is x_mean introduced in a correct way?
Thank you in advance, Ruben.

Variable: Names are clus y u x1 x2 x3 x1_mean ;
Usevariables are y u x1 x2 x3 x_mean;
within is u x1 x2 x3;
between is x_mean;
categorical is u;
count is y (nb);
cluster = clus;
analysis: type = twolevel;
starts 50;
model: %within%
x2 on x1;
x3 on x2 x1;
u on x2 x3;
y on u x1 x2 x3;
%between%
y on x_mean; 


You need to run this in Mplus using ML. You cannot use a product to compute an indirect effect in this case. Please see the following paper, which is available on the website, for an alternative method: Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. Submitted for publication. 


Dear Linda, dear Bengt, I have the following model: a binary dependent variable, several independent variables, interactions between the independent variables (all of which affect the mediators), and multiple latent mediators (all continuous). I read the paper Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. However, I am not sure if this approach is applicable to my (more complex) model. I have the following concerns:
1) All the examples in the paper refer to a single mediator. Does the approach also apply to MULTIPLE MEDIATORS? (I also read the paper by Imai et al. 2010, which says that future research should extend the approach to multiple mediators. Has this been done already?)
2) I need to estimate a logit model, but I have multiple latent mediators. On page 25 of the paper, Bengt mentions that the latent mediator approach using logistic regression is not available in Mplus. Do I understand correctly that you can NEVER compute indirect effects with LOGISTIC regressions in Mplus in the case of LATENT mediators, not even with this new approach?
3) Does this approach also apply to multiple independent variables which interact with each other, in the case that ALL independent variables and their interactions have an impact on ALL mediators?
I would appreciate your advice. 


1) As I understand it, these formulas apply also here. When mediators influence each other, the general formula needs special explication, but not when they don't. 2) Page 25 talks about a latent mediator behind a categorical observed variable. In contrast, there is no extra difficulty if your mediators are latent factors measured by multiple indicators. 3) Yes, but the results you get are for one IV conditional on some values of the others. 

Hoon Lee posted on Tuesday, February 07, 2012  8:41 am



I am running a model similar to the one posted at the top of this page, but the DV is ordinal (IV and mediator are continuous). I have two specific questions.
1. I am wondering if I can use the negative log-log link function (lower categories are more likely) instead of the logit or probit function in Mplus. If not, is there another fix for an ordinal dependent variable with a disproportionately large number of cases in the lower categories?
2. Can I use a bootstrap method to estimate an indirect effect, even with different types of models (IV to mediator: continuous to continuous; mediator to DV: continuous to ordinal)?
Thank you very much in advance! 


1. No log-log link function yet in Mplus. Perhaps a mixture model would work. 2. Yes, but if there is missing data on the mediator then it requires numerical integration, where Mplus does not yet have bootstrap. 


Dear Linda, dear Bengt, I am not sure, if I understood the question of standardization correctly. For the case of a mediation model, in which the IV is categorical (dummy coded) (x), the mediators are latent continuous (m) and the DV is dichotomous (y): Is it correct that I have to standardize the coefficients m ON x with StdY and y ON m with StdYX? Thank you very much in advance! 


And one follow-up question (sorry, I am a bit confused about standardization): What about the indirect effects? Do I have to report the standardized coefficients, too? If yes, which one (Std gives me the same results as under the raw coefficients)? 


You don't have to standardize, but if you want to standardize, here are the answers to your 2 questions: Q1. Yes. Q2. The indirect effect is an effect from a categorical IV to a continuous DV, so therefore you would use StdY. Std standardizes wrt latent variables which your model does not have. 


Hi, I am running a mediation model. My mediator and dependent variable are both dichotomous, and my independent variable is continuous. I have 3 questions which I was hoping you could help with:
1. Is it OK to run the INDIRECT command when the independent variable is continuous and the mediator and the dependent variable are dichotomous?
2. Can I use the estimated indirect effect and confidence intervals that are produced by the INDIRECT command?
3. If no to Q2, what do I need to do to get a reliable measure of the indirect effect in this case?
Thank you very much, Lorraine 


1. You can use MODEL INDIRECT with the WLSMV estimator in your situation. 2. Yes. For further information on this topic, see the following paper on the website: Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. 


Dear Linda,Bengt, I have been searching on your forum quite a while, but couldn't find an answer to the following question: Is it possible to perform mediation analysis in a multinomial logistic regression in MPLUS? And if so, would you have a good reference that deals with this issue? Thanking you in advance! 


See the following paper which is available on the website: Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. 

WEIHAI ZHAN posted on Wednesday, May 02, 2012  1:10 pm



Dear Drs. Muthen, I am running a mediation analysis with a continuous independent variable X, a dichotomized mediator M, and a dichotomized outcome Y (n = 1146). M and Y were not rare. I used the WLSMV estimator and Delta parameterization. The p-value for the regression coefficient of X was 0.385 in the model X -> M; the p-value for the regression coefficient of M was 0.000 in the model M -> Y; and the p-value for the indirect effect X -> M -> Y was 0.39, nonsignificant. However, when I used the ML estimator for the path analysis (it seems ML cannot be used for modeling the indirect effect), the p-value for the regression coefficient of X was 0.000 in the model X -> M, and the p-value for the regression coefficient of M was 0.000 in the model M -> Y. So I sense there would be a significant indirect effect X -> M -> Y if the ML estimator could be used in the mediation analysis. Do you have any suggestions to resolve this discrepancy? Thank you very much. Weihai 


The difference you find is because with WLSMV m*, the latent response variable underlying m, is used whereas with ML m is used. See the following paper which is available on the website for further information: Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. 


Thanks. It seems that the appendix section was not included in the paper you mentioned. For example, on page 28, you wrote, "The Mplus inputs are shown in the appendix Section 14.5". However, I could not find it. May I know where I can find "appendix Section 14.5"? Thanks in advance. 


I realized that the paper (Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus) contains only 110 pages @ http://www.statmodel.com/download/causalmediation.pdf The whole paper may actually contain more than 143 pages based on information at page 86. So is it possible for me to get a complete version of this paper? Thank you very much. 


You find the Appendix where the paper is posted at http://www.statmodel.com/papers.shtml See Mediational Modeling. 

ywang posted on Monday, June 18, 2012  8:02 am



Dear Drs. Muthen: I am using mediational analysis with a categorical dependent variable with three categories. Should I just specify the dependent variable as categorical, or do I have to generate two dummy dependent variables and include both of them in the SEM? If I need to generate two dummy variables, should I specify the correlation between the two dummy dependent variables as 0? Thanks a lot for the help! 


Just put the variable on the CATEGORICAL list. It will be treated as an ordered categorical variable. 


Dear Drs. Muthen, I ran a mediation analysis with two continuous mediators. The dependent variable is dichotomous; the independent variable is continuous. I used the WLSMV estimator. Now I'd like to calculate the probabilities for the indirect effects using the formula: prob (y=1) = f (threshold + b1*x1 + b2*x2...). I suppose in my case the formula would be: prob (y=1) = f (threshold + b1*x1 + b2*x2 + b3*x3), where b1 is the unstandardized path coefficient between IV and M1, b2 between M1 and M2, and b3 between M2 and DV. My idea is to calculate probabilities for combinations of values for IV, M1 and M2 (e.g. IV = 1sd, M1 = 1sd, M2 = 1sd; IV = mean, M1 = 1sd, M2 = 1sd and so on...). But, as I understand it from previous posts, b1 and b2 would be OLS regression coefficients, whereas b3 is a probit coefficient.
1) Is this right? Or am I missing something?
2) Is it still possible to calculate a probability for the DV to be 1?
3) I included some control variables when calculating the mediation effect. Can I leave them out when calculating the probability?
Thank you very much in advance! Katja Schuller 


The product of a linear and probit coefficient is a probit coefficient because the probit is the final outcome. 1. It looks correct. 2. Yes. 3. You should include the control variables. See on the website for further discussion: Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. 


Thank you very much for your quick reply! Sincerely, Katja Schuller 

ema budahab posted on Monday, August 27, 2012  6:59 am



Hello, I have a question about using regression in mediation. I have two categorical variables and one interval variable as independent variables, one interval mediator, and one interval dependent variable, so I used Baron and Kenny's four steps. When I ran the regressions, I did not include the categorical variables in the mediation test, because I could not use them in regression after coding them as 1, 2, 3 rather than as dummies. In addition, I dropped one categorical demographic variable after an ANOVA for the first hypothesis showed that it had no relationship with the dependent variable. How can I address these mistakes now that I am waiting for my viva? Is there any theory you can recommend that I read to justify these choices in my viva? 


Dear Drs. Muthen, I still have a question regarding transforming the probit coefficients into probabilities: I understood that I could use the standardized coefficients as well (STDYX, because all my IVs and mediators are continuous) to calculate the probabilities. In one previous post you mentioned that I should then standardize the IVs and mediators as well. Do you mean z-standardize? Then all the variables would have a mean of zero and an sd of 1, which would make the calculations much easier... (In my output I cannot find an unstandardized solution; there is only STD and STDYX...) Thank you very much! Sincerely, Katja Schuller 


You should use the raw results to compute probabilities. See Chapter 14 of the user's guide. 


Thank you very much for your reply. I read the explanation in chapter 14. I only want to be sure that I am doing the right thing: Usually I can find the raw coefficients in the first table (MODEL RESULTS) of the output. But, I found that the estimates in the first table (that are supposed to be the raw coefficients) are exactly the same as the STD estimates in the following table. Therefore I thought that I do not get any real "raw" coefficients in my output. Could this be true? Thank you once more for your help! Katja Schuller 


Please send the output and your license number to support@statmodel.com. 

Nancy Hood posted on Thursday, August 30, 2012  7:23 am



Hello, I am interested in calculating probabilities from a probit regression model with a 3-level ordinal mediator (and a 3-level ordinal outcome) using WLSMV estimation. I'm not sure what values to plug into the probability equation for the mediator variable (M) to obtain the following interpretation(s): "the probability that Y=0 given that M = 0 (or 1 or 2) and X is at its mean is..." Can the threshold values for m* be used for this purpose? Would the probability for the middle category of M be the difference between the probabilities for the first and last categories of M? Thanks in advance for your time! 

Malki Stohl posted on Thursday, August 30, 2012  11:42 am



Hello, I am running a mediation model using dichotomous dependent, independent, and mediator variables. My data are also weighted, so I used type=complex and could not use bootstrapping. My problem is that sometimes when I run the model I get different model estimates. If I close and reopen Mplus, the model estimates sometimes change as well. The estimates are all similar (the changes are usually in the hundredths decimal place), but I can't figure out why this is happening or which estimates to use. Thanks. 


Nancy Hood: I think you mean 3-category outcomes rather than 3-level. WLSMV uses the underlying continuous latent response variable m* as the predictor of the distal outcome y, not the ordinal m itself. To answer questions like yours, see Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. It is on our web site under Papers, Mediational Modeling. 


Malki Stohl: We haven't seen this kind of behavior. You would have to send us 2 outputs to show us. Be sure to use Version 6.12. Because you have a binary mediator and distal outcome, you should also see the paper I mention above. If hard to understand, contact the stat consulting at your Univ. 


Dear Drs. Muthen, I noticed that my idea of how to calculate probabilities from the probit coefficients (posted here on the 23rd of August) might not work. In the formula prob(y=1) = f(threshold + b1*x1 + b2*x2 + b3*x3), b1 would be an OLS regression coefficient for the path from the IV to M1, b2 would be an OLS regression coefficient from another regression for the path from M1 to M2, and b3 would be a probit coefficient from a third regression for the path from M2 to the dichotomous DV. Thus, using the formula above, I would add OLS coefficients and probit coefficients rather than multiply them. Therefore the result would not be a probit coefficient... Apparently it is only possible to calculate probabilities for the total effect and not for the single specific indirect effects. Is that right? Is there another way to calculate the probability that the DV is 1 for different values of the IV, M1 and M2 based on the calculation of the specific indirect effects? When I use the formula above for the total effect, I can only calculate probabilities for different values of the IV... Sincerely, Katja Schuller 


See the paper on our web site: Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. 


This forum has been super helpful so far! I have one follow-up question. I have run a mediation model with two continuous predictors and a dichotomous outcome. Now I am trying to test moderation of the mediation model (moderator = male, female). I attempted to determine which model fit the data better by running two models. In the first, I specified one class (everyone) with the KNOWNCLASS option, and in the second I specified two separate classes with the KNOWNCLASS option (male and female). I then compared the Bayesian estimates to determine which was the better-fitting model. Since the model with two classes fit better, I used the GROUPING option to assess whether the indirect paths and bootstrap confidence intervals are significant for both males and females. Does this seem like the right approach? Or is it inappropriate to use Bayesian estimates in this way? 


I'm not sure what you mean about using the GROUPING option. It is not available with Bayes. 

Tchiki Davis posted on Friday, September 14, 2012  10:42 am



Thank you for such a quick response! Let me clarify. First I used the Bayes estimates to determine which model was a better fit. Since the model with 2 classes fit better, I wanted to determine whether the indirect effects and bootstrap confidence intervals were significant for both boys and girls. Since bootstrap is not allowed with ALGORITHM=INTEGRATION and Bayes estimates are only given for ALGORITHM=INTEGRATION, I wrote new syntax using the GROUPING option instead of the KNOWNCLASS option to follow up on why the two class model was better. So I guess I have two questions: 1. Are Bayes estimates appropriate for determining whether a model with two classes specified is better than a model with one class specified? 2. Does it seem right to assess indirect effects and bootstrap confidence intervals for each gender using the GROUPING option? Thank you again for your help! 


Please send the output where you use the GROUPING option and your license number to support@statmodel.com. 

nata posted on Thursday, November 22, 2012  3:55 am



Dear Mplus Team, I am running a full SEM in which I have 9 observed binary indicators (x1-x9), 3 latent factors (f1-f3) and 1 binary outcome (y). The model is the following: f1 BY x1 x2; f2 BY x3-x6; f3 BY x7-x9; y ON f1-f3; f2 ON f1 f3; f3 ON f1; f1 ON f3; Because of my cross-sample design I cannot establish the direction of causality between F1 and F3, so I assume that reciprocal causation exists. However, Mplus doesn't give me any output. Do you know why? Thanks 


Please send your input, data, and license number to support@statmodel.com. 

Sara posted on Saturday, January 05, 2013  6:54 pm



Hi Drs. Muthen, Thank you for this forum; it has been very helpful for me in terms of learning how to run and interpret various analyses. I am testing a mediational model with a categorical DV and would like to report standardized estimates. My output provides S.E. and p-value estimates for the unstandardized estimates, but not for the standardized estimates. Additionally, my output does not provide a p-value for the R^2 values. Is there a way to obtain this information? Thanks! 


It sounds like you are using the WLSMV estimator. You can try ML or Bayes. 

Leslie Roos posted on Friday, February 01, 2013  11:16 pm



Hello! Thank you again, in advance, for your continued advice. I am testing a mediation question with binary IV, mediator, and DV using a path analysis with multiple (continuous and categorical) covariates in a complex dataset (strat, cluster & weight). I have successfully run the analyses using model & model constraint for the indirect effect, and have determined that there is a significant partial mediation as indicated by the p-values, log odds, and CIs, but I am now stuck on (1) whether it is possible to appropriately determine the path coefficients for binary variables. A paper with a similar design (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1447389/) reports calculating path coefficients using tetrachoric correlations, but I have been unable to figure out how to produce these with the complex sampling design adjusted for covariates. (2) Is this possible, and could you direct me to an appropriate analysis design? (3) Would the 'slopes' produced by the following syntax equate to the coefficients? TYPE = Complex; ESTIMATOR = WLSMV; ITERATIONS = 1000; CONVERGENCE = 0.00001; MODEL: M ON X; Y ON M; Y ON X; Y ON Cov1 cov2 cov3; M ON Cov1 cov2 cov3; MODEL Indirect: Y VIA M X; Best and Thank you Leslie 

Leslie Roos posted on Friday, February 01, 2013  11:33 pm



I should add: OUTPUT: SAMPSTAT; STANDARDIZED; MODINDICES (1); RESIDUAL; 


With binary mediator and DV, you should use the approach described in Muthen, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. which is on our web site with Mplus scripts. 

Leslie Roos posted on Saturday, February 02, 2013  10:18 am



Thank you for the direction to advice and the scripts. They are a wonderful resource! I've run into trouble using the 2 options described in the manuscript for a binary mediator & outcome (Monte Carlo & Bayes), as neither seems to be permitted with Complex weight, stratification, and cluster variables. While I believe I am getting reliable Log Odds Ratios and CIs with the syntax above, I was attempting to get path coefficients. The only paper that I can find that does this uses tetrachoric correlations, and I was attempting to replicate those through the syntax above, but wasn't sure if the "model estimated slopes" were appropriate as path coefficients here. Thank you again for your time in advising and suggestions! 


Bayes does not handle complex survey features, but the ML approach I show does. You get path coefficients; they are the Model estimated slopes. But the indirect effect is not a*b as in the continuous variable case. 

Leslie Roos posted on Saturday, February 02, 2013  10:53 am



I see, thank you! So it is appropriate to utilize the slopes for specific observed variable relationships (M on X, Y on X, Y on M), but not to create an indirect effect path as in continuous mediator variable analyses? It seems this is due to the nonlinear nature of logistic functions as well as the Mediator being treated as continuous in one regression & categorical in the other? Thank you again! 


Right. Yes. 
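A minimal Python sketch of why a*b does not carry over when both the mediator and the outcome are binary. It uses the counterfactual (mediation-formula) definitions of direct and indirect effects on the probability scale, which is the general idea behind the paper cited above. The logistic coefficients are hypothetical, covariates and complex-survey features are left out, and the usual no-unmeasured-confounding assumptions are taken for granted.

```python
import math

def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_mediation_effects(m_int, m_slope, y_int, y_on_m, y_on_x):
    """Probability-scale effects of X (0 -> 1) with binary M and binary Y.

    M model: logit P(M=1 | x)    = m_int + m_slope * x
    Y model: logit P(Y=1 | m, x) = y_int + y_on_m * m + y_on_x * x
    """
    def p_m1(x):
        return logistic(m_int + m_slope * x)

    def p_y1(m, x):
        return logistic(y_int + y_on_m * m + y_on_x * x)

    def expected_y(x, x_for_m):
        # Average Y over the mediator distribution that would arise under x_for_m.
        pm = p_m1(x_for_m)
        return p_y1(1, x) * pm + p_y1(0, x) * (1.0 - pm)

    total = expected_y(1, 1) - expected_y(0, 0)
    direct = expected_y(1, 0) - expected_y(0, 0)    # mediator held at its X=0 distribution
    indirect = expected_y(1, 1) - expected_y(1, 0)  # only the mediator distribution changes
    return total, direct, indirect

# Hypothetical slopes (not from any output in this thread).
total, direct, indirect = binary_mediation_effects(-0.5, 1.0, -1.0, 0.8, 0.4)
print(total, direct, indirect)
```

The indirect effect here is a difference in probabilities obtained by summing over the two mediator categories. It is not the product of the two slopes, although it is zero whenever either the M-on-X slope or the Y-on-M slope is zero.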

lamjas posted on Tuesday, September 03, 2013  1:09 am



Hi Dr. Muthen, My model has one binary outcome (observed), one binary mediator (observed), one continuous mediator (latent), two independent variables (latent continuous), and two covariates (age and gender). The syntax of the model is as follows: Variable: Usevariables are age gender clm1-clm5 iv1-iv5 y binm; Categorical are y binm; Analysis: Estimator is WLSMV; Model: clm by clm1-clm5; !latent continuous mediator iv1 by iv1-iv5; !latent continuous IV iv2 by iv6-iv6; !latent continuous IV y on binm clm iv1 iv2; binm clm on iv1 iv2; Y binm clm iv1 iv2 on gender age; I read several discussion threads and the "Causal mediation" paper. I want to confirm the following issues: (1) Should I report unstandardized or standardized coefficients? If I should report standardized coefficients, I understand that I should use StdYX if the exogenous variables are continuous. Then, what about the coefficients of Y on binm (both are binary items)? (2) For indirect effects, is it appropriate to use the "Model Indirect" function when the model has both binary and continuous mediators with a binary outcome? (3) Is it necessary to use bootstrap methodology to determine the significance of mediation when a binary mediator is in the model? Thank you for your help! 


1. It is your choice which coefficients to report. Use StdY. 2. Yes with WLSMV. 3. Not necessary but you can. 

ywang posted on Tuesday, September 03, 2013  12:38 pm



This is a follow-up question to the above discussion. Is it possible to use bootstrap methodology with Mplus 7 to determine the significance of mediation for a mediated discrete-time survival analysis (y1-y5 are categorical outcome variables, x -> M -> y1-5)? I remember that I could not do it with an earlier version. Thanks for the help! 


Please send the output of your earlier analysis and your license number to support@statmodel.com so we can see why bootstrap is not allowed. 

jiesi guo posted on Wednesday, September 25, 2013  5:45 am



Hi Dr. Muthen, My model has one binary outcome Y (observed), one categorical mediator M (4 categories, 0-4), two independent variables A and B (both latent continuous), and one covariate (age). I also used the latent moderated structural (LMS) equation approach to model the latent interaction between A and B on Y. The syntax of the model is as follows: Variable: Usevariables are age a1-a5 b1-b5 Y M; Categorical are Y M; define: standardize age; ANALYSIS: estimator=MLR; TYPE=RANDOM; link = probit; Algorithm = integration; INTEGRATION = MONTECARLO; Model: A by a1-a5; !latent continuous IV B by b1-b5; !latent continuous IV Y on M A B; M on A B; bXa | A xwith B; Y M on bXa; Y M A B on age; I use prob(y=1) = f(threshold + b1*x1 + b2*x2 ...) to calculate the probability of the indirect effect for combinations of values of A and M (e.g. A = 1sd, M = 1; A = 1sd, M = 3; and so on...). I also calculate the probability of the direct effect of A/B on M: P(Y=1|x) = F(t1 - b1*x1), P(Y=2|x) = F(t2 - b1*x1) - F(t1 - b1*x1), P(Y=3|x) = F(t3 - b1*x1) - F(t2 - b1*x1), P(Y=4|x) = F(-t3 + b1*x1). (1) Is this correct (syntax and calculation)? 

jiesi guo posted on Wednesday, September 25, 2013  5:46 am



Follow-up questions: (2) However, can I use this formula to calculate the interaction effect between A and B on M, e.g., P(Y=1|x) = F(t1 - b1*A - b2*B - b3*A*B), P(Y=2|x) = F(t2 - b1*A - b2*B - b3*A*B) - F(t1 - b1*A - b2*B - b3*A*B), and so on? (3) If I standardize age, do I need to include the control variable, given that the mean of age becomes zero? Thank you for your help! 


Indirect effects are not computed for different values of the mediator. They are computed for different values of the observed exogenous variable. Mplus Discussion posts are limited to 1500 characters. In the future, keep your posts within that limit. 
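For reference, the ordered-category probability formulas quoted in the question above (cumulative probit form: first category F(t1 - bx), middle categories as differences of adjacent cumulative probabilities, last category F(-t_last + bx)) can be sketched in Python as follows. This only illustrates category probabilities for an ordinal dependent variable given its predictors; it is not an indirect-effect calculation. The thresholds and slopes are hypothetical, and a residual variance of 1 for the latent response is assumed.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordinal_probit_probs(thresholds, linear_predictor):
    """Category probabilities for an ordered probit model.

    thresholds: increasing list t1 < t2 < ... (one fewer than the number of categories)
    linear_predictor: b1*x1 + b2*x2 + ... evaluated at the chosen covariate values
    """
    cumulative = [norm_cdf(t - linear_predictor) for t in thresholds]
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]

# Hypothetical values: three thresholds (four categories), slopes for A, B and A*B.
b1, b2, b3 = 0.4, 0.3, 0.2
A, B = 1.0, -1.0
lp = b1 * A + b2 * B + b3 * A * B
probs = ordinal_probit_probs([-1.0, 0.0, 1.0], lp)
print([round(p, 4) for p in probs], sum(probs))
```

The last entry equals 1 - F(t_last - lp), which is the same as F(-t_last + lp), so the list always sums to 1.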

jiesi guo posted on Wednesday, September 25, 2013  4:22 pm



Dr. Muthen, Thank you for your quick feedback, and sorry for the previous posts. If my understanding is correct, prob(y=1) = f(threshold + b1*x1 + b2*x2 ...) is used to calculate the probability of the indirect effect A -> M -> Y, where b1 is the path coefficient of A -> M, b2 is the path coefficient of M -> Y, and x1 and x2 can be combinations of values for A and M (e.g. A = 1sd, M = 1; A = 1sd, M = 3; and so on...). (2) Regarding the direct effect, can I use the following formula to calculate the interaction effect between A and B on M, e.g., P(M=1|A,B) = F(t1 - b1*A - b2*B - b3*A*B), P(M=2|A,B) = F(t2 - b1*A - b2*B - b3*A*B) - F(t1 - b1*A - b2*B - b3*A*B), P(M=3|A,B) = F(t3 - b1*A - b2*B - b3*A*B) - F(t2 - b1*A - b2*B - b3*A*B), and so on? (3) If I standardize age (covariate), do I need to include the control variable, given that the mean of age becomes zero? 


If you have M = a*X + e1, Y = b*M + c*x + e2, the indirect effect is Y = a*b*x + ...... So I don't understand your formulas. Also, with a binary Y, the probability is not P(Y=1|x) = f(threshold + a*b*x), because you need to include the effect of the M residual e1. And, furthermore, indirect effects in terms of probabilities should be expressed as in my paper: Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. Look for instance at section 6.1. 

jiesi G posted on Friday, September 27, 2013  7:11 am



Hi, thank you for your response. If the effect of the residual e1 is included: indirect effect: P(Y=1|x) = f(threshold + a*b*x + b*e1); direct effect: P(Y=1|x) = f(threshold + c*x). (1) Is this correct? (2) In my model, two IVs (latent variables) and their interaction, using the LMS approach, are included. I have a moderated mediation: M = a1*X + a2*W + a3*XW + e1, Y = b*M + c1*X + c2*W + c3*XW + e2. The conditional indirect effect of X on Y is b(a1 + a3W). Can I calculate a probability for this indirect effect? (3) Total indirect effect: P(Y=1|x,w) = f(threshold + a1*b*X + a2*b*W + b(a1 + a3W)X + b*e1)??? 


(1) No, this is not correct. The e1 term is not an observed variable but an unobserved residual that needs to be integrated out. This is what is discussed in Section 6.1 that I referred to. (2)(3). Number (1) needs to be clear first. 
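A minimal Python sketch of what "integrating out" the mediator residual amounts to, assuming a continuous mediator M = a0 + a*x + e1 with e1 ~ N(0, sig2) and a probit model for a binary Y with latent response Y* = b*M + c*x + e2, e2 ~ N(0, 1), and Y = 1 when Y* exceeds the threshold tau. Under these assumptions the integral has a closed form, F([-tau + c*x + b*(a0 + a*x')] / sqrt(b^2*sig2 + 1)), where x' is the value of X that determines the mediator. The parameter values and function names are hypothetical, and the numerical integration is included only as a check on the closed form.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_index(x, x_med, tau, a0, a, b, c, sig2):
    """Probit index for Y=1 with X set to x and M distributed as under X = x_med."""
    numerator = -tau + c * x + b * (a0 + a * x_med)
    return numerator / math.sqrt(b * b * sig2 + 1.0)

def prob_by_integration(x, x_med, tau, a0, a, b, c, sig2, n=4000, width=8.0):
    """The same probability, averaging over e1 numerically (midpoint rule)."""
    sd = math.sqrt(sig2)
    lo = -width * sd
    step = 2.0 * width * sd / n
    total = 0.0
    for i in range(n):
        e1 = lo + (i + 0.5) * step
        density = math.exp(-0.5 * (e1 / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        m = a0 + a * x_med + e1
        total += norm_cdf(-tau + b * m + c * x) * density * step
    return total

# Hypothetical estimates, for illustration only.
est = dict(tau=0.3, a0=0.1, a=0.5, b=0.6, c=0.2, sig2=0.8)
p11 = norm_cdf(probit_index(1, 1, **est))
p10 = norm_cdf(probit_index(1, 0, **est))
p00 = norm_cdf(probit_index(0, 0, **est))
indirect = p11 - p10   # F[probit(1,1)] - F[probit(1,0)]
direct = p10 - p00     # F[probit(1,0)] - F[probit(0,0)]
print(indirect, direct, p11 - prob_by_integration(1, 1, **est))
```

Simply adding b*e1 inside f(...) would treat the residual as a known value. Averaging over its distribution is what produces the sqrt(b^2*sig2 + 1) denominator.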

jiesi G posted on Friday, September 27, 2013  8:32 pm



Thank you for your quick response. I read Section 6.1 carefully and understood to use the causal indirect effect ¦µ[probit(1, 1)]¦µ[probit(1, 0)] to calculate the probability difference. (2) with regard to moderated mediation that I posted previously, Y=¦Â0+ ¦Â1*M+ ¦Â2*X+ ¦Â3*W+ ¦Â4*XW + e1 M= ¦Ã0+ ¦Ã1*X+ ¦Ã2*W+ ¦Ã3*XW+e2= ¦Ã0+ ¦Ã2*W+(¦Ã1+¦Ã3*W)*X Calculate the causal conditional indirect effect of X on Y (i.e., ¦Â1*(¦Ã1+¦Ã3*W)) probit(1, 1) = [¦Â0 + ¦Â1¦Ã0+¦Â2+ ¦Ã2*¦Â1*W +¦Â3*W+¦Â 4*W+ ¦Â1*(¦Ã1+¦Ã3*W)/sqrt(¦Â1^2*sig2+1) probit(1, 0) =[¦Â0 + ¦Â1¦Ã0+¦Â2+ ¦Ã2*¦Â1*W +¦Â3*W+¦Â 4*W)/sqrt(¦Â1^2*sig2+1) sig2 is residual variance of mediatior. And I want to calculate different values of W (i.g., W=+1sd, W=mean, W=1sd) Is this correct? Thank you. 


Your notation came out garbled  it looks like you have the same slope notation for both the Y and the M equation. 

jiesi G posted on Saturday, September 28, 2013  1:57 pm



Sorry for that. The formula works well on my laptop. Thank you for your quick response. I read Section 6.1 carefully and understood that I should use the causal indirect effect (cumulative normal distribution: [probit(1,1)] - [probit(1,0)]) to calculate the probability difference. (2) With regard to the moderated mediation that I posted previously: Y = b0 + b1*M + b2*X + b3*W + b4*XW + e1, M = a0 + a1*X + a2*W + a3*XW + e2 = a0 + a2*W + (a1 + a3*W)*X + e2. I am thinking of calculating the causal conditional indirect effect of X on Y (i.e., b1*(a1 + a3*W)): probit(1, 1) = [b0 + b1*a0 + b2 + a2*b1*W + b3*W + b4*W + b1*(a1 + a3*W)]/sqrt(b1^2*sig2 + 1), probit(1, 0) = [b0 + b1*a0 + b2 + a2*b1*W + b3*W + b4*W]/sqrt(b1^2*sig2 + 1), where sig2 is the residual variance of the mediator. And I want to calculate this for different values of M (e.g., M = +1sd, M = mean, M = -1sd). Is this correct? Thank you. 


Your notation is different from (1) and (2) in my paper. If you change your notation it will be easier for you to see what's wrong. Note that in my eqn (1), beta3 is the slope for the interaction term x*m. You have the interaction term x*w  again, I would suggest changing to my notation. More importantly, your sqrt(variance) expression is missing the counterpart to my beta3 which you see in my eqn (23). So changing your notation, you should be able to follow my formulas in (22) and (23). 

jiesi G posted on Sunday, September 29, 2013  3:16 am



Thank you for your quick response. In my model, two IVs (latent variables; X, Z) and their interaction (XZ), using the LMS approach, are included. We have a moderation effect that is mediated by M. It is the same as Figure 15 in your paper. Direct effect: X, Z and XZ predict Y (outcome). Indirect effect: X, Z and XZ predict M, and M predicts Y. Y = b0 + b1*M + b2*X + b3*XZ + e1 eqn (45), M = r0 + r1*X + r2*Z + r3*XZ + e2 eqn (46). DE = b2 + b3*Z; TIE = b1(r1 + r3*Z). (1) Can I calculate the probability difference with and without the conditional indirect effect of X on Y (i.e., TIE)? (2) If yes, I think sqrt(variance) is still sqrt(b1^2*sig2 + 1) because there is no interaction between the IVs and M. I tried to write it down: probit(1, 1) = [b0 + b1*a0 + b2 + b3*Z + b1*r2*Z + b1*(a1 + a3*Z)]/sqrt(b1^2*sig2 + 1), probit(1, 0) = [b0 + b1*a0 + b2 + b3*Z + b1*r2*Z]/sqrt(b1^2*sig2 + 1). 

jiesi G posted on Sunday, September 29, 2013  3:19 am



Sorry, I forgot to change my notation. probit(1, 1) = [b0 + b1*r0 + b2 + b3*Z + b1*r2*Z + b1*(r1 + r3*Z)]/sqrt(b1^2*sig2 + 1), probit(1, 0) = [b0 + b1*r0 + b2 + b3*Z + b1*r2*Z]/sqrt(b1^2*sig2 + 1). Thank you. 


Looks like you have the formulas right, taking into account that your Z is my c and your b3 is not my beta3, and that you have an interaction among your covariates and I have an interaction between one covariate and the mediator. You can compute both TIE and PIE according to the paper. 
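The two probit expressions from the post above can be evaluated at chosen moderator values with a short Python sketch. It follows the poster's notation (Y* = b0 + b1*M + b2*X + b3*XZ + e1, M = r0 + r1*X + r2*Z + r3*XZ + e2, sig2 the residual variance of M) and assumes b0 is the probit intercept, that is, minus the threshold. All numeric values are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def conditional_tie(z, b0, b1, b2, b3, r0, r1, r2, r3, sig2):
    """F[probit(1,1)] - F[probit(1,0)] at moderator value z.

    probit(1,1): X = 1 in both the Y and the M equation.
    probit(1,0): X = 1 in the Y equation, X = 0 in the M equation.
    """
    scale = math.sqrt(b1 * b1 * sig2 + 1.0)
    common = b0 + b1 * r0 + b2 + b3 * z + b1 * r2 * z
    probit_11 = (common + b1 * (r1 + r3 * z)) / scale
    probit_10 = common / scale
    return norm_cdf(probit_11) - norm_cdf(probit_10)

# Hypothetical estimates; z in SD units of the moderator.
params = dict(b0=-0.2, b1=0.5, b2=0.3, b3=0.1,
              r0=0.0, r1=0.4, r2=0.2, r3=0.15, sig2=0.7)
for z in (-1.0, 0.0, 1.0):
    print(z, round(conditional_tie(z, **params), 4))
```

The effect is a difference in probabilities, so it changes with z both through b1*(r1 + r3*z) and through the location of the two indices on the normal curve.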


Hello, I am running a model with a dichotomous independent variable, 6 dichotomous mediators that then predict 1 continuous mediator, and a continuous dependent variable. 1) Is it okay to use ML estimation when some of the mediators are dichotomous? Do I have to specify in the model syntax that some of the mediators are categorical, or is this type of syntax only necessary when the DV is categorical? 2) In running a mediation/path model such as the one I described, I'm getting quite different results when I use bootstrapping vs. not with WLSMV as the estimator. The results with and without bootstrapping do not differ when I use ML without specifying that some of my mediators are dichotomous. I'm not sure how to interpret this or which model is the right one to run. Is running a bias-corrected bootstrap a must? 3) Most of the direct effects in my model are significant, but the vast majority of specific indirect effects are insignificant. I'm having a hard time understanding why indirect effects would be so consistently insignificant when the direct effects that make up the indirect path are significant. Thanks so much! 


With binary mediators you need to use WLSMV in order for the usual indirect effects to be meaningful. The mediators are DVs and therefore need to be specified as categorical. You don't need bootstrap unless your sample is small. Indirect effects are insignificant if the mediators are not the right ones. 

Jetty posted on Friday, November 08, 2013  1:08 pm



Bengt and Linda, I would really appreciate your help with my analysis. I have 1 focal continuous predictor, 4 continuous mediators, and 1 ordered categorical outcome. All variables are observed. I am trying to decide on the best way to test mediation and the most intuitive way of showing the results. I decided on INDIRECT and bootstrap, which necessitated the WLSMV estimator. But I just realized that the obtained coefficients are probit, not logit, so I lose the interpretability of ORs. Comparing the estimates, SEs, and associated p values between the model run with WLSMV and the model run with MLR shows very little difference, and nothing fluctuates in or out of significance. I have two questions: 1. Is it acceptable to report the MLR results for the path analysis in order to get ORs, but the WLSMV model when describing the mediators? Alternatively, I used model constraints with MLR to do a Sobel test, but I'd rather stick with bootstrapping. 2. Which fit indices would you recommend I present if I go the MLR route? For WLSMV, I was thinking Chi sq, RMSEA, and CFI. Thanks in advance! 


1. No. 2. There are no absolute fit indices with maximum likelihood and categorical variables. Nested models can be tested using 2 times the loglikelihood difference, which is distributed as chi-square. 

Jetty posted on Friday, November 08, 2013  3:03 pm



Thanks, Linda. I will go with WLSMV, but I have trouble finding examples in the literature. Would it be sufficient to report the probit coefficients and interpret them in a general way (e.g., a positive coefficient means that an increase in the predictor leads to an increase in the predicted probability of Y)? I could calculate predicted probabilities, but since they depend on the values of the other predictors in the model as well as the thresholds, I am not sure how many predicted probabilities I need to calculate. How are the predicted probabilities interpreted when they refer to indirect effects? 


WLSMV is a good choice here. Your indirect effect a*b refers to the continuous latent response variable behind your ordinal outcome. I wouldn't worry about probabilities, but if you insist you need to read about "causal effects" in either Valeri-VanderWeele (2013) in Psych Methods or in http://www.statmodel.com/download/causalmediation.pdf 

Jetty posted on Tuesday, November 12, 2013  6:08 pm



Thanks, Bengt. I have another question. As I mentioned, I am running a simple path analysis, with 1 focal continuous predictor, 4 continuous mediators,4 covariates, and 1 ordered categorical outcome. All variables are observed. I decided on INDIRECT and bootstrap in order to test mediation. I just realized that I am getting different estimates when I include the MODEL INDIRECT command, in addition to MODEL. Specifically, when I am not testing mediation via the indirect command, my syntax looks like this: ANALYSIS: bootstrap=50000; Model: Y on X M1 M2 M3 M4 C1 C2 C3 C4; output: CINTERVAL (BCBOOTSTRAP); Here's when I include MODEL INDIRECT. ANALYSIS: bootstrap=50000; Model: M1 M2 M3 M4 on X C1 C2 C3 C4; Y on X M1 M2 M3 M4 C1 C2 C3 C4; MODEL INDIRECT: Y IND M1 X; Y IND M2 X; Y IND M3 X; Y IND M4 X; output: CINTERVAL (BCBOOTSTRAP); Why do my estimates for the relationships between Y and all covariates and mediators differ between the two models? The differences are greater than what I see between different bootstrap draws. Thank you! 


MODEL INDIRECT is done after model estimation. It should not affect model estimation. Please send the two outputs and your license number to support@statmodel.com. 


I am testing a path model where I have a dichotomous IV predicting 5 mediators (4 dichotomous and 1 continuous); these predict 1 continuous variable, which predicts the continuous DV. The dichotomous IV also directly predicts the continuous DV. (1) The direct relationship between the IV and DV is significant and there are also multiple significant indirect pathways between the two. I want to know the degree to which the relationship between the IV and DV is reduced after considering all the indirect effects/mediators. Is there a way to ask Mplus for this information? My initial thought was to regress the DV on the IV without any of the mediators and look at the standardized estimate and R-square for the DV, which would give me an idea of how much the standardized estimate is reduced, but I'm not sure if there's a better way to do this. (2) The model uses the WLSMV estimator. I am not getting any p-values associated with the standardized estimates, although p-values do appear for the unstandardized estimates. Is there a way for me to ask for the p-values associated with the standardized estimates? Can I report the standardized estimates along with the p-values for the corresponding unstandardized estimates? (3) My DV has a high level of kurtosis and is also skewed. Does the WLSMV estimator correct for this, or is it robust enough? Is there another estimator I should use? 


1. See the MacKinnon book where the direct effect without mediators, c, and the direct effect with mediators, c', are compared. 2. These will be available in Version 7.2. 3. WLSMV is not robust to non-normality. 


Hello Linda, Thank you so much for your response. To follow up: 1) Regarding the direct effect without mediators (c), the direct effect with mediators (c'), and a comparison of the two, do you have the specific reference to the MacKinnon book? Is there a way to get those particular statistics directly from Mplus? 2) Is it acceptable to report standardized estimates and the p-values from the unstandardized estimates (assuming I do not have access to Version 7.2 to get the standardized p-values)? Thanks again for your help. 


1. The reference is in the user's guide. There is no way to get these values directly from Mplus. 2. No. You will need to wait for Version 7.2 if you want these. 

jiesi G posted on Saturday, November 16, 2013  3:20 am



I am running a SEM model with continuous predictors and binary outcomes, including a latent interaction. I use the MLR estimator and link = probit. The models (with and without the latent interaction) did not report the usual fit statistics. I conducted a chi-square difference test (TRd) for the nested models (with and without the latent interaction) using the loglikelihood. The p-value is <.001, and BIC and AIC became smaller when the latent interaction was included, so it seems that the fit is improving. I am writing up a paper on the analysis and want to report some fit statistics (e.g., CFI, TLI). Can I compute them based on these models? Thank you! 


With TYPE=RANDOM and XWITH, means, variances, and covariances are not sufficient statistics for model estimation. Chi-square and related fit statistics are not available. In this case nested models can be compared using 2 times the loglikelihood difference, which is distributed as chi-square. 
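A minimal Python sketch of the nested-model comparison described here. With MLR, the loglikelihood difference is usually divided by a scaling correction before being referred to the chi-square distribution; the sketch assumes the correction cd = (p0*c0 - p1*c1)/(p0 - p1), where p0, c0 are the number of parameters and scaling correction factor of the nested model and p1, c1 those of the comparison model. All loglikelihoods, parameter counts, and correction factors below are hypothetical, and the chi-square tail function is written out by hand (for integer degrees of freedom) to keep the example free of external libraries.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2:          # odd df: start from df = 1
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:               # even df: start from df = 2
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:  # step up two degrees of freedom at a time
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def scaled_lr_test(ll0, p0, c0, ll1, p1, c1):
    """Scaled loglikelihood difference test (nested model 0 vs comparison model 1)."""
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)
    trd = 2.0 * (ll1 - ll0) / cd
    df = p1 - p0
    return trd, df, chi2_sf(trd, df)

# Hypothetical output values for models without (0) and with (1) the interaction.
trd, df, p = scaled_lr_test(ll0=-1250.4, p0=20, c0=1.10,
                            ll1=-1242.1, p1=21, c1=1.12)
print(trd, df, p)
```

Setting both correction factors to 1 reduces this to the plain "2 times the loglikelihood difference" test.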

jiesi G posted on Sunday, December 15, 2013  4:49 am



Hello, Linda, I have a simple mediation model where X is a continuous latent variable, M is a continuous mediator, and Y is a binary outcome variable. I use the MLR estimator with link=probit. X -> M is an MLR regression coefficient (a); M -> Y is a probit regression coefficient (b). (1) Can I just use MODEL CONSTRAINT to calculate the indirect effect a*b? (2) Is it necessary to use the method that you introduced in the "causal mediation" paper to calculate the indirect effect by including the effect of the residual variance of the mediator M? (3) If so, when I have two continuous mediators and one independent variable, X1 -> X2 -> M -> Y, in which only Y is a binary outcome variable, how can I calculate the indirect effect from X1 to Y including the effect of the residual variances of the continuous mediators X2 and M? Thank you very much. 


(1) that refers to the indirect effect on the latent response variable Y*, not Y. See my 2011 causal effects paper. (2) Yes, if you want the effect on Y. (3) The general mediation effect formulas still apply, where the formulas in my paper have to be modified. 


Dear Dr. Muthen, I’m fairly new to mediation analysis (as are others in my field), so I’m not sure whether the model I’d like to run makes sense (or is doable). The Mplus syntax of my model is shown below. What complicates things is that I have continuous (Y1), count (negative binomial owing to many zeroes and a skewed distribution, Y3-Y4) and binary (Y2) mediators as well as dependent variables. I also have several covariates influencing these mediators/dependent variables and I’m not sure how that affects the calculation of indirect effects. Moreover, I have a cluster variable to account for clustering within families. VARIABLE: CLUSTER IS clus; CATEGORICAL IS y2; COUNT ARE y3 y4 (nb); ANALYSIS: ESTIMATOR = MLR; INTEGRATION=MONTECARLO(5000); TYPE = COMPLEX; MODEL: y1 ON x; y2 ON x y1; y3 ON x y1 y2; y4 ON x y1 y3; y1-y4 ON cov1-cov10; Is it sensible to estimate this model and look at how the effect of x on Y4 is potentially mediated by Y1-Y3? 


This is a complex setting that requires several considerations to be meaningful. Usually we don't give general analysis advice, but this setting has some novel features requiring new tools. You probably want to approach the analysis in small steps, analyzing parts of the model first. Here is what comes to mind: For the part of the model that ends with y2 (so including covariates, x, y1, and y2) you have a binary final outcome y2. This alone calls for special mediation modeling in line with "causal inference" based on "counterfactuals". I have summarized the issues in my paper on our website: Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. A technical appendix, an Mplus input appendix, and the Mplus inputs, data, and outputs used in the paper are available with it. There is also a simpler, shorter version of this paper under review that goes with a forthcoming Mplus version. Consider next the part of the model that ends with the count variable y3. The y3 variable is influenced by the binary y2 and my paper discusses the special issues that arise with a mediator (y2) that is binary. Then you have the issue of mediation with a count variable as the final DV; also discussed in my paper. Next, you have the full model with a final count variable (y4). Here you have to also consider what it means to have a mediating count variable (y3). My paper discusses that. To simplify I would recommend a series of models where you have only one mediator: 1. IVs: covs, x. DVs: y1, y2 2. IVs: covs, x, y1. DVs: y2, y3 etc. 


Thanks Bengt! Sounds as complicated as I expected... It doesn't help that Y1 is actually a survival variable, but modeling it with a Cox model makes Mplus crash immediately. How robust is MLR to very skewed continuous variables? I think the situation would be a little easier if I could treat Y3-Y4 as continuous instead of count variables. 


If you have a crash, please send to Support. MLR doesn't help enough when you have a strong floor effect, certainly not with more than 50% at zero. 


I think I'll skip the causal assumptions in my model for now... However, one thing about my data troubles me: the binary DV (Y2) is whether an individual married or not, the first count variable (Y3) is how many offspring the individual had (obviously zero if not married) and the second count variable (Y4) is how many of those offspring survived to adulthood (obviously zero if no offspring were born). Since Y3 and Y4 have many zeroes, I have fitted a ZINB model for those outcomes (which fits better than NB only) and used Y2 to predict the zero-inflation part of Y3 and Y3 to predict the zero-inflation part of Y4. Does this approach make sense, or is there an alternative option in Mplus to handle the excess zeroes in Y3 and Y4 in this setting? Many thanks in advance! 


Seems reasonable, although perhaps dichotomize Y3 when it predicts the zero-inflation in Y4. 

Kim posted on Wednesday, April 09, 2014  2:50 am



Dear Professor Muthen, I have two questions related to a SEM model using mediation with a binary dependent variable. In my model, I have four dummy independent variables, one continuous latent mediation variable and one binary dependent variable. I have calculated indirect effects from all independent variables via the mediation variable on the dependent variable. I used the IND command for this in a probit regression (estimator WLSMV). My questions are the following: (1) What type of mediation test does Mplus use in a probit model when using the IND statement? Is it the Sobel test, or another type? (2) What type of standardization should I specify in order to get the correct standardized regression coefficients? For now, I have only used the unstandardized b's, but I would prefer standardized coefficients. Should I use STDYX, STDY, STDX or STD? Or another statement? Thank you very much in advance for helping me out. 


MODEL INDIRECT uses Delta method standard errors. Sobel is a special case of Delta method standard errors. With continuous covariates, use StdYX. With binary covariates, use StdY. 
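To make the Sobel special case concrete, here is a small Python sketch of the first-order Delta method standard error for a single a*b indirect effect. It assumes the estimates of a and b are uncorrelated (the general Delta method adds a covariance term); the function name and the numbers are hypothetical, not Mplus output.

```python
import math


def sobel(a, se_a, b, se_b):
    """First-order Delta method (Sobel) SE for the product a*b,
    assuming zero covariance between the two estimates."""
    indirect = a * b
    se = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    return indirect, se, indirect / se


# Hypothetical estimates: x -> m path (a) and m -> y path (b)
indirect, se, z = sobel(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(round(indirect, 3), round(se, 4), round(z, 2))
```

The resulting z is referred to a normal distribution, which is the symmetric-interval assumption that bootstrap confidence intervals avoid.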

Kim posted on Wednesday, April 09, 2014  11:04 pm



Dear Professor Muthen, Thank you very much for this useful response. Still, I have one more question about the standardization. I have both continuous covariates (age) and binary covariates (dummies for education and marital status) in my model. Which standardization should I prefer, STDYX or STDY? Or should I run the model twice, once with STDYX for the continuous covariates and once with STDY for the binary covariates? And does the type of mediation variable matter, as this is a latent factor (so a continuous variable)? Thank you in advance. 


You can get both standardizations at the same time by asking for both or asking for STANDARDIZED. It is the observed exogenous variable that matters, not the mediator. 

Kim posted on Wednesday, April 16, 2014  5:22 am



Dear Professor Muthen, Thank you again for your useful response. Unfortunately, I cannot obtain the STDY standardization for the binary covariates in my model, as I am using the WLSMV estimator to estimate a probit model. The Mplus User's Guide states on p. 643 that "For weighted least squares estimation when the model has covariates, STDY and standard errors for standardized estimates are not available". So, I can only obtain the standardized regression coefficients for the continuous covariates and not for the binary covariates. Is there a solution for this problem? Should I use the STD standardized coefficients for the binary covariates? Or can I calculate the STDY standardized coefficients from the STDYX and STD coefficients? Thank you very much for helping me once again. 


You can get STDY from STDYX by getting rid of the X standardization. That means dividing the STDYX coefficient by the SD for the binary X variable. 
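The conversion described above can be sketched in a few lines of Python. The values are hypothetical, and the SD of the binary covariate is computed here from an assumed proportion p as sqrt(p(1-p)); in practice the SD should match the one used in the STDYX standardization (for example, from the sample statistics in the output).

```python
import math


def stdy_from_stdyx(stdyx, sd_x):
    """Undo the X standardization: STDY = STDYX / SD(x)."""
    return stdyx / sd_x


p = 0.30                       # hypothetical proportion coded 1 on the dummy
sd_x = math.sqrt(p * (1 - p))  # SD of a 0/1 variable with mean p
stdyx = 0.12                   # hypothetical STDYX coefficient

stdy = stdy_from_stdyx(stdyx, sd_x)
print(round(sd_x, 4), round(stdy, 4))
```

The STDY value is then read as the change in Y, in Y standard deviation units, when the dummy changes from 0 to 1.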

Yvonne LEE posted on Thursday, June 26, 2014  4:58 pm



I am new to Mplus and am now doing SEM on a categorical outcome. I would like to compare the following 3 significant indirect paths:
SES0123 ind PORNUSE F3 HTWS F1
SES0123 ind AQ_PA F4 HTWS F1
SES0123 ind AQ_PA YSQ_ENTI HTWS F1
I wrote the following commands, but the analysis failed to run. Please advise.
MODEL:
f1 by EAbuse PAbuse SAbuse ENeglect CTS_PV NegExpWo;
f3 by SDOM HSAQ_SE SEX_OB MASA_SC Bumby_Ha SEXCOP;
f4 by CSS_TLV BAI_EA HIT_MINI CAVS;
f5 by YSQ_Defe YSQ_ISO YS_NoCon AQ_H SELF_PI Relation;
HTWS on f1;
f3 f4 f5 YSQ_ENTI on HTWS;
PornUse AQ_PA on f3 f4 f5 YSQ_ENTI;
SES0123 on PornUse AQ_PA;
SES0123 on PornUse(aa);
PornUse on f3(ab);
f3 on HTWS(ac);
HTWS on f1(ad);
SES0123 on AQ_PA(ba);
AQ_PA on f4(bb);
f4 on HTWS(bc);
HTWS on f1(bd);
SES0123 on AQ_PA(ca);
AQ_PA on YSQ_ENTI(cb);
YSQ_ENTI on HTWS(cc);
HTWS on f1(cd);
MODEL INDIRECT:
SES0123 IND f1;
AQ_PA IND f1;
MODEL CONSTRAINT:
NEW(SexP VOf4 VOenti);
SexP=aa*ab*ac*ad;
VOf4=ba*bb*bc*bd;
VOenti=ca*cb*cc*cd;
MODEL TEST:
SexP=VOf4;
SexP=VOenti;
VOf4=VOenti;


Please send the output and your license number to support@statmodel.com. 

db40 posted on Friday, September 12, 2014  5:13 am



Hello Linda, I have a question about the MODEL CONSTRAINT command. I have a binary mediation model and I am controlling for a range of background variables. I am wondering whether I should regress the mediators on all the background variables, as below, and then include them in the MODEL CONSTRAINT command, or whether I should just regress Y1 on all the background variables and keep the background variables out of the mediation model. A rough example: MODEL: y1 on m1 (p1) ; y1 on x1 (p2) ; m1 on x1 (p3) ; m1 on b1-b7 (p4-p10) ; y1 on b1-b7 (p11-p17) ; MODEL CONSTRAINT: New ( y1_m1_x1 y1_m1_b1 y1_m1_b2 y1_m1_b3 y1_m1_b4 y1_m1_b5 y1_m1_b6 y1_m1_b7) ; etc... 


Is y1 the binary variable? 

db40 posted on Monday, September 15, 2014  1:24 am



Apologies Bengt, the mediators and the outcomes are binary: M1-M4 and Y1. 


I would include all background variables in all regressions. With all these binary variables, I would use WLSMV. 


Hi, I am running a longitudinal mediation model with a continuous predictor, continuous mediator, and categorical outcome. Here is my syntax: Categorical is DV1; Model: DV1 on M1 IV; M1 on IV; Model Indirect: DV1 IND M1 IV; Analysis: Bootstrap = 1000; Output: Patterns Cinterval(Bootstrap) Standardized; I keep getting the following error message: "Statements in MODEL INDIRECT must include the keyword IND or VIA. No valid keyword specified." This doesn't make sense to me, since I did indeed specify the keyword "IND." (I've also tried running this with "VIA," but receive the same error message.) I receive the same error whether or not I specify Analysis:bootstrap. Am I missing something obvious? Thank you very much for your help! Alexandra 


Please send output and license number to support. 

namer posted on Wednesday, July 01, 2015  7:54 am



Dear Bengt and Linda, I am running a moderated mediation model with a binary outcome, a continuous mediator, and a mix of binary and continuous IVs. My moderators are a set of dummy variables (i.e. 3 groups = 2 dummies) and I test the moderation of the path m -> y, where m is continuous and y is binary. I have been doing this using MLR, but I noticed that my standard errors for the binary predictors, the dummy variables and the dummy*mediator interactions vary greatly between the MLR and ML estimators; none of the standard errors of the continuous IVs are affected. The MLR model suggests the moderation is significant, but ML does not. I have read that MLR generally outperforms ML in terms of standard errors, but given that my standard errors don't differ for the continuous variables, is it possible that MLR is not appropriate in this type of model, leading to big changes in S.E. between ML and MLR? Or is this an indicator that MLR might indeed be outperforming ML in this case? Thank you kindly, Namer 


With a binary outcome you should use the new MOD option (see v7.2 language addendum). MLR is better than ML, but bootstrap may be better because it offers nonsymmetric confidence intervals for effects. See also the paper on our website: Muthén, B. & Asparouhov, T. (2015). Causal effects in mediation modeling: An introduction with applications to latent variables. Structural Equation Modeling: A Multidisciplinary Journal, 22(1), 12-23. DOI:10.1080/10705511.2014.935843 

anonymous Z posted on Saturday, September 12, 2015  12:48 pm



Hi Dr. Muthen, Following the formula below suggested by you, I am trying to translate the probit coefficients into probabilities: prob(y=1) = f(threshold + b*x). I have two questions: 1. For the value of x in the formula, should the average (mean) of x be used? 2. Can I translate the probability into odds? If so, can I simply use the formula Odds = p/(1-p)? Thanks so much! 

anonymous Z posted on Saturday, September 12, 2015  12:57 pm



to clarify on my second question, I am thinking to use odds ratio to explain the results. Is this appropriate? 

anonymous Z posted on Saturday, September 12, 2015  1:58 pm



Sorry for the multiple posts. I am reading the Mplus online note, Topic 2. It seems that you suggest a way to convert a probit beta to a logit beta: logit beta = probit beta * c, where c = 1.81. So does this mean, for example, that if my probit beta = 0.25, then logit beta = 0.25*1.81 = 0.45, and OR = e^0.45 = 1.57? Thanks so much! 


Answers to your first 2 questions: 1. That's up to you; the result is a function of the x value. 2. Yes, you can translate any probabilities into odds, including probabilities from probit. But if you have several x variables, the odds ratio for one x is not constant over the values of the other x variables unless you have a logistic model. Why not use the logistic model? 
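A short Python sketch of both answers follows. It assumes the convention in which a threshold enters the probit with a negative sign, P(y=1|x) = Phi(-threshold + b1*x1 + b2*x2); if your model is written with an intercept instead, flip that sign. The threshold, slopes and x values are hypothetical.

```python
from statistics import NormalDist


def probit_prob(threshold, b1, x1, b2=0.0, x2=0.0):
    """P(y=1 | x1, x2) under a probit model, threshold parameterization (assumed)."""
    return NormalDist().cdf(-threshold + b1 * x1 + b2 * x2)


def odds(p):
    return p / (1 - p)


threshold, b1, b2 = 0.5, 0.25, 0.25  # hypothetical estimates

# Odds ratio for a one-unit change in x1, evaluated at two values of x2
or_at_x2_0 = odds(probit_prob(threshold, b1, 1, b2, 0)) / odds(probit_prob(threshold, b1, 0, b2, 0))
or_at_x2_2 = odds(probit_prob(threshold, b1, 1, b2, 2)) / odds(probit_prob(threshold, b1, 0, b2, 2))

print(round(or_at_x2_0, 3), round(or_at_x2_2, 3))
```

The two odds ratios differ, which illustrates why a probit-based odds ratio has to be reported together with the covariate values at which it was evaluated.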

anonymous Z posted on Saturday, September 12, 2015  6:46 pm



Hi Dr. Muthen, Thanks for your response. I am using Bayes estimation with a dichotomous mediator, so it is a probit model. In this case, should I convert the probit beta into a logit beta based on your Topic 2 note (logit beta = probit beta * c, where c = 1.81) and then calculate the odds ratio? Or should I just convert the probit probabilities into an odds ratio directly? Which way is better? Thanks! 


I would go with the latter which is more precise. 
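The difference between the two routes can be seen with a quick Python comparison. This is only a sketch: it reuses the poster's probit beta of 0.25 and c = 1.81, while the threshold of 0.5 and the x values 0 and 1 are hypothetical, and the threshold is assumed to enter with a negative sign.

```python
import math
from statistics import NormalDist

probit_beta, c = 0.25, 1.81
threshold = 0.5  # hypothetical

# Route 1: rescale the probit slope to an approximate logit slope
approx_or = math.exp(probit_beta * c)

# Route 2: odds ratio computed directly from probit probabilities at x = 1 vs x = 0
p0 = NormalDist().cdf(-threshold)
p1 = NormalDist().cdf(-threshold + probit_beta)
direct_or = (p1 / (1 - p1)) / (p0 / (1 - p0))

print(round(approx_or, 3), round(direct_or, 3))
```

The rescaled value is a single number that ignores where on the probability curve the comparison is made, whereas the direct calculation uses the model-implied probabilities themselves.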

Jirs Meuris posted on Sunday, December 20, 2015  6:51 am



I was adapting the Mplus code provided in your 2011 paper in the Appendix Table 33 for mediation with binary outcome and logistic regression and I was wondering where the starting values for the direct, indirect, and odds ratio come from? NEW(ind*.45 dir*.4 oddsrat*.7); Should these be changed depending on my model and if so where could I find these values? Thanks in advance! 


No need to change them unless you are doing a simulation study where these values will be what coverage is based on. Note also the automation of this in the 2015 paper on our website: Muthén, B. & Asparouhov, T. (2015). Causal effects in mediation modeling: An introduction with applications to latent variables. Structural Equation Modeling: A Multidisciplinary Journal, 22(1), 12-23. DOI:10.1080/10705511.2014.935843 

DavidBoyda posted on Saturday, February 27, 2016  10:17 am



Dear Dr Muthen, I am conducting a mediation model with dichotomous outcomes using the MODEL CONSTRAINT command. Which estimator is appropriate for dichotomous outcomes? Can I use MLR for, say, odds ratios, or is it preferable to use WLSMV and convert the probits to probabilities? 


It would be your choice which to use. With dichotomous outcomes, MLR requires numerical integration and can become computationally demanding. MLR has better missing data handling. WLSMV has only probit regression. MLR has both logistic and probit regression. 

DavidBoyda posted on Sunday, February 28, 2016  7:23 am



thank you so much Linda. 


Hi Drs. Muthén, I want to run an MSEM mediation model with a continuous predictor (mat_age_c), continuous mediator (famst1r), and binary outcome (earcigx). I modified code provided by Preacher et al. (2010) for a 1-1-1 model with continuous outcomes by a) specifying the categorical outcome, b) using MLR and integration, and c) putting latent factors behind the predictors and covariates at the %between% level:
%WITHIN%
famst1r ON mat_age_c (aw);
earcigx ON famst1r (bw);
earcigx ON mat_age_c;
%BETWEEN%
m BY mat_age_c@1; mat_age_c@0; m;
f BY famst1r@1; famst1r@0; f;
famst1r ON m (ab);
earcigx ON f (bb);
earcigx ON m;
earcigx f m;
MODEL CONSTRAINT:
NEW(indw indb);
indw = aw*bw;
indb = ab*bb;
Are my modifications appropriate? If so, based on papers by you (2015) and VanderWeele & Vansteelandt (2010), should I add a term for the interaction between the predictor and mediator when estimating the indirect effect? Thanks for your time! 


Because your outcome is binary, the indw definition of the indirect effect is not a counterfactually-defined effect as in my 2015 article. With 2-level modeling, this is more complex but could probably be expressed in Model Constraint (I haven't looked at this). It becomes even more complex if you add an M*X interaction. 


Dr. Muthen, Thanks so much for your quick response. If I maintain the general SEM framework rather than the counterfactual framework do you think the model I specified above would be appropriate for the binary outcome? Thanks again for your time. 


Yes, but you have to be aware that the within indirect effect that you have refers to a continuous latent response variable behind the observed binary outcome. 


Thanks again! I was under the impression that the within indirect effect would refer to a continuous latent response variable if I were using probit models, but here I am using logistic regression. Will the indirect effect still refer to a continuous latent response variable behind the observed binary outcome? 


Yes, logit and probit can both be thought of via a continuous latent response variable, just with different residual variances. 
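The latent response view can be written out as a small Python sketch. It assumes the usual textbook formulation y* = b*x + e with y = 1 when y* exceeds a threshold, where e is standard normal for probit (variance 1) and standard logistic for logit (variance pi^2/3); the threshold, slope and x value are hypothetical.

```python
import math
from statistics import NormalDist


def prob_probit(threshold, b, x):
    # y* = b*x + e with e ~ N(0, 1); y = 1 when y* exceeds the threshold
    return 1.0 - NormalDist().cdf(threshold - b * x)


def prob_logit(threshold, b, x):
    # Same y*, but e follows a standard logistic distribution
    return 1.0 - 1.0 / (1.0 + math.exp(-(threshold - b * x)))


probit_resid_var = 1.0
logit_resid_var = math.pi ** 2 / 3  # about 3.29

print(prob_probit(0.5, 0.25, 1.0), prob_logit(0.5, 0.25, 1.0))
print(probit_resid_var, round(logit_resid_var, 2))
```

Because the logistic residual has the larger variance, a logit slope is on a larger scale than a probit slope for the same data, so an indirect effect of a given size means different things under the two links.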


Hi, I am running a mediation model using MLR. I have 4 independent variables (3 binary, 1 continuous), one continuous mediator and 4 binary outcome variables. Forgive me if this is a very basic question, but why am I not getting the chi-square test of model fit in my output? I would like to compare this model with one in which certain paths are fixed to zero. Thanks 


With maximum likelihood and categorical outcomes, chi-square and related fit statistics are not available because means, variances, and covariances are not sufficient statistics for model estimation. Difference testing of nested models can be done using the loglikelihoods. How to do this is shown on the website under Difference Testing for MLR. 
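As a rough illustration of the loglikelihood-based difference test, here is a Python sketch of the scaled difference computation as I understand it: the difference in loglikelihoods is divided by a correction built from each model's scaling correction factor and number of free parameters. The function name and the input values are illustrative; the real inputs come from the two MLR outputs.

```python
def mlr_difference_test(ll0, c0, p0, ll1, c1, p1):
    """Scaled loglikelihood difference test for nested MLR models.

    ll0, c0, p0: loglikelihood, scaling correction factor and number of
                 free parameters of the nested (more restricted) model
    ll1, c1, p1: the same quantities for the comparison model
    """
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)  # difference test scaling correction
    trd = -2.0 * (ll0 - ll1) / cd         # scaled chi-square difference
    df = p1 - p0
    return cd, trd, df


# Illustrative values only
cd, trd, df = mlr_difference_test(ll0=-2606.0, c0=1.450, p0=39,
                                  ll1=-2583.0, c1=1.546, p1=47)
print(round(cd, 3), round(trd, 2), df)
```

The statistic is then compared with a chi-square distribution with df degrees of freedom; here that would be 8 degrees of freedom.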


Thank you, Dr. B. Muthen for your responses to my previous question about indirect effects with binary outcomes. I have one more question: would the interpretation of an indirect effect of say 0.05, then, be for every one unit increase in the predictor, there is a 0.05 unit increase in the continuous latent response variable through the mediator? 


Thank you for your prompt response! 


Erikka: Yes. 

Ted Fong posted on Thursday, September 01, 2016  8:29 am



Dear Dr. Muthen, I'm currently reading Chapter 8 of your new book on regression and mediation analysis using Mplus. I have a question regarding the calculation of the odds ratio on p. 324. With a 22.2% and 12.0% vaccination rate for the intervention and control groups respectively, should the odds ratio be equal to [0.222/(1-0.222)]/[0.12/(1-0.12)] = 2.09? I cannot figure out how the odds ratio turns out to be 2.70 in the 5th line. Please advise. 


My mistake; it should be 2.09. 
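The corrected value is easy to check with a few lines of Python, using the two vaccination rates quoted in the question (the function name is just for illustration):

```python
def odds_ratio(p_treat, p_control):
    """Odds ratio from two probabilities."""
    return (p_treat / (1 - p_treat)) / (p_control / (1 - p_control))


print(round(odds_ratio(0.222, 0.120), 2))  # 2.09
```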

Ted Fong posted on Thursday, September 01, 2016  7:50 pm



Thanks so much for your clarification! Your new book is so informative and I have learnt a lot from studying it. 
