Mplus Discussion >> Interpreting results of model with dichotomous outcome


jkirby posted on Friday, July 06, 2001 - 8:55 am

I am estimating a model with several observed dichotomous outcomes. To simplify, let's say that I have one exogenous variable (X), and two endogenous variables (Y1 and Y2), all of which are dichotomous. X affects both Y1 and Y2, and Y1 affects Y2. I would like to be able to say something about the extent to which X affects the probability of Y2 (both directly and indirectly through Y1), rather than limiting my discussion to how the underlying latent variables are related. Reviewers have requested that I go beyond the sign and direction of coefficients in my interpretation, which is a reasonable request. I am just not sure how to do it.

Any help on calculating/interpreting predicted probabilities or using some other approach to interpretation would be greatly appreciated.

Thanks!

bmuthen posted on Sunday, July 08, 2001 - 10:53 am

You can study how x affects y2 probabilities directly and indirectly as follows. Assume

y*_1 = g_1*x + e_1,
y*_2 = b*y*_1 + g_2*x + e_2,

where y* denotes the underlying latent response variable and b and g are regression coefficients. It follows that

y*_2 = b*g_1*x + g_2*x + b*e_1 + e_2.

Mplus assumes that V(y* | x) = 1, so that V(b*e_1 + e_2) = 1. Then

P(y_2 = 1 | x) = P(y*_2 > t_2 | x) =
F(- t_2 + b*g_1*x + g_2*x),

where t_2 is the threshold for y_2 and F is the standard normal distribution function found in tables. The second term in the argument of F is the indirect effect and the third term is the direct effect. Using different values of x, the effects of x on y_2 = 1 probabilities via these two terms can be computed.
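The formula above can be sketched numerically as follows. This is only an illustration: the function name prob_y2 and all parameter values are hypothetical, and the sketch assumes V(y*_2 | x) = 1 as stated in the post.

```python
from statistics import NormalDist


def prob_y2(x, t2, b, g1, g2):
    """P(y_2 = 1 | x) = F(-t_2 + b*g_1*x + g_2*x), assuming V(y*_2 | x) = 1."""
    indirect = b * g1 * x   # effect of x on y*_2 that runs through y*_1
    direct = g2 * x         # effect of x on y*_2 not running through y*_1
    return NormalDist().cdf(-t2 + indirect + direct)


# Hypothetical estimates, chosen only for illustration (not from any real output).
t2, b, g1, g2 = 0.5, 0.4, 0.6, 0.3

p_x0 = prob_y2(0, t2, b, g1, g2)   # probability when x = 0
p_x1 = prob_y2(1, t2, b, g1, g2)   # probability when x = 1
print(p_x0, p_x1, p_x1 - p_x0)
```

Setting b*g_1 or g_2 to zero in turn shows how much of the change in probability is carried by each term. Because F is nonlinear, the two contributions do not simply add up on the probability scale.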

Anonymous posted on Monday, September 06, 2004 - 5:11 pm

Hello,

I have two (unrelated) models from which I am trying to calculate the probability of a binary outcome (labeled u3 in Model 1 and u2 in Model 2) for given values of the other variables. I have the Day 3 MPlus handouts, which are proving quite helpful for this, but I still have a few questions.

Model 1
MODEL: f1 BY u1 u2;
u3 ON f1 x1 x2;
ANALYSIS: TYPE = general MISSING h1;

Model 2
MODEL: y1 ON x1;
u1 ON y1;
u2 ON u1;
ANALYSIS: TYPE = MEANSTRUCTURE;

For Model 1, MPlus outputs a residual variance for the outcome of interest (as in page 19 of the handout). I was planning to plug this into the probability equation as shown on pages 21-22. However, the Model 2 output does not contain a residual variance. Is this 1, as you imply in your response above, or do I need to calculate it using other items from the output (and, if so, how?)

Finally, for Model 1, to compare the effects of f1, x1, and x2 on u3, are the Std or StdYX estimates most appropriate?

Thanks from a novice!

Linda K. Muthen posted on Wednesday, September 29, 2004 - 4:26 pm

I would need to see your two outputs to understand why you are getting a residual variance in one and not the other. With categorical outcomes, residual variances are printed at the end of the results with r-square when you request a standardized solution.

I think you would look at stdyx.

Anonymous posted on Tuesday, January 11, 2005 - 9:55 am

Hi-

I have two unrelated questions. First, is there anywhere in the output that specifies whether a model was estimated using logit or probit? I thought I read in the user manual (or in the discussion) that random models using MLR are estimated as probit models. But in the output, I noticed that default logit thresholds were mentioned.

Second, I estimated a logit model using Mplus and STATA. While the coefficients on the independent variables were all virtually identical, the intercepts were quite different. Does Mplus calculate intercepts differently?

Thanks

Linda K. Muthen posted on Wednesday, January 12, 2005 - 4:52 pm

Weighted least squares estimation is done using probit regression. Maximum likelihood estimation including MLR is done using logistic regression.

Mplus uses thresholds instead of intercepts; a threshold should be the negative of the corresponding intercept. You may be comparing probit and logit, given the misunderstanding in your first paragraph.

Peggy Tonkin posted on Tuesday, February 01, 2005 - 6:05 am

I am modeling continuous mediators with a categorical outcome. I asked for the IND effects for each mediator on the outcome and get the specific indirect and sum of indirects. Can I add these to the direct effect to get the total effect of each mediator on the outcome?
Peggy Tonkin

Linda K. Muthen posted on Tuesday, February 01, 2005 - 7:12 pm

See MODEL INDIRECT in the Mplus User's Guide for a full description of IND. Say "y IND x1", not "y IND x2 x1", to get all possible indirect effects and a total effect.

peggy tonkin posted on Wednesday, February 02, 2005 - 8:00 am

Thank You.
Peggy Tonkin

Anonymous posted on Wednesday, June 01, 2005 - 1:11 pm

I am working on a path analysis with categorical dependent variables using MPLUS (including indirect effect), but I don't know how to interpret the coefficients of the direct and indirect effects. I have seen your answer regarding this, but still feel a bit confused by the formula you gave. Could you please give a concrete example? For instance, how to interpret the following probit regression (direct and indirect effects):

...
CATEGORICAL IS y x1;
MODEL: y ON x1 x2 x3;
x1 ON x2 x4;
MODEL INDIRECT:
y IND x1 x2;

The result:

                   Estimates     S.E.  Est./S.E.

y ON
    x1                -0.463    0.107     -4.328
    x2                -0.295    0.161     -1.838
    x3                 0.063    0.383      0.165
x1 ON
    x2                 0.309    0.161      1.913
    x4                 0.004    0.067      0.062

Effects from x2 to y

    Sum of indirect   -0.143    0.084     -1.705

    Specific indirect

    y
    x1
    x2                -0.143    0.084     -1.705

bmuthen posted on Wednesday, June 01, 2005 - 6:09 pm

I think you are in the WLSMV - probit framework where you can think in terms of continuous latent response variables underlying the categorical outcomes. So for

y on x1 x2 x3;
x1 on x2 x4;

where y and x1 are categorical, the continuous latent response variables can be called x1* and y*. The indirect effect of say x2 on y via x1 is therefore viewed as an indirect effect of x2 on y* via x1* and is obtained as the product of the coefficients for x1* regressed on x2 and y* on x1* (which are the coefficients printed in the regular output), and this product is interpreted exactly the way you would interpret it had x1* and y* been observed (continuous) variables. And as you say, more has been said in earlier posts.
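As a small check of the product rule described above, the sketch below multiplies the two slopes from the output posted in the question. The variable names are illustrative, and the total effect shown is simply direct plus indirect on the y* scale.

```python
# Slopes taken from the posted output (probit / latent response scale).
a = 0.309     # x1 ON x2: effect of x2 on x1*
b = -0.463    # y ON x1:  effect of x1* on y*
direct = -0.295   # y ON x2: direct effect of x2 on y*

indirect = a * b            # indirect effect of x2 on y* via x1*
total = direct + indirect   # total effect of x2 on y*

print(round(indirect, 3), round(total, 3))
```

The product rounds to the "Sum of indirect" value of -0.143 in the posted output.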

Anonymous posted on Wednesday, June 01, 2005 - 7:20 pm

Thanks for answering my question above. Three quick questions:

1. In MPLUS's probit regression, is threshold the constant term in STATA's probit regression (the sign of MPLUS's threshold is opposite to the sign of STATA's constant term)?

2. To get the threshold value, I add "TYPE=MEANSTRUCTURE" in ANALYSIS. In the results:

Thresholds
y1$1 -2.524 1.488 -1.696
y2$1 2.341 1.608 1.456

Does -2.524 refer to the threshold in the equation where y1 is the dependent variable?

3. BTW, how does MPLUS obtain standard errors for indirect effect in path analysis, when dependent variables are categorical?

Thanks!

Anonymous posted on Wednesday, June 01, 2005 - 7:59 pm

Sorry. one more question:

In the answer regarding interpreting coefficients you gave in 2001 (first message in this section), it seems you were addressing the case when there are two endogenous variables (y*_1 and y*_2) and one exogenous variable (x). If I have more exogenous variables, when I interpret the coefficient of one particular variable, do I need to take the mean value of other exogenous variables? Or, can I disregard the value of other exogenous variables and only use the formula you gave, which is P(y_2 = 1 | x) = P(y*_2 > t_2 | x) = F(- t_2 + b*g_1*x + g_2*x)?

I guess I need to control the value of other variables, but I want to make sure. Thanks!

bmuthen posted on Thursday, June 02, 2005 - 9:12 am

Answers to your questions:

1. Yes.

2. Yes

3. Two ways: Delta method and bootstrap (see User's Guide). The Delta method considers the product of slope estimates; the principle is the same as with continuous outcomes.

4. You need to use values for all your exogenous variables, since each slope refers to a partial effect just like in standard regression analysis.
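For answer 3, a first-order Delta-method standard error for a product of two slopes can be sketched as below. This is an illustration, not a description of Mplus's internal computation: the function name delta_se_product is hypothetical, and the covariance between the two slope estimates is set to zero because it is not shown in the posted output (an assumption).

```python
from math import sqrt


def delta_se_product(a, se_a, b, se_b, cov_ab=0.0):
    """First-order Delta-method SE of the product a*b.

    Var(a*b) ~= b^2*Var(a) + a^2*Var(b) + 2*a*b*Cov(a, b).
    cov_ab defaults to 0, which is an assumption made here for illustration.
    """
    var = b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2 + 2 * a * b * cov_ab
    return sqrt(var)


# Estimates and standard errors from the output posted earlier in this thread.
se_indirect = delta_se_product(a=0.309, se_a=0.161, b=-0.463, se_b=0.107)
z_ratio = (0.309 * -0.463) / se_indirect
print(se_indirect, z_ratio)
```

With the covariance term set to zero the result is close to, but not identical with, the 0.084 printed by Mplus, so the posted value presumably reflects information that the printed output does not show.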

Anonymous posted on Friday, July 22, 2005 - 12:06 pm

Hi, I have two questions:

1. What's the major difference between a probit model estimated by WLSMV in MPLUS and a probit model estimated by ML?

2. In SEM, when the dependent variables are ordered categorical, you said MPLUS takes them as continuous latent variables by using WLSMV estimation. Is the "latent variable" here the same as "latent variable" in factor analysis? I ask this because my impression is that a latent variable is often based on several observed variables. But when you take ONE categorical variable as a latent variable, there is only one observed variable -- are you saying that in this case, a latent variable is actually based on observed categories in the observed categorical variable?

Thanks!

bmuthen posted on Friday, July 22, 2005 - 12:22 pm

1. The results of those two estimators would probably be very similar (we plan to have probit ML in Mplus in the future).

2. Yes, the latent variable here is a continuous latent response variable underlying a single observed categorical variable. It is not a factor with multiple indicators but is specific to a certain observed variable. It can be thought of as what you really want to measure, whereas your measurement is a crude reflection of the response variable - the observed categories inform about which range (between neighboring thresholds) the response variable is in, but not its specific value. It is sometimes called a response propensity.

Anonymous posted on Saturday, August 20, 2005 - 8:17 am

Hi, I am estimating a SEM model by using probit estimation (WLSMV). In one of the equations, the dependent variable (Y1) is binary, and in this equation, the coefficient of a continuous independent variable (X1) is .23 (p<.001). I have two questions:

1. Can I interpret the coefficient as: for one unit increase of X1, the latent continuous variable underlying Y1 increases by .23?

2. How can I interpret the coefficient in terms of probability? Readers are used to seeing that the effects of independent variables on a dichotomous dependent variable are interpreted in a way of probability change. Since this is a probit model by WLSMV, not a logit one by ML, I am not sure how to get this alternative interpretation.

Thanks!

Linda K. Muthen posted on Saturday, August 20, 2005 - 9:33 am

1. Yes.
2. The probability you ask for is computed as P,

P = 1 - probability ((threshold - z)/sqrt(theta)),

where,

threshold = the threshold of the dichotomous event,

theta = the residual variance for y* of the dichotomous event obtained from the standardized solution,

and, for example

z = a*eta1 + b*eta2 + c*x,

where a, b, and c are the estimated regression coefficients of y* for the dichotomous event, regressed on two factors and one x. P is the conditional probability of the event given those factor values and x value.

To compute P you choose values of eta1, eta2, and x that you are interested in and evaluate z for those values. You then use a normal probability table to obtain

probability ((threshold - z)/sqrt(theta)),

from which you obtain the desired P.
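The calculation described above can be sketched in a few lines, using the standard normal distribution function in place of a printed table. All numbers below are hypothetical, and the function name event_probability is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def event_probability(threshold, z, theta):
    """P = 1 - Phi((threshold - z) / sqrt(theta)) for a dichotomous event."""
    return 1.0 - NormalDist().cdf((threshold - z) / sqrt(theta))


# Hypothetical estimates and chosen values (not from any real output).
a, b, c = 0.5, 0.3, 0.23          # slopes of y* on eta1, eta2, and x
eta1, eta2, x = 0.0, 1.0, 2.0     # factor values and x value of interest
z = a * eta1 + b * eta2 + c * x

p = event_probability(threshold=1.0, z=z, theta=0.8)
print(p)
```

If the model has no factors, the eta terms are simply dropped from z.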

Anonymous posted on Saturday, August 20, 2005 - 2:44 pm

Thanks a lot for your response. I can only find the threshold of the dichotomous event, but don't know how to find theta and eta1/eta2.

1. How to obtain "the residual variance for y* of the dichotomous event" from the standardized solution?

2. What do you mean by standardized solution above?

3. What are eta1 and eta2?

Or, Could you please give me a real example?

Thanks!!!

bmuthen posted on Sunday, August 21, 2005 - 1:41 pm

1. Theta is the residual variance which is found in the output next to the R-square values (at least if you request a standardized solution).

2. If you type "Standardized" in the OUTPUT command you get slopes standardized to unit variance.

3. In this example, eta1 and eta2 are factors used to illustrate the case where you have not only x's but also factors that influence the categorical outcome. If you don't have factors, then you drop that part.

Peter Martin posted on Friday, October 14, 2005 - 3:34 am

Hello,

Referring to the discussion of the last few postings (Aug 2005): How can I calculate the predicted probabilities of a probit when I am doing a path model using multiple imputation? When type=imputation, standardized output is not available, so it seems I don't get the residual variance of y*.

I notice that I do get a matrix of thetas in the TECH1 output; will that contain the resid var of y*, though (i.e. are these the same thetas, or is there a homonym here)? Anyway, in my output the theta matrix contains only zeros (so they are unusable for the probability calculation).

If I may make a suggestion: It would be quite nice to have estimated probabilities as an output option in MPlus - similar to the postestimation programs people like Gary King or Scott Long have written for STATA. But maybe that's asking too much? Mplus is a brilliant programme as it is, of course.

bmuthen posted on Friday, October 14, 2005 - 10:12 am

To get the y* residual variance you have to use the parameter estimates printed (which have been averaged over the imputed data sets) and the formulas of Appendix 2 of the Version 2 Tech Appendix on the web site - see especially formula (43). We have it on our list to add more output features for imputed runs and also the estimated probabilities for individuals.

Diana Clarke posted on Wednesday, March 29, 2006 - 9:06 am

Hi Bengt,
1. In a SEM model with categorical main independent variable (5-level nominal so 4 dummy variable created), categorical/ordinal endogenous variables (mediators) and a binary outcome, would one report the standardized or the unstandardized coefficients? Can you clearly explain the pros and cons of each?

2. What are the benefits of calculating and reporting the probabilities?

Bengt O. Muthen posted on Wednesday, March 29, 2006 - 6:49 pm

Standardization is beta*SD(x)/SD(y).

1. I would not standardize wrt x here, only wrt y. Standardization wrt x is only suitable when x is continuous - it does not make sense to talk about a standard deviation (SD) change for a categorical variable (you want to consider changing categories). Standardization wrt categorical mediators or ultimate outcomes may or may not be done. I personally feel that the rush to standardization is often not necessary - I like raw coefficients. Certainly, in logistic regression one typically does not standardize. But it is possible to do so, taking as the variance the variance of the latent response variable underlying the categorical variable.

2. I think reporting key estimated probabilities for categorical dependent variables is much better than standardizations. This clearly shows what the model implies.

Diana Clarke posted on Thursday, March 30, 2006 - 5:06 am

Hi Bengt,
Thanks for the response above (March 29, 2006 - 9:06 am). I have a few follow-up questions related to the calculation of the probabilities using the scenario below:

in a model:
y1 on d2 d3 d4 x1 x2 x3;
y2 on y1 d2 d3 d4 x1 x3;
y3 on y1 y2 d2 d3 d4 x1 x2;
y4 on y1 y3 d2 d3 d4 x1 x2 x3;

where y1-y3 are 4-level ordinal variables, y4 is binary, x1 and x2 are dichotomous variables, and d2-d4 are dummy variables that represent my main independent variable, with d1 the referent category left out.
1. How would I calculate the probability of y4=1 for different categories of my main independent variables (i.e. the probability of an event (y4=1) for d2=1 compared to d4=1)?
2. Would I have to do this at each threshold value for each endogenous variable in the model (i.e. 3 threshold values for each)?
3. With respect to the continuous exogenous variable, is it sufficient to just include the group mean value for the variable?

Bengt O. Muthen posted on Thursday, March 30, 2006 - 3:43 pm

I assume you use the WLSMV estimator (so probit and u* variables used for mediation), and not ML (logit and u variables used for mediation). Then it is simple:

1. You would express y4* in terms of the "reduced-form", that is in terms of the x variables d2, d3, d4, x1, x2, x3 (just like you would in a regular mediational path model for continuous outcomes). Then you are looking at a regular probit regression for which our V4 UG chapter 13 gives prob formulas.

2. No, because your y1-y3 are ordinal variables which have each only 1 slope and therefore the category does not have an influence; this makes it simple.

3. Yes.

Diana Clarke posted on Friday, March 31, 2006 - 6:14 am

Can you supply the formulas? I have V3 of the UG, which only supplies the prob formulas for the logistic coefficients.

Linda K. Muthen posted on Friday, March 31, 2006 - 7:59 am

The Version 4 Mplus User's Guide is available on the website as a pdf.

Diana Clarke posted on Thursday, April 06, 2006 - 12:56 pm

Hi Bengt,
Thanks for your response above. However, if my model contains a correlation between two of my mediating variables, how is this taken into account in calculating the probability?
That is, the complete model is:

y1 on d2 d3 d4 x1 x2 x3;
y2 on y1 d2 d3 d4 x1 x3;
y3 on y1 d2 d3 d4 x1 x2;
y4 on y1 y3 d2 d3 d4 x1 x2 x3;
y2 with y3;

and I am trying to calculate the effect of d2 on y4.

Bengt O. Muthen posted on Thursday, April 06, 2006 - 5:59 pm

Having that correlation in the model changes the parameter estimates and therefore the indirect effects, but not the procedure for calculating the probabilities (as a function of indirect and direct effects).

For instance, if you have

x--> y1 -->z with slopes a1 and b1
x--> y2 -->z with slopes a2 and b2

then z expressed as a function of x is

E(z | x) = (b1*a1 + b2*a2)*x

irrespective of y1 and y2 having correlated residuals.
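A minimal sketch of this reduced-form idea with two parallel mediator paths follows. The slopes, direct effect, and threshold are hypothetical; the probability line assumes a probit model in which the residual variance of the latent response given x is 1.

```python
from statistics import NormalDist


def reduced_form_slope(paths, direct=0.0):
    """Direct effect plus the sum of a*b products over the mediator paths."""
    return direct + sum(a * b for a, b in paths)


# Hypothetical slopes: (a1, b1) for x -> y1 -> z and (a2, b2) for x -> y2 -> z.
paths = [(0.5, 0.4), (0.2, -0.3)]
indirect_only = reduced_form_slope(paths)
with_direct = reduced_form_slope(paths, direct=0.25)

# Probability of a binary z for x = 0 and x = 1, with a hypothetical threshold.
threshold = 0.6
p_x0 = NormalDist().cdf(-threshold + with_direct * 0)
p_x1 = NormalDist().cdf(-threshold + with_direct * 1)
print(indirect_only, with_direct, p_x0, p_x1)
```

Adding a residual correlation between y1 and y2 would change the estimated slopes that go into this calculation, not the calculation itself.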

Diana Clarke posted on Wednesday, April 26, 2006 - 10:39 pm

Hi Bengt,
Since one can obtain beta coefficients for the specific indirect paths with MPLUS once your model is recursive, I am assuming that one could calculate the probability of X on Y through a specific indirect path (as opposed to the overall indirect path). Is this correct?

Linda K. Muthen posted on Thursday, April 27, 2006 - 8:52 am

Yes, you can do this using the VIA option of MODEL INDIRECT.

Hossein Azadi posted on Sunday, October 08, 2006 - 9:03 pm

Hello,

How can I calculate direct and indirect effect in path analysis by SPSS?

Linda K. Muthen posted on Monday, October 09, 2006 - 8:32 am

I don't know if SPSS has an automatic way to calculate indirect effects. You would need to contact their technical support.

Antonio A. Morgan-Lopez posted on Monday, October 09, 2006 - 11:38 am

Kris Preacher (@ U. Kansas) has some nice SPSS macros (and corresponding papers) to calculate indirect effects in single mediator models, multiple mediator models and med mod/mod med models @ http://www.psych.ku.edu/preacher/

Hossein Azadi posted on Friday, October 13, 2006 - 2:33 am

So, would you please kindly introduce me an appropriate package for Path Analysis?

Linda K. Muthen posted on Friday, October 13, 2006 - 9:10 am

Mplus can estimate a path model and provide indirect effects.

Hossein Azadi posted on Monday, October 16, 2006 - 10:37 pm

Thanks a lot. One more question: is there any free downloadable version (such as student version) of Mplus available on the web? If so, would you please kindly put its web address on the board?

Linda K. Muthen posted on Tuesday, October 17, 2006 - 7:12 am

Yes, we have a free demo which is exactly the same as the regular version except for a limitation on the number of variables that can be analyzed. The full user's guide is also on the website. You can access both via www.statmodel.com.

Hossein Azadi posted on Wednesday, October 18, 2006 - 2:26 am

Many thanks,
Hossein Azadi

Nadia micali posted on Monday, September 17, 2007 - 3:52 am

Sorry for the silly question, but doing a logistic regression in Mplus (using ON) I get an estimate of 4.6. How should this be interpreted (i.e. what is it)? You also get an odds ratio, so it is not an odds ratio.
How do you use it in tracing a path?

Linda K. Muthen posted on Tuesday, September 18, 2007 - 4:39 am

It is a logit, that is, a log odds. If you are asking about an indirect effect, you can use the probit link and then the indirect effect is the product of the two regression coefficients in the indirect effect.
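To make the relation between the printed estimate and the odds ratio concrete, the sketch below exponentiates the log odds of 4.6 mentioned in the question. The baseline log odds used for the probability comparison is hypothetical.

```python
import math

logit_slope = 4.6                     # estimate from the question (a log odds)
odds_ratio = math.exp(logit_slope)    # odds multiply by this per unit of x


def logistic(v):
    """Convert a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical baseline log odds when x = 0, for illustration only.
baseline = -2.0
p_x0 = logistic(baseline)
p_x1 = logistic(baseline + logit_slope)
print(odds_ratio, p_x0, p_x1)
```

The exponentiated slope is the odds ratio that Mplus prints alongside the estimate.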

Magda Mónica Martins Rocha posted on Wednesday, December 19, 2007 - 2:03 am

HI.

I'm trying to understand the results I have from a confirmatory factor analysis where all the six indicators are binary. One of the thresholds is -1.363. Is it possible, and what does it mean?

Thank you

Magda Rocha

Linda K. Muthen posted on Wednesday, December 19, 2007 - 9:05 am

Assuming you are using WLSMV, the threshold is a z-score indicating a probability of greater than .5.
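As a rough illustration of that z-score reading, the sketch below converts the threshold from the question into a marginal probability. It assumes a standard normal latent response variable (unit variance, zero mean) and no covariates, which is an assumption made here for illustration.

```python
from statistics import NormalDist

threshold = -1.363   # value reported in the question

# P(u = 1) = P(y* > threshold) = 1 - Phi(threshold) under the stated assumption.
p_u1 = 1.0 - NormalDist().cdf(threshold)
print(p_u1)
```

A negative threshold therefore corresponds to an item endorsed by more than half of the sample, here roughly nine in ten.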

Hossein posted on Thursday, March 12, 2009 - 5:22 am

Hi

I have a dependent variable (Y) which is nominal with three levels (degradation, constant, improvement), and several independent variables (Xs) which are interval. Can I estimate any regression here? If so, what kind? And is it possible with SPSS?

Thanks

Linda K. Muthen posted on Thursday, March 12, 2009 - 7:25 am

You can specify a multinomial logistic regression in Mplus. You will have to check with SPSS to see if they do multinomial logistic regression.

Thomas Schlösser posted on Wednesday, March 18, 2009 - 4:28 am

Hello,

One nominal independent X (3 conditions).

Six mediating nominal variables M1-M6, each of them with different number of groups (each mediating variable describes a membership to different clusters in one of six measures)

One binary dependent variable Y (a decision of the subjects)

The idea is to show the changes in behavior Y (decision) effected by a different treatment (X: condition) is mediated by some of the changes (belonging to one cluster and not another) within the six mediators.

I also used Hayes' (beta) indirect script for binary outcomes but then tried to do this with MPlus. The problem is obviously that in both approaches the mediators (are or) have to be defined as categoricals. But doing this, mediation depends on ordering of the nominal variables, of course. Is it possible to force Mplus to do a multinomial regression M on X and Y on M?
Doing this manually with SPSS, it shows that multinomial regression for some of the mediators brings significant dependencies in both directions.

With the bootstrapping procedure I am forced to set variables to categorical. Using MonteCarlo I cannot build IND or VIA effects.
Grouping of course only works with one mediating variable, which hinders to show specific indirect effects.

Do you have an answer for me? Would be so great, it's my dissertation.

Thank you, Thomas

Bengt O. Muthen posted on Friday, March 20, 2009 - 12:16 pm

So it sounds like you want a multinomial logistic regression of M on X and a binary logistic regression of the Y on M. The latter of course needs to be interpreted as Y probabilities shifting as a function of the nominal M categories. The way I can see this done (using ML) is to represent M by a latent class variable c, making M and c the same by using the M intercepts to connect its categories with those of c. Y prob's (thresholds) would then shift as a function of the c classes. I don't know how one would think of indirect effects in this context, however.

TS posted on Saturday, March 21, 2009 - 1:42 am

Thank you for your answer. I will try it this way. I got two further questions: 1. Is it a problem that I have six latent variables each with two, three or four nominal categories at the end?
And 2. (to your last sentence) Do you mean there is no way to think about indirect effects or is it a technical problem? There is a strong direct effect of changes in X changing probabilities to be in one or the other state of Y. But the mediators may carry some of the changes meaning a specific pattern within the six mediators may sig. change the probability of changing state in Y. Some of the mediators can be interpreted as categorical and it can be shown that such significant specific indirect effects exist for them. So I wonder if the sig. nominal connection of M and X and the sig nominal connection of Y on M carry such effect.
Thank you again, Thomas

Bengt O. Muthen posted on Saturday, March 21, 2009 - 10:40 am

1. No.

2. I mean that an indirect effect is not simply the product of 2 slopes in this case. As you say, the mediator class probability is influenced by x and the mediator class influences the mean of the y so there is certainly an indirect effect of x, but a more complex one.

Mary E. Mackesy-Amiti posted on Friday, November 19, 2010 - 1:17 pm

Hello,

In interpreting the results of a path analysis with a dichotomous outcome using WLSMV -

What is the difference between this probability:

[posted on Saturday, August 20, 2005 - 9:33 am]

P = 1 - probability ((threshold - z)/sqrt(theta)),

and this:

[posted on Sunday, July 08, 2001 - 10:53 am]

P(y_2 = 1 | x) = P(y*_2 > t_2 | x) =
F(- t_2 + b*g_1*x + g_2*x)

thank you

Bengt O. Muthen posted on Saturday, November 20, 2010 - 7:55 am

They are the same. This has to do with the symmetry property

F(v) = 1-F(-v).

See for instance intro stat books for the case where F is the normal distribution function (Phi).

So for your two versions, threshold = t_2 and z = the g*x expression. The only difference is that in the second expression it is assumed that the residual variance (theta) is 1.
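The equivalence can be checked numerically. In this sketch the function names p_2005 and p_2001 are just labels for the two posted expressions, and the threshold and z values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

F = NormalDist().cdf   # standard normal distribution function (Phi)


def p_2005(threshold, z, theta):
    """P = 1 - F((threshold - z) / sqrt(theta))."""
    return 1.0 - F((threshold - z) / sqrt(theta))


def p_2001(t2, z):
    """P = F(-t2 + z), which assumes a residual variance of 1."""
    return F(-t2 + z)


# Hypothetical values: with theta = 1 the two expressions give the same result.
same = p_2005(0.7, 0.3, 1.0) - p_2001(0.7, 0.3)
# With theta different from 1 they no longer agree.
differ = p_2005(0.7, 0.3, 0.5) - p_2001(0.7, 0.3)
print(same, differ)
```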

cathy labrish posted on Sunday, August 26, 2012 - 6:51 pm

Quick question re how to interpret coefficients from a regression of a continuous latent on an observed binary (and an observed ordinal).

In the case of a binary outcome, what is the reference category 0 or 1 (eg. if my equation is y=.345eta1 then do I interpret this as for every one unit increase in eta1 my log odds of y being 0 increase by .345 or do I interpret it as my log odds of y being 1 increases by .345).

Similarly, for an ordinal outcome, do I interpret y as the likelihood of being in the next lower category or the likelihood of being in the next higher category.

Thanks!

Linda K. Muthen posted on Tuesday, August 28, 2012 - 11:15 am

The regression of a continuous latent variable is a linear regression. For a binary item, zero is the reference category. See Chapter 14 of the user's guide for interpretation.

Todd Hartman posted on Saturday, January 05, 2013 - 9:34 am

Could someone clarify the interpretation of indirect effects using predicted probabilities in a simple mediation model with a binary dependent variable? (I've carefully read through multiple threads, the user guides, and scoured the Internet--a concrete example would help me fix ideas.)

The path model is X --> M --> Y, where X is a binary treatment variable, M is a continuous mediating variable, and Y is a binary outcome. Below are the unstandardized coeff. (using WLSMV):

M ON X: a = .089 (.035)
Y ON M: b = 1.99 (.17)
Y ON X: c = .054 (.142)
Intercepts (for model with M): .673 (.021)
Thresholds (Y$1): 1.355 (.139)
Indirect Effect (X to Y): .176 (.071)

Using the formula provided in the user guides and this thread to calculate predicted probabilities:

P(Y=1|X) = F(-t + a*b*X + c*X)
So, when X = 0: P(Y=1|X) = F(-1.355 + .177(0) + .054(0)) = .087
And, when X = 1: P(Y=1|X) = F(-1.355 + .177(1) + .054(1)) = .130

1) Do these calculations look correct? My concern is that these predicted probabilities seem pretty low when I look at the raw data. For instance, the mean value of Y when X is 0 is .49, and it is .59 when X equals 1. Am I missing an intercept or something?

Bengt O. Muthen posted on Saturday, January 05, 2013 - 4:30 pm

Even when you condition on X there is some variation left in M, namely its residual. This means that to get the probability of Y you have to integrate out this residual and the formula is more complex than what you have.

You can see how this is done in the paper

Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. Submitted for publication.

which is on our web site under Papers, Mediational Modeling. The Tech Appendix goes through the formulas in Section 13.2. Note that you may be better off presenting causal effects as discussed in the paper - you will find Mplus scripts there.
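For a probit model with a continuous mediator, integrating out the mediator's residual leads to a reduced-form probability in which the mediator's intercept enters the numerator and its residual variance enters the denominator. The sketch below follows that standard form. It is an illustration under stated assumptions and not a substitute for the causal effect formulas in the paper: the function name is hypothetical, and the two variance values are made up because they were not posted (they are chosen so that the total residual variance of y* given X equals 1).

```python
from math import sqrt
from statistics import NormalDist


def prob_y_given_x(x, t, i_m, a, b, c, resid_var_m, theta):
    """P(Y = 1 | X = x) after integrating out the residual of M.

    M  = i_m + a*X + r,   Var(r) = resid_var_m
    Y* = b*M + c*X + e,   Var(e) = theta,  Y = 1 if Y* > t
    """
    numerator = -t + c * x + b * (i_m + a * x)
    denominator = sqrt(b ** 2 * resid_var_m + theta)
    return NormalDist().cdf(numerator / denominator)


# Posted estimates.
t, i_m, a, b, c = 1.355, 0.673, 0.089, 1.99, 0.054

# ASSUMED values (not in the post): picked so that b^2*Var(r) + theta = 1.
resid_var_m = 0.1
theta = 1.0 - b ** 2 * resid_var_m

p_x0 = prob_y_given_x(0, t, i_m, a, b, c, resid_var_m, theta)
p_x1 = prob_y_given_x(1, t, i_m, a, b, c, resid_var_m, theta)
print(p_x0, p_x1)
```

Under these assumed variances the two probabilities come out at about .49 and .59, which suggests that leaving out the mediator's intercept accounts for much of the gap noted in the question. Different variance values would give different numbers.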

Todd Hartman posted on Sunday, January 06, 2013 - 11:47 am

Thanks for clarifying. Your manuscript was very helpful, particularly the appendices showing the calculations for the causal indirect effects and specific examples with accompanying Mplus scripts.

I do have one more follow-up question. I noticed for your aggressive behavior and intention to stop smoking continuous mediators (for Examples 1 and 2, respectively), you transform these variables:

agg5 = (sctaa15s-2.400)/1.100; (On page 117)
intent = (intent-1.456)/0.8854; (On page 120)

Could you explain what you've done here (and why)? I'm just wondering whether I would need to use some sort of standardized mediator to obtain proper estimates.

Bengt O. Muthen posted on Sunday, January 06, 2013 - 5:37 pm

These standardizations are just done for easier interpretation. Subtracting the mean is typically done when interaction terms are considered.

Todd Hartman posted on Monday, January 07, 2013 - 10:29 pm

Ah, that makes sense. Thanks so much for your help--everything works beautifully now for a path model with a binary outcome.

What about a path model with an ordinal outcome? Seems to be a common situation and my hope is to use Mplus exclusively rather than having to switch back and forth between different software to do all of these types of analyses (like Imai et al.'s 'mediation' package in R). Is there a straightforward way of modifying the formulas/scripts to calculate causal indirect effects for a particular category of an ordinal outcome?

For instance, for a 4-category outcome variable (1,2,3,4), would it be possible to substitute 'mbeta2' for the ordinal outcome (from 'mbeta0' for the binary threshold) to get the indirect effect of moving from a value of 3 to 4?

MODEL:
[y$1] (mbeta0);
[y$2] (mbeta1);
[y$3] (mbeta2);
...

MODEL CONSTRAINT:
...
arg11=-mbeta2+beta2+beta1*(gamma0+gamma1);
arg10=-mbeta2+beta2+beta1*gamma0;
arg00=-mbeta2+beta1*gamma0;

Bengt O. Muthen posted on Tuesday, January 08, 2013 - 2:37 pm

I am glad you are moving ahead on it. Please send any paper you write on it - these techniques need to be more widely used.

The effect formulas generalize directly to an ordered categorical (ordinal) and an unordered categorical (nominal) outcome. For a 0/1 binary outcome, the expected value for the outcome is the same as the probability of category 1. With an ordered categorical outcome, the expected value for the outcome is a sum over the non-zero categories, weighted by their probabilities. This, however, assumes a certain scoring for the categories. For example, an equidistant scoring such as 0, 1, 2,... may not be substantively motivated due to the difference between two adjacent categories representing a substantively larger difference than two other adjacent categories. As an alternative, the probability for each category can be considered, an approach that is also suitable for a nominal outcome.
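The two summaries described above, per-category probabilities and a score-weighted expected value, can be sketched for an ordinal probit outcome as follows. The thresholds, the linear predictor z, and the function names are hypothetical, and a residual standard deviation of 1 is assumed.

```python
from statistics import NormalDist


def category_probs(thresholds, z, resid_sd=1.0):
    """Category probabilities for an ordinal probit outcome.

    thresholds must be in increasing order; category k lies between
    threshold k and threshold k+1 on the latent response scale.
    """
    F = NormalDist().cdf
    cum = [0.0] + [F((t - z) / resid_sd) for t in thresholds] + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]


def expected_score(probs, scores=None):
    """Expected value under a chosen scoring (default 0, 1, 2, ...)."""
    if scores is None:
        scores = range(len(probs))
    return sum(s * p for s, p in zip(scores, probs))


# Hypothetical 4-category outcome with three thresholds.
probs = category_probs([-1.0, 0.0, 1.0], z=0.3)
ev_equidistant = expected_score(probs)
ev_other = expected_score(probs, scores=[0, 1, 2, 5])   # non-equidistant scoring
print(probs, ev_equidistant, ev_other)
```

Comparing the two expected values shows how much the summary depends on the scoring, which is the reason for looking at the category probabilities directly.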

Jiebing Wang posted on Tuesday, August 05, 2014 - 5:18 pm

Hi Dr. Muthen,

I did CFA for binary indicators using WLSMV estimator. I have questions on the interpretation of the factor loading and threshold.
1. Interpretation of the factor loading, is it correct?
��probit (y = 1) = -��+�˦�
Factor loading �� can be interpreted in the linear form as 1 unit increase in �� results in �� units increase in the probit of getting the observed indicator as 1.��

2. Can factor loading also be interpreted as probability? I found in a book that “the change in the probability is difficult to interpret when in the nonlinear form of the normal cumulative distribution function as it varies depend on the value of predictor η”. Is it correct?

3. Interpretation of the threshold?
How to interpret the threshold for binary data? I found an interpretation “The thresholds, or cut points, reflects the predicted cumulative probabilities at covariate values of zero.” Is it correct?

Many thanks!

Jiebing

Bengt O. Muthen posted on Tuesday, August 05, 2014 - 5:27 pm

Please look at the video and handout for Topic 2 on our website.

Haigen Huang posted on Monday, November 09, 2015 - 11:35 am

(1) For binary dependent variables, is this the default equation that Mplus uses for regression?
LN(P/(1-P)) =Intercept+beta1*X1+beta2*X2+...+error-term
Where LN=natural logarithm, P=probability of the incident, X1 and X2 are independent variables, and beta1 and beta2 are coefficients.

(2) What is the difference between threshold reported by Mplus and intercept?

I would appreciate your help!

Bengt O. Muthen posted on Monday, November 09, 2015 - 4:16 pm

(1) Yes, with logit link.

(2) The threshold is the negative of the intercept.
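
A small sketch of how the two answers fit together for a logit link: the threshold enters the linear predictor with its sign flipped, and the probability comes from the inverse logit. The function name and the numbers are hypothetical, chosen only for illustration.

```python
from math import exp


def logit_prob(threshold, slopes, xs):
    """P(y = 1 | x) under a logit link, given a threshold.

    The intercept is taken as the negative of the threshold,
    so LN(P / (1 - P)) = -threshold + sum(beta_i * x_i).
    """
    intercept = -threshold
    log_odds = intercept + sum(b * x for b, x in zip(slopes, xs))
    return 1.0 / (1.0 + exp(-log_odds))


# Hypothetical estimates: threshold 1.0, one slope of 2.0, x = 0.5.
print(logit_prob(1.0, [2.0], [0.5]))
```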

Geoffrey Hoffman posted on Friday, February 26, 2016 - 11:52 am

I am using Mplus for cross-lagged panel analyses in SEM. I have several baseline covariates in Year 1 and then two variables (A and B, where A is a latent factor and B is an observed dichotomous variable), each of which is measured in Years 1, 2, and 3. A and B are endogenous dependent variables. So, A in Year 2 is regressed on B in Year 1 and B in Year 2 is regressed on A in Year 1 (and the same two relationships are modeled between Years 2 and 3).

To compute predicted probabilities for the risk of B in Year 2 at the mean of A in Year 1 (or, given a 1 SD change in A), can I use the same formula indicated earlier in this thread (and as illustrated on slide #163 of Topic 2)?

If I were not using Year 1 baseline covariates, then I believe the computation of the risk of B in Year 2 at the mean of A in Year 1 would involve: tau, the threshold for B in Year 2, lambda, the coefficient of B in Year 2 regressed on A in Year 1, and k, the coefficient of A in Year 2 on A in Year 1. Is that correct? And would that also work when including the baseline Year 1 covariates?

Also, is theta the SE of the Estimate under R-square? (In the example in Topic 2, slide #160, the output looks different than the MPlus output I get.)

And, finally, should one use the STDYX Standardized results for tau, lambda and k?

Thank you kindly.

Geoff

Bengt O. Muthen posted on Friday, February 26, 2016 - 3:11 pm

Q1. Yes.

Q2. No lambda because you are not predicting a factor indicator. Not the coefficient of A in Year 2 on A in Year 1, just B in Year 2 on A in Year 1.

Q3. Theta is the residual variance from the standardized section.

Q4. No, use raw estimates.

Geoffrey Hoffman posted on Wednesday, March 02, 2016 - 6:06 am

Thank you. The formula from slide #163 of Topic 2 is:

P(u_ij=1|n_i,x_i)=1-F[(tau_j-lambda_j n_i - k_jx_i)/(theta_jj)^.5]

1. It sounds as if this is not quite right for my model--so, I should not include lambda_j. Is that correct? (Yet, slide #163 seemed to use an example with a dichotomous outcome and used lambda.) Other than removing lambda, does the formula remain the same?

(So that would leave threshold for B in Year 2 - mean of A in Year 1 (i.e., 0) - coefficient of B in Year 2 on A in Year 1 times mean of B in Year 1 times 1/sqrt of theta.)

2. Does the formula account for covariates included at baseline in Year 1? To obtain a predicted probability in Stata with a regression model, I can set other covariates to their means--can that be done here?

3. What is the residual variance from the standardized section? As illustrated on slides 160 and 163 in Topic 2, theta is the residual variance under R-square. In my output under STD standardization, I have a value of .535 for the estimate and 0.018 for the SE of the latent variable A in Year 1.

4. When displaying standardized model output, should I use STDYX Standardization?

Thank you kindly.

Bengt O. Muthen posted on Thursday, March 03, 2016 - 6:39 pm

Lambda refers to a factor influencing its indicator which I don't think was your situation so you would find the value perhaps in Beta (you can tell what the names are by looking at the output and TECH1).

I worry that me giving you piece-wise advice on the formulas via quick posts on Mplus Discussion will not get things exactly right since I won't be digging into your model. Instead, I suggest that you take this to a statistical consultant who can look at it carefully. But you have to say which estimate goes with which relationship.
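
For readers following the slide formula quoted in this exchange, P(u = 1 | eta, x) = 1 - F[(tau - lambda*eta - k*x) / sqrt(theta)], here is a generic Python sketch of the arithmetic. It is not tailored to the cross-lagged model discussed above: which estimate plays which role has to be read off the output and TECH1, as the reply says. All argument names and values are hypothetical.

```python
from math import erf, sqrt


def norm_cdf(z):
    """Standard normal distribution function F."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_u1(tau, lam, eta, kappa, x, theta=1.0):
    """1 - F[(tau - lam*eta - kappa*x) / sqrt(theta)].

    tau: threshold; lam, kappa: raw (unstandardized) slopes;
    eta, x: chosen predictor values; theta: residual variance.
    Set lam = 0 or kappa = 0 if that term is not in the model.
    """
    return 1.0 - norm_cdf((tau - lam * eta - kappa * x) / sqrt(theta))


# Hypothetical values only: predictor at its mean (0) vs. one unit higher.
print(prob_u1(0.8, 0.5, 0.0, 0.0, 0.0, theta=0.75))
print(prob_u1(0.8, 0.5, 1.0, 0.0, 0.0, theta=0.75))
```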

Vaiva Gerasimaviciute posted on Tuesday, April 11, 2017 - 1:11 am

Am I understanding the calculations at the beginning of this thread correctly?

If my probit regression coefficient is 0.094, S.E. is 0.033, and the threshold of y is 3.267, then the P(Y=1|x)=

F(-3.267 + 0.094*x)?

Do I interpret this as increase in probability Y=1 as x changes from 0 to 1?

Bengt O. Muthen posted on Tuesday, April 11, 2017 - 6:25 pm

The change is from

F(-3.267)

to

F(-3.267 + 0.094)
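
The two values of F can be evaluated without tables. A minimal Python sketch, using only the threshold and coefficient given in the question and assuming a residual variance of 1 as in the formula at the start of the thread:

```python
from math import erf, sqrt


def norm_cdf(z):
    """Standard normal distribution function F."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


threshold = 3.267   # threshold of y from the question
slope = 0.094       # probit regression coefficient from the question

p_x0 = norm_cdf(-threshold)           # P(Y = 1 | x = 0)
p_x1 = norm_cdf(-threshold + slope)   # P(Y = 1 | x = 1)

print(p_x0, p_x1, p_x1 - p_x0)
```

Both probabilities are very small because the threshold is more than three standard deviations above zero, so the change in probability is small in absolute terms even though it is positive.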

Vaiva Gerasimaviciute posted on Tuesday, April 18, 2017 - 7:02 am

Thank you.....
In the case that this probit regression is included in a larger SEM model, X and Y are both categorical (binary) outcomes (with probit link), and also predictors in another regression path. Then, the estimated coefficients refer to the relation between the underlying Y* and X*. Thus, F(-3.267) to F(-3.267 + 0.094) is the increment in probability of Y=1 when the underlying X* goes from 0 to 1.
Could we say something about the probabilities of Y=1 regarding the observed X (not just for X*)? Is it possible to compute/estimate some probabilities of Y=1 for X (not X*) changing from 0 to 1?

Bengt O. Muthen posted on Tuesday, April 18, 2017 - 6:01 pm

For simplicity, let's say you regress Y* on X* only and that X* is a DV as you say. Let's assume Y*, X* are bivariate normal (perhaps with X* conditioned on background variables). Then the probability that Y=1 given that X=1 (or any other combination) is obtained by the bivariate integral with the appropriate thresholds as limits. Mplus does not offer such computations in Model Constraint so you would have to do this in some other routine (e.g. from Numerical Recipes). So it's a bit messy.

You could instead of WLSMV and its working with X* use ML or Bayes (with predictor = observed) because then the regression of Y (or Y*) is on X, not X*. See our new book where this is discussed in the context of X being a mediator.
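
The bivariate integral mentioned in this reply can be approximated outside Mplus. The following Python sketch is one possible routine, not the one from Numerical Recipes. It assumes that X* and Y* are both standardized (mean 0, variance 1) with correlation rho, |rho| < 1, and that each observed variable equals 1 when its latent response exceeds its threshold. Deriving rho and the thresholds from a particular Mplus output is not shown; the function names and numbers are hypothetical.

```python
from math import erf, exp, pi, sqrt


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def prob_both_one(tau_x, tau_y, rho, n=4000):
    """P(X* > tau_x, Y* > tau_y) by Simpson's rule (n must be even).

    Uses Y* | X* = x ~ N(rho * x, 1 - rho^2), so the inner integral
    over y* has the closed form F((rho * x - tau_y) / sqrt(1 - rho^2)).
    """
    s = sqrt(1.0 - rho * rho)
    lo = tau_x
    hi = max(tau_x, 0.0) + 10.0   # density is negligible beyond this point
    h = (hi - lo) / n

    def f(x):
        return norm_pdf(x) * norm_cdf((rho * x - tau_y) / s)

    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def prob_y1_given_x1(tau_x, tau_y, rho):
    """P(Y = 1 | X = 1) = P(X* > tau_x, Y* > tau_y) / P(X* > tau_x)."""
    return prob_both_one(tau_x, tau_y, rho) / norm_cdf(-tau_x)


# Hypothetical thresholds and correlation.
print(prob_y1_given_x1(0.3, -0.2, 0.4))
```

P(Y = 1 | X = 0) follows the same way from P(Y* > tau_y) minus the joint probability above, divided by P(X* <= tau_x).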

Zvone posted on Sunday, August 06, 2017 - 2:48 pm

Hi Dr Muthen

I have two related questions.

I read the forum and I wonder how it is possible to interpret results both as probit coefficients (after calculating probabilities) and as a linear regression when y* is considered?

Is it possible to combine both?
Are the threshold to calculate probit probabilities and y* the same value?

Thank you

Zipi

Bengt O. Muthen posted on Sunday, August 06, 2017 - 5:21 pm

Q1: Yes

Q2: I don't know what "Combine both" means here. You can discuss results in both terms.

Q3: The threshold is in the y* metric but they are not the same. You can read about this in Chapter 5 of our new book.

Zvone posted on Sunday, August 06, 2017 - 11:44 pm

Dear Bengt,

thank you for your response.
However, I am not in a position to find/buy a book now, so I have one more question.

Can you explain the third answer: The threshold is in the y* metric but they are not the same.

I want to present both underlying latent variable y* and predicted probabilities.

If the formula below is the one for calculating probabilities, I do not know where to find y*, since the threshold is included in this formula:
P = 1 - probability ((threshold - z)/sqrt(theta))

I have also consulted with the presentation on new book, with slide 90 - results for HPV vaccination data.

Thank you and I am sorry if this is confusing.

Zvone posted on Monday, August 07, 2017 - 12:21 am

Hi Bengt,

and sorry for double post.

I completely failed to put my thoughts in the post before.

I mean, since the y* is followed with linear regression formula, I was wondering what would be an intercept (B0)?

Thank you

Bengt O. Muthen posted on Monday, August 07, 2017 - 8:19 am

Take a look at the handout and video of Topic 2 from our short courses on our website.