Calculating Probit Probabilities
Mplus Discussion > Categorical Data Modeling >
 George Burruss posted on Friday, November 07, 2008 - 12:56 pm
I want to make sure I am calculating the probabilities correctly for a probit SEM in which my outcome is a 4-category ordered variable. The Mplus manual gives an example for three categories.

Is this correct?

P(Y=1|x)=F(t1 - b1*x1+b2*x2...)

P(Y=2|x)=F(t2 - b1*x1+b2*x2...) - F(t1-b1*x1+b2*x2...)

P(Y=3|x)=F(t3 - b1*x1+b2*x2...) - F(t2 - b1*x1+b2*x2...) - F(t1 - b1*x1+b2*x2...)

P(Y=4|x)=F(-t3 + b1*x1+b2*x2...)

When I calculate the results, the last category drops from a probability of .62 to .01, which makes me think I'm doing it wrong.
 Bengt O. Muthen posted on Friday, November 07, 2008 - 5:36 pm
You have to carry the minus sign for all the x terms, not just the first.

Also the y=3 probability is incorrect - the last term should be dropped. Think of a normal curve where you are interested in the area between 2 thresholds - only the lower and the higher limit for the category are involved.
 George Burruss posted on Saturday, November 08, 2008 - 6:51 am
Thank you for the clarification; I see my mistake for Y3. Let me make sure I understand your first point.

Carry the minus sign for all x terms; thus for each level of Y, I subtract b1*x1, subtract b2*x2, subtract b3*x3...

So, P(Y=2|x)=F(t2 - b1*x1 - b2*x2...) - F(t1 - b1*x1 - b2*x2...)

And for the last level of Y, I add all the regression weights (e.g., P(Y=4|x)=F(-t3 + b1*x1 + b2*x2 + b3*x3...))
 Bengt O. Muthen posted on Saturday, November 08, 2008 - 7:28 am
Right.
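A minimal sketch of the corrected four-category formulas, assuming a latent response residual variance of 1 and using hypothetical thresholds, slopes, and covariate values (not estimates from this thread). The helper `normal_cdf` is built on Python's `math.erf` and stands in for F.

```python
import math


def normal_cdf(z):
    """Standard normal CDF; plays the role of F in the formulas above."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def four_category_probs(t1, t2, t3, slopes, x):
    """P(Y=1|x) ... P(Y=4|x) for an ordered probit with thresholds t1 < t2 < t3."""
    xb = sum(b * xi for b, xi in zip(slopes, x))     # b1*x1 + b2*x2 + ...
    p1 = normal_cdf(t1 - xb)                         # minus sign on every x term
    p2 = normal_cdf(t2 - xb) - normal_cdf(t1 - xb)
    p3 = normal_cdf(t3 - xb) - normal_cdf(t2 - xb)   # only the two bounding thresholds
    p4 = normal_cdf(-t3 + xb)                        # equals 1 - F(t3 - xb)
    return p1, p2, p3, p4


# Hypothetical thresholds, slopes, and covariate values, for illustration only.
probs = four_category_probs(-0.5, 0.3, 1.2, slopes=[0.4, -0.2], x=[1.0, 2.0])
print([round(p, 3) for p in probs], round(sum(probs), 6))
```

A quick check on any set of estimates is that the four probabilities are all positive and sum to 1; the incorrect Y=3 formula in the first post fails this check.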
 Ramzi Mabsout posted on Wednesday, November 18, 2009 - 10:01 am
Can I "use" the standardised probit coefficients and thresholds to calculate probabilities? Would you expect the probabilities calculated from the standardised coefficients to differ from those calculated from the unstandardised coefficients?

thank you.

ramzi
 Ramzi Mabsout posted on Wednesday, November 18, 2009 - 1:16 pm
Could you also please offer some guidance on estimating how the probabilities of the categorical indicators change with changes in the latent variable in a probit CFA?

thank you very much.

ramzi
 Bengt O. Muthen posted on Wednesday, November 18, 2009 - 5:49 pm
Use unstandardized estimates to compute probabilities.

With latent variables you can compute the probability for say 1 SD below and above the latent variable mean, where SD comes from the latent variable variance, square-rooted.
 Ramzi Mabsout posted on Monday, November 23, 2009 - 1:20 am
Is it correct, then, to compute the probability effect of a one-standard-deviation increase as follows:

F(t[1]-B*MEAN)-F(t[1]-B*(MEAN+1SD))

where F is the standard normal distribution function, MEAN the factor mean, SD the factor standard deviation, B the unstandardised probit coefficient, and t[1] the first threshold?

Thank you very much

ramzi
 Bengt O. Muthen posted on Monday, November 23, 2009 - 2:32 pm
Yes, unless you have residual variances for the indicators that are different from unity. So it depends on whether you have covariates in the model and whether you have a multiple-group situation. See the handout of Topic 2 for how to compute probit-based probabilities when you have factors and covariates.
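A minimal sketch of the one-SD formula above, valid only under the condition stated in the reply (a residual variance of 1 for the indicator's latent response variable). The threshold, loading, factor mean, and factor variance are hypothetical values, and `normal_cdf` is a helper built on `math.erf`.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (F)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_below_t1(t1, loading, eta):
    """F(t1 - B*eta): probability of falling below the first threshold.

    Assumes the indicator's latent response residual variance is 1.
    """
    return normal_cdf(t1 - loading * eta)


# Hypothetical estimates: first threshold, unstandardised loading, factor mean and variance.
t1, B = 0.5, 0.8
factor_mean, factor_var = 0.0, 1.44
sd = math.sqrt(factor_var)  # SD = square root of the factor variance

p_mean = prob_below_t1(t1, B, factor_mean)
p_plus = prob_below_t1(t1, B, factor_mean + sd)
effect = p_mean - p_plus  # F(t[1]-B*MEAN) - F(t[1]-B*(MEAN+1SD))
print(round(p_mean, 4), round(p_plus, 4), round(effect, 4))
```

With a positive loading, moving the factor up by one SD lowers the probability of being below the threshold, so the difference is positive.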
 Ramzi Mabsout posted on Tuesday, November 24, 2009 - 4:11 am
Thank you for that; I found the Topic 2 handout very helpful. However, I am not considering the probability effects of the covariates on the indicators but the probability effect of the factor on the indicators. My question is: do I also have to include the covariates when computing the probability effects of the factor on the indicators?
 Bengt O. Muthen posted on Tuesday, November 24, 2009 - 1:03 pm
If the covariates influence the factor and not the indicators directly you don't have to include the covariates in the computation. You just have to know what the mean and variance of the factor is, either conditional on a certain covariate value or marginally.

See also the Mplus Web Note #4.
 Ramzi Mabsout posted on Thursday, November 26, 2009 - 9:37 am
Does this mean the formula I posted on the 23rd of November is the one to use to compute the probability effects of a factor on the categorical indicators?

thank you again.

ramzi
 Bengt O. Muthen posted on Thursday, November 26, 2009 - 10:03 am
Web Note 4 gives the answer in formula (7) which shows that you need to involve the residual variance theta. Theta is not a free parameter but is computed as a remainder to add up to unit latent response variable variance when there are no covariates in the model. This means that Delta in equation (10) is one which then gives you theta as a function of lambda and psi. Then you take the square root of that theta value and use it to divide your F arguments.
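A minimal sketch of the steps just described, assuming the Delta parameterization with no covariates so that the latent response variance is Var(u*) = lambda^2 * psi + theta = 1; theta is then the remainder 1 - lambda^2 * psi. All parameter values are hypothetical and the function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (F)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def theta_as_remainder(loading, factor_var):
    """Residual variance making loading**2 * factor_var + theta = 1 (Delta = 1, no covariates)."""
    theta = 1.0 - loading ** 2 * factor_var
    if theta <= 0.0:
        raise ValueError("loading**2 * factor_var must be below 1 for a positive theta")
    return theta


def prob_below_t1(t1, loading, eta, theta):
    """F((t1 - loading*eta) / sqrt(theta)): the F argument divided by sqrt(theta)."""
    return normal_cdf((t1 - loading * eta) / math.sqrt(theta))


# Hypothetical estimates, for illustration only.
t1, lam, psi, factor_mean = 0.4, 0.7, 1.0, 0.0
theta = theta_as_remainder(lam, psi)
sd = math.sqrt(psi)
effect = (prob_below_t1(t1, lam, factor_mean, theta)
          - prob_below_t1(t1, lam, factor_mean + sd, theta))
print(round(theta, 2), round(effect, 4))
```

Because theta is below 1 here, dividing by sqrt(theta) makes the one-SD effect larger than it would be with the unit-variance formula.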
 Ramzi Mabsout posted on Friday, November 27, 2009 - 4:10 am
Thank you very much for this. Should I then divide by the square root of theta (i.e., multiply by its inverse) to compute both the marginal factor effects on the indicators and the model-estimated probabilities for the categorical indicator? Can I also use the formulas in Web Note 4 to compute the factor's probability effects on the reference indicator whose loading is fixed to one (to scale the factor; note that I am using the Theta, not the Delta, parameterization)? And why does Mplus compute unstandardised thresholds for this reference indicator if, according to Kamata & Bauer (2008:138), such thresholds should be 0?
thank you.
ramzi
 Bengt O. Muthen posted on Friday, November 27, 2009 - 10:51 am
Answers to your 3 questions:

Yes.

Yes, but with the Theta parameterization the latent response variable variance is not 1. Instead, you should work with theta=1. Note also that with covariates you follow the formulas of the Topic 2 item bias segment.

There are different ways to parameterize the model. I have not read the reference you give, but it might be that they fix the threshold to zero for the indicator with lambda fixed at 1 because they instead free the factor mean. These are equivalent parameterizations in terms of model fit. And as far as I can see none is preferable from an interpretation point of view.
 Ramzi Mabsout posted on Sunday, November 29, 2009 - 1:57 am
If I understand correctly, then, the formula I posted on 23 November for the probability effect of a one-SD increase in the factor on the categorical indicator, F(t[1]-B*MEAN)-F(t[1]-B*(MEAN+1SD)), is the correct one to use with covariates and the Theta parameterisation.

I am however still not sure how to interpret this probability. It should not be the probability of changing thresholds at the mean of the factor because the other threshold, threshold2, is not included in the argument?
 Bengt O. Muthen posted on Sunday, November 29, 2009 - 4:53 pm
Regarding your first question, yes this is the correct formula if your covariates do not have direct effects on the item and you use the Theta parameterization. This is seen in the Topic 2 handout I pointed to - see page 163, eqns (46) and (47). You just have to drop x and note that theta=1.

Regarding your second question, your formula concerns the probability of being below the threshold which with a binary item implies the probability of observing 0 (not 1). It sounds like you consider a polytomous item. With say 3 categories you have 3 different probabilities to consider. Those probs are computed in line with eqns (18)-(20) of Technical Appendix 1 on our web site, where the x is your factor.
 George Burruss posted on Thursday, November 11, 2010 - 11:54 am
In a post above about calculating probabilities based on probit from WLSMV, Professor Muthen responded:

"...With latent variables you can compute the probability for say 1 SD below and above the latent variable mean, where SD comes from the latent variable variance, square-rooted."

My question: Because the factor mean in a single group analysis is zero, do you calculate the probability of a categorical observed variable with a latent variable of interest using a mean of zero?

That is, for Y1 ON F1, F1 enters as B*(mean + 1SD), where the mean is always zero and SD is the square root of the estimated factor variance?
 Linda K. Muthen posted on Friday, November 12, 2010 - 9:25 am
Yes.
 Sarah Ryan posted on Wednesday, September 28, 2011 - 12:50 pm
I have a model with 4 control covariates, 4 LVs, and 2 x's predicting an observed y with 5 levels. I am using WLSMV.

The latent factors, but not any of their indicators, are regressed on the observed covariates. The observed y is regressed on the observed covariates. After reading the posts above, I'm still not clear on whether I need to include the covariates (i.e., use formulas (46) and (47) of the Topic 2 handout) in the prediction of probabilities for y. I think I probably do, but can you confirm or disconfirm?

Thanks in advance.
 Bengt O. Muthen posted on Wednesday, September 28, 2011 - 6:00 pm
Yes, you do need to include the covariates, as is done in those formulas with x, unless you have centered the covariates (subtracted their means) and want to compute the probability at their means (which is zero for the centered versions).
 Sarah Ryan posted on Thursday, September 29, 2011 - 4:48 pm
Okay, so in calculating y probabilities do I include only those covariates that share a significant association with the outcome (given covariate associations with all other predictors)?

Also, I would use this equation, then (with zero values for latent factor means), yes?

P(u_ij = 1 | eta_i, x_i) = 1 - F[(t_j - lambda_j*eta_i - k_j*x_i) * (1/sqrt(theta))]

(with theta being the y* residual variance in the standardized output).

Are you aware of any resources that can guide me in expanding this for a five-level outcome? I'm having trouble, and I'm not sure whether I'm expanding the equation incorrectly or my output is problematic. My thresholds are 1.51, 2.13, 2.19, and 2.96 and the y* residual variance is .45 (i.e., 1/sqrt(.45) ≈ 1.49), so I'm getting either very large or very small probabilities.
 Bengt O. Muthen posted on Thursday, September 29, 2011 - 9:22 pm
You should use all covariates included in the model, unless you consider a simplified model.

You get the factor mean value from Tech4 - it might not be zero, depending on the covariates influencing it.

See Appendix 1 of the V2 appendix to get the expression for the probabilities of an outcome u with more than 2 categories.
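One way to sketch the multi-category case is to extend the two-category expression in the post above by taking differences of adjacent cumulative probabilities, as in the four-category formulas earlier in the thread. The sketch below assumes the linear predictor is the sum of slopes times factor means (which could be taken from TECH4) plus slopes times covariate values, and leaves theta as an input because the value that applies depends on the parameterization. The thresholds are the ones quoted in the question; every slope and value is hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (F)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(thresholds, linear_predictor, theta=1.0):
    """[P(u=1), ..., P(u=K)] for an ordered probit with K-1 increasing thresholds.

    linear_predictor: slopes times factor values plus slopes times covariate values.
    theta: residual variance of u*; which value applies depends on the parameterization.
    """
    scale = math.sqrt(theta)
    cum = [normal_cdf((t - linear_predictor) / scale) for t in thresholds]
    cum = [0.0] + cum + [1.0]  # F(-inf) = 0 below the first category, F(+inf) = 1 above the last
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]


thresholds = [1.51, 2.13, 2.19, 2.96]  # the five-category thresholds quoted above
# Hypothetical slopes and values: factor means (e.g., from TECH4) and covariate values.
factor_slopes, factor_means = [0.6, 0.4], [0.5, 1.0]
cov_slopes, cov_values = [0.3], [2.0]
lp = (sum(b * m for b, m in zip(factor_slopes, factor_means))
      + sum(c * x for c, x in zip(cov_slopes, cov_values)))
probs = category_probs(thresholds, lp)
print([round(p, 3) for p in probs])
```

Note that when two thresholds are as close as 2.13 and 2.19, the category between them covers only a narrow slice of the u* distribution, so its probability is small for any value of the linear predictor.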
 Sarah Ryan posted on Friday, September 30, 2011 - 5:14 pm
Once again, thank you. With this information, I have predicted probabilities and the world makes sense again!
 Jak posted on Thursday, January 19, 2012 - 7:54 am
1) When I use (multilevel categorical) MLR estimation with link = probit, is it correct that the variance of the underlying latent response variables is 1?

2) I am trying to obtain the univariate probabilities for the categorical variables from the estimated thresholds. I thought a (first) threshold of -1.28 should match an observed proportion of .10 in the first category, but this seems incorrect. I hope you can help me out.


Thanks in advance!
 Linda K. Muthen posted on Thursday, January 19, 2012 - 6:43 pm
1. The residual variance is one.

2. See the formulas on page 440 of the user's guide.
 Jak posted on Wednesday, January 25, 2012 - 7:27 am
Thank you for your reply. To be sure:

With residual variance, you mean the residual variance at the within level?
Or the total (within+between) residual variance?

Kind regards, Suzanne
 Linda K. Muthen posted on Wednesday, January 25, 2012 - 11:35 am
It is the residual variance of the underlying latent response variable on within. In the between part of the model, the categorical variable is a continuous random intercept.
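A sketch of why a threshold of -1.28 need not reproduce a marginal proportion of .10 in a two-level model. It assumes, following the reply above, a within-level residual variance of 1 and a normally distributed between-level random intercept; it further assumes that intercept has mean 0 and no covariates are present, and the between-level variance is a hypothetical value. Under those assumptions F(-1.28) is the probability for a cluster whose random intercept is 0, while the overall proportion uses the total variance 1 + between variance.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (F)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_first_category(threshold, between_value=0.0):
    """Within-level probability for a cluster whose random intercept equals between_value."""
    return normal_cdf(threshold - between_value)  # within residual variance fixed at 1


def marginal_prob_first_category(threshold, between_var, between_mean=0.0):
    """Averaging over a normal random intercept: total u* variance is 1 + between_var."""
    return normal_cdf((threshold - between_mean) / math.sqrt(1.0 + between_var))


tau = -1.28        # first threshold from the question
sigma2_b = 0.5     # hypothetical between-level variance of the random intercept
p_cluster = prob_first_category(tau)
p_marginal = marginal_prob_first_category(tau, sigma2_b)
print(round(p_cluster, 3), round(p_marginal, 3))
```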
 Christina Policastro posted on Thursday, August 08, 2013 - 1:20 pm
I have estimated a probit model and have a question about the interpretation of the coefficients. I have noticed in other sources that authors recommend calculating marginal effects to make the probit coefficients more interpretable. However, this is not the method recommended in Chapter 14 of the Mplus user’s manual. Is there something about the way that the Mplus program estimates the coefficients that affects the translation of the probit coefficients? Why does the Mplus manual recommend calculating the predicted probabilities directly from the probit equation rather than from marginal effects? Thanks in advance for any help you can provide.
 Bengt O. Muthen posted on Thursday, August 08, 2013 - 2:45 pm
Marginal effects are certainly also of interest, but are not provided automatically by Mplus. For definitions, see e.g. the book by Long (1997), section 3.7.4. The marginal effect of one covariate varies as a function of other covariate values so it isn't a simple description. I think you can express them in Mplus using the new LOOP and PLOT options of MODEL CONSTRAINT. This way you would also get confidence intervals.
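For a binary probit, the marginal effect of covariate x_k is the derivative of P(u=1|x) = F(-t + b'x) with respect to x_k, which equals the standard normal density at -t + b'x times b_k. The sketch below computes this outside Mplus (it does not use MODEL CONSTRAINT) with hypothetical estimates, and shows that the marginal effect of the first covariate changes when the second covariate changes.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (F)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def prob_u1(threshold, slopes, x):
    """P(u=1 | x) = F(-t + b'x)."""
    return normal_cdf(-threshold + sum(b * xi for b, xi in zip(slopes, x)))


def marginal_effect(threshold, slopes, x, k):
    """dP(u=1 | x)/dx_k = f(-t + b'x) * b_k, with f the standard normal density."""
    z = -threshold + sum(b * xi for b, xi in zip(slopes, x))
    return normal_pdf(z) * slopes[k]


# Hypothetical probit estimates, for illustration only.
t, slopes = 1.0, [0.5, 0.8]
me_low = marginal_effect(t, slopes, [0.0, 0.0], k=0)   # second covariate at 0
me_high = marginal_effect(t, slopes, [0.0, 1.0], k=0)  # second covariate at 1
print(round(me_low, 4), round(me_high, 4))
```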
 db40 posted on Monday, September 01, 2014 - 9:32 am
Dear Dr. Muthen,

I am trying to calculate probit probabilities for a mediation model and I am wondering if this is correct.

Starting with X -> M:

Prob(M = 1 | physical abuse = 1) = F(-threshold + unstandardised est * 1) = F(-1.995 + 0.313) = F(-1.682), i.e., a probability of 0.046284,

where the threshold is M1$1 = 1.995 (with a minus sign added in front) and the estimate for X1 (physical abuse: YES = 1) is 0.313.
 Linda K. Muthen posted on Monday, September 01, 2014 - 10:33 am
See Chapter 14 in the user's guide. There is a section on converting probit coefficients to probabilities that would apply to m on x.
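The arithmetic in the question can be checked with the binary probit form used elsewhere in this thread, P(u = 1 | x) = F(-t + b*x). This sketch uses only the threshold and slope quoted in the question and also computes the probability for X = 0 for comparison; `normal_cdf` is a helper built on `math.erf`.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (F)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


threshold = 1.995   # M1$1 from the post
slope = 0.313       # unstandardised M ON X estimate from the post

# P(M = 1 | X) = F(-threshold + slope * X)
p_abuse = normal_cdf(-threshold + slope * 1)     # physical abuse = 1
p_no_abuse = normal_cdf(-threshold + slope * 0)  # physical abuse = 0
print(round(p_abuse, 4), round(p_no_abuse, 4))
```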
 LIDYANE CAMELO posted on Monday, May 18, 2015 - 12:21 pm
Dear all,

I am using SEM with probit regression, since my outcome is a binary variable, and I would like to know if I am calculating the probabilities properly. I have six equations, shown below, and I would like to estimate probabilities for U4 (the binary outcome).

F1 BY U1-U3;
Y1 ON Y2 Y3;
Y4 ON F1 Y1 Y2 Y3;
F1 ON Y1 Y2 Y3;
Y5 ON F1 Y1 Y2 Y3 Y4;
U4 ON F1 Y1 Y2 Y3 Y4 Y5;

I have the following questions:

1) Should I use only the last equation to estimate the probabilities of U4, or should I substitute, for the variables in the last equation, the equations that predict those variables? For example, in the second equation Y1 is regressed on Y2 and Y3. Should I therefore replace Y1 in the last equation with its expression in terms of Y2 and Y3?
2) F1 is a continuous latent variable whose indicators are ordinal variables (5 categories). By default, Mplus estimates the latent variable with mean 0 and SD 1, doesn't it? However, when I use the TECH4 option the mean of F1 is not 0, so I don't know at which value I should fix the latent variable to calculate the probabilities.

Thank you very much.


Lidyane
 Bengt O. Muthen posted on Monday, May 18, 2015 - 5:19 pm
1) Just use the last equation.

2) In a single-group, non-mixture analysis the mean of F1 is zero by default. If this doesn't agree with what you see, please send output and license number to support.
 LIDYANE CAMELO posted on Tuesday, May 19, 2015 - 5:28 am
Thank you very much for your response!

Regarding the mean of the latent variable: when I use the TECH4 option with only the measurement model for the latent variable (without the whole SEM model), the mean is zero. However, when I use the TECH4 option with all the equations of the SEM, the mean changes. Is this normal? Can I really consider the mean to be zero in this case?
 Bengt O. Muthen posted on Tuesday, May 19, 2015 - 5:54 am
I see now that you have

F1 ON Y1 Y2 Y3;

so that although the intercept of F1 is zero by default, its mean is a function of the Y1-Y3 means and their slopes. Go by the Tech4 mean and SD (square root of the variance).
 LIDYANE CAMELO posted on Tuesday, May 19, 2015 - 6:33 am
Thank you very much. To be sure, I would like to confirm a few things:

1) I need to run the whole SEM model with the TECH4 option and use the mean of the latent variable to calculate the probabilities, right?

2) Regarding the SD of the latent variable, you said that it is the square root of the variance. But where can I find the variance? Is it the number on the diagonal of the covariance matrix?
 Bengt O. Muthen posted on Tuesday, May 19, 2015 - 6:30 pm
1) Right.

2) You get the variance from TECH4 also.
 LIDYANE CAMELO posted on Friday, June 19, 2015 - 10:31 am
Dear all,

I have one more question. I am calculating the probabilities from my probit SEM using the following equation:
P (u = 1 | x) = F (-t + b*x)

I would like to know whether I need to consider the correlation between errors in this equation and, if so, how to do that.
 Bengt O. Muthen posted on Friday, June 19, 2015 - 5:11 pm
No, this univariate expression does not involve correlated errors.
 LIDYANE CAMELO posted on Friday, June 26, 2015 - 4:28 pm
I am using SEM with probit regression and I would like to estimate predicted probabilities. I am using the equation below, which I found in the Mplus guide. However, I do not know where and how the residual variance needs to be incorporated in this equation; I did not find this in the Mplus guide.
P (u = 1 | x) = F (-t + b*x +b2*x2...)
Another question:
The “R-square + residual variance” needs to equal 1, doesn't it? I do not know what is happening, because with categorical variables the sum of these two estimates is higher than 1. You can see part of my output below; the columns are variable, R-square, and residual variance.
Variable R-SQUARE Residual Variance
A_DM: 0.217 0.847
ESC_CAT5: 0.821 0.246
RENDAF: 0.745 0.339
CLASSE5 : 0.783 0.293
AF : 0.116 0.954
 Bengt O. Muthen posted on Sunday, June 28, 2015 - 10:38 am
There are no free residual variances estimated. Or, viewed a different way, they are fixed at 1 in probit and already incorporated in the parameters. The standardized solution will give R-square+resvar=1. The stand sol. gives resvars as "remainders", making the variance add up to 1.
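A sketch of the "remainder" logic for a single probit equation with observed covariates, assuming the u* residual variance is fixed at 1 as described above: the explained variance is b' Sigma b, R-square is that variance divided by the total, and the standardized residual variance is the rest, so the two add up to 1. The slopes and covariance matrix are hypothetical.

```python
def explained_variance(slopes, cov):
    """b' Sigma b: variance of the linear predictor given the covariate covariance matrix."""
    n = len(slopes)
    return sum(slopes[i] * cov[i][j] * slopes[j] for i in range(n) for j in range(n))


def probit_r2(slopes, cov, residual_var=1.0):
    """R-square and standardized residual variance when the u* residual variance is fixed."""
    v = explained_variance(slopes, cov)
    total = v + residual_var
    return v / total, residual_var / total


# Hypothetical slopes and covariate covariance matrix, for illustration only.
slopes = [0.6, -0.3]
cov = [[1.0, 0.2], [0.2, 2.0]]
r2, std_resvar = probit_r2(slopes, cov)
print(round(r2, 3), round(std_resvar, 3))
```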
 LIDYANE CAMELO posted on Monday, June 29, 2015 - 4:58 am
Thank you very much for your response! Now I understood.

I have one more question. I would like to plot probabilities of an event as a function of our main latent variable (X). To do that, I will fix X at 5 different points (-2SD, -1SD, mean, +1SD, +2SD) and fix the other variables in the model at their means. I will then have 5 different probabilities, which I can plot. However, this captures only the direct effect of X on Y, and I am also interested in the total effect of X on Y. Is there some way to estimate probabilities that reflect the total effect instead of only the direct effect?

Thank you.

Best regards
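The direct-effect plot described in the post above can be sketched as follows: the factor is set to its mean plus -2 to +2 SDs (mean and variance as they might be read from TECH4), the other predictors are held at their means, and P(u=1) = F(-t + b*x) is evaluated at each point. All estimates are hypothetical, and, as the question itself notes, this reflects only the direct path.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (F)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_event(threshold, b_factor, factor_value, other_slopes, other_means):
    """P(u=1) = F(-t + b_f*f + sum of other slopes times their means); direct effect only."""
    xb = b_factor * factor_value + sum(b * m for b, m in zip(other_slopes, other_means))
    return normal_cdf(-threshold + xb)


# Hypothetical estimates; the factor mean and variance would come from TECH4.
t, b_f = 0.8, 0.45
factor_mean, factor_var = 0.2, 0.64
other_slopes, other_means = [0.3, -0.1], [1.5, 2.0]

sd = math.sqrt(factor_var)
grid = [factor_mean + k * sd for k in (-2, -1, 0, 1, 2)]  # -2SD, -1SD, mean, +1SD, +2SD
curve = [prob_event(t, b_f, f, other_slopes, other_means) for f in grid]
for f, p in zip(grid, curve):
    print(f"factor = {f:6.2f}   P(u=1) = {p:.3f}")
```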
 Bengt O. Muthen posted on Monday, June 29, 2015 - 5:18 pm
You talk about X and Y and you also talk about a total effect instead of the direct effect of X on Y - what do you mean by total effect; it seems like you have a third variable?
 LIDYANE CAMELO posted on Monday, June 29, 2015 - 5:51 pm
I have one main continuous explanatory variable, 6 mediators (1 binary and 5 continuous variables, including one latent variable), and a binary outcome. You can see my equations below. I would like to estimate probabilities of U4 (total and direct effect) as a function of F1. I saw that I can estimate the direct effect of F1 on U4 with the equation P (u = 1 | x) = F (-t + b*x), using the estimates of this equation: U4 ON F1 Y1 Y2 Y3 Y4 Y5 Y6. But I also want to estimate probabilities that reflect the total effect of F1 on U4, and I do not know how to do that.

F1 BY U1-U3;
U4 ON F1 Y1 Y2 Y3 Y4 Y5 Y6;
Y1 ON Y2 Y3;
Y2 ON F1 Y1;
Y3 ON Y1 Y2 Y5;
Y4 ON F1 Y1 Y2 Y5 Y6;
Y5 ON F1 Y1 Y2 Y3 Y4;
Y6 ON F1 Y3 Y4;
F1 ON Y1 Y2 Y3;
 Bengt O. Muthen posted on Wednesday, July 01, 2015 - 5:36 pm
Binary mediator and binary outcome both in principle call for the counterfactually-defined effects discussed in the paper on our website:

Muthén, B. & Asparouhov, T. (2015). Causal effects in mediation modeling: An introduction with applications to latent variables. Structural Equation Modeling: A Multidisciplinary Journal, 22(1), 12-23. DOI:10.1080/10705511.2014.935843

This shows how to get the effects expressed in probabilities. But you have many complications in your model including reciprocal interaction (Y1 ON Y2 and Y2 ON Y1), so it would be hard to apply the above approach. You would need extensive statistical consulting.

The best you can do is to focus not on the DV u4 but on u4*, the underlying continuous latent response variable for u4. The effects are then handled by WLSMV and MODEL INDIRECT.