Calculating Probit Probabilities
 George Burruss posted on Friday, November 07, 2008 - 12:56 pm
I want to make sure I am calculating the probabilities correctly for a probit SEM where my outcome is a 4-category ordered variable. The Mplus manual has an example for three categories.

Is this correct?

P(Y=1|x)=F(t1 - b1*x1+b2*x2...)

P(Y=2|x)=F(t2 - b1*x1+b2*x2...) - F(t1-b1*x1+b2*x2...)

P(Y=3|x)=F(t3 - b1*x1+b2*x2...) - F(t2 - b1*x1+b2*x2...) - F(t1 - b1*x1+b2*x2...)

P(Y=4|x)=F(-t3 + b1*x1+b2*x2...)

When I calculate the results, the last category drops from a probability of .62 to .01, which makes me think I'm doing it wrong.
 Bengt O. Muthen posted on Friday, November 07, 2008 - 5:36 pm
You have to carry the minus sign for all the x terms, not just the first.

Also the y=3 probability is incorrect - the last term should be dropped. Think of a normal curve where you are interested in the area between 2 thresholds - only the lower and the higher limit for the category are involved.
 George Burruss posted on Saturday, November 08, 2008 - 6:51 am
Thank you for the clarification; I see my mistake for Y3. Let me make sure I understand your first point.

Carry the minus sign for all x terms; thus for each level of Y, I subtract b1*x1, subtract b2*x2, subtract b3*x3...

So, P(Y=2|x)=F(t2 - b1*x1 - b2*x2...) - F(t1 - b1*x1 - b2*x2...)

And for the last level of Y, I add all the regression weights (e.g., P(Y=4|x)=F(-t3 + b1*x1 + b2*x2 + b3*x3...))
 Bengt O. Muthen posted on Saturday, November 08, 2008 - 7:28 am
Right.
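
As a rough illustration of the corrected formulas confirmed above, the following Python sketch computes all four category probabilities from three thresholds. The function name, thresholds, coefficients, and covariate values are hypothetical and are not taken from any Mplus output; F is the standard normal distribution function.

```python
from statistics import NormalDist

F = NormalDist().cdf  # standard normal distribution function

def ordered_probit_probs(thresholds, coefs, x):
    """Return [P(Y=1|x), ..., P(Y=K|x)] for K-1 ordered thresholds."""
    # Linear predictor b1*x1 + b2*x2 + ...; it is subtracted from every threshold.
    xb = sum(b * xi for b, xi in zip(coefs, x))
    # Cumulative probabilities P(Y <= k | x); the last one is 1 by definition.
    cum = [F(t - xb) for t in thresholds] + [1.0]
    # Each category is the area between its two adjacent thresholds only.
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Hypothetical estimates: t1, t2, t3 and b1, b2, evaluated at x1 = 1, x2 = 2.
thresholds = [-0.5, 0.3, 1.2]
coefs = [0.4, 0.2]
x = [1.0, 2.0]
probs = ordered_probit_probs(thresholds, coefs, x)
print([round(p, 4) for p in probs])
```

The four values sum to one, and the last one equals F(-t3 + b1*x1 + b2*x2), the form used for the highest category in the posts above.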
 Ramzi Mabsout posted on Wednesday, November 18, 2009 - 10:01 am
Can I "use" the standardised probit coefficients and thresholds to calculate probabilities? Do you expect the probabilities calculated from the standardised coefficients to differ from those calculated from the unstandardised coefficients?

thank you.

ramzi
 Ramzi Mabsout posted on Wednesday, November 18, 2009 - 1:16 pm
Could you also please offer me some guidance on estimating the probability effects on categorical indicators of changes in the latent variable in a probit CFA?

thank you very much.

ramzi
 Bengt O. Muthen posted on Wednesday, November 18, 2009 - 5:49 pm
Use unstandardized estimates to compute probabilities.

With latent variables you can compute the probability for say 1 SD below and above the latent variable mean, where SD comes from the latent variable variance, square-rooted.
 Ramzi Mabsout posted on Monday, November 23, 2009 - 1:20 am
Is it correct then to compute the probability effect of a standard
deviation increase as follows:

F(t[1]-B*MEAN)-F(t[1]-B*(MEAN+1SD))

where F is the standard normal distribution function, MEAN the factor mean, SD the factor standard deviation, B the unstandardised
probit coefficient, and t[1] the first threshold?

Thank you very much

ramzi
 Bengt O. Muthen posted on Monday, November 23, 2009 - 2:32 pm
Yes, unless you have residual variances for the indicators that are different from unity. So it depends on whether you have covariates in the model and whether you have a multiple-group situation. See the handout of Topic 2 for how to compute probit-based probabilities when you have factors and covariates.
 Ramzi Mabsout posted on Tuesday, November 24, 2009 - 4:11 am
Thank you for that, I found the handout topic 2 very helpful. However, I am not considering the probability effects of the covariates on the indicators but the probability effect of the factor on the indicators. My question is, do I also have to include the covariates in the computation of the factor probability effects on the indicators?
 Bengt O. Muthen posted on Tuesday, November 24, 2009 - 1:03 pm
If the covariates influence the factor and not the indicators directly you don't have to include the covariates in the computation. You just have to know what the mean and variance of the factor is, either conditional on a certain covariate value or marginally.

See also the Mplus Web Note #4.
 Ramzi Mabsout posted on Thursday, November 26, 2009 - 9:37 am
Does this mean the formula I posted on the 23rd of November is the one to use to compute the probability effects of a factor on the categorical indicators?

thank you again.

ramzi
 Bengt O. Muthen posted on Thursday, November 26, 2009 - 10:03 am
Web Note 4 gives the answer in formula (7) which shows that you need to involve the residual variance theta. Theta is not a free parameter but is computed as a remainder to add up to unit latent response variable variance when there are no covariates in the model. This means that Delta in equation (10) is one which then gives you theta as a function of lambda and psi. Then you take the square root of that theta value and use it to divide your F arguments.
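
A small Python sketch of the computation described above, for the Delta parameterization with no covariates. It assumes that unit latent response variable variance means lambda^2*psi + theta = 1, so that theta = 1 - lambda^2*psi; the loading, factor variance, and threshold values are hypothetical, and the function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

F = NormalDist().cdf  # standard normal distribution function

def residual_variance_delta(lam, psi):
    """theta as the remainder so that lam**2 * psi + theta = 1 (assumed)."""
    theta = 1.0 - lam**2 * psi
    if theta <= 0:
        raise ValueError("lam**2 * psi must be below 1 for a positive theta")
    return theta

def prob_below_threshold(tau, lam, eta, theta):
    """P(y* <= tau | eta): the F argument is divided by sqrt(theta)."""
    return F((tau - lam * eta) / sqrt(theta))

# Hypothetical values: loading, factor variance, first threshold.
lam, psi, tau = 0.8, 0.5, 0.25
theta = residual_variance_delta(lam, psi)  # 1 - 0.64 * 0.5 = 0.68
p_at_mean = prob_below_threshold(tau, lam, 0.0, theta)
p_plus_1sd = prob_below_threshold(tau, lam, sqrt(psi), theta)
print(theta, p_at_mean, p_plus_1sd)
```

Leaving out the division by sqrt(theta) would pull the F arguments toward zero and move the probabilities toward 0.5.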
 Ramzi Mabsout posted on Friday, November 27, 2009 - 4:10 am
Thank you very much for this. Should I then use the inverse of the square root of theta to compute both the marginal factor effects on the indicators and the model-estimated proportions for the categorical indicator? Can I also use the formulas in Web Note 4 to compute the factor probability effects on the reference indicator whose loading is fixed to one (to scale the factor; note that I am using the Theta, not the Delta, parameterization)? And why does Mplus compute unstandardised thresholds for this reference indicator if, according to Kamata & Bauer (2008:138), such thresholds should be 0?
thank you.
ramzi
 Bengt O. Muthen posted on Friday, November 27, 2009 - 10:51 am
Answers to your 3 questions:

Yes.

Yes, but with the Theta parameterization the latent response variable variance is not 1. Instead, you should work with theta=1. Note also that with covariates you follow the formulas of the Topic 2 item bias segment.

There are different ways to parameterize the model. I have not read the reference you give, but it might be that they fix the threshold to zero for the indicator with lambda fixed at 1 because they instead free the factor mean. These are equivalent parameterizations in terms of model fit. And as far as I can see none is preferable from an interpretation point of view.
 Ramzi Mabsout posted on Sunday, November 29, 2009 - 1:57 am
If I understand correctly then, the formula I posted on the 23rd November to compute the probability effect of a standard deviation increase in the factor on the categorical indicator __ F(t[1]-B*MEAN)-F(t[1]-B*(MEAN+1SD)) __ should be the correct one to use with covariates and the theta parametrisation.

I am however still not sure how to interpret this probability. It should not be the probability of changing thresholds at the mean of the factor because the other threshold, threshold2, is not included in the argument?
 Bengt O. Muthen posted on Sunday, November 29, 2009 - 4:53 pm
Regarding your first question, yes this is the correct formula if your covariates do not have direct effects on the item and you use the Theta parameterization. This is seen in the Topic 2 handout I pointed to - see page 163, eqns (46) and (47). You just have to drop x and note that theta=1.

Regarding your second question, your formula concerns the probability of being below the threshold which with a binary item implies the probability of observing 0 (not 1). It sounds like you consider a polytomous item. With say 3 categories you have 3 different probabilities to consider. Those probs are computed in line with eqns (18)-(20) of Technical Appendix 1 on our web site, where the x is your factor.
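
To make the polytomous case concrete, here is a hedged Python sketch for a 3-category item. It assumes the Theta parameterization with theta = 1 and no direct covariate effects on the item; the thresholds, loading, and factor variance are hypothetical, and the function name is invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

F = NormalDist().cdf  # standard normal distribution function

def category_probs(thresholds, loading, eta):
    """Category probabilities of one ordered item at factor value eta (theta = 1)."""
    cum = [F(t - loading * eta) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Hypothetical estimates for a 3-category item.
thresholds = [-0.4, 0.9]
loading = 0.7            # unstandardised loading B
factor_mean = 0.0
factor_sd = sqrt(1.44)   # square root of the estimated factor variance

at_mean = category_probs(thresholds, loading, factor_mean)
at_plus_1sd = category_probs(thresholds, loading, factor_mean + factor_sd)
change = [hi - lo for lo, hi in zip(at_mean, at_plus_1sd)]

# -change[0] equals F(t1 - B*MEAN) - F(t1 - B*(MEAN + SD)) from the post above.
print(at_mean, at_plus_1sd, change)
```

The three changes sum to zero, so the formula with only the first threshold describes the lowest category alone; the other two categories need both thresholds.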
 George Burruss posted on Thursday, November 11, 2010 - 11:54 am
In a post above about calculating probabilities based on probit from WLSVM, Professor Muthen responded:

"...With latent variables you can compute the probability for say 1 SD below and above the latent variable mean, where SD comes from the latent variable variance, square-rooted."

My question: Because the factor mean in a single group analysis is zero, do you calculate the probability of a categorical observed variable with a latent variable of interest using a mean of zero?

That is, for Y1 ON F1, the F1 term is B*(mean+1SD), where the mean is always zero and the SD is the square root of the estimated factor variance?
 Linda K. Muthen posted on Friday, November 12, 2010 - 9:25 am
Yes.
 Sarah Ryan posted on Wednesday, September 28, 2011 - 12:50 pm
I have a model with 4 control covariates, 4 LV's, and 2 x's predicting an observed y with 5 levels. I am using WLSMV.

The latent factors, but not any of their indicators, are regressed on the observed covariates. The observed y is regressed on the observed covariates. After reading the posts above, I'm still not clear on whether I need to include the covariates (ie use formulas 46 and 47 of Topic 2 handout) in the prediction of probabilities for y. I think I probably do, but can you confirm or disconfirm?

Thanks in advance.
 Bengt O. Muthen posted on Wednesday, September 28, 2011 - 6:00 pm
Yes, you do need to include the covariates, as is done in those formulas with x, unless you have centered those covariates (subtracted their means) and want to compute the probability at their means (which are zero for the centered versions).
 Sarah Ryan posted on Thursday, September 29, 2011 - 4:48 pm
Okay, so in calculating y probabilities do I include only those covariates that share a significant association with the outcome (given covariate associations with all other predictors)?

Also, I would use this equation, then (with zero values for latent factor means), yes?

P (uij=1|eta*ij,xi) = 1– F[(tj – lambda*j_eta*i - kj_xi)*(1/sqrt(theta))]

(with theta being the y* residual variance in the standardized output).

Are you aware of any resources that can guide me in expanding this for a five-level outcome? I'm having trouble, and I'm not sure if it is because I'm not expanding the equation correctly or if my output is problematic. My thresholds are 1.51, 2.13, 2.19, and 2.96 and the y* residual variance is .45 (ie 1/sqrt(.45)=1.36), so I'm getting either very large or very small probabilities.
 Bengt O. Muthen posted on Thursday, September 29, 2011 - 9:22 pm
You should use all covariates included in the model, unless you consider a simplified model.

You get the factor mean value from Tech4 - it might not be zero, depending on the covariates influencing it.

See Appendix 1 of the V2 appendix to get the expression for the probabilities of an outcome u with more than 2 categories.
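
One way the five-level expansion could be sketched in Python is shown below. It assumes that each category probability is the difference between adjacent cumulative probabilities, with every F argument scaled by 1/sqrt(theta) as in the binary formula quoted above. The thresholds and residual variance are the ones reported in the post; the slopes and predictor values are hypothetical, and the function name is invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

F = NormalDist().cdf  # standard normal distribution function

def ordinal_probs(thresholds, slopes, values, theta):
    """Probabilities of the K ordered categories given all predictors of y."""
    scale = 1.0 / sqrt(theta)
    # Sum over every predictor of y: latent factors and observed covariates.
    linear = sum(b * v for b, v in zip(slopes, values))
    cum = [F((t - linear) * scale) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

thresholds = [1.51, 2.13, 2.19, 2.96]  # reported in the post
theta = 0.45                           # y* residual variance reported in the post
slopes = [0.6, 0.3, -0.2]              # hypothetical: one factor, two covariates
values = [0.0, 1.0, 0.5]               # hypothetical: factor mean, covariate values

probs = ordinal_probs(thresholds, slopes, values, theta)
print([round(p, 4) for p in probs])
```

The factor value passed in should be the model-implied factor mean, which, as noted above, may not be zero when covariates influence the factor.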
 Sarah Ryan posted on Friday, September 30, 2011 - 5:14 pm
Once again, thank you. With this information, I have predicted probabilities and the world makes sense again!
 Jak posted on Thursday, January 19, 2012 - 7:54 am
1) When I use (multilevel categorical) MLR estimation with link = probit, is it correct that the variance of the underlying latent response variables is 1?

2) I am trying to obtain the univariate probabilities for the categorical variables from the estimated thresholds. I thought a (first) threshold of -1.28 should match an observed proportion of .10 in the first category. But this seems incorrect, I hope you can help me out.


Thanks in advance!
 Linda K. Muthen posted on Thursday, January 19, 2012 - 6:43 pm
1. The residual variance is one.

2. See the formulas on page 440 of the user's guide.
 Jak posted on Wednesday, January 25, 2012 - 7:27 am
Thank you for your reply. To be sure:

With residual variance, you mean the residual variance at the within level?
Or the total (within+between) residual variance?

Kind regards, Suzanne
 Linda K. Muthen posted on Wednesday, January 25, 2012 - 11:35 am
It is the residual variance of the underlying latent response variable on within. In the between part of the model, the categorical variable is a continuous random intercept.
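
The following Python sketch shows one possible reason a threshold of -1.28 need not reproduce a proportion of .10 in a two-level model. It rests on an assumption not stated in this thread: the between-level random intercept is normal with mean zero, so the marginal variance of the latent response variable is the within-level residual variance of one plus the between-level variance. The between-level variance used here is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

def marginal_first_category(tau, between_var, within_resid_var=1.0):
    """Marginal P(first category) when y* = random intercept + within residual."""
    # Assumed: both components are normal, so their variances add.
    total_sd = sqrt(within_resid_var + between_var)
    return std_normal.cdf(tau / total_sd)

tau = -1.28
p_single_level = marginal_first_category(tau, between_var=0.0)  # about .10
p_two_level = marginal_first_category(tau, between_var=0.5)     # hypothetical variance
print(round(p_single_level, 3), round(p_two_level, 3))
```

Under this assumption the threshold matches the observed proportion only when the between-level variance is zero.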
 Christina Policastro posted on Thursday, August 08, 2013 - 1:20 pm
I have estimated a probit model and have a question about the interpretation of the coefficients. I have noticed in other sources that authors recommend calculating marginal effects to make the probit coefficients more interpretable. However, this is not the method recommended in Chapter 14 of the Mplus user’s manual. Is there something about the way that the Mplus program estimates the coefficients that affects the translation of the probit coefficients? Why does the Mplus manual recommend calculating the predicted probabilities directly from the probit equation rather than from marginal effects? Thanks in advance for any help you can provide.
 Bengt O. Muthen posted on Thursday, August 08, 2013 - 2:45 pm
Marginal effects are certainly also of interest, but are not provided automatically by Mplus. For definitions, see e.g. the book by Long (1997), section 3.7.4. The marginal effect of one covariate varies as a function of other covariate values so it isn't a simple description. I think you can express them in Mplus using the new LOOP and PLOT options of MODEL CONSTRAINT. This way you would also get confidence intervals.
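
To illustrate why the marginal effect is not a single number, here is a minimal Python sketch for a binary probit outcome. It takes the marginal effect to be the partial derivative of P(y=1|x) with respect to one covariate, which for P(y=1|x) = F(-tau + b1*x1 + b2*x2) is the standard normal density at that argument times the coefficient. This is plain Python rather than Mplus MODEL CONSTRAINT syntax, and all parameter values are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()

def prob_y1(tau, coefs, x):
    """P(y = 1 | x) = F(-tau + b1*x1 + b2*x2 + ...)."""
    return std_normal.cdf(-tau + sum(b * v for b, v in zip(coefs, x)))

def marginal_effect(tau, coefs, x, j):
    """dP(y = 1 | x) / dx_j = density(-tau + xb) * b_j."""
    z = -tau + sum(b * v for b, v in zip(coefs, x))
    return std_normal.pdf(z) * coefs[j]

# Hypothetical threshold and slopes.
tau = 0.5
coefs = [0.8, -0.4]

# Marginal effect of x1 at two different values of x2.
me_low = marginal_effect(tau, coefs, [0.0, 0.0], 0)
me_high = marginal_effect(tau, coefs, [0.0, 3.0], 0)
print(round(me_low, 4), round(me_high, 4))
```

The same slope for x1 yields different marginal effects depending on where x2 is held, which is the dependence on other covariate values mentioned above.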
 db40 posted on Monday, September 01, 2014 - 9:32 am
Dear Dr. Muthen,

I am trying to calculate probit probabilities for a mediation model and I am wondering if this is correct.

Starting with X > M

Prob (M = 1 | physical abuse = 1)= F (- threshold + unstandardised est * 1) = F (-1.682)
= a probability of 0.046284.


Threshold = M1$1 1.995 (add the minus in front)
X1 (physical abuse = YES =1 ) = 0.313
 Linda K. Muthen posted on Monday, September 01, 2014 - 10:33 am
See Chapter 14 in the user's guide. There is a section on converting probit coefficients to probabilities that would apply to m on x.
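
As a quick check of the arithmetic in the post above, this Python sketch evaluates the standard normal distribution function at the stated argument. It uses the threshold and slope quoted in the post and assumes that x is the only predictor of m; the variable names are invented for illustration.

```python
from statistics import NormalDist

F = NormalDist().cdf  # standard normal distribution function

threshold = 1.995  # M1$1 from the post
slope = 0.313      # unstandardised estimate for physical abuse

# P(M = 1 | x) = F(-threshold + slope * x)
p_abuse = F(-threshold + slope * 1)
p_no_abuse = F(-threshold + slope * 0)
print(round(p_abuse, 4))  # 0.0463
```

Evaluating the same expression at x = 0 gives the comparison probability for respondents without physical abuse.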