Threshold and probability
 John Lee posted on Monday, June 22, 2009 - 12:50 am
Hi,

In the Mplus User's Guide V5 (p. 407), it is stated that P(u=1|x)=1/(1+exp(-a-b*x)).

I have just fitted a simple model to a binary variable (0 observed 100 times, 1 observed 200 times):
TITLE: binary variable
DATA: FILE IS binary1.dat;
VARIABLE: NAMES x w; ! w: frequency
FREQWEIGHT IS w;
CATEGORICAL ARE x;
USEVARIABLES ARE x;
MODEL:
[x$1];
OUTPUT:

The following is part of the output:
SUMMARY OF ANALYSIS

SUMMARY OF CATEGORICAL DATA PROPORTIONS
X
Category 1 0.333
Category 2 0.667

MODEL RESULTS
                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

 Thresholds
    X$1               -0.431      0.075     -5.754      0.000

The expected probability is clearly .667 (200/300). When I do the calculation based on the estimated threshold, I get P(u=1|x)=1/(1+exp(0.431))=.394. Even if I use 1-.394=.606, it is still different from the expected value of .667.

Did I miss something? Thanks.
 Bengt O. Muthen posted on Monday, June 22, 2009 - 7:55 am
You should use the formula

P(u=1|x)=1/(1+exp(-0.431))=.606

because the threshold is -a. But then you also have to take into account the freqweight values.
 Linda K. Muthen posted on Monday, June 22, 2009 - 8:58 am
Also, I think you are using the formula for logistic regression rather than probit regression.
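
To make the logistic-versus-probit point concrete, here is a minimal Python sketch (not Mplus output) that converts the reported threshold of -0.431 into a probability under each link. The function names are made up for illustration, and the sketch assumes an intercept-only model, so that P(u=1) depends on the threshold alone.

```python
import math

def logistic_prob(threshold):
    """P(u=1) under a logit link; the threshold is -a, so P = 1/(1+exp(threshold))."""
    return 1.0 / (1.0 + math.exp(threshold))

def probit_prob(threshold):
    """P(u=1) under a probit link: the standard normal CDF evaluated at -threshold."""
    return 0.5 * (1.0 + math.erf(-threshold / math.sqrt(2.0)))

threshold = -0.431  # estimate of X$1 from the posted output

p_logit = logistic_prob(threshold)
p_probit = probit_prob(threshold)

print(f"logit link:  {p_logit:.3f}")
print(f"probit link: {p_probit:.3f}")
```

Under these assumptions the logit conversion gives about .606, while the probit conversion gives about .667, which matches the observed proportion 200/300.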
 John Lee posted on Monday, June 22, 2009 - 7:16 pm
Dear Bengt and Linda,

Thanks for the replies. Yes, I am using the formula for logistic regression as I thought that the default link function is logistic rather than probit.

In my data file, the data are:
x w
0 100
1 200

where w is the frequency weight. I have also tried a version without the frequency weight. That is, I created a data file with 100 records of "0" and 200 records of "1".

The estimated probability based on the formula is P(u=1|x)=1/(1+exp(-0.431))=.606.

My concern is that the "correct" estimate of the probability should clearly be 200/300=.667.
 Linda K. Muthen posted on Tuesday, June 23, 2009 - 6:05 am
For maximum likelihood, the default link is logit. However, the default estimator for categorical outcomes is weighted least squares and probit regression. What you posted does not show which estimator you used. Please send your full output and license number to support@statmodel.com.
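
One way to check which link an output is using is to go in the other direction and compute the threshold that each link would imply for the observed proportion. The sketch below is an illustration in Python, not an Mplus feature; it assumes an intercept-only model and uses the observed proportion 200/300 from the data above. The variable names are hypothetical.

```python
import math
from statistics import NormalDist

p = 200 / 300  # observed proportion of category 2 (u=1)

# Probit: P(u=1) = Phi(-tau), so tau = -Phi^{-1}(p)
tau_probit = -NormalDist().inv_cdf(p)

# Logit: P(u=1) = 1/(1+exp(tau)), so tau = -log(p/(1-p))
tau_logit = -math.log(p / (1 - p))

print(f"probit threshold: {tau_probit:.3f}")
print(f"logit threshold:  {tau_logit:.3f}")
```

The probit threshold comes out at about -0.431, the value reported in the output, whereas a logit link would give about -0.693. This is consistent with the estimate having been produced by the default weighted least squares estimator with probit regression.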