Message/Author |
|
Phil Wood posted on Sunday, August 09, 2020 - 10:39 am
|
|
|
When using a single continuous predictor variable with mean zero and standard deviation of 1 to predict a categorical variable (which has a 30%/70% split), I notice that the unstandardized and standardized thresholds are different for logistic regression (using either WLSMV or MLR). I note that the continuous variable is skewed, but I don't understand why I would observe different thresholds. Does anyone have insight into this? Thanks! |
|
Phil Wood posted on Monday, August 10, 2020 - 12:59 pm
|
|
|
I guess I should also add that I'm puzzled as to why the threshold does not convert to the sample proportions, given that the predictor variable has a mean of zero. Thanks! |
|
|
The standardized threshold becomes a z-score for the continuous latent response variable Y* given that this variable has unit variance when standardized. The standardization divides the threshold by the model-estimated SD of Y*. |
|
Phil Wood posted on Monday, August 10, 2020 - 3:43 pm
|
|
|
Thanks! That's very helpful. Do you have thoughts as to why a mean-centered predictor in a one-predictor model would not result in the threshold representing the binary proportions in the sample? |
|
|
The Y* variance conditional on X is set to 1. So the total variance of Y* is the explained variance plus 1; that is, the unstandardized threshold is not expressed in a metric where Y* has variance 1. |
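The relationship between the two thresholds can be sketched numerically. This is a minimal illustration, assuming a probit link (residual variance of Y* fixed at 1) and a single predictor; the function name `standardized_threshold` and the values of `tau`, `b`, and `var_x` are hypothetical and not taken from the poster's output.

```python
import math


def standardized_threshold(tau, b, var_x, resid_var=1.0):
    """Divide an unstandardized threshold by the model-implied SD of Y*.

    Assumes Y* = b*x + e with Var(e) = resid_var (1 under a probit link),
    so Var(Y*) = b**2 * Var(x) + resid_var.
    """
    total_var = b ** 2 * var_x + resid_var  # explained variance plus residual variance
    return tau / math.sqrt(total_var)


# Hypothetical estimates: threshold 0.6, slope 0.75, predictor variance 1.
tau_std = standardized_threshold(tau=0.6, b=0.75, var_x=1.0)
print(tau_std)  # smaller in magnitude than 0.6 because Var(Y*) > 1
```

The two thresholds coincide only when the slope is zero, since only then does the total variance of Y* equal 1.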
|
Phil Wood posted on Monday, August 10, 2020 - 6:21 pm
|
|
|
I apologize for being so dense. There are really two questions here, and you've answered my question about the variance of the latent response variable. My other question is: if the continuous predictor is centered to a mean of zero, why does the logit (or probit, if you're using that) of the threshold not correspond to the proportion of zeros and ones in the dichotomous variable? I would think that if the predictor variable has a mean of zero in the data set, the proportion would get recovered that way. It's very close, but not a replication. |
|
|
Yes, this is tricky (as Jöreskog used to say). Regressing a binary u on a continuous x variable, we model the conditional probability P(u=1 | x), and sometimes we are interested in the marginal P(u=1) derived from that model; this is what you are looking at now, I think. The average x is 0, but P(u=1) is not the same as P(u=1 | x=0). It is instead the average over all observations of P(u=1 | x_i). The average of the probabilities is not the same as the probability of the average (of x) because the P function is non-linear in its argument. As an aside, as on page 233, you can do a LOOP PLOT of the estimated probabilities for different values of x using Model Constraint with prob = phi(-tau + b*x); |
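The gap between P(u=1 | x=0) and the marginal P(u=1) can be checked with a small simulation. This is a sketch under stated assumptions: a probit model with prob = phi(-tau + b*x) as in the reply, hypothetical values for `tau` and `b`, and a skewed predictor drawn from an exponential distribution and then standardized to mean 0 and SD 1. None of these numbers come from the poster's data.

```python
import math
import random
import statistics


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical parameter values, chosen only for illustration.
tau, b = 0.6, 0.75

# Skewed predictor: exponential draws, standardized to mean 0 and SD 1.
rng = random.Random(0)
raw = [rng.expovariate(1.0) for _ in range(20000)]
m, s = statistics.fmean(raw), statistics.pstdev(raw)
x = [(v - m) / s for v in raw]

# Probability evaluated at the average x (which is 0 after centering).
p_at_mean = phi(-tau + b * 0.0)

# Marginal probability: the average of P(u=1 | x_i) over all observations.
p_marginal = statistics.fmean(phi(-tau + b * xi) for xi in x)

print(p_at_mean, p_marginal)
```

Because phi is non-linear, the two printed values differ even though the predictor has mean exactly zero; the threshold alone reproduces only the first of them.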
|
Phil Wood posted on Tuesday, August 11, 2020 - 11:43 am
|
|
|
Thanks. That clears it up well. I had been thinking that, analogous to a continuous regression line going through (mean_x, mean_y), something similar had to happen here. Your explanation also covers why the discrepancy is more marked for the case of a skewed independent variable. |