Mplus Discussion >> Beginner Q on standardized logistic regression



Phil Wood posted on Sunday, August 09, 2020 - 10:39 am

When using a single continuous predictor variable with a mean of zero and a standard deviation of 1 to predict a categorical variable (which has a 30%/70% split), I notice that the unstandardized and standardized thresholds are different for logistic regression (using either WLSMV or MLR). I note that the continuous variable is skewed, but I don't understand why I would observe different thresholds. Does anyone have insight into this? Thanks!

Phil Wood posted on Monday, August 10, 2020 - 12:59 pm

I guess I should also add that I'm puzzled as to why the threshold does not convert to the sample proportions, given that the predictor variable has a mean of zero. Thanks!

Bengt O. Muthen posted on Monday, August 10, 2020 - 3:00 pm

The standardized threshold becomes a z-score for the continuous latent response variable Y* given that this variable has unit variance when standardized. The standardization divides the threshold by the model-estimated SD of Y*.
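As a rough illustration of that division, here is a minimal Python sketch. It assumes the model-implied variance of Y* is the explained part, b^2 * Var(x), plus a residual variance of 1 for probit or pi^2/3 for logit. The function name and the values of tau and b are hypothetical, not taken from any output in this thread.

```python
import math

def standardized_threshold(tau, b, var_x=1.0, resid_var=1.0):
    """Divide the threshold by the model-implied SD of Y*.

    Assumption: Var(Y*) = b^2 * Var(x) + residual variance.
    """
    total_var = b ** 2 * var_x + resid_var
    return tau / math.sqrt(total_var)

# Hypothetical estimates, for illustration only.
tau, b = 0.6, 0.8

# Probit: residual variance of Y* given x assumed fixed at 1.
probit_std = standardized_threshold(tau, b)

# Logit: residual variance assumed to be pi^2 / 3.
logit_std = standardized_threshold(tau, b, resid_var=math.pi ** 2 / 3)

print(f"unstandardized tau:        {tau:.4f}")
print(f"standardized tau (probit): {probit_std:.4f}")
print(f"standardized tau (logit):  {logit_std:.4f}")
```

Whenever b is non-zero the divisor exceeds 1 in both cases, so the standardized threshold is smaller in absolute value than the unstandardized one.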

Phil Wood posted on Monday, August 10, 2020 - 3:43 pm

Thanks! That's very helpful. Do you have thoughts as to why a mean-centered predictor in a one-predictor model would not result in the threshold representing the binary proportions in the sample?

Bengt O. Muthen posted on Monday, August 10, 2020 - 4:01 pm

The Y* variance conditional on X is set to 1. So the total variance is the explained variance plus 1; that is, the unstandardized threshold is not for a Y* in a unit-variance metric.
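Written out, the variance argument looks roughly like this for the probit case. The symbols \tau, b, e and \sigma_x^2 are notation assumed here for the threshold, slope, residual and predictor variance; they are not Mplus output labels.

```latex
\[
Y^* = b\,x + e, \qquad \operatorname{Var}(e \mid x) = 1, \qquad u = 1 \iff Y^* > \tau
\]
\[
\operatorname{Var}(Y^*) = b^2 \sigma_x^2 + 1,
\qquad
\tau_{\mathrm{std}} = \frac{\tau}{\sqrt{b^2 \sigma_x^2 + 1}}
\]
\[
\text{If } x \sim N(0, \sigma_x^2) \text{ and } e \sim N(0, 1):
\quad
P(u = 1) = \Phi\!\left(-\tau_{\mathrm{std}}\right)
\]
```

The last line holds exactly only when x is normal with mean zero, because only then is Y* itself normal. With a skewed x, Y* is not normal, so the standardized threshold can come close to the sample proportion without reproducing it.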

Phil Wood posted on Monday, August 10, 2020 - 6:21 pm

I apologize for being so dense. There are really two questions here, and you've answered my question about the variance of the latent response variable. My other question is: if the continuous predictor is centered to a mean of zero, why does the logit (or probit, if you're using that) of the threshold not correspond to the proportion of zeros and ones in the dichotomous variable? I would think that if the predictor variable has a mean of zero in the data set, the proportion would get recovered that way. It's very close, but not a replication.

Bengt O. Muthen posted on Tuesday, August 11, 2020 - 9:30 am

Yes, this is tricky (as Jöreskog used to say). Regressing a binary u on a continuous x variable, we model the conditional probability P(u=1 | x), and sometimes we are interested in the marginal P(u=1) derived from that model; this is what you are looking at now, I think. The average x is 0, but P(u=1) is not the same as P(u=1 | x=0). It is instead the average over all observations of P(u=1 | x_i). The average of the probabilities is not the same as the probability of the average (of x) because the P function is non-linear in its argument.
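A small simulation can make the difference between P(u=1 | x=0) and the average of P(u=1 | x_i) concrete. This is only a sketch under stated assumptions: a probit link, hypothetical values for tau and b, and a right-skewed x drawn from an exponential distribution and then standardized. None of these come from Phil's data.

```python
import math
import random
import statistics

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(1)

# Right-skewed predictor (assumed exponential), standardized to mean 0, SD 1.
raw = [random.expovariate(1.0) for _ in range(20000)]
m = statistics.fmean(raw)
s = statistics.pstdev(raw)
x = [(v - m) / s for v in raw]

# Hypothetical threshold and slope, for illustration only.
tau, b = 0.6, 0.8

# Probability evaluated at the average x (which is 0).
p_at_mean = phi(-tau + b * 0.0)

# Average of the conditional probabilities over all observations.
p_marginal = statistics.fmean([phi(-tau + b * xi) for xi in x])

print(f"P(u=1 | x=0):           {p_at_mean:.4f}")
print(f"mean of P(u=1 | x_i):   {p_marginal:.4f}")
```

The two printed values differ even though x has mean zero, because phi is non-linear in its argument.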

As an aside, as on page 233, you can do a LOOP PLOT of the estimated probabilities for different values of x using Model Constraint with

prob = phi(-tau + b*x);
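Outside Mplus, the same expression can be tabulated over a grid of x values. The following Python sketch is a stand-in for that idea, not the Mplus LOOP syntax; the grid and the values of tau and b are hypothetical.

```python
import math

def phi(z):
    """Standard normal CDF, playing the role of phi() in the expression above."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical threshold and slope, for illustration only.
tau, b = 0.6, 0.8

# Grid of x values from -2 to 2 in steps of 0.5.
xs = [-2.0 + 0.5 * i for i in range(9)]

# Estimated probability at each grid point: prob = phi(-tau + b*x).
probs = [phi(-tau + b * x) for x in xs]

for x, p in zip(xs, probs):
    print(f"x = {x:5.1f}   prob = {p:.3f}")
```

Plotting probs against xs gives the curve whose bend is the source of the gap between the probability at the mean of x and the mean of the probabilities.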

Phil Wood posted on Tuesday, August 11, 2020 - 11:43 am

Thanks. That clears it up well. I had been thinking that, analogous to a continuous regression line going through (mean_x, mean_y), something similar had to happen here. Your explanation also covers why the discrepancy is more marked for the case of a skewed independent variable.