Mplus Discussion >> Bootstrap SE/CI for class probs given xs & RRs


Trang Q. Nguyen posted on Wednesday, September 18, 2013 - 9:38 am

I am regressing a latent class variable (5 classes) on several nominal covariates (each coded as a set of dummy variables), using the manual 3-step method. I am using NEW parameters in MODEL CONSTRAINT to get class probabilities conditional on the levels of each covariate (with the others set to their means), and from these probabilities I calculate some risk ratios (e.g., the ratio of the class 1 probability in rural versus urban areas). I have three questions:

1) For these quantities, can I trust the SEs that are estimated? Or is it preferable to bootstrap for SEs and CIs?

2) If bootstrapping, is it preferable to use BOOTSTRAP or BCBOOTSTRAP?

3) How is the bootstrapping done? Is it drawing multiple samples with replacement from the data and estimating the model for each sample? Or is it drawing from the model-estimated distributions of the key parameters? I suspect the former, because the regression coefficient SEs change between the run without bootstrapping and the run with it. I am not sure, though, because although bootstrapping takes a long time, it does not take 1000 times as long when I specify BOOTSTRAP = 1000.

Thank you!

Linda K. Muthen posted on Wednesday, September 18, 2013 - 10:35 am

1. You may want to use bootstrap if you think the sampling distribution of your risk ratio estimates is not normal.

2. BOOTSTRAP gives bootstrapped standard errors. If you do not want symmetric confidence intervals, you can ask for BCBOOTSTRAP.

3. We draw multiple samples with replacement from the data.
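
Editorial illustration of point 3: a minimal Python sketch of resampling cases with replacement and recomputing an estimate in each draw. This is not Mplus code. In Mplus the whole model is re-estimated for each bootstrap sample; here a risk ratio from made-up two-group data stands in for the model, and the function names `risk_ratio` and `bootstrap_estimates` are hypothetical.

```python
import random
import statistics

def risk_ratio(rows):
    """Risk ratio of outcome=1, exposed (x=1) versus unexposed (x=0)."""
    exposed = [y for x, y in rows if x == 1]
    unexposed = [y for x, y in rows if x == 0]
    return (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))

def bootstrap_estimates(rows, statistic, n_draws=1000, seed=2013):
    """Resample rows with replacement and recompute the statistic each time."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_draws):
        # One bootstrap sample: n rows drawn with replacement from the data.
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return estimates

# Made-up data: 200 exposed (60 events) and 200 unexposed (30 events).
rows = [(1, 1)] * 60 + [(1, 0)] * 140 + [(0, 1)] * 30 + [(0, 0)] * 170

point_estimate = risk_ratio(rows)
draws = bootstrap_estimates(rows, risk_ratio)
# The bootstrap SE is the standard deviation of the estimates across draws.
bootstrap_se = statistics.stdev(draws)
print(point_estimate, bootstrap_se)
```

The standard deviation of the estimates across draws plays the role of the bootstrapped SE.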

Trang Q. Nguyen posted on Wednesday, September 18, 2013 - 11:14 am

Thank you so much. This is very helpful.

May I ask for further clarification on point 2? Your answer suggests that CINTERVAL(BOOTSTRAP) gives symmetric CIs and CINTERVAL(BCBOOTSTRAP) gives non-symmetric CIs. But when I specified CINTERVAL(BOOTSTRAP), the confidence intervals were not symmetric, which makes me think they may not be in the form of estimate +/- SE*multiplier. Are they quantile intervals?

Also, what does BCBOOTSTRAP do that is different from BOOTSTRAP?

If this is too much to explain, please kindly suggest what I should read to understand these methods.

Thanks much!

Bengt O. Muthen posted on Wednesday, September 18, 2013 - 11:35 am

When you say

BOOTSTRAP = 500;

in the ANALYSIS command you get bootstrapped SEs and the confidence intervals can use those SEs in a symmetric fashion like regular ML SEs. But if you also say in the OUTPUT command

CINTERVAL(BOOTSTRAP)

or

CINTERVAL(BCBOOTSTRAP)

you get the confidence limits from percentiles of the bootstrap distribution (bias-corrected percentiles in the case of BCBOOTSTRAP), so the interval is generally non-symmetric. This is described on page 727 of the Version 7 User's Guide. Some authors recommend BCBOOTSTRAP.
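
Editorial illustration: a Python sketch of the three kinds of interval discussed here, computed from a list of bootstrap estimates. The bias-corrected version is the textbook bias-corrected percentile method; that Mplus computes its BCBOOTSTRAP limits in exactly this way is an assumption, and the bootstrap draws below are simulated rather than taken from any model.

```python
import math
import random
from statistics import NormalDist, stdev

def quantile(sorted_values, p):
    """Linear-interpolation quantile of a sorted list, for 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def symmetric_ci(estimate, boot, level=0.95):
    """Estimate +/- z * bootstrap SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = stdev(boot)
    return estimate - z * se, estimate + z * se

def percentile_ci(boot, level=0.95):
    """Lower and upper percentiles of the bootstrap estimates."""
    s = sorted(boot)
    alpha = (1 - level) / 2
    return quantile(s, alpha), quantile(s, 1 - alpha)

def bias_corrected_ci(estimate, boot, level=0.95):
    """Percentile interval with percentiles shifted by the bias correction z0."""
    nd = NormalDist()
    s = sorted(boot)
    below = sum(b < estimate for b in s) / len(s)
    # Keep the proportion strictly inside (0, 1) so inv_cdf is defined.
    below = min(max(below, 1 / (len(s) + 1)), len(s) / (len(s) + 1))
    z0 = nd.inv_cdf(below)
    z = nd.inv_cdf(0.5 + level / 2)
    return quantile(s, nd.cdf(2 * z0 - z)), quantile(s, nd.cdf(2 * z0 + z))

# Simulated, right-skewed bootstrap estimates of a ratio (illustration only).
rng = random.Random(1)
estimate = math.exp(0.7)
boot = [math.exp(rng.gauss(0.7, 0.2)) for _ in range(2000)]

sym = symmetric_ci(estimate, boot)
pct = percentile_ci(boot)
bc = bias_corrected_ci(estimate, boot)
print(sym, pct, bc)
```

With skewed estimates such as ratios, the percentile limits sit at unequal distances from the point estimate, while the SE-based interval is symmetric by construction.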

Trang Q. Nguyen posted on Wednesday, September 18, 2013 - 12:10 pm

I am crystal clear now. Thank you so much! You both are extremely helpful!

Trang Q. Nguyen posted on Wednesday, September 18, 2013 - 2:27 pm

May I ask a related question: Is there a way to save the intercepts and regression coefficients from all the bootstrap samples?

I am interested in risk ratios associated with each covariate while controlling for the others. Instead of setting all the other covariates at their means (or any other arbitrary pattern), I am thinking it would be better to use all the individual data. Say with a covariate with two levels 0 and 1, I would like to: (1) assign value 0 to everyone while keeping their own data for the other covariates, use the model estimated intercepts and coefficients to calculate all individuals' class probabilities and take the mean for each class probability; then (2) assign value 1, calculate individual class probabilities and take the mean; and then (3) calculate risk ratios using these mean probabilities. To get bootstrap confidence intervals for these risk ratios, I would need to do this using the intercepts and regression coefficients from all the bootstrap samples.

Does this make sense to you, and is there a way to save all these intercepts and coefficients?

If not, do you have advice for how else I could achieve this? Or another way to summarize the effect of one covariate adjusting for the others?

Thank you!
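
Editorial illustration: a minimal Python sketch of steps (1) to (3) as described in this post. The intercepts, slopes, and data are made up, the last class is assumed to be the reference class with its logit fixed at 0, and all function names are hypothetical. This is not Mplus output or Mplus code.

```python
import math

def class_probabilities(intercepts, slopes, x):
    """Multinomial logit probabilities; the last class is the reference."""
    logits = [a + sum(b * v for b, v in zip(row, x))
              for a, row in zip(intercepts, slopes)]
    logits.append(0.0)  # reference class
    m = max(logits)     # subtract the max for numerical stability
    expd = [math.exp(l - m) for l in logits]
    total = sum(expd)
    return [e / total for e in expd]

def mean_probabilities(intercepts, slopes, data, index, value):
    """Set covariate `index` to `value` for everyone, keep each person's other
    covariates, and average the individual class probabilities."""
    totals = [0.0] * (len(intercepts) + 1)
    for x in data:
        x_set = list(x)
        x_set[index] = value
        for k, p in enumerate(class_probabilities(intercepts, slopes, x_set)):
            totals[k] += p
    return [t / len(data) for t in totals]

def standardized_risk_ratio(intercepts, slopes, data, index, target_class):
    """Ratio of the averaged probabilities with the covariate set to 1 versus 0."""
    p1 = mean_probabilities(intercepts, slopes, data, index, 1)
    p0 = mean_probabilities(intercepts, slopes, data, index, 0)
    return p1[target_class] / p0[target_class]

# Made-up example: 3 classes, 2 binary covariates, 5 individuals.
intercepts = [0.2, -0.3]
slopes = [[0.8, 0.5], [-0.4, 0.1]]
data = [(0, 0), (1, 0), (0, 1), (1, 1), (1, 1)]

p_if_0 = mean_probabilities(intercepts, slopes, data, 0, 0)  # step (1)
p_if_1 = mean_probabilities(intercepts, slopes, data, 0, 1)  # step (2)
rr = standardized_risk_ratio(intercepts, slopes, data, 0, 0)  # step (3)
print(p_if_0, p_if_1, rr)
```

Calling `standardized_risk_ratio` once per set of bootstrap coefficients, if such sets were available, would give a bootstrap distribution for the risk ratio.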

Bengt O. Muthen posted on Wednesday, September 18, 2013 - 3:50 pm

No, Mplus doesn't save the bootstrap values. A simple approach would be to use the constant odds ratio property of multinomial logistic regression: the odds ratio for a covariate is the same whatever values the other covariates take, so it already controls for them.

Trang Q. Nguyen posted on Wednesday, September 18, 2013 - 7:47 pm

Thank you. Could you direct me to some reading to learn about this constant odds ratio feature of the multinomial logistic regression? I am only familiar with constant odds ratio in the ordered logistic regression for an ordinal dependent variable (assuming constant odds ratio regardless of threshold).

With multinomial logistic regression, I am only aware of odds ratio of being in one class as opposed to a reference class. With number of classes > 2, I find this odds ratio hard to understand (and easy to misunderstand) because it ignores what is happening with the classes outside the pair.

Thanks much!

Bengt O. Muthen posted on Thursday, September 19, 2013 - 9:28 am

The constant odds ratio property of logistic regression is not specific to ordered polytomous responses; it is already present in the binary outcome case. See the Topic 2 handout from our video courses, where slides 39-40 show this property. Slides 58-59 show the multinomial model for nominal outcomes, and you can see that for 2 categories it is the same as regular logistic regression for a binary outcome. Put the two sets of slides together and you can show that the property also holds for the nominal case. Not sure I can point to a book on it.
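
Editorial illustration: a short derivation of the property for the nominal case. The notation (intercepts a_k, slopes b_kj, reference class K with a_K = 0 and b_Kj = 0) is an assumption made for this sketch and is not taken from the handout.

```latex
% Multinomial logit with class K as the reference (a_K = 0, b_{Kj} = 0)
P(C = k \mid x) = \frac{\exp\big(a_k + \sum_{j} b_{kj} x_j\big)}
                       {\sum_{m=1}^{K} \exp\big(a_m + \sum_{j} b_{mj} x_j\big)}

% Odds of class k against the reference class: the denominators cancel
\frac{P(C = k \mid x)}{P(C = K \mid x)} = \exp\Big(a_k + \sum_{j} b_{kj} x_j\Big)

% Odds ratio for x_1 = 1 versus x_1 = 0 at the same x_2, \dots, x_n
\mathrm{OR}_k = \frac{\exp\big(a_k + b_{k1} + \sum_{j \ge 2} b_{kj} x_j\big)}
                     {\exp\big(a_k + \sum_{j \ge 2} b_{kj} x_j\big)} = \exp(b_{k1})
```

The terms in x_2, ..., x_n cancel, so the odds ratio does not depend on the other covariates. With K = 2 the first line reduces to ordinary binary logistic regression.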

Trang Q. Nguyen posted on Friday, September 20, 2013 - 8:40 am

Oh I see what you mean. The OR is constant conditioning on the other covariates, regardless of the pattern of the other covariates.

What I struggle with is the unwieldy interpretation of the OR in multinomial regression. If we have three classes C=1,2,3 regressed on a binary X1=0,1 and several other predictors X2...Xn, the regression coefficients are log ORs of one class as opposed to the reference class, e.g.,

log((P(C=1|X1=1,X2...Xn)/P(C=3|X1=1,X2...Xn))/(P(C=1|X1=0,X2...Xn)/P(C=3|X1=0,X2...Xn)))

For inference, most of the time this quantity is not immediately useful, as we also need to know how many people are in class 2 for each level of X1. That's why I am converting the coefficients to probabilities:

P(C=1|X1=1,X2...Xn)
P(C=1|X1=0,X2...Xn)
etc.

and calculating risk ratios to compare the cases of X1=1 to X1=0:

P(C=1|X1=1,X2...Xn)/P(C=1|X1=0,X2...Xn)
P(C=1 or C=2|X1=1,X2...Xn)/P(C=1 or C=2|X1=0,X2...Xn)

Unlike the ORs, these probabilities and RRs (and their statistical significance) are not constant, as they depend on the values of X2...Xn. Are there conventions regarding which values of X2...Xn to use?

I understand this goes beyond the scope of Mplus Discussion, but would really appreciate any advice you have.
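
Editorial illustration: a small Python sketch of the contrast raised in this post, using made-up coefficients for three classes (class 3 as reference), a binary X1, and a single other covariate X2. The value 0.4 stands in for a hypothetical mean of X2. None of this is Mplus output.

```python
import math

def class_probabilities(intercepts, slopes, x):
    """Multinomial logit probabilities; the last class is the reference."""
    logits = [a + sum(b * v for b, v in zip(row, x))
              for a, row in zip(intercepts, slopes)]
    logits.append(0.0)  # reference class
    total = sum(math.exp(l) for l in logits)
    return [math.exp(l) / total for l in logits]

# Made-up coefficients: rows are classes 1 and 2, columns are X1 and X2.
intercepts = [0.2, -0.3]
slopes = [[0.8, 0.5], [-0.4, 0.1]]

def odds_ratio_class1_vs_3(x2):
    """OR of class 1 versus class 3, comparing X1=1 with X1=0 at a given X2."""
    p1 = class_probabilities(intercepts, slopes, [1, x2])
    p0 = class_probabilities(intercepts, slopes, [0, x2])
    return (p1[0] / p1[2]) / (p0[0] / p0[2])

def risk_ratio_class1(x2):
    """P(C=1 | X1=1, X2) / P(C=1 | X1=0, X2)."""
    p1 = class_probabilities(intercepts, slopes, [1, x2])
    p0 = class_probabilities(intercepts, slopes, [0, x2])
    return p1[0] / p0[0]

def risk_ratio_class1_or_2(x2):
    """P(C=1 or C=2 | X1=1, X2) / P(C=1 or C=2 | X1=0, X2)."""
    p1 = class_probabilities(intercepts, slopes, [1, x2])
    p0 = class_probabilities(intercepts, slopes, [0, x2])
    return (p1[0] + p1[1]) / (p0[0] + p0[1])

# X2 = 0, a hypothetical mean of 0.4, and X2 = 1.
results = {x2: (odds_ratio_class1_vs_3(x2),
                risk_ratio_class1(x2),
                risk_ratio_class1_or_2(x2))
           for x2 in (0.0, 0.4, 1.0)}
for x2, (odds_ratio, rr1, rr12) in results.items():
    print(x2, round(odds_ratio, 4), round(rr1, 4), round(rr12, 4))
```

The odds ratio equals exp(0.8) at every value of X2, whereas both risk ratios change with X2, so any reported risk ratio has to state the covariate values it was evaluated at.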

Bengt O. Muthen posted on Friday, September 20, 2013 - 5:51 pm

I would think one typically conditions on the means of X2...Xn. I don't know if there are conventions in epidemiology - perhaps there is an EPINET discussion list?

Trang Q. Nguyen posted on Saturday, September 21, 2013 - 12:36 pm

Thank you. I'll use the means and perhaps the most common pattern(s). Thanks much for your help!