I am regressing a latent class variable (5 classes) on several nominal covariates (each coded as dummy variables), using the manual 3-step method. I am using NEW parameters in MODEL CONSTRAINT to get class probabilities conditional on the levels of each covariate (setting the others to their means), and, based on these probabilities, to calculate some risk ratios (e.g., the ratio of class 1 probabilities comparing rural to urban areas). I have three questions:
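For concreteness, the computation described above can be sketched outside Mplus. This Python sketch uses made-up intercepts and slopes (3 classes rather than 5, for brevity), with a rural/urban dummy as the covariate of interest and one other covariate held at its made-up mean:

```python
import numpy as np

def class_probs(intercepts, coefs, x):
    """Multinomial-logit class probabilities: softmax of the class logits.
    The reference class has zero intercept and slopes (last row here)."""
    logits = intercepts + coefs @ np.asarray(x)
    e = np.exp(logits - logits.max())   # subtract max for numerical stability
    return e / e.sum()

# Hypothetical parameters: 3 classes (class 3 = reference); covariates are a
# rural dummy and one other covariate held at its (made-up) mean
intercepts = np.array([0.4, -0.2, 0.0])
coefs = np.array([[0.8, 0.1],
                  [-0.5, 0.3],
                  [0.0, 0.0]])
other_mean = 0.47

p_rural = class_probs(intercepts, coefs, [1.0, other_mean])
p_urban = class_probs(intercepts, coefs, [0.0, other_mean])
rr_class1 = p_rural[0] / p_urban[0]   # risk ratio for class 1, rural vs urban
```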
1) For these quantities, can I trust the SEs that are estimated? Or is it preferable to bootstrap for SEs and CIs?
2) If bootstrapping, is it preferable to use BOOTSTRAP or BCBOOTSTRAP?
3) How is the bootstrapping done? Is it drawing multiple samples with replacement from the data and estimating the model for each sample? Or is it drawing from the model-estimated distributions of the key parameters? I suspect the former, because the regression coefficient SEs change between no bootstrapping and bootstrapping; but I am unsure, because although bootstrapping takes a long time, it does not take 1000 times longer when I specify BOOTSTRAP = 1000.
May I ask for further clarification on point 2? Your answer suggests that CINTERVAL(BOOTSTRAP) gives symmetric CIs and CINTERVAL(BCBOOTSTRAP) gives non-symmetric CIs. But when I specified CINTERVAL(BOOTSTRAP), the confidence intervals were not symmetric, which makes me think they may not be in the form of estimate +/- SE*multiplier. Are they quantile intervals?
Also, what does BCBOOTSTRAP do that is different from BOOTSTRAP?
If this is too much to explain, please kindly suggest what I should read to understand these methods.
If you say BOOTSTRAP in the ANALYSIS command, you get bootstrapped SEs, and the confidence intervals can use those SEs in a symmetric fashion, like regular ML SEs. But if you also say CINTERVAL(BOOTSTRAP) in the OUTPUT command, you get the percentiles of the bootstrap distribution as the confidence interval, and the interval is therefore non-symmetric. This is described on page 727 of the V7 UG. Some authors recommend BCBOOTSTRAP.
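To see the difference between the two interval types, here is a generic (non-Mplus) Python sketch: case resampling with replacement, then a symmetric SE-based interval versus a percentile interval. The data, statistic, and replication count are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=200)   # made-up, skewed data

def statistic(sample):
    return np.log(sample.mean())              # a deliberately non-symmetric statistic

est = statistic(data)

# Nonparametric bootstrap: resample cases with replacement, re-estimate each time
boot = np.array([statistic(rng.choice(data, size=data.size, replace=True))
                 for _ in range(2000)])

# Symmetric interval: point estimate +/- 1.96 * bootstrap SE
se = boot.std(ddof=1)
ci_sym = (est - 1.96 * se, est + 1.96 * se)

# Percentile interval: empirical 2.5% and 97.5% quantiles of the bootstrap draws
ci_pct = (np.quantile(boot, 0.025), np.quantile(boot, 0.975))
```

The bias-corrected bootstrap (CINTERVAL(BCBOOTSTRAP)) further adjusts the percentile endpoints, roughly using the proportion of bootstrap estimates that fall below the point estimate, which matters when the bootstrap distribution is not centered on the estimate.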
May I ask a related question: Is there a way to save the intercepts and regression coefficients from all the bootstrap samples?
I am interested in risk ratios associated with each covariate while controlling for the others. Instead of setting all the other covariates at their means (or any other arbitrary pattern), I am thinking it would be better to use all the individual data. Say, for a covariate with two levels 0 and 1, I would like to: (1) assign value 0 to everyone while keeping their own data for the other covariates, use the model-estimated intercepts and coefficients to calculate all individuals' class probabilities, and take the mean for each class probability; then (2) assign value 1, calculate the individual class probabilities, and take the mean; and then (3) calculate risk ratios using these mean probabilities. To get bootstrap confidence intervals for these risk ratios, I would need to do this using the intercepts and regression coefficients from all the bootstrap samples.
Does this make sense to you, and is there a way to save all these intercepts and coefficients?
If not, do you have advice for how else I could achieve this? Or another way to summarize the effect of one covariate adjusting for the others?
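The averaging scheme in steps (1)-(3), sometimes called "recycled predictions" or average marginal predicted probabilities, can be sketched in Python as follows. The parameter values and data are hypothetical; in practice the intercepts and slopes would come from each bootstrap sample's estimates:

```python
import numpy as np

def class_probs_matrix(intercepts, coefs, X):
    """Class probabilities for every row of X (n x p) under a multinomial logit."""
    logits = intercepts + X @ coefs.T            # n x K
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def marginal_rr(intercepts, coefs, X, j):
    """Average marginal risk ratios for binary covariate j: set column j to 1
    for everyone (keeping their own other covariates), average the class
    probabilities, repeat with 0, and take the class-by-class ratio."""
    X1, X0 = X.copy(), X.copy()
    X1[:, j], X0[:, j] = 1.0, 0.0
    p1 = class_probs_matrix(intercepts, coefs, X1).mean(axis=0)
    p0 = class_probs_matrix(intercepts, coefs, X0).mean(axis=0)
    return p1 / p0

# Hypothetical 3-class, 2-covariate example (class 3 = reference)
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(500, 2)).astype(float)
intercepts = np.array([0.4, -0.2, 0.0])
coefs = np.array([[0.8, 0.1], [-0.5, 0.3], [0.0, 0.0]])
rr = marginal_rr(intercepts, coefs, X, j=0)
```

To bootstrap these risk ratios, the `marginal_rr` call would be repeated with the parameter estimates saved from each bootstrap sample, and the interval taken from the quantiles of the resulting draws.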
Thank you. Could you direct me to some reading to learn about this constant odds ratio feature of multinomial logistic regression? I am only familiar with the constant odds ratio in ordered logistic regression for an ordinal dependent variable (where the odds ratio is assumed constant across thresholds).
With multinomial logistic regression, I am only aware of odds ratio of being in one class as opposed to a reference class. With number of classes > 2, I find this odds ratio hard to understand (and easy to misunderstand) because it ignores what is happening with the classes outside the pair.
The constant odds ratio property of logistic regression does not have to do with ordered polytomous response but is present already in the binary outcome case. See our Topic 2 handout from our video courses, where slides 39-40 show this property. On slides 58-59 you see the multinomial model for nominal outcomes and you see that for 2 categories it is the same as regular logistic regression for a binary outcome. Put the two sets of slides together and you can show that the property holds also for the nominal case. Not sure I can point to a book on it.
Oh I see what you mean. The OR is constant conditioning on the other covariates, regardless of the pattern of the other covariates.
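A quick numeric check of that property (Python, hypothetical parameter values): the class-vs-reference odds ratio comparing X1=1 to X1=0 comes out the same at any value of X2, and equals exp of the X1 coefficient for that class:

```python
import numpy as np

def class_probs(intercepts, coefs, x):
    """Softmax class probabilities for one covariate pattern."""
    logits = intercepts + coefs @ np.asarray(x)
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Hypothetical 3-class model; class 3 is the reference (zero row)
intercepts = np.array([0.4, -0.2, 0.0])
coefs = np.array([[0.8, 0.1], [-0.5, 0.3], [0.0, 0.0]])

def odds_ratio(c, x2):
    """OR of class c vs the reference class, comparing X1=1 to X1=0, at X2=x2."""
    p1 = class_probs(intercepts, coefs, [1.0, x2])
    p0 = class_probs(intercepts, coefs, [0.0, x2])
    return (p1[c] / p1[2]) / (p0[c] / p0[2])

or_a = odds_ratio(0, x2=-1.3)
or_b = odds_ratio(0, x2=2.6)   # same OR despite a different X2
```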
What I struggle with is the unwieldy interpretation of the OR in multinomial regression. If we have three classes C=1,2,3 regressed on a binary X1=0,1 and several other predictors X2...Xn, the regression coefficients are log ORs of one class as opposed to the reference class, e.g., with C=1 as the reference class, the coefficient of X1 in the class-2 equation is

log{ [P(C=2|X1=1,X2...Xn)/P(C=1|X1=1,X2...Xn)] / [P(C=2|X1=0,X2...Xn)/P(C=1|X1=0,X2...Xn)] }
For inference, most of the time this quantity is not immediately useful, as we also need to know how many people are in class 2 for each level of X1. That's why I am converting the coefficients to probabilities:

P(C=1|X1=1,X2...Xn)
P(C=1|X1=0,X2...Xn)
etc.
and calculating risk ratios to compare the cases of X1=1 to X1=0:
P(C=1|X1=1,X2...Xn) / P(C=1|X1=0,X2...Xn)
P(C=1 or C=2|X1=1,X2...Xn) / P(C=1 or C=2|X1=0,X2...Xn)
Unlike the ORs, these probabilities and RRs (and their statistical significance) are not constant, as they depend on the values of X2...Xn. Are there conventions regarding which values of X2...Xn to use?
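To illustrate the non-constancy just described, a small numeric sketch (Python, hypothetical parameter values) shows the class-1 risk ratio shifting as X2 moves:

```python
import numpy as np

def class_probs(intercepts, coefs, x):
    """Softmax class probabilities for one covariate pattern."""
    logits = intercepts + coefs @ np.asarray(x)
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Hypothetical 3-class model; class 3 is the reference (zero row)
intercepts = np.array([0.4, -0.2, 0.0])
coefs = np.array([[0.8, 0.1], [-0.5, 0.3], [0.0, 0.0]])

def risk_ratio(c, x2):
    """RR of class c comparing X1=1 to X1=0, holding X2 at x2."""
    p1 = class_probs(intercepts, coefs, [1.0, x2])
    p0 = class_probs(intercepts, coefs, [0.0, x2])
    return p1[c] / p0[c]

rr_a = risk_ratio(0, x2=-1.3)
rr_b = risk_ratio(0, x2=2.6)   # a different X2 gives a different risk ratio
```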
I understand this goes beyond the scope of Mplus Discussion, but would really appreciate any advice you have.