I have two questions. First, how can I use these results to plot the probability of category membership given a certain value of 'PARTNER'? Second, is it possible to calculate a fit statistic for my model or, in terms of the outcome measure, the percentage of accurate classification?
Many thanks Peter Mulhall
bmuthen posted on Saturday, November 09, 2002 - 12:23 pm
For the first question, please see the Mplus User's Guide, Technical Appendix 1, bottom of page 342. You may also check the Agresti pages mentioned here. You want to compute probabilities as a function of Partner values, but because you have several other covariates in addition to Partner, you have to decide at which values of the other covariates you want to do this, e.g. for males, for a specific treatment, for average Units, etc.
For the second question, yes, this should be possible - please refer to the Hosmer-Lemeshow logistic regression book in the Reference section of the Mplus web site.
Thank you for your prompt reply. I did look at the manual, but for the benefit of someone with a phobia of formulae, would you mind spelling out as linear equations how to calculate the probabilities for the three categories based on one variable (say PARTNER) and the values above?
Many, many thanks Peter Mulhall
bmuthen posted on Monday, November 11, 2002 - 4:46 pm
Although this goes a little beyond what we typically do, I think this is a phobia we can easily help with. Look at equation (27), page 342. Beta2 corresponds to your highest threshold, which looks like it is 0.320 from your message above, except that you have to switch the sign, so Beta2 = -0.320. (Similarly, Beta1 is the negative of your lowest threshold, -1.648, so Beta1 = 1.648.) If you had only Partner among your predictors and got the slope coefficient -0.372, this would be the value to use for Beta in (27), multiplying by the Partner value (x). Using a hand calculator, (27) then gives you the probability for the highest category as a function of different Partner (x) values.
Note also that there are good introductory books on this. If Hosmer-Lemeshow is not suitable, Agresti has 2 books - one that is called An Introduction to Categorical Data Analysis.
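The hand calculation described in this reply can be sketched in a few lines of Python. This is only an illustration: the thread does not reproduce equation (27), so the sketch assumes it has the usual logistic form 1 / (1 + exp(-(Beta2 + Beta*x))). The function name prob_highest is hypothetical, and the numbers are the threshold and slope quoted above for the single-predictor case.

```python
import math

# Values quoted in the thread; the single-predictor setup is the reply's own simplification.
TAU_HIGH = 0.320   # highest threshold
SLOPE = -0.372     # slope coefficient for PARTNER

def prob_highest(x, tau_high=TAU_HIGH, slope=SLOPE):
    """P(highest category | x), ASSUMING equation (27) is the standard
    logistic form 1 / (1 + exp(-(beta2 + beta * x)))."""
    beta2 = -tau_high  # sign switch described in the reply
    return 1.0 / (1.0 + math.exp(-(beta2 + slope * x)))

if __name__ == "__main__":
    # Tabulate the probability over a few illustrative PARTNER values.
    for x in (0, 1, 2, 5):
        print(x, round(prob_highest(x), 3))
```

Because the slope is negative, the probability of the highest category falls as PARTNER increases. With other covariates in the model, their contributions at chosen values would have to be added inside the exponent, as noted earlier in the thread.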
Anonymous posted on Tuesday, November 12, 2002 - 12:49 pm
Following this thread, would one be right in saying then that the probability of being in the middle category is P(u=1 or 2 | x) - P(u=2 | x), and that the probability of being in the lowest category is 1 - P(u=1 or 2 | x)?
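The decomposition asked about here can be checked numerically. The sketch below is a minimal illustration that assumes the same logistic form as the earlier reply and uses the two thresholds (-1.648 and 0.320) and the slope (-0.372) quoted in the thread, with categories coded 0, 1, 2. The function names are hypothetical.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(x, tau_low=-1.648, tau_high=0.320, slope=-0.372):
    """Return (P(u=0|x), P(u=1|x), P(u=2|x)) for a three-category
    ordered logit, assuming PARTNER is the only predictor."""
    p_top = logistic(-tau_high + slope * x)         # P(u = 2 | x)
    p_mid_or_top = logistic(-tau_low + slope * x)   # P(u = 1 or 2 | x)
    p_mid = p_mid_or_top - p_top                    # P(u = 1 | x)
    p_low = 1.0 - p_mid_or_top                      # P(u = 0 | x)
    return p_low, p_mid, p_top

if __name__ == "__main__":
    for x in range(0, 6):
        print(x, [round(p, 3) for p in category_probs(x)])
```

The three values always sum to one, so plotting them against a range of PARTNER values gives the category-membership curves asked about at the start of the thread.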
bmuthen posted on Tuesday, November 12, 2002 - 2:57 pm
Yes, a proportional odds model is used. There is no automatic test of the assumption. I think it should be possible to do such testing via the Mplus multinomial logistic regression for unordered categorical response, using Model test (Wald test) and Model constraint features (see the Version 4 User's Guide on the Mplus web site).
socrates posted on Tuesday, November 14, 2006 - 10:39 am
Dear Dr. Muthén
In order to compare two sets of different predictors, I ran two ordinal logistic regression models with the same criterion (positive, neutral and negative) but different predictors. How can I now decide which set of predictors is the better one? To my knowledge, a likelihood ratio test is not appropriate because the models are not nested (they contain different predictors). However, can I rely on the BIC and if yes, is there a significance test to compare the two BIC values (like, e.g., the Lo-Mendell-Rubin test in GMM)?
I estimated a single factor model for ordinal observed variables (six ordinal categories) using ML: CATEGORICAL ARE y1-y4; MODEL: f by y1-y4; [f@0]; My understanding is that the thresholds in this model are the cumulative log odds for y <= category j. I thought that the predicted probability for category 1 for y1 should then be exp(y1$1)/(1+exp(y1$1)), and for category 2 it should be exp(y1$2)/(1+exp(y1$2)) minus the predicted probability for category 1, and so on. However, this does not correspond to the estimated proportions in the residual output for the univariate distributions. Would you mind indicating what is wrong with my calculations and how to calculate the predicted probabilities from the estimated thresholds in a latent variable model?
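One possible reason for the mismatch, offered here as an assumption rather than as an answer given in the thread: with a logit link, exp(tau)/(1+exp(tau)) is the cumulative probability conditional on the factor being exactly zero, whereas the estimated univariate proportions average over the whole factor distribution. The sketch below approximates that average with a simple trapezoidal rule. The threshold, loading, and factor variance are hypothetical illustration values, not estimates from the poster's model, and the function names are made up for this example.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def marginal_cum_prob(tau, loading, factor_var, n_points=2001, width=8.0):
    """Approximate P(y <= j) = integral of logistic(tau - loading * f)
    times the N(0, factor_var) density of f, using a trapezoidal rule
    over +/- `width` standard deviations of the factor."""
    sd = math.sqrt(factor_var)
    lo, hi = -width * sd, width * sd
    step = (hi - lo) / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        f = lo + i * step
        density = math.exp(-0.5 * (f / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_points - 1) else 1.0  # trapezoid end points
        total += weight * logistic(tau - loading * f) * density * step
    return total

if __name__ == "__main__":
    tau, loading, factor_var = 1.0, 1.0, 4.0   # hypothetical values
    print("conditional on f = 0:", round(logistic(tau), 3))
    print("averaged over f     :", round(marginal_cum_prob(tau, loading, factor_var), 3))
```

Category probabilities would then be obtained by differencing the marginal cumulative probabilities for adjacent thresholds, in the same way the poster differenced the conditional ones.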
I recently ran a path model which contained both a categorical dependent variable and two categorical mediator variables. I have now received some comments back from a journal asking me to run a model with logit coefficients instead of the default probit coefficients. My analysis includes a sampling weight and I have found that specifying method = MLR will run the model and give me logit coefficients in the output (although I am not completely sure how this differs from method = ML). However, I want to know how I should interpret these coefficients, in particular for the mediating categorical variables in the model. Can I assume that the coefficient is the logit coefficient, that the thresholds in the output are the intercepts for each of the different categories of the outcome, and that the odds are proportional? I hope that question makes sense.
The estimates from ML and MLR are the same. Only the standard errors and fit statistics differ. MLR has robust standard errors. All regression coefficients, for both final and mediating variables, are logistic regression coefficients. A threshold is the negative value of an intercept. Yes, the odds are proportional.
Maximum likelihood estimation in general is not defined with weights, for either parameter estimates or standard errors. The MLR standard error computations can include weights, and with weights the parameter estimates are pseudo maximum likelihood estimates.
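Two statements in this reply, that a threshold is the negative of an intercept and that the odds are proportional, can be illustrated with a short sketch. The threshold and slope values used in the example are hypothetical, and the function names are made up for illustration. The point is only that the odds ratio for a one-unit change in a predictor equals exp(slope) at every cutpoint.

```python
import math

def odds_above(threshold, slope, x):
    """Odds of falling above a cutpoint in an ordered logit:
    P(u > k | x) / P(u <= k | x). The intercept is -threshold."""
    intercept = -threshold
    p_above = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
    return p_above / (1.0 - p_above)

def odds_ratio_per_unit(threshold, slope, x=0.0):
    """Odds ratio for a one-unit increase in x at a given cutpoint."""
    return odds_above(threshold, slope, x + 1.0) / odds_above(threshold, slope, x)

if __name__ == "__main__":
    slope = 0.5                       # hypothetical logit coefficient
    for threshold in (-1.0, 0.7):     # hypothetical thresholds
        print(threshold, round(odds_ratio_per_unit(threshold, slope), 4))
    print("exp(slope) =", round(math.exp(slope), 4))
```

The same odds ratio appears for both thresholds, which is what "proportional odds" means in practice.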
I am a new user of Mplus and have some questions on how to run ordinal logistic regression models when also testing mediation or moderation. We have 4 categorical independent variables on the pubertal and psycho-social timing of adolescents and made 8 dummies (late or early timing) to include them in the analysis. Furthermore, there is one continuous independent variable in the analysis (alcohol-specific rules set by parents). The outcome variable is alcohol use, defined by: non-drinkers, drinkers, bingers. We want to test in 2 separate models whether the effect of the timing measures (dummies) on alcohol use is mediated by alcohol-specific rules set by parents, or whether there are any interaction effects between the timing variables and alcohol-specific rules set by parents regarding alcohol use. In the mediation model we used a bootstrap with the WLSMV estimator. In the moderation model we used MLR.
- The ordinal outcome variable is skewed to the right. Is there something I should do to overcome the problem of skewness?
- Which estimators are most appropriate for these 2 analyses, and which output would you recommend asking for? Sample size = 1893.
- In the mediation model, I do get significant indirect results for the dummy variables; however, the direct model results of some dummies on alcohol use are not significant. Does this mean that there can't be mediation for these variables, because there is no direct effect?
I still have one more question regarding the mediation and moderation analyses I described in my previous question. I used the MLR estimator for the moderation analysis. Is this the best estimator to use in this analysis or are there better ones I could use? Furthermore, which fit indices should I at least report in these mediation and moderation analyses?
Your estimator choice will be determined in most cases by the analysis. MLR is a good choice. However, if you want to estimate indirect effects you need to use WLSMV. All fit indices that are available will be given as the default.
I am estimating a logistic regression with a number of predictors. Because Mplus deletes cases with missing X variables, I am calling in variances of some of the predictors. I also have a quadratic term. However, when I try to call in the variance of the quadratic term, the model no longer converges. Is there some reason this cannot be done?
Linda, thank you for a quick answer. I am estimating STI infection risk from number of partners. I have both the number of partners and number squared to see if the relationship is linear or quadratic. When I call in variance of all but the quadratic term, the quadratic term is significant, but I am losing N because it has missing values. Calling for variances of all of the predictors (including the quadratic term) results in non-convergence.
Thank you. I have sent my particular question to the support staff. I also have a broader question regarding calling in variances. You say that, if I bring in variances, it should be for all of the predictors. What about dichotomous predictors like gender (especially when they do not have any missing values)?
I am having difficulty running a logistic regression on my ordinal data (25 items loading onto 5 factors). My outcome variables are dichotomous (7) and continuous (1). The data is weighted so I am using the MLR estimator. This is what I am trying to model:
bcon BY BCON1 BCON2 BCON3 BCON4 BCON5;
bhyp BY BHYP1 BHYP2 BHYP3 BHYP4 BHYP5;
bemo BY BEMO1 BEMO2 BEMO3 BEMO4 BEMO5;
bpeer BY BPEER1 BPEER2 BPEER3 BPEER4 BPEER5;
bpro BY BPRO1 BPRO2 BPRO3 BPRO4 BPRO5;
But I get this fatal error message: THERE IS NOT ENOUGH MEMORY SPACE … THE ANALYSIS REQUIRES 5 DIMENSIONS OF INTEGRATION RESULTING IN A TOTAL OF 0.75938E+06 INTEGRATION POINTS … YOU CAN TRY TO REDUCE THE NUMBER OF DIMENSIONS OF INTEGRATION OR THE NUMBER OF INTEGRATION POINTS OR USE INTEGRATION=MONTECARLO WITH FEWER NUMBER OF INTEGRATION POINTS… So I tried it with the MONTECARLO simulation and got this error: THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-ZERO DERIVATIVE OF THE OBSERVED-DATA LOGLIKELIHOOD. THE MCONVERGENCE CRITERION OF THE EM ALGORITHM IS NOT FULFILLED. CHECK YOUR STARTING VALUES OR INCREASE THE NUMBER OF MITERATIONS. ESTIMATES CANNOT BE TRUSTED. THE LOGLIKELIHOOD DERIVATIVE FOR PARAMETER 70 IS 0.14864617D+02. Can you shed any light on this? Thanks in advance.
With categorical factor indicators and maximum likelihood estimation, numerical integration is required, with each factor requiring one dimension of integration. You have five which is computationally demanding. You can try INTEGRATION=MONTECARLO (5000). For categorical factor indicators and many factors, WLSMV may be a better choice because numerical integration is not required.
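The size of the integration grid in the error message can be reproduced with simple arithmetic. The sketch below assumes a rectangular grid with the same number of points in every dimension. The figure of 15 points per dimension is an inference from the reported total (0.75938E+06), not something stated in the thread, and the function name is hypothetical.

```python
def total_integration_points(points_per_dimension, dimensions):
    """Number of grid points when every dimension of integration uses
    the same number of points (a full rectangular grid)."""
    return points_per_dimension ** dimensions

if __name__ == "__main__":
    # 15 points per dimension is ASSUMED; it matches the total in the error message.
    full_grid = total_integration_points(15, 5)
    print(full_grid)          # 759375, i.e. about 0.75938E+06
    # Monte Carlo integration with 5000 points, as suggested in the reply.
    print(full_grid / 5000)   # roughly how many times smaller the Monte Carlo grid is
```

The grid grows exponentially with the number of factors, which is why each added factor with categorical indicators makes maximum likelihood estimation so much more demanding.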