Predicted value of HLM ordinal logist...
Mplus Discussion > Categorical Data Modeling >
 Jessica Li posted on Thursday, July 03, 2014 - 11:06 am
How do I get or calculate the predicted value of the outcome in a multilevel ordinal logistic regression, even outside Mplus? Where are the intercepts?

               Estimate       S.E.  Est./S.E.    P-Value
Within Level
    C1           -0.011      0.014     -0.816      0.414
    C2           -0.063      0.099     -0.635      0.525
    C3            0.051      0.117      0.436      0.663
    C4            0.113      0.045      2.498      0.012
    C5           -0.019      0.007     -2.568      0.010
    C6            0.009      0.013      0.673      0.501
    C7            0.034      0.052      0.657      0.511
    C8           -0.124      0.091     -1.364      0.173
    R1           -0.382      0.067     -5.693      0.000
    R2           -0.652      0.121     -5.405      0.000
    R3           -0.236      0.065     -3.604      0.000
    U1           -0.194      0.074     -2.615      0.009
    U2           -0.195      0.056     -3.477      0.001
    U3           -0.139      0.100     -1.397      0.162
    U4            0.271      0.094      2.883      0.004
Between Level
    O$1          -4.734      0.673     -7.034      0.000
    O$2          -2.945      0.668     -4.406      0.000
 Jessica Li posted on Thursday, July 03, 2014 - 11:07 am
========Here is my input=============
TITLE: Org model
DATA: FILE ="\Desktop\g07023.csv";
VARIABLE: NAMES are sid c1 c2 c3 c4 c5 c6 c7 c8 r1 r2 r3 u1 u2 u3 u4 u o;
categorical are o;
Missing are all (9999);
USEVARIABLES are sid c1 c2 c3 c4 c5 c6 c7 c8 r1 r2 r3 u1 u2 u3 u4 o;
WITHIN are c1 c2 c3 c4 c5 c6 c7 c8 r1 r2 r3 u1 u2 u3 u4;
CLUSTER are sid;
ANALYSIS: TYPE = TWOLEVEL;
ESTIMATOR = ML;
MODEL: %WITHIN%
o ON c1 c2 c3 c4 c5 c6 c7 c8 r1 r2 r3 u1 u2 u3 u4;
 Jessica Li posted on Thursday, July 03, 2014 - 11:09 am
I should clarify. I was trying to get the predicted value (either categorical or continuous is fine) of the outcome for every case/observation.

 Bengt O. Muthen posted on Thursday, July 03, 2014 - 5:03 pm
If you have a binary DV, the intercept is the negative of the threshold. You have an ordinal DV with 3 categories and therefore two thresholds. You can compute the predicted probability for different random intercept values as shown on slide 66 of our Topic 7 handout, where slides 60-66 deal with understanding two-level logistic regression. See the handout and video on our website.

We ask that you limit postings to one window.
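A minimal Python sketch of how the two thresholds turn into category probabilities. It assumes the cumulative logit form P(O ≤ k | x) = 1/(1 + exp(−(τ_k − η))), where η is the linear predictor (x'β plus the cluster's random intercept); with a single threshold this reduces to the binary case, where the intercept is −τ. The thresholds are the O$1 and O$2 estimates from the output at the top of the thread; η = 0 and the binary threshold 1.5 are illustrative choices only.

```python
import math


def logistic(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_logit_probs(eta, thresholds):
    """Category probabilities for a cumulative logit with P(O <= k) = logistic(tau_k - eta).

    thresholds must be increasing; returns len(thresholds) + 1 probabilities.
    """
    cum = [logistic(tau - eta) for tau in thresholds] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


# O$1 and O$2 from the Mplus output (three ordered categories).
TAUS = [-4.734, -2.945]

# eta = 0: covariates at zero and random intercept at its mean of zero (illustrative).
print([round(p, 4) for p in ordinal_logit_probs(0.0, TAUS)])

# Binary special case: P(u = 1) = logistic(-tau + eta), i.e. the intercept is -tau.
tau_binary = 1.5  # hypothetical threshold
print(ordinal_logit_probs(0.0, [tau_binary])[1], logistic(-tau_binary))
```

Raising η (a larger x'β or a larger random intercept) shifts probability toward the higher categories.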
 ljc posted on Monday, September 29, 2014 - 7:07 am
Slide 66 of Topic 7 only shows the patterns or cluster sizes. Am I looking in the wrong place?
 Bengt O. Muthen posted on Monday, September 29, 2014 - 10:29 am
Slide 66 refers to the Larsen-Merlo article - this is a good one to study. The slide looks like:

Understanding The Between-Level Intercept
Intra-class correlation
ICC = 0.807/(π²/3 + 0.807) = 0.20
Odds ratios
Larsen & Merlo (2005). Appropriate assessment of neighborhood effects on individual health: Integrating random and fixed effects in multilevel logistic regression. American Journal of Epidemiology, 161, 81-88.
Larsen proposes MOR:
"Consider two persons with the same covariates, chosen randomly from two different clusters. The MOR is the median odds ratio between the person of higher propensity and the person of lower propensity."
MOR = exp(√(2σ²) × Φ⁻¹(0.75))
In the current example, ICC = 0.20, MOR = 2.36
Compare β0j = -1 SD and β0j = +1 SD from the mean: for males at the aggression mean, the probability varies from 0.14 to 0.50
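The ICC and MOR on the slide, and the ±1 SD comparison, can be checked with a few lines of Python. This sketch assumes the level-1 logistic variance π²/3 and the between-level intercept variance σ² = 0.807 shown on the slide. The logit at the mean of the random intercept (eta_mean below) is a hypothetical input, since the aggression mean is not given here. Moving the random intercept ±1 SD means adding ±√σ² to that logit before converting it to a probability.

```python
import math
from statistics import NormalDist


def icc_logit(sigma2):
    """Latent-response ICC for a two-level logit: sigma2 / (pi^2/3 + sigma2)."""
    return sigma2 / (math.pi ** 2 / 3 + sigma2)


def median_odds_ratio(sigma2):
    """Larsen & Merlo (2005): MOR = exp(sqrt(2 * sigma2) * Phi^-1(0.75))."""
    return math.exp(math.sqrt(2 * sigma2) * NormalDist().inv_cdf(0.75))


def prob_at_sd_offset(eta_mean, sigma2, k_sd):
    """P(u = 1) with the random intercept k_sd standard deviations from its mean."""
    z = eta_mean + k_sd * math.sqrt(sigma2)
    return 1.0 / (1.0 + math.exp(-z))


SIGMA2 = 0.807  # between-level intercept variance from the slide
print(round(icc_logit(SIGMA2), 2), round(median_odds_ratio(SIGMA2), 2))

eta_mean = -0.5  # hypothetical logit at the mean of the random intercept
print(round(prob_at_sd_offset(eta_mean, SIGMA2, -1), 2),
      round(prob_at_sd_offset(eta_mean, SIGMA2, +1), 2))
```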
 ljc posted on Monday, September 29, 2014 - 11:38 am
Sorry, I hate to be dense, but I don't understand the &# notation.

I think your last sentence has the answer I am looking for, namely the formula for the predicted value for each cluster.
I think I am supposed to add (or subtract) the standard deviation to something, but I am not sure what.

Just as a note, I only have a random intercept in my particular example.
 Bengt O. Muthen posted on Monday, September 29, 2014 - 2:00 pm
The text got garbled when copying from the PPT pdf - check the handout instead.
 ljc posted on Monday, September 29, 2014 - 2:54 pm
I found it. It is slide 58 in the version that is on the Mplus homepage.

But it still doesn't help me with probabilities for specific clusters. It just helps me get a range.

I can get cluster-specific predictions easily with SAS, but SAS deletes cases with missing x and Mplus doesn't.

I hope you will consider adding predicted values to your SAVE command in the future. Thanks.
 S.Arunachalam posted on Monday, September 29, 2014 - 9:01 pm
Respected Prof. Muthen, I have a similar request:

For estimator ML or MLR, how do I get:
1.) Predicted values (y-hat) and
2.) Residuals (resid)

For example, I have a latent variable (LV) Y with three indicators, which is a dependent variable in the model. I want the residual of Y after it is predicted by another latent independent variable X, which also has three indicators, i.e.:
Y by y1 y2 y3;
X by x1 x2 x3;
Y on X;

Incidentally, I found a way to get this if X and Y are observed rather than latent variables: from the scatter plot, use save plot data.

However, this is not available for latent variables. Please help. (In Stata we get this using the predict command for regressions without latent variables.)
 Bengt O. Muthen posted on Tuesday, September 30, 2014 - 8:54 am
Answer to ljc:

Perhaps what you are asking for is answered by getting factor scores for the cluster effects, that is, the random intercepts. You get these with SAVE = FSCORES. Then you plug them into the formula.

Regarding the slide number, I am looking at the 3/29/11 Topic 7 handout at our usual site.
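A minimal Python sketch of the plug-in step for an ordinal logit like the one at the top of this thread. It assumes you already have, for each case, the fixed part of the linear predictor x'β (from the within-level slopes) and the random-intercept factor score for that case's cluster from SAVE = FSCORES. The cluster IDs, scores, and x'β values below are hypothetical, and reading the actual SAVEDATA file is not shown. For each case it returns the category probabilities, the expected category (a continuous prediction, with categories numbered 1 to 3 by assumption), and the most likely category.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(eta, thresholds):
    """Cumulative logit: P(O <= k) = logistic(tau_k - eta)."""
    cum = [logistic(t - eta) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


def predict_case(xbeta, cluster_score, thresholds):
    """Return (probabilities, expected category, modal category), categories numbered 1..K."""
    probs = category_probs(xbeta + cluster_score, thresholds)
    expected = sum((k + 1) * p for k, p in enumerate(probs))
    modal = max(range(len(probs)), key=lambda k: probs[k]) + 1
    return probs, expected, modal


TAUS = [-4.734, -2.945]             # O$1, O$2 from the output
fscores = {101: -0.40, 102: 0.25}   # hypothetical random-intercept factor scores by sid
cases = [(101, -1.2), (102, 0.3)]   # hypothetical (sid, x'beta) for two cases

for sid, xbeta in cases:
    probs, expected, modal = predict_case(xbeta, fscores[sid], TAUS)
    print(sid, [round(p, 3) for p in probs], round(expected, 3), modal)
```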
 Tihomir Asparouhov posted on Tuesday, September 30, 2014 - 10:00 am
Answer to S.Arunachalam

You can get estimates (posterior means) of Y and X using

savedata: file is 1.dat; save=FSCORES;

The residuals in the Y ON X regression can be computed manually: take the estimated coefficient beta from that regression and the estimates of Y and X, and compute the residual Y - beta*X.
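A minimal Python sketch of that manual step. It assumes the factor means are zero (so the regression has no intercept) and that the factor-score estimates of Y and X have already been read from the saved file; the scores and the slope are hypothetical. Residuals built from factor scores are estimates of the structural residuals, not the residuals themselves.

```python
def structural_residuals(y_scores, x_scores, beta):
    """Residuals Y - beta*X from factor-score estimates of Y and X (no intercept)."""
    if len(y_scores) != len(x_scores):
        raise ValueError("y_scores and x_scores must have the same length")
    return [y - beta * x for y, x in zip(y_scores, x_scores)]


# Hypothetical factor scores (one per person) and estimated slope for Y ON X.
y_fs = [0.52, -0.31, 1.10]
x_fs = [0.40, -0.10, 0.85]
beta_hat = 0.75

print([round(r, 3) for r in structural_residuals(y_fs, x_fs, beta_hat)])
```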
 Lindsey M. King posted on Wednesday, January 27, 2016 - 3:06 pm
I am trying to calculate predicted probabilities for a multilevel mediation model with a 4-category ordinal DV. The mediation pathway is not significant, so I am only interested in the level-2 direct effect of x on y. Using MODEL CONSTRAINT and the mean value of x yields implausible results. The probabilities do all add to 1, but the distribution is extremely unlikely. Below is a pared-down version of my model. Am I using the correct equation given the model?

WITHIN = [level-1 IVs] ;
CLUSTER = country ;
DEFINE: x = log(x) ;

MODEL:
%WITHIN%
y ON [individual-level variables] ;
m ON [individual-level variables] ;
%BETWEEN%
m ON x (a) ;
y ON m (b) ;
y ON x (coef) ;
[y$1] (tau1) ;
[y$2] (tau2) ;
[y$3] (tau3) ;

MODEL CONSTRAINT:
NEW(indb p1 p2 p3 p4) ;
indb=a*b ;
! 2.125 is the mean value of x used for the predictions
p1 = phi(tau1 - 2.125*coef) ;
p2 = phi(tau2 - 2.125*coef) - phi(tau1 - 2.125*coef) ;
p3 = phi(tau3 - 2.125*coef) - phi(tau2 - 2.125*coef) ;
p4 = phi(-tau3 + 2.125*coef) ;
 Tihomir Asparouhov posted on Wednesday, January 27, 2016 - 6:02 pm
Your problem might be solved simply by centering all covariates.

The main problem is that when you say "predicted probabilities", you have to clarify whether these are conditional or unconditional probabilities.

The probit regression gives you P(Y|X). To get the unconditional probabilities P(Y), you have to do something like this:

p1 = phi((tau1 - Mean(X)*Beta)/sqrt(1+Beta*Var(X)*Beta^T+VBY)) ;
p2 = phi((tau2 -Mean(X)*Beta)/sqrt(1+Beta*Var(X)*Beta^T+VBY)) -
phi((tau1 -Mean(X)*Beta)/sqrt(1+Beta*Var(X)*Beta^T+VBY)) ;


where VBY is the between-level variance of Y and Beta*Var(X)*Beta^T is the total variance of the Beta*X predictor. I notice that you skipped the within-level model completely (but you shouldn't in general). Skipping it means conditioning on all within-level X being zero, which might be inappropriate or irrelevant.

Even the above approach assumes that X is normally distributed; it may be best to average P(Y|X) over all X in your data set. We now offer that for single-level models via the individual predicted values (see Web Note #20), but not yet for two-level models.
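For a single predictor, the closed-form unconditional probability above and the "average P(Y|X) over the data" alternative can be compared in a short Python sketch. It assumes a probit link, a scalar Beta, and a between-level variance VBY. Integrating the random intercept out of P(Y|X) analytically gives Φ((τ − xβ)/√(1 + VBY)), which is then averaged over the x values. The x values are simulated as normal here with hypothetical parameters, so the two answers should agree; with non-normal X in a real data set, only the averaged version avoids the normality assumption.

```python
import math
import random
from statistics import NormalDist, fmean

PHI = NormalDist().cdf


def unconditional_cum_closed_form(tau, mean_x, beta, var_x, vby):
    """P(Y* < tau) assuming X ~ Normal(mean_x, var_x) and a N(0, vby) random intercept."""
    return PHI((tau - mean_x * beta) / math.sqrt(1.0 + beta * var_x * beta + vby))


def unconditional_cum_averaged(tau, xs, beta, vby):
    """Average over observed xs of P(Y* < tau | x), with the random intercept integrated out."""
    sd = math.sqrt(1.0 + vby)
    return fmean(PHI((tau - x * beta) / sd) for x in xs)


# Hypothetical values for illustration.
tau1, beta, vby = 0.5, 0.8, 0.3
rng = random.Random(7)
xs = [rng.gauss(2.0, 0.5) for _ in range(20000)]  # X ~ Normal(mean 2, SD 0.5)

print(round(unconditional_cum_closed_form(tau1, 2.0, beta, 0.25, vby), 3),
      round(unconditional_cum_averaged(tau1, xs, beta, vby), 3))
```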
 Lindsey M. King posted on Thursday, January 28, 2016 - 5:50 pm
Thank you for the quick reply.

1) Grand-mean centering mostly worked, and the betas were similar in both models, which is a good sign. I did not grand-mean center the mediator, because doing so switched the sign of the level-2 coefficient. There is no meaningful zero value for the mediator (it is on a 1-5 scale), so it is difficult to tell whether the raw or centered scores are correct. Would grand-mean centering a mediator cause problems with the partitioning of within and between variance, or do you think the raw scores are the more likely problem?

2) Just to verify some figures in the denominator:
VBY is the residual variance of y and Beta^T is the squared posterior s.d.?

(All level-1 variables are dummy-coded such that the combined omitted categories represent the benchmark person. I omitted it because I didn't want to clutter the page.)
 Tihomir Asparouhov posted on Friday, January 29, 2016 - 8:14 am
1) You would have to use the above formula that I gave.

2) Beta^T is the regression coefficient transposed.
 Lindsey M. King posted on Saturday, January 30, 2016 - 1:13 pm
I now understand the formula; I forgot that SEM must be thought of in terms of matrices. While I understand this theoretically, I'm still at a loss as to which specific numbers to use from the output. Beta*Beta^T suggests Beta is squared to make it conformable with the rest of the equation, but I'm unsure whether Var(X) is the residual variance or some other number. Based on page 46 of the Topic 2 handout, it does seem to be the residual variance of Beta. Do I understand this correctly?
 Tihomir Asparouhov posted on Monday, February 01, 2016 - 12:49 pm
We don't actually compute Var(X) when X is a covariate. You can compute it separately: it is the variance-covariance matrix of all predictors, not a residual variance. However, I would still recommend that you figure out how to compute P(Y|X) for one observation. That involves no Var(X); it uses the observed values of M and X and the estimated factor score for the random intercept.

p1 = phi(tau1 - b*m - X*Beta - YB);
p2 = phi(tau2 - b*m - X*Beta - YB) - phi(tau1 - b*m - X*Beta - YB);

where YB is the posterior mean of the random intercept.
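The same per-observation calculation for all four categories of an ordinal probit, as a Python sketch. It assumes the linear predictor b*m + X*Beta + YB from the formulas above, with YB the posterior mean of the random intercept; every number in the example is hypothetical. The last category is one minus the cumulative probability at the highest threshold.

```python
from statistics import NormalDist

PHI = NormalDist().cdf


def probit_case_probs(taus, b, m, x, beta, yb):
    """P(Y = k | m, x, YB) for an ordinal probit with linear predictor b*m + x'beta + YB.

    taus: increasing thresholds; x and beta: equal-length lists of values and slopes.
    """
    eta = b * m + sum(xi * bi for xi, bi in zip(x, beta)) + yb
    cum = [PHI(t - eta) for t in taus] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


# Hypothetical estimates and one hypothetical observation (4 categories, 3 thresholds).
probs = probit_case_probs(taus=[-0.8, 0.2, 1.1], b=0.3, m=3.0,
                          x=[2.125, 1.0], beta=[-0.2, 0.4], yb=0.15)
print([round(p, 3) for p in probs])
```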
 Jacqueline Sims posted on Wednesday, March 20, 2019 - 9:20 pm
Like the original poster, I am trying to calculate predicted values in a multilevel ordinal logistic regression. I have watched the video for Topic 7 (which was very helpful) and have been studying the slides. I follow the lesson very clearly, but I am lost on where the values of .14 and .50 come from for values of aggression. They are simply 1 SD below and above the mean, which looks like it would then be 32? I must be missing something, because I do not see where the threshold values come in when calculating P(uij = 1 | xij) with the formula at the top left of slide 61. Thank you so much for your response and for all of these very helpful resources.
 Bengt O. Muthen posted on Friday, March 22, 2019 - 2:32 pm
The values you refer to are on slide 66. The formula for the probability is on slide 61. I think I just plugged in

beta_0j = -threshold (threshold= 2.981)
beta_1j = 0.060*aggress (to which you have to add 1.071 for Male)

I don't recall what the aggress mean or SD values were.

So the probability was evaluated at the zero mean of the random intercept beta_0j. If you do not condition on a random intercept value, you have to use numerical integration to get the probability, because the normally distributed intercept does not combine in closed form with a logistic probability link the way it does with a probit link.
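A short Python sketch of that numerical integration for the binary two-level logit. It assumes a normal random intercept with variance σ² (0.807 in the slide excerpt earlier in the thread) and treats eta, the fixed part of the linear predictor (for example −2.981 + 1.071 + 0.060·aggress for a male), as an input, since the aggression values are not given here; the eta used below is hypothetical. A simple midpoint rule is used; any standard quadrature would do. Passing the normal CDF instead of the logistic shows the contrast with probit, where the result has the closed form Φ(eta/√(1 + σ²)).

```python
import math
from statistics import NormalDist


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def marginal_prob(eta, sigma2, cdf=logistic, n_points=2001, width=8.0):
    """P(u = 1) averaged over a N(0, sigma2) random intercept b: integral of cdf(eta + b) * density(b).

    Uses the midpoint rule on [-width*SD, width*SD].
    """
    if sigma2 <= 0:
        return cdf(eta)
    sd = math.sqrt(sigma2)
    lo = -width * sd
    h = 2 * width * sd / n_points
    total = 0.0
    for i in range(n_points):
        b = lo + (i + 0.5) * h
        density = math.exp(-0.5 * (b / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        total += cdf(eta + b) * density * h
    return total


sigma2 = 0.807
eta = -1.0  # hypothetical fixed part of the linear predictor
print(round(logistic(eta), 3), round(marginal_prob(eta, sigma2), 3))

# Probit check: the integral has the closed form Phi(eta / sqrt(1 + sigma2)).
phi = NormalDist().cdf
print(round(marginal_prob(eta, sigma2, cdf=phi), 5), round(phi(eta / math.sqrt(1 + sigma2)), 5))
```

The marginal (population-averaged) probability is pulled toward 0.5 relative to the probability at the mean random intercept, which is why the two should not be confused.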