R-Square
Mplus Discussion > Multilevel Data/Complex Sample >
 Tom Munk posted on Thursday, November 17, 2005 - 12:05 pm
I've tried (and failed) to replicate MPLUS R-square values.

For example, in a two-level model with no predictors, the variances were 898(within) and 372(between). By adding a set of predictors at each level, I obtain variances of 844(w) and 99(b). These are reductions of 6%(w) and 73.4%(b). I would expect MPLUS's R-sq values to closely match these, but they don't. They are 22.5%(w) and 17.6%(b).

How are the MPLUS values calculated? Can they be interpreted as the fraction of the (within- or between-) variance explained by the predictors?
 Linda K. Muthen posted on Thursday, November 17, 2005 - 2:24 pm
R-square is variance explained divided by total variance. I would need to see what you are basing your numbers on to see what you are doing. Please send your input, data, output, and license number to support@statmodel.com.
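The two notions of "variance explained" being compared in this exchange can be sketched numerically. The null-model and residual variances below are the ones quoted in the question; the fitted-model total variance is a hypothetical value chosen purely for illustration, since it is not given in the post.

```python
# Two ways of quantifying "variance explained" in a multilevel model.
# Null-model and residual variances are from the post above; the
# fitted-model total variance (1089) is hypothetical.

def proportional_reduction(null_var, resid_var):
    """HLM-style pseudo-R2: reduction relative to the null (empty) model."""
    return (null_var - resid_var) / null_var

def mplus_r2(total_var, resid_var):
    """Mplus-style R2: explained / total variance within the fitted model."""
    return (total_var - resid_var) / total_var

# Within level: null variance 898, residual 844 after adding predictors.
print(round(proportional_reduction(898, 844), 3))   # -> 0.06
# Between level: null variance 372, residual 99.
print(round(proportional_reduction(372, 99), 3))    # -> 0.734

# Mplus instead divides by the estimated total variance of the fitted
# model, e.g. if the model-implied within-level total were 1089:
print(round(mplus_r2(1089, 844), 3))                # -> 0.225
```

The two definitions only coincide when the fitted model's estimated total variance equals the null model's variance, which in general it does not.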
 Huang Xiaorui posted on Thursday, October 19, 2006 - 7:45 am
I am doing a two-level model analysis. I can get an R-square for each regression, but I can't get an R-square for the whole model. How can it be displayed in Mplus? Thanks.
Within Level

Observed
Variable R-Square

PMIMP 0.491
TMIMP 0.092
PMEFF 0.720
TMEFF 0.072
PMR 0.483
TMR 0.057
MI 0.124

Latent
Variable R-Square

EFFORT 0.345
IMPUL 0.072

Between Level
 Linda K. Muthen posted on Thursday, October 19, 2006 - 9:59 am
There is not an R-square for the full model.
 liesbeth mercken posted on Tuesday, September 18, 2007 - 1:28 am
Hi,
If you have the following model
Y1 ON Y2;
Y2 ON X1 X2 X3;
X1 WITH X2 X3;
X2 WITH X3;

and you have to add X4 to the model in the following way:
Y1 ON Y2 X4;
Y2 ON X1 X2 X3;
Y2 WITH X4;
X1 WITH X2 X3;
X2 WITH X3;

is it possible that R-square for Y2 decreases after you enter X4 into the model? And is this the case because you added a correlation with Y2?

thank you,
Liesbeth
 Linda K. Muthen posted on Tuesday, September 18, 2007 - 4:34 am
I'm not sure what might happen in this case. In regression, the means, variances, and covariances of the covariates are not model parameters. You should not mention x1, x2, x3, and x4 in the MODEL command except on the right-hand side of ON. Also, why do you have y1 ON x2 and not y2 ON x4?
 liesbeth mercken posted on Tuesday, September 18, 2007 - 7:14 am
Hi Linda,
Thank you for your reaction.
Y2 on X2 is theoretically and empirically supported. The association between Y2 and X4 is completely new.
X4 is a new concept that is measured at exactly the same time as Y2, so I can't use an ON statement, as that would imply causality in a specific direction.
But... I actually tried what you suggested, Y2 ON X4,
and, to be sure, also X4 ON Y2.
Both models give significant and good results (both are also interpretable)...
So I thought it was best to take the safe road and keep it in the model as a correlation. Would you suggest choosing a direction?
thank you
 Bengt O. Muthen posted on Thursday, September 20, 2007 - 10:57 am
When you added x4 to the model you let it correlate only with y2 and y1, not x1-x3. This could cause a misspecified model with too high a chi-square test of fit, in which case the estimates should not be interpreted. Perhaps you want to regress x4 on x1-x3. Also, you might have a misfitting model due to y1 not having direct influence from x1-x3.

In any case, even if you have well-fitting models, with these left-out effects I would say that the R-square behavior is not predictable.
 Marko Neumann posted on Friday, February 15, 2008 - 3:57 am
To my knowledge, the basis for the R² computation is the variance components of level 1 and level 2 resulting from the empty model. However, the R² calculated by Mplus does not seem to be identical to the R² values that result from using the empty model. Mplus seems rather to refer to the estimated within- and between-level parts of the (co)variance given in the sample statistics output of the specified model, which differ considerably from the variance components of the empty model, depending on the specified individual-level predictors.

example:
The dependent variable is school achievement. The empty model provides a within variance of 413 and a between variance of 775, and thus an intraclass correlation of .652 (which is quite high, but normal in the tracked German secondary school system). Inclusion of level 1 predictor variables (e.g., prior knowledge, SES) results in an increased within variance (918) and a decreased between-level variance (52) in the samp stats output. The residual variances in the specified model were 185 for the within and 52 (that is, equal to the between variance part of the samp stats output) for the between level. R² for within was .80 (resulting from (918-185)/918 = .798). On the contrary, computation of R² on the basis of the variance components of the empty model (like in HLM) would result in (413-185)/413 = .55.

End of Part I
 Marko Neumann posted on Friday, February 15, 2008 - 3:59 am
Part II

Including level 2 predictor variables in the next step (e.g., school track, mean achievement) results in the following variance components: within = 825 and between = 72; the residual variances were: within = 183 and between = 27. In Mplus, within R² was .78 (resulting from (825-183)/825 = .778) and between R² was .63 (resulting from (72-27)/72 = .625).
Computation of R² on the basis of the variance components of the empty model would result in (413-183)/413 = .56 for within and (775-27)/775 = .97 for between.

To sum up, I have three main questions: Why do the variance components at level 2 in the samp stats output decrease after inclusion of predictor variables? Why do they differ so much from the variance components in the empty model? How should I interpret them? I would be very grateful if you could answer my questions.
 Linda K. Muthen posted on Friday, February 15, 2008 - 9:30 am
The only reason that I can think of for sample statistics to differ is that the sample size has changed. If this is not the reason, please send your inputs, data, outputs, and license number to support@statmodel.com.
 Marko Neumann posted on Monday, February 18, 2008 - 3:02 am
To my knowledge, the basis for the R² computation is the variance components of level 1 and level 2 resulting from the empty model. However, the R² calculated by Mplus does not seem to be identical to the R² values that result from using the empty model. Mplus seems rather to refer to the estimated within- and between-level parts of the (co)variance given in the sample statistics output of the specified model, which can differ considerably from the variance components of the empty model, depending on the specified individual-level predictors.

I have three main questions: Why do the variance components at level 2 in the samp stats output decrease after inclusion of predictor variables? Why do they sometimes differ so much from the variance components in the empty model? How should I interpret them? I would be very grateful if you could answer my questions.
 Bengt O. Muthen posted on Monday, February 18, 2008 - 9:13 am
Mplus computes R-square as the ratio of estimated explained variance in the numerator and estimated total variance in the denominator, where this is done for each level separately.

R-square on level 2 refers to proportion variance explained in random intercepts. With different level 1 predictors, the intercept definition changes (value of y when all x's = 0), so the level 2 R-square can therefore change.
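Bengt's level-by-level definition can be checked against the variance estimates Marko quoted above. A minimal arithmetic sketch (Python is used here only to carry out the division; all numbers are taken from the posts above):

```python
# Level-specific R2 as Mplus computes it (explained / total variance
# within the fitted model) vs. the empty-model ("HLM-style") version,
# using the variance estimates reported in the posts above.

def r2(total, residual):
    return (total - residual) / total

# Within level, model with level-1 and level-2 predictors:
print(round(r2(825, 183), 3))   # -> 0.778  (Mplus: fitted-model total)
print(round(r2(413, 183), 3))   # -> 0.557  (empty-model total, as in HLM)

# Between level (random intercepts):
print(round(r2(72, 27), 3))     # -> 0.625  (Mplus)
print(round(r2(775, 27), 3))    # -> 0.965  (empty-model total)
```

The divergence comes entirely from the denominator: Mplus uses the fitted model's estimated total variance on each level, not the empty model's.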
 Stephan posted on Sunday, March 02, 2008 - 11:32 pm
Hello,
In one of the web courses Bengt mentioned that users can have good overall model fit (including GFI) but rather poor R-square values for each of their latent variables. Could you please give me any further hints? Under which circumstances might that occur? Thanks a lot for your help. -Stephen
 Linda K. Muthen posted on Monday, March 03, 2008 - 7:34 am
R-square is not a test of model fit. It describes the variance of the dependent variable explained by a set of covariates. A model can fit the data well even when the set of covariates does not explain the variance in the dependent variable.
 Stephan posted on Monday, March 03, 2008 - 3:08 pm
Dear Linda,
thanks for the response. But isn't R-square for latent variables 1 minus the residual variance?
 Stephan posted on Monday, March 03, 2008 - 3:12 pm
sorry, didn't see the output with the standardized residual variances. I am o.k. now.
Thanks for your help. Best, Stephen
 Darla Kendzor posted on Tuesday, May 12, 2009 - 2:31 pm
I have submitted a manuscript for publication that describes a model which includes 4 continuous latent variables (IV/mediators) and 3 dichotomous observed outcome variables (DVs). A reviewer has asked me to include the percent of the variance accounted for by the IVs/mediators in the DVs. Mplus provides r-square values for each observed and latent variable, and I am wondering if mplus also provides an r-square for the whole model (i.e., the percent of variance in each of the outcomes [DVs] accounted for by ALL of the IVs/mediators)? If not, would it be meaningful to square the standardized total direct and indirect effect coefficient to obtain an r-square value this way? If neither of these ideas are possible, do you have any suggestions about how I might answer the reviewer's concern?
 Linda K. Muthen posted on Wednesday, May 13, 2009 - 9:52 am
I would give the R-squares for each regression. A model R-square does not make sense because the aim of the model is not to maximize variance explained but to reproduce variances and covariances. R-square is not a model fit statistic.
 Lois Downey posted on Tuesday, October 20, 2009 - 5:19 pm
Since Mplus doesn't provide a confidence interval around r-squared for clustered regression models, I've been computing the confidence intervals manually, using the point estimate and estimated standard error provided in the output. However, I'm invariably getting a lower bound that is negative. This seems counter-intuitive, given that r-squared cannot be less than zero. Are the negative values likely a result of rounding error, given that Mplus provides estimates rounded to 3 digits? (I'm multiplying the estimated standard error by 1.96 and then subtracting and adding the result to the point estimate.)

Thanks.
 Linda K. Muthen posted on Wednesday, October 21, 2009 - 9:47 am
This can happen because no restrictions are placed on the confidence interval in its computation. I don't think it is a rounding error.
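As a sketch of why the lower bound can go negative: the symmetric Wald interval below is built exactly as described in the question (point estimate ± 1.96 × SE), and nothing in that construction respects the [0, 1] range of R-square. The estimate and standard error used here are hypothetical.

```python
# Symmetric Wald confidence interval for R2, computed from the point
# estimate and standard error as in the post above. The bounds are not
# constrained to [0, 1], so a small R2 with a relatively large SE gives
# a negative lower bound. Values are hypothetical.

def wald_ci(estimate, se, z=1.96):
    return (estimate - z * se, estimate + z * se)

lo, hi = wald_ci(0.050, 0.040)
print(round(lo, 3), round(hi, 3))   # -> -0.028 0.128
```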
 Alden Gross posted on Monday, May 03, 2010 - 12:41 pm
Dear Drs. Muthen,
I'd like to report R^2 values from growth curve models for latent growth factors (intercept and slope). I understand MPLUS calculates R-square values as the (variance explained by the model) divided by (total variance) of an outcome.

I have two questions. First, what, other than stated regression paths, contribute to a latent outcome's R^2? Second, when I add demographic variables to explain the outcome, why does MPLUS' given R^2 decrease?

Here are some excerpts:
MODEL:
if sf | y1@0 y2@1 y3@2;
i1 s1 | x1v1@0 x1v2@1 x1v3@2;
i2 s2 | x2v1@0 x2v2@1 x2v3@2;
if ON i1 i2; sf ON i1 i2;

I would expect R^2 for "if" to be a sum of squared standardized regression coefficients for "if" regressed on i1 and i2. For instance, if the standardized B for if on i1=0.375 and i2=0.574, I think R^2 =0.375^2+0.574^2= 47%. However, MPLUS's given R^2 = 76%, which is correctly (1-residual/total) = (1-0.146/0.607) = 0.759.

When I then add covariates in another model (e.g., if ON age sex education;), the MPLUS-given R^2 decreases.
Thanks so much!
 Linda K. Muthen posted on Tuesday, May 04, 2010 - 9:52 am
You have forgotten the covariance. You need to add:

2*cov*b1*b2
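To illustrate the missing term: with two correlated standardized predictors, R-square is the sum of the squared coefficients plus 2*cov*b1*b2. The coefficients b1 and b2 are the ones quoted in the question above; the covariance of the two intercept factors is back-solved from the reported R-square here purely for illustration, not taken from any actual output.

```python
# R2 for a dependent variable regressed on two correlated standardized
# predictors: the squared coefficients alone omit the covariance term.
# b1 and b2 are from the post above; cov is back-solved (hypothetical).

b1, b2 = 0.375, 0.574
r2_reported = 0.759                     # 1 - 0.146/0.607 from the output

sum_of_squares = b1**2 + b2**2          # ~0.47: incomplete without cov
cov = (r2_reported - sum_of_squares) / (2 * b1 * b2)
r2_full = b1**2 + b2**2 + 2 * cov * b1 * b2
print(round(sum_of_squares, 3), round(r2_full, 3))   # -> 0.47 0.759
```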
 Brondeel Ruben posted on Saturday, December 04, 2010 - 6:39 am
Hi.

I fitted a logistic regression model with complex data. I was asked how much variance is explained by the model. So I stated 'standardized' in the output command.
1. What kind of R-square is this? Is it an adjusted version of the R-square, something like a Nagelkerke's R-square?
2. Is it meaningful in this context?
3. How does it handle the complex data structure? Or is that irrelevant for the computation of the R-square (in this case, but also for the case of a continuous dependent)

Regards,
Ruben.
 Linda K. Muthen posted on Sunday, December 05, 2010 - 11:12 am
The R-square is the variance explained in the latent response variable underlying the categorical variable. See the following book for further information:

Long, S. (1997). Regression models for categorical and limited
dependent variables. Thousand Oaks: Sage.

R-square is not affected by non-independence of observations. R-square is usually not used for logistic regression.
 jmaslow posted on Tuesday, May 10, 2011 - 7:52 am
Hello,

I am able to replicate the r square produced by Mplus in latent variable SEM models. I am attempting to extend this to SEM models containing a latent interaction calculated with XWITH.

I use the formula provided by Mooijaart & Satorra (2009) to calculate % variance explained by the main effects and the interaction term and also include 2*cov*b1*b2 and the term that multiplies the b of the interaction term by the variance and covariances of the latent variables and sum all of these terms.

However, by this method, the variance explained in the model with xwith included is much lower than the variance explained by the model with only main effects (16% versus 35%). I feel that what is missing is the covariance of the main effects with the interaction term, which are not provided by Mplus. Is there a way to get these covariances from the program, calculate them, or force them to be 0?

Thank you.
 Bengt O. Muthen posted on Tuesday, May 10, 2011 - 10:46 am
You say

"and the term that multiplies the b of the interaction term by the variance and covariances of the latent variables"

But, that term should be b*b*(V(f1)*V(f2)+[cov(f1,f2)]**2).

Then you say

"I feel that what is missing is the covariance of the main effects with the interaction term"

There are no such terms missing because all third-order moments are zero due to the normality assumption of the factors.
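Putting the pieces together, the explained variance for y = b1*f1 + b2*f2 + b3*(f1 x f2) + e with normal factors can be sketched as below. All numeric values are hypothetical; the decomposition follows the terms given in this exchange.

```python
# Explained variance with a latent interaction (XWITH-style model).
# Under normality of the factors, the interaction term f1*f2 is
# uncorrelated with the main effects (third-order moments vanish), and
# V(f1 x f2) = V(f1)*V(f2) + Cov(f1,f2)**2. Numbers are hypothetical.

def explained_variance(b1, b2, b3, v1, v2, cov):
    main = b1**2 * v1 + b2**2 * v2 + 2 * b1 * b2 * cov
    interaction = b3**2 * (v1 * v2 + cov**2)
    return main + interaction

v_expl = explained_variance(b1=0.4, b2=0.3, b3=0.2, v1=1.0, v2=1.0, cov=0.5)
resid = 0.5
print(round(v_expl / (v_expl + resid), 3))   # -> 0.457  (R2)
```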
 jmaslow posted on Wednesday, May 11, 2011 - 9:19 am
Thank you, Dr. Muthen. This is very helpful. Just to be clear, then, the main effects and interaction in a model with XWITH are completely uncorrelated with each other and can be considered independent?
 Linda K. Muthen posted on Wednesday, May 11, 2011 - 9:21 am
Yes.
 Hans Leto posted on Thursday, May 31, 2012 - 1:56 pm
Dr. Muthen.

What would be the formula to calculate R-squared for a three-way interaction?

For instance:
V(f4) = b1**2 V(f1) + b2**2 V(f2) + b3**2 V(f3) + 2*b1*b2 Cov(f1,f2) + b4**2 V(f1xf2xf3) + ...

I am not sure how to specify the Cov when I have three factors.

Thank you.
 Linda K. Muthen posted on Thursday, May 31, 2012 - 3:38 pm
R-square is 1 minus the standardized residual variance of the dependent variable.
 Hans Leto posted on Friday, June 01, 2012 - 2:52 am
But the standardized residual variance is not provided in the output with TYPE=RANDOM. How can I request or calculate it?
 Linda K. Muthen posted on Friday, June 01, 2012 - 9:55 am
See the following FAQ on the website:

Latent variable interactions
 caroline masquillier posted on Tuesday, July 17, 2012 - 5:26 am
Dear Drs. Muthen,

What kind of R-square do we get in Mplus output for a logistic regression? Is this an adjusted version of the R-square, something like a Nagelkerke's R-square?

Thank you very much
 Linda K. Muthen posted on Tuesday, July 17, 2012 - 11:01 am
The R-square is for the latent response variable. It is described in the Snijders and Bosker book Multilevel Analysis.
 May Yang posted on Wednesday, April 17, 2013 - 9:26 am
Hello,
I believe this is along the same lines as Tom Munk's posting on 11/17/05. I am confused about how R2 is calculated in a multilevel model. So R2 = variance explained / total variance. When I run a null model (without any predictors), I get a within variance of 2.234. To me, 2.234 is the total variance. When level 1 predictors are added, I get a variance estimate of 0.147. I would assume that R2 is then 0.147/2.234 = 0.066, but the output reports R2 (obtained using the standardized option) = 0.086. What am I missing here? Thank you.
 Bengt O. Muthen posted on Wednesday, April 17, 2013 - 12:23 pm
On each of the 2 levels, R-square is explained variance divided by total variance (on that level). Yes, 2.234 is the total variance on level 1, but when adding predictors the variance parameter that gets estimated is the residual variance.

If this does not explain things, please send output to support.
 Elina Dale posted on Thursday, October 10, 2013 - 4:16 pm
Dear Dr. Muthen,

As you wrote, the R-square is the variance explained in the latent response variable underlying the categorical variable. Is it a proportion?

In the Mplus CFA output, the R-square estimate is provided next to the observed factor indicators. So, if we see an R-square of 0.400 next to y1 (a factor 1 indicator), does it mean that 40% of the variance in factor 1 is explained by this categorical variable (y1)? But then there are 3 other indicators that also have similar R-square estimates, and when you add them up they are >100%.

Also, what do residual variance values (Column 6 in R-Square table) mean in this case?

I am giving an example of the output below, so that you understand what res variance and R-Sq values I am referring to.

R-SQUARE

Observed Variable (Column 1)
Estimate (Column 2)
S.E. (Column 3)
Est./S.E. (Column 4)
Two-Tailed P-Value (Column 5)
Residual Variance (Column 6)

Thank you!
 Linda K. Muthen posted on Friday, October 11, 2013 - 6:10 am
The R-square for y1 means that 40% of the variance of y1 is explained by the factor. Factor indicators are dependent variables and factors are independent variables in the factor model.

Residual variances are not model parameters for categorical variables. The values given under residual variance are computed after model estimation as remainders.
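A small sketch of the direction of this R-square: for an indicator of a single factor, R-square equals the squared standardized loading, i.e., the share of the indicator's own (latent response) variance explained by the factor, not the share of factor variance explained by the indicator. That is also why the indicators' R-squares need not sum to 100%: each is a proportion of a different variance. The loading value below is hypothetical.

```python
# R2 of a factor indicator in a standardized single-factor solution:
# the squared standardized loading. The remainder is the value shown
# in the "residual variance" column. Loading is hypothetical.

std_loading = 0.632
r2_y1 = std_loading**2          # share of y1's variance due to the factor
resid = 1 - r2_y1               # remainder, computed after estimation
print(round(r2_y1, 3), round(resid, 3))   # -> 0.399 0.601
```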
 Johnson Song posted on Thursday, March 06, 2014 - 3:18 pm
Dear Dr. Muthen,

If the statistical significance test of the between-level R^2 indicates that the R^2 is not statistically significant (R^2 = .07, p > .05), does it mean that the portion of the variance of this latent variable explained by the predictors is not different from zero, even though the regression coefficient actually is statistically significant (p = .017)?

Should I still report the significant regression coefficients?

Thank you so much for your advice!

Best,
John
 Bengt O. Muthen posted on Friday, March 07, 2014 - 4:39 pm
Q1. Yes. But note that the test of R-2 may not work as well as the test of the regression coefficient because the sampling distribution of R-2 may not be as close to normal.

Q2. I would do that. In general I don't report R-2 significance.
 Lance Rappaport posted on Wednesday, December 03, 2014 - 11:16 am
Hello.

I understand that the stdyx option is not available when using random slopes in a multilevel context. I understand that r-square at level 1 cannot be estimated as it varies as a function of the grouping variable. My question is why an r-squared value cannot be computed for a level 2 variable?

For example, I have random slopes at level 1, which predict a level 2 endogenous variable. How could I compute an r-square for this endogenous variable? R-square for this variable should not vary as a function of anything included in the model.

Sincerely,
Lance
 Bengt O. Muthen posted on Wednesday, December 03, 2014 - 4:30 pm
The R-2 for level 2 is well defined, as you say. You can express it using MODEL CONSTRAINT with model parameter labels.
 Lance Rappaport posted on Wednesday, December 03, 2014 - 5:07 pm
Thank you very much, especially for such a prompt response. Would it be okay if I asked how to use model parameter labels to capture the standardized residual variance?
 Bengt O. Muthen posted on Thursday, December 04, 2014 - 11:40 am
The general approach is shown in UG ex 5.20.