Tom Munk posted on Thursday, November 17, 2005 - 12:05 pm
I've tried (and failed) to replicate MPLUS R-square values.
For example, in a two-level model with no predictors, the variances were 898(within) and 372(between). By adding a set of predictors at each level, I obtain variances of 844(w) and 99(b). These are reductions of 6%(w) and 73.4%(b). I would expect MPLUS's R-sq values to closely match these, but they don't. They are 22.5%(w) and 17.6%(b).
How are the MPLUS values calculated? Can they be interpreted as the fraction of the (within- or between-) variance explained by the predictors?
R-square is variance explained divided by total variance. I would need to see what you are basing your numbers on to see what you are doing. Please send your input, data, output, and license number to firstname.lastname@example.org.
I'm not sure what might happen in this case. In regression, the means, variances, and covariances of the covariates are not model parameters. You should not mention x1, x2, x3, and x4 in the MODEL command except on the right-hand side of ON. Also, why do you have y1 ON x2 and not y2 ON x4?
Hi Linda, Thank you for your reaction. Y2 on X2 is theoretically and empirically supported. The association between Y2 and X4 is completely new. X4 is a new concept that is measured at the exact same time as Y2, so I can't use an ON statement, as I would then be concluding causality in a specific direction. But I actually tried what you suggested, Y2 ON X4, and to be sure also X4 ON Y2. Both models give significant and good results (both also interpretable). So I thought it was best to take the safe road and keep it in the model as a correlation. Would you suggest choosing a direction? Thank you.
When you added x4 to the model you let it correlate only with y2 and y1, not x1-x3. This could cause a misspecified model with too high chi-square test of fit, in which case the estimates should not be interpreted. Perhaps you want to regress x4 on x1-x3. Also, you might have a misfitting model due to y1 not having direct influence from x1-x3.
In any case, even if you have well-fitting models, with these left-out effects I would say that the R-square behavior is not predictable.
To my knowledge, the basis for the R² computation is the variance components of level 1 and level 2 resulting from the empty model. However, the R² calculated by Mplus does not seem to be identical to the R² values that result from using the empty model. Mplus seems rather to refer to the estimated within- and between-level parts of the (co)variance given in the sample statistics output of the specified model, which differ considerably from the variance components of the empty model, depending on the specified individual-level predictors.
Example: The dependent variable is school achievement. The empty model provides a within variance of 413 and a between variance of 775, and thus an intraclass correlation of .652 (which is quite high, but normal in the tracked German secondary school system). Inclusion of level 1 predictor variables (e.g., prior knowledge, SES) results in an increased within variance (918) and a decreased between-level variance (52) in the samp stats output. The residual variances in the specified model were 185 for the within level and 52 (that is, equal to the between variance part of the samp stats output) for the between level. R² for within was .80 (resulting from (918-185)/918 = .798). By contrast, computation of R² on the basis of the variance components of the empty model (as in HLM) would result in (413-185)/413 = .55.
Including level 2 predictor variables in the next step (e.g., school track, mean achievement) results in the following variance components: within = 825 and between = 72; the residual variances were: within = 183 and between = 27. In Mplus, within R² was .78 (resulting from (825-183)/825 = .778) and between R² was .63 (resulting from (72-27)/72 = .625). Computation of R² on the basis of the variance components of the empty model would result in (413-183)/413 = .56 for within and (775-27)/775 = .97 for between.
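The two computations contrasted in this example differ only in the denominator. A minimal Python sketch that reproduces the arithmetic from the figures quoted in the post (the function name r_square is illustrative and is not an Mplus feature):

```python
def r_square(total_var, residual_var):
    """Proportion of variance explained: (total - residual) / total."""
    return (total_var - residual_var) / total_var

# Figures quoted in the post (model with level-1 and level-2 predictors).
empty = {"within": 413.0, "between": 775.0}    # empty-model variance components
sample = {"within": 825.0, "between": 72.0}    # samp stats variances of the fitted model
residual = {"within": 183.0, "between": 27.0}  # residual variances of the fitted model

for level in ("within", "between"):
    fitted_total = r_square(sample[level], residual[level])
    empty_total = r_square(empty[level], residual[level])
    print(f"{level}: fitted-model total {fitted_total:.3f}, "
          f"empty-model total {empty_total:.3f}")
```

The residual variance is the same in both computations; only the choice of total variance changes the result.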
To sum up, I have three main questions: Why do the variance components at level 2 in the samp stats output decrease after inclusion of predictor variables? Why do they differ so much from the variance components in the empty model? How do I have to interpret them? I would be very grateful if you could answer my question.
The only reason that I can think of for sample statistics to differ is that the sample size has changed. If this is not the reason, please send your inputs, data, outputs, and license number to email@example.com.
To my knowledge, the basis for the R² computation is the variance components of level 1 and level 2 resulting from the empty model. However, the R² calculated by Mplus does not seem to be identical to the R² values that result from using the empty model. Mplus seems rather to refer to the estimated within- and between-level parts of the (co)variance given in the sample statistics output of the specified model, which can differ considerably from the variance components of the empty model, depending on the specified individual-level predictors.
I have three main questions: Why do the variance components at level 2 in the samp stats output decrease after inclusion of predictor variables? Why do they sometimes differ so much from the variance components in the empty model? How do I have to interpret them? I would be very grateful if you could answer my question.
Mplus computes R-square as the ratio of estimated explained variance in the numerator and estimated total variance in the denominator, where this is done for each level separately.
R-square on level 2 refers to proportion variance explained in random intercepts. With different level 1 predictors, the intercept definition changes (value of y when all x's = 0), so the level 2 R-square can therefore change.
Stephan posted on Sunday, March 02, 2008 - 11:32 pm
Hello, in one of the web courses Bengt mentioned that users can have good overall model fit (incl. GFI) but rather poor R-square values for each of their latent variables. Could you please give me any further hints? Under which circumstances might that occur? Thanks a lot for your help. -Stephen
R-square is not a test of model fit. It describes the variance of the dependent variable explained by a set of covariates. A model can fit the data well even when the set of covariates does not explain the variance in the dependent variable.
Stephan posted on Monday, March 03, 2008 - 3:08 pm
Dear Linda, thanks for the response. But isn't R-square for latent variables 1 minus the residual variance?
Stephan posted on Monday, March 03, 2008 - 3:12 pm
sorry, didn't see the output with the standardized residual variances. I am o.k. now. Thanks for your help. Best, Stephen
I have submitted a manuscript for publication that describes a model which includes 4 continuous latent variables (IV/mediators) and 3 dichotomous observed outcome variables (DVs). A reviewer has asked me to include the percent of the variance accounted for by the IVs/mediators in the DVs. Mplus provides r-square values for each observed and latent variable, and I am wondering if mplus also provides an r-square for the whole model (i.e., the percent of variance in each of the outcomes [DVs] accounted for by ALL of the IVs/mediators)? If not, would it be meaningful to square the standardized total direct and indirect effect coefficient to obtain an r-square value this way? If neither of these ideas are possible, do you have any suggestions about how I might answer the reviewer's concern?
I would give the R-squares for each regression. A model R-square does not make sense because the aim of the model is not to maximize variance explained but to reproduce variances and covariances. R-square is not a model fit statistic.
Lois Downey posted on Tuesday, October 20, 2009 - 5:19 pm
Since Mplus doesn't provide a confidence interval around r-squared for clustered regression models, I've been computing the confidence intervals manually, using the point estimate and estimated standard error provided in the output. However, I'm invariably getting a lower bound that is negative. This seems counter-intuitive, given that r-squared cannot be less than zero. Are the negative values likely a result of rounding error, given that Mplus provides estimates rounded to 3 digits? (I'm multiplying the estimated standard error by 1.96 and then subtracting and adding the result to the point estimate.)
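The computation described in this post can be sketched in a few lines of Python. The numbers below are hypothetical and are not taken from any output in this thread. The sketch only shows that a symmetric interval of the form estimate ± 1.96 × SE has a negative lower bound whenever the estimate is smaller than 1.96 times its standard error, because this formula does not take into account that R-square is bounded below by zero:

```python
def wald_interval(estimate, std_error, z=1.96):
    """Symmetric normal-theory interval: estimate +/- z * SE."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

# Hypothetical R-square and standard error (illustration only).
r2, se = 0.050, 0.030
lower, upper = wald_interval(r2, se)
print(f"95% interval: [{lower:.4f}, {upper:.4f}]")
print("lower bound below zero:", lower < 0)
```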
Dear Drs. Muthen, I'd like to report R^2 values from growth curve models for latent growth factors (intercept and slope). I understand MPLUS calculates R-square values as the (variance explained by the model) divided by (total variance) of an outcome.
I have two questions. First, what, other than the stated regression paths, contributes to a latent outcome's R^2? Second, when I add demographic variables to explain the outcome, why does the R^2 given by MPLUS decrease?
I would expect R^2 for "if" to be a sum of squared standardized regression coefficients for "if" regressed on i1 and i2. For instance, if the standardized B for if on i1=0.375 and i2=0.574, I think R^2 =0.375^2+0.574^2= 47%. However, MPLUS's given R^2 = 76%, which is correctly (1-residual/total) = (1-0.146/0.607) = 0.759.
When I then add covariates in another model (e.g., if ON age sex education;), the MPLUS-given R^2 decreases. Thanks so much!
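A short Python sketch of the arithmetic in this post. It assumes a linear regression of "if" on i1 and i2 only, with standardized coefficients; under that assumption R^2 = b1^2 + b2^2 + 2*b1*b2*corr(i1, i2), so the sum of squared coefficients matches R^2 only when the two predictors are uncorrelated. The implied correlation below is back-calculated from the quoted numbers and is not a value reported in the thread:

```python
b1, b2 = 0.375, 0.574              # standardized coefficients quoted in the post
r2_reported = 1 - 0.146 / 0.607    # 1 - residual variance / total variance

# Sum of squared standardized coefficients (ignores predictor correlation).
sum_of_squares = b1**2 + b2**2

# Two standardized predictors: R^2 = b1^2 + b2^2 + 2*b1*b2*corr(i1, i2).
implied_corr = (r2_reported - sum_of_squares) / (2 * b1 * b2)

print(f"sum of squares: {sum_of_squares:.3f}")
print(f"reported R^2:   {r2_reported:.3f}")
print(f"implied corr(i1, i2): {implied_corr:.3f}")
```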
I fitted a logistic regression model with complex data. I was asked how much variance is explained by the model, so I requested 'standardized' in the OUTPUT command. 1. What kind of R-square is this? Is it an adjusted version of the R-square, something like Nagelkerke's R-square? 2. Is it meaningful in this context? 3. How does it handle the complex data structure? Or is that irrelevant for the computation of the R-square (in this case, but also in the case of a continuous dependent variable)?
I am able to replicate the r square produced by Mplus in latent variable SEM models. I am attempting to extend this to SEM models containing a latent interaction calculated with XWITH.
I use the formula provided by Mooijaart & Satorra (2009) to calculate % variance explained by the main effects and the interaction term and also include 2*cov*b1*b2 and the term that multiplies the b of the interaction term by the variance and covariances of the latent variables and sum all of these terms.
However, by this method, the variance explained in the model with xwith included is much lower than the variance explained by the model with only main effects (16% versus 35%). I feel that what is missing is the covariance of the main effects with the interaction term, which are not provided by Mplus. Is there a way to get these covariances from the program, calculate them, or force them to be 0?
"and the term that multiplies the b of the interaction term by the variance and covariances of the latent variables"
But, that term should be b*b*(V(f1)*V(f2)+[cov(f1,f2)]**2).
Then you say
"I feel that what is missing is the covariance of the main effects with the interaction term"
There are no such terms missing because all third-order moments are zero due to the normality assumption of the factors.
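The decomposition described in this reply can be sketched in Python as follows. The sketch assumes the two factors have zero means and are bivariate normal, so that the covariances between the main effects and the product term vanish. All numeric values and the function name are hypothetical:

```python
def explained_variance(b1, b2, b3, v1, v2, c12):
    """Variance of b1*f1 + b2*f2 + b3*f1*f2 for zero-mean bivariate normal factors.

    Returns the main-effect part and the interaction part separately.
    """
    main = b1**2 * v1 + b2**2 * v2 + 2 * b1 * b2 * c12
    interaction = b3**2 * (v1 * v2 + c12**2)
    return main, interaction

# Hypothetical coefficients, factor variances, and factor covariance.
main, interaction = explained_variance(b1=0.4, b2=0.3, b3=0.2,
                                       v1=1.0, v2=1.0, c12=0.5)
residual_var = 0.5  # hypothetical residual variance of the outcome

total_var = main + interaction + residual_var
r_square = (main + interaction) / total_var
print(f"main: {main:.3f}, interaction: {interaction:.3f}, R-square: {r_square:.3f}")
```

Note that the total variance in the denominator also includes the interaction part, so the total is not the same as in a model with main effects only.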
jmaslow posted on Wednesday, May 11, 2011 - 9:19 am
Thank you, Dr. Muthen. This is very helpful. Just to be clear, then, the main effects and interaction in a model with XWITH are completely uncorrelated with each other and can be considered independent?
The R-square is for the latent response variable. It is described in the Snijders and Bosker book Multilevel Analysis.
May Yang posted on Wednesday, April 17, 2013 - 9:26 am
Hello, I believe this is along the same lines as Tom Munk's posting on 11/17/05. I am confused on how R2 is calculated in a multi-level model. So R2= variance explained/ total variance. When I run a null model (without any predictors), I get a within variance=2.234. 2.234 to me is the total variance. When level 1 predictors are added, I get variance estimate of 0.147. I would assume that R2 is then 0.147/2.234 = 0.066 but the output report R2 (obtained using standardized option) = 0.086. What am I missing here? Thank you.
On each of the 2 levels, R-square is explained variance divided by total variance (on that level). Yes, 2.234 is the total variance on level 1, but when adding predictors, the variance parameter that gets estimated is the residual variance.
If this does not explain things, please send output to support.
Elina Dale posted on Thursday, October 10, 2013 - 4:16 pm
Dear Dr. Muthen,
As you wrote, the R-square is the variance explained in the latent response variable underlying the categorical variable. Is it a proportion?
In MPlus CFA output, the R-Square estimate is provided next to observed factor indicators. So, if we see an R-Sq of 0.400 next to y1 (factor 1 indicator), does it mean that 40% of variance in factor 1 is explained by this categorical variable (y1)? But then there are 3 other indicators that also have a similar R-Sq estimate and so, when you add them up they are >100%.
Also, what do residual variance values (Column 6 in R-Square table) mean in this case?
I am giving an example of the output below, so that you understand what res variance and R-Sq values I am referring to.
If the statistical significance test of the between-level R^2 indicates that the R^2 is not statistically significant (R^2 = .07, p > .05), does it mean that the portion of the variance of this latent variable explained by the predictors is not different from zero, even though the regression coefficient is statistically significant (p = .017)?
Should I still report the significant regression coefficients?
I understand that the stdyx option is not available when using random slopes in a multilevel context. I understand that r-square at level 1 cannot be estimated as it varies as a function of the grouping variable. My question is why an r-squared value cannot be computed for a level 2 variable?
For example, I have random slopes at level 1, which predict a level 2 endogenous variable. How could I compute an r-square for this endogenous variable? R-square for this variable should not vary as a function of anything included in the model.
I am running a continuous-time survival analysis using the cox regression model (based on example 6.21). I included Output: stdyx and can see my standardized coefficients but the R-Square line is coming up blank. Do you know why this would be?
I'm running a very simple multiple regression model with one continuous outcome and 5 predictors. Three of the five predictors are considered covariates and not target predictors. I'm interested in the added value of the target predictors. My idea was to run two models, the first with just the three covariates predicting the outcome, and the second model with the two target predictors added. I would look at the change in R-squared. I know there are ways to test whether the R-squared increase was statistically significant, via an F-test of change. Is there any way to test that in Mplus? I've been told that the chi-square difference p-value would be equivalent. If I were to compare models using the chi-square difference test, would I set the parameters of the target predictors in my nested model to zero, and then free them up in the second model? Thanks!
No such F-test in Mplus. I would go about it the way you describe in your second to last sentence.
Dirk Pelt posted on Wednesday, October 14, 2015 - 1:16 am
Dear Bengt and Linda,
I have read this whole thread and it appears that most questions relate to the fact that there appear to be two ways of calculating R2 (please correct me if I'm wrong):
1. One based on comparison with a null model including a random intercept only. The formula is: (var of null model - var of model with predictors)/var of null model. This is the 'standard' way in multilevel models as described in Kreft and De Leeuw (1998), Snijders and Bosker etc. This formula can be applied to each level. This is all based on unstandardized coefficients/variables.
2. One that Mplus reports, which is 1-standardized residual or simply the sum of all standardized beta coefficients squared. Again separately for each level.
Which is the correct one to use? I believe that in the multilevel literature, standardizing of coefficients is always treated as something problematic. Thank you!
For each level Mplus uses the standard R-square formula:
(1) (Variance explained by covariates)/(Total DV variance)
That happens to be the same as 1 - stand'd res var.
I think the formula you mention is the same as (1) because the var of null model is the total DV variance and when you say "var of model with predictors" you may be referring to the residual variance in the model with predictors. The difference is then what I call "variance explained by covariates".
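The equivalence stated in this reply can be checked numerically. A minimal Python sketch with hypothetical variances for one level:

```python
total_var = 2.0      # hypothetical total DV variance on one level
residual_var = 1.2   # hypothetical residual variance with covariates included

# Formula (1): variance explained by covariates / total DV variance.
explained_var = total_var - residual_var
r2_ratio = explained_var / total_var

# Standardizing the DV divides every variance by the total variance.
standardized_residual_var = residual_var / total_var
r2_from_standardized = 1 - standardized_residual_var

print(r2_ratio, r2_from_standardized)
```

The null-model formula gives the same number only when the null-model variance equals the total DV variance used in the denominator here.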
li zhou posted on Thursday, December 10, 2015 - 10:37 am
Dear Dr. Muthen,
I conducted a quite simple model as follows ('pl' is the DV):
bf by bf_know bf_info bf_fami bf_expe;
bl by bl_abse bl_sale;
si by si_price si_qual si_serv;
bf_know bf_expe with bf_info bf_fami;
bl on bf;
pl on bl;
pl on bf;
pl on si;
pl on age gender gi edu shopper;
The 'STANDARDIZED MODEL RESULTS' were quite okay. However, the results of 'R-SQUARE' were as follows:
Please send output to Support along with your license number.
Rick Borst posted on Wednesday, November 02, 2016 - 3:36 am
Dear Professors Muthen,
I analyzed a model with and without a latent interaction term (as a matter of fact, it is one moderator which has an effect on two IVs). The outputs show that the R-square is higher without the interaction term than with it, although the interactions are significant. Is it possible to have significant interaction terms but a lower R-square, or is there something wrong with my input? If it is possible, does that mean that my model with interactions should be discarded despite the significant interactions?
1-2. Level-1 r-square for count DVs is not defined because count regression does not have a residual variance. For Level 2 you work with continuous random effect variables so there the usual R-square for linear regression is used.
Also, and probably more importantly, I have 6 groups dummy coded in the regressions, and I would like to report an effect size for the groups with a significant difference relative to the reference group (run multiple times to change the reference groups).
Any thoughts on how I might do this would be appreciated.
Just interpret the exponentiated coefficients. You can also express the model-estimated means for the count variable and/or consider the corresponding probability of having the count e.g. = 0. See, e.g., our new book.
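The quantities mentioned in this reply can be sketched in Python as follows. The sketch assumes a Poisson model with a log link, a single group dummy, all other covariates at zero, and no contribution from random effects. The coefficient values are hypothetical:

```python
import math

intercept = 0.50  # hypothetical log of the mean count in the reference group
b_group = 0.40    # hypothetical coefficient for one dummy-coded group

# Exponentiated coefficient: ratio of the group mean to the reference mean.
rate_ratio = math.exp(b_group)

# Model-estimated mean counts.
mean_reference = math.exp(intercept)
mean_group = math.exp(intercept + b_group)

# Poisson probability of a zero count: P(y = 0) = exp(-mean).
p_zero_reference = math.exp(-mean_reference)
p_zero_group = math.exp(-mean_group)

print(f"rate ratio: {rate_ratio:.3f}")
print(f"means: reference {mean_reference:.3f}, group {mean_group:.3f}")
print(f"P(count = 0): reference {p_zero_reference:.3f}, group {p_zero_group:.3f}")
```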