
Tom Munk posted on Thursday, November 17, 2005  12:05 pm



I've tried (and failed) to replicate Mplus R-square values. For example, in a two-level model with no predictors, the variances were 898 (within) and 372 (between). By adding a set of predictors at each level, I obtain variances of 844 (w) and 99 (b). These are reductions of 6% (w) and 73.4% (b). I would expect Mplus's R-square values to closely match these, but they don't. They are 22.5% (w) and 17.6% (b). How are the Mplus values calculated? Can they be interpreted as the fraction of the (within or between) variance explained by the predictors?
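The reductions quoted in this post can be checked with a few lines of arithmetic. This is only a minimal sketch in Python; `proportional_reduction` is a made-up helper name, and the inputs are the four variances given above.

```python
def proportional_reduction(var_empty, var_with_predictors):
    """Share of the empty-model variance that disappears once predictors are added."""
    return (var_empty - var_with_predictors) / var_empty

# Variances quoted in the post: empty model vs. model with predictors.
within = proportional_reduction(898, 844)
between = proportional_reduction(372, 99)

print(f"within: {within:.3f}, between: {between:.3f}")
```

Note that this compares two separately estimated models, which is not necessarily how a program computes R-square within a single model.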


R-square is variance explained divided by total variance. I would need to see what you are basing your numbers on to see what you are doing. Please send your input, data, output, and license number to support@statmodel.com.


I am doing a two-level model analysis. I can get R-square for each regression, but I can't get the whole model's R-square. How can it be displayed in Mplus? Thanks.

Within Level

Observed Variable   R-Square
PMIMP               0.491
TMIMP               0.092
PMEFF               0.720
TMEFF               0.072
PMR                 0.483
TMR                 0.057
MI                  0.124

Latent Variable     R-Square
EFFORT              0.345
IMPUL               0.072

Between Level


There is no R-square for the full model.


Hi, If you have the following model:

Y1 ON Y2;
Y2 ON X1 X2 X3;
X1 WITH X2 X3;
X2 WITH X3;

and you have to add X4 to the model in the following way:

Y1 ON Y2 X4;
Y2 ON X1 X2 X3;
Y2 WITH X4;
X1 WITH X2 X3;
X2 WITH X3;

is it possible that R-square for Y2 decreases after you have entered X4 in the model? And is this the case because you added a correlation with Y2? Thank you, Liesbeth


I'm not sure what might happen in this case. In regression, the means, variances, and covariances of the covariates are not model parameters. You should not mention x1, x2, x3, and x4 in the MODEL command except on the right-hand side of ON. Also, why do you have y1 ON x2 and not y2 ON x4?


Hi Linda, Thank you for your reaction. Y2 on X2 is theoretically and empirically supported. The association between Y2 and X4 is completely new. X4 is a new concept that is measured at exactly the same time as Y2, so I can't use an ON statement, as I would then be implying causality in a specific direction. But I actually tried what you suggested, Y2 ON X4, and to be sure also X4 ON Y2. Both models give significant and good results (both are also interpretable). So I thought it was best to take the safe road and keep it in the model as a correlation. Would you suggest choosing a direction? Thank you


When you added x4 to the model you let it correlate only with y2 and y1, not x1-x3. This could cause a misspecified model with too high a chi-square test of fit, in which case the estimates should not be interpreted. Perhaps you want to regress x4 on x1-x3. Also, you might have a misfitting model due to y1 not having direct influence from x1-x3. In any case, even if you have well-fitting models, with these left-out effects I would say that the R-square behavior is not predictable.


To my knowledge, the basis for the R² computation is the variance components of level 1 and level 2 resulting from the empty model. However, the R² calculated by Mplus does not seem to be identical to the R² values that result from using the empty model. Mplus seems rather to refer to the estimated within- and between-level parts of the (co)variance given in the sample statistics output of the specified model, which differ considerably from the variance components of the empty model, depending on the specified individual-level predictors.

Example: The dependent variable is school achievement. The empty model provides a within variance of 413 and a between variance of 775, and thus an intraclass correlation of .652 (which is quite high, but normal in the tracked German secondary school system). Inclusion of level 1 predictor variables (e.g., prior knowledge, SES) results in an increased within variance (918) and a decreased between-level variance (52) in the sample statistics output. The residual variances in the specified model were 185 for the within level and 52 (that is, equal to the between variance part of the sample statistics output) for the between level. R² for within was .80 (resulting from (918 - 185)/918 = .798). By contrast, computation of R² on the basis of the variance components of the empty model (as in HLM) would result in (413 - 185)/413 = .55.

End of Part I


Part II

Including level 2 predictor variables in the next step (e.g., school track, mean achievement) results in the following variance components: within = 825 and between = 72; the residual variances were: within = 183 and between = 27. In Mplus, within R² was .78 (resulting from (825 - 183)/825 = .778) and between R² was .63 (resulting from (72 - 27)/72 = .625). Computation of R² on the basis of the variance components of the empty model would result in (413 - 183)/413 = .56 for within and (775 - 27)/775 = .97 for between.

To sum up, I have three main questions: Why do the variance components at level 2 in the sample statistics output decrease after inclusion of predictor variables? Why do they differ so much from the variance components in the empty model? How do I have to interpret them? I would be very grateful if you could answer my question.
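The two ways of computing R² that this post contrasts can be placed side by side. The following Python sketch only reproduces the arithmetic from the post; the dictionary and function names are illustrative, and nothing here is taken from Mplus output beyond the numbers quoted above.

```python
def r_square(total_var, residual_var):
    """Explained variance divided by total variance, for one level."""
    return (total_var - residual_var) / total_var

# Numbers quoted in Parts I and II of the post.
empty = {"within": 413, "between": 775}        # empty-model variance components
model_total = {"within": 825, "between": 72}   # sample statistics of the Part II model
model_resid = {"within": 183, "between": 27}   # residual variances of the Part II model

# Intraclass correlation from the empty model.
icc = empty["between"] / (empty["within"] + empty["between"])
print(f"ICC: {icc:.3f}")

for level in ("within", "between"):
    model_based = r_square(model_total[level], model_resid[level])
    empty_based = r_square(empty[level], model_resid[level])
    print(f"{level}: model-based {model_based:.3f}, empty-model-based {empty_based:.3f}")
```

The only difference between the two columns is the denominator, which is why the between-level values diverge so strongly when the between variance in the sample statistics shrinks.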


The only reason that I can think of for sample statistics to differ is that the sample size has changed. If this is not the reason, please send your inputs, data, outputs, and license number to support@statmodel.com. 


To my knowledge, the basis for the R² computation is the variance components of level 1 and level 2 resulting from the empty model. However, the R² calculated by Mplus does not seem to be identical to the R² values that result from using the empty model. Mplus seems rather to refer to the estimated within- and between-level parts of the (co)variance given in the sample statistics output of the specified model, which can differ considerably from the variance components of the empty model, depending on the specified individual-level predictors. I have three main questions: Why do the variance components at level 2 in the sample statistics output decrease after inclusion of predictor variables? Why do they sometimes differ so much from the variance components in the empty model? How do I have to interpret them? I would be very grateful if you could answer my question.


Mplus computes R-square as the ratio of estimated explained variance in the numerator to estimated total variance in the denominator, and this is done for each level separately. R-square on level 2 refers to the proportion of variance explained in the random intercepts. With different level 1 predictors, the intercept definition changes (the value of y when all x's = 0), so the level 2 R-square can change as well.

Stephan posted on Sunday, March 02, 2008  11:32 pm



Hello, in one of the web courses Bengt mentioned that users can have good overall model fit (incl. GFI) but rather bad R-square values for each of their latent variables. Could you please give me any further hints? Under which circumstances might that occur? Thanks a lot for your help. Stephen


R-square is not a test of model fit. It describes the variance of the dependent variable explained by a set of covariates. A model can fit the data well even when the set of covariates does not explain the variance in the dependent variable.

Stephan posted on Monday, March 03, 2008  3:08 pm



Dear Linda, thanks for the response. But isn't R-square for latent variables 1 - residual variance?

Stephan posted on Monday, March 03, 2008  3:12 pm



sorry, didn't see the output with the standardized residual variances. I am o.k. now. Thanks for your help. Best, Stephen 


I have submitted a manuscript for publication that describes a model which includes 4 continuous latent variables (IVs/mediators) and 3 dichotomous observed outcome variables (DVs). A reviewer has asked me to include the percent of the variance in the DVs accounted for by the IVs/mediators. Mplus provides R-square values for each observed and latent variable, and I am wondering if Mplus also provides an R-square for the whole model (i.e., the percent of variance in each of the outcomes [DVs] accounted for by ALL of the IVs/mediators)? If not, would it be meaningful to square the standardized total direct and indirect effect coefficient to obtain an R-square value this way? If neither of these ideas is possible, do you have any suggestions about how I might answer the reviewer's concern?


I would give the R-squares for each regression. A model R-square does not make sense because the aim of the model is not to maximize variance explained but to reproduce variances and covariances. R-square is not a model fit statistic.

Lois Downey posted on Tuesday, October 20, 2009  5:19 pm



Since Mplus doesn't provide a confidence interval around R-squared for clustered regression models, I've been computing the confidence intervals manually, using the point estimate and estimated standard error provided in the output. However, I'm invariably getting a lower bound that is negative. This seems counterintuitive, given that R-squared cannot be less than zero. Are the negative values likely a result of rounding error, given that Mplus provides estimates rounded to 3 digits? (I'm multiplying the estimated standard error by 1.96 and then subtracting and adding the result to the point estimate.) Thanks.


This can happen because there are no restrictions put on the confidence intervals in their computation. I don't think it is a rounding error. 
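The manual computation described in the question can be sketched as follows. The estimate and standard error below are hypothetical values chosen for illustration, not numbers from any output in this thread, and `symmetric_interval` is a made-up helper name.

```python
def symmetric_interval(estimate, se, z=1.96):
    """Symmetric interval: estimate plus/minus z times the standard error.

    Nothing in this formula knows that R-square is bounded by 0 and 1.
    """
    return estimate - z * se, estimate + z * se

# Hypothetical R-square estimate and standard error (illustration only).
r2, se = 0.070, 0.045
lower, upper = symmetric_interval(r2, se)
print(f"lower = {lower:.4f}, upper = {upper:.4f}")
```

The lower bound is negative whenever the estimate is smaller than 1.96 times its standard error, whatever the rounding of the output.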


Dear Drs. Muthen, I'd like to report R^2 values from growth curve models for latent growth factors (intercept and slope). I understand Mplus calculates R-square values as the (variance explained by the model) divided by the (total variance) of an outcome. I have two questions. First, what, other than the stated regression paths, contributes to a latent outcome's R^2? Second, when I add demographic variables to explain the outcome, why does the R^2 given by Mplus decrease? Here are some excerpts:

MODEL:
if sf | y1@0 y2@1 y3@2;
i1 s1 | x1v1@0 x1v2@1 x1v3@2;
i2 s2 | x2v1@0 x2v2@1 x2v3@2;
if ON i1 i2;
sf ON i1 i2;

I would expect R^2 for "if" to be the sum of the squared standardized regression coefficients for "if" regressed on i1 and i2. For instance, if the standardized B for if on i1 = 0.375 and on i2 = 0.574, I think R^2 = 0.375^2 + 0.574^2 = 47%. However, the R^2 given by Mplus is 76%, which is correctly (1 - residual/total) = (1 - 0.146/0.607) = 0.759. When I then add covariates in another model (e.g., if ON age sex education;), the Mplus-given R^2 decreases. Thanks so much!


You have forgotten the covariance. You need to add: 2*cov*b1*b2 
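To see how much the covariance term matters, here is a minimal Python sketch using the numbers from the question. It assumes the coefficients are fully standardized, so that "cov" is the correlation between i1 and i2, and that i1 and i2 are the only predictors of "if". The implied correlation is back-solved from the quoted values, not read from any output.

```python
def r_square_two_predictors(b1, b2, corr):
    """R-square from two standardized coefficients and the predictor correlation."""
    return b1**2 + b2**2 + 2 * corr * b1 * b2

b1, b2 = 0.375, 0.574           # standardized coefficients quoted in the question
reported = 1 - 0.146 / 0.607    # 1 - residual/total, as quoted in the question

# What the sum of squares alone gives (i.e., pretending the predictors are uncorrelated).
squares_only = r_square_two_predictors(b1, b2, corr=0.0)

# Correlation between i1 and i2 that would close the gap (back-solved).
implied_corr = (reported - squares_only) / (2 * b1 * b2)

print(f"squares only: {squares_only:.3f}")
print(f"reported:     {reported:.3f}")
print(f"implied corr: {implied_corr:.3f}")
```

Under these assumptions, the gap between 47% and 76% is entirely the 2*cov*b1*b2 term.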


Hi. I fitted a logistic regression model with complex data. I was asked how much variance is explained by the model, so I specified 'standardized' in the OUTPUT command. 1. What kind of R-square is this? Is it an adjusted version of the R-square, something like Nagelkerke's R-square? 2. Is it meaningful in this context? 3. How does it handle the complex data structure? Or is that irrelevant for the computation of the R-square (in this case, but also for the case of a continuous dependent variable)? Regards, Ruben.


The R-square is the variance explained in the latent response variable underlying the categorical variable. See the following book for further information: Long, S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks: Sage. R-square is not affected by non-independence of observations. R-square is usually not used for logistic regression.

jmaslow posted on Tuesday, May 10, 2011  7:52 am



Hello, I am able to replicate the R-square produced by Mplus in latent variable SEM models. I am attempting to extend this to SEM models containing a latent interaction calculated with XWITH. I use the formula provided by Mooijaart & Satorra (2009) to calculate the % variance explained by the main effects and the interaction term. I also include 2*cov*b1*b2 and the term that multiplies the b of the interaction term by the variance and covariances of the latent variables, and I sum all of these terms. However, by this method, the variance explained in the model with XWITH included is much lower than the variance explained by the model with only main effects (16% versus 35%). I feel that what is missing is the covariance of the main effects with the interaction term, which is not provided by Mplus. Is there a way to get these covariances from the program, calculate them, or force them to be 0? Thank you.


You say "and the term that multiplies the b of the interaction term by the variance and covariances of the latent variables". But that term should be b*b*(V(f1)*V(f2)+[cov(f1,f2)]**2). Then you say "I feel that what is missing is the covariance of the main effects with the interaction term". There are no such terms missing, because all third-order moments are zero due to the normality assumption for the factors.
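The pieces named in this reply can be combined into one expression for the explained variance. The sketch below is in Python and assumes f1 and f2 have zero means and are bivariate normal, so the interaction term is uncorrelated with the main effects. All numeric values are hypothetical, and the function names are illustrative.

```python
def explained_variance_with_interaction(b1, b2, b3, v1, v2, cov12):
    """Variance of b1*f1 + b2*f2 + b3*f1*f2 for zero-mean, bivariate normal factors."""
    main = b1**2 * v1 + b2**2 * v2 + 2 * b1 * b2 * cov12
    # Interaction part: b3*b3*(V(f1)*V(f2) + [cov(f1,f2)]**2)
    interaction = b3**2 * (v1 * v2 + cov12**2)
    return main + interaction

def r_square(explained, residual_var):
    """Explained variance over total (explained plus residual) variance."""
    return explained / (explained + residual_var)

# Hypothetical parameter values (illustration only).
explained = explained_variance_with_interaction(
    b1=0.4, b2=0.3, b3=0.2, v1=1.0, v2=1.0, cov12=0.5
)
r2 = r_square(explained, residual_var=0.6)
print(f"explained = {explained:.3f}, R-square = {r2:.3f}")
```

Because no cross term between the main effects and the interaction enters, adding the interaction can only increase the explained variance in this expression.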

jmaslow posted on Wednesday, May 11, 2011  9:19 am



Thank you, Dr. Muthen. This is very helpful. Just to be clear, then, the main effects and interaction in a model with XWITH are completely uncorrelated with each other and can be considered independent? 


Yes. 

Hans Leto posted on Thursday, May 31, 2012  1:56 pm



Dr. Muthen, what would be the formula to calculate R-squared for a three-way interaction? For instance: V(f4) = b1**2 V(f1) + b2**2 V(f2) + b3**2 V(f3) + 2*f1*f2 Cov(f1,f2) + b4**2 V(f1xf2xf3) + I am not sure how to specify the Cov when I have three factors. Thank you.


R-square is 1 minus the standardized residual variance of the dependent variable.

Hans Leto posted on Friday, June 01, 2012  2:52 am



But the standardized residual variance is not provided in the output with TYPE=RANDOM? How can I request or calculate it? 


See the following FAQ on the website: Latent variable interactions 


Dear Drs. Muthen, What kind of R-square do we get in Mplus output for a logistic regression? Is this an adjusted version of the R-square, something like Nagelkerke's R-square? Thank you very much


The R-square is for the latent response variable. It is described in the Snijders and Bosker book Multilevel Analysis.

May Yang posted on Wednesday, April 17, 2013  9:26 am



Hello, I believe this is along the same lines as Tom Munk's posting on 11/17/05. I am confused about how R2 is calculated in a multilevel model. So R2 = variance explained / total variance. When I run a null model (without any predictors), I get a within variance = 2.234. To me, 2.234 is the total variance. When level 1 predictors are added, I get a variance estimate of 0.147. I would assume that R2 is then 0.147/2.234 = 0.066, but the output reports R2 (obtained using the standardized option) = 0.086. What am I missing here? Thank you.


On each of the 2 levels, R-square is explained variance divided by total variance (on that level). Yes, 2.234 is the total variance on level 1, but when adding predictors, the variance parameter that gets estimated is the residual variance. If this does not explain things, please send output to support.

Elina Dale posted on Thursday, October 10, 2013  4:16 pm



Dear Dr. Muthen, As you wrote, the R-square is the variance explained in the latent response variable underlying the categorical variable. Is it a proportion? In Mplus CFA output, the R-square estimate is provided next to the observed factor indicators. So, if we see an R-square of 0.400 next to y1 (a factor 1 indicator), does it mean that 40% of the variance in factor 1 is explained by this categorical variable (y1)? But then there are 3 other indicators that also have a similar R-square estimate, and so, when you add them up, they are >100%. Also, what do the residual variance values (Column 6 in the R-square table) mean in this case? I am giving an example of the output below, so that you understand which residual variance and R-square values I am referring to.

R-SQUARE
Observed Variable (Column 1)
Estimate (Column 2)
S.E. (Column 3)
Est./S.E. (Column 4)
Two-Tailed P-Value (Column 5)
Residual Variance (Column 6)

Thank you!


The R-square for y1 means that 40% of the variance of y1 is explained by the factor. Factor indicators are dependent variables and factors are independent variables in the factor model. Residual variances are not model parameters for categorical variables. The values given under residual variance are computed after model estimation as remainders.


Dear Dr. Muthen, If the statistical significance test of the between-level R^2 indicates that the R^2 is not statistically significant (R^2 = .07, p > .05), does it mean that the portion of the variance of this latent variable explained by the predictors is not different from zero, even though the regression coefficient is statistically significant (p = .017)? Should I still report the significant regression coefficients? Thank you so much for your advice! Best, John


Q1. Yes. But note that the test of R2 may not work as well as the test of the regression coefficient because the sampling distribution of R2 may not be as close to normal. Q2. I would do that. In general I don't report R2 significance. 


Hello. I understand that the STDYX option is not available when using random slopes in a multilevel context. I understand that R-square at level 1 cannot be estimated, as it varies as a function of the grouping variable. My question is why an R-square value cannot be computed for a level 2 variable. For example, I have random slopes at level 1, which predict a level 2 endogenous variable. How could I compute an R-square for this endogenous variable? R-square for this variable should not vary as a function of anything included in the model. Sincerely, Lance


The R2 for level 2 is well defined, as you say. You can express it using MODEL CONSTRAINT with model parameter labels.


Thank you very much, especially for such a prompt response. Would it be okay if I asked how to use model parameter labels to capture the standardized residual variance? 


The general approach is shown in UG ex 5.20. 
