Hi. I'd like to know how to obtain the proportion of variance explained for each equation in a structural equation model. Can someone fill me in? Thanks. 


If you ask for STANDARDIZED in the OUTPUT command, you will get standardized estimates and also an rsquare for each dependent variable in the model. 

Anonymous posted on Wednesday, November 28, 2001  11:49 am



I'm wondering if there's a way to compare the explained variance (Rsquare values) between different models. Can something like an Rsquare change test be used with SEM? Thanks for any suggestions. 


I am not aware of any particular literature about Rsquare change in SEM models. You may get a better response by posting this on SEMNET. 


Hi, I have run a second-order factor analysis with a Schmid-Leiman transformation. There are 20 categorical items. These load on four first-order factors and also on one general factor. The correlations between the four first-order factors are set to zero (i.e., orthogonal) and there is no path between the first-order factors and the general factor. I specify Stand in the output. My question is: how do I calculate the percentage of variance explained by the factors? I have in mind that the general factor should account for a large proportion of the variance explained and the minor factors very much less. How do I partition the variance explained from the residual variance for the 20 items that is shown in the output? Thanks in advance for any advice.

bmuthen posted on Tuesday, January 29, 2002  1:12 pm



In this type of model, the general factor influences all items and the specific factors can be seen as residual factors explaining further correlations among subsets of items. Typically, you have all factors uncorrelated. This means that computation of the percentage variance explained by a certain factor is simply using the squared loading times the factor variance. Dividing by the item variance (i.e. y* variance, which is 1 if no covariates), gives the proportion variance in the item explained by the factor in question. 
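The calculation described above can be sketched in a few lines of Python. This is only an illustration: the loadings and factor variances are made-up numbers, and the function name is hypothetical. Only the formula (squared loading times factor variance, divided by the item variance) comes from the post.

```python
def proportion_explained(loading, factor_variance, item_variance=1.0):
    """Share of one item's variance explained by one factor.

    Assumes the factors are uncorrelated, as in the model discussed above.
    item_variance defaults to 1, the y* variance when there are no covariates.
    """
    return loading ** 2 * factor_variance / item_variance


# Hypothetical estimates for a single item (not taken from the thread).
general = proportion_explained(loading=0.7, factor_variance=1.0)
specific = proportion_explained(loading=0.3, factor_variance=1.0)
residual = 1.0 - general - specific

print(round(general, 2), round(specific, 2), round(residual, 2))
```

Because the factors are uncorrelated, the two shares can simply be added; with correlated factors a cross-product term would be needed.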

Leigh Roeger posted on Wednesday, January 30, 2002  3:57 pm



Thank you for that advice. Can I follow up with two further questions? First, I am assuming that when you say squared loadings times the factor variance you mean the standardised factor loadings. Is this correct? The second question is more of a modelling question. The primary purpose of this analysis is to examine whether the general factor accounts for most of the variation in the items and the specific factors very much less so. I notice that some analysts (e.g., Gustafsson) fix the variance estimates of all the latent variables to 1. Do you see some advantage in this, or is it a hangover from early LISREL analyses which used correlation matrices? Any views most welcome.

Leigh Roeger posted on Wednesday, January 30, 2002  4:45 pm



Adding a further question: should the squared factor loadings times the factor variance in the total model (in my case one general factor and four specific factors) add to 100%? It doesn't in my case. Is there a way to calculate this so that it does add to 100%? That way you can see the proportion of variance accounted for by the general factor and the specific factors.

bmuthen posted on Thursday, January 31, 2002  11:51 am



You can use the unstandardized loadings. The variance accounted for by the factor plus the residual variance add up to 100%. With categorical outcomes, you get the residual variance printed in the Rsquare section of the output when requesting Standardized; the total (y*) variance is 1, unless you have covariates in the model. As with CFA in general, you can set the metric of the factor by fixing its variance to one or by fixing one loading at one. If anything is unclear, please send a full output to support and your percentage calculation can be checked out. 
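As a rough check of the "adds up to 100%" point, the sketch below decomposes each item's variance into a general-factor part, a specific-factor part, and a residual part, using unstandardized loadings. All numbers are invented for illustration, and the total is rebuilt here from its parts; with categorical outcomes and no covariates the post says this total (the y* variance) is 1.

```python
# Hypothetical unstandardized estimates (not from the thread).
# Each row: (loading on general factor, loading on specific factor, residual variance)
items = [
    (1.00, 0.60, 0.30),
    (0.90, 0.50, 0.45),
    (1.10, 0.40, 0.25),
    (0.80, 0.70, 0.50),
]
var_general = 0.5   # assumed variance of the general factor
var_specific = 0.2  # assumed variance of the specific factor


def decompose(lam_g, lam_s, resid):
    """Return (general, specific, residual) shares of one item's variance."""
    explained_g = lam_g ** 2 * var_general
    explained_s = lam_s ** 2 * var_specific
    total = explained_g + explained_s + resid  # factors are uncorrelated
    return explained_g / total, explained_s / total, resid / total


shares = [decompose(*row) for row in items]
for g, s, r in shares:
    print(f"general={g:.3f} specific={s:.3f} residual={r:.3f} sum={g + s + r:.3f}")

# Average share per source across items, one way to compare the factors.
average = [sum(col) / len(shares) for col in zip(*shares)]
```

If the factor shares alone are expected to reach 100%, the residual part has been left out; the three parts together are what sum to one.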

esoofi posted on Tuesday, February 19, 2002  2:58 pm



Hi Linda, Is rsq shown on page 75 of the User's Guide the same as the one defined on page 288 of Bollen (1989) Structural equations with latent variables? If not, please give me the formula and the reference. Ehsan

bmuthen posted on Tuesday, February 19, 2002  3:01 pm



Yes, they are the same. 

Anonymous posted on Thursday, September 26, 2002  4:41 pm



(Reposted by Mplus Discussion webmaster due to forum malfunction.) Hi, this might be a very basic question but maybe you can help me? I am working on an SEM with six latent constructs, all with categorical indicators. I am confused about calculating the variance explained by its indicators for a latent variable. Is the variance/residual variance listed in the results section the variance explained? For the endogenous latent variables the variance at the end of the residuals section is the variance explained by the other latent variables, isn't it? How can I calculate the variance explained by the indicators for exogenous and endogenous variables? Thanks!

bmuthen posted on Friday, September 27, 2002  9:56 am



When you ask for Standardized solution, you get an output section with Rsquare values for dependent variables. This gives you the proportion of variance explained in these dependent variables, be they observed or latent. 


Hi. I'm wondering how I can use the estimated reliability of variables in SEM under Mplus. For example, in a model I would like to run, the final dependent variable is attitude to nonmarital cohabitation. I have two indicators, whose correlation gives me a first, albeit rough, indication of their reliability (about .70). Another endogenous variable is church attendance, for which other evidence suggests a reliability in the neighbourhood of .75. In some other programs, I could declare how much of the variance for each indicator appeared to be unreliable, and the output would be modified accordingly. I'd like to know whether I can do this in Mplus, for exogenous and/or for endogenous variables. Can someone fill me in? Thanks in advance.


You can create a latent variable for the observed variable as follows: f BY y@1; y@a; where a is the error variance in y chosen as a = (1 – reliability) * sample variance. 
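The fixed error variance in the reply above is simple arithmetic. The following sketch computes it and prints the corresponding MODEL statements; the reliability of .75 echoes the question, while the sample variance of 2.0 and the function name are assumptions made for the example.

```python
def error_variance(reliability, sample_variance):
    """Error variance to fix for a single indicator: (1 - reliability) * variance."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return (1.0 - reliability) * sample_variance


# Assumed inputs: reliability from the question, a made-up sample variance.
a = error_variance(reliability=0.75, sample_variance=2.0)

# Text of the Mplus statements with the computed value filled in.
print(f"f BY y@1; y@{a};")
```

The sample variance itself would come from the data, for example from the SAMPSTAT output.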

Anonymous posted on Thursday, May 27, 2004  11:47 am



Hi. I'd like to know how to obtain the proportion of variance explained for the total structural equation model. I believe the software should generate that for me, but just don't know how to tell it to do that. Could someone help me out? Thanks. 

bmuthen posted on Thursday, May 27, 2004  1:03 pm



Isn't it sufficient to consider the Rsquares for each equation in the model? Particularly since SEM does not aim to explain variance but correlations (covariances). Other readers' opinions are welcome.

Anonymous posted on Wednesday, November 10, 2004  2:44 pm



Hi there, I noticed that there are no significance values for the estimated correlation matrix for the latent variables. How would I compute significance? Also, I would like to know the estimated correlation matrix for manifest variables and latent variables. How would I obtain these? Thanks in advance!

bmuthen posted on Sunday, November 14, 2004  11:27 am



To get significance tests for the latent variable correlations, set the metric of the factors by fixing their variances to 1 instead of setting the metric in the loadings. The estimated correlations between manifest variables and between manifest and latent variables are not given by Mplus but have to be computed from the parameter estimates. 
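Computing such correlations from the parameter estimates can be sketched for the simplest case: indicators that load on a single factor, with no covariates and no correlated residuals. Under those assumptions Cov(y, f) is the loading times the factor variance, and Var(y) is the squared loading times the factor variance plus the residual variance. The numbers and function names below are hypothetical.

```python
import math


def indicator_variance(loading, factor_var, resid_var):
    """Model-implied variance of an indicator that loads on one factor only."""
    return loading ** 2 * factor_var + resid_var


def corr_indicator_factor(loading, factor_var, resid_var):
    """Model-implied correlation between an indicator and its factor."""
    cov = loading * factor_var
    return cov / math.sqrt(factor_var * indicator_variance(loading, factor_var, resid_var))


def corr_two_indicators(load1, resid1, load2, resid2, factor_var):
    """Model-implied correlation between two indicators of the same factor."""
    cov = load1 * load2 * factor_var
    var1 = indicator_variance(load1, factor_var, resid1)
    var2 = indicator_variance(load2, factor_var, resid2)
    return cov / math.sqrt(var1 * var2)


# Made-up estimates: factor variance fixed at 1.
r_yf = corr_indicator_factor(loading=0.8, factor_var=1.0, resid_var=0.36)
r_y1y2 = corr_two_indicators(0.8, 0.36, 0.6, 0.64, factor_var=1.0)
print(round(r_yf, 3), round(r_y1y2, 3))
```

More complex models (cross-loadings, covariates, several factors) need the full model-implied covariance matrix rather than these shortcuts.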

Anonymous posted on Friday, November 19, 2004  2:46 pm



Hi there, Thanks for your message! To follow up... I noticed in the Mplus manual that the correlations between exogenous manifest variables are fixed in the model estimation. We can obtain these correlations with the SAMPSTAT option. Correct? Also, I read that the covariances among continuous latent independent variables and manifest independent variables are free (pertaining to my model's specific case). Thus, does Mplus automatically include these covariances (between exogenous latent and manifest variables) in the model estimation and just not show the results in the MODEL RESULTS section? I believe I should not specify these covariances since they are exogenous variables. Is this correct? You mentioned above that I would need to compute the estimated covariances between manifest and latent variables from the parameter estimates. Let's say that I have a model with 3 IVs (2 manifest and 1 latent, for simplicity) predicting one continuous latent variable (thus a simple regression). How would I compute the covariance between the latent variable and the first manifest variable given the information provided by Mplus? Thanks in advance!

bmuthen posted on Friday, November 19, 2004  4:56 pm



Yes on your first question. Regarding correlations between exogenous manifest and latent variables, I misunderstood your question in your earlier message. I didn't realize you were talking about parameters in an exogenous part of the model as opposed to, more generally, how to get correlations between manifest and latent variables. In principle, relations among manifest and latent exogenous variables should be included in a model, either as correlations or as regressions of the latents on the manifests (assuming, as is often the case, that the manifests are demographics). If Mplus does not include such relationships in the output, then they are not estimated, and you need to add them to your model. This is different from the case of correlations among exogenous manifest variables, where the correlations are fixed at the sample values because these would be the estimates if the correlation parameters were free to be estimated. Relationships between manifest and latent variables, however, cannot be anticipated from the sample statistics but need to be estimated. Hope that clears it up.

Anonymous posted on Friday, November 26, 2004  11:45 am



Hi, I'm sorry if it's a stupid question but I read the discussion and I still don't understand how you can calculate the explained variance for each factor in a CFA. I'm doing a CFA with 18 binary indicators and 3 factors. When I type output=stand, I get the Rsquared for each indicator but not for each factor... Just to add, the factor variances are fixed to 1. Thanks


The factor indicators are dependent variables in the factor analysis model. The factors are independent variables. You obtain rsquares only for dependent variables. 

Anonymous posted on Friday, November 26, 2004  12:27 pm



so is it possible to have the Rsquared for each factor? I mean: do you know how I can calculate it? thanks 


A factor does not have an rsquare because it is not a dependent variable in a CFA model. Perhaps you mean the variance contributed by each factor to the observed variable variances. If so, you sum the squared factor loadings from an orthogonal (factors uncorrelated) solution and divide by the number of observed factor indicators. 
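The rule in the reply above (sum the squared loadings per factor, divide by the number of observed indicators) can be sketched as follows. The loading matrix is invented, and the example assumes standardized loadings from a solution with uncorrelated factors and divides by the total number of indicators in the analysis.

```python
# Hypothetical standardized loadings: rows are indicators, columns are factors.
loadings = [
    [0.8, 0.0, 0.0],
    [0.7, 0.0, 0.0],
    [0.0, 0.6, 0.0],
    [0.0, 0.9, 0.0],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.7],
]

n_indicators = len(loadings)
n_factors = len(loadings[0])

# Share of total indicator variance contributed by each factor.
shares = [
    sum(row[k] ** 2 for row in loadings) / n_indicators
    for k in range(n_factors)
]

for k, share in enumerate(shares, start=1):
    print(f"factor {k}: {share:.3f}")
print(f"all factors: {sum(shares):.3f}")
```

With standardized loadings and the total number of indicators as the divisor, the factor shares cannot add to more than 1; the remainder is residual variance.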

Anonymous posted on Saturday, November 27, 2004  8:11 pm



Hi, I have a question regarding the consequences of specifying covariances between exogenous latent and manifest variables. I specified type = missing h1, estimator = ml, and algorithm = integration for my model. I understand that one must specify covariances between exogenous latent and manifest variables. I did this, but when I did, the model output produced the covariances between exogenous manifest variables (not specified in code). I read in the Mplus discussion that one should not specify exogenous covariances between manifest variables. I did not specify this in code and Mplus produced this in the Model Results. Is it the case with algorithm=integration that Mplus does not use the sample values? I just want to make sure I am not misspecifying my model. Also, I read in the manual that one cannot specify missing h1 with algorithm=integration. I specified this and it ran. Can you explain this? Is it really running missing plain then? Further, how should one decide which info matrix to use (e.g., observed, expected, or comb)? Thanks in advance!


I would have to see your full output to answer your question. Please send it to support@statmodel.com. H1 is on by default with ALGORITHM=INTEGRATION. Use the information matrix that is the default. 

Anonymous posted on Monday, November 29, 2004  4:54 am



Hi, Yes, I meant the variance contributed by each factor to the observed variable variances. So I sum "the squared factor loadings from an orthogonal (factors uncorrelated) solution and divide by the number of observed factor indicators". But when I do that, I get 0.66 for the first factor, 0.63 for the second factor and 0.72 for the third factor. As you can see, if we add the variance contributed by the 3 factors, I get more than 100%! I think there is something wrong, no? Just to be precise: the 3 factors are uncorrelated. Thanks again for your help


If I remember correctly, you are doing a CFA. In this case, you should be squaring the standardized factor loadings I believe. In CFA, a covariance matrix is analyzed. In EFA, a correlation matrix is analyzed. 


Good morning. Is it possible to save the rsquared values out to a file? I'm doing some rather elaborate file manipulation and that would be very helpful. I haven't found anything in the SAVEDATA documentation. Thanks. 


The RESULTS option of the SAVEDATA command does not currently save rsquare values. I will place your request in our suggestion file for the future. 


Ok, thanks. 

Alicia posted on Monday, June 13, 2005  11:40 am



I'm doing GGMM and am interested in how much variance I am explaining in the observed variables (V308-V4708 in this analysis). I specify STANDARDIZED in the OUTPUT command, but the RSQUARE section lists only each class followed by a blank line. Here is my input:

VARIABLE:
  NAMES ARE V106 V350 CLGSTAT V308 V1708 V2708 V3708 V4708
    C35152 C35153 C35154 C35155 C35156 C35157 C35158 C35159 C35160
    C35161 C35162 C35163 C35164 C35165 C35166 C35197 C35199 C35200 C35201 ;
  CENSORED = v308(b) v1708(b) v2708(b) v3708(b) v4708(b)
    C35152(b) C35153(b) C35154(b) C35155(b) C35156(b) C35157(b) C35158(b)
    C35159(b) C35160(b) C35161(b) C35162(b) C35163(b) C35164(b) C35165(b)
    C35166(b) C35197(b) C35199(b) C35200(b) C35201(b);
  USEVARIABLES ARE v308 v1708 v2708 v3708 v4708
    C35152 C35153 C35154 C35155 C35156 C35157 C35158 C35159 C35160
    C35161 C35162 C35163 C35164 C35165 C35166 C35197 C35199 C35200 C35201 ;
  WEIGHT IS v106;
  CLASSES = c (5);
ANALYSIS:
  TYPE = MIXTURE;
  MITERATIONS = 500;
MODEL:
  %OVERALL%
  disorder BY C35152@1 C35153*.7 C35154*.7 C35155*.7 C35156*.7 C35157*.7
    C35158*.7 C35159*.7 C35160*.7 C35161*.7 C35162*.7 C35163*.7 C35164*.7
    C35165*.7 C35166*.7 C35197*.7 C35199*.7 C35200*.7 C35201*.7;
  i s q | v308@0 v1708@1 v2708@2 v3708@3 v4708@4;
  %c#1%
  [i*5.5]; [s*1]; [q@0]; [disorder];
  %c#2%
  [i*5.5]; [s*3]; [q@0]; [disorder];
  %c#3%
  [i*3]; [s*3]; [q*5];
  %c#4%
  [i*6]; [s@0]; [q@0]; [disorder];
  %c#5%
  [i*0]; [s@0]; [q@0]; [disorder];
OUTPUT:
  TECH1 TECH12 RESIDUAL STANDARDIZED;


This is a support question. Please send your output and license number to support@statmodel.com. 

Anonymous posted on Thursday, June 30, 2005  10:27 am



I used Mplus with ML to estimate a path model without latent variables. In one model, I had only 1 DV and 5 IV's. The R^2 was .882. In a second model, I had this 1 DV and the same 5 IV's in addition to other variables in the model, some of which predicted two of the IV's mentioned above (with no additional predictors of the DV mentioned above). The R^2 for this same DV in the 2nd model was .802. Why the difference in R^2 for this one DV when exactly the same predictors were used to predict this DV in both models? The regression coefficients and standard errors relating to this one DV were identical in both models. 

BMuthen posted on Saturday, July 02, 2005  5:50 pm



The second model that you specify sounds like it is overidentified. This may make the estimated covariance matrix for the initial set of 5 iv's different than it was in the first model. You can see if this is the case by requesting RESIDUAL in the OUTPUT command for the second model and comparing it to the SAMPSTAT values for the five iv's in the first model. 
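The point above can be illustrated numerically: with identical regression coefficients and residual variance, R-square still changes if the model-implied covariance matrix of the predictors differs from the sample one. The sketch uses two predictors instead of five to stay short, and every number in it is invented.

```python
def r_square(coefs, cov_x, resid_var):
    """R-square of y = sum(b_j * x_j) + e for a given predictor covariance matrix."""
    n = len(coefs)
    explained = sum(
        coefs[i] * cov_x[i][j] * coefs[j] for i in range(n) for j in range(n)
    )
    return explained / (explained + resid_var)


coefs = [0.5, 0.4]   # same slopes in both models (hypothetical)
resid_var = 0.3      # same residual variance in both models (hypothetical)

# Model 1: predictor covariances equal the sample values.
cov_sample = [[1.0, 0.6],
              [0.6, 1.0]]
# Model 2: an overidentified structure implies a smaller covariance.
cov_implied = [[1.0, 0.2],
               [0.2, 1.0]]

r2_model1 = r_square(coefs, cov_sample, resid_var)
r2_model2 = r_square(coefs, cov_implied, resid_var)
print(round(r2_model1, 3), round(r2_model2, 3))
```

Comparing the RESIDUAL output of the second model with the SAMPSTAT values, as suggested, shows whether this kind of discrepancy is present.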

Anonymous posted on Thursday, September 15, 2005  12:22 pm



Linda, Where is the error term for each indicator in the Mplus output? Is it the S.E. column? Thanks 


Under residual variances. The column S.E. contains the standard errors of the parameter estimates. 

yulan han posted on Tuesday, November 08, 2005  6:05 pm



Hi,Linda Could you tell me how to make the error terms of two variables correlated in Mplus? Thanks. 


y1 WITH y2; where y1 and y2 are dependent variables in the model. See the WITH command in the Mplus User's Guide. 

HWard posted on Friday, March 10, 2006  9:20 am



I am running models in which a dichotomous 'obesity' variable is the final dependent variable. Obesity is regressed onto education, activity, diet, age, and smoking status, with indirect pathways running from education to obesity through activity, smoking, and diet. I have run separate models for men and women, and found significant parameters associated with obesity for women only. However, the rsquare value for the obesity variable is often higher for men than for women. This doesn't make sense to me as none of the variables were significantly related to obesity among men. Any thoughts on why this might be happening? Education and activity are categorical, diet is a latent variable, age is continuous, and smoking is entered into the model as two dummy variables.


Rsquare is variance explained. There may be less variance for one group versus the other and less power for one group versus the other. The obesity variables may behave differently for males versus females. 


I have the following SEM model: F1 -> F2 -> F5, F2 -> F4, F3 -> F4 -> F5, F3 -> F5. One of my thesis questions was that F2, F3, and F4 have a direct effect on F5. F5 is a single-indicator latent variable. When I run the model, all the paths are significant apart from F4 -> F5. When I look at the RSquared, F2 and F3 explained 88% of the variance of F4, and F2 and F3 explained 94% of the variance of F5. Can I explain the nonsignificant path by the fact that the remaining variance of F4, not being explained by F2 and F3, is too small to add a unique contribution to F5? Besides, F2 and F3 explain F5 to a great extent (94%), so there is little variance left for any unique contribution of F4. Are my arguments correct? I have an additional question: if I employ multiple regression, either hierarchical or stepwise, all the observed variables are significant in predicting F5. However, if I run SEM, the variables for F4 -> F5 are not significant. What are the possible reasons?


The nonsignificant direct effect of F4 on F5 may be due to F4 being highly correlated with F2 and F3, so that F4 does not contribute much in the F2, F3, F4 prediction of F5. In your multiple regressions you predict F5 using observed variables. The factors of the SEM don't contain the measurement errors in those observed variables so you get different results. Another explanation could be that your SEM model does not fit the data well. 


I want to be able to report the variance explained by each of 5 exogenous variables on other model components. My understanding is that if the exogenous variables are specified to be orthogonal and are the only predictors of a model variable, I can simply square the StdYX. But what if the exogenous variables are correlated, or what if other model variables also predict the variable? Can the StdYX be interpreted as a semipartial correlation?


Why don't you simply report the Rsquare that you get when requesting the standardized solution? 


Thanks for the quick reply. Actually, I am reporting the Rsquare values, but I was wondering if it is possible to somehow partial the variance between the exogenous variables, i.e. is there a semipartial correlation equivalent in SEM?


I think it would be hard to partial out variance explained contributions from correlated predictors. 

Amy Bleakley posted on Thursday, January 18, 2007  11:48 am



Hi I am estimating a nonrecursive, multiple group model. The models do not include any latent variables. Two questions: 1. Which method does Mplus use to calculate the rsquared for nonrecursive models? 2. For one of the groups, I am obtaining an undefined r squared. The models have excellent fit, and negative variance (with both error variances and the estimated variances) isn't an issue. Do you have any idea what the problem may be? What diagnostics would you suggest? Thanks! 


1. Same as for recursive models. 2. Please send the input, data, output, and your license number to support@statmodel.com. Usually this is caused by a negative residual variance.


This is a follow-up to Bleakley's question posted on 1/18/2007. I'm taking Linda's response to the question to mean that the Rsquares reported in the output for a nonrecursive model represent the proportion of variance explained and I do not have to calculate an alternate Rsquare using the Bentler & Raykov equation (2000). Bentler & Raykov contend that in nonrecursive models measures of explained variance are not simply summarized by a standard Rsquare measure because of the reciprocal interdependencies among variables. Thanks. Bentler & Raykov (2000). On measures of explained variance in nonrecursive structural equation models. Journal of Applied Psychology, 85, 125-131.


Mplus does not provide the Bentler and Raykov Rsquare for nonrecursive models. If you feel that would be better for your purposes, you would need to calculate that by hand. 


Is calculating the Bentler and Raykov Rsquare for nonrecursive models something that you and Bengt typically recommend? I don't know whether it would be better for my purposes or not (or even how I would figure that out). Thanks. 


We don't know enough about it to recommend it or not. We typically don't work with nonrecursive models. Are the arguments in the article compelling? 


An afterthought. If you calculate both, you can see how different they actually are. 


Hi, I have a question about SEM and explained variance. I have this kind of model: F1 by y1-y3; F2 by y4-y6; F3 by y7-y9 ; F3 on f1 f2 ; Dropout ON f2 ; Dropout ON f2 ; First, is it possible to regress a dichotomous variable (dropout) ON a factor? In the output, I have an Rsquare of 0.739 for the variable dropout. Is it correct to say that 74% of the variance of dropout is explained by the model? Thank You


Yes, it is possible to regress a dichotomous variable on a factor. The Rsquare is the variance explained by the set of covariates that the dependent variable was regressed on, not by the model. With a dichotomous outcome, it is the variance explained in the underlying latent response variable, not in the observed variable.


Is it possible to know if my model explained well the variable dropout ? Thank You 


The fit of the full model to the data can be assessed using fit statistics like chi-square, RMSEA, etc. The variability in dropout is not explained by the model but by the covariates for dropout.


I have 2 latent predictors (categorical) and their interaction predicting an observed outcome (continuous)…is there any way to get RSquare? 


Rsquare has not been defined for this application. 

Paul Silvia posted on Thursday, November 29, 2007  10:35 am



Hi. A quick question: how does Mplus 5 compute the standard error of Rsquare? (I've checked the UG and Technical Appendices.)


Using the "Delta method", where the variance of Rsquare is computed from the variances and covariances of the parameter estimates that contribute to Rsquare. The Version 5 Tech doc about standardizations describes the Delta approach in general.
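A generic version of the delta method can be sketched as follows. This is not Mplus's implementation; it only illustrates the idea for the simplest case, a regression of y on one predictor, where Rsquare is a function of the slope, the predictor variance, and the residual variance. The estimates and their covariance matrix are invented for the example.

```python
import math


def r_square(params):
    """Rsquare for y = b*x + e, with params = (b, var_x, resid_var)."""
    b, var_x, resid_var = params
    explained = b * b * var_x
    return explained / (explained + resid_var)


def gradient(func, params, step=1e-6):
    """Numerical gradient of func at params using central differences."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += step
        down[i] -= step
        grad.append((func(up) - func(down)) / (2 * step))
    return grad


def delta_se(func, params, cov):
    """Delta-method standard error: sqrt(g' V g)."""
    g = gradient(func, params)
    n = len(g)
    variance = sum(g[i] * cov[i][j] * g[j] for i in range(n) for j in range(n))
    return math.sqrt(variance)


# Hypothetical estimates and covariance matrix of the estimates.
estimates = [0.5, 1.0, 0.75]
cov_estimates = [
    [0.0025, 0.0, 0.0],
    [0.0, 0.004, 0.0],
    [0.0, 0.0, 0.003],
]

se = delta_se(r_square, estimates, cov_estimates)
print(round(r_square(estimates), 3), round(se, 4))
```

In a real model the covariance matrix of the estimates would generally have nonzero off-diagonal elements, and Rsquare would depend on more parameters.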


Hi Linda and Bengt: I have tested a model with one exogenous latent construct and controlling for 3 other observed measures. I have been asked by my advisor how much more the one exogenous latent construct adds to the explained variance of the final DVs above and beyond the 3 control variables. After reviewing this thread, I'm not sure if the variance can be parcelled out? I will also review the SEMNET archives to see if I can figure this out. Thanks, Sue


We do not know of any way to disentangle this. 


The exogenous factor is presumably correlated with the observed exogenous control variables, so it acts to influence the variance explained not only by itself but also through the control variables, i.e. in a too complex fashion.


As always, I am most appreciative. Thank you again. Sue 


I have run 2 models each containing a total of 14 factors. In the one model, I regressed one of the factors (factor #1) on one of the others (factor #2). In the other model, I regressed factor #1 on both factor #2 and a third factor (factor #3). The Rsquare for the latent variable factor #1 is larger in the first model (with one fewer predictor) and this makes no sense to me. I have not constrained the measurement model parameter estimates to be equal across the two models, might this have something to do with my nonsensical result? One other strange aspect of the output is that the modification indices for both models include output regarding a measurement model parameter that is already being freely estimated. 


I would need more information to answer your question. Please send both outputs and your license number to support@statmodel.com. 

jmaslow posted on Friday, December 04, 2009  1:57 pm



I have a fairly simple model in which 2 latent exogenous variables predict 2 latent outcome variables. The latent outcomes are correlated with each other, and each has only 1 indicator. I have requested standardized output, but I do not receive r square values for the latent outcome variables, only for each of my observed variables (which are the indicators of the independent and dependent latents). How can I calculate r square in each dependent variable? 


Please send your full output and license number to support@statmodel.com. 


Hi Linda, I am using a second-order latent factor (measured by three first-order factors) as a mediator between sociodemographic and health variables and an outcome. After running the hypothesized model, the modification indices suggest paths from the sociodemographic and health variables to the first-order factors of the mediating factor and also paths from the first-order factors of the mediating factor to the outcome. Does this suggest the first-order factors are more important in predicting the outcome than the second-order factor? Is it suggesting that we use the three first-order factors instead of one second-order factor? Thanks


The modification indices show where freeing parameters will improve model fit. Whether the parameters should be freed should be guided by theory. 

Mike Zyphur posted on Thursday, September 16, 2010  8:40 pm



Hi Linda and Bengt, Quick question: relevant to the notion of semipartial/part correlations, can you think of any way to work directly with residuals without influencing the part of the model influencing a variable Y directly? I cannot. Setting up a latent variable for residual as e by Y@1; Y@0; is not adequate and I can think of no other way. Thanks for any help mike 


Here's one example we have: The following approach can be used to obtain the residual for a factor indicator y4 and to regress a variable z on this residual.

MODEL:
  f BY y1-y4;
  y4res BY;          ! define the residual as a factor, picking up the
                     ! residual variance as the variance of this factor
  y4 ON y4res@1;     ! let the residual influence y4 with the requisite unit slope
  y4@0;              ! fix the original residual variance to zero
  z ON y4res;
  y4res WITH f@0;

Mike Zyphur posted on Saturday, September 18, 2010  5:51 am



Hi Linda, Thanks for the input and your time. With the y4res model, it seems that in many ways specifying the residual of y4 as a factor y4res is very similar to not having the factor there. For example, if we have two models: Y1 on X1 X2; versus Y1 on X1; Y1@0; Y1res by; Y1 on Y1res@1; Y1res on X2; they will return equivalent parameter estimates inasmuch as Y1 is simultaneously regressed on X1 and X2 (through Y1res). I can't think of how to do it, but I want to specify a structural model for the residual after other parts of the model have been estimated, like a semipartial correlation. With covariances this is easily done as Y1 on X1; Y1 with X2; but extending this to allow regression among the residual and X2, after estimating the effect of X1 on Y1, is tough. A 2-stage procedure is likely required, where one would save the residuals and then use them as data. Thoughts? Thanks!! mike


Maybe the dependent variable needs to be a factor with multiple indicators. 

Mike Zyphur posted on Sunday, September 19, 2010  9:06 am



Sorry, I missed it before, but the solution seems as follows, and does require the logic of latent variables for residuals. Here are a few different types of models. For semipartial correlations where the goal is r_xy partialing for Z: Y on Z; Y with X; For changes in R^2 in stepwise regression, where a first step contains Z and the second step adds X, we need a semipartial regression coefficient partialing for Z, which can be done with X on Z; X@0; X_res by; X on X_res@1; X_res on Y; and we can add additional "steps", for example with a variable W in the third step as X W on Z; W on X; X@0; X_res by; X on X_res@1; X_res on Y; W@0; W_res by; W on W_res@1; W_res on Y; Then we can compute changes in R^2 in a MODEL CONSTRAINT statement where we divide the residual of Y by the total variance in Y. For future reference for all forum goers, see Preacher (2006) for details. Happy modeling! Preacher, K. J. (2006). Testing complex correlation hypotheses with structural equation models. Structural Equation Modeling, 13, 520-543.
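For readers who want to check such a setup against hand calculations, the textbook link between a semipartial correlation and the change in R^2 can be sketched from three correlations. This is the standard correlation-based identity, not the Mplus specification above, and the correlation values are made up.

```python
import math


def semipartial(r_xy, r_xz, r_yz):
    """Correlation of Y with the part of X that is unrelated to Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1.0 - r_xz ** 2)


def r2_two_predictors(r_xy, r_yz, r_xz):
    """R^2 from regressing Y on both X and Z, using correlations only."""
    return (r_xy ** 2 + r_yz ** 2 - 2.0 * r_xy * r_yz * r_xz) / (1.0 - r_xz ** 2)


# Hypothetical correlations among Y, X, and Z.
r_xy, r_yz, r_xz = 0.5, 0.4, 0.3

sr = semipartial(r_xy, r_xz, r_yz)
r2_step1 = r_yz ** 2                           # step 1: Z only
r2_step2 = r2_two_predictors(r_xy, r_yz, r_xz)  # step 2: Z and X
delta_r2 = r2_step2 - r2_step1

print(round(sr, 4), round(delta_r2, 4))
```

The squared semipartial correlation equals the R^2 change from adding X after Z, which is the quantity the stepwise setup is meant to reproduce.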

luke fryer posted on Wednesday, September 29, 2010  9:32 pm



I would like to ask about a post from 2004: "Anonymous posted on Thursday, May 27, 2004  11:47 am Hi. I'd like to know how to obtain the proportion of variance explained for the total structural equation model. I believe the software should generate that for me, but just don't know how to tell it to do that. Could someone help me out? Thanks." Dr. Muthen's reply suggested that it would be sufficient to consider the RSquare for each part of the model... I think... In my case, I am modeling a test of vocabulary knowledge (second language learners) and, in addition to testing the dimensions of vocabulary recall, I wish to determine the proportion of variance (from the entire data set) explained by the working model. Confirmatory factor analysis of the test with Mplus suggests that our model fits the data well (dichotomous data). But because it is a test, we need to know whether the model sufficiently explains the variance in the data: whether the model as a whole and the factors individually are meaningful with regard to variance explained. Is there some way to do this within Mplus? Thank you luke


Mplus provides Rsquare for each dependent variable in the model. Model fit assesses the closeness of the H0 to the H1 model. 


Hi, I've tried to read through most of the posts here, but I haven't quite found what I'm looking for, so please forgive me if it's a repeat. In a basic SEM, I'm trying to estimate the variance explained (VE) for each individual path (with multiple paths of course), in addition to the total variance explained (which I know how to get in Mplus). Now, I'm a regular user of LISREL (don't shoot me, I'm trying to convert), and I've found that by simply multiplying the value in the 'Correlation matrix of the ETA' with the BETA path, you get the amount of VE for that predictive path only (this is validated by adding them together and getting the same value as the Sum of Squared Multiple Correlations for the dependent variable). My question is: can Mplus provide output for the equivalent of the 'Correlation matrix of the ETA' provided by LISREL? If not, is there any way to compute individual path VEs from Mplus? I only ask because some journals love VE estimates as they think we get too excited about significance for any one path. Cheers,


The TECH4 option of the OUTPUT command will give you the correlations of the factors. 

sunnyshi posted on Monday, March 07, 2011  8:29 pm



Dear Mplus, I am running a SEM with a latent interaction term. Although I requested standardized output, because of TYPE=RANDOM the message says "STANDARDIZED (STD, STDY, STDYX) options are not available for TYPE=RANDOM." In this case, how can I get the R square for each dependent variable? Thanks! Best 


See our FAQ "The variance of a dependent variable as a function of latent variables that have an interaction is discussed in Mooijaart and Satorra" 


I was hoping you could help me with a question regarding percent variance accounted for in EFA. I have 3 factors, varimax rotated. I understand that this is calculated by summing the squared loadings for each factor and dividing this by the number of items, but my questions are: 1) Do I zero out the loadings that are under my cutoff criteria (.4) before summing? 2) Do I divide by the total number of initial items entered in the analysis, or by the number of items that were retained? 


1. You don't delete any loadings. 2. You use the number of variables on the USEVARIABLES list. 
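
As an illustration of the calculation described in this exchange, here is a minimal Python sketch. It assumes an orthogonal rotation and standardized items, so that the total variance equals the number of variables. The loading values and the function name `variance_explained` are made up for the example and do not come from any Mplus output.

```python
def variance_explained(loadings):
    """Proportion of total variance per factor for an orthogonal solution.

    loadings: one row per variable, one column per factor.
    All loadings are kept (none zeroed out), and the divisor is the
    number of variables analyzed.
    """
    n_items = len(loadings)
    n_factors = len(loadings[0])
    return [
        sum(row[j] ** 2 for row in loadings) / n_items
        for j in range(n_factors)
    ]

# Hypothetical loadings: 4 variables, 2 factors.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.2, 0.6],
    [0.1, 0.9],
]
proportions = variance_explained(loadings)
print(proportions)
```

Small loadings such as 0.1 still enter the sum, in line with the advice not to delete any loadings.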


Hello, I have read through this post, but I'm still not sure if my question has actually been answered. I am trying to calculate the variance explained by ENERGY (a latent variable) predicting pewl6m while controlling for TX, BMI, and Age. I know the R^2 provides the proportion of variance explained by the whole model, but I want to know only what ENERGY is contributing. See syntax below. ENERGY BY MR47-MR56; ENERGY ON TX; pewl6m ON ENERGY TX BMI AGE; BMI ON TX; BMI; AGE; MR50 WITH MR48; MR54 WITH MR53; Also, for the R^2 I do not get a p-value in the output. How can I tell if the R^2 is significant? Thank you. 


We don't give a contribution for each covariate, only for the set of covariates. The current version of Mplus provides p-values for Rsquare. 


Hello again, Thank you for your response, but is there a way to calculate the proportion of variance for a specific predictor using the standardized coefficient or residual variances? Also, I am using Mplus version 6, but the p-values are not showing up for Rsquare. Do I need to type Standardized in the Output section? Thanks 


I am unaware of a way to obtain Rsquare for one of a set of predictors. You will not get Rsquare unless you ask for STANDARDIZED in the OUTPUT command. 


I ran a path model including only observed variables. For the outcome variable, Mplus calculated an Rsquare of 0.76. This seemed high to me, so I ran a simple multiple regression analysis using the same 4 predictors in SPSS, and I got an Rsquare of 0.31. Do you know why there would be such a considerable discrepancy? 


It would be impossible to say without seeing both outputs. The most likely reason is that the samples are not the same. If you would like us to look at it, send the two outputs and your license number to support@statmodel.com. 

Dave posted on Tuesday, April 24, 2012  11:25 am



Hello, I would appreciate your guidance on three points. I have been asked to support including a latent variable in a model predicting a binary outcome after controlling for other variables, that is, to report the Rsquare for one of a set of predictors. One approach I have experimented with is constraining the prediction for the latent variable to zero, looking at the Rsquare for the rest of the model, and then releasing the constraint and checking for a difference in the Rsquare value. Ignoring the problem I address next, is this a valid approach, and would an observed difference have meaning? Second, I recognize that for binary variables Mplus calculates "the variance explained of the underlying latent variable not the observed variable". I think my binary variable does not have an underlying latent variable (it is an either/or variable, e.g. pregnant or not, quit or not). For this type of variable, does the McKelvey & Zavoina pseudo Rsquared make sense? Third, do you have a suggestion for how to figure out something like the change in the model's ability to properly classify cases based on including the latent variable or not? Thanks. 


I would not use Rsquare for this purpose, particularly with categorical variables. In the logistic regression literature, classification quality is used for this type of decision. You can look at the classification quality of the model with the factor and without the factor. You do this as follows. 
1. Create an estimated probability for each person's observed 0 or 1. 
2. Create a classification for each person based on the rule 0 if the estimated probability is less than or equal to .5 and 1 if the estimated probability is greater than .5. 
3. For the estimated models, compute the estimated probability for each person and create a classification according to number 2. 
4. Compare the classifications from the two estimated models to the observed scores and see which model has the best classification. 
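
The comparison described in this reply can be sketched in Python as follows. This is only an illustration under stated assumptions: the estimated probabilities are taken as already computed for each person under each model, and the data, the function names `classify` and `accuracy`, and the use of simple proportion correct as the quality measure are all hypothetical choices for the example.

```python
def classify(probabilities, cutoff=0.5):
    """Classify as 1 if the estimated probability exceeds the cutoff, else 0."""
    return [1 if p > cutoff else 0 for p in probabilities]

def accuracy(observed, predicted):
    """Proportion of persons whose classification matches the observed score."""
    hits = sum(1 for o, c in zip(observed, predicted) if o == c)
    return hits / len(observed)

# Hypothetical observed outcomes and model-estimated probabilities.
observed = [0, 1, 1, 0, 1, 0]
p_without_factor = [0.40, 0.45, 0.70, 0.55, 0.60, 0.30]
p_with_factor = [0.35, 0.65, 0.80, 0.45, 0.70, 0.20]

acc_without = accuracy(observed, classify(p_without_factor))
acc_with = accuracy(observed, classify(p_with_factor))
print(acc_without, acc_with)
```

The model with the higher proportion of correct classifications would be preferred under this criterion. Other summaries, such as a full classification table, could be built from the same two lists.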

Heike Link posted on Friday, April 27, 2012  4:54 am



Hello, I am rather new to Mplus and have a question: I have estimated a model which consists of an ordinary logit model and a latent variable part, whereby both parts were estimated jointly by using MLR with Monte Carlo simulation. The structure of my model is below: 
Y ON x1-x5 (Y is a binary variable, x1-x5 are continuous predictors) 
Y ON LV2 
LV1 BY i1 i2 (i1 and i2 are continuous indicators) 
x6 x7 ON LV1 (x6, x7 are exogenous variables, continuous) 
LV2 BY i3 i4 i5 i6 (i3-i6 are categorical indicators) 
x8 x9 ON LV2 (x8, x9 are exogenous variables, continuous) 
LV2 ON LV1. 
The model was aimed at providing a better explanation of Y by including latent variables, and this worked out well: the parameters are significant and can be interpreted well. However, when I compare the simple logit part (Y ON x1-x5, estimated separately) with the results of my model above, I see on the one hand an increase in Rsquare (it doubles) but on the other a dramatic fall in the loglikelihood and in AIC and BIC, which I had not expected. Of course I have tested the latent variable part separately with CFA using WLSMV and get reasonable fit indices (CFI 0.987, TLI 0.982, RMSEA 0.028). I would be grateful for any explanation of this. 


The two models have different sets of dependent variables. This is why the loglikelihoods are different. They are in different metrics. They should not be compared. 

Dave posted on Friday, April 27, 2012  3:19 pm



Hello Linda, Thank you for your guidance on my questions regarding change in a model's ability to classify cases. I have a couple of follow-up questions regarding how to implement your suggested approach. I have used the guidance on the message boards to figure out how to calculate the probabilities for a model with observed covariates using the material in short course Topic 2 and the user's guide (April 2010, p. 440, Calculating Probabilities From Probit Regression Coefficients). I am stuck on how to calculate the probabilities for individual cases after I include a latent variable in the model. My questions: 1) Do I use each individual's factor score multiplied by the estimate for the latent variable and add this term to the probability equation? 2) To test the differences properly, does the estimate for the latent variable need to be constrained to zero in the first round of estimates? 3) Should I calculate probabilities using estimates or standardized estimates (and if standardized estimates are used, do I use a combination of different types of standardization based on the type of covariate, e.g., binary: STDY, continuous: STDYX, or just one set of estimates)? Thank you in advance for your suggestions. 


See Slides 163 and 164 in the Topic 2 course handout on the website. 

Dave posted on Monday, April 30, 2012  2:42 pm



Hello Linda, Thank you for pointing me towards slides 163/164 and equations 46/47. I have been going over the slides. Correct me if I have misunderstood the equations, but my take is that they calculate item probabilities for one of the indicators of f1 and a covariate. I am wondering if it is possible to extend these equations to a model where you have multiple lambdas and etas (say a latent variable with three indicators, so that you can account for the full latent variable in calculating individual probabilities). If this is possible, what would this equation look like? 


The model in the example has multiple indicators. We just focus on one. You can do one at a time. 

Dave posted on Thursday, May 03, 2012  1:02 pm



Hello Linda, Do equations 46/47 work for creating estimated probabilities for each person's observed 0 or 1 when the two models I am trying to compare (estimator = WLSMV) are like: Model 1: F1 by y1 y2 y3; Binary on F1@0 control1 control2 control3; (Alternatively, Model 1 could be just the second line: Binary on control1 control2 control3); and Model 2: F1 by y1 y2 y3; Binary on F1 control1 control2 control3; I am not seeing how I account for F1 (and y1 y2 y3) in the formulas. Do I have to reformulate the model such that I have a lambda for Binary, and if so what would that look like? 


Binary on F1 is just like having another indicator of F1, so it is correct to use the formulas mentioned above. You don't have to bring in y1-y3 because you condition on F1 and once you do that, y1-y3 don't have further influence on Binary. 

Heike Link posted on Tuesday, June 19, 2012  6:48 am



Hi, I have estimated an integrated binary logit and latent variable model by using the MLR estimator, with Monte Carlo simulation for integration. Since the conventional R square does not make sense here, I am wondering whether the R square given in the output is McFadden's pseudo R square or one of the other available measures. Many thanks for an answer! 


The Rsquare is for the continuous latent response variable underlying the binary DV. This is in line with the McKelvey-Zavoina article and the Snijders-Bosker multilevel book. 


Hello, Can you please explain how the standard error and p-values for Rsquare are calculated in Mplus? I am running a linear regression model on observed variables in a complex data set (students clustered in classrooms). My understanding has always been that R square is tested by F = (Rsquare/k)/((1 - Rsquare)/(N - k - 1)). I am not sure how to interpret the standard error associated with R square in the Mplus output and how it is used in calculating the p-value. Thank you for your help. 


We use a z-test instead of an F test. The standard error is computed using the Delta method. 
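
To make the z-test concrete, here is a small Python sketch. It assumes that the test statistic is the Rsquare estimate divided by its standard error and that the p-value is two-sided under a standard normal distribution. The standard error itself (from the Delta method) is taken as given rather than derived, and the numbers and the function name `z_test` are hypothetical.

```python
import math

def z_test(estimate, standard_error):
    """Return (z, two-sided p-value) for estimate / standard_error.

    Assumes the statistic is approximately standard normal.
    """
    z = estimate / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical Rsquare estimate and standard error.
z, p = z_test(0.30, 0.10)
print(z, p)
```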

Utkun Ozdil posted on Monday, August 13, 2012  8:52 pm



Hi, I'm estimating a two-level model with student- and classroom-level predictors regressed on three dependent variables. When I typed STANDARDIZED in the OUTPUT command to get the explained variances, for the student-level model including only level 1 predictors the Mplus output gave me residual variances at the within level, variances at the between level, and Rsquares for the within part of the model for each of the three dependent variables. For the full model, in which I included both student- and classroom-level predictors, the output gave me residual variances and Rsquares at the within and between levels for each of the three dependent variables. My question is this: since models can generally explain variance through fixed effects as one single Rsquare term per model, is there a way to get a single variance term from Mplus for the student model and for the full model, instead of these values for each dependent variable? Thanks. 


We don't give an Rsquare for dependent variables together. 

Utkun Ozdil posted on Tuesday, August 14, 2012  12:35 pm



Linda, in the output of the within-classroom model, where I included only gender and ses as student-level predictors, I got these Rsquares for Math1, Math2, and Math3. Should I have estimated separate covariate models for Math1, Math2, and Math3 so that the Mplus output would direct me towards a single Rsquare at the within level for each? 
TITLE: Within-Classroom Model 
DATA: FILE IS wcm.dat; 
VARIABLE: NAMES ARE class gender ses Math1 Math2 Math3; 
USEVARIABLES ARE gender ses Math1 Math2 Math3; 
WITHIN = gender ses; 
CENTERING = GRANDMEAN (gender ses); 
CLUSTER IS class; 
MODEL: 
%WITHIN% 
Math1 ON gender ses; 
Math2 ON gender ses; 
Math3 ON gender ses; 
%BETWEEN% 
Math1; Math2; Math3; 
ANALYSIS: TYPE IS TWOLEVEL; 
OUTPUT: SAMPSTAT STANDARDIZED; 
Thanks. 


Please send the output and your license number to support@statmodel.com. I can't understand what you mean. 

Stace Swayne posted on Saturday, December 01, 2012  9:23 am



Dear Dr. Muthen, I ran the following model f1 by x1 x2 x3 x4 x5; f2 by v1 v2 v3 v4 v5 v6; f3 by v7 v8 v9 v10 v11; f1 ON f2 f3; My question: I want to find out whether f2 or f3 is a better predictor of f1 (variance explained). Is it possible to get this from the Mplus output? Thanks 


You can compare the STD coefficients. You can't divide up the explained variance because f2 and f3 are correlated. 

Tyler Mason posted on Monday, April 22, 2013  12:16 pm



I ran an SEM model in Mplus and asked for the R2 values. Do I interpret the R2 values as the model explaining % of variance in the DV? Or do the R2 values mean the % explained by the last mediator/IV? Thanks! 


Each Rsquare value is the variance explained for the dependent variable by all covariates in the model for that dependent variable. For the following regression, y ON x1 x2 x3; Rsquare is the variance in y explained by x1, x2, and x3. 

Heike Link posted on Thursday, June 06, 2013  1:30 am



Hello, Some time ago I performed an analysis with Mplus and estimated a binary logit model with latent variables (a MIMIC part), estimated jointly by using MLR with Monte Carlo simulation. The structure of my model is: Now I have received several comments and requests from reviewers of a paper and I would be very grateful for some advice on the following issues. 1) I was requested to give further measures for Pseudo R squares in addition to the Zavoina McKelvey one. Is this possible in Mplus? 2) The reviewers recommended using the p^2 value defined as 1 - (L(β)/L(0)), where L(β) is the loglikelihood of the final model at convergence and L(0) is the loglikelihood of a model containing only the alternative-specific constant. How can I get this in the Mplus output? And for clarification: the loglikelihood given in the output is shown as the H0 value (what is this exactly?) and the output also gives a scaling factor for MLR. What exactly is this factor? 3) Finally: when using Monte Carlo simulation for MLR, is it based on Halton draws? Many thanks for your efforts! Heike 


1) No, but you could probably compute this by hand using the Mplus output. 2) I am not familiar with the p^2 value; I wonder if you are referring to the McFadden Rsquare, which uses the ratio of ML's loglikelihood values with free slopes for covariates versus slopes fixed at zero. For a description of loglikelihood, Google McFadden's Rsquare or Google likelihood. 3) You say "simulation", but perhaps you mean "integration"? I don't know what Halton draws are, so we probably don't do that. 
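
The hand computation mentioned under 1) and 2) might look like the Python sketch below. It assumes the McFadden definition 1 - LL(model)/LL(null), where the null model fixes the covariate slopes at zero, and it assumes both loglikelihoods come from models with the same set of dependent variables so that they are in the same metric. The loglikelihood values and the function name `mcfadden_r2` are invented for illustration.

```python
def mcfadden_r2(loglik_model, loglik_null):
    """McFadden pseudo R-square: 1 - LL(model) / LL(null).

    loglik_model: loglikelihood with covariate slopes freely estimated.
    loglik_null:  loglikelihood with covariate slopes fixed at zero.
    """
    return 1.0 - loglik_model / loglik_null

# Hypothetical loglikelihood values read from two outputs.
r2 = mcfadden_r2(loglik_model=-80.0, loglik_null=-100.0)
print(r2)
```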

Heike Link posted on Thursday, June 06, 2013  9:52 am



Many thanks for the quick reply! But I still do not know what the H0 value of the Loglikelihood means, and what the scaling factor is (see my second question). On my third question: Yes, I meant integration! 


The H0 loglikelihood is the loglikelihood for the estimated model. See Bryant and Satorra (2011) for a description of the scaling correction factor. The link to the paper is at the bottom of the section on the website that describes MLM and MLR Difference testing. 


Hello, I would like to test for a significant change in the R2 of an observed variable in an SEM model. In the first model, the observed metric variable is regressed on four latent factors. In the second model, the observed variable is regressed only on a second-order factor that is constituted by the four latent factors. Is there a way to test for a significant change in the R2 between the two models? Thanks! 


I don't know of any way for that to be done. 


Dear Drs. Muthen, I am referring to a question asked by sunnyshi about the calculation of R square when using type=random. I have had a look at the explanation in the FAQs as you recommended, but I don't understand it completely. Where do I find the values for beta, the variance of eta2, the variance of zeta1 and the variance of zeta3 in Mplus? Or can I just assume they are equal to one? I have another independent variable in my model. Would I simply add the beta4 to the equation to calculate the variance of eta3? Thank you so much! 


Which FAQ are you referring to? 


Sorry for not being precise: FAQ "Latent variable interactions" 


Take Figure 2 of the FAQ as an example. For this model the correspondence between the notation in the figure and the Mplus output is: 
beta = eta1 ON eta2 
var(eta2) = eta2 variance 
var(zeta1) = eta1 residual variance 
etc. 
The variance of eta3 is obtained via (16), where the exact expression depends on whether or not your other independent variable interacts with other ones. If you are not familiar with these expressions, you might need a statistical consultant. 


Thanks a lot, Prof. Muthen. One of my moderating variables is binary. Is there still a way to calculate R square myself? (The variance of the variable and the covariance of the binary variable with the other construct would be needed.) 


With a binary observed mediator you can do two-group analysis as mentioned on page 688 of the V7 UG. The formulas of the FAQ refer to normally distributed variables and would not apply to a binary variable. 


Hello, I have the following model: 
USEVARIABLES ARE crp_log tnf_log il6_log effort1 noteat1 badslp1 getgoin1 concntr1 asgood1 happy1 hopeful1 enjlife1 blue1 depress1 lonely1 cryspel1 sad1 lffail1 fearful1 age1c race1c gender1 bmi1c sbp1c glucos1c hdl1 chol1 trig1 cig1c walkmn1c alc1c dm031c htnmed1c lipid1c; 
Model: 
NEGAFF BY blue1 depress1 lonely1 cryspel1 sad1 lffail1 fearful1; 
POSS BY asgood1 happy1 hopeful1 enjlife1; 
SOM BY effort1 badslp1 getgoin1 noteat1 concntr1; 
crp_log il6_log tnf_log ON SOM POSS NEGAFF age1c race1c gender1 bmi1c sbp1c glucos1c hdl1 chol1 trig1 cig1c walkmn1c alc1c dm031c htnmed1c lipid1c; 
OUTPUT: Standardized sampstat CINTERVAL; 
*** ERROR One or more variables have a variance greater than the maximum allowed of 1000000. Check your data and format statement or rescale the variable(s) using the DEFINE command. 
How can I use the DEFINE command to rescale this log-transformed variable in Mplus? My main dataset is in Stata. Thx much, Al 


Divide it by a constant that reduces its variance. 
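
Why dividing by a constant helps can be checked numerically: dividing a variable by c divides its variance by c squared. The short Python sketch below illustrates this with made-up values. The data, the constant 1000, and the function name `rescale` are assumptions for the example only, and it does not show Mplus syntax.

```python
import statistics

def rescale(values, constant):
    """Divide every value by a constant; the variance shrinks by constant**2."""
    return [v / constant for v in values]

# Hypothetical variable whose variance exceeds 1,000,000.
raw = [0.0, 2000.0, 4000.0]
scaled = rescale(raw, 1000.0)

var_raw = statistics.variance(raw)
var_scaled = statistics.variance(scaled)
print(var_raw, var_scaled)
```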


Hello, I am trying to find out how R square is calculated for a DV in a structural equation model when there are multiple, nonorthogonal predictors (i.e., like in multiple regression). I am looking at a model in which, for example, y is predicted only by x1. R square in this case is beta squared (right?). Then I look at a model in which y is predicted by both x1 and x2, which are correlated. The effect of x1 is now different, and there is the additional effect of x2. How is r square of y calculated now? In my case, the effect of x1 increases when x2 is included (and the effect of x2 becomes negative comparing to when it is the single predictor, as in a suppression effect). I expected variance explained to increase, but r square hardly changes. I was wondering why differences in the effects are not reflected also in the variance explained. I assume that knowing how it is calculated would help interpret the r squared values I am getting. Thank you very much in advance, Michal 


With y = beta*x + e, beta^2 is the Rsquare only if V(y)=1. The general expression is R2 = explained variance/total variance, where total (y) variance adds the residual variance. Or, 1 - standardized residual variance. Explained variance when there are 2 x's is beta1^2*V(x1) + beta2^2*V(x2) + 2*beta1*beta2*Cov(x1,x2). 
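
The two-predictor expression above can be worked through in a few lines of Python. This is a sketch with hypothetical standardized values (predictor variances of 1, so the covariance is a correlation), and the function name `r_square_two_predictors` is not from Mplus. The residual variances are chosen so that the total variance of y is 1 in both cases.

```python
def r_square_two_predictors(b1, b2, var_x1, var_x2, cov_x1x2, resid_var):
    """R2 = explained variance / (explained variance + residual variance)."""
    explained = (
        b1 ** 2 * var_x1
        + b2 ** 2 * var_x2
        + 2 * b1 * b2 * cov_x1x2
    )
    return explained / (explained + resid_var)

# Hypothetical standardized solution, predictors correlated 0.4.
same_sign = r_square_two_predictors(0.5, 0.3, 1.0, 1.0, 0.4, resid_var=0.54)
# Same magnitudes, but the second slope is negative: the cross term
# 2*b1*b2*Cov(x1,x2) now subtracts from the explained variance.
opposite_sign = r_square_two_predictors(0.5, -0.3, 1.0, 1.0, 0.4, resid_var=0.78)
print(same_sign, opposite_sign)
```

With these made-up numbers the explained variance is 0.46 when both slopes are positive and 0.22 when the slopes have opposite signs, because the covariance term changes sign.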


Thank you, this is really helpful! I have a couple more technical questions and a couple more conceptual ones; I hope it is not too long: 1. I used only values from the standardized solution (where variances of x1 and x2 are always 1, and their covariance is the correlation). The resulting r^2s matched those in the output, so I assume this was right. Is there meaning to calculating the above with non-standardized estimates? 2. If one of the betas is positive and the other negative, then the total R^2 is always smaller than when the two effects are in the same direction. Is this correct? I have one high positive effect and one smaller but significant negative effect. The total R^2 is smaller than it would have been with only the large effect, so I assume it is because the effects are opposite. But is this true? 3. Relatedly, in the case of 2 x's, is it possible to determine what proportion of total R^2 stems from each x? 4. In the case of 2 x's, would it be correct to say that, hypothetically, if we could have measured only the unique part of x1 (what it doesn't have in common with x2), then the effect of such a measure on y would be beta1? Thanks very much for your help! Michal 


1. Either is fine. 2-4: You will get a fuller response for these general modeling questions on a discussion forum like SEMNET. 


Hello, We are running a path analysis with one predictor at age 0 to three outcomes at age 4, working through two sets of variables at age 2 and age 4. Each set of variables is the same at age 2 and age 4. We would like to know what proportion of the main effect is explained by each of these two pathways. We used the: outcome IND predictor; command and got a total effect, as well as a total indirect effect. One of our predictor to outcome pathways has a larger indirect effect than the total effect. How can we calculate the proportion of the total effect explained by all mediators and then explained by each of the age 2 -> age 4 -> outcome pathways? How do we deal with indirect effects that work in different directions, thus creating the scenario where the total indirect effect > total effect? Any advice you might have would be greatly appreciated! All the best, Jenna 


Rsquare cannot be divided up in this way. 


Thank you for your quick response. We are hoping to get an estimate of how strong a mediator is / or by how much of the total variance is explained by the mediators. Can we use the indirect/direct effects given by the model to calculate this? All the best, Jenna 


I would not use the proportion when the indirect effect is larger than the total effect. You may want to discuss on SEMNET. 


Dear Dr. Muthen, 
[This is the model input] 
MODEL: 
PA by KPB_m KPE_m; 
OA by KOAA_m KOAB_m; 
MA by KMM_m KMF_m; 
Eng_S by ESS_m ES_m; 
Eng_S on PA OA MA ; 
Eng_S on EV_m; 
KPB_m; KPE_m; KOAA_m; KOAB_m; KMM_m; KMF_m; ESS_m; ES_m; EV_m; 
[This is the Rsquare value that I found on the Mplus output file] 
RSQUARE 
Observed Variable   Estimate   S.E.    Est./S.E.   P-Value 
KPB_M               0.368      0.069    5.360      0.000 
KPE_M               0.397      0.071    5.554      0.000 
KOAA_M              0.690      0.062   11.126      0.000 
KOAB_M              0.456      0.057    7.977      0.000 
KMM_M               0.389      0.056    6.944      0.000 
KMF_M               0.567      0.062    9.190      0.000 
ESS_M               0.581      0.045   12.867      0.000 
ES_M                0.842      0.035   24.269      0.000 
Latent Variable     Estimate   S.E.    Est./S.E.   P-Value 
ENG_S               0.886      0.049   18.066      0.000 
Based on the output, I know that the total explained variance of ENG_S is 88.6%. However, could you please tell me how to calculate how much of the total variance was explained by each of the variables (PA, OA, MA, and EV_m)? How can I partition the explained variance in the model? Thank you in advance for your help. 


Rsquare cannot be partitioned because the covariates are correlated. 


Thank you so much for your answer. In that case, can I conduct a commonality analysis to partition the variance explained? Or is there any way that I can see how much of the variance is explained by each of the variables (PA, MA, OA, and EV_m)? 


You may want to ask these general questions on SEMNET. 
