Mplus Discussion >> Variance Explained


Robert Arnold posted on Saturday, June 02, 2001 - 11:49 am

Hi. I'd like to know how to obtain the proportion
of variance explained for each equation in a
structural equation model. Can someone fill
me in?

Thanks.

Linda K. Muthen posted on Monday, June 04, 2001 - 8:39 am

If you ask for STANDARDIZED in the OUTPUT command, you will get standardized estimates and also an r-square for each dependent variable in the model.

Anonymous posted on Wednesday, November 28, 2001 - 11:49 am

I'm wondering if there's a way to compare the explained variance (R-square values) between different models. Can something like an R-square change test be used with SEM? Thanks for any suggestions.

Linda K. Muthen posted on Thursday, November 29, 2001 - 8:45 am

I am not aware of any particular literature about R-square change in SEM models. You may get a better response by posting this on SEMNET.

Leigh Roeger posted on Monday, January 28, 2002 - 5:28 pm

Hi,

I have run a second-order factor analysis with a Schmid-Leiman transformation.

There are 20 categorical items. These load on four first-order factors and also on one general factor. The correlations between the four first-order factors are set to zero (i.e., orthogonal), and there is no path between the first-order factors and the general factor. I specify STANDARDIZED in the output.

My question is how to calculate the percentage of variance explained by the factors. I have in mind that the general factor should account for a large proportion of the variance explained and the minor factors much less. How do I partition the variance explained from the residual variance for the 20 items that is shown in the output?

Thanks in advance for any advice.

bmuthen posted on Tuesday, January 29, 2002 - 1:12 pm

In this type of model, the general factor influences all items and the specific factors can be seen as residual factors explaining further correlations among subsets of items. Typically, you have all factors uncorrelated. This means that computation of the percentage variance explained by a certain factor is simply using the squared loading times the factor variance. Dividing by the item variance (i.e. y* variance, which is 1 if no covariates), gives the proportion variance in the item explained by the factor in question.
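Below is a minimal sketch of this computation in Python. It assumes uncorrelated factors and no covariates, so the item (y*) variance is 1. The function name and the loadings are hypothetical. They only illustrate the rule: squared loading times factor variance, divided by item variance.

```python
def proportion_explained(loadings, factor_variances, item_variance=1.0):
    """Share of one item's variance explained by each uncorrelated factor.

    loadings: dict, factor name -> loading of this item on that factor
    factor_variances: dict, factor name -> variance of that factor
    item_variance: total item variance (y* variance is 1 with no covariates)
    """
    return {
        name: lam ** 2 * factor_variances[name] / item_variance
        for name, lam in loadings.items()
    }


# Hypothetical item loading on a general factor and on one specific factor,
# with both factor variances fixed at 1.
shares = proportion_explained({"general": 0.7, "specific1": 0.3},
                              {"general": 1.0, "specific1": 1.0})
print(shares)
```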

Leigh Roeger posted on Wednesday, January 30, 2002 - 3:57 pm

Thank you for that advice. Can I follow up with two further questions.

First. I am assuming when you say squared loadings times the factor variance you mean the standardised factor loadings? Is this correct?

The second question is more of a modelling question. The primary purpose of this analysis is to examine whether the general factor accounts for most of the variation in the items and the specific factors much less so. I notice that some analysts (e.g., Gustafsson) fix the variances of all the latent variables to 1. Do you see some advantage in this, or is it a hangover from early LISREL analyses, which used correlation matrices?

Any views most welcome.

Leigh Roeger posted on Wednesday, January 30, 2002 - 4:45 pm

Adding a further question: should the squared factor loadings times the factor variance in the total model (in my case 1 general factor and four specific factors) add to 100%? They don't in my case. Is there a way to calculate this so it does add to 100%? That way you can see the proportion of variance accounted for by the general factor and by the specific factors.

bmuthen posted on Thursday, January 31, 2002 - 11:51 am

You can use the unstandardized loadings. The variance accounted for by the factor plus the residual variance add up to 100%. With categorical outcomes, you get the residual variance printed in the R-square section of the output when requesting Standardized; the total (y*) variance is 1, unless you have covariates in the model. As with CFA in general, you can set the metric of the factor by fixing its variance to one or by fixing one loading at one. If anything is unclear, please send a full output to support and your percentage calculation can be checked out.
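The same bookkeeping can be checked for a whole item. With uncorrelated factors, the item's total variance is the sum of two kinds of terms:

- loading squared times factor variance, one term per factor;
- the residual variance.

Expressing each part as a share of that total therefore gives percentages that sum to 100. The sketch below works under that assumption. All estimates are hypothetical. The general factor's metric is set by fixing its loading at 1, as described above.

```python
def variance_partition(loadings, factor_variances, residual_variance):
    """Percent of an item's total variance due to each factor and to the residual.

    Assumes uncorrelated factors, so the item variance is the sum of
    loading**2 * factor variance over factors, plus the residual variance.
    """
    parts = {name: lam ** 2 * factor_variances[name]
             for name, lam in loadings.items()}
    parts["residual"] = residual_variance
    total = sum(parts.values())
    percent = {name: 100.0 * v / total for name, v in parts.items()}
    return percent, total


# Hypothetical unstandardized estimates: the general factor's loading is fixed
# at 1, so its variance is estimated; the specific factor's loading is free.
percent, total = variance_partition({"general": 1.0, "specific1": 0.5},
                                    {"general": 0.49, "specific1": 0.36},
                                    residual_variance=0.42)
print(total, percent)
```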

esoofi posted on Tuesday, February 19, 2002 - 2:58 pm

Hi Linda,
Is the r-sq shown on page 75 of the User's Guide the same as the one defined on page 288 of Bollen (1989), Structural Equations with Latent Variables? If not, please give me the formula and the reference.
Ehsan

bmuthen posted on Tuesday, February 19, 2002 - 3:01 pm

Yes, they are the same.

Anonymous posted on Thursday, September 26, 2002 - 4:41 pm

(Reposted by Mplus Discussion webmaster due to forum malfunction.)

Hi,
This might be a very basic question, but maybe you can help me? I am working on an SEM with six latent constructs, all with categorical indicators.

I am confused about calculating the variance in a latent variable explained by its indicators. Is the variance/residual variance listed in the results section the variance explained? For the endogenous latent variables, the variance at the end of the residuals section is the variance explained by the other latent variables, isn't it? How can I calculate the variance explained by the indicators for exogenous and endogenous variables?
thanks!

bmuthen posted on Friday, September 27, 2002 - 9:56 am

When you ask for Standardized solution, you get an output section with R-square values for dependent variables. This gives you the proportion of variance explained in these dependent variables, be they observed or latent.

Robert Arnold posted on Tuesday, May 18, 2004 - 5:21 pm

Hi. I'm wondering how I can use the estimated reliability of variables in SEM under Mplus. For example, in a model I would like to run, the final dependent variable is attitude to non-marital cohabitation. I have two indicators, whose correlation gives me a first, albeit rough, indication of their reliability (about .70). Another endogenous variable is church attendance, for which other evidence suggests a reliability in the neighbourhood of .75.

In some other programs, I could declare how much of the variance for each indicator appeared to be unreliable, and the output would be modified accordingly. I'd like to know whether I can do this in Mplus, for exogenous and/or for endogenous variables. Can someone fill me in?

Thanks in advance.

Linda K. Muthen posted on Tuesday, May 18, 2004 - 6:57 pm

You can create a latent variable for the observed variable as follows:

f BY y@1;
y@a;

where a is the error variance in y chosen as
a = (1 - reliability) * sample variance.
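The arithmetic for choosing the fixed value is simple; the sketch below just wraps it in a function. The reliability of .75 is the church-attendance figure from the question. The sample variance of 2.0 and the function name are hypothetical.

```python
def fixed_error_variance(reliability, sample_variance):
    """Error variance a to fix for a single-indicator factor.

    a = (1 - reliability) * sample variance of the observed variable.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return (1.0 - reliability) * sample_variance


# Hypothetical: reliability .75 and a made-up sample variance of 2.0.
a = fixed_error_variance(0.75, 2.0)
print(a)  # prints 0.5
```

The returned value would then be used as the fixed error variance, for example y@0.5 in the MODEL command above.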

Anonymous posted on Thursday, May 27, 2004 - 11:47 am

Hi. I'd like to know how to obtain the proportion of variance explained for the total structural equation model. I believe the software should generate that for me, but I just don't know how to tell it to do that. Could someone help me out?

Thanks.

bmuthen posted on Thursday, May 27, 2004 - 1:03 pm

Isn't it sufficient to consider the R-squares for each equation in the model? Particularly since SEM does not aim to explain variance but correlations (covariances). Other readers' opinions are welcome.

Anonymous posted on Wednesday, November 10, 2004 - 2:44 pm

Hi there,

I noticed that there are no significance values for the estimated correlation matrix for the latent variables. How would I compute significance?

Also, I would like to know the estimated correlation matrix for manifest variables and latent variables. How would I obtain these?

Thanks in advance!

bmuthen posted on Sunday, November 14, 2004 - 11:27 am

To get significance tests for the latent variable correlations, set the metric of the factors by fixing their variances to 1 instead of setting the metric in the loadings.

The estimated correlations between manifest variables and between manifest and latent variables are not given by Mplus but have to be computed from the parameter estimates.
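Take the simplest case: a single factor f with variance psi, and indicators y_j = lambda_j*f + e_j with uncorrelated residual variances theta_j. The model then implies:

- Cov(y_j, f) = lambda_j*psi
- Var(y_j) = lambda_j^2*psi + theta_j
- Cov(y_j, y_k) = lambda_j*lambda_k*psi

Below is a minimal sketch of turning such estimates into correlations. The numbers and the function name are hypothetical. Models with several factors or structural paths need the full model-implied covariance matrix instead.

```python
import math


def implied_correlations(loadings, psi, thetas):
    """Model-implied correlations for a one-factor model with uncorrelated residuals.

    Returns (indicator-factor correlations, indicator-indicator correlation matrix).
    """
    var_y = [lam ** 2 * psi + th for lam, th in zip(loadings, thetas)]
    # corr(y_j, f) = lambda_j * psi / sqrt(Var(y_j) * psi)
    r_yf = [lam * psi / math.sqrt(v * psi) for lam, v in zip(loadings, var_y)]
    n = len(loadings)
    r_yy = [[1.0 if j == k else
             loadings[j] * loadings[k] * psi / math.sqrt(var_y[j] * var_y[k])
             for k in range(n)] for j in range(n)]
    return r_yf, r_yy


# Hypothetical unstandardized estimates (first loading fixed at 1).
r_yf, r_yy = implied_correlations([1.0, 0.8, 1.2], psi=0.5, thetas=[0.5, 0.4, 0.6])
print(r_yf)
print(r_yy)
```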

Anonymous posted on Friday, November 19, 2004 - 2:46 pm

Hi there,

Thanks for your message! To follow up: I noticed in the Mplus manual that the correlations between exogenous manifest variables are fixed in the model estimation. We can obtain these correlations with the SAMPSTAT option, correct? Also, I read that the covariances among continuous latent independent variables and manifest independent variables are free (pertaining to my model's specific case). Does Mplus therefore automatically include these covariances (between exogenous latent and manifest variables) in the model estimation and just not show the results in the MODEL RESULTS section? I believe I should not specify these covariances since they involve exogenous variables. Is this correct? You mentioned above that I would need to compute the estimated covariances between manifest and latent variables from the parameter estimates. Let's say that I have a model with 3 IVs (2 manifest and 1 latent, for simplicity) predicting one continuous latent variable (thus a simple regression). How would I compute the covariance between the latent variable and the first manifest variable, given the information provided by Mplus?

Thanks in advance!

bmuthen posted on Friday, November 19, 2004 - 4:56 pm

Yes on your first question.

Regarding correlations between exogenous manifest and latent variables, I misunderstood the question in your earlier message. I didn't realize you were talking about parameters in an exogenous part of the model, as opposed to more generally how to get correlations between manifest and latent variables. In principle, relations among manifest and latent exogenous variables should be included in a model, either as correlations or as regressions of the latents on the manifests (assuming, as is often the case, that the manifests are demographics). If Mplus does not include such relationships in the output, then they are not estimated, and you need to add them to your model. This is different from the case of correlations among exogenous manifest variables, where the correlations are fixed at the sample values because these would be the estimates if the correlation parameters were free to be estimated. Relationships between manifest and latent variables, however, cannot be anticipated from the sample statistics but need to be estimated. Hope that clears it up.

Anonymous posted on Friday, November 26, 2004 - 11:45 am

Hi,

I'm sorry if it's a stupid question, but I read the discussion and I still don't understand how you can calculate the explained variance for each factor in a CFA. I'm doing a CFA with 18 binary indicators and 3 factors. When I type output=stand, I get the R-square for each indicator but not for each factor.

Just to add, the factor variances are fixed to 1.

thanks

Linda K. Muthen posted on Friday, November 26, 2004 - 12:24 pm

The factor indicators are dependent variables in the factor analysis model. The factors are independent variables. You obtain r-squares only for dependent variables.

Anonymous posted on Friday, November 26, 2004 - 12:27 pm

So is it possible to have the R-square for each factor? I mean, do you know how I can calculate it?

thanks

Linda K. Muthen posted on Saturday, November 27, 2004 - 10:26 am

A factor does not have an r-square because it is not a dependent variable in a CFA model. Perhaps you mean the variance contributed by each factor to the observed variable variances. If so, you sum the squared factor loadings from an orthogonal (factors uncorrelated) solution and divide by the number of observed factor indicators.
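Below is a minimal sketch of this calculation. It assumes an orthogonal solution with standardized loadings, so every indicator has variance 1. The 6-indicator, 2-factor loading matrix is hypothetical. The sum of squared loadings is divided by the total number of indicators in the model. With that divisor, the factor shares plus the unexplained share add up to 1.

```python
def factor_variance_shares(loadings):
    """Proportion of total standardized indicator variance contributed by each factor.

    loadings: list of rows, one per indicator, each a list of standardized
    loadings on orthogonal factors (0 where an indicator does not load).
    """
    n_indicators = len(loadings)
    n_factors = len(loadings[0])
    shares = [sum(row[f] ** 2 for row in loadings) / n_indicators
              for f in range(n_factors)]
    unexplained = 1.0 - sum(shares)
    return shares, unexplained


# Hypothetical simple structure: 6 indicators, 2 orthogonal factors.
lam = [[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
       [0.0, 0.9], [0.0, 0.5], [0.0, 0.7]]
shares, unexplained = factor_variance_shares(lam)
print(shares, unexplained)
```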

Anonymous posted on Saturday, November 27, 2004 - 8:11 pm

Hi,

I have a question regarding the consequences of specifying covariances between exogenous latent and manifest variables. I specified type = missing h1, estimator = ml, and algorithm = integration for my model. I understand that one must specify covariances between exogenous latent and manifest variables. I did this, but when I did, the model output included the covariances between exogenous manifest variables (not specified in my input). I read in the Mplus discussion that one should not specify covariances between exogenous manifest variables. I did not specify them, yet Mplus produced them in the Model Results. Is it the case that with algorithm=integration Mplus does not use the sample values? I just want to make sure I am not misspecifying my model.

Also, I read in the manual that one cannot specify missing h1 with algorithm=integration. I specified this and it ran. Can you explain this? Is it really running missing plain then? Further, how should one decide which info matrix to use (e.g., observed, expected, or comb)?

Thanks in advance!

Linda K. Muthen posted on Sunday, November 28, 2004 - 8:44 am

I would have to see your full output to answer your question. Please send it to support@statmodel.com. H1 is on by default with ALGORITHM=INTEGRATION. Use the information matrix that is the default.

Anonymous posted on Monday, November 29, 2004 - 4:54 am

Hi,

Yes, I meant the variance contributed by each factor to the observed variable variances.

So I sum "the squared factor loadings from an orthogonal (factors uncorrelated) solution and divide by the number of observed factor indicators". But when I do that, I get 0.66 for the first factor, 0.63 for the second factor, and 0.72 for the third factor.
As you can see, if I add up the variance contributed by the 3 factors, I get more than 100%!
I think there is something wrong, no?

Just to be precise: the 3 factors are uncorrelated.

Thanks again for your help

Linda K. Muthen posted on Monday, November 29, 2004 - 8:10 am

If I remember correctly, you are doing a CFA. In this case, you should be squaring the standardized factor loadings I believe. In CFA, a covariance matrix is analyzed. In EFA, a correlation matrix is analyzed.

Patrick Malone posted on Friday, June 10, 2005 - 7:54 am

Good morning.

Is it possible to save the r-squared values out to a file? I'm doing some rather elaborate file manipulation and that would be very helpful. I haven't found anything in the SAVEDATA documentation.

Thanks.

Linda K. Muthen posted on Friday, June 10, 2005 - 9:00 am

The RESULTS option of the SAVEDATA command does not currently save r-square values. I will place your request in our suggestion file for the future.

Patrick Malone posted on Friday, June 10, 2005 - 10:48 am

Ok, thanks.

Alicia posted on Monday, June 13, 2005 - 11:40 am

I'm doing GGMM and interested in how much variance I am explaining in the observed variables (V308-V4708 in this analysis).

I specify STANDARDIZED in the output command, but the R-SQUARE section lists only each class followed by a blank line.

Here is my input:

VARIABLE:
NAMES ARE V106 V350 CLGSTAT V308 V1708 V2708 V3708 V4708 C35152 C35153
C35154 C35155 C35156 C35157 C35158 C35159 C35160 C35161
C35162 C35163 C35164 C35165
C35166 C35197 C35199 C35200 C35201 ;
CENSORED = v308(b) v1708(b) v2708(b) v3708(b) v4708(b) C35152(b) C35153(b)
C35154(b) C35155(b) C35156(b) C35157(b) C35158(b) C35159(b)
C35160(b) C35161(b) C35162(b) C35163(b) C35164(b) C35165(b)
C35166(b) C35197(b) C35199(b) C35200(b) C35201(b);
USEVARIABLES ARE v308 v1708 v2708 v3708 v4708 C35152 C35153 C35154
C35155 C35156 C35157 C35158 C35159 C35160 C35161
C35162 C35163 C35164 C35165 C35166 C35197 C35199 C35200
C35201 ;
WEIGHT IS v106;
CLASSES = c (5);
ANALYSIS: TYPE = MIXTURE;
MITERATIONS = 500;

MODEL: %OVERALL%
disorder BY C35152@1
C35153*.7 C35154*.7 C35155*.7 C35156*.7 C35157*.7 C35158*.7
C35159*.7 C35160*.7 C35161*.7 C35162*.7 C35163*.7 C35164*.7
C35165*.7 C35166*.7 C35197*.7 C35199*.7 C35200*.7 C35201*.7;

i s q| v308@0 v1708@1 v2708@2 v3708@3 v4708@4;

%c#1%
[i*5.5];
[s*1];
[q@0];
[disorder];

%c#2%
[i*5.5];
[s*-3];
[q@0];
[disorder];

%c#3%
[i*-3];
[s*3];
[q*-5];

%c#4%
[i*6];
[s@0];
[q@0];
[disorder];

%c#5%
[i*0];
[s@0];
[q@0];
[disorder];

OUTPUT: TECH1 TECH12 RESIDUAL STANDARDIZED;

Linda K. Muthen posted on Tuesday, June 14, 2005 - 9:21 am

This is a support question. Please send your output and license number to support@statmodel.com.

Anonymous posted on Thursday, June 30, 2005 - 10:27 am

I used Mplus with ML to estimate a path model without latent variables. In one model, I had only 1 DV and 5 IV's. The R^2 was .882. In a second model, I had this 1 DV and the same 5 IV's in addition to other variables in the model, some of which predicted two of the IV's mentioned above (with no additional predictors of the DV mentioned above). The R^2 for this same DV in the 2nd model was .802.

Why the difference in R^2 for this one DV when exactly the same predictors were used to predict this DV in both models? The regression coefficients and standard errors relating to this one DV were identical in both models.

BMuthen posted on Saturday, July 02, 2005 - 5:50 pm

The second model that you specify sounds like it is over-identified. This may make the estimated covariance matrix for the initial set of 5 iv's different than it was in the first model. You can see if this is the case by requesting RESIDUAL in the OUTPUT command for the second model and comparing it to the SAMPSTAT values for the five iv's in the first model.
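A small numerical illustration of why this matters. R-square for a dependent variable is a ratio:

- numerator: its explained variance b' Sigma b, where Sigma is the model-implied covariance matrix of its predictors;
- denominator: that same quantity plus the residual variance.

The sketch below keeps the slopes and the residual variance fixed and changes only Sigma. All values are hypothetical and not taken from the poster's models.

```python
def r_square(b, cov_x, residual_variance):
    """R-square = b' Sigma b / (b' Sigma b + residual variance)."""
    explained = sum(b[i] * cov_x[i][j] * b[j]
                    for i in range(len(b)) for j in range(len(b)))
    return explained / (explained + residual_variance)


b = [0.5, 0.4]                          # same slopes in both hypothetical models
resid = 0.2                             # same residual variance in both models
cov_model1 = [[1.0, 0.5], [0.5, 1.0]]   # predictor covariances reproduced exactly
cov_model2 = [[1.0, 0.2], [0.2, 0.8]]   # different model-implied covariances
print(r_square(b, cov_model1, resid), r_square(b, cov_model2, resid))
```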

Anonymous posted on Thursday, September 15, 2005 - 12:22 pm

Linda,

Where is the error term for each indicator in the Mplus output? Is it the S.E. column?
Thanks

Linda K. Muthen posted on Thursday, September 15, 2005 - 2:28 pm

Under residual variances. The column S.E. contains the standard errors of the parameter estimates.

yulan han posted on Tuesday, November 08, 2005 - 6:05 pm

Hi Linda,
Could you tell me how to make the error terms of two variables correlated in Mplus?
Thanks.

Linda K. Muthen posted on Saturday, November 12, 2005 - 5:50 pm

y1 WITH y2; where y1 and y2 are dependent variables in the model. See the WITH command in the Mplus User's Guide.

HWard posted on Friday, March 10, 2006 - 9:20 am

I am running models in which a dichotomous 'obesity' variable is the final dependent variable. Obesity is regressed on education, activity, diet, age, and smoking status, with indirect pathways running from education to obesity through activity, smoking, and diet. I have run separate models for men and women and found significant parameters associated with obesity for women only. However, the R-square values for the obesity variable are often higher for men than for women. This doesn't make sense to me, as none of the variables were significantly related to obesity among men. Any thoughts on why this might be happening? Education and activity are categorical, diet is a latent variable, age is continuous, and smoking is entered into the model as two dummy variables.

Linda K. Muthen posted on Friday, March 10, 2006 - 9:47 am

R-square is variance explained. There may be less variance for one group versus the other and less power for one group versus the other. The obesity variables may behave differently for males versus females.

marina saiz posted on Monday, March 13, 2006 - 2:49 pm

I have the following SEM model
F1 -------------> F2 ---------> F5
F2------------> F4
F3-----------> F4-----------> F5
F3----------------------> F5

One of my thesis questions was that F2, F3, and F4 have a direct effect on F5. F5 is a single-indicator latent variable.

When I run the model, all the paths are significant apart from the F4----> F5

When I look at the R-square values, F2 and F3 explained 88% of the variance of F4, and F2 and F3 explained 94% of the variance of F5.
Can I explain the non-significant path by the fact that the remaining variance of F4, not explained by F2 and F3, is too small to add a unique contribution to F5? Besides, F2 and F3 explain F5 to a great extent (94%), so there is little variance left for any unique contribution of F4. Are my arguments correct?

I have an additional question. If I employ multiple regression, either hierarchical or stepwise, all the observed variables are significant in predicting F5. However, if I run the SEM, the variables for F4 ---> F5 are not significant. What are the possible reasons?

Bengt O. Muthen posted on Tuesday, March 14, 2006 - 5:45 am

The non-significant direct effect of F4 on F5 may be due to F4 being highly correlated with F2 and F3, so that F4 does not contribute much in the F2, F3, F4 prediction of F5.

In your multiple regressions you predict F5 using observed variables. The factors of the SEM don't contain the measurement errors in those observed variables so you get different results. Another explanation could be that your SEM model does not fit the data well.

Johanna Klaus posted on Tuesday, March 14, 2006 - 11:55 am

I want to be able to report the variance explained by each of 5 exogenous variables in other model components. My understanding is that if the exogenous variables are specified to be orthogonal and are the only predictors of a model variable, I can simply square the StdYX. But what if the exogenous variables are correlated, or what if other model variables also predict the variable? Can the StdYX be interpreted as a semipartial correlation?

Bengt O. Muthen posted on Tuesday, March 14, 2006 - 5:46 pm

Why don't you simply report the R-square that you get when requesting the standardized solution?

Johanna Klaus posted on Wednesday, March 15, 2006 - 12:06 pm

Thanks for the quick reply. Actually, I am reporting the R-square values, but I was wondering if it is possible to somehow partial the variance between the exogenous variables, i.e., is there a semipartial correlation equivalent in SEM?

Bengt O. Muthen posted on Thursday, March 16, 2006 - 7:49 am

I think it would be hard to partial out variance explained contributions from correlated predictors.

Amy Bleakley posted on Thursday, January 18, 2007 - 11:48 am

Hi-

I am estimating a non-recursive, multiple group model. The models do not include any latent variables. Two questions:

1. Which method does Mplus use to calculate the r-squared for non-recursive models?

2. For one of the groups, I am obtaining an undefined r squared. The models have excellent fit, and negative variance (with both error variances and the estimated variances) isn't an issue. Do you have any idea what the problem may be? What diagnostics would you suggest?

Thanks!

Linda K. Muthen posted on Friday, January 19, 2007 - 10:24 am

1. Same as for recursive models.

2. Please send the input, data, output, and your license number to support@statmodel.com. Usually this is caused by a negative residual variance.

Peggy Clements posted on Tuesday, June 26, 2007 - 12:18 pm

This is a follow-up to Bleakley's question posted on 1/18/2007. I'm taking Linda's response to mean two things. First, the R-squares reported in the output for a nonrecursive model represent the proportion of variance explained. Second, I do not have to calculate an alternative R-square using the Bentler & Raykov (2000) equation. Bentler & Raykov contend that in nonrecursive models, explained variance is not simply summarized by a standard R-square measure because of the reciprocal interdependencies among variables.

Thanks.

Bentler, P. M., & Raykov, T. (2000). On measures of explained variance in nonrecursive structural equation models. Journal of Applied Psychology, 85, 125-131.

Linda K. Muthen posted on Tuesday, June 26, 2007 - 1:21 pm

Mplus does not provide the Bentler and Raykov R-square for nonrecursive models. If you feel that would be better for your purposes, you would need to calculate that by hand.

Peggy Clements posted on Friday, June 29, 2007 - 12:58 pm

Is calculating the Bentler and Raykov R-square for nonrecursive models something that you and Bengt typically recommend? I don't know whether it would be better for my purposes or not (or even how I would figure that out). Thanks.

Linda K. Muthen posted on Friday, June 29, 2007 - 1:58 pm

We don't know enough about it to recommend it or not. We typically don't work with nonrecursive models. Are the arguments in the article compelling?

Linda K. Muthen posted on Friday, June 29, 2007 - 2:23 pm

An afterthought. If you calculate both, you can see how different they actually are.

Annie Desrosiers posted on Wednesday, July 11, 2007 - 10:54 am

Hi,

I have a question about SEM and explained variance.

I have this kind of model:

F1 by y1-y3;
F2 by y4-y6;
F3 by y7-y9 ;
F3 on f1 f2 ;
Dropout ON f2 ;

First, is it possible to regress a dichotomous variable (dropout) ON a factor?

In the output, I have an R-square of 0.739 for the variable dropout.
Is it correct to say that 74% of the variance of dropout is explained by the model?

Thank You

Linda K. Muthen posted on Wednesday, July 11, 2007 - 11:23 am

Yes it is possible to regress a dichotomous variable on a factor.

It is the variance explained by the set of covariates that the dependent variable was regressed on, not by the model. With a dichotomous outcome, it is the variance explained in the underlying latent variable, not in the observed variable.
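Below is a sketch of the latent-response R-square for a binary outcome. It assumes the usual formulation, in which the residual variance of the underlying y* is fixed at 1 for probit and at pi^2/3 for logit. The slope and factor variance are hypothetical. The same slope is plugged into both links only to show the formula; in practice, probit and logit slopes are on different scales.

```python
import math


def latent_response_r_square(explained_variance, link="probit"):
    """R-square for the continuous latent response y* underlying a binary outcome.

    explained_variance: variance of the linear predictor, b' Sigma b.
    The residual variance of y* is fixed by the link: 1 for probit,
    pi**2 / 3 for logit.
    """
    residuals = {"probit": 1.0, "logit": math.pi ** 2 / 3}
    if link not in residuals:
        raise ValueError("link must be 'probit' or 'logit'")
    return explained_variance / (explained_variance + residuals[link])


# Hypothetical: dropout regressed on a factor with slope 1.5 and factor variance 1.
v = 1.5 ** 2 * 1.0
print(latent_response_r_square(v, "probit"), latent_response_r_square(v, "logit"))
```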

Annie Desrosiers posted on Wednesday, July 11, 2007 - 11:36 am

Is it possible to know whether my model explains the variable dropout well?

Thank You

Linda K. Muthen posted on Wednesday, July 11, 2007 - 5:09 pm

The fit of the full model to the data can be assessed using fit statistics like Chi-square, RMSEA, etc. The variability in dropout is not explained by the model but by the covariates for dropout.

Ricardo Villarreal De Silva posted on Thursday, July 19, 2007 - 10:24 am

I have 2 latent predictors (categorical) and their interaction predicting an observed outcome (continuous). Is there any way to get R-square?

Linda K. Muthen posted on Thursday, July 19, 2007 - 1:37 pm

R-square has not been defined for this application.

Paul Silvia posted on Thursday, November 29, 2007 - 10:35 am

Hi. A quick question---how does Mplus 5 compute the standard error of R-square? (I've checked the UG and Technical Appendices.)

Bengt O. Muthen posted on Thursday, November 29, 2007 - 8:12 pm

Using the "Delta method", where the variance of R-square is computed from the variances and covariances of the parameter estimates that contribute to R-square. The Version 5 technical document about standardizations describes the Delta approach in general.
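Below is a minimal sketch of the Delta method for the simplest case, y = b*x + e. There, R-square = b^2 Var(x) / (b^2 Var(x) + theta). Var(x) is treated as known here, and the estimates and their covariance matrix are hypothetical. The general approach applies the same idea to all parameters that contribute to R-square.

```python
import math


def r_square_and_se(b, theta, var_x, acov):
    """Delta-method standard error of R-square for y = b*x + e.

    b, theta: slope and residual variance estimates.
    var_x: variance of x, treated as known (not estimated) in this sketch.
    acov: 2x2 covariance matrix of the estimates (b, theta).
    """
    explained = b ** 2 * var_x
    total = explained + theta
    r2 = explained / total
    # Partial derivatives of R-square with respect to b and theta.
    grad = [2 * b * var_x * theta / total ** 2, -explained / total ** 2]
    var_r2 = sum(grad[i] * acov[i][j] * grad[j]
                 for i in range(2) for j in range(2))
    return r2, math.sqrt(var_r2)


# Hypothetical estimates and their covariance matrix.
r2, se = r_square_and_se(b=0.6, theta=0.64, var_x=1.0,
                         acov=[[0.0025, 0.0], [0.0, 0.004]])
print(r2, se)
```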

Susan Seibold-Simpson posted on Tuesday, June 10, 2008 - 9:07 am

Hi Linda and Bengt:
I have tested a model with one exogenous latent construct, controlling for 3 other observed measures. My advisor has asked me how much the one exogenous latent construct adds to the explained variance of the final DVs above and beyond the 3 control variables. After reviewing this thread, I'm not sure whether the variance can be parcelled out. I will also review the SEMNET archives to see if I can figure this out. Thanks, Sue

Linda K. Muthen posted on Tuesday, June 10, 2008 - 4:01 pm

We do not know of any way to disentangle this.

Bengt O. Muthen posted on Tuesday, June 10, 2008 - 4:13 pm

The exogenous factor is presumably correlated with the observed exogenous control variables. It therefore influences the variance explained not only by itself but also through the control variables, which makes its contribution too complex to separate out.

Susan Seibold-Simpson posted on Wednesday, June 11, 2008 - 9:52 am

As always, I am most appreciative.
Thank you again. Sue

Richard E. Zinbarg posted on Monday, October 06, 2008 - 2:16 pm

I have run 2 models each containing a total of 14 factors. In the one model, I regressed one of the factors (factor #1) on one of the others (factor #2). In the other model, I regressed factor #1 on both factor #2 and a third factor (factor #3). The R-square for the latent variable factor #1 is larger in the first model (with one fewer predictor) and this makes no sense to me. I have not constrained the measurement model parameter estimates to be equal across the two models, might this have something to do with my nonsensical result? One other strange aspect of the output is that the modification indices for both models include output regarding a measurement model parameter that is already being freely estimated.

Linda K. Muthen posted on Monday, October 06, 2008 - 4:07 pm

I would need more information to answer your question. Please send both outputs and your license number to support@statmodel.com.

jmaslow posted on Friday, December 04, 2009 - 1:57 pm

I have a fairly simple model in which 2 latent exogenous variables predict 2 latent outcome variables. The latent outcomes are correlated with each other, and each has only 1 indicator. I have requested standardized output, but I do not receive R-square values for the latent outcome variables, only for each of my observed variables (which are the indicators of the independent and dependent latents). How can I calculate R-square for each dependent variable?

Linda K. Muthen posted on Friday, December 04, 2009 - 3:47 pm

Please send your full output and license number to support@statmodel.com.

Selahadin Ibrahim posted on Monday, June 14, 2010 - 7:04 am

Hi Linda,
I am using a second-order latent factor (measured by three first-order factors) as a mediator between socio-demographic and health variables and an outcome. After running the hypothesized model, the modification indices suggest paths from the socio-demographic and health variables to the first-order factors of the mediating factor, and also paths from the first-order factors of the mediating factor to the outcome.

Does this suggest the first-order factors are more important in predicting the outcome than the second-order factor? Is it suggesting that we use the three first-order factors instead of one second-order factor?

thanks

Linda K. Muthen posted on Monday, June 14, 2010 - 10:48 am

The modification indices show where freeing parameters will improve model fit. Whether the parameters should be freed should be guided by theory.

Mike Zyphur posted on Thursday, September 16, 2010 - 8:40 pm

Hi Linda and Bengt,
Quick question: relevant to the notion of semi-partial/part correlations, can you think of any way to work directly with residuals without influencing the part of the model influencing a variable Y directly? I cannot.

Setting up a latent variable for residual as

e by Y@1;
Y@0;

is not adequate and I can think of no other way.

Thanks for any help
mike

Linda K. Muthen posted on Friday, September 17, 2010 - 8:34 am

Here's one example we have:

The following approach can be used to obtain the residual for a factor indicator y4 and to regress a variable z on this residual.

MODEL:
f BY y1-y4;
y4res BY;         ! define the residual as a factor, picking up the
                  ! residual variance as the variance of this factor
y4 ON y4res@1;    ! let the residual influence y4 with the requisite unit slope
y4@0;             ! fix the original residual variance to zero
z ON y4res;
y4res WITH f@0;

Mike Zyphur posted on Saturday, September 18, 2010 - 5:51 am

Hi Linda,
Thanks for the input and your time. With the y4res model, it seems that in many ways specifying the residual of y4 as a factor y4res is very similar to not having the factor there. For example if we have two models:

Y1 on X1 X2;

versus

Y1 on X1;
Y1@0; Y1res by; Y1 on Y1res@1;
Y1res on X2;

they will return equivalent parameter estimates inasmuch as Y1 is simultaneously regressed on X1 and X2 (through Y1res). I can't think of how to do it, but I want to specify a structural model for the residual after other parts of the model have been estimated--like a semi-partial correlation. With covariances this is easily done as

Y1 on X1;
Y1 with X2;

but extending this to allow regression among the residual and X2, after estimating the effect of X1 on Y1, is tough. A 2 stage procedure is likely required, where one would save the residuals and then use them as data. Thoughts?

Thanks!!
mike

Linda K. Muthen posted on Saturday, September 18, 2010 - 7:52 am

Maybe the dependent variable needs to be a factor with multiple indicators.

Mike Zyphur posted on Sunday, September 19, 2010 - 9:06 am

Sorry, I missed it before, but the solution seems as follows, and does require the logic of latent variables for residuals. Here are a few different types of models. For semi-partial correlations where the goal is r_xy partialing for Z:

Y on Z;
Y with X;

For changes in R^2 in step-wise regression, where a first step contains Z and the second step adds X, we need a semi-partial regression coefficient partialing for Z, which can be done with

X on Z;
X@0; X_res by; X on X_res@1;
X_res on Y;

and we can add additional "steps", for example with a variable W in the third step as

X W on Z;
W on X;
X@0; X_res by; X on X_res@1;
X_res on Y;
W@0; W_res by; W on W_res@1;
W_res on Y;

Then we can compute changes in R^2 in a MODEL CONSTRAINT statement, where we divide the residual variance of Y by the total variance of Y.

For future reference, forum goers can find the details in Preacher (2006). Happy modeling!

Preacher, K. J. (2006). Testing complex correlation hypotheses with structural equation models. Structural Equation Modeling, 13, 520-543.

luke fryer posted on Wednesday, September 29, 2010 - 9:32 pm

I would like to ask about a post from 2004:

"Anonymous posted on Thursday, May 27, 2004 - 11:47 am
Hi. I'd like to know how to obtain the proportion of variance explained for the total
structural equation model. I believe the software should generate that for me, but just don't know how to tell it to do that. Could someone help me out?

Thanks."

Dr. Muthen's reply suggested that it would be sufficient to consider the R-Square for each part of the model... I think...

In my case, I am modeling a test of Vocabulary knowledge (second language learners) and in addition to testing the dimensions of vocabulary recall, I wish to determine the proportion of variance (from the entire data set) explained by the working model.

Confirmatory factor analysis with Mplus of the test suggests that our model fits the data well (dichotomous data). But because it is a test, we need to know whether the model sufficiently explains the variance in the data, that is, whether the model as a whole and the factors individually are meaningful with regard to variance explained.

Is there some way to do this within Mplus?

Thank you

luke

Linda K. Muthen posted on Thursday, September 30, 2010 - 5:37 am

Mplus provides R-square for each dependent variable in the model. Model fit assesses the closeness of the H0 to the H1 model.

Gawaian Bodkin-Andrews posted on Sunday, February 13, 2011 - 9:14 pm

Hi,

I've tried to read through most of the posts here, but I haven't quite found what I'm looking for, so please forgive me if it's a repeat.

In a basic SEM, I'm trying to estimate the Variance Explained for each individual path (with multiple paths of course), in addition to the total variance explained (which I know how to get in Mplus). Now, I'm a regular user of LISREL (don't shoot me, I'm trying to convert), and I've found that by simply multiplying the value in the 'Correlation matrix of the ETA' with the BETA path, you get the amount of VE to that predictive path only (this is validated by adding them together and getting the same value as the Sum of Squared Multiple Correlations for the dependent variable).

My question is: can Mplus provide output equivalent to the 'Correlation matrix of the ETA' provided by LISREL? If not, is there any way to compute individual path VEs from Mplus?

I only ask because some journals love VE estimates as they think we get too excited about significance for any one path.

Cheers,

Linda K. Muthen posted on Monday, February 14, 2011 - 8:28 am

The TECH4 option of the OUTPUT command will give you the correlations of the factors.
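As a minimal sketch of the LISREL-style computation described in the question, assuming standardized (STDYX) slopes and a correlation matrix among the predictors (for latent predictors, the factor correlations that TECH4 reports), each predictor's product beta_j * r(y, x_j) can be computed as below. The products add up to the R-square, but with correlated predictors this is only one convention (often called Pratt's measure), not a unique partition, and an individual product can be negative. The function name and the numbers are hypothetical.

```python
def pratt_products(betas, corr):
    """Return [beta_j * r(y, x_j)] and their sum for standardized slopes.

    betas: standardized slopes of y on each predictor (hypothetical input).
    corr:  correlation matrix among the predictors.
    """
    k = len(betas)
    # Model-implied correlation of y with predictor j: sum_m beta_m * r(x_j, x_m)
    r_y = [sum(betas[m] * corr[j][m] for m in range(k)) for j in range(k)]
    products = [betas[j] * r_y[j] for j in range(k)]
    return products, sum(products)


# Hypothetical example: two predictors correlated at 0.4
products, r_square = pratt_products([0.5, -0.2], [[1.0, 0.4], [0.4, 1.0]])
print(products, r_square)
```

The sum equals beta' R beta, the standardized explained variance, which is why it matches the R-square in the output.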

sunnyshi posted on Monday, March 07, 2011 - 8:29 pm

Dear Mplus,
I am running an SEM with a latent interaction term. Although I requested standardized output, because of TYPE=RANDOM the message says "STANDARDIZED (STD, STDY, STDYX) options are not available for TYPE=RANDOM." In this case, how could I get the R-square for each dependent variable? Thanks!

Best

Bengt O. Muthen posted on Monday, March 07, 2011 - 8:31 pm

See our FAQ

"The variance of a dependent variable as a function of latent variables that have an interaction is discussed in Mooijaart and Satorra"

Alicia E Moss posted on Tuesday, March 08, 2011 - 3:39 pm

I was hoping you could help me with a question regarding percent variance accounted for in EFA. I have 3 factors, varimax rotated. I understand that this is calculated by summing the squared loadings for each factor and dividing this by the number of items, but my questions are:

1) Do I zero out the loadings that are under my cutoff criteria (.4) before summing?
2) Do I divide by the total number of initial items entered in the analysis, or by the number of items that were retained?

Bengt O. Muthen posted on Tuesday, March 08, 2011 - 6:32 pm

1. You don't delete any loadings.

2. You use the number of variables on the USEVARIABLES list.
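Following the answer above (keep every loading and divide by the number of variables on the USEVARIABLES list), here is a small sketch for an orthogonal rotation such as varimax, assuming standardized items so that each item's variance is 1. The function name and loadings are made up for illustration; with an oblique rotation the factors overlap and these percentages would not add up cleanly.

```python
def percent_variance_by_factor(loadings):
    """Percent of total item variance accounted for by each orthogonal factor.

    loadings: one row per analyzed variable, one column per factor.
    Every loading is kept, including those below a salience cutoff.
    """
    n_items = len(loadings)
    n_factors = len(loadings[0])
    return [
        100.0 * sum(row[f] ** 2 for row in loadings) / n_items
        for f in range(n_factors)
    ]


# Hypothetical 4-item, 2-factor varimax solution
print(percent_variance_by_factor([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]]))
```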

Stephanie Fitzpatrick posted on Wednesday, March 09, 2011 - 8:40 am

Hello,

I have read through this post, but I'm still not sure if my question has actually been answered. I am trying to calculate the variance explained by ENERGY (a latent variable) predicting pewl6m while controlling for TX, BMI, and Age. I know the R^2 provides the proportion of variance explained by the whole model, but I want to know only what ENERGY is contributing. See syntax below.

ENERGY BY MR47-MR56;

ENERGY ON TX;
pewl6m ON ENERGY TX BMI AGE;
BMI ON TX;
BMI;
AGE;

MR50 WITH MR48;
MR54 WITH MR53;

Also, for the R^2 I do not get a p-value in the output. How can I tell if the R^2 is significant?

Thank you.

Linda K. Muthen posted on Wednesday, March 09, 2011 - 10:12 am

We don't give a contribution for each covariate only for the set of covariates.

The current version of Mplus provides p-values for R-square.

Stephanie Fitzpatrick posted on Wednesday, March 09, 2011 - 7:35 pm

Hello again,

Thank you for your response, but is there a way to calculate proportion of variance for a specific predictor using the standardized coefficient or residual variances? Also, I am using Mplus version 6, but the p-values are not showing up for R-square. Do I need to type Standardized in the Output section?

Thanks

Linda K. Muthen posted on Wednesday, March 09, 2011 - 8:16 pm

I am unaware of a way to obtain R-square for one of a set of predictors.

You will not get R-square unless you ask for STANDARDIZED in the OUTPUT command.

Jaimee Heffner posted on Thursday, June 16, 2011 - 1:40 pm

I ran a path model including only observed variables. For the outcome variable, Mplus calculated an R-square of 0.76. This seemed high to me, so I ran a simple multiple regression analysis using the same 4 predictors in SPSS, and I got an R-square of 0.31. Do you know why there would be such a considerable discrepancy?

Linda K. Muthen posted on Thursday, June 16, 2011 - 2:25 pm

It would be impossible to say without seeing both outputs. The most likely reason is that the samples are not the same. If you would like us to look at it, send the two outputs and your license number to support@statmodel.com.

Dave posted on Tuesday, April 24, 2012 - 11:25 am

Hello, I would appreciate your guidance on three points. I have been asked to support the inclusion of a latent variable in a model predicting a binary outcome after controlling for other variables, for example with the R-square for one of a set of predictors.

One approach I have experimented with is constraining the prediction for the latent variable to zero, looking at the R-square for the rest of the model, and then releasing the constraint and checking for a difference in the R-square value. Ignoring the problem I address next, is this a valid approach, and would an observed difference have meaning?

Second, I recognize that for binary variables Mplus calculates "the variance explained of the underlying latent variable not the observed variable". I think my binary variable does not have an underlying latent variable (it is an either/or variable, e.g. pregnant or not, quit or not). For this type of variable, does McKelvey & Zavoina pseudo R-squared make sense?

Do you have a suggestion for how to figure out something like the change in the model's ability to properly classify cases based on including the latent variable or not? Thanks.

Linda K. Muthen posted on Wednesday, April 25, 2012 - 10:55 am

I would not use R-square for this purpose particularly with categorical variables. In the logistic regression literature, classification quality is used for this type of decision. You can look at the classification quality of the model with the factor and without the factor. You do this as follows.

1. Create an estimated probability for each person's observed 0 or 1.
2. Create a classification for each person based on the rule 0 if estimated probability less than or equal to .5 and 1 if estimated probability is greater than .5
3. For the estimated models, compute the estimated probability for each person and create a classification according to number 2.
4. Compare the classifications from the two estimated models to the observed scores and see which model has the best classification.
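The classification comparison in steps 1-4 can be sketched as follows, assuming each person's estimated probability of a 1 is already available from each model. The probabilities and function names below are invented for illustration; the 0.5 cutoff is the rule from step 2.

```python
def classify(probabilities, cutoff=0.5):
    """Step 2: 0 if the estimated probability is <= cutoff, 1 otherwise."""
    return [1 if p > cutoff else 0 for p in probabilities]


def proportion_correct(observed, probabilities, cutoff=0.5):
    """Step 4: share of people whose predicted class matches the observed 0/1."""
    predicted = classify(probabilities, cutoff)
    hits = sum(1 for obs, pred in zip(observed, predicted) if obs == pred)
    return hits / len(observed)


observed = [0, 1, 1, 0, 1]
p_without_factor = [0.40, 0.60, 0.45, 0.55, 0.70]  # hypothetical model without the factor
p_with_factor = [0.20, 0.80, 0.65, 0.30, 0.90]     # hypothetical model with the factor
print(proportion_correct(observed, p_without_factor))
print(proportion_correct(observed, p_with_factor))
```

Overall proportion correct is the simplest summary; a full cross-tabulation of observed by predicted classes shows where each model errs.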

Heike Link posted on Friday, April 27, 2012 - 4:54 am

Hello,

I am rather new to Mplus and have a question:

I have estimated a model which consists of an ordinary logit model and a latent variable part, whereby both parts were estimated jointly by using MLR with Monte Carlo simulation. The structure of my model is below:

Y ON x1-x5 (Y is a binary variable, x1-x5 are continuous predictors)
Y ON LV2
LV1 BY i1 i2 (i1 and i2 are continuous indicators)
x6 x7 ON LV1 (x6, x7 are exogenous variables, continuous)
LV2 BY i3 i4 i5 i6 (i3-i6 are categorical indicators)
x8 x9 ON LV2 (x8, x9 are exogenous variables, continuous)
LV2 ON LV1.

The model was aimed at providing a better explanation of Y by including latent variables, and this worked out well: the parameters are significant and can be interpreted well. However, when I compare the simple logit part (Y ON x1-x5, estimated separately) with the results of my model above, I see on the one hand an increase in R-square (it doubles) but on the other hand a dramatic fall in the loglikelihood and in AIC and BIC, which I had not expected.
Of course, I have tested the latent variable part separately with CFA using WLSMV and get reasonable fit indices (CFI 0.987, TLI 0.982, RMSEA 0.028).
I would be grateful on any explanation on this.

Linda K. Muthen posted on Friday, April 27, 2012 - 9:02 am

The two models have different sets of dependent variables. This is why the loglikelihoods are different. They are in different metrics. They should not be compared.

Dave posted on Friday, April 27, 2012 - 3:19 pm

Hello Linda,

Thank you for your guidance on my questions regarding change in a model's ability to classify cases. I have a couple of follow up questions regarding how to implement your suggested approach.

I have used the guidance on the message boards to figure out how to calculate the probabilities for a model with observed covariates using the material in short course topic 2 and the user's guide (April 2010, p. 440 - Calculating Probabilities From Probit Regression Coefficients). I am stuck on how to calculate the probabilities for individual cases after I include a latent variable in the model. My questions:
1 - Do I use each individual's factor score multiplied by the estimate for the latent variable and add this term to the probability equation?
2 - To test the differences properly does the estimate for the latent variable need to be constrained to zero in the first round of estimates?
3 - Should I calculate probabilities using estimates or standardized estimates (and if standardized estimates are used, do I use a combination of different types of standardization based on the type of covariate, e.g., binary: STDY, continuous: STDYX, or just one set of estimates)? Thank you in advance for your suggestions.

Linda K. Muthen posted on Saturday, April 28, 2012 - 6:03 pm

See Slides 163 and 164 in the Topic 2 course handout on the website.

Dave posted on Monday, April 30, 2012 - 2:42 pm

Hello Linda,

Thank you for pointing me towards slides 163/164 and equations 46/47. I have been going over the slides. Correct me if I have misunderstood the equations, but my take is that they calculate item probabilities for one of the indicators of f1 and a covariate.

I am wondering if it is possible to extend these equations to a model with multiple lambdas and etas (say, a latent variable with three indicators, so that you can account for the full latent variable in calculating individual probabilities). If this is possible, what would the equation look like?

Linda K. Muthen posted on Tuesday, May 01, 2012 - 10:26 am

The model in the example has multiple indicators. We just focus on one. You can do one at a time.

Dave posted on Thursday, May 03, 2012 - 1:02 pm

Hello Linda,

Do equations 46/47 work for creating estimated probabilities for each person's observed 0 or 1 when the two models I am trying to compare (estimator = WLSMV) are like:

Model 1:
F1 by y1 y2 y3;
Binary on F1@0 control1 control2 control3;
(Alternatively, Model 1 could be just the second line: Binary on control1 control2 control3);

and
Model 2:
F1 by y1 y2 y3;
Binary on F1 control1 control2 control3;

I am not seeing how I account for F1 (and y1 y2 y3) in the formulas. Do I have to reformulate the model such that I have a lambda for Binary, and if so what would that look like?

Bengt O. Muthen posted on Saturday, May 05, 2012 - 7:35 am

Binary on F1 is just like having another indicator of F1, so it is correct to use the formulas mentioned above. You don't have to bring in y1-y3 because you condition on F1 and once you do that, y1-y3 don't have further influence on Binary.
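A minimal sketch of that probability, assuming a probit link (as with WLSMV) and the general form Phi((-tau + beta*F1 + sum of gamma_k*x_k) / sqrt(theta)). Here tau is the threshold for Binary, beta the slope on F1, gamma_k the control slopes, and theta the residual variance of the latent response variable. Theta is 1 under the Theta parameterization; under the default Delta parameterization, the assumption is that it is taken from the output, so check this against the Topic 2 slides before relying on it. All names and numbers are hypothetical, and plugging in a factor score for F1 is an approximation to conditioning on F1 itself. Fixing the F1 slope at zero (Model 1) simply drops F1 from the linear predictor.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(tau, beta_f1, f1, gammas, controls, theta=1.0):
    """P(Binary = 1 | F1, controls) under a probit model.

    theta: residual variance of the latent response (1.0 is an assumption).
    """
    linear = -tau + beta_f1 * f1 + sum(g * x for g, x in zip(gammas, controls))
    return normal_cdf(linear / math.sqrt(theta))


# Hypothetical estimates for one person with F1 = 0.8 and three controls
p_model2 = probit_probability(0.3, 0.5, 0.8, [0.2, -0.1, 0.05], [1.0, 2.0, 0.0])
p_model1 = probit_probability(0.3, 0.0, 0.8, [0.2, -0.1, 0.05], [1.0, 2.0, 0.0])
print(p_model1, p_model2)
```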

Heike Link posted on Tuesday, June 19, 2012 - 6:48 am

Hi,

I have estimated an integrated binary logit and latent variable model by using the MLR estimator, with Monte Carlo simulation for integration. Since the conventional R square does not make sense here, I am wondering whether the R square given in the output is McFadden's pseudo R square or another of the available measures.

Many thanks for an answer!

Bengt O. Muthen posted on Tuesday, June 19, 2012 - 7:48 am

The R-square is for the continuous latent response variable underlying the binary DV. This is in line with the McKelvey-Zavoina article and the Snijders-Bosker multilevel book.
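For reference, here is a sketch of the McKelvey-Zavoina computation under its usual assumptions: the explained variance of the latent response is b' Cov(x) b, and its residual variance is fixed at 1 for probit or pi^2/3 for logit. The function names and inputs are hypothetical; in a model with latent predictors, the slopes and covariances would be those of the model's predictors of the binary outcome.

```python
import math


def explained_variance(slopes, cov):
    """b' Cov(x) b for slopes b and predictor covariance matrix Cov(x)."""
    k = len(slopes)
    return sum(slopes[i] * slopes[j] * cov[i][j] for i in range(k) for j in range(k))


def mckelvey_zavoina_r2(slopes, cov, link="probit"):
    """R-square for the continuous latent response underlying a binary DV."""
    if link == "probit":
        residual = 1.0
    elif link == "logit":
        residual = math.pi ** 2 / 3.0  # variance of the standard logistic distribution
    else:
        raise ValueError("link must be 'probit' or 'logit'")
    explained = explained_variance(slopes, cov)
    return explained / (explained + residual)


print(mckelvey_zavoina_r2([0.5, 0.3], [[1.0, 0.4], [0.4, 1.0]], link="logit"))
```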

Sarah Parsons posted on Saturday, July 07, 2012 - 11:18 am

Hello,

Can you please explain how the standard error and p-values for R-square are calculated in Mplus?

I am running a linear regression model on observed variables in a complex data set (students clustered in classrooms). My understanding has always been that R square is tested by F = (R-square/k)/((1-R-square)/(N-k-1)). I am not sure how to interpret the standard error associated with R square in the Mplus output and how it is used in calculating the p-value.

Thank you for your help.

Linda K. Muthen posted on Monday, July 09, 2012 - 11:18 am

We use a z-test instead of an F test. The standard error is computed using the Delta method.
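To see how the two tests relate, here is a sketch (function names hypothetical) of the z-test applied to an R-square estimate and its standard error as reported in the output, next to the classical OLS F statistic quoted in the question. The delta-method standard error itself is not reproduced here; it is taken as given.

```python
import math


def z_test(estimate, se):
    """Est./S.E. and its two-sided p-value under a standard normal."""
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def r2_f_statistic(r2, k, n):
    """Classical F = (R2 / k) / ((1 - R2) / (n - k - 1)) for k predictors, n cases."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


print(z_test(0.25, 0.05))
print(r2_f_statistic(0.25, 3, 200))
```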

Utkun Ozdil posted on Monday, August 13, 2012 - 8:52 pm

Hi,
I'm estimating a two-level model in which three dependent variables are regressed on student- and classroom-level predictors. When I typed STANDARDIZED in the OUTPUT command to get the explained variances, for the student-level model including only level-1 predictors, the Mplus output gave me residual variances at the within level, variances at the between level, and R-squares at the within level for each of the three dependent variables. For the full model, in which I included both student- and classroom-level predictors, the output gave me residual variances and R-squares at the within and between levels for each of the three dependent variables.
My question is that instead of getting these values for each dependent variable, as models generally can explain variance through fixed effects as one single R-square term per model, is there a way to get a single variance term via Mplus per the student model and the full model?
Thanks.

Linda K. Muthen posted on Tuesday, August 14, 2012 - 12:07 pm

We don't give an R-square for dependent variables together.

Utkun Ozdil posted on Tuesday, August 14, 2012 - 12:35 pm

Linda, in the output of the within-classroom model, where I included only gender and ses as student-level predictors, I got these R-squares for Math1, Math2, and Math3. Should I have estimated separate covariate models for Math1, Math2, and Math3 so that the Mplus output would give me a single R-square at the within level for each?

TITLE: Within-Classroom Model
DATA: FILE IS wcm.dat;
VARIABLE:NAMES ARE class gender ses Math1 Math2 Math3;
USEVARIABLES ARE gender ses Math1 Math2 Math3;
WITHIN= gender ses;
CENTERING = GRANDMEAN (gender ses);
CLUSTER IS class;
MODEL: %WITHIN%
Math1 ON gender ses;
Math2 ON gender ses;
Math3 ON gender ses;
%BETWEEN%
Math1;
Math2;
Math3;
ANALYSIS: TYPE IS TWOLEVEL;
OUTPUT: SAMPSTAT STANDARDIZED;
Thanks.

Linda K. Muthen posted on Tuesday, August 14, 2012 - 1:46 pm

Please send the output and your license number to support@statmodel.com. I can't understand what you mean.

Stace Swayne posted on Saturday, December 01, 2012 - 9:23 am

Dear Dr. Muthen,

I ran the following model

f1 by x1 x2 x3 x4 x5;
f2 by v1 v2 v3 v4 v5 v6;
f3 by v7 v8 v9 v10 v11;

f1 ON f2 f3;

My question: I want to find out whether f2 or f3 is a better predictor of f1 (variance explained).

Is it possible to get this from the Mplus output?

Thanks

Bengt O. Muthen posted on Saturday, December 01, 2012 - 9:32 am

You can compare the STD coefficients. You can't divide up the explained variance because f2 and f3 are correlated.

Tyler Mason posted on Monday, April 22, 2013 - 12:16 pm

I ran an SEM in Mplus and asked for the R2 values. Do I interpret the R2 values as the % of variance in the DV explained by the model, or do the R2 values mean the % explained by the last mediator/IV?

Thanks!

Linda K. Muthen posted on Monday, April 22, 2013 - 12:23 pm

Each R-square value is the proportion of variance in a dependent variable explained by all covariates in the model for that dependent variable. For the following regression,

y ON x1 x2 x3;

R-square is the variance in y explained by x1, x2, and x3.

Heike Link posted on Thursday, June 06, 2013 - 1:30 am

Hello,

Some time ago I have performed an analysis with Mplus and have estimated a binary logit model with latent variables (a MIMIC part), estimated jointly by using MLR with Monte Carlo simulation. The structure of my model is:

Now I have received several comments and requests from reviewers of a paper, and I would be very grateful for some advice on the following issues.

1) I was requested to give further measures for Pseudo R squares in addition to the Zavoina McKelvey one. Is this possible in Mplus?
2) The reviewers recommended using the p^2 value, defined as 1-(L(β)/L(0)), where L(β) is the loglikelihood of the final model at convergence and L(0) is the loglikelihood of a model containing only the alternative-specific constant. How can I get this in the Mplus output? And for clarification: the loglikelihood given in the output is shown as the H0 value (what is this exactly?), and the output also gives a scaling factor for MLR. What exactly is this factor?
3) Finally: When using Monte Carlo simulation for MLR, is it based on Halton draws?

Many thanks for your efforts!

Heike

Bengt O. Muthen posted on Thursday, June 06, 2013 - 7:42 am

1) No, but you could probably compute this by hand using the Mplus output.

2) I am not familiar with the p^2 value; I wonder if you are referring to the McFadden R-square, which uses the ratio of ML's loglikelihood values with free slopes for covariates versus slopes fixed at zero. For a description of loglikelihood, Google McFadden's R-square or Google likelihood.
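McFadden's measure as described above can be sketched as follows. Both loglikelihoods must be for the same sample and the same set of dependent variables, with the null model fixing the slopes at zero (only intercepts or thresholds free). In a joint model that also contains latent-variable indicators, the loglikelihood covers those indicators too, so the ratio would no longer be the usual single-equation McFadden value. The function name and numbers are hypothetical.

```python
def mcfadden_r2(loglik_model, loglik_null):
    """McFadden pseudo R-square: 1 - LL(model) / LL(null)."""
    if loglik_null >= 0:
        raise ValueError("a discrete-outcome loglikelihood should be negative")
    return 1.0 - loglik_model / loglik_null


print(mcfadden_r2(-80.0, -100.0))
```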

3) You say "simulation", but perhaps you mean "integration"? I don't know what Halton draws are, so we probably don't do that.

Heike Link posted on Thursday, June 06, 2013 - 9:52 am

Many thanks for the quick reply!

But I still do not know what the H0 value of the Loglikelihood means, and what the scaling factor is (see my second question).

On my third question: Yes, I meant integration!

Linda K. Muthen posted on Thursday, June 06, 2013 - 11:37 am

The H0 loglikelihood is the loglikelihood for the estimated model. See Bryant and Satorra (2011) for a description of the scaling correction factor. The link to the paper is at the bottom of the section on the website that describes MLM and MLR Difference testing.
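As a sketch of how the scaling correction factor enters an MLR loglikelihood difference test, following the formula described on the Mplus website page on MLM and MLR difference testing (please verify against that page): with L0, p0, c0 the H0 loglikelihood, number of free parameters, and scaling correction factor of the nested model, and L1, p1, c1 those of the comparison model, cd = (p0*c0 - p1*c1)/(p0 - p1) and TRd = -2*(L0 - L1)/cd, referred to a chi-square distribution with p1 - p0 degrees of freedom. The function name and numbers are made up.

```python
def scaled_loglik_difference(l0, p0, c0, l1, p1, c1):
    """MLR scaled loglikelihood difference test.

    l0, p0, c0: H0 loglikelihood, free parameters, scaling factor (nested model)
    l1, p1, c1: the same quantities for the less restricted model
    Returns (TRd, df); compare TRd with a chi-square on df degrees of freedom.
    """
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)  # difference-test scaling correction
    trd = -2.0 * (l0 - l1) / cd
    return trd, p1 - p0


print(scaled_loglik_difference(-1000.0, 10, 1.2, -990.0, 12, 1.1))
```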

Tobias Stark posted on Wednesday, April 02, 2014 - 12:11 pm

Hello,

I would like to test for a significant change in the R2 of an observed variable in an SEM model.

In the first model, the observed metric variable is regressed on four latent factors. In the second model, the observed variable is regressed only on a second-order factor that is constituted by the four latent factors. Is there a way to test for a significant change in the R2 between the two models?

Thanks!

Linda K. Muthen posted on Wednesday, April 02, 2014 - 3:17 pm

I don't know of any way for that to be done.

Martin Heinberg posted on Thursday, December 11, 2014 - 9:03 am

Dear Drs. Muthen,

I am referring to a question asked by sunnyshi about the calculation of R square when using type=random.
I have had a look at the explanation in the FAQs as you recommended, but I don't understand it completely.
Where do I find the values for beta, the variance of eta2, the variance of zeta1 and the variance of zeta3 in Mplus? Or can I just assume they are equal to one?

I have another independent variable in my model. Would I simply add beta4 to the equation to calculate the variance of eta3?

Thank you so much!

Bengt O. Muthen posted on Thursday, December 11, 2014 - 6:28 pm

Which FAQ are you referring to?

Martin Heinberg posted on Friday, December 12, 2014 - 5:47 am

Sorry for not being precise: FAQ "Latent variable interactions"

Bengt O. Muthen posted on Friday, December 12, 2014 - 9:33 am

Take Figure 2 of the FAQ as an example. For this model the correspondence between the notation in the figure and the Mplus output is:

beta = eta1 ON eta2

var(eta2) = eta2 variance

var(zeta1) = eta1 residual variance

etc

The variance of eta3 is obtained via (16) where the exact expression depends on whether or not your other independent variable interacts with other ones.

If you are not familiar with these expressions, you might need a statistical consultant.
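For orientation only, here is a sketch of a general variance formula for a dependent variable with a latent interaction, under an assumption stated plainly: the two interacting latent variables are bivariate normal with means zero. It is not a reproduction of equation (16) in the FAQ, and the parameter names are hypothetical; the slopes, variances, covariance, and residual variance would be read from the output as in the correspondence above. For eta3 = b1*eta1 + b2*eta2 + b3*eta1*eta2 + zeta3, the product term is then uncorrelated with eta1 and eta2 and has variance V1*V2 + C^2. As noted later in the thread, formulas of this kind assume normality and do not apply to a binary moderator.

```python
def latent_interaction_r2(b1, b2, b3, var1, var2, cov12, resid_var):
    """R-square for eta3 = b1*eta1 + b2*eta2 + b3*eta1*eta2 + zeta3.

    Assumes eta1, eta2 are zero-mean bivariate normal, so the product term
    is uncorrelated with eta1 and eta2 and Var(eta1*eta2) = var1*var2 + cov12**2.
    """
    explained = (
        b1 ** 2 * var1
        + b2 ** 2 * var2
        + 2.0 * b1 * b2 * cov12
        + b3 ** 2 * (var1 * var2 + cov12 ** 2)
    )
    total = explained + resid_var  # model-implied variance of eta3
    return explained / total


print(latent_interaction_r2(0.5, 0.3, 0.2, 1.0, 1.0, 0.4, 0.5))
```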

Martin Heinberg posted on Saturday, December 13, 2014 - 5:46 am

Thanks a lot, Prof. Muthen,

One of my moderating variables is binary. Is there still a way to calculate R-square myself? (The variance of the variable and the covariance of the binary variable with the other construct would be needed.)

Bengt O. Muthen posted on Saturday, December 13, 2014 - 5:45 pm

With a binary observed mediator you can do two-group analysis as mentioned on page 688 of the V7 UG. The formulas of the FAQ refer to normally distributed variables and would not apply to a binary variable.

Alvaro Camacho posted on Sunday, June 28, 2015 - 7:00 pm

Hello,
I have the following model:

USEVARIABLES ARE crp_log tnf_log il6_log
effort1 noteat1 badslp1 getgoin1 concntr1
asgood1 happy1 hopeful1 enjlife1
blue1 depress1 lonely1 cryspel1 sad1 lffail1 fearful1
age1c race1c gender1
bmi1c sbp1c glucos1c hdl1 chol1 trig1
cig1c walkmn1c alc1c
dm031c htnmed1c lipid1c;

Model:
NEGAFF BY blue1 depress1 lonely1 cryspel1 sad1 lffail1 fearful1;
POSS BY asgood1 happy1 hopeful1 enjlife1;
SOM BY effort1 badslp1 getgoin1 noteat1 concntr1;
crp_log il6_log tnf_log ON SOM POSS NEGAFF
age1c race1c gender1
bmi1c sbp1c glucos1c hdl1 chol1 trig1
cig1c walkmn1c alc1c
dm031c htnmed1c lipid1c;

OUTPUT:
Standardized sampstat CINTERVAL;

*** ERROR
One or more variables have a variance greater than the maximum allowed of 1000000.
Check your data and format statement or rescale the variable(s) using the DEFINE command.

How can I use the DEFINE command to rescale this log-transformed variable in Mplus? My main dataset is in Stata.
Thx much
Al

Linda K. Muthen posted on Monday, June 29, 2015 - 6:16 am

Divide it by a constant that reduces its variance.
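Dividing a variable by a constant c divides its variance by c^2; its unstandardized coefficients change scale accordingly, while standardized coefficients and R-square are unchanged. In Mplus this would be a DEFINE statement such as crp_log = crp_log/10; (which variable needs rescaling is an assumption; check the sample statistics). The hypothetical helper below picks the smallest power of 10 that brings a variance under the 1,000,000 limit named in the error message.

```python
def rescale_divisor(variance, max_variance=1_000_000.0):
    """Smallest power of 10 such that variance / divisor**2 <= max_variance."""
    divisor = 1.0
    while variance / divisor ** 2 > max_variance:
        divisor *= 10.0
    return divisor


print(rescale_divisor(5e7))
```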

Michal Berkowitz posted on Monday, August 10, 2015 - 2:22 pm

Hello,

I am trying to find out how R square is calculated for a DV in a structural equation model when there are multiple, non-orthogonal predictors (i.e., like in multiple regression). I am looking at a model in which, for example, y is predicted only by x1. R square in this case is beta squared (right?). Then I look at a model in which y is predicted by both x1 and x2, which are correlated. The effect of x1 is now different, and there is the additional effect of x2. How is r square of y calculated now? In my case, the effect of x1 increases when x2 is included (and the effect of x2 becomes negative comparing to when it is the single predictor, as in a suppression effect). I expected variance explained to increase, but r square hardly changes. I was wondering why differences in the effects are not reflected also in the variance explained. I assume that knowing how it is calculated would help interpret the r squared values I am getting.

Thank you very much in advance,
Michal

Bengt O. Muthen posted on Tuesday, August 11, 2015 - 1:55 pm

With

y = beta*x+e

beta^2 is the R-square only if V(x) = V(y) = 1. The general expression is

R-square = explained variance / total variance,

where the total (y) variance adds the residual variance to the explained variance.

Or, equivalently, 1 - standardized residual variance.

Explained variance when there are 2 x's is

beta1^2*V(x1)+beta2^2*V(x2)+2*beta1*beta2*Cov(x1,x2).
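The formulas above can be checked numerically with a small sketch (hypothetical function and numbers). With standardized inputs (V(x1) = V(x2) = 1, Cov(x1, x2) equal to the correlation, and the standardized residual variance), the result equals 1 minus the standardized residual variance. The cross term 2*beta1*beta2*Cov(x1,x2) is why the explained variance cannot be split into separate pieces for x1 and x2 when they are correlated.

```python
def r_square_two_predictors(beta1, beta2, var_x1, var_x2, cov_x1x2, resid_var):
    """R-square = explained variance / (explained variance + residual variance)."""
    explained = (
        beta1 ** 2 * var_x1
        + beta2 ** 2 * var_x2
        + 2.0 * beta1 * beta2 * cov_x1x2  # cross term from correlated predictors
    )
    return explained / (explained + resid_var)


# Hypothetical standardized solution with a negative slope for x2
beta1, beta2, r12 = 0.7, -0.3, 0.6
std_resid = 1.0 - (beta1 ** 2 + beta2 ** 2 + 2 * beta1 * beta2 * r12)
print(r_square_two_predictors(beta1, beta2, 1.0, 1.0, r12, std_resid))
```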

Michal Berkowitz posted on Wednesday, August 12, 2015 - 4:32 am

Thank you, this is really helpful! I have a couple more technical questions and a couple more conceptual ones; I hope this is not too long:

1. I used only values from the standardized solution (where variances of x1 and x2 are always 1, and their covariance is the correlation). The resulting r^2s matched those in the output, so I assume this was right. Is there meaning to calculating the above with non standardized estimates?

2. If one of the betas is positive and the other negative, then the total R^2 is always smaller than when the two effects are in the same direction. Is this correct? I have one large positive effect and one smaller but significant negative effect. The total R^2 is smaller than it would have been with only the large effect, so I assume it is because the effects are opposite. But is this true?

3. Relatedly, in the case of 2 x's, is it possible to determine what proportion of the total R^2 stems from each x?

4. In the case of 2 x's, would it be correct to say that, hypothetically, if we could have measured only the unique part of x1 (what it doesn't have in common with x2), then the effect of such a measure on y would be beta1?

Thanks very much for your help!

Michal

Bengt O. Muthen posted on Wednesday, August 12, 2015 - 4:38 pm

1. Either is fine.

2-4: You will get a fuller response for these general modeling questions on a discussion forum like SEMNET.

Jenna E Finch posted on Thursday, October 29, 2015 - 6:46 pm

Hello,

We are running a path analysis with one predictor at age 0 to three outcomes at age 4, working through two sets of variables at age 2 and age 4. Each set of variables is the same at age 2 and age 4.

We would like to know what proportion of the main effect is explained by each of these two pathways.

We used the:
outcome IND predictor;
command and got a total effect, as well as a total indirect effect.

One of our predictor to outcome pathways has a larger indirect effect than the total effect. How can we calculate the proportion of the total effect explained by all mediators and then explained by each of the age 2--> age 4--> outcome pathways? How do we deal with indirect effects that work in different directions, thus creating the scenario where the total indirect effect > total effect.

Any advice you might have would be greatly appreciated!

All the best,
Jenna

Linda K. Muthen posted on Friday, October 30, 2015 - 11:12 am

R-square cannot be divided up in this way.

Jenna E Finch posted on Friday, October 30, 2015 - 2:30 pm

Thank you for your quick response. We are hoping to get an estimate of how strong a mediator is, or how much of the total variance is explained by the mediators. Can we use the indirect and direct effects given by the model to calculate this?

All the best,
Jenna

Bengt O. Muthen posted on Friday, October 30, 2015 - 4:49 pm

I would not use the proportion when the indirect effect is larger than the total effect. You may want to discuss on SEMNET.
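To illustrate the point with hypothetical effects: when the direct effect has the opposite sign to the total indirect effect, the total effect (direct plus total indirect) is smaller than the total indirect effect, and the "proportion mediated" exceeds 1, so it no longer reads as a proportion. The function name and numbers are made up.

```python
def proportion_mediated(indirect_effects, direct_effect):
    """Total indirect effect divided by total effect (direct + all indirect)."""
    total_indirect = sum(indirect_effects)
    total = direct_effect + total_indirect
    if total == 0:
        raise ZeroDivisionError("total effect is zero; the ratio is undefined")
    return total_indirect / total


# Hypothetical: two specific indirect paths and a direct effect of opposite sign
print(proportion_mediated([0.30, -0.05], -0.10))
```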

Bae, Han Suk posted on Monday, October 10, 2016 - 1:33 am

Dear Dr. Muthen

[This is the model input]

MODEL:
PA by KPB_m KPE_m;
OA by KOAA_m KOAB_m;
MA by KMM_m KMF_m;
Eng_S by ESS_m ES_m;

Eng_S on PA OA MA ;
Eng_S on EV_m;

KPB_m; KPE_m; KOAA_m; KOAB_m; KMM_m; KMF_m; ESS_m; ES_m; EV_m;

[This is the R-square value that I found on the Mplus output file]

R-SQUARE

    Observed
    Variable        Estimate       S.E.  Est./S.E.    P-Value

    KPB_M              0.368      0.069      5.360      0.000
    KPE_M              0.397      0.071      5.554      0.000
    KOAA_M             0.690      0.062     11.126      0.000
    KOAB_M             0.456      0.057      7.977      0.000
    KMM_M              0.389      0.056      6.944      0.000
    KMF_M              0.567      0.062      9.190      0.000
    ESS_M              0.581      0.045     12.867      0.000
    ES_M               0.842      0.035     24.269      0.000

    Latent
    Variable        Estimate       S.E.  Est./S.E.    P-Value

    ENG_S              0.886      0.049     18.066      0.000

Based on the output, I know that the total explained variance of ENG_S is 88.6%.

However, could you please tell me how I can calculate how much of the total variance was explained by each of the variables (PA, OA, MA, and EV_m)? How can I partition the explained variance in the model?

Thank you in advance for your help.

Linda K. Muthen posted on Monday, October 10, 2016 - 10:55 am

R-square cannot be partitioned because the covariates are correlated.

Bae, Han Suk posted on Monday, October 10, 2016 - 6:57 pm

Thank you so much for your answer.

If then, can I conduct commonality analysis to partition the variance explained?

Or, is there any way that I can see how much of the variance explained by each of the variables (PA, MA, OA, and EV_m)?

Bengt O. Muthen posted on Tuesday, October 11, 2016 - 11:58 am

You may want to ask these general questions on SEMNET.

Robert Allen King posted on Saturday, February 04, 2017 - 12:04 pm

The standardized output provides an R-square estimate for the latent dependent variable. Is this estimate equivalent to the total amount of variance in this dependent variable accounted for by all of its predictors (the independent variables, the mediators, and the control variables)?

Linda K. Muthen posted on Sunday, February 05, 2017 - 6:56 am

The R-square is for all variables that the dependent variable was regressed on.

s9901470 posted on Tuesday, February 21, 2017 - 3:37 am

I am trying to get R-square for each level in a cross-classified growth model. When I asked for standardized results, it says:

STANDARDIZED (STD, STDY, STDYX) options for TYPE=RANDOM are not available with
ESTIMATOR=BAYES.

How can I get something broadly equivalent to variance explained at each level?

Thank you, in advance

Bengt O. Muthen posted on Tuesday, February 21, 2017 - 1:55 pm

Do you need to use Type=Random? This is for random slopes. If not, you get standardized.

Mplus Version 8 will give standardized also with random slopes.

s9901470 posted on Wednesday, February 22, 2017 - 3:31 am

Yes, I need TYPE=CROSSCLASSIFIED RANDOM. Do I have to wait until version 8 or is there anything else that can indicate variance explained at each level?

Bengt O. Muthen posted on Wednesday, February 22, 2017 - 12:13 pm

Standardizing a model with random slopes requires special attention. See the 2016 article in Psych Methods by Schuurman et al "How to compare...".

Connor Jones posted on Sunday, July 02, 2017 - 5:16 pm

Hello,

I have a simple regression model with three observed variables predicting one DV. I would like to determine whether creating a latent factor from these predictors improves prediction of the DV.

As expected, the latent model results in an increase in explained variance (R2) of the DV, but I do not know whether this increase is significant.

Do you know of any way Mplus can be used to compare latent and non-latent models for improvement in prediction?

Thank you!

Bengt O. Muthen posted on Sunday, July 02, 2017 - 5:24 pm

I don't know.

Jennifer Hepditch posted on Monday, March 12, 2018 - 12:12 pm

Hi there,
Is there a way to tell whether an interaction explains a SIGNIFICANT amount of variance, i.e., to get a p-value? (I already know the amount of variance explained.)

To get the variance explained, I compared a model with the interaction slope fixed at zero (@0) to one where all slopes were free, and then subtracted the standardized residual variances (since that difference is essentially the R2 change, right?). The model is saturated because I am running a multivariate multiple regression in Mplus to account for nesting.

The model is as follows:
ANALYSIS: TYPE = COMPLEX;
ESTIMATOR = MLR;

MODEL:
DV1 ON CoV1 CoV2 x m xm;

DV2 ON CoV1 CoV2 x m xm;  ! xm is the interaction; I want to know whether
                          ! it adds significant explained variance

CoV1 WITH CoV2 x m xm;
CoV2 WITH x m xm;
x WITH m xm;
m WITH xm;

OUTPUT: TECH1 TECH2 TECH3 STDYX TECH4;
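Because the R-square for a dependent variable equals 1 minus its standardized (STDYX) residual variance, the R-square change described above is the restricted model's standardized residual variance minus the full model's. A minimal Python sketch of that arithmetic; the two residual variances are hypothetical values, not taken from any output in this thread:

```python
def r_square(std_residual_variance: float) -> float:
    """R-square implied by a standardized (STDYX) residual variance."""
    return 1.0 - std_residual_variance


def r_square_change(std_resid_restricted: float, std_resid_full: float) -> float:
    """R-square gained by freeing the interaction slope.

    std_resid_restricted: STDYX residual variance with the xm slope fixed @0.
    std_resid_full: STDYX residual variance with all slopes free.
    """
    return r_square(std_resid_full) - r_square(std_resid_restricted)


# Hypothetical STDYX residual variances for DV2 from the two runs.
restricted = 0.82  # DV2 ON xm@0
full = 0.79        # all slopes free
print(round(r_square_change(restricted, full), 3))  # 0.03
```

This gives only the size of the change; it does not by itself provide a p-value for it.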

Bengt O. Muthen posted on Monday, March 12, 2018 - 3:27 pm

There is not a simple way to do this. I think it is sufficient to check if the slope for the xm term is significant.

VALERIA IVANIUSHINA posted on Thursday, March 15, 2018 - 7:40 am

Hi Linda and Bengt:
I am running a model with a continuous dependent variable Exam and several predictors (observed and latent).
The model has excellent fit (chi-square p-value = 0.19; the other indices are also excellent).

The standardized solution shows that the R-square for Exam = 0.17 (p = 0.016),
but none of the paths from the predictors to Exam is significant.

R-square is the proportion of variance explained by the set of covariates that the dependent variable was regressed on, yet all of those regression coefficients are non-significant.
How is this possible? Could it happen because my model is wrong?

MODEL:
External BY Egoal2 Egoal3 Egoal4 Egoal5;
SelfEff BY SE1 SE3 SE5;
External WITH SelfEff;
External SelfEff ON gender EGEsum GPA;
GrG1 ON External SelfEff GPA;
Exam ON GrG1 External SelfEff GPA EGEsum;

Bengt O. Muthen posted on Thursday, March 15, 2018 - 4:42 pm

The fit can be good while having non-significant relations. This can happen with low N, low correlations, or both.

You may want to discuss this further on a general list like SEMNET.

Amber Fahey posted on Thursday, June 07, 2018 - 11:56 am

Hello, a co-author wants me to report R-squared for my distal models, but the output doesn't produce it because the model includes random intercepts and slopes. I can compute R2 as (total variance - residual variance)/total variance, but unlike my conditional model, the distal model output provides only the residual variance. Is there a way to request the total variance? Here is the syntax Dr. Linda Muthen helped me with previously:

ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL: %WITHIN%
s | OLOG ON DAY;
%BETWEEN%
Folog by olog;
olog@0;
[folog@0];
FOLOG S ON day2inpt AGE FIMTOTA TFCDays
EduYears Sex Racec Emppre
PTA1;
SRS ON FOLOG S day2inpt AGE FIMTOTA TFCDays
EduYears Sex Racec Emppre
PTA1;
SWLST ON FOLOG S day2inpt AGE FIMTOTA TFCDays
EduYears Sex Racec Emppre
PTA1;
FIMfu ON FOLOG S day2inpt AGE FIMTOTA TFCDays
EduYears Sex Racec Emppre
PTA1;
DRSPI ON FOLOG S day2inpt AGE FIMTOTA TFCDays
EduYears Sex Racec Emppre
PTA1;
PARSum ON FOLOG S day2inpt AGE FIMTOTA TFCDays
EduYears Sex Racec Emppre
PTA1;
fOLOG WITH S;
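For an outcome regressed on predictors with slopes b, predictor covariance matrix Sigma, and residual variance theta, the model-implied total variance is b' Sigma b + theta, so R2 = b' Sigma b / (b' Sigma b + theta). A minimal stdlib Python sketch of that formula, assuming you already have the slopes, the covariances of the predictors (for the distal outcomes above, these would include the latent FOLOG and S), and the residual variance. All numbers are hypothetical, and the sketch does not handle the extra complications of random slopes discussed by Schuurman et al. (2016):

```python
from typing import Sequence


def explained_variance(slopes: Sequence[float],
                       cov: Sequence[Sequence[float]]) -> float:
    """Quadratic form b' Sigma b: outcome variance due to the predictors."""
    k = len(slopes)
    return sum(slopes[i] * cov[i][j] * slopes[j]
               for i in range(k) for j in range(k))


def r_square(slopes: Sequence[float],
             cov: Sequence[Sequence[float]],
             residual_variance: float) -> float:
    """R2 = b' Sigma b / (b' Sigma b + residual variance)."""
    explained = explained_variance(slopes, cov)
    total = explained + residual_variance  # model-implied total variance
    return explained / total


# Hypothetical values for one distal outcome regressed on two predictors.
b = [0.5, 0.3]
sigma = [[1.0, 0.2],
         [0.2, 2.0]]
theta = 1.5
print(round(r_square(b, sigma, theta), 4))  # 0.2462
```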

Bengt O. Muthen posted on Thursday, June 07, 2018 - 5:58 pm

Bayes gives you R-square also with random slopes. The background is given in the Schuurman et al (2016) Psych Methods paper.

Kerry Lee posted on Thursday, November 07, 2019 - 2:13 am

Dear Prof Muthen,

Using BAYES to run a simple regression with one manifest predictor and one latent criterion, I obtained a non-significant regression beta but an R-squared with a one-tailed p-value below .001. With just one predictor, I am not sure what to make of this.

On reflection, the Bayes one-tailed p-value refers to the proportion of estimated values below zero (when the estimate is positive). Am I correct that R-squared cannot be negative, so its associated p-value will always be below .001? Does this also affect the 95% Bayesian C.I.? If so, how does one decide whether an R-squared is significant?

Sincerely,
Kerry.

Tihomir Asparouhov posted on Thursday, November 07, 2019 - 5:20 pm

This is correct. R2 is always positive: its entire posterior distribution lies above zero, so the p-value is always 0. Testing the significance of R2 is the same hypothesis as testing the significance of all the predictors jointly. Since you have just one, that is the significance of the beta. In the upcoming Mplus 8.4 you will be able to use a Wald test for multiple predictors. So, generally, I would not recommend using the R2 scale for that hypothesis, as it would unnecessarily create a problem. If for some reason you decide to use it anyway, you can use the traditional ML approach: divide the point estimate by its standard error and call it significant if this ratio is > 1.96. ML and Bayes are asymptotically equivalent, so this is a valid approach as well, but I would still not recommend it since a more direct approach is available.
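The point about the posterior can be seen directly from posterior draws: the one-tailed p-value is the proportion of draws on the other side of zero from the point estimate, and R2 draws can never fall at or below zero. A minimal Python sketch using simulated, hypothetical draws (a Beta distribution stands in for an R2 posterior and a normal for a slope posterior; neither is Mplus output). The estimate/SE ratio uses the posterior median and posterior SD as stand-ins for the ML estimate and standard error, which is an assumption of this sketch:

```python
import random
import statistics


def one_tailed_p(draws):
    """Proportion of posterior draws on the opposite side of zero from the median."""
    med = statistics.median(draws)
    if med >= 0:
        return sum(d <= 0 for d in draws) / len(draws)
    return sum(d >= 0 for d in draws) / len(draws)


def z_ratio(draws):
    """Point estimate / posterior SD, the ML-style ratio compared with 1.96."""
    return statistics.median(draws) / statistics.stdev(draws)


rng = random.Random(2019)
r2_draws = [rng.betavariate(2, 20) for _ in range(5000)]   # always in (0, 1)
beta_draws = [rng.gauss(0.15, 0.10) for _ in range(5000)]  # can cross zero

print(one_tailed_p(r2_draws))    # 0.0: no R2 draw is <= 0
print(one_tailed_p(beta_draws))  # close to P(Z < -1.5), about 0.067
print(round(z_ratio(r2_draws), 2))
```

The R2 p-value is 0 no matter how weak the relationship is, which is why it carries no information about significance.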

Kerry Lee posted on Tuesday, November 19, 2019 - 12:52 am

Following up on my earlier post of Nov 7, I experimented with another manifest predictor and the same latent criterion. Using an ML estimator, the regression beta was significant but the R^2 was not. The estimate was as expected (R^2 estimate = regression beta^2); however, the S.E.s were different. With just one predictor, should they not be the same? Thinking it might have to do with the latent variable, I also tried another model using a single manifest criterion, but the S.E.s for the beta and the R^2 still did not agree. I would be grateful if you could let me know the reason for the difference.

My last question: in a regression with multiple predictors (using ML), can the p-value of the R^2 be relied on as a test of the significance of the overall explanatory model, as one would in, say, SPSS?

Bengt O. Muthen posted on Wednesday, November 20, 2019 - 4:19 pm

R2 is not simply beta^2 because the residual variance is also involved. Also, the sampling distribution of R2 differs from that of beta and is probably more non-normal, so bootstrap or Bayes CIs may be needed to evaluate its significance correctly.
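A bootstrap CI for R2 resamples cases, recomputes R2 in each resample, and reads the interval off the empirical percentiles, so it does not rely on a normal sampling distribution. A minimal stdlib Python sketch for the simplest case (one observed predictor, one observed outcome, so R2 is the squared correlation), using simulated hypothetical data. This is a generic percentile bootstrap, not Mplus's implementation; with a latent criterion, each resample would require refitting the model:

```python
import random
import statistics


def r_square(xs, ys):
    """R2 of a simple linear regression of y on x (squared Pearson correlation)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def bootstrap_ci(xs, ys, reps=2000, level=0.95, seed=1):
    """Percentile bootstrap CI for R2: resample cases with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(r_square([xs[i] for i in idx], [ys[i] for i in idx]))
    stats.sort()
    k = round((1 - level) / 2 * reps)  # number of draws cut from each tail
    return stats[k], stats[reps - 1 - k]


# Hypothetical simulated data: y = 0.3 * x + noise, n = 100.
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(100)]
ys = [0.3 * x + rng.gauss(0, 1) for x in xs]

print(round(r_square(xs, ys), 3), bootstrap_ci(xs, ys))
```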

A simpler approach is to use MODEL TEST with all betas = 0. This is available for ML and, as of Mplus version 8.4, also for Bayes.
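The MODEL TEST suggestion amounts to a Wald test that all slopes are jointly zero: W = b' V^-1 b, referred to a chi-square with as many degrees of freedom as slopes tested. In Mplus this would be set up by labeling the slopes in MODEL (for example, y ON x1 (b1) x2 (b2);) and listing b1 = 0; b2 = 0; under MODEL TEST. A minimal stdlib Python sketch of the underlying computation for two predictors; the estimates and their covariance matrix are hypothetical, and the chi-square(2) tail probability uses its closed form exp(-W/2):

```python
import math


def wald_all_zero_2(b, v):
    """Wald chi-square for H0: b1 = b2 = 0, W = b' V^-1 b, df = 2.

    b: the two slope estimates.
    v: their 2x2 sampling covariance matrix.
    """
    (v11, v12), (v21, v22) = v
    det = v11 * v22 - v12 * v21
    # Inverse of the 2x2 covariance matrix of the estimates.
    inv = [[v22 / det, -v12 / det],
           [-v21 / det, v11 / det]]
    w = sum(b[i] * inv[i][j] * b[j] for i in range(2) for j in range(2))
    p = math.exp(-w / 2)  # chi-square(2) survival function is exp(-w / 2)
    return w, p


# Hypothetical slope estimates and covariance matrix of the estimates.
b = [0.30, 0.20]
v = [[0.010, 0.002],
     [0.002, 0.010]]
w, p = wald_all_zero_2(b, v)
print(round(w, 3), round(p, 4))  # 11.042 0.004
```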