I have run a second-order factor analysis with a Schmid-Leiman transformation.
There are 20 categorical items. These load on four first-order factors and also on one general factor. The correlations between the four first-order factors are set to zero (i.e., orthogonal), and there is no path between the first-order factors and the general factor. I request STANDARDIZED in the OUTPUT command.
My question is how to calculate the percentage of variance explained by the factors. I expect the general factor to account for a large proportion of the variance explained and the minor factors for very much less. How do I separate the variance explained from the residual variance shown in the output for the 20 items?
Thanks in advance for any advice.
bmuthen posted on Tuesday, January 29, 2002 - 1:12 pm
In this type of model, the general factor influences all items and the specific factors can be seen as residual factors explaining further correlations among subsets of items. Typically, you have all factors uncorrelated. This means that computation of the percentage variance explained by a certain factor is simply using the squared loading times the factor variance. Dividing by the item variance (i.e. y* variance, which is 1 if no covariates), gives the proportion variance in the item explained by the factor in question.
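A minimal Python sketch of the computation just described, for a single item. The helper name `share_explained` and all numbers are hypothetical illustration values, not output from the model in the question; the sketch assumes the factors are uncorrelated and, as with categorical items without covariates, a y* variance of 1.

```python
def share_explained(loading, factor_var, ystar_var=1.0):
    """Proportion of an item's y* variance explained by one factor.

    Assumes the factors are mutually uncorrelated, so each factor's
    contribution is simply loading**2 * factor variance.
    """
    return loading ** 2 * factor_var / ystar_var


# Hypothetical unstandardized estimates for one item.
general = share_explained(0.35, 4.0)   # general factor, variance estimated as 4
specific = share_explained(0.40, 1.0)  # specific factor, variance fixed at 1

print(f"general: {general:.3f}, specific: {specific:.3f}")
# general: 0.490, specific: 0.160
```

With covariates in the model the y* variance is no longer 1, so `ystar_var` would have to be set to the estimated y* variance.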
Leigh Roeger posted on Wednesday, January 30, 2002 - 3:57 pm
Thank you for that advice. Can I follow up with two further questions?
First. I am assuming when you say squared loadings times the factor variance you mean the standardised factor loadings? Is this correct?
The second question is more a modelling question. The primary purpose of this analysis is to examine whether the general factor accounts for most of the variation in the items and the specific factors for very much less. I notice that some analysts (e.g., Gustafsson) fix the variances of all the latent variables to 1. Do you see some advantage in this, or is it a hangover from early LISREL analyses, which used correlation matrices?
A further question: should the squared factor loadings times the factor variance in the total model (in my case 1 general factor and four specific factors) add to 100%? They don't in my case. Is there a way to calculate this so that it does add to 100%? That way you could see the proportion of variance accounted for by the general factor and by the specific factors.
bmuthen posted on Thursday, January 31, 2002 - 11:51 am
You can use the unstandardized loadings. The variance accounted for by the factor plus the residual variance add up to 100%. With categorical outcomes, you get the residual variance printed in the R-square section of the output when requesting Standardized; the total (y*) variance is 1, unless you have covariates in the model. As with CFA in general, you can set the metric of the factor by fixing its variance to one or by fixing one loading at one. If anything is unclear, please send a full output to support and your percentage calculation can be checked out.
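The partition in this reply can be sketched in Python. The four items and their loadings are made up for illustration, and `average_partition` is a hypothetical helper; the sketch assumes orthogonal factors, all factor variances fixed at 1, and a y* variance of 1 (categorical items, no covariates), so each item's residual variance is 1 minus its explained variance. Averaging over items gives factor shares that, together with the residual share, add to 100%.

```python
def average_partition(item_loadings, factor_vars):
    """Average share of y* variance due to each factor, plus the residual.

    item_loadings: one list per item, one loading per factor (0.0 where
    the item does not load on that factor). Assumes orthogonal factors
    and a y* variance of 1 for every item.
    """
    n_items = len(item_loadings)
    factor_totals = [0.0] * len(factor_vars)
    residual_total = 0.0
    for loadings in item_loadings:
        explained = [lam ** 2 * psi for lam, psi in zip(loadings, factor_vars)]
        for k, value in enumerate(explained):
            factor_totals[k] += value
        residual_total += 1.0 - sum(explained)  # residual = total (1) - explained
    factor_shares = [t / n_items for t in factor_totals]
    return factor_shares, residual_total / n_items


# Columns: general, specific 1, specific 2 (hypothetical values).
items = [
    [0.7, 0.4, 0.0],
    [0.6, 0.5, 0.0],
    [0.8, 0.0, 0.3],
    [0.5, 0.0, 0.4],
]
shares, residual = average_partition(items, [1.0, 1.0, 1.0])
print([f"{s:.4f}" for s in shares], f"{residual:.4f}")
# ['0.4350', '0.1025', '0.0625'] 0.4000
```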
esoofi posted on Tuesday, February 19, 2002 - 2:58 pm
Hi Linda, Is the r-sq shown on page 75 of the User's Guide the same as the one defined on page 288 of Bollen (1989), Structural Equations with Latent Variables? If not, please give me the formula and the reference. Ehsan
bmuthen posted on Tuesday, February 19, 2002 - 3:01 pm
Yes, they are the same.
Anonymous posted on Thursday, September 26, 2002 - 4:41 pm
(Reposted by Mplus Discussion webmaster due to forum malfunction.)
Hi, this might be a very basic question, but maybe you can help me? I am working on an SEM with six latent constructs, all with categorical indicators.
I am confused about calculating the variance in a latent variable explained by its indicators. Is the variance/residual variance listed in the results section the variance explained? For the endogenous latent variables, the variance at the end of the residuals section is the variance explained by the other latent variables, isn't it? How can I calculate the variance explained by the indicators for exogenous and endogenous variables? Thanks!
bmuthen posted on Friday, September 27, 2002 - 9:56 am
When you ask for Standardized solution, you get an output section with R-square values for dependent variables. This gives you the proportion of variance explained in these dependent variables, be they observed or latent.
Hi. I'm wondering how I can use the estimated reliability of variables in SEM under Mplus. For example, in a model I would like to run, the final dependent variable is attitude to non-marital cohabitation. I have two indicators, whose correlation gives me a first, albeit rough, indication of their reliability (about .70). Another endogenous variable is church attendance, for which other evidence suggests a reliability in the neighbourhood of .75.
In some other programs, I could declare how much of the variance for each indicator appeared to be unreliable, and the output would be modified accordingly. I'd like to know whether I can do this in Mplus, for exogenous and/or endogenous variables. Can someone fill me in?
where a is the error variance in y, chosen as a = (1 - reliability) * sample variance.
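The fixed error variance in this reply is simple arithmetic. A minimal Python sketch; the function name is hypothetical and the sample variances are invented, while the reliabilities (.70 and .75) are the ones mentioned in the question.

```python
def fixed_error_variance(reliability, sample_variance):
    """Error variance a = (1 - reliability) * sample variance."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return (1.0 - reliability) * sample_variance


# Hypothetical sample variances for the two variables in the question.
a_attitude = fixed_error_variance(0.70, 1.80)
a_church = fixed_error_variance(0.75, 2.40)
print(f"{a_attitude:.3f} {a_church:.3f}")  # 0.540 0.600
```

The resulting value would be the number at which the indicator's error variance is fixed; the higher the assumed reliability, the smaller the fixed error variance.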
Anonymous posted on Thursday, May 27, 2004 - 11:47 am
Hi. I'd like to know how to obtain the proportion of variance explained for the total structural equation model. I believe the software should generate that for me, but just don't know how to tell it to do that. Could someone help me out?
bmuthen posted on Thursday, May 27, 2004 - 1:03 pm
Isn't it sufficient to consider the R-squares for each equation in the model? Particularly since SEM does not aim to explain variance but correlations (covariances). Other readers' opinions are welcome.
Anonymous posted on Wednesday, November 10, 2004 - 2:44 pm
I noticed that there are no significance values for the estimated correlation matrix for the latent variables. How would I compute significance?
Also, I would like to know the estimated correlation matrix for manifest variables and latent variables. How would I obtain these?
Thanks in advance!
bmuthen posted on Sunday, November 14, 2004 - 11:27 am
To get significance tests for the latent variable correlations, set the metric of the factors by fixing their variances to 1 instead of setting the metric in the loadings.
The estimated correlations between manifest variables and between manifest and latent variables are not given by Mplus but have to be computed from the parameter estimates.
Anonymous posted on Friday, November 19, 2004 - 2:46 pm
Thanks for your message! To follow up: I noticed in the Mplus manual that the correlations between exogenous manifest variables are fixed in the model estimation. We can obtain these correlations with the SAMPSTAT option, correct? Also, I read that the covariances among continuous latent independent variables and manifest independent variables are free (which pertains to my model). Does Mplus therefore automatically include these covariances (between exogenous latent and manifest variables) in the model estimation and just not show them in the MODEL RESULTS section? I believe I should not specify these covariances since they involve exogenous variables. Is this correct? You mentioned above that I would need to compute the estimated covariances between manifest and latent variables from the parameter estimates. Let's say that I have a model with 3 IVs (2 manifest and 1 latent, for simplicity) predicting one continuous latent variable (thus a simple regression). How would I compute the covariance between the latent variable and the first manifest variable given the information provided by Mplus?
Thanks in advance!
bmuthen posted on Friday, November 19, 2004 - 4:56 pm
Yes on your first question.
Regarding correlations between exogenous manifest and latent variables, I misunderstood your earlier message: I didn't realize you were asking about parameters in the exogenous part of the model, as opposed to how to get correlations between manifest and latent variables more generally. In principle, relations among manifest and latent exogenous variables should be included in a model, either as correlations or as regressions of the latent variables on the manifest ones (assuming, as is often the case, that the manifest variables are demographics). If Mplus does not include such relationships in the output, then they are not estimated, and you need to add them to your model. This differs from correlations among exogenous manifest variables, which are fixed at their sample values because those would be the estimates if the correlation parameters were free. Relationships between manifest and latent variables, however, cannot be anticipated from the sample statistics and need to be estimated. Hope that clears it up.
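For the three-predictor example in the question (manifest IVs x1 and x2 and a latent IV xi predicting a latent eta), the model-implied covariance can be assembled from the estimates as cov(eta, x_k) = sum over j of b_j * cov(x_j, x_k). A minimal Python sketch; all numbers are hypothetical, and it assumes the residual of eta is uncorrelated with the predictors and that the covariances among x1, x2, and xi (including an estimated x1-xi covariance, as discussed in this reply) are available.

```python
import math


def cov_with_outcome(slopes, pred_cov, k):
    """Model-implied cov(eta, predictor k) = sum_j b_j * cov(x_j, x_k)."""
    return sum(b * pred_cov[j][k] for j, b in enumerate(slopes))


def outcome_variance(slopes, pred_cov, resid_var):
    """var(eta) = b' Sigma_xx b + residual variance of eta."""
    p = len(slopes)
    explained = sum(slopes[i] * slopes[j] * pred_cov[i][j]
                    for i in range(p) for j in range(p))
    return explained + resid_var


# Order: x1, x2, xi. Hypothetical estimates.
b = [0.4, -0.2, 0.6]
sigma = [
    [2.0, 0.5, 0.3],   # 0.3 is the x1-xi covariance, which must be estimated
    [0.5, 1.0, 0.1],
    [0.3, 0.1, 1.5],
]
psi = 0.56  # residual variance of eta

cov_eta_x1 = cov_with_outcome(b, sigma, 0)
var_eta = outcome_variance(b, sigma, psi)
corr_eta_x1 = cov_eta_x1 / math.sqrt(var_eta * sigma[0][0])
print(f"cov = {cov_eta_x1:.3f}, var(eta) = {var_eta:.3f}, corr = {corr_eta_x1:.3f}")
# cov = 0.880, var(eta) = 1.500, corr = 0.508
```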
Anonymous posted on Friday, November 26, 2004 - 11:45 am
I'm sorry if it's a stupid question, but I read the discussion and I still don't understand how you can calculate the explained variance for each factor in a CFA. I'm doing a CFA with 18 binary indicators and 3 factors. When I type output=stand, I get the R-squared for each indicator but not for each factor, ...
A factor does not have an r-square because it is not a dependent variable in a CFA model. Perhaps you mean the variance contributed by each factor to the observed variable variances. If so, you sum the squared factor loadings from an orthogonal (factors uncorrelated) solution and divide by the number of observed factor indicators.
Anonymous posted on Saturday, November 27, 2004 - 8:11 pm
I have a question regarding the consequences of specifying covariances between exogenous latent and manifest variables. I specified type = missing h1, estimator = ml, and algorithm = integration for my model. I understand that one must specify covariances between exogenous latent and manifest variables. I did this, but when I did, the model output also contained the covariances between exogenous manifest variables (not specified in my code). I read in the Mplus discussion that one should not specify covariances between exogenous manifest variables. I did not specify them, yet Mplus produced them in the Model Results. Is it the case that with algorithm=integration Mplus does not use the sample values? I just want to make sure I am not misspecifying my model.
Also, I read in the manual that one cannot specify missing h1 with algorithm=integration. I specified this and it ran. Can you explain this? Is it really running missing plain then? Further, how should one decide which info matrix to use (e.g., observed, expected, or comb)?
I would have to see your full output to answer your question. Please send it to firstname.lastname@example.org. H1 is on by default with ALGORITHM=INTEGRATION. Use the information matrix that is the default.
Anonymous posted on Monday, November 29, 2004 - 4:54 am
Yes, I meant the variance contributed by each factor to the observed variable variances.
So I sum "the squared factor loadings from an orthogonal (factors uncorrelated) solution and divide by the number of observed factor indicators". But when I do that, I get 0.66 for the first factor, 0.63 for the second factor, and 0.72 for the third factor. As you can see, if I add the variances contributed by the 3 factors, I get more than 100%! I think there is something wrong, no?
If I remember correctly, you are doing a CFA. In this case, you should be squaring the standardized factor loadings I believe. In CFA, a covariance matrix is analyzed. In EFA, a correlation matrix is analyzed.
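A minimal Python sketch of the computation discussed in this exchange, using standardized loadings from a CFA in which each indicator loads on one factor and the factors are treated as uncorrelated. The loadings and the helper name are hypothetical, and dividing by the total number of indicators is an assumption about what "the number of observed factor indicators" means; with standardized loadings the per-factor shares then cannot add to more than 100%.

```python
def factor_contributions(loadings_by_factor):
    """Share of total indicator variance contributed by each factor.

    loadings_by_factor: standardized loadings of the indicators of each
    factor (simple structure, uncorrelated factors assumed). Each sum of
    squared loadings is divided by the total number of indicators.
    """
    n_indicators = sum(len(lams) for lams in loadings_by_factor)
    return [sum(lam ** 2 for lam in lams) / n_indicators
            for lams in loadings_by_factor]


# Hypothetical standardized loadings: two factors, three indicators each.
contrib = factor_contributions([[0.8, 0.7, 0.6], [0.5, 0.6, 0.7]])
print([f"{c:.3f}" for c in contrib], f"total = {sum(contrib):.3f}")
# ['0.248', '0.183'] total = 0.432
```

If the factors are correlated, their contributions overlap and this simple sum no longer partitions the variance.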
Anonymous posted on Thursday, June 30, 2005 - 10:27 am
I used Mplus with ML to estimate a path model without latent variables. In one model, I had only 1 DV and 5 IVs. The R^2 was .882. In a second model, I had this 1 DV and the same 5 IVs in addition to other variables in the model, some of which predicted two of the IVs mentioned above (with no additional predictors of the DV mentioned above). The R^2 for this same DV in the 2nd model was .802.
Why the difference in R^2 for this one DV when exactly the same predictors were used to predict this DV in both models? The regression coefficients and standard errors relating to this one DV were identical in both models.
BMuthen posted on Saturday, July 02, 2005 - 5:50 pm
The second model that you specify sounds like it is over-identified. This may make the estimated covariance matrix for the initial set of 5 IVs different from what it was in the first model. You can check whether this is the case by requesting RESIDUAL in the OUTPUT command for the second model and comparing it to the SAMPSTAT values for the five IVs in the first model.
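The point can be illustrated with the usual formula R² = b'Σb / (b'Σb + ψ), where Σ is the model-estimated covariance matrix of the predictors. A minimal Python sketch with hypothetical numbers (two predictors for brevity): the slopes and residual variance are held identical, as in the question, and only the model-implied predictor covariance differs between the two models.

```python
def r_square(slopes, pred_cov, resid_var):
    """R2 = b' Sigma b / (b' Sigma b + residual variance)."""
    p = len(slopes)
    explained = sum(slopes[i] * slopes[j] * pred_cov[i][j]
                    for i in range(p) for j in range(p))
    return explained / (explained + resid_var)


b = [0.5, 0.3]   # identical slopes in both models
psi = 0.5        # identical residual variance in both models
sample_cov = [[1.0, 0.2], [0.2, 1.0]]    # model 1: reproduces the sample covariance
implied_cov = [[1.0, 0.0], [0.0, 1.0]]   # model 2: over-identified, implies a different one

r2_first = r_square(b, sample_cov, psi)
r2_second = r_square(b, implied_cov, psi)
print(f"{r2_first:.3f} {r2_second:.3f}")  # 0.444 0.405
```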
Anonymous posted on Thursday, September 15, 2005 - 12:22 pm
Where is the error term for each indicator in the Mplus output? Is it the S.E. column? Thanks
I am running models in which a dichotomous 'obesity' variable is the final dependent variable. Obesity is regressed on education, activity, diet, age, and smoking status, with indirect pathways running from education to obesity through activity, smoking, and diet. I have run separate models for men and women and found significant parameters associated with obesity for women only; however, the R-square value for the obesity variable is often higher for men than for women. This doesn't make sense to me, as none of the variables were significantly related to obesity among men. Any thoughts on why this might be happening? Education and activity are categorical, diet is a latent variable, age is continuous, and smoking is entered into the model as two dummy variables.
R-square is variance explained. There may be less variance for one group versus the other and less power for one group versus the other. The obesity variables may behave differently for males versus females.
I have the following SEM model:
F1 ---> F2
F2 ---> F4
F2 ---> F5
F3 ---> F4
F3 ---> F5
F4 ---> F5
One of my thesis questions was that F2, F3, and F4 have a direct effect on F5. F5 is a single-indicator latent variable.
When I run the model, all the paths are significant apart from F4 ---> F5.
Looking at the R-square values, F2 and F3 explain 88% of the variance of F4, and F2 and F3 explain 94% of the variance of F5. Can I explain the non-significant path by the fact that the variance of F4 not explained by F2 and F3 is too small to make a unique contribution to F5? Besides, F2 and F3 explain F5 to a great extent (94%), so there is little variance left for any unique contribution of F4. Are my arguments correct?
I have an additional question: if I use multiple regression, either hierarchical or stepwise, all the observed variables are significant in predicting F5. However, if I run the SEM, the variables for F4 ---> F5 are not significant. What are the possible reasons?
The non-significant direct effect of F4 on F5 may be due to F4 being highly correlated with F2 and F3, so that F4 does not contribute much in the F2, F3, F4 prediction of F5.
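A small numerical illustration of this explanation, in Python. The correlations are hypothetical and chosen so that F4 correlates .75 with F5 but is itself largely predictable from F2 and F3. The standardized coefficients are obtained by solving R_xx * beta = r_xy with a small Gaussian-elimination helper (`solve`, written here so the sketch needs no libraries).

```python
def solve(a, b):
    """Solve a x = b for a small square system by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


# Hypothetical correlations among predictors F2, F3, F4 and with outcome F5.
r_xx = [[1.0, 0.5, 0.8],
        [0.5, 1.0, 0.8],
        [0.8, 0.8, 1.0]]
r_xy = [0.70, 0.70, 0.75]

beta = solve(r_xx, r_xy)
r2_full = sum(b * r for b, r in zip(beta, r_xy))

beta_reduced = solve([row[:2] for row in r_xx[:2]], r_xy[:2])  # F2 and F3 only
r2_reduced = sum(b * r for b, r in zip(beta_reduced, r_xy[:2]))

print([f"{b:.3f}" for b in beta])  # F4's coefficient is near zero
print(f"R2 gain from adding F4: {r2_full - r2_reduced:.5f}")
```

Despite its strong zero-order correlation with F5, F4 adds almost nothing once F2 and F3 are in the equation.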
In your multiple regressions you predict F5 using observed variables. The factors of the SEM don't contain the measurement errors in those observed variables so you get different results. Another explanation could be that your SEM model does not fit the data well.
I want to be able to report the variance explained by each of 5 exogenous variables in other model components. My understanding is that if the exogenous variables are specified to be orthogonal and are the only predictors of a model variable, I can simply square the StdYX. But what if the exogenous variables are correlated, or what if other model variables also predict the variable? Can the StdYX be interpreted as a semipartial correlation?
Thanks for the quick reply. Actually, I am reporting the R-square values, but I was wondering if it is possible to somehow partial the variance between the exogenous variables, i.e., is there a semipartial correlation equivalent in SEM?
I think it would be hard to partial out variance explained contributions from correlated predictors.
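A minimal two-predictor sketch in Python of why this is hard. With standardized variables, b1 = (r1y - r12*r2y) / (1 - r12^2), similarly for b2, and R2 = b1*r1y + b2*r2y. The correlations are hypothetical. When r12 = 0 the squared coefficients add up to R2; when the predictors correlate they do not, and the squared semipartial correlations (each predictor's unique R2 increment) leave a shared portion that belongs to neither predictor.

```python
def two_predictor_summary(r1y, r2y, r12):
    """Standardized slopes, R-square, and squared semipartials for two predictors."""
    denom = 1.0 - r12 ** 2
    b1 = (r1y - r12 * r2y) / denom
    b2 = (r2y - r12 * r1y) / denom
    r2 = b1 * r1y + b2 * r2y
    unique1 = r2 - r2y ** 2   # R2 increment from adding x1 to x2 (squared semipartial)
    unique2 = r2 - r1y ** 2   # R2 increment from adding x2 to x1
    return {"b1": b1, "b2": b2, "r2": r2,
            "sum_sq_b": b1 ** 2 + b2 ** 2,
            "unique1": unique1, "unique2": unique2,
            "shared": r2 - unique1 - unique2}


orthogonal = two_predictor_summary(0.5, 0.4, 0.0)
correlated = two_predictor_summary(0.5, 0.4, 0.5)
for name, s in [("orthogonal", orthogonal), ("correlated", correlated)]:
    print(name, {k: round(v, 3) for k, v in s.items()})
```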
Amy Bleakley posted on Thursday, January 18, 2007 - 11:48 am
I am estimating a non-recursive, multiple group model. The models do not include any latent variables. Two questions:
1. Which method does Mplus use to calculate the r-squared for non-recursive models?
2. For one of the groups, I am obtaining an undefined r squared. The models have excellent fit, and negative variance (with both error variances and the estimated variances) isn't an issue. Do you have any idea what the problem may be? What diagnostics would you suggest?
This is a follow up to Bleakley's question posted on 1/18/2007. I'm taking Linda's response to the question to mean that
the R-squares reported in the output for a nonrecursive model represent the proportion of variance explained, and that I do not have to calculate an alternative R-square using the Bentler & Raykov (2000) equation. Bentler & Raykov contend that in nonrecursive models, explained variance is not simply summarized by a standard R-square measure because of the reciprocal interdependencies among variables.
Bentler, P. M., & Raykov, T. (2000). On measures of explained variance in nonrecursive structural equation models. Journal of Applied Psychology, 85, 125-131.
Is calculating the Bentler and Raykov R-square for nonrecursive models something that you and Bengt typically recommend? I don't know whether it would be better for my purposes or not (or even how I would figure that out). Thanks.
Yes it is possible to regress a dichotomous variable on a factor.
It is the variance explained by the set of covariates that the dependent variable was regressed on not by the model. With a dichotomous outcome, it is the variance explained of the underlying latent variable not the observed variable.
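For a binary outcome, this R-square is computed for the continuous latent response variable y*: the explained variance b'Σb divided by b'Σb plus a residual variance that is fixed by the link function. A minimal Python sketch with a hypothetical explained variance, assuming a y* residual variance of 1 for probit and π²/3 for logit; this follows the same construction as the McKelvey & Zavoina pseudo R-square asked about later in the thread.

```python
import math


def latent_response_r2(explained_var, link="probit"):
    """R-square for y* given the explained variance b' Sigma b.

    Assumes the y* residual variance is 1 for probit and pi**2 / 3
    for logit (standard logistic distribution).
    """
    residual = {"probit": 1.0, "logit": math.pi ** 2 / 3}[link]
    return explained_var / (explained_var + residual)


# Hypothetical explained variance b' Sigma b = 1.0 on the y* scale.
print(f"probit: {latent_response_r2(1.0, 'probit'):.3f}")  # probit: 0.500
print(f"logit:  {latent_response_r2(1.0, 'logit'):.3f}")   # logit:  0.233
```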
Using the "Delta method", where the variance of R-square is computed from the variances and covariances of the parameter estimates that contribute to R-square. The Version 5 Tech doc about standardization describes the Delta approach in general.
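A minimal sketch of the Delta method for the simplest case, in Python: one covariate x with slope b and residual variance psi, so R2 = b²v / (b²v + psi). It assumes the covariate variance v is treated as a known constant and that the sampling covariance matrix of the estimates of b and psi is available; the numbers and the function name are hypothetical, and Mplus's own computation is the one described in its technical documentation.

```python
import math


def r2_delta_se(b, psi, v, acov):
    """R-square and its Delta-method standard error for y = b*x + e.

    acov: 2x2 sampling covariance matrix of the estimates (b, psi).
    v: variance of x, treated as fixed.
    """
    explained = b * b * v
    total = explained + psi
    r2 = explained / total
    grad = [2.0 * b * v * psi / total ** 2,   # dR2/db
            -explained / total ** 2]          # dR2/dpsi
    var_r2 = sum(grad[i] * acov[i][j] * grad[j]
                 for i in range(2) for j in range(2))
    return r2, math.sqrt(var_r2)


r2, se = r2_delta_se(b=0.5, psi=0.75, v=1.0, acov=[[0.01, 0.0], [0.0, 0.0025]])
print(f"R2 = {r2:.3f}, SE = {se:.4f}")  # R2 = 0.250, SE = 0.0760
```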
Hi Linda and Bengt: I have tested a model with one exogenous latent construct, controlling for 3 other observed measures. My advisor has asked how much the exogenous latent construct adds to the explained variance of the final DVs above and beyond the 3 control variables. After reviewing this thread, I'm not sure if the variance can be parcelled out. I will also review the SEMNET archives to see if I can figure this out. Thanks, Sue
The exogenous factor is presumably correlated with the observed exogenous control variables, so it influences the variance explained not only by itself but also through the control variables, i.e., in too complex a fashion to separate.
I have run 2 models each containing a total of 14 factors. In the one model, I regressed one of the factors (factor #1) on one of the others (factor #2). In the other model, I regressed factor #1 on both factor #2 and a third factor (factor #3). The R-square for the latent variable factor #1 is larger in the first model (with one fewer predictor) and this makes no sense to me. I have not constrained the measurement model parameter estimates to be equal across the two models, might this have something to do with my nonsensical result? One other strange aspect of the output is that the modification indices for both models include output regarding a measurement model parameter that is already being freely estimated.
I would need more information to answer your question. Please send both outputs and your license number to email@example.com.
jmaslow posted on Friday, December 04, 2009 - 1:57 pm
I have a fairly simple model in which 2 latent exogenous variables predict 2 latent outcome variables. The latent outcomes are correlated with each other, and each has only 1 indicator. I have requested standardized output, but I do not receive r square values for the latent outcome variables, only for each of my observed variables (which are the indicators of the independent and dependent latents). How can I calculate r square in each dependent variable?
Hi Linda, I am using a second-order latent factor (measured by three first-order factors) as a mediator between socio-demographic and health variables and an outcome. After running the hypothesized model, the modification indices suggest paths from the socio-demographic and health variables to the first-order factors of the mediating factor, and also paths from the first-order factors of the mediating factor to the outcome.
Does this suggest the first-order factors are more important in predicting the outcome than the second-order factor? Is it suggesting that we use the three first-order factors instead of one second-order factor?
The modification indices show where freeing parameters will improve model fit. Whether the parameters should be freed should be guided by theory.
Mike Zyphur posted on Thursday, September 16, 2010 - 8:40 pm
Hi Linda and Bengt, Quick question: relevant to the notion of semi-partial/part correlations, can you think of any way to work directly with residuals without influencing the part of the model influencing a variable Y directly? I cannot.
The following approach can be used to obtain the residual for a factor indicator y4 and to regress a variable z on this residual.
MODEL:
f BY y1-y4;
y4res BY;         ! define the residual as a factor, picking up the residual
                  ! variance as the variance of this factor
y4 ON y4res@1;    ! let the residual influence y4 with the requisite unit slope
y4@0;             ! fix the original residual variance to zero
z ON y4res;
y4res WITH f@0;
Mike Zyphur posted on Saturday, September 18, 2010 - 5:51 am
Hi Linda, Thanks for the input and your time. With the y4res model, it seems that in many ways specifying the residual of y4 as a factor y4res is very similar to not having the factor there. For example, if we have two models:
they will return equivalent parameter estimates inasmuch as Y1 is simultaneously regressed on X1 and X2 (through Y1res). I can't think of how to do it, but I want to specify a structural model for the residual after other parts of the model have been estimated--like a semi-partial correlation. With covariances this is easily done as
Y1 on X1; Y1 with X2;
but extending this to allow regression among the residual and X2, after estimating the effect of X1 on Y1, is tough. A 2 stage procedure is likely required, where one would save the residuals and then use them as data. Thoughts?
Maybe the dependent variable needs to be a factor with multiple indicators.
Mike Zyphur posted on Sunday, September 19, 2010 - 9:06 am
Sorry, I missed it before, but the solution seems as follows, and does require the logic of latent variables for residuals. Here are a few different types of models. For semi-partial correlations where the goal is r_xy partialing for Z:
Y on Z; Y with X;
For changes in R^2 in step-wise regression, where a first step contains Z and the second step adds X, we need a semi-partial regression coefficient partialing for Z, which can be done with
Then we can compute changes in R^2 in a MODEL CONSTRAINT statement where we divide the residual of Y by the total variance in Y.
For future reference, for all forum goers: see Preacher (2006) for details. Happy modeling!
Preacher, K. J. (2006). Testing complex correlation hypotheses with structural equation models. Structural Equation Modeling, 13, 520-543.
luke fryer posted on Wednesday, September 29, 2010 - 9:32 pm
I would like to ask about a post from 2004:
"Anonymous posted on Thursday, May 27, 2004 - 11:47 am Hi. I'd like to know how to obtain the proportion of variance explained for the total structural equation model. I believe the software should generate that for me, but just don't know how to tell it to do that. Could someone help me out?
Dr. Muthen's reply suggested that it would be sufficient to consider the R-Square for each part of the model... I think...
In my case, I am modeling a test of Vocabulary knowledge (second language learners) and in addition to testing the dimensions of vocabulary recall, I wish to determine the proportion of variance (from the entire data set) explained by the working model.
Confirmatory factor analysis with Mplus of the test suggests that our model fits the data well (dichotomous data). But because it is a test, we need to know whether the model sufficiently explains the variance in the data, that is, whether the model as a whole and the factors individually are meaningful with regard to variance explained.
I've tried to read through most of the posts here, but I haven't quite found what I'm looking for, so please forgive me if it's a repeat.
In a basic SEM, I'm trying to estimate the variance explained (VE) for each individual path (with multiple paths, of course), in addition to the total variance explained (which I know how to get in Mplus). I'm a regular user of LISREL (don't shoot me, I'm trying to convert), and I've found that by simply multiplying the value in the 'Correlation matrix of the ETA' by the BETA path, you get the amount of VE for that predictive path only (this is validated by adding them together and getting the same value as the Squared Multiple Correlation for the dependent variable).
My question is: can Mplus provide output equivalent to the 'Correlation matrix of the ETA' provided by LISREL? If not, is there any way to compute individual path VEs from Mplus?
I only ask because some journals love VE estimates as they think we get too excited about significance for any one path.
The TECH4 option of the OUTPUT command will give you the correlations of the factors.
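The LISREL-style decomposition described in the question can be reproduced from Mplus estimates: with standardized path coefficients beta_j into the outcome and the factor correlations from TECH4, each predictor's piece is beta_j * r_jy, and the pieces sum to R2 = beta' R beta. A minimal Python sketch with hypothetical values; here r_jy is computed from the predictor correlations and the paths, assuming the outcome's residual is uncorrelated with its predictors.

```python
def path_decomposition(beta, r_xx):
    """Split R-square into beta_j * r_jy pieces.

    beta: standardized (StdYX) paths into the outcome.
    r_xx: correlations among the predictors (e.g. taken from TECH4).
    Assumes the outcome residual is uncorrelated with the predictors,
    so r_jy = sum_k beta_k * r_xx[j][k].
    """
    p = len(beta)
    r_xy = [sum(beta[k] * r_xx[j][k] for k in range(p)) for j in range(p)]
    pieces = [beta[j] * r_xy[j] for j in range(p)]
    return pieces, sum(pieces)


r_xx = [[1.0, 0.3, 0.2],
        [0.3, 1.0, 0.4],
        [0.2, 0.4, 1.0]]
pieces, r2 = path_decomposition([0.4, 0.3, 0.2], r_xx)
print([f"{x:.3f}" for x in pieces], f"R2 = {r2:.3f}")
# ['0.212', '0.150', '0.080'] R2 = 0.442
```

Under suppression a piece can be negative, so the pieces are not always interpretable as shares of variance.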
sunnyshi posted on Monday, March 07, 2011 - 8:29 pm
Dear Mplus, I am running an SEM with a latent interaction term. Although I requested standardized output, because of TYPE=RANDOM the message says "STANDARDIZED (STD, STDY, STDYX) options are not available for TYPE=RANDOM." In this case, how could I get the R-square for each dependent variable? Thanks!
I was hoping you could help me with a question regarding percent variance accounted for in EFA. I have 3 factors, varimax rotated. I understand that this is calculated by summing the squared loadings for each factor and dividing this by the number of items, but my questions are:
1) Do I zero out the loadings that are under my cutoff criteria (.4) before summing? 2) Do I divide by the total number of initial items entered in the analysis, or by the number of items that were retained?
I have read through this post, but I'm still not sure if my question has actually been answered. I am trying to calculate the variance explained by ENERGY (a latent variable) predicting pewl6m while controlling for TX, BMI, and Age. I know the R^2 provides the proportion of variance explained by the whole model, but I want to know only what ENERGY is contributing. See syntax below.
ENERGY BY MR47-MR56;
ENERGY ON TX; pewl6m ON ENERGY TX BMI AGE; BMI ON TX; BMI; AGE;
MR50 WITH MR48; MR54 WITH MR53;
Also, for the R^2 I do not get a p-value in the output. How can I tell if the R^2 is significant?
Thank you for your response, but is there a way to calculate proportion of variance for a specific predictor using the standardized coefficient or residual variances? Also, I am using Mplus version 6, but the p-values are not showing up for R-square. Do I need to type Standardized in the Output section?
I ran a path model including only observed variables. For the outcome variable, Mplus calculated an R-square of 0.76. This seemed high to me, so I ran a simple multiple regression analysis using the same 4 predictors in SPSS, and I got an R-square of 0.31. Do you know why there would be such a considerable discrepancy?
It would be impossible to say without seeing both outputs. The most likely reason is that the samples are not the same. If you would like us to look at it, send the two outputs and your license number to firstname.lastname@example.org.
Hello, I would appreciate your guidance on three points. I have been asked to support including a latent variable in a model predicting a binary outcome after controlling for other variables, or the R-square for one of a set of predictors.
One approach I have experimented with is constraining the path from the latent variable to zero, looking at the R-square for the rest of the model, and then releasing the constraint and checking for a difference in the R-square value. Ignoring the problem I address next, is this a valid approach, and would an observed difference have meaning?
Second, I recognize that for binary variables Mplus calculates "the variance explained of the underlying latent variable not the observed variable". I think my binary variable does not have an underlying latent variable (it is an either/or variable, e.g. pregnant or not, quit or not). For this type of variable, does McKelvey & Zavoina pseudo R-squared make sense?
Do you have a suggestion for how to figure out something like the change in the model's ability to properly classify cases based on including the latent variable or not? Thanks.
I would not use R-square for this purpose particularly with categorical variables. In the logistic regression literature, classification quality is used for this type of decision. You can look at the classification quality of the model with the factor and without the factor. You do this as follows.
1. Create an estimated probability for each person's observed 0 or 1.
2. Create a classification for each person based on the rule: 0 if the estimated probability is less than or equal to .5, and 1 if the estimated probability is greater than .5.
3. For the estimated models, compute the estimated probability for each person and create a classification according to number 2.
4. Compare the classifications from the two estimated models to the observed scores and see which model has the better classification.
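Steps 2 to 4 can be sketched in Python. The observed outcomes and the two sets of estimated probabilities are hypothetical placeholders for what would be computed from the two fitted models (without and with the factor); the 0.5 cut-off follows the rule in step 2. This compares in-sample classification only.

```python
def classify(probabilities, cutoff=0.5):
    """0 if the estimated probability is <= cutoff, else 1."""
    return [1 if p > cutoff else 0 for p in probabilities]


def accuracy(observed, predicted):
    """Proportion of cases whose predicted class matches the observed 0/1."""
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)


observed = [1, 0, 1, 1, 0, 0]
p_without_factor = [0.60, 0.40, 0.45, 0.70, 0.55, 0.30]
p_with_factor = [0.80, 0.20, 0.65, 0.75, 0.40, 0.35]

acc_without = accuracy(observed, classify(p_without_factor))
acc_with = accuracy(observed, classify(p_with_factor))
print(f"without factor: {acc_without:.3f}, with factor: {acc_with:.3f}")
# without factor: 0.667, with factor: 1.000
```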
Heike Link posted on Friday, April 27, 2012 - 4:54 am
I am rather new to Mplus and have a question:
I have estimated a model which consists of an ordinary logit model and a latent variable part, where both parts were estimated jointly using MLR with Monte Carlo simulation. The structure of my model is below:
Y ON x1-x5 (Y is a binary variable; x1-x5 are continuous predictors)
Y ON LV2
LV1 BY i1 i2 (i1 and i2 are continuous indicators)
x6 x7 ON LV1 (x6 and x7 are exogenous variables, continuous)
LV2 BY i3 i4 i5 i6 (i3-i6 are categorical indicators)
x8 x9 ON LV2 (x8 and x9 are exogenous variables, continuous)
LV2 ON LV1.
The model was aimed at providing a better explanation of Y by including latent variables, and this worked out well: the parameters are significant and can be interpreted well. However, when I compare the simple logit part (Y ON x1-x5, estimated separately) with the results of my model above, I see on the one hand an increase in R-square (it doubles), but on the other hand a dramatic fall in the loglikelihood and in AIC and BIC, which I had not expected. I have also tested the latent variable part separately with CFA using WLSMV and get reasonable fit indices (CFI 0.987, TLI 0.982, RMSEA 0.028). I would be grateful for any explanation of this.
Thank you for your guidance on my questions regarding change in a model's ability to classify cases. I have a couple of follow up questions regarding how to implement your suggested approach.
I have used the guidance on the message boards to figure out how to calculate the probabilities for a model with observed covariates using the material in short course topic 2 and the user's guide (April 2010, p. 440 - Calculating Probabilities From Probit Regression Coefficients). I am stuck on how to calculate the probabilities for individual cases after I include a latent variable in the model. My questions: 1 - Do I use each individual's factor score multiplied by the estimate for the latent variable and add this term to the probability equation? 2 - To test the differences properly does the estimate for the latent variable need to be constrained to zero in the first round of estimates? 3 - Should I calculate probabilities using estimates or standardized estimates (and if standardized estimates are used do I use a combination of different types of standardization based on the type of covariate, e.g., binary: STDY, continuous: STDYX of just one set of estimates)? Thank you in advance for your suggestions.
Thank you for pointing me towards slides 163/164 and equations 46/47. I have been going over the slides. Correct me if I have misunderstood the equations, but my take is that they calculate item probabilities for one of the indicators of f1 and a covariate.
I am wondering if it is possible to extend these equations to a model with multiple lambdas and etas (say, a latent variable with three indicators, so that the full latent variable is accounted for in calculating individual probabilities). If this is possible, what would the equation look like?
Binary on F1 is just like having another indicator of F1, so it is correct to use the formulas mentioned above. You don't have to bring in y1-y3 because you condition on F1 and once you do that, y1-y3 don't have further influence on Binary.
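If one wants the probability of Binary given a covariate x, averaged over the distribution of F1 rather than evaluated at a factor score, there is a standard closed form for probit when F1 given x is normal. Assuming Binary* = beta*F1 + gamma*x + e with residual variance 1 (as in ML probit) and F1 | x ~ N(mu, sigma2), P(Binary = 1 | x) = Phi((-tau + beta*mu + gamma*x) / sqrt(1 + beta^2 * sigma2)). This is a general result, not necessarily the exact form on the slides; it is exact for probit and only approximate for logit. The sketch checks the formula against brute-force numerical integration, with hypothetical values.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def marginal_probit_closed_form(tau, beta, gamma, x, mu, sigma2):
    """P(u = 1 | x), integrating over F1 ~ N(mu, sigma2), probit link."""
    num = -tau + beta * mu + gamma * x
    return normal_cdf(num / math.sqrt(1.0 + beta ** 2 * sigma2))

def marginal_probit_numeric(tau, beta, gamma, x, mu, sigma2, n=4000):
    """Same quantity by midpoint-rule integration over F1 (+/- 8 SD)."""
    sd = math.sqrt(sigma2)
    lo, hi = mu - 8.0 * sd, mu + 8.0 * sd
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        f = lo + (i + 0.5) * step
        density = normal_pdf((f - mu) / sd) / sd
        total += normal_cdf(-tau + beta * f + gamma * x) * density * step
    return total

# Hypothetical values: mu and sigma2 would be the mean and residual variance of F1 given x.
args = (0.3, 1.2, 0.5, 1.0, 0.2, 0.8)
closed = marginal_probit_closed_form(*args)
numeric = marginal_probit_numeric(*args)
print(round(closed, 6), round(numeric, 6))
```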
Heike Link posted on Tuesday, June 19, 2012 - 6:48 am
I have estimated an integrated binary logit and latent variable model using the MLR estimator, with Monte Carlo simulation for integration. Since the conventional R-square does not make sense here, I am wondering whether the R-square given in the output is McFadden's pseudo R-square or one of the other available measures.
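A later post in this thread refers to the R-square reported for such a model as the McKelvey-Zavoina measure, which is defined for the continuous latent response variable u* underlying the binary outcome: R-square = Var(linear predictor) / (Var(linear predictor) + residual variance), with residual variance pi^2/3 under logit and 1 under probit. A minimal sketch, assuming the slopes and the covariance matrix of the predictors (including any latent predictors) are available; the numbers are hypothetical.

```python
import math

def explained_variance(slopes, cov):
    """b' Sigma b: variance of the linear predictor."""
    k = len(slopes)
    return sum(slopes[i] * cov[i][j] * slopes[j] for i in range(k) for j in range(k))

def mckelvey_zavoina_r2(slopes, cov, link="logit"):
    """R-square for the latent response u*; residual variance pi^2/3 (logit) or 1 (probit)."""
    resid = math.pi ** 2 / 3.0 if link == "logit" else 1.0
    v = explained_variance(slopes, cov)
    return v / (v + resid)

# Hypothetical slopes and predictor covariance matrix.
r2 = mckelvey_zavoina_r2([0.9, -0.4], [[1.0, 0.3], [0.3, 2.0]])
print(round(r2, 3))
```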
Can you please explain how the standard error and p-values for R-square are calculated in Mplus?
I am running a linear regression model on observed variables in a complex data set (students clustered in classrooms). My understanding has always been that R square is tested by F = (R-square/k)/((1-R-square)/(N-k-1)). I am not sure how to interpret the standard error associated with R square in the Mplus output and how it is used in calculating the p-value.
We use a z-test instead of an F test. The standard error is computed using the Delta method.
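To make this concrete, a minimal sketch of the Delta method for the simplest case, y = beta*x + e, where R-square = beta^2 Var(x) / (beta^2 Var(x) + psi). The gradient of R-square with respect to (beta, psi) is combined with the sampling covariance matrix of those two estimates, and the z statistic is the estimate divided by its standard error. Here Var(x) is treated as known and the sampling covariance matrix is hypothetical; Mplus presumably works with the full parameter covariance matrix, so this only illustrates the idea.

```python
import math

def r2_and_delta_se(beta, psi, var_x, cov_beta_psi):
    """R-square for y = beta*x + e and its Delta-method standard error.

    cov_beta_psi is the 2x2 sampling covariance matrix of (beta, psi);
    var_x is treated as fixed (a simplifying assumption).
    """
    explained = beta ** 2 * var_x
    total = explained + psi
    r2 = explained / total
    # Gradient of R-square with respect to (beta, psi).
    g = (2.0 * beta * var_x * psi / total ** 2, -explained / total ** 2)
    var_r2 = sum(g[i] * cov_beta_psi[i][j] * g[j] for i in range(2) for j in range(2))
    return r2, math.sqrt(var_r2)

# Hypothetical estimates and sampling covariance matrix.
r2, se = r2_and_delta_se(beta=0.5, psi=0.75, var_x=1.0,
                         cov_beta_psi=[[0.0025, 0.0], [0.0, 0.0036]])
z = r2 / se
print(round(r2, 3), round(se, 4), round(z, 2))
```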
Utkun Ozdil posted on Monday, August 13, 2012 - 8:52 pm
Hi, I'm estimating a two-level model in which three dependent variables are regressed on student- and classroom-level predictors. When I requested STANDARDIZED in the OUTPUT command to get the explained variances for the student-level model (level-1 predictors only), the Mplus output gave me residual variances on the within level, variances on the between level, and R-squares on the within level for each of the three dependent variables. For the full model, in which I included both student- and classroom-level predictors, the output gave me residual variances and R-squares on both the within and between levels for each of the three dependent variables. Instead of getting these values for each dependent variable, is there a way to get a single explained-variance value from Mplus for the student model and for the full model, in the way that models are often summarized by one R-square for the fixed effects? Thanks.
We don't give an R-square for dependent variables together.
Utkun Ozdil posted on Tuesday, August 14, 2012 - 12:35 pm
Linda, in the output of the within-classroom model, where I included only gender and ses as student-level predictors, I got these R-squares for Math1, Math2, and Math3. Should I have estimated separate covariate models for Math1, Math2, and Math3 so that the Mplus output would give me a single within-level R-square for each?
TITLE: Within-Classroom Model
DATA: FILE IS wcm.dat;
VARIABLE: NAMES ARE class gender ses Math1 Math2 Math3;
USEVARIABLES ARE gender ses Math1 Math2 Math3;
WITHIN = gender ses;
CENTERING = GRANDMEAN (gender ses);
CLUSTER IS class;
MODEL:
%WITHIN%
Math1 ON gender ses;
Math2 ON gender ses;
Math3 ON gender ses;
%BETWEEN%
Math1; Math2; Math3;
ANALYSIS: TYPE IS TWOLEVEL;
OUTPUT: SAMPSTAT STANDARDIZED;
Thanks.
You can compare the STD coefficients. You can't divide up the explained variance because f2 and f3 are correlated.
Tyler Mason posted on Monday, April 22, 2013 - 12:16 pm
I ran an SEM model in Mplus and asked for the R2 values. Do I interpret an R2 value as the percentage of variance in the DV explained by the model, or as the percentage explained by the last mediator/IV?
Each R-square value is the proportion of variance in that dependent variable explained by all of the covariates it is regressed on in the model. For the following regression,
y ON x1 x2 x3;
R-square is the variance in y explained by x1, x2, and x3.
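For a single regression like this one, R-square can be reproduced from the unstandardized output as R-square = b' Sigma_x b / (b' Sigma_x b + psi), where b holds the slopes, Sigma_x is the covariance matrix of x1-x3 (observed or latent), and psi is the residual variance of y. A minimal sketch with hypothetical numbers:

```python
def r_square(slopes, cov_x, residual_variance):
    """R-square = explained / (explained + residual) for y ON x1 ... xk."""
    k = len(slopes)
    explained = sum(slopes[i] * cov_x[i][j] * slopes[j] for i in range(k) for j in range(k))
    return explained / (explained + residual_variance)

b = [0.30, 0.20, -0.10]          # hypothetical slopes for x1, x2, x3
cov_x = [[1.0, 0.4, 0.2],        # hypothetical covariance matrix of x1-x3
         [0.4, 1.0, 0.3],
         [0.2, 0.3, 1.0]]
psi = 0.80                       # hypothetical residual variance of y
print(round(r_square(b, cov_x, psi), 3))
```

Because the covariances among the predictors enter the explained variance, correlated predictors can raise or lower R-square relative to the sum of their separate contributions.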
Heike Link posted on Thursday, June 06, 2013 - 1:30 am
Some time ago I performed an analysis with Mplus and estimated a binary logit model with latent variables (a MIMIC part), estimated jointly using MLR with Monte Carlo simulation. The structure of my model is:
Now I have received several comments and requests from reviewers of a paper, and I would be very grateful for some advice on the following issues.
1) I was asked to report further pseudo R-square measures in addition to the McKelvey-Zavoina one. Is this possible in Mplus?
2) The reviewers recommended using the p^2 value, defined as 1-(L(β)/L(0)), where L(β) is the loglikelihood of the final model at convergence and L(0) is the loglikelihood of a model containing only the alternative-specific constant. How can I get this from the Mplus output? And for clarification: the loglikelihood in the output is shown as the H0 value (what exactly is this?), and the output also gives a scaling correction factor for MLR. What exactly is this factor?
3) Finally, when using Monte Carlo simulation for MLR, is it based on Halton draws?
1) No, but you could probably compute this by hand using the Mplus output.
2) I am not familiar with the p^2 value; I wonder if you are referring to McFadden's R-square, which uses the ratio of the ML loglikelihood values with the slopes for the covariates free versus fixed at zero. For a description of the loglikelihood, Google McFadden's R-square or likelihood.
3) You say "simulation", but perhaps you mean "integration"? I don't know what Halton draws are, so we probably don't do that.
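Following answers 1) and 2), a minimal sketch of McFadden's measure (the ratio the reviewers describe), plus two other pseudo R-square measures that can be computed by hand from two loglikelihood values: the final model, and the same model with the slopes for the binary outcome fixed at zero, which has to be estimated separately. The loglikelihoods and sample size are hypothetical.

```python
import math

def mcfadden_r2(ll_model, ll_null):
    """McFadden: 1 - logL(model) / logL(null)."""
    return 1.0 - ll_model / ll_null

def cox_snell_r2(ll_model, ll_null, n):
    """Cox & Snell: 1 - exp(2 (logL(null) - logL(model)) / n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_model, ll_null, n):
    """Nagelkerke: Cox & Snell rescaled so that its maximum is 1."""
    return cox_snell_r2(ll_model, ll_null, n) / (1.0 - math.exp(2.0 * ll_null / n))

ll_null, ll_model, n = -540.0, -480.0, 800   # hypothetical values
print(round(mcfadden_r2(ll_model, ll_null), 3),
      round(cox_snell_r2(ll_model, ll_null, n), 3),
      round(nagelkerke_r2(ll_model, ll_null, n), 3))
```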
Heike Link posted on Thursday, June 06, 2013 - 9:52 am
Many thanks for the quick reply!
But I still do not know what the H0 value of the Loglikelihood means, and what the scaling factor is (see my second question).
The H0 loglikelihood is the loglikelihood for the estimated model. See Bryant and Satorra (2011) for a description of the scaling correction factor. The link to the paper is at the bottom of the section on the website that describes MLM and MLR Difference testing.
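To show where the scaling correction factor enters, a minimal sketch of the loglikelihood version of the MLR scaled difference test, as described on the Mplus website page on MLM and MLR difference testing: cd = (p0*c0 - p1*c1)/(p0 - p1) and TRd = -2(L0 - L1)/cd with df = p1 - p0, where model 0 is the nested model, p is the number of free parameters, c the scaling correction factor, and L the H0 loglikelihood. The values are hypothetical, and the chi-square tail function is written out only to keep the sketch dependency-free.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (regularized gamma recurrence)."""
    y = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def scaled_loglik_difference(l0, p0, c0, l1, p1, c1):
    """MLR scaled difference test; model 0 is nested in model 1 (p0 < p1)."""
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)
    trd = -2.0 * (l0 - l1) / cd
    df = p1 - p0
    return trd, df, chi2_sf(trd, df)

# Hypothetical H0 loglikelihoods, parameter counts, and scaling correction factors.
trd, df, p = scaled_loglik_difference(l0=-2315.2, p0=28, c0=1.21,
                                      l1=-2308.9, p1=31, c1=1.18)
print(round(trd, 2), df, round(p, 4))
```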
I would like to test for a significant change in the R2 of an observed variable in an SEM model.
In the first model, the observed metric variable is regressed on four latent factors. In the second model, the observed variable is regressed only on a second-order factor that is constituted by the four latent factors. Is there a way to test for a significant change in the R2 between the two models?
I am referring to a question asked by sunnyshi about the calculation of R square when using type=random. I have had a look at the explanation in the FAQs as you recommended, but I don't understand it completely. Where do I find the values for beta, the variance of eta2, the variance of zeta1 and the variance of zeta3 in Mplus? Or can I just assume they are equal to one?
If I have another independent variable in my model, would I simply add beta4 to the equation to calculate the variance of eta3?
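A minimal sketch of the variance decomposition behind such formulas, under the assumptions usually made for latent interactions: the predictors have zero means and are jointly normal, so the product eta1*eta2 is uncorrelated with the linear terms and Var(eta1*eta2) = V(eta1)V(eta2) + Cov(eta1, eta2)^2. Then Var(eta3) = b' Sigma b + beta3^2 [V(eta1)V(eta2) + Cov(eta1, eta2)^2] + V(zeta3), and R-square = 1 - V(zeta3)/Var(eta3). This is a general derivation, not necessarily the exact FAQ formula; all numbers are hypothetical, and a quick simulation checks the explained part.

```python
import math
import random

def interaction_variance_and_r2(b_linear, cov_linear, beta_int, resid_var):
    """Var(eta3) and R-square for eta3 = b'eta + beta_int*eta1*eta2 + zeta3.

    Assumes zero means and joint normality; eta1 and eta2 must be the
    first two entries of the linear predictors.
    """
    k = len(b_linear)
    linear = sum(b_linear[i] * cov_linear[i][j] * b_linear[j]
                 for i in range(k) for j in range(k))
    v1, v2, c12 = cov_linear[0][0], cov_linear[1][1], cov_linear[0][1]
    product = beta_int ** 2 * (v1 * v2 + c12 ** 2)
    total = linear + product + resid_var
    return total, 1.0 - resid_var / total

# Hypothetical values: eta3 = 0.4*eta1 + 0.3*eta2 + 0.2*eta1*eta2 + zeta3.
b = [0.4, 0.3]
cov = [[1.0, 0.3], [0.3, 1.0]]
total, r2 = interaction_variance_and_r2(b, cov, beta_int=0.2, resid_var=0.6)

# Simulation check of the explained part (zeta3 omitted).
rng = random.Random(1)
draws = []
for _ in range(200_000):
    e1 = rng.gauss(0.0, 1.0)
    e2 = 0.3 * e1 + math.sqrt(1.0 - 0.3 ** 2) * rng.gauss(0.0, 1.0)
    draws.append(0.4 * e1 + 0.3 * e2 + 0.2 * e1 * e2)
mean = sum(draws) / len(draws)
simulated = sum((d - mean) ** 2 for d in draws) / len(draws)
print(round(total - 0.6, 4), round(simulated, 4), round(r2, 3))
```

To add a fourth predictor, append its slope to b and its variance and covariances to cov (keeping eta1 and eta2 in the first two positions); when it is correlated with eta1 or eta2, it contributes more than beta4^2 times its variance.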
One of my moderating variables is binary. Is there still a way to calculate R-square myself? (The variance of the variable and the covariance of the binary variable with the other construct would be needed.)
With a binary observed mediator you can do two-group analysis as mentioned on page 688 of the V7 UG. The formulas of the FAQ refer to normally distributed variable and would not apply to a binary variable.
I am trying to find out how R-square is calculated for a DV in a structural equation model when there are multiple non-orthogonal predictors (i.e., as in multiple regression). I am looking at a model in which, for example, y is predicted only by x1. R-square in this case is the squared standardized beta (right?). Then I look at a model in which y is predicted by both x1 and x2, which are correlated. The effect of x1 is now different, and there is the additional effect of x2. How is the R-square of y calculated now? In my case, the effect of x1 increases when x2 is included (and the effect of x2 becomes negative compared to when it is the single predictor, as in a suppression effect). I expected the variance explained to increase, but R-square hardly changes. I was wondering why the differences in the effects are not also reflected in the variance explained. I assume that knowing how it is calculated would help me interpret the R-square values I am getting.
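With standardized variables, the two-predictor solution can be written entirely in terms of correlations: beta1 = (r1y - r12*r2y)/(1 - r12^2), beta2 = (r2y - r12*r1y)/(1 - r12^2), and R-square = beta1^2 + beta2^2 + 2*beta1*beta2*r12, which equals beta1*r1y + beta2*r2y. The cross term is where the correlation between the predictors enters. The sketch uses hypothetical correlations chosen to mimic the pattern described.

```python
def two_predictor_solution(r1y, r2y, r12):
    """Standardized slopes and R-square for y regressed on x1 and x2."""
    denom = 1.0 - r12 ** 2
    b1 = (r1y - r12 * r2y) / denom
    b2 = (r2y - r12 * r1y) / denom
    r2 = b1 ** 2 + b2 ** 2 + 2.0 * b1 * b2 * r12
    return b1, b2, r2

# Hypothetical correlations.
r1y, r2y, r12 = 0.5, 0.2, 0.6
print("x1 only: beta =", r1y, "R2 =", r1y ** 2)
b1, b2, r2 = two_predictor_solution(r1y, r2y, r12)
print("x1 and x2:", round(b1, 3), round(b2, 3), round(r2, 3))
```

With these values the standardized effect of x1 rises from 0.50 to about 0.59, x2 gets an effect of about -0.16, and R-square moves only from 0.25 to about 0.27, because the negative cross term offsets most of the larger squared slopes.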
Thank you, this is really helpful! I have a couple more technical questions and a couple more conceptual ones; I hope this is not too long:
1. I used only values from the standardized solution (where the variances of x1 and x2 are always 1 and their covariance is the correlation). The resulting R^2s matched those in the output, so I assume this was right. Is there any point in calculating the above with unstandardized estimates?
2. If one of the betas is positive and the other negative, then the total R^2 is always smaller than when the two effects are in the same direction. Is this correct? I have one large positive effect and one smaller but significant negative effect. The total R^2 is smaller than it would have been with only the large effect, so I assume this is because the effects are opposite. But is this true?
3. Relatedly, in the case of two x's, is it possible to determine what proportion of the total R^2 stems from each x?
4. In the case of two x's, would it be correct to say that, hypothetically, if we could measure only the unique part of x1 (what it does not have in common with x2), then the effect of such a measure on y would be beta1?
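For questions 3 and 4, a sketch of two quantities that are often reported, in a standardized two-predictor setup with hypothetical correlations. Neither is a unique partition when the predictors are correlated: Pratt's measure beta_j*r_jy sums to R^2 but can be negative for a suppressor, while the squared semipartial correlation beta_j^2*(1 - r12^2) gives each predictor's unique contribution (R^2 minus the R^2 of the other predictor alone), and these do not sum to R^2. On question 4, regressing y on the residual of x1 given x2 reproduces the slope beta1 (the Frisch-Waugh-Lovell result); in correlation units, that effect is the semipartial correlation beta1*sqrt(1 - r12^2).

```python
import math

def partition_two_predictors(r1y, r2y, r12):
    """Standardized two-predictor regression: R-square and two common (non-unique) partitions."""
    denom = 1.0 - r12 ** 2
    b1 = (r1y - r12 * r2y) / denom
    b2 = (r2y - r12 * r1y) / denom
    r2 = b1 * r1y + b2 * r2y
    # e1 = x1 - r12*x2 is the part of x1 not shared with x2 (standardized units).
    cov_y_e1 = r1y - r12 * r2y
    var_e1 = 1.0 - r12 ** 2
    return {
        "betas": (b1, b2),
        "r2": r2,
        # Pratt's measure: sums to R-square, can be negative for a suppressor.
        "pratt": (b1 * r1y, b2 * r2y),
        # Squared semipartial correlations: each predictor's unique contribution.
        "unique": (b1 ** 2 * denom, b2 ** 2 * denom),
        # Slope of y on e1, which equals b1.
        "slope_on_unique_x1": cov_y_e1 / var_e1,
        # The same effect in correlation units: the semipartial correlation of x1.
        "semipartial_x1": b1 * math.sqrt(denom),
    }

result = partition_two_predictors(r1y=0.5, r2y=0.2, r12=0.6)  # hypothetical correlations
for key, value in result.items():
    print(key, value)
```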
We are running a path analysis with one predictor at age 0 to three outcomes at age 4, working through two sets of variables at age 2 and age 4. Each set of variables is the same at age 2 and age 4.
We would like to know what proportion of the main effect is explained by each of these two pathways.
We used the outcome IND predictor; command and got a total effect as well as a total indirect effect.
One of our predictor-to-outcome pathways has a larger indirect effect than total effect. How can we calculate the proportion of the total effect explained by all mediators, and then by each of the age 2 --> age 4 --> outcome pathways? How do we deal with indirect effects that work in different directions, creating the scenario where the total indirect effect is larger than the total effect?
Any advice you might have would be greatly appreciated!
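A minimal sketch of the usual ratio summaries, with hypothetical path coefficients: each specific indirect effect is the product of the paths along it, the total indirect effect is their sum, and the total effect is the direct effect plus the total indirect effect. When effects have opposite signs (inconsistent mediation), the total effect can be smaller in magnitude than the total indirect effect, so the ratio exceeds 1 or turns negative and is no longer a proportion in the usual sense.

```python
from math import prod

def effects(direct, pathways):
    """Specific indirect effects (products of paths), total indirect, and total."""
    specific = {name: prod(paths) for name, paths in pathways.items()}
    total_indirect = sum(specific.values())
    total = direct + total_indirect
    return specific, total_indirect, total

# Hypothetical paths: predictor -> age 2 -> age 4 -> outcome for two variable sets.
pathways = {
    "set A": [0.40, 0.50, 0.30],
    "set B": [0.20, 0.50, -0.20],
}
specific, total_indirect, total = effects(direct=-0.02, pathways=pathways)
for name, value in specific.items():
    print(name, round(value, 3), "ratio to total:", round(value / total, 2))
print("total indirect / total:", round(total_indirect / total, 2))
```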
Thank you for your quick response. We are hoping to get an estimate of how strong a mediator is, or how much of the total variance is explained by the mediators. Can we use the indirect/direct effects given by the model to calculate this?
Based on the output, I know that the total explained variance of ENG_S is 88.6%.
However, could you please tell me how to calculate how much of the total variance is explained by each of the variables (PA, OA, MA, and EV-m)? How can I partition the explained variance in the model?
The standardized output provides an R-square estimate for the latent dependent variable. Is this estimate equivalent to the total amount of variance in this dependent variable accounted for by all the variables predicting it (the independent variables, the mediators, and the control variables)?
Hi there, is there a way to tell whether an interaction explains a SIGNIFICANT amount of variance, i.e., to get a p-value (I know the amount of variance explained)?
To get the variance explained, I compared a model with the interaction fixed @0 to one where everything was free, and then just subtracted the standardized residual variances (since that is essentially the R2 change, right?). The model is saturated because I am running a multivariate multiple regression in Mplus due to nesting.
Model is as follows:
ANALYSIS: TYPE = COMPLEX;
ESTIMATOR = MLR;
MODEL:
DV1 ON CoV1 CoV2 x m xm;
DV2 ON CoV1 CoV2 x m xm; ! xm is the interaction; I want to know whether it adds significant variance to the model
CoV1 WITH CoV2 x m xm;
CoV2 WITH x m xm;
x WITH m xm;
m WITH xm;
Hi Linda and Bengt: I am running a model with a continuous dependent variable Exam and several predictors (observed and latent). The model has excellent fit (chi-square p-value = 0.19, other indices just perfect).
The standardized solution output shows that R-square for Exam = 0.17 (P = 0.016), but none of the paths from the predictors to Exam is significant.
R-square is the variance explained by the set of covariates that the dependent variable was regressed on, yet all of the regression coefficients are non-significant. How is this possible? Could it happen because my model is wrong?
MODEL:
External BY Egoal2 Egoal3 Egoal4 Egoal5;
SelfEff BY SE1 SE3 SE5;
External WITH SelfEff;
External SelfEff ON gender EGEsum GPA;
GrG1 ON External SelfEff GPA;
Exam ON GrG1 External SelfEff GPA EGEsum;
The fit can be good while having non-significant relations. This can happen with low N, low correlations, or both.
You may want to discuss this further on a general list like SEMNET.
Amber Fahey posted on Thursday, June 07, 2018 - 11:56 am
Hello, a co-author wants me to report R-squared for my distal models, but the output doesn't produce it because the model includes random intercepts and slopes. I can compute R2 as (total variance - residual variance)/total variance, but unlike my conditional model, the distal model output only provides residual variances. Is there a way to request the total variance? Here is the syntax Dr. Linda Muthen helped me with previously:
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:
%WITHIN%
s | OLOG ON DAY;
%BETWEEN%
Folog BY olog;
olog@0;
[folog@0];
FOLOG S ON day2inpt AGE FIMTOTA TFCDays EduYears Sex Racec Emppre PTA1;
SRS ON FOLOG S day2inpt AGE FIMTOTA TFCDays EduYears Sex Racec Emppre PTA1;
SWLST ON FOLOG S day2inpt AGE FIMTOTA TFCDays EduYears Sex Racec Emppre PTA1;
FIMfu ON FOLOG S day2inpt AGE FIMTOTA TFCDays EduYears Sex Racec Emppre PTA1;
DRSPI ON FOLOG S day2inpt AGE FIMTOTA TFCDays EduYears Sex Racec Emppre PTA1;
PARSum ON FOLOG S day2inpt AGE FIMTOTA TFCDays EduYears Sex Racec Emppre PTA1;
fOLOG WITH S;
Bayes gives you R-square also with random slopes. The background is given in the Schuurman et al. (2016) Psych Methods paper.
Kerry Lee posted on Thursday, November 07, 2019 - 2:13 am
Dear Prof Muthen,
Using BAYES to run a simple regression with one manifest predictor and one latent criterion, I obtained a non-significant regression beta but a R-squared with a one-tailed p-value below .001. With just one predictor, I am not sure what to make of this.
On reflection, the Bayes one-tailed p-value refers to the proportion of estimated values below zero (when the estimate is positive). Am I correct that R-squared cannot be negative, and that its associated p-value will therefore always be below .001? Does this also affect the 95% Bayesian C.I.? If so, how does one decide whether an R-squared is significant?
This is correct. R2 is always positive: its entire posterior distribution lies above zero, so the p-value is always 0. As a statistical hypothesis, the significance of R2 has the same meaning as the joint significance of all the predictors. Since you have just one predictor, that is the significance of its beta. In the upcoming Mplus 8.4 you will be able to use a Wald test for multiple predictors. So, generally, I would not recommend using the R2 scale for that hypothesis, as it would create an unnecessary problem. If for some reason you decide to use it anyway, you can use the traditional ML approach of dividing the point estimate by its standard error and declaring significance if this ratio is > 1.96. ML and Bayes are asymptotically equivalent, so this is a valid approach as well, but I would still not recommend it since a more direct approach is available.
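A small illustration of this point, using the definition given in the question: the one-tailed Bayesian p-value is the proportion of posterior draws on the other side of zero from the point estimate (the median here). Simulated draws from hypothetical normal and beta distributions stand in for real posterior draws; because every R-square draw is positive, its p-value is 0 no matter how small the R-square is.

```python
import random
import statistics

def one_tailed_bayes_p(draws):
    """Proportion of draws on the opposite side of zero from the posterior median."""
    center = statistics.median(draws)
    if center >= 0:
        return sum(d < 0 for d in draws) / len(draws)
    return sum(d > 0 for d in draws) / len(draws)

rng = random.Random(2019)
beta_draws = [rng.gauss(0.10, 0.08) for _ in range(10000)]     # hypothetical slope posterior
r2_draws = [rng.betavariate(2.0, 60.0) for _ in range(10000)]  # hypothetical R-square posterior in (0, 1)
print(one_tailed_bayes_p(beta_draws), one_tailed_bayes_p(r2_draws))
```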
Kerry Lee posted on Tuesday, November 19, 2019 - 12:52 am
Following up on my earlier post of Nov 7, I experimented with another manifest predictor and the same latent criterion. Using an ML estimator, the regression beta was significant but the R^2 was not. The estimate was as expected (R^2 estimate = regression beta^2); however, the S.E.s were different. With just one predictor, should they not be the same? Thinking it might have to do with the latent variable, I also tried another model with a single manifest criterion, but the S.E.s for the beta and the R^2 still did not agree. I would be grateful if you could let me know the reason for the difference.
My last question: in a regression with multiple predictors (using ML), can the p-value of the R^2 be relied on as a test of the significance of the overall explanatory model, as one would in, say, SPSS?
R2 is not equal to beta^2 because there is also a residual variance involved. Also, the sampling distribution of R2 is different from that of beta and is probably more non-normal, so it may need either bootstrap or Bayes CIs to be evaluated correctly for significance.
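To illustrate the bootstrap option, a minimal percentile-bootstrap sketch for the R-square of a simple one-predictor regression. The data are simulated (hypothetical) and there is no latent variable, so this only shows the resampling idea: resample cases with replacement, recompute R-square each time, and read the interval off the sorted replicates.

```python
import random

def r_square_simple(xs, ys):
    """Squared Pearson correlation = R-square for a one-predictor regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def bootstrap_ci(xs, ys, reps=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for R-square, resampling cases."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(r_square_simple([xs[i] for i in idx], [ys[i] for i in idx]))
    stats.sort()
    lo = stats[round((1 - level) / 2 * reps)]
    hi = stats[round((1 + level) / 2 * reps) - 1]
    return lo, hi

rng = random.Random(7)
xs = [rng.gauss(0, 1) for _ in range(150)]
ys = [0.25 * x + rng.gauss(0, 1) for x in xs]   # hypothetical data
estimate = r_square_simple(xs, ys)
lo, hi = bootstrap_ci(xs, ys)
print(round(estimate, 3), round(lo, 3), round(hi, 3))
```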
A simpler approach is to use MODEL TEST with all betas = 0. This is available for ML and now, in Mplus version 8.4, also for Bayes.
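A minimal sketch of the Wald statistic behind a joint test that several slopes are zero: W = b' V^-1 b, referred to a chi-square distribution with as many degrees of freedom as restrictions. The estimates (for example, the two xm slopes in the DV1/DV2 model above) and their sampling covariance matrix are hypothetical; the small linear solver and chi-square tail function are written out only to keep the sketch dependency-free.

```python
import math

def solve(matrix, rhs):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (a[r][n] - sum(a[r][c] * z[c] for c in range(r + 1, n))) / a[r][r]
    return z

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df."""
    y = x / 2.0
    q, a = (math.exp(-y), 1.0) if df % 2 == 0 else (math.erfc(math.sqrt(y)), 0.5)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def wald_test(estimates, cov):
    """Wald test that all listed parameters are zero."""
    z = solve(cov, estimates)
    w = sum(b * zi for b, zi in zip(estimates, z))
    df = len(estimates)
    return w, df, chi2_sf(w, df)

# Hypothetical slopes and their sampling covariance matrix.
w, df, p = wald_test([0.12, 0.08], [[0.0016, 0.0004], [0.0004, 0.0025]])
print(round(w, 3), df, round(p, 4))
```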