Mplus Discussion >> Calculating the % variance explain



Larry posted on Thursday, December 30, 1999 - 8:40 am

From the information given in the EFA output can I calculate % variance explained? If so, could you tell me how (or provide me with a reference)?

Bengt O. Muthen posted on Thursday, December 30, 1999 - 11:14 am

The proportion variance explained in each observed variable can be obtained from the EFA output as 1 minus the residual variance. Some analysts also find it useful to consider how much of the variance in all the observed variables is due to a given factor - in a Varimax rotation you get this by using the sum of squares of the loadings in a given column.
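
A minimal sketch of both quantities described above, assuming a small Varimax-rotated solution. The loadings and residual variances are hypothetical values made up for illustration, not Mplus output.

```python
# Hypothetical Varimax-rotated loadings (rows: observed variables, columns: factors)
# and matching residual variances, as they might be read off an EFA output.
loadings = [
    [0.80, 0.10],
    [0.70, 0.20],
    [0.15, 0.75],
    [0.05, 0.60],
]
residual_variances = [0.35, 0.47, 0.415, 0.6375]

# Proportion of variance explained in each observed variable:
# 1 minus its residual variance.
r_square = [1.0 - theta for theta in residual_variances]

# With uncorrelated factors this equals the row sum of squared loadings.
row_ss = [sum(value ** 2 for value in row) for row in loadings]

# Variance in all observed variables due to each factor:
# sum of squared loadings in that factor's column.
n_factors = len(loadings[0])
column_ss = [sum(row[k] ** 2 for row in loadings) for k in range(n_factors)]

for name, values in (("R-square", r_square), ("row SS", row_ss), ("column SS", column_ss)):
    print(name, [round(v, 4) for v in values])
```

The row sums and the 1-minus-residual values agree only because the example treats the variables as standardized and the factors as uncorrelated.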

Larry posted on Sunday, January 02, 2000 - 7:29 pm

Dr. Muthen response on 12-30-99:
"Some analysts also find it useful to consider how much of the variance in all the observed variables is due to a given factor - in a Varimax rotation you get this by using the sum of squares of the loadings in a given column."

Dr. Muthen - Is the above describing how one would determine the % variance explained which is associated with a specific eigenvalue? If I am understanding your response so far, given the correlation between factors with the PROMAX rotation, how can I calculate the % variance explained for each eigenvalue?

Bengt O. Muthen posted on Monday, January 03, 2000 - 6:21 pm

Larry (please call me Bengt), yes in its extension this is related to eigenvalue issues. If we use Principal Component Analysis as an approximate method for doing factor analysis, a loading column (factor) sums of squares would be the eigenvalue. However, as Joreskog and others have emphasized (for instance in his nice Abt book chapter), the Mplus factor analysis methods, as opposed to PCA, attempt to reproduce correlations, not maximize variance, so the eigenvalues don't relate to the estimates in that simple fashion. With PROMAX, percent variance explained in the observed variables by a given factor is necessarily quite complex because the factors correlate. If I were to report variance explained by a factor, which I don't do, I think I would simply use the VARIMAX solution given that it is most often so close to PROMAX. Maybe others have other opinions.
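
The PCA statement above (a loading column's sum of squares equals the eigenvalue) can be checked with a short sketch. The correlation matrix is hypothetical, and the power iteration is just a dependency-free way to get the first component, not how any particular program computes it.

```python
import math

def first_principal_component(corr, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(corr)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        product = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(
        vec[i] * sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, vec

# Hypothetical sample correlation matrix for three observed variables.
corr = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]

eigenvalue, vec = first_principal_component(corr)

# PCA loadings are the eigenvector scaled by the square root of the eigenvalue.
pca_loadings = [math.sqrt(eigenvalue) * v for v in vec]
column_ss = sum(value ** 2 for value in pca_loadings)

print("eigenvalue:", round(eigenvalue, 4))
print("sum of squared PCA loadings:", round(column_ss, 4))
```

This equality is specific to principal components. Factor analysis loadings estimated to reproduce the correlations do not relate to the eigenvalues this simply.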

Bill Farmer posted on Wednesday, August 29, 2001 - 8:51 pm

Bengt,

In your response to Larry (the second one), you mention that you DON'T report the variance explained by each factor in an analysis. Why is this?

bmuthen posted on Thursday, August 30, 2001 - 8:51 am

I am more interested in how much variance in a given observed indicator a factor contributes to. The contribution of a factor to all indicators depends on how many indicators of that domain the investigator includes. But I guess for an established measurement instrument, where the indicators are predetermined, the overall variance contribution might be of interest; I just don't typically look at that.

Anonymous posted on Friday, November 07, 2003 - 9:11 am

Is there a way to test whether the polychoric correlation is significantly different from zero or not? I realize that when I run the factor analysis (between ordinal variables) the correlations that it prints out are polychoric and I would like to also report p-values with these.

Linda K. Muthen posted on Saturday, November 08, 2003 - 5:26 pm

If you ask for TYPE=BASIC, you will get the correlations and their standard errors. You can use the ratio of the correlation to its standard error to test the significance of the correlation.

Anonymous posted on Monday, November 10, 2003 - 5:59 am

After we have the estimate and its standard error, we can create the ratio. However, we need to know the distribution of this ratio in order to perform a test. What is the distribution?

Linda K. Muthen posted on Monday, November 10, 2003 - 6:05 am

Approximately z or t. You can use 1.96 as a cutoff for a two-tailed test with a p-value of .05.
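
A small sketch of the test described above, treating the ratio as approximately standard normal. The estimate and standard error are hypothetical numbers, not taken from any output.

```python
import math

def correlation_z_test(estimate, standard_error):
    """Ratio of estimate to standard error and its two-tailed normal p-value."""
    z = estimate / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical polychoric correlation and its standard error.
z, p_value = correlation_z_test(0.30, 0.12)
print("z =", round(z, 3), "two-tailed p =", round(p_value, 4))

# The 1.96 cutoff corresponds to a two-tailed p-value of about .05.
_, p_at_cutoff = correlation_z_test(1.96, 1.0)
print("p at z = 1.96:", round(p_at_cutoff, 4))
```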

splewis posted on Sunday, March 21, 2004 - 1:11 pm

Hi Dr. Muthen

I am relatively new to Mplus and have recently calculated an EFA using dichotomous variables. I am trying to compute the R-square value for each of the 3 factors that best represent the data, but am unsure of how to do so. I saw above that you can subtract the residual variance, but I remain a tad confused -- any suggestions you can offer would be most appreciated.

Thanks in advance

Stephen

bmuthen posted on Sunday, March 21, 2004 - 2:14 pm

Traditionally in EFA, researchers have been interested in two different variance-related descriptors. One is the counterpart to the regular R-square in regression and gives the proportion of the variance in an observed variable that is explained by the factors. This is the same as one minus the residual variance for the observed variable. Another is how much of the variance in all observed variables a factor explains. With uncorrelated factors this is the sum of the squared loadings on a certain factor divided by the number of observed variables. Having said this, note that the goal of factor analysis is not to explain variance but to explain correlation. This distinction was not made clear in some earlier factor analysis literature which was drawing on principal component analysis estimation.
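
The two descriptors can be written out as follows. This is a sketch under assumed notation, since none of these symbols appear in the thread: p standardized observed variables y_j, m uncorrelated factors with unit variance, loadings lambda_jk, and residual variances theta_j.

```latex
% Assumed notation: p standardized observed variables y_j, m uncorrelated
% unit-variance factors \eta_k, loadings \lambda_{jk}, residual variances \theta_j.
\begin{align*}
y_j &= \sum_{k=1}^{m} \lambda_{jk}\,\eta_k + \varepsilon_j, \\
\operatorname{Var}(y_j) &= \sum_{k=1}^{m} \lambda_{jk}^{2} + \theta_j = 1, \\
R_j^{2} &= \sum_{k=1}^{m} \lambda_{jk}^{2} = 1 - \theta_j, \\
V_k &= \frac{1}{p} \sum_{j=1}^{p} \lambda_{jk}^{2}.
\end{align*}
```

Here R_j^2 is the per-variable proportion explained and V_k is the share of the total variance of all p variables attributed to factor k. The second line relies on the factors being uncorrelated; with correlated factors cross-product terms enter and V_k is no longer well defined.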

mese1732 posted on Wednesday, April 14, 2004 - 11:36 pm

Hello,

I am a postgraduate student at Addis Ababa University. I am working on learners' attitudes towards error correction. I would like your help in showing how to use factor analysis to calculate factors related to attitude.

Thank you so much!
Meseret Teshome [Addis Ababa University]

Linda K. Muthen posted on Thursday, April 15, 2004 - 8:09 am

This is a rather broad topic for this discussion board. If you go to www.statmodel.com, there are many references related to factor analysis.

Daniel E Bontempo posted on Tuesday, October 05, 2004 - 7:47 pm

I am still a little unclear about calculating the % of variance explained. I did a 5-factor EFA on 95 categorical items. Most eigenvalues range from +22 down to 0, and the last 11 or so range from 0 to -.95. I used WLSMV because a non-positive-definite information matrix prevented WLS from running. I also plan some CFA analyses using WLSMV, so I chose WLSMV instead of the default ULS for the EFA.

I gather the negative eigenvalues act to prevent the total variance from being greater than 100%.

In an earlier reply, you mentioned summing the squared loadings in the VARIMAX rotation. Can I do this even though there are negative eigenvalues? Will I get an inflated value for explained variance? I want to compare my work to prior EFA of the same instrument which used NOVAX and reported % of total variance explained.

Thanks

Daniel E Bontempo posted on Tuesday, October 05, 2004 - 8:41 pm

I get the following message when doing categorical EFA with the ULS estimator:

THE INPUT SAMPLE CORRELATION MATRIX IS NOT POSITIVE DEFINITE. THE ESTIMATES GIVEN BELOW
ARE STILL VALID.

However, the solution does not have a strong resemblance to what I get with WLSMV. The eigenvalue range looks about the same, but the PROMAX correlation matrix is very different. Also a number of items jump factors or switch signs.

The WLSMV yields a factor structure that compares favorably to prior EFA work with this instrument. How should I think about these ULS vs WLSMV solution differences?

Thanks

Bengt O. Muthen posted on Tuesday, October 12, 2004 - 4:34 pm

There are a couple of issues here. The eigenvalues are given for the sample correlation matrix. They are not involved in calculating the variance explained using the VARIMAX solution.

The differences you are seeing between ULS and WLSMV are likely due to the fact that the factors are coming out in a different order, which has no meaning. Also, factor loadings changing signs have no significance if they all do this. Another reason ULS and WLSMV may be different could be related to the failure of your WLS run. If the weight matrix is not positive definite, the WLSMV solution may not be of high quality.

jasmine K.S posted on Saturday, November 19, 2005 - 2:03 am

I am Jasmine K.S, working as a lecturer and doing research on software reuse. I have some factors, in percentage format, to measure the success of software reuse. How should I do factor analysis for these data? Please help me.

Linda K. Muthen posted on Saturday, November 19, 2005 - 8:04 am

I would treat these as continuous outcomes. If you have very strong floor or ceiling effects, I would consider modeling that takes this into account.

Roisin O'Connor posted on Wednesday, June 11, 2008 - 11:54 am

I am running an EFA with continuous data and am focusing on a 4-factor solution. I am using the MLM estimator (because of some skewness in my data) and am interested in the Varimax rotation output. I am not sure how to find two pieces of information in the MPlus output, these are: (1) the variance explained by each of the 4 extracted factors; (2) the extracted communalities.

I would appreciate some direction with this. I am not requesting anything specific other than residuals, so maybe I am missing a command.

Bengt O. Muthen posted on Wednesday, June 11, 2008 - 6:43 pm

(1) Variance explained in a set of variables by a factor is not given in Mplus. We have not been inclined to add this for 3 reasons. One is that factor analysis does not aim to explain variance but correlations. Another is that typically, oblique factors need to be extracted, in which case the concept of variance explained by a factor is not clear-cut. The third is that variance explained connects with Principal Component Analysis, which is not factor analysis. But it might be of interest to look at the eigenvalues to get a descriptive picture of the drop in eigenvalues up to 4 factors - the percentage of total eigenvalues is sometimes reported. If you want variance explained in the Varimax case in Mplus you will have to add up the squared loadings in each factor column and divide by the number of items.

(2) The extracted communalities are 1 minus the estimated residual variances given in the output.
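
The "percentage of total eigenvalues" mentioned in (1) can be sketched as below. The eigenvalues are hypothetical and stand in for the sample correlation matrix eigenvalues printed in an EFA output; for a correlation matrix they sum to the number of variables.

```python
# Hypothetical eigenvalues of a 6-variable sample correlation matrix.
eigenvalues = [3.20, 1.40, 0.60, 0.40, 0.25, 0.15]

n_variables = len(eigenvalues)
total = sum(eigenvalues)  # equals the number of variables for a correlation matrix

# Each eigenvalue as a percentage of the total, plus the running total.
percent = [100.0 * value / total for value in eigenvalues]
cumulative = []
running = 0.0
for value in percent:
    running += value
    cumulative.append(running)

for index, (pct, cum) in enumerate(zip(percent, cumulative), start=1):
    print(f"eigenvalue {index}: {pct:6.2f}%  cumulative {cum:6.2f}%")
```

This is a purely descriptive summary of the sample correlation matrix. It is not computed from the estimated factor loadings.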

Roisin O'Connor posted on Thursday, June 12, 2008 - 8:34 am

Thank you. This makes a lot of sense. I have run into another problem, however. I have missing data and it seems that the MLM estimator with Type=EFA and Missing data is not possible.

My rationale for wanting to use MLM is that I go on to run a CFA with a separate data set and use MLR (because of the non-normality of the data). It would seem that I should be using a procedure at the EFA stage that is also robust to non-normality.

Q. Is it possible to run the EFA with Type = missing data and using an estimator robust to non-normality?

Linda K. Muthen posted on Thursday, June 12, 2008 - 12:29 pm

You can use MLR for both EFA and CFA. It is robust to non-normality.

Roisin O'Connor posted on Tuesday, June 24, 2008 - 3:00 pm

When using MLR with EFA with Type=missing, I get an error saying that:

*** ERROR in Analysis command
Estimator MLR is not allowed with TYPE = EFA MISSING.
Default will be used.

Any thoughts?

Linda K. Muthen posted on Tuesday, June 24, 2008 - 3:22 pm

It is allowed in the current version of the program. You must be using an old version.

Paul Tremblay posted on Friday, July 04, 2008 - 4:00 pm

In EFA with a varimax rotation, is it normal for the rotated factors not to appear in order of magnitude? I calculated the % of variance explained by each factor after rotation and the first factor was the smallest, followed by the second and then the third. In SPSS, the order usually goes from largest to smallest not only before rotation but also after. How does Mplus decide which will be the first factor after rotation? Thanks,
Paul Tremblay.

Bengt O. Muthen posted on Saturday, July 05, 2008 - 9:38 am

The order of the factors is indeterminate in EFA - there is no reason one order should be preferred over another. Percent variance explained is a principal component criterion, not a criterion for EFA model fit. In Mplus the order of the factors is a function of the optimization and has no interpretational meaning. You can rearrange the factor order any way you want for publication purposes.
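
Rearranging the factor order for reporting can be done by hand or with a few lines of code. The sketch below assumes hypothetical orthogonally rotated loadings and sorts the columns by their sums of squared loadings, largest first.

```python
# Hypothetical rotated loadings in which the first column happens to be
# the factor with the smaller sum of squared loadings.
loadings = [
    [0.10, 0.80],
    [0.20, 0.70],
    [0.75, 0.15],
    [0.60, 0.05],
]

n_factors = len(loadings[0])
column_ss = [sum(row[k] ** 2 for row in loadings) for k in range(n_factors)]

# Column indices sorted from largest to smallest sum of squares.
order = sorted(range(n_factors), key=lambda k: column_ss[k], reverse=True)

# Permute the columns of every row; nothing else about the solution changes.
reordered = [[row[k] for k in order] for row in loadings]

print("column SS:", [round(v, 4) for v in column_ss])
print("new order:", order)
for row in reordered:
    print(row)
```

With correlated factors, the same permutation would also have to be applied to the rows and columns of the factor correlation matrix.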

Michelle Rasdale posted on Thursday, October 21, 2010 - 3:27 am

Hi everyone,

What is regarded as a respectable amount of percentage variance? I need to know guidelines for small/medium/large amounts of percentage variance.

Thanks

Linda K. Muthen posted on Thursday, October 21, 2010 - 2:12 pm

I don't know of any guidelines for this. I would check a regression book to see if this is covered.

Utkun Ozdil posted on Friday, March 11, 2011 - 10:33 am

In a two-level EFA with categorical variables, Mplus uses GEOMIN rotation.
1) With correlated factors is it possible to calculate the percent of variance explained by each factor at both within/between levels?

2) Does the formula (add up the square loadings in each factor column and divide by the number of items) for VARIMAX rotation hold for GEOMIN?

Thanks...

Bengt O. Muthen posted on Friday, March 11, 2011 - 10:57 am

1. No, because the factors are correlated their contribution can't be disentangled.

2. No, Geomin is a correlated-factors rotation.
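
A sketch of why the contributions cannot be disentangled, using one hypothetical indicator with loadings on two correlated factors. The explained variance contains a cross-product term that belongs to neither factor alone.

```python
def explained_variance(loadings_row, factor_corr):
    """Variance in one standardized indicator explained by the factors: lambda' Phi lambda."""
    m = len(loadings_row)
    return sum(
        loadings_row[k] * factor_corr[k][l] * loadings_row[l]
        for k in range(m)
        for l in range(m)
    )

# Hypothetical loadings of one indicator and a hypothetical factor correlation of 0.4.
loadings_row = [0.6, 0.5]
factor_corr = [
    [1.0, 0.4],
    [0.4, 1.0],
]

total = explained_variance(loadings_row, factor_corr)
own_terms = [value ** 2 for value in loadings_row]  # what squared loadings alone would give
shared = total - sum(own_terms)                      # 2 * 0.6 * 0.5 * 0.4

print("total explained:", round(total, 4))
print("squared loadings:", [round(v, 4) for v in own_terms])
print("shared cross-product term:", round(shared, 4))
```

With a factor correlation of zero the shared term vanishes, which is the only case where summing squared loadings per column accounts for all the explained variance.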

Marco d'errico posted on Monday, June 20, 2011 - 2:27 am

Dear All,

just a quick information. I'm quite new at Mplus so don't blame me if I am asking the-same-old question.

I've done an EFA with 10 categorical vars (dummy) (one of which has really low variance); 4 factors fit well (Chi square 81.612; CFI 0.999; TLI 0.995; RMSEA 0.022; SRMR 0.030). But I have a strange result in the Factor Determinacies: F1 1.008; F2 5.695; F3 0.784 and F4 1.257. How is it possible that the F2 determinacy was so high?

Tks in advance
MdE

Linda K. Muthen posted on Monday, June 20, 2011 - 8:44 am

We no longer give factor determinacies for categorical items. You should use the PLOT command to look at information curves.

peter pitt posted on Tuesday, June 28, 2011 - 1:43 am

Hello,

I'm new at this, so I may be asking a stupid question here:
Above it was mentioned that, with EFA, the variance explained by a factor equals the sum of the squared loadings in the factor column divided by the number of items (in case of varimax rotation). I'm wondering, does the same hold for the ESEM case? So, when analyzing two groups with ESEM with the same loading structure for the two groups, do the factors then explain an equal amount of variance in each group?

Thanks a lot!

Linda K. Muthen posted on Tuesday, June 28, 2011 - 8:00 am

EFA and ESEM are identical unless ESEM uses covariates or includes other parameters like residual covariances. You can apply the variance explained to an orthogonal rotation in ESEM the same way as you do in EFA as long as the ESEM model is equivalent to the EFA model.

peter pitt posted on Friday, July 01, 2011 - 1:29 pm

Dear Dr. Muthen,

Thank you for your comments, but I'm still struggling with some questions. Suppose I have analyzed two groups with multigroup exploratory factor analysis (multigroup ESEM) and I've found a 3 factor solution with the same loading matrix for both groups. Also suppose that the rotation is orthogonal. My questions now are: (a) does a specific factor from this 3 factor solution explain the same amount of variance in each group (and if so, does the variance explained by this factor equal the sum of its squared factor loadings)? Or, is this only the case if the unique variances are constrained to be equal for both groups? (b) How much variance does the factor explain overall, this is considering both data blocks as a single one by concatenating them? How can I calculate this overall amount of explained variance on the basis of the ESEM output?

I'm sorry to have disturbed you with all these questions, but I'm still struggling a bit to get a full understanding of multigroup EFA (ESEM).

Thanks a lot in advance!

Bengt O. Muthen posted on Saturday, July 02, 2011 - 8:25 am

First, note that explaining variance is not the goal of factor analysis, explaining covariance is.

Even if you have group-invariant factor loadings and uncorrelated factors, ESEM lets the factor variances differ from the unities that the reference (first) group has, so that will change the amount of variance explained by a factor. The residual variances only play a part if you are considering the proportion of variance, not the variance per se.
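
A sketch of the point above, assuming group-invariant loadings, uncorrelated factors, and hypothetical factor and residual variances for two groups. None of these numbers come from an actual ESEM run.

```python
# Hypothetical group-invariant loadings (rows: items, columns: factors).
loadings = [
    [0.80, 0.10],
    [0.70, 0.20],
    [0.15, 0.75],
    [0.05, 0.60],
]

# Hypothetical group-specific parameters; g1 is the reference group
# with factor variances fixed at one.
groups = {
    "g1": {"factor_variances": [1.0, 1.0],
           "residual_variances": [0.35, 0.47, 0.415, 0.6375]},
    "g2": {"factor_variances": [1.5, 0.8],
           "residual_variances": [0.40, 0.50, 0.40, 0.60]},
}

def variance_by_factor(loadings, factor_variances):
    """Variance explained by each factor: factor variance times column sum of squares."""
    return [
        psi * sum(row[k] ** 2 for row in loadings)
        for k, psi in enumerate(factor_variances)
    ]

def proportion_by_factor(loadings, factor_variances, residual_variances):
    """Share of total item variance per factor; residual variances enter only here."""
    explained = variance_by_factor(loadings, factor_variances)
    total = sum(explained) + sum(residual_variances)
    return [value / total for value in explained]

results = {}
for name, params in groups.items():
    amount = variance_by_factor(loadings, params["factor_variances"])
    share = proportion_by_factor(
        loadings, params["factor_variances"], params["residual_variances"]
    )
    results[name] = (amount, share)
    print(name, [round(v, 4) for v in amount], [round(v, 4) for v in share])
```

The amounts differ across groups solely because the factor variances differ. The residual variances affect only the proportions.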

J.D. Haltigan posted on Thursday, November 10, 2011 - 7:50 pm

Relevant to this thread: could one say (if a reviewer requests it) that prior to rotation, the % variance accounted for by the X factors was xxxx? I realize this is inconsistent with the goal of EFA in terms of reproducing correlations, but I am in a situation where a reviewer is requesting it....SPSS of course gives you above info (with the warning that you can not use SS of the rotated loadings to derive prop. var.). A lot of folks in dev. psych use PCA when indeed it would appear that EFA should be their choice and so points of difference are rampantly misunderstood.

Secondly, is reporting that percent of variance (prior to rotation at extraction) simply reporting the percent of variance explained as if one used an orthogonal rotation (i.e., Varimax)?

Bengt O. Muthen posted on Friday, November 11, 2011 - 11:10 am

This sounds outdated - like what is used in principal component analysis, where the sums of squares of loadings in a column, before rotation, is the variance explained by that factor/component. As we know, PCA is not FA - PCA is a biased, although reasonable, FA estimator. But in FA I would consider the sums of squares of loadings in a column taken from an orthogonal rotation such as the Varimax solution. I guess one can view it as a descriptive measure of the importance of the factor, although by no means a critical one, I think.

J.D. Haltigan posted on Friday, November 11, 2011 - 12:06 pm

How is it that rotation changes the percent of variance accounted for in the factors? For example, in a Varimax rotation is it that depending on the location of the axes, certain manifest indicators have more of their variance explained by a given factor? The reason I ask is b/c rotation doesn't change model fit, but yet it does change how items load and the total percent of variance accounted for (at least in the case of varimax).

Bengt O. Muthen posted on Friday, November 11, 2011 - 2:49 pm

Q1. Not "accounted for in the factors", but "accounted for by the factors".

Q2. Yes, different (orthogonal) rotations give different loadings for a factor and therefore change the factor interpretation and the sums of squares of loadings for the factor.

True, rotation doesn't change fit to the correlation matrix. Which goes to show that percent variance accounted for by the factor is not a goal in FA.
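
A sketch of this point with hypothetical two-factor loadings and an arbitrary 30-degree orthogonal rotation. The per-factor sums of squares change, while the communalities and the reproduced correlations do not.

```python
import math

def rotate(loadings, degrees):
    """Post-multiply two-factor loadings by an orthogonal rotation matrix."""
    angle = math.radians(degrees)
    c, s = math.cos(angle), math.sin(angle)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

def column_ss(loadings):
    """Sum of squared loadings per factor."""
    return [sum(row[k] ** 2 for row in loadings) for k in range(2)]

def communalities(loadings):
    """Sum of squared loadings per observed variable."""
    return [a * a + b * b for a, b in loadings]

def reproduced(loadings):
    """Common part of the model-implied correlations: Lambda times Lambda transpose."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in loadings] for r1 in loadings]

# Hypothetical orthogonal solution and an arbitrary rotation of it.
original = [
    [0.80, 0.10],
    [0.70, 0.20],
    [0.15, 0.75],
    [0.05, 0.60],
]
rotated = rotate(original, 30)

print("column SS before:", [round(v, 4) for v in column_ss(original)])
print("column SS after: ", [round(v, 4) for v in column_ss(rotated)])
print("communalities before:", [round(v, 4) for v in communalities(original)])
print("communalities after: ", [round(v, 4) for v in communalities(rotated)])
```

The total across factors is unchanged by an orthogonal rotation. Only its split between the factors moves.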

chidi O posted on Saturday, May 12, 2012 - 4:14 pm

Hi,
Is it correct to interpret the following "eigenvalues for sample correlation matrix" (EFA) as, e.g., a 3-factor structure in which each factor has an eigenvalue > 1? Would this correspond to the scree plot?

eigenvalues for sample correlation matrix
       1        2        3        4        5
  ______   ______   ______   ______   ______
  11.583    1.699    1.167    0.745    0.621

I couldn't reproduce them manually.
Thanks.

Linda K. Muthen posted on Sunday, May 13, 2012 - 10:56 am

Yes and yes.
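
The reading confirmed above can be checked mechanically. The sketch uses the five eigenvalues posted in the question (the remaining eigenvalues were not shown) and counts those greater than 1, along with the successive drops one would look for in a scree plot.

```python
# First five eigenvalues as posted in the question above.
eigenvalues = [11.583, 1.699, 1.167, 0.745, 0.621]

# Eigenvalue-greater-than-one count.
n_above_one = sum(1 for value in eigenvalues if value > 1.0)

# Successive differences; a scree plot shows these as the steepness between points.
drops = [round(a - b, 3) for a, b in zip(eigenvalues, eigenvalues[1:])]

print("eigenvalues above 1:", n_above_one)
print("successive drops:", drops)
```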

Giada Perazzi posted on Wednesday, January 23, 2013 - 4:29 am

Dear Doctor Muthén,
I am a graduate student from the Catholic University of Milan and I am learning how to use Mplus. I am very sorry, and I know that it could be a very simple question, but I tried to read the forum without success. I am doing EFA on a tetrachoric matrix of data from dichotomous items using Promax rotation and WLSMV (Muthén, DuToit and Spisic, 1997). I have a satisfactory solution with 3 factors but I am not able to understand the variance explained by this solution. Could you help me please? I tried to ask the software for the remaining variance but I couldn't figure out how.

Thank you in advance.
Giada Perazzi

Linda K. Muthen posted on Wednesday, January 23, 2013 - 12:04 pm

Variance explained is a concept related to principal component analysis not factor analysis. In factor analysis, the objective is to reproduce the observed correlation matrix. For an orthogonal solution, variance explained can be computed as the sum of the squared factor loadings divided by the number of factor indicators.

JJL posted on Monday, April 08, 2013 - 7:48 am

What a great thread. Pretty amazing this started almost 15 years ago and still has an incredible amount of information. Thanks for all you do.

ylam posted on Tuesday, November 12, 2013 - 7:38 am

Hi,
I am doing an EFA (Varimax rotation) for 14 dichotomous items, and WLSMV was selected. As there was no CFI reported in the output, is there anything I missed? Or how could I get this?
Thank you very much.

Linda K. Muthen posted on Tuesday, November 12, 2013 - 8:14 am

Please send the output and your license number to support@statmodel.com so I can see why you did not get CFI.

Augustine Osman posted on Friday, January 17, 2014 - 1:50 pm

Dear Dr. Muthen-- I have 2 quick questions about the typical output for the bifactor ESEM modeling. In particular, does the ESEM- bifactor give information about the proportion of variance accounted for by the general factor (g) and the variance accounted for by the specific [lower-order] factors? If so, where can I look in the output for these? Many thanks for consideration of this query—Sincerely, ---Augustine

Bengt O. Muthen posted on Friday, January 17, 2014 - 2:51 pm

No, you have to compute those yourself from the estimated model parameters.
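
One way such a hand computation might look is sketched below. It assumes standardized loadings and mutually uncorrelated general and specific factors, and it uses hypothetical values. The last quantity, the general factor's share of the common variance, is one commonly reported descriptive index and is an addition here, not something named in the thread.

```python
# Hypothetical standardized bi-factor loadings with uncorrelated factors.
general = [0.70, 0.60, 0.50, 0.60]   # loadings on the general factor
specific = [                          # loadings on two specific factors
    [0.40, 0.00],
    [0.30, 0.00],
    [0.00, 0.50],
    [0.00, 0.40],
]

n_items = len(general)
n_specific = len(specific[0])

# Sum of squared loadings for the general factor and for each specific factor.
ss_general = sum(value ** 2 for value in general)
ss_specific = [sum(row[k] ** 2 for row in specific) for k in range(n_specific)]

# Share of total item variance (items standardized, so the total is n_items).
prop_general = ss_general / n_items
prop_specific = [value / n_items for value in ss_specific]

# General factor's share of the variance explained by all factors together.
common_share_general = ss_general / (ss_general + sum(ss_specific))

print("general:", round(prop_general, 4))
print("specific:", [round(v, 4) for v in prop_specific])
print("general share of common variance:", round(common_share_general, 4))
```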

Augustine Osman posted on Friday, January 17, 2014 - 3:06 pm

Many thanks for your response. I have been doing these manually but thought I could ask to be sure that I am not missing important parts of the Mplus output.

Bengt O. Muthen posted on Friday, January 17, 2014 - 3:42 pm

Note also that bi-factor ESEM needs careful consideration. For instance, to get an identified model at least one item has to load on the general factor only and only the remaining items put on the ESEM factor list.

Yvonne LEE posted on Monday, June 23, 2014 - 5:21 pm

I am doing SEM with WLSMV estimation due to the ordered categorical outcome indicator. 1. May I know how to calculate the proportion of variance explained by a specific variable /factor? So far, I know Rsquare in my case cannot be used for that. 2. How to interpret the 'estimate' of the indirect effect (standardised)? 3. How to compare the 2 indirect paths to see which has higher explanatory power? Sorry for my very basic questions.

Bengt O. Muthen posted on Tuesday, June 24, 2014 - 11:19 am

1. Proportion variance in a factor indicator explained by the factor is given by R-square in the standardized section of the output. Perhaps you refer to proportion variance explained for a set of observed variables like is sometimes done in EFA with uncorrelated factors (the sum of the squared factor loadings divided by the number of variables). If so, that is not really relevant in SEM where we don't try to explain variance but covariance and where the factors are typically not uncorrelated.

2. That's a long story - please see e.g. MacKinnon's mediation book.

3. You can express the two indirect effects and their difference in Model Constraint.

Masih Shafiei posted on Monday, January 12, 2015 - 8:32 am

Dear Linda/Bengt,

I am conducting an EFA (using WLSMV as estimator and promax as rotation method) on a set of 22 categorical (4 categories each) items. I should add, both multivariate skewness and kurtosis are statistically significant, indicating violation of the multivariate normality assumption for the data. To determine the appropriate number of factors to retain, I used the eigenvalue-greater-than-one rule and the scree plot. From the scree plot, it was concluded that either 1 or 2 factors should be retained (both had eigenvalues greater than 1). So, I decided to conduct EFA with 1, 2 and 3 factors (with the same estimator and rotation method) and examine RMSEA values and the presence of negative residual variances in order to decide on the number of factors that best explains the underlying structure of the data. I wanted to know if what I did (examining a range of number of factors in this condition) is correct.

Bests,
Masih Shafiei, M.D.

Linda K. Muthen posted on Monday, January 12, 2015 - 10:49 am

If you are treating the variables as categorical by putting them on the CATEGORICAL list, skewness and kurtosis are not relevant concerns. Categorical data methodology handles this.

Masih Shafiei posted on Monday, January 12, 2015 - 12:02 pm

Ok, thanks. Actually my concern was whether examining a range of number of factors, when conducting EFA using the WLSMV estimator on categorical data, is correct or not. Or can this approach only be employed when using ML as estimator?

Linda K. Muthen posted on Monday, January 12, 2015 - 4:12 pm

This does not change with the estimator being used. The strategy is the same.

Masih Shafiei posted on Tuesday, January 13, 2015 - 9:51 am

Thank you, and another question. The questionnaire I am dealing with has a scoring system through which 4-point Likert-type responses to each of the indicators are transformed to a 3-point Likert scale. To calculate the total score of the questionnaire, the transformed 3-point Likert scores are used. I am wondering whether the polychoric correlation matrix should be calculated based on the raw 4-point Likert responses or the transformed 3-point Likert scores.

Bests,

Linda K. Muthen posted on Tuesday, January 13, 2015 - 12:32 pm

It is computed on the variables you use in the analysis.

Masih Shafiei posted on Tuesday, January 13, 2015 - 1:11 pm

I meant which of the two sets of variables, the 4-point Likert scale or the transformed 3-point Likert scale, I should use.

P.S. Items of the questionnaire are on a 4-point Likert scale. Responders should choose between 4 ordered categories (totally disagree, disagree, agree, and totally agree). After the questionnaires are filled out, we transform responses to each of the items to a 3-point Likert scale (0, 1 or 2). Therefore, I have two sets of variables: one has 4 ordered categories, and the other has 3 ordered categories.

Linda K. Muthen posted on Tuesday, January 13, 2015 - 1:51 pm

That would need to be your choice.

Masih Shafiei posted on Tuesday, January 13, 2015 - 9:22 pm

OK, thank you.

Masih Shafiei posted on Thursday, March 26, 2015 - 9:19 am

Dear Bengt and Linda Muthen,

It is repeatedly stated in previous comments that variance explained in a set of variables by a factor is not given in Mplus, since factor analysis does not aim to explain variance but correlations.
I wanted to cite this statement in a manuscript of mine but couldn't find any specific article containing such statement.
It would be appreciated if you could give a reference for this, that could be cited in a scientific article.

Bests,

Bengt O. Muthen posted on Thursday, March 26, 2015 - 1:45 pm

That factor analysis aims to explain correlations rather than variances is for example stated in Joreskog's nice intro article Basic Ideas of Factor and Component Analysis. It is in the 1979 book Advances in Factor Analysis and Structural Equation Models, edited by Magidson, containing Joreskog and Sorbom articles. It may be hard to find, however, since the book is out of print, but maybe libraries or Joreskog himself could provide it.

Or, perhaps this article discusses it:

Fabrigar, L.R., Wegener, D.T., MacCallum, R.C. & Strahan, E.J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.

I think variance explained can still be looked at as a descriptive of analysis results. It is just my opinion that it isn't a critical matter - choosing the number of factors shouldn't be based on it (that's PCA) - and it doesn't generalize to the more common situation of correlated factors. Others may disagree, but to me, it is an outdated concept for factor analysis.

Anton Dominicson posted on Wednesday, August 24, 2016 - 9:47 pm

Hi. Would the explained formula (sum of squared loadings divided by the number of indicators) work with any CFA model? If not, how can I calculate the explained variance for a CFA model? Also, would you consider the explained variance an aspect of interest in CFA? Thanks.

Linda K. Muthen posted on Thursday, August 25, 2016 - 8:55 am

This is appropriate only for orthogonal factors.