Bill Roberts posted on Thursday, September 05, 2002 - 10:02 am
I have a question about the eigenvalues for EFA in Mplus. Using maximum likelihood for EFA in Mplus, I get 12 eigenvalues GE 1, and all of the eigenvalues are positive. The eigenvalues range from .35 to 6.379 for the 45 Likert-type items in the analysis. I compared these results with the same items and the same analysis in SAS. Eigenvalues from SAS ranged from -.46 to 9.25, with 6 eigenvalues GE 1. If I use the typical cut-off of an eigenvalue GE 1 to explore the underlying dimensionality of the items, SAS would indicate 6 factors and Mplus would indicate 12. I compared the rotated promax factor pattern matrices from SAS and Mplus for six factors and found that they are similar, with minor differences that are probably due to rounding. I tried to run EFA in Mplus with 12 factors and ran into a convergence problem. How should I interpret the eigenvalues given by Mplus?
bmuthen posted on Thursday, September 05, 2002 - 5:37 pm
If SAS and Mplus have eigenvalue differences, I would assume that they are not computed for the same matrix. Assuming that your outcomes are continuous variables, the eigenvalues should all be positive. Perhaps SAS handles any missing data differently in this run, e.g. using pair-wise present data, which could lead to negative eigenvalues. Another reason for differences in the matrix used for eigenvalue computation is if an iterated principal factor method has been used in SAS, so that the sample matrix is not used, but rather a sample matrix that has an adjusted diagonal. Mplus considers the eigenvalues for the sample correlation matrix. The Mplus eigenvalues GE 1 can be used to guide in determining the number of factors, but such a guide is a very rough one - a somewhat less rough guide is to use the scree approach.
Thank you for discussing reasons that could account for differences in the eigenvalues. I am fairly certain that I can rule out pair-wise deletion of missing cases. According to the SAS documentation, missing cases are deleted listwise by default. The sample size is the same in both programs. The simple descriptive statistics and correlation matrix look nearly identical. Perhaps SAS uses a different matrix to compute the eigenvalues, as you suggested.
bmuthen posted on Friday, September 06, 2002 - 1:45 pm
Which factor extraction method in SAS is being used?
They must be doing something to the correlation matrix; otherwise no eigenvalue would be negative, because when a correlation matrix is computed from listwise present data the correlation matrix is positive definite.
I am finding that by default, SAS sets the prior communalities for each variable to its squared multiple correlation with the other variables in the analysis when the method is maximum likelihood. After taking a closer look at the SAS output, I see that the cumulative variance exceeds 100 percent, at which point the eigenvalues become negative, bringing the cumulative variance back to 100 percent. If, however, the priors option is set to one for method = maximum likelihood, all eigenvalues are positive. When squared multiple correlations are inserted along the diagonal of the correlation matrix, the total variance to be decomposed into factors is less than the number of variables. Eigenvalues using principal components as the method of analysis in SAS were identical to what I am finding in the Mplus output using maximum likelihood as the estimator. Is there a way to specify EFA in Mplus using maximum likelihood and insert squared multiple correlations along the diagonal of the correlation matrix using nonsummary data?
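To illustrate the point about squared multiple correlations (SMCs) on the diagonal, here is a minimal sketch in Python with NumPy. The 4-variable correlation matrix is hypothetical and is not taken from the data discussed in this thread; the helper names `smc` and `reduced_matrix` are made up for the example. It compares eigenvalues of the correlation matrix with 1's on the diagonal against those of the same matrix with SMCs on the diagonal.

```python
import numpy as np

# Hypothetical 4-variable correlation matrix (illustrative values only).
R = np.array([
    [1.0, 0.5, 0.4, 0.3],
    [0.5, 1.0, 0.4, 0.3],
    [0.4, 0.4, 1.0, 0.3],
    [0.3, 0.3, 0.3, 1.0],
])

def smc(R):
    """Squared multiple correlation of each variable with all the others."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

def reduced_matrix(R):
    """Copy of R with SMCs replacing the 1's on the diagonal."""
    reduced = R.copy()
    np.fill_diagonal(reduced, smc(R))
    return reduced

# Eigenvalues sorted from largest to smallest.
eig_full = np.sort(np.linalg.eigvalsh(R))[::-1]
eig_reduced = np.sort(np.linalg.eigvalsh(reduced_matrix(R)))[::-1]

print("unit diagonal:", eig_full, "sum =", eig_full.sum())
print("SMC diagonal: ", eig_reduced, "sum =", eig_reduced.sum())
```

For this matrix the unit-diagonal eigenvalues are all positive and sum to the number of variables, while the SMC-diagonal eigenvalues sum to less than that and include negative values, which matches the pattern described for the SAS output.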
bmuthen posted on Monday, September 09, 2002 - 6:28 pm
The answer is no, and I can't see the need for it in terms of model estimation. The adjustments to the diagonal are connected with either descriptive purposes - getting a picture of relevant eigenvalues to guide in choosing number of factors - or are part of simpler estimation methods such as principal factoring. You can get Mplus to give such eigenvalues if you input an adjusted correlation matrix. But to get maximum-likelihood estimation from a correlation matrix, you don't want to adjust the diagonal of the correlation matrix.
Just a few more words on this. Mplus computes eigenvalues just like in principal component analysis (keeping the diagonal elements as they are - here 1). You can use such eigenvalues to descriptively guide you in choosing the number of factors. The idea of adjusting the diagonal is that this perhaps makes the resulting matrix more closely approximate Lambda*Psi*Lambda' in
Sigma = Lambda*Psi*Lambda' + Theta,
where the eigenvalues shed light on the rank (number of factors) of Lambda*Psi*Lambda'.
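The rank argument can be checked numerically. The following is a small sketch under assumed population values (6 variables, 2 correlated factors, loadings chosen for illustration and not taken from any analysis in this thread). It builds Sigma from Lambda, Psi, and Theta, and counts the nonzero eigenvalues of Lambda*Psi*Lambda'.

```python
import numpy as np

# Hypothetical population values: 6 variables, 2 factors.
Lambda = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.8],
    [0.0, 0.7],
    [0.0, 0.6],
])
Psi = np.array([[1.0, 0.3],
                [0.3, 1.0]])             # factor covariance matrix

common = Lambda @ Psi @ Lambda.T          # Lambda*Psi*Lambda'
Theta = np.diag(1.0 - np.diag(common))    # residual variances giving a unit diagonal
Sigma = common + Theta                    # Sigma = Lambda*Psi*Lambda' + Theta

eig_sigma = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
eig_common = np.sort(np.linalg.eigvalsh(common))[::-1]

# Count eigenvalues that are clearly above numerical noise.
n_nonzero = int(np.sum(eig_common > 1e-10))

print("eigenvalues of Sigma:              ", eig_sigma)
print("eigenvalues of Lambda*Psi*Lambda': ", eig_common)
print("nonzero eigenvalues of common part:", n_nonzero)
```

With the true Theta removed, the number of nonzero eigenvalues equals the number of factors, whereas all eigenvalues of Sigma itself are positive. In practice Theta is unknown, which is why an adjusted diagonal only approximates this situation.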
Hervé CACI posted on Thursday, August 07, 2003 - 2:41 am
Bengt & Linda,
I'm using WLSMV with 4-point Likert-like item scores (Mplus 2.01). I understand that I can use other estimators as well.
I'm puzzled by the fact that Mplus outputs a different item correlation matrix as the number of factors extracted grows. I would rather assume that the correlation matrix remained unchanged, because the eigenvalues of this matrix are a guide to the number of factors to extract/rotate.
Also, I'd like to know which correlation matrix the eigenvalues are computed on: the first one printed, or some hidden matrix?
The correlation matrices printed are model estimated and then after that the residuals are printed. The residuals are the model estimated minus the observed values. As the model changes, that is, as more factors are extracted, the model estimated values will change. The eigenvalues are based on the observed correlation matrix.
Anonymous posted on Tuesday, August 24, 2004 - 9:53 am
I am wondering how to correctly describe an EFA with continuous variables in Mplus (for a write-up). Since ones are placed on the diagonal, is this analysis really a principal components analysis or a factor analysis with a principal component extraction method? To my understanding, there should not be residual variances for manifest variables when using PCA. Is this correct?
It is a factor analysis using a maximum likelihood or unweighted least squares estimator. It does not use the principal components estimator.
anonymous posted on Wednesday, October 19, 2005 - 12:02 pm
I have a similar question to the one posed above. How would one describe an EFA using categorical variables in Mplus (for a write-up)? Would it be correct to write "a factor analysis using the WLSMV estimator and promax rotation"? What would you write about the extraction method?
In EFA, Mplus outputs eigenvalues from the sample correlation matrix (i.e., with 1's on the diagonal) that can be used to determine the number of factors to retain (e.g., using a scree plot or parallel analysis). However, some researchers have argued that when one is conducting an EFA, it may be more accurate to use eigenvalues from the reduced correlation matrix (e.g., with communalities on the diagonal) to determine the number of factors to retain. I was hoping you could explain to me how to obtain these latter eigenvalues in Mplus, in particular in the context of my current research situation: EFA with binary data (WLSMV estimation; data are weighted).
I am not familiar with this method and it certainly isn't directly implemented in Mplus. I am not sure if it could be done in Mplus using the Monte Carlo simulation features however.
Xuan Huang posted on Friday, June 08, 2007 - 3:05 pm
Dear professors, I conducted EFA with eight 7-point scale items. I treated these variables as categorical and used WLSMV as the estimator. I got 1 eigenvalue larger than 1, which is 5.419. All other eigenvalues range from .149 to .610. The eigenvalues indicate that a one-factor model may be good. Here are my results: One-factor: χ2(14)=109.876, P=0.0000, RMSEA=.154, RMSR=.0469; Two-factor: χ2(10)=47.601, P=0.0000, RMSEA=.114, RMSR=.0280; Three-factor: χ2(6)=21.236, P=0.0017, RMSEA=.094, RMSR=.0165; Four-factor: χ2(2)=4.803, P=.0906, RMSEA=.07, RMSR=.009, one residual variance is negative. I am confused about how to interpret the eigenvalues. They indicate a one-factor model, but the one-factor model has a large RMSEA value and a significant χ2. Could you give me some hints on the inconsistency between what the eigenvalues suggest and what the model fit indices suggest? Thank you very much!
In view of that, the eigenvalues, and the RMSR, I would conclude one factor. I would, however, look at the other factor solutions and see which items cross-load and think about if that is what you would expect. Some items may not be behaving properly.
QianLi Xue posted on Wednesday, September 30, 2009 - 6:31 am
Hi, Linda, Does MPLUS provide summary statistics for the amount or % of variance explained by each of the factors in EFA?
No, because amount of variance explained is not the focus of factor analysis, but rather of principal component analysis. Also, the percentages are well-defined only for orthogonal rotations such as Varimax, which may not be an optimal rotation method. In the case of orthogonal rotation, you can compute the percentages yourself by summing the squared loadings in a column.
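As a sketch of the hand computation described above, the following Python snippet sums the squared loadings in each column and expresses the result as a percentage of total variance. The loading matrix is hypothetical, the function name `percent_variance_explained` is made up for the example, and the calculation assumes standardized variables (so total variance equals the number of items) and an orthogonal rotation.

```python
# Hypothetical orthogonally rotated loadings: rows are items, columns are factors.
loadings = [
    [0.80, 0.10],
    [0.70, 0.20],
    [0.60, 0.10],
    [0.10, 0.70],
    [0.20, 0.60],
]

def percent_variance_explained(loadings):
    """Percentage of total variance per factor, assuming standardized items
    and an orthogonal rotation (sum of squared loadings in each column)."""
    n_items = len(loadings)
    n_factors = len(loadings[0])
    percentages = []
    for j in range(n_factors):
        sum_of_squares = sum(row[j] ** 2 for row in loadings)
        percentages.append(100.0 * sum_of_squares / n_items)
    return percentages

percentages = percent_variance_explained(loadings)
for j, pct in enumerate(percentages, start=1):
    print(f"Factor {j}: {pct:.1f}% of total variance")
```

With an oblique rotation such as promax the factors are correlated, so the column sums no longer partition the variance in this way.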
In an EFA of categorical variables, I have negative eigenvalues for 2 of the 32 variables. Is the solution inadmissible? Can this be ignored, or should I make some adjustment such as eliminating some low-frequency variables? I tried "LISTWISE=ON", but this made no difference. Any other suggestions would be appreciated.
I think this is ignorable. With categorical variables and WLSMV, you work with tetrachoric and polychoric correlations which are computed for pairs of variables at a time and therefore can produce a non-positive definite sample correlation matrix - which has some negative eigenvalues. You can still get a pos-def model-estimated correlation matrix. If the model fits well to this sample correlation matrix, you can view the situation as the non-pos def sample correlation matrix was not "significantly non-pos def." There have been ideas in the literature about deleting the eigenvalues and eigenvectors for the negative eigenvalues and recreating the sample correlation matrix this way, smoothing it, and then fitting the model, but I am not sure that is an important improvement.
If you use ML instead, this issue does not come up because ML does not fit the model to those sample correlations.
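The smoothing idea mentioned above can be sketched as follows. This is one simple variant (set negative eigenvalues to a floor of zero, rebuild the matrix, rescale to a unit diagonal), written in Python with NumPy; it is not something Mplus does, the function name `smooth_correlation` is made up, and the 3-variable matrix is a hypothetical example chosen to be non-positive definite.

```python
import numpy as np

def smooth_correlation(R, floor=0.0):
    """Rebuild R from its eigenvectors after raising eigenvalues below
    `floor` to `floor`, then rescale so the diagonal is 1 again."""
    values, vectors = np.linalg.eigh(R)
    clipped = np.clip(values, floor, None)
    S = vectors @ np.diag(clipped) @ vectors.T
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)

# Hypothetical matrix of pairwise-estimated correlations that is not
# positive definite (its determinant is negative).
R = np.array([
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.9],
    [0.2, 0.9, 1.0],
])

R_smooth = smooth_correlation(R)

print("eigenvalues before:", np.linalg.eigvalsh(R))
print("eigenvalues after: ", np.linalg.eigvalsh(R_smooth))
print("largest change in a correlation:", np.abs(R_smooth - R).max())
```

The size of the largest change gives a rough sense of how far the sample matrix was from being positive semidefinite.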
Hello. When one is choosing among EFA factor solutions using criteria such as the Scree plot and overall model fit, is it true that when you pick a greater number of factors to be extracted, there is necessarily better fit (according to CFI, TLI, and RMSEA)? Or can it sometimes occur that higher numbers of factors extracted can actually result in a more poorly-fitting model than when fewer factors are extracted? Thanks.
Chi-square will improve but I don't necessarily think that would hold with CFI, TLI, and RMSEA. Note that there is a maximum number of factors that can be extracted from a set of indicators. Also, you can get negative residual variances which make the solution inadmissible.
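The maximum number of factors can be illustrated with the usual degrees-of-freedom count for an unrestricted factor model with p continuous indicators and m factors, df = ((p - m)^2 - (p + m)) / 2. The Python sketch below uses that textbook formula; the function names are made up for the example, and the degrees of freedom a program reports can differ for other estimators or outcome types.

```python
def efa_degrees_of_freedom(p, m):
    """Degrees of freedom for an EFA with p continuous indicators and m factors,
    using the standard count ((p - m)^2 - (p + m)) / 2."""
    # (p - m)^2 and (p + m) always have the same parity, so the division is exact.
    return ((p - m) ** 2 - (p + m)) // 2

def max_factors(p):
    """Largest m for which the degrees of freedom are non-negative."""
    m = 0
    while m + 1 <= p and efa_degrees_of_freedom(p, m + 1) >= 0:
        m += 1
    return m

for p in (8, 50):
    m_max = max_factors(p)
    print(f"p = {p}: at most {m_max} factors "
          f"(df = {efa_degrees_of_freedom(p, m_max)} at that point)")
```

Under this count, 8 indicators allow at most 4 factors and 50 indicators allow at most 40, so a request for as many factors as indicators goes beyond what can be identified.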
Which example in Chapter 4 can save the eigenvalues so that I can use the O’Connor (2000) macros to generate the random-data eigenvalues? This process is recommended in the recent Hancock, G. R., & Mueller, R. O. (Eds.). (2013). Structural equation modeling: A second course (2nd ed.). Charlotte, NC: Information Age Publishing, Inc. (supplementary).
The eigenvalues are printed in the output. They are not saved. This test will be in Version 7 using the PARALLEL option.
ellen posted on Tuesday, October 15, 2013 - 11:00 am
Hi, I just installed Mplus version 7.11 yesterday. It runs well with regular SEM analyses. I wanted to perform a parallel analysis to determine the optimum number of factors in an exploratory factor analysis. I used the syntax below, but it has been running for 15 hours, since yesterday evening, and there is still no output. I am wondering whether I made a mistake in the syntax. Why is it taking so long?
VARIABLE: NAMES ARE sex age race y1-y50;
USEVARIABLES ARE y1-y50;
MISSING IS all (-99);
ANALYSIS: TYPE = EFA 1 50; PARALLEL = 1000;
PLOT: TYPE= PLOT2 ;
Could you please let me know whether this is the correct syntax to run a parallel analysis?
You are asking for 50 factor solutions with 1000 random data sets for each of the 50 solutions. With 50 items, I would imagine that could take some time. The problem is most likely that you are trying to extract too many factors and are getting negative residual variances, which could cause slow convergence. I would choose a range of factors related to the number of factors for which the data were developed. For example, if the fifty items should contain four factors, I would perhaps ask for solutions from 1 to 6 or 2 to 6.
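For readers who want to see what a parallel analysis compares, here is a generic sketch in Python with NumPy. It is not the Mplus PARALLEL implementation: it simply compares the eigenvalues of the sample correlation matrix with a percentile of eigenvalues from random normal data of the same size. The function names, the 95th-percentile rule, and the simulated two-factor data are all assumptions made for the example.

```python
import numpy as np

def sorted_eigenvalues(data):
    """Eigenvalues of the sample correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def parallel_analysis(data, n_random=100, percentile=95, seed=0):
    """Count the leading eigenvalues that exceed the chosen percentile of
    eigenvalues from random normal data with the same n and p."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = sorted_eigenvalues(data)
    random_eigs = np.array([
        sorted_eigenvalues(rng.standard_normal((n, p)))
        for _ in range(n_random)
    ])
    threshold = np.percentile(random_eigs, percentile, axis=0)
    exceeds = observed > threshold
    # Index of the first eigenvalue that fails to exceed its threshold.
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    return observed, threshold, n_factors

# Hypothetical demo data: 500 cases, 6 items, 2 uncorrelated factors.
rng = np.random.default_rng(1)
loadings = np.zeros((6, 2))
loadings[:3, 0] = 0.8
loadings[3:, 1] = 0.8
factors = rng.standard_normal((500, 2))
noise = rng.standard_normal((500, 6)) * np.sqrt(1 - 0.8 ** 2)
data = factors @ loadings.T + noise

observed, threshold, n_factors = parallel_analysis(data)
print("observed eigenvalues:", np.round(observed, 3))
print("random-data threshold:", np.round(threshold, 3))
print("suggested number of factors:", n_factors)
```

The comparison itself only needs eigenvalues, so its result can help narrow the range of factor solutions worth requesting.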