Anonymous posted on Wednesday, March 17, 2004 - 7:12 pm
I have recently received responses from reviewers to a paper we have submitted. We have developed a new multi-dimensional instrument, consisting of 4 scales, one of which is pre-existing, one completely new and the other 2 adapted (with modifications and new items) from pre-existing scales. None of the 4 scales have appeared together before. We conducted an EFA to reveal our factor structure (although we obviously had a pretty good idea of what our factors should be). Both reviewers have responded saying that we should have conducted a CFA. From all our reading this would appear to us to be wrong advice, although we can't claim to be experts on factor analysis. I would be very interested in and grateful for your opinion.
I would start with an EFA to weed out bad items and factors, then do an EFA in a CFA framework to obtain standard errors for the factor loadings, and then do a simple structure CFA. These steps are outlined in the Day 1 handout from our short courses, which can be purchased.
We are currently using an EFA to find and remove poor items, followed by a CFA to confirm the structure (WLSMV estimator in both cases). However, we would also like to use an EFA in a CFA framework. Unfortunately, when we run an EFA in the CFA framework, our model does not converge. It may be because our model is very complex (5 factors, 40 items, 2451 subjects), but is it possible that items on different scales are preventing convergence (i.e., we have some items on 3-point, 4-point, and 5-point Likert scales)?
One more question related to convergence: What was the specific reason for choosing the initial convergence criterion of 0.00005 (for WLSMV)? I have noticed that if I adjust it slightly, say to 0.00001, the model will converge. I guess I'm just wondering if it is okay to adjust this value or if I need some theoretical justification for doing so.
Also, we have a handful of items that look good in the EFA, but after removing bad items and running the initial CFA, they no longer have significant loadings. Does it make sense to remove these items and run a second CFA or should they be left in?
Thanks for all your help.
bmuthen posted on Friday, February 10, 2006 - 7:37 pm
Your EFA within CFA should work fine - perhaps you need to give more key items starting values of 1. You should follow the advice that we have in our new short course handout.
Regarding convergence, your adjusted convergence criterion is stricter so I can't see why that would make it converge - probably something else is changed as well. Using the default is probably best here.
After removing the bad items, the EFA needs to be rerun to see that the model still holds up. Don't go straight to CFA.
Thanks for your help. As you describe in your handouts, we have our EFA within a CFA framework set up for m=5 factors and m-squared=25 restrictions, but the model does not converge. When you said that we may need to give more key items starting values of 1, were you referring to adding restrictions (anchor terms, etc.) in addition to the 25 we already have?
Regarding our CFA: We have come across a few items that load around .4 in our revised EFA, but do not load at all (.15-.17) in our CFA. The same items are included in the revised EFA and the CFA. Could this be due to "bad" items or is it some inherent difference between EFA and CFA methodologies?
The default starting value in Mplus for factor loadings is one. You need to start all small loadings at zero and start the key ones at one. For example,
verbal BY y1-y10*0 y2*1;
This starts them all at zero and then overrides the zero by one for y2. If this does not help, send your input, data, output, and license number to firstname.lastname@example.org.
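To make the starting-value advice concrete, here is a minimal sketch of an EFA within a CFA framework for m=2 factors, which needs m-squared=4 restrictions. The variable names y1-y8, the factor names, and the choice of y1 and y5 as anchor items are hypothetical; in practice the anchors would be the items that load most clearly on one factor in the preceding EFA.

```
MODEL:
  ! Start all loadings at zero, then start each anchor item at one.
  ! y5 is the assumed anchor for f2, so its loading on f1 is fixed at zero.
  f1 BY y1-y8*0 y1*1 y5@0;
  ! y1 is the assumed anchor for f1, so its loading on f2 is fixed at zero.
  f2 BY y1-y8*0 y5*1 y1@0;
  ! Factor variances fixed at one: with the two zero loadings, 4 restrictions.
  f1-f2@1;
```

The two fixed factor variances plus the two fixed anchor cross-loadings give the four restrictions; the same pattern with five factors would give the 25 restrictions mentioned above.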
Tor Neilands posted on Saturday, February 25, 2006 - 10:22 am
My experience has been that many reviewers are uncomfortable with using CFA in the service of exploratory factor analysis work and related model development. I agree 100% with you, Bengt, that CFA models and associated features (e.g., modification indices; greater control over model specification) make CFAs a useful tool in conducting exploratory scale development work.
Can you suggest references to cite in response to reviewer critiques of the practice of using CFA in the service of exploratory factor analysis and scale development? Also, how does one obtain the handout referenced in your post of February 10th?
With many thanks and best wishes,
bmuthen posted on Saturday, February 25, 2006 - 11:24 am
I think of the "EFA within CFA" approach of Joreskog (1969) Psychometrika as an EFA. But you get the added advantages of CFA in that you have SEs, Modification Indices, and can correlate residuals.
The Short Courses handouts can be ordered off the web site. This topic is covered in Day 1 of our Short Courses.
Tor Neilands posted on Saturday, February 25, 2006 - 5:51 pm
When will the short course handouts reflect the syntax and features in version 4? Or perhaps they have already been updated?
bmuthen posted on Sunday, February 26, 2006 - 5:58 am
Most is fine as is, but some simplifications, extensions, and new examples will be included gradually for the training sessions in May, June, and November, and will then be made available.
Hi, I have a problem running a CFA after deciding on the EFA structure. I got a warning in the CFA output.
"WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE FACTOR2."
I checked the correlation matrix in the TECH4 output, and I also think the problem comes from factor2. However, I don't know how to fix it. Will negative factor loadings cause a problem? (My factor2 has a negative loading.) Or what other restrictions do I need to specify when a factor has a negative loading in a CFA model?
The problem would not be a negative factor loading but a negative residual variance or one of the other problems noted in the message. If you can't figure this out, please send your input, data, output, and license number to email@example.com.
I'm trying to do an EFA within a CFA framework and I have the same problem as Mr. Sembower, that is, the parameter estimation process does not converge. I also have quite a complex model with 3 factors and 66 items. Sample size is about 580. I followed exactly the specifications that are outlined in the handout and in the book "Confirmatory Factor Analysis for Applied Research" by Timothy A. Brown.
When I specify the same model with the same dataset in Amos, it converges without problems. I also tried setting all the starting values for factor loadings to zero and the key ones to one. Unfortunately, it didn't help. I also increased the number of key loadings with starting values of one to as many as five for each factor, but still to no avail. What else could I do to help the parameter estimation process converge?
It sounds like there are defaults that differ between Amos and Mplus. Check that you have the same number of and the same free parameters in your model. If this does not help, please send your input, data, output, and license number to firstname.lastname@example.org. Include the output from the EFA.
A journal article published the factor loadings on two orthogonal factors found doing an EFA on the items of a scale using data from a minority population. I want to see if the same factor loadings would be found in another minority population. I thought about just running a CFA to check the fit of a model where the loadings were constrained to be the same as those in the article. But the loadings from the article are correlations, whereas those from the CFA are regression coefficients if my understanding is correct. If this is true, how do I constrain the CFA loadings to equal the published values? If the published EFA factor loading for variable X on the first factor was .654, I can't just say in Mplus: Factor1 by x@.654 since the .654 was a correlation and Mplus will set the regression coefficient for X at .654.
If you standardize your variables, free all factor loadings, and fix the factor variances to one, you will be in the EFA metric. That being said, I think this is too stringent a test. I would instead do an EFA on the new data and see if the factor solution is close but not exact.
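As a rough illustration of the metric described above, here is a minimal sketch for two orthogonal factors. The file name, the variable names y1-y6, and the choice of y6 as an anchor item are hypothetical, and the sketch assumes a version of Mplus in which the DEFINE command supports STANDARDIZE. Because two factors with every loading free are not rotationally identified, one cross-loading is fixed at zero here; that restriction is an added assumption borrowed from the EFA-within-CFA setup discussed earlier in the thread, not part of the advice above.

```
TITLE:    sketch of a CFA in the EFA (standardized) metric
DATA:     FILE IS newsample.dat;   ! hypothetical file name
VARIABLE: NAMES ARE y1-y6;         ! hypothetical variable names
DEFINE:   STANDARDIZE y1-y6;       ! analyze z-scores
ANALYSIS: ESTIMATOR = ML;
MODEL:
  ! y1* frees the first loading, which is fixed at one by default
  f1 BY y1* y2-y6;
  ! y6@0 is the assumed anchor restriction for rotational identification
  f2 BY y1* y2-y5 y6@0;
  ! factor variances fixed at one
  f1-f2@1;
  ! orthogonal factors, as in the published solution
  f1 WITH f2@0;
OUTPUT:   STDYX;
```

Replacing a free loading with a fixed one, as in the x@.654 statement from the question, would turn this into the stricter test of the published values.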
Thank you, Linda. I appreciate your suggestion about doing an EFA on my data and seeing whether the factor solution is close to the published solution. I was hoping to take advantage of CFA to 1) get CFI, RMSEA, and chi-square values for the fit of the published structure to my data, and 2) formally test whether another factor solution in the population I'm studying fits better than the published structure. Is this often done with CFA? It seems like this type of confirmation would often arise when testing scales in new populations. All the literature I have on testing factorial invariance across groups assumes the analyst has the data for all the groups, not just published values.
I'm confused by your suggestion to "free all factor loadings". I don't think I want them freed -- I want them to be fixed at values that correspond to the published EFA factor loadings (correlations). That way I can assess the fit of the published structure to my data using the CFI and RMSEA values.
If I follow your directions to "be in the EFA metric", it sounds like I will just be doing an EFA using CFA syntax. That would be interesting since I would be able to test the loadings for significance. However, wouldn't the CFI, RMSEA, and chi-square just assess the fit of whatever structure was found in my data, rather than tell me specifically about the fit of the published structure?
Thank you Linda for this information. However, I don't understand how it answers the question in the last line of my previous post: "wouldn't the CFI, RMSEA, and Chisquare just assess the fit of whatever structure was found in my data, rather than tell me specifically about the fit of the published structure?".
Say my EFA reveals Chisquare p = .0356, RMSEA = .032, CFI = .975. Wouldn't that just indicate that the factors found in my data (totally ignoring the published factor loadings) fit my data well? The stats would seem to say nothing about how well published factor loadings fit my data.
If the above is correct, it appears the only way to tell how well the factor structure of some scale items published for one population fits another population is to run an EFA on data from the new population and eyeball how close the computed loadings match those published for the reference population. Is that right? If yes, this seems like a process that would have been available in 1980 or even 1970; I was hoping that CFA would offer a way to test whether the observed dissimilarity in loadings was beyond what might occur by chance.
I have divergent findings in CFA and EFA analyses. I started the analysis with a CFA with WLSMV estimation because we had a theoretical measurement model (ordinal data) that we wanted to test. The factors correlate quite highly, around .60-.70. The level of fit is good. However, one reviewer asked for an EFA, so we performed one. We have the same three-factor structure in both analyses, but some items seem to load more strongly on a different factor than the one they should belong to. Why do these two analyses have divergent findings? What do you suggest as a next step?
My colleagues and I recently ran an EFA based on a Likert questionnaire. We were quite happy with our results and found that our four factors (based on 15 variables and a sample of about 300) supported other theoretical research in the field even though no a priori factors were suggested. However, it was suggested that we should now run a CFA on the same data, implying that EFA is only the first step in testing latent structure. We did run the CFA as suggested, but we found that RMSEA was very, very poor (0.30) even though CFI (.94), GFI (.96) and RMSR (.08) were much better, thus casting great doubt on the CFA model. So, if running a CFA after an EFA on the same data is correct, does this also suggest that the EFA model itself is very doubtful, even though CFA has considerably different assumptions, like the zero-loading restriction of CFA (we fully expect some cross-loadings based on our data and EFA)?
The zero cross-loadings of CFA, which EFA allows to be non-zero but small, are typically what makes for the downfall of CFA. You can use Modindices to see which cross-loadings and which residual correlations might need to be freed up. A Bayesian alternative approach is presented on our "BSEM" page.
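Returning to the modification-index route mentioned above, a minimal sketch of how it might look in an input file follows. The factor and item names are hypothetical, and the particular cross-loading and residual correlation are placeholders for whatever the output actually flags; the value 3.84 simply asks for indices at or above the chi-square critical value for one degree of freedom.

```
MODEL:
  ! Simple-structure CFA with one cross-loading freed afterwards.
  f1 BY y1-y4;
  ! y3 added as a freed cross-loading on f2 (hypothetical choice)
  f2 BY y5-y8 y3;
  ! freed residual correlation (hypothetical choice)
  y4 WITH y7;
OUTPUT: MODINDICES (3.84);
```

Freeing parameters one at a time, and only where the change is substantively defensible, keeps the model from drifting into a purely data-driven fit.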
Daniel Lee posted on Friday, May 06, 2016 - 3:03 pm
Hi, this is a general question about EFA in Mplus. I have items that load well on one of the several factors (so that's good), but some of these good items, per the modification indices, are contributing heavily to model misfit. I was wondering if you could help me understand how this might happen.
An item can be measuring the factor well and still have strong residual correlations with other items - perhaps indicating further factors.
Dodam Park posted on Wednesday, April 12, 2017 - 8:18 pm
Dear Dr. Muthén, I am validating a scale developed by other researchers with a new, different sample. I performed an EFA and couldn't find theoretical reasons to support the result, regardless of the number of factors. Thus, I would like to perform only a CFA without an EFA. But would it be appropriate to do this just because I can't get a proper factor model from the EFA? Or should I somehow make the EFA result meaningful?
This general analysis question is better suited for SEMNET.
QianLi Xue posted on Tuesday, February 04, 2020 - 9:00 am
If I fit a one-factor EFA model and a one-factor CFA model (setting the variance of the factor to 1 while freeing the 1st loading, and requesting STDYX), I would expect the two to give identical loading coefficients. But they don't. Why is that?
Below is the syntax for both:
----------------
TITLE: this is an example of an CFA analysis with continuous factor indicators
DATA: FILE IS H:/ex4.1a.dat;
VARIABLE: NAMES ARE y1-y12;
MODEL: f1 by y1* y2-y12;
  f1@1;
OUTPUT: STDYX;
----------------
TITLE: this is an example of an exploratory factor analysis with continuous factor indicators
DATA: FILE IS H:/ex4.1a.dat;
VARIABLE: NAMES ARE y1-y12;
ANALYSIS: TYPE = EFA 1 1;
  ESTIMATOR=ML;
  ROTATION=VARIMAX;
OUTPUT: TECH1;