EWickwire posted on Saturday, February 18, 2006 - 8:09 pm
Dear Dr. Muthen,
I'm attempting an exploratory factor analysis with 50 items. There are 538 observations, randomly selected from a total sample of 1076. (After the EFA I will perform CFA/SEM with the other 538, predicting a behavioral variable.)
Items are scored on a 5-point scale. The midpoint is "neither" between opposing endpoints (e.g., "very good" to "very bad"). As expected, "C" (the middle choice) was the modal score, but variance was acceptable. No item had more than ~80% "C" responses.
In the initial EFA I end up with a huge first factor and a scree plot that is difficult to interpret, with the 3- to 7-factor solutions very close. I ran parallel analysis, which indicated retaining 5 factors, so that is what I have done.
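For readers unfamiliar with parallel analysis, here is a minimal sketch of the idea, assuming the common variant that compares eigenvalues of the observed correlation matrix with a percentile of eigenvalues from random normal data of the same size. The function name `parallel_analysis`, its parameters, and the synthetic two-factor data are illustrative assumptions, not the procedure or data used in this thread.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Horn-style parallel analysis on correlation-matrix eigenvalues (illustrative)."""
    rng = np.random.default_rng(seed)
    n_obs, n_items = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    simulated = np.empty((n_sims, n_items))
    for i in range(n_sims):
        # Uncorrelated normal data with the same shape as the real data.
        noise = rng.standard_normal((n_obs, n_items))
        simulated[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold
    # Retain the leading run of eigenvalues that beat the random-data threshold.
    n_retain = n_items if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, threshold

# Hypothetical data: 10 items, two uncorrelated factors, loadings of 0.8.
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 2))
pattern = np.zeros((10, 2))
pattern[:5, 0] = 0.8
pattern[5:, 1] = 0.8
demo = factors @ pattern.T + 0.6 * rng.standard_normal((500, 10))

n_retain, observed, threshold = parallel_analysis(demo, n_sims=50)
print(n_retain)
```

With a dominant first factor, the later observed eigenvalues can sit close to the random-data thresholds, which is one reason the retained count may be sensitive to the percentile chosen.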
There are indications of a general factor (i.e., many items load on the first factor. Inter-factor correlations are only .3-.7. I cheated in CFA and model fit is horrible with a one-factor model.)
I'm using ML extraction, and switching away from SPSS helped tremendously with model fit. I'm not married to ML; I'm using principal axis as well and observing differences. I have also run the EFAs with 10-15 randomly generated items included, and the loadings on the first factor increase slightly. When I play with it, I do get 5 interpretable factors, or 4 and one messier (still first, largest).
It's been suggested to use a polychoric correlation matrix in an EFA. Items are definitely categorical and nonnormal, although no skew/kurtosis exceeded 2/7. Multivariate normality was also not satisfied (11% outside Mardia's statistic, but they are included in the sample).
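As an illustration of the univariate check mentioned here, the following is a minimal sketch that computes moment-based skewness and kurtosis per item and flags items beyond the 2/7 cutoffs. It assumes the "7" refers to excess kurtosis (kurtosis minus 3), which the post does not state; the function names and the toy data are hypothetical.

```python
import numpy as np

def skew_kurtosis(items):
    """Moment-based skewness and excess kurtosis for each column (item)."""
    x = np.asarray(items, dtype=float)
    centered = x - x.mean(axis=0)
    m2 = (centered ** 2).mean(axis=0)
    m3 = (centered ** 3).mean(axis=0)
    m4 = (centered ** 4).mean(axis=0)
    # A zero-variance item would divide by zero here; screen such items first.
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt

def flag_nonnormal(items, max_skew=2.0, max_kurt=7.0):
    """True for items whose |skew| or |excess kurtosis| exceeds the cutoffs."""
    skew, kurt = skew_kurtosis(items)
    return (np.abs(skew) > max_skew) | (np.abs(kurt) > max_kurt)

# Hypothetical 5-point items: one symmetric with a heavy midpoint, one extreme.
symmetric = [1, 2, 2] + [3] * 14 + [4, 4, 5]
extreme = [1] * 19 + [5]
toy = np.column_stack([symmetric, extreme])

skew, kurt = skew_kurtosis(toy)
flags = flag_nonnormal(toy)
print(flags)
```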
I also know that I should eliminate items but do not know details of that process. (Is it iterative, where I delete all items that don't load .3 or perhaps (?) load on 2+ factors? Then do I rerun the EFA, and adjust/rerun again if necessary?)
I've been stuck with this for a while and have heard of some of the tools, but I'm not sure how to proceed. What would you recommend for
1. Next step? eliminate items? Using which criteria? Then rerun?
2. How much would polychoric corrs improve the EFA? Should I use Mplus to factor analyze polychoric corrs?
3. Are ML/Promax correct?
I appreciate your time very much--I've been stuck for too long!
bmuthen posted on Monday, February 20, 2006 - 7:47 am
With a dominant factor, it sounds like you want to try a general-factor-specific-factor CFA approach. In addition to a factor influencing all items, uncorrelated residual (specific) factors allow further correlations among sets of items.
It doesn't sound as if you have to switch from continuous-variable ML modeling to categorical-variable modeling given your relatively symmetric distributions.
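To make the general-factor-plus-specific-factors structure concrete, here is a minimal sketch of the covariance matrix such a model implies, assuming standardized items and mutually uncorrelated factors with unit variance. The loading values and the six-item layout are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical loadings: 6 items, one general factor, two specific factors.
general = np.array([0.6, 0.6, 0.6, 0.5, 0.5, 0.5])
specific = np.array([
    [0.4, 0.0],
    [0.4, 0.0],
    [0.4, 0.0],
    [0.0, 0.5],
    [0.0, 0.5],
    [0.0, 0.5],
])
loadings = np.column_stack([general, specific])  # 6 items x 3 factors

# General and specific factors are all mutually uncorrelated.
factor_cov = np.eye(3)

# Residual variances chosen so that each item has unit variance.
communality = (loadings ** 2).sum(axis=1)
residual = np.diag(1.0 - communality)

# Model-implied covariance: Lambda * Phi * Lambda' + Theta.
implied = loadings @ factor_cov @ loadings.T + residual

print(np.round(implied, 2))
```

Items 1 and 2 share both the general factor and a specific factor, so their implied correlation (0.36 + 0.16) is higher than that of items 1 and 4 (0.30), which share only the general factor.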
EWickwire posted on Monday, February 20, 2006 - 8:28 am
Many thanks for a prompt reply.
A couple quick follow-ups:
1. Are you suggesting that I should drop the EFA portion of my analytic plan? Or how would I report?
2. With your CFA suggestion, do you mean I should have one factor influencing all indicators directly (i.e., "on the left" in a CFA diagram) and then uncorrelated specific factors "on the right" (in that same diagram)? I believe this is a nested hierarchical design?
3. A question on interpretability: Based on that nested hierarchical design, from which factors would I predict my DV in the SEM? Would you expect any predictive value from the g factor OR from the specific factors, or both (since the specific factors are now residuals)?
bmuthen posted on Monday, February 20, 2006 - 9:20 am
1. I would do the EFA, but noting that Varimax and Promax may not be the best rotation schemes with a dominant factor (this is well-known in the literature), and then add the CFA, where the item sets for the specific factors would be suggested from substantive theory or the EFA.
2. Yes, but I don't see that as nested. To me, nested would be first-order factors nested within second-order factors, which is a different model.
3. Both. And the decomposition of the contributions from the general and specific factors would be clearer due to them being orthogonal. For applications, see articles by J.E. Gustafsson in Intelligence.
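A minimal sketch of why orthogonality makes the decomposition cleaner, assuming a linear prediction of the DV from factors with unit variance: the explained variance is b' Phi b, which splits into one term per factor only when the factor covariance matrix Phi is diagonal. The coefficient values and the function name `explained_variance` are hypothetical.

```python
import numpy as np

def explained_variance(coefs, factor_cov):
    """Split b' Phi b into per-factor terms and a shared (cross-product) term."""
    coefs = np.asarray(coefs, dtype=float)
    factor_cov = np.asarray(factor_cov, dtype=float)
    total = float(coefs @ factor_cov @ coefs)
    # Part attributable to each factor on its own.
    unique = coefs ** 2 * np.diag(factor_cov)
    # Part that arises only through factor covariances.
    shared = total - float(unique.sum())
    return total, unique, shared

# Hypothetical regression coefficients for a general and one specific factor.
coefs = [0.5, 0.3]

# Orthogonal factors: the total is just the sum of the per-factor terms.
total_orth, unique_orth, shared_orth = explained_variance(coefs, np.eye(2))

# Correlated factors (r = .5): a shared term appears that belongs to neither.
total_obl, unique_obl, shared_obl = explained_variance(coefs, [[1.0, 0.5], [0.5, 1.0]])

print(total_orth, shared_orth)
print(total_obl, shared_obl)
```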
Ewickwire posted on Monday, February 20, 2006 - 9:37 am
1. Which rotation would you try, or are you saying to stick with promax but note that it can be problematic with a dominant factor? (Do you have a reference for this? I've searched but must be looking with wrong terms.)
2. If I understand correctly, I should:
a. Perform EFA with ML extraction (any reason to use or not use Principal Axis, or just see which produces more interpretable results?)
b. Eliminate items (load less than .3, or low communality <.20, or load on multiple factors)
c. Rerun EFA (Do I then remove items based on same criteria and rerun again?)
d. Report results of EFA (perhaps after factor analyzing specific factors, as in searching for a dominant factor?)
e. Use results as basis for CFA
f. Predict DV from both g and specific (residual) factors in CFA/SEM
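The screening criteria in step b can be sketched as a function over a loading matrix. This is a minimal illustration, assuming the cutoffs named in the post (.30 for a salient loading, .20 for communality) and treating an item as cross-loading when two or more loadings reach the salient cutoff. The function name `flag_items`, its parameters, and the example loadings are hypothetical. For an oblique rotation such as Promax, the communality needs the factor correlation matrix, which can be passed as `factor_corr`.

```python
import numpy as np

def flag_items(loadings, factor_corr=None, min_loading=0.30, min_communality=0.20):
    """Flag items that are weak, low in communality, or cross-loading."""
    lam = np.asarray(loadings, dtype=float)
    n_factors = lam.shape[1]
    # Default to uncorrelated factors if no factor correlation matrix is given.
    phi = np.eye(n_factors) if factor_corr is None else np.asarray(factor_corr, dtype=float)
    abs_lam = np.abs(lam)
    # No loading reaches the salient cutoff.
    weak = abs_lam.max(axis=1) < min_loading
    # Communality is the diagonal of Lambda * Phi * Lambda'.
    communality = np.einsum("ij,jk,ik->i", lam, phi, lam)
    low_communality = communality < min_communality
    # Two or more salient loadings on the same item.
    cross = (abs_lam >= min_loading).sum(axis=1) >= 2
    return {
        "weak": weak,
        "low_communality": low_communality,
        "cross_loading": cross,
        "drop": weak | low_communality | cross,
    }

# Hypothetical rotated loadings for four items on two factors.
example = [
    [0.70, 0.05],  # clean marker of factor 1
    [0.10, 0.15],  # loads on nothing
    [0.45, 0.40],  # cross-loads
    [0.05, 0.60],  # clean marker of factor 2
]
flags = flag_items(example)
print(flags["drop"])
```

After dropping flagged items, the EFA would be rerun on the reduced item set and the same check applied to the new loadings.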
Dr. Muthen, you're a lifesaver. Again, thank you.
bmuthen posted on Monday, February 20, 2006 - 6:39 pm
1. Mplus only offers Promax and Varimax, so stick with Promax. For the rotation issue, check Multivariate Behavioral Research for a recent EFA article by Cudeck and Browne (?) as well as classic EFA books such as Harman and also Gorsuch and Mulaik.
2. a. Yes, ML.
b. Perhaps; in our courses we recommend doing this in an EFA within a CFA, but if you haven't done that before, you may not want to do this now.
c-f. Right.
EWickwire posted on Monday, February 20, 2006 - 8:24 pm
Would you ever consider it justified to switch from EFA to exploratory CFA because the data suggested a g factor?
In my split sample EFA-CFA design, would you still do the exploratory CFA in the first sample, then confirm it in the second sample?
What I know about exploratory work in a CFA context is essentially looking at modification indices and tweaking to improve model fit. If this is what you mean, is there an article that explains or provides a good example of how to write up and report it?
bmuthen posted on Tuesday, February 21, 2006 - 3:13 pm
If theory and data suggested it, yes.
Yes; surprises come up.
No article that I know except Joreskog 1969 in Psychometrika. But that is not applied or for writing things up.
EWickwire posted on Tuesday, February 21, 2006 - 3:19 pm
Is adjusting CFA based on modification indices what you mean by "EFA within a CFA?"
Although we would like to give a little help to beginners, I'm afraid this forum should not take the role of an advisor. So we can't give much more help than is in the handout describing the approach. Are there any particular parts of pages 126-129 that are unclear to you and your advisor? I don't think splitting samples into exploration and validation can be considered old-fashioned; if you have a large sample such as yours, I would split it.
That's a big topic. We take a couple of hours to discuss it in our annual November course in Alexandria - I would recommend it to you.
Briefly, I would remove items for which such cross-loadings were not intended or do not make sense. I would keep it simple, so all at once in both cases. And, you don't want a factor structure that is sensitive to dropping some items.