Negative residual variance
 anon9210 posted on Thursday, September 02, 2010 - 6:29 pm
Hi,

I am trying to run an EFA using 8 indicators (geomin rotation; WLSMV estimator; N approx. 3000). The scree plot I get strongly suggests three factors; this makes a lot of sense for theoretical reasons too.

However, it appears that one of my factors is strongly defined by just one of my indicators. I am guessing this is why that particular indicator has a loading > 1 on its factor; additionally, this leads to negative residual variance for that indicator (res. var. = -.63; s.e. = 1.98; est/s.e. = -.32). I get no error messages of any sort. The three factors are relatively uncorrelated with each other (.18, .003, and .13; the latter two being the intercorrelations of the factor I am talking about). The remaining seven indicators look ok; nothing seems out of place with them.
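
The link between a loading above 1 and a negative residual variance can be sketched numerically. This is only an illustration, assuming standardized indicators, so that the residual variance equals 1 minus the model-implied common variance. The loadings below are hypothetical (the post does not report them), and the assignment of the three reported factor correlations to factor pairs is likewise an assumption.

```python
import numpy as np

def residual_variance(loadings, factor_corr):
    """Residual variance of a standardized indicator: 1 - lambda' Psi lambda."""
    lam = np.asarray(loadings, dtype=float)
    psi = np.asarray(factor_corr, dtype=float)
    return 1.0 - lam @ psi @ lam

# Factor correlations from the post (.18, .003, .13); treating factor 3 as the
# single-indicator factor is an assumption made for this sketch.
psi = np.array([[1.0,   0.18, 0.003],
                [0.18,  1.0,  0.13],
                [0.003, 0.13, 1.0]])

# Hypothetical loadings for the problem indicator: near zero on factors 1-2,
# well above 1 on factor 3.
problem_item = [0.02, 0.05, 1.27]
# Hypothetical well-behaved indicator loading only on factor 1.
ordinary_item = [0.70, 0.00, 0.00]

print("problem item residual variance :", residual_variance(problem_item, psi))
print("ordinary item residual variance:", residual_variance(ordinary_item, psi))
```

With nearly uncorrelated factors, the common variance is close to the squared loading, so a loading well above 1 on one factor pushes the residual variance below zero.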

Fit indices for the 3-factor model are as follows: CFI = 1.000; TLI = 1.004; RMSEA = 0; SRMR = .028.

Fit indices for the 2-factor model are: CFI = .947; TLI = .895; RMSEA = .037; SRMR = .094

So:

1.) Is the 3-factor solution a problem? Or am I running into a modeling technicality?

2.) Relatedly, should I stick with the 3 factor solution, or should I drop to a lower 2 factor solution?

3.) Is there anything I can do to avoid the negative residual variance?

 Linda K. Muthen posted on Friday, September 03, 2010 - 9:05 am
Any solution with a negative residual variance is inadmissible. It sounds like the third factor has only one strong indicator, which is problematic.

You could try to run the model using ESEM (see Example 5.24) using MODEL CONSTRAINT to keep the residual variance positive. Check to be sure this does not change model fit.
 anon9210 posted on Friday, September 03, 2010 - 4:40 pm
Thanks! On a related note, I am trying to figure out which rotation to use for my analyses. I get similar results with Oblimin and Quartimin; oddly enough, when I use Mplus' default Geomin, the results change somewhat. I am particularly interested in the significance of the correlations between my factors, and I noticed that if I use Oblimin or Quartimin, the magnitudes of the correlation coefficients all drop as a whole, but all are somewhat equal and are significant. However, when I use Geomin, there is a lot more discrepancy in the magnitudes of the correlation coefficients, and some of them end up being significant and some of them nonsignificant. Which one would you recommend in this case?
 Linda K. Muthen posted on Saturday, September 04, 2010 - 9:05 am
We would recommend our default Geomin. See the paper below which is available on the website and also the Browne (2001) reference in the user's guide:

Sass, D.A. & Schmitt, T.A. (2010). A comparative investigation of rotation criteria within exploratory factor analysis. Multivariate Behavioral Research, 45, 73-103.
 anon9210 posted on Saturday, September 04, 2010 - 10:35 am
Thanks for the references - I will be sure to go over them. I also read Asparouhov & Muthen (2008) and noticed the footnote about geomin EFAs with more factors using larger epsilon values. I tried playing around with these some, while keeping a three-factor solution constant (based on theoretical reasons and the scree mentioned above; I have 11 indicators though now, so no negative variance problems), and noticed as the value of epsilon increased, the Geomin factor correlations resembled Quartimin and Oblimin more (i.e., more significant correlations between factors; though magnitudes of correlations were more equal and lower). So, a couple of questions again:

1.) Would this support going more with greater epsilon values, or sticking with the Mplus defaults?

2.) Additionally, lower epsilon values also lead to loadings greater than 1 (e.g., 1.048) for that one variable I mentioned previously (even with 11 indicators), though I no longer end up with negative residual variance. I checked previous threads, and if I am understanding things correctly, this is due to a couple of reasons:
(a) As geomin factor loadings are more similar to regression coefficients rather than correlation coefficients, they can be greater than 1, and
(b) Additionally, the correlation between factors might be contributing to this.
Is this correct?
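
For readers following the epsilon discussion, here is a rough sketch of the Geomin complexity function as it is usually written following Browne (2001): for each indicator, the geometric mean across factors of the squared loading plus epsilon, summed over indicators. This is not Mplus's implementation, and both loading matrices are hypothetical. It only shows how the size of epsilon changes how strongly near-zero loadings are rewarded.

```python
import numpy as np

def geomin_criterion(loadings, epsilon):
    """Geomin complexity: sum over rows of prod_j (lambda_ij^2 + eps)^(1/m)."""
    lam = np.asarray(loadings, dtype=float)
    m = lam.shape[1]  # number of factors
    return float(np.sum(np.prod(lam**2 + epsilon, axis=1) ** (1.0 / m)))

# Hypothetical loading patterns (not taken from the thread).
simple_structure = np.array([[0.8, 0.0, 0.0],
                             [0.0, 0.7, 0.0],
                             [0.0, 0.0, 0.9]])
cross_loaded = np.array([[0.5, 0.4, 0.3],
                         [0.3, 0.5, 0.4],
                         [0.4, 0.3, 0.5]])

def complexity_ratio(epsilon):
    """How much worse the cross-loaded pattern scores than the simple one."""
    return (geomin_criterion(cross_loaded, epsilon)
            / geomin_criterion(simple_structure, epsilon))

for eps in (0.0001, 0.001, 0.01, 0.5):
    print(f"epsilon={eps:<7} ratio cross-loaded / simple = {complexity_ratio(eps):.2f}")
```

In this toy comparison the ratio shrinks toward 1 as epsilon grows: a small epsilon strongly favors patterns with exact zeros, while a large epsilon makes the criterion much less sensitive to them.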
 Linda K. Muthen posted on Sunday, September 05, 2010 - 11:46 am
1. Model fit and the model estimated correlation matrix are the same for all rotations so statistics can't help in this decision. Look at which solution seems most reasonable.

I would use the Mplus defaults unless I had a strong reason for doing otherwise.

2a. For all rotations, factor loadings are regression coefficients, which can be greater than one.

2b. Yes.
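
Both points in this reply can be checked with a small numerical sketch. It assumes standardized indicators and uses hypothetical loadings and rotation angles throughout. The first part rotates the same unrotated solution two different ways and confirms that the model-implied common part of the correlation matrix does not change. The second part shows a pattern loading above 1 that still leaves a positive residual variance once the factor correlation is taken into account.

```python
import numpy as np

# Hypothetical unrotated (orthogonal) loadings: 4 indicators, 2 factors.
A = np.array([[0.7,  0.3],
              [0.6,  0.4],
              [0.5, -0.4],
              [0.6, -0.5]])

def oblique_rotate(unrotated, angle_1, angle_2):
    """Oblique rotation with a transformation T whose columns have unit length.

    Pattern loadings are A (T')^-1 and factor correlations are T'T.
    """
    T = np.array([[np.cos(angle_1), np.cos(angle_2)],
                  [np.sin(angle_1), np.sin(angle_2)]])
    lam = unrotated @ np.linalg.inv(T.T)
    psi = T.T @ T
    return lam, psi

lam1, psi1 = oblique_rotate(A, 0.2, 1.2)    # arbitrary rotation 1
lam2, psi2 = oblique_rotate(A, -0.3, 1.0)   # arbitrary rotation 2

implied1 = lam1 @ psi1 @ lam1.T
implied2 = lam2 @ psi2 @ lam2.T
print("loadings differ          :", not np.allclose(lam1, lam2))
print("implied common part equal:", np.allclose(implied1, implied2))

# Part 2: a pattern loading above 1 with an admissible residual variance.
lam_x = np.array([1.05, -0.20])             # hypothetical pattern loadings
psi_x = np.array([[1.0, 0.4],
                  [0.4, 1.0]])              # hypothetical factor correlation
struct_x = psi_x @ lam_x                    # correlations of item with factors
resid_x = 1.0 - lam_x @ psi_x @ lam_x       # residual variance
print("pattern loadings  :", lam_x)
print("structure coeffs  :", struct_x)
print("residual variance :", resid_x)
```

The pattern loading is a partial regression coefficient, so it can exceed 1 while the item's correlations with the factors (the structure coefficients) stay below 1.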
 Tess Yanisch posted on Friday, July 20, 2018 - 12:51 pm
Hello Drs. Muthen,

I have a three-factor, oblique geomin CFA with categorical variables. It has good fit. One factor has two items, one of which loads >1 and has negative residual variance and an undefined r-square.

I understand one way of fixing this is to constrain the error variance to be positive, possibly by using ESEM as you mentioned above.

(1) I want to do invariance testing at the metric and configural levels (comparing the factors between genders, ethnicities, etc.). Would constraining the error variance affect that?

(2) What do I constrain the error variance to? Anything positive? 1? 0? 50? .7? What are the consequences of what I put in here?

(3) Am I correct in thinking ESEM would be an appropriate fit here?

Many thanks!
 Bengt O. Muthen posted on Friday, July 20, 2018 - 1:55 pm
You don't want to try to fix/constrain residual variances for categorical factor indicators, partly because they are not free parameters to be estimated. For instance, fixing it at zero in a Theta parameterization creates division by zero. Instead, drop the item. And/or drop the factor and instead perhaps add a residual covariance between the two items (that's what the factor achieves after all); also, a factor with only 2 categorical indicators is not a very trustworthy construct.
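
The remark that a residual covariance achieves what the two-item factor does can be illustrated with a toy calculation. The sketch assumes standardized items and a two-indicator factor that is uncorrelated with everything else in the model; all numbers are hypothetical. Under that assumption the factor contributes only one quantity to the fit, the product of the two loadings, so quite different loading pairs reproduce the same covariance, including pairs with a loading above 1 and a negative residual variance.

```python
import numpy as np

def doublet_implied(lam_a, lam_b):
    """Implied 2x2 correlation matrix and residual variances for a
    two-indicator factor with unit variance and standardized items."""
    lam = np.array([[lam_a], [lam_b]])
    theta = np.array([1.0 - lam_a**2, 1.0 - lam_b**2])  # residual variances
    sigma = lam @ lam.T + np.diag(theta)
    return sigma, theta

# Two hypothetical loading pairs with the same product (0.48).
sigma_ok, theta_ok = doublet_implied(0.8, 0.6)
sigma_bad, theta_bad = doublet_implied(1.2, 0.4)

# The same matrix written with no factor, just one residual covariance.
resid_cov = 0.48
sigma_cov = np.array([[1.0, resid_cov],
                      [resid_cov, 1.0]])

print("same implied matrix (0.8, 0.6) vs (1.2, 0.4):", np.allclose(sigma_ok, sigma_bad))
print("same as residual-covariance version         :", np.allclose(sigma_ok, sigma_cov))
print("residual variances for (1.2, 0.4)           :", theta_bad)
```

In a full model the correlations with other factors add information about the loadings, so this is only the limiting case, but it shows why a doublet factor is fragile.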
 Tess Yanisch posted on Friday, July 20, 2018 - 2:43 pm
Thank you very much! That's a big help.

How would I get at the residual covariances? The syntax

MODEL:
F1 by x1* x2 x3;
f2 by x4* x5 x6;
x7 with x8;

would just give me a correlation between them; can I only get residual covariance when I'm doing something else (e.g. regressing an outcome on f1, f2, x7, and x8, and adding "x7 with x8" there), or is there a way to get it in the model-fitting stage?

I know the two indicators aren't great. It was initially more; getting model fit as good in the CFA as it was in the EFA required dropping the lower-loading items. I actually experimented with adding them back in as a way to fix the >1 loading, but unfortunately, that didn't help.
 Bengt O. Muthen posted on Friday, July 20, 2018 - 3:55 pm
If x7 and x8 are not influenced by any variables such as factors, x7 WITH x8 will indeed refer to their correlation, not their residual correlation (covariance). I assume you want them correlated with the other variables in the model (latent or observed) - still, they won't contribute much to inform about F1 and F2.