Anonymous posted on Friday, April 23, 2004 - 11:45 am
I have recently run a confirmatory factor analysis with several constructs related to the development of antisocial behavior. Although fit indices and chi-square change statistics indicate that the proposed four-factor solution is the best-fitting model, the factors are highly correlated (.75-.84). Will statisticians question the independence of these constructs because of their high correlation? In addition, do you run into the same problems with multicollinearity in SEM when including highly correlated factors in a regression together?
bmuthen posted on Friday, April 23, 2004 - 12:51 pm
I don't think this is problematic. Factors are often naturally correlated, not independent. Multicollinearity is always a risk. One way around this is to postulate a second-order factor behind highly correlated factors.
Anonymous posted on Wednesday, April 28, 2004 - 11:22 am
Thanks for the info. A few follow-up questions. Do you have recommendations on how to check for multicollinearity in Mplus 3.0? In addition, when a second-order factor is used to handle the issue, do you simultaneously regress the DV onto the second-order factor and all first-order factors, or do you simply regress the DV onto all first-order factors without including the second-order factor?
bmuthen posted on Friday, April 30, 2004 - 12:15 am
I think multicollinearity can be checked just like in regular regression with observed variables. With a 2nd-order factor, you can simply regress the DV on that factor only.
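As a rough sketch of the second-order approach described above, the MODEL command might look like the following. The indicator names y1-y12, the factor names f1-f4 and g, and the outcome dv are hypothetical placeholders, not taken from the original analysis; the DATA and VARIABLE commands are omitted.

```mplus
MODEL:
  ! first-order factors (indicator names are hypothetical)
  f1 BY y1-y3;
  f2 BY y4-y6;
  f3 BY y7-y9;
  f4 BY y10-y12;
  ! second-order factor behind the four correlated factors
  g BY f1 f2 f3 f4;
  ! regress the outcome on the second-order factor only
  dv ON g;
```

With this setup the first-order factors no longer enter the regression as separate, highly correlated predictors; their shared variance is carried by g.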
Anonymous posted on Monday, November 22, 2004 - 12:18 am
I just have a question about the correlation between factors in a confirmatory factor analysis. I wonder if a rotation is used when doing a CFA. If yes, is it promax? If no, how are the correlations between factors calculated? Another question: if the observed variables are categorical, does it make a difference to the correlation between factors?
Note that this is only a correlation if the metric of the factor is set by fixing the factor variances to one. If the metric of the factors is fixed by setting one factor loading to one, then this is a covariance.
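To make the distinction concrete, here is a hedged sketch of the two ways of setting the metric. The factor and indicator names (f1, f2, y1-y6) are hypothetical, and only the MODEL command is shown.

```mplus
! Alternative A: free the first loadings and fix the factor variances to one.
! The f1 WITH f2 estimate is then a correlation.
MODEL:
  f1 BY y1* y2 y3;
  f2 BY y4* y5 y6;
  f1@1 f2@1;
  f1 WITH f2;

! Alternative B (use instead of A, not together): the Mplus default,
! where the first loading of each factor is fixed to one.
! The f1 WITH f2 estimate is then a covariance.
! MODEL:
!   f1 BY y1 y2 y3;
!   f2 BY y4 y5 y6;
!   f1 WITH f2;
```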
yufang posted on Thursday, October 06, 2005 - 5:39 pm
Does anyone know of references that recommend how large a correlation between factors must be to be considered moderate or high? And at which point should we check for multicollinearity?
bmuthen posted on Saturday, October 08, 2005 - 5:55 pm
I ran a CFA and got some correlation coefficients among the factors with absolute values greater than 1. Thanks to Linda for telling me that this means the corresponding factors are not statistically distinguishable. Now I have another question: how can the correlation coefficients among the factors have absolute values greater than 1? Are we using some different formula to calculate these coefficients? Shouldn't the correlation coefficients range from -1 to 1? Thanks a lot.
Correlations are not calculated using a formula. The correlations are estimated as part of the model. When variables correlate one, model estimation is thrown off and values greater than one can occur. This is why the results are inadmissible in this case.
yshing posted on Tuesday, November 14, 2006 - 3:53 pm
I'm running a multiple-group CFA with 2 factors. For theoretical reasons I don't want to fix the variances of the latent constructs to one. For one of my groups I would like to fix the correlation of the two latent constructs to 1. I understand that a nonlinear constraint is needed in the model. How can I do this?
I am trying to run a confirmatory factor analysis with categorical variables only:
MODEL:
f1 BY fte1*0.600 fte2*0.526 fte6*0.745 fte7*0.654 fte8*0.690 fte10*0.639
fte11*0.655 ftt16*0.409 ftt17*0.460 ftt18*0.463 ftf18*0.408;
f2 BY ftt19*0.405 ftf1*0.745 ftf2*0.749 ftf3*0.628 ftf8*0.472 ftf9*0.428
ftf13*0.484 ftf17*0.402;
f1@1 f2@1;
f1 WITH f2@0;
Under this model I would assume that the factors have variance 1 and are uncorrelated. However, the saved factor scores are still correlated and have a variance unequal to one (around 0.5). Do I have to specify more?
Factor scores are not identical to the factors in the estimated model. Deviations can occur when the factors do not have high factor determinacies. This is one reason that it is better to work with factors in a simultaneous model rather than work with factor scores.
So, what do I save with the scores? The deviations are pretty large for a two-factor model. Could it be because the model fits poorly (CFI 0.171, TLI 0.202, RMSEA 0.243)? I wanted to test a 2-factor model before I go to the full 9-factor model. Can I check somewhere how many iterations the CFA needed? Maybe the model has not converged; it took some minutes to finish, although estimating each factor separately was very quick. Since we want to estimate an SEM or a multilevel model with these factors in another program, I need reliable factor scores, if possible.
You can do this in two ways. You can set the metric of the factors by setting the factor variances to one. Then you obtain correlations among the factors rather than covariances. Or you can use MODEL CONSTRAINT to create a correlation from the covariance.
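The second option could look roughly like the sketch below, which keeps the default metric (first loading fixed to one) and derives the correlation as a new parameter. The indicator names y1-y6 and the labels v1, v2, cov12 and corr12 are hypothetical names chosen for illustration; only the MODEL and MODEL CONSTRAINT commands are shown.

```mplus
MODEL:
  f1 BY y1 y2 y3;      ! first loading fixed to one by default
  f2 BY y4 y5 y6;
  f1 (v1);             ! label the factor variances
  f2 (v2);
  f1 WITH f2 (cov12);  ! label the factor covariance
MODEL CONSTRAINT:
  NEW(corr12);         ! new parameter holding the correlation
  corr12 = cov12 / SQRT(v1*v2);
```

The estimate and standard error of corr12 are then reported with the new/additional parameters in the output.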
I would be very grateful for your advice: 1) Assuming I do a CFA and want to test the orthogonal model, would "f1 WITH f2@0" be the correct syntax to use? 2) Is it possible to obtain CFI, TLI, RMSEA, WRMR values for the independence model (standard control in CFA), and what would the relevant syntax be? (I have 23 indicators that are ordinal/categorical data and 2 latent variables.) I tried MODEL: f1 BY v1 f2 BY v2 ... f23 BY v23 but it obviously didn't work.
Thanks very much. Unfortunately, I cannot see any CFI/TLI/RMSEA/WRMR values for the baseline model in the output. I can see the relevant chi-square, df and p value though. Where should I look for the CFI/TLI/RMSEA/WRMR of the baseline model? many thanks Ioanna
where p is the number of variables. This will then get you a chi-2 test saying how well (probably how very badly) the baseline model fits relative to an unrestricted model. Again, CFI is not relevant since both the H1 and H0 models are the baseline model.
Kihan Kim posted on Thursday, January 29, 2009 - 4:11 am
Dear Dr. Muthen,
I'm testing a two-factor CFA model with 7 items (4 items for F1, and 3 items for F2).
I wanted to perform a chi-square difference test between (M1) two-factor CFA (factors are allowed to correlate), and (M2) two-factor CFA with inter-factor correlation fixed at 1.
When I run M2, I keep receiving the following convergence problem. I looked over the User Manual regarding "Convergence Problems," and am still not sure what I should try. Could you help me resolve this problem?
NO CONVERGENCE. SERIOUS PROBLEMS IN ITERATIONS. ESTIMATED COVARIANCE MATRIX NON-INVERTIBLE. CHECK YOUR STARTING VALUES.
Unless you have set the metric of the factor by freeing all factor loadings and fixing the factor variance to one, you are fixing the covariance to one not the correlation.
Fixing a parameter to an incorrect value may cause convergence problems. If you want to test if a covariance or correlation is equal to one, use MODEL TEST. See the user's guide for further information.
Kihan Kim posted on Friday, January 30, 2009 - 3:04 pm
Thank you for your answer. I was able to fix the inter-factor correlation to 1 by freeing all factor loadings and fixing the factor variance to one.
I was also trying to use MODEL TEST, but I'm still not clear how to use it. Could you suggest a command so that I can set the inter-factor correlation to 1 using MODEL TEST command for the following MODEL command of 2-factor CFA?
Model: f1 by id1 id2 id3 id4; f2 by fit1 fit2 fit3;
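One way the MODEL TEST suggestion might be applied to this model is sketched below. It assumes the metric is set by fixing the factor variances to one, so that the labeled WITH parameter is a correlation; the label r is an arbitrary name chosen for illustration.

```mplus
MODEL:
  f1 BY id1* id2 id3 id4;   ! free the first loading of each factor
  f2 BY fit1* fit2 fit3;
  f1@1 f2@1;                ! factor variances fixed to one
  f1 WITH f2 (r);           ! r is now the factor correlation
MODEL TEST:
  0 = r - 1;                ! Wald test of correlation = 1
```

Here the correlation stays free in the estimated model and MODEL TEST reports a Wald chi-square test of the restriction, rather than fixing the parameter as in M2.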
You can't get these with TYPE=EFA but you can get them with the new EFA using the MODEL command. See the Version 5.1 Examples and Language Addendums on the website with the Mplus User's Guide.
Tracy Witte posted on Thursday, March 04, 2010 - 3:57 pm
I am running a CFA with the WLSMV estimator. Similar to the original person on this thread, the correlation between my factors is very high (in my case, approximately .92). However, when I use the difftest procedure to compare the two factor to the one-factor model, the results show that the fit of the model is significantly worsened when I specify a one-factor model. Is my next step to determine if the factors have differential predictive validity? It seems unlikely with such a high correlation between the two of them. From my reading of the above thread, it looks like I should specify a higher-order factor and regress the DV onto lower-order factors. Is this correct? (thank you!)
I want to confirm I am interpreting my output correctly, as I am unsure exactly how the unstandardized vs. STDYX output is calculated within M+.
Within a larger model, I have multicollinearity between two latent predictors (Parental Support and Peer Support) of the latent outcome variable - educational expectations.
The issue I have is that the standard errors (SEs) associated with these direct effects are much much larger in the unstandardized output than in the standardized STDYX output.
I know inflated SEs are a sign or symptom of multicollinearity and have addressed this by constraining the paths of peer and parental support --> educational expectations as equal. (I'll spare the details).
Even after doing so, why are the SEs associated with these two direct effects so much larger in the unstandardized section of the output than in the STDYX output?
Hi, I would like to test whether factor correlations are equal to each other in CFA, and I have defined in Mplus: F1 with F2 (1); F2 with F3 (1); F3 with F4 (1); F5 with F6 (1); I got the output where all covariances are 0.044, but correlations from STDYX standardization range from 0.338 to 0.512.
LR test from EQS: Chi^2(3)=6.89, P=.07; LR test from Mplus: Chi^2(3)=7.80, P=.05
Have I performed the right test in Mplus, if I want to replicate the results from EQS? Thanks in advance.
The difference in results between Mplus and EQS is likely due to Mplus using n and EQS using n-1 for the chi-square computation. This difference shows up for small samples which I assume you have.
Delforterie posted on Friday, March 18, 2011 - 11:30 am
In this post I read that when the correlation between factors is greater than 1, this means the corresponding factors are not statistically distinguishable. But does it also mean that the correlation is actually 1.00, and the model is simply estimating a somewhat higher correlation?
Additionally, does this mean that when I have 2 factors with a correlation of 1.16, I can assume a one-factor model fits the data better? When I did a Chi-square difference test, it indicated that the two factor model was better.
A correlation estimate of 1 means that the factors are indistinguishable. A correlation estimate higher than 1 means that the model does not make sense for the data because correlations should not be higher than 1. So even if chi-square says that two factors fit better, you should not choose that model. Instead, another model should be explored.
Hi, I am new to Mplus and I am still learning the syntax to run the analysis. I am running a five-factor CFA model and wanted to remove non-significant correlations between 2 latent factors. Could you please tell me how this is done in Mplus? Thanks for your help.
I would not fix them to zero. I would leave them as is.
Xiaolu Zhou posted on Tuesday, November 08, 2011 - 7:22 pm
I am a new user. Could you help me with my CFA syntax? My data is binary data. I am not sure if my syntax is correct to check the correlation between the 2 factors with this syntax. My syntax is: TITLE: c scale
VARIABLE: NAMES ARE country c16 c17 c18 c19 c20 c21 c22 c23 c24 c25 c26 c27 c28 c29 c30;
For the output part, I have two questions: 1. Where can I find the correlation between the two factors? 2. For the model fit, if chi-square, CFI and RMSEA are good and only TLI is less than .95, can we still call the model fit acceptable? Many thanks!
The correlation is in the results under b WITH v. If most fit statistics show good fit, that should be acceptable.
Xiaolu Zhou posted on Wednesday, February 01, 2012 - 4:19 pm
I have another question about the CFA of binary data: I found some non-significant thresholds. What do these mean? Do they matter to my model? If they matter, what should I do with them? Thanks a lot!
Thresholds are used in the computation of probabilities and to test for measurement invariance. I would not be concerned with their significance.
Sarah posted on Friday, October 05, 2012 - 6:49 pm
I have a quick and probably very easy question.
In my SEM model I wish to obtain the correlations between my latent factors. I used the "tech4" option which provides such correlations. But how do I know if the correlations are significant? Put differently how can I obtain the level of significance of the correlations?
Please guide me. I have used a second-order factor model, where
F1 BY y1 y2 y3;
F2 BY y4 y5 y6;
F3 BY y7 y8 y9;
F4 BY F1 F2 F3;
F5 BY x1 x3 x4 x5;
F6 BY v1 v2 v3 v4;
F5 ON F4;
F6 ON F5;
MODEL INDIRECT: F6 IND F5 F4;
Fit indices are CFI=.912, TLI=.90, RMSEA=.038, SRMR=.05
But the problem is that F4, which comprises three factors, shows a non-significant relationship with F2.
I have checked the correlation of the scale with each subscale and get highly significant correlations. No doubt structural equation modeling deals better with measurement error and considers residual covariances as well, but please tell me whether there is any way out of this problem.
F1 has a standardized estimate of .832 with F4.
F2 has a non-significant correlation of -.022 with F4; its residual variance estimate is 1 and significant.
F3 has a standardized estimate of .899.
Now, how should I report it, given that one factor is non-significant but the model fit indices are good enough?
Dear Professors, I am trying to check whether there is multicollinearity with my two highly correlated factors (.86) in a regression. As suggested, I simply regressed both factors on a second-order factor. I obtained VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE, where my second-order factor and a DV had a correlation greater than 1. Is it an indicator of multicollinearity? All best, Hugo
Yes, but make sure you don't change your SEM model by adding these correlations. You find latent variable correlations in TECH4. If you want correlations between latent and observed variables not included in TECH4 you have to put a factor behind the observed variables.
Typically, however, you report the model parameters, not these correlations.
I have a simple question about including multiple correlation (WITH) statements in a model. I have a latent factor that I am then correlating with the five domains of personality modeled as separate indicators. I have run five separate models with one personality domain correlated with the latent factor at a time, but I want to correlate all 5 of them concurrently with the latent factor. This runs fine and the values are similar to those acquired when I run the separate models. My question is regarding the concurrent model- are these correlations independent of each other or will they be controlling for the other indicators that are being correlated with the latent factor at the same time?
I am comparing 1- and 2-factor models for factor mixture analysis. In the 2-factor model, the factor correlation turned out to be 1. I suspect this means factors 1 and 2 are not statistically distinguishable, and the 2-factor model should not be considered.
However, AIC and BIC are smaller in the 2-factor model than in the 1-factor model (AIC, 10111 vs 10012; BIC, 10273 vs 10189). Does this mean that the 2-factor model is the better model for my data? Is there anything I could do to improve the model?
Just to clarify, here is the input statement for 2 factor model. Thanks a lot for your advice,
--INPUT----
TITLE: LCA-CFA
DATA: FILE = '35itemN888.dat';
VARIABLE: NAMES = u1-u35;
USEVARIABLES = u14-u17 u30-u32;
CATEGORICAL = u14-u17 u30-u32;
CLASSES = C(4);
ANALYSIS: TYPE IS MIXTURE;
ALGORITHM=INTEGRATION;
MODEL:
%OVERALL%
f1 by u14-u17;
f2 by u30-u32;
OUTPUT: STANDARDIZED;
------
----OUTPUT----
STANDARDIZED MODEL RESULTS
              Estimate    S.E.   Est./S.E.   P-Value
F2 WITH
  F1             1.000   0.000   *********     0.000
-------
You could check which factor loadings are big and for pairs of items with large loadings you can replace the factor with WITH statements for those pairs of items. Use Parameterization=Rescov. See the new article on our website:
Asparouhov, T. & Muthen, B. (2015). Residual associations in latent class and latent transition analysis. Structural Equation Modeling: A Multidisciplinary Journal, 22:2, 169-177, DOI: 10.1080/10705511.2014.935844
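A minimal sketch of what that suggestion might look like for the input posted above, with the factors dropped in favor of residual associations between item pairs. The two pairs given WITH statements here are hypothetical; in practice they would be chosen from the items with large loadings. The exact requirements of PARAMETERIZATION=RESCOV are an assumption in this sketch and should be checked against the user's guide and the article.

```mplus
TITLE: LCA with residual associations (sketch)
DATA: FILE = '35itemN888.dat';
VARIABLE:
  NAMES = u1-u35;
  USEVARIABLES = u14-u17 u30-u32;
  CATEGORICAL = u14-u17 u30-u32;
  CLASSES = C(4);
ANALYSIS:
  TYPE = MIXTURE;
  PARAMETERIZATION = RESCOV;
MODEL:
  %OVERALL%
  ! hypothetical item pairs; replace with the pairs that showed large loadings
  u14 WITH u15;
  u30 WITH u31;
```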