Hello. I am conducting a CFA (MLR estimator) with three continuous indicators across 4 groups. For one of my groups, I get one non-significant, negative residual variance (near zero, with zero inside the 95% CI) when my intercepts are freely estimated and the factor mean is set to 0 for all groups (my Model 1). I could set the residual variance to 0, but then I get a standardized factor loading and R-square of 1.000, which I do not consider "useful" information. If I re-run Model 1 for just that group, I get the same results.
But, when I constrain factor loadings to be equal across groups (free intercepts, factor mean at 0) for my Model 2, I get a positive residual variance for that variable and group (.609), which is a value still within the original 95% CI of Model 1. To re-run Model 1, might I set the residual variance for that variable for that group to .609 instead of zero so that I might get "usable" estimates?
bmuthen posted on Saturday, February 26, 2005 - 4:54 pm
If the model with invariant loadings fits well, you could make an argument that what you propose is reasonable - you are borrowing information from the other groups to get a better estimate for that residual variance. It is a bit ad hoc, however, since the fixed value of .609 has sampling variability so the resulting SEs of the model might need to be taken with a grain of salt (i.e. work with conservative tests of parameter significance).
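A minimal sketch of how the proposal above might be written as Mplus input. All names are hypothetical and not from the original post: y1-y3 for the indicators, grp for the grouping variable, g1-g4 for the group labels, example.dat for the data file, and y2 in group g3 as the indicator with the problem residual variance.

```mplus
! Hypothetical names throughout; only the value .609 comes from the post.
DATA:      FILE IS example.dat;
VARIABLE:  NAMES ARE y1-y3 grp;
           USEVARIABLES ARE y1-y3;
           GROUPING IS grp (1 = g1  2 = g2  3 = g3  4 = g4);
ANALYSIS:  ESTIMATOR = MLR;
MODEL:     f BY y1-y3;
           [f@0];          ! factor mean fixed at 0 in every group
MODEL g2:  f BY y2 y3;     ! mentioning the loadings frees them in this group
           [y1-y3];        ! same for the intercepts
MODEL g3:  f BY y2 y3;
           [y1-y3];
           y2@0.609;       ! residual variance fixed at the Model 2 estimate
MODEL g4:  f BY y2 y3;
           [y1-y3];
```

Because .609 is itself an estimate, the standard errors from a run like this would be expected to be somewhat too small, which is the caveat raised in the reply.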
rpaxton posted on Saturday, February 18, 2006 - 9:06 pm
I am trying to confirm the factor structure of 2 second-order CFAs with 5 factors each. For some reason the residual variance for one of the factors is negative. Would you recommend deleting that factor? What steps should be taken to handle this situation?
bmuthen posted on Sunday, February 19, 2006 - 2:45 pm
You could fix that residual variance at zero. That would mean that that first-order factor is a perfect indicator of the second-order factor - this happens in some instances.
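As a rough illustration, fixing the residual variance of a first-order factor at zero could look like the sketch below. The names (y1-y9, f1-f3, g) are hypothetical, and f2 is assumed to be the factor with the negative estimate.

```mplus
! Hypothetical second-order model; f2 is assumed to be the problem factor.
MODEL:  f1 BY y1-y3;
        f2 BY y4-y6;
        f3 BY y7-y9;
        g  BY f1-f3;
        f2@0;      ! residual variance of f2 fixed at zero
```

Because f2 is a dependent variable here (it is an indicator of g), naming it on its own line refers to its residual variance, not its total variance.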
rpaxton posted on Sunday, February 19, 2006 - 4:27 pm
How would I fix the residual variance to zero? This is my model statement:
f1 by var2-var3;
. . .
F10 by var27-var30;
exper by f1-f5;
behav by f6-f10;
!...............................
Should I just say F9@0 below the final statement? Thanks.
bmuthen posted on Sunday, February 19, 2006 - 4:42 pm
Valeriana posted on Tuesday, March 21, 2006 - 7:41 am
Hi, I'm trying to use a CFA model to test convergent and discriminant validity. However, almost all the residual variances are non-significant. If I fix them at zero, indexes such as "composite reliability" or "average variance extracted" or any other reliability index will be inflated. What should I do? Thanks.
After looking through various discussion boards, I figured out how to fix my problem of having a negative residual variance for my variable dep3. I just added a line dep3@0 to the model command:
VARIABLE: NAMES ARE ...;
  MISSING = ALL (99);
  USEVAR = dep1 dep2 dep3 critsen1 critgf1 das1;
ANALYSIS: TYPE = MEANSTRUCTURE MISSING;
MODEL: i s | dep1@1 dep2@2 dep3@3;
  i s ON critsen1 critgf1 das1;
  dep3@0;
OUTPUT: TECH4 SAMPSTAT STANDARDIZED MODINDICES (3.84);
I am pleased to have figured out how to fix the problem. However, I do not fully understand why setting that residual variance to zero allowed the model to run. Can you offer me any help in understanding this at a more applied level? Thanks!
Fixing a negative residual variance is done if the residual variance is a small negative value and not significant. Otherwise, the model should be changed.
The reason that this is a problem is that variances cannot be negative by definition.
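One way to check the "small and not significant" condition before fixing anything is to look at the confidence interval of the estimate. The sketch below assumes a hypothetical one-factor model with indicators y1-y4, where y3 is the variable with the negative residual variance.

```mplus
! Step 1: estimate the residual variance freely and request confidence intervals.
MODEL:   f BY y1-y4;
OUTPUT:  CINTERVAL;

! Step 2: only if the estimate for y3 is close to zero and its interval
! covers zero, rerun with the residual variance fixed:
! MODEL:   f BY y1-y4;
!          y3@0;
```

If the interval lies clearly below zero, the advice in the reply above is to change the model instead of fixing the parameter.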
Suzanne Jak posted on Tuesday, April 22, 2008 - 1:09 am
I'm fitting 2 first-order factors and 1 second-order factor to 7 continuous observed variables. Using scaling in lambda, the residual variance of the first factor is negative. My model runs well when I fix the variance of this factor to 1 and keep the factor loading of the first factor at 1.
Q1: Is this bad practice? Q2: I thought it is useless to have 1 indicator for a factor, so I regressed the variable 'kub' directly on the second-order factor. Is this ok?
This is my input:
Title: factor model with weighting, sums on 2 factors, f1 fixed at 1
Data: FILE IS schaal.dat;
Variable: NAMES ARE ana cijf fig kub som syl voc w; WEIGHT IS w;
Analysis: ESTIMATOR = MLR;
Model: f1 BY fig ana syl cijf som;
  f1@1;
  f2 BY som voc;
  f3 BY f1 f2 kub;
You should not set the metric of the factor by both fixing a factor loading to one and fixing the factor variance to one. If you relax one of these restrictions, the model is not identified. You need a minimum of three first-order factors for the model to be identified without making perhaps unrealistic restrictions on the model.
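The two usual ways of setting the metric of a factor can be sketched as follows, using a hypothetical stand-alone factor f with indicators y1-y5 (not the poster's model). Only one of the two would be used at a time.

```mplus
! Option 1 (Mplus default): first loading fixed at 1, factor variance free.
MODEL:  f BY y1-y5;

! Option 2: all loadings free, factor variance fixed at 1.
! The asterisk frees the first loading, which is otherwise fixed at 1.
! MODEL:  f BY y1* y2-y5;
!         f@1;
```

Applying both restrictions at once, as in the input above, imposes one constraint more than is needed to set the scale.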
JPower posted on Tuesday, January 20, 2009 - 9:28 am
Hello, I'm conducting a CFA of a four-factor scale with ordinal indicators using WLSMV estimation. One of the factors has only 2 indicators, and for these indicators there is limited variability in responses:
            item s   item o
Category 1  0.969    0.975
Category 2  0.026    0.016
Category 3  0.004    0.006
Category 4  0.001    0.002
My fit statistics are reasonable (CFI, TLI >0.95, RMSEA, SRMR = 0.07). However, I get a warning about theta not being positive definite and there is a negative residual variance for item s (-0.003). Would dropping this factor be reasonable given the limited variability in responses and the negative residual variance? What would you suggest as next steps? Thanks.
In my opinion a factor with two indicators is not generally believable given that it is not identified without borrowing from other parts of the model. In your case, I would use one of the factor indicators as an observed variable in the model.
I am trying to establish measurement invariance in a measurement model before proceeding to a multi-group structural model. There are two latent variables and two groups.
When I ran it constraining the factor loadings to be equal, it ran fine, but when I free the factor loadings for the unconstrained model, I get one negative residual variance in one group (leading to the "not positive definite" error message). The intercepts are also freed. The fit indices for the unconstrained model are chi-square = 15.899, df = 12, CFI = .993, RMSEA = 0.043, SRMR = 0.033. Can these results be interpreted with a negative residual variance?
You should first find that the factor model fits well in each group before proceeding to test for measurement invariance. It sounds like this is not the case. You might want to start with an EFA in each group to establish that the same number of factors is found in each group and then proceed to a CFA in each group.
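A single-group CFA of this kind could be set up by selecting one group at a time, roughly as sketched below. The names are hypothetical: y1-y6 for the indicators, grp for the group variable coded 1 and 2, and example.dat for the data file.

```mplus
! Hypothetical names; rerun with grp EQ 2 for the second group.
DATA:      FILE IS example.dat;
VARIABLE:  NAMES ARE y1-y6 grp;
           USEVARIABLES ARE y1-y6;
           USEOBSERVATIONS ARE grp EQ 1;
MODEL:     f1 BY y1-y3;
           f2 BY y4-y6;
```

If a negative residual variance already shows up in one of these separate runs, the problem lies in that group's factor model and not in the invariance constraints.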
I have a 6-factor CFA model with 13 indicators. In the first 6-factor model I fit, one indicator had a negative residual variance that was very small (.05) and non-significant, which I fixed at zero without any theoretical rationale, just figuring that it was small enough to treat as zero and proceed with this good-fitting model. After testing the first, proposed model against several others with slight, theoretically informed adjustments, a new model with one indicator loading on two factors shows a better fit. When I removed the zero residual variance constraint from the one indicator, the actual variance in the newer model is more negative (@.11). If having this model constraint was justified in the first place, would it still be justified in the newer model with the increase in the negative residual variance?
It sounds to me like you should start with an EFA of your factor indicators to see if the items are behaving as expected. Making adjustments to a CFA model without previously doing an EFA to study the items can result in a misspecified model. A factor with two indicators is not identified without borrowing information from other parts of the model. I would hesitate to use such a factor.
What do you mean by misspecified? I understand how a factor with two indicators is not identified, but if the model as a whole is identified, what does misspecification mean?
Here is where my question is coming from: I proposed that a certain factor structure would arise via principal components or EFA, but it was strongly suggested that I use CFA. Looking at it both ways, 6 factors will not converge in an EFA, but when I run a CFA with the proposed structure the model is a good fit with the exception of the small negative residual variance. So should I take the non-convergence in EFA as a strong hint that 6 factors are not appropriate at all, or is there a possibility that a confirmatory model with 6 factors is still feasible? I was hoping to move to factor mixture analysis with this CFA, but that may not be a good idea either...
Thanks again - I really appreciate your assistance. matt
A misspecified model is a model that does not correctly represent the data, which I think you know. Beyond that I am saying that often CFA models are proposed based on theory and estimated using data that may not well measure the constructs represented in the theory. An EFA can often help in seeing this.
If you have to modify the CFA by, for example, fixing residual variances to zero, this may point to a problem with the model. If a 6-factor CFA fits the data well but an EFA will not converge, the CFA may be a fragile model that will not be replicated with other data.
Note that factor mixture analysis usually has fewer factors than a regular CFA.
I now have a question with regard to negative residual variances. This is the model I have:
PHYSIO BY haz4 waz4 baz4 haemo2;
MENTAL BY WJ3 WJ2 WJ5 stpea;
MOTOR BY carty_1a carty_3a carty_5a car2b car6b car4b;
MOTOR ON age;
comp1 by CD1 CD12 CD2 CD4 CD6;
comp2 by CD5 CD7 CD8 CD9 CD18 CD19;
comp3 by CD3 CD11 CD13 CD14 CD15 CD16 CD17 CD20 CD25;
comp4 by CD21 CD22 CD23 CD26 CD24;
comp1 on sex;
comp2 on sex;
comp3 on sex;
comp4 on sex;
motor on sex;
mental on sex;
SUBJ BY comp1 comp2 comp3 comp4;
OBJ BY PHYSIO MENTAL MOTOR;
SUBJ on ANIO_INC TIPO_LOC LOCAL;
OBJ on ANIO_INC TIPO_LOC LOCAL;
All of the indicators are categorical, except the ones in MENTAL.
I am getting negative residual variance for WAZ4. I have tried changing the model, specifically the PHYSIO part by: (a) splitting it into 2 latent variables, but I get theta not positive definite involving waz4, and psi not positive definite involving one of the new latent constructs. (b) taking out waz4 from the model. With this, I get no error messages but the loading of baz4 changes from being significant to non-significant.
Following the recommendations I read in the forum, I conducted EFAs beforehand, for each of the latent constructs and then for all the indicators.
The EFA for PHYSIO comes out with the following: The maximum number of factors is set to 1. So the 4 indicators come out in 1 factor where WAZ4 bears a considerably high loading.
WAZ4    3.856
HAZ4    0.192
BAZ4    0.172
HAEMO2  0.023
For this, Chi-Sq = 12.587 (df=2), CFI=.997, TLI=.994, RMSEA=.053.
When I run EFA for all the indicators, haz4 and haemo2 share their highest loadings with indicators of MENTAL. However, haz4=height-for-age and haemo2=haemoglobin concentration and the indicators for MENTAL are results of memory/cognitive tests. This makes me think that -following theory- I should still keep them under PHYSIO.
When I did the CFA without controls, I got the following: Chi-Sq= 1175.263(df=260) CFI=.918, TLI=.937, RMSEA=.043 and PSI not positive definite for MENTAL.
Does all of this mean the model is seriously misspecified?
If in EFA certain variables do not load on the expected factors, then their validity is questionable. If you force them on the expected factors in CFA, the model will not fit well. I think you need to consider the validity of the items that you are using.
I have a one-factor CFA with 4 indicators. The error variance of one indicator is negative but very small and non-significant. I know this is not what is supposed to be. But everything else looks good, though with a warning message. And I like the results. Is it appropriate that I just leave the results there and proceed to explain them with some reasonable arguments? Thanks for your attention.
Hello. We’re carrying out an EFA followed by CFA with covariates. There are 12 dependent variables and they are categorical (binary and ordinal). Based on EFA results, it appears that there are 4 factors, but the 4th factor only has two items and so was modeled as a correlation between those two items.
When I try to add a binary covariate to the model I run into problems. First, there is a negative residual variance for one of the items (avoidanc). I looked and the residual variance is very close to zero. But, when I set that residual variance to zero (using theta parameterization), I get an unidentified model. If I remove the culprit item (avoidanc) from the model it runs without any errors. However, the avoidanc item, which is problematic, is also theoretically important and should remain in the model. So, I’m not sure how to proceed.
Here is what the MPLUS model statement looks like:
MODEL:
tbi BY dizzy@1 headache irritabl sleep memory visual;
ptsd BY nightmar@1 avoidanc onguard detached irritabl;
dep BY LittlInt@1 depress detached sleep;
onguard with sleep;
tbi ptsd dep on AUDC;
avoidanc@0;
SY Khan posted on Sunday, March 02, 2014 - 10:09 am
Hi Dr. Muthen,
My EFA results show that there are 4 factors for binary observed variables. When I conduct a CFA on just one of my factors (which has 3 items), I get a message that the covariance matrix is not positive definite.
I tried fixing the starting values of the problem variable (JOBDSCRT) with the following three commands (two of which don't work):
You should focus on why your CFA gives a negative residual variance and change the model. It seems that you did not translate the EFA into an appropriate CFA.
SY Khan posted on Monday, March 03, 2014 - 2:41 am
Hi Dr. Muthen,
Thanks for your prompt reply. Sorry, I did not explain myself clearly above.
The EFA suggested 4 factors and the CFA confirmed those 4 factors. But when I ran individual CFAs on each of the four factors separately, I got a negative residual variance for an item on one of the factors, i.e. AUTOJD, which has three items: JOBVARTY, JOBDSCRT, JOBCTRL.
The negative variance for JOBDSCRT: JOBDSCRT Undefined 0.11371E+01 -0.137
I have read in posts above that when the negative variance is not significant, it can be fixed to zero. So I have done the following:
It sounds like your model is very fragile if separating the factors reveals this problem. Fixing a residual variance to zero is for continuous variables only. This cannot be used with categorical variables. It should also be used only for small non-significant values. I don't believe -.137 falls into this category.
In estimating the factors together, you draw on information from other parts of the model which you do not do when you estimate each factor separately. You can consider using the four-factor model if it fits well. Given these issues, I suspect it does not.
The error terms are the same because you have constrained them to be equal by placing (1) behind each one of them.
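The effect of the (1) labels can be illustrated with a small sketch. The names f and y1-y3 are hypothetical and the variables are assumed to be continuous; the poster's own statements are not shown in the thread.

```mplus
! Same number in parentheses after each parameter: the three residual
! variances are constrained to be equal.
MODEL:  f BY y1-y3;
        y1 (1);
        y2 (1);
        y3 (1);

! Without the labels, each residual variance is estimated separately:
! MODEL:  f BY y1-y3;
!         y1 y2 y3;
```

Removing the number in parentheses, or giving each parameter a different number, lifts the equality constraint.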
SY Khan posted on Tuesday, March 04, 2014 - 6:41 am
Thanks very much for the explanation. After reading your reply I thought that the solution would be to drop JOBDSCRT (binary variable).
But when I tried PARAMETERIZATION=THETA, it didn't give a negative residual variance. However, the overall fit indices were slightly worse (they were better with DELTA).
With THETA, the separate CFA for AUTOJD BY JOBVARTY JOBDSCRT JOBCTRL also runs.
Please advise if:
1- Can I proceed with THETA parameterization? If yes, do I need to use THETA parameterization in all the subsequent CFAs and SEM analyses, or is it OK to change back to DELTA where it works without a problem?
2- What would be the impact on the quality and legitimacy of the results if I did not use the same parameterization consistently?
3- I am asking this question because my independent variables are binary (4 factors, of which one is AUTOJD). The other three independent-variable factors work fine with DELTA parameterization.
My intermediate and outcome variables are CATEGORICAL(for CFA). But I run SEM with Latent variables (of binary items) and aggregated variables which are treated as continuous observed variables in SEM. These aggregated variables are generated by adding items identified through CFA of categorical items.
Sorry for the lengthy question and many thanks for your guidance.
SY Khan posted on Tuesday, March 04, 2014 - 8:50 am
Hi, just to add a clarification to the above: I simply changed to PARAMETERIZATION=THETA without constraining the model in any other way or giving new starting values, and it worked.
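For reference, the change described here would amount to something like the sketch below. The factor and item names are taken from the posts above; the WLSMV estimator is an assumption, since the poster's ANALYSIS command is not shown, and the rest of the input is omitted.

```mplus
! Assumed setup: binary items declared as categorical, WLSMV estimation.
VARIABLE:  CATEGORICAL ARE JOBVARTY JOBDSCRT JOBCTRL;
ANALYSIS:  ESTIMATOR = WLSMV;
           PARAMETERIZATION = THETA;   ! default is DELTA
MODEL:     AUTOJD BY JOBVARTY JOBDSCRT JOBCTRL;
```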
I know that a negative residual variance makes the results inadmissible. But I read an article [Chen, F., Bollen, K. A., Paxton, P., Curran, P., & Kirby, J. (2001). Improper solutions in structural equation models: Causes, consequences, and strategies. Sociological Methods & Research, 29, 468-508.] that describes different tests to check the significance of a negative residual variance, and in the Mplus output there isn't any information about which test Mplus is using.
Thanks again - I really appreciate your assistance!!!
But I am not sure if it is OK to fix both the factor loadings and factor variance to 1. Fixing the factor variance to zero doesn't seem to solve the problem. I still get the same error message, which makes even less sense.
Sounds like f1-f4 have some negative correlations among them and that is picked up by the f variance.
Jane Doe posted on Sunday, January 24, 2016 - 11:30 am
Thanks a lot. This makes sense. And yes there is a negative correlation between two of the factors. Allowing for that fixes the issue.
Jack Johnny posted on Sunday, April 17, 2016 - 12:08 pm
I am running an SEM in which one of the latent factors, "F2", has a negative residual variance according to the warning message. When I followed your advice above and fixed it at 0, I received the following message:
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 64, F3 ON F2
THE CONDITION NUMBER IS -0.227D-16.
The original identified model can't be identified now. May I know why this happened? What should I do then?
I don't quite understand what you meant by re-specifying the model. The model is based on a theoretical framework that I want to test. If I re-specify it, then the intended hypotheses will be changed. I think the code (see below) which I used is OK, according to the Mplus manual. Is it because I used pseudo data (borrowed from another study) for the variables tested in my model that the negative residual variance occurred?
The commands are as follows:
variable: Names are EP1-EP4 ER1-ER4 AG1-AG18 EN1-EN10 AC;
  usevariables are EP1-AG3 EN1-EN10 AC;
analysis: estimator is MLM;
MODEL: F1 by EP1-EP4 ER1-ER4;
  F2 by AG1-AG3;
  F3 by EN1-EN10;
  AC ON F3 F2 F1;
  F3 ON F2 F1;
  F2 ON F1;
Run only the BY statements as a first step to see if your measurement model fits. These data may not be valid measures of the constructs the theory is based on, and the model may therefore not be correct for these data.
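Applied to the input posted above, a measurement-model-only run might look like the sketch below. Dropping AC from the USEVARIABLES list is an assumption made here because AC no longer appears in the MODEL command; the DATA command is omitted because it was not posted.

```mplus
! Measurement model only: the ON statements are removed.
VARIABLE:  NAMES ARE EP1-EP4 ER1-ER4 AG1-AG18 EN1-EN10 AC;
           USEVARIABLES ARE EP1-EP4 ER1-ER4 AG1-AG3 EN1-EN10;
ANALYSIS:  ESTIMATOR IS MLM;
MODEL:     F1 BY EP1-EP4 ER1-ER4;
           F2 BY AG1-AG3;
           F3 BY EN1-EN10;
```

In this run the factors are simply correlated, so any remaining inadmissible estimate comes from the measurement part and not from the structural paths.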
I ran the model with only the BY statements and received the same warning of the negative residual variance for "F2". Moreover, I also ran a BY statement with only the latent factor "F2". As it has only three indicators, I first constrained the two residual variances of AG2 and AG3 to be equal. The model was identified without the warning message, though the TLI index was negative. Then I tried to constrain the two factor loadings of AG2 and AG3 to be equal, which resulted in a warning message that stated a negative residual variance for AG1.
Based on the above info, can I say all the warnings regarding the negative residual variance are caused by the invalid data?
Hello Dr. Linda and Bengt Muthen, I have been reading in this post that: "Fixing a negative residual variance is done if the residual variance is a small negative value and not significant. Otherwise, the model should be changed." I want to ask whether this should be judged from the standardized or the unstandardized output. Also, could you specify when the negative residual variance should be set to zero versus set to one? Thank you in advance and happy holidays.