Negative Residual Variance
 Anonymous posted on Thursday, January 20, 2005 - 12:18 pm
Hi-

I have been working in Mplus on several two-level structural equation models, and sometimes the residual variance of one of my observed dependent variables is negative. The models fit well, but I am not sure how to interpret or fix these residual variances so that they are no longer negative. Please advise. Thank you.
 BMuthen posted on Thursday, January 20, 2005 - 7:57 pm
If the negative residual variances are large, this is a sign that your model is not appropriate for your data and you need to change your model. If they are small, you may want to fix them to zero. Residual variances are often small on the between level of multilevel models.
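
A minimal sketch of the fix-to-zero option for a between-level residual variance. The file name, the variable names (cluster, y1-y4), the factor names (fw, fb), and the choice of y2 as the offending variable are all hypothetical and only illustrate the syntax:

```mplus
! Sketch only: file, variable and factor names are illustrative.
DATA:
  FILE = example.dat;
VARIABLE:
  NAMES = cluster y1-y4;
  CLUSTER = cluster;
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  fw BY y1-y4;
  %BETWEEN%
  fb BY y1-y4;
  y2@0;    ! fix the small negative between-level residual variance at zero
```

Comparing the fit of this run with the run in which the variance is free gives a rough sense of whether the negative estimate was small.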
 matthew posted on Sunday, November 27, 2005 - 3:14 am
Hi, I am a new SEM user and face a similar problem. What causes a negative residual variance? That is, how can a misspecified model produce this problem without it showing up in the model fit indices? I would like to ignore it (simply delete it). Is there a guideline or a significance threshold for the value that would make it acceptable to do this?
 Linda K. Muthen posted on Sunday, November 27, 2005 - 2:41 pm
There is too much to say on this topic, and checking books is helpful. In sum, reasons for negative variances include small sample size (so the estimate can be negative even if the population value is positive), model misspecification, and very skewed variables (floor effects). Also see my other answer today.
 Grainne Cousins posted on Monday, March 31, 2008 - 9:10 am
Hi

Using CFA, I established discriminant validity between 6 latent variables (x1-x6). Then using SEM, I regressed y1, a latent variable with 3 continuous indicators on x1-x6. The model fit very well.

However, on addition of a final path, regressing u1, a binary observed variable, on y1, I received a warning that the residual covariance matrix is not positive definite. On inspection of the results, I realised there was a negative residual variance on one of the indicators of y1, and the model fit indices are very poor. What should I do? I tried dichotomising y1, as the indicators are negatively skewed, but the model fit is worse.

I would be grateful for any advice.

Grainne
 Linda K. Muthen posted on Monday, March 31, 2008 - 10:15 am
 Dorothee Durpoix posted on Saturday, April 04, 2009 - 10:17 pm
Hello,

Is it possible to constrain the residual variance of outcome variables to be greater than or equal to zero in Mplus (rather than only equal to zero)? If yes, could you please state how?
 Linda K. Muthen posted on Sunday, April 05, 2009 - 9:40 am
You can use MODEL CONSTRAINT to constrain them to be greater than zero, for example,

MODEL CONSTRAINT:
0 < p2;
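
For the constraint to work, p2 has to be a label attached to the residual variance in the MODEL command. A minimal sketch, in which the factor f, the indicators y1-y4, and the choice of y2 are hypothetical:

```mplus
! Sketch only: f, y1-y4 and the choice of y2 are illustrative.
MODEL:
  f BY y1-y4;
  y2 (p2);        ! label the residual variance of y2 as p2
MODEL CONSTRAINT:
  0 < p2;         ! inequality constraint as written in the post above
```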
 Dorothee Durpoix posted on Monday, April 06, 2009 - 3:51 pm
Hello Linda,

I found a negative error variance for one of my ordinal outcomes (estimator WLSMV), and I'd like to test the inequality H0: error variance is greater than or equal to zero using the Wald test. Is it possible in Mplus?

Thank you very much in advance.
 Bengt O. Muthen posted on Monday, April 06, 2009 - 6:02 pm
I don't know what your model is, but unless you have a longitudinal or multi-group situation, the residual variances for ordinal outcomes are not free parameters. They are printed as remainders when requesting a standardized solution - perhaps that's where you see the negative value. So since they are functions of other parameters, you cannot do a test on them in a straightforward fashion.

If you for instance consider a single factor and no covariates (in a cross-sectional, single-group setting), the residual variance remainder is theta,

theta = 1 - lambda*lambda*psi

so you would have to do the Wald test on the new parameter theta (testing against = 0, not > 0).
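
A minimal sketch of how such a new parameter could be set up for one indicator, assuming a single factor, the Delta parameterization, and the default of the first loading fixed at 1. The names u1-u5, f, lam2, psi, and theta2 are hypothetical:

```mplus
! Sketch only: u1-u5, f, lam2, psi and theta2 are illustrative names.
VARIABLE:
  NAMES = u1-u5;
  CATEGORICAL = u1-u5;
ANALYSIS:
  ESTIMATOR = WLSMV;
  PARAMETERIZATION = DELTA;
MODEL:
  f BY u1
       u2 (lam2)
       u3-u5;
  f (psi);
MODEL CONSTRAINT:
  NEW(theta2);
  theta2 = 1 - lam2*lam2*psi;   ! residual variance remainder for u2
```

The output should then list theta2 among the new/additional parameters with its estimate, standard error, and their ratio, which serves as the test against zero.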
 Dorothee Durpoix posted on Monday, April 06, 2009 - 6:40 pm
Thanks, Bengt, for your answer. I did indeed see the negative residual variance when doing ESEM with WLSMV and the Delta parameterization on a 2-factor model measured by a total of 5 ordinal items.

1. But, if one uses the Theta param., the residuals should be the free parameters, shouldn't they? In this case, the Wald test could be directly used on the residual?
2. About the Wald test, I was wondering if it was possible to do a one-sided test of H0: residual >= 0 (as opposed to the two-sided test H0: residual=0), as it is suggested in the article of Kolenikov & Bollen "Testing negative error variances: is a Heywood case a symptom of misspecification?" http://web.missouri.edu/~kolenikovs/

Hope these questions make sense!
Thank you very much for your help.
 Tihomir Asparouhov posted on Tuesday, April 07, 2009 - 9:41 pm
1. With the theta parameterization the residual variance is fixed to 1 (unless you have a multiple-group situation), so in a way this gives you the residual variance > 0 condition. The residual variance is not a free parameter because it is still not identified, so it has to be fixed to a value that determines the parameterization. For the theta parameterization that value is 1.

2. In principle yes - this amounts to dividing the p-value you get by 2, but again with the theta parameterization you cannot do this at all because the residual variance is fixed to one. In the delta parameterization you can do this using the method Bengt outlined above, i.e., by making a new parameter in MODEL CONSTRAINT that is equal to the residual variance parameter. The residual variance parameter in the model is not really a regular parameter - it is a dependent constrained parameter that you cannot access directly, so you have to make your own duplicate of it.
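
The halving rule in point 2 can be written out as follows. This is a sketch that assumes the usual large-sample normal approximation for the ratio of the estimate to its standard error; the symbols z and Phi (the standard normal distribution function) are notation introduced here for illustration:

```latex
H_0:\ \theta \ge 0 \quad \text{vs.} \quad H_1:\ \theta < 0, \qquad
z = \frac{\hat\theta}{\mathrm{SE}(\hat\theta)}, \qquad
p_{\text{one-sided}} = \Phi(z) =
\begin{cases}
p_{\text{two-sided}}/2, & z \le 0,\\[2pt]
1 - p_{\text{two-sided}}/2, & z > 0.
\end{cases}
```

Under these assumptions, halving the printed two-sided p-value is appropriate only when the estimate is negative; a positive estimate gives no evidence against H0.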
 RDU posted on Saturday, February 13, 2010 - 12:29 am
I have a question concerning the residual variance provided in the standardized model results for categorical outcomes. Since this is a standardized solution, then are the residual variances listed standardized residual variances or are they unstandardized? Furthermore, if they are standardized then how exactly does one obtain the unstandardized residual variances (Is it similar to what it would be in regular regression?). Thanks.
 Linda K. Muthen posted on Saturday, February 13, 2010 - 9:43 am
I think you are asking about the residual variances that are printed with R-square. These are raw coefficients that are computed as a remainder from the model estimated results. They are not estimated as part of the model. Categorical outcomes do not have variance parameters.
 RDU posted on Saturday, February 13, 2010 - 10:32 am
To better clarify my question, I was referring to the residual variances that are provided using Theta parameterization (WLSMV estimation) with categorical data, where standardized model results can be requested. As part of the standardized output, residual variances are given along with each item's R^2.

Thus I was wondering whether the residual variance is standardized since it is part of the standardized model output, or whether it is an unstandardized estimate.

If it is in fact a standardized residual variance, I also wanted to know if an unstandardized estimate could be obtained or calculated by hand. Thank you for your response.
 Linda K. Muthen posted on Saturday, February 13, 2010 - 10:59 am
 RDU posted on Saturday, February 13, 2010 - 11:37 am
Yes, I apologize for the confusion. I believe I mistook the scale factors from Theta parameterization for the residual variances provided in delta parameterization.
 RDU posted on Saturday, February 13, 2010 - 1:40 pm
Given the previous question I am also curious as to whether the residual variances given for delta parameterization are standardized or unstandardized, as theta parameterization was said to provide scale factors and not residual variances...Is this correct?
 Linda K. Muthen posted on Saturday, February 13, 2010 - 2:05 pm
Neither the scale factors nor the residual variances presented with R-square are standardized.
 Hemant Kher posted on Thursday, April 21, 2011 - 8:04 am
Hello Professor Muthen,

I have a question, and I hope that you can provide some insights. My question is related to a multiple-indicator latent curve model. Using the CFA approach, I estimate a latent construct for 4 different time points (factors F1, F2, F3 and F4 -- each estimated using the same 4 items). I followed directions to establish measurement invariance (same scale indicator at each time, loadings for non-scale items constrained equal across time, and equal intercepts for non-scale items). The model with CFA works fine with a good fit.

However, when I fit a growth model on the factors, I get a negative residual variance for the first factor (F1); the residual variance is small and statistically insignificant (-0.026, Z=-0.354, p=0.723). When I fix this residual variance for F1 to zero (f1@0;), the change in model fit is negligible and not statistically significant. But I am not sure if doing this (setting factor residual variance to zero) is reasonable. Your thoughts at your convenience would be appreciated.
 Bengt O. Muthen posted on Thursday, April 21, 2011 - 8:18 am
Negative residual variances typically reflect a mis-specified model. For instance, perhaps a non-linear growth model is more suitable.

Also, instead of fixing the residual variance at zero, you could try holding them equal across time.
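
A minimal sketch of the equality option for a multiple-indicator growth model. The linear time scores 0-3 are an assumption, and the measurement part is left out because the original input was not posted:

```mplus
! Sketch only: time scores 0-3 are assumed; the measurement part is omitted.
MODEL:
  ! measurement model for f1-f4 (invariant loadings and intercepts) goes here
  i s | f1@0 f2@1 f3@2 f4@3;
  f1-f4 (1);      ! hold the four factor residual variances equal over time
```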
 Hemant Kher posted on Thursday, April 21, 2011 - 8:26 am
Professor Muthen -- Thank you for a quick response.

Holding the factor residual variances equal over time has solved the problem.
 Katja posted on Monday, January 14, 2013 - 1:14 am
Hi!

I have a question regarding a negative residual variance. There was a post:
"If the negative residual variances are large, this is a sign that your model is not appropriate for your data and you need to change your model. If they are small, you may want to fix them to zero. Residual variances are often small on the between level of multilevel models."

What is considered a large or small negative variance? I have a negative residual variance of -0.083. Can I fix it to zero?

Thank you!
 Linda K. Muthen posted on Monday, January 14, 2013 - 10:31 am
Try fixing it to zero and see how this affects the results.
 Johannes Bauer posted on Friday, June 27, 2014 - 12:02 am
Hello

There seem to be two approaches to handling negative residual variances. The first is to fix the residual variance to 0 or a small positive value. The second is to use MODEL CONSTRAINT to constrain the variance to be greater than zero. These approaches seem to be different because the first one yields an additional degree of freedom whereas the second does not.

So I am wondering which approach you consider more appropriate and why.

Many thanks
Johannes
 Linda K. Muthen posted on Friday, June 27, 2014 - 11:05 am
It is common practice to fix small insignificant negative residual variances to zero or constrain them to be greater than zero. These approaches are basically the same. Neither is optimal. A better choice is to change the model.
 Linh Nguyen posted on Saturday, July 19, 2014 - 9:31 pm
Hello Linda

I am also having the negative residual variance problem with my measurement and structural models.

The problem is with the second-order construct, which has only 2 first-order indicators. When I run a CFA for this second-order construct, it is unidentified (my sample size is 364). I therefore set the factor loadings to be equal (tau-equivalence) in order to run the CFA and validate the construct individually. The CFA model then fits very well, with a nonsignificant chi-square, CFI = .995, RMSEA = .04, SRMR = .02. The overall measurement model (with 5 other constructs, 32 observed variables) also fits adequately.

In my structural model, if I do not set the factor loadings of the above construct to be equal, I get 1 negative residual variance (for one of the above construct's indicators). The negative residual variance is -.017 and insignificant. If I continue to employ the tau-equivalence assumption, the model is fine. Can I set the construct's factor loadings to be equal in my structural model, as in the CFA model, to solve the problem?

Thank you very much!

I am looking forward to hearing from you

Kind regards
Linh
 Linh Nguyen posted on Saturday, July 19, 2014 - 10:35 pm
With regard to the second-order construct mentioned above, even though it has 2 first-order indicators, each first-order indicator has 3 observed variables.

Thanks
Linh
 Linda K. Muthen posted on Sunday, July 20, 2014 - 7:42 am
I would use the CFA model of equal loadings in the SEM model.
 Linh Nguyen posted on Monday, July 28, 2014 - 7:30 pm
Hi Linda

Thanks so much for your quick response! It really helps. Sorry for my late reply; I could not post any message on the website until now.

I am writing up my analysis, just wondering if you can give me some references for using the CFA model of equal loadings in the structural model? I have been searching for journal articles about the issue but could not find the proper one.

Thank you

Kind regards
Linh
 Linda K. Muthen posted on Tuesday, July 29, 2014 - 6:02 am
I don't know of any reference. A second-order factor with only two first-order factors is not identified unless you impose some constraint such as equal loadings. Ideally you would have more first-order factors so that this is not necessary.
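
One way the equal-loadings constraint could be written, as a sketch with hypothetical names (y1-y6 for the observed variables, f1 and f2 for the first-order factors, g for the second-order factor):

```mplus
! Sketch only: y1-y6, f1, f2 and g are illustrative names.
MODEL:
  f1 BY y1-y3;        ! first-order factor with 3 observed indicators
  f2 BY y4-y6;        ! first-order factor with 3 observed indicators
  g BY f1@1 f2@1;     ! both second-order loadings fixed at 1, so they are equal
```

Fixing both loadings at 1 leaves the variance of g free; other ways of imposing the equality are possible.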
 Linh Nguyen posted on Tuesday, July 29, 2014 - 6:52 pm
Thanks Linda
 Ted Fong posted on Friday, August 01, 2014 - 2:12 am
Dear Dr. Muthén,

I understand that a 2nd-order factor model with just two first-order factors is typically unidentified. When the model is made identified with a model constraint such as equal loadings on the 2nd-order factor, this model should have the same df as the two-factor correlated model.

My question is: is it possible for an identified 2nd-order factor model with two first-order factors to have a lower df than the two-factor correlated model? I have recently come across a paper where the former model has 2 df fewer than the latter one.

Thanks very much,
Ted
 Linda K. Muthen posted on Friday, August 01, 2014 - 10:47 am
This question is better suited for a general discussion forum like SEMNET.
 Alyssa Thomas posted on Monday, August 04, 2014 - 8:21 pm
Hi Dr Muthen-
I am running a full structural equation model (with categorical outcome) and am having trouble.
While it runs and is terminated normally, I still get the error message about a negative residual variance:
"THE MODEL ESTIMATION TERMINATED NORMALLY

WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE
DEFINITE....PROBLEM INVOLVING VARIABLE DLMORAL."

After reading online I fixed the residual variance to 0 (DLMORAL@0) , but am still getting this error message.

Am I able to still use the estimates provided or is there something else I can do to fix this problem? Although having it fixed @0 means I do not have estimates for two correlations involving this variable.
This is the starting model so I want to make sure there are no issues before I modify it. I have run CFA on all independent variables before running the full model and there were no problems with this variable.

 Alyssa Thomas posted on Monday, August 04, 2014 - 8:27 pm
I should add that my sample size is only 320 but that has not been a problem for several other models (including one similar) that use this variable.
Does this mean that this issue is caused by the addition of one latent variable and one observed variable (that does not correlate with the variable in question)?
 Linda K. Muthen posted on Tuesday, August 05, 2014 - 5:55 am
 Linda Lin posted on Tuesday, March 10, 2015 - 3:04 pm
Can you explain how Mplus decomposes the total outcome variance and calculates the level-2 residual variance?

I tried to demonstrate as below:
the total variance of outcome variable can be decomposed as [ level-1 residual variance ]+ [level-1 explained variance] + [level 2 residual variance] + [level-2 explained variance].

That is, for example

usevariables are class x y;
within=;
between=;
cluster=class;
Analysis: type=twolevel;

Model:
%within%
y on x (a);
y (b);
x (c);

%Between%
y (d);
x (e);
y on x (a);

I expect that the total Var(y) should equal b + c*a^2 + d + e*a^2. But I found that this formula does not reproduce the total variance of y accurately.
 Bengt O. Muthen posted on Wednesday, March 11, 2015 - 6:25 pm
I don't know if you compare your formula to the total variance in the sample or the model-estimated total variance. If you don't have perfect fit, those two are different.

The formula is correct.

You use the label "a" on both levels - is that what you intend?

If this doesn't help, send output to support@statmodel along with your license number.
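
As a sketch, the model-estimated total variance could also be obtained directly by adding a MODEL CONSTRAINT section to the input above. The labels a-e are the ones from that input; totvar is a hypothetical name:

```mplus
! Sketch only: totvar is an illustrative name; a-e are the labels used above.
MODEL CONSTRAINT:
  NEW(totvar);
  totvar = b + c*a**2 + d + e*a**2;   ! model-estimated total variance of y
```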
 Linda Lin posted on Wednesday, March 11, 2015 - 7:12 pm
To clarify my question above:
1) I intend to constrain the coefficients to be the same across both levels.
2) I compared the total variance estimated from the model below with the total variance estimated from the model above.

usevariables are class y;
within=;
between=;
cluster=class;
Analysis: type=twolevel;

Model:
%within%
y (e);

%Between%
y (f);

total variance = e+f. I expected the two total variances to be the same. Is this e+f the sample variance you mentioned?

Thanks!
 Bengt O. Muthen posted on Thursday, March 12, 2015 - 1:54 pm
e+f is the model-estimated total variance that I referred to. Your first model was

Model:
%within%
y on x (a);
y (b);
x (c);

%Between%
y (d);
x (e);
y on x (a);

But because you have an equality constraint "a", you won't necessarily get the same total variance - the equality may not hold. Try it without the equality.
 Sanne Korneev posted on Monday, October 12, 2015 - 2:45 am
Dear Linda or Bengt,

I am fitting a two-level factor model with WLSMV on binary data, and I would like to report the within-level residual variance, although they are not free parameters. Can I calculate them as 1 - lambdawithin^2 * phiwithin? Or are the residual variances at the within-level fixed at 1 like in the theta-parameterization?

Sanne
 Linda K. Muthen posted on Monday, October 12, 2015 - 10:25 am
If you ask for STANDARDIZED in the OUTPUT command, you will obtain residual variances which are computed as remainders. They appear at the end of the standardized output with R-square.
 Sanne Korneev posted on Monday, October 12, 2015 - 10:42 pm
Dear Linda,

Thanks for pointing me to this output. I do not see the residual variances, but I do get scale factors, which are 1/sd(underlying response variable) if I understand correctly.

I then calculated the total variance as 1 / scale factor^2, and the residual variances as total variance - explained variance. The result is around 1 for all variables: 1.002 1.002 1.000 1.001 1.003 0.997 0.999 0.998 1.001. This gives the impression that the residual variances are effectively constrained to be 1, and the small deviations that I find are due to rounding error. Could that be correct? Because then I will just report (unstandardized) residual variances of 1.
 Bengt O. Muthen posted on Wednesday, October 14, 2015 - 2:13 pm
Yes, the residual variances are fixed at 1 because the Theta parameterization is used for 2-level WLSMV.
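
The check described above can be written out as follows. This is a sketch in notation assumed here for a single within-level factor: lambda is the loading, psi the factor variance, theta the residual variance, and Delta the printed scale factor for the underlying response variable u*:

```latex
\operatorname{Var}(u^*) = \lambda^2\psi + \theta, \qquad
\Delta = \frac{1}{\sqrt{\operatorname{Var}(u^*)}}
\quad\Longrightarrow\quad
\theta = \frac{1}{\Delta^2} - \lambda^2\psi .
```

With theta fixed at 1, the right-hand side equals 1 up to rounding of the printed values, which matches the results reported in the post above.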
 Sanne Korneev posted on Thursday, October 15, 2015 - 9:44 pm
Thank you very much!
 Mahmoud A. Moussa posted on Wednesday, December 09, 2015 - 8:08 pm
What is considered a large or small negative variance? I have a negative residual variance.
Can I fix it to zero, and if so, how?
 Bengt O. Muthen posted on Thursday, December 10, 2015 - 2:36 pm
You say

y@0;

No rules of thumb, but if fixing it at zero didn't change the fit very much then it wasn't big.
 Bo Y posted on Monday, February 08, 2016 - 2:32 am
Hi Drs. Muthen,
Similar to the above questions, I ran into the warning message below when fitting a cross-lagged model. The model fit indices seem acceptable after modification. I have a relatively small sample: 102 for wave one and 80 for wave two.

DCI23 is an important observed variable in this model. Is there a technique that I could apply to fix this?
Thank you very much!

--------

THE MODEL ESTIMATION TERMINATED NORMALLY

WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE
DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A
LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT
VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES.
PROBLEM INVOLVING VARIABLE DCI23.
 Linda K. Muthen posted on Monday, February 08, 2016 - 8:32 am
No way to change this without changing the model. If fit didn't change very much as stated above, then fixing it to zero should be okay.
 Bo Y posted on Tuesday, February 09, 2016 - 1:34 am
Thanks a lot, Linda. I would like to make sure I put in the syntax right. Would you please let me know whether the following is OK?

Then the CFI increased a bit from .975 to .976, and the TLI increased a little too, from .960 to .962, but I still got the same WARNING message below. Do I need to do anything further?
--------------------
THE MODEL ESTIMATION TERMINATED NORMALLY

WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE
DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A
LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT
VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES.
PROBLEM INVOLVING VARIABLE DCI23.
 Linda K. Muthen posted on Tuesday, February 09, 2016 - 9:17 am
 Jennie Jester posted on Wednesday, October 26, 2016 - 8:20 am
I am estimating a structural equation model, which has two time points of executive function (EF). The time 2 EF has negative residual variance - I believe it's because the two time points are so highly related. (The standardized output shows that the beta from time 1 EF to time 2 EF is greater than one.) I would like to set the residual variance of time 2 EF to zero. Does it make sense to do that and still use time 2 EF as the dependent variable in a regression? Because my understanding is that a regression equation is predicting variance, so if you set the variance to zero there is nothing to predict.
 Bengt O. Muthen posted on Wednesday, October 26, 2016 - 2:32 pm
I would not recommend going in that direction. I assume EF is a factor with several indicators. If so, explore across-time residual correlations for the same indicators. That may reduce the factor correlation.
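
A minimal sketch of across-time residual correlations for the same indicators, assuming three indicators per time point. The names ef1, ef2, a1-c1, and a2-c2 are hypothetical:

```mplus
! Sketch only: ef1, ef2 and the indicator names a1-c1, a2-c2 are illustrative.
MODEL:
  ef1 BY a1 b1 c1;    ! time 1 executive function
  ef2 BY a2 b2 c2;    ! time 2 executive function, same indicators
  ef2 ON ef1;
  a1 WITH a2;         ! residual covariance for the same indicator across time
  b1 WITH b2;
  c1 WITH c2;
```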
 Elina Thomas posted on Thursday, March 09, 2017 - 1:21 pm
Hi Drs. Muthen,

I'm running a quadratic latent growth curve model and am getting an error message because there is a negative residual variance for both the slope and the quadratic factor. If I fix the slope and quadratic factor residual variances to 0, the model fit worsens. However, if I fix the residual variance of the only time point with a non-significant residual variance to zero and fix the slope residual variance to 0, the model fits well and q no longer has a negative residual variance. Is this o.k. for me to do?

Specifically, this is my residual variance output:

Residual Variances
            Estimate    S.E.  Est./S.E.  Two-Tailed P-Value
M6_IBQ         0.934   0.364      2.568      0.010
M9_IBQ         0.311   0.147      2.119      0.034
M12_IBQ        0.915   0.322      2.841      0.005
M24_ECBQ       4.037   3.100      1.302      0.193
I              0.022   0.316      0.070      0.944
S             -9.129   5.135     -1.778      0.075
Q             -3.853   2.053     -1.877      0.061

When I restrict s@0 & q@0 the model fit worsens.

However, if I restrict s@0 and M24_ECBQ@0, the model runs, q no longer has a negative residual variance, and the model fit is better.

Is this o.k. for me to do?

Sorry for my lengthy post, any information is greatly appreciated.