Daniel posted on Thursday, August 14, 2003 - 10:51 am
I ran a parallel process LGM with multiple groups. Everything is fine except that when I request standardized values I get a negative residual variance for one of my categorical observed variables, and an undefined r-square. Does this invalidate my results? How can I fix it?
The negative residual variance seems to be for the first occasion where it is likely that there is very little smoking. Negative residual variances in growth models are common for variables with strong floor/ceiling effects.
It looks like you have a multiple indicator growth model, not a parallel process model. I would need to see your full output and possibly the data to comment further.
Anonymous posted on Monday, August 25, 2003 - 12:28 pm
I am conducting a LGM model using categorical outcomes (intercept free; thresholds fixed to zero; slope free; quadratic latent variable free). I obtained a 2.36 residual variance for one outcome (compared to .39 to .55 for the other variables) and an undefined r-square for the offending variable. I thought that an undefined r-square was usually caused by negative variance. What does this suggest? How might I fix this?
In addition, I continually receive a standardized estimate of the intercept (i.e., threshold) that is greater than one. Can you explain this?
Further, in a different LGM model (same model but with a second slope added and the quadratic term removed) the standardized estimate of the mean of the first slope is greater than one. Again, is this possible?
Also, I tend to have high standardized correlations between the first slope and second slope (e.g., -.61; not significant), but negligible unstandardized correlations (e.g., -.009). What do you make of this? I also see this same pattern when I replace the second slope with a quadratic latent variable.
Perhaps, there is a strong linear relationship between the slope and quadratic which is influencing this pattern? Any suggestions?
The undefined r-square can happen for a variety of reasons. I would need to see the full output to answer this.
Standardized values can be greater than one when they do not correspond to correlation coefficients. The value of the threshold can be greater than one just like a z-score can be greater than one. Slope means can also be greater than one because they are not interpreted as correlation coefficients.
The negligible unstandardized value is a covariance, and it comes about from small variances: a small covariance between two factors with small variances can still correspond to a sizeable correlation.
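To make the point about small variances concrete, here is a small numerical sketch in Python (not Mplus syntax). The two slope variances are hypothetical values chosen for illustration; only the covariance of -.009 comes from the post.

```python
import math

def correlation(cov, var_a, var_b):
    """Standardize a covariance: cov / (sd_a * sd_b)."""
    return cov / math.sqrt(var_a * var_b)

# Covariance as reported in the post; variances are HYPOTHETICAL,
# picked only to show how small variances inflate the correlation.
cov_s1_s2 = -0.009
var_s1 = 0.015
var_s2 = 0.0145

r = correlation(cov_s1_s2, var_s1, var_s2)
print(round(r, 2))  # -0.61
```

With variances this small, a covariance that looks negligible on its own scale is a correlation of about -.61.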
! correlated residuals: all adjacent residuals are correlated
PDA WITH PDAmn01;
PDAmn01 WITH PDAmn03;
PDAmn03 WITH PDAmn05;
PDAmn05 WITH PDAmn07;
PDAmn07 WITH PDAmn09;
PDAmn09 WITH PDAmn11;
PDAmn11 WITH PDAmn13;
PDAmn13 WITH PDAmn15;
You mentioned above that negative residual variances in growth models are common for variables with strong floor/ceiling effects. I presume that a model with a negative residual variance on an observed variable is not really "reportable." How do you suggest getting rid of this problem? Setting the pdamn01 variable variance to zero doesn't work (model won't converge). Already had to set the intercept variance to 0 b/c that was coming up negative too. Is transforming the variable an option you would suggest? Wouldn't I need to transform all the observed variables then?
I don't know how you started this analysis but the growth model you have ended up with is a little odd given that the quadratic factor should have time scores that are the square of the slope growth factor. Your growth slope factor has time scores of:
0 1 1 1 1 1 1 free free
The quadratic has:
0 1 4 9 10 free free free free
This does not make sense to me and could be causing problems. I am not sure how this model would be interpreted.
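The rule that the quadratic time scores are the squares of the linear time scores can be checked mechanically. The sketch below is Python, not Mplus; the helper names are made up for illustration, and free time scores are represented as None.

```python
def quadratic_scores(linear_scores):
    """Square each fixed linear time score; free scores (None) stay free."""
    return [None if t is None else t * t for t in linear_scores]

def mismatches(linear_scores, quad_scores):
    """Return 1-based positions where both scores are fixed
    and the quadratic score is not the square of the linear one."""
    bad = []
    for pos, (t, q) in enumerate(zip(linear_scores, quad_scores), start=1):
        if t is not None and q is not None and q != t * t:
            bad.append(pos)
    return bad

# Time scores as posted in the thread (None = free).
posted_linear = [0, 1, 1, 1, 1, 1, 1, None, None]
posted_quad = [0, 1, 4, 9, 10, None, None, None, None]

print(mismatches(posted_linear, posted_quad))  # [3, 4, 5]
print(quadratic_scores([0, 1, 2, 3]))          # [0, 1, 4, 9]
```

Positions 3 to 5 are where the posted quadratic scores stop being the squares of the posted linear scores.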
Oh! Looks like I was unaware of the exact nature of the relationship between growth and quadratic factor. Now I wish I had posted anonymously! :-) (the 10 in the quad was a typo)
The growth pattern, based on means, looks something like this: 3.005 8.647 8.356 7.889 7.752 7.611 7.587 7.536 7.544
What I was seeing here is a decreasing linear pattern, with a quadratic overlay at the beginning (treatment effect). I guess I am still confused about time score usage. You had suggested 01111... for an earlier problem I had with a similar pattern. But maybe a very different model is indicated here? I am actually interested in relating this variable to another one in a dual growth model, so I want to get the simple growth pattern as uncomplicated as possible. Thanks for your assistance! silvia
I suggested 0 1 1 1 1 ... because I see a big jump between time 1 and time 2 and then a flattening out. I guess one question is whether the difference between 8.6 and 7.5 is meaningful. Is that type of decline important enough to model? If not, I would fit a model with one growth factor with time scores 0 1 1 1 .... Is the increase from time 1 to time 2 important to model, or can you start the time series at time 2?
Yes the difference between 8.6 and 7.5 is meaningful in our context and worth modeling.
The time 1 value is the baseline before intervention (percent days abstinent from alcohol). If I leave it out for this model, what exactly does the intercept factor mean? Could I account for baseline abstinence by using the time 1 variable as a covariate (PDAslope on PDA)? How is the interpretation of this information different from having the baseline be an indicator of the intercept factor in the growth model (i.e., PDAslope on PDAint)?
I did try taking out the first time point and got a barely acceptable fit. The modindices suggest that time 15 is the most problematic. If I free this, what are the implications for interpreting the drop-off from 8.6 to 7.5?
I think the reason you are having problems is that the trajectory after the first time point is not linear. It seems like you have two things going on looking at all time points. First you have the big jump -- from 3 to 8.6. Then a small series of declines and then a leveling out. You could consider a piecewise model with two pieces -- the first representing the initial growth (up to 8.6) and the second representing the decline and leveling out. So you could have time scores of
0 1 1 1 1 1 1 1 1 for the first slope and 0 0 1 2 3 4 5 5 5 for the second slope. I think this reflects your means:
So you would have one intercept factor and two growth factors. I don't think the variance of the first slope factor is identified, so fix it to zero. Also fix its covariance with the other growth factors to zero. Then you could have different predictors of the initial jump and the decline.
Also, don't put the residual covariances in until you fit this model.
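As a rough check of how well the two sets of piecewise time scores track the posted means, here is a descriptive least-squares sketch in Python. It is not the Mplus estimate: it only fits the nine observed means with an intercept and two slopes, ignoring variances and missing data.

```python
# Means and time scores as posted in the thread.
means = [3.005, 8.647, 8.356, 7.889, 7.752, 7.611, 7.587, 7.536, 7.544]
s1 = [0, 1, 1, 1, 1, 1, 1, 1, 1]   # first piece: the initial jump
s2 = [0, 0, 1, 2, 3, 4, 5, 5, 5]   # second piece: decline, then level

# Time 1 is the only occasion with s1 = 0 and s2 = 0, so the
# least-squares intercept reproduces the time 1 mean exactly.
a = means[0]

# For times 2..9 the model is (a + b1) + b2 * s2: a simple regression.
x, y = s2[1:], means[1:]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b2 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
    (xi - mx) ** 2 for xi in x
)
level_after_jump = my - b2 * mx
b1 = level_after_jump - a

implied = [a + b1 * u + b2 * v for u, v in zip(s1, s2)]
for obs, fit in zip(means, implied):
    print(f"{obs:6.3f}  {fit:6.3f}")
```

The first slope mean picks up the jump from baseline and the second slope mean comes out negative, which matches the described pattern of a jump followed by a decline.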
I too am having Mplus (v3) report negative variances with simple linear growth modeling. The code for one example is:
Analysis: TYPE = meanstructure missing h1;
Model:
i by cog0@1 Cog2@1 Cog6@1;
s by cog0@0 cog2@2 cog6@6;
[Cog0 - Cog6@0 i s];
i s on group;
Output: Samp;
The negative variances are normally associated with s. Otherwise the output looks good. Indeed, when I plot out the predicted means for the 2 groups using the Mplus parameters they are pretty good, despite the negative variances. Is it OK to just ignore the negative variances? Can I just set the offending variances to zero? Any advice would be well received. There are only 3 time points (0, 2, & 6 months) and only about 100 cases (some with missing data). Group is a dichotomous variable. Cog is continuous (fairly normal).
bmuthen posted on Sunday, August 28, 2005 - 8:20 pm
This may simply imply that you have no individual variation in the slopes and should therefore treat it as fixed, i.e. fix its variance at zero.
Andy posted on Thursday, November 03, 2005 - 6:15 pm
I have a random intercepts and slopes growth model as in Ex 6.1 on p. 75 of the V3 manual, except that I have 3 time points (0, 1, 2). I have a continuous outcome with 80 individuals.
There are 6 covariance df to play with, and I'm fitting a model using all 6 (i.e., random intercept and slope variances, their covariance, and 3 unconstrained residual variances). Equating the sample covariance matrix to the model-implied covariance matrix and solving the 6 equations in 6 unknowns gives the solution reported in Mplus.
As with others, I get a negative residual variance estimate for the error variance at my time 2, and the numerical value reported in Mplus matches the algebraic calculation. Denoting the sample covariance elements by Sij, the estimate of the error variance at my time 2 is S33+S13-2S23. My data just happens to have S13 as the smallest element in the matrix and S23 approx the same as S33, so this gives the negative variance estimate.
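The algebra described above can be written out as a short Python sketch. It assumes a linear growth model with time scores 0, 1, 2, no mean structure, and uncorrelated residuals; the function name and the example matrix are hypothetical and are not Andy's data.

```python
def solve_linear_growth(S):
    """Closed-form solution for the saturated 3-wave linear growth model.

    S is a 3x3 sample covariance matrix (list of lists); time scores 0, 1, 2.
    Model-implied elements: S12 = vi + c, S13 = vi + 2c, S23 = vi + 3c + 2vs,
    with the residual variances added on the diagonal.
    """
    s11, s12, s13 = S[0][0], S[0][1], S[0][2]
    s22, s23, s33 = S[1][1], S[1][2], S[2][2]
    var_i = 2 * s12 - s13
    cov_is = s13 - s12
    var_s = (s12 + s23 - 2 * s13) / 2
    return {
        "var_i": var_i,
        "cov_is": cov_is,
        "var_s": var_s,
        "theta1": s11 - var_i,
        "theta2": s22 - var_i - 2 * cov_is - var_s,
        "theta3": s33 + s13 - 2 * s23,  # the expression quoted in the post
    }

# HYPOTHETICAL matrix: S13 is the smallest element and S23 is close to S33.
S = [
    [1.00, 0.60, 0.30],
    [0.60, 1.00, 0.90],
    [0.30, 0.90, 0.95],
]
print(solve_linear_growth(S)["theta3"])  # negative, about -0.55
```

Any matrix in which 2*S23 exceeds S33 + S13 gives a negative residual variance at the last time point under this parameterisation, even when the matrix itself is a valid covariance matrix.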
The algebra indicates that these negative variances can occur quite frequently in this model with a saturated covariance parameterisation. Have you found this to be so in your experience with many datasets? What is the usual recommended action? Is it to set the offending error variance to zero, or instead add some constraints to the error variances?
This has reinforced to me some of the limitations with growth modelling with 3 time points. Do you recommend against fitting such an unconstrained error variance model with 3 time points?
bmuthen posted on Friday, November 04, 2005 - 7:54 am
In my experience, negative residual variances often happen with strongly skewed outcomes. If your residual variance comes out negative you can hold the variances equal across time points, or fix it at zero. I recommend having more than 3 time points for reasons of being able to have a more flexible and realistic model (correlated residuals, free time scores...).
Andy posted on Sunday, November 06, 2005 - 8:44 pm
Dear Dr Muthen,
Thanks for your quick reply. My data was a bit right skewed, but the negative variance persists with square-root and log-transform of my data.
I also tried the model with the classic Potthoff and Roy 1964 growth dataset involving the pituitary gland in boys and girls, and with a few other datasets. Whenever I restricted the model to 3 time points I got a negative variance somewhere in the model. These datasets had constant sample variance or increasing variances with time, and constant off-diagonal correlations or decreasing correlations. All gave negative variances somewhere.
So now I'm wondering whether it is the exception rather than the rule to be able to estimate all 3 residual variances unconstrained when you have only 3 time points?
I'm also curious why Mplus does not implement non-negative variance constraints in the estimation routines, e.g., by reparameterising a variance sigma^2 as exp(A) where A = ln(sigma^2), and estimating A rather than sigma^2 directly?
BMuthen posted on Saturday, November 12, 2005 - 6:24 pm
I don't think this is an issue that comes up because of three time points. I have experienced many such applications without problems. If you want us to look into your particular data sets, please send the inputs, data, outputs, and license number to firstname.lastname@example.org.
We do not have non-negative variance constraints automatically built in because we do not want to hide this information from the analyst. An analyst can fix a residual variance to zero or constrain the variance to be non-negative if they want.
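The exp(A) reparameterisation and the trade-off in the reply can be illustrated with a toy one-parameter problem in Python. This is not how Mplus estimates anything: the objective, step size, and function name are assumptions made purely for illustration.

```python
import math

def fit_log_variance(target, a=0.0, rate=0.1, steps=2000):
    """Minimise (exp(A) - target)^2 over A by plain gradient descent.

    'target' stands in for the value an unconstrained estimator would
    settle on; sigma^2 = exp(A) is positive for every real A.
    """
    for _ in range(steps):
        v = math.exp(a)
        grad = 2 * (v - target) * v  # chain rule: d/dA of (exp(A) - target)^2
        a -= rate * grad
    return a

# Case 1: the unconstrained solution is positive, so it is recovered.
v_pos = math.exp(fit_log_variance(0.4))

# Case 2: the unconstrained solution is negative. The variance stays
# positive but keeps creeping toward zero and never reaches the target.
v_neg = math.exp(fit_log_variance(-0.01))

print(v_pos, v_neg)
```

In the second case the analyst would only see a small positive variance, with no sign that the unconstrained solution was negative, which is the kind of information the reply says should not be hidden.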
We're preparing a dual process model looking at differences in growth between parents and children, however, we obtained a negative variance for one of the slope factors. Does anything about the syntax below suggest an overfitted or problematic model?
USEVAR ARE achvl sachvl tachvl pchvl spchvl tpchvl sex gen2; MISSING ARE ALL (999);
are unusual because intercepts and slopes of the same process are typically correlated. A negative slope variance may simply mean that the slope variation is almost zero, in which case you want to fix its variance at zero and consider it a fixed effect, rather than random.
Also, I would recommend switching to the newer growth language to simplify your input.
You may also want to correlate contemporaneous residuals using WITH statements.
Amber Watts posted on Thursday, October 29, 2009 - 11:58 am
I have a similar problem to Peter above. Bengt suggested "This may simply imply that you have no individual variation in the slopes and should therefore treat it as fixed, i.e. fix its variance at zero."
When I fix the slope variance to zero I still get a problem, but if I fix the intercept-slope covariance to 0 the model runs fine. Is this an acceptable thing to do?
It sounds like you are using an old version of the program. If a variance is fixed to zero, all covariances with the variable need to be fixed to zero. This has been done automatically in Mplus for some time.
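Why a zero variance forces the covariances to zero can be seen from a 2x2 intercept/slope covariance matrix. The check below is a Python sketch with hypothetical numbers, not Mplus output.

```python
def is_valid_2x2(var_i, var_s, cov_is):
    """A 2x2 covariance matrix is admissible (positive semidefinite)
    when both variances are non-negative and the determinant is too."""
    determinant = var_i * var_s - cov_is ** 2
    return var_i >= 0 and var_s >= 0 and determinant >= 0

# Slope variance fixed at zero but covariance left free and non-zero:
# determinant = 1.0 * 0.0 - 0.2**2 < 0, so the matrix is inadmissible.
print(is_valid_2x2(var_i=1.0, var_s=0.0, cov_is=0.2))  # False

# Slope variance and its covariance both fixed at zero: admissible.
print(is_valid_2x2(var_i=1.0, var_s=0.0, cov_is=0.0))  # True
```

With a variance of zero, any non-zero covariance makes the determinant negative, so the covariance has to be zero as well.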
Amber Watts posted on Thursday, October 29, 2009 - 12:25 pm
I am using Mplus version 5. If I fix slope@0, the TECH4 output says that the level-level covariance is negative even though the level-slope covariance is 0.00.
If I fix level@0, TECH4 says the slope-slope covariance is negative even though the level-slope covariance is 0.00.
If I fix the level WITH slope@0, I no longer get the error message and both level-level covariance and slope-slope covariance become positive.
I just have the following questions for you: What does a zero slope variance mean (considering a real zero, rather than forcing it to zero)? Does it mean that the inter-individual differences are the same as those at the beginning of the study? Or does it mean that the inter-individual differences come close to zero over the course of the study?
I'm doing an ESEM model with categorical indicators. When I try to run the model, I get an error message stating that one of my variables has a negative residual variance. Since the dataset I was using is one of several datasets created using multiple imputation (to address missing data), I decided to try running the model in one of the other datasets to see if the model would converge. It did; I then proceeded to run the model in a few more datasets. What I'm seeing is that the model converges in about half of the datasets. When the model converges (with good fit indices; CFI = .99, TLI = .98, RMSEA = .07) the residual variance is about .01. When the model does NOT converge, the residual variance is about -.01. How would you suggest I proceed with interpreting and/or adjusting this model?
I have computed latent growth models and have come across some problems I don’t seem to be able to resolve. I have two scales for which item data was collected at three time points. I modeled each of these scales separately in univariate LGM models, using the items as indicators for the three latent construct variables at three time points. I obtained sensible estimates for each LGM. However, when specifying a multivariate model where the slopes and intercepts of the two scales are allowed to covary to examine potential relationships of growth, I obtained the following warning: WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE S_CAL. [S_CAL is the slope of one of the two scales]
I also obtained a negative value for the variance of S_CAL (which did not occur in the univariate LGM model of this scale). Please can you advise how to solve this problem?
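Two of the three causes listed in the warning can be checked directly from a latent variable covariance matrix such as the one printed by TECH4. The Python sketch below uses a hypothetical matrix; S_CAL is the name from the post, the other factor names and all numbers are made up. It does not check for linear dependencies among more than two latent variables.

```python
import math

def diagnose_psi(names, psi):
    """List non-positive variances and correlations with |r| >= 1."""
    problems = []
    for k, name in enumerate(names):
        if psi[k][k] <= 0:
            problems.append(f"{name}: non-positive variance {psi[k][k]}")
    for a in range(len(names)):
        for b in range(a):
            va, vb = psi[a][a], psi[b][b]
            if va > 0 and vb > 0:
                r = psi[a][b] / math.sqrt(va * vb)
                if abs(r) >= 1:
                    problems.append(
                        f"{names[a]} with {names[b]}: correlation {r:.2f}"
                    )
    return problems

# HYPOTHETICAL values, for illustration only.
names = ["I_CAL", "S_CAL", "I_OTH", "S_OTH"]
psi = [
    [0.50, 0.02, 0.20, 0.01],
    [0.02, -0.01, 0.01, 0.00],
    [0.20, 0.01, 0.40, 0.15],
    [0.01, 0.00, 0.15, 0.05],
]
for line in diagnose_psi(names, psi):
    print(line)
```

In this made-up example the check flags the negative S_CAL variance and an intercept-slope correlation above one for the second scale.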
Many thanks for your quick reply. If I understand you correctly, you advise me to allow the residuals of the latent variables of the two constructs at each time point to covary. I have already done this. Sorry for not making this explicit. However, it does not help to obtain a model that converges.
Ginnie posted on Thursday, April 25, 2013 - 2:53 pm
Dear Dr. Muthen,
I am preparing a growth model for parallel processes; however I obtain an error message regarding the variance of s2.
variable: names are y1-y35;
usevariables y7-y10 y22-y25;
missing = all (999);
analysis: estimator = ml;
model:
i1 s1 | y7@0 y8@1 y9*2 y10*3;
i2 s2 | y22@0 y23@1 y24*2 y25*3;
s1 on i2;
s2 on i1;
output: tech4; MODINDICES (ALL);
THE MODEL ESTIMATION TERMINATED NORMALLY
WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE S2.
Then, if the slope has a variance that is not significantly different from zero and the covariance between the intercept and the slope is significant, would I have to keep both (the variance of the slope and the relation between the intercept and the slope) even if the fit is worse with the variance of the slope?