Mplus Discussion >> Residual variance and R-Square of slope factor in LGM


Anonymous posted on Friday, June 08, 2001 - 2:03 pm

In an LGM, the variance of the slope factor was significant (0.725, t=2.223). Then, four time-invariant covariates were used to predict the slope factor. The results show that one of the four covariates had a significant effect on the slope, and the residual variance of the slope factor became statistically insignificant (0.640, t=1.944). Does this mean that the variation in the slope was systematically explained by the covariate? If so, the R-Square of the slope factor should be large. Unfortunately, the R-Square was only 0.06, indicating that a very small portion (6%) of the variation in the slope factor was explained. Could you please help interpret the results? Thank you very much.

Linda K. Muthen posted on Saturday, June 09, 2001 - 8:43 am

A distinction should be made between significance and size of an estimate. All that a non-significant slope factor residual variance implies is that the residual variance is not significantly different from zero. The standard error and therefore statistical significance is dependent on sample size. Just because a covariate explains a significant amount of variation does not imply this is necessarily a large amount of variation.
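To make the significance-versus-size distinction concrete, here is a minimal sketch that converts the two Est./S.E. ratios quoted in the first post into two-sided p-values. It assumes a standard normal reference distribution for the ratio; the function name `two_sided_p` is made up for illustration and is not part of Mplus.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z ratio, assuming a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Values quoted in the first post: (estimate, estimate / S.E.)
unconditional = (0.725, 2.223)  # slope variance without covariates
conditional = (0.640, 1.944)    # residual slope variance with four covariates

for label, (est, z) in [("unconditional", unconditional),
                        ("conditional", conditional)]:
    se = est / z  # standard error implied by the reported ratio
    print(f"{label}: estimate={est:.3f}  SE={se:.3f}  z={z:.3f}  "
          f"p={two_sided_p(z):.3f}")

p_uncond = two_sided_p(unconditional[1])
p_cond = two_sided_p(conditional[1])
```

The two estimates and their implied standard errors are nearly identical; the p-values simply land on either side of the .05 cutoff, which is why the change in "significance" says little about how much variance was explained.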

Anonymous posted on Monday, June 11, 2001 - 9:42 am

Linda:
Thanks a lot for your prompt reply. My question was: can we say the variation of the slope was not random but systematic, because the residual variance of the slope became insignificant after the covariates were controlled? If so, does R-Square=0.06 mean that only 6% of the systematic variation was explained and 94% of it remained unexplained? Then where did the unexplained systematic variation go (was it included in the error term, which had insignificant variance)? That was where the confusion came from.

Linda K. Muthen posted on Tuesday, June 12, 2001 - 6:08 am

The way I think about this is that the slope has a significant amount of variation, that is, individuals develop at different rates. One covariate explains a significant amount of this variation, that is, that covariate is a significant predictor of these different rates of development. If the r-square is 6 percent, then 6 percent of the variation in the slopes is explained by the covariate. The rest is not explained. That's all I would be able to pull out of these results.

Anonymous posted on Tuesday, June 12, 2001 - 1:16 pm

I tried another model with four repeated measures.

1) set intercept and slope factors correlated:
- the unconditional variance of slope = 0.339
- with 4 predictors, the residual variance of the slope = 0.270
- then the explained variance of slope = (0.339-0.270)/0.339 = 0.20
- Mplus printed R-square of the slope = 0.12

2) set intercept and slope factors uncorrelated:
- the unconditional variance of slope = 0.676
- with 4 predictors, the residual variance of the slope = 0.680
- then the explained variance of slope = (0.676-0.680)/0.676 = -0.01
- Mplus printed R-square of the slope=0.06

How should I interpret these results? Thank you very much.

Linda K. Muthen posted on Wednesday, June 13, 2001 - 7:49 am

You are not computing r-square correctly. The slope variance for r-square is not taken from the model without x's. It is the model estimated variance from the model with x's. You can obtain that by asking for RESIDUAL in the OUTPUT command.

Anonymous posted on Wednesday, June 13, 2001 - 12:13 pm

I was in fact using Mplus to test the following simple models:

Model 1:
Yit=Ii+Si*Ti+Ei ... (1-1)
Ii=U0i ... (1-2)
Si=U1i ... (1-3)

Model 2:
Yit=Ii+Si*Ti+Ei ... (2-1)
Ii=a00+b01*X1+b02*X2+b03*X3+b04*X4+V0i ... (2-2)
Si=a10+b11*X1+b12*X2+b13*X3+b14*X4+V1i ... (2-3)

Where Ys are four repeated outcome measures; subscripts i and t represent individuals and time points, respectively; Xs are fixed covariates of the random coefficients, i.e., the growth factors I and S; and E and Us/Vs are random errors at level 1 (i.e., time occasions) and level 2 (individuals), respectively.

Let's focus on the slope factor S:

In Model 1, the Unconditional Var(Si)=Var(U1i) in Eq. 1-3
In Model 2, Residual variance of Si=Var(V1i) in Eq. 2-3

According to Bryk & Raudenbush (1992, p.72), the explained variation in Si in Eq. 2-3 (i.e., R-square) is calculated as: [Var(U1i)-Var(V1i)]/Var(U1i).

Following your comment, I used the RESIDUAL option in the OUTPUT command of Mplus, but the output gave the model-estimated means, intercepts, and thresholds, as well as their residual information, only for the observed Ys and Xs.

Linda K. Muthen posted on Thursday, June 14, 2001 - 3:19 pm

Sorry I meant to say TECH4 not RESIDUAL. TECH4 contains model estimated means, variances, and covariances for the latent variables. RESIDUAL has them for observed variables. Your Model 1 estimate of the Si variance is not necessarily the same as the Model 2 estimate which is obtained from TECH4. Therefore, Mplus uses the Si variance from Model 2 when computing the r-square for Model 2.
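The difference between the two R-square calculations in this exchange can be sketched with the numbers posted on June 12. The sketch assumes the printed R-square equals 1 minus the residual variance divided by the model-estimated slope variance from the same model (the TECH4 quantity described above), and it back-solves that variance from the rounded R-square, so the results are approximate. The function names are hypothetical.

```python
def r2_two_model(var_unconditional, var_residual):
    """Proportional reduction in variance across two separate models."""
    return (var_unconditional - var_residual) / var_unconditional

def implied_total_variance(var_residual, r2):
    """Slope variance implied within one model if R2 = 1 - residual / total."""
    return var_residual / (1.0 - r2)

# (unconditional variance, residual variance, R-square printed by Mplus)
cases = {
    "correlated I and S": (0.339, 0.270, 0.12),
    "uncorrelated I and S": (0.676, 0.680, 0.06),
}

results = {}
for label, (v_uncond, v_resid, r2_printed) in cases.items():
    two_model = r2_two_model(v_uncond, v_resid)
    total = implied_total_variance(v_resid, r2_printed)
    results[label] = (two_model, total)
    print(f"{label}: two-model R2={two_model:.3f}, printed R2={r2_printed:.2f}, "
          f"implied Model 2 slope variance={total:.3f} "
          f"(Model 1 estimate={v_uncond:.3f})")
```

In both cases the slope variance implied by the model with covariates differs from the unconditional estimate, which is why the two-model formula does not reproduce the printed value and can even turn negative.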

Anonymous posted on Friday, June 15, 2001 - 8:59 am

Linda:
Thank you so much for your time and comments. To close up, I would like to come back to the original issue, which, I think, is important for many researchers who are interested in LGM.

1) When the variance of a slope factor is statistically significant, it indicates that the growth rate varies significantly, and its variation may be either random or systematic. The latter could be explained by some predictors.
2) When regressing the slope on predictors, I found that only one predictor had a significant effect on the slope and the R-square was very small (e.g., 0.06); in addition, the residual variance was not statistically significant (i.e., not different from zero).
The results are very difficult to interpret. First of all, the growth rate varied significantly across individuals; second, the variation was basically not explained by the predictors included; third, the variance of the error term was not statistically significant. How can the variation of the response measure (i.e., the slope factor here) be unexplained on the one hand, while the variance of the error term is not different from zero on the other? Where did the variation go? To my understanding, if variation of the response measure were not explained, it would be included in the error term. Your help will be highly appreciated.

bmuthen posted on Friday, June 15, 2001 - 9:33 am

The results may seem confusing, but are not difficult to interpret if you distinguish significance from size of an estimate and distinguish estimates from population values. The fact that the residual variance estimate is not significantly different from zero does not mean that the true population value for the residual variance is small. It can be quite large. All it may mean is that the sample is too small, gives too large a standard error, to reject that it is zero. I would conclude the following. First, the one predictor explains a significant part of the slope variation, although the size of that part is small. Second, although one might now expect that the residual variance is significant (since the total slope variance was found significant), its significance may have been pushed into insignificance by taking out the variance explained by the single predictor. A larger sample might have shown the residual variance to be significant. So a sizable residual variation might still be there. I would add that it is typically quite difficult to find good predictors of slopes. Hope this settles the matter.
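One way to see that a non-significant residual variance can still be large is to look at an interval rather than a test. The sketch below builds a rough symmetric 95% interval from the residual variance and Est./S.E. ratio quoted in the first post. A symmetric normal-theory interval is only an approximation for a variance parameter, and `wald_interval` is an illustrative name, not an Mplus feature.

```python
def wald_interval(estimate, z_ratio, crit=1.96):
    """Symmetric interval: estimate +/- crit * SE, with SE = estimate / z_ratio."""
    se = estimate / z_ratio
    return estimate - crit * se, estimate + crit * se

residual_variance = 0.640   # residual slope variance with covariates
ratio = 1.944               # reported Est./S.E.
total_variance = 0.725      # unconditional slope variance from the same post

lower, upper = wald_interval(residual_variance, ratio)
print(f"approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```

The interval barely includes zero, but it also includes values well above the unconditional slope variance of 0.725, so the data are compatible with a sizable residual variance.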

Anonymous posted on Sunday, April 25, 2004 - 9:31 am

Following your discussion concerning the amount of variance explained in the slope factor of a LGModel was very helpful to me.

However, I think the basic question is whether it is possible to conduct a significance test of the R-square value of a latent variable (i.e. slope factor) comparable to Fisher's F-test in standard regression analysis.

I guess my question is: is it legitimate to compute F = [R^2 (n-k-1)] / [(1-R^2) k], with n = sample size, k = number of manifest or latent predictors, and R^2 = (var(slope) - var(zeta))/var(slope), for any recursive model?

Sorry if this question is slightly off-topic in the LGM section, however, your help will be highly appreciated!

bmuthen posted on Sunday, April 25, 2004 - 9:56 am

Don't know; it is not obvious to me that it is legitimate given that the F test is based on assumptions for the variances of observed variables and here you have latent variables that have different sampling variation than the observed. Any thoughts from other readers?

Jeremy Miles posted on Sunday, April 25, 2004 - 12:40 pm

If you wanted to get a significance test then you could fix all the paths to the latent variable to zero. This model would be nested within the original model, and the chi-square difference test will give you the significance value.

If you've got two latents (e.g. slope and intercept) you might want to get the multivariate test first, by fixing all paths to both latents to zero.

Jeremy
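A minimal sketch of the chi-square difference test suggested here, using only the Python standard library. The fit values are hypothetical placeholders, not results from this thread, and `chi2_sf` is a small helper written for this example. The plain difference shown applies to ordinary ML chi-squares; with a scaled estimator such as MLR, a scaled difference test is needed instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        a = 0.5
        q = math.erfc(math.sqrt(half))   # exact tail for df = 1
    else:
        a = 1.0
        q = math.exp(-half)              # exact tail for df = 2
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Hypothetical fit results: paths to the slope factor fixed at zero vs. free
chi2_restricted, df_restricted = 31.4, 9
chi2_full, df_full = 18.2, 5

diff = chi2_restricted - chi2_full
df_diff = df_restricted - df_full
p_value = chi2_sf(diff, df_diff)
print(f"chi-square difference = {diff:.1f}, df = {df_diff}, p = {p_value:.4f}")
```

A small p-value here would indicate that the predictors jointly explain a non-zero share of the latent variable's variance, which is the question the proposed F test was trying to answer.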

Anonymous posted on Sunday, April 25, 2004 - 2:22 pm

Bengt & Jeremy,

thank you very much for your help! The Chi-square difference test should do the job in my case.

However, just out of curiosity, does anyone have any references concerning the use of the F-test within SEM? (E.g., would it be possible to test the amount of variance explained in the manifest indicators via traditional F tests?)

Sylvana Robbers posted on Thursday, August 21, 2008 - 8:39 am

Dear dr. Muthén,

I have a question about the following simple growth model:

i s | y1@0 y2@1 y3@2;

Although the model runs, the fit is terrible. The problem is the residual variance of y3, which is not estimated by Mplus (output: *****). This variable is normally distributed, and we don't see anything strange about it.

What could be an explanation for this absent residual variance?

Thanks in advance,
Sylvana

Linda K. Muthen posted on Thursday, August 21, 2008 - 11:48 am

It is estimated. The asterisks mean that the number cannot fit in the space provided. It sounds like your outcome has large variances. We recommend rescaling variables with large variance by dividing them by a constant in the DEFINE command. We recommend keeping variances between one and ten.
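To show how the rescaling constant affects the variance, here is a small sketch with made-up data: dividing a variable by a constant c divides its variance by c squared, so a variance in the hundreds of thousands needs a constant in the hundreds to land near the recommended range. The values and the helper `rescale` are hypothetical; in Mplus the division itself is done in the DEFINE command.

```python
import statistics

def rescale(values, constant):
    """Divide every observation by the same constant."""
    return [v / constant for v in values]

y3 = [120.0, 450.0, 980.0, 1500.0, 2300.0]  # hypothetical raw scores
constant = 100.0

var_before = statistics.variance(y3)
var_after = statistics.variance(rescale(y3, constant))
print(f"variance before = {var_before:.1f}, after = {var_after:.3f}")
```

Because the variance scales with the square of the constant, the constant can be chosen from the largest observed variance and then applied to every repeated measure.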

hanneke creemers posted on Friday, August 22, 2008 - 6:03 am

Dear dr. Muthén,

Thank you very much for the explanation and advice on the problem my colleague and I encountered and posted in the previous post.

I think our problem is twofold. First, our outcome has - as you correctly conclude - large variances. We can reduce these by rescaling the variables. However, the fit of our model remains terrible. I think this is where the second problem comes in: there is a steep increase in the residual variances (residual variances after rescaling: y1=.246, y2=4.010, y3=29.74). Would you recommend a log transformation (or other method) to solve this issue?

Thanks in advance,
Hanneke

Linda K. Muthen posted on Friday, August 22, 2008 - 9:13 am

Did you rescale all of the repeated measures using the same constant? You need to do that. If so and you continue to have problems, send your files and license number to support@statmodel.com.

Justin Jager posted on Tuesday, November 18, 2008 - 12:31 pm

First, I run the following growth model.

MODEL:
Iheavy BY v308@1 v2708@1 v4708@1;
Lheavy BY v308@0 v2708@1 v4708@2;

[v308@0 v2708@0 v4708@0];

[Iheavy Lheavy];

v6782 On Iheavy;

(R-square of v6782 = .023)

Next, I regressed the IV on linear growth as well (IN CAPS)

v6782 on Iheavy;
V6782 ON LHEAVY;

(R-square of v6782 = .021)

To my surprise, the R-square of the IV in the model that regresses the IV on both the I and the L is smaller (.021) than in the model that regresses the IV on just the I (.023). I purposely used a simple example here, but this same pattern repeats itself across more complicated models that include more significant relations.

Why is the R-square going down, at worst it should stay the same, right? What am I missing?

Linda K. Muthen posted on Wednesday, November 19, 2008 - 9:13 am

With latent variables, misfit in an overidentified model can result in this happening. Also, the covariance between the intercept and slope growth factors may be negative.

Ho Wang posted on Wednesday, March 30, 2011 - 2:07 pm

Hi, I got those star lines under "Estimate" and "S.E.", but numbers under Est./S.E. and p-value. I'm just wondering if those star lines stand for those numbers that cannot be estimated. If so, I'm more confused why there are numbers for Est./S.E. and p-value. If not, is there a way to see those numbers currently represented by these star lines?
Thanks,
Ho

Residual Variances
            Estimate       S.E.  Est./S.E.  p-value
Var5       *********  *********      1.662    0.097

Linda K. Muthen posted on Wednesday, March 30, 2011 - 2:53 pm

The asterisks mean that the parameter estimate and standard error are too large to fit in the space allocated for them. You should rescale the variable. We recommend keeping variances of continuous variables between one and ten.

Ho Wang posted on Thursday, March 31, 2011 - 6:39 am

Thanks for the tips. Ho

Ho Wang posted on Thursday, March 31, 2011 - 8:19 am

Hi Linda,
I rescaled these variables and it worked. But I have an extra question about the Model Fit statistics, as they cannot simply be multiplied by the scaling factor.
Should I report the actual model fit without rescaling or the one with rescaling? If I should report the actual one, does that mean I need to run each step of my modeling twice (i.e., once with rescaling and once without)?
Or is there an easy way to convert the fit statistics from the rescaled run to the actual ones? Thanks, Ho
e.g.
*****************
With rescaling:
Loglikelihood

H0 Value -2431.397
H0 Scaling Correction Factor 4.898
for MLR

Information Criteria
Bayesian (BIC) 4964.546
********
Without rescaling:
Loglikelihood

H0 Value -5664.226
H0 Scaling Correction Factor 4.898
for MLR

Information Criteria

Bayesian (BIC) 11430.205

Linda K. Muthen posted on Thursday, March 31, 2011 - 1:13 pm

If I used scaled results, I would also use scaled fit statistics.
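The two sets of fit statistics posted above are related in a simple way, which the following sketch checks using only those posted values. It assumes BIC is computed as -2 times the loglikelihood plus a penalty that depends only on the number of parameters and the sample size. Under the usual change-of-variables argument for continuous outcomes, dividing the outcomes by constants shifts the loglikelihood by a fixed amount for a given data set, so comparisons between models fit to the same rescaled data should be unaffected.

```python
# Values copied from the post above
ll_rescaled, bic_rescaled = -2431.397, 4964.546
ll_original, bic_original = -5664.226, 11430.205

ll_shift = ll_rescaled - ll_original      # change in loglikelihood
bic_shift = bic_original - bic_rescaled   # change in BIC

# Penalty term recovered from each run: BIC + 2 * loglikelihood
penalty_rescaled = bic_rescaled + 2.0 * ll_rescaled
penalty_original = bic_original + 2.0 * ll_original

print(f"loglikelihood shift = {ll_shift:.3f}")
print(f"BIC shift = {bic_shift:.3f} (twice the loglikelihood shift = "
      f"{2.0 * ll_shift:.3f})")
print(f"penalty terms: {penalty_rescaled:.3f} vs {penalty_original:.3f}")
```

The BIC difference between the two runs is twice the loglikelihood difference and the penalty term is the same up to rounding, so the rescaled and unrescaled fit statistics differ only by a constant offset.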

Utkun Ozdil posted on Saturday, May 07, 2011 - 1:14 pm

Hi,

I have two questions about residual variances of models including covariates:

1. How can we interpret the results if the residual within-level variances are insignificant?
2. How can we interpret the results if the residual between-level variances are insignificant?

When residual variances do not significantly differ from zero should they be interpreted along with the R-square?

Thanks...

Bengt O. Muthen posted on Thursday, May 12, 2011 - 9:39 am

1-2. It really doesn't matter if a residual variance is significant or not.

Yes - you still get a point estimate of R-square.

Su Jung Park posted on Monday, January 15, 2018 - 8:50 am

Hello Dr. Muthen,

I got a non-significant negative variance for the slope (s2), and then I set it to 0. However, the STDYX estimate was "1" for s2 on s1, which is one of the pathways I hypothesized. I would love to report a standardized estimate. How can I find it? Otherwise, is there a formula I can use to generate it myself, since the program doesn't?

Thank you in advance.

Bengt O. Muthen posted on Monday, January 15, 2018 - 10:58 am

You would expect a standardized slope of 1 when you fix the residual variance at 0 for the DV (i.e. for s2). The numerator and denominator of the standardization are the same, or, in other words, the R2 is 1 because there is nothing unexplained in s2.
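A short sketch of why the standardized slope becomes 1, assuming s1 is the only predictor of s2 and the slope is positive. The unstandardized slope and the variance of s1 are made-up values, and the function names are illustrative only.

```python
import math

def stdyx_slope(b, var_x, resid_var_y):
    """Standardized slope b * SD(x) / SD(y) with Var(y) = b^2 Var(x) + residual."""
    var_y = b ** 2 * var_x + resid_var_y
    return b * math.sqrt(var_x) / math.sqrt(var_y)

def r_square(b, var_x, resid_var_y):
    """Explained share of Var(y) for a single predictor."""
    explained = b ** 2 * var_x
    return explained / (explained + resid_var_y)

b, var_s1 = 0.4, 2.5  # hypothetical unstandardized slope and Var(s1)

fixed = stdyx_slope(b, var_s1, 0.0)   # residual variance fixed at 0
free = stdyx_slope(b, var_s1, 0.3)    # some positive residual variance
print(f"residual fixed at 0: standardized slope = {fixed:.3f}, "
      f"R2 = {r_square(b, var_s1, 0.0):.3f}")
print(f"residual of 0.3:     standardized slope = {free:.3f}, "
      f"R2 = {r_square(b, var_s1, 0.3):.3f}")
```

With the residual variance at zero, the variance of s2 is entirely b squared times the variance of s1, so the standardized slope is 1 regardless of the value of b; any positive residual variance pulls it below 1.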

Su Jung Park posted on Monday, January 15, 2018 - 12:39 pm

Thank you for your time to answer my question. I have one more question for you.
I want to report a standardized estimate when I submit a manuscript. The estimate differed depending on whether or not I fixed the variance to zero, but the significance level was the same. Would it be fine to report the result without fixing it to zero? Or do I have to fix it to zero and report the unstandardized estimate? Thank you again.

Bengt O. Muthen posted on Monday, January 15, 2018 - 1:41 pm

This question is more suitable for SEMNET.