I read in the article "Bayesian Structural Equation Modeling: A More Flexible Representation of Substantive Theory" from 2012 by Bengt Muthén and Tihomir Asparouhov that it is recommended to standardize the latent variable indicators in Bayesian CFA with informative priors. I assumed when I read the article that this was for pedagogical reasons (i.e. setting the prior variance to .01 will have the same meaning for all indicators). But in some circumstances it may not be a good idea to standardize (e.g. in growth curve modeling when you want to estimate the means over time). The problem then is, I guess, to know how to get the prior variances right? Is there a solution for this?
A related question, which is more of an Mplus question actually: When I tried to estimate a Bayesian CFA with unstandardized indicators Mplus would not produce standardized estimates. Why is this so?
Thank you very much in advance.
Fredrik Falkenström Department of Behavioral Sciences and Learning Linköping University Sweden
Your assumption is correct, and I agree that standardization can be dangerous, for instance with models that are not "scale-free"; growth models are one example, as you mention. Instead one can simply scale the variables so that their variances are similar (say, in the 1-10 range). Differences of this magnitude probably don't affect the impact of the priors very much, but experimenting in terms of a prior sensitivity analysis is always useful. With growth models, variances change over time, but I would think they don't change enough to change the prior impact. This is in a way a bit of a research area, so more work is welcome.
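As a rough illustration of the rescaling idea (not taken from the article), the sketch below divides a variable by a constant c, which divides its variance by c squared while leaving correlations untouched. The data values and the helper name `rescale` are hypothetical.

```python
import statistics

def rescale(values, c):
    """Divide every observation by the constant c; the variance shrinks by c**2."""
    return [x / c for x in values]

# Hypothetical raw scores whose variance is far above the 1-10 range.
raw = [120.0, 95.0, 143.0, 110.0, 87.0, 131.0]
raw_var = statistics.variance(raw)

# Dividing by 10 divides the variance by 100.
scaled = rescale(raw, 10.0)
scaled_var = statistics.variance(scaled)

print(f"raw variance: {raw_var:.1f}, rescaled variance: {scaled_var:.2f}")
```

Using the same constant for every indicator keeps the relative sizes of the means intact, which matters if the means are of interest.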
Regarding your last question, I don't see why this would happen, so please send the output to Support.
Thanks! If I were interested in means (which I'm not sure I am) I guess I can't scale the items using different weights (i.e. dividing or multiplying different indicators by different numbers) because then the means will be affected differently. However, I checked the variances in my sample and it seems that they are all close to 2 so I guess I could either divide them all by 2 and then use the .01 prior that you recommended in your paper, or I could use a .02 prior to reflect the larger variances?
Another question came to mind in this regard: When using the Inverse Wishart prior, you specify degrees of freedom rather than variance for the prior (although the df is translated into a variance for the prior). It seems from your article that the degrees of freedom are related only to the number of variables and not to the variance of the observed variables, is this right? Or are the degrees of freedom translated into a different prior variance if I use non-standardized indicators (with variances other than 1)?
P.S. I have sent my output with the error message about standardized parameters to the Support. D.S.
Let's say the prior variance is for a small cross-loading with factor variance 1. With Y variance 2, the standardized loading is obtained by dividing the loading by sqrt(2), which is about 1.4, so the divisor is not far from 1. So I don't think you need special prior variance considerations in that case.
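To make the arithmetic concrete, here is a minimal sketch of how a prior variance of .01 on an unstandardized cross-loading translates to the standardized metric when the indicator variance is 2 and the factor variance is 1. It assumes the usual relation, standardized loading = loading * sd(factor) / sd(Y); the variable names are illustrative only.

```python
import math

prior_var = 0.01   # prior variance for the unstandardized cross-loading
y_var = 2.0        # variance of the indicator Y
factor_var = 1.0   # factor variance, assumed fixed at 1

# Prior standard deviation on the unstandardized scale.
prior_sd = math.sqrt(prior_var)

# Multiplier that converts an unstandardized loading to a standardized one.
scale = math.sqrt(factor_var) / math.sqrt(y_var)

# Implied prior standard deviation and 95% limit on the standardized scale.
implied_std_sd = prior_sd * scale
implied_std_limit = 1.96 * implied_std_sd

print(f"unstandardized prior sd: {prior_sd:.3f}")
print(f"implied standardized prior sd: {implied_std_sd:.3f}")
print(f"implied 95% limit for the standardized loading: +/-{implied_std_limit:.3f}")
```

The implied standardized limit is somewhat tighter than the unstandardized one, but of the same order of magnitude.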
Here is one way to think about the IW(Sigma,df) specification, where Sigma is the covariance matrix and df is the degrees of freedom. Denote the number of variables in the covariance matrix by p. Our BSEM article appendix and Wikipedia point out that df > p+1 makes the mean exist, so a weak prior may have df=p+2. A larger df makes the prior stronger. Furthermore, the mode is
mode = Sigma/(df+p+1).
So say that you want the mode to correspond to a variance of say v. Together with df=p+2, you can then solve for Sigma in IW(Sigma,df) as
Sigma = v*(2p+3).
For instance, p=10 and v=50 gives df=12 and Sigma = 50*23 = 1150, so IW(1150,12).
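The calculation above can be sketched in a few lines of Python. The function name `iw_scale_for_mode` is hypothetical; it simply solves mode = Sigma/(df+p+1) for Sigma, defaulting to the weak choice df=p+2.

```python
def iw_scale_for_mode(v, p, df=None):
    """Return (df, Sigma) such that the full IW mode Sigma/(df+p+1) equals v.

    If df is not given, the weak-prior choice df = p + 2 is used,
    in which case Sigma = v * (2p + 3).
    """
    if df is None:
        df = p + 2
    sigma = v * (df + p + 1)
    return df, sigma

# Reproduce the example: p = 10 variables, desired mode v = 50.
df, sigma = iw_scale_for_mode(v=50, p=10)
print(f"IW({sigma},{df})")
```

Passing an explicit df shows how the required Sigma grows when a stronger prior is wanted for the same mode.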
I'm sorry, I have been thinking about this for some time now but I still don't get it. The df part I think I understand, but the Sigma part I don't.
When using standardized y-variables in your big-five analysis on females that I downloaded, you have p=15 and v=1 (due to standardization). That would mean Sigma = 1*(2*15+3)=33. But in your example input you define the prior as IW(0,21). Shouldn't it have been IW(33,21) then?
Yes, IW prior specifications are a bit awkward. By "Sigma" we mean a covariance matrix and we refer to an element that is either a variance (a diagonal element of Sigma) or a covariance (an off-diagonal element) in that matrix. In my post I was talking about variances, but my paper considered priors for covariances. Hence the zero in IW(0,21). So the mode for the covariance is zero in that case which is what we want to keep residual covariances small. Hope I said that right.
Thanks, that was clarifying. But my original question was actually if the df for the covariances need to change if the variance of the observed variables change? Or are the df independent of the variance of the observed variables (it seems that way to me, but I want to be sure before I go on with my analyses)? Sorry for the unclear question.
I am also still a bit confused by your explanation about the prior for the variances. I thought that the p1-p15 in the input statement of your big five females example referred to the diagonal element (i.e. variances) and p16-p120 referred to the covariances? In that case, you specify a prior of IW(1,21) for the variances, but shouldn't it have been IW(33,21) according to your formula (see my previous posting for this calculation). Or does p1-p15 refer to something else?
Thanks! Now I think I get it. I hope you will bear with me for two last questions : )
1. The zero prior for the covariances is given, so that is no problem. However, the prior for the residual variances (i.e. the "v" in your formula) is more open. Would it make sense to think that the residual variances are likely to be considerably smaller than the observed variables' variances (for example, residual variances in the range of 1/3 to 1/2 of the observed variable variances)?
2. Are there any rules of thumb for how large residual correlations can be (or how many moderately large residual correlations there can be) before the model will seem problematic? In your paper I get the sense that you interpret the fact that only one residual correlation was above .50 as a sign that there is no problem with the model, or did I misunderstand that? It seems from my experiments so far that model fit is often excellent when allowing estimation of residual correlations using small variance priors, but as you state in your paper there may still be problems with the model if too many residual correlations are large.
I am trying to use the Inverse Wishart distribution as a prior for estimating the variances and covariances among latent factors in a longitudinal CFA model. There are three factors all estimated at ten time-points. The variances are supposed to be close to 1, so I tried the formula Sigma = v*(2p+3), and arrived at IW(63,36) (i.e. 1*(2*30+3)) for the factor variances. However, it seems like this must have been wrong, because I get factor variance estimates up to about 60. Have I misunderstood something?
The marginal mode is Sigma/(df-p+3).
The full distribution mode is Sigma/(df+p+1).
If the DF is large enough those lead to the same setting but not otherwise. In your case the DF is not large enough.
In BSEM you vary the DF parameter as part of the sensitivity analysis. As you increase the DF you get priors with smaller variances.
If you want the variances to be tight enough around 1 you need to choose a bigger DF. Also, once you decide what DF you want to use, perhaps consider the difference between the marginal and full mode formulas above. To match the marginal mode, use marginal mode = Sigma/(df-p+3), which with DF=p+2 gives Sigma=5*v.
Alternatively, use the mean formula instead of the mode. The mean is the same for the full and the marginal distributions: mean = Sigma/(df-p-1). With DF=p+2, this gives Sigma=v.
Ultimately setting up the prior shouldn't really affect the results.
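For readers who want to compare these formulas numerically, here is a small sketch that evaluates the full mode Sigma/(df+p+1), the marginal mode Sigma/(df-p+3), and the mean Sigma/(df-p-1) for a diagonal element. The function name `iw_summaries` is hypothetical, and the inputs are the IW(63,36) prior with p=30 from the question above.

```python
def iw_summaries(sigma, df, p):
    """Summaries of an IW(Sigma, df) prior for one diagonal element.

    sigma is the diagonal element of the scale matrix, df the degrees
    of freedom, and p the dimension of the covariance matrix.
    """
    summaries = {
        "full_mode": sigma / (df + p + 1),
        "marginal_mode": sigma / (df - p + 3),
        # The mean only exists when df > p + 1.
        "mean": sigma / (df - p - 1) if df > p + 1 else None,
    }
    return summaries

# The prior from the question: IW(63,36) with p = 30 factor variances.
s = iw_summaries(sigma=63, df=36, p=30)
for name, value in s.items():
    print(f"{name}: {value:.3f}")

# With df = p + 2, Sigma = 5*v matches the marginal mode and Sigma = v matches the mean.
p, v = 30, 1.0
match_mode = iw_summaries(sigma=5 * v, df=p + 2, p=p)
match_mean = iw_summaries(sigma=v, df=p + 2, p=p)
print(f"marginal mode with Sigma=5v: {match_mode['marginal_mode']:.3f}")
print(f"mean with Sigma=v: {match_mean['mean']:.3f}")
```

In this example the full mode is close to 1 while the marginal mode and the mean are much larger, which illustrates why the two mode formulas can lead to different settings when DF is small relative to p.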
Thank you very much for this explanation. A short follow-up question: One of my factors is uncorrelated with the other two (for all occasions). In this case I guess this factor will have its own IW block, and I will count degrees of freedom for this factor separately from the other two, right? i.e. with ten occasions (and the first variance fixed at 1) the p will be 9 for the covariance matrix for this variable and 18 for the matrix of the other two (which are correlated)? So df will be 9 + whatever value I chose for the first factor and 18 + something for the other two?