Priors for residual covariances in BSEM
 Benedict Orindi posted on Thursday, March 10, 2016 - 4:16 am
Dear Bengt,

In your paper "Bayesian Structural Equation Modeling with Cross-loadings and Residual Covariances: Comments on Stromeyer et al.", the prior for the residual covariance matrix is set to IW(dD,d) with d=100 for a sample size near 500, d=1000 for a sample size near 5000, and so on. Degrees of freedom of this magnitude seem very informative, and I don't quite follow the motivation.
Why is the choice of the degrees of freedom (d) based on sample size? Elsewhere in the statistical literature (e.g., Lesaffre and Lawson (2012), Bayesian Biostatistics), it is stated that to obtain a minimally informative inverse Wishart distribution, d should be approximately equal to p, where D is a p-by-p matrix, giving the prior IW(D, p) or IW(D, p+1). These choices depend on the number of indicators rather than on the sample size.
 Tihomir Asparouhov posted on Friday, March 11, 2016 - 10:14 am
Intuitively speaking, having a prior IW(dD,d) is like adding d observations whose variance-covariance matrix is D. If you are aiming for a certain constant level of "informativeness", then the bigger the sample size, the more of these "prior added observations" you want, so that the observed sample keeps the same level of dominance over the prior.
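A rough numerical reading of this "pseudo-observation" intuition, as a sketch only: the prior's relative weight is summarized here as d/(d+N), an illustrative heuristic assumed for this example and not a formula from the thread or from Mplus. The helper name `prior_weight` is hypothetical.

```python
def prior_weight(d: int, n: int) -> float:
    """Illustrative heuristic (assumed, not from Mplus): share of all
    'observations' contributed by d prior pseudo-observations vs. n real ones."""
    return d / (d + n)


# d scaled with N, as in the settings quoted in the thread.
for n, d in [(500, 100), (5000, 1000)]:
    print(f"N={n}, d={d}: prior weight {prior_weight(d, n):.3f}")

# d held fixed while N grows: the prior's relative weight shrinks.
for n in (500, 5000):
    print(f"N={n}, d=100: prior weight {prior_weight(100, n):.3f}")
```

Both scaled settings give the same weight (about 0.167), whereas keeping d = 100 at N = 5000 drops it to about 0.020, which is why d is tied to the sample size rather than to the number of indicators.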
 Benedict Orindi posted on Friday, March 11, 2016 - 10:28 pm
Many thanks for your response, Tihomir.
 Roosevelt Vilar Lobo de Souza posted on Friday, September 13, 2019 - 4:09 pm
Hi mplus team,

I have a question about the prior for residual correlations in a BCFA model, and I was hoping somebody could help. My model does not fit after adding priors for the cross-loadings (PPP < .05), so I am now exploring adding priors for the residual correlations. For the inverse-Wishart prior, Asparouhov, Muthén, and Morin (2015) recommend a starting value of d = 100 for a sample of 500 participants (my sample also has 500 participants). In case of non-convergence or PPP < .05, they recommend reducing or increasing d by 50.

My question is: what prior am I actually setting when d = 50 or d = 100 is used for a sample of 500? The prior for the cross-loadings is straightforward and easy to understand, but I can't figure out what the actual prior on the residuals is when I use d = 50 or 100.

I appreciate any comments. Below are the priors that I am using for the residuals. My model has 6 latent factors and 18 items (3 items per factor):

r1-r18 ~ IW (1,50);
r19-r171 ~ IW (0,50);
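The label ranges follow from the size of the residual matrix: with p items there are p residual variances and p(p - 1)/2 residual covariances, so 18 + 153 = 171 parameters here. A small Python check, assuming (as in the input above) that the variances are labeled first and all labels r1, r2, ... are consecutive; `residual_param_ranges` is a hypothetical helper, not part of Mplus.

```python
def residual_param_ranges(p: int) -> tuple[str, str]:
    """Label ranges for the p residual variances and the p(p-1)/2 residual
    covariances, assuming consecutive labels with the variances first."""
    n_var = p
    n_cov = p * (p - 1) // 2
    return f"r1-r{n_var}", f"r{n_var + 1}-r{n_var + n_cov}"


print(residual_param_ranges(18))  # ('r1-r18', 'r19-r171')
```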
 Tihomir Asparouhov posted on Saturday, September 14, 2019 - 4:24 pm
You should keep any cross-loadings that came out significant even though PPP is >0.05. I would suggest that you run type=basic and make sure there are no sample correlations close to 1 or -1 (if there are, the two items can be replaced by one item that is their sum or difference).

The prior should be set like this
r19-r171 ~ IW (0,100);
For the first 18 parameters it is more work. See page 5.
If the residual variance of the first item is 1.743 (this is the estimate from the run that includes some cross-loadings and gave you PPP<0.05), you multiply it by the number d. In your case d is 100, so you get 100*1.743=174.3, and the prior for the first parameter is then
r1~IW(174.3,100)
Repeat that for the rest of the items.
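The per-item recipe can be scripted so that the prior lines are generated from the residual variance estimates. This is a hypothetical helper (`iw_prior_lines` is not an Mplus feature); it assumes the same consecutive labeling as the input earlier in the thread (variances first, then covariances) and simply writes d * theta_i for each diagonal element and 0 for the covariances.

```python
def iw_prior_lines(residual_variances: list[float], d: int) -> list[str]:
    """Build Mplus-style IW prior lines: r_i ~ IW(d * theta_i, d) for each
    residual variance, then one IW(0, d) line for all residual covariances."""
    p = len(residual_variances)
    lines = [
        f"r{i} ~ IW({d * v:.2f},{d});"
        for i, v in enumerate(residual_variances, start=1)
    ]
    n_cov = p * (p - 1) // 2  # number of off-diagonal residual covariances
    if n_cov == 1:
        lines.append(f"r{p + 1} ~ IW(0,{d});")
    elif n_cov > 1:
        lines.append(f"r{p + 1}-r{p + n_cov} ~ IW(0,{d});")
    return lines


# Only 1.743 comes from the thread; the other two values are hypothetical.
for line in iw_prior_lines([1.743, 0.95, 1.20], d=100):
    print(line)
```

With all 18 estimates passed in, the last line becomes the r19-r171 covariance prior.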
 Roosevelt Vilar Lobo de Souza posted on Wednesday, September 18, 2019 - 10:48 pm
Dear Tihomir,

Thank you for your detailed explanation. I followed your suggestions and examined my model with d = 100, setting item-specific priors for the first 18 parameters [~ IW(residual*100, 100)]. I am testing this model in 20 countries, and when I run the analysis as you suggested, the PPP ranges from .001 to .008 across countries.

If I use a more liberal d (e.g., d = 80), I obtain acceptable PPP values (> .05). I read in your paper (Asparouhov, Muthén, & Morin, 2015) that if the starting value of d = 100 does not yield acceptable fit or convergence, one can reduce or increase d by 50 and investigate prior sensitivity to find the largest d that converges and yields a PPP > .05.

I am having trouble understanding whether d = 80 still represents a prior that constrains the residual covariances to values approximately equal to zero, or whether d = 80 is too liberal for a sample of 500 participants. Is there a way to calculate how close to zero my prior constrains the residual covariances when I use d = 80?

Many thanks, Roosevelt
 Tihomir Asparouhov posted on Thursday, September 19, 2019 - 3:45 pm
I would say that d=80 is perfect. If the value is too low/liberal, you will see very slow convergence and possibly non-convergence. Use OUTPUT: TECH1 to find the marginal prior mean and variance (at the end of the TECH1 output), or refer to the last formula on page 57 to compute them manually.
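As a complement to TECH1, the standard moment formulas for one element of an inverse-Wishart matrix can be sketched in Python. This assumes that Mplus's IW(Psi, d) uses the common parameterization with prior mean Psi/(d - p - 1), that the whole 18x18 residual matrix (p = 18) sits under one IW prior as in the input above, and a hypothetical residual variance of 1 for every item; compare the numbers with TECH1 before relying on them. `iw_marginal_moments` is a hypothetical helper.

```python
import math


def iw_marginal_moments(psi_ii: float, psi_jj: float, psi_ij: float,
                        d: int, p: int) -> tuple[float, float]:
    """Prior mean and variance of element (i, j) of Sigma ~ IW(Psi, d),
    Sigma p x p (standard inverse-Wishart moments; for a diagonal element
    pass psi_ij = psi_ii = psi_jj). The variance is finite only if d > p + 3."""
    if d <= p + 3:
        raise ValueError("variance is finite only when d > p + 3")
    mean = psi_ij / (d - p - 1)
    var = ((d - p + 1) * psi_ij ** 2 + (d - p - 1) * psi_ii * psi_jj) / (
        (d - p) * (d - p - 1) ** 2 * (d - p - 3)
    )
    return mean, var


p = 18
theta = 1.0  # hypothetical residual variance of each item
for d in (100, 80):
    # Diagonal hyperparameters d * theta, covariance hyperparameter 0.
    cov_mean, cov_var = iw_marginal_moments(d * theta, d * theta, 0.0, d, p)
    var_mean, _ = iw_marginal_moments(d * theta, d * theta, d * theta, d, p)
    print(f"d={d}: residual covariance prior mean {cov_mean:.3f}, "
          f"SD {math.sqrt(cov_var):.3f}; residual variance prior mean {var_mean:.3f}")
```

Under these assumptions the prior SD of a residual covariance between two unit-variance items is about 0.14 with d = 100 and about 0.17 with d = 80, so lowering d from 100 to 80 widens the prior around zero by roughly 23%.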
 Roosevelt Vilar Lobo de Souza posted on Tuesday, October 15, 2019 - 5:42 pm
Dear Tihomir,

Thank you for your comment. I checked the end of TECH1 as you recommended, and I found an “infinity” prior variance and prior std. dev. for some of the cross-loading parameters [e.g., Parameter 1~N(0.000,infinity); Prior variance = infinity; prior std. dev = infinity]. Does that indicate a problem in my model that I must fix?

Regarding the parameters with inverse-Wishart priors for the residual correlations [e.g., Parameter 128~IW(0.000,80)], most of them show .000 for both the prior variance and the prior std. dev. Does this indicate a problem with my model? Does it mean that the prior variance for the residual correlations was lower than .000?

Any comments or advice that you can give me would be greatly appreciated.

Many thanks, Roosevelt
 Tihomir Asparouhov posted on Wednesday, October 16, 2019 - 9:15 am
Both issues reveal that you have typos in your input file. If you can't find them, send your example to support@statmodel.com.