

Priors for residual covariances in BSEM 



Dear Bengt, In your paper "Bayesian Structural Equation Modeling with Cross-Loadings and Residual Covariances: Comments on Stromeyer et al.", it is indicated that the prior for the residual covariance matrix can be set to IW(dD,d) with d=100 (for a sample size near 500), d=1000 (for a sample size near 5000), etc. Degrees of freedom of this magnitude seem very informative, and I do not quite follow the motivation. Why is the choice of the degrees of freedom (d) based on sample size? In other statistical literature (e.g. Lesaffre and Lawson (2012), Bayesian Biostatistics), it is indicated that to obtain a minimally informative inverse-Wishart distribution, d should be taken approximately equal to p, where D is a p-by-p matrix, so that the prior is IW(D, p) or IW(D, p+1). Those choices are based on the number of indicators rather than on sample size.


Intuitively speaking, having a prior IW(dD,d) is like adding d observations that have variance-covariance matrix D. If you are aiming for a certain constant level of "informativeness", then the bigger the sample size is, the more of these "prior added observations" you want to have, so that the observed sample keeps the same level of dominance over the prior.
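The "added observations" intuition can be sketched numerically. The snippet below is only an illustration and rests on an assumption not stated in the thread: a conjugate update with known means, where a prior IW(dD, d) combined with n observations gives a posterior with d + n degrees of freedom, so the prior accounts for roughly d / (d + n) of the total information. The function name `prior_weight` is hypothetical.

```python
def prior_weight(d, n):
    """Approximate share of the posterior information contributed by the prior.

    Assumes the prior IW(d*D, d) behaves like d pseudo-observations
    added to n real observations (an illustrative simplification).
    """
    return d / (d + n)


# The two (d, n) pairs mentioned in the question
for d, n in [(100, 500), (1000, 5000)]:
    print(f"d={d}, n={n}: prior weight = {prior_weight(d, n):.3f}")

# For contrast: keeping d = 100 fixed while n grows tenfold
print(f"d=100, n=5000: prior weight = {prior_weight(100, 5000):.3f}")
```

Under this simplification, scaling d with the sample size keeps the prior's share constant, whereas a fixed d becomes progressively less influential as n grows.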


Many thanks for your response, Tihomir. 


Hi Mplus team, I have a question about the prior for residual correlations in a BCFA model, and I was hoping somebody could help me here. My model does not fit after adding priors for the cross-loadings (PPP < .05), and now I am exploring the addition of priors for the residual correlations. Using the inverse-Wishart prior, Asparouhov, Muthen, & Morin (2015) have recommended a starting value for the parameter d of 100 for a sample with 500 participants (my sample has 500 participants as well). In case of nonconvergence or PPP < .05 they recommend reducing or increasing the value of d by 50. My question is what actual prior is being set when a d of 50 or 100 is used for a sample of 500. The prior for the cross-loadings is very straightforward and easy to understand, but I can't figure out what the actual prior is that I am setting for the residuals when I use a d of 50 or 100. I appreciate any comments. Below are the priors that I am using for the residuals. My model has 6 latent factors and 18 items (3 items per factor): r1-r18 ~ IW (1,50); r19-r171 ~ IW (0,50);


You should keep any cross loadings that came out significant even though PPP is >0.05. I would suggest that you run type=basic and make sure there are no sample correlations close to 1 or -1 (if there are, the two items can be replaced by one item which is their sum/difference). The prior should be set like this: r19-r171 ~ IW (0,100); For the first 18 parameters it is more work. See page 5 of https://www.statmodel.com/download/BSEMRejoinder.pdf Suppose the residual variance of the first item is 1.743 (this is the estimate from the run that includes some cross loadings and gave you PPP<0.05). You multiply that number by d. In your case d is 100, so you get 100*1.743=174.3, and the prior for the first parameter then is r1~IW(174.3,100). Repeat that for the rest of the items.
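The multiplication for the 18 diagonal parameters can be scripted instead of done by hand. The following is a minimal sketch, assuming the parameter labels r1, r2, ... used in this thread; the helper name `iw_diagonal_priors` is hypothetical, and only the value 1.743 comes from the post above (the other residual variances are made-up placeholders to be replaced by your own estimates).

```python
def iw_diagonal_priors(residual_variances, d):
    """Build one 'r<i>~IW(d*variance,d);' prior line per residual variance."""
    lines = []
    for i, variance in enumerate(residual_variances, start=1):
        scale = d * variance  # scale parameter = d times the estimated residual variance
        lines.append(f"r{i}~IW({scale:.3f},{d});")
    return lines


# 1.743 is the example from the thread; 0.950 and 2.100 are hypothetical placeholders
example_variances = [1.743, 0.950, 2.100]

for line in iw_diagonal_priors(example_variances, d=100):
    print(line)
```

The printed lines can then be pasted into the MODEL PRIORS section alongside the single statement for the off-diagonal parameters.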


Dear Tihomir, Thank you for your detailed explanation. I have followed your suggestions and examined my model with d = 100, adding the specific residuals for the first 18 parameters [~ IW(residual*100, 100)]. I am testing this model in 20 countries, and when I run the analysis as you suggested, the PPP varies between .001 and .008 across countries. If I use a more liberal d (e.g. d = 80), I find acceptable PPP (> .05). I read in your paper (Asparouhov, Muthen, & Morin, 2015) that if the starting value of d = 100 does not yield acceptable fit or convergence, one can reduce or increase d by 50 and investigate the prior sensitivity to find the largest d for which the model converges and yields a PPP > .05. I am having trouble understanding whether d = 80 represents a prior that still constrains the residual covariances to values approximately fixed at zero, or whether d = 80 is too liberal for a sample with 500 participants. Is there a way I can calculate how close to zero my prior is constraining the residuals when I use d = 80? Many thanks, Roosevelt


I would say that d=80 is perfect. If the value is too low/liberal then you will see very slow convergence and possible nonconvergence. Use output:tech1 to find the marginal prior mean and variance (end of tech1) or refer to the last formula on page 57 to compute it manually https://www.statmodel.com/download/BayesAdvantages18.pdf 
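For a manual check, the marginal moments can be sketched as follows. This uses the textbook moments of an inverse-Wishart distribution with scale matrix Omega, d degrees of freedom and dimension p; it is an assumption here that Mplus's IW(Omega, d) follows this parameterization, so the result should be compared against TECH1 and the formula on page 57 of the linked paper. The function names, and the reuse of 1.743 for both items, are illustrative only.

```python
def iw_diag_moments(omega_ii, d, p):
    """Textbook mean and variance of a diagonal element of IW(Omega, d), dimension p."""
    if d - p - 3 <= 0:
        raise ValueError("variance is finite only when d > p + 3")
    k = d - p - 1
    mean = omega_ii / k
    var = 2.0 * omega_ii**2 / (k**2 * (d - p - 3))
    return mean, var


def iw_offdiag_moments(omega_ij, omega_ii, omega_jj, d, p):
    """Textbook mean and variance of an off-diagonal element of IW(Omega, d)."""
    if d - p - 3 <= 0:
        raise ValueError("variance is finite only when d > p + 3")
    k = d - p - 1
    mean = omega_ij / k
    var = ((d - p + 1) * omega_ij**2 + k * omega_ii * omega_jj) / (
        (d - p) * k**2 * (d - p - 3)
    )
    return mean, var


# Illustration: p = 18 items, d = 80, diagonal scale d * 1.743 for both items
# (using 1.743 for the second item is a placeholder), off-diagonal scale 0
d, p = 80, 18
omega_diag = d * 1.743
mean, var = iw_offdiag_moments(0.0, omega_diag, omega_diag, d, p)
print(f"prior mean = {mean:.4f}, prior sd = {var ** 0.5:.4f}")
```

The prior standard deviation of the off-diagonal element gives a direct sense of how tightly a given d holds a residual covariance near zero, and it can be recomputed for d = 50, 80 and 100 to compare.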


Dear Tihomir, Thank you for your comment. I have checked the end of tech1 as you recommended, and I found an "infinity" prior variance and prior std. dev. for some of the cross-loading parameters [e.g. Parameter 1~N(0.000,infinity); Prior variance = infinity; prior std. dev. = infinity]. Does that suggest a problem in my model that I must fix? Regarding the parameters with the inverse-Wishart prior for the residual correlations [e.g. Parameter 128~IW(0.000,80)], I found .000 for both the prior variance and the prior std. dev. for most of them. Does that suggest a problem with my model? Does it mean that the prior variance for the residual correlations was too small to show at three decimals? Any comments or advice that you can give me would be greatly appreciated. Many thanks, Roosevelt


Both issues reveal that you have typos in your input file. If you can't find them, send your example to support@statmodel.com.


