Standardization in BSEM?

Mplus Discussion > Confirmatory Factor Analysis >


Fredrik Falkenström posted on Tuesday, November 12, 2013 - 8:22 am

Hi,

I read in the article "Bayesian Structural Equation Modeling: A More Flexible Representation of Substantive Theory" from 2012 by Bengt Muthén and Tihomir Asparouhov that it is recommended to standardize the latent variable indicators in Bayesian CFA with informative priors. I assumed when I read the article that this was for pedagogical reasons (i.e. setting the prior variance to .01 will have the same meaning for all indicators). But in some circumstances it may not be a good idea to standardize (e.g. in growth curve modeling when you want to estimate the means over time). The problem then is, I guess, to know how to get the prior variances right. Is there a solution for this?

A related question, which is more of an Mplus question actually: When I tried to estimate a Bayesian CFA with unstandardized indicators Mplus would not produce standardized estimates. Why is this so?

Thank you very much in advance.

Best wishes,

Fredrik Falkenström
Department of Behavioral Sciences and Learning
Linköping University
Sweden

Bengt O. Muthen posted on Tuesday, November 12, 2013 - 8:44 am

Your assumption is correct and I agree that standardization can be dangerous, such as with models that are not "scale-free", one example being growth models as you mention. Instead one can simply scale the variables so that they are similar (say variances in the 1-10 range). Differences of this magnitude probably don't affect the impact of the priors very much, but experimenting in terms of a prior sensitivity analysis is always useful. With growth models, variances change over time, but I would think they don't change enough to change prior impact. This is in a way a bit of a research area, so more work is welcome.

Regarding your last question, I don't see why this would happen, so please send the output to Support.

Fredrik Falkenström posted on Wednesday, November 13, 2013 - 4:22 am

Thanks! If I were interested in means (which I'm not sure I am) I guess I can't scale the items using different weights (i.e. dividing or multiplying different indicators by different numbers) because then the means will be affected differently. However, I checked the variances in my sample and it seems that they are all close to 2 so I guess I could either divide them all by 2 and then use the .01 prior that you recommended in your paper, or I could use a .02 prior to reflect the larger variances?

Another question came to mind in this regard: When using the Inverse Wishart prior, you specify degrees of freedom rather than variance for the prior (although the df is translated into a variance for the prior). It seems from your article that the degrees of freedom are related only to the number of variables and not to the variance of the observed variables, is this right? Or are the degrees of freedom translated into a different prior variance if I use non-standardized indicators (with variances other than 1)?

Fredrik

P.S. I have sent my output with the error message about standardized parameters to the Support. D.S.

Bengt O. Muthen posted on Wednesday, November 13, 2013 - 2:13 pm

Let's say the prior variance is for a small cross-loading with factor variance 1. With Y variance 2, the standardized loading is obtained by dividing the loading by sqrt(2)=1.4 so not far from 1. So I don't think you need special prior variance considerations in that case.
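To make the rescaling argument concrete, here is a minimal Python sketch. It assumes a N(0, 0.01) prior on the raw cross-loading, a factor variance of 1 and an indicator variance of 2, as in the post above; the function name is hypothetical and not part of Mplus.

```python
import math

def prior_var_on_standardized_scale(prior_var, y_var, factor_var=1.0):
    """Prior variance implied for the standardized loading.

    Assumes standardized loading = raw loading * sqrt(factor_var / y_var),
    so a N(0, prior_var) prior on the raw loading rescales by factor_var / y_var.
    """
    return prior_var * factor_var / y_var

raw_var = 0.01
std_var = prior_var_on_standardized_scale(raw_var, y_var=2.0)

# 95% prior limits on each scale (1.96 standard deviations)
raw_limit = 1.96 * math.sqrt(raw_var)
std_limit = 1.96 * math.sqrt(std_var)
print(f"raw loading limit: +/-{raw_limit:.3f}")
print(f"standardized loading limit: +/-{std_limit:.3f}")
```

Under these assumptions the 95% limits move from roughly 0.20 on the raw scale to roughly 0.14 on the standardized scale, which is the sense in which a Y variance of 2 is "not far from 1".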

Here is one way to think about the IW(Sigma,df) specification, where Sigma is the covariance matrix and df is the degrees of freedom. Denote the number of variables in the covariance matrix by p. Our BSEM article appendix and Wikipedia point out that df > p+1 makes the mean exist, so a weak prior may have df=p+2. A larger df makes the prior stronger. Furthermore, the mode is

mode = Sigma/(df+p+1).

So say that you want the mode to correspond to a variance of say v. Together with df=p+2, you can then solve for Sigma in IW(Sigma,df) as

Sigma = v*(2p+3).

For instance, p=10 and v=50 gives df=12 and Sigma = 50*23 = 1150, so IW(1150,12).
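The calculation above can be sketched in a few lines of Python. This is only an illustration of the formula mode = Sigma/(df+p+1) from this post; the function name is hypothetical, and the default df=p+2 is the weak-prior choice mentioned above.

```python
def iw_scale_for_full_mode(v, p, df=None):
    """Solve mode = Sigma / (df + p + 1) for Sigma, given a target mode v."""
    if df is None:
        df = p + 2  # weak prior: smallest integer df for which the mean exists
    sigma = v * (df + p + 1)
    return sigma, df

sigma, df = iw_scale_for_full_mode(v=50, p=10)
print(f"IW({sigma},{df})")  # IW(1150,12)
```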

Fredrik Falkenström posted on Friday, November 15, 2013 - 4:35 am

I'm sorry, I have been thinking about this for some time now but I still don't get it. The df part I think I understand, but the Sigma part I don't.

When using standardized y-variables in your big-five analysis on females that I downloaded, you have p=15 and v=1 (due to standardization). That would mean Sigma = 1*(2*15+3)=33. But in your example input you define the prior as IW(0,21). Shouldn't it have been IW(33,21) then?

Best,

Fredrik

Bengt O. Muthen posted on Friday, November 15, 2013 - 4:43 pm

Yes, IW prior specifications are a bit awkward. By "Sigma" we mean a covariance matrix and we refer to an element that is either a variance (a diagonal element of Sigma) or a covariance (an off-diagonal element) in that matrix. In my post I was talking about variances, but my paper considered priors for covariances. Hence the zero in IW(0,21). So the mode for the covariance is zero in that case which is what we want to keep residual covariances small. Hope I said that right.

Fredrik Falkenström posted on Saturday, November 16, 2013 - 12:23 pm

Thanks, that was clarifying. But my original question was actually whether the df for the covariances needs to change if the variances of the observed variables change. Or is the df independent of the variance of the observed variables (it seems that way to me, but I want to be sure before I go on with my analyses)? Sorry for the unclear question.

I am also still a bit confused by your explanation about the prior for the variances. I thought that the p1-p15 in the input statement of your big five females example referred to the diagonal element (i.e. variances) and p16-p120 referred to the covariances? In that case, you specify a prior of IW(1,21) for the variances, but shouldn't it have been IW(33,21) according to your formula (see my previous posting for this calculation). Or does p1-p15 refer to something else?

Bengt O. Muthen posted on Monday, November 18, 2013 - 2:23 pm

The df does not depend on the variance size. You can think of the increasing number of df as having prior information giving an increasingly large addition to the sample data.

The df has to be the same for all Sigma elements.

Regarding your last question, yes, back then I did not go by the advice I am giving you now.
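One way to see why the df is separate from the variance size: a diagonal element of an inverse-Wishart matrix has an inverse-gamma marginal with shape (df-p+1)/2 (a standard result, taken as an assumption here), so the prior's relative spread, SD divided by mean, depends only on df and p and not on the scale. A minimal Python sketch with a hypothetical function name:

```python
import math

def relative_sd(p, df):
    """SD / mean for a diagonal element of IW(Sigma, df) of dimension p.

    Assumes an inverse-gamma marginal with shape (df - p + 1) / 2, for which
    variance = mean**2 / (shape - 2) whenever shape > 2. Sigma cancels out.
    """
    shape = (df - p + 1) / 2
    if shape <= 2:
        return math.inf  # the prior variance does not exist for such a weak prior
    return 1.0 / math.sqrt(shape - 2)

p = 15
for df in (17, 21, 31, 51):
    print(f"df={df}: SD/mean = {relative_sd(p, df):.3f}")
```

With p=15 the ratio drops from about 0.82 at df=21 to about 0.25 at df=51, whatever value the variance itself takes.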

Fredrik Falkenström posted on Wednesday, November 20, 2013 - 12:29 am

Thanks! Now I think I get it. I hope you will bear with me for two last questions : )

1. The zero prior for the covariances is given, so that is no problem. However, the prior for the residual variances (i.e. the "v" in your formula) is more open. Would it make sense to assume that the residual variances are likely to be considerably smaller than the observed variables' variances (for example, residual variances in the range of 1/3 to 1/2 of the observed variable variances)?

2. Are there any rules of thumb for how large residual correlations can be (or how many moderately large residual correlations there can be) before the model will seem problematic? In your paper I get the sense that you interpret the fact that only one residual correlation was above .50 as a sign that there is no problem with the model, or did I misunderstand that? It seems from my experiments so far that model fit is often excellent when allowing estimation of residual correlations using small variance priors, but as you state in your paper there may still be problems with the model if too many residual correlations are large.

Bengt O. Muthen posted on Wednesday, November 20, 2013 - 1:41 pm

1. Sure. Or you can run ML to get those variances.

2. No rules of thumb yet - this is a research area where we have to gather experience.

Fredrik Falkenström posted on Thursday, November 21, 2013 - 11:27 pm

Thanks!

Fredrik

Fredrik Falkenström posted on Sunday, March 23, 2014 - 1:18 pm

I am trying to use the Inverse Wishart distribution as a prior for estimating the variances and covariances among latent factors in a longitudinal CFA model. There are three factors all estimated at ten time-points. The variances are supposed to be close to 1, so I tried the formula Sigma = v*(2p+3), and arrived at IW(63,36) (i.e. 1*(2*30+3)) for the factor variances. However, it seems like this must have been wrong, because I get factor variance estimates up to about 60. Have I misunderstood something?

Thank you very much for your help!

Best,

Fredrik Falkenström

Tihomir Asparouhov posted on Tuesday, March 25, 2014 - 1:25 pm

On top of page 57,
http://statmodel.com/download/BayesAdvantages18.pdf
you can find the marginal distribution of your diagonal elements and from here
http://en.wikipedia.org/wiki/Inverse-gamma_distribution
you can get the marginal mean and mode.

The marginal mode is
Sigma/(df-p+3)
The full distribution mode is
Sigma/(df+p+1)

If the DF is large enough those lead to the same setting but not otherwise. In your case the DF is not large enough.

In BSEM you vary the DF parameter as part of the sensitivity analysis. As you increase the DF you get priors with smaller variances.

If you want the variances to be tight enough around 1 you need to choose a bigger DF. Also, once you decide what DF you want to use, consider the difference between the marginal and full mode formulas above. To match the marginal mode use
marginal mode=Sigma/(df-p+3),
which with DF=p+2 would be
Sigma=5*v.

Alternatively, use the mean formulas instead of the mode. Those are the same for the full and the marginal distributions
mean=Sigma/(df-p-1)
With DF=p+2,
Sigma=v.

Ultimately setting up the prior shouldn't really affect the results.
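To see how far apart these summaries can be, the three formulas in this post can be applied to the IW(63,36) prior with p=30 from the question above. This is a minimal Python sketch; the function name is hypothetical.

```python
def iw_diagonal_summaries(sigma, df, p):
    """Summaries for a diagonal element of IW(Sigma, df), using the formulas above."""
    return {
        "full mode": sigma / (df + p + 1),
        "marginal mode": sigma / (df - p + 3),
        "mean": sigma / (df - p - 1),  # requires df > p + 1
    }

summaries = iw_diagonal_summaries(sigma=63, df=36, p=30)
for name, value in summaries.items():
    print(f"{name}: {value:.2f}")
```

For this prior the full mode is about 0.94, but the marginal mode is 7 and the mean is 12.6, so the prior does not centre the individual factor variances near 1.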

Fredrik Falkenström posted on Wednesday, March 26, 2014 - 3:09 am

Dear Tihomir,

Thank you very much for this explanation. A short follow-up question: One of my factors is uncorrelated with the other two (for all occasions). In this case I guess this factor will have its own IW block, and I will count degrees of freedom for this factor separately from the other two, right? i.e. with ten occasions (and the first variance fixed at 1) the p will be 9 for the covariance matrix for this variable and 18 for the matrix of the other two (which are correlated)? So df will be 9 + whatever value I chose for the first factor and 18 + something for the other two?

Again, thank you very much for your help!

Fredrik

Tihomir Asparouhov posted on Wednesday, March 26, 2014 - 2:37 pm

Yes. It is also possible to add an IW prior that includes the initial point, i.e., IW blocks of size 10 and 20.
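The block sizes discussed in the last two posts can be tallied as follows. This is a sketch only: the function name is hypothetical, and the df offset of 2 (df=p+2) is just the weak-prior choice from earlier in the thread, not a recommendation for this model.

```python
def iw_block(n_factors, n_occasions, include_first=False, df_offset=2):
    """Dimension p and df for one IW block of correlated longitudinal factors.

    With include_first=False the first occasion is left out of the block
    (its variance is fixed at 1), so each factor contributes n_occasions - 1.
    """
    occasions = n_occasions if include_first else n_occasions - 1
    p = n_factors * occasions
    return p, p + df_offset

print(iw_block(1, 10))                      # (9, 11)
print(iw_block(2, 10))                      # (18, 20)
print(iw_block(1, 10, include_first=True))  # (10, 12)
print(iw_block(2, 10, include_first=True))  # (20, 22)
```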

Fredrik Falkenström posted on Thursday, March 27, 2014 - 10:49 am

Great, thanks!

Fredrik

Mayara Monteiro posted on Tuesday, November 19, 2019 - 1:48 am

Hi,

I have 5 latent variables:
- 4 measured by categorical indicators (5- and 6-point Likert scales)
- 1 measured by 2 continuous indicators and 1 binary indicator.

I have declared the categorical and binary indicators as categorical and standardized the continuous indicators.

Do I need to have all the indicators in the same scale (standardize also the categorical variables) for defining the small cross-loadings priors in the CFA when having mixed indicators (categorical and continuous) in the BSEM approach? What are the consequences for interpretability of the standardization of categorical variables?

If not, should I define different priors for the cross-loadings of "categorical composition only" and "mixed composition" latent variables due to the differences in scales?

Moreover, how should I specify the Inverse Wishart priors for the residual variances, having categorical and continuous indicators?

I would really appreciate a pointer to an example or paper where I can better see these and possibly other particularities of BSEM with mixed indicators.

Thank you in advance!

Bengt O. Muthen posted on Wednesday, November 20, 2019 - 4:16 pm

Don't try to standardize the categorical variables. Their latent response variable variances are approximately 1 so that goes well together with variances approximately 1 for the continuous variables.

Don't get into Inverse Wishart if you don't have to.

I don't know of a pedagogical paper on this blend of variables.

Mayara Monteiro posted on Sunday, November 24, 2019 - 10:36 am

Thank you for clarifying about scale compatibility.

I have performed the sensitivity analysis for cross-loadings in the CFA model and the PPP peaks (PPP=0.106) at the prior variance 0.03. For the complete model, the PPP=0.193, but some cross-loadings are significant and some causal relations lose their significance (compared to Bayesian SEM with default diffuse priors – PPP=0.025). Theoretically, it does not make sense to free these significant cross-loadings. Thus, I have tried different variances in the complete model and have noticed that for N(0,0.002) all model relationships are significant and the PPP=0.101 (the CFA model only has a PPP=0.017). Any variance above 0.002 makes some cross-loadings significant and relations in the model non-significant.

My questions are:
1. Is it correct to try different cross-loading priors for the entire model, instead of only considering the CFA model?
2. I found that “A PPP value above 0.05 indicates acceptable fit and a PPP value around 0.5 indicates an excellent-fitting model”, but the slides “Bayesian Analysis Using Mplus” say that a PPP=0.068 indicates a Poorly Fitting Model. Starting from which value of PPP is the model considered good and not only acceptable?
3. Should I stop there or continue looking for improvements in the PPP by adding priors for the residual correlations?

Thank you very much for your time and patience in helping me.

Tihomir Asparouhov posted on Monday, November 25, 2019 - 1:36 pm

1. Yes

2. We generally recommend 0.05. You can change that with some effort; see Section 2.3.6 of
http://statmodel.com/download/BayesFit.pdf

3. It depends. If residual correlations are theoretically more acceptable than cross-loadings then you should pursue that.

Note that after running BSEM and identifying the potential cross-loadings, you can estimate the model only with those cross loadings (no priors). That will recover your ability to detect significant relations.
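As a small illustration of the 0.05 cutoff, the PPP values quoted in the posts above can be checked in Python. The function name is hypothetical, and the sketch only applies the cutoff; it says nothing about how Mplus computes the PPP.

```python
def ppp_is_acceptable(ppp, cutoff=0.05):
    """Apply the 0.05 PPP cutoff recommended in the thread."""
    return ppp > cutoff

# PPP values quoted in the posts above
reported_ppp = [0.017, 0.025, 0.068, 0.101, 0.106, 0.193]
for ppp in reported_ppp:
    label = "acceptable" if ppp_is_acceptable(ppp) else "poor"
    print(f"PPP={ppp:.3f}: {label}")
```

Note that 0.068 passes a strict 0.05 cutoff even though the slides quoted above describe that model as poorly fitting.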