CFA with Bayesian estimation
 Paul A.Tiffin posted on Saturday, November 06, 2010 - 12:22 pm
Dear Team,

I really appreciate the new Bayesian features in Mplus 6.1.

However, I have a quick query: I am struggling to achieve convergence (PSR close to 1) when using Bayesian estimation with informative priors on a CFA. As with ML, would adding plausible starting values to the model help achieve convergence?
Many Thanks

 Bengt O. Muthen posted on Saturday, November 06, 2010 - 12:49 pm
Yes, it does. You can also try using STVALUES = ML in the ANALYSIS command. I typically don't see many convergence problems with Bayes. An exception, of course, is if the model is not identified, or is identified only through informative priors whose variances are too large.
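For readers unsure what the PSR convergence criterion measures, the following is a minimal sketch of the textbook Gelman-Rubin potential scale reduction for a single parameter, written in plain Python. It is an illustration under assumptions: the function name `psr` and the toy chains are hypothetical, and the computation Mplus performs internally may differ in detail.

```python
from math import sqrt
from statistics import mean, variance

def psr(chains):
    """Textbook Gelman-Rubin potential scale reduction for one parameter.

    chains: list of equal-length lists of posterior draws, one list per chain.
    """
    n = len(chains[0])                             # draws per chain
    w = mean(variance(c) for c in chains)          # average within-chain variance
    b = n * variance([mean(c) for c in chains])    # between-chain variance
    pooled = (n - 1) / n * w + b / n               # pooled variance estimate
    return sqrt(pooled / w)

# Hypothetical draws: the first pair of chains overlaps, the second does not.
mixed = [[0.48, 0.52, 0.50, 0.51], [0.49, 0.53, 0.50, 0.47]]
stuck = [[0.10, 0.12, 0.11, 0.09], [0.90, 0.88, 0.91, 0.92]]
print(psr(mixed), psr(stuck))
```

Chains that explore the same region give a value near 1, while chains stuck in different regions give a value far above 1, which is the pattern a convergence check looks for.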
 Mike Zyphur posted on Tuesday, March 13, 2012 - 8:30 pm
Hi Bengt,
Have you noticed shorter estimation times in very complex models using such empirical priors? They would seem to reduce the need for a substantial burn in phase.

 Bengt O. Muthen posted on Wednesday, March 14, 2012 - 10:17 am
In my current experience, I don't see much of a benefit of using intelligent starting values - I don't use that approach.
 luk bruyneel posted on Thursday, April 25, 2013 - 1:29 pm
Hello Dr. Muthen,

The syntax below is based on your example 9.19. Could you tell me which statements I need to add to check measurement invariance, as in the third example in 'Bayesian SEM: A more flexible representation'? I found file 'run23' of that paper, but have no idea how to implement it in this multilevel analysis. Any suggestions are welcome. Many thanks, Luk


OPSw BY A_1_4 A_1_5 A_1_18;
RNMDw BY A_1_2 A_1_7 A_1_13 A_1_17 A_1_21 A_1_26 A_1_30;
STAFw BY A_1_8 A_1_9 A_1_12;
PARTw BY A_1_3 A_1_6 A_1_11 A_1_23 A_1_25 A_1_29;
QUALw BY A_1_19 A_1_20 A_1_24 A_1_27 A_1_28 A_1_31 A_1_32;

s1 | OPSw ON func;
s2 | RNMDw ON func;
s3 | STAFw ON func;
s4 | PARTw ON func;
s5 | QUALw ON func;

F1 BY A_1_4 A_1_5 A_1_18 A_1_3 A_1_6 A_1_11 A_1_23 A_1_25
A_1_29 A_1_15 A_1_16;
F2 BY A_1_2 A_1_7 A_1_13 A_1_17 A_1_21 A_1_26 A_1_30;
F3 BY A_1_1 A_1_8 A_1_9 A_1_12 A_1_10 A_1_22 A_1_19 A_1_20
A_1_24 A_1_27 A_1_28 A_1_31 A_1_32;

s1 s2 s3 s4 s5 F1 F2 F3 ON exp lan degree;
 Bengt O. Muthen posted on Thursday, April 25, 2013 - 9:01 pm
Not sure what you mean here. What is the "third example"? What type of measurement invariance are you interested in - across Within-Between?

Run23 is a MIMIC model with cross-loadings and direct effects which it doesn't look like you are after.
 luk bruyneel posted on Friday, April 26, 2013 - 12:15 am
Thanks for your prompt reply. The main covariate of interest is 'func'. This is coded as 0 for staff and 1 for their chiefs (with exactly one chief in each unit). I see that the factor means of chiefs are significantly higher on the within level for all but one factor. Now I want to make sure that I'm not comparing apples and oranges, as Linda says in the web video. The 'third example' I use for reference is 'Study 3: Direct effects in MIMIC modeling' from the BSEM paper, from which you concluded 'This illustrates the importance of allowing for all possible direct effects using informative, small-variance priors.' How can I implement this in my model?
Also, since I use RANDOM, is there no way to reproduce Table 23 from the BSEM paper? Should I then stick to EFA for the factor loadings, as in Table 22 of your paper?

Thanks in advance,

 Bengt O. Muthen posted on Friday, April 26, 2013 - 2:04 pm
BSEM MIMIC modeling can be done in line with the UG ex 5.32. Because you consider func as the covariate, you would apply this input to the Within level.

Even if you use RANDOM in your setup above, you would still get estimated loadings on each level because the random aspect is only for the regression slopes of the regressions of the factors on func.
 luk bruyneel posted on Wednesday, May 15, 2013 - 3:03 am
Thank you, that helped a lot. My final model is a two-level Bayesian MIMIC model with cross-loadings and direct effects that have zero-mean, small-variance priors. I have three questions related to this:

1. In my output I don't get PPC. What could be the reason?
2. Apologies, but I'm still confused about what I can conclude from a MIMIC model regarding the factor structure across groups. In your 1989 Psychometrika paper (latent variable modeling in...) I read on page 563 that it allows an investigation of hypotheses of construct validity and invariance across subpopulations. I get the invariance part, but how does this relate to construct validity exactly? Is that because the direct effects of covariates on items are tested over and above the indirect effect via the factors? Also, the Topic 1 slides (slide 175) say that the model ASSUMES the same factor loadings, observed residual variances/covariances, factor variances and covariances for all levels of the covariates. Should I thus first do a MGCFA?
3. If I have multiple covariates I want to test, would you recommend first testing them all separately, or immediately all together?

All the best,

 Bengt O. Muthen posted on Wednesday, May 15, 2013 - 8:38 am
1. We have not yet implemented PPC for multilevel models.

2. I would forget about the construct validity issue here - you are testing for invariance. MIMIC assumes what you mention, but that doesn't mean you have to do MGFA to check that. Just look at the MIMIC approach as an approximate way to discover major non-invariance in the intercepts.

3. In MIMIC I would do them all together, which is the strength of MIMIC.
 Rebecca D Rhead posted on Friday, September 20, 2013 - 4:27 am
I'm having difficulty obtaining a Posterior Predictive P-value for my CFA model.

Names are
crisis people resource future disaster cntrysde control priority species id;
Missing are all (-9999) ;
USEV = crisis people resource future disaster cntrysde control priority species;

STANDARDIZE crisis people resource future disaster cntrysde control priority species;

ESTIMATOR = bayes;
CHAIN = 4;
FBITER = 50000;
POINT = mean;

f1 BY crisis* future control priority;
f2 BY disaster* resource people;
f3 BY cntrysde* species;

f1 BY disaster resource people cntrysde species*0 (a5-a9);
f2 BY crisis future control priority*0 (b1-b4);
f2 BY cntrysde species*0 (b8-b9);
f3 BY crisis future control priority disaster resource people*0 (c1-c7);

a5-a9 ~ N(0,.01);
b1-b4 ~ N(0,.01);
b8-b9 ~ N(0,.01);
c1-c7 ~ N(0,.01);

The ppp came out at 0.000.
 Rebecca D Rhead posted on Friday, September 20, 2013 - 4:27 am
I then ran a basic version of the model without crossloadings using the syntax below and had the same problem.

Names are
crisis people resource future disaster cntrysde control priority species id;
Missing are all (-9999) ;
USEV = crisis people resource future disaster control priority cntrysde species;
CATEGORICAL are crisis people resource future disaster control priority cntrysde species;

STANDARDIZE crisis people resource future disaster cntrysde control priority species;

ESTIMATOR = bayes;

f1 BY crisis future control priority;
f2 BY disaster resource people;
f3 BY cntrysde species;


I have run each factor individually, and each produces good ppp values.

I also ran the entire model with ML estimation, which appears to run fine. Goodness of fit statistics indicate the model is good.

I have re-checked the data in STATA and cannot see a problem.

I'm completely stuck at this point.

Any help would be much appreciated.
 Linda K. Muthen posted on Friday, September 20, 2013 - 12:00 pm
Please send the ML and BAYES outputs and your license number to

You should check the data in Mplus not STATA. It is how Mplus is reading the data that is the issue. You could be reading it incorrectly in Mplus.

In the future, please do not post questions that cannot fit in one window.
 luk bruyneel posted on Tuesday, October 15, 2013 - 2:03 am
Hi Dr. Muthén,

I completed a study with a Bayesian 2-level MIMIC model. Covariates are at the lowest level. My findings are very interesting in that I find significant effects for the main covariate of interest on several latent variables. More concretely, I find that nurse managers, compared to staff nurses, give consistently higher ratings of several aspects of nurses' work environment. One could say that nurse managers don't seem to grasp what is going on.

I hypothesize that this would lead to decreased wellbeing among nurses. So is there any way that I could model this (thus add an outcome variable to this model)?
 Linda K. Muthen posted on Tuesday, October 15, 2013 - 10:34 am
You should ask this on a general discussion forum like SEMNET or Multilevelnet.
 an-tsu chen posted on Thursday, November 28, 2013 - 1:55 am
Dear Team,

When doing Bayesian estimation, we often choose the inverse-Wishart distribution as the prior for the covariance matrix of normally distributed variables. But what if some of the variables are categorical? Does this change the way we assign parameters to the IW distribution?

I have already referred to the User's Guide but did not see any information addressing this, so I came here to ask this question. Thank you!!

 Bengt O. Muthen posted on Thursday, November 28, 2013 - 8:06 am
With categorical variables you can still use IW priors for say the factor covariance matrix Psi and the residual covariance matrix Theta (e.g. off-diagonal elements can be set to have zero-mean, small-variance priors). See also our technical papers on Bayes implementation in Mplus which are on our website.
 'Alim Beveridge posted on Friday, January 24, 2014 - 9:16 pm
Dear Bengt and Linda,

I am trying to run a Bayesian CFA model similar to the one shown in Table 15 of Muthén & Asparouhov 2012 (Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17(3): 313–335); that is, one where residual covariances have inverse Wishart priors. I am basing my code on the code in run15.inp on your website. I have noticed in the code the following line that also assigns inverse Wishart priors with an estimate of 1 to the residual variances:

Could you explain why this is necessary? It is not explained in the article. I notice that when I leave it out, I get the following error:



Thanks for your help and this very useful site!
 Bengt O. Muthen posted on Sunday, January 26, 2014 - 12:20 pm
You want to give priors for the whole covariance matrix, so not just the off-diagonal covariances but also the diagonal variances. Regarding the settings for the inverse Wishart, we recently posted the following on the variances, implying that the choice of IW(1,21) is not always the best.

Here is one way to think about the IW(Sigma,df) specification, where Sigma is the covariance matrix and df is the degrees of freedom. Denote the number of variables of the covariance matrix by p. Our BSEM article appendix and Wikipedia point out that df > p+1 makes the mean exist, so a weak prior may have df=p+2. A larger df makes the prior stronger. Furthermore, the mode is

mode = Sigma/(df+p+1).

So say that you want the mode to correspond to a variance of say v. Together with df=p+2, you can then solve for Sigma in IW(Sigma,df) as

Sigma = v*(2p+3).

For instance, p=10 and v=50 gives df=12 and Sigma = 50*23 = 1150, so IW(1150,12).
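The arithmetic in this post can be sketched in a few lines of Python. This is only an illustration of the formula mode = Sigma/(df+p+1) quoted above; the function name `iw_prior_for_mode` is hypothetical and not part of Mplus.

```python
def iw_prior_for_mode(p, v, df=None):
    """Return (Sigma, df) for an IW(Sigma, df) prior whose mode equals v.

    Uses mode = Sigma / (df + p + 1); df defaults to the weak choice p + 2.
    """
    if df is None:
        df = p + 2
    if df <= p + 1:
        raise ValueError("df must exceed p + 1 for the prior mean to exist")
    sigma = v * (df + p + 1)
    return sigma, df

# Worked example from the post: p = 10 variables, target variance v = 50.
print(iw_prior_for_mode(10, 50))   # (1150, 12), i.e. IW(1150,12)
```

Passing a larger `df` shows how Sigma has to grow to keep the same mode when the prior is made stronger.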
 'Alim Beveridge posted on Sunday, January 26, 2014 - 9:41 pm
Dear Bengt,

thanks for the thorough explanation. 3 followup questions:
1. Do you still recommend IW(0,p+2) for the covariances?
2. Is it correct to conclude from your post that if I give priors to any element in the var-covariance matrix I must give priors to all elements? Suppose I have a SEM with 50 observed variables and know that the measurement model of 2 LVs (each having 5 indicators) is problematic so want to use your methods of assigning IW priors to the 90 residual covariances between and 10 variances of their indicators. Do I have to assign IW priors to all 1225 covariances and 50 variances? Or is there a way to do so only for the part of the model that I believe to have problems?
3. Are cross-loadings with 0-mean normal priors required to make this work, or can I give IW priors to the covariance matrix without also giving priors to cross-loadings?

thanks for the help.
 Tihomir Asparouhov posted on Monday, January 27, 2014 - 10:55 am
1. Yes, but the DF parameter is usually varied for sensitivity analysis

2. You can assign only to those 90 residual covariances ( ... but to be accurate ... there are 45 covariances and 10 variances ... so you will be assigning a prior to 55 parameters).

3. You need priors for the cross loadings.
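The parameter counts in point 2 follow from the fact that a p x p covariance matrix has p variances and p(p-1)/2 distinct covariances. A small Python sketch (the helper name is hypothetical) reproduces the numbers discussed in this exchange:

```python
def covariance_matrix_counts(p):
    """Number of variances and distinct covariances among p variables."""
    variances = p
    covariances = p * (p - 1) // 2
    return variances, covariances

for p in (10, 50):
    variances, covariances = covariance_matrix_counts(p)
    print(p, variances, covariances, variances + covariances)
```

For the 10 indicators this gives 45 covariances and 10 variances (55 parameters); for all 50 observed variables it gives the 1225 covariances and 50 variances mentioned in the question.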
 Tihomir Asparouhov posted on Monday, January 27, 2014 - 2:43 pm
This is a clarification on 3. You need priors for unidentified cross loadings, such as when all cross loadings are included. If you are using cross loadings that are identifiable (for example, when only some cross loadings are included in the model), you don't need priors.
 Bengt O. Muthen posted on Monday, January 27, 2014 - 2:55 pm
And, the use of cross-loading priors is not tied to the use of IW priors.
 'Alim Beveridge posted on Sunday, February 09, 2014 - 7:47 pm
Thanks for all the guidance. A clarification question: does a smaller DF value for an IW prior mean a stricter (more informative) prior (as with a normal prior's variance) or does it mean a less informative prior?

 'Alim Beveridge posted on Sunday, February 09, 2014 - 8:28 pm
Please ignore my above question. I see you already answered it in an earlier post: "a weak prior may have df=p+2. A larger df makes the prior stronger."
 'Alim Beveridge posted on Sunday, February 09, 2014 - 10:52 pm
Dear Bengt,

1. A followup question to your above post about an updated approach to setting IW priors for residual variances.

I am trying a Bayesian CFA model with 27 observed variables (all continuous) and 7 LVs. When I don't estimate residual covariances at all I get a very poor fit. If I estimate all possible residual covariances following the example in your Muthén & Asparouhov 2012 Psych Methods paper (resid variances have priors c1-c27~IW(1,29) and resid covariances have priors p1-p351~IW(0,29) ) I get good fit and in addition the loadings for most indicators on almost all LVs improve noticeably. In addition, 18 resid covariances are significant.

Then I try the new recommendation. Given that when I do not add priors the residual variances are in the range .2-.8 with most being around .5, I set the Sigma for resid variance priors to achieve a mode of .5: .5(27*2+3) = 28.5
However I get the following error for any Sigma > 3:



For Sigma = 2 or 3 the results are very similar to Sigma = 1.
Can you explain why the error occurs for Sigma > 3?

 'Alim Beveridge posted on Sunday, February 09, 2014 - 10:53 pm
This did not fit in the above post, given the size limits of posts:

2. In your paper you write, "An excellent-fitting model is expected to have a posterior predictive p value around .5 and an f statistic difference of zero falling close to the middle of the confidence interval."
When I use a Sigma = 1 for the resid variances I get the following fit:
Chi-sq diff 95% CI: (-99.122 56.210)
p= 0.726

How should I interpret this given that the CI is skewed towards the negative and that p is greater than .5 rather than less than .5?

 Bengt O. Muthen posted on Monday, February 10, 2014 - 11:27 am
1. Send output and data to Support.

2. p = 0.726 is still a good fit. I don't think it is important that it is greater than .5.
 Ted Fong posted on Sunday, April 20, 2014 - 9:09 am
Dear Dr. Muthen,

I am evaluating the factor structure of a 9-item scale via specifying 3 Bayes CFA models (one-factor, three-factor, and bifactor model with 1 general and 2 specific factors) using informative priors for the residual covariances. All of the 3 models show a good fit with PPP around .5.

1) My intention is to compare them via information criteria shown in the output. However, the DIC and BIC appear to show conflicting results:
1-factor model: DIC = 21017, BIC = 21344
3-factor model: DIC = 21008, BIC = 21494
bifactor model: DIC = 20714, BIC = 21684

The 1-factor model has the highest DIC but the lowest BIC. The bifactor model, on the other hand, has the lowest DIC but highest BIC. I know that BIC is favored over AIC in mixture modeling. Do you have a preference for using the DIC or BIC in comparing Bayesian CFA models?

2) The Estimated Number of Parameters (pD) is positive in the 1- or 3-factor CFA models but is negative (-248.970) in the bifactor model. Does this negative pD signify a problem for the bifactor model?

Many thanks and regards,
 Tihomir Asparouhov posted on Monday, April 21, 2014 - 12:45 pm
1) DIC tends to select over-fitted models (it is very similar to AIC).

2) Usually a large negative pD value signifies a problem. Try to diagnose it with the trace plots for the parameters. Perhaps some kind of sign switching is happening. You can read section 3 in
for some ideas.
 Ted Fong posted on Sunday, April 27, 2014 - 9:02 am
Dear Dr. Asparouhov,

Thanks for your reply and expert opinions.

Following your advice, I checked the trace plots for the parameters in the bifactor model. Yes, the major loadings on the 2 specific factors showed some kind of sign switching and bimodal posterior distributions with wide 95% CI e.g. (-0.384, 0.877).

I have read section 3.1 of your paper but am still not clear on the cause and proper treatment of such a non-identification problem.

1) I read from page 6 of your paper: "the only statistic that is not essentially distorted by the multiple modes is the mode of the posterior". Does this mean using POINT = MODE may help in this problem?

2) I tried constraining these major loadings to be greater than zero using model constraint. However, the model estimation did not terminate normally: UNABLE TO GENERATE PARAMETERS SATISFYING THE CONSTRAINTS. CHECK THE STARTING VALUES. THE PROBLEM OCCURRED IN CHAIN 2.

3) I also tried specifying informative priors, e.g., N(0.3, 0.01), for these major loadings. This results in a pD of 42.938 for the model with no apparent sign switching. However, in Ex 5.31 of the UG, informative priors are specified only for cross-loadings but not major loadings. I am not sure adding informative priors is a correct way to tackle the issue.

I would be grateful if you would have any comments.
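To see how tight the normal priors mentioned in this thread are, the following Python sketch converts a prior mean and variance into central 95% limits. It assumes the second argument of N(., .) is a variance, as in the "zero-mean, small-variance priors" wording used in this thread; the function name `prior_limits` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prior_limits(mean, variance, coverage=0.95):
    """Central interval of a normal prior given its mean and variance."""
    dist = NormalDist(mu=mean, sigma=sqrt(variance))
    tail = (1 - coverage) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

print(prior_limits(0.0, 0.01))   # cross-loading prior N(0, 0.01)
print(prior_limits(0.3, 0.01))   # major-loading prior N(0.3, 0.01)
```

The N(0, 0.01) prior keeps a cross-loading within roughly plus or minus 0.2, while N(0.3, 0.01) places almost all prior mass on positive loadings, which is consistent with the sign switching disappearing under that prior.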
 Daniel Seddig posted on Monday, October 20, 2014 - 5:25 am
Dear Bengt, Linda and Tihomir, concerning the calculation of the DIC, I have the following questions. I observe that BIC and DIC differ in which CFA model with approximate MI they favor. The BIC seems to penalize the additional uncertainty reflected by the prior variance of the approximate invariance constraints more strongly than the DIC does. Is this difference due to the BIC's stronger penalization of model complexity in general, to a difference in the likelihood functions, or both? Is the Bayesian loglikelihood an aggregation of loglikelihoods obtained from the posterior draws, or is it computed from the final posterior? Thank you, Daniel.
 Tihomir Asparouhov posted on Monday, October 20, 2014 - 10:42 am
It is due to BIC's stronger penalization of model complexity. If you hold the invariance parameters equal, you can expect to see more agreement between BIC and DIC. We use the final posterior for the computation of BIC and DIC.
 Dmitriy Poznyak posted on Wednesday, March 04, 2015 - 9:41 am
Hi, Mplus team,

I wonder if there's a way to request SVALUES with standardized estimates under the Bayes estimator? I am fitting a CFA on a small sample, and I'd like to first fit the Bayesian CFA and then use these estimates as plausible starting values for the WLSMV estimator.

Thank you!
 Bengt O. Muthen posted on Wednesday, March 04, 2015 - 2:43 pm
No, we only print SVALUES output with the regular model estimates.
 Dmitriy Poznyak posted on Wednesday, March 04, 2015 - 7:44 pm
Thank you for the prompt reply. Any chance unstandardized estimates can still be used as possible values in the ML model? I believe the latter assumes the values (constraints) are standardized which would affect model specification.
 Bengt O. Muthen posted on Thursday, March 05, 2015 - 10:13 am
I don't understand your question. Please clarify how you think about estimator (you mention Bayes, WLSMV and ML) and standardized vs unstandardized.
 Dmitriy Poznyak posted on Thursday, March 05, 2015 - 10:55 am
Sorry I was not clear. I am fitting a CFA model for mixed outcomes based on a relatively small sample. Because of the sample size, I first want to model the data via Bayes estimation and use the model parameters (obtained via SVALUES) to reproduce the same model via WLSMV in order to avoid convergence issues.

Now, when I fit the model using WLSMV the SVALUES (at least the loadings) are standardized and can be used to reproduce the same model on e.g. another sample. When I fit the model via Bayesian estimator, resulting SVALUES (loadings) are not standardized and using them to reproduce the same factorial structure leads to nonsensical outcomes. For instance, this is an excerpt from the SVALUES command using Bayesian estimator

f1 BY var1*2.48467;
f1 BY var2*2.13034;
f1 BY var3*1.45484;

Whenever I impose these constraints on the new model (say, either based on a different sample from the same population or based on the same sample but using a different estimator), Mplus assumes the factor loadings above are standardized, which, of course, results in the model that makes no sense.

Hence, to reiterate, I was trying to figure out whether there is a way to use these unstandardized parameters from the SVALUES output as constraints in another model, either based on another sample or using a different estimator.
 Bengt O. Muthen posted on Thursday, March 05, 2015 - 1:37 pm
I think I know what you mean. Bayes for categorical variables uses the "Theta parameterization", where the continuous latent response variables don't have variances of 1; instead, the Theta (residual) variances are 1. When you move to WLSMV you probably use the default Delta parameterization, where the lrv's have variances of 1. If this is what's happening, you want to request Parameterization=Theta in your WLSMV run; then the Bayes unstandardized estimates should work fine.
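The difference between the two metrics can be illustrated with a short Python sketch. It rests on simplifying assumptions that are not stated in the thread: each item loads on a single factor, the factor variance is 1, and the residual variance of the latent response variable is 1 under Theta. The helper name is hypothetical, and this is not a substitute for requesting Parameterization=Theta.

```python
from math import sqrt

def theta_to_delta_loading(loading, factor_variance=1.0):
    """Rescale a Theta-metric loading so the latent response variable has variance 1.

    Assumes one factor per item and a residual variance of 1 under Theta.
    """
    lrv_variance = loading ** 2 * factor_variance + 1.0   # total variance of y*
    return loading / sqrt(lrv_variance)

# Loadings quoted from the Bayes SVALUES excerpt in the earlier post.
for lam in (2.48467, 2.13034, 1.45484):
    print(lam, round(theta_to_delta_loading(lam), 3))
```

Under these assumptions every rescaled loading falls below 1, which is why values above 2 look nonsensical when they are read as if they were on the Delta metric.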
 Dmitriy Poznyak posted on Thursday, March 05, 2015 - 3:28 pm
Bingo! This is precisely what I meant. Thank you so much for clarifying this, Dr. Muthen.
 Ted Fong posted on Tuesday, March 24, 2015 - 11:41 pm
Dear Dr. Asparouhov/ Dr. Muthen,

Your recent paper on "BSEM with cross-loadings and residual covariances" recommends the use of DIC rather than BIC for model selection involving informative priors, as BIC over-penalizes BSEM model by counting approximately zero parameters as actual parameters (if my understanding is correct)

I followed your paper and conducted BSEM with residual covariances on three (1-factor, 3-factor, and bifactor) models on a 9-item measure. Since all three of my BSEM models include only informative priors (d = 300) for the same 36 residual covariances, would it make sense to you if I also compare the 3 models using BIC?

I ask because these three BSEM models show highly similar DIC but different BIC. I am not sure if I can compare their BIC when they appear to have the same specifications on informative priors. (the residual variances of the 9 items did vary slightly across the three models...)
 Tihomir Asparouhov posted on Wednesday, March 25, 2015 - 9:24 am
In that paper we recommended DIC for situations where BSEM models are compared against SEM models, we did not imply that DIC will be better if one BSEM model is compared to another BSEM model. In fact approximately zero parameters shouldn't really cause damage if one BSEM model is compared to another BSEM model - assuming they are present in both models. I don't really see why DIC and BIC would be that different in the situation you are describing. One way to figure this out is via a simulation study. You should also consider the SEM based inference for the three models as the residual covariance shouldn't really affect the choice between the three factor structures. For more insights I will have to look at the actual results. You can send them to
 Ted Fong posted on Wednesday, March 25, 2015 - 8:38 pm
Dear Dr. Asparouhov,

Specifying various cross-loading priors for 3-factor and bi-factor models resulted in PPP = .000

Fit statistics of BSEM with residual covariances (d = 300):
1-factor : pD = 39.2, PPP = .113, DIC = 11957, BIC = 12277
3-factor : pD = 41.0, PPP = .160, DIC = 11955, BIC = 12290
bi-factor : pD = 42.3, PPP = .149, DIC = 11957, BIC = 12309

All 3 models (N = 556) showed acceptable PPP and equivalent DIC. Since they specified the same informative priors, I guess I could argue that the substantially lower BIC favors the 1-factor model over the other two?

Many thanks for your insights.
 Tihomir Asparouhov posted on Thursday, March 26, 2015 - 9:47 am
I think you should use BIC results which also agree with PPP. The deviance (DIC-2*Pd) pretty much does not change across the models and it would be hard to justify adding to the complexity of the model. DIC and AIC are well known to have a bias towards more complex models as they do not penalize sufficiently for the increase in the number of parameters.

You should also run more iterations possibly double the number of iterations to check the above results replicate.

Also, to be sure that BSEM is not interfering in any way, you can run the corresponding non-BSEM models where residual covariances that are significant in the BSEM models are set as free parameters while the rest are fixed to zero.
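The deviance comparison in this reply can be checked with a few lines of Python using the fit statistics reported in the previous post. The sketch relies on the standard definition DIC = mean deviance + pD, so that DIC - 2*pD is the deviance at the point estimate; the dictionary layout and function name are hypothetical.

```python
# Fit statistics as reported in the post of March 25, 2015.
models = {
    "1-factor":  {"pD": 39.2, "DIC": 11957, "BIC": 12277},
    "3-factor":  {"pD": 41.0, "DIC": 11955, "BIC": 12290},
    "bi-factor": {"pD": 42.3, "DIC": 11957, "BIC": 12309},
}

def deviance_at_point_estimate(dic, pd):
    """Deviance implied by DIC and the estimated number of parameters pD."""
    return dic - 2 * pd

for name, fit in models.items():
    deviance = deviance_at_point_estimate(fit["DIC"], fit["pD"])
    print(f"{name:10s} deviance = {deviance:.1f}  BIC = {fit['BIC']}")
```

The three deviances differ by only about 6 points, while BIC rises by 32 from the 1-factor to the bi-factor model, which is the pattern the reply describes.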
 Ted Fong posted on Sunday, March 29, 2015 - 8:26 pm
Dear Dr. Asparouhov,

Yes, I have run all 3 BSEM models using different no. of iterations and they converged rather easily in 20,000 or 40,000 iterations (PSR < 1.05 throughout)

As you suggested, I will use BIC to argue for the 1-factor BSEM model, since the increased model complexity results in higher BIC but not lower DIC.

Though 1/3 of the residual correlations are statistically significant in the 1-factor model, all of them are smaller than 0.17 and probably substantively insignificant. With reference to Section 2.5 in your paper, I intend to attribute the model misfit to minor residual correlations and treat the 1-factor model as an approximate model for the data.

Deeply appreciate your time and invaluable insights on the BSEM results.
 Andreas Stenling posted on Wednesday, May 13, 2015 - 5:40 am

I am trying to estimate a Bayesian CFA as shown in Table 15 in the BSEM paper (Muthén & Asparouhov, 2012, Psych Methods). However, when I include weakly-informative priors for cross-loadings and residual correlations I obtain the following error message:


How can I investigate what's causing this problem and are there ways to remedy this problem?

 Bengt O. Muthen posted on Wednesday, May 13, 2015 - 9:30 am
Try setting the metric of the factors by fixing their variances at 1 instead of the first loadings. If that doesn't help, send input, output, data and license number to support.
 Andreas Stenling posted on Wednesday, May 13, 2015 - 10:30 am
It helped, thanks! I'm curious, how does the scaling of the latent factor affect convergence?

 Bengt O. Muthen posted on Wednesday, May 13, 2015 - 6:46 pm
I am not entirely sure. We experimented with it in our Bayes papers on the web.
 lotti posted on Thursday, May 28, 2015 - 6:37 am
Dear Prof Muthén, I have three questions regarding the evaluation of informative priors.

1. I have a CFA model of a well-known six-item scale. Basically, there are six items loading on a single latent factor (a simple congeneric model). I derived the average loading for each of the six items from 8 available studies reporting the same model. So I now have six informative priors for replicating this model in a new dataset. These priors are on a standardized scale (i.e., all previous studies reported completely standardized loadings). Is it correct to use the "Standardize" procedure in order to tell Mplus on which scale to evaluate these priors?
3. Can I derive information about the variance of the above priors by computing the average variance of the six loadings across previous studies?
2. Is there a way to conduct a "sensitivity" analysis on informative priors? I mean, is there a way to evaluate the gain associated with removing an informative prior?
I thank you in advance for your attention, and apologise for the naivety of these questions.
 Bengt O. Muthen posted on Thursday, May 28, 2015 - 1:15 pm
1. Yes.

3. Yes.

2. A sensitivity analysis is done in the paper on our website:

Muthén, B. & Asparouhov, T. (2012). Bayesian SEM: A more flexible representation of substantive theory. Psychological Methods, 17, 313-335. The latest version is dated October 21, 2011; the 2nd version is dated April 14, 2011; the 1st version, dated September 29, 2010, contains a MIMIC section and more tables. The seven web tables referred to in the paper correspond to tables 8, 10, 17, 18, 19, 20, and 21 of the first version. Mplus inputs, data, and outputs are available on our website.

Note also my intro Bayes paper:

Muthén, B. (2010). Bayesian analysis in Mplus: A brief introduction. Technical Report. Version 3. Mplus inputs, data, and outputs used in this paper are available on our website.
 Mark LaVenia posted on Monday, August 17, 2015 - 5:30 am
I apologize in advance for the very basic question:

I am interested in applying some of the methods described in the recent Journal of Management article (Asparouhov, Muthén, & Morin, 2015).

In our initial run, I don't see the BIC or DIC reported in the output. I do see that formulas 6 through 9 in the Asparouhov et al. article do show how they are calculated. However, I am not even sure I know where in the output I find all of the values to plug into the formulas.

My questions are (a) is there a way to get the BIC and DIC to print in the output or (b) is there an applied example/annotated output that I could reference that would provide some guidance on computing these values for myself?

Fyi, I am pasting below what is printed in the model fit section of our output:


Number of Free Parameters 160

Bayesian Posterior Predictive Checking using Chi-Square

95% Confidence Interval for the Difference Between
the Observed and the Replicated Chi-Square Values

26953.272 27839.576

Posterior Predictive P-Value 0.000

Thank you,
 Bengt O. Muthen posted on Monday, August 17, 2015 - 6:25 am
Please send your full output to support so we can see what is going on.
 Mark LaVenia posted on Tuesday, August 18, 2015 - 5:56 am
Hi Bengt and Linda - Thank you. For the benefit of others, I am pasting below Linda's response after having viewed my output:

"BIC and DIC are not available with categorical outcomes using the Bayes estimator. It is not possible to compute these by hand." (Linda K. Muthen, sent Mon 8/17/2015 6:23 PM).

Along these lines then, I do have a follow up question: We were only working with the Bayes estimator at this phase because next steps in the analysis involved applying the BSEM Measurement Invariance approach described in Web Notes No. 17 (Muthén & Asparouhov, 2013).

We would have a 2- or 3-factor model with multiple-groups, where indicators are mostly binary and others are counts (which we have taken a natural log of and treat as continuous). My question is, do you foresee a problem applying the BSEM multiple-group approach given that many of our indicators are categorical (or that we have a mixture of categorical and continuous)? I ask because I wasn't expecting the limitation of using categorical DVs with Bayes that I learned about in Linda's response above--and I want to see if there are additional limitations that may apply to our planned next steps. The Fox (2010) example you provide indicates the use of binary indicators, so I assume there should be no problem.

Any caution you may have regarding this plan will be greatly appreciated.

 Tihomir Asparouhov posted on Tuesday, August 18, 2015 - 10:59 am
Mark - you shouldn't have any problems with the above plan. You can use PPP to evaluate model fit with categorical indicators or the combination of categorical and continuous.
 Mark LaVenia posted on Wednesday, August 19, 2015 - 7:14 am
Fantastic! Thank you so much.
 Martin Taylor posted on Sunday, July 24, 2016 - 10:51 pm

I am trying to run an ordinal bayesian CFA model on a 5 item measure. I would like to set the residual covariances for the items to near zero. If I understand the recommendations correctly, I want to explore IW(0,11) priors on the covariances but also need to place priors on the residual variances as well, such as IW(1,11). The problem I am running into is with the label. I can label the covariance parameters in the WITH statement, e.g., u1 WITH u2 (a6). However, when I want to label the residual variances using u1-u5 (a1-a5), Mplus tells me I cannot run this line under =bayes. The computer is expecting another estimator such as WLSMV. How do I assign a label to the residual variances to allow for a prior when the observed variables are categorical?

 Linda K. Muthen posted on Monday, July 25, 2016 - 6:17 am
Residual variances are not model parameters for categorical dependent variables. You cannot give them priors.
 Martin Taylor posted on Monday, July 25, 2016 - 12:10 pm
Thank you!

I had speculated it was the cause of my problem since with other categorical estimators the residual variances are a function of the loadings but was not entirely confident it extended to the bayesian estimator.
 Martin Taylor posted on Tuesday, October 11, 2016 - 9:31 am
I am trying to regress five items onto a single latent variable using Bayesian estimation. I have placed near-zero constraints on the residual covariances using IW(0,34), based upon five items and the sample size. This model does not converge, even when I increase the number of chains. When I remove the near-zero covariances the model converges (looks decent too; PPP = .270). Mplus is flagging the PSR for one of the residual covariances, and looking at the covariances it seems there are one or more significant and moderate residual correlations that might be creating my problem. I am trying to free up the one or more problematic covariances but am having trouble finding much documentation on the strategy. It looks like a N(0, .001) is placed on the residual covariance but I am not sure about that. Please help. Thanks!
 Bengt O. Muthen posted on Tuesday, October 11, 2016 - 10:42 am
I recommend looking at the strategies presented in the paper on our website:

Asparouhov, T., Muthén, B. & Morin, A. J. S. (2015). Bayesian structural equation modeling with cross-loadings and residual covariances: Comments on Stromeyer et al. Journal of Management, 41, 1561-1577.
 Martin Taylor posted on Wednesday, October 12, 2016 - 9:05 am
Thanks for the reference. It provided much needed procedural structure for me. Two quick follow-up questions. In Muthen & Asparouhov (2012) it looks like the 3rd method for placing near-zero priors on the residual covariances consists of placing a normal prior on the covariance, N(0, .001), and an inverse-gamma prior on the residual variances, with the MCMC chains run with a random walk. Is that understanding correct? If so, would it suggest that a normal prior should be placed on the covariances of the latent factors rather than an inverse-Wishart? Thanks for the help!
 Tihomir Asparouhov posted on Wednesday, October 12, 2016 - 2:31 pm
Martin - it makes no sense to me that you want to include residual covariance. If PPP=0.27 you should stop there and not add any more parameters. I don't see any reasons to switch to non-conjugate priors such as the normal. Our general recommendation is to start with IW(0,500) say and slowly decrease the degrees of freedom until you hit convergence problems or/and very slow convergence or PPP stops improving (but note there is no room for improvement if PPP=0.27 with IW(0,500))
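Convergence in this exchange is judged by the potential scale reduction (PSR) that Mplus flags. A minimal sketch of a PSR-style statistic for one parameter is given below, assuming the form sqrt((W + B) / W), where W is the average within-chain variance and B is the variance of the chain means. The function name `psr` and the simulated chains are hypothetical, and the exact formula Mplus uses may differ in detail.

```python
import random

def mean(values):
    return sum(values) / len(values)

def psr(chains):
    """PSR-style statistic for one parameter from equal-length chains."""
    m = len(chains)
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # W: average within-chain variance.
    within = mean([mean([(x - mu) ** 2 for x in c])
                   for c, mu in zip(chains, chain_means)])
    # B: variance of the chain means.
    between = sum((mu - grand_mean) ** 2 for mu in chain_means) / (m - 1)
    return ((within + between) / within) ** 0.5

rng = random.Random(1)
# Two chains sampling the same distribution: PSR should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]
# Two chains stuck in different regions: PSR should be clearly above 1.
stuck = [[rng.gauss(shift, 1.0) for _ in range(2000)] for shift in (0.0, 3.0)]

psr_mixed = psr(mixed)
psr_stuck = psr(stuck)
print(round(psr_mixed, 3), round(psr_stuck, 3))
```

A parameter whose chains disagree, such as a weakly identified residual covariance, keeps B large relative to W and therefore keeps the PSR above 1.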
 Guido Biele posted on Monday, December 04, 2017 - 8:16 am
Since version 8, Mplus automatically samples twice: first for the PPPP estimation and then for the specified model. Is it possible to turn off the first sampling for the PPPP calculation?
 Tihomir Asparouhov posted on Monday, December 04, 2017 - 4:12 pm
 Martina Oldeweme posted on Wednesday, December 13, 2017 - 1:04 am
Dear Team,
I run the same CFA (16 items, 4 factors, 4 items for each factor, N> 600) with a Bayesian estimator and a ML estimator. While the results of the ML estimator are nearly perfect, the results with Bayes diverge completely (PPP= 0). Do you have an explanation for these results? I’m particularly interested under which conditions ML and Bayes estimations diverge.
 Guido Biele posted on Wednesday, December 13, 2017 - 2:42 am
Just out of curiosity: What do you mean by "the results of the ML estimator are nearly perfect"? You can nearly perfectly recover parameters from simulated data? You have good CFI/TLI/RMSEA values?

If you want to compare ML and Bayesian estimates, it is important to know what you are taking as the Bayesian estimate: the maximum a posteriori (MAP), the mean, or the median of the posterior? Many Bayesians would take the mean of the posterior as their estimate, which will diverge from the MAP if the posterior distribution is skewed. If you implement identical models and use non-informative priors, the ML and MAP estimates should be identical (provided the MCMC chains converged--i.e., potential scale reduction factors are low--and you have a sufficient number of effective samples).
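The point about skewed posteriors can be illustrated with a small simulation. This is only a sketch: the draws come from a gamma distribution standing in for a right-skewed posterior (an assumption, not output from any Mplus run), and the mode is estimated crudely from a histogram.

```python
import random

rng = random.Random(7)
# Stand-in for posterior draws of a right-skewed parameter (hypothetical).
draws = [rng.gammavariate(2.0, 1.0) for _ in range(20000)]

post_mean = sum(draws) / len(draws)
post_median = sorted(draws)[len(draws) // 2]

# Crude mode (MAP-like) estimate: midpoint of the fullest histogram bin.
width = 0.25
counts = {}
for x in draws:
    bin_index = int(x // width)
    counts[bin_index] = counts.get(bin_index, 0) + 1
fullest = max(counts, key=counts.get)
post_mode = (fullest + 0.5) * width

print(round(post_mode, 3), round(post_median, 3), round(post_mean, 3))
```

For a right-skewed posterior the three summaries are ordered mode < median < mean, so which one is reported matters when it is compared against an ML point estimate.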
 Bengt O. Muthen posted on Wednesday, December 13, 2017 - 2:00 pm
Answer to Martina:

This is an unusual outcome. We need to see the output from both runs - send to Support along with your license number.
 Tihomir Asparouhov posted on Wednesday, December 13, 2017 - 5:09 pm
Bayes and ML are asymptotically identical and N=600 should be enough to see nearly identical results.
 Bengt O. Muthen posted on Thursday, December 14, 2017 - 11:05 am
Answer to Martina:

Your output shows that the ML fit is poor as determined by chi-square:

Chi-Square Test of Model Fit

Value 324.403
Degrees of Freedom 98
P-Value 0.0000

The Bayesian PPP is chi-square based so it is not surprising that this also rejects the model. ML and Bayes don't diverge here - and they seldom do.
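As a rough check on the chi-square result quoted above, the upper-tail p-value can be approximated with the Wilson-Hilferty normal approximation using only the Python standard library. This is an approximation chosen for illustration, not the computation Mplus performs.

```python
import math

chi2, df = 324.403, 98  # values quoted in the ML output above

ratio = chi2 / df
# Wilson-Hilferty: (X / df) ** (1/3) is approximately normal with
# mean 1 - 2 / (9 df) and variance 2 / (9 df).
v = 2.0 / (9.0 * df)
z = (ratio ** (1.0 / 3.0) - (1.0 - v)) / math.sqrt(v)
p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))

print(round(ratio, 2), round(z, 2), p_upper)
```

The chi-square value is more than three times its degrees of freedom and the approximate z is above 10, which agrees with the printed p-value of 0.0000.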
 Freya Glendinning posted on Thursday, February 22, 2018 - 9:54 am
Hi Tihomir,

In your previous post you mention that:

"Our general recommendation is to start with IW(0,500) say and slowly decrease the degrees of freedom until you hit convergence problems or/and very slow convergence or PPP stops improving"

Do you recommend this in any of your publications? If so, could you point me in the right direction...

Thank you very much.
 Tihomir Asparouhov posted on Thursday, February 22, 2018 - 11:05 am
Our most up-to-date writing on this topic is
Appendix A.

The IW(0,500) would apply only to the off diagonal elements.

Here is a sample code

y1-y5*1 (v1-v5);
y1-y5 with y1-y5*0 (c1-c10);

model prior:

This assumes that the residual variances in the regular CFA are 1. If they are not then you would use


where r1-r5 are the residual variance estimates from the original CFA.

Double check that the above approach yields PPP close to the original CFA PPP (like point 2 in Appendix A states)
 Tihomir Asparouhov posted on Thursday, February 22, 2018 - 11:09 am
Also Appendix B shows this as well as top of page 12.
 Freya Glendinning posted on Friday, February 23, 2018 - 5:22 am
Great, this is very helpful!

Thanks a lot!
 Angel Arias posted on Wednesday, May 16, 2018 - 9:43 am
Hello Linda and Bengt,
I want to compare the Mplus ML and BAYES estimators and their effect on the magnitude of factor loadings in a CFA model with binary data. I am a bit unclear on how to run the BAYES CFA, please see syntax below. Any advice?

DATA: FILE IS BAYES_CFA_test_form21_RCSS.csv; !file path
VARIABLE: NAMES ARE I1 - I44; !specify the number of items
MISSING ARE ALL (99); !missing data
MODEL: RC BY I1 - I29;
SC BY I30 - I44;
 Bengt O. Muthen posted on Wednesday, May 16, 2018 - 1:41 pm
ML and Bayes don't really give different results. Note that you have to specify link=probit if you use ML to compare to Bayes. Your input looks fine but there are settings in using Bayes which are shown in our UG. For information on Bayes, see also our Short course video and handout for Topic 9 and Topic 11 on our website.
 Nicole Tuitt posted on Sunday, July 29, 2018 - 8:06 pm

I am trying to run a CFA using a sample of 246 students from 13 different middle schools. I need to control for clusters/schools. I know how to control for fewer than 20 clusters in a structural equation model (create dummy variables for each cluster and control for the dummy variables). But how do I do this with a CFA using the Bayes estimator? I tried type=complex, but type=complex isn't compatible with estimator=bayes.


 Bengt O. Muthen posted on Monday, July 30, 2018 - 3:39 pm
Take the same dummy variable approach with Bayes (no need to use Complex for Bayes or ML here) - although with Bayes you may get better performance than ML for twolevel analysis with only 13 clusters.
 Dmitriy Poznyak posted on Thursday, August 02, 2018 - 11:42 am
Hi, Mplus team,

Is there an easy way to get UNstandardized loadings from the Bayesian CFA model with informative priors? My model is a 2nd order CFA with 7 factors which uses a small sample with non-normally distributed data. Because I use the “define: standardize” statement in order to use the priors, the items are completely standardized for the analysis. Correspondingly, I am not able to get unstandardized loadings (technically, they are a part of the output but they match my standardized loadings exactly).

When I am trying to convert standardized loadings to unstandardized by hand, my results don’t seem to match Mplus (I am testing this using the CFA model with default priors, which gives me both UNstandardized and Standardized loadings). Interestingly, if I normalize the loadings in a factor to sum to 1.0, the relative standing of the items based on Mplus computations and my own computations is virtually identical.

This said, I wonder if there may be the way to get unstandardized loadings from Mplus in my situation. If not, can you perhaps suggest an approach to convert STDYX (or STD) loadings to Unstandardized loadings so they would match Mplus' own computations.

Thank you,
 Bengt O. Muthen posted on Thursday, August 02, 2018 - 2:09 pm
You can use priors without standardizing the variables. It is true that the effects of priors are somewhat different for variables on different scales but that doesn't mean you have to standardize. You can just divide by a constant (e.g. 10) to get the observed variable variances in a reasonably common range of say 1 - 10.
 Dmitriy Poznyak posted on Thursday, August 02, 2018 - 3:25 pm
Thank you for the prompt response! This makes sense. In this case, would you recommend to still customize the priors for each variable (say, by setting the mean to something comparable to 0.6 and variance to e.g. 0.1 S.D. on the original metric) or use a "one prior fit all" approach, assuming we bring the variables to a more or less common metric? In my case, I don't see meaningful differences when I switch from the default to informative priors, so this might be a moot point, but I would appreciate your opinion on the issue.
 Bengt O. Muthen posted on Thursday, August 02, 2018 - 4:08 pm
One prior fits all seems alright to me.
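The effect of dividing an indicator by a constant can be sketched numerically. The example below uses simulated data with a known factor score and a regression slope as a stand-in for an unstandardized loading; the loading of 8, the residual SD of 6, and all variable names are hypothetical.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)

rng = random.Random(3)
factor = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Hypothetical indicator on a large scale: loading 8, residual SD 6.
y = [8.0 * f + rng.gauss(0.0, 6.0) for f in factor]
y_scaled = [value / 10.0 for value in y]  # divide by a constant, as suggested

var_ratio = cov(y, y) / cov(y_scaled, y_scaled)
slope = cov(factor, y) / cov(factor, factor)
slope_scaled = cov(factor, y_scaled) / cov(factor, factor)
corr = cov(factor, y) / math.sqrt(cov(factor, factor) * cov(y, y))
corr_scaled = cov(factor, y_scaled) / math.sqrt(
    cov(factor, factor) * cov(y_scaled, y_scaled))

print(round(var_ratio, 6), round(slope / slope_scaled, 6), round(corr, 4))
```

Dividing by 10 shrinks the variance by a factor of 100 and the unstandardized slope by a factor of 10, while the standardized relationship is unchanged. The unstandardized results on the original metric can therefore be recovered by multiplying back by the same constant.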
 j guo posted on Tuesday, August 21, 2018 - 4:10 pm
Hi, Mplus team,

I compared ML ESEM and BSEM with informative cross-loading priors. Currently, I compare model fit in terms of model rejection rate (ML LRT and Bayes PPp). I wonder if BIC is comparable between ML and Bayes models and if AIC in ML model is comparable to DIC in Bayes models. Thanks!
 Tihomir Asparouhov posted on Wednesday, August 22, 2018 - 10:12 am
Yes - these are fairly good approximations but I am not aware of any full simulation studies that confirm this. Unless you conduct your own simulation studies, I would recommend using such a comparison cautiously, i.e., consider differences >50 to be conclusive but smaller differences should be treated as inconclusive.
 Daniel Lee posted on Friday, January 11, 2019 - 8:18 am
I have conducted a Bayesian CFA (items on an ordinal scale) with diffuse priors (default). I was under the assumption that conclusions between my Bayesian CFA model and frequentist CFA model (WLSMV estimation) should be relatively the same. However, while the frequentist CFA model fit the data poorly (RMSEA = .11), the Bayesian CFA model fit the data quite well (PPP = .37). Further, factor loadings in the Bayesian CFA exceeded 1.000 (e.g., 1.566), while factor loadings in the frequentist model were all around .60 to .90.

I was wondering why my results from bayesian CFA might have diverged from frequentist CFA given that I used diffuse priors (default) in the bayesian model. Thank you so much!
 Bengt O. Muthen posted on Saturday, January 12, 2019 - 12:04 pm
Note that Bayes ordinal CFA uses a Theta parameterization whereas this is not the default in WLSMV. So scales are different unless you look at standardized solutions.

The PPP of Bayes has lower power to reject a model than the chi-2 of WLSMV. We have written about this in our initial Bayes documents on our website.
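The scale difference between the two parameterizations can be made concrete. Under the Theta parameterization the residual variance of the latent response is fixed at 1, so a loading can exceed 1 while the standardized loading stays below 1. A minimal sketch, assuming a single factor with variance 1 (an assumption; the factor variance in the model above is not reported) and a hypothetical helper name:

```python
import math

def theta_to_standardized(loading, factor_variance=1.0):
    """Standardized loading implied by a Theta-parameterization loading.

    Assumes the latent response residual variance is fixed at 1.
    """
    explained = loading ** 2 * factor_variance
    return loading * math.sqrt(factor_variance) / math.sqrt(explained + 1.0)

std_loading = theta_to_standardized(1.566)
print(round(std_loading, 3))
```

Under this assumption a Theta-metric loading of 1.566 corresponds to a standardized loading of about .84, which falls inside the .60 to .90 range reported for the WLSMV run.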
 Martin Taylor posted on Sunday, January 27, 2019 - 12:29 pm
Hi Mplus team,

Is there any reason why the interpretations for the low variance priors recommended in Muthen & Asparouhov (2011) would not also work for ordinal CFA models under the probit link in Mplus?

Since the probit link is the inverse cumulative standard normal I was thinking that a prior of N(0, 0.01) on an ordinal indicator would also imply a 95% confidence that the standardized loading lies between -.2 and +.2, just as in the case of continuous indicators. But I might be missing something.

Thank you for your help!
 Bengt O. Muthen posted on Monday, January 28, 2019 - 1:13 pm
Q1: No.

Q2: Ok if you have f@1 and use the default Delta parameterization so both the factor and the factor indicator have variance one.
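The arithmetic behind the "-.2 to +.2" reading of a N(0, 0.01) prior is short enough to verify directly. The second argument is treated as a variance, so the prior standard deviation is 0.1. This sketch uses only the Python standard library.

```python
from statistics import NormalDist

prior_variance = 0.01
prior = NormalDist(mu=0.0, sigma=prior_variance ** 0.5)  # SD = 0.1

# Central 95% interval of the prior.
lower = prior.inv_cdf(0.025)
upper = prior.inv_cdf(0.975)
print(round(lower, 3), round(upper, 3))
```

The limits are roughly -0.196 and +0.196. They can be read as bounds on a standardized loading only when the factor and the indicator both have variance one, as noted in the reply above.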
 Martin Taylor posted on Monday, January 28, 2019 - 2:52 pm
Thank you!
 Ulrich Keller posted on Monday, September 02, 2019 - 5:38 am
Dear Mplus team,

I am running Bayesian CFAs with binary indicators (default diffuse priors for now) and keep running into the same problem: While the models fit well, they converge very slowly and, more importantly, the chains don't mix well at all, with high autocorrelations even at large lags.

The last model I ran, for example, took more than 200k iterations to converge, but produced a very good fit (PPp=.463). For many parameters though, I got autocorrelations of >.9 even with a lag of 1000.

Would you consider this a problem?

For the model I mentioned, I also ran WLSMV estimation (Theta parameterization), which gave a very well-fitting solution with estimates close to the Bayesian estimates. The S.E.s were close to the Posterior S.D.s as well.
 Tihomir Asparouhov posted on Tuesday, September 03, 2019 - 9:48 am
Usually such slow convergence is related to a poorly identified part of the model. You can identify the issue by looking at which parameters have the large AR or even just by looking at which parameters have unusually large SE. Other possibilities are a large amount of missing data, perfect correlations between variables (i.e. empty cells in the bivariate distribution), or an unusually large scale for predictors. Send your example to Support if you want us to take a closer look.
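To see why high autocorrelation matters, the sketch below simulates a slowly mixing chain as an AR(1) process and computes its lag autocorrelations. The AR(1) process with coefficient 0.99 is a stand-in for poorly mixing MCMC output (an assumption, not Mplus output), and the effective sample size uses the AR(1) formula n(1 - r) / (1 + r).

```python
import random

def autocorr(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mu = sum(chain) / n
    denom = sum((x - mu) ** 2 for x in chain)
    num = sum((chain[i] - mu) * (chain[i + lag] - mu) for i in range(n - lag))
    return num / denom

rng = random.Random(11)
phi = 0.99  # hypothetical persistence of the simulated chain
chain = [0.0]
for _ in range(49999):
    chain.append(phi * chain[-1] + rng.gauss(0.0, 1.0))

ac1 = autocorr(chain, 1)
ac100 = autocorr(chain, 100)
# Effective sample size under an AR(1) approximation.
ess_approx = len(chain) * (1.0 - ac1) / (1.0 + ac1)
print(round(ac1, 3), round(ac100, 3), round(ess_approx))
```

With lag-1 autocorrelation near .99, 50,000 iterations carry the information of only a few hundred independent draws. Autocorrelations above .9 at lag 1000 imply far less information per iteration than that.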
 Georgios Sideridis posted on Thursday, January 23, 2020 - 11:19 am
Can we model new parameters that are created using Model constraint as parameters that we can specify priors for using the Model priors command in Bayesian modeling?


will it be possible to get PPPP estimates using the DIFFERENCE command in the Model priors statement? The tech note of the PPPP specifies that this may come in the near future so, just wondering if you have an update on that.

thank you!
 Tihomir Asparouhov posted on Friday, January 24, 2020 - 3:00 pm
I am afraid that the answer is no to both questions.