EFA with Bayes estimation
 Jan Ivanouw posted on Saturday, November 06, 2010 - 5:28 am
When trying to perform EFA with Bayes estimation, the output file does not contain the usual information. The usual logo appears in the top left corner (Muthén & Muthén, etc.), but apart from that, the output file is a direct copy of the input file. There is no information on calculation time or anything else, and no error messages either. Also, the output file does not appear on the screen as usual; it is only saved to disk.

What is wrong?
 Bengt O. Muthen posted on Saturday, November 06, 2010 - 5:55 am
Bayes is not yet available with EFA. See Version History for version 6.
 Jan Ivanouw posted on Saturday, November 06, 2010 - 6:29 am
Oh, I misunderstood your presentation at Johns Hopkins on August 18, 2010, where you talked about an EFA example; it must have been a non-Bayes version.

Looking forward to a Bayes-implementation!
 Bengt O. Muthen posted on Saturday, November 06, 2010 - 9:55 am
Yes, the EFA was ML. Yes, Bayes EFA will be very useful, also taking care of Heywood cases and small samples.
 Keivn Linares posted on Wednesday, November 10, 2010 - 12:03 pm
Greetings Drs. Muthen,

I understand that EFA cannot be executed using Bayes. My question is: can a more constrained model such as an ECFA be estimated using the Bayes estimator? My model did converge, but I am more concerned about whether there are negative implications, or whether it is even practical to do so with Bayes. My model is below. I would also like to take the opportunity to thank the Mplus team for devoting so much time and effort to this great tool; I am having a blast learning new things about Mplus every day.

FBITER = 20000;
STVAL = ml;

F1 BY M1-M20*.5 M10@0 M17@0;
F2 BY M1-M20*.5 M17@0 M18@0;
F3 BY M1-M20*.5 M10@0 M18@0;
 Bengt O. Muthen posted on Wednesday, November 10, 2010 - 12:53 pm
Bayes can do EFA within CFA. I don't see any negative consequences. A positive outcome is that there are no negative residual variances. In some cases there may be convergence difficulties given such a relaxed model. The real advantage with Bayes in factor analysis, I think, is the possibility discussed in:

 Jan Zirk posted on Thursday, November 01, 2012 - 7:07 am
Dear Bengt, Linda, or Tihomir,
My questions concern a comparison of the results of Bayesian and WLSMV EFA with 1-8 factors. 30 categorical items (7 categories each) were used, with a big sample (n=83548), so the Bayesian EFA took 125h while the WLSMV took less than 1h. The theory behind the tested instrument suggests 5 factors.
According to the eigenvalues, which are the same under both estimation methods, the 7-factor solution is the first with eigenvalue > 1. WLSMV computed all 8 solutions and showed goodness-of-fit indices for all of them. The Bayesian EFA provided output for up to 5 factors, and there was no convergence for 6, 7, and 8.
My first question is:
1) Can the lack of convergence for 6-8 factors be used as evidence for preferring the 5-factor solution?

In the next step, a 1-factor Bayesian CFA for all 30 items was run for item scaling (thus the mean plausible values of the 30 categorical items were extracted). These continuous measures were used in the Bayesian EFA 1-8 (which was much faster this time). This time there was convergence for the 6-factor solution, which was better than the 5-factor solution, and there was no convergence for 7 and 8.
So my second question is:
2) Would the plausible values of the latent response variables of the categorical items from a 1-factor CFA be the same as the latent response plausible values of these categorical measures from an analysis with different structural properties (e.g. a 5-factor EFA)?
Best wishes,

 Bengt O. Muthen posted on Thursday, November 01, 2012 - 9:07 pm
I would not choose the number of factors based on eigenvalues > 1. If you want to use eigenvalues, I think it is better to look for a "break" (an "elbow").
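
To illustrate the difference between the two rules, here is a minimal Python sketch. The eigenvalues are made up for illustration (they are not from the analysis above), and taking the largest drop between successive eigenvalues is only one crude way to locate an elbow; in practice a scree plot is usually inspected by eye.

```python
def kaiser_count(eigenvalues):
    """Count eigenvalues greater than 1 (the rule advised against above)."""
    return sum(1 for value in eigenvalues if value > 1.0)


def elbow_by_largest_gap(eigenvalues):
    """Return the number of factors located before the largest drop
    between successive eigenvalues (a crude stand-in for an 'elbow')."""
    ordered = sorted(eigenvalues, reverse=True)
    gaps = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    return gaps.index(max(gaps)) + 1


# Hypothetical eigenvalues, invented for this example only.
eigenvalues = [4.6, 4.0, 3.5, 3.1, 2.7, 1.15, 1.08, 0.98, 0.95, 0.90]

print(kaiser_count(eigenvalues))          # 7 eigenvalues exceed 1
print(elbow_by_largest_gap(eigenvalues))  # largest drop follows the 5th eigenvalue
```

With these invented values the eigenvalue > 1 rule keeps 7 factors, while the break in the sequence comes after the 5th eigenvalue.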

Bayes EFA is slower than WLSMV EFA when the sample size gets larger, and it also has computational difficulties when factors are weakly measured, as happens when over-factoring.

1) Maybe, but that is weak evidence and doesn't say whether 5 factors provide a good model.

2) If you want to create plausible values behind each of the 30 categorical variables, try using the "H1" model

u1-u30 with u1-u30;

Bayes then gives you plausible values for the 30 continuous u* variables.
 Jan Zirk posted on Thursday, November 01, 2012 - 9:16 pm
Thanks so much!
 Jan Zirk posted on Thursday, November 01, 2012 - 9:18 pm
I noticed the parallel analysis facility, so I will also use this approach with MLR once I have the plausible values extracted.

Best wishes,

 Bengt O. Muthen posted on Friday, November 02, 2012 - 4:15 pm
Note that parallel is for continuous variables only, so you have to assume this approximation. We noticed that it didn't work well for tetrachoric and polychoric correlation matrices.
 Jan Zirk posted on Friday, November 02, 2012 - 4:48 pm
That's right! I meant running the parallel analysis on the plausible-value measures from the Bayesian approach.
Best wishes,

 Bengt O. Muthen posted on Friday, November 02, 2012 - 4:56 pm
Did your run with

u1-u30 with u1-u30;

work out?
 Jan Zirk posted on Friday, November 02, 2012 - 5:23 pm
Still running; I think I will know in the morning (I mean in about 10h). I first had to run different analyses, which took some time. I will let you know as soon as I have the results.

 Jan Zirk posted on Friday, November 02, 2012 - 5:48 pm
To my surprise, the model has just been found identified (after 800 iterations!) and now the imputations are being generated (n=10). This holds promise for much faster further analysis.

All the best,
 Jan Zirk posted on Friday, November 02, 2012 - 6:30 pm
I would like to ask you about the Bayes factor. As far as I understand, the first piece of information on the Bayes factor in the output indicates a preference for the more complex model if the value is bigger than 3 (according to convention). Where can I find information on how the log of the Bayes factor is computed and what it is useful for? Is there an article describing how it is computed in Mplus?

My second question is: could you provide a reference to an article showing how ordinal variables are treated under Bayesian estimation in comparison to the WLSMV probit and ML logistic approaches? I am trying to understand the mechanism underlying the extraction of plausible values from categorical variables without the necessity of regressing them on a latent factor (as in the H1 model that you mentioned).

With best wishes,
 Tihomir Asparouhov posted on Tuesday, November 06, 2012 - 9:41 am
The Log of the Bayes factor is just that: the log of the Bayes factor. It does not provide any new information. It is given for convenience. You can see that way how the Bayes factor is computed.
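
As a small illustration, the Python sketch below converts a log Bayes factor back to the Bayes factor and applies the conventional cutoff of 3 mentioned in the question. It assumes the natural logarithm and uses a hypothetical log value; the function names are illustrative and are not part of Mplus.

```python
import math


def bayes_factor_from_log(log_bf):
    """Convert a natural-log Bayes factor back to the Bayes factor."""
    return math.exp(log_bf)


def exceeds_conventional_cutoff(log_bf, cutoff=3.0):
    """Check BF > cutoff on the log scale, which avoids huge numbers."""
    return log_bf > math.log(cutoff)


log_bf = 25.0  # hypothetical value, not taken from any output in this thread

print(exceeds_conventional_cutoff(log_bf))
print(bayes_factor_from_log(log_bf) > 1_000_000)
```

Working on the log scale is convenient when the Bayes factor itself is too large to print usefully.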

The Bayes EFA methodology is described in

The EFA model uses a standardized metric. The estimates and factor scores for the 3 estimators are in the same standardized metric and are directly comparable.
 Jan Zirk posted on Tuesday, November 06, 2012 - 10:52 am
Oh, thank you for the article; that is really useful. Now I understand that the default for ordered categorical variables is probit. I noticed that my first question on the 'log of the Bayes factor' was really confusing. Of course, 'log' just means the logarithm of the BF value; the BF is so high in all my models that the output shows only an approximation (>1000000). Everything is clear :-)
 Guido Biele posted on Tuesday, September 12, 2017 - 12:30 am
Model checking for Bayesian EFA:
- It seems that by default posterior predictive distributions are only generated for categorical indicator variables. Could those also be obtained for continuous indicator variables?
(e.g. show observed and predicted mean and sd of indicator variables)

- The DIC is mentioned in several places in the Mplus documentation and papers. What is the command to obtain DIC for Bayesian EFA or SEM?
 Bengt O. Muthen posted on Tuesday, September 12, 2017 - 6:25 pm
Q1: If you mean the PPC that gives PPP, that is available also for continuous outcomes.

Q2: DIC is printed automatically whenever it is available.
 Guido Biele posted on Wednesday, September 13, 2017 - 12:11 am
Resp Q1: What is the command to get PPP,PPC for continuous variables?

Resp Q2: Is there a way to get the log posterior instead of the DIC when the DIC is not available?
[It would be great if the manual said where the DIC is unavailable]
 Bengt O. Muthen posted on Wednesday, September 13, 2017 - 2:29 pm
Q1: It is printed when available.

Q2: No. A full description of DIC in the DSEM context is in the DSEM theory paper on our website:

Asparouhov, T., Hamaker, E.L. & Muthen, B. (2017). Dynamic structural equation models. Technical Report. Version 2.

The general problem with DIC is that it requires that the loglikelihood can be (easily) computed. The fact that in many cases it isn't easily computed is, however, the very reason that DSEM doesn't use ML but Bayes. For example, with categorical outcomes and many random effects, ML computations are prohibitive.
 Guido Biele posted on Thursday, September 14, 2017 - 5:06 am
If I understand it correctly, this means in essence that model comparison for Bayesian EFA with continuous and categorical variables is not possible within Mplus because (1) posterior predictive checks are only available for a subset of the data and (2) (penalized) measures of model fit are not available.

Hence, one would need to use non-Bayesian estimation in Mplus (if that is not too expensive, as you alluded to above) or use the parallel method (for example through the psych package in R) to determine the number of factors?

About computing the likelihood: Why would one need to compute the ML, could one not just evaluate the model at the maximum a posteriori after the sampling is finished?
 Tihomir Asparouhov posted on Thursday, September 14, 2017 - 4:08 pm
The PPP is available for Bayes EFA with continuous and categorical variables. Check your output carefully. It works the same way the chi-square test of fit works.
Check out section 6 in
in particular subsections 6.4 and 6.5 might be of interest. If you are not getting PPP in Bayes EFA send your run and data to support@statmodel.com
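
The idea behind the PPP can be sketched in a few lines of Python. At each retained MCMC iteration a discrepancy (for example a chi-square fit function) is computed for the observed data and for data replicated from the model at the current parameter draw; the PPP is the proportion of iterations in which the observed-data discrepancy is smaller than the replicated-data one. The numbers below are invented, and the sketch is a conceptual illustration under these assumptions, not Mplus's internal code.

```python
def posterior_predictive_p_value(observed, replicated):
    """Proportion of MCMC iterations in which the discrepancy for the
    observed data is smaller than the discrepancy for replicated data."""
    if len(observed) != len(replicated):
        raise ValueError("need one observed and one replicated value per iteration")
    if not observed:
        raise ValueError("need at least one iteration")
    hits = sum(1 for obs, rep in zip(observed, replicated) if obs < rep)
    return hits / len(observed)


# Hypothetical discrepancy values for five MCMC iterations.
observed = [31.2, 28.7, 35.1, 30.4, 29.9]
replicated = [25.0, 30.1, 27.3, 33.8, 24.6]

print(posterior_predictive_p_value(observed, replicated))  # 2 of 5 iterations -> 0.4
```

A PPP near 0.5 suggests the observed data look like data generated by the model, while a very small PPP points to misfit, much as a small chi-square p-value does.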

Regarding computing the maximum likelihood: it is possible to use the Bayes mode estimator, but estimating the mode from MCMC is not a very reliable way to compute the maximum likelihood, since estimating the multivariate mode is not easy and would require a huge number of draws. You can compute that likelihood using the ML estimator and, if need be, Monte Carlo integration (although, again, this is not needed to determine the number of factors, since you can use PPP).

Regarding label switching in Bayes Mixture estimation: If label switching occurs in Bayesian Mixture estimation we recommend adding inequality constraints among the parameters in the model constraint statement, i.e., inequalities that order the classes and prevent label switching, for example using ordering of the means of one indicator which has different means across the classes.
 Guido Biele posted on Monday, September 18, 2017 - 4:34 am
I looked at http://statmodel.com/download/BayesAdvantages18.pdf and found the chi-square PPP explained in https://www.statmodel.com/download/Bayes3.pdf. However, this was not what I was after, and maybe Mplus does not provide what I was looking for.
In my experience, a posterior predictive check compares statistics calculated from observed data with statistics calculated from model-simulated data. Importantly, the primary goal of a PPC is not to reject or accept a model, but to investigate how a model could or should be improved.
The model fit, which is the basis of the chi-square PPP, is one test statistic one can look at, but I think it is not particularly useful when investigating why the model fit is bad and how it could be improved. The posterior predictive plots for the likelihood are somewhat informative, but (I think) less informative than other statistics.
I had hoped that Mplus could also show observed and predicted means and variances of continuous indicators and observed and predicted correlations of all indicators (because for categorical variables Mplus already shows observed and simulated proportions for response categories). But it seems that this type of posterior predictive check is not supported by Mplus.
Is this correct?
 Tihomir Asparouhov posted on Monday, September 18, 2017 - 6:06 pm
Consider the two output options "tech10" and "residual" and the specialized technique for model modification discovery called BSEM.

Also I would not put a lot of weight on
"observed and predicted means and variances of continuous indicators and observed and predicted correlations of all indicators" as a model modification technique. You can ask yourself why this is not a recommended technique in the frequentist world for model modification. It is because misfits on the observed quantities are not directly related to the misfits in the structural model.
 Guido Biele posted on Tuesday, September 19, 2017 - 4:32 am

As far as I can see, the tech10 option provides the same posterior predictive p-values that I had already seen in the posterior predictive plots.

Residuals are not displayed; instead, the following error message is displayed after sampling is completed and most of the *.out file is written:

Internal Error Code: FAILED TO OPEN GROUP
An internal error has occurred. Please contact us about the error,
providing both the input and data files if possible.
(I am sending required files to the support-email)
 Guido Biele posted on Tuesday, November 21, 2017 - 1:35 am
Calculating likelihoods from Bayesian analyses.

I would like to use the R-package loo (https://arxiv.org/abs/1507.04544) to calculate the Watanabe-Akaike information criterion for Mplus models.

To do this, one needs an n-by-i matrix of log-likelihoods*, where n is the number of (independent) participants and i is the number of posterior samples.

Is it possible to obtain the log-likelihood by using one iteration from the posterior as starting values for a non-Bayesian analysis and letting Mplus write out the log-likelihood for the starting values without doing any further optimization?

Thanks in advance for your support

*I mean the simple log-likelihood, as in log(p(y_i|theta)) where y_i is the observed data of participant i and theta are the model parameters.
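
For reference, once such a matrix is available, WAIC itself takes only a few lines. The sketch below is plain Python rather than the loo package. It uses the standard definition (the log pointwise predictive density minus the summed posterior variances of the pointwise log-likelihoods, reported on the deviance scale) and assumes that rows are participants and columns are posterior draws, as described above. The example numbers are invented.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))


def waic(log_lik):
    """WAIC from log_lik[p][s]: the log-likelihood of participant p under
    posterior draw s. Each participant needs at least two draws."""
    lppd = sum(log_mean_exp(row) for row in log_lik)   # log pointwise predictive density
    p_waic = sum(variance(row) for row in log_lik)     # effective number of parameters
    return -2.0 * (lppd - p_waic)


# Hypothetical matrix: 3 participants, 4 posterior draws.
log_lik = [
    [-1.2, -1.0, -1.1, -1.3],
    [-2.4, -2.1, -2.6, -2.2],
    [-0.8, -0.9, -0.7, -0.8],
]

print(waic(log_lik))
```

Lower values indicate better expected out-of-sample prediction; the loo package additionally reports standard errors and diagnostics that this sketch leaves out.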
 Tihomir Asparouhov posted on Tuesday, November 21, 2017 - 4:04 pm
Yes. Even better, you can simply fix the parameters to those from the iteration ... just replace * with @.
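
A minimal sketch of that replacement step, in Python: it rewrites MODEL lines in which the values from one posterior draw have been written as starting values (*) so that they become fixed values (@). The parameter lines and values are hypothetical, and the helper name is illustrative; it only touches asterisks that are directly followed by a number.

```python
import re

# An asterisk directly followed by a (possibly negative) number, e.g. *0.71 or *-.2
_START_VALUE = re.compile(r"\*(?=-?\.?\d)")


def fix_at_values(model_lines):
    """Turn Mplus starting values (name*value) into fixed values (name@value)."""
    return [_START_VALUE.sub("@", line) for line in model_lines]


# Hypothetical MODEL lines holding the parameter values of one posterior draw.
draw_as_start_values = [
    "F1 BY M1*0.712 M2*0.655;",
    "F1 WITH F2*-.21;",
    "M1*0.48 M2*0.53;",
]

for line in fix_at_values(draw_as_start_values):
    print(line)
```

With every parameter fixed this way there is nothing left to optimize, so the log-likelihood reported by a non-Bayesian run is the one evaluated at that draw.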