Jan Ivanouw posted on Saturday, November 06, 2010 - 5:28 am
When trying to perform EFA with Bayes estimation, the output file does not contain the usual information. There is the usual logo in the top left corner (Muthén & Muthén, etc.), but aside from that, the output file is an exact copy of the input file. There is no information about computation time or anything, and no error messages either. Also, the output file does not appear on the screen as usual but is only saved to disk.
I understand that EFA cannot be executed using Bayes. My question is: can a more constrained model, such as an EFA within CFA, be done using the Bayes estimator? My model did converge, but I am more concerned about whether there are negative implications, or whether it is even practical to do this with Bayes. My model is below this email. Also, I would like to take the opportunity to thank the Mplus team for devoting so much time and effort to this great tool; I am having such a blast learning new things about Mplus every day.
Bayes can do EFA within CFA. I don't see any negative consequences. A positive outcome is no negative residual variances. There may in some cases be convergence difficulties given such a relaxed model. The real advantage with Bayes in factor analysis, I think, is the possibility discussed in:
Jan Zirk posted on Thursday, November 01, 2012 - 7:07 am
Dear Bengt, Linda, or Tihomir, my questions concern a comparison of the results of Bayesian and WLSMV EFA 1-8. 30 categorical items (7 categories) were used (big sample: n=83548, so the Bayesian EFA took 125h while WLSMV took less than 1h). The theory of the tested instrument suggested 5 factors. According to the eigenvalues, which are the same in both estimation methods, the 7-factor solution is the first with eigenvalue > 1. WLSMV computed all 8 solutions and showed goodness-of-fit indices for all of them. Bayesian EFA provided output for up to 5 factors, and there was no convergence for 6, 7, and 8. My first question is: 1) Can the lack of convergence for 6-8 factors be used as evidence for preferring the 5-factor solution?
In the next step, a 1-factor Bayesian CFA for all 30 items was run for item scaling (thus the mean plausible values of the 30 categorical items were extracted). These continuous measures were used in the Bayesian EFA 1-8 (which was much faster this time). And this time there was convergence for the 6-factor solution, which was better than the 5-factor one, and there was no convergence for 7 and 8. So my second question is: 2) Would plausible values of the latent response variables of the categorical items from the 1-factor CFA be the same as latent response plausible values of these categorical measures from an analysis with different structural properties (e.g. a 5-factor EFA)? Best wishes,
Jan Zirk posted on Friday, November 02, 2012 - 5:23 pm
Still running; I think I will know in the morning (I mean in about 10h). I had to first run different analyses, which took some time. Will let you know as soon as I have the results.
Jan Zirk posted on Friday, November 02, 2012 - 5:48 pm
To my surprise, the model has just been found identified (after 800 iterations!), and now the imputations are being generated (n=10). This holds promise for much faster further analysis.
All the best, Jan
Jan Zirk posted on Friday, November 02, 2012 - 6:30 pm
I would like to ask you about the Bayes factor. As far as I understand, the first piece of information on the Bayes factor provided in the output indicates a preference for the more complex model if the value is bigger than 3 (according to convention). Where can I then find information on how the log of the Bayes factor is computed and what it can be useful for? Is there an article describing how it is computed in Mplus?
My second question is: could you provide me with a reference to an article showing how ordinal variables are treated under Bayesian estimation in comparison to the WLSMV probit and ML logistic approaches? I am trying to understand the mechanism underlying the extraction of plausible values from categorical variables without the necessity of regressing them on a latent factor (like in the H1 model that you mentioned).
The EFA model uses a standardized metric. The estimates and factor scores for the 3 estimators are in the same standardized metric and are directly comparable.
Jan Zirk posted on Tuesday, November 06, 2012 - 10:52 am
Oh, thank you for the article; that is really useful. Now I understand that the default for ordered categorical variables is probit. I noticed that my first question on the 'log of the Bayes factor' was really confusing. Of course, 'log' means just the logarithm of the BF value; the BF is so high in all my models that the output shows only an approximation (>1000000). Everything is clear.
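(For anyone puzzling over the same point: the conversion between the BF and its log is just exponentiation. Here is a minimal illustration in Python; the value and helper name are my own, not from any Mplus output.)

```python
import math

def bayes_factor_from_log(log_bf):
    """Convert a log Bayes factor back to the BF scale.

    By the common convention, BF > 3 (i.e. log BF > log 3, about 1.1)
    is read as support for the more complex model.
    """
    return math.exp(log_bf)

# Hypothetical value read off an output file:
log_bf = 2.5
print(bayes_factor_from_log(log_bf) > 3)  # exp(2.5) is about 12.2, so True
```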
Guido Biele posted on Tuesday, September 12, 2017 - 12:30 am
Model checking for Bayesian EFA: It seems that by default, posterior predictive distributions are only generated for categorical indicator variables. Could those also be obtained for continuous indicator variables? (e.g. show observed and predicted mean and sd of the indicator variables)
DIC: The DIC is mentioned in several places in the Mplus documentation and papers. What is the command to obtain the DIC for a Bayesian EFA or SEM?
Q2: No. A full description of DIC in the DSEM context is in the DSEM theory paper on our website:
Asparouhov, T., Hamaker, E.L. & Muthen, B. (2017). Dynamic structural equation models. Technical Report. Version 2. (Download Mplus analyses)
The general problem with DIC is that it requires that the loglikelihood can be (easily) computed. The fact that in many cases it isn't easily computed is, however, the very reason that DSEM doesn't use ML but Bayes. For example, with categorical outcomes and many random effects, ML computations are prohibitive.
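(As a side note for readers, the dependence of DIC on the log-likelihood is easy to see from its defining formula. A minimal generic sketch in Python; the function name and inputs are my own, and this is not Mplus-specific code.)

```python
import numpy as np

def dic(loglik_draws, loglik_at_mean):
    """Deviance Information Criterion from MCMC output.

    loglik_draws:   log p(y | theta_s) for each posterior draw s
    loglik_at_mean: log p(y | theta_bar), the log-likelihood evaluated
                    at the posterior mean of the parameters

    DIC = D(theta_bar) + 2 * pD, where D = -2 * loglik and
    pD = mean(D over draws) - D(theta_bar).
    """
    d_draws = -2.0 * np.asarray(loglik_draws, dtype=float)
    d_at_mean = -2.0 * loglik_at_mean
    p_d = d_draws.mean() - d_at_mean          # effective number of parameters
    return d_at_mean + 2.0 * p_d
```

Every quantity here requires evaluating log p(y | theta), which is exactly the computation that is prohibitive in the categorical/many-random-effects cases mentioned above.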
Guido Biele posted on Thursday, September 14, 2017 - 5:06 am
If I understand correctly, this means in essence that model comparison for Bayesian EFA with continuous and categorical variables is not possible within Mplus because (1) posterior predictive checks are only available for a subset of the data and (2) (penalized) measures of model fit are not available.
Hence, one would need to use non-Bayesian estimation in Mplus (if that is not too expensive, as you alluded to above) or use parallel analysis (for example, through the psych package in R) to determine the number of factors?
About computing the likelihood: why would one need to compute the ML? Could one not just evaluate the model at the maximum a posteriori once the sampling is finished?
The PPP is available for Bayes EFA with continuous and categorical variables. Check your output carefully. It works the same way the chi-square test of fit works. Check out section 6 in http://statmodel.com/download/BayesAdvantages18.pdf; in particular, subsections 6.4 and 6.5 might be of interest. If you are not getting PPP in Bayes EFA, send your run and data to email@example.com
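(For readers unfamiliar with how a PPP is formed: it is the proportion of posterior draws in which the discrepancy of replicated data is at least as large as that of the observed data. A generic sketch in Python with hypothetical inputs; Mplus computes this internally.)

```python
import numpy as np

def ppp(obs_discrepancy, rep_discrepancy):
    """Posterior predictive p-value from per-draw discrepancies.

    obs_discrepancy[s]: discrepancy (e.g. a chi-square-type statistic)
                        of the observed data under posterior draw s
    rep_discrepancy[s]: the same statistic for data replicated under draw s

    A PPP near 0.5 indicates good fit; a PPP near 0 indicates misfit,
    analogous to a small p-value in a chi-square test of fit.
    """
    obs = np.asarray(obs_discrepancy, dtype=float)
    rep = np.asarray(rep_discrepancy, dtype=float)
    return float(np.mean(rep >= obs))
```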
Regarding computing the maximum likelihood: it is possible to use the Bayes mode estimator, but estimating the mode from MCMC is not a very reliable way to compute the maximum likelihood, since estimating the multivariate mode is not easy and would require a huge number of draws. You can compute that likelihood using the ML estimator and, if need be, Monte Carlo integration (although, again, this is not needed to determine the number of factors, since you can use PPP).
Regarding label switching in Bayes Mixture estimation: If label switching occurs in Bayesian Mixture estimation we recommend adding inequality constraints among the parameters in the model constraint statement, i.e., inequalities that order the classes and prevent label switching, for example using ordering of the means of one indicator which has different means across the classes.
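(The in-estimation inequality constraints described above are the recommended route in Mplus. A related post-processing idea, relabeling draws so that the class-specific means of one indicator are ordered, can be sketched generically in Python. This illustrates the ordering idea only; it is not what Mplus does internally, and the helper name is my own.)

```python
import numpy as np

def relabel_by_mean(class_means):
    """Reorder latent classes within each MCMC draw so that the mean
    of one chosen indicator is increasing across classes.

    class_means: array of shape (n_draws, n_classes) holding, for each
    draw, the class-specific means of the ordering indicator.
    Returns the permutation applied per draw and the reordered means.
    """
    class_means = np.asarray(class_means, dtype=float)
    order = np.argsort(class_means, axis=1)    # per-draw class permutation
    reordered = np.take_along_axis(class_means, order, axis=1)
    return order, reordered
```

The same per-draw permutation would then be applied to all other class-specific parameters so the whole draw stays internally consistent.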
Guido Biele posted on Monday, September 18, 2017 - 4:34 am
I looked at http://statmodel.com/download/BayesAdvantages18.pdf and found the chi-square PPP explained in https://www.statmodel.com/download/Bayes3.pdf. However, this was not what I was after, and maybe Mplus does not provide what I was looking for. In my experience, a posterior predictive check compares statistics calculated from observed data with statistics calculated from model-simulated data. Importantly, the primary goal of a PPC is not to reject or accept a model, but to investigate how a model could or should be improved. The model fit, which is the basis of the chi-square PPP, is one test statistic one can look at, but I think it is not particularly useful when investigating why the model fit is bad and how it could be improved. The posterior predictive plots for the likelihood are somewhat informative, but (I think) less informative than other statistics. I had hoped that Mplus could also show observed and predicted means and variances of continuous indicators, and observed and predicted correlations of all indicators (for categorical variables, Mplus already shows observed and simulated proportions for the response categories). But it seems that this type of posterior predictive check is not supported by Mplus. Is this correct?
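(The kind of check described here, comparing observed means and SDs with their posterior predictive counterparts, is straightforward to compute outside Mplus if one can save replicated datasets, e.g. via imputation. A minimal sketch; the function name and array shapes are my own assumptions.)

```python
import numpy as np

def ppc_summary(y_obs, y_rep):
    """Compare observed and replicated indicator summaries.

    y_obs: (n, p) observed data, n cases by p indicators
    y_rep: (S, n, p) replicated datasets, one per posterior draw
    Returns observed means/SDs per indicator plus the 2.5/50/97.5
    percentiles of the replicated means and SDs across draws.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_rep = np.asarray(y_rep, dtype=float)
    rep_means = y_rep.mean(axis=1)             # shape (S, p)
    rep_sds = y_rep.std(axis=1, ddof=1)        # shape (S, p)
    return {
        "obs_mean": y_obs.mean(axis=0),
        "obs_sd": y_obs.std(axis=0, ddof=1),
        "rep_mean_ci": np.percentile(rep_means, [2.5, 50, 97.5], axis=0),
        "rep_sd_ci": np.percentile(rep_sds, [2.5, 50, 97.5], axis=0),
    }
```

An observed mean falling far outside the replicated interval for some indicator would flag exactly the kind of localized misfit the plain chi-square PPP cannot pinpoint.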
Also I would not put a lot of weight on "observed and predicted means and variances of continuous indicators and observed and predicted correlations of all indicators" as a model modification technique. You can ask yourself why this is not a recommended technique in the frequentist world for model modification. It is because misfits on the observed quantities are not directly related to the misfits in the structural model.
Guido Biele posted on Tuesday, September 19, 2017 - 4:32 am
As far as I can see, the tech10 option provides the same posterior predictive p-values which I had already seen in the posterior predictive plots.
Residuals are not displayed; instead, the following error message is displayed after sampling is completed and most of the *.out file has been written:
*** FATAL ERROR
Internal Error Code: FAILED TO OPEN GROUP
An internal error has occurred. Please contact us about the error, providing both the input and data files if possible.
(I am sending the required files to the support email.)
Guido Biele posted on Tuesday, November 21, 2017 - 1:35 am
To do this, one needs an n × i matrix of log-likelihoods*, where n is the number of (independent) participants and i is the number of posterior samples.
Is it possible to obtain the log-likelihood by using one iteration from the posterior as starting values for a non-Bayesian analysis and letting Mplus write out the log-likelihood for the starting values without doing any further optimization?
Thanks in advance for your support
*I mean the simple log-likelihood, as in log(p(y_i|theta)) where y_i is the observed data of participant i and theta are the model parameters.
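(Given such an n × i matrix, criteria in the WAIC family can be computed directly from it. A minimal sketch in Python; the helper is my own, assuming rows index participants and columns index posterior draws.)

```python
import numpy as np

def waic(loglik):
    """WAIC from an (n, S) matrix of pointwise log-likelihoods,
    loglik[i, s] = log p(y_i | theta_s).

    lppd   = sum_i log( mean_s exp(loglik[i, s]) )
    p_waic = sum_i var_s( loglik[i, s] )
    WAIC   = -2 * (lppd - p_waic)
    """
    loglik = np.asarray(loglik, dtype=float)
    # log-mean-exp per row, computed stably by factoring out the row max
    m = loglik.max(axis=1, keepdims=True)
    lppd = (m[:, 0] + np.log(np.exp(loglik - m).mean(axis=1))).sum()
    p_waic = loglik.var(axis=1, ddof=1).sum()
    return -2.0 * (lppd - p_waic)
```

This is one concrete reason for wanting the per-participant, per-draw log-likelihoods rather than a single aggregate likelihood value.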