

High-dimensional EFA (optimize run times)



I'm running a series of EFAs on 60-80 items, extracting up to 10 factors. We are using the ML estimator, the data are missing completely at random (by design), and weights are applied. I am currently running this in Mplus v6 on a 64-bit Windows 7 quad-core machine; the first EFA has now been running for 6 days. We are planning to purchase a server on which to run these analyses, and were wondering what kind of machine parameters would optimize run times, and how Mplus uses machine resources like, e.g., onboard RAM vs. virtual memory on the hard drive.


Are the factor indicators categorical or continuous? 


Categorical (4 response options)


With maximum likelihood and categorical factor indicators, each factor adds one dimension of numerical integration. We do not recommend models with more than four dimensions of integration. In this case I suggest using WLSMV, which is less computationally demanding. I would think that you have some idea of how many factors are represented by the set of items. If it is four, for example, perhaps extracting from three to five or two to six factors would be sufficient.
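A WLSMV EFA over a range of factor solutions needs only a few lines of Mplus input; a minimal sketch, where the file name, variable names, and missing-value code are placeholders:

```
TITLE:    EFA with WLSMV, 3- to 5-factor solutions;
DATA:     FILE = items.dat;       ! placeholder data file
VARIABLE: NAMES = u1-u80;         ! placeholder item names
          CATEGORICAL = u1-u80;   ! 4-category items
          MISSING = ALL (999);    ! placeholder missing-value code
ANALYSIS: TYPE = EFA 3 5;         ! extract 3 through 5 factors
          ESTIMATOR = WLSMV;
```

TYPE = EFA m n runs every solution from m through n factors in one job, so the extraction range can be widened without editing anything else.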


Unfortunately, we need to use ML to handle the missingness in our data (which is substantial; we lose more than half of our subjects otherwise). We are constructing a measure, and for one of its sections (with 80 items) the proposed number of factors is 10, so we also need to run up to the 10-factor solution. Is there a hardware solution to optimize Mplus's running of a large categorical ML EFA, without changing the analysis itself?


Note that WLSMV does not use listwise deletion. Staying with ML, both integ=3 and integ=montecarlo can present numerical precision problems with that many factors. A more practical full-information approach would be to do Bayesian multiple imputation followed by WLSMV. The approach is studied in Section 3.1 of Asparouhov, T. & Muthén, B. (2010). Multiple imputation with Mplus. Technical Report, which is on our web site under Papers, Bayesian Analysis. UG ex 11.5 shows how to do the multiple imputation step.
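The two steps can be sketched along the lines of UG ex 11.5; all file names, variable names, and the missing-value code below are placeholders. First, the Bayesian multiple imputation step:

```
TITLE:           Step 1: Bayesian multiple imputation;
DATA:            FILE = items.dat;
VARIABLE:        NAMES = u1-u80;
                 CATEGORICAL = u1-u80;
                 MISSING = ALL (999);
DATA IMPUTATION: IMPUTE = u1-u80 (c);    ! (c) marks variables as categorical
                 NDATASETS = 20;         ! number of imputed data sets
                 SAVE = itemsimp*.dat;   ! * is replaced by 1, 2, ...
ANALYSIS:        TYPE = BASIC;
```

Then the WLSMV EFA, analyzing the imputed data sets via the list file that the imputation run saves:

```
TITLE:    Step 2: EFA with WLSMV over imputed data sets;
DATA:     FILE = itemsimplist.dat;   ! list file written by step 1
          TYPE = IMPUTATION;
VARIABLE: NAMES = u1-u80;
          CATEGORICAL = u1-u80;
ANALYSIS: TYPE = EFA 3 10;
          ESTIMATOR = WLSMV;
```

With TYPE = IMPUTATION, Mplus runs the EFA on each imputed data set and pools the results.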


If WLSMV does not use listwise deletion, how exactly does it handle missing data? And how does that work given that it's based on polychorics and the associated asymptotic weight matrix?


Pairwise present: each polychoric correlation is estimated from the cases with observed values on that pair of variables.


