Xu, Man posted on Wednesday, March 06, 2013 - 4:06 am
I'd like to run parallel analysis on some categorical data, but the parallel analysis option is not available for categorical data.
I was wondering whether it makes sense to use the biserial/tetrachoric correlation matrix as data input and analyse it as if the variables were continuous (but without estimating the thresholds).
To get the biserial/tetrachoric correlation matrix based on the same sample (taking missing data into account), I would declare all variables as categorical and request the SAMPSTAT output to obtain the correlation matrix.
I have a couple of questions regarding the use of parallel analysis in Mplus:
1) What happens when the ML solution does not converge with the random data? I imagine that this would happen quite often when trying to extract multiple factors from data uncorrelated in the population.
2) Is there a particular reason why you chose not to use the principal component eigenvalues for parallel analysis? The PA procedure seems to work quite well with these eigenvalues, including cases with ordinal data and polychoric correlations (e.g., Garrido, Abad, & Ponsoda, 2012).
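For readers unfamiliar with the procedure the question refers to, here is a minimal Python sketch of Horn-style parallel analysis on principal component eigenvalues. It is not the procedure Mplus implements. The function name `parallel_analysis`, the use of uncorrelated normal data as the random reference, 100 replications, and the 95th-percentile cutoff are all illustrative assumptions. The function takes a correlation matrix plus the sample size, so a tetrachoric or polychoric matrix (for example, one copied from SAMPSTAT output) could in principle be passed in, as the first post suggests.

```python
import numpy as np


def parallel_analysis(R, n_obs, n_iter=100, percentile=95, seed=0):
    """Hypothetical sketch of parallel analysis on principal component eigenvalues.

    R      : p x p observed correlation matrix.
    n_obs  : sample size used to simulate the random reference data.
    Returns (components retained, observed eigenvalues, thresholds).
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    rng = np.random.default_rng(seed)
    # Observed eigenvalues, largest first (eigvalsh returns ascending order).
    observed = np.linalg.eigvalsh(R)[::-1]
    # Reference eigenvalues: Pearson correlations of uncorrelated normal data
    # with the same number of observations and variables (an assumption).
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        sim = rng.standard_normal((n_obs, p))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    thresholds = np.percentile(random_eigs, percentile, axis=0)
    # Retain components while each observed eigenvalue exceeds its threshold.
    n_keep = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break
        n_keep += 1
    return n_keep, observed, thresholds


# Usage on simulated continuous data: 12 items, 6 per factor, loadings of 0.7.
rng = np.random.default_rng(1)
n = 500
factors = rng.standard_normal((n, 2))
loadings = np.zeros((2, 12))
loadings[0, :6] = 0.7
loadings[1, 6:] = 0.7
x = factors @ loadings + rng.standard_normal((n, 12)) * np.sqrt(1 - 0.49)
k, observed, thresholds = parallel_analysis(np.corrcoef(x, rowvar=False), n_obs=n)
print(k)
```

One caveat on the design: the reference eigenvalues here come from Pearson correlations of continuous data, so plugging in a tetrachoric or polychoric matrix compares it against a reference that does not reflect how those correlations behave in samples, which is related to the concern raised in the replies below.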
We had parallel analysis developed also for tetrachoric and polychoric correlations, but my explorations of it indicated that it didn't work well, so we didn't include it in release versions. The poor performance may have to do with the fact that these correlation matrices behave differently than correlations among observed continuous variables.
Hi, is there a reference that could be cited to support the claim that parallel analysis performs poorly with categorical data? It has been suggested that I use parallel analysis to decide on the number of factors to retain in an EFA, and I would like either to learn how to do it or to justify why not.
We looked at parallel analysis for categorical variables using eigenvalues for latent correlations (tetrachoric, polychoric...), but it didn't seem to point to the right number of factors, so we decided not to include it in Mplus and only have it for continuous outcomes. Perhaps this is because, with categorical variables, the correlations do not capture all of the information in the data. I know of no references on this.
I also wish that Parallel Analysis (PA) would be implemented in Mplus for categorical variables (as I indicated in a previous post in this thread from 2013).
My colleagues and I have been working on this issue for some time and have found through large simulations that PA works quite well with categorized data (obviously not as well as with the underlying continuous variables prior to categorization, but this is common to all methods with smaller samples due to the loss of information). In fact, we recently published an article (see below) comparing the performance of PA with the CFI, TLI, RMSEA and SRMR fit indices, and PA clearly outperformed all of them with categorical data. To my knowledge, no method currently used in the psychological/educational literature outperforms PA with this type of data.
Garrido, L. E., Abad, F. J., & Ponsoda, V. (2015, December 14). Are Fit Indices Really Fit to Estimate the Number of Factors with Categorical Variables? Some Cautionary Findings via Monte Carlo Simulation. Psychological Methods. Advance online publication. http://dx.doi.org/10.1037/met0000064
Interesting. I tried out PA on sample tetrachoric correlations once we had it for Pearson correlations, and my initial impression was that it didn't capture the right number of factors at all. That's why we didn't implement it. So I will be interested in reading your new paper. Thanks for letting us know.