Parallel analysis for categorical data
Mplus Discussion > Exploratory Factor Analysis
 Xu, Man posted on Wednesday, March 06, 2013 - 4:06 am
I'd like to run parallel analysis on some categorical data, but the parallel analysis option is not available for categorical data.

I was wondering whether it makes sense to use a biserial/tetrachoric correlation matrix as data input and analyze it as if the variables were continuous (but without estimating the thresholds).

To get the biserial/tetrachoric correlation matrix based on the same sample (taking missing data into account), I would declare all variables as categorical and request the SAMPSTAT output to obtain the correlation matrix.

Would this be sensible to do?

 Linda K. Muthen posted on Wednesday, March 06, 2013 - 9:50 am
We do not provide parallel analysis for categorical data because we have found it does not work well for categorical data.
 Luis Garrido posted on Friday, March 08, 2013 - 3:34 pm
Hi Linda,

I have a couple of questions regarding the use of parallel analysis in Mplus:

1) What happens when the ML solution does not converge with the random data? I imagine that this would happen quite often when trying to extract multiple factors from data that are uncorrelated in the population.

2) Is there a particular reason why you chose not to use the principal component eigenvalues for parallel analysis? The PA procedure seems to work quite well with these eigenvalues, including cases with ordinal data and polychoric correlations (e.g., Garrido, Abad, & Ponsoda, 2012).

 Linda K. Muthen posted on Friday, March 08, 2013 - 6:20 pm
1. Convergence is not an issue. The randomly generated correlation matrices are used only to compute eigenvalues.

2. The principal component eigenvalues are used in that we compute the eigenvalues of the correlation matrix without adjusting the diagonal.
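The procedure described in the two answers above can be sketched in Python with NumPy. This is a minimal illustration of Horn's parallel analysis, not Mplus's implementation: the observed eigenvalues come from the full correlation matrix (diagonal left at 1), the reference eigenvalues come from independent standard-normal data of the same size, and the function name parallel_analysis, the 95th-percentile criterion, and the number of replications are assumptions chosen for illustration. Because only eigenvalues of the random correlation matrices are needed, nothing is fitted to the random data, so convergence never comes up.

```python
import numpy as np


def parallel_analysis(data, n_iter=100, percentile=95, seed=0):
    """Hypothetical sketch of Horn's parallel analysis with PCA eigenvalues.

    data: (n, p) array of continuous variables.
    Returns (number retained, observed eigenvalues, reference thresholds).
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the unadjusted correlation matrix (unit diagonal),
    # sorted from largest to smallest.
    obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand = np.empty((n_iter, p))
    for i in range(n_iter):
        # Random data with the same n and p but no population correlation.
        sim = rng.standard_normal((n, p))
        rand[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    # Assumed criterion: per-position percentile of the random eigenvalues.
    thr = np.percentile(rand, percentile, axis=0)
    # Retain components while the observed eigenvalue beats the reference.
    n_keep = 0
    while n_keep < p and obs[n_keep] > thr[n_keep]:
        n_keep += 1
    return n_keep, obs, thr
```

With two well-separated factors (for example, four items loading 0.8 on each of two uncorrelated factors), the observed eigenvalues are roughly 2.9, 2.9 and then well below 1, so a sample of a few hundred cases should lead the function to retain two components.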
 Luis Garrido posted on Saturday, March 09, 2013 - 4:37 am
Thanks for the clarifications!
 Carlos Fernando Collares posted on Tuesday, June 04, 2013 - 5:05 am

Could anyone clarify why parallel analysis would not work well for determining the dimensionality of a categorical (dichotomous) data matrix?

Maybe this is discussed somewhere else, but I could not find it myself.


 Carlos Fernando Collares posted on Tuesday, June 04, 2013 - 5:09 am
Look at what I have found out:
 Carlos Fernando Collares posted on Tuesday, June 04, 2013 - 5:16 am
And here I found a study on why parallel analysis should be used with caution with dichotomous matrices:
 Bengt O. Muthen posted on Tuesday, June 04, 2013 - 8:36 am
We had parallel analysis developed also for tetrachoric and polychoric correlations, but my explorations of it indicated that it didn't work well, so we didn't include it in release versions. The poor performance may have to do with the fact that these correlation matrices behave differently than correlations among observed continuous variables.
 Örjan Dahlström posted on Tuesday, March 04, 2014 - 7:17 am
Is there a reference that could be cited to support the claim that parallel analysis performs poorly with categorical data? It has been suggested that I perform parallel analysis to decide on the number of factors to keep in an EFA, and I want either to learn how to do it or to justify why not.

 Bengt O. Muthen posted on Tuesday, March 04, 2014 - 9:04 am
We looked at parallel analysis for categorical variables using eigenvalues for latent correlations (tetrachoric, polychoric...), but it didn't seem to point to the right number of factors, so we decided not to include it in Mplus and to offer it only for continuous outcomes. Perhaps this has to do with the correlations not capturing all the information in the data when the variables are categorical. I know of no references on this.
 Luis Eduardo Garrido posted on Tuesday, January 26, 2016 - 5:29 am
Dear Dr. Muthen,

I also wish that Parallel Analysis (PA) were implemented in Mplus for categorical variables (as I indicated in a previous post in this thread from 2013).

My colleagues and I have been working on this issue for some time and have found through large simulations that PA works quite well with categorized data (obviously not as well as with the underlying continuous variables prior to categorization, but this is common to all methods with smaller samples, due to the loss of information). In fact, we recently published an article (see below) comparing the performance of PA with the CFI, TLI, RMSEA, and SRMR fit indices, and PA clearly outperformed all of them with categorical data. To my knowledge, no method currently used in the psychological/educational literature outperforms PA with this type of data.

Garrido, L. E., Abad, F. J., & Ponsoda, V. (2015, December 14). Are fit indices really fit to estimate the number of factors with categorical variables? Some cautionary findings via Monte Carlo simulation. Psychological Methods. Advance online publication.
 Luis Eduardo Garrido posted on Tuesday, January 26, 2016 - 11:43 am
I forgot to clarify in my previous post that we implemented PA with tetrachoric/polychoric correlations.
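A rough sketch of what PA with tetrachoric correlations involves, for 0/1 items only. Everything here is an assumption for illustration, not the Garrido et al. procedure and not anything in Mplus: thresholds are taken from each item's marginal proportion, each tetrachoric correlation is found by matching the observed proportion of (0, 0) pairs under a standard bivariate normal, and the random reference data are produced by permuting each column independently, which keeps every item's marginal proportion but removes the association between items. The helper names (bvn_cdf, tetrachoric, tetrachoric_matrix, parallel_analysis_binary) are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()
# 40-point Gauss-Legendre nodes and weights on [-1, 1].
_NODES, _WEIGHTS = np.polynomial.legendre.leggauss(40)


def bvn_cdf(a, b, rho):
    """P(X <= a, Y <= b) for a standard bivariate normal with correlation rho.

    Integrates Plackett's identity dPhi2/drho = phi2(a, b; rho) from 0 to rho.
    """
    r = 0.5 * rho * (_NODES + 1.0)  # map nodes from [-1, 1] onto [0, rho]
    dens = np.exp(-(a * a - 2.0 * r * a * b + b * b) / (2.0 * (1.0 - r * r)))
    dens /= 2.0 * np.pi * np.sqrt(1.0 - r * r)
    return _STD.cdf(a) * _STD.cdf(b) + 0.5 * rho * float(np.sum(_WEIGHTS * dens))


def tetrachoric(x, y, n_bisect=50):
    """Tetrachoric correlation of two 0/1 vectors (each needs both 0s and 1s).

    bvn_cdf increases with rho, so bisection finds the rho whose implied
    P(x=0, y=0) equals the observed proportion.
    """
    x, y = np.asarray(x), np.asarray(y)
    tx = _STD.inv_cdf(float(np.mean(x == 0)))  # threshold for x
    ty = _STD.inv_cdf(float(np.mean(y == 0)))  # threshold for y
    p00 = float(np.mean((x == 0) & (y == 0)))
    lo, hi = -0.995, 0.995  # stay away from |rho| = 1
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if bvn_cdf(tx, ty, mid) < p00:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tetrachoric_matrix(data):
    """Pairwise tetrachoric correlations with a unit diagonal."""
    p = data.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            R[i, j] = R[j, i] = tetrachoric(data[:, i], data[:, j])
    return R


def parallel_analysis_binary(data, n_iter=20, percentile=95, seed=0):
    """PA on tetrachoric eigenvalues with column-permuted reference data."""
    rng = np.random.default_rng(seed)
    obs = np.linalg.eigvalsh(tetrachoric_matrix(data))[::-1]
    rand = np.empty((n_iter, data.shape[1]))
    for k in range(n_iter):
        perm = np.column_stack([rng.permutation(col) for col in data.T])
        rand[k] = np.linalg.eigvalsh(tetrachoric_matrix(perm))[::-1]
    thr = np.percentile(rand, percentile, axis=0)
    n_keep = 0
    while n_keep < len(obs) and obs[n_keep] > thr[n_keep]:
        n_keep += 1
    return n_keep, obs, thr
```

A matrix assembled from pairwise tetrachoric correlations need not be positive definite, so some observed eigenvalues can be negative; np.linalg.eigvalsh still returns them, and the retention rule only looks at the leading ones.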
 Bengt O. Muthen posted on Tuesday, January 26, 2016 - 6:38 pm
Interesting. I tried out PA on sample tetrachoric correlations once we had it for Pearson correlations and my initial impression was that it didn't capture the right number of factors at all. And that's why we didn't implement it. So I will be interested in reading your new paper. Thanks for letting us know.
 Kellina Pyle posted on Tuesday, August 29, 2017 - 2:58 pm
I have categorical variables rated on a 4-point Likert scale, and I would like to conduct parallel analysis to determine the number of factors to retain. My version of Mplus (Version 7) will not allow parallel analysis with categorical variables. I read in an earlier post that 5-point Likert items can be treated as continuous in order to run parallel analysis in Mplus. Would the same apply to 4-point Likert items?
 Bengt O. Muthen posted on Tuesday, August 29, 2017 - 5:05 pm
If you don't have strong floor or ceiling effects, 4-point variables might be well approximated as continuous.
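One simple way to look for floor or ceiling effects before treating 4-point items as continuous is to tabulate, for each item, the share of responses in the lowest and highest category. This sketch assumes categories coded 1 to 4; the function names are hypothetical, and the 0.15 cutoff in flag_items is an arbitrary value for illustration, not a rule given in this thread.

```python
import numpy as np


def floor_ceiling(responses, lowest=1, highest=4):
    """Per-item share of responses in the lowest and highest category."""
    x = np.asarray(responses)
    floor = (x == lowest).mean(axis=0)
    ceiling = (x == highest).mean(axis=0)
    return floor, ceiling


def flag_items(responses, cutoff=0.15, lowest=1, highest=4):
    """Indices of items whose floor or ceiling share exceeds the (hypothetical) cutoff."""
    floor, ceiling = floor_ceiling(responses, lowest, highest)
    return [j for j in range(len(floor)) if floor[j] > cutoff or ceiling[j] > cutoff]
```

Items flagged this way are the ones for which the continuous approximation is most doubtful.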
 Kellina Pyle posted on Wednesday, August 30, 2017 - 10:55 am
Thank you for your prompt reply.

Unfortunately I have strong floor effects. Is parallel analysis inappropriate here? What might be a better approach for determining the number of factors to retain with non-normal categorical data?
 Bengt O. Muthen posted on Wednesday, August 30, 2017 - 3:59 pm
There was a recent article on parallel analysis for categorical variables (using latent correlations) but this is not implemented in Mplus.

With the WLSMV estimator you can decide on the number of factors using chi-square and other overall model fit statistics. With ML, you can use BIC.
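For the ML route, Mplus reports BIC for each solution, and the comparison amounts to picking the number of factors with the smallest value. The sketch below only spells out the formula BIC = -2 logL + k ln(n), with k free parameters and n observations, and that selection rule; the function names are hypothetical, and the log-likelihoods and parameter counts in the usage lines are made-up numbers standing in for values read off the output of 1-, 2- and 3-factor runs.

```python
import math


def bic(loglik, n_params, n_obs):
    """Schwarz BIC = -2 logL + k ln(n); smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def pick_by_bic(fits, n_obs):
    """fits maps number of factors -> (loglikelihood, free parameters)."""
    scores = {m: bic(ll, k, n_obs) for m, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical values, not real output.
fits = {1: (-6000.0, 30), 2: (-5900.0, 39), 3: (-5890.0, 47)}
best, scores = pick_by_bic(fits, n_obs=500)
```

Here the 3-factor solution improves the log-likelihood by only 10 points while adding 8 parameters, so its ln(500) penalty outweighs the gain and the 2-factor solution has the smallest BIC.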