EFA with categorical indicators
Mplus Discussion > Exploratory Factor Analysis
 Anonymous posted on Monday, May 23, 2005 - 4:18 pm
Hello. I was wondering . . .

Will Mplus perform a principal components analysis on categorical indicators?

Is it possible to obtain communality estimates using a WLSMV estimator to factor analyze categorical indicators in MPLUS?

Thanks for your help.
 bmuthen posted on Monday, May 23, 2005 - 5:52 pm
No - but a ULS factor analysis would seem a good alternative where you don't have to assume zero residual variances (PCA is actually giving the starting values for the FA).

Yes - they are obtained as 1-(residual variance).
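
The relation in the answer above can be illustrated with a small Python sketch. It assumes a standardized solution in which each indicator has unit total variance, so the communality is one minus the residual variance. The function name and the residual variance values are made up for illustration and do not come from any Mplus output.

```python
def communalities(residual_variances):
    """Return communalities as 1 - residual variance for each indicator.

    Assumes a standardized solution (unit total variance per indicator).
    """
    return [1.0 - rv for rv in residual_variances]


# Hypothetical residual variances for three indicators (illustrative only).
example_residuals = [0.36, 0.51, 0.75]
example_communalities = communalities(example_residuals)
```

An indicator with a residual variance of 0.36 would therefore have a communality of 0.64 under this assumption.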
 Reetu posted on Thursday, December 15, 2005 - 2:54 pm
I'm trying to do an EFA with both continuous and categorical variables. I have missings in my continuous but not in my categorical. The code is written below:


I am using version 2.14 and I keep getting an error saying that it will only perform this on continuous variables. Is there a way of getting around this?
 Linda K. Muthen posted on Thursday, December 15, 2005 - 5:07 pm
No, you need Version 3 for that.
 Pajarita Charles posted on Wednesday, June 11, 2008 - 9:41 pm

I am doing both EFA and CFA with Likert type items (1-5), and treated the items as if they were categorical based on recommendations I've seen by you elsewhere. I used weighted least-squares as the estimation method.

1) Is it correct to use WLS and to treat Likert-type items as categorical and not continuous when there are so few categories, i.e., fewer than 10?

2) If it is okay, can you offer a suggestion for some references to justify treating Likert items as categorical? I need to provide a rationale for my choice for a paper.

3) Does the paper below offer this justification and if so, can you send it to me? It's not published as far as I can tell. "Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes." Accepted for publication in Psychometrika, #75. By Muthén, du Toit, & Spisic, 1997.

Thank you.
 Linda K. Muthen posted on Thursday, June 12, 2008 - 12:27 pm
I don't think any justification is needed to treat 5-scale Likert items as categorical variables. Likert items are categorical. There would need to be a justification to treat them as continuous variables. See the following paper for more information:

Muthén, B. & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.
 Pajarita Charles posted on Sunday, June 15, 2008 - 4:07 pm
Thank you for verifying that what I did was correct. Is there any chance you can send me the paper in the British Journal because my library doesn't have electronic access that far back in time and I am out of state so I can't go in person to the library. Thank you so much!
 Pajarita Charles posted on Sunday, June 15, 2008 - 5:16 pm
Sorry - I didn't provide an email address:


Thanks again.
 Linda K. Muthen posted on Monday, June 16, 2008 - 9:30 am
That should be available at Bengt's website:

 aprile benner posted on Tuesday, June 24, 2008 - 1:09 pm
i have a 30-item survey. survey responses are categorical (but not ordinal). given this, is there a way to do an EFA (and then a CFA) to identify factors? i looked in the mplus manual, but the only examples provided are for ordinal categorical data. if it is not possible to conduct an EFA/CFA, is there another way to determine how question responses group together?
 Linda K. Muthen posted on Tuesday, June 24, 2008 - 2:05 pm
Do you mean that the factor indicators are nominal, unordered categorical?
 aprile benner posted on Tuesday, June 24, 2008 - 2:25 pm
that is correct. i believe there will be an underlying response pattern that falls into a number of different "factors," but i am not sure of the proper analysis technique to determine this. thanks!
 Linda K. Muthen posted on Tuesday, June 24, 2008 - 2:50 pm
We don't allow nominal factor indicators for EFA. I can't think of any other analysis technique that would work for this.
 Bruce A. Cooper posted on Wednesday, July 02, 2008 - 5:31 pm
I've used ULS as the estimator for an EFA of about 16 5-point Likert items measuring various symptoms, treated as ordinal categorical vars. I chose ULS instead of WLSMV because many of the item distributions are strongly skewed and the sample is only 157 cases, after reading Bengt's paper comparing ML, GLS, WLS, Robust WLS, and ULS for analysis of nonnormal data (can't find the reference now!). My memory is that Robust WLS performed best as long as N>200. So, I'm using ULS. The problem I have is that Mplus only gives unstandardized RMR for a fit index, which has no standard criterion for good fit. What to do?
 Linda K. Muthen posted on Wednesday, July 02, 2008 - 5:37 pm
Skewness is not an issue with categorical indicators. The categorical data methodology can handle floor and ceiling effects. I suggest using WLSMV.
 Bruce A. Cooper posted on Monday, August 04, 2008 - 1:58 pm
Hi Linda -

Following up and working on this again now that I've got a draft to review from the researcher...

The primary reason I went with ULS was because of the small sample size combined with the ordinal scale. In addition, for a couple of the analyses, the results from a WLSMV analysis were not consistent with those from the ULS analysis, making me want to be more cautious with those and stick to ULS for them (cf. Muthén 1989, paper #21).

So now, I'm still wondering about evaluating model fit. For this analysis, Mplus 5.1 gives only the RMR, not the standardized RMR (for which I found some guidance in one book). I understand that the size of the RMR is scale dependent and so there are no standards similar to those for the RMSEA or even the SRMR that can be used to evaluate model fit.

I'm thinking that an interpretation of the RMR for my data might therefore be possible by extension, if I compare the RMR from the ULS analysis to the RMR I get from a "good" analysis using WLSMV, where I get the RMSEA as well as the RMR. If the RMR indices are in the same ballpark for the two analyses, and if the RMSEA is acceptable for the WLSMV analysis, I think the ULS estimate for the RMR may provide evidence of "good fit".

Any thoughts would be welcome!
 Linda K. Muthen posted on Monday, August 04, 2008 - 5:11 pm
If you want to use unweighted least squares and obtain fit statistics, use ULSMV which gives standard errors and fit statistics. ULS gives only parameter estimates and is useful when there are very many categorical variables. The results between unweighted least squares and weighted least squares should not differ much if the model fits well.
 Qi posted on Monday, December 08, 2008 - 9:11 am
On p. 482-483 of the Mplus manual, are the estimators in bold font the default estimators in Mplus?

I am running EFA and CFA for items using Likert-scale, which are treated as categorical ordinal variables. According to Flora & Curran (2004), WLSMV seems to be the best estimator for CFA with such items, which I believe is also the default in Mplus5.

For EFA, the default in Mplus is WLSM. Is this the recommended estimator? Is there any reference showing that WLSM is better than WLSMV for EFA with categorical ordinal items?

Thanks for your guidance!
 Linda K. Muthen posted on Monday, December 08, 2008 - 3:23 pm
Yes, the bolded entries are the default. WLSM was selected as the default estimator for EFA because EFA often involves a lot of items and it is faster than WLSMV. There is no article that documents this.
 Dorothee Durpoix posted on Thursday, January 22, 2009 - 8:15 pm
Dear Linda,

1- Regarding the last post (WLSM vs. WLSMV), if the number of variables/items is relatively small (<15), would you advise using WLSMV rather than WLSM?
Apart from the time issue you just mentioned, would you mind enlightening me on the advantages of one estimator over the other? I've looked for information on this, but was not very successful...

2- I've got a mix of ordinal (5-point scales) and binary variables. Some of my binary variables are highly skewed, some, inversely highly skewed. Moreover, crosstabs between my variables reveal that some combinations have zero count, which is particularly detrimental in polychoric correlations, I've read. I've run some EFA (with default WLSM) and, indeed, some models did not converge.
Would you advise dropping the problematic variables, integrating these variables in an index instead for instance, or use a more appropriate estimation method (in this case, which one?)?
 Linda K. Muthen posted on Friday, January 23, 2009 - 10:50 am
1. Yes, I would use WLSMV when the number of variables is small. See the following paper which is available on the website:

Muthén, B., du Toit, S.H.C. & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes.

2. You do not want empty cells in the bivariate tables. You can either collapse categories if that is appropriate or use only one of the variables.

Methods for categorical data using either weighted least squares or maximum likelihood estimation take floor and ceiling effects into account.
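
One way to look for the empty cells mentioned in point 2 before running the EFA is to cross-tabulate every pair of indicators and count the zero cells. The Python sketch below is only an illustration: the function name, the variable names u1-u3, and the tiny data set are hypothetical, and the sketch assumes complete data with a known list of categories per variable.

```python
from collections import Counter
from itertools import combinations


def empty_bivariate_cells(data, categories):
    """Count empty cells in every pairwise cross-tabulation.

    data: list of rows, each a dict mapping variable name -> observed category.
    categories: dict mapping variable name -> list of possible categories.
    Returns a dict mapping (var_a, var_b) -> number of zero-count cells.
    """
    result = {}
    for a, b in combinations(sorted(categories), 2):
        # Counter returns 0 for category combinations that never occur.
        counts = Counter((row[a], row[b]) for row in data)
        result[(a, b)] = sum(
            1
            for ca in categories[a]
            for cb in categories[b]
            if counts[(ca, cb)] == 0
        )
    return result


# Tiny made-up data set: two binary items and one 3-category item.
rows = [
    {"u1": 0, "u2": 0, "u3": 1},
    {"u1": 0, "u2": 1, "u3": 2},
    {"u1": 1, "u2": 1, "u3": 3},
    {"u1": 0, "u2": 0, "u3": 1},
]
cats = {"u1": [0, 1], "u2": [0, 1], "u3": [1, 2, 3]}
empty = empty_bivariate_cells(rows, cats)
```

Pairs with many empty cells are the candidates for collapsing categories or for dropping one of the two variables, as suggested above.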
 Helen Skerman posted on Tuesday, October 26, 2010 - 1:09 am
I am doing an EFA of 32 ordinal variables with 5 categories that are positively skewed. The sample size is 218, so I am using ULSMV. There is no error message when I run the analysis, but I was wondering if I should collapse some categories, given the number of cells with zero frequencies in bivariate tables, especially for the higher ratings.
 Linda K. Muthen posted on Tuesday, October 26, 2010 - 8:10 am
I would collapse categories particularly due to the small sample size.
 Chantal Hermann posted on Friday, April 08, 2011 - 11:19 am

I just have a quick question that I was hoping you could please help me with. I would like to run an EFA on a number of categorical variables that have different likert scales (e.g., some 4 points, some 5 points, and some 7 points). Is this problematic for EFA? As well, is it possible to run this using MPlus?

Thank-you in advance,

 Linda K. Muthen posted on Friday, April 08, 2011 - 11:28 am
This is not a problem and can be done in Mplus.
 Chantal Hermann posted on Friday, April 08, 2011 - 11:42 am
Hi Linda,

Thank-you for the quick response! Just a follow-up question, would MPlus calculate a polychoric correlation matrix in this case?


 Linda K. Muthen posted on Friday, April 08, 2011 - 1:58 pm
The correlation between two ordered categorical variables is a polychoric correlation.
 Mary Teresa Granillo posted on Sunday, April 17, 2011 - 5:10 am
I'm conducting an EFA with highly positively skewed data so I used the ordinal method. I know that WLSM is the default. However, when I use the ULSMV estimator I get a better model fit. Can you tell me when it is appropriate to use the ULSMV versus the WLSM estimator?

 Linda K. Muthen posted on Sunday, April 17, 2011 - 2:08 pm
It can happen in some cases. See articles by Albert Maydeu where he discusses this issue.
 Mary Teresa Granillo posted on Sunday, April 17, 2011 - 6:26 pm
Hi Linda,
Thank you for the reference. It seems that ULSMV is most optimal for small sample sizes. My N = 15,917 and the EFA is on 9 items that make up a depression scale (PHQ-9). Most prior research shows evidence of a single factor, however there is evidence of a 2 factor structure as well. This is why I wanted to do an EFA. I'm just not very familiar with Mplus nor identifying the correct estimator. Given this information, what is your recommendation for an estimator?

Thanks again,
 Linda K. Muthen posted on Monday, April 18, 2011 - 8:24 am
Our default is WLSMV.
 Chantal Hermann posted on Thursday, May 12, 2011 - 10:50 am

I just have a quick question I was hoping you could please help me with. I have run my analyses with both FACTOR and MPlus and I am getting different results. I noticed that the polychoric correlations computed are different for the two programs, 1) do you know why the correlations might differ? and 2) do you know of any differences between FACTOR and MPlus that would lead to different results.

The analyses were for a sample of 280 participants, 36 categorical items (4 point likert scale), ULS, and Promax rotation.

Also, in FACTOR my correlation matrix was non-positive definite (I used a ridge estimator to correct this) but it was not in Mplus?

Thanks for your help,

Chantal Hermann
 Bengt O. Muthen posted on Thursday, May 12, 2011 - 6:34 pm
Are you sure that FACTOR computes polychoric correlations and not regular (Pearson) correlations? Do you use the same sample size in FACTOR and Mplus for the pairs of variables? See Mplus Type=Basic output.

If this doesn't help, please send outputs and license number to support.
 Shane R. Moulton posted on Wednesday, September 26, 2012 - 3:35 am
Hello-I have 2 related questions:
1. Realizing EFA/CFA is robust to normality assumptions, are there any guidelines for when an indicator’s distribution is too skewed to be useful? I have a large (n=972) dataset with 33 indicators-14 of them cat.(dich.). But, most suffer from heavy flooring effects (censored down?). My "best distributed" cat.indicator has 76% 0s and 24% 1s, and my "worst distributed" one has 95.8% 0s and 4.2% 1s(ave. across all 14 cat.indicators: 85% 0s, 15% 1s... lopsided distributions indeed!). Even most of my continu. indicators have well over 50% of cases at the floor value, with the scant remainder occupying progressively higher values. My question: At what point do distributions (esp. cat. ones) become too lopsided to be useful?

2. My second question pertains to the VARIABLE subcommands, CENSORED and CATEGORICAL. I thought I could simultaneously run both of these subcommands for EFA—even with the same indicators identified under each; but when I try I get an error message saying I can only specify each indicator under CENSORED or CATEGORICAL, but not both. If there is no way to have the same indicators in both subcommands, which subcommand should one choose when all cat. indicators are also highly censored (e.g., see question 1 above). In other words, is it more important to identify censored cat. indicator as CENSORED or CATEGORICAL?
 Linda K. Muthen posted on Wednesday, September 26, 2012 - 6:09 am
Categorical data methodology can handle floor and ceiling effects. Skewness is for continuous variables. The issue for categorical variables is empty cells in the bivariate tables.
 Shane R. Moulton posted on Wednesday, October 03, 2012 - 4:48 pm
Thanks, Linda. I appreciate your reply.

Regarding EFA/CFA, are there any problems with dichotomizing hopelessly-skewed continuous indicators and treating them as categorical indicators? I have some continuous variables whose distributions can't be aided by transformations, so now that I know distributions are never an issue with categorical, I'm wondering if dichotomizing messy continuous is a viable option for side-stepping normality assumptions? Forgive me if this is a newb question to ask--I'm still a student and trying to learn EFA/CFA with Mplus on my own.

Thanks again!
 Carme Viladrich posted on Thursday, October 04, 2012 - 8:15 am

Regarding the post by Linda on the empty cells issue, how does Mplus treat the empty cells in bivariate categorical tables? Are there any corrections done?

Should I be worried about this issue if I had 10 empty cells out of 150 (4 items, 5 ordered response categories each)?
 Linda K. Muthen posted on Thursday, October 04, 2012 - 9:54 am

If you have more than 25% at either the upper or lower value, you can try the following:

1. Censored
2. Two-part modeling
3. Trichotomizing
4. Dichotomizing
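
The 25% rule of thumb above can be checked with a few lines of Python. This is a rough sketch, not an Mplus feature: the function names are hypothetical, and it assumes that the "lower" and "upper" values are the smallest and largest values actually observed for the variable.

```python
def share_at_extremes(values):
    """Return the proportions of cases at the observed minimum and maximum."""
    lowest, highest = min(values), max(values)
    n = len(values)
    floor_share = sum(v == lowest for v in values) / n
    ceiling_share = sum(v == highest for v in values) / n
    return floor_share, ceiling_share


def piles_up(values, threshold=0.25):
    """True if more than `threshold` of the cases sit at either extreme."""
    floor_share, ceiling_share = share_at_extremes(values)
    return floor_share > threshold or ceiling_share > threshold


# Made-up examples: a variable with 60% of cases at the floor, and an
# evenly spread 5-point item.
floored = [0] * 6 + [1, 2, 3, 4]
spread = [1, 2, 3, 4, 5] * 2
floored_flag = piles_up(floored)
spread_flag = piles_up(spread)
```

A variable flagged this way would be a candidate for one of the four options listed above.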
 Linda K. Muthen posted on Thursday, October 04, 2012 - 9:57 am

By default, 0.5 divided by the sample size is added to the zero cells. See the ADDFREQUENCY option.

This is more of a problem for bivariate tables. For ordinal variables 10 out of 150 is probably acceptable as long as it does not cause problems.
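
The numbers in this exchange can be reproduced with a short Python sketch. It counts the bivariate cells for 4 items with 5 categories each and shows the default adjustment of 0.5 divided by the sample size being added to zero cells. The function names and the example table are hypothetical, and how Mplus applies the adjustment internally is not shown in this thread.

```python
from math import comb


def total_bivariate_cells(n_items, n_categories):
    """Total cells over all pairwise tables when every item has n_categories."""
    return comb(n_items, 2) * n_categories ** 2


def add_to_zero_cells(table, sample_size, constant=0.5):
    """Add constant / sample_size to each zero cell of a frequency table.

    Mirrors the default described above (0.5 divided by the sample size).
    """
    increment = constant / sample_size
    return [[cell if cell > 0 else increment for cell in row] for row in table]


cells = total_bivariate_cells(4, 5)  # 6 pairs of items x 25 cells each
adjusted = add_to_zero_cells([[40, 0], [10, 50]], sample_size=100)
```

With 4 items there are 6 pairs, each with a 5 x 5 table, which gives the 150 cells mentioned in the question.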
 Jietng Zhang posted on Tuesday, December 04, 2012 - 1:30 pm
Dear Bengt & Linda,

Recently, I conducted EFA and CFA for a screening scale with binary indicators. I find that the factor scores are either negatively skewed or follow a mixture distribution, and about 15% of the sample has the same lowest scores for the five factors: -.499, -.714, -.761, -.512, -.771.

Is it a kind of floor effect? Is it a problem that factor scores are not normally distributed? If so, any solution? Thank you so much!

PS: the data are from the total population of freshmen in a college, and most of the items have a high percentage of negative responses.
 Bengt O. Muthen posted on Tuesday, December 04, 2012 - 1:48 pm
This may be due to the items, for instance not allowing a separation of scores at the low end. It does not necessarily mean that the factor distribution is non-normal.

For an alternative approach using mixtures in Mplus, see

Wall, M. M., Guo, J., & Amemiya, Y. (2012). Mixture factor analysis for approximating a nonnormally distributed continuous latent factor with continuous and dichotomous observed variables. Multivariate Behavioral Research, 47:2, 276-313.
 Jietng Zhang posted on Sunday, December 09, 2012 - 12:19 pm
Thank you, Bengt. The paper is quite interesting and useful to me.
 Daniel Lee posted on Monday, February 11, 2013 - 7:48 pm
Hi, just to make sure, I have an 18-item scale with Likert-type items (1-5). If I use WLSMV, do I need to worry about multivariate normality? I know this is a concern when you use ML, but wasn't sure for WLSMV. Thank you!
 Linda K. Muthen posted on Monday, February 11, 2013 - 8:16 pm
With categorical outcomes, you don't need to worry about multivariate normality with WLSMV or ML if the variables are on the CATEGORICAL list. In both cases, categorical variable methodology handles floor and ceiling effects.
 Chantal Hermann posted on Wednesday, March 20, 2013 - 7:57 am

I have run an EFA on ordinal data (polychoric correlations) with the WLSMV estimation method on a number of different items that are rated on different Likert scales (e.g., some 7-point Likert scales, some 5-point Likert scales, some 4-point Likert scales). The items seem to break up into factors based on their Likert scale scoring (e.g., all of the 7-point items cluster together). Is it possible that the factor structure is a product of how the items have been scored? (e.g., items cluster based on likert scale scoring vs. underlying latent factors)? Thank you for your help.
 Linda K. Muthen posted on Wednesday, March 20, 2013 - 10:21 am
How did you expect the factors to load? It seems the content of items with a certain number of categories must be the same.
 Rebecca D Rhead posted on Wednesday, February 26, 2014 - 11:41 am

I have an EFA model with 9 ordinal indicator variables (5 point likert scale) and one of these variables demonstrates floor/ceiling effect. I have specified all variables as categorical and used MLR estimation.

1) Will MLR deal with the floor/ceiling effect of my one troublesome variable?

2) Is ML method of EFA still suitable given the inclusion of this variable? Or would PAF be better?

Many Thanks x
 Linda K. Muthen posted on Wednesday, February 26, 2014 - 12:31 pm
1. Categorical data methodology deals with floor and ceiling effects using maximum likelihood and weighted least squares estimation for all variables on the CATEGORICAL list.

2. Yes.
 seungjin lee posted on Wednesday, May 28, 2014 - 8:43 am
Hello Dr. Muthen,

I have two questions.

1. I conducted an EFA with a binary data set using the WLSMV estimation method. I expected to get WRMR instead of SRMR. However, SRMR was estimated. Can you explain why?

2. Let's suppose that a two-factor model is the best model from the EFA. Is it a good idea to conduct a CFA with these two factors to confirm the EFA result? I read somewhere that this is not a good idea. Could you explain to me why?

Have a good day!

 seungjin lee posted on Wednesday, May 28, 2014 - 9:07 am
Hello Dr. Muthen,

I am sorry again.. :-)

Is there any meaning in a two-factor CFA model in which all indicators are explained by both factors (it seems like an EFA model) if it is not a latent growth model?

 Bengt O. Muthen posted on Wednesday, May 28, 2014 - 6:04 pm
1. With EFA the weighting is less important since the model doesn't mix say thresholds and correlations, but focuses on correlations alone.

2. CFA is not needed on top of EFA since you have all you need in the EFA (model fit, SEs).

Regarding your last question I don't understand why you think a 2-factor model doesn't have meaning. Maybe you want to read a factor analysis book like Brown's.
 seungjin lee posted on Wednesday, May 28, 2014 - 6:29 pm
Hello Dr. Muthen,

First of all, I always appreciate your response!!

My second question was

"Is there any meaning in a two-factor CFA model in which all indicators are explained by both factors (it seems like an EFA model) if it is not a latent growth model?"

Probably, my statement was not clear with my language barrier. - -;

What I meant:
In the two-factor CFA model, both factors have exactly the same indicators, so each indicator is explained by both factors.
This looks like an EFA model before rotation, so it does not make sense to me to conduct a CFA.

If that is how you understood my question, I have probably missed something. If you don't mind, could you let me know the title of the book by Brown that you suggested?

Thank you so much!!


 Bengt O. Muthen posted on Thursday, May 29, 2014 - 11:05 am
A 2-factor CFA model needs to apply at least 4 restrictions on the factor loadings and factor covariance matrix to be identified. That's what EFA does. Typically, CFA has more restrictions.
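
The count of 4 restrictions for two factors is the m = 2 case of the general rule that an EFA with m factors imposes m squared restrictions. The Python sketch below shows one common way to break that count down for an unrotated orthogonal starting solution. The function names are hypothetical, and the specific breakdown is an illustration rather than a description of what Mplus does internally.

```python
def efa_restrictions(m):
    """Number of restrictions an m-factor EFA imposes for identification."""
    return m * m


def restriction_breakdown(m):
    """One common way to reach m squared restrictions.

    m factor variances fixed at 1, m(m-1)/2 factor covariances fixed at 0,
    and m(m-1)/2 factor loadings fixed (for example at 0).
    """
    return {
        "factor_variances": m,
        "factor_covariances": m * (m - 1) // 2,
        "loadings": m * (m - 1) // 2,
    }


two_factor_total = efa_restrictions(2)
two_factor_parts = restriction_breakdown(2)
```

For two factors this gives 2 + 1 + 1 = 4 restrictions. A CFA that fixes more loadings than this is more restricted than the corresponding EFA.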

Brown (2006) is a Guilford book called Conf. FA... We recommend it for CFA.
 Jamie Griffith posted on Friday, August 01, 2014 - 4:43 pm
Dear Mplus Team

I have a question concerning EFA with a bifactor (oblique) rotation using categorical indicators. With the MLR (or any other) estimator, are thresholds calculated for each indicator? If so, how can I find them in the output?

Many thanks in advance

Jamie Griffith
 Linda K. Muthen posted on Friday, August 01, 2014 - 4:59 pm
We don't give them with TYPE=EFA but we do with ESEM. Example 5.24 without covariates and direct effects is the same as EFA. You can do this for categorical variables by adding the CATEGORICAL option.
 Jamie Griffith posted on Sunday, August 03, 2014 - 1:38 pm
Dear Linda

Thanks so much for your speedy response. I do have a follow-up question: Is ESEM possible with ML or MLR estimation? I am guessing not, because I have only been able to fit an ESEM model using WLSMV so far. With ML and MLR, Mplus issues me a warning.

Thanks again

 Linda K. Muthen posted on Sunday, August 03, 2014 - 2:23 pm
What is the warning? It should be available.
 Jamie Griffith posted on Monday, August 04, 2014 - 7:12 am
Hi Linda

When I attempt to fit the model with MLR and ML, I receive the following error :

*** ERROR in MODEL command
The use of EFA factors (ESEM) is not allowed with ALGORITHM=INTEGRATION.

Using WLSMV, the analyses progress fine.

If you have any advice about fitting it using ML and MLR, I would very much appreciate it.

Thanks !

 Linda K. Muthen posted on Monday, August 04, 2014 - 7:17 am
There is currently no way around this.
 Margarita  posted on Wednesday, April 08, 2015 - 1:43 am
Dear Dr. Muthen,

I have 14 ordinal items from two different measures with different scales (7 items from a 4-point scale and 7 from a 7-point scale) and I would like to see whether the two measures overlap. Initially, I carried out a 2-factor EFA with WLSMV but the results are completely different when I use ML. The two measures overlap when WLSMV is used, whereas when ML is used they seem to be distinct. Thus, I was wondering:

1. Can I use items that have different Likert-scales in EFA? and

2. If yes, given that results differ between WLSMV and ML, which of the two would be more preferable?

Thank you for your help!
 Linda K. Muthen posted on Wednesday, April 08, 2015 - 10:11 am
1. Yes.
2. The results should be very close. Please send the two outputs and your license number to support@statmodel.com.
 Cheng posted on Thursday, December 24, 2015 - 6:13 am
1. Can I use WLSMV estimator to run EFA in Mplus if the indicators are categorical (True/false)?
2. Can I use items that have different Likert scales in CFA? I saw that your previous comment mentioned that this is possible in EFA.

 Linda K. Muthen posted on Thursday, December 24, 2015 - 6:20 am
Yes to both.
 Cheng posted on Thursday, December 24, 2015 - 9:36 pm
May I ask again: if I have 4 categorical (true/false) items and 6 items on a 5-point Likert scale, can I still run them together in EFA and then CFA in Mplus?

I use SPSS most of the time for EFA, and I always thought that we should include only indicators with the same Likert scale in one EFA model, without mixing different scales. Maybe I am wrong.

 Linda K. Muthen posted on Friday, December 25, 2015 - 6:32 am
You can mix scales.
 Yann Le Corff posted on Saturday, October 01, 2016 - 7:34 am
Dear Dr. Muthen,

I am conducting an EFA on a 300-item personality inventory. Items are dichotomous (true-false). N = 3,500.

I wanted to use the WLSMV method but as others have reported, the analysis takes forever (still computing after two days). I am using my colleague's 6.12 64-bit version because my 7.2 32-bit would not even run the analysis.

Would the ULSMV be an appropriate alternative? In fact, which would be the most appropriate method (and why)? I've tried to find papers comparing ULSMV and WLSMV but found nothing except a dissertation using ordinal variables.
 Bengt O. Muthen posted on Monday, October 03, 2016 - 7:04 am
You should use ULSMV. I think there are articles on it in the SEM journal. If you have only a couple of factors you can use ML. See also our FAQ:

Estimator choices with categorical outcomes

If you do these big analyses you should get a fast 64-bit machine - and upgrade to 7.4.