
Anonymous posted on Monday, May 23, 2005  4:18 pm



Hello. I was wondering... Will Mplus perform a principal components analysis on categorical indicators? Is it possible to obtain communality estimates using a WLSMV estimator to factor analyze categorical indicators in Mplus? Thanks for your help.

bmuthen posted on Monday, May 23, 2005  5:52 pm



No, but a ULS factor analysis would seem a good alternative, where you don't have to assume zero residual variances (PCA is actually giving the starting values for the FA). Yes, they are obtained as 1 - (residual variance).
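
To make the arithmetic in the reply above concrete: the communality of each indicator is one minus its residual variance. The following is a minimal Python sketch, assuming the residual variances are on a standardized metric (unit total variance per indicator). The variable names and the numbers are made up for illustration and are not Mplus output.

```python
def communalities(residual_variances):
    """Return 1 - residual variance for each indicator.

    Assumes each residual variance is on a standardized metric,
    i.e. the total variance of the indicator is 1.
    """
    return {name: 1.0 - rv for name, rv in residual_variances.items()}


# Hypothetical residual variances for three indicators.
example = {"u1": 0.36, "u2": 0.51, "u3": 0.75}
result = communalities(example)

for name, h2 in result.items():
    print(f"{name}: communality = {h2:.2f}")
```

An indicator with a residual variance close to 1 has a communality close to 0, meaning the factors explain almost none of its variance.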

Reetu posted on Thursday, December 15, 2005  2:54 pm



I'm trying to do an EFA with both continuous and categorical variables. I have missing values in my continuous variables but not in my categorical ones. The code is written below: TYPE = MISSING EFA 1 10; I am using version 2.14 and I keep getting an error saying that it will only perform this on continuous variables. Is there a way of getting around this?


No, you need Version 3 for that. 


Hello, I am doing both EFA and CFA with Likert-type items (1-5), and treated the items as if they were categorical based on recommendations I've seen by you elsewhere. I used weighted least squares as the estimation method. 1) Is it correct to use WLS and to treat Likert-type items as categorical and not continuous when there are so few categories, i.e., fewer than 10? 2) If it is okay, can you offer a suggestion for some references to justify treating Likert items as categorical? I need to provide a rationale for my choice for a paper. 3) Does the paper below offer this justification and if so, can you send it to me? It's not published as far as I can tell. "Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes." Accepted for publication in Psychometrika, #75. By Muthén, du Toit, & Spisic 1997. Thank you.


I don't think any justification is needed to treat 5-point Likert items as categorical variables. Likert items are categorical. There would need to be a justification to treat them as continuous variables. See the following paper for more information: Muthén, B. & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.


Thank you for verifying that what I did was correct. Is there any chance you can send me the paper in the British Journal because my library doesn't have electronic access that far back in time and I am out of state so I can't go in person to the library. Thank you so much! 


Sorry, I didn't provide an email address: pcharles@email.unc.edu Thanks again.


That should be available at Bengt's website: http://www.gseis.ucla.edu/faculty/muthen/muthen3.htm 


i have a 30-item survey. survey responses are categorical (but not ordinal). given this, is there a way to do an EFA (and then a CFA) to identify factors? i looked in the mplus manual, but the only examples provided are for ordinal categorical data. if it is not possible to conduct an EFA/CFA, is there another way to determine how question responses group together?


Do you mean that the factor indicators are nominal, unordered categorical? 


that is correct. i believe there will be an underlying response pattern that falls into a number of different "factors," but i am not sure of the proper analysis technique to determine this. thanks! 


We don't allow nominal factor indicators for EFA. I can't think of any other analysis technique that would work for this. 


I've used ULS as the estimator for an EFA of about 16 5-point Likert items measuring various symptoms, treated as ordinal categorical variables. I chose ULS instead of WLSMV because many of the item distributions are strongly skewed and the sample is only 157 cases, after reading Bengt's paper comparing ML, GLS, WLS, Robust WLS, and ULS for analysis of non-normal data (can't find the reference now!). My memory is that Robust WLS performed best as long as N>200. So, I'm using ULS. The problem I have is that Mplus only gives the unstandardized RMR as a fit index, which has no standard criterion for good fit. What to do?


Skewness is not an issue with categorical indicators. The categorical data methodology can handle floor and ceiling effects. I suggest using WLSMV. 


Hi Linda, following up and working on this again now that I've got a draft to review from the researcher... The primary reason I went with ULS was the small sample size combined with the ordinal scale. In addition, for a couple of the analyses, the results from a WLSMV analysis were not consistent with those from the ULS analysis, making me want to be more cautious with those and stick to ULS (cf. Muthén 1989, paper #21). So now, I'm still wondering about evaluating model fit. For this analysis, Mplus 5.1 gives only the RMR, not the standardized RMR (for which I found some guidance in one book). I understand that the size of the RMR is scale dependent and so there are no standards similar to those for the RMSEA or even the SRMR that can be used to evaluate model fit. I'm thinking that an interpretation of the RMR for my data might therefore be possible by extension, if I compare the RMR from the ULS analysis to the RMR I get from a "good" analysis using WLSMV, where I get the RMSEA as well as the RMR. If the RMR indices are in the same ballpark for the two analyses, and if the RMSEA is acceptable for the WLSMV analysis, I think the ULS estimate for the RMR may provide evidence of "good fit". Any thoughts would be welcome! Bruce


If you want to use unweighted least squares and obtain fit statistics, use ULSMV which gives standard errors and fit statistics. ULS gives only parameter estimates and is useful when there are very many categorical variables. The results between unweighted least squares and weighted least squares should not differ much if the model fits well. 

Qi posted on Monday, December 08, 2008  9:11 am



On pp. 482-483 of the Mplus manual, are the estimators in bold font the default estimators in Mplus? I am running EFA and CFA for items using a Likert scale, which are treated as ordered categorical variables. According to Flora & Curran (2004), WLSMV seems to be the best estimator for CFA with such items, which I believe is also the default in Mplus 5. For EFA, the default in Mplus is WLSM; is this the recommended estimator? Is there any reference showing that WLSM is better than WLSMV for EFA with ordered categorical items? Thanks for your guidance!


Yes, the bolded entries are the default. WLSM was selected as the default estimator for EFA because EFA often involves a lot of items and it is faster than WLSMV. There is no article that documents this. 


Dear Linda, 1. Regarding the last post (WLSM vs. WLSMV), if the number of variables/items is relatively small (<15), would you advise using WLSMV rather than WLSM? Apart from the time issue you just mentioned, would you mind enlightening me on the advantages of one estimator over the other? I've looked for information on this, but was not very successful... 2. I've got a mix of ordinal (5-point scales) and binary variables. Some of my binary variables are highly skewed, some highly skewed in the opposite direction. Moreover, crosstabs between my variables reveal that some combinations have zero counts, which I've read is particularly detrimental to polychoric correlations. I've run some EFAs (with the default WLSM) and, indeed, some models did not converge. Would you advise dropping the problematic variables, combining these variables into an index for instance, or using a more appropriate estimation method (in which case, which one)?


1. Yes, I would use WLSMV when the number of variables is small. See the following paper which is available on the website: Muthén, B., du Toit, S.H.C. & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. 2. You do not want empty cells in the bivariate tables. You can either collapse categories if that is appropriate or use only one of the variables. Methods for categorical data using either weighted least squares or maximum likelihood estimation take floor and ceiling effects into account. 


I am doing an EFA of 32 ordinal variables with 5 categories that are positively skewed. The sample size is 218, so I am using ULSMV. There is no error message when I run the analysis, but I was wondering if I should collapse some categories, given the number of cells with zero frequencies in bivariate tables, especially for the higher ratings. 


I would collapse categories particularly due to the small sample size. 
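
To illustrate the advice about empty cells and collapsing categories, here is a minimal Python sketch that counts zero-frequency cells across all bivariate tables before and after merging the sparse upper categories. The function names and the tiny data set are hypothetical, and each table is built from the categories actually observed for each item. This is only a way to inspect the data outside Mplus, not a description of what Mplus does internally.

```python
from itertools import combinations


def count_empty_cells(data):
    """Count zero-frequency cells over all pairwise cross-tabulations.

    data maps an item name to a list of ordinal codes (same length per item).
    Each table spans only the categories observed for the two items.
    """
    empty = 0
    for name_a, name_b in combinations(sorted(data), 2):
        cats_a = sorted(set(data[name_a]))
        cats_b = sorted(set(data[name_b]))
        observed = set(zip(data[name_a], data[name_b]))
        empty += sum(
            1 for a in cats_a for b in cats_b if (a, b) not in observed
        )
    return empty


def collapse_top(values, new_max):
    """Merge every category above new_max into new_max."""
    return [min(v, new_max) for v in values]


# Hypothetical positively skewed 5-category items.
items = {
    "item_a": [1, 1, 1, 2, 2, 3, 3, 4, 5, 1],
    "item_b": [1, 1, 2, 2, 3, 3, 1, 2, 1, 5],
}
collapsed = {name: collapse_top(vals, 3) for name, vals in items.items()}

empty_before = count_empty_cells(items)
empty_after = count_empty_cells(collapsed)
print(empty_before, empty_after)
```

Whether merging categories is substantively appropriate depends on the items; the sketch only shows how the number of empty cells can be compared before and after.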


Hi, I just have a quick question that I was hoping you could please help me with. I would like to run an EFA on a number of categorical variables that have different Likert scales (e.g., some 4-point, some 5-point, and some 7-point). Is this problematic for EFA? Also, is it possible to run this using Mplus? Thank you in advance, Chantal


This is not a problem and can be done in Mplus. 


Hi Linda, Thank you for the quick response! Just a follow-up question: would Mplus calculate a polychoric correlation matrix in this case? Thanks, Chantal


The correlation between two ordered categorical variables is a polychoric correlation. 


Hi, I'm conducting an EFA with highly positively skewed data so I used the ordinal method. I know that WLSM is the default. However, when I use the ULSMV estimator I get a better model fit. Can you tell me when it is appropriate to use the ULSMV versus the WLSM estimator? Thanks, Teresa 


It can happen in some cases. See articles by Albert Maydeu where he discusses this issue. 


Hi Linda, Thank you for the reference. It seems that ULSMV is best suited to small sample sizes. My N = 15,917 and the EFA is on 9 items that make up a depression scale (PHQ-9). Most prior research shows evidence of a single factor; however, there is evidence of a 2-factor structure as well. This is why I wanted to do an EFA. I'm just not very familiar with Mplus or with identifying the correct estimator. Given this information, what is your recommendation for an estimator? Thanks again, Teresa


Our default is WLSMV. 


Hi, I just have a quick question I was hoping you could please help me with. I have run my analyses with both FACTOR and Mplus and I am getting different results. I noticed that the polychoric correlations computed by the two programs are different. 1) Do you know why the correlations might differ? 2) Do you know of any differences between FACTOR and Mplus that would lead to different results? The analyses were for a sample of 280 participants, 36 categorical items (4-point Likert scale), ULS, and Promax rotation. Also, in FACTOR my correlation matrix was non-positive definite (I used the ridge estimator to correct it), but it was not in Mplus. Thanks for your help, Chantal Hermann


Are you sure that FACTOR computes polychoric correlations and not regular (Pearson) correlations? Do you use the same sample size in FACTOR and Mplus for the pairs of variables? See Mplus Type=Basic output. If this doesn't help, please send outputs and license number to support. 


Hello, I have 2 related questions: 1. Realizing EFA/CFA is robust to normality assumptions, are there any guidelines for when an indicator's distribution is too skewed to be useful? I have a large (n=972) dataset with 33 indicators, 14 of them categorical (dichotomous). But most suffer from heavy floor effects (censored down?). My "best distributed" categorical indicator has 76% 0s and 24% 1s, and my "worst distributed" one has 95.8% 0s and 4.2% 1s (average across all 14 categorical indicators: 85% 0s, 15% 1s... lopsided distributions indeed!). Even most of my continuous indicators have well over 50% of cases at the floor value, with the scant remainder occupying progressively higher values. My question: at what point do distributions (especially categorical ones) become too lopsided to be useful? 2. My second question pertains to the VARIABLE subcommands CENSORED and CATEGORICAL. I thought I could simultaneously run both of these subcommands for EFA, even with the same indicators identified under each; but when I try I get an error message saying I can only specify each indicator under CENSORED or CATEGORICAL, but not both. If there is no way to have the same indicators in both subcommands, which subcommand should one choose when all categorical indicators are also highly censored (e.g., see question 1 above)? In other words, is it more important to identify a censored categorical indicator as CENSORED or CATEGORICAL? Thanks!


Categorical data methodology can handle floor and ceiling effects. Skewness is for continuous variables. The issue for categorical variables is empty cells in the bivariate tables.


Thanks, Linda. I appreciate your reply. Regarding EFA/CFA, are there any problems with dichotomizing hopelessly skewed continuous indicators and treating them as categorical indicators? I have some continuous variables whose distributions can't be aided by transformations, so now that I know distributions are never an issue with categorical variables, I'm wondering if dichotomizing messy continuous variables is a viable option for sidestepping normality assumptions? Forgive me if this is a newbie question to ask; I'm still a student and trying to learn EFA/CFA with Mplus on my own. Thanks again!


Hi. Regarding the post by Linda on the empty cells issue, how does Mplus treat the empty cells in bivariate categorical tables? Are there any corrections done? Should I be worried about this issue if I had 10 empty cells out of 150 (4 items, 5 ordered response categories each)? 


Shane: If you have more than 25% at either the upper or lower value, you can try the following: 1. Censored 2. Two-part modeling 3. Trichotomizing 4. Dichotomizing
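
The 25% rule of thumb in the reply above can be checked before choosing among those options. The following minimal Python sketch computes the share of cases at the lowest and highest observed values of a continuous indicator. The function names and example values are hypothetical; the 0.25 threshold is taken from the reply.

```python
def floor_ceiling_share(values):
    """Return (share at the minimum, share at the maximum) of observed values."""
    n = len(values)
    lowest, highest = min(values), max(values)
    at_floor = sum(1 for v in values if v == lowest) / n
    at_ceiling = sum(1 for v in values if v == highest) / n
    return at_floor, at_ceiling


def piles_up(values, threshold=0.25):
    """True if more than `threshold` of cases sit at the floor or the ceiling."""
    at_floor, at_ceiling = floor_ceiling_share(values)
    return at_floor > threshold or at_ceiling > threshold


# Hypothetical indicators: one with most cases at zero, one spread out evenly.
floored = [0, 0, 0, 0, 0, 0, 1.5, 2.0, 3.2, 7.5]
spread = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(floor_ceiling_share(floored), piles_up(floored))
print(floor_ceiling_share(spread), piles_up(spread))
```

An indicator flagged this way is a candidate for the censored, two-part, or categorized treatments listed in the reply.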


Carme: By default, 0.5 divided by the sample size is added to the zero cells. See the ADDFREQUENCY option. This is more of a problem for bivariate tables. For ordinal variables, 10 out of 150 is probably acceptable as long as it does not cause problems.
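
Two pieces of arithmetic in this exchange can be sketched in a few lines of Python: the total of 150 cells for 4 items with 5 categories each (6 item pairs times 25 cells per table), and the default adjustment of adding 0.5 divided by the sample size to each zero cell. The function names and the example table are hypothetical, and the code only mirrors the description in the reply. It is not a reproduction of the Mplus implementation.

```python
from math import comb


def total_bivariate_cells(n_items, n_categories):
    """Number of cells summed over all pairwise tables of the items."""
    return comb(n_items, 2) * n_categories ** 2


def add_to_zero_cells(table, sample_size, constant=0.5):
    """Replace each zero count with constant / sample_size; keep other cells."""
    return [
        [cell if cell > 0 else constant / sample_size for cell in row]
        for row in table
    ]


cells = total_bivariate_cells(4, 5)

# Hypothetical 2 x 2 table of counts with one empty cell, N = 100.
counts = [[40, 10], [0, 50]]
adjusted = add_to_zero_cells(counts, sample_size=100)

print(cells)
print(adjusted)
```

With N = 100 the added amount is 0.005, so the adjustment is tiny relative to the observed counts.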


Dear Bengt & Linda, Recently, I conducted EFA and CFA for a screening scale with binary indicators. I find the factor scores are either negatively skewed or mixture distributed, and about 15% of the sample share the same lowest scores on the five factors: .499, .714, .761, .512, .771. Is this a kind of floor effect? Is it a problem that the factor scores are not normally distributed? If so, is there any solution? Thank you so much! PS: the data cover the total population of freshmen in a college, and most of the items have a high percentage of negative responses.


This may be due to the items, for instance not allowing a separation of scores at the low end. It does not necessarily mean that the factor distribution is nonnormal. For an alternative approach using mixtures in Mplus, see Wall, M. M., Guo, J., & Amemiya, Y. (2012). Mixture factor analysis for approximating a nonnormally distributed continuous latent factor with continuous and dichotomous observed variables. Multivariate Behavioral Research, 47(2), 276-313.


Thank you, Bengt. The paper is quite interesting and useful to me. 

Daniel Lee posted on Monday, February 11, 2013  7:48 pm



Hi, just to make sure: I have an 18-item scale with Likert-type responses (1-5). If I use WLSMV, do I need to worry about multivariate normality? I know this is a concern when you use ML, but wasn't sure for WLSMV. Thank you!


With categorical outcomes, you don't need to worry about multivariate normality with WLSMV or ML if the variables are on the CATEGORICAL list. In both cases, categorical variable methodology handles floor and ceiling effects.


Hello, I have run an EFA on ordinal data (polychoric correlations) with the WLSMV estimation method on a number of different items that are rated on different Likert scales (e.g., some 7-point Likert scales, some 5-point Likert scales, some 4-point Likert scales). The items seem to break up into factors based on their Likert scale scoring (e.g., all of the 7-point items cluster together). Is it possible that the factor structure is a product of how the items have been scored (e.g., items cluster based on Likert scale scoring vs. underlying latent factors)? Thank you for your help.


How did you expect the factors to load? It seems the content of items with a certain number of categories must be the same. 


Hello, I have an EFA model with 9 ordinal indicator variables (5-point Likert scale), and one of these variables shows a floor/ceiling effect. I have specified all variables as categorical and used MLR estimation. 1) Will MLR deal with the floor/ceiling effect of my one troublesome variable? 2) Is the ML method of EFA still suitable given the inclusion of this variable? Or would PAF be better? Many thanks.


1. Categorical data methodology deals with floor and ceiling effects using maximum likelihood and weighted least squares estimation for all variables on the CATEGORICAL list. 2. Yes. 


Hello Dr. Muthen, I have two questions. 1. I conducted an EFA on a binary data set with the WLSMV estimation method. I expected to get WRMR instead of SRMR. However, SRMR was reported. Can you explain why? 2. Let's suppose that the two-factor model is the best model from the EFA. Is it a good idea to conduct a CFA with these two factors to confirm the EFA result? I read somewhere that this is not a good idea. Could you explain why? Have a good day! Thanks, SJ


Hello Dr. Muthen, I am sorry again.. :) Is there any meaning for a two-factor CFA model (all indicators are explained by two factors, so it seems like an EFA model) if it is not a latent growth model? Thanks, SJ


1. With EFA the weighting is less important since the model doesn't mix, say, thresholds and correlations, but focuses on correlations alone. 2. CFA is not needed on top of EFA since you have all you need in the EFA (model fit, SEs). Regarding your last question, I don't understand why you think a 2-factor model doesn't have meaning. Maybe you want to read a factor analysis book like Brown's.


Hello Dr. Muthen, First of all, I always appreciate your response!! My second question was "Is there any meaning for a two-factor CFA model (all indicators are explained by two factors, so it seems like an EFA model) if it is not a latent growth model?" Probably my statement was not clear because of my language barrier. What I meant: in a two-factor CFA model where each factor has exactly the same indicators, each indicator is explained by both factors. That seems like an EFA model before rotation, so it does not make sense to me to conduct a CFA. If that is how you understood it, I have probably missed something. Also, if you don't mind, can you let me know the title of the book (by Brown, as you suggested)? Thank you so much!! Sincerely, SJ


A 2-factor CFA model needs to apply at least 4 restrictions on the factor loadings and factor covariance matrix to be identified. That's what EFA does. Typically, CFA has more restrictions. Brown (2006) is a Guilford book called Conf. FA... We recommend it for CFA.
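
The figure of 4 restrictions for two factors matches the usual count of m squared restrictions for an m-factor exploratory model. The minimal Python sketch below shows one common way to break that count down. The split into restrictions on the factor covariance matrix and on the loadings is an assumption about the unrotated starting solution, not something stated in the reply, and the function name is hypothetical.

```python
def efa_restrictions(n_factors):
    """Count the restrictions that identify an unrotated m-factor model.

    Assumed breakdown:
      - m(m+1)/2 on the factor covariance matrix
        (variances fixed at 1, covariances fixed at 0)
      - m(m-1)/2 on the loading matrix
        (for example, zeros above the diagonal)
    The total is m squared.
    """
    m = n_factors
    on_factor_covariances = m * (m + 1) // 2
    on_loadings = m * (m - 1) // 2
    return on_factor_covariances + on_loadings


for m in (1, 2, 3):
    print(m, efa_restrictions(m))
```

For two factors this gives 3 restrictions on the factor covariance matrix plus 1 on the loadings. A typical CFA fixes many more loadings to zero than that, which is why it is more restrictive than an EFA with the same number of factors.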
