

Grouping categories of ordinal data 



I have a survey sample (N=1780) with seven categorical items measured on an ordinal scale (0 to 10). The data are sparse, and univariate analysis of the items shows a ceiling effect (between 21.4% and 51.9%, depending on the item), some empty categories (for a single item), and several (29) very sparse categories (under 2%). My idea is to look for the best-fitting model among Latent Class Analysis, Factor Analysis, and Factor Mixture Analysis. I have some doubts about which grouping strategy to follow: a) using the same cutpoints for all seven items, which roughly speaking gives three categories; or b) using different cutpoints for each variable, which gives between two and four categories. If I consider response patterns, three categories seem too many, and perhaps it would be better to use two categories for all the items. If I look at the results of a multiple correspondence analysis, it seems better to use different cutpoints for each item. Finally, in previously published research on similar items in a small sample (N=146), with 6 items on a 5-point Likert scale using Mplus, I didn't give the issue any importance (peccavi) and ended up not grouping the categories (without being questioned by reviewers). Sorry for the length of this post. Any help or references would be very much appreciated. Thanking you in advance, Fernando
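The univariate checks described in this post (ceiling share, empty categories, categories under 2%) can be sketched in a few lines of Python. This is only an illustration: the function name `category_summary`, the 2% threshold argument, and the toy responses are assumptions made for the example, not the survey data.

```python
from collections import Counter

def category_summary(responses, scale=range(0, 11), sparse_threshold=0.02):
    """Summarize one ordinal item: ceiling share, empty and sparse categories."""
    n = len(responses)
    counts = Counter(responses)
    shares = {c: counts.get(c, 0) / n for c in scale}
    top = max(scale)
    return {
        # Share of responses in the highest category (the ceiling)
        "ceiling": shares[top],
        # Categories nobody chose
        "empty": [c for c in scale if counts.get(c, 0) == 0],
        # Categories chosen by some respondents, but fewer than the threshold
        "sparse": [c for c in scale if 0 < shares[c] < sparse_threshold],
    }

# Hypothetical toy item (100 responses), not the survey data
item = [10] * 50 + [9] * 20 + [8] * 15 + [7] * 10 + [5] * 4 + [0] * 1
summary = category_summary(item)
print(summary)
```

Running this per item gives a quick view of which categories would have to be merged under either grouping strategy.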


After thinking about how ordinal variables arise (by assuming a latent continuous variable behind them), I think the best alternative is to use different cutpoints for each item. Do you agree? (Nevertheless, I also remember reading in a paper that equal cutpoints do the least damage...) Obviously, given the data distribution, a two-part model or a censored approach seems inappropriate. Once again, any help or references would be very much appreciated. Thanks, Fernando
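To make the two strategies concrete, here is a minimal Python sketch of collapsing a 0 to 10 item with common cutpoints versus item-specific cutpoints. The helper names `collapse` and `quantile_cutpoints`, the common cutpoints `[5, 8]`, and the toy data are all hypothetical; the equal-frequency rule is just one possible way to pick item-specific cutpoints, not necessarily the one intended in the post.

```python
from bisect import bisect_right

def collapse(responses, cutpoints):
    """Map each score to a group index; a score equal to a cutpoint
    falls in the upper group."""
    return [bisect_right(cutpoints, r) for r in responses]

def quantile_cutpoints(responses, n_groups):
    """Item-specific cutpoints at roughly equal-frequency positions.
    Duplicate cutpoints are dropped, so a heavily skewed item may end up
    with fewer groups than requested."""
    ordered = sorted(responses)
    n = len(ordered)
    cuts = []
    for k in range(1, n_groups):
        c = ordered[(k * n) // n_groups]
        if c not in cuts and c > ordered[0]:
            cuts.append(c)
    return cuts

# Hypothetical toy item with a strong ceiling, not the survey data
item = [10] * 50 + [9] * 20 + [8] * 15 + [7] * 10 + [5] * 4 + [0] * 1

# a) Common cutpoints for every item (assumed values): 0-4, 5-7, 8-10
common = collapse(item, [5, 8])

# b) Cutpoints derived from this item's own distribution
own_cuts = quantile_cutpoints(item, 3)
own = collapse(item, own_cuts)
```

With a ceiling this strong, the item-specific cutpoints crowd towards the top of the scale, which is one way an item can end up with two to four groups rather than a fixed three.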


Tough question, and I doubt there is any really useful literature that gives definitive guidance on this. I would lean towards either ignoring the problem (although 52% censoring is a bit worrisome) and treating the items as continuous, or categorizing into, say, 3 categories for all items. But you could also keep, for each item, the maximum number of categories that you have data for. If you are concerned, you could also compare the solutions obtained under the different approaches. There shouldn't be sensitivity to the choice.


Thank you very much for your quick answer. In the references I've seen that: 1. If the variables are measured on the same scale, use common cutpoints. If not, balance the frequencies (Escofier B, Pagés J., 1997, p. 66). 2. The bivariate tables should satisfy Cochran's rule (<20% of cells with expected values under 5), and zero-valued cells must be avoided for ML estimators (Agresti, 2002, pp. 391-398). 3. A given numerical datum is not to be taken with all its precision, but for its meaning (Benzécri, 2006, Modulad (35), p. 2). So, no definite answer. Thanks again, Fernando
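The table check in point 2 can be sketched as follows. This assumes the usual reading of Cochran's rule (at most 20% of cells with expected counts below 5) and also counts observed zero cells; the function name `cochran_check`, the thresholds as arguments, and the example table are hypothetical.

```python
def cochran_check(table, min_expected=5.0, max_share=0.20):
    """Check a two-way table of observed counts (list of rows).
    Expected counts are computed under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    share_small = sum(e < min_expected for e in expected) / len(expected)
    zero_cells = sum(v == 0 for row in table for v in row)
    return {
        "share_small_expected": share_small,
        "zero_cells": zero_cells,
        # Assumed pass criterion: few small expected counts and no empty cells
        "ok": share_small <= max_share and zero_cells == 0,
    }

# Hypothetical 3x3 cross-tabulation of two collapsed items
table = [
    [30, 10, 0],
    [20, 25, 5],
    [2, 3, 5],
]
result = cochran_check(table)
print(result)
```

Applying this to every pair of items after collapsing shows whether a given set of cutpoints leaves any bivariate table too sparse.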


