reliability measures
Mplus Discussion > Categorical Data Modeling >
 Anonymous posted on Friday, April 15, 2005 - 8:11 am
I am new to the field of categorical variable modeling and Mplus. I want to run EFAs (on half the sample) and CFAs (on the other half). The categorical variables have 4 or 5 categories. Two questions:

1. How to perform a Pearson chi-square test of bivariate normality within Mplus (see Muthén, 1984)? Is it possible (and really necessary)? (The skewness of most of the items is quite large.)

2. If I find “good” solutions (scales) and want to report the reliability for the scales (total sample), can I use Cronbach’s alpha (calculated in SPSS), given that SPSS uses Pearson’s r? What would be a proper alternative?

Thank you very much for your help!
 BMuthen posted on Saturday, April 16, 2005 - 4:40 am
1. Mplus does not include tests for underlying normality. These tests are often too sensitive and the exact normality assumption may not be necessary -- see the recent Psych Methods paper on this topic. Skewness is not an issue with categorical outcomes.

2. Cronbach's alpha is for continuous items. If you are interested in how well the factors are measured, you would have to consider information function curves as done in item response theory (IRT). This is not yet available in Mplus.
 Elina Dale posted on Monday, April 15, 2013 - 6:41 pm
Dear Dr. Muthen,

I am trying to conduct a reliability evaluation with categorical items. I thought I could use the omega coefficient:
omega = (sum of factor loadings)^2 / (sum of uniquenesses + (sum of factor loadings)^2)
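As a sketch, this formula can be computed directly from a set of made-up standardized loadings, taking each residual variance (uniqueness) as 1 minus the squared standardized loading, as in a standardized solution; the numbers are illustrative, not from any fitted model:

```python
# Composite (omega) reliability for a unidimensional scale:
# omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances)

def omega(loadings, residual_variances):
    """McDonald's omega from factor loadings and uniquenesses."""
    lam = sum(loadings)
    return lam**2 / (lam**2 + sum(residual_variances))

# Hypothetical standardized loadings; uniquenesses = 1 - loading^2
loadings = [0.7, 0.6, 0.8, 0.5]
residuals = [1 - l**2 for l in loadings]
print(round(omega(loadings, residuals), 3))  # → 0.749
```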

Raykov & Marcoulides use it with categorical items as an approximate method: they obtain residual variances using ANALYSIS: ESTIMATOR = MLR and do not declare their factor indicators to be categorical. They say this is OK for items with more than 2 possible scores (i.e., not binary) and a minor clustering effect.

I am wondering why not use the residual variances obtained with the correct estimator for categorical clustered data, which appear in the R-SQUARE table at the very end of the output.

Are the residual variances for each item, provided in Column 5 of the same table as R^2, the thetas used in the estimation of reliability?

Thank you!
 Linda K. Muthen posted on Tuesday, April 16, 2013 - 1:09 pm
See the following from Tenko Raykov:

Raykov & Marcoulides (2011, Introduction to Psychometric Theory) do NOT say anywhere in their book that it is OK to use the omega coefficient formula cited in the question above for scale reliability with categorical items having at least 3 response options. What they say is that if (i) all items have at least 5 response options, (ii) there is no piling of responses at the smallest or largest category of any item, and (iii) there is no appreciable clustering effect for the studied subjects (cases), then using the omega coefficient formula presented in the question with ESTIMATOR = MLR; may be a reasonable approach. When items are not binary but have fewer than 5 options, they also hint at a possible approach based on parceling, but note its potentially serious limitations (it may produce different estimates for different parceling choices). I am not aware of a published, (perfectly) correct general way of estimating reliability with items that are not binary or continuous and that do not fulfill (i), (ii), and (iii) above.
 Elina Dale posted on Tuesday, April 16, 2013 - 3:24 pm
Dear Dr. Muthen,

Thank you for your response! I understand all the points you've made and the conditions (i), (ii) and (iii) that are necessary in order to use ESTIMATOR=MLR to estimate residual variances for the ordinal factor indicators, as per R&M.
My question is whether I could use the residual variances provided in the last column of the R-SQUARE table that I obtain when using the WLSMV estimator and declaring my factor indicators categorical. As you explained to me once, these are correct residual variances. Or did I misunderstand you? If these are the residual variances of the items, then I thought the formula for the omega coefficient could be applied even if the 3 conditions are not met. The formula needs the sum of uniquenesses in the denominator, and that is what I am trying to obtain. If I am using the correct estimator for ordinal data (and you said estimators for ordinal data are designed to handle piling) and accounting for clustering, can I use the residual variances provided in the Mplus output to estimate the sum of uniquenesses in the denominator of the formula?

In general, for ordered categorical indicators, when I specify TYPE = COMPLEX in the analysis to account for non-ignorable clustering, in what part of the output do I see the residual variances for the items? Are these residual variances in the same sense as the residual variances in ML estimation of a simple regression? Thank you again for your help!
 Bengt O. Muthen posted on Wednesday, April 17, 2013 - 1:59 pm
The Estimator = MLR answer draws on the idea that it is sometimes OK to approximate ordinal variables as continuous variables, and then apply the continuous-variable formulas, which build on a regular linear factor analysis model.

Your new question seems to ask - can we use these formulas for the estimates we get when treating the variables as categorical? In principle, the continuous-variable formulas would apply here also, but only with respect to the underlying y* latent response variables behind the categorical observed y's. Reliability formulas are concerned with sums of observed variables, however, not latent response variables. So the results would not have quite the same meaning. It is pretty much a research topic how to define reliability with categorical outcomes. I prefer to think in terms of how well a factor is measured, which is something that is also considered in Item Response Theory. That doesn't consider reliability of sums of observed variables.
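The gap between the y* and observed-y metrics can be seen in a small generic simulation (made-up values, nothing specific to Mplus): the Pearson correlation of categorized y's is attenuated relative to the correlation of the underlying y* variables, which is one reason formulas applied in the y* metric do not carry over directly to sums of observed items.

```python
import math, random

random.seed(1)
rho = 0.6        # correlation of the latent response variables y1*, y2*
n = 200_000

y1s, y2s, y1, y2 = [], [], [], []
for _ in range(n):
    a, b = random.gauss(0, 1), random.gauss(0, 1)
    x1 = a
    x2 = rho * a + math.sqrt(1 - rho**2) * b   # Corr(x1, x2) = rho
    y1s.append(x1); y2s.append(x2)
    y1.append(1 if x1 > 0 else 0)              # dichotomize at threshold 0
    y2.append(1 if x2 > 0 else 0)

def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    su = math.sqrt(sum((x - mu)**2 for x in u))
    sv = math.sqrt(sum((x - mv)**2 for x in v))
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / (su * sv)

# Pearson correlation of the observed binary y's falls well below rho
# (theory for thresholds at 0: (2/pi)*asin(rho), about 0.41 for rho = 0.6)
print(corr(y1, y2))
```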
 Elina Dale posted on Wednesday, April 17, 2013 - 3:12 pm
Thank you, Dr. Muthen!

This makes sense to me now, although I am then surprised at the use of Cronbach's alpha or ICC as measures of internal consistency (reliability) in the field of scale development. Likert scales are widely applied, and so are Cronbach's alpha and omega coefficients.

Do you think a better signal of the internal consistency of the items in a scale would be a review of the polychoric correlation matrix?

Also, I am still confused by the Mplus output. Even when my factor indicators are categorical and I declare them so (CATEGORICAL = i1 i2 i3 i4) and I perform a CFA with WLSMV as my estimator, at the very end of the output Mplus provides the R-SQUARE table, which has a residual variance for each of the observed(!) items. What do these mean then? Can they signal items with low reliability? Also, I thought that for ordinal data the residual variance was pi^2/3 for logistic models and 1(?) for probit. So what are those residual variances that Mplus shows next to each categorical item? How are they estimated?
 Bengt O. Muthen posted on Thursday, April 18, 2013 - 12:01 pm
I think most researchers approximate a Likert scale as a continuous variable. With 5 categories I would only consider categorical variable treatment if there were strong floor or ceiling effects.

The Mplus R-square with categorical variables refers to the continuous underlying latent response variable (what we call u* variables).

There are different metrics in which residual variances can be presented for categorical outcomes. With our WLSMV estimator using the default Delta parameterization, the u* variances are set at 1 and the residual variances are computed as remainders, so that the variance sums to that value of 1. With the Theta parameterization, the residual variances are instead set to 1 (so the u* variance is greater than 1) - that's what you refer to as the probit metric. And with logit (which requires ML in Mplus), you are right that the residual variances are set at pi^2/3.
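A hedged numeric sketch of how the two parameterizations relate, using one hypothetical standardized loading of 0.8; this is just arithmetic on the identities described above, not Mplus output:

```python
import math

# Hypothetical standardized loading of an item on its factor (u* metric)
lam_std = 0.8

# Delta parameterization: Var(u*) fixed at 1, residual variance is the remainder
resid_delta = 1 - lam_std**2                  # 0.36

# Theta parameterization: residual variance fixed at 1, so Var(u*) > 1
lam_theta = lam_std / math.sqrt(resid_delta)  # unstandardized loading, 4/3
var_ustar = lam_theta**2 + 1                  # about 2.78

# Both parameterizations imply the same standardized loading
print(lam_theta / math.sqrt(var_ustar))       # → 0.8 (up to rounding)
```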

I think analysis of u* correlations - so e.g. polychoric correlations - is useful and so is ML analysis of categorical outcomes. It tells you a lot about the items.

But currently there doesn't seem to be a generally accepted reliability index with categorical outcomes - correct me if I am wrong, readers. And I think it has to do with the fact that we are not considering scales based on summing u*'s, but on summing u's.

I also think that there is an overwhelming and puzzling over-emphasis on Cronbach's alpha reliability, which psychometricians have criticized strongly for years (for instance Raykov, 1997 in MBR and 2012 book; Sijtsma, 2009 in Psychometrika). Instead, I think folks should use the factor model and focus on how well the factors can be measured; there is a lot of writing on how to do that, including contributions from Item Response Theory.
 Elina Dale posted on Thursday, April 18, 2013 - 12:25 pm
Dear Dr. Muthen,

Thank you so much for this very comprehensive and clear response to my questions. This is very helpful to me! Thank you! Elina
 J.D. Haltigan posted on Monday, September 25, 2017 - 10:50 pm
Just touching down to see whether there has been any further discussion on this issue. I recently received a review stating that one cannot calculate omega hierarchical when using WLSMV because one cannot obtain residual variances with this estimator. However, I used MODEL CONSTRAINT to do so. FWIW, the indicators were declared as categorical with 3 response options.

I ask because I have also seen others calculate omega hierarchical when using WLSMV. Is this still an ambiguous issue?
 Bengt O. Muthen posted on Tuesday, September 26, 2017 - 10:26 am
The latest on this issue that I know about is in the FAQ by Raykov:

Reliability - binary and ordinal items

I would add that reliability can be expressed in terms of y* variables (this was done already in Linda Muthen's 1983 dissertation), but an issue is how this translates to y variables.
 J.D. Haltigan posted on Saturday, September 30, 2017 - 4:38 pm
Hi Bengt,

Delving a bit more into this issue and re-reading your statement above
"But currently there doesn't seem to be a generally accepted reliability index with categorical outcomes - correct me if I am wrong, readers. And I think it has to do with the fact that we are not considering scales based on summing u*'s, but on summing u's."
I believe this is indeed the crux of the issue. For example, a recent thread I was pointed to on the mirt-package Google Group (!topic/mirt-package/4D0dy46Q_Qc) seems to articulate just this.

My query then is: suppose one fits a bifactor model with categorical data (WLSMV, Delta), computes values of omega using MODEL CONSTRAINT, and reports them based on the bifactor model. Is the assumption then that if one were to go back to the raw data and compute the scale scores, the model-implied omega reliability would not be exact for the observed raw data, because it is based on the latent y*'s underlying the categorical y's [estimated from the polychoric correlation matrix]? If true, would the model-implied omega not still be a useful heuristic for approximating scale reliability based on the observed y's? I guess I am surprised that any bias from the y* approach to categorical items would be so great as to yield severe departures for the observed y's.
 Bengt O. Muthen posted on Sunday, October 01, 2017 - 12:39 pm
Q1: Right

Q2: Perhaps/probably.

See also my old paper (obtained via About Us through my UCLA site's Full Paper list)

Muthén, B. (1977). Some results on using summed raw scores and factor scores from dichotomous items in the estimation of structural equation models. Unpublished technical report, University of Uppsala, Sweden. [Available as PDF]
 J.D. Haltigan posted on Sunday, October 01, 2017 - 3:07 pm
Thanks, Bengt. Have dl'ed and looking forward to reading.
 Geir Pedersen posted on Wednesday, January 15, 2020 - 11:23 am
Dear Dr. Muthen,
I am trying to estimate Omega coefficient for three latent variables. Unfortunately I get the following error message:

Unknown parameter label in MODEL CONSTRAINT: L1

Can you please help me out here? The model is as follows:

Inadequa BY i1f5 i7f5 i9f5 i11f5 (L1-L4);
Idealize BY i2f5 i4f5 i8f5* (L5-L7);
Confiden BY i6f5 i10f5 i12f5* (L8-L10);
i1f5 i7f5 i9f5 i11f5 (R1-R4);
i2f5 i4f5 i8f5 (R5-R7);
i6f5 i10f5 i12f5 (R8-R10);

NEW(omegaIn omegaId omegaCo);
omegaIn = (L1+L2+L3+L4)^2/((L1+L2+L3+L4)^2 +R1+R2+R3+R4);
omegaId = (L5+L6+L7)^2/((L5+L6+L7)^2 +R5+R6+R7);
omegaCo = (L8+L9+L10)^2/((L8+L9+L10)^2 +R8+R9+R10);

 Bengt O. Muthen posted on Wednesday, January 15, 2020 - 1:00 pm
L1 refers to the first loading, which is fixed at 1 by default, so it is not an estimated parameter, which is what parameter labels must correspond to. You can either use the value 1 in place of L1 in the omega formula, or set the metric by fixing the factor variance instead.
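For example, the second option - setting the metric in the factor variances so that all loadings are free and can be labeled - could be sketched like this, assuming the same variable names as in the model above:

Inadequa BY i1f5* i7f5 i9f5 i11f5 (L1-L4);
Idealize BY i2f5* i4f5 i8f5 (L5-L7);
Confiden BY i6f5* i10f5 i12f5 (L8-L10);
Inadequa@1;
Idealize@1;
Confiden@1;

The * on each first indicator frees its loading, and fixing each factor variance at 1 identifies the model; the MODEL CONSTRAINT formulas can then be used unchanged.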
 Geir Pedersen posted on Thursday, January 16, 2020 - 12:57 am
Dear Dr. Muthen,
Thank you so much. Now the estimation terminated normally.
 Ebrahim Hamedi posted on Friday, March 20, 2020 - 5:41 pm
There is a document for omega reliability on your website which seems to be for continuous variables. Is there a separate document for ordered-categorical omega?

many thanks in advance
 Bengt O. Muthen posted on Saturday, March 21, 2020 - 11:28 am
I am not sure that a generally accepted omega exists for categorical outcomes. See our FAQ:

Reliability - binary and ordinal items
 Andrew Johnson posted on Sunday, March 22, 2020 - 11:05 pm
Hi Bengt,

What's your opinion on the categorical Omega proposed by Green and Yang (2009), which is used by the R package MBESS?

Also some interesting Bayesian comparisons in Yang and Xia (2019)

Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using structural equation modeling: An alternative to coefficient alpha. Psychometrika, 74, 155–167.

Yang, Y., & Xia, Y. (2019). Categorical omega with small sample sizes via Bayesian estimation: An alternative to frequentist estimators. Educational and Psychological Measurement, 79(1), 19–39.
 Bengt O. Muthen posted on Tuesday, March 24, 2020 - 11:18 am
I am not sure we have reached a consensus on how to compute reliability with categorical outcomes. I contacted Tenko Raykov, who has worked on related matters, and he allowed me to post his view:

“I am yet to see a proof of the claimed reliability estimator in Yang & Green (2009) being indeed equal to the ratio of true variance to observed variance of the overall sum score (whether weighted or not; see also next sentence). This proof is to explicate also (i) all its assumptions, (ii) the methods to be used for testing them, (iii) what assumptions these latter methods make themselves (and how to test them), and (iv) if these methods indeed yield dependable results with respect to what they are supposed/expected to accomplish.”
 Andrew Johnson posted on Wednesday, March 25, 2020 - 6:59 pm
Very interesting, thanks!