Anonymous posted on Friday, April 15, 2005 - 8:11 am
I am new to the field of CVMs and Mplus. I want to do EFAs (on half the sample) and CFAs (on the other half). Categorical variables: 4 or 5 categories. Two questions:
1. How can I perform a Pearson chi-square test of bivariate normality within Mplus (see Muthén, 1984)? Is it possible (and really necessary)? (The skewness of most of the items is quite large.)
2. If I find “good” solutions (scales) and want to report the reliability for the scales (total sample), can I use Cronbach’s alpha (calculated in SPSS), given that SPSS uses Pearson’s r? What would be a proper alternative?
Thank you very much for your help!
BMuthen posted on Saturday, April 16, 2005 - 4:40 am
1. Mplus does not include tests for underlying normality. These tests are often too sensitive and the exact normality assumption may not be necessary -- see the recent Psych Methods paper on this topic. Skewness is not an issue with categorical outcomes.
2. Cronbach's alpha is for continuous items. If you are interested in how well the factors are measured, you would have to consider information function curves as done in item response theory (IRT). This is not yet available in Mplus.
Elina Dale posted on Monday, April 15, 2013 - 6:41 pm
Dear Dr. Muthen,
I am trying to conduct a reliability evaluation with categorical items. I thought I could use the omega coefficient: omega = (sum of factor loadings)^2 / (sum of uniquenesses + (sum of factor loadings)^2)
Raykov & Marcoulides use it with categorical items as an approximate method: they obtain residual variances using ANALYSIS: ESTIMATOR = MLR and do not declare their factor indicators to be categorical. They say this is OK for items with more than 2 possible scores (i.e., not binary) and with only a minor clustering effect.
I am wondering why one could not instead use the residual variances obtained with the correct estimator for categorical clustered data, which appear in the R-SQUARE table at the very end of the output.
Are the residual variances for each item, provided in column 5 of the same table as R^2, the thetas used in estimating reliability?
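A minimal numeric sketch of the omega formula above (the loadings are made-up values, purely for illustration; in practice they would come from your estimated factor solution):

```python
# Hypothetical standardized factor loadings -- not from any real analysis.
loadings = [0.7, 0.8, 0.6, 0.75]
# In a standardized solution the uniquenesses are 1 - lambda^2.
residuals = [1 - l**2 for l in loadings]

sum_load = sum(loadings)
# omega = (sum of loadings)^2 / (sum of uniquenesses + (sum of loadings)^2)
omega = sum_load**2 / (sum(residuals) + sum_load**2)
print(round(omega, 3))
```

The same arithmetic can be done inside Mplus with MODEL CONSTRAINT by labeling the loading and residual-variance parameters.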
Raykov & Marcoulides (2011, Introduction to Psychometric Theory) do NOT say anywhere in their book that it is OK to use the omega coefficient formula cited in the question above for scale reliability with categorical items having at least 3 response options. What they do say is that if (i) all items have at least 5 response options, (ii) there is no piling of responses at the smallest or largest category for an item, and (iii) there is no appreciable clustering effect for the studied subjects (cases), then using the omega coefficient formula presented in the question with ESTIMATOR = MLR; may be a reasonable approach. When items are not binary but have fewer than 5 options, they also hint at a possible approach based on parceling, but they point out its potentially serious limitations (it may produce different estimates for different parceling choices). I am not aware of a published, (perfectly) correct way in general to estimate reliability with items that are not binary or continuous and that do not fulfill (i), (ii), and (iii) above.
Elina Dale posted on Tuesday, April 16, 2013 - 3:24 pm
Dear Dr. Muthen,
Thank you for your response! I understand all the points you've made and the conditions (i), (ii), and (iii) that must hold in order to use ESTIMATOR = MLR to estimate residual variances for ordinal factor indicators, as per R&M. My question is whether I could use the residual variances provided in the last column of the R-SQUARE table that I obtain when I use the WLSMV estimator and declare my factor indicators as categorical. As you once explained to me, these are correct residual variances. Or did I misunderstand you? If these are the residual variances of the items, then I thought the omega coefficient formula could be applied even if the 3 conditions are not met. The formula needs the sum of uniquenesses in the denominator, and that is what I am trying to obtain. If I am using the correct estimator for ordinal data (and you said estimators for ordinal data are designed to handle piling) and accounting for clustering, can I use the residual variances provided in the Mplus output to estimate the sum of uniquenesses in the denominator of the formula?
In general, for ordered categorical indicators, when I specify TYPE = COMPLEX in the analysis to account for non-ignorable clustering, where in the output do I see the residual variances for the items? Are these residual variances in the same sense as the residual variances in ML estimation of a simple regression? Thank you again for your help!
The ESTIMATOR = MLR answer draws on the idea that it is sometimes OK to approximate ordinal variables as continuous variables and then apply the continuous-variable formulas, which build on a regular linear factor analysis model.
Your new question seems to ask: can we use these formulas for the estimates we get when treating the variables as categorical? In principle, the continuous-variable formulas would apply here also, but only with respect to the underlying y* latent response variables behind the categorical observed y's. Reliability formulas are concerned with sums of observed variables, however, not latent response variables. So the results would not have quite the same meaning. How to define reliability with categorical outcomes is pretty much a research topic. I prefer to think in terms of how well a factor is measured, which is something that is also considered in item response theory (IRT). That does not consider reliability of sums of observed variables.
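As a toy illustration of the y versus y* distinction, the following simulation (assuming a one-factor model with a common loading of 0.7 on 6 items, 3 response categories, all values hypothetical) shows that collapsing continuous y* indicators into a few ordered categories attenuates the reliability of the observed sum score:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, lam = 100_000, 6, 0.7  # sample size, items, common loading (illustrative)

# Continuous latent response variables: y* = lam * f + sqrt(1 - lam^2) * e
f = rng.standard_normal(n)
ystar = lam * f[:, None] + np.sqrt(1 - lam**2) * rng.standard_normal((n, k))

# Observed ordinal items: collapse y* into 3 ordered categories (0, 1, 2)
y = np.digitize(ystar, [-0.5, 0.5])

def alpha(items):
    """Cronbach's alpha from an n x k item matrix."""
    m = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return m / (m - 1) * (1 - item_vars / total_var)

# Alpha for the categorized items is noticeably lower than for the
# continuous y* they were cut from.
print(alpha(ystar), alpha(y))
```

This is only about the observed-sum metric; the y* correlations themselves (estimated via polychorics) are not attenuated.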
Elina Dale posted on Wednesday, April 17, 2013 - 3:12 pm
Thank you, Dr. Muthen!
This makes sense to me now, although I am then surprised at the use of Cronbach's alpha or the ICC as measures of internal consistency (reliability) in the field of scale development. Likert scales are widely applied, and so are Cronbach's alpha and omega coefficients.
Do you think a better signal of the internal consistency of the items in a scale would be a review of the polychoric correlation matrix?
Also, I am still confused by the Mplus output. Even when my factor indicators are categorical and I declare them so (CATEGORICAL = i1 i2 i3 i4) and I perform a CFA with WLSMV as my estimator, at the very end of the output Mplus provides the R-SQUARE table, which has a residual variance for each of the observed(!) items. What do they mean then? Can they signal items with low reliability? Also, I thought that for ordinal data the residual variance was pi^2/3 for logistic models and 1(?) for probit. So what are those residual variances that Mplus shows next to each categorical item? How are they estimated?
I think most researchers approximate a Likert scale as a continuous variable. With 5 categories I would only consider categorical variable treatment if there were strong floor or ceiling effects.
The Mplus R-square with categorical variables refers to the continuous underlying latent response variable (what we call u* variables).
There are different metrics in which to present residual variances for categorical outcomes. With our WLSMV estimator using the default Delta parameterization, the u* variances are set at 1 and the residual variances are computed as the remainders needed to make the variance sum to that value of 1. With the Theta parameterization, the residual variances are instead set to 1 (so the u* variance is greater than 1) - that is what you refer to as the probit metric. And with logit (which requires ML in Mplus), you are right that the residual variances are set at pi^2/3.
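A small sketch of the relation between the Delta and Theta parameterizations described above, assuming a standardized one-factor solution with factor variance 1 (the loading value is hypothetical):

```python
import math

# Hypothetical Delta-parameterization loading (factor variance fixed at 1)
lam_delta = 0.7
# Delta: Var(u*) = 1, so the residual variance is the remainder
theta_delta = 1 - lam_delta**2

# Theta parameterization: residual variance fixed at 1, so Var(u*) > 1
lam_theta = lam_delta / math.sqrt(theta_delta)
var_ustar_theta = lam_theta**2 + 1

# Standardizing the Theta solution recovers the Delta loading
assert abs(lam_theta / math.sqrt(var_ustar_theta) - lam_delta) < 1e-12
print(lam_theta, var_ustar_theta)
```

The two parameterizations describe the same model; only the scale given to u* differs.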
I think analysis of u* correlations - so e.g. polychoric correlations - is useful and so is ML analysis of categorical outcomes. It tells you a lot about the items.
But currently there doesn't seem to be a generally accepted reliability index with categorical outcomes - correct me if I am wrong, readers. And I think it has to do with the fact that we are not considering scales based on summing u*'s, but on summing u's.
I also think that there is an overwhelming and puzzling over-emphasis on Cronbach alpha reliability which psychometricians have criticized strongly for years (for instance Raykov, 1997 in MBR, 2012 book; Sijtsma, 2009 Psychometrika). Instead, I think folks should use the factor model and focus on how well the factors can be measured and there is a lot of writing on how to do that, including contributions from Item Response Theory.
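One way to see the alpha criticism concretely: for a congeneric model (unequal loadings), alpha computed from the model-implied covariance matrix falls below omega; the two agree only under essential tau-equivalence (equal loadings). A sketch with hypothetical loadings:

```python
import numpy as np

# Hypothetical congeneric model: unequal standardized loadings, factor variance 1
lam = np.array([0.9, 0.8, 0.5, 0.3])
theta = 1 - lam**2  # uniquenesses

# Model-implied item covariance matrix: lambda lambda' + diag(theta)
cov = np.outer(lam, lam) + np.diag(theta)

k = len(lam)
# Cronbach's alpha from the covariance matrix
alpha = k / (k - 1) * (1 - np.trace(cov) / cov.sum())
# Omega from the factor solution
omega = lam.sum()**2 / (lam.sum()**2 + theta.sum())

print(alpha, omega)  # alpha understates omega here
```

With equal loadings (try lam = np.array([0.7] * 4)) the two values coincide.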
Elina Dale posted on Thursday, April 18, 2013 - 12:25 pm
Dear Dr. Muthen,
Thank you so much for this very comprehensive and clear response to my questions. This is very helpful to me! Thank you! Elina
Just touching down to see if there has been any further discussion on this issue. I recently received a review stating that one cannot calculate omega hierarchical when using WLSMV because one cannot obtain residual variances with this estimator. However, I used MODEL CONSTRAINT to do so. FWIW, the indicators were declared as categorical with 3 response options.
I ask because I have also seen others calculate omega hierarchical when using WLSMV. Is this still an ambiguous issue?
Delving a bit more into this issue and re-reading your statement above ** "But currently there doesn't seem to be a generally accepted reliability index with categorical outcomes - correct me if I am wrong, readers. And I think it has to do with the fact that we are not considering scales based on summing u*'s, but on summing u's." ** I believe this is indeed the crux of the issue. For example, a recent thread I was pointed to on the issue seems to articulate just this.
My query then is: suppose one fits a bifactor model with categorical data (WLSMV, Delta), computes values of omega using MODEL CONSTRAINT, and reports them based on the bifactor model. Is the assumption then that, if one were to go back to the raw data and compute the scale scores, the model-implied omega reliability would not exactly match the observed raw data, because it is based on the underlying latent y*'s behind the categorical y's [estimated using the polychoric correlation matrix]? If true, would the model-implied omega not still be a useful heuristic for approximating scale reliability based on the observed y's? I guess I'm surprised that any bias from the y* approach to categorical items would be so great as to yield severe departures from results based on the observed y's.
See also my old paper (obtained via About Us through my UCLA site's Full Paper list)
Muthén, B. (1977). Some results on using summed raw scores and factor scores from dichotomous items in the estimation of structural equation models. Unpublished Technical Report, University of Uppsala, Sweden. [Available as PDF]
L1 refers to the first loading, which is fixed at 1 by default. It is therefore not an estimated parameter, which is what parameter labels should correspond to. You can either use the value 1 in the omega formula or set the metric via the factor variance instead.
I am not sure we have reached a consensus on how to compute reliability with categorical outcomes. I contacted Tenko Raykov, who has worked on related matters, and he allowed me to post his view:
“I am yet to see a proof of the claimed reliability estimator in Yang & Green (2009) being indeed equal to the ratio of true variance to observed variance of the overall sum score (whether weighted or not; see also next sentence). This proof is to explicate also (i) all its assumptions, (ii) the methods to be used for testing them, (iii) what assumptions these latter methods make themselves (and how to test them), and (iv) if these methods indeed yield dependable results with respect to what they are supposed/expected to accomplish.”