Ian Zajac posted on Wednesday, August 15, 2007 - 8:03 pm
I am performing a CFA with dichotomous outcome variables. I get the EMPTY CELL warning regarding the bivariate table and the relevant variable names.
This, I have read, means that the tetrachoric correlation for a pair of variables is equal to 1. However, when I inspect the sample tetrachoric matrix, this is not the case at all. The correlations are generally moderate (@ .45 or so).
Am I looking at the right correlations? Also, could this be due to highly skewed variables? It appears that all the suspect variables are (i.e., >95% correct/incorrect).
Empty cells in bivariate tables are most common when variables have extreme cuts like 95/5. Empty cells imply correlations of one. I'm not sure what you are looking at, so I can't say if you are looking at the right thing. If you want further clarification, please send your input, data, output, and license number to firstname.lastname@example.org.
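To see which cell is empty for a flagged pair, it can help to tabulate the 2x2 tables yourself. The following is a minimal Python sketch, not Mplus code; the function names and the 20-case data set are hypothetical and only illustrate how a 95/5 item easily produces an empty cell.

```python
from itertools import combinations

def crosstab_2x2(x, y):
    """Count the four (x, y) combinations for two 0/1 sequences."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(x, y):
        counts[(a, b)] += 1
    return counts

def empty_cell_pairs(data):
    """data maps a variable name to a list of 0/1 values.
    Returns (name1, name2, [empty cells]) for every pair with a zero cell."""
    flagged = []
    for u, v in combinations(sorted(data), 2):
        table = crosstab_2x2(data[u], data[v])
        empty = [cell for cell, n in table.items() if n == 0]
        if empty:
            flagged.append((u, v, empty))
    return flagged

# Hypothetical data: item "a" is split 95/5, item "b" is split 50/50.
example = {
    "a": [1] * 19 + [0],
    "b": [1] * 10 + [0] * 10,
}
flagged = empty_cell_pairs(example)
print(flagged)
```

In this made-up example the single case with a = 0 also has b = 0, so the (a = 0, b = 1) cell is never observed.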
Ian Zajac posted on Monday, August 20, 2007 - 6:14 pm
OUTPUT is sampstat
You get a number of statistics including > SUMMARY OF CATEGORICAL DATA PROPORTIONS > SAMPLE STATISTICS > ESTIMATED SAMPLE STATISTICS > SAMPLE TETRACHORIC CORRELATIONS
Given the warnings, I would expect that when I inspect the 'Sample tetrachoric correlations', I would find a correlation of 1 between the variables noted in the warnings. But this isn't the case. The correlations are actually moderate and are never equal to 1.
Dear Linda, I have the exact same problem as Ian: "I get the EMPTY CELL warning regarding the bivariate table and the relevant variable names.
This, I have read, means that the tetrachoric correlation for a pair of variables is equal to 1. However, when I inspect the sample tetrachoric matrix, this is not the case at all. The correlations are generally moderate (@ .45 or so)."
For ordinal variables, is it necessary to ensure there are no empty cells in any of the bivariate tables? I am investigating the structure in 32 variables with 4 categories and will exclude 6 of the low-frequency variables to address this problem. However, is it OK if there is the occasional empty cell remaining?
I am having just the same problem that Ian and Theresa have had. I get a slew of warning messages telling me that the bivariate table of (dichotomous variable) and (other dichotomous variable) has an empty cell. What does this mean? If I calculate the bivariate correlation between those two variables, it is not 1, and it is also not 0. It is generally moderate, just as Ian described.
I'm a little confused by what is being said here. In my case I have: THE BIVARIATE TABLE OF M7 AND M1 HAS AN EMPTY CELL. In traditional categorical data analysis you could put 0.5 in that cell or something of that sort. Is there a way to deal with this in Mplus besides just deleting a variable?
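The effect of putting 0.5 in an empty cell can be illustrated outside Mplus. The sketch below uses Pearson's cos-pi approximation to the tetrachoric correlation, r = cos(pi / (1 + sqrt(ad / bc))), which is an assumption chosen for simplicity and is not the estimator Mplus uses. The function name, the `add` parameter, and the counts are hypothetical; whether and how Mplus itself adjusts zero cells is not stated in this thread.

```python
import math

def cos_pi_tetrachoric(n00, n01, n10, n11, add=0.0):
    """Cos-pi approximation to the tetrachoric correlation of a 2x2 table.
    `add` is placed into any cell whose count is zero (illustrative only)."""
    a, b, c, d = [n if n > 0 else n + add for n in (n00, n01, n10, n11)]
    if b * c == 0:
        if a * d == 0:
            raise ValueError("correlation is undefined for this table")
        return 1.0  # infinite odds ratio: correlation at the boundary
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1 + math.sqrt(odds_ratio)))

# Hypothetical table with one empty off-diagonal cell.
raw = cos_pi_tetrachoric(1, 0, 9, 10)
adjusted = cos_pi_tetrachoric(1, 0, 9, 10, add=0.5)
print(raw, adjusted)
```

With the empty cell left at zero the approximation sits at 1, while adding 0.5 to that cell pulls it to a moderate value. This is one possible reason a reported correlation can look moderate even though the table has an empty cell, but it should be checked against the actual output.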
Having searched high and low for why my estimates may differ in my IRT models depending on my use of WLSMV vs. MLR, I think this may be it.
In the default (WLSMV) mode, I get these warnings for a number of variables. Using MLR I don't. All data is nonmissing (I have no missing data). Does the MLR estimator simply handle the empty cells differently?
ML doesn't first compute latent correlations as in WLSMV, so results may differ a bit when there are zero cells (which hurts correlations). But note also that the default ML link is logit, whereas WLSMV uses probit.
1) Most p-values for the indicators (loadings) in the WLSMV (probit) run are significant; in MLR (logit) they are not. That said, the ICC curves are generally in accord. The one curve that literally reverses its direction is for the indicator that has the most empty cell warnings in WLSMV. As such, is MLR (logit) somehow taking into account the rarity of occurrence for this particular behavior (indicator) differently than probit? More specifically, given the high zero-inflation of the indicators, is MLR/logit a better choice?
2) What fascinates me is that the estimates of the IRT ZIP I ran (count) mimic the results of the MLR IRT run (categorical). Thus, my inclination is that using MLR in the binary case better addresses my data given such rare occurrences of some indicators? Or am I completely off base?
1) I don't think it is clear that ML is better than WLSMV with empty cells. Research with simulations would be needed to shed light on that; I can't recall having seen any. - Anyone? I would think ML also suffers from empty cells since you then have limited information about the association between pairs of items.
2) I am not sure one can take the count results as support for the binary ML advantage - they may suffer similarly from the empty cells.
Hi. I am testing invariance across gender using a CFA. The indicators are dichotomous. When I run the configural invariance analysis I get an error message that "the bivariate table has an empty cell." Through reading the forum, I understand that Mplus treats an empty cell as implying a tetrachoric correlation of 1. Thus, I tried to trim some items, but the problem remains. Could this just be an issue with too many zeros? Is there a way to fix this? Thanks!
When two dichotomous items have a zero cell in their bivariate table, only one variable should be used in the analysis.
Tait Medina posted on Thursday, February 26, 2015 - 10:52 am
I am estimating a single-factor model with 6 dichotomous items. The bivariate frequency table between 2 items has an empty cell. I am wondering if there has been any new thoughts on empty cells and ML? My initial thought is to drop one of the items since it is quite skewed (91% answer yes). But, I am wondering if anyone on this thread has any new thoughts on this, or has read any recent work that might shed light on how to handle this under ML. Thank you.
Having an empty cell in this case is equivalent to a polychoric correlation of +-1, and you can combine the two items into one with 3 categories, for example: 0 0 -> 0, 0 1 -> 1, 1 1 -> 2. That will preserve all the information in the data.
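The recoding could be done before the data are read into Mplus. Here is a minimal Python sketch of that mapping; the function name and the example responses are hypothetical, and it assumes the empty cell is the (1, 0) pattern, as in the example mapping.

```python
def combine_items(x, y):
    """Merge two 0/1 items into one 3-category item.
    Assumes the (1, 0) pattern never occurs (that is the empty cell)."""
    mapping = {(0, 0): 0, (0, 1): 1, (1, 1): 2}
    combined = []
    for pair in zip(x, y):
        if pair not in mapping:
            raise ValueError(f"pattern {pair} should be empty but was observed")
        combined.append(mapping[pair])
    return combined

# Hypothetical responses for two items.
item1 = [0, 0, 1, 0]
item2 = [0, 1, 1, 1]
merged = combine_items(item1, item2)
print(merged)
```

If a different cell is the empty one, the mapping would need to be adjusted so that the three observed patterns are the ones that receive categories.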
In the case of a ZIP model (count variable) I mentioned above two additional questions came to mind in looking at the output:
1) is it the case that loadings would only be provided for the non structural zero class, hence why only one set of loadings are provided
2) is it the case that the count variable is treated as continuous (non structural zero group)? If so, I was wondering if we are in IRT-land anymore (i.e., variable is not categorical or ordered categorical).
I'm still struggling to understand this. There are two reasons for an empty cell that I can think of.
The first is low coverage, in which the sample is too small to see the full variability of patterns in the population (i.e., all cells exist, but you don’t have enough observations to see it). This case would definitely be problematic for estimation of the correlation.
The other reason is that the difficulties of the items are very far apart, such that one value is extremely unlikely in the presence of another. For example, an incorrect response to an easy item (e.g., "Spell 'cat.'") is very unlikely to be paired with a correct response to a difficult item (e.g., "Spell 'ocelot.'"), whereas a correct response to the easy item can see a mix of correct and incorrect responses to the difficult item. This represents an empty cell situation that is actually reflective of reality, and to have an item pool that covers the full range of ability in the population, you would expect to see this pattern.
Is this distinction not important when designing an item pool and analyzing the resulting data?
I agree with you regarding the second reason for empty cells. If this is a prominent feature, I don't think regular factor analysis is suitable. Instead, a model is needed that reflects the difficulty ordering, like a Guttman scale.
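One rough way to judge whether the difficulty ordering is a prominent feature is to count how many response patterns are perfectly cumulative once the items are sorted from easiest to hardest. The Python sketch below is only an illustration under that assumption; the function names and the four response patterns are hypothetical, and it is not a substitute for fitting a model.

```python
def is_cumulative(pattern):
    """True if a 0/1 pattern, ordered easiest to hardest, never shows a
    pass on a harder item after a fail on an easier one."""
    return all(pattern[i] >= pattern[i + 1] for i in range(len(pattern) - 1))

def share_cumulative(patterns):
    """Proportion of respondents whose pattern is perfectly cumulative."""
    return sum(is_cumulative(p) for p in patterns) / len(patterns)

# Hypothetical responses to three items ordered from easiest to hardest.
patterns = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],  # fails the easiest item but passes a harder one
    [1, 1, 1],
]
share = share_cumulative(patterns)
print(share)
```

A share close to 1 would suggest that the empty cells mostly reflect the difficulty ordering rather than low coverage.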
I am working with a test that measures child development, so the items are organized according to their difficulty (similar to the Jeff Williams example). If I construct a score by area or domain of development (language, motor, etc.) and then run a CFA of the latent variable (development) on those scores, does that make sense? Will I be able to say something about the items?
Many of you will get quite a few empty cell warnings because your data is binary and your responses are naturally disproportional.
You might be hesitant to remove all the variables with an empty cell, as Linda has suggested. Whilst this is necessary, you can go about it in an efficient way. For instance, by removing the few variables which were most consistently related to empty cells, I no longer received a warning about empty cells. Of course, this does not mean that the correlations between remaining variables are any healthier - many of them likely still approach 1 and caution should be exercised. However, it does not mean you need to axe all variables listed.
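The selection of which variables to remove can be sketched as a simple greedy rule: repeatedly drop the variable that appears in the most flagged pairs until no flagged pair remains. This Python sketch is one possible reading of the approach described above, not a documented procedure; the function name and the variable names in the example are hypothetical, and a greedy choice is not guaranteed to give the smallest possible set.

```python
from collections import Counter

def variables_to_drop(empty_pairs):
    """Greedily pick variables to remove so that no flagged pair remains.
    empty_pairs is a list of (name1, name2) taken from the warnings."""
    remaining = [tuple(pair) for pair in empty_pairs]
    dropped = []
    while remaining:
        counts = Counter(name for pair in remaining for name in pair)
        # Ties are broken alphabetically so the result is deterministic.
        worst = max(sorted(counts), key=lambda name: counts[name])
        dropped.append(worst)
        remaining = [pair for pair in remaining if worst not in pair]
    return dropped

# Hypothetical list of pairs named in EMPTY CELL warnings.
pairs = [("M7", "M1"), ("M7", "M3"), ("M7", "M5"), ("M2", "M4")]
to_drop = variables_to_drop(pairs)
print(to_drop)
```

Substantive considerations (which items matter most for the construct) should still override a purely mechanical choice like this.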
I am performing a CFA using dichotomous variables (using 23 items for 6 factors, N = 142). I received two warnings after executing the CFA. The first one is the EMPTY CELL warning regarding the bivariate table and the relevant names. I know that empty cells imply correlations of one, but the items are not highly skewed (at least one of the two items for which the EMPTY CELL warning is given is not). Is it okay to include items in the CFA that trigger the EMPTY CELL warning?
In addition, I received the following warning: WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE HR5.
When I remove certain items, the ‘problem variable’ shifts to another variable. What can I do with this warning in order to solve the issue?