
Ian Zajac posted on Thursday, August 16, 2007  2:03 am



I am performing a CFA with dichotomous outcome variables. I get the EMPTY CELL warning regarding the bivariate table and the relevant variable names. This, I have read, means that the tetrachoric correlation for a pair of variables is equal to 1. However, when I inspect the sample tetrachoric matrix, this is not the case at all. The correlations are generally moderate (around .45 or so). Am I looking at the right correlations? Also, could this be due to highly skewed variables? It appears that all the suspect variables are highly skewed (i.e., >95% correct/incorrect). Regards, Ian.


Empty cells in bivariate tables are most common when variables have extreme cuts like 95/5. Empty cells imply correlations of one. I'm not sure what you are looking at, so I can't say if you are looking at the right thing. If you want further clarification, please send your input, data, output, and license number to support@statmodel.com.
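A quick way to see which item pairs would trigger the warning is to tabulate every pair of binary items and look for zero cells. The following is only a minimal sketch in Python, not Mplus code; the item names and the data are hypothetical, chosen to mimic two items with roughly 95/5 and 90/10 splits.

```python
from itertools import combinations

def bivariate_table(x, y):
    """Return the 2x2 table of counts as {(a, b): n} for binary items x and y."""
    table = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(x, y):
        table[(a, b)] += 1
    return table

def empty_cell_pairs(data):
    """List (item_u, item_v, empty_cells) for every pair with a zero cell."""
    flagged = []
    for u, v in combinations(sorted(data), 2):
        table = bivariate_table(data[u], data[v])
        empty = [cell for cell, n in table.items() if n == 0]
        if empty:
            flagged.append((u, v, empty))
    return flagged

# Hypothetical data for 40 respondents (names m1, m3, m7 are illustrative only).
data = {
    "m1": [1] * 38 + [0] * 2,   # 95% correct
    "m7": [1] * 36 + [0] * 4,   # 90% correct
    "m3": [1, 0] * 20,          # balanced item
}

flagged = empty_cell_pairs(data)
for u, v, empty in flagged:
    print(f"bivariate table of {u} and {v} has empty cell(s): {empty}")
```

In this made-up example nobody fails m1 while passing m7, so only the skewed pair is flagged; the balanced item has all four cells filled with both of the others.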

Ian Zajac posted on Tuesday, August 21, 2007  12:14 am



Hi Linda, OUTPUT is sampstat. You get a number of statistics, including SUMMARY OF CATEGORICAL DATA PROPORTIONS, SAMPLE STATISTICS, ESTIMATED SAMPLE STATISTICS, and SAMPLE TETRACHORIC CORRELATIONS. Given the warnings, I would expect that when I inspect the 'Sample tetrachoric correlations', I would find a correlation of 1 between the variables noted in the warnings. But this isn't the case. The correlations are actually moderate and are never equal to 1. Should I be using a different output command? Regards, Ian.


I would need to see exactly what you are getting to answer you. Please send your input, data, output, and license number to support@statmodel.com. 


Dear Linda, I have the exact same problem as Ian: "I get the EMPTY CELL warning regarding the bivariate table and the relevant variable names. This, I have read, means that the tetrachoric correlation for a pair of variables is equal to 1. However, when I inspect the sample tetrachoric matrix, this is not the case at all. The correlations are generally moderate (@ .45 or so)." Is there a solution to this problem? Many thanks in advance! Kind regards, Theresa


The sample correlation matrix will not necessarily show one. That is the problem. For binary variables, you need bivariate tables without zero cells for this type of analysis.


For ordinal variables, is it necessary to ensure there are no empty cells in any of the bivariate tables? I am investigating the structure in 32 variables with 4 categories and will exclude 6 of the low-frequency variables to address this problem. However, is it OK if there is the occasional empty cell remaining? Many thanks.


It should be okay to have some zero cells for ordinal variables. The problem is that there is no way to tell if it is okay. I would be cautious. 


I am having just the same problem that Ian and Theresa have had. I get a slew of warning messages telling me that the bivariate table of (dichotomous variable) and (other dichotomous variable) has an empty cell. What does this mean? If I calculate the bivariate correlation between those two variables, it is not 1, and it is also not 0. It is generally moderate, just as Ian described. 


An empty cell implies a correlation of one. The correlation is not estimated at one, and this is the problem. The two variables should not both be used in the analysis.


I'm a little confused by what is being said here. In my case I have: THE BIVARIATE TABLE OF M7 AND M1 HAS AN EMPTY CELL. In traditional categorical data analysis you could put 0.5 in that cell or something of that sort. Is there a way to deal with this in Mplus besides just deleting a variable? Thanks, Tom 


We do add a constant, but this does not really solve the problem. An empty cell implies a correlation of one, so the two variables should not both be used in the analysis.
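To illustrate why adding a constant yields a finite estimate rather than a correlation of one, here is a rough Python sketch of a tetrachoric correlation computed from a 2x2 table in which any zero cell is replaced by 0.5. This is not Mplus's algorithm: the constant 0.5, the bisection search, and the Simpson integration are all assumptions made for illustration, and the cell counts are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    one_minus_r2 = 1.0 - r * r
    expo = -(h * h - 2.0 * r * h * k + k * k) / (2.0 * one_minus_r2)
    return math.exp(expo) / (2.0 * math.pi * math.sqrt(one_minus_r2))

def upper_orthant(h, k, rho, steps=400):
    """P(Z1 > h, Z2 > k): independence value plus the density integrated
    over the correlation from 0 to rho (Simpson's rule, even step count)."""
    base = (1.0 - STD.cdf(h)) * (1.0 - STD.cdf(k))
    step = rho / steps
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * bvn_density(h, k, i * step)
    return base + total * step / 3.0

def tetrachoric(n00, n01, n10, n11, constant=0.5):
    """Tetrachoric correlation by bisection; zero cells are set to `constant`.
    n01 means first item = 0 and second item = 1, and so on."""
    n00, n01, n10, n11 = [c if c > 0 else constant for c in (n00, n01, n10, n11)]
    total = n00 + n01 + n10 + n11
    h = STD.inv_cdf((n00 + n01) / total)  # threshold of the first item
    k = STD.inv_cdf((n00 + n10) / total)  # threshold of the second item
    target = n11 / total                  # observed P(both items = 1)
    lo, hi = -0.999, 0.999
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if upper_orthant(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical table with an empty (0, 1) cell.
rho = tetrachoric(n00=2, n01=0, n10=2, n11=36)
print(round(rho, 3))
```

The printed value is below one only because of the added constant; the raw table carries no information that would bound the correlation away from one, which is the problem described above.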


Having searched high and low for why my estimates may differ in my IRT models depending on my use of WLSMV vs. MLR, I think this may be it. In the default (WLSMV) mode, I get these warnings for a number of variables. Using MLR I don't. I have no missing data. Does the MLR estimator simply handle the empty cells differently?


ML doesn't first compute latent correlations as in WLSMV, so results may differ a bit when there are zero cells (which hurts correlations). But note also that the default ML link is logit, whereas WLSMV uses probit. 


Thank you! Two follow-ups. 1) Most p-values for the indicators (loadings) in the WLSMV (probit) run are significant; in MLR (logit) they are not. That said, the ICC curves are generally in accord. The one curve that literally reverses its direction is for the indicator that has the most empty cell warnings in WLSMV. As such, is MLR (logit) somehow taking into account the rarity of occurrence for this particular behavior (indicator) differently than probit? More specifically, given the high zero inflation of the indicators, is MLR/logit a better choice? 2) What fascinates me is that the estimates of the IRT ZIP model I ran (count) mimic the results of the MLR IRT run (categorical). Thus, my inclination is that using MLR in the binary case better addresses my data given such rare occurrences of some indicators. Or am I completely off base?


1) I don't think it is clear that ML is better than WLSMV with empty cells. Research with simulations would be needed to shed light on that; I can't recall having seen any. Anyone? I would think ML also suffers from empty cells since you then have limited information about the association between pairs of items. 2) I am not sure one can take the count results as support for the binary ML advantage; they may suffer similarly from the empty cells.

Matt Keough posted on Sunday, March 23, 2014  10:19 am



Hi. I am testing invariance across gender using a CFA. The indicators are dichotomous. When I run the configural invariance analysis I get an error message that "the bivariate table has an empty cell." Through reading the forum, I understand that Mplus treats an empty cell as implying a tetrachoric correlation of 1. Thus, I tried to trim some items, but the problem remains. Could this just be an issue with too many zeros? Is there a way to fix this? Thanks!


When two dichotomous items have a zero cell in their bivariate table, only one variable should be used in the analysis. 

Tait Medina posted on Thursday, February 26, 2015  4:52 pm



I am estimating a single-factor model with 6 dichotomous items. The bivariate frequency table between 2 items has an empty cell. I am wondering if there have been any new thoughts on empty cells and ML. My initial thought is to drop one of the items since it is quite skewed (91% answer yes). But I am wondering if anyone on this thread has any new thoughts on this, or has read any recent work that might shed light on how to handle this under ML. Thank you.


Having an empty cell in this case is equivalent to a polychoric correlation of +1, and you can combine the two items into one with 3 categories, for example: 0 0 -> 0, 0 1 -> 1, 1 1 -> 2. That will preserve all the information in the data.
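The recoding suggested above could be done before the data reach Mplus. A minimal sketch in Python follows; the function name and the example responses are hypothetical, and it assumes the empty cell is the (1, 0) pattern, as in the mapping given.

```python
def combine_items(first, second):
    """Collapse two binary items into one 3-category item.

    Mapping: (0, 0) -> 0, (0, 1) -> 1, (1, 1) -> 2.
    The (1, 0) pattern is assumed to be the empty cell, so it raises an error.
    """
    mapping = {(0, 0): 0, (0, 1): 1, (1, 1): 2}
    combined = []
    for pair in zip(first, second):
        if pair not in mapping:
            raise ValueError(f"unexpected response pattern {pair}")
        combined.append(mapping[pair])
    return combined

# Hypothetical responses for six people on the two items.
item_a = [0, 0, 1, 1, 0, 1]
item_b = [0, 1, 1, 1, 1, 1]
combined = combine_items(item_a, item_b)
print(combined)
```

Because every observed pattern maps to a distinct category, the original pair of responses can be recovered from the combined item, which is the sense in which no information is lost.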


In the case of the ZIP model (count variable) I mentioned above, two additional questions came to mind in looking at the output: 1) Is it the case that loadings are provided only for the non-structural-zero class, which is why only one set of loadings is shown? 2) Is it the case that the count variable is treated as continuous (in the non-structural-zero group)? If so, I was wondering whether we are still in IRT-land (i.e., the variable is not categorical or ordered categorical). Thank you.


1) You should fix loadings at zero for the zero class (maybe Mplus is effectively doing that since I think you fix all parameters for the factor at zero). 2) The count variable is treated as Poisson (count). 


I'm still struggling to understand this. There are two reasons for an empty cell that I can think of. The first is low coverage, in which the sample is too small to see the full variability of patterns in the population (i.e., all cells exist, but you don’t have enough observations to see it). This case would definitely be problematic for estimation of the correlation. The other reason is that the difficulties of the items are very far apart, such that one value is extremely unlikely in the presence of another. For example, an incorrect response to an easy item (e.g., "Spell 'cat.'") is very unlikely to be paired with a correct response to a difficult item (e.g., "Spell 'ocelot.'"), whereas a correct response to the easy item can see a mix of correct and incorrect responses to the difficult item. This represents an empty cell situation that is actually reflective of reality, and to have an item pool that covers the full range of ability in the population, you would expect to see this pattern. Is this distinction not important when designing an item pool and analyzing the resulting data? 


I agree with you regarding the second reason for empty cells. If this is a prominent feature, I don't think regular factor analysis is suitable. Instead, a model is needed that reflects the difficulty ordering, like a Guttman scale.
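One rough way to check whether the data show this kind of difficulty ordering is to count Guttman errors: cases where a respondent passes a harder item while failing an easier one. The sketch below is in Python, with hypothetical response patterns; ordering the items by proportion correct is an assumption made for illustration, not something prescribed in this thread.

```python
def order_by_difficulty(responses):
    """Return item indices sorted from easiest (most 1s) to hardest."""
    n_items = len(responses[0])
    totals = [sum(person[j] for person in responses) for j in range(n_items)]
    return sorted(range(n_items), key=lambda j: -totals[j])

def guttman_errors(pattern):
    """Count pairs where an easier item is failed and a harder one is passed.
    `pattern` must already be ordered from easiest to hardest item."""
    errors = 0
    for i in range(len(pattern)):
        for j in range(i + 1, len(pattern)):
            if pattern[i] == 0 and pattern[j] == 1:
                errors += 1
    return errors

# Hypothetical responses; columns could be items like "cat", "house", "ocelot".
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],   # fails the middle item but passes the hardest: one error
]

order = order_by_difficulty(responses)
errors = [guttman_errors([person[j] for j in order]) for person in responses]
print(order, errors)
```

If nearly all respondents have zero errors, the empty cells are likely a reflection of the difficulty ordering rather than of low coverage.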
