Mplus Discussion >> Empty Bivariate Table is not equal to 1?

Ian Zajac posted on Wednesday, August 15, 2007 - 8:03 pm

I am performing a CFA with dichotomous outcome variables. I get the EMPTY CELL warning regarding the bivariate table and the relevant variable names.

This, I have read, means that the tetrachoric correlation for a pair of variables is equal to 1. However, when I inspect the sample tetrachoric matrix, this is not the case at all. The correlations are generally moderate (@ .45 or so).

Am I looking at the right correlations?
Also, could this be due to highly skewed variables? It appears that all the suspect variables are (i.e., >95% correct/incorrect).

Regards -
ian.

Linda K. Muthen posted on Thursday, August 16, 2007 - 6:45 am

Empty cells in bivariate tables are most common when variables have extreme cuts like 95/5. Empty cells imply correlations of one. I'm not sure what you are looking at, so I can't say if you are looking at the right thing. If you want further clarification, please send your input, data, output, and license number to support@statmodel.com.
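
For readers who want to check their own data before running the model, the following is a minimal sketch in Python (not Mplus syntax) of how the 2x2 table for each pair of binary items can be built and scanned for zero cells. The item names and responses are made up for illustration, and the helper functions are hypothetical, not part of any Mplus tooling.

```python
from collections import Counter
from itertools import combinations


def bivariate_table(x, y):
    """Return the 2x2 table of counts for two 0/1 item vectors, keyed by (x, y)."""
    counts = Counter(zip(x, y))
    return {(a, b): counts.get((a, b), 0) for a in (0, 1) for b in (0, 1)}


def empty_cell_pairs(data):
    """List (name1, name2, empty_cells) for every item pair whose table has a zero cell.

    `data` maps a (hypothetical) item name to a list of 0/1 responses.
    """
    flagged = []
    for n1, n2 in combinations(data, 2):
        table = bivariate_table(data[n1], data[n2])
        empties = [cell for cell, n in table.items() if n == 0]
        if empties:
            flagged.append((n1, n2, empties))
    return flagged


# Made-up responses for 20 people; item "u2" has a 95/5 split.
data = {
    "u1": [1] * 10 + [0] * 10,
    "u2": [1] * 19 + [0] * 1,
    "u3": [1, 0] * 10,
}
for n1, n2, empties in empty_cell_pairs(data):
    print(n1, n2, "empty cells:", empties)
```

In this made-up example only the pairs involving the 95/5 item are flagged, which matches the point that extreme cuts are the usual source of empty cells.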

Ian Zajac posted on Monday, August 20, 2007 - 6:14 pm

Hi Linda,

The OUTPUT option is SAMPSTAT.

You get a number of statistics, including:
> SUMMARY OF CATEGORICAL DATA PROPORTIONS
> SAMPLE STATISTICS
> ESTIMATED SAMPLE STATISTICS
> SAMPLE TETRACHORIC CORRELATIONS

Given the warnings, I would expect that when I inspect the 'Sample tetrachoric correlations', I would find a correlation of 1 between the variables noted in the warnings. But this isn't the case. The correlations are actually moderate and are never equal to 1.

Should I be using a different output command?

Regards -
Ian.

Linda K. Muthen posted on Thursday, August 23, 2007 - 11:14 am

I would need to see exactly what you are getting to answer you. Please send your input, data, output, and license number to support@statmodel.com.

Theresa Dicke posted on Wednesday, March 30, 2011 - 2:58 am

Dear Linda,
I have the exact same problem as Ian: "I get the EMPTY CELL warning regarding the bivariate table and the relevant variable names.

This, I have read, means that the tetrachoric correlation for a pair of variables is equal to 1. However, when I inspect the sample tetrachoric matrix, this is not the case at all. The correlations are generally moderate (@ .45 or so)."

Is there a solution to this problem?

many thanks in advance!
kind regards, Theresa

Linda K. Muthen posted on Wednesday, March 30, 2011 - 6:52 am

The sample correlation matrix will not necessarily show one. That is the problem. For binary variables, you need bivariate tables without zero cells for this type of analysis.

Helen Skerman posted on Wednesday, June 22, 2011 - 3:51 pm

For ordinal variables, is it necessary to ensure there are no empty cells in any of the bivariate tables? I am investigating the structure in 32 variables with 4 categories and will exclude 6 of the low frequency variables to address this problem. However, is it OK if there is the occasional empty cell remaining?

Many thanks.

Linda K. Muthen posted on Thursday, June 23, 2011 - 11:08 am

It should be okay to have some zero cells for ordinal variables. The problem is that there is no way to tell if it is okay. I would be cautious.

Rebecca Fortgang posted on Monday, June 11, 2012 - 1:17 pm

I am having just the same problem that Ian and Theresa have had. I get a slew of warning messages telling me that the bivariate table of (dichotomous variable) and (other dichotomous variable) has an empty cell. What does this mean? If I calculate the bivariate correlation between those two variables, it is not 1, and it is also not 0. It is generally moderate, just as Ian described.

Linda K. Muthen posted on Monday, June 11, 2012 - 1:27 pm

An empty cell implies a correlation of one. The correlation is not estimated at one and this is the problem. The two variables should not both be used in the analysis.

Thomas A. Schmitt posted on Wednesday, August 22, 2012 - 8:53 am

I'm a little confused by what is being said here. In my case I have: THE BIVARIATE TABLE OF M7 AND M1 HAS AN EMPTY CELL. In traditional categorical data analysis you could put 0.5 in that cell or something of that sort. Is there a way to deal with this in Mplus besides just deleting a variable?

Thanks,

Tom

Linda K. Muthen posted on Wednesday, August 22, 2012 - 9:23 am

We do add a constant but this does not really solve the problem. An empty cell implies a correlation of one, so the two variables should not both be used in the analysis.
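
One way to see why an added constant does not really solve the problem is to look at how strongly the resulting association depends on the constant chosen. The sketch below is an illustration only: it uses the log odds ratio as a simple stand-in for the tetrachoric correlation, it assumes the constant replaces only the zero cell, and it does not reproduce what Mplus does internally. The table counts and constants are made up.

```python
import math


def smoothed_log_odds_ratio(n00, n01, n10, n11, constant=0.5):
    """Log odds ratio of a 2x2 table after replacing any zero cell with `constant`."""
    a, b, c, d = (n if n > 0 else constant for n in (n00, n01, n10, n11))
    return math.log((a * d) / (b * c))


# Made-up table with an empty (1, 0) cell: n00=1, n01=9, n10=0, n11=10.
for constant in (0.5, 0.1, 0.01):
    value = smoothed_log_odds_ratio(1, 9, 0, 10, constant=constant)
    print(f"constant={constant}: log odds ratio = {value:.2f}")
```

The estimate grows without bound as the constant shrinks, so the value obtained reflects the arbitrary constant rather than information in the data.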

J.D. Haltigan posted on Wednesday, February 19, 2014 - 11:13 am

Having searched high and low for why my estimates may differ in my IRT models depending on my use of WLSMV vs. MLR, I think this may be it.

In the default (WLSMV) mode, I get these warnings for a number of variables. Using MLR I don't. All data is nonmissing (I have no missing data). Does the MLR estimator simply handle the empty cells differently?

Bengt O. Muthen posted on Wednesday, February 19, 2014 - 11:32 am

ML doesn't first compute latent correlations as in WLSMV, so results may differ a bit when there are zero cells (which hurts correlations). But note also that the default ML link is logit, whereas WLSMV uses probit.

J.D. Haltigan posted on Wednesday, February 19, 2014 - 12:02 pm

Thank you! Two follow-ups.

1) Most p-values for the indicators (loadings) in the WLSMV (probit) run are significant; in MLR (logit) they are not. That said, ICC curves are generally in accord. The one curve that literally reverses its direction is for the indicator that has the most empty cell warnings in WLSMV. As such, is MLR (logit) somehow taking into account the rarity of occurrence for this particular behavior (indicator) differently than probit? More specifically, given the high zero-inflation of the indicators, is MLR/logit a better choice?

2) What fascinates me is that the estimates of the IRT ZIP I ran (count) mimic the results of the MLR IRT run (categorical). Thus, my inclination is that using MLR in the binary case better addresses my data given such rare occurrences of some indicators? Or am I completely off base?

Bengt O. Muthen posted on Thursday, February 20, 2014 - 2:16 pm

1) I don't think it is clear that ML is better than WLSMV with empty cells. Research with simulations would be needed to shed light on that, and I can't recall having seen any. Anyone? I would think ML also suffers from empty cells since you then have limited information about the association between pairs of items.

2) I am not sure one can take the count results as support for the binary ML advantage - they may suffer similarly from the empty cells.

Matt Keough posted on Sunday, March 23, 2014 - 5:19 am

Hi. I am testing invariance across gender using a CFA. The indicators are dichotomous. When I run the configural invariance analysis I get an error message that "the bivariate table has an empty cell." Through reading the forum, I understand that Mplus treats empty cells as implying a tetrachoric correlation of 1. Thus, I tried to trim some items but the problem remains. Could this just be an issue with too many zeros? Is there a way to fix this? Thanks!

Linda K. Muthen posted on Sunday, March 23, 2014 - 6:28 am

When two dichotomous items have a zero cell in their bivariate table, only one variable should be used in the analysis.

Tait Medina posted on Thursday, February 26, 2015 - 10:52 am

I am estimating a single-factor model with 6 dichotomous items. The bivariate frequency table between 2 items has an empty cell. I am wondering if there have been any new thoughts on empty cells and ML? My initial thought is to drop one of the items since it is quite skewed (91% answer yes). But I am wondering if anyone on this thread has any new thoughts on this, or has read any recent work that might shed light on how to handle this under ML. Thank you.

Tihomir Asparouhov posted on Thursday, February 26, 2015 - 7:02 pm

Having an empty cell in this case is equivalent to a polychoric correlation of +-1, and you can combine the two items into one item with 3 categories, for example:
0 0 -> 0
0 1 -> 1
1 1 -> 2
That will preserve all the information in the data.
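
As a minimal sketch of this recoding outside Mplus (in Python, with a hypothetical helper name), the mapping above can be applied pair by pair. The sketch assumes the empty cell is the (1, 0) pattern, as in the example mapping; if a different cell is empty, the mapping has to be adapted accordingly.

```python
def combine_items(u1, u2):
    """Collapse two 0/1 items whose (1, 0) cell is empty into one 3-category item."""
    mapping = {(0, 0): 0, (0, 1): 1, (1, 1): 2}
    combined = []
    for pair in zip(u1, u2):
        if pair not in mapping:
            # The recoding only preserves all information if this cell is truly empty.
            raise ValueError(f"pattern {pair} was assumed empty but was observed")
        combined.append(mapping[pair])
    return combined


# Made-up responses for five people.
print(combine_items([0, 0, 1, 1, 0], [0, 1, 1, 1, 0]))
```

The explicit error for an unexpected pattern is a design choice: it guards against silently applying the recoding to a pair of items whose table has no empty cell.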

J.D. Haltigan posted on Wednesday, April 08, 2015 - 12:49 pm

In the case of the ZIP model (count variable) I mentioned above, two additional questions came to mind in looking at the output:

1) Is it the case that loadings are provided only for the non-structural-zero class, which is why only one set of loadings is provided?

2) Is it the case that the count variable is treated as continuous (in the non-structural-zero group)? If so, I was wondering whether we are in IRT-land anymore (i.e., the variable is not categorical or ordered categorical).

Thank you.

Bengt O. Muthen posted on Wednesday, April 08, 2015 - 6:18 pm

1) You should fix loadings at zero for the zero class (maybe Mplus is effectively doing that since I think you fix all parameters for the factor at zero).

2) The count variable is treated as Poisson (count).

Jeff Williams posted on Friday, September 11, 2015 - 8:47 am

I'm still struggling to understand this. There are two reasons for an empty cell that I can think of.

The first is low coverage, in which the sample is too small to see the full variability of patterns in the population (i.e., all cells exist, but you don't have enough observations to see it). This case would definitely be problematic for estimation of the correlation.

The other reason is that the difficulties of the items are very far apart, such that one value is extremely unlikely in the presence of another. For example, an incorrect response to an easy item (e.g., "Spell 'cat.'") is very unlikely to be paired with a correct response to a difficult item (e.g., "Spell 'ocelot.'"), whereas a correct response to the easy item can see a mix of correct and incorrect responses to the difficult item. This represents an empty cell situation that is actually reflective of reality, and to have an item pool that covers the full range of ability in the population, you would expect to see this pattern.

Is this distinction not important when designing an item pool and analyzing the resulting data?

Bengt O. Muthen posted on Saturday, September 12, 2015 - 5:04 pm

I agree with you regarding the second reason for empty cells. If this is a prominent feature, I don't think regular factor analysis is suitable. Instead, a model is needed that reflects the difficulty ordering, like a Guttman scale.
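
To gauge how prominent such a difficulty ordering is in a data set, one rough check is the share of respondents whose response pattern is perfectly consistent with a Guttman scale. The following Python sketch assumes the items are already sorted from easiest to hardest; the function names and the response patterns are hypothetical and only illustrate the idea.

```python
def is_guttman_pattern(responses):
    """True if no harder item is passed after an easier item was failed.

    `responses` are 0/1 scores ordered from the easiest to the hardest item.
    """
    return all(a >= b for a, b in zip(responses, responses[1:]))


def guttman_share(patterns):
    """Proportion of response patterns that are perfectly Guttman-consistent."""
    consistent = sum(is_guttman_pattern(p) for p in patterns)
    return consistent / len(patterns)


# Made-up patterns for five people on three items (easy, medium, hard).
patterns = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 1, 0],  # fails the easy item but passes the medium one
]
print(guttman_share(patterns))
```

A share close to one would suggest that the empty cells reflect the item ordering rather than a sample that is too small.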

Deiby Cubides Mateus posted on Wednesday, June 07, 2017 - 8:53 am

I am working with a test that measures child development, so the items are organized according to their difficulty (similar to the Jeff Williams example). If I construct a score by area or domain of development (language, motor, etc.) and then run a CFA of the latent variable (development) on those scores, does that make sense? Will I be able to say something about the items?

Bengt O. Muthen posted on Wednesday, June 07, 2017 - 6:02 pm

This question is suitable for SEMNET.

Matthew Constantinou posted on Friday, June 16, 2017 - 10:24 am

TIP WHEN REMOVING VARIABLES

Many of you will get quite a few empty cell warnings because your data is binary and your responses are naturally disproportional.

You might be hesitant to remove all the variables with an empty cell, as Linda has suggested. Whilst this is necessary, you can go about it in an efficient way. For instance, by removing the few variables which were most consistently related to empty cells, I no longer received a warning about empty cells. Of course, this does not mean that the correlations between remaining variables are any healthier - many of them likely still approach 1 and caution should be exercised. However, it does not mean you need to axe all variables listed.
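
The removal strategy described above can be sketched as a simple greedy procedure: tally how often each variable appears in the empty cell warnings, drop the most frequent one, and repeat until no flagged pair remains. This Python sketch is an assumption about one reasonable way to do it, not a procedure from Mplus; the variable names are made up, and ties are broken alphabetically only to keep the result reproducible (in practice the choice should be substantive).

```python
from collections import Counter


def items_to_drop(flagged_pairs):
    """Greedily pick items to remove until no flagged pair has both members left.

    `flagged_pairs` is a list of (name1, name2) tuples taken from the warnings.
    """
    remaining = list(flagged_pairs)
    dropped = []
    while remaining:
        counts = Counter(item for pair in remaining for item in pair)
        # Most frequently flagged item first; alphabetical order breaks ties.
        worst = min(counts, key=lambda item: (-counts[item], item))
        dropped.append(worst)
        remaining = [pair for pair in remaining if worst not in pair]
    return dropped


# Made-up warnings: M7 is involved in three of the four flagged pairs.
warnings = [("M7", "M1"), ("M7", "M3"), ("M7", "M5"), ("M2", "M4")]
print(items_to_drop(warnings))
```

Here two removals clear all four flagged pairs, rather than removing every variable that appears in a warning.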

Linda K. Muthen posted on Saturday, June 17, 2017 - 6:20 am

This is exactly how you should do this. Once you don't get any more messages about empty cells, you have no implied correlations of one so there is nothing to worry about in that area.

Alaine Garmendia posted on Wednesday, May 09, 2018 - 6:30 am

Dear Linda and Bengt,

I am performing a CFA using dichotomous variables (23 items for 6 factors, N = 142). I received two warnings after executing the CFA. The first is the EMPTY CELL warning regarding the bivariate table and the relevant variable names. I know that empty cells imply correlations of one, but the items are not highly skewed (at least, one of the two items named in the EMPTY CELL warning is not). Is it okay to include items in the CFA that trigger the EMPTY CELL warning?

In addition, I received the following warning: WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE HR5.

When I remove certain items, the "problem variable" shifts to another variable. What can I do with this warning in order to solve the issue?

Thank you very much for your help.

Bengt O. Muthen posted on Thursday, May 10, 2018 - 2:55 pm

Q1: I would not go ahead with the WLSMV analysis. Instead, get rid of items.

Q2: You should explore sources of model misfit, which can give this warning.

Alaine Garmendia posted on Monday, May 14, 2018 - 6:39 am

Thank you very much for your response Dr. Muthen.
I will try with other estimators since we cannot get rid of items.

Thanks again

Alaine