I am helping two colleagues with two separate models that are being fit in Mplus. In the process of helping them, I am encountering sample size issues on which I would find your thoughts to be very helpful.
In the first situation, a post-doctoral fellow wishes to evaluate whether exogenous predictor A (family financial violence) affects distal outcome C (post-partum depression) via mediating variable B (partner violence to spouse). All three variables are dichotomous yes/no variables. Though the sample size is 291, only 15 participants exhibited depression, and cross-tabulations of the A*B and B*C linkages show several very small cells, on the order of 1-8 participants per cell. Though Mplus faithfully runs the analysis and is able to generate bootstrap-based confidence intervals without rejecting any bootstrap samples, generalizing from such small cells seems problematic. Would you agree? Is there literature you can recommend that offers guidance on minimum cell sizes for ordered categorical data modeling in SEM using the ML and WLSMV estimators?
In the second scenario, a confirmatory factor analysis is being performed on a sample of 477 respondents who responded to 10 Likert survey items on the topic of experiencing homosexuality stigma. The scale points are 1 = never experienced the event, 2 = experienced the event once, 3 = experienced the event a few times, 4 = experienced the event many times. Because some of the events are experienced rarely, there are a number of items for which the number of participants who endorsed categories 2-4 is quite small (< 10). For a few of the items, there are no respondents who endorsed categories 3 or 4. As in the first example, Mplus fits the (two-factor) model without incident using WLSMV estimation, but I am curious to know whether the sparseness of the data for some of the categories could prove problematic from an interpretational standpoint.
With best wishes and many thanks for your thoughts,
My 1989 article in Sociological Methods & Research talks about sample size issues with factor analysis of binary items. For your path model, the Hosmer-Lemeshow logistic regression book would be useful as well for sample size issues. I would agree that this is a very small sample, even for logistic regression. Regarding the CFA, I would look for the Version 4 warnings about zero cells in bivariate tables; if they occur, I would collapse categories. Other than that, situation-specific simulations would be the only real way to get further insights into the quality of the results.
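For anyone who wants to screen the item data before running the CFA, the zero-cell check and the collapsing step can be sketched outside Mplus. The Python below is only an illustration, not Mplus's own diagnostic: the function names `empty_bivariate_cells` and `collapse_top` are hypothetical, the toy `rows` data are made up, and the sketch assumes integer-coded responses with no missing values.

```python
from collections import Counter
from itertools import combinations


def empty_bivariate_cells(data):
    """Find zero-count cells in every pairwise item table.

    data: list of rows, each row a list of integer category codes.
    Only categories actually observed for each item are crossed, so a
    category nobody endorsed on an item is not counted as an empty cell.
    Returns {(item_i, item_j): [(cat_i, cat_j), ...]} for pairs that
    have at least one empty cell.
    """
    n_items = len(data[0])
    observed = [sorted({row[k] for row in data}) for k in range(n_items)]
    flagged = {}
    for i, j in combinations(range(n_items), 2):
        counts = Counter((row[i], row[j]) for row in data)
        empty = [(a, b) for a in observed[i] for b in observed[j]
                 if counts[(a, b)] == 0]
        if empty:
            flagged[(i, j)] = empty
    return flagged


def collapse_top(data, item, top):
    """Merge every category above `top` into `top` for one item."""
    return [[min(v, top) if k == item else v for k, v in enumerate(row)]
            for row in data]


# Toy data (made up): two items, categories 1-3, sparse upper categories.
rows = [[1, 1], [1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [1, 3]]

before = empty_bivariate_cells(rows)

# Collapse category 3 into category 2 on both items, then re-check.
collapsed = collapse_top(collapse_top(rows, 0, 2), 1, 2)
after = empty_bivariate_cells(collapsed)

print("empty cells before collapsing:", before)
print("empty cells after collapsing:", after)
```

Which categories to merge is a substantive decision (here the top category is simply folded into the one below it); the sketch only shows how to confirm that the bivariate tables no longer contain empty cells afterwards.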