I have data consisting of pupils' grades in different school subjects. I'm trying to fit a bi-factor model with a general factor and specific factors for math/science and language subjects. The problem is that for some of the subjects the students can decide whether they take an advanced or a basic course, so the data is NMAR.
I guess this biases the estimation, but as far as I understand, some of the bias may be corrected by using a selection model to account for the missing data (Muthen 1987). All the simple examples I've been able to find involve longitudinal data, and I have not been able to figure out how to apply them in the context of CFA. So how would I specify the model (what is the syntax) in the following situation:
- I have 4 subjects (Gen1-Gen4) that load only onto the general factor
- Math/Sci subjects load onto the general and math factors, and one of the subjects is divided into a basic and an advanced course (Variables: Mat1-Mat2, Mat3Adv, Mat3Bas)
- The language factor is structured like the math/sci factor (Variables: Lan1-Lan2, Lan3Adv, Lan3Bas)
So my problem is not the bi-factor model itself, but the selection model. How many latent factors should this model have? Do I use the grades themselves or dichotomous variables as indicators? What correlates with what and so on?
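To make the loading structure described in the list concrete, here is a small Python sketch (not Mplus syntax) that writes the bi-factor pattern as a 0/1 matrix, where 1 marks a freely estimated loading. The factor names `g`, `math` and `lang` and the helper `loading_pattern` are hypothetical labels chosen for illustration. Note that Mat3Adv and Mat3Bas enter as two separate indicators, each missing for the students who took the other course.

```python
# Hypothetical sketch of the bi-factor loading pattern from the question.
GENERAL = ["Gen1", "Gen2", "Gen3", "Gen4"]
MATH = ["Mat1", "Mat2", "Mat3Adv", "Mat3Bas"]
LANG = ["Lan1", "Lan2", "Lan3Adv", "Lan3Bas"]

PATTERN = {
    "g": GENERAL + MATH + LANG,  # every indicator loads on the general factor
    "math": MATH,                # specific factor for math/science
    "lang": LANG,                # specific factor for language
}
# In a standard bi-factor specification the factors are also uncorrelated;
# that constraint is not encoded in this loading pattern.

def loading_pattern(pattern):
    """Return (indicators, factors, matrix) with matrix[i][j] == 1 if
    indicator i loads on factor j, else 0."""
    factors = list(pattern)
    indicators = []
    for factor in factors:
        for var in pattern[factor]:
            if var not in indicators:
                indicators.append(var)
    matrix = [[1 if var in pattern[f] else 0 for f in factors] for var in indicators]
    return indicators, factors, matrix

indicators, factors, matrix = loading_pattern(PATTERN)
print("        ", factors)
for name, row in zip(indicators, matrix):
    print(f"{name:8s}", row)
```

Each indicator loads on the general factor plus at most one specific factor, which is the defining feature of the bi-factor structure.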
The data may still be MAR if the variables that are observed for a student predict his/her missingness.
For instance, if everyone has data on Gen1-Gen4 and Mat1-Mat2 and is missing only on either Mat3Adv or Mat3Bas, it is perhaps reasonable that the choice of taking Mat3Adv or Mat3Bas (and therefore being missing on the other) is predicted by those observed variables.
In this case you simply specify the bi-factor model as usual, and ML estimation will give you estimates under the MAR assumption by default.
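A toy simulation can illustrate why selection on observed grades is harmless for ML but not for complete-case analysis. It is a minimal Python sketch under loudly stated assumptions: one latent ability drives all grades, grades are normal, the true mean of both Mat3 grades is 0 by construction, and the probability of choosing the advanced course depends only on the always-observed Mat1 + Mat2 (so missingness is MAR). The function names are hypothetical. `mar_mean` is the ML estimate for the simplified two-variable normal case, using the sum Mat1 + Mat2 as the single fully observed covariate. It is not what Mplus computes for the full bi-factor model, only the same principle in miniature.

```python
import math
import random
from statistics import fmean

def simulate(n=20000, seed=1):
    """Toy data: one latent ability drives all grades; course choice is MAR."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)
        mat1 = ability + rng.gauss(0.0, 1.0)
        mat2 = ability + rng.gauss(0.0, 1.0)
        adv_grade = ability + rng.gauss(0.0, 1.0)  # grade if the advanced course is taken
        bas_grade = ability + rng.gauss(0.0, 1.0)  # grade if the basic course is taken
        x = mat1 + mat2                             # always observed
        # Choice depends only on the observed x, so the missingness is MAR.
        advanced = rng.random() < 1.0 / (1.0 + math.exp(-x))
        rows.append({
            "x": x,
            "Mat3Adv": adv_grade if advanced else None,
            "Mat3Bas": None if advanced else bas_grade,
        })
    return rows

def complete_case_mean(rows, var):
    """Mean over students who have var observed (ignores the selection)."""
    return fmean(r[var] for r in rows if r[var] is not None)

def mar_mean(rows, var):
    """ML mean under MAR (bivariate-normal case): complete-case mean of var,
    adjusted by the OLS slope on x times the gap between the full-sample
    and complete-case means of x."""
    pairs = [(r["x"], r[var]) for r in rows if r[var] is not None]
    mx = fmean(a for a, _ in pairs)
    my = fmean(b for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    slope = sxy / sxx
    return my + slope * (fmean(r["x"] for r in rows) - mx)

rows = simulate()
for var in ("Mat3Adv", "Mat3Bas"):
    print(var, "complete-case:", round(complete_case_mean(rows, var), 3),
          "MAR/ML:", round(mar_mean(rows, var), 3))
```

The complete-case means drift away from 0 (upward for the advanced course, downward for the basic one), while the MAR-based estimates stay close to 0, because the information in Mat1 and Mat2 for the students who did not take a course is used rather than discarded.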
I think I'll go ahead and assume the data are MAR, then; it seems quite plausible. But just out of interest, how would the selection model work if I were to give it a try? Or does it even make sense in this situation?
That article actually shows how to do ML under MAR using a multiple-group approach in which the groups correspond to missing data patterns. That approach now has mostly historical and pedagogical value: you get the same thing by just using the Mplus ML (or MLR) default, without having to split the data into missing data pattern groups.
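To show what "groups correspond to missing data patterns" means for this design, here is a small Python sketch (not Mplus) that groups students by which indicators they have observed. The student rows and the `make_student` and `missing_patterns` helpers are hypothetical, made up for illustration. With one either/or choice in math and one in language, there are at most four patterns.

```python
from collections import defaultdict

# Indicator names from the question; the students below are made up.
VARIABLES = ["Gen1", "Gen2", "Gen3", "Gen4",
             "Mat1", "Mat2", "Mat3Adv", "Mat3Bas",
             "Lan1", "Lan2", "Lan3Adv", "Lan3Bas"]

def missing_patterns(rows, variables):
    """Group row indices by the tuple of variables that are observed (not None)."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        observed = tuple(v for v in variables if row.get(v) is not None)
        groups[observed].append(i)
    return dict(groups)

def make_student(math_adv, lang_adv, grade=3.0):
    """Hypothetical student: observed on everything except the course not taken."""
    row = {v: grade for v in VARIABLES}
    row["Mat3Bas" if math_adv else "Mat3Adv"] = None
    row["Lan3Bas" if lang_adv else "Lan3Adv"] = None
    return row

# One student per combination of choices, plus a second advanced/advanced student.
students = [make_student(m, l) for m in (True, False) for l in (True, False)]
students.append(make_student(True, True))

patterns = missing_patterns(students, VARIABLES)
for observed, members in patterns.items():
    missing = [v for v in VARIABLES if v not in observed]
    print(len(members), "student(s) missing", missing)
```

In the multiple-group approach, each such group is fit using only its observed indicators, with the model parameters constrained equal across groups. Full-information ML reaches the same estimates by letting each student contribute to the likelihood through his/her observed variables only, so no explicit grouping step is needed.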