Hello, I am currently working on recoding items for a measure in preparation for factor analysis. Some items in the measure have a two-part response format. For instance:
Item: I have felt judged by a health care provider.
Responses:
(a) Did this ever happen? 0: No; 1: Yes
(b) If yes, how much did it affect you? 1: Bothered me greatly; 2: Bothered me somewhat; 3: Did not bother or help me; 4: Helped me somewhat; 5: Helped me greatly
I’m not sure how to include these types of items in a factor analysis, specifically whether I need to recode each item into a third variable or keep the two parts as they are. My concern with keeping them as they are is the missing data on part (b) for respondents who answered ‘no’ to part (a). I would greatly appreciate any references that could give me some guidance on how to include items such as these in factor analyses. Thank you for your time and help.
I think I saw your question on SEMNET and thought about it because it is a common situation without an easy answer. It sounds like a candidate for either Heckman or two-part factor analysis. Two-part factor analysis is discussed in this paper on our website:
Kim, Y.K. & Muthén, B. (2009). Two-part factor mixture modeling: Application to an aggressive behavior measurement instrument. Structural Equation Modeling, 16, 602-624.
That paper extends the scope to mixtures, but that is not necessary here. It also focuses more on censored outcomes, whereas your case is more like missing data on the (b) questions, and selectively missing at that. That makes it Heckman-related (as with wages being missing for those who do not work).
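To make the two-part view of the data concrete, here is a minimal Python sketch of how one item could be recoded into a binary part and a "how much" part. This is only an illustration, not Mplus syntax and not the procedure from the paper; the function name `two_part_recode` and the toy responses are hypothetical. The assumption it encodes is that a "no" on (a) makes (b) missing (not applicable) rather than zero.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def two_part_recode(happened: Optional[int], impact: Optional[int]) -> Pair:
    """Split one two-part item into (u, y).

    u: binary part. 1 if the event happened, 0 if not, None if (a) is unanswered.
    y: rating part. The 1-5 answer to (b), kept only when u == 1, otherwise None.
    """
    if happened is None:
        return None, None      # (a) skipped: both parts missing
    if happened == 0:
        return 0, None         # (b) does not apply: missing, not zero
    return 1, impact           # impact may still be None if (b) was skipped


# Hypothetical raw answers as (a, b) pairs; the last one has a stray (b) value.
responses: List[Pair] = [(0, None), (1, 2), (1, 5), (None, None), (1, None), (0, 3)]
recoded = [two_part_recode(a, b) for a, b in responses]

for raw, (u, y) in zip(responses, recoded):
    print(f"raw={raw} -> u={u}, y={y}")
```

With this coding, missingness on y is fully determined by u for everyone who answered (a), which is the selective missingness described above.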
I don't think I have seen anyone do Heckman factor analysis, so that would be a methods research topic.
You can start by picking one item and regressing it on some covariates. Then you can check whether (a) and (b) have different predictors, which both two-part and Heckman modeling allow.
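As a rough illustration of that first step, the Python sketch below looks at one item and one covariate. It assumes a single binary covariate x, in which case the logistic regression slope for (a) equals the difference in log odds between the two groups, and the linear regression slope for (b) among those who answered "yes" equals the difference in group means. The data are made up, the function names are hypothetical, and treating the 1-5 rating in (b) as continuous is a simplifying assumption.

```python
import math
from statistics import mean

# Hypothetical toy data: (x, a, b), where x is a binary covariate,
# a is the yes/no part and b is the 1-5 rating (None when a == 0).
rows = [
    (0, 0, None), (0, 0, None), (0, 1, 2), (0, 1, 3),
    (1, 1, 1), (1, 1, 2), (1, 1, 2), (1, 0, None),
]


def log_odds(p: float) -> float:
    return math.log(p / (1.0 - p))


def part_a_slope(data) -> float:
    """Logistic slope of (a) on binary x: difference in log odds of 'yes'."""
    p0 = mean(a for x, a, _ in data if x == 0)
    p1 = mean(a for x, a, _ in data if x == 1)
    return log_odds(p1) - log_odds(p0)


def part_b_slope(data) -> float:
    """Linear slope of (b) on binary x, using only cases with a == 1."""
    y0 = [b for x, a, b in data if x == 0 and a == 1 and b is not None]
    y1 = [b for x, a, b in data if x == 1 and a == 1 and b is not None]
    return mean(y1) - mean(y0)


slope_a = part_a_slope(rows)
slope_b = part_b_slope(rows)
print(f"part (a) log-odds slope: {slope_a:.3f}")
print(f"part (b) mean-difference slope: {slope_b:.3f}")
```

If a covariate matters for one part but not the other, or has opposite signs as in this toy data, that suggests the two parts need their own predictors, which is what the two-part and Heckman approaches accommodate.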
Thank you so much for getting back to me. I scanned the article you referenced and found it to be very helpful.
I did have another question. Another person on SEMNET had suggested treating the missing data for part (b) as missing by design with part (a) as an auxiliary variable to provide a mechanism for the missing data. I was wondering what your thoughts were on this and if you knew of any references that could potentially guide me on how to carry out such an analysis using Mplus.
Yes, the answer to (a) could be viewed as the MAR predictor of missingness on (b). So it would be relevant as an auxiliary variable, although I don't know how well that would work. But wouldn't you want to model both parts with predictors that can differ between the two parts, either observed or latent (the latter meaning a factor for (a) and a factor for (b))? If (a), predicted in this way, is included in the model instead of being auxiliary, you would be under MAR, that is, you could use regular FIML.
Thank you for responding to my questions. After much thought and conversation with my research advisor and stats professor, we have decided to pursue a 2-stage robust method with auxiliary variables. I’m not very familiar with this method for SEM, specifically with the syntax.
I was wondering if there are any readings you could recommend that would help a beginner navigate through the syntax and interpretations of the results. The article you cited above was really helpful in that the authors outline the steps necessary for the analysis they discuss. I was hoping you would be able to point me toward material written in a similar fashion.
When you say two-stage, I think you are referring to what I call two-part. If so, growth modeling articles that use two-part modeling may be of some use, because the repeated measures are like multiple factor indicators and the growth factors are like factors. You can find such papers on our website under Papers, Two-Part Growth Modeling with a Preponderance of Zeros.
Apart from the article I mentioned, I don't know of another factor analysis application, so yours might be an interesting first.