I have data on 158 categorical (1/0) items that were administered to 1,000 individuals. The data were collected in a planned missing data design in which five different forms were used. Each form had a set of unique items and a set of common items (approximately 20% of the 158 items are common across the five forms). When I run an EFA on these data, a number of cells are missing because of the nature of the design. The analysis using WLSMV or ULS produces the WARNING: BIVARIATE TABLE OF X_ AND X_ HAS AN EMPTY CELL. Although I can get AIC/BIC using ML, I'm interested in the other ancillary statistics for model evaluation. Is there a way to account for this type of design and still use the WLSMV or ULS estimators?
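To make the source of the warning concrete, here is a minimal Python sketch of how a planned missing design leaves some item pairs with no jointly observed cases. The item names (`c1`, `a1`, `b1`), the two-form layout, and the helper functions are hypothetical toy choices, not the poster's data or anything produced by Mplus.

```python
from itertools import combinations

# Hypothetical toy design: "c1" is common to both forms,
# "a1" appears only on form A and "b1" only on form B.
ITEMS = ["c1", "a1", "b1"]


def make_row(form, common_value, unique_value):
    """Build one respondent; None marks an item that was not administered."""
    row = {item: None for item in ITEMS}
    row["c1"] = common_value
    row["a1" if form == "A" else "b1"] = unique_value
    return row


# One respondent per response pattern on each form (toy data).
rows = [make_row(f, c, u) for f in ("A", "B") for c in (0, 1) for u in (0, 1)]


def bivariate_table(rows, x, y):
    """Count the 2x2 cells for items x and y over jointly observed cases."""
    table = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for row in rows:
        if row[x] is not None and row[y] is not None:
            table[(row[x], row[y])] += 1
    return table


def pairs_with_empty_cells(rows, items):
    """Return item pairs whose bivariate table has at least one empty cell."""
    flagged = {}
    for x, y in combinations(items, 2):
        table = bivariate_table(rows, x, y)
        empty = [cell for cell, n in table.items() if n == 0]
        if empty:
            flagged[(x, y)] = empty
    return flagged


flagged = pairs_with_empty_cells(rows, ITEMS)
for pair, cells in flagged.items():
    print(pair, "empty cells:", cells)
```

In this toy layout only the pair of form-specific items is flagged, because no respondent saw both. Pairs that involve the common item have complete tables.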
Dear Linda/Bengt, I am doing an exploratory factor analysis on binary items measuring safety culture in firms. Some items are not applicable to smaller firms and are recorded as missing. What is the best method to deal with this missingness? I have a variable measuring firm size.
Maximum likelihood estimation would be best, but with binary items each factor adds one dimension of numerical integration, which makes an EFA with several factors computationally difficult. You could instead use DATA IMPUTATION to obtain imputed data sets and then use TYPE=IMPUTATION with weighted least squares to do the EFA.
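As a rough illustration of what combining results across imputed data sets involves, the Python sketch below pools one parameter with the usual combining rules (Rubin's rules). The function name `pool_rubin` and the input numbers are hypothetical. The thread does not say which statistics the software pools in this way, so treat this as a sketch of the general idea rather than a description of the Mplus output.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates  : point estimates, one per imputed data set
    std_errors : matching standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance of the estimates.
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical loadings for one item from three imputed data sets.
estimate, se = pool_rubin([0.6, 0.7, 0.8], [0.1, 0.1, 0.1])
print(round(estimate, 3), round(se, 3))
```

The pooled standard error is larger than any single-data-set standard error because it also carries the variation between imputations.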
Thanks, I do have the book. For MNAR data he discusses selection models and pattern-mixture models, but I did not see anything about the performance of ML and multiple imputation when there are 'not applicable' items. If I understand it correctly, a section in the paper by Schafer and Graham (Missing data: state of the art) says that when the missing values are out of scope, we can assume MAR.
The question is what an (implicitly) imputed value means for a subject for whom the question is not applicable. It's similar to imputing values for someone who drops out of a study due to death. There may be other approaches for this case. You may want to ask on SEMNET to hear if someone has a good reference.
I'm attempting to identify the latent factors within a complex data set that is comprised of:
- assessments that were administered in multiple different training cohorts
- 5 different versions of the same assessment (each uses a different number of items: version 1 uses 18 items, version 2 uses 36 items, another version uses 30 items, etc., so there is significant planned missingness)
- a total of 55 ordinal items across all versions of the assessment
- 3 different scorers across all versions of the assessment
- 3 test times (t1, t2, t3)
I hope to take an exploratory approach to the factor analysis, using as much of the data as possible to preserve power.
It seems to me that the best I can hope to do is:
- use ESEM rather than EFA
- analyse the 3 test times separately rather than together in the same analysis (I know that I can test for measurement invariance using CFA, but I want to do an exploratory rather than a confirmatory analysis if possible)
- use multiple groups, where the groups would be (i) training cohorts and (ii) assessment versions
Is this correct? Or is there a more constructive way of structuring my thinking about this? Many thanks.
From the UG I can see that you can use ESEM:
* at multiple time points (Example 5.26), and also
* for multiple groups (Example 5.27).

I've also understood that multilevel analysis is unavailable for ESEM.
But is it possible to conduct an ESEM that takes into account:
* one group variable such as "training cohort"
* a second group variable such as "version of assessment"
* and multiple time points

all simultaneously?
I'm having trouble locating an example that integrates all these variables in one analysis.
It means that the data can be grouped in two different ways:
* the first way to group the data would be according to the specific "training cohort" in which people participated
* the second way to group the data would be according to the particular "version of the assessment" they completed (as there were 5 different versions of the assessment used in total)
I can't seem to find an example of ESEM syntax that incorporates two different ways in which data can be grouped (i.e. by which training cohort they participated in & by which version of the assessment they took) in addition to multiple time points.
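The two classifications described above can be pictured as two labels on every record. The thread does not give a solution, so the Python sketch below only illustrates one possible data-preparation step, which is an assumption on my part: crossing the two labels into a single group code and counting how many people fall in each crossed cell. The records, the field names, and `crossed_group` are all hypothetical.

```python
from collections import Counter

# Hypothetical records: each person has a training cohort and an
# assessment version (5 versions in total, per the description above).
records = [
    {"id": 1, "cohort": 1, "version": 1},
    {"id": 2, "cohort": 1, "version": 1},
    {"id": 3, "cohort": 1, "version": 2},
    {"id": 4, "cohort": 2, "version": 3},
    {"id": 5, "cohort": 2, "version": 5},
]


def crossed_group(cohort, version, n_versions=5):
    """Map a (cohort, version) pair to one integer code, starting at 1."""
    return (cohort - 1) * n_versions + version


# Number of people in each crossed cohort-by-version cell.
counts = Counter(crossed_group(r["cohort"], r["version"]) for r in records)
for code, n in sorted(counts.items()):
    print(f"group {code}: n = {n}")
```

Counting the crossed cells first shows quickly whether any cohort-by-version combination is too small, or empty, to be treated as its own group.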