Rules Comm posted on Saturday, July 02, 2011 - 11:02 am
I am dealing with outliers in a CFA. There are 7 outliers in a dataset of 400 cases. I do not want to delete them because they are meaningful. Could you suggest a way to deal with them? I have searched this site and others, but I did not find anything related to CFA and outliers. Thanks.
I just wanted to check whether it is still appropriate to use Cook's distances to check for outliers in a CFA model with categorical (ordinal) indicators. Is the interpretation of Cook's distances the same in this context? Thanks.
Thanks for your help. Just to clarify then, would loglikelihood outliers still be interpretable with WLSMV estimation (with ordinal data)? Also, I am less familiar with loglikelihood outliers. Are there any rules of thumb about what constitutes an outlier, or is it more a case of looking for points that are far out in the tail of the distribution? Many thanks.
In that case, is there a way of identifying outliers in a CFA with ordinal indicators and WLSMV estimation? Or is it a case of estimating the model with ML (and treating the indicators as continuous)? Thanks.
I am testing the validity of an emotional competence questionnaire (50 items and 10 expected factors [2 expected higher-order factors]) with athletes (nested within 19 teams). I am using both CFA and ESEM approaches in Mplus. We will also be examining the relationship between athlete leadership and emotional competence. I am hoping you can answer a few questions regarding outliers.
1. I have come across examples where preliminary data screening is done in SPSS prior to the main analyses in Mplus. As such, I tested for univariate outliers (z scores) and multivariate outliers (Mahalanobis distance) at the subscale level in SPSS. To address the univariate outliers (7 cases), I planned to use Winsorizing but realized this would be done at the subscale level and would not affect the items, which are what would be imported into Mplus. Do you have any suggestions for how to deal with the univariate outliers? Should outliers be tested at the item level instead of the subscale level in this case? I plan to delete the cases that are multivariate outliers (4 cases).
2. Instead of using SPSS to identify and address outliers, I consulted the Mplus guide and read about testing for outliers (multivariate I believe?) using Mahalanobis Distance as part of the PLOT or SAVE command. Is there a way to test for univariate outliers in Mplus as well? Is this done through INFLUENCE?
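For the item-level question in point 1, here is a minimal Python sketch (standard library only) of z-score screening and Winsorizing applied to a single item before the data are exported to Mplus. The function names, the |z| > 3.29 cutoff, and the count-based Winsorizing rule are illustrative assumptions, not something prescribed in this thread or by Mplus.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize one item's scores using the sample mean and SD."""
    m, s = mean(values), stdev(values)
    if s == 0:  # constant item: no case can stand out
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]

def flag_univariate(values, cutoff=3.29):
    """Return positions of cases whose |z| exceeds the (assumed) cutoff."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]

def winsorize(values, k=1):
    """Pull the k lowest and k highest scores in to the nearest retained value.

    Assumes len(values) > 2 * k.
    """
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]

# Hypothetical scores on one item for 20 cases.
item = [3] * 19 + [30]
print(flag_univariate(item))  # positions of the flagged cases
print(winsorize(item))        # extreme scores replaced, case count unchanged
```

Running the same functions once per item column keeps the screening at the level of the variables that actually enter the Mplus model, rather than at the subscale level.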
Thank you for your prompt response! As a follow-up, if I use one of the several outlier options in Mplus to screen for outliers within my CFA model, would this be screening solely for multivariate outliers?
Also, when I run my ESEM model and subsequent regression analyses would I be required to again check for outliers?
I forgot one additional question to add to the above.
If I were to do univariate screening in SPSS, would you recommend doing so at the subscale or item level? I typically do so at the subscale level, but I know the items are what will be brought into Mplus and then used to conduct my main analyses.
Following your suggestion, I checked for multivariate outliers wrt my CFA model in Mplus using Mahalanobis distance. I identified 13 multivariate outliers, which I removed from my main data file before running the CFA again in Mplus. Two concerns have emerged.
1. When I ran the CFA after removing the outliers, I checked Mahalanobis distance wrt my CFA using Mplus again and 8 new outliers were identified. Is this typical? I feel as though if I remove these 8 outliers and run the CFA again, the same thing will keep happening until I have very little data left.
2. The model fit indices changed only slightly between the CFA that I ran with my complete data set and the CFA that I ran without the 13 outliers (see fit indices below). In the interest of retaining participants for adequate power, do you think the complete data set could be used to run my CFA in Mplus?
fit indices for complete data: RMSEA .054; CFI .770; TLI .750; SRMR .084
fit indices for data with outliers removed: RMSEA .051; CFI .779; TLI .761; SRMR .081
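On the first concern above, some re-flagging after deletion is to be expected: removing the most extreme cases shrinks the estimated variances and covariances, so cases that were previously borderline sit farther from the center in relative terms. A minimal two-variable Python sketch of repeated screening follows. It assumes distances are computed from the sample means and covariance matrix (not the model-based distances discussed in this thread), and the function names and the p < .001 chi-square cutoff are illustrative.

```python
import math
from statistics import mean

def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (x, y) case from the sample centroid."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Quadratic form with the closed-form inverse of a 2x2 covariance matrix.
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in points
    ]

def screen_rounds(points, alpha=0.001, max_rounds=10):
    """Repeatedly drop cases above the cutoff; return flag counts per round and the kept cases."""
    cutoff = -2 * math.log(alpha)  # chi-square critical value, exact for df = 2
    remaining = list(points)
    flagged_per_round = []
    for _ in range(max_rounds):
        if len(remaining) < 3:  # too few cases left to estimate a covariance matrix
            break
        d2 = mahalanobis_sq(remaining)
        keep = [p for p, d in zip(remaining, d2) if d <= cutoff]
        if len(keep) == len(remaining):
            break  # nothing exceeds the cutoff any more
        flagged_per_round.append(len(remaining) - len(keep))
        remaining = keep
    return flagged_per_round, remaining
```

If `flagged_per_round` has more than one entry, new cases crossed the cutoff only after earlier ones were removed, which is the pattern described in point 1.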
Any assistance or suggestions would be appreciated, thank you!
I apologize for my many questions, this is my first time using Mplus to run a CFA and evaluate my outliers. I sincerely appreciate your time and expertise.
I checked Cook's distance wrt my CFA model in Mplus as suggested, and well over half my sample (approximately 150 cases) appears to have values > 1. Furthermore, only 2 of the top 10 cases (i.e., highest Cook's distance scores) were also identified as multivariate outliers using MD within Mplus. This seems quite odd. As I mentioned previously, 13 multivariate outliers wrt the CFA model were identified using MD.
Have you encountered any issues with Cook's distance within Mplus where high values were found for many participants?
Do you have any suggestions for how to approach the outliers moving forward? I certainly cannot delete well over half my sample, but I want to handle the outliers appropriately and present my CFA model accurately.
To add another layer to my comment above, I just ran the CFA without the TYPE=COMPLEX command and the Cook's distance scores decreased substantially! There are approximately 30 outliers > 1 (the majority of these 30 are 1.something, with the others including values of 2, 5, 13, and 18). This time 6 of the top 10 cases (i.e., highest Cook's distance scores) were also identified as multivariate outliers using MD within Mplus. 30 is still a lot but pales in comparison to 153. Is there something about TYPE=COMPLEX that affects Cook's distance scores?
What I might consider doing is to use a couple of outlier methods including the loglikelihood contributions and then see which subjects they agree on. Then start by removing the worst one and see if any results change in meaningful ways. If they do, delete the next worst one, etc.
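The cross-checking step suggested here can be sketched in Python as follows. It assumes the saved Cook's distance, Mahalanobis distance, and loglikelihood values have already been read into dictionaries keyed by case ID, and that low loglikelihood contributions mark the unusual cases; the IDs, values, and function names are hypothetical.

```python
def most_extreme(scores, n, largest=True):
    """IDs of the n most extreme cases; scores maps case ID -> saved value."""
    ranked = sorted(scores, key=scores.get, reverse=largest)
    return set(ranked[:n])

def rank_by_agreement(flag_sets, min_methods=2):
    """Cases flagged by at least min_methods methods, most-agreed first."""
    counts = {}
    for flagged in flag_sets:
        for case_id in flagged:
            counts[case_id] = counts.get(case_id, 0) + 1
    agreed = [c for c, k in counts.items() if k >= min_methods]
    return sorted(agreed, key=lambda c: (-counts[c], c))

# Hypothetical saved values for four cases.
cooks = {101: 0.2, 102: 5.1, 103: 1.3, 104: 0.1}
md = {101: 12.0, 102: 48.0, 103: 15.0, 104: 60.0}
loglik = {101: -20.0, 102: -55.0, 103: -22.0, 104: -50.0}

candidates = rank_by_agreement([
    most_extreme(cooks, 2),
    most_extreme(md, 2),
    most_extreme(loglik, 2, largest=False),  # assumed: lowest contributions are the unusual ones
])
print(candidates)  # [102, 104]
```

The first ID in the list is the case to remove first; the model is then re-estimated and the results compared before deciding whether to remove the next one.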
Thank you, I will do this! If TYPE=COMPLEX is used to run my CFA should it also be used to identify outliers? I was just so surprised to see the number of identified cases (Cook's Distance > 1) go from 150 to 30.
Also, can COOKS, MAHALANOBIS, LOGLIKELIHOOD, and INFLUENCE be used with MLR (I am using MLR because of missing data)? I read that LOGLIKELIHOOD can only be used with maximum likelihood estimators. Does this include MLR?
Thank you for your assistance! I have a much better understanding of my outliers now. If TYPE=COMPLEX is used to run my CFA (just to account for the nested nature of my data) should this command also be used to identify the outliers? I was just so surprised to see the number of identified cases for Cook's Distance > 1 go from 150 (with TYPE=COMPLEX) to 30 (without TYPE=COMPLEX).