
Rules Comm posted on Saturday, July 02, 2011 11:02 am



Professors, I am dealing with outliers in a CFA. There are 7 outliers in a dataset of 400 cases. I do not want to delete them because they are meaningful. Could you suggest a way to handle them? I have searched this site and others but did not find anything on CFA and outliers. Thanks.


You would probably get a larger response to a general question like this on a general discussion forum like SEMNET. 


I just wanted to check whether it is still appropriate to use Cook's distances to check for outliers in a CFA model with categorical (ordinal) indicators. Is the interpretation of Cook's distances the same in this context? Thanks


Not sure how widely accepted Cook's distance is for categorical outcomes. I would use loglikelihood outliers, obtained by ML.
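A "loglikelihood outlier" is a case whose individual contribution to the model's loglikelihood is unusually low, i.e., a response pattern the fitted model finds improbable. The sketch below illustrates the idea under a simplifying assumption: continuous, multivariate-normal indicators with a known model-implied mean and covariance from a hypothetical one-factor CFA. It is not how Mplus handles categorical indicators under ML (that requires numerical integration), and the helper names (`cholesky`, `forward_solve`, `one_factor_sigma`, `loglik_contributions`), loadings, and cases are all invented for illustration.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for z by forward substitution (L lower-triangular)."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def one_factor_sigma(loadings, residual_vars, factor_var=1.0):
    """Model-implied covariance of a one-factor CFA: lambda * phi * lambda^T + Theta."""
    p = len(loadings)
    return [[loadings[i] * factor_var * loadings[j] + (residual_vars[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def loglik_contributions(data, mu, sigma):
    """Per-case multivariate-normal log density under the model-implied mean and covariance."""
    L = cholesky(sigma)
    p = len(mu)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))  # log |Sigma|
    out = []
    for row in data:
        z = forward_solve(L, [x - m for x, m in zip(row, mu)])
        d2 = sum(v * v for v in z)  # squared Mahalanobis distance from the model-implied mean
        out.append(-0.5 * (p * math.log(2.0 * math.pi) + log_det + d2))
    return out


# Hypothetical one-factor model with standardized items, and four hypothetical cases.
sigma = one_factor_sigma([0.8, 0.7, 0.6], [0.36, 0.51, 0.64])
cases = [[0.1, 0.2, 0.0], [-0.3, -0.1, -0.4], [0.5, 0.4, 0.6], [2.0, -2.0, 2.0]]
ll = loglik_contributions(cases, [0.0, 0.0, 0.0], sigma)
worst_first = sorted(range(len(cases)), key=lambda i: ll[i])
print(worst_first[0])  # 3: its item pattern contradicts the positive loadings
```

There is no fixed cutoff here; in practice one inspects the cases with the lowest contributions, which is consistent with looking for points far out in the tail.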


Thanks for your help. Just to clarify, then: would loglikelihood outliers still be interpretable with WLSMV estimation (with ordinal data)? Also, I am less familiar with loglikelihood outliers; are there any rules of thumb about what constitutes an outlier, or is it more a case of looking for points that are far out in the tail of the distribution? Many thanks


Weighted least squares estimators do not have loglikelihoods. You would need to use maximum likelihood if you want to look at the loglikelihoods. 


In that case, is there a way of identifying outliers in a CFA with ordinal indicators and WLSMV estimation? Or is it a case of estimating the model with ML (and treating the indicators as continuous)? Thanks


You can estimate the model with ML and treat the indicators as categorical. 


One last question: if I did want to stick to WLSMV estimation, are there any means of identifying outliers in Mplus that you would recommend?


Not sure I would. Outliers are better chased by ML. 


Hi Drs. Muthen, I am testing the validity of an emotional competence questionnaire (50 items and 10 expected factors [2 expected higher-order factors]) with athletes (nested within 19 teams). I am using both CFA and ESEM approaches in Mplus. We will also be examining the relationship between athlete leadership and emotional competence. I am hoping you can answer a few questions regarding outliers.

1. I have come across examples where preliminary data screening is done in SPSS prior to the main analyses in Mplus. Accordingly, I tested for univariate outliers (z scores) and multivariate outliers (Mahalanobis distance) at the subscale level in SPSS. To address the univariate outliers (7 cases), I planned to use Winsorizing, but I realized this would be done at the subscale level and would not affect the items, which are what will be imported into Mplus. Do you have any suggestions for how to deal with the univariate outliers? Should outliers be tested at the item level instead of the subscale level in this case? I plan to delete the cases that are multivariate outliers (4 cases).

2. Instead of using SPSS to identify and address outliers, I consulted the Mplus guide and read about testing for outliers (multivariate, I believe?) using Mahalanobis distance as part of the PLOT or SAVE command. Is there a way to test for univariate outliers in Mplus as well? Is this done through INFLUENCE?


You can do SPSS univariate screening, but I think the most relevant screening is in the context of the CFA model. You can use any of our several outlier options.
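For reference, the item-level univariate screen described in the question (z scores, then Winsorizing) can be sketched as follows. The |z| > 3.29 cutoff (roughly two-tailed p < .001) is a common convention assumed here, and Winsorizing by pulling a flagged value in to the most extreme non-flagged value on the same item is only one of several variants. The helper names and the item scores are hypothetical. As the answer notes, this screens each variable in isolation and says nothing about fit to the CFA model.

```python
import statistics


def z_scores(values):
    """Standardize using the sample mean and sample SD (n - 1 denominator)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def flag_univariate(values, cutoff=3.29):
    """Indices whose |z| exceeds the cutoff (3.29 is roughly two-tailed p < .001)."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]


def winsorize_flagged(values, cutoff=3.29):
    """Pull flagged values in to the most extreme non-flagged value on the same item."""
    flagged = set(flag_univariate(values, cutoff))
    kept = [v for i, v in enumerate(values) if i not in flagged]
    lo, hi = min(kept), max(kept)
    return [min(max(v, lo), hi) for v in values]


# Hypothetical scores on one item for 20 respondents; the last value is extreme.
item = [3.2, 3.5, 2.9, 3.1, 3.4, 3.0, 3.3, 2.8, 3.6, 3.2,
        3.1, 3.0, 3.4, 3.3, 2.9, 3.5, 3.2, 3.1, 3.0, 0.4]
print(flag_univariate(item))        # [19]
print(winsorize_flagged(item)[19])  # 2.8
```

Running this per item, rather than per subscale, keeps the adjusted values in the same units as the variables that would be analyzed.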


Dr. Muthen, Thank you for your prompt response! As a follow-up: if I use one of the several outlier options in Mplus to screen for outliers within my CFA model, would this be screening solely for multivariate outliers? Also, when I run my ESEM model and subsequent regression analyses, would I need to check for outliers again?


Dr. Muthen, I forgot to add one question to the above. If I were to do SPSS univariate screening, would you recommend doing so at the subscale or item level? I typically do so at the subscale level, but I know the items are what will be brought into Mplus and used to conduct my main analyses. Thank you!


First post: Q1: Multivariate and wrt the CFA model. Q2: Yes. Second post: Question for SEMNET. 


Dr. Muthen, Thank you for your prompt and informative response! 


Hi Dr. Muthen, Following your suggestion, I checked for multivariate outliers wrt my CFA model in Mplus using Mahalanobis distance. I identified 13 multivariate outliers, which I removed from my main data file before running the CFA again in Mplus. Two concerns have emerged.

1. When I ran the CFA after removing the outliers, I checked Mahalanobis distance wrt my CFA in Mplus again, and 8 new outliers were identified. Is this typical? I feel that if I remove these 8 outliers and run the CFA again, the same thing will keep happening until I have very little data left.

2. The model fit indices changed only slightly between the CFA run on my complete data set and the CFA run without the 13 outliers (see fit indices below). In the interest of retaining participants for adequate power, do you think the complete data set could be used to run my CFA in Mplus?

Fit indices for the complete data: RMSEA = .054; CFI = .770; TLI = .750; SRMR = .084
Fit indices with the outliers removed: RMSEA = .051; CFI = .779; TLI = .761; SRMR = .081

Any assistance or suggestions would be appreciated. Thank you!
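The pattern in concern 1 is expected behavior of repeated screening: once the most extreme cases are removed, the re-estimated spread shrinks, so cases that were inside the cutoff can fall outside it (the extreme cases had been "masking" them). The sketch below shows this with a deliberately simplified univariate z-score screen standing in for Mplus's model-based Mahalanobis distance; the helper names, the 3.29 cutoff, and the data are assumptions for illustration only.

```python
import statistics


def flag_once(values, ids, cutoff):
    """IDs whose |z| (computed on the currently retained cases) exceeds the cutoff."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [i for i, v in zip(ids, values) if abs(v - m) / s > cutoff]


def screen_repeatedly(values, cutoff=3.29, max_rounds=10):
    """Remove flagged cases, re-estimate mean and SD, and screen again until nothing is flagged."""
    ids = list(range(len(values)))
    rounds = []
    for _ in range(max_rounds):
        kept_values = [values[i] for i in ids]
        flagged = flag_once(kept_values, ids, cutoff)
        if not flagged:
            break
        rounds.append(flagged)
        ids = [i for i in ids if i not in flagged]
    return rounds


# Hypothetical data: 28 ordinary values plus two extreme ones (indices 28 and 29).
data = [0.5, -0.5] * 14 + [10.0, 4.0]
print(screen_repeatedly(data))  # [[28], [29]]
```

Case 29 (value 4.0) has |z| of about 1.8 in the first pass and is flagged only after case 28 is removed, which is why stopping rules based on whether results change are often preferred to screening until nothing is flagged.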


Try Cook's distance and check whether any point is > 1.
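The "> 1" rule of thumb comes from classical linear regression, where Cook's distance for case i is D_i = e_i² h_ii / (p s² (1 − h_ii)²), combining the residual e_i and the leverage h_ii. Mplus's COOKS option applies a generalization to the model's parameter estimates; the sketch below does not reproduce that computation and shows only the classical simple-regression version. The function name and data are hypothetical.

```python
import statistics


def cooks_distance_simple(x, y):
    """Classical Cook's distance for each case in a simple linear regression y = b0 + b1*x."""
    n = len(x)
    p = 2  # number of regression coefficients (intercept and slope)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual mean square
    dists = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage of case i
        dists.append(e * e / (p * s2) * h / (1.0 - h) ** 2)
    return dists


# Hypothetical data: nine cases near the line y = x, plus one high-leverage case far off it.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8, 7.1, 8.0, 8.9, 0.0]
d = cooks_distance_simple(x, y)
print([i for i, di in enumerate(d) if di > 1])  # [9]
```

Only the case that is both far from the other x values and far from the fitted line exceeds 1; a large residual alone (cases 0 and 8) does not.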


Dr. Muthen, Thank you for your assistance! 


Hi Dr. Muthen, I apologize for my many questions; this is my first time using Mplus to run a CFA and evaluate outliers. I sincerely appreciate your time and expertise. I checked Cook's distance wrt my CFA model in Mplus as suggested, and well over half my sample (approximately 150 cases) appears to be > 1. Furthermore, only 2 of the top 10 cases (i.e., those with the highest Cook's distance values) were also identified as multivariate outliers using MD within Mplus. This seems quite odd. As I mentioned previously, 13 multivariate outliers wrt the CFA model were identified using MD. Have you encountered any issues with Cook's distance in Mplus where many participants had high values? Do you have any suggestions for how to approach the outliers moving forward? (I certainly cannot delete well over half my sample, but I want to handle them appropriately and present my CFA model accurately.) Thank you!


Hi Dr. Muthen, To add another layer to my comment above: I just ran the CFA without the TYPE=COMPLEX command, and the Cook's distance values decreased substantially! There are approximately 30 cases > 1 (most of these are between 1 and 2, with the others at about 2, 5, 13, and 18). This time 6 of the top 10 (i.e., highest Cook's distance values) were also identified as multivariate outliers using MD within Mplus. Thirty is still a lot, but it pales in comparison to 153. Is there something about TYPE=COMPLEX that affects Cook's distance values?


What I might consider doing is to use a couple of outlier methods including the loglikelihood contributions and then see which subjects they agree on. Then start by removing the worst one and see if any results change in meaningful ways. If they do, delete the next worst one, etc. 
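The procedure suggested here can be sketched in two steps: keep only the cases that several outlier rankings agree on, then delete them one at a time, worst first, stopping once a deletion no longer changes the results meaningfully. In this sketch `refit` is a hypothetical stand-in for re-running the CFA (here it just returns a mean), `top_k` and `tol` are assumed thresholds, and the rankings are invented; a real check would compare parameter estimates, SEs, and fit rather than a single number.

```python
import statistics


def consensus_outliers(rankings, top_k):
    """Cases that appear in the top_k of every ranking (each ranking lists case IDs, worst first)."""
    common = set.intersection(*(set(r[:top_k]) for r in rankings))
    return [case for case in rankings[0] if case in common]  # keep the first method's order


def sequential_deletion(data, order, refit, tol):
    """Drop cases worst-first while each deletion still changes the estimate by more than tol."""
    kept = dict(data)
    removed = []
    estimate = refit(kept)
    for case in order:
        trial = {k: v for k, v in kept.items() if k != case}
        new_estimate = refit(trial)
        if abs(new_estimate - estimate) <= tol:
            break  # this deletion no longer changes results in a meaningful way
        kept, estimate = trial, new_estimate
        removed.append(case)
    return removed, estimate


def mean_refit(cases):
    """Hypothetical stand-in for re-running the CFA: the 'result' is just a mean."""
    return statistics.mean(cases.values())


data = {"a": 1.0, "b": 1.2, "c": 0.9, "d": 1.1, "e": 8.0, "f": 4.0}
by_loglik = ["e", "f", "b", "a"]  # hypothetical ranking from loglikelihood contributions
by_cooks = ["e", "f", "c", "d"]   # hypothetical ranking from Cook's distance
order = consensus_outliers([by_loglik, by_cooks], top_k=2)
removed, estimate = sequential_deletion(data, order, mean_refit, tol=0.1)
print(order, removed)  # ['e', 'f'] ['e', 'f']
```

With a looser threshold (`tol=1.0`) only case "e" is removed, because dropping "f" afterwards moves the estimate by less than the threshold.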


Hi Dr. Muthen, Thank you, I will do this! If TYPE=COMPLEX is used to run my CFA, should it also be used to identify outliers? I was just so surprised to see the number of identified cases (Cook's distance > 1) go from 150 to 30. Thank you again for all of your help!


Also, can COOKS, MAHALANOBIS, LOGLIKELIHOOD, and INFLUENCE be used with MLR (I am using MLR to handle missing data)? I read that LOGLIKELIHOOD can only be used with maximum likelihood estimators; does this include MLR?


MLR is a maximum-likelihood estimator. It gives the same parameter estimates as ML, only different SEs. Try it and see what you get.
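To see what "same estimates, different SEs" means, the sketch below uses a simple regression with normal errors as a stand-in for the CFA: the ML slope is computed once, then given two standard errors, a model-based one that assumes constant error variance and a Huber-White (HC0) sandwich one that uses each case's own squared residual. MLR in Mplus pairs ML estimates with a sandwich-type SE in this spirit, but this is not its CFA computation; the function name and data are hypothetical.

```python
import math
import statistics


def slope_with_ses(x, y):
    """ML slope for y = b0 + b1*x with normal errors, plus model-based and sandwich SEs."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sigma2_ml = sum(e * e for e in resid) / n  # ML residual variance divides by n
    se_model = math.sqrt(sigma2_ml / sxx)  # assumes constant error variance
    # Huber-White (HC0) sandwich: each case's own squared residual replaces sigma^2
    se_robust = math.sqrt(sum(((xi - xbar) * e) ** 2 for xi, e in zip(x, resid))) / sxx
    return b1, se_model, se_robust


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 2.1, 2.9, 4.2, 4.6, 6.9, 6.1, 9.5]  # hypothetical; spread grows with x
b1, se_model, se_robust = slope_with_ses(x, y)
print(b1, se_model, se_robust)
```

The slope `b1` is identical under both approaches; only its standard error changes, which is why case-level ML quantities such as loglikelihood contributions remain available with MLR.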


Hi Dr. Muthen, Thank you for your assistance! I have a much better understanding of my outliers now. If TYPE=COMPLEX is used to run my CFA (just to account for the nested nature of my data), should this command also be used to identify the outliers? I was just so surprised to see the number of identified cases with Cook's distance > 1 go from 150 (with TYPE=COMPLEX) to 30 (without TYPE=COMPLEX). Thank you again for all of your help!


Yes. 
