Outliers in CFA
Mplus Discussion > Confirmatory Factor Analysis
 Rules Comm posted on Saturday, July 02, 2011 - 11:02 am
Professors,

I am dealing with outliers in CFA. There are 7 outliers in a 400-case dataset. I do not want to delete them because they are meaningful. Could you suggest a way to deal with them? I have searched this site and others but did not find anything related to CFA and outliers. Thanks.
 Linda K. Muthen posted on Sunday, July 03, 2011 - 10:02 am
You would probably get a larger response to a general question like this on a general discussion forum like SEMNET.
 Peter Taylor posted on Wednesday, August 06, 2014 - 7:58 am
I just wanted to check if it is still appropriate to use Cook's distances to check for outliers in a CFA model with categorical (ordinal) indicators - is the interpretation of Cook's distances the same in this context? Thanks
 Bengt O. Muthen posted on Wednesday, August 06, 2014 - 11:57 am
Not sure how widely accepted Cook's is for categorical outcomes. I would use Loglikelihood outliers, obtained by ML.
 Peter Taylor posted on Thursday, August 07, 2014 - 12:34 am
Thanks for your help. Just to clarify then, would loglikelihood outliers still be interpretable with WLSMV estimation (as ordinal data)? Also, I am less familiar with loglikelihood outliers, are there any rules of thumb about what constitutes an outlier, or is it more a case of looking for points that are far out from the tail of the distribution? Many thanks
 Linda K. Muthen posted on Thursday, August 07, 2014 - 9:23 am
Weighted least squares estimators do not have loglikelihoods. You would need to use maximum likelihood if you want to look at the loglikelihoods.
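Editorial note: to make the idea of a loglikelihood outlier concrete, here is a minimal Python sketch, not Mplus output. It assumes two continuous indicators and a hypothetical model-implied mean vector and covariance matrix, and it computes each case's contribution to a bivariate normal loglikelihood. The function name, case IDs, and all numbers are invented for illustration. With categorical indicators the likelihood has a different form, so this only shows the ranking logic: cases whose contribution sits far below the rest are the candidates.

```python
import math

def bivariate_normal_loglik(x, mean, cov):
    """Log density of one two-variable case under a bivariate normal."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)' inv(cov) (x - mean), via the closed-form 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)

# Hypothetical model-implied mean vector and covariance matrix.
implied_mean = (0.0, 0.0)
implied_cov = ((1.0, 0.5), (0.5, 1.0))

# Hypothetical cases: ID -> observed scores on two indicators.
cases = {
    101: (0.1, 0.2),
    102: (-0.5, -0.3),
    103: (1.0, 0.8),
    104: (3.0, -3.0),  # far from the implied pattern
}

contributions = {
    case_id: bivariate_normal_loglik(x, implied_mean, implied_cov)
    for case_id, x in cases.items()
}

# Sort so the smallest (most negative) contributions come first.
ranked = sorted(contributions, key=contributions.get)
for case_id in ranked:
    print(case_id, round(contributions[case_id], 3))
```

Case 104 ends up first in the ranking because its scores contradict the positive covariance implied by the hypothetical model.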
 Peter Taylor posted on Monday, August 11, 2014 - 2:14 am
In that case is there a way of identifying outliers in a CFA with ordinal indicators and WLSMV estimation? Or is it a case of estimating the model with ML (and treating the indicators as continuous)? Thanks
 Linda K. Muthen posted on Monday, August 11, 2014 - 6:15 am
You can estimate the model with ML and treat the indicators as categorical.
 Peter Taylor posted on Tuesday, August 12, 2014 - 1:06 am
One last question, if I did want to stick to WLSMV estimation are there any means of identifying outliers in Mplus that you would recommend?
 Bengt O. Muthen posted on Tuesday, August 12, 2014 - 9:31 am
Not sure I would. Outliers are better chased by ML.
 Ashley Duguay posted on Thursday, April 05, 2018 - 10:10 am
Hi Drs Muthen,

I am testing the validity of an emotional competence questionnaire (50 items and 10 expected factors [2 expected higher-order factors]) with athletes (nested within 19 teams). I am using both CFA and ESEM approaches in Mplus. We will also be examining the relationship between athlete leadership and emotional competence. I am hoping you can answer a few questions regarding outliers.

1. I have come across examples where preliminary data screening is done in SPSS prior to the main analyses in Mplus. As such, I tested for univariate (Z scores) and multivariate outliers (Mahalanobis Distance) at the subscale level in SPSS. To address the univariate outliers (7 cases), I planned to use Winsorizing but realized this would be done at the subscale level and would not impact the items, which would be imported into Mplus. Do you have any suggestions for how to deal with the univariate outliers? Should outliers at the item level be tested instead of the subscale level in this case? I plan to delete the cases of multivariate outliers (4 cases).

2. Instead of using SPSS to identify and address outliers, I consulted the Mplus guide and read about testing for outliers (multivariate I believe?) using Mahalanobis Distance as part of the PLOT or SAVE command. Is there a way to test for univariate outliers in Mplus as well? Is this done through INFLUENCE?
 Bengt O. Muthen posted on Thursday, April 05, 2018 - 2:50 pm
You can do SPSS univariate screening, but I think the most relevant screening is in the context of the CFA model. You can use any of our several Outliers options.
 Ashley Duguay posted on Thursday, April 05, 2018 - 6:11 pm
Dr. Muthen,

Thank you for your prompt response! As a follow-up, if I use one of the several outliers options in Mplus to screen for outliers within my CFA model would this be screening solely for multivariate outliers?

Also, when I run my ESEM model and subsequent regression analyses would I be required to again check for outliers?
 Ashley Duguay posted on Thursday, April 05, 2018 - 6:39 pm
Dr Muthen,

I forgot one additional question to add to the above.

If I were to do SPSS univariate screening, would you recommend doing so at the subscale or item level? I typically do so at the subscale level, but I know the items are what will be brought into Mplus and then used to conduct my main analyses.

Thank you!
 Bengt O. Muthen posted on Friday, April 06, 2018 - 5:44 pm
First post:

Q1: Multivariate and wrt the CFA model.

Q2: Yes.

Second post:

Question for SEMNET.
 Ashley Duguay posted on Monday, April 09, 2018 - 4:56 pm
Dr. Muthen,

Thank you for your prompt and informative response!
 Ashley Duguay posted on Monday, April 09, 2018 - 6:15 pm
Hi Dr. Muthen,

Following your suggestion, I checked for multivariate outliers wrt my CFA model in Mplus using Mahalanobis distance. I identified 13 multivariate outliers, which I removed from my main data file before running the CFA again in Mplus. Two concerns have emerged.

1. When I ran the CFA after removing the outliers, I checked Mahalanobis distance wrt my CFA using Mplus again and 8 new outliers were identified. Is this typical? I feel as though if I remove these 8 outliers and run the CFA again, the same thing will keep happening until I have very little data left.

2. The model fit indices changed only slightly between the CFA that I ran with my complete data set and the CFA that I ran without the 13 outliers (see fit indices below). In the interest of retaining participants for adequate power, do you think the complete data set could be used to run my CFA in Mplus?

fit indices for complete data:
RMSEA .054; CFI .770; TLI .750; SRMR .084

fit indices for data with outliers removed:
RMSEA .051; CFI .779; TLI .761; SRMR .081

Any assistance or suggestions would be appreciated, thank you!
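Editorial note: one possible reason new cases get flagged after each round of deletion is that the reference mean and spread are re-estimated from the trimmed data, so previously unremarkable cases now look extreme. The Python sketch below illustrates this with a univariate analogy (in one dimension, a Mahalanobis distance reduces to an absolute z-score). The data, the function name, and the cutoff of 3 are assumptions for illustration, not the Mplus procedure.

```python
from statistics import mean, stdev

def flag_extreme(values, cutoff=3.0):
    """Return values whose absolute z-score exceeds the cutoff."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s > cutoff]

# Hypothetical scores: a tight cluster plus one moderate and one extreme value.
scores = [-1.0] * 20 + [1.0] * 20 + [4.0, 20.0]

# First screen on the full data, then re-screen after dropping what was flagged.
first_pass = flag_extreme(scores)
remaining = [v for v in scores if v not in first_pass]
second_pass = flag_extreme(remaining)

print(first_pass)
print(second_pass)
```

In the first pass only 20.0 is flagged, because it inflates the standard deviation enough to mask 4.0. Once 20.0 is gone, the standard deviation shrinks and 4.0 crosses the cutoff.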
 Bengt O. Muthen posted on Tuesday, April 10, 2018 - 3:47 pm
Try the Cook's distance and check if any point is >1.
 Ashley Duguay posted on Tuesday, April 10, 2018 - 6:10 pm
Dr. Muthen,

Thank you for your assistance!
 Ashley Duguay posted on Tuesday, April 10, 2018 - 6:54 pm
Hi Dr. Muthen,

I apologize for my many questions, this is my first time using Mplus to run a CFA and evaluate my outliers. I sincerely appreciate your time and expertise.

I checked Cook's distance wrt my CFA model in Mplus as suggested, and well over half my sample (approximately 150 cases) appears to have values > 1. Furthermore, only 2 of the top 10 cases (i.e., highest Cook's distance scores) were also identified as multivariate outliers using MD within Mplus. This seems quite odd. As I mentioned previously, 13 multivariate outliers wrt the CFA model were identified using MD.

Have you encountered any issues with Cook's distance within Mplus where high values for many participants were found?

Do you have any suggestions for how to approach the outliers going forward? (I certainly cannot delete well over half my sample, but I want to handle them appropriately and accurately present my CFA model.)

Thank you!
 Ashley Duguay posted on Tuesday, April 10, 2018 - 7:19 pm
Hi Dr. Muthen,

To add an additional layer to my comment above, I just ran the CFA without the TYPE=COMPLEX command and the Cook's distance scores decreased substantially! There are approximately 30 cases with values > 1 (the majority of these 30 are between 1 and 2, with the others at roughly 2, 5, 13, and 18). This time 6 of the top 10 (i.e., highest Cook's distance scores) were also identified as multivariate outliers using MD within Mplus. 30 is still a lot but pales in comparison to 153. Is there something about TYPE=COMPLEX that impacts Cook's distance scores?
 Bengt O. Muthen posted on Wednesday, April 11, 2018 - 4:33 pm
What I might consider doing is to use a couple of outlier methods including the loglikelihood contributions and then see which subjects they agree on. Then start by removing the worst one and see if any results change in meaningful ways. If they do, delete the next worst one, etc.
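Editorial note: the procedure described above can be sketched in Python roughly as follows. This is one reading of the advice, under stated assumptions: the flagged case IDs, the Cook's distance values, the data, and the tolerance are all hypothetical, and the `fit` argument is a stand-in (here simply the mean) for refitting the CFA and extracting a result of interest. The sketch stops, and keeps the case, at the first deletion that does not change the result by more than the tolerance.

```python
from statistics import mean

def agreed_outliers(flags_by_method):
    """Return the case IDs flagged by every method."""
    return set.intersection(*(set(ids) for ids in flags_by_method.values()))

def sequential_deletion(data, candidates, fit, tolerance):
    """Delete candidates worst-first while each deletion shifts the estimate by more than tolerance.

    data: dict of case ID -> value.
    candidates: case IDs ordered worst first.
    fit: function mapping a list of values to one estimate (stand-in for refitting the model).
    """
    kept = dict(data)
    removed = []
    estimate = fit(list(kept.values()))
    for case_id in candidates:
        trial = {k: v for k, v in kept.items() if k != case_id}
        new_estimate = fit(list(trial.values()))
        if abs(new_estimate - estimate) <= tolerance:
            break  # no meaningful change, so stop and keep this case
        kept, estimate = trial, new_estimate
        removed.append(case_id)
    return removed, estimate

# Hypothetical case IDs flagged by three outlier methods.
flags = {"cooks": [5, 6, 2], "mahalanobis": [5, 6], "loglikelihood": [5, 6, 3]}
# Hypothetical Cook's distance values, used only to order the agreed cases.
cooks_scores = {5: 18.0, 6: 1.4, 2: 1.1}
# Hypothetical data: case ID -> score.
data = {1: 10.0, 2: 11.0, 3: 9.0, 4: 10.0, 5: 40.0, 6: 14.0}

agreed = agreed_outliers(flags)
worst_first = sorted(agreed, key=cooks_scores.get, reverse=True)
removed, estimate = sequential_deletion(data, worst_first, mean, tolerance=1.0)
print(sorted(agreed), removed, estimate)
```

With these made-up numbers, cases 5 and 6 are agreed on by all three methods, but only case 5 is deleted: dropping case 6 afterwards shifts the estimate by less than the tolerance. What counts as a meaningful change is a substantive judgment, so the tolerance here is purely a placeholder.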
 Ashley Duguay posted on Wednesday, April 11, 2018 - 7:25 pm
Hi Dr. Muthen,

Thank you, I will do this! If TYPE=COMPLEX is used to run my CFA should it also be used to identify outliers? I was just so surprised to see the number of identified cases (Cook's Distance > 1) go from 150 to 30.

Thank you again for all of your help!
 Ashley Duguay posted on Thursday, April 12, 2018 - 8:35 am
Also, can COOKS, MAHALANOBIS, LOGLIKELIHOOD, and INFLUENCE be used with MLR (I am using MLR for missing data)? I read that LOGLIKELIHOOD can only be used with maximum likelihood estimators - does this include MLR?
 Bengt O. Muthen posted on Thursday, April 12, 2018 - 12:25 pm
MLR is a maximum-likelihood estimator. It gives the same parameter estimates as ML, only different SEs. Try it and see what you get.
 Ashley Duguay posted on Monday, April 16, 2018 - 6:30 am
Hi Dr. Muthen,

Thank you for your assistance! I have a much better understanding of my outliers now. If TYPE=COMPLEX is used to run my CFA (just to account for the nested nature of my data) should this command also be used to identify the outliers? I was just so surprised to see the number of identified cases for Cook's Distance > 1 go from 150 (with TYPE=COMPLEX) to 30 (without TYPE=COMPLEX).

Thank you again for all of your help!
 Bengt O. Muthen posted on Tuesday, April 17, 2018 - 4:00 pm
Yes.