Mplus Discussion >> Skewness

Skewness

Mplus Discussion > Confirmatory Factor Analysis >

Anonymous posted on Friday, February 13, 2004 - 11:13 am

Is it necessary that the data are normally distributed for confirmatory factor analysis or linear growth modelling if the outcome variables are Likert-type scales and continuous?

Linda K. Muthen posted on Friday, February 13, 2004 - 11:29 am

Factor indicators or outcomes in a growth model can be continuous, binary, or ordered categorical in the current version of Mplus. If they are not normally distributed, there are estimators that are robust to non-normality. In Version 3, factor indicators and outcomes can also be censored or count variables. In factor analysis, combinations of different variable types are allowed.

J.W. posted on Monday, February 04, 2008 - 2:13 pm

I am testing multivariate normality of the observed continuous variables before running a CFA model with 18 variables loading to 3 factors. The Mardia measures for skewness and kurtosis estimated from SAS using SAS macro MULTNORM are the following:

Mardia Skewness 9844 p<.0001
Mardia Kurtosis 110.3 p<.0001

By specifying a single group in Mixture analysis and Option 13 in Mplus, I have:

TWO-SIDED MULTIVARIATE SKEW TEST OF FIT
Sample Value 85.942
Mean 27.862
Standard Deviation 1.182
P-Value 0.0000

TWO-SIDED MULTIVARIATE KURTOSIS TEST OF FIT
Sample Value 481.889
Mean 357.220
Standard Deviation 2.995
P-Value 0.0000

Why are the sample values so different in the Mplus and SAS results? How can we test multivariate normality in Mplus? Thanks!

Linda K. Muthen posted on Monday, February 04, 2008 - 5:53 pm

Mplus computes Mardia (1970) definitions of multivariate skew and kurtosis. Mplus uses the actual sample statistic as defined in Mardia, Kent, Bibby (1979, pg 21). SAS uses definitions from Mardia (1974).
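
For readers who want to see the sample statistics themselves, here is a minimal Python sketch of Mardia's (1970) multivariate skewness b1p and kurtosis b2p, computed directly from their definitions. It is not Mplus or SAS code: the function name `mardia` and the use of NumPy are assumptions made for illustration, and the covariance matrix uses divisor n. Under multivariate normality the expected kurtosis is roughly p(p+2), which for 18 variables is 360 and is close to the "Mean" of 357.220 in the TECH13 output quoted above.

```python
import numpy as np


def mardia(X):
    """Return (b1p, b2p): Mardia's multivariate skewness and kurtosis.

    X is an (n, p) array of complete cases. The covariance matrix uses
    divisor n (the ML estimate), as in Mardia's sample definitions.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / n
    # D[i, j] = (x_i - xbar)' S^{-1} (x_j - xbar)
    D = centered @ np.linalg.solve(S, centered.T)
    b1p = (D ** 3).sum() / n ** 2        # multivariate skewness
    b2p = (np.diag(D) ** 2).sum() / n    # multivariate kurtosis
    return b1p, b2p


if __name__ == "__main__":
    # Simulated normal data, only to show the call; not data from the thread.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 18))
    b1p, b2p = mardia(X)
    print("skewness b1p:", b1p)
    print("kurtosis b2p:", b2p, "reference p(p+2):", 18 * 20)
```

Programs may report these raw values or test statistics derived from them, which is one reason the printed numbers can differ between packages.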

J.W. posted on Tuesday, February 05, 2008 - 9:46 am

Linda,

Thanks a lot for your prompt reply to my question.

In Mixture analysis, the P-value provided by Tech13 in Mplus output is for comparing the sample value and model estimated value in regard to Mardia Skewness and Kurtosis measures. A small P-value (e.g., <0.05) indicates that the model (i.e., the single group model in my case) does not fit the data. To test the hypothesis of multivariate normality in the observed measures, can I report the Sample Value and P-value in Mplus output? That is, if the P-value is <0.05, then reject the hypothesis of multivariate normality. However, it seems to me that the P-value in Mplus output is for model fit test. Is it also for testing multivariate normality of the observed variables? Appreciate your help!

Linda K. Muthen posted on Wednesday, February 06, 2008 - 10:38 am

TECH13 does not provide tests of multivariate skewness and kurtosis. It is a test of the model-generated skewness and kurtosis against observed variable skewness and kurtosis. Multivariate normality is not needed when using the MLR and MLM estimators.

Andrea Vocino posted on Tuesday, March 25, 2008 - 11:29 pm

Hi Linda,

What kind of output do I need to request in order to get the actual sample statistic as defined in Mardia, Kent, Bibby (1979, pg 21)?

Linda K. Muthen posted on Wednesday, March 26, 2008 - 8:23 am

This is TECH13, which is available only for TYPE=MIXTURE. You can use TYPE=MIXTURE; with CLASSES = c(1); if you are not doing mixture modeling.

kirby posted on Friday, May 16, 2008 - 6:30 am

Dear all,
Linda wrote "TECH13 does not provide tests of multivariate skewness and kurtosis. It is a test of the model-generated skewness and kurtosis against observed variable skewness and kurtosis."
and "Mplus computes Mardia (1970) definitions of multivariate skew and kurtosis. Mplus uses the actual sample statistic as defined in Mardia, Kent, Bibby (1979, pg 21)."

I am sorry, but I do not really understand which definition Mplus uses and what exactly Tech13's Mardia coefficient can tell me. Does it help me to assess multivariate normality of my indicator variables or not?

If one can use Mplus Tech13's coefficient, how can I interpret it? I tried to find some guidance, e.g. in the semnet archive, but there was nothing that really helped me. I assume the following result indicates that my indicator variables are not multivariate normally distributed, right? But how can I evaluate whether the multivariate kurtosis is low, moderate, or heavy?

TWO-SIDED MULTIVARIATE KURTOSIS TEST OF FIT
Sample Value 551.903
Mean 481.730
Standard Deviation 2.579
P-Value 0.0000

Thanks a lot for your help!!

PS: Another thought: (how) can I use the Satorra-Bentler scaling correction factor to assess multivariate normality?

Bengt O. Muthen posted on Friday, May 16, 2008 - 5:39 pm

Tech13 was primarily developed for mixture models to see if the estimated model would capture the skewness and kurtosis in the data. I think that is the background for Linda's statement. With a single class, however, this gives a standard test of multivariate normality. The actual skewness and kurtosis values are obtained in the output using Tech12.

I have not seen the Satorra-Bentler correction factor used to test/assess multivariate normality.

My own opinion is that tests of multivariate normality are of less importance now that we have non-normality robust techniques using MLR or MLM in Mplus. Experience indicates that under non-normality the normality-based ML parameter estimates are quite robust, the SEs that MLR and MLM give are very good, and MLR/ MLM chi-square test of model fit is also very good. Normality testing seems to have been advocated in earlier days when these robust techniques hadn't been implemented. So in this sense, the focus on testing normality in the context of latent variable modeling is in my view a bit outdated.

kirby posted on Sunday, May 18, 2008 - 6:12 am

Dear Bengt,

thanks a lot for your detailed answer!

Please allow one follow-up question: are there any criteria along which I can decide whether I should use MLM or MLR?

On the one hand MLM is more popular and there is more information available from other authors, but on the other hand it only works with listwise deletion... Is there anything which points at MLM or MLR (I do not have non-independent observations)?

Have a nice Sunday!

Bengt O. Muthen posted on Monday, May 19, 2008 - 8:50 am

MLR and MLM usually give very similar SEs and they are asymptotically equivalent (even for non-normal data).

kirby posted on Tuesday, May 20, 2008 - 6:38 am

Dear Bengt,

thanks for that pleasant answer! Then I would prefer MLR - due to my missing values.

Do you maybe know any paper which supports your statement that MLM and MLR are very similar? I have had a look on the website but have not found anything.
Thanks!

Linda K. Muthen posted on Tuesday, May 20, 2008 - 6:46 am

I don't know of any such paper.

kirby posted on Wednesday, May 21, 2008 - 7:48 am

I found some simulation studies which evaluate the performance of MLM and compare it to 'normal' ML estimation. However, I did not come across such studies for MLR. Do you maybe know one/some? That would be great!

Linda K. Muthen posted on Wednesday, May 21, 2008 - 11:16 am

I don't know of any.

Sean Mullen posted on Friday, November 06, 2009 - 2:19 pm

Dear list,
Continuing on with the discussion above...how does Numerical Integration compare (to MLM and MLR) as a robust estimator in the presence of non-normal data? I have 2 non-normal distal outcome variables (model 1) and 3 non-normal continuous indicator variables (model 2), but Mplus requires integration with GMM and missing data. Should I worry about non-normality, or should I definitely reflect and transform my negatively skewed variables? Thanks in advance!

Bengt O. Muthen posted on Friday, November 06, 2009 - 6:22 pm

Numerical integration is an algorithm, not an estimator, so it cannot be compared to, say, MLR. With mixture models such as GMM you should not worry about non-normality of outcomes: the mixture creates non-normality in the outcomes. So if you transform non-normal outcomes you may not find the mixture that generated the data.

finnigan posted on Thursday, July 22, 2010 - 12:28 pm

Linda/Bengt

Can a Box-Cox transformation be used to normalize skewed ordinal categorical data?

Thanks

Linda K. Muthen posted on Thursday, July 22, 2010 - 2:38 pm

I don't think this would be appropriate. Categorical data methodology is developed to deal with floor and ceiling effects so no transformation is necessary.

finnigan posted on Friday, August 06, 2010 - 8:13 am

Linda/Bengt

I have a sample of 129 individuals responding to a 56-item survey using a 5-point (1-5) Likert scale. Previous research has shown that a rather poorly fitting five-factor model underpins the data. I cannot validate the five-factor model in my sample because it is too small and the data are skewed. Previous research has shown that 18 to 25 of the 56 items do not load on a factor.

The five factors do not correlate, and I would like to do a CFA on each of the factors separately to assess the loadings in my sample, with a view to reducing the number of items. Would you see any problems with this approach if only the loadings are of interest?

Thanks

Linda K. Muthen posted on Friday, August 06, 2010 - 9:50 am

This seems reasonable.

Jennifer Yahner posted on Wednesday, September 29, 2010 - 8:28 pm

Dr. Muthen,

Continuing on your statement above that given MLR, worries about non-normality are outdated, are you saying that -- in the context of path modeling where the dependent variable has a zero-inflated Poisson-like distribution -- specifying the DV as such is not necessary?

I am hoping your answer is yes. That treating the DV as continuous, but using MLR estimation, yields similar results.

-Jennifer

Linda K. Muthen posted on Thursday, September 30, 2010 - 5:34 am

If you have a zero-inflated count variable, you should treat it as such by using the COUNT option.

finnigan posted on Thursday, December 30, 2010 - 3:17 am

Linda/Bengt
I am trying to run a one-factor model using 10 measured variables scored on a Likert scale from 1-5. The scale has been previously used in published research.
My data are skewed and z scores for the indicators range from 5.7217 to 00372. Z scores for kurtosis range from 2.07 to 4.34.
Mplus gave the following warning:
WARNING in MODEL command. All variables are uncorrelated with all other variables in the model.
Check that this is what is intended.

Given that previous research has shown that these items load on a factor and Cronbach's alpha is .72, it's surprising that this warning has arisen. I'm wondering if the warning has emerged because the skewness and kurtosis have affected the Pearson correlation coefficients.

Is there any way in MPLUS to deal with this warning?

Linda K. Muthen posted on Thursday, December 30, 2010 - 5:48 am

This warning comes when variables on the USEVARIABLES or NAMES list are not used in the MODEL command to warn you if this is unintended. These variables are all considered analysis variables. If you can't see the reason, please send the full output and your license number to support@statmodel.com.

Katharina Fischer posted on Tuesday, August 23, 2011 - 12:19 am

Dear all,

having read several posts on the topic of the Mardia test, I learned that it is possible to get the Mardia test by specifying a mixture model and using TECH13. I would like to know if there is an alternative way of getting this test in the current version (I use version 6). (I know that for my analyses I should use MLR or MLM with non-normal data, but I am afraid I have to underpin this decision by reporting a test statistic like Mardia's coefficient in my diploma thesis...)

Does anyone know a paper or other text where I can get information to help me decide whether to use MLR or MLM? (A text that says something about the differences/advantages of the two options?)

Thanks a lot for any comment,
Katharina

Bengt O. Muthen posted on Tuesday, August 23, 2011 - 8:23 am

There is nothing wrong with always using MLR (or MLM when no missing data) instead of ML. If you have reasons to suspect that your data are non-normal, I would simply use MLR/MLM and see what difference (compared to ML) that makes to SEs and chi-square. To me, that is the ultimate test of non-normality. I don't find it necessary to do a Mardia test - I never do - that was needed in the old days when we didn't have MLR/MLM, but only ML.

I can't point to a paper where ML, MLR, and MLM are thoroughly compared. Anyone?

Katharina Fischer posted on Tuesday, August 23, 2011 - 10:31 am

Dear Professor Muthen,

thank you very much for your helpful advice. So I will use MLR with my data (because I have some missings).

Kind regards,
Katharina

finnigan posted on Monday, March 12, 2012 - 2:47 pm

Linda/Bengt

I am looking at the univariate skew and kurtosis in mplus. I have three waves of data and I'm running a CFA per wave consisting of a one factor model. However different indicators from different CFAs are showing the same values for skewness and kurtosis of observed indicators. Is there any reason that different indicators would have identical skewness and kurtosis values?

Thanks

Linda K. Muthen posted on Monday, March 12, 2012 - 2:55 pm

Please send the output and your license number to support@statmodel.com so we can see exactly what you are talking about.

janni niclasen posted on Friday, April 13, 2012 - 6:00 am

Hi

I am testing three different models running CFA (N=71000):
model 1: 25 items and 5 first order factors;
model 2: adding 2 second order factors - two to each of two first order factors;
model 3: adding 1 second order factor to the same four first order factors as in model 2.
My 25 items are highly skewed (the most skewed distribution is 98, 1.5, 0.5). I get some fine overall fit statistics (RMSEA = .36, but CFI/TLI below .9 and, obviously, a very high chi-square, >6000), but I also get some factor estimates >1 (because I haven't got enough variance, I assume).
I am treating my data as categorical and I have also tried dichotomising my answer categories. But it still doesn't work: I still get factor estimates >1. I hope you can advise me on what to do/try next in order for me to be able to run my models without getting these overfitted models.

Thanks a lot...

Linda K. Muthen posted on Friday, April 13, 2012 - 8:20 am

Factor loadings can be greater than one with correlated factors. Your concern should be negative residual variances.

brendadooley posted on Monday, April 30, 2012 - 12:39 pm

Is there any adjustment to Pearson correlations that can be made when observed indicators are not normally distributed?

Thanks

Linda K. Muthen posted on Monday, April 30, 2012 - 1:57 pm

There is no adjustment. If the variables are censored, you can use the CENSORED option. Otherwise, there are estimators like MLR that are robust to non-normality.

Ashley posted on Saturday, August 04, 2012 - 3:13 pm

Dear Drs. Muthen,

I am running a growth curve model using a drinking variable that is quite skewed. Because of the non-normality, I am using the MLR estimation you explained earlier in this thread. On a path from a predictor variable to the intercept of drinking, I am getting different p-values between the unstandardized and standardized results. I know that the betas are supposed to change when they are standardized, and the p-values may change very slightly, but the difference in p-values in this case is about .05, so the path is significant in the unstandardized results but not in the standardized results. Have you experienced this before when using MLR, or am I using it in an incorrect circumstance? If it is possible to get different p-values, which results should I rely on? Thank you very much for your time.

Linda K. Muthen posted on Saturday, August 04, 2012 - 7:33 pm

The p-values differ between standardized and unstandardized results because the sampling distributions are different. It has nothing to do with MLR. In all cases with so many simultaneous tests, you should be conservative. You should use the one you want to report.

Ashley posted on Monday, August 06, 2012 - 10:29 am

Thank you very much. I appreciate the time and effort you both put into the discussion board, as it is one of the most useful tools. Thank you.

Anne Janssen posted on Friday, August 17, 2012 - 7:52 am

Dear Dr. Muthen,

I am running a CFA with 9 items loading on two factors and would like to use tech11 or tech13 to get information about the skewness. As Linda posted earlier this is only possible with type mixture and classes = c(1). How do I have to specify my model command which includes by-commands?

Thanks a lot in advance.

-Anne

Linda K. Muthen posted on Friday, August 17, 2012 - 1:01 pm

It is TECH13. You need to add %OVERALL% after MODEL and before the BY statements.

Anne posted on Monday, August 20, 2012 - 4:59 am

Thanks a lot for your help!

Nicholas Bishop posted on Tuesday, June 18, 2013 - 7:26 pm

Hello,
In the situation where a dependent variable is a zero-inflated count, but the addition of the COUNT option produces a model that will not run due to computational complexity, how much trust can we put in results when using MLM/MLR? Would this be an instance where transforming the variable may be suitable? The warning I am receiving is below.

Nick

(THERE IS NOT ENOUGH MEMORY SPACE TO RUN THE PROGRAM ON THE CURRENT INPUT FILE. THE ANALYSIS REQUIRES 4 DIMENSIONS OF INTEGRATION RESULTING IN A TOTAL OF 0.50625E+05 INTEGRATION POINTS)

Linda K. Muthen posted on Wednesday, June 19, 2013 - 5:43 am

The message has to do with not enough memory because the model has 4 dimensions of integration. Try INTEGRATION = MONTECARLO (5000). If this does not help, send the output and your license number to support@statmodel.com.
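
The size of the problem in the error message can be reproduced with simple arithmetic. The sketch below assumes a product grid with the same number of points in every dimension; the value of 15 points per dimension is an inference from the message itself (15 to the power 4 is 50625, i.e. 0.50625E+05), not something stated in the thread, and the function name is hypothetical.

```python
def total_integration_points(points_per_dimension, dimensions):
    """Number of grid points when every dimension uses the same number of points."""
    return points_per_dimension ** dimensions


if __name__ == "__main__":
    # Assumed: 15 points per dimension, inferred from the 50625 in the message.
    for dims in range(1, 5):
        print(dims, "dimension(s):", total_integration_points(15, dims))
    # Monte Carlo integration with 5000 points, as suggested in the reply,
    # uses a fixed number of points regardless of the number of dimensions.
    print("Monte Carlo points:", 5000)
```

The grid grows exponentially with the number of dimensions, which is why a fixed-size Monte Carlo sample can need far less memory with 4 dimensions of integration.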

Cecily Na posted on Thursday, August 08, 2013 - 11:42 am

Dear Professor,
Although the MLR estimator can handle non-normal data, how much non-normality can there be for MLR to remain effective? For instance, skewness < 2 and kurtosis < 7, or can the skewness and kurtosis be arbitrarily large?

Thank you.

Linda K. Muthen posted on Thursday, August 08, 2013 - 12:41 pm

I don't know of any studies that have looked at this specifically. You would need to do your own simulation study to see.

Anne posted on Wednesday, September 04, 2013 - 6:13 am

Dear Ms. Muthen,

although you say reporting Mardia coefficient is outdated, I need to report it in my thesis. Could you be so kind and explain to me what the mean and what the sample value is? What is the difference?

Thank you in advance!

Linda K. Muthen posted on Wednesday, September 04, 2013 - 9:28 am

Please send your output and license number to support@statmodel.com.

Laura Mezquita posted on Friday, October 11, 2013 - 11:22 am

Hello,
I have evidence of non-normality in my data, so I've used MLM. It's a SEM with two factors as IVs, two mediator variables and two dependent variables.
Then I ran a multi-group analysis.
I expected the d.f. of the multi-group model to be double that of the original model. However, I have more degrees of freedom than I expected. I guess that some parameters are being constrained by default. Which ones? And how can I freely estimate these parameters?

As a second step, I want to constrain some paths in the model and test whether there are differences between groups. As I'm using MLM, I think I have to test each path separately and calculate the scaled chi-square difference with the Bryant and Satorra (2012) macro. Am I right? Is there any way to test all the path differences at the same time when the MLM estimator is used?

Thank you in advance!

Linda K. Muthen posted on Friday, October 11, 2013 - 5:30 pm

Please see the Topic 1 course handout under multiple group analysis. Factor loadings and intercepts are held equal as the default in multiple group analysis. You can see how to relax these equalities in the handout.

You can do a joint test of them all by comparing a model with all coefficients free versus all coefficients held equal or you can do them one at a time. You can also use MODEL TEST.
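
For the nested-model comparison mentioned above, the commonly used Satorra-Bentler scaled difference calculation can be sketched in a few lines of Python. This is an illustration, not the Bryant and Satorra macro and not Mplus output: the function and argument names are hypothetical, T0/T1 are assumed to be the scaled chi-square values printed under MLM (or MLR), c0/c1 the scaling correction factors, and d0/d1 the degrees of freedom, with model 0 the more restricted model.

```python
def sb_scaled_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference for nested models.

    t0, c0, d0: scaled chi-square, scaling correction factor and d.f.
                of the nested (more restricted) model.
    t1, c1, d1: the same quantities for the comparison (less restricted) model.
    Returns (scaled difference statistic, difference in d.f., cd).
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    if cd <= 0:
        raise ValueError("non-positive difference scaling correction")
    # t * c recovers the unscaled ML chi-square of each model
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1, cd


if __name__ == "__main__":
    # Made-up numbers, only to show the call; they are not from the thread.
    trd, ddf, cd = sb_scaled_difference(120.0, 1.2, 50, 100.0, 1.25, 45)
    print("scaled difference:", trd, "on", ddf, "d.f. (cd =", cd, ")")
```

The resulting statistic is referred to a chi-square distribution with d0 - d1 degrees of freedom. The simple difference of two scaled chi-squares is not itself chi-square distributed, which is why the correction is needed.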

Sabrina Thornton posted on Thursday, November 14, 2013 - 4:37 am

HI,

Following on from the discussion regarding normality test, Dr. Bengt O. Muthen wrote:

"There is nothing wrong with always using MLR (or MLM when no missing data) instead of ML. If you have reasons to suspect that your data are non-normal, I would simply use MLR/MLM and see what difference (compared to ML) that makes to SEs and chi-square. To me, that is the ultimate test of non-normality. I don't find it necessary to do a Mardia test - I never do - that was needed in the old days when we didn't have MLR/MLM, but only ML."

I am wondering whether there is a rule of thumb, based on your experience, for how large the differences in SEs and chi-square (I presume you meant the chi-square in the model fit indices?) between the ML and MLM outputs need to be to warrant the conclusion that the data are non-normal. Please advise. Many thanks.

Linda K. Muthen posted on Thursday, November 14, 2013 - 8:19 am

MLR is robust to non-normality and model misspecification. You can't tell why the standard errors are different. If I were doing maximum likelihood estimation, I would use MLR.

Ibrahim Al-Jubari posted on Sunday, January 26, 2014 - 9:00 am

Dear Drs. Muthen,

I am using Mplus 7.11 with the multilevel add-on. How can I get the skewness and kurtosis output? Isn't it available in this version? Note that all observed variables are continuous.

Thank you

Linda K. Muthen posted on Sunday, January 26, 2014 - 9:22 am

You can find skewness and kurtosis using the PLOT command. Look at the histograms. When you right click on the histogram you can view the descriptive statistics which include skewness and kurtosis.

Ibrahim Al-Jubari posted on Sunday, January 26, 2014 - 5:04 pm

I could view one observed variable at a time.
Can I view descriptive statistics for all observed variables?

Thanks a lot Dr. Linda

Linda K. Muthen posted on Monday, January 27, 2014 - 8:39 am

TYPE=BASIC and the SAMPSTAT option give descriptive statistics for a set of variables. They do not however provide skewness and kurtosis.
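
If a table of skewness and kurtosis for all variables is needed, one option is to compute it outside Mplus. The Python sketch below uses the plain moment definitions (skewness m3 / m2^1.5 and excess kurtosis m4 / m2^2 - 3, with divisor n). These formulas are an assumption and may not match the values shown in the Mplus histogram view exactly; the function and variable names are made up for illustration.

```python
import numpy as np


def skew_kurtosis_table(X, names):
    """Moment-based skewness and excess kurtosis for each column of X."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    m2 = (centered ** 2).mean(axis=0)   # second central moment
    m3 = (centered ** 3).mean(axis=0)   # third central moment
    m4 = (centered ** 4).mean(axis=0)   # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return {name: (float(s), float(k))
            for name, s, k in zip(names, skew, excess_kurt)}


if __name__ == "__main__":
    # Toy data with hypothetical variable names y1 and y2.
    data = [[1, 0], [2, 0], [3, 0], [4, 0], [5, 10]]
    for name, (s, k) in skew_kurtosis_table(data, ["y1", "y2"]).items():
        print(name, "skewness:", s, "excess kurtosis:", k)
```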

Lindsay Nicolai posted on Friday, February 28, 2014 - 1:52 pm

Hello. I am trying to determine whether to use the ML or MLM estimator for my data. My chi-square value with ML is 638.00 and my chi-square value with MLM is 619.419. The Scaling Correction Factor is 1.030. I know that values >1 are indicative of non-normality; however, is this a hard and fast rule? Or is there some judgement involved? To me, 1.030 is very close to 1.

Bengt O. Muthen posted on Friday, February 28, 2014 - 1:56 pm

You may also see if the SEs differ substantially.

A good choice is MLR which is robust to non-normality as well as some other mis-specifications.
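
The three numbers in the question are tied together by the Satorra-Bentler scaling: the scaled chi-square is the ML chi-square divided by the scaling correction factor. A short Python check, with hypothetical function names, shows that the reported values agree up to rounding of the printed factor.

```python
def scaled_chi_square(chi2_ml, correction_factor):
    """Satorra-Bentler scaled chi-square: the ML chi-square divided by the factor."""
    return chi2_ml / correction_factor


def implied_correction_factor(chi2_ml, chi2_scaled):
    """Scaling correction factor implied by an ML and a scaled chi-square."""
    return chi2_ml / chi2_scaled


if __name__ == "__main__":
    # Values quoted in the question above.
    print("scaled chi-square:", scaled_chi_square(638.00, 1.030))
    print("implied factor:", implied_correction_factor(638.00, 619.419))
```

A factor of 1.030 therefore means the ML chi-square is only about 3% larger than the scaled one, which fits the poster's impression that the correction is small here.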

Mario posted on Monday, February 19, 2018 - 11:53 pm

Dear Dr. Muthen,
regarding your comment on May 16, 2008 - 5:39 pm, in which you stated that MLR and MLM, and even ML, no longer require multivariate normality because they are quite robust:
Do you know of any publication stating this?

thanks in advance

Bengt O. Muthen posted on Tuesday, February 20, 2018 - 2:31 pm

MLR and MLM give robust SEs. They also give estimates robust to non-normality, as does ML. ML, however, does not give robust SEs. This is written about in many sources; perhaps you can google articles in the SEM journal by, for instance, Ke-Hai Yuan or Victoria Savalei.

Han-Jung Ko posted on Friday, April 26, 2019 - 11:15 pm

Dear Drs. Muthen,
Regarding your comment on Feb 20, 2018 - 2:31 pm, I have tried to run a multiple-group 8-factor CFA using MLR and MLM because the outcome variables are non-normally distributed. However, the model does not converge using MLR or MLM, only with ML. I am guessing this may be due to my sample size (Ngroup 1 = 228, Ngroup 2 = 127, Ngroup 3 = 281). Can I adjust the SEs in the ML results to accommodate the non-normality? If so, how can I do this in Mplus? (I did look up Ke-Hai Yuan's and Victoria Savalei's studies but have not found something that applies to my case.)
Thank you,
Koko

Bengt O. Muthen posted on Saturday, April 27, 2019 - 6:32 am

MLR and MLM give the same parameter estimates as ML and should in principle have the same convergence/non-convergence issues. I would try to figure out why MLR doesn't converge when ML does. You can send your 2 outputs to Support along with your license number.