Mplus Discussion >> Factor Analysis with Ordinal and Continuous Data


Gaurav Siddhu posted on Tuesday, May 22, 2007 - 1:12 am

Dear Dr Muthen,
I am planning to purchase Mplus for my doctoral research, but I have a few questions before making the decision. I have almost 25 variables that I plan to use for poverty modelling in my research area. These variables are continuous, ordinal and nominal, so I could not use traditional factor analysis in SPSS. Do you think Mplus is the right software for mixed data like this? If yes, how does it create the correlation matrix? Is it possible to create a score on each factor for each individual household, to decide who is poor and who is not in my analysis? Can Mplus also do cluster analysis?
Many thanks
Gaurav

Linda K. Muthen posted on Tuesday, May 22, 2007 - 8:09 am

Factor indicators in Mplus can be continuous, censored, binary, ordered categorical (ordinal), counts, or combinations of these variable types. Nominal factor indicators would need to be turned into a set of binary variables. The sample statistics depend on the scale of the variables. Factor scores are available. Cluster analysis is available via mixture modeling.
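
As an illustration of turning a nominal indicator into a set of binary variables, here is a minimal Python sketch. It is not Mplus code. The variable `housing`, its categories, and the choice to leave out one reference category (so that the binary variables are not linearly dependent) are assumptions made for the example.

```python
def dummy_code(values, reference=None):
    """Turn one nominal variable into a set of 0/1 indicator columns.

    One category (the reference) is left out, so k categories give
    k - 1 binary variables. Missing values (None) stay missing.
    """
    categories = sorted({v for v in values if v is not None})
    if reference is None:
        reference = categories[0]
    kept = [c for c in categories if c != reference]
    columns = {}
    for c in kept:
        columns[c] = [None if v is None else int(v == c) for v in values]
    return columns


# Hypothetical nominal variable: type of housing for six households.
housing = ["owned", "rented", "informal", "rented", None, "owned"]
indicators = dummy_code(housing, reference="owned")
for name, column in indicators.items():
    print(name, column)
```

Each resulting column could then be declared as a categorical (binary) factor indicator.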

Gaurav Siddhu posted on Thursday, May 31, 2007 - 1:43 am

Dear Dr. Muthen
Thanks for your reply. Could you also tell me how EFA in Mplus deals with missing values in both continuous and ordinal variables?
Thanks

Linda K. Muthen posted on Monday, June 04, 2007 - 7:13 am

For continuous variables, it is full-information maximum likelihood. For binary and ordinal variables, it is pairwise present.

Gaurav Siddhu posted on Monday, June 04, 2007 - 3:37 pm

Thanks
I have a few more questions:
How are the factor scores calculated in Mplus?

How can I obtain factor scores after performing EFA on a group of continuous, ordinal and dichotomous variables?

Could I then perform cluster analysis on the estimated factor scores in Mplus?

Many thanks,
Gaurav

Linda K. Muthen posted on Monday, June 04, 2007 - 3:48 pm

See Technical Appendix 11 which is available on the website. Factor scores are available only through CFA in Mplus at this time.

It sounds like you want to do a factor mixture model. Doing it in one step is superior to doing it in two steps. See Example 7.20.

Gaurav Siddhu posted on Tuesday, June 05, 2007 - 10:28 am

Thanks.
The ordinal data that I have to analyse have different numbers of levels: 3, 4 or 5. Will I have to standardise the data before doing EFA in a CFA framework? If not, where does the standardisation take place to account for the different numbers of levels? Is EFA in a CFA framework for ordinal and continuous data done on the covariance matrix or the correlation matrix? What type of correlation is used to create the matrix for this kind of non-normal data?
What are the assumptions underlying this kind of analysis?
Regards,
Gaurav

Linda K. Muthen posted on Tuesday, June 05, 2007 - 2:50 pm

You should not standardize categorical variables. The numbers represent categories. They have no numeric value. The measure of association used in model estimation takes into account the nature of the variable, for example, it is a Pearson correlation for two continuous variables, a tetrachoric correlation for two binary variables, a polychoric correlation for two ordered polytomous variables, etc. There are several references on the website under the heading Categorical Outcomes that discuss estimation and assumptions for methods for categorical outcomes.
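
The rule that the measure of association follows the variable types can be written as a small lookup. This Python sketch is only illustrative. The first three cases come from the reply above. The names for the mixed continuous/categorical cases (biserial, polyserial) and for the binary/ordinal pair (polychoric) are the standard textbook terms, added here as an assumption about what the "etc." covers.

```python
def association_type(type_a, type_b):
    """Name the correlation typically used for a pair of variable types.

    Each type must be "continuous", "binary" or "ordinal".
    """
    allowed = {"continuous", "binary", "ordinal"}
    if type_a not in allowed or type_b not in allowed:
        raise ValueError("unknown variable type")
    pair = {type_a, type_b}
    if pair == {"continuous"}:
        return "Pearson"
    if pair == {"binary"}:
        return "tetrachoric"
    if "continuous" not in pair:
        # ordinal with ordinal, or binary with ordinal
        return "polychoric"
    if "binary" in pair:
        return "biserial"
    return "polyserial"


print(association_type("continuous", "continuous"))
print(association_type("binary", "binary"))
print(association_type("ordinal", "ordinal"))
```

None of these measures depends on the numeric codes given to the categories, only on their order, which is why standardising the codes is not needed.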

Gaurav Siddhu posted on Saturday, June 16, 2007 - 9:00 am

I noticed that the new version of Mplus gives two new outputs in EFA: factor determinacy and factor structure. Could you please explain their significance and how to use them?
Thanks
Gaurav

Linda K. Muthen posted on Sunday, June 17, 2007 - 9:27 am

The factor score determinacy ranges from zero to one and describes how well the factor is measured with one being the best value. The factor structure matrix shows the correlation between the items and the factors. This indicates which items measure the factors best.
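
For readers who want to see where a determinacy value comes from, here is a hedged Python sketch for the simplest case: one factor with continuous indicators and regression-method factor scores. It uses the textbook definition (the correlation between the factor and its estimated score). Whether Mplus computes it exactly this way in every setting is an assumption, and all parameter values are hypothetical.

```python
import math


def factor_determinacy(loadings, residual_vars, factor_var=1.0):
    """Correlation between a single factor and its regression-method score.

    With s = sum(loading^2 / residual variance), the squared determinacy
    is factor_var * s / (1 + factor_var * s).
    """
    s = sum(lam * lam / theta for lam, theta in zip(loadings, residual_vars))
    squared = factor_var * s / (1.0 + factor_var * s)
    return math.sqrt(squared)


# Hypothetical admissible solution: three standardized indicators.
admissible = factor_determinacy([0.8, 0.7, 0.6], [0.36, 0.51, 0.64])

# Adding a fourth indicator measures the factor better.
with_more_items = factor_determinacy([0.8, 0.7, 0.6, 0.7],
                                     [0.36, 0.51, 0.64, 0.51])

# Hypothetical inadmissible solution: first residual variance is negative.
inadmissible = factor_determinacy([1.02, 0.7, 0.6], [-0.0404, 0.51, 0.64])

print(round(admissible, 3), round(with_more_items, 3), round(inadmissible, 3))
```

Under this formula an admissible solution always gives a value between zero and one, while a negative residual variance can push the value above one.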

Gaurav Siddhu posted on Wednesday, June 27, 2007 - 2:27 pm

Dear Linda,
I just ran my first CFA model in Mplus. The very first problem is that I get: WARNING: VARIABLE Q2 MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
Q2 is an income variable. It is a continuous variable.
Please advise me what to do.
Regards

Linda K. Muthen posted on Wednesday, June 27, 2007 - 2:41 pm

You should send your input, data, output, and license number to support@statmodel.com. You may be reading your data incorrectly.

Gaurav Siddhu posted on Wednesday, June 27, 2007 - 3:51 pm

Dear Linda,
I hope you got the data file I emailed you. When I run EFA with the same data file I get a perfect result. However, with CFA it gives the same warning: WARNING: VARIABLE Q2 MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
Regards

Gaurav Siddhu posted on Wednesday, June 27, 2007 - 5:28 pm

Another quick question:
When I was doing EFA with all 23 of my variables, I got a message about a non-positive definite matrix. So I tried EFA with a smaller number of variables, adding one at a time to see which variable was causing the problem. The analysis went smoothly until I had 20 variables in the analysis; when I added the 21st variable I got the same error message again. So I dropped the 21st variable and added the 22nd and 23rd variables, and got the same message. Then I removed one variable from the first 20 in my list and added the 21st, 22nd or 23rd variable, and the analysis showed no error. Does that mean the number of variables affects whether the matrix is positive definite? Or what could be the possible reason?
Regards

Linda K. Muthen posted on Thursday, June 28, 2007 - 9:37 am

With categorical outcomes, the sample correlations are estimated pairwise. This can result in a non-positive definite matrix. It is not a direct function of the number of variables although as the number of variables increases, the probability of this occurring increases. It does not affect the results.
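
A small Python sketch shows how correlations estimated one pair at a time can be jointly inconsistent. Every value in the first matrix below is a legitimate correlation on its own, yet the matrix as a whole is not positive definite. The matrices are made-up numbers, and the check (an attempted Cholesky factorization) is just one common way to test positive definiteness.

```python
import math


def is_positive_definite(matrix):
    """Return True if a Cholesky factorization of the matrix succeeds."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= 0.0:
                    return False  # a non-positive pivot: not positive definite
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


# Hypothetical pairwise estimates: items 1-2 and 1-3 correlate at 0.9,
# but items 2-3 correlate at -0.9, which cannot all hold at once.
pairwise = [
    [1.0, 0.9, 0.9],
    [0.9, 1.0, -0.9],
    [0.9, -0.9, 1.0],
]

# The same pattern with a consistent third correlation.
consistent = [
    [1.0, 0.9, 0.9],
    [0.9, 1.0, 0.9],
    [0.9, 0.9, 1.0],
]

print(is_positive_definite(pairwise))
print(is_positive_definite(consistent))
```

With more variables there are more pairs that have to agree with each other, which fits the remark that the chance of a non-positive definite matrix grows with the number of variables.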

yang posted on Thursday, September 20, 2007 - 12:17 pm

Drs. Muthen,

I am doing a CFA on a set of binary (0/1) variables for a unidimensional structure. I got the factor scores for each of the subjects, and I assume that Mplus is calculating the factor scores based on the tetrachoric correlation matrix instead of Pearson correlation matrix. However, I am not 100% sure. Would you mind kindly confirming this assumption? Thanks.

Linda K. Muthen posted on Thursday, September 20, 2007 - 12:45 pm

Technical Appendix 11 on the website describes factor score estimation.

robertav posted on Friday, September 21, 2007 - 6:43 am

Dear authors,
I'm carrying out a simple EFA with 9 ordinal indicators.
With the 2-factor solution I obtain this result:

FACTOR DETERMINACIES
              1             2
              ________      ________
 1            1.002         0.939

You said in this topic that the factor score determinacy ranges from zero to one and describes how well the factor is measured, with one being the best value.
Why do I obtain the value 1.002?
And how is the factor determinacy calculated?

Many thanks

Linda K. Muthen posted on Friday, September 21, 2007 - 8:01 am

I'm not sure how you got factor determinacies with EFA because I don't think they are available with EFA. Please send your input, data, output, and license number to support@statmodel.com. Note that factor determinacies are available for only continuous outcomes so I think you may not be using the CATEGORICAL option of the VARIABLE command to specify that your outcomes are categorical.

Anna Zajacova posted on Thursday, September 11, 2008 - 9:59 am

Dear Dr. Muthen,

I would greatly appreciate your help with the following question: How is the latent score in CFA calculated for observations with missing values on one or more indicators?
Brief background: I ran a simple CFA model with 5 binary indicators. I export the factor scores and use them in regressions (I know it would be better/more efficient to estimate a full SEM in Mplus). I read Technical Appendix 11 and googled extensively for an answer but couldn't find one.
Many thanks in advance for your reply,
Anna Zajacova

Bengt O. Muthen posted on Friday, September 12, 2008 - 9:04 am

The latent variable score for an individual is computed using the posterior distribution of the latent variable. This is based on (1) the model and (2) the data for the person. (1) The estimated model parameters take missing data into account by the usual approach of ML under "MAR", that is using all available data. (2) The data for the person appears in the posterior for each variable that is observed for this person - missing indicators don't contribute.

Anna Zajacova posted on Friday, September 12, 2008 - 11:42 am

Bengt,
Thank you so much for your answer. Would it be correct to say that the estimation effectively imputes the value of the missing indicator(s) for a given individual based on covariances of the non-missing observations from other individuals plus the observed data for the individual, and then calculates the factor score based on the observed values of the non-missing indicators and the 'imputed' values of the missing indicators?
Many thanks in advance,
Anna

Bengt O. Muthen posted on Sunday, September 14, 2008 - 10:21 am

Estimated factor scores are obtained from the posterior = prior + data. The prior becomes the estimated model using all available data. Data is the observed data for the individual.

So instead of your statements, I would say that for a person with missing data on some of the indicators, the estimated model - the prior - is relied on more due to the missing data. There is never an imputation done (although conceptually one might think of that) for either the model estimation step or the factor score estimation step. The important missing data aspect comes in during estimation of the model parameters - that's when you draw on information from correlations with variables without missing to estimate parameters for variables with missing (MAR theory).
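
The "posterior = prior + data" idea can be sketched in Python for the simplest analogue: one factor with continuous, normally distributed indicators, where the posterior is available in closed form. The thread concerns binary indicators, for which the posterior has no such closed form, so this is an illustration of the logic only. All parameter values are hypothetical and stand in for the estimated model.

```python
def factor_score(y, loadings, intercepts, residual_vars, factor_var=1.0):
    """Posterior mean and variance of one factor for one person.

    Prior: factor ~ Normal(0, factor_var), taken from the estimated model.
    Data: only the indicators observed for this person. Entries of y that
    are None (missing) are skipped, so nothing is imputed.
    """
    precision = 1.0 / factor_var  # contribution of the prior
    weighted_sum = 0.0
    for value, lam, nu, theta in zip(y, loadings, intercepts, residual_vars):
        if value is None:
            continue  # a missing indicator adds no information
        precision += lam * lam / theta
        weighted_sum += lam * (value - nu) / theta
    variance = 1.0 / precision
    return weighted_sum * variance, variance


# Hypothetical estimated model with three indicators.
loadings = [0.8, 0.7, 0.6]
intercepts = [0.0, 0.0, 0.0]
residual_vars = [0.36, 0.51, 0.64]

complete = factor_score([1.0, 1.0, 1.0], loadings, intercepts, residual_vars)
partial = factor_score([1.0, None, None], loadings, intercepts, residual_vars)
nothing = factor_score([None, None, None], loadings, intercepts, residual_vars)

print(complete, partial, nothing)
```

With fewer observed indicators the posterior variance is larger and the score leans more on the prior. With no observed indicators the score is simply the prior mean.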

Anna Zajacova posted on Monday, September 22, 2008 - 9:34 am

Dear Bengt,
Thank you again for your answer -- things are crystal clear now. Your reply is much appreciated!
Anna

Thomas A. Schmitt posted on Tuesday, December 09, 2008 - 5:00 pm

Hello Linda,

I just wanted to see if the statements below still hold. I am getting factor determinacies with EFA when specifying the data as categorical. If the statements still hold, then I suppose I'm doing something wrong. Also, could you point to a reference as to why a non-positive definite matrix will not affect the results in Mplus? Thank you!

Tom

I'm not sure how you got factor determinacies with EFA because I don't think they are available with EFA. Please send your input, data, output, and license number to support@statmodel.com. Note that factor determinacies are available for only continuous outcomes so I think you may not be using the CATEGORICAL option of the VARIABLE command to specify that your outcomes are categorical.

Linda K. Muthen posted on Wednesday, December 10, 2008 - 10:29 am

The statement below is no longer valid.

Janine Neuhaus posted on Wednesday, January 07, 2009 - 9:53 am

Dear Linda!
I would like to come back to Thomas' request. I have nested data and ran an EFA based on the estimated between correlation matrix (using version 5.1). I specified two factors. Same as Thomas I got factor determinacies greater than 1. What does that mean?
One of my items has a negative error variance. Might that be the problem? If yes, how can I handle that?
Thank you very much in advance!
Janine

Linda K. Muthen posted on Wednesday, January 07, 2009 - 10:39 am

A negative error variance makes the results inadmissible so the results are not interpretable. You might want to try the new EFA feature in the MODEL command that came out with Version 5.1. See the Version 5 Examples and Language Addendums on the website. If the negative residual variance is small and not significant, you could fix it at zero.

dkim posted on Tuesday, February 03, 2009 - 1:30 pm

Dear Linda,
I have 10-15 dichotomous variables. I tried to run both EFA with 1 factor using ML and CFA with 1 factor using both default estimator and ML. All three analyses ran without any errors.

In SPSS with the ML extraction method, after the EFA run, I can get a residual matrix (observed - model predicted). In Mplus I can get this from CFA with the default estimator (WLSMV) but not with the ML estimator. Is there any way to get the residual matrix, either from values in the Mplus output or by specifying Mplus options in the input file? I have read the manual but can't find the information.

Thank you

Linda K. Muthen posted on Tuesday, February 03, 2009 - 4:15 pm

In SPSS, the variables are treated as continuous. If you treat them as continuous in Mplus, you will obtain residuals also. In Mplus, treating the variables as categorical with maximum likelihood estimation requires numerical integration. Sample statistics are not sufficient for model estimation. The raw data are used.

Jaime Derringer posted on Thursday, February 25, 2010 - 4:51 pm

I'm trying to produce a correlation matrix for a large number of model variables, some of which are continuous (factor scores) and some of which are ordinal (items not included in the factor scores). When I run Analysis: TYPE=BASIC with both continuous and categorical variables included, Mplus outputs only the continuous variables in the resulting correlation matrix.

Is there a way to produce a correlation matrix that includes both continuous and categorical variables?

Linda K. Muthen posted on Thursday, February 25, 2010 - 5:05 pm

I just did a TYPE=BASIC with continuous and categorical outcomes and I get a correlation matrix containing all variables. Please send your output and license number to support@statmodel.com.

Mohamed Abou-Shouk posted on Monday, April 25, 2011 - 11:55 am

Hello,
Doing EFA in SPSS lets us suppress small loadings, below .40 according to some references. What is the equivalent value in Mplus?
In other words, what is the cut-off value for retaining or excluding factors in EFA using Mplus?

One more question: do I have to run the EFA in Mplus, or can I rely on the factors that SPSS produced, especially since they show a good fit when I enter them in a CFA in Mplus?

Thanks,

Bengt O. Muthen posted on Monday, April 25, 2011 - 5:58 pm

Mplus gives you estimated standard errors in order to decide which loadings are significant or ignorable. No arbitrary cut off is necessary.
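
To make the standard-error approach concrete, here is a short Python sketch of the usual z-test for a single loading (estimate divided by its standard error, with a two-sided normal p-value). The estimates and standard errors are invented for illustration and do not come from any Mplus output.

```python
import math


def loading_z_test(estimate, standard_error):
    """Return the z statistic and two-sided normal p-value for a loading."""
    z = estimate / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical loading above the .40 rule of thumb, but imprecisely estimated.
large_but_noisy = loading_z_test(0.45, 0.30)

# Hypothetical loading below .40, but precisely estimated.
small_but_precise = loading_z_test(0.35, 0.08)

print(large_but_noisy)
print(small_but_precise)
```

In this made-up case the .40 cut-off and the significance test disagree in both directions, which is the sense in which a fixed cut-off is arbitrary.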

I would do the EFA in Mplus for the reason above and also because Mplus allows Geomin rotation and modification indices.

See also the recent article on ESEM:

Asparouhov, T. & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397-438.

which you find on our web site.

Mohamed Abou-Shouk posted on Tuesday, April 26, 2011 - 10:26 am

Hi,
I have one binary dependent variable (yes/no) and 2 independent latent variables (continuous, as they are measured on 1-5 Likert scales). I am trying to measure the impact of those two independent variables on the dependent variable.

I have written the syntax as:

data: file is....
variable: names are x1-x39 u1;
categorical is u1;

model:
f1 by x1-x4;
f2 by x5-x9;
f3 by x10-x15;
f4 by x16-x21;

f5 by x22-x25;
f6 by x26-x30;
f7 by x31-x36;
f8 by x37-x39;

f9 by f1-f4;
f10 by f5-f8;

u1 on f9 f10;

Is this syntax correct for measuring the above relationship, or do I need to add anything else?

The model fit results are:
RMSEA = 0.019, CFI = 0.915, TLI = 0.909, WRMR = 0.761
Do these fit indices show a good model fit?

What other fit indices do I have to calculate?

Many thanks indeed,

Linda K. Muthen posted on Tuesday, April 26, 2011 - 10:58 am

Your model setup looks correct. The best way to know if you get what you want is to estimate the model and see which parameters are estimated or look at TECH1. All available fit statistics are given by default.

Mohamed Abou-Shouk posted on Tuesday, April 26, 2011 - 11:28 am

Do the model results in this case target u1 = 1 or u1 = 0?
Thanks

Linda K. Muthen posted on Tuesday, April 26, 2011 - 1:56 pm

One.

EFried posted on Friday, December 07, 2012 - 1:08 pm

Quick question:

When running EFA with categorical data, I am interested in the factor loadings that are generally reported in papers. I am confused as to why neither the "geomin rotated loadings" nor the "factor structure" values add up to 1 across factors. I am used to that kind of output from publications and other programs, in which an item has 1 point of variance to "give" that is distributed across factors.

Am I misinterpreting the MPLUS output?

Thank you

Bengt O. Muthen posted on Sunday, December 09, 2012 - 5:45 pm

Adding up to 1 is a feature of principal component analysis, which in the early days was used to estimate a factor model. In, say, ML EFA there is no such scaling to 1. PCA focuses on explaining variance, whereas factor analysis focuses on explaining correlations.
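
As a hedged illustration of why loadings in a factor model need not add up to 1, this Python sketch computes an item's communality from its loadings and the factor correlations, using the standard common factor decomposition: item variance = communality + residual variance. The loadings and the factor correlation of 0.4 are hypothetical, and the item variance is assumed to be standardized to 1.

```python
def communality(loadings, factor_corr):
    """Variance of a standardized item explained by correlated factors.

    Computed as the sum over factors j, k of
    loading_j * loading_k * factor_corr[j][k].
    """
    total = 0.0
    for j, lam_j in enumerate(loadings):
        for k, lam_k in enumerate(loadings):
            total += lam_j * lam_k * factor_corr[j][k]
    return total


# Hypothetical item loading on two factors after an oblique rotation.
item_loadings = [0.6, 0.3]
oblique = [[1.0, 0.4], [0.4, 1.0]]
orthogonal = [[1.0, 0.0], [0.0, 1.0]]

explained_oblique = communality(item_loadings, oblique)
explained_orthogonal = communality(item_loadings, orthogonal)
residual_variance = 1.0 - explained_oblique

print(sum(item_loadings), explained_oblique, explained_orthogonal,
      residual_variance)
```

Here neither the loadings nor their squares add up to 1. The remainder is the item's residual (unique) variance, which the factor model leaves unexplained by design.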

EFried posted on Monday, December 10, 2012 - 8:13 am

Thank you Bengt.

Noa Cohen posted on Saturday, May 16, 2015 - 10:23 am

I am conducting a CFA with ordinal (likert) data. I am wondering whether to treat them as categorical or continuous.
Which estimator would you suggest, and other than the estimator, what changes in the analysis once you state the variables as "categorical" rather than continuous?
Thank you.

Linda K. Muthen posted on Saturday, May 16, 2015 - 1:51 pm

If the Likert variables have floor or ceiling effects, piling up at the lowest or highest category, I would treat them as categorical by using the CATEGORICAL option. The default estimator is WLSMV. Maximum likelihood and Bayes are also available.

Ali posted on Tuesday, April 10, 2018 - 11:45 am

I am conducting an EFA with mixed scales. Three items are Likert-scale and two items are binary. Should I use the WLSMV estimator to run the EFA while treating the Likert-scale and binary items as categorical?

Also, is it possible to obtain factor scores using Likert-scale and binary items in a CFA model?

Bengt O. Muthen posted on Tuesday, April 10, 2018 - 3:38 pm

Yes on both.