Two-level exploratory factor analysis
 Feng Liu posted on Wednesday, September 16, 2015 - 3:29 pm
Hi,

I am running a two-level EFA with categorical data (30 items) with students nested in teachers. I used the following commands to investigate the internal structure:
ANALYSIS: TYPE = TWOLEVEL EFA 1 5 UW 1 1 UB;
SAVEDATA:
SIGBETWEEN IS Two Level EFA2 sigma.dat;
SAMPLE IS Two Level EFA2.dat;

The design team theorized that the factors should all be at the student level, and they do not care much about the teacher-level factor(s). Therefore, I used 1 1 UB. However, the project team would like to account for the between-level variance when looking at the results. I can check the goodness-of-fit indices, factor loading matrix, factor correlation matrix, and factor structure matrix for model comparison. However, I am not sure how to get the proportion of variance/covariance explained by the two levels, which is what the team wants. According to the Mplus User's Guide, Version 7 (p. 749), the command SAMPLE IS Two Level EFA2.dat should generate the sample correlation and covariance matrices. However, when I opened the file in Notepad I saw a 93*5 matrix without any title or explanation. I would really appreciate it if you could help me understand the matrix (e.g., why it is a 93*5 matrix and what it contains). Any suggestions on investigating the internal structure and the proportion of variance/covariance explained by the different levels would also be greatly appreciated. Thanks so much!

Best regards,
 Feng Liu posted on Wednesday, September 16, 2015 - 3:37 pm
The results also show that 3 eigenvalues of the within-level sample correlation matrix are larger than 1, and 4 eigenvalues of the between-level matrix are larger than 1. If I go with the rule of thumb of keeping factors with eigenvalues larger than 1, does that mean there should be 3 within factors and 4 between factors? That does not align with the design team's theory. Thanks in advance!
 Bengt O. Muthen posted on Wednesday, September 16, 2015 - 5:21 pm
I would not go by eigenvalues.

Regarding variance contributed by the two levels, divide the between variance by the total variance, either from the sample values or from the estimated model. Note that these variances differ by item.
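Written out as a formula, the ratio described above is computed separately for each item j. The symbols are notation introduced here for illustration, not Mplus output labels, and the total variance is assumed to be the sum of the between and within variances:

```latex
\mathrm{ICC}_j \;=\; \frac{\sigma^2_{B,j}}{\sigma^2_{B,j} + \sigma^2_{W,j}}
```

As a purely hypothetical illustration, an item with a between variance of 0.2 and a within variance of 0.8 would have 0.2 / (0.2 + 0.8) = 0.20, that is, 20% of its variance at the teacher level.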
 Feng Liu posted on Wednesday, September 16, 2015 - 7:55 pm
Thanks so much for the quick response, Dr. Muthen. I agree that, as you suggested, the intraclass correlation coefficient (ICC) can be derived by dividing the between variance by the total variance. When you say "from the estimated model", I assume you mean dividing the ESTIMATED RESIDUAL VARIANCES from the BETWEEN LEVEL RESULTS by (ESTIMATED RESIDUAL VARIANCES from the BETWEEN LEVEL RESULTS + ESTIMATED RESIDUAL VARIANCES from the WITHIN LEVEL RESULTS) for each item. Please let me know if I am mistaken about anything.

If, as you said, I should not go by eigenvalues to determine the number of factors, do you have any suggestions on how to decide which model fits the data better? Do I just use the goodness-of-fit indices for that purpose? My project team also wants to use a one-level EFA to test the internal structure. Do you think I should compare all the two-level and one-level EFA models using the fit indices for model selection? Thanks in advance.

Best regards,
 Bengt O. Muthen posted on Thursday, September 17, 2015 - 8:55 am
No, you want to look at the Estimated variance, not just the residual variance. You find the Estimated variance in the RESIDUAL section of the output.

To determine the number of factors you should go by the fit indices.
 Feng Liu posted on Thursday, September 17, 2015 - 9:29 am
Thanks so much for the response, Dr. Muthen. I assume the default output does not include the Estimated variance, so I need to use the OUTPUT command to request it. Should I use OUTPUT: SAMPSTAT for this purpose? Please forgive my limited knowledge of this. Thanks so much!

Best regards,
 Feng Liu posted on Thursday, September 17, 2015 - 11:13 am
Another question I have is about model selection. When I checked the model fit indices, I found that the one-level 5-factor model and the two-level model with 2 within factors and 3 between factors have similar fit indices, and both meet the goodness-of-fit criteria (i.e., TLI and CFI > .95, RMSEA < .06, SRMR < .05). Should I go with the one-level 5-factor model or the two-level model with 2 within factors and 3 between factors? Or do I need to check the estimated variances, either from the sample values (I assume I would use HLM to test whether the ICC is significant) or from the estimated two-level model, for model selection? Sorry to have so many questions. I really appreciate your help!
 Bengt O. Muthen posted on Thursday, September 17, 2015 - 6:17 pm
First post:

use

OUTPUT: RESIDUAL;
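
A minimal sketch of where that option might sit in the two-level EFA input from the start of this thread. The data file name is a placeholder, and the variable names, cluster variable, and missing-value code are assumptions borrowed from the input posted elsewhere in this thread, not a confirmed setup:

```
TITLE:    two-level EFA with ordinal indicators (sketch)
DATA:     FILE IS mydata.csv;        ! placeholder file name
VARIABLE: NAMES ARE y1-y30 clus;     ! assumed variable layout
          USEVARIABLES = y1-y30;
          CATEGORICAL ARE y1-y30;
          CLUSTER = clus;
          MISSING ARE ALL (-9);      ! assumed missing-value code
ANALYSIS: TYPE = TWOLEVEL EFA 1 5 UW 1 1 UB;
OUTPUT:   RESIDUAL;                  ! requests the RESIDUAL section
```

The only change from the original ANALYSIS setup is the added OUTPUT command; the estimated variances for each level would then be read from the RESIDUAL section of the output.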
 Bengt O. Muthen posted on Thursday, September 17, 2015 - 6:20 pm
Second post:

As a first step you should check if you need 2-level modeling or 1-level modeling. Do 2-level BASIC and see if the level 2 variances are significant or not.

For the 2-level modeling it is rare to need more factors on between than on within.
 Feng Liu posted on Thursday, September 17, 2015 - 8:07 pm
Thanks for the guidance on the model selection (e.g., whether 2-level or 1-level model), Dr. Muthen. I assume when you mentioned "Do 2-level BASIC" you meant to run the two level EFA with the command
ANALYSIS: TYPE = TWOLEVEL EFA 1 5 UW 1 3 UB;
The numbers of within and between factors can be flexible (e.g., 1 5 UW 1 1 UB or 1 5 UW 1 3 UB), but I would need to test the significance of the level 2 variances for each model. Or is there one specific 2-level BASIC model? Thanks!

Best regards,
 Linda K. Muthen posted on Friday, September 18, 2015 - 6:34 am
Use TYPE = TWOLEVEL BASIC; and no MODEL command to get descriptive statistics. Look at the variances for the between level.
 Feng Liu posted on Friday, September 18, 2015 - 12:18 pm
Thanks so much for the guidance, Dr. Muthen. As you suggested, I used the following code to check the variances at the two levels:

TITLE: this is an example of a two-level
exploratory factor analysis with
ordinal factor indicators
DATA: FILE IS KEIForAnalysis_Fall2014 for EFA.CSV;
VARIABLE: NAMES ARE y1-y30 clus;
USEVARIABLES = y1-y30;
CATEGORICAL ARE y1-y30;
CLUSTER = clus;
missing are all (-9);
ANALYSIS: TYPE = TWOLEVEL BASIC;
output:
RESIDUAL;

All the items (y1-y30) are ordered categorical (ordinal). Originally, I used the command
output:RESIDUAL sampstat; and got the warning message:
SAMPSTAT option is not available when all outcomes are censored, ordered
categorical, unordered categorical (nominal), count or continuous-time
survival variables. Request for SAMPSTAT is ignored.
Therefore, I removed sampstat. In the output, only two sections contain information: SUMMARY OF DATA and UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES. I could not find the variances for the two levels. Would you please let me know what I did wrong and how to correct it? Thanks so much!

Best regards,
 Bengt O. Muthen posted on Friday, September 18, 2015 - 6:03 pm
Please send your input, output, data, and license number to support so we can see exactly what you are doing.
 Leonhard A. Bakker posted on Thursday, July 07, 2016 - 3:23 am
Dear Bengt and Linda,

What is the meaning and interpretation of unrestricted covariance in multilevel exploratory factor analysis?

Kind regards, Leonhard
 Bengt O. Muthen posted on Thursday, July 07, 2016 - 9:39 am
The within and between covariance matrices are both unrestricted, that is, "saturated".