I am attempting to conduct an EFA, subsequently estimate several first- and second-order CFA models, compare these models in terms of fit (with absolute fit statistics, nested and non-nested comparisons) across subsamples (using multiple group comparisons), and estimate factor scores. Most of these procedures I can already do in Mplus, but I am now also dealing with a complex sample with a sample weight, a strata variable and a cluster variable. The strata variable has 42 values, and there are 84 clusters. In the data set, however, the cluster variable has only two values (i.e., 2 cluster values within each of 42 strata = 84 clusters).
I am very new to the area of complex sample design. First, are all of the analyses I proposed above feasible with a complex sample using Mplus? Second, how do I adapt my EFA/CFA models to adjust for the sample design (note: I am only looking to adjust for the sample design, not obtain detailed information about the strata or clusters)?
So first, for the EFA, do I simply apply the sample weight (and not use the cluster and strata information)? Are there any other issues I need to keep in mind for this analysis when using a complex sample?
Second, how do I adjust my CFA syntax to properly model the weight, cluster, and strata variables (especially the latter two)? Following is an example of my syntax (excluding data and title):
VARIABLE:
  NAMES ARE hypins fatigue retard agit conc indeci death watedec wateinc
    insom anhed guilt lworth suic;
  USEVARIABLES ARE hypins fatigue retard agit conc indeci death watedec
    wateinc insom anhed guilt lworth suic;
  CATEGORICAL ARE hypins-suic;
ANALYSIS:
  TYPE IS GENERAL;
  ESTIMATOR IS WLSMV;
  ITERATIONS = 10000;
  CONVERGENCE = 0.00005;
MODEL:
  f1 BY fatigue retard agit conc indeci watedec wateinc insom anhed hypins;
  f2 BY death guilt lworth suic;
I looked in the manual and was confused regarding how to properly model both the cluster and strata variables simultaneously with my sample. And I am assuming that with the sample weight, I can apply that to any analysis as long as I specify the weight command. Any help would be greatly appreciated.
Your proposed analyses certainly seem feasible in Mplus. The complex survey data syntax simply amounts to adding the options
weight = swght; strat = stratum; cluster = psu;
in the Variable command and using Type = Complex in the Analysis command to get the correct SEs and chi-square.
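As a rough sketch, the options above could be combined with the CFA input posted earlier as follows. The names swght, stratum and psu are the placeholder names used in this reply, and their position at the end of the NAMES list is an assumption about the data file; the DATA and TITLE commands are omitted, as in the original post.

```
VARIABLE:
  ! Assumption: swght, stratum and psu are the last three columns
  ! of the data file; adjust NAMES to match the actual file layout.
  NAMES ARE hypins fatigue retard agit conc indeci death watedec wateinc
    insom anhed guilt lworth suic swght stratum psu;
  USEVARIABLES ARE hypins fatigue retard agit conc indeci death watedec
    wateinc insom anhed guilt lworth suic;
  CATEGORICAL ARE hypins-suic;
  WEIGHT IS swght;            ! sampling weight
  STRATIFICATION IS stratum;  ! strata variable
  CLUSTER IS psu;             ! cluster (PSU) variable
ANALYSIS:
  TYPE IS COMPLEX;            ! design-adjusted SEs and chi-square
  ESTIMATOR IS WLSMV;
MODEL:
  f1 BY fatigue retard agit conc indeci watedec wateinc insom anhed hypins;
  f2 BY death guilt lworth suic;
```

The design variables appear in NAMES but not in USEVARIABLES, since they are not analysis variables in the model.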
For EFA, only the weight option is available and Type = Complex is not available. But the cluster and strata information is less important in EFA since no SEs are given and you don't really need the chi-square test of fit.
You can read more about the techniques in Mplus Web note #7, forthcoming in the SEM journal. Acknowledging stratification can give important reductions in the SEs.
If I apply the sampling weight to the CFA models I described, I assume that the model parameters will be adjusted according to these weights. When I create factor scores from these "weighted" models, is the sampling weight involved in any additional aspect of factor score estimation, or are the factor scores simply derived from a model that has been estimated from weighted data? Essentially, I plan to use the factor scores from the CFA models in subsequent linear and logistic regression models (in which the sample weight will also be applied) and want to be sure that I am weighting the data properly (i.e., not "overweighting," if such a thing is possible).
Also, are there any portions of the EFA output that WILL be affected by not including the cluster and strata information?
I have tried estimating the CFA models I spoke of earlier and am getting the following error:
*** ERROR Each stratum must contain unique cluster IDs. Clusters are not nested within strata.
As I mentioned on 6/3, "The strata variable has 42 values, and there are 84 clusters. In the data set however, the cluster variable has only two values (i.e., 2 cluster values [1 & 2] within 42 strata = 84 clusters)."
Is there any way I can work around the fact that my cluster variable is nested within my strata variable (aside from recoding the cluster variable)? For example, is there a way to specify in my syntax that cluster is nested within strata? If there is no other way, would it make sense to recode the cluster variable as follows: within stratum #1, cluster 1 = 1 and 2 = 2; within stratum #2, cluster 1 = 3 and 2 = 4; within stratum #3, cluster 1 = 5 and 2 = 6; etc.? Will this strategy handle the complex data appropriately (again, if such recoding is necessary)?
bmuthen posted on Wednesday, June 15, 2005 - 7:46 am
I would like to explore the dimensionality of a new scale on collective efficacy with a set of hierarchical, non-independent data. Therefore I have a question about the following answer you posted about the use of EFA on clustered data: "The only EFA output affected by not including cluster and strata info is the chi-square test of model fit and related fit indices." (June 06, 2005)
I understand that the chi-square statistic is biased, but are RMSEA and RMSR also biased? If so, I don't see how to use the remaining information (factor loadings) to explore the data, since there is no way to determine a reasonable number of factors. Are there any other options for conducting EFA on clustered data?
I would suggest saving the pooled-within sample correlation matrix using the SAVEDATA command and using it as your data for the EFA. The sample size for this analysis would be n minus the number of clusters.
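A minimal sketch of the second step, the EFA on the saved matrix, might look like the input below. Everything specific in it is hypothetical: the file name pw.dat, the variable names y1-y6, the range of 1 to 3 factors, and the sample size, which assumes 1000 observations in 50 clusters (1000 - 50 = 950). It also assumes the file contains a correlation matrix, so the TYPE setting should be changed if a covariance matrix was saved instead.

```
DATA:
  FILE IS pw.dat;         ! hypothetical name of the saved pooled-within matrix
  TYPE IS CORRELATION;    ! summary data, not individual records
  NOBSERVATIONS = 950;    ! hypothetical: n (1000) minus clusters (50)
VARIABLE:
  NAMES ARE y1-y6;        ! hypothetical item names, in the saved order
ANALYSIS:
  TYPE IS EFA 1 3;        ! extract 1 through 3 factors
  ESTIMATOR IS ML;
```

With summary data, the DATA command has to state the matrix type and the number of observations, because neither can be read from the file itself.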
Dear Linda, thanks a lot, this solution works very well.
Marc Reis posted on Thursday, October 13, 2005 - 11:06 am
I would like to do the same as Sandra (EFA on a pooled-within correlation matrix with non-independent data). Would you please recommend the best estimator? With summary data, the choice is between ML and ULS. ML provides the chi-square statistic and RMSEA, so I would prefer it. Does the use of the pooled-within correlation matrix lead to unbiased estimates of both indices? If not, is RMSR trustworthy (with ML and/or ULS)?
I am currently conducting an EFA using NESARC data. I have read that deleting cases with missing data can adversely affect the weighting variable. I am aware of the subpopulation option in Mplus version 4; however, I know that it cannot be used in EFA and that others have resolved this problem by using the USEOBSERVATIONS option. Is it possible to employ the USEOBSERVATIONS option in EFA?
The unrestricted model is the H1 model of unrestricted correlations. If you have that model on one level, tests of fit apply to the model on the other level. See the following technical appendix on the website:
Two-Level Weighted Least Squares Estimation. Proceedings of the Joint Statistical Meeting, August 2007, Biometrics Section
I'm estimating CFA models on clustered data for which only the correlation matrix is available. How do I incorporate/account for the non-independence of observations when I'm using DATA: TYPE IS CORRELATION MEANS STDEVIATIONS?
I would also like to run an EFA (with categorical indicators) on complex data using the suggestion you posted previously (EFA on a pooled-within correlation matrix with non-independent data, October 11th, 2005). I read that you recommended ML as the estimator. However, my data are not normally distributed, so I was considering using MLM or WLSMV. Any recommendations? Thanks in advance,
As a follow-up to my previous post, I tried running the EFA with the pooled-within sample correlation matrix as data (from the SAVEDATA command), but I get the following error: *** ERROR Unexpected end of file reached in data file. I think I am not specifying the source of the data correctly, but I am not sure how to incorporate the correlation matrix and my original data in the syntax. Thanks again, itziar
Dear Dr. Muthen, Thanks for your response. I tried your suggestion and obtained a covariance matrix. However, I have categorical factor indicators. Is this correct? Shouldn't I obtain a correlation matrix instead? Thanks! itziar