I am using ECLS-K:2011 in my project, which has a three-stage sampling design: students nested in schools, and schools nested in PSUs (PSUs are chosen from strata). In this case, what goes in stratification= and what goes in cluster=?
I am using the base-year data because school-level and student-level sampling weights are provided only for the base year. The User’s Manual for the ECLS-K:2011 Kindergarten Data File and Electronic Codebook, Public Version, says:
The sample for the ECLS-K:2011 was selected using a three-stage process. In the first stage of sampling, the country was divided into primary sampling units (PSUs), or geographic areas, and 90 PSUs were sampled for inclusion in the study. In the second stage, samples of public and private schools were selected within the sampled PSUs. In the third stage of sampling, children enrolled in kindergarten and 5-year-old children in ungraded schools or classrooms were selected within each sampled school.
Based on this paragraph, I think SCHOOLID goes on the cluster command. Am I right? If so, what goes on the stratification command? PSU?
PSU is always on the cluster command. There is no stratification in this sampling scheme. Second-stage sampling information is not included and generally has little influence. If you want, you can look into two-level modeling, where you can use cluster=PSU SCHOOLID.
Dear Dr. Asparouhov, Thank you for your reply. I am bootstrapping complex samples. If I use cluster=PSU SCHOOLID, there are two cluster variables, so I have to use type=complex twolevel, or simply type=twolevel. But BOOTSTRAP is not allowed with TYPE=TWOLEVEL.
We don't have bootstrap for two-level. You can use MLR if you decide to do a two-level model.
Bing T. Chen posted on Wednesday, October 03, 2018 - 9:03 pm
Hi Dr. Asparouhov, Thank you for your reply. I double-checked the ECLS-K:2011 user's manual and found that there is no PSU or stratum identification variable. There are variables for child ID, school ID, parent ID, teacher ID, child care provider ID, and even twin ID. In this case, can I bootstrap a complex data set? If so, how do I do that? Thank you.
Dear Dr. Asparouhov, Thank you for your reply. What you said is true. According to the description of the sampling design, it is a three-stage process: first PSUs, then schools, and finally children. The problem is that I cannot find PSU and stratum identifiers. All I can find are the weights for PSU and stratum, e.g., W1C0PSU and W1C0STR.
But there is school ID in the data set. So school is the cluster in the data set. In this case, how can I do bootstrapping?
Use school as the PSU. You can still account for some of the non-independence within schools that way, so use cluster=school. The weight should probably be the product of the two weights.
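To make the "product of the two weights" suggestion concrete, here is a minimal Python sketch. It is only an illustration on made-up numbers: the field names `school_id`, `w_school`, and `w_child` are hypothetical placeholders, not ECLS-K:2011 variable names. It also assumes the child weight is conditional on the school having been selected; if a child weight already reflects school selection, multiplying again would double-count, so the data file documentation should be checked first.

```python
# Hypothetical toy records: field names and values are placeholders,
# not variables or weights from the ECLS-K:2011 data file.
children = [
    {"school_id": 1, "w_school": 10.0, "w_child": 2.0},
    {"school_id": 1, "w_school": 10.0, "w_child": 2.0},
    {"school_id": 2, "w_school": 4.0, "w_child": 3.0},
    {"school_id": 2, "w_school": 4.0, "w_child": 3.0},
    {"school_id": 2, "w_school": 4.0, "w_child": 3.0},
]


def combined_weight(row):
    """Overall weight as school-level weight times within-school child weight.

    Assumption: w_child is conditional on the school being sampled.
    """
    return row["w_school"] * row["w_child"]


# Attach the single combined weight to every child record.
for row in children:
    row["w_combined"] = combined_weight(row)

# Sum of combined weights: the number of children the sample represents.
estimated_population = sum(row["w_combined"] for row in children)
print(estimated_population)
```

The combined column is what would then be supplied as the single sampling weight in a single-level analysis that clusters on school.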
Bing T. Chen posted on Thursday, October 04, 2018 - 12:15 pm
Dear Dr. Asparouhov, Thank you for your patience.
So your suggestion is to use school as the PSU on the cluster command (school is treated as PSU and cluster at the same time, if I understand correctly). The sampling weights in the data set I am using are school-level weights and child-level weights. My idea is first to bootstrap the complex data set so that the replicates keep the structure of the empirical data set (ECLS-K:2011), and then to analyze the empirical data set and the replicate data sets with various estimation methods.
My data set has two levels, the school level and the child level. When I bootstrap the data set, I cannot use type=twolevel with bootstrap. But I cannot use type=complex either, since I have two-level data and weights at both levels. In this case, I don't know how to bootstrap the data sets. I would be grateful for your ideas.
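For readers following the idea of generating replicate data sets that keep the school/child structure, one common way to do this outside Mplus is a cluster bootstrap: draw whole schools with replacement and carry every child in a drawn school along. The Python sketch below is an illustration under stated assumptions, not what Mplus does internally. The function name `cluster_bootstrap`, the field names, and the toy data are all hypothetical, and the sketch leaves any sampling weights on the records unchanged rather than rescaling them.

```python
import random
from collections import defaultdict


def cluster_bootstrap(rows, cluster_key, rng):
    """Return one replicate by resampling whole clusters with replacement."""
    # Group the child records by their cluster (here: school).
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row[cluster_key]].append(row)

    cluster_ids = sorted(by_cluster)
    # Draw as many clusters as the original sample had, with replacement.
    drawn = [rng.choice(cluster_ids) for _ in cluster_ids]

    replicate = []
    for new_id, old_id in enumerate(drawn, start=1):
        for row in by_cluster[old_id]:
            copy = dict(row)
            # Relabel so a school drawn twice counts as two distinct clusters.
            copy["boot_cluster"] = new_id
            replicate.append(copy)
    return replicate


# Hypothetical toy data: three schools with 2, 3, and 1 children.
rows = [
    {"school_id": 1, "child_id": "a"},
    {"school_id": 1, "child_id": "b"},
    {"school_id": 2, "child_id": "c"},
    {"school_id": 2, "child_id": "d"},
    {"school_id": 2, "child_id": "e"},
    {"school_id": 3, "child_id": "f"},
]

rng = random.Random(2018)  # fixed seed so the replicates are reproducible
replicates = [cluster_bootstrap(rows, "school_id", rng) for _ in range(3)]
```

Each replicate can be written to its own file and analyzed with `boot_cluster` as the cluster variable, so that a school drawn more than once is not merged into a single cluster.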