Mplus Discussion > Multilevel Data/Complex Sample > Bootstrap complex sample data

Bing T. Chen posted on Sunday, September 30, 2018 - 12:28 pm

Dear Dr. Muthen,
I used the code for Example 13.19 in the Mplus User's Guide to generate and save bootstrap data sets, but I am not sure about the VARIABLE command:

VARIABLE: NAMES ARE y1-y4 weight strat psu;
WEIGHT = weight;
STRATIFICATION = strat;
CLUSTER = psu;

My project uses the ECLS-K:2011, which has a three-stage sampling design: students nested in schools, and schools nested in PSUs (the PSUs are drawn from strata). In this case, which variable goes on STRATIFICATION = and which goes on CLUSTER = ?

Thank you.

Respectfully,

Bing

Tihomir Asparouhov posted on Monday, October 01, 2018 - 10:02 am

The PSU variable usually goes on the CLUSTER option. STRATIFICATION = should name the variable that indicates which PSUs come from the same stratum; it is a separate variable.
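As a rough illustration of that division of roles, here is a minimal Mplus input sketch for a design in which PSUs are sampled within strata. The file and variable names (mydata.dat, sampwt, stratum, psuid, rweights.dat) and the number of replicates are hypothetical placeholders, and the ANALYSIS and SAVEDATA lines follow the general pattern of generating and saving bootstrap replicate weights under TYPE = COMPLEX; each option should be checked against the Mplus User's Guide before use.

```mplus
! Sketch only: all file and variable names are hypothetical placeholders
DATA:     FILE IS mydata.dat;
VARIABLE: NAMES ARE y1-y4 sampwt stratum psuid;
          USEVARIABLES ARE y1-y4;
          WEIGHT = sampwt;           ! final sampling weight
          STRATIFICATION = stratum;  ! stratum each PSU was drawn from
          CLUSTER = psuid;           ! first-stage sampling unit (PSU)
ANALYSIS: TYPE = COMPLEX;
          REPSE = BOOTSTRAP;         ! generate bootstrap replicate weights
          BOOTSTRAP = 100;           ! number of replicates (illustrative)
MODEL:    f BY y1-y4;
SAVEDATA: FILE IS rweights.dat;
          REPWEIGHTS;                ! save the generated replicate weights
```

The key point is that the stratum and the PSU are two different variables: the stratum groups PSUs, and the PSU is the unit that gets resampled.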

Bing T. Chen posted on Monday, October 01, 2018 - 10:31 am

Thank you, Dr. Asparouhov.

The data set I am using is the ECLS-K:2011. I used the base-year data because school-level and student-level sampling weights are provided only for the base year. The User’s Manual for the ECLS-K:2011 Kindergarten Data File and Electronic Codebook, Public Version, says:

The sample for the ECLS-K:2011 was selected using a three-stage process. In the first stage of sampling, the country was divided into primary sampling units (PSUs), or geographic areas and 90 PSUs were sampled for inclusion in the study. In the second stage, samples of public and private schools were selected within the sampled PSUs. In the third stage of sampling, children enrolled in kindergarten and 5-year-old children in ungraded schools or classrooms were selected within each sampled school.

Based on this paragraph, I think SCHOOLID goes on the CLUSTER option. Am I right? If so, what goes on the STRATIFICATION option? PSU?

I am looking forward to your reply.

Thank you.

Best,

Bing

Tihomir Asparouhov posted on Monday, October 01, 2018 - 11:27 am

The PSU always goes on the CLUSTER option. There is no stratification in this sampling scheme. Second-stage sampling information is not included and generally has little influence. If you want, you can look into two-level modeling, where you can use cluster = PSU SCHOOLID.

Bing T. Chen posted on Tuesday, October 02, 2018 - 5:19 pm

Dear Dr. Asparouhov,
Thank you for your reply.
I am bootstrapping complex samples. If I use CLUSTER = PSU SCHOOLID, there are two cluster variables, so I have to use either TYPE = COMPLEX TWOLEVEL or TYPE = TWOLEVEL. But BOOTSTRAP is not allowed with TYPE = TWOLEVEL.

In this case, how can I do the bootstrapping?

Thank you.
Best,

Bing

Tihomir Asparouhov posted on Wednesday, October 03, 2018 - 7:54 am

We don't have bootstrap for two-level. You can use MLR if you decide to do a two-level model.
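For the two-level route, a minimal sketch of the setup might look like the following. It assumes hypothetical variables psuid and schoolid, a within-level (child-given-school) weight childwt, and a between-level (school) weight schwt; the order of the two CLUSTER variables (PSU first, then school) follows the suggestion earlier in this thread. There is no BOOTSTRAP option because, as stated above, bootstrap is not available for two-level models, so standard errors come from MLR instead.

```mplus
! Sketch only: variable names are hypothetical placeholders
DATA:     FILE IS mydata.dat;
VARIABLE: NAMES ARE y1-y4 psuid schoolid childwt schwt;
          USEVARIABLES ARE y1-y4;
          CLUSTER = psuid schoolid;  ! PSU (SE adjustment), then school (level 2)
          WEIGHT = childwt;          ! within-level weight (child within school)
          BWEIGHT = schwt;           ! between-level weight (school)
ANALYSIS: TYPE = COMPLEX TWOLEVEL;
          ESTIMATOR = MLR;           ! robust standard errors; no bootstrap
MODEL:    %WITHIN%
          fw BY y1-y4;
          %BETWEEN%
          fb BY y1-y4;
```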

Bing T. Chen posted on Wednesday, October 03, 2018 - 9:03 pm

Hi Dr. Asparouhov,
Thank you for your reply.
I double-checked the ECLS-K:2011 user's manual and found that there are no PSU or stratum identification variables. There are variables for child ID, school ID, parent ID, teacher ID, child care provider ID, and even twin ID.
In this case, can I still bootstrap the complex data set? If so, how?

Best,

Bing

Tihomir Asparouhov posted on Thursday, October 04, 2018 - 8:16 am

The quote you used above describing the sample states that there are PSUs.

If you don't have a PSU variable, you can use child ID as the PSU (i.e., as the cluster variable in Mplus). That assumes the observations are independent, but you will be able to use the weight and the bootstrap.
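A minimal sketch of that workaround, assuming a numeric child identifier childid and a child-level weight childwt (both hypothetical names; Mplus reads only numeric data, so an alphanumeric ID would need to be recoded first). With one child per cluster, the clustering adds nothing beyond treating observations as independent, while the weight and the bootstrap replicates are still used.

```mplus
! Sketch only: each child is its own "PSU" (independence assumed)
DATA:     FILE IS mydata.dat;
VARIABLE: NAMES ARE y1-y4 childid childwt;
          USEVARIABLES ARE y1-y4;
          CLUSTER = childid;   ! one observation per cluster
          WEIGHT = childwt;    ! child-level sampling weight
ANALYSIS: TYPE = COMPLEX;
          REPSE = BOOTSTRAP;   ! bootstrap replicate weights
          BOOTSTRAP = 500;     ! number of replicates (illustrative)
MODEL:    f BY y1-y4;
```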

Bing T. Chen posted on Thursday, October 04, 2018 - 8:53 am

Dear Dr. Asparouhov,
Thank you for your reply.
What you said is true. According to the description of the sampling design, it is a three-stage process: first PSUs, then schools, and finally children. The problem is that I cannot find PSU and stratum identifiers. All there is are the weights for PSU and stratum, e.g., W1C0PSU and W1C0STR.

But there is a school ID in the data set, so school is the cluster. In this case, how can I do the bootstrapping?

Thank you.

Best,

Bing
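One possibility worth checking: in the ECLS-K:2011, names such as W1C0STR and W1C0PSU may be the stratum and PSU identifiers that accompany the base-year child weight W1C0 for Taylor-series variance estimation, rather than weights. That is an assumption here and should be verified in the ECLS-K:2011 codebook. If it holds, they could go directly on STRATIFICATION and CLUSTER, as in this sketch (y1-y4 and the file name are placeholders).

```mplus
! Sketch only, ASSUMING w1c0str / w1c0psu are stratum and PSU identifiers
! (not weights) for the base-year child weight w1c0; verify in the codebook
DATA:     FILE IS ecls_base.dat;
VARIABLE: NAMES ARE y1-y4 w1c0 w1c0str w1c0psu;
          USEVARIABLES ARE y1-y4;
          WEIGHT = w1c0;             ! base-year child weight
          STRATIFICATION = w1c0str;  ! stratum identifier (assumed)
          CLUSTER = w1c0psu;         ! PSU identifier (assumed); if PSU codes
                                     ! repeat across strata, recoding them to
                                     ! unique IDs avoids any ambiguity
ANALYSIS: TYPE = COMPLEX;
          REPSE = BOOTSTRAP;
          BOOTSTRAP = 500;           ! number of replicates (illustrative)
MODEL:    f BY y1-y4;
```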

Tihomir Asparouhov posted on Thursday, October 04, 2018 - 10:34 am

Use school as the PSU; that way you can still account for some of the non-independence within schools. So use cluster = school, and the weight should probably be the product of the two weights.
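A minimal sketch of that suggestion. It assumes the product of the school-level and child-level weights has already been computed outside Mplus and stored in the data file under the hypothetical name prodwt, and that schoolid is a numeric school identifier; the file names and replicate count are placeholders.

```mplus
! Sketch only: school treated as the PSU; names are hypothetical
DATA:     FILE IS mydata.dat;
VARIABLE: NAMES ARE y1-y4 schoolid prodwt;
          USEVARIABLES ARE y1-y4;
          CLUSTER = schoolid;  ! school plays the role of the PSU
          WEIGHT = prodwt;     ! school weight times child weight, precomputed
ANALYSIS: TYPE = COMPLEX;
          REPSE = BOOTSTRAP;   ! replicates are formed at the cluster level
          BOOTSTRAP = 500;     ! number of replicates (illustrative)
MODEL:    f BY y1-y4;
SAVEDATA: FILE IS rweights.dat;
          REPWEIGHTS;          ! save the replicate weights for later use
```

Because the cluster is the school, whole schools rather than individual children are the resampling units, which is how the within-school dependence is partly retained.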

Bing T. Chen posted on Thursday, October 04, 2018 - 12:15 pm

Dear Dr. Asparouhov,
Thank you for your patience.

So you are saying to use school as the PSU, i.e., on the CLUSTER option (school is treated as both the PSU and the cluster, if I understand correctly).
The data set I am using has separate school-level and child-level sampling weights.
My plan is first to generate bootstrap replicates that follow the structure of the empirical data set (ECLS-K:2011), and then to analyze both the empirical data set and the replicate data sets with various estimation methods.

My data set has two levels: school and child. When I bootstrap the data set, I cannot use TYPE = TWOLEVEL with BOOTSTRAP. But I cannot use TYPE = COMPLEX either, since I have a two-level data set and two levels of weights. In this case, I don't know how to bootstrap the data sets. I would appreciate your ideas.

Thank you again.

Best,

Bing

Tihomir Asparouhov posted on Friday, October 05, 2018 - 3:00 pm

Two-level bootstrap is not available in Mplus. You can use type=complex.