Sample weights
 Scott Weaver posted on Wednesday, May 03, 2006 - 1:21 pm
Hello,
I have two questions regarding the use of sampling weights with Mplus:

(1) If I select a subsample of data on some characteristic (e.g., ethnic group), then the sample weights for the subsample will likely not sum to the sample size of the subsample. Is Mplus' rescaling of the weights sufficient, or should I recalculate the sample weights for this subsample based on the probability of selection (or is this what Mplus is doing when it rescales the weights)?

(2) I wish to look at a multiple group model where the grouping variable is cohort (actually aggregated over paired, adjacent cohorts, e.g., 1980&1981, 1981&1982, ...). I have a sample weight variable that is computed based on the probability of selection *within* cohort. Because I want to (a) aggregate adjacent cohorts and (b) then test a multiple group model where these aggregate 2-yr cohorts are the grouping variable, can I use the weight variable calculated within cohort, or will the weights be incorrect for my analysis because of the aggregation and multiple group analysis? I hope that this question is clear :-)

Thanks,
Scott
 Linda K. Muthen posted on Thursday, May 04, 2006 - 8:25 am
1. You can use the new SUBPOPULATION option of the VARIABLE command in this situation. See page 403 of the new user's guide which is on the website.

2. It sounds like you have a sample for which weights were determined and that this sample consists of cohorts and that you want to place more than one cohort in a group. So the data would be as follows:

group cohort
1 1
1 2
2 3
2 4

If this is the case, then I think the weights are fine as is.
 Scott Weaver posted on Saturday, May 13, 2006 - 11:08 am
Thank you Linda for your response. I have one follow-up question.
In order to use sample weights, do I need to specify TYPE=COMPLEX? I had originally only specified the weight variable using the WEIGHT IS option. But then I read on p. 205 of the user's manual, "With sampling weights, parameters are estimated by maximizing a weighted loglikelihood function. Standard error computations use a sandwich estimator. This approach can be obtained by specifying TYPE=COMPLEX in the ANALYSIS command in conjunction with the STRATIFICATION, CLUSTER, and/or WEIGHT options of the VARIABLE command."

So I then tried to run my analysis with TYPE=COMPLEX specified, but received this error message:
*** ERROR in Variable command
TYPE = COMPLEX analysis requires CLUSTER or STRATIFICATION option.

Thank you,
Scott
 Linda K. Muthen posted on Saturday, May 13, 2006 - 2:26 pm
No, weights can be used without TYPE=COMPLEX; COMPLEX requires a CLUSTER or STRATIFICATION variable.
 Matthew Diemer posted on Tuesday, June 06, 2006 - 12:05 pm
I have a more 'finicky' problem with sampling weights. I am using NELS data, and the values for some weights are very large (10531.0592 & 11290.0289, for example). [I'm interested in using the SUBPOPULATION command along with cluster and stratification features.]

My understanding is that the size of these weight values exceeds the column width when the file is saved as a .dat file to be read into MPlus. Inspecting the data file in WordPad reveals that the size of these two weight values causes subsequent values in the row to be pushed to the right (so the value for variable A is read as the value for variable B, for example).

When I try to run analyses in MPlus with the full values for these two weight variables, the data are not 'read in' correctly.

So, I tried shortening the weight values of only these two cases by truncating them to the tenths place (the smallest value that would not cause problems in the .dat file).

However, after doing so, I receive this message:
SUM OF GIVEN OBSERVATION WEIGHTS IS 10128.06
IT DOES NOT AGREE WITH THE NUMBER OF OBSERVATIONS 10123

There might be a simple solution to this problem, but it has escaped me.

Any suggestions?
 Linda K. Muthen posted on Wednesday, June 07, 2006 - 8:25 am
I assume that you are saving the data with a program other than Mplus. You should be able to adjust the format of the saved data to accommodate the width of your weight variable. The error message you report is not something I have seen. Please send your input, data, output, and license number to support@statmodel.com.
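One way to make the saved file accommodate wide weight values is to write every field with a generous, fixed column width. The following is only a minimal Python sketch of that idea, not a description of any particular export tool. The helper `format_row`, the width of 14, and all data values other than the two weights quoted earlier in the thread are assumptions made for illustration.

```python
def format_row(values, width=14, decimals=4):
    """Format one record as fixed-width, right-aligned numeric fields.

    `width` and `decimals` are illustrative choices, not Mplus requirements.
    """
    fields = []
    for v in values:
        text = f"{v:.{decimals}f}"
        # Refuse to write a value that would fill or overflow its column,
        # so neighbouring variables can never run together.
        if len(text) >= width:
            raise ValueError(f"value {text} does not fit in width {width}")
        fields.append(text.rjust(width))
    return "".join(fields)


# Hypothetical records: two ordinary variables followed by the weight.
rows = [
    [1.0, 3.0, 10531.0592],
    [0.0, 5.0, 11290.0289],
    [1.0, 2.0, 0.8731],
]
lines = [format_row(r) for r in rows]
for line in lines:
    print(line)
```

Because each field is guaranteed at least one leading blank, a file written this way should be readable either as whitespace-delimited free format or with a matching fixed format.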
 Matthew Diemer posted on Friday, June 09, 2006 - 8:04 am
I think this problem *might* be because I had wlsmv as my estimator and all dependent variables in the model as continuous. Per p. 342 of MPlus user's guide version 3, this is not an available option.

I had been saving the data in .dat format and inspecting the data in WordPad. I'll look into accommodating the width of the weight variable as well.
 J.W. posted on Wednesday, November 18, 2009 - 12:42 pm
I would like to select a sub-sample (e.g., males only) from a data set for modeling in Mplus. The SUBPOPULATION option can be used for sample selection with TYPE=COMPLEX, but my data set is not from a complex survey. Is there an option that allows sub-sample selection in that case? Many thanks for your help!
 Linda K. Muthen posted on Wednesday, November 18, 2009 - 1:48 pm
USEOBSERVATIONS
 J.W. posted on Wednesday, November 18, 2009 - 1:58 pm
OK, I have figured it out. The USEOBSERVATIONS option works. Thanks.
 John Transue posted on Wednesday, October 26, 2011 - 6:14 pm
My question is very similar to the one that started this thread, but the solutions in this thread are not working.

I am using ANES data that has a poststratification weight variable, but no clustering. I am using Mplus 5.2.

I want to run an analysis on just the white respondents, but I am concerned that if I use the USEOBSERVATIONS option the weight will no longer be correct.

When I use these commands in the Variables section:
Weight is c1_weigh ;
Subpopulation IS c1_ppeth == 1 ;

Mplus aborts, saying:
*** ERROR in VARIABLE command
The SUBPOPULATION option can be used only with TYPE=COMPLEX.

So then I add TYPE=COMPLEX:
Weight is c1_weigh ;
Subpopulation IS c1_ppeth == 1 ;

Analysis:
TYPE=COMPLEX ;

When I run this, I get this error:
*** ERROR in VARIABLE command
TYPE = COMPLEX analysis requires CLUSTER or STRATIFICATION option.

Above this post, it says to use USEOBSERVATIONS. That does indeed allow the model to run, but how can I be sure that the weights are used correctly?
 Linda K. Muthen posted on Thursday, October 27, 2011 - 5:21 pm
SUBPOPULATION is needed only when you have strata or clusters, that is, with TYPE=COMPLEX. If you have only sampling weights, you should use the default TYPE=GENERAL and USEOBSERVATIONS.
 Wim Van den Broeck posted on Tuesday, June 04, 2013 - 2:25 am
I have a dataset with about 1000 subjects in 165 schools. At both levels units were self-selected (all schools in the population were contacted). I have information on how many teachers are in each school and thus on the chance that they were self-selected, and also on the chance that a school was self-selected based on population characteristics (region and other). So I created two weight variables, 'weight' and 'bweight'. Is that correct so far? When performing a TYPE = TWOLEVEL analysis I can see that the estimation of the dependent variable (opinion about school reform) is clearly influenced by the 'weight' variable, but not by the 'bweight' variable. Can you help me?
 Wim Van den Broeck posted on Tuesday, June 04, 2013 - 2:38 am
Sorry, the last sentence in my previous query should be the other way around: the dependent variable is influenced by bweight, not by weight.
 Tihomir Asparouhov posted on Thursday, June 06, 2013 - 9:14 am
It all sounds correct. The weight variable probably doesn't have much variation. That is one possible explanation about why you don't see a difference. In general the weight variable should affect the results.
 Wim Van den Broeck posted on Thursday, June 06, 2013 - 9:46 am
Thanks already Tihomir! The strange thing is that when I compare this analysis (with weight and bweight variables) with an analysis in which I delete the weight variable (only bweight left), I get exactly the same numbers (down to the last decimals). Do I have to use the wtscale command? When I do that with unscaled, the results change. Maybe I did something wrong in constructing the weight variable. This is how I did it: when 10 teachers participated in a school with 70 teachers, I assigned a weight of 7 to each of the participating teachers. Actually there is a lot of variability in this weight variable.
Here's my syntax:
VARIABLE: ....
CLUSTER = school;
WEIGHT = w1;
BWEIGHT = w2;

ANALYSIS: TYPE = TWOLEVEL;
MODEL:
%WITHIN%
item21;
%BETWEEN%
item21;
 Tihomir Asparouhov posted on Thursday, June 06, 2013 - 10:40 am
W1 as constructed says that within school the teachers are equally likely to be selected, i.e., it has no information, and it should not affect the results.

If you were running a single level model (not a two-level model) w1 would be meaningful because it would reflect the probability of selection across the entire population.
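The arithmetic behind this can be sketched in a few lines of Python. The sketch assumes that within-level weights are scaled so that they sum to the cluster sample size within each cluster; that scaling rule is an assumption here, not a statement about Mplus internals. The helper `scale_within_cluster` and the second school are hypothetical; the first school uses the 10-of-70 example given above.

```python
from collections import defaultdict


def scale_within_cluster(clusters, weights):
    """Scale weights so that, within each cluster, they sum to the cluster size.

    This is an assumed scaling rule used for illustration only.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for c, w in zip(clusters, weights):
        totals[c] += w
        counts[c] += 1
    return [w * counts[c] / totals[c] for c, w in zip(clusters, weights)]


# School A: 10 of 70 teachers participated, so each gets weight 70 / 10 = 7.
# School B (hypothetical): 4 of 10 teachers participated, weight 10 / 4 = 2.5.
clusters = ["A"] * 10 + ["B"] * 4
w1 = [7.0] * 10 + [2.5] * 4

scaled = scale_within_cluster(clusters, w1)
print(scaled)  # every scaled weight is 1.0
```

Under that assumption, a weight that is constant within each school collapses to 1 for every teacher, which would be consistent with getting identical estimates with and without it, and with the results changing once the weights are left unscaled.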
 Tihomir Asparouhov posted on Thursday, June 06, 2013 - 10:46 am
This article may clarify the usage of weights in multilevel models

https://www.statmodel.com/download/asparouhovgmms.pdf
 Wim Van den Broeck posted on Thursday, June 06, 2013 - 11:00 am
I see. Then what is the appropriate way to deal with the selection bias in this study, considering that both schools (choice by the principal) and teachers who are more in favor of (or against) the school reform may have been more inclined to participate? Thanks again!
 Tihomir Asparouhov posted on Friday, June 07, 2013 - 8:31 am
I think your data set might not have information on that. If you have background information for all the teachers in the school (both those that respond and those that don't respond) you may be able to stratify the weights in a meaningful way. Search the literature on selection bias for more ideas.
 Wim Van den Broeck posted on Saturday, June 08, 2013 - 2:49 am
Actually I have information on the proportions of sex, age, and years of experience in the entire population. Because these variables are also related to the dependent variable, I assume I can use them in creating the w1 weights (post-stratification). I have two more questions: is the Mplus approach to dealing with selection bias akin to the Heckman approach in econometrics, and what exactly is meant by an "approximately" unbiased estimation method in the paper?
 Bengt O. Muthen posted on Saturday, June 08, 2013 - 10:13 am
I would not attempt using weights to solve this problem. I would do two things. First, see how different from the population your sample is with respect to sex, age, etc. Second, I would use those variables as covariates in your model.
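The first step, comparing the sample with the population on background variables, can be sketched as follows in Python. All counts and population proportions below are made up for illustration, and `proportion_gaps` is a hypothetical helper, not part of any package.

```python
def proportion_gaps(sample_counts, population_props):
    """Return sample proportion minus population proportion for each category."""
    n = sum(sample_counts.values())
    return {
        category: sample_counts.get(category, 0) / n - pop_prop
        for category, pop_prop in population_props.items()
    }


# Hypothetical sample counts and known population proportions for sex.
sample_sex = {"female": 640, "male": 360}
population_sex = {"female": 0.55, "male": 0.45}

gaps = proportion_gaps(sample_sex, population_sex)
for category, gap in gaps.items():
    print(f"{category}: sample minus population = {gap:+.3f}")
```

Large gaps on variables that are related to the outcome would be a reason to include those variables as covariates in the model, as suggested above.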
 deana desa posted on Wednesday, July 31, 2013 - 4:23 am
Dear Drs. Muthen and Tihomir,
I read this paper, http://statmodel.com/download/Scaling3.pdf, and I have three questions as follows:

1. If an MGCFA model is specified with a raw weight (e.g., W1) in conjunction with TYPE IS COMPLEX and CLUSTER IS IDCLUS, does this mean that Mplus rescales W1 so that the weights add up to the cluster's sample size? Or have I misunderstood the Mplus default when a weight is defined with COMPLEX and CLUSTER?

2. If the rescaled weight (i.e., RW1) is calculated prior to the Mplus analysis and used in the analysis, is it appropriate to define TYPE IS COMPLEX and CLUSTER IS IDCLUS?

3. Is it appropriate to use RW1 without COMPLEX and CLUSTER, so that parameters are estimated with respect to the cluster sample size?
 Tihomir Asparouhov posted on Wednesday, July 31, 2013 - 4:35 pm
1. No. This scaling is done for two-level models not for type=complex.

2 & 3. I would not recommend rescaling the weights with type=complex (if you rescale, you can get biased estimates). You should still use type=complex because it accounts for the non-independence of the observations from the same cluster.
 deana desa posted on Thursday, August 08, 2013 - 2:02 am
Thanks, Tihomir.

My follow-up question is the following:

If TYPE=COMPLEX is used for a single level MG-CFA, is there any default scaling that occurs with sampling weights in Mplus 7.11?

That is, are raw weights treated as they are?
 Linda K. Muthen posted on Thursday, August 08, 2013 - 8:07 am
Sampling weights are rescaled to sum to the number of observations in the data set if needed.
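As a rough illustration of what rescaling to the number of observations means arithmetically, here is a minimal Python sketch. It is not Mplus's own code; the helper `rescale_to_n` and the weight values are hypothetical.

```python
def rescale_to_n(weights):
    """Multiply all weights by n / sum(weights) so that they sum to n."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]


# Hypothetical raw sampling weights for four observations.
raw = [10531.0592, 11290.0289, 850.5, 1200.0]
scaled = rescale_to_n(raw)
print(sum(scaled))  # approximately 4, the number of observations

# Rescaling a selected subsample works the same way on the retained cases.
subsample_scaled = rescale_to_n(raw[:2])
print(sum(subsample_scaled))  # approximately 2
```

Every weight is multiplied by the same constant, so the relative weighting of the observations is unchanged; only the total changes.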
 Sarah  posted on Wednesday, July 16, 2014 - 8:32 am
Dear Drs. Muthen
I have a query regarding weights / complex samples. My model involves the application of a weight and the use of the complex sample feature to account for clustering and stratification. In addition I am looking at a subsample. When I ran my model initially I accidentally used the USEOBSERVATIONS option instead of SUBPOPULATION. The model ran, but I received a warning that the standard errors may be incorrect and that I should use the SUBPOPULATION option. When I did this, my model would not run and I received the following message:
THE MINIMUM COVARIANCE COVERAGE WAS NOT FULFILLED.
USE THE USEOBSERVATIONS OPTION INSTEAD OF THE SUBPOPULATION OPTION
TO SEE THE OUTPUT OF THE COVARIANCE COVERAGE.
CATEGORICAL VARIABLE GENHLTHR HAS ZERO OBSERVATIONS IN CATEGORY 0.

I'm not sure if the zero observations issue is what is causing the problem or not, but I'm slightly puzzled as GENHLTHR does not have zero observations in category 0 when I run the model using the 'useobservations' command. I'm unsure how to address this problem and wonder if you could kindly point me in the right direction?

Many thanks for your help.
Sarah
 Tihomir Asparouhov posted on Wednesday, July 16, 2014 - 12:43 pm
Please send the data and input file to support@statmodel.com