Mplus Discussion >> Sample weights


Sample weights

Mplus Discussion > Multilevel Data/Complex Sample >

Scott Weaver posted on Wednesday, May 03, 2006 - 1:21 pm

Hello,
I have two questions regarding the use of sampling weights with Mplus:

(1) If I select a subsample of the data, say on a characteristic such as ethnic group, the sample weights for the subsample will likely not sum to the subsample size. Is Mplus' rescaling of the weights sufficient, or should I recalculate the sample weights for this subsample based on the probability of selection? (Or is this what Mplus is doing when it rescales the weights?)

(2) I wish to look at a multiple group model where the grouping variable is cohort (actually aggregated over pairs of adjacent cohorts, e.g., 1980&1981, 1981&1982, ...). I have a sample weight variable that is computed based on the probability of selection *within* cohort. Because I want to (a) aggregate adjacent cohorts and (b) then test a multiple group model in which these aggregated 2-yr cohorts are the grouping variable, can I use the weight variable calculated within cohort, or will the weights be incorrect for my analysis because of the aggregation and the multiple group analysis? I hope that this question is clear :-)

Thanks,
Scott

Linda K. Muthen posted on Thursday, May 04, 2006 - 8:25 am

1. You can use the new SUBPOPULATION option of the VARIABLE command in this situation. See page 403 of the new user's guide which is on the website.

2. It sounds like you have a sample for which weights were determined and that this sample consists of cohorts and that you want to place more than one cohort in a group. So the data would be as follows:

group cohort
1 1
1 2
2 3
2 4

If this is the case, then I think the weights are fine as is.

Scott Weaver posted on Saturday, May 13, 2006 - 11:08 am

Thank you Linda for your response. I have one follow-up question.
In order to use sample weights, do I need to specify TYPE=COMPLEX? I had originally only specified the weight variable using the WEIGHT IS command. But then I read on p. 205 of the user's manual, "With sampling weights, parameters are estimated by maximizing a weighted loglikelihood function. Standard error computations use a sandwich estimator. This approach can be obtained by specifying TYPE=COMPLEX in the ANALYSIS command in conjunction with the STRATIFICATION, CLUSTER, and/or WEIGHT options of the VARIABLE command."

So I then tried to run my analysis with TYPE=COMPLEX specified, but received this error message:
*** ERROR in Variable command
TYPE = COMPLEX analysis requires CLUSTER or STRATIFICATION option.

Thank you,
Scott

Linda K. Muthen posted on Saturday, May 13, 2006 - 2:26 pm

No, weights can be used without TYPE=COMPLEX; COMPLEX requires a CLUSTER or STRATIFICATION variable.

Matthew Diemer posted on Tuesday, June 06, 2006 - 12:05 pm

I have a more 'finicky' problem with sampling weights. I am using NELS data, and the values for some weights are very large (10531.0592 & 11290.0289, for example). [I'm interested in using the SUBPOPULATION command along with cluster and stratification features.]

My understanding is that the size of these weight values exceeds the column width when the file is saved as a .dat file to be read into Mplus. Inspecting the data file in WordPad shows that these two weight values push the subsequent values in the row to the right (so the value for variable A is read as the value for variable B, for example).

When I try to run analyses in Mplus with the full values for these two weight variables, the data are not 'read in' correctly.

So I tried shortening these variables by trimming the weight values of only these two cases to one decimal place (the least trimming that would not cause problems in the .dat file).

However, when I do so, I receive this message:
SUM OF GIVEN OBSERVATION WEIGHTS IS 10128.06
IT DOES NOT AGREE WITH THE NUMBER OF OBSERVATIONS 10123

There might be a simple solution to this problem, but it has escaped me.

Any suggestions?

Linda K. Muthen posted on Wednesday, June 07, 2006 - 8:25 am

I assume that you are saving the data with a program other than Mplus. You should be able to adjust the format of the saved data to accommodate the width of your weight variable. The error message you report is not something I have seen. Please send your input, data, output, and license number to support@statmodel.com.

Matthew Diemer posted on Friday, June 09, 2006 - 8:04 am

I think this problem *might* be because I had WLSMV as my estimator and all dependent variables in the model as continuous. Per p. 342 of the Mplus User's Guide version 3, this is not an available option.

I had been saving the data in .dat format and inspecting it in WordPad. I'll look into accommodating the width of the weight variable as well.
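A minimal sketch of the fix Linda suggests, assuming the data are exported from Python (the thread does not say which program was used): write the file as space-delimited free format rather than fixed-width columns, so a wide value such as 10531.0592 can never push later values into the wrong column, and the weights keep their full precision instead of being trimmed. The rows and column order are hypothetical.

```python
import csv
import io

# Hypothetical rows: (id, outcome, sampling weight); weights keep full precision.
rows = [
    (1, 3.2, 10531.0592),
    (2, 2.7, 11290.0289),
    (3, 4.1, 0.8731),
]

def write_free_format(rows, handle):
    """Write space-delimited rows with no fixed column widths."""
    writer = csv.writer(handle, delimiter=" ", lineterminator="\n")
    for row in rows:
        writer.writerow(row)

# In-memory buffer for illustration; for a real file use open("data.dat", "w", newline="").
buf = io.StringIO()
write_free_format(rows, buf)
text = buf.getvalue()
print(text)
```

As far as I know, Mplus reads delimited (free-format) data like this by default when no fixed FORMAT is given, so each value is read by position in the row, not by column width.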

J.W. posted on Wednesday, November 18, 2009 - 12:42 pm

I would like to select a sub-sample (e.g., males only) from a data set for modeling in Mplus. The SUBPOPULATION option can be used for sample selection with TYPE=COMPLEX, but my data set is not from a complex survey. Is there an option that allows sub-sample selection? Many thanks for your help!

Linda K. Muthen posted on Wednesday, November 18, 2009 - 1:48 pm

USEOBSERVATIONS

J.W. posted on Wednesday, November 18, 2009 - 1:58 pm

OK, I have figured it out. The USEOBSERVATIONS option works. Thanks.

John Transue posted on Wednesday, October 26, 2011 - 6:14 pm

My question is very similar to the one that started this thread, but the solutions in this thread are not working for me.

I am using ANES data that has a poststratification weight variable, but no clustering. I am using Mplus 5.2.

I want to run an analysis on just the white respondents, but I am concerned that if I use the USEOBSERVATIONS option, the weight will no longer be correct.

When I use these commands in the Variables section:
Weight is c1_weigh ;
Subpopulation IS c1_ppeth == 1 ;

Mplus aborts, saying:
*** ERROR in VARIABLE command
The SUBPOPULATION option can be used only with TYPE=COMPLEX.

So then I add TYPE=COMPLEX:
Weight is c1_weigh ;
Subpopulation IS c1_ppeth == 1 ;

Analysis:
TYPE=COMPLEX ;

When I run this, I get this error:
*** ERROR in VARIABLE command
TYPE = COMPLEX analysis requires CLUSTER or STRATIFICATION option.

Above this post, it says to use USEOBSERVATIONS. That does indeed allow the model to run, but how can I be sure that the weights are used correctly?

Linda K. Muthen posted on Thursday, October 27, 2011 - 5:21 pm

SUBPOPULATION is needed only when you have strata or clusters, that is, with TYPE=COMPLEX. If you have only sampling weights, you should use the default TYPE=GENERAL and USEOBSERVATIONS.
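To illustrate Linda's answer (and Scott's first question), here is a small Python sketch with a weighted mean standing in for a full model; the data and variable names are hypothetical. With sampling weights only, dropping the out-of-subgroup cases (USEOBSERVATIONS-style) gives the same point estimate as keeping everyone with zero weight outside the subgroup, and rescaling the subset weights to sum to the subset size changes nothing. The standard errors under strata or clusters, which is where SUBPOPULATION matters, are not shown here.

```python
import math

def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)

# Hypothetical data: outcome, sampling weight, ethnicity code (1 = target subgroup).
y = [2.0, 3.5, 4.0, 1.5, 5.0, 2.5]
w = [1.2, 0.8, 2.0, 1.0, 0.5, 1.5]
eth = [1, 2, 1, 1, 2, 1]

# (a) Subset: drop everyone outside the subgroup but keep their original weights.
sub_y = [yi for yi, e in zip(y, eth) if e == 1]
sub_w = [wi for wi, e in zip(w, eth) if e == 1]
est_subset = weighted_mean(sub_y, sub_w)

# (b) Domain: keep all rows but give out-of-domain rows a weight of zero.
dom_w = [wi if e == 1 else 0.0 for wi, e in zip(w, eth)]
est_domain = weighted_mean(y, dom_w)

# (c) Rescale the subset weights so they sum to the subset size first.
n_sub = len(sub_w)
scaled_w = [wi * n_sub / sum(sub_w) for wi in sub_w]
est_rescaled = weighted_mean(sub_y, scaled_w)

print(est_subset, est_domain, est_rescaled)
print(math.isclose(est_subset, est_domain), math.isclose(est_subset, est_rescaled))
```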

Wim Van den Broeck posted on Tuesday, June 04, 2013 - 2:25 am

I have a dataset with about 1000 participants (teachers) in 165 schools. At both levels, units were self-selected (all schools in the population were contacted). I have information on how many teachers are in each school, and thus on the chance that they were self-selected, and also on the chance that a school was self-selected based on population characteristics (region and other). So I created two weight variables, 'weight' and 'bweight'. Is that correct so far? When performing a TYPE = TWOLEVEL analysis, I can see that the estimation of the dependent variable (opinion about school reform) is clearly influenced by the 'weight' variable, but not by the 'bweight' variable. Can you help me?

Wim Van den Broeck posted on Tuesday, June 04, 2013 - 2:38 am

Sorry, the last sentence in my previous query should be the other way around: the dependent variable is influenced by bweight, not by weight.

Tihomir Asparouhov posted on Thursday, June 06, 2013 - 9:14 am

It all sounds correct. The weight variable probably doesn't have much variation; that is one possible explanation for why you don't see a difference. In general, the weight variable should affect the results.

Wim Van den Broeck posted on Thursday, June 06, 2013 - 9:46 am

Thanks, Tihomir! The strange thing is that when I compare this analysis (with the weight and bweight variables) with an analysis in which I delete the weight variable (leaving only bweight), I get exactly the same numbers (down to the last decimals). Do I have to use the WTSCALE option? When I use WTSCALE = UNSCALED, the results change. Maybe I did something wrong in constructing the weight variable. This is how I did it: when 10 teachers participated in a school with 70 teachers, I assigned a weight of 7 to each of the participating teachers. There is actually a lot of variability in this weight variable.
Here's my syntax:
VARIABLE: ....
CLUSTER = school;
WEIGHT = w1;
BWEIGHT = w2;

ANALYSIS: TYPE = TWOLEVEL;
MODEL:
%WITHIN%
item21;
%BETWEEN%
item21;

Tihomir Asparouhov posted on Thursday, June 06, 2013 - 10:40 am

W1 as constructed says that within school the teachers are equally likely to be selected, i.e., it has no information, and it should not affect the results.

If you were running a single level model (not a two-level model) w1 would be meaningful because it would reflect the probability of selection across the entire population.
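A small sketch of why w1 drops out, assuming the default within-level scaling described in the Asparouhov paper linked below (WTSCALE = CLUSTER, which to my understanding rescales the within-level weights to sum to the sample size of each cluster). A weight that is constant within every school becomes exactly 1 after that scaling, so the model cannot see it; with WTSCALE = UNSCALED the values of 7 survive, consistent with the change Wim reports. The second school's numbers are hypothetical.

```python
from collections import defaultdict

def scale_within_clusters(weights, clusters):
    """Rescale within-level weights so they sum to the number of observations in each cluster."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for w, c in zip(weights, clusters):
        totals[c] += w
        counts[c] += 1
    return [w * counts[c] / totals[c] for w, c in zip(weights, clusters)]

# School A: 10 of 70 teachers participated, so w1 = 7 for each of them.
# School B (hypothetical): 5 of 10 teachers participated, so w1 = 2 for each of them.
clusters = ["A"] * 10 + ["B"] * 5
w1 = [70 / 10] * 10 + [10 / 5] * 5

scaled = scale_within_clusters(w1, clusters)
print(scaled)  # every weight is 1: w1 carries no within-school information
```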

Tihomir Asparouhov posted on Thursday, June 06, 2013 - 10:46 am

This article may clarify the usage of weights in multilevel models

https://www.statmodel.com/download/asparouhovgmms.pdf

Wim Van den Broeck posted on Thursday, June 06, 2013 - 11:00 am

I see. Then what is the appropriate way to deal with the selection bias in this study, considering that both schools (choice by the principal) and teachers who are more in favor of (or against) the school reform may have been more inclined to participate? Thanks again!

Tihomir Asparouhov posted on Friday, June 07, 2013 - 8:31 am

I think your data set might not have information on that. If you have background information for all the teachers in the school (both those that respond and those that don't respond) you may be able to stratify the weights in a meaningful way. Search the literature on selection bias for more ideas.

Wim Van den Broeck posted on Saturday, June 08, 2013 - 2:49 am

Actually, I have information on the proportions of sex, age, and years of experience in the entire population. Because these variables are also related to the dependent variable, I assume I can use them in creating the w1 weights (post-stratification). I have two more questions: is the Mplus approach to dealing with selection bias akin to the Heckman approach in econometrics, and what exactly is meant by an "approximately" unbiased estimation method in the paper?

Bengt O. Muthen posted on Saturday, June 08, 2013 - 10:13 am

I would not attempt using weights to solve this problem. I would do two things. First, see how different from the population your sample is with respect to sex, age, etc. Second, I would use those variables as covariates in your model.

deana desa posted on Wednesday, July 31, 2013 - 4:23 am

Dear Drs. Muthen and Tihomir,
I read this paper, http://statmodel.com/download/Scaling3.pdf, and I have three questions:

1. If an MGCFA model is specified with a raw weight (e.g., W1) in conjunction with TYPE IS COMPLEX and CLUSTER IS IDCLUS, does this mean that Mplus rescales W1 so that the weights add up to the cluster's sample size? Or have I misunderstood the Mplus default when a weight is defined with COMPLEX and CLUSTER?

2. If the rescaled weight is calculated prior to the Mplus analysis (i.e., RW1) and used in the analysis, is it appropriate to define TYPE IS COMPLEX and CLUSTER IS IDCLUS?

3. Is it appropriate to use RW1 without COMPLEX and CLUSTER, so that parameters are estimated with respect to the cluster sample size?

Tihomir Asparouhov posted on Wednesday, July 31, 2013 - 4:35 pm

1. No. This scaling is done for two-level models not for type=complex.

2 & 3. I would not recommend rescaling the weights with TYPE=COMPLEX (if you rescale, you can get biased estimates). You should still use TYPE=COMPLEX because it accounts for the non-independence of observations from the same cluster.

deana desa posted on Thursday, August 08, 2013 - 2:02 am

Thanks, Tihomir.

My follow-up question is the following:

If TYPE=COMPLEX is used for a single level MG-CFA, is there any default scaling that occurs with sampling weights in Mplus 7.11?

That is, are raw weights treated as they are?

Linda K. Muthen posted on Thursday, August 08, 2013 - 8:07 am

Sampling weights are rescaled to sum to the number of observations in the data set if needed.
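In other words, the rescaling multiplies every weight by the same constant, n / (sum of the weights), so the weights sum to n while their relative sizes are unchanged. A minimal Python sketch of that arithmetic (the function name and the raw weights are hypothetical, and this is not Mplus source code):

```python
import math

def rescale_to_sample_size(weights):
    """Multiply every weight by n / sum(weights) so the rescaled weights sum to n."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]

# Hypothetical raw weights that sum to a population size of 10000.
raw = [1500.0, 3000.0, 4500.0, 1000.0]
scaled = rescale_to_sample_size(raw)
print(scaled, math.fsum(scaled))
```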

Sarah posted on Wednesday, July 16, 2014 - 8:32 am

Dear Drs. Muthen
I have a query regarding weights and complex samples. My model involves applying a weight and using the complex sample feature to account for clustering and stratification. In addition, I am looking at a subsample. When I ran my model initially, I accidentally used the USEOBSERVATIONS option instead of SUBPOPULATION. The model ran, but I received a warning that the standard errors may be incorrect and that I should use the SUBPOPULATION option. When I did so, my model would not run and I received the following message:
THE MINIMUM COVARIANCE COVERAGE WAS NOT FULFILLED.
USE THE USEOBSERVATIONS OPTION INSTEAD OF THE SUBPOPULATION OPTION
TO SEE THE OUTPUT OF THE COVARIANCE COVERAGE.
CATEGORICAL VARIABLE GENHLTHR HAS ZERO OBSERVATIONS IN CATEGORY 0.

I'm not sure whether the zero observations issue is what is causing the problem, but I'm slightly puzzled, because GENHLTHR does have observations in category 0 when I run the model using the USEOBSERVATIONS option. I'm unsure how to address this problem and wonder if you could kindly point me in the right direction.

Many thanks for your help.
Sarah

Tihomir Asparouhov posted on Wednesday, July 16, 2014 - 12:43 pm

Please send the data and input file to support@statmodel.com

Lisa M. Yarnell posted on Saturday, May 30, 2015 - 2:27 pm

Hello. In my analyses, I have students nested in teachers, with probability weights available for each. In general, when we apply student probability weights to student-focused models, we are able to generalize the results to the population of students represented by the sample. Conversely, when we apply teacher probability weights to teacher-focused models, we are able to generalize results to the population of teachers represented by the sample of teachers.

In two-level modeling where we are able to apply probability weights on both levels (Mplus' WEIGHT and BWEIGHT), what sorts of generalizing statements can be made? Do the results then generalize to (reflect the population of) both students on level 1 and teachers on level 2? So either type of generalizing statement may be made in interpreting the results?

Thank you.

Linda K. Muthen posted on Sunday, May 31, 2015 - 11:58 am

Yes.

Samuli Helle posted on Wednesday, November 25, 2015 - 2:49 am

Hi. I'd like some advice on how best to take the sampling characteristics of my data into account in Mplus. I have data from four parishes. For each parish, I should have all the individuals born since year x (i.e., census-type data), but before that year I have only samples from each parish. Should I use sampling weights, or is there a better way to handle data like this in Mplus?

Tihomir Asparouhov posted on Wednesday, November 25, 2015 - 10:34 am

Sampling weights look reasonable.

Samuli Helle posted on Wednesday, December 09, 2015 - 4:22 am

Thanks Tihomir! Should the calculation of sampling weights take missing data patterns across strata into account?

Tihomir Asparouhov posted on Wednesday, December 09, 2015 - 3:40 pm

No. Just code the missing values in the usual Mplus style, i.e., code the missing values in the data as 999 and use this option "missing = all(999);" in the variable command.
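A minimal Python sketch of preparing the data for the MISSING = ALL(999) statement above, assuming missing entries are stored as None or NaN before export; the function and data are hypothetical. It also refuses to proceed if 999 already occurs as a real value, since Mplus would then treat that value as missing.

```python
import math

MISSING_CODE = 999

def code_missing(rows, code=MISSING_CODE):
    """Replace None/NaN with the missing code; fail if the code occurs as a real value."""
    out = []
    for row in rows:
        new_row = []
        for v in row:
            if v is None or (isinstance(v, float) and math.isnan(v)):
                new_row.append(code)
            elif v == code:
                raise ValueError(f"{code} occurs as a real value; choose another missing code")
            else:
                new_row.append(v)
        out.append(new_row)
    return out

# Hypothetical rows: (id, x, y) with missing values marked as None or NaN.
rows = [[1, 2.5, None], [2, float("nan"), 0.9], [3, 1.0, 1.1]]
coded = code_missing(rows)
print(coded)
```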

Samuli Helle posted on Wednesday, December 09, 2015 - 10:43 pm

Ok, Thanks a lot!

Frank Reichert posted on Wednesday, February 24, 2016 - 8:09 pm

Hello, I am wondering which weighting/scaling method I should use in two- and three-level analysis. I have data that were collected through a two-stage stratified cluster sampling procedure (students nested in schools). My database provides a final student weight, which is the product of a student weight and a school weight (but I cannot get access to these two components).

I know that the school weight was computed as the product of the inverse of the probability of selection of the school and the school non-response adjustment. The student weight was calculated as the product of the inverse of the probability of selection of the student from the sampled school and the student level non-response adjustment.

Now since I have only their products, is it reasonable to use this product as a weight in multilevel analysis? And if so, is unscaled the appropriate scaling method?

I would be happy for any recommendation on this.

Tihomir Asparouhov posted on Thursday, February 25, 2016 - 9:50 am

The product weight is designed for single-level analysis, so consider using TYPE=COMPLEX instead of TYPE=TWOLEVEL if you cannot obtain the within- and between-level weights separately.

If you want to use TYPE=TWOLEVEL and cannot obtain the within- and between-level weights separately, your best bet would be to specify the product weight as a within-level weight (WEIGHT =) and use the Mplus default scaling method (so do not specify the WTSCALE option). This approach takes care of student-level weighting but not school-level weighting.

Further information can be obtained in

http://statmodel.com/download/Scaling3.pdf
and
http://statmodel.com/download/asparouhovgmms.pdf

Frank Reichert posted on Wednesday, March 02, 2016 - 4:29 pm

Thanks for your answer. I figured that I could obtain the replicate weights, but I actually want to do multiple group comparisons and examine the role of school level variables (my focus is not on individual level variables). To my understanding, multiple group analysis is not implemented in type=complex, is this correct?

Tihomir Asparouhov posted on Wednesday, March 02, 2016 - 11:04 pm

Multiple group is available with type=complex.

Frank Reichert posted on Wednesday, March 02, 2016 - 11:28 pm

Thank you, I must have mixed that up then.

Dan Berry posted on Friday, April 14, 2017 - 8:36 am

Hello,
I'm trying to wrap my head around the following: I'm fitting a two-level model to longitudinal data. It is specified to estimate the within-person effect of (group-mean-centered) X on Y (Level 1) and the between-person effect of person-mean X on Y (Level 2).

To adjust for a large set of potential time-invariant confounders in the between-person estimate, I've created a set of inverse propensity score weights for person-mean X. I include these weights using the BWEIGHT option.

As expected, including the bweight affects the fixed effect and SE of the between-person X effect. It is unclear to me, however, why it also has a pronounced effect on the within-person X effect. I'm clearly missing something about the weighting process. A little help? Thanks in advance.

Tihomir Asparouhov posted on Friday, April 14, 2017 - 10:54 pm

You can see the weighting here
http://statmodel.com/download/Scaling3.pdf

The bweight is designed to handle sampling weights and to equal 1/(probability of selection). So if one cluster has weight 2 vs. another with weight 1, it is like adding a copy of the first cluster to make it twice as important. The bottom line is that the bweight applies to the entire cluster, not just to the between-level part of the model.

If you want the within level to be unaffected (which I am not sure you should want; from your description, this doesn't sound much different from a sampling weight), you can try using the opposite weight, 1/bweight, on the within level in combination with the unscaled option. It could work.
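A deliberately simplified Python sketch of the idea, assuming that, loosely speaking, each observation's within-level contribution is multiplied by both its cluster's bweight and its own within weight (the actual two-level pseudo-likelihood is more involved; see the paper linked above). With within weight = 1/bweight and no scaling, the product is 1 for every observation. Under a default scaling to cluster size (to my understanding, WTSCALE = CLUSTER), a weight that is constant within a cluster is turned back into 1, leaving the bweight in force, which is why the unscaled option is needed. The bweight values are hypothetical.

```python
from collections import defaultdict

def scale_within_clusters(weights, clusters):
    """Rescale within-level weights so they sum to the number of observations in each cluster."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for w, c in zip(weights, clusters):
        totals[c] += w
        counts[c] += 1
    return [w * counts[c] / totals[c] for w, c in zip(weights, clusters)]

# Hypothetical between-level inverse propensity weights, one per person (cluster).
bweight = {"p1": 2.0, "p2": 0.5, "p3": 4.0}
cluster_of_obs = ["p1"] * 3 + ["p2"] * 4 + ["p3"] * 2

# Opposite weight on the within level, as suggested above.
within = [1.0 / bweight[c] for c in cluster_of_obs]

# Simplified combined multiplier per observation: bweight * within weight.
unscaled_product = [bweight[c] * w for c, w in zip(cluster_of_obs, within)]
cluster_scaled = scale_within_clusters(within, cluster_of_obs)
scaled_product = [bweight[c] * w for c, w in zip(cluster_of_obs, cluster_scaled)]

print(unscaled_product)  # all 1: within-level contributions are left unweighted
print(scaled_product)    # scaling undid 1/bweight, so bweight applies again
```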

Dan Berry posted on Saturday, April 15, 2017 - 11:24 am

Thanks, Tihomir! That clears it up.
Best,
-dan

jintana jankhotkaew posted on Thursday, August 03, 2017 - 9:01 am

Hello,
I would like to apply a sampling weight in a multiple linear regression without doing a multilevel analysis.

Can I do that by adding the sampling weight variable, and if so, how? Am I right that if I apply a sampling weight, I cannot use multiple imputation?
Best regards,
Jintana

Bengt O. Muthen posted on Thursday, August 03, 2017 - 5:23 pm

Use Type = Complex (see UG) together with the weight option.

jintana jankhotkaew posted on Monday, August 07, 2017 - 1:36 pm

Hello,
I have four stratification levels, but Mplus seems to allow only one stratification variable?
My four levels are A=Province, B=District, C=Sub-district, and D=Village, and within each village I randomly selected respondents. How would you suggest defining the strata in the command?
Best regards,
jintana

jintana jankhotkaew posted on Monday, August 07, 2017 - 2:40 pm

Hello,
I ran the model with TYPE=COMPLEX and the message below appears. I did not have this problem with TYPE=GENERAL. Could you explain why?

THE WEIGHT MATRIX PART OF VARIABLE V IS NON-INVERTIBLE. THIS MAY
BE DUE TO ONE OR MORE CATEGORIES HAVING TOO FEW OBSERVATIONS. CHECK
YOUR DATA AND/OR COLLAPSE THE CATEGORIES FOR THIS VARIABLE.
PROBLEM INVOLVING THE REGRESSION OF V ON AI. THE PROBLEM
MAY BE CAUSED BY AN EMPTY CELL IN THE BIVARIATE TABLE.

THE WEIGHT MATRIX PART OF VARIABLE V IS NON-INVERTIBLE. THIS MAY
BE DUE TO ONE OR MORE CATEGORIES HAVING TOO FEW OBSERVATIONS. CHECK
YOUR DATA AND/OR COLLAPSE THE CATEGORIES FOR THIS VARIABLE.
PROBLEM INVOLVING THE REGRESSION OF V ON EDU3. THE PROBLEM
MAY BE CAUSED BY AN EMPTY CELL IN THE BIVARIATE TABLE.
Best regards,
Jintana

jintana jankhotkaew posted on Monday, August 07, 2017 - 2:48 pm

Hello,
Please allow me to explain more. I suspect that TYPE=COMPLEX does not work in this case because some of my CFA items are rare; here, item V is the problem. If so, I would use TYPE=GENERAL instead.
Best regards,
Jintana

Bengt O. Muthen posted on Monday, August 07, 2017 - 3:11 pm

Answer to your 1:36 question:

I think you would use all of your sub-districts as the strata. And then have:

Cluster = Village;

assuming that you have at least 30 or so villages.

Bengt O. Muthen posted on Monday, August 07, 2017 - 3:13 pm

Answer to your 2:40 question:

Explore the reason for the problem by analyzing only the 2 variables V and AI in the model with

V ON ai;

Please try to keep your questions to one post.

jintana jankhotkaew posted on Monday, August 07, 2017 - 11:17 pm

Dear Muthen,
I would like to make sure I have this right.
I would type
STRATIFICATION=SUB-DISTRICT;
CLUSTER=VILLAGE;

Best regards,
Jintana

Bengt O. Muthen posted on Tuesday, August 08, 2017 - 1:15 pm

Yes, but you have to make sure that your subdistrict variable has unique values for all the districts in all your districts and provinces.

Note that this is not really an Mplus question but a complex survey data question - you need to understand that literature to do this correctly.

jintana jankhotkaew posted on Tuesday, August 08, 2017 - 11:12 pm

Dear Muthen,
Thank you very much for your response. I would appreciate it if you could explain more or suggest further reading. In my case, I have a sampling weight variable that accounts for the complex design. In my dataset, there are five variables: 1) province: 5 provinces (coded 1-5); 2) district (3 districts per province, coded 1-15); 3) sub-district (2 sub-districts per district, coded 1-30); 4) village (3 villages per sub-district, coded 1-90). I am not sure how to ensure that the "subdistrict variable has unique values for all the districts in all your districts and provinces." Normally, if we apply sampling weight in STATA for example, we would apply it without identifying stratum in the command. Since Mplus allows the strata to be identified with only one variable, I may need your support.
Best regards,
Jintana

Bengt O. Muthen posted on Wednesday, August 09, 2017 - 3:48 pm

You were asking about Strata and therefore I answered about the unique values. But now you are talking about sampling weights which are different from strata:

"if we apply sampling weight in STATA for example, we would apply it without identifying stratum in the command"

I don't know if you understand that these concepts are different - Mplus also does not force you to refer to strata if you are referring to weights. Good complex survey books include Lohr's book Sampling: Design and Analysis. Perhaps you want to talk to a statistical consultant about this.

By unique strata I mean that you can't have a sub-district with the same value in the different districts (for example) - just renumber them.
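Renumbering can be done before export. A minimal Python sketch, assuming the codes restart within each parent unit (in Jintana's data the sub-districts are already coded 1-30, so they may already be unique); the resulting integer would be used as the STRATIFICATION variable. The function name and records are hypothetical.

```python
def unique_stratum_ids(records):
    """Map each (province, district, sub_district) combination to its own integer stratum ID."""
    ids = {}
    out = []
    for prov, dist, sub in records:
        key = (prov, dist, sub)
        if key not in ids:
            ids[key] = len(ids) + 1  # next unused ID
        out.append(ids[key])
    return out

# Hypothetical codes where sub-district numbering restarts within each district.
records = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (1, 1, 1)]
print(unique_stratum_ids(records))
```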

SRL posted on Thursday, December 21, 2017 - 11:57 am

I am trying to run a moderation analysis including sampling weights but am getting error messages. I have a "weight is" statement, and my ANALYSIS command specifies "type=general; estimator = ML; bootstrap = 10000;". Do you have suggestions on where my errors are?

Bengt O. Muthen posted on Thursday, December 21, 2017 - 3:21 pm

Send your output to Support along with your license number.

Rachel Frieder posted on Wednesday, February 07, 2018 - 1:55 pm

I am running a sequential mediation path model with TYPE=COMPLEX. I have about 500 level 1 cases (subordinates) working for ~280 unique leaders (cluster). I am trying to calculate the sample statistics and correlations (seemingly the easiest part!) among study variables. However, the univariate statistics for the Leader variables show an N of 500. Is there a way to compute the leader-level stats (means, sd's, intercorrelations) and the subordinate-level stats with the appropriate N's?

Bengt O. Muthen posted on Wednesday, February 07, 2018 - 4:24 pm

Run Type=Basic twolevel to get these statistics.

Rachel Frieder posted on Wednesday, February 07, 2018 - 6:28 pm

Thank you (as always) for the speedy reply! The complicating factor is I have a lot of missing data and I am letting MPlus handle it (by default). For instance, many subordinates responded (N=1925) but their supervisors did not (and vice versa). So while I have performance ratings for most of these subordinates, they aren't all included in the final analysis. Only 500 or so are (because matched data is available for only ~280 of their leaders). When I run type= basic twolevel, my N's for the sample stats correspond to the overall N, not the final N that I use in my dataset if that makes any sense. Is there a way to only calculate these statistics for the "final sample". Listwise = on doesn't work, because that further reduces my leaders to N~220 and my subordinates to N~400. Thanks so much (and sorry if this is confusing!!)

Bengt O. Muthen posted on Thursday, February 08, 2018 - 4:20 pm

I probably misunderstand, but why don't you run Type = twolevel basic on the same "matched data" that you use in your analyses?

Dan Berry posted on Thursday, March 08, 2018 - 1:45 pm

Hello. Perhaps a silly question, but I was hoping to get your thoughts on the following regarding rescaling weights. I have multiple sampling weights for a given between-person unit (there are multiple weights because they're actually inverse propensity score weights, where the selection model changes over time). I need to aggregate these weights so that there is one aggregate weight per unit (which I'd then include as a "bweight" in a multilevel model). Taking the product of the raw weights gives me some crazy-huge aggregate weights, which I'd like to avoid.

My question: would it undermine the utility of the weight if I first rescaled each of the weights as Mplus does (e.g., rescale_wj = wj * (n / sum of wj)) and then took their product to create an aggregate weight?

Thanks in advance for any thoughts.
-dan

Tihomir Asparouhov posted on Thursday, March 08, 2018 - 3:37 pm

As long as you have the same number of weights per person, the scaling doesn't matter, since the aggregate weight would be scaled by the same constant across persons. In that case the approach you suggest is reasonable: the way you rescale the weights before multiplying won't affect Mplus results (apart from round-off error), so it is best to get the weights as close to 1 as possible, which is what your approach does.
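A quick Python check of Tihomir's point, with hypothetical weights for four persons and two waves: rescaling each wave's weights before multiplying changes every person's aggregate weight by the same constant factor, so once each aggregate is rescaled to sum to n the two versions are identical, while the pre-scaled one stays much closer to 1.

```python
import math

def rescale(weights):
    """Multiply each weight by n / sum(weights) so the weights sum to n."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]

# Hypothetical inverse propensity weights: one list per wave, one entry per person.
wave1 = [12.0, 30.0, 8.0, 50.0]
wave2 = [40.0, 10.0, 25.0, 25.0]

raw_product = [a * b for a, b in zip(wave1, wave2)]
prescaled_product = [a * b for a, b in zip(rescale(wave1), rescale(wave2))]

# Each person's aggregate changes by the same factor...
ratios = [p / r for p, r in zip(prescaled_product, raw_product)]
print(ratios)

# ...so the two aggregates coincide once each is rescaled to sum to n.
same = all(math.isclose(a, b) for a, b in zip(rescale(raw_product), rescale(prescaled_product)))
print(same)
```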

Dan Berry posted on Friday, March 09, 2018 - 6:55 am

Thanks, Tihomir. Very helpful (per usual).

Jinxin ZHU posted on Monday, April 02, 2018 - 7:37 pm

May I know why the sampling weights are "rescaled so that they sum to the total number of observations".

Are the results affected by the rescaling process?

Can I turn off this default setting, since sometimes the sampling weights sum to the size of the target population?

Tihomir Asparouhov posted on Tuesday, April 03, 2018 - 10:35 am

They are rescaled for convenience, to avoid values that are too large or too small. The scale does affect the result, with the exception of the log-likelihood, which you can adjust manually by multiplying by (size of target pop / size of sample).

Tihomir Asparouhov posted on Tuesday, April 03, 2018 - 11:35 am

There is a typo above - it should say

The scale does NOT affect the result ...
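A small Python illustration of the corrected statement, using a weighted normal model fitted by pseudo-maximum likelihood as a stand-in for an Mplus model (the data and weights are hypothetical): the estimates are the same whether the weights sum to the population size N or to the sample size n, and only the log-likelihood changes, by exactly the factor N / n.

```python
import math

def weighted_normal_fit(y, w):
    """Weighted ML estimates of a normal mean and variance, plus the weighted log-likelihood."""
    total = sum(w)
    mu = sum(wi * yi for yi, wi in zip(y, w)) / total
    var = sum(wi * (yi - mu) ** 2 for yi, wi in zip(y, w)) / total
    loglik = sum(
        wi * (-0.5 * math.log(2 * math.pi * var) - (yi - mu) ** 2 / (2 * var))
        for yi, wi in zip(y, w)
    )
    return mu, var, loglik

y = [3.1, 4.7, 2.2, 5.0, 3.9]
w_pop = [1200.0, 800.0, 2500.0, 400.0, 1100.0]  # hypothetical weights summing to N = 6000
n, N = len(y), sum(w_pop)
w_n = [wi * n / N for wi in w_pop]               # the same weights rescaled to sum to n

mu_pop, var_pop, ll_pop = weighted_normal_fit(y, w_pop)
mu_n, var_n, ll_n = weighted_normal_fit(y, w_n)

print(mu_pop, mu_n)           # identical estimates
print(ll_pop, ll_n * N / n)   # log-likelihood recovered by multiplying by N / n
```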

Jinxin ZHU posted on Tuesday, April 03, 2018 - 6:30 pm

Thank you so much. A follow-up question: what about multilevel analysis?

I found the results are slightly changed with WTSCALE = Unscaled, when I use the student weight at the within level using TIMSS data.

Tihomir Asparouhov posted on Wednesday, April 04, 2018 - 8:39 am

It is completely different in multilevel analysis. Take a look at these two articles:

http://statmodel.com/download/asparouhovgmms.pdf

http://statmodel.com/download/Scaling3.pdf

Jinxin ZHU posted on Thursday, April 05, 2018 - 6:46 pm

Thank you so much Tihomir! These articles are really helpful.

Currently, I am handling the ICCS 2009 and ICCS 2016 data. Are there any papers/examples of how I should specify the sampling weights in a two-level analysis using large-scale international data sets (such as ICCS, PISA, and TIMSS) with Mplus?

Thanks again!

Tihomir Asparouhov posted on Friday, April 06, 2018 - 8:50 am

The above paper is our latest methodological writing on that topic. If you want to see applications you might find some of these useful (back references)

https://scholar.google.com/scholar?hl=en&as_sdt=5,45&cites=4068178252377360914&scipsc=&q=&scisbd=1

Dan Li posted on Friday, May 25, 2018 - 8:50 am

Hello, I have a question regarding estimates based on weighted and unweighted samples.
I used PIRLS data to do Latent Profile Analysis. I specified weight variable as WEIGHT=SCHWGT;
In the output:
Number of observations 370
It seems that Class Counts and Proportions are based on unweighted sample (n=370)
In the LPA, I also included several auxiliary variables. My question is: are the estimates (i.e., means and S.E.s) for the auxiliary variables across classes based on the unweighted sample or the weighted sample? Thank you.

Tihomir Asparouhov posted on Friday, May 25, 2018 - 5:13 pm

The weights are standardized to sum up to the total sample size. All the analyses use the standardized weights.
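To illustrate Tihomir's answer with hypothetical numbers: when the weights are standardized to sum to the sample size, weighted class counts still add up to n (370 in Dan's output), so a total of n alone does not show whether counts are weighted. This Python sketch computes class counts and proportions from posterior probabilities with and without weights; it illustrates the arithmetic only and makes no claim about which Mplus output section uses which.

```python
import math

def class_proportions(posteriors, weights=None):
    """Class counts and proportions from posterior probabilities, optionally weighted."""
    n = len(posteriors)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    scaled = [w * n / total for w in weights]  # weights standardized to sum to n
    n_classes = len(posteriors[0])
    counts = [sum(s * p[k] for s, p in zip(scaled, posteriors)) for k in range(n_classes)]
    return counts, [c / n for c in counts]

# Hypothetical posterior probabilities for 4 respondents and 2 latent classes.
post = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
wts = [3.0, 1.0, 1.0, 1.0]

unweighted_counts, unweighted_props = class_proportions(post)
weighted_counts, weighted_props = class_proportions(post, wts)
print(unweighted_props, weighted_props)
print(math.isclose(sum(weighted_counts), len(post)))  # weighted counts still sum to n
```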

Neil Stenhouse posted on Monday, July 16, 2018 - 11:43 am

Hello,

I am working on an analysis similar to John Transue, above (October 2011 post). I am running a logistic regression analysis on data with sampling weights. There is no clustering.

The user's guide says that for looking at stratified data with TYPE=COMPLEX, "Standard error calculations use a sandwich estimator." However, I am wondering what happens when NOT using TYPE=COMPLEX.

My question is: how does Mplus adjust the standard errors when using the "WEIGHT IS" command, with TYPE=GENERAL, and ESTIMATOR=MLR?

Tihomir Asparouhov posted on Monday, July 16, 2018 - 3:12 pm

It also uses the sandwich estimator; see formula (2) in

http://www.statmodel.com/download/webnotes/mplusnote72.pdf

Neil Stenhouse posted on Monday, July 16, 2018 - 3:22 pm

Thank you so much Dr. Asparouhov!

One clarification: my model does not contain any latent variables, only manifest variables. Will the sandwich estimator still be used in this case?

Samuli Helle posted on Monday, May 27, 2019 - 2:34 am

Is there a way to use different weight variables for different response variables? For example, I have response variables DV1-DV3 and weight variables W1-W3. From the UG, it looks like this cannot be done (could a multigroup setup in long format solve the problem?).

Tihomir Asparouhov posted on Tuesday, May 28, 2019 - 10:31 am

Maybe you can do that. However, if you have different sampling weights due to non-response over time, we recommend that you use missing values to deal with that instead of adjusting the weights. So using W1 only and specifying missing values for non-response is the theoretically correct way of doing this.

Samuli Helle posted on Tuesday, May 28, 2019 - 1:24 pm

OK, thanks. If I use a multigroup approach, does the weight variable influence all the response variables, such as factor indicators, in addition to the main DVs of interest?

Tihomir Asparouhov posted on Thursday, May 30, 2019 - 1:13 pm

It does, but I am not sure how this is related to the previous question. In principle, you can use a univariate two-level analysis where each cluster has size 3, and you can then include the three weight variables. However, this probably doesn't resolve the underlying issue, which is that a sampling weight is 1/(probability of selection), i.e., inherently there should be only one sampling weight per sampled unit. If you are using a multiple group approach, I don't think you can have f BY dv1-dv3, because there will be 3 groups and the variables would be uncorrelated across groups.

Jilian Halladay posted on Saturday, June 08, 2019 - 5:41 am

Hello,

I have added weights to my latent profile analysis, and the output does not show any errors regarding the weights, but my profile results are not weighted. That is, the proportion in each profile is relative to the unweighted sample, not the weighted sample.

Do I have to specify something other than WEIGHT = to get weighted estimates?

Thanks!

Tihomir Asparouhov posted on Monday, June 10, 2019 - 9:36 am

How did you conclude that they are not weighted? They should be weighted. If you can't resolve it, you might have to send your example and data to support@statmodel.com. It could be useful to add the command
savedata: file is 1.dat; save=cprob;
You can then see in detail how the observations are weighted, and compare the weighted posterior probabilities from 1.dat with the results in this section of the output:
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THE ESTIMATED MODEL
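A minimal Python sketch of the check suggested in the reply: average the saved posterior class probabilities using the sampling weights and compare the result with the estimated-model class proportions. It assumes the posterior-probability columns and the weight column have already been read from 1.dat (the column order should be listed in the SAVEDATA part of the output and is not assumed here); the function names and values are hypothetical.

```python
def weighted_class_proportions(cprobs, weights):
    """Weighted average of posterior class probabilities, one value per class.

    cprobs  -- one list of posterior class probabilities per observation
    weights -- one sampling weight per observation (any positive scale)
    """
    total = sum(weights)
    n_classes = len(cprobs[0])
    return [sum(wt * row[c] for row, wt in zip(cprobs, weights)) / total
            for c in range(n_classes)]


def weighted_class_counts(cprobs, weights):
    """Proportions expressed on the scale of weights standardized to sum to n."""
    n = len(cprobs)
    return [n * p for p in weighted_class_proportions(cprobs, weights)]


# Hypothetical values, as if read from the CPROB columns and the weight column
cprobs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
weights = [2.0, 0.5, 0.5]

print(weighted_class_proportions(cprobs, weights))    # weighted
print(weighted_class_proportions(cprobs, [1.0] * 3))  # unweighted, for contrast
print(weighted_class_counts(cprobs, weights))         # counts still sum to n = 3
```

If the weighted version agrees with the output section above while the unweighted version does not, the profiles are being weighted; note that weighted counts still sum to the number of observations because the weights are standardized.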

SEUNGWON SONG posted on Wednesday, March 25, 2020 - 5:30 pm

Hello.
I have a question about a three-level growth model using a six-wave longitudinal study.
Level 1 is within individual (6 repeated measures), level 2 is between individuals, and level 3 is between schools.
There are separate level-2 and level-3 weights, but no level-1 weight.
How can I calculate level-1 weights? Can I set them all to 1/6?
And if there are missing data at level 1 due to panel attrition (drop-out), do I need to calculate the level-1 weight taking the non-response into account?

Thank you in advance.

Tihomir Asparouhov posted on Thursday, March 26, 2020 - 1:12 pm

I would recommend that you start with User's Guide example 9.12, where level 1 is given in wide format. Then specify the level-2 weight with the WEIGHT option and the level-3 weight with the BWEIGHT option. Attrition should be entered as missing data.
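A minimal Python sketch of the data preparation the reply implies: reshape long-format records into one wide row per person, carry the person-level and school-level weights along, and leave waves lost to attrition as missing rather than creating level-1 weights. The record layout, the function name, and the missing-data flag -999 are all hypothetical; in Mplus the flag would be declared with the MISSING option, and the two weight columns would be named in WEIGHT and BWEIGHT as described above.

```python
MISSING = -999  # hypothetical missing-data flag for the wide file


def long_to_wide(records, n_waves=6):
    """Reshape long records into one wide row per person.

    records -- iterable of (person_id, school_id, wave, y, person_weight,
               school_weight), with wave numbered 1..n_waves
    Returns rows [person_id, school_id, person_weight, school_weight,
    y1, ..., y<n_waves>]; waves with no record (e.g., after drop-out)
    stay MISSING.
    """
    people = {}
    for pid, sid, wave, y, w_person, w_school in records:
        if not 1 <= wave <= n_waves:
            raise ValueError(f"wave {wave} outside 1..{n_waves}")
        row = people.setdefault(
            pid, [pid, sid, w_person, w_school] + [MISSING] * n_waves)
        row[3 + wave] = y  # y1 sits at index 4
    return [people[pid] for pid in sorted(people)]


# Hypothetical long-format data; persons 1, 2 and 3 drop out after waves 3, 1 and 2
records = [
    (1, 10, 1, 3.2, 1.5, 2.0),
    (1, 10, 2, 3.5, 1.5, 2.0),
    (1, 10, 3, 3.9, 1.5, 2.0),
    (2, 10, 1, 2.8, 0.7, 2.0),
    (3, 11, 1, 4.1, 1.2, 0.9),
    (3, 11, 2, 4.0, 1.2, 0.9),
]
for row in long_to_wide(records):
    print(" ".join(str(v) for v in row))
```

In this layout each person is one record, so only the person and school weights are needed, and drop-out shows up as missing outcomes instead of a weight adjustment.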

shaun goh posted on Tuesday, May 19, 2020 - 3:51 am

Hello,

I am thinking of measurement invariance testing for a questionnaire across two samples. One sample has weights to correct for selective drop-out and design features. The other sample does not have weights.

grouping = sample (1 = weighted_sample_A 2 = unweighted_sample_B);
weight = var_X;

Assuming the above syntax is correct, I was wondering what var_X should contain. Is it simply the weights for each individual from sample A and missing-value flags for individuals from unweighted sample B?

Thank you!
Shaun

Tihomir Asparouhov posted on Tuesday, May 19, 2020 - 10:47 am

var_X should first be standardized so that the total weight in sample A equals the sample size of A, i.e., multiply the original weight variable by (sample size of A)/(sum of all weights in A). The weight in group B should be set to 1.
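A minimal Python sketch of the recipe in the reply, assuming sample A is coded 1 in the grouping variable and that the raw weights for sample B are missing (None here). The function name and data are hypothetical.

```python
def two_group_weights(groups, raw_weights, weighted_group=1):
    """Weight variable for a two-group run where only one group has weights.

    In the weighted group, each raw weight is multiplied by
    (number of observations in that group) / (sum of its raw weights),
    so that group's weights sum to its sample size. Every observation in
    the other group gets weight 1.
    """
    in_a = [g == weighted_group for g in groups]
    n_a = sum(in_a)
    sum_a = sum(wt for wt, a in zip(raw_weights, in_a) if a)
    if n_a == 0 or sum_a <= 0:
        raise ValueError("the weighted group needs observations with positive weights")
    return [wt * n_a / sum_a if a else 1.0 for wt, a in zip(raw_weights, in_a)]


# Hypothetical data: sample A (group 1) has raw weights, sample B (group 2) has none
groups = [1, 1, 1, 2, 2]
raw = [30.0, 10.0, 20.0, None, None]
var_x = two_group_weights(groups, raw)
print(var_x)  # [1.5, 0.5, 1.0, 1.0, 1.0]
```

After this step the weights in sample A sum to its sample size (3 here), and every sample B observation carries a weight of 1 rather than a missing-value flag.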