Mplus Discussion >> Weighting


Xu, Man posted on Thursday, February 12, 2009 - 2:32 am

Dear Dr. Muthen,

I have an individual-level weighting variable for a multilevel dataset. This variable is a combination of both the school weight and the weight of the individual within the school that the individual belongs to. I specified my model with WEIGHT and TYPE=TWOLEVEL in conjunction. I didn't specify a school weight using BWEIGHT because the weighting variable already contains school weighting information.

Is this appropriate? Or should I try to decompose the original weight into within weight and between weight for the estimation to be proper?

Thank you very much!

Xu, Man

Tihomir Asparouhov posted on Thursday, February 12, 2009 - 12:57 pm

It is not appropriate. You should try to decompose the original weight into within weight and between weight for the estimation to be proper.

Xu, Man posted on Friday, February 13, 2009 - 4:22 am

Dear Tihomir,

Thank you very much for your advice. I do have a separate school weight variable in the data. So in order to get the "pure" student weight, I just need to divide the overall weight of both school and student (which sums to the target population size) by the school weight, right?

One complication that confuses me is that the data I am using (PISA) focus on the student population rather than the school population, so the school weights probably won't sum to the size of the school population. Do you think this is a problem for the analysis?

Thank you very much.

Xu, Man

Tihomir Asparouhov posted on Tuesday, February 17, 2009 - 9:54 am

Xu

This sounds correct: divide the overall weight of both school and student by the school weight.

The fact that the school weights won't sum to the size of the school population is not a problem at all. These weights are rescaled anyway by Mplus to sum up to the sample size from that school - see
http://statmodel.com/download/Scaling3.pdf
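A minimal Python sketch of the two steps in this reply, assuming the overall weight, the school weight, and a school identifier are available as parallel lists (all function and variable names here are hypothetical). Step 1 is the division confirmed above; step 2 only imitates the within-school rescaling described in the linked note, so that each school's weights sum to its sample size. In practice you would export just the step-1 weight and let Mplus do the rescaling.

```python
from collections import defaultdict

def within_school_weights(overall_wt, school_wt):
    """Step 1: student-within-school weight = overall weight / school weight."""
    return [w / s for w, s in zip(overall_wt, school_wt)]

def scale_to_cluster_size(weights, cluster_ids):
    """Step 2 (illustration only): rescale so the weights in each cluster
    sum to the number of sampled units in that cluster."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for w, c in zip(weights, cluster_ids):
        totals[c] += w
        counts[c] += 1
    return [w * counts[c] / totals[c] for w, c in zip(weights, cluster_ids)]

# Hypothetical example: two schools, five students
school_id = ["A", "A", "B", "B", "B"]
overall_wt = [30.0, 50.0, 8.0, 12.0, 20.0]
school_wt = [10.0, 10.0, 4.0, 4.0, 4.0]

within_wt = within_school_weights(overall_wt, school_wt)
scaled_wt = scale_to_cluster_size(within_wt, school_id)
print(within_wt)   # [3.0, 5.0, 2.0, 3.0, 5.0]
print(scaled_wt)   # [0.75, 1.25, 0.6, 0.9, 1.5]
```

Because step 2 divides by each school's total, multiplying all school weights by a common constant leaves the scaled weights unchanged, which is consistent with the point that the school weights need not sum to the school population size.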

Tihomir

Xu, Man posted on Tuesday, February 17, 2009 - 3:11 pm

Dear Tihomir,

Thank you very much for your reply. Your advice is greatly appreciated!

Xu, Man

Paul Norris posted on Wednesday, February 23, 2011 - 8:46 am

Dear Tihomir,

I looked at your 2008 paper on rescaling weights for multilevel analysis but am wondering how best to implement weights with my data. I have a dataset that contains individuals (level 1) drawn from 21 countries (level 2).

The data include sampling weights at level 1, weighting respondents in any given country to reflect the population of that country. The number of individuals sampled within each country varies (typically 2000 respondents per country, but, for instance, one country provides 7000). The weights within any country sum to the sample size obtained from that country. I assume that weighting on these weights alone (with no adjustment) would give countries with larger samples greater leverage in the estimation of any model.

Given that the sample size obtained within each country is a design artefact (question design was common, but fieldwork within each country was left to local survey companies) rather than having a substantive interpretation, I would like to weight my model so that each country has an equal impact on the model (my concern is with level 2 covariates).

Is it acceptable to use the level 1 weights as they are, and create level 2 weights to rebalance the impact of each country, or should I rescale my level 1 weights so that when weighted each country appears to have an equal sample size?

Best wishes,

Paul

Tihomir Asparouhov posted on Wednesday, February 23, 2011 - 4:49 pm

Use the level 1 weights as they are and do not create level 2 weights. All countries will then have equal weight. Level 1 weights will not cause countries with larger samples to have greater leverage.

Paul Norris posted on Tuesday, March 01, 2011 - 1:32 am

Dear Tihomir,

Many thanks for your quick response.

Paul

Haigen Huang posted on Thursday, September 13, 2012 - 8:01 am

Dear Dr. Muthen,

Do you know which weights I should use for the PISA 2009 data if I use multilevel models in Mplus?

Is it W_FSTUWT (final student weight) at level one and W_FSCHWT (final school weight) at level two?

Or should I divide W_FSTUWT by W_FSCHWT to generate a within-school weight, then use it at level one and use the final school weight at level two?

I will appreciate your help!

Haigen

Tihomir Asparouhov posted on Thursday, September 13, 2012 - 10:30 am

Both approaches will yield the same result. The weights on the within level are scaled to add up to the cluster sample size with the Mplus defaults.
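The equivalence can be checked numerically: W_FSCHWT is constant within a school, so dividing by it multiplies every student weight in that school by the same constant, and that constant cancels when the weights are rescaled to sum to the cluster sample size. A small Python sketch with made-up numbers (the values and the helper name are hypothetical, not PISA data or Mplus internals):

```python
from collections import defaultdict

def scale_to_cluster_size(weights, cluster_ids):
    # Rescale so the weights within each cluster sum to that cluster's sample size
    totals, counts = defaultdict(float), defaultdict(int)
    for w, c in zip(weights, cluster_ids):
        totals[c] += w
        counts[c] += 1
    return [w * counts[c] / totals[c] for w, c in zip(weights, cluster_ids)]

school = ["s1", "s1", "s1", "s2", "s2"]
w_fstuwt = [12.0, 18.0, 30.0, 5.0, 15.0]  # hypothetical final student weights
w_fschwt = [6.0, 6.0, 6.0, 2.5, 2.5]      # hypothetical school weights, constant within school

a = scale_to_cluster_size(w_fstuwt, school)
b = scale_to_cluster_size([t / s for t, s in zip(w_fstuwt, w_fschwt)], school)
print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))  # True
```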

Stefan Vogtenhuber posted on Friday, April 26, 2013 - 12:41 am

Hi,
I am running a three-level model on PISA data (students, schools, countries), and I am mainly interested in variables at the school level (level 2). Since PISA school-level sampling is indeed informative, I want to use the school weights, specifying B2WEIGHT = W_FSCHWT;

This gives the following error message, regardless of whether raw, normalized, or standardized school weights are used, or whether B2WTSCALE = SAMPLE is specified:

THE LOGLIKELIHOOD DECREASED IN THE LAST EM ITERATION. CHANGE YOUR MODEL
AND/OR STARTING VALUES.

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE
COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.

THE H1 MODEL ESTIMATION DID NOT CONVERGE. CHI-SQUARE TEST AND SAMPLE STATISTICS COULD NOT
BE COMPUTED.

The model works fine without the weights.
What can I do? Any help is appreciated.

Linda K. Muthen posted on Friday, April 26, 2013 - 7:17 am

When you weight the data, you change it so a model that fits the unweighted data may not fit the weighted data. Please send the output and your license number to support@statmodel.com.

Yvonne Terry-McElrath posted on Wednesday, August 14, 2013 - 11:41 am

Dear Tihomir,

I am working with school-level data that is based on nationally-representative samples of 8th, 10th, and 12th grades. Sampling is conducted separately for each grade, and each school receives a sampling probability weight. I am running twolevel random models to conduct multi-level mediation where the cluster variable is state (we are interested in state policy predictors). State was *not* part of the sampling procedure, but given the nature of the model, it is being used as the cluster. We want to combine 10th and 12th grades in one model, but as I noted, sampling was done independently for each grade.

I want to use wtscale=cluster, but I don't know whether I should (a) scale the data separately by grade prior to the Mplus analysis, or (b) scale the 10th and 12th grade data together during the Mplus analysis. If I scale the data prior to the Mplus analysis, I will retain the "separate" nature of the 10th and 12th grade samples, but my weights will not sum to the cluster N (because the scaling was done separately by grade, while both grades are included in the model).

Thank you for your assistance.

Tihomir Asparouhov posted on Wednesday, August 14, 2013 - 12:13 pm

What model are you considering?

You might be able to utilize some 3 level models or multiple group two-level models, see
https://www.statmodel.com/examples/webnotes/webnote16.pdf

Yvonne Terry-McElrath posted on Wednesday, August 14, 2013 - 12:21 pm

We are running twolevel random MLR montecarlo models with a logit link (dichotomous X, M, and Y; 2-1-1 mediation models). State policy does not differentiate between 10th and 12th grades; it just focuses on middle vs. high school. Because there are no policy differences, and also because we want as large an N as possible, we want to combine 10th and 12th grades.

Tihomir Asparouhov posted on Wednesday, August 14, 2013 - 1:30 pm

Assuming that no school is present in both the 10th grade and 12th grade samples, I would suggest scaling the 10th grade weights and the 12th grade weights separately, so that the 10th grade weights sum to the 10th grade cluster sample size and the 12th grade weights sum to the 12th grade cluster sample size. Then use wtscale=unscaled. The scaling of the weights would have to be done outside of Mplus before the analysis.
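A minimal Python sketch of this pre-Mplus scaling step, assuming school-level records with a state cluster identifier, a grade, and the original sampling weight (the function name, state labels, and weights are hypothetical). Weights are rescaled within each (state, grade) cell, so that within a state each grade's weights sum to that grade's sample size; the resulting variable would then be exported and analyzed with wtscale=unscaled.

```python
from collections import defaultdict

def scale_by_cluster_and_grade(weights, states, grades):
    """Rescale so that, within each state, the weights of each grade's sample
    sum to that grade's sample size in the state."""
    keys = list(zip(states, grades))
    totals = defaultdict(float)
    counts = defaultdict(int)
    for w, k in zip(weights, keys):
        totals[k] += w
        counts[k] += 1
    return [w * counts[k] / totals[k] for w, k in zip(weights, keys)]

# Hypothetical school-level records: state cluster, grade, original sampling weight
states = ["state_1", "state_1", "state_1", "state_1", "state_2", "state_2"]
grades = [10, 10, 12, 12, 10, 12]
weights = [100.0, 300.0, 50.0, 150.0, 80.0, 40.0]

scaled = scale_by_cluster_and_grade(weights, states, grades)
print(scaled)  # [0.5, 1.5, 0.5, 1.5, 1.0, 1.0]
```

Since each grade's weights sum to that grade's count within the state, the two grades together also sum to the state's total cluster size, which is why no further scaling by Mplus is needed.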

Yvonne Terry-McElrath posted on Thursday, August 15, 2013 - 7:50 am

Thank you so very much for your responses and help. It is much appreciated!

Nemanja posted on Saturday, February 07, 2015 - 12:31 pm

Dear Dr. Muthen,

Do you know which weights I should use for the PISA 2009 data?
I am estimating peer effects by OLS, but I need to use sampling weights because the sample is stratified.

Thank you very much.

Sincerely,
Nemanja

Linda K. Muthen posted on Saturday, February 07, 2015 - 2:36 pm

You should check with whoever created the data set.

Christoph Weber posted on Friday, March 11, 2016 - 2:14 am

Dear Mplus Team, I want to make sure I understand the following post correctly:

Tihomir Asparouhov posted on Wednesday, February 23, 2011 - 4:49 pm

"Use the level 1 weights as they are and do not create level 2 weights. All countries will then have equal weight. Level 1 weights will not cause countries with larger samples to have greater leverage."

--> Does this mean that, for a three-level model (student, school, country) using L1 and L2 weights, all countries have equal weight in the estimation of L3 effects, no matter what the student or school sample size or population size is?

Thanks

Tihomir Asparouhov posted on Saturday, March 12, 2016 - 8:57 am

Yes

Anna Austin posted on Sunday, October 15, 2017 - 6:13 pm

Hello!

I am conducting latent class analysis with data from a complex survey. I am using TYPE=COMPLEX to account for weighting, stratification, and clustering. However, I would like to import the data into SAS to describe the prevalence of demographic characteristics of individuals belonging to each specific class. In doing so, I want to account for each individual's posterior probability (i.e., most likely class assignment), but I still need to account for the survey weights. Can I simply multiply the posterior probabilities by the sample weights for use in the analysis?

Tihomir Asparouhov posted on Tuesday, October 17, 2017 - 10:02 am

Yes. However, I would recommend that you use one of the following more accurate procedures. Simply place your variables in the AUXILIARY command in the model you are running.

auxiliary=demog(bch);
or
auxiliary=demog(du3step);

For more information look up these commands in the User's Guide as well as web notes 15 and 21
http://statmodel.com/download/webnotes/webnote15.pdf

http://statmodel.com/examples/webnotes/webnote21.pdf
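The simpler approach from the question, which the reply says is acceptable, can be sketched as follows: each respondent contributes to every class with an analysis weight equal to their posterior probability for that class times their survey weight. This is an illustration under stated assumptions, with a hypothetical function name and made-up data; it uses posterior probabilities rather than modal class assignment, gives point estimates only (no design-based standard errors), and is not a substitute for the BCH or DU3STEP options.

```python
def weighted_class_prevalence(posterior, survey_wt, indicator):
    """Weighted proportion of a 0/1 characteristic in each latent class,
    using (posterior probability x survey weight) as the analysis weight."""
    n_classes = len(posterior[0])
    prevalences = []
    for k in range(n_classes):
        w = [p[k] * sw for p, sw in zip(posterior, survey_wt)]
        prevalences.append(sum(wi * x for wi, x in zip(w, indicator)) / sum(w))
    return prevalences

# Hypothetical data: 3 respondents, 2 classes
posterior = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]  # each row sums to 1
survey_wt = [1.0, 2.0, 1.0]
low_education = [1, 0, 1]                          # 1 = education < 12 years

print(weighted_class_prevalence(posterior, survey_wt, low_education))
```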

Anna Austin posted on Tuesday, October 17, 2017 - 3:49 pm

Dr. Asparouhov,

Thank you for your insight! I have used a three-step method to generate odds ratios to examine associations of various demographics with the latent classes. However, for some factors the 95% CIs are extremely wide due to the low number of individuals with a particular value of a factor in a given class (e.g., only 3% with education <12 years in one class). My plan was to supplement the odds ratios by describing the weighted prevalences as a way to explain why the 95% CIs were so wide. Do you have any thoughts?

Tihomir Asparouhov posted on Wednesday, October 18, 2017 - 9:04 am

I would recommend that you report the probability results instead of the odds ratios (which, as you say, tend to explode when the probabilities are small). You can use the DCAT option of the AUXILIARY command as well.

Anna Austin posted on Wednesday, October 18, 2017 - 11:38 am

Thanks for your insight!

Is the probability result the information found directly above the odds ratios in the output? This section of the output gives an estimate, SE, and p-value for each factor for two of the three classes. If, for example, a binary variable has an estimate of 1.392 (p-value = 0.004) for class 1, how would this probability result be interpreted?

Tihomir Asparouhov posted on Wednesday, October 18, 2017 - 12:03 pm

This doesn't look correct. Your input should look something like this:

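variable:
Names are u1-u5 u x;      ! variables in the data file
usevar are u1-u5;         ! latent class indicators
Categorical = u1-u5;
Classes = c(2);
Auxiliary = u(DCAT);      ! categorical distal outcome, not used to form the classes

data: file=prob.dat;

Analysis: Type = Mixture;

Model:
%Overall%
[c#1*0];                  ! starting value for the class 1 logit

%c#1%
[u1$1*-1 u2$1*-1 u3$1*-1 u4$1*-1 u5$1*-1];   ! threshold starting values, class 1

%c#2%
[u1$1*1 u2$1*1 u3$1*1 u4$1*1 u5$1*1];        ! threshold starting values, class 2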

The output that the Auxiliary command generates for the variable u looks like this:

EQUALITY TESTS OF MEANS/PROBABILITIES ACROSS CLASSES

U
                 Prob    S.E.   Odds Ratio   S.E.   2.5% C.I.   97.5% C.I.

Class 1
   Category 1   0.754   0.010     1.000     0.000     1.000       1.000
   Category 2   0.246   0.010     0.909     0.076     0.772       1.072
Class 2
   Category 1   0.736   0.010     1.000     0.000     1.000       1.000
   Category 2   0.264   0.010     1.000     0.000     1.000       1.000

                 Chi-Square   P-Value   Degrees of Freedom

   Overall test     1.286      0.257            1
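The Odds Ratio column can be reproduced from the Prob column, which also illustrates why odds ratios become unstable when probabilities are small. A short Python sketch, assuming class 2 is the reference class, as its 1.000 entries suggest (the helper names are hypothetical). Because the printed probabilities are rounded, the result matches the reported 0.909 only to about two decimals; the printed value presumably comes from unrounded probabilities.

```python
def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

def odds_ratio(p_class, p_ref):
    """Odds ratio of a category in one class versus the reference class."""
    return odds(p_class) / odds(p_ref)

# Category 2 of U from the output above; class 2 taken as the reference
print(round(odds_ratio(0.246, 0.264), 2))  # 0.91

# With small probabilities, a tiny absolute change in the reference class
# probability (0.01 -> 0.005) roughly doubles the odds ratio
print(odds_ratio(0.03, 0.01))
print(odds_ratio(0.03, 0.005))
```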

Anna Austin posted on Wednesday, October 18, 2017 - 12:32 pm

Thanks for sharing the above code.

I have been using Vermunt's 3-step approach (R3STEP) rather than Lanza's 1-step approach (DCAT). Is it possible to get probabilities using Vermunt's 3-step approach? The output does look different from what you have provided above.

Tihomir Asparouhov posted on Wednesday, October 18, 2017 - 1:50 pm

It is possible using what we call the manual 3 step procedure, see Section 3
http://statmodel.com/download/webnotes/webnote15.pdf
You would simply declare the distal outcome as categorical in the last step.

A shortcut approach is to use the DU3step option in the AUXILIARY command. For a binary variable, the mean is the same as the probability (given 0/1 coding of the binary variable). If you just need descriptive values, I would recommend that approach (just keep in mind that the SEs will be inferior to those obtained via the full 3-step manual approach, since the DU3step SEs are based on a continuous-variable assumption).

Anna Austin posted on Wednesday, November 22, 2017 - 9:11 am

Hi Dr. Asparouhov,

I have tried declaring the distal outcome as categorical in the last step, but I get an error message indicating that I cannot declare a variable that is not a dependent variable as categorical. My MODEL command is as follows:

MODEL:

%OVERALL%
CLASS on OUTCOME COV1 COV2 COV3;

%class#1%
[ModalC#1@0.603];

%class#2%
[ModalC#1@-3.151];

Is there something wrong with my model command?

Bengt O. Muthen posted on Wednesday, November 22, 2017 - 10:47 am

Send your output to Support along with your license number.