Mplus Discussion > Multilevel Data/Complex Sample >
 Moin Syed posted on Wednesday, May 11, 2011 - 9:09 am
Hello. I am working with a nationally representative data set and am specifying WEIGHT, STRATIFICATION, and CLUSTER using TYPE = COMPLEX. The analysis I want to conduct is to link county-level variables to individual-level ones. Thus, individuals are nested within counties and so I am also using TYPE = TWOLEVEL.

However, when I include the county variables in the BETWEEN statement I get the following error:

One or more between-level variables have variation within a cluster for
one or more clusters. Check your data and format statement.

This is because CLUSTER and county refer to different forms of nesting. That is, each cluster does not correspond with a single county.

Is there a way to do the analysis that I want to do, or must the CLUSTER and BETWEEN variables match up?

Thanks in advance for your help!
 Linda K. Muthen posted on Wednesday, May 11, 2011 - 9:27 am
A variable on the BETWEEN list cannot vary within any cluster defined by the variable named in the CLUSTER option. Perhaps county should be your cluster variable.
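A minimal sketch of what using county as the cluster variable could look like. The file name and all variable names (y, x, cpov, county, wt) are hypothetical and not taken from the poster's data. The setup leaves out the sampling strata and sampling clusters, so standard errors may not fully reflect the sampling design.

```mplus
TITLE: county as the TWOLEVEL cluster (hypothetical names)
DATA: FILE = survey.dat;          ! hypothetical file
VARIABLE:
  NAMES = y x cpov county wt;     ! hypothetical variables
  USEVARIABLES = y x cpov;
  WITHIN = x;                     ! individual-level predictor
  BETWEEN = cpov;                 ! county-level predictor
  CLUSTER = county;               ! cpov is constant within county
  WEIGHT = wt;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  y ON x;
  %BETWEEN%
  y ON cpov;
```

Because cpov takes a single value per county, the between-level variation error quoted above should not arise with this CLUSTER choice.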
 Moin Syed posted on Wednesday, May 11, 2011 - 9:34 am
Thank you for your quick response. If I used county as my cluster variable, the standard errors would not be correct, since the cluster variable, in conjunction with the stratification variable, is needed for the calculation. Right? Can you think of any way to work around this problem?
 Linda K. Muthen posted on Wednesday, May 11, 2011 - 9:36 am
You can include both a cluster variable and a stratification variable, which I think you are doing. Are you sure you are identifying the variables correctly?
 Moin Syed posted on Wednesday, May 11, 2011 - 11:09 am
Sorry if I was not clear. I am specifying a stratification variable and a cluster variable as part of the sampling specification. I also want to be able to include county as another cluster variable, a nesting variable that has nothing to do with how sampling occurred, so that I can model county-level effects. The county variable varies within some of the sampling clusters.
 Bengt O. Muthen posted on Thursday, May 12, 2011 - 9:57 am
It sounds like you have subjects classified by both cluster and county. If so, one possible approach is crossed-random effects modeling, where random variation due to both cluster and county is allowed. This is not yet available in Mplus.
 Moin Syed posted on Thursday, May 12, 2011 - 5:22 pm
Ah, thanks so much!
 Bruce A. Cooper posted on Friday, June 24, 2011 - 4:59 pm
I am using Web Note 12 info to test the addition of effects (M1 model) to a restricted model (M0 model) for multilevel complex survey data. The WN12 paper says that the LL for M10 should be the same as the LL for M0, after using the parameter estimates from the SVALUES output from M0. My M10 LL is not equal to the M0 LL, but is equal to the M1 LL instead (within a couple of decimals). The TECH5 output shows no iterations for the GRADIENT or QUASI-NEWTON sections until toward the end of that output, and then there are usually 2-4 iterations in the section for ITERATIONS FROM THE QUASI-NEWTON EM ACCELERATOR. My questions are:
1. Am I misunderstanding the note about which LL the M10 LL should equal?
2. Are those iterations at the end a problem?
3. If the LL for M10 is wrong, how do I fix it?
Thanks, Bruce
 Tihomir Asparouhov posted on Monday, June 27, 2011 - 10:53 am
1. Your understanding is correct.

2. Probably yes. What are the parameter estimates of the M10 run - the same as those for M0 or M1?

3. Try using mconvergence=100000; if this doesn't work send your example to support@statmodel.com
 Maria Rasmusson posted on Friday, March 16, 2012 - 8:07 am

I'm a beginner at this and I have some questions about a study I'm doing for my PhD thesis.

I think I need to combine the two approaches of modeling for my complex survey data. I'm using PISA data that has a two-stage sampling and within, between and replicate weights. Is it possible to use all these weights if I choose TYPE= COMPLEX TWOLEVEL?
The school sample is stratified and the students are selected randomly within each school.

I want to do a confirmatory factor analysis on the Swedish sample. I want to analyze the variance for both the school and student levels.

Do I need to include the options stratification and cluster for my analysis? If so, could stratification be the stratum and could cluster be the school ID?

Thanks in advance!
Best regards,
 Linda K. Muthen posted on Saturday, March 17, 2012 - 10:41 am
Replicate weights cannot be used with TYPE=TWOLEVEL. You can use the STRATIFICATION, CLUSTER, BWEIGHT, and WEIGHT options with TYPE=TWOLEVEL.
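As a rough illustration of the options named in this reply, a two-level CFA input for one country's sample might be set up as below. All file and variable names are hypothetical. TYPE = COMPLEX TWOLEVEL is an assumption carried over from the question, and the assignment of the student weight to WEIGHT and the school weight to BWEIGHT should be checked against the User's Guide.

```mplus
DATA: FILE = pisa_swe.dat;        ! hypothetical file
VARIABLE:
  NAMES = schid stratum wstu wsch y1-y4;  ! hypothetical names
  USEVARIABLES = y1-y4;
  CLUSTER = schid;                ! school ID
  STRATIFICATION = stratum;       ! school-level sampling stratum
  WEIGHT = wstu;                  ! within-level (student) weight
  BWEIGHT = wsch;                 ! between-level (school) weight
ANALYSIS: TYPE = COMPLEX TWOLEVEL;
MODEL:
  %WITHIN%
  fw BY y1-y4;                    ! student-level factor
  %BETWEEN%
  fb BY y1-y4;                    ! school-level factor
```

No replicate weights appear here, in line with the reply that they cannot be combined with TYPE=TWOLEVEL.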
 Maria Rasmusson posted on Monday, March 19, 2012 - 5:48 am
Thanks Dr. Muthén!

Do you think I can get proper estimates of the standard errors without the replicate weights?
PISA recommends that the replicate weights be used in order to avoid biased estimates of the standard errors.

Maybe I'm better off dropping TYPE=TWOLEVEL and using only TYPE=COMPLEX?

I understand that it is possible to use REPSE=FAY (.5) with TYPE=COMPLEX. Is that right?

I am so confused! Thanks again!
 Linda K. Muthen posted on Monday, March 19, 2012 - 1:34 pm
Replicate weights contain information on stratification and clustering. If you use the STRATIFICATION and CLUSTER options with the WEIGHT option, you should get the same results. You can do it both ways to see. I don't believe the replicate weights contain any other information about the sampling design.

You can use TYPE=COMPLEX if your model is aggregatable. Otherwise, you should use TYPE=TWOLEVEL. See the following paper which is available on the website for more information about this:

Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.
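For the aggregated alternative with replicate weights, a sketch could look like the following. The weight names follow the usual PISA naming (a final student weight plus 80 replicate weights) but are assumptions here, as are the file name and the indicators y1-y4. REPSE = FAY (.5) is taken from the question above.

```mplus
DATA: FILE = pisa_swe.dat;        ! hypothetical file
VARIABLE:
  NAMES = y1-y4 w_fstuwt w_fstr1-w_fstr80;  ! assumed layout
  USEVARIABLES = y1-y4;
  WEIGHT = w_fstuwt;              ! final student weight
  REPWEIGHTS = w_fstr1-w_fstr80;  ! 80 replicate weights
ANALYSIS:
  TYPE = COMPLEX;
  REPSE = FAY (.5);               ! Fay's method, constant .5
MODEL:
  f BY y1-y4;                     ! single-level CFA
```

Running this alongside a version that uses STRATIFICATION and CLUSTER with WEIGHT is one way to make the comparison suggested in the reply.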
 Maria Rasmusson posted on Tuesday, March 20, 2012 - 1:27 am
Thanks a lot Dr Muthén!
I think I'm beginning to understand the difference between the two approaches. I'm very grateful for your explanation!
 Lisa M. Yarnell posted on Tuesday, March 27, 2012 - 9:03 am
Hello, I am thinking of using the following framework (from the manual) for a two-level random intercept model with time points nested within persons:

VARIABLE: NAMES = y x w xm clus;
WITHIN = x;
BETWEEN = w xm;
CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
%WITHIN%
y ON x;
%BETWEEN%
y ON w xm;

My questions are:
1) If I want to use MODEL INDIRECT, do I need to specify the %WITHIN% and %BETWEEN% portions of the model for MODEL INDIRECT as well? Or, if the indirect paths all involve between-person variables, do I just type the MODEL INDIRECT portion of code as part of the %BETWEEN% section of code above?
2) If a variable is both a within- and between-person variable, can I put it in both the %WITHIN% and %BETWEEN% sections above? For example, we are working with alcohol consumption variables measured four times among the persons in our sample. We believe that alcohol consumption is a between-person variable (or individual difference variable), but that it also varies within persons over time, i.e., people's alcohol consumption can grow over time. Hence, can we put our alcohol consumption variables in both the %WITHIN% and %BETWEEN% sections above, or would this model not run in Mplus?

Thank you for your help,
Lisa Yarnell
 Linda K. Muthen posted on Tuesday, March 27, 2012 - 6:56 pm
MODEL INDIRECT does not have a within and between part. Just specify the within and between indirect effects using IND and VIA statements.
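A minimal sketch of this advice for a between-level indirect effect, assuming a hypothetical between-level mediator m added to the manual's variables. MODEL INDIRECT is its own command after MODEL, with no %WITHIN% or %BETWEEN% headings inside it.

```mplus
VARIABLE:
  NAMES = y x m w clus;           ! m is a hypothetical mediator
  WITHIN = x;
  BETWEEN = m w;
  CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  y ON x;
  %BETWEEN%
  m ON w;
  y ON m w;
MODEL INDIRECT:
  y IND m w;                      ! effect of w on y through m
```

In an IND statement the outcome is listed first and the predictor last, with the mediator in between.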
 Lisa M. Yarnell posted on Thursday, March 29, 2012 - 3:35 pm
What is the benefit of using TYPE=TWOLEVEL over TYPE=COMPLEX for clustered data?

If one has data with persons nested in schools, or time points nested in persons, but no specific hypotheses about random slopes or how variables are related across the two levels, is it just as good to use TYPE=COMPLEX?

This will simplify the code, so that one does not have to use the %WITHIN% and %BETWEEN% lines, which I could see being more useful when specifying random slopes or relationships among variables across the two levels. Is my thinking correct here?
 Linda K. Muthen posted on Thursday, March 29, 2012 - 5:05 pm
There are two issues to consider. The first is whether the model is aggregatable or not. TYPE=COMPLEX or TYPE=TWOLEVEL can be used with aggregatable models. For non-aggregatable models, only TYPE=TWOLEVEL can be used. For more information on this topic, see the following paper on the website:

Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.

The other issue is whether you are interested in modeling the between level. If so, you must use TYPE=TWOLEVEL. If not, you can use TYPE=COMPLEX.
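For the case where the between level is not of interest, a TYPE=COMPLEX setup could be as short as the sketch below (variable names are hypothetical). The model is written as a single-level model, and the CLUSTER variable is used only to adjust standard errors and fit statistics.

```mplus
VARIABLE:
  NAMES = y x clus;               ! hypothetical names
  USEVARIABLES = y x;
  CLUSTER = clus;
ANALYSIS: TYPE = COMPLEX;         ! SEs adjusted for clustering
MODEL:
  y ON x;                         ! one overall regression
```

Modeling the between level instead would mean switching to TYPE = TWOLEVEL and writing separate %WITHIN% and %BETWEEN% parts.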
 Lisa M. Yarnell posted on Monday, December 31, 2012 - 6:19 pm
Hi Linda:

1) Is it possible to run a multiple-group model with TYPE=TWOLEVEL RANDOM?

For example, when I try to model the second group, in order to impose constraints across groups, I get this message:
*** ERROR in MODEL command
Random effect variables can only be declared in the GENERAL model.

How would I constrain things across groups if I cannot model the second group?

2) Are there any big things I should know about the WITHIN and BETWEEN portions of the model when trying to make this a multiple-group model? For example, is it possible to constrain WITHIN effects across groups? Is that even a logical constraint?
 Lisa M. Yarnell posted on Monday, December 31, 2012 - 9:33 pm
I think this issue is resolved, Linda. You can ignore this post because I figured out how to model the random effects in the general model for a two-group model--so it is indeed possible.
 Kim Bellens posted on Thursday, August 01, 2013 - 1:50 am

I would like to do analyses with TIMSS, which makes use of a two-stage stratified sample design within each country: after stratification, random schools are sampled and in these schools classes are sampled. Therefore, I use TYPE = COMPLEX TWOLEVEL. This model is run separately for each country.

Now I would like to take into account the country level as an additional level to do analyses over countries, by using COMPLEX THREELEVEL. In this, countries are the highest level, schools and classes are the lower levels. Stratification is only done separately for each country, which means that country is a higher level than the level at which stratification takes place. In running these analyses, I do not receive any output, so I assume I'm doing something wrong. I assume this relates to the question asked by Moin Syed above. Therefore, I was wondering whether this is already possible in Mplus?

Otherwise: would it be a good idea to use multigroup analysis (for each country) to get separate estimates for all countries while still including them all in one analysis (instead of a separate analysis for each country)? I'm a bit cautious about that, because the overall estimates will not be right (as the stratification variable will be mixed up across countries). I'd like to hear your advice on this!

Many thanks for your help!
 Linda K. Muthen posted on Thursday, August 01, 2013 - 2:27 pm
Please send the input and data for the analysis that did not give output to support@statmodel.com.

You can redefine the stratification variable so that it is correct. I think

strata=10000*country +strata

will ensure that the stratification is accounted for properly.
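In an input file, the suggested recoding could be placed in the DEFINE command, roughly as sketched below. All names are hypothetical, TYPE = COMPLEX is used only to keep the example short, and the recoding assumes numeric country codes and within-country stratum codes below 10000, so that no two countries share a stratum value.

```mplus
VARIABLE:
  NAMES = country strata schid wt y x;  ! hypothetical names
  USEVARIABLES = y x;
  STRATIFICATION = strata;
  CLUSTER = schid;
  WEIGHT = wt;
DEFINE:
  strata = 10000*country + strata;      ! unique across countries
ANALYSIS: TYPE = COMPLEX;
MODEL:
  y ON x;
```

For example, stratum 12 in country 752 would become 7520012. School IDs would need to be unique across countries in the same way.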
 Olev Must posted on Friday, January 03, 2014 - 9:58 am
I am running a "Twolevel Complex" model on the PISA data (plausible values; within = student; between = country; covariates on the between level). As I understand it, I must use the WEIGHT, CLUSTER, and STRATIFICATION options, and replicate weights are not allowed. There are two stratification variables in the PISA data: 1) the randomized final variance stratum (1-80), in which the schools in each country are divided into 80 strata; and 2) the original stratum (some countries have only one stratum, but typically the schools in a country are divided into several original strata). My question is: is the randomized final variance stratum the appropriate choice for obtaining correct model estimates, including sampling variances and standard errors?
Thank you very much,
 Tihomir Asparouhov posted on Friday, January 03, 2014 - 11:45 am

From your description it looks to me that you will only be able to use type=twolevel with weights. Since the country is the between level in the model, additional cluster variable and strata can be used only if they are nested above the between level cluster variable. There is no problem with that in principle. If you want to add an additional level in the model such as school you can use type=threelevel.
 Olev Must posted on Sunday, January 05, 2014 - 7:16 am

I used this simple approach: only twolevel + weights. I ran the model 5 times, as there are 5 sets of plausible values, and reported the averages of the parameters. Do you see any mistakes or unused possibilities here? (The differences between the models are really very small.)
The reviewer claimed that the sampling variance and standard errors of the parameters are not correctly estimated. This was the reason why I started to investigate replicates and stratification.

Thank you for your reply!
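One option that may be relevant here is TYPE = IMPUTATION in the DATA command, which runs the model on each listed data set, averages the parameter estimates, and adds the between-data-set variation to the standard errors. The sketch below assumes a hypothetical file pvlist.dat naming the five data sets, one file name per line, each data set holding one plausible value as y. All variable names are hypothetical.

```mplus
DATA:
  FILE = pvlist.dat;              ! hypothetical list of 5 files
  TYPE = IMPUTATION;              ! one analysis per listed file
VARIABLE:
  NAMES = y x w cntry wt;         ! hypothetical names
  WITHIN = x;
  BETWEEN = w;
  CLUSTER = cntry;                ! country as between level
  WEIGHT = wt;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  y ON x;
  %BETWEEN%
  y ON w;
```

Averaging the five sets of estimates by hand gives the same point estimates, but it does not carry the variation across plausible values into the standard errors.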
 Christoph Weber posted on Monday, August 11, 2014 - 12:03 pm
Dear Mplus-Team!
I'm using type = complex twolevel random to test for cross-level interactions. The levels are student, class, school. The cross-level interactions contain student and class level variables. I don't use any school level variables.
Is type = complex twolevel appropriate for such models?

 Bengt O. Muthen posted on Monday, August 11, 2014 - 5:16 pm
Yes. You can also use Type=Threelevel.
 Christoph Weber posted on Tuesday, August 12, 2014 - 1:07 am
Thanks. With type = threelevel I will get random slopes at the class and school levels. Is it possible to calculate the total random variance of the slope?
 Linda K. Muthen posted on Tuesday, August 12, 2014 - 10:31 am
You can sum the random slope variances at the class and school levels to obtain the total.
 Christoph Weber posted on Tuesday, August 12, 2014 - 2:03 pm
I tried this, but the sum of the class and school slope variance isn't equal to the slope variance of a twolevel model. Shouldn't they be equal?
 Bengt O. Muthen posted on Tuesday, August 12, 2014 - 3:28 pm
Not sure they should be equal. Instead of considering the twolevel run, you can look at TECH4 for the slope to get the total variance. Perhaps you have predictors of the slope, the variation in which also needs to be taken into account.
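A sketch of a three-level random slope model that requests TECH4, using hypothetical variable names and no slope predictors. The CLUSTER option lists the highest level first, and each between level gets its own %BETWEEN% heading.

```mplus
VARIABLE:
  NAMES = y x school class;       ! hypothetical names
  WITHIN = x;
  CLUSTER = school class;         ! highest level first
ANALYSIS: TYPE = THREELEVEL RANDOM;
MODEL:
  %WITHIN%
  s | y ON x;                     ! random slope
  %BETWEEN class%
  y s;                            ! class-level variances
  %BETWEEN school%
  y s;                            ! school-level variances
OUTPUT: TECH4;                    ! estimated moments, incl. s
```

In this setup the class-level and school-level variances of s are the two components being summed. Once predictors of s are added, these become residual variances, which is one reason the sum can differ from the TECH4 total.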
 Lisa M. Yarnell posted on Wednesday, April 29, 2015 - 1:36 pm
Hello, a previous post and response from Dr. Muthen (above), was as follows:

Lisa M. Yarnell posted on March 29, 2012

What is the benefit of using TYPE=TWOLEVEL over TYPE=COMPLEX for clustered data? . . .

Linda K. Muthen posted on March 29, 2012

There are two issues to consider. The first is whether the model is aggregatable or not. TYPE=COMPLEX or TYPE=TWOLEVEL can be used with aggregatable models. For non-aggregatable models, only TYPE=TWOLEVEL can be used. For more information on this topic, see the following paper on the website:

Muthén, B. & Satorra, A. (1995). . . .

The other issue is whether you are interested in modeling the between level. If so, you must use TYPE=TWOLEVEL. If not, you can use TYPE=COMPLEX.

Was it intended to actually say: "For non-aggregatable models, only TYPE=*COMPLEX* can be used"?

I gather this from the abstract of Muthen & Satorra, 1995: "One method, termed aggregated analysis, computes the usual parameter estimates but adjusts standard errors and goodness-of-fit model testing.
The other method, termed *disaggregated* analysis, includes a new set of parameters reflecting the *complex sample structure.*"

Thank you.
 Lisa M. Yarnell posted on Wednesday, April 29, 2015 - 1:59 pm
Actually, upon reading further, the prior response was correct. In a two-level model, design features can be incorporated by modeling them explicitly, e.g., including a set of dummy variables for region (or other stratification variables) as predictors in the model. This would not necessitate TYPE=COMPLEX.

Is this latter understanding correct?

Thank you.
 Bengt O. Muthen posted on Wednesday, April 29, 2015 - 5:14 pm
Clustering could not be handled without either Type=Complex or Type=Twolevel.
 Tao Yang posted on Wednesday, March 22, 2017 - 7:30 am
Hello, my data has individuals nested within groups and groups nested in departments. The outcome y is person level. Predictor x1 is average y of a group excluding the focal person, and predictor x2 is average y of a department excluding the focal group. So x1 and x2 are at group and dept levels respectively, but both also have variance within group and dept (due to computation). I tried Type = threelevel or Type = complex twolevel and they did not work due to within-level variance of x1 and x2.

It works if I do TYPE=COMPLEX with STRATIFICATION=deptID and CLUSTER=groupID.

1) Because depts are a sample (rather than the population), would STRATIFICATION=deptID lead to inaccurate SEs?

2) What might be a better way to specify the model? Thanks!
 Bengt O. Muthen posted on Wednesday, March 22, 2017 - 6:57 pm
Please send the problematic output to Support along with your license number.
 dummyvariable123 posted on Friday, October 27, 2017 - 6:48 am
Hello, I would like to run a two-level model (clustering variable: ourid) while accounting for the clustering at the third level (clustering variable: classroom1). Does my model in its current form take into account clustering at the classroom level?

yanti_OT c_fanti_OT c_komanti_OT wave0 Qwave;
CLUSTER = ourid classroom1;
within = wave0 c_fanti_OT c_komanti_OT Qwave;
define: Qwave = wave0*wave0;

Type is twolevel complex;

yanti_OT ON wave0 Qwave;
yanti_OT ON c_fanti_OT;
yanti_OT ON c_komanti_OT;

 Linda K. Muthen posted on Friday, October 27, 2017 - 3:03 pm
If you say TYPE = COMPLEX TWOLEVEL and CLUSTER= ourid classroom1, complex uses ourid and twolevel uses classroom1.
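Reading this reply literally, the first CLUSTER variable is used by COMPLEX and the second by TWOLEVEL. If the goal is a two-level model on ourid with standard errors adjusted for classroom1, reversing the order may be what is needed. The fragment below reuses the poster's variable names and assumes the unlabeled variable list in the post is the USEVARIABLES list. The NAMES list and MODEL command are omitted.

```mplus
VARIABLE:
  USEVARIABLES = yanti_OT c_fanti_OT c_komanti_OT wave0 Qwave;
  CLUSTER = classroom1 ourid;     ! COMPLEX: classroom1; TWOLEVEL: ourid
  WITHIN = wave0 c_fanti_OT c_komanti_OT Qwave;
DEFINE: Qwave = wave0*wave0;
ANALYSIS: TYPE = COMPLEX TWOLEVEL;
```

This presumes each ourid belongs to a single classroom1, so that the two-level clusters are nested within the sampling clusters.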