
Moin Syed posted on Wednesday, May 11, 2011  9:09 am



Hello. I am working with a nationally representative data set and am specifying WEIGHT, STRATIFICATION, and CLUSTER using TYPE = COMPLEX. The analysis I want to conduct links county-level variables to individual-level ones. Thus, individuals are nested within counties, and so I am also using TYPE = TWOLEVEL. However, when I include the county variables in the BETWEEN statement I get the following error:

*** ERROR One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement.

This is because CLUSTER and county refer to different forms of nesting. That is, each cluster does not correspond to a single county. Is there a way to do the analysis that I want to do, or must the CLUSTER and BETWEEN variables match up? Thanks in advance for your help!


A variable on the BETWEEN list cannot vary within any of the clusters defined by the variable named in the CLUSTER option. Perhaps county should be your cluster variable.
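A minimal Mplus sketch of that suggestion, with county as the CLUSTER variable so that county-level predictors can go on the BETWEEN list. All variable names (y, x, z, county, wt) are hypothetical, the DATA command is omitted, and the sampling strata and PSUs are deliberately left out, which is exactly the trade-off raised in the follow-up question.

```
VARIABLE:
  NAMES = y x z county wt;   ! hypothetical variable names
  USEVARIABLES = y x z;
  WITHIN = x;                ! individual-level predictor
  BETWEEN = z;               ! county-level predictor: must be constant within each county
  CLUSTER = county;          ! county defines the level-2 units
  WEIGHT = wt;
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  y ON x;
  %BETWEEN%
  y ON z;                    ! county-level effect on the random intercept of y
```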

Moin Syed posted on Wednesday, May 11, 2011  9:34 am



Thank you for your quick response. If I used county as my cluster variable, then the standard errors would not be correct, since the cluster variable in conjunction with the stratification variable is needed for the calculation. Right? Can you think of any way to work around this problem?


You can include both a cluster variable and a stratification variable, which I think you are doing. Are you sure you are identifying the variables correctly?

Moin Syed posted on Wednesday, May 11, 2011  11:09 am



Sorry if I was not clear. I am specifying a stratification variable and a cluster variable as part of the sampling specification. I also want to include county as another cluster variable, a nesting variable that has nothing to do with how the sampling occurred, so that I can model county-level effects. The county variable varies within some of the sampling clusters.


It sounds like you have subjects classified by both cluster and county. If so, one possible approach is crossed random effects modeling, where random variation due to both cluster and county is allowed. This is not yet available in Mplus.

Moin Syed posted on Thursday, May 12, 2011  5:22 pm



Ah, thanks so much! 


I am using the information in Web Note 12 to test the addition of effects (model M1) to a restricted model (model M0) for multilevel complex survey data. The WN12 paper says that the LL for M10 should be the same as the LL for M0 when M10 uses the parameter estimates from the SVALUES output of M0. My M10 LL is not equal to the M0 LL but instead equals the M1 LL (to within a couple of decimals). The TECH5 output shows no iterations in the GRADIENT or QUASI-NEWTON sections until near the end of the output, where there are usually 24 iterations in the section ITERATIONS FROM THE QUASI-NEWTON EM ACCELERATOR. My questions are:
1. Am I misunderstanding which LL the M10 LL should equal?
2. Are those iterations at the end a problem?
3. If the LL for M10 is wrong, how do I fix it?
Thanks, Bruce


1. Your understanding is correct.
2. Probably yes. What are the parameter estimates of the M10 run: the same as those for M0 or those for M1?
3. Try using mconvergence=100000; if this doesn't work, send your example to support@statmodel.com.
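A sketch of the relevant parts of an M10 input along the lines of that answer, assuming M0 was run with OUTPUT: SVALUES and that M1 adds one path that M0 fixes at 0. The model, the variable names, and the 0.5 starting values are placeholders, not values from the question. The idea is that M10 is the M1 model started at the M0 estimates, and the large MCONVERGENCE value from the answer is meant to keep the EM iterations seen in TECH5 from moving the estimates away from M0.

```
! M10: the M1 model, started at the M0 estimates
ANALYSIS:
  ! keep the same TYPE (and other settings) as in the M0 and M1 runs
  MCONVERGENCE = 100000;   ! value suggested in the answer above
MODEL:
  %WITHIN%
  y ON x*0.5;      ! placeholder: paste the M0 estimate from the SVALUES output
  y ON z*0;        ! hypothetical parameter added in M1, started at its M0 value of 0
  %BETWEEN%
  y*0.5;           ! placeholder: M0 between-level variance from SVALUES
OUTPUT:
  TECH5;           ! check whether any iterations were taken
```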


Hi! I'm a beginner at this and I have some questions for a study I'm doing for my PhD thesis. I think I need to combine the two modeling approaches for my complex survey data. I'm using PISA data, which has a two-stage sampling design and within, between, and replicate weights. Is it possible to use all these weights if I choose TYPE = COMPLEX TWOLEVEL? The school sample is stratified, and the students are selected randomly within each school. I want to do a confirmatory factor analysis on the Swedish sample and analyze the variance at both the school and the student level. Do I need to include the STRATIFICATION and CLUSTER options in my analysis? If so, could STRATIFICATION be the stratum variable and CLUSTER the school ID? Thanks in advance! Best regards, Maria


Replicate weights cannot be used with TYPE=TWOLEVEL. You can use the STRATIFICATION, CLUSTER, BWEIGHT, and WEIGHT options with TYPE=TWOLEVEL. 
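A sketch of that setup for the two-level CFA in the question, with STRATIFICATION set to the stratum, CLUSTER to the school ID, and the student and school weights on WEIGHT and BWEIGHT. All variable names, the six items, and the one-factor-per-level structure are hypothetical (DATA command omitted); which PISA weight variables to use, and how they should be scaled, would need checking against the PISA documentation.

```
VARIABLE:
  NAMES = stratum schoolid w_stu w_sch q1-q6;   ! hypothetical names
  USEVARIABLES = q1-q6;
  STRATIFICATION = stratum;   ! explicit school stratum
  CLUSTER = schoolid;         ! schools are the level-2 units
  WEIGHT = w_stu;             ! within-level (student) weight
  BWEIGHT = w_sch;            ! between-level (school) weight
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  fw BY q1-q6;                ! student-level factor
  %BETWEEN%
  fb BY q1-q6;                ! school-level factor
```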


Thanks Dr. Muthén! Do you think I can get proper estimates of the standard errors without the replicate weights? PISA recommends using the replicate weights to avoid biased estimates of the standard errors. Maybe I'm better off without TYPE=TWOLEVEL, using only TYPE=COMPLEX? I understand that it is possible to use REPSE=FAY (.5) with TYPE=COMPLEX. Is that right? I am so confused! Thanks again!


Replicate weights contain information on stratification and clustering. If you use the STRATIFICATION and CLUSTER options with the WEIGHT option, you should get the same results. You can do it both ways to see. I don't believe the replicate weights contain any other information about the sampling design. You can use TYPE=COMPLEX if your model is aggregatable. Otherwise, you should use TYPE=TWOLEVEL. See the following paper, which is available on the website, for more information about this: Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.
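The two routes mentioned in that answer can be sketched as alternative Mplus fragments (not one input file). The weight names w_fstuwt and w_fstr1-w_fstr80 follow the usual PISA student-file names and are an assumption here; stratum and schoolid are hypothetical stand-ins for the design variables. Whether the two give the same SEs for a given data set is something to check by running both, as the answer suggests.

```
! Option A: replicate weights (Fay's method with factor .5)
VARIABLE:
  WEIGHT = w_fstuwt;               ! final student weight
  REPWEIGHTS = w_fstr1-w_fstr80;   ! 80 replicate weights
ANALYSIS:
  TYPE = COMPLEX;
  REPSE = FAY (.5);

! Option B: design variables instead of replicate weights
VARIABLE:
  WEIGHT = w_fstuwt;
  STRATIFICATION = stratum;        ! hypothetical stratum variable
  CLUSTER = schoolid;              ! hypothetical school ID
ANALYSIS:
  TYPE = COMPLEX;
```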


Thanks a lot Dr Muthén! I think I'm beginning to understand the difference between the two approaches. I'm very grateful for your explanation! 


Hello, I am thinking of using the following framework (from the manual) for a two-level random intercept model with time points nested within persons:

  VARIABLE: NAMES = y x w xm clus;
    WITHIN = x;
    BETWEEN = w xm;
    CLUSTER = clus;
  ANALYSIS: TYPE = TWOLEVEL;
  MODEL:
    %WITHIN%
    y ON x;
    %BETWEEN%
    y ON w xm;

My questions are:
1) If I want to use MODEL INDIRECT, do I need to specify %WITHIN% and %BETWEEN% portions for MODEL INDIRECT as well? Or, if the indirect paths all involve between-person variables, do I just type the MODEL INDIRECT code as part of the %BETWEEN% section above?
2) If a variable is both a within- and a between-person variable, can I put it in both the %WITHIN% and %BETWEEN% sections above? For example, we are working with alcohol consumption measured four times for each person in our sample. We believe that alcohol consumption is a between-person (individual difference) variable, but that it also varies within persons over time, i.e., people's alcohol consumption can grow over time. Can we put our alcohol consumption variables in both the %WITHIN% and %BETWEEN% sections, or would this model not run in Mplus?
Thank you for your help, Lisa Yarnell


MODEL INDIRECT does not have a within and between part. Just specify the within and between indirect effects using IND and VIA statements. 
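A sketch of what that answer describes, applied to the model in the question. It assumes a hypothetical between-level path xm ON w (not in the original model) so that there is an indirect effect to test. MODEL INDIRECT is a separate command after MODEL, not part of %BETWEEN%; because w and xm exist only on the between level, the IND statement here refers to between-level effects.

```
VARIABLE:
  NAMES = y x w xm clus;
  WITHIN = x;
  BETWEEN = w xm;
  CLUSTER = clus;
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  y ON x;
  %BETWEEN%
  xm ON w;         ! hypothetical a-path added so there is an indirect effect
  y ON w xm;
MODEL INDIRECT:    ! separate command, with no %WITHIN% or %BETWEEN% part
  y IND xm w;      ! w -> xm -> y, all on the between level
```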


What is the benefit of using TYPE=TWOLEVEL over TYPE=COMPLEX for clustered data? If one has data with persons nested in schools, or time points nested in persons, but no specific hypotheses about random slopes or about how variables are related across the two levels, is it just as good to use TYPE=COMPLEX? This would simplify the code, since one would not need the %WITHIN% and %BETWEEN% sections, which I can see being more useful when specifying random slopes or relationships among variables across the two levels. Is my thinking correct here?


There are two issues to consider. The first is whether the model is aggregatable or not. TYPE=COMPLEX or TYPE=TWOLEVEL can be used with aggregatable models. For non-aggregatable models, only TYPE=TWOLEVEL can be used. For more information on this topic, see the following paper on the website: Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316. The other issue is whether you are interested in modeling the between level. If so, you must use TYPE=TWOLEVEL. If not, you can use TYPE=COMPLEX.
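To make the contrast concrete, here is a hypothetical regression of y on x for data clustered by clus, written both ways (two separate input fragments, DATA command omitted). In the TYPE=COMPLEX version there is a single slope whose standard error is corrected for clustering. In the TYPE=TWOLEVEL version x is on neither the WITHIN nor the BETWEEN list, so both y and x are decomposed into within- and between-cluster parts, with a separate slope at each level. The two versions need not give the same point estimates.

```
! TYPE=COMPLEX: one slope, SEs corrected for clustering
VARIABLE:
  NAMES = y x clus;
  CLUSTER = clus;
ANALYSIS:
  TYPE = COMPLEX;
MODEL:
  y ON x;

! TYPE=TWOLEVEL: within- and between-cluster slopes estimated separately
VARIABLE:
  NAMES = y x clus;
  CLUSTER = clus;
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  y ON x;
  %BETWEEN%
  y ON x;
```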


Hi Linda: 1) Is it possible to run a multiple-group model with TYPE=TWOLEVEL RANDOM? For example, when I try to specify the model for the second group in order to impose constraints across groups, I get this message: *** ERROR in MODEL command Random effect variables can only be declared in the GENERAL model. How would I constrain parameters across groups if I cannot specify the model for the second group? 2) Is there anything important I should know about the WITHIN and BETWEEN portions of the model when making it a multiple-group model? For example, is it possible to constrain WITHIN effects across groups? Is that even a logical constraint?


I think this issue is resolved, Linda. You can ignore this post: I figured out how to specify the random effects in the general model for a two-group model, so it is indeed possible.
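A hedged sketch of that pattern, assuming a hypothetical grouping variable g that is constant within clusters and that the GROUPING option is used for the multiple-group part. The random slope is declared with | only in the overall MODEL; the group-specific MODEL g2 can mention s but does not re-declare it. A shared label holds a within-level slope equal across groups (question 2). Variable names are hypothetical, and Mplus's default cross-group equalities are not spelled out here.

```
VARIABLE:
  NAMES = y x z g clus;      ! hypothetical variable names
  USEVARIABLES = y x z;
  WITHIN = x z;
  CLUSTER = clus;
  GROUPING = g (1 = g1  2 = g2);
ANALYSIS:
  TYPE = TWOLEVEL RANDOM;
MODEL:
  %WITHIN%
  s | y ON x;      ! random slope: declared once, in the overall MODEL
  y ON z (bz);     ! label bz: this within-level slope is equal across groups
  %BETWEEN%
  y s;
MODEL g2:
  %WITHIN%
  y ON z (bz);     ! repeating the same label keeps the equality in group 2
  %BETWEEN%
  s;               ! mentions the variance of s for group 2; no | statement here
```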

Kim Bellens posted on Thursday, August 01, 2013  1:50 am



Hi, I would like to do analyses with TIMSS, which uses a two-stage stratified sample design within each country: after stratification, schools are randomly sampled, and within these schools classes are sampled. Therefore, I use TYPE = COMPLEX TWOLEVEL. This model runs separately for each country. Now I would like to take the country level into account as an additional level to do analyses across countries, using COMPLEX THREELEVEL. Here, countries are the highest level, and schools and classes are the lower levels. Stratification is done separately for each country, which means that country is a higher level than the level at which stratification takes place. When running these analyses I do not receive any output, so I assume I'm doing something wrong. I assume this relates to the question asked by Moin Syed above. Therefore, I was wondering whether this is already possible in Mplus. Otherwise, would it be a good idea to use multigroup analysis (one group per country) to get separate estimates for all countries while taking them all together in one analysis (instead of a separate analysis for each country)? I'm a bit cautious about that, because the overall estimates will not be right (as the stratification variable will be mixed up across countries). I'd like to hear your advice on this! Many thanks for your help!


Please send the input and data for the analysis that did not give output to support@statmodel.com. You can redefine the stratification variable so that it is correct. I think strata=10000*country +strata will ensure that the stratification is accounted for properly. 
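In Mplus the recode suggested above can be done in DEFINE. A sketch, assuming country and strata are numeric variables already on the NAMES list and that stratum codes within a country are always below 10000, so that, for example, country 3 with stratum 12 becomes 30012 and no two countries share a stratum code.

```
VARIABLE:
  STRATIFICATION = strata;
DEFINE:
  ! unique stratum codes across countries, e.g. country 3, stratum 12 -> 30012
  strata = 10000*country + strata;
```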

Olev Must posted on Friday, January 03, 2014  9:58 am



Hi, I am running a "Two-level Complex" model on the PISA data (plausible values; within = student; between = country; covariates at the between level). As I understand it, I must use the WEIGHT, CLUSTER, and STRATIFICATION options; replicate weights are not allowed. There are two stratification variables in the PISA data: 1) RANDOMIZED FINAL VARIANCE STRATUM (1-80): the schools in each country are divided into 80 strata; 2) Original stratum: in some countries there is only one stratum, but typically the schools in a country are divided into several original strata. My question is: is the "randomized final variance stratum" the appropriate choice for obtaining correct model estimates, including the sampling variance and standard errors? Thank you very much, Olev


Olev, from your description it looks to me that you will only be able to use TYPE=TWOLEVEL with weights. Since country is the between level in the model, an additional cluster variable and strata can be used only if they are nested above the between-level cluster variable. There is no problem with that in principle. If you want to add an additional level to the model, such as school, you can use TYPE=THREELEVEL.

Olev Must posted on Sunday, January 05, 2014  7:16 am



Hi, Tihomir! I used this simple approach: only two-level plus weights. I ran the model 5 times, since there are 5 sets of plausible values, and reported the averages of the parameter estimates. Do you see any mistakes or unused possibilities here? (The differences between the runs were really very small.) The reviewer claimed that the sampling variance and the standard errors of the parameters are not correctly estimated. This was the reason I started to investigate replicate weights and stratification. Thank you for your reply!
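For reference, Mplus can also analyze the five plausible-value data sets in one run with TYPE = IMPUTATION in the DATA command. This averages the estimates over the data sets and combines the standard errors so that they also reflect the variation between plausible values, rather than averaging each quantity by hand. A sketch, assuming five data files with identical layouts listed in a hypothetical pvlist.dat; the variable names (cntid, w_fstuwt, x, gdp, pvmath) and the model are hypothetical, and cntid is assumed to be a numeric country code.

```
DATA:
  FILE = pvlist.dat;     ! lists pv1.dat ... pv5.dat, one file name per line
  TYPE = IMPUTATION;     ! run the model on each file and pool the results
VARIABLE:
  NAMES = cntid w_fstuwt x gdp pvmath;   ! pvmath holds PV1 in pv1.dat, PV2 in pv2.dat, ...
  USEVARIABLES = x gdp pvmath;
  WITHIN = x;            ! student-level covariate
  BETWEEN = gdp;         ! country-level covariate
  CLUSTER = cntid;       ! countries are the between-level units
  WEIGHT = w_fstuwt;
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  pvmath ON x;
  %BETWEEN%
  pvmath ON gdp;
```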


Dear Mplus Team! I'm using TYPE = COMPLEX TWOLEVEL RANDOM to test for cross-level interactions. The levels are student, class, and school. The cross-level interactions involve student- and class-level variables; I don't use any school-level variables. Is TYPE = COMPLEX TWOLEVEL appropriate for such models? Thanks, Christoph


Yes. You can also use Type=Threelevel. 
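A sketch of the TYPE = COMPLEX TWOLEVEL RANDOM setup for this question, assuming a hypothetical outcome y, student-level predictor x, and class-level moderator w. With two CLUSTER variables, COMPLEX uses the first (school) and TWOLEVEL the second (class), consistent with a later answer in this thread; w must be constant within classes.

```
VARIABLE:
  NAMES = y x w school class;   ! hypothetical variable names
  USEVARIABLES = y x w;
  WITHIN = x;                   ! student-level predictor
  BETWEEN = w;                  ! class-level moderator
  CLUSTER = school class;       ! first: used by COMPLEX; second: TWOLEVEL clusters
ANALYSIS:
  TYPE = COMPLEX TWOLEVEL RANDOM;
MODEL:
  %WITHIN%
  s | y ON x;                   ! slope of x varies across classes
  %BETWEEN%
  y s ON w;                     ! cross-level interaction: w predicts the slope s
  y WITH s;
```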


Thanks. With TYPE = THREELEVEL I get random slope variances at both the class and the school level. Is it possible to calculate the total random variance of the slope? Christoph


You can sum the random slope variances at the class and school levels to obtain the total.
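One way to get that total, with a delta-method standard error, is to label the slope variance at each level and add the two in MODEL CONSTRAINT. A sketch assuming a hypothetical outcome y, student-level predictor x, and a random slope s with no predictors at either level. If s is regressed on class- or school-level variables, the labeled quantities become residual variances, which is one reason such a sum can differ from a two-level slope variance.

```
VARIABLE:
  NAMES = y x school class;   ! hypothetical variable names
  USEVARIABLES = y x;
  WITHIN = x;                 ! student-level predictor
  CLUSTER = school class;     ! level-3 cluster first, then level-2
ANALYSIS:
  TYPE = THREELEVEL RANDOM;
MODEL:
  %WITHIN%
  s | y ON x;
  %BETWEEN class%
  s (vclass);                 ! slope variance across classes within schools
  %BETWEEN school%
  s (vschool);                ! slope variance across schools
MODEL CONSTRAINT:
  NEW(vtotal);
  vtotal = vclass + vschool;  ! total slope variance
```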


I tried this, but the sum of the class- and school-level slope variances isn't equal to the slope variance from a two-level model. Shouldn't they be equal? Christoph


I am not sure they should be equal. Instead of considering the two-level run, you can look at TECH4 for the slope to get the total variance. Perhaps you have predictors of the slope, whose variation also needs to be taken into account.


Hello, a previous post and response from Dr. Muthén (above) read as follows:

Lisa M. Yarnell posted on March 29, 2012: "What is the benefit of using TYPE=TWOLEVEL over TYPE=COMPLEX for clustered data? . . ."

Linda K. Muthén posted on March 29, 2012: "There are two issues to consider. The first is whether the model is aggregatable or not. TYPE=COMPLEX or TYPE=TWOLEVEL can be used with aggregatable models. For non-aggregatable models, only TYPE=TWOLEVEL can be used. For more information on this topic, see the following paper on the website: Muthén, B. & Satorra, A. (1995). . . . The other issue is whether you are interested in modeling the between level. If so, you must use TYPE=TWOLEVEL. If not, you can use TYPE=COMPLEX."

Was it intended to actually say "For non-aggregatable models, only TYPE=*COMPLEX* can be used"? I gather this from the abstract of Muthén & Satorra (1995): "One method, termed aggregated analysis, computes the usual parameter estimates but adjusts standard errors and goodness-of-fit model testing. The other method, termed *disaggregated* analysis, includes a new set of parameters reflecting the *complex sample structure*." Thank you.


Actually, upon reading further, the prior response was correct. In a twolevel model, design features can be incorporated by modeling them explicitly, e.g., including a set of dummy variables for region (or other stratification variables) as predictors in the model. This would not necessitate TYPE=COMPLEX. Is this latter understanding correct? Thank you. 


Clustering could not be handled without either Type=Complex or Type=Twolevel. 

Tao Yang posted on Wednesday, March 22, 2017  7:30 am



Hello, my data have individuals nested within groups and groups nested within departments. The outcome y is at the person level. Predictor x1 is the average y of a group excluding the focal person, and predictor x2 is the average y of a department excluding the focal group. So x1 and x2 are at the group and department levels respectively, but, because of how they are computed, both also vary within groups and departments. I tried TYPE = THREELEVEL and TYPE = COMPLEX TWOLEVEL, and neither worked because of the within-level variance of x1 and x2. It works if I use TYPE=COMPLEX with STRATIFICATION=deptID and CLUSTER=groupID. 1) Because the departments are a sample (rather than the population), would STRATIFICATION=deptID lead to inaccurate SEs? 2) What might be a better way to specify the model? Thanks!


Please send the problematic output to Support along with your license number. 


Hello, I would like to run a two-level model (clustering variable: ourid) while accounting for clustering at a third level (clustering variable: classroom1). Does my model in its current form take into account clustering at the classroom level?

  USEVARIABLES IS yanti_OT c_fanti_OT c_komanti_OT wave0 Qwave;
  CLUSTER = ourid classroom1;
  within = wave0 c_fanti_OT c_komanti_OT Qwave;
  define: Qwave = wave0*wave0;
  ANALYSIS: Type is twolevel complex;
    ESTIMATOR IS ML;
  MODEL:
    %within%
    yanti_OT ON wave0 Qwave;
    yanti_OT ON c_fanti_OT;
    yanti_OT ON c_komanti_OT;
    %between%
    yanti_OT;


If you say TYPE = COMPLEX TWOLEVEL and CLUSTER= ourid classroom1, complex uses ourid and twolevel uses classroom1. 
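Following that answer, the order of the CLUSTER variables would need to be swapped for the setup described in the question, so that COMPLEX corrects for classroom1 and TWOLEVEL models ourid. Only the changed lines are shown; this assumes each ourid unit is nested within a single classroom1.

```
VARIABLE:
  ! first variable: clustering handled by COMPLEX (classroom1)
  ! second variable: level-2 units of the TWOLEVEL model (ourid)
  CLUSTER = classroom1 ourid;
ANALYSIS:
  TYPE = TWOLEVEL COMPLEX;
```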

mboer posted on Wednesday, March 27, 2019  3:13 am



Hello, I have data where individuals (level 1) are nested within schools (level 2), which are nested within countries (level 3). I understand from your reply above (3 Oct, 2017) that it is possible to carry out a two-level model where level 1 = individuals and level 2 = schools, while accounting for country-level clustering (by using CLUSTER = country schools and TYPE = COMPLEX TWOLEVEL). However, I was wondering whether it is also possible to carry out a two-level model where level 1 = individuals and level 2 = countries, while accounting for school-level clustering. I tried this by changing the syntax to TYPE = TWOLEVEL COMPLEX, but it seems to yield results similar to the first approach. Or do I need to specify a TYPE = THREELEVEL model here? My dependent variable is categorical and I would like to obtain logit estimates, so I would prefer a two-level analysis with logistic regression. Thank you in advance.


No, it is not possible to carry out a two-level model where level 1 = individuals and level 2 = countries while accounting for school-level clustering. You can use THREELEVEL, but with categorical outcomes only Bayes with probit is available. You could use TWOLEVEL for individuals and schools and use multiple-group analysis for countries (that is, viewing country as a fixed mode of variation instead of a random mode of variation); see also the paper on our website: Muthén, B. & Asparouhov, T. (2018). Recent methods for the study of measurement invariance with many groups: Alignment and random effects. Sociological Methods & Research, 47(4), 637-664. DOI: 10.1177/0049124117701488

Tsz Tan Lau posted on Thursday, January 16, 2020  7:59 am



Hello Drs. Muthén, I have data where individuals are nested in schools and schools are nested in countries. I am interested in modelling level 1 and level 2 effects. From my understanding, both TYPE=THREELEVEL and TYPE=TWOLEVEL COMPLEX can be used to account for country clustering. I ran the same analysis using both TYPE = THREELEVEL and TWOLEVEL COMPLEX and obtained different path coefficients. May I know why this may be the case? Thank you in advance.


I assume that you have many (at least 20, preferably more than 50) countries to make these analyses meaningful. TWOLEVEL COMPLEX only corrects the SEs but doesn't estimate any three-level parameters. Not all models are aggregatable, so the two approaches shouldn't be expected to give the same point estimates; see Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.
