Is type=complex or type=multilevel th...
Mplus Discussion > Multilevel Data/Complex Sample >
 Alex Zablah posted on Thursday, March 10, 2005 - 11:31 am
Hi, hope all is well. I was hoping to get your insight/input on the following:

I have survey data (6 dependent variables) for 300 customers. These 300 customers belong to one of 10 different providers. Hence, the cases are not IID.

I also have survey data from these 10 providers (1 independent variable).

I want to test the relationship between the provider-level independent variable and three of the customer-level dependent variables using structural equation modeling (in Mplus 3.10). Given that my between-level sample size is only 10, that means that multi-level modeling is out of the question. Right?

If I disaggregate the provider level data (i.e. assign the same value for the independent variable to each customer who shares the same provider) would it be appropriate to run the SEM model (n=300) using the type=complex design?

I'd appreciate any feedback/suggestions you could offer.

Best regards,

 bmuthen posted on Thursday, March 10, 2005 - 4:27 pm
Yes, I think 10 clusters makes 2-level modeling not perform well. Unfortunately, our simulations suggest that type=complex also needs more clusters than 10 - at least 20. So perhaps the only way is to view "provider" as a fixed effect instead of a random effect and use 9 provider dummy variables as covariates. Bayesian analysis using priors is the only way I have seen that attempts to deal with such a small number of clusters.
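The fixed-effect suggestion above (9 dummy variables for 10 providers) can be sketched in Python as a data-preparation step done before the file is read into Mplus. This is only an illustration: the function name, the `prov_` column prefix, and the evenly spread toy data are assumptions, not anything from the posts.

```python
def provider_dummies(provider_ids, reference=None):
    """Return (column_names, rows) of 0/1 indicators, omitting one reference provider."""
    levels = sorted(set(provider_ids))
    if reference is None:
        reference = levels[0]  # assumption: lowest id serves as the reference category
    kept = [p for p in levels if p != reference]
    names = [f"prov_{p}" for p in kept]
    # One row per customer; at most one indicator is 1, none for the reference provider.
    rows = [[1 if pid == p else 0 for p in kept] for pid in provider_ids]
    return names, rows


# Hypothetical data: 300 customers spread evenly over providers 1..10.
provider_ids = [(i % 10) + 1 for i in range(300)]
names, rows = provider_dummies(provider_ids)
print(len(names))   # number of dummy columns
print(rows[1])      # indicators for a customer of provider 2
```

With 10 providers this yields 9 indicator columns; customers of the reference provider get zeros on all of them.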
 Herb Marsh posted on Monday, April 08, 2013 - 6:33 pm
Why is it no longer possible to use Type=complex to get correct standard errors for analyses done at level 1 when there are three levels?

For example, results at the student level when students are nested within classes, and classes are nested within schools.
 Linda K. Muthen posted on Tuesday, April 09, 2013 - 9:43 am
COMPLEX TWOLEVEL, THREELEVEL, and COMPLEX THREELEVEL are all available. There have been no changes. I'm not sure I understand your question.
 Herb Marsh posted on Tuesday, April 09, 2013 - 9:14 pm
Linda: Here is what I did and the error message that I got. I recall that it was possible to have two cluster variables when analyses were done only at level 1, but maybe I am mistaken. In any case, why is it apparently not allowed?

cluster is ID_Schl Id_Class ;

*** ERROR in VARIABLE command
Two cluster variables are allowed for TYPE=TWOLEVEL COMPLEX. Only one
cluster variable is allowed for TYPE=COMPLEX (single level). Limit on
the number of cluster variables reached.
 Linda K. Muthen posted on Wednesday, April 10, 2013 - 6:54 am
We've never allowed more than one cluster variable with TYPE=COMPLEX. You would need to use TWOLEVEL COMPLEX to handle two cluster variables.
 Melissa Kull posted on Monday, April 22, 2013 - 7:45 am

I have a three level model with observations (level 1) nested within individuals (level 2) nested within cities (level 3). The data I am using require sampling weights and we only have observations at 2 timepoints. From Chapter Nine of the User's Guide (p. 252) I am not sure whether I should treat this model as TYPE=TWOLEVEL (and treat our two observation points as "time") or TYPE=THREELEVEL and treat our first level as cross sectional? In addition, the outcome we are using is a count variable and it is my understanding that users can't use sample weights with count variables in TYPE=THREELEVEL?

Many thanks,
 Linda K. Muthen posted on Monday, April 22, 2013 - 12:09 pm
I would treat this as a TWOLEVEL analysis with data in the wide format. THREELEVEL is not available for count variables.
 anonymous posted on Wednesday, May 08, 2013 - 12:04 pm
I'm aware that TYPE=COMPLEX with the cluster option adjusts for non-independence in terms of the chi-square statistic and the standard errors of the model, but not the parameter estimates (parameter estimates are adjusted for with the multi-level modeling). Is it that the TYPE=COMPLEX and cluster option only adjusts for the parameter estimates' significance, but not their magnitude? I'm wondering whether it is appropriate to estimate a model of treatment effects involving children nested within schools using the TYPE=COMPLEX and cluster option.
 Linda K. Muthen posted on Wednesday, May 08, 2013 - 12:17 pm
Parameter estimates are adjusted if the WEIGHT option is used. There is no difference between COMPLEX and TWOLEVEL in this regard.
 Elina Dale posted on Sunday, May 12, 2013 - 9:11 am
Dear Dr. Muthen,

I am wondering about the difference between TYPE=COMPLEX and TYPE=TWOLEVEL analysis of SEM in MPlus.

In traditional regression modeling, there is a distinction between population average and subject specific models. Population average models such as GEE describe the covariance among clustered observations, whereas SS/hierarchical models explain the source of this covariance. So, the coefficients are interpreted differently: PA model estimates the difference in Y b/n group A with X and group B without X; the SS model the expected change in individual's probability of Y given change in X.

I am wondering if I use TYPE=COMPLEX in my SEM as I have clustered data, the coefficient from my structural model - effect of treatment X on a latent factor F - is it interpreted as PA or SS? In other words, with specification COMPLEX, do we have a population average model or random effects model in MPlus?

Do we need to specify TWOLEVEL to have a subject specific interpretation of coefficients? Thank you!
 Linda K. Muthen posted on Monday, May 13, 2013 - 8:46 am
Subject-specific refers to random coefficients. You would need to use TYPE=TWOLEVEL RANDOM with random coefficients. TYPE=COMPLEX adjusts the standard errors for non-independence of observations.
 Elina Dale posted on Monday, May 13, 2013 - 10:34 am
So, TYPE=COMPLEX is a marginal model?

Are the coefficients interpreted as population average as in marginal models explained in papers by Zeger et al, 1988? It would be helpful to get a bit more explanation as to how some of MPlus specifications relate to more widespread / traditional types of analyses.

Thank you!
 Bengt O. Muthen posted on Monday, May 13, 2013 - 8:46 pm
A single-level regression model (linear or logistic) is a "widespread/traditional type of analysis" - if you have a regression model and use TYPE=COMPLEX you are doing regression analysis and you get your SEs adjusted for complex survey data features. So the interpretation is the usual one for regression modeling. Same for factor analysis. If you have two-level data and don't do TYPE=TWOLEVEL but do TYPE=COMPLEX you get a so called "aggregated" model using terms in well-known complex survey data literature such as the 1989 Analysis of Complex Surveys book edited by Skinner, Holt, and Smith.

GEE is a limited-information estimator, not a full-information maximum-likelihood estimator. You can see the relationship between GEE estimation and the closely related limited-information WLSMV estimation in Mplus in the paper on factor analysis on our website:

Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished technical report.
 Elina Dale posted on Tuesday, May 14, 2013 - 10:14 am
Thank you, Dr. Muthen! So does this mean that TYPE=COMPLEX specifies design-based or model-based analysis?

It is vital for me to understand what this specification implies, so please, forgive my persistence.

Skinner et al. distinguish (A) design vs. (B) model based approaches to analysis. Within model-based approach we have (a) aggregated (marginal) and (b) disaggregated (random effects) models.

"A basic distinction is between design-based and model-based inference.... Aggregated analysis may therefore alternatively be referred to as marginal modelling and the distinction between aggregated and disaggregated analysis is analogous, to a limited extent, to the distinction between population-averaged and subject-specific analysis, widely used in biostatistics."

Thank you again!
 Bengt O. Muthen posted on Tuesday, May 14, 2013 - 6:13 pm
You may be interested in the chapter:

Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.

The labels you refer to are not always clear cut (at least not to me) so I'll describe what we do instead. With TYPE=COMPLEX we do complex survey SEs using the Huber-White sandwich estimator. The parameters are the usual single-level parameters. The fact that we can also handle SE calculations based on replicate weights might qualify us for the design-based camp; I am not sure about these distinctions. TYPE=COMPLEX does an aggregated analysis when data are hierarchical (say twolevel) because it doesn't model parameters on both levels. In contrast, TYPE=TWOLEVEL or TYPE = COMPLEX TWOLEVEL does a disaggregated analysis. I discuss the difference in the above chapter in terms of factor analysis.

You can also read more about what we do by reading the papers under our Complex Survey Data section:
 Bengt O. Muthen posted on Tuesday, May 14, 2013 - 6:23 pm
Another useful book is the 2003 Chambers & Skinner Wiley book.
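To make the sandwich-estimator idea mentioned above concrete, here is a minimal Python sketch for the simplest possible parameter, a mean. It assumes no weights, no stratification, and no finite-sample correction, so it is a textbook-style illustration rather than a reproduction of what Mplus computes; the function name and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def mean_with_ses(y, cluster):
    """Return (mean, SE assuming independent cases, cluster-robust SE)."""
    n = len(y)
    ybar = sum(y) / n
    resid = [v - ybar for v in y]
    # Independent-cases version: each case contributes its own squared residual.
    var_iid = sum(r * r for r in resid) / n**2
    # Clustered version: residuals are summed within each cluster before squaring,
    # so positively correlated cases inflate the variance estimate.
    totals = defaultdict(float)
    for r, c in zip(resid, cluster):
        totals[c] += r
    var_cluster = sum(t * t for t in totals.values()) / n**2
    return ybar, math.sqrt(var_iid), math.sqrt(var_cluster)


# Hypothetical data: three clusters whose members resemble each other.
y = [1.0, 1.2, 0.9, 3.0, 3.1, 2.8, 5.0, 5.2, 4.9]
cluster = [1, 1, 1, 2, 2, 2, 3, 3, 3]
ybar, se_iid, se_cluster = mean_with_ses(y, cluster)
print(ybar, se_iid, se_cluster)
```

The point estimate is the same in both cases; only the standard error changes. When every case is its own cluster the two standard errors coincide.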
 Elina Dale posted on Wednesday, May 15, 2013 - 5:37 am
Thank you so much, Dr. Muthen! Really appreciate your response, I think I understand now. I have also started looking at Chambers & Skinner 2003 book. Will check out your paper for which you sent me the link.
 Elina Dale posted on Monday, August 12, 2013 - 10:06 am
Dear Dr. Muthen, I have re-read your paper (B. Muthen & A. Satorra, 1995) on complex sample data in SEM and I still have clarifying questions on the procedure used by MPlus when I specify "COMPLEX" in the Analysis.
On pp. 281-288, you describe the aggregated analysis, which Chambers & Skinner (2003) say "may alternatively be referred to as marginal modeling".
I would greatly appreciate it if you could clarify:
1) whether the aggregated approach as described in Muthen & Satorra (1995) is a model or design-based approach to inference, because it can be used in either according to Chambers & Skinner (2003);
2) whether "COMPLEX" specification is a model-based aggregated approach.
Last question. Typically, as you say, design-based analysis uses weights in parameter estimation. I wonder if weights are required when using "COMPLEX". Thank you!
 Bengt O. Muthen posted on Tuesday, August 13, 2013 - 10:30 am
1) I see it as a model-based approach

2) I see COMPLEX as a model-based aggregated approach.

Weights are not required when using COMPLEX. For instance, there may be just clustering.
 Elina Dale posted on Wednesday, August 14, 2013 - 8:11 am
Thank you, Dr. Muthen! This is very helpful.
 Christoph Weber posted on Friday, February 14, 2014 - 7:10 am
Dear Dr. Muthen!
I am analysing threelevel data (students, classes, schools). The question is whether a school system reform has an effect on the achievement of students.

Eighth-grade students were tested before the reform was implemented, and then eighth-grade students were tested after the reform (using the same schools).

I'm using a threelevel model with "reform (0/1)" on the class level and estimate the effect on achievement (class level). Is this correct?
Further, I wonder why I get a different estimate for "achievement ON reform" when I use type = complex (cluster = class). The estimate for type = complex is equal to the simple mean difference between reform = 0 and reform = 1 using SPSS.

christoph weber
 Bengt O. Muthen posted on Friday, February 14, 2014 - 11:57 am
You say that this is a school system reform; isn't your "reform" variable a school-level variable and not a classroom-level variable?

The 3-level model results won't agree with Type=complex with cluster=class because the latter takes only classroom clustering into account. You would do better with Type=Complex Twolevel and define 2 cluster variables: school and classroom (see UG).
 Christoph Weber posted on Friday, February 14, 2014 - 1:55 pm
Thanks, I treat reform as a class level variable, because we have a kind of trend analysis. The same schools were tested two times (8th graders in 2008 and 8th graders in 2012), thus there is variation of "reform" within the school clusters.

I thought that taking the complex design into account (complex or multilevel) just affects the SE, not the estimates.
Sorry, what does UG mean?

 Christoph Weber posted on Friday, February 14, 2014 - 2:01 pm
I get it, users guide
 Christoph Weber posted on Tuesday, February 25, 2014 - 6:10 am
One more question. I read a pdf "Mplus Short courses topic 7 multilevel modeling with ...".
There is a random effects ANOVA example comparing type = twolevel with type = complex. Both models yield the same mean and SE. When I compare the two types with my data I get different means. Is this because of different cluster sizes? The ANOVA example uses data with equal cluster size.

Christoph Weber
 Bengt O. Muthen posted on Tuesday, February 25, 2014 - 12:20 pm
Yes, my results were for equal cluster sizes.
 Christoph Weber posted on Tuesday, February 25, 2014 - 12:32 pm
Will twolevel and complex only yield the same results with equal cluster sizes?
 Linda K. Muthen posted on Tuesday, February 25, 2014 - 1:16 pm
 Christoph Weber posted on Saturday, July 19, 2014 - 3:56 am
To come back to the question:
Why do type complex and twolevel yield different means with unequal cluster sizes?
 Linda K. Muthen posted on Monday, July 28, 2014 - 10:36 am
The mean for type=complex is the average of Y_{ij}.

The mean for type=twolevel is the average of the average Y_{ij}.
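The two averages described above can be compared with a small Python sketch. It takes the description at face value (a plain average of all Y_{ij} versus a plain average of the cluster averages) and does not try to reproduce how a two-level estimator weights clusters in practice; the function names and the toy data are hypothetical.

```python
from collections import defaultdict


def pooled_mean(y):
    """Average over all observations Y_ij, ignoring cluster membership."""
    return sum(y) / len(y)


def mean_of_cluster_means(y, cluster):
    """Average each cluster first, then average the cluster means."""
    groups = defaultdict(list)
    for v, c in zip(y, cluster):
        groups[c].append(v)
    means = [sum(g) / len(g) for g in groups.values()]
    return sum(means) / len(means)


# Unequal cluster sizes: cluster "a" has four cases, cluster "b" has one.
y_unequal = [10.0, 10.0, 10.0, 10.0, 20.0]
c_unequal = ["a", "a", "a", "a", "b"]
print(pooled_mean(y_unequal))                       # 12.0
print(mean_of_cluster_means(y_unequal, c_unequal))  # 15.0

# Equal cluster sizes: the two averages coincide.
y_equal = [10.0, 10.0, 20.0, 20.0]
c_equal = ["a", "a", "b", "b"]
print(pooled_mean(y_equal))                         # 15.0
print(mean_of_cluster_means(y_equal, c_equal))      # 15.0
```

The pooled mean lets large clusters dominate, while the mean of cluster means gives each cluster the same weight, which is why the two agree only when cluster sizes are equal.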
 Keith Sanford posted on Tuesday, August 11, 2015 - 5:34 pm
I am running a confirmatory factor analysis with TYPE = COMPLEX and a CLUSTER variable and no weighting variable, and I have three questions.

1. Is this approach appropriate for my dataset (described below)?

2. Do the equations in the Aggregated Modeling section (4.1) of the Muthén and Satorra (1995) paper on Complex Sample Data in SEM describe how MPlus estimates standard errors and fit statistics in this situation?

3. Would I get more accurate parameter estimates if I added a weighting variable?

My dataset includes a total of 600 people who are married. The dataset includes 100 couples with two participating partners, and the remaining 400 people are independent cases participating without their partners. Thus, I have 500 clusters, and most clusters contain only one case. I want to estimate an aggregate confirmatory factor analysis that handles the small amount of non-independence in the dataset.


 Bengt O. Muthen posted on Wednesday, August 12, 2015 - 4:48 pm
Yes on all 3 questions.
 Matthew Porter Wilcox posted on Friday, January 22, 2016 - 10:31 pm
After having read the previous posts, the UG, and the chapter suggested, I am still not clear as to which is more appropriate for my data set.

Teachers (n=93) have each rated their students (n=2,122). Each rates only the students in their class, and students are rated only by their teacher. The scale purports to measure antisocial behavior and comprises 12 items scored with a 4-category rating scale. It is hypothesized to have two constructs: externalizing behavior and internalizing behavior. I am conducting a cross-sectional study using CFA to provide evidence of the factor structure, test assumptions of Cronbach’s Alpha, and evaluate Measurement Invariance for the scale between Male and Female students.

I have run specified models using both TYPE=Complex and TYPE=Twolevel, with CLUSTER=class for both. I have no missing data, and have used WLSMV for both complex and twolevel commands.

Is one approach more appropriate than the other?
 Bengt O. Muthen posted on Saturday, January 23, 2016 - 6:20 pm
Type=Twolevel modeling is a more ambitious approach where you estimate new parameters beyond those of Type=Complex. Type=Complex estimates the usual single-level CFA parameters and simply adjusts the SEs and chi-square for the clustering.

So you can learn more from Twolevel analysis and it is better able to reveal the factor structure.
 Regula Windlinger posted on Thursday, November 10, 2016 - 2:47 am
I have diary data (n=741) of teachers (n=72) in Schools (n=15). I am aware that I cannot use a threelevel-Model with only 15 Schools.

1. Would it be OK to use TYPE = Twolevel complex to take the clustering at School Level into account?
2. Would it be wrong to use a School Level variable (School leadership) as a predictor on the teacher Level?

Thank you very much!
 Bengt O. Muthen posted on Thursday, November 10, 2016 - 12:20 pm
1. No, Complex also needs at the very least 20 clusters.

2. No.
 Margarita  posted on Monday, February 27, 2017 - 6:49 am
Dear Dr. Muthens,

Are there any occasions in which Twolevel is used but a model is not defined in the between level? E.g. consider the following scenarios for y by x1-x25 (x1-x25 were measured in the within level)

1) Usevariables = x1-x25;
categorical = x1-x25;
cluster = cl;
Type = complex;
y by x1-x25;

2) Usevariables = x1-x25;
categorical = x1-x25;
clusters = cl;
Type = twolevel;
y by x1-x25;


3) Usevariables = x1-x25;
categorical = x1-x25;
within = x1-x25;
clusters = cl;
Type = twolevel;
y by x1-x25;


If one were not interested in modeling cluster-level parameters, what would be the right choice? I have definitely seen people using the third choice when wanting to account for the clustering in their data, but I am not sure what the benefit is. I noticed that in scenario 2, when x1-x25 were not defined as within variables (within = x1-x25) their thresholds were modeled in the between level, which I understand why. Should I then follow 1 or 2, if I am only interested in taking into account the clustering in the data? (note. clusters = 38, DEFF > 2 for all variables).

Thank you for your constant help.

 Bengt O. Muthen posted on Monday, February 27, 2017 - 6:45 pm
Approach 3) is just wrong because it says that the x's have no between-level variation.

The choice between 1) and 2) relates to the "aggregatability" concept discussed in the chapter on our website:

Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.
 Margarita  posted on Thursday, March 02, 2017 - 8:28 am
Thank you Dr. Muthen that's very helpful. If only some of the items within a measure have substantial ICCs and DEFF values, would it make sense, based on your experience, to define those items with low ICC/DEFF as within variables, and estimate the parameters on the between level only for those with acceptable ICC/DEFF values? I haven't seen twolevel CFA examples in the literature (at least I haven't found any) where they divide the items of a measure based on their ICC, so I am wondering whether by accounting for the clustering in the data for items that do not have between variance would be counterproductive.
 Bengt O. Muthen posted on Thursday, March 02, 2017 - 5:54 pm
You are doing a 2-level factor analysis. I would not separate items into high and low ICC/DEFF but include them all in the 2-level analysis and simply report the low, insignificant between-level loadings for the items with low ICC.
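For readers unfamiliar with the ICC/DEFF quantities used in this exchange, a commonly used approximation for the design effect is DEFF = 1 + (average cluster size - 1) x ICC. The Python sketch below assumes that approximation; the posts do not say how the DEFF values were computed, and the cluster size and ICC values are made up for illustration.

```python
def design_effect(avg_cluster_size, icc):
    """Approximate design effect: 1 + (average cluster size - 1) * ICC."""
    if avg_cluster_size < 1:
        raise ValueError("average cluster size must be at least 1")
    return 1.0 + (avg_cluster_size - 1.0) * icc


def icc_for_design_effect(avg_cluster_size, deff):
    """ICC at which the approximate design effect reaches `deff`."""
    if avg_cluster_size <= 1:
        raise ValueError("average cluster size must exceed 1")
    return (deff - 1.0) / (avg_cluster_size - 1.0)


# Hypothetical numbers: 20 cases per cluster on average.
print(design_effect(20, 0.10))         # design effect for ICC = 0.10
print(icc_for_design_effect(20, 2.0))  # ICC needed for a design effect of 2
```

Under this approximation even a small ICC produces a sizeable design effect when clusters are large, so an item can have a low ICC and still show a DEFF above 2.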
 Tamara Carigiet posted on Wednesday, February 14, 2018 - 5:19 am

I’m a complete beginner in MPLUS and I am trying to figure out what the correct analysis for my research question is/will be. I want to test effects of level-2-predictors (teachers practices) on level-1-outcomes (success of transition to school of the children). Actually, I’m not really interested in differentiated results for classes or within/between-group variance (maybe I’m wrong here). I first of all want to test “main effects” (random intercepts) and be able to say if predictors (at level-1, level-2) have an impact on children’s transition-success and how large the effect (on average, over all the 39 classes) is. Can I do this using the command type=complex? Or do I have to use type=twolevel (which seems to me correct anyway but much more complicated than the former)?

Thank you very much for your assistance!
 Bengt O. Muthen posted on Wednesday, February 14, 2018 - 4:23 pm
Type=Complex would be fine, particularly if you don't think the within- and between-level relationships differ.
 Ashley McDonald posted on Friday, June 01, 2018 - 6:51 pm

I am beginner in MPLUS and trying to figure out the correct analysis for my research question.

I have dyadic data from mothers and fathers. I am interested in running a path analysis with 2 predictor variables, 2 mediators and 1 outcome variable all at the individual level. Can I just use type = complex with the couple ID as the cluster variable to estimate the model or do I need to run multilevel analysis with type = twolevel?

I would like to see if the model differs for moms and dads. Is it possible to compare the groups given the non-independence of the samples? If so how would I go about doing that.

Thank you in advance for your assistance.
 Bengt O. Muthen posted on Monday, June 04, 2018 - 1:03 pm
You can analyze this in a single-level, multivariate form, that is, your data set has 5+5 variables (columns) and you model mothers and fathers jointly - that accounts for their non-independence. You see papers on this under Dyadic Analysis on our website at

That also makes it easy to compare mom and dad parameters (e.g. using Model Test).
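The "5+5 variables (columns)" layout described above means one row per couple, with the mother's and the father's variables side by side. A minimal Python sketch of that reshaping follows; the variable names (`x1`, `x2`, `m1`, `m2`, `y`), the `couple` and `role` keys, and the values are all hypothetical.

```python
def long_to_wide(records, id_key="couple", role_key="role"):
    """Turn one-row-per-person records into one-row-per-couple records."""
    wide = {}
    for rec in records:
        row = wide.setdefault(rec[id_key], {})
        prefix = rec[role_key]
        for name, value in rec.items():
            if name not in (id_key, role_key):
                # Prefix each variable with the partner's role, e.g. mom_y, dad_y.
                row[f"{prefix}_{name}"] = value
    return wide


# Hypothetical long-format data: 2 predictors, 2 mediators, 1 outcome per person.
records = [
    {"couple": 1, "role": "mom", "x1": 2.0, "x2": 1.0, "m1": 3.0, "m2": 0.5, "y": 4.0},
    {"couple": 1, "role": "dad", "x1": 1.5, "x2": 0.0, "m1": 2.5, "m2": 1.5, "y": 3.0},
    {"couple": 2, "role": "mom", "x1": 0.5, "x2": 2.0, "m1": 1.0, "m2": 2.0, "y": 2.0},
    {"couple": 2, "role": "dad", "x1": 3.0, "x2": 1.0, "m1": 0.0, "m2": 0.5, "y": 5.0},
]
wide = long_to_wide(records)
print(sorted(wide[1]))  # the ten column names for couple 1
```

Each couple then contributes a single case with ten columns, so the non-independence between partners is carried by the model's covariances among the mother and father variables rather than by a cluster correction.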
 Ashley McDonald posted on Monday, June 04, 2018 - 4:12 pm

Thanks for the response. I wanted to make sure that I understand. Are you suggesting an APIM model?

 Bengt O. Muthen posted on Monday, June 04, 2018 - 4:51 pm
 Ashley McDonald posted on Monday, June 04, 2018 - 5:13 pm

Are there other options if I am not interested in doing an APIM model?

 Bengt O. Muthen posted on Monday, June 04, 2018 - 5:22 pm
Seems like that gives the most flexible model, but you can use a Type=Twolevel approach as well. Type=Complex would say that moms and dads have the same parameter values, which would be limiting (it simply corrects the SEs for non-independence).
 Aurelie Lange posted on Friday, April 05, 2019 - 5:31 am
Dear Dr Muthen,

I have siblings in my dataset. Some of the variables are reported by the children and some by the parents. I know that I can use type=complex to account for interdependence of siblings on the child-reported variables.
Can I also use type=complex when including parent-reported variables or should I use type=twolevel to account for the different levels in my data?

Thank you so much!

 Bengt O. Muthen posted on Saturday, April 06, 2019 - 12:59 pm
You might want to use twolevel.
 Nadim Khatib posted on Thursday, February 06, 2020 - 12:19 pm
After trying to fit a model as a twolevel model I find that estimates are very small on the between-level, the fit (based on SRMR between) is bad, or it doesn't converge. Should I just use type=complex instead?
 Bengt O. Muthen posted on Thursday, February 06, 2020 - 3:31 pm
See if the SEs change when you use Type=Complex versus regular. If not, this indicates that you don't need twolevel modeling.
 Stefan P. posted on Thursday, July 16, 2020 - 4:28 am
Dear Dres. M&M,

I have latent variables (Big Five), each measured by three observed variables at four timepoints. Now I want to predict the score by age and gender - with corrected standard errors due to repeated measures. Thus, I need the mean (of the latent variable).
1) Isn't it possible to conduct just a type=complex analysis with latent variables?
2) How can i obtain the mean?

Thank you very much
 Bengt O. Muthen posted on Thursday, July 16, 2020 - 4:24 pm
1) Yes, but you may not get the same results as if you do either a two-level analysis (time and person), or a single-level analysis in wide format where you have 3x4 variables.

See also longitudinal factor analysis discussions in my Mplus Web Talk No. 1 at

2) You don't need latent variable means to study the effects of age and gender. In fact, with the Complex approach, you can't get them.
 Stefan P. posted on Saturday, July 18, 2020 - 6:12 am
I'm totally desperate right now:
1) I tried the single-level analysis in wide format for the last three weeks - until I came to the conclusion that a growth curve model makes no sense, since I don't want to examine the dependence of time but of age (we already talked about DATA COHORT in this context here in the forum).
2) How can I investigate the effect of age on - say - openness (O) without knowing the mean value for O? Ultimately, I want to have a regression function of the form O = mean + b1*age + b2*age^2 + b3*age^3 + female + e.
P215 P220 P225 ;!O
Ow BY P215 P220 P225 (p1-p3);
s | Ow ON AGE AGE_2 AGE_3;
Ob BY P215 P220 P225 (p1-p3);
P215@0 P220@0 P225@0;
 Bengt O. Muthen posted on Saturday, July 18, 2020 - 11:22 am
1) In the longitudinal factor analysis framework I talk about in my web talk, you don't need a growth model for the factors. It has the factor means being different in a free way at the different time points (fixed at zero at the first time point).

2) Perhaps you are saying that people are of different ages at the different time points. Perhaps you are saying that the time distances between the waves are not the same. It isn't clear to me.

Btw, this statement is incorrect because a random slope needs to refer to just 1 slope per statement and you mention 3:

s | Ow ON AGE AGE_2 AGE_3;
 Stefan P. posted on Monday, July 20, 2020 - 1:57 am
1) slowly I'm beginning to realize (I hope)...
unfortunately the output tells me the error message NO CONVERGENCE. NUMBER OF ITERATIONS EXCEEDED.
Although the diagram looks good. What could that be?
2) I want to know how the average trait score develops over age - corrected for repeated measures (the time distance is always 4 years).
 Bengt O. Muthen posted on Monday, July 20, 2020 - 6:06 pm
We need to see your full output to diagnose this - send to Support along with your license number.