Hi, hope all is well. I was hoping to get your insight/input on the following:
I have survey data (6 dependent variables) for 300 customers. These 300 customers belong to one of 10 different providers. Hence, the cases are not IID.
I also have survey data from these 10 providers (1 independent variable).
I want to test the relationship between the provider-level independent variable and three of the customer-level dependent variables using structural equation modeling (in Mplus 3.10). Given that my between-level sample size is only 10, that means that multi-level modeling is out of the question. Right?
If I disaggregate the provider level data (i.e. assign the same value for the independent variable to each customer who shares the same provider) would it be appropriate to run the SEM model (n=300) using the type=complex design?
I'd appreciate any feedback/suggestions you could offer.
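The disaggregation step described in this question can be sketched as follows. This is only an illustration in Python, not Mplus syntax; the provider IDs, scores, and the `disaggregate` helper are hypothetical.

```python
# Hypothetical provider-level independent variable, one value per provider.
provider_score = {"P01": 3.2, "P02": 4.1, "P03": 2.7}

# Hypothetical customer records: (customer_id, provider_id).
customers = [(1, "P01"), (2, "P01"), (3, "P02"), (4, "P03"), (5, "P03")]

def disaggregate(customers, provider_score):
    """Copy each provider's value onto every customer of that provider."""
    return [
        {"customer": cid, "provider": pid, "x": provider_score[pid]}
        for cid, pid in customers
    ]

rows = disaggregate(customers, provider_score)
```

Every customer of the same provider ends up with an identical value of `x`, which is why the 300 rows carry far less independent information about `x` than their number suggests.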
bmuthen posted on Thursday, March 10, 2005 - 4:27 pm
Yes, I think 2-level modeling does not perform well with only 10 clusters. Unfortunately, our simulations suggest that type=complex also needs more than 10 clusters - at least 20. So perhaps the only way is to view "provider" as a fixed effect instead of a random effect and use 9 provider dummy variables as covariates. Bayesian analysis using priors is the only way I have seen that attempts to deal with such a small number of clusters.
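A minimal sketch of the fixed-effect coding suggested here, written in Python for illustration only. The provider labels, the choice of reference provider, and the `dummy_code` helper are assumptions, not part of the original reply.

```python
def dummy_code(provider_ids, reference):
    """Return (column names, 0/1 rows), one column per non-reference provider."""
    levels = sorted(set(provider_ids) - {reference})
    names = [f"d_{level}" for level in levels]
    rows = [[1 if pid == level else 0 for level in levels] for pid in provider_ids]
    return names, rows

providers = [f"P{k:02d}" for k in range(1, 11)]  # 10 hypothetical providers
sample = providers * 3                            # 30 illustrative customers
names, rows = dummy_code(sample, reference="P10")
```

With 10 providers this gives 9 dummy columns, and customers of the reference provider have zeros throughout. Note that a covariate that varies only across providers is a linear combination of these dummies and the intercept, so it cannot be entered alongside the full dummy set.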
Herb Marsh posted on Monday, April 08, 2013 - 6:33 pm
Why is it no longer possible to use Type=complex to get correct standard errors for analyses done at level 1 when there are three levels?
For example, results at the student level when students are nested within classes, and classes are nested within schools.
COMPLEX TWOLEVEL, THREELEVEL, and COMPLEX THREELEVEL are all available. There have been no changes. I'm not sure I understand your question.
Herb Marsh posted on Tuesday, April 09, 2013 - 9:14 pm
Linda: Here is what I did and the error message that I got. I recall that it was possible to have two cluster variables when analyses were done only at level 1, but maybe I am mistaken. In any case, why is it apparently not allowed?
*** ERROR in VARIABLE command Two cluster variables are allowed for TYPE=TWOLEVEL COMPLEX. Only one cluster variable is allowed for TYPE=COMPLEX (single level). Limit on the number of cluster variables reached.
I have a three-level model with observations (level 1) nested within individuals (level 2) nested within cities (level 3). The data I am using require sampling weights, and we only have observations at 2 timepoints. From Chapter Nine of the User's Guide (p. 252) I am not sure whether I should treat this model as TYPE=TWOLEVEL (and treat our two observation points as "time") or as TYPE=THREELEVEL and treat our first level as cross-sectional. In addition, the outcome we are using is a count variable, and it is my understanding that users can't use sample weights with count variables in TYPE=THREELEVEL?
I would treat this as a TWOLEVEL analysis with data in the wide format. THREELEVEL is not available for count variables.
anonymous posted on Wednesday, May 08, 2013 - 12:04 pm
Hello, I'm aware that TYPE=COMPLEX with the cluster option adjusts for non-independence in terms of the chi-square statistic and the standard errors of the model, but not the parameter estimates (parameter estimates are adjusted for with the multi-level modeling). Is it that the TYPE=COMPLEX and cluster option only adjusts for the parameter estimates' significance, but not their magnitude? I'm wondering whether it is appropriate to estimate a model of treatment effects involving children nested within schools using the TYPE=COMPLEX and cluster option.
Parameter estimates are adjusted if the WEIGHT option is used. There is no difference between COMPLEX and TWOLEVEL in this regard.
Elina Dale posted on Sunday, May 12, 2013 - 9:11 am
Dear Dr. Muthen,
I am wondering about the difference between TYPE=COMPLEX and TYPE=TWOLEVEL analysis of SEM in MPlus.
In traditional regression modeling, there is a distinction between population average and subject specific models. Population average models such as GEE describe the covariance among clustered observations, whereas SS/hierarchical models explain the source of this covariance. So, the coefficients are interpreted differently: PA model estimates the difference in Y b/n group A with X and group B without X; the SS model the expected change in individual's probability of Y given change in X.
I am wondering if I use TYPE=COMPLEX in my SEM as I have clustered data, the coefficient from my structural model - effect of treatment X on a latent factor F - is it interpreted as PA or SS? In other words, with specification COMPLEX, do we have a population average model or random effects model in MPlus?
Do we need to specify TWOLEVEL to have a subject specific interpretation of coefficients? Thank you!
Subject-specific refers to random coefficients. You would need to use TYPE=TWOLEVEL RANDOM with random coefficients. TYPE=COMPLEX adjusts the standard errors for non-independence of observations.
Elina Dale posted on Monday, May 13, 2013 - 10:34 am
So, TYPE=COMPLEX is a marginal model?
Are the coefficients interpreted as population average as in marginal models explained in papers by Zeger et al, 1988? It would be helpful to get a bit more explanation as to how some of MPlus specifications relate to more widespread / traditional types of analyses.
A single-level regression model (linear or logistic) is a "widespread/traditional type of analysis" - if you have a regression model and use TYPE=COMPLEX, you are doing regression analysis and you get your SEs adjusted for complex survey data features. So the interpretation is the usual one for regression modeling. The same holds for factor analysis. If you have two-level data and don't do TYPE=TWOLEVEL but do TYPE=COMPLEX, you get a so-called "aggregated" model, using terms from the well-known complex survey data literature such as the 1989 Analysis of Complex Surveys book edited by Skinner, Holt, and Smith.
GEE is a limited-information estimator, not a full-information maximum-likelihood estimator. You can see the relationship between GEE estimation and the closely related limited-information WLSMV estimation in Mplus in the paper on factor analysis on our website:
Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished technical report.
Elina Dale posted on Tuesday, May 14, 2013 - 10:14 am
Thank you, Dr. Muthen! So does this mean that TYPE=COMPLEX specifies design-based or model-based analysis?
It is vital for me to understand what this specification implies, so please, forgive my persistence.
Skinner et al. distinguish (A) design vs. (B) model based approaches to analysis. Within model-based approach we have (a) aggregated (marginal) and (b) disaggregated (random effects) models.
"A basic distinction is between design-based and model-based inference.... Aggregated analysis may therefore alternatively be referred to as marginal modelling and the distinction between aggregated and disaggregated analysis is analogous, to a limited extent, to the distinction between population-averaged and subject-specific analysis, widely used in biostatistics."
Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.
The labels you refer to are not always clear cut (at least not to me) so I'll describe what we do instead. With TYPE=COMPLEX we do complex survey SEs using the Huber-White sandwich estimator. The parameters are the usual single-level parameters. The fact that we can also handle SE calculations based on replicate weights might qualify us for the design-based camp; I am not sure about these distinctions. TYPE=COMPLEX does an aggregated analysis when data are hierarchical (say twolevel) because it doesn't model parameters on both levels. In contrast, TYPE=TWOLEVEL or TYPE = COMPLEX TWOLEVEL does a disaggregated analysis. I discuss the difference in the above chapter in terms of factor analysis.
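The idea that a sandwich estimator leaves the point estimate alone and only changes the standard error can be illustrated for the simplest possible parameter, a mean. This is a hedged Python sketch with made-up data and no finite-sample correction; it is not the computation Mplus performs for a general model, and `mean_with_ses` is a hypothetical helper.

```python
import math
from collections import defaultdict

def mean_with_ses(values, cluster_ids):
    """Sample mean with a naive SE and a cluster-robust (sandwich) SE."""
    n = len(values)
    mean = sum(values) / n
    resid = [v - mean for v in values]

    # Naive SE: treats all n observations as independent.
    naive_se = math.sqrt(sum(r * r for r in resid) / (n - 1) / n)

    # Sandwich SE: sum residuals within each cluster first, then square.
    cluster_sums = defaultdict(float)
    for r, c in zip(resid, cluster_ids):
        cluster_sums[c] += r
    robust_se = math.sqrt(sum(s * s for s in cluster_sums.values())) / n

    return mean, naive_se, robust_se

# Toy data: three clusters whose members are identical within cluster.
y = [1, 1, 1, 5, 5, 5, 9, 9, 9]
cl = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
mean, naive_se, robust_se = mean_with_ses(y, cl)
```

The mean is the same whichever SE is used. Because observations within a cluster are strongly alike in this toy data, the cluster-robust SE comes out larger than the naive one.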
You can also read more about what we do by reading the papers under our Complex Survey Data section:
Another useful book is the 2003 Chambers & Skinner Wiley book.
Elina Dale posted on Wednesday, May 15, 2013 - 5:37 am
Thank you so much, Dr. Muthen! Really appreciate your response, I think I understand now. I have also started looking at Chambers & Skinner 2003 book. Will check out your paper for which you sent me the link.
Elina Dale posted on Monday, August 12, 2013 - 10:06 am
Dear Dr. Muthen, I have re-read your paper (B. Muthen & A. Satorra, 1995) on complex sample data in SEM and I still have clarifying questions on the procedure used by MPlus when I specify "COMPLEX" in the Analysis. On pp. 281-288, you describe the aggregated analysis, which Chambers & Skinner (2003) say "may alternatively be referred to as marginal modeling". I would greatly appreciate it if you could clarify: 1) whether the aggregated approach as described in Muthen & Satorra (1995) is a model-based or design-based approach to inference, because it can be used in either according to Chambers & Skinner (2003); 2) whether the "COMPLEX" specification is a model-based aggregated approach. Last question: typically, as you say, design-based analysis uses weights in parameter estimation. I wonder if weights are required when using "COMPLEX". Thank you!
Dear Dr. Muthen! I am analysing threelevel data (students, classes, schools). The question is whether a school system reform has an effect on student achievement.
Eighth-grade students were tested before the reform was implemented, and then eighth-grade students were tested after the reform (using the same schools).
I'm using a threelevel model with "reform (0/1)" on the class level and estimating its effect on achievement (class level). Is this correct? Further, I wonder why I get a different estimate for "achievement ON reform" when I use type = complex (cluster = class). The estimate for type = complex is equal to the simple mean difference between reform = 0 and reform = 1 using SPSS.
You say that this is a school system reform; isn't your "reform" variable a school-level variable and not a classroom-level variable?
The 3-level model results won't agree with Type=complex with cluster=class because the latter takes only classroom clustering into account. You would do better with Type=Complex Twolevel and define 2 cluster variables: school and classroom (see UG).
Thanks, I treat reform as a class-level variable because we have a kind of trend analysis. The same schools were tested twice (eighth graders in 2008 and eighth graders in 2012), so "reform" varies within the school clusters.
I thought that taking the complex design into account (complex or multilevel) just affects the SE, not the estimates. Sorry, what does UG mean?
One more question. I read a pdf "Mplus Short courses topic 7 multilevel modeling with ...". There is a random effects ANOVA example comparing type = twolevel with type = complex. Both models yield the same mean and SE. When I compare the two types with my data I get different means. Is this because of different cluster sizes? The ANOVA example uses data with equal cluster sizes.
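One way to see why cluster sizes could matter is to compare the pooled mean with the precision-weighted mean of a random-intercept model. The Python sketch below is only an illustration under strong assumptions: the variance components are treated as known (values chosen arbitrarily), the data are made up, and it does not reproduce the Mplus estimator.

```python
def pooled_mean(clusters):
    """Grand mean over all observations (the single-level estimate)."""
    values = [v for c in clusters for v in c]
    return sum(values) / len(values)

def precision_weighted_mean(clusters, between_var, within_var):
    """GLS mean for a random-intercept model with known variance components.

    Each cluster mean gets weight 1 / (between_var + within_var / n_j).
    """
    weights = [1.0 / (between_var + within_var / len(c)) for c in clusters]
    means = [sum(c) / len(c) for c in clusters]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

balanced = [[1, 3], [5, 7], [9, 11]]
unbalanced = [[1, 3], [5, 7], [9, 11, 9, 11, 9, 11]]

bal_pooled = pooled_mean(balanced)
bal_weighted = precision_weighted_mean(balanced, between_var=1.0, within_var=4.0)
unbal_pooled = pooled_mean(unbalanced)
unbal_weighted = precision_weighted_mean(unbalanced, between_var=1.0, within_var=4.0)
```

With equal cluster sizes all clusters receive the same weight and the two means coincide. With unequal sizes the pooled mean weights each observation equally, whereas the random-intercept mean weights large clusters less than proportionally, so the two estimates differ.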
I am running a confirmatory factor analysis with TYPE = COMPLEX and a CLUSTER variable and no weighting variable, and I have three questions.
1. Is this approach appropriate for my dataset (described below)?
2. Do the equations in the Aggregated Modeling section (4.1) of the Muthén and Satorra (1995) paper on Complex Sample Data in SEM describe how MPlus estimates standard errors and fit statistics in this situation?
3. Would I get more accurate parameter estimates if I added a weighting variable?
My dataset includes a total of 600 people who are married. The dataset includes 100 couples with two participating partners, and the remaining 400 people are independent cases participating without their partners. Thus, I have 500 clusters, and most clusters contain only one case. I want to estimate an aggregate confirmatory factor analysis that handles the small amount of non-independence in the dataset.
After having read the previous posts, the UG, and the chapter suggested, I am still not clear as to which is more appropriate for my data set.
Teachers (n=93) have each rated their students (n=2,122). Each teacher rates only the students in their class, and students are rated only by their teacher. The scale purports to measure antisocial behavior and comprises 12 items scored with a 4-category rating scale. It is hypothesized to have two constructs: externalizing behavior and internalizing behavior. I am conducting a cross-sectional study using CFA to provide evidence of the factor structure, test assumptions of Cronbach’s Alpha, and evaluate Measurement Invariance for the scale between Male and Female students.
I have run specified models using both TYPE=Complex and TYPE=Twolevel, with CLUSTER=class for both. I have no missing data, and have used WLSMV for both complex and twolevel commands.
Type=Twolevel modeling is a more ambitious approach where you estimate new parameters beyond those of Type=Complex. Type=Complex estimates the usual single-level CFA parameters and simply adjusts the SEs and chi-square for the clustering.
So you can learn more from Twolevel analysis and it is better able to reveal the factor structure.
I have diary data (n=741) of teachers (n=72) in Schools (n=15). I am aware that I cannot use a threelevel-Model with only 15 Schools.
1. Would it be OK to use TYPE = Twolevel complex to take the clustering at School Level into account? 2. Would it be wrong to use a School Level variable (School leadership) as a predictor on the teacher Level?
1. No, Complex also needs at the very least 20 clusters.
Margarita posted on Monday, February 27, 2017 - 6:49 am
Dear Dr. Muthens,
Are there any occasions in which Twolevel is used but a model is not defined in the between level? E.g. consider the following scenarios for y by x1-x25 (x1-x25 were measured in the within level)
1) Usevariables = x1-x25; categorical = x1-x25; cluster = cl; Type = complex; Model: y by x1-x25;
2) Usevariables = x1-x25; categorical = x1-x25; cluster = cl; Type = twolevel; Model: %within% y by x1-x25;
3) Usevariables = x1-x25; categorical = x1-x25; within = x1-x25; cluster = cl; Type = twolevel; Model: %within% y by x1-x25;
If one were not interested in modeling cluster-level parameters, what would be the right choice? I have definitely seen people using the third choice when wanting to account for the clustering in their data, but I am not sure what the benefit is. I noticed that in scenario 2, when x1-x25 were not defined as within variables (within = x1-x25) their thresholds were modeled in the between level, which I understand why. Should I then follow 1 or 2, if I am only interested in taking into account the clustering in the data? (note. clusters = 38, DEFF > 2 for all variables).
Approach 3) is just wrong because it says that the x's have no between-level variation.
The choice between 1) and 2) relates to the "aggregatability" concept discussed in the chapter on our website:
Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.
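For readers unfamiliar with the DEFF values mentioned in this exchange, the commonly used approximation is DEFF = 1 + (average cluster size - 1) * ICC. A small Python sketch follows; the input values are hypothetical and not taken from the poster's data.

```python
def design_effect(icc, avg_cluster_size):
    """Approximate design effect: 1 + (average cluster size - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc

# Hypothetical values: ICC of 0.10 with 20 observations per cluster on average.
deff = design_effect(icc=0.10, avg_cluster_size=20)
```

Under these assumed inputs the design effect is about 2.9, so even a modest ICC can give DEFF > 2 when clusters are large.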
Margarita posted on Thursday, March 02, 2017 - 8:28 am
Thank you Dr. Muthen, that's very helpful. If only some of the items within a measure have substantial ICCs and DEFF values, would it make sense, based on your experience, to define those items with low ICC/DEFF as within variables, and estimate the parameters on the between level only for those with acceptable ICC/DEFF values? I haven't seen twolevel CFA examples in the literature (at least I haven't found any) where the items of a measure are divided based on their ICC, so I am wondering whether accounting for the clustering in the data for items that have no between variance would be counterproductive.
You are doing a 2-level factor analysis. I would not separate items into high and low ICC/DEFF but include them all in the 2-level analysis and simply report the low, insignificant between-level loadings for the items with low ICC.
I’m a complete beginner with Mplus and I am trying to figure out what the correct analysis for my research question is. I want to test effects of level-2 predictors (teachers' practices) on level-1 outcomes (success of the children's transition to school). Actually, I’m not really interested in differentiated results for classes or within/between-group variance (maybe I’m wrong here). I first of all want to test “main effects” (random intercepts) and be able to say whether predictors (at level 1 and level 2) have an impact on children’s transition success and how large the effect is (on average, over all 39 classes). Can I do this using the command type=complex? Or do I have to use type=twolevel (which seems correct to me anyway, but much more complicated than the former)?
I am a beginner with Mplus and am trying to figure out the correct analysis for my research question.
I have dyadic data from mothers and fathers. I am interested in running a path analysis with 2 predictor variables, 2 mediators and 1 outcome variable, all at the individual level. Can I just use type = complex with the couple ID as the cluster variable to estimate the model, or do I need to run a multilevel analysis with type = twolevel?
I would like to see if the model differs for moms and dads. Is it possible to compare the groups given the non-independence of the samples? If so, how would I go about doing that?
You can analyze this in a single-level, multivariate form, that is, your data set has 5+5 variables (columns) and you model mothers and fathers jointly - that accounts for their non-independence. You see papers on this under Dyadic Analysis on our website at
Seems like that gives the most flexible model, but you can use a Type=Twolevel approach as well. Type=Complex would say that moms and dads have the same parameter values, which would be limiting (it simply corrects the SEs for non-independence).
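The single-level, multivariate (wide) layout described in this reply, with 5 mother columns and 5 father columns per couple, can be sketched as a data-restructuring step. The Python code below is illustrative only; the variable names, the `m_`/`f_` prefixes, and the `to_wide` helper are hypothetical.

```python
def to_wide(long_rows, variables):
    """Turn one-row-per-person records into one-row-per-couple records."""
    wide = {}
    for row in long_rows:
        couple = wide.setdefault(row["couple"], {"couple": row["couple"]})
        prefix = "m_" if row["role"] == "mother" else "f_"
        for var in variables:
            couple[prefix + var] = row[var]
    return list(wide.values())

# 2 predictors, 2 mediators, 1 outcome per person (made-up values).
variables = ["x1", "x2", "med1", "med2", "y"]
long_rows = [
    {"couple": 1, "role": "mother", "x1": 2, "x2": 3, "med1": 4, "med2": 1, "y": 5},
    {"couple": 1, "role": "father", "x1": 1, "x2": 2, "med1": 3, "med2": 2, "y": 4},
    {"couple": 2, "role": "mother", "x1": 3, "x2": 3, "med1": 2, "med2": 2, "y": 3},
    {"couple": 2, "role": "father", "x1": 2, "x2": 1, "med1": 1, "med2": 3, "y": 2},
]
wide_rows = to_wide(long_rows, variables)
```

Each couple becomes one row with 5 + 5 analysis variables, so the couple is the unit of analysis and mother and father parameters can be estimated, and compared, within the same model.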
I have siblings in my dataset. Some of the variables are reported by the children and some by the parents. I know that I can use type=complex to account for interdependence of siblings on the child-reported variables. Can I also use type=complex when including parent-reported variables or should I use type=twolevel to account for the different levels in my data?
Nadim Khatib posted on Thursday, February 06, 2020 - 12:19 pm
After trying to fit a model as a twolevel model I find that estimates are very small on the between-level, the fit (based on SRMR between) is bad, or it doesn't converge. Should I just use type=complex instead?
See if the SEs change when you use Type=Complex versus regular. If not, this indicates that you don't need twolevel modeling.
Stefan P. posted on Thursday, July 16, 2020 - 4:28 am
Dear Dres. M&M,
I have latent variables (Big Five), each measured by three observed variables at four timepoints. Now I want to predict the score from age and gender - with standard errors corrected for repeated measures. Thus, I need the mean (of the latent variable). 1) Isn't it possible to conduct just a type=complex analysis with latent variables? 2) How can I obtain the mean?
2) You don't need latent variable means to study the effects of age and gender. In fact, with the Complex approach, you can't get them.
Stefan P. posted on Saturday, July 18, 2020 - 6:12 am
I'm totally desperate right now: 1) I tried the single-level analysis in wide format for the last three weeks - until I came to the conclusion that a growth curve model makes no sense, since I don't want to examine the dependence on time but on age (we already talked about DATA COHORT in this context here in the forum). 2) How can I investigate the effect of age on - say - openness (O) without knowing the mean value for O? Ultimately, I want to have a regression function of the form O = mean + b1*age + b2*age^2 + b3*age^3 + female + e. The code so far produces the "BETWEEN COVARIANCE MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE" error:
USEVARIABLES ARE SEX AGE AGE_2 AGE_3 P215 P220 P225; !O
MISSING ARE ALL (-99);
WITHIN = AGE AGE_2 AGE_3;
BETWEEN = SEX;
CLUSTER = PID;
Analysis: TYPE = TWOLEVEL RANDOM;
MODEL:
%WITHIN%
Ow BY P215 P220 P225 (p1-p3);
s | Ow ON AGE AGE_2 AGE_3;
%BETWEEN%
Ob BY P215 P220 P225 (p1-p3);
P215@0 P220@0 P225@0;
Ob ON SEX;
OUTPUT: SAMPSTAT;
1) In the longitudinal factor analysis framework I talk about in my web talk, you don't need a growth model for the factors. It has the factor means being different in a free way at the different time points (fixed at zero at the first time point).
2) Perhaps you are saying that people are of different ages at the different time points. Perhaps you are saying that the time distances between the waves are not the same. It isn't clear to me.
Btw, this statement is incorrect because a random slope needs to refer to just 1 slope per statement and you mention 3:
s | Ow ON AGE AGE_2 AGE_3;
Stefan P. posted on Monday, July 20, 2020 - 1:57 am
1) Slowly I'm beginning to understand (I hope)... Unfortunately, the output gives the error message NO CONVERGENCE. NUMBER OF ITERATIONS EXCEEDED, although the diagram looks good. What could be the cause? 2) I want to know how the average trait score develops over age - corrected for repeated measures (the time distance is always 4 years).