
Alex Zablah posted on Thursday, March 10, 2005  11:31 am



Hi, hope all is well. I was hoping to get your insight/input on the following: I have survey data (6 dependent variables) for 300 customers. These 300 customers belong to one of 10 different providers. Hence, the cases are not IID. I also have survey data from these 10 providers (1 independent variable). I want to test the relationship between the provider-level independent variable and three of the customer-level dependent variables using structural equation modeling (in Mplus 3.10). Given that my between-level sample size is only 10, that means that multilevel modeling is out of the question. Right? If I disaggregate the provider-level data (i.e., assign the same value for the independent variable to each customer who shares the same provider), would it be appropriate to run the SEM model (n=300) using the type=complex design? I'd appreciate any feedback/suggestions you could offer. Best regards, Alex

bmuthen posted on Thursday, March 10, 2005  4:27 pm



Yes, I think 10 clusters makes 2-level modeling not perform well. Unfortunately, our simulations suggest that type=complex also needs more clusters than 10; at least 20. So perhaps the only way is to view "provider" as a fixed effect instead of a random effect and use 9 provider dummy variables as covariates. Bayesian analysis using priors is the only way I have seen that attempts to deal with such a small number of clusters.
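
The fixed-effect suggestion above amounts to coding the 10 providers as 9 indicator variables, with one provider left out as the reference category. A minimal Python sketch of that coding follows. The function name `provider_dummies`, the provider labels P01 to P10, and the choice of the first sorted label as the reference are illustrative assumptions, not something taken from Mplus or from the thread.

```python
def provider_dummies(provider_ids):
    """Return (dummy_rows, dummy_labels) using reference (dummy) coding.

    The first label in sorted order is the reference category, so k
    distinct providers yield k - 1 indicator columns.
    """
    labels = sorted(set(provider_ids))
    dummy_labels = labels[1:]  # drop the reference provider
    rows = [[1 if pid == lab else 0 for lab in dummy_labels]
            for pid in provider_ids]
    return rows, dummy_labels


# Hypothetical data: 300 customers spread over 10 providers (P01..P10).
providers = [f"P{(i % 10) + 1:02d}" for i in range(300)]
rows, dummy_labels = provider_dummies(providers)
print(len(dummy_labels))   # number of dummy columns
print(rows[0], rows[1])    # a customer of the reference provider, then of P02
```

The resulting columns would then be saved with the customer data and listed as covariates in the model; which provider serves as the reference only changes the interpretation of the dummy coefficients.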

Herb Marsh posted on Monday, April 08, 2013  6:33 pm



Why is it no longer possible to use Type=complex to get correct standard errors for analyses done at level 1 when there are three levels? For example, results at the student level when students are nested within classes, and classes are nested within schools.


COMPLEX TWOLEVEL, THREELEVEL, and COMPLEX THREELEVEL are all available. There have been no changes. I'm not sure I understand your question. 

Herb Marsh posted on Tuesday, April 09, 2013  9:14 pm



Linda: Here is what I did and the error message that I got. I recall that it was possible to have two cluster variables when analyses were done only at level 1, but maybe I am mistaken. In any case, why is it apparently not allowed?
cluster is ID_Schl Id_Class ;
...
ANALYSIS: type= complex; ESTIMATOR=MLR;
....
*** ERROR in VARIABLE command
Two cluster variables are allowed for TYPE=TWOLEVEL COMPLEX. Only one cluster variable is allowed for TYPE=COMPLEX (single level). Limit on the number of cluster variables reached.


We've never allowed more than one cluster variable with TYPE=COMPLEX. You would need to use TWOLEVEL COMPLEX to handle two cluster variables. 


Hello, I have a three-level model with observations (level 1) nested within individuals (level 2) nested within cities (level 3). The data I am using require sampling weights and we only have observations at 2 timepoints. From Chapter Nine of the User's Guide (p. 252) I am not sure whether I should treat this model as TYPE=TWOLEVEL (and treat our two observation points as "time") or as TYPE=THREELEVEL and treat our first level as cross-sectional. In addition, the outcome we are using is a count variable, and it is my understanding that users can't use sample weights with count variables in TYPE=THREELEVEL? Many thanks, Melissa


I would treat this as a TWOLEVEL analysis with data in the wide format. THREELEVEL is not available for count variables.

anonymous posted on Wednesday, May 08, 2013  12:04 pm



Hello, I'm aware that TYPE=COMPLEX with the CLUSTER option adjusts the chi-square statistic and the standard errors of the model for nonindependence, but not the parameter estimates (parameter estimates are adjusted with multilevel modeling). Is it that TYPE=COMPLEX with the CLUSTER option adjusts only the parameter estimates' significance, but not their magnitude? I'm wondering whether it is appropriate to estimate a model of treatment effects involving children nested within schools using TYPE=COMPLEX and the CLUSTER option.


Parameter estimates are adjusted if the WEIGHT option is used. There is no difference between COMPLEX and TWOLEVEL in this regard. 

Elina Dale posted on Sunday, May 12, 2013  9:11 am



Dear Dr. Muthen, I am wondering about the difference between TYPE=COMPLEX and TYPE=TWOLEVEL analysis of SEM in Mplus. In traditional regression modeling, there is a distinction between population-average (PA) and subject-specific (SS) models. Population-average models such as GEE describe the covariance among clustered observations, whereas SS/hierarchical models explain the source of this covariance. So, the coefficients are interpreted differently: the PA model estimates the difference in Y between group A with X and group B without X; the SS model estimates the expected change in an individual's probability of Y given a change in X. If I use TYPE=COMPLEX in my SEM because I have clustered data, is the coefficient from my structural model (the effect of treatment X on a latent factor F) interpreted as PA or SS? In other words, with the COMPLEX specification, do we have a population-average model or a random-effects model in Mplus? Do we need to specify TWOLEVEL to have a subject-specific interpretation of coefficients? Thank you!


Subject-specific refers to random coefficients. You would need to use TYPE=TWOLEVEL RANDOM with random coefficients. TYPE=COMPLEX adjusts the standard errors for nonindependence of observations.

Elina Dale posted on Monday, May 13, 2013  10:34 am



So, TYPE=COMPLEX is a marginal model? Are the coefficients interpreted as population average as in marginal models explained in papers by Zeger et al, 1988? It would be helpful to get a bit more explanation as to how some of MPlus specifications relate to more widespread / traditional types of analyses. Thank you! 


A single-level regression model (linear or logistic) is a "widespread/traditional type of analysis"; if you have a regression model and use TYPE=COMPLEX you are doing regression analysis and you get your SEs adjusted for complex survey data features. So the interpretation is the usual one for regression modeling. Same for factor analysis. If you have two-level data and don't do TYPE=TWOLEVEL but do TYPE=COMPLEX you get a so-called "aggregated" model, using terms from the well-known complex survey data literature such as the 1989 Analysis of Complex Surveys book edited by Skinner, Holt, and Smith. GEE is a limited-information estimator, not a full-information maximum-likelihood estimator. You can see the relationship between GEE estimation and the closely related limited-information WLSMV estimation in Mplus in the paper on factor analysis on our website: Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished technical report.

Elina Dale posted on Tuesday, May 14, 2013  10:14 am



Thank you, Dr. Muthen! So does this mean that TYPE=COMPLEX specifies design-based or model-based analysis? It is vital for me to understand what this specification implies, so please forgive my persistence. Skinner et al. distinguish (A) design-based vs. (B) model-based approaches to analysis. Within the model-based approach we have (a) aggregated (marginal) and (b) disaggregated (random effects) models. "A basic distinction is between design-based and model-based inference.... Aggregated analysis may therefore alternatively be referred to as marginal modelling and the distinction between aggregated and disaggregated analysis is analogous, to a limited extent, to the distinction between population-averaged and subject-specific analysis, widely used in biostatistics." Thank you again!


You may be interested in the chapter: Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316. The labels you refer to are not always clear-cut (at least not to me), so I'll describe what we do instead. With TYPE=COMPLEX we do complex survey SEs using the Huber-White sandwich estimator. The parameters are the usual single-level parameters. The fact that we can also handle SE calculations based on replicate weights might qualify us for the design-based camp; I am not sure about these distinctions. TYPE=COMPLEX does an aggregated analysis when data are hierarchical (say two-level) because it doesn't model parameters on both levels. In contrast, TYPE=TWOLEVEL or TYPE = COMPLEX TWOLEVEL does a disaggregated analysis. I discuss the difference in the above chapter in terms of factor analysis. You can also read more about what we do by reading the papers under our Complex Survey Data section: http://www.statmodel.com/resrchpap.shtml
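
To make the sandwich idea in the reply above concrete, here is a minimal Python sketch for the simplest possible case, the standard error of a sample mean. It is not Mplus's implementation: it ignores sampling weights, stratification, and finite-sample corrections, and the function names and data are hypothetical. The point it illustrates is that the point estimate is unchanged while the SE is computed from residuals summed within clusters.

```python
import math


def naive_se_mean(y):
    """SE of the mean assuming independent observations (divisor n)."""
    n = len(y)
    ybar = sum(y) / n
    return math.sqrt(sum((v - ybar) ** 2 for v in y)) / n


def cluster_robust_se_mean(y, cluster):
    """Sandwich SE of the mean: residuals are summed within each cluster
    before squaring, so within-cluster dependence enters the variance."""
    n = len(y)
    ybar = sum(y) / n
    totals = {}
    for v, c in zip(y, cluster):
        totals[c] = totals.get(c, 0.0) + (v - ybar)
    return math.sqrt(sum(t ** 2 for t in totals.values())) / n


# Hypothetical scores: members of the same cluster resemble each other.
y = [1.0, 1.2, 0.9, 3.0, 3.1, 2.8, 5.0, 5.2, 4.9]
cluster = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

print(sum(y) / len(y))                      # point estimate, same either way
print(naive_se_mean(y))
print(cluster_robust_se_mean(y, cluster))   # larger for these data
```

When every observation is its own cluster, the two functions return the same value, which is one way to see that the adjustment only matters to the extent that observations within a cluster move together.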


Another useful book is the 2003 Chambers & Skinner Wiley book. 

Elina Dale posted on Wednesday, May 15, 2013  5:37 am



Thank you so much, Dr. Muthen! Really appreciate your response, I think I understand now. I have also started looking at Chambers & Skinner 2003 book. Will check out your paper for which you sent me the link. 

Elina Dale posted on Monday, August 12, 2013  10:06 am



Dear Dr. Muthen, I have reread your paper (B. Muthen & A. Satorra, 1995) on complex sample data in SEM and I still have clarifying questions on the procedure used by Mplus when I specify "COMPLEX" in the Analysis. On pp. 281-288, you describe the aggregated analysis, which Chambers & Skinner (2003) say "may alternatively be referred to as marginal modeling". I would greatly appreciate it if you could clarify: 1) whether the aggregated approach as described in Muthen & Satorra (1995) is a model-based or design-based approach to inference, because it can be used in either according to Chambers & Skinner (2003); 2) whether the "COMPLEX" specification is a model-based aggregated approach. Last question. Typically, as you say, design-based analysis uses weights in parameter estimation. I wonder if weights are required when using "COMPLEX". Thank you!


1) I see it as a model-based approach. 2) I see COMPLEX as a model-based aggregated approach. Weights are not required when using COMPLEX. For instance, there may be just clustering.

Elina Dale posted on Wednesday, August 14, 2013  8:11 am



Thank you, Dr. Muthen! This is very helpful. 


Dear Dr. Muthen! I am analysing three-level data (students, classes, schools). The question is whether a school system reform has an effect on the achievement of students. 8th-grade students were tested before the reform was implemented, and then 8th-grade students were tested after the reform (using the same schools). I'm using a three-level model with "reform (0/1)" on the class level and estimate the effect on achievement (class level). Is this correct? Further, I wonder why I get a different estimate for "achievement ON reform" when I use type = complex (cluster = class). The estimate for type = complex is equal to the simple mean difference between reform = 0 and reform = 1 using SPSS. Thanks, Christoph Weber


You say that this is a school system reform; isn't your "reform" variable a school-level variable and not a classroom-level variable? The 3-level model results won't agree with Type=complex with cluster=class because the latter takes only classroom clustering into account. You would do better with Type=Complex Twolevel and define 2 cluster variables: school and classroom (see UG).


Thanks, I treat reform as a class-level variable because we have a kind of trend analysis. The same schools were tested two times (8th graders in 2008 and 8th graders in 2012), thus there is variation of "reform" within the school clusters. I thought that taking the complex design into account (complex or multilevel) just affects the SEs, not the estimates. Sorry, what does UG mean? Christoph


I get it, users guide 


One more question. I read a pdf "Mplus Short courses topic 7 multilevel modeling with ...". There is a random effects ANOVA example comparing type = twolevel with type = complex. Both models yield the same mean and SE. When I compare the two types with my data I get different means. Is this because of different cluster sizes? The ANOVA example uses data with equal cluster size. Thanks, Christoph Weber


Yes, my results were for equal cluster sizes. 


Will twolevel and complex only yield the same results with equal cluster sizes? 


Yes. 


To come back to the question: Why do type complex and twolevel yield different means with unequal cluster sizes? Thanks Christoph 


The mean for type=complex is the average of Y_{ij} over all observations. The mean for type=twolevel is the average of the cluster averages of Y_{ij}.
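
The two quantities named in the reply above can be compared directly. The following Python sketch takes that description at face value and uses made-up data; the function names are hypothetical, and the code does not reproduce how Mplus actually estimates either model. It only shows why the two averages agree when cluster sizes are equal and can differ when they are not.

```python
def pooled_mean(clusters):
    """Average of all Y_ij: every observation counts equally."""
    values = [y for members in clusters.values() for y in members]
    return sum(values) / len(values)


def mean_of_cluster_means(clusters):
    """Average of the cluster averages: every cluster counts equally."""
    means = [sum(members) / len(members) for members in clusters.values()]
    return sum(means) / len(means)


# Hypothetical data with unequal cluster sizes.
unequal = {"class1": [10.0, 12.0, 14.0, 16.0], "class2": [20.0]}
# Hypothetical data with equal cluster sizes.
equal = {"class1": [10.0, 14.0], "class2": [20.0, 24.0]}

print(pooled_mean(unequal), mean_of_cluster_means(unequal))
print(pooled_mean(equal), mean_of_cluster_means(equal))
```

In the unequal case the large class dominates the pooled average, whereas the average of class averages gives the single-student class the same weight as the four-student class.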


I am running a confirmatory factor analysis with TYPE = COMPLEX and a CLUSTER variable and no weighting variable, and I have three questions. 1. Is this approach appropriate for my dataset (described below)? 2. Do the equations in the Aggregated Modeling section (4.1) of the Muthén and Satorra (1995) paper on Complex Sample Data in SEM describe how Mplus estimates standard errors and fit statistics in this situation? 3. Would I get more accurate parameter estimates if I added a weighting variable? My dataset includes a total of 600 people who are married. The dataset includes 100 couples with two participating partners, and the remaining 400 people are independent cases participating without their partners. Thus, I have 500 clusters, and most clusters contain only one case. I want to estimate an aggregate confirmatory factor analysis that handles the small amount of nonindependence in the dataset. Thanks, Keith


Yes on all 3 questions. 


After having read the previous posts, the UG, and the chapter suggested, I am still not clear as to which is more appropriate for my data set. Teachers (n=93) have each rated their students (n=2,122). Each teacher rates only the students in their class, and students are rated only by their teacher. The scale purports to measure antisocial behavior and comprises 12 items scored with a 4-category rating scale. It is hypothesized to have two constructs: externalizing behavior and internalizing behavior. I am conducting a cross-sectional study using CFA to provide evidence of the factor structure, test assumptions of Cronbach's Alpha, and evaluate measurement invariance for the scale between male and female students. I have run the specified models using both TYPE=Complex and TYPE=Twolevel, with CLUSTER=class for both. I have no missing data, and have used WLSMV for both the complex and twolevel commands. Is one approach more appropriate than the other?


Type=Twolevel modeling is a more ambitious approach where you estimate new parameters beyond those of Type=Complex. Type=Complex estimates the usual single-level CFA parameters and simply adjusts the SEs and chi-square for the clustering. So you can learn more from Twolevel analysis and it is better able to reveal the factor structure.


I have diary data (n=741) of teachers (n=72) in schools (n=15). I am aware that I cannot use a three-level model with only 15 schools. 1. Would it be OK to use TYPE = Twolevel complex to take the clustering at the school level into account? 2. Would it be wrong to use a school-level variable (school leadership) as a predictor on the teacher level? Thank you very much!


1. No, Complex also needs at the very least 20 clusters. 2. No. 

Margarita posted on Monday, February 27, 2017  6:49 am



Dear Dr. Muthens, Are there any occasions in which Twolevel is used but a model is not defined on the between level? E.g., consider the following scenarios for y by x1-x25 (x1-x25 were measured on the within level):
1) Usevariables = x1-x25; categorical = x1-x25; cluster = cl; Type = complex; Model: y by x1-x25;
2) Usevariables = x1-x25; categorical = x1-x25; cluster = cl; Type = twolevel; Model: %within% y by x1-x25; %between%
3) Usevariables = x1-x25; categorical = x1-x25; within = x1-x25; cluster = cl; Type = twolevel; Model: %within% y by x1-x25; %between%
If one were not interested in modeling cluster-level parameters, what would be the right choice? I have definitely seen people using the third choice when wanting to account for the clustering in their data, but I am not sure what the benefit is. I noticed that in scenario 2, when x1-x25 were not defined as within variables (within = x1-x25), their thresholds were modeled on the between level, and I understand why. Should I then follow 1 or 2 if I am only interested in taking into account the clustering in the data? (Note: clusters = 38, DEFF > 2 for all variables.) Thank you for your constant help. Best, Margarita


Approach 3) is just wrong because it says that the x's have no between-level variation. The choice between 1) and 2) relates to the "aggregatability" concept discussed in the chapter on our website: Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.

Margarita posted on Thursday, March 02, 2017  8:28 am



Thank you, Dr. Muthen, that's very helpful. If only some of the items within a measure have substantial ICCs and DEFF values, would it make sense, based on your experience, to define those items with low ICC/DEFF as within variables, and estimate the parameters on the between level only for those with acceptable ICC/DEFF values? I haven't seen two-level CFA examples in the literature (at least I haven't found any) where the items of a measure are divided based on their ICC, so I am wondering whether accounting for the clustering in the data for items that do not have between variance would be counterproductive.


You are doing a 2-level factor analysis. I would not separate items into high and low ICC/DEFF but include them all in the 2-level analysis and simply report the low, insignificant between-level loadings for the items with low ICC.
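
The ICC and DEFF values discussed in this exchange are tied together by the usual approximation DEFF = 1 + (average cluster size - 1) * ICC. The short Python sketch below applies that formula; the function names and the numbers in the example are hypothetical and are not taken from the poster's data.

```python
def design_effect(icc, avg_cluster_size):
    """Approximate design effect: 1 + (average cluster size - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def icc_for_design_effect(deff, avg_cluster_size):
    """ICC at which the design effect reaches `deff` for a given size."""
    return (deff - 1.0) / (avg_cluster_size - 1.0)


# Hypothetical item: ICC of 0.06 with about 20 respondents per cluster.
print(design_effect(0.06, 20))

# ICC needed for DEFF = 2 when clusters average 21 respondents.
print(icc_for_design_effect(2.0, 21))
```

Because the cluster size multiplies the ICC, an item with a small ICC can still have a DEFF above 2 when clusters are large, which is one reason the two indices need not rank items the same way.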


Hello, I'm a complete beginner in Mplus and I am trying to figure out what the correct analysis for my research question is. I want to test effects of level-2 predictors (teachers' practices) on level-1 outcomes (success of the children's transition to school). Actually, I'm not really interested in differentiated results for classes or in within-/between-group variance (maybe I'm wrong here). I first of all want to test "main effects" (random intercepts) and be able to say whether predictors (at level 1 and level 2) have an impact on children's transition success and how large the effect is (on average, over all 39 classes). Can I do this using the command type=complex? Or do I have to use type=twolevel (which seems to me correct anyway but much more complicated than the former)? Thank you very much for your assistance! Tamara


Type=Complex would be fine, particularly if you don't think the within- and between-level relationships differ.


Hello, I am a beginner in Mplus and am trying to figure out the correct analysis for my research question. I have dyadic data from mothers and fathers. I am interested in running a path analysis with 2 predictor variables, 2 mediators, and 1 outcome variable, all at the individual level. Can I just use type = complex with the couple ID as the cluster variable to estimate the model, or do I need to run a multilevel analysis with type = twolevel? I would like to see if the model differs for moms and dads. Is it possible to compare the groups given the nonindependence of the samples? If so, how would I go about doing that? Thank you in advance for your assistance. Ashley


You can analyze this in a single-level, multivariate form, that is, your data set has 5+5 variables (columns) and you model mothers and fathers jointly, which accounts for their nonindependence. You can see papers on this under Dyadic Analysis on our website at http://www.statmodel.com/papers.shtml. That also makes it easy to compare mom and dad parameters (e.g., using Model Test).
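
The "5+5 variables" layout described above means one row per couple, with separate mother and father columns for each measured variable. A minimal Python sketch of that reshaping is given below. The function name `dyads_to_wide`, the keys 'couple' and 'role', the role codes 'm' and 'f', and the variable names are all hypothetical; only two variables are used to keep the example short.

```python
def dyads_to_wide(long_rows, variables):
    """Reshape one-row-per-person records into one-row-per-couple records.

    Each input row is a dict with 'couple', 'role' ('m' or 'f'), and the
    measured variables. Output columns are prefixed with the role, so v
    variables per person become 2 * v columns per couple.
    """
    wide = {}
    for row in long_rows:
        record = wide.setdefault(row["couple"], {"couple": row["couple"]})
        for var in variables:
            record[f"{row['role']}_{var}"] = row[var]
    return [wide[c] for c in sorted(wide)]


# Hypothetical long-format data: two couples, two variables per person.
long_rows = [
    {"couple": 1, "role": "m", "x1": 3.0, "y": 10.0},
    {"couple": 1, "role": "f", "x1": 2.5, "y": 12.0},
    {"couple": 2, "role": "m", "x1": 4.0, "y": 9.0},
    {"couple": 2, "role": "f", "x1": 3.5, "y": 11.0},
]

for record in dyads_to_wide(long_rows, ["x1", "y"]):
    print(record)
```

If one partner did not participate, that partner's columns are simply absent from the couple's record in this sketch; in practice they would be written out with the data set's missing-value code.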


Hi, Thanks for the response. I wanted to make sure that I understand. Are you suggesting an APIM model? Thanks, Ashley 


Right. 


Hi, Are there other options if I am not interested in doing an APIM model? Thanks, Ashley 


Seems like that gives the most flexible model, but you can use a Type=Twolevel approach as well. Type=Complex would say that moms and dads have the same parameter values, which would be limiting (it simply corrects the SEs for nonindependence).


Dear Dr Muthen, I have siblings in my dataset. Some of the variables are reported by the children and some by the parents. I know that I can use type=complex to account for interdependence of siblings on the child-reported variables. Can I also use type=complex when including parent-reported variables or should I use type=twolevel to account for the different levels in my data? Thank you so much! Sincerely, Aurelie


You might want to use twolevel. 

Nadim Khatib posted on Thursday, February 06, 2020  12:19 pm



After trying to fit a model as a two-level model I find that estimates are very small on the between level, the fit (based on SRMR between) is bad, or it doesn't converge. Should I just use type=complex instead?


See if the SEs change when you use Type=Complex versus regular. If not, this indicates that you don't need two-level modeling.

Stefan P. posted on Thursday, July 16, 2020  4:28 am



Dear Drs. M&M, I have latent variables (Big Five), each measured by three observed variables at four timepoints. Now I want to predict the score by age and gender, with standard errors corrected for the repeated measures. Thus, I need the mean (of the latent variable). 1) Isn't it possible to conduct just a type=complex analysis with latent variables? 2) How can I obtain the mean? Thank you very much, Stefan


1) Yes, but you may not get the same results as if you do either a two-level analysis (time and person), or a single-level analysis in wide format where you have 3x4 variables. See also longitudinal factor analysis discussions in my Mplus Web Talk No. 1 at http://www.statmodel.com/MplusWebTalks.shtml. 2) You don't need latent variable means to study the effects of age and gender. In fact, with the Complex approach, you can't get them.

Stefan P. posted on Saturday, July 18, 2020  6:12 am



I'm totally desperate right now: 1) I tried the single-level analysis in wide format for the last three weeks, until I came to the conclusion that a growth curve model makes no sense, since I don't want to examine the dependence on time but on age (we already talked about DATA COHORT in this context here in the forum). 2) How can I investigate the effect of age on, say, openness (O) without knowing the mean value for O? Ultimately, I want to have a regression function of the form O = mean + b1*age + b2*age^2 + b3*age^3 + b4*female + e. The code until now produces the "BETWEEN COVARIANCE MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE" error:
USEVARIABLES ARE SEX AGE AGE_2 AGE_3 P215 P220 P225; !O
MISSING ARE ALL (99);
WITHIN = AGE AGE_2 AGE_3;
BETWEEN = SEX;
CLUSTER = PID;
Analysis: TYPE = TWOLEVEL RANDOM;
MODEL:
%WITHIN%
Ow BY P215 P220 P225 (p1-p3);
s | Ow ON AGE AGE_2 AGE_3;
%BETWEEN%
Ob BY P215 P220 P225 (p1-p3);
P215@0 P220@0 P225@0;
Ob ON SEX;
OUTPUT: SAMPSTAT;


1) In the longitudinal factor analysis framework I talk about in my web talk, you don't need a growth model for the factors. It has the factor means being different in a free way at the different time points (fixed at zero at the first time point). 2) Perhaps you are saying that people are of different ages at the different time points. Perhaps you are saying that the time distances between the waves are not the same. It isn't clear to me. Btw, this statement is incorrect because a random slope needs to refer to just 1 slope per statement and you mention 3: s | Ow ON AGE AGE_2 AGE_3;

Stefan P. posted on Monday, July 20, 2020  1:57 am



1) Slowly I'm beginning to realize (I hope)... unfortunately the output gives me the error message NO CONVERGENCE. NUMBER OF ITERATIONS EXCEEDED, although the diagram looks good. What could that be? 2) I want to know how the average trait score develops over age, corrected for repeated measures (the time distance is always 4 years).


We need to see your full output to diagnose this; send to Support along with your license number.
