Adjusting standard errors for cluster...
Mplus Discussion > Multilevel Data/Complex Sample
 Anonymous posted on Tuesday, August 26, 2003 - 11:16 am
Two questions.
1) For complex survey data that include a person weight, strata, and PSUs (primary sampling units), what commands do I need to adjust the standard errors for cluster sampling?

2) Can I work with sample sizes as big as 50,000 fairly quickly?
 Linda K. Muthen posted on Tuesday, August 26, 2003 - 11:49 am
1. To adjust the standard errors for clustering, you would use TYPE=COMPLEX; with CLUSTER = psu. You can handle strata by including the strata variables as covariates or using them as grouping variables.

2. 50,000 should not be a problem.
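For readers looking for the exact syntax, a minimal input sketch of this setup might look as follows (y and x are hypothetical analysis variables; the survey variables follow the poster's naming, and strata would be handled per the advice above):

```
VARIABLE:
  NAMES ARE wtfa_sa stratum psu y x;
  USEVARIABLES ARE y x;
  CLUSTER = psu;       ! adjust SEs for cluster sampling
  WEIGHT = wtfa_sa;    ! person weight
ANALYSIS:
  TYPE = COMPLEX;
MODEL:
  y ON x;
```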
 Anonymous posted on Tuesday, August 26, 2003 - 2:30 pm
Thank you Dr. Muthen. I am hoping you can help me with what is likely a simple problem. I am getting an error message saying "Input file 'C:\WINDOWS\DESKTOP\MPLUS2' does not contain valid command." But I cannot figure out what I have done wrong. Below are the input commands that I used. Thank you in advance for your assistance.


FILE IS "C:\DATA\non_cancer.dat";

NAMES ARE wtfa_sa stratum psu sex age_p gtcrisk father mother sibling
USEVARIABLES ARE wtfa_sa stratum psu sex age_p gtcrisk father mother
sibling children;
GROUPING IS stratum ();
WEIGHT IS wtfa_sa;
CATEROTICAL IS sex gtcrisk father mother sibling children;

f1 BY father mother sibling children;
gtcrisk ON f1 age sex;

CONVERGENCE = 0.00005;

 Linda K. Muthen posted on Tuesday, August 26, 2003 - 3:19 pm
Please send the output containing the error message to
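For readers who hit the same "does not contain valid command" message: one common cause is an input file that lacks the Mplus command headers (DATA:, VARIABLE:, MODEL:, ANALYSIS:). A hedged reconstruction of the input above with headers added, the NAMES list terminated with a semicolon, CATEGORICAL spelled correctly, age_p used consistently, and the incomplete GROUPING statement dropped might look like this (whether it resolves this poster's exact error cannot be confirmed from the thread):

```
DATA:
  FILE IS "C:\DATA\non_cancer.dat";
VARIABLE:
  NAMES ARE wtfa_sa stratum psu sex age_p gtcrisk father mother
            sibling children;
  USEVARIABLES ARE sex age_p gtcrisk father mother sibling children;
  WEIGHT IS wtfa_sa;
  CATEGORICAL ARE sex gtcrisk father mother sibling children;
MODEL:
  f1 BY father mother sibling children;
  gtcrisk ON f1 age_p sex;
ANALYSIS:
  CONVERGENCE = 0.00005;
```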
 Anonymous posted on Monday, February 09, 2004 - 5:09 pm
I'm attempting to run a multigroup SEM in Mplus 2.14. The data were collected from a clustered design, so I am using TYPE=GENERAL COMPLEX and the CLUSTER option to correct the SEs.

When I submit the model to Mplus using the MLM estimator, I get the message:


When I attempt to use the MLMV estimator, I get no Chi-Square warning, but Mplus only returns the RMSEA, SRMR, and WRMR statistics.

I have experimented with changing the ITERATIONS and CONVERGENCE options, to no avail.

Could my problems be due to the fact that both of my subgroups have several clusters of size n=1?
 Linda K. Muthen posted on Tuesday, February 10, 2004 - 9:19 am
I cannot answer this without more information. Please send your output and data to
 Anonymous posted on Friday, July 29, 2005 - 2:40 pm
I have noticed that the standard errors that I get in Mplus 3 using type=complex and using the cluster= command are larger than those I get in Stata when running a similar model using their "robust" command which uses a Huber/White correction (with a clustering variable). I thought both of these were adopting the same approach and should therefore provide very similar results--am I wrong about that? I found the corrected errors in Mplus to be about 15% larger than the corrected errors in Stata for a probit model (single dichotomous outcome) and about 7% larger for a linear regression model. (Both of these models were run on a sample of about 650 clusters with about 10 observations in each cluster---and a relatively balanced design). (In the case of the dichotomous outcome, the SE's from Mplus 3 were very similar using either the WLS or the WLSMV estimators.) Can you provide some intuition why these differences between the software packages would occur? Thanks much.
 Tihomir Asparouhov posted on Friday, July 29, 2005 - 3:26 pm
Mplus Web Note 7 has a simulation study on a multivariate probit model that shows the Mplus estimates to be correct. It also has a growth model simulation study (which can be viewed as a generalization of the linear regression model) that shows the Mplus estimates to be correct. This simulation study, including data etc., is also available on this web page; see

Our FCMS (2005) article shows that Mplus and SUDAAN produce the same estimates for linear and logistic regression.

As far as we know, the method used in both packages is the same; that is, the method used in gllamm is the same as the Mplus MLR estimator. Of course, WLS and ML are not quite the same, so that can be a source of slight differences.
 Anonymous posted on Friday, July 29, 2005 - 6:24 pm
Oops, operator error. I found the error in my code, and got them to match up. Very reassuring. Thanks
 M. Walls posted on Friday, March 07, 2008 - 11:12 am
We are working on a manuscript revision involving an SEM with a sample size of 352. Our sampling procedure was not a multi-stage cluster technique per se, but we do have data from youth at 10 different geographical locations.

Having said this, a reviewer for our paper is asking that we "adjust for clustering." In addition, s/he asks us to do "the equivalent of the Chow test."

For other analyses, we have always checked our final model by adjusting the standard errors as a function of an individual's "nested-ness" in 1 of the 10 sites; however, we do not know how to do this procedure (or one that would satisfy our reviewer) in Mplus. We have tried the TYPE = COMPLEX option with "location" as our cluster variable. Is this an appropriate technique?

Thank you for any assistance.
 Jan Fax posted on Sunday, March 09, 2008 - 8:28 am

I am planning to use the TYPE=COMPLEX command to correct for downward-biased standard errors in a regression model using data from a clustered sample (ca. 1,800 respondents from 54 teams). In the model, next to demographic variables I also have team-level predictor variables. However, I wonder whether the COMPLEX procedure also adjusts the degrees of freedom for the team-level predictors. Will these df be based on the N=1,800 individuals or the N=54 groups?

 Linda K. Muthen posted on Sunday, March 09, 2008 - 10:08 am
I would suggest using TYPE=TWOLEVEL where you can model on the team level using the team-level predictors.
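A sketch of that two-level setup, with hypothetical names: y for the outcome, x for an individual-level demographic, and w for a team-level predictor:

```
VARIABLE:
  NAMES ARE team y x w;
  CLUSTER = team;
  WITHIN = x;        ! individual-level only
  BETWEEN = w;       ! team-level only
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  y ON x;
  %BETWEEN%
  y ON w;
```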
 Linda K. Muthen posted on Sunday, March 09, 2008 - 10:18 am
M. Walls: Both TYPE=TWOLEVEL and TYPE=COMPLEX require a minimum of 30 clusters in most cases. I do not know what the Chow test is. You could consider adding 9 dummy variables to represent the 10 clusters.
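One way to set up the 9 dummy variables in the input itself is the DEFINE command. A sketch with a hypothetical cluster variable named site, coded 1-10, with site 10 as the reference (recent Mplus versions accept ==; older versions use EQ):

```
VARIABLE:
  NAMES ARE site y x;
  USEVARIABLES ARE y x d1-d9;   ! DEFINE-created variables go last
DEFINE:
  d1 = 0; IF (site == 1) THEN d1 = 1;
  d2 = 0; IF (site == 2) THEN d2 = 1;
  ! ... d3 through d9 follow the same pattern
MODEL:
  y ON x d1-d9;
```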
 Jan Fax posted on Sunday, March 09, 2008 - 1:02 pm
Dear Dr. Muthen,

Thanks for your response. In fact I tried TYPE=TWOLEVEL, but perhaps due to rather low ICCs (ranging from 0.01 to 0.03), these models did not converge. Hence my second choice now is to try TYPE=COMPLEX (or to ignore the clustering). But still, when using TYPE=COMPLEX, are the df for group-level predictors based on the individual-level or the group-level sample (or some combination of the two)?

 Bengt O. Muthen posted on Sunday, March 09, 2008 - 5:08 pm
The standard error computation with Type=Complex uses a "sandwich" procedure, the standard Huber-White procedure that takes clustering into account and assumes independence only among cluster units, not individual units. In this way, the SEs are correct for both individual-level and cluster-level covariates (Mplus does not compute or report "df" per se, but presents approx normal z scores as the test statistics). As an example, see UG ex 11.6, Step 1 and Step 2.
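As a reference point, the cluster-robust sandwich described above has the standard generic Huber-White form (general textbook notation, not Mplus-specific): writing s_ij for the score vector of individual i in cluster j evaluated at the estimates,

```latex
\widehat{V}(\hat\theta) = A^{-1} B A^{-1}, \qquad
A = -\sum_{j}\sum_{i} \frac{\partial s_{ij}}{\partial \theta^{\top}}, \qquad
B = \sum_{j} \Bigl(\sum_{i} s_{ij}\Bigr)\Bigl(\sum_{i} s_{ij}\Bigr)^{\!\top}
```

Scores are summed within clusters before taking the outer product, which is why independence is required only across clusters, not across individuals.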
 Scott R. Colwell posted on Tuesday, October 11, 2011 - 6:31 am
Is the standard error of the mean estimated differently in TYPE=COMPLEX versus TYPE=TWOLEVEL? I'm using a random effects ANOVA (similar to yours in topic 7) with 1 variable. The SE in the TYPE=TWOLEVEL is 0.077 whereas in the TYPE=COMPLEX, the SE is 0.080. They should be the same shouldn't they?
 Linda K. Muthen posted on Tuesday, October 11, 2011 - 7:22 am
Yes, they should be the same. See Slides 21-24 of the Topic 7 course handout where this is illustrated. If you can't see the difference between that and what you have, please send the relevant files and your license number to
 Jorge Walter posted on Tuesday, July 30, 2013 - 3:30 am
My co-authors and I had a complex sampling strategy in one of our studies: we asked respondents to list and rank-order ten contacts and then asked them follow-up questions on their first choice and on a randomly selected contact from the other nine (ranks 2-10). We predict their preferences with the help of the follow-up questions and a 2-level model (accounting for non-independence of observations within respondents: the dyadic relationships between respondents and a particular contact are Level 1, and respondents are Level 2). We would like to account for the fact that half of our sample is represented by respondents' #1 choices and half by their other choices (#2-10), i.e., for the non-randomness of our sample. Is there a way to do this in Mplus?

Thanks in advance for your help!
 Bengt O. Muthen posted on Tuesday, July 30, 2013 - 2:56 pm
Are the two halves independent subjects so that a 2-group analysis is possible?
 Jorge Walter posted on Tuesday, July 30, 2013 - 5:02 pm
Thanks for your quick reply, Bengt. I'm not sure I fully understand your question, so here is an example: I would report on two of my old friends (my best friend and one randomly chosen from my top 10); you would report on two of your old friends (your best friend and another one randomly chosen from your top 10, with no overlap between your and my friends); and so on. So our final sample would consist of 50% best friends and 50% others from respondents' top 10. In other words, best friends are overrepresented in our sample, which may or may not introduce statistical biases. Does this answer your question?
 Jorge Walter posted on Tuesday, July 30, 2013 - 5:14 pm
PS I checked and since I'm estimating a 2-level model (see my description above), I don't seem to be able to include both clustering and grouping at the same time...
 Bengt O. Muthen posted on Tuesday, July 30, 2013 - 5:31 pm
Couldn't such data be analyzed in a single-level, "wide" model where the report on the best friend is one variable (one column in the data) and the report on the randomly chosen friend is another variable? Dyadic data can often be represented as single-level wide. But I am not familiar enough with these kinds of data.
 Jorge Walter posted on Tuesday, July 30, 2013 - 5:46 pm
I don't think that is feasible given that I'm trying to predict preferences (i.e., the DV is unique for each dyad).

Since I'm technically oversampling the "best friends" category, is there anything I could do with sample weights?
 Bengt O. Muthen posted on Wednesday, July 31, 2013 - 11:04 am
You may want to post this on SEMNET to see if dyadic researchers have ideas.
 Jacqueline Sims posted on Tuesday, April 26, 2016 - 8:59 am
We were hoping to run some zero-inflated negative binomial models with students nested within neighborhoods and neighborhoods nested within schools. I understand that for TYPE=THREELEVEL, outcome variables can only be continuous. Do you know of any workaround for this? Or will complex survey features be available for THREELEVEL with categorical variables in a future version?

Thanks very much!
 Linda K. Muthen posted on Tuesday, April 26, 2016 - 9:29 am
TYPE=THREELEVEL is available for categorical variables. It is not available for count variables. I know of no immediate plans to add this.
 Jacqueline Sims posted on Tuesday, April 26, 2016 - 9:34 am
Thank you for your quick reply. In the User Guide, I noticed that:

Complex survey features are not available for TYPE=THREELEVEL with categorical variables or TYPE=CROSSCLASSIFIED because these models are estimated using Bayesian analysis for which complex survey features have not been generally developed.

Are complex survey features now available? When I attempt to run these, I get the message "TYPE=THREELEVEL is not available when outcomes are censored, unordered categorical (nominal), count or continuous-time survival variables."
 Tihomir Asparouhov posted on Tuesday, April 26, 2016 - 10:16 am
Here are some ideas:

- You can use TYPE=TWOLEVEL COMPLEX to accommodate two-stage clustering like this.

- You can convert a univariate 3-level model to an equivalent multivariate two-level model (search for DATA LONGTOWIDE: in the manual).

- Run it as continuous, retain the cluster level with the bigger ICC, and then run as two-level using the NB distribution (discarding the clustering level with the smaller, possibly nonexistent, effect).

- Run two-level using neighborhoods as the cluster to get the between-level factor scores, then analyze the factor scores in a two-level run with school as the cluster.
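A sketch of the first suggestion, with hypothetical variable names: school is the level whose clustering is corrected via COMPLEX, neighborhood (nbhd) is modeled as the between level. Whether the zero-inflated count specification runs with this exact combination should be checked against the Mplus version in use:

```
VARIABLE:
  NAMES ARE school nbhd y x;
  CLUSTER = school nbhd;   ! highest level listed first
  COUNT = y (nbi);         ! zero-inflated negative binomial
ANALYSIS:
  TYPE = TWOLEVEL COMPLEX;
  ESTIMATOR = MLR;
MODEL:
  %WITHIN%
  y ON x;
  %BETWEEN%
  y;
```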
 Jacqueline Sims posted on Wednesday, April 27, 2016 - 1:55 pm
Thank you, Tihomir, for the suggestions! We will explore them. Thanks again!
 N_2018 posted on Thursday, July 06, 2017 - 2:23 pm

I am planning to conduct a simple path analysis using continuous observed variables. Participants (N = 5,227) were recruited from 11 countries. My outcome and predictors are all at the individual level, but I'd like to adjust my standard errors for the non-independence within clusters. I am under the impression that in Mplus, accurate standard errors will not be produced with such a small number of clusters -- is this true? If so, is there a reference that states the minimum number of clusters required for accurate results? If I am incorrect and it is possible to adjust the standard errors for the non-independence within clusters, would TYPE = TWOLEVEL COMPLEX be the appropriate command? Again, I have individual-level predictors and no cluster-level predictors (or outcomes).

Many thanks for your time and attention.

- N
 Bengt O. Muthen posted on Thursday, July 06, 2017 - 6:58 pm
Yes, 11 clusters is too low. I can't think of literature on this. You can create 10 dummy variable covariates to account for some of the clustering or use Bayes if you are familiar with that.
 Anton Dominicson  posted on Monday, July 23, 2018 - 3:50 pm
Dear Mplus team,
I have a question on whether the TYPE=COMPLEX command might be a pertinent option for our data. We are working with behavioral observation data in which researchers/observers code the participants' actions in their natural setting. A third of the sample was observed by two independent observers in order to calculate inter-observer agreement statistics, so we now have subjects whose actions were coded twice. In the analyses, would it be admissible to include both codings for each subject, i.e., two cases per subject, with the COMPLEX option? I'm not sure if the Huber/White corrections are designed to handle these types of scenarios. Many thanks.
 Bengt O. Muthen posted on Tuesday, July 24, 2018 - 6:39 am
Take a multivariate approach: if, say, 10 variables are measured per participant, having 2 raters can be handled by viewing the data as having 20 variables per participant (20 columns in the data). That is, use TYPE=GENERAL, not COMPLEX or TWOLEVEL.

If for instance, the 10 variables measure one factor, the multivariate approach considers 2 factors, one for each rater. You can then see how highly the factors relate.
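A sketch of the two-factor setup described in the last paragraph, with hypothetical item names a1-a10 for rater 1 and b1-b10 for rater 2:

```
VARIABLE:
  NAMES ARE a1-a10 b1-b10;   ! same 10 items, wide across the 2 raters
ANALYSIS:
  TYPE = GENERAL;
MODEL:
  f1 BY a1-a10;   ! rater 1 factor
  f2 BY b1-b10;   ! rater 2 factor
  f1 WITH f2;     ! inter-rater agreement as a factor covariance
```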