
Anonymous posted on Tuesday, August 26, 2003  11:16 am



Two questions. 1) For complex survey data that include person weights, strata, and PSUs (primary sampling units), what commands do I need to adjust the standard errors for cluster sampling? 2) Can I work with sample sizes as big as 50,000 fairly quickly?


1. To adjust the standard errors for clustering, you would use TYPE=COMPLEX; with CLUSTER = psu. You can handle strata by including the strata variables as covariates or using them as grouping variables. 2. 50,000 should not be a problem. 
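A minimal input along these lines might look like the sketch below. The file name and variable names (survey.dat, wt, strata, psu, y, x1, x2) are hypothetical, and the regression is only a placeholder model. Later Mplus versions also provide a STRATIFICATION option in the VARIABLE command, so check the User's Guide for the version you are running.

```
TITLE:    Sketch: cluster-adjusted SEs for survey data
DATA:     FILE IS survey.dat;        ! hypothetical file name
VARIABLE: NAMES ARE wt strata psu y x1 x2;   ! hypothetical names
          USEVARIABLES ARE y x1 x2;
          CLUSTER IS psu;            ! primary sampling unit
          WEIGHT IS wt;              ! person weight
ANALYSIS: TYPE = COMPLEX;            ! SEs adjusted for clustering
MODEL:    y ON x1 x2;                ! placeholder model
```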

Anonymous posted on Tuesday, August 26, 2003  2:30 pm



Thank you, Dr. Muthen. I am hoping you can help me with what is likely a simple problem. I am getting an error message saying that "Input file 'C:\WINDOWS\DESKTOP\MPLUS2' does not contain valid command." But I cannot figure out what I have done wrong. Below are the input commands that I used. Thank you in advance for your assistance.

******************************
DATA: FILE IS "C:\DATA\non_cancer.dat";
VARIABLE: NAMES ARE wtfa_sa stratum psu sex age_p gtcrisk father mother sibling children;
USEVARIABLES ARE wtfa_sa stratum psu sex age_p gtcrisk father mother sibling children;
MISSING IS .;
GROUPING IS stratum ();
CLUSTER IS psu;
WEIGHT IS wtfa_sa;
CATEROTICAL IS sex gtcrisk father mother sibling children;
MODEL: f1 BY father mother sibling children;
gtcrisk ON f1 age sex;
ANALYSIS: TYPE IS COMPLEX BASIC;
ESTIMATOR IS MLM;
ITERATIONS = 1000;
CONVERGENCE = 0.00005;
OUTPUT: SAMPSTAT RESIDUAL STANDARDIZED CINTERVAL TECH3;


Please send the output containing the error message to support@statmodel.com. 
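The cause of the reported message cannot be determined from the post alone, but a few things in the posted input stand out regardless: the CATEGORICAL keyword is misspelled, GROUPING has empty parentheses, the MODEL command refers to age although the variable is named age_p, and sex is declared categorical even though it is used only as a covariate (CATEGORICAL is meant for dependent variables). The sketch below tidies those points. It is not a confirmed fix: dropping the GROUPING line, the BASIC keyword, and the ESTIMATOR and iteration settings are assumptions made to keep the example simple, and which options are available depends on the Mplus version.

```
DATA:     FILE IS "C:\DATA\non_cancer.dat";
VARIABLE: NAMES ARE wtfa_sa stratum psu sex age_p gtcrisk
                    father mother sibling children;
          USEVARIABLES ARE sex age_p gtcrisk
                    father mother sibling children;
          MISSING IS .;
          CLUSTER IS psu;
          WEIGHT IS wtfa_sa;
          ! only dependent variables are declared categorical
          CATEGORICAL ARE gtcrisk father mother sibling children;
ANALYSIS: TYPE = COMPLEX;
MODEL:    f1 BY father mother sibling children;
          gtcrisk ON f1 age_p sex;   ! age_p matches the NAMES list
OUTPUT:   SAMPSTAT RESIDUAL STANDARDIZED CINTERVAL TECH3;
```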

Anonymous posted on Monday, February 09, 2004  5:09 pm



I'm attempting to run a multigroup SEM in Mplus 2.14. The data were collected from a clustered design, so I am using TYPE=GENERAL COMPLEX and the CLUSTER option to correct the SEs. When I submit the model to Mplus using the MLM estimator, I get the message: "THE MODEL ESTIMATION TERMINATED NORMALLY ROBUST CHISQUARE IS NOT DEFINED DUE TO NONPOSITIVE TRACE" When I attempt to use the MLMV estimator, I get no chi-square warning, but Mplus only returns the RMSEA, SRMR, and WRMR statistics. I have experimented with changing the ITERATIONS and CONVERGENCE options, to no avail. Could my problems be due to the fact that both of my subgroups have several clusters of size n=1?


I cannot answer this without more information. Please send your output and data to support@statmodel.com.

Anonymous posted on Friday, July 29, 2005  2:40 pm



I have noticed that the standard errors that I get in Mplus 3 using type=complex and the cluster= command are larger than those I get in Stata when running a similar model using their "robust" command, which uses a Huber/White correction (with a clustering variable). I thought both of these were adopting the same approach and should therefore provide very similar results. Am I wrong about that? I found the corrected errors in Mplus to be about 15% larger than the corrected errors in Stata for a probit model (single dichotomous outcome) and about 7% larger for a linear regression model. (Both of these models were run on a sample of about 650 clusters with about 10 observations in each cluster and a relatively balanced design.) (In the case of the dichotomous outcome, the SEs from Mplus 3 were very similar using either the WLS or the WLSMV estimator.) Can you provide some intuition why these differences between the software packages would occur? Thanks much.


Mplus Web Note 7 has a simulation study on a multivariate probit model that shows the Mplus estimates to be correct. It also has a growth model simulation study (which can be viewed as a generalization of a linear regression model) that shows the Mplus estimates to be correct. This simulation study, including the data, is also available at http://statmodel.com/mplus/examples/webnote.html. Our FCMS (2005) article shows that Mplus and SUDAAN produce the same estimates for linear and logistic regression. As far as we know, the method used in both packages is the same; that is, the method used in gllamm is the same as the Mplus MLR estimator. Of course, WLS and ML are not quite the same, so that can be a source of slight differences.

Anonymous posted on Friday, July 29, 2005  6:24 pm



Oops, operator error. I found the error in my code, and got them to match up. Very reassuring. Thanks 

M. Walls posted on Friday, March 07, 2008  11:12 am



We are working on a manuscript revision involving a SEM with a sample size of 352. Our sampling procedure was not a multistage cluster technique per se, but we do have data from youth at 10 different geographical locations. Having said this, a reviewer for our paper is asking that we "adjust for clustering." In addition, s/he asks us to do "the equivalent of the Chow test." For other analyses, we have always checked our final model by adjusting the standard errors as a function of an individual's "nestedness" in 1 of the 10 sites; however, we do not know how to do this procedure (or one that would satisfy our reviewer) using Mplus. We have tried the type = complex option with "location" as our cluster variable. Is this an appropriate technique? Thank you for any assistance.

Jan Fax posted on Sunday, March 09, 2008  8:28 am



Hi, I am planning to use the type=complex command to correct for downward-biased standard errors in a regression model using data from a clustered sample (that is, ca. 1800 respondents from 54 teams). In the model, next to demographic variables, I also have team-level predictor variables. However, I wonder whether the complex procedure also adjusts the degrees of freedom for the team-level predictors. Will these df be based on the N=1800 individuals or the N=54 groups? Best, Jan


I would suggest using TYPE=TWOLEVEL where you can model on the team level using the teamlevel predictors. 
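As a rough sketch of that suggestion, the input below places individual-level covariates on the within level and team-level predictors on the between level. All file and variable names (teams.dat, team, y, age, gender, teamsize, climate) are hypothetical placeholders.

```
DATA:     FILE IS teams.dat;           ! hypothetical file name
VARIABLE: NAMES ARE team y age gender teamsize climate;
          USEVARIABLES ARE y age gender teamsize climate;
          CLUSTER IS team;
          WITHIN = age gender;         ! individual-level covariates
          BETWEEN = teamsize climate;  ! team-level predictors
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          y ON age gender;
          %BETWEEN%
          y ON teamsize climate;
```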


M. Walls: Both TYPE=TWOLEVEL and TYPE=COMPLEX require a minimum of 30 clusters in most cases. I do not know what the Chow test is. You could consider adding 9 dummy variables to represent the 10 clusters. 
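The dummy-variable suggestion could be sketched as follows, assuming nine 0/1 indicators have already been coded in the data file (site2-site10, hypothetical names, with site 1 as the reference) and with y, x1, and x2 standing in for the variables of the actual model.

```
DATA:     FILE IS sites.dat;           ! hypothetical file name
VARIABLE: NAMES ARE y x1 x2 site2-site10;
          USEVARIABLES ARE y x1 x2 site2-site10;
MODEL:    y ON x1 x2
               site2-site10;           ! site 1 is the reference
```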

Jan Fax posted on Sunday, March 09, 2008  1:02 pm



Dear Dr. Muthen, thanks for your response. In fact I tried type=twolevel, but perhaps due to rather low ICCs (ranging from 0.01 to 0.03), these models did not converge. Hence my second choice now is to try type=complex (or to ignore the clustering). But still, when using type=complex, are the df for group-level predictors based on the "individual-level" or "group-level" sample (or some combination of them)? Best, Jan


The standard error computation with Type=Complex uses a "sandwich" procedure, the standard Huber-White procedure that takes clustering into account and assumes independence only among cluster units, not individual units. In this way, the SEs are correct for both individual-level and cluster-level covariates (Mplus does not compute or report "df" per se, but presents approx normal z scores as the test statistics). As an example, see UG ex 11.6, Step 1 and Step 2.
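In symbols, the general sandwich idea can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the Mplus documentation: $s_{ci}(\theta)$ denotes the score contribution of individual $i$ in cluster $c$, and finite-sample correction factors, which some programs apply, are left out.

```latex
\widehat{V}(\hat\theta) = A^{-1} B A^{-1}, \qquad
A = -\sum_{c}\sum_{i} \frac{\partial s_{ci}(\hat\theta)}{\partial \theta^{\top}}, \qquad
B = \sum_{c} \Bigl(\sum_{i} s_{ci}(\hat\theta)\Bigr)
             \Bigl(\sum_{i} s_{ci}(\hat\theta)\Bigr)^{\top}
```

Because the scores are summed within each cluster before the outer product is formed, only the clusters are treated as independent units.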


Is the standard error of the mean estimated differently in TYPE=COMPLEX versus TYPE=TWOLEVEL? I'm using a random effects ANOVA (similar to yours in Topic 7) with 1 variable. The SE in TYPE=TWOLEVEL is 0.077, whereas in TYPE=COMPLEX the SE is 0.080. They should be the same, shouldn't they?


Yes, they should be the same. See Slides 21-24 of the Topic 7 course handout where this is illustrated. If you can't see the difference between that and what you have, please send the relevant files and your license number to support@statmodel.com.
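For reference, the two setups being compared might look like the sketch below, with hypothetical file and variable names (anova.dat, clus, y). These are two separate input files shown together; the Topic 7 handout remains the authoritative illustration.

```
! Input 1: random effects ANOVA as a two-level model
DATA:     FILE IS anova.dat;        ! hypothetical file name
VARIABLE: NAMES ARE clus y;
          USEVARIABLES ARE y;
          CLUSTER IS clus;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          y;                        ! within-cluster variance
          %BETWEEN%
          y;                        ! between-cluster variance
          [y];                      ! overall mean

! Input 2 (separate file): same data, mean and variance
! with cluster-adjusted standard errors
DATA:     FILE IS anova.dat;
VARIABLE: NAMES ARE clus y;
          USEVARIABLES ARE y;
          CLUSTER IS clus;
ANALYSIS: TYPE = COMPLEX;
MODEL:    [y];                      ! mean
          y;                        ! total variance
```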


My coauthors and I had a complex sampling strategy in one of our studies, where we asked respondents to list and rank-order ten contacts and then asked them follow-up questions on their first choice and a randomly selected contact from the other nine contacts (ranks 2-10). We predict their preferences with the help of the follow-up questions and a 2-level model that accounts for non-independence of observations among respondents: the dyadic relationships between respondents and a particular contact represent Level 1, and respondents represent Level 2. We would like to account for the fact that half of our sample is represented by their #1 choices and half by their other choices (#2-10), i.e., for the non-randomness of our sample. Is there a way to do this with Mplus? Thanks in advance for your help!


Are the two halves independent subjects so that a 2-group analysis is possible?


Thanks for your quick reply, Bengt. I'm not sure I fully understand your question, so here is an example: I would report on two of my old friends (my best friend and one randomly chosen from my top 10); you would report on two of your old friends (your best friend and another one randomly chosen from your top 10, with no overlap between your and my friends); and so on. So our final sample would consist 50% of best friends and 50% of others from respondents' top 10. In other words, best friends are overrepresented in our sample, which may or may not introduce statistical biases. Does this answer your question?


PS I checked and since I'm estimating a 2-level model (see my description above), I don't seem to be able to include both clustering and grouping at the same time...


Couldn't such data be analyzed in a single-level, "wide" model where the report on the best friend is one variable (one column in the data) and the other randomly chosen friend is another variable? Dyadic data can often be represented as single-level wide. But I am not familiar enough with these kinds of data.


I don't think that is feasible given that I'm trying to predict preferences (i.e., the DV is unique for each dyad). Since I'm technically oversampling the "best friends" category, is there anything I could do with sample weights?


You may want to post this on SEMNET to see if dyadic researchers have ideas. 


We were hoping to run some zero-inflated negative binomial models with students nested within neighborhoods, and neighborhoods nested within schools. I understand that for TYPE=THREELEVEL, outcome variables can only be continuous. Do you know of any workaround for this? Or will complex survey features be available for THREELEVEL with categorical variables in a future version? Thanks very much!


TYPE=THREELEVEL is available for categorical variables. It is not available for count variables. I know of no immediate plans to add this. 


Thank you for your quick reply. In the User's Guide, I noticed the following: "Complex survey features are not available for TYPE=THREELEVEL with categorical variables or TYPE=CROSSCLASSIFIED because these models are estimated using Bayesian analysis for which complex survey features have not been generally developed." Are complex survey features now available? When I attempt to run these, I get the message that "TYPE=THREELEVEL is not available when outcomes are censored, unordered categorical (nominal), count or continuous-time survival variables."


Here are some ideas:
- You can use "twolevel complex" to accommodate two-stage clustering like this.
- You can convert a univariate 3-level model to an equivalent multivariate two-level model (search for DATA LONGTOWIDE in the manual).
- Run it as continuous, retain the cluster level with the bigger ICC, and then run it as two-level using the NB distribution (discard the clustering level with the smaller, possibly no, effect).
- Run two-level using neighborhoods to get the between-level factor score, then analyze the factor score in a two-level run with school as the cluster.
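The first idea might be set up roughly as in the sketch below. All names (students.dat, school, nbhd, y, x) are hypothetical, and the input assumes that the higher-level cluster variable is listed first on the CLUSTER line, so that standard errors are adjusted for schools while the two-level model is specified for neighborhoods. Whether this combination runs with a count outcome, and whether integration settings are needed, should be checked against the User's Guide for your version.

```
DATA:     FILE IS students.dat;     ! hypothetical file name
VARIABLE: NAMES ARE school nbhd y x;
          USEVARIABLES ARE y x;
          COUNT IS y (nb);          ! negative binomial; (nbi) is
                                    ! the zero-inflated variant
          CLUSTER IS school nbhd;   ! assumed: highest level first
          WITHIN = x;
ANALYSIS: TYPE = COMPLEX TWOLEVEL;
MODEL:    %WITHIN%
          y ON x;
          %BETWEEN%
          y;                        ! random intercept variance
```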


Thank you, Tihomir, for the suggestions! We will explore them. Thanks again! 
