Anonymous posted on Tuesday, August 26, 2003 - 11:16 am
Two questions. 1) For complex survey data that include person weights, strata, and PSUs (primary sampling units), which commands do I need to adjust the standard errors for cluster sampling?
2) Can I work with sample sizes as large as 50,000 fairly quickly?
1. To adjust the standard errors for clustering, you would use TYPE=COMPLEX; with CLUSTER = psu. You can handle strata by including the strata variables as covariates or using them as grouping variables.
2. 50,000 should not be a problem.
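What the clustering adjustment in answer 1 does can be illustrated with a minimal Python sketch. This is not Mplus's internal computation: it computes the linearized standard error of a weighted mean twice, once treating every person as independent and once treating only PSUs as independent. The toy data and function names are hypothetical, and the sketch ignores strata and finite-sample corrections.

```python
import math

# Hypothetical toy data: 4 PSUs with 3 respondents each. Respondents in the
# same PSU have similar outcomes, which is what makes clustering matter.
y   = [1, 1, 2,   2, 2, 3,   5, 5, 6,   6, 6, 7]
w   = [1, 1, 1,   2, 2, 2,   1, 1, 1,   2, 2, 2]   # person weights
psu = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3


def weighted_mean(y, w):
    """Weighted mean: sum(w*y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def linearized(y, w):
    """Each person's linearized contribution to the weighted mean."""
    mu, total_w = weighted_mean(y, w), sum(w)
    return [wi * (yi - mu) / total_w for wi, yi in zip(w, y)]


def naive_se(y, w):
    """Treats every person as an independent draw (ignores PSUs)."""
    return math.sqrt(sum(z * z for z in linearized(y, w)))


def cluster_se(y, w, psu):
    """Sums contributions within each PSU first, so only PSUs are treated
    as independent (with-replacement linearization, no strata, no
    finite-sample correction)."""
    totals = {}
    for z, c in zip(linearized(y, w), psu):
        totals[c] = totals.get(c, 0.0) + z
    return math.sqrt(sum(t * t for t in totals.values()))


print(f"mean={weighted_mean(y, w):.3f}  naive SE={naive_se(y, w):.3f}  "
      f"cluster SE={cluster_se(y, w, psu):.3f}")
```

With these toy numbers the sketch prints a weighted mean of 4.000, a naive SE of about 0.638, and a cluster SE of about 1.077: ignoring the PSUs understates the uncertainty.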
Anonymous posted on Tuesday, August 26, 2003 - 2:30 pm
Thank you, Dr. Muthen. I am hoping you can help me with what is likely a simple problem. I am getting an error message saying that "Input file 'C:\WINDOWS\DESKTOP\MPLUS2' does not contain valid command." But I cannot figure out what I have done wrong. Below is the input I used. Thank you in advance for your assistance.
DATA: FILE IS "C:\DATA\non_cancer.dat";
VARIABLE: NAMES ARE wtfa_sa stratum psu sex age_p gtcrisk father mother sibling children;
  USEVARIABLES ARE wtfa_sa stratum psu sex age_p gtcrisk father mother sibling children;
  MISSING IS .;
  GROUPING IS stratum ();
  CLUSTER IS psu;
  WEIGHT IS wtfa_sa;
  CATEROTICAL IS sex gtcrisk father mother sibling children;
MODEL: f1 BY father mother sibling children;
  gtcrisk ON f1 age sex;
ANALYSIS: TYPE IS COMPLEX BASIC;
  ESTIMATOR IS MLM;
  ITERATIONS = 1000;
  CONVERGENCE = 0.00005;
I cannot answer this without more information. Please send your output and data to email@example.com.
Anonymous posted on Friday, July 29, 2005 - 2:40 pm
I have noticed that the standard errors I get in Mplus 3 using TYPE=COMPLEX with the CLUSTER= option are larger than those I get in Stata for a similar model using its "robust" option, which applies a Huber/White correction (with a clustering variable). I thought both packages used the same approach and should therefore give very similar results. Am I wrong about that? The corrected SEs in Mplus were about 15% larger than those in Stata for a probit model (single dichotomous outcome) and about 7% larger for a linear regression model. (Both models were run on a sample of about 650 clusters with about 10 observations per cluster, in a relatively balanced design.) (For the dichotomous outcome, the SEs from Mplus 3 were very similar using either the WLS or the WLSMV estimator.) Can you provide some intuition for why these differences between the software packages would occur? Thanks much.
Mplus Web Note 7 has a simulation study of a multivariate probit model showing that the Mplus estimates are correct. It also has a growth model simulation study (a growth model can be viewed as a generalization of the linear regression model) showing that the Mplus estimates are correct. This simulation study, including the data, is also available on this web page: see http://statmodel.com/mplus/examples/webnote.html
Our FCMS (2005) article shows that Mplus and SUDAAN produce the same estimates for linear and logistic regression.
As far as we know, the method used in both packages is the same; that is, the method used in gllamm is the same as the Mplus MLR estimator. Of course, WLS and ML are not quite the same, so that can be a source of slight differences.
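One place where package results can legitimately drift apart is the finite-sample multiplier applied to the sandwich variance. Stata's regress with clustered SEs, for example, multiplies the variance by G/(G-1) × (N-1)/(N-K), with G clusters, N observations, and K coefficients (its ML commands such as probit use G/(G-1) alone); whether Mplus applies a comparable factor is not covered in this thread. Plugging in the approximate numbers from the question (650 clusters of about 10 observations) with an assumed K = 2, a quick Python check suggests this factor is far too small to explain a 7-15% gap.

```python
import math


def regress_cluster_factor(n_obs, n_clusters, n_params):
    """Finite-sample multiplier on the cluster-robust variance, in the
    form G/(G-1) * (N-1)/(N-K) used by Stata's regress."""
    return (n_clusters / (n_clusters - 1)) * ((n_obs - 1) / (n_obs - n_params))


G = 650          # clusters, from the question
N = G * 10       # about 10 observations per cluster
K = 2            # assumed: intercept + one predictor
f = regress_cluster_factor(N, G, K)
print(f"variance factor = {f:.4f}, SE factor = {math.sqrt(f):.4f}")
```

The SE changes by less than 0.1%, so with this many clusters a difference of 7-15% has to come from somewhere else (as it turned out below, a coding error).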
Anonymous posted on Friday, July 29, 2005 - 6:24 pm
Oops, operator error. I found the error in my code, and got them to match up. Very reassuring. Thanks
M. Walls posted on Friday, March 07, 2008 - 11:12 am
We are working on a manuscript revision involving an SEM with a sample size of 352. Our sampling procedure was not a multi-stage cluster technique per se, but we do have data from youth at 10 different geographical locations.
Having said this, a reviewer for our paper is asking that we "adjust for clustering." In addition, s/he asks us to do "the equivalent of the Chow test."
For other analyses, we have always checked our final model by adjusting the standard errors for each individual's "nested-ness" in 1 of the 10 sites; however, we do not know how to do this (or a procedure that would satisfy our reviewer) in Mplus. We have tried the TYPE=COMPLEX option with "location" as our cluster variable. Is this an appropriate technique?
Thank you for any assistance.
Jan Fax posted on Sunday, March 09, 2008 - 8:28 am
I am planning to use TYPE=COMPLEX to correct for downward-biased standard errors in a regression model using data from a clustered sample (about 1,800 respondents from 54 teams). In addition to demographic variables, the model includes team-level predictors. However, I wonder whether the complex procedure also adjusts the degrees of freedom for the team-level predictors. Will these df be based on N = 1,800 individuals or N = 54 groups?
M. Walls: Both TYPE=TWOLEVEL and TYPE=COMPLEX require a minimum of 30 clusters in most cases. I do not know what the Chow test is. You could consider adding 9 dummy variables to represent the 10 clusters.
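The dummy-variable suggestion treats site as a fixed effect: each of 9 indicators shifts the intercept for one site relative to a reference site. A minimal Python sketch of building those indicators before the data are read into Mplus; the site codes 1 to 10 and the names site2 ... site10 are hypothetical.

```python
def site_dummies(sites, reference):
    """Build 0/1 indicators for every site except the reference site.

    With 10 sites this yields 9 dummy variables; the reference site is
    the category the other sites are compared against (all zeros).
    """
    others = sorted(s for s in set(sites) if s != reference)
    names = [f"site{s}" for s in others]
    rows = [[1 if s == o else 0 for o in others] for s in sites]
    return names, rows


# Hypothetical site codes for 13 youths; every site 1..10 occurs at least once.
sites = list(range(1, 11)) + [3, 7, 1]
names, rows = site_dummies(sites, reference=1)
print(names)
print(rows[10])   # a youth from site 3
```

Including all 10 indicators together with an intercept would make the design matrix singular, which is why one site is left out.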
Jan Fax posted on Sunday, March 09, 2008 - 1:02 pm
Dear Dr. Muthen,
thanks for your response. In fact I tried TYPE=TWOLEVEL, but perhaps because of rather low ICCs (ranging from 0.01 to 0.03), these models did not converge. Hence my second choice is to try TYPE=COMPLEX (or to ignore the clustering). But still, when using TYPE=COMPLEX, are the df for group-level predictors based on the individual-level or the group-level sample (or some combination of them)?
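Whether ignoring ICCs this small is safe depends on team size. A rough check uses Kish's design effect for equal cluster sizes, deff = 1 + (m - 1)ρ, with m = 1800/54 ≈ 33 respondents per team. For a regression coefficient the inflation is roughly 1 + (m - 1)ρ_x ρ, where ρ_x is the ICC of the predictor, so a team-level predictor (ρ_x = 1) gets the full inflation. This Python sketch is an approximation under those equal-size assumptions, not what Mplus computes.

```python
import math


def design_effect(mean_cluster_size, icc, icc_x=1.0):
    """Approximate variance inflation from clustering (equal cluster sizes).

    icc   : intraclass correlation of the outcome (residual)
    icc_x : intraclass correlation of the predictor; 1.0 for a
            team-level predictor that is constant within teams
    """
    return 1.0 + (mean_cluster_size - 1.0) * icc * icc_x


m = 1800 / 54   # average team size from the question
for rho in (0.01, 0.03):
    deff = design_effect(m, rho)
    print(f"ICC={rho:.2f}: deff={deff:.2f}, SE inflation={math.sqrt(deff):.2f}")
```

This prints a design effect of 1.32 (SEs about 15% too small if clustering is ignored) for ICC = 0.01 and 1.97 (about 40%) for ICC = 0.03, mainly affecting the team-level predictors.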
The standard error computation with Type=Complex uses a "sandwich" procedure, the standard Huber-White procedure that takes clustering into account and assumes independence only among cluster units, not individual units. In this way, the SEs are correct for both individual-level and cluster-level covariates (Mplus does not compute or report "df" per se, but presents approx normal z scores as the test statistics). As an example, see UG ex 11.6, Step 1 and Step 2.
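The sandwich idea can be sketched in a few lines of Python for an OLS regression on one team-level covariate. This is the textbook Huber-White cluster-robust estimator with no small-sample correction, written for illustration; it is not Mplus's code, and the team effects u and noise e are made-up numbers.

```python
import math


def ols_cluster_se(x, y, cluster):
    """OLS of y on (1, x); returns (beta, naive_se, robust_se), each a
    [intercept, slope] list. The robust SE is the sandwich
    A^-1 B A^-1, where the 'meat' B sums score contributions within
    each cluster first, so only clusters are assumed independent."""
    n = len(y)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    ainv = [[sxx / det, -sx / det], [-sx / det, n / det]]   # (X'X)^-1
    beta = [ainv[0][0] * sy + ainv[0][1] * sxy,
            ainv[1][0] * sy + ainv[1][1] * sxy]
    resid = [yi - beta[0] - beta[1] * xi for xi, yi in zip(x, y)]

    # Naive covariance s^2 (X'X)^-1: every individual independent
    s2 = sum(r * r for r in resid) / (n - 2)
    naive = [math.sqrt(s2 * ainv[k][k]) for k in range(2)]

    # Meat: sum over clusters of g_c g_c', with g_c = sum_i (1, x_i) * e_i
    g = {}
    for xi, ri, c in zip(x, resid, cluster):
        g0, g1 = g.get(c, (0.0, 0.0))
        g[c] = (g0 + ri, g1 + xi * ri)
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for v in g.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += v[i] * v[j]

    def matmul(p, q):
        return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    vrob = matmul(matmul(ainv, meat), ainv)                  # the sandwich
    robust = [math.sqrt(vrob[k][k]) for k in range(2)]
    return beta, naive, robust


u = [0.8, -0.6, 0.5, -0.9, 0.7, -0.5]   # hypothetical team effects
e = [-0.1, 0.0, 0.1]                     # hypothetical individual noise
x, y, team = [], [], []
for c in range(6):
    for ei in e:
        x.append(float(c))               # team-level covariate
        y.append(1.0 + 0.5 * c + u[c] + ei)
        team.append(c)

beta, naive, robust = ols_cluster_se(x, y, team)
print(f"slope={beta[1]:.3f}  naive SE={naive[1]:.3f}  cluster SE={robust[1]:.3f}")
```

Because x is constant within teams and the residuals are correlated within teams, the cluster SE for the slope comes out noticeably larger than the naive SE; no df adjustment enters, only the grouping of scores by cluster.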
Is the standard error of the mean estimated differently in TYPE=COMPLEX than in TYPE=TWOLEVEL? I'm using a random effects ANOVA (similar to yours in Topic 7) with 1 variable. The SE with TYPE=TWOLEVEL is 0.077, whereas with TYPE=COMPLEX it is 0.080. They should be the same, shouldn't they?
Yes, they should be the same. See Slides 21-24 of the Topic 7 course handout where this is illustrated. If you can't see the difference between that and what you have, please send the relevant files and your license number to firstname.lastname@example.org.
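For a balanced design (every cluster the same size), a short Python sketch confirms that the two SEs agree exactly: the ML information-based SE of the mean from the random-effects ANOVA equals the cluster-robust sandwich SE of the sample mean, assuming no small-sample correction factor is applied. The closed-form ML estimates below hold only for balanced data with a between-cluster variance estimate above zero; the data are made up.

```python
import math


def se_mean_twolevel_ml(groups):
    """Model-based SE of the grand mean from the ML fit of a balanced
    random-effects ANOVA, y_ci = mu + u_c + e_ci.

    Balanced-data ML closed forms:
      sigma2_hat = SSW / (G*(m-1))     (within variance)
      lambda_hat = SSB / G             (estimates sigma2 + m*tau)
      Var(mu_hat) = lambda_hat / (G*m)
    valid only if lambda_hat >= sigma2_hat (tau_hat not at the 0 boundary).
    """
    G, m = len(groups), len(groups[0])
    means = [sum(g) / m for g in groups]
    grand = sum(means) / G
    ssw = sum((v - mc) ** 2 for g, mc in zip(groups, means) for v in g)
    ssb = m * sum((mc - grand) ** 2 for mc in means)
    sigma2, lam = ssw / (G * (m - 1)), ssb / G
    if lam < sigma2:
        raise ValueError("tau_hat at boundary; closed form not valid")
    return math.sqrt(lam / (G * m))


def se_mean_complex(groups):
    """Sandwich SE of the sample mean with clusters as independent units."""
    n = sum(len(g) for g in groups)
    grand = sum(v for g in groups for v in g) / n
    meat = sum(sum(v - grand for v in g) ** 2 for g in groups)
    return math.sqrt(meat) / n


groups = [[1, 2, 3], [4, 5, 6], [2, 3, 4], [7, 8, 9]]   # 4 clusters of 3
print(se_mean_twolevel_ml(groups), se_mean_complex(groups))
```

Both functions reduce to the sum of squared cluster-mean deviations divided by G squared, which is why they coincide; this sketch says nothing about unbalanced data, where the ML estimate of the mean is no longer the raw mean.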
My co-authors and I used a complex sampling strategy in one of our studies: we asked respondents to list and rank-order ten contacts and then asked follow-up questions about their first choice and about one contact randomly selected from the other nine (ranks 2-10). We predict their preferences from the follow-up questions with a two-level model that accounts for the non-independence of observations within respondents: the dyadic relationships between a respondent and a particular contact form Level 1, and respondents form Level 2. We would like to account for the fact that half of our sample consists of #1 choices and half of other choices (#2-10), i.e., for the non-randomness of our sample. Is there a way to do this with Mplus?
Thanks for your quick reply, Bengt. I'm not sure I fully understand your question, so here is an example: I would report on two of my old friends (my best friend and one randomly chosen from my top 10); you would report on two of your old friends (your best friend and another randomly chosen from your top 10, with no overlap between your friends and mine); and so on. So our final sample would consist of 50% best friends and 50% other friends from respondents' top 10. In other words, best friends are overrepresented in our sample, which may or may not introduce statistical bias. Does this answer your question?
Couldn't such data be analyzed in a single-level "wide" model, where the report on the best friend is one variable (one column in the data) and the report on the other, randomly chosen friend is another variable? Dyadic data can often be represented as single-level wide data. But I am not familiar enough with these kinds of data.
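The wide layout described above can be built from long dyadic records with a few lines of Python. The field names (respondent, rank, closeness) and values are hypothetical; closeness stands in for any follow-up question asked about a contact.

```python
def dyads_long_to_wide(rows):
    """Collapse two long records per respondent into one wide record:
    the best friend (rank 1) and the randomly drawn friend (rank 2-10)
    become separate columns of a single-level data row."""
    wide = {}
    for r in rows:
        rec = wide.setdefault(r["respondent"], {"respondent": r["respondent"]})
        if r["rank"] == 1:
            rec["close_best"] = r["closeness"]
        else:
            rec["close_other"] = r["closeness"]
            rec["rank_other"] = r["rank"]   # keep which rank was drawn
    return list(wide.values())


long_rows = [
    {"respondent": 101, "rank": 1, "closeness": 7},
    {"respondent": 101, "rank": 6, "closeness": 4},
    {"respondent": 102, "rank": 1, "closeness": 6},
    {"respondent": 102, "rank": 3, "closeness": 5},
]
for rec in dyads_long_to_wide(long_rows):
    print(rec)
```

In the wide form each respondent contributes exactly one row, so the best-friend and other-friend reports are two variables of the same case rather than two observations of unequal selection status.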
We were hoping to run some zero-inflated negative binomial models with students nested within neighborhoods, and neighborhoods nested within schools. I understand that with TYPE=THREELEVEL, outcome variables can only be continuous. Do you know of any workaround for this? Or will complex survey features be available for TYPE=THREELEVEL with categorical variables in a future version?
Thank you for your quick reply. In the User's Guide, I noticed the following:
Complex survey features are not available for TYPE=THREELEVEL with categorical variables or TYPE=CROSSCLASSIFIED because these models are estimated using Bayesian analysis for which complex survey features have not been generally developed.
Are complex survey features now available? When I attempt to run these, I get the message that "TYPE=THREELEVEL is not available when outcomes are censored, unordered categorical (nominal), count or continous-time survival variables."