ann feng posted on Thursday, November 11, 2010 - 1:23 pm
I am attempting a multilevel analysis of survey data collected annually over 3 years. The data were collected with geographic stratification (no cluster sampling) within each state, and county of residence is my chosen independent cluster unit (the data are on a continental scale). My main interest is the relationship between the outcome and the between-group predictor. Since the combined 3-year sample is used, I could fit a hierarchical model with year of interview as a within-group predictor and specify the stratification to reflect the sampling design. But there are more than 4000 strata in the data, so instead I fitted the model ignoring the sampling strata and specified STRATIFICATION = year of interview to allow for some time dependence. I am a little perplexed that I didn't account for the sampling strata and instead "post"-stratified the data by year. The between-group covariate is time-invariant, by the way, and although computation does not seem to be a problem for my model of choice, I wonder whether I correctly adjusted for the year index here. I could always include year as a within-group predictor with a random slope across clusters (i.e., counties), but when I attempted this specification the computation failed (maybe my inexperience with the syntax was to blame). So I am leaning toward the model that uses year as the stratification variable, but I am very uneasy about the implications. Many thanks for any input from this resourceful community!
I would suggest that you use year (two dummy predictors) as a within-group predictor with a fixed slope (instead of a random slope). Keep in mind that the strata specification affects only the SEs, not the model estimates. A strata specification that doesn't reflect the true sampling method would not be correct.
ann feng posted on Friday, November 12, 2010 - 10:14 am
Thanks a lot, Tihomir, for your advice! I do have a follow-up question if you don't mind. Both TYPE=COMPLEX and TYPE=TWOLEVEL allow (or require) the CLUSTER option, but does it have different meanings in the two? From reading your Mplus notes 'Stratification in Multivariate Modeling', I surmise that units within a cluster (if cluster sampling is used) should preferably be as heterogeneous as possible in order to improve estimation precision, but from a random-effects modeling standpoint (allowing for intracluster correlation) the units are supposed to be similar with respect to the variable of interest. So I am a bit confused as to whether the CLUSTER option is treated differently in the complex and two-level methods. Again, thanks for sharing your expertise!
The difference between complex and two-level is about what the model describes. Complex modeling gives a model for the entire population while taking into account the sampling scheme that causes non-independence between the observations. Twolevel modeling on the other hand describes the exact clustering effect and yields models that are specific to each cluster. In many cases both models can be used but the interpretation is different.
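The difference in interpretation matters most for nonlinear models. For a random-intercept logistic model, a commonly used approximation relates the cluster-specific slope (what a two-level model estimates) to the population-averaged slope (what a single-level model with TYPE=COMPLEX targets): beta_PA ≈ beta_CS / sqrt(1 + c^2 * sigma2_u), with c = 16·sqrt(3)/(15·pi) ≈ 0.588 and sigma2_u the random-intercept variance. The sketch below only evaluates this approximation; the input values are hypothetical.

```python
import math


def population_averaged_logit(beta_cluster_specific, sigma2_u):
    """Approximate marginal (population-averaged) logit slope implied by a
    cluster-specific slope in a random-intercept logistic model whose
    random-intercept variance is sigma2_u."""
    c = 16 * math.sqrt(3) / (15 * math.pi)  # about 0.588
    return beta_cluster_specific / math.sqrt(1 + c ** 2 * sigma2_u)


# Hypothetical cluster-specific slope of 1.0 under growing between-cluster variance
for sigma2_u in (0.0, 0.5, 1.0, 2.0):
    print(sigma2_u, round(population_averaged_logit(1.0, sigma2_u), 3))
```

The larger the between-cluster variance, the further the population-averaged slope shrinks toward 0. For a linear model the two slopes coincide, which is why both approaches can often be used with only the interpretation differing.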
ann feng posted on Sunday, November 14, 2010 - 2:04 pm
Thank you, Tihomir, for explaining the distinction between the two approaches and for the reference. I will read through it thoroughly. One other concern is that the between-group predictor in my model (at the county level) might not be independent, since spatial clustering with respect to this variable might be present, yet the assumption is that the random effect (a random slope predicted by this second-level covariate) is normal and independent. Should I formally test the spatial independence assumption on the random effects? Adjusting for spatial correlation may then pose a problem, since there are more than 2000 clusters (counties), and using indicator variables to specify cluster boundaries seems unwieldy, to say the least. So, to pose a more general question: how do I account for autocorrelation in random effects in Mplus? Also, is it correct to say that autocorrelation in my outcome variable (a categorical latent class variable), if present at the second (between-group) level, is already accounted for by the random effects? I hope I didn't sidetrack too much from multilevel modeling into the minute details of my model. I am really grateful for any pointers and advice you may have to help me wrap my head around this. Thanks!
You can use type=complex twolevel, where you can introduce an additional level of clustering (such as states or regions); that can account for additional dependence between the random effects of neighboring counties. See http://statmodel.com/download/SurveyJSM1.pdf This would be a bit ad hoc, though, if the sampling of the counties is not based on such clustering.
Alternatively, you can use the Bayes estimator in Mplus to estimate the two-level model and then generate plausible values; see http://statmodel.com/download/Plausible.pdf The plausible values can then be used in a spatial model to test for spatial dependence. However, Mplus currently does not estimate spatial models.
ann feng posted on Monday, November 15, 2010 - 10:56 am
Thank you so much, Tihomir, for your guidance and references!! I believe R can do some spatial independence testing, such as computing Moran's I, so I will probably resort to that once I have the Bayesian results. You have been tremendously helpful. Thanks again for your insight and help!
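For readers who want to see what the Moran's I test mentioned above computes, here is a minimal sketch of the global statistic. It assumes one value per county (e.g., a county-level plausible value exported from Mplus) and a 0/1 adjacency matrix; the four-county layout and the values are hypothetical, and a real analysis would also need a reference distribution (e.g., permutation) to judge significance.

```python
def morans_i(values, weights):
    """Global Moran's I.

    values:  one number per area (e.g. a county-level plausible value)
    weights: square spatial-weight matrix; weights[i][j] > 0 if areas i and j
             are neighbours, 0 otherwise (diagonal = 0)
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                  # deviations from the mean
    w_total = sum(sum(row) for row in weights)      # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))  # neighbour cross-products
    return (n / w_total) * cross / sum(zi * zi for zi in z)


# Hypothetical example: four counties in a row, each adjacent to the next.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], adjacency))  # smooth trend: positive I
print(morans_i([1, 0, 1, 0], adjacency))  # alternating pattern: negative I
```

Values near 0 are consistent with no spatial autocorrelation; with more than 2000 counties a sparse weight representation would be more practical than a full matrix.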
I've also got a somewhat similar question: I'm trying to analyze data from students who are nested within classes that are nested within schools. My research questions don't really concern school- or class-level variables, but I want to correct the standard errors to account for clustering. Could I use STRATIFICATION = school with CLUSTER = class and TYPE = COMPLEX to achieve this?
You can use TYPE = COMPLEX TWOLEVEL with CLUSTER = school class;, where TWOLEVEL refers to modeling that takes into account clustering within classes and COMPLEX refers to correcting the SEs and chi-square for clustering within schools.
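Why the SE correction matters can be illustrated with Kish's approximate design effect for clusters of roughly equal size, deff = 1 + (m - 1) * ICC, where m is the average cluster size. This is a back-of-the-envelope sketch, not what Mplus computes internally (Mplus uses a sandwich-type estimator); the class size, ICC, and sample size below are hypothetical.

```python
def design_effect(mean_cluster_size, icc):
    """Kish's approximate design effect for clusters of roughly equal size."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n, mean_cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n / design_effect(mean_cluster_size, icc)


def se_inflation(mean_cluster_size, icc):
    """Factor by which SEs that assume independence are too small."""
    return design_effect(mean_cluster_size, icc) ** 0.5


# Hypothetical numbers: 600 students in classes of about 20, ICC = 0.10
print(design_effect(20, 0.10))               # about 2.9
print(effective_sample_size(600, 20, 0.10))  # about 207
print(se_inflation(20, 0.10))                # about 1.70
```

Even a modest ICC can cut the effective sample size by more than half, which is also why an effect that looks large for the nominal N can fail to reach significance once clustering is taken into account.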
Student 09 posted on Tuesday, January 17, 2012 - 12:25 am
If I understand this correctly, then model parameters such as random intercepts and slopes for a model using
Cluster = school class;
Type = Complex Twolevel;
refer to differences of students within and between classes, not to differences of students within and between schools.
But suppose a researcher merely wants to control for the clustering of students within classes, while her main interest is in the model parameters describing differences of students within and between schools (not within and between classes). Is there syntax that adequately deals with such a research question?
We fit a multiple-group two-parameter logistic (2PL) IRT model (ex. 5.5) to estimate treatment effects (we are using a 46-item measure). The design is blocked on teachers, with random assignment of classes within teachers and students nested in classes. We have modeled this as STRATIFICATION=TEACHER and CLUSTER=CLASSES. With the variance fixed at 1 in both conditions and the C group latent mean fixed at 0, the freely estimated T group mean is .35. We interpret this as a difference of 35% of a standard deviation, i.e., an effect size of .35. When the nesting of the data was modeled, a nested-model comparison indicated no significant difference; without the nested structure, the parameter differed significantly and considerably from 0. Our questions are: 1) Does it make sense to adjust the SEs in a two-group IRT model where treatment effects are of interest? 2) If so, and if it is reasonable to treat the latent mean difference as an effect size, shouldn't an effect of .35 in a properly structured multilevel sample of 450 be statistically significant? Any thoughts on why it isn't? Also, we get many warnings like this: WARNING: THE BIVARIATE TABLE OF X AND Y HAS AN EMPTY CELL. I understand that this indicates a correlation of 1.0. However, the correlations between the binary items are not 1.0, or even close, in most cases. Any ideas about why we are getting these messages and what can be done (short of dropping the items)?
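One plausible reading of the empty-cell warning, offered as an assumption rather than a description of Mplus internals: a zero cell in a 2×2 table pushes the *tetrachoric* correlation estimate to the boundary (±1), while the ordinary phi (Pearson) correlation of the same 0/1 scores can be close to 0. This typically happens when one item is very easy or very hard. The sketch below screens an item pair for empty cells and reports phi; the counts are hypothetical.

```python
import math


def phi_and_empty_cells(table):
    """table = [[n00, n01], [n10, n11]]: counts for two 0/1 items X (rows)
    and Y (columns). Returns the phi correlation of the 0/1 scores and the
    list of (row, column) positions of empty cells."""
    (a, b), (c, d) = table
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    empty = [(i, j) for i in range(2) for j in range(2) if table[i][j] == 0]
    return phi, empty


# Hypothetical items in a sample of 450: X = 1 is rare (10 students),
# and no student has X = 1 and Y = 1 at the same time.
phi, empty = phi_and_empty_cells([[400, 40], [10, 0]])
print(round(phi, 3), empty)
```

Here phi is tiny even though the table has an empty cell, which matches the pattern described in the question. With 46 items one would loop this over all 1035 item pairs to see which items produce the warnings.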
I have a model as follows:
Level 1 (students): variables = educational aspirations (expect) and socioeconomic status (ses)
Level 2 (schools): not interested in this level, just controlling for it
Level 3 (countries): here I want the effect of ses on expect to be random at level 3.
Essentially, I want to model expect~B0+SES, where the intercept is random at level 2 and both the intercept and the slope are random at level 3. In R I would do:
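Written out as equations, the model described in this post can be sketched as follows. This assumes a linear model for expect with normal residuals; i indexes students, j schools, and k countries, and the symbols are introduced here for illustration rather than taken from the post.

```latex
\begin{aligned}
\text{expect}_{ijk} &= \beta_{0jk} + \beta_{1k}\,\text{SES}_{ijk} + e_{ijk} \\
\beta_{0jk} &= \gamma_{00} + u_{0jk} + v_{0k} \\
\beta_{1k} &= \gamma_{10} + v_{1k}
\end{aligned}
```

Here $u_{0jk}$ is the school-level random intercept (the level that is only controlled for), and $(v_{0k}, v_{1k})$ are the country-level random intercept and random SES slope, typically allowed to correlate.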
I am working with a data set with random sampling of counties followed by stratified random sampling of households within these counties. Does the STRATIFICATION option assume that the stratification is done at the PSU level?
So, technically speaking, the assumptions of the estimator are not the same as the actual sampling mechanism. In some cases it is OK to ignore this mismatch, but the conservative thing to do is probably to ignore the stratification and accept the larger SEs without it. This is not specific to Mplus; most software uses the same method.
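The point that dropping the strata is usually conservative can be seen in the simplest single-stage case. The sketch below compares the SE of a mean computed as if the data were one simple random sample with the classical stratified SE, assuming proportional allocation (so stratum weights W_h = n_h / n) and ignoring the finite population correction. The two strata and their values are hypothetical; it does not model the two-stage county/household design in the question.

```python
import math
import statistics


def unstratified_se(values):
    """SE of the mean treating all observations as one simple random sample."""
    return math.sqrt(statistics.variance(values) / len(values))


def stratified_mean_se(strata):
    """Stratified mean and SE with W_h = n_h / n (proportional allocation,
    no finite population correction)."""
    n = sum(len(s) for s in strata)
    mean = sum(len(s) / n * statistics.mean(s) for s in strata)
    var = sum((len(s) / n) ** 2 * statistics.variance(s) / len(s) for s in strata)
    return mean, math.sqrt(var)


strata = [[1, 2, 3, 4], [11, 12, 13, 14]]  # hypothetical strata with different means
pooled = [v for s in strata for v in s]

print(statistics.mean(pooled), unstratified_se(pooled))
print(stratified_mean_se(strata))
```

Both approaches give the same point estimate (7.5), but the unstratified SE (about 1.94) is much larger than the stratified one (about 0.46), because the between-stratum differences are counted as sampling noise when the strata are ignored.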
Thanks, Tihomir. Good suggestion. I'm glad it is not a fatal flaw in the sampling design. Such are the vicissitudes of secondary data analysis!
Danica Cruz posted on Sunday, August 03, 2014 - 6:45 pm
I'm working with a nationally representative data set that instructs users as follows: "A 1-stage sampling plan should be set up using STATE and SAMPLE variables as strata, ID as the cluster and WGHT as the weight." However, if I enter both STATE and SAMPLE in the STRATIFICATION option, I get an error:
*** ERROR in VARIABLE command Unrecognized variable in STRATIFICATION option: STATE SAMPLE
If I remove one variable from the STRATIFICATION option, the program runs with no error.
I'm new to using complex survey data in Mplus, so please tell me whether there is another way to use both strata variables in Mplus. Thank you.
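The error message suggests that STRATIFICATION accepts a single variable. A common workaround, assuming the strata are meant to be the cross-classification of STATE and SAMPLE as the data set's instructions suggest, is to create one numeric variable that gives every observed STATE×SAMPLE combination its own code before reading the data into Mplus, and then name that variable in STRATIFICATION. A minimal sketch (the state codes and values are hypothetical):

```python
def combine_strata(state, sample):
    """Give every observed (STATE, SAMPLE) combination its own integer code 1..K."""
    codes = {}
    combined = []
    for pair in zip(state, sample):
        if pair not in codes:
            codes[pair] = len(codes) + 1  # next unused stratum code
        combined.append(codes[pair])
    return combined, codes


state = [6, 6, 48, 48, 6]   # hypothetical numeric state codes
sample = [1, 2, 1, 1, 1]    # hypothetical SAMPLE values
strata, lookup = combine_strata(state, sample)
print(strata)  # [1, 2, 3, 3, 1]
```

Keeping the `lookup` dictionary makes it easy to trace each combined code back to its original STATE and SAMPLE values.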
Yoosoo posted on Tuesday, October 20, 2015 - 1:05 pm
Dear Drs. Muthen,
I have a question about analysing complex survey data with a binary outcome using a three-level model.
I am analysing surveys obtained from two-stage sampling. The sampling method involved: 1. stratifying the population by district, 2. selecting PSUs from each district, 3. selecting households from each PSU.
A three-level model is used to combine 30 national surveys into a single analysis; the three levels are household, PSU, and nation. My main outcome is a binary variable at level 1, and the main explanatory variable is a continuous variable at level 2.
I read that the Bayesian estimator used for three-level logistic analysis in Mplus does not allow incorporating weights and stratification. Could you suggest any alternatives I can use to incorporate stratification and weights in Mplus?
I would suggest the following approach. Convert the 3-level model to a 2-level model using the long-to-wide approach, transforming "household" into a multivariate observation. This will require entering missing data when PSUs contain different numbers of households. Search the manual and the web site for "long to wide" and "wide to long" if you need help doing this. You can then use the ML estimator.
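The long-to-wide step can be sketched as a small data-preparation script. It assumes long records of the form (nation, PSU, household outcome), with PSU IDs unique within nation; each PSU becomes one row with outcome columns y1..yK, and PSUs with fewer than K households are padded with a missing-data code. The code -999 is a hypothetical choice and would need to be declared with the MISSING option in Mplus. After this step, PSU rows are the observations and nations are the clusters of a two-level model.

```python
MISSING = -999  # hypothetical missing-data code (declare it in Mplus' MISSING option)


def long_to_wide(rows):
    """Turn long records (nation, psu, y) into one record per PSU:
    [nation, psu, y1, ..., yK], padding PSUs with fewer than K households."""
    groups = {}
    for nation, psu, y in rows:
        groups.setdefault((nation, psu), []).append(y)
    k = max(len(ys) for ys in groups.values())  # the largest PSU sets the width
    return [[nation, psu] + ys + [MISSING] * (k - len(ys))
            for (nation, psu), ys in groups.items()]


rows = [
    (1, 10, 0), (1, 10, 1), (1, 10, 1),  # nation 1, PSU 10: three households
    (1, 11, 0),                          # nation 1, PSU 11: one household
    (2, 20, 1), (2, 20, 0),              # nation 2, PSU 20: two households
]
for record in long_to_wide(rows):
    print(record)
```

If PSUs are large, K becomes large too, so it is worth checking the household-count distribution before choosing this route.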
Other options could be using a 30-group two-level model or treating the weights and stratification variables as covariates. The weights and stratification variables are incompatible with the Bayes estimator on a theoretical level.
I'm working with a dataset originating from two populations (Regular and Reserve Forces Army members). Within each population, we used a stratified (4 strata) random sampling technique. Hence I have 2 populations, 4 strata within each population, and sampling weights. I'm hoping to run a latent profile analysis on the overall data set (with strat is strat, weight is weight, etc.) and use Reg/Res as a predictor of the latent profiles. How should I go about this in order for the weights to remain accurate?
The weights should be proportional to 1/(probability of selection). Without a complete description of the weights and the sampling process, one cannot verify that this is the case. If it is, I would recommend using the weights in the data set without modifying them.
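As a check on whether existing weights behave like 1/(probability of selection), the sketch below computes base weights for a design like the one described: two populations, each with 4 strata, and simple random sampling within each cell, so the selection probability is n_h / N_h and the weight is N_h / n_h. All population and sample sizes are hypothetical. Weighted counts should reproduce the population sizes, and the 2×4 cells would correspond to 8 distinct stratum codes.

```python
def base_weights(design):
    """design maps (population, stratum) -> (N_h, n_h), the population size and
    sample size of that cell. Under simple random sampling within each cell the
    selection probability is n_h / N_h, so the base weight is N_h / n_h."""
    return {cell: N / n for cell, (N, n) in design.items()}


design = {  # hypothetical (N_h, n_h) per population x stratum cell
    ("Regular", 1): (4000, 200), ("Regular", 2): (3000, 150),
    ("Regular", 3): (2000, 100), ("Regular", 4): (1000, 100),
    ("Reserve", 1): (1500, 100), ("Reserve", 2): (1200, 100),
    ("Reserve", 3): (900, 90),   ("Reserve", 4): (400, 80),
}
weights = base_weights(design)

# Every respondent in a cell carries that cell's weight, so the weighted
# count of respondents reproduces the total population size.
total = sum(weights[cell] * n for cell, (_, n) in design.items())
print(weights[("Regular", 1)], weights[("Reserve", 4)], total)
```

If the supplied weights differ from such values only by a constant factor, they are still proportional to 1/(probability of selection); adjustments such as nonresponse or calibration would make them differ in other ways.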
I have a question about clustering, too. In my study, I clustered classes within schools. Some of those schools have mixed-age classrooms. Is it valid to treat each age group within a mixed-age classroom as a separate cluster to address developmental differences? Or should I cluster only on the classroom itself, without regard to the age differences?