ann feng posted on Thursday, November 11, 2010  1:23 pm



I am attempting a multilevel analysis of survey data collected annually over 3 years. Data were collected with geographic stratification (no cluster sampling) within each state, and county of residence is my chosen independent cluster (the data are on a continental scale). My main interest is the relationship between the outcome and the between-group predictor. Since the 3-year combined sample is used, I could fit a hierarchical model using year of interview as a within-group predictor and specify the stratification to reflect the sampling design. But there are more than 4000 strata in the data, so alternatively I fitted the model ignoring the sampling strata and instead specified stratification = year of interview to allow for some time dependence. I am a little perplexed by the fact that I didn't account for the sampling strata and instead "post"-stratified the data by year. The between-group covariate is time-invariant, by the way, and although computation does not seem to be a problem for my model of choice, I wonder whether I correctly adjusted for the year index here. I guess I could always include year as a within-group predictor with a random slope across clusters (i.e., counties), but when I attempted this specification the computation failed (maybe my inexperience with the syntax was to blame). So I am leaning towards the model using year as stratification but am very uneasy about the implications... Many thanks for any input from this resourceful community!


I would suggest that you use year (two dummy predictors) as a within-group predictor with a fixed slope (instead of a random slope). Keep in mind that the strata specification only affects the SEs, not the model estimates. A strata specification that doesn't reflect the true sampling method would not be correct.
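The "two dummy predictors" coding of a three-wave year variable can be sketched as follows. This is a minimal Python illustration of the coding only, not Mplus syntax; the function name `year_dummies`, the year codes 1 to 3, and the choice of the earliest year as the reference category are assumptions made for the example.

```python
def year_dummies(years, reference=None):
    """Return one 0/1 indicator list per non-reference year.

    With three survey years this yields two dummy predictors; the
    reference year is absorbed into the intercept.
    """
    levels = sorted(set(years))
    if reference is None:
        reference = levels[0]  # assumption: earliest year is the baseline
    if reference not in levels:
        raise ValueError("reference year not present in the data")
    return {
        level: [1 if y == level else 0 for y in years]
        for level in levels
        if level != reference
    }


# Hypothetical interview years for five respondents.
dummies = year_dummies([1, 2, 3, 2, 1])
print(dummies)
```

Each dummy then enters the model as an ordinary within-level covariate with a fixed slope.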

ann feng posted on Friday, November 12, 2010  10:14 am



Thanks a lot, Tihomir, for your advice! I do have a follow-up question if you don't mind. Both type=complex and type=twolevel allow/require the cluster option, but does it have a different meaning in each? From reading your Mplus note 'Stratification in Multivariate Modeling' I surmise that units within a cluster (if cluster sampling is used) should probably be as heterogeneous as possible in order to improve estimation precision, but from a random effects modeling standpoint (allowing for intracluster correlation) the units are supposed to be similar with respect to the variable of interest. So I am a bit confused as to whether the cluster option is treated differently in the complex and twolevel methods. Again, thanks for sharing your expertise!


Take a look at http://statmodel.com/download/webnotes/mplusnote72.pdf section "Factor Analysis with cluster Sampling". The difference between complex and two-level modeling lies in what the model describes. Complex modeling gives a model for the entire population while taking into account the sampling scheme that causes non-independence between the observations. Two-level modeling, on the other hand, describes the exact clustering effect and yields models that are specific to each cluster. In many cases both models can be used, but the interpretation is different.

ann feng posted on Sunday, November 14, 2010  2:04 pm



Thank you, Tihomir, for explaining the distinction between the two approaches and for the reference. I will give it a thorough read. One other concern I have is that the between-group predictor in my model (at the county level) might not be independent, since spatial clustering with respect to this variable might be present, yet the assumption is that the random effect (a random slope predicted by this second-level covariate) is normal and independent. Should I formally test the spatial independence assumption on the random effects? Adjusting for the spatial correlation may pose a problem, though, since there are more than 2000 clusters (counties), and using indicator variables to specify cluster boundaries seems unwieldy, to say the least. So, to pose a more general question: how do I account for autocorrelation in random effects in Mplus? Also, is it correct to say that autocorrelation in my outcome variable (a categorical latent class variable), if present at the second or between-group level, is already accounted for by the random effects? I hope I didn't stray too far from multilevel modeling into the minute details of my model. I am really grateful for any pointers and advice you may have to help me wrap my head around this. Thanks!


You can use type=complex twolevel, which lets you introduce an additional level of clustering (such as states or regions); that can account for additional dependence between the random effects of neighboring counties. See http://statmodel.com/download/SurveyJSM1.pdf This would be a bit ad hoc, though, if the sampling of the counties is not based on such clustering. Alternatively, you can use the Bayes estimator in Mplus to estimate the two-level model and then generate plausible values; see http://statmodel.com/download/Plausible.pdf The plausible values can then be used in a spatial model to test spatial dependence. Mplus currently does not estimate spatial models, however.

ann feng posted on Monday, November 15, 2010  10:56 am



Thank you so much, Tihomir, for your guidance and references!! I believe R is capable of some spatial independence testing, such as computing Moran's I, so I will probably resort to that once I am able to obtain the Bayesian results. You have been tremendously helpful. Thanks again for your insight and help!
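For readers unfamiliar with Moran's I, the statistic itself is simple to compute. The sketch below is a plain Python illustration under stated assumptions: `values` stands for one number per county (for example a plausible value or an estimated random effect), `weights` is a hypothetical spatial weight matrix, and `chain_adjacency` is a toy helper that treats the units as neighbors along a line. It computes only the statistic, not a significance test.

```python
def morans_i(values, weights):
    """Moran's I for values x_i and a spatial weight matrix w_ij.

    I = (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean of x and S0 is the sum of all weights.
    """
    n = len(values)
    if len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if s0 == 0 or denom == 0:
        raise ValueError("need nonzero weights and nonconstant values")
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / s0) * (num / denom)


def chain_adjacency(n):
    """Toy weight matrix: unit i neighbors units i - 1 and i + 1."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w


# A smooth trend across neighbors gives positive autocorrelation.
print(morans_i([1, 2, 3, 4], chain_adjacency(4)))
```

Values near zero suggest little spatial autocorrelation under the chosen weights; a real analysis would build the weight matrix from county adjacency or distance.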


Hey, I've also got a somewhat similar question: I'm trying to analyze data from students who are nested within classes that are nested within schools. My research questions don't really concern school- or class-level variables, but I want to correct the standard errors to account for clustering. Could I use stratification=school with cluster=class and type=complex to achieve this? Many thanks in advance!


You can use Cluster = school class; and Type = Complex Twolevel; where Twolevel refers to modeling that takes into account clustering within classes and Complex refers to correcting SE's/chi2 for clustering within schools. 

Student 09 posted on Tuesday, January 17, 2012  12:25 am



If I understand this correctly, then model parameters such as random intercepts and slopes for a model using Cluster = school class; and Type = Complex Twolevel; refer to differences of students within and between classes, not to differences of students within and between schools. But suppose a researcher is merely interested in controlling for clustering of students within classes, while her major interest is in the model parameters referring to differences of students within and between schools (and not to students within and between classes). Is there a syntax to adequately deal with such a research question? Many thanks for your reply.


We fit a multigroup 2PL logistic model (ex. 5.5) to estimate treatment effects (we are using a 46-item measure). The design is blocked on teachers, and random assignment is of classes within teachers, with students nested in classes. We have modeled this as STRATIFICATION=TEACHER and CLUSTER=CLASSES. With the variance fixed at 1 across the two conditions and the C group latent mean fixed at 0, the freely estimated T group mean is .35. We interpret this as a difference of roughly a third of a standard deviation, or an effect of .35. A nested models comparison indicated no significant differences when the nestedness of the data was modeled. Without the nested structure, the parameter differed significantly and considerably from 0. The questions are: 1) does it make sense to adjust SEs in the two-group IRT model where treatment effects are of interest; 2) if so, and if the assumption about conceptualizing the latent variable difference as an effect size holds, an effect of .35 in a properly structured multilevel sample of 450 should be statistically significant… thoughts on why it isn’t? Also, we get a lot of these: WARNING: THE BIVARIATE TABLE OF X AND Y HAS AN EMPTY CELL. I understand that this indicates a correlation of 1.0. However, the binomial correlations are not 1.0 or even close in most cases. Ideas about why we are getting these messages and what can be done (short of dropping the items)?
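One way to see which item pairs trigger the empty-cell warning is to tabulate each pair directly. The following is a minimal Python sketch, assuming binary items coded 0/1 with `None` for missing responses; the function name `empty_cells` and the sample data are hypothetical and are not part of the poster's analysis.

```python
from collections import Counter


def empty_cells(x, y):
    """Return the (x, y) cells of the 2x2 table that have no observations.

    Pairs where either response is missing (None) are skipped.
    """
    counts = Counter(
        (a, b) for a, b in zip(x, y) if a is not None and b is not None
    )
    return [
        (a, b)
        for a in (0, 1)
        for b in (0, 1)
        if counts[(a, b)] == 0
    ]


# Hypothetical responses: nobody scores 0 on item_x and 1 on item_y.
item_x = [0, 0, 1, 1, 1, None]
item_y = [0, 0, 0, 1, 1, 1]
print(empty_cells(item_x, item_y))
```

Running this over all item pairs would show whether the empty cells come from a few very easy or very hard items.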


Student09: I would aggregate the data to the classroom level and have classroom as my unit of analysis. Then your cluster variable in a multilevel analysis can be schools. 


AnnaMaria: 1. Yes. When clustering is ignored, standard errors are too small. 2. There may be too few classrooms. Regarding the warning: a zero cell implies a correlation of one. The fact that the correlation is not estimated at one is the problem.
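The point that ignoring clustering makes standard errors too small can be illustrated with the standard design-effect approximation for equal-sized clusters, deff = 1 + (m - 1) * ICC. The sketch below is a rough Python illustration only: the sample size of 450 comes from the question above, while the cluster size of 25 and the intraclass correlation of 0.2 are made-up values, and this rule of thumb is not how Mplus computes its corrected standard errors.

```python
import math


def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n / design_effect(cluster_size, icc)


def se_inflation(cluster_size, icc):
    """Factor by which a naive standard error understates the true one."""
    return math.sqrt(design_effect(cluster_size, icc))


n = 450             # sample size mentioned in the question
cluster_size = 25   # hypothetical students per class
icc = 0.2           # hypothetical intraclass correlation

print(design_effect(cluster_size, icc))
print(effective_sample_size(n, cluster_size, icc))
print(se_inflation(cluster_size, icc))
```

Under these assumed values the 450 students carry the information of fewer than 80 independent observations, which is one way an effect of .35 can fail to reach significance once clustering is taken into account.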


I have a model as follows:
Level 1 (students): variables = educational aspirations (expect) and socioeconomic status (ses).
Level 2 (schools): not interested in this level, just controlling for it.
Level 3 (countries): here I want the effect of ses on expect to be random.
Essentially I want to model expect ~ B0 + SES, where the intercept is random at Level 2 and the intercept and slope are random at Level 3. In R I would do:
glmer(expect ~ ses + (1 | SCHOOL) + (1 + ses | COUNTRY), family = binomial(link = "probit"), data = AchData2, weights = W_FSTUWT)
I cannot quite see how to go about this in Mplus.


So far I have tried:
MISSING = .;
USEVARIABLES = Expect ses;
CLUSTER = CNT SCHOOLID;
categorical = Expect;
within = Zses;
!weight = W_FSTUWT; !weights not allowed in bayes?
ANALYSIS:
TYPE = THREELEVEL RANDOM;
ALGORITHM = INTEGRATION;
INTEGRATION = MONTECARLO;
MODEL:
%WITHIN%
S1 | Expect ON ses;
%BETWEEN SCHOOLID%
Expect; !I still get slope variances with this
!s1@0; !maybe try this? seems dubious to me.
!with or without this constraint results are similar to glmer in R
%BETWEEN CNT%
[S1]; S1; Expect;


Your idea of s1@0 on the SCHOOLID level seems right to me. And you are right that weights have not been developed in the Bayesian literature, as far as I know. Perhaps use weights-related variables as covariates.


Many thanks, Bengt. Two more questions, if that is OK. I plan on extending this model by adding a mediator. So: 1. My understanding is that the default link function in Mplus is probit? 2. I have tried to label my parameters, i.e. [S1] (p1); [s2] (p2); etc. However, when I refer to these labels in MODEL CONSTRAINT, no output is produced. Is there a reason for this?


1. For Bayes and WLSMV it is. With ML you have probit and logit. 2. Please send output to Support. 


I am working with a dataset with random sampling of counties and then stratified random sampling of households within these counties. Does the STRATIFICATION= command assume that the stratification is done at the PSU level? 


Yes 


So technically speaking, the assumptions of the estimator are not the same as the actual sampling mechanism. In some cases it is OK to ignore this mismatch ... but the conservative thing to do is probably to ignore the stratification and accept the bigger SEs without it. This is not specific to Mplus; most software uses the same method.


Thanks, Tihomir. Good suggestion. I'm glad it is not a fatal flaw in the sampling design. Such are the vicissitudes of secondary data analysis!

Danica Cruz posted on Sunday, August 03, 2014  6:45 pm



I'm working with a nationally representative data set that instructs users as follows: "A 1-stage sampling plan should be set up using STATE and SAMPLE variables as strata, ID as the cluster and WGHT as the weight." However, if I enter both STATE and SAMPLE in the STRATIFICATION option, I get an error:
*** ERROR in VARIABLE command
Unrecognized variable in STRATIFICATION option: STATE SAMPLE
If I remove one variable from the STRATIFICATION option, the program runs with no error. I'm new to using complex survey data in Mplus, so please tell me if there is another way to use both strata variables in Mplus. Thank you.


You can combine state and sample into one variable and use the new variable with the STRATIFICATION option. 

Danica Cruz posted on Tuesday, August 05, 2014  1:42 pm



Do you mean create a new variable to represent each possible combination of the two original stratification variables? For example:
sample  state  new_variable
1       48     1
2       48     2
1       38     3
2       38     4
[etc.]
1       51     101
2       51     102
(50 states + DC) * 2
Thank you.


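No, create a new variable like DEFINE: new = (100*sample) + state; The multiplier must be larger than the largest state code so that each sample/state combination gets its own value.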
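Whether a combined code of the form multiplier*sample + state is unique for every combination depends on the multiplier. With two-digit state codes a multiplier of 10 is too small: 10*1 + 48 and 10*2 + 38 both give 58. The Python sketch below checks this by brute force; it assumes sample codes 1 and 2 and state codes 1 to 51, which are illustrative guesses based on the example in the question, and the function names are hypothetical.

```python
def combine_strata(sample, state, multiplier):
    """Combine two stratification codes into one numeric code."""
    return multiplier * sample + state


def has_collisions(samples, states, multiplier):
    """True if two different (sample, state) pairs share a combined code."""
    pairs = [(s, st) for s in samples for st in states]
    codes = {combine_strata(s, st, multiplier) for s, st in pairs}
    return len(codes) < len(pairs)


samples = (1, 2)          # assumed sample codes
states = range(1, 52)     # assumed state codes 1..51

print(has_collisions(samples, states, 10))    # multiplier too small
print(has_collisions(samples, states, 100))   # exceeds the largest state code
```

Any multiplier larger than the largest state code keeps the two variables in separate digit positions, so the combined value can always be decoded back into its sample and state.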

Yoosoo posted on Tuesday, October 20, 2015  1:05 pm



Dear Drs. Muthen, I have a question about analysing complex survey data with a binary outcome using a three-level model. I am analysing surveys obtained from two-stage sampling. The sampling method involved: 1. stratifying the population by districts; 2. selecting PSUs from each district; 3. selecting households from each PSU. A three-level model is used to combine 30 national surveys; the three levels are household, PSU, and nation. My main outcome is a binary variable at level 1, and the main explanatory variable is a continuous variable at level 2. I read that the Bayesian estimator used for three-level logistic analysis does not allow incorporating weights and stratification in Mplus. Could you suggest any alternatives I can use to incorporate stratification and weights in Mplus?


I would suggest the following approach. Convert the 3-level model to a 2-level model using the long-to-wide approach, transforming "household" into a multivariate observation. This will require entering missing data when the number of households varies. Search the manual and the web site for "long to wide" and "wide to long" if you need help doing this. You can then use the ML estimator. Other options could be using a 30-group two-level model or treating the weights and stratification as covariates. The weights and stratification variables are incompatible with the Bayes estimator on a theoretical level.
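The long-to-wide step can be pictured as follows: each PSU becomes one row, the household outcomes become columns, and PSUs with fewer households are padded with a missing-value marker. This is a minimal Python sketch of the reshaping idea only, not the Mplus procedure; the record layout `(nation, psu, y)`, the function name `long_to_wide`, and the use of `None` as the missing marker are assumptions for illustration.

```python
MISSING = None  # placeholder; a real data file would use a numeric missing code


def long_to_wide(records):
    """Reshape household-level records into one row per PSU.

    records: iterable of (nation, psu, y) tuples, one per household.
    Returns rows of the form (nation, psu, y1, ..., yK), where K is the
    largest number of households in any PSU and short rows are padded.
    """
    groups = {}
    for nation, psu, y in records:
        groups.setdefault((nation, psu), []).append(y)
    if not groups:
        return []
    width = max(len(ys) for ys in groups.values())
    return [
        (nation, psu) + tuple(ys + [MISSING] * (width - len(ys)))
        for (nation, psu), ys in groups.items()
    ]


# Hypothetical long-format data: binary outcome per household.
long_data = [
    ("A", 1, 0), ("A", 1, 1),
    ("A", 2, 1),
    ("B", 1, 0), ("B", 1, 0), ("B", 1, 1),
]
for row in long_to_wide(long_data):
    print(row)
```

After reshaping, PSU is the unit of observation and nation is the only remaining cluster level, which is what turns the three-level problem into a two-level one.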


Good morning, I'm working with a dataset originating from two populations (Regular and Reserve Forces Army members). Within each population, we used a stratified (4 strata) random sampling technique. Hence I have 2 populations, 4 strata within each population, and sampling weights. I'm hoping to run a latent profile analysis on the overall data set (with strat is strat, weight is weight, etc.) and use Reg/Res as a predictor of the latent profiles. How should I go about this in order for the weights to remain accurate? Thank you!! 


The weights should be proportional to "1/probability of selection". Without a complete description of the weights and the sampling process, one cannot verify that this is the case. If it is, I would recommend using the weights in the data set without modifying them.
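For stratified random sampling, "1/probability of selection" has a simple form: a unit in a stratum with N members of which n were sampled has selection probability n/N and weight N/n. The Python sketch below works through this with made-up stratum counts; the stratum labels, the counts, and the function name `inverse_probability_weights` are hypothetical and do not describe the poster's survey.

```python
def inverse_probability_weights(population, sample):
    """Weight per stratum = 1 / (n_sampled / N_population)."""
    weights = {}
    for stratum, n_sampled in sample.items():
        if n_sampled <= 0:
            raise ValueError("each stratum needs at least one sampled unit")
        prob = n_sampled / population[stratum]
        weights[stratum] = 1.0 / prob
    return weights


# Hypothetical strata: population sizes and numbers sampled.
population = {"stratum_1": 1000, "stratum_2": 500}
sample = {"stratum_1": 100, "stratum_2": 100}

weights = inverse_probability_weights(population, sample)
print(weights)

# Summing the weights over sampled units recovers the population total.
total = sum(weights[h] * sample[h] for h in sample)
print(total)
```

A quick check of supplied weights along these lines (do they sum to roughly the population size within each stratum?) is one way to confirm they are on the inverse-probability scale before using them unmodified.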


Thank you Tihomir! 
