Data Structure
Mplus Discussion > Multilevel Data/Complex Sample >
 Anonymous posted on Wednesday, December 22, 2004 - 2:44 am
I have a quick question about data structure. I am very familiar with the HLM program, especially with the data structure (2 datasets). So, I was wondering if I had to have my between level measures aggregated on my cluster measure for multilevel SEM when I set my data up? Or, can I leave the data measured at the individual level and the Mplus program will handle this for me? Thank you.
 Linda K. Muthen posted on Wednesday, December 22, 2004 - 8:42 am
For variables that are both within and between, you don't need to provide anything except the within level variable. Mplus does the rest. For a variable that is only between, you need to provide that cluster average for each individual.
 Anonymous posted on Wednesday, December 22, 2004 - 9:34 am
Thanks for the help. However, I have a follow-up question. Does this mean that between-only data have to rest in a separate data file, similar to the HLM format? For example, if I have my within data in within.dat, then the between-only data need to be in between.dat. Is this correct?
 bmuthen posted on Wednesday, December 22, 2004 - 9:41 am
The Mplus data arrangement is different from HLM's. Mplus does not need two files, only one. The file has as rows the individuals and as columns the variables - both individual-level and between-level variables.
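To illustrate the single-file layout described above, here is a minimal Python sketch that merges an HLM-style pair of tables into one table with one row per individual. The variable names (id, cluster, y, w), the example values, and the space-delimited output are assumptions made for illustration; they are not taken from the original posts.

```python
# Hypothetical HLM-style inputs: one table per level.
level1 = [
    {"id": 1, "cluster": 101, "y": 3.2},
    {"id": 2, "cluster": 101, "y": 2.9},
    {"id": 3, "cluster": 102, "y": 4.1},
]
# Between-level (cluster-level) values, keyed by cluster ID.
level2 = {101: {"w": 0.5}, 102: {"w": 1.7}}


def merge_levels(level1_rows, level2_by_cluster):
    """Return one row per individual with the cluster's values repeated."""
    merged = []
    for row in level1_rows:
        between = level2_by_cluster[row["cluster"]]
        merged.append({**row, **between})
    return merged


def to_free_format(rows, columns):
    """Render rows as space-delimited text, one individual per line."""
    return "\n".join(" ".join(str(r[c]) for c in columns) for r in rows)


rows = merge_levels(level1, level2)
text = to_free_format(rows, ["id", "cluster", "y", "w"])
print(text)
```

Every individual in cluster 101 carries the same value of w, which is what the one-file arrangement expects for a between-level variable.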
 Anonymous posted on Wednesday, December 22, 2004 - 11:20 am
Ok, I am now getting an error message that is as follows:

One or more between-level variables have variation within a cluster for
one or more clusters. Check your data and format statement.

The program identifies the variable that appears to be causing the problem, but I am unsure what it really means. So, what does this error message mean? Any suggestions on how to fix the problem?

 Linda K. Muthen posted on Wednesday, December 22, 2004 - 11:32 am
If you declare that a variable is a between variable, each observation in a cluster must have the same value for that variable. The value would be the average of the individual level values for each cluster.
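A quick way to check this before running Mplus is to look for clusters where the declared between variable takes more than one value. The Python sketch below does that and also attaches the cluster average to each row. The function names, column names, and data are hypothetical. Note that the check compares values exactly, so it describes the data as stored; it cannot detect a format statement that misreads the file.

```python
from collections import defaultdict


def clusters_with_variation(rows, cluster_key, var_key):
    """Return sorted IDs of clusters where var_key takes more than one value."""
    values = defaultdict(set)
    for row in rows:
        values[row[cluster_key]].add(row[var_key])
    return sorted(c for c, seen in values.items() if len(seen) > 1)


def attach_cluster_mean(rows, cluster_key, var_key, new_key):
    """Return copies of rows with the cluster average of var_key in new_key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[cluster_key]] += row[var_key]
        counts[row[cluster_key]] += 1
    return [
        {**row, new_key: totals[row[cluster_key]] / counts[row[cluster_key]]}
        for row in rows
    ]


# Hypothetical data: x varies inside cluster 1, so x cannot be declared
# as a between variable as it stands.
rows = [
    {"cluster": 1, "x": 2.0},
    {"cluster": 1, "x": 4.0},
    {"cluster": 2, "x": 5.0},
    {"cluster": 2, "x": 5.0},
]
bad_before = clusters_with_variation(rows, "cluster", "x")
with_means = attach_cluster_mean(rows, "cluster", "x", "x_mean")
bad_after = clusters_with_variation(with_means, "cluster", "x_mean")
print(bad_before, bad_after)
```

After averaging, x_mean is constant within each cluster, so the second check returns an empty list.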
 Anonymous posted on Wednesday, December 22, 2004 - 11:54 am
Doesn't this artificially inflate the correlations at the between level? If I only have 36 clusters but 2000 observations within the clusters, doesn't giving each observation in a cluster the same value for that variable cause calculation trouble? That is, instead of 36 observations, I would be estimating using 2000.
 bmuthen posted on Wednesday, December 22, 2004 - 12:06 pm
Not a problem when you do twolevel analysis because then you tell the program that that variable is a between variable and that it only varies across the 36 clusters that you have.
 Anonymous posted on Wednesday, December 22, 2004 - 12:49 pm
Thank you!
 Bridget Goosby posted on Friday, March 10, 2006 - 1:23 pm
I am using the Type=Complex command along with the cluster and strata features. I keep getting the following error:
Each stratum must contain unique cluster IDs. Clusters are not nested within strata.

This is quite confusing. When I look at the actual data, within each stratum there are cases that fall within one of two clusters. Is this problematic? All of the cells are populated in each stratum and cluster, and the total number of clusters adds up to the number of cases in the sample. I would appreciate your feedback on this! Thanks.
 Linda K. Muthen posted on Friday, March 10, 2006 - 4:22 pm
Individuals from the same cluster cannot be in different strata. It seems that your data violates this. If you need further help, please send your input, data, output, and license number to
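One way to find the offending cases is to list every cluster ID that appears in more than one stratum. The Python sketch below is only an illustration: the function names, column names, and data are hypothetical. The relabeling step is an assumption that applies only if the repeated IDs actually refer to different clusters in different strata; it is not advice taken from the thread.

```python
from collections import defaultdict


def clusters_in_multiple_strata(rows, stratum_key, cluster_key):
    """Map each cluster ID found in more than one stratum to those strata."""
    strata_by_cluster = defaultdict(set)
    for row in rows:
        strata_by_cluster[row[cluster_key]].add(row[stratum_key])
    return {
        c: sorted(s) for c, s in strata_by_cluster.items() if len(s) > 1
    }


def relabel_clusters(rows, stratum_key, cluster_key, new_key):
    """Add an ID that is unique per (stratum, cluster) pair.

    Assumption: the same cluster code in two strata denotes two
    different clusters, as when codes 1 and 2 are reused in every stratum.
    """
    ids = {}
    out = []
    for row in rows:
        pair = (row[stratum_key], row[cluster_key])
        ids.setdefault(pair, len(ids) + 1)
        out.append({**row, new_key: ids[pair]})
    return out


# Hypothetical data: cluster codes 1 and 2 are reused in both strata.
rows = [
    {"stratum": 10, "cluster": 1},
    {"stratum": 10, "cluster": 2},
    {"stratum": 20, "cluster": 1},
    {"stratum": 20, "cluster": 2},
]
conflicts = clusters_in_multiple_strata(rows, "stratum", "cluster")
relabeled = relabel_clusters(rows, "stratum", "cluster", "cluster_uid")
remaining = clusters_in_multiple_strata(relabeled, "stratum", "cluster_uid")
print(conflicts, remaining)
```

If the repeated IDs really do denote the same cluster spanning several strata, relabeling would hide a genuine design problem rather than fix it.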
 Felix Dinger posted on Wednesday, April 29, 2009 - 7:51 am
Dear everybody,

I have a question concerning the TYPE=COMPLEX analysis. I don't really get the difference between the STRATIFICATION and CLUSTER options. I can't find an exact definition in the Mplus User’s Guide, Fifth Edition (Muthén & Muthén, 1998-2007) either.

So what is the difference between these two options?

In which cases do I have to use STRATIFICATION and when CLUSTER?

Thanks for your help!
 Linda K. Muthen posted on Wednesday, April 29, 2009 - 10:37 am
The CLUSTER option is used when the results will be inferred to a larger group than the groups sampled from. With STRATIFICATION, inference will not go beyond the groups sampled from. See a complex survey book like the one by Korn and Graubard for more information.
 Ebrahim Hamedi posted on Thursday, August 23, 2012 - 11:15 pm
I get this message:

"One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement."

I checked the file again and again and found no fault with the between variable. I have 13 clusters and 13 values for each cluster. It says the problematic cluster is number 13. I checked the values for this cluster one by one. No problem at all. Unfortunately I still get this message. Can you please help me with this? Any idea will be appreciated.

many thanks,
 Linda K. Muthen posted on Friday, August 24, 2012 - 6:29 am
Please send the output, data, and your license number to
 PB posted on Friday, November 30, 2012 - 8:16 am
In my dataset I have multiple brand evaluations per respondent, i.e., one respondent evaluates multiple brands. However, each individual brand is, in turn, also evaluated by multiple respondents. Hence, in my data structure, there appears to be some sort of cross-classification or multiple membership. My raw data would look like this:

Brand Respondent
B1 R1
B1 R2
B2 R1
B2 R3
B3 R2
B3 R3

I would like to do an EFA on several categorical variables on the brand level.
If I use TYPE=COMPLEX with CLUSTER = Respondent, this would mean that we account for the non-independence of brand evaluations within one respondent.
If I use TYPE=COMPLEX with CLUSTER = Brand, this would mean that evaluations within one cluster (brand) are for one specific brand only, respectively.

However, now, in order to account for the complex data structure stated above which somehow includes both cluster options, I thought about using TYPE=COMPLEX TWOLEVEL with CLUSTER = Brand Respondent. However, I fear that this is inappropriate as in our case, the secondary sampling unit is not really nested within the primary sampling unit. Is this correct and, if so, could you suggest an alternative way to account for this complex data structure in an EFA in Mplus? Thank you very much in advance.
 Linda K. Muthen posted on Friday, November 30, 2012 - 3:26 pm
If I understand you correctly, your data looks as follows:

Respondent Brand1 Brand2 Brand3 etc.

If a respondent does not evaluate all brands, the brands not evaluated would get a missing value. In this setting, non-independence of observations is handled by multivariate analysis. If you want to do an EFA, you would say:

USEVARIABLES = Brand1-Brand100;

You do not need COMPLEX or TWOLEVEL.
 PB posted on Saturday, December 01, 2012 - 7:08 am
Thank you very much for the response. Actually, I am conducting an EFA on 43 binary items which are evaluated by the respondents for a subset of brands. Hence, my data looks like this:

Respondent Brand BinaryItem1 BinaryItem2 etc.
R1 B1 1 0
R1 B2 1 1
R2 B1 0 1
R2 B3 1 1
R3 B2 0 0
R3 B4 1 1

Wanting to conduct an EFA on the binary items 1 - 43 across all brands and respondents, I do not need to consider the fact that multiple brand evaluations are by the same respondent AND that multiple rows in the dataset are on the same brand (implying that the evaluations in these rows on the same brand are also linked)? The multivariate analysis would take care of these non-independencies?
 Bengt O. Muthen posted on Saturday, December 01, 2012 - 5:04 pm
Would it be proper to view each item-brand combination as a variable in a multivariate vector of variables that respondents rate? So if each respondent rates 2 brands and there are 43 items, you have a data set with 86 variables, that is, 86 columns in your data. That treats brand as a fixed mode of variation as opposed to a random mode of variation. For random you draw inference to a population of brands. I think cross-classified analysis comes into play for the random case.

If you go with the fixed mode idea, the question is what factor structure one should expect for the "86 columns". The correlation due to the fact that the same respondent rates more than one brand should be captured by the factors.
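As a rough illustration of the fixed-mode layout, the Python sketch below reshapes the long respondent-by-brand rows into one row per respondent, with one column per brand-item combination. This is only one possible reading of the suggestion: here the columns are keyed by brand name, and brands a respondent did not rate are filled with a placeholder missing code. The column names, the "." placeholder, and the two-item subset are assumptions.

```python
def long_to_wide(rows, items, brands, missing="."):
    """Reshape long rows into one record per respondent.

    Returns (columns, table), where columns lists the brand-item names
    and table maps each respondent to values in that column order.
    """
    wide = {}
    for row in rows:
        record = wide.setdefault(row["respondent"], {})
        for item in items:
            record[f"{row['brand']}_{item}"] = row[item]
    columns = [f"{b}_{i}" for b in brands for i in items]
    table = {
        resp: [record.get(col, missing) for col in columns]
        for resp, record in wide.items()
    }
    return columns, table


# The six example rows from the post above, limited to two binary items.
rows = [
    {"respondent": "R1", "brand": "B1", "item1": 1, "item2": 0},
    {"respondent": "R1", "brand": "B2", "item1": 1, "item2": 1},
    {"respondent": "R2", "brand": "B1", "item1": 0, "item2": 1},
    {"respondent": "R2", "brand": "B3", "item1": 1, "item2": 1},
    {"respondent": "R3", "brand": "B2", "item1": 0, "item2": 0},
    {"respondent": "R3", "brand": "B4", "item1": 1, "item2": 1},
]
columns, table = long_to_wide(
    rows, ["item1", "item2"], ["B1", "B2", "B3", "B4"]
)
for resp, values in table.items():
    print(resp, *values)
```

With 43 items the same function would produce 43 columns per brand, so the number of columns grows with the number of distinct brands, not with the number of brands each respondent rated.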
 stephanie moller posted on Thursday, January 16, 2014 - 10:07 am
Can I use a covariance matrix as an input dataset for a multilevel model (instead of the original values)? If so, how would the covariance matrix need to be set up? Is there an example that describes the file? Would I need a within matrix and a between matrix?

I ask this because I am working with restricted data that I can only analyze remotely, and the system doesn't support Mplus, but I could get the matrix from the system and analyze it on my computer? Maybe? Yes?
 Linda K. Muthen posted on Thursday, January 16, 2014 - 10:14 am
No. You need individual-level data for multilevel models.