Mplus Discussion >> Data Structure


Data Structure


Anonymous posted on Wednesday, December 22, 2004 - 2:44 am

I have a quick question about data structure. I am very familiar with the HLM program, especially with the data structure (2 datasets). So, I was wondering if I had to have my between level measures aggregated on my cluster measure for multilevel SEM when I set my data up? Or, can I leave the data measured at the individual level and the Mplus program will handle this for me? Thank you.

Linda K. Muthen posted on Wednesday, December 22, 2004 - 8:42 am

For variables that are both within and between, you don't need to provide anything except the within level variable. Mplus does the rest. For a variable that is only between, you need to provide that cluster average for each individual.

Anonymous posted on Wednesday, December 22, 2004 - 9:34 am

Thanks for the help. However, I have a follow-up question. Does this mean that between-only data have to rest in a separate data file, similar to the HLM format? For example, if I have my within data in within.dat, then the between-only data need to be in between.dat. Is this correct?

bmuthen posted on Wednesday, December 22, 2004 - 9:41 am

The Mplus data arrangement is different from HLM's. Mplus does not need two files, only one. The file has as rows the individuals and as columns the variables - both individual-level and between-level variables.
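
For readers coming from the HLM two-file layout, here is a minimal Python sketch (not Mplus syntax) of folding a separate between-level table into one file with one row per individual. The function name `merge_levels`, the variable names `cluster`, `y`, and `w`, and the toy values are all hypothetical.

```python
def merge_levels(within_rows, between_rows, cluster_key="cluster"):
    """Attach each cluster's between-level values to every individual row."""
    between_by_cluster = {row[cluster_key]: row for row in between_rows}
    merged = []
    for row in within_rows:
        combined = dict(row)
        # Copy every between-level variable onto the individual's row.
        for name, value in between_by_cluster[row[cluster_key]].items():
            if name != cluster_key:
                combined[name] = value
        merged.append(combined)
    return merged


# Hypothetical HLM-style input: one table per level.
within_rows = [
    {"cluster": 1, "y": 3.0},
    {"cluster": 1, "y": 5.0},
    {"cluster": 2, "y": 4.0},
]
between_rows = [
    {"cluster": 1, "w": 10.0},
    {"cluster": 2, "w": 20.0},
]

merged = merge_levels(within_rows, between_rows)
for row in merged:
    print(row)
```

Each individual in a cluster ends up carrying the same value of the between-level variable, which is the single-file arrangement described above.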

Anonymous posted on Wednesday, December 22, 2004 - 11:20 am

Ok, I am now getting an error message that is as follows:

*** ERROR
One or more between-level variables have variation within a cluster for
one or more clusters. Check your data and format statement.

The program identifies the variable that appears to be causing the problem, but I am unsure what it really means. So, what does this error message mean? Any suggestions on how to fix the problem?

Thanks

Linda K. Muthen posted on Wednesday, December 22, 2004 - 11:32 am

If you declare that a variable is a between variable, each observation in a cluster must have the same value for that variable. The value would be the average of the individual level values for each cluster.
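
One way to check this condition before running the model is sketched below in Python (not Mplus syntax). It assumes the data have been read into a list of dictionaries; the helper names and the columns `cluster`, `x`, and `x_mean` are hypothetical.

```python
from collections import defaultdict


def clusters_with_variation(rows, cluster_key, variable):
    """Return the cluster IDs in which `variable` takes more than one value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[cluster_key]].add(row[variable])
    return sorted(cid for cid, values in seen.items() if len(values) > 1)


def with_cluster_mean(rows, cluster_key, variable, new_name):
    """Copy rows, adding the cluster average of `variable` under `new_name`."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[cluster_key]].append(row[variable])
    means = {cid: sum(vals) / len(vals) for cid, vals in grouped.items()}
    return [{**row, new_name: means[row[cluster_key]]} for row in rows]


# Hypothetical data: x varies inside cluster 1, so it cannot be a between variable as is.
rows = [
    {"cluster": 1, "x": 2.0},
    {"cluster": 1, "x": 4.0},
    {"cluster": 2, "x": 5.0},
]

offending = clusters_with_variation(rows, "cluster", "x")
fixed = with_cluster_mean(rows, "cluster", "x", "x_mean")
still_offending = clusters_with_variation(fixed, "cluster", "x_mean")
print(offending, still_offending)
```

The aggregated column is constant within every cluster, so it satisfies the requirement for a between-only variable.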

Anonymous posted on Wednesday, December 22, 2004 - 11:54 am

Doesn't this artificially inflate the correlations at the between level? If I only have 36 clusters but 2000 observations within the clusters, doesn't giving each observation in a cluster the same value for that variable cause calculation trouble? That is, instead of 36 observations, I would be estimating using 2000.

bmuthen posted on Wednesday, December 22, 2004 - 12:06 pm

Not a problem when you do twolevel analysis because then you tell the program that that variable is a between variable and that it only varies across the 36 clusters that you have.

Anonymous posted on Wednesday, December 22, 2004 - 12:49 pm

Thank you!

Bridget Goosby posted on Friday, March 10, 2006 - 1:23 pm

I am using the Type=Complex command along with the cluster and strata features. I keep getting the following error:
*** ERROR
Each stratum must contain unique cluster IDs. Clusters are not nested within strata.

This is quite confusing. When I look at the actual data, within each stratum there are cases that fall within one of two clusters. Is this problematic? All of the cells are populated in each stratum and cluster, and the total number of clusters adds up to the number of cases in the sample. I would appreciate your feedback on this! Thanks.

Linda K. Muthen posted on Friday, March 10, 2006 - 4:22 pm

Individuals from the same cluster cannot be in different strata. It seems that your data violates this. If you need further help, please send your input, data, output, and license number to support@statmodel.com.
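
A quick way to see which cluster IDs appear in more than one stratum is sketched below in Python (not Mplus syntax). The helper names and the columns `stratum` and `cluster` are hypothetical. The relabeling step assumes that IDs reused across strata (for example, clusters numbered 1 and 2 inside every stratum) really denote different sampling units; if the same physical cluster does sit in two strata, relabeling would hide a design problem rather than fix it.

```python
from collections import defaultdict


def clusters_in_several_strata(rows, cluster_key="cluster", stratum_key="stratum"):
    """Map each cluster ID that appears in more than one stratum to those strata."""
    strata = defaultdict(set)
    for row in rows:
        strata[row[cluster_key]].add(row[stratum_key])
    return {cid: sorted(s) for cid, s in strata.items() if len(s) > 1}


def relabel_clusters(rows, cluster_key="cluster", stratum_key="stratum"):
    """Give every (stratum, cluster) pair its own consecutive integer ID."""
    new_ids = {}
    relabeled = []
    for row in rows:
        pair = (row[stratum_key], row[cluster_key])
        if pair not in new_ids:
            new_ids[pair] = len(new_ids) + 1
        relabeled.append({**row, cluster_key: new_ids[pair]})
    return relabeled


# Hypothetical data: cluster IDs 1 and 2 are reused inside each stratum.
rows = [
    {"stratum": 1, "cluster": 1},
    {"stratum": 1, "cluster": 2},
    {"stratum": 2, "cluster": 1},
    {"stratum": 2, "cluster": 2},
]

problems = clusters_in_several_strata(rows)
relabeled = relabel_clusters(rows)
remaining = clusters_in_several_strata(relabeled)
print(problems, remaining)
```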

Felix Dinger posted on Wednesday, April 29, 2009 - 7:51 am

Dear everybody,

I have a question concerning the TYPE=COMPLEX analysis. I don't really get the difference between the STRATIFICATION and CLUSTER options. I can't find an exact definition in the Mplus User's Guide, Fifth Edition (Muthén & Muthén, 1998-2007) either.

So what is the difference between these two options?

In which cases do I have to use STRATIFICATION and when CLUSTER?

Thanks for your help!
Felix

Linda K. Muthen posted on Wednesday, April 29, 2009 - 10:37 am

The CLUSTER option is used when the results will be inferred to a larger group than the groups sampled from. With STRATIFICATION, inference will not go beyond the groups sampled from. See a complex survey book like the one by Korn and Graubard for more information.

Ebrahim Hamedi posted on Thursday, August 23, 2012 - 11:15 pm

Hi
I get this message:

"One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement."

I checked the file again and again and found no fault with the between variable. I have 13 clusters and 13 values for each cluster. It says the problematic cluster is number 13. I checked the values for this cluster one by one. No problem at all. Unfortunately I still get this message. Can you please help me with this? Any idea will be appreciated.

many thanks,
Ebi

Linda K. Muthen posted on Friday, August 24, 2012 - 6:29 am

Please send the output, data, and your license number to support@statmodel.com.

PB posted on Friday, November 30, 2012 - 8:16 am

Hello,
in my dataset I have multiple brand evaluations per respondent, i.e., one respondent evaluates multiple brands. However, each individual brand is, in turn, also evaluated by multiple respondents. Hence, in my data structure, there appears to be some sort of cross-classification or multiple membership. My raw data would look like this:

Brand Respondent
B1 R1
B1 R2
B2 R1
B2 R3
B3 R2
B3 R3

I would like to do an EFA on several categorical variables on the brand level.
If I use TYPE=COMPLEX with CLUSTER = Respondent, this would mean that we account for the non-independence of brand evaluations within one respondent.
If I use TYPE=COMPLEX with CLUSTER = Brand, this would mean that evaluations within one cluster (brand) are for one specific brand only, respectively.

However, now, in order to account for the complex data structure stated above which somehow includes both cluster options, I thought about using TYPE=COMPLEX TWOLEVEL with CLUSTER = Brand Respondent. However, I fear that this is inappropriate as in our case, the secondary sampling unit is not really nested within the primary sampling unit. Is this correct and, if so, could you suggest an alternative way to account for this complex data structure in an EFA in Mplus? Thank you very much in advance.
Pablo

Linda K. Muthen posted on Friday, November 30, 2012 - 3:26 pm

If I understand you correctly, your data looks as follows:

Respondent Brand1 Brand2 Brand3 etc.

If a respondent does not evaluate all brands, the brands not evaluated would get a missing value. In this setting, non-independence of observations is handled by multivariate analysis. If you want to do an EFA, you would say:

USEVARIABLES = Brand1-Brand100;
TYPE=EFA 1 4;

You do not need COMPLEX or TWOLEVEL.

PB posted on Saturday, December 01, 2012 - 7:08 am

Thank you very much for the response. Actually, I am conducting an EFA on 43 binary items which are evaluated by the respondents for a subset of brands. Hence, my data looks like this:

Respondent Brand BinaryItem1 BinaryItem2 etc.
R1 B1 1 0
R1 B2 1 1
R2 B1 0 1
R2 B3 1 1
R3 B2 0 0
R3 B4 1 1

Wanting to conduct an EFA on the binary items 1 - 43 across all brands and respondents, I do not need to consider the fact that multiple brand evaluations are by the same respondent AND that multiple rows in the dataset are on the same brand (implying that the evaluations in these rows on the same brand are also linked)? The multivariate analysis would take care of these non-independencies?

Bengt O. Muthen posted on Saturday, December 01, 2012 - 5:04 pm

Would it be proper to view each item-brand combination as a variable in a multivariate vector of variables that respondents rate? So if each respondent rates 2 brands and there are 43 items, you have a data set with 86 variables, that is, 86 columns in your data. That treats brand as a fixed mode of variation as opposed to a random mode of variation. For random you draw inference to a population of brands. I think cross-classified analysis comes into play for the random case.

If you go with the fixed mode idea, the question is what factor structure one should expect for the "86 columns". The correlation due to the fact that the same respondent rates more than one brand should be captured by the factors.
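
A minimal Python sketch (not Mplus syntax) of the reshaping this implies: long rows of respondent, brand, and items become one row per respondent with one column per item-brand combination. This is only one reading of the layout; the exact column set depends on how brands are assigned to respondents. The function name, the column naming scheme `item_brand`, and the use of `None` for brands a respondent did not rate are assumptions, and only two of the 43 items are shown.

```python
def long_to_wide(rows, respondent_key, brand_key, item_names, brands):
    """One record per respondent, one column per item-brand combination."""
    wide = {}
    for row in rows:
        # Start every respondent with all combinations marked as missing.
        record = wide.setdefault(
            row[respondent_key],
            {f"{item}_{brand}": None for brand in brands for item in item_names},
        )
        for item in item_names:
            record[f"{item}_{row[brand_key]}"] = row[item]
    return wide


# The six example rows from the post above, with two binary items.
rows = [
    {"resp": "R1", "brand": "B1", "item1": 1, "item2": 0},
    {"resp": "R1", "brand": "B2", "item1": 1, "item2": 1},
    {"resp": "R2", "brand": "B1", "item1": 0, "item2": 1},
    {"resp": "R2", "brand": "B3", "item1": 1, "item2": 1},
    {"resp": "R3", "brand": "B2", "item1": 0, "item2": 0},
    {"resp": "R3", "brand": "B4", "item1": 1, "item2": 1},
]

wide = long_to_wide(rows, "resp", "brand", ["item1", "item2"], ["B1", "B2", "B3", "B4"])
print(wide["R1"])
```

The missing entries would still need to be written out with whatever missing-value flag the analysis declares.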

stephanie moller posted on Thursday, January 16, 2014 - 10:07 am

Can I use a covariance matrix as an input dataset for a multilevel model (instead of the original values)? If so, how would the covariance matrix need to be set up? Is there an example that describes the file? Would I need a within matrix and a between matrix?

I ask because I am working with restricted data that I can only analyze remotely, and the system doesn't support Mplus. Could I get the matrix from the system and analyze it on my own computer?

Linda K. Muthen posted on Thursday, January 16, 2014 - 10:14 am

No. You need individual-level data for multilevel models.

Ok-young, JI posted on Friday, March 04, 2016 - 12:46 am

hi, Muthens

I am now getting an error message that is as follows:

*** ERROR
One or more between-level variables have variation within a cluster for
one or more clusters. Check your data and format statement.

So I checked my data over and over again, but I can't find any problems.

How can I fix this problem? Please help me.

Linda K. Muthen posted on Friday, March 04, 2016 - 9:01 am

Please send the output, data set, and your license number to support@statmodel.com.

Jiangang Xia posted on Wednesday, December 14, 2016 - 8:10 am

Dear Professor,
I got the same error message. Since I am using restricted data, I cannot send the data set to support@statmodel.com. Could you share any possible solutions? Thanks.

Jiangang Xia posted on Wednesday, December 14, 2016 - 1:14 pm

To follow up on my own question, I have figured it out. It was because of the format statement. Once I removed the format statement, the model worked. Thanks.

Andra Th posted on Tuesday, March 07, 2017 - 9:48 am

Dear Dr. Muthens,

I am trying to run a multilevel SEM and I get the following error:

One or more between-level variables have variation within a cluster for
one or more clusters. Check your data and format statement.

I checked the variables over and over again, I tried to delete the cases which supposedly did not work, and all my between-level variables have the same value for the entire group. Am I doing something wrong here? I would send the output but I'm running on a demo version just because I wanted to see how the program works for this kind of analysis. I'd appreciate your input.

Linda K. Muthen posted on Tuesday, March 07, 2017 - 9:58 am

You may have blanks in the data set causing the data to be misread. Or you may have the wrong number of variable names such that it does not match the number of columns in the data set.
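
Both possibilities can be screened for outside Mplus. The Python sketch below assumes a whitespace-delimited (free format) data file, where a blank cell makes a row come up short and shifts the remaining values into the wrong columns. The function name and the toy lines are hypothetical.

```python
def rows_with_wrong_field_count(lines, n_variables):
    """Return (row number, field count) for rows that do not have n_variables fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore completely empty lines
        n_fields = len(line.split())
        if n_fields != n_variables:
            bad.append((number, n_fields))
    return bad


# Hypothetical file contents: three variables are named, row 2 has a blank cell.
lines = [
    "1 3.2 10",
    "1     10",
    "2 4.1 20",
]

bad_rows = rows_with_wrong_field_count(lines, n_variables=3)
print(bad_rows)
```

With a real file, the lines would come from `open(path).readlines()` and `n_variables` from the number of names listed in the input.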

Andra Th posted on Wednesday, March 08, 2017 - 3:29 am

Thank you for your answer. I checked, and all missing values were coded as such and the number of variables matches the number of columns in the data set. I need to do this analysis. Are there other steps I could take to check the data? Would it be possible to send you the input, data, and result?

Linda K. Muthen posted on Wednesday, March 08, 2017 - 8:09 am

You can send the files along with your license number to support@statmodel.com.

Anonymous posted on Tuesday, October 31, 2017 - 8:12 pm

What should I do if my dataset exceeds the maximum record length? I have about n=5,000 subjects, observed at n=4 time points, so a total of about 20,000 rows in the long format. This is double the Mplus maximum. Is there a way around this or can I just not use Mplus? (Planning to do a MSEM).

Linda K. Muthen posted on Wednesday, November 01, 2017 - 9:00 am

Why do you think this exceeds the Mplus maximum? I know of no such limit.

Anonymous posted on Wednesday, November 01, 2017 - 12:00 pm

Maybe I am misunderstanding? I am looking at the Version 7 user guide, Chapter 15, under the Data Command heading. It says "The maximum record length is 10,000."

Linda K. Muthen posted on Wednesday, November 01, 2017 - 2:59 pm

This is the length of a row not the number of rows.
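
To make the distinction concrete, here is a small Python sketch that reports both quantities for a data file: the number of rows, and the length in characters of the longest row, which is what the record-length limit refers to. The function name and the toy data are hypothetical.

```python
def file_dimensions(lines):
    """Return (number of rows, length in characters of the longest row)."""
    stripped = [line.rstrip("\n") for line in lines]
    n_rows = len(stripped)
    longest = max((len(line) for line in stripped), default=0)
    return n_rows, longest


# Hypothetical long-format file: 20,000 short rows.
lines = ["1 2 3 4\n"] * 20000

n_rows, longest = file_dimensions(lines)
print(n_rows, longest)
```

Here the row count is 20,000 but the record length is only 7 characters.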

Eftychia Stamkou posted on Friday, March 02, 2018 - 7:02 am

Hi,
I get the error message below: "One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement." The program also specifies the cluster that shows variation in the between variable.

I checked the data and there is no variance in that cluster. There are also no blanks in the dataset since I specified missing values. There is also no format statement in my code.

Do you have any idea of why I may be getting this error?

Many thanks in advance!

Bengt O. Muthen posted on Friday, March 02, 2018 - 11:27 am

We need to see the output and data - send to Support along with your license number.

Liu Yue posted on Tuesday, April 24, 2018 - 7:28 am

I have data from a test. The items are nested within students, because I have covariates for both items and students. So the within level is the item level and the between level is the student level. Suppose I recode the data as an n × 2 matrix, where n = I*J, I is the number of students, and J is the number of items. The first column is the response and the second column is the cluster variable. Can I use Mplus to analyze this multilevel data? Do I need to tell the program that the same position for different students means the same item?

Tihomir Asparouhov posted on Tuesday, April 24, 2018 - 4:26 pm

It depends on how many items you have - if the number of items is <=20 or 30 the model on page 66 of the User's Guide will work well for you - you can add all kinds of covariates there

If you have many more items consider using User's Guide example 9.26 where we use cross-classified modeling and take a look at section 4.2
http://www.statmodel.com/download/NCME12.pdf
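
On the question of item position: one option is to carry an explicit item identifier in the long file instead of relying on row order, which a cross-classified setup would presumably need in any case. A minimal Python sketch follows; the function name, the column names `student`, `item`, and `y`, and the toy responses are hypothetical.

```python
def wide_to_long(responses):
    """Turn {student ID: [responses in a fixed item order]} into one row per response."""
    long_rows = []
    for student, answers in responses.items():
        for position, answer in enumerate(answers, start=1):
            # The item column records which item the row belongs to.
            long_rows.append({"student": student, "item": position, "y": answer})
    return long_rows


# Hypothetical data: two students, three items each.
responses = {101: [1, 0, 1], 102: [0, 0, 1]}

long_rows = wide_to_long(responses)
for row in long_rows:
    print(row)
```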

aiza khan posted on Thursday, June 25, 2020 - 6:49 am

Dear Muthen,
I received this error while running a multilevel SEM (2-1-1):

"Between Cluster ID with variation in this variable Variable (only one cluster ID will be listed)"

I have tried so many things, but the error still persists.

When I remove the between-level statement, the code runs and Mplus gives all the relevant statistics (CFI, TLI, etc.).

Bengt O. Muthen posted on Thursday, June 25, 2020 - 9:04 am

We need to see your full output to diagnose this - add TECH1 and send to Support along with your license number.