
Anonymous posted on Wednesday, December 22, 2004 - 2:44 am



I have a quick question about data structure. I am very familiar with the HLM program, especially with its data structure (two datasets). When I set up my data for multilevel SEM, do I have to aggregate my between-level measures to the cluster level myself? Or can I leave the data measured at the individual level and let the Mplus program handle this for me? Thank you.


For variables that are both within and between, you don't need to provide anything except the within level variable. Mplus does the rest. For a variable that is only between, you need to provide that cluster average for each individual. 
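A minimal sketch of preparing a between-only variable outside Mplus, assuming the data are already loaded as a list of dictionaries. The function name, the keys `school`, `score`, and `score_mean`, and the example values are hypothetical, and missing values are not handled here.

```python
from collections import defaultdict

def add_cluster_means(rows, cluster_key, value_key, new_key):
    """Return copies of rows with the cluster mean of value_key stored under new_key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[cluster_key]] += row[value_key]
        counts[row[cluster_key]] += 1
    # Every individual in a cluster receives the same cluster average.
    return [
        {**row, new_key: totals[row[cluster_key]] / counts[row[cluster_key]]}
        for row in rows
    ]

# Hypothetical example: three students in two schools.
rows = [
    {"school": 1, "score": 10.0},
    {"school": 1, "score": 14.0},
    {"school": 2, "score": 7.0},
]
with_means = add_cluster_means(rows, "school", "score", "score_mean")
```

Each row keeps its individual-level value and gains a column that is constant within its cluster, which matches the single-file layout described in this thread.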

Anonymous posted on Wednesday, December 22, 2004 - 9:34 am



Thanks for the help. However, I have a follow-up question. Does this mean that between-only data have to reside in a separate data file, similar to the HLM format? For example, if I have my within data in within.dat, then the between-only data need to be in between.dat. Is this correct?

bmuthen posted on Wednesday, December 22, 2004 - 9:41 am



The Mplus data arrangement is different from HLM's. Mplus does not need two files, only one. The file has the individuals as rows and the variables as columns, both individual-level and between-level variables.

Anonymous posted on Wednesday, December 22, 2004 - 11:20 am



OK, I am now getting the following error message: *** ERROR One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement. The program identifies the variable that appears to be causing the problem, but I am unsure what it really means. So, what does this error message mean? Any suggestions on how to fix the problem? Thanks


If you declare that a variable is a between variable, each observation in a cluster must have the same value for that variable. The value would be the average of the individual level values for each cluster. 
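A rough way to check this rule before running Mplus, sketched under the assumption that the data are available as a list of dictionaries. The function name and the keys `cluster` and `size` are hypothetical.

```python
from collections import defaultdict

def clusters_with_variation(rows, cluster_key, between_key):
    """Map each offending cluster ID to the set of distinct values it contains."""
    values_by_cluster = defaultdict(set)
    for row in rows:
        values_by_cluster[row[cluster_key]].add(row[between_key])
    # A between variable must take exactly one value per cluster.
    return {c: v for c, v in values_by_cluster.items() if len(v) > 1}

# Hypothetical example: cluster 2 has two different values of "size".
rows = [
    {"cluster": 1, "size": 30},
    {"cluster": 1, "size": 30},
    {"cluster": 2, "size": 25},
    {"cluster": 2, "size": 26},
]
problems = clusters_with_variation(rows, "cluster", "size")
```

An empty result means every cluster carries a single value for the variable. If the check passes on the data as you see them but Mplus still reports the error, the file may be read differently from how it looks.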

Anonymous posted on Wednesday, December 22, 2004 - 11:54 am



Doesn't this artificially inflate the correlations at the between level? If I only have 36 clusters but 2000 observations within the clusters, doesn't giving each observation in a cluster the same value for that variable cause calculation trouble? That is, instead of 36 observations, I would be estimating using 2000.

bmuthen posted on Wednesday, December 22, 2004 - 12:06 pm



This is not a problem when you do two-level analysis, because then you tell the program that the variable is a between variable and that it varies only across the 36 clusters that you have.

Anonymous posted on Wednesday, December 22, 2004 - 12:49 pm



Thank you! 


I am using the Type=Complex command along with the cluster and strata features. I keep getting the following error: *** ERROR Each stratum must contain unique cluster IDs. Clusters are not nested within strata. This is quite confusing. When I look at the actual data, within each stratum there are cases that fall within one of two clusters. Is this problematic? All of the cells are populated in each stratum and cluster, and the total number of clusters adds up to the number of cases in the sample. I would appreciate your feedback on this! Thanks.


Individuals from the same cluster cannot be in different strata. It seems that your data violates this. If you need further help, please send your input, data, output, and license number to support@statmodel.com. 
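A sketch of how one might look for cluster IDs that occur in more than one stratum, assuming the data are a list of dictionaries with hypothetical keys `stratum` and `cluster`. The relabeling helper is only appropriate under the further assumption that a reused ID (for example, clusters numbered 1 and 2 inside every stratum) really denotes different clusters.

```python
from collections import defaultdict

def clusters_in_multiple_strata(rows, stratum_key, cluster_key):
    """Map each cluster ID found in more than one stratum to those strata."""
    strata_by_cluster = defaultdict(set)
    for row in rows:
        strata_by_cluster[row[cluster_key]].add(row[stratum_key])
    return {c: s for c, s in strata_by_cluster.items() if len(s) > 1}

def relabel_clusters(rows, stratum_key, cluster_key):
    """Give every (stratum, cluster) pair its own ID: 1, 2, 3, ..."""
    new_ids = {}
    relabeled = []
    for row in rows:
        pair = (row[stratum_key], row[cluster_key])
        new_id = new_ids.setdefault(pair, len(new_ids) + 1)
        relabeled.append({**row, cluster_key: new_id})
    return relabeled

# Hypothetical example: cluster IDs 1 and 2 are reused in both strata.
rows = [
    {"stratum": 1, "cluster": 1},
    {"stratum": 1, "cluster": 2},
    {"stratum": 2, "cluster": 1},
    {"stratum": 2, "cluster": 2},
]
offending = clusters_in_multiple_strata(rows, "stratum", "cluster")
relabeled = relabel_clusters(rows, "stratum", "cluster")
```

If the same physical cluster genuinely spans several strata, relabeling hides the problem rather than fixing it, and the sampling design itself needs to be revisited.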


Dear everybody, I have a question concerning the TYPE=COMPLEX analysis. I don't really get the difference between the STRATIFICATION and CLUSTER options. I can't find an exact definition in the Mplus User's Guide, Fifth Edition (Muthén & Muthén, 1998-2007), either. So what is the difference between these two options? In which cases do I have to use STRATIFICATION and when CLUSTER? Thanks for your help! Felix


The CLUSTER option is used when the results will be inferred to a larger group than the groups sampled from. With STRATIFICATION, inference will not go beyond the groups sampled from. See a complex survey book like the one by Korn and Graubard for more information. 


Hi, I get this message: "One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement." I checked the file again and again and found no fault with the between variable. I have 13 clusters and 13 values for each cluster. It says the problematic cluster is number 13. I checked the values for this cluster one by one. No problem at all. Unfortunately, I still get this message. Can you please help me with this? Any ideas will be appreciated. Many thanks, Ebi


Please send the output, data, and your license number to support@statmodel.com. 

PB posted on Friday, November 30, 2012 - 8:16 am



Hello, in my dataset I have multiple brand evaluations per respondent, i.e., one respondent evaluates multiple brands. However, each individual brand is, in turn, also evaluated by multiple respondents. Hence, in my data structure, there appears to be some sort of cross-classification or multiple membership. My raw data would look like this:

Brand  Respondent
B1     R1
B1     R2
B2     R1
B2     R3
B3     R2
B3     R3

I would like to do an EFA on several categorical variables on the brand level. If I use TYPE=COMPLEX with CLUSTER = Respondent, this would mean that we account for the non-independence of brand evaluations within one respondent. If I use TYPE=COMPLEX with CLUSTER = Brand, this would mean that evaluations within one cluster (brand) are for one specific brand only, respectively. However, now, in order to account for the complex data structure stated above which somehow includes both cluster options, I thought about using TYPE=COMPLEX TWOLEVEL with CLUSTER = Brand Respondent. However, I fear that this is inappropriate as in our case, the secondary sampling unit is not really nested within the primary sampling unit. Is this correct and, if so, could you suggest an alternative way to account for this complex data structure in an EFA in Mplus? Thank you very much in advance. Pablo


If I understand you correctly, your data look as follows:

Respondent  Brand1  Brand2  Brand3  etc.

If a respondent does not evaluate all brands, the brands not evaluated would get a missing value. In this setting, non-independence of observations is handled by multivariate analysis. If you want to do an EFA, you would say:

USEVARIABLES = Brand1-Brand100;
TYPE = EFA 1 4;

You do not need COMPLEX or TWOLEVEL.
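The wide layout described in this reply (one row per respondent, one column per brand, a missing value where a brand was not evaluated) could be built from long-format records roughly as follows. This is only a sketch: the function name, the tuple layout, and the missing-value code -999 are assumptions, not anything prescribed in the thread.

```python
def long_to_wide(records, brands, missing=-999):
    """Turn (respondent, brand, rating) records into one row per respondent."""
    wide = {}
    for respondent, brand, rating in records:
        # Start every respondent with the missing code for all brands.
        row = wide.setdefault(respondent, {b: missing for b in brands})
        row[brand] = rating
    return [[resp] + [vals[b] for b in brands] for resp, vals in sorted(wide.items())]

# Hypothetical example: three respondents, each rating two of three brands.
records = [
    ("R1", "B1", 4), ("R1", "B2", 2),
    ("R2", "B1", 5), ("R2", "B3", 3),
    ("R3", "B2", 1), ("R3", "B3", 4),
]
table = long_to_wide(records, ["B1", "B2", "B3"])
```

Whatever missing-value code is used in the file would then have to be declared to Mplus as well.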

PB posted on Saturday, December 01, 2012 - 7:08 am



Thank you very much for the response. Actually, I am conducting an EFA on 43 binary items which are evaluated by the respondents for a subset of brands. Hence, my data looks like this:

Respondent  Brand  BinaryItem1  BinaryItem2  etc.
R1          B1     1            0
R1          B2     1            1
R2          B1     0            1
R2          B3     1            1
R3          B2     0            0
R3          B4     1            1

Wanting to conduct an EFA on the binary items 1-43 across all brands and respondents, I do not need to consider the fact that multiple brand evaluations are by the same respondent AND that multiple rows in the dataset are on the same brand (implying that the evaluations in these rows on the same brand are also linked)? The multivariate analysis would take care of these non-independencies?


Would it be proper to view each item-brand combination as a variable in a multivariate vector of variables that respondents rate? So if each respondent rates 2 brands and there are 43 items, you have a data set with 86 variables, that is, 86 columns in your data. That treats brand as a fixed mode of variation as opposed to a random mode of variation. For random you draw inference to a population of brands. I think cross-classified analysis comes into play for the random case. If you go with the fixed mode idea, the question is what factor structure one should expect for the "86 columns". The correlation due to the fact that the same respondent rates more than one brand should be captured by the factors.


Can I use a covariance matrix as an input dataset for a multilevel model (instead of the original values)? If so, how would the covariance matrix need to be set up? Is there an example that describes the file? Would I need a within matrix and a between matrix? I ask because I am working with restricted data that I can only analyze remotely, and the system doesn't support Mplus, but I could get the matrix from the system and analyze it on my computer.


No. You need individual-level data for multilevel models.


Hi Drs. Muthén, I am now getting the following error message: *** ERROR One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement. I checked my data over and over again, but I can't find any problems. How can I fix this problem? Please help me.


Please send the output, data set, and your license number to support@statmodel.com. 

Jiangang Xia posted on Wednesday, December 14, 2016 - 8:10 am



Dear Professor, I got the same error message. Since I am using restricted data, I cannot send the data set to support@statmodel.com. Could you share any possible solutions? Thanks.

Jiangang Xia posted on Wednesday, December 14, 2016 - 1:14 pm



To follow up on my own question, I have figured it out. The cause was the format statement. Once I removed the format statement, the model worked. Thanks.

Andra Th posted on Tuesday, March 07, 2017 - 9:48 am



Dear Drs. Muthén, I am trying to run a multilevel SEM and I get the same error: One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement. I checked the variables over and over again, I tried deleting the cases that supposedly caused the problem, and all my between-level variables have the same value for the entire group. Am I doing something wrong here? I would send the output, but I'm running a demo version just because I wanted to see how the program works for this kind of analysis. I'd appreciate your input.


You may have blanks in the data set causing the data to be misread. Or you may have the wrong number of variable names such that it does not match the number of columns in the data set. 
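Both causes mentioned in this reply tend to show up as rows whose number of fields differs from the number of variable names. A small sketch of such a check, assuming a whitespace-delimited file whose lines have already been read into a list (the function name and the example values are hypothetical):

```python
def lines_with_wrong_field_count(lines, n_variables):
    """Return (line_number, field_count) for every line that does not match."""
    bad = []
    for number, line in enumerate(lines, start=1):
        n_fields = len(line.split())
        if n_fields != n_variables:
            bad.append((number, n_fields))
    return bad

# Hypothetical example: four variables expected; line 2 has a blank
# where a value should be, so it splits into only three fields.
lines = [
    "1 0.5 3 12",
    "1 0.7   12",
    "2 0.2 4 15",
]
mismatches = lines_with_wrong_field_count(lines, 4)
```

A row that is one field short shifts every later value to the left, which can make a constant between-level column appear to vary within a cluster.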

Andra Th posted on Wednesday, March 08, 2017 - 3:29 am



Thank you for your answer. I checked, and all missing values are coded as such and the number of variables matches the number of columns in the data set. I need to do this analysis. Are there other steps I could take to check the data? Would it be possible to send you the input, data, and results?


You can send the files along with your license number to support@statmodel.com. 

Anonymous posted on Tuesday, October 31, 2017 - 8:12 pm



What should I do if my dataset exceeds the maximum record length? I have about 5,000 subjects observed at 4 time points, so a total of about 20,000 rows in the long format. This is double the Mplus maximum. Is there a way around this, or can I just not use Mplus? (I am planning to do an MSEM.)


Why do you think this exceeds the Mplus maximum? I know of no such limit. 

Anonymous posted on Wednesday, November 01, 2017 - 12:00 pm



Maybe I am misunderstanding? I am looking at the Version 7 user guide, Chapter 15, under the Data Command heading. It says "The maximum record length is 10,000." 


This is the length of a row not the number of rows. 
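To make the distinction concrete, here is a small sketch that reports both quantities for a data file whose lines are already in memory. The function name and the example rows are hypothetical; the 10,000 figure is the one quoted from the user guide earlier in this thread.

```python
def record_stats(lines):
    """Return (number of rows, length in characters of the longest row)."""
    n_rows = len(lines)
    longest = max((len(line.rstrip("\n")) for line in lines), default=0)
    return n_rows, longest

# Hypothetical long-format file: 20,000 short rows.
lines = ["1 2 3\n"] * 20000
n_rows, longest = record_stats(lines)
within_limit = longest <= 10000  # the limit applies to row length, not row count
```

Here the file has 20,000 rows, but each record is only 5 characters long, so the record-length limit is not an issue.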


Hi, I get the error message below: "One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement." The program also specifies the cluster that shows variation in the between variable. I checked the data and there is no variance in that cluster. There are also no blanks in the dataset since I specified missing values. There is also no format statement in my code. Do you have any idea of why I may be getting this error? Many thanks in advance!


We need to see the output and data. Please send them to Support along with your license number.

Liu Yue posted on Tuesday, April 24, 2018 - 7:28 am



I have data from a test. The items are nested within students, because I have covariates for both items and students. So the within level is the item level and the between level is the student level. Suppose I recode the data as an n x 2 matrix, where n = I*J, I is the number of students, and J is the number of items. The first column is the response data and the second column is the cluster variable. Can I use Mplus to analyze this multilevel data? Should I tell the program that the same position for different students means the same item?


It depends on how many items you have. If the number of items is <= 20 or 30, the model on page 66 of the User's Guide will work well for you, and you can add all kinds of covariates there. If you have many more items, consider using User's Guide example 9.26, where we use cross-classified modeling, and take a look at section 4.2 of http://www.statmodel.com/download/NCME12.pdf
