Anonymous posted on Wednesday, December 22, 2004 - 2:44 am
I have a quick question about data structure. I am very familiar with the HLM program, especially with the data structure (2 datasets). So, I was wondering if I had to have my between level measures aggregated on my cluster measure for multilevel SEM when I set my data up? Or, can I leave the data measured at the individual level and the Mplus program will handle this for me? Thank you.
For variables that are both within and between, you don't need to provide anything except the within level variable. Mplus does the rest. For a variable that is only between, you need to provide that cluster average for each individual.
Anonymous posted on Wednesday, December 22, 2004 - 9:34 am
Thanks for the help. However, I have a follow-up question. Does this mean that between-only data have to rest in a separate data file, similar to the HLM format? For example, if I have my within data in within.dat, then the between-only data need to be in between.dat. Is this correct?
bmuthen posted on Wednesday, December 22, 2004 - 9:41 am
The Mplus data arrangement is different from HLM's. Mplus does not need two files, only one. The file has as rows the individuals and as columns the variables - both individual-level and between-level variables.
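As a rough illustration of the single-file layout described above, the following Python sketch attaches a cluster average to every individual row, so that a between-only variable sits in the same file as the individual-level data. The data values and the names `rows` and `attach_cluster_means` are hypothetical and are not part of Mplus.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level records: (cluster id, individual-level score).
rows = [
    (1, 4.0), (1, 6.0),
    (2, 3.0), (2, 6.0), (2, 9.0),
]

def attach_cluster_means(records):
    """Return (cluster, score, cluster_mean) rows, one per individual."""
    by_cluster = defaultdict(list)
    for cluster, score in records:
        by_cluster[cluster].append(score)
    means = {c: mean(v) for c, v in by_cluster.items()}
    # The third column is constant within a cluster, as a between variable must be.
    return [(c, s, means[c]) for c, s in records]

single_file = attach_cluster_means(rows)
for row in single_file:
    print(*row)  # one line per individual, all variables as columns
```

Each printed line is one individual; the last column repeats within a cluster and would be the column declared as a between variable.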
Anonymous posted on Wednesday, December 22, 2004 - 11:20 am
Ok, I am now getting an error message that is as follows:
*** ERROR One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement.
The program identifies the variable that appears to be causing the problem, but I am unsure what it really means. So, what does this error message mean? Any suggestions on how to fix the problem?
If you declare that a variable is a between variable, each observation in a cluster must have the same value for that variable. The value would be the average of the individual level values for each cluster.
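Before running the model, the requirement above can be checked outside Mplus. The sketch below is only an illustration in Python with made-up data; `clusters_with_variation` is a hypothetical helper name. It lists the clusters in which a declared between variable takes more than one value.

```python
from collections import defaultdict

def clusters_with_variation(records):
    """Return sorted cluster ids whose between-level value is not constant.

    records: iterable of (cluster_id, between_value) pairs, one per individual.
    """
    seen = defaultdict(set)
    for cluster, value in records:
        seen[cluster].add(value)
    return sorted(c for c, values in seen.items() if len(values) > 1)

# Hypothetical data: one row in cluster 13 differs from the others.
data = [(12, 0.50), (12, 0.50), (13, 0.75), (13, 0.75), (13, 0.57)]
bad = clusters_with_variation(data)
print(bad)
```

Values are compared exactly as they are read, so rounding differences between rows of the same cluster also count as variation.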
Anonymous posted on Wednesday, December 22, 2004 - 11:54 am
Doesn't this artificially inflate the correlations at the between level? If I only have 36 clusters but 2000 observations within the clusters, doesn't giving each observation in a cluster the same value for that variable cause calculation trouble? That is, instead of 36 observations, I would be estimating using 2000.
bmuthen posted on Wednesday, December 22, 2004 - 12:06 pm
Not a problem when you do twolevel analysis because then you tell the program that that variable is a between variable and that it only varies across the 36 clusters that you have.
Anonymous posted on Wednesday, December 22, 2004 - 12:49 pm
I am using the Type=Complex command along with cluster and strata feature. I keep getting the following error: *** ERROR Each stratum must contain unique cluster IDs. Clusters are not nested within strata.
This is quite confusing. When I look at the actual data, within each stratum there are cases that fall within one of two clusters. Is this problematic? All of the cells are populated in each stratum and cluster, and the total number of clusters adds up to the number of cases in the sample. I would appreciate your feedback on this! Thanks.
Individuals from the same cluster cannot be in different strata. It seems that your data violates this. If you need further help, please send your input, data, output, and license number to email@example.com.
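One way to look for this in the data is to list the cluster IDs that occur in more than one stratum. The Python sketch below uses made-up data and hypothetical helper names. The relabeling step is an assumption, not an Mplus feature: it only makes sense when a repeated cluster code (for example 1 and 2 in every stratum) actually refers to a different sampling unit in each stratum, and it assumes fewer than 1000 clusters per stratum.

```python
from collections import defaultdict

def clusters_spanning_strata(records):
    """Map each cluster id found in more than one stratum to its strata.

    records: iterable of (stratum, cluster) pairs, one per individual.
    """
    strata_of = defaultdict(set)
    for stratum, cluster in records:
        strata_of[cluster].add(stratum)
    return {c: sorted(s) for c, s in strata_of.items() if len(s) > 1}

def relabel(records, width=1000):
    """Build stratum-specific numeric cluster ids (assumes cluster < width)."""
    return [(stratum, stratum * width + cluster) for stratum, cluster in records]

# Hypothetical design: cluster codes 1 and 2 are reused in each of three strata.
design = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]
print(clusters_spanning_strata(design))           # clusters 1 and 2 span all strata
print(clusters_spanning_strata(relabel(design)))  # empty after relabeling
```

If the same physical cluster really does contribute individuals to several strata, relabeling would hide a design problem rather than solve it.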
I have a question concerning the TYPE=COMPLEX analysis. I don't really get the difference between the STRATIFICATION and CLUSTER options. I can't find an exact definition in the Mplus User's Guide, Fifth Edition (Muthén & Muthén, 1998-2007) either.
So what is the difference between these two options?
In which cases do I have to use STRATIFICATION and when CLUSTER?
The CLUSTER option is used when the results will be inferred to a larger group than the groups sampled from. With STRATIFICATION, inference will not go beyond the groups sampled from. See a complex survey book like the one by Korn and Graubard for more information.
"One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement."
I checked the file again and again and found no fault with the between variable. I have 13 clusters and 13 values for each cluster. It says the problematic cluster is number 13. I checked the values for this cluster one by one. No problem at all. Unfortunately I still get this message. Can you please help me with this? Any idea will be appreciated.
Hello, in my dataset I have multiple brand evaluations per respondent, i.e., one respondent evaluates multiple brands. However, each individual brand is, in turn, also evaluated by multiple respondents. Hence, in my data structure, there appears to be some sort of cross-classification or multiple membership. My raw data would look like this:
I would like to do an EFA on several categorical variables on the brand level. If I use TYPE=COMPLEX with CLUSTER = Respondent, this would mean that we account for the non-independence of brand evaluations within one respondent. If I use TYPE=COMPLEX with CLUSTER = Brand, each cluster (brand) would contain the evaluations of one specific brand only.
However, now, in order to account for the complex data structure stated above which somehow includes both cluster options, I thought about using TYPE=COMPLEX TWOLEVEL with CLUSTER = Brand Respondent. However, I fear that this is inappropriate as in our case, the secondary sampling unit is not really nested within the primary sampling unit. Is this correct and, if so, could you suggest an alternative way to account for this complex data structure in an EFA in Mplus? Thank you very much in advance. Pablo
If I understand you correctly, your data looks as follows:
Respondent Brand1 Brand2 Brand3 etc.
If a respondent does not evaluate all brands, the brands not evaluated would get a missing value. In this setting, non-independence of observations is handled by multivariate analysis. If you want to do an EFA, you would say:
USEVARIABLES = Brand1-Brand100; TYPE = EFA 1 4;
You do not need COMPLEX or TWOLEVEL.
PB posted on Saturday, December 01, 2012 - 7:08 am
Thank you very much for the response. Actually, I am conducting an EFA on 43 binary items which are evaluated by the respondents for a subset of brands. Hence, my data looks like this:
Wanting to conduct an EFA on the binary items 1 - 43 across all brands and respondents, I do not need to consider the fact that multiple brand evaluations are by the same respondent AND that multiple rows in the dataset are on the same brand (implying that the evaluations in these rows on the same brand are also linked)? The multivariate analysis would take care of these non-independencies?
Would it be proper to view each item-brand combination as a variable in a multivariate vector of variables that respondents rate? So if each respondent rates 2 brands and there are 43 items, you have a data set with 86 variables, that is, 86 columns in your data. That treats brand as a fixed mode of variation as opposed to a random mode of variation. For random you draw inference to a population of brands. I think cross-classified analysis comes into play for the random case.
If you go with the fixed mode idea, the question is what factor structure one should expect for the "86 columns". The correlation due to the fact that the same respondent rates more than one brand should be captured by the factors.
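The fixed-mode arrangement discussed above, with one column per item-brand combination, can be sketched as a long-to-wide reshape. This Python example is illustrative only: the function name, the tiny data set, and the use of "." as a placeholder for brands a respondent did not rate are all assumptions.

```python
def long_to_wide(records, brands, n_items, missing="."):
    """Reshape (respondent, brand, item_values) rows to one row per respondent.

    Returns the column labels as (brand, item_number) pairs and a dict
    mapping each respondent to a list of values in that column order.
    """
    cells = {}
    for respondent, brand, items in records:
        row = cells.setdefault(respondent, {})
        for number, value in enumerate(items, start=1):
            row[(brand, number)] = value
    columns = [(b, i) for b in brands for i in range(1, n_items + 1)]
    wide = {r: [row.get(col, missing) for col in columns]
            for r, row in cells.items()}
    return columns, wide

# Hypothetical long-format data: two binary items per brand evaluation.
long_rows = [
    (101, "A", [1, 0]),
    (101, "B", [0, 0]),
    (102, "A", [1, 1]),  # respondent 102 did not rate brand B
]
columns, wide = long_to_wide(long_rows, brands=["A", "B"], n_items=2)
print(columns)
print(wide)
```

With 2 brands and 43 items the same function would produce the 86 columns mentioned above.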
Can I use a covariance matrix as an input dataset for a multilevel model (instead of the original values)? If so, how would the covariance matrix need to be set up? Is there an example that describes the file? Would I need a within matrix and a between matrix?
I ask this because I am working with restricted data that I can only analyze remotely, and the system doesn't support Mplus. Could I get the matrix from the system and analyze it on my own computer?
Jiangang Xia posted on Wednesday, December 14, 2016 - 8:10 am
Dear Professor, I got the same error message. Since I am using a restricted data set, I could not send it to firstname.lastname@example.org. Could you share any possible solutions? Thanks.
Jiangang Xia posted on Wednesday, December 14, 2016 - 1:14 pm
To follow up on my own question, I have figured it out. It was because of the format statement. Once I removed the format statement, the model worked. Thanks.
Andra Th posted on Tuesday, March 07, 2017 - 9:48 am
Dear Dr. Muthens,
I am trying to run a multilevel SEM and I get the same error:
One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement.
I checked the variables over and over again, I tried deleting the cases that supposedly caused the problem, and all my between-level variables have the same value for the entire group. Am I doing something wrong here? I would send the output, but I'm running a demo version just because I wanted to see how the program works for this kind of analysis. I'd appreciate your input.
You may have blanks in the data set causing the data to be misread. Or you may have the wrong number of variable names such that it does not match the number of columns in the data set.
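Both possibilities can be screened for by counting the fields on each line of a whitespace-delimited data file and comparing the count with the number of variable names. The Python sketch below is a minimal illustration with made-up lines; `lines_with_wrong_field_count` is a hypothetical helper, and it assumes free-format data separated by spaces or tabs.

```python
def lines_with_wrong_field_count(lines, n_names):
    """Return (line_number, n_fields) for lines whose field count != n_names."""
    problems = []
    for number, line in enumerate(lines, start=1):
        n_fields = len(line.split())  # split on any run of whitespace
        if n_fields != n_names:
            problems.append((number, n_fields))
    return problems

# Hypothetical file contents: cluster id, between variable, outcome.
# Line 3 has a blank where the outcome should be, so later values would shift.
sample = [
    "1 0.5 3",
    "1 0.5 4",
    "2 0.7",
    "2 0.7 5",
]
print(lines_with_wrong_field_count(sample, n_names=3))
```

A line with too few fields usually points to a blank used for a missing value; every line being off by the same amount suggests the list of variable names does not match the columns.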
Andra Th posted on Wednesday, March 08, 2017 - 3:29 am
Thank you for your answer. I checked, and all missing values were coded as such and the number of variables matches the number of columns in the data set. I need to run this analysis. Are there other steps I could take to check the data? Would it be possible to send you the input, data, and result?
Anonymous posted on Tuesday, October 31, 2017 - 8:12 pm
What should I do if my dataset exceeds the maximum record length? I have about 5,000 subjects observed at 4 time points, so a total of about 20,000 rows in the long format. This is double the Mplus maximum. Is there a way around this, or can I just not use Mplus? (I am planning to do an MSEM.)
Hi, I get the error message below: "One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement." The program also specifies the cluster that shows variation in the between variable.
I checked the data and there is no variance in that cluster. There are also no blanks in the dataset since I specified missing values. There is also no format statement in my code.
Do you have any idea of why I may be getting this error?