I would like to estimate a model to predict a group-level outcome that is measured at three times. The main predictor is "team climate" which is measured at the individual level, but is aggregated to a group level variable. Furthermore, I have some control variables at the individual level (e.g., sex, age). The basic idea is to create a multilevel model that accounts for (1) individual variance both in the measurement of the team climate variable and in the prediction of the team-level outcome, and (2) the variability of the outcome across time. How can I specify such a model in Mplus? Your help is much appreciated.
(1) Here is one way to think about it. You may compare your case with the UG ex 9.1 figure on page 239. For the Within level (individuals) it sounds like you have individuals' team climate ratings as y, and control variables as x's. For Between (group) you have the y circle as a random intercept which varies across groups. That is your aggregate team climate, expressed as a latent variable. On Between it sounds like you don't have any w or xm variables, so you can just say
(2) Here the question is if you want to study growth or if time is just a nuisance and you simply want to take into account correlation across time. Multilevel growth models are shown in UG ex 9.12 and on.
Murphy T. posted on Wednesday, March 02, 2011 - 1:50 am
Thanks for your answer. I have some follow up questions. To specify:
(1) I have team performance as the dependent variable (measured at the team level only) and I want to regress it on team climate (team level) and control variables (individual level). Can I specify team performance (measured at team level) as the dependent variable on both within and between? Or do I have to specify team performance on between only and some other dependent variable on within?
(2) I just want to take it into account and not study growth. How can I specify this?
1. If you have individual-level control variables x, then using the individual level team climate in the way shown seems best.
2. Then you can handle that simply by saying
%Between% teamperf1-teamperf3 on y;
That is, you have 3 between-level team performance variables as 3 columns in your data.
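Putting the two answers together, a complete input might look like the sketch below. It is only an illustration under stated assumptions: the file name, the variable names (team, climate, sex, age) and the column order are hypothetical, and tperf1-tperf3 stand in for teamperf1-teamperf3 because Mplus variable names are limited to 8 characters. The WITH statements spell out the correlation across time explicitly.

```
DATA:
  FILE = teams.dat;              ! hypothetical file name
VARIABLE:
  NAMES = team climate sex age tperf1 tperf2 tperf3;  ! hypothetical
  USEVARIABLES = climate sex age tperf1-tperf3;
  CLUSTER = team;
  WITHIN = sex age;              ! individual-level controls
  BETWEEN = tperf1-tperf3;       ! measured at the team level only
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  climate ON sex age;            ! individual climate ratings
  %BETWEEN%
  tperf1-tperf3 ON climate;      ! latent between part of climate
  tperf1 WITH tperf2 tperf3;     ! residual correlation across time
  tperf2 WITH tperf3;
```

Because climate is not placed on the WITHIN or BETWEEN list, it is decomposed into a within part and a latent between part, and only the latent between part predicts team performance.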
Murphy T. posted on Wednesday, September 21, 2011 - 12:58 am
Thank you! I have now specified the model and it works (I decided to use only one measurement point for theoretical reasons, however).
Now I tried to specify an interaction between two latent variables at the between level. Both are individual-level variables that reflect team-level constructs. I used the XWITH command but got the error message:
"The XWITH option is not available for observed variable interactions. Use the DEFINE command to create an interaction variable. Problem with: ZSOCC_CS | ZSOCCYN XWITH ZCS"
My input was:
CLUSTER = tid;
BETWEEN = Zaewg_1;
CENTERING = GRANDMEAN (ZCS ZAR ZEM ZMP Zsex Zage Zsoccyn Zaewg_1);
Analysis: Type = twolevel RANDOM;
ALGORITHM = INTEGRATION;
MODEL:
%WITHIN%
ZCS ZAR ZEM ZMP Zsoccyn on Zsex Zage;
%BETWEEN%
Zaewg_1 on ZCS ZAR ZEM ZMP Zsoccyn;
Zsocc_CS | Zsoccyn XWITH ZCS;
Zaewg_1 on Zsocc_CS;
Where "ZCS", "ZAR", "ZEM", "ZMP", "Zsoccyn" are the team climate variables; "Zsex" and "Zage" are individual-level controls and "Zaewg_1" is the team-level outcome.
It would be great if you could help me. Thank you very much.
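The error message indicates that XWITH expects latent variables, whereas Zsoccyn and ZCS are observed variables in the input. One commonly used workaround, sketched here as an assumption rather than as the reply given in this thread, is to define between-level factors that are identical to the latent between parts of the two variables and to interact those factors. The factor names fsocc and fcs are hypothetical.

```
MODEL:
  %WITHIN%
  ZCS ZAR ZEM ZMP Zsoccyn ON Zsex Zage;
  %BETWEEN%
  fsocc BY Zsoccyn@1;            ! factor equals the between part
  Zsoccyn@0;                     ! of Zsoccyn (residual fixed at 0)
  fcs BY ZCS@1;
  ZCS@0;
  Zsocc_CS | fsocc XWITH fcs;    ! latent interaction
  Zaewg_1 ON fsocc fcs ZAR ZEM ZMP Zsocc_CS;
```

The ANALYSIS command would stay as posted (TYPE = TWOLEVEL RANDOM with ALGORITHM = INTEGRATION), since latent interactions require numerical integration.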
I assume that your day-level variables have variation across level-2 units. If so, their between-level parts, their random intercepts, can be related to the control variable. That's how variables can relate across levels.
I have a dataset of individuals nested in teams. Some individuals, however, are members of several teams (e.g. 5 teams). Furthermore, my outcome variable is measured at the team level, while all predictors are measured at the individual level.
How would I construct a model incorporating the fact that the outcome variable is measured on the group level and the predictors on the individual level, while also taking into account that some individuals are members of multiple teams?
I've not seen an example in the literature on the combination of these two issues.
We want to analyse multilevel data (individuals nested in teams) with a level 2 outcome (e.g. leaders' satisfaction) and a level 2 moderator (e.g. a leader's trait). The independent variable is on level 1. This is our syntax. We are not sure if it is correct. Any corrections or hints are welcome! Does the (cross-level) interaction have to be defined as a between variable?
usevar = Leader_A Member_A Leader_J IactA;
CLUSTER IS TEAM_3;
BETWEEN ARE Leader_A Leader_J;
DEFINE:
IactA=Leader_A*Member_A;
center Leader_A (grandmean) Member_A (groupmean);
ANALYSIS:
TYPE IS TWOLEVEL RANDOM;
MODEL:
%BETWEEN%
Leader_J on Leader_A Member_A IactA;
I am interested in analyzing data consisting of repeated measures in clusters (schools) but with different individuals (students) at each time point. The objective is to analyze whether a certain intervention had an effect on the smoking prevalence in these schools at two time points after the baseline. Everything is measured at the individual level, but I'm using some of the measures as aggregated means at the school level, to serve as indicators of the school tobacco control policies. For me, measuring change over time is important, so could you advise how to analyze that in Mplus with this kind of data? I would prefer using a binary outcome variable (daily smoker/other).
So are you saying that you want a binary growth model for 3 time points where the repeated outcome is an aggregate over students in the schools? Is the unit of analysis school? How many schools do you have?
Yes, that is my basic objective and the unit of analysis is school. However, I'm also interested whether it is possible to use individual outcome here.
I have altogether 339 schools with data from all three time points. There are altogether 108599 students in the data, but as I mentioned, each student has data only from one time point.
The variables of interest are gender, age, parental smoking, general attitudes towards smoking (these I would like to keep on the individual level), school type and four variables related to school tobacco control policies (aggregated to the school level). The studied intervention relates to legislation, so there is no specific intervention variable in the data; the time perspective is important for that. Then there is the outcome, current student smoking, which could be used on the individual level or aggregated to a school mean.
If I wanted to study possible moderation effects (e.g. of some school-level policy), what would be a suitable model to test that in this setting?
I very much appreciate your help!
Nina Wirtz posted on Wednesday, March 11, 2015 - 3:57 am
Dear Bengt, I am currently trying to model a cross-level interaction with a level 1 predictor (x), a level 2 moderator (z) and a level 2 outcome (y). k is a level 2 control variable. (See Syntax below).
1. By not defining x as a WITHIN variable, I am looking at the latent between-level part of x on level 2. However, as I am only using x on level 2, I am actually forced to do so. If I define x as a WITHIN variable I get an error message. Is there any way around this, or is the latent approach in this case (automatically) the preferable one?
2. Is the interaction term defined correctly? I've also tried the XWITH command, but that did not work.
3. Is the interaction term created with the grand mean-centered variables or with the raw scores?
Thank you very much for your help, I greatly appreciate it! Nina
usevar = x z k y Iact;
MISSING = All(-999);
CLUSTER IS team;
BETWEEN ARE y z k;
DEFINE:
center x z k (grandmean);
Iact= x*z;
ANALYSIS:
TYPE IS TWOLEVEL;
ESTIMATOR = ML;
1. If you are not interested in level-1 relationships, why don't you simply create a cluster-level version of x using Cluster_mean? Thereby you can do a single-level analysis.
2. The interaction definition is fine, but apply it to the cluster mean of x.
3. Grand-mean centering is done first.
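The three points above could be combined in a DEFINE fragment along the following lines. This is a hedged sketch only: the name xm is hypothetical, and whether a variable created by CLUSTER_MEAN can be centered in the same DEFINE command, and in which order the statements are executed, should be checked against the User's Guide for the Mplus version in use.

```
VARIABLE:
  USEVARIABLES = z k y xm Iact;  ! newly defined variables go last
  CLUSTER = team;                ! CLUSTER_MEAN uses the cluster variable
DEFINE:
  xm = CLUSTER_MEAN (x);         ! observed team mean of x
  CENTER xm z k (GRANDMEAN);     ! centering placed before the product
  Iact = xm*z;                   ! interaction built from the team mean
```

How the data are then reduced to one record per team for the single-level analysis is left open here.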
Nina Wirtz posted on Thursday, March 12, 2015 - 2:17 am
Thank you for the helpful response Bengt!
Regarding 1: I have a formative construct on the within level (team members' health). I thought that I would avoid loss of information and get a more accurate estimation by using MLM (in reference to your 2008 paper with Lüdtke et al. on the MLC approach and some recent work by Croon, van Veldhoven, Peccei, & Wood on bathtub models with L2 outcomes). This way, the variance on the within variable as well as the dependence of observations among teams is taken into account, isn't it? In your opinion, does the multilevel structure make sense in my case? I highly appreciate your feedback. Thanks. Nina
I would like to estimate a model to predict a group level outcome (y). I have a level 1 predictor (x) and several level 2 predictors (z1 z2 z3). HVID is the level 2 cluster variable. Can you please verify that this is the correct syntax?
NAMES ARE = HVID z1 z2 z3 x y;
MISSING = All(-99, -88);
CLUSTER = HVID;
BETWEEN ARE z1 z2 z3 y;
DEFINE:
ANALYSIS:
TYPE IS TWOLEVEL;
ESTIMATOR = ML;
That looks right. The x variable on Between is the latent between part of x which is what you want.
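The MODEL command is not shown in the post above. A MODEL command consistent with the reply might look like the following sketch, which is an assumption pieced together from the variable lists and from similar inputs later in this thread, not a quotation.

```
MODEL:
  %WITHIN%
  x;                    ! within-level variance of x
  %BETWEEN%
  y ON x z1 z2 z3;      ! x here is the latent between part of x
```

Because x appears on neither the WITHIN nor the BETWEEN list, it has both a within part and a latent between part, and only the latter enters the regression for y.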
Rick Vogel posted on Wednesday, June 29, 2016 - 2:13 pm
I have exactly the same data structure as in Allison's example above, with the exception that my group level outcome y is categorical.
When running the model, the error message is "Unrestricted x-variables for analysis with TYPE=TWOLEVEL and ALGORITHM=INTEGRATION must be specified as either a WITHIN or BETWEEN variable. The following variable cannot exist on both levels: x".
What are my options for solving this problem? 1) Is it correct to include x on the within level and the cluster mean of x on the between level? 2) What would a latent variable approach look like instead? 3) What else could I do?
Rick Vogel posted on Thursday, June 30, 2016 - 12:18 am
Thanks for the response. Just a follow-up question with regard to Q2: Is it correct that an equivalent solution would be to keep x on the within level and to create the factor only on the group level, as follows:
MODEL: %WITHIN% x; %BETWEEN% f by x; y on z1 z2 z3 f;
I am trying to conduct a multilevel path/mediation analysis with a categorical predictor (a high vs. low feedback intervention), an individual-level mediator (emotion) and a group-level outcome (electricity consumption in shared apartments). The outcome is clustered within apartments (77 clusters).
I have the following input (I've tried various others), which is not currently converging and I'm looking for advice:
!level-1 variables
m=Emo2_c; !emotions (disgust and empathy)
!level-2 dv
y=elect; !electricity use
z=Feed; !feedback
VARIABLE:
NAMES ARE Mot Feed Feed2 apt_case elect water hotwater Intrins Emo2 Emo2_c;
USEVARIABLES ARE y m z;
BETWEEN ARE y z;
CLUSTER IS apt_case;
MISSING ARE ALL (-99);
ANALYSIS:
TYPE IS TWOLEVEL;
MODEL:
%WITHIN%
m;
%BETWEEN%
y ON m z;
m ON z;
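For the mediation part of this question, the between-level indirect effect of z on y through m can be requested by labeling the two paths and defining their product in MODEL CONSTRAINT. The sketch below only illustrates that mechanism; the labels a, b and ind are hypothetical, and it is not offered as a fix for the convergence problem.

```
MODEL:
  %WITHIN%
  m;
  %BETWEEN%
  m ON z (a);           ! feedback -> between part of emotion
  y ON m (b)            ! between part of emotion -> electricity use
       z;               ! direct effect of feedback
MODEL CONSTRAINT:
  NEW(ind);
  ind = a*b;            ! indirect effect on the between level
```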
Dayna Walker posted on Wednesday, February 15, 2017 - 12:08 pm
Dear Drs. Muthen,
I have the same model as Allison above. Thank you for confirming this is the right syntax! My questions are about interpretation:
1) Should I interpret model fit statistics (e.g., CFI, TLI, RMSEA, SRMR) before interpreting significance of individual predictors, as with other model estimation techniques?
2) What does it mean if the p value of my level 1 predictor is different in the standardized (STDYX) output than in the non-standardized output (p = .046 vs p = .05)? Also, is STDYX the correct section of standardized output I should be using for interpretation (vs. STDY or STD)? I have both continuous and binary predictors in the model. The latent, level 1 predictor is continuous.
Please allow me to follow up on this enlightening topic. A colleague told me that a Generalized Estimating Equation (GEE) approach can model marginal distributions, i.e., modeling group-level dependent variables as a function of both group-level independent variables and individual-level independent variables. What is your opinion on using a GEE model, and is there a way to run GEE models in Mplus?
Another question concerns the within-group variable x. You mentioned that when x is modeled on both the within-group level and the between-group level, x is decomposed as a latent variable at the between-group level, which I have difficulty following. How is this latent variable constructed, matrix-wise? From my understanding, x will be running on a completely different matrix than the other group-level variables. Let me rephrase my question: How would you describe the model in an equation with x as both a within-group observed variable and a between-group latent variable?
I would really appreciate it if you could point to me some relevant references that you approve of. Thank you so much!
I won't give an opinion on GEE, but hierarchical data are typically analyzed using multilevel models. For a paper related to GEE, see the paper on our website
Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished technical report.
and its reference to the Melton-Liang paper.
Regarding the latent variable decomposition of x, see e.g.
Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229.
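In equation form, the latent decomposition can be sketched as follows for one level-1 predictor x (individual i in group j) and a group-level outcome y. The symbols are notation chosen here for illustration and are not taken from the posts; normality is the usual assumption under ML estimation.

```latex
\begin{align*}
x_{ij}  &= x_{bj} + x_{wij},
  & x_{bj} &\sim N(\mu_x, \tau^2), \quad x_{wij} \sim N(0, \sigma^2) \\
y_{j}   &= \beta_0 + \beta_1 x_{bj} + \varepsilon_j,
  & \varepsilon_j &\sim N(0, \psi)
\end{align*}
```

Here x_{bj} is the unobserved (latent) between-group part of x, which acts as a random intercept, and x_{wij} is the within-group deviation. Only x_{bj} appears in the between-level regression for y, and both parts are estimated jointly from the observed x_{ij}.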
Thank you for a prompt response. To my understanding, "Y ON S" will test whether the intercepts "Y" between classrooms vary (become more or less similar) as time goes.
I wanted to test whether this effect is due to L3 predictor "X" (classroom atmosphere). In other words, whether the effect of "X" on "Y" at L3 grows stronger or weaker over time. "Y on S" does not seem to test this, or am I missing something?
We want to do a similar analysis as Allison above. We have a group-level outcome and are only interested in the effects of level 1 predictors.
NAMES ARE = HVID x1 x2 x3 y;
MISSING = All(-99, -88);
CLUSTER = HVID;
BETWEEN ARE y;
DEFINE:
ANALYSIS:
TYPE IS TWOLEVEL;
ESTIMATOR = ML;
MODEL:
%WITHIN%
x1 x2 x3;
%BETWEEN%
y on x1 x2 x3;
The syntax above works, but I have two questions: 1) What exactly is happening? Is Mplus automatically computing aggregated means of x1 x2 x3, or is this an adjusted aggregated mean of some sort? 2) In the literature on micro-macro analysis, as in Croon & van Veldhoven (2007) and Foster-Johnson & Kromrey (2018), a latent variable model is created in which latent variables for x1 x2 x3 are introduced at level 2. See their Mplus syntax at https://osf.io/z745e/?view_only=133543b6151a4ccbbde895839ceef378 But it's not clear to me why this is necessary. With their syntax I run into convergence problems; with the syntax above everything runs fine.
Mplus gives you a latent variable decomposition of the x's so that only the latent between-level part of each x is used on Between. You can read more about it in this paper on our website:
Asparouhov, T. & Muthén, B. (2019). Latent variable centering of predictors and mediators in multilevel and time-series models. Structural Equation Modeling: A Multidisciplinary Journal, 26, 119-142. DOI: 10.1080/10705511.2018.1511375
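The difference between the latent between-level part and a plain aggregated mean can be illustrated with a short derivation. The notation is chosen here for illustration and assumes a single predictor x, n_j members in group j, and within-group deviations that are independent of the outcome residual.

```latex
\begin{align*}
\bar{x}_{\cdot j} &= x_{bj} + \frac{1}{n_j}\sum_{i=1}^{n_j} x_{wij},
  \qquad \operatorname{Var}(x_{bj}) = \tau^2,\;
         \operatorname{Var}(x_{wij}) = \sigma^2 \\
\lambda_j &= \frac{\tau^2}{\tau^2 + \sigma^2 / n_j}
\end{align*}
```

The observed group mean equals the latent between part plus sampling error, and \lambda_j is its reliability. If y_j is regressed on the observed mean with equal group sizes n, the slope is attenuated to roughly \lambda \beta_1, whereas regressing on the latent part x_{bj} targets \beta_1 itself. In that sense the latent approach is an adjusted aggregate, and the adjustment matters most when groups are small or the between-group variance of x is low.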
Kit Tse posted on Tuesday, October 20, 2020 - 10:22 pm
Dear Drs Muthen,
I have a school(L2)-student(L1) data structure, and I want to examine the effect of students' scores in Class A(L1-predictor X) on a school outcome(L2-outcome Y), controlling for students' average scores in the school. Class is an L1 moderator(C) with 2 values (1[Class A] or 0[Class B]). This is a bit different from the 1*1 moderation model in http://www.quantpsy.org/pubs/preacher_zhang_zyphur_2016_(code.appendix).pdf given the L2-outcome Y.
When I perform linear regression at L2 by averaging X, the aggregation procedure returns school means of X in Class A that are highly correlated with the overall school means of X, resulting in collinearity issues. I have come up with the following Mplus syntax instead:
%BETWEEN% Y ON X C X*C;
%WITHIN% X C;
I wonder whether my interpretations are correct: 1. the coefficient of X represents the effect of X in Class B; 2. the coefficient of (X + X*C) represents the effect of X in Class A. Or are they wrong because C at L2 has values different from 0 and 1?
Furthermore, if I add a school-level (L2) moderator (Z), could I change the syntax as follows:
%BETWEEN% Y ON X C Z X*C X*Z C*Z X*C*Z;
%WITHIN% X C;
%BETWEEN%
Y ON XA XB;
XA with XB;
%WITHIN%
XA XB;
XA with XB@0;
You would need to organize the data file as Y XA XB
If classrooms A and B have different sizes - fill in with missing values.
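A full input built around this suggestion might look like the sketch below. The file name, the cluster variable name school, the column order and the missing-value code are assumptions made for illustration. Each record is taken to hold the school's Y together with one Class A score and one Class B score from the same school, and when one class has fewer students the surplus records carry the missing-value code for that class.

```
DATA:
  FILE = schools.dat;            ! hypothetical file name
VARIABLE:
  NAMES = school Y XA XB;        ! hypothetical column order
  MISSING = ALL (-999);          ! hypothetical missing-value code
  CLUSTER = school;
  BETWEEN = Y;
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %BETWEEN%
  Y ON XA XB;                    ! latent school-level parts of XA, XB
  XA WITH XB;
  %WITHIN%
  XA XB;
  XA WITH XB@0;                  ! students in A and B are unrelated
```

Fixing the within-level covariance at zero reflects that which Class A student shares a record with which Class B student is arbitrary.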
Kit Tse posted on Thursday, October 22, 2020 - 8:45 pm
Dear Dr. Asparouhov,
Thank you for your prompt response. Will the strong positive correlations at the school level between XA and XB (because they are classes from the same school) remain an issue in terms of multicollinearity in this model?