Multilevel model with group-level out...
Mplus Discussion > Multilevel Data/Complex Sample
 Murphy Tucson posted on Tuesday, March 01, 2011 - 12:42 pm
I would like to estimate a model to predict a group-level outcome that is measured at three times.
The main predictor is "team climate", which is measured at the individual level but aggregated to a group-level variable. Furthermore, I have some control variables at the individual level (e.g., sex, age).
The basic idea is to create a multilevel model that accounts for (1) individual variance both in the measurement of the team climate variable and in the prediction of the team-level outcome, and (2) the variability of the outcome across time. How can I specify such a model in Mplus?
Your help is much appreciated.
 Bengt O. Muthen posted on Tuesday, March 01, 2011 - 1:45 pm
(1) Here is one way to think about it. You may compare your case with the UG ex 9.1 figure on page 239. For the Within level (individuals) it sounds like you have individuals' team climate ratings as y, and control variables as x's. For Between (group) you have the y circle as a random intercept which varies across groups. That is your aggregate team climate, expressed as a latent variable. On Between it sounds like you don't have any w or xm variables, so you can just say


(2) Here the question is whether you want to study growth or whether time is just a nuisance and you simply want to take into account the correlation across time. Multilevel growth models are shown in UG ex 9.12 onward.
 Murphy T. posted on Wednesday, March 02, 2011 - 1:50 am
Thanks for your answer. I have some follow up questions. To specify:

(1) I have team performance as the dependent variable (measured at the team level only) and I want to regress it on team climate (team level) and control variables (individual level). Can I specify team performance (measured at team level) as the dependent variable on both within and between? Or do I have to specify team performance on between only and some other dependent variable on within?

(2) I just want to take it into account and not study growth. How can I specify this?

Thanks very much from a new Mplus user.
 Bengt O. Muthen posted on Wednesday, March 02, 2011 - 10:37 am
(1) You say

Between = teamperf;

in the VARIABLE command, and in the MODEL command:

%WITHIN%
y on x1 x2;
! x1 x2 are control variables and y refers to
! team climate

%BETWEEN%
teamperf on y;
! y is the between part of team climate (the
! random intercept)
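In equation form, the input above can be read roughly as follows. This is a hedged sketch, not Mplus output: i indexes individuals and j teams, the symbols are illustrative, and x1 and x2 are assumed to enter only on Within.

```latex
\begin{aligned}
\text{Within:}\quad  & y_{ij} = y_{B,j} + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \varepsilon_{ij},\\
\text{Between:}\quad & \mathit{teamperf}_j = \alpha + \gamma\, y_{B,j} + \zeta_j .
\end{aligned}
```

Here y_{B,j} is the random intercept of team climate, i.e. the latent team-level climate, so gamma is its effect on team performance.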

Is only the group-level outcome team performance measured 3 times, or are the other variables also measured 3 times?

To learn more quickly, you may want to consider attending our multilevel course at Johns Hopkins at the end of March.
 Murphy T. posted on Friday, March 04, 2011 - 9:50 am
Thank you very much.

(1) Do I understand correctly that team climate has to be the individual-level team climate variables rather than the (team-level) aggregated scores?

(2) Only the group-level outcome team performance is measured at 3 times; the other variables are measured at one time.
 Bengt O. Muthen posted on Saturday, March 05, 2011 - 2:43 pm
1. If you have individual-level control variables x, then using the individual level team climate in the way shown seems best.

2. Then you can handle that simply by saying

teamperf1-teamperf3 on y;

That is, you have 3 between-level team performance variables as 3 columns in your data.
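Because teamperf1-teamperf3 are between-level variables, each value must be repeated unchanged on every individual record of a team in the data file. A minimal Python sketch of that reshaping, assuming the team outcome starts in long format as (team, time, value) triples with numeric team ids and that -999 is the missing-value code; the function name, column names, and data are hypothetical:

```python
from collections import defaultdict

MISSING = -999  # assumed missing-value code (declare it with MISSING = ALL(-999) in Mplus)


def add_wide_team_outcome(individuals, team_perf, n_times=3):
    """Attach teamperf1..teamperf<n_times> to every individual record.

    individuals -- list of dicts, each with a 'team' key (the CLUSTER variable)
    team_perf   -- (team, time, value) triples in long format, time = 1..n_times
    The new columns are identical for all members of a team, as required
    for BETWEEN variables; absent scores get the MISSING code.
    """
    by_team = defaultdict(dict)
    for team, time, value in team_perf:
        by_team[team][time] = value
    rows = []
    for person in individuals:
        row = dict(person)  # copy so the input records are left unchanged
        scores = by_team.get(person["team"], {})
        for t in range(1, n_times + 1):
            row[f"teamperf{t}"] = scores.get(t, MISSING)
        rows.append(row)
    return rows


people = [
    {"id": 1, "team": 1, "climate": 3.5},
    {"id": 2, "team": 1, "climate": 4.0},
    {"id": 3, "team": 2, "climate": 2.5},
]
perf = [(1, 1, 10), (1, 2, 12), (1, 3, 11), (2, 1, 7), (2, 3, 9)]
for row in add_wide_team_outcome(people, perf):
    print(row)
```

Team 2 has no score at time 2, so all of its members get -999 in teamperf2, which Mplus then treats as missing.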
 Murphy T. posted on Wednesday, September 21, 2011 - 12:58 am
Thank you! I have now specified the model and it works (I decided to use only one measurement point for theoretical reasons, however).

Now I tried to specify an interaction between two latent variables at the between level. Both are individual-level variables that reflect team-level constructs. I used the XWITH command but got the error message:

"The XWITH option is not available for observed variable interactions. Use the DEFINE command to create an interaction variable."

My input was:

VARIABLE:
CLUSTER = tid;
BETWEEN = Zaewg_1;
Zsoccyn Zaewg_1);
ANALYSIS:
Type = twolevel RANDOM;
MODEL:
ZCS ZAR ZEM ZMP Zsoccyn on Zsex Zage;
Zaewg_1 on ZCS ZAR ZEM ZMP Zsoccyn;
Zsocc_CS | Zsoccyn XWITH ZCS;
Zaewg_1 on Zsocc_CS;

Where "ZCS", "ZAR", "ZEM", "ZMP", "Zsoccyn" are the team climate variables; "Zsex" and "Zage" are individual-level controls and "Zaewg_1" is the team-level outcome.

It would be great if you could help me. Thank you very much.
 Linda K. Muthen posted on Wednesday, September 21, 2011 - 9:12 am
You can put a factor behind each of them on between, for example,

f1 BY Zsoccyn;

and use the factors in XWITH.
 Johnna Capitano posted on Friday, April 26, 2013 - 1:07 pm
I have a dataset of days clustered within people. My indirect-effect model is entirely at the within level (all day-level variables). I want to control for a between-level (level 2) variable.

Since the analysis is Type=Twolevel, I have the MODEL: %Within% followed by the model relationships.

How do I specify the controls? Since the outcomes are at L1 and the controls are at L2, it seems that Mplus will not allow me to regress one on the other in either a %between% or %within% statement.

Thank you!
 Bengt O. Muthen posted on Friday, April 26, 2013 - 2:08 pm
I assume that your day-level variables have variation across level-2 units. If so, their between-level parts, their random intercepts, can be related to the control variable. That's how variables can relate across levels.
 J. Botterman posted on Friday, November 22, 2013 - 2:55 am
Hi Bengt/Linda,

I have a dataset of individuals nested in teams. Some individuals, however, are members of several teams (e.g. 5 teams). Furthermore, my outcome variable is measured at the team level, while all predictors are measured at the individual level.

How would I construct a model incorporating the fact that the outcome variable is measured on the group level and the predictors on the individual level, while also taking into account that some individuals are members of multiple teams?

I've not seen an example in the literature on the combination of these two issues.

Your help is greatly appreciated.
 Bengt O. Muthen posted on Sunday, November 24, 2013 - 3:37 pm
You may want to take a look at the multiple membership literature:

and perhaps also the cross-classified literature:

Gonzalez, De Boeck, & Tuerlinckx (2008). A Double-Structure Structural Equation Model for Three-Mode Data. Psychological Methods, 337-
 Thomas Rigotti posted on Friday, October 10, 2014 - 7:36 am

We want to analyse multilevel data (individuals nested in teams) with a level 2 outcome (e.g. leaders' satisfaction) and a level 2 moderator (e.g. a leader's trait).
The independent variable is on level 1.
This is our syntax. We are not sure if this is correct. Any corrections or hints are welcome!
Does the (cross-level) interaction have to be defined as a between variable?

usevar = Leader_A Member_A Leader_J IactA;
BETWEEN ARE Leader_A Leader_J;
center Leader_A (grandmean) Member_A (groupmean);
Leader_J on Leader_A Member_A IactA;
 Bengt O. Muthen posted on Friday, October 10, 2014 - 3:24 pm
So you intend "Member_A" to be the latent between-level part of the Member_A variable. Read about that under Part 2 of the UG ex 9.1 on page 262.

You should drop RANDOM in the Analysis command since you have only a random intercept/mean.
 Hanna Ollila posted on Tuesday, October 21, 2014 - 5:31 am
I am interested in analyzing data consisting of repeated measures in clusters (schools) but with different individuals (students) at each time point. The objective is to analyze whether a certain intervention had an effect on smoking prevalence in these schools at two time points after the baseline. Everything is measured at the individual level, but I'm using some of the measures as aggregated school-level means to serve as indicators of the schools' tobacco control policies. Measuring change over time is important for me, so could you advise how to analyze that in Mplus with this kind of data? I would prefer to use a binary outcome variable (daily smoker/other).
 Bengt O. Muthen posted on Tuesday, October 21, 2014 - 12:23 pm
So are you saying that you want a binary growth model for 3 time points where the repeated outcome is an aggregate over students in the schools? Is the unit of analysis school? How many schools do you have?
 Hanna Ollila posted on Tuesday, October 21, 2014 - 11:11 pm
Yes, that is my basic objective, and the unit of analysis is the school. However, I'm also interested in whether it is possible to use the individual outcome here.

I have altogether 339 schools with data from all three time points. There are altogether 108599 students in the data, but as I mentioned, each student has data only from one time point.

The variables of interest are gender, age, parental smoking, and general attitudes towards smoking (these I would like to keep at the individual level), school type, and four variables related to school tobacco control policies (aggregated to the school level). The studied intervention relates to legislation, so there is no specific intervention variable in the data, which is why the time perspective is important. Finally, the outcome is current student smoking, which could be used at the individual level or aggregated to the school mean.

If I wanted to study possible moderation effects (e.g. of some school-level policy), what would be a suitable model to test that in this setting?

I very much appreciate your help!
 Nina Wirtz posted on Wednesday, March 11, 2015 - 3:57 am
Dear Bengt,
I am currently trying to model a cross-level interaction with a level 1 predictor (x), a level 2 moderator (z) and a level 2 outcome (y). k is a level 2 control variable. (See Syntax below).

1. By not defining x as a WITHIN variable, I am looking at the latent between-level part of x on level 2. However, as I am only using x on level 2, I am actually forced to do so: if I define x as a WITHIN variable, I get an error message. Is there any way around this, or is the latent approach (automatically) the preferable one in this case?

2. Is the interaction term defined correctly? I've also tried the XWITH command, but that did not work.

3. Is the interaction term created with the grand mean-centered variables or with the raw scores?

Thank you very much for your help, I greatly appreciate it!


VARIABLE:
usevar = x z k y Iact;
MISSING = All(-999);
DEFINE:
center x z k (grandmean);
Iact = x*z;
MODEL:
x with k;
y on k z x Iact;
 Bengt O. Muthen posted on Wednesday, March 11, 2015 - 6:22 pm
1. If you are not interested in level-1 relationships, why don't you simply create a cluster-level version of x using Cluster_mean? Thereby you can do a single-level analysis.

2. The interaction definition is fine, but apply it to the cluster mean of x.

3. Grand-mean centering is done first.
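The single-level route suggested above can be sketched in Python as a data-preparation step: collapse to one row per cluster, grand-mean center, and only then form the product, following the order stated in the reply. This is a hedged illustration with hypothetical function names; it assumes z, k, and y are constant within clusters, and it centers at the mean across clusters (each cluster weighted once), which may differ from centering over individual records when cluster sizes differ.

```python
from collections import defaultdict
from statistics import mean


def to_cluster_rows(records, cluster_key, x_key, cluster_vars):
    """Collapse individual records to one row per cluster.

    x is replaced by its observed cluster mean (stored as '<x>_cm');
    cluster-level variables are copied from the first record of each
    cluster, assuming they are constant within the cluster.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[cluster_key]].append(rec)
    rows = []
    for cid, members in groups.items():
        row = {cluster_key: cid, x_key + "_cm": mean(m[x_key] for m in members)}
        for var in cluster_vars:
            row[var] = members[0][var]
        rows.append(row)
    return rows


def grand_mean_center(rows, keys):
    """Center each listed variable at its mean across rows (one row per cluster)."""
    for key in keys:
        center = mean(r[key] for r in rows)
        for r in rows:
            r[key] -= center
    return rows


def add_product(rows, a, b, name="Iact"):
    """Create the interaction term from the already centered variables."""
    for r in rows:
        r[name] = r[a] * r[b]
    return rows


records = [
    {"team": 1, "x": 2.0, "z": 1.0, "k": 5.0, "y": 3.0},
    {"team": 1, "x": 4.0, "z": 1.0, "k": 5.0, "y": 3.0},
    {"team": 2, "x": 6.0, "z": 3.0, "k": 7.0, "y": 4.0},
]
rows = to_cluster_rows(records, "team", "x", ["z", "k", "y"])
rows = grand_mean_center(rows, ["x_cm", "z", "k"])  # center first ...
rows = add_product(rows, "x_cm", "z")               # ... then multiply
for r in rows:
    print(r)
```

The outcome y is deliberately left uncentered; only the predictors that enter the product are centered before multiplying.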
 Nina Wirtz posted on Thursday, March 12, 2015 - 2:17 am
Thank you for the helpful response Bengt!

Regarding 1: I have a formative construct on the within level (team members' health). I thought that I would avoid a loss of information and get a more accurate estimation by using MLM (in reference to your 2008 paper with Lüdtke et al. on the MLC approach and some recent work by Croon, van Veldhoven, Peccei, & Wood on bathtub models with L2 outcomes). This way, the variance of the within variable as well as the dependence of observations within teams is taken into account, correct?
In your opinion, does the multilevel structure make sense in my case? I highly appreciate your feedback. Thanks. Nina
 Bengt O. Muthen posted on Thursday, March 12, 2015 - 8:31 am
If you have a model for Within, I would include it, but not if it is just one variable - unless you are really keen on getting that latent variable decomposition (desirable with small cluster sizes).
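The small-cluster point can be made concrete with the classical reliability of an observed cluster mean. A Python sketch under textbook assumptions (balanced clusters of size n, a single cluster-level predictor, many clusters, sampling error independent of the outcome); the function names and numbers are illustrative, and this shows the logic behind the latent decomposition rather than how Mplus computes it:

```python
def cluster_mean_reliability(tau2, sigma2, n):
    """Reliability of an observed cluster mean of n members as a measure of
    the latent cluster mean: tau2 / (tau2 + sigma2 / n).

    tau2   -- between-cluster variance of x
    sigma2 -- within-cluster variance of x
    n      -- cluster size (balanced clusters assumed)
    """
    return tau2 / (tau2 + sigma2 / n)


def expected_manifest_slope(beta, tau2, sigma2, n):
    """Expected slope of a cluster-level outcome regressed on the observed
    cluster mean when the true slope on the latent cluster mean is beta."""
    return beta * cluster_mean_reliability(tau2, sigma2, n)


if __name__ == "__main__":
    # Illustrative values: ICC = 0.2 (tau2 = 0.2, sigma2 = 0.8), true slope 1.0.
    for n in (2, 5, 20, 100):
        rel = cluster_mean_reliability(0.2, 0.8, n)
        slope = expected_manifest_slope(1.0, 0.2, 0.8, n)
        print(n, round(rel, 3), round(slope, 3))
```

With an ICC of 0.2, the observed mean of 5 members has reliability 5/9, so a slope estimated on raw cluster means would be expected to shrink to roughly 56% of its true value; the bias fades as clusters grow.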
 Allison L.. West posted on Tuesday, August 11, 2015 - 6:45 am
I would like to estimate a model to predict a group level outcome (y). I have a level 1 predictor (x) and several level 2 predictors (z1 z2 z3). HVID is the level 2 cluster variable. Can you please verify that this is the correct syntax?


VARIABLE:
NAMES ARE = HVID z1 z2 z3 x y;
MISSING = All(-99, -88);
BETWEEN ARE z1 z2 z3 y;
CLUSTER = HVID;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
y on z1 z2 z3 x;

 Bengt O. Muthen posted on Tuesday, August 11, 2015 - 2:02 pm
That looks right. The x variable on Between is the latent between part of x which is what you want.
 Rick Vogel posted on Wednesday, June 29, 2016 - 2:13 pm
Dear all,

I have exactly the same data structure as in Allison's example above, with the exception that my group level outcome y is categorical.

When running the model, the error message is "Unrestricted x-variables for analysis with TYPE=TWOLEVEL and ALGORITHM=INTEGRATION must be specified as either a WITHIN or BETWEEN variable. The following variable cannot exist on both levels: x".

What are my options for solving this problem? 1) Is it correct to include x on the within level and the cluster mean of x on the between level? 2) What would a latent variable approach look like instead? 3) What else could I do?

Many thanks in advance,

 Bengt O. Muthen posted on Wednesday, June 29, 2016 - 3:11 pm
Q1 Yes.

Q2. Create a factor measured by x on both levels.
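For the Q1 route, the two pieces of x can be prepared in the data before running Mplus (Mplus can also compute the cluster mean itself with the Cluster_mean function mentioned earlier in the thread). A minimal Python sketch with hypothetical column suffixes _cm and _cwc; note that this cluster mean is the observed (manifest) mean, so unlike the latent decomposition of Q2 it is not corrected for sampling error:

```python
from collections import defaultdict


def split_within_between(records, cluster_key, x_key):
    """Return copies of records with two new columns:
    '<x>_cm'  -- observed cluster mean (constant within cluster; a BETWEEN variable)
    '<x>_cwc' -- x minus its cluster mean (a WITHIN variable)
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec[cluster_key]] += rec[x_key]
        counts[rec[cluster_key]] += 1
    out = []
    for rec in records:
        cm = totals[rec[cluster_key]] / counts[rec[cluster_key]]
        new = dict(rec)  # leave the input records unchanged
        new[x_key + "_cm"] = cm
        new[x_key + "_cwc"] = rec[x_key] - cm
        out.append(new)
    return out


records = [
    {"cluster": 1, "x": 1.0},
    {"cluster": 1, "x": 3.0},
    {"cluster": 2, "x": 5.0},
]
for row in split_within_between(records, "cluster", "x"):
    print(row)
```

Using the cluster-mean-centered x_cwc rather than raw x on Within keeps the within and between parts uncorrelated in the sample; that is a common choice, not something the reply prescribes.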
 Rick Vogel posted on Thursday, June 30, 2016 - 12:18 am
Thanks for the response. Just a follow-up question with regard to Q2: Is it correct that an equivalent solution would be to keep x on the within level and to create the factor only on the group level, as follows:

f by x;
y on z1 z2 z3 f;
 Bengt O. Muthen posted on Friday, July 01, 2016 - 11:46 am
I think so.
 Lisa Legault posted on Thursday, August 25, 2016 - 3:34 pm
I am trying to conduct a multilevel path/mediation analysis with a categorical predictor (a high vs. low feedback intervention), an individual-level mediator (emotion), and a group-level outcome (electricity consumption in shared apartments). The data are clustered within apartments (77 clusters).

I have the following input (I've tried various others), which is not currently converging and I'm looking for advice:

!level-1 variables
m=Emo2_c; !emotions (disgust and empathy)

!level-2 dv
y=elect; !electricity use
z=Feed; !feedback

VARIABLE: NAMES ARE Mot Feed Feed2 apt_case elect water hotwater
Intrins Emo2 Emo2_c;

CLUSTER IS apt_case;


y ON m z;
m ON z;

 Linda K. Muthen posted on Thursday, August 25, 2016 - 4:57 pm
Please send the output and your license number to
 Dayna Walker posted on Wednesday, February 15, 2017 - 12:08 pm
Dear Drs. Muthen,

I have the same model as Allison above. Thank you for confirming this is the right syntax! My questions are about interpretation:

1) Should I interpret model fit statistics (e.g., CFI, TLI, RMSEA, SRMR) before interpreting significance of individual predictors, as with other model estimation techniques?

2) What does it mean if the p value of my level 1 predictor is different in the standardized (STDYX) output than in the non-standardized output (p = .046 vs p = .05)? Also, is STDYX the correct section of standardized output I should be using for interpretation (vs. STDY or STD)? I have both continuous and binary predictors in the model. The latent, level 1 predictor is continuous.

Many thanks for your guidance.
 Bengt O. Muthen posted on Wednesday, February 15, 2017 - 3:07 pm
1) Yes. The model must be a good representation of the data before it is interpreted.

2) See our FAQ:

Standardized coefficient can have different significance than unstandardized

STDYX is used for continuous predictors and STDY for binary ones.

Both of these questions are discussed in our new book.
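The STDYX/STDY distinction can be shown with the usual standardization formulas. A short Python sketch, assuming the variances are the model-estimated variances (in a two-level model these are level-specific, which is not shown here); the function names and numbers are made up for illustration:

```python
from math import sqrt


def stdyx(b, var_x, var_y):
    """Change in SDs of y per one-SD change in x (continuous x)."""
    return b * sqrt(var_x) / sqrt(var_y)


def stdy(b, var_y):
    """Change in SDs of y per one-unit change in x (e.g. a 0/1 predictor)."""
    return b / sqrt(var_y)


# Illustrative numbers: unstandardized slope 0.5, binary x with p = .5
# (variance p(1 - p) = 0.25), and model-implied variance of y equal to 4.
b, var_x, var_y = 0.5, 0.25, 4.0
print(stdyx(b, var_x, var_y))  # 0.125
print(stdy(b, var_y))          # 0.25
```

For a 0/1 predictor, a "one-SD change" in x (here 0.5) is not a meaningful contrast, which is why STDY, the 0-to-1 change expressed in SD units of y, is the one to report.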
 Jingtong Pan posted on Thursday, October 05, 2017 - 8:15 am
Dear Dr. Muthen,

Please allow me to follow up on this enlightening topic. A colleague told me that a Generalized Estimating Equation (GEE) approach can model marginal distributions, i.e., modeling group-level dependent variables as a function of both group-level independent variables and individual-level independent variables. What is your opinion on using a GEE model, and is there a way to run GEE models in Mplus?

Another question concerns the within-group variable x. You mentioned that when x is modeled at both the within-group level and the between-group level, x is decomposed into a latent variable at the between-group level, which I have difficulty following. How is this latent variable constructed, matrix-wise? From my understanding, x will be running on a completely different matrix than the other group-level variables. Let me rephrase my question: how would you describe the model in an equation with x as both a within-group observed variable and a between-group latent variable?

I would really appreciate it if you could point me to some relevant references that you approve of. Thank you so much!
 Bengt O. Muthen posted on Friday, October 06, 2017 - 6:10 pm
I won't give an opinion on GEE, but hierarchical data are typically analyzed using multilevel models. For a paper related to GEE, see the paper on our website

Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished technical report.

and its reference to the Melton-Liang paper.

Regarding the latent variable decomposition of x, see e.g.

Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229.
 Jingtong Pan posted on Friday, October 06, 2017 - 6:18 pm
Thank you, Dr. Muthen!
 dummyvariable123 posted on Friday, October 05, 2018 - 9:59 am
Dear Dr. Muthen,

I want to see whether the effect of an L3 predictor (classroom atmosphere) on an L3 outcome (students' attitudes) varies over time (decreases or increases as time goes on).

What should my syntax look like to test this?
I tried to interact L3 predictor with time variable but I got an error message.

between = (class)X;
within = time inter;
inter = time*X;

Y ON time;


%BETWEEN class%
Y ON inter;

Error: One or more between-level variables have variation within a cluster for one or more CLASS clusters.
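The error means that some variable declared as between-level for the class clusters is not constant within at least one class. A quick Python check of the data file can locate the offending variable and clusters. This is a hypothetical helper, not an Mplus feature; it assumes the data are already read into a list of dicts and compares missing-value codes like any other value. The toy data also show why a product such as time*X can never be a class-level variable: it changes with time inside a class.

```python
from collections import defaultdict


def varying_within_cluster(records, cluster_key, between_vars):
    """Return {variable: sorted cluster ids} for declared between-level
    variables that take more than one value inside a cluster."""
    values = defaultdict(set)
    for rec in records:
        for var in between_vars:
            values[(var, rec[cluster_key])].add(rec[var])
    problems = defaultdict(list)
    for (var, cid), seen in values.items():
        if len(seen) > 1:
            problems[var].append(cid)
    return {var: sorted(cids) for var, cids in problems.items()}


records = [
    {"class": 1, "time": 0, "X": 2.0},
    {"class": 1, "time": 1, "X": 2.0},
    {"class": 2, "time": 0, "X": 5.0},
    {"class": 2, "time": 1, "X": 5.5},  # X changes inside class 2
]
for rec in records:
    rec["inter"] = rec["time"] * rec["X"]

print(varying_within_cluster(records, "class", ["X", "inter"]))
```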

1. Should I introduce the interaction term at L1?
2. Or should I specify a random slope for the effect of time on the intercept at L1 and regress it on X at L3?
 Bengt O. Muthen posted on Friday, October 05, 2018 - 2:33 pm
Say on Within:

s | y on time;

where you specify time as a within variable. Then say on Level 3:

y on s;
 dummyvariable123 posted on Saturday, October 06, 2018 - 2:12 am
Dear Dr. Muthen,

Thank you for a prompt response.
To my understanding, "Y ON S" will test whether the intercepts "Y" between classrooms vary (become more or less similar) as time goes.

I wanted to test whether this effect is due to L3 predictor "X" (classroom atmosphere). In other words, whether the effect of "X" on "Y" at L3 grows stronger or weaker over time.
"Y on S" does not seem to test this, or am I missing something?

Thank you so much for your guidance.
 Bengt O. Muthen posted on Saturday, October 06, 2018 - 6:12 am
Sorry, I meant to say s on x.
 dummyvariable123 posted on Saturday, October 06, 2018 - 6:35 am
Thank you for clarifying. Should I keep the direct effect "Y on X" in as well?
 Bengt O. Muthen posted on Saturday, October 06, 2018 - 7:17 am
Yes, X could be expected to influence both the intercept ("Y" on L3) and the slope ("s").
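Writing the model out shows why X belongs in both regressions. A simplified sketch, not Mplus output: t indexes time points and k classes, level-2 random effects are folded into the residual e, the slope s is treated as varying across classes only, and the symbols are illustrative.

```latex
\begin{aligned}
y_{tk} &= \eta_k + s_k\,\mathrm{time}_{tk} + e_{tk},\\
\eta_k &= \alpha_0 + \gamma_0 X_k + u_{0k} && \text{(y on X on Level 3)},\\
s_k &= \alpha_1 + \gamma_1 X_k + u_{1k} && \text{(s on X on Level 3)},\\
E(y_{tk} \mid X_k, \mathrm{time}_{tk}) &= \alpha_0 + \alpha_1\,\mathrm{time}_{tk} + (\gamma_0 + \gamma_1\,\mathrm{time}_{tk})\,X_k .
\end{aligned}
```

So gamma_0 is the effect of X where time = 0 (its meaning depends on how time is coded), and gamma_1, the s on X coefficient, is how much that effect changes per unit of time; a nonzero gamma_1 is the "grows stronger or weaker over time" pattern asked about.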
 lopisok posted on Friday, June 19, 2020 - 6:51 am
Dear Bengt/Linda,

We want to do an analysis similar to Allison's above. We have a group-level outcome and are only interested in the effects of level 1 predictors.


NAMES ARE = HVID x1 x2 x3 y;
MISSING = All(-99, -88);


x1 x2 x3;
y on x1 x2 x3;


The syntax above works but I have two questions:
1) I'm now wondering what exactly is happening. Is Mplus automatically computing aggregated means of x1 x2 x3, or is this an adjusted aggregated mean of some sort?
2) In the literature on micro-macro analysis, such as Croon & van Veldhoven (2007) and Foster-Johnson & Kromrey (2018), a latent variable model is specified in which latent variables for x1 x2 x3 are introduced at level 2. See their Mplus syntax on
But it's not clear to me why this is necessary. With their syntax I run into convergence problems, with the syntax above everything runs fine.

Kind regards
 Bengt O. Muthen posted on Friday, June 19, 2020 - 6:28 pm
Mplus gives you a latent variable decomposition of the x's so that only the latent between-level part of each x is used on Between. You can read more about it in this paper on our website:

Asparouhov, T. & Muthén, B. (2019). Latent variable centering of predictors and mediators in multilevel and time-series models. Structural Equation Modeling: A Multidisciplinary Journal, 26, 119-142. DOI: 10.1080/10705511.2018.1511375
 Kit Tse posted on Tuesday, October 20, 2020 - 10:22 pm
Dear Drs Muthen,

I have a school (L2)-student (L1) data structure, and I want to examine the effect of students' scores in Class A (L1 predictor X) on a school outcome (L2 outcome Y), controlling for students' average scores in the school. Class is an L1 moderator (C) with 2 values (1 [Class A] or 0 [Class B]). This is a bit different from the 1*1 moderation model, given the L2 outcome Y.

When I perform linear regression at L2 by averaging X, the aggregation procedure returns school means of X in Class A that are highly correlated with the overall school means of X, resulting in collinearity issues. I have come up with the following Mplus syntax instead:
X C;

I wonder if my interpretations that
1. the coef of X represents the effect of X in Class B
2. the coef of (X+X*C) represents the effect of X in Class A
are correct? Or are they wrong because C at L2 has values different from 0 and 1?

Furthermore, if I add in a school-level (L2) moderator (Z), could I change the syntax as follows:
X C;

Thank you in advance.
 Tihomir Asparouhov posted on Wednesday, October 21, 2020 - 10:09 pm
I would recommend this model instead

XA with XB;
XA with XB@0;

You would need to organize the data file as

If classrooms A and B have different sizes - fill in with missing values.
 Kit Tse posted on Thursday, October 22, 2020 - 8:45 pm
Dear Dr. Asparouhov,

Thank you for your prompt response.
Will the strong positive correlations at the school level between XA and XB (because they are classes from the same school) remain an issue in terms of multicollinearity in this model?

Thank you again.
 Tihomir Asparouhov posted on Friday, October 23, 2020 - 6:42 am
It might. You certainly won't change the correlation by writing it in a different way. You should try a three-level model

%BETWEEN school%
%BETWEEN classroom%
X; or X on C; or X on C; X@0;

Here you can focus on Var(X) on the middle level. If it is zero that certainly will give you valuable information.
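The idea can be sketched as a three-level variance decomposition of X (assumed notation, not Mplus output: student i in classroom j in school k):

```latex
\begin{aligned}
X_{ijk} &= \mu + u_k + v_{jk} + e_{ijk},\\
\operatorname{Var}(X_{ijk}) &= \sigma^2_{\text{school}} + \sigma^2_{\text{classroom}} + \sigma^2_{\text{student}} .
\end{aligned}
```

Var(X) on the middle (%BETWEEN classroom%) level corresponds to sigma^2_classroom. Under this reading, an estimate near zero would mean that classrooms A and B within a school differ in X only by what the school part u_k already captures, so separate class-level effects would be hard to distinguish from the school-level effect. With X on C on the middle level, sigma^2_classroom becomes the residual classroom variance after removing the class A vs. B mean difference.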