

I would like to estimate a model to predict a group-level outcome that is measured at three time points. The main predictor is "team climate", which is measured at the individual level but aggregated to a group-level variable. Furthermore, I have some control variables at the individual level (e.g., sex, age). The basic idea is to create a multilevel model that accounts for (1) individual variance both in the measurement of the team climate variable and in the prediction of the team-level outcome, and (2) the variability of the outcome across time. How can I specify such a model in Mplus? Your help is much appreciated.


(1) Here is one way to think about it. You may compare your case with the UG ex 9.1 figure on page 239. For the Within level (individuals) it sounds like you have individuals' team climate ratings as y, and control variables as x's. For Between (group) you have the y circle as a random intercept which varies across groups. That is your aggregate team climate, expressed as a latent variable. On Between it sounds like you don't have any w or xm variables, so you can just say y; (2) Here the question is if you want to study growth or if time is just a nuisance and you simply want to take into account correlation across time. Multilevel growth models are shown in UG ex 9.12 and on. 

Murphy T. posted on Wednesday, March 02, 2011  1:50 am



Thanks for your answer. I have some follow up questions. To specify: (1) I have team performance as the dependent variable (measured at the team level only) and I want to regress it on team climate (team level) and control variables (individual level). Can I specify team performance (measured at team level) as the dependent variable on both within and between? Or do I have to specify team performance on between only and some other dependent variable on within? (2) I just want to take it into account and not study growth. How can I specify this? Thanks very much from a new Mplus user. 


(1) You say Between = teamperf; in the VARIABLE command, and in the MODEL command:
%Within%
y on x1 x2; ! x1 x2 are control variables and y refers to team climate
%Between%
teamperf on y; ! y is the between part of team climate (the random intercept)
Is only the group-level outcome team performance measured 3 times, or are the other variables also measured 3 times? To learn quicker, you may want to consider attending our multilevel course that we give end of March at Johns Hopkins.

Murphy T. posted on Friday, March 04, 2011  9:50 am



Thank you very much. (1) Do I understand correctly that team climate has to be the individual-level team climate variable rather than the (team-level) aggregated scores? (2) Only the group-level outcome team performance is measured at 3 times; the other variables are measured at one time.


1. If you have individual-level control variables x, then using the individual-level team climate in the way shown seems best. 2. Then you can handle that simply by saying %Between% teamperf1-teamperf3 on y; That is, you have 3 between-level team performance variables as 3 columns in your data.
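Putting the two replies together, the full input might look roughly like the sketch below. The variable names (team, x1, x2, y, teamperf1-teamperf3) are placeholders following the replies above, not the poster's actual data, and the explicit WITH statements are an assumption added here to make the "correlation across time" visible rather than rely on defaults.

```mplus
VARIABLE:
  NAMES = team x1 x2 y teamperf1 teamperf2 teamperf3;  ! hypothetical column order
  CLUSTER = team;
  BETWEEN = teamperf1-teamperf3;   ! outcomes measured at the team level only
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  y ON x1 x2;                      ! individual climate ratings on the controls
  %BETWEEN%
  teamperf1-teamperf3 ON y;        ! y here is the latent between part of climate
  teamperf1 WITH teamperf2 teamperf3;  ! residual correlation across time
  teamperf2 WITH teamperf3;
```

Time is treated purely as a nuisance here: there is no growth structure, only three correlated team-level outcomes.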

Murphy T. posted on Wednesday, September 21, 2011  12:58 am



Thank you! I have now specified the model and it works (I decided to use only one measurement point for theoretical reasons, however). Now I tried to specify an interaction between two latent variables at the between level. Both are individual-level variables that reflect team-level constructs. I used the XWITH command but got the error message: "The XWITH option is not available for observed variable interactions. Use the DEFINE command to create an interaction variable. Problem with: ZSOCC_CS | ZSOCCYN XWITH ZCS" My input was:
CLUSTER = tid;
BETWEEN = Zaewg_1;
CENTERING = GRANDMEAN (ZCS ZAR ZEM ZMP Zsex Zage Zsoccyn Zaewg_1);
Analysis: Type = twolevel RANDOM;
ALGORITHM = INTEGRATION;
MODEL:
%WITHIN%
ZCS ZAR ZEM ZMP Zsoccyn on Zsex Zage;
%BETWEEN%
Zaewg_1 on ZCS ZAR ZEM ZMP Zsoccyn;
Zsocc_CS | Zsoccyn XWITH ZCS;
Zaewg_1 on Zsocc_CS;
Where "ZCS", "ZAR", "ZEM", "ZMP", "Zsoccyn" are the team climate variables; "Zsex" and "Zage" are individual-level controls and "Zaewg_1" is the team-level outcome. It would be great if you could help me. Thank you very much.


You can put a factor behind each of them on between, for example, f1 BY Zsoccyn; Zsoccyn@0; and use the factors in XWITH. 
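As a rough illustration of that suggestion, the between part of the model could be rewritten as below. This is only a sketch: f1 and f2 are hypothetical factor names, only two of the five climate variables are shown for brevity, and the rest of the input is assumed to stay as posted.

```mplus
ANALYSIS:
  TYPE = TWOLEVEL RANDOM;
  ALGORITHM = INTEGRATION;
MODEL:
  %WITHIN%
  ZCS Zsoccyn ON Zsex Zage;
  %BETWEEN%
  f1 BY Zsoccyn;  Zsoccyn@0;       ! factor behind the between part of Zsoccyn
  f2 BY ZCS;      ZCS@0;           ! factor behind the between part of ZCS
  Zsocc_CS | f1 XWITH f2;          ! latent interaction between the two factors
  Zaewg_1 ON f1 f2 Zsocc_CS;
```

Fixing the between-level residual variances at zero makes each factor identical to the between part of its indicator, so XWITH sees latent variables instead of observed ones.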


I have a dataset of days clustered within people. My indirect model is all at a within level (all day level variables). I want to control for a between (level 2) variable. Since the analysis is Type=Twolevel, I have the MODEL: %Within% followed by the model relationships. How do I specify the controls? It seems that since the outcomes are at L1 and the controls are at L2, it will not allow me to regress on one the other in either a %between% or %within% statement. Thank you! 


I assume that your day-level variables have variation across level-2 units. If so, their between-level parts, their random intercepts, can be related to the control variable. That's how variables can relate across levels.
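A minimal sketch of that idea, assuming a day-level predictor x, mediator m, and outcome y, a person-level control w, and a cluster variable named person (all names are hypothetical):

```mplus
VARIABLE:
  CLUSTER = person;
  WITHIN = x;                ! assumption: x is used on the day level only
  BETWEEN = w;               ! person-level control
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  m ON x (a);
  y ON m (b)
       x;
  %BETWEEN%
  m y ON w;                  ! random intercepts of m and y regressed on w
  m WITH y;
MODEL CONSTRAINT:
  NEW(ind);
  ind = a*b;                 ! within-level indirect effect
```

The control never appears in %WITHIN%; it only explains between-person differences in the random intercepts of the day-level variables.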


Hi Bengt/Linda, I have a dataset of individuals nested in teams. Some individuals, however, are members of several teams (e.g. 5 teams). Furthermore, my outcome variable is measured at the team level, while all predictors are measured at the individual level. How would I construct a model incorporating the fact that the outcome variable is measured on the group level and the predictors on the individual level, while also taking into account that some individuals are members of multiple teams? I've not seen an example in the literature on the combination of these two issues. Your help is greatly appreciated. 


You may want to take a look at the multiple membership literature: http://www.bristol.ac.uk/cmm/team/hg/xcmmrev2.pdf and perhaps also the cross-classified literature: Gonzalez, De Boeck, Tuerlinckx (2008). A Double-Structure Structural Equation Model for Three-Mode Data. Psychological Methods, 337-353.


Hello, we want to analyse multilevel data (individuals nested in teams) with a level 2 outcome (e.g. leaders' satisfaction) and a level 2 moderator (e.g. a leader's trait). The independent variable is on level 1. This is our syntax. We are not sure if this is correct. Any corrections or hints are welcome! Does the (cross-level) interaction have to be defined as a between variable?
usevar = Leader_A Member_A Leader_J IactA;
CLUSTER IS TEAM_3;
BETWEEN ARE Leader_A Leader_J;
DEFINE: IactA=Leader_A*Member_A;
center Leader_A (grandmean) Member_A (groupmean);
ANALYSIS: TYPE IS TWOLEVEL RANDOM;
MODEL:
%BETWEEN%
Leader_J on Leader_A Member_A IactA;


So you intend "Member_A" to be the latent between-level part of the Member_A variable. Read about that under Part 2 of the UG ex 9.1 on page 262. You should drop RANDOM in the Analysis command since you have only a random intercept/mean.


I am interested in analyzing data consisting of repeated measures in clusters (schools) but with different individuals (students) at each time point. The objective is to analyze whether a certain intervention had an effect on the smoking prevalence in these schools at two time points after the baseline. Everything is measured at the individual level, but I'm using some of the measures as aggregated means at the school level, to serve as indicators of the school tobacco control policies. For me, measuring change over time is important, so could you advise how to analyze that in Mplus with this kind of data? I would prefer using a binary outcome variable (daily smoker/other).


So are you saying that you want a binary growth model for 3 time points where the repeated outcome is an aggregate over students in the schools? Is the unit of analysis school? How many schools do you have? 


Yes, that is my basic objective and the unit of analysis is school. However, I'm also interested in whether it is possible to use an individual outcome here. I have altogether 339 schools with data from all three time points. There are altogether 108599 students in the data, but as I mentioned, each student has data from only one time point. The variables of interest are gender, age, parental smoking, general attitudes towards smoking (these I would like to keep on the individual level), school type and four variables related to school tobacco control policies (aggregated to the school level). The studied intervention relates to legislation, so there is no specific intervention variable in the data; the time perspective is important for that. Then there is the outcome for current student smoking, which could be used on the individual level or aggregated to a school mean. If I wanted to study possible moderation effects (e.g. of some school-level policy), what would be a suitable model to test that in this setting? I very much appreciate your help!

Nina Wirtz posted on Wednesday, March 11, 2015  3:57 am



Dear Bengt, I am currently trying to model a cross-level interaction with a level 1 predictor (x), a level 2 moderator (z) and a level 2 outcome (y). k is a level 2 control variable. (See syntax below.) 1. By not defining x as a WITHIN variable, I am looking at the latent between-level part of x on level 2. However, as I am only using x on level 2, I am actually forced to do so. If I define x as a WITHIN variable I get an error message. Is there any way around this or is the latent approach in this case (automatically) the preferable one? 2. Is the interaction term defined correctly? I've also tried the XWITH command, but that did not work. 3. Is the interaction term created with the grand-mean-centered variables or with the raw scores? Thank you very much for your help, I greatly appreciate it! Nina
Syntax:
usevar = x z k y Iact;
MISSING = All(999);
CLUSTER IS team;
BETWEEN ARE y z k;
DEFINE: center x z k (grandmean);
Iact= x*z;
ANALYSIS: TYPE IS TWOLEVEL;
ESTIMATOR = ML;
MODEL:
%WITHIN%
%BETWEEN%
x with k;
y on k z x Iact;


1. If you are not interested in level-1 relationships, why don't you simply create a cluster-level version of x using Cluster_mean? Thereby you can do a single-level analysis. 2. The interaction definition is fine, but apply it to the cluster mean of x. 3. Grand-mean centering is done first.

Nina Wirtz posted on Thursday, March 12, 2015  2:17 am



Thank you for the helpful response Bengt! Regarding 1: I have a formative construct on the within level (team members' health). I thought that I would avoid loss of information and get a more accurate estimation by using MLM (in reference to your 2008 paper with Lüdtke et al. on the MLC approach and some recent work by Croon, van Veldhoven, Peccei, & Wood on bathtub models with L2 outcomes). This way, the variance on the within variable as well as the dependence of observations among teams is taken into account, isn't it? In your opinion, does the multilevel structure make sense in my case? I highly appreciate your feedback. Thanks. Nina 


If you have a model for Within, I would include it, but not if it is just one variable, unless you are really keen on getting that latent variable decomposition (desirable with small cluster sizes).


I would like to estimate a model to predict a group-level outcome (y). I have a level 1 predictor (x) and several level 2 predictors (z1 z2 z3). HVID is the level 2 cluster variable. Can you please verify that this is the correct syntax?
Variable: NAMES ARE = HVID z1 z2 z3 x y;
MISSING = All(99, 88);
CLUSTER = HVID;
BETWEEN ARE z1 z2 z3 y;
DEFINE:
ANALYSIS: TYPE IS TWOLEVEL;
ESTIMATOR = ML;
MODEL:
%WITHIN%
x;
%BETWEEN%
y on z1 z2 z3 x;
Thanks, Allison


That looks right. The x variable on Between is the latent between part of x which is what you want. 

Rick Vogel posted on Wednesday, June 29, 2016  2:13 pm



Dear all, I have exactly the same data structure as in Allison's example above, with the exception that my group-level outcome y is categorical. When running the model, the error message is "Unrestricted x-variables for analysis with TYPE=TWOLEVEL and ALGORITHM=INTEGRATION must be specified as either a WITHIN or BETWEEN variable. The following variable cannot exist on both levels: x". What are my options for solving this problem? 1) Is it correct to include x on the within level and the cluster mean of x on the between level? 2) What would a latent variable approach look like instead? 3) What else could I do? Many thanks in advance, Rick


Q1 Yes. Q2. Create a factor measured by x on both levels. 

Rick Vogel posted on Thursday, June 30, 2016  12:18 am



Thanks for the response. Just a follow-up question with regard to Q2: Is it correct that an equivalent solution would be to keep x on the within level and to create the factor only on the group level, as follows:
MODEL:
%WITHIN%
x;
%BETWEEN%
f by x;
y on z1 z2 z3 f;


I think so. 
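For comparison, the observed cluster-mean route from Q1 might be sketched as follows. The name xm is hypothetical, the other names follow Allison's example, and the estimator is left at the program default; treat this as an outline under those assumptions rather than a checked input.

```mplus
VARIABLE:
  NAMES = HVID z1 z2 z3 x y;
  USEVARIABLES = z1 z2 z3 x y xm;  ! variables created in DEFINE go last
  CATEGORICAL = y;
  CLUSTER = HVID;
  WITHIN = x;                      ! x restricted to the within level
  BETWEEN = z1 z2 z3 y xm;
DEFINE:
  xm = CLUSTER_MEAN(x);            ! observed cluster mean of x
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  x;
  %BETWEEN%
  y ON z1 z2 z3 xm;
```

Unlike the factor version, xm is an observed mean and so carries sampling error when clusters are small.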


I am trying to conduct a multilevel path/mediation analysis with a categorical predictor (a high vs. low feedback intervention), an individual-level mediator (emotion) and a group-level outcome (electricity consumption in shared apartments). The outcome is clustered within apartments (77 clusters). I have the following input (I've tried various others), which is not currently converging, and I'm looking for advice:
!level1 variables
m=Emo2_c; !emotions (disgust and empathy)
!level2 dv
y=elect; !electricity use
z=Feed; !feedback
VARIABLE: NAMES ARE Mot Feed Feed2 apt_case elect water hotwater Intrins Emo2 Emo2_c;
USEVARIABLES ARE y m z;
BETWEEN ARE y z;
CLUSTER IS apt_case;
MISSING ARE ALL (99);
ANALYSIS: TYPE IS TWOLEVEL;
MODEL:
%WITHIN%
m;
%BETWEEN%
y ON m z;
m ON z;
MODEL INDIRECT: y IND m z;


Please send the output and your license number to support@statmodel.com. 

Dayna Walker posted on Wednesday, February 15, 2017  12:08 pm



Dear Drs. Muthen, I have the same model as Allison above. Thank you for confirming this is the right syntax! My questions are about interpretation: 1) Should I interpret model fit statistics (e.g., CFI, TLI, RMSEA, SRMR) before interpreting significance of individual predictors, as with other model estimation techniques? 2) What does it mean if the p value of my level 1 predictor is different in the standardized (STDYX) output than in the nonstandardized output (p = .046 vs p = .05)? Also, is STDYX the correct section of standardized output I should be using for interpretation (vs. STDY or STD)? I have both continuous and binary predictors in the model. The latent, level 1 predictor is continuous. Many thanks for your guidance. 


1) Yes. The model must be a good representation of the data before it is interpreted. 2) See our FAQ: "Standardized coefficient can have different significance than unstandardized". STDYX is used for continuous predictors and STDY for binary ones. Both of these questions are discussed in our new book.


Dear Dr. Muthen, Please allow me to follow up on this enlightening topic. A colleague told me that a Generalized Estimating Equation (GEE) approach can model marginal distributions, i.e., modeling group-level dependent variables as a function of both group-level independent variables and individual-level independent variables. What is your opinion on using a GEE model, and is there a way to run GEE models in Mplus? Another question concerns the within-group variable x. You mentioned that when x is modeled on both the within-group level and the between-group level, x is decomposed as a latent variable at the between-group level, which I have difficulty following. How is this latent variable constructed, matrix-wise? From my understanding, x will be running on a completely different matrix than the other group-level variables. Let me rephrase my question: How would you describe the model in an equation with x as both a within-group observed variable and a between-group latent variable? I would really appreciate it if you could point me to some relevant references that you approve of. Thank you so much!


I won't give an opinion on GEE, but hierarchical data are typically analyzed using multilevel models. For a paper related to GEE, see the paper on our website: Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished technical report. See also its reference to the Melton-Liang paper. Regarding the latent variable decomposition of x, see e.g. Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229.
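As a rough sketch of the equations asked for, using the variable names from Allison's example (individual i in cluster j; the Greek symbols and the b/w subscripts are illustrative notation chosen here, not taken from the cited papers):

```latex
\begin{align}
  x_{ij} &= x_{b,j} + x_{w,ij}
    && \text{latent between part plus within deviation} \\
  y_j &= \beta_0 + \beta_1 z_{1j} + \beta_2 z_{2j} + \beta_3 z_{3j}
         + \beta_4 x_{b,j} + \varepsilon_j
    && \text{between-level regression} \\
  \bar{x}_j &= x_{b,j} + \frac{1}{n_j}\sum_{i=1}^{n_j} x_{w,ij}
    && \text{observed cluster mean as an error-prone measure of } x_{b,j}
\end{align}
```

With between variance \(\tau^2\) and within variance \(\sigma^2\), the observed mean has reliability \(\tau^2 / (\tau^2 + \sigma^2 / n_j)\), which is why the latent decomposition matters most when cluster sizes \(n_j\) are small.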


Thank you, Dr. Muthen! 


Dear Dr. Muthen, I want to see whether the effect of an L3 predictor (classroom atmosphere) on an L3 outcome (students' attitudes) varies over time (decreases or increases as time goes on). What should my syntax look like to test this? I tried to interact the L3 predictor with the time variable but I got an error message.
between = (class)X; within = time inter;
DEFINE: inter = time*X;
MODEL:
%within%
Y; Y ON time;
%BETWEEN id%
Y;
%BETWEEN class%
Y; Y ON X; Y ON inter;
Error: One or more between-level variables have variation within a cluster for one or more CLASS clusters.
1. Should I introduce the interaction term at L1? 2. Or should I specify a random slope for the effect of time on the intercept at L1 and regress it on X at L3?


Say on Within: s | y on time; where you specify time as a within variable. Then say on Level 3: y on s;


Dear Dr. Muthen, Thank you for a prompt response. To my understanding, "Y ON S" will test whether the intercepts "Y" between classrooms vary (become more or less similar) as time goes. I wanted to test whether this effect is due to L3 predictor "X" (classroom atmosphere). In other words, whether the effect of "X" on "Y" at L3 grows stronger or weaker over time. "Y on S" does not seem to test this, or am I missing something? Thank you so much for your guidance. 


Sorry, I meant to say s on x.


Thank you for clarifying. Should I keep the direct effect "Y on X" in as well? 


Yes, X could be expected to influence both the intercept ("Y" on L3) and the slope ("s"). 
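Collecting the corrected advice, the three-level input might look roughly like this. It is a sketch using the names from the post (Y, time, X, id, class); the variance statements on the id level and the Y WITH s covariance are assumptions added here, not part of the replies.

```mplus
VARIABLE:
  CLUSTER = class id;          ! level-3 identifier first, then level-2
  WITHIN = time;
  BETWEEN = (class) X;         ! X varies only across classrooms
ANALYSIS:
  TYPE = THREELEVEL RANDOM;
MODEL:
  %WITHIN%
  s | Y ON time;               ! random slope of Y on time
  %BETWEEN id%
  Y; s;                        ! student-level variation in intercept and slope
  %BETWEEN class%
  Y ON X;                      ! X predicts the classroom-level intercept
  s ON X;                      ! X predicts the time slope (the moderation test)
  Y WITH s;
```

The coefficient of s ON X plays the role of the time-by-X interaction, so no product term needs to be built in DEFINE.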

lopisok posted on Friday, June 19, 2020  6:51 am



Dear Bengt/Linda, We want to do a similar analysis as Allison above. We have a group-level outcome and are only interested in the effects of level 1 predictors. Syntax:
NAMES ARE = HVID x1 x2 x3 y;
MISSING = All(99, 88);
CLUSTER = HVID;
BETWEEN ARE y;
DEFINE:
ANALYSIS: TYPE IS TWOLEVEL;
ESTIMATOR = ML;
MODEL:
%WITHIN%
x1 x2 x3;
%BETWEEN%
y on x1 x2 x3;
The syntax above works but I have two questions: 1) I'm wondering now what is happening exactly. Is Mplus automatically making aggregated means of x1 x2 x3, or is this an adjusted aggregated mean of some sort? 2) In the literature on micro-macro analysis, as in Croon & van Veldhoven (2007) and Foster-Johnson & Kromrey (2018), a latent variable model is created where latent variables of x1 x2 x3 are introduced at level 2. See their Mplus syntax on https://osf.io/z745e/?view_only=133543b6151a4ccbbde895839ceef378 But it's not clear to me why this is necessary. With their syntax I run into convergence problems; with the syntax above everything runs fine. Kind regards


Mplus gives you a latent variable decomposition of the x's so that only the latent between-level part of each x is used on Between. You can read more about it in this paper on our website: Asparouhov, T. & Muthén, B. (2019). Latent variable centering of predictors and mediators in multilevel and time-series models. Structural Equation Modeling: A Multidisciplinary Journal, 26, 119-142. DOI: 10.1080/10705511.2018.1511375

Kit Tse posted on Tuesday, October 20, 2020  10:22 pm



Dear Drs Muthen, I have a school (L2) - student (L1) data structure, and I want to examine the effect of students' scores in Class A (L1 predictor X) on a school outcome (L2 outcome Y), controlling for students' average scores in the school. Class is an L1 moderator (C) with 2 values (1 [Class A] or 0 [Class B]). This is a bit different from the 1*1 moderation model in http://www.quantpsy.org/pubs/preacher_zhang_zyphur_2016_(code.appendix).pdf given the L2 outcome Y. When I perform linear regression at L2 by averaging X, the aggregation procedure returns school means of X in Class A that are highly correlated with the overall school means of X, resulting in collinearity issues. I have come up with the Mplus syntax instead:
%BETWEEN%
Y ON X C X*C;
%WITHIN%
X C;
I wonder if my interpretations that 1. the coefficient of X represents the effect of X in Class B and 2. the coefficient of (X+X*C) represents the effect of X in Class A are correct? Or are they wrong because C at L2 has values different from 0 and 1? Furthermore, if I add in a school-level (L2) moderator (Z), could I change the syntax as follows:
%BETWEEN%
Y ON X C Z X*C X*Z C*Z X*C*Z;
%WITHIN%
X C;
Thank you in advance.


I would recommend this model instead:
%BETWEEN%
Y ON XA XB; XA with XB;
%WITHIN%
XA XB; XA with XB@0;
You would need to organize the data file as Y XA XB. If classrooms A and B have different sizes, fill in with missing values.

Kit Tse posted on Thursday, October 22, 2020  8:45 pm



Dear Dr. Asparouhov, Thank you for your prompt response. Will the strong positive correlations at the school level between XA and XB (because they are classes from the same school) remain an issue in terms of multicollinearity in this model? Thank you again. 


It might. You certainly won't change the correlation by writing it in a different way. You should try a three-level model:
%BETWEEN school%
Y ON X;
%BETWEEN classroom%
X; or X on C; or X on C; X@0;
%WITHIN%
X;
Here you can focus on Var(X) on the middle level. If it is zero that certainly will give you valuable information.
