3-level analysis by HLM and Mplus
 bmuthen posted on Tuesday, March 26, 2002 - 10:39 am
A frequent question is how 3-level modeling of growth of students within schools compares to HLM-type analysis. Here are some answers.

Actually, these analyses are one and the same. The 3-level HLM formulation describes across-time variation at level 1 involving random intercept and slope coefficients, level 2 studies the across-student variation of those coefficients, while level 3 studies their across-school variation. In the latent variable context, level 1 and level 2 are combined into the "Within" part of the model, i.e. the part describing variation across students. The "Between" part describes across-school variation and corresponds to level 3 of HLM. In this way, the 3-level HLM model is turned into a 2-level latent variable model. The fact that level 1 and level 2 are both considered in the Within part of the latent variable model is due to viewing the level 1 across-time variation as a multivariate observation vector rather than as a univariate repeated observation. Level 1 of the Within part is the latent variable measurement model part and level 2 of the Within part is the structural part.
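For reference, the three levels described above can be written out as follows. This is only a sketch in conventional HLM-style notation, assuming linear growth with no covariates; the symbols (time score $a_t$, occasion $t$, student $i$, school $j$) are assumptions of this illustration and do not come from the post.

```latex
\begin{align*}
\text{Level 1 (time):}\quad    & y_{tij} = \pi_{0ij} + \pi_{1ij}\, a_{t} + e_{tij} \\
\text{Level 2 (student):}\quad & \pi_{0ij} = \beta_{00j} + r_{0ij}, \qquad
                                 \pi_{1ij} = \beta_{10j} + r_{1ij} \\
\text{Level 3 (school):}\quad  & \beta_{00j} = \gamma_{000} + u_{00j}, \qquad
                                 \beta_{10j} = \gamma_{100} + u_{10j}
\end{align*}
```

In the latent variable formulation, the first two lines together form the Within part, with the vector $(y_{1ij}, \dots, y_{Tij})$ as indicators of the growth factors $\pi_{0ij}$ and $\pi_{1ij}$, and the third line is the Between part.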

So in summary, Mplus estimates a random coefficient model, and the intercept and growth coefficients in the Between part represent variation across clusters in the coefficients defined for Within (students), just as in HLM analysis. If you use the ML estimator, you get the same estimates.
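A minimal sketch of what such a two-level growth input might look like, assuming four repeated measures stored in wide format (one row per student). The file name and the variable names y1-y4 and school are hypothetical, and the time scores 0-3 are an assumption.

```mplus
TITLE:    Sketch: growth of students within schools
DATA:     FILE = growth.dat;        ! hypothetical file
VARIABLE: NAMES = y1-y4 school;     ! hypothetical names
          CLUSTER = school;
ANALYSIS: TYPE = TWOLEVEL;
          ESTIMATOR = ML;
MODEL:    %WITHIN%
          ! HLM levels 1 and 2: student growth factors
          iw sw | y1@0 y2@1 y3@2 y4@3;
          y1-y4 (1);   ! equal residual variances over time
          %BETWEEN%
          ! HLM level 3: school variation in the factors
          ib sb | y1@0 y2@1 y3@2 y4@3;
          y1-y4@0;     ! no extra between-level residuals
```

The equality constraint on the within-level residual variances mirrors the usual HLM assumption of a single level-1 variance; dropping it gives a more general model than the standard HLM one.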
 aboabdulmalik posted on Thursday, September 29, 2005 - 10:05 pm
I am examining the mediational effect of COLL on the relationship between CL and TS. I have the predictor CL at the school level, the mediator COLL at the school level, and the outcome variable, TS, at the individual level. Thus, it is 2 > 2 > 1 (i.e., CL > COLL > TS, with a direct path from the predictor to the outcome as well).

I am only familiar with the HLM software, through my independent reading. I wanted to examine the mediational effect, but I was not able to regress the mediator on the predictor. Regressing the mediator on the predictor is an essential equation in mediational models. Can I estimate this equation using OLS regression in SPSS? The rationale is that the mediator and the predictor are both at the school level. For the other equations, I am using HLM to regress the outcome on the predictor and the outcome on the mediator.
If I use SPSS regression to regress the mediator on the predictor, how comparable are the coefficients produced by SPSS and HLM? Is any special implementation required when running the regression? Or is there a function within the HLM software for running a mediational model? I know that Mplus can handle mediational modeling, but with my limited time and limited statistics background it is not an option for me. The best I can do is use a simple 2-level HLM.
I appreciate your help.

Another question concerns checking assumptions before using HLM. I checked my data for outliers, normality, and linearity in SPSS, following Tabachnick and Fidell's (2001) discussion of regression assumptions. Are there any checks that are specific to HLM, and how are they run? Can you refer me to any published work on this?
Thank you
 bmuthen posted on Friday, September 30, 2005 - 9:15 am
I think the easiest way to get the correct estimates and particularly the correct standard errors is to use Mplus for this modeling. That would make the analysis very straightforward.
 Marc Reis posted on Monday, November 21, 2005 - 12:36 pm

I would like to use Mplus to estimate the effect of the group-level variable organizational climate on an individual-level outcome. Organizational climate is measured with multiple individual-level (continuous) indicators, so there are two sources of error variance for the aggregated organizational climate score for each organizational unit: the variation among the items (due to the fact that they are not perfectly reliable) and among the members of each organizational unit (assuming that there is a "true score" for each group). Since three-level modeling is not yet available, are there any other ways to estimate the model?

Many thanks for any suggestions!
 Linda K. Muthen posted on Monday, November 21, 2005 - 6:39 pm
Multiple indicators do not count as a level in Mplus because it takes a multivariate approach to multilevel modeling. In Mplus this would be a two-level model if I am understanding you correctly.
 Marc Reis posted on Tuesday, November 22, 2005 - 2:46 am
My problem is that organizational climate is defined as a latent group-level-variable that influences a latent individual-level variable. The idea was to specify the following model:

DV_within BY i1-i6;

DV_between BY i1-i6;

I am unsure how Mplus treats i7-i12 in this case. Would ORG_CLIMATE be a "correct measure" of the latent group-level variable (e.g., would it be reasonable to save a factor score for each group)?
 Linda K. Muthen posted on Tuesday, November 22, 2005 - 7:30 am
You would specify the model as you have done above. You would also have BETWEEN = i7-i12; in the VARIABLE command.
 Marc Reis posted on Tuesday, November 22, 2005 - 1:15 pm
I tried to specify BETWEEN = i7-i12; in the VARIABLE command, but Mplus doesn't allow group-level variables to have within-group variation. Note that i7-i12 are individual-level indicators that I would like to aggregate to the group level. So maybe there's another way, or I made a mistake.
 bmuthen posted on Tuesday, November 22, 2005 - 4:36 pm
If i7-i12 are scores on individuals, then you don't put those variables on the Between = list. And, you might want to use a within-level factor, say


wORG by i7-i12;


bORG by i7-i12;

This assumes that you are interested in the within structure of these variables as well as the between structure (and it may not be the same).
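A minimal sketch of a full input along these lines, combining the factors for i1-i6 and i7-i12 on both levels. The file name, the cluster variable name unit, and the regressions of the outcome factor on the climate factor are assumptions added for illustration.

```mplus
TITLE:    Sketch: within and between factors
DATA:     FILE = climate.dat;       ! hypothetical file
VARIABLE: NAMES = i1-i12 unit;      ! unit = hypothetical cluster ID
          CLUSTER = unit;
          ! i1-i12 are on neither the WITHIN nor the BETWEEN
          ! list, so they vary on both levels
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          wDV  BY i1-i6;
          wORG BY i7-i12;
          wDV ON wORG;              ! individual-level relation
          %BETWEEN%
          bDV  BY i1-i6;
          bORG BY i7-i12;
          bDV ON bORG;              ! group-level climate effect
```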

Another alternative is to simply aggregate each variable to the between level, i.e. creating cluster means, and then treat these as Between = variables with only between-level variation - and then specify the between-level factor model you have.
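A sketch of this second alternative, assuming a version of Mplus whose DEFINE command provides the CLUSTER_MEAN function (otherwise the means would have to be computed outside Mplus and read in as data). The new variable names m7-m12, the file name, and the cluster variable unit are hypothetical.

```mplus
TITLE:    Sketch: cluster means as between-level indicators
DATA:     FILE = climate.dat;       ! hypothetical file
VARIABLE: NAMES = i1-i12 unit;
          ! variables created in DEFINE go last on the list
          USEVARIABLES = i1-i6 m7 m8 m9 m10 m11 m12;
          BETWEEN = m7 m8 m9 m10 m11 m12;
          CLUSTER = unit;
DEFINE:   m7  = CLUSTER_MEAN(i7);
          m8  = CLUSTER_MEAN(i8);
          m9  = CLUSTER_MEAN(i9);
          m10 = CLUSTER_MEAN(i10);
          m11 = CLUSTER_MEAN(i11);
          m12 = CLUSTER_MEAN(i12);
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          wDV BY i1-i6;
          %BETWEEN%
          bDV  BY i1-i6;
          bORG BY m7 m8 m9 m10 m11 m12;
          bDV ON bORG;
```

Unlike the within/between factor specification, this treats the observed cluster means as if they were measured without sampling error from the group members.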
 Anonymous posted on Wednesday, November 23, 2005 - 1:39 am

I have an idea about the above discussion; maybe this helps. In general there might be two reasons to specify a model with latent variables: the indicators are not perfect measures (--> error variance) and they usually do not measure the factor to the same extent (--> different factor loadings). So requesting factor scores actually means weighting the indicators, doesn't it?

The question is whether it is necessary to weight the members of a group to obtain a better estimate of the group-level variable. Assuming that there is no special sampling procedure, I don't see a rationale for weighting one group member more than others. So from this point of view, it would be reasonable to simply compute the cluster mean as Bengt suggested, maybe based on the individual-level factor scores. But I am unsure whether to compute the factor scores based on the original covariance matrix or the disaggregated within covariance matrix.

Maybe someone would like to comment...
 bmuthen posted on Wednesday, November 23, 2005 - 6:37 pm
Comments are invited regarding this. Concerning whether to compute factor scores based on the original cov matrix or within cov matrix, I would say that if factor scores are needed for a multilevel setting, you are better off getting factor scores from a multilevel factor analysis model.
 chantanee posted on Thursday, August 10, 2006 - 2:19 am
The purpose of my study was to find out the relationships among multilevel variables (student variables, classroom variables, and school variables) that affect the science learning achievement of Thai upper secondary school students. The study had 3 sub-objectives: (1) to identify student variables that directly affect science learning achievement; (2) to identify the direct influences of classroom variables, and the cross-level interactions between classroom variables and student variables, on student science learning achievement; and (3) to identify the direct influences of school variables, and the cross-level interactions between school variables and classroom variables or between school variables and student variables, on student science learning achievement.
The study used multi-stage random sampling. The sample consisted of 3 groups: (1) 132 school administrators (principals, assistant principals, and heads of science departments) from 44 public upper secondary schools in Thailand, (2) 132 science teachers who taught Chemistry, Biology, and Physics, from 88 sampled classrooms (2 classrooms per school), and (3) 2,488 Grade 11 science students.
If possible, could you kindly give me some advice on these questions?
1. Are the results worth reporting?
2. Are two classrooms per school enough to give adequate power for HLM?
 Bengt O. Muthen posted on Thursday, August 10, 2006 - 6:36 pm
This sounds like multilevel modeling would work well - you have enough schools and classrooms. 2 classrooms per school is a bare minimum which does not allow you to study many classroom variables. Mplus does not currently handle this 3-level model.
 Jeff Cookston posted on Thursday, October 19, 2006 - 2:23 pm
We have a dataset that includes three waves of data collected on a number of family context constructs. LGM has been our preferred method of analysis, but we're now planning to collect 2 days of cortisol samples from our participants with multiple cortisol samples each day. Most prior studies that use cortisol data tend to model the daily hormone patterns using HLM. I'm confused how we can use our LGMs of the family context constructs to predict the HLM based hormone patterns. Could the cortisol data be modeled in LGM and then used as a dual process predicted by family context?
 Linda K. Muthen posted on Thursday, October 19, 2006 - 4:27 pm
Are you referring to HLM the program or hierarchical linear modeling in general?
 Jeff Cookston posted on Thursday, October 19, 2006 - 8:33 pm
Sorry, hierarchical linear modeling in general.
 Linda K. Muthen posted on Friday, October 20, 2006 - 8:02 am
The SEM and HLM growth models differ in two basic ways. One is the treatment of time scores. In SEM, they are treated as parameters in the model. In HLM, they are treated as data. The second is the treatment of time-varying covariates. The regression coefficients are fixed in SEM and random in HLM. Mplus can have time scores as parameters or data and can have fixed or random coefficients for time-varying covariates. So I think you should be okay.
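To make the contrast concrete, here is a minimal single-level sketch of the HLM-style setup in Mplus, with time scores read as data and a random coefficient for a time-varying covariate. All names (y1-y4, a1-a4, x1-x4) and the file name are hypothetical; the SEM-style alternative is shown only in comments.

```mplus
TITLE:    Sketch: time scores as data, random TVC slope
DATA:     FILE = wide.dat;          ! hypothetical file
VARIABLE: NAMES = y1-y4 a1-a4 x1-x4;
          TSCORES = a1-a4;          ! individual time scores (data)
ANALYSIS: TYPE = RANDOM;
MODEL:    i s | y1-y4 AT a1-a4;
          ! one random slope for the time-varying covariate
          st | y1 ON x1;
          st | y2 ON x2;
          st | y3 ON x3;
          st | y4 ON x4;
! SEM-style alternative (time scores as parameters,
! fixed coefficients for the covariate):
!         i s | y1@0 y2@1 y3*2 y4*3;
!         y1 ON x1;  y2 ON x2;  y3 ON x3;  y4 ON x4;
```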
 Frank Gallo posted on Sunday, August 09, 2009 - 8:02 pm
Dear Dr. Muthen

I am a beginner with Mplus. I am using Mplus Version 5.21. I have stratified data: police arrests (n =3,300) within police departments (n = 16) that serve community population levels (n = 4). The DV police force is continuous. I have a mixture (nominal, ordinal, ratio) of 21 covariates at level 1 and none at levels 2 and 3. Community levels are fixed effects. Would the multilevel modeling features of Mplus handle these data? Thank you.

Best regards,
 Linda K. Muthen posted on Monday, August 10, 2009 - 6:44 am
It sounds like you have a two-level cross-sectional model which can be estimated in Mplus. The problem I see is that you have only 16 police departments. It is usually recommended to have a minimum of 30 clusters.
 Utkun Ozdil posted on Thursday, December 16, 2010 - 11:28 am

I collected my data from three faculties of a university (Faculty of Education, Faculty of Engineering, and Faculty of Arts and Sciences). Second-, third-, and fourth-grade undergraduates from each of these faculties were involved. So, I have students nested within grade levels within faculties. This led me to analyze a three-level model.
Does MPlus handle such data analysis as mine or is the HLM program more appropriate?


 Linda K. Muthen posted on Thursday, December 16, 2010 - 2:49 pm
Mplus does not currently have three-level cross-sectional models. HLM does. Your data, however, are not suitable for multilevel modeling given that faculty and grade cannot be considered random modes.
 Jing Zhang posted on Wednesday, August 24, 2011 - 3:35 pm
Dear Dr. Muthen,
In your post dated March 26, 2002, you talked about how to deal with 3-level modeling of growth of students within schools. You said that level 1 and level 2 are combined into the "Within" part of the model by viewing the level 1 across-time variation as a multivariate observation vector rather than as a univariate repeated observation. My question is:
Does this mean that data in long format will not work, and that long-format data have to be converted to wide format?
 Linda K. Muthen posted on Wednesday, August 24, 2011 - 5:07 pm
 Andrea M Reina Tamayo posted on Monday, September 22, 2014 - 8:12 am
I am working on a three-level CFA model (unbalanced data). I used ESM to examine the variability of a continuous variable. Because the method is so intensive, I used three items to capture the construct.
Each person gave ratings on these items three times a day, for five days. Therefore, moments were nested within days, and days within people. These are the 3 levels.

I gave a unique ID to every person, and this unique ID appears in the data 15 times per person.

The example below shows one person and part of a second person. M stands for moment.

ID Day M X1 X2 X3
1234 1 1 1 6 7
1234 1 2 2 4 6
1234 1 3 3 . .
1234 2 1
1234 2 2
1234 2 3
1234 3 1
1234 3 2
1234 3 3
1234 4 1
1234 4 2
1234 4 3
1234 5 1
1234 5 2
1234 5 3
1567 1 1
1567 1 2
1567 1 3
1567 2 1

When I run my analysis MPlus gives me a warning message:

Clusters for DAY with the same IDs have been found in different clusters
for RESP_NR. These clusters are assumed to be different because clusters for
DAY are not allowed to appear in more than one cluster for RESP_NR.

What does this warning message mean? Can I trust my output with it, or is it affecting my results?
I would appreciate your help!
 Linda K. Muthen posted on Monday, September 22, 2014 - 9:39 am
Please send the full output and your license number to support@statmodel.com.
 Luo Wenshu posted on Tuesday, March 24, 2015 - 7:42 pm
Hi Dr. Muthen,

I am using MPlus to run a 2-level HLM model and have the following questions.
1) Is there a default setting for centering of predictors, grandmean or groupmean?
2) If random slopes are not statistically significant, does this mean we do not need to build a level-2 model with predictors for these random slopes and can simply treat the slopes as fixed?
3) Do we need to allow random intercepts and slopes to be correlated at Level 2?

Thank you very much.
 Bengt O. Muthen posted on Wednesday, March 25, 2015 - 7:36 am
1) The default is no centering.

2) You may still find significant influence of level-2 predictors on the random slopes.

3) Yes.
 Luo Wenshu posted on Thursday, March 26, 2015 - 1:25 am
Thank you very much Dr. Muthen,
For the correlations among random intercepts and slopes at Level 2, what is the default setting in Mplus? It seems that the correlations are fixed at zero by default.
 Bengt O. Muthen posted on Thursday, March 26, 2015 - 8:23 am
It depends on the analysis setting. You see in the output what is done in each case. If the covariance is not there, add it.
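As an illustration of the points in this exchange (no centering unless requested; add the intercept-slope covariance if it is not in the output), here is a minimal two-level random slope sketch. The variable names y, x, w, clus and the file name are hypothetical, and the DEFINE-based CENTER syntax assumes a reasonably recent Mplus version.

```mplus
TITLE:    Sketch: random slope with centering and covariance
DATA:     FILE = twolevel.dat;      ! hypothetical file
VARIABLE: NAMES = y x w clus;
          WITHIN = x;
          BETWEEN = w;
          CLUSTER = clus;
DEFINE:   CENTER x (GROUPMEAN);     ! centering must be requested
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:    %WITHIN%
          s | y ON x;               ! random slope
          %BETWEEN%
          y s ON w;                 ! level-2 predictor
          y WITH s;                 ! intercept-slope covariance
```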
 Melody Kung posted on Thursday, August 06, 2015 - 12:28 pm
Hi Drs. Muthen,

I am running a 2-level model with 2 independent, latent, between-level variables.

Using example of 9.12 in the manual as a guide, I specified "within" and "between" variable names in the VARIABLES section and then defined the latent variables and their indicators in the MODEL section, followed by the "%WITHIN%" and "%BETWEEN%" statements. The example does not include latent variables, whereas my model does.

Can I include latent variables in the %BETWEEN% statements? When I try to do so, the error message that shows up states that the two latent variables in the BETWEEN option are unknown, even though I specified the latent variables in the MODEL section.

I hope my question makes sense. Thanks for any help you can provide.
 Linda K. Muthen posted on Thursday, August 06, 2015 - 1:48 pm
Please send the output and your license number to support@statmodel.com.
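One possible cause of an "unknown variable" message of this kind is listing factor names in the BETWEEN option of the VARIABLE command, which takes observed variables only; this is a guess, since the input is not shown. A minimal sketch, with hypothetical names throughout, of between-level factors defined inside the MODEL command instead:

```mplus
TITLE:    Sketch: latent predictors on the between level
DATA:     FILE = twolevel.dat;      ! hypothetical file
VARIABLE: NAMES = y x w1-w6 clus;
          WITHIN = x;
          BETWEEN = w1-w6;          ! observed indicators only
          CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          y ON x;
          %BETWEEN%
          f1 BY w1-w3;              ! factors are defined here,
          f2 BY w4-w6;              ! not on the BETWEEN list
          y ON f1 f2;
```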
 Cynthia Yuen posted on Tuesday, February 09, 2016 - 1:56 pm

I have a longitudinal study where teens completed daily diaries for 14 days each year for 3 years. Thus, days are nested within years which are nested within individuals. I am interested in whether an individual level variable (e.g., sex) moderates certain slopes at the daily level (e.g., conflict --> distress within a day). I'm having trouble figuring out the appropriate analysis to run -- would this be a three-level model such as in example 9.20 of the version 7.6 guidebook?

Thank you!
 Bengt O. Muthen posted on Tuesday, February 09, 2016 - 6:21 pm
Having only 3 years on Level-2 is too few for a 3-level analysis. Perhaps you could do a 2-level analysis of days within subject and let year be represented by dummy covariates.
 Cynthia Yuen posted on Thursday, February 11, 2016 - 7:41 am
Thanks for the quick response! Would it be better to do something like 9.12 or 9.13 instead and model growth within a two-level model? Two of our main questions are whether the daily relations between events (e.g., conflict --> distress) change as teens age, and whether some individual-level characteristics like ethnicity predict how/whether these slopes change over time. Do you have any advice on how to appropriately model this?
 Bengt O. Muthen posted on Friday, February 12, 2016 - 5:26 pm
I don't hear that you have a growth model situation but a regression of distress on conflict - where that regression may change over year (I assume, not over the days). If so, I would do a 2-level regression where level 1 is time (the 14 days) and level 2 is subject. Year can be level-2 and can predict the DV distress and perhaps the slope by creating Year*conflict and letting that influence distress. But it is a research question which I don't have enough background in your study to really answer.
 Luo Wenshu posted on Friday, April 15, 2016 - 4:46 pm
Dear Dr. Muthen,

In a 2-level HLM analysis (student and class level), I see from the User's Guide that for level-2 predictors we may use observed (mx) or latent (x) variables. I know the observed level-2 predictor can be obtained by aggregating level-1 scores at the class level.
1) How is the latent level-2 predictor calculated?
2) Which one is preferred?
3) If I have a level-1 predictor that is a Rasch measure (i.e., a latent variable), do I still need to use a latent variable for the predictor at level 2?

Thank you very much!
 Bengt O. Muthen posted on Friday, April 15, 2016 - 5:43 pm
1-2. See the paper on our website:

Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229.

3. If the Rasch indicators vary over level-2 units, you should express a latent variable model on this level also - the indicators are then the random intercepts of each observed indicator.
 Luo Wenshu posted on Friday, April 15, 2016 - 6:11 pm
Thank you for the quick response, Dr. Muthen!

I used latent level-2 predictors in my 2-level random slopes analysis. Using the default estimator MLR, I got the following warning message. Does this mean the result is not trustworthy? Do I need to switch to the MLF estimator as suggested?


In addition, I found that the level-2 predictors had large standard errors. I suspect a multicollinearity issue at level 2, because correlations at level 2 are usually higher than correlations at level 1 between the same set of variables. How can I solve this problem?

Thank you very much again!
 Bengt O. Muthen posted on Sunday, April 17, 2016 - 4:05 pm
Try a smaller mconvergence value than the one you see in the analysis Summary.

If this doesn't help, send output and license number to Support.
 Katrina Jia Lin posted on Sunday, September 04, 2016 - 1:25 am
Dear Dr. Muthen,

I am conducting a three-level path analysis and encountered this error message.

*** ERROR in DEFINE command
The GROUPMEAN specification for TYPE=THREELEVEL must include the name of
the cluster variable for the cluster means. Problem with:

The line I used for the centering is

The cluster for level 2 is SCODE, and for level 3 is CCODE. I understand I will need to add SCODE to the line but I am not sure how. Can you please advise on this?

Thank you in advance.
 Linda K. Muthen posted on Sunday, September 04, 2016 - 6:57 am
Please send the full output and your license number to support@statmodel.com.
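Going only by the wording of the error message, the cluster variable name goes inside the GROUPMEAN parentheses. A minimal sketch under that reading, with SCODE as the level-2 and CCODE as the level-3 cluster variable as in the question; the variables y and x and the file name are hypothetical, and the exact CENTER syntax should be checked against the User's Guide for the version in use.

```mplus
TITLE:    Sketch: group-mean centering with TYPE=THREELEVEL
DATA:     FILE = threelevel.dat;    ! hypothetical file
VARIABLE: NAMES = y x scode ccode;  ! y, x are hypothetical
          WITHIN = x;
          CLUSTER = ccode scode;    ! level 3 first, then level 2
DEFINE:   CENTER x (GROUPMEAN scode);  ! center at level-2 means
ANALYSIS: TYPE = THREELEVEL;
MODEL:    %WITHIN%
          y ON x;
          %BETWEEN scode%
          y;
          %BETWEEN ccode%
          y;
```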
 Cora Parrisius posted on Monday, July 02, 2018 - 3:59 am
Dear Dr. Muthén,

We tried to run 3-level analyses (student, class, school) and had some difficulties.
We ran an intervention study; within schools, classes were randomly assigned to either one of two intervention conditions or the control condition.
Now we have outcome y (measured after the intervention) and would like to know if the intervention had an effect on y while controlling for the pretest measure. We expect the intervention effect to differ between conditions and between classes within one condition.
That is, we want to allow random slopes for the intervention effect on the outcome y in our analyses.
The problem is that we had schools participating with 1 up to 5 classes which is why the distributions of the conditions within schools differed between schools.
To consider this in our analyses we would like to let the condition variables ("cond1" and "cond2") vary between schools.

Thus, the cond variable is a variable with model variance at level 3 and no variance at level 1.
When we try to implement that in Mplus we get the following error message:

"*** ERROR in MODEL command
Variables that have been declared as variables for the BETWEEN CLASS_ID level
cannot be used on the BETWEEN SCHOOL_ID level.
Variable incorrectly used: COND1".

Any recommendations?

 Bengt O. Muthen posted on Monday, July 02, 2018 - 10:10 am
Please send your input and data to Support along with your license number.
 Maurits Masselink posted on Thursday, August 02, 2018 - 9:02 am

I am following the recommendations of Nezlek (2016) to estimate the reliability of an ESM measure using three-level analyses. He provides an Mplus 7 example output with a warning that Nezlek says to ignore:

Clusters for DAYNUM with the same IDs have been found in different clusters for SUBJNUM. These clusters are assumed to be different because clusters for DAYNUM are not allowed to appear in more than one clusters for SUBJNUM.

The problem is that when I repeat it in Mplus 8, I get an error instead of a warning:

Usevariables = self;
Cluster = ID beep ;
Type = threelevel random;
%BETWEEN beep%

Clusters for BEEP with the same IDs have been found in different clusters for ID. These clusters must have different IDs because clusters for BEEP are not allowed to appear in more than one cluster for ID. Check that the cluster variables are specified in the right order. Alternately, create unique IDs for BEEP in DEFINE based on its original IDs and a multiple of ID.

I multiplied beep by ID and it runs, but I think it is a bit strange to do, and it gives me a between-beep variance for SELF of 0, which is odd.

Has something changed in how Mplus handles this between versions 7 and 8?
Is creating unique IDs for beep indeed the way to go?

 Tihomir Asparouhov posted on Thursday, August 02, 2018 - 2:15 pm
We go back and forth about what to do regarding that issue and indeed in V8.1 we have changed to the more restrictive setup, but in the next version we will go back to what we had in V7.

All you need to do is add this command


to create unique BEEP values for each level 3 cluster.
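One reading of that advice and of the hint in the error message ("its original IDs and a multiple of ID"), as a sketch rather than the exact command: add a multiple of ID to BEEP. The multiplier 1000 is an assumption and must exceed the largest BEEP value. Note that a plain product does not guarantee uniqueness; for example, beep 2 in ID 3 and beep 3 in ID 2 both give 6.

```mplus
TITLE:    Sketch: unique level-2 IDs within level-3 clusters
DATA:     FILE = esm.dat;           ! hypothetical file
VARIABLE: NAMES = id beep self;
          USEVARIABLES = self;
          CLUSTER = id beep;        ! level 3 first, then level 2
DEFINE:   beep = beep + 1000*id;    ! assumes beep < 1000
ANALYSIS: TYPE = THREELEVEL;
MODEL:    %WITHIN%
          self;
          %BETWEEN beep%
          self;
          %BETWEEN id%
          self;
```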
 Jim Sloane posted on Thursday, August 30, 2018 - 2:10 pm
Hello. I am trying to fit a multilevel growth model using ANALYSIS: TYPE = THREELEVEL RANDOM. I have data for test scores over time nested within students (L2) nested within schools (L3). Let's say it's the simplest case with a test score variable, "score", and a time variable, "t". I want to estimate the model such that there's a random slope on time at both the student and school levels. However, I'm having trouble doing it in Mplus. My two specific questions are:

1. For this scenario, under Variable:, do I list score and t as (a) WITHIN = score t, (b) BETWEEN = score t, (c) both, or (d) neither?

2. How do I specify something like

s1 | score t
s2 | score t

to get slopes at both the student and school levels without getting an error about duplication of terms?

Thank you, and apologies if this is all spelled out somewhere already!
 Bengt O. Muthen posted on Thursday, August 30, 2018 - 6:18 pm
1. (d) Neither - which means that variation exists on all 3 levels.

2. You say Within = t and then you

say s | score on t;

and then mention s on the 2 higher levels. Also see V8 UG ex 9.20 and later examples for variations on this theme. See also our Short Course handout and video for Topic 10 on our website where 3-level modeling is discussed.
 Jim Sloane posted on Friday, August 31, 2018 - 9:53 am
Thanks very much! So, something like:

WITHIN = t t2 ;
CLUSTER = schid id;


s1 | score ON t t2;

score WITH s1;

%BETWEEN schid%
score WITH s1;
 Bengt O. Muthen posted on Friday, August 31, 2018 - 2:10 pm
Right, but you can't say

s1 | score ON t t2;

because s1 refers to one slope, not several. So say

s1 | score ON t;

s2 | score ON t2;
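Putting the pieces of this exchange together, a complete input might look like the sketch below. It assumes long-format data (one row per test occasion); the file name is hypothetical, and freeing all covariances among score, s1 and s2 on both higher levels is an illustrative choice rather than something stated in the posts.

```mplus
TITLE:    Sketch: three-level growth with random slopes
DATA:     FILE = scores_long.dat;   ! hypothetical file
VARIABLE: NAMES = score t t2 id schid;
          WITHIN = t t2;            ! score is on no list, so it
          CLUSTER = schid id;       ! varies on all three levels
ANALYSIS: TYPE = THREELEVEL RANDOM;
MODEL:    %WITHIN%
          s1 | score ON t;          ! one slope per statement
          s2 | score ON t2;
          %BETWEEN id%
          score WITH s1 s2;         ! student-level covariances
          s1 WITH s2;
          %BETWEEN schid%
          score WITH s1 s2;         ! school-level covariances
          s1 WITH s2;
```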

 Javed Ashraf posted on Wednesday, September 05, 2018 - 10:34 am
Can we conduct three-level multilevel mediation modeling with categorical observed variables, and with between- and within-subject groups in a latent variable context, using Mplus version 8.1?

Is there any solution possible if we don't use sample weights or employ complex survey methodology in this scenario?
Best Regards

 Bengt O. Muthen posted on Wednesday, September 05, 2018 - 10:36 am
Yes, Mplus does 3-level with categorical variables using Bayesian estimation - see UG ex 9.21.
 Zhi Ye posted on Sunday, September 16, 2018 - 6:25 pm
Dear Dr. Muthen,
I am running a three-level interaction model following example 9.20 in the UG, as follows:
WITHIN= skill1 skill2 ;
BETWEEN = (classid) prosocial1 prosocial2
(schoolid)climate1 climate2 ;

ANALYSIS: estimator=ML;
TYPE = threelevel Random;
skill by skill1 skill2 ;
PVW by PV1 PV2 ;
s1 | PVW on skill;
%BETWEEN classid%
prosocial by prosocial1 prosocial2;
PVB1 by PV1 PV2 ;
s2 | PVB1 on prosocial;
s12 | s1 on prosocial;
PVB1 with s1;
%BETWEEN schoolid%
PVB2 by PV1 PV2 ;
climate by climate1 climate2 ;
PVB2 on climate;
s1 on climate;
s2 on climate;
s12 on climate;

PVB2 with s1 s2 s12;
s1 with s2 s12;
s2 with s12;

However, I get the following error:

*** ERROR in MODEL command
The following random slope is not allowed for TYPE=THREELEVEL.
Problem with: S2 | PVB1 ON PROSOCIAL

The following random slope is not allowed for TYPE=THREELEVEL.
Problem with: S12 | S1 ON PROSOCIAL

Could you please help me to fix the problem?

Thank you so much!
 Bengt O. Muthen posted on Monday, September 17, 2018 - 10:13 am
Please send your output to Support along with your license number.