
bmuthen posted on Tuesday, March 26, 2002 - 10:39 am



A frequent question is how 3-level modeling of growth of students within schools compares to HLM-type analysis. Here are some answers. Actually, these analyses are one and the same. The 3-level HLM formulation describes across-time variation at level 1 involving random intercept and slope coefficients; level 2 studies the across-student variation of those coefficients, while level 3 studies their across-school variation. In the latent variable context, level 1 and level 2 are combined into the "Within" part of the model, i.e. the part describing variation across students. The "Between" part describes across-school variation and corresponds to level 3 of HLM. In this way, the 3-level HLM model is turned into a 2-level latent variable model. The fact that level 1 and level 2 are both considered in the Within part of the latent variable model is due to viewing the level 1 across-time variation as a multivariate observation vector rather than as a univariate repeated observation. Level 1 of the Within part is the latent variable measurement model part and level 2 of the Within part is the structural part. So in summary, Mplus estimates a random coefficient model, and the Between component intercept and growth coefficients represent variation across clusters in the coefficients defined for Within (students), just as in HLM analysis. If you use the ML estimator you get the same estimates.
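For readers who prefer equations, the correspondence can be written out for a simple linear growth model. The notation here is an assumption chosen for illustration and does not come from the original post: y_{tij} is the outcome at occasion t for student i in school j, and a_t is the time score.

```latex
\begin{aligned}
\text{Level 1 (time):}\quad    & y_{tij} = \pi_{0ij} + \pi_{1ij}\,a_{t} + e_{tij} \\
\text{Level 2 (student):}\quad & \pi_{0ij} = \beta_{00j} + r_{0ij}, \qquad \pi_{1ij} = \beta_{10j} + r_{1ij} \\
\text{Level 3 (school):}\quad  & \beta_{00j} = \gamma_{000} + u_{00j}, \qquad \beta_{10j} = \gamma_{100} + u_{10j}
\end{aligned}
```

In the two-level latent variable view, the vector (y_{1ij}, ..., y_{Tij}) is treated as one multivariate observation per student. The residuals e_{tij} and the student-level deviations r_{0ij}, r_{1ij} then belong to the Within part, and the school-level deviations u_{00j}, u_{10j} belong to the Between part.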


Hi, I am examining the mediational effect of COLL on the relationship between CL and TS. I have the predictor CL at the school level, the mediator COLL at the school level, and the outcome variable, TS, at the individual level. Thus, it is 2 -> 2 -> 1 (i.e., CL -> COLL -> TS, with a direct path from the predictor to the outcome as well). I am only familiar with the software HLM, through my independent reading. I wanted to examine the mediational effect, but I was not able to regress the mediator on the predictor variable. Regressing the mediator on the predictor is one essential equation in examining mediational models. Can I estimate this equation using OLS through SPSS regression? The rationale is that the mediator and the predictor are both at the school level. For the other equations, I am using HLM to regress the outcome on the predictor and the outcome on the mediator. If I use SPSS regression to regress the mediator on the predictor, how comparable are the coefficients produced by SPSS and HLM? Is any special implementation required when running the regression? Or is there any function within the software HLM to run a mediational model? I know that Mplus makes it possible to examine mediational models, but my limited time and limited statistics background will not help me. The best I can use is the simple 2-level HLM. I appreciate your help. Another question relates to examining the assumptions before using HLM. I checked my data for outliers, normality, and linearity through SPSS. I followed Tabachnick and Fidell's (2001) discussion of regression assumptions. My question, then, is whether there are any checks that are specific to HLM and how to run them. Can you refer me to any published work in this regard? Thank you

bmuthen posted on Friday, September 30, 2005 - 9:15 am



I think the easiest way to get the correct estimates and particularly the correct standard errors is to use Mplus for this modeling. That would make the analysis very straightforward.

Marc Reis posted on Monday, November 21, 2005 - 12:36 pm



Hello, I would like to use Mplus to estimate the effect of the group-level variable organizational climate on an individual-level outcome. Organizational climate is measured with multiple individual-level (continuous) indicators, so there are two sources of error variance for the aggregated organizational climate score for each organizational unit: the variation among the items (due to the fact that they are not perfectly reliable) and the variation among the members of each organizational unit (assuming that there is a "true score" for each group). Since three-level modeling is not yet available, are there any other ways to estimate the model? Many thanks for any suggestions!


Multiple indicators do not count as a level in Mplus because it takes a multivariate approach to multilevel modeling. In Mplus this would be a two-level model if I am understanding you correctly.

Marc Reis posted on Tuesday, November 22, 2005 - 2:46 am



My problem is that organizational climate is defined as a latent group-level variable that influences a latent individual-level variable. The idea was to specify the following model:
%within%
DV_within BY i1-i6;
%between%
ORG_CLIMATE BY i7-i12;
DV_between BY i1-i6;
DV_between ON ORG_CLIMATE;
I am unsure how Mplus treats i7-i12 in this case. Would ORG_CLIMATE be a "correct measure" of the latent group-level variable? (E.g., would it be reasonable to save a factor score for each group?)


You would specify the model as you have done above. You would also have BETWEEN = i7-i12; in the VARIABLE command.

Marc Reis posted on Tuesday, November 22, 2005 - 1:15 pm



I tried to specify BETWEEN = i7-i12; in the VARIABLE command, but Mplus doesn't allow group-level variables to have within-group variation. Note that i7-i12 are individual-level indicators that I would like to aggregate to the group level. So maybe there's another way, or I made a mistake.

bmuthen posted on Tuesday, November 22, 2005 - 4:36 pm



If i7-i12 are scores on individuals, then you don't put those variables on the Between = list. And you might want to use a within-level factor as well, say %Within% wORG by i7-i12; %Between% bORG by i7-i12; This assumes that you are interested in the within structure of these variables as well as the between structure (and the two may not be the same). Another alternative is to simply aggregate each variable to the between level, i.e. create cluster means, and then treat these as Between = variables with only between-level variation, and then specify the between-level factor model you have.
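The aggregation alternative amounts to a data-preparation step outside Mplus. A minimal sketch in plain Python follows; the function name, the `group` key, and the toy values are hypothetical, and missing scores are assumed to be coded as `None`.

```python
from collections import defaultdict

def cluster_means(rows, cluster_key, variables):
    """Return {cluster_id: {variable: mean}}, skipping missing (None) values."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        cid = row[cluster_key]
        for var in variables:
            value = row.get(var)
            if value is not None:
                sums[cid][var] += value
                counts[cid][var] += 1
    # Only report a mean where at least one non-missing score was seen.
    return {
        cid: {var: sums[cid][var] / counts[cid][var]
              for var in variables if counts[cid][var] > 0}
        for cid in sums
    }

# Hypothetical data: two groups, two of the individual-level indicators.
rows = [
    {"group": 1, "i7": 2.0, "i8": 4.0},
    {"group": 1, "i7": 4.0, "i8": None},
    {"group": 2, "i7": 5.0, "i8": 3.0},
]
means = cluster_means(rows, "group", ["i7", "i8"])
```

Each resulting mean would then be merged back so that every member of a group carries the same value, which is what makes the variable eligible for the Between = list.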

Anonymous posted on Wednesday, November 23, 2005 - 1:39 am



Hello, I have an idea about the above discussion; maybe this helps. In general there might be two reasons to specify a model with latent variables: the indicators are not perfect measures (-> error variance) and they usually do not measure the factor to the same extent (-> different factor loadings). So requesting factor scores actually means weighting the indicators, doesn't it? The question is whether it is necessary to weight the members of a group to obtain a better estimate of the group-level variable. Assuming that there is no special sampling procedure, I don't see a rationale for weighting one group member more than others. So from this point of view, it would be reasonable to simply compute the cluster mean as Bengt suggested, maybe based on the individual-level factor scores. But I am unsure whether to compute the factor scores based on the original covariance matrix or the disaggregated within covariance matrix. Maybe someone would like to comment...

bmuthen posted on Wednesday, November 23, 2005 - 6:37 pm



Comments are invited regarding this. Concerning whether to compute factor scores based on the original cov matrix or within cov matrix, I would say that if factor scores are needed for a multilevel setting, you are better off getting factor scores from a multilevel factor analysis model. 

chantanee posted on Thursday, August 10, 2006 - 2:19 am



The purpose of my study was to find out how multilevel variables (student variables, classroom variables, and school variables) affect the science learning achievement of Thai upper secondary school students. The study had 3 sub-objectives: (1) to identify student variables that directly affect science learning achievement; (2) to identify the direct influences of classroom variables, and cross-level interactions between classroom variables and student variables, on student science learning achievement; and (3) to identify the direct influences of school variables, and cross-level interactions between school variables and classroom variables or between school variables and student variables, on student science learning achievement. The sample was drawn by multistage random sampling. It consisted of 3 groups: (1) 132 school administrators (principals, assistant principals, and heads of science departments) from 44 public upper secondary schools in Thailand; (2) 132 science teachers who taught Chemistry, Biology, and Physics, from 88 sampled classrooms (2 classrooms per school); and (3) 2,488 Grade 11 science students. If possible, could you kindly give me some advice on these questions? 1. Would the results be sound enough to report? 2. Are two classrooms per school enough to give adequate power for HLM?


This sounds like multilevel modeling would work well; you have enough schools and classrooms. 2 classrooms per school is a bare minimum which does not allow you to study many classroom variables. Mplus does not currently handle this 3-level model.


We have a dataset that includes three waves of data collected on a number of family context constructs. LGM has been our preferred method of analysis, but we're now planning to collect 2 days of cortisol samples from our participants, with multiple cortisol samples each day. Most prior studies that use cortisol data tend to model the daily hormone patterns using HLM. I'm confused about how we can use our LGMs of the family context constructs to predict the HLM-based hormone patterns. Could the cortisol data be modeled in LGM and then used as a dual process predicted by family context?


Are you referring to HLM the program or hierarchical linear modeling in general? 


Sorry, hierarchical linear modeling in general. 


The SEM and HLM growth models differ in two basic ways. One is the treatment of time scores. In SEM, they are treated as parameters in the model. In HLM, they are treated as data. The second is the treatment of time-varying covariates. The regression coefficients are fixed in SEM and random in HLM. Mplus can have time scores as parameters or data and can have fixed or random coefficients for time-varying covariates. So I think you should be okay.

Frank Gallo posted on Sunday, August 09, 2009 - 8:02 pm



Dear Dr. Muthen, I am a beginner with Mplus. I am using Mplus Version 5.21. I have stratified data: police arrests (n = 3,300) within police departments (n = 16) that serve community population levels (n = 4). The DV, police force, is continuous. I have a mixture (nominal, ordinal, ratio) of 21 covariates at level 1 and none at levels 2 and 3. Community levels are fixed effects. Would the multilevel modeling features of Mplus handle these data? Thank you. Best regards, Frank


It sounds like you have a two-level cross-sectional model which can be estimated in Mplus. The problem I see is that you have only 16 police departments. It is usually recommended to have a minimum of 30 clusters.

Utkun Ozdil posted on Thursday, December 16, 2010 - 11:28 am



Hi, I collected my data from three faculties of a university (Faculty of Education, Faculty of Engineering, and Faculty of Arts and Sciences). Each of these faculties contributed second-, third-, and fourth-grade undergraduates. So, I have students nested within grade levels within faculties. This led me to consider a three-level model. Does Mplus handle an analysis like mine, or is the HLM program more appropriate? Thanks... Utkun


Mplus does not currently have three-level cross-sectional models. HLM does. Your data, however, are not suitable for multilevel modeling given that faculty and grade cannot be considered random modes.

Jing Zhang posted on Wednesday, August 24, 2011 - 3:35 pm



Dear Dr. Muthen, In your post dated March 26, 2002, you talked about how to deal with 3-level modeling of growth of students within schools. You said that level 1 and level 2 are combined into the "Within" part of the model by viewing the level 1 across-time variation as a multivariate observation vector rather than as a univariate repeated observation. My question is: does this mean that data in long format will not work, and that data in long format have to be changed to wide format? Thanks, Jing


Yes. 
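As an illustration of the long-to-wide restructuring this implies, here is a minimal sketch in plain Python. The function name, the column names (`id`, `time`, `y`), and the toy records are hypothetical; occasions a person did not provide are filled with `None` to mark them as missing.

```python
def long_to_wide(rows, id_key, time_key, value_key, times):
    """Pivot long records into one wide record per id (columns y1, y2, ...)."""
    wide = {}
    for row in rows:
        rec = wide.setdefault(row[id_key], {id_key: row[id_key]})
        rec[f"{value_key}{row[time_key]}"] = row[value_key]
    # Occasions with no record for a person become missing (None).
    for rec in wide.values():
        for t in times:
            rec.setdefault(f"{value_key}{t}", None)
    return list(wide.values())

# Hypothetical long-format data: one row per person per occasion.
long_rows = [
    {"id": 1, "time": 1, "y": 10.0},
    {"id": 1, "time": 2, "y": 12.0},
    {"id": 2, "time": 1, "y": 9.0},
]
wide_rows = long_to_wide(long_rows, "id", "time", "y", times=[1, 2])
```

In a real data set the school identifier would be carried along as well, so that each wide record still knows which cluster it belongs to.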


I am working on a three-level CFA model (unbalanced data). The method I used was ESM, to examine the variability of a continuous variable. Because the method is so intense, I used three items to capture the construct. Each person gave ratings on these items three times a day, for five days. Therefore, moments were nested within days, and days within people. These are the 3 levels. I gave a unique ID to every person, and this unique ID appears in the data 15 times per person. An example is below for one person and part of a second person. M stands for moment.
ID    Day  M  X1  X2  X3
1234  1    1  1   6   7
1234  1    2  2   4   6
1234  1    3  3   .   .
1234  2    1
1234  2    2
1234  2    3
1234  3    1
1234  3    2
1234  3    3
1234  4    1
1234  4    2
1234  4    3
1234  5    1
1234  5    2
1234  5    3
1567  1    1
1567  1    2
1567  1    3
1567  2    1
When I run my analysis, Mplus gives me a warning message: *** WARNING Clusters for DAY with the same IDs have been found in different clusters for RESP_NR. These clusters are assumed to be different because clusters for DAY are not allowed to appear in more than one cluster for RESP_NR. What does this warning message mean? Can I trust my output with it, or is it affecting my results? I would appreciate your help!


Please send the full output and your license number to support@statmodel.com. 

Luo Wenshu posted on Tuesday, March 24, 2015 - 7:42 pm



Hi Dr. Muthen, I am using Mplus to run a 2-level HLM model and have the following questions. 1) Is there a default setting for centering of predictors, grand-mean or group-mean? 2) For random slopes, if we find they are not statistically significant, does this mean we do not need to build a level 2 model with predictors for these random slopes and can just turn to a fixed model for the slopes? 3) Do we need to allow random intercepts and slopes to be correlated at Level 2? Thank you very much.


1) The default is no centering. 2) You may still find significant influence of level-2 predictors on the random slopes. 3) Yes.
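Since nothing is centered by default, the two options in question 1 have to be requested or computed explicitly. The following plain-Python sketch shows what each transformation does to a predictor; the function names and toy values are hypothetical, and this illustrates the arithmetic only, not how Mplus implements it.

```python
from collections import defaultdict

def grand_mean_center(values):
    """Subtract the overall mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def group_mean_center(values, groups):
    """Subtract each observation's own cluster mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical predictor for four students in two classes.
x = [1.0, 3.0, 5.0, 7.0]
cluster = ["a", "a", "b", "b"]
x_grand = grand_mean_center(x)            # overall mean is 4.0
x_group = group_mean_center(x, cluster)   # class means are 2.0 and 6.0
```

Group-mean centering removes all between-cluster variation from the predictor, whereas grand-mean centering only shifts its origin, which is why the choice changes the interpretation of the level-1 slope.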

Luo Wenshu posted on Thursday, March 26, 2015 - 1:25 am



Thank you very much, Dr. Muthen. For the correlations among random intercepts and slopes at Level 2, what is the default setting in Mplus? It seems that the correlations are fixed at zero by default.


It depends on the analysis setting. You see in the output what is done in each case. If the covariance is not there, add it. 

Melody Kung posted on Thursday, August 06, 2015 - 12:28 pm



Hi Drs. Muthen, I am running a 2-level model with 2 independent, latent, between-level variables. Using example 9.12 in the manual as a guide, I specified "within" and "between" variable names in the VARIABLE section and then defined the latent variables and their indicators in the MODEL section, followed by the "%WITHIN%" and "%BETWEEN%" statements. The example does not include latent variables, whereas my model does. Can I include latent variables in the %BETWEEN% statements? When I try to do so, the error message states that the two latent variables in the BETWEEN option are unknown, even though I specified the latent variables in the MODEL section. I hope my question makes sense. Thanks for any help you can provide.


Please send the output and your license number to support@statmodel.com. 


Hello, I have a longitudinal study where teens completed daily diaries for 14 days each year for 3 years. Thus, days are nested within years, which are nested within individuals. I am interested in whether an individual-level variable (e.g., sex) moderates certain slopes at the daily level (e.g., conflict -> distress within a day). I'm having trouble figuring out the appropriate analysis to run. Would this be a three-level model such as in example 9.20 of the version 7.6 guidebook? Thank you!


Having only 3 years on Level 2 is too few for a 3-level analysis. Perhaps you could do a 2-level analysis of days within subject and let year be represented by dummy covariates.
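The dummy covariates for year could be created during data preparation. A minimal sketch in plain Python, assuming year is coded 1 to 3 and the first year serves as the reference category; the function and variable names are hypothetical.

```python
def year_dummies(years, reference):
    """Return one dict of 0/1 indicators per observation, omitting the reference year."""
    levels = sorted(set(years))
    non_reference = [lv for lv in levels if lv != reference]
    return [{f"year{lv}": int(y == lv) for lv in non_reference} for y in years]

# Hypothetical daily records: the year in which each diary day was collected.
years = [1, 1, 2, 3]
dummies = year_dummies(years, reference=1)
```

With three years this gives two indicators (`year2` and `year3`), each interpreted relative to the first year.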

Cynthia Yuen posted on Thursday, February 11, 2016 - 7:41 am



Thanks for the quick response! Would it be better to do something like 9.12 or 9.13 instead and model growth within a two-level model? Two of our main questions are whether the daily relations between events (e.g., conflict -> distress) change as teens age, and whether some individual-level characteristics like ethnicity predict how/whether these slopes change over time. Do you have any advice on how to appropriately model this?


I don't hear that you have a growth model situation but rather a regression of distress on conflict, where that regression may change over year (I assume, not over the days). If so, I would do a 2-level regression where level 1 is time (the 14 days) and level 2 is subject. Year can be level-2 and can predict the DV distress and perhaps the slope, by creating Year*conflict and letting that influence distress. But it is a research question which I don't have enough background in your study to really answer.

Luo Wenshu posted on Friday, April 15, 2016 - 4:46 pm



Dear Dr. Muthen, In 2-level HLM analysis (student and class level), I see from the User's Guide that for level 2 predictors we may use observed (mx) or latent (x). I know the observed level 2 predictor can be obtained by aggregating level 1 scores at the class level. 1) How is the latent Level 2 predictor calculated? 2) Which one is preferred? 3) If I have a level 1 predictor that is a Rasch measure (i.e., a latent variable), do I still need to use a latent variable for the predictor at Level 2? Thank you very much!


1-2. See the paper on our website: Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229. 3. If the Rasch indicators vary over level-2 units, you should express a latent variable model on this level also; the indicators are then the random intercepts of each observed indicator.
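As background on why an observed class mean can be an unreliable level-2 predictor, the standard Spearman-Brown-type formula for the reliability of a mean of n members, given the intraclass correlation, is n*ICC / (1 + (n - 1)*ICC). The sketch below only evaluates that formula; the function name and the example values are hypothetical, and it is not a summary of the cited paper's full argument.

```python
def group_mean_reliability(icc, n):
    """Reliability of an observed cluster mean based on n members with intraclass correlation icc."""
    return n * icc / (1 + (n - 1) * icc)

# Hypothetical values: ICC of 0.10 with 20 students per class.
reliability = group_mean_reliability(0.10, 20)
```

The smaller the ICC and the fewer members per cluster, the lower this reliability, and the more an observed mean departs from the latent cluster-level value.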

Luo Wenshu posted on Friday, April 15, 2016 - 6:11 pm



Thank you for the quick response, Dr. Muthen! I used latent level 2 predictors in my 2-level random slopes analysis. Using the default estimator MLR, I got the following warning message. Does this mean the result is not trustworthy? Do I need to turn to the MLF estimator as suggested? WARNING: THE MODEL ESTIMATION HAS REACHED A SADDLE POINT OR A POINT WHERE THE OBSERVED AND THE EXPECTED INFORMATION MATRICES DO NOT MATCH. AN ADJUSTMENT TO THE ESTIMATION OF THE INFORMATION MATRIX HAS BEEN MADE. THE CONDITION NUMBER IS 0.157D-01. THE PROBLEM MAY ALSO BE RESOLVED BY DECREASING THE VALUE OF THE MCONVERGENCE OR LOGCRITERION OPTIONS OR BY CHANGING THE STARTING VALUES OR BY USING THE MLF ESTIMATOR. In addition, I found that the level 2 predictors had big standard errors. I suspect a multicollinearity issue at level 2, because correlations at level 2 are usually higher than correlations at level 1 between the same set of variables. How can I solve the problem? Thank you very much again!


Try a smaller mconvergence value than the one you see in the analysis Summary. If this doesn't help, send output and license number to Support. 


Dear Dr. Muthen, I am conducting a three-level path analysis and encountered this error message. *** ERROR in DEFINE command The GROUPMEAN specification for TYPE=THREELEVEL must include the name of the cluster variable for the cluster means. Problem with: PAAM CSP The centering line I used is DEFINE: CENTER PAAM CSP(GROUPMEAN); The cluster variable for level 2 is SCODE, and for level 3 it is CCODE. I understand I will need to add SCODE to the line, but I am not sure how. Can you please advise on this? Thank you in advance.


Please send the full output and your license number to support@statmodel.com. 


Dear Dr. Muthén, We tried to run 3-level analyses (student, class, school) and had some difficulties. We ran an intervention study; within schools, classes were randomly assigned to either one of two intervention conditions or the control condition. Now we have outcome y (measured after the intervention) and would like to know if the intervention had an effect on y while controlling for the pretest measure. We expect the intervention effect to differ between conditions and between classes within one condition. That is, we want to allow random slopes for the intervention effect on the outcome y in our analyses. The problem is that schools participated with anywhere from 1 to 5 classes, which is why the distribution of conditions within schools differed between schools. To take this into account, we would like to let the condition variables ("cond1" and "cond2") vary between schools. Thus, the cond variable is a variable with model variance at level 3 and no variance at level 1. When we try to implement that in Mplus we get the following error message: "*** ERROR in MODEL command Variables that have been declared as variables for the BETWEEN CLASS_ID level cannot be used on the BETWEEN SCHOOL_ID level. Variable incorrectly used: COND1". Any recommendations? Thanks, Cora


Please send your input and data to Support along with your license number. 


Hi, I am following the recommendations of Nezlek (2016) to estimate the reliability of an ESM measure using three-level analyses. He provides an Mplus 7 example output with a warning which Nezlek says to ignore: WARNING Clusters for DAYNUM with the same IDs have been found in different clusters for SUBJNUM. These clusters are assumed to be different because clusters for DAYNUM are not allowed to appear in more than one clusters for SUBJNUM. The problem is that when I repeat it in Mplus 8, I get an error instead of a warning: Usevariables = self; Cluster = ID beep ; Analysis: Type = threelevel random; MODEL: %WITHIN% self; %BETWEEN beep% SELF; %BETWEEN ID% SELF; ERROR Clusters for BEEP with the same IDs have been found in different clusters for ID. These clusters must have different IDs because clusters for BEEP are not allowed to appear in more than one cluster for ID. Check that the cluster variables are specified in the right order. Alternately, create unique IDs for BEEP in DEFINE based on its original IDs and a multiple of ID. I multiplied beep by ID and it runs, but I think it is a bit strange to do, and it gives me a between-beep variance for SELF of 0, which is odd. Has something changed in how Mplus handles this between versions 7 and 8? Is creating unique IDs for beep indeed the way to go? Maurits


We go back and forth about what to do regarding that issue and indeed in V8.1 we have changed to the more restrictive setup, but in the next version we will go back to what we had in V7. All you need to do is add this command DEFINE: BEEP=BEEP+100000*ID; to create unique BEEP values for each level 3 cluster. 
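To see why adding a multiple of ID gives distinct values where a plain product of the two codes might not, here is a small plain-Python check of the same arithmetic. The function name and toy records are hypothetical, and the result is unique only under the assumption that every original BEEP value is smaller than the multiplier.

```python
def make_unique_beep(person_id, beep, multiplier=100000):
    """Mirror BEEP = BEEP + 100000*ID; assumes 0 <= beep < multiplier."""
    return beep + multiplier * person_id

# Hypothetical (ID, BEEP) pairs: two persons who both have beeps 1 and 2.
records = [(1, 1), (1, 2), (2, 1), (2, 2)]
unique = [make_unique_beep(pid, beep) for pid, beep in records]

# A plain product can collide: person 1 beep 2 and person 2 beep 1 both give 2.
products = [pid * beep for pid, beep in records]
```

In this toy example the products are 1, 2, 2, 4, so two different person-beep combinations share a code, while the offset version gives four distinct codes.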

Jim Sloane posted on Thursday, August 30, 2018 - 2:10 pm



Hello. I am trying to fit a multilevel growth model using ANALYSIS: TYPE = THREELEVEL RANDOM. I have data for test scores over time nested within students (L2) nested within schools (L3). Let's say it's the simplest case with a test score variable, "score", and a time variable, "t". I want to estimate the model such that there's a random slope on time at both the student and school levels. However, I'm having trouble doing it in Mplus. My two specific questions are: 1. For this scenario, under Variable:, do I list score and t as (a) WITHIN = score t, (b) BETWEEN = score t, (c) both, or (d) neither? 2. How do I specify something like s1 | score t s2 | score t to get slopes at both the student and school levels without getting an error about duplication of terms? Thank you, and apologies if this is all spelled out somewhere already!


1. (d) Neither, which means variation exists on all 3 levels. 2. You say Within = t and then you say s | score on t; and then mention s on the 2 higher levels. Also see V8 UG ex 9.20 and later examples for variations on this theme. See also our Short Course handout and video for Topic 10 on our website, where 3-level modeling is discussed.

Jim Sloane posted on Friday, August 31, 2018 - 9:53 am



Thanks very much! So, something like:
Variable: WITHIN = t t2 ; CLUSTER = schid id;
ANALYSIS: TYPE = THREELEVEL RANDOM;
MODEL:
%WITHIN%
s1 | score ON t t2;
%BETWEEN id%
s1; score WITH s1;
%BETWEEN schid%
s1; score WITH s1;


Right, but you can't say s1 | score ON t t2; because s1 refers to one slope, not several. So say s1 | score ON t; s2 | score ON t2; etc.

Javed Ashraf posted on Wednesday, September 05, 2018 - 10:34 am



Hi, I have a query: can we conduct three-level multilevel mediation modelling with categorical observed variables, and with between- and within-subject groups in a latent variable context, using Mplus version 8.1? Is there any solution possible if we don't use sample weights or employ complex survey methodology in this scenario? Best Regards, Javed


Yes, Mplus does 3-level with categorical variables using Bayesian estimation; see UG ex 9.21.

Zhi Ye posted on Sunday, September 16, 2018 - 6:25 pm



Dear Dr. Muthen, I am running a three-level interaction model according to example 9.20 in the UG, as follows:
WITHIN= skill1 skill2 ;
BETWEEN = (classid) prosocial1 prosocial2 (schoolid)climate1 climate2 ;
DEFINE: CENTER skill1 skill2 (GRANDMEAN);
ANALYSIS: estimator=ML; TYPE = threelevel Random; ALGORITHM=INTEGRATION;
MODEL:
%WITHIN%
skill by skill1 skill2 ;
PVW by PV1 PV2 ;
s1 | PVW on skill;
%BETWEEN classid%
prosocial by prosocial1 prosocial2;
PVB1 by PV1 PV2 ;
s2 | PVB1 on prosocial;
s12 | s1 on prosocial;
PVB1 with s1;
%BETWEEN schoolid%
PVB2 by PV1 PV2 ;
climate by climate1 climate2 ;
PVB2 on climate;
s1 on climate; s2 on climate; s12 on climate;
PVB2 with s1 s2 s12;
s1 with s2 s12;
s2 with s12;
OUTPUT: TECH1 TECH8;
However, I get the following error: *** ERROR in MODEL command The following random slope is not allowed for TYPE=THREELEVEL. Problem with: S2 | PVB1 ON PROSOCIAL The following random slope is not allowed for TYPE=THREELEVEL. Problem with: S12 | S1 ON PROSOCIAL Could you please help me fix the problem? Thank you so much!


Please send your output to Support along with your license number. 
