
chera posted on Wednesday, October 18, 2000  9:29 am



I am attempting to figure out how to create a structural model with a one-factor, two-level independent variable predicting a one-level continuous dependent variable. Is this possible to do? Should I just build the dependent variable on the within side alone?


Let me restate your question to be sure that I understand it. You have a factor on the cluster (between) level and you want to relate it to an observed variable on the individual (within) level. If this is what you mean, you would state this in the between part of the model as y ON f, for example. Every observed variable on the within level has a between-level counterpart that is automatically created by Mplus.

Anonymous posted on Saturday, January 06, 2001  11:40 am



I inadvertently posted this question already under Hierarchical Regression, but it really belongs under this heading. I will repeat it here: If I have found a two-factor model with an EFA and have found that it does not converge as a two-factor model in a multilevel framework, are there any cites that I could use to argue that the two-factor model found in the EFA is an artifact of the nested nature of the data?


I don't know of any cites related to this. I would first get the multilevel factor analysis model to converge, starting with the within level and then adding the between level. It is known that there are sometimes fewer factors found on the between level than on the within level. See references in Muthen (1989, Psychometrika) on the website. It is certainly possible that one factor on the between level and two factors on the within level could give rise to the well-fitting two-factor EFA that you found. It is also possible that there are two factors on the between level and one on the within level. You should probably work more with the multilevel model before you draw any conclusions about the artifactual results from the EFA.

Anonymous posted on Friday, February 02, 2001  12:32 am



Is it possible to estimate a multilevel model without every observed variable on the within level having a between-level counterpart that is automatically created by Mplus?


No, this is not possible. However, if the variable is assumed to have no relationship with another variable on the between level, this can be specified in the between model. 

Anonymous posted on Monday, August 27, 2001  1:42 pm



Are there technical, practical, or computational reasons why Mplus only allows for the calculation of 2-level HLMs? Do you plan on allowing for 3-level HLMs in the future?


No. We do plan to add 3-level modeling in the future. It is among many planned additions.

Anonymous posted on Tuesday, November 13, 2001  6:09 pm



Hello, I am learning Mplus so that I can estimate some multilevel path models, but I'm afraid I've gotten confused. In a standard mixed regression model, you can estimate a level-1 regression where x_1 and x_2 predict y, and it is possible to get random components for the intercept and both regression parameters over level-2 units. However, as best as I can tell, in Mplus it is only possible to get a random intercept but NOT random slopes in the same situation. Is there a straightforward way to understand why this is so? Is the answer to this the same reason why in a multilevel CFA in Mplus you can only get random intercepts in the indicators but the factor loading matrices are forced to be invariant across level-2 units? Thank you very much.


You are correct that random slopes are not part of the Mplus multilevel model for cross-sectional data. Latent variable modeling has traditionally considered mean and covariance structure models. With random slopes, there is no one covariance structure, but the covariance structure changes for each covariate value. See, for example, the Raudenbush chapter in the Collins, Sayer book. In Version 3 of Mplus, random slopes for observed covariates will be included.

Anonymous posted on Sunday, March 10, 2002  12:13 pm



In Step 4 (estimation of between structure) of the multilevel CFA model building procedure described in the Muthen (1994) Sociological Methods and Research article, I am running into the following problem: *** FATAL ERROR THE SAMPLE COVARIANCE MATRIX COULD NOT BE INVERTED. THIS CAN OCCUR IF A VARIABLE HAS NO VARIATION, OR IF TWO VARIABLES ARE PERFECTLY CORRELATED, OR IF THE NUMBER OF OBSERVATIONS IS NOT GREATER THAN THE NUMBER OF VARIABLES. CHECK YOUR DATA. THIS PROBLEM IS DUE TO: VAR11 How can I tell which of these is causing the problem? If the problem is due to only one variable, as suggested, does that mean that this variable has no variance in the Sb matrix? When I checked the ICC of that item, it was not very small (relative to other items in the analysis). And is it possible to use the Sb matrix in an exploratory factor analysis in Mplus to get an idea of the factor structure on the between level? Your help is much appreciated.


As stated in the article, this is a common problem. Are you analyzing Sb or SigmaB? You will probably have the same problems with both, but SigmaB is recommended. You can save this using SAVEDATA: FILE (SIGB) IS filename; The covariance matrix is saved by default. You can also save the correlation matrix in a separate run by stating FILE (SIGB) IS filename; TYPE=CORRELATION; You can see if any variables have zero variances by looking at the diagonal of the covariance matrix. You can see if any variables have correlations of one by looking at the correlation matrix. The sample size is the number of clusters. If you have more variables than clusters, then you violate the last condition in the error message. You can use the SigmaB correlation matrix in EFA with the ULS estimator. This is the default estimator.

Anonymous posted on Monday, March 11, 2002  12:15 pm



Thank you very much for your reply. I am using the SIGB matrix. None of the variables have zero variance, although some of them are very close. The problem seems to be that some of the correlations are larger than 1.0. I am guessing these correlations are caused by the low item variances. Does this simply mean that there is not enough variance on the group level to model? The EFA output says: THE INPUT SAMPLE CORRELATION MATRIX IS NOT POSITIVE DEFINITE. THE ESTIMATES GIVEN BELOW ARE STILL VALID. I am not sure I understand why the estimates are still valid even though the matrix is not positive definite. Can I legitimately report these estimates in a manuscript? Is there any literature that explains why these estimates are considered valid? Thank you again for your help.


Correlations greater than one mean that the matrix is not positive definite, which is a common problem with the estimated sigma between matrix, as is mentioned in step 4 of the paper. It does not mean that there is low variance on the group level, but simply that the sigma between matrix is not well-estimated. EFA estimation using ULS does not depend on the correlation matrix being positive definite. This is just an informational warning. However, in your case with correlations greater than one, I would not trust the results. You may instead want to use the second alternative mentioned in step 4, to analyze the sample between matrix using ULS as an approximation to analyzing the sigma between matrix. You can use these results in the multilevel model.
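The three checks discussed in this exchange (variances on the diagonal, implied correlations at or beyond one, and positive definiteness) can be sketched outside Mplus once the between matrix has been saved. The following is only an illustrative sketch in plain Python: the function names, the tolerance, and the example matrix are assumptions made here, not anything Mplus provides, and the matrix is assumed to have been read into a nested list already.

```python
import math


def is_positive_definite(a, tol=1e-10):
    """Try a Cholesky factorization; it succeeds only for positive definite input."""
    p = len(a)
    chol = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= tol:  # non-positive pivot: not positive definite
                    return False
                chol[i][i] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return True


def diagnose_covariance(cov, tol=1e-10):
    """Return a list of problems found in a symmetric covariance matrix."""
    p = len(cov)
    problems = []
    # Check 1: zero (or negative) variances on the diagonal.
    for i in range(p):
        if cov[i][i] <= tol:
            problems.append(f"variable {i}: variance {cov[i][i]:.4g} is not positive")
    # Check 2: implied correlations at or beyond +/-1.
    for i in range(p):
        for j in range(i):
            if cov[i][i] > tol and cov[j][j] > tol:
                r = cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
                if abs(r) >= 1.0 - tol:
                    problems.append(f"variables {j},{i}: correlation {r:.3f} is out of range")
    # Check 3: overall positive definiteness.
    if not is_positive_definite(cov, tol):
        problems.append("matrix is not positive definite")
    return problems


# Hypothetical 2 x 2 between matrix with small variances and an implied correlation above 1.
for message in diagnose_covariance([[0.04, 0.05], [0.05, 0.05]]):
    print(message)
```

A matrix can fail the third check even when every pairwise correlation is within range, which is why the positive definiteness test is run separately.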

Anonymous posted on Monday, November 11, 2002  9:17 am



I'm a student who is studying multilevel models. I'm looking for references on multilevel SEM analysis other than the Mplus manual. Your help is much appreciated.


Following are two basic multilevel references:
Raudenbush, S.W. & Bryk, A.S. (2002). Hierarchical linear models: Applications and data analysis methods. Second edition. Newbury Park, CA: Sage Publications.
Snijders, T. & Bosker, R. (1999). Multilevel analysis. An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage Publications.
Following is a reference that uses Mplus in the analysis:
Heck, R. (2001). Multilevel modeling with SEM. In G.A. Marcoulides & R.E. Schumacker (eds.), New Developments and Techniques in Structural Equation Modeling (pp. 89-127). Lawrence Erlbaum Associates.
You can find other multilevel references at www.statmodel.com under References.

Yifu Chen posted on Tuesday, April 01, 2003  7:51 am



Hi, Dr. Muthen, We are trying to run a two-level SEM model and have encountered problems. We have 320 subjects nested within 9 counties. Five counties are in the intervention group and four are in the control group. (This means the treatment is at the county level.) We now want to run a model with three latent constructs. Two latent constructs have four indicators each and one latent construct (intervention) has only one indicator. Here we treat intervention as a between-level variable, so we run the two-level model like this: Between: intven -> Eta2, Eta1 -> Eta2. Within: Eta1 -> Eta2. Is this the right way to run this model? Besides, we are a little confused about the sample size for the between level. For our case, is it right to say that we have 9 cases on the between level? Or is the model presented on the between level only the result of adjusting for the cluster effect? Thank you for your help!

Yifu Chen posted on Wednesday, April 02, 2003  7:56 am



Hi, Dr. Muthen, This is a follow-up question. We tried the model using complex sample. We have 9 clusters in the sample. In the model there are 11 observed variables, and we got the following message: *** FATAL ERROR THE SAMPLE BETWEEN COVARIANCE MATRIX COULD NOT BE INVERTED. THIS CAN OCCUR IF A VARIABLE HAS NO VARIATION, OR IF TWO VARIABLES ARE PERFECTLY CORRELATED, OR IF THE NUMBER OF CLUSTERS IS NOT GREATER THAN THE NUMBER OF VARIABLES. CHECK YOUR DATA. THE PROBLEM IS DUE TO: NUMBER OF VARIABLES : 11 NUMBER OF CLUSTERS : 9 So, if we understand correctly, the error is because we have more variables than clusters. Does this mean that when running the complex sample model, we should have more clusters than variables? Do you have any suggestions for dealing with multilevel issues when the number of clusters is small? Thanks


I am asking someone with experience with a small number of clusters to answer your question. Fewer than 20 clusters makes the statistical analysis difficult.

booil jo posted on Thursday, April 03, 2003  8:50 am



Regarding Yifu Chen on Tuesday, April 01, 2003: I think your model setup is correct given your cluster randomized trial situation and your research question. However, in your situation with only 9 clusters, I don't think it is a good idea to rely on the nonparametric standard errors provided when the COMPLEX option is used. Although simple, the sandwich estimator is known to yield anti-conservative coverage probability (i.e., a type I error rate higher than the nominal rate) with small numbers of clusters. If the number of clusters per condition is less than 10 (you only have 5 and 4 in each condition), the resulting sandwich estimator is very unreliable. See, for example, Jo et al. (2002) and Murray et al. (1998). To counter this limitation, several methods such as jackknife sandwich estimates (MacKinnon & White, 1985), adjustment using the t-distribution (Thornquist & Anderson, 1992), and adjustment considering the variance of the sandwich estimate (Kauermann & Carroll, 2001) have been suggested. As far as I know, modification procedures to counter anti-conservative sandwich estimates are not embedded in the current version of Mplus. I wonder if the conventional model-based methods such as mixed effect ANOVA would do any better than the sandwich method in your situation. Another (even simpler) way to deal with this problem would be to treat the clusters as fixed (i.e., dummy covariates) and do regular fixed effect regression analysis. However, the results will be valid only under the strong assumption that the nesting structure is completely explained by these dummy covariates. For more explanation about the disadvantage of this regular regression approach, see Chapter 4 of Snijders & Bosker (1999).
REFERENCES
Jo, B., Muthén, B., Ialongo, N.S., & Brown, C.H. (2002). Cluster randomized trials with nonadherence. Submitted for publication. Can be downloaded from Mplus website.
Kauermann, G., & Carroll, R. J. (2001). A note on the efficiency of sandwich covariance matrix estimation. Journal of the American Statistical Association, 96, 1387-1396.
MacKinnon, J. G., & White, H. (1985). Some heteroscedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29, 305-325.
Murray, D. M., Hannan, P. J., Wolfinger, R. D., Baker, W. L., & Dwyer, J.H. (1998). Analysis of data from group-randomized trials with repeat observations on the same groups. Statistics in Medicine, 17, 1581-1600.
Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. Thousand Oaks, CA: Sage.
Thornquist, M. D., & Anderson, G. L. (1992). Small sample properties of generalized estimating equations in group-randomized designs with Gaussian response. Paper presented at the annual meeting of American Public Health Association. Washington, D. C.

Anonymous posted on Monday, April 28, 2003  5:52 pm



I am attempting to build a 2-level structural model (n=762, number of clusters=108). I am trying to follow the 4 steps given in Muthen 1994 (multilevel covariance structure analysis). I am having a difficult time understanding how to accomplish the third step, in which one estimates the pooled within-group covariance matrix (with a sample size of total n minus the number of groups). Can you give some guidance on how to accomplish this in Mplus? Thank you.


In the SAVEDATA command, the FILE (SAMPLE) option will save the pooled-within covariance matrix. See page 90 of the Mplus User's Guide.
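For readers who want to see what the pooled-within matrix is, here is a minimal sketch of the usual definition: deviations from each cluster's own means are cross-multiplied, summed over all individuals, and divided by the total n minus the number of groups, as described in the question. The function name and the tiny data set are hypothetical, and this is not a description of how Mplus computes or stores the matrix.

```python
from collections import defaultdict


def pooled_within_covariance(rows, clusters):
    """Pooled-within covariance matrix with divisor N - G.

    rows:     list of equal-length lists of floats, one per individual
    clusters: list of cluster labels, same length as rows
    """
    n, p = len(rows), len(rows[0])
    by_cluster = defaultdict(list)
    for row, label in zip(rows, clusters):
        by_cluster[label].append(row)
    g = len(by_cluster)

    # Sum of squares and cross-products of deviations from cluster means.
    sscp = [[0.0] * p for _ in range(p)]
    for members in by_cluster.values():
        means = [sum(r[k] for r in members) / len(members) for k in range(p)]
        for r in members:
            dev = [r[k] - means[k] for k in range(p)]
            for a in range(p):
                for b in range(p):
                    sscp[a][b] += dev[a] * dev[b]

    return [[sscp[a][b] / (n - g) for b in range(p)] for a in range(p)]


# Hypothetical data: four individuals, two variables, two clusters.
rows = [[1.0, 2.0], [3.0, 6.0], [10.0, 1.0], [12.0, 3.0]]
clusters = [1, 1, 2, 2]
print(pooled_within_covariance(rows, clusters))
```

With n=762 and 108 clusters, as in the question, the divisor would be 654.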

Janet Holt posted on Thursday, February 19, 2004  8:44 am



In constructing a multilevel model in Mplus, I understand that random slopes need to be fixed. However, is it possible to model a cross-level interaction even with fixed slopes? In HLM this would be comparable to a gamma11 with no error term (u1j) in the equation.


Random slopes are allowed in Mplus. This is discussed in the Addendum to the Mplus User's Guide which can be found at www.statmodel.com under Product Support. 


I am developing a model of the relationship of executive functioning between parents and children. I want to do this in a latent variable framework, as I have a number of tests of executive functioning. So I was hoping to have a model where the latent variables executive functioning of the mom and executive functioning of the dad are regressed on executive functioning of the child, with Efmom and Efdad indicated by the tests measuring executive functioning in the moms and dads and Efchild indicated by the tests measuring executive function in the children. The problem I’m running into is that I have siblings in the sample, so when I develop the SEM like this, I am actually duplicating parents in the parent side of the model. This doesn’t seem right to me, but I can’t quite figure out how to put this into a modeling framework that makes sense. Thanks for your help, Jennie

bmuthen posted on Tuesday, December 07, 2004  11:31 am



You can handle this in 3 ways. First, you can use type = complex with cluster=family to get the right SEs and chi-square, taking the correlations within family into account. Second, you can do 2-level modeling with cluster=family, where family variables go on level 2 (between). Third, you can do multivariate modeling of all siblings jointly; see the Khoo-Muthen paper on the Mplus web site.


First I wanted to look at a CFA of this, so I tried the 2-level modeling using this syntax:
CLUSTER IS family;
BETWEEN = read56m name56m read56f name56f WCST56mo trlBresm towerfa towermo stopfa stopmo WCST56fa trlBresf;
WITHIN = word45rs colr45rs wcst45rs toh45rs ssrt45 trail45r;
ANALYSIS: TYPE = twolevel;
MODEL:
%BETWEEN%
speedmo by read56m name56m trala56m;
speedfa by read56f name56f trala56f;
EFmom by towermo stopmo WCST56mo trlBresm;
EFdad by towerfa stopfa WCST56fa trlBresf;
%WITHIN%
execfunc by toh45rs SSRT45 wcst45rs trail45r;
speed by word45rs colr45rs;
where all the variables with m or mo at the end are mother variables, those with f or fa at the end are father variables, and the rest are child variables. When I run this, I get the result that the intraclass correlations for all of the child variables are 0.000, although when I look at these with SAS proc mixed, they are not zero. So I assume I'm doing something wrong in my setup here? Second, I want to look at the mother and father (between variables), EFmom and EFdad, predicting the child (within variable) Execfunc. But I can't see how to do this in the model, because I need to specify between or within and this is both. How can I set this up properly? Thanks, Jennie

bmuthen posted on Saturday, December 11, 2004  6:00 pm



If you want between-level variation of the child variables, and hence a nonzero intraclass correlation, you should not put these variables on the Within list because that says they have zero between variance. You may then also add a between-level version of the exefunc factor: exefuncb by toh45rs-colr45rs; where you may find that you need to fix the residual variances at zero. You can then add the between-level statement: exefuncb on efmom efdad;

Anonymous posted on Tuesday, February 15, 2005  6:36 am



I am trying to save the within correlation matrix of a 2-level CFA. I am using this syntax: SAVEDATA: SAMPLE IS filename.dat; TYPE IS correlation; It does generate a data file, but it does not contain any data/correlation matrix (0 KB). The syntax mentioned above (FILE (SIGB) is filename; TYPE=CORRELATION;) does not work anymore. Also, the empty data file is saved in the WINDOWS registry; how do I specify a path? What am I doing wrong? ;) Thank you for any information.


The best thing to do with a question like this is to send your output and data to support@statmodel.com. 

Anonymous posted on Thursday, April 28, 2005  12:51 pm



Hello, My colleagues and I are working on a multilevel, multi-group analysis trying to confirm a specific model. I was looking at your document '6 steps for Two-Level SEM' and was wondering if we should be updating our model as we proceed through the steps even if we are doing a confirmatory type of analysis. Thank you for your help.

bmuthen posted on Thursday, April 28, 2005  6:33 pm



Tough question. Seems like the 6 steps are exploratory in nature; otherwise you would simply go straight to the last (confirmatory) step.

Anonymous posted on Thursday, May 05, 2005  6:34 am



Can Mplus do a three-level longitudinal model with cross-classifications of level 2 units at level 3? (Time is level 1, student level 2, teacher level 3; students change teachers.)


Yes, Mplus can have three levels when one of them is time.

bmuthen posted on Saturday, May 07, 2005  12:00 pm



Cross-classified random effects modeling is not yet available in Mplus.


Hello, I have problems getting a reasonable fit with my data. I am testing a path model where I take into account the fact that my data are clustered, so I use the 'TYPE=complex' command to get accurate SEs. However, when I test the same path model without the 'TYPE=complex' command, the fit (CFI, TLI and RMSEA) is much better. The modification indices do not give me good suggestions to improve my model. At this moment I have a CFI: 0.819, TLI: 0.764 and RMSEA: 0.182. According to the rules of thumb these values are not good enough. Is it possible that merely the specification of the clustered data is responsible for the lower fit? And do you have any suggestions (besides looking at the MIs, because these do not help) for how to improve the fit? Thank you

bmuthen posted on Tuesday, May 10, 2005  5:53 am



More information is needed to answer this. Typically, taking clustering into account (using type=complex) lowers the chi-square value in the test of model fit, at least if you have substantial intraclass correlations. What were your chi-square values without type=complex and with it? And what were your CFI, TLI, and RMSEA values without type=complex?


Taking clustering into account:
CFI=0.735 TLI=0.706 RMSEA=0.125 WRMR=2.323
Chi-square model fit=219.575 df=9
Chi-square model fit for the baseline model: 804.756 df=10
Without taking clustering into account:
CFI=0.821 TLI=0.742 RMSEA=0.190 WRMR=4.168
Chi-square model fit=990.507 df=18
Chi-square model fit for the baseline model: 5473.1354 df=26
I have continuous as well as categorical dependent variables. The estimator is WLSMV and I used theta parameterization. Does the estimator or the type of parameterization have something to do with the poor fit? Are these fit indices biased or is my model specified incorrectly (but as I said, the MIs do not give meaningful indications of what can be altered)? Thank you!

bmuthen posted on Tuesday, May 10, 2005  5:21 pm



It looks to me like the fit is not good with or without taking clustering into account. I think the CFI should be at least 0.96 and the RMSEA less than 0.05, for example. I would try to revise the model. But you claim that MIs don't help. You say you have a path model; perhaps you could make that just-identified by including all paths to see where your model goes wrong.
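As a side note on how the reported indices relate to the chi-square values, the textbook CFI and TLI formulas can be applied to the model and baseline chi-squares quoted in this exchange. This is a sketch of the standard definitions only. With WLSMV the chi-square and degrees of freedom are adjusted quantities, so it should not be read as a description of what Mplus does internally; RMSEA is left out because it needs the sample size, which is not given in the thread.

```python
def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index from model and baseline chi-square values."""
    numerator = max(chi2 - df, 0.0)
    denominator = max(chi2_base - df_base, chi2 - df, 0.0)
    return 1.0 if denominator == 0 else 1.0 - numerator / denominator


def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index from model and baseline chi-square values."""
    base_ratio = chi2_base / df_base
    return (base_ratio - chi2 / df) / (base_ratio - 1.0)


# Values quoted in the thread, with clustering taken into account.
print(round(cfi(219.575, 9, 804.756, 10), 3), round(tli(219.575, 9, 804.756, 10), 3))
# Values quoted in the thread, without taking clustering into account.
print(round(cfi(990.507, 18, 5473.1354, 26), 3), round(tli(990.507, 18, 5473.1354, 26), 3))
```

Both indices compare the model with its own baseline model, so the two runs (which have different baseline chi-squares and degrees of freedom) are not measured against a common yardstick.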

Anonymous posted on Saturday, August 06, 2005  9:07 pm



I am trying to replicate the Step 0 or basic.inp for example 9.8 using the Six Steps for Two-Level SEM. When I run the following syntax:
TITLE: test
DATA: FILE IS ex9.8.dat;
VARIABLE: NAMES ARE y1-y6 x1 x2 w clus;
USEVARIABLES ARE y1-y6;
CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL BASIC;
SAVEDATA: SAMPLE = spw.dat; SIGB = estsigb.dat; TYPE = CORR;
I get the following error message:
*** WARNING in Savedata command (Err#: 9) Error opening SAMPLE save file: spw.dat SAMPLE will not be saved.
*** WARNING in Savedata command (Err#: 9) Error opening SIGB save file: estsigb.dat SIGB will not be saved.
2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
Do you know why I would receive this error message? Thanks for your help.

bmuthen posted on Monday, August 08, 2005  1:20 pm



Please send this to support@statmodel.com. 

Mpduser1 posted on Sunday, November 06, 2005  6:31 pm



I had a question about the default error structure used in Mplus 3.13. I am attempting to build a model of the following form: x -> y1, x -> y2, x -> y3, (x, y1, y2, y3) -> z. 1. y1, y2, y3, and z all have random intercepts (with sufficient between-level variation); 2. I specify the "x -> y" coefficients as random (and I have sufficient between-level variation); 3. I specify the "x -> z" coefficients as random (and I have sufficient between-level variation). If I include no special statements in Mplus, Mplus assumes that the random coefficients for the random y's intercepts and the "x -> y" coefficients are correlated. But I generally have problems estimating the model if I assume that the residual variances for the "x -> y" and the "x -> z" random coefficients are correlated, and thus have to use "@0" to manually restrict these parameters' covariances. Shouldn't Mplus assume that the "x -> y" and the "x -> z" random coefficients' error structures are uncorrelated? On the one hand, I could see making an argument that these terms should be correlated (for example, if I was to write the above model out longhand), but I can rarely get such models to converge in practice.

BMuthen posted on Saturday, November 12, 2005  6:00 pm



The Mplus defaults can be overridden by using @0 as you say. It is sometimes the case that it is difficult to estimate a model where the covariance matrix for the random effects has many free elements. 

mpduser1 posted on Tuesday, March 14, 2006  8:35 am



I have a pair of dummy categorical variables I wish to use as predictors in a series of HLM and OLS models. The predictors are coded as "0,1" (i.e., 0=male, 1=female) in my sample. Does Mplus retain the dummy 0/1 coding, or use a 1/2, contrast, or other coding scheme?


Mplus retains the coding of predictors. 

Ramin Azad posted on Monday, July 17, 2006  12:48 am



I am wondering how we can compute the figures below from a SEM when we see that, for example, the structural model explains 41.9%, 31.9%, and 34.1%, respectively, of the variation in X, Y, Z. I mean, how can we get these figures from a SEM?


R-square is the explained variance divided by the total variance as in regular regression.
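The definition in this reply can be written out as a one-line calculation. The sketch below assumes that the total variance of a dependent variable and its residual (unexplained) variance are available from the output; the function name and the numbers are hypothetical and were chosen only so that the result matches the first percentage mentioned in the question.

```python
def r_square(total_variance, residual_variance):
    """Explained variance divided by total variance."""
    explained_variance = total_variance - residual_variance
    return explained_variance / total_variance


# Hypothetical values: total variance 2.0, residual variance 1.162.
print(round(r_square(2.0, 1.162), 3))
```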

Ramin Azad posted on Saturday, October 21, 2006  8:40 am



Hi, I have some questions: 1) I have to code a question which asks the respondents to give the number of new ideas that had been adopted by the organization in a period of time. Different firms have given different responses, for example, zero, 7, 8, five to ten, etc. What is the name of this scale? 2) When I want to find out the impact of a five-point Likert scale on the above scale, can I use a regression? If not, what should I do? Please accept my thanks in advance. Hamid

Ramin Azad posted on Saturday, October 21, 2006  8:43 am



Hi, I have another question. How can I compute multiple R? Does it mean that I have to raise the correlation to the power of 2? Then what is the difference between that and R2? Thanks Hamid


If the variable is scored as the number of ideas and is normally distributed, it can be treated as continuous. If you use categories like 5-10, then it would be a Likert scale. You can regress the variable on a Likert scale. R-square is not multiple R but I'm not sure what multiple R is. You should look it up in a textbook.

Ramin Azad posted on Sunday, October 22, 2006  2:07 am



Hi Linda, Thank you for your help. If the variable is scored as the number of ideas and is normally distributed, and I treat it as a continuous variable, then (1) can I regress it to find out the impact of a five-point Likert scale on this variable? Or (2) should I categorize the responses myself into, let's say, five categories and then regress it? (3) Which one is correct? (4) Should I use hierarchical regression? Thank you so much


If you regress a continuous variable on a Likert scaled variable, the Likert scaled variable is treated as a continuous variable. An alternative is to create a set of four dummy variables and use them as covariates. That would have to be your decision. Regular regression is sufficient. 
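The dummy-variable alternative mentioned in this reply can be sketched as follows: a five-point item yields four 0/1 indicators, with one category left out as the reference. The function name and the choice of category 1 as the reference are assumptions for illustration. The recoding itself would be done in the data file or with data transformation statements before the variables are used as covariates.

```python
def dummy_code(value, categories, reference):
    """Return 0/1 indicators for every category except the reference one."""
    return [1 if value == c else 0 for c in categories if c != reference]


likert_categories = [1, 2, 3, 4, 5]

# A response of 3 switches on the indicator for category 3 only.
print(dummy_code(3, likert_categories, reference=1))
# A response in the reference category has all four indicators at zero.
print(dummy_code(1, likert_categories, reference=1))
```

Each dummy coefficient is then a contrast between that category and the reference category, whereas entering the item as a single continuous covariate assumes equal spacing between categories.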


I have a question about cross-level interaction. I have a study design similar to the user's guide example 9.2: two-level regression analysis for a categorical dependent variable. Is it possible to include a cross-level interaction in this model? Thank you!


The model includes a cross-level interaction. It is the random slope.


I am not sure if I am asking something too basic about statistics or I didn't phrase my question correctly. I mean, more specifically, I was wondering if I could interact individual's race (level 1 var) and community’s SES status (level 2 var) in order to see the different effect of community's SES by ethnicity on the outcome? Or is this theoretically, or methodologically irrelevant? Thank you! 


Look at example 9.2. Race (ethnicity) is x and community SES is w. If you have a random slope on level 1 for y on x, and that slope is regressed on w on level 2, this means that the model includes the cross-level interaction term x*w (see multilevel textbooks on cross-level interactions). So it sounds to me that you want to do exactly what ex9.2 does.
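To see why a random slope regressed on w implies the product term x*w, the two levels can be written out and combined. The notation below follows the usual HLM convention (gamma coefficients, u residuals) rather than any Mplus output labels, and it is a sketch for a continuous outcome with a single level-1 covariate x and a single level-2 covariate w.

```latex
\begin{align}
y_{ij} &= \beta_{0j} + \beta_{1j} x_{ij} + r_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01} w_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} w_j + u_{1j} \\
y_{ij} &= \gamma_{00} + \gamma_{01} w_j + \gamma_{10} x_{ij}
          + \gamma_{11} w_j x_{ij} + u_{0j} + u_{1j} x_{ij} + r_{ij}
\end{align}
```

The coefficient \gamma_{11} in the combined equation is the cross-level interaction. Setting the variance of u_{1j} to zero keeps the interaction while removing the random part of the slope.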


Thank you for your response, yes, it makes perfect sense, but I'm not sure what happens if I have several community level variables and individual variables, out of which I am only interested in the interaction between Race and community SES. Am I able to model that way?? Maybe not in a random effect model, but in a random intercept model? Thank you! 


If you specify y ON x; you get a random intercept and a fixed slope. If you specify s | y ON x; you get a random intercept and a random slope.

Ramin Azad posted on Sunday, January 07, 2007  11:32 am



Hi, I am wondering if it is possible that the results of regression are different from SEM. If so, what are the reasons? For example, I have found nonsignificant relationships with regression, whereas SEM shows a very strong significant relationship with the same data! Kind regards Ramin


I would have to see the models you are comparing to answer this. Depending on the model, you might see different results. Please send your outputs and license number to support@statmodel.com. 

Ramin Azad posted on Tuesday, January 09, 2007  10:25 am



Hi, Thank you for your reply. I found my funny mistake. However, I have another question. How can I find R2 in the AMOS diagram or its results? Kind regards Ramin


This is a discussion board for Mplus. I would have no idea how to find something in Amos. You should contact Amos support. 

Mike Tobak posted on Sunday, June 10, 2007  9:15 am



Hi, Prof. Muthen, I have several questions about multilevel path analysis and Mplus. I am new to this field and Mplus. I am trying to analyze a two-level path analysis model with random slopes and binary level-2 covariates. I have 20 path coefficients to estimate for the within model and I have 30 clusters. Q1: I wonder how many random slopes I can estimate at level 2 if I only have 30 clusters. It seems that I cannot set all of the 20 path coefficients to be random, since I only have 30 clusters. I want to use TYPE=TWOLEVEL. Q2: Could you please tell me in which of your articles I can find the mathematics, derivations and related algorithms for this specific model? I am new to multilevel SEM, and I found piles of papers on the website (kind of lost). I wanted to start with the key paper for multilevel path analysis (no latent variables) with random slopes. Thank you for your time!


Q1. Each random slope is one dimension of integration, so estimating more than 4 becomes computationally very heavy. In our experience, slopes are most often not random. As a first step, you might consider looking at each regression in your path model separately to determine which slopes are random. Q2. I would start with looking at multilevel regression in, for example, Raudenbush and Bryk and path analysis in, for example, Bollen. Following are three relevant articles: Bauer et al. (2006). Psych Methods, 11, 142-163. Kenny et al. (2003). Psych Methods, 8, 115-128. Krull et al. Multivariate Behavioral Research, 36, 249-277.
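The remark that each random slope adds one dimension of integration can be made concrete with a small count. In a plain rectangular quadrature grid, the number of points at which the likelihood has to be evaluated is the number of points per dimension raised to the number of dimensions. The sketch below uses 15 points per dimension purely as an illustrative assumption; the setting actually used depends on the integration options chosen in the analysis.

```python
def quadrature_grid_size(n_random_slopes, points_per_dimension=15):
    """Number of grid points in a rectangular quadrature rule."""
    return points_per_dimension ** n_random_slopes


# Growth of the grid as random slopes are added (15 points per dimension assumed).
for n_slopes in (1, 2, 4, 20):
    print(n_slopes, quadrature_grid_size(n_slopes))
```

This exponential growth is one reason for testing the regressions one at a time and keeping only the slopes that show meaningful between-cluster variance.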

Mike Tobak posted on Wednesday, June 13, 2007  6:24 pm



Thank you! I will read the articles carefully!

Mike Tobak posted on Thursday, June 28, 2007  12:40 pm



Hi, Prof. Muthen, Thank you for your recommendations. I have read the articles and books. I wonder if you can tell me, besides the Mplus User's Guide, what is the key Mplus article that discusses including random slopes in a multilevel SEM. I know that maximum likelihood with numerical integration is used in Mplus, but I want to know more of the details and mathematics behind it. And it seems that the Mplus User's Guide doesn't provide many details in this particular area regarding how the estimates of a multilevel SEM with random slopes are obtained. Thank you for your time and help!!


We don't have a paper on random slopes per se, but this topic is included in the technical details of the Muthen-Asparouhov (2006) chapter for the forthcoming Chapman-Hall book, which is on our web site under Papers, within the growth mixture topic.


I have a fairly simple SEM model based on a finite sample size. That is, the sample consists of students, and I probably have over 10% of them in the sample. Given the problems that a finite sample can cause in estimating standard errors and fit statistics, I am wondering if the Type = Complex function will correct for this. My thinking is that the ill effects of the finite sample will show up in the estimation of the standard errors (and fit statistics) due to the potential nonindependence of the respondents. If I test for both the design effect and the intraclass correlation coefficient (Heck 2001; Muthén and Satorra 1995) and they show a high degree of nonindependence, then type = complex should work so long as I have reasonable clusters. Does that seem sound to you?


Mplus does not do anything special for finite samples. Using TYPE=COMPLEX would take the nonindependence of observations into account. 
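
For readers following along, here is a rough sketch of the design effect check mentioned in the question. It assumes the commonly used approximation, design effect = 1 + (average cluster size - 1) x ICC; the function name and the numbers are only illustrative, not something Mplus itself reports this way.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Approximate design effect: 1 + (average cluster size - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


# Hypothetical numbers: 25 students per cluster and an ICC of 0.10.
deff = design_effect(25, 0.10)
print(round(deff, 2))
```

The larger this value, the more the standard errors from an analysis that ignores clustering are likely to be understated, which is the non-independence that TYPE=COMPLEX is meant to account for.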

Joyce Kwan posted on Monday, May 05, 2008  3:39 am



Hi Dr Muthen, I have a sample of 482 (with 19 clusters, average cluster size = 25). I have 38 observed variables for 7 latent factors. I wonder if it is appropriate for me to fit a multilevel CFA model because when I run the model, I encounter the following error message: THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. A MATRIX COULD NOT BE INVERTED DURING THE H1 MODEL ESTIMATION. THE ESTIMATED BETWEEN COVARIANCE MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE. COMPUTATION COULD NOT BE COMPLETED. THE VARIANCE OF N4 APPROACHES 0. FIX THIS VARIANCE AND THE CORRESPONDING COVARIANCES TO 0, DECREASE THE MINIMUM VARIANCE, OR SPECIFY THE VARIABLE AS A WITHIN VARIABLE. THE H1 MODEL ESTIMATION DID NOT CONVERGE. SAMPLE STATISTICS COULD NOT BE COMPUTED. Is it because the number of clusters is too small that I encounter this problem when I run the analysis? Or is anything else suggested by the error message? Thanks. 


This may be because the number of your clusters is small. A minimum of 30-50 is recommended. When Version 5.1 is made available later today or early tomorrow, I suggest trying to run the analysis using Version 5.1. 

Joyce Kwan posted on Friday, May 09, 2008  1:11 am



Dear Dr Muthen, Thanks for your answer. I have some follow-up questions regarding my previous question. What is the sample size requirement for doing a multilevel CFA? You suggested a minimum of 30-50 clusters. How about the cluster size? Is there any literature on sample size requirements for multilevel modeling/multilevel CFA that I can refer to? Thanks. 


The necessary sample size depends on many factors and is best determined by doing a Monte Carlo simulation study. Joop Hox has written a lot about the number of clusters needed and cluster size. I would search for articles by him. 


Hello, I'm doing logistic regression analyses in Mplus with multiple groups. To say something about the change in the amount of variance explained, R2 change (R2 of model 2 minus R2 of model 1), between two logistic regression analyses, can I simply subtract the R2 scores (all predictors - just demographic predictors) from 2 different analyses predicting the same outcome? From other postings, it seems like I can't. If not, how would I answer such a question? Thank you very much in advance. 


I don't think R-square for logistic regression is accepted in all circles and I don't know of a test of the difference in two R-square values. I would try to answer the question in a different way. See the logistic regression literature to see how others do this; I think they work with likelihood ratio differences. 


Hello, I am trying to predict my dependent variable with race, poverty, and vars 1-5. Poverty & var5 are level 2 & all others are level 1. I am testing for a cross-level interaction between race and poverty (thus I indicate a random slope/fixed intercept). I'm unsure about how to include the other variables in the program since they should NOT have random slopes. Thanks for your help in advance. Analysis: Type=Random Twolevel; Model: %WITHIN% s | dep on race; dep ON var1 var2 var3 var4; %BETWEEN% dep s ON poverty; dep ON var5; Thanks, Tucker 


This looks correct. You use the | symbol for random effects. Without the | symbol, fixed effects are estimated. 

John Hipp posted on Friday, June 06, 2008  3:58 pm



Hi, this is probably a simple question, but I did not see the answer above. I'm trying to run a multilevel model where an individual-level measure affects y at the individual level, but its neighborhood-level equivalent (as a latent variable, not a summation) does also. If the data were swung wide, a simple version of my model would be (if there were four people in every neighborhood): fy by y1 y2 y3 y4; fx by x1 x2 x3 x4; fy on fx; y1 on x1; y2 on x2; y3 on x3; y4 on x4; What would my code look like if I used MPlus's multilevel capability? My guess is something like: %WITHIN% y on x; x on ; %BETWEEN% y on x; I'm guessing that regressing x on an intercept would give a random version of x? And I would not declare x as either a within or between variable. Is this giving me the model I want to estimate? Thanks much. 


You just need %WITHIN% y on x; %BETWEEN% y on x; Here, x gets decomposed into latent within and between parts if it isn't on either the Between= list or the Within= list. See the V5 UG ex 9.1, second part on pages 230-231. 
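
To make the idea of splitting x into a between part and a within part more concrete, here is a small sketch. Note the assumption: it uses the observed cluster mean as the between part, which is only an analogue of what is described above, where the between part is a latent variable rather than the observed mean. The function name and the data are hypothetical.

```python
from collections import defaultdict


def split_within_between(rows):
    """rows: list of (cluster_id, x) pairs.

    Returns (cluster_id, between_part, within_part) triples, where the
    between part is the observed cluster mean (an illustrative stand-in
    for the latent between component) and the within part is the
    deviation of x from that mean.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cluster_id, x in rows:
        sums[cluster_id] += x
        counts[cluster_id] += 1
    means = {cid: sums[cid] / counts[cid] for cid in sums}
    return [(cid, means[cid], x - means[cid]) for cid, x in rows]


# Two hypothetical neighborhoods.
data = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
for cid, between, within in split_within_between(data):
    print(cid, between, within)
```

Each observed value is the sum of its two parts, which is why the same variable name can appear in both the %WITHIN% and %BETWEEN% parts of the model.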


I am using Mplus to test multilevel mediation models. I am interested in testing relations among a binary treatment condition variable (manipulated between subjects), a continuous mediator (assessed repeatedly within subjects on a variable occasion schedule), and a categorical between-subjects treatment outcome (smoking vs. abstinent). I can estimate the treatment effects on the mediator in a two-level Mplus model in which the intercept and slope are random, but I cannot use the estimated intercept or slope variables as a predictor of a categorical outcome between subjects. The outcome is treated as an observed variable (a known-class variable), not a latent variable. Do I need to add the mixture model program to my base and multilevel program in order to fit such a model, or is there a way I can do this within the multilevel Mplus program? I have opted not to use a growth curve model due to the high number of within-subjects observations (around 50). Thank you in advance for your advice. 


Let me see if I interpret this correctly. It sounds like you consider 2-level data where level 1 is occasion and level 2 is individual - you do this as a 2-level, univ (actually bivariate) outcome model rather than as a 1-level multivariate outcome model due to having many time points. The ultimate outcome is a binary variable (smoking or not). It sounds like you are not considering a growth model but a path analysis model (tx -> mediator -> outcome). The way I hear you the random intercept and slope that you refer to are for the mediator regression on tx. If I am right so far, then you can specify a random intercept also for the binary outcome and on level 2 let that be predicted by the intercept and slope from the mediator regression. You don't need mixture. 


Thank you very much for your response. I failed to note that the model does include a growth component. I've pasted the syntax and error messages below. The variables are defined as follows: couns is the binary indicator of treatment group, reltime is the number of days since the stop-smoking day, ability is the mediator assessed repeatedly within subjects, and abst is the binary outcome at the end of treatment (and is not time varying). The model below does not work, but if I substitute a continuous outcome for abst or drop the CATEGORICAL statement the model runs and converges. Does this additional information change your advice at all? Thank you for your guidance! CATEGORICAL = abst; WITHIN = reltime; BETWEEN = couns abst; ANALYSIS: TYPE = TWOLEVEL RANDOM; MODEL: %WITHIN% slope | ability ON reltime; %BETWEEN% ability slope ON couns; abst ON couns ability slope; ERROR in MODEL command Observed variable on the right-hand side of a between-level ON statement must be a BETWEEN variable. Problem with: ABILITY ERROR The following MODEL statements are ignored: * Statements in the BETWEEN level: ABST ON ABILITY 


Please send your files and license number to support@statmodel.com. 


Dear Statmodelers, My model tries to explain the extent to which groups of individuals can fix errors in files. Each group is given one file and has a week to fix the errors in it. In multilevel terms, the file is at Level 2 and the individual members are at Level 1. I am interested in the extent to which individuals in the groups identify errors and fix them. (Note that in my context, the individuals typically make copies of the file and distribute it to each other and then compile their work). After the time period elapses, each group will have a file (at Level 2) that has more or fewer errors than it did originally. My model has the form:
Level 2 predictor:
- Number of errors in file at T1
Level 1 endogenous variables:
- Extent to which perceive errors
- Extent to which take actions to fix errors
- Extent of knowledge of file contents (moderator)
Level 2 outcome:
- Number of errors in file at T2
My model predicts that:
- The # of errors (Level 2) affects individuals' perceptions (Level 1), moderated by individual's knowledge (Level 1)
- At Level 1, perception -> action
- The level 1 variable (action) affects the level 2 outcome (errors in file)
I don't think I can do this in HLM. I wondered whether I might be able to do it with MPLUS. I hope you can help. 


How many groups do you have? Does each group get the same files? 


Hi Linda - thanks for getting back to me so fast! About 100 groups of 4 per group. I suspect that this is small. However, I am at the planning stage of my research and can adjust the numbers up if necessary. Also, I think I solved one of the problems with my design. In my model, I had conceptualized a Level 1 factor moderating the effect of a Level 2 factor on a Level 1 factor. I had thought that this could not be done in HLM. However, I now think that this is the same as a Level 2 factor moderating the effect of a Level 1 factor on another Level 1 factor, which of course is perfectly feasible in multilevel modeling. I still have a remaining problem, however, in that I have an "upward" effect from a Level 1 factor to a Level 2 outcome. I think that it is not possible to test this in HLM but I'm not sure about MPLUS. All the best, Andrew. 


Regarding your last paragraph, the way something like this can be accomplished in Mplus is that you use the random intercept (or mean) of the level 1 factor (which is assumed to have variation across level 2 as well) to predict the level 2 outcome. One remaining issue is that it seems that the file fixing task needs to be the same for all groups to do this 2-level modeling. 

Tianyi Yu posted on Thursday, October 09, 2008  6:51 am



Hi, Dr. Muthen: I want to test a multilevel model in which level 1 is a growth model. But at level 2, I want to use the intercept and slope as predictors of the other outcome variables. So, the model looks like: %within% s | y on time; %between% [s y]; s with y; z on s; z on y; I got a warning message as: In the MODEL command, the following variable is an x-variable on the BETWEEN level and a y-variable on the WITHIN level. This variable will be treated as a y-variable on both levels. Is it still the case in Mplus version 5 that one variable can only be either an x-variable or a y-variable on both levels? If I really want to test such a model, is the latent growth model (SEM) the better (or the only) way to do it? Thanks so much! Tianyi 


Yes, a variable must be treated as dependent or independent at both levels. This is just a warning so you realize that distributional assumptions are being made about the variable. 


I am trying to design a study for use with Mplus. New user so please bear with me. I am attempting to run it as a multilevel SEM, but when I originally designed the study, I had many more clusters than it seems I now have to work with. Is there a minimum number of clusters required for Mplus to run the model? I know that with a low level 2 sample size I will have limited statistical power in my level-two analysis and that for all practical purposes, I may have to forget about statistical significance at level two and instead focus on the magnitude of the relationships between latent variables. But my question is, can I use MPlus to run the analysis regardless if that's my preference, or will I have to find other software? Will I get an error message with too few clusters? Thanks in advance. 


It is recommended to have a minimum of 30-50 clusters for multilevel analysis. This is not specific to Mplus. I think the error message you refer to says you have more parameters than clusters. If you have further questions about this, please send your full output and license number to support@statmodel.com. 


Thank you. I am not yet at the point of having data to run, but will send in the output if at that point I encounter a problem. Just to clarify, the error message for more parameters than clusters...which is a problem likely to arise with my small cluster number...will this message pop up INSTEAD of an output, basically keeping me from running the analysis, or will the program just warn me of the problems with an error message while still providing the output for review? I just want to make sure that I can use your program for my analysis at all. Thanks again for the help. 


You will be warned of the problem. The analysis will not stop. 


Hello Linda, I'm trying to write syntax for a 2-1-1 mediation, with 3 separate factors (la, lm and lam) predicting t, in turn predicting the latent variable eg (that is indicated by e1 e2 e3 pe and cy). I am obviously confused as I get the following output. Can you help please, thanks M VARIABLE: NAMES ARE y s a g t e1 e2 e3 la lm lam pe cy; WITHIN = y a t e1 e2 e3 pe cy; BETWEEN IS la lm lam; CLUSTER IS s; CENTERING = GRANDMEAN (y a g t e1 e2 e3 pe cy); ANALYSIS: TYPE IS TWOLEVEL RANDOM; MODEL: %WITHIN% t; eg BY e1 e2 e3 pe cy; eg on t(b); %BETWEEN% egb by e1 e2 e3 pe cy; t eg; t on lam (a); t on la; t on lm; egb on t(b); MODEL CONSTRAINT: NEW(indb); indb=a*b; MODEL CONSTRAINT: OUTPUT: TECH1 TECH8 CINTERVAL; Example warnings: *** ERROR in MODEL command Within-level variables cannot be used on the between level. Within-level variable used: E1 *** ERROR in MODEL command Within-level variables cannot be used on the between level. etc., The following MODEL statements are ignored: * Statements in the BETWEEN level: T EG EGB BY E1 EGB BY E2 EGB BY E3 EGB BY PE EGB BY CY EGB ON T T ON LAM T ON LA T ON LM 


Variables on the WITHIN list cannot be used in the between part of the model. If you remove them from the WITHIN list, they can then be used in both parts of the model. 

Utkun Ozdil posted on Wednesday, December 01, 2010  11:47 am



Hi Drs Muthen, I've just begun learning MPlus to conduct multilevel structural modeling. Although I'm familiar with LISREL, MPlus and multilevel modeling are quite new for me. Unfortunately, I have to learn the two via books, articles, and manuals by myself... =( And I got confused at the very beginning as I read and read all these materials. In the course of learning I think I need some practical highlights about the framework of multilevel modeling or the step-by-step procedures to follow in data analysis (which would make it easier for me to handle the core issues)... I would appreciate your recommendations. And I have one more question: I have a large data set saved as a .sav file in SPSS. Does MPlus allow exporting a file such as that? (like LISREL?) Thanks... Utkun 


If you go to the website, you will find our course videos and handouts. You might find these useful. Chapter 9 of the user's guide contains many multilevel examples. Mplus reads only numeric ASCII files. 

Utkun Ozdil posted on Thursday, February 10, 2011  5:11 am



Hi, While I was watching the course videos about multilevel analysis I noticed that a variable (e.g. f1) is treated as an observed variable in the within part and the same variable is treated as a latent variable in the between part. I would appreciate it if you could explain the reason for that. Thanks... Utkun 


This latent variable decomposition is explained in the second part of Example 9.1. 


I have a longitudinal data set where time (5 time points) is nested in kids (3000) who are nested in classrooms (150) which are nested in schools (64). Does Mplus 6 allow me to run a 4-level multilevel model? 


No, you could do 3 levels if one of them is time. 

Nidhi Kohli posted on Thursday, September 08, 2011  10:38 am



I am trying to fit an SEM model on a 3-level nested dataset where n = 322. The dependent variable is an unordered categorical variable with 3 categories: 0, 1 & 2. I ran the model in Mplus and got the following messages: THE ESTIMATED WITHIN COVARIANCE MATRIX COULD NOT BE INVERTED. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 1900. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. My question is how I can address the above-mentioned issues in an effective way. Here is the Mplus code: VARIABLE: ... CATEGORICAL = alt_dep anx_att anx_diag dep_diag other_mh_diag; COUNT = anx_meds dep_meds other_mh_meds; WITHIN = ...; CLUSTER = rand_cid rand_mdid; ANALYSIS: TYPE = COMPLEX TWOLEVEL; ALGORITHM = INTEGRATION; MITERATIONS = 2000; PROCESSORS = 8; MODEL: %WITHIN% f1 BY anx_diag anx_meds anx_att; f2 BY dep_diag dep_meds phq2tot; f3 BY other_mh_diag other_mh_meds; f1@1 f2@1 f3@1; [anx_meds@0 anx_att$1@0 anx_diag$1@0]; ... f1 WITH f2; ... alt_dep ON f1 f2 f3; 


You say you have an unordered categorical variable but I don't see the NOMINAL option. 

Nidhi Kohli posted on Thursday, September 08, 2011  3:56 pm



Do you mean I should use the NOMINAL statement instead of CATEGORICAL under the ANALYSIS command? I was not aware that this can make a difference. Thank you. 


Yes, you should use NOMINAL. With CATEGORICAL, a variable with more than two categories is treated as an ordered categorical variable, which would be the wrong model. See the user's guide for further information. 

Nidhi Kohli posted on Friday, September 09, 2011  8:08 am



I changed to NOMINAL, however, I am still getting the same message, i.e., THE ESTIMATED WITHIN COVARIANCE MATRIX COULD NOT BE INVERTED. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 138. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. Here is the code: VARIABLE: ... NOMINAL = alt_dep anx_att anx_diag dep_diag other_mh_diag; COUNT = anx_meds dep_meds other_mh_meds; WITHIN = anx_att phq2tot anx_meds dep_meds other_mh_meds anx_diag dep_diag other_mh_diag; CLUSTER = rand_cid rand_mdid; ... ANALYSIS: TYPE = COMPLEX TWOLEVEL; ALGORITHM = INTEGRATION; MITERATIONS = 5000; PROCESSORS = 8; MODEL: %WITHIN% f1 BY anx_att#1 anx_diag#1 anx_meds; f2 BY phq2tot dep_diag#1 dep_meds; f3 BY other_mh_diag#1 other_mh_meds; f1 f2 f3; f1 WITH f2; ...; [anx_att@0 anx_diag@0 anx_meds@0]; ...; alt_dep#1 alt_dep#2 ON f1 f2 f3; 


Please send the full output and your license number to support@statmodel.com. 


Hi Linda and Bengt, I have what I hope is a somewhat simple question. I have seen each of you mention that the between level often doesn't support the same number of factors as the within level. I learned multilevel modeling in Mplus from Bob Vandenberg who also asserted the same thing. In one of his examples he has two factors at the within level that he collapses into a single inclusive factor on the between level. I am running into a similar situation. I have two work-unit climate constructs (service climate and psychological safety) that load nicely as independent constructs on the within level. But, after many different modeling attempts, it seems that I may be best served to collapse them into an omnibus unit-level climate factor at the between level. So, here is the question...is there a reference/paper/citation for this phenomenon (i.e., fewer factors on the between level)? If not, what justification/description would you include in a manuscript to convince reviewers that it is appropriate to model a different factor structure at the between level? 


There is a reference in the following paper to a paper by Harnqvist, Gustafsson, and Muthen that was forthcoming in Intelligence at that time. I think that discusses this issue. You can Google it. Muthén, B. (1994). Multilevel covariance structure analysis. In J. Hox & I. Kreft (eds.), Multilevel Modeling, a special issue of Sociological Methods & Research, 22, 376-398. 


Thank you, Linda. That was just what I needed. 

jenny posted on Thursday, January 19, 2012  7:59 pm



Hi, I am attempting to test a moderated-mediation model with the variables in the mediation (x, m, y) conceptually at the group level (level 2), and the moderating variable (z) conceptually at the individual level (level 1). Data for x, m, and y were collected through individual responses (using ICCs and rwgs to justify aggregation to the group level). The model depicts x (level 2) and z (level 1) interacting to affect m (level 2) and y (level 2). In other words, we are interested in testing a 2-way interaction between a Level 2 IV and a Level 1 moderator on a Level 2 DV, mediated through a Level 2 mediator. Is it possible to test such a model in MPlus? Thanks very much! 


That works in Mplus - you let the between-level component of z moderate the effects on the between level using XWITH since the between-level part of z is latent. 


Hello Dr. Muthen, I am just starting to familiarize myself with Multilevel SEM. I apologize for the very simple question, but to confirm, I should be constructing multilevel data sets in the "long" format correct? Is this the optimal way to analyze multilevel data in Mplus? Thanks for the help! 


We recommend using the wide format which is more flexible than the long format. See the examples in Chapter 6 of the user's guide. 


See chapter 9 for multilevel SEM. 


Hi, I have a data set where we ask participants about 3 important goals (so goal nested within person). I would like to test a model where x1 -> y1 -> y2 -> y3 -> y4, where all the variables are assessed at the goal (within) level. So I was wondering whether I should model the entire path on both the between and within level (like I would for a mediation analysis, following Preacher 2010), or only on the within level? When I tried to model it on both levels (specifying the same model on both), some of the paths are significant only on the between and not the within level, and I am not quite sure what that means conceptually? Also, if I wanted to control for a person-level (between) variable, how would that influence the model, and where would I put that in? Thank you, Marina 


I'm not sure what your cluster variable would be in this analysis. If you have measured all individuals on all variables, multivariate modeling takes into account non-independence of observations. 


The cluster variable is person - I have multiple goals for each respondent, and all my variables are assessed separately for each goal. 


What does your data set look like?
person y1 y2 y3 y4
or
person y1
person y2
person y3
person y4


My dataset looks like this:
person1 goal1 y1 y2 y3
person1 goal2 y1 y2 y3
person1 goal3 y1 y2 y3
person2 goal1 y1 y2 y3
person2 goal2 y1 y2 y3
...
By the way, would you prefer that I communicate directly with you by email about this rather than on the message board? Thank you, Marina 


Please send the input, data, and your license number to support@statmodel.com. 


Hello! I am working out some simple examples for a MLSEM workshop and wanted to demonstrate running a "Maximal" model (Hox, 2002), wherein I covary all within and between variables - which should just result in the ML covariance estimates at each level (as provided in the SAMPSTAT section). Syntax for this example: ANALYSIS: TYPE IS TWOLEVEL; MODEL: %WITHIN% Q1 WITH Q2-Q7; Q2 WITH Q3-Q7; Q3 WITH Q4-Q7; Q4 WITH Q5-Q7; Q5 WITH Q6-Q7; Q6 WITH Q7; %BETWEEN% Q1 WITH Q2-Q7; Q2 WITH Q3-Q7; Q3 WITH Q4-Q7; Q4 WITH Q5-Q7; Q5 WITH Q6-Q7; Q6 WITH Q7; What is stumping me is that I am getting a Model chisq value greater than 0. The df = 0, but chisq = 0.416 in this case. In the past, I have always obtained chisq=0. I removed Q1 and tried again (just to play) and got chisq=.2. Any ideas why chisq would not be zero? Thank you, Laura 


Please send the output to support@statmodel.com. 

Eva posted on Thursday, June 07, 2012  12:01 am



I want to build my BETWEEN model without predictors at the WITHIN level using the TWOLEVEL analysis type, to compare the paths of this BETWEEN-only model against the final model with both WITHIN and BETWEEN models. My binary outcome variable is measured at the WITHIN level. How would I specify a WITHIN model with no level-1 predictors that contains only the intercept that could be used as the outcome at the BETWEEN level? (So I want to do the %WITHIN% model as Y ON B0j; then specify my %BETWEEN% model as Y ON Z;) 


Just use %WITHIN% with no entries. A variance is not estimated for a binary variable and the threshold is declared in the between part of the model. 

Helen Zhao posted on Thursday, June 21, 2012  6:16 pm



Hi Linda, I'm running into an error like this: ERROR: One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement. I wonder why it happens? My data is like:
ID Cluster
1 1
2 1
3 2
4 3
5 3
6 3
7 3
8 4
9 4
Could you please help? thx!! 


The value of a between-level variable by definition is the same for each person in a cluster. This is what generates the message. 
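
If it helps to track down which clusters trigger this message, here is a small sketch of a check that can be run on the data before handing it to Mplus. It assumes the data have been read into a list of records; the variable name "size" and the values are made up for illustration.

```python
def clusters_with_variation(rows, cluster_key, between_key):
    """Return the cluster ids in which the between-level variable
    takes more than one value (in order of first detection)."""
    first_value = {}
    offending = []
    for row in rows:
        cid = row[cluster_key]
        value = row[between_key]
        if cid not in first_value:
            first_value[cid] = value
        elif value != first_value[cid] and cid not in offending:
            offending.append(cid)
    return offending


# Hypothetical data: "size" is meant to be a between-level variable,
# but it changes inside cluster 3.
records = [
    {"id": 1, "cluster": 1, "size": 20},
    {"id": 2, "cluster": 1, "size": 20},
    {"id": 3, "cluster": 2, "size": 35},
    {"id": 4, "cluster": 3, "size": 12},
    {"id": 5, "cluster": 3, "size": 13},
]
print(clusters_with_variation(records, "cluster", "size"))
```

Any cluster returned by the check has a value of the variable that differs across its members, so either the data need correcting or the variable does not belong on the BETWEEN list.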


Hi, I have a 3-wave longitudinal design where all variables are nested within individuals; therefore, I follow previous literature and analyze my data as a multilevel SEM model with Mplus. All variables are within-level (repeated) variables. It is important to repeat exactly the same paths at both the within and the between level of my model in the input. Most of the significant paths occur at the between level and I have to discuss that in the discussion of my paper. Are the following statements correct? "All paths were examined at both levels of analysis. A significant path at the within level means that at measurement times when the independent variable is high, the dependent variable is high too. At the between level, it means that if the aggregate level of the independent variable is high irrespective of time, the aggregate level of the dependent variable is also high." Thank you! Paris 


That sounds correct. 


Dear Linda, Greetings! I have a 64-bit machine and I am using MPLUS v7. When running Multilevel SEM, I got an error message "NOT ENOUGH MEMORY SPACE TO RUN THE PROGRAM ON THE CURRENT INPUT FILE. THE ANALYSIS REQUIRES 5 DIMENSIONS OF INTEGRATION RESULTING IN A TOTAL OF 0.75938E+06 INTEGRATION POINTS... " Much obliged if you could help. Many thanks! Magdalena 


Try using INTEGRATION = MONTECARLO (5000). 
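
To see why the number of dimensions matters so much here, a quick back-of-the-envelope sketch. It assumes 15 integration points per dimension, which is an inference from the message itself (15 to the 5th power is 759,375, matching the reported 0.75938E+06); the function name is only illustrative.

```python
def total_integration_points(dimensions: int, points_per_dimension: int = 15) -> int:
    """Total grid points when every dimension uses the same number of points.

    The default of 15 per dimension is an assumption inferred from the
    error message quoted above, not something taken from the input file.
    """
    return points_per_dimension ** dimensions


for dims in range(1, 6):
    print(dims, total_integration_points(dims))
```

The total multiplies by 15 with every added dimension, whereas the Monte Carlo setting suggested above works with the stated number of points (5000 here), which is far fewer than the full grid and so needs much less memory.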


Dear Linda, Thank you very much for your advice. Magdalena 


Dear Linda, I am testing a multilevel SEM model with 4 dependent within-level variables and 10 independent between-level variables. 4 of the between-level variables are control variables. The rest are 4 predictors (main effects) and their 2 interaction terms computed manually in SPSS. The only relationships that are specified at the within level are the intercorrelations between the DVs. All other (hypothesized) relationships are at the between level. I haven't been able to get any results yet. Even if I try to build a very simple model based on the above, I always get an error which, if I interpret it correctly, says that there are negative values in my data file, which is logical since I have standardized the predictors. An example of some of the errors I get: Invalid symbol in data file: "" at record #: 8, field #: 62 Best regards, Paris 


It sounds like the data may be in fixed format and you are reading it in free format. If you can't see the issue, send the output, data, and your license number to support@statmodel.com. 


Thank you Linda! I replaced commas with dots in the data file and it works now. There is another problem though, I now get: A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX [...] DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS. REDUCE THE NUMBER OF PARAMETERS. Indeed I have only 30 clusters. I have tried to reduce my parameters as much as I can but I still get the error. Can I ignore it or do I really have to reduce the parameters more? I have also tried estimation = BAYES because I have heard it works well with small samples. Then I do not get the error anymore but I am not sure how to interpret the findings. The "p values" are similar to what I was getting previously with the ML estimator. However, I also get "significance" based on the confidence intervals and this is non-significant for nearly all findings. Do I need to look at "p values" or at "significance" and its associated asterisk to interpret my findings? Best, Paris 


I would ignore the message as long as you don't have more than 30 parameters in the between part of the model. With Bayes, you look at the credibility intervals. 


Thank you Linda. I'm now reading Bengt Muthen's (2010) working paper on Bayes but there are a couple of things I am trying to clarify: 1. One of my interaction effects has a standardized estimate of -0.336 and a p value of 0.04 but the CI includes zero (-0.670 to 0.032). Do I have to report this interaction effect as significant or non-significant? 2. In the example of multilevel analysis in this article priors are specified but I do not have this information for my model. Can I run the model with the default without specifying priors? Best, Paris 


1. You should use the confidence interval. The p-value is a test of the parameter being positive. 2. Yes. 
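
A small numerical illustration of why the two summaries can disagree. This is only a sketch with made-up posterior draws; it assumes the p-value is one-tailed (the proportion of the posterior on the other side of zero from the estimate) and uses a simple percentile interval, which is not necessarily how Mplus computes its interval.

```python
def one_tailed_p(draws):
    """Smaller of the posterior proportions above and below zero."""
    n = len(draws)
    above = sum(1 for d in draws if d > 0) / n
    below = sum(1 for d in draws if d < 0) / n
    return min(above, below)


def percentile_interval(draws, level=0.95):
    """Simple percentile interval: cut (1 - level) / 2 off each tail."""
    ordered = sorted(draws)
    n = len(ordered)
    lower = ordered[int(n * (1 - level) / 2)]
    upper = ordered[min(n - 1, int(n * (1 + level) / 2))]
    return lower, upper


# Hypothetical posterior draws: mostly negative, with 4 of 100 above zero.
draws = [(i - 95) / 100 for i in range(100)]
p = one_tailed_p(draws)
low, high = percentile_interval(draws)
print(p, low, high)
```

With 4% of the draws above zero the one-tailed p-value is below 0.05, yet the 95% interval only trims 2.5% from each tail, so it still reaches past zero.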


Thank you! Paris 

kja posted on Thursday, April 18, 2013  9:42 am



Hello, I am trying to build a multilevel model in MPLUS but am running into some confusion. I have nine indicators that load on three latent factors, and I want to test whether these three latent factors predict minutes to relapse (DV). Measures are clustered within individuals because we examined the DV in two conditions: deprived and nondeprived. I would also like to examine whether one variable mediates any of these relationships (also measured in both conditions). And last, I would like to see whether deprivation moderates any of these relations. Can all of this be done in MPLUS, and if so is a multilevel model the best way to test these questions? I am trying to set up the most basic model (not accounting yet for mediation or moderation) with the following syntax: ... CLUSTER = sid; WITHIN = deprived; BETWEEN = shs cesd ad gdd gda aa asrs aqr audit; ANALYSIS: TYPE = TWOLEVEL; MODEL: %WITHIN% delay ON deprived; %BETWEEN% f1 BY shs cesd ad gdd; f2 BY gdd gda aa; f3 BY asrs aqr audit; delay ON f1 f2 f3; ... But with this, I get the following error: THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX etc. It seems like I need to change start values, but I wanted to make sure this was the proper setup first. Thank you very much in advance for your time. 


It does not sound like you need a multilevel model. When several variables are measured on each individual, multivariate modeling handles the nonindependence of observations. MODEL: delay ON deprived; f1 BY shs cesd ad gdd; f2 BY gdd gda aa; f3 BY asrs aqr audit; delay ON f1 f2 f3; 

kja posted on Thursday, April 18, 2013  12:10 pm



Hi Linda, Thank you very much - it sounds like I was making it a bit more complicated than necessary. In this model, the data would then be in wide format, correct? 


Yes, it would be in wide format. 

kja posted on Thursday, April 18, 2013  1:07 pm



Hello, When I switched the data back to wide format, I realized I am a little confused as to your response above. Participants have two 'delay' scores  one when nondeprived and one when deprived. I would like to test the overall main effect of the latent factors on the 'delay' score, as well as whether deprivation moderates these links. With the above model, MODEL: delay ON deprived; f1 BY shs cesd ad gdd; f2 BY gdd gda aa; f3 BY asrs aqr audit; delay ON f1 f2 f3; this only accounts for one delay score with data in wide format. Would I then need to take out the first statement and include both in the model: MODEL: f1 BY shs cesd ad gdd; f2 BY gdd gda aa; f3 BY asrs aqr audit; delay_nondeprived ON f1 f2 f3; delay_deprived ON f1 f2 f3; And then constrain the pathways to the two different delay scores to test for moderation? I have a relatively smaller sample size, so I am also trying to figure out how to maximize power and wasn't sure if using long format data with multilevel modeling would do this. Thank you again, and sorry for the confusion. 


Ok, so deprivation status does not define groups. Then I would go with your second MODEL, that is, you have 2 delay outcomes (so wide in that regard). And, yes, moderation can be thought of as the differences in their coefficients. 
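
A minimal sketch of that second setup, with labeled slopes so that the deprived and nondeprived coefficients can be compared. The names delay_nd and delay_d are hypothetical stand-ins for the two delay scores (Mplus limits variable names to 8 characters), and the labels and NEW parameters are illustrative only:

```
MODEL:
  f1 BY shs cesd ad gdd;
  f2 BY gdd gda aa;
  f3 BY asrs aqr audit;
  ! slopes for the nondeprived outcome (labels are hypothetical)
  delay_nd ON f1 (b1n);
  delay_nd ON f2 (b2n);
  delay_nd ON f3 (b3n);
  ! slopes for the deprived outcome
  delay_d ON f1 (b1d);
  delay_d ON f2 (b2d);
  delay_d ON f3 (b3d);
MODEL CONSTRAINT:
  ! each difference is the moderation effect of deprivation on one slope
  NEW(diff1 diff2 diff3);
  diff1 = b1d - b1n;
  diff2 = b2d - b2n;
  diff3 = b3d - b3n;
```

The estimate and standard error reported for each diff parameter then give a test of whether deprivation changes that slope.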


Hello, for my multilevel model with latent var x = level 1 (student) predictor and observed xa (aggregated x) = level 2 (classroom) predictor I have 2 questions: 1) %within% Mod indices suggest "xa with xa" and indicate that M.I. would be 310.898. a) What does this mean? (I guess I should free the variance of xa across classrooms, but is this not a default?) b) How can I change my model effectively to improve model fit? 2) With two latent predictors on level 1 (x1, x2) and level 2 (xa1, xa2) I want to test for interaction on level 1. When including the latent interaction term f | x1 XWITH x2 in the equation, the following error message appears "THE ESTIMATED BETWEEN COVARIANCE MATRIX COULD NOT BE INVERTED. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 194. CHANGE YOUR MODEL AND/OR STARTING VALUES." a) Is the problem that I cannot include latent interactions on level 1? b) How can I test them? c) If this is not the problem, what could be wrong? Thank you very much. I appreciate your helpful comments. 


1a) Not all MIs make substantive sense; ignore this one. 1b) The usual SEM rules apply: free parameters with large MIs when freeing them makes substantive sense. 2) Send files to support for diagnosis. 


Dear Bengt, thank you very much. I will send the files to mplus support. However, I would like to know if it is generally possible to include interaction terms (either observed or latent) only on the %within% or only on the %between% level? 


XWITH can appear on both within and between. 
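
As a rough illustration of a within-level latent interaction, here is a sketch under stated assumptions: the indicators u1-u6, the outcome y, and the cluster variable class are hypothetical names, and treating the indicators as within-only is a simplifying choice, not something taken from the poster's model. XWITH requires TYPE = RANDOM together with ALGORITHM = INTEGRATION:

```
VARIABLE:
  ! y, u1-u6, xa1, xa2 and class are hypothetical names
  USEVARIABLES = y u1-u6 xa1 xa2;
  CLUSTER = class;
  WITHIN = u1-u6;
  BETWEEN = xa1 xa2;
ANALYSIS:
  TYPE = TWOLEVEL RANDOM;
  ALGORITHM = INTEGRATION;
MODEL:
  %WITHIN%
  x1 BY u1-u3;
  x2 BY u4-u6;
  x1x2 | x1 XWITH x2;   ! latent interaction on the within level
  y ON x1 x2 x1x2;
  %BETWEEN%
  y ON xa1 xa2;
```

Each latent variable involved in the interaction adds a dimension of numerical integration, so a model like this can be slow to estimate.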

hogehoge posted on Friday, August 16, 2013  4:28 am



Hello, I ran the model below and got the following error messages. Model: %within% STRESS on DEMAND CONTROL; SLOPE | DEMAND on SEX; %between% STRESS on DEMAND; MSICKLR on JUN STRESS; SLOPE on JUN; *** ERROR in MODEL command Observed variable on the right-hand side of a between-level ON statement must be a BETWEEN variable. Problem with: DEMAND *** ERROR in MODEL command Observed variable on the right-hand side of a between-level ON statement must be a BETWEEN variable. Problem with: STRESS *** ERROR The following MODEL statements are ignored: * Statements in the BETWEEN level: STRESS ON DEMAND MSICKLR ON STRESS But when running the model without random slope, I didn't get such errors. Model: %within% STRESS on DEMAND CONTROL; DEMAND on SEX; %between% STRESS on DEMAND; MSICKLR on JUN STRESS; Is it impossible to use observed within-level variables on the right-hand side of a between-level ON statement? How can I change the model with random slope? Thank you. 


If all the variables are continuous/normal there is no problem with this model. Are you running version 7.11? 

hogehoge posted on Friday, August 16, 2013  8:13 pm



Thank you for your help. I am using version 7. STRESS, DEMAND and CONTROL are within-level continuous variables. MSICKLR is a between-level continuous variable. SEX is a within-level binary variable. JUN is a between-level binary variable. 


Please send the input, data, output, and your license number to support@statmodel.com. 


Remove these specifications: "SEX is a within-level binary variable. JUN is a between-level binary variable." These variables are covariates and don't need that. 


For the run that you sent to Support, all you have to do is to remove the unnecessary request for integration. 

hogehoge posted on Tuesday, August 20, 2013  12:41 pm



Dear Bengt, I'm sorry for the mishap. Thank you so much. 

hogehoge posted on Monday, September 09, 2013  9:02 pm



Hello, When using within-level independent variables as between-level dependent variables, I got the following warning messages. Model: %within% A on B C; B with C; %between% A on B C; B on D; C on E; *** WARNING in MODEL command In the MODEL command, the following variable is a y-variable on the BETWEEN level and an x-variable on the WITHIN level. This variable will be treated as a y-variable on both levels: B *** WARNING in MODEL command In the MODEL command, the following variable is a y-variable on the BETWEEN level and an x-variable on the WITHIN level. This variable will be treated as a y-variable on both levels: C Then I have a question. Is the within-level correlation between B and C a residual correlation? 


When we say B and C are treated as dependent variables, we mean that distributional assumptions are made about them. B WITH C is a covariance not a residual covariance. 

hogehoge posted on Tuesday, September 10, 2013  9:42 pm



Dear Linda, Thank you very much for your explanation. 


I am trying to develop a two-level model using school class as the cluster variable. However, I am getting the following error message: *** ERROR One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement. Between Cluster ID with variation in this variable Variable (only one cluster ID will be listed) I checked my data and the within cluster values are the same. Is there something else I should do to try and fix this? Thank you. 


Perhaps you are misreading your data, for example, having blanks in a free format data set. If you can't see the problem, please send the files and your license number to support@statmodel.com. 


Dear Drs Muthén, I am new to multilevel modeling with Mplus. I understood that an observed DV (I have one in my model) on the within level has a between-level counterpart, which is automatically created by Mplus. I would like to know what this counterpart is. In the Mplus user guide v7.0 it is written that "In the within part of the model, the ON statement describes the linear regression of y on the observed individual-level covariate x." (ex. 9.1, p. 262) and "In the between part of the model, the ON statement describes the linear regression of the random intercept y on the observed cluster-level covariates w and xm." a) Is the between counterpart the random intercept of y? b) Am I right in stating that on the between level my (averaged) IVs predict the random intercept of my (individually perceived) DV? I use average values as IVs in the between part of the model and list them in the between part of the syntax. I did not average my DV. Thanks in advance. Best, Rebecca 


a) Yes. You can think of it as the cluster mean of y. b) Yes, but IVs can also use the latent variable decomposition into within and between. See Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229. 
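
For orientation, a sketch of an input along the lines of the example the question quotes, with y regressed on x on the within level and the random intercept of y regressed on w and xm on the between level. The variable order in NAMES and the cluster variable name clus are assumptions for illustration:

```
VARIABLE:
  NAMES ARE y x w xm clus;   ! column order is an assumption
  WITHIN = x;                ! x varies only across individuals
  BETWEEN = w xm;            ! w and xm vary only across clusters
  CLUSTER = clus;
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  y ON x;        ! individual-level regression
  %BETWEEN%
  y ON w xm;     ! y here is the random intercept (between part) of y
```

Because y appears on neither the WITHIN nor the BETWEEN list, it is modeled on both levels, which is what creates its between-level counterpart.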

Megan Bell posted on Sunday, April 06, 2014  8:46 pm



Dear Drs Muthén, I am a student who is new to MLM in Mplus. I have received conflicting advice on whether I should be running a two- or three-level model for my analysis, and would appreciate your opinion. I am looking at the impact of child, parent, and neighbourhood characteristics on scores on a developmental outcome measure. My sample is a single population birth cohort (twins excluded), so there is only one child per family. I have 5 binary (yes/no) outcome variables, and 2 explanatory variables each for children, parents and neighbourhoods. One person has advised me to run a three-level model with children (L1) nested within families (i.e. parents; L2), nested within neighbourhoods (L3). Another person has advised me to run a two-level model, with children and parents (L1) nested within neighbourhoods (L2), the reasoning being that there is only one child per family, so parents cannot be on a separate level. I will be running multigroup models, to compare boys with girls. I have also been advised to run separate models for each outcome variable, rather than to include all outcome variables at level 1 and nest them within individuals. Your advice on the best way to build my model would be appreciated. Please let me know if anything is unclear. 


How many families and neighborhoods do you have? 

Megan Bell posted on Monday, April 07, 2014  7:26 am



Dear Linda, I have more than 20,000 families and around 80 neighbourhoods. Many thanks. 


I would do TWOLEVEL with children and parents nested in neighborhoods. When each cluster has only one observation, there is no ill effect of ignoring that clustering. I would run the theoretical model using all outcomes for boys and girls separately as a first step. 

Megan Bell posted on Monday, April 07, 2014  6:04 pm



Thank you Linda, appreciate your advice. 

Carolyn CL posted on Friday, October 24, 2014  10:11 am



Dear Drs. Muthen, I am running a TWOLEVEL SEM (N = 1285). I have a large number of clusters (N = 479) but few observations per cluster (Min = 1 (40%), Max = 16, Mean = 2.68). All measures are at the individual level, but the model takes into account potential school-based clustering (ICCs are mostly low, ranging from 0.5% to 13%, with one variable at 34%). All pathways are estimated at the within and between levels. IVs: 3 dummy variables (reflecting categories of SES), sex, age. DVs: 2 latent variables (one with 3 continuous indicators, the other with 5 categorical indicators), 2 categorical variables, 1 continuous variable. The model will not run, usually getting stuck during bivariate or univariate estimation. I tried variations in my modeling approach, such as only modeling factors on the within level (in CFA they appear not to fit on the between level), using cluster_mean to create between-level variables, and switching from ML to WLSMV. But nothing works; I always get an error message and no results. Much of the time, it appears that the problems are with bivariate estimation for my 3 dummy IVs (e.g. "SINGULAR INFORMATION MATRIX PROBLEM OCCURRED IN THE BIVARIATE ESTIMATION"). I am hoping you may be able to provide some assistance with why the model is not running. 


The error message you report seems to be when you use WLSMV. What happens when you use ML? I assume that you have first explored parts of the model and made those converge before putting it all together. 

Carolyn CL posted on Friday, October 24, 2014  1:27 pm



As per your recommendation, I estimated parts of the model using WLSMV to ensure that it ran at a basic level (i.e., each DV regressed on the 3 SES dummies, age and sex), and these all seemed to work fine. The next step, however, of adding a second DV and a structural component starts leading to issues, usually involving estimating the alpha, beta or psi for certain variables or associations. For example, I received the following, with results but no standard errors: NO CONVERGENCE. NUMBER OF ITERATIONS EXCEEDED. SLOW CONVERGENCE DUE TO PARAMETER 18. THE FIT FUNCTION DERIVATIVE FOR THIS PARAMETER IS 0.98714615D-02. The parameter is a Beta linking a categorical DV to another categorical DV. Trying to run these models instead using ML led to the following error message, with no results: Observed variable on the right-hand side of a between-level ON statement must be a BETWEEN variable. 

Mark Prince posted on Tuesday, February 10, 2015  7:36 am



Hello, I am trying to run a 2-1-1 MSEM with random slopes and I keep getting the following errors: *** ERROR in MODEL command Observed variable on the right-hand side of a between-level ON statement must be a BETWEEN variable. Problem with: HELPSTRAT *** ERROR The following MODEL statements are ignored: * Statements in the BETWEEN level: TOTLDRNKS ON HELPSTRAT Here is my code (below). CD1 and CD2 are level 2 variables; totldrnks and helpstrat are assessed at level 1. USEVARIABLES ARE ID CD1 CD2 TOTLDRNKS HELPSTRAT; BETWEEN = CD1 CD2; CLUSTER = ID; ANALYSIS: TYPE = TWOLEVEL RANDOM; Algorithm = integration; Integration = montecarlo; MODEL: %WITHIN% helpstrat totldrnks; sb1 | totldrnks on helpstrat; %BETWEEN% CD1 CD2 helpstrat totldrnks ; helpstrat on cd1 (a1); helpstrat on cd2 (a2); totldrnks on helpstrat (bb1); totldrnks on cd1; totldrnks on cd2; sb1 with helpstrat totldrnks; [sb1] (bw1); Model constraint: New(b1 indb1 indb2); b1 = bb1+bw1; indb1 = a1*b1; indb2 = a2*b1; Output: cinterval; 


This relates to today's posts with Falkenstrom. Because you define a random slope for helpstrat on Within, there is no latent variable decomposition of helpstrat into within and between parts as there is otherwise, so there is no between part of helpstrat to use as a predictor on Between in your statement: %BETWEEN% totldrnks on helpstrat (bb1); You have to create a cluster-level version of helpstrat, say using the Cluster_mean option. 
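
A sketch of how that could look for the model above, assuming the CLUSTER_MEAN function of the DEFINE command. The name hs_mean is hypothetical, putting helpstrat on the WITHIN list is an assumption rather than part of the original input, and the integration settings from the original input are left out:

```
VARIABLE:
  ! hs_mean is created in DEFINE, so it goes last on USEVARIABLES
  USEVARIABLES ARE cd1 cd2 totldrnks helpstrat hs_mean;
  CLUSTER = id;
  WITHIN = helpstrat;          ! assumption: helpstrat modeled on within only
  BETWEEN = cd1 cd2 hs_mean;
DEFINE:
  hs_mean = CLUSTER_MEAN(helpstrat);   ! observed cluster mean
ANALYSIS:
  TYPE = TWOLEVEL RANDOM;
MODEL:
  %WITHIN%
  sb1 | totldrnks ON helpstrat;
  %BETWEEN%
  hs_mean ON cd1 (a1);
  hs_mean ON cd2 (a2);
  totldrnks ON hs_mean (bb1);  ! between-level b path uses the cluster mean
  totldrnks ON cd1 cd2;
  [sb1] (bw1);
```

The MODEL CONSTRAINT block from the original input can stay as it is, since the labels a1, a2, bb1 and bw1 are unchanged.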


I am interested in the group-level factor analytic results. Just a few questions: 1. When I run a two-level CFA, should the between-group results be the same as or similar to the results when I run a single-level analysis BUT using the corrected between-group correlation matrix that Mplus generates as the data file (and specifying the correct ns at the group level)? 2. Mplus generates a between-group correlation matrix (using a type=basic twolevel specification). How is the corrected covariance matrix (as per Muthen [1994]) scaled into a correlation matrix in this instance (it is no longer the covariance divided by the product of the SDs)? 3. How different are the decompositions of the within and between correlation matrices from the WABA (within and between analysis) method described by Dansereau et al., 1984? (My understanding is that there is something off with the between correlation matrix computed using WABA.) Thanks again! 


1. Only if you have a large number of clusters. 2. Send the relevant output to support to show the difference you refer to. 3. I am not familiar with WABA. 

Yoosoo posted on Saturday, March 28, 2015  3:16 pm



Hello, I have a question regarding the multilevel latent covariate (MLC) model with binary outcome and a formative/aggregated Level 2 contextual variable. My data has a two-level structure (individuals within community) with low sampling ratio. The outcome is a binary variable (healthy/unhealthy). The independent variables include a binary variable at level 1 (HCARD, possession of health card) and a contextual variable at level 2 (% community population with health card). I applied TYPE = TWOLEVEL COMPLEX RANDOM with MLC by excluding the HCARD variable in the within/between variable section. Both my within/between models include regression of outcome on HCARD. I'm getting the following error: *** ERROR in MODEL command Unrestricted x-variables for analysis with TYPE=TWOLEVEL and ALGORITHM=INTEGRATION must be specified as either a WITHIN or BETWEEN variable. The following variable cannot exist on both levels: HCARD Do you have any suggestions on what may be wrong with my model? Also is my method of introducing MLC correct, for a formative aggregate L2 contextual variable (that is analogous to community gender composition)? 


I don't think you need TWOLEVEL and COMPLEX if you have individuals nested in community. Use only TWOLEVEL RANDOM with the cluster variable being community. You will need to create a cluster-level variable for HCARD to use on between. You can use the CLUSTER_MEAN option of the DEFINE command to do this. 


I have already completed a CFA at the individual level (teachers) but I need to look at the next level (teachers in schools) to determine if there is between-level and within-level variance. I am attempting to run a multilevel CFA using a number of resources. One article aligned with my research suggests that I first create within and between matrices and obtain ICC values, then run a confirmatory factor analysis on the within matrix. When I attempt this step (run the CFA referencing the within matrix) I get the error message *** ERROR Insufficient data in "WinCov.dat" Is there a resource you can recommend for me to double-check my language or to ensure that the file generated by the SAVEDATA option SAMPLE IS WinCov.dat; is complete? Thank you. 


If you have three levels, you can use TYPE=THREELEVEL. Please send the relevant files and your license number to support@statmodel.com. 

Yoosoo posted on Sunday, March 29, 2015  3:20 pm



Dear Linda, Thank you for your response. I have two follow-up questions. 1) I am trying to use the MLC approach as per the following paper: Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229. My understanding was that the MLC approach uses a single variable to represent both level 1 and level 2 influences. Does using the cluster_mean option apply to the MLC approach? 2) Sorry I did not explain it fully. My data were sampled using a stratification method. Would the COMPLEX option be suitable for my model? Thank you very much for your excellent support as always. You've been a great help to my research! 


1) The cluster_mean option is not using the latent between-variable approach of MLC. It is simply the observed cluster mean. The latent between-variable approach of MLC is not available with algorithm = integration. Algorithm = integration is needed with random slopes. 2) Yes. 

Yoosoo posted on Monday, March 30, 2015  1:39 pm



Thank you Bengt for the response. Removing RANDOM from my analysis did not resolve the problem. I believe this is because my outcome is a binary variable (as elaborated in my first post), which I believe requires algorithm=integration. Could you please suggest any other way to apply MLC to a binary outcome in Mplus? Thank you so much. 


Two alternatives: 2-level WLSMV or ML adding a factor behind the X. For 2-level WLSMV, see the UG ex 9.9 and the paper: Asparouhov, T. & Muthén, B. (2007). Computationally efficient estimation of multilevel high-dimensional latent variable models. Proceedings of the 2007 JSM meeting in Salt Lake City, Utah, Section on Statistics in Epidemiology. For ML, add a factor behind X: %Within% f BY x; x@0; %Between% fb BY x; x@0; The factors then capture the latent variable decomposition. 
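
Spelled out a little further, the ML alternative could be sketched as follows. This assumes a continuous x and a hypothetical outcome named y; the two regression statements are added for illustration and are not part of the reply above:

```
MODEL:
  %WITHIN%
  f BY x;      ! within-level factor behind x (loading fixed at 1 by default)
  x@0;         ! no residual variance, so f is the within part of x
  y ON f;      ! within-level effect of x on y (y is a hypothetical name)
  %BETWEEN%
  fb BY x;     ! between-level factor behind x
  x@0;         ! fb is the latent between part of x
  y ON fb;     ! between-level effect of x on y
```

With x left off both the WITHIN and BETWEEN lists, the two factors stand in for its within and between components.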

Yoosoo posted on Monday, April 06, 2015  9:31 am



Thank you for the response. I have a few follow-up questions. My x variable being decomposed is a binary variable. My cluster-level variable is the fraction of individuals with x=1. 1) Does the suggested ML code still apply, or does setting it as follows make sense (since x is binary)? %Within% f BY x@1; %Between% fb BY x; x@0; 2) Do I need to make further changes to the code since my cluster average x is a ratio and not an average (referred to as a "formative" aggregate in the following paper): Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229. Thank you. 


I would not get into latent variable decomposition with a binary covariate. 

Yoosoo posted on Monday, April 06, 2015  6:55 pm



Thank you Dr. Muthen for the response. I wonder if this is better suited for SEMNET, but would you help me understand why binary covariates may be better off without latent decomposition? Is it a theoretical issue, or more of a practical concern (computational)? My data has low sampling ratio (~0.15) and low number of LV.1 units/cluster. I understand that MLC may still be biased but I wanted to use it as a comparison to MMC. Thank you so much for your patient support as always. 


The latent variable decomposition assumes that the latent between and within parts are uncorrelated, normal variables. 
