Let me restate your question to be sure that I understand it. You have a factor on the cluster (between) level and you want to relate it to an observed variable on the individual (within) level. If this is what you mean, you would state this in the between part of the model as y ON f, for example. Every observed variable on the within level has a between level counterpart that is automatically created by Mplus.
Anonymous posted on Saturday, January 06, 2001 - 11:40 am
I inadvertently posted this question already under Hierarchical Regression, but it really belongs under this heading. I will repeat it here: If I have found a two-factor model with an EFA and have found it not to converge as a two-factor model in a multilevel framework, are there any cites that I could use to argue that the two-factor model found in the EFA is an artifact of the nested nature of the data?
I don't know of any cites related to this. I would first get the multilevel factor analysis model to converge, starting with the within level and then adding the between level. It is known that fewer factors are sometimes found on the between level than on the within level. See references in Muthen (1989, Psychometrika) on the website. It is certainly possible that one factor on the between level and two factors on the within level could give rise to the well-fitting two-factor EFA that you found. It is also possible that there are two factors on the between level and one on the within level. You should probably work more with the multilevel model before you draw any conclusions about artifactual results from the EFA.
Anonymous posted on Friday, February 02, 2001 - 12:32 am
Is it possible to estimate a multilevel model without every observed variable on the within level having a between level counterpart that is automatically created by Mplus?
No. We do plan to add 3-level in the future. It is among many planned additions.
Anonymous posted on Tuesday, November 13, 2001 - 6:09 pm
Hello -- I am learning Mplus so that I can estimate some multilevel path models, but I'm afraid I've gotten confused.
In a standard mixed regression model, you can estimate a level-1 regression where x_1 and x_2 predict y, and it is possible to get random components for the intercept and both regression parameters over level-2 units. However, as best as I can tell, in Mplus it is only possible to get a random intercept but NOT random slopes in the same situation.
Is there a straightforward way to understand why this is so? Is the answer to this the same reason why in a multilevel CFA in Mplus you can only get random intercepts in the indicators but the factor loading matrices are forced to be invariant across level-2 units?
You are correct that random slopes are not part of the Mplus multilevel model for cross-sectional data. Latent variable modeling has traditionally considered mean and covariance structure models. With random slopes, there is no one covariance structure, but the covariance structure changes for each covariate value. See, for example, the Raudenbush chapter in the Collins, Sayer book. In Version 3 of Mplus, random slopes for observed covariates will be included.
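To make the point about "no one covariance structure" concrete, here is a minimal sketch in Python. It assumes a single covariate x and the usual two-level model y = (g0 + u0) + (g1 + u1)*x + e. The variance components are hypothetical numbers chosen only for illustration, not taken from any model in this thread.

```python
def conditional_variance(x, tau00, tau01, tau11, sigma2):
    """Var(y | x) when both the intercept and the slope of x are random.

    tau00: intercept variance, tau11: slope variance,
    tau01: intercept-slope covariance, sigma2: level-1 residual variance.
    """
    return tau00 + 2.0 * x * tau01 + x * x * tau11 + sigma2


# Hypothetical variance components, for illustration only.
params = dict(tau00=1.0, tau01=0.2, tau11=0.5, sigma2=2.0)

for x in (0.0, 1.0, 2.0):
    # The variance of y changes with x as soon as tau11 (or tau01) is nonzero.
    print(x, conditional_variance(x, **params))
```

With tau01 and tau11 set to zero (random intercept only), the variance is the same for every x, which is the case a single covariance structure can describe.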
Anonymous posted on Sunday, March 10, 2002 - 12:13 pm
In the Step 4 (estimation of between structure) of the multilevel CFA model building procedure described in Muthen (1994) Sociological Methods and Research article, I am running into the following problem:
*** FATAL ERROR
THE SAMPLE COVARIANCE MATRIX COULD NOT BE INVERTED. THIS CAN OCCUR IF A VARIABLE HAS NO VARIATION, OR IF TWO VARIABLES ARE PERFECTLY CORRELATED, OR IF THE NUMBER OF OBSERVATIONS IS NOT GREATER THAN THE NUMBER OF VARIABLES. CHECK YOUR DATA. THIS PROBLEM IS DUE TO: VAR11
How can I understand which of these is causing the real problem? If the problem is due to only one variable as suggested, does that mean that variable has no variance in the Sb matrix? When I checked the ICC of that item, it is not very small (relative to other items in the analysis).
And, is it possible to use the Sb matrix in an exploratory factor analysis in Mplus to get an idea of the factor structure in the between level? Your help is much appreciated.
As stated in the article, this is a common problem. Are you analyzing Sb or SigmaB? You will probably have the same problems with both, but SigmaB is recommended. You can save this using SAVEDATA: FILE (SIGB) IS filename; The covariance matrix is saved by default. You can also save the correlation matrix in a separate run by stating FILE (SIGB) is filename; TYPE=CORRELATION;
You can see if any variables have zero variances by looking at the diagonal of the covariance matrix. You can see if any variables have correlations of one by looking at the correlation matrix. The sample size is the number of clusters. If you have more variables than the number of clusters, then you violate the last warning.
You can use the SigmaB correlation matrix in EFA with the ULS estimator. This is the default estimator.
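The three checks described above can be sketched outside Mplus once the matrix has been saved. The following Python function is only an illustration: the name diagnose, the tolerance, and the example matrix are assumptions, and the matrix is taken to be a symmetric list of lists read from the saved file.

```python
import math


def diagnose(cov, n_clusters, tol=1e-8):
    """List the conditions that can keep a covariance matrix from being inverted."""
    p = len(cov)
    problems = []
    # Check 1: the between-level sample size is the number of clusters.
    if n_clusters <= p:
        problems.append(f"{n_clusters} clusters is not greater than {p} variables")
    # Check 2: zero (or near-zero) variances on the diagonal.
    for i in range(p):
        if cov[i][i] <= tol:
            problems.append(f"variable {i + 1} has (near) zero variance")
    # Check 3: correlations of one (or beyond) between pairs of variables.
    for i in range(p):
        for j in range(i):
            if cov[i][i] > tol and cov[j][j] > tol:
                r = cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
                if abs(r) >= 1.0 - tol:
                    problems.append(f"variables {j + 1} and {i + 1} correlate {r:.3f}")
    return problems


# Hypothetical 3 x 3 matrix: variables 1 and 2 are perfectly correlated,
# and variable 3 has no variance.
example = [[1.0, 0.5, 0.0],
           [0.5, 0.25, 0.0],
           [0.0, 0.0, 0.0]]
for line in diagnose(example, n_clusters=9):
    print(line)
```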
Anonymous posted on Monday, March 11, 2002 - 12:15 pm
Thank you very much for your reply. I am using the SIGB matrix. None of the variables has zero variance, although some are very close to zero. The problem seems to be that some of the correlations are larger than 1.0. I am guessing these correlations are caused by the low item variances. Does this simply mean that there is not enough variance at the group level to model?
The EFA output says: THE INPUT SAMPLE CORRELATION MATRIX IS NOT POSITIVE DEFINITE. THE ESTIMATES GIVEN BELOW ARE STILL VALID.
I am not sure I understand why the estimates are still valid even though the matrix is not positive definite. Can I legitimately report these estimates in a manuscript? Is there any literature that explains why these estimates are considered valid? Thank you again for your help.
Correlations greater than one mean that the matrix is not positive definite, which is a common problem with the estimated sigma between matrix, as is mentioned in step 4 of the paper. It does not mean that there is low variance on the group level, but simply that the sigma between matrix is not well estimated.
EFA estimation using ULS does not depend on the correlation matrix being positive definite. This is just an informational warning. However, in your case with correlations greater than one, I would not trust the results. You may instead want to use the second alternative mentioned in step 4, to analyze the sample between matrix using ULS as an approximation to analyzing the sigma between matrix. You can use these results in the multilevel model.
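As a small illustration of why a correlation above one rules out positive definiteness, the sketch below attempts a Cholesky factorization, which succeeds only for positive definite matrices. It is plain Python written for this note, assumes a symmetric input, and is not how Mplus performs the check.

```python
import math


def is_positive_definite(m):
    """Return True if the symmetric matrix m admits a Cholesky factorization."""
    n = len(m)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    # A non-positive pivot means the matrix is not positive definite.
                    return False
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


print(is_positive_definite([[1.0, 0.5], [0.5, 1.0]]))  # proper correlation matrix
print(is_positive_definite([[1.0, 1.2], [1.2, 1.0]]))  # "correlation" of 1.2
```

For the second matrix the last pivot is 1 - 1.2^2 = -0.44, so the factorization stops and the function returns False.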
Anonymous posted on Monday, November 11, 2002 - 9:17 am
I'm a student studying multilevel models.
I'm looking for references on multilevel SEM analysis other than the Mplus manual.
Raudenbush, S.W. & Bryk, A.S. (2002). Hierarchical linear models: Applications and data analysis methods. Second edition. Newbury Park, CA: Sage Publications.
Snijders, T. & Bosker, R. (1999). Multilevel analysis. An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage Publications.
Following is a reference that uses Mplus in the analysis:
Heck, R. (2001). Multilevel modeling with SEM. In G.A. Marcoulides & R.E. Schumacker (eds.), New Developments and Techniques in Structural Equation Modeling (pp. 89-127). Lawrence Erlbaum Associates.
You can find other multilevel references at www.statmodel.com under References.
Yi-fu Chen posted on Tuesday, April 01, 2003 - 7:51 am
Hi, Dr. Muthen,
We are trying to run a two-level SEM model and encounter problems. We have 320 subjects nested within 9 counties. Five counties are in intervention group and four are in control group. (This means the treatment is in the county level). We now want to run a model with three latent constructs. Two latent constructs have four indicators each and one latent construct (intervention) has only one indicator. Here we treat intervention as a between-level variable, so we run the two-level model like this:
Between:
intven------>Eta2
Eta1----------^
Is this the right way to run this model? Also, we are a little confused about the sample size at the between level. In our case, is it right to say that we have 9 cases at the between level? Or is the model presented at the between level only the result of adjusting for the cluster effect?
Thank you for your help!
Yi-fu Chen posted on Wednesday, April 02, 2003 - 7:56 am
Hi, Dr. Muthen,
This is a follow-up question. We tried the model using complex sample. We have 9 clusters in the sample. In the model there are 11 observed variables, and we got the following message:
*** FATAL ERROR
THE SAMPLE BETWEEN COVARIANCE MATRIX COULD NOT BE INVERTED. THIS CAN OCCUR IF A VARIABLE HAS NO VARIATION, OR IF TWO VARIABLES ARE PERFECTLY CORRELATED, OR IF THE NUMBER OF CLUSTERS IS NOT GREATER THAN THE NUMBER OF VARIABLES. CHECK YOUR DATA. THE PROBLEM IS DUE TO:
NUMBER OF VARIABLES : 11 NUMBER OF CLUSTERS : 9
So, if we understand correctly, the error occurs because we have more variables than clusters. Does this mean that when running the complex sample model, we should have more clusters than variables? Do you have any suggestions for dealing with multilevel issues when the number of clusters is small?
I am asking someone with experience with a small number of clusters to answer your question. Fewer than 20 clusters makes the statistical analysis difficult.
booil jo posted on Thursday, April 03, 2003 - 8:50 am
Regarding Yi-fu Chen on Tuesday, April 01, 2003-
I think your model setup is correct given your cluster randomized trial situation and your research question. However, in your situation with only 9 clusters, I don't think it is a good idea to rely on the nonparametric standard errors provided when the COMPLEX command is used. Although simple, the sandwich estimator is known to yield anticonservative coverage probability (i.e., type I error rate higher than the nominal rate) with small numbers of clusters. If the number of clusters per condition is less than 10 (you only have 5 and 4 in each condition), the resulting sandwich estimator is very unreliable. See, for example, Jo et al. (2002) and Murray et al. (1998). To counter this limitation, several methods such as jackknife sandwich estimates (MacKinnon & White, 1985), adjustment using the t-distribution (Thornquist & Anderson, 1992), and adjustment considering the variance of the sandwich estimate (Kauermann & Carroll, 2001) have been suggested. As far as I know, modification procedures to counter anticonservative sandwich estimates are not embedded in the current version of Mplus. I wonder if the conventional model-based methods such as mixed effect ANOVA would do any better than the sandwich method in your situation. Another (even simpler) way to deal with this problem would be to treat the clusters as fixed (i.e., dummy covariates) and do regular fixed effect regression analysis. However, the results will be valid only under the strong assumption that the nesting structure is completely explained by these dummy covariates. For more explanation about the disadvantage of this regular regression approach, see Chapter 4 of Snijders & Bosker (1999).
Jo, B., Muthén, B., Ialongo, N.S., & Brown, C.H. (2002). Cluster randomized trials with nonadherence. Submitted for publication. Can be downloaded from Mplus website.
Kauermann, G., & Carroll, R. J. (2001). A note on the efficiency of sandwich covariance matrix estimation. Journal of the American Statistical Association, 96, 1387-1396.
MacKinnon, J. G., & White, H. (1985). Some heteroscedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29, 305-325.
Murray, D. M., Hannan, P. J., Wolfinger, R. D., Baker, W. L., & Dwyer, J.H. (1998). Analysis of data from group-randomized trials with repeat observations on the same groups. Statistics in Medicine, 17, 1581-1600.
Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. Thousand Oaks, CA: Sage.
Thornquist, M. D., & Anderson, G. L. (1992). Small sample properties of generalized estimating equations in group-randomized designs with Gaussian response. Paper presented at the annual meeting of American Public Health Association. Washington, D. C.
Anonymous posted on Monday, April 28, 2003 - 5:52 pm
I am attempting to build a 2-level structural model (n=762, number of clusters=108). I am trying to follow the 4 steps given in Muthen 1994 (multilevel covariance structure analysis). I am having a difficult time understanding how to accomplish the third step, in which one estimates the pooled within-group covariance matrix (with a sample size of total n minus the number of groups). Can you give some guidance on how to accomplish this in Mplus? Thank you.
In the SAVEDATA command, the FILE (SAMPLE) option will save the pooled-within covariance matrix. See page 90 of the Mplus User's Guide.
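For readers who want to see what the pooled-within matrix is, here is a minimal Python sketch of the computation: deviations from cluster means are cross-multiplied and divided by total n minus the number of clusters. The function name and the tiny data set are made up for illustration; in practice the FILE (SAMPLE) option does this for you.

```python
def pooled_within_cov(data, clusters):
    """Pooled-within covariance matrix with denominator N - G.

    data: list of rows (one list of values per individual)
    clusters: cluster identifier for each row, in the same order
    """
    groups = {}
    for row, c in zip(data, clusters):
        groups.setdefault(c, []).append(row)
    p = len(data[0])
    spw = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        means = [sum(r[k] for r in rows) / len(rows) for k in range(p)]
        for r in rows:
            d = [r[k] - means[k] for k in range(p)]  # deviation from own cluster mean
            for a in range(p):
                for b in range(p):
                    spw[a][b] += d[a] * d[b]
    denom = len(data) - len(groups)  # total n minus number of clusters
    return [[v / denom for v in row] for row in spw]


# Hypothetical data: 4 individuals, 2 variables, 2 clusters.
data = [[1.0, 2.0], [3.0, 6.0], [10.0, 1.0], [12.0, 5.0]]
clusters = [1, 1, 2, 2]
print(pooled_within_cov(data, clusters))
```

In the example the two clusters have very different means on the first variable, but that difference does not enter the result because each row is centered at its own cluster mean.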
Janet Holt posted on Thursday, February 19, 2004 - 8:44 am
In constructing a multilevel model in Mplus, I understand that random slopes need to be fixed. However, is it possible to model a cross-level interaction even with fixed slopes? In HLM this would be comparable to a gamma11 with no error term (u1j) in the equation.
I am developing a model of the relationship of executive functioning between parents and children. I want to do this in a latent variable framework, as I have a number of tests of executive functioning. So I was hoping to have a model where the latent variables executive functioning of the mom and executive functioning of the dad are regressed on executive functioning of the child, with Efmom and Efdad indicated by the tests measuring executive functioning in the moms and dads and Efchild indicated by the tests measuring executive function in the children. The problem I’m running into is that I have siblings in the sample, so when I develop the SEM like this, I am actually duplicating parents in the parent side of the model. This doesn’t seem right to me, but I can’t quite figure out how to put this into a modeling framework that makes sense.
Thanks for your help,
bmuthen posted on Tuesday, December 07, 2004 - 11:31 am
You can handle this in 3 ways.
First, you can use type = complex with cluster=family to get the right SEs and chi-square taking the correlations within family into account.
Second, you can do 2-level modeling with cluster=family, where family variables go on level 2 (between).
Third, you can do multivariate modeling of all siblings jointly - see the Khoo-Muthen paper on the Mplus web site.
First I wanted to look at a CFA of this, so I tried the 2-level modeling using this syntax:
CLUSTER IS family;
BETWEEN = read56m name56m read56f name56f WCST56mo trlBresm towerfa towermo stopfa stopmo WCST56fa trlBresf;
WITHIN = word45rs colr45rs wcst45rs toh45rs ssrt45 trail45r;
ANALYSIS: TYPE = twolevel;
MODEL:
%BETWEEN%
speedmo by read56m name56m trala56m;
speedfa by read56f name56f trala56f;
EFmom by towermo stopmo WCST56mo trlBresm;
EFdad by towerfa stopfa WCST56fa trlBresf;
%WITHIN%
execfunc by toh45rs SSRT45 wcst45rs trail45r;
speed by word45rs colr45rs;
where all the variables with m or mo at the end are mother variables, those with f or fa at the end are father variables, and the rest are child variables. When I run this, I get the result that the intraclass correlations for all of the child variables are 0.000, although when I look at these with SAS proc mixed, they are not zero. So I assume I'm doing something wrong in my setup here?
Second, I want to look at the mother and father (between variables), EFmom and EFdad, predicting the child (within variable) Execfunc. But I can't see how to do this in the model, because I need to specify between or within and this is both. How can I set this up properly?
bmuthen posted on Saturday, December 11, 2004 - 6:00 pm
If you want between-level variation of the child variables - and hence get a non-zero intraclass correlation - you should not put these variables on the Within list because that says they have zero between variance.
You may then also add a between-level version of the exefunc factor:
exefuncb by toh45rs-colr45rs;
where you may find that you need to fix the residual variances at zero.
You can then add the between level statement:
exefuncb on efmom efdad;
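Since the reply turns on what an intraclass correlation measures, here is a rough Python sketch of the classical one-way ANOVA estimate: between-cluster variance divided by between plus within variance. It assumes equally sized clusters and made-up data. Mplus reports its own estimates, so this is only meant to show the quantity, not to reproduce Mplus output.

```python
def anova_icc(groups):
    """One-way ANOVA estimate of the ICC for equally sized clusters.

    groups: list of clusters, each a list of scores of the same length.
    """
    g = len(groups)
    n = len(groups[0])
    grand = sum(sum(c) for c in groups) / (g * n)
    means = [sum(c) / n for c in groups]
    # Mean squares between and within clusters.
    msb = n * sum((m - grand) ** 2 for m in means) / (g - 1)
    msw = sum((y - m) ** 2 for c, m in zip(groups, means) for y in c) / (g * (n - 1))
    var_between = (msb - msw) / n
    return var_between / (var_between + msw)


# Hypothetical families of two siblings each.
families = [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]]
print(anova_icc(families))
```

Here the family means differ a lot relative to the spread within families, so the estimate is high (15/17). A variable declared on the Within list is modeled with no between variance at all, which is why its ICC is reported as zero.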
Anonymous posted on Tuesday, February 15, 2005 - 6:36 am
I am trying to save the within correlation matrix of a 2level CFA. I am using this syntax:
SAVEDATA: SAMPLE IS filename.dat; TYPE IS correlation;
It does generate a datafile, but it does not contain any data/correlation matrix (0 KB).
The syntax mentioned above (FILE (SIGB) is filename; TYPE=CORRELATION;) does not work anymore.
Also, the empty data file is saved in the WINDOWS registry, how do I specify a path?
Anonymous posted on Thursday, April 28, 2005 - 12:51 pm
Hello, my colleagues and I are working on a multi-level, multi-group analysis trying to confirm a specific model. I was looking at your document '6 steps for Two-Level SEM' and was wondering if we should be updating our model as we proceed through the steps even if we are doing a confirmatory type of analysis. Thank you for your help.
bmuthen posted on Thursday, April 28, 2005 - 6:33 pm
Tough question. Seems like the 6 steps are exploratory in nature - otherwise you would simply go straight to the last (confirmatory) step.
Anonymous posted on Thursday, May 05, 2005 - 6:34 am
Can MPlus do a three-level longitudinal model with cross-classifications of level 2 units at level 3? (time is level 1, student level 2, teacher level 3 -- students change teachers)
I am having trouble getting an acceptable fit with my data. I am testing a path model where I take into account the fact that my data are clustered, so I use the 'TYPE=complex' command to get accurate SEs. However, when I test the same path model without the 'TYPE=complex' command, the fit (CFI, TLI, and RMSEA) is much better. The modification indices do not give me good suggestions to improve my model. At this moment I have CFI: 0.819, TLI: 0.764, and RMSEA: 0.182. According to the rules of thumb these values are not good enough. Is it possible that merely the specification of the clustered data is responsible for the lower fit? And do you have any suggestions (besides looking at the MIs, because these do not help) on how to improve the fit?
More information is needed to answer this. Typically, taking clustering into account (using type=complex) lowers the chi-square value in the test of model fit, at least if you have substantial intraclass correlations. What were your chi-square values without type=complex and with it? And what were your CFI, TLI, and RMSEA values without type=complex?
With taking clustering into account:
CFI=0.735 TLI=0.706 RMSEA=0.125 WRMR=2.323
Chi square model fit=219.575 df=9
Chi square model fit for the baseline model: 804.756 df=10
Without taking clustering into account:
CFI=0.821 TLI=0.742 RMSEA=0.190 WRMR=4.168
Chi square model fit=990.507 df=18
Chi square model fit for the baseline model: 5473.1354 df=26
I have continuous as well as categorical dependent variables. The estimator is WLSMV and I used theta parameterization.
Does the estimator or the type of parameterization have something to do with the poor fit? Are these fit indices biased, or is my model specified incorrectly (but as I said, the MIs do not give meaningful indications of what can be altered)?
It looks to me like the fit is not good with or without taking clustering into account. I think the CFI should be at least 0.96 and the RMSEA less than 0.05, for example. I would try to revise the model. But you claim that MIs don't help. You say you have a path model - perhaps you could make that just-identified by including all paths to see where your model goes wrong.
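As a cross-check on the numbers posted above, the usual textbook formulas for CFI and TLI can be applied to the reported chi-square values. The Python sketch below uses those standard formulas (the function names are mine). Whether Mplus computes the indices in exactly this way for WLSMV is an assumption, although the results do match the posted values to three decimals. RMSEA is left out because it needs the sample size, which the post does not give.

```python
def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index from model and baseline chi-square values."""
    num = max(chi2 - df, 0.0)
    den = max(chi2 - df, chi2_base - df_base, 0.0)
    return 1.0 - num / den if den > 0 else 1.0


def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index from model and baseline chi-square values."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2 / df) / (ratio_base - 1.0)


# Values as posted: (model chi-square, df, baseline chi-square, baseline df).
with_clustering = (219.575, 9, 804.756, 10)
without_clustering = (990.507, 18, 5473.1354, 26)

for label, values in (("with", with_clustering), ("without", without_clustering)):
    print(label, round(cfi(*values), 3), round(tli(*values), 3))
```

Both indices compare the model to the baseline model, and the baseline chi-square drops sharply once clustering is taken into account (5473.1 to 804.8). That is one reason CFI and TLI can look worse with TYPE=COMPLEX even though the model chi-square itself is much smaller.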
Anonymous posted on Saturday, August 06, 2005 - 9:07 pm
I am trying to replicate the Step 0 or basic.inp for example 9.8 using the Six Steps for Two-Level SEM. When I run the following syntax:
TITLE: test
DATA: FILE IS ex9.8.dat;
VARIABLE: NAMES ARE y1-y6 x1 x2 w clus;
USEVARIABLES ARE y1-y6;
CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL BASIC;
SAVEDATA: SAMPLE = spw.dat;
SIGB = estsigb.dat;
TYPE = CORR;
I get the following error message:
*** WARNING in Savedata command (Err#: 9)
Error opening SAMPLE save file: spw.dat
SAMPLE will not be saved.
*** WARNING in Savedata command (Err#: 9)
Error opening SIGB save file: estsigb.dat
SIGB will not be saved.
2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
Do you know why I would receive this error message? Thanks for your help.
bmuthen posted on Monday, August 08, 2005 - 1:20 pm
Mpduser1 posted on Sunday, November 06, 2005 - 6:31 pm
I had a question about the default error structure used in Mplus 3.13.
I am attempting to build a model of the following form:
x --> y1 x --> y2 x --> y3 (x, y1, y2, y3) --> z
1. y1, y2, y3, and z all have random intercepts (with sufficient between-level variation);
2. I specify the "x --> y" coefficients as random (and I have sufficient between level random variation);
3. I specify the "x --> z" coefficients as random (and I have sufficient between-level variation).
If I include no special statements in Mplus, Mplus assumes that the random coefficients for the random y's intercepts and the "x --> y" coefficients are correlated.
But I generally have problems estimating the model if I assume that the residual variances for the "x --> y" and the "x --> z" random coefficients are correlated, and thus have to use "@0" to manually restrict these parameters' covariances.
Shouldn't Mplus assume that the "x --> y" and the "x --> z" random coefficients' error structures are uncorrelated ?
On the one hand, I could see making an argument that these terms should be correlated (for example, if I was to write the above model out longhand), but I can rarely get such models to converge in practice.
BMuthen posted on Saturday, November 12, 2005 - 6:00 pm
The Mplus defaults can be overridden by using @0 as you say. It is sometimes the case that it is difficult to estimate a model where the covariance matrix for the random effects has many free elements.
mpduser1 posted on Tuesday, March 14, 2006 - 8:35 am
I have a pair of dummy categorical variables I wish to use as predictors in a series of HLM and OLS models.
The predictors are coded as "0,1" (i.e., 0=male, 1=female) in my sample.
Does Mplus retain the dummy 0/1 coding, or use a 1/2, contrast, (or other) coding scheme ?
R-square is the explained variance divided by the total variance as in regular regression.
Ramin Azad posted on Saturday, October 21, 2006 - 8:40 am
I have some questions:
1) I have to compute a question which asks the respondents to give the number of new ideas that had been adopted by the organization in a period of time. Different firms have given different responses. for example, zero, 7, 8, five to ten, etc. What is the name of this scale?
2) When I want to find out the impact of a five-point Likert scale on the above scale, can I use regression? If not, what should I do?
Please accept my thanks in advance. Hamid
Ramin Azad posted on Saturday, October 21, 2006 - 8:43 am
Hi, I have another question. How can I compute multiple R? Does it mean that I have to raise the correlation to the power of 2? Then what is the difference between that and R2? Thanks, Hamid
If the variable is scored as the number of ideas and is normally distributed, it can be treated as continuous. If you use categories like 5 - 10, then it would be a Likert scale.
You can regress the variable on a Likert scale.
R-square is not multiple R but I'm not sure what multiple R is. You should look it up in a textbook.
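On the relation between the two quantities: in the standard textbook definition, multiple R is the correlation between the observed outcome and the values predicted by the regression, and R-square is its square. The Python sketch below illustrates this with one predictor and made-up data. The helper names are hypothetical and the least-squares fit is written out by hand to keep the example self-contained.

```python
import math


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson(a, b):
    """Pearson correlation between two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def r_square(y, y_hat):
    """Explained variance divided by total variance."""
    my = sum(y) / len(y)
    ss_total = sum((v - my) ** 2 for v in y)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, y_hat))
    return 1.0 - ss_resid / ss_total


# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
intercept, slope = fit_line(x, y)
y_hat = [intercept + slope * v for v in x]

multiple_r = pearson(y, y_hat)
print(multiple_r, multiple_r ** 2, r_square(y, y_hat))
```

The last two printed numbers agree, which is the sense in which R-square is the square of multiple R.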
Ramin Azad posted on Sunday, October 22, 2006 - 2:07 am
Thank you for your help. If the variable that is scored as the number of ideas is normally distributed and I treat it as a continuous variable, then (1) can I use regression to find out the impact of a five-point Likert scale on this variable?
(2) should I myself categorize the responses in let's say five categories and then regress it?
If you regress a continuous variable on a Likert scaled variable, the Likert scaled variable is treated as a continuous variable. An alternative is to create a set of four dummy variables and use them as covariates. That would have to be your decision. Regular regression is sufficient.
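The dummy-variable alternative mentioned above can be sketched as follows. This is a small Python illustration with a hypothetical helper name. It assumes the Likert item is coded 1 to 5 and that category 1 is chosen as the reference, which is an arbitrary choice.

```python
def dummy_code(value, categories=(1, 2, 3, 4, 5), reference=1):
    """Return 0/1 indicators for every category except the reference category."""
    if value not in categories:
        raise ValueError(f"unexpected category: {value}")
    return [1 if value == c else 0 for c in categories if c != reference]


# Five categories give four dummy variables; the reference category is all zeros.
for response in (1, 3, 5):
    print(response, dummy_code(response))
```

The four columns would then be used as covariates in place of the single Likert variable, so that each category gets its own contrast with the reference category rather than a single linear slope.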
I have a question about cross-level interaction. I have a study design similar to the user's guide example 9.2: two-level regression analysis for a categorical dependent variable. Is it possible to include a cross-level interaction in this model?
I am not sure if I am asking something too basic about statistics or I didn't phrase my question correctly. I mean, more specifically, I was wondering if I could interact individual's race (level 1 var) and community’s SES status (level 2 var) in order to see the different effect of community's SES by ethnicity on the outcome? Or is this theoretically, or methodologically irrelevant?
Look at example 9.2. Race (ethnicity) is x and community SES is w. If you have a random slope on level 1 for y on x, and that slope is regressed on w on level 2, the model includes the cross-level interaction term x*w (see multilevel textbooks on cross-level interactions). So it sounds to me that you want to do exactly what ex9.2 does.
Thank you for your response, yes, it makes perfect sense, but I'm not sure what happens if I have several community level variables and individual variables, out of which I am only interested in the interaction between Race and community SES. Am I able to model that way?? Maybe not in a random effect model, but in a random intercept model?
If you specify y ON x; you get a random intercept and a fixed slope. If you specify s | y ON x; you get a random intercept and a random slope.
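To see why regressing a slope on a level-2 variable amounts to a cross-level interaction, the two-level equations can be substituted into each other. The Python sketch below does this numerically. The coefficient names (g00, g01, g10, g11) follow common multilevel textbook notation and all values are hypothetical. Setting u1 to zero corresponds to a slope with no random part, that is, a cross-level interaction with a fixed slope.

```python
def y_two_level(x, w, g00, g01, g10, g11, u0=0.0, u1=0.0, e=0.0):
    """Outcome written as a level-1 equation with level-2 equations for its coefficients."""
    intercept = g00 + g01 * w + u0  # random intercept regressed on w
    slope = g10 + g11 * w + u1      # slope s regressed on w
    return intercept + slope * x + e


def y_combined(x, w, g00, g01, g10, g11, u0=0.0, u1=0.0, e=0.0):
    """The same model after substitution: g11 multiplies the product x*w."""
    return g00 + g01 * w + g10 * x + g11 * x * w + u0 + u1 * x + e


# Hypothetical coefficients and data values.
coefs = dict(g00=1.0, g01=0.5, g10=2.0, g11=-0.3)
print(y_two_level(1.5, 4.0, **coefs, u0=0.2, u1=0.1, e=-0.4))
print(y_combined(1.5, 4.0, **coefs, u0=0.2, u1=0.1, e=-0.4))
```

Both functions return the same value for any inputs, so the coefficient of w in the slope equation is the coefficient of the x*w interaction.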
Ramin Azad posted on Sunday, January 07, 2007 - 11:32 am
I am wondering if it is possible for the results of regression to differ from SEM. If so, what are the reasons? For example, I have found insignificant relationships with regression, whereas SEM shows a very strong significant relationship with the same data!
This is a discussion board for Mplus. I would have no idea how to find something in Amos. You should contact Amos support.
Mike Tobak posted on Sunday, June 10, 2007 - 9:15 am
Hi, Prof. Muthen,
I have several questions about multilevel path analysis and Mplus. I am new to this field and Mplus. I am trying to analyze a two-level path analysis model with random slopes, and binary level-2 covariates. I have 20 path coefficients to estimate for within model and I have 30 clusters.
Q1: I wonder how many random slopes I can estimate at level-2 if I only have 30 clusters. It seemed that I cannot set all of the 20 path coefficients to be random, since I only have 30 clusters. I want to use TYPE=TWOLEVEL.
Q2: Could you please tell me in which of your articles I can find the mathematics, derivations and related algorithms to this specific model? I am new to multilevel SEM, and I found piles of papers from the website (kind of lost). I wanted to start with the key paper for multilevel path analysis (no latent variables) with random slopes.
Q1. Each random slope is one dimension of integration so estimating more than 4 becomes computationally very heavy. In our experience, slopes are most often not random. As a first step, you might consider looking at each regression in your path model separately to determine which slopes are random.
Q2. I would start with looking at multilevel regression in, for example, Raudenbush and Bryk and path analysis in, for example, Bollen. Following are three relevant articles:
Bauer et al. (2006). Psych Methods, 11, 142-163. Kenny et al. (2003). Psych Methods, 8, 115-128. Krull et al. Multivariate Behavioral Research, 36, 249-277.
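The remark in Q1 about computational load can be made concrete with a little arithmetic: with a product quadrature grid, the number of points at which the likelihood is evaluated grows as a power of the number of integration dimensions. The Python sketch below assumes 15 points per dimension. Treat that figure as an assumption for illustration, since the actual setting depends on the INTEGRATION option used.

```python
def quadrature_nodes(dims, points_per_dim=15):
    """Number of grid points for a product quadrature rule over dims dimensions."""
    return points_per_dim ** dims


# One dimension of integration per random slope.
for dims in (1, 2, 4, 5):
    print(dims, quadrature_nodes(dims))
```

Under this assumption four random slopes already require 50,625 grid points per likelihood evaluation, and 20 random slopes would be far out of reach.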
Mike Tobak posted on Wednesday, June 13, 2007 - 6:24 pm
Thank you! I will read the articles carefully!
Mike Tobak posted on Thursday, June 28, 2007 - 12:40 pm
Thank you for your recommendations. I have read the articles and books. I wonder if you can tell me, besides Mplus User's Guide, what is the key article of Mplus talking about including random slopes in a multilevel SEM.
I know that the maximum likelihood with numerical integration is used in Mplus, but I want to know more details and mathematics behind it. And it seems that Mplus user's guide didn't provide too many details in this particular field regarding how to obtain the estimations of multilevel SEM with random slopes.
We don't have a paper on random slopes per se, but this topic is included in the technical details of the Muthen-Asparouhov (2006) chapter for the forthcoming Chapman-Hall book, which is on our web site under Papers, within the growth mixture topic.
I have a fairly simple SEM model based on a finite sample size. That is, the sample is students and I probably have over 10% of them in the sample. Given the problems that a finite sample can cause in estimating standard errors and fit statistics, I am wondering if the Type = Complex function will correct for this.
My thinking is that the ill effects of the finite sample will show up in the estimation of the standard errors (and fit statistics) due to the potential non-independence of the respondents. If I test for both the design effect and the intraclass correlation coefficient (Heck 2001; Muthén and Satorra 1995) and they show a high degree of non-independence, then type = complex should work so long as I have reasonable clusters.
I have a sample of 482 (with 19 clusters, average cluster size = 25). I have 38 observed variables for 7 latent factors. I wonder if it is appropriate for me to fit a multilevel CFA model, because when I run the model I encounter the following error message:
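The design effect mentioned above is commonly approximated as 1 + (average cluster size - 1) * ICC. The Python sketch below applies that approximation. The function names and input numbers are hypothetical, and the formula assumes roughly equal cluster sizes.

```python
def design_effect(avg_cluster_size, icc):
    """Approximate design effect for cluster samples with similar cluster sizes."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n, avg_cluster_size, icc):
    """Sample size of a simple random sample carrying about the same information."""
    return n / design_effect(avg_cluster_size, icc)


# Hypothetical values: 500 students in clusters of 25 with an ICC of 0.10.
print(design_effect(25, 0.10))
print(effective_sample_size(500, 25, 0.10))
```

Even a modest ICC produces a design effect well above 1 when clusters are large, which is the situation in which ignoring the clustering understates standard errors.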
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.
A MATRIX COULD NOT BE INVERTED DURING THE H1 MODEL ESTIMATION. THE ESTIMATED BETWEEN COVARIANCE MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE. COMPUTATION COULD NOT BE COMPLETED. THE VARIANCE OF N4 APPROACHES 0. FIX THIS VARIANCE AND THE CORRESPONDING COVARIANCES TO 0, DECREASE THE MINIMUM VARIANCE, OR SPECIFY THE VARIABLE AS A WITHIN VARIABLE. THE H1 MODEL ESTIMATION DID NOT CONVERGE. SAMPLE STATISTICS COULD NOT BE COMPUTED.
Is it because the number of clusters is too small that I encountered this problem when I ran the analysis? Or is it something else suggested by the error message?
This may be because the number of your clusters is small. A minimum of 30-50 is recommended. When Version 5.1 is made available later today or early tomorrow, I suggest trying to run the analysis using Version 5.1.
Thanks for your answer. I have some follow-up questions regarding my previous question. What is the sample size requirement for doing a multilevel CFA? You suggested a minimum of 30-50 clusters. How about the cluster size? Is there any literature on sample size requirements for multilevel modeling/multilevel CFA that I can refer to?
The necessary sample size depends on many factors and is best determined by doing a Monte Carlo simulation study. Joop Hox has written a lot about the number of clusters needed and cluster size. I would search for articles by him.
I'm doing logistic regression analyses in Mplus with multiple groups.
To say something about the change in the amount of variance explained, R2-change (R2-model2 minus R2-model1), between two logistic regression analyses, can I simply subtract the R2 values (all predictors minus just demographic predictors) from 2 different analyses predicting the same outcome?
From other postings, it seems like I can't. If not, how would I answer such a question?
I don't think R-square for logistic regression is accepted in all circles and I don't know of a test of the difference in two R-square values. I would try to answer my question in a different way. See the logistic regression literature to see how others do this where I think they work with likelihood ratio differences.
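A likelihood ratio difference test between two nested logistic models can be sketched as below. This Python example assumes plain maximum-likelihood loglikelihoods from nested models (robust estimators such as MLR need a scaling correction that is not shown here), and the loglikelihood values are invented. To stay within the standard library, the p-value uses the closed-form chi-square tail that holds only for even degrees of freedom.

```python
import math


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Chi-square difference test for nested models (even df only).

    df is the number of extra parameters in the full model.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    stat = 2.0 * (loglik_full - loglik_reduced)
    half = stat / 2.0
    # Upper tail of a chi-square with even df: exp(-x/2) * sum_{k<df/2} (x/2)^k / k!
    p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return stat, p_value


# Hypothetical loglikelihoods: demographics only vs. demographics plus 2 predictors.
stat, p_value = likelihood_ratio_test(loglik_reduced=-520.0, loglik_full=-510.0, df=2)
print(stat, p_value)
```

A small p-value would indicate that the added predictors improve the model beyond the demographic predictors, which answers the substantive question without comparing two R-square values.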
Hello, I am trying to predict my dependent variable with race, poverty, and var1-var5. Poverty and var5 are level 2 and all others are level 1. I am testing for a cross-level interaction between race and poverty (thus I indicate a random slope/fixed intercept). I'm unsure about how to include the other variables in the program since they should NOT have random slopes. Thanks for your help in advance.
Analysis: Type=Random Twolevel;
Model:
%WITHIN%
s | dep on race;
dep ON var1 var2 var3 var4;
%BETWEEN%
dep s ON poverty;
dep ON var5;
This looks correct. You use the | symbol for random effects. Without the | symbol, fixed effects are estimated.
John Hipp posted on Friday, June 06, 2008 - 3:58 pm
Hi- this is probably a simple question, but I did not see the answer above. I'm trying to run a multilevel model where an individual-level measure affects y at the individual level, but its neighborhood-level equivalent (as a latent variable, not a summation) does also. If the data were swung wide, a simple version of my model would be (if there were four people in every neighborhood): fy by y1 y2 y3 y4; fx by x1 x2 x3 x4; fy on fx; y1 on x1; y2 on x2; y3 on x3; y4 on x4;
What would my code look like if I used M-Plus's multilevel capability? My guess is something like: %WITHIN% y on x; x on ; %BETWEEN% y on x; I'm guessing that regressing x on an intercept would give a random version of x? And I would not declare x as either a within or between variable. Is this giving me the model I want to estimate? thanks much.
I am using Mplus to test multilevel mediation models. I am interested in testing relations among a binary treatment condition variable (manipulated between subjects), a continuous mediator (assessed repeatedly within subjects on a variable occasion schedule), and a categorical between-subjects treatment outcome (smoking vs. abstinent). I can estimate the treatment effects on the mediator in a two-level Mplus model in which the intercept and slope are random, but I cannot use the estimated intercept or slope variables as a predictor of a categorical outcome between subjects. The outcome is treated as an observed variable (a knownclass variable), not a latent variable. Do I need to add the mixture model program to my base and multilevel program in order to fit such a model, or is there a way I can do this within the multilevel Mplus program? I have opted not to use a growth curve model due to the high number of within-subjects observations (around 50). Thank you in advance for your advice.
Let me see if I interpret this correctly. It sounds like you consider 2-level data where level 1 is occasion and level 2 is individual - you do this as a 2-level, univariate (actually bivariate) outcome model rather than as a 1-level multivariate outcome model due to having many time points. The ultimate outcome is a binary variable (smoking or not). It sounds like you are not considering a growth model but a path analysis model (tx -> mediator -> outcome). As I understand you, the random intercept and slope that you refer to are for the mediator regression on tx. If I am right so far, then you can specify a random intercept also for the binary outcome and on level 2 let that be predicted by the intercept and slope from the mediator regression. You don't need mixture.
Thank you very much for your response. I failed to note that the model does include a growth component. I've pasted the syntax and error messages below. The variables are defined as follows: couns is the binary indicator of treatment group, reltime is the number of days since the stop-smoking day, ability is the mediator assessed repeatedly within subjects, and abst is the binary outcome at the end of treatment (and is not time varying).
The model below does not work, but if I substitute a continuous outcome for abst or drop the CATEGORICAL statement the model runs and converges. Does this additional information change your advice at all? Thank you for your guidance!
CATEGORICAL = abst;
WITHIN = reltime;
BETWEEN = couns abst;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:
%WITHIN%
slope | ability ON reltime;
%BETWEEN%
ability slope ON couns;
abst ON couns ability slope;
ERROR in MODEL command
Observed variable on the right-hand side of a between-level ON statement must be a BETWEEN variable. Problem with: ABILITY
ERROR
The following MODEL statements are ignored:
* Statements in the BETWEEN level:
ABST ON ABILITY
My model tries to explain the extent to which groups of individuals can fix errors in files. Each group is given one file and has a week to fix the errors in it. In multilevel terms, the file is at Level 2 and the individual members are at Level 1. I am interested in the extent to which individuals in the groups identify errors and fix them. (Note that in my context, the individuals typically make copies of the file and distribute it to each other and then compile their work). After the time period elapses, each group will have a file (at Level 2) that has more or less errors than it did originally.
My model has the form: Level 2 predictor: - Number of errors in file at T1
Level 1 endogenous variables: - Extent to which perceive errors - Extent to which take actions to fix errors - Extent of knowledge of file contents (moderator)
Level 2 outcome: - Number of errors in file at T2
My model predicts that: - The # of errors (Level 2) affects individuals perceptions (Level 1), moderated by individual's knowledge (Level 1)
- At Level 1, perception --> action
- The level 1 variable (action) affects the level 2 outcome (errors in file)
I don't think I can do this in HLM. I wondered whether I might be able to do it with MPLUS.
About 100 groups of 4 per group. I suspect that this is small. However, I am at the planning stage of my research and can adjust the numbers up if necessary.
Also, I think I solved one of the problems with my design. In my model, I had conceptualized a Level 1 factor moderating the effect of a Level 2 factor on a Level 1 factor. I had thought that this could not be done in HLM. However, I now think that this is the same as a Level 2 factor moderating the effect of a Level 1 factor on another Level 1 factor, which of course is perfectly feasible in multilevel modeling.
I still have a remaining problem, however, in that I have an "upward" effect from a Level 1 factor to a Level 2 outcome. I think that it is not possible to test this in HLM but I'm not sure about MPLUS.
Regarding your last paragraph, the way something like this can be accomplished in Mplus is that you use the random intercept (or mean) of the level 1 factor (which is assumed to have variation across level 2 as well) to predict the level 2 outcome.
One remaining issue is that it seems that the file fixing task needs to be the same for all groups to do this 2-level modeling.
Tianyi Yu posted on Thursday, October 09, 2008 - 6:51 am
Hi, Dr. Muthen:
I want to test a multilevel model in which the level 1 is a growth model. But at the level 2, I want to use the intercept and slope as predictors of the other outcome variables.
So, the model looks like:
s | y on time;
%between% [s y]; s with y; z on s; z on y;
I got a warning message as:
In the MODEL command, the following variable is an x-variable on the BETWEEN level and a y-variable on the WITHIN level. This variable will be treated as a y-variable on both levels.
Is that still the case in Mplus version 5 that one variable can only be either x-variable or y-variable on both levels?
If I really want to test such kind of model, is the latent growth model (SEM) the better (or the only way) to do it?
I am trying to design a study for use with Mplus. New user so please bear with me.
I am attempting to run it as a multilevel SEM, but when I originally designed the study, I had many more clusters than it seems I now have to work with. Is there a minimum number of clusters required for Mplus to run the model? I know that with a low level 2 sample size I will have limited statistical power in my level-two analysis and that for all practical purposes, I may have to forget about statistical significance at level-two and instead focus on the magnitude of the relationships between latent variables. But my question is, can I use MPlus to run the analysis regardless if that's my preference, or will I have to find other software? Will I get an error message with too few clusters? Thanks in advance.
It is recommended to have a minimum of 30-50 clusters for multilevel analysis. This is not specific to Mplus. I think the error message you refer to says you have more parameters than clusters. If you have further questions about this, please send your full output and license number to firstname.lastname@example.org.
Thank you. I am not yet at the point of having data to run, but will send in the output if at that point I encounter a problem. Just to clarify, the error message for more parameters than clusters...which is a problem likely to arise with my small cluster number...will this message pop up INSTEAD of an output, basically keeping me from running the analysis, or will the program just warn me of the problems with an error message while still providing the output for review? I just want to make sure that I can use your program for my analysis at all. Thanks again for the help.
Hello Linda, I'm trying to write syntax for a 2-1-1 mediation, with 3 separate factors (la, lm and lam) predicting t, which in turn predicts the latent variable eg (indicated by e1 e2 e3 pe and cy). I am obviously confused, as I get the following output. Can you help please? Thanks, M
VARIABLE:
NAMES ARE y s a g t e1 e2 e3 la lm lam pe cy;
WITHIN = y a t e1 e2 e3 pe cy;
BETWEEN IS la lm lam;
CLUSTER IS s;
CENTERING = GRANDMEAN (y a g t e1 e2 e3 pe cy);
ANALYSIS:
TYPE IS TWOLEVEL RANDOM;
MODEL:
%WITHIN%
t;
eg BY e1 e2 e3 pe cy;
eg on t(b);
%BETWEEN%
egb by e1 e2 e3 pe cy;
t eg;
t on lam (a);
t on la;
t on lm;
egb on t(b);
MODEL CONSTRAINT:
NEW(indb);
indb=a*b;
MODEL CONSTRAINT:
OUTPUT:
TECH1 TECH8 CINTERVAL;
Example warnings:
*** ERROR in MODEL command
Within-level variables cannot be used on the between level. Within-level variable used: E1
*** ERROR in MODEL command
Within-level variables cannot be used on the between level.
etc.
The following MODEL statements are ignored:
* Statements in the BETWEEN level:
T EG EGB BY E1 EGB BY E2 EGB BY E3 EGB BY PE EGB BY CY EGB ON T T ON LAM T ON LA T ON LM
Variables on the WITHIN list cannot be used in the between part of the model. If you remove them from the WITHIN list, they can then be used in both parts of the model.
Utkun Ozdil posted on Wednesday, December 01, 2010 - 11:47 am
Hi Drs Muthen,
I've just begun learning MPlus to conduct multilevel structural modeling. Although I'm familiar with LISREL, MPlus and multilevel modeling are quite new for me. Unfortunately, I have to learn the two via books, articles, and manuals by myself... =( And I got confused at the very beginning as I read and read all these materials.
In the course of learning I think I need some practical highlights about the framework of multilevel modeling or the step-by-step procedures to follow in data analysis (which would make me easily handle the core issues)... I would appreciate your recommendations.
And I have one more question: I have a large data set saved as .sav file in SPSS. Does MPlus allow exporting a file such as that? (like LISREL?)
If you go to the website, you will find our course videos and handouts. You might find these useful. Chapter 9 of the user's guide contains many multilevel examples.
Mplus reads only numeric ASCII files.
Utkun Ozdil posted on Thursday, February 10, 2011 - 5:11 am
While I was watching the course videos about multilevel analysis I noticed that a variable (e.g. f1) is treated as an observed variable in the within part and the same variable is treated as a latent variable in the between part. I would appreciate if you explain the reason for that.
I have a longitudinal data set where time (5 time points) is nested in kids (3000) who are nested in classroom (150) which are nested in schools (64). Does Mplus6 allow me to run a 4-level multi-level model?
Nidhi Kohli posted on Thursday, September 08, 2011 - 10:38 am
I am trying to fit an SEM model on a 3-level nested dataset where n = 322. The dependent variable is unordered categorical variable with 3 categories: 0, 1 & 2. I ran the model in Mplus and got the following messages:
THE ESTIMATED WITHIN COVARIANCE MATRIX COULD NOT BE INVERTED. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 1900. CHANGE YOUR MODEL AND/OR STARTING VALUES.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.
My question is how I can address the above-mentioned issues effectively. Here is the Mplus code:
VARIABLE: ...
CATEGORICAL = alt_dep anx_att anx_diag dep_diag other_mh_diag;
COUNT = anx_meds dep_meds other_mh_meds;
WITHIN = ...;
CLUSTER = rand_cid rand_mdid;
ANALYSIS:
TYPE = COMPLEX TWOLEVEL;
ALGORITHM = INTEGRATION;
MITERATIONS = 2000;
PROCESSORS = 8;
MODEL:
%WITHIN%
f1 BY anx_diag anx_meds anx_att;
f2 BY dep_diag dep_meds phq2tot;
f3 BY other_mh_diag other_mh_meds;
f1@1 f2@1 f3@1;
[anx_meds@0 anx_att$1@0 anx_diag$1@0];
...
f1 WITH f2;
...
alt_dep ON f1 f2 f3;
Hi Linda and Bengt, I have what I hope is a somewhat simple question.
I have seen each of you mention that the between level often doesn't support the same number of factors as the within level. I learned multilevel modeling in Mplus from Bob Vandenberg who also asserted the same thing. In one of his examples he has two factors at the within level that he collapses into a single inclusive factor on the between level.
I am running into a similar situation. I have two work-unit climate constructs (service climate and psychological safety) that load nicely as independent constructs on the within level. But, after many different modeling attempts, it seems that I may be best served to collapse them into an omnibus unit-level climate factor at the between level.
So, here is the question...is there a reference/paper/citation for this phenomenon (i.e., fewer factors on the between level)?
If not, what justification/description would you include in a manuscript to convince reviewers that it is appropriate to model a different factor structure at the between level?
jenny posted on Thursday, January 19, 2012 - 7:59 pm
Hi, I am attempting to test a moderated-mediation model with the variables in the mediation (x, m, y) conceptually at the group level (level 2). The moderating variable (z) is conceptually at the individual level (level 1). Data for x, m, and y were collected through individual responses (using ICCs and rwgs to justify aggregation to the group level). The model depicts x (level 2) and z (level 1) interacting to affect m (level 2) and y (level 2).
In other words, we are interested in testing a 2-way interaction between a Level 2 IV and a Level 1 moderator on a Level 2 DV, mediated through a Level 2 mediator.
I am just starting to familiarize myself with Multilevel SEM. I apologize for the very simple question, but to confirm, I should be constructing multilevel data sets in the "long" format correct? Is this the optimal way to analyze multilevel data in Mplus? Thanks for the help!
Hi, I have a data set where we ask participants about 3 important goals (so goal nested within person). I would like to test a model where X1-->y1-->y2-->y3-->Y4, where all the variables are assessed at the goal (within) level. So I was wondering whether I should model the entire path on both the between and within level (like I would for a mediation analysis, following Preacher 2010), or only on the within level? When I tried to model it on both levels (specifying the same model on both), some of the paths are significant only on the between and not the within level, and I am not quite sure what that means conceptually? Also, if I wanted to control for a person-level (between) variable, how would that influence the model, and where would I put that in?
Hello! I am working out some simple examples for a ML-SEM workshop and wanted to demonstrate running a "Maximal" model (Hox, 2002), wherein I covary all within and between variables -- which should just result in the ML covariance estimates at each level (as provided in the SAMPSTAT section).
Syntax for this example: ANALYSIS: TYPE IS TWOLEVEL; MODEL: %WITHIN% Q1 WITH Q2-Q7; Q2 WITH Q3-Q7; Q3 WITH Q4-Q7; Q4 WITH Q5-Q7; Q5 WITH Q6-Q7; Q6 WITH Q7;
%BETWEEN% Q1 WITH Q2-Q7; Q2 WITH Q3-Q7; Q3 WITH Q4-Q7; Q4 WITH Q5-Q7; Q5 WITH Q6-Q7; Q6 WITH Q7;
What is stumping me is that I am getting a Model chi-sq value greater than 0. The df = 0, but chi-sq = 0.416 in this case. In the past, I have always obtained chi-sq=0. I removed Q1 and tried again (just to play) and got chi-sq=.2.
Any ideas why chi-sq would not be zero? Thank you, Laura
I want to build my BETWEEN model without predictors at the WITHIN level using the TWOLEVEL analysis type, to compare the paths of this BETWEEN-only model against the final model with both WITHIN and BETWEEN models.
My outcome binary variable is measured at the WITHIN level. How would I specify a WITHIN model with no level-1 predictors that contains only the intercept that could be used as the outcome at the BETWEEN level? (So I want to do the %WITHIN% model as Y ON B-0j; then specify my %BETWEEN% model as Y ON Z;)
I have a 3-wave longitudinal design where all variables are nested within individuals, therefore, I follow previous literature to analyze my data as a multilevel SEM model with Mplus. All variables are within-level (repeated) variables. It is important to repeat exactly the same paths both at the within- and the between-level of my model in the input. Most of the significant paths occur at the between level and I have to discuss that in the discussion of my paper. Are the following statements correct?
"All paths were examined at both levels of analysis. A significant path at the within level means that at measurement times that the independent variable is high, the dependent variable is high too. At the between level, it means that if the aggregate level of the independent variable is high irrespective of time, the aggregate level of the dependent variable is also high."
Greetings! I have a 64-bit machine and I am using MPLUS v7. When running Multilevel SEM, I got an error message "NOT ENOUGH MEMORY SPACE TO RUN THE PROGRAM ON THE CURRENT INPUT FILE. THE ANALYSIS REQUIRES 5 DIMENSIONS OF INTEGRATION RESULTING IN A TOTAL OF 0.75938E+06 INTEGRATION POINTS... "
Much obliged if you could help. Many thanks! Magdalena
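The count in the message above is consistent with a grid of 15 integration points per dimension, which is my understanding of the default for standard numerical integration in Mplus: 15 to the power 5 is 759,375, i.e. 0.75938E+06. A small Python sketch (not Mplus) of how the grid grows, where `total_integration_points` is a hypothetical helper name and 15 is an assumed default:

```python
def total_integration_points(dimensions, points_per_dimension=15):
    """Size of a full rectangular integration grid (assumed 15 points per dimension)."""
    return points_per_dimension ** dimensions


if __name__ == "__main__":
    for dims in range(1, 6):
        print(dims, total_integration_points(dims))
    # Fewer points per dimension shrink the grid sharply:
    print(total_integration_points(5, points_per_dimension=7))
```

Because the grid size is exponential in the number of dimensions, removing even one dimension of integration, or using fewer points per dimension, reduces the memory demand substantially.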
I am testing a multilevel SEM model with 4 dependent within-level variables and 10 independent between-level variables. 4 of the between-variables are control variables. The rest are 4 predictors (main effects) and their 2 interaction terms computed manually in SPSS. The only relationships that are specified at the within-level are the intercorrelations between the DV's. All other (hypothesized) relationships are at the between-level. I haven't been able to get any results yet. Even if I try to build a very simple model based on the above, I always get an error which (if I interpret it correctly) says that there are negative values in my data file, which is logical since I have standardized the predictors. An example of some of the errors I get:
Invalid symbol in data file: "-" at record #: 8, field #: 62
Thank you Linda! I replaced commas with dots in the data file and it works now. There is another problem though, I now get:
A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX [...] DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS. REDUCE THE NUMBER OF PARAMETERS.
Indeed my clusters are only 30. I have tried to reduce my parameters as much as I can but I still get the error. Can I ignore it or do I really have to reduce the parameters more?
I have also tried estimation = BAYES because I have heard it works well with small samples. Then I do not get the error anymore but I am not sure how to interpret the findings. The "p values" are similar to what I was getting previously with ML estimator. However, I also get "significance" based on the confidence intervals and this is non-significant nearly for all findings. Do I need to look at "p values" or at "significance" and its associated asterisk to interpret my findings?
Thank you Linda. I'm now reading Bengt Muthen's (2010) working paper on Bayes but there are a couple of things I am trying to clarify:
1. One of my interaction effects has a standardized estimate of -0.336 and a p value of 0.04 but the CI includes zero (-0.670 to 0.032). Do I have to report this interaction effect as significant or non-significant?
2. In the example of multilevel analysis in this article priors are specified but I do not have this information for my model. Can I run the model with the default without specifying priors?
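On point 1, the two results need not conflict if the Bayes p-value is one-tailed (the share of the posterior on the other side of zero), while the 95% interval leaves 2.5% in each tail. A minimal Python sketch (not Mplus output) under that assumption, using simulated draws from a normal distribution whose mean matches the estimate above. The standard deviation of 0.19, the seed, and the name `summarize_posterior` are all hypothetical:

```python
import random


def summarize_posterior(draws):
    """Return (one-tailed p, 2.5th percentile, 97.5th percentile) of the draws."""
    n = len(draws)
    ordered = sorted(draws)
    lower = ordered[int(0.025 * n)]
    upper = ordered[int(0.975 * n) - 1]
    share_positive = sum(d > 0 for d in draws) / n
    one_tailed_p = min(share_positive, 1.0 - share_positive)
    return one_tailed_p, lower, upper


if __name__ == "__main__":
    rng = random.Random(2013)
    # Stand-in for posterior draws of the interaction effect (illustrative only).
    draws = [rng.gauss(-0.336, 0.19) for _ in range(20000)]
    p, lo, hi = summarize_posterior(draws)
    print(f"one-tailed p = {p:.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```

With these assumed draws, the one-tailed p falls below 0.05 while the 95% interval still covers zero, the same pattern as in the question.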
I am trying to build a multi-level model in MPLUS but am running into some confusion. I have nine indicators that load on three latent factors, and I want to test whether these three latent factors predict minutes to relapse (DV). Measures are clustered within individuals because we examined the DV in two conditions: deprived and non-deprived. I would also like to examine whether one variable mediates any of these relationships (also measured in both conditions). And last, I would like to see whether deprivation moderates any of these relations. Can all of this be done in MPLUS, and if so is a multi-level model the best way to test these questions? I am trying to set up the most basic model (not accounting yet for mediation or moderation) with the following syntax:
...
CLUSTER = sid;
WITHIN = deprived;
BETWEEN = shs cesd ad gdd gda aa asrs aqr audit;
ANALYSIS:
TYPE = TWOLEVEL;
MODEL:
%WITHIN%
delay ON deprived;
%BETWEEN%
f1 BY shs cesd ad gdd;
f2 BY gdd gda aa;
f3 BY asrs aqr audit;
delay ON f1 f2 f3;
...
But with this, I get the following error: THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX etc. It seems like I need to change start values, but I wanted to make sure this was the proper set up first.
When I switched the data back to wide format, I realized I am a little confused as to your response above. Participants have two 'delay' scores - one when nondeprived and one when deprived. I would like to test the overall main effect of the latent factors on the 'delay' score, as well as whether deprivation moderates these links. With the above model,
MODEL: delay ON deprived; f1 BY shs cesd ad gdd; f2 BY gdd gda aa; f3 BY asrs aqr audit; delay ON f1 f2 f3;
this only accounts for one delay score with data in wide format. Would I then need to take out the first statement and include both in the model:
MODEL: f1 BY shs cesd ad gdd; f2 BY gdd gda aa; f3 BY asrs aqr audit; delay_nondeprived ON f1 f2 f3; delay_deprived ON f1 f2 f3;
And then constrain the pathways to the two different delay scores to test for moderation? I have a relatively smaller sample size, so I am also trying to figure out how to maximize power and wasn't sure if using long format data with multi-level modeling would do this.
Ok, so deprivation status does not define groups. Then I would go with your second MODEL, that is, you have 2 delay outcomes (so wide in that regard). And, yes, moderation can be thought of as the differences in their coefficients.
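The two data layouts discussed in this exchange can be sketched in Python (not Mplus). In wide format each participant has one row with two delay columns. In long format each participant has two rows, one per condition, with a deprived indicator. The variable names follow the posts above, the values are made up, and `wide_to_long` is a hypothetical helper:

```python
def wide_to_long(wide_rows):
    """Turn one row per participant into one row per participant and condition."""
    long_rows = []
    for row in wide_rows:
        for deprived, column in ((0, "delay_nondeprived"), (1, "delay_deprived")):
            long_rows.append(
                {"sid": row["sid"], "deprived": deprived, "delay": row[column]}
            )
    return long_rows


if __name__ == "__main__":
    wide = [
        {"sid": 1, "delay_nondeprived": 12.0, "delay_deprived": 5.0},
        {"sid": 2, "delay_nondeprived": 20.0, "delay_deprived": 9.5},
    ]
    for r in wide_to_long(wide):
        print(r)
```

The wide layout matches the second MODEL above, with two delay outcomes per person; the long layout is what a two-level analysis with sid as the cluster variable would read.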
Hello, for my multilevel model with latent var x = level1 (student) predictor and observed xa (aggregated x) = level2 (classroom) predictor I have 2 questions:
1) %within% Mod indices suggest "xa with xa" and indicate that M.I. would be 310.898.
a) What does this mean? (I guess I should set free the variance of xa across classrooms, but is this not a default?) b) How can I change my model effectively to improve model fit?
2) With two latent predictors on level 1 (x1, x2) and level 2 (xa1, xa2) I want to test for interaction on level 1. When including the latent interaction term f | x1 XWITH x2 in the equation, the following error message appears "THE ESTIMATED BETWEEN COVARIANCE MATRIX COULD NOT BE INVERTED. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 194. CHANGE YOUR MODEL AND/OR STARTING VALUES."
a) Is the problem that I can not include latent interactions on level1? b) How can I test them ? c) If this is not the problem, what could be wrong?
Thank you very much. I appreciate your helpful comments.
thank you very much. I will send the files to mplus support. However, I would like to know if it is generally possible to include interaction terms (either observed or latent) only on the %within% or only on the %between% level?
hogehoge posted on Friday, August 16, 2013 - 4:28 am
I ran the model below and got the following error messages.
Model: %within% STRESS on DEMAND CONTROL; SLOPE | DEMAND on SEX; %between% STRESS on DEMAND; MSICKLR on JUN STRESS; SLOPE on JUN;
*** ERROR in MODEL command
Observed variable on the right-hand side of a between-level ON statement must be a BETWEEN variable. Problem with: DEMAND
*** ERROR in MODEL command
Observed variable on the right-hand side of a between-level ON statement must be a BETWEEN variable. Problem with: STRESS
*** ERROR
The following MODEL statements are ignored:
* Statements in the BETWEEN level:
STRESS ON DEMAND
MSICKLR ON STRESS
But when running the model without random slope, I didn't get such errors.
Model: %within% STRESS on DEMAND CONTROL; DEMAND on SEX; %between% STRESS on DEMAND; MSICKLR on JUN STRESS;
Is it impossible to use observed within-level variables on the right-hand side of between-level ON statement? How can I change the model with random slope?
For the run that you sent to Support, all you have to do is to remove the unnecessary request for integration.
hogehoge posted on Tuesday, August 20, 2013 - 12:41 pm
I'm sorry for the mishap. Thank you so much.
hogehoge posted on Monday, September 09, 2013 - 9:02 pm
When using within-level independent variables as between-level dependent variables, I got the following warning messages.
Model: %within% A on B C; B with C; %between% A on B C; B on D; C on E;
*** WARNING in MODEL command
In the MODEL command, the following variable is a y-variable on the BETWEEN level and an x-variable on the WITHIN level. This variable will be treated as a y-variable on both levels: B
*** WARNING in MODEL command
In the MODEL command, the following variable is a y-variable on the BETWEEN level and an x-variable on the WITHIN level. This variable will be treated as a y-variable on both levels: C
Then I have a question. Is the within-level correlation between B and C residual correlation?
Dear Drs Muthén, I am new to multilevel modeling with Mplus. I understand that an observed DV (I have one in my model) on the within level has a between-level counterpart, which is automatically created by Mplus. I would like to know what this counterpart is. In the Mplus user's guide v7.0 it is written that "In the within part of the model, the ON statement describes the linear regression of y on the observed individual-level covariate x." (Example 9.1, p. 262) and "In the between part of the model, the ON statement describes the linear regression of the random intercept y on the observed cluster-level covariates w and xm."
a) Is the between counterpart the random intercept of y? b) Am I right in stating that on the between level my (averaged) IVs predict the random intercept of my (individually perceived) DV? I use average values as IVs in the between part of the model and list them in the between part of the syntax. I did not average my DV.
a) Yes. You can think of it as the cluster-mean of y.
b) Yes, but IVs can also use the latent variable decomposition into within and between. See
Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229.
Megan Bell posted on Sunday, April 06, 2014 - 8:46 pm
Dear Drs Muthén,
I am a student who is new to MLM in Mplus. I have received conflicting advice on whether I should be running a two- or three-level model for my analysis, and would appreciate your opinion.
I am looking at the impact of child, parent, and neighbourhood characteristics on scores on a developmental outcome measure. My sample is a single population birth cohort (twins excluded), so there is only one child per family.
I have 5 binary (yes/no) outcome variables, and 2 explanatory variables each for children, parents and neighbourhoods.
One person has advised me to run a three-level model with children (L1) nested within families (i.e. parents; L2), nested within neighbourhoods (L3).
Another person has advised me to run a two-level model, with children and parents (L1) nested within neighbourhoods (L2), the reasoning being that there is only one child per family, so parents cannot be on a separate level.
I will be running multigroup models, to compare boys with girls. I have also been advised to run separate models for each outcome variable, rather than to include all outcome variables at level 1 and nest them within individuals.
Your advice on the best way to build my model would be appreciated. Please let me know if anything is unclear.
I would do TWOLEVEL with children and parents nested in neighborhoods. When each cluster has only one observation, there is no ill effect of ignoring that clustering.
I would run the theoretical model using all outcomes for boys and girls separately as a first step.
Megan Bell posted on Monday, April 07, 2014 - 6:04 pm
Thank you Linda, appreciate your advice.
Carolyn CL posted on Friday, October 24, 2014 - 10:11 am
Dear Drs. Muthen,
I am running a TWOLEVEL SEM (N = 1285). I have a large number of clusters (N = 479) but few observations per cluster (Min = 1 (40%), Max = 16, Mean = 2.68). All measures are at the individual-level, but the model takes into account potential school-based clustering (ICC's are mostly low, ranging from 0.5% - 13%, with one variable at 34%). All pathways are estimated at the within and between levels.
IVs: 3 dummy variables (reflecting categories of SES), sex, age DVs: 2 latent variables (one with 3 continuous indicators, the other with 5 categorical indicators), 2 categorical variables, 1 continuous variable
The model will not run - usually getting stuck during bivariate or univariate estimation. I tried variations in my modeling approach, such as only modeling factors on the within level (in CFA they appear not to fit on the between level), using cluster_mean to create between-level variables, and switching from ML to WLSMV. But nothing works - I always get an error message and no results. Much of the time, it appears that the problems are with bivariate estimation for my 3 dummy IVs (e.g. "SINGULAR INFORMATION MATRIX PROBLEM OCCURRED IN THE BIVARIATE ESTIMATION").
I am hoping you may be able to provide some assistance with why the model is not running.
The error message you report seems to be when you use WLSMV. What happens when you use ML?
I assume that you have first explored parts of the model and made those converge before putting it all together.
Carolyn CL posted on Friday, October 24, 2014 - 1:27 pm
As per your recommendation, I estimated parts of the model using WLSMV to ensure that it ran at a basic level (i.e., each DV regressed on the 3 SES dummies, age and sex); these all seemed to work fine. The next step, however, of adding a second DV and a structural component starts leading to issues, usually involving the estimation of the alpha, beta or psi for certain variables or associations.
For example, I received the following, with results but no standard errors:
NO CONVERGENCE. NUMBER OF ITERATIONS EXCEEDED. SLOW CONVERGENCE DUE TO PARAMETER 18. THE FIT FUNCTION DERIVATIVE FOR THIS PARAMETER IS -0.98714615D-02.
The parameter is a Beta linking a categorical DV to another categorical DV.
Trying to run these models instead using ML led to the following error message, with no results:
Observed variable on the right-hand side of a between-level ON statement must be a BETWEEN variable.
Mark Prince posted on Tuesday, February 10, 2015 - 7:36 am
I am trying to run a 2-1-1 MSEM with random slopes and I keep getting the following errors:
*** ERROR in MODEL command
Observed variable on the right-hand side of a between-level ON statement must be a BETWEEN variable. Problem with: HELPSTRAT
*** ERROR
The following MODEL statements are ignored:
* Statements in the BETWEEN level:
TOTLDRNKS ON HELPSTRAT
Here is my code (below). CD1 and CD2 are level 2 variables; totldrnks and helpstrat are assessed at level 1.
This relates to today's posts with Falkenstrom. Because you define a random slope for helpstrat on Within, helpstrat is not decomposed into latent within and between parts as it otherwise would be. As a result, there is no between part of helpstrat to regress on in the Between part of the model in your statement:
totldrnks on helpstrat (bb1);
You have to create a cluster-level version of helpstrat, say using the Cluster_mean option.
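A minimal sketch of that suggestion follows. The cluster ID name (id), the new variable name (helpm), and the order of the NAMES list are assumptions made for illustration; cd1, cd2 and the rest of the 2-1-1 model are left out.

```
VARIABLE:
  NAMES = id cd1 cd2 totldrnks helpstrat;   ! hypothetical order; id is an assumed cluster ID
  USEVARIABLES = totldrnks helpstrat helpm; ! variables created in DEFINE are listed last
  CLUSTER = id;
  WITHIN = helpstrat;
  BETWEEN = helpm;
DEFINE:
  helpm = CLUSTER_MEAN(helpstrat);          ! observed cluster mean of helpstrat
ANALYSIS:
  TYPE = TWOLEVEL RANDOM;
MODEL:
  %WITHIN%
  s | totldrnks ON helpstrat;               ! random slope, as in the original model
  %BETWEEN%
  totldrnks ON helpm (bb1);                 ! helpm takes the place of helpstrat on Between
```

Note that helpm is an observed cluster mean, not a latent between-level component, so it carries sampling error when clusters are small.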
I am interested in the group level factor analytic results. Just a few questions:
1. When I run a twolevel CFA, should the between group results be the same/similar to the results when I run a single-level analysis BUT using the corrected between group correlation matrix that Mplus generates as the data file (and specifying the correct ns at the group level)?
2. Mplus generates a between group correlation matrix (using a type=basic twolevel specification). How is the corrected covariance matrix (as per Muthen) scaled into a correlation matrix in this instance (it no longer appears to be the covariance divided by the product of the SDs)?
3. How different are the decompositions of the within and between correlation matrices from the WABA (within and between analysis) method described by Dansereau et al., 1984? (my understanding is that there is something off with the between correlation matrix computed using WABA).
2. Send the relevant output to support to show the difference you refer to.
3. I am not familiar with WABA.
Yoosoo posted on Saturday, March 28, 2015 - 3:16 pm
I have a question regarding the multilevel latent covariate (MLC) model with a binary outcome and a formative/aggregated Level 2 contextual variable.
My data have a two-level structure (individuals within communities) with a low sampling ratio. The outcome is a binary variable (healthy/unhealthy). The independent variables include a binary variable at level 1 (HCARD, possession of health card) and a contextual variable at level 2 (% community population with health card).
I applied TYPE= TWOLEVEL COMPLEX RANDOM with MLC by excluding the HCARD variable in the within/between variable section. Both my within/between models include regression of outcome on HCARD. I'm getting the following error:
*** ERROR in MODEL command
Unrestricted x-variables for analysis with TYPE=TWOLEVEL and ALGORITHM=INTEGRATION must be specified as either a WITHIN or BETWEEN variable. The following variable cannot exist on both levels: HCARD
Do you have any suggestions on what may be wrong with my model? Also, is my method of introducing MLC correct for a formative aggregate L2 contextual variable (that is analogous to community gender composition)?
I have already completed a CFA at the individual level (teachers) but I need to look at the next level (teachers in schools) to determine if there is between-level and within-level variance.
I am attempting to run a multilevel CFA using a number of resources. One article aligned with my research suggests that I first create within and between matrices and obtain ICC values, then run a confirmatory factor analysis on the within matrix. When I attempt this step (running the CFA on the within matrix) I get the error message *** ERROR Insufficient data in "WinCov.dat"
Is there a resource you can recommend for me to double-check my syntax, or to ensure that the file generated by the SAVEDATA option SAMPLE IS WinCov.dat; is complete?
Thank you for your response. I have two follow-up questions.
1) I am trying to use the MLC approach as per the following paper:
Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229.
My understanding was that the MLC approach uses a single variable to represent both level 1 and level 2 influences. Does using the cluster_mean option apply to the MLC approach?
2) Sorry, I did not explain it fully. My data were sampled using a stratified design. Would the COMPLEX option be suitable for my model?
Thank you very much for your excellent support as always. You've been a great help to my research!
1) The cluster_mean option does not use the latent between-variable approach of MLC. It is simply the observed cluster mean. The latent between-variable approach of MLC is not available with algorithm = integration. Algorithm = integration is needed with random slopes.
Removing RANDOM from my analysis did not resolve the problem. I believe this is because my outcome is a binary variable (as elaborated in my first post), which I believe requires algorithm=integration.
Could you please suggest whether there is any other way to apply MLC to a binary outcome in Mplus? Thank you so much.
Two alternatives: 2-level WLSMV or ML adding a factor behind the X.
For 2-level WLSMV, see the UG ex 9.9 and the paper:
Asparouhov, T. & Muthén, B. (2007). Computationally efficient estimation of multilevel high-dimensional latent variable models. Proceedings of the 2007 JSM meeting in Salt Lake City, Utah, Section on Statistics in Epidemiology.
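As a rough sketch of the first alternative, a two-level WLSMV setup for this kind of model might look like the following. The variable names (comm, y, hcard) are hypothetical, and this is only an illustration under those assumptions, not a tested specification for the data described above. Because hcard appears on neither the WITHIN nor the BETWEEN list, it is split into latent within and between parts, and it is treated as continuous here.

```
VARIABLE:
  NAMES = comm y hcard;     ! hypothetical names; comm is an assumed community ID
  USEVARIABLES = y hcard;
  CATEGORICAL = y;          ! binary outcome
  CLUSTER = comm;
  ! hcard is on neither the WITHIN nor the BETWEEN list,
  ! so it is decomposed into latent within and between parts
ANALYSIS:
  TYPE = TWOLEVEL;          ! no RANDOM: random intercepts only
  ESTIMATOR = WLSMV;
MODEL:
  %WITHIN%
  y ON hcard;
  %BETWEEN%
  y ON hcard;
```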
2) Do I need to make further changes to the code, since my cluster-level x is a ratio and not an average (referred to as a "formative" aggregate in the following paper):
Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229.
I wonder if this is better suited for SEMNET, but could you help me understand why binary covariates may be better off without latent decomposition? Is it a theoretical issue, or more of a practical (computational) concern? My data have a low sampling ratio (~0.15) and a low number of level 1 units per cluster. I understand that MLC may still be biased, but I wanted to use it as a comparison to MMC. Thank you so much for your patient support as always.
The latent variable decomposition assumes that the latent between and within parts are uncorrelated, normal variables.
Cynthia Yuen posted on Monday, September 21, 2015 - 10:31 am
I am trying to run a model using daily diary data where the person is the cluster. I'm interested in whether the level 1 random slope b1 between X (obsv) and M (latent) predicts a level 2 outcome Y (latent), such that a steeper slope is related to greater Y. I'm also trying to show that another level 2 variable W (latent) moderates b1, with the goal that b1 predicts Y only at certain levels of W. Does this make sense to do/is it possible? If so, would this be the appropriate code? Thanks a lot!
Cynthia Yuen posted on Tuesday, September 22, 2015 - 3:07 pm
How would you interpret this if B1 ON W was significant but Y on B1 was not? Does this mean that the slope does not predict Y because the slope is only significant for individuals at a certain level of W? How would you unpack that to find out the relationship between Y and B1 for people who vary in W? Thanks again!
Y ON B1 nonsig: The slope does not influence the intercept
Cynthia Yuen posted on Wednesday, September 23, 2015 - 4:24 pm
In the case where Y ON B1 was not significant but B1 ON W was sig, does Y ON B1 take into account the effect of W? For example, if I anticipated that B1 would be positive for people high on W but nonsig or negative for people low on W, would the nonsig Y ON B1 apply to the whole sample, regardless of their scores on W? Or could Y ON B1 be significant only for those high on W? If that's the case, how would I test that?
A rather basic question on multilevel modeling, I am afraid.
I was under the assumption that, by not declaring variables as within or between, Mplus would separate the variances and analyse the between and within parts independently. Thus I was expecting to get the same results with model 1 as with model 2 below:
Cluster = team;
between = ;
within = ;
Analysis: Type = twolevel random; Estimator = ml;
!model 1
Model:
%Within%
x y;
%Between%
x on y;
!model 2
Model:
%Within%
x on y;
%Between%
x on y;
However, in the first case the between effect of x on y is significant, coefficient .84, while in the second it is not, dropping to .45 and the within effect of x on y is significant, coefficient .32.
Does this mean that even if we are only interested in the between effects the within effects must be estimated, otherwise the between effect will be biased? Many thanks, Claudia
Check that you have the same number of parameters in the 2 models. Also ask for TECH3 so you can see which Within parameters are correlated with which Between parameters. If you find non-zero correlations, misfit on one level can affect misfit on the other level.
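For reference, the request above amounts to one extra line in the input. TECH1 is added here as a convenience (an addition to the advice, not part of it), because it shows the parameter numbering needed to tell which rows and columns of the TECH3 matrices belong to Within and which to Between parameters.

```
OUTPUT:
  ! TECH1: parameter specification (numbering of the free parameters)
  ! TECH3: estimated covariance and correlation matrices of the parameter estimates
  TECH1 TECH3;
```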
Thank you very much for your answer, I am not quite clear about how to interpret your note or the results though.
Indeed, the number of parameters estimated differs (2w+5b in model 1; 2w+3b in model 2), but I wasn't expecting these to be the same, as in model 2 I am also estimating the within x-->y effect and in model 1 I am not.
Also, the correlations between w and b parameters are not 0 - at least not always. Should they be? What are the implications of this? I have tested the same models and looked at correlations between parameters using datasets from the Mplus website examples (e.g., file ex9.1a.dat) and the same occurs, so I don't think this is something particular to my data. You indicate that 'If you find non-zero correlations, misfit on one level can affect misfit on the other level'. How should this then be addressed?
Finally, if the correlations between within and between parameters were 0, would I then find that the between effect x-->y remains the same regardless of whether the within x-->y effect is estimated?
Apologies if I am lacking basic notions that would explain this. I have tried to look for this information but couldn't find anything. I am happy to read further if there is basic underlying knowledge I am missing and you can point me to a source. Thank you again, Claudia
I want to study the cross-level interaction effect between x1 and w.
Names are varid u1-u6 x1 x2 w;
Categorical = u1-u6; ! u1-u6 are binary
within = x1 x2; ! x1 and x2 are ordinal
between = w; ! w is nominal with seven categ.
cluster = varid;
Analysis: type = twolevel;
Model:
%within%
f1w by u1*-u3;
f2w by u4*-u6;
f1w f2w on x1 x2;
f2w on f1w;
f1w@1; f2w@1;
%between%
f1b by u1*-u3;
f2b by u4*-u6;
f1b f2b on w;
f2b on f1b;
f1b@1; f2b@1;
1. Can I consider the above model?
2. I have generated dummy variables. Should I run six models, one for each dummy variable as the between variable?
3. How do I interpret the results?
Thank you very much.
Ald posted on Wednesday, December 30, 2015 - 11:32 am
Thank you. I add the random slopes:
Names are varid u1-u6 x1 x2 w;
Categorical = u1-u6; ! u1-u6 are binary
within = x1 x2; ! x1 and x2 are ordinal
between = w; ! w is nominal
cluster = varid;
Analysis: type = twolevel random; estimator = ml; integration = montecarlo(500);
Model:
%within%
f1w by u1*-u3;
f2w by u4*-u6;
f1w f2w on x2;
f2w on f1w;
f1w@1; f2w@1;
s1 | f1w on x1;
s2 | f2w on x1;
%between%
f1b by u1*-u3;
f2b by u4*-u6;
f1b f2b on w;
f2b on f1b;
f1b@1; f2b@1;
s1 s2 on w;
I am wondering whether I have omitted any further model specifications, and whether I should define the two slopes in the same run or one slope at a time.
Syntax looks ok. Do one random slope at a time as a start.
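A first run with a single random slope could be sketched as follows, keeping the rest of the posted model unchanged. Here s2 is replaced by an ordinary fixed regression of f2w on x1, and the factor loadings are written out as explicit lists (u1* u2 u3), which is an assumption about what the posted shorthand was meant to express.

```
Analysis:
  type = twolevel random;
  estimator = ml;
  integration = montecarlo(500);
Model:
  %within%
  f1w by u1* u2 u3;     ! first loading freed; variance fixed at 1 below
  f2w by u4* u5 u6;
  f1w f2w on x2;
  f2w on f1w;
  f1w@1; f2w@1;
  s1 | f1w on x1;       ! the only random slope in this first run
  f2w on x1;            ! fixed slope for now, in place of s2
  %between%
  f1b by u1* u2 u3;
  f2b by u4* u5 u6;
  f1b f2b on w;         ! w kept as in the post
  f2b on f1b;
  f1b@1; f2b@1;
  s1 on w;              ! cross-level interaction for the first slope
```

Once this version converges, s2 can be reintroduced in the same way, using the estimates from this run as a guide for starting values if needed.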
Yulan Han posted on Wednesday, April 27, 2016 - 2:11 am
Dear Dr. Muthen,
I'm doing a multilevel SEM using type=twolevel. I have 854 individuals from 153 working teams nested in 15 firms. My interest is in how team-level variables influence individual-level variables. But the reviewer asked me to consider the effect of firm using type=complex or dummy variables. I tried dummy variables. But because there are so many dummy variables (14), the model fit indices became worse, especially CFI and TLI. I also tried type=complex, and the model fit indices are good. Can I use type=complex when I only have 15 firms? I saw that you said "Less than 20 clusters makes the statistical analysis difficult" on the discussion board. Does this mean the results I got based on 15 firms are not reliable?
I am trying to run a multilevel model with imputed data to test for a cross-level interaction. I have covariates on both the first and the second level which have missing data, so I entered the respective variances in the model. However, every time I include the slope in my model, I get the following error message:
SERIOUS PROBLEM IN THE OPTIMIZATION WHEN COMPUTING THE POSTERIOR DISTRIBUTION. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.
Unfortunately, I was not able to figure out what's wrong with my model. The syntax is:
[….]
model:
%within%
mathe on S_Sex HISEI;
s | S_MIG on mathe;
S_MIG; S_Sex; HISEI;
%between%
mathe on sform L_MIG L_Sex Year Year_sc Ausb HISEI_kl MIG_kl S_Sex_kl;
Hi, I am running an intercept-only two-level model and in the output I don't get a variance estimate for either the within or the between level. Standard errors appear, as does the between-level mean. Both variances appear significant, though. My sample is small: 130 observations across 29 clusters. Is it possible that I don't get an estimate for the variance because of the small sample?
I am running a multilevel (2-level) analysis (2-2-1 model) and I have 2 problems.
1. My cluster variable is the census tract. However, some census tracts have decimals (e.g. 701.01, 701.02). Mplus seems to recognize this as a variability in the cluster variable and gives me an error message.
2. My outcome variable (y) and other covariates (x1, x2, x3) are individual-level variables and I treated them as both within and between levels variables. When I regress y on x and m (both between level variables), I do not have any problem. But when I include x1, x2, and x3, it tells me x1, x2, x3 cannot exist on both levels and must be specified as either a WITHIN or BETWEEN variable. How do I address this as I intend to control for x1, x2, and x3? My code is below.
VARIABLE:
NAMES ARE clu x m y x1 x2 x3;
USEVARIABLES ARE clu x m y x1 x2 x3;
MISSING ARE ALL (99);
CATEGORICAL ARE y;
BETWEEN ARE x m;
CLUSTER IS clu;
ANALYSIS:
TYPE IS TWOLEVEL RANDOM;
MODEL:
%BETWEEN%
m ON x x1 x2 x3 (a1-a4);
y ON m x1 x2 x3 (b1-b4);
y ON x x1 x2 x3 (c1-c4);
MODEL CONSTRAINT:
NEW(direct indirect total);
indirect = a1*b1;
direct = c1;
total = c1 + a1*b1;
I created a multilevel SEM using the following model command (A, B, X, & Y are observed variables):
VARIABLE:
NAMES ARE Group A B X Y z1 z2;
USEVARIABLES ARE A B X Y;
CLUSTER = Group;
ANALYSIS:
TYPE IS TWOLEVEL;
ESTIMATOR IS MUML;
ITERATIONS = 1000;
CONVERGENCE = 0.00005;
COVERAGE = 0.10;
MODEL:
%WITHIN%
A on X Y;
B on X Y;
%BETWEEN%
A on X Y;
B on X Y;
Instead of using the observed variable, Y, I'd like to use a second-order latent variable, Z consisting of two observed variables, z1 & z2. Is it possible to use a second-order latent variable in a multilevel SEM? If possible, what command should I use?
My model is similar to Mplus Short Course 7, slide 81. x and y are level 1, m is level 2. I am not using monte carlo. I have random slopes. I would like to add modifiers to the x->m relationship and the m->y relationship. Can you show me how to do this? Thank you.
Perhaps something like this is what you have in mind, letting xz1 be the interaction between x and the z1 moderator and mz2 be the interaction between m and the z2 moderator (I assume Z1 and Z2 are between level):
Let's talk about case 1. (case 2 is analogous). The x-> m relationship is a between-level relationship given that m is a between variable. This means that on Between x has to appear in its cluster version (either by latent variable decomposition as in UG ex 9.2 or by computing its observed cluster mean). And the z1 moderator variable similarly has to appear on Between in its cluster version.
Thank you. If I use the 'observed cluster mean' method, I would compute xmean by taking the average of x for each cluster. For slide 34, I would take the mean for each school. I compute the average of z1 for each school which is z1mean. Same for z2.
Now I have xmean, z1mean, z2mean. Can you show me how to write the coding for this model if I have random slopes? To find this model, look at August 23. Thank you,
Thank you. I am in situation 2 (Moderation of Between-level). My level 2 variable is only measured at level 2. This variable is department morale (DM). I was trying to use the school example as this example is discussed in the slides. I need to prove to my colleagues that I can use Mplus for this type of situation. I have seen published papers using mplus for this type of problem.
Can you show me how to handle xmean->m measured at level 2, moderated by z1mean [random slope]? Or should I create a product term? I am not sure how to start.
Yes, create a product term in line with my suggestion (2).
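A product term of this kind could be sketched as follows, assuming the cluster means xmean and z1mean have already been computed and are in the data file, as described above. The cluster ID name (dept), the order of the NAMES list, and the particular regressions shown are assumptions for illustration; random slopes and the z2 moderator are left out to keep the sketch short.

```
VARIABLE:
  NAMES = dept x y m xmean z1mean;          ! hypothetical order; dept is an assumed cluster ID
  USEVARIABLES = x y m xmean z1mean xz1;    ! xz1 is created in DEFINE, so it is listed last
  CLUSTER = dept;
  WITHIN = x;
  BETWEEN = m xmean z1mean xz1;
DEFINE:
  xz1 = xmean*z1mean;                       ! between-level product term
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  y ON x;
  %BETWEEN%
  m ON xmean z1mean xz1;                    ! xz1 carries the moderation of xmean -> m
  y ON m xmean;
```

Centering xmean and z1mean before forming the product is a common choice that makes the lower-order coefficients easier to interpret.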
Guillem Rico posted on Wednesday, October 11, 2017 - 5:01 pm
Dear Dr. Muthén,
I am new to Mplus. I would like to estimate how different job attributes affect interest in a job position. I am using a conjoint study where respondents are asked to rate their interest in different types of jobs. The dataset is structured in stacked form, such that each job rating is a separate observation (there are eight different observations per respondent), with job attributes as (dichotomous) predictors. I want to examine how the effect of these job attributes vary across various respondents' latent characteristics (associated with different observed variables at the respondent level). I have two questions:
1. What kind of multilevel/complex data approach is more appropriate for this? Please note that I am trying to estimate a cross-level interaction here.
2. Is it feasible to obtain latent variable estimates at level 2 (i.e. respondent) within the same model? If so, could you provide some guidance as to how to do this?
Thank you! My first attempt apparently ran without problems and the results make sense to me, but I am wondering if I am actually doing what I intend to. Specifically, I want to see how the effect of job attributes (sector, service, duration) varies with respondents' latent characteristics (aps, cpv, com, ss). Female, age, and findjob are covariates. I am using this syntax:
Variable:
Names are [omitted] ;
Usevariables = id female age findjob accept duration service sector
mp7_a mp7_b mp7_c mp7_d mp8_a mp8_b mp8_c mp8_d
mp9_a mp9_b mp9_c mp9_d mp10_a mp10_b mp10_c mp10_d ;
Missing are all (-999) ;
Within = duration service sector ;
Between = female age findjob
mp7_a mp7_b mp7_c mp7_d mp8_a mp8_b mp8_c mp8_d
mp9_a mp9_b mp9_c mp9_d mp10_a mp10_b mp10_c mp10_d ;
Cluster = id ;
Analysis:
Type = twolevel random ;
Model:
%within%
s1 | accept ON sector ;
s2 | accept ON service ;
s3 | accept ON duration ;
%between%
aps BY mp7_a mp7_b mp7_c mp7_d ;
cpv BY mp8_a mp8_b mp8_c mp8_d ;
com BY mp9_a mp9_b mp9_c mp9_d ;
ss BY mp10_a mp10_b mp10_c mp10_d ;
accept ON female age findjob aps cpv com ss ;
s1 s2 s3 ON aps cpv com ss ;
accept WITH s1 s2 s3 ;