

Hello, I have 2 questions regarding correlating variables in multilevel models. 1) In addition to having 2 latent variables at level 1, I have a measured predictor variable (no error). I was wondering why, at the second level, this measured variable (which becomes latent at level 2) is not correlated with the other latent exogenous variables by default. I tried correlating it through syntax, but it made the model fit considerably worse. 2) If I want a variable at level 1 to correlate only with the error terms of the other variables at that level, how would that be stated in Mplus syntax? In the manual, I could only figure out how to correlate variables themselves, not a variable with the error terms of other variables. Thanks!


Please send your questions, along with the output you are referring to and your license number, to support@statmodel.com. I am not sure exactly what you are asking. Please be specific in your questions, referring to the parameters in question by name.


Hello, I am using multilevel growth curve modeling to examine whether any of my covariates explain variance in the growth factors at the within and between levels. Why would the model fit (chi-square value) change when I explicitly specify (using WITH statements) the correlations/covariances among the predictors? Thank you!


Means, variances, and covariances of observed exogenous variables (covariates) are not part of a regression model. When you mention these variables in WITH statements, they are treated as dependent variables in the model: distributional assumptions are made about them, and their means, variances, and covariances are estimated.


Thank you for your answer. I am not sure I really understand, though. Does it mean that if I don't use WITH statements, exogenous variables are considered to be orthogonal to each other? What also confuses me is that when I run a simple path model (not a multilevel model), the chi-square value stays the same whether or not I use WITH statements to specify correlations among the exogenous variables. I appreciate your help!


No, it does not mean that the covariances are zero. You can think about it as though the covariances are fixed at the sample values. When TYPE=GENERAL is used with continuous outcomes, it just so happens that whether observed exogenous variables are treated as independent variables in the model or dependent variables in the model, the results are the same. This is not the case in other situations like multilevel modeling. 
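For the single-level case with complete data and a continuous outcome, the equivalence can be seen directly: the regression slope is the same whether x is conditioned on (left out of the model) or its variance is estimated jointly with y. The Python sketch below uses numpy and simulated data; the function names are hypothetical, it does not reproduce Mplus internals, and it says nothing about the multilevel case, where the answer notes the equivalence breaks down.

```python
import numpy as np

def slope_conditional(x, y):
    """OLS slope of y on x: x treated as fixed (covariate not modeled)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

def slope_joint(x, y):
    """Slope implied by the ML estimate of the joint (x, y) covariance matrix,
    i.e., x brought into the model with its variance estimated."""
    S = np.cov(np.vstack([x, y]), bias=True)  # ML divisor n
    return S[0, 1] / S[0, 0]

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)
print(slope_conditional(x, y), slope_joint(x, y))  # equal up to rounding
```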


Thank you again. And one more thing: I have not used WITH statements to specify covariances among exogenous variables in my multilevel model. However, if I want to present covariances and correlations of these variables, do I use the estimates from the output (using the SAMPSTAT option)? And how do I get the significance values for those?


Yes, you get those from SAMPSTAT. SAMPSTAT does not give SEs or significance tests for those quantities.


How do I get the SEs or significance tests for the correlations in CFAs or SEMs? Is there any way to compute them?


For correlations among exogenous factors you can look at the STD solutions. A general approach is to express correlations using the model parameters by giving the model parameters labels that are then referred to in Model Constraint. For guidance, see UG ex 5.20. 
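As a minimal sketch of what such a MODEL CONSTRAINT step computes: if two variances and their covariance are labeled (say a, b, and c; these labels are hypothetical), a NEW parameter r = c/sqrt(a*b) can be defined, and, as far as we understand, its SE comes from the delta method. The Python below reproduces that calculation with numpy, assuming you have the estimates and their asymptotic covariance matrix; the numbers are made up.

```python
import numpy as np

def corr_and_se(a, b, c, acov):
    """Correlation r = c / sqrt(a*b) and its delta-method SE.

    a, b : variance estimates; c : covariance estimate
    acov : 3x3 asymptotic covariance matrix of (a, b, c), in that order
    """
    r = c / np.sqrt(a * b)
    # gradient of r with respect to (a, b, c)
    grad = np.array([-r / (2 * a), -r / (2 * b), 1 / np.sqrt(a * b)])
    se = np.sqrt(grad @ acov @ grad)
    return r, se

# hypothetical estimates: var = 4 and 9, cov = 3; only the covariance is uncertain
acov = np.diag([0.0, 0.0, 0.36])
r, se = corr_and_se(4.0, 9.0, 3.0, acov)
print(r, se)  # r = 0.5, se ≈ 0.1
```

In practice the off-diagonal elements of the parameter covariance matrix matter, which is why defining the correlation inside the model is preferable to hand calculation.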

Anabel posted on Monday, February 07, 2011  3:48 am



Dear Bengt, thanks a lot for your response, but I think I have to follow up on my question. I need the significance tests for the correlations among the endogenous and exogenous factors and the manifest variables, respectively, in my SEM models. I don't really understand how to calculate those through MODEL CONSTRAINT. Could you please give an example? Thanks a lot.


With an SEM, why don't you focus on the structural parameter estimates (perhaps in standardized form) that the model specifies rather than the factor correlations? If you want the factor correlations, why not formulate a CFA model? Those estimates and their significance should be close to the SEM if the model fits well. UG ex 5.20 gives an example. 


Hi, I would like to know why it is so that when I am estimating correlations between many variables simultaneously (x with y z b), I get different estimates compared to when I only estimate a correlation between two variables (x with y)? In both instances, I should be estimating bivariate correlations (and there are no missing data). Thank you! 


This should only happen if there are missing data. Please send the relevant outputs and your license number to support@statmodel.com. 
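A minimal Python sketch of the missing-data point, using listwise deletion as a deliberately simple stand-in (Mplus's default is full-information ML, which handles missingness differently but also lets the other analyzed variables affect the estimate). With complete data, the x-y correlation is identical whether it is estimated alone or together with z and b; once b is missing for some cases, the joint analysis no longer rests on the same information and the estimate changes. The function name and data are hypothetical.

```python
import numpy as np

def listwise_corr(data, i, j, cols):
    """Correlation of columns i and j after dropping rows with any NaN in cols."""
    keep = ~np.isnan(data[:, cols]).any(axis=1)
    return np.corrcoef(data[keep, i], data[keep, j])[0, 1]

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
y = 0.4 * x + rng.normal(size=n)
z = rng.normal(size=n)
b = 0.5 * y + rng.normal(size=n)
data = np.column_stack([x, y, z, b])

# complete data: "x with y" and "x with y z b" give the same x-y estimate
r_biv = listwise_corr(data, 0, 1, [0, 1])
r_joint = listwise_corr(data, 0, 1, [0, 1, 2, 3])
print(r_biv, r_joint)

# make b missing whenever y is large (missingness related to the data)
data_mis = data.copy()
data_mis[y > 0.5, 3] = np.nan
r_biv_mis = listwise_corr(data_mis, 0, 1, [0, 1])
r_joint_mis = listwise_corr(data_mis, 0, 1, [0, 1, 2, 3])
print(r_biv_mis, r_joint_mis)  # now they differ
```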


Hi, sorry to ask such a basic question, but I am a new user and am getting this error message: *** WARNING in MODEL command All variables are uncorrelated with all other variables in the model. Check that this is what is intended. I am trying to get a correlation matrix for data consisting of days clustered within people. I've entered the USEVARIABLES, the cluster is ID, ANALYSIS: TYPE=TWOLEVEL, and OUTPUT: SAMPSTAT. I know the variables cannot truly be uncorrelated; they are correlated in SPSS before aggregating. Any thoughts? Johnna


You should use TYPE=TWOLEVEL BASIC; in the ANALYSIS command, not SAMPSTAT. SAMPSTAT is used when you are also estimating a model.
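To see why day-level data call for separate within and between matrices, here is a minimal moment-based Python sketch (hypothetical function name) that splits a correlation into a pooled within-person part (deviations from each person's mean, i.e., group-mean centering) and a between-person part (correlation of person means). TYPE=TWOLEVEL BASIC reports ML estimates of the within and between matrices, which will not match these simple moments exactly, especially with unbalanced clusters; the sketch only shows that the two levels can differ sharply, even in sign.

```python
import numpy as np

def within_between_corr(ids, x, y):
    """Pooled within-cluster and between-cluster (cluster-mean) correlations."""
    ids, x, y = map(np.asarray, (ids, x, y))
    clusters = np.unique(ids)
    mx = np.array([x[ids == g].mean() for g in clusters])
    my = np.array([y[ids == g].mean() for g in clusters])
    # group-mean centering: deviation of each observation from its cluster mean
    idx = np.searchsorted(clusters, ids)
    dx, dy = x - mx[idx], y - my[idx]
    r_within = (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))
    r_between = np.corrcoef(mx, my)[0, 1]
    return r_within, r_between

# toy data: 3 persons x 2 days each
ids = [1, 1, 2, 2, 3, 3]
x = [0, 2, 4, 6, 8, 10]
y = [10, 12, 5, 7, 0, 2]
r_w, r_b = within_between_corr(ids, x, y)
print(r_w, r_b)  # within = 1.0, between = -1.0 (up to rounding)
```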


Thank you very much. Now I am trying to create an interaction term using the vertical bar (|), the key just above the Enter key. However, the character does not appear when I press Shift and that key. It works fine in Word, but not in Mplus. I have a MacBook Air. Can you think of why the | will not work? Sorry, this is not the right place to post, but I couldn't find a section for keyboard issues. Johnna


You can try choosing another font in the Mplus Editor. Go under Option and choose Font Selection. Another font may render that character better than the default font. 


It looks like one cannot request standard errors to determine whether between-level correlations are significant. Is there a way to request these? Thanks. Eric


Please send your output to support so we know what your situation is. 

Eric Deemer posted on Sunday, September 29, 2013  11:18 am



I'm fitting a multilevel mediation model with just 3 variables: X, M, and Y. M has variation on both levels. I want to estimate the between-level correlations, but I know that Mplus doesn't provide SEs with the correlation output. I was thinking of separately regressing Y on M and M on X, since these regression coefficients would be the same as correlation coefficients. Would this be true in the ML framework? Eric

Eric Deemer posted on Sunday, September 29, 2013  12:35 pm



Just to clarify, my model would be:
%WITHIN%
y on xw;
%BETWEEN%
y on xb;
...using group-mean centering. Eric


You can use WITH instead of ON and look at the StdYX results. 
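For a regression with a single predictor, the standardized (StdYX-type) slope equals the correlation, which is why the plan above works for a simple X-M or M-Y link; with more than one predictor, standardized slopes are partial coefficients rather than correlations. A minimal numpy sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 0.3 * x + rng.normal(size=500)

b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)        # unstandardized slope of y on x
b_std = b * np.std(x, ddof=1) / np.std(y, ddof=1)  # standardize by both SDs
r = np.corrcoef(x, y)[0, 1]
print(b_std, r)  # identical up to floating-point error
```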


We are examining the validity of therapist scores from a new measure; many therapists rate more than one client. ICCs for the therapist scores range from .21 to .34, and are lower for other variables. We wish to model correlations at the client (within) level to examine construct validity. We used (TID = therapist id):
Cluster = tid;
Analysis: Type = twolevel;
Model: %within% X with Y;
output: standardized;
We could also use TYPE=COMPLEX to examine the same issue. I have run a few correlations with both approaches, and the results differ more than I thought they would (e.g., r = .25 vs. .32). Why would the correlations differ so much? With a very simple analysis such as this, what should we consider in choosing one approach over the other?


When you estimate the two-level model, make sure you get between-level variances and the covariance for X and Y. If they are not obtained by default, specify X WITH Y on the between level as well. I would use the Twolevel analysis. Using Complex implies that you estimate a correlation composed of both within and between parts, whereas you are interested in the within part.
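Under the two-level decomposition, the total variance and covariance of X and Y are the sums of their within and between parts, so the total correlation targeted by TYPE=COMPLEX blends both levels. A small Python sketch with made-up component values (not the therapist data), chosen to give ICCs of about .23, shows a within correlation of .25 appearing as .40 in the total:

```python
import numpy as np

def total_corr(cov_w, cov_b):
    """Total correlation implied by within and between 2x2 covariance matrices."""
    cov_t = np.asarray(cov_w) + np.asarray(cov_b)  # Var_total = Var_W + Var_B
    return cov_t[0, 1] / np.sqrt(cov_t[0, 0] * cov_t[1, 1])

cov_w = [[1.0, 0.25], [0.25, 1.0]]   # within: r_W = .25
cov_b = [[0.3, 0.27], [0.27, 0.3]]   # between: r_B = .90, ICC = .3 / 1.3 ≈ .23
print(total_corr(cov_w, cov_b))      # ≈ 0.40
```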


Hi Linda, Hi Bengt, I have 5-day data from both partners of couples. I've ordered the data with the couple as the case (so five rows for the five days of a couple) and the variables for each spouse in one row. I have two questions:
1) Because of the way I structured the data, I think I reduced the three levels (couples, spouses, days) to two levels (couples, days). All variables are measured at the daily level, though. What is the best analysis strategy for these data: TYPE=TWOLEVEL or TYPE=COMPLEX? I've tested both (ESTIMATOR=MLR) and the results do not differ substantially. For now, I've chosen TYPE=TWOLEVEL with ESTIMATOR=MLR; I modeled the regression paths on level 1 and let Mplus estimate only the variance of the dependent variable at level 2. Is this okay, or would TYPE=COMPLEX make more sense since I do not model anything on level 2?
2) What is the best way to get means, standard errors, and correlations for the descriptives table? The reviewers want correlations for level 1 and level 2. If I estimate an empty model (only the variance of the DV at level 1 and level 2), the correlations are quite different from the estimates I get when I estimate the regression model. I do have some missing data.
Hopefully you can help me with these questions. Thanks very much in advance, Lieke


1) It sounds like you have the data ready for a two-level analysis in long format, where days represent level 1 (describing what varies across time) and couples represent level 2 (what varies across couples). A variable measured each day can have components of variation on both levels. 2) Report the two-level sample statistics, which you can also obtain using Type=Twolevel Basic.
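How much of a daily variable's variance lies at the couple level is summarized by the intraclass correlation. A minimal Python sketch of the classic one-way ANOVA estimator, ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW), assuming balanced clusters of size k (e.g., 5 days per couple); the function name is hypothetical, and to our knowledge Mplus's two-level output estimates the ICC from its ML variance components instead, so values can differ somewhat.

```python
import numpy as np

def icc1(groups):
    """ICC(1) from one-way ANOVA; groups is a list of equal-length sequences."""
    data = np.asarray(groups, dtype=float)  # shape (G clusters, k members)
    G, k = data.shape
    means = data.mean(axis=1)
    grand = data.mean()
    msb = k * np.sum((means - grand) ** 2) / (G - 1)
    msw = np.sum((data - means[:, None]) ** 2) / (G * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

print(icc1([[1, 3], [5, 7]]))  # (16 - 2) / (16 + 2) = 0.777...
```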


Hi Bengt, Thanks for your quick reply. To follow up on each answer: 1) Is it wrong to use Type = complex for the long format? And if I do use a two-level model but my hypotheses concern only level-1 effects, is it enough to estimate only the variance of the dependent variable on level 2? 2) If I use Type = Twolevel Basic, what model do I specify at levels 1 and 2? Thanks!


1) Type=Complex does not model the level-2 relationships, so it depends on what your model looks like. For instance, if a growth model over the 5 time points is your primary interest, that is a level-1 focus and Type=Complex is OK. But if you have a model with relationships between couples, you need level 2. 2) Twolevel Basic does not require a model.


Thanks, issue two is now solved. The other issue is a little more difficult: 1) I don't have a growth model but an actor-partner model. More specifically, I want to know how the wife's and the husband's job demands affect how much support they each give to each other at home, and their ratings of family quality. It is a mediation model (job demands -> support given -> family quality), and because I have those three variables for both husbands and wives, I model actor as well as partner effects. I measured all variables on five consecutive days. There are very strong level-2 correlations and somewhat weaker but similar correlations at level 1. I wonder whether I should model this as a level-1 model but not center at the group mean (which I usually would do to examine day-level effects), so that the between-level effects are still there. I don't think I can model this at level 2 because I only have 26 couples. I'd like to make use of the fact that I measured each variable 5 times, even if that means I can only test couple-level effects that are measured reliably. Any thoughts on the best way to model this?


Does your mediation model have contemporaneous relations for its 3 variables or do you have lagged effects? 


They are all same day effects, so no lagged effects. 


You could just do (level 1 is time, level 2 is subject):
%Within%
y on x m;
m on x;
x;
%Between%
y on x m;
m on x;
where x, m, and y correspond to your 3 variables for one spouse (you have to extend this to both spouses according to your actor-partner model). That means that a latent variable decomposition of the 3 variables into within and between components is done. The variables are correlated across time due to their random intercepts and means on the between level. 26 couples is a little low for such a two-level analysis, but it can be tried.
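In this model the indirect effect of x on y through m is the product a*b of the "m on x" and "y on m" slopes, on each level separately. A minimal Python sketch of the first-order delta-method (Sobel) SE, assuming uncorrelated estimates of a and b and using made-up numbers; in Mplus the product would typically be defined as a NEW parameter in MODEL CONSTRAINT using labeled slopes. The product is not normally distributed in small samples, so a normal-theory SE is only a rough guide with 26 clusters.

```python
import math

def indirect_effect(a, se_a, b, se_b):
    """Indirect effect a*b and its first-order delta-method (Sobel) SE,
    assuming the estimates of a and b are uncorrelated."""
    ab = a * b
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    return ab, se_ab

# hypothetical estimates: m on x (a) and y on m (b)
ab, se_ab = indirect_effect(0.5, 0.1, 0.4, 0.1)
print(ab, se_ab)  # ab = 0.2, se ≈ 0.064
```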


I've modeled it as you suggested, but the model is too complex (I actually have 2 x variables per spouse). Mplus warns that there are more parameters than clusters and suggests reducing the number of parameters. One solution I'm considering is to keep the model on the within level and control for the correlations between the two spouses' support variables (m) and between the spouses' family quality ratings (y). This works better, but Mplus still suggests reducing the number of parameters. What if I go with Type = complex? This model fits well and does not give any warnings. Or is it not possible to use Type = complex for this kind of data? Thanks again for your time!


You can use Type=Complex. 


Great, that will work well. Thank you so much for taking the time to read and answer all these questions. One final question: if I use type=complex, do I interpret the relationships as within or between? In other words, would I say for a significant effect: on days on which men had high job demands (as compared to days on which they had low job demands), they provided less support at home. Or would I say: men who had high job demands (as compared to men with low job demands) gave less support to their wives at home? 


With Type=Complex you are not dividing variables into within and between (the assumption is that this is not important) so you are looking at the relationships between the total observed variables. So the latter interpretation holds. 
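Put differently, TYPE=COMPLEX keeps the total-variable point estimates and corrects the standard errors for the non-independence of days within couples using a sandwich estimator. A rough Python analogue for an ordinary regression (cluster-robust OLS without small-sample correction; hypothetical function name and simulated data), not Mplus's MLR implementation:

```python
import numpy as np

def cluster_robust_ols(X, y, clusters):
    """OLS coefficients with cluster-robust (sandwich) standard errors."""
    X, y, clusters = np.asarray(X, float), np.asarray(y, float), np.asarray(clusters)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        s = X[clusters == g].T @ resid[clusters == g]  # cluster score vector
        meat += np.outer(s, s)
    vcov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))

rng = np.random.default_rng(3)
couple = np.repeat(np.arange(26), 5)     # 26 couples x 5 days
u = rng.normal(size=26)[couple]          # shared couple effect
x = rng.normal(size=130) + u
y = 0.4 * x + u + rng.normal(size=130)
X = np.column_stack([np.ones(130), x])
beta, se = cluster_robust_ols(X, y, couple)
print(beta, se)
```

The slope is the ordinary total-variable estimate; only its SE reflects the clustering.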


Perfect. Thank you so much for your help! I really appreciate it. Lieke 


A 101 question: I was wondering whether I am using the right method to get the correlations and their significance at both levels for a study with 2 DVs and x IVs. Could you tell me whether I got it right? Thanks in advance.
USEVARS = DV1 DV2 IV1 IV2 IV3 IV4;
WITHIN = IV1 IV2 etc;
BETWEEN = IV3 IV4 etc;
%WITHIN%
DV1 WITH DV2 IV1 IV2;
DV2 WITH IV1 IV2;
IV1 WITH IV2;
etc.
%BETWEEN%
DV1 WITH DV2 IV3 IV4;
DV2 WITH IV3 IV4;
IV3 WITH IV4;
etc.


This input looks fine and gives covariances. If you request Standardized, you also get correlations. 
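The correlations printed under STANDARDIZED for WITH statements are the covariances divided by the product of the two standard deviations at that level. A minimal numpy sketch of that rescaling for a whole within (or between) covariance matrix; the matrix values are made up.

```python
import numpy as np

def cov_to_corr(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

within_cov = [[4.0, 3.0], [3.0, 9.0]]  # hypothetical DV1/DV2 within-level covariances
corr = cov_to_corr(within_cov)
print(corr)  # off-diagonal 0.5
```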


Dear Dr Muthen, Are there any theoretical or methodological reasons to include or exclude the exogenous variables as dependent variables in the model (using WITH statements)? Usually, I specify the means of the predictors to make sure that all the data are used (and Mplus then automatically estimates the correlations). For the results, however, it seems to make quite a difference whether the correlations between predictors are estimated or not. Therefore, I would like to know on what grounds one should decide to include these correlations. Thank you for your answer! Sincerely, Aurelie


Generally speaking, I think bringing x's into the model is often done too casually. You are better off generally if you don't have to bring the x's into the model because when you do, you are adding assumptions to your modeling. But with much missing data on the x's, you may not have a choice. In that case, there are several considerations, particularly when some x's are binary. It's a long story. We discuss the issues in our RMA book chapters 9 and 10 where we show simulation studies indicating that certain ways of bringing x's into the model are better than other ways. 
