This is a basic question regarding the correlation of exogenous variables in SEM.
I have a mediation model where 1) a latent variable is regressed on several observed exogenous variables, 2) a mediator is regressed on that latent variable and the exogenous variables, 3) a dependent variable is regressed on all of the above variables.
The exogenous variables are known to be correlated (e.g. race, income, education, age). From the Mplus output, it appears that the model does not automatically estimate the correlations among exogenous variables.
How should I approach the correlation of these exogenous variables? Are there specific guidelines or rationale that I should consider? My research question does not specifically involve the exogenous variables; rather, I am adjusting for them because there is known to be sociodemographic variation in my constructs of interest.
In regression, the model is estimated conditioned on observed exogenous variables. Their means, variances, and covariances are not model parameters. If you want to know their means, variances, and covariances, do a TYPE=BASIC.
Thank you Linda. I may not have been completely clear. I am not necessarily interested in obtaining estimates for these correlations.
Rather, I am trying to figure out the implications of including versus not including WITH statements for the exogenous variables when conducting SEM. That is, how does including or omitting such WITH statements affect the overall model? Is there a rationale for deciding when to include these correlations?
I would like to account for correlations between covariates and an exogenous factor. I am regressing:
F2 ON F1 X1 X2; ! X1-X2 observed covariates
F2 BY V1-V5; ! F2 endogenous factor
F1 BY U1-U3; ! F1 exogenous factor
I plan to use the WLSMV estimator (indicators U_i and V_j are categorical).
A. According to the Mplus 6.1 version note, with ML estimation the correlations between covariates are no longer included in the model by default. Is that also the default with WLSMV estimation?
B. By default, Mplus assumes no correlation between the exogenous factor F1 and the covariates X1 X2, right?
Non-zero correlations would however be more realistic here (F1=impairment, X1=pain, X2=parenting stress). Modification indices do suggest links: F1 ON X1; F1 ON X2;
Yet domain theory favors the reverse direction: X1 X2 ON F1.
C. What are the statistical implications of replacing "F1 ON X1 X2" with C1. "X1 X2 ON F1" or C2. "X1 X2 WITH F1"? Does the latter imply a zero correlation between the covariates X1 and X2 unless I add "X1 WITH X2"?
Many thanks in advance for your attention and any guidance,
Thank you very much for your prompt answer, I highly appreciate your constant timeliness in answering our questions.
Just a few clarifications I would like to add:
Re: A. I was wondering why your 6.1 version note mentioned only ML estimation (and not other estimators such as WLSMV). Was it already implicit in WLSMV that correlations between covariates are left out of the model?
In my analyses, the WLSMV regression results differ considerably depending on whether or not I specify correlations between the exogenous factor F1 and the covariates X1 X2. In contrast, the MLR results vary little with or without those correlations (and are close to the WLSMV results with the correlations specified). Do MLR and WLSMV handle the correlation between exogenous factors and covariates differently?
Re:C1-C2. Actually *both* "X1 X2 ON F1" and "X1 X2 WITH F1" imply a zero correlation between covariates X1 and X2 (conditioned on F1), unless I add "X1 WITH X2", right?
Is this added complexity the reason for the following?
C3. Your 6.1 version note says: "Because covariates are characterized as exogenous variables, using ON with the covariates on the right-hand side is a natural approach".
C4. Modification indices usually propose "F1 ON X1" and "F1 ON X2" instead of "X1 X2 ON F1".
Before Version 6.1, all models were estimated conditioned on the observed exogenous variables except maximum likelihood with continuous dependent variables. The model results in this case are the same whether the model is estimated conditioned on the observed exogenous variables or not. We made the change for consistency.
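Why conditioning makes no difference for ML with continuous dependent variables: the joint normal log-likelihood of a covariate x and an outcome y splits into a marginal part for x (its mean and variance) plus a conditional part for y given x (intercept, slope, residual variance), and no parameter is shared between the two parts. Maximizing over the regression parameters is therefore the same whether or not the covariate's mean and variance are treated as parameters. The minimal sketch below is plain Python, not Mplus; it assumes one covariate and hypothetical parameter values, and it checks the split numerically at a single data point. It says nothing about WLSMV or categorical outcomes, where no such guarantee is claimed.

```python
import math

def log_normal(v, mu, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mu) ** 2 / var)

def log_bivariate_normal(x, y, mx, my, vxx, vxy, vyy):
    """Log density of a bivariate normal, written out explicitly."""
    det = vxx * vyy - vxy ** 2
    dx, dy = x - mx, y - my
    quad = (vyy * dx * dx - 2 * vxy * dx * dy + vxx * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

# Hypothetical parameters: x is the observed exogenous covariate, y = a + b*x + e
mu_x, var_x = 2.0, 1.5        # covariate part: mean and variance of x
a, b, var_e = 0.3, 0.8, 0.5   # regression part: what the analyst cares about

# Joint moments of (x, y) implied by the two parts
my = a + b * mu_x
vxy = b * var_x
vyy = b * b * var_x + var_e

x, y = 1.1, 1.4  # an arbitrary data point
joint = log_bivariate_normal(x, y, mu_x, my, var_x, vxy, vyy)
split = log_normal(x, mu_x, var_x) + log_normal(y, a + b * x, var_e)
print(joint, split)
```

Because the identity holds at every data point, it holds for the summed sample log-likelihood as well, so the covariate part can be dropped without changing the estimates of a, b, and var_e.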
With WLSMV we do not recommend using WITH with observed and latent exogenous variables. We recommend using ON. WITH changes the model.
C1-C2. I think so. Try it and see.
C4. Modification indices are given for all fixed or constrained parameters. Whether it makes sense to free them is for the user to decide.
Using WITH the sample statistics for model estimation are tetrachoric/polychoric correlations. With ON they are probit regression coefficients and residual correlations.
ellen posted on Friday, September 14, 2012 - 10:32 am
This is a basic question. It appears that Mplus estimates correlations between exogenous predictor variables by default even when the WITH statements are not specified.
I have 4 exogenous predictors (A, B, C, D), 1 mediator, and 1 outcome variable. When I specified WITH statements for the covariances among three of the exogenous variables (A, B, C) but did NOT specify a WITH statement between these three and the fourth variable, D, Mplus still estimated those covariances in the output.
If I don't want a covariance parameter to be estimated between two exogenous variables, do I need to fix it to zero or simply not estimate it? Does fixing a parameter to zero mean the same as removing this parameter from the model?
We need to see the output, but the answer to your last question is yes.
Stephanie posted on Thursday, November 15, 2012 - 3:39 am
We have questions regarding the correlation of exogenous variables in our SEM. (We are using MPlus Version 5.1).
In our model we have four exogenous variables: One latent variable, two continuous manifest variables and one dichotomous manifest variable. As our dependent variable is dichotomous we are using the WLSMV estimator. Our questions are:
1. Do we explicitly have to integrate a correlation with the WITH statement for all these exogenous variables – manifest and latent? Or does MPlus calculate these correlations by default?
2. If it does, where can we see the correlations with their level of significance in the output? In our current output (without integrated correlations) we can only see correlations between the latent exogenous variable and all the other variables in the model but no correlations between this latent exogenous variable and the other exogenous variables.
3. Are correlations perhaps necessary only between exogenous manifest and latent variables, but not among the exogenous manifest variables?
We thank you very much in advance for your support!
With WLSMV, I would not covary the observed and latent exogenous variables. I would relate these variables using the observed exogenous variables on the right-hand side of ON statements where the latent variables are dependent variables.
Stephanie posted on Wednesday, April 10, 2013 - 6:39 am
We have a further question regarding the correlation of exogenous variables in our SEM.
We are using the same model as mentioned in our last question on November 15th. But now we have only four manifest exogenous variables, of which three are continuous and one is dichotomous. Our dependent variable is still dichotomous, so we are using the WLSMV estimator.
In this case, do we have to integrate a correlation with the WITH statement for all these exogenous variables? Or does MPlus calculate these correlations by default? And if it does, where can we see the correlations with their level of significance in the output?
In regression, the model is estimated conditional on the observed exogenous variables. You should not mention them in the model command. You can see their correlations in the descriptive statistics from the SAMPSTAT option.
Stephanie posted on Thursday, April 11, 2013 - 12:04 am
Thank you very much for your help! I have included the SAMPSTAT option in my model command. But unfortunately the output only contains correlations among all the other variables in the model, not among the four manifest exogenous variables. How can I get them?
I performed a SEM with observed and latent variables. A reviewer stated that we "Ought provide the correlation matrix".
I am concerned about this because the SEM was performed to test a model we had already tested on another sample. Furthermore, the correlation matrix would refer to the observed variables rather than the latent ones.
1. Do you think it is really important to provide the correlation matrix of the variables tested within the SEM?
2. Does it make sense that the relationships (coefficients) obtained with the SEM differ from those obtained with correlations? (e.g., a relationship that is significant in the SEM is not significant in the correlation matrix)
It makes sense to provide descriptive statistics for the data that are analyzed. Means, variances, and a correlation matrix provide a good description of the data and can be used to reanalyze the data.
Carolyn CL posted on Monday, June 10, 2013 - 1:38 pm
Dear Drs. Muthen and MPLUS experts,
After reading up on the issue of correlating exogenous variables, I would like to be clear on the following:
Model:
Y1 ON Z1 X1 X2 X3; !X1-X3 are observed exogenous variables
Z1 ON X1 X2 X3;
Z1 BY X4 X5 X6 X7; !Z1 is an endogenous latent variable
1. The regression coefficients for variables X1-X3 on Y1 and Z1 are estimated conditional on each other (and on Z1, for Y1).
2. This allows me to say, in the interpretation of the results, that the effect of X1 (for example) on Y1, controls for the effects of X2, X3 and Z1.
3. If I want to say that I allowed X1-X3 to correlate with each other, I would need to add WITH terms as follows to the model:
X1 WITH X2 X3; X2 WITH X3;
In this way, I would effectively be estimating parameters for the covariance between X1-X3.
1-2 are correct, but 3 is wrong. You don't need to, and typically should not, add WITH terms for the observed exogenous variables (what we call covariates). The covariance parameters activated by WITH are not part of the model. You can think of the covariates as being correlated by default - just like they are in regular regression. The sample statistics show their values.
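To see why the X1 coefficient "controls for" X2 even without any WITH statement, here is a minimal sketch in plain Python (not Mplus) with simulated, hypothetical data. The multiple-regression slope for x1 equals the slope obtained after partialling x2 out of both y and x1 (the Frisch-Waugh-Lovell result). The covariate correlation enters the estimate through the sample covariances; it is never a separate model parameter.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with divisor n."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def ols_two(y, x1, x2):
    """Slopes of y on x1 and x2 (with intercept), from the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def residualize(v, on):
    """Residuals of v after a simple regression (with intercept) on `on`."""
    slope = cov(v, on) / cov(on, on)
    mv, mo = mean(v), mean(on)
    return [a - mv - slope * (b - mo) for a, b in zip(v, on)]

random.seed(1)
n = 2000
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.6 * b + random.gauss(0, 0.8) for b in x2]  # x1 deliberately correlated with x2
y = [1.0 + 0.5 * a + 0.3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

b1, b2 = ols_two(y, x1, x2)

# Frisch-Waugh-Lovell: partial x2 out of both y and x1, then regress residual on residual
ry, r1 = residualize(y, x2), residualize(x1, x2)
b1_fwl = cov(r1, ry) / cov(r1, r1)
print(round(b1, 6), round(b1_fwl, 6))
```

The two printed slopes agree, which is the sense in which the effect of X1 is adjusted for X2 while the X1-X2 correlation is simply taken from the data.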
Carolyn CL posted on Monday, June 10, 2013 - 3:35 pm
Many thanks for this.
One final question, what would be the implication for the model and interpretation of the results if I did add covariance parameters to the model by using the WITH statement?
I ask because a reviewer noted that 'We would expect substantial bivariate correlations between the various measures X1-X3. Since these are all modeled simultaneously as manifest variables, I'm concerned about the validity of the model results'.
It sounds like the reviewer does not understand that the x1-x3 correlations are not held at zero in your analysis. Your analysis is ok if these 3 variables correlate, which covariates typically do.
In your case, if you have no missing data on x1-x3, adding WITH statements among your 3 covariates will give you the same results, except that CFI and TLI will be inflated.
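One way to see how CFI and TLI (comparative fit index and Tucker-Lewis index) can rise while the model's own chi-square stays the same: both indices compare the model with a baseline model. Assuming (this mechanism is an interpretation, not stated in the thread) that bringing the covariates into the model with WITH causes the baseline model to fix their covariances at zero as well, the baseline chi-square grows when the covariates are correlated, and both indices move toward 1. The sketch uses the standard CFI and TLI formulas with made-up chi-square values.

```python
def cfi(chi_m, df_m, chi_b, df_b):
    """Comparative fit index from model (m) and baseline (b) chi-squares."""
    num = max(chi_m - df_m, 0.0)
    den = max(chi_m - df_m, chi_b - df_b, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den

def tli(chi_m, df_m, chi_b, df_b):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_b = chi_b / df_b
    return (ratio_b - chi_m / df_m) / (ratio_b - 1.0)

# Hypothetical numbers: the model chi-square and df are identical in both runs;
# only the baseline model differs.
model = (45.0, 20)
baseline_cov_free = (300.0, 28)  # baseline leaves the 3 covariate covariances free
baseline_cov_zero = (900.0, 31)  # baseline also fixes those 3 covariances at 0

for label, base in [("free", baseline_cov_free), ("zero", baseline_cov_zero)]:
    print(label, round(cfi(*model, *base), 3), round(tli(*model, *base), 3))
```

Nothing about the substantive model changes between the two lines of output; only the yardstick the indices are measured against does.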
GP posted on Thursday, September 19, 2013 - 12:54 pm
Dear Drs. Muthen,
I'm running the path model below. x1, x4, and m are dichotomous; x2 and x3 are continuous. tc is censored, with corresponding time variable t. I also have a clustering variable c.
I would like to know how to model the correlation between x2/x4 and x3/x4. I understand I cannot use the WITH command, since x4 is dichotomous.
Thank you for your help.
Variable:
Names = x1 x2 x3 x4 m t tc c;
Categorical = m x4;
Cluster = c;
Survival = t (all);
Timecensored = tc (0=not 1=right);
Analysis:
Basehazard = off;
Type = complex;
Algorithm = integration;
Integration = montecarlo;
Model:
x2 ON x1;
x3 ON x1;
x4 ON x1;
m ON x2 x3 x4;
t ON x2 x3 x4 m;
This is a big topic, made more complex since you consider a continuous-time survival outcome. x4 is dichotomous and also a DV. For mediation models like this one, one approach is to work with x4*, the continuous latent response variable behind the observed x4. Then all relationships stay linear and things are easy. WLSMV does that, but it doesn't handle survival analysis. You need ML for that, but ML does not use x4*, so you end up with a mixture of linear and logit/probit regressions. To do mediation modeling right, you should use ML and the causal effect approaches described by, e.g., VanderWeele in the epidemiology literature. I think such survival analysis can be done in Mplus, but I haven't tried it out. I did not include it in my causal effects paper (see our website).
I know this wasn't your question, but I felt I had to mention it. With ML, the residual covariances that you ask about can be obtained by putting a factor behind the two variables; they then correlate beyond what their predictors produce.
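The factor approach works because a common factor adds lambda1 * lambda2 * Var(f) to the covariance of the two variables it loads on. The minimal simulation sketch below is plain Python with continuous variables and hypothetical loadings and factor variance. It deliberately omits the predictors and the binary and survival parts of the model above, so it only illustrates the covariance algebra, not the Mplus setup.

```python
import random

def cov(a, b):
    """Sample covariance with divisor n."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

random.seed(7)
n = 50_000
lam1, lam2 = 1.0, 0.5  # hypothetical loadings (first fixed at 1 to set the scale)
var_f = 0.4            # hypothetical factor variance

y1, y2 = [], []
for _ in range(n):
    f = random.gauss(0.0, var_f ** 0.5)           # shared factor
    y1.append(lam1 * f + random.gauss(0.0, 1.0))  # independent residual
    y2.append(lam2 * f + random.gauss(0.0, 1.0))  # independent residual

implied = lam1 * lam2 * var_f  # covariance induced by the factor
print(round(cov(y1, y2), 3), implied)
```

The simulated covariance lands close to the implied value, even though the two residual terms are generated independently.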
They will not be. The model is estimated conditioned on the observed exogenous covariates. Their means, variances, and covariances are not model parameters.
shaun goh posted on Thursday, October 01, 2015 - 7:03 pm
Dear Prof Muthen,
I am struggling with the specification of correlations between predictors, in a model where there are three predictors : two exogenous covariates and one endogenous latent factor.
By default, Mplus correlates the two exogenous covariates and does not bring them into the likelihood. However, the two exogenous covariates are not correlated with the one endogenous latent factor.
On one hand, I do not want to do so as one of the covariates is non-normal (i.e. gender) and using the WITH command decreases fit. On the other hand, I understand that predictors are typically correlated, and I should specify correlations between all predictors?
I would like to check on something that you mentioned in a post above regarding WITH terms.
"You don't need to, and typically should not, add WITH terms for the observed exogenous variables (what we call covariates). The covariance parameters activated by WITH are not part of the model. You can think of the covariates as being correlated by default - just like they are in regular regression. The sample statistics show their values."
I would like to confirm that this is also the case for type=complex and MLR estimation. That is, that the covariates are correlated by default in the model, and are recognized as such because they are on the right side of ON.
We have a mediation model in which the mediator and dependent variable are latent factors, and the independent variables are a mix of continuous and binary observed variables.
The model looks like this:
F1 by V1 V2 V3; F2 by V4 V5 V6 V7 V8;
F2 ON V9 V10 V11 V12; F1 ON V9 V10 V11 V12;
F1 ON F2;
F2 ON V13 V14 V15 V16; F1 ON V13 V14 V15 V16;
F1 IND V13; F1 IND V14; F1 IND V15; F1 IND V16;
Where V1-V8 are all continuous variables, V9-V12 are a mix of binary and continuous control variables, and V13-V16 are all binary independent variables.
This is true in any regression model. In regression, the model is estimated conditioned on the covariates. They are not assumed to be uncorrelated.
Irene Dias posted on Tuesday, April 12, 2016 - 3:05 pm
I have four exogenous variables (two observed [x2 and x4] and two latent [F1 and F2]), two observed mediators (x1 and x3), and two endogenous latent variables (F3 and F4). All observed variables are continuous and I am using ML. The model is as follows:
F1 by a1 a2; !F1 exogenous factor
F2 by a3 a4; !F2 exogenous factor
F3 by a5 a6; !F3 endogenous factor
F4 by a7 a8; !F4 endogenous factor
F3 ON F1 x1 x2; x1 ON F1 x2;
F4 ON F2 x3 x4; x3 ON F2 x4;
Model indirect: F3 IND x1 x2; F3 IND x1 F1;
F4 IND x3 x4; F4 IND x3 F2;
I am not specifying any WITH statements, but I can see in the diagram that, by default, Mplus calculates the covariance between x2 and x4 and the covariance between F1 and F2 (double-headed arrows in the diagram). However, theoretically, F1 should be correlated not only with F2 but also with x2 and x4, and the same holds for F2; i.e., all four observed and latent predictors should correlate. Should I specify the statements F1 WITH x2 x4 and F2 WITH x2 x4? If I understood what you said before, I should not specify WITH statements between the observed predictors, right?
We show the covariance between x2 and x4 because they are not uncorrelated during model estimation although the parameter is not estimated. In regression, the model is estimated conditioned on the observed exogenous variables. If you want to relate exogenous observed and latent variables, you should use ON statements.
Deniz posted on Thursday, November 16, 2017 - 7:32 am
Dear Dr. Muthen, I have a structural equation model in which four latent dependent variables are predicted by three latent and two manifest independent variables. 1) Does mplus take correlations between manifest and latent independent variables into account? 2) And if yes, where can I find these correlations in the output? (I know that correlations between latent variables are given in the model results section & correlations between manifest variables are given in sampstat results).
1) You will see in the output if these correlation estimates show up. If not, they are zero - and you may want to free them using WITH in the Model command.
Eric M. posted on Thursday, December 14, 2017 - 2:59 pm
Hi. I’d like some clarity about whether or not exogenous latent variables (with continuous indicators) should be correlated with exogenous observed variables (all continuous). In the below example exogenous observed variables are used as controls in this model. Do I need to correlate the exogenous latent variable (F1) with the exogenous observed variables (gender, age)? I noticed that if I added other exogenous latent variables that latent variables are correlated with each other… and the observed variables are correlated with each other. However, exogenous latent variables are not correlated with exogenous observed variables in the diagram that is produced.
It was my understanding that exogenous variables should be correlated. Is this something specific to MPLUS’s defaults? Can you please clarify? Thank you!
For example:
F1 BY V1 V2 V3;
Z1 Y1 ON F1;
Y2 ON Y1;
Z2 ON Z1;
Y3 ON Z2 Y2;
!with covariates (observed variables) on some of the endogenous variables
Y1 Y2 Y3 ON gender; Y1 Y2 Y3 ON age;
Eric M. posted on Thursday, December 14, 2017 - 3:25 pm
Correction and additional info.
Gender is clearly not continuous; the example I provided isn't the exact model.
But when I add a WITH statement correlating these exogenous variables, the model fails to converge and gives the message below, where the problem parameter is one of the covariates.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.535D-20.
Yes, with continuous DVs you should correlate the exogenous factors with the observed, exogenous variables. This is not the default due to aspects of different estimators (with categorical DVs and WLSMV you don't want to use WITH but instead ON).
The message you get is ignorable when one of your observed exogenous variables is binary. It happens because due to the WITH statement you are also estimating the mean and variance of the binary variable and those 2 parameters are mathematically related as p and p*(1-p).
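The relation mentioned here is easy to check: for a 0/1 variable, the ML estimates of the mean and variance are p and p(1-p), so the variance carries no information beyond the mean. A minimal Python sketch with a hypothetical 0/1 covariate, using exact fractions so the equality is exact rather than approximate:

```python
from fractions import Fraction

def ml_mean_var(values):
    """ML (divisor n) mean and variance, computed exactly with fractions."""
    n = len(values)
    m = Fraction(sum(values), n)
    v = sum((Fraction(x) - m) ** 2 for x in values) / n
    return m, v

gender = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]  # hypothetical 0/1 covariate
p, var = ml_mean_var(gender)
print(p, var, p * (1 - p))  # prints: 3/5 6/25 6/25
```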
Eric M. posted on Thursday, December 14, 2017 - 6:54 pm
Thank you! That really clears things up for me.
Eric M. posted on Friday, February 16, 2018 - 12:36 pm
I have an additional follow-up to a previous post about covariances among observed and latent exogenous variables. Bengt mentioned: "Yes, with continuous DVs you should correlate the exogenous factors with the observed, exogenous variables. This is not the default due to aspects of different estimators (with categorical DVs and WLSMV you don't want to use WITH but instead ON)."
However, I just noticed that by doing so you lose degrees of freedom. This wouldn't normally occur when, let's say, the variables were all observed or all latent. Is this problematic, or does it change the model substantially?