This is a basic question regarding the correlation of exogenous variables in SEM.
I have a mediation model where 1) a latent variable is regressed on several observed exogenous variables, 2) a mediator is regressed on that latent variable and the exogenous variables, 3) a dependent variable is regressed on all of the above variables.
The exogenous variables are known to be correlated (e.g. race, income, education, age). From the Mplus output, it appears that the model does not automatically estimate the correlations among exogenous variables.
How should I approach the correlation of these exogenous variables? Are there specific guidelines or rationale that I should consider? My research question does not specifically involve the exogenous variables; rather, I am adjusting for them because there is known to be sociodemographic variation in my constructs of interest.
In regression, the model is estimated conditioned on observed exogenous variables. Their means, variances, and covariances are not model parameters. If you want to know their means, variances, and covariances, do a TYPE=BASIC.
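For readers following along, a descriptive-only run of the kind mentioned above might look roughly like the sketch below. The file name and variable names are hypothetical placeholders, not taken from the original poster's data.

```mplus
TITLE: Descriptive statistics for the covariates (sketch)
DATA:
  FILE IS example.dat;             ! hypothetical file name
VARIABLE:
  NAMES ARE race income educ age;  ! hypothetical variable names
  USEVARIABLES ARE race income educ age;
ANALYSIS:
  TYPE = BASIC;                    ! descriptive statistics only, no MODEL command
```

No MODEL command is given here because a TYPE=BASIC run is used only to inspect the sample statistics.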
Thank you Linda. I may not have been completely clear. I am not necessarily interested in obtaining estimates for these correlations.
Rather, I am trying to figure out the implications of including versus not including WITH statements for the exogenous variables when conducting SEM. I.e., how will including versus not including such WITH statements impact the overall model? Is there a rationale for choosing when to include these correlations?
I would like to account for correlations between covariates and an exogenous factor. I am regressing:
F2 ON F1 X1 X2;   ! X1-X2 observed covariates
F2 BY V1-V5;      ! F2 endogenous factor
F1 BY U1-U3;      ! F1 exogenous factor
I plan to use the WLSMV estimator (indicators U_i and V_j are categorical).
A. According to the Mplus 6.1 version note, the default in ML estimation is no longer to include the correlations between covariates in the model. Is that also the default in WLSMV estimation?
B. By default, Mplus assumes an absence of correlation between the exogenous factor F1 and the covariates X1 X2, right?
Non-zero correlations would however be more realistic here (F1=impairment, X1=pain, X2=parenting stress). Modification indices do suggest links: F1 ON X1; F1 ON X2;
Yet, domain-driven theory makes more sense of a reverse link: X1 X2 ON F1.
C. What are the statistical implications of replacing "F1 ON X1 X2" with: C1. "X1 X2 ON F1"? C2. "X1 X2 WITH F1"? Does the latter imply a zero correlation between covariates X1 and X2, unless I add "X1 WITH X2"?
Many thanks in advance for your attention and any guidance,
Thank you very much for your prompt answer, I highly appreciate your constant timeliness in answering our questions.
Just a few clarifications I would like to add:
Re:A. I was wondering why your 6.1 version note mentioned only ML estimation (and not others like WLSMV). Was it already implicit in WLSMV that correlations between covariates are also left out of the model?
In my analyses, the WLSMV regression results differ considerably depending on whether I specify correlations between the exogenous factor F1 and the covariates X1 X2. In contrast, MLR results vary little with or without those correlations (and are close to the WLSMV results with the correlations specified). Do MLR and WLSMV handle the correlation between exogenous factors and covariates differently?
Re:C1-C2. Actually *both* "X1 X2 ON F1" and "X1 X2 WITH F1" imply a zero correlation between covariates X1 and X2 (conditioned on F1), unless I add "X1 WITH X2", right?
Is this added complexity the reason that
C3. Your 6.1 version note says: "Because covariates are characterized as exogenous variables, using ON with the covariates on the right-hand side is a natural approach"?
C4. Modification indices usually propose "F1 ON X1" and "F1 ON X2" instead of "X1 X2 ON F1"?
Before Version 6.1, all models were estimated conditioned on the observed exogenous variables except maximum likelihood with continuous dependent variables. The model results in this case are the same whether the model is estimated conditioned on the observed exogenous variables or not. We made the change for consistency.
With WLSMV we do not recommend using WITH with observed and latent exogenous variables. We recommend using ON. WITH changes the model.
c1-c2. I think so. Try it and see.
c4. Modification indices are given for all fixed or constrained parameters. Whether it makes sense to add them is determined by the user.
Using WITH, the sample statistics for model estimation are tetrachoric/polychoric correlations. With ON, they are probit regression coefficients and residual correlations.
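As a rough illustration of the ON-based specification recommended above, using the factor and indicator names from the question, the input might be sketched as follows. The data file name and the order of variables in NAMES are assumptions made for the example.

```mplus
TITLE: WLSMV with covariates on the right-hand side of ON (sketch)
DATA:
  FILE IS example.dat;            ! hypothetical file name
VARIABLE:
  NAMES ARE u1-u3 v1-v5 x1 x2;    ! assumed order of variables in the file
  CATEGORICAL ARE u1-u3 v1-v5;    ! categorical factor indicators
ANALYSIS:
  ESTIMATOR = WLSMV;
MODEL:
  f1 BY u1-u3;
  f2 BY v1-v5;
  f1 ON x1 x2;                    ! relate F1 to the covariates with ON
  f2 ON f1 x1 x2;
  ! No "x1 x2 WITH f1" and no "x1 WITH x2": the covariates stay
  ! outside the model, which is estimated conditional on them.
```

Whether regressing F1 on X1 and X2 is substantively defensible remains a modeling decision for the analyst; the sketch only shows the syntax.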
ellen posted on Friday, September 14, 2012 - 10:32 am
This is a basic question. It appears that Mplus estimates correlations between exogenous predictor variables by default even when the WITH statements are not specified.
I have 4 exogenous predictors (e.g., A, B, C, D), 1 mediator, and 1 outcome variable. When I specified WITH statements for the covariances among three exogenous variables (A, B, C), but did NOT specify a WITH statement for those three variables with the fourth variable, D, Mplus somehow still estimated the covariances in the output.
If I don't want a covariance parameter to be estimated between two exogenous variables, do I need to fix it to zero or simply not estimate it? Does fixing a parameter to zero mean the same as removing this parameter from the model?
We need to see the output, but the answer to your last question is yes.
Stephanie posted on Thursday, November 15, 2012 - 3:39 am
We have questions regarding the correlation of exogenous variables in our SEM. (We are using MPlus Version 5.1).
In our model we have four exogenous variables: One latent variable, two continuous manifest variables and one dichotomous manifest variable. As our dependent variable is dichotomous we are using the WLSMV estimator. Our questions are:
1. Do we explicitly have to integrate a correlation with the WITH statement for all these exogenous variables – manifest and latent? Or does MPlus calculate these correlations by default?
2. If it does, where can we see the correlations with their level of significance in the output? In our current output (without integrated correlations) we can only see correlations between the latent exogenous variable and all the other variables in the model but no correlations between this latent exogenous variable and the other exogenous variables.
3. Are correlations probably only necessary between exogenous manifest and latent variables but not between exogenous manifest variables?
We thank you very much in advance for your support!
With WLSMV, I would not covary the observed and latent exogenous variables. I would relate these variables using the observed exogenous variables on the right-hand side of ON statements where the latent variables are dependent variables.
Stephanie posted on Wednesday, April 10, 2013 - 6:39 am
We have a further question regarding the correlation of exogenous variables in our SEM.
We are using the same model as mentioned in our last question on November 15th. But now we have only four manifest exogenous variables, of which three are continuous and one is dichotomous. Our dependent variable is still dichotomous, so we are using the WLSMV estimator.
In this case, do we have to integrate a correlation with the WITH statement for all these exogenous variables? Or does MPlus calculate these correlations by default? And if it does, where can we see the correlations with their level of significance in the output?
In regression, the model is estimated conditional on the observed exogenous variables. You should not mention them in the model command. You can see their correlations in the descriptive statistics from the SAMPSTAT option.
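A minimal sketch of this setup is given below, assuming a single dichotomous outcome and hypothetical variable and file names. SAMPSTAT is requested in the OUTPUT command, and the covariates appear only on the right-hand side of ON.

```mplus
TITLE: Dichotomous outcome with four observed covariates (sketch)
DATA:
  FILE IS example.dat;     ! hypothetical file name
VARIABLE:
  NAMES ARE y x1-x4;       ! hypothetical variable names
  CATEGORICAL ARE y;       ! only dependent variables go on this list;
                           ! the dichotomous covariate x4 does not
ANALYSIS:
  ESTIMATOR = WLSMV;
MODEL:
  y ON x1-x4;              ! no WITH statements among x1-x4
OUTPUT:
  SAMPSTAT;                ! sample statistics are an OUTPUT option
```

If the covariate correlations themselves are needed, a separate TYPE=BASIC run on those variables, as suggested earlier in this thread, is another option.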
Stephanie posted on Thursday, April 11, 2013 - 12:04 am
Thank you very much for your help! I have included the SAMPSTAT option in my model command. But unfortunately the output only contains correlations between all the other variables in the model, not between the four manifest exogenous variables. How is it possible to get them?
I performed a SEM with observed and latent variables. A reviewer stated that we "Ought provide the correlation matrix".
I am concerned about this. In fact, the SEM was performed to test a model we had tested on another sample. Furthermore, the correlation matrix would refer to observed variables rather than latent ones.
1. Do you think it is really important to provide the correlation matrix of the variables tested within the SEM?
2. Does it make sense to you that the relationships (coefficients) obtained with the SEM differ from those obtained with correlation (e.g., a significant relationship in SEM is not significant in the correlation matrix)?
It makes sense to provide descriptive statistics for the data that are analyzed. Means, variances, and a correlation matrix provide a good description of the data and can be used to reanalyze the data.
Carolyn CL posted on Monday, June 10, 2013 - 1:38 pm
Dear Drs. Muthen and MPLUS experts,
After reading up on the issue of correlating exogenous variables, I would like to be clear on the following:
Model:
Y1 ON Z1 X1 X2 X3;   !X1-X3 are observed exogenous variables
Z1 ON X1 X2 X3;
Z1 BY X4 X5 X6 X7;   !Z1 is an endogenous latent variable
1. The regression coefficients for variables X1-X3 on Y1 and Z1 are estimated conditional on each other (and on Z1, for Y1).
2. This allows me to say, in the interpretation of the results, that the effect of X1 (for example) on Y1, controls for the effects of X2, X3 and Z1.
3. If I want to say that I allowed X1-X3 to correlate with each other, I would need to add WITH terms as follows to the model:
X1 WITH X2 X3; X2 WITH X3;
In this way, I would effectively be estimating parameters for the covariance between X1-X3.
1-2 are correct, but 3 is wrong. You don't need to, and typically should not, add WITH terms for the observed exogenous variables (what we call covariates). The covariance parameters activated by WITH are not part of the model. You can think of the covariates as being correlated by default - just like they are in regular regression. The sample statistics show their values.
Carolyn CL posted on Monday, June 10, 2013 - 3:35 pm
Many thanks for this.
One final question, what would be the implication for the model and interpretation of the results if I did add covariance parameters to the model by using the WITH statement?
I ask because a reviewer noted that 'We would expect substantial bivariate correlations between the various measures X1-X3. Since these are all modeled simultaneously as manifest variables, I'm concerned about the validity of the model results'.
It sounds like the reviewer does not understand that the x1-x3 correlations are not held at zero in your analysis. Your analysis is ok if these 3 variables correlate, which covariates typically do.
In your case, if you have no missing data on x1-x3, if you add WITH among your 3 covariates you will get the same results, except that CFI and TLI will be inflated.
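To make the two variants discussed here concrete, the MODEL command could be sketched as below, using the variable names from the question. The WITH lines are shown commented out because, per the replies above, they are not needed for the covariates to be allowed to correlate.

```mplus
MODEL:
  z1 BY x4 x5 x6 x7;    ! Z1 is an endogenous latent variable
  z1 ON x1 x2 x3;       ! X1-X3 are observed covariates
  y1 ON z1 x1 x2 x3;
  ! Optional variant, typically not recommended: uncommenting the
  ! lines below brings the covariate covariances into the model.
  ! x1 WITH x2 x3;
  ! x2 WITH x3;
```

Running both versions on the same data is one way to check for oneself how the estimates and fit statistics compare.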
GP posted on Thursday, September 19, 2013 - 12:54 pm
Dear Drs. Muthen,
I'm running the path model below. x1, x4, and m are dichotomous; x2 and x3 are continuous. tc is censored, with corresponding time variable t. I also have a clustering variable c.
I would like to know how to model the correlation between x2/x4 and x3/x4. I understand I cannot use the WITH command, since x4 is dichotomous.
Thank you for your help.
Variable:
  Names = x1 x2 x3 x4 m t tc c;
  Categorical = m x4;
  Cluster = c;
  Survival = t (all);
  Timecensored = tc (0=not 1=right);
Analysis:
  Basehazard = off;
  Type = complex;
  Algorithm = integration;
  Integration = montecarlo;
Model:
  x2 ON x1;
  x3 ON x1;
  x4 ON x1;
  m ON x2 x3 x4;
  t ON x2 x3 x4 m;
This is a big topic, made more complex since you consider a continuous-time survival outcome. x4 is dichotomous and also a DV. For mediation models like this one, one approach is to work with x4*, the latent continuous response variable behind the observed x4. Then all relationships stay linear and things are easy. WLSMV does that, but doesn't handle survival analysis. You need ML for that, but ML does not use x4*, so you end up with a mixture of linear and logit/probit regressions. To do mediation modeling right, you should use ML and the causal effect approaches described by e.g. VanderWeele in the epidemiology literature. I think such survival analysis can be done in Mplus, but I haven't tried it out. I did not include that in my causal effects paper (see our website).
I know this wasn't your question, but felt I had to mention it. With ML, the residual covariances that you ask about can be handled using a factor behind the two variables - then they correlate beyond what their predictors produce.
They will not be. The model is estimated conditioned on the observed exogenous covariates. Their means, variances, and covariances are not model parameters.
shaun goh posted on Thursday, October 01, 2015 - 7:03 pm
Dear Prof Muthen,
I am struggling with the specification of correlations between predictors, in a model where there are three predictors : two exogenous covariates and one endogenous latent factor.
By default, Mplus correlates the two exogenous covariates and does not bring them into the likelihood. However, the two exogenous covariates are not correlated with the one endogenous latent factor.
On one hand, I do not want to do so as one of the covariates is non-normal (i.e. gender) and using the WITH command decreases fit. On the other hand, I understand that predictors are typically correlated, and I should specify correlations between all predictors?