Control variables in SEM
Mplus Discussion > Structural Equation Modeling
 Pankaj P posted on Monday, June 05, 2006 - 5:13 pm
How do I model control variables in SEM? I am trying to do it in the spirit of hierarchical regression. If I treat my control variable as a mediator variable, would it work?
 Bengt O. Muthen posted on Monday, June 05, 2006 - 5:41 pm
Remind me about the definition of a "control variable".
 Pankaj P posted on Monday, June 05, 2006 - 7:13 pm
Controlling for a variable means explaining the relationship between the independent and dependent variables AFTER we remove the impact of the control variable on the DV. In other words, we run the regression on the residuals from regressing the DV on the control variable.
I read in a paper that I could do the same in SEM, i.e., run the SEM on the residual covariance matrix. However, would modeling indirect effects take care of the controlling (besides the mediating effect)?
 Linda K. Muthen posted on Tuesday, June 06, 2006 - 10:19 am
Yes, you can use the residual covariance matrix as data in Mplus. I don't think that indirect effects achieve what you want.
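A small numeric sketch of the "controlling via residuals" idea from the question, done in ordinary least squares rather than SEM (an illustrative assumption; this is not how Mplus handles a residual covariance matrix, and the data are hypothetical). By the Frisch-Waugh-Lovell theorem, the slope of X in Y ~ X + C equals the slope obtained after residualizing both X and Y on C. Residualizing only the DV, as described in the question, shrinks the slope toward zero by the factor 1 - R² of X on C whenever X and C are correlated.

```python
# Sketch only: "controlling for" C in OLS (Frisch-Waugh-Lovell).
# The toy data below are hypothetical; nothing here is Mplus output.


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance (n - 1 divisor)."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def simple_slope(x, y):
    """OLS slope of y on a single predictor x (intercept included)."""
    return cov(x, y) / cov(x, x)


def residualize(v, c):
    """Residuals of v after an OLS regression of v on c (with intercept)."""
    b = simple_slope(c, v)
    a = mean(v) - b * mean(c)
    return [vi - (a + b * ci) for vi, ci in zip(v, c)]


def slope_x_given_c(x, c, y):
    """Coefficient of x in y ~ x + c, solved from the 2x2 normal equations."""
    sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
    sxy, scy = cov(x, y), cov(c, y)
    return (scc * sxy - sxc * scy) / (sxx * scc - sxc ** 2)


# Hypothetical data: C is correlated with X and also affects Y.
c = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 5.0, 6.0]
x = [0.5, 1.0, 2.5, 2.0, 3.5, 3.0, 4.5, 5.0]
y = [2.0 * xi + 1.5 * ci for xi, ci in zip(x, c)]  # true slope of X is 2

full = slope_x_given_c(x, c, y)                            # Y ~ X + C
both = simple_slope(residualize(x, c), residualize(y, c))  # residualize X and Y
y_only = simple_slope(x, residualize(y, c))                # residualize Y only

print(f"Y ~ X + C:               {full:.4f}")
print(f"resid(Y|C) ~ resid(X|C): {both:.4f}")
print(f"resid(Y|C) ~ X:          {y_only:.4f}")
```

The first two lines agree (both recover the slope of 2); the third is much smaller because the part of X that overlaps with C was never removed from X.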
 Bonnie Zhang posted on Sunday, June 18, 2006 - 11:45 am
For a simple mediation model, if I control for var1 and var2 on path a, is it required to control for the same variables on paths b and c? In other words, do I need to control for the same variables for each path in the model? Thanks a lot!
 Bengt O. Muthen posted on Sunday, June 18, 2006 - 6:22 pm
I think that is a good idea.
 tommy lake posted on Thursday, June 14, 2007 - 9:24 pm
Dr. Muthen,

I'd like to follow up on this discussion. Is it required to control for the same variables for each path in an SEM model, even when theory does not require it?

 Linda K. Muthen posted on Friday, June 15, 2007 - 7:53 am
No. I can see why you might have thought this from Bengt's answer but this is not required.
 Chris Chen posted on Wednesday, December 22, 2010 - 7:38 pm
Dear Dr. Muthen,
When putting control variables in SEM models, could one simply use the following command in addition to the other paths of the model:

dependent variable ON control variable

If there is a mediation relationship, X -> M -> Y, should the control variable be entered as

M ON control variable
Y ON control variable

Should "X ON control variable" also be specified, or are the controls only required for the outcome variables?

Thank you!
 Linda K. Muthen posted on Thursday, December 23, 2010 - 10:26 am
You would regress the dependent variables and the mediators on the control variables. You would not regress the covariates on the control variables.
 Chris Chen posted on Sunday, December 26, 2010 - 4:59 pm
Dear Dr. Muthen,

Thanks for the reply.

For the statement "you would not regress the covariates on the control variables" in your message, do you mean you would not regress the "independent variables" on the control variables? It seems to me that "covariates" and "control variables" are the same thing. Could you please clarify?

Thank you!
 Linda K. Muthen posted on Monday, December 27, 2010 - 5:58 am
Control variables are covariates, as are the x variables you mention. I would not regress x on the control variables.
 Heather Knous-Westfall posted on Tuesday, September 27, 2011 - 1:21 pm
I'm having some issues with adding control variables to my SEM model. I have two latent variables and am looking at both of them as potential mediators. When I run the model I have proposed, the model fit is excellent and most of the hypothesized paths are significant. However, when I add a control variable, the model fit is terrible, and I don't understand why that may be.

For example:

x2 is my control/confound variable

MODEL:
F1 BY y3 y7; F2 BY y9 y10 y12-y14;
F1 on x6; F2 on x6; y15 on F1 F2 x6;
F1 with F2;
F2 on x2;
MODEL INDIRECT:
y15 IND x6;

When I run the above, the model fit is horrible, but when I run this one, it's fine.

MODEL:
F1 BY y3 y7; F2 BY y9 y10 y12-y14;
F1 on x6; F2 on x6; y15 on F1 F2 x6;
F1 with F2;
MODEL INDIRECT:
y15 IND x6;

Any ideas on what that may mean? Thanks!
 Linda K. Muthen posted on Tuesday, September 27, 2011 - 2:20 pm
When you add x2, you are not just adding the path from x2 to f2. You are adding other paths also but are fixing them at zero. This can cause model misfit.
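To illustrate how a path that is silently fixed at zero produces misfit, here is a sketch using a hypothetical observed-variable analogue (not the poster's latent model): x2 -> m -> y with no direct x2 -> y path. Assuming uncorrelated residuals, the ML estimates of a recursive model like this equal equation-by-equation OLS, so the model implies cov(x2, y) = b * cov(x2, m), where b is the slope of y on m. If x2 also affects y directly, the observed covariance departs from that implied value, and that discrepancy is what the chi-square picks up.

```python
# Sketch only: misfit from a path fixed at zero (x2 -> y omitted).
# Hypothetical data; the function name is illustrative, not an Mplus feature.


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance (n - 1 divisor)."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def cov_x2_y_gap(x2, m, y):
    """Observed minus model-implied cov(x2, y) when x2 -> y is fixed at 0."""
    b = cov(m, y) / cov(m, m)  # slope of y ON m (m is the only predictor)
    implied = b * cov(x2, m)   # covariance transmitted through m only
    return cov(x2, y) - implied


x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
m = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

y_no_direct = [2.0 * mi for mi in m]                       # x2 acts only via m
y_direct = [2.0 * mi + 3.0 * xi for mi, xi in zip(m, x2)]  # x2 also acts on y

print(f"gap, no direct effect: {cov_x2_y_gap(x2, m, y_no_direct):.4f}")
print(f"gap, direct effect:    {cov_x2_y_gap(x2, m, y_direct):.4f}")
```

With no direct effect the gap is zero; with one, the restricted model cannot reproduce cov(x2, y), which is the kind of misfit that modification indices then point to.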
 caroline masquillier posted on Wednesday, October 24, 2012 - 1:13 pm
Dear Prof. Muthen,

When I add manifest variables to my structural model (with four latent factors), they influence some of the factor loadings of these latent constructs. Some of the factor loadings drop far below the 0.40 cutoff compared with the measurement model (all latent factors estimated together, where all factor loadings were above 0.40). Is this possible? Is there a solution for this?

Thank you very much in advance.
 Linda K. Muthen posted on Wednesday, October 24, 2012 - 3:07 pm
This may suggest the need for direct effects from the covariates to the factor indicators. Ask for MODINDICES (ALL) in the OUTPUT command to see if this is the case.
 saeid posted on Wednesday, March 20, 2013 - 8:21 am
Dear Dr. Muthen,
I have a question regarding using control variable:

How is it possible to use binary variables (such as gender) as control variables in SEM?

 Linda K. Muthen posted on Wednesday, March 20, 2013 - 10:23 am
You just regress the factors on the binary variable.
 Jetty posted on Friday, November 08, 2013 - 9:34 am
I am testing a path analysis model of X via 4 mediators to Y (ordered categorical). I am using Example 3.16 from the newest manual, except that I specify the DV as categorical and use X1, X2, and X3 in the second MODEL command (not X2 only as in the text) because I need a fully adjusted model. My question is: given that X2 and X3 are not included in the MODEL INDIRECT statements, are the indirect effect estimates adjusted for X2 and X3? If not, what is the proper syntax for including them as control variables?
 Linda K. Muthen posted on Friday, November 08, 2013 - 1:32 pm
Control variables are treated as any other covariate.
 dvl posted on Thursday, December 12, 2013 - 8:09 am
Dear professor,

I'd like to ask a question about the control variables included in different regressions in a path model (no latent concepts). Because my theory assumes different control variables for each endogenous variable in the model, each equation has different control variables. As I read above, that is not a problem, given that including the same control variables in each equation is not statistically required (in my opinion, path models would be less interesting if it were). However, the following issues are unclear to me:

(1) How should I interpret indirect and total effects when different control variables are included in each equation? For which variables are the total and indirect effects controlled in this case, and how should we report this?

(2) In a path model all exogenous variables are correlated. For example:

X -> W -> P
C1 = control variable 1
C2 = control variable 2

W on X C1;
P on W C2;

Mplus assumes C1 and C2 to be correlated, even if I do not include C2 as a control in the regression of W. Is this true? So the regression of W is not controlled for C2, but somehow the correlation between C2 and C1 should affect the relationship between X and W? Do you know how I should think about this?

I really struggle with this! I hope you can help.
Thanks a lot!
 dvl posted on Friday, December 13, 2013 - 2:45 am

Regarding my question above, I have already figured out (2), but the question that remains open is whether it is possible to interpret an indirect effect when the mediator and the dependent variable have different control variables. I am really confused about that!

Thanks a lot!
 Bengt O. Muthen posted on Friday, December 13, 2013 - 4:44 pm
I would include all control variables in both equations. Some may not be significant, but that's ok.
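A sketch of why using the same control variables in every equation keeps the effect decomposition clean, assuming a linear path model X -> M -> Y with one control C, estimated by OLS on hypothetical data. With C in both the M and the Y equation, the total effect of X from the reduced-form regression Y ON X C equals c' + a*b exactly. When C is dropped from the M equation, c' + a*b no longer matches that total here, which is one way to see why an indirect effect is hard to interpret when the mediator and the outcome have different controls.

```python
# Sketch only: total = direct + indirect holds exactly when every equation
# has the same controls. Hypothetical data; helper names are illustrative.


def ols(predictors, y):
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    k = len(cols)
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    aug = [
        [sum(ci[t] * cj[t] for t in range(n)) for cj in cols]
        + [sum(ci[t] * y[t] for t in range(n))]
        for ci in cols
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][k] / aug[r][r] for r in range(k)]


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
c = [1.0, 0.0, 2.0, 1.0, 3.0, 2.0, 4.0, 3.0]   # control, correlated with x
e = [0.5, -0.5, 1.0, -1.0, 0.5, 0.0, -0.5, 0.0]
m = [0.5 * xi + ci + ei for xi, ci, ei in zip(x, c, e)]
y = [0.4 * mi + 0.3 * xi + 0.8 * ci for mi, xi, ci in zip(m, x, c)]

a = ols([x, c], m)[1]                 # M ON X C   -> a
_, b, c_prime, _ = ols([m, x, c], y)  # Y ON M X C -> b, c'
total = ols([x, c], y)[1]             # Y ON X C   -> total effect of X
a_x_only = ols([x], m)[1]             # M ON X     -> C left out of the M equation

print(f"same controls:    c' + a*b = {c_prime + a * b:.4f}, total = {total:.4f}")
print(f"C dropped from M: c' + a*b = {c_prime + a_x_only * b:.4f}, total = {total:.4f}")
```

Insignificant control slopes do no harm here; leaving them in is what makes the a, b, and c' estimates refer to the same set of controls.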
 Joris van der Voet posted on Thursday, December 19, 2013 - 1:59 am
Dear dr. Muthen,

In a structural model, I want to control for the effects of a nominal variable (different countries). I am not interested in the effect different countries may have on my dependent variables; I merely want to control for the variance they may explain in the model.

Would I need to construct dummy variables for each of the countries and enter them separately in the model (with one being the reference category), or can I enter a single variable in which country A =1, country B = 2, etc.?

Thank you.
 Bengt O. Muthen posted on Thursday, December 19, 2013 - 10:41 am
You have to construct a set of C-1 dummies, where C is the number of countries.
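A minimal data-preparation sketch of C - 1 dummy coding, assuming the country codes are stored as labels (the labels and the choice of reference country are hypothetical). A single variable coded 1, 2, 3, ... would instead treat country as an ordered, equally spaced numeric covariate. Each resulting dummy then goes on the right-hand side of the ON statements like any other covariate.

```python
# Sketch only: build C - 1 dummy variables, leaving out a reference category.


def dummy_code(values, reference):
    """Return {level: list of 0/1} for every level except the reference."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference {reference!r} not found in the data")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


# Hypothetical data: four countries, so three dummies with "A" as reference.
countries = ["A", "B", "C", "A", "D", "B", "C", "D"]
dummies = dummy_code(countries, reference="A")
for name, column in dummies.items():
    print(f"country_{name}: {column}")
```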
 dvl posted on Monday, February 03, 2014 - 1:28 am
Dear Professor,

Regarding my question above: I have a path model (with all manifest variables, no latent concepts due to data limitations). I want to include the same control variables in each equation in order to know what my indirect effects are controlled for! But in that case, I have a saturated model and I get no fit indices. Can I do something with that kind of model? In the literature, I see no publications using saturated models. Can you give me some advice on this?
 Bengt O. Muthen posted on Monday, February 03, 2014 - 8:39 am
Although it is true that chi-square model testing can't reject the model, it is not a fatal flaw to consider a saturated model if your theories don't involve excluded paths. There should be many such models in the literature. Regression analysis is another example of a saturated model.
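To see why putting every control in every equation leaves zero degrees of freedom, here is a sketch of the usual counting rule, assuming the model is estimated conditional on the x-variables (covariates that appear only on the right-hand side of ON). Their own variances and covariances are then not part of the count, and means and intercepts are left out because they add equally many moments and parameters. The function and its argument names are illustrative only.

```python
# Sketch only: degrees-of-freedom count for a linear path model with the
# x-variables treated as exogenous (conditioned on). Names are hypothetical.


def path_model_df(n_y, n_x, n_free_slopes, n_free_resid_cov=0):
    """Degrees of freedom for a path model, conditional on the x-variables.

    n_y: number of dependent variables (mediators and outcomes)
    n_x: number of covariates that appear only on the right-hand side of ON
    n_free_slopes: number of estimated regression coefficients
    n_free_resid_cov: number of estimated residual covariances among the y's
    """
    moments = n_y * (n_y + 1) // 2 + n_y * n_x       # y (co)variances + y-x covariances
    params = n_free_slopes + n_y + n_free_resid_cov  # slopes + residual variances + covs
    return moments - params


# X -> M -> Y with controls C1, C2 in both equations:
#   M ON X C1 C2 (3 slopes), Y ON M X C1 C2 (4 slopes); y's = M, Y; x's = X, C1, C2
print(path_model_df(n_y=2, n_x=3, n_free_slopes=7))  # -> 0 (saturated)
# Same model with the direct effect of X on Y fixed at zero (6 slopes)
print(path_model_df(n_y=2, n_x=3, n_free_slopes=6))  # -> 1
# Ordinary regression: one y on four x's
print(path_model_df(n_y=1, n_x=4, n_free_slopes=4))  # -> 0
```

The last line is the regression case mentioned above: it is saturated for the same reason as the full path model.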
 Imaan posted on Saturday, February 15, 2014 - 2:52 am
I have three mediators in my model. Should I include only the control variables that have a significant relationship with the DV and the mediators? The control variables perform differently for the DV and the mediators. How would I run this analysis? Should I include all control variables in the DV and mediator paths, or should I exclude a control variable that has an insignificant relationship with the DV or a mediator?
 Linda K. Muthen posted on Sunday, February 16, 2014 - 11:19 am
I would use the control variables in every regression. I would not exclude insignificant paths.
 Imaan posted on Sunday, February 16, 2014 - 11:31 am
If we have four control variables, should I show paths with four mediators and the DV separately?
 Bengt O. Muthen posted on Sunday, February 16, 2014 - 12:46 pm
The control variables are covariates, so they don't change the number of mediators (you said you had 3 mediators).
 Sarah  posted on Thursday, May 01, 2014 - 6:50 am
I'm running a multiple mediation model with 3 mediators: income, maternal mental health, and parenting behaviour. According to theory, the mediators themselves are all interlinked. More specifically, income is thought to directly affect maternal mental health and parenting behaviours.
I am including a range of controls for each of my mediators, and I wondered if it is acceptable to let income act as a control for the other two mediators even though income is itself a mediator.

Many thanks for your help.
 Bengt O. Muthen posted on Thursday, May 01, 2014 - 9:38 pm
This is fine.
 Rama Srinivasan posted on Saturday, May 31, 2014 - 4:45 pm
Hello Dr. Muthen,

I am working on a SEM model.
I do not want Mplus to estimate some of the paths that it estimates by default, such as those between some of the exogenous variables.
So I was thinking of fixing these paths at 0.
Is this the right way to do it? If so, what is the syntax? Should I specify each relationship in a separate statement, or can they be specified on the same line?

 Linda K. Muthen posted on Monday, June 02, 2014 - 10:42 am
Are these observed exogenous variables or latent variables?
 RSrinivasan posted on Monday, June 02, 2014 - 10:49 am
Hi Linda,

These are latent variables.

 Linda K. Muthen posted on Monday, June 02, 2014 - 11:46 am
You can say

f1 WITH f2@0;
 RSrinivasan posted on Tuesday, June 03, 2014 - 6:53 am
Thanks Linda
 lamjas posted on Thursday, June 26, 2014 - 1:32 am
I have two questions.

(1) For a simple mediation SEM using cross-sectional data, let's say IV-->M-->DV, with gender as the control variable, should I add paths from gender to all three variables, or just to M and DV?

(2) For a cross-lagged SEM, let's say A1, A2, B1, B2 (the number indicates the time point), while gender and age as control variables, am I right that I should add paths of gender and age to Time 1 variables only?

If I have a time-variant control variable (say X1 and X2), then I should add paths of X1 to A1 and B1, whereas X2 to A2 and B2. Am I right?
 Linda K. Muthen posted on Thursday, June 26, 2014 - 2:19 pm
1. Regress m and dv on gender. Do you regress iv on gender.

2-3. There are no set rules. Your research questions should guide you.
 lamjas posted on Thursday, June 26, 2014 - 6:20 pm
Hi Linda,

For your answer (1), should I also regress iv on gender?

Thank you again.
 Linda K. Muthen posted on Thursday, June 26, 2014 - 6:29 pm
I meant do not regress iv on gender.
 lamjas posted on Thursday, June 26, 2014 - 6:42 pm
Hi Linda,

If I apply the logic of your answer (1) to my questions (2) and (3), it is not appropriate to regress A1 and B1 on any covariates. Is that right?
 Linda K. Muthen posted on Friday, June 27, 2014 - 11:09 am
The distinction is that you don't regress an exogenous iv on another exogenous variable. Your a and b variables are dependent variables.
 Paula Yuma posted on Sunday, June 29, 2014 - 10:13 am
Thank you for this amazing forum!

I'm running a mediation analysis with three exogenous latent factors (HSEP, NSEP and PARKS), one mediating latent factor (PNS) and one observed outcome (PA).

I'm receiving some advice that in addition to placing the control variables (cov1-8) in the regression equations (ON statements) as below:

PNS on HSEP NSEP PARKS cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8;

PA on PNS HSEP NSEP PARKS cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8;

I should also be regressing the latent factors NSEP HSEP and PARKS on all the covariates, as follows:

NSEP on cov1-cov8;
HSEP on cov1-cov8;
PARKS on cov1-cov8;

My model runs well with the control variables in the ON statements with the mediator and outcome. It essentially doesn't run at all when I regress the HSEP NSEP and PARKS factors on the controls. What would you advise in terms of how to include these controls, and if possible, could you explain why?

Additionally, some of my covariates are dichotomous dummy variables for race. Should their covariances with one another be set to 0 using cov1 with cov2@0?

Many thanks!
Paula Yuma
Doctoral Candidate UT Austin School of Social Work
 Bengt O. Muthen posted on Sunday, June 29, 2014 - 11:18 am
I agree that you need to allow for factors and covariates to be related. If this runs into problems as you say, you may want to send to Support.

Nothing needs to be said about the relationships among covariates. I assume that for the dummies, you have one less dummy variable than the number of categories.
 Paula Yuma posted on Sunday, June 29, 2014 - 12:09 pm
Thank you so much for your reply. Yes, I have one less dummy than the number of categories.

Just to clarify, do the independent factors need to be related to the covariates through a regression statement (where the IVs are regressed on the covariates), or just allowed to covary, as they would by default as they are all included in the mediator and outcome regression statements?

Many thanks.
 Bengt O. Muthen posted on Monday, June 30, 2014 - 10:30 am
Typically you can let covariates and exogenous factors simply correlate using WITH. For categorical DV modeling, e.g. using WLSMV, using ON can be preferred because WITH makes the covariates "into y's" (so with parameters that are part of the model) and this has implications for assumptions of underlying normality.
 Yvonne LEE posted on Friday, August 08, 2014 - 9:15 am
I want to double-check to avoid a misunderstanding.

1. One of the paths in my model is Childhood Adversities -> Distorted Sex Attitude -> PornUse-> Rape, can I add social desirability (control variable) on each of these factors?

2. Distorted Sex Attitude is a latent factor with 4 indicators. Should I add social desirability control factor at the factor level or indicator level?
 Linda K. Muthen posted on Friday, August 08, 2014 - 11:10 am
You can add the control variable anyplace you want it on the right-hand side of ON. Where you add it should be decided by your research hypothesis. I cannot make that decision.
 Mollie Bandy posted on Tuesday, April 05, 2016 - 12:15 pm
I have a mediation model with an observed predictor variable, a latent mediator, and both latent and observed outcome variables. When I add covariates to the model (via outcomes ON covariates statements), the model fit decreases. How are covariates affecting model fit?
 Linda K. Muthen posted on Tuesday, April 05, 2016 - 1:13 pm
Please send the output and your license number to
 Hewa G posted on Wednesday, May 11, 2016 - 2:53 pm
I'm working on an SEM model. I'm using a categorical variable (level1=1 and level2=2) as a control variable in the research model. Do I have to dummy code this variable? Can I do this with the DEFINE command in Mplus?
 Linda K. Muthen posted on Wednesday, May 11, 2016 - 3:19 pm
You can use the variable as is or change the values to 0/1 using the DEFINE command.
 Hewa G posted on Wednesday, May 11, 2016 - 4:27 pm
Thank you for your reply. If I write the command for the categorical variable of job level,

DEFINE: JLEVEL(1=0 2=1);
is this correct?
 Linda K. Muthen posted on Wednesday, May 11, 2016 - 4:55 pm
See the user's guide for the correct specification of the DEFINE command.

The CATEGORICAL list is for dependent variables only. In regression, covariates can be binary or continuous. In both cases, they are treated as continuous.
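A small sketch of why 1/2 versus 0/1 coding of a binary covariate makes little difference when it is used as a predictor, using hypothetical data and plain OLS: the slope (the mean difference between the two job levels) is identical, and only the intercept shifts. The recode is done in Python purely for illustration; the Mplus DEFINE syntax for it is described in the user's guide.

```python
# Sketch only: recoding a binary covariate from 1/2 to 0/1 changes the
# intercept but not the slope. Hypothetical data.


def mean(v):
    return sum(v) / len(v)


def simple_ols(x, y):
    """Return (intercept, slope) of an OLS regression of y on one predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    slope = sxy / sxx
    return my - slope * mx, slope


jlevel = [1, 2, 1, 2, 2, 1, 1, 2]                  # coded 1/2 as in the question
outcome = [3.1, 4.0, 2.9, 4.4, 3.8, 3.3, 3.0, 4.1]
jlevel01 = [v - 1 for v in jlevel]                 # recode 1 -> 0 and 2 -> 1

int12, slope12 = simple_ols(jlevel, outcome)
int01, slope01 = simple_ols(jlevel01, outcome)
print(f"coded 1/2: intercept {int12:.3f}, slope {slope12:.3f}")
print(f"coded 0/1: intercept {int01:.3f}, slope {slope01:.3f}")
```

With 0/1 coding the intercept is the expected outcome for the reference level, which is usually easier to read.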
 Allison Ross posted on Thursday, January 26, 2017 - 4:21 pm
I have a model with three latent variables and a large number of control variables affecting one or more of them. When I run the models without controls, I get very good model fit. Once I add the controls, however, I suspect model misfit because of the large number of paths I'm creating. My question is, how do I adjust for this? I understand that Mplus automatically fixes all paths that I do not explicitly specify to zero. Is this correct? If so, do I free some of these paths?
 Bengt O. Muthen posted on Thursday, January 26, 2017 - 5:07 pm
Mplus has fixed zero path coefficients for paths you don't include in the Model command. You can use Modindices to see where your model doesn't fit, that is, which paths should be free to be estimated.
 Allison Ross posted on Thursday, January 26, 2017 - 8:08 pm
Thank you for your prompt response. After reviewing the Modindices and using the WITH command to define relevant correlations among some control variables, I still get quite a poor fit compared to the model without controls. I have two questions for you.

1. Is the WITH command sufficient to capture these relationships? Would it make sense to use * and free the relevant parameters?

2. Is it recommended to regress each latent construct on each control or can theory dictate which relationship is identified?

Thank you for your time.
 Bengt O. Muthen posted on Saturday, January 28, 2017 - 11:27 am
We need to see your output to say. Please send to Support along with your license number.
 Paulette Del Rosso posted on Tuesday, April 25, 2017 - 6:32 pm
I heard from someone who attended one of your seminars on SEM that you can test for differential item functioning (DIF) using the modification indices (MIs). The procedure: first, specify a CFA; second, specify a dummy-coded variable that represents the reference and target groups without any paths to the observed or latent variables; and third, examine the MIs to see whether any paths need to be specified between the dummy-coded variable and the observed indicator variables. Please confirm whether this is a plausible and valid technique.
I am currently engaged in an SEM project in which I am modeling expectation of success in school children as a function of various ecological variables. I would like to control for the demographic variables of sex, ethnicity, and SES. Assuming the above DIF technique is valid, I was wondering if I can test whether these demographic control variables are necessary in my SEM by initially including them without any paths specified and then checking whether the MIs indicate that paths should be included between the demographic variables and my variables of interest.
One of my challenges in specifying my SEM is that the background demographic variables can have paths to all variables in my model, which makes specifying it very cumbersome. I was wondering whether using the MIs in this way can reduce the complexity of specifying my SEM.
 Bengt O. Muthen posted on Wednesday, April 26, 2017 - 3:03 pm
The procedure I've had in mind regresses the factor on x's and then looks for MI's for direct effects (Y1 on x etc.). Having f on x in the model, you can also regress each factor indicator on all x's and see which slopes are significant.

Or you can use BSEM to do this in a single-step analysis including all direct effects with small-variance priors.
 Rebecca Woodruff posted on Tuesday, August 29, 2017 - 9:42 am
This is a follow-up question to several previous posts. When adding control variables to a PATH model, why do you not regress the IV on the control variables?

A collaborator on my project (epidemiologist by training) is advising me that I will be ignoring important potential confounders by not doing this, citing VanderWeele and colleagues' work with causal mediation.
 Bengt O. Muthen posted on Tuesday, August 29, 2017 - 2:33 pm
The IV (the exposure variable) is not regressed on the control variables in, for example,

VanderWeele, T. J., & Vansteelandt, S. (2009). Conceptual issues concerning mediation, interventions and composition. Statistics and Its Interface, 2, 457-468.

Perhaps you can refer me to writing where this is done.
 jb posted on Tuesday, February 20, 2018 - 4:18 am
First of all, thanks so much for this forum; it's very helpful, especially for newbies to SEM and Mplus like me. I learn so much going through the responses.
I am running a moderated mediation model with 2 serial mediators (M1 and M2) and one moderator (W) on the X-M1 (a) pathway.

I want to add two control variables (C1, C2). Is the syntax below correct? I have been reading through the discussions above but I am still confused.

Y ON M1 (b1);
Y ON M2 (b2);

Y ON X (cdash);
Y ON C1;
Y ON C2;

M1 ON X (a1);
M1 ON W (a3);
M1 ON XW (a4);

M1 ON C1;
M1 ON C2;
M2 ON C1;
M2 ON C2;

M2 ON X (a2);
M2 ON M1 (d1);
 Bengt O. Muthen posted on Tuesday, February 20, 2018 - 2:34 pm
Yes, this looks correct.
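Given the labels in the syntax above, the conditional indirect effects are functions of the labelled parameters. A sketch, assuming the X -> M1 slope at moderator value w is a1 + a4*w (from M1 ON X W XW, with XW the X-by-W product) and that the numeric values are made up. The controls C1 and C2 do not appear in these formulas; they affect the indirect effects only through the estimates of a1, a4, a2, d1, b1, and b2. In Mplus, such functions are usually defined as NEW parameters in MODEL CONSTRAINT so that standard errors are obtained.

```python
# Sketch only: conditional indirect effects for the labelled model above.
# Parameter values are hypothetical, not estimates from any data.


def conditional_indirect_effects(p, w):
    """Indirect effects of X on Y at moderator value w."""
    x_to_m1 = p["a1"] + p["a4"] * w           # X -> M1 slope depends on W
    via_m1 = x_to_m1 * p["b1"]                # X -> M1 -> Y
    via_m1_m2 = x_to_m1 * p["d1"] * p["b2"]   # X -> M1 -> M2 -> Y
    via_m2 = p["a2"] * p["b2"]                # X -> M2 -> Y (not moderated)
    return {
        "via_m1": via_m1,
        "via_m1_m2": via_m1_m2,
        "via_m2": via_m2,
        "total_indirect": via_m1 + via_m1_m2 + via_m2,
    }


params = {"a1": 0.5, "a4": 0.2, "a2": 0.3, "d1": 0.4, "b1": 0.6, "b2": 0.5}
for w in (-1.0, 0.0, 1.0):  # e.g., W at -1 SD, mean, +1 SD if W is standardized
    eff = conditional_indirect_effects(params, w)
    print(w, {k: round(v, 3) for k, v in eff.items()})
```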
 jb posted on Wednesday, February 21, 2018 - 1:06 pm
Thanks. What if the control variables are demographic, e.g., gender? Is the procedure the same as above?
 Bengt O. Muthen posted on Wednesday, February 21, 2018 - 3:50 pm