Remind me about the definition of a "control variable".
Pankaj P posted on Monday, June 05, 2006 - 7:13 pm
Controlling for a variable means explaining the relationship between the independent and dependent variables AFTER we extract the impact of the control variable on the DV. In other words, we run a regression on the residuals from the regression of the DV on the control variable. I read in a paper somewhere that I could do the same in SEM, i.e., run the SEM on the residual covariance matrix. However, would considering indirect effects take care of controlling (besides their mediating effect)? Thanks!
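A note on the residual approach: it reproduces the "controlled" slope from a multiple regression only if the IV is residualized on the control as well (the Frisch-Waugh-Lovell result). Residualizing only the DV gives a different, attenuated slope whenever the IV and the control are correlated. The NumPy sketch below uses simulated data; the generating coefficients (0.6, 0.5, 0.8) and the `ols` helper are illustrative assumptions, not anything from the thread.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on the columns of X (include an intercept column yourself)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 5000
c = rng.normal(size=n)                        # control variable
x = 0.6 * c + rng.normal(size=n)              # IV, correlated with the control
y = 0.5 * x + 0.8 * c + rng.normal(size=n)    # DV

ones = np.ones(n)
# (1) Multiple regression: y on x and c together.
b_full = ols(y, np.column_stack([ones, x, c]))[1]

# (2) Residualize BOTH y and x on the control, then regress residual on residual.
Z = np.column_stack([ones, c])
y_res = y - Z @ ols(y, Z)
x_res = x - Z @ ols(x, Z)
b_fwl = ols(y_res, x_res[:, None])[0]         # residuals have mean zero, so no intercept

# (3) Residualize only y on the control, then regress it on the raw x.
b_y_only = ols(y_res, np.column_stack([ones, x]))[1]

print(round(b_full, 4), round(b_fwl, 4), round(b_y_only, 4))
```

Here (1) and (2) agree to numerical precision, while (3) is pulled toward zero because part of the variation in x that is shared with c has been removed from y but not from x.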
For a simple mediation model, if I control for var1 and var2 for path a), is it required to control for the same variables for paths b) and c)? In other words, do I need to control for the same variables for each path in the model? Thanks a lot!
You would regress the dependent variables and the mediators on the control variables. You would not regress the covariates on the control variables.
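The advice above can be sketched with equation-by-equation OLS on simulated data: the mediator equation corresponds to `m ON x c1 c2;` and the outcome equation to `y ON m x c1 c2;`, and X itself is not regressed on the controls. For a recursive observed-variable path model with uncorrelated residuals, this should give the same point estimates as ML. The variable names and generating values (a = 0.5, b = 0.4, direct effect 0.1) are assumptions for illustration; the indirect effect is the product a*b.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(1)
n = 100_000
c1 = rng.normal(size=n)                         # control 1
c2 = rng.normal(size=n)                         # control 2
x = rng.normal(size=n) + 0.3 * c1               # IV (exogenous; not regressed on controls)
m = 0.5 * x + 0.4 * c1 + 0.2 * c2 + rng.normal(size=n)
y = 0.4 * m + 0.1 * x + 0.3 * c1 + 0.3 * c2 + rng.normal(size=n)

ones = np.ones(n)
# Mediator equation: M ON X C1 C2  (path a)
a = ols(m, np.column_stack([ones, x, c1, c2]))[1]
# Outcome equation: Y ON M X C1 C2  (path b and direct effect c')
coef_y = ols(y, np.column_stack([ones, m, x, c1, c2]))
b, c_prime = coef_y[1], coef_y[2]

indirect = a * b              # specific indirect effect of X on Y via M
total = indirect + c_prime    # total effect
```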
Chris Chen posted on Sunday, December 26, 2010 - 4:59 pm
Dear Dr. Muthen,
Thanks for the reply.
Regarding the statement "you would not regress the covariates on the control variables" in your message, do you mean you would not regress the "independent variables" on the control variables? It seems to me that "covariates" and "control variables" are the same thing. Could you please clarify?
I'm having some issues with adding control variables to my SEM model. I have two latent factors and am treating both as potential mediators. When I run the model I have proposed, the model fit is excellent and most of the hypothesized paths are significant. However, when I add a control variable, the model fit is terrible and I don't understand why that may be.
x2 is my control/confounding variable:
F1 BY y3 y7;
F2 BY y9 y10 y12-y14;
F1 on x6;
F2 on x6;
y15 on F1 F2 x6;
F1 with F2;
F2 on x2;
MODEL INDIRECT:
y15 IND x6;
When I run the above, model fit is horrible, but when I run this one, it's fine:
F1 BY y3 y7;
F2 BY y9 y10 y12-y14;
F1 on x6;
F2 on x6;
y15 on F1 F2 x6;
F1 with F2;
MODEL INDIRECT:
y15 IND x6;
When I add manifest variables to my structural model (with four latent factors), they influence some of the factor loadings of these latent constructs. Some of the factor loadings drop far below the 0.40 cutoff, compared with the measurement model (with all the latent factors together, where all factor loadings were above 0.40). Is this possible? Is there a solution for this?
You just regress the factors on the binary variable.
Jetty posted on Friday, November 08, 2013 - 9:34 am
I am testing a path analysis model of X via 4 mediators to Y (ordered categorical). I am using Example 3.16 from the newest manual, except that I specify that the DV is categorical and use X1, X2, and X3 in the second MODEL command (not only X2 as in the text) because I need a fully adjusted model. My question is: given that X2 and X3 are not included in the MODEL INDIRECT statements, are the indirect effect estimates adjusted for X2 and X3? If not, what is the proper syntax for including them as control variables?
Control variables are treated as any other covariate.
dvl posted on Thursday, December 12, 2013 - 8:09 am
I'd like to ask a question about the control variables included in different regressions in a path model (no latent concepts). As my theory assumes different control variables for each of the endogenous variables in the model, each path has different control variables. As I read above, that is not a problem, since including the same control variables in each equation is not statistically required (in my opinion, path models would be less interesting if it were). However, the following issues are unclear to me:
(1) How should I interpret indirect and total effects when different control variables are included in each equation? Which variables are the total and indirect effects controlled for in this case, and how should we report this?
(2) In a path model all exogenous variables are correlated. For example:
X -> W -> P
C1 = control variable 1
C2 = control variable 2
W on X C1;
P on W C2;
Mplus assumes C1 and C2 to be correlated, even if I do not include C2 as a control in the regression of W. Is this true? So the regression of W is not controlled for C2, but somehow the correlation between C2 and C1 should affect the relationship between X and W? How should I think about this?
I am really struggling with this! I hope you can help. Thanks a lot!
Regarding my question above, I have already figured out (2), but the question that remains open for me is whether it is possible to interpret an indirect effect when the mediator and the dependent variable have different control variables. I am really confused about that!
In a structural model, I want to control for the effects of a nominal variable (different countries). I am not interested in the effect different countries may have on my dependent variables; I merely want to control for the variance they may explain in the model.
Would I need to construct dummy variables for each of the countries and enter them separately in the model (with one being the reference category), or can I enter a single variable in which country A = 1, country B = 2, etc.?
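For a nominal control, entering a single 1/2/3 code would force the model to treat the country labels as equally spaced numbers, so that B differs from A by exactly as much as C differs from B. Dummy coding with k - 1 indicators avoids that by giving each non-reference country its own slope. A minimal sketch; `dummy_code` is a hypothetical helper and the country labels and reference category "A" are made up.

```python
import numpy as np

def dummy_code(labels, reference):
    """Return (categories, matrix): one 0/1 column per non-reference category.

    Rows belonging to the reference category are all zeros, so each column's
    slope is that category's difference from the reference.
    """
    categories = sorted(set(labels) - {reference})
    matrix = np.array([[1 if lab == cat else 0 for cat in categories] for lab in labels])
    return categories, matrix

countries = ["A", "B", "C", "A", "C", "B", "A"]
cats, D = dummy_code(countries, reference="A")
# Three countries produce two dummy columns (B and C); country A is the reference.
```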
Regarding my question above: I have a path model (with all manifest variables, no latent concepts due to data limitations). I want to include the same control variables in each equation so that I know what my indirect effects are controlled for! But in that case I have a saturated model and no fit indices! Can I still use this kind of model? In the literature, I see no publications using saturated models. Can you give me some advice on this?
Although it is true that chi-square model testing can't reject the model, it is not a fatal flaw to consider a saturated model if your theories don't involve excluded paths. There should be many such models in the literature. Regression analysis is another example of a saturated model.
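Why the fully controlled model is saturated can be seen by counting sample moments against free parameters. The sketch below does this for the X -> W -> P example with controls C1 and C2 from earlier in the thread. It assumes a covariance-structure count with the exogenous variables' variances and covariances treated as free parameters; conditioning on the covariates instead (the Mplus default for observed x's) removes the same number of moments and parameters, so the degrees of freedom come out the same. `path_model_df` is a hypothetical helper.

```python
def path_model_df(n_observed, n_exogenous, n_regression_paths, n_endogenous, n_residual_covs=0):
    """Degrees of freedom for an observed-variable path model (covariance structure only).

    Free parameters: variances/covariances of the exogenous variables, regression
    slopes, residual variances of the endogenous variables, and residual covariances.
    """
    moments = n_observed * (n_observed + 1) // 2
    exogenous_params = n_exogenous * (n_exogenous + 1) // 2
    params = exogenous_params + n_regression_paths + n_endogenous + n_residual_covs
    return moments - params

# Observed: X, C1, C2 (exogenous) and W, P (endogenous).
# W ON X C1;  P ON W C2;          -> 4 slopes
df_partial = path_model_df(5, 3, 4, 2)
# W ON X C1 C2;  P ON W X C1 C2;  -> 7 slopes (all controls everywhere plus direct X -> P)
df_full = path_model_df(5, 3, 7, 2)
```

The 3 degrees of freedom in the partial model correspond exactly to the excluded paths (W on C2, P on X, P on C1); once they are all included, nothing is left for the chi-square test to reject, just as in an ordinary regression.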
Imaan posted on Saturday, February 15, 2014 - 2:52 am
I have three mediators in my model. Should I include the control variables that have a significant relationship with the DV and the mediators? The control variables behave differently for the DV and the mediators. How should I run this analysis? Should I include all control variables in the paths to the DV and the mediators, or should I exclude control variables that have an insignificant relationship with the DV or the mediators?
I'm running a multiple mediation model with 3 mediators: income, maternal mental health and parenting behaviour. According to theory, the mediators themselves are all interlinked. More specifically, income is thought to directly affect maternal mental health and parenting behaviours. I am including a range of controls for each of my mediators and I wondered if it was acceptable to allow income to act as a control for the other two mediators even though income is itself a mediator?
I am working on a SEM model. I do not want Mplus to estimate some of the paths it estimates automatically, such as those between some of the exogenous variables. So I was thinking of fixing these paths to 0. Would this be the right way to do it? If so, what is the syntax? Should I specify each one in a separate statement, or can they be listed on the same line?
lamjas posted on Thursday, June 26, 2014 - 1:32 am
I have two questions.
(1) For a simple mediation SEM using cross-sectional data, let's say IV-->M-->DV, with gender as the control variable, should I add paths from gender to all three variables, or just to M and DV?
(2) For a cross-lagged SEM, let's say A1, A2, B1, B2 (the number indicates the time point), with gender and age as control variables, am I right that I should add paths from gender and age to the Time 1 variables only?
If I have a time-variant control variable (say X1 and X2), then I should add paths of X1 to A1 and B1, whereas X2 to A2 and B2. Am I right?
The distinction is that you don't regress an exogenous IV on another exogenous variable. Your A and B variables are dependent variables.
Paula Yuma posted on Sunday, June 29, 2014 - 10:13 am
Hello, Thank you for this amazing forum!
I'm running a mediation analysis with three exogenous latent factors (HSEP, NSEP and PARKS), one mediating latent factor (PNS) and one observed outcome (PA).
I'm receiving some advice that in addition to placing the control variables (cov1-8) in the regression equations (ON statements) as below:
PNS on HSEP NSEP PARKS cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8;
PA on PNS HSEP NSEP PARKS cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8;
I should also be regressing the latent factors NSEP HSEP and PARKS on all the covariates, as follows:
NSEP on cov1-cov8;
HSEP on cov1-cov8;
PARKS on cov1-cov8;
My model runs well with the control variables in the ON statements with the mediator and outcome. It essentially doesn't run at all when I regress the HSEP NSEP and PARKS factors on the controls. What would you advise in terms of how to include these controls, and if possible, could you explain why?
Additionally, some of my covariates are dichotomous dummy variables for race. Should their covariances with one another be set to 0 using cov1 with cov2@0?
Many thanks! Paula Yuma Doctoral Candidate UT Austin School of Social Work
I agree that you need to allow for factors and covariates to be related. If this runs into problems as you say, you may want to send to Support.
Nothing needs to be said about the relationships among covariates. I assume that for the dummies, you have one less dummy variable than the number of categories.
Paula Yuma posted on Sunday, June 29, 2014 - 12:09 pm
Thank you so much for your reply. Yes, I have one less dummy than the number of categories.
Just to clarify, do the independent factors need to be related to the covariates through a regression statement (where the IVs are regressed on the covariates), or just allowed to covary, as they would by default as they are all included in the mediator and outcome regression statements?
Typically you can let covariates and exogenous factors simply correlate using WITH. For categorical DV modeling, e.g. using WLSMV, using ON can be preferred because WITH makes the covariates "into y's" (so with parameters that are part of the model) and this has implications for assumptions of underlying normality.
Yvonne LEE posted on Friday, August 08, 2014 - 9:15 am
Want to double-check to avoid misunderstanding.
1. One of the paths in my model is Childhood Adversities -> Distorted Sex Attitude -> PornUse -> Rape. Can I add social desirability (a control variable) to each of these factors?
2. Distorted Sex Attitude is a latent factor with 4 indicators. Should I add the social desirability control at the factor level or at the indicator level?
I have a mediation model with an observed predictor variable, a latent mediator, and both latent and observed outcome variables. When I add covariates to the model (via outcome ON covariate statements), the model fit decreases. How do covariates affect model fit?
Hewa G posted on Wednesday, May 11, 2016 - 2:53 pm
I'm working on a SEM model. I'm using a categorical variable (level1=1 and level2=2) as a control variable in the research model. Do I have to dummy code this variable? Can I do this with the DEFINE command in Mplus?
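For a control with only two levels, the 1/2 coding already behaves like a single dummy variable: recoding it to 0/1 leaves the slope unchanged and only shifts the intercept. A small simulated check; the data, the generating values (intercept 2.0, group difference 0.7) and the `ols` helper are assumptions for illustration.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(2)
n = 1000
group_12 = rng.integers(1, 3, size=n)   # control coded 1/2, as in the post (upper bound exclusive)
group_01 = group_12 - 1                 # same variable recoded 0/1
y = 2.0 + 0.7 * group_01 + rng.normal(size=n)

ones = np.ones(n)
int_12, slope_12 = ols(y, np.column_stack([ones, group_12]))
int_01, slope_01 = ols(y, np.column_stack([ones, group_01]))
# The slopes are identical; the 1/2 intercept equals the 0/1 intercept minus the slope.
```

So recoding is a matter of a more interpretable intercept, not a requirement for the slope on the control.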
I have a model with three latent variables and a large number of control variables affecting one or more of them. When I run the models without controls, I get very good model fit. Once I add the controls, however, I suspect model misfit because of the large number of paths I'm creating. My question is, how do I adjust for this? I understand that Mplus automatically fixes all paths that I do not explicitly specify to zero. Is this correct? If so, should I free some of these paths?
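One way to see where the added misfit can come from: every control-to-outcome regression slope you do not specify is a slope fixed at zero, and each of those zeros is a restriction that the chi-square test evaluates. A bookkeeping sketch; `omitted_control_paths`, the factor names and the control names are hypothetical. It counts only control-to-factor slopes; direct effects of the controls on the factor indicators are also fixed at zero unless specified, which adds further restrictions.

```python
def omitted_control_paths(control_targets, outcomes):
    """List the (outcome, control) regression paths left out of the model.

    control_targets maps each control to the set of outcomes it is regressed on.
    Each omitted pair corresponds to a slope fixed at zero.
    """
    return [(out, ctrl)
            for ctrl, targets in control_targets.items()
            for out in outcomes
            if out not in targets]

outcomes = ["F1", "F2", "F3"]
controls = {"age": {"F1"}, "sex": {"F1", "F2", "F3"}, "ses": {"F2"}}
fixed_zero = omitted_control_paths(controls, outcomes)
# age is left out of F2 and F3, ses out of F1 and F3: four zero restrictions.
```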
Thank you for your prompt response. After reviewing the Modindices and using the WITH command to define relevant correlations among some control variables, I still get quite a poor fit compared to the model without controls. I have two questions for you.
1. Is the WITH command sufficient to capture these relationships? Would it make sense to use * and free the relevant parameters?
2. Is it recommended to regress each latent construct on each control or can theory dictate which relationship is identified?
I heard from someone who attended one of your seminars on SEM that you can test for differential item functioning (DIF) using the modification indices (MIs). The procedure: first, specifying a CFA; second, specifying a dummy coded variable that represents the reference and target groups without any paths to the observed or latent variables; and third, examining the MIs to see whether any paths need to be specified between the dummy coded variable and the observed indicator variables. Please confirm if this is a plausible and valid technique.

I am currently engaged in a SEM project in which I am modeling expectation of success in school children as a function of various ecological variables. I would like to control for the demographic variables of sex, ethnicity, and SES. Assuming the above DIF technique is valid, I was wondering if I can test whether these demographic control variables are necessary in my SEM by initially including them in my SEM without any paths specified and then checking whether the MIs indicate that paths should be included in the SEM between the demographic variables and my variables of interest. One of my challenges in specifying my SEM is that the background demographic variables can have paths specified to all variables in my model, which makes specifying my SEM very cumbersome and challenging. I was wondering whether this technique of using the MIs can lessen the complexity of specifying my SEM.
The procedure I've had in mind regresses the factor on x's and then looks for MI's for direct effects (Y1 on x etc.). Having f on x in the model, you can also regress each factor indicator on all x's and see which slopes are significant.
Or you can use BSEM to do this in a single-step analysis including all direct effects with small-variance priors.
This is a follow-up question to several previous posts. When adding control variables to a PATH model, why do you not regress the IV on the control variables?
A collaborator on my project (epidemiologist by training) is advising me that I will be ignoring important potential confounders by not doing this, citing VanderWeele and colleagues' work with causal mediation.
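A simulation can make the confounding question concrete. In the sketch below a measured confounder C affects X, M and Y. Leaving C out of the mediator and outcome equations biases the indirect effect; including C as a predictor in those two equations (`M ON X C; Y ON M X C`) recovers it. This matches the regression-based mediation setup in which covariates enter the mediator and outcome models. A separate `X ON C` equation is estimated on its own and leaves the mediator and outcome estimates unchanged. Assumptions: linear relations, no X-by-M interaction, C measured without error, and made-up generating values (true indirect effect 0.5 * 0.4 = 0.2).

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(3)
n = 100_000
c = rng.normal(size=n)                                  # confounder
x = 0.8 * c + rng.normal(size=n)                        # C -> X
m = 0.5 * x + 0.6 * c + rng.normal(size=n)              # C -> M
y = 0.4 * m + 0.1 * x + 0.6 * c + rng.normal(size=n)    # C -> Y

ones = np.ones(n)
# Ignoring C:  M ON X;  Y ON M X
a_naive = ols(m, np.column_stack([ones, x]))[1]
b_naive = ols(y, np.column_stack([ones, m, x]))[1]
# Adjusting for C in the mediator and outcome equations:  M ON X C;  Y ON M X C
a_adj = ols(m, np.column_stack([ones, x, c]))[1]
b_adj = ols(y, np.column_stack([ones, m, x, c]))[1]
# An additional X ON C equation is a separate regression; it does not change a_adj or b_adj.
g = ols(x, np.column_stack([ones, c]))[1]

print(a_naive * b_naive, a_adj * b_adj)
```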