
Pankaj P posted on Monday, June 05, 2006  5:13 pm



Hi! How do I model control variables in SEM? I am trying to do it in the spirit of hierarchical regression. If I treat my control variable as a mediator variable, would that work? Thanks! 


Remind me about the definition of a "control variable". 

Pankaj P posted on Monday, June 05, 2006  7:13 pm



Controlling for a variable means explaining the relationship between the independent and dependent variables AFTER we extract the impact of the control variable on the DV. In other words, we run the regression on the residuals from regressing the DV on the control variable. I read in a paper somewhere that I could do the same in SEM, i.e., run the SEM on the residual covariance matrix. However, would considering indirect effects take care of controlling (besides its mediating effect)? Thanks! 


Yes, you can use the residual covariance matrix as data in Mplus. I don't think that indirect effects achieve what you want. 
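As a minimal sketch of the answer above, a residual (partial) covariance matrix that has already been computed outside Mplus, after partialling the control variable out of the other variables, could be read as summary data as follows. The file name, variable names, and sample size are hypothetical placeholders, not values from this thread.

```
TITLE:    Mediation model fit to a residual covariance matrix (hypothetical)
DATA:     FILE IS resid_cov.dat;     ! lower-triangular residual covariance matrix
          TYPE IS COVARIANCE;        ! summary data instead of raw data
          NOBSERVATIONS ARE 500;     ! hypothetical sample size
VARIABLE: NAMES ARE x m y;           ! control variable already partialled out
MODEL:    m ON x;
          y ON m x;
```

An alternative, discussed later in this thread, is to keep the raw data and put the control variable on the right-hand side of the ON statements.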


For a simple mediation model, if I control for var1 and var2 for path a, is it required to control for the same variables for path c or b? In other words, do I need to control for the same variables for each path in the model? Thanks a lot! 


I think that is a good idea. 

tommy lake posted on Thursday, June 14, 2007  9:24 pm



Dr. Muthen, I'd like to follow up on this discussion. Is it required to control for the same variables for each path in an SEM model, even when theory does not call for it? Thanks 


No. I can see why you might have thought this from Bengt's answer but this is not required. 

Chris Chen posted on Wednesday, December 22, 2010  7:38 pm



Dear Dr. Muthen, When putting control variables in SEM models, could one simply use the following command in addition to the other paths of the model:

  dependent variable ON control variable;

If there is a mediation relationship, X -> M -> Y, should the control variable be entered as

  M ON control variable;
  Y ON control variable;

Should "X ON control variable" be specified as well, or are the controls only required on the outcome variables? Thank you! 


You would regress the dependent variables and the mediators on the control variables. You would not regress the covariates on the control variables. 
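A minimal sketch of what the reply above describes, assuming a single mediator; the names x, m, y, c1, and c2 are hypothetical.

```
MODEL:
  m ON x c1 c2;        ! mediator regressed on the focal x and the controls
  y ON m x c1 c2;      ! outcome regressed on the mediator, x, and the controls
  ! no "x ON c1 c2;" statement: x and the controls are all covariates
MODEL INDIRECT:
  y IND x;             ! indirect effect of x on y via m
```

Because the controls appear in both ON statements, the a and b slopes that make up the indirect effect are both adjusted for c1 and c2.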

Chris Chen posted on Sunday, December 26, 2010  4:59 pm



Dear Dr. Muthen, Thanks for the reply. For the statement "you would not regress the covariates on the control variables" in your message, do you mean you would not regress the "independent variables" on the control variables? It seems to me that "covariates" and "control variables" are the same thing. Could you please clarify? Thank you! 


Control variables are covariates, as are the x variables you mention. I would not regress x on the control variables. 


I'm having some issues with adding control variables to my SEM model. I have two latent variables and am looking at both of them as potential mediators. When I run the model I have proposed, the model fit is excellent and most of the hypothesized paths are significant. However, when I add a control variable, the model fit is terrible, and I don't understand why that may be. For example, x2 is my control/confound variable:

  F1 BY y3 y7;
  F2 BY y9 y10 y12-y14;
  F1 ON x6;
  F2 ON x6;
  y15 ON F1 F2 x6;
  F1 WITH F2;
  F2 ON x2;
  MODEL INDIRECT: y15 IND x6;

When I run the above, model fit is horrible, but when I run this one, it's fine:

  F1 BY y3 y7;
  F2 BY y9 y10 y12-y14;
  F1 ON x6;
  F2 ON x6;
  y15 ON F1 F2 x6;
  F1 WITH F2;
  MODEL INDIRECT: y15 IND x6;

Any ideas on what that may mean? Thanks! 


When you add x2, you are not just adding the path from x2 to f2. You are adding other paths also but are fixing them at zero. This can cause model misfit. 
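To make the reply above concrete, here is a sketch of the same model with x2 given the paths that were implicitly fixed at zero when only "F2 ON x2;" was specified. Whether these paths belong in the model is a substantive decision; this only shows what adding x2 implies.

```
MODEL:
  F1 BY y3 y7;
  F2 BY y9 y10 y12-y14;
  F1 ON x6 x2;          ! x2 -> F1 was fixed at 0 in the poorly fitting run
  F2 ON x6 x2;
  y15 ON F1 F2 x6 x2;   ! x2 -> y15 was also fixed at 0
  F1 WITH F2;
MODEL INDIRECT:
  y15 IND x6;
```

Direct effects of x2 on the factor indicators (y3, y7, y9, and so on) are still fixed at zero here.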


Dear Prof. Muthen, When I add manifest variables to my structural model (with four latent factors), they influence some of the factor loadings of these latent constructs. Some of the factor loadings drop far below the 0.40 cutoff, compared with the measurement model (with all the latent factors together, where all factor loadings were above 0.40). Is this possible? Is there a solution? Thank you very much in advance. 


This may suggest the need for direct effects from the covariates to the factor indicators. Ask for MODINDICES (ALL) in the OUTPUT command to see if this is the case. 
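A minimal sketch of the check suggested above, assuming two factors with four indicators each and two manifest covariates; all names are hypothetical.

```
MODEL:
  f1 BY y1-y4;
  f2 BY y5-y8;
  f1 f2 ON x1 x2;       ! structural part with manifest covariates
OUTPUT:
  MODINDICES (ALL);     ! look for large MIs on statements such as y3 ON x1
```

If an MI points to a direct effect such as y3 ON x1, adding "y3 ON x1;" to the MODEL command lets that covariate influence the indicator without going through the factor.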

saeid posted on Wednesday, March 20, 2013  8:21 am



Dear Dr. Muthen, I have a question regarding control variables: how is it possible to use binary variables (such as gender) in SEM as control variables? Thanks, Saeid 


You just regress the factors on the binary variable. 

Jetty posted on Friday, November 08, 2013  9:34 am



I am testing a path analysis model of X via 4 mediators to Y (ordered categorical). I am using Example 3.16 from the newest manual, except that I specify the DV as categorical and use X1, X2, and X3 in the second MODEL command (not only X2 as in the text) because I need a fully adjusted model. My question is: given that X2 and X3 are not included in the MODEL INDIRECT statements, are the indirect effect estimates adjusted for X2 and X3? If not, what is the proper syntax for including them as control variables? 


Control variables are treated as any other covariate. 
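A sketch of why the reply above holds, assuming four mediators m1-m4 and NAMES/USEVARIABLES statements omitted; the names are hypothetical. The indirect effects are built from the estimated slopes, and those slopes are already adjusted for x2 and x3 because x2 and x3 appear in every ON statement.

```
VARIABLE:
  CATEGORICAL ARE y;          ! ordered categorical outcome
MODEL:
  m1-m4 ON x1 x2 x3;          ! each mediator adjusted for all three covariates
  y ON m1-m4 x1 x2 x3;        ! outcome adjusted for the same covariates
MODEL INDIRECT:
  y IND x1;                   ! uses the adjusted slopes estimated above
```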

dvl posted on Thursday, December 12, 2013  8:09 am



Dear Professor, I'd like to ask a question about the control variables included in different regressions in a path model (no latent variables). Because my theory assumes different control variables for each of the endogenous variables in the model, each path has different control variables. As I read above, this is not a problem, given that including the same control variables in each equation is not statistically required (in my opinion, path models would be less interesting if it were). However, the following issues are unclear to me:

(1) How should I interpret indirect and total effects when different control variables are included in each equation? Which variables are the total and indirect effects controlled for in this case, and how should we report this?

(2) In a path model, all exogenous variables are correlated. For example, for X -> W -> P, with C1 = control variable 1 and C2 = control variable 2:

  W ON X C1;
  P ON W C2;

Mplus assumes C1 and C2 are correlated, even though I do not include C2 as a control in the regression of W. Is this true? So the regression of W is not controlled for C2, but somehow the correlation between C2 and C1 should affect the relationship between X and W? How should I think about this? I really struggle with this! I hope you can help. Thanks a lot! 

dvl posted on Friday, December 13, 2013  2:45 am



Dear Professor, regarding my question above, I have already figured out (2), but the question that remains open to me is whether it is possible to interpret an indirect effect when the mediator variable and the dependent variable have different control variables. I am really confused about that! Thanks a lot! 


I would include all control variables in both equations. Some may not be significant, but that's ok. 


Dear Dr. Muthen, in a structural model, I want to control for the effects of a nominal variable (different countries). I am not interested in the effect different countries may have on my dependent variables; I merely want to control for the variance they may explain in the model. Would I need to construct dummy variables for each of the countries and enter them separately in the model (with one country as the reference category), or can I enter a single variable in which country A = 1, country B = 2, etc.? Thank you. 


You have to construct a set of C - 1 dummies, where C is the number of countries. 
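A sketch of C - 1 dummy coding with the DEFINE command, assuming four countries coded 1-4 in a variable named country, with country 1 as the reference category and no missing country codes; all variable names are hypothetical.

```
VARIABLE:
  NAMES ARE country x m y;
  USEVARIABLES ARE x m y d2 d3 d4;   ! DEFINE variables go at the end of the list
DEFINE:
  IF (country EQ 2) THEN d2 = 1;
  IF (country NE 2) THEN d2 = 0;
  IF (country EQ 3) THEN d3 = 1;
  IF (country NE 3) THEN d3 = 0;
  IF (country EQ 4) THEN d4 = 1;
  IF (country NE 4) THEN d4 = 0;
MODEL:
  m ON x d2 d3 d4;                   ! country dummies as control variables
  y ON m x d2 d3 d4;
```

Entering country as a single 1-4 variable would instead force its effect to be linear in an arbitrary coding, which is why the dummies are needed.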

dvl posted on Monday, February 03, 2014  1:28 am



Dear Professor, regarding my question above: I have a path model (with all manifest variables; no latent variables due to data limitations). I want to include the same control variables in each equation so that I know what my indirect effects are controlled for. But in that case, I have a saturated model and no fit indices! Can I do anything with this kind of model? In the literature, I see no publications using saturated models. Can you give me some advice on this? 


Although it is true that chi-square model testing can't reject the model, it is not a fatal flaw to consider a saturated model if your theories don't involve excluded paths. There should be many such models in the literature. Regression analysis is another example of a saturated model. 
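A worked count for the example model from earlier in this thread (W ON X C1 C2; P ON W X C1 C2;, including the direct effect of X on P) shows why it is saturated. This assumes continuous W and P, and that the model is specified conditional on the exogenous X, C1, and C2, so their means, variances, and covariances are not counted.

$$
\begin{aligned}
\text{free parameters} &= \underbrace{(1 + 3 + 1)}_{W \text{ on } X, C_1, C_2} + \underbrace{(1 + 4 + 1)}_{P \text{ on } W, X, C_1, C_2} = 11,\\
\text{sample statistics} &= \underbrace{2}_{\text{intercepts}} + \underbrace{2 \times 3}_{\text{slopes on } X, C_1, C_2} + \underbrace{3}_{\text{residual (co)variances of } W, P} = 11,\\
df &= 11 - 11 = 0.
\end{aligned}
$$

Each equation contributes an intercept, its slopes, and a residual variance, exactly as in an ordinary regression, which is the sense in which regression analysis is also saturated.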

Imaan posted on Saturday, February 15, 2014  2:52 am



I have three mediators in my model. Should I include only the control variables that have a significant relationship with the DV and the mediators? The control variables behave differently for the DV and for the mediators. How should I run this analysis? Should I include all control variables in the paths to the DV and the mediators, or should I exclude control variables that have an insignificant relationship with the DV or the mediators? 


I would use the control variables in every regression. I would not exclude insignificant paths. 

Imaan posted on Sunday, February 16, 2014  11:31 am



If we have four control variables, should I show their paths to the four mediators and the DV separately? 


The control variables are covariates, so they don't change the number of mediators (you said you had 3 mediators). 

Sarah posted on Thursday, May 01, 2014  6:50 am



I'm running a multiple mediation model with 3 mediators, income, maternal mental health and parenting behaviour. According to theory, the mediators themselves are all interlinked. More specifically, income is thought to directly affect maternal mental health and parenting behaviours. I am including a range of controls for each of my mediators and I wondered if it was acceptable to allow income to act as a control for the other two mediators even though income is itself a mediator? Many thanks for your help. 


This is fine. 
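A sketch of one way to write such a model, assuming a focal predictor x, controls c1 and c2, and an outcome y; all names are hypothetical. Income is a mediator in its own equation and appears on the right-hand side of the other two mediator equations.

```
MODEL:
  income ON x c1 c2;                  ! income as the first mediator
  mental ON income x c1 c2;           ! income also acts as a covariate here
  parent ON income x c1 c2;
  mental WITH parent;                 ! residual covariance freed explicitly
  y ON income mental parent x c1 c2;
MODEL INDIRECT:
  y IND x;                            ! includes x -> income -> mental -> y, etc.
```

Because income is itself predicted by x, MODEL INDIRECT reports the chains that run through income rather than treating income only as a control.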


Hello Dr. Muthen, I am working on an SEM model. I do not want Mplus to estimate some of the paths that it estimates automatically, such as those between some of the exogenous variables. So I was thinking of fixing these paths to 0. Would this be the right way to do it? If so, what is the syntax for that? Should I specify each relationship in a separate statement, or can they be specified on the same line? Thanks 


Are these observed exogenous variables or latent variables? 


Hi Linda, These are latent variables. Thanks 


You can say f1 WITH f2@0; 
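Extending the reply above to several exogenous factors: covariances among exogenous factors are estimated by default, so each one to be removed is fixed with @0. Several can be listed in one statement; f1, f2, and f3 are hypothetical names.

```
MODEL:
  f1 WITH f2@0 f3@0;   ! two fixed covariances in one statement
  f2 WITH f3@0;        ! a separate statement works just as well
```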


Thanks Linda 

lamjas posted on Thursday, June 26, 2014  1:32 am



I have two questions. (1) For a simple mediation SEM using cross-sectional data, say IV -> M -> DV, with gender as the control variable, should I add paths from gender to all three variables, or just to M and DV? (2) For a cross-lagged SEM, say with A1, A2, B1, B2 (the number indicates the time point) and gender and age as control variables, am I right that I should add paths from gender and age to the Time 1 variables only? (3) If I have a time-variant control variable (say X1 and X2), then I should add paths from X1 to A1 and B1, and from X2 to A2 and B2. Am I right? 


1. Regress m and dv on gender. Do you regress iv on gender. 2-3. There are no set rules. Your research questions should guide you. 

lamjas posted on Thursday, June 26, 2014  6:20 pm



Hi Linda, For your answer (1), should I also regress iv on gender? Thank you again. 


I meant do not regress iv on gender. 

lamjas posted on Thursday, June 26, 2014  6:42 pm



Hi Linda, If I apply the logic of your answer (1) to my question (2) and (3), A1 and B1 are not appropriate to regress on any covariates. Is that right? 


The distinction is that you don't regress an exogenous iv on another exogenous variable. Your a and b variables are dependent variables. 
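Since the earlier reply notes there are no set rules for questions 2-3, the following is only one possible specification, using the variable names from the question. Because A1 and B1 are dependent variables, they can be regressed on covariates.

```
MODEL:
  a1 b1 ON gender age x1;         ! time-1 outcomes on time-invariant and time-1 covariates
  a2 ON a1 b1 gender age x2;      ! stability and cross-lagged paths plus covariates
  b2 ON a1 b1 gender age x2;
  a1 WITH b1;                     ! residual covariance at time 1
  a2 WITH b2;                     ! residual covariance at time 2
```

Dropping gender and age from the time-2 equations, or x1 from the time-1 equations, are equally valid choices if the research questions call for them.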

Paula Yuma posted on Sunday, June 29, 2014  10:13 am



Hello, Thank you for this amazing forum! I'm running a mediation analysis with three exogenous latent factors (HSEP, NSEP and PARKS), one mediating latent factor (PNS) and one observed outcome (PA). I'm receiving some advice that, in addition to placing the control variables (cov1-cov8) in the regression equations (ON statements) as below:

  PNS ON HSEP NSEP PARKS cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8;
  PA ON PNS HSEP NSEP PARKS cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8;

I should also be regressing the latent factors NSEP, HSEP and PARKS on all the covariates, as follows:

  NSEP ON cov1-cov8;
  HSEP ON cov1-cov8;
  PARKS ON cov1-cov8;

My model runs well with the control variables in the ON statements for the mediator and outcome. It essentially doesn't run at all when I regress the HSEP, NSEP and PARKS factors on the controls. What would you advise in terms of how to include these controls, and if possible, could you explain why? Additionally, some of my covariates are dichotomous dummy variables for race. Should their covariances with one another be set to 0 using cov1 WITH cov2@0? Many thanks! Paula Yuma, Doctoral Candidate, UT Austin School of Social Work 


I agree that you need to allow for factors and covariates to be related. If this runs into problems as you say, you may want to send to Support. Nothing needs to be said about the relationships among covariates. I assume that for the dummies, you have one less dummy variable than the number of categories. 

Paula Yuma posted on Sunday, June 29, 2014  12:09 pm



Thank you so much for your reply. Yes, I have one less dummy than the number of categories. Just to clarify, do the independent factors need to be related to the covariates through a regression statement (where the IVs are regressed on the covariates), or just allowed to covary, as they would by default as they are all included in the mediator and outcome regression statements? Many thanks. 


Typically you can let covariates and exogenous factors simply correlate using WITH. For categorical DV modeling, e.g. using WLSMV, using ON can be preferred because WITH makes the covariates "into y's" (so with parameters that are part of the model) and this has implications for assumptions of underlying normality. 
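A sketch of the two options in the reply above, using the names from the question and assuming cov1-cov8 are consecutive on the USEVARIABLES list; the measurement (BY) statements are omitted.

```
MODEL:
  PNS ON HSEP NSEP PARKS cov1-cov8;
  PA ON PNS HSEP NSEP PARKS cov1-cov8;
  ! Option 1 (typical): exogenous factors simply correlate with the covariates
  HSEP NSEP PARKS WITH cov1-cov8;
  ! Option 2 (e.g. categorical outcome with WLSMV): use ON instead of WITH
  ! HSEP NSEP PARKS ON cov1-cov8;
```

With Option 1 the covariates are brought into the model as variables with estimated parameters; with Option 2 they stay on the right-hand side and no distributional assumption is made about them.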

Yvonne LEE posted on Friday, August 08, 2014  9:15 am



I want to double-check to avoid misunderstanding. 1. One of the paths in my model is Childhood Adversities -> Distorted Sex Attitude -> PornUse -> Rape. Can I add social desirability (a control variable) to each of these variables? 2. Distorted Sex Attitude is a latent factor with 4 indicators. Should I add the social desirability control at the factor level or the indicator level? 


You can add the control variable anyplace you want it on the right-hand side of ON. Where you add it should be decided by your research hypothesis. I cannot make that decision. 


I have a mediation model with an observed predictor variable, a latent mediator, and both latent and observed outcome variables. When I add covariates to the model (via outcomes ON covariates statements), the model fit decreases. How are covariates affecting model fit? 


Please send the output and your license number to support@statmodel.com. 

Hewa G posted on Wednesday, May 11, 2016  2:53 pm



I'm working on a SEM model. I’m using a categorical variable (level1=1 and level2=2) as a control variable in the research model. Do I have to dummy code this variable? Can I do this with the define command in Mplus? 


You can use the variable as is or change the values to 0/1 using the DEFINE command. 

Hewa G posted on Wednesday, May 11, 2016  4:27 pm



Thank you for your reply. If I write the command for the categorical variable of job level, VARIABLE:CATEGORICAL IS JLEVEL DEFINE: JLEVEL(1=0 2=1); is this correct? 


See the user's guide for the correct specification of the DEFINE command. The CATEGORICAL list is for dependent variables only. In regression, covariates can be binary or continuous. In both cases, they are treated as continuous. 
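A minimal sketch of the recode described in the two replies above, assuming jlevel is coded 1/2 and used only as a covariate; x and y are hypothetical names. Note that jlevel does not go on the CATEGORICAL list.

```
VARIABLE:
  NAMES ARE jlevel x y;        ! jlevel is NOT listed on CATEGORICAL
DEFINE:
  jlevel = jlevel - 1;         ! recodes 1 -> 0 and 2 -> 1
MODEL:
  y ON x jlevel;
```

Shifting the coding from 1/2 to 0/1 leaves the slope for jlevel unchanged; only the intercept's interpretation changes.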


I have a model with three latent variables and a large number of control variables affecting one or more of them. When I run the models without controls, I get very good model fit. Once I add the controls, however, I suspect misfit because of the large number of paths I'm creating. My question is, how do I adjust for this? I understand that Mplus automatically fixes to zero all paths that I do not explicitly specify. Is this correct? If so, should I free some of these paths? 


Mplus fixes at zero the coefficients of paths you don't include in the MODEL command. You can use MODINDICES to see where your model doesn't fit, that is, which paths should be freed and estimated. 


Thank you for your prompt response. After reviewing the Modindices and using the WITH command to define relevant correlations among some control variables, I still get quite a poor fit compared to the model without controls. I have two questions for you. 1. Is the WITH command sufficient to capture these relationships? Would it make sense to use * and free the relevant parameters? 2. Is it recommended to regress each latent construct on each control or can theory dictate which relationship is identified? Thank you for your time. 


We need to see your output to say. Please send to Support along with your license number. 


I heard from someone who attended one of your seminars on SEM that you can test for differential item functioning (DIF) using the modification indices (MI’s). The procedure: first, specifying a CFA; second, specifying a dummy coded variable that represents the reference and target groups without any paths to the observed or latent variables; and third, examining the MI’s to see whether any paths need to be specified between the dummy coded variable and the observed indicator variables. Please confirm if this is a plausible and valid technique. I am currently engaged in a SEM project in which I am modeling expectation of success in school children as a function of various ecological variables. I would like to control for the demographic variables of sex, ethnicity, and SES. Assuming the above DIF technique is valid I was wondering if I can test whether these demographic control variables are necessary in my SEM by initially including them in my SEM without any paths specified and then checking whether the MI’s indicate that paths should be included in the SEM between the demographic variables and my variables of interest. One of my challenges in specifying my SEM is that the background demographic variables can have paths specified to all variables in my model, which makes specifying my SEM very cumbersome and challenging. I was wondering whether this technique of using the MI’s can lessen the complexity of specifying my SEM. 


The procedure I've had in mind regresses the factor on the x's and then looks at MIs for direct effects (y1 ON x, etc.). With f ON x in the model, you can also regress each factor indicator on all x's and see which slopes are significant. Or you can use BSEM to do this in a single-step analysis that includes all direct effects with small-variance priors. 
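A sketch of the second approach in the reply above, read here as checking one indicator at a time, assuming a single factor with six indicators and hypothetical covariates male, ethnic2, ethnic3 (dummies for ethnicity), and ses.

```
MODEL:
  f BY y1-y6;
  f ON male ethnic2 ethnic3 ses;    ! factor regressed on all covariates
  y1 ON male ethnic2 ethnic3 ses;   ! direct effects for one indicator;
                                    ! rerun with y2, y3, ... in place of y1
```

Significant slopes in the y1 ON statement suggest that those covariates affect y1 beyond their effect through f. Giving every indicator direct effects from the same covariate in one run would leave f ON that covariate unidentified, which is why this sketch frees them one indicator at a time.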
