Demographic variables in SEM
Mplus Discussion > Categorical Data Modeling >
 Jiyoung  posted on Saturday, June 20, 2009 - 9:34 am
I want to run a structural equation model. The variables in my model were chosen based on theories that previous studies found.

Along with the variables based on theories, I also want to include demographic variables (e.g., gender, race) to control the effect of demographic variables on endogenous variables. I wonder if it is correct that those demographic variables are just added like other exogenous variables based on theories.

For instance, let's say that I want to predict intention to use the Internet (INTENTION).

My exogenous variables based on theories are 1) perceived ease of use (PEU) and 2) relative advantage (RA). Those are latent variables with multiple indicators. I also want to control the effect of gender, race, income, etc.

First, I will run a CFA to examine the validity of PEU and RA. I will also use the following command:
genderL by gender;
raceAAL by raceAA;
raceAsianL by raceAsian;
raceWhiteL by raceWhite;

After I make sure the model fit of the CFA model is good, I will move onto the SEM model. I will also use the following command.

Intention on peu ra genderL raceAAL raceAsianL raceWhiteL;

I wonder if I am on the right track. If you could clarify my question, I would appreciate it. Thank you.
 Linda K. Muthen posted on Sunday, June 21, 2009 - 6:28 pm
You should simply regress intention on the observed covariates. You should not put a factor behind each covariate.
 Jiyoung  posted on Monday, June 22, 2009 - 9:46 am
Thank you so much for clearing that up. Let me make sure I understand what you said.

1) Do you mean I don't need to create factors for the demographic observed variables? I don't need the following command?

genderL by gender;
raceAAL by raceAA;
raceAsianL by raceAsian;
raceWhiteL by raceWhite;

2) And also do you mean that I can run SEM with the following command without creating factors for the demographic variables?

Intention on peu ra gender raceAA raceAsian raceWhite;

3) In terms of the interpretation of the dummy variables, I wonder if the interpretation of the results is the same as regression.

If you could clarify my question, it will be very helpful. Thank you.
 Linda K. Muthen posted on Monday, June 22, 2009 - 11:28 am
1. You do not need the BY statements.
2. Yes.
3. Yes.
 Susan Hamilton posted on Friday, January 22, 2010 - 3:17 am
I want to include SES as a latent variable in a SEM model. The indicators are:
education (educ) - 7 level ordinal indicator
income (inc) - 6 level ordinal indicator
race/ethnicity (race) - nominal indicator with 5 categories (Black, White, Hispanic, Asian, and other)
marital status (mar) - nominal indicator with 5 categories (married, living with partner, divorced, widowed, single never married)
nativity (bornUS) - born in US=1 not born US=0

1. I know I need to create dummy variables for the two nominal indicators - marital status and race/ethnicity. What I cannot figure out is how to code the dummy variables and then how to include them in the BY statement for the SES factor.

2. When SES is a latent factor, is it appropriate to have the measured indicators "cause" the latent factor? If so, how is the command for this written?

Thank you.
 Linda K. Muthen posted on Friday, January 22, 2010 - 9:41 am
Nominal factor indicators are allowed in Mplus.

See Slide 246 of the Topic 1 course handout for an example of how to specify a formative factor.
 Susan Hamilton posted on Wednesday, January 27, 2010 - 7:29 am

Thank you for the example of how to specify SES as a formative factor. That helps.

I am using WLSMV because most of my indicators are categorical (5 point Likert scales) and when I identify marital status and race/ethnicity as nominal I can no longer use this estimator.

Given that all of my primary data is non-normally distributed ordinal data, isn't it better to use dummy variables for my two nominal variables so that I can use WLSMV?

If so, would you help me write the commands for a dummy for a 5-category race variable (1=Black, 2=White, 3=Hispanic, 4=Asian, 5=other)?

Thank you, Susan
 Linda K. Muthen posted on Wednesday, January 27, 2010 - 10:51 am
I'm not sure if you are aware of the fact that maximum likelihood is appropriate for categorical outcomes using either logistic or probit regression.

If you want to stick with weighted least squares estimation, you can use the DEFINE command to create the dummy variables. See the user's guide for more details.
 Susan Hamilton posted on Sunday, January 31, 2010 - 12:32 pm
I tried ML and defining raceth and marital as nominal variables but get an error message that I do not have enough memory. WLSMV seems to work best with my model (except for my lack of understanding of how to write the command for dummies for my two nominal variables).

On pp. 449-450 of my user's manual (version 5, Nov 2007) there is a description of how to refer to the levels of a nominal dependent variable in the MODEL command, but no description of how to create dummy variables in the DEFINE command.

I tried this in the DEFINE command:

It seems to work. Is this correct?

If not, please help me - I am just a dumb graduate student ready to pull my hair out... Thanks.
 Linda K. Muthen posted on Sunday, January 31, 2010 - 1:25 pm
Following is one way to do this:

white = 0;
if (raceth eq 1) then white = 1;
black = 0;
if (raceth eq 2) then black = 1;

You need k-1 dummies where k is the number of categories of the nominal variable. Whenever you create a variable in DEFINE, you should check to see that you get the results intended by saving the old and new variables and spot checking.
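
The k-1 rule and the spot check can be sketched outside Mplus. The following is a minimal illustration in Python, not Mplus syntax. It assumes the coding given in the earlier question (1=Black, 2=White, 3=Hispanic, 4=Asian, 5=other); the data, the helper `make_dummies`, and the choice of "other" as the reference category are all hypothetical.

```python
from collections import Counter

# Assumed coding from the question: 1=Black, 2=White, 3=Hispanic, 4=Asian, 5=other.
LABELS = {1: "black", 2: "white", 3: "hispanic", 4: "asian", 5: "other"}
REFERENCE = 5  # hypothetical reference category; it gets no dummy


def make_dummies(codes, labels=LABELS, reference=REFERENCE):
    """Return one dict of k-1 0/1 indicators per observation."""
    names = {code: name for code, name in labels.items() if code != reference}
    return [{name: int(value == code) for code, name in names.items()}
            for value in codes]


# Made-up data for illustration only.
raceth = [1, 2, 2, 3, 5, 4, 1, 5, 3, 2]
dummies = make_dummies(raceth)

# Spot check 1: each dummy sums to the frequency of its category.
counts = Counter(raceth)
for code, name in LABELS.items():
    if code != REFERENCE:
        assert sum(row[name] for row in dummies) == counts[code]

# Spot check 2: reference rows are all zeros, every other row has exactly one 1.
for value, row in zip(raceth, dummies):
    assert sum(row.values()) == (0 if value == REFERENCE else 1)
```

Five categories yield four dummies here; the category without a dummy is the one every coefficient is compared against.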
 Lucy Morgan posted on Tuesday, February 03, 2015 - 5:56 am

I need to control for the effects of two observed demographic variables in an otherwise fully latent SEM model using MLM (due to non-normal data). One is ordinal (education level) and the other is nominal (ethnicity) and both have a significant effect on the dependent variables. I have read through all the posts but I am still not clear what would be the correct approach. I need to run both measurement models and path models including these variables.

1) Can I include both the ordinal and the nominal variables as continuous variables as I do not need to interpret their effects, only control for them? (I will use SPSS to analyse for specific effects)

2) If no, my understanding is that I would need to create dummy variables - so would I state

CATEGORICAL = ethnic educa;

and then create dummy variables using
white = 0;
if (raceth eq 1) then white = 1;
black = 0;
if (raceth eq 2) then black = 1;

as outlined in the post above? If yes, do I use the newly created variables (black, white, etc.) in WITH and ON statements, or can I simply use the ethnic and educ variables in the WITH and ON statements?

Many thanks for your help with this, I am very confused!
 Linda K. Muthen posted on Tuesday, February 03, 2015 - 6:03 am
1. The scale of observed exogenous variables is not specified. The scale is specified only for endogenous variables.

2. You do not need to create dummy variables for an ordinal variable unless you prefer to. You must create dummy variables for a nominal variable. Your code looks correct. For a nominal variable, you should use the dummy variables not the nominal variable in the analysis.
 Student posted on Wednesday, April 29, 2015 - 2:02 am
I am experiencing issues trying to create dummy variables for a nominal independent variable (c) with 3 classes. In my dataset, this variable has values ranging from 1-3.

I am defining the classes as you suggested above:
race2 = 0;
if (c eq 2) then race2 = 1;
race3 = 0;
if (c eq 3) then race3 = 1;

I am adding the new variables to my USEVARIABLES list, and regressing Y (my outcome) on race and other covariates:

Y on race3 race2 cov1 cov2;

However, I get the warning:
One or more variables have a variance of zero.
Check your data and format statement.

Do you have any suggestions/recommendations regarding this error?
 Student posted on Wednesday, April 29, 2015 - 2:18 am
Apologies- I got the model above to run. I do have another question-- that model above allows you to see the effects of race2 and race 3 in the Output, but obviously not race1 since you don't define that as a dummy variable. Is it OK to just re-run the model with defining different dummy variables to see what the effect of race1 would be on Y? If there is some statistical issue with that, is there another method you recommend?
 Linda K. Muthen posted on Wednesday, April 29, 2015 - 9:26 am
The effect of race1 is found in the intercept.
 Student posted on Wednesday, April 29, 2015 - 9:39 am
I checked my output and I only see effects of race2 and race3-- could you point me to where in the output I could find the effect of race1? I know it wouldn't be labelled as "race1" since we didn't define that. The only intercept information is for the indicators of the latent outcome variable. Is there a specific "OUTPUT" I need to request?
 Linda K. Muthen posted on Wednesday, April 29, 2015 - 1:19 pm
Under the intercept for the dependent variable where race2 and race3 are covariates.
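
Why the reference category shows up in the intercept can be checked with a small regression. This is a minimal sketch in Python rather than Mplus, assuming an observed outcome, made-up data, and no covariates other than the two dummies; `ols` is a hypothetical helper that solves the normal equations exactly.

```python
from fractions import Fraction


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y], kept exact with Fraction arithmetic.
    A = [[sum(Fraction(r[i]) * r[j] for r in X) for j in range(k)]
         + [sum(Fraction(r[i]) * v for r, v in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [row[-1] for row in A]


# Made-up data: nominal variable c (1-3) and an observed outcome y.
c = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [4, 5, 6, 7, 8, 9, 1, 2, 3]

# Design matrix: intercept, race2, race3 (category 1 is the reference).
X = [[1, int(v == 2), int(v == 3)] for v in c]
intercept, b_race2, b_race3 = ols(X, y)


def group_mean(k):
    values = [yi for ci, yi in zip(c, y) if ci == k]
    return Fraction(sum(values), len(values))


assert intercept == group_mean(1)                  # reference group mean
assert b_race2 == group_mean(2) - group_mean(1)    # race2 vs race1
assert b_race3 == group_mean(3) - group_mean(1)    # race3 vs race1
```

In this sketch the intercept is the mean of the category that has no dummy, and each dummy coefficient is a difference from that mean. With additional covariates the intercept is instead the expected outcome for the reference category when those covariates are zero.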
 Erin posted on Tuesday, June 14, 2016 - 4:20 pm
Hello! Thank you for the help. I am running into problems because we have a DV that is a latent variable with two observed vars. Thus, our OUTPUT gives us just the intercept for both observed vars. How do we go about getting the effect of, in this instance, race 2 vs race3, under these circumstances?

In other words, our output gives us the estimate and significance for race 2 vs race 1 and for race 3 vs race 1. However, we are uncertain about how to get the estimate and significance for race 2 vs race 3.

 Linda K. Muthen posted on Tuesday, June 14, 2016 - 4:41 pm
You can use MODEL CONSTRAINT to create a difference parameter by labeling the coefficients in the MODEL command, for example, p1 and p2:

NEW (diff);
diff = p1 - p2;
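
The reason this difference gives the race 2 vs race 3 contrast can be written out as a short derivation. It assumes that p1 labels the coefficient of race2 and p2 the coefficient of race3, that race 1 is the reference category, and that any other covariates are held fixed; the symbol b_0 for the intercept is introduced here for illustration only.

```latex
\begin{align*}
E[y \mid \text{race}=1] &= b_0 \\
E[y \mid \text{race}=2] &= b_0 + p_1 \\
E[y \mid \text{race}=3] &= b_0 + p_2 \\
\text{diff} = p_1 - p_2 &= E[y \mid \text{race}=2] - E[y \mid \text{race}=3]
\end{align*}
```

The intercept cancels, so diff does not depend on which category was chosen as the reference.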
 Erin posted on Wednesday, June 15, 2016 - 9:57 am
Thank you for the response. I will try this. Originally, I had rerun the model with a new set of dummy variables (race 1 vs race 2; race 3 vs race 2) instead of the old set (race 2 vs race 1), (race 3 vs race 1). I was anticipating the race 2 vs race 1 contrast to be the same magnitude in both models because all that I changed was the selection of dummy variables. However, the results were slightly different between the two models. Do you know why this is?

Thank you again
 Bengt O. Muthen posted on Wednesday, June 15, 2016 - 12:57 pm
Send the 2 outputs to Support along with your license number.
 Nagwan zahry posted on Monday, October 02, 2017 - 9:09 am
Hi Bengt,

I am trying to compare two models (rational and emotional models) where I am testing the effect of two messages (emotional message and rational message) on emotions and risk. So I have two independent variables (emotional and rational messages) and the rest of my endogenous variables are continuous.

Initially I have one independent categorical exogenous variable, which I named 'experiment', where 1 = emotional and 2 = rational.

Would you please let me know what is wrong with my syntax? It is my first time running a model comparison using Mplus.

Here is the syntax:

USEV experiment em em1 em3 r1 r2 r3 ben1 ben2 ben3;
Model: feb by em em1 em3;
rik by r1 r2 r3;
benft by ben1 ben2 ben3;

rik benft on feb;

Classes= C(2)
categorical= experiment;

Analysis type=mixture;

thank you
 Bengt O. Muthen posted on Monday, October 02, 2017 - 5:15 pm
Send your output to Support along with your license number.
 Wim Beyers posted on Friday, May 18, 2018 - 2:39 am
When coding a 3-level nominal variable in two dummies (D1 & D2, each with same reference category), for regression, should we allow the correlation between D1 & D2 in the regression model, in order to get really the unique effects of the dummies?

Wim Beyers
 Bengt O. Muthen posted on Friday, May 18, 2018 - 1:36 pm
Yes, your model should allow them to correlate because they are probably correlated in the data. But for the same reason, you can't get unique effects from each dummy.
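
The correlation between two dummies built from the same nominal variable can be checked directly. This is a minimal sketch in Python with made-up data; the variable names are hypothetical. Because no observation can be in both categories, the product D1*D2 is always zero, so the covariance (computed with divisor n) equals minus the product of the two category proportions.

```python
from fractions import Fraction

# Made-up 3-level nominal variable; category 1 is the reference.
c = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
d1 = [int(v == 2) for v in c]
d2 = [int(v == 3) for v in c]

n = len(c)
p1 = Fraction(sum(d1), n)  # proportion in category 2
p2 = Fraction(sum(d2), n)  # proportion in category 3

# Covariance with divisor n: E[D1*D2] - E[D1]*E[D2]; the first term is 0.
cov = Fraction(sum(a * b for a, b in zip(d1, d2)), n) - p1 * p2
assert cov == -p1 * p2

# Correlation from the Bernoulli variances p(1-p).
corr = float(cov) / (float(p1 * (1 - p1)) * float(p2 * (1 - p2))) ** 0.5
```

The covariance is negative whenever both categories are observed, which is consistent with letting the two dummies correlate in the model.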
 Anna Mascherek posted on Tuesday, September 11, 2018 - 4:59 am
I ran a Second-Order LGCM and now want to regress slope and level on covariates.
I need to include a nominal variable with 9 categories, hence I created 9-1=8 dummy-coded variables. Now, is there a limit on the number of dummy-coded variables? I added the dummy variables one at a time; the estimation terminates normally and the model looks fine. However, as soon as I add the last dummy, I get the following message:

Parameter 89, I ON DUBLIN

Could you help me with this problem?
Thank you very much in advance, your help is highly appreciated.
 Bengt O. Muthen posted on Tuesday, September 11, 2018 - 2:39 pm
There is no such limit. Please send output and data to Support along with your license number.