Coding of Categorical Covariates
 Utkun Ozdil posted on Thursday, February 03, 2011 - 11:27 am
My question is about the coding of categorical covariates in the Mplus data file. In my model I want to test the effects of gender, socioeconomic status (SES), and school type on mathematics achievement. SES has 4 categories and school type has 3 categories. Do you suggest dummy coding for these variables? For instance, should I create 3 dummies for SES and 2 dummies for school type and then include these dummy variables in the MODEL command, or should I leave the variables as they are in the data file (without creating any dummies)?

I am also unsure whether a multigroup analysis would be more appropriate.


 Linda K. Muthen posted on Thursday, February 03, 2011 - 1:16 pm
Covariates must be binary or continuous. If you do not recode nominal variables to a set of dummy variables, they will be treated as continuous in model estimation. With ordinal variables, you can create a set of dummy variables or treat them as continuous.
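As a minimal sketch of the dummy coding described above, the Python snippet below turns a 4-category SES variable into 3 binary indicators, leaving one category out as the reference. The data values, the helper name `dummy_code`, and the choice of category 1 as the reference are assumptions made for illustration. This is not Mplus code; it only shows what the recoded columns in the data file would contain.

```python
def dummy_code(values, reference):
    """Return {category: 0/1 indicator list} for every category except the reference."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found in data")
    return {
        c: [1 if v == c else 0 for v in values]
        for c in categories
        if c != reference
    }

# Hypothetical SES codes (1-4) for six students
ses = [1, 2, 3, 4, 2, 1]

# Category 1 is the reference, so only categories 2, 3 and 4 get a dummy
ses_dummies = dummy_code(ses, reference=1)

for category, column in ses_dummies.items():
    print(f"ses{category}:", column)
```

A student in the reference category has 0 on all three dummies, so each dummy's coefficient is interpreted as a contrast with that category.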
 Steven Lancaster posted on Monday, August 06, 2018 - 10:07 am
Is there an efficient way to create the dummy variables in Mplus? I have 55 groups in a nominal variable that I need to dummy code. I know this can be done using the DEFINE command, but will I need to manually list out all 55 dummy variables?
 Bengt O. Muthen posted on Monday, August 06, 2018 - 3:29 pm
No, there isn't an efficient way. But perhaps you want to treat group as random when you have that many? Or use a multiple-group approach.
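If the dummy-covariate route is taken anyway, one workaround is to generate the repetitive DEFINE statements outside Mplus and paste them into the input file. The Python sketch below does this under several assumptions: the grouping variable is named `group` and coded 1 to 55, group 55 is the reference, and the dummies are named `d1` to `d54`. All of these names are hypothetical. It uses only the IF ... THEN form and the EQ/NE operators; how observations with a missing group code should be handled is not addressed here.

```python
def mplus_dummy_define(var, n_groups, reference, prefix="d"):
    """Build Mplus DEFINE statements creating one 0/1 dummy per non-reference group."""
    lines = ["DEFINE:"]
    for g in range(1, n_groups + 1):
        if g == reference:
            continue  # the reference group gets no dummy
        name = f"{prefix}{g}"
        lines.append(f"  IF ({var} EQ {g}) THEN {name} = 1;")
        lines.append(f"  IF ({var} NE {g}) THEN {name} = 0;")
    return "\n".join(lines)

# Hypothetical variable name and group count; adjust to the actual data
define_block = mplus_dummy_define("group", n_groups=55, reference=55)
print(define_block)
```

The new variables would also need to be added at the end of the USEVARIABLES list and used as covariates in the MODEL command.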
 Steven Lancaster posted on Monday, August 06, 2018 - 5:07 pm
I am using a latent growth curve modeling approach. How might I use these techniques with this approach? I have been watching your short course Topic 3 and following along in the slides, but I am not sure how to bring in such a large number of groups.

 Bengt O. Muthen posted on Tuesday, August 07, 2018 - 11:19 am
See the paper on our website:

Muthén, B. & Asparouhov, T. (2016). Recent methods for the study of measurement invariance with many groups: Alignment and random effects. (Download scripts).

Although that paper talks about a factor analysis model, you should note that such a model is essentially equivalent to a growth curve model so the modeling alternatives that are discussed apply.
 Sara Namazi posted on Wednesday, August 08, 2018 - 6:23 am
Hi Dr. Muthen,

I have a question:

In my analysis, I am interested in controlling for the effect of female.
My demographic gender variable is coded as 1=female, 2=male

In Mplus, I recoded this variable as:

if (Sex eq 2) then female=0;
if (Sex eq 1) then female =1;

Does this look correct?

When I tried to control for the effect of male I changed the coding:

if (Sex eq 2) then male=1;
if (Sex eq 1) then male=0;

The results came out the same for both male and female in terms of fit statistics and beta coefficients.

Am I coding this incorrectly?

 Bengt O. Muthen posted on Wednesday, August 08, 2018 - 5:55 pm
You control for gender, not for female or male. That is, you control for a variable not values of the variable.

A simple way to define the variable is

female = 2 - gender;

Just use this female variable in your model.
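To illustrate the point, here is a minimal Python sketch, not Mplus code, using made-up scores and a plain one-predictor least-squares fit. It recodes a 1 = female, 2 = male variable both ways and compares the two regressions. The data and the helper `ols` are assumptions for illustration only.

```python
def ols(x, y):
    """Simple least-squares fit of y on one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: sex coded 1 = female, 2 = male
sex = [1, 2, 1, 2, 2, 1]
score = [52.0, 47.0, 55.0, 49.0, 45.0, 58.0]

female = [2 - s for s in sex]  # 1 -> 1, 2 -> 0
male = [s - 1 for s in sex]    # 1 -> 0, 2 -> 1

b0_f, b1_f = ols(female, score)
b0_m, b1_m = ols(male, score)

# Predicted values from each coding, observation by observation
fitted_f = [b0_f + b1_f * f for f in female]
fitted_m = [b0_m + b1_m * m for m in male]

print("female coding:", b0_f, b1_f)
print("male coding:  ", b0_m, b1_m)
```

The two codings give the same predicted values, and therefore the same fit. The slope has the same magnitude with the opposite sign, and the intercept shifts to the mean of whichever group is coded 0. Either dummy controls for the same variable, so only one of them is needed in the model.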
 Sara Namazi posted on Wednesday, August 08, 2018 - 6:46 pm
Hi Dr. Muthen,

Thank you for your response and clarification.

Could I re-code my Sex variable (male=2 female=1) using this command under define:

if (Sex eq 2) then male =1;
if (Sex eq 1) then male =0;
 Bengt O. Muthen posted on Thursday, August 09, 2018 - 2:07 pm
 Steven Lancaster posted on Thursday, August 09, 2018 - 3:10 pm
I had a chance to look at the 2016 paper on multiple groups. I also tried to run the analyses. It seems the alignment method will not work with latent growth curve analyses because Mplus wants me to free all the loadings. But aren't fixed loadings required to run that analysis correctly? I tried to run the two-level analysis as well, but I do not have that version of Mplus. I think I finally figured out how to run my data (except for the groups part) using non-linear analyses (short course 3 was helpful!), but I am feeling very stuck on trying to analyze with so many groups.
 Bengt O. Muthen posted on Thursday, August 09, 2018 - 3:24 pm
Alignment should work with growth curve modeling using fixed loadings. If you have a problem with this, please send your full output to Support along with your license number.
 Bengt O. Muthen posted on Friday, August 10, 2018 - 8:05 am
I misspoke: the loadings and intercepts of the factor indicators need to be free in the alignment procedure. The growth model with its typically fixed loadings (fixed time scores) assumes measurement invariance in that the time scores are thought to be the same for all groups for the same linear, quadratic, etc. development. So capture the 55 groups by using either a large set of dummy covariates or a multiple-group run. In the latter, you can test whether the fixed time scores need to be relaxed in some groups due to deviations from the overall linear, quadratic, etc. growth. Modindices are useful here.