Some of the variables are nominal.

Jiyoung posted on Saturday, June 20, 2009 - 9:17 am

I tried to run a structural equation model. The variables include the following things:

ordinal (7 points Likert Scale)
nominal (e.g., gender, race)

I defined the ordinal variables as categorical variables in Mplus without recoding them as dummy variables.

My question is how to treat nominal variables in the same model.

Q1. Am I supposed to define nominal variables as categorical variables as well in Mplus?

Q2. In the case of ordinal variables, I did not recode those as dummy variables. I simply defined them as categorical variables. I wonder if I should create dummy variables for nominal variables.

Q3. The nominal variables that I am using are observed variables. I wonder if I have to define them as latent variables as well like below:

genderL by gender

If you could clarify my questions, I would appreciate it. Thank you.

Linda K. Muthen posted on Sunday, June 21, 2009 - 6:30 pm

Factor indicators can be declared as categorical or nominal. For nominal covariates, a set of dummy variables should be created.

Jiyoung posted on Monday, June 22, 2009 - 9:41 am

Thank you for clarifying my question. I have one more question. In the case of ordinal variables, I specified those ordinal variables as categorical variables using the following command:

Categorical variables are ...

I wonder if I have to specify the nominal variables with the following command:

Nominal variables are gender race;

Linda K. Muthen posted on Monday, June 22, 2009 - 11:27 am

Only if you are using them as dependent variables.

Jiyoung posted on Monday, June 22, 2009 - 12:38 pm

Thank you for your clarification. One last question regarding this matter.

The nominal variables that I want to use are independent variables. The other independent variables are ordinal variables.

As you instructed, I will create dummy variables for nominal independent variables.

I wonder if I should define the nominal independent variables using the following command as I did for ordinal independent variables.

Categorical variables are gender race SN;

I take it you mean that I don't need to specify nominal independent variables as categorical variables in Mplus if I create dummy variables. Am I correct?

Thank you so much.

Linda K. Muthen posted on Monday, June 22, 2009 - 2:57 pm

The CATEGORICAL and NOMINAL options are for dependent variables only. They should not be used for covariates. Covariates can be binary or continuous as in regular regression and are treated as continuous in both cases.
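
As a rough sketch of this advice, a three-category nominal covariate can be turned into two dummy variables in the DEFINE command, while only the dependent variables go on the CATEGORICAL list. The file name, the variable names (y1-y4, gender, race, race2, race3), and the category codes are assumptions for illustration, and gender is assumed to be coded 0/1 already.

```mplus
DATA:
  FILE = mydata.dat;              ! hypothetical file name
VARIABLE:
  NAMES = y1-y4 gender race;      ! hypothetical names; race assumed coded 1, 2, 3
  ! variables created in DEFINE go at the end of the USEVARIABLES list
  USEVARIABLES = y1-y4 gender race2 race3;
  CATEGORICAL = y1-y4;            ! ordinal dependent variables only
DEFINE:
  ! category 1 of race serves as the reference group
  IF (race EQ 2) THEN race2 = 1;
  IF (race NE 2) THEN race2 = 0;
  IF (race EQ 3) THEN race3 = 1;
  IF (race NE 3) THEN race3 = 0;
MODEL:
  f BY y1-y4;
  ! covariates are not declared CATEGORICAL or NOMINAL
  f ON gender race2 race3;
```

A nominal covariate with k categories needs k - 1 dummies built this way. It is worth checking how cases with a missing value on race end up coded on the new variables.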

Jiyoung posted on Tuesday, June 23, 2009 - 8:30 am

Thank you for your answer. I have a few more questions.

All of my independent and dependent variables are measured on 7-point Likert-type scales. The exceptions are a few nominal control variables.

I wonder what would be the best. These are the options I am thinking of.

1) I can simply treat all of my independent and dependent variables as continuous variables. In social science, we actually consider variables that are measured on Likert-type scales to be interval variables.

If I choose this option, I should not use the CATEGORICAL or NOMINAL options for either the independent or the dependent variables. I would also have to use the ML estimator instead of the WLSM estimator.

2) The second option is to use the CATEGORICAL option for the dependent variables to define them as ordinal, but not for the independent variables. That is, I would treat the independent variables measured on the 7-point Likert scale as continuous, without declaring them ordinal, and declare only the dependent variables as ordinal using the CATEGORICAL option. If I use this option, I have to use the WLSM estimator.

I think that the first option is better. Would you please let me know which of the two options is better? Or should I take a totally different approach?

Bengt O. Muthen posted on Tuesday, June 23, 2009 - 12:29 pm

To decide, please see

Muthén, B. & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.

Muthén, B. & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.
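
For concreteness, the two setups being compared could be sketched as two separate input files that differ only in the CATEGORICAL list and the estimator. The variable names (y1-y3 for the dependent items, x1-x4 for the independent items) and the simple factor model are assumptions for illustration, not the poster's actual model.

```mplus
! ----- Input file A: all 7-point items treated as continuous -----
VARIABLE:
  NAMES = y1-y3 x1-x4;        ! hypothetical names
  USEVARIABLES = y1-y3 x1-x4;
ANALYSIS:
  ESTIMATOR = ML;
MODEL:
  fy BY y1-y3;
  fy ON x1-x4;

! ----- Input file B: dependent items declared ordinal -----
VARIABLE:
  NAMES = y1-y3 x1-x4;
  USEVARIABLES = y1-y3 x1-x4;
  CATEGORICAL = y1-y3;        ! dependent variables only; x1-x4 stay off this list
ANALYSIS:
  ESTIMATOR = WLSM;           ! as named in the post; WLSMV is the Mplus default here
MODEL:
  fy BY y1-y3;
  fy ON x1-x4;
```

Running both on the same data and comparing the estimates is one practical way to see how much the choice matters for a given set of items, alongside the guidance in the two papers.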

Jiyoung posted on Thursday, June 25, 2009 - 7:29 am

Thank you very much.

Wassim Tarraf posted on Thursday, October 29, 2009 - 9:34 am

Hi, I've received some mixed advice about using nominal (3 categories) indicators for a continuous latent construct. My question is: is it permissible or valid to use both nominal and continuous variables as indicators for a continuous latent construct which will be used as a predictor of the intercept and slope in a growth curve model setting?

Thanks.

Linda K. Muthen posted on Thursday, October 29, 2009 - 10:23 am

In Mplus, factor indicators can be continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types.

Wassim Tarraf posted on Thursday, October 29, 2009 - 11:06 am

Thanks Linda! Are there any papers/citations that you would recommend reading on nominal indicators for continuous latent variables?

Linda K. Muthen posted on Thursday, October 29, 2009 - 11:43 am

I don't know of any offhand. I think that Darrel Bock may have written about this in the 1970's.

Guanyi Lu posted on Tuesday, November 06, 2012 - 6:48 am

Hi Linda,

I want to specify a control variable (nominal, with 12 categories). But the output gives me "This exceeds the maximum allowed of 10" error.

Is there any way I can solve this? It is not a missing data issue, nor is the data being read incorrectly. The control variable is Country code, which includes 12 countries.

Best and look forward to your reply.

Linda K. Muthen posted on Tuesday, November 06, 2012 - 9:18 am

To use a nominal variable as a control variable, you need to create a set of dummy variables. In your case, it would be 11 dummy variables. The NOMINAL option is for dependent variables only.

Jan Zirk posted on Thursday, August 22, 2013 - 4:02 pm

Is support for nominal independent variables planned or on the 'wish list', so that creating the dummy variables would not be necessary? It can get extremely complex when there are a few nominal predictors with more than 2 levels; most packages for standard analyses (e.g. ANOVA) automatically create sets of dummies for nominal predictors.

Linda K. Muthen posted on Friday, August 23, 2013 - 10:41 am

This is on our list.

Jan Zirk posted on Friday, August 23, 2013 - 11:00 am

This would be a big advantage. Thank you for the prompt response.

Jan Zirk posted on Friday, August 23, 2013 - 11:04 am

P.S. Especially if the facility would provide both the general estimate of the main effect and the interaction term like in factorial anovas with nominal IVs with more than 2 categories, and the more detailed insight into the effects of dummy predictors.

Jan Zirk posted on Wednesday, September 18, 2013 - 6:09 am

I am testing an objective Bayesian regression model with an interaction term of a continuous predictor and 2 dummy variables (as there are 3 groups in the study). The model is defined in this way:

i on S;
i on dum1;
i on dum2;
i on dum1xS;
i on dum2xS;

How can I obtain estimates and significance for the joint effect of Group (i.e. of dum1 and dum2) and the joint effect of interaction terms (i.e. dum1xS and dum2xS)?

Linda K. Muthen posted on Wednesday, September 18, 2013 - 9:54 am

See MODEL TEST in the user's guide.
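
With a non-Bayesian estimator such as ML, a joint Wald test of the two group dummies might be sketched with MODEL TEST as below. The parameter labels a and b are assumptions added for illustration; the pair of interaction terms would be tested the same way in a separate run, with labels on dum1xS and dum2xS instead.

```mplus
MODEL:
  i ON S;
  i ON dum1 (a);      ! label the two group dummies
  i ON dum2 (b);
  i ON dum1xS;
  i ON dum2xS;
MODEL TEST:
  ! Wald test of the joint hypothesis a = 0 and b = 0 (2 degrees of freedom)
  0 = a;
  0 = b;
```

The Wald test result is reported with the model fit information in the output.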

Jan Zirk posted on Wednesday, September 18, 2013 - 10:05 am

Does not work with Bayes.

Bengt O. Muthen posted on Wednesday, September 18, 2013 - 11:38 am

If you can define the joint effect as a NEW parameter in MODEL CONSTRAINT, Bayes gives you the estimate and SE.

Jan Zirk posted on Wednesday, September 18, 2013 - 11:59 am

Oh I see. Thanks for this. So I have specified the new parameters in this way:

i on S;
i on dum1 (a);
i on dum2 (b);
i on dum1xS (c);
i on dum2xS (d);

model constraint:
NEW (maineff inteff);
maineff = a+b;
inteff = c+d;

If I would like to standardize them, what would be the best way? I tried including variance for i in the model:
i (varI);

and now added 2 more parameters to the model constraint:
NEW (maineff inteff mainst intst);
maineff = a+b;
inteff = c+d;
mainst = maineff/sqrt(varI);
intst = inteff/sqrt(varI);

but unfortunately this standardization does not work well:

            est.     sd      p       95% CI
MAINEFF   -1.102   0.081   0.000   -1.259  -0.943
INTEFF    -0.180   0.113   0.060   -0.411   0.052
MAINst    -1.426   0.110   0.000   -1.656  -1.220
INTst     -0.233   0.147   0.060   -0.535   0.066

As can be seen above, MAINst is larger than 1 in absolute value. Is there a way to handle this problem of the too-large standardized estimate?

Bengt O. Muthen posted on Wednesday, September 18, 2013 - 1:55 pm

The "vari" that you use is only the residual variance of i, not the total variance since i is regressed on 4 covariates.

Jan Zirk posted on Wednesday, September 18, 2013 - 2:07 pm

Ah, that is right. Is it possible to obtain an estimate of the total variance when i is the dependent variable?

Bengt O. Muthen posted on Wednesday, September 18, 2013 - 3:37 pm

You get that in Tech4, but that is just the value. You want to express it in terms of model parameter estimates so you get the right SE for it. So a little cumbersome.

Jan Zirk posted on Wednesday, September 18, 2013 - 4:59 pm

As I understand, it is not then possible to standardise these additional parameters and get the appropriate SE?

Bengt O. Muthen posted on Thursday, September 19, 2013 - 9:16 am

No, it is possible, just cumbersome. You just have to express the variance of the i factor. For instance, if you have

i on x1 x2;

You would write in MODEL:

i on x1 (b1)
x2 (b2);
i (resvar);
x1 (x1var);
x2 (x2var);
x1 with x2 (cov);

and in MODEL CONSTRAINT:

NEW(iSD);

iSD = sqrt(resvar+b1*b1*x1var+b2*b2*x2var+2*b1*b2*cov);

and then you go on and standardize.
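
The last step could be sketched as follows. The total variance of i is the residual variance plus the variance explained by the covariates, resvar + b1^2*Var(x1) + b2^2*Var(x2) + 2*b1*b2*Cov(x1,x2), which is what iSD takes the square root of. The new parameter names b1std and b2stdy, and the assumption that x1 is continuous while x2 is a binary dummy, are added here for illustration only.

```mplus
MODEL:
  i ON x1 (b1)
       x2 (b2);
  i (resvar);            ! residual variance of i
  x1 (x1var);
  x2 (x2var);
  x1 WITH x2 (cov);
MODEL CONSTRAINT:
  NEW(iSD b1std b2stdy);
  ! model-implied total standard deviation of i
  iSD = SQRT(resvar + b1*b1*x1var + b2*b2*x2var + 2*b1*b2*cov);
  ! continuous covariate (assumed): standardize with respect to both x1 and i
  b1std = b1*SQRT(x1var)/iSD;
  ! binary dummy (assumed): standardize with respect to i only
  b2stdy = b2/iSD;
```

Because iSD is written in terms of labeled model parameters, the new parameters come with their own standard errors (or posterior SDs under Bayes). With more covariates, the expression needs one variance term per covariate and one covariance term per pair.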

Jan Zirk posted on Thursday, September 19, 2013 - 9:26 am

That is indeed cumbersome, but it is good that it is possible. Thank you for this.

Jonathan posted on Monday, December 23, 2013 - 8:14 am

I've run across something strange. I'm running a fairly large SEM with multiple latent variables and a number of dichotomous dummy variables as predictors of the latent variable. These dummy variables represent the person's occupation, so each respondent can only be "yes" for one.

I omitted a different dichotomous dummy variable, and I got a different chi-square value! In some cases, the difference in the chi-square is really big.

Two other things that are relevant:
1) I'm using the WLSMV estimator because I have a lot of categorical indicators.
2) I tried running a much simpler model in Mplus too; this was just a simple regression that went like
"income ON occ1 occ2 occ3..."
And omitted a different variable each time. Neither the R-Square nor the Chi-Square changed. I used the default ML estimator in that one.

Do you happen to have an idea why it would change in the first analysis but not the second? Thanks for your time in advance.

Tihomir Asparouhov posted on Monday, December 23, 2013 - 1:20 pm

If you add a binary predictor and a significant direct effect is not accounted for, the chi-square would increase. This will be particularly an issue if there are empty cells in the [indicator,covariate] bivariate table.

Use "OUTPUT:samp residual mod;" to determine where the misfit comes from. If the model is large adding one covariate at a time would make it easier to proceed.

You should also keep this in mind. The test of fit is a test between the structural model and the unrestricted multivariate probit regression model given in the samp output. As you change the variables in the model the unrestricted model changes. In principle that is nothing to worry about and it is just as in the ML estimation for the continuous case. In certain situations however this has implications for models with parameter constraints, such as equal parameters.

Jonathan posted on Monday, December 23, 2013 - 1:33 pm

Well, the issue isn't so much that the chi-square increases. The problem is that I have a series of dummy variables. Let's say I run a model with 5 predictor variables--all of them are dichotomous, one person can only have one occupation, and every person in the sample has an occupation.
x1: Manager
x2: Lawyer
x3: Administrative Assistant
x4: Short-Order Cook
x5: Engineer

I have to omit one because of multicollinearity issues, the same as in a standard regression. But if I omit Manager, the chi-square is substantially higher than if I omit Lawyer; in fact, there are five different chi-square values, and it changes depending on which variable is omitted.

I know that chi-square test for WLSMV isn't exactly the same as the ML, but I've never seen model fit statistics change depending on the omitted variable. Is there a way so that this won't happen, or a citation that I can use to say it is okay?

Thanks for all of your time on this.

Tihomir Asparouhov posted on Monday, December 23, 2013 - 7:08 pm

This is correct. The issue is specific to the WLSMV estimator. While the unrestricted model does not change, the unrestricted parameters that are fitted will change and also the weights for these parameters will change. If you use the ULSMV estimator you will get less variation in the chi-square because you will eliminate the variation in the weights, but the fit function will still be different as you change the covariates and with that the unrestricted parameters that you are using to construct a fit function of squared differences.

So again, the unrestricted H1 model is the same/equivalent, the restricted model you estimate is the same/equivalent. What changes is the estimator, because the fit function changes.

There is nothing to worry about in principle. There is no error in the estimation. The chi-square fit function will reach the same conclusion: reject or accept regardless of which covariates you choose, although the actual value can change. Typically however if the chi-square value changes a lot as you change the covariates this is an indicator of misfit of the structural model and omitted direct effects.

Note also that not only your chi-square changes but the actual structural model changes in non-equivalent ways (for example the loadings can change, not just the means/thresholds). In principle you can choose the "best" covariates to obtain the best fitting model, but you should still examine direct effect omissions as a follow-up step ... and if you do that you should approximately reach equivalent models even if you start with a different set of covariates.

A different/more elaborate approach is to use multiple groups modeling instead of the binary covariates.
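
The multiple-group alternative could be set up roughly as follows, with occupation as the grouping variable instead of a set of dummy covariates. The variable names (u1-u4, occ), the occupation codes, and the group labels are assumptions for illustration.

```mplus
VARIABLE:
  NAMES = u1-u4 occ;          ! hypothetical names; occ assumed coded 1-5
  USEVARIABLES = u1-u4;
  CATEGORICAL = u1-u4;
  GROUPING = occ (1 = manager 2 = lawyer 3 = admin 4 = cook 5 = engin);
ANALYSIS:
  ESTIMATOR = WLSMV;
MODEL:
  f BY u1-u4;
  ! group-specific statements, if needed, go in MODEL lawyer:, MODEL admin:, etc.
```

Group differences then show up as differences in factor means across groups rather than as regression slopes on dummies. Which measurement parameters are held equal across groups by default should be checked in the output before interpreting those means.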

As a reference you can use
http://pages.gseis.ucla.edu/faculty/muthen/articles/Article_075.pdf
but there is nothing specific there to this issue - it just explains how the estimator works.