The CATEGORICAL and NOMINAL options are for dependent variables only. They should not be used for covariates. Covariates can be binary or continuous as in regular regression and are treated as continuous in both cases.
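In input terms, this means CATEGORICAL appears only for variables on the dependent side; a minimal sketch (the variable names y1, x1, x2 are hypothetical):

```
VARIABLE:
  NAMES = y1 x1 x2;
  CATEGORICAL = y1;    ! dependent variable only
MODEL:
  y1 ON x1 x2;         ! covariates treated as continuous,
                       ! whether binary or not
```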
Jiyoung posted on Tuesday, June 23, 2009 - 8:30 am
Thank you for your answer. I have a few more questions.
All of my independent and dependent variables are measured on 7-point Likert-type scales, with the exception of a few nominal control variables.
I wonder what would be best. These are the options I am considering.
1) I can simply treat all of my independent and dependent variables as continuous. In social science, we often treat variables measured on Likert-type scales as interval variables.
If I choose this option, I should not use the CATEGORICAL or NOMINAL options for either the independent or the dependent variables. I would also have to use the ML estimator instead of the WLSM estimator.
2) The second option is to use the CATEGORICAL option for the dependent variables to define them as ordinal, but not for the independent variables. That is, I would treat the independent variables measured on 7-point Likert scales as continuous without defining them as ordinal, while defining the dependent variables as ordinal with the CATEGORICAL option. If I use this option, I have to use the WLSM estimator.
I think the first option is better. Would you please let me know which of the two is preferable, or whether I should take a different approach altogether?
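In Mplus syntax, the two options differ only in the VARIABLE and ANALYSIS commands; a rough sketch, with y1-y3 standing in for the Likert-scaled dependent variables and x1-x3 for the predictors (all names hypothetical):

```
! Option 1: all variables treated as continuous
ANALYSIS:
  ESTIMATOR = ML;
MODEL:
  y1-y3 ON x1-x3;

! Option 2: dependent variables declared ordinal
VARIABLE:
  CATEGORICAL = y1-y3;
ANALYSIS:
  ESTIMATOR = WLSM;
MODEL:
  y1-y3 ON x1-x3;
```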
Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.
Muthén, B., & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.
Jiyoung posted on Thursday, June 25, 2009 - 7:29 am
Hi - I've received some mixed advice about using nominal (3-category) indicators for a continuous latent construct. My question is: is it permissible or valid to use both nominal and continuous variables as indicators of a continuous latent construct that will then be used as a predictor of the intercept and slope in a growth curve model?
To use a nominal variable as a control variable, you need to create a set of dummy variables. In your case, it would be 11 dummy variables. The NOMINAL option is for dependent variables only.
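As a sketch, such dummies can be built in the DEFINE command (the variable occ and its category codes are hypothetical; with 12 occupation categories you would continue through dum11, leaving the 12th category as the reference):

```
VARIABLE:
  NAMES = y occ;
  USEVARIABLES = y dum1-dum11;   ! DEFINE-created variables go last
DEFINE:
  dum1 = 0;  IF (occ EQ 1) THEN dum1 = 1;
  dum2 = 0;  IF (occ EQ 2) THEN dum2 = 1;
  ! ... and so on through dum11
```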
Jan Zirk posted on Thursday, August 22, 2013 - 4:02 pm
Is facilitation of analyses with nominal independent variables planned, or on the 'wish list'? (So that creating the dummy variables would not be necessary; it can get extremely complex when there are several nominal predictors with more than 2 levels. Most packages for standard analyses, e.g. ANOVA, automatically create sets of dummies for nominal predictors.)
Jan Zirk posted on Friday, August 23, 2013 - 11:00 am
This would be a big advantage. Thank you for the prompt response.
Jan Zirk posted on Friday, August 23, 2013 - 11:04 am
P.S. Especially if the facility provided both the overall estimate of the main effect and of the interaction term, as in factorial ANOVAs with nominal IVs with more than 2 categories, as well as the more detailed insight into the effects of the individual dummy predictors.
Jan Zirk posted on Wednesday, September 18, 2013 - 6:09 am
I am testing an objective Bayesian regression model with an interaction term of a continuous predictor and 2 dummy variables (as there are 3 groups in the study). The model is defined in this way:
i on S; i on dum1; i on dum2; i on dum1xS; i on dum2xS;
How can I obtain estimates and significance for the joint effect of Group (i.e. of dum1 and dum2) and the joint effect of interaction terms (i.e. dum1xS and dum2xS)?
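One way to obtain such joint (Wald) tests in Mplus, at least with frequentist estimators, is to label the slopes and use MODEL TEST; since MODEL TEST allows only one joint test per run, the group test and the interaction test require two separate runs. A sketch (the labels b1-b4 are hypothetical):

```
MODEL:
  i ON S;
  i ON dum1 (b1);
  i ON dum2 (b2);
  i ON dum1xS (b3);
  i ON dum2xS (b4);
MODEL TEST:            ! joint Wald test of the group effect
  b1 = 0;
  b2 = 0;
! For the joint interaction test, rerun with
! MODEL TEST: b3 = 0; b4 = 0;
```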
Jan Zirk posted on Thursday, September 19, 2013 - 9:26 am
That is indeed cumbersome, but good that it is possible. Thank you for this.
Jonathan posted on Monday, December 23, 2013 - 8:14 am
I've run across something strange. I'm running a fairly large SEM with multiple latent variables and a number of dichotomous dummy variables as predictors of the latent variable. These dummy variables represent the person's occupation, so each respondent can only be "yes" for one.
Each time I omitted a different dichotomous dummy variable (as the reference category), I got a different chi-square value! In some cases, the difference in the chi-square is really big.
Two other things that are relevant: 1) I'm using the WLSMV estimator because I have a lot of categorical indicators. 2) I also tried running a much simpler model in Mplus; this was just a simple regression of the form "income ON occ1 occ2 occ3...", and I omitted a different variable each time. Neither the R-square nor the chi-square changed. I used the default ML estimator for that one.
Do you happen to have an idea why it would change in the first analysis but not the second? Thanks for your time in advance.
If you add a binary predictor and a significant direct effect is not accounted for, the chi-square will increase. This is particularly an issue if there are empty cells in the [indicator, covariate] bivariate table.
Use "OUTPUT: SAMPSTAT RESIDUAL MODINDICES;" to determine where the misfit comes from. If the model is large, adding one covariate at a time will make it easier to proceed.
You should also keep this in mind. The test of fit is a test of the structural model against the unrestricted multivariate probit regression model given in the SAMPSTAT output. As you change the variables in the model, the unrestricted model changes. In principle that is nothing to worry about; it is just as in ML estimation for the continuous case. In certain situations, however, this has implications for models with parameter constraints, such as equality constraints.
Jonathan posted on Monday, December 23, 2013 - 1:33 pm
Well, the issue isn't so much that the chi-square increases. The problem is that I have a series of dummy variables. Say I run a model with 5 predictor variables, all of them dichotomous; each person can have only one occupation, and every person in the sample has one:
x1: Manager
x2: Lawyer
x3: Administrative Assistant
x4: Short-Order Cook
x5: Engineer
I have to omit one because of multicollinearity issues, the same as in a standard regression. But if I omit Manager, the chi-square is substantially higher than if I omit Lawyer; in fact, there are five different chi-square values, and it changes depending on which variable is omitted.
I know that the chi-square test for WLSMV isn't exactly the same as for ML, but I've never seen model fit statistics change depending on which variable is omitted. Is there a way to prevent this, or a citation I can use to say it is okay?
This is correct. The issue is specific to the WLSMV estimator. While the unrestricted model does not change, the unrestricted parameters that are fitted will change, and the weights for these parameters will change as well. If you use the ULSMV estimator you will get less variation in the chi-square, because you eliminate the variation in the weights; but the fit function will still differ as you change the covariates, and with that the unrestricted parameters used to construct the fit function of squared differences.
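(Switching estimators is a one-line change in the ANALYSIS command:)

```
ANALYSIS:
  ESTIMATOR = ULSMV;
```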
So again, the unrestricted H1 model is the same/equivalent, the restricted model you estimate is the same/equivalent. What changes is the estimator, because the fit function changes.
There is nothing to worry about in principle; there is no error in the estimation. The chi-square test should reach the same conclusion, reject or accept, regardless of which covariate you omit, although the actual value can change. Typically, however, if the chi-square value changes a lot as you change the covariates, that is an indicator of misfit of the structural model and of omitted direct effects.
Note also that not only the chi-square changes but the actual structural model changes in non-equivalent ways (for example, the loadings can change, not just the means/thresholds). In principle you can choose the "best" covariates to obtain the best-fitting model, but you should still examine omitted direct effects as a follow-up step; and if you do that, you should arrive at approximately equivalent models even if you start with a different set of covariates.
A different, more elaborate approach is to use multiple-group modeling instead of the binary covariates.
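A sketch of that alternative, using the GROUPING option (the variable occ and the group labels are hypothetical):

```
VARIABLE:
  NAMES = y1-y5 occ;
  GROUPING = occ (1 = manager   2 = lawyer   3 = admin
                  4 = cook      5 = engineer);
```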