Binary predictors in WLSMV
 George Burruss posted on Saturday, May 02, 2009 - 1:46 pm
I have a model with the following variables:
F1 is a latent second-order factor;
F2 is a latent DV from three ordinal measures.
X1 is continuous.
X2 is binary.
X3 is binary.
X4 is binary.
X5 is binary, but it serves as the reference category for X3 & X4.

Model (I left out the measurement models for F1 & F2):
F2 ON F1 X1 X2 X3 X4;
F1 ON X1 X2 X3 X4;
X1 WITH X2 X3 X4;

My questions are: (1) Do I need to correlate the binary variables with X1? I've read that binary variables in WLSMV should only be regressed on predictors and not correlated. They are regressed directly on F2 and indirectly through F1, but I'm not sure what to do about the other exogenous variables. If I don't expect a causal effect, do I just leave the correlations out?

(2) If F2 is a latent factor composed of three ordered categorical variables, am I getting OLS regression coefficients, or am I getting probits, since the factor scale is set by the first observed ordinal variable?

Thank you.
 Linda K. Muthen posted on Sunday, May 03, 2009 - 9:57 am
The model is estimated conditioned on the observed exogenous variables x1, x2, x3, and x4. Their means, variances, and covariances are not parameters in the model. When you include the WITH statement shown above, you turn them into dependent variables and make distributional assumptions about them. I would not do this. Note that in regression, covariates can be binary or continuous and in both cases they are treated as continuous variables. If you want the means, variances, and covariances of x1, x2, x3, and x4, do a TYPE=BASIC;
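As a minimal sketch of what this advice would look like in the input file, the structural part of the model from the original post is shown below with the WITH statement dropped. The measurement models for F1 and F2 are still omitted, as in the original post, so this is a fragment rather than a complete input.

```
MODEL:
  ! Measurement models for F1 and F2 go here (omitted in the original post)
  F2 ON F1 X1 X2 X3 X4;
  F1 ON X1 X2 X3 X4;
  ! No "X1 WITH X2 X3 X4;" statement: the model is estimated conditional
  ! on X1-X4, so their means, variances, and covariances are not parameters
```

The descriptive statistics for the covariates would then come from a separate run, which needs no MODEL command:

```
ANALYSIS:
  TYPE = BASIC;   ! sample statistics only
```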

The scale of the dependent variable, not how the metric is set, determines the type of regression estimated. With ordered categorical factor indicators, the factor loadings are probit regression coefficients with weighted least squares estimation and logistic regression coefficients with maximum likelihood estimation and the default logit link.
 George Burruss posted on Sunday, May 03, 2009 - 4:59 pm
Thank you for clearing that up.

I see your point about making distributional assumptions. If I leave out the correlations among the exogenous variables, however, is the model not misspecified, since the correlations are not accounted for?

Do you recommend a white paper or part of the Mplus manual that specifically covers the point you made about correlating the exogenous variables using WITH?

I anticipate having to explain to reviewers why X1-X4 were not correlated.
 George Burruss posted on Sunday, May 03, 2009 - 5:18 pm
Sorry, I forgot one more question.

Just to make sure I understand you, your suggestion 'not to correlate the exogenous variables' refers just to the binary predictors? In other words, the concern about distribution assumptions has to do with the binary variables only and there is no problem with correlating continuous exogenous variables.

Thank you.
 Linda K. Muthen posted on Sunday, May 03, 2009 - 6:08 pm
Think about the following simple linear regression:

y = a + b1x1 + b2x2 + e

What are the parameters in the regression model? They are the intercept of y, the residual variance of y, and the two regression coefficients b1 and b2. The means and variances of x1 and x2 and the covariance between x1 and x2 are not parameters in the regression model. This does not mean these parameters are fixed at zero. It means that the model is estimated conditioned on x1 and x2. If you want to report the means and variances of x1 and x2 and the covariance between x1 and x2, use the simple descriptive statistics for these variables.
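One way to see why conditioning is different from fixing the covariance at zero is to factor the joint distribution of y and the covariates. The symbols below are introduced here for illustration and are not from the post: theta collects the regression parameters, psi collects the parameters of the covariate distribution, and the two sets are assumed to be distinct.

```latex
f(y, x_1, x_2 \mid \theta, \psi)
  = f(y \mid x_1, x_2; \theta)\, f(x_1, x_2; \psi),
\qquad \theta = (a,\ b_1,\ b_2,\ \sigma^2_e)
```

Under that assumption, estimating theta uses only the first factor. The means, variances, and covariance of x1 and x2 sit in psi, which the regression leaves unrestricted rather than setting to zero.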

Means, variances, and covariances of all observed exogenous variables, whether they are binary or continuous, should not be included in the MODEL command.