Mplus Discussion >> Categorical Data Modeling >> SEM with Dichotomous dependent variable: multivariate normality and power calculation

Anonymous posted on Sunday, June 08, 2003 - 2:49 pm

I am trying to run an SEM with a dichotomous dependent variable, a combination of observed predictors (dichotomous and continuous), and three continuous latent variables. The model is estimated with WLSMV. An initial model gave the following tests of model fit:
chi-square value of 37.7 (df=30), p = .16.
N = 137,
CFI = .92,
WRMR = .845.

I have two questions:

1. Do I have to inspect and report univariate and multivariate normality information for SEM with a dichotomous dependent variable? Is providing normality information as important as it is for SEM with continuous measured variables? Does Mplus v2.13 calculate Mardia's coefficient for examining multivariate normality? If not, is any alternative available?

2. What is the best way of doing (or calculating) a power analysis for a dichotomous dependent variable?

Linda K. Muthen posted on Sunday, June 08, 2003 - 5:09 pm

Do you have only one dependent variable? Are the dichotomous and continuous observed variables and the three latent variables independent (exogenous) variables?

Anonymous posted on Monday, June 09, 2003 - 11:46 am

Dear Linda,

I have one dichotomous observed dependent variable. All the other dichotomous and continuous observed variables are exogenous. The three latent variables, however, are endogenous, and they in turn serve as predictors of the dichotomous dependent variable. They were conceptualized as mediating factors between selected observed variables and the dichotomous dependent variable.

I hope this is sufficient information. Thanks!

Linda K. Muthen posted on Tuesday, June 10, 2003 - 9:04 am

In your model, the normality assumption is that, for the distal dichotomous outcome and for any binary or polytomous factor indicators of the mediating latent variables, the underlying continuous latent response variables (y*) are normal conditional on the exogenous x variables. I know of no way to test this conditional normality. Mardia's coefficient is for continuous outcomes.
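A minimal sketch of what "y* normal conditional on x" means for a single binary outcome, assuming one covariate x, a slope beta, a threshold tau, and a residual variance fixed at 1 (all illustrative; none of these values come from the thread). Under these assumptions P(y = 1 | x) = Phi(beta*x - tau). The normality being assumed is that of the unobserved y*, not of the observed 0/1 values, which is why a statistic such as Mardia's coefficient computed on observed variables does not speak to it. The function names are hypothetical.

```python
import math
import random


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_y1_given_x(x: float, beta: float, tau: float) -> float:
    """P(y = 1 | x) when y* = beta*x + eps, eps ~ N(0, 1), and y = 1 if y* > tau."""
    # P(y* > tau | x) = P(eps > tau - beta*x) = Phi(beta*x - tau)
    return normal_cdf(beta * x - tau)


def simulated_prob_y1(x: float, beta: float, tau: float, n: int, seed: int = 1) -> float:
    """Monte Carlo check: draw y* from its conditional normal given x, then dichotomize at tau."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if beta * x + rng.gauss(0.0, 1.0) > tau)
    return hits / n


if __name__ == "__main__":
    print(prob_y1_given_x(1.0, 0.8, 0.3))             # analytic probability
    print(simulated_prob_y1(1.0, 0.8, 0.3, 100_000))  # should be close to the analytic value
```

The simulation only confirms the algebra; it is not a test of the assumption on real data, where y* is never observed.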

Calculating power for categorical outcomes will be easy with Version 3 of Mplus. In Version 2, it would be very tedious. You would need to generate data outside of Mplus and analyze the data sets using the RUNALL utility. You would then save all outputs and, for the parameter of interest, count the proportion of replications in which the ratio of the parameter estimate to its standard error exceeds 1.96 in absolute value; that proportion is the estimated power at the 5% level.
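The tallying step described above can be sketched in Python. This is not Mplus: a plain logistic regression of a binary y on one standard-normal x stands in for the full SEM, data generation and fitting happen in one script instead of an external generator plus RUNALL, and the helper names (`fit_slope`, `mc_power`) and all parameter values are hypothetical. Replications whose Newton-Raphson fit fails are dropped from the denominator, a choice you may want to revisit.

```python
import math
import random


def logistic(t: float) -> float:
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_slope(xs, ys, max_iter=50, tol=1e-8):
    """Fit logit P(y = 1) = b0 + b1*x by Newton-Raphson.

    Returns (b1_hat, se_b1), or None if the fit fails (e.g. separation).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # score for b0
            g1 += (y - p) * x    # score for b1
            i00 += w             # information matrix entries
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        if det <= 0.0:
            return None
        # Newton step b <- b + I^{-1} g, with the 2x2 inverse written out
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            # SE of b1 from the inverse information at (essentially) the converged estimate
            return b1, math.sqrt(i00 / det)
    return None


def mc_power(n, b0, b1, reps, seed=2003, crit=1.96):
    """Share of converged replications with |b1_hat / se| > crit."""
    rng = random.Random(seed)
    significant = converged = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [1 if rng.random() < logistic(b0 + b1 * x) else 0 for x in xs]
        fit = fit_slope(xs, ys)
        if fit is None:
            continue
        est, se = fit
        converged += 1
        if abs(est / se) > crit:
            significant += 1
    return significant / converged if converged else float("nan")
```

For example, `mc_power(137, -0.5, 0.4, 500)` estimates power for a slope of 0.4 at N = 137; raising `reps` narrows the Monte Carlo error of the estimate. With a true slope of 0 the same function estimates the Type I error rate, which should sit near 0.05.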

Anonymous posted on Tuesday, June 10, 2003 - 2:13 pm

Dear Linda,

Thanks for your reply. A couple of follow-up questions on your response.

1. Since there is no way to test normality in my model, is there a reference I can cite in my paper to that effect?

2. Could you elaborate a little more on power calculation? I understand that Mplus Version 3 comes out in the fall, but I would appreciate it if you could explain the process of power calculation in more detail so that I can do the calculation in Version 2. Are there any references on the topic as well?

Thanks again in advance!!

Linda K. Muthen posted on Wednesday, June 11, 2003 - 8:30 am

1. See Muthén, B. (1993). Goodness of fit with categorical and other non-normal variables. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 205-243). Newbury Park, CA: Sage.

2. See Muthén, L.K. and Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620. This paper is for continuous outcomes, but the same strategy applies.

Anonymous posted on Wednesday, June 11, 2003 - 9:28 pm

Thank you so much for the references and your help.

Anonymous posted on Sunday, October 24, 2004 - 10:28 am

I have a question. I have 5 dichotomous observed variables that I would like to use as indicators of a latent variable. I know this isn't suggested, but how can I make it work? My literature review confirmed that these variables are appropriate measures of my latent variable. Can I create a scale of some sort? What can I do, and how can I do it?

Linda K. Muthen posted on Monday, October 25, 2004 - 8:59 am

Factor analysis of binary indicators can be done in Mplus using either a weighted least squares or maximum likelihood estimator. I'm not sure what you mean by "isn't suggested."

Anonymous posted on Monday, October 25, 2004 - 12:21 pm

Doesn't Bollen argue against using a dichotomous variable as an indicator?

Linda K. Muthen posted on Monday, October 25, 2004 - 1:49 pm

I am not aware of such an argument. It would seem strange because item response theory (IRT) is the same as factor analysis of binary items and it has been accepted for decades.
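To illustrate the equivalence, here is a sketch converting a probit factor loading and threshold for a binary item into normal-ogive IRT discrimination (a) and difficulty (b) parameters. It assumes the factor has variance 1 and each latent response y* has variance 1, so the residual variance is 1 - lambda^2; the function names are hypothetical. Logistic IRT models differ by a scaling constant (about 1.7) and are not covered here.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fa_to_irt(loading: float, threshold: float):
    """Convert a standardized probit loading/threshold to normal-ogive IRT (a, b).

    Assumes Var(eta) = 1 and Var(y*) = 1, so the residual variance is 1 - loading**2.
    """
    if not 0.0 < abs(loading) < 1.0:
        raise ValueError("standardized loading must satisfy 0 < |loading| < 1")
    a = loading / math.sqrt(1.0 - loading ** 2)  # discrimination
    b = threshold / loading                      # difficulty
    return a, b


def p_fa(eta: float, loading: float, threshold: float) -> float:
    """Factor-analysis form: P(y* > threshold | eta) with y* = loading*eta + eps."""
    return normal_cdf((loading * eta - threshold) / math.sqrt(1.0 - loading ** 2))


def p_irt(eta: float, a: float, b: float) -> float:
    """Normal-ogive IRT form: P(y = 1 | eta) = Phi(a * (eta - b))."""
    return normal_cdf(a * (eta - b))
```

The two probability functions agree for every eta because a*(eta - b) simplifies to (loading*eta - threshold) / sqrt(1 - loading^2).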

Anonymous posted on Thursday, June 02, 2005 - 9:38 am

Hello! I am attempting to compute multivariate normality statistics for my data (three proposed latent factors, 1 continuous and 2 possibly categorical). When I request TECH13, I only get statistics for my continuous variable, so I changed all of my variables to continuous. When I did this, the analysis ran normally, but at the point in the output that reads "Technical Output 13," there is a blank space before "Technical Output 8" with no information. Any ideas what might be going on? Thanks!

Linda K. Muthen posted on Thursday, June 02, 2005 - 11:35 am

You will need to send your output, data, and license number to support@statmodel.com. This type of question is best answered via Mplus support.

jackson de carvalho posted on Wednesday, October 11, 2006 - 10:43 am

Dear Linda,

My model has four latent variables, each with its respective constructs (3-4 constructs for each latent variable), and one dichotomous observed dependent variable.

My dissertation committee is rejecting my completed analysis, saying that I cannot use a dichotomous observed dependent variable in an SEM model and that all SEM variables need to be continuously measured. What do you say about that?

If such a variable can be used, can you point me to publications that substantiate this?

Thank you.

Linda K. Muthen posted on Wednesday, October 11, 2006 - 10:58 am

In Mplus, an SEM model can have variables measured on many different scales among them categorical. EQS and LISREL can also analyze categorical dependent variables as part of an SEM model.

jackson de carvalho posted on Wednesday, October 11, 2006 - 11:04 am

Dr. Muthen,

My model has four latent variables, each with its respective constructs (3-4 constructs for each latent variable), and one dichotomous observed dependent variable.

My dissertation committee is rejecting my completed analysis, saying that I cannot use a dichotomous observed dependent variable in an SEM model and that all SEM variables need to be continuously measured. What do you say about that?
Any help and references to current literature on the subject would be greatly appreciated.

jackson de carvalho posted on Wednesday, October 11, 2006 - 1:05 pm

Dr. Muthen,

Thank you for the response to my previous question. However, can an SEM model with a dichotomous observed dependent variable be estimated with AMOS?

Linda K. Muthen posted on Wednesday, October 11, 2006 - 2:28 pm

The last I heard AMOS could not estimate models with categorical outcomes. To be certain, however, you should ask AMOS technical support. Things may have changed.

Lyle F. Bachman posted on Wednesday, May 28, 2008 - 12:33 pm

How robust is Mardia's coefficient to curvilinearity among observed variables?

Linda K. Muthen posted on Wednesday, May 28, 2008 - 1:35 pm

I don't know.

Lyle F. Bachman posted on Thursday, May 29, 2008 - 7:45 am

Do you know who might know? Thanks.

Linda K. Muthen posted on Thursday, May 29, 2008 - 8:50 am

No I don't.

Emily Blood posted on Monday, September 15, 2008 - 11:31 am

Do you have a technical appendix or reference that gives the details of the MLR estimation when a model with a logit link and binary outcome is used? If so, I would really appreciate it.
Thanks!

Linda K. Muthen posted on Monday, September 15, 2008 - 11:37 am

Muthén, B. & Asparouhov, T. (2008). Growth mixture modeling: Analysis with non-Gaussian random effects. In Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G. (eds.), Longitudinal Data Analysis, pp. 143-165. Boca Raton: Chapman & Hall/CRC Press.

This paper is more general than what you are asking about because it includes mixtures. You can download it from the website.

Christoph Ihl posted on Tuesday, April 07, 2009 - 12:28 am

Dear Linda and Bengt,

in a two-level model, I have an overall binary dependent variable. On the within level, I have some nominal indicators.
On between level, I have four latent factors (measured by continuous indicators) that (partially) mediate the effect of some other observed indicators.

The model seems to work fine, I just wonder about the number of integration dimensions. From my understanding, it should be five - one for the random effect and four for the latent variables. However, it says there is only one dimension. What point am I missing?

Thank you very much, Christoph

Bengt O. Muthen posted on Tuesday, April 07, 2009 - 6:17 pm

The latent variables on Between do not need numerical integration.

Christoph Ihl posted on Thursday, April 09, 2009 - 5:15 am

Could you please further explain why the latent variables on Between do not need numerical integration.

In my application, the latent variables are traits that are stable over choices made by individuals in four situations that are described by experimental variables on WITHIN. So, I cannot specify the traits on WITHIN because they have no within variance (or can I?).

On the other hand, these latent traits affect the binary choice probabilities. As the traits are not known exactly, only their distributions, shouldn't the likelihood function be integrated over the distribution of the latent traits? Can I "force" Mplus to do so, or does integration over the latent traits make no sense at all in this case?

Bengt O. Muthen posted on Thursday, April 09, 2009 - 9:46 am

Having latent variables on Between that have continuous Between-level indicators is just like in a regular single-level SEM. In the EM algorithm, the posterior distribution for the factors is explicit (normal) and therefore does not need numerical integration. If the indicators had been categorical say, it would have needed it.

Integration over the factors is already taken into account when you consider the distribution of the between-level indicators in the sense that a normal factor plus normal residuals produce normal indicators that have variation due to both factors and residuals.
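The "explicit (normal) posterior" point can be written out for the simplest case: one factor, continuous indicators, and uncorrelated normal residuals. The sketch below uses the standard normal-theory formulas (posterior precision = 1/psi + sum of lambda_k^2/theta_k); it is an illustration under those assumptions, not the computation Mplus performs internally, and the names are hypothetical. `implied_cov` shows the second half of the answer: a normal factor plus normal residuals gives normally distributed indicators with covariance psi*lambda*lambda' + Theta.

```python
def factor_posterior(y, nu, lam, theta, alpha=0.0, psi=1.0):
    """Posterior mean and variance of a single factor eta given continuous indicators y.

    Model: y_k = nu_k + lam_k * eta + e_k, eta ~ N(alpha, psi),
    e_k ~ N(0, theta_k), residuals independent of each other and of eta.
    """
    precision = 1.0 / psi + sum(l * l / t for l, t in zip(lam, theta))
    var = 1.0 / precision
    mean = var * (alpha / psi + sum(l * (yk - n) / t for yk, n, l, t in zip(y, nu, lam, theta)))
    return mean, var


def implied_cov(lam, theta, psi=1.0):
    """Marginal covariance of the indicators: psi * lam lam' + diag(theta)."""
    p = len(lam)
    return [[psi * lam[i] * lam[j] + (theta[i] if i == j else 0.0) for j in range(p)]
            for i in range(p)]
```

Because both quantities come out in closed form, no numerical integration over the factor is needed in this continuous-indicator case.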

Christoph Ihl posted on Thursday, April 09, 2009 - 3:55 pm

I really hope not to be too demanding on this issue, but please correct me when I summarize my take away:

When my overall dependent variable is observed binary or nominal, I do not need numerical integration for continuous latent predictors that are measured by continuous indicators and directly affect the DV, neither in regular models nor on the between level of two-level models.

But I do need integration for within-level latent predictors (if so, why?).

I am asking because in one strand of the discrete choice literature it is argued that when latent predictors are included in, let's say, a logistic regression, the likelihood function needs to be integrated over the normal residuals implied by the conditional distribution of the latent variables given their observed antecedents. (Continuous) indicators for the latent variables are introduced through factor-analytic measurement models to support empirical identification of the latent effects on choice.

Thank you, Christoph

Bengt O. Muthen posted on Thursday, April 09, 2009 - 6:20 pm

If you have a continuous factor predicting a categorical DV you will need numerical integration, even in a single-level model. This is because the DV is like just another factor indicator, so you fall into the case of continuous latent, categorical indicator - here the posterior of the latent is not explicit (such as normal).

My answer was for the case of a between-level factor measured by continuous between-level indicators.
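A sketch of why a categorical DV changes things: the marginal probability P(u = 1), the integral of F(a + b*eta) * phi(eta) over eta, has to be computed numerically when F is the logistic function. With a probit link and a single outcome there happens to be a closed form, Phi(a / sqrt(1 + b^2)), which makes a handy correctness check; with several categorical outcomes sharing the factor that shortcut generally disappears as well. The fixed-grid rule, the names, and the parameter values are illustrative assumptions; Mplus has its own integration settings, which this does not reproduce.

```python
import math


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic(t: float) -> float:
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def marginal_prob(link, a, b, n_points=161, lo=-8.0, hi=8.0):
    """P(u = 1) = integral of link(a + b*eta) * phi(eta) d eta, with eta ~ N(0, 1).

    Equally spaced (rectangular) quadrature over [lo, hi].
    """
    step = (hi - lo) / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        eta = lo + k * step
        total += link(a + b * eta) * normal_pdf(eta) * step
    return total
```

For example, `marginal_prob(logistic, 0.3, 1.2)` gives the logit-link marginal probability, for which no closed form is available, while `marginal_prob(normal_cdf, 0.3, 1.2)` can be compared with the probit formula above.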

Christoph Ihl posted on Thursday, April 09, 2009 - 11:35 pm

OK, I see. There needs to be integration when continuous factors predict a categorical DV. In the two-level model below, where my question originated, however, there is only one integration dimension on the between level, for the random effect, when I run it.
I do not need to specify "factor BY choice ..." to explicitly treat choice as indicator of the factor, do I?

VARIABLES: ...CATEGORICAL = choice;
WITHIN = genres access exchange forum price1 price2 price3;
BETWEEN = sex age int_pur bill use_freq multi_imp pf1 pf2 ea1 ea2 en1 en2 en3 en4 sv1 sv2 sv3 sv4;
CLUSTER = id;
ANALYSIS: TYPE = TWOLEVEL;
ESTIMATOR = MLR;
ALGORITHM = INTEGRATION;
MODEL: %WITHIN%
choice ON genres access exchange forum price1(p1)price2(p2)price3(p3);
%BETWEEN%
prefit BY pf1@1 pf2;
ease BY ea1@1 ea2;
enjoy BY en1@1 en2 en3 en4;
socval BY sv1@1 sv2 sv3 sv4;
prefit ON sex age int_pur bill use_freq multi_imp;
ease ON sex age int_pur bill use_freq multi_imp;
enjoy ON sex age int_pur bill use_freq multi_imp;
socval ON sex age int_pur bill use_freq multi_imp;
choice ON prefit ease enjoy socval;

Bengt O. Muthen posted on Friday, April 10, 2009 - 1:54 pm

You have one dimension of integration on Between because of the "choice" variable, which on Between is the random intercept of the binary or ordinal variable "choice" on Within.

If your "choice" variable is polytomous, unordered (i.e. nominal) you should declare it as nominal and in that case you would have C-1 intercepts on Between for a C-category nominal.
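Why C - 1: in a multinomial logit, adding the same constant to every category's logit leaves the probabilities unchanged, so one category's intercept must be fixed (here at 0) for identification. A minimal sketch, assuming the last category is the reference (my understanding of the Mplus convention, but check the User's Guide); `nominal_probs` is a hypothetical name.

```python
import math


def nominal_probs(intercepts):
    """Multinomial-logit probabilities for C categories from C - 1 free intercepts.

    The last category is the reference: its intercept is fixed at 0 for identification.
    """
    logits = list(intercepts) + [0.0]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For a 3-category nominal "choice", two intercepts are estimated; on Between each of those two becomes a random intercept.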

Christoph Ihl posted on Saturday, April 11, 2009 - 1:29 am

My DV is binary.

But shouldn't there be four more integration dimensions on the between level in addition to the one for the random intercept? As you said, there needs to be integration when continuous latent factors predict a categorical DV.

When I run the model, there is only one integration dimension...

Bengt O. Muthen posted on Saturday, April 11, 2009 - 8:08 am

No, there should not be four more integration dimensions on the Between level because the Between-level factors do not predict a categorical DV on the Between level but a continuous DV, namely the random intercept of "choice".
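A small numerical sketch of this argument, under simplifying assumptions: a logit link on Within, a single Between factor f ~ N(0, psi) whose effect on the random intercept is gamma, a residual u ~ N(0, theta), and no Between-level indicators or covariates. Because gamma*f + u is itself normal with variance gamma^2*psi + theta, integrating over f and u separately (two dimensions) gives the same cluster likelihood as integrating over the random intercept alone (one dimension). All names and values are hypothetical; with continuous Between-level indicators the factor posterior stays normal, so the same collapse should carry over, but this code does not model them.

```python
import math


def logistic(t: float) -> float:
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def normal_density(z: float, var: float) -> float:
    return math.exp(-z * z / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def grid(sd: float, n_points: int = 121, width: float = 8.0):
    """Equally spaced nodes over +/- width*sd, each paired with the step length as weight."""
    step = 2.0 * width * sd / (n_points - 1)
    return [(-width * sd + k * step, step) for k in range(n_points)]


def lik_given_intercept(b, xs, ys, beta):
    """Within-cluster likelihood of the binary choices given the random intercept b."""
    lik = 1.0
    for x, y in zip(xs, ys):
        p = logistic(b + beta * x)
        lik *= p if y == 1 else 1.0 - p
    return lik


def cluster_lik_1d(xs, ys, beta, gamma, psi, theta):
    """Integrate over the random intercept b = gamma*f + u ~ N(0, gamma^2*psi + theta)."""
    var_b = gamma * gamma * psi + theta
    return sum(lik_given_intercept(b, xs, ys, beta) * normal_density(b, var_b) * w
               for b, w in grid(math.sqrt(var_b)))


def cluster_lik_2d(xs, ys, beta, gamma, psi, theta):
    """Same likelihood, integrating separately over the factor f and the residual u."""
    total = 0.0
    for f, wf in grid(math.sqrt(psi)):
        df = normal_density(f, psi) * wf
        for u, wu in grid(math.sqrt(theta)):
            total += (lik_given_intercept(gamma * f + u, xs, ys, beta)
                      * df * normal_density(u, theta) * wu)
    return total
```

The two functions should agree to many decimal places; the one-dimensional version is far cheaper, which is the practical reason a model only needs the dimensions its specification actually requires.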

Christoph Ihl posted on Saturday, April 11, 2009 - 9:05 am

Ah, I see.

Is it possible to have Mplus use more dimensions of integration for the latent variables (to get closer to what the discrete choice literature I am referring to recommends), without giving up the two-level model with a random intercept that accounts for the repeated observations?

When I include the latent variables on WITHIN they are considered in the integration procedure. Does this make sense although they do not have WITHIN variance?

Bengt O. Muthen posted on Saturday, April 11, 2009 - 10:15 am

You should distinguish between model features and algorithmic features. Algorithmic features are details of the estimator choice, in this case ML, which follow from a given model, not the other way around.

Mplus uses the dimensions of integration that your model input specifications require. So it is up to your model specification - not the numerical integration - to match whatever modeling literature that you consider.

If you give me a reference I might find time to be able to tell you what the model specifications should be that you are interested in.

Christoph Ihl posted on Saturday, April 11, 2009 - 11:49 am

An exemplary reference can be found here:
http://people.bu.edu/joanw/JW_CHLAT.pdf.

With the model specification above, can I assume that all measurement errors in the latent factors predicting choice are captured in one random intercept of unobserved heterogeneity? Is that why the specification above does not require integrating over each latent predictor?

So if I had only one observation per individual, integration over each latent factor would be required?

david bodoff posted on Sunday, November 11, 2012 - 1:51 am

Dear Dr. Muthen

Our paper under review is being held up because our SEM model contains a binary dependent variable. That variable is not latent, i.e., it has no other indicators. Reviewers write "sem is not appropriate for binary outcomes". The model is an SEM network with some exogenous variables and 5-6 latent endogenous variables, including that one binary variable. The binary variable's predictors are all latent, endogenous variables.

Can I please ask these questions (as we consider switching to Mplus)?

1. Is it true that Mplus can handle this? Is any special handling needed?

2. Does it matter whether we "tell" Mplus that the variable is directly observed, or whether we tell Mplus that it is latent but has only the one binary observed indicator?

3. We had previously been using PLS. Our idea was to use PLS on the whole network, EXCEPT that we would separately run the one regression with the binary outcome as a logistic regression that uses the principal components (i.e., output of the first step of PLS) as the predictors.

Do you have any comment on the merit of using Mplus, compared with what we had planned?

4. Assuming we use Mplus, how do you recommend we respond to reviewers who state (AND CITE PUBLISHED PAPERS) that "sem assumes continuous variables and so is not appropriate in a model that contains a binary outcome"?

Thank you very much for your guidance

Linda K. Muthen posted on Sunday, November 11, 2012 - 10:18 am

1. Yes. Put the variable on the CATEGORICAL list.

2. No. These treatments are identical.

3. No comment.

4. See the following paper which is available on the website:

Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115-132.

Pamela Medina posted on Tuesday, June 16, 2015 - 7:23 am

Hello Dr. Muthen,

I have completed an SEM Model with a latent endogenous variable consisting of 7 binary indicators. When I ran the CFA (using WLSMV), the model fit statistics were not ideal (RMSEA .077, CFI .804, and TLI .715) even after using the modification indices to adjust the error terms. My sample size is 1,895.

Are the model fit statistics affected by the fact that all of the indicators are binary? Is it still appropriate for me to run the full SEM model even though my endogenous variable had a less than ideal fit?

Thank you in advance for your guidance.

Bengt O. Muthen posted on Tuesday, June 16, 2015 - 1:26 pm

Q1. The fit statistics are valid for binary indicators, that is, they are not worse because your indicators are binary.

Q2. No, I think you should first improve your 7-indicator model part.
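Bengt's first point can be seen from how the usual indices are built: they are functions of the model and baseline (independence) chi-square values, so the formulas are the same whether the chi-square comes from ML with continuous outcomes or from WLSMV with binary ones. A sketch of the textbook formulas; Mplus's exact conventions (for example N versus N - 1 in RMSEA, or multiple-group adjustments) may differ, and the function names are hypothetical.

```python
import math


def rmsea(chi2: float, df: float, n: int) -> float:
    """RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1))); some programs use n instead of n - 1."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: float, chi2_b: float, df_b: float) -> float:
    """Comparative fit index of the model (m) relative to the baseline model (b)."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_b - df_b, 0.0)
    return 1.0 - num / den if den > 0 else 1.0


def tli(chi2_m: float, df_m: float, chi2_b: float, df_b: float) -> float:
    """Tucker-Lewis index (non-normed fit index)."""
    return (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1.0)
```

With the chi-square, df, and N from the first post in this thread (37.7, 30, 137), `rmsea(37.7, 30, 137)` gives about 0.043 under this n - 1 formula. CFI and TLI additionally need the baseline chi-square, which the program reports alongside the model chi-square.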

Tyler Mason posted on Thursday, September 24, 2015 - 11:49 am

I am running a structural equation model with a binary dependent variable, so I am not able to get model fit indices. Would it be appropriate to run the model treating the dependent variable as continuous to get model fit indices and then, once the model fit is deemed sufficient, run the model with the DV as categorical and report the estimates from that model?

Linda K. Muthen posted on Thursday, September 24, 2015 - 5:20 pm

If you don't get model fit indices, you must be using maximum likelihood estimation where no absolute fit statistics are available with categorical outcomes. It would not be appropriate to report fit indices when the variables are treated as continuous. The WLSMV estimator does give fit statistics for your case.

Tyler Mason posted on Friday, September 25, 2015 - 7:15 am

Thanks. I tried the WLSMV estimator, and the model did not converge even after I increased the number of iterations. It says that a variable that is continuous may be dichotomous (it is actually continuous).

"
WARNING: VARIABLE STRESS MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.

NO CONVERGENCE. NUMBER OF ITERATIONS EXCEEDED."

Bengt O. Muthen posted on Friday, September 25, 2015 - 8:01 am

Please send output, data, and license number to Support.

Marcel Paulssen posted on Thursday, October 01, 2015 - 8:51 am

Dear Bengt, dear Linda,

you mentioned in one of your posts the following paper:

Muthén, B., du Toit, S.H.C. & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Accepted for publication in Psychometrika.

Unfortunately, I could not find it. Can you recommend other papers to justify WLSMV for binary dependent variables?

Thank you very much in advance!

Bengt O. Muthen posted on Thursday, October 01, 2015 - 6:41 pm

That paper is on our website under Papers, SEM.

There are also more applied papers that you find on our site.

Pankaj Singhi posted on Tuesday, October 10, 2017 - 11:59 pm

I want to understand how SEM works differently when variables are measured on different scales. Say that in one study all the variables are continuous, in another all of them are binary, and in another all of them are categorical. How does SEM differ across these cases? Does this change in variable scale also mean that we need to look at different fit indices?
Kindly help

Bengt O. Muthen posted on Wednesday, October 11, 2017 - 2:24 pm

See our Topic 2 video and handout on our short course page.