Anonymous posted on Sunday, June 08, 2003 - 2:49 pm
I am trying to run an SEM with a dichotomous dependent variable, a combination of observed variables (dichotomous and continuous), and three continuous latent variables. The model is estimated with WLSMV. An initial model produced the following fit statistics: chi-square = 37.7 (df = 30), p = .16, N = 137, CFI = .92, WRMR = .845.
I have two questions:
1. Do I have to inspect and report univariate and multivariate normality information for an SEM with a dichotomous dependent variable? Is providing normality information as important as it is for an SEM with continuous measured variables? Does Mplus v2.13 calculate Mardia's coefficient for examining multivariate normality? If not, is there an alternative?
2. What is the best way of doing (or calculating) "power" analysis for a dichotomous dependent variable?
Do you have only one dependent variable? Are the dichotomous and continuous observed variables and the three latent variables independent (exogenous) variables?
Anonymous posted on Monday, June 09, 2003 - 11:46 am
I have one dichotomous dependent observed variable. All the other dichotomous and continuous observed variables are exogenous. However, the three latent variables are endogenous, and they in turn serve as predictors of the one dichotomous dependent variable. They were conceptualized as mediators between selected observed variables and the dichotomous dependent observed variable.
In your model, the normality assumption applies to the distal dichotomous outcome and to any binary or polytomous factor indicators of the mediating latent variables: their underlying y* variables are assumed to be normal conditional on the exogenous x variables. I know of no way to test this conditional normality. Mardia's coefficient is for continuous outcomes.
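A minimal Python sketch of the latent-response (y*) formulation described above, assuming a single exogenous x, a probit-style residual with variance fixed at 1, and hypothetical values for the slope beta and threshold tau. The binary outcome u is produced by cutting a conditionally normal y* at a threshold, so that P(u = 1 | x) = Phi(beta*x - tau). Only u is observed, so the normality of y* is an assumption of the model rather than something visible in the data. The function names are hypothetical and are not part of Mplus.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_binary(x, beta, tau, n, seed=0):
    """Draw n values of y* = beta*x + e with e ~ N(0, 1); record u = 1 when y* > tau."""
    rng = random.Random(seed)
    return [1 if beta * x + rng.gauss(0.0, 1.0) > tau else 0 for _ in range(n)]


def implied_prob(x, beta, tau):
    """Model-implied P(u = 1 | x) = P(e > tau - beta*x) = Phi(beta*x - tau)."""
    return std_normal_cdf(beta * x - tau)


if __name__ == "__main__":
    # Hypothetical values: x = 1.0, beta = 0.8, tau = 0.3
    u = simulate_binary(1.0, 0.8, 0.3, n=20000, seed=42)
    print(sum(u) / len(u), implied_prob(1.0, 0.8, 0.3))
```

The simulated proportion of ones should sit close to the implied probability; the same y* logic applies to each binary or ordinal factor indicator.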
Calculating power for categorical outcomes will be easy with Version 3 of Mplus. In Version 2, it would be very tedious. You would need to generate data outside of Mplus and analyze it using the RUNALL utility. You would need to save all outputs and then count, for the parameter of interest, how many ratios of the parameter estimate to its standard error exceed 1.96 in absolute value. The proportion of replications in which this happens is the estimated power.
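A hypothetical Python sketch of the counting step described above. It is not the Mplus Monte Carlo facility or the RUNALL utility: it generates data, fits each replication, and reports the share of replications in which |estimate / SE| exceeds 1.96. Assumptions (none of them from the thread): one standard-normal predictor, a logistic rather than probit link, a slope of 0.5, N = 137 (the sample size from the first post), and 200 replications. A real study would generate data from the full hypothesized model.

```python
import math
import random


def _score_and_information(xs, ys, b0, b1):
    """Score vector and 2x2 information matrix for logit P(y=1) = b0 + b1*x."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def fit_logit(xs, ys, iters=25):
    """Newton-Raphson ML fit; returns the slope estimate and its standard error."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1, h00, h01, h11 = _score_and_information(xs, ys, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += inverse(information) @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _score_and_information(xs, ys, b0, b1)
    det = h00 * h11 - h01 * h01
    # Var(b1) is the (2, 2) element of the inverse information matrix
    return b1, math.sqrt(h00 / det)


def monte_carlo_power(n=137, b0=0.0, b1=0.5, reps=200, seed=1):
    """Share of replications in which |slope estimate / SE| exceeds 1.96."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(b0 + b1 * x))) else 0
              for x in xs]
        est, se = fit_logit(xs, ys)
        if abs(est / se) > 1.96:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(monte_carlo_power())
```

Running the same loop with b1 = 0 estimates the Type I error rate, which should come out near .05; that is a useful check before trusting the power estimate.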
Anonymous posted on Tuesday, June 10, 2003 - 2:13 pm
Thanks for your reply. A couple of follow-up questions on your response.
1. Since there is no way to test normality in my model, is there a reference I can cite in my paper to that effect?
2. Could you elaborate a little more on the power calculation? I understand that Mplus Version 3 comes out in the fall, but I'd appreciate it if you could explain the process of power calculation in more detail so that I can do the calculation in Version 2. Are there any references on the topic as well?
1. See Muthén, B. (1993). Goodness of fit with categorical and other non-normal variables. In K. A. Bollen, & J. S. Long (Eds.), Testing Structural Equation Models (pp. 205-243). Newbury Park, CA: Sage.
2. See Muthén, L.K. and Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620. This paper is for continuous outcomes, but the same strategy applies.
Anonymous posted on Wednesday, June 11, 2003 - 9:28 pm
Thank you so much for the references and your help.
Anonymous posted on Sunday, October 24, 2004 - 10:28 am
I have a question. I have five dichotomous observed variables that I would like to use as indicators of a latent variable. I know this isn't recommended, but how can I make it work? My literature review confirmed that these variables are appropriate measures of my latent variable. Can I create a scale of some sort? What can I do, and how can I do it?
I am not aware of such an argument. It would seem strange because item response theory (IRT) is the same as factor analysis of binary items and it has been accepted for decades.
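To illustrate the equivalence mentioned above, here is a small Python sketch (function names hypothetical) showing that a probit factor model for a binary item, with y* = lambda*eta + e, e ~ N(0, theta) and u = 1 when y* > tau, gives the same item response function as a normal-ogive IRT model with discrimination a = lambda / sqrt(theta) and difficulty b = tau / lambda. It assumes the normal-ogive (probit) form; the logistic IRT models in common use approximate it when the logit is scaled by about 1.7.

```python
import math


def std_normal_cdf(z):
    """Phi(z) computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fa_to_irt(loading, threshold, resid_var):
    """Map probit factor-analysis parameters of a binary item to normal-ogive IRT (a, b)."""
    a = loading / math.sqrt(resid_var)  # discrimination
    b = threshold / loading             # difficulty (location on the factor scale)
    return a, b


def p_factor_model(eta, loading, threshold, resid_var):
    """P(u = 1 | eta) when u = 1 iff loading*eta + e > threshold, e ~ N(0, resid_var)."""
    return std_normal_cdf((loading * eta - threshold) / math.sqrt(resid_var))


def p_irt(eta, a, b):
    """Normal-ogive item response function Phi(a * (eta - b))."""
    return std_normal_cdf(a * (eta - b))


if __name__ == "__main__":
    # Hypothetical item: loading 0.7, threshold 0.2, residual variance chosen so that
    # y* has unit variance when the factor variance is 1
    lam, tau, theta = 0.7, 0.2, 1.0 - 0.7 ** 2
    a, b = fa_to_irt(lam, tau, theta)
    for eta in (-2.0, 0.0, 2.0):
        print(eta, p_factor_model(eta, lam, tau, theta), p_irt(eta, a, b))
```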
Anonymous posted on Thursday, June 02, 2005 - 9:38 am
Hello! I am trying to compute multivariate normality statistics for my data (three proposed latent factors: 1 continuous and 2 possibly categorical). When I request TECH13, I only get statistics for my continuous variable, so I changed all of my variables to continuous. When I did this, the analysis ran normally, but at the point in the output that reads "Technical Output 13," there is a blank space with no information before "Technical Output 8." Any ideas what might be going on? Thanks!
My model has four latent variables and their respective constructs (3-4 constructs for each latent variable) and one dichotomous dependent observed variable.
My dissertation committee is rejecting my completed analysis on the grounds that I cannot use a dichotomous dependent observed variable in an SEM model and that all SEM variables must be continuously measured. What do you say about that?
If a dichotomous dependent variable is acceptable, can you point me to publications that substantiate it?
My model has four latent variables, their respective constructs (3-4 constructs for each latent variable), and one dichotomous dependent observed variable.
My dissertation committee is rejecting my completed analysis on the grounds that I cannot use a dichotomous dependent observed variable in an SEM model and that all SEM variables must be continuously measured. What do you say about that? Any help and references to current literature on the subject would be greatly appreciated.
Muthén, B. & Asparouhov, T. (2008). Growth mixture modeling: Analysis with non-Gaussian random effects. In Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G. (eds.), Longitudinal Data Analysis, pp. 143-165. Boca Raton: Chapman & Hall/CRC Press.
This paper is more general than what you are asking about because it includes mixtures. You can download it from the website.
In a two-level model, I have an overall binary dependent variable. On the within level, I have some nominal indicators. On the between level, I have four latent factors (measured by continuous indicators) that (partially) mediate the effect of some other observed variables.
The model seems to work fine; I just wonder about the number of integration dimensions. As I understand it, there should be five: one for the random effect and four for the latent variables. However, the output says there is only one dimension. What am I missing?
Could you please explain further why the latent variables on Between do not need numerical integration?
In my application, the latent variables are traits that are stable over the choices individuals make in four situations, which are described by experimental variables on WITHIN. So I cannot specify the traits on WITHIN because they have no within-level variance (or can I?).
On the other hand, these latent traits affect the binary choice probabilities. Since the traits are not known exactly, only their distributions, shouldn't the likelihood function be integrated over the distribution of the latent traits? Can I "force" Mplus to do so, or does integration over the latent traits make no sense at all in this case?
Having latent variables on Between that have continuous Between-level indicators is just like a regular single-level SEM. In the EM algorithm, the posterior distribution of the factors is explicit (normal), so no numerical integration is needed. If the indicators had been, say, categorical, numerical integration would have been needed.
Integration over the factors is already taken into account when you consider the distribution of the between-level indicators in the sense that a normal factor plus normal residuals produce normal indicators that have variation due to both factors and residuals.
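In standard linear factor-model notation (symbols assumed here, not taken from the thread), the point about normal factors and normal residuals can be written out: the indicators are normal with a covariance that splits into a factor part and a residual part, and the factors given the indicators are again normal with closed-form mean and covariance, so no quadrature is needed for them.

```latex
\begin{aligned}
\mathbf{y} &= \boldsymbol{\nu} + \boldsymbol{\Lambda}\boldsymbol{\eta} + \boldsymbol{\varepsilon},
  \quad \boldsymbol{\eta} \sim N(\boldsymbol{\alpha}, \boldsymbol{\Psi}),
  \quad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \boldsymbol{\Theta}),
  \quad \boldsymbol{\eta} \perp \boldsymbol{\varepsilon} \\
\Rightarrow \mathbf{y} &\sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}),
  \quad \boldsymbol{\mu} = \boldsymbol{\nu} + \boldsymbol{\Lambda}\boldsymbol{\alpha},
  \quad \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \\
\boldsymbol{\eta} \mid \mathbf{y} &\sim N\!\left(\boldsymbol{\alpha} + \boldsymbol{\Psi}\boldsymbol{\Lambda}'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}),\;
  \boldsymbol{\Psi} - \boldsymbol{\Psi}\boldsymbol{\Lambda}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\Lambda}\boldsymbol{\Psi}\right)
\end{aligned}
```

For a single indicator this reduces to Var(y) = lambda^2 psi + theta: the variation in y is the sum of factor-driven and residual variation.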
I really hope I am not being too demanding on this issue, but please correct me if my takeaway below is wrong:
When my overall dependent variable is observed binary or nominal, I do not need numerical integration for continuous latent predictors that are measured by continuous indicators and directly affect the DV, neither in regular models nor on the between level of two-level models.
But I do need integration for within-level latent predictors (and if so, why?).
I ask because a strand of the discrete choice literature argues that when latent predictors are included in, let's say, a logistic regression, the likelihood function needs to be integrated over the normal residuals implied by the conditional distribution of the latent variables given their observed antecedents. (Continuous) indicators of the latent variables are introduced through factor-analytic measurement models to support empirical identification of the latent effects on choice.
If you have a continuous factor predicting a categorical DV, you will need numerical integration, even in a single-level model. This is because the DV acts like just another factor indicator, so you are in the case of a continuous latent variable with a categorical indicator, where the posterior distribution of the latent variable does not have an explicit form (such as the normal).
My answer was for the case of a between-level factor measured by continuous between-level indicators.
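The following hypothetical Python sketch shows what one dimension of numerical integration does for a single N(0, 1) factor with a direct effect b on a binary outcome: P(u = 1) = integral of F(a + b*eta) phi(eta) d eta, approximated here with a trapezoidal rule on a grid of eta values. For a probit link F = Phi the integral has the closed form Phi(a / sqrt(1 + b^2)), which serves as a check; for the logit link there is no such closed form. This marginal probability is only one piece of the likelihood, and the grid choice is an assumption, not the quadrature Mplus uses. With several such factors a product grid needs points ** dims evaluations, which is why each additional integration dimension is costly.

```python
import math


def std_normal_cdf(z):
    """Phi(z) computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logistic(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_prob(link, a, b, points=201, limit=8.0):
    """P(u = 1) = integral of link(a + b*eta) * phi(eta) d eta with eta ~ N(0, 1),
    approximated by the trapezoidal rule on [-limit, limit]."""
    h = 2.0 * limit / (points - 1)
    total = 0.0
    for k in range(points):
        eta = -limit + k * h
        weight = 0.5 if k in (0, points - 1) else 1.0  # trapezoid end-point weights
        total += weight * link(a + b * eta) * std_normal_pdf(eta)
    return total * h


if __name__ == "__main__":
    a, b = 0.4, 1.5  # hypothetical intercept and factor effect
    print(marginal_prob(std_normal_cdf, a, b), std_normal_cdf(a / math.sqrt(1 + b * b)))
    print(marginal_prob(logistic, a, b))
```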
OK, I see. There needs to be integration when continuous factors predict a categorical DV. In the two-level model below, where my question originated, however, there is only one integration dimension on the between level, for the random effect, when I run it. I do not need to specify "factor BY choice ..." to explicitly treat choice as an indicator of the factor, do I?
VARIABLES: ...
  CATEGORICAL = choice;
  WITHIN = genres access exchange forum price1 price2 price3;
  BETWEEN = sex age int_pur bill use_freq multi_imp pf1 pf2 ea1 ea2
    en1 en2 en3 en4 sv1 sv2 sv3 sv4;
  CLUSTER = id;
ANALYSIS:
  TYPE = TWOLEVEL;
  ESTIMATOR = MLR;
  ALGORITHM = INTEGRATION;
MODEL:
  %WITHIN%
  choice ON genres access exchange forum
    price1 (p1)
    price2 (p2)
    price3 (p3);
  %BETWEEN%
  prefit BY pf1@1 pf2;
  ease BY ea1@1 ea2;
  enjoy BY en1@1 en2 en3 en4;
  socval BY sv1@1 sv2 sv3 sv4;
  prefit ON sex age int_pur bill use_freq multi_imp;
  ease ON sex age int_pur bill use_freq multi_imp;
  enjoy ON sex age int_pur bill use_freq multi_imp;
  socval ON sex age int_pur bill use_freq multi_imp;
  choice ON prefit ease enjoy socval;
But shouldn't there be four more integration dimensions on the between level in addition to the one for the random intercept? As you said, integration is needed when continuous latent factors predict a categorical DV.
When I run the model, there is only one integration dimension...
No, there should not be four more integration dimensions on the Between level because the Between-level factors do not predict a categorical DV on the Between level but a continuous DV, namely the random intercept of "choice".
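A sketch of the reasoning in this reply, in generic two-level notation (symbols assumed here; thresholds and intercepts omitted). In the input above, u_ij is choice for person j in situation i, x_ij are the WITHIN covariates, alpha_j is the random intercept of choice, eta_j are prefit, ease, enjoy and socval, y^B_j are their indicators pf1 to sv4, and w_j are the BETWEEN covariates sex to multi_imp. Because alpha_j and y^B_j are linear in normal terms, they are jointly normal given w_j, so the factors can be integrated out analytically and only alpha_j needs a numerical integral. This illustrates the model structure, not Mplus's internal algorithm.

```latex
\begin{aligned}
&\text{Within:} && u^*_{ij} = \alpha_j + \boldsymbol{\beta}'\mathbf{x}_{ij} + \varepsilon_{ij} \\
&\text{Between:} && \alpha_j = \boldsymbol{\gamma}'\boldsymbol{\eta}_j + \zeta_j, \quad
  \mathbf{y}^B_j = \boldsymbol{\nu} + \boldsymbol{\Lambda}\boldsymbol{\eta}_j + \boldsymbol{\delta}_j, \quad
  \boldsymbol{\eta}_j = \mathbf{B}\mathbf{w}_j + \boldsymbol{\xi}_j \\
&\text{Cluster } j: && L_j = f(\mathbf{y}^B_j \mid \mathbf{w}_j)
  \int \Big[\prod_i P(u_{ij} \mid \alpha_j, \mathbf{x}_{ij})\Big]\,
  f(\alpha_j \mid \mathbf{y}^B_j, \mathbf{w}_j)\, d\alpha_j
\end{aligned}
```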
Is it possible to have Mplus use more dimensions of integration for the latent variables (to get closer to what the discrete choice literature I am referring to recommends) without giving up the two-level model with a random intercept that accounts for repeated observations?
When I include the latent variables on WITHIN, they are considered in the integration procedure. Does this make sense even though they have no WITHIN variance?
You should distinguish between model features and algorithmic features. Algorithmic features are details of the estimator choice, in this case ML, which follow from a given model, not the other way around.
Mplus uses the dimensions of integration that your model input specifications require. So it is up to your model specification - not the numerical integration - to match whatever modeling literature that you consider.
If you give me a reference, I may find time to tell you what the model specifications should be for the models you are interested in.
With the model specification above, can I assume that all measurement error in the latent factors predicting choice is captured in the one random intercept for unobserved heterogeneity? Is that why the specification above does not require integration over each latent predictor?
So if I had only one observation per individual, integration over each latent factor would be required?
Our paper under review is being held up because our SEM model contains a binary dependent variable. That variable is not latent, i.e., it has no other indicators. Reviewers write "sem is not appropriate for binary outcomes". The model is an SEM network with some exogenous variables and 5-6 latent endogenous variables, including that one binary variable. The binary variable's predictors are all latent endogenous variables.
May I ask the following questions (as we consider switching to Mplus)?
1. Is it true that Mplus can handle this? Is any special handling needed?
2. Does it matter whether we "tell" Mplus that the variable is directly observed, or whether we tell Mplus that it is latent but has only the one binary observed indicator?
3. We had previously been using PLS. Our idea was to use PLS on the whole network, EXCEPT that we would separately run the one regression with the binary outcome as a logistic regression that uses the principal components (i.e., the output of the first step of PLS) as predictors.
Do you have any comment on the merit of using Mplus compared with what we had planned?
4. Assuming we use Mplus, how do you recommend we respond to reviewers who state, AND CITE PUBLISHED PAPERS, that "sem assumes continuous variables and so is not appropriate in a model that contains a binary outcome"?
I have completed an SEM model with a latent endogenous variable measured by 7 binary indicators. When I ran the CFA (using WLSMV), the model fit statistics were not ideal (RMSEA = .077, CFI = .804, TLI = .715), even after using the modification indices to adjust the error terms. My sample size is 1,895.
Are the model fit statistics affected by the fact that all of the indicators are binary? Is it still appropriate for me to run the full SEM model even though the measurement model for my endogenous variable fit less than ideally?