Categorical outcomes
 Anonymous posted on Thursday, March 13, 2003 - 7:32 am
I have data with a binary outcome and a binary mediating variable. My exogenous variables are a mixture of latent and observed variables. Can I use Mplus Version 2 to analyze these data?
 Linda K. Muthen posted on Thursday, March 13, 2003 - 9:09 am
Yes, it may be necessary to use our THETA parameterization for this model. If so, when you run the program, it will tell you.
 Angela posted on Monday, April 21, 2003 - 11:41 am
I have estimated a simple path model (no latent variables) with two measured continuous variables and four measured categorical variables. A reviewer is asking me to specify what type of correlations (tetrachoric, polychoric, Pearson) I am reporting in the correlation matrix. There's no indication in the Mplus output or in the manual of what type of correlations are reported in the "CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)" section. Could you clear this up for me? Thank you!
 Linda K. Muthen posted on Monday, April 21, 2003 - 12:50 pm
The type of correlation is determined by the scale of the two variables. If both are continuous, the correlation is a Pearson product moment correlation. If both are binary, it is a tetrachoric correlation. If both are ordered polytomous, it is a polychoric correlation. If one is dichotomous and one is continuous, it is a biserial correlation. If one is ordered polytomous and one is continuous, it is a polyserial correlation. I hope this answers your question.
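The pairing rule in this reply can be written down as a small lookup. This is only a sketch in Python (not Mplus syntax). The scale labels and the function name are made up for illustration, and it covers only the five pairings listed in the reply.

```python
# Illustrative lookup of the correlation type implied by the scales of two
# variables. The labels "continuous", "binary", and "ordered" are hypothetical
# names chosen for this sketch; they are not Mplus keywords.
CORRELATION_BY_SCALES = {
    frozenset(["continuous"]): "Pearson",
    frozenset(["binary"]): "tetrachoric",
    frozenset(["ordered"]): "polychoric",
    frozenset(["binary", "continuous"]): "biserial",
    frozenset(["ordered", "continuous"]): "polyserial",
}


def correlation_type(scale_a, scale_b):
    """Return the correlation type for a pair of variable scales."""
    key = frozenset([scale_a, scale_b])
    # Pairings not mentioned in the reply are deliberately left unresolved.
    return CORRELATION_BY_SCALES.get(key, "not covered in the reply")


print(correlation_type("binary", "binary"))
print(correlation_type("continuous", "ordered"))
```

A frozenset is used as the key so that the order of the two variables does not matter.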
 Angela posted on Monday, April 21, 2003 - 1:34 pm
You answered my question perfectly. I was writing my responses to the reviewer exactly the way you described them, but wanted to know for sure I was saying the correct thing. Thank you so much!!

One more quick question - is there any way to get Mplus to print statistical significance levels for these correlations?
 Linda K. Muthen posted on Monday, April 21, 2003 - 4:29 pm
If you ask for TYPE=BASIC, you will get the correlations and also the standard error for each correlation. If you divide the correlation by its standard error, this is like a z-test.
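As a rough illustration of that z-test, here is a minimal Python sketch. The correlation and standard error are hypothetical numbers, not values from any output in this thread, and the two-sided p-value assumes the ratio is approximately standard normal.

```python
import math


def z_test(correlation, standard_error):
    """Return (z, two-sided p) for an estimate divided by its standard error."""
    z = correlation / standard_error
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical values as they might be read off a TYPE=BASIC output.
z, p = z_test(0.30, 0.10)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")  # z = 3.00, two-sided p = 0.0027
```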
 Greg posted on Wednesday, July 30, 2003 - 9:12 am
I have a model (3 exogenous and 3 endogenous latent constructs) with a binary categorical outcome. The other variables are partly ordinal, partly continuous.

As my data are skewed and the sample size is large enough (n=1000), I would like to use the WLS estimator. Do I have to apply a biserial correlation (as computed in PRELIS), or do I have to compute point-biserial correlations? If the latter is true, is it possible to compute the point-biserial correlations and the corresponding weight matrix with Mplus?

Or do I have to use latent class analysis instead?
 Linda K. Muthen posted on Wednesday, July 30, 2003 - 9:23 am
You need to have raw data. Mplus will compute the appropriate correlations for the scale of the variable. For example, with a dichotomous variable and a continuous variable, a biserial correlation is computed.
 Anonymous posted on Friday, August 15, 2003 - 1:13 pm
I have two dichotomous outcome measures (Y1 and Y2), that are supposed to be correlated with each other. The two outcomes were modeled simultaneously using Mplus 2.14, being predicted by the same set of Xs. With "TYPE=GENERAL," Mplus provides probit regression coefficients for the two outcomes. However, the model fit statistics/indexes are confusing: Chi-Square=51.841, d.f.=13, P-Value=0.0000; CFI=1.000, TLI=1.000; RMSEA= 0.000; and WRMR= 0.000. When some Xs were removed from the model, CFI or TLI had a value greater than 1.0 sometimes. What appropriate model fit statistics/indexes should I use for the simultaneous probit model?

In addition, Mplus provides the correlation between Y1* and Y2*, which are the underlying latent variables of Y1 and Y2, respectively. How should I interpret this correlation given that the probit model has no residual term? Setting the correlation to 0.0 didn't change the coefficient estimates and their standard errors, but it changed the chi-square statistic and other fit indexes (e.g., CFI changed from 1.000 to 0.234, and TLI changed from 1.000 to -7.424; why is TLI negative?). Was the correlation between Y1* and Y2* factored into the modeling?

Your help will be highly appreciated!
 bmuthen posted on Saturday, August 16, 2003 - 4:51 pm
Please send the output from both of the models you mention to support@statmodel.com so that your full results can be studied.

Regarding the residual correlation between y1* and y2*, the probit model does have a residual term. Although the variance of the residual is standardized to one, it is possible to estimate residual correlations with multivariate outcomes. If residual correlations are given in the Model Results section, they are estimated (see also Tech1 regarding which parameters are estimated).
 Anonymous posted on Thursday, August 21, 2003 - 11:23 am
Hi,
I am doing path analysis with several categorical outcomes. The dependent variable and the mediating variable are both binary. How do I specify the THETA parameterization? Is there an example?

Thanks
 bmuthen posted on Thursday, August 21, 2003 - 4:10 pm
Here is an example:

TITLE: this is an example of a path analysis with categorical dependent variables
DATA: FILE = firstcat.dat;
VARIABLE:
  NAMES = y1-y3 x1-x3;
  CATEGORICAL = y1-y3;   ! dependent variables treated as categorical
ANALYSIS:
  PARAMETERIZATION = THETA;
MODEL:
  ! the numbers after * are starting values
  y1 y2 ON x1-x3*.5;
  y3 ON y1-y2*.5 x1-x3*.3;
OUTPUT: STANDARDIZED;
 davood posted on Thursday, December 02, 2004 - 9:34 am
Dear Dr. Muthen,
I am doing a multiple-group CFA using categorical indicators. Mplus generated this error:
*** ERROR
Group 5 does not contain all values of categorical variable: DADCARE1

Group 5 is Puerto Ricans. When I ran the model for just Puerto Ricans, I did not get this error. I also ran frequencies for DADCARE1 in group 5. Here is the result:
dadcare1
Value    Frequency
-999        156
1.00          2
2.00          2
3.00         11
4.00         36
5.00        191
Total       398
 Linda K. Muthen posted on Thursday, December 02, 2004 - 10:46 am
This is likely because of listwise deletion: you don't have all 398 observations in the analysis, and without all 398 you may not have all values of DADCARE1. Or you may be reading your data incorrectly in Mplus. If you can't figure it out, send your output and data to support@statmodel.com.
 empty logistic model posted on Friday, February 24, 2006 - 6:24 am
Dear Dr. Muthen,
I am having trouble replicating results given on a UCLA website that uses Mplus for examples from the multilevel book by Snijders/Bosker, chapter 14,
http://www.ats.ucla.edu/stat/mplus/examples/ma_snijders/chap14.htm
(it provides Mplus syntax and data for a multilevel binary logistic regression)

The website provides as a result:
Variances
COHAB 0.032 0.019 1.635

My own calculations (same data, same syntax):
several serious problems and
Variances
COHAB 0.000

My own data also yielded
several serious problems and
Variances
XYZ 0.000
Is there a problem in the Mplus syntax for the "empty logistic model"?


Title: multilevel logistic regression
Data:
File is ch14_1.dat ;
Variable:
Names are
respnr ltbm eltwm sex age edu relserv class reg cohab;
Missing are all (-9999) ;
usevar are cohab reg;
categorical is cohab;
cluster is reg;
Analysis:
Type = twolevel random ;
model:
%between%
cohab;

Thanks in advance
 Linda K. Muthen posted on Friday, February 24, 2006 - 10:41 am
What version of Mplus are you using? What version did they use? You can see this by looking at their output if they provide the full output.
 empty logistic model posted on Saturday, February 25, 2006 - 12:53 am
Thanks for the quick response,
The UCLA version is not given.
How would you specify the model in Mplus syntax?
(binary outcome, only the level 2 cluster as explanatory variable)
 Linda K. Muthen posted on Saturday, February 25, 2006 - 7:39 am
The Mplus input is on the UCLA website. If you cannot get this to run, you need to send the input, data, output, and your license number to support@statmodel.com.
 Nyankomo Wambura Marwa posted on Monday, July 31, 2006 - 3:41 am
Dear Prof Muthen
I am fitting latent class models in Mplus. I have four categorical manifest variables, and I am interested in estimating the true status of the disease given the four diagnostic tests. There is a high likelihood that my manifest variables are correlated. Intuitively, one latent class variable with two classes would be ideal, but when I try it the model does not fit. Only when I go up to four classes does the model fit better.

My questions are as follows.

1. How do we detect dependence among the manifest variables in Mplus?

2. Given that there is dependence, how do I model it? I saw in the Mplus User's Guide for Version 1 (2000), page 13, that the WITH statement is used for correlations among continuous variables but is not applicable to categorical variables.

Thanks in advance for your help
 Linda K. Muthen posted on Monday, July 31, 2006 - 10:27 am
The manifest variables should correlate. Their correlations are assumed to be zero within each class, but the variables will correlate when the classes are mixed. You can look at TECH10 to check for dependence among the manifest variables within class. The WITH statement is used with categorical observed variables in some models. With latent class analysis, you need to put a factor behind the two indicators to specify a correlation. See Example 7.16.
 Nyankomo Wambura Marwa posted on Monday, July 31, 2006 - 11:33 am
Dear Linda

Thanks for your quick response, but I have one more question. In your response you referred me to Example 7.6 on how to use the WITH statement for categorical variables. I am not clear whether this is Example 7.6 from the user's guide or from somewhere else. Just to remind you, I have the user's guide for Mplus Version 1.

See your response attached below for reference.

Hoping to hear from you soon.

With regards

Nyankomo marwa
 Nyankomo Wambura Marwa posted on Monday, July 31, 2006 - 12:00 pm
Dear Linda
Here is my syntax, but I am not quite sure how to include the WITH statement to model dependence among A, B, D, and H.
I tried to use

A B WITH D H but it does not work.

Thanks once again for your esteemed support.

Data:
File is D:\mp.txt;
Variable:
Names are A B D H count;
usevariables are A B D H count;
FREQWEIGHT is count ;
categorical are A B D H;
Missing are all (-9999) ;
classes = cl(2);
Analysis:
Type = mixture;
model:
%overall%

output:tech10;
 Linda K. Muthen posted on Monday, July 31, 2006 - 2:53 pm
I always refer to the most recent user's guide, which is posted on the website. See Example 7.16 in this user's guide. Version 1 is very old. Your research will be improved by updating to Version 4.1. Many changes since Version 1 will help you in mixture modeling, particularly the addition of random starts.
 Peter Croy posted on Monday, May 14, 2007 - 7:24 am
I'm just starting to model a categorical outcome variable (a binary y/n DV) and have estimated a model as the first step toward a chi-square difference test, per the instructions on page 314 of the guide.
The command savedata: difftest is deriv.dat saves a .dat file that contains an error message, namely: The input file does not contain valid commands. What does this suggest?
 Linda K. Muthen posted on Monday, May 14, 2007 - 7:54 am
It is difficult to say without more information. If you send your input, data, output, and license number to support@statmodel.com, I can take a look at it.
 Yu Kyoum Kim posted on Friday, June 08, 2007 - 12:55 pm
Dear Dr. Muthen,

I am a new user of Mplus and I really like Mplus.

I have 4 continuous latent variables (IVs), all measured by continuous indicators.
However, the dependent variable is a single observed binary variable. I understand Mplus can handle this relationship, but how?

1. What correlation matrix is used? Does the matrix analyzed include both Pearson and biserial correlations even though I have just one binary variable in the structural part?

2. What estimation method should I use?

3. Is the relationship in the measurement part (between the continuous latent variables and their continuous indicators) described by linear regression equations, separately from the structural part (between the 4 continuous latent variables and the one binary observed dependent variable), or are all relationships described by probit or logistic regression equations?


I am looking forward to hearing from you soon.

I truly appreciate your time and help.
 Linda K. Muthen posted on Friday, June 08, 2007 - 2:57 pm
Let's say you have four factors, f1 through f4, and one dependent binary variable, u1. You would specify the model as:

f1 BY ...
f2 BY ...
f3 BY ...
f4 BY ....
u1 ON f1-f4;

1. If you are using weighted least squares, the correlation depends on the scale of the two dependent variables.

2. You can use weighted least squares, which is the default, or maximum likelihood.

3. The type of regression coefficient is determined by the scale of the dependent variable. Factor indicators are dependent variables. When they are continuous, the regression coefficient is a linear regression coefficient. The regression of the categorical dependent variable on the continuous latent variables is probit or logistic depending on the estimator and link. It is probit for weighted least squares. It is a logistic regression coefficient for maximum likelihood using the default logit link. It is probit using the probit link.
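To make the probit versus logit distinction in point 3 concrete, here is a small Python sketch of how P(u1 = 1) would be computed from a threshold, slopes, and factor values under each link. All numbers are hypothetical, the function names are made up for illustration, and the probit version assumes a residual variance of one; it is not a reproduction of what Mplus does internally.

```python
import math


def linear_predictor(slopes, factor_values):
    """Sum of slope * factor value over the predictors (here f1-f4)."""
    return sum(b * f for b, f in zip(slopes, factor_values))


def probit_probability(threshold, slopes, factor_values):
    """P(u=1) = Phi(-threshold + b'f), assuming a unit residual variance."""
    z = linear_predictor(slopes, factor_values) - threshold
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_probability(threshold, slopes, factor_values):
    """P(u=1) = 1 / (1 + exp(threshold - b'f))."""
    eta = linear_predictor(slopes, factor_values) - threshold
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical threshold, slopes for f1-f4, and factor values.
tau = 0.5
b = [0.4, 0.3, 0.2, 0.1]
f = [1.0, 0.0, -1.0, 0.5]
print(f"probit: {probit_probability(tau, b, f):.3f}")
print(f"logit:  {logit_probability(tau, b, f):.3f}")
```

The same slopes give different probabilities under the two links, which is why probit and logistic coefficients are not directly comparable.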
 Omar Abdellatif posted on Monday, February 08, 2010 - 10:48 pm
Dear Dr. Muthen,
Kindly I'd like to know which product of Mplus 5.2 to purchase:
Mplus Base Program
Mplus Base Program and Mixture Add-On
Mplus Base Program and Multilevel Add-On
Mplus Base Program and Combination Add-On
to perform latent class analysis with 4 manifest variables (diagnostic tests) in 400 animals, to estimate the true disease status and the sensitivity and specificity of each test.
My study is similar to Nyankomo Wambura Marwa's study posted in this thread.

I'd also like to ask whether Mplus 3 can perform this type of analysis.
With best regards
 Linda K. Muthen posted on Tuesday, February 09, 2010 - 9:54 am
I think the Mplus Base Program and Mixture Add-On would be the correct choice.
 harvey brewner posted on Monday, March 07, 2011 - 7:15 am
I conducted an EFA and found two items (e6 and e12) that loaded strongly on the intended factor (E) as well as on another factor (N). I then conducted a CFA without the cross-loadings, and the model fit very well. I then ran another CFA allowing the first cross-loading item (e6) to cross-load, and the model was not identified. The same was true when I ran another CFA allowing e12 instead of e6 to cross-load: it was also not identified. In the first CFA, i.e., with no cross-loadings, there was an extremely high MI (>200) for e6 with e12.

1.) Is there a connection between the strong cross-loadings of e6 and e12 in the EFA and the strong correlation (i.e., large MI value) between e6 and e12 in the first CFA (i.e., where no cross-loadings were allowed)? For example, does the strong correlation in the CFA manifest itself as a large cross-loading in the EFA?

2.) How can e6 and e12 have large cross-loadings on another factor (N) in the EFA while the models were not identified in the CFA when the items were allowed to cross-load, separately, on E and N?

Any thoughts on either question?
 Bengt O. Muthen posted on Monday, March 07, 2011 - 6:09 pm
1) If e6 and e12 have something in common not covered by your E and N factors, then EFA covers the extra correlation by a cross-loading because EFA does not allow residual correlations. You can look at the MIs for the EFA, which probably point to a residual correlation. You can use ESEM to correlate residuals. The CFA tells you that correlating the residuals is better, which again points to the two items both being influenced by another factor.

2) You have to send the model to support for us to see the specific features of the CFA.
 mpduser1 posted on Wednesday, August 17, 2011 - 5:07 pm
I want to verify that I'm interpreting Mplus output and Chapter 14 of the User's Guide correctly.

I'm estimating a model where I have a dichotomous outcome, Y (0 = no, 1 = yes), a dichotomous predictor, X, and a latent class variable that I'm specifying as a predictor, L. L has three classes.

I obtain "logistic regression intercepts" for Y from the latent class portion of my model as follows:
Class 1: Y$1 = 1.32
Class 2: Y$1 = .80
Class 3: Y$1 = .20.

For X = 0, am I correct that this indicates that 52% of respondents who respond "yes" to Y are in Class 1; 30.9% who respond "yes" to Y are in Class 2; 16.99% who respond "yes" to Y are in Class 3?

I obtain this via: EXP(1.32) = 3.74; EXP(.80) = 2.22; EXP(.20) = 1.22, with proportion in Class 1 as: 3.74 / (3.74 + 2.22 + 1.22).

Thank you.
 Bengt O. Muthen posted on Wednesday, August 17, 2011 - 6:16 pm
It is simpler than that (you are halfway slipping into multinomial regression). With a binary Y you have:

P(Y=1 | Class k) = 1/[1+exp(-logit)],

where the logit = -threshold. So, for example for Class 1, you have exp(-logit) = exp(+threshold) = exp(+1.32) and get P = 0.21.
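The conversion can be checked with a few lines of Python. This is just a sketch of the formula above applied to the three thresholds from the earlier post; the function name is made up for illustration.

```python
import math


def probability_from_threshold(threshold):
    """P(Y=1 | class) for a binary item, using logit = -threshold."""
    logit = -threshold
    return 1.0 / (1.0 + math.exp(-logit))


# Thresholds reported for Y$1 in the three classes.
for label, tau in [("Class 1", 1.32), ("Class 2", 0.80), ("Class 3", 0.20)]:
    print(f"{label}: P(Y=1) = {probability_from_threshold(tau):.2f}")
# Class 1: P(Y=1) = 0.21
# Class 2: P(Y=1) = 0.31
# Class 3: P(Y=1) = 0.45
```

Each class gets its own probability of "yes"; the three values are not shares of a whole and need not sum to one.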
 mpduser1 posted on Wednesday, August 17, 2011 - 6:55 pm
That was actually my original thinking.

In one model, however, I get thresholds along the lines of:

Class 1: 4.197
Class 2: 3.589
Class 3: 3.330.

So as I understand your calculation this would indicate that for respondents for whom X = 0 (approximately) only 1.5%, 2.7%, and 3.4% responded "yes" to Y.

These numbers seem small; especially given that approximately 20% of my total sample responded "yes" to item Y; and Class 1 is 24% of my sample, Class 2 is 47% of my sample, and Class 3 is 29% of my sample.

Am I being thrown off here because I'm only looking at the "X=0 segment" of the sample in this particular application?
 Bengt O. Muthen posted on Wednesday, August 17, 2011 - 8:43 pm
Maybe X=0 is below the X mean. And perhaps X also influences L.
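To see the size of the gap being discussed, the class-specific probabilities at X=0 can be averaged over the classes. The Python sketch below uses the thresholds and class shares quoted in the question and assumes, purely for illustration, that the overall class shares also hold at X=0. If X influences L, that assumption fails, which is part of the point.

```python
import math

# Thresholds at X=0 and overall class shares, as quoted in the question.
thresholds = {1: 4.197, 2: 3.589, 3: 3.330}
class_shares = {1: 0.24, 2: 0.47, 3: 0.29}  # assumed to hold at X=0 (illustrative)

# P(Y=1 | class, X=0) = 1 / (1 + exp(threshold))
p_yes = {k: 1.0 / (1.0 + math.exp(tau)) for k, tau in thresholds.items()}

# Mix the class-specific probabilities using the class shares as weights.
p_yes_at_x0 = sum(class_shares[k] * p_yes[k] for k in thresholds)

for k in sorted(p_yes):
    print(f"Class {k}: P(Y=1 | X=0) = {p_yes[k]:.3f}")
print(f"Mixed over classes: {p_yes_at_x0:.3f}")
```

Under these assumptions the mixed probability comes out at roughly 2.6%, well below the 20% observed in the full sample, so respondents with X above zero must account for most of the "yes" responses.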
 mpduser1 posted on Thursday, August 18, 2011 - 4:43 am
If X does influence L, is the model better rendered as:

X --> L --> Y
X --> Y

Would this adjust the otherwise large intercepts?

Thanks.
 Bengt O. Muthen posted on Thursday, August 18, 2011 - 7:57 am
That's hard to say - try it - but it will change the class proportions at X=0 (assuming X has an effect on L).
 mpduser1 posted on Thursday, August 18, 2011 - 8:25 am
Running the model as a path model doesn't change anything.

In the problematic example I noted above (with the large intercepts), X is actually a vector of covariates. I'll have to reduce the complexity of the model. Seems latent class models of either type can't be too rich. Not really sure why that is, but you can start to see it in the SEs.

I'll have to go back and build the model piece-wise.

Thank you.
 Alfred Mbah posted on Tuesday, September 20, 2011 - 6:24 am
Dear Dr Muthen
How do you assign categorical outcome variables with 2 categories(e.g. u1, u2, u3) to a categorical latent variable (with 2 classes for example) in MPLUS? The Type = mixture command does not seem to do this.
 Linda K. Muthen posted on Tuesday, September 20, 2011 - 10:16 am
See Example 8.12, in particular page 222.
 Lisa M. Yarnell posted on Thursday, September 20, 2012 - 5:43 pm
Hi Linda,

If I read in a polychoric correlation matrix for a set of dichotomous indicators (as instructed by a professor), using TYPE = CORRELATION, is it possible to specify the correlations as polychoric? If so, how?
 Linda K. Muthen posted on Friday, September 21, 2012 - 8:11 am
You can do this using only the ULS estimator. You do not need to use the CATEGORICAL option. With any other estimator you would need a weight matrix for correct results. It is much better to analyze raw data.
 K. Tan posted on Friday, March 01, 2013 - 2:13 pm
Hi,

It seems like Mplus generally uses only logit and probit models for categorical dependent variables (using the MLR and WLSMV estimators, respectively). Is there any way to run a linear probability model with robust standard errors? Is that the model used when I do not declare my dependent variable as categorical?
 Linda K. Muthen posted on Friday, March 01, 2013 - 6:00 pm
If you do not put a categorical variable on the CATEGORICAL list, a linear regression is estimated for the categorical outcome.
 K. Tan posted on Friday, March 01, 2013 - 7:37 pm
Thank you! Can I assume that if I use the MLR estimator with this specification, the standard errors are robust to heteroskedasticity?
 Bengt O. Muthen posted on Saturday, March 02, 2013 - 1:09 pm
In principle they are robust, but in practice I am not sure how robust the results would be given that the coefficients are biased. You would need to do a Monte Carlo simulation study, which is easy to do in Mplus.
 Martin Van Boekel posted on Thursday, March 07, 2013 - 11:27 am
Hello,
I am a new Mplus user attempting to run an SEM with five exogenous and eight endogenous variables, all of which are categorical. Of the two dependent latent variables, one is made up of three categorical rating scale items; the other is a single binary outcome.
I have reviewed much of the documentation on how to set up this type of model and consulted with peers. However, when I run the syntax, the iterations are very slow. I have left the program running for over 15 minutes and observed only 115 iterations.
If I remove the dichotomous outcome variable, the model converges, so I do not believe there is a problem with the code.
This is my first attempt at running an SEM with categorical outcome variables, so any help would be appreciated. Thank you.
 Linda K. Muthen posted on Thursday, March 07, 2013 - 12:17 pm
Please send your input, data, and license number to support@statmodel.com.
 Isaac Yrigoyen M. posted on Friday, August 09, 2013 - 5:03 pm
Dear Dr. Linda and Dr. Bengt,

I am new to your website, and I'm very interested in knowing whether the following model can be run in Mplus:

-My dependent variable is an unordered categorical variable (not latent, just an observed nominal variable with 4 categories).
-My predictors are 4 continuous latent variables (they are reflective and have ordinal indicators), and 2 of them are mediators in the model.

According to your user guide, it would be something like that:

VARIABLE:
  NAMES = u1-u16;
  CATEGORICAL = u2-u16;
  NOMINAL = u1;
MODEL:
  f1 BY u2-u4;
  f2 BY u5-u7;
  f3 BY u8-u10;
  f4 BY u11-u16;

  f1 ON f3;
  f2 ON f3 f4;

  u1 ON f1 f2 f4;

MODEL INDIRECT:
  u1 IND f2 f4;

Can Mplus run this model?

Does Mplus run multinomial logistic regression for the most endogenous paths (I mean between u1 and f1 f2 f4)?

Is it possible to model some of those latent variables (e.g.: f3) as formative?

Thank you very much

Isaac
 Bengt O. Muthen posted on Friday, August 09, 2013 - 6:32 pm
This is possible, although Model Indirect is not available with a nominal distal outcome because indirect effects with a nominal distal outcome need special attention to what is meant by the indirect effect - it is not simply of the "a*b" kind. For a proper handling of this, see the paper (with Mplus scripts):

Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.

Formative factors are straightforward. Multinomial logistic regression is used where u1 is the DV.
 Cecily Na posted on Thursday, September 12, 2013 - 7:38 am
Hello Professors,
I have several count variables (range 0-200) which I want to treat as continuous variables, but they are heavily skewed (skewness=50). Can I still treat them as continuous and use the MLR estimator?
I also tried to normalize their distributions by transformation based on their frequency percentage. I recode these variables into 5- or 6-point ordinal variables (0-1 recoded into 0, 2-10 recoded into 1 and so on), and want to treat them as continuous in my analysis. However, I need to incorporate weights in my analysis, and the transformation is done regardless of the weights.
Which method is better (after I apply weights)?
Thank you very much for your advice!
 Linda K. Muthen posted on Thursday, September 12, 2013 - 1:31 pm
You should not transform count variables. You should not treat them as continuous if they have a lot of zeroes. You can treat them as count variables or perhaps censored variables. I don't see any advantage of collapsing values unless there is a substantive reason.
 Cecily Na posted on Friday, September 13, 2013 - 6:40 am
Thank you, Linda, for your reply. The reason I do not want to treat them as count variables is that I suspect the measurement is pretty lousy (not real counts), and I would rather collapse them into several (continuous) ranges. So the focus of my question this time is:
Would it be okay if the unweighted data are normal, and then I apply weights and use MLR for estimation?
Thanks a lot!
 Linda K. Muthen posted on Friday, September 13, 2013 - 8:50 am
Yes.
 Lily Wang posted on Saturday, October 12, 2013 - 2:50 pm
Hello professors,
I am fitting a multilevel path model with a multinomial outcome (3 categories). So I have odds ratios for category 1 vs 3 and 2 vs 3. I was wondering if there are ways for me to obtain odds ratios of outcomes 1 vs 2, WHILE knowing if the estimates are statistically significant. I look forward to your advice on this! Thanks.
 Linda K. Muthen posted on Monday, October 14, 2013 - 10:04 am
You can use the DEFINE command to change the reference category of the nominal variable.
 Lily Wang posted on Monday, October 14, 2013 - 10:24 am
Thank you so much, Linda. It sounds like a great option. I just wanted to check whether there are statistical or reporting implications. Would you say that it is appropriate to report the odds ratio for 1 vs 2 alongside the other two pairs (1 vs 3 and 2 vs 3), which share the same base category and come from running the model with 3 as the base? Are there any statistical issues that I need to address in this situation? Thanks again for your guidance!
 Linda K. Muthen posted on Monday, October 14, 2013 - 2:29 pm
I can't see any issues as long as you are clear on what you are reporting.
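For reference, the 1 vs 2 odds ratio is already implied by the two coefficients estimated against category 3; what a single run does not give directly is its standard error, which also needs the covariance between the two estimates. The Python sketch below uses hypothetical coefficients, standard errors, and covariance (none are from this thread), and the function names are made up for illustration. Re-running the model with a different reference category, as suggested above, yields the same quantities without hand calculation.

```python
import math


def odds_ratio_1_vs_2(b1_vs_3, b2_vs_3):
    """Odds ratio for category 1 vs 2 from logits estimated against category 3."""
    return math.exp(b1_vs_3 - b2_vs_3)


def se_log_odds_1_vs_2(se1, se2, cov12):
    """Standard error of (b1 - b2): sqrt(var1 + var2 - 2*cov)."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * cov12)


# Hypothetical estimates for one predictor.
b1, b2 = 0.9, 0.4              # logits for 1 vs 3 and 2 vs 3
se1, se2, cov12 = 0.20, 0.25, 0.02

log_or = b1 - b2
se = se_log_odds_1_vs_2(se1, se2, cov12)
print(f"OR(1 vs 2) = {odds_ratio_1_vs_2(b1, b2):.3f}")
print(f"z for the 1 vs 2 logit = {log_or / se:.2f}")
```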