I have estimated a simple path model (no latent variables) with two measured continuous variables and four measured categorical variables. A reviewer is asking me to specify what type of correlations (tetrachoric, polychoric, Pearson) I am reporting in the correlation matrix. Neither the Mplus output nor the manual indicates what type of correlations are reported in the "CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)" section. Could you clear this up for me? Thank you!
The type of correlation is determined by the scale of the two variables. If both are continuous, the correlation is a Pearson product moment correlation. If both are binary, it is a tetrachoric correlation. If both are ordered polytomous, it is a polychoric correlation. If one is dichotomous and one is continuous, it is a biserial correlation. If one is ordered polytomous and one is continuous, it is a polyserial correlation. I hope this answers your question.
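The rule in this reply is a simple lookup on the pair of measurement scales. A minimal Python sketch of it follows; the helper name and the scale labels are hypothetical, and pairs the reply does not cover (for example, binary with ordinal) raise an error rather than guess.

```python
# Hypothetical helper mirroring the rule stated in the reply: the correlation
# type depends only on the measurement scales of the two variables.
CORRELATION_TYPE = {
    frozenset(["continuous"]): "Pearson product-moment",
    frozenset(["binary"]): "tetrachoric",
    frozenset(["ordinal"]): "polychoric",
    frozenset(["binary", "continuous"]): "biserial",
    frozenset(["ordinal", "continuous"]): "polyserial",
}


def correlation_type(scale_a: str, scale_b: str) -> str:
    """Return the correlation type for a pair of variable scales (order does not matter)."""
    key = frozenset([scale_a, scale_b])
    if key not in CORRELATION_TYPE:
        raise ValueError(f"no rule stated for {scale_a!r} with {scale_b!r}")
    return CORRELATION_TYPE[key]
```

For example, correlation_type("continuous", "binary") returns "biserial".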
I have a model (3 exogenous and 3 endogenous latent constructs) with a binary categorical outcome. The other variables are partly ordinal, partly continuous.
As my data are skewed and the sample size is large enough (n = 1000), I would like to use the WLS estimator. Do I have to use a biserial correlation (as computed in PRELIS), or do I have to compute point-biserial correlations? If the latter, is it possible to compute the point-biserial correlations and the corresponding weight matrix with Mplus?
Or do I have to use latent class analysis instead?
You need to have raw data. Mplus will compute the appropriate correlations for the scale of the variable. For example, with a dichotomous variable and a continuous variable, a biserial correlation is computed.
Anonymous posted on Friday, August 15, 2003 - 7:13 pm
I have two dichotomous outcome measures (Y1 and Y2) that are supposed to be correlated with each other. The two outcomes were modeled simultaneously using Mplus 2.14, both predicted by the same set of Xs. With "TYPE=GENERAL," Mplus provides probit regression coefficients for the two outcomes. However, the model fit statistics/indexes are confusing: Chi-Square=51.841, d.f.=13, P-Value=0.0000; CFI=1.000, TLI=1.000; RMSEA= 0.000; and WRMR= 0.000. When some Xs were removed from the model, CFI or TLI sometimes had a value greater than 1.0. Which model fit statistics/indexes are appropriate for the simultaneous probit model?
In addition, Mplus provides the correlation between Y1* and Y2*, which are the underlying latent variables of Y1 and Y2, respectively. How should I interpret this correlation given that the probit model has no residual term? Setting the correlation to 0.0 didn't change the coefficient estimates and their standard errors, but it changed the chi-square statistic and other fit indexes (e.g., CFI changed from 1.000 to 0.234, and TLI changed from 1.000 to -7.424 (why a negative TLI?)). Was the correlation between Y1* and Y2* factored into the modeling?
Your help will be highly appreciated!
bmuthen posted on Saturday, August 16, 2003 - 10:51 pm
Please send the output from both of the models you mention to firstname.lastname@example.org so that your full results can be studied.
Regarding the residual correlation between y1* and y2*, the probit model does have a residual term. Although the variance of the residual is standardized to one, it is possible to estimate residual correlations with multivariate outcomes. If residual correlations are given in the Model Results section, they are estimated (see also Tech1 regarding which parameters are estimated).
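To make the reply concrete for a single outcome: the probit model can be written with a latent response y* = beta*x + e, where the residual e has its variance fixed at 1, and y = 1 when y* exceeds a threshold tau. The Python sketch below (function names and values are illustrative, not Mplus output) computes P(y = 1 | x) in closed form and checks it by simulating the residual explicitly. With two outcomes, the residuals of y1* and y2* can additionally be correlated, which is the residual correlation referred to above; that bivariate part is not simulated here.

```python
import math
import random


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_prob(x: float, beta: float, tau: float) -> float:
    """Closed form: y* = beta*x + e with e ~ N(0, 1), and y = 1 when y* > tau,
    so P(y = 1 | x) = P(e > tau - beta*x) = Phi(beta*x - tau)."""
    return normal_cdf(beta * x - tau)


def simulated_prob(x: float, beta: float, tau: float,
                   n: int = 100_000, seed: int = 1) -> float:
    """Same probability by simulation, drawing the unit-variance residual explicitly."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if beta * x + rng.gauss(0.0, 1.0) > tau)
    return hits / n
```

The simulated and closed-form values agree up to Monte Carlo error, which shows the residual is present even though its variance is not estimated.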
Anonymous posted on Thursday, August 21, 2003 - 5:23 pm
Hi, I am doing a path analysis with several categorical outcomes. The dependent variable and the mediating variable are both binary. How do I specify the THETA parameterization? Is there an example?
bmuthen posted on Thursday, August 21, 2003 - 10:10 pm
Here is an example:
TITLE: this is an example of a Monte Carlo simulation study for a path analysis
DATA: FILE = firstcat.dat;
VARIABLE: NAMES = y1-y3 x1-x3;
  CATEGORICAL = y1-y3;
ANALYSIS: PARAMETERIZATION = THETA;
MODEL: y1 y2 on x1-x3*.5;
  y3 on y1-y2*.5 x1-x3*.3;
OUTPUT: STANDARDIZED;
davood posted on Thursday, December 02, 2004 - 3:34 pm
Dear Dr. Muthen, I am doing a two-group CFA using categorical indicators. Mplus generated this error: *** ERROR Group 5 does not contain all values of categorical variable: DADCARE1
Group 5 is Puerto Ricans. When I ran the model for just Puerto Ricans, I did not get this error. I also ran frequencies and checked DADCARE1 for group 5. Here is the result:
dadcare1
Value    Frequency
-999     156
1.00     2
2.00     2
3.00     11
4.00     36
5.00     191
Total    398
This is likely because of listwise deletion: you may not have all 398 observations in the analysis, and without all 398 you may not have all values of DADCARE1. Or you may be reading your data incorrectly into Mplus. If you can't figure it out, send your output and data to email@example.com.
The website provides as a result: Variances COHAB 0.032 0.019 1.635
My own calculations (same data, same syntax) gave several serious problems and Variances COHAB 0.000.
My own data also yielded several serious problems and Variances XYZ 0.000. Is there a problem in the Mplus syntax for the "empty logistic model"?
Title: multilevel logistic regression
Data: File is ch14_1.dat;
Variable: Names are respnr ltbm eltwm sex age edu relserv class reg cohab;
  Missing are all (-9999);
  usevar are cohab reg;
  categorical is cohab;
  cluster is reg;
Analysis: Type = twolevel random;
model:
  %between%
  cohab;
Dear Prof. Muthen, I am fitting latent class models in Mplus. I have four categorical manifest variables, and I am interested in estimating the true disease status given the four diagnostic tests. There is a high likelihood that my manifest variables are correlated. Intuitively, one latent class variable with two classes would be ideal, but when I try it, the model does not fit. Only when I go up to four classes does the model fit better.
My questions are as follows.
1. How do we detect dependence among the manifest variables in Mplus?
2. Given that there is dependence, how do we model it? I saw in the Mplus User's Guide for Version 1 (2000), page 13, that the WITH statement is used for correlations among continuous variables but is not applicable to categorical variables.
The manifest variables should correlate. Their correlations are assumed to be zero within each class, but the variables will correlate when the classes are mixed. You can look at TECH10 to check for dependence among the manifest variables within class. The WITH statement is used with categorical observed variables in some models. With latent class analysis, you need to put a factor behind the two indicators to specify a correlation. See Example 7.16.
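The point that items can be uncorrelated within each class yet correlated overall can be checked with a little arithmetic. The sketch below is plain Python, not Mplus; the function name and the class and item probabilities are made up for illustration. It computes the marginal covariance of two binary items under a latent class model in which the items are independent within each class.

```python
def mixture_covariance(class_probs, p_item1, p_item2):
    """Covariance of two binary items under a latent class model with
    conditional independence (items independent within each class).

    class_probs: class proportions pi_k
    p_item1, p_item2: P(item = 1 | class k) for each class
    """
    # Within class k the items are independent, so P(u1 = 1, u2 = 1 | k) = p1k * p2k.
    joint = sum(pi * p1 * p2 for pi, p1, p2 in zip(class_probs, p_item1, p_item2))
    mean1 = sum(pi * p1 for pi, p1 in zip(class_probs, p_item1))
    mean2 = sum(pi * p2 for pi, p2 in zip(class_probs, p_item2))
    return joint - mean1 * mean2
```

With two equally sized classes whose item probabilities differ (0.9 vs 0.1 and 0.8 vs 0.2), the covariance is 0.12; if both classes had the same item probabilities, it would be zero. Dependence beyond what the classes explain is what TECH10 helps to detect.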
Thanks for your quick response, but I have one more question. In your response you referred me to Example 7.6 on how to use the WITH statement for categorical variables. I am not clear whether that is Example 7.6 from the user's manual or from somewhere else. Just to remind you, I have the user's guide for Mplus Version 1.
Dear Linda, here is my syntax, but I am not quite sure how to include a WITH statement to model dependence among A, B, D, and H. I tried to use
A B WITH D H but it does not work.
Thanks once again for your esteemed support.
Data: File is D:\mp.txt;
Variable: Names are A B D H count;
  usevariables are A B D H count;
  FREQWEIGHT is count;
  categorical are A B D H;
  Missing are all (-9999);
  classes = cl(2);
Analysis: Type = mixture;
model:
  %overall%
I always refer to the most recent user's guide which is posted on the website. See Example 7.6 in this user's guide. Version 1 is very old. Your research will be improved by updating to Version 4.1. There are many changes that have occurred since Version 1 that will help you in mixture modeling particularly the addition of random starts.
Peter Croy posted on Monday, May 14, 2007 - 1:24 pm
I'm just starting to model a categorical outcome variable (a binary yes/no DV) and have generated a model as the first step toward a chi-square difference test, following the instructions on page 314 of the guide. The SAVEDATA: DIFFTEST IS deriv.dat; command saves a .dat file that contains an error message, namely: The input file does not contain valid commands. What does this suggest?
I have four continuous latent variables (IVs), all measured by continuous indicators. However, the dependent variable is a single observed binary variable. I understand Mplus can incorporate this relationship, but how?
1. What correlation matrix is used? Does the analyzed matrix include both Pearson and biserial correlations even though I have just one binary variable in the structural part?
2. What estimation method should I use?
3. Is the relationship in the measurement part (between the continuous latent variables and their continuous indicators) described by linear regression equations separately from the structural part (between the four continuous latent variables and the one binary observed dependent variable), or are all relationships described by probit or logistic regression equations?
Let's say you have four factors, f1 through f4, and one dependent binary variable, u1. You would specify the model as:
f1 BY ...
f2 BY ...
f3 BY ...
f4 BY ...
u1 ON f1-f4;
1. If you are using weighted least squares, the correlation depends on the scale of the two dependent variables.
2. You can use weighted least squares which is the default or maximum likelihood.
3. The type of regression coefficient is determined by the scale of the dependent variable. Factor indicators are dependent variables. When they are continuous, the regression coefficient is a linear regression coefficient. The regression of the categorical dependent variable on the continuous latent variables is probit or logistic depending on the estimator and link. It is probit for weighted least squares. It is a logistic regression coefficient for maximum likelihood using the default logit link. It is probit using the probit link.
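To see how the probit and logit scales relate, the sketch below computes P(u = 1) under both links for the same linear predictor eta (in Mplus terms, minus the threshold plus the slope times the factor). It is plain Python with hypothetical names; the scaling constant 1.702 is the usual rule of thumb that makes the two curves nearly coincide, which is why logistic regression coefficients tend to be roughly 1.7 times the corresponding probit coefficients.

```python
import math


def probit_p(eta: float) -> float:
    """P(u = 1) under the probit link for linear predictor eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def logit_p(eta: float) -> float:
    """P(u = 1) under the logit link for linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))


# Rule of thumb: a logit coefficient is about 1.7 times a probit coefficient.
LOGIT_PER_PROBIT = 1.702

# Largest gap between the probit curve and the rescaled logit curve on [-6, 6].
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(probit_p(z) - logit_p(LOGIT_PER_PROBIT * z)) for z in grid)
```

The two links give nearly the same probabilities, so the choice mainly changes the scale of the reported coefficients, not the substantive conclusions.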
Dear Dr. Muthen, I'd like to know which Mplus 5.2 product I should purchase to perform latent class analysis with four manifest variables (diagnostic tests) in 400 animals, in order to estimate the true disease status and the sensitivity and specificity of each test:
- Mplus Base Program
- Mplus Base Program and Mixture Add-On
- Mplus Base Program and Multilevel Add-On
- Mplus Base Program and Combination Add-On
My study is similar to Nyankomo Wambura Marwa's study posted on this board.
I'd also like to ask whether Mplus 3 can perform this type of analysis. Best regards
I conducted an EFA and found two items (e6 and e12) that loaded strongly on the intended factor (E) as well as on another factor (N). I then conducted a CFA without the cross-loadings, and the model fit very well. I then ran another CFA allowing the first cross-loading item (e6) to cross-load, and the model was not identified. The same was true when I ran another CFA allowing e12 instead of e6 to cross-load: it was also not identified. In the first CFA, i.e., with no cross-loadings, there was an extremely high MI (e.g., >200) for e6 with e12.
1.) Is there a connection between the strong cross-loadings of e6 and e12 in the EFA and the strong correlation (i.e., large MI value) between e6 and e12 in the first CFA (i.e., where no cross-loadings were allowed)? For example, does the strong correlation in the CFA manifest itself as a large cross-loading in the EFA?
2.) How can e6 and e12 have large cross-loadings on another factor (N) in the EFA while the CFA models were not identified when the items were allowed to cross-load, separately, on E and N?
1) If e6 and e12 have something in common not covered by your E and N factors, then EFA covers the extra correlation by a cross-loading because EFA does not allow residual correlations. You can look at the MIs for the EFA, which probably point to a residual correlation. You can use ESEM to correlate residuals. The CFA tells you that correlating the residuals is better - that again points to the two items both being influenced by another factor.
2) You have to send the model to support for us to see the specific features of the CFA.
mpduser1 posted on Wednesday, August 17, 2011 - 11:07 pm
I want to verify that I'm interpreting Mplus output and Chapter 14 of the User's Guide correctly.
I'm estimating a model where I have a dichotomous outcome, Y (0 = no, 1 = yes), a dichotomous predictor, X, and a latent class variable that I'm specifying as a predictor, L. L has three classes.
I obtain "logistic regression intercepts" for Y from the latent class portion of my model as follows: Class 1: Y$1 = 1.32 Class 2: Y$1 = .80 Class 3: Y$1 = .20.
For X = 0, am I correct that this indicates that 52% of respondents who respond "yes" to Y are in Class 1; 30.9% who respond "yes" to Y are in Class 2; 16.99% who respond "yes" to Y are in Class 3?
I obtain this via: EXP(1.32) = 3.74; EXP(.80) = 2.22; EXP(.20) = 1.22, with proportion in Class 1 as: 3.74 / (3.74 + 2.22 + 1.22).
It is simpler than that (you are halfway slipping into multinomial regression). With a binary Y you have:
P(Y=1 | Class k) = 1/[1+exp(-logit)],
where the logit = -threshold. So, for example for Class 1, you have exp(-logit) = exp(+threshold) = exp(+1.32) and get P = 0.21.
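The conversion in the reply can be written out as a small Python sketch. The function name is hypothetical, and it assumes, as stated above, that the logit equals minus the threshold. With the Class 1 threshold of 1.32 it reproduces P of about 0.21.

```python
import math


def prob_from_threshold(threshold: float) -> float:
    """P(Y = 1 | class) for a binary item from its threshold.

    logit = -threshold, so P = 1 / (1 + exp(-logit)) = 1 / (1 + exp(threshold)).
    """
    return 1.0 / (1.0 + math.exp(threshold))


# Class-specific thresholds quoted earlier in the thread (at X = 0).
thresholds = {"Class 1": 1.32, "Class 2": 0.80, "Class 3": 0.20}
probs = {label: prob_from_threshold(t) for label, t in thresholds.items()}
```

Unlike the calculation in the question, these three numbers are the probability of "yes" within each class, not shares of the "yes" group, so they need not sum to 1.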
mpduser1 posted on Thursday, August 18, 2011 - 12:55 am
That was actually my original thinking.
In one model, however, I get thresholds along the lines of:
Class 1: 4.197 Class 2: 3.589 Class 3: 3.330.
So as I understand your calculation this would indicate that for respondents for whom X = 0 (approximately) only 1.5%, 2.7%, and 3.4% responded "yes" to Y.
These numbers seem small; especially given that approximately 20% of my total sample responded "yes" to item Y; and Class 1 is 24% of my sample, Class 2 is 47% of my sample, and Class 3 is 29% of my sample.
Am I being thrown off here because I'm only looking at the "X=0 segment" of the sample in this particular application?
That's hard to say - try it - but it will change the class proportions at X=0 (assuming X has an effect on L).
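One way to see how small class-specific probabilities relate to the roughly 20% "yes" overall is to average the class-specific probabilities over the classes. The Python sketch below does this with the thresholds and class proportions quoted above. It assumes the overall class proportions also hold at X = 0, which, as the reply notes, will not be true if X affects L, so it is only a rough check. With these numbers the mixture gives roughly 2.6% at X = 0.

```python
import math


def prob_from_threshold(threshold: float) -> float:
    """P(Y = 1 | class) = 1 / (1 + exp(threshold)) for a binary item."""
    return 1.0 / (1.0 + math.exp(threshold))


thresholds = [4.197, 3.589, 3.330]   # Class 1-3 thresholds quoted in the thread
class_shares = [0.24, 0.47, 0.29]    # overall class proportions quoted in the thread

conditional = [prob_from_threshold(t) for t in thresholds]
# Assumption: the overall class shares also apply at X = 0 (not true if X affects L).
marginal_at_x0 = sum(pi * p for pi, p in zip(class_shares, conditional))
```

If the sample-wide 20% is driven by respondents with X different from 0, a low value at X = 0 is not by itself a contradiction.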
mpduser1 posted on Thursday, August 18, 2011 - 2:25 pm
Running the model as a path model doesn't change anything.
In the problematic example I noted above (with the large intercepts), X is actually a vector of covariates. I'll have to reduce the complexity of the model. Seems latent class models of either type can't be too rich. Not really sure why that is, but you can start to see it in the SEs.
I'll have to go back and build the model piece-wise.
Alfred Mbah posted on Tuesday, September 20, 2011 - 12:24 pm
Dear Dr. Muthen, how do you assign binary outcome variables (e.g., u1, u2, u3) to a categorical latent variable (with 2 classes, for example) in Mplus? TYPE = MIXTURE does not seem to do this.
If I read in a polychoric correlation matrix for a set of dichotomous indicators (being told to do this by a professor), using TYPE = CORRELATION, is it possible to specify the correlations as polychoric? If so, how?
You can do this using only the ULS estimator. You do not need to use the CATEGORICAL option. With any other estimator you would need a weight matrix for correct results. It is much better to analyze raw data.
It seems that Mplus generally uses only logit and probit models for categorical dependent variables (with the MLR and WLSMV estimators, respectively). Is there any way to run a linear probability model with robust standard errors? Is that the model used when I do not declare my dependent variable as categorical?
In principle they are robust, but in practice I am not sure how robust the results would be given that the coefficients are biased. You would need to do a Monte Carlo simulation study, which is easy to do in Mplus.
Hello, I am a new Mplus user attempting to run an SEM model with five exogenous and eight endogenous variables, all of which are categorical. Of the two dependent latent variables, one is made up of three categorical rating-scale items; the other is a single binary outcome. I have reviewed much of the documentation on how to set up this type of model and consulted with peers; however, when I run the syntax, the time between iterations is very long. I have left the program running for over 15 minutes and observed only 115 iterations. If I remove the dichotomous outcome variable, the model converges, so I do not believe there is a problem with the code. This is my first attempt at running an SEM model with categorical outcome variables, so any help would be appreciated. Thank you.
I am new to this website, and I'm very interested in knowing whether the following model can be run in Mplus:
- My dependent variable is an unordered categorical variable (not latent, just an observed nominal variable with 4 categories).
- My predictors are 4 continuous latent variables (they are reflective and have ordinal indicators), and 2 of them are mediators in the model.
According to your user's guide, it would be something like this:
VARIABLE: NAMES = u1-u16;
  CATEGORICAL = u2-u16;
  NOMINAL = u1;
MODEL: f1 BY u2-u4;
  f2 BY u5-u7;
  f3 BY u8-u10;
  f4 BY u11-u16;
  f1 ON f3;
  f2 ON f3 f4;
  u1 ON f1 f2 f4;
MODEL INDIRECT: u1 IND f2 f4;
Can Mplus run this model?
Does Mplus run multinomial logistic regression for the most endogenous paths (I mean between u1 and f1 f2 f4)?
Is it possible to model some of those latent variables (e.g.: f3) as formative?
This is possible, although Model Indirect is not available with a nominal distal outcome because indirect effects with a nominal distal outcome need special attention to what is meant by the indirect effect - it is not simply of the "a*b" kind. For a proper handling of this, see the paper (with Mplus scripts):
Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.
Formative factors are straightforward. Multinomial logistic regression is used where u1 is the DV.
Cecily Na posted on Thursday, September 12, 2013 - 1:38 pm
Hello Professors, I have several count variables (range 0-200) which I want to treat as continuous variables, but they are heavily skewed (skewness=50). Can I still treat them as continuous and use the MLR estimator? I also tried to normalize their distributions by transformation based on their frequency percentage. I recode these variables into 5- or 6-point ordinal variables (0-1 recoded into 0, 2-10 recoded into 1 and so on), and want to treat them as continuous in my analysis. However, I need to incorporate weights in my analysis, and the transformation is done regardless of the weights. Which method is better (after I apply weights)? Thank you very much for your advice!
You should not transform count variables. You should not treat them as continuous if they have a lot of zeroes. You can treat them as count variables or perhaps censored variables. I don't see any advantage of collapsing values unless there is a substantive reason.
Cecily Na posted on Friday, September 13, 2013 - 12:40 pm
Thank you, Linda, for your reply. The reason I do not want to treat them as count variables is that I suspect the measurement is pretty lousy (not real counts), and I would rather collapse them into several (continuous) ranges. So the focus of my question this time is: would it be okay if the unweighted data are normal and I then apply weights and use MLR for estimation? Thanks a lot!
Lily Wang posted on Saturday, October 12, 2013 - 8:50 pm
Hello professors, I am fitting a multilevel path model with a multinomial outcome (3 categories). So I have odds ratios for category 1 vs 3 and 2 vs 3. I was wondering if there are ways for me to obtain odds ratios of outcomes 1 vs 2, WHILE knowing if the estimates are statistically significant. I look forward to your advice on this! Thanks.
You can use the DEFINE command to change the reference category of the nominal variable.
Lily Wang posted on Monday, October 14, 2013 - 4:24 pm
Thank you so much, Linda. It sounds like a great option; I just wanted to make sure whether there are statistical/reporting implications of this. Would you say that it is appropriate to report the odds ratio of 1 vs 2 alongside the other two pairs (1 vs 3) and (2 vs 3), for which the base category is the same and which were generated by running the model with 3 as the base? Are there any statistical issues that I need to address in this situation? Thanks again for your guidance!
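For a single-level multinomial logit with fixed coefficients, the 1 vs 2 contrast is implied by the same fitted model: its log-odds coefficient is the 1 vs 3 coefficient minus the 2 vs 3 coefficient, and its variance is var(b13) + var(b23) - 2 cov(b13, b23). The Python sketch below computes the odds ratio and a Wald 95% interval under that assumption; it needs the sampling covariance of the two coefficients (in Mplus, TECH3 prints the estimated covariance matrix of the parameter estimates). Function and argument names are hypothetical, and the numbers in the usage line are made up. In a multilevel model with random intercepts the reparameterization may not be exact, so re-running with a different reference category via DEFINE, as suggested above, is the safer check.

```python
import math


def odds_ratio_1_vs_2(b13, b23, var_b13, var_b23, cov_b13_b23, z_crit=1.96):
    """Odds ratio and Wald CI for category 1 vs 2 from a model with category 3 as reference.

    log-odds(1 vs 2) = log-odds(1 vs 3) - log-odds(2 vs 3).
    """
    b12 = b13 - b23
    se = math.sqrt(var_b13 + var_b23 - 2.0 * cov_b13_b23)
    return math.exp(b12), math.exp(b12 - z_crit * se), math.exp(b12 + z_crit * se)


# Made-up inputs for illustration only.
or_12, lower, upper = odds_ratio_1_vs_2(0.8, 0.3, 0.04, 0.09, 0.01)
```

Because all three contrasts come from the same fitted model, reporting them side by side is internally consistent; the 1 vs 2 standard error simply needs the covariance term.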
My dependent variable is a post-treatment sum of ache-related symptoms, which is a count variable. I wanted to use poisson regression or negative binomial regression to examine the effects of several independent variables on this count outcome variable.
My problem is that one of my independent variables is also a count variable, pre-treatment sum of ache-related symptoms.
What would be the best approach in this case? Would it be okay to just include the pre-treatment count as an independent variable?
I have a cross-lagged model with:
- one dichotomous outcome (dependent variable = wanting a social service (yes/no))
- a range of predictors: dichotomous (gender) as well as continuous (income)
I use WLSMV as an estimator because my dependent variable is dichotomous (yes/no).
Now my questions: 1) Is it correct that the estimates that I get are probit regression coefficients?
2) I get Model Results (with P-values) and STDYX results (without P-values): which numbers do I need to report?
- Do I report the standardized estimates (using the P-values of the unstandardized model results) so that I can compare the strength of the paths?
- Or do I just report the unstandardized results (and then can I not say anything about the strength of the associations)?
- Do you know of a paper that reports the outcomes of a similar analysis?
Many thanks for this information, best regards, Dirk
(1) Most articles publish beta coefficients plus a P-value, while I do not get a P-value for the STDYX coefficients. Can I just use the P-value of the unstandardized model results for the standardized results too? I suspect not?
(2) The interpretation of these numbers is also not clear to me. Is this interpretation correct?
a) An unstandardized probit regression coefficient of 0.46 between income and wanting a social service (0/1) would mean that an increase of 1 dollar in income means an increase in the Z-score of wanting a social service of 0.46.
b) A standardized probit regression coefficient of 0.22 between income and wanting a social service (0/1) would mean that an increase in income of 1 standard deviation leads to an increase in the Z-score of wanting a social service of 0.22.
c) One could also translate this information into probabilities using the thresholds and the standardized/unstandardized(??) coefficients, although I do not understand how.
Could you help me solve these four queries ((1) and (2)a, b & c?)
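For question (2)c, a probit regression gives P(u = 1 | x) = Phi((-threshold + coef * x) / sqrt(residual variance of u*)), using the unstandardized threshold and coefficient. The residual variance of u* is 1 under the THETA parameterization; under the default delta parameterization it equals 1 only in simple cases (for example, when u is regressed only on exogenous observed covariates), so check what applies in your model. The Python sketch below is a minimal illustration under that assumption; the function names and the threshold value are hypothetical, and only the coefficient 0.46 comes from (2)a.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with math.erf so no extra packages are needed."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_yes(threshold: float, coef: float, x: float, resid_var: float = 1.0) -> float:
    """P(u = 1 | x) for a probit regression with one covariate.

    threshold and coef are unstandardized estimates; resid_var is the residual
    variance of the latent response u* (assumed to be 1 by default).
    """
    return normal_cdf((-threshold + coef * x) / math.sqrt(resid_var))


# Hypothetical threshold of 1.0; coefficient 0.46 taken from question (2)a.
p_low = prob_yes(threshold=1.0, coef=0.46, x=0.0)
p_high = prob_yes(threshold=1.0, coef=0.46, x=1.0)
```

With these inputs, moving x from 0 to 1 raises the predicted probability from about 0.16 to about 0.29. Because the probit curve is nonlinear, the same one-unit change gives a different change in probability at other starting values of x.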