Message/Author 

Anonymous posted on Thursday, March 13, 2003  1:32 pm



I have data with a binary outcome and a binary mediating variable. My exogenous variables are a mixture of latent and observed variables. Can I use Mplus Version 2 to analyze these data?


Yes, it may be necessary to use our THETA parameterization for this model. If so, when you run the program, it will tell you. 

Angela posted on Monday, April 21, 2003  5:41 pm



I have estimated a simple path model (no latent variables) with two measured continuous variables and four measured categorical variables. A reviewer is asking me to specify what type of correlations (tetrachoric, polychoric, Pearson) I am reporting in the correlation matrix. There's no indication in the Mplus output or in the manual of what type of correlations are reported in the "CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)" section. Could you clear this up for me? Thank you!


The type of correlation is determined by the scale of the two variables. If both are continuous, the correlation is a Pearson product moment correlation. If both are binary, it is a tetrachoric correlation. If both are ordered polytomous, it is a polychoric correlation. If one is dichotomous and one is continuous, it is a biserial correlation. If one is ordered polytomous and one is continuous, it is a polyserial correlation. I hope this answers your question. 
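The rule in this reply can be written as a small lookup. The following is only an illustrative Python sketch, not Mplus code: the function name `correlation_type` and the scale labels are made up for this example, and pairings the reply does not mention (such as binary with ordered polytomous) are deliberately left out.

```python
def correlation_type(scale_a: str, scale_b: str) -> str:
    """Name the correlation for a pair of variable scales, following the reply above.

    Scale labels ("continuous", "binary", "ordinal") are illustrative assumptions.
    """
    # frozenset ignores order, and a same-scale pair collapses to one element.
    table = {
        frozenset(("continuous",)): "Pearson product-moment",
        frozenset(("binary",)): "tetrachoric",
        frozenset(("ordinal",)): "polychoric",
        frozenset(("binary", "continuous")): "biserial",
        frozenset(("ordinal", "continuous")): "polyserial",
    }
    pair = frozenset((scale_a, scale_b))
    if pair not in table:
        # Pairings not listed in the reply are not guessed at here.
        raise ValueError(f"pairing not covered by the reply: {scale_a}, {scale_b}")
    return table[pair]


print(correlation_type("binary", "binary"))       # tetrachoric
print(correlation_type("continuous", "ordinal"))  # polyserial
```

For the model in the question (two continuous and four categorical variables), this means the reported matrix mixes several correlation types, one per pair of variables.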

Angela posted on Monday, April 21, 2003  7:34 pm



You answered my question perfectly. I was writing my responses to the reviewer exactly the way you described them, but wanted to know for sure I was saying the correct thing. Thank you so much!! One more quick question: is there any way to get Mplus to print statistical significance levels for these correlations?


If you ask for TYPE=BASIC, you will get the correlations and also the standard error of each correlation. Dividing a correlation by its standard error gives a ratio that can be treated like a z-test.
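As a rough illustration of the ratio described in this reply, here is a minimal Python sketch. The function name `z_test` and the numbers are hypothetical (they do not come from any output in this thread), and the two-sided p-value assumes the ratio is referred to a standard normal distribution.

```python
import math


def z_test(correlation: float, standard_error: float) -> tuple:
    """Return the z ratio and a two-sided p-value from a standard normal reference."""
    z = correlation / standard_error
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical correlation and standard error, for illustration only.
z, p = z_test(0.30, 0.10)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```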

Greg posted on Wednesday, July 30, 2003  3:12 pm



I have a model (3 exogenous and 3 endogenous latent constructs) with a binary categorical outcome. The other variables are partly ordinal, partly continuous. As my data are skewed and the sample size is large enough (n=1000), I would like to use the WLS estimator. Do I have to apply a biserial correlation (as computed in PRELIS), or do I have to compute point-biserial correlations? If the latter is true, is it possible to compute the point-biserial correlations and the corresponding weight matrix with Mplus? Or do I have to use latent class analysis instead?


You need to have raw data. Mplus will compute the appropriate correlations for the scale of the variable. For example, with a dichotomous variable and a continuous variable, a biserial correlation is computed. 

Anonymous posted on Friday, August 15, 2003  7:13 pm



I have two dichotomous outcome measures (Y1 and Y2) that are supposed to be correlated with each other. The two outcomes were modeled simultaneously using Mplus 2.14, being predicted by the same set of Xs. With "TYPE=GENERAL," Mplus provides probit regression coefficients for the two outcomes. However, the model fit statistics/indexes are confusing: Chi-Square=51.841, d.f.=13, P-Value=0.0000; CFI=1.000, TLI=1.000; RMSEA=0.000; and WRMR=0.000. When some Xs were removed from the model, CFI or TLI sometimes had a value greater than 1.0. What appropriate model fit statistics/indexes should I use for the simultaneous probit model? In addition, Mplus provides the correlation between Y1* and Y2*, which are the underlying latent variables of Y1 and Y2, respectively. How should I interpret this correlation given the fact that the probit model has no residual term? Setting the correlation to 0.0 didn't change the coefficient estimates and their standard errors, but it changed the chi-square statistic and other fit indexes (e.g., CFI changed from 1.000 to 0.234, and TLI changed from 1.000 to -7.424; why a negative TLI?). Was the correlation between Y1* and Y2* factored into the modeling? Your help will be highly appreciated!

bmuthen posted on Saturday, August 16, 2003  10:51 pm



Please send the output from both of the models you mention to support@statmodel.com so that your full results can be studied. Regarding the residual correlation between y1* and y2*, the probit model does have a residual term. Although the variance of the residual is standardized to one, it is possible to estimate residual correlations with multivariate outcomes. If residual correlations are given in the Model Results section, they are estimated (see also Tech1 regarding which parameters are estimated). 

Anonymous posted on Thursday, August 21, 2003  5:23 pm



Hi, I am doing path analysis with several categorical outcomes. The dependent variable and the mediating variable are both binary. How do I specify the THETA parameterization? Is there an example? Thanks

bmuthen posted on Thursday, August 21, 2003  10:10 pm



Here is an example:

TITLE: this is an example of a Monte Carlo simulation study for a path analysis
DATA: FILE = firstcat.dat;
VARIABLE: NAMES = y1-y3 x1-x3;
  CATEGORICAL = y1-y3;
ANALYSIS: PARAMETERIZATION = THETA;
MODEL: y1 y2 ON x1-x3*.5;
  y3 ON y1-y2*.5 x1-x3*.3;
OUTPUT: STANDARDIZED;

davood posted on Thursday, December 02, 2004  3:34 pm



Dear Dr. Muthen, I am doing a two-group CFA using categorical indicators. Mplus generated this error:

*** ERROR Group 5 does not contain all values of categorical variable: DADCARE1

Group 5 is Puerto Ricans. When I ran the model for just Puerto Ricans, I did not get this error. I also ran the frequencies and checked DADCARE1 for group 5. Here is the result:

dadcare1
Value    Frequency
999      156
1.00       2
2.00       2
3.00      11
4.00      36
5.00     191
Total    398


This is likely because of listwise deletion: the analysis does not use all 398 observations, and without all 398 you may not have all values of DADCARE1. Or you may be reading your data incorrectly in Mplus. If you can't figure it out, send your output and data to support@statmodel.com.


Dear Dr. Muthen, I am having trouble replicating results given on a UCLA website that uses Mplus for examples from the multilevel book by Snijders and Bosker, Chapter 14: http://www.ats.ucla.edu/stat/mplus/examples/ma_snijders/chap14.htm (it provides Mplus syntax and data for a multilevel binary logistic regression).

The website reports this result:

Variances COHAB 0.032 0.019 1.635

My own run (same data, same syntax) gives several serious problems and:

Variances COHAB 0.000

My own data also yielded several serious problems and:

Variances XYZ 0.000

Is there a problem in the Mplus syntax for the "empty logistic model"?

Title: multilevel logistic regression
Data: File is ch14_1.dat ;
Variable: Names are respnr ltbm eltwm sex age edu relserv class reg cohab;
  Missing are all (9999) ;
  usevar are cohab reg;
  categorical is cohab;
  cluster is reg;
Analysis: Type = twolevel random ;
model: %between% cohab;

Thanks in advance


What version of Mplus are you using? What version did they use? You can see this by looking at their output if they provide the full output.


Thanks for the quick response. The UCLA version is not given. How would you specify the model in Mplus syntax (binary outcome, only the level-2 cluster as explanatory variable)?


The Mplus input is on the UCLA website. If you cannot get this to run, you need to send the input, data, output, and your license number to support@statmodel.com. 


Dear Prof. Muthen, I am fitting latent class models in Mplus. I have four categorical manifest variables, and I am interested in estimating the true disease status given the four diagnostic tests. There is a high likelihood that my manifest variables are correlated. Intuitively, one latent class variable with two levels would be ideal, but when I try it the model does not fit. Only when I go up to four levels does the model fit better. My questions are as follows. 1. How do we detect dependence among the manifest variables in Mplus? 2. Given that there is dependence, how do we model it? I saw in the Mplus User's Guide for Version 1 (2000), page 13, that the WITH statement is used for correlations among continuous variables but is not applicable to categorical variables. Thanks in advance for your help.


The manifest variables should correlate. Their correlations are assumed to be zero within each class, but the variables will correlate when the classes are mixed. You can look at TECH10 to check for dependence among the manifest variables within class. The WITH statement is used with categorical observed variables in some models. With latent class analysis, you need to put a factor behind the two indicators to specify a correlation. See Example 7.16.


Dear Linda, thanks for your quick response, but I have one more question. In your response you referred me to Example 7.6 on how to use the WITH statement for categorical variables. It is not clear to me whether this is Example 7.6 from the user's manual or from somewhere else. Just to remind you, I have the user's guide for Mplus Version 1. See your response attached below for reference. Hoping to hear from you soon. With regards, Nyankomo Marwa


Dear Linda, here is my syntax, but I am not quite sure how to include the WITH statement to model dependence among A, B, D, and H. I tried to use A B WITH D H, but it does not work. Thanks once again for your esteemed support.

Data: File is D:\mp.txt;
Variable: Names are A B D H count;
  usevariables are A B D H count;
  FREQWEIGHT is count ;
  categorical are A B D H;
  Missing are all (9999) ;
  classes = cl(2);
Analysis: Type = mixture;
model: %overall%
output: tech10;


I always refer to the most recent user's guide which is posted on the website. See Example 7.6 in this user's guide. Version 1 is very old. Your research will be improved by updating to Version 4.1. There are many changes that have occurred since Version 1 that will help you in mixture modeling particularly the addition of random starts. 

Peter Croy posted on Monday, May 14, 2007  1:24 pm



I'm just starting to model a categorical outcome variable (a binary y/n DV) and have generated a model as the first step toward a chi sq difference test per instructions on page 314 of the guide. Using the savedata: difftest is deriv.dat command saves a dat file which contains an error message, namely: The input file does not contain valid commands. What does this suggest? 


It is difficult to say without more information. If you send your input, data, output, and license number to support@statmodel.com, I can take a look at it. 


Dear Dr. Muthen, I am a new user of Mplus and I really like it. I have 4 continuous latent variables (IVs), all measured by continuous indicators. However, the dependent variable is a single observed binary variable. I understand Mplus can incorporate this relationship, but how? 1. What correlation matrix is used? Does the matrix analyzed include both Pearson and biserial correlations even though I have just one binary variable in the structural part? 2. What estimation method should I use? 3. Is the relationship in the measurement part (between the continuous latent variables and their continuous indicators) described by a linear regression equation, separately from the structural part (between the 4 continuous latent variables and the one binary observed dependent variable), or are all relationships described by a probit or logistic regression equation? I am looking forward to hearing from you soon. I truly appreciate your time and help.


Let's say you have four factors, f1 through f4, and one dependent binary variable, u1. You would specify the model as:

f1 BY ...
f2 BY ...
f3 BY ...
f4 BY ....
u1 ON f1-f4;

1. If you are using weighted least squares, the correlation depends on the scale of the two dependent variables.
2. You can use weighted least squares, which is the default, or maximum likelihood.
3. The type of regression coefficient is determined by the scale of the dependent variable. Factor indicators are dependent variables. When they are continuous, the regression coefficient is a linear regression coefficient. The regression of the categorical dependent variable on the continuous latent variables is probit or logistic depending on the estimator and link. It is probit for weighted least squares. It is a logistic regression coefficient for maximum likelihood using the default logit link. It is probit using the probit link.


Dear Dr. Muthen, I'd like to know which Mplus 5.2 product to purchase:

Mplus Base Program
Mplus Base Program and Mixture Add-On
Mplus Base Program and Multilevel Add-On
Mplus Base Program and Combination Add-On

I want to perform latent class analysis with 4 manifest variables (diagnostic tests) in 400 animals to estimate the true disease status and the sensitivity and specificity of each test. My study is similar to Nyankomo Wambura Marwa's study posted in this thread. I'd also like to ask whether Mplus 3 can perform this type of analysis. With best regards


I think the Mplus Base Program and Mixture Add-On would be the correct choice.


I conducted an EFA and found two items (e6 and e12) that loaded strongly on the intended factor (E) as well as on another factor (N). I then conducted a CFA without the cross-loadings, and the model fit very well. I then ran another CFA allowing the first cross-loading item (e6) to cross-load, and the model was not identified. The same was true when I ran another CFA allowing e12 instead of e6 to cross-load: it was also not identified. In the first CFA, i.e., with no cross-loadings, there was an extremely high MI (e.g., >200) for e6 with e12. 1.) Is there a connection between the strong cross-loadings of e6 and e12 in the EFA and the strong correlation (i.e., large MI value) between e6 and e12 in the first CFA (i.e., where no cross-loadings were allowed)? For example, does the strong correlation in the CFA manifest itself as a large cross-loading in the EFA? 2.) How can e6 and e12 have large cross-loadings on another factor (N) in the EFA when the models were not identified in the CFA once the items were allowed to cross-load, separately, on E and N? Any thoughts on either question?


1) If e6 and e12 have something in common not covered by your E and N factors, then EFA covers the extra correlation by a cross-loading because EFA does not allow residual correlations. You can look at the MIs for the EFA, which probably point to a residual correlation. You can use ESEM to correlate residuals. The CFA tells you that correlating the residuals is better, which again points to the two items both being influenced by another factor. 2) You have to send the model to support for us to see the specific features of the CFA.

mpduser1 posted on Wednesday, August 17, 2011  11:07 pm



I want to verify that I'm interpreting Mplus output and Chapter 14 of the User's Guide correctly. I'm estimating a model where I have a dichotomous outcome, Y (0 = no, 1 = yes), a dichotomous predictor, X, and a latent class variable that I'm specifying as a predictor, L. L has three classes. I obtain "logistic regression intercepts" for Y from the latent class portion of my model as follows: Class 1: Y$1 = 1.32; Class 2: Y$1 = .80; Class 3: Y$1 = .20. For X = 0, am I correct that this indicates that 52% of respondents who respond "yes" to Y are in Class 1, 30.9% who respond "yes" to Y are in Class 2, and 16.99% who respond "yes" to Y are in Class 3? I obtain this via: EXP(1.32) = 3.74; EXP(.80) = 2.22; EXP(.20) = 1.22, with the proportion in Class 1 as: 3.74 / (3.74 + 2.22 + 1.22). Thank you.


It is simpler than that (you are halfway slipping into multinomial regression). With a binary Y you have: P(Y=1 | Class k) = 1/[1+exp(-logit)], where the logit = -threshold. So, for example, for Class 1 you have exp(-logit) = exp(+threshold) = exp(+1.32) and get P = 0.21.
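The calculation in this reply can be checked with a few lines of Python. This is only a sketch of the arithmetic, with a made-up function name `prob_yes`; it uses the fact that the logit is the negative of the threshold, so P(Y=1 | class) = 1/(1 + exp(threshold)).

```python
import math


def prob_yes(threshold: float) -> float:
    """P(Y=1 | class) for a binary outcome, given the class-specific threshold.

    The logit is the negative of the threshold, so
    P = 1 / (1 + exp(-logit)) = 1 / (1 + exp(threshold)).
    """
    return 1.0 / (1.0 + math.exp(threshold))


# Thresholds quoted in the question above.
for label, threshold in (("Class 1", 1.32), ("Class 2", 0.80), ("Class 3", 0.20)):
    print(f"{label}: P(Y=1) = {prob_yes(threshold):.2f}")
```

For Class 1 this reproduces the 0.21 given in the reply. Each probability is conditional on class membership, so the three values are not expected to sum to one.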

mpduser1 posted on Thursday, August 18, 2011  12:55 am



That was actually my original thinking. In one model, however, I get thresholds along the lines of: Class 1: 4.197 Class 2: 3.589 Class 3: 3.330. So as I understand your calculation this would indicate that for respondents for whom X = 0 (approximately) only 1.5%, 2.7%, and 3.4% responded "yes" to Y. These numbers seem small; especially given that approximately 20% of my total sample responded "yes" to item Y; and Class 1 is 24% of my sample, Class 2 is 47% of my sample, and Class 3 is 29% of my sample. Am I being thrown off here because I'm only looking at the "X=0 segment" of the sample in this particular application? 


Maybe X=0 is below the X mean. And perhaps X also influences L. 

mpduser1 posted on Thursday, August 18, 2011  10:43 am



If X does influence L, is the model better rendered as: X -> L -> Y and X -> Y? Would this adjust the otherwise large intercepts? Thanks.


That's hard to say (try it), but it will change the class proportions at X=0 (assuming X has an effect on L).

mpduser1 posted on Thursday, August 18, 2011  2:25 pm



Running the model as a path model doesn't change anything. In the problematic example I noted above (with the large intercepts), X is actually a vector of covariates. I'll have to reduce the complexity of the model. Seems latent class models of either type can't be too rich. Not really sure why that is, but you can start to see it in the SEs. I'll have to go back and build the model piecewise. Thank you. 

Alfred Mbah posted on Tuesday, September 20, 2011  12:24 pm



Dear Dr Muthen How do you assign categorical outcome variables with 2 categories(e.g. u1, u2, u3) to a categorical latent variable (with 2 classes for example) in MPLUS? The Type = mixture command does not seem to do this. 


See Example 8.12 in particular page 222. 


Hi Linda, If I read in a polychoric correlation matrix for a set of dichotomous indicators (being told to do this by a professor), using TYPE = CORRELATION, is it possible to specify the correlations as polychoric? If so, how? 


You can do this using only the ULS estimator. You do not need to use the CATEGORICAL option. With any other estimator you would need a weight matrix for correct results. It is much better to analyze raw data. 

K. Tan posted on Friday, March 01, 2013  8:13 pm



Hi, It seems like MPLUS generally uses only logit and probit models for categorical dependent variables (using the MLR and WLSMV estimators respectively). Is there any way to run a linear probability model with robust standard errors? Is that the model used when I fail to declare my dependent variable as categorical?


If you do not put a categorical variable on the CATEGORICAL list, a linear regression is estimated for the categorical outcome. 

K. Tan posted on Saturday, March 02, 2013  1:37 am



Thank you! Can I assume that if I use the MLR estimator with this specification, the standard errors are robust to heteroskedasticity? 


In principle they are robust, but in practice I am not sure how robust the results would be given that the coefficients are biased. You would need to do a Monte Carlo simulation study, which is easy to do in Mplus.


Hello, I am a new MPLUS user and am attempting to run an SEM model with five exogenous and eight endogenous variables, all of which are categorical. Of the two dependent latent variables, one is made up of three categorical rating scale items; the other is a single binary outcome. I have reviewed much of the documentation on how to set up this type of model and consulted with peers; however, when I run the syntax the time between iterations is very slow. I have left the program running for over 15 minutes and observed only 115 iterations. If I remove the dichotomous outcome variable the model will converge, so I do not believe that there is a problem with the code. This is my first attempt at running an SEM model with categorical outcome variables, so any help would be appreciated. Thank you.


Please send your input, data, and license number to support@statmodel.com. 


Dear Dr. Linda and Dr. Bengt, I am new to your website, and I'm very interested in knowing whether the following model can be run in Mplus. My dependent variable is an unordered categorical variable (not latent, just an observed nominal variable with 4 categories). My predictors are 4 continuous latent variables (they are reflective and have ordinal indicators), and 2 of them are mediators in the model. According to your user's guide, it would be something like this:

Variables u1-u16
Categorical are u2-u16
Nominal u1
MODEL:
f1 by u2-u4
f2 by u5-u7
f3 by u8-u10
f4 by u11-u16
f1 ON f3
f2 ON f3 f4
u1 ON f1 f2 f4
MODEL INDIRECT
u1 IND f2 f4

Can Mplus run this model? Does Mplus run multinomial logistic regression for the most endogenous paths (I mean between u1 and f1 f2 f4)? Is it possible to model some of those latent variables (e.g., f3) as formative? Thank you very much, Isaac


This is possible, although Model Indirect is not available with a nominal distal outcome because indirect effects with a nominal distal outcome need special attention to what is meant by the indirect effect: it is not simply of the "a*b" kind. For a proper handling of this, see the paper (with Mplus scripts): Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. Formative factors are straightforward. Multinomial logistic regression is used where u1 is the DV.

Cecily Na posted on Thursday, September 12, 2013  1:38 pm



Hello Professors, I have several count variables (range 0-200) which I want to treat as continuous variables, but they are heavily skewed (skewness=50). Can I still treat them as continuous and use the MLR estimator? I also tried to normalize their distributions by a transformation based on their frequency percentages. I recoded these variables into 5- or 6-point ordinal variables (0-1 recoded into 0, 2-10 recoded into 1, and so on), and want to treat them as continuous in my analysis. However, I need to incorporate weights in my analysis, and the transformation is done regardless of the weights. Which method is better (after I apply weights)? Thank you very much for your advice!


You should not transform count variables. You should not treat them as continuous if they have a lot of zeroes. You can treat them as count variables or perhaps censored variables. I don't see any advantage of collapsing values unless there is a substantive reason. 

Cecily Na posted on Friday, September 13, 2013  12:40 pm



Thank you, Linda, for your reply. The reason that I do not want to treat them as count variables is I suspect the measurement is pretty lousy (not real counts), and I would rather collapse them into several (continuous)ranges. So the focus of my question this time is: Would it be okay if the unweighted data are normal, and then I apply weights and use MLR for estimation? Thanks a lot! 


Yes. 

Lily Wang posted on Saturday, October 12, 2013  8:50 pm



Hello professors, I am fitting a multilevel path model with a multinomial outcome (3 categories). So I have odds ratios for category 1 vs 3 and 2 vs 3. I was wondering if there are ways for me to obtain odds ratios of outcomes 1 vs 2, WHILE knowing if the estimates are statistically significant. I look forward to your advice on this! Thanks. 


You can use the DEFINE command to change the reference category of the nominal variable. 
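For the point estimate alone, the 1 vs 2 odds ratio already follows from the two sets of coefficients that share category 3 as the reference, because the log odds ratios subtract. What this arithmetic does not give is a standard error: that also needs the covariance between the two estimates, which is why refitting with a different reference category is the practical route to a significance test. A minimal Python sketch, with a made-up function name and hypothetical coefficients:

```python
import math


def odds_ratio_1_vs_2(b_1_vs_3: float, b_2_vs_3: float) -> float:
    """Odds ratio for category 1 vs 2 from multinomial logit coefficients
    that both use category 3 as the reference category."""
    # log OR(1 vs 2) = log OR(1 vs 3) - log OR(2 vs 3)
    return math.exp(b_1_vs_3 - b_2_vs_3)


# Hypothetical logit coefficients for one predictor, not taken from any real output.
b13, b23 = 0.9, 0.4
print(f"OR(1 vs 3) = {math.exp(b13):.3f}")
print(f"OR(2 vs 3) = {math.exp(b23):.3f}")
print(f"OR(1 vs 2) = {odds_ratio_1_vs_2(b13, b23):.3f}")
```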

Lily Wang posted on Monday, October 14, 2013  4:24 pm



Thank you so much, Linda. It sounds like a great option. I just wanted to check whether there are statistical/reporting implications of this. Would you say that it is appropriate to report the odds ratio of 1 vs 2 alongside the other two pairs (1 vs 3) and (2 vs 3), where the base category is the same and the estimates come from running the model with 3 as the base? Are there any statistical issues that I need to be addressing in this situation? Thanks again for your guidance!


I can't see any issues as long as you are clear on what you are reporting. 

Yoon Oh posted on Tuesday, May 05, 2015  7:54 pm



My dependent variable is a post-treatment sum of ache-related symptoms, which is a count variable. I wanted to use Poisson regression or negative binomial regression to examine the effects of several independent variables on this count outcome variable. My problem is that one of my independent variables is also a count variable, the pre-treatment sum of ache-related symptoms. What would be the best approach in this case? Would it be okay to just include the pre-treatment count as an independent variable? Your advice would be greatly appreciated.


Your only option is to include it and treat it as a continuous variable. All covariates in regression are treated as continuous variables. 


Dear, I have a cross-lagged model with:
- one dichotomous outcome (dependent variable = wanting a social service (yes/no))
- a range of predictors: dichotomous (gender) as well as continuous (income)

I use WLSMV as the estimator because my dependent variable is dichotomous (yes/no). Now my questions:

1) Is it correct that the estimates that I get are probit regression coefficients?
2) I get Model results (with p-values) and STDYX results (without p-values): which numbers do I need to report?
- Do I report the standardized estimates (using the p-values of the unstandardized model results) so that I can compare the strength of the paths?
- Or do I just report the unstandardized results (in which case, can I say nothing about the strength of the associations)?
- Do you know a paper that reports the outcomes of a similar analysis?

Many thanks for this information, best regards, Dirk
Prof. DirkJan De Vlaeminck


With WLSMV and a categorical dependent variable, the regression coefficients are probit regression coefficients. You should look in a journal where you might publish these results and see what they report. What to report varies by the discipline. 


Thanks for this information. I checked and I see two things: (1) Most articles publish beta coefficients plus a p-value, while I do not get a p-value for the STDYX coefficients. Can I just use the p-value of the unstandardized model results for the standardized results too? I suspect not? (2) The interpretation of these numbers is also not clear to me. Is this interpretation correct? a) An unstandardized probit regression coefficient of 0.46 between income and wanting a social service (0/1) would mean that an increase of 1 dollar in income means an increase in the z-score of wanting a social service by 0.46. b) A standardized probit regression coefficient of 0.22 between income and wanting a social service (0/1) would mean that an increase in income of 1 standard deviation leads to an increase in the z-score of wanting a social service by 0.22. c) One could also translate this information into probabilities using the thresholds and the standardized/unstandardized(??) coefficients, although I do not understand how. Could you help me solve these four queries ((1) and (2) a, b, and c)? Many thanks in advance, DirkJan


1. No, the two sampling distributions are different so the p-values are different. 2. See the STANDARDIZED option in the user's guide for interpretation of standardized coefficients. See Chapter 14 for translating probit regression coefficients into probabilities.
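As a rough illustration of the translation into probabilities, a probit model with one covariate gives P(u=1 | x) = Phi(-threshold + b*x), where Phi is the standard normal distribution function. The Python sketch below assumes a single covariate, an unstandardized coefficient, and a residual variance of one for the underlying latent response; the threshold value is hypothetical, and only the coefficient 0.46 echoes the question above. It is not a substitute for the user's guide.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def probit_prob(threshold: float, slope: float, x: float) -> float:
    """P(u=1 | x) = Phi(-threshold + slope * x), assuming unit residual variance."""
    return normal_cdf(-threshold + slope * x)


# Hypothetical threshold; the slope 0.46 is the coefficient quoted in the question.
threshold, slope = 0.5, 0.46
for x in (0.0, 1.0):
    print(f"x = {x:.0f}: P(u=1) = {probit_prob(threshold, slope, x):.3f}")
```

The coefficient shifts the z-score linearly, but the change in probability depends on where on the normal curve the shift happens, so it is worth computing probabilities at several covariate values.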
