
Benedetta posted on Wednesday, February 21, 2001  3:21 am



I read that Mplus has been greatly expanded in version 2. I have the first version and I'm very interested in it. I would like to know whether you use a full maximum likelihood approach to estimate a multilevel structural equation model with binary and/or ordered categorical indicator variables, and which type of approach. I would also like to know whether factor scores are available for this type of model.


These analysis options are not available in Mplus Version 2. I don't know that there is any program available that estimates this type of model. We are currently doing research on this topic. 

Anonymous posted on Saturday, July 20, 2002  8:16 pm



Hello, The default estimation method for EFA with categorical indicators is ULS. However, the default for CFA is WLSMV. What are the reasons for preferring ULS as the default in an EFA over WLSMV? 


The reason that ULS was chosen as the default for EFA is that it is much faster. It is a good way to look at a large set of categorical outcomes more quickly than using WLSMV.

CMW posted on Thursday, December 12, 2002  10:27 am



Greetings, For factor analysis, is it OK to have some items that are binary and others that are polytomous in a single analysis? 

bmuthen posted on Thursday, December 12, 2002  10:32 am



Yes. See page 180 of the Mplus User's Guide. 

Kyung Kim posted on Tuesday, April 01, 2003  6:07 am



Can I calculate factor scores with Mplus for continuous latent variables with categorical (e.g., 5-point or 6-point Likert-type) indicators? If not, are there other ways of doing this?


Mplus does estimate factor scores for ordered polytomous variables. You ask for them through the SAVEDATA command.

Per Togo posted on Tuesday, May 20, 2003  2:10 pm



I have used a CFA model with three factors and categorical variables (8 categories): x BY a b c d; y BY e d f g; z BY h i j k; I could not find a description of how the program computes the factor scores for variables with more than two categories (using CATEGORICAL, SAVEDATA, FSCORES). (I'm not that experienced in matrix calculations and need a description for a paper.) Can you help me out?


Appendix 11 of the Mplus User's Guide contains a description of how factor scores are estimated in Mplus. The description for binary variables which begins with formula 227 applies also to polytomous variables. For categorical outcomes, factor score estimation is an iterative technique. 

Anonymous posted on Monday, June 02, 2003  7:17 am



I would like to know how Mplus estimates the thresholds.

bmuthen posted on Monday, June 02, 2003  7:46 am



When there are no x variables (covariates) in the model, the thresholds are simply the normal z scores corresponding to the univariate probabilities. With x's, they are obtained via probit regression. 
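A minimal Python sketch of the no-covariate case described above, for readers who want to check a threshold by hand. The function name and the example counts are hypothetical, and this only illustrates the "z score of the cumulative proportion" idea; it is not Mplus's own code and does not cover the probit-regression case with x's.

```python
from statistics import NormalDist


def thresholds_from_counts(counts):
    """Return thresholds (normal z scores) for one ordered categorical item.

    counts[k] is the number of observations in category k, lowest first.
    An item with C categories has C - 1 thresholds.
    """
    total = sum(counts)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    thresholds = []
    cumulative = 0
    for count in counts[:-1]:  # the highest category has no upper threshold
        cumulative += count
        # z score below which the cumulative proportion falls
        thresholds.append(std_normal.inv_cdf(cumulative / total))
    return thresholds


# Hypothetical binary item: 70 observations in category 0, 30 in category 1.
print(thresholds_from_counts([70, 30]))
```

Note that an empty lowest category gives a cumulative proportion of 0, for which the inverse normal is undefined, so the sketch raises an error in that case.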


Hi! I am working on a problem that requires a polychotomous regression with both manifest and latent independent variables. I cannot work out how that would be written in the syntax.


I'm a little confused. You say you have both manifest and observed independent variables. Are the factor indicators for the latent independent variables continuous or categorical? What is the dependent variable in the model? Is it observed or latent? If it is observed, the statement in the MODEL command would be: y1 ON x1 f1; where y1 is an observed binary or ordered categorical variable, x1 is an observed independent variable, and f1 is a latent independent variable. You would also need to identify the y1 variable as categorical by using the CATEGORICAL statement of the VARIABLE command. Please let me know if this does not answer your question. 


Hi! I failed with this syntax: VARIABLE: ... CATEGORICAL ARE ammk993l ; ANALYSIS: TYPE IS LOGISTIC; ESTIMATOR IS MLR; ... MODEL: climate BY a3 a21 isuhde2 asuhde2; ammk993l ON climate sesdiko ero ; With TYPE IS LOGISTIC that obviously is impossible. I solved it by modelling in the first step the latent variable and saved the factor score and did the logistic using the factor score. Erkki 


TYPE=LOGISTIC is for a single dependent variable. If you do TYPE = GENERAL with the CATEGORICAL statement, you will estimate a probit regression for the categorical dependent variable. I think this is what you want to do. There is a summary of analysis types and estimators available for each analysis type on page 38 of the Mplus User's Guide.

Anonymous posted on Thursday, November 06, 2003  4:30 pm



Hi there, I want to create a latent trait variable with categorical indicators (same variable across 6 waves) and the WLSMV estimator and wanted to check to see if my code was correct. sui by x1 x2 x3 x4 x5 x6; Since these indicators are categorical, do I have to specify some code for the thresholds? Also, how would I interpret the thresholds at each time point? Thanks!


The way you are coding this does not take time into account. But you don't mention that you are interested in change over time. So then it is ok. If you are not taking time into account, then it is not necessary to include thresholds in your analysis. 

Anonymous posted on Tuesday, January 04, 2005  5:57 am



I am wondering what is the most appropriate way to handle observed (not latent) categorical covariates in a WLSMV framework. In my study X1 is an ordinal variable (6 classes) and X2 is binary. Currently, this is how I am specifying the model: Categorical are p1-p7 a1-a8 d1-d6; f1 by p1-p7; f2 by a1-a7; f3 by d1-d6; f3 on f1 f2 X1 X2; Is it alright that the independent X variables are not specified as categorical? In addition, if I try to estimate the covariance between X variables and F1 F2 (i.e., X1 X2 with F1 F2) I get an error message about X1 being binary (the program then tells me to specify it as categorical). However, if I do not free the covariance between the X1 X2 and F1 F2 the model has a poor fit. Suggestions?


You should not put x1 and x2 on the CATEGORICAL list. The CATEGORICAL list is only for dependent variables. You should treat your covariates as you would in regular regression. The binary variable can be left as is. The ordinal variable should be turned into five binary dummy variables. You should not specify any covariance between x1 and x2. The model is estimated conditioned on x1 and x2. If you want to know the covariance between x1 and x2, you can do a BASIC run and obtain this. 
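The dummy-coding step for the six-category ordinal covariate can be sketched as follows. This is an illustrative Python helper with hypothetical names and made-up data, showing only how K observed categories become K - 1 binary variables; the choice of reference category is an assumption left to the analyst.

```python
def dummy_code(values, reference):
    """Turn one categorical covariate into 0/1 dummy variables.

    Returns a dict mapping each non-reference category to a list of 0/1
    indicators, one per observation. A covariate with K observed
    categories yields K - 1 dummies.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError("reference category not present in the data")
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
        if category != reference
    }


# Hypothetical six-category covariate; category 1 is the reference group.
x1 = [1, 2, 3, 4, 5, 6, 3, 1]
dummies = dummy_code(x1, reference=1)
print(len(dummies))  # five dummy variables for six categories
```

The resulting dummies would then appear only on the right-hand side of ON, like any other covariate.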

Anonymous posted on Tuesday, January 04, 2005  6:31 am



A few followups: 1) Should I specify the covariance between the observed categorical variables (either the binary, ordinal, or both) and the other latent predictors? I know that Mplus does not do this as the default, but it degrades the model fit. 2) If the observed X variables were continuous would your answer to question 1 be different? 


1. You should not specify the covariances among the x variables. By doing so, you bring them into the model estimation and thereby the assumption of normality applies to them. You are not estimating the model with these covariances fixed to zero. The model is estimated conditioned on the x's. Therefore, the assumption of normality does not apply to them. They are not part of the model so if the fit is poor, it is not because the covariances are zero. They are not. It's the same as regular regression. If you have ten covariates, the parameters of the regression model are an intercept and ten slopes. The model does not estimate the covariances among the x variables. 2. No. 

Anonymous posted on Tuesday, January 04, 2005  7:51 am



Linda, When I include the categorical X variables into the model (without specifying any covariance), I get the following error message: *** FATAL ERROR VARIABLE X1 CAUSES A SINGULAR WEIGHT MATRIX PART. THIS MAY BE DUE TO THE VARIABLE BEING DICHOTOMOUS BUT DECLARED AS CONTINUOUS. RESPECIFY THE VARIABLE AS CATEGORICAL. 


The only place the categorical x variables should be is on the right-hand side of ON. You should not mention their means, variances, or covariances.

Anonymous posted on Tuesday, January 04, 2005  11:27 am



Last question in this vein. Please excuse my ignorance. When observed variables are added as covariates to the right side of an ON in an SEM model, should I be concerned that their presence dramatically reduces the fit of the overall model? The modification indices are indicating that regressing them onto other latent predictor variables in the model would improve model fit. However, I really am only interested in examining if the latent predictors are significantly related to the latent DV after controlling for observed variables like race, ses, prior delinquency etc. Thanks for your help.

bmuthen posted on Tuesday, January 04, 2005  4:03 pm



Typically, other latent predictors would be regressed on the covariates since the covariates are often demographics and therefore antecedents, so such relationships should be included. Model fit can also suffer from left-out direct effects from the covariates to the indicators of endogenous factors.

Anonymous posted on Wednesday, January 05, 2005  9:03 am



When I regress the covariates onto the latent predictor variables the model becomes extremely poor fitting. I have been racking my brain trying to figure out why this is occurring, but have made no headway. Do you have time to take a quick peek if I send you the data and syntax?


Send two outputs to support@statmodel.com. The one that fits well without the covariates. And the one with the covariates. 

Anonymous posted on Thursday, January 20, 2005  7:55 pm



Hi! I am looking at a multiple group model with a binary outcome variable. I don't have a categorical latent variable, but only persistence with no=0, yes=1 coding. partial output: _______________________________________ GROUPING IS ethn69 (1=white 2=latino); CATEGORICAL = persiste; GROUPING IS ethn69 (1=white 2=latino); Analysis: type = general missing MEANSTRUCTURE h1; estimator = ML; *** ERROR in Analysis command ALGORITHM = INTEGRATION is not available for multiple group analysis. Try using the KNOWNCLASS option for TYPE = MIXTURE. _______________________________________________ I have tried using the KNOWNCLASS option for TYPE=MIXTURE as recommended but I get a message telling me (correctly so) that I don't have a categorical latent variable. I have four latent variables and a binary outcome variable, how/where should I put this variable? 


To use KNOWNCLASS, you need to use the CLASSES and KNOWNCLASS options: CLASSES = cg (2); KNOWNCLASS = cg (ethn69 = 1 ethn69 = 2); Check KNOWNCLASS in the index of the user's guide. It should point to an example.

Anonymous posted on Saturday, January 22, 2005  6:26 pm



Hello again! Thank you for your prompt reply Linda. I want to thank Thuy for his valuable help with my model as well. I have the following questions: I began doing a multiple group analysis and have developed group specific models using the modification indices provided by the outputs. Now that I will be using the KNOWNCLASS option with TYPE=MIXTURE, 1) How is a group specific model developed? 2) Is it possible to analyze the measurement model first and then the structural model. I have not added covariates but I would like to at least add "primary language" and "income" 2) do I use modification indices from the multiple group analysis models or how do I determine measurement invariance? 3) how are the starting values of the variables for each group selected and for each observed factor indicator for each latent variable? I have four latent variables: GC at time 1 (five factor indicators), GC at time 2 (five factor indicators), academic integration (2 factor indicators), and social integration (2 factor indicators); and two categorical variables: persistence and performance (GPA). 4) Does MPlus analyze indirect effects with the TYPE= MIXTURE model? in addition I am measuring one latent variable at two time points: I know that growth mixture modeling is not appropriate since I only have two time periods, how is effect of time taken into account in measuring this model? Pardon my MPlus illiteracy. Thank you for your help. 

bmuthen posted on Sunday, January 23, 2005  3:43 pm



1) In the Model command you specify group-specific parameters within %c#% statements, e.g. %cg#1% f; says that the variance of the factor f is specific to class 1. 2) I think MIs are available in this environment too, but you can also use chi-square difference testing via 2 times the logL differences of nested models. 3) Mplus provides default starting values. 4) Please check with Thuy. 5) Analyze the outcomes for the 2 time points together: if you have 5 outcomes per time point you analyze 10 variables.

Anonymous posted on Friday, March 25, 2005  6:54 am



Drs. Muthén & Muthén, I have what I hope is a very quick question. I have been reading (and rereading) the Muthén, du Toit, & Spisic (1997) paper on your robust methods. From what I understand, Mplus uses Equation 44 to obtain parameter estimates when WLSMV is invoked. The weight matrix in Equation 44 is the inverse of a diagonal matrix, W^-1. For WLSMV (and WLSM) I could rewrite Equation 44 by replacing W with Gamma_D, where Gamma_D here is the diagonal elements from Equation 48. Is my understanding correct? Thank you.

bmuthen posted on Friday, March 25, 2005  9:08 am



Yes. 

Anonymous posted on Thursday, April 28, 2005  1:45 am



I'm trying to construct a two-way ANOVA-like interaction in Mplus 3.11 using two observed categorical variables. The interaction is between gender (two categories) and family structure (five categories). The dependent variable is a continuous measure of antisocial behavior. In the Mplus user guide (p.420) you list various ways of obtaining interactions for different variable types, but have not included the interaction between observed categorical and observed categorical. Which is the best method to use in that case? I have used the following method: (1) Coded the family structure variable into four dummy codes (the omitted category is the reference group) predicting antisocial behavior. (2) Used multigroup analysis to examine whether there are any significant differences between the unstandardized betas (mean values) for boys and girls. Is this the correct way to do it? If it is, do you know of any literature that deals with this categorical times categorical variable interaction using multigroup SEM? Are you allowed to use multigroup analysis when you have a multilevel design? Thanks in advance.


You can use DEFINE or the multiple group approach as you have done above. I don't know of any reference to this. For some multilevel models, the GROUPING option can be used for multiple group analysis. If not, then the KNOWNCLASS option can be used instead. 

Fred Li posted on Friday, November 18, 2005  3:39 pm



Hello! I am a new Mplus user. Is there anyone who can help me with the following Mplus program that has an error message at the end of it? My intended task was to create a polychoric correlation matrix for categorical variables for further CFA analysis. Thanks in ADVANCE!! TITLE: Number Sense in 2005; Note that this program has saved out the polychoric correlation matrix in a file called test.pcm; DATA: FILE IS "C:\Documents and Settings\USR1\®à±\NS2005.dat"; VARIABLE: NAMES ARE q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14 q15 q16 q17 q18 q19 q20 q21 q22 q23 q24 q25 q26 q27 q28 q29 q30 q31 q32 q33 q34 q35 q36 q37 q38 q39 q40 q41 q42 q43 q44 q45 q46 q47 q48 q49 q50 q51 q52 q53 q54 q55; USEVARIABLES ARE q2 q3 q4 q5 q6 q9 q10 q11 q12 q13 q14 q15 q16 q17 q18 q20 q21 q22 q24 q25 q26 q29 q30 q31 q32 q33 q34 q35 q36 q37 q38 q39 q40 q41 q43 q44 q45 q46 q47 q48 q49 q50 q51 q52 q53 q54 q55; CATEGORICAL Q2-Q55(15); ANALYSIS: ESTIMATOR IS ML; ITERATIONS = 1000; CONVERGENCE = 0.00005; MODEL: ETA1 BY q2-q55; SAVEDATA: TYPE IS CORR; FORMAT IS F5.3; SAMPLE C:\TEST.PCM; *** ERROR in Savedata command Only sample correlation matrix may be saved when there is at least one categorical dependent variable.


It looks like you have not used the FILE option of the SAVEDATA command to give the name of the file in which to save the correlation matrix. Note that with categorical outcomes, you cannot analyze a correlation matrix and a weight matrix in Mplus but need raw data. You should send these types of questions to support@statmodel.com and include your license number. 

adthrash posted on Thursday, March 02, 2006  9:37 am



Hi: I have a SEM model using weighted data that will converge with no errors when all the variables are considered continuous but won't run when some indicators are considered categorical (binary) or count. f1 has 7 binary indicators (yes/no) and f2 has 4 count indicators (0-7 days) and one continuous indicator. ... CATEGORICAL ARE x1 x2 x3 x4 x5 x6 x7; COUNT ARE y1 y2 y3 y4; ... ANALYSIS: TYPE IS GENERAL COMPLEX; ESTIMATOR IS MLR; MODEL: f1 BY x1 x2 x3 x4 x5 x6 x7; ... f2 BY y1 y2 y3 y4; ... MODEL INDIRECT: f2 IND x8; f2 IND f1; f2 IND f3; The error message I receive is "MODEL INDIRECT is not available for analysis with ALGORITHM=INTEGRATION," which I did not specify. If I comment out either the CATEGORICAL or COUNT lines, I get the same error. If I comment out both, everything runs fine.


When you use MLR with either CATEGORICAL or COUNT, numerical integration is required. Therefore, MODEL INDIRECT is not available. 

adthrash posted on Friday, March 03, 2006  5:56 am



Thanks for the quick response. The model terminated normally when CATEGORICAL was used and I commented out the COUNT and MODEL INDIRECT. However, now I get the error "MODINDICES option is not available for ALGORITHM=INTEGRATION." Could I consider all of my variables continuous in order to get MODINDICES, but use the parameter estimates derived when considering the vars categorical when reporting results? 

bmuthen posted on Friday, March 03, 2006  6:33 am



That would only be very approximate. Instead, you might want to explore model variations and do chi-square difference testing via 2*loglikelihood difference.
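The arithmetic of a loglikelihood difference test for two nested models can be sketched in Python as below. The function names and the example loglikelihood values are hypothetical. This shows only the plain, unscaled version (2 times the loglikelihood difference referred to a chi-square distribution); with a robust estimator such as MLR a scaling correction is typically applied as well, which this sketch does not do.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even degrees of freedom.
        total = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * total
    # Closed form for odd degrees of freedom.
    total = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def loglik_difference_test(loglik_restricted, loglik_full, extra_parameters):
    """Return (test statistic, p-value) for two nested models.

    extra_parameters is the number of parameters the fuller model
    estimates beyond the restricted one.
    """
    statistic = 2.0 * (loglik_full - loglik_restricted)
    return statistic, chi2_sf(statistic, extra_parameters)


# Hypothetical loglikelihoods read off two outputs.
statistic, p_value = loglik_difference_test(-1005.0, -1000.0, extra_parameters=2)
print(statistic, p_value)
```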


Hello, my model contains latent factors and two dependent manifest indicators (d7; d16). However, if I put b3 (male=1) into my model I get the error message below: (...) USEVARIABLES ARE b3 d7 d16 c6 c10 c18 c20 c11 c12 c13 e1-e20 wgt; CATEGORICAL ARE b3 d7 d16 c6 c10 c18 c20 c11 c12 c13 e1-e20; WEIGHT IS wgt; Model: MO1 BY c6 c11 c10 c18 c20; MO2 BY c11 c12 c13; f1 BY e1-e20* e5@0 e9@0 e13@0 e17@0; f2 BY e1-e20* e2@0 e9@0 e13@0 e17@0; f3 BY e1-e20* e2@0 e5@0 e13@0 e17@0; f4 BY e1-e20* e2@0 e5@0 e9@0 e17@0; f5 BY e1-e20* e2@0 e5@0 e9@0 e13@0; f1-f5@1; d7 d16 ON b3 MO1 MO2 f1-f5; (...) ** ERROR in Variable command CATEGORICAL option is used for dependent variables only. B3 is not a dependent variable. Thanks, Stephan


Sorry, I've just found the answer in one of your web seminar files. We do not distinguish between scales if the variable is an IV (x). However, I guess that in SPSS users have to specify whether they are using a categorical variable. But this seems not to be necessary with Mplus, right? Regards, Stephan


In regression analysis, independent variables (covariates) can be binary or continuous. In both cases, they are treated as continuous. In SPSS, perhaps the specification of categorical for an independent variable automatically creates a set of dummy variables for it. In Mplus, if you have a nominal independent variable, you must create a set of dummy variables. 

Soyoung Lee posted on Wednesday, March 21, 2007  6:17 pm



I have a question about the dimension of the weight matrix of WLS/WLSM/WLSMV with categorical indicators in factor analysis. I understand it is [(p(p+1)/2) x (p(p+1)/2)] when p=# indicators, p(p+1)/2=# univariate & bivariate marginal proportions, in the dichotomous case. I wanted to know what it would be in the polytomous case. [((C-1)p+p(p-1)/2) x ((C-1)p+p(p-1)/2)] with C=# categories, (C-1)p+p(p-1)/2= (# thresholds) + (# polychorics)? Also, I found the WLS and WLSMV estimates of thresholds are slightly different. Aren't they the same since thresholds are the normal z scores corresponding to the univariate probabilities? Or, are these updated in the 3rd stage of Muthen's WLS procedure? Thanks. (estimates with WLS) L01$1 1.440 L02$1 0.547 L03$1 0.137 L04$1 0.714 L05$1 1.129 (estimates with WLSMV) L01$1 1.433 L02$1 0.550 L03$1 0.133 L04$1 0.716 L05$1 1.126


Yes on your first question (assuming all items have the same number of categories C). The model-estimated thresholds are different between WLS and WLSMV because in WLS it is acknowledged that the sample statistic thresholds are correlated among themselves and with the polychoric correlations, while with WLSMV a diagonal weight matrix is used (no correlations at all). The Muthen (1978) Psychometrika article details this in terms of WLS vs ULS.
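The count confirmed above, (C-1)p thresholds plus p(p-1)/2 polychoric correlations, is easy to tabulate. The small Python helper below is an illustrative sketch with a hypothetical function name, and it assumes every item has the same number of categories, as the answer does.

```python
def wls_weight_matrix_dimension(n_items, n_categories):
    """Number of rows (= columns) of the full WLS weight matrix.

    Counts (C - 1) thresholds per item plus one polychoric correlation
    per pair of items, assuming all items share the same C.
    """
    n_thresholds = (n_categories - 1) * n_items
    n_correlations = n_items * (n_items - 1) // 2
    return n_thresholds + n_correlations


# Five binary items: 5 thresholds + 10 correlations = 15 = p(p+1)/2.
print(wls_weight_matrix_dimension(5, 2))
# Five items with five categories each: 20 thresholds + 10 correlations.
print(wls_weight_matrix_dimension(5, 5))
```

For binary items the result reduces to p(p+1)/2, matching the dichotomous case in the question.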


Hi, We are trying to perform CFA with Mplus, including both continuous indicators and ordered polytomous indicators (Likert, 5 categories). I ran the CFA indicating which variables were categorical (CATEGORICAL statement) and using the WLSMV estimator. Result: the continuous items have big residual variances (R-square < 0.2), small loadings (StdYX < 0.45) and not very good GOF indices. However, when treating all the variables as continuous (with the MLM estimator), I obtain better results. The results are not awesome, but the most usual GOF indices are over the recommended thresholds (CFI and TLI > 0.95, RMSEA=0.015, Standardized Root Mean Square Residual=0.049). Also, residual variances are relatively small (except for one item, FD21= 0.756). The chi-square test is significant in both analyses, but the sample size is 4400 individuals. I understand that it is more appropriate to treat the variables as categorical, especially taking into account the highly skewed distributions of our items, but the fact that I obtain better results with the continuous variables approach makes me doubt. Could you tell me which is the best approach to analyse these data? Does it make sense to compare the GOF indices of both methods and use the one that provides better fit? Or is it incorrect to analyse ordered categorical items without taking into account that they are actually categorical (i.e. not using the CATEGORICAL statement)? Many Thanks!


If your items have strong floor or ceiling effects, treating them as continuous results in attenuated correlations. The low correlations can result in better model fit because there is less power to reject the null hypothesis. Categorical items with strong floor or ceiling effects should be treated as categorical. 
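The attenuation mentioned above can be seen in a small simulation. The Python sketch below is illustrative only: the latent correlation of 0.6, the cut point of 1.0, the sample size, and the seed are all arbitrary assumptions. It draws two correlated normal variables, dichotomizes both at a high cut point to mimic a floor effect, and compares the Pearson correlation before and after.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(rho=0.6, cut=1.0, n=20000, seed=1):
    """Return (latent correlation, correlation of the dichotomized items)."""
    rng = random.Random(seed)
    latent_x, latent_y = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        latent_x.append(z1)
        # Construct a second variable correlating rho with the first.
        latent_y.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    # Skewed binary items: only values above the cut are scored 1.
    observed_x = [1 if value > cut else 0 for value in latent_x]
    observed_y = [1 if value > cut else 0 for value in latent_y]
    return pearson(latent_x, latent_y), pearson(observed_x, observed_y)


latent_r, observed_r = simulate()
print(latent_r, observed_r)
```

The second number should come out clearly below the first, which is the attenuation that treating such items as continuous builds into the analysis.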

David Kerr posted on Monday, May 28, 2007  9:50 am



Hello, I'm trying to test a model that has latent variables each comprised of categorical indicators (questions that can be answered agree, disagree, or don't know). The "don't knows" are of interest, so I thought I would create two dummy coded variables: one coded 1 = agree, 0 = disagree/don't know, and another coded don't know = 1, agree/disagree = 0. My thought was that I might be able to create a "don't know" factor that would capture this response type, and make the other factors more interpretable in terms of agree/disagree. However, when I put these variables in the model at the same time, I got error messages indicating that correlations between variables result in empty cells (i.e., no one can have a value of 1, 1). Is such a model with dummy coded dependent variables possible?


I think there is a literature on how to treat "don't know". You may find some suggestions there on how to treat this category. Categorical outcome methodology uses information from bivariate tables of outcomes. A zero cell in a bivariate table implies a correlation of plus or minus one, resulting in estimation problems.
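To see why the two dummy variables from the question necessarily produce an empty cell, the bivariate table can be built by hand. The Python sketch below uses hypothetical function names and made-up responses; it only tabulates the 2x2 table and lists the empty cells, and does not estimate any correlation.

```python
def crosstab_2x2(u, v):
    """Cross-tabulate two 0/1 variables; table[a][b] counts u == a and v == b."""
    table = [[0, 0], [0, 0]]
    for a, b in zip(u, v):
        table[a][b] += 1
    return table


def empty_cells(table):
    """Return the (row, column) positions of cells with a zero count."""
    return [
        (row, col)
        for row in range(2)
        for col in range(2)
        if table[row][col] == 0
    ]


# Hypothetical responses to a single three-option item.
responses = ["agree", "disagree", "dont_know", "agree", "dont_know", "disagree"]
agree = [1 if r == "agree" else 0 for r in responses]
dont_know = [1 if r == "dont_know" else 0 for r in responses]

table = crosstab_2x2(agree, dont_know)
print(table)
print(empty_cells(table))  # the (1, 1) cell is empty by construction
```

Because the two dummies are derived from the same item, no respondent can score 1 on both, so the (1, 1) cell is empty regardless of sample size.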


HI. I'M HAVING A PROBLEM WHEN TRYING TO RUN THE FOLLOWING MODEL: I HAVE THREE LATENT CONTINUOUS VARIABLES, EACH OF WHICH HAS 4 DICHOTOMOUS INDICATORS (1,0). I ALSO HAVE AN OBSERVED EXOGENOUS VARIABLE (AGE). THE MODEL: F1 by A B C D; F2 by E F G H; F3 by I J L M; F3 F2 ON F1; F3 F2 F1 ON AGE; F3 WITH F2; I USED WLSMV AS THE ESTIMATOR, BUT IN THE OUTPUT FOR EACH FACTOR by... FOR THE FIRST INDICATOR I CAN'T GET THE EST./S.E. AND THE TWO-TAILED P-VALUE, I ALWAYS GET 999.000. I'M VERY NEW TO MPLUS, WHAT AM I DOING WRONG PLEASE?


The first indicator is fixed to one as the default to fix the metric of the factor. See the BY option in Chapter 16 of the Mplus User's Guide for a description of this. A fixed parameter has no standard error.


What a fool i am. Thank you Linda. 


Hi, I'm trying to specify an interaction model using Mplus 5.1. The interaction is between age (H9AGE, continuous) and perceived diversity (C1SAGEr, six categories). The dependent variables are categorical measures of emotional exhaustion, depersonalization, and personal accomplishment. In the Mplus user guide (p.522) you list various ways of obtaining interactions for different variable types. I have specified the model as follows: CATEGORICAL ARE B1EE B2EE B3EE B6EE B10EE B15EE B16EE B25EE B5DP B12DP B13DP B17DP B27DP B4PA B7PA B11PA B19PA B20PA B21PA B26PA; DEFINE: PAgeH9 = C1SAGEr*H9AGE; ANALYSIS: ESTIMATOR = WLSMV; MODEL: !MEASUREMENT MODEL FOR 3 FACTORS, CFA BURNOUT CATEGORICAL; EE BY B2EE B1EE B3EE B6EE B10EE B15EE B16EE B25EE B4PA B13DP B27DP B11PA B17DP; DP BY B12DP B5DP B13DP B17DP B27DP B6EE B25EE B26PA B15EE; PA BY B19PA B4PA B7PA B11PA B20PA B21PA B26PA B25EE B10EE B3EE B17DP; EE ON C1SAGEr H9AGE PAgeH9; DP ON C1SAGEr H9AGE PAgeH9; PA ON C1SAGEr H9AGE PAgeH9; Questions: 1. Should the IVs be correlated? 2. Is this the correct way to do this? Would it be better to use multigroup analysis to examine this interaction? Thanks, Angela


Also, I was trying to run a model with multigroup analysis to test the interaction. However, I received this warning: Based on Group 1: Group 2 contains inconsistent categorical value for B25EE: 6 


1. Means, variances, and covariances of exogenous observed variables should not be part of the MODEL command. The model is estimated conditioned on these variables. If you want to know these values, ask for SAMPSTAT in the OUTPUT command. 2. You cannot use multiple group analysis when one of the variables is continuous. The way you specified it is fine.


I have a data set for categorical CFA involving 12 categorical variables on a single factor. I keep receiving the error: SERIOUS COMPUTATIONAL PROBLEMS OCCURRED IN THE BIVARIATE ESTIMATION... When I look at the output correlation matrix, I see that one variable is correlated 999.00 with two other variables while all of the rest are fine. There are no odd values in my data set. Any thoughts for why this is happening? Thanks! 


It sounds like there is a problem computing the sample statistics. Please send your input, data, output, and license number to support@statmodel.com. 


Hi, I'm doing CFA for a newly developed test. It comprises 23 items which belong to 14 testlets. Hence, testlets comprise 1 to 4 items. Items are scored as either right (1) or wrong (0). To account for dependencies among items belonging to one testlet, testlet scores are computed (sum of correct items per testlet). Maximum testlet scores vary between 1 (for testlets with only one item) and 4 (for testlets with four items). I've done CFA, using the CATEGORICAL option. Because I was not sure whether the indicators need to have the same maximum value, I rescaled all testlet scores, with 12 being the maximum (all items correct) and 0 the minimum. For testlets with four items, the testlet score can be 0, 3, 6, 9, 12; for testlets with three items 0, 4, 8, 12; for testlets with two items 0, 6, 12; for testlets with one item 0, 12. When I redo the CFA, my results differ in magnitude with regard to chi-square, fit indices, and factor loadings. Which results should I trust now? Is a common scale with the same maximum score for all testlets necessary or shall I leave the data as it is? Thank you very much for your help!


I think the approach described in the first paragraph is most transparent. The approach described in the second paragraph makes assumptions that may not be true. I would leave the data as is. 


Hi, there is a paper by Browne and Du Toit from 1992 ("Automated Fitting of Nonstandard Models", Multivariate Behavioral Research, 27 (2), 269-300) where they suggest estimating "Root Deterioration per Restriction RDR" for model comparison. The formula is RDR = square root of ((delta chi² - delta df)/(N * delta df)). I'm using the WLSMV estimator for categorical data. Am I correct in assuming I should use the delta chi² and delta df I get in my output when I do the chi-square difference test using the DIFFTEST option for the above formula? Thanks for your help!


I think the paper by Browne and Du Toit is for WLS, not WLSMV.


Is it not possible to use RDR for WLSMV? Or is there an alteration of RDR for WLSMV? 


No, this is not possible. For WLSMV, the chi-square test statistic and degrees of freedom are adjusted to obtain a correct p-value. Only the p-value should be used.
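For reference, the RDR formula as quoted in the question earlier in this exchange can be written out in Python as below. This is only a sketch of that quoted formula with a hypothetical function name and example numbers; truncating negative values at zero is an added assumption, and per the answer above the inputs should not be WLSMV (DIFFTEST) chi-square or degrees-of-freedom values.

```python
import math


def root_deterioration_per_restriction(delta_chi2, delta_df, n):
    """RDR as quoted in the thread: sqrt((d_chi2 - d_df) / (N * d_df)).

    Negative values under the square root are truncated at zero,
    which is an assumption of this sketch.
    """
    if delta_df <= 0 or n <= 0:
        raise ValueError("delta_df and n must be positive")
    value = (delta_chi2 - delta_df) / (n * delta_df)
    return math.sqrt(max(value, 0.0))


# Hypothetical chi-square difference of 30 on 5 df with N = 500.
print(root_deterioration_per_restriction(30.0, 5, 500))
```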

Cecily Na posted on Friday, December 03, 2010  2:52 pm



Dear Linda, I read a textbook in which there is an Mplus syntax sample. It looks like they use a * for the underlying distribution of a categorical or dichotomous indicator. For example: f1 by q1*-q12* where q1-q12 are categorical variables. Do we need to use *? Do I have to write Criminal* ON drug where criminal is a categorical variable (but not a latent factor)?


No, you don't need to use an asterisk. The way Mplus knows criminal is categorical is because you put it on the CATEGORICAL list. 


Dear Drs. Muthén, I am trying to do a multiple groups comparison; however, I get the following error: *** ERROR Based on Group 0: Group 1 contains inconsistent categorical value for Q61RAW: 1 What exactly does it tell me? Thank you! Sabine


It tells you that the number of categories for Q61RAW is not the same in both groups. Check your data. The groups must have the same number of categories for the categorical variables. 

Sanja Franic posted on Wednesday, November 16, 2011  1:32 am



Hi, I get the following warning when trying to run a multigroup analysis on categorical variables: Based on Group 1: Group 2 contains inconsistent categorical value for Y21: 5 The observed variables are all defined as categorical. Could you maybe give me a hint as to what this warning could indicate? Best, Sanja 


Apparently, y21 does not have the value 5 in group 2. With weighted least squares, each group must have the same categories for categorical variables. You could collapse categories in group 1. With maximum likelihood estimation, you can use the * setting of the CATEGORICAL option. 
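Before running a multiple group analysis, it can help to check which categories each group actually contains. The Python sketch below is illustrative, with a hypothetical function name and made-up data; it simply reports, per group, the categories that are observed elsewhere but missing in that group.

```python
def category_mismatches(values, groups):
    """Map each group to the categories it lacks relative to all groups.

    values and groups are parallel lists: the observed category and the
    group label for each case. Groups with no missing categories are
    left out of the result.
    """
    observed = {}
    for value, group in zip(values, groups):
        observed.setdefault(group, set()).add(value)
    all_categories = set().union(*observed.values())
    return {
        group: sorted(all_categories - categories)
        for group, categories in observed.items()
        if all_categories - categories
    }


# Hypothetical item y21: category 5 is observed in group 1 only.
y21 = [1, 2, 3, 4, 5, 1, 2, 3, 4]
group = [1, 1, 1, 1, 1, 2, 2, 2, 2]
print(category_mismatches(y21, group))
```

An empty result means every group shows the same set of categories for that variable.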

Sanja Franic posted on Wednesday, November 16, 2011  10:08 pm



Thanks!! I hope I can do the same/similar with WLSMV estimation. 

Sarah Ryan posted on Friday, September 21, 2012  1:04 pm



I just want to confirm something and I have searched the discussion board and web, but can't find an explicit answer. Is it correct that the output section "UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES" provides the model estimated (as opposed to observed) proportions and counts? 


They are not model-estimated values. They are sample statistics.

Sarah Ryan posted on Monday, September 24, 2012  6:13 am



Linda, Okay, I originally thought so, but I get "fractions" of people in the sample counts and a colleague was insisting these must be estimates given that the counts were not whole numbers. So, the ".XX" on counts is just a function of the proportions also being fractional? Thanks for indulging me on this rather basic question. 


Please send the output and your license number to support@statmodel.com so I can see exactly what you are seeing.


Hi, I am a student at Ege University, TURKEY. I managed to generate data sets with continuous indicators. But my homework is generating data sets for my friends' homework using Mplus. The programme is only available in computer labs and I usually have limited time to use it. I am trying to generate data sets with ten items, 6 items for factor 1 and 4 items for factor 2, which are on a 5-point Likert scale. I know it sounds simple but I couldn't. Here is my syntax: montecarlo: names = y1-y10; nobservations = 200; nreps = 25; seed = 2344887; generate = y1-y10 (5 p); categorical are y1-y10; repsave = all; save = odev*.dat; MODEL POPULATION: f1 ON y1-y6*.65; f2 ON y7-y10*.70; f1-f2@1; f1 WITH f2*.2; y1-y10*.32; OUTPUT: TECH9; As you see I can't reach the solution. Would you please help with my syntax or show me a simple example? Thanks.


See mcex5.2.inp. It generates data for two factors with binary indicators. It is available on the website. 


Thanks for your help. 

Claire posted on Wednesday, November 14, 2012  7:31 am



Hello, I would like clarification regarding some issues in the interpretation of path models using WLSMV and theta parameterization. In my model: Y ON X1 X2 X3; X2 ON X1; X3 ON X2; MODEL INDIRECT: Y IND X1; X1-X3 are binary and Y is continuous. If I'm right in my interpretation, the coefficients for Y ON X1-X3 are normal linear regression coefficients and X2 ON X1 and X3 ON X2 are probits. 1. If that's right, how do I interpret the indirect effect when it is the product of two probit coefficients and a linear coefficient? 2. Also, if I wanted to standardize the results, which solution would I use: StdYX or Std, or is neither appropriate? 3. Using WLSMV (theta), can any of the normal fit statistics be used to assess model fit? Many thanks. 


Yes, y is linear and the x's are probit if you are using the default estimator WLSMV. 1. Linear 2. StdY 3. Yes. 

Claire posted on Thursday, November 15, 2012  1:52 am



Thanks Linda (forgive me as I'm new to Mplus/SEM) Re: 2. When asking for the standardized output it only gives me StdYX and Std. Is there a way to get StdY? If not is it sufficient to just report the unstandardized estimates? Re: 3. Which particular fit statistics are safe to use when using WLSMV (theta) and do you know of a reference to read on this? 


Divide StdYX by the standard deviation of x. I know of no paper that has studied this. I would use the same fit statistics as for Delta which are the ones provided in the results section. 
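
For reference, the arithmetic behind this rule can be written out as follows. The notation is introduced here for illustration only: b is the unstandardized slope of y on x, and SD(x) and SD(y) are the standard deviations used in the standardization.

```latex
\[
\mathrm{StdYX} = b\,\frac{SD(x)}{SD(y)},
\qquad
\mathrm{StdY} = \frac{b}{SD(y)} = \frac{\mathrm{StdYX}}{SD(x)}
\]
```

StdY therefore keeps x in its raw metric, so for a binary x it describes the change in y, in standard deviation units of y, when x goes from 0 to 1.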


Hi, I am running a structural model where I have 10 3-point Likert indicators that are theorized to measure 4 constructs, 3 being predictors and 1 being the outcome. I want to declare these variables as categorical ordinal. However, one of the predictor constructs has only one indicator, so if I just include it as an independent variable, it cannot be declared as categorical ordinal. Is it OK to include a factor for it (to make it a y so I can declare it as categorical) and fix both the loading and variance of that one-indicator factor at 1? Would this be equivalent to using the underlying continuous version of the categorical variable? Below is my input. The variable in question here is var7. ANALYSIS: NAMES = var1-var10; USEVARIABLES = var1-var10; CATEGORICAL = var1-var10; MODEL: f1 BY var1* var2 var3; f2 BY var4* var5 var6; f3 BY var7; f4 BY var8* var9 var10; f1@1 f2@1 f3@1 f4@1; f4 ON f1 f2 f3; Thanks much! 


Yes, this would be equivalent to doing that. I'm not sure why you would want to do this given that you make unnecessary normality assumptions. I would use the observed variable as the covariate or create a set of dummy variables to represent the categories and use them as covariates. 


Thank you. Yes, I need to think about whether to do this or to use dummy variables. The pro of using an assumed normal continuous version of var7 is that the result will be simple: just one regression coefficient, similar to one coefficient for each of f1 and f2. Conceptually these are three things hypothesized to influence f4, and it would be nice to have three coefficients instead of four. The con is introducing inaccuracy through the normality assumption. Var7 has three ordinal categories and there are roughly the same number of people in each category. Does this suggest that it is perhaps OK to assume normality for the underlying continuous variable? Or should I still worry that it might be skewed and the equal proportions could be just an artificial result of the thresholds? In practice, is it something that is sometimes done, to treat an ordered categorical independent variable this way? Or is it a no-no? Thanks much for advising me on this. I am quite inexperienced in SEM. 


I vote for treating the observed 3-category Var7 as a continuous variable, that is, as a regular covariate. I say that because you say its distribution is symmetric and because when you treat it as a regular covariate you don't introduce an assumption of underlying normality. 


Thank you so much! So that I am crystal clear, your suggestion is to include the raw variable as a continuous variable (i.e., take it out of CATEGORICAL =), and not introduce that f3 factor? 


Right. 


Thank you! 


Hi, is it possible to use the AUXILIARY (M) command with categorical data? I get an error message when trying but I seem to recall it was possible with Mplus 5? Is there a way around this problem? Thanks! 


All variables on the AUXILIARY list are treated as continuous. There may not have been an error message in Version 5. 


Hi, sorry, my question was not clear: not only my auxiliary variables, but also my indicators of the latent variable are categorical; hence, I use the WLSMV estimator. Is the AUXILIARY (M) option not possible with categorical indicators? Thanks! 


AUXILIARY (m) is available only for continuous variables. 


Hi, I would like to model some kind of twin model: that is, two latent variables, each measured by 3 binary indicators. Measurements might be parallel. I have other covariates to add. I tried to model this by factor analysis, that is, I have two continuous latent variables measured by 3 indicators each. Though I have a grouping variable. It results in very good model fit (all Chi², RMSEA, TLI, Chi² difference tests). So I presume the indicators are good measurements of some underlying factor. But if I try to fit the model using LCA as a framework, that is, the latent variables shall both be binary classes (with 3 classes the result is the same), model fit turns out to be awful. Is there any reason why this doesn't work, although the indicators seem to be fine? Is there anything to try out or to look at as an explanation? Thank you very much for your help, Juliana 


With the LCA for twins, you should have two latent class variables, one for twin a and one for twin b. Each one should have two classes. You need to use PARAMETERIZATION = LOGLINEAR and use WITH to correlate the two categorical latent variables, for example, c1 WITH c2; 
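
A minimal sketch of how such a setup might look, assuming three binary indicators per twin. The file name twins.dat and the item names a1-a3 and b1-b3 are hypothetical placeholders, and the layout follows the general pattern for models with two categorical latent variables, so it should be checked against the User's Guide before use.

```mplus
TITLE:    LCA with one two-class variable per twin (sketch)
DATA:     FILE = twins.dat;          ! hypothetical file name
VARIABLE: NAMES = a1-a3 b1-b3;       ! hypothetical binary items
          CATEGORICAL = a1-a3 b1-b3;
          CLASSES = c1 (2) c2 (2);   ! c1: twin a, c2: twin b
ANALYSIS: TYPE = MIXTURE;
          PARAMETERIZATION = LOGLINEAR;
MODEL:    %OVERALL%
          c1 WITH c2;                ! association between classes
MODEL c1: %c1#1%
          [a1$1-a3$1];               ! thresholds vary across c1 only
          %c1#2%
          [a1$1-a3$1];
MODEL c2: %c2#1%
          [b1$1-b3$1];               ! thresholds vary across c2 only
          %c2#2%
          [b1$1-b3$1];
```

Covariates and any equality constraints for parallel measurement across twins are left out of this sketch.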

Wendong posted on Wednesday, April 24, 2013  6:04 pm



Hi Dr. Muthen, May I ask a quick question? I know that, strictly speaking, we should treat Likert-type (e.g., 1-5 point) item-level data as ordinal and thus in SEM we should use estimator=WLSMV. But it seems that many people treat such scales as continuous. Thus my question is, could we still treat such items as continuous, but use WLSMV as the estimator, to deal with data skewness at the item level? Thank you so much! 


You can treat them as continuous as long as they don't have floor or ceiling effects. If they do, I would treat them as categorical because that methodology can handle the floor and ceiling effects. You can't use WLSMV without having at least one categorical variable in the model. 
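
As an illustration of the categorical treatment described above, here is a minimal sketch assuming six hypothetical five-point items y1-y6 in a hypothetical file likert.dat. Placing the items on the CATEGORICAL list is what makes WLSMV available.

```mplus
TITLE:    One-factor CFA with ordinal Likert items (sketch)
DATA:     FILE = likert.dat;      ! hypothetical file name
VARIABLE: NAMES = y1-y6;          ! hypothetical item names
          CATEGORICAL = y1-y6;    ! treat items as ordered categorical
ANALYSIS: ESTIMATOR = WLSMV;      ! needs a categorical variable
MODEL:    f BY y1-y6;
```

Removing the CATEGORICAL line treats the items as continuous, in which case a different estimator has to be chosen.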

Lois Downey posted on Thursday, May 01, 2014  11:44 am



I have a situation in which the univariate proportions/counts listed in the output do not make sense. If I run an unclustered model with the full sample, the univariate values are correct. If I run a clustered model with the full sample, the univariate values continue to be correct. If I select a subsample with the USEOBSERVATIONS command and run an unclustered model, the univariate values continue to be correct, although, as expected, smaller than for the full sample. However, if I select a subsample with the SUBPOPULATION command and run a clustered model, the univariate values are incorrect. In fact, the counts aren't even integers. What might I be doing incorrectly? 


Send the USEOBSERVATIONS and SUBPOPULATION outputs and your license number to support@statmodel.com. 

jane shen posted on Friday, June 27, 2014  12:24 pm



Hello, when sampling weights are added to the model, what is "UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES"? They are no longer sample statistics for the raw data, nor for the weighted data. Mplus version: 7.11 Analysis: ESTIMATOR = WLSMV; Thanks, Jane 


Please send your input, output, and data to Support@statmodel.com. 


I have a question about how to interpret and best standardize the estimates of the model below. My main interest lies in the regression coefficient of the continuous-time survival outcome Y on factor F, which is measured by one count (marker) indicator (X1) and two binary indicators (X2, X3). COUNT = X1; CATEGORICAL = X2 X3; SURVIVAL = Y; TIMECENSORED = cens; ANALYSIS: BASEHAZARD = ON; ESTIMATOR = MLR; MODEL: F BY X1 X2 X3; Y ON F; 1) I am not sure how the factor F is scaled: a count indicator X1 is used to set the scale, but based on the model output, the link function used is logit. Shouldn’t the link for count outcomes be log instead? So, in this case, how should I interpret the factor scale? 2) Given the above, what’s the best way to standardize here? I know that it makes no sense to standardize the binary indicators, but should I use STDYX standardization when interpreting the regression coefficient of Y on F? 


Set the metric of f by fixing its variance at 1 and freeing the first loading. The count link is not mentioned in the Mplus output, but it is the usual log link for counts. So then you don't need to worry about standardizing. 


Thanks Bengt. I did as suggested and the results changed a bit: the regression coefficient of Y on F had a different sign now (although nonsignificant in both cases). Is this what to expect? Moreover, as someone not too familiar with latent variables, does that scaling approach change the interpretation of the latent variable (e.g., is the latent still measured with "error" although its variance is fixed)? 


Q1. The sign should not change if you do it right. If you like, you can send the 2 outputs (for different metric settings) to support along with your license number. Q2. No; same interpretation. 


The code I am using is: Y BY X1 X2 X3 vs. Y BY X1* X2 X3; Y@1; 


Send the 2 outputs that show the sign change to support along with your license number. 


A SEM with 2 continuous latent factors measured by y (cont.), u (count), & n (nominal). n_a & n_b have 4 categories; n_c has 3. DEFINE: n_a1 = n_a == 1; n_a2 = n_a == 2; n_a3 = n_a == 3; . . n_c2 = n_c == 2; 1) How can I evaluate the contributions of each set of dummy variables? MODEL: f1 BY y1-y4 u1-u3; f2 BY y5 (n_a set) (n_b set) (n_c set); outcome ON f1 f2; 2) The f2 indicators measure a common construct, but the (n_a set) measures a different facet of that construct than the other f2 indicators. If it is uncorrelated with the other indicators it won’t load highly on that factor, but it is an important component of the construct, so I want to be sure that its influence on the outcome variables isn’t underestimated. Would you suggest making an interaction term? f2 BY y5 (n_b set) (n_c set); f2int | (n_a set) XWITH f2; outcome ON f1 f2int; Would using f2int be appropriate as a measure of the main effect of a single construct, or would it be interpreted as a 3-way interaction? 


1) I don't think you should split up the nominal outcomes into dummy variables. Do the factor analysis with them declared as nominal. 2) Perhaps you want to use a bifactor model where the "different facet" is a "specific factor" different from the "general factor". I don't see how interactions would help. 


For earlier versions of Mplus, if you have a nominal independent variable, you must create a set of dummy variables. Is this statement still true with the most recent version of Mplus (7+)? A lot of my IVs are nominal (up to 6 categories). Is there an efficient way of handling them in Mplus in my two-level logistic regressions? Thank you. 


Nominal IVs need to be turned into dummy variables. 
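
One way this might be set up inside Mplus is with the DEFINE command, sketched below under stated assumptions. The variables y, x, and region are hypothetical, region is assumed to be coded 1, 2, 3 with no missing values, and category 1 serves as the reference category.

```mplus
VARIABLE: NAMES = y x region;            ! hypothetical names
          USEVARIABLES = y x reg2 reg3;  ! new variables go last
          CATEGORICAL = y;
DEFINE:   reg2 = 0;
          IF (region EQ 2) THEN reg2 = 1;  ! dummy for category 2
          reg3 = 0;
          IF (region EQ 3) THEN reg3 = 1;  ! dummy for category 3
```

A nominal variable with k categories needs k-1 dummies, so a 6-category IV would need five such pairs of statements. Creating the dummies in the data set before reading it into Mplus is an equally valid route.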

Pia H. posted on Thursday, March 24, 2016  5:52 am



Hello, I'm getting a rather confusing error message: *** ERROR Based on Group 2: Group 1 contains inconsistent categorical value for SUPPORT_R: 201 I have had these before and got rid of them simply by collapsing categories; in this variable, however, all categories have been chosen in group 1 and group 2 (I checked); moreover, I don't understand why it says 201 instead of one of the available categories (1-5). Could you give me a hint as to what this may mean? Thank you very much! 


It sounds like you are not reading the data correctly. 


Hello Dr Muthen, I am working on an SEM model and I have many categorical indicators. I would like to ask if there is a minimum number of observations that I should have in each category. Thank you, Owis 


No, but see our FAQ: Estimator choices with categorical outcomes 


Thank you. The document was very helpful. 


Hi I am having a bizarre experience in MPlus not correctly recognizing binary variables. Basic univariate descriptive analysis conducted when my Stata file was converted to a .dat file showed my binary variables as being 0 minimum value (no) and 1 maximum value (yes). However, for the code: Title: Basic descriptives for Instrumental construct with binary indicators WITHOUT survey weights Data: File is youngadultPATHDec23.dat; Variable: NAMES = flavors smell helpquit socialize alttoquit; CATEGORICAL = flavors smell helpquit socialize alttoquit; Missing are all (9) Analysis: TYPE = Basic; I keep getting the following error message: *** ERROR Categorical variable FLAVORS contains 3887 categories. This exceeds the maximum allowed of 10. How can I get MPlus to recognize my variables as binary instead of 3887 categories (which is my total sample size)? thanks, Diane 


You either have more variable names in the NAMES list than columns in the data set, you have blanks in the data set, which are not allowed with free-format data, or you are saying that the id variable is the categorical variable by the placement of the variable name on the NAMES list. All of these will cause the data to be read incorrectly. 


Thank you so much Dr. Muthen for your immediate response!! I greatly appreciate it and will troubleshoot this further. Happy Holidays! best, Diane 


Hi, I am running an analysis with categorical indicators of a latent variable (MLR, loglink). I regress this latent variable on a predictor and am wondering how to interpret the regression coefficient. Should I see it as a logistic regression since the indicators are categorical, or is it linear regression since the latent factor is not categorical? I also wondered why the STDY and the STDYX regression estimates are the same. The predictor is a dichotomous variable. 


A categorical indicator has a logit/probit regression on the factor. The factor has a linear regression on the predictor because the factor is continuous. In the first type of equation STDY and STDYX are the same because there is no x. In the second type of equation they are different. 


Hello, What is the syntax for a factor mean for categorical data? Is it this? [Factorname]; Hillary 


Yes, so the same as with continuous variables. 
