Categorical indicators
Mplus Discussion > Categorical Data Modeling >
 Benedetta posted on Wednesday, February 21, 2001 - 3:21 am
I read that Mplus has been greatly expanded in version 2. I have the first version and
I'm very interested in it.
I would like to know whether you use a full maximum likelihood approach to estimate a multilevel
structural equation model with binary and/or ordered categorical indicator variables, and
if so which type of approach. I would also like to know whether factor scores are available for this type of model.
 Linda K. Muthen posted on Wednesday, February 21, 2001 - 8:10 am
These analysis options are not available in Mplus Version 2. I don't know that there is any program available that estimates this type of model. We are currently doing research on this topic.
 Anonymous posted on Saturday, July 20, 2002 - 8:16 pm

The default estimation method for EFA with categorical indicators is ULS. However, the default for CFA is WLSMV. What are the reasons for preferring ULS as the default in an EFA over WLSMV?
 Linda K. Muthen posted on Sunday, July 21, 2002 - 6:37 am
The reason that ULS was chosen as the default for EFA is that it is much faster. It is a good way to look at a large set of categorical outcomes more quickly than using WLSMV.
 CMW posted on Thursday, December 12, 2002 - 10:27 am

For factor analysis, is it OK to have some items
that are binary and others that are polytomous in
a single analysis?
 bmuthen posted on Thursday, December 12, 2002 - 10:32 am
Yes. See page 180 of the Mplus User's Guide.
 Kyung Kim posted on Tuesday, April 01, 2003 - 6:07 am
Can I calculate factor scores, using M-plus program, for continuous latent variables with categorical (e.g., 5-point or 6-point Likert types) indicators? If not, are there other ways for doing this?
 Linda K. Muthen posted on Tuesday, April 01, 2003 - 6:46 am
Mplus does estimate factor scores for ordered polytomous variables. You ask for them through the SAVEDATA command.
 Per Togo posted on Tuesday, May 20, 2003 - 2:10 pm
I have used a CFA model with three factors
and categorical variables (8 categories)
x BY a b c d;
y BY e d f g;
z BY h i j k;

I could not find a description of how the program computes the factor scores for variables with more than two categories (using CATEGORICAL, SAVEDATA, FSCORES). (I'm not that experienced in matrix calculations and need a description for a paper.)
Can you help me out?
 Linda K. Muthen posted on Thursday, May 22, 2003 - 6:43 am
Appendix 11 of the Mplus User's Guide contains a description of how factor scores are estimated in Mplus. The description for binary variables which begins with formula 227 applies also to polytomous variables. For categorical outcomes, factor score estimation is an iterative technique.
 Anonymous posted on Monday, June 02, 2003 - 7:17 am
I would like to know how Mplus estimates the thresholds.
 bmuthen posted on Monday, June 02, 2003 - 7:46 am
When there are no x variables (covariates) in the model, the thresholds are simply the normal z scores corresponding to the univariate probabilities. With x's, they are obtained via probit regression.
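The no-covariate case described above can be sketched in Python. This is only an illustration of "the normal z scores corresponding to the univariate probabilities", not Mplus's own code; the function name `thresholds_from_counts` and the example counts are hypothetical.

```python
from statistics import NormalDist


def thresholds_from_counts(counts):
    """Return thresholds for one ordered categorical variable.

    counts[k] is the number of observations in category k (lowest first).
    Each threshold is the standard normal z score of the cumulative
    proportion below it, so C categories give C - 1 thresholds.
    Assumes every category is observed (no cumulative proportion of 0 or 1).
    """
    total = sum(counts)
    cumulative = 0
    thresholds = []
    for count in counts[:-1]:  # the highest category needs no threshold
        cumulative += count
        thresholds.append(NormalDist().inv_cdf(cumulative / total))
    return thresholds


# Hypothetical 3-category item with 50 / 30 / 20 respondents
example = thresholds_from_counts([50, 30, 20])
print([round(t, 3) for t in example])
```

With covariates in the model the thresholds come from a probit regression instead, which this sketch does not cover.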
 Erkki Komulainen posted on Thursday, June 05, 2003 - 12:29 pm
Hi! I am working on a problem that would require a polychotomous regression with both manifest and latent independent variables. I cannot work out how that would be written in the syntax.
 Linda K. Muthen posted on Thursday, June 05, 2003 - 12:38 pm
I'm a little confused. You say you have both manifest and observed independent variables. Are the factor indicators for the latent independent variables continuous or categorical? What is the dependent variable in the model? Is it observed or latent? If it is observed, the statement in the MODEL command would be:

y1 ON x1 f1;

where y1 is an observed binary or ordered categorical variable, x1 is an observed independent variable, and f1 is a latent independent variable. You would also need to identify the y1 variable as categorical by using the CATEGORICAL statement of the VARIABLE command.

Please let me know if this does not answer your question.
 Erkki Komulainen posted on Friday, June 06, 2003 - 12:46 am
Hi! I failed with this syntax:



climate BY a3 a21 isuhde2 asuhde2;
ammk993l ON climate sesdiko ero ;

With TYPE IS LOGISTIC that obviously is impossible.

I solved it by modelling in the first step the latent variable and saved the factor score and did the logistic using the factor score. Erkki
 Linda K. Muthen posted on Friday, June 06, 2003 - 7:38 am
TYPE=LOGISTIC is for a single dependent variable. If you do TYPE = GENERAL with the CATEGORICAL statement, you will estimate a probit regression for the categorical dependent variable. I think this is what you want to do. There is a summary of analysis types and estimators available for each analysis type on page 38 of the Mplus User's Guide.
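To make the probit regression mentioned above concrete, here is a small Python sketch of how a threshold and slopes translate into a probability for a binary dependent variable. It assumes the usual probit form P(y = 1 | x) = Phi(-threshold + sum of slope * x) with unit residual variance; the function name and all numeric values are hypothetical, and with a latent predictor the probability is conditional on a given factor value.

```python
from statistics import NormalDist


def probit_probability(threshold, slopes, predictors):
    """P(y = 1 | predictors) for a binary outcome under a probit model.

    Assumption: P(y = 1 | x) = Phi(-threshold + sum(slope_j * x_j)),
    with the residual variance of the underlying response fixed at one.
    """
    linear = sum(b * x for b, x in zip(slopes, predictors))
    return NormalDist().cdf(linear - threshold)


# Hypothetical estimates: threshold 0.4, slopes for x1 and f1
p = probit_probability(0.4, [0.3, 0.8], [1.0, 0.5])
print(round(p, 3))
```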
 Anonymous posted on Thursday, November 06, 2003 - 4:30 pm
Hi there,

I want to create a latent trait variable with categorical indicators (same variable across 6 waves) and the WLSMV estimator and wanted to check to see if my code was correct.

sui by x1 x2 x3 x4 x5 x6;

Since these indicators are categorical, do I have to specify some code for the thresholds? Also, how would I interpret the thresholds at each time point?

 Linda K. Muthen posted on Sunday, November 09, 2003 - 10:21 am
The way you are coding this does not take time into account. But you don't mention that you are interested in change over time. So then it is ok. If you are not taking time into account, then it is not necessary to include thresholds in your analysis.
 Anonymous posted on Tuesday, January 04, 2005 - 5:57 am
I am wondering what is the most appropriate way to handle observed (not latent) categorical covariates in a WLSMV framework. In my study X1 is an ordinal variable (6 classes) and X2 is binary.

Currently, this is how I am specifying the model:

Categorical are p1-p7 a1-a8 d1-d6;

f1 by p1-p7;
f2 by a1-a7;
f3 by d1-d6;

f3 on f1 f2 X1 X2;

Is it alright that the independent X variables are not specified as categorical?

In addition, if I try to estimate the covariance between X variables and F1 F2 (i.e., X1 X2 with F1 F2) I get an error message about X1 being binary (the program then tells me to specify it as categorical). However, if I do not free the covariance between the X1 X2 and F1 F2 the model has a poor fit.

 Linda K. Muthen posted on Tuesday, January 04, 2005 - 6:20 am
You should not put x1 and x2 on the CATEGORICAL list. The CATEGORICAL list is only for dependent variables. You should treat your covariates as you would in regular regression. The binary variable can be left as is. The ordinal variable should be turned into five binary dummy variables. You should not specify any covariance between x1 and x2. The model is estimated conditioned on x1 and x2. If you want to know the covariance between x1 and x2, you can do a BASIC run and obtain this.
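The dummy coding recommended above (a six-category ordinal covariate becomes five binary variables) can be prepared before the data reach Mplus. A minimal Python sketch follows; the function name, the column naming scheme `x1_d<level>`, and the example data are all hypothetical.

```python
def dummy_code(values, reference):
    """Turn one categorical variable into 0/1 dummy columns.

    The reference category gets no column, so C observed categories
    produce C - 1 dummy variables.
    """
    levels = sorted(set(values) - {reference})
    return {
        f"x1_d{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


# Hypothetical six-category ordinal covariate, category 1 as reference
x1 = [1, 2, 3, 4, 5, 6, 3, 1]
dummies = dummy_code(x1, reference=1)
for name, column in dummies.items():
    print(name, column)
```

The resulting columns would then appear only on the right-hand side of ON, like any other covariate.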
 Anonymous posted on Tuesday, January 04, 2005 - 6:31 am
A few follow-ups:

1) Should I specify the covariance between the observed categorical variables (either the binary, ordinal, or both) and the other latent predictors? I know that Mplus does not do this as the default, but it degrades the model fit.

2) If the observed X variables were continuous would your answer to question 1 be different?
 Linda K. Muthen posted on Tuesday, January 04, 2005 - 6:55 am
1. You should not specify the covariances among the x variables. By doing so, you bring them into the model estimation and thereby the assumption of normality applies to them. You are not estimating the model with these covariances fixed to zero. The model is estimated conditioned on the x's. Therefore, the assumption of normality does not apply to them. They are not part of the model so if the fit is poor, it is not because the covariances are zero. They are not. It's the same as regular regression. If you have ten covariates, the parameters of the regression model are an intercept and ten slopes. The model does not estimate the covariances among the x variables.

2. No.
 Anonymous posted on Tuesday, January 04, 2005 - 7:51 am

When I include the categorical X variables into the model (without specifying any covariance), I get the following error message:

 Linda K. Muthen posted on Tuesday, January 04, 2005 - 8:58 am
The only place the categorical x variables should be is on the right-hand side of ON. You should not mention their means, variances, or covariances.
 Anonymous posted on Tuesday, January 04, 2005 - 11:27 am
Last question in this vein. Please excuse my ignorance.

When observed variables are added as covariates to the right side of an ON in an SEM model, should I be concerned that their presence dramatically reduces the fit of the overall model? The modification indices indicate that regressing them onto other latent predictor variables in the model would improve model fit.
However, I really am only interested in examining if the latent predictors are significantly related to the latent DV after controlling for observed variables like race, ses, prior delinquency etc.

Thanks for your help.
 bmuthen posted on Tuesday, January 04, 2005 - 4:03 pm
Typically, other latent predictors would be regressed on the covariates since the covariates are often demographics and therefore antecedents - so such relationships should be included. Model fit can also suffer from left-out direct effects from the covariates to the indicators of endogenous factors.
 Anonymous posted on Wednesday, January 05, 2005 - 9:03 am
When I regress the covariates onto the latent predictor variables the model becomes extremely poor fitting. I have been racking my brain trying to figure out why this is occurring, but have made no headway. Do you have time to take a quick peek if I send you the data and syntax?
 Linda K. Muthen posted on Wednesday, January 05, 2005 - 10:03 am
Send two outputs to Mplus support: the one that fits well without the covariates, and the one with the covariates.
 Anonymous posted on Thursday, January 20, 2005 - 7:55 pm

I am looking at a multiple group model with a binary outcome variable. I don't have a categorical latent variable, but only persistence with no=0, yes=1 coding.

partial output:

GROUPING IS ethn69 (1=white 2=latino);

CATEGORICAL = persiste;
GROUPING IS ethn69 (1=white 2=latino);

type = general missing MEANSTRUCTURE h1;
estimator = ML;

*** ERROR in Analysis command
ALGORITHM = INTEGRATION is not available for multiple group analysis.
Try using the KNOWNCLASS option for TYPE = MIXTURE.

I have tried using the KNOWNCLASS option for TYPE=MIXTURE as recommended but I get a message telling me (correctly so) that I don't have a categorical latent variable. I have four latent variables and a binary outcome variable, how/where should I put this variable?
 Linda K. Muthen posted on Thursday, January 20, 2005 - 8:21 pm
To use KNOWNCLASS, you need to use the CLASSES and KNOWNCLASS options:

CLASSES = cg (2);
KNOWNCLASS = cg (ethn69 = 1 ethn69 = 2);

Check KNOWNCLASS in the index of the user's guide. It should point to an example.
 Anonymous posted on Saturday, January 22, 2005 - 6:26 pm
Hello again!

Thank you for your prompt reply Linda. I want to thank Thuy for his valuable help with my model as well.

I have the following questions:

I began doing a multiple group analysis and have developed group specific models using the modification indices provided by the outputs.

Now that I will be using the KNOWNCLASS option with TYPE=MIXTURE,

1) How is a group specific model developed?

2) Is it possible to analyze the measurement model first and then the structural model. I have not added covariates but I would like to at least add "primary language" and "income"

2) do I use modification indices from the multiple group analysis models or how do I determine measurement invariance?

3) how are the starting values of the variables for each group selected and for each observed factor indicator for each latent variable? I have four latent variables: GC at time 1 (five factor indicators), GC at time 2 (five factor indicators), academic integration (2 factor indicators), and social integration (2 factor indicators); and two categorical variables: persistence and performance (GPA).

4) Does MPlus analyze indirect effects with the TYPE= MIXTURE model?

in addition I am measuring one latent variable at two time points:

I know that growth mixture modeling is not appropriate since I only have two time periods, how is effect of time taken into account in measuring this model?

Pardon my MPlus illiteracy.

Thank you for your help.
 bmuthen posted on Sunday, January 23, 2005 - 3:43 pm
1) In the MODEL command you specify group-specific parameters within %c#% statements, e.g.

%c#1%
f;

says that the variance of the factor f is specific to class 1.

2) I think MIs are available in this environment too, but you can also use chi-square difference testing via 2 times the logL differences of nested models.

3) Mplus provides default starting values.

4) Please check with Thuy.

5) Analyze the outcomes for the 2 time points together: if you have 5 outcomes per time point you analyze 10 variables.
 Anonymous posted on Friday, March 25, 2005 - 6:54 am
Drs. Muthén & Muthén,
I have what I hope is a very quick question. I have been reading (and rereading) the Muthén, du Toit, & Spisic (1997) paper on your robust methods. From what I understand, Mplus uses Equation 44 to obtain parameter estimates when WLSMV is invoked. The weight matrix in Equation 44 is the inverse of a diagonal matrix, W^-1. For WLSMV (and WLSM) I could rewrite Equation 44 by replacing W with Gamma_D, where Gamma_D consists of the diagonal elements from Equation 48. Is my understanding correct?

Thank you.
 bmuthen posted on Friday, March 25, 2005 - 9:08 am
 Anonymous posted on Thursday, April 28, 2005 - 1:45 am
I'm trying to construct a two-way ANOVA-like interaction in Mplus 3.11 using two observed categorical variables. The interaction is between gender (two categories) and family structure (five categories). The dependent variable is a continuous measure of antisocial behavior.
In the Mplus User's Guide (p. 420) you list various ways of obtaining interactions for different variable types, but have not included the interaction between two observed categorical variables. Which is the best method to use in that case?
I have used the following method:
(1) Coded the family structure variable into four dummy variables (the omitted category is the reference group) predicting antisocial behavior.
(2) Used multigroup analysis to examine whether there are any significant differences between the unstandardized betas (mean values) for boys and girls.

Is this the correct way to do it? If it is, do you know of any literature that deals with this categorical-by-categorical interaction using multigroup SEM?
Are you allowed to use multigroup analysis when you have multilevel design?

Thanks in advance.
 Linda K. Muthen posted on Thursday, April 28, 2005 - 8:42 am
You can use DEFINE or the multiple group approach as you have done above. I don't know of any reference to this. For some multilevel models, the GROUPING option can be used for multiple group analysis. If not, then the KNOWNCLASS option can be used instead.
 Fred Li posted on Friday, November 18, 2005 - 3:39 pm
Hello !

I am a new Mplus user. Is there anyone who can help me with the following Mplus program that has an error message at the end of it? My intended task was to create a polychoric correlation matrix for categorical variables for further CFA analysis.

Thanks in ADVANCE!!

TITLE: Number Sense in 2005; Note that this program has saved out
the polychoric correlation matrix in a file called test.pcm;

FILE IS "C:\Documents and Settings\USR1\桌面\NS2005.dat";

NAMES ARE q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14 q15 q16 q17
q18 q19 q20 q21 q22 q23 q24 q25 q26 q27 q28 q29 q30 q31 q32 q33 q34
q35 q36 q37 q38 q39 q40 q41 q42 q43 q44 q45 q46 q47 q48 q49 q50 q51
q52 q53 q54 q55;
USEVARIABLES ARE q2 q3 q4 q5 q6 q9 q10 q11 q12 q13 q14 q15 q16 q17
q18 q20 q21 q22 q24 q25 q26 q29 q30 q31 q32 q33 q34 q35 q36 q37 q38
q39 q40 q41 q43 q44 q45 q46 q47 q48 q49 q50 q51 q52 q53 q54 q55;



CONVERGENCE = 0.00005;

ETA1 BY q2 - q55;

*** ERROR in Savedata command
Only sample correlation matrix may be saved when there is at least
one categorical dependent variable.
 Linda K. Muthen posted on Friday, November 18, 2005 - 4:19 pm
It looks like you have not used the FILE option of the SAVEDATA command to give the name of the file in which to save the correlation matrix. Note that with categorical outcomes, you cannot analyze a correlation matrix and a weight matrix in Mplus but need raw data. You should send these types of questions to Mplus support and include your license number.
 adthrash posted on Thursday, March 02, 2006 - 9:37 am

I have a SEM model using weighted data that will converge with no errors when all the variables are considered continuous but won't run when some indicators are considered categorical (binary) or count. f1 has 7 binary indicators (yes/no) and f2 has 4 count indicators (0-7 days) and one continuous indicator.

CATEGORICAL ARE x1 x2 x3 x4 x5 x6 x7;
COUNT ARE y1 y2 y3 y4;


f1 BY x1 x2 x3 x4 x5 x6 x7;
f2 BY y1 y2 y3 y4;

f2 IND x8;
f2 IND f1;
f2 IND f3;

The error message I receive is "MODEL INDIRECT is not available for analysis with ALGORITHM=INTEGRATION," which I did not specify. If I comment out either the CATEGORICAL or COUNT lines, I get the same error. If I comment out both, everything runs fine.
 Linda K. Muthen posted on Thursday, March 02, 2006 - 10:31 am
When you use MLR with either CATEGORICAL or COUNT, numerical integration is required. Therefore, MODEL INDIRECT is not available.
 adthrash posted on Friday, March 03, 2006 - 5:56 am
Thanks for the quick response. The model terminated normally when CATEGORICAL was used and I commented out the COUNT and MODEL INDIRECT. However, now I get the error "MODINDICES option is not available for ALGORITHM=INTEGRATION." Could I consider all of my variables continuous in order to get MODINDICES, but use the parameter estimates derived when considering the vars categorical when reporting results?
 bmuthen posted on Friday, March 03, 2006 - 6:33 am
That would only be very approximate. Instead, you might want to explore model variations and do chi-square difference testing via -2*loglikelihood difference.
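The chi-square difference test via the -2*loglikelihood difference can be sketched in Python as below. The loglikelihood values and parameter counts are hypothetical placeholders for numbers read off two nested model outputs, and the helper names are made up for this illustration. With a robust estimator such as MLR the difference would additionally need a scaling correction, which this sketch omits.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half_x = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(half_x))  # df = 1 starting point
    else:
        a, q = 1.0, math.exp(-half_x)             # df = 2 starting point
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += half_x ** a * math.exp(-half_x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_test(loglik_restricted, loglik_full, n_params_restricted, n_params_full):
    """Likelihood ratio test for two nested models."""
    statistic = 2.0 * (loglik_full - loglik_restricted)
    df = n_params_full - n_params_restricted
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical values from two nested runs
statistic, df, p_value = lr_test(-2450.3, -2444.1, 20, 23)
print(f"chi-square difference = {statistic:.2f}, df = {df}, p = {p_value:.4f}")
```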
 Stephan Golla posted on Monday, February 19, 2007 - 4:54 am
Hello, my model contains latent factors and two dependent manifest indicators (d7, d16). However, if I put b3 (male=1) into my model I get the following error message:

USEVARIABLES ARE b3 d7 d16 c6 c10 c18 c20 c11 c12 c13 e1-e20 wgt;
CATEGORICAL ARE b3 d7 d16 c6 c10 c18 c20 c11 c12 c13 e1-e20;
MO1 BY c6 c11 c10 c18 c20;
MO2 BY c11 c12 c13;
f1 BY e1-e20* e5@0 e9@0 e13@0 e17@0;
f2 BY e1-e20* e2@0 e9@0 e13@0 e17@0;
f3 BY e1-e20* e2@0 e5@0 e13@0 e17@0;
f4 BY e1-e20* e2@0 e5@0 e9@0 e17@0;
f5 BY e1-e20* e2@0 e5@0 e9@0 e13@0;
d7 d16 ON b3 MO1 MO2 f1-f5;

** ERROR in Variable command
CATEGORICAL option is used for dependent variables only.
B3 is not a dependent variable.
Thanks, Stephan
 Stephan Golla posted on Wednesday, February 21, 2007 - 6:08 am
Sorry, I've just found the answer in one of your web seminar files. We do not distinguish between scales if the variable is an IV (x). However, I guess that in SPSS users have to specify whether they are using a categorical variable. But this seems not to be necessary with Mplus, right?
Regards, Stephan
 Linda K. Muthen posted on Saturday, February 24, 2007 - 7:17 am
In regression analysis, independent variables (covariates) can be binary or continuous. In both cases, they are treated as continuous. In SPSS, perhaps the specification of categorical for an independent variable automatically creates a set of dummy variables for it. In Mplus, if you have a nominal independent variable, you must create a set of dummy variables.
 Soyoung Lee posted on Wednesday, March 21, 2007 - 6:17 pm
I have a question about the dimension of the weight matrix of WLS/WLSM/WLSMV with categorical indicators in factor analysis.
I understand it is [(p(p+1)/2) x (p(p+1)/2)] when p=# indicators, p(p+1)/2=# univariate & bivariate marginal proportions, in dichotomous case. I wanted to know what it would be in polytomous case. [((C-1)p+p(p-1)/2) X ((C-1)p+p(p-1)/2)] with C=# categories, (C-1)p+p(p-1)/2= (# thresholds) + (# polychorics)?
Also, I found the WLS and WLSMV estimates of thresholds are slightly different. Aren’t they the same since thresholds are the normal z scores corresponding to the univariate probabilities? Or, are these updated in the 3rd stage of the Muthen's WLS procedure? Thanks.

(estimates with WLS)
L01$1 -1.440
L02$1 -0.547
L03$1 -0.137
L04$1 -0.714
L05$1 -1.129
(estimates with WLSMV)
L01$1 -1.433
L02$1 -0.550
L03$1 -0.133
L04$1 -0.716
L05$1 -1.126
 Bengt O. Muthen posted on Wednesday, March 21, 2007 - 7:01 pm
Yes on your first question (assuming all items have the same number of categories C).

The model-estimated thresholds are different between WLS and WLSMV because in WLS it is acknowledged that the sample statistic thresholds are correlated among themselves and with the polychoric correlations, while with WLSMV a diagonal weight matrix is used (no correlations at all). The Muthen (1978) Psychometrika article details this in terms of WLS vs ULS.
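The dimension formula confirmed above is easy to check numerically. The following Python sketch counts thresholds plus polychoric correlations, assuming every item has the same number of categories C; the function name is hypothetical.

```python
def wls_weight_dimension(n_items, n_categories):
    """Number of rows (and columns) of the full WLS weight matrix.

    Counts (C - 1) thresholds per item plus one polychoric correlation
    per pair of items, assuming all items share the same C.
    """
    n_thresholds = (n_categories - 1) * n_items
    n_correlations = n_items * (n_items - 1) // 2
    return n_thresholds + n_correlations


# Dichotomous case reduces to p(p + 1) / 2
print(wls_weight_dimension(5, 2))
# Five items with four categories each
print(wls_weight_dimension(5, 4))
```

For the dichotomous case the count equals p(p+1)/2, matching the formula in the question.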
 Gemma vilagut posted on Tuesday, April 17, 2007 - 10:16 am
We are trying to perform CFA with Mplus, including both continuous indicators and ordered polytomous indicators (5-category Likert).
I ran a CFA specifying which variables were categorical (CATEGORICAL option) with the WLSMV estimator. Result: the continuous items have large residual variances (R-square < 0.2), small loadings (StdYX < 0.45), and not very good GOF indices.
However, when treating all the variables as continuous (with the MLM estimator), I obtain better results. The results are not awesome, but the most usual GOF indices are over the recommended thresholds (CFI and TLI > 0.95, RMSEA = 0.015, Standardized Root Mean Square Residual = 0.049). Also, residual variances are relatively small (except for one item, FD21 = 0.756).
The chi-square test is significant in both analyses, but the sample size is 4400 individuals.
I understand that it is more appropriate to treat the variables as categorical, especially taking into account the highly skewed distributions of our items, but the fact that I obtain better results with the continuous variables approach makes me doubt.
Could you tell me which is the best approach to analyze these data? Does it make sense to compare the GOF indices of both methods and use the one that provides better fit? Or is it incorrect to analyze ordered categorical items without taking into account that they are actually categorical (i.e. not using the CATEGORICAL statement)?

Many Thanks!
 Linda K. Muthen posted on Tuesday, April 17, 2007 - 11:08 am
If your items have strong floor or ceiling effects, treating them as continuous results in attenuated correlations. The low correlations can result in better model fit because there is less power to reject the null hypothesis. Categorical items with strong floor or ceiling effects should be treated as categorical.
 David Kerr posted on Monday, May 28, 2007 - 9:50 am
Hello---I’m trying to test a model that has latent variables each comprised of categorical indicators (questions that can be answered agree, disagree, or don’t know).

The “don’t knows” are of interest, so I thought I would create two dummy coded variables: one coded 1 = agree, 0 = disagree/don’t know, and another coded don’t know = 1, agree/disagree = 0. My thought was that I might be able to create a “don’t know” factor that would capture this response type, and make the other factors more interpretable in terms of agree/disagree.

However, when I put these variables in the model at the same time, I got error messages indicating that correlations between variables result in empty cells (i.e., no one can have a value of 1, 1). Is such a model with dummy coded dependent variables possible?
 Linda K. Muthen posted on Monday, June 04, 2007 - 11:09 am
I think there is a literature on how to treat "don't know". You may find some suggestions there on how to treat this category.

Categorical outcome methodology uses information from bivariate tables of outcomes. A zero cell in a bivariate table implies a correlation of plus or minus one, resulting in estimation problems.
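The empty cell arises by construction when two dummy variables are built from the same three-category item, as a quick Python cross-tabulation shows. The response data here are hypothetical.

```python
from collections import Counter

# Hypothetical responses to one agree / disagree / don't know item
responses = ["agree", "disagree", "dont_know", "agree", "dont_know", "disagree"]

# The two dummy codings described in the question
agree = [1 if r == "agree" else 0 for r in responses]
dont_know = [1 if r == "dont_know" else 0 for r in responses]

# Bivariate 2 x 2 table of the two dummies
table = Counter(zip(agree, dont_know))
for cell in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(cell, table[cell])
```

No respondent can both agree and answer "don't know" on the same item, so the (1, 1) cell is always empty regardless of the data.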
 Magda Mónica Martins Rocha posted on Thursday, April 17, 2008 - 7:01 am




F1 by A B C D;
F2 by E F G H;
F3 by I J L M;
F3 F2 ON F1;
F3 F2 F1 ON AGE;


 Linda K. Muthen posted on Thursday, April 17, 2008 - 12:28 pm
The first indicator is fixed to one as the default to fix the metric of the factor. See the BY option in Chapter 16 of the Mplus User's Guide for a description of this. A fixed parameter has no standard error.
 Magda Mónica Martins Rocha posted on Friday, April 18, 2008 - 3:00 am
What a fool I am.

Thank you Linda.
 Angela Wolff posted on Wednesday, July 23, 2008 - 7:18 am
I'm trying to specify an interaction model using MPLUS 5.1

The interaction is between age (H9AGE, continuous) and perceived diversity (C1SAGEr, six categories). The dependent variables are categorical measures of emotional exhaustion, depersonalization, and personal accomplishment.

In the Mplus user guide (p.522) you list various ways of obtaining interactions for different variable types.

I have specified the model as follows:

B20PA B21PA B26PA;







1. should the IV be correlated?
2. Is this the correct way to do this? Would it be better to use multigroup analysis to examine this interaction?

 Angela Wolff posted on Wednesday, July 23, 2008 - 9:22 am
Also, I was trying to run a model with multigroup analysis to test the interaction. However, I received this warning:

Based on Group 1: Group 2 contains inconsistent categorical value for B25EE: 6
 Linda K. Muthen posted on Thursday, July 24, 2008 - 9:20 am
1. Means, variances, and covariances of exogenous observed variables should not be part of the MODEL command. The model is estimated conditioned on these variables. If you want to know these values, ask for SAMPSTAT in the OUTPUT command.

2. You cannot use multiple group analysis when one of the variables is continuous. The way you specified it is fine.
 Scott J. Peters posted on Thursday, October 16, 2008 - 6:17 pm
I have a data set for categorical CFA involving 12 categorical variables on a single factor. I keep receiving the error:


When I look at the output correlation matrix, I see that one variable is correlated 999.00 with two other variables while all of the rest are fine. There are no odd values in my data set.

Any thoughts for why this is happening?

 Linda K. Muthen posted on Thursday, October 16, 2008 - 6:31 pm
It sounds like there is a problem computing the sample statistics. Please send your input, data, output, and license number to Mplus support.
 Maren Winkler posted on Monday, July 27, 2009 - 1:57 am

I'm doing CFA for a newly developed test. It comprises 23 items which belong to 14 testlets. Hence, testlets comprise 1 to 4 items.
Items are scored as either right (1) or wrong (0). To account for dependencies among items belonging to one testlet, testlet scores are computed (sum of correct items per testlet).
Maximum testlet scores vary between 1 (for testlets with only one item) and 4 (for testlets with four items).
I've done CFA, using the CATEGORICAL option.

Because I was not sure whether the indicators need to have the same maximum value, I rescaled all testlets scores - 12 being the maximum number (all items correct) and 0 the minimum number. For testlets with four items, the testlet score can be 0, 3, 6, 9, 12; for testlets with three items 0, 4, 8, 12; for testlets with two items 0, 6, 12; for testlets with one item 0, 12.
Re-doing CFA my results differ in magnitude with regard to chi square, fit indices, factor loadings.

Which results to trust now? Is a common scale with the same maximum score for all testlets necessary or shall I leave the data as it is?

Thank you very much for your help!
 Linda K. Muthen posted on Monday, July 27, 2009 - 9:10 am
I think the approach described in the first paragraph is most transparent. The approach described in the second paragraph makes assumptions that may not be true. I would leave the data as is.
 Maren Winkler posted on Monday, August 24, 2009 - 3:59 am

There is a paper by Browne and Du Toit from 1992 ("Automated Fitting of Nonstandard Models", Multivariate Behavioral Research, 27 (2), 269-300) where they suggest estimating "Root Deterioration per Restriction" (RDR) for model comparison. The formula is
RDR = square root of ((delta chi² - delta df)/(N* delta df))

I'm using the WLSMV estimator for categorical data. Am I correct in assuming I should use the delta chi² and delta df I get in my output when I do the chi-square difference test using the DIFFTEST option for the above formula?

Thanks for your help!
 Linda K. Muthen posted on Monday, August 24, 2009 - 8:31 am
I think the paper by Browne and DuToit is for WLS not WLSMV.
 Maren Winkler posted on Tuesday, August 25, 2009 - 1:39 am
Is it not possible to use RDR for WLSMV? Or is there an alteration of RDR for WLSMV?
 Linda K. Muthen posted on Tuesday, August 25, 2009 - 7:38 am
No, this is not possible. For WLSMV, the chi-square test statistic and degrees of freedom are adjusted to obtain a correct p-value. Only the p-value should be used.
 Cecily Na posted on Friday, December 03, 2010 - 2:52 pm
Dear Linda,
I read a textbook that contains an Mplus syntax sample.
It looks like they use a * for underlying distribution of a categorical or dichotomous indicator.
For example:
f1 by q1*-q12*

where q1-q12 are categorical variables. Do we need to use *?

Do I have to write
Criminal* ON drug

where criminal is a categorical variable (but not a latent factor)?
 Linda K. Muthen posted on Friday, December 03, 2010 - 4:23 pm
No, you don't need to use an asterisk. The way Mplus knows criminal is categorical is because you put it on the CATEGORICAL list.
 Sabine Spindler posted on Tuesday, May 03, 2011 - 7:48 am
Dear Drs. Muthén,

I am trying to do a multiple groups comparison, however, I get the following error:

Based on Group 0: Group 1 contains
inconsistent categorical value for Q61RAW: 1

What exactly does it tell me?

Thank you!
 Linda K. Muthen posted on Tuesday, May 03, 2011 - 7:54 am
It tells you that the number of categories for Q61RAW is not the same in both groups. Check your data. The groups must have the same number of categories for the categorical variables.
 Sanja Franic posted on Wednesday, November 16, 2011 - 1:32 am
Hi, I get the following warning when trying to run a multigroup analysis on categorical variables:

Based on Group 1: Group 2 contains
inconsistent categorical value for Y21: 5

The observed variables are all defined as categorical. Could you maybe give me a hint as to what this warning could indicate?

 Linda K. Muthen posted on Wednesday, November 16, 2011 - 2:16 pm
Apparently, y21 does not have the value 5 in group 2. With weighted least squares, each group must have the same categories for categorical variables. You could collapse categories in group 1. With maximum likelihood estimation, you can use the * setting of the CATEGORICAL option.
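Before running a multiple group analysis, the data can be screened for this problem. The Python sketch below lists, for each group, the category values that occur somewhere in the data but not in that group; the function name and the example data are hypothetical.

```python
def inconsistent_categories(values, groups):
    """Map each group to the category values it is missing.

    values and groups are parallel lists: the observed category and the
    group membership of each case. Groups that contain every category
    observed anywhere in the data are left out of the result.
    """
    seen = {}
    for value, group in zip(values, groups):
        seen.setdefault(group, set()).add(value)
    all_values = set().union(*seen.values())
    return {
        group: sorted(all_values - categories)
        for group, categories in seen.items()
        if all_values - categories
    }


# Hypothetical item where the value 5 never occurs in group 2
y21 = [1, 2, 3, 4, 5, 1, 2, 3, 4]
group = [1, 1, 1, 1, 1, 2, 2, 2, 2]
missing = inconsistent_categories(y21, group)
print(missing)
```

A non-empty result points to the variables and groups where categories would need to be collapsed before using weighted least squares.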
 Sanja Franic posted on Wednesday, November 16, 2011 - 10:08 pm
Thanks!! I hope I can do the same/similar with WLSMV estimation.
 Sarah Ryan posted on Friday, September 21, 2012 - 1:04 pm
I just want to confirm something and I have searched the discussion board and web, but can't find an explicit answer.

Is it correct that the output section "UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES" provides the model estimated (as opposed to observed) proportions and counts?
 Linda K. Muthen posted on Saturday, September 22, 2012 - 3:35 pm
They are not model estimated values. They are sample statistics.
 Sarah Ryan posted on Monday, September 24, 2012 - 6:13 am

Okay, I originally thought so, but I get "fractions" of people in the sample counts and a colleague was insisting these must be estimates given that the counts were not whole numbers.

So, the ".XX" on counts is just a function of the proportions also being fractional?

Thanks for indulging me on this rather basic question.
 Linda K. Muthen posted on Monday, September 24, 2012 - 11:54 am
Please send the output and your license number so I can see exactly what you are seeing.
 muhammed erkoç posted on Tuesday, October 02, 2012 - 7:12 am

I am a student at Ege University, Turkey. I managed to generate data sets with continuous indicators, but my homework is to generate data sets for my friends' homework using Mplus. The program is only available in computer labs and I usually have limited time to use it. I am trying to generate data sets with ten items: 6 items for factor 1 and 4 items for factor 2, all on a 5-point Likert scale. I know it sounds simple, but I couldn't get it to work.

Here is my syntax:

names = y1-y10;
nobservations = 200;
nreps = 25;
seed = 2344887;
generate = y1-y10 (5 p);
categorical are y1-y10;
repsave = all;
save = odev*.dat;
f1 ON y1-y6*.65;
f2 ON y7-y10*.70;
f1 WITH f2*.2;

As you see I can't reach the solution. Would you please help on my syntax or show me a simple example.

 Linda K. Muthen posted on Tuesday, October 02, 2012 - 10:42 am
See mcex5.2.inp. It generates data for two factors with binary indicators. It is available on the website.
 muhammed erkoç posted on Wednesday, October 03, 2012 - 12:54 am
Thanks for your help.
 Claire  posted on Wednesday, November 14, 2012 - 7:31 am

I would like clarification regarding some issues in the interpretation of path models using WLSMV and theta parameterization. In my model:

Y ON X1 X2 X3

X2 ON X1

X3 ON X2

Model indirect:

X1-X3 are binary and Y is continuous. If I'm right in my interpretation the coefficients for Y ON X1-X3 are normal linear regression coefficients and X2 ON X1 and X3 ON X2 are probits.

1. If that's right how do I interpret the indirect effect when it is the product of two probit coefficients and a linear coefficient?

2. Also if I wanted to standardize the results which solution would I use - StdYX or Std or are neither appropriate?

3. Using WLSMV (theta) can any of the normal fit statistics be used to assess model fit?

Many thanks.
 Linda K. Muthen posted on Wednesday, November 14, 2012 - 11:58 am
Yes, y is linear and the x's are probit if you are using the default estimator WLSMV.

1. Linear
2. StdY
3. Yes.
 Claire  posted on Thursday, November 15, 2012 - 1:52 am
Thanks Linda (forgive me as I'm new to Mplus/SEM)

Re: 2. When asking for the standardized output it only gives me StdYX and Std. Is there a way to get StdY? If not is it sufficient to just report the unstandardized estimates?

Re: 3. Which particular fit statistics are safe to use when using WLSMV (theta) and do you know of a reference to read on this?
 Linda K. Muthen posted on Friday, November 16, 2012 - 12:12 pm
Divide StdYX by the standard deviation of x.

I know of no paper that has studied this. I would use the same fit statistics as for Delta which are the ones provided in the results section.
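
As a worked illustration of that division: StdYX is the coefficient multiplied by SD(x)/SD(y), so dividing by SD(x) leaves the coefficient divided by SD(y), which is StdY. The sketch below assumes a binary x whose standard deviation is taken from the population formula sqrt(p(1 - p)); the function names and the numbers are hypothetical, and the standard deviation Mplus reports in its sample statistics should be preferred when available.

```python
import math


def sd_binary(p):
    """Population standard deviation of a 0/1 variable with P(x = 1) = p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.sqrt(p * (1.0 - p))


def stdy_from_stdyx(stdyx, sd_x):
    """Convert a StdYX coefficient to StdY by dividing by SD(x)."""
    if sd_x <= 0:
        raise ValueError("sd_x must be positive")
    return stdyx / sd_x


# Hypothetical numbers: StdYX = 0.20 for a binary x with half the sample coded 1.
sd_x = sd_binary(0.5)
stdy = stdy_from_stdyx(0.20, sd_x)
print(round(sd_x, 3), round(stdy, 3))  # 0.5 0.4
```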
 Trang Q. Nguyen posted on Friday, November 30, 2012 - 1:38 pm

I am running a structural model where I have 10 3-point Likert indicators that are theorized to measure 4 constructs, 3 being predictors and 1 being the outcome. I want to declare these variables as categorical ordinal. However, one of the predictor constructs has only one indicator, so if I just include it as an independent variable, it cannot be declared as categorical ordinal. Is it OK to include a factor for it (to make it a y so I can declare it as categorical) and fix both the loading and variance of that one-indicator factor at 1? Would this be equivalent to using the underlying continuous version of the categorical variable?

Below is my input. The variable in question here is var7.

NAMES = var1-var10;
USEVARIABLES = var1-var10;
CATEGORICAL = var1-var10;

f1 BY var1* var2 var3;
f2 BY var4* var5 var6;
f3 BY var7;
f4 BY var8* var9 var10;
f1@1 f2@1 f3@1 f4@1;
f4 ON f1 f2 f3;

Thanks much!
 Linda K. Muthen posted on Friday, November 30, 2012 - 3:21 pm
Yes, this would be equivalent to doing that. I'm not sure why you would want to do this given that you make unnecessary normality assumptions. I would use the observed variable as the covariate or create a set of dummy variables to represent the categories and use them as covariates.
 Trang Q. Nguyen posted on Saturday, December 01, 2012 - 8:31 am
Thank you. Yes, I need to think about whether to do this or to use dummy variables. The pro of using an assumed normal continuous version of var7 is the result will be simple -- just one regression coefficient, similar to one coefficient for each of f1 and f2. Conceptually these are three things hypothesized to influence f4, and it would be nice to have three coefficients instead of four. The con is introducing inaccuracy through normality assumption.

Var7 has three ordinal categories and there are roughly the same number of people in each category. Does this suggest that it is perhaps OK to assume normality for the underlying continuous variable? Or should I still worry that it might be skewed and the equal proportions could be just an artificial result of the thresholds?

In practice, is it something that is sometimes done, to treat an ordered categorical independent variable this way? Or is it a no no?

Thanks much for advising me on this. I am quite inexperienced in SEM.
 Bengt O. Muthen posted on Saturday, December 01, 2012 - 5:09 pm
I vote for treating the observed 3-category Var7 as a continuous variable - that is, as a regular covariate. I say that because you say its distribution is symmetric and because when you treat it as a regular covariate you don't introduce an assumption of underlying normality.
 Trang Q. Nguyen posted on Monday, December 03, 2012 - 7:27 am
Thank you so much!

So that I am crystal clear, your suggestion is to include the raw variable as a continuous variable (ie take it out of CATEGORICAL =), and not introduce that f3 factor?
 Bengt O. Muthen posted on Monday, December 03, 2012 - 1:37 pm
 Trang Q. Nguyen posted on Monday, December 03, 2012 - 2:01 pm
Thank you!
 Maren Formazin posted on Wednesday, February 13, 2013 - 8:10 am

Is it possible to use the AUXILIARY (M) command with categorical data? I get an error message when trying, but I seem to recall it was possible with Mplus 5. Is there a way around this problem?

 Linda K. Muthen posted on Wednesday, February 13, 2013 - 10:34 am
All variables on the AUXILIARY list are treated as continuous. There may not have been an error message in Version 5.
 Maren Formazin posted on Friday, February 15, 2013 - 1:46 am

Sorry, my question was not clear: not only my auxiliary variables but also the indicators of my latent variable are categorical; hence, I use the WLSMV estimator.

Is the AUXILIARY (M) option not possible with categorical indicators?

 Linda K. Muthen posted on Friday, February 15, 2013 - 10:46 am
AUXILIARY (m) is available only for continuous variables.
 Juliana Werneburg posted on Wednesday, March 13, 2013 - 11:55 am

I would like to model some kind of twin model: that is two latent variables, each measured by 3 binary indicators. Measurements might be parallel. I have other covariates to add.

I tried to model this by factor analysis, that is, I have two continuous latent variables measured by 3 indicators each, though I have a grouping variable. It results in very good model fit (all Chi², RMSEA, TLI, Chi² difference tests). So I conclude the indicators are good measurements of some underlying factor.

But if I try to fit the model using LCA as a framework, that is, with both latent variables as binary class variables (with 3 classes the result is the same), model fit turns out to be awful.

Is there any reason why this doesn't work, although the indicators seem to be fine? Is there anything to try out or to look at as an explanation?

Thank you very much for your help,
 Linda K. Muthen posted on Wednesday, March 13, 2013 - 6:02 pm
With the LCA for twins, you should have two latent class variables, one for twin a and one for twin b. Each one should have two classes. You need to use PARAMETERIZATION = LOGLINEAR and use WITH to correlate the two categorical latent variables, for example,

c1 WITH c2;
 Wendong  posted on Wednesday, April 24, 2013 - 6:04 pm
Hi Dr. Muthen,

May I ask a quick question?

I know that, strictly speaking, we should treat Likert-type (e.g., 1-5 point) item-level data as ordinal, and thus in SEM we should use estimator=WLSMV.

But it seems that many people treat such scales as continuous. Thus my question is, could we still treat such items as continuous, but use WLSMV as estimator, to deal with data skewness at item level?

Thank you so much!
 Linda K. Muthen posted on Wednesday, April 24, 2013 - 6:33 pm
You can treat them as continuous as long as they don't have floor or ceiling effects. If they do, I would treat them as categorical because that methodology can handle the floor and ceiling effects.

You can't use WLSMV without having at least one categorical variable in the model.
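
One simple way to look for floor or ceiling effects before deciding is to compute the share of responses in the lowest and highest category of each item. This is only a rough sketch; the function name and the toy responses are hypothetical, and how large a share counts as a floor or ceiling effect is a judgment call that the thread does not specify.

```python
from collections import Counter


def end_category_shares(values, lowest, highest):
    """Return (share in lowest category, share in highest category)."""
    n = len(values)
    if n == 0:
        raise ValueError("no responses supplied")
    counts = Counter(values)
    return counts[lowest] / n, counts[highest] / n


# Hypothetical responses to one 1-5 Likert item, piled up at the bottom.
item = [1, 1, 1, 1, 1, 1, 2, 2, 3, 5]
floor_share, ceiling_share = end_category_shares(item, lowest=1, highest=5)
print(floor_share, ceiling_share)  # 0.6 0.1
```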
 Lois Downey posted on Thursday, May 01, 2014 - 11:44 am
I have a situation in which the univariate proportions/counts listed in the output do not make sense.

If I run an unclustered model with the full sample, the univariate values are correct.

If I run a clustered model with the full sample, the univariate values continue to be correct.

If I select a subsample with the USEOBSERVATIONS command and run an unclustered model, the univariate values continue to be correct -- although, as expected, smaller than for the full sample.

However, if I select a subsample with the SUBPOPULATION command and run a clustered model, the univariate values are incorrect. In fact, the counts aren't even integers.

What might I be doing incorrectly?
 Linda K. Muthen posted on Thursday, May 01, 2014 - 11:51 am
Please send the USEOBSERVATIONS and SUBPOPULATION outputs along with your license number.
 jane shen posted on Friday, June 27, 2014 - 12:24 pm

When sampling weights are added to the model, what is "UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES"?

They are no longer sample statistics for the raw data, nor for the weighted data.

Mplus version: 7.11


 Bengt O. Muthen posted on Sunday, June 29, 2014 - 11:10 am
Please send your input, output, and data.
 Samuli Helle posted on Wednesday, May 13, 2015 - 11:30 am
I have a question of how to interpret and to best standardize the estimates of the model below. My main interest lies on the regression coefficient of continuous time survival outcome Y on factor F, which is measured by one count (marker) indicator (X1) and two binary indicators (X2, X3).

F BY X1 X2 X3;

1) Not sure how the factor F is scaled: a count indicator X1 is used to set the scale, but based on the model output, the link function used is logit. Shouldn’t the link in count outcomes be log instead? So, in this case, how to interpret the factor scale?

2) Given the above, what’s the best way to standardize here? I know that it makes no sense to standardize the binary indicators, but should I use STDYX standardization when interpreting the regression coefficient of Y on F?
 Bengt O. Muthen posted on Thursday, May 14, 2015 - 10:53 am
Set the metric in f by fixing its variance at 1 and free the first loading.

The count link is not mentioned in the Mplus output, but it is the usual log link for counts.

So then you don't need to worry about standardizing.
 Samuli Helle posted on Saturday, May 16, 2015 - 5:01 am
Thanks Bengt. I did as suggested and the results changed a bit - the regression coefficient of Y on F had a different sign now (although non-significant in both cases). Is this what to expect?

Moreover, as someone not too familiar with latent variables, does that scaling approach change the interpretation of the latent variable (e.g. is the latent still measured with "error" although its variance is fixed)?
 Bengt O. Muthen posted on Saturday, May 16, 2015 - 8:06 am

The sign should not change if you do it right. If you like, you can send the 2 outputs (for the different metric settings) to support along with your license number.

Q2. No; same interpretation.
 Samuli Helle posted on Saturday, May 16, 2015 - 10:12 am
The code I am using is:

Y BY X1 X2 X3


Y BY X1* X2 X3; Y@1;
 Bengt O. Muthen posted on Monday, May 18, 2015 - 12:26 pm
Send the 2 outputs that show the sign change to support along with your license number.
 Kristy Snyder posted on Tuesday, January 12, 2016 - 12:57 pm
A SEM with 2 continuous latent factors measured by y (cont.), u (count), & n (nominal)

n_a & n_b have 4 categories n_c has 3

DEFINE: n_a1 = n_a == 1;
n_a2 = n_a == 2;
n_a3 = n_a == 3;
n_c2 = n_c == 2;

1) How can I evaluate the contributions of each set of dummy variables?

MODEL: f1 BY y1 - y4 u1 - u3;
f2 BY y5 (n_a set) (n_b set) (n_c set);

outcome ON f1 f2;

2) The f2 indicators measure a common construct but the (n_a set) measure a different facet of that construct than the other f2 indicators. If it is uncorrelated with the other indicators it won’t load highly on that factor but it is an important component of the construct so I want to be sure that its influence on the outcome variables isn’t underestimated. Would you suggest making an interaction term?
f2 BY y5 (n_b set) (n_c set);
f2int | (n_a set) XWITH f2;

outcome ON f1 f2int;

Would using the f2int be appropriate as a measure of the main effect of a single construct or would it be interpreted as a 3 way interaction?
 Bengt O. Muthen posted on Wednesday, January 13, 2016 - 12:33 pm
1) I don't think you should split up the nominal outcomes into dummy variables. Do the factor analysis with them declared as nominal.

2) Perhaps you want to use a bi-factor model where the "different facet" is a "specific factor" different from the "general factor".

I don't see how interactions would help.
 Yanick Brice posted on Saturday, January 16, 2016 - 5:54 am
For earlier versions of Mplus, if you have a nominal independent variable, you must create a set of dummy variables. Is this statement still true with the most recent version of Mplus (7+)? A lot of my IVs are nominal (up to 6 categories). Is there an efficient way of handling them in Mplus in my two-level logistic regressions? Thank you.
 Bengt O. Muthen posted on Saturday, January 16, 2016 - 6:36 am
Nominal IVs need to be turned into dummy variables.
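
For a nominal variable with k categories this means k - 1 dummy variables, with one category left out as the reference. The sketch below shows the coding step done outside Mplus; the function name, the `d` prefix for the new variables, and the toy data are assumptions for illustration only.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator list per non-reference category."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError("reference category does not occur in the data")
    return {
        f"d{level}": [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


# Hypothetical 3-category nominal IV with category 1 as the reference.
region = [1, 2, 3, 2, 1]
dummies = dummy_code(region, reference=1)
print(dummies)  # {'d2': [0, 1, 0, 1, 0], 'd3': [0, 0, 1, 0, 0]}
```

Cases in the reference category are coded 0 on every dummy, so each coefficient is read as a contrast with that category.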
 Pia H. posted on Thursday, March 24, 2016 - 5:52 am

I'm having a quite confusing error message:

Based on Group 2: Group 1 contains
inconsistent categorical value for SUPPORT_R: -201

I have had those ones before and got rid of them simply by collapsing categories; in this variable, however, all categories have been chosen in group 1 and group 2 (I checked); moreover, I don't understand why it says -201 instead of the categories available (1-5).
Could you give me a hint what this may mean?

Thank you very much!
 Linda K. Muthen posted on Thursday, March 24, 2016 - 8:17 am
It sounds like you are not reading the data correctly.
 Owis Eilayyan posted on Friday, November 25, 2016 - 12:43 pm
Hello Dr Muthen,

I am working on SEM model and I have many categorical indicators. I would like to ask if there is a minimum number of observations that should I have in each category.

Thank you,
 Bengt O. Muthen posted on Friday, November 25, 2016 - 3:18 pm
No, but see our FAQ:

Estimator choices with categorical outcomes
 Owis Eilayyan posted on Saturday, November 26, 2016 - 4:16 pm
Thank you,

The document was very helpful
 Diane Martinez posted on Friday, December 23, 2016 - 2:55 pm
Hi I am having a bizarre experience in MPlus not correctly recognizing binary variables. Basic univariate descriptive analysis conducted when my Stata file was converted to a .dat file showed my binary variables as being 0 minimum value (no) and 1 maximum value (yes).

However, for the code:

Basic descriptives for Instrumental construct with binary indicators WITHOUT survey weights
Data: File is youngadultPATHDec23.dat;
NAMES = flavors smell helpquit socialize alttoquit;
CATEGORICAL = flavors smell helpquit socialize alttoquit;
Missing are all (-9)
TYPE = Basic;

I keep getting the following error message:
Categorical variable FLAVORS contains 3887 categories.
This exceeds the maximum allowed of 10.

How can I get MPlus to recognize my variables as binary instead of 3887 categories (which is my total sample size)?

 Linda K. Muthen posted on Friday, December 23, 2016 - 2:59 pm
You either have more variable names in the NAMES list than columns in the data set, you have blanks in the data set, which are not allowed with free format data, or you are saying that the id variable is the categorical variable by the placement of the variable name on the NAMES list. All of these will cause the data to be read incorrectly.
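
The first two causes can be screened for by counting the fields on each record of the data file and comparing that count with the number of names on the NAMES list. The sketch below is a rough check under the assumption that fields are separated by blanks, tabs, or commas; the function name and the sample records are hypothetical.

```python
def check_free_format(lines, n_names):
    """Return (line number, field count) for records that do not have n_names fields."""
    problems = []
    for number, line in enumerate(lines, start=1):
        # Treat commas like whitespace so empty comma-separated fields show up as missing.
        fields = line.replace(",", " ").split()
        if len(fields) != n_names:
            problems.append((number, len(fields)))
    return problems


# Hypothetical records for 5 named variables: line 2 has a blank, line 3 an extra column.
records = [
    "1 0 1 0 1",
    "0 1   1 0",
    "1,0,1,0,1,3887",
]
problems = check_free_format(records, n_names=5)
print(problems)  # [(2, 4), (3, 6)]
```

With a real file, the records could be read with `open(path).read().splitlines()` before calling the function.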
 Diane Martinez posted on Friday, December 23, 2016 - 3:03 pm
Thank you so much Dr. Muthen for your immediate response!! I greatly appreciate it and will troubleshoot this further.

Happy Holidays!
