
Anonymous posted on Tuesday, January 25, 2005  1:01 pm



There are 6 independent variables in the SEM model I am running: X1 to X6 are used to predict 5 dependent variables (one of which is categorical). X3, X4, and X6 are latent variables, and X1, X2, and X4 are observed variables. X4 is a categorical (dichotomous) variable. Using Mplus, I can get correlations among the latent independent variables, among the observed independent variables, and between the latent and observed independent variables. But I cannot get X4 to correlate with any other independent variable. Is there a way to do it? Thanks.

BMuthen posted on Tuesday, January 25, 2005  2:05 pm



You should regress the latent exogenous variables on the observed exogenous variables to make them correlate. 

Anonymous posted on Tuesday, January 25, 2005  2:33 pm



I am sorry, I made a mistake. It should be that X1, X2, and X5 are observed variables and X5 is a categorical (dichotomous) variable. X1 to X6 are all time 1 variables. If I regress the latent exogenous variables on the observed exogenous variables (e.g., X3 on X5;), doesn't that make it a path rather than a correlation? When I used "X5 WITH X1 X2 X3 X4 X6;" I got this message: *** FATAL ERROR VARIABLE X5 CAUSES A SINGULAR WEIGHT MATRIX PART. THIS MAY BE DUE TO THE VARIABLE BEING DICHOTOMOUS BUT DECLARED AS CONTINUOUS. RESPECIFY THE VARIABLE AS CATEGORICAL. How should I respond? Thank you very much.

michael posted on Wednesday, January 26, 2005  9:21 am



Dear Linda and Ben, I have a question concerning a path model with both latent and observed explanatory and dependent variables. Some of the observed explanatory variables are binary, in other words, dummy variables coded 0 and 1. As shown in the Mplus manual, noncontinuous dependent variables can be declared with the "CATEGORICAL ARE ;" option. But what about categorical independent variables? For them there isn't any declaration, as far as I can see. My questions:
1. Hence, is the interpretation of their effects just the same as the interpretation of, e.g., dummy variables in logistic regression?
2. Does it make sense to centre these dummy variables, and if so, is it recommended?

DATA: FILE IS d:\tatenm.t;
VARIABLE: NAMES ARE ab un mi a fr g16 g17 g18 g1 g2 q1 q2 q3 q4 q5 o ot s;
CATEGORICAL ARE ot s;
MODEL:
f1 by g16 g17 g18 g1 g2;
f2 by q1 q2 q3 q4 q5;
f1 on un ab mi a o;
f2 on f1 un a fr;
ot on un ab a fr f1;
ot with f2;
s on ot un ab a o fr f1 f2;

In the above example, "ab un mi" are educational degrees: ab = higher secondary education, un = university, mi = high level of lower secondary education. The remaining reference group is low level of lower secondary education.
3. In path analysis and structural equation modelling with both unobserved and observed independent variables: is it correct to split the nominal variable "educational degree" into four dummies and to include three of them in the model, just as usual in regression analysis?
Thanks a lot, Michael Windzio, Hannover (KFN), Germany


The scale of the independent variables does not affect model estimation. This is the same as in regular regression. 1. It depends on the estimator. With weighted least squares and categorical dependent variables, probit regressions are estimated. With maximum likelihood and categorical dependent variables, logistic regressions are estimated. 2. No. 3. Yes, as in regular regression, independent variables can be binary or continuous.
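
The difference between the two estimators in point 1 comes down to the link function. Below is a minimal Python sketch of the two links, assuming a hypothetical intercept and dummy-variable slope (neither value comes from this thread). It illustrates the arithmetic only and is not Mplus output.

```python
import math

def probit_prob(eta):
    """P(y = 1) under a probit link: standard normal CDF of the linear predictor."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def logit_prob(eta):
    """P(y = 1) under a logistic link: inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical values, for illustration only.
intercept, slope = -0.5, 0.8

for x in (0, 1):  # binary (dummy) independent variable
    eta = intercept + slope * x
    print(x, round(probit_prob(eta), 3), round(logit_prob(eta), 3))
```

In both cases the dummy's coefficient is the shift in the linear predictor between the two groups. The coefficients are on different scales, so probit and logistic estimates should not be compared directly.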

Anonymous posted on Wednesday, January 26, 2005  2:45 pm



You mentioned that "the scale of the independent variables does not affect model estimation". But when I used "X5 WITH X1 X2 X3 X4 X6;" I got this message: *** FATAL ERROR VARIABLE X5 CAUSES A SINGULAR WEIGHT MATRIX PART. THIS MAY BE DUE TO THE VARIABLE BEING DICHOTOMOUS BUT DECLARED AS CONTINUOUS. RESPECIFY THE VARIABLE AS CATEGORICAL. How should I specify X5, which is not a dependent variable? Thank you very much.


You should not specify the covariances among the observed x variables. As in regular regression, they are not estimated as part of the model. If you can't get this straightened out, send the complete output to support@statmodel.com. It is difficult to understand your full model without seeing the complete input.

Anonymous posted on Thursday, January 27, 2005  6:55 am



Dear Dr. Muthen, I will do that and thank you very much for your time. 

Anonymous posted on Thursday, February 17, 2005  12:56 pm



Dear Dr. Muthen, So if the independent variables are categorical, we just use WLS as the estimation method to run regular regression models for path analysis or SEM models. Is that right? I am just a little confused how to choose the estimation method when I run SEM models. Thank you so much! 


If you look up ESTIMATOR in the Mplus User's Guide, you will find a table that summarizes the estimators available in Mplus and shows the defaults for various situations. Note that the scale of the independent variables is irrelevant for choosing an estimator. If you are unsure which estimator to use, go with the Mplus default.

Bonnie posted on Wednesday, March 02, 2005  9:12 am



Dear Linda, I am still confused about SEM models with nonnormality or a categorical independent variable. In my SEM model, I have two continuous factor indicators, PBIcare and PBIcontrol; the underlying latent variable is PBI; and the mediator and the dependent variable CG are all normally distributed continuous variables. PBIcontrol is also normal, but PBIcare is highly skewed. I tried many ways to transform it, but failed. In this case, which estimation method should I use, given that the normality assumption is violated? Or I could create a dummy variable based on PBIcare to replace it; in that case, which estimation method should I use? In other words, do I need to create a dummy variable when the normality assumption cannot be met? I would appreciate your help! Best, Bonnie


The scale of independent variables does not affect estimation. Independent variables can be binary or continuous. If you have a three-category independent variable, you should create two dummy variables. This is the same as in regular regression.
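
The dummy-coding step described above can be sketched as follows. This is a minimal Python illustration with a made-up three-category variable; the helper name `dummy_code` and the data are hypothetical and only show the k-1 coding scheme, not anything Mplus does internally.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator list per category, leaving out the reference category."""
    categories = sorted(set(values))
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }

# Hypothetical three-category variable; category 1 serves as the reference group.
x = [1, 2, 3, 2, 1, 3]
dummies = dummy_code(x, reference=1)

for cat, column in dummies.items():
    print(cat, column)
```

Cases in the reference category score 0 on both dummies, so each coefficient is read as a difference from that group.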

Anonymous posted on Wednesday, April 20, 2005  2:19 am



I have a model with an exogenous latent variable (f1) measured by categorical indicators, three independent dichotomous variables, and two endogenous categorical variables. I have not specified the covariances among the independent variables since they are not estimated as part of the model. This results in very poor fit measures. If I create a latent variable behind each of my dummy variables (by fixing the factor loading to one and the residual variance to zero, i.e., setting the latent variable equal to the dummy indicator), I am able to specify the covariances among the independent variables, and this results in excellent fit measures. It also makes theoretical sense to correlate the independent variables. Why are there such huge differences in the fit measures when it is practically the same model?


Instead of including the covariances among the exogenous latent variable and the exogenous observed variables, you should regress the latent variable on the observed variables. This both allows for the relationships to be included in the model and also makes the analysis stay in the regression-based framework rather than the correlation-based framework.


[Translated from Spanish] León, Nicaragua, April 27, 2005. Dear Dr. Muthen, I am trying to carry out a path analysis in which all of my variables (exogenous and endogenous) are dummy variables (0/1). I have read an article by Nobuoki Eshima, Minoru Tabata, and Geng Zhi titled "Path Analysis with Logistic Regression Models". This means that when we have these types of variables, the model is not estimated adequately by ordinary linear regression. Marlon Melendez


We have no one on our staff who speaks Spanish. Can you please post in English? Thank you.


León, Nicaragua, April 28, 2005. Center for Demographic and Health Research. Dear Dr. Muthen, 1) Path analysis is usually performed for continuous variables using linear regression equations, and the basic idea is applied to the analysis of causal systems of continuous variables, as in the LISREL model. In comparison with path analysis of continuous variables, that of categorical variables is complex, because the causal system under consideration cannot be described by linear regression equations. 2) I am trying to carry out a path analysis with categorical variables (all my variables are qualitative, the exogenous ones as much as the endogenous ones). What can I do? 3) Where can I find information on path analysis with categorical variables? Please excuse my English. Marlon Osman Melendez Rodriguez


The only reference that I have is: Xie, Y. (1989). Structural equation models for ordinal variables. Sociological Methods and Research, 17, 325-352. Mplus can estimate path models using either probit or logistic regression.


León, Nicaragua, April 28, 2005. Dear Dr. Muthen, Excuse me, can Mplus carry out path analysis with dummy variables? Marlon Osman Melendez Rodriguez


Yes, Mplus can estimate a path model with dichotomous outcomes. The default is to estimate probit regression coefficients but Mplus can also estimate logistic regression coefficients. 

Anonymous posted on Monday, May 02, 2005  1:45 am



Referring to the posting of April 20 (2:19 am): I probably worded my problem unclearly. I have one exogenous latent variable, which is measured by three indicators (ordinal scale). Besides that, I have three other exogenous dummy variables, which are correlated with the aforementioned latent variable but not in a causal relationship. If I specify a correlation between the exogenous variables (with the WITH command), then I run into the same problem as the anonymous poster before me (*** FATAL ERROR VARIABLE XY CAUSES A SINGULAR WEIGHT MATRIX PART. THIS MAY BE DUE TO THE VARIABLE BEING DICHOTOMOUS BUT DECLARED AS CONTINUOUS. RESPECIFY THE VARIABLE AS CATEGORICAL.) So I created a latent factor behind each of the dummy variables (by fixing the factor loading to one and the residual variance to zero, i.e., setting the latent variable equal to the dummy indicator). This results in very good fit measures, while leaving out a correlation between the exogenous variables leads to very bad fit measures. Why are there such huge differences between practically identical models?

Anonymous posted on Monday, May 02, 2005  4:18 am



To be able to use all k dummy variables (and not only k-1 dummy variables), does one have to suppress the intercepts? How is that done? By just fixing the intercepts to zero?

bmuthen posted on Tuesday, May 03, 2005  2:55 pm



Answer to the May 02, 01:45 question: With exogenous latent and exogenous observed variables, Mplus does not default to having these 2 sets of variables correlated when the outcomes are categorical. The best way to correlate the 2 sets is to regress the latent variables on the observed variables. It is advantageous to have exogenous observed variables in the model so that the "regression-based" as opposed to the "correlation-based" estimation approach can be used (see Muthen's 1984 Psychometrika article).

bmuthen posted on Tuesday, May 03, 2005  2:57 pm



Answer to the May 02, 04:18 question: In principle, yes, the intercepts should be suppressed. However, even so, using all dummy variables causes a singular sample covariance matrix, and Mplus does not allow that.
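
The singularity mentioned above follows from the fact that the k dummies always sum to 1 for every case. A minimal Python sketch with made-up group memberships (the data and helper names are hypothetical) checks this with exact fractions:

```python
from fractions import Fraction

def covariance_matrix(columns):
    """Exact sample covariance matrix of a list of equally long columns."""
    n = len(columns[0])
    means = [Fraction(sum(c), n) for c in columns]
    return [
        [sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
         for cb, mb in zip(columns, means)]
        for ca, ma in zip(columns, means)
    ]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical three-category grouping variable.
group = [1, 2, 3, 2, 1, 3, 3, 2]
all_dummies = [[1 if g == k else 0 for g in group] for k in (1, 2, 3)]

full = covariance_matrix(all_dummies)          # all k = 3 dummies
reduced = covariance_matrix(all_dummies[1:])   # k - 1 = 2 dummies

print(det3(full) == 0)     # determinant is exactly zero: singular
print(det2(reduced) > 0)   # dropping one dummy removes the dependency
```

Because each row of the full covariance matrix sums to zero, its determinant is zero regardless of the group sizes; dropping one dummy is what restores an invertible matrix.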

Anonymous posted on Tuesday, September 06, 2005  2:05 am



Referring to the posting of May 03, 2:55 pm: My reviewer asks why it is not technically possible to correlate an exogenous manifest variable and an exogenous latent variable (there is no causal relationship between them) when the outcome is categorical. Is there an article to which I could refer?


It is possible to include such a correlation. However, you then make distributional assumptions about the observed variable which you do not do otherwise. 

Anonymous posted on Tuesday, September 06, 2005  1:50 pm



Excuse me, I forgot to mention that the exogenous manifest variable is binomial. And there is seemingly no possibility of correlating this variable with the other, latent exogenous variable. You advised regressing the latent variable on the binomial one, but this does not seem theoretically appropriate to me, since there is no causal relationship between them.

bmuthen posted on Tuesday, September 06, 2005  3:01 pm



Exogenous observed (manifest) variables are treated as continuous, so if your manifest variable is binomial, that is not explicitly taken into account, and this seems all right. As mentioned in the earlier posts, it is possible to correlate the manifest with the latents, but it comes at the price referred to in my original May 03 post. Regression seems a better alternative; I don't see that as necessarily implying causation.


Good morning, I am attempting to use dummy variables representing different racial groupings as exogenous variables in my SEM. In initial attempts I used the binary coding strategy to create the dummy variables where, for example: (1) African American = 1, Asian / Pacific Islander or Amerindian or Other = 0; (2) Asian / Pacific Islander = 1, African American or Amerindian or Other = 0; (3) etc.; (4) and where the dummy variable for White is used as the reference group (and thus excluded from the model). This strategy had the effect of excluding a portion of the study sample from the analysis. Here is an example of the message that is generated in the output: *** WARNING Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 8437 5 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS. In situations where this message is not generated, Mplus generates output that provides neither model fit information nor information that could be used to achieve convergence (it simply lists the summary of analysis, summary of sample statistics, and technical output). As a next step, I adopted a different coding strategy: the effect coding approach, using 1 for the category of interest, 0 for all other categories, and -1 for the reference group. The analysis runs successfully using this coding approach and the model converges with an excellent fit; however, the interpretation of the coefficients is different (based on comparisons with the grand mean) and thus undesirable. Do you have any thoughts on the nature of this issue and a possible resolution?


It sounds like with your dummy variables, you are assigning missing to one category while with the effect coding you are not. I'm not sure why you are doing this. 
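
For readers comparing the two coding schemes discussed above, here is a small Python sketch that fits the same made-up, balanced three-group data under dummy coding and under effect coding. The data, group labels, and the `ols` helper are hypothetical; this is ordinary least squares on observed variables, not an Mplus run.

```python
from fractions import Fraction

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y exactly by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(Fraction(r[i]) * r[j] for r in X) for j in range(p)]
         + [sum(Fraction(r[i]) * yi for r, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = next(r for r in range(col, p) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                A[r] = [vr - A[r][col] * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]

# Hypothetical data: group means are ref = 3, a = 5, b = 10.
group = ["ref", "ref", "a", "a", "b", "b"]
y = [2, 4, 4, 6, 9, 11]

dummy_rows = {"ref": [1, 0, 0], "a": [1, 1, 0], "b": [1, 0, 1]}
effect_rows = {"ref": [1, -1, -1], "a": [1, 1, 0], "b": [1, 0, 1]}

dummy_fit = ols([dummy_rows[g] for g in group], y)
effect_fit = ols([effect_rows[g] for g in group], y)

print(dummy_fit)   # intercept = reference mean, slopes = differences from it
print(effect_fit)  # intercept = mean of group means, slopes = deviations from it
```

Both codings reproduce the same group means; only the meaning of the intercept and slopes changes. Neither scheme assigns a missing value to any group, which is consistent with the reply above that the dropped cases came from how the dummies were created.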


Would you suggest the use of trend or contrast coding strategies? 


Please disregard the previous posting. I've determined the problem. Thank you. 

Shi Huang posted on Monday, July 17, 2006  9:48 am



Dear Dr. Muthen, I am running a path analysis in order to test a mediation effect. The variables I used are:
X: independent variable, a categorical variable with 3 categories, from which I created two dummy variables (X1, X2).
Y: mediator, a continuous variable.
Z: outcome, a continuous variable.
The syntax is:
ANALYSIS: TYPE IS MISSING H1; BOOTSTRAP = 500;
model:
Z on Y;
Y on X1 X2;
model indirect:
Z IND Y X1;
Z IND Y X2;
I was wondering if this is correct syntax (specifically for model indirect). Thanks


I can't see a problem, but the best way to find out is to run it.

Shi Huang posted on Monday, July 17, 2006  10:18 am



Thank you so much for your quick reply. One thing that worries me is the 999 values in the output:

TOTAL, TOTAL INDIRECT, SPECIFIC INDIRECT, AND DIRECT EFFECTS

                      Estimates   S.E.  Est./S.E.    StdYX  StdYX SE  StdYX/SE
Effects from X1 to Z
  Sum of indirect         0.096  0.074      1.285  999.000   999.000     1.000
  Specific indirect
    Z Y X1                0.096  0.074      1.285  999.000   999.000     1.000
Effects from X2 to Z
  Sum of indirect         0.086  0.068      1.262  999.000   999.000     1.000
  Specific indirect
    Z Y X2                0.086  0.068      1.262  999.000   999.000     1.000

Can I ignore these 999 values? Thanks


It sounds like you may be using an old version of the program. If you are using Version 4.1, you should send your input, data, output, and license number to support@statmodel.com.


Dear Prof. Muthen, I am conducting a simulation study to investigate the influence of hidden bias on the estimate of treatment effects using regression analysis, the propensity score, the Heckman model, and a latent variable approach. I have simulated 100 data files with a predefined true effect theta. I am using Mplus for the estimation of the effects using latent class modelling. I have simulated the data as follows: There are two variables X1 and X2 that influence the latent variable f (tendency to participate). The latent class has three indicators Z1, Z2 and D (categorical variable, treatment or no treatment). X2 and X3 and D influence the continuous outcome Y. Hidden bias is introduced by correlated error terms of f and Y. I have set up the Mplus script as follows:

VARIABLE: NAMES ARE X1 X2 X3 Z1 Z2 D Y;
CATEGORICAL ARE D;
ANALYSIS: parameterization = theta;
MODEL:
F ON X1 X2;
F BY Z1 Z2 D;
Y ON X2 X3 D;
Y with F;

The problem I encounter is that if I declare D as categorical, my true estimate theta is not recovered, and if I do not declare D as categorical, my true estimates are recovered. Since D is both an indicator and a predictor of Y, should I declare D as categorical? Or am I doing something wrong here? Can you explain my results? With kind regards, Marieke


In your introduction you say "latent class modeling" and in your second paragraph you talk about the latent variable f followed by the sentence "The latent class...". The Mplus input, however, indicates that f is a continuous latent variable (a factor), not a latent class variable. So my question is, do you want f to be a factor or a latent class variable?


Dear Prof. Muthen, Sorry for the confusion. f is a continuous latent variable, indicating 'the tendency to participate' in a nonrandomised study. With kind regards, Marieke 


You should declare D as categorical because it is a dependent variable in one part of the model. If you have problems with this run you should send input, output and license number to support@statmodel.com. 


Dear Prof Muthen, I have a gender variable in my SEM. The standardized coefficients were only estimated when I referred to the variance of gender in the MODEL statement, but then I get the following WARNING: VARIABLE Gender MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS. How can I get the standardized coefficients without declaring gender as continuous, or does it make no difference for the model estimation (although when I look at the model fit, it is different)? Thank you for your help, Bettina


It sounds like you have mentioned the means, variances, or covariances of gender in the MODEL command. These should not be mentioned. The model is estimated conditional on the covariates in the model. 


Dear Dr. Muthen, I am running a path analysis in order to test how different categories of my independent variable affect the mediator and the outcome. The variables I used are:
X: independent variable, a categorical variable with 3 categories, from which I created two dummy variables (X1, X2).
Y: mediator, a continuous variable (with 6 items).
Z: outcome, a continuous variable (with 6 items).
The syntax is:
ANALYSIS:
model:
x by X1 X2;
Y BY Y1-Y6;
Z BY Z1-Z6;
Z on Y;
Y on X1 X2;
model indirect:
Z IND Y X1;
Z IND Y X2;
I am not sure if this is the right way to test the different effect of each category on Y and, in turn, on Z. Thank you very much for your help.


Dear Dr. Muthen: I have a follow-up question to my previous question. I have run the model and got output containing both STD and STDYX standardization. I am not sure which one I should be reading, since my model includes only first-order factors and an observed categorical independent variable. Best regards, Amy


Dear Dr. Muthen, Sorry, another question in relation to my earlier two questions. I have actually got three types of standardisation: STD, STDY, and STDYX. From what I understand, I should use the STDYX results since my model contains both measured and observed variables. Am I right? I am very confused at this stage, and the results are significantly different between the STD, STDY, and STDYX standardisations. Thank you ever so much again. Amy


You should remove the statement "x BY x1 x2;". For binary covariates, use STDY.

Tracy Witte posted on Wednesday, February 16, 2011  10:39 am



I am running a similar model. X is a binary variable (sex).

Y by Y1 y2 y3 y4;
z by z1 z2 z3 z4;
y with z;
y on x;
z on x;
m1 with m2;
MODEL INDIRECT:
y ind m1 x;
z ind m1 x;
y ind m2 x;
z ind m2 x;

Because X is binary, I assumed that I should use the STDY standardized values. However, one of them is greater than 1.0 (i.e. M1 on X). None of the StdYX values are greater than 1, and none of my residual variances are negative. I'm not sure why this would be. Should I just use the StdYX values?


You don't show m1 and m2 in the model. 

Tracy Witte posted on Thursday, February 17, 2011  6:52 am



I apologize for the typo. Here's the model I ran: Y by Y1 y2 y3 y4; z by z1 z2 z3 z4; y with z; y on m1; y on m2; z on m1; z on m2; y on x; z on x; m1 with m2; MODEL INDIRECT: y ind m1 x; z ind m1 x; y ind m2 x; z ind m2 x; 


You don't show m1 and m2 on x, but I assume they are there given your question. I would use STDY with a binary x. The fact that it is greater than 1 simply means that the mediator changes more than one SD when x changes from one category (such as male) to another; that's ok. 
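
As a rough illustration of why a STDY value can exceed 1 with a binary x, the sketch below uses the usual definitions: STDY divides the raw slope by the SD of the outcome, while STDYX also multiplies by the SD of x. All numbers are hypothetical and are not taken from the poster's output.

```python
import math

def stdy_coef(b, sd_y):
    """Slope standardized by the outcome only: SD units of y per 0 -> 1 change in x."""
    return b / sd_y

def stdyx_coef(b, sd_x, sd_y):
    """Slope standardized by both x and y: SD units of y per SD of x."""
    return b * sd_x / sd_y

# Hypothetical values, for illustration only.
b = 1.5                         # raw slope of the mediator on binary x
sd_m1 = 1.2                     # SD of the mediator
p = 0.5                         # proportion of cases with x = 1
sd_x = math.sqrt(p * (1 - p))   # SD of a binary variable

print(round(stdy_coef(b, sd_m1), 3))
print(round(stdyx_coef(b, sd_x, sd_m1), 3))
```

Since the SD of a 0/1 variable can never exceed 0.5, STDYX is at most half of STDY here, which is why only the STDY value crosses 1 in this made-up case.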

Miroslav posted on Sunday, September 25, 2011  11:14 am



Dear Dr. Muthen, I was wondering whether it is possible to dummy code a multicategorical independent variable (e.g., 3 experimentally manipulated between-subjects conditions) as in a typical post hoc test (i.e., creating k variables instead of the traditional dummy/contrast coding, which results in k-1 variables) in a path model using Mplus. Is there any supporting literature for such a practice? Thank you very much, Miro


This may get a better response if you post it on a general discussion forum like SEMNET. 

Melvin C Y posted on Monday, April 02, 2012  10:48 am



In my model, I have 3 independent variables (1 latent continuous, x1, and 2 manifest binary, x2 and x3). I believe independent variables should be allowed to covary and that this is the default in Mplus. Although I obtained values for the x1-x2 and x1-x3 covariances, I did not get any value in the model output for the covariance between the 2 binary variables, i.e., x2-x3. I examined TECH1, and the PSI entry for the two binary variables is 0. Yet when I add the command to covary x2 with x3, I get a nonpositive definite error. When I looked at the sample statistics, x2 and x3 have a covariance of 0.03 and a correlation of 0.198. Should binary independent variables be uncorrelated with each other? For your information, I did not specify the binary variables as categorical. Thank you.


In regression, the model is estimated conditioned on the observed exogenous variables. They are not uncorrelated with each other. Their means, variances, and covariances are not parameters in the model. You can find these values by looking at descriptive statistics for these variables. I am surprised you get covariances among the latent and observed exogenous variables. Can you please send your license number and output to support@statmodel.com. 

Kelly posted on Monday, April 15, 2013  10:05 pm



Hello, At the risk of redundancy after reading the above posts, I'm wondering if the issue I am having with my model is the result of the same problem... I am running into issues when attempting to allow an observed categorical independent variable (gender) to correlate with continuous independent variables (both latent and observed) in my model. The model runs normally when specifying correlations between the continuous independent variables, but when the categorical IV is added in, the nonpositive definite error message pops up. I can provide additional info (i.e., license no. and output) if needed. Thanks in advance for your help. 


The message is generated because the mean and variance of a binary variable are not orthogonal. You can ignore this. 


Hi, I ran the following model:

Emo by mams1x mams2x mams3x mams4x mams5x;
Beh by mams11x mams12x mams13x mams14x;
Emo Beh on Ethnic Language SEN CLA;
Ethnic with Language SEN CLA;
Language with SEN CLA;
SEN with CLA;

Because Ethnic, Language, SEN, and CLA are independent variables, I do not declare them as categorical (despite the fact that they are). But then I get the following warning: "WARNING: VARIABLE ETHNIC MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS." My model is nevertheless estimated normally: "THE MODEL ESTIMATION TERMINATED NORMALLY". Can I ignore the warning message? Thank you in advance


The message is caused because you bring the covariates into the model by mentioning the WITH statements. In regression, the model is estimated conditioned on the covariates. Their means, variances, and covariances are not model parameters. If you remove the WITH statements, you will not get the message. 


Hi Linda, Yes when I remove the WITH statement, I do not get that message. However, I do want/need to declare the correlation between those variables. Can I do it in any other way so I will not get a warning? Or is it alright to ignore the message? Thank you very much in advance. 


The correlations among these variables are not zero during model estimation. If you want to know what they are, ask for SAMPSTAT in the OUTPUT command and you will obtain them. If you are using the WITH statements to avoid losing cases because of missing data, you can put the WITH statements back now that you know they are causing the message. The message is triggered because the mean and variance of binary variables are not orthogonal. 
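
The statement that the mean and variance of a binary variable are not orthogonal can be seen from the identity variance = p(1 - p), where p is the mean. A minimal Python check on a made-up 0/1 variable (the data are hypothetical):

```python
import math

def mean_and_variance(binary):
    """Mean and (population) variance of a 0/1 variable."""
    n = len(binary)
    p = sum(binary) / n
    var = sum((v - p) ** 2 for v in binary) / n
    return p, var

# Hypothetical binary covariate: three 1s out of eight cases.
x = [1, 0, 0, 1, 0, 0, 0, 1]
p, var = mean_and_variance(x)

print(p, var, p * (1 - p))
```

Once the mean is known, the variance carries no extra information, so treating both as free parameters of a continuous variable is what prompts the message.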


Thank you very much! 


Dear Profs Muthen, I'm trying to conduct a mediation analysis where one of my IVs is categorical. I'm following the approach suggested by Hayes and Preacher (2014) (Statistical mediation analysis with a multicategorical independent variable), dummy-coding education into Yr12, university, or the reference category and then forcing a constraint for total effects for Yr12 and Uni on the outcome.

Model:
waiB on yr12 (a1) uni (a2) sq_inc SEIF NSB kes male;
atn on yr12 (a3) uni (a4) sq_inc SEIF NSB kes male;
reg on yr12 (a5) uni (a6) sq_inc SEIF NSB kes male;
y3readB on yr12 (cp1) uni (cp2) waiB (b1) atn (b2) reg (b3) sq_inc SEIF NSB kes male;
Model indirect:
y3readB IND yr12;
y3readB IND uni;
y3readB IND sq_inc;
y3readB IND SEIF;
y3readB IND NSB;
y3readB IND kes;
y3readB IND male;
Model constraint:
new (tot1 tot2);
tot1 = a1 * b1 + a3 * b2 + a5 * b3 + cp1;
tot2 = a2 * b1 + a4 * b2 + a6 * b3 + cp2;

Do you have any advice or warnings on this sort of analysis?


No concerns with this approach? 


I don't see that any constraints are imposed. I don't see why one would take such a complicated approach; what is gained?


Thanks. The idea was to make sure that Mplus 'knew' that uni and yr12 were part of the one dummy-coded measure (education). But I can see that the 'total' constraint just forces the total effects for year 12 and uni to each add up (which they already do anyway). So I'm confused as to whether this is necessary, or if you have any advice on a categorical independent variable in this instance.


Would you advise removing this? Model constraint: new (tot1 tot2); tot1 = a1 * b1 + a3 * b2 + a5 * b3 + cp1; tot2 = a2 * b1 + a4 * b2 + a6 * b3 + cp2; Thanks Daniel 


Yes, I don't see that it has a purpose. 
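
The two NEW parameters in the MODEL CONSTRAINT section quoted above only compute a derived quantity: the sum of the three specific indirect effects plus the direct effect. The arithmetic for tot1 can be sketched in Python as follows; the estimates are hypothetical and do not come from the poster's output.

```python
import math

def total_effect(a_paths, b_paths, direct):
    """Sum of the specific indirect effects (a * b per mediator) plus the direct effect."""
    return sum(a * b for a, b in zip(a_paths, b_paths)) + direct

# Hypothetical estimates for the yr12 dummy, for illustration only.
a_yr12 = [0.5, 0.25, 0.1]   # yr12 -> waiB, atn, reg   (a1, a3, a5)
b = [0.2, 0.4, 0.5]         # waiB, atn, reg -> y3readB (b1, b2, b3)
cp1 = 0.3                   # direct effect of yr12 on y3readB

tot1 = total_effect(a_yr12, b, cp1)
print(round(tot1, 3))
```

Nothing in this calculation restricts a1 to a6, b1 to b3, or the direct effects, which matches the reply that no constraints are actually imposed.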
