Anonymous posted on Tuesday, January 25, 2005 - 1:01 pm
There are 6 independent variables in the SEM model I am running, X1-X6, which predict 5 dependent variables (one of them categorical).
X3, X4, and X6 are latent variables, and X1, X2, and X4 are observed variables. X4 is a categorical (dichotomous) variable.
Using Mplus, I can get correlations among the latent independent variables, among the observed independent variables, and between latent and observed independent variables. But I cannot get X4 to correlate with any other independent variable. Is there a way to do it? Thanks.
BMuthen posted on Tuesday, January 25, 2005 - 2:05 pm
You should regress the latent exogenous variables on the observed exogenous variables to make them correlate.
Anonymous posted on Tuesday, January 25, 2005 - 2:33 pm
I am sorry, I made a mistake. It should be that X1, X2, and X5 are observed variables and X5 is a categorical (dichotomous) variable.
X1 to X6 are all time 1 variables. If I regress the latent exogenous variables on the observed exogenous variables (e.g., X3 on X5;), doesn't that make it a path rather than a correlation?
When I used "X5 WITH X1 X2 X3 X4 X6;" the message is: *** FATAL ERROR VARIABLE X5 CAUSES A SINGULAR WEIGHT MATRIX PART. THIS MAY BE DUE TO THE VARIABLE BEING DICHOTOMOUS BUT DECLARED AS CONTINUOUS. RESPECIFY THE VARIABLE AS CATEGORICAL.
How should I respond?
Thank you very much.
michael posted on Wednesday, January 26, 2005 - 9:21 am
Dear Linda and Ben,
I have a question concerning a path model with both latent and observed explanatory and dependent variables. Some of the observed explanatory variables are binary, in other words, dummy variables coded 0 and 1. As shown in the Mplus manual, non-continuous dependent variables can be declared with the "CATEGORICAL ARE ;" option. But what about categorical independent variables? For them, there isn't any declaration, as far as I can see. My questions: 1. Is the interpretation of their effects therefore the same as the interpretation of, e.g., dummy variables in logistic regression? 2. Does it make sense to centre these dummy variables, and if so, is it recommended?
DATA: FILE IS d:\tatenm.t;
VARIABLE: NAMES ARE ab un mi a fr g16 g17 g18 g1 g2 q1 q2 q3 q4 q5 o ot s;
  CATEGORICAL ARE ot s;
MODEL: f1 by g16 g17 g18 g1 g2;
  f2 by q1 q2 q3 q4 q5;
  f1 on un ab mi a o;
  f2 on f1 un a fr;
  ot on un ab a fr f1;
  ot with f2;
  s on ot un ab a o fr f1 f2;
In the above example, "ab un mi" are educational degrees. ab=higher secondary educ., un=university, mi=high level of lower secondary education. There remains the reference group: low level of lower secondary education.
3. In path analysis and structural equation modelling with both unobserved and observed independent variables: Is it correct to split the nominal variable "educational degree" into four dummies and to include three of them into the model - just as usual in regression analysis?
The scale of the independent variables does not affect model estimation. This is the same as in regular regression.
1. It depends on the estimator. With weighted least squares and categorical dependent variables, probit regressions are estimated. With maximum likelihood and categorical dependent variables, logistic regressions are estimated.
3. Yes, as in regular regression independent variables can be binary or continuous.
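As a rough illustration of the estimator point above, the input below requests probit regression for a binary outcome, with the logistic alternative shown as a comment. This is only a sketch: the file name example.dat and the variables x1, x2, and u are hypothetical, and the ESTIMATOR table in the User's Guide remains the reference for what your version supports.

```
DATA:     FILE IS example.dat;     ! hypothetical file
VARIABLE: NAMES ARE x1 x2 u;       ! hypothetical names; x2 is a 0/1 dummy
          CATEGORICAL ARE u;       ! only the dependent variable is declared
ANALYSIS: ESTIMATOR = WLSMV;       ! weighted least squares: probit
MODEL:    u ON x1 x2;

! For logistic regression, replace the ANALYSIS command with:
! ANALYSIS: ESTIMATOR = ML;
!           LINK = LOGIT;
```

Note that the dummy x2 appears only in NAMES and on the right-hand side of ON; it is not listed under CATEGORICAL because it is not a dependent variable.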
Anonymous posted on Wednesday, January 26, 2005 - 2:45 pm
You mentioned that "the scale of the independent variables does not affect model estimation". But when I used "X5 WITH X1 X2 X3 X4 X6;" the message is:
*** FATAL ERROR VARIABLE X5 CAUSES A SINGULAR WEIGHT MATRIX PART. THIS MAY BE DUE TO THE VARIABLE BEING DICHOTOMOUS BUT DECLARED AS CONTINUOUS. RESPECIFY THE VARIABLE AS CATEGORICAL.
How should I specify X5 which is not a dependent variable?
You should not specify the covariances among the observed x variables. They are not estimated as part of the model as in regular regression. If you can't get this straightened out, send the complete output to firstname.lastname@example.org. It is difficult to understand your full model without seeing the complete input.
Anonymous posted on Thursday, January 27, 2005 - 6:55 am
Dear Dr. Muthen,
I will do that and thank you very much for your time.
Anonymous posted on Thursday, February 17, 2005 - 12:56 pm
Dear Dr. Muthen,
So if the independent variables are categorical, we just use WLS as the estimation method to run regular regression models for path analysis or SEM models. Is that right? I am just a little confused how to choose the estimation method when I run SEM models. Thank you so much!
If you look up ESTIMATOR in the Mplus User's Guide, you will find a table that summarizes the estimators available in Mplus and shows the defaults for various situations. Note that the scale of the independent variables is irrelevant for choosing an estimator. If you are unsure which estimator to use, go with the Mplus default.
Bonnie posted on Wednesday, March 02, 2005 - 9:12 am
Dear Linda,
I am still confused about SEM models with non-normal or categorical independent variables. In my SEM model, I have two continuous factor indicators, PBIcare and PBIcontrol, whose underlying latent variable is PBI; the mediator and the dependent variable CG are both normally distributed continuous variables. PBIcontrol is also normal, but PBIcare is highly skewed. I tried many transformations, but none worked. Which estimation method should I use when the normality assumption is violated? Alternatively, I could create a dummy variable based on PBIcare to replace it; which estimation method should I use in that case? Put differently, do I need to create a dummy variable when the normality assumption cannot be met? I would appreciate your help!
The scale of independent variables does not affect estimation. Independent variables can be binary or continuous. If you have a three-category independent variable, you should create two dummy variables. This is the same as in regular regression.
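One way to create the two dummy variables inside Mplus is the DEFINE command; the sketch below assumes a hypothetical file example.dat with an outcome y and a three-category variable named group coded 1, 2, 3. None of these names come from the posts above.

```
DATA:     FILE IS example.dat;       ! hypothetical file
VARIABLE: NAMES ARE y group;         ! group coded 1, 2, 3 (assumed)
          USEVARIABLES ARE y d2 d3;  ! new variables go last in the list
DEFINE:   d2 = 0;
          d3 = 0;
          IF (group EQ 2) THEN d2 = 1;
          IF (group EQ 3) THEN d3 = 1;
MODEL:    y ON d2 d3;                ! category 1 is the reference group
```

Cases with a missing value on group would need separate handling, since this sketch would otherwise leave them at 0 on both dummies and so fold them into the reference group.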
Anonymous posted on Wednesday, April 20, 2005 - 2:19 am
I have a model with an exogenous latent variable (f1) measured by categorical indicators, three independent dichotomous variables, and two endogenous categorical variables. I have not specified the covariances among the independent variables since they are not estimated as part of the model. This results in very poor fit measures. If I create a latent variable behind each of my dummy variables (by fixing the factor loading to one and the residual variance to zero, i.e., setting the latent variable equal to the dummy indicator), I am able to specify the covariances among the independent variables, and this results in excellent fit measures. It also makes theoretical sense to correlate the independent variables. Why are there such huge differences in the fit measures when it is practically the same model?
Instead of including the covariances among the exogenous latent variable and the exogenous observed variables, you should regress the latent variable on the observed variables. This both allows for the relationships to be included in the model and also makes the analysis stay in the regression-based framework rather than the correlation-based framework.
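A minimal sketch of this advice, using hypothetical names (u1-u3 for the categorical indicators, d1-d3 for the dummies, y1 and y2 for the categorical outcomes, example.dat for the data file), might look like this:

```
DATA:     FILE IS example.dat;                ! hypothetical file
VARIABLE: NAMES ARE u1 u2 u3 d1 d2 d3 y1 y2;  ! hypothetical names
          CATEGORICAL ARE u1 u2 u3 y1 y2;     ! dependent variables only
MODEL:    f1 BY u1 u2 u3;
          f1 ON d1 d2 d3;        ! relates f1 to the dummies; no WITH
          y1 y2 ON f1 d1 d2 d3;
```

The ON statement lets f1 and the dummies be related without bringing the dummies' own variances and covariances into the model, which is what the WITH statements or the single-indicator factors would do.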
I am trying to carry out a path analysis in which all of my variables (exogenous and endogenous) are dummy variables (0/1). I have read an article by Nobuoki Eshima, Minoru Tabata, and Geng Zhi titled "Path Analysis with Logistic Regression Models"; it implies that with variables of this type the model is not estimated adequately by ordinary linear regression.
León, Nicaragua, April 28, 2005. Center for Demographic and Health Research
Dear Dr. Muthen
1) Path analysis is usually performed for continuous variables by using linear regression equations, and the basic idea is applied to the analysis of causal systems of continuous variables, as in the LISREL model. In comparison with path analysis of continuous variables, that of categorical variables is complex, because the causal system under consideration cannot be described by linear regression equations.
2) I am trying to carry out a path analysis with categorical variables (all my variables are qualitative, the exogenous as well as the endogenous ones). What can I do?
3) Where can I find information on path analysis with categorical variables?
Yes, Mplus can estimate a path model with dichotomous outcomes. The default is to estimate probit regression coefficients but Mplus can also estimate logistic regression coefficients.
Anonymous posted on Monday, May 02, 2005 - 1:45 am
Referring to the Posting of April, 20 (2:19 am):
I probably worded my problem unclearly. I have one exogenous latent variable which is measured by three indicators (ordinal scale). Besides that, I have three other exogenous dummy variables, which are correlated with the aforementioned latent variable but not causally related to it.
If I specify a correlation between the exogenous variables (with the WITH statement), I run into the same problem as the anonymous poster before me (*** FATAL ERROR VARIABLE XY CAUSES A SINGULAR WEIGHT MATRIX PART. THIS MAY BE DUE TO THE VARIABLE BEING DICHOTOMOUS BUT DECLARED AS CONTINUOUS. RESPECIFY THE VARIABLE AS CATEGORICAL.)
So, I created a latent factor behind each of the dummy variables (by fixing the factor loading to one and the residual variance to zero, i.e., setting the latent variable equal to the dummy indicator). This results in very good fit measures, while leaving out the correlations between the exogenous variables leads to very bad fit measures.
How come these huge differences between practically identical models?
Anonymous posted on Monday, May 02, 2005 - 4:18 am
To be able to use all dummy variables (and not only k-1 dummy variables), does one have to suppress the intercepts? How is that done? Just by fixing the intercepts to zero?
With exogenous latent and exogenous observed variables, Mplus does not default to having these 2 sets of variables correlated when the outcomes are categorical. The best way to correlate the 2 sets is to regress the latent variables on the observed variables. It is advantageous to have exogenous observed variables in the model so that the "regression-based" as opposed to the "correlation-based" estimation approach can be used (see Muthen's 1984 Psychometrika article).
In principle, yes, the intercepts should be suppressed. However, even so, using all dummy variables causes a singular sample covariance matrix and Mplus does not allow that.
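The singularity can be seen with a short derivation, added here as an illustration. Assume k dummies d_1, ..., d_k that code a single nominal variable with k categories, so that every case has exactly one dummy equal to 1, and write s_ij for the sample covariance of d_i and d_j.

```latex
\sum_{j=1}^{k} d_j = 1
\quad\Longrightarrow\quad
\operatorname{Cov}\Bigl(d_i,\ \sum_{j=1}^{k} d_j\Bigr)
  = \sum_{j=1}^{k} s_{ij} = 0
\quad \text{for every } i ,
\qquad\text{i.e.}\qquad
S\,\mathbf{1} = \mathbf{0} .
```

Because the covariance matrix S maps the vector of ones to zero, it is singular whatever is done with the intercepts; dropping one dummy removes the linear dependence.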
Anonymous posted on Tuesday, September 06, 2005 - 2:05 am
Referring to the posting of May 03, 2:55 pm: My reviewer asks me why it is not technically possible to let an exogenous manifest variable and an exogenous latent variable correlate (there is no causal relationship between them) when the outcome is categorical. Is there an article to which I could refer?
It is possible to include such a correlation. However, you then make distributional assumptions about the observed variable which you do not do otherwise.
Anonymous posted on Tuesday, September 06, 2005 - 1:50 pm
Excuse me, I forgot to mention that the exogenous manifest variable is binomial. And there is seemingly no possibility of correlating this variable with the latent exogenous variable. You advised regressing the latent variable on the binomial one, but this seems theoretically inappropriate to me, since there is no causal relationship between them.
bmuthen posted on Tuesday, September 06, 2005 - 3:01 pm
Exogenous observed (manifest) variables are treated as continuous so if your manifest variable is binomial that is not explicitly taken into account - and this seems alright. As mentioned in the earlier posts it is possible to correlate the manifest with the latents, but it comes at the price referred to in my original May 03 post. Regression seems a better alternative - I don't see that as necessarily implying causation.
I am attempting to use dummy variables representing different racial groupings as exogenous variables in my SEM. In initial attempts I used a binary coding strategy to create the dummy variables where, for example: (1) African American = 1, Asian / Pacific Islander or Amerindian or Other = 0; (2) Asian / Pacific Islander = 1, African American or Amerindian or Other = 0; (3) etc.; and (4) the dummy variable for White is used as the reference group (and thus excluded from the model).
This strategy had the effect of excluding a portion of the study sample from the analysis. Here is an example of the message that is generated in the output:
*** WARNING
Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 8437
5 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS.
In situations where this message is not generated, Mplus generates output that provides neither model fit information nor information that could be used to achieve convergence (it simply lists the summary of analysis, summary of sample statistics, and technical output).
As a next step, I adopted a different coding strategy: the effect coding approach, using 1 for the category of interest, 0 for all other categories, and -1 for the reference group. The analysis runs successfully using this coding approach, and the model converges with an excellent fit. However, the interpretation of the coefficients is different (based on comparisons with the grand mean) and thus undesirable. Do you have any thoughts on the nature of this issue and a possible resolution?
I am conducting a simulation study to investigate influence of hidden bias in the estimate of treatment effects using regression analysis, the propensity score, the heckman model and a latent variable approach. I have simulated 100 datafiles with a pre-defined true effect theta. I am using Mplus for the estimation of the effects using latent class modelling.
I have simulated the data as follows: There are two variables, X1 and X2, that influence the latent variable f (tendency to participate). The latent class has three indicators: Z1, Z2, and D (a categorical variable, treatment or no treatment). X2, X3, and D influence the continuous outcome Y. Hidden bias is introduced by correlated error terms of f and Y.
I have set up the Mplus script as follows:
VARIABLE: NAMES ARE X1 X2 X3 Z1 Z2 D Y;
  CATEGORICAL ARE D;
ANALYSIS: parameterization = theta;
MODEL: F ON X1 X2;
  F BY Z1 Z2 D;
  Y ON X2 X3 D;
  Y with F;
The problem I encounter is that if I declare D as categorical, my true value of theta is not recovered, whereas if I do not declare D as categorical, it is recovered. Since D is both an indicator and a predictor of Y, should I declare D as categorical? Or am I doing something wrong here? Can you explain my results?
You should declare D as categorical because it is a dependent variable in one part of the model. If you have problems with this run you should send input, output and license number to email@example.com.
I have a gender variable in my SEM. The standardized coefficients were only estimated when I referred to the variance of gender in the MODEL command, but then I get the following WARNING: VARIABLE Gender MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
How can I get the standardized coefficients without declaring gender as continuous? Or does it make no difference for model estimation (when I look at the model fit, it is different)?
I am running a path analysis to test how different categories of my independent variable impact the mediator and the outcome. The variables I used are:
X: independent variable, a categorical variable with 3 categories, for which I created two dummy variables (X1, X2).
Y: mediator, a continuous variable (with 6 items).
Z: outcome, a continuous variable (with 6 items).
The syntax is:
model: x by X1 X2;
  Y BY Y1 - Y6;
  Z BY Z1 - Z6;
  Z on Y;
  Y on X1 X2;
model indirect: Z IND Y X1;
  Z IND Y X2;
I am not sure if this is the right way to test the effect of each category on Y and, in turn, on Z.
Dear Dr. Muthen: I have a follow-up question to my previous question. I have run the model, and the output contains both STD and STDYX standardization. I am not sure which one I should be reading, since my model includes only first-order factors and an observed categorical independent variable.
Sorry, another question in relation to my earlier two questions. I have actually got three types of standardisation: STD, STDY, and STDYX. From what I understand, I should use the STDYX results since my model contains both measured and observed variables. Am I right? I am very confused at this stage, and the results differ considerably between the STD, STDY, and STDYX standardisations.
You should remove the statement x BY x1 x2; For binary covariates, use STDY.
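With that statement removed, the poster's input would read roughly as below. This is a sketch based only on the commands quoted in the question; the OUTPUT line is an added assumption about how the standardized results were requested.

```
MODEL:    Y BY Y1-Y6;
          Z BY Z1-Z6;
          Z ON Y;
          Y ON X1 X2;      ! X1, X2 enter as observed covariates
MODEL INDIRECT:
          Z IND Y X1;
          Z IND Y X2;
OUTPUT:   STANDARDIZED;    ! standardized solutions, including STDY
```

The dummies X1 and X2 are two codes of the same nominal variable rather than indicators of a common factor, which is why they appear only on the right-hand side of ON.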
Tracy Witte posted on Wednesday, February 16, 2011 - 10:39 am
I am running a similar model. X is a binary variable (sex)
Y by Y1 y2 y3 y4;
z by z1 z2 z3 z4;
y with z;
y on x;
z on x;
m1 with m2;
MODEL INDIRECT:
y ind m1 x;
z ind m1 x;
y ind m2 x;
z ind m2 x;
Because X is binary, I assumed that I should use the STDY standardized values. However, one of them is greater than 1.0 (i.e., M1 on X). None of the STDYX values are greater than 1, and none of my residual variances are negative. I'm not sure why this would be. Should I just use the STDYX values?
You don't show m1 and m2 on x, but I assume they are there given your question.
I would use STDY with a binary x. The fact that it is greater than 1 simply means that the mediator changes more than one SD when x changes from one category (such as male) to another; that's ok.
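The gap between the two standardizations follows from how they scale the unstandardized slope. As an illustration, write b for the unstandardized coefficient of the mediator m on x, and p for the proportion of cases with x = 1 (both symbols are introduced here, not taken from the posts):

```latex
b_{\mathrm{StdY}} = \frac{b}{SD(m)},
\qquad
b_{\mathrm{StdYX}} = \frac{b\,SD(x)}{SD(m)} = b_{\mathrm{StdY}}\cdot SD(x),
\qquad
SD(x) = \sqrt{p(1-p)} \le \tfrac{1}{2}\ \text{for binary } x .
```

So for a binary x the StdYX value is at most about half the StdY value in absolute size, which is consistent with StdY exceeding 1 while StdYX stays below 1. StdY is read as the change in m, in standard deviation units, when x moves from 0 to 1.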
Miroslav posted on Sunday, September 25, 2011 - 11:14 am
Dear Dr. Muthen, I was wondering whether it is possible to dummy code a multicategorical independent variable (e.g., 3 experimentally manipulated between-subjects conditions) as in a typical post hoc test (i.e., creating k variables instead of the traditional dummy/contrast coding resulting in k-1 variables) in a path model using Mplus. Is there any supporting literature for such a practice? Thank you very much, Miro
This may get a better response if you post it on a general discussion forum like SEMNET.
Melvin C Y posted on Monday, April 02, 2012 - 10:48 am
In my model, I have 3 independent variables (1 latent continuous, x1, and 2 manifest binary, x2 and x3). I believe independent variables should be allowed to covary and that this is the default in Mplus. Although I obtained a value for the covariances x1-x2 and x1-x3, I did not get any value in the model output for the covariance between the 2 binary variables, i.e., x2-x3. I examined TECH1, and the PSI entry for the two binary variables is 0. Yet, when I add the statement to covary x2 with x3, I get an error about a non-positive definite matrix. When I looked at the sample statistics, x2 and x3 have a covariance of 0.03 and a correlation of 0.198.
Should binary independent variables be uncorrelated with each other?
For your information, I did not specify the binary variables as categorical.
In regression, the model is estimated conditioned on the observed exogenous variables. They are not uncorrelated with each other. Their means, variances, and covariances are not parameters in the model. You can find these values by looking at descriptive statistics for these variables.
I am surprised you get covariances among the latent and observed exogenous variables. Can you please send your license number and output to firstname.lastname@example.org.
At the risk of redundancy after reading the above posts, I'm wondering if the issue I am having with my model is the result of the same problem...
I am running into issues when attempting to allow an observed categorical independent variable (gender) to correlate with continuous independent variables (both latent and observed) in my model. The model runs normally when specifying correlations between the continuous independent variables, but when the categorical IV is added in, the non-positive definite error message pops up.
I can provide additional info (i.e., license no. and output) if needed. Thanks in advance for your help.
Emo by mams1x mams2x mams3x mams4x mams5x;
Beh by mams11x mams12x mams13x mams14x;
Emo Beh on Ethnic Language SEN CLA;
Ethnic with Language SEN CLA;
Language with SEN CLA;
SEN with CLA;
Because Ethnic Language SEN CLA are independent variables, I do not declare them as categorical (despite the fact that they are). But then I get the following warning: "WARNING: VARIABLE ETHNIC MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS."
But my model is estimated normally "THE MODEL ESTIMATION TERMINATED NORMALLY"
The message is caused because you bring the covariates into the model by mentioning the WITH statements. In regression, the model is estimated conditioned on the covariates. Their means, variances, and covariances are not model parameters. If you remove the WITH statements, you will not get the message.
Yes when I remove the WITH statement, I do not get that message. However, I do want/need to declare the correlation between those variables. Can I do it in any other way so I will not get a warning? Or is it alright to ignore the message?
The correlations among these variables are not zero during model estimation. If you want to know what they are, ask for SAMPSTAT in the OUTPUT command and you will obtain them. If you are using the WITH statements to avoid losing cases because of missing data, you can put the WITH statements back now that you know they are causing the message. The message is triggered because the mean and variance of binary variables are not orthogonal.
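For reference, the poster's model without the WITH statements, plus the SAMPSTAT request mentioned in the reply, would look roughly like this (a sketch built from the statements quoted above):

```
MODEL:    Emo BY mams1x mams2x mams3x mams4x mams5x;
          Beh BY mams11x mams12x mams13x mams14x;
          Emo Beh ON Ethnic Language SEN CLA;
OUTPUT:   SAMPSTAT;    ! sample statistics for the analysis variables
```

The sample covariances and correlations among Ethnic, Language, SEN, and CLA then appear in the sample statistics section of the output rather than as model parameters.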
I’m trying to conduct a mediation analysis where one of my IVs is categorical.
I’m following the approach suggested by Hayes and Preacher (2014) (Statistical mediation analysis with a multicategorical independent variable), dummy-coding education into Yr12, university or the reference category and then forcing a constraint for total effects for Yr12 and Uni on the outcome.
Model:
  waiB on yr12 (a1) uni (a2) sq_inc SEIF NSB kes male;
  atn on yr12 (a3) uni (a4) sq_inc SEIF NSB kes male;
  reg on yr12 (a5) uni (a6) sq_inc SEIF NSB kes male;
  y3readB on yr12 (cp1) uni (cp2) waiB (b1) atn (b2) reg (b3) sq_inc SEIF NSB kes male;
Model indirect:
  y3readB IND yr12;
  y3readB IND uni;
  y3readB IND sq_inc;
  y3readB IND SEIF;
  y3readB IND NSB;
  y3readB IND kes;
  y3readB IND male;
Model constraint:
  new (tot1 tot2);
  tot1 = a1 * b1 + a3 * b2 + a5 * b3 + cp1;
  tot2 = a2 * b1 + a4 * b2 + a6 * b3 + cp2;
Do you have any advice or warnings on this sort of analysis?
Thank you so much for letting me know about your recent paper on centering. I have just learned that there is one more centering option, latent centering, which seems to suit my study well, since I would like to run two-level models with a large number of predictors (most of them categorical). Even though I have not yet checked the ICCs for each predictor, I believe many of them are significant.
My question: the paper mentions several times that latent centering works well for studies with a small number of clusters, but mine has more than 100 clusters. What do you think? Should I apply latent centering or not?