Mplus Discussion >> Category independent variable

Mplus Discussion > Structural Equation Modeling >

Anonymous posted on Tuesday, January 25, 2005 - 1:01 pm

There are 6 independent variables in the SEM model I am running: X1-X6, which predict 5 dependent variables (one of them is a categorical variable).

X3, X4, and X6 are latent variables, and X1, X2, and X4 are observed variables. X4 is a categorical (dichotomous) variable.

Using Mplus, I can get correlations among the latent independent variables, among the observed independent variables, and between the latent and observed independent variables. But I cannot get X4 to correlate with any other independent variable. Is there a way to do this?
Thanks.

BMuthen posted on Tuesday, January 25, 2005 - 2:05 pm

You should regress the latent exogenous variables on the observed exogenous variables to make them correlate.

Anonymous posted on Tuesday, January 25, 2005 - 2:33 pm

I am sorry, I made a mistake. It should be that X1, X2, and X5 are observed variables, and X5 is a categorical (dichotomous) variable.

X1 to X6 are all time 1 variables. If I regress the latent exogenous variables on the observed exogenous variables (e.g., X3 on X5;), doesn't that make it a path rather than a correlation?

When I used "X5 WITH X1 X2 X3 X4 X6;", I got the following error:
*** FATAL ERROR
VARIABLE X5 CAUSES A SINGULAR WEIGHT MATRIX PART. THIS MAY BE
DUE TO THE VARIABLE BEING DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
RESPECIFY THE VARIABLE AS CATEGORICAL.

How should I respond?

Thank you very much.

michael posted on Wednesday, January 26, 2005 - 9:21 am

Dear Linda and Ben,

I have a question concerning a path model with both latent and observed explanatory and dependent variables. Some of the observed explanatory variables are binary; in other words, they are dummy variables coded 0 and 1.
As shown in the Mplus manual, non-continuous dependent variables can be declared with the "CATEGORICAL ARE ;" command. But what about categorical independent variables? As far as I can see, there is no corresponding declaration for them.
My questions:
1. Hence, is the interpretation of their effects just the same as the interpretation of e.g. dummy variables in logistic regression?
2. Does it make sense to centre these dummy variables, and if so, is it recommended?

DATA: FILE IS d:\tatenm.t;
VARIABLE: NAMES ARE ab un mi a fr g16 g17 g18 g1 g2 q1 q2 q3 q4 q5 o ot s;
CATEGORICAL ARE ot s;
MODEL: f1 by g16 g17 g18 g1 g2;
f2 by q1 q2 q3 q4 q5;
f1 on un ab mi a o;
f2 on f1 un a fr;
ot on un ab a fr f1;
ot with f2;
s on ot un ab a o fr f1 f2;

In the above example, "ab un mi" are educational degrees. ab=higher secondary educ., un=university, mi=high level of lower secondary education. There remains the reference group: low level of lower secondary education.

3. In path analysis and structural equation modelling with both unobserved and observed independent variables, is it correct to split the nominal variable "educational degree" into four dummies and to include three of them in the model, just as is usual in regression analysis?

Thanks a lot,

Michael Windzio, Hannover (KFN), Germany

Linda K. Muthen posted on Wednesday, January 26, 2005 - 1:33 pm

The scale of the independent variables does not affect model estimation. This is the same as in regular regression.

1. It depends on the estimator. With weighted least squares and categorical dependent variables, probit regressions are estimated. With maximum likelihood and categorical dependent variables, logistic regressions are estimated.

2. No.

3. Yes, as in regular regression independent variables can be binary or continuous.
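To illustrate the point that the scale of an independent variable does not change what the model estimates, here is a minimal sketch in plain Python (ordinary least squares with one predictor, not Mplus) using made-up data. Centering a 0/1 dummy moves the intercept from the reference-group mean to the overall mean, but the slope, which is the group difference, is unchanged.

```python
from statistics import mean

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

x = [0, 0, 0, 1, 1, 1, 1, 0]                    # hypothetical 0/1 dummy
y = [2.0, 2.5, 1.5, 4.0, 3.5, 4.5, 4.0, 2.0]    # hypothetical outcome
x_mean = mean(x)
x_centered = [xi - x_mean for xi in x]          # grand-mean-centered dummy

a_raw, b_raw = ols_simple(x, y)
a_cen, b_cen = ols_simple(x_centered, y)
print(f"raw:      intercept = {a_raw:.3f}, slope = {b_raw:.3f}")
print(f"centered: intercept = {a_cen:.3f}, slope = {b_cen:.3f}")
```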

Anonymous posted on Wednesday, January 26, 2005 - 2:45 pm

You mentioned that "the scale of the independent variables does not affect model estimation".
But when I used "X5 WITH X1 X2 X3 X4 X6;", I got the following error:

*** FATAL ERROR
VARIABLE X5 CAUSES A SINGULAR WEIGHT MATRIX PART. THIS MAY BE
DUE TO THE VARIABLE BEING DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
RESPECIFY THE VARIABLE AS CATEGORICAL.

How should I specify X5 which is not a dependent variable?

Thank you very much.

Linda K. Muthen posted on Wednesday, January 26, 2005 - 6:59 pm

You should not specify the covariances among the observed x variables. They are not estimated as part of the model as in regular regression. If you can't get this straightened out, send the complete output to support@statmodel.com. It is difficult to understand your full model without seeing the complete input.

Anonymous posted on Thursday, January 27, 2005 - 6:55 am

Dear Dr. Muthen,

I will do that and thank you very much for your time.

Anonymous posted on Thursday, February 17, 2005 - 12:56 pm

Dear Dr. Muthen,

So if the independent variables are categorical, we just use WLS as the estimation method to run regular regression models for path analysis or SEM models. Is that right? I am just a little confused how to choose the estimation method when I run SEM models. Thank you so much!

Linda K. Muthen posted on Thursday, February 17, 2005 - 2:20 pm

If you look up ESTIMATOR in the Mplus User's Guide, you will find a table that summarizes the estimators available in Mplus and shows the defaults for various situations. Note that the scale of the independent variables is irrelevant for choosing an estimator. If you are unsure which estimator to use, go with the Mplus default.

Bonnie posted on Wednesday, March 02, 2005 - 9:12 am

Dear Linda,

I am still confused about SEM models with non-normality or a categorical independent variable. In my SEM model, I have two continuous factor indicators, PBIcare and PBIcontrol, whose underlying latent variable is PBI; the mediator and the dependent variable CG are normally distributed continuous variables. PBIcontrol is also normal, but PBIcare is highly skewed. I tried many ways to transform it, but failed. In this case, which estimation method should I use, given that the normality assumption is violated? Alternatively, I could create a dummy variable based on PBIcare to replace it; in that case, which estimation method should I use? In other words, do I need to create a dummy variable when the normality assumption cannot be met? I would appreciate your help!

Best, Bonnie

Linda K. Muthen posted on Wednesday, March 02, 2005 - 10:52 am

The scale of independent variables does not affect estimation. Independent variables can be binary or continuous. If you have a three-category independent variable, you should create two dummy variables. This is the same as in regular regression.
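A minimal sketch of k-1 dummy coding in plain Python, assuming hypothetical data that reuse the category names from Michael's education example above, with "low" standing in for the reference group (low level of lower secondary education). Each non-reference category gets its own 0/1 column, and a case in the reference group scores 0 on all of them. The function name `dummy_code` is illustrative only.

```python
def dummy_code(values, reference):
    """Return {category: [0/1 indicator per case]} for every non-reference category."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Hypothetical education data; "low" is the reference category.
education = ["low", "mi", "ab", "un", "low", "ab"]
dummies = dummy_code(education, reference="low")
for name, column in dummies.items():
    print(name, column)
```

With four categories this yields three dummies, which matches the "include three of the four" practice discussed above.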

Anonymous posted on Wednesday, April 20, 2005 - 2:19 am

I have a model with an exogenous latent variable (f1) measured by categorical indicators, three dichotomous independent variables, and two categorical endogenous variables. I have not specified the covariances among the independent variables, since they are not estimated as part of the model. This results in very poor fit measures. If I create a latent variable behind each of my dummy variables (by fixing the factor loading to one and the residual variance to zero, i.e., setting the latent variable equal to the dummy indicator), I am able to specify the covariances among the independent variables, and this results in excellent fit measures. It also makes theoretical sense to correlate the independent variables. Why are there such huge differences in the fit measures, although it is practically the same model?

Linda K. Muthen posted on Wednesday, April 20, 2005 - 9:08 am

Instead of including the covariances among the exogenous latent variable and the exogenous observed variables, you should regress the latent variable on the observed variables. This both allows for the relationships to be included in the model and also makes the analysis stay in the regression-based framework rather than the correlation-based framework.

MARLONMELENDEZRODRIGUEZ posted on Wednesday, April 27, 2005 - 3:48 pm

León, Nicaragua, April 27, 2005

Dear Dr. Muthen,

I am trying to carry out a path analysis in which all my variables (exogenous and endogenous) are dummies (0/1). I have read an article by Nobuoki Eshima, Minoru Tabata, and Geng Zhi titled "Path Analysis with Logistic Regression Models", which implies that when we have these types of variables, the model is not adequately estimated by ordinary linear regression.

Marlon Melendez

Linda K. Muthen posted on Wednesday, April 27, 2005 - 5:52 pm

We have no one on our staff who speaks Spanish. Can you please post in English. Thank you.

MARLONMELENDEZRODRIGUEZ posted on Thursday, April 28, 2005 - 9:32 am

León, Nicaragua, April 28, 2005
Center for Demographic and Health Research

Dear Dr. Muthen

1) Path analysis is usually performed for continuous variables using linear regression equations, and the basic idea is applied to the analysis of causal systems of continuous variables (the LISREL model). Compared with path analysis of continuous variables, path analysis of categorical variables is complex, because the causal system under consideration cannot be described by linear regression equations.

2) I am trying to carry out a path analysis with categorical variables (all my variables, both the exogenous and the endogenous ones, are qualitative). What can I do?

3) Where can I find information on path analysis with categorical variables?

Please excuse my English.

Marlon Osman Melendez Rodriguez

Linda K. Muthen posted on Thursday, April 28, 2005 - 10:18 am

The only reference that I have is:

Xie, Y. (1989). Structural equation models for ordinal variables. Sociological Methods and Research, 17, 325-352.

Mplus can estimate path models using either probit or logistic regression.

MARLONMELENDEZRODRIGUEZ posted on Thursday, April 28, 2005 - 1:30 pm

León, Nicaragua, April 28, 2005

Dear Dr. Muthen

Excuse me, can Mplus carry out path analysis with dummy variables?

Marlon Osman Melendez Rodriguez

Linda K. Muthen posted on Thursday, April 28, 2005 - 5:17 pm

Yes, Mplus can estimate a path model with dichotomous outcomes. The default is to estimate probit regression coefficients but Mplus can also estimate logistic regression coefficients.

Anonymous posted on Monday, May 02, 2005 - 1:45 am

Referring to the posting of April 20 (2:19 am):

I probably described my problem poorly. I have one exogenous latent variable which is measured by three indicators (ordinal scale). Besides that, I have three other exogenous dummy variables, which are correlated with the aforementioned latent variable but not in a causal relationship with it.

If I specify a correlation between the exogenous variables (with the WITH command), I run into the same problem as the anonymous poster before me (*** FATAL ERROR
VARIABLE XY CAUSES A SINGULAR WEIGHT MATRIX PART. THIS MAY BE
DUE TO THE VARIABLE BEING DICHOTOMOUS BUT DECLARED AS CONTINUOUS.
RESPECIFY THE VARIABLE AS CATEGORICAL.)

So I created a latent factor behind each of the dummy variables (by fixing the factor loading to one and the residual variance to zero, i.e., setting the latent variable equal to the dummy indicator). This results in very good fit measures, while leaving out the correlation between the exogenous variables leads to very bad fit measures.

Why are there such huge differences between practically identical models?

Anonymous posted on Monday, May 02, 2005 - 4:18 am

To be able to use all dummy variables (and not only k-1 dummy variables), does one have to suppress the intercepts? How is that done? Just fix the intercepts to zero?

bmuthen posted on Tuesday, May 03, 2005 - 2:55 pm

Answer to the May 02, 01:45 question:

With exogenous latent and exogenous observed variables, Mplus does not default to having these 2 sets of variables correlated when the outcomes are categorical. The best way to correlate the 2 sets is to regress the latent variables on the observed variables. It is advantageous to have exogenous observed variables in the model so that the "regression-based" as opposed to the "correlation-based" estimation approach can be used (see Muthen's 1984 Psychometrika article).

bmuthen posted on Tuesday, May 03, 2005 - 2:57 pm

Answer to the May 02, 04:18 question:

In principle, yes, the intercepts should be suppressed. However, even so, using all dummy variables causes a singular sample covariance matrix, and Mplus does not allow that.
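The singularity can be checked directly: the k dummies of a k-category variable add up to 1 for every case, so each row of their sample covariance matrix sums to zero and the matrix has determinant zero. Below is a plain-Python sketch with a hypothetical three-category variable. Dropping one dummy (the reference category) gives a non-singular 2 x 2 matrix.

```python
from statistics import mean

def covariance_matrix(columns):
    """Sample covariance matrix (divisor n - 1) of equal-length columns."""
    n = len(columns[0])
    means = [mean(c) for c in columns]
    return [
        [sum((a[i] - ma) * (b[i] - mb) for i in range(n)) / (n - 1)
         for b, mb in zip(columns, means)]
        for a, ma in zip(columns, means)
    ]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical three-category variable (0, 1, 2) coded into ALL three dummies.
group = [0, 1, 2, 1, 0, 2, 2, 1]
all_dummies = [[1 if g == k else 0 for g in group] for k in range(3)]

S_all = covariance_matrix(all_dummies)
row_sums = [sum(row) for row in S_all]   # each equals cov(d_k, 1) = 0
det_all = det3(S_all)                     # zero up to rounding: singular

# k - 1 dummies: drop category 0 as the reference group.
S_ref = covariance_matrix(all_dummies[1:])
det_ref = S_ref[0][0] * S_ref[1][1] - S_ref[0][1] * S_ref[1][0]

print(row_sums, det_all, det_ref)
```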

Anonymous posted on Tuesday, September 06, 2005 - 2:05 am

Referring to the posting of May 03, 2:55 pm: My reviewer asks me why it is not technically possible to make an exogenous manifest variable and an exogenous latent variable (there is no causal relationship between them) correlate when the outcome is categorical. Is there an article I could refer to?

Linda K. Muthen posted on Tuesday, September 06, 2005 - 7:33 am

It is possible to include such a correlation. However, you then make distributional assumptions about the observed variable which you do not do otherwise.

Anonymous posted on Tuesday, September 06, 2005 - 1:50 pm

Excuse me, I forgot to mention that the exogenous manifest variable is binomial, and there seems to be no possibility of correlating this variable with the latent exogenous variable. You advised regressing the latent variable on the binomial one, but this seems theoretically inappropriate to me, since there is no causal relationship between them.

bmuthen posted on Tuesday, September 06, 2005 - 3:01 pm

Exogenous observed (manifest) variables are treated as continuous so if your manifest variable is binomial that is not explicitly taken into account - and this seems alright. As mentioned in the earlier posts it is possible to correlate the manifest with the latents, but it comes at the price referred to in my original May 03 post. Regression seems a better alternative - I don't see that as necessarily implying causation.
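In a linear model with a single covariate, the slope of the factor on the covariate and their covariance are one-to-one once the covariate's variance is known (slope = cov / var). A regression therefore represents the same association a WITH statement would. The following is a plain-Python sketch with hypothetical values standing in for a binary covariate and factor scores. It only shows the algebra; Mplus does not estimate the model from factor scores.

```python
from math import sqrt
from statistics import mean

def cov(a, b):
    """Population covariance (divisor n) of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

x = [0, 1, 1, 0, 1, 0, 0, 1]                    # hypothetical binary covariate
f = [0.2, 1.1, 0.9, -0.3, 1.4, 0.1, 0.0, 0.8]   # hypothetical factor scores

var_x, var_f, cov_xf = cov(x, x), cov(f, f), cov(x, f)
slope = cov_xf / var_x                 # regression of f on x
implied_cov = slope * var_x            # covariance recovered from the slope
corr_direct = cov_xf / sqrt(var_x * var_f)
corr_from_slope = slope * sqrt(var_x / var_f)
print(slope, implied_cov, corr_direct, corr_from_slope)
```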

Jeffrey Hall posted on Wednesday, April 19, 2006 - 4:53 am

Good morning,

I am attempting to use dummy variables representing different racial groupings as exogenous variables in my SEM. In initial attempts I used the binary coding strategy to create the dummy variables where, for example:
(1) African American = 1, Asian / Pacific Islander or Amerindian or Other = 0;
(2) Asian / Pacific Islander = 1, African American or Amerindian or Other = 0;
(3) Etc.
(4) The dummy variable for White is used as the reference group (and thus excluded from the model).

This strategy had the effect of excluding a portion of the study sample from the analysis. Here is an example of the message that is generated in the output:
*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 8437
5 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS.
In situations where this message is not generated, Mplus generates output that provides neither model fit information nor information that could be used to achieve convergence (it simply lists the summary of analysis, summary of sample statistics, and technical output).

As a next step, I adopted a different coding strategy: the effect coding approach, using 1 for the category of interest, 0 for all other categories, and -1 for the reference group. The analysis runs successfully with this coding, and the model converges with excellent fit. However, the interpretation of the coefficients is different (based on comparisons with the grand mean) and thus undesirable. Do you have any thoughts on the nature of this issue and a possible resolution?
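The difference in interpretation can be seen in a saturated one-way model. With dummy coding, the intercept is the reference-group mean, and each coefficient is a group's difference from the reference group. With the effect coding described above (1, 0, -1 for the reference group), the intercept is the unweighted mean of the group means, and each coefficient is a group's difference from that mean. Below is a plain-Python least-squares sketch with hypothetical group labels and outcome values.

```python
def solve(A, v):
    """Solve A x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [p - factor * q for p, q in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical data: "ref" is the reference group; group means are 2, 5, 8.
groups = ["ref", "ref", "g1", "g1", "g2", "g2"]
y = [1.0, 3.0, 4.0, 6.0, 7.0, 9.0]

dummy = {"ref": (0, 0), "g1": (1, 0), "g2": (0, 1)}
effect = {"ref": (-1, -1), "g1": (1, 0), "g2": (0, 1)}

b_dummy = ols([[1, *dummy[g]] for g in groups], y)    # intercept, g1, g2
b_effect = ols([[1, *effect[g]] for g in groups], y)  # intercept, g1, g2
print("dummy coding: ", [round(b, 3) for b in b_dummy])
print("effect coding:", [round(b, 3) for b in b_effect])
```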

Linda K. Muthen posted on Wednesday, April 19, 2006 - 8:35 am

It sounds like with your dummy variables, you are assigning missing to one category while with the effect coding you are not. I'm not sure why you are doing this.

Jeffrey Hall posted on Wednesday, April 19, 2006 - 11:05 am

Would you suggest the use of trend or contrast coding strategies?

Jeffrey Hall posted on Wednesday, April 19, 2006 - 11:46 am

Please disregard the previous posting. I've determined the problem. Thank you.

Shi Huang posted on Monday, July 17, 2006 - 9:48 am

Dear Dr. Muthen

I am running a path analysis with purpose to test mediation effect.
The variables I used are:

X: independent variable, a categorical variable with 3 categories. Then I created two dummy variables (X1, X2)
Y: mediator, a continuous variable.
Z: outcome, a continuous variable.

The syntax is:

ANALYSIS:
TYPE IS MISSING H1;
BOOTSTRAP = 500;
model:
Z on Y;
Y on X1 X2;

model indirect:
Z IND Y X1 ;
Z IND Y X2;

I was wondering if this is a correct syntax (specifically for model indirect).

Thanks

Linda K. Muthen posted on Monday, July 17, 2006 - 10:09 am

I can't see a problem, but the best way to find out is to run it.

Shi Huang posted on Monday, July 17, 2006 - 10:18 am

Thank you so much for your quick reply.

One thing that worries me is the 999 values in the output.

TOTAL, TOTAL INDIRECT, SPECIFIC INDIRECT, AND DIRECT EFFECTS

Estimates S.E. Est./S.E. StdYX StdYX SE StdYX/SE

Effects from X1 to Z

Sum of indirect -0.096 0.074 -1.285 999.000 999.000 1.000

Specific indirect

Z
Y
X1 -0.096 0.074 -1.285 999.000 999.000 1.000

Effects from X2 to Z

Sum of indirect -0.086 0.068 -1.262 999.000 999.000 1.000

Specific indirect

Z
Y
X2 -0.086 0.068 -1.262 999.000 999.000 1.000

Can I ignore these 999?

Thanks

Linda K. Muthen posted on Monday, July 17, 2006 - 10:21 am

It sounds like you may be using an old version of the program. If you are using Version 4.1, you should send your input, data, output, and license number to support@statmodel.com.

M.D.Spreeuwenberg posted on Tuesday, July 22, 2008 - 7:42 am

Dear prof. Muthen,

I am conducting a simulation study to investigate the influence of hidden bias on the estimation of treatment effects using regression analysis, the propensity score, the Heckman model, and a latent variable approach. I have simulated 100 data files with a pre-defined true effect theta. I am using Mplus for the estimation of the effects using latent class modelling.

I have simulated the data as follows: There are two variables, X1 and X2, that influence the latent variable f (tendency to participate). The latent class has three indicators: Z1, Z2, and D (a categorical variable, treatment or no treatment). X2, X3, and D influence the continuous outcome Y. Hidden bias is introduced by correlated error terms of f and Y.

I have set up the Mplus script as follows:

VARIABLE: NAMES ARE X1 X2 X3 Z1 Z2 D Y;
CATEGORICAL ARE D;
ANALYSIS: parameterization = theta;

MODEL: F ON X1 X2;
F BY Z1 Z2 D;
Y ON X2 X3 D;
Y with F;

The problem I encounter is that if I declare D as categorical, my true estimate theta is not recovered, whereas if I do not declare D as categorical, my true estimates are recovered. Since D is both an indicator and a predictor of Y, should I declare D as categorical? Or am I doing something wrong here? Can you explain my results?

With kind regards,

Marieke

Bengt O. Muthen posted on Tuesday, July 22, 2008 - 7:58 am

In your introduction you say "latent class modeling", and in your second paragraph you talk about the latent variable f followed by the sentence "The latent class...".

The Mplus input however, indicates that f is a continuous latent variable (a factor), not a latent class variable.

- So my question is, do you want f to be a factor or a latent class variable?

M.D.Spreeuwenberg posted on Wednesday, July 23, 2008 - 12:02 am

Dear Prof. Muthen,

Sorry for the confusion. f is a continuous latent variable, indicating 'the tendency to participate' in a non-randomised study.

With kind regards,

Marieke

Bengt O. Muthen posted on Wednesday, July 23, 2008 - 9:13 am

You should declare D as categorical because it is a dependent variable in one part of the model. If you have problems with this run you should send input, output and license number to support@statmodel.com.

Bettina Doering posted on Friday, August 28, 2009 - 3:28 am

Dear Prof Muthen,

I have a gender variable in my SEM. The standardized coefficients were only estimated when I referred to the variance of gender in the model statement, but then I got the following WARNING: VARIABLE Gender MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.

How can I get the standardized coefficients without declaring gender as continuous? Or is there no difference for model estimation? (When I look at the model fit, it is different.)

Thank you for your help,

Bettina

Linda K. Muthen posted on Friday, August 28, 2009 - 6:44 am

It sounds like you have mentioned the means, variances, or covariances of gender in the MODEL command. These should not be mentioned. The model is estimated conditional on the covariates in the model.

AMY TIAN-FOREMAN posted on Thursday, December 03, 2009 - 4:28 pm

Dear Dr. Muthen

I am running a path analysis with purpose to test how different categories of my independent variable impact the mediator and the outcome.
The variables I used are:

X: independent variable, a categorical variable with 3 categories. Then I created two dummy variables (X1, X2)
Y: mediator, a continuous variable (with 6 items).
Z: outcome, a continuous variable (with 6 items).

The syntax is:

ANALYSIS:

model:
x by X1 X2;
Y BY Y1 - Y6;
Z BY Z1 - Z6;
Z on Y;
Y on X1 X2;

model indirect:
Z IND Y X1 ;
Z IND Y X2;

I am not sure if this is the right way to test the different effect of each category on Y and, in turn, on Z.

Thank you very much for your help.

AMY TIAN-FOREMAN posted on Thursday, December 03, 2009 - 5:32 pm

Dear Dr. Muthen:
I have a follow up question re my previous question.
I have run the model and got output containing both STD and STDYX standardization. I am not sure which one I should be reading, since my model includes only first-order factors and an observed categorical independent variable.

Best regards
Amy

AMY TIAN-FOREMAN posted on Thursday, December 03, 2009 - 8:08 pm

Dear Dr. Muthen

Sorry, another question related to my earlier two questions. I have actually got three types of standardisation: STD, STDY, and STDYX. From what I understand, I should use the STDYX results, since my model contains both measured and observed variables. Am I right? I am very confused at this stage, and the results differ substantially between the STD, STDY, and STDYX standardisations.

Thank you ever so much again.

Amy

Linda K. Muthen posted on Friday, December 04, 2009 - 9:23 am

You should remove the statement x BY x1 x2; For binary covariates, use STDY.

Tracy Witte posted on Wednesday, February 16, 2011 - 10:39 am

I am running a similar model. X is a binary variable (sex).

Y by Y1 y2 y3 y4;
z by z1 z2 z3 z4;
y with z;
y on x;
z on x;
m1 with m2;

MODEL INDIRECT:
y ind m1 x;
z ind m1 x;
y ind m2 x;
z ind m2 x;

Because X is binary, I assumed that I should use the STDY standardized values. However, one of them is greater than 1.0 (i.e., M1 on X). None of the StdYX values are greater than 1, and none of my residual variances are negative. I'm not sure why this would be. Should I just use the StdYX values?

Bengt O. Muthen posted on Wednesday, February 16, 2011 - 6:16 pm

You don't show m1 and m2 in the model.

Tracy Witte posted on Thursday, February 17, 2011 - 6:52 am

I apologize for the typo. Here's the model I ran:

Y by Y1 y2 y3 y4;
z by z1 z2 z3 z4;
y with z;
y on m1;
y on m2;
z on m1;
z on m2;
y on x;
z on x;
m1 with m2;

MODEL INDIRECT:
y ind m1 x;
z ind m1 x;
y ind m2 x;
z ind m2 x;

Bengt O. Muthen posted on Thursday, February 17, 2011 - 4:02 pm

You don't show m1 and m2 on x, but I assume they are there given your question.

I would use STDY with a binary x. The fact that it is greater than 1 simply means that the mediator changes more than one SD when x changes from one category (such as male) to another; that's ok.
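Here is a sketch of the two standardizations for a binary covariate. It assumes the usual definitions (STDY divides the slope by the SD of the outcome; STDYX additionally multiplies by the SD of the covariate) and uses hypothetical values. The SD of a 0/1 variable is at most 0.5, so STDY can exceed 1 while STDYX stays below 1, which is the pattern described in the question.

```python
import math

def stdy(b, sd_y):
    """Slope in SD units of the outcome per one raw unit (0 -> 1) of x."""
    return b / sd_y

def stdyx(b, sd_x, sd_y):
    """Slope in SD units of the outcome per one SD of x."""
    return b * sd_x / sd_y

b = 1.2        # hypothetical raw slope of m1 on the 0/1 covariate x
sd_m1 = 1.0    # hypothetical model-implied SD of m1
p = 0.4        # hypothetical proportion of cases with x = 1
sd_x = math.sqrt(p * (1 - p))   # SD of a 0/1 variable, at most 0.5

print(f"STDY = {stdy(b, sd_m1):.3f}, STDYX = {stdyx(b, sd_x, sd_m1):.3f}")
```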

Miroslav posted on Sunday, September 25, 2011 - 11:14 am

Dear Dr. Muthen,
I was wondering whether it is possible, in a path model in Mplus, to dummy code a multi-categorical independent variable (e.g., 3 experimentally manipulated between-subjects conditions) as in a typical post hoc test, i.e., creating k variables instead of the k-1 variables that result from traditional dummy/contrast coding.
Is there any supporting literature for such a practice?
Thank you very much
Miro

Linda K. Muthen posted on Sunday, September 25, 2011 - 3:53 pm

This may get a better response if you post it on a general discussion forum like SEMNET.

Melvin C Y posted on Monday, April 02, 2012 - 10:48 am

In my model, I have 3 independent variables (1 latent continuous, x1, and 2 manifest binary, x2 and x3). I believe independent variables should be allowed to covary, and this is the default in Mplus. Although I obtained values for the covariances between x1 and x2 and between x1 and x3, I did not get any value in the model output for the 2 binary variables, i.e., x2-x3. I examined TECH1, and the PSI entry for the two binary variables is 0. Yet, when I add the command to covary x2 with x3, I get an error about a non-positive definite matrix. When I looked at the sample statistics, x2 and x3 have a covariance of 0.03 and a correlation of 0.198.

Should binary independent variables be uncorrelated with each other?

For your information, I did not specify the binary variables as categorical.

Thank you.

Linda K. Muthen posted on Monday, April 02, 2012 - 1:15 pm

In regression, the model is estimated conditioned on the observed exogenous variables. They are not uncorrelated with each other. Their means, variances, and covariances are not parameters in the model. You can find these values by looking at descriptive statistics for these variables.

I am surprised you get covariances among the latent and observed exogenous variables. Can you please send your license number and output to support@statmodel.com.

Kelly posted on Monday, April 15, 2013 - 10:05 pm

Hello,

At the risk of redundancy after reading the above posts, I'm wondering if the issue I am having with my model is the result of the same problem...

I am running into issues when attempting to allow an observed categorical independent variable (gender) to correlate with continuous independent variables (both latent and observed) in my model. The model runs normally when specifying correlations between the continuous independent variables, but when the categorical IV is added in, the non-positive definite error message pops up.

I can provide additional info (i.e., license no. and output) if needed. Thanks in advance for your help.

Linda K. Muthen posted on Tuesday, April 16, 2013 - 8:55 am

The message is generated because the mean and variance of a binary variable are not orthogonal. You can ignore this.
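The mean-variance relationship is easy to verify: for a 0/1 variable with proportion p of ones, the mean is p and the population variance is p(1 - p), so the variance is completely determined by the mean. Below is a plain-Python sketch with hypothetical 0/1 samples.

```python
from statistics import mean

def binary_moments(x):
    """Return (mean, population variance, p * (1 - p)) for a 0/1 sequence."""
    p = mean(x)
    variance = sum((v - p) ** 2 for v in x) / len(x)
    return p, variance, p * (1 - p)

samples = [
    [1, 0, 0, 1, 1, 0, 1, 1, 0, 1],   # hypothetical gender indicator
    [0, 0, 0, 1],
    [1, 1, 1, 1, 0],
]
for s in samples:
    p, variance, implied = binary_moments(s)
    print(f"mean = {p:.3f}  variance = {variance:.3f}  p(1 - p) = {implied:.3f}")
```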

Minnik Findik posted on Thursday, February 11, 2016 - 5:06 am

Hi,

I run the following model

Emo by mams1x mams2x mams3x mams4x mams5x;
Beh by mams11x mams12x mams13x mams14x;
Emo Beh on Ethnic Language SEN CLA;
Ethnic with Language SEN CLA;
Language with SEN CLA;
SEN with CLA;

Because Ethnic, Language, SEN, and CLA are independent variables, I do not declare them as categorical (despite the fact that they are). But then I get the following warning: "WARNING: VARIABLE ETHNIC MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS."

But my model is estimated normally "THE MODEL ESTIMATION TERMINATED NORMALLY"

Can I ignore the warning message?

Thank you in advance

Linda K. Muthen posted on Thursday, February 11, 2016 - 9:52 am

The message is caused because you bring the covariates into the model by mentioning the WITH statements. In regression, the model is estimated conditioned on the covariates. Their means, variances, and covariances are not model parameters. If you remove the WITH statements, you will not get the message.

Minnik Findik posted on Monday, February 15, 2016 - 4:05 am

Hi Linda,

Yes when I remove the WITH statement, I do not get that message. However, I do want/need to declare the correlation between those variables. Can I do it in any other way so I will not get a warning?
Or is it alright to ignore the message?

Thank you very much in advance.

Linda K. Muthen posted on Monday, February 15, 2016 - 6:37 am

The correlations among these variables are not zero during model estimation. If you want to know what they are, ask for SAMPSTAT in the OUTPUT command and you will obtain them. If you are using the WITH statements to avoid losing cases because of missing data, you can put the WITH statements back now that you know they are causing the message. The message is triggered because the mean and variance of binary variables are not orthogonal.

Minnik Findik posted on Monday, February 15, 2016 - 7:26 am

Thank you very much!

Daniel Christensen posted on Wednesday, June 14, 2017 - 1:34 am

Dear Profs Muthen

I'm trying to conduct a mediation analysis where one of my IVs is categorical.

I'm following the approach suggested by Hayes and Preacher (2014) (Statistical mediation analysis with a multicategorical independent variable), dummy-coding education into Yr12, university, or the reference category, and then forcing a constraint for the total effects of Yr12 and Uni on the outcome.

Model:
waiB on yr12 (a1)
uni (a2)
sq_inc SEIF NSB kes male;
atn on yr12 (a3)
uni (a4)
sq_inc SEIF NSB kes male;
reg on yr12 (a5)
uni (a6)
sq_inc SEIF NSB kes male;
y3readB on yr12 (cp1)
uni (cp2)
waiB (b1)
atn (b2)
reg (b3)
sq_inc SEIF NSB kes male;
Model indirect:
y3readB IND yr12 ;
y3readB IND uni;
y3readB IND sq_inc ;
y3readB IND SEIF;
y3readB IND NSB ;
y3readB IND kes ;
y3readB IND male;
Model constraint:
new (tot1 tot2);
tot1 = a1 * b1 + a3 * b2 + a5 * b3 + cp1;
tot2 = a2 * b1 + a4 * b2 + a6 * b3 + cp2;

Do you have any advice or warnings on this sort of analysis?
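The NEW parameters in the MODEL CONSTRAINT section of the input above define derived quantities rather than restricting the model. Each one is the total effect of one dummy: the sum of its three specific indirect effects (through waiB, atn, and reg) plus its direct effect on y3readB. Below is a plain-Python sketch of that arithmetic for tot1, with hypothetical estimates in place of the a1, a3, a5, b1, b2, b3, and cp1 values Mplus would report.

```python
def specific_indirect(a_paths, b_paths):
    """Products a_k * b_k, one per mediator."""
    return [a * b for a, b in zip(a_paths, b_paths)]

def total_effect(a_paths, b_paths, direct):
    """Total effect = sum of specific indirect effects + direct effect."""
    return sum(specific_indirect(a_paths, b_paths)) + direct

# Hypothetical estimates for the yr12 dummy:
# a1, a3, a5 = yr12 -> waiB, atn, reg; b1, b2, b3 = mediators -> y3readB.
a_yr12 = [0.5, 0.2, -0.1]
b_med = [0.4, 0.3, 0.6]
cp1 = 0.1   # hypothetical direct effect of yr12 on y3readB

specific_1 = specific_indirect(a_yr12, b_med)
tot1 = total_effect(a_yr12, b_med, cp1)
print(specific_1, tot1)
```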

Daniel Christensen posted on Tuesday, June 20, 2017 - 6:29 pm

No concerns with this approach?

Bengt O. Muthen posted on Thursday, June 22, 2017 - 3:31 pm

I don't see that any constraints are imposed.

I don't see why one would take such a complicated approach - what is gained?

Daniel Christensen posted on Thursday, June 22, 2017 - 8:42 pm

Thanks

The idea was to make sure that Mplus "knew" that uni and yr12 were part of the one dummy-coded measure (education).

But I can see that the "total" constraint just forces the total effects for year 12 and uni to each add up (which they already do anyway).

So I'm confused as to whether this is necessary, or whether you have any advice on a categorical independent variable in this instance.

Daniel Christensen posted on Tuesday, June 27, 2017 - 11:52 pm

Would you advise removing this?

Model constraint:
new (tot1 tot2);
tot1 = a1 * b1 + a3 * b2 + a5 * b3 + cp1;
tot2 = a2 * b1 + a4 * b2 + a6 * b3 + cp2;

Thanks

Daniel

Bengt O. Muthen posted on Wednesday, June 28, 2017 - 1:23 pm

Yes, I don't see that it has a purpose.

Salmi Md Zahid posted on Wednesday, July 25, 2018 - 9:45 pm

Hi,

Should we center our binary/categorical predictors in multilevel modeling?

Is there any difference in how we write the command for centering (cluster- or grand-mean) for categorical predictors? I only found examples for continuous predictors.

Thank you.

Bengt O. Muthen posted on Thursday, July 26, 2018 - 6:07 pm

See the paper on our website (e.g. under Recent Papers):

Asparouhov, T. & Muthén, B. (2018). Latent variable centering of predictors and mediators in multilevel and time-series models. Technical Report. June 4, 2018. (Download scripts).

Salmi Md Zahid posted on Sunday, July 29, 2018 - 12:19 pm

Hi Mr. Muthen,

Thanks so much for letting me know about your recent paper on centering. I have just learned that there is one more centering option, latent centering, which seems to suit my study well, since I would like to run two-level modeling with a large number of predictors (most of them categorical). Although I have not yet checked the ICCs for each predictor, I believe many of them are significant.

My question: the paper mentions several times that latent centering works well for studies with a small number of clusters, but mine has more than 100 clusters. What do you think? Should I apply latent centering or not?

Thank you.

Tihomir Asparouhov posted on Monday, July 30, 2018 - 8:57 am

Yes. A larger number of clusters is not a problem. If you use Bayes estimation, latent centering is done for you by default. See the Table 8 and Table 9 directories for examples:
http://www.statmodel.com/download/LatentCentering.zip

Orpha de Lenne posted on Wednesday, March 20, 2019 - 10:40 am

Dear Dr. Muthen,

I want to test a mediation model with a categorical independent variable. The categorical variable is the condition to which participants were exposed in the experiment and has 7 values or categories. Can I just enter this categorical variable in the model as it is, or should I create dummies?

Thank you.

Bengt O. Muthen posted on Wednesday, March 20, 2019 - 4:58 pm

Create dummies.
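With k = 7 conditions, dummy coding produces k - 1 = 6 binary variables, and the omitted condition serves as the reference; each dummy's slope then compares that condition with the reference condition. A minimal Python sketch, assuming the conditions are coded 1 to 7, there are no missing values, and condition 1 is chosen as the reference (the function and variable names are hypothetical):

```python
def make_dummies(condition, categories, reference):
    """Return one 0/1 column per non-reference category (k - 1 columns for k categories)."""
    kept = [c for c in categories if c != reference]
    return {f"d{c}": [1 if value == c else 0 for value in condition] for c in kept}

# Hypothetical condition codes (1-7) for eight participants; condition 1 is the reference.
condition = [1, 2, 3, 4, 5, 6, 7, 3]
dummies = make_dummies(condition, categories=range(1, 8), reference=1)
for name, column in dummies.items():
    print(name, column)
```

All six dummy columns would be saved with the data and entered together as covariates; a participant in the reference condition scores 0 on every dummy.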

Eleanor Winpenny posted on Tuesday, August 06, 2019 - 9:13 am

Dear Drs Muthen,

I am running a growth model with several exogenous binary covariates (gender, race, having a child). In order to address missing data using FIML, I would like to bring these inside the model, and so I have specified the variances of these variables in the model, i.e. using the code:
race2cat gender01 newkid12;
I have read that this makes distributional assumptions (I would assume a normal distribution), and I think this is causing errors in my model. Mplus reports an error on the variance of the newkid12 variable or, if this variable is removed, on the variance of the gender variable, etc.
My question is whether it is possible to specify some other distributional assumption (i.e., binary) for these independent variables.

Thank you for your help!

Bengt O. Muthen posted on Tuesday, August 06, 2019 - 4:02 pm

Our Regression and Mediation book discusses the different ways of handling missing data on binary covariates. The usual way of including them in the model is not always the best approach.

I suspect your error message mentions "first-order derivative", which refers to the fact that the mean and variance of a binary variable are mathematically related. If so, the message can be ignored.
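The mathematical relation referred to here is the standard result for a 0/1 variable x with P(x = 1) = p:

```latex
\begin{aligned}
E(x) &= 0 \cdot (1 - p) + 1 \cdot p = p, \\
E(x^2) &= p \quad (\text{since } x^2 = x \text{ for } x \in \{0, 1\}), \\
\operatorname{Var}(x) &= E(x^2) - [E(x)]^2 = p - p^2 = p(1 - p).
\end{aligned}
```

Because the variance is fully determined by the mean, the mean and variance of a binary covariate carry redundant information, which is the relation the reply points to as the source of the first-order derivative message.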

If this doesn't help, send your output to Support along with your license number.

Eleanor Winpenny posted on Tuesday, August 06, 2019 - 11:37 pm

Dear Bengt,
Thank you for your advice, I will have a look at the book.

My error message states: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE
TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE
FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING
VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE
CONDITION NUMBER IS 0.888D-18. PROBLEM INVOLVING THE FOLLOWING PARAMETER:
Parameter 65, NEWKID34

Can I ignore this message and proceed with the model?

Thank you,
Eleanor

Bengt O. Muthen posted on Wednesday, August 07, 2019 - 5:25 pm

Only if this variable is a binary covariate.

Luo Wenshu posted on Sunday, August 02, 2020 - 1:36 am

Dear Dr. Muthen,

Regarding exogenous latent variables and exogenous observed variables in the same SEM model, I see that you suggest that we should regress the latent variables on the observed variables. Is it OK to NOT define any relationship in the model between exogenous latent variables and exogenous observed variables if we are NOT interested in their relations and just want to control for the exogenous observed variables?

Thank you very much!

Bengt O. Muthen posted on Sunday, August 02, 2020 - 4:48 pm

If the x variable and the factor are indeed correlated, not allowing them to correlate somehow will make the model misspecified.

Luo Wenshu posted on Sunday, August 02, 2020 - 8:53 pm

Thank you, Dr Muthen.

When there are several observed X variables, we should not specify correlations among them even though they are indeed correlated. Is my understanding correct?

Thank you very much!

Bengt O. Muthen posted on Monday, August 03, 2020 - 5:33 pm

No, you should correlate the X's.

Luo Wenshu posted on Tuesday, August 04, 2020 - 7:41 pm

Dear Dr Muthen,

I saw this following message under this same topic by Linda K. Muthen posted on Wednesday, January 26, 2005

"You should not specify the covariances among the observed x variables. They are not estimated as part of the model as in regular regression."

I have tried myself to see the difference in the output between specifying and not specifying correlation between the observed X variables.

The two models gave exactly the same model fit, path coefficients, and so on.

However, when we specify the correlations between the observed X variables, the model will produce the correlations, means and variances of the X variables.

So it seems both ways are OK as the correlations, means and variances of the X variables can also be found in the sample statistics.

I hope my understanding is right. Thank you very much.

Bengt O. Muthen posted on Wednesday, August 05, 2020 - 4:50 pm

That's correct. I thought you were still talking about the case where you have an exogenous factor and some x variables. If you correlate one x with the factor, you turn that x into a y in Mplus thinking, and therefore that new y is not correlated with the other x's.

Luo Wenshu posted on Wednesday, August 05, 2020 - 7:24 pm

Dear Dr Muthen,

I see your point.

Now I think it is better to regress an exogenous factor on the observed X variables (rather than correlate them), as long as this is theoretically sound.

Thank you very much, Dr Muthen :-)

Bengt O. Muthen posted on Thursday, August 06, 2020 - 5:35 pm

Right.