SEM with continuous and categorical data
 Evangelos Tsoukatos posted on Tuesday, October 11, 2005 - 11:41 pm
Dear Linda,

I am quite new to SEM and perhaps my inquiry is trivial to you.
I am trying to analyze a model (N = 519) with 8 observed variables (path analysis), of which 5 are continuous, one is a Likert item scored 1-10, and 2 are categorical. The Likert variable and the two categorical variables are DVs.
Further, the continuous variables depart from mv normality.
I am quite confused about which method of estimation to use. Any suggestions? Is there any literature on the subject?

Vagelis
 Linda K. Muthen posted on Wednesday, October 12, 2005 - 8:35 am
You have two estimator choices in Mplus -- weighted least squares (WLSMV) or maximum likelihood (MLR). You may find the following articles helpful:

Muthén, B. & Kaplan D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.

Muthén, B. & Kaplan D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.

See also Web Note 4 and other references that are available on the website.
 Daniel Rodriguez posted on Wednesday, May 17, 2006 - 10:08 am
Hi Linda and Bengt,
I have two latent variables in an SEM, each of which has categorical indicator variables with the same four-point scale. Is it possible to interpret significant effects of predictor variables on the criterion factors in terms of odds ratios, since each latent variable is based on observed variables with the same four-point scale? For instance, can I say that for each unit change in the predictor, the odds of risk beliefs increase by such and such (exponentiating the beta log odds)?
 Linda K. Muthen posted on Wednesday, May 17, 2006 - 11:07 am
The regression of a continuous factor on a covariate is a simple linear regression and is interpreted as such. It is the regression of the categorical factor indicator on the factor that is a probit or logistic regression.
 Daniel Rodriguez posted on Wednesday, May 17, 2006 - 11:10 am
Ok, I see.
 Daniel Rodriguez posted on Wednesday, May 17, 2006 - 11:26 am
Linda,
I also ran a test for an indirect effect, which was significant. I will report the effect plus confidence interval. However, is the parameter estimate a log odds beta if the outcome variable is categorical? I just want to be sure how to interpret the results for my paper.
 Linda K. Muthen posted on Wednesday, May 17, 2006 - 11:45 am
Which estimator are you using? Weighted least squares or maximum likelihood?
 Daniel Rodriguez posted on Wednesday, May 17, 2006 - 12:04 pm
WLSMV
 Linda K. Muthen posted on Wednesday, May 17, 2006 - 1:58 pm
Then the regression coefficient is a probit regression coefficient.
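As a rough illustration of what a probit coefficient implies, the sketch below converts a threshold and slope into a predicted probability through the standard normal CDF. The threshold and slope values are hypothetical, and the sketch assumes the residual variance of the underlying latent response is 1; under other parameterizations the argument would first need to be rescaled.

```python
from statistics import NormalDist


def probit_probability(threshold: float, slope: float, x: float) -> float:
    """P(u = 1 | x) for a probit regression, assuming unit residual variance."""
    return NormalDist().cdf(-threshold + slope * x)


threshold = 0.5  # hypothetical threshold
slope = 0.4      # hypothetical probit slope

# The probability change per unit of x is not constant, unlike a linear slope.
for x in (0.0, 1.0, 2.0):
    p = probit_probability(threshold, slope, x)
    print(f"x = {x:.1f}: P(u = 1) = {p:.3f}")
```

A probit slope is therefore read as a change in the z-score of the outcome probability, not as a log odds.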
 Daniel Rodriguez posted on Thursday, May 18, 2006 - 5:07 am
Thanks again.
 Jungeun Lee posted on Friday, August 03, 2007 - 1:16 pm
Hi,

I have an SEM model in which all observed variables are categorical. I used the CATEGORICAL option for this model and estimated it with the weighted least squares estimator. I am not quite sure how to interpret the coefficients. Below is a short version of my Mplus input for the model. In parentheses, I added my current thinking about how these coefficients should be interpreted. Am I on track?

MODEL:
(Probit)f1 by zpotgodr zpotplrr zpotharr;
(linear regression)f2 on f1;
(probit)ncmopotr on f2;
(linear regression)f4 on ncmopotr;
 Linda K. Muthen posted on Friday, August 03, 2007 - 1:52 pm
It seems correct if ncmopotr is categorical.
 TD posted on Tuesday, April 15, 2008 - 11:49 am
Hi -

I am running a path model with a combination of categorical and continuous indicators. I am running the same model five times (once with the groups pooled, and once on each of four groups separately). I have two main questions:

1) Although I use the same variables with each group, I get starkly different sample statistics (means/thresholds/intercepts) for each group. For example, in one group I get a set of means that seems to make sense (e.g., 42 for a continuous variable with a range of 20-60, .42 for a binary categorical variable coded 0-1). But in another group, the same variables will have means of 4 and 2.6, for example. Why is this the case? Could this be causing convergence problems?

2) It seems like models with combinations of categorical and continuous variables have poorer model fit, as measured by RMSEA, than models having only continuous variables. Is this the case?

 Linda K. Muthen posted on Tuesday, April 15, 2008 - 1:16 pm
1. Either the groups have very different sample statistics or the data are being read incorrectly due to an error in the input.

2. I don't think there is any basis for this conclusion.
 jtw posted on Thursday, May 14, 2009 - 4:42 pm
Hello:

I have a relatively complex SEM model in which the endogenous variables are a mixture of continuous, categorical, AND count variables. I can't seem to find an estimator that can simultaneously handle all three types of data. I am running version 3.1 (I know I need to upgrade!). Can Mplus handle this situation? If so, what estimator would be able to handle a complex model with all three types of data?
 Linda K. Muthen posted on Thursday, May 14, 2009 - 6:10 pm
You can do this in the current version of Mplus using maximum likelihood estimation. Note that numerical integration is required and each factor represents one dimension of integration.
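To give a sense of why the number of integration dimensions matters, the sketch below counts the grid points needed when every dimension uses the same number of points. The figure of 15 points per dimension is an assumption about a typical standard-integration setting; check the INTEGRATION option in your version.

```python
def total_integration_points(dimensions: int, points_per_dimension: int = 15) -> int:
    """Grid size for rectangular integration: points ** dimensions."""
    return points_per_dimension ** dimensions


# Each additional factor multiplies the work by the number of points per dimension.
for dims in range(1, 5):
    print(f"{dims} dimension(s): {total_integration_points(dims)} grid points")
```

This exponential growth is why models with several factors under numerical integration can become slow.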
 Cecily Na posted on Wednesday, December 01, 2010 - 4:52 pm
Dear Linda,
Can a factor have indicators measured on different scales? For example, the factor is drug use and the three indicators are 1) frequency of drug use, 2) age at first drug use, and 3) whether or not the person is an IV user (dichotomous).
I guess I should use WLSMV estimation in this case? How would I interpret the paths between the factor and the different indicators, especially between the factor and the dichotomous indicator?
Thanks a lot!
 Linda K. Muthen posted on Wednesday, December 01, 2010 - 5:22 pm
Factor indicators can be measured on different scales. The scale determines the type of regression coefficient. For continuous indicators, it is linear. For categorical indicators, it is probit with WLSMV and logistic with ML unless the probit link is used.
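For the logistic case (ML with the logit link), a slope can be exponentiated to give an odds ratio. The sketch below checks this numerically with hypothetical threshold and slope values; it illustrates the logit link only and does not carry over to probit coefficients from WLSMV.

```python
import math


def logistic_probability(threshold: float, slope: float, x: float) -> float:
    """P(u = 1 | x) under a logit link."""
    return 1.0 / (1.0 + math.exp(threshold - slope * x))


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


threshold, slope = 0.5, 0.7  # hypothetical values

# Ratio of the odds at x = 1 to the odds at x = 0
odds_ratio = odds(logistic_probability(threshold, slope, 1.0)) / odds(
    logistic_probability(threshold, slope, 0.0)
)
print(odds_ratio, math.exp(slope))  # the two numbers agree
```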
 mari posted on Monday, February 14, 2011 - 1:37 pm
Hello,

I have two follow-up questions to the posting from [Jungeun Lee posted on Friday, August 03, 2007 - 1:16 pm]. I have a similar model in which all observed variables are binary. I used WLSMV for my ESEM model.

1) In the printed output under "model results", do the estimates of "factor by indicator" represent factor loadings or probit regression coefficients?

2) You confirmed that the estimates of "factor on covariate" are linear regression coefficients, and the estimate of "distal outcome on factor" is a probit regression coefficient (if the distal outcome is binary).
My ESEM model also has a path from a covariate (binary) to the distal outcome (binary). In the output, is the estimate of "distal outcome on covariate" also a probit regression coefficient (rather than a logistic regression coefficient)?

I am sorry for such a beginner's question. Many thanks in advance.
 Linda K. Muthen posted on Monday, February 14, 2011 - 2:07 pm
1. Yes.
2. Yes.
 Myong Hwa Lee posted on Wednesday, February 16, 2011 - 6:20 am
Sorry, did you mean 'yes' for factor loading in the first question?
 Linda K. Muthen posted on Wednesday, February 16, 2011 - 6:31 am
 mari posted on Monday, March 28, 2011 - 12:15 pm
Hello Linda,

Following up the questions above (posted on 2/14/2011), I have three questions for my ESEM model with WLSMV. I have two EFA factors for 20 items, three covariates, and one distal outcome. All observed variables are dichotomous.

Q1: I am wondering if a path coefficient (e.g., 1.2) from a covariate to an EFA factor can be interpreted as a one-unit change in x increasing y by 1.2?

Q2: one covariate is gender in my model. I am wondering if I should test measurement invariance before interpreting the path coefficient from gender to a factor.

Q3: If Q2 is yes, how about the path from a factor to distal outcome. My distal outcome is drug use (yes/no). In this case, do I need to test measurement invariance?

 Bengt O. Muthen posted on Monday, March 28, 2011 - 4:47 pm
Q1. ESEM factors (typically) have the metric set so that their variances are 1. So 1.2 means that when x changes 1 unit, the factor changes 1.2 SDs.

Q2. That's always a good idea to make sure you are talking about the same factor for the two genders.

Q3. Yes for the same reason as Q2.
 mari posted on Wednesday, April 13, 2011 - 7:42 am
Dear Bengt

As a follow-up question to your answer about Q3 above, I am wondering how to test measurement invariance for EFA factors in this ESEM model. I am learning multiple group (MG) analysis to test measurement invariance, but it seems that MG analysis cannot handle efa factors. Since all 20 items load on both factors, I believe that MG is not an option for me.

Then, how can I test measurement invariance for the path from efa factors to distal outcome? Am I missing something? I would appreciate any guidance. Thank you.
 Linda K. Muthen posted on Wednesday, April 13, 2011 - 9:37 am
This is possible and illustrated in Example 5.27.
 mari posted on Thursday, April 14, 2011 - 11:34 am
Thank you, Linda! After I tried ex.5.27, I got two more questions.

Q1. When I used "type=imputation" for 20 imputed data sets, the output did not print model fit information. When I used the same syntax to one of the 20 data sets without type=imputation, it printed model fit info. I wonder if model fit info cannot be computed when using multiply imputed data sets.

Q2. Even when using one data set, any MG models with commands "model g2" did not run. The following is input excerpts and errors:

-------------------------------------
GROUPING is mrjfq3dyb (0 = g1 1 = g2);

Model: people disorder by stepsab2 - balcoab2 (*1);
[people disorder @ 0];

Model g2: [stepsab2 - balcoab2];

*** ERROR
The following MODEL statements are ignored: * Statements in Group G2:

*** ERROR in MODEL command
EFA factors in the same set as PEOPLE must have all fixed or free means. Problem with: [ PEOPLE ]

--------------------------------------
When I did not have group-specific commands, the models ran well. I am wondering how I can resolve this problem.
 Linda K. Muthen posted on Thursday, April 14, 2011 - 2:30 pm
Model fit is summarized over the imputations with TYPE=IMPUTATION. Fit statistics for multiple imputation have not yet been developed.

You should say:

[people-disorder @ 0];

or

[people@0 disorder @ 0];
 Mohamed Abou-Shouk posted on Monday, April 18, 2011 - 8:10 am
Hi,
I am trying to run an SEM model with one binary dependent variable (u1) and two independent continuous latent variables (f5 and f9).
f5 contains f1-f4, f9 contains f6-f8.
I have written the syntax as follows:

variable: names are x1-x39 u1;
categorical is u1;

model:
f1 by x1-x4;
f2 by x5-x9;
f3 by x10-x15;
f4 by x16-x21;
f5 by f1-f4;

f6 by x22-x25;
f7 by x26-x30;
f7 by x31-x36;
f8 by x37-x39;
f9 by f6-f8;

u1 on f5 f9;

Is this right, or do I have to add anything else?
note: the CFA for f5 and f9 has a good fit.
Many thanks,
 Linda K. Muthen posted on Monday, April 18, 2011 - 8:25 am
This looks correct. The best way to know if you have specified the model correctly and that the defaults are what you want is to run it and look at the results and TECH1.
 Mohamed Abou-Shouk posted on Monday, April 18, 2011 - 8:54 am
Thank you.
Does it make sense if i run the model as two models where:
u1 on f5; can be separate model
then i run another model as
u1 on f9;

before i run the whole model as
u1 on f5 f9;

also do i have to run a model with the interaction of f5 f9 and their impacts on u1.

Many thanks,
 Linda K. Muthen posted on Monday, April 18, 2011 - 9:09 am
Testing of submodels can help understand problems in the full model. It would be your decision whether to include the interaction.
 Chantal Hermann posted on Thursday, June 20, 2013 - 9:59 am
Hello,

I would greatly appreciate help with the following:

I would like to run an SEM model (see below). All of the observed variables are categorical (Likert scales 0, 1, 2), with the exception of TOTSOVIC (continuous) and SSPIAPP (0 to 6).

1) Is it appropriate to use the MLR estimator as listed in the syntax below?

2) How would I go about testing nested models for this model? Would it be appropriate to remove pathways (please see below)?

Original Model:
VARIABLE:
NAMES = ECWC ECWCD IDLIP IDGSRL S9910 SEXPRE SEXCOPE DSI2000 DSI2007 TOTSOVIC TOTCHVIC SSPIAPP;
USEVARIABLES = ECWC SSPIAPP SEXPRE TOTCHVIC DSI2007 SEXCOPE;
ANALYSIS: ESTIMATOR = MLR;
TYPE = RANDOM;
ALGORITHM = INTEGRATION;
MODEL:
SSR BY SEXPRE SEXCOPE; !SEXUAL SELF REGULATION;
PEDO BY SSPIAPP DSI2007 TOTCHVIC; !PEDOPHILIA;
ECWC@0.103;
ECWCL BY ECWC@1.00;
ECWCL WITH SSR;
ECWCXSSR | ECWCL XWITH SSR;
PEDO ON ECWCL SSR ECWCXSSR;
OUTPUT: TECH1 TECH8;

Nested model:
MODEL:
SSR BY SEXPRE SEXCOPE; !SEXUAL SELF REGULATION;
PEDO BY SSPIAPP DSI2007 TOTCHVIC; !PEDOPHILIA;
ECWC@0.103;
ECWCL BY ECWC@1.00;
ECWCL WITH SSR;
PEDO ON ECWCL SSR;

Thank you very much for your help!
 Linda K. Muthen posted on Thursday, June 20, 2013 - 12:11 pm
You must use maximum likelihood with TYPE=RANDOM. You should put the categorical variables on the CATEGORICAL list if you want them treated as categorical.
 Chantal Hermann posted on Friday, June 21, 2013 - 9:06 am
Hi Linda,

1) Can I test an interaction between an observed variable (ECWC) and a continuous latent variable? Or do I have to create a continuous latent variable based on a single indicator?

2) If a single indicator is an exogenous variable do/should I correct for measurement error?

For example, should I do this:
ECWC@0.103;
ECWCL BY ECWC@1.00;
ECWCL WITH SSR;
ECWCXSSR | ECWCL XWITH SSR;
PEDO ON ECWCL SSR ECWCXSSR;

or this:

ECWC WITH SSR;
ECWCXSSR | ECWC XWITH SSR;
PEDO ON ECWC SSR ECWCXSSR;

 Bengt O. Muthen posted on Friday, June 21, 2013 - 6:33 pm
1) Yes. No.

2) I would only correct for measurement error in a single-indicator model if you have very good information about the reliability and it is not high. You don't need to do that just to have the interaction.
 Chantal Hermann posted on Saturday, June 22, 2013 - 5:25 am
Thank you very much for your help.

I have one more question:

1) If my exogenous variable (see ECWC above) is categorical (Likert scale 1, 2, 3), do I need to list it as categorical in my syntax? The Mplus user manual says to list only dependent variables as categorical.

Thank you!
 Linda K. Muthen posted on Saturday, June 22, 2013 - 5:57 am
Only dependent variables go on the CATEGORICAL list. In regression, covariates are either binary or continuous and in both cases are treated as continuous. You can treat your variable as continuous or create a set of dummy variables.
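The dummy-variable option can be sketched as follows for a three-category covariate. The data values, the choice of category 1 as the reference, and the names `d2` and `d3` are all hypothetical; in practice the new columns would be saved to the data file and used as covariates in place of the original variable.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator per non-reference category."""
    levels = sorted(set(values) - {reference})
    return {
        f"d{level}": [1 if value == level else 0 for value in values]
        for level in levels
    }


ecwc = [1, 2, 3, 2, 1, 3]  # hypothetical scores on a 1-3 Likert item
dummies = dummy_code(ecwc, reference=1)

for name, column in dummies.items():
    print(name, column)
```

With k categories this yields k - 1 dummies, each comparing one category with the reference category.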
 Chantal Hermann posted on Saturday, June 22, 2013 - 9:21 am
Thank you for your help. When I run the model (see syntax below) with the ML/MLR estimator I get the warning below. This warning goes away when I use the MLF estimator - can the results with the MLF estimator be trusted?

VARIABLE:
NAMES ARE ECWC ECWCD LOVER REJECT LIVED SEXPRE SEXCOPE DSI2000
DSI2007 TOTVIC TOTCHVIC SSPIC SSPIA;
USEVARIABLES ARE ECWC SEXPRE SEXCOPE DSI2007 TOTCHVIC SSPIC;
MISSING ARE ALL (-9.00);
CATEGORICAL ARE DSI2007 SEXPRE SEXCOPE; ! ONLY LIST DEPENDENT INDICATORS AS CATEGORICAL (
ANALYSIS:
ESTIMATOR IS MLR;
TYPE = RANDOM;
ITERATIONS = 1000;
CONVERGENCE = 0.00005;
H1ITERATIONS = 500;
H1CONVERGENCE = 0.0001;
COVERAGE = 0.10;
MODEL:
SSR BY SEXPRE SEXCOPE; ! SEXUAL SELF REGULATION;
PEDO BY SSPIC DSI2007 TOTCHVIC; !PEDOPHILIA;
ECWCXSSR | ECWC XWITH SSR; ! COMPUTING INTERACTION TERM;
SSR WITH ECWC;
PEDO ON ECWC SSR ECWCXSSR;

WARNING: THE MODEL ESTIMATION HAS REACHED A SADDLE POINT OR A POINT WHERE THE OBSERVED AND THE EXPECTED INFORMATION MATRICES DO NOT MATCH. AN ADJUSTMENT TO THE ESTIMATION OF THE INFORMATION MATRIX HAS BEEN MADE. THE CONDITION NUMBER IS -0.245D-03. THE PROBLEM MAY ALSO BE RESOLVED BY DECREASING THE VALUE OF THE MCONVERGENCE OR LOGCRITERION OPTIONS OR BY CHANGING THE STARTING VALUES OR BY INCREASING THE NUMBER OF INTEGRATION POINTS OR BY USING THE MLF ESTIMATOR.
 Bengt O. Muthen posted on Saturday, June 22, 2013 - 9:51 am
In our experience the (adjusted) MLR solution that you show above is better than the MLF solution. So you can ignore this warning here.
 Chantal Hermann posted on Saturday, June 22, 2013 - 11:26 am
Sorry to bother you again. When I run my model using the MLR estimator (see syntax in above post) the fit indices are missing (RMSEA, CFI/TLI etc.). Is it possible to get these somehow?

Thank you very much for your patience and help!
 Linda K. Muthen posted on Saturday, June 22, 2013 - 12:58 pm
Chi-square and related fit statistics are not available if means, variances, and covariances are not sufficient statistics for model estimation. Difference testing of nested models can be done using -2 times the loglikelihood difference which is distributed as chi-square.
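The loglikelihood difference test can be sketched as below. All numbers are hypothetical. The scaling correction arguments reflect the commonly described adjustment for MLR, in which the difference is divided by a correction computed from each model's scaling factor and number of free parameters; treat that part as an assumption and leave the factors at 1.0 for plain ML. The p-value helper covers only one degree of freedom, which matches dropping a single interaction path.

```python
import math


def loglik_difference_test(ll_nested, ll_full, params_nested, params_full,
                           scale_nested=1.0, scale_full=1.0):
    """Return (test statistic, degrees of freedom) for two nested models."""
    df = params_full - params_nested
    # Difference-test scaling correction; equals 1 when both factors are 1.
    cd = (params_nested * scale_nested - params_full * scale_full) / (
        params_nested - params_full
    )
    statistic = -2.0 * (ll_nested - ll_full) / cd
    return statistic, df


def chi2_pvalue_df1(statistic):
    """Upper-tail chi-square probability for 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical output values from a model without and with the interaction
stat, df = loglik_difference_test(
    ll_nested=-2345.6, ll_full=-2342.1,
    params_nested=20, params_full=21,
    scale_nested=1.10, scale_full=1.12,
)
print(f"statistic = {stat:.3f}, df = {df}, p = {chi2_pvalue_df1(stat):.4f}")
```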
 adwin posted on Monday, February 02, 2015 - 10:34 pm
Dear Sir/Madam,

I am a new Mplus user. I just learned how to use the software to estimate a simple model that contains a continuous dependent variable (profitpr); a latent variable with 4 observed indicators (b3_1-b3_4, all on a 1-7 Likert scale, from 1 = disagree to 7 = agree); and a continuous independent variable (lnloanac).
I typed the command as:

.......
Variable:
Names are profitpr b3_1 b3_2 b3_3 b3_4 lnloanac;
categorical are b3_1 b3_2 b3_3 b3_4;

Analysis:
MODEL : innov by b3_1 b3_2 b3_3 b3_4;
profitpr on innov lnloanac;

After running the program, I got these results:

......
*** ERROR
Categorical variable B3_1 contains 28 categories.
This exceeds the maximum allowed of 10.
......

My questions are:
Why does Mplus consider 28 categories instead of 7 categories for variable b3_1?
How can I deal with such problem?

Thank you very much.

Kind Regards
 Linda K. Muthen posted on Tuesday, February 03, 2015 - 5:53 am
You are most likely misreading the data set. Common reasons for this are:

1. Blanks in a free format data set.
2. More variable names in the NAMES statement than columns in the data set.
3. Variable names do not correspond to the columns of the data set.
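A quick way to look for the first two problems is to count the fields on every row before running the model. The sketch below works on a list of text lines and assumes a free-format file delimited by spaces, tabs, or commas, with runs of delimiters treated as one separator; the sample rows and the function name are hypothetical.

```python
import re


def rows_with_wrong_width(lines, n_names):
    """Return (row number, field count) for rows that do not match n_names."""
    bad_rows = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # skip empty lines
        fields = [f for f in re.split(r"[,\s]+", stripped) if f]
        if len(fields) != n_names:
            bad_rows.append((number, len(fields)))
    return bad_rows


# Hypothetical rows for six variables; row 2 has a blank where a value is missing
sample = [
    "3.2 1 4 7 2 0.8",
    "2.9 5   6 1 1.1",
    "4.1 2 3 3 5 0.4",
]
print(rows_with_wrong_width(sample, n_names=6))
```

A row with too few fields shifts every later value into the wrong variable, which is how a 1-7 item can appear to have many more categories.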

 adwin posted on Tuesday, February 03, 2015 - 4:53 pm
Yes, you're right. The problem is now solved.
Thank you very much for your help.

Kind Regards,

 anonymous Z posted on Sunday, February 15, 2015 - 7:44 pm
Dr. Muthens,
I have two questions about SEM:
1. According to my reading, it seems that continuous indicators that are scaled very differently can load on the same latent variable. For instance, indicators may have different ranges, one ranging from 0 to 5 and another from 5 to 20. Is this correct?
2. I wondered if a combination of continuous and categorical indicators can load on the same latent variable. For instance, if indicators 1 and 2 are categorical while indicator 3 is continuous, can they load on the same latent variable?

Thank you very much,
 Linda K. Muthen posted on Monday, February 16, 2015 - 5:14 am
1. Yes.
2. Yes.
 Mary M Mitchell posted on Wednesday, January 24, 2018 - 6:21 pm
Dear Drs. Muthen,

I am running a Monte Carlo power analysis for a logistic regression and want to set the threshold for u1 to give a 25%/75% split. I know the thresholds go from -15 to +15 on a normal distribution, but I'm wondering what threshold corresponds to this split?

Thanks,

Mary Mitchell
 Bengt O. Muthen posted on Thursday, January 25, 2018 - 1:33 pm
The probability of u=1 is

P(u=1) = 1/(1+exp(threshold - b*x))

where b is the regression slope. Writing

logit = - threshold + b*x,

you get the logit from the probability as

log(P/(1-P))

and from that logit you get

threshold = b*x - logit.
 Mary M Mitchell posted on Thursday, January 25, 2018 - 4:23 pm
Thanks Bengt!
 Javed Ashraf posted on Friday, March 16, 2018 - 9:37 am
Can our observed exogenous and endogenous variables be categorical (dichotomous, ordinal, binary) when conducting CFA and SEM alongside moderation and mediation analyses?
 Bengt O. Muthen posted on Friday, March 16, 2018 - 2:02 pm
 Sehrish Shahid posted on Sunday, April 22, 2018 - 7:24 pm
Hi, I got this error message when running a 1-1-2 model. I have 358 observations on the IV, Mediator 1, and Mediator 2. For the DV, I have 61 observations.
CLUSTER IS WU;
Analysis: TYPE IS TWOLEVEL RANDOM;
Model:
%within%
Psyw by Psy1-Psy24;
OVw by Ov1-Ov15;
Tw by T1-T10;
TFw by TF1-TF20;
Tw on Psyw;
Tw on OVw;
Psyw on TFw;
OVw on TFw;
%between%
Psyb by Psy1-Psy24;
OVb by Ov1-Ov15;
Tb by T1-T10;
TFb by TF1-TF20;
TP by IR1-IR7 OCBI1-OCBI7 OCBO1-OCBO7;
Tb on Psyb(b1);
Tb on OVb(b2);
Psyb on TFb(a1);
OVb on TFb(a2);
Tb on TFb;
TP on Tb(d1);
TP on Psyb(d2);
TP on OVb(d3);
Tb on TFb(c1);
Psyb on TF(c2);
OVb on TFb(c3);
TP on TFb;
MODEL CONSTRAINT:
NEW(a1b1 a2b2 c1d1 c2d2 c3d3);
a1b1=a1*b1;
a2b2=a2*b2;
c1d1=c1*d1;
c2d2=c2*d2;
c3d3=c3*d3;
OUTPUT: TECH1 TECH8 CINTERVAL;
*** ERROR
Unexpected end of file reached in data file.
 Bengt O. Muthen posted on Monday, April 23, 2018 - 4:41 pm
You might have more variables in your NAMES = list than columns in your data. If this doesn't help, send data and input to Support along with your license number.
 Jeremy Saenz posted on Sunday, June 17, 2018 - 5:57 am
I am hoping someone can help with this. I used gender (categorical) as a predictor of depression (negative scores are better).

How would I interpret the estimates? For example, negative estimates for a gender to depression path.
 Bengt O. Muthen posted on Monday, June 18, 2018 - 9:38 am
If for example gender = 0 for males and 1 for females, a negative effect means that females have lower depression.
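A small numerical check of this reading: in a regression with a single 0/1 predictor, the slope equals the difference between the two group means. The scores below are hypothetical, and with other predictors in the model the coefficient would be an adjusted difference instead.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least squares slope for a single predictor."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


gender = [0, 0, 0, 1, 1, 1]                    # 0 = male, 1 = female
depression = [12.0, 15.0, 9.0, 8.0, 10.0, 6.0]  # hypothetical scores

slope = ols_slope(gender, depression)
male_mean = mean(d for g, d in zip(gender, depression) if g == 0)
female_mean = mean(d for g, d in zip(gender, depression) if g == 1)

# A negative slope means the group coded 1 has the lower mean.
print(slope, female_mean - male_mean)
```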