

Dear Linda, I am quite new to SEM and perhaps my inquiry is trivial to you. I am trying to analyze a model (N = 519) with 8 observed variables (path analysis), of which 5 are continuous, one is Likert 1-10 and 2 are categorical. The Likert variable and the two categoricals are DVs. Further, the continuous variables depart from multivariate normality. I am quite confused about which method of estimation to use. Any suggestions? Is there any literature on the subject? Vagelis


You have two estimator choices in Mplus: weighted least squares (WLSMV) or maximum likelihood (MLR). You may find the following articles helpful:

Muthén, B. & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.

Muthén, B. & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.

See also Web Note 4 and other references that are available on the website.


Hi Linda and Bengt, I have two latent variables in an SEM, each of which has categorical indicator variables with the same four-point scale. Is it possible to interpret significant results of predictor variables on the criterion factors in terms of odds ratios, since the latent variable is based on observed variables with the same four-point scale? For instance, can I say that for each unit change in the predictor, the odds of risk beliefs increase by such and such (exponentiating the beta log odds)?


The regression of a continuous factor on a covariate is a simple linear regression and is interpreted as such. It is the regression of the categorical factor indicator on the factor that is a probit or logistic regression. 


Ok, I see. 


Linda, I also ran a test for an indirect effect, which was significant. I will report the effect plus confidence interval. However, is the parameter estimate a log odds beta if the outcome variable is categorical? I just want to be sure how to interpret the results for my paper. 


Which estimator are you using? Weighted least squares or maximum likelihood? 


WLSMV 


Then the regression coefficient is a probit regression coefficient. 


Thanks again. 

Jungeun Lee posted on Friday, August 03, 2007 - 1:16 pm



Hi, I have an SEM model in which all observed variables are categorical. I used the 'Categorical' option for this model and estimated it with the weighted least squares estimator. I am not quite sure how to interpret the coefficients. Following is a short version of my Mplus input for the model. In parentheses, I added my current thoughts about how these coefficients should be interpreted. Am I on track???

MODEL:
(Probit) f1 by zpotgodr zpotplrr zpotharr;
(linear regression) f2 on f1;
(probit) ncmopotr on f2;
(linear regression) f4 on ncmopotr;


It seems correct if ncmopotr is categorical. 

TD posted on Tuesday, April 15, 2008 - 11:49 am



Hi - I am running a path model with a combination of categorical and continuous indicators. I am running the same model on five different groups (one with the groups pooled, the other on four different groups separately). I have two main questions:

1) Although I use the same variables with each group, I get starkly different sample statistics (means/thresholds/intercepts) for each group. For example, in one group I get a set of means that seems to make sense (e.g. 42 for a continuous variable with a range of 20-60, .42 for a binary ordered categorical variable ranging 0-1). But in another group, the same variables will have means of 4, and 2.6 for example. Why is this the case? Could this be causing convergence problems?

2) It seems like models with combinations of categorical and continuous variables have poorer model fit as measured by RMSEA compared with models having only continuous variables. Is this the case?

I would appreciate any help you could offer.


1. Either the groups have very different sample statistics or the data are being read incorrectly due to an error in the input. 2. I don't think there is any basis for this conclusion. 

jtw posted on Thursday, May 14, 2009 - 4:42 pm



Hello: I have a relatively complex SEM model in which endogenous variables are a mixture of continuous, categorical, AND count variables. I can't seem to find an estimator that can simultaneously handle all three types of data. I am running version 3.1 (I know I need to upgrade!). Can MPLUS handle this situation? If so, what estimator would be able to handle a complex model with all three types of data? Thank you for your time. 


You can do this in the current version of Mplus using maximum likelihood estimation. Note that numerical integration is required and each factor represents one dimension of integration. 

Cecily Na posted on Wednesday, December 01, 2010 - 4:52 pm



Dear Linda, Can a factor contain indicators on different scales? For example, a factor is drug use, and the three indicators are 1) frequency of drug use, 2) age at first drug use, 3) whether or not an IV user (dichotomous). I guess I should use WLSMV estimation in this case? How would I interpret the paths between the factor and the different indicators? Especially between the factor and the dichotomous indicator? Thanks a lot!


Factor indicators can be measured on different scales. The scale determines the type of regression coefficient. For continuous indicators, it is linear. For categorical indicators, it is probit with WLSMV and logistic with ML unless the probit link is used.
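To make the difference between the two links concrete, here is a minimal Python sketch (not Mplus code) of how a threshold and a loading translate into the probability of a binary indicator at a given factor value. The function names and the numbers are hypothetical, and the probit version assumes a unit residual variance for the underlying latent response, which is a simplification of what Mplus does under its parameterizations.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_probit(threshold, loading, factor_value):
    """P(u = 1 | factor) under a probit link (assumes unit residual variance)."""
    return normal_cdf(-threshold + loading * factor_value)


def prob_logit(threshold, loading, factor_value):
    """P(u = 1 | factor) under a logistic link."""
    return 1.0 / (1.0 + math.exp(threshold - loading * factor_value))


# Hypothetical values: threshold 0.5, loading 1.2, factor one unit above its mean.
p_probit = prob_probit(0.5, 1.2, 1.0)
p_logit = prob_logit(0.5, 1.2, 1.0)
```

The same numeric coefficient implies different probabilities under the two links, which is why the estimator (and link) has to be known before a loading on a categorical indicator is interpreted.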

mari posted on Monday, February 14, 2011 - 1:37 pm



Hello, I have two follow-up questions to the posting from [Jungeun Lee posted on Friday, August 03, 2007 - 1:16 pm]. I have a similar model in which all observed variables are binary. I used WLSMV for my ESEM model.

1) In the printed output under "model results", do the estimates of "factor by indicator" mean factor loadings or probit regression coefficients?

2) You confirmed that the estimates of "factor on covariate" are linear regression coefficients, and the estimate of "distal outcome on factor" is a probit regression coefficient (if the distal outcome is binary). My ESEM model also has a path from a covariate (binary) to the distal outcome (binary). In the output, does the estimate of "distal outcome on covariate" also mean a probit regression coefficient (instead of a logistic regression coefficient)?

I am sorry for such a beginner's question. Many thanks in advance.


1. Yes. 2. Yes. 


Sorry, did you mean 'yes' for factor loading in the first question? 


Yes, the factor loading is a probit regression coefficient. 

mari posted on Monday, March 28, 2011 - 12:15 pm



Hello Linda, Following up on the questions above (posted on 2/14/2011), I have three questions for my ESEM model with WLSMV. I have two EFA factors for 20 items, three covariates, and one distal outcome. All observed variables are dichotomous.

Q1: I am wondering if a path coefficient (e.g., 1.2) from a covariate to an EFA factor can be interpreted as "a one-unit change in x increases y by 1.2"?

Q2: One covariate in my model is gender. I am wondering if I should test measurement invariance before interpreting the path coefficient from gender to a factor.

Q3: If the answer to Q2 is yes, how about the path from a factor to the distal outcome? My distal outcome is drug use (yes/no). In this case, do I need to test measurement invariance?

Thank you in advance.


Q1. ESEM factors (typically) have the metric set so that their variances are 1. So 1.2 means that when x changes 1 unit, the factor changes 1.2 SDs. Q2. That's always a good idea to make sure you are talking about the same factor for the two genders. Q3. Yes for the same reason as Q2. 

mari posted on Wednesday, April 13, 2011 - 7:42 am



Dear Bengt, As a follow-up question to your answer about Q3 above, I am wondering how to test measurement invariance for EFA factors in this ESEM model. I am learning multiple group (MG) analysis to test measurement invariance, but it seems that MG analysis cannot handle EFA factors. Since all 20 items load on both factors, I believe that MG is not an option for me. Then, how can I test measurement invariance for the path from the EFA factors to the distal outcome? Am I missing something? I would appreciate any guidance. Thank you.


This is possible and illustrated in Example 5.27. 

mari posted on Thursday, April 14, 2011 - 11:34 am



Thank you, Linda! After I tried ex. 5.27, I have two more questions.

Q1. When I used "type=imputation" for 20 imputed data sets, the output did not print model fit information. When I applied the same syntax to one of the 20 data sets without type=imputation, it printed model fit info. I wonder if model fit info cannot be computed when using multiply imputed data sets.

Q2. Even when using one data set, any MG models with the command "model g2" did not run. The following are input excerpts and errors:

GROUPING is mrjfq3dyb (0 = g1 1 = g2);
Model:
people disorder by stepsab2-balcoab2 (*1);
[people disorder @ 0];
Model g2:
[stepsab2-balcoab2];

*** ERROR The following MODEL statements are ignored: * Statements in Group G2:
*** ERROR in MODEL command EFA factors in the same set as PEOPLE must have all fixed or free means. Problem with: [ PEOPLE ]

When I did not have group-specific commands, the models ran well. I am wondering how I can resolve this problem.


Model fit is summarized over the imputations with TYPE=IMPUTATION. Fit statistics have not yet been developed. You should say: [people-disorder @ 0]; or [people@0 disorder @ 0];


Hi, I am trying to run an SEM model with one binary dependent variable (u1) and two independent continuous latent variables (f5 and f9). f5 contains f1-f4, f9 contains f6-f8. I have written the syntax as follows:

variable: names are x1-x39 u1;
categorical is u1;
model:
f1 by x1-x4;
f2 by x5-x9;
f3 by x10-x15;
f4 by x16-x21;
f5 by f1-f4;
f6 by x22-x25;
f7 by x26-x30;
f7 by x31-36;
f8 by x37-x39;
f9 by f6-f8;
u1 on f5 f9;

Is this right, or do I have to add anything else? Note: the CFA for f5 and f9 has a good fit. Many thanks,


This looks correct. The best way to know if you have specified the model correctly and that the defaults are what you want is to run it and look at the results and TECH1. 


Thank you. Does it make sense if I run the model as two models, where "u1 on f5;" is one separate model and "u1 on f9;" is another, before I run the whole model as "u1 on f5 f9;"? Also, do I have to run a model with the interaction of f5 and f9 and its impact on u1? Many thanks,


Testing of submodels can help understand problems in the full model. It would be your decision whether to include the interaction. 


Hello, I would greatly appreciate help with the following: I would like to run an SEM model (see below). All of the observed variables are categorical (Likert scales 0, 1, 2), with the exception of TOTSOVIC (continuous) and SSPIAPP (0 to 6). 1) Is it appropriate to use the MLR estimator as listed in the syntax below? 2) How would I go about testing nested models for this model? Would it be appropriate to remove pathways (please see below)?

Original model:

VARIABLE:
NAMES = ECWC ECWCD IDLIP IDGSRL S9910 SEXPRE SEXCOPE DSI2000 DSI2007 TOTSOVIC TOTCHVIC SSPIAPP;
USEVARIABLES = ECWC SSPIAPP SEXPRE TOTCHVIC DSI2007 SEXCOPE;
ANALYSIS:
ESTIMATOR = MLR;
TYPE = RANDOM;
ALGORITHM = INTEGRATION;
MODEL:
SSR BY SEXPRE SEXCOPE; !SEXUAL SELF REGULATION;
PEDO BY SSPIAPP DSI2007 TOTCHVIC; !PEDOPHILIA;
ECWC@0.103;
ECWCL BY ECWC@1.00;
ECWCL WITH SSR;
ECWCXSSR | ECWCL XWITH SSR;
PEDO ON ECWCL SSR ECWCXSSR;
OUTPUT: TECH1 TECH8;

Nested model:

MODEL:
SSR BY SEXPRE SEXCOPE; !SEXUAL SELF REGULATION;
PEDO BY SSPIAPP DSI2007 TOTCHVIC; !PEDOPHILIA;
ECWC@0.103;
ECWCL BY ECWC@1.00;
ECWCL WITH SSR;
PEDO ON ECWCL SSR;

Thank you very much for your help!


You must use maximum likelihood with TYPE=RANDOM. You should put the categorical variables on the CATEGORICAL list if you want them treated as categorical.


Hi Linda, Thank you for your reply. To follow up:

1) Can I test an interaction between an observed variable (ECWC) and a continuous latent variable? Or do I have to create a continuous latent variable based on a single indicator?

2) If a single indicator is an exogenous variable, do/should I correct for measurement error? For example, should I do this:

ECWC@0.103;
ECWCL BY ECWC@1.00;
ECWCL WITH SSR;
ECWCXSSR | ECWCL XWITH SSR;
PEDO ON ECWCL SSR ECWCXSSR;

or this:

ECWC WITH SSR
ECWCXSSR | ECWC XWITH SSR;
PEDO ON ECWC SSR ECWCXSSR;

Thank you for your help!


1) Yes, you can; no, you do not have to create a single-indicator latent variable. 2) I would only correct for measurement error in a single-indicator model if you have very good information about the reliability and it is not high. You don't need to do that just to have the interaction.


Thank you very much for your help. I have one more question: If my exogenous variable (see ECWC above) is categorical (Likert scale 1, 2, 3), do I need to list it as categorical in my syntax? The Mplus user manual states that only dependent variables should be listed as categorical. Thank you!


Only dependent variables go on the CATEGORICAL list. In regression, covariates are either binary or continuous and in both cases are treated as continuous. You can treat your variable as continuous or create a set of dummy variables. 
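As an illustration of the dummy-variable option, the following Python sketch (a data-preparation step outside Mplus) turns a three-category covariate into a set of 0/1 indicators, leaving one category out as the reference. The function name, the example values, and the choice of reference category are hypothetical.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator list per non-reference category.

    values: observed categories of the covariate, one entry per case.
    reference: the category that is left out and serves as the baseline.
    """
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical Likert covariate coded 1, 2, 3, with category 1 as the reference.
ecwc = [1, 2, 3, 2, 1]
dummies = dummy_code(ecwc, reference=1)
# dummies[2] and dummies[3] would be saved as two new columns in the data file
# and used as covariates in place of the original variable.
```

With k categories this yields k - 1 dummy variables; each coefficient then compares its category with the reference category.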


Thank you for your help. When I run the model (see syntax below) with the ML/MLR estimator I get the warning below. This warning goes away when I use the MLF estimator - can the results with the MLF estimator be trusted?

VARIABLE:
NAMES ARE ECWC ECWCD LOVER REJECT LIVED SEXPRE SEXCOPE DSI2000 DSI2007 TOTVIC TOTCHVIC SSPIC SSPIA;
USEVARIABLES ARE ECWC SEXPRE SEXCOPE DSI2007 TOTCHVIC SSPIC;
MISSING ARE ALL (9.00);
CATEGORICAL ARE DSI2007 SEXPRE SEXCOPE! ONLY LIST DEPENDENT INDICATORS AS CATEGORICAL (
ANALYSIS:
ESTIMATOR IS MLR;
TYPE = RANDOM;
ITERATIONS = 1000;
CONVERGENCE = 0.00005;
H1ITERATIONS = 500;
H1CONVERGENCE = 0.0001;
COVERAGE = 0.10;
MODEL:
SSR BY SEXPRE SEXCOPE; ! SEXUAL SELF REGULATION;
PEDO BY SSPIC DSI2007 TOTCHVIC; !PEDOPHILIA;
ECWCXSSR | ECWC XWITH SSR; ! COMPUTING INTERACTION TERM;
SSR WITH ECWC;
PEDO ON ECWC SSR ECWCXSSR;

WARNING: THE MODEL ESTIMATION HAS REACHED A SADDLE POINT OR A POINT WHERE THE OBSERVED AND THE EXPECTED INFORMATION MATRICES DO NOT MATCH. AN ADJUSTMENT TO THE ESTIMATION OF THE INFORMATION MATRIX HAS BEEN MADE. THE CONDITION NUMBER IS 0.245D-03. THE PROBLEM MAY ALSO BE RESOLVED BY DECREASING THE VALUE OF THE MCONVERGENCE OR LOGCRITERION OPTIONS OR BY CHANGING THE STARTING VALUES OR BY INCREASING THE NUMBER OF INTEGRATION POINTS OR BY USING THE MLF ESTIMATOR.


In our experience the (adjusted) MLR solution that you show above is better than the MLF solution. So you can ignore this warning here. 


Sorry to bother you again. When I run my model using the MLR estimator (see syntax in above post) the fit indices are missing (RMSEA, CFI/TLI etc.). Is it possible to get these somehow? Thank you very much for your patience and help! 


Chi-square and related fit statistics are not available if means, variances, and covariances are not sufficient statistics for model estimation. Difference testing of nested models can be done using -2 times the loglikelihood difference, which is distributed as chi-square.
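The arithmetic of such a difference test can be sketched in Python as follows. All loglikelihood values, parameter counts, and scaling correction factors below are made up for illustration; in practice they would be read off the two Mplus outputs. The p-value helper only covers one degree of freedom (for instance, dropping a single interaction term), and the scaled variant follows the scaling-correction formula commonly used with MLR, which is an addition here rather than something stated in the reply above.

```python
import math


def lr_statistic(ll_nested, ll_full):
    """-2 times the loglikelihood difference (nested minus full)."""
    return -2.0 * (ll_nested - ll_full)


def scaled_lr_statistic(ll_nested, ll_full, p_nested, p_full, c_nested, c_full):
    """Difference test adjusted by MLR scaling correction factors.

    p_*: number of free parameters, c_*: scaling correction factors.
    """
    cd = (p_nested * c_nested - p_full * c_full) / (p_nested - p_full)
    return lr_statistic(ll_nested, ll_full) / cd


def chi2_pvalue_df1(statistic):
    """Upper-tail chi-square probability for 1 degree of freedom only."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical values: the nested model drops one parameter from the full model.
stat = lr_statistic(ll_nested=-1003.0, ll_full=-1000.0)
p_value = chi2_pvalue_df1(stat)
scaled = scaled_lr_statistic(-1003.0, -1000.0, p_nested=10, p_full=11,
                             c_nested=1.2, c_full=1.3)
```

For differences of more than one parameter, the statistic is computed the same way but referred to a chi-square distribution with degrees of freedom equal to the difference in the number of free parameters.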

adwin posted on Monday, February 02, 2015 - 10:34 pm



Dear Sir/Madam, I am a new Mplus user. I just learned how to use the software to estimate a simple model which contains a continuous dependent variable (i.e. profitpr); a latent variable with 4 observed variables (i.e. b3_1-b3_4, all on a 1-7 Likert scale, 1/disagree to 7/agree); and a continuous independent variable (i.e. lnloanac). I typed the command as:

.......
Variable:
Names are profitpr b3_1 b3_2 b3_3 b3_4 lnloanac;
categorical are b3_1 b3_2 b3_3 b3_4;
Analysis:
MODEL :
innov by b3_1 b3_2 b3_3 b3_4;
profitpr on innov lnloanac;

After running the program, I got this result:

......
*** ERROR Categorical variable B3_1 contains 28 categories. This exceeds the maximum allowed of 10.
......

My questions are: Why does Mplus find 28 categories instead of 7 categories for variable b3_1? How can I deal with this problem? Thank you very much. Kind Regards, adwin


You are most likely misreading the data set. Common reasons for this are: 1. Blanks in a free format data set. 2. More variable names in the NAMES statement than columns in the data set. 3. Variable names do not correspond to the columns of the data set. If this does not help, please send the output, data, and your license number to support@statmodel.com. 
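A quick way to look for the first two causes before contacting support is to count the fields on each record of the data file and compare the result with the number of names in the NAMES statement. The Python sketch below is a rough check under the assumption of a free-format file delimited by spaces or commas; the function name and the example records are hypothetical.

```python
def field_counts(lines):
    """Tally how many records have each number of fields.

    Treats commas and whitespace as delimiters and skips empty lines.
    A blank used for a missing value simply disappears under this rule,
    so the affected record shows up with too few fields.
    """
    counts = {}
    for line in lines:
        if not line.strip():
            continue
        n_fields = len(line.replace(",", " ").split())
        counts[n_fields] = counts.get(n_fields, 0) + 1
    return counts


# Hypothetical records: six variables expected, one record has a blank.
records = [
    "0.12 3 4 5 6 2.1",
    "0.40 7 7 6 5 1.8",
    "0.33 2   4 4 2.5",
]
tally = field_counts(records)
```

If the tally shows more than one record length, or a length that differs from the number of variable names, the values are being shifted into the wrong variables, which can produce symptoms such as an implausible number of categories.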

adwin posted on Tuesday, February 03, 2015 - 4:53 pm



Yes, you're right. The problem is now solved. Thank you very much for your help. Kind Regards, adwin 

anonymous Z posted on Sunday, February 15, 2015 - 7:44 pm



Drs. Muthén, I have two questions about SEM:

1. According to my reading, it seems that continuous indicators that are scaled very differently can load on the same latent variable. For instance, indicators may have different ranges, one ranging from 0 to 5 while another ranges from 5 to 20, etc. Is this correct?

2. I wondered if a combination of continuous and categorical indicators can load on the same latent variable. For instance, if indicators 1 and 2 are categorical while indicator 3 is continuous, can they load on the same latent variable?

Thank you very much,


1. Yes. 2. Yes. 


Dear Drs. Muthén, I am running a Monte Carlo power analysis for a logistic regression and want to set the threshold for u1 to give a 25%/75% split. I know the thresholds go from -15 to +15 on a normal distribution, but I'm wondering what threshold corresponds to this split? Thanks, Mary Mitchell


The probability of u=1 is P(u=1) = 1/(1+exp(threshold - b*x)), where b is the regression slope. Writing logit = -threshold + b*x, you get the logit from the probability as log(P/(1-P)), and from that logit you get threshold = b*x - logit.
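As a worked example of this calculation, the Python sketch below computes the threshold for a chosen probability, using logit = log(P/(1-P)) and threshold = b*x - logit. The function names are hypothetical, and the default of x = 0 (for example, a covariate centered at zero) is an assumption made for illustration.

```python
import math


def threshold_for_probability(p, b=0.0, x=0.0):
    """Threshold that gives P(u = 1) = p at covariate value x with slope b."""
    logit = math.log(p / (1.0 - p))
    return b * x - logit


def probability_from_threshold(threshold, b=0.0, x=0.0):
    """P(u = 1) implied by a threshold, slope b, and covariate value x."""
    return 1.0 / (1.0 + math.exp(threshold - b * x))


# 25% of cases in category u = 1 when x = 0: threshold = log(3), about 1.0986.
tau_25 = threshold_for_probability(0.25)
# 75% of cases in category u = 1 when x = 0: threshold = -log(3), about -1.0986.
tau_75 = threshold_for_probability(0.75)
```

Which of the two values applies depends on whether the 25% refers to u = 1 or u = 0; plugging the threshold back into probability_from_threshold is an easy way to confirm the intended split.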


Thanks Bengt! 


Can our observed exogenous and endogenous variables be categorical (dichotomous, ordinal, binary) when conducting CFA and SEM alongside moderation and mediation analyses?


Answered elsewhere. 


Hi, I got this error message upon running a 1-1-2 model. I have 358 observations on the IV, Mediator 1, and Mediator 2. For the DV, I have 61 observations.

CLUSTER IS WU;
Analysis: TYPE IS TWOLEVEL RANDOM;
Model:
%within%
Psyw by Psy1-Psy24;
OVw by Ov1-Ov15;
Tw by T1-T10;
TFw by TF1-TF20;
Tw on Psyw;
Tw on OVw;
Psyw on TFw;
OVw on TFw;
%between%
Psyb by Psy1-Psy24;
OVb by Ov1-Ov15;
Tb by T1-T10;
TFb by TF1-TF20;
TP by IR1-IR7 OCBI1-OCBI7 OCBO1-OCBO7;
Tb on Psyb(b1);
Tb on OVb(b2);
Psyb on TFb(a1);
OVb on TFb(a2);
Tb on TFb;
TP on Tb(d1);
TP on Psyb(d2);
TP on OVb(d3);
Tb on TFb(c1);
Psyb on TF(c2);
OVb on TFb(c3);
TP on TFb;
MODEL CONSTRAINT:
NEW(a1b1 a2b2 c1d1 c2d2 c3d3);
a1b1=a1*b1;
a2b2=a2*b2;
c1d1=c1*d1;
c2d2=c2*d2;
c3d3=c3*d3;
OUTPUT: TECH1 TECH8 CINTERVAL;

*** ERROR Unexpected end of file reached in data file.


You might have more variables in your NAMES = list than columns in your data. If this doesn't help, send data and input to Support along with your license number. 


I am hoping someone can help with this. I used gender as a predictor variable (categorical) to depression (negative scores are better). How would I interpret the estimates? For example, negative estimates for a gender to depression path. 


If for example gender = 0 for males and 1 for females, a negative effect means that females have lower depression. 
