

I would like to run a path analysis that includes a categorical (dichotomous) endogenous variable as well as a nominal (categorical) control variable that specifies which unique subgroup (household) each individual in my sample came from. I don't have a specific direction of effects predicted for each household; I just want to control for the shared influence each household has on its members. In a linear regression I would simply create a dummy variable for each unique family. If I don't control for family, I violate the independence assumption. In a sample of 177, I have about 67 households. I started a simplified version of an Mplus script. How would I add family as a nominal control? In the simplified model I predict that taking a "Guru" role (which is categorical) mediates the effect of "Internet" usage on family communication ("Famcomm"), controlling for the unique "Family" influence.
TITLE: Family Communication - Internet -> Guru -> Famcomm
DATA: FILE IS c:\comn_3ab.dat;
FORMAT IS FREE;
VARIABLE: NAMES ARE family internet guru famcomm;
CATEGORICAL IS guru;
!what statement here? GROUPING IS family; ???
ANALYSIS: TYPE=MEANSTRUCTURE;
MODEL: famcomm ON guru;
guru ON internet;
! add family relationship here????
OUTPUT: sampstat;


Mike, it sounds like you are in a multilevel situation, with subjects within households and about two persons per household. A multilevel model is how I would approach your analysis. However, Mplus requires all outcome variables to be continuous for multilevel analysis. Your problem would not be appropriate for multiple group analysis because of the large number of groups and the small number of subjects within each group. Also, multiple group analysis assumes that all observations are independent. I don't know what the mean of your dichotomous variable is, but if it is close to .5, you could treat it as continuous and use TYPE=TWOLEVEL. Then your setup would be:
TITLE: Family Communication - Internet -> Guru -> Famcomm
DATA: FILE IS c:\comn_3ab.dat;
FORMAT IS FREE;
VARIABLE: NAMES ARE family internet guru famcomm;
! CATEGORICAL IS guru;
CLUSTER IS family;
ANALYSIS: TYPE=TWOLEVEL;
MODEL:
%between%
famcomm ON guru;
guru ON internet;
%within%
famcomm ON guru;
guru ON internet;
OUTPUT: sampstat;
If you want to keep guru categorical, you might consider taking a multivariate approach to multilevel analysis. To do this, all cluster sizes must be the same. In your case, I imagine most of your clusters have two observations, so you would need to use only clusters of size two. If the two observations have distinct roles (husband and wife, for example), this needs to be taken into account in the data setup. The variables in the data set would then be internth guruh famcomh interntw guruw famcomw, that is, three variables for the husband and three for the wife. The setup would be:
VARIABLE: NAMES ARE internth guruh famcomh interntw guruw famcomw;
CATEGORICAL = guruh guruw;
ANALYSIS: TYPE=MEANSTRUCTURE;
MODEL: famcomh ON guruh;
guruh ON internth;
famcomw ON guruw;
guruw ON interntw;
guruh famcomh PWITH guruw famcomw;
OUTPUT: sampstat;
The PWITH statement models the dependence in the observations.
The exogenous variables are allowed to covary by default. You can impose equality constraints to test whether certain parameters are the same for husband and wife, and try different variations depending on your hypotheses.
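A minimal sketch of the data restructuring this multivariate approach needs, written in Python and assuming a hypothetical long file with one row per person: a family id, a role code ("h" or "w", an assumption), and that person's internet, guru, and famcomm values. Families without exactly one husband and one wife record are dropped, and each output row comes out as family, internth, guruh, famcomh, interntw, guruw, famcomw. The leading family id would either be removed before writing the file or added to NAMES and left out through USEVARIABLES.

```python
from collections import defaultdict

def long_to_wide(rows, roles=("h", "w")):
    """Reshape person-level rows into one row per family.

    rows: iterable of (family, role, internet, guru, famcomm) tuples.
    Only families with exactly one record per role are kept, so every
    output row has the same length (clusters of size two).
    """
    by_family = defaultdict(list)
    for family, role, *values in rows:
        by_family[family].append((role, values))

    wide = []
    for family in sorted(by_family):
        members = by_family[family]
        # Skip incomplete families and families with duplicated roles.
        if sorted(role for role, _ in members) != sorted(roles):
            continue
        lookup = dict(members)
        record = [family]
        for role in roles:  # husband's variables first, then wife's
            record.extend(lookup[role])
        wide.append(record)
    return wide

# Hypothetical data: family 2 has only one member and is dropped.
rows = [
    (1, "h", 3.0, 1, 4.2), (1, "w", 2.0, 0, 3.9),
    (2, "h", 5.0, 1, 4.8),
    (3, "w", 1.0, 0, 2.5), (3, "h", 4.0, 1, 3.1),
]
for record in long_to_wide(rows):
    print(record)
# [1, 3.0, 1, 4.2, 2.0, 0, 3.9]
# [3, 4.0, 1, 3.1, 1.0, 0, 2.5]
```

Keeping only complete pairs follows the requirement above that all clusters have the same size.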


I am running a path analysis with both continuous and categorical variables. My categorical variable (y1, with two categories) is a dependent variable. I have a continuous variable (x1) predicting y1. I tested this relationship by declaring y1 as categorical, as described in the User's Guide. I have another continuous variable, x2, as a moderator of the relationship between x1 and y1 in my model: my hypothesis predicts that x2 interacts with x1 in predicting y1. I want to run this model as a multigroup analysis. I am using raw data. Could anyone lead me step-by-step through how to test the moderating effect? Thank you very much.

Anonymous posted on Wednesday, July 19, 2000  9:18 am



I have two questions regarding structural models with categorical variables. First, with every model I have run on different data sets, if I correlate any exogenous variables, the model will not converge. If I leave them uncorrelated, the program converges, but the models are very biased. Am I doing something incorrectly? Second, how would I go about estimating the predicted probability of a dichotomous event occurring with one or more latent variables as predictors? Thank you.


It is not necessary to mention the correlations of the exogenous variables. They are not part of the estimation of the model. Not mentioning them implies that they are freely correlated. The probability you ask for is computed as $P = 1 - \Phi\left((\tau - z)/\sqrt{\theta}\right)$, where $\Phi$ is the standard normal distribution function, $\tau$ is the threshold of the dichotomous event, $\theta$ is the residual variance for $y^*$ of the dichotomous event obtained from the standardized solution, and, for example, $z = a\,\eta_1 + b\,\eta_2 + c\,x$, where $a$, $b$, and $c$ are the estimated regression coefficients of $y^*$ for the dichotomous event regressed on two factors and one $x$. P is the conditional probability of the event given those factor values and $x$ value. To compute P, you choose values of $\eta_1$, $\eta_2$, and $x$ that you are interested in and evaluate $z$ for those values. You then use a normal probability table to obtain $\Phi\left((\tau - z)/\sqrt{\theta}\right)$, from which you obtain the desired P.
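The calculation described above can be sketched in Python. The threshold, residual variance, coefficients, and chosen values of the factors and x below are hypothetical placeholders, not estimates from any fitted model, and math.erf stands in for the normal probability table.

```python
import math

def normal_cdf(v):
    """Standard normal distribution function Phi(v)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def event_probability(threshold, theta, coefs, values):
    """P = 1 - Phi((threshold - z) / sqrt(theta)), where z = sum of coef * value."""
    z = sum(c * v for c, v in zip(coefs, values))
    return 1.0 - normal_cdf((threshold - z) / math.sqrt(theta))

# Hypothetical estimates: a, b, c for eta1, eta2, and x.
coefs = (0.4, -0.3, 0.2)
threshold, theta = 0.5, 0.6

# Evaluate P at eta1 = 1, eta2 = 0, x = 2 (so z = 0.8).
p = event_probability(threshold, theta, coefs, (1.0, 0.0, 2.0))
print(f"P = {p:.3f}")
```

Repeating the call over a grid of factor and x values gives the conditional probability curve the answer describes.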

ziv shkedy posted on Tuesday, December 12, 2000  8:08 am



How can I perform a path analysis in which one of the INDEPENDENT variables is categorical (the dependent variables are continuous and categorical)? The Mplus code for the model is:
! program for path analysis with observed variables
TITLE: PATH ANALYSIS
DATA: FILE IS vero3.txt;
VARIABLE: NAMES ARE mus neurotic alextot negobliq posobliq leeftijd geslacht opleidin int1 int2;
GROUPING IS geslacht(1=male 2=female);
USEV = mus neurotic alextot negobliq posobliq leeftijd opleidin;
CATEGORICAL ARE mus opleidin;
MISSING ARE ALL(5);
DEFINE: CUT mus(3,6,11,15);
MODEL: negobliq on neurotic alextot;
posobliq on neurotic alextot;
mus on negobliq posobliq leeftijd opleidin;
The error message is given below. My question is whether there is a way to define the variable opleidin as a categorical variable. Many thanks, Ziv.
*** ERROR in Variable command
CATEGORICAL option is used for dependent variables only.
OPLEIDIN is not a dependent variable.


The scale of the independent variables is not an issue in model estimation, only the scale of the dependent variables. Just leave opleidin off the CATEGORICAL list and it should be fine.

Anonymous posted on Tuesday, January 30, 2001  2:57 am



I have a structural model where a vector of Xs predicts Y1 and Y2, and then the Xs, Y1, and Y2 predict Y3. Both Y1 and Y3 are categorical. I assume that Y2, the continuous variable, enters the "second stage" as a predicted value, but I am not certain how Y1 enters the second stage. Is it a probit coefficient, an integer representing the predicted category from an ordered probit, or a probability of being in a particular category? Thank you.


The intervening variables Y1 and Y2 are not given predicted values that enter into the second stage. Instead, the parameters in all equations are estimated directly and simultaneously. This is done by considering the "reduced form" (to use econometric terms), that is, expressing Y3 in terms of the ultimate "causes", namely the vector of X's. For continuous outcomes, this leads to a covariance structure model implied by the set of linear regressions, as in conventional SEM. For categorical outcomes, this leads to a correlation structure implied by the set of linear equations for the y* variables. Here, y* variables are the continuous latent response variables behind each categorical outcome (a propensity to reply y=1 as opposed to y=0 for a binary outcome). So, for example, the influence of the categorical Y1 on Y3 is expressed as the influence of Y1* on Y3, not Y1 on Y3.
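A small sketch of the reduced-form idea for a recursive model like the one in the question: the equations are listed in causal order and earlier endogenous variables are substituted out, leaving each y or y* as a function of the x's alone. The variable names and coefficients are hypothetical, and this only illustrates the algebra; it is not how Mplus computes its estimates.

```python
def reduced_form(equations, exogenous):
    """Express each endogenous variable in terms of the exogenous x's.

    equations: list of (name, {predictor: coefficient}) in causal order,
               so every endogenous predictor is defined before it is used.
    Returns {name: {x: reduced-form coefficient}}.
    """
    reduced = {}
    for name, coefs in equations:
        row = {x: 0.0 for x in exogenous}
        for predictor, b in coefs.items():
            if predictor in reduced:
                # Substitute the earlier equation: b times its x-coefficients.
                for x in exogenous:
                    row[x] += b * reduced[predictor][x]
            else:
                row[predictor] += b  # direct effect of an exogenous x
        reduced[name] = row
    return reduced

# Hypothetical coefficients: Y1* and Y2 on the x's, then Y3* on Y1*, Y2, and the x's.
equations = [
    ("y1*", {"x1": 0.5, "x2": -0.2}),
    ("y2", {"x1": 0.3, "x2": 0.4}),
    ("y3*", {"y1*": 0.6, "y2": 0.5, "x1": 0.1, "x2": 0.0}),
]
result = reduced_form(equations, ["x1", "x2"])
print({x: round(v, 4) for x, v in result["y3*"].items()})
# {'x1': 0.55, 'x2': 0.08}
```

For example, the x1 coefficient for Y3* is 0.1 + 0.6 * 0.5 + 0.5 * 0.3 = 0.55: the direct effect plus the effects transmitted through Y1* and Y2.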

Anonymous posted on Thursday, February 01, 2001  4:52 am



Thank you for the response to the previous posting. While I understand how to estimate the impact of Y1* on X3 if X3 is a binary independent variable (and I believe you have posted how to compute the predicted probabilities), I fail to see how to estimate the impact of Y1* on Y3 (what was X3 before). The vector of X's is not of theoretical concern; what is of concern is the effect of Y1 on Y3, given that Y3=0 or Y3=1. I thank you for any assistance.


To explicate the impact of Y1* (which underlies the categorical Y1) on the binary Y3, you act as if you had a regular observed continuous variable influencing a binary outcome. You would want to know how the Y3=1 probability changes as a function of different values on Y1*. For example, you might want to look at the mean value of Y1* and plus or minus 1 standard deviation from the mean, or, since Y1* is itself a function of x's, you may want to look at Y1* values predicted from x values. The Y1* mean and variance are printed in TECH4 because Y1* is restated as a latent variable (actually I think the mean is only printed in Version 2). Hope this helps. 

rose posted on Wednesday, June 06, 2001  11:22 pm



I am doing path analysis with 2 dichotomous dependent variables. My question is whether it is possible to compare coefficients across nested models. Thanks...


You can do a chi-square difference test if you use the WLS estimator. The chi-square values obtained with the WLSM and WLSMV estimators cannot be used for difference tests.
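A sketch of the chi-square difference test described here, for chi-square values that may legitimately be differenced (WLS in this answer, not WLSM or WLSMV). The upper-tail probability comes from the series for the regularized incomplete gamma function, so the block needs only the standard library. The fit values are hypothetical, and the series is meant for the moderate differences typical of nested-model comparisons.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df)."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, h).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= h / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(h) - h - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def chi2_difference(chi_restricted, df_restricted, chi_free, df_free):
    """Compare a restricted model with the less restricted model it is nested in."""
    d_chi = chi_restricted - chi_free
    d_df = df_restricted - df_free
    return d_chi, d_df, chi2_sf(d_chi, d_df)

# Hypothetical fits: restricted model (e.g., equal coefficients) vs. free model.
d_chi, d_df, p = chi2_difference(25.4, 12, 18.1, 10)
print(f"delta chi-square = {d_chi:.2f} on {d_df} df, p = {p:.4f}")
```

A small p-value says the restrictions (for example, equal coefficients) significantly worsen fit.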

rose posted on Thursday, June 07, 2001  3:06 pm



I'm doing path analysis with 2 dichotomous dependent variables (y1 ON x1 x2 x3 x4 etc.; y2 ON x1 x2 x3 x4; where y2 is also an IV in the first equation, as I am interested in the mediating impact of y2). I have two questions. First, how can I distinguish between the direct and indirect effects of my explanatory variables of interest? Second, how do I report the coefficients? I've read previously that you simply report direction and strength. I also read that they can be interpreted as probit regression coefficients. Is this correct? Do you know of any articles that have used this method with Mplus, so I could see how to report my results? Any help appreciated.


Direct effects are computed for dichotomous variables in the same way as for continuous variables. This should be treated in the Bollen SEM book. The regression coefficients are probit regression coefficients. Reporting significance and direction should be sufficient in most cases. I don't know of any articles showing Mplus applied to a path analysis with categorical outcomes. 
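A minimal sketch of the decomposition for the model in the question, assuming y2 ON x1 has coefficient a, y1 ON y2 has coefficient b, and y1 ON x1 has coefficient c (all hypothetical numbers). As with continuous variables, the indirect effect is the product of the coefficients along the path and the total effect is direct plus indirect. With categorical y1 and y2, these are effects on the latent response variables y1* and y2*, on the probit scale.

```python
def decompose_effects(a, b, c):
    """Direct, indirect, and total effect of x1 on y1 through y2.

    a: y2 ON x1, b: y1 ON y2, c: y1 ON x1 (the direct effect).
    """
    indirect = a * b
    return {"direct": c, "indirect": indirect, "total": c + indirect}

# Hypothetical probit-scale estimates.
effects = decompose_effects(a=0.4, b=0.5, c=0.25)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
# direct: 0.250
# indirect: 0.200
# total: 0.450
```

With several x's, the same calculation is repeated for each explanatory variable of interest.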

rose posted on Tuesday, June 12, 2001  10:10 am



Just to make sure: does the fact that my coefficients are probit regression coefficients alter the computation? Should I be using standardized or unstandardized coefficients? Cheers.


No, it doesn't make a difference. I would use unstandardized. 

Anonymous posted on Wednesday, June 20, 2001  9:56 am



I'm constructing a path model with a continuous outcome and categorical and continuous predictor variables. The model contains three latent variables that are outcomes in the first model and are then used to predict my final outcome in the second model. I have a sample of about 3,500 cases. The total number of variables is about 21 (not including the indicators of the intervening latent variables). When I run the model, one of the continuous variables has a standardized coefficient (STDYX) on the order of .7. This seemed very large to me, so I bootstrapped my sample and obtained remarkably similar estimates for all coefficients, including the one with the high STDYX value. In addition, the standard errors are consistent and well-behaved (none seem inappropriately large). I used Mplus to conduct standard checks for multicollinearity, specifically checking whether any of my other variables were highly correlated with the problematic variable. This doesn't appear to be the case. I should note that Mplus consistently returns CFIs and TLIs in the neighborhood of .80, but RMSEAs below .03, and R-squares for the outcome variable of .5. The CFI and TLI values are obviously low, but because I'm testing a number of specific hypotheses with the model, I'm less concerned with parsimony than with variance explained, the significance of specific parameters, and the stability of parameter estimates. Is it likely I'm encountering a collinearity problem that I haven't checked for, or is it possible that the unusually high STDYX is telling me that either I have too many categorical variables in the model or I am including too many parameters relative to the available sample size?


You could check the correlations among the factors by asking for TECH4 in the OUTPUT command, which gives model-estimated means, covariances, and correlations for the latent variables in the model. Perhaps they are highly correlated. I don't find a standardized coefficient of .7 to be a problem.

Mary Campa posted on Friday, February 08, 2002  11:04 am



I am doing a path analysis model with a categorical dependent variable. I am trying to obtain indirect effects (the indirect effect of X1 on Y1 through X2), and I learned from a previous posting that I can calculate the indirect effect with the Sobel formula if I have the coefficients and standard errors from the two paths. These are the paths I have specified:
y1 on X1 X2;
X2 on X1;
In Mplus, with a categorical dependent variable, WLSMV is the default estimation on the path from X1 to X2, and probit estimation is used on the path from X2 to Y1. If I estimated each equation in SAS, the path from X1 to X2 (X2 on X1) would be an OLS estimate. My question is: which is appropriate? The value of the path from X2 to Y1 does not change with different estimation of the X1-X2 path, but because the final outcome is categorical, should I be using the WLSMV estimate? Thank you for your help.


When one or more dependent variables are categorical, the WLSMV estimator is used as the default. You could also select WLS or WLSM. For any estimator allowed in this situation, you will obtain a probit regression coefficient for each regression relation in which the dependent variable is categorical and an OLS regression coefficient for each regression relation in which the dependent variable is continuous. I hope this answers your question. 
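For the indirect effect in the question (X2 on X1 with coefficient a, Y1 on X2 with coefficient b), the Sobel formula can be sketched as below. It assumes the two estimates are uncorrelated and uses hypothetical values. Following the answer above, a would be the ordinary regression coefficient for the continuous mediator and b the probit coefficient, so the indirect effect is on the y1* scale.

```python
import math

def sobel_test(a, se_a, b, se_b):
    """Sobel z-test for the indirect effect a*b (estimates assumed uncorrelated)."""
    indirect = a * b
    se = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = indirect / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value: 2 * (1 - Phi(|z|))
    return indirect, se, z, p

indirect, se, z, p = sobel_test(a=0.30, se_a=0.10, b=0.40, se_b=0.12)
print(f"indirect = {indirect:.3f}, SE = {se:.4f}, z = {z:.2f}, p = {p:.3f}")
```

The same function applies whichever estimator produced a and b, as long as each coefficient is paired with its own standard error.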

Mary Campa posted on Friday, February 08, 2002  2:14 pm



Thank you for your response. It does answer the question I asked, but maybe I am unclear about what I should be asking. If Mplus does an OLS regression for the continuous dependent variable, then what makes this estimate different from the one I get in SAS? I can obtain the same probit estimation in SAS. Also, given the difference, which is more appropriate for the question I am asking?


I would need to look at the SAS and the Mplus outputs to comment on this. I don't believe the estimates should be different. Can you send them to support@statmodel.com?

duckhye posted on Saturday, May 25, 2002  8:17 pm



Bengt O. Muthen wrote on Thursday, February 01, 2001, 05:51 pm: "Here, y* variables are the continuous latent response variables behind each categorical outcome (a propensity to reply y=1 as opposed to y=0 for a binary outcome)." I have a five-level categorical dependent variable (0 to 4). How can you explain y* in terms of "a propensity..."? For y=4, is it a propensity to reply y=4 as opposed to y=3, 2, 1, or 0? Or, for y=4 or 3, a propensity to reply y=4 or 3 as opposed to y=2, 1, or 0?

bmuthen posted on Sunday, May 26, 2002  11:50 am



Take the example of a test where your grade is 0 to 4. This is the observed y. Here, y* is the specific skill needed to do well on the test. For example, people who get a certain grade do not necessarily have the same skill. So y* can be thought of as the underlying variable that you really want to measure, but you only have your crude measurement y.

duckhye posted on Wednesday, May 29, 2002  7:19 am



Bengt O. Muthen wrote on Thursday, February 01, 2001, 05:51 pm: "This is done by considering the "reduced form" (to use econometric terms), that is expressing Y3 in terms of the ultimate "causes", namely the vector of X's." I have three endogenous variables, two of which are supposed to affect each other (reciprocal interaction). I have difficulty writing their reduced forms in terms of all the X's. Could you show me how to write the reduced form of one of the variables (say, L* or O*)?

duckhye posted on Wednesday, May 29, 2002  8:06 am



This is a follow-up question about the response by bmuthen on Sunday, May 26, 2002, 11:50 am: "Take the example of a test where your grade is 0 to 4. This is the observed y. Here, y* is your specific skill needed to do well on the test. For example, people who get a certain grade do not necessarily have the same skill. So, y* can be thought of the underlying variable that you really want to measure, but you only have your crude measurement y." I know that there are k-1 fitted regression lines for k levels. How do you convert these k-1 lines to one y*?

bmuthen posted on Wednesday, May 29, 2002  5:22 pm



Regarding your question about reduced form, please see an SEM book, e.g., Bollen. Regarding the 0-4 variable, I am assuming that this variable is ordered categorical, in which case there is only 1 regression slope and k-1 thresholds (intercepts). With an unordered polytomous response, however, there would be several slopes.
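To illustrate the one-slope, k-1-threshold structure, here is a sketch of the category probabilities for an ordered probit with five categories (0-4). It assumes y = k when tau_k < y* <= tau_(k+1), with y* = slope * x plus a standard normal residual. The thresholds and slope are hypothetical.

```python
import math

def normal_cdf(v):
    """Standard normal distribution function; handles +/- infinity."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def category_probabilities(thresholds, slope, x):
    """P(y = k | x) for k = 0..K-1 under an ordered probit with one slope."""
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    mean = slope * x
    return [
        normal_cdf(cuts[k + 1] - mean) - normal_cdf(cuts[k] - mean)
        for k in range(len(cuts) - 1)
    ]

# Hypothetical estimates for a 0-4 outcome: four thresholds, one slope.
thresholds = (-1.0, 0.0, 0.8, 1.5)
for x in (0.0, 1.0):
    probs = category_probabilities(thresholds, slope=0.5, x=x)
    print(x, [round(p, 3) for p in probs])
```

The single slope shifts y* and therefore moves probability across all five categories at once; only the thresholds differ between the cut points.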

M. Hyland posted on Monday, June 24, 2002  11:22 am



I have read the discussion list, and I am considering purchasing Mplus to analyze a dataset. Here are the issues I have: 1) I am using observed variable path analysis 2) multiple group analysis (2 groups) 3) categorical dependent variable (0/1) with a mean of .1 I think that this analysis would be possible using Mplus, correct? Thank you in advance. 


Yes. 


I would like to perform a path analysis with an ordered categorical outcome variable (4 categories, skewed). The data are hierarchical with two levels (students within schools), and the number of students per school ranges from 2 to 52. From the earlier discussion and the error message ('general two-level analysis is only available when all dependent variables are continuous'), I understood that it is not possible to do this in Mplus. Is that correct? If not, please tell me how I should do the analysis.


This is not possible in the current version of Mplus. It will be possible in Version 3 which will be out in the Spring of 2003. 

Anonymous posted on Wednesday, February 26, 2003  5:04 am



I have a question concerning a model with a causal effect between an exogenous continuous manifest variable and an endogenous continuous latent variable measured by 4 categorical indicators. The 4 factor loadings are probit coefficients (please correct me if this is not correct), but I am not sure about the regression coefficient between the exogenous continuous variable and the continuous latent variable (measured by categorical indicators). Is this coefficient a probit coefficient or not?


The regression coefficient for the continuous dependent latent variable regressed on the continuous independent variable is a regular regression coefficient. The type of regression coefficient is determined by the scale of the dependent variable. In this case, the dependent variable is continuous. In the case of the categorical factor indicators, they are the dependent variables, and therefore the coefficients for the regression of the factor indicators on the factor are probit regression coefficients.

Anonymous posted on Tuesday, August 10, 2004  9:48 am



I have a categorical variable (3 categories) that is the outcome of one regression equation ('endogenous') as well as a predictor in another regression equation ('exogenous'). I am having some trouble understanding the estimates for this variable. I understand the probit estimates when the variable is endogenous and I have declared it categorical in the Mplus VARIABLE command, but I was wondering whether Mplus treats the categorical variable as a linear predictor in the 'exogenous' equation.

bmuthen posted on Friday, August 13, 2004  5:03 pm



I assume that your categorical variable is not nominal but ordinal, declared using the CATEGORICAL option. With the default probit-WLSMV approach, the exogenous role of the categorical variable is handled by letting its corresponding y* variable (a continuous latent response variable) be the predictor. With the new ML option in Mplus Version 3, the ordinal scores of the categorical variable are used, treating the predictor as (quasi-)continuous.

Anonymous posted on Tuesday, March 29, 2005  7:23 am



I am running a model with an unordered categorical dependent variable, using the NOMINAL option to define this variable. I understand that the regression of this variable on other variables is a multinomial logit (MNL) model. In this context, do the standardized logit coefficients have any value? Thanks

Anonymous posted on Tuesday, March 29, 2005  11:14 am



Follow up on above message. Regarding the standardized logit coefficients, what I meant was to ask whether they had any meaningful interpretation or if they could be viewed as anything resembling a correlation coefficient (like polychoric correlations)? 


I am conducting SEM analyses and have three questions:
1. My analyses are with categorical dependent, mediating, and independent variables, using a weighted sample with missing data. When I tried to run the models, I got an error message indicating that I needed to use the THETA instead of the DELTA parameterization. What is the difference between these two methods of parameterization (in plain language, please)? I didn't quite understand the information in the Mplus User's Guide.
2. My main independent variable is a nominal variable with 5 categories, and I have figured out that dummy variables need to be created in order to model this variable. I am assuming that this is similar to doing a logistic model in SAS, where you include 4 dummy variables and the one left out is your reference category. Is this correct? So your estimates are compared to the left-out category?
3. Since Mplus allows path analysis of categorical dependent variables, is there a way to get a logit instead of a probit model? Why/why not?

BMuthen posted on Saturday, April 02, 2005  8:57 pm



Regarding the standardized logit coefficients, these are probably not very useful in the case of nominal outcomes.


1. The difference between THETA and DELTA has to do with which parameters are estimated for the categorical outcomes. With THETA, residual variances are estimated. With DELTA, scale factors are estimated. For some models, THETA must be used.
2. Yes.
3. Yes, ask for ESTIMATOR = ML;
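A sketch of how the two parameterizations relate in the simplest case: one binary outcome, one covariate x, one group. It assumes that the delta parameterization corresponds to rescaling y* to total variance 1 (scale factor = 1/SD(y*)), while the theta parameterization estimates the residual variance directly. The estimates are hypothetical, and this is an illustration of the idea, not Mplus's internal computation. The implied probabilities are identical either way.

```python
import math

def normal_cdf(v):
    """Standard normal distribution function Phi(v)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def theta_to_delta(slope, threshold, residual_var, var_x):
    """Rescale theta-parameterization estimates so that y* has variance 1."""
    var_ystar = slope**2 * var_x + residual_var   # model-implied V(y*)
    scale = 1.0 / math.sqrt(var_ystar)            # scale factor
    return {
        "scale_factor": scale,
        "slope": slope * scale,
        "threshold": threshold * scale,
        "residual_var": residual_var * scale**2,
    }

def prob_u1(slope, threshold, residual_var, x):
    """P(u = 1 | x) = Phi((slope * x - threshold) / sqrt(residual_var))."""
    return normal_cdf((slope * x - threshold) / math.sqrt(residual_var))

# Hypothetical theta-parameterization estimates.
slope, threshold, residual_var, var_x = 0.8, 0.3, 1.0, 2.0
d = theta_to_delta(slope, threshold, residual_var, var_x)
print(d)
x = 1.5
print(prob_u1(slope, threshold, residual_var, x),
      prob_u1(d["slope"], d["threshold"], d["residual_var"], x))
```

Because both versions describe the same probit model, the choice matters for which parameters can be constrained or freed, not for the fitted probabilities.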


Thanks for your feedback above. Another question: when checking whether your model fits the data, you have the chi-square, CFI, and RMSEA statistics as a guide. I am aware that chi-square is sensitive to large sample size, and since I am working with population data (>60,000), I keep getting p far below 0.05, so I am relying on the CFI and RMSEA statistics. However, is one better than the other in assessing model fit? For example, I have categorical dependent, mediating, and independent variables using a weighted sample, and CFI = 0.782 and RMSEA = 0.037. The RMSEA indicates good fit, but the CFI indicates less than marginal fit. Which one do you choose?


CFI has proven to be very dependable in many of the simulations done in the Yu dissertation, which can be found on our website. So I would be concerned about the low CFI. You might want to do your own simulation to see how the fit statistics work for your type of model. Be sure that you are using Version 3.12.
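To see why the two indices can disagree at N > 60,000, here are common textbook forms of RMSEA and CFI in Python. The exact formulas Mplus uses are in its technical appendices (for example, programs differ in dividing by N or N - 1), and the model and baseline chi-square values below are hypothetical. RMSEA divides the excess chi-square per degree of freedom by N, so it shrinks as the sample grows, while CFI compares the model's misfit with that of the baseline (independence) model.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (N in the denominator)."""
    return math.sqrt(max(chi2 / df - 1.0, 0.0) / n)

def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index relative to the baseline model."""
    misfit = max(chi2 - df, 0.0)
    base_misfit = max(chi2_base - df_base, misfit)
    return 1.0 if base_misfit == 0 else 1.0 - misfit / base_misfit

# Hypothetical fit: large N, model chi-square far above its df.
n = 60000
print(f"RMSEA = {rmsea(2500.0, 30, n):.3f}")
print(f"CFI   = {cfi(2500.0, 30, 11000.0, 45):.3f}")
```

With these made-up values RMSEA looks good while CFI looks poor, the same pattern described in the question.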

Anonymous posted on Tuesday, April 19, 2005  12:32 pm



How do you model reciprocal relationships in Mplus? I am interested in the direct and indirect relationship between X and Y, but the relationship is mediated by A, B, C, D, and E. A and B, and C and D, have reciprocal relationships (i.e., a nonrecursive model).

bmuthen posted on Tuesday, April 19, 2005  1:21 pm



You simply say, for example, a on b; b on a; where a and b are part of the bigger model you specify. 
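For reference, solving such a reciprocal pair (a ON b; b ON a; each also regressed on covariates x) gives the reduced form below. The coefficient names are generic, not Mplus output labels. The solution requires $\beta_{ab}\beta_{ba} \neq 1$, and for identification each equation usually needs at least one x that is excluded from it (the order condition discussed in SEM texts such as Bollen). In matrix form this is $y = (I - B)^{-1}(\Gamma x + \zeta)$.

```latex
\begin{aligned}
a &= \beta_{ab}\, b + \gamma_a' x + \zeta_a, \qquad
b = \beta_{ba}\, a + \gamma_b' x + \zeta_b,\\[4pt]
a &= \frac{(\gamma_a + \beta_{ab}\gamma_b)' x + \zeta_a + \beta_{ab}\zeta_b}{1 - \beta_{ab}\beta_{ba}}, \qquad
b = \frac{(\gamma_b + \beta_{ba}\gamma_a)' x + \zeta_b + \beta_{ba}\zeta_a}{1 - \beta_{ab}\beta_{ba}}.
\end{aligned}
```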

Anonymous posted on Tuesday, April 19, 2005  3:46 pm



Thanks. This is the model that I fitted, given that my X variable is a 5-level categorical variable (dummy coded x2, x3, x4, x5), my Y is a dichotomous variable, and B is a latent variable with p, q, r, s loading on the factor:
B by p q r s;
B on x2 x3 x4 x5 A;
A on x2 x3 x4 x5 B;
E on A B x2 x3 x4 x5;
C on A E D x2 x3 x4 x5;
D on B E C x2 x3 x4 x5;
Y on C D E x2 x3 x4 x5;
p with q r s;
q with r;
r with s;
The model ran with CFI = .984, TLI = .963, RMSEA = 0.051, but included the warning "The residual covariance matrix is not positive definite, problem with variable p". How do I get rid of this problem? Secondly, what is the TLI and how is it estimated? I assume both should be used in assessing model fit and that both should be greater than .9 (based on an article that used Mplus and reported the CFI only). Is my assumption correct?

Anonymous posted on Tuesday, April 19, 2005  3:50 pm



Another question regarding the reciprocal relationship between A and B: if the estimates for A on B and B on A are not statistically significant, but B on A is significant if A on B is not specified (and vice versa), how do you interpret this? Do you model each path separately?

BMuthen posted on Wednesday, April 20, 2005  9:26 am



You have included several residual covariances among your indicators for the factor B. This may be causing the residual variance for p to be negative, or the residual for p may have a correlation of one with the residual of another indicator. You will need to look at your output to see what is happening and change your model to avoid the problem. See the technical appendices on the website for the formula for TLI. It is your choice whether you want to report one or both fit statistics. You can find recommended cutoffs in the Yu dissertation, which is on our website. If you believe in a nonrecursive model, then you would need to include both paths and accept that they are not significant.
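The Tucker-Lewis index (TLI) is commonly written as below. This is the textbook form; the exact version Mplus uses is in the technical appendices mentioned above, and the chi-square values here are hypothetical. Unlike CFI, TLI is not bounded by 0 and 1.

```python
def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index: compares chi-square/df ratios of model and baseline."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2 / df
    return (ratio_base - ratio_model) / (ratio_base - 1.0)

# Hypothetical model and baseline (independence model) chi-squares.
print(f"TLI = {tli(180.0, 60, 2400.0, 78):.3f}")
```

Because TLI rewards a small chi-square per degree of freedom, it penalizes extra free parameters more than CFI does.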

Anonymous posted on Thursday, April 21, 2005  1:57 pm



Thanks. You indicated that if I believe in a nonrecursive model, then I would need to include both paths (A on B; B on A) even if they are not significant. Is this still the case even if my data are cross-sectional? I read that the effects "may be inflated due to methodological artifact of priming or to the inconsistency effects associated with simultaneous measurement" (Chi-Sum Wang et al., 1998), so even though theoretically the reciprocal relationship is true, I am not sure if I should be modeling it as such. Any suggestions?

bmuthen posted on Thursday, April 21, 2005  5:14 pm



There are certainly arguments about weaknesses of nonrecursive models  I think a large econometric literature exists on this. I couldn't say to which extent that critique is applicable to your case. Some argue that with detailed longitudinal data you don't need nonrecursive models, but the reciprocity is fleshed out in time. 


Hello, I have a cross-lagged panel design in which I want to estimate the effect of number of hours of math on math achievement. My model is:
TITLE: this is an example of a path analysis with categorical dependent variables
DATA: FILE IS C:\probeer1499.txt;
VARIABLE: NAMES ARE u1-u5 x1-x5;
CATEGORICAL ARE u2-u5;
ANALYSIS: PARAMETERIZATION = theta;
MODEL: x5 ON u4 u1 x4;
x4 ON u3 x3;
x3 ON u2 x2;
x2 ON x1 u2;
u5 ON u1 u4 x4;
u4 ON u1 u3 x3;
u3 ON u1 u2 x2;
u2 ON u1 x1;
In the output I can see that the model has estimated the correlation between x5 and u5, but I don't want this to be estimated. How can I override this default? And how can I estimate the correlation between my two endogenous variables x1 and u1? Thank you! Eva


If you don't want the correlation between x5 and u5, say x5 WITH u5@0; I believe that x1 and u1 are exogenous variables. I see them only on the right-hand side of ON. The correlations of exogenous variables are not included in the model, as in regular regression. The model is estimated conditional on the exogenous variables. If you want to know their correlation, use the sample statistics.


Thank you very much! x1 and u1 are indeed exogenous. I was just wondering whether one could take into account the fact that they might be correlated. 


This is taken into account. Think of regular regression where $y = a + b_1 x_1 + b_2 x_2 + e$. The parameters in the model are an intercept, two slopes, and a residual variance. The correlation between x1 and x2 is not a parameter in the model; it is the sample correlation value. If you bring x1 and x2 into the model and estimate their correlation, then you will have to assume normality of x1 and x2. If you do not, normality does not have to be assumed for them.

Anonymous posted on Thursday, May 05, 2005  4:13 pm



1. If my main predictor (X) is a 5-level nominal variable (therefore 4 dummy variables created), I have 5 ordinal mediating variables (A, B, C, D, E) with reciprocal relationships between A and B and between C and D, and my outcome variable is a 3-level ordinal variable (Y), can I obtain R-square values for each level of my main predictor variable for all observed variables (A, B, C, D, E, and Y)?
2. How do I interpret the reciprocal relationships in the model (they are all statistically significant and make theoretical sense)? That is, X -> A -> B -> E -> C -> D -> Y and at the same time X -> B -> A -> E -> D -> C -> Y.
3. How do I interpret the coefficient for the 3-level ordinal outcome variable?
4. Can I calculate final probabilities for each level of X?
5. How do I present the results in an interpretable manner for the reader?
6. How do I know that my model is identified? Is the fact that the Mplus results indicated model convergence and good fit to the data (all the fit statistics suggest good fit) sufficient? Should I also check model identification manually? If so, can you provide a reference?


1. You will get an R-square for each dependent variable in the model.
2. I am not aware that the interpretation would be different than for any other regression relationship. There may be a literature on that.
3. In Mplus, with weighted least squares, it is a probit regression coefficient. With maximum likelihood, it is a logistic regression coefficient. You can look at sign and significance or convert to a probability.
4. Yes. See Appendix 1 on the website and Chapter 13.
5. I would look at the journal I aim to publish in and follow its style.
6. In most cases, if Mplus converges, the model is identified. I'm not sure if the Bollen book shows how to check for identifiability, but it may. You can also check other SEM books.


Hello, I am estimating a cross-lagged panel design where the number of hours taken in math is related to math achievement and vice versa. I want to model the correlations between the errors of the different measures of math achievement and between the different measures of number of hours of math (which are categorical variables). Because math achievement is measured with very similar tests, I would expect measure-specific variance. How can I model this? My current model is:
DATA: FILE IS C:\probeer1499metschool.txt;
VARIABLE: NAMES ARE clus u1-u5 x1-x5;
CATEGORICAL ARE u2-u5;
CLUSTER = clus;
ANALYSIS: TYPE = COMPLEX;
PARAMETERIZATION = THETA;
MODEL: x5 ON u4 x4 u1;
x4 ON u3 x3 u1;
x3 ON u2 x2 u1;
x2 ON x1 u1;
u5 ON u4 x4 u1;
u4 ON u3 x3 u1;
u3 ON u2 x2 u1;
u2 ON x1 u1;
x5 with u5@0;
OUTPUT: standardized modindices sampstat;
SAVEDATA: results are results.dat;
I cannot find any information in the Mplus handbook on how to model correlated errors. Thanks for your help!


See the section describing the WITH option and the VARIANCES/RESIDUAL VARIANCES sections in Chapter 16. Residual variances are referred to by their variable name, and WITH is used to specify correlational relationships. For example, x4 WITH u4; specifies the residual covariance between x4 and u4.


Consider the following path analysis: Y1 - a 4-category ordinal variable; Y2 - a 4-category ordinal variable; x1-x4 - other variables (either dummy or continuous). categorical are y1 y2; Model: y1 on y2 x1 x2; y2 on x3 x4; In the output, the probit regressions give three intercepts or thresholds for y2. The model also gives a single parameter estimate for y2 on y1. Although the parameter estimate is a probit coefficient, how is it interpreted in the first equation?

bmuthen posted on Tuesday, May 17, 2005  3:53 pm



I think you are asking how the coefficient for y1 on y2 in the first equation is interpreted. When using WLSMV, this is a probit coefficient, as you say, meaning that, holding x1 and x2 constant, it says how many probit units y1 changes as a function of a one-unit change in y2*, where y2* is a continuous latent response variable underlying y2.


Hello, I am trying to fit a cross-lagged model where the number of hours taken in math is related to math achievement. I have 5 waves, and the data are clustered (student and school level). When I fit the following model: TITLE:this is an example of a path analysis with categorical dependent variables DATA:FILE IS C:\probeer1495metschoolAPARTJM.txt; VARIABLE:NAMES ARE clus g u2-u5 x1-x5; CATEGORICAL ARE u2-u5; CLUSTER IS clus; GROUPING is g (0=boys 1=girls); ANALYSIS: TYPE = COMPLEX; PARAMETERIZATION = THETA; MODEL: x5 ON x4 x3 x2 x1 u4 u3 u2; x4 ON x3 x2 x1 u3 u2; x3 ON x2 x1 u2; x2 ON x1; u5 ON u4 u3 u2 x4 x3 x2 x1; u4 ON u3 u2 x3 x2 x1; u3 ON u2 x2 x1; u2 ON x1; x2 with u2; x3 with u3; x4 with u4; OUTPUT: standardized modindices residual; I get the following error message: *** FATAL ERROR THE WEIGHT MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE. However, when I pretend that my data are not clustered (and leave out TYPE = COMPLEX), I have no problems fitting the model. So I guess that this error message has something to do with the fact that I take into account the clustering of the data. How can I solve this? I also have another question. Sometimes (fitting slightly different models) I get the error message: NO CONVERGENCE. NUMBER OF ITERATIONS EXCEEDED. How can I increase the number of iterations? Thank you


I suspect that you are not running Version 3.12 and that your first problem is that groups must contain independent observations. Your clusters most likely contain both boys and girls, so gender cannot be used as a grouping variable. If you look up convergence problems in the index of the Mplus User's Guide, you will find suggestions for what to do when you get nonconvergence. See the ANALYSIS command in the Mplus User's Guide for a description of options to increase the number of iterations.


Thanks for your quick reply! I am using Version 3.11 and made sure that the groups have independent observations: I changed the school codes so that my clusters contain only boys or only girls (but not both). I still get the error message: *** FATAL ERROR THE WEIGHT MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE. The convergence problem (number of iterations exceeded) is solved, but the weight matrix is still not positive definite. Do you have any idea what I can do to solve this problem?


If you send your input, output, data, and license number to support@statmodel.com, I will take a look at it. Also, please download Version 3.12. 

Anonymous posted on Tuesday, May 24, 2005  1:52 pm



Hi Linda/Bengt, My original question was posted by Anonymous on Thursday, May 05, 2005 - 04:13 pm. I am using weighted data. If I use WLSMV, the model runs fine, converges, and all fit statistics indicate good fit. However, when I tried running this model using ML to get logit instead of probit coefficients, the model won't run. Can you suggest why? Thanks

bmuthen posted on Tuesday, May 24, 2005  6:42 pm



Hard to say offhand why you have ML problems, although note that with ML and logit, the ordinal mediating variables are treated differently than with WLSMV and probit (there was a recent post on this in connection with the Winship-Mare article). When the mediators are used as predictors, the actual observed values are used in ML/logit and treated as continuous scores, whereas in WLSMV/probit it is the underlying y* variable that is used as the predictor. To diagnose your ML problem, please send your data, input, and output for the ML run to support@statmodel.com with your license number.

Anonymous posted on Tuesday, May 31, 2005  4:00 pm



Unfortunately, I am working in a secure data facility and am unable to forward my data to you. I could try to send my output and program, but they would need to go through a vetting process with the data administrators, which would take some time. However, I have a couple of additional questions: 1. Do you still need to check for under-, just-, or over-identification in non-recursive path analyses, given that the model converges and all fit statistics indicate good fit of the model to the data? 2. It is recommended that the standardized coefficients be reported. How is the standardized coefficient calculated in Mplus for ordinal (e.g., 4-level) and dichotomous outcomes?

bmuthen posted on Tuesday, May 31, 2005  4:40 pm



We could probably say something based on the output without the data. 1. Not if you get SEs computed (and they are not huge); that typically indicates that the model is identified. 2. Many outlets like to see standardized values in addition to the raw ones. With categorical outcomes, Mplus uses the variance of the underlying y* variable when standardizing.
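As a rough illustration of standardizing with the variance of y*, here is a Python sketch of the general formula StdYX = b * SD(x) / SD(y*), where Var(y*) = b' Cov(x) b + residual variance. It assumes the residual variance of y* is fixed at 1 (probit, as in the theta parameterization) or pi^2/3 (logit); it is not claimed to reproduce every detail of how Mplus computes StdYX. The slopes and covariance matrix in the usage part are hypothetical.

```python
import math


def ystar_variance(slopes, cov_x, residual_variance):
    """Var(y*) = b' Cov(x) b + residual variance of y*."""
    k = len(slopes)
    explained = sum(
        slopes[i] * slopes[j] * cov_x[i][j] for i in range(k) for j in range(k)
    )
    return explained + residual_variance


def stdyx(slopes, cov_x, residual_variance=1.0):
    """StdYX_j = b_j * SD(x_j) / SD(y*).

    Use residual_variance=1.0 for a probit model with the residual variance
    fixed at 1, or math.pi ** 2 / 3 for a logit model.
    """
    sd_ystar = math.sqrt(ystar_variance(slopes, cov_x, residual_variance))
    return [b * math.sqrt(cov_x[j][j]) / sd_ystar for j, b in enumerate(slopes)]


if __name__ == "__main__":
    slopes = [0.5, -0.2]  # hypothetical probit slopes
    cov_x = [[4.0, 0.6], [0.6, 1.0]]  # hypothetical covariance matrix of the x's
    print(stdyx(slopes, cov_x))
```

Because y* has no natural scale, rescaling it multiplies the slopes and SD(y*) by the same factor, so the standardized values do not depend on which variance is fixed.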


Dear Muthéns, I want to estimate a system of ordered probits. When I do that, MPLUS chooses WLSMV as the default estimator. But I am not really sure how the estimates are obtained. As far as I understand the WLS procedure, it first obtains the reduced form parameters (which here coincide with the model parameters) from single equation probits, then the correlation coefficients from bivariate probits, conditioned on the first stage estimates and in the final step the model parameters from a quadratic form. Since the model parameters coincide with the first stage reduced form parameters, the outcome of the last stage should be those estimates again, or am I wrong? Anyway, if I estimate each equation independently, the estimates differ slightly from the system estimates. I would be grateful for any help. 

BMuthen posted on Friday, June 24, 2005  2:04 am



The results will be different because the WLS estimator weights the sample statistics somewhat differently than what an ML solution would do. 


But it does not take cross-equation correlation into account, does it (unless I specify it with the WITH command)?


Please send the relevant outputs, data, and your license number to support@statmodel.com so we can see exactly the situation. This will not be looked at until after July 1. 

Anonymous posted on Friday, August 19, 2005  10:42 am



I have a 5-level ordinal main independent variable, a number of categorical endogenous variables, and a dichotomous outcome variable, and I have obtained the direct, indirect, and total effects and their significance for 4 levels of my main independent variable relative to a left-out category in a non-recursive path analytic model. Is there a way for me to obtain an overall categorical effect for this main independent variable and its level of significance?


I don't know of any way to consolidate the results from the four dummy variables. Perhaps the ANOVA literature or regression literature would have something. Maybe someone else knows. 

Anonymous posted on Tuesday, August 23, 2005  12:15 pm



I am not quite sure how to interpret the estimated probit coefficients. 1. Am I right that the "estimates" in the output (the dependent variable is binary; estimator: WLSMV) indicate an unstandardized amount of change in y* (and not in the manifest variable y) when x is increased by one unstandardized unit? In turn, does "StdYX" indicate the amount of standardized change in y* when x is increased by one standard deviation? If so, is it appropriate to use the standardized results to gauge relative strength, e.g., to say that a predictor x1 with a standardized estimate of -0.780 has a negative and, above all, bigger impact on y than a predictor x2 with an estimate of 0.040? 2. Is there a possibility in Mplus to transform those values into probabilities?

Anonymous posted on Tuesday, August 23, 2005  1:59 pm



An additional question: Is the estimated probit coefficient identical to the one computed by Stata? There, the probit coefficient indicates the change in the z-score of the variable y when x increases by one unit. And, as far as I know, the "raw" coefficients are not comparable to each other, i.e., the coefficients do not say anything about the magnitude of the effect. But judging from your answer of May 17, this does not seem to apply to the probit estimates of Mplus.

bmuthen posted on Tuesday, August 23, 2005  8:59 pm



1. It is correct to talk about the standardized values as you do here when you consider y*. 2. It is possible to transform the impact of changes in the x variables into changes in y probabilities, but the problem is that the probability change is different at different values of the x's, since the probability is a nonlinear (probit/logit) function of the x's. One can pick key x values (such as sample means) and study changes when moving away from them. Regarding the additional question, the change in the z-score is the same concept as the change in y* (y* is a z-score).
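The point that the probability change depends on where you start can be seen in a few lines of Python. This sketch assumes a binary probit with P(u = 1 | x) = Phi(-threshold + slope * x) and a single predictor; the threshold and slope are hypothetical. With several predictors, the ones not being varied would be held at key values (such as their sample means) inside the linear predictor.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_u1(x: float, threshold: float, slope: float) -> float:
    """P(u = 1 | x) for a binary probit: Phi(-threshold + slope * x)."""
    return normal_cdf(-threshold + slope * x)


def probability_change(x_from: float, x_to: float, threshold: float, slope: float) -> float:
    """Change in P(u = 1) when x moves from x_from to x_to."""
    return prob_u1(x_to, threshold, slope) - prob_u1(x_from, threshold, slope)


if __name__ == "__main__":
    threshold, slope = 0.0, 0.5  # hypothetical estimates
    # The same one-unit increase in x changes the probability by different amounts.
    print(probability_change(0.0, 1.0, threshold, slope))  # middle of the curve
    print(probability_change(3.0, 4.0, threshold, slope))  # out in the tail
```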

Anonymous posted on Thursday, August 25, 2005  10:23 am



Hi, I ran a non-recursive model in Mplus (X2 -> X3 and X3 -> X2) but found that X3 -> X2 was statistically significant while X2 -> X3 was not. Someone indicated that I might be getting this result because I failed to correlate the error terms of the variables with the reciprocal relationship. Is this correct? Will the syntax below correct for this (i.e., correlate the two variables)? X1 ON X2 X3; X2 ON X3 X4; X3 ON X2 X4 X5; X2 WITH X3;


The statement x2 WITH x3; specifies the residual covariance between x2 and x3. I am not sure that this explains your results. 

Anonymous posted on Friday, August 26, 2005  10:13 am



Hi Linda, Thanks for responding to my query posted on August 19, 2005 regarding a possible way to obtain the significance of the overall categorical effect for my 5-level main independent variable. I have consulted with a statistician, and he indicated that if I figured out how to obtain the asymptotic covariance matrix (i.e., the covariance matrix of the parameter estimates), he should be able to help. Is it possible for me to obtain the ACOV matrix in Mplus? How? Thanks


This is TECH3 of the OUTPUT command. 
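Once the estimates for the four dummy variables and the matching 4 x 4 block of the TECH3 covariance matrix are in hand, one standard way to test the overall categorical effect is a Wald test of the joint hypothesis that all four coefficients are zero: W = b' V^-1 b, referred to a chi-square distribution with 4 degrees of freedom. The Python sketch below assumes you have extracted those numbers yourself; the values in the usage part are hypothetical. The closed-form p-value used here is valid only for an even number of degrees of freedom, which covers the 4-dummy case.

```python
import math


def solve(matrix, vector):
    """Solve A z = v by Gaussian elimination with partial pivoting (small systems)."""
    n = len(vector)
    a = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (a[r][n] - sum(a[r][c] * z[c] for c in range(r + 1, n))) / a[r][r]
    return z


def chi2_sf_even_df(w, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = w / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def wald_test(estimates, acov):
    """Joint Wald test that all listed coefficients are zero: W = b' V^-1 b."""
    z = solve(acov, estimates)
    w = sum(b * zi for b, zi in zip(estimates, z))
    df = len(estimates)
    return w, df, chi2_sf_even_df(w, df)


if __name__ == "__main__":
    b = [0.30, 0.10, -0.20, 0.25]  # hypothetical dummy coefficients
    v = [
        [0.010, 0.002, 0.002, 0.002],
        [0.002, 0.010, 0.002, 0.002],
        [0.002, 0.002, 0.010, 0.002],
        [0.002, 0.002, 0.002, 0.010],
    ]  # hypothetical block of TECH3
    print(wald_test(b, v))
```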

Anonymous posted on Wednesday, September 07, 2005  5:14 am



Referring to the posting by Linda K. Muthen on Friday, February 08, 2002 - 01:58 pm: If a model consists of a continuous dependent variable (regressed on two factors), which in turn influences another dependent but dichotomous variable, do I obtain on the one hand an OLS (and not ML) estimated regression coefficient and on the other hand (for the binary outcome) a WLSMV estimated probit coefficient? So, are there in fact two estimators in use, OLS and WLSMV, one for each "stage" of the model?


Only the weighted least squares estimator is used. Simple linear regression coefficients can be estimated using many estimators not just OLS. 

Anonymous posted on Friday, September 09, 2005  9:42 am



Hi Linda, Thanks for your reply to my message posted on Friday, August 26, 2005  10:13 am. Now I am able to obtain the ACOV matrix. Is there a way to change the number of decimal places for the estimates? By default MPLUS uses 3 decimal places but I would like to get the information at a more precise level of significance. 


If you save TECH3 using the TECH3 option of the SAVEDATA command, the values are saved in an E15.8 format. 

Anonymous posted on Tuesday, September 20, 2005  3:14 pm



I ran 3 models in which I am interested in the effect of A on D, both directly (1. A on D only) and indirectly through (2. X1 only; 3. X1 and X2). I found that in model 1 the direct effect of A on D was .545 and statistically significant, and in models 2 and 3 its direct effect was .345 and .271, respectively, with both estimates statistically significant. My sample size changed from model 1 to model 3 due to missing data on variables X1 and X2. Can I compare the direct effects across the three models (e.g., report a 37% change in the magnitude of the direct effect from model 1 to 2)? Thanks.

Anonymous posted on Tuesday, September 20, 2005  4:06 pm



Re my message above, I am comparing the unstandardized coefficients. 


If I understand you correctly, you have different samples for the estimates that you want to compare. I don't think this would be correct. 

Anonymous posted on Wednesday, September 21, 2005  1:07 pm



Actually, for all three models I am using the same dataset, but some people are excluded from one model to the next because they have missing data on X1 and X2. That is, all the people in model 3 were also included in model 2, but model 2 has some people who weren't present in model 3. Is your response still the same?


To make comparisons, the observations need to be the same. You can include the x variables in the model by mentioning their variances. Then the observations will not be eliminated. Note that you then make distributional assumptions about the x variables.

Diana posted on Thursday, October 27, 2005  1:11 pm



Hi Drs. Muthen, regarding my message posted on May 24, 2005: "Hi Linda/Bengt; My original question was posted by Anonymous on Thursday, May 05, 2005 - 04:13 pm. I am using weighted data. If I use WLSMV the model runs fine and converges and all fit statistics indicate good fit. However, when I tried running this model using ML to get logits instead of probit coefficients the model won't run. Can you suggest why?" I finally got the model to run with the ML estimator by excluding all cases with missing data on the variables that I am using. However, I was trying to run a non-recursive model that couldn't be identified because the two endogenous variables that theoretically have a reciprocal relationship share the same predictors. Based on a posting on SEMNET by Rigdon on October 8, 2001, if the two variables in the reciprocal paths share the same predictors and have correlated error terms, then the model cannot be identified. Rigdon suggested that in such a situation one could use the correlated error term to model the reciprocal paths and discuss the limitations. For example (note: the reciprocal relationship is between y2 and y3, and all the endogenous variables are categorical and clearly specified in the model): y1 on a b c; y2 on a b y1; y3 on a b y1; y4 on a b c y2 y3; y2 with y3; When I run this model using probit regression (WLSMV) it runs fine, but when I try with ML (logit) I get the following error message: "covariance for categorical, censored, count or nominal variables with other observed variables are not defined. Problems with y2 with y3". Note that the logit model runs fine if I omit the y2 with y3 line in the model statement. How do I include the correlated error terms between y2 and y3 in the logit model? Thanks


Residual covariances for categorical outcomes are not supported in the maximum likelihood logistic regression model. If you want these parameters in your model, it is most straightforward to use WLSMV. If you want to include them using maximum likelihood, you would have to use a strategy such as is shown in Example 7.16, where a factor is used to capture the residual covariance.

Diana posted on Friday, October 28, 2005  12:00 pm



I am unable to obtain the indirect effect for each level of my main independent variable using the maximum likelihood estimation. I am assuming that the maximum likelihood logistic model does not support the estimation of the indirect effect. Is this correct or am I missing something? 


I'm not sure I understand your question. Please send your output and license number to support@statmodel.com so I can see what you mean. 

Jane posted on Monday, October 31, 2005  10:14 am



I ran a path model with WLSMV with a number of categorical endogenous variables. In the output there are rsq. associated with each endogenous variable. Are these rsq. interpreted the same as in a linear regression model (i.e. the model explains X% of endogenous variable B)? 

Diana posted on Monday, October 31, 2005  2:10 pm



1. The XWITH option of the MODEL command allows you to model interaction effects for continuous endogenous variables; can this also be done for categorical endogenous variables? 2. I ran a model: GROUPING is loc (0=urban 1=rural); MODEL: y1 ON A2 A3 A4 A5 B C; y2 ON A2 A3 A4 A5 y1 B; y3 ON A2 A3 A4 A5 y1 y2 B C; y4 ON A2 A3 A4 A5 y1 y2 B C; y5 on A2 A3 A4 A5 y1 y2 y3 y4; y3 with y4; model indirect: dep ind a2; dep ind a3; dep ind a4; dep ind a5; ANALYSIS: ESTIMATOR = WLSMV; PARAMETERIZATION = THETA; and obtained estimates for each of my endogenous variables (which, by the way, are all categorical) for each group (urban vs. rural). Is there any way to obtain betas for the main effect of my loc variable (i.e., beta0's for urban and rural)? 3. I tried getting logits instead of probits for the model above after removing "y3 with y4" from the model statement, because the residual covariance is not supported in the ML logistic model (your response above, Oct 27, 2005 3:35PM). To get logits in this modified model, I changed the estimator and parameterization to: ESTIMATOR = ML; PARAMETERIZATION = LOGIT; However, I keep getting the error message: "model indirect is not available with ALGORITHM = INTEGRATION". How do I solve this? 4. Based on your response on Oct 27, 2005 3:35PM, will the model statement RC by y3 y4; with the inclusion of RC in the equations for y3 and y4 work?


The R-square when outcomes are categorical is the R-square for the u* variable underlying the u variable, that is, the proportion of variance in u* explained by the model. It is interpreted the same as in a linear regression model.
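A small Python sketch of the R-square for u*: explained variance b' Cov(x) b divided by that plus the residual variance of u*. It assumes the residual variance is 1 for a probit model and pi^2/3 for a logit model; because the ratio is scale-free, the same answer results whichever variance is used to fix the scale. The inputs in the tests are hypothetical.

```python
import math


def r_square_ystar(slopes, cov_x, link="probit"):
    """R-square for the latent response u*: explained / (explained + residual).

    Residual variance of u* is taken as 1 for probit and pi^2 / 3 for logit.
    """
    k = len(slopes)
    explained = sum(
        slopes[i] * slopes[j] * cov_x[i][j] for i in range(k) for j in range(k)
    )
    residual = 1.0 if link == "probit" else math.pi ** 2 / 3
    return explained / (explained + residual)


if __name__ == "__main__":
    # Hypothetical probit slopes and covariance matrix of the predictors.
    print(r_square_ystar([0.5, -0.2], [[4.0, 0.6], [0.6, 1.0]]))
```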


1. I am a little confused because you start out with a question about XWITH, but you don't have XWITH in your MODEL command. XWITH is used to define interactions between continuous latent variables or between a continuous latent variable and an observed variable. The observed variable can be continuous or categorical. The distinction is not between continuous and categorical but between observed and latent. See the description of the XWITH command in the Mplus User's Guide. There is also a table that shows which methods to use for different types of interactions. 2. Urban and rural are categories of your grouping variable. If a model with a parameter held equal across groups does not fit worse than a model with the parameter free across groups, this would indicate no interaction, only a main effect. 3. MODEL INDIRECT is not available with ALGORITHM = INTEGRATION. So if your model requires this, you cannot use MODEL INDIRECT. 4. You can try this. I believe there are other specifications in the example I referred to.

Diana posted on Monday, October 31, 2005  3:56 pm



Thanks. Re point number 3, I am trying to point out that nowhere in the model statement did I specify ALGORITHM = INTEGRATION, yet I am getting this error message. What would I have to change in my model statement below to prevent this error message: GROUPING is loc (0=urban 1=rural); MODEL: y1 ON A2 A3 A4 A5 B C; y2 ON A2 A3 A4 A5 y1 B; y3 ON A2 A3 A4 A5 y1 y2 B C; y4 ON A2 A3 A4 A5 y1 y2 B C; y5 on A2 A3 A4 A5 y1 y2 y3 y4; y3 with y4; model indirect: dep ind a2; dep ind a3; dep ind a4; dep ind a5; ANALYSIS: ESTIMATOR = ML; PARAMETERIZATION = LOGIT; Thanks

Diana posted on Monday, October 31, 2005  4:00 pm



Re point 1, I guess my understanding of the XWITH command was incorrect. However, what I want to find out is how to go about modeling the interaction between 2 observed categorical endogenous variables. 

Diana posted on Monday, October 31, 2005  4:01 pm



Oops! Re point 1, I guess my understanding of the XWITH command was incorrect. However, what I want to find out is how to go about modeling the interaction between 2 observed categorical endogenous variables in a path model. 


Sometimes ALGORITHM=INTEGRATION is required and so it is turned on even if you don't ask for it. Interactions are used to predict. What do you mean by an interaction between two observed categorical endogenous variables in a path model? 

Anonymous posted on Thursday, November 03, 2005  2:08 pm



What does it mean when your direct and indirect effects are statistically significant (i.e. estimate/SE >>> 1.96) but your total effect is not? 

Anonymous posted on Thursday, November 03, 2005  4:06 pm



Even more interesting would be your thoughts on what to make of a situation in which your indirect effect is large and negative (-.325), your direct effect is large and positive (0.402), and therefore the total effect is small, positive, and not statistically significant (0.067)?

BMuthen posted on Thursday, November 03, 2005  5:44 pm



It would seem possible that x can have a negative effect on m, m can have a positive effect on y, and x can have a positive direct effect on y. In this case, you can have a zero total effect. I can't think of an application but apparently you have found one. 
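The situation described above (a negative a path, a positive b path, and a positive direct effect that roughly cancel) follows from the usual decomposition in a linear path model: each specific indirect effect is the product of the coefficients along its path, and the total effect is the direct effect plus the sum of the indirect effects. The Python sketch below illustrates this with hypothetical coefficients; for categorical mediators under probit/WLSMV the same products are formed in the latent y* metric, which is an assumption about interpretation rather than something this code checks.

```python
from math import prod


def decompose_effects(direct, indirect_paths):
    """Direct, specific indirect, and total effects of x on y in a linear path model.

    direct: coefficient of x in the equation for y.
    indirect_paths: one list per x -> ... -> y path, holding the coefficients along it.
    """
    specific = [prod(path) for path in indirect_paths]
    total_indirect = sum(specific)
    return {
        "direct": direct,
        "specific_indirect": specific,
        "total_indirect": total_indirect,
        "total": direct + total_indirect,
    }


if __name__ == "__main__":
    # Hypothetical values: x lowers m (a < 0), m raises y (b > 0), x raises y directly.
    a, b, c_prime = -0.5, 0.6, 0.35
    print(decompose_effects(c_prime, [[a, b]]))
```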

Asegid posted on Monday, January 23, 2006  12:52 pm



Hi Linda, I am running a multigroup path analysis with categorical dependent variables. My main interest is whether the relationship between the independent and dependent variable(s) is different across groups. Do I have to constrain the scale factors/thresholds at all? Thanks.


Regression slopes do not have to be fixed and freed with other parameters. 

asegid posted on Monday, January 23, 2006  4:36 pm



Thanks. To make my question clearer: I was wondering about the scale factor/threshold of the categorical dependent variable.


Actually, I was incorrect in my statement above, when a regression coefficient is free, the threshold should also be free and the scale factor fixed to one. 

Joan posted on Saturday, February 11, 2006  10:38 am



Hi Linda or Bengt, 1. How do you determine the sample size that is needed to have sufficient power to run SEM models? 2. If you use a very large sample size (N=60,000 and the number of parameters in your model is 70), how do you determine whether the significance of your estimates is meaningful and not simply due to the large sample size? When ML estimation is used, I am told that you can use the BIC criterion to adjust your estimates. Is there an equivalent way to adjust for a large sample size when your endogenous variables are categorical (ordinal) and you use the recommended WLSMV? Thanks.

Joan posted on Monday, February 13, 2006  5:13 am



Another quick question: do you report the standardized or unstandardized coefficients, and why? (I noticed that you suggested to one person above that you would report the unstandardized coefficients, but you did not indicate why.)


The following paper has some suggestions: Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620. You need to consider practical significance when you have a very large sample. For example, if you have an intervention that costs $1000 per person, how large an effect do you need to see to make it worthwhile: an increase of 1 on a measure or an increase of 10? You might also divide the sample and use the second part for cross-validation. BIC is only for maximum likelihood, not weighted least squares.
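The idea behind a Monte Carlo power study is to generate many samples from a population model with assumed parameter values, estimate the model in each sample, and record how often the parameter of interest is significant. The Python sketch below does this for a single regression slope estimated by OLS; it is a toy stand-in for the full SEM simulation described in the paper, and the population values (x and e standard normal, the slope, the sample sizes) are hypothetical choices.

```python
import math
import random


def simulate_power(n, slope, reps=500, seed=1, critical=1.96):
    """Monte Carlo power for testing slope = 0 in y = slope * x + e.

    Assumes x ~ N(0, 1) and e ~ N(0, 1); counts how often |estimate / SE| > critical.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [slope * xi + rng.gauss(0.0, 1.0) for xi in x]
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        b = sxy / sxx  # OLS slope estimate in this replication
        residual_ss = sum(
            (yi - mean_y - b * (xi - mean_x)) ** 2 for xi, yi in zip(x, y)
        )
        se = math.sqrt(residual_ss / (n - 2) / sxx)
        if abs(b / se) > critical:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Power for a hypothetical slope of 0.2 at several candidate sample sizes.
    for n in (50, 100, 200):
        print(n, simulate_power(n, slope=0.2))
```

Running the same function with slope=0.0 checks that the rejection rate stays near the nominal 5% level, which is part of what a Monte Carlo study should confirm before power is trusted.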

Joan posted on Wednesday, February 15, 2006  8:52 pm



I posted this question above on Feb 13, 2006 but received no response, so I thought I would try again: Do you report the standardized or unstandardized coefficients, and why? (I noticed that you suggested to one person above that you would report the unstandardized coefficients, but you did not indicate why.) In addition, how do you determine which of your endogenous variables is having the greatest effect on the pathway from your main independent variable to the outcome of interest? Do you just compare the magnitude of the effect for the different endogenous variables? Is this the same whether standardized or unstandardized estimates are used? Thanks

bmuthen posted on Thursday, February 16, 2006  6:28 am



I think unstandardized coefficients and their standard errors should be the main part of reporting results because they are the parameters in the model. In addition, standardized coefficients are typically reported to help gauge effect sizes. I don't understand your question about endogenous variables having an effect on a pathway. Here is how your description sounds to me: You have an x (main independent variable), a y (outcome of interest), and a z (endogenous variable). You say that z influences the influence of x on y. That sounds like x and z interacting in their influence on y, that is, a moderating effect. But I am not sure if that is what you are asking.


Hello, I am running a path analysis which consists of one categorical independent variable (family structure), three continuous mediators (relationship with mother, relationship with father, and monitoring), and a continuous dependent variable (antisocial behavior). The categorical independent variable consists of four different family structures (two-parent, single-mother, single-father, and stepfather families), which is dummy coded into a set of three independent variables with two-parent families as the reference group. Analyses show that single-mother, single-father, and stepfather families have a higher risk of antisocial behavior than their counterparts in two-parent families. I wonder if it is legitimate to examine whether the different mediating variables mediate the increased risk of antisocial behavior for the different family structures (in the same analysis) and to examine direct and indirect effects, etc. I am asking because by doing such an analysis I am really mixing ANOVA with path analysis in a way that I have not seen in the literature. Thanks in advance


Yes, it would be appropriate to examine direct and indirect effects. I think this is often done in path analysis. 


Hello, I have a model like this: f1 by x1 x2 x3; f2 by x3 x4 x5; y on f1 f2; I want to get the relationship between y and the x's (indirect path relation, covariance, or correlation). I found a web link, http://luna.cas.usf.edu/~mbrannic/files/regression/SEM.html, which says that the correlation can be obtained from the path coefficients (example just before the section "Causal Modeling Revisited") regardless of the path directions. In my case, the relation is f1 to x1 and f1 to y (i.e., x1 <- f1 -> y). 1. Can I get the correlation by simply multiplying the coefficients anyway? 2. If that is OK, since the coefficients are greater than 1, should I use STDYX instead? Thanks and happy new year!


No, you cannot get the correlation between y and x this way. You don't have an indirect effect from x to y as you show. To obtain correlations between the x's and y, you need to put a factor behind each x and use the factors as factor indicators. The correlations would be found in TECH4. For continuous outcomes, you would do this as follows: f1 BY x1; x1@0; For categorical outcomes, you would do this as follows: f1 BY x1; 

Boliang Guo posted on Tuesday, January 30, 2007  2:51 am



Good morning, Prof. Muthen. My question is: can logistic and probit regression with the same data lead to different inferences? From the output, I can say that the ENVIRO1 effect is significant in the logistic but not in the probit regression, while PROS1 has a nonsignificant effect in the logistic but a significant effect in the probit regression. I think the slopes are not the same because they are on different scales, but the significance levels should be the same, am I right? Here is the output:
Logistic regression (ON)
             Estimates    S.E.   Est./S.E.
  ENVIRO1      0.223     0.080     2.789**
  PROS1        0.007     0.116     0.059
  CONS1        0.548     0.103     5.296
  TEMPT1       0.759     0.104     7.326
Probit regression (ON)
             Estimates    S.E.   Est./S.E.
  ENVIRO1      0.027     0.044     0.626
  PROS1        0.273     0.052     5.260**
  CONS1        0.384     0.050     7.711
  TEMPT1       0.440     0.037    11.929


This can happen. One situation where this may happen is with variables with floor or ceiling effects. The reason is that the normal and logistic distributions are different in the tails. 
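The tail difference can be checked numerically. After rescaling by the conventional factor of about 1.7 (which is also why logit coefficients are typically around 1.7 times probit coefficients), the logistic and normal CDFs are very close in the middle of the distribution, but their tail probabilities differ by a large relative amount. The Python sketch below compares them on a grid; the grid range and the 1.702 factor are conventional choices, not anything Mplus-specific.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (probit)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Standard logistic CDF (logit)."""
    return 1.0 / (1.0 + math.exp(-z))


SCALE = 1.702  # conventional factor: logit coefficient is roughly 1.7 times the probit one


def max_abs_difference(lo=-6.0, hi=6.0, steps=12001):
    """Largest gap between Phi(z) and the rescaled logistic CDF on a grid."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)


def lower_tail_ratio(z):
    """Rescaled logistic lower-tail probability divided by the normal one."""
    return logistic_cdf(SCALE * z) / normal_cdf(z)


if __name__ == "__main__":
    print(max_abs_difference())  # small absolute gap over the whole range
    for z in (-1.0, -2.0, -3.0):
        print(z, lower_tail_ratio(z))  # relative gap grows far out in the tail
```

With many observations piled up at a floor or ceiling, the likelihood is driven by these tail regions, which is one way the two links can yield different significance results for the same predictor.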

Boliang Guo posted on Friday, February 02, 2007  4:47 am



Thanks, Linda. One more question about path analysis with a categorical Y. In Example 3.12, if we use U3, X1, and X2 and model X1 -> X2 -> U3: if I use only U3 on x1 x2, I get (Estimates, S.E., Est./S.E.) X2 1.396 0.100 13.903 and X1 0.798 0.073 10.894, and with only X2 on x1, X1 0.034 0.043 0.793. If I use X2 on x1; u3 on x1 x2; I get X2 0.795 0.025 31.455 and X1 0.462 0.044 10.552, and X2 on X1 is almost the same as above. My question is: since the path analysis is equivalent to two single equations (U3 on X1 X2 and X2 on X1), why are the estimates for the U3 equation different in the path analysis and in the single equation?


I would need to see more information to answer this, for example, estimator, sample size, etc. Please send your inputs, data, outputs, and license number to support@statmodel.com. 

Oinas Tomi posted on Tuesday, February 19, 2008  12:35 am



Hello, I am planning to use Mplus for my dissertation analyses and would like to ask a few questions concerning path analysis with categorical variables in Mplus. My situation is similar to Kyrre Breivik's: a categorical (nominal, 5 categories) independent variable X, a continuous mediating variable M, and a continuous dependent variable Y. My question is how I can run the analyses so that the effect (direct and indirect) of each of the 5 categories of the independent variable is compared with all other categories of this variable. There is no single reference category; instead, I would like to compare each category to the other categories. Do I have to run separate analyses for each comparison? Could you show me the proper Mplus code for these analyses, and do you know of any examples of this kind of path analysis in articles, etc.? Thank you in advance


I would check a regression text to see the various types of coding that can be used for nominal independent variables. Once you decide on the type of coding that is appropriate for your research questions, you can use the DEFINE command to create the variables. 
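To illustrate what different codings look like before they are created with DEFINE, here is a Python sketch of dummy (reference-cell) coding and effect (deviation) coding for a 5-category variable, plus a contrast between two non-reference categories. With dummy coding, the difference between two dummy coefficients compares those two categories directly, and its standard error needs the variances and covariance of the two estimates (for example from TECH3). The function names and all numbers are hypothetical and for illustration only.

```python
import math


def dummy_code(value, categories, reference):
    """Reference-cell (dummy) coding: one 0/1 indicator per non-reference category."""
    return [1 if value == c else 0 for c in categories if c != reference]


def effect_code(value, categories, reference):
    """Effect (deviation) coding: like dummy coding, but the reference gets -1 everywhere."""
    if value == reference:
        return [-1 for c in categories if c != reference]
    return dummy_code(value, categories, reference)


def contrast(estimates, acov, j, k):
    """Difference b_j - b_k between two dummy coefficients and its standard error.

    Under dummy coding, b_j - b_k compares non-reference categories j and k directly.
    """
    difference = estimates[j] - estimates[k]
    se = math.sqrt(acov[j][j] + acov[k][k] - 2.0 * acov[j][k])
    return difference, se


if __name__ == "__main__":
    cats = [1, 2, 3, 4, 5]
    for v in cats:
        print(v, dummy_code(v, cats, reference=1), effect_code(v, cats, reference=1))
```

An alternative that gives the same comparisons without hand calculation is to rerun the model with a different reference category.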


Dear Bengt, In a response to a poster's query, you wrote on 02/16/06 regarding reporting results that "I think unstandardized coefficients and their standard errors should be the main part of reporting results because they are the parameters in the model. In addition, standardized coefficients are typically reported to help gauge effect sizes." I have followed this same approach to reporting SEM results for years. Recently a paper on which a colleague of mine and I worked was returned for revision and resubmission by a journal. Two of the reviewers requested that additional effect size information be included from a path analysis we conducted with continuous and ordered categorical variables using Mplus. We had originally reported the unstandardized estimates, their 95% confidence intervals, and the corresponding standardized estimates, consistent with what you recommended in your posting above. In our revised manuscript, we can add R-square estimates for all endogenous variables in the analysis. Is there any other effect size estimate aside from the R-squares for endogenous variables and the standardized path coefficients that you would recommend that Mplus users report from their analyses? Thanks so much, Tor Neilands


I can't think of anything else.


Hello, I am very new to Mplus, and I already like it. However, I have encountered an error message for my model: The input file does not contain valid commands. The model is a path analysis without latent variables, so everything is observed. The ultimate dependent variable and some of the intervening variables are binary coded. Following the examples in the user's guide, I typed the following input script: TITLE: This is for a path model DATA: FILE IS votechoice.dat ; VARIABLE: NAMES ARE voted spkeng chatt whifri polint south SES civeng latiso age ; USEVARIABLES ARE voted polint civeng whifri SES spkeng chatt south latiso age ; CATEGORICAL ARE voted spkeng chatt whifri polint ; Missing are all (9999) ; ANALYSIS: type=general; estimator=ml ; MODEL: latiso WITH south ; south WITH age ; SES WITH spkeng ; voted ON latiso SES south age spkeng polint civeng spkeng chatt ; polint ON whifri latiso SES south age spkeng chatt civeng ; whifri ON latiso SES south age spkeng chatt ; SES ON latiso south age ; spkeng ON latiso south age ; chatt ON SES spkeng latiso south age ; civeng ON whifri SES latiso spkeng south age chatt ; Output:Standardized ; Where am I going wrong in this input? Is it because I have indirect effects but they are modeled in the input file? Thanks for all the help.


This usually means you have copied an input from an output and left INPUT INSTRUCTIONS above TITLE. If not, please send your files and license number to support@statmodel.com. 
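A small Python sketch of the check described above, assuming the pasted input begins with the literal header line INPUT INSTRUCTIONS copied from an output file. The helper name strip_input_instructions is hypothetical and not part of Mplus; it only removes that header (and the blank lines around it), so the file starts at TITLE: or whichever command comes first.

```python
def strip_input_instructions(text: str) -> str:
    """Remove a leading 'INPUT INSTRUCTIONS' header copied from an Mplus output.

    Hypothetical helper: only the header line and blank lines around it are
    dropped; everything from the first real command onward is kept as-is.
    """
    lines = text.splitlines()
    i = 0
    # Skip blank lines at the top of the file.
    while i < len(lines) and not lines[i].strip():
        i += 1
    # If the first non-blank line is the output header, drop it and any blanks after it.
    if i < len(lines) and lines[i].strip().upper() == "INPUT INSTRUCTIONS":
        i += 1
        while i < len(lines) and not lines[i].strip():
            i += 1
    return "\n".join(lines[i:])


if __name__ == "__main__":
    pasted = "INPUT INSTRUCTIONS\n\nTITLE: This is for a path model\nDATA: FILE IS votechoice.dat;"
    print(strip_input_instructions(pasted))
```

An input that already starts with a command is returned unchanged apart from leading blank lines.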


Dear Professors Muthen, I'm running a path analysis with categorical variables that have between 5 and 8 categories each. For example, the variable WEALTH has only 5 categories. However, I got the error message shown below. Could you please tell me what I'm doing wrong?
DATA: FILE IS "\tsclient\E\prueba\geoses.dat";
VARIABLE: NAMES ARE region area resedu wealth paroccu resoccu partedu;
USEVARIABLES ARE resedu wealth paroccu resoccu partedu;
CATEGORICAL ARE resedu wealth paroccu resoccu partedu;
MODEL: resedu paroccu resoccu partedu on wealth;
fses by wealth;
*** ERROR Categorical variable WEALTH contains 14 categories. This exceeds the maximum allowed of 10.


Please send your input, data, output, and license number to support@statmodel.com. 
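Before sending files, one quick way to see why Mplus reports 14 categories for a variable you expect to have 5 is to count the distinct values actually present in that column. The Python sketch below assumes a whitespace-delimited free-format file with one record per line and no header row; the function names are hypothetical. A missing-data code that is not declared with the MISSING option shows up as just another value in this count.

```python
from collections import Counter

MAX_CATEGORIES = 10  # limit quoted in the Mplus error message above


def _normalize(token: str):
    """Treat '1' and '1.0' as the same value; keep non-numeric tokens such as '.'."""
    try:
        return float(token)
    except ValueError:
        return token


def category_counts(rows, names):
    """Tally the distinct values of each named column in whitespace-delimited rows."""
    counts = {name: Counter() for name in names}
    for row in rows:
        for name, token in zip(names, row.split()):
            counts[name][_normalize(token)] += 1
    return counts


def too_many_categories(rows, names, limit=MAX_CATEGORIES):
    """Return {variable: number_of_distinct_values} for columns above the limit."""
    return {name: len(c) for name, c in category_counts(rows, names).items() if len(c) > limit}


if __name__ == "__main__":
    # Made-up data: the second column takes 14 distinct values.
    rows = [f"1 {k}" for k in range(14)]
    print(too_many_categories(rows, ["region", "wealth"]))
```

In practice, rows would be the lines of your own data file, and names the variable list in the order given on NAMES ARE.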


Hello, I have Mplus v.5 and I am doing a path analysis with 4 observed variables, as follows:
chd at time 1 (dummy: no/yes)
depression changes (t1 and t2), 4 categories
qol3 at time 3, continuous
qol1 at time 1, continuous
I wrote the model as follows:
categorical is depression;
Model: qol1 on chd;
depression on chd qol1;
qol3 on chd depression qol1;
Analysis: Type=general;
Parameterization = theta;
Model indirect: qol3 ind chd1;
Output: STDY STDYX;
I have 3 questions:
1) Which coefficients should I look at, STDYX or STDY? Does it depend on whether the variable is categorical or continuous?
2) In the output of the direct and indirect effects, does Mplus calculate these estimates by multiplying a path coefficient estimated from a logistic regression with one obtained from a linear regression?
3) In the linear regression, is my dummy dependent variable treated in the same way as a continuous dependent variable? Thanks


1. See the STANDARDIZED option in the user's guide, where this is discussed.
2. Yes, for weighted least squares estimation.
3. If a variable is not on the CATEGORICAL list, it is treated as a continuous variable.


Thank you. For question 3) I meant to say a dummy (0/1) independent variable x that predicts a continuous dependent variable y. Is the coefficient the increase in y when x goes from 0 to 1, as in any standard linear regression?


In regression, covariates can be binary or continuous. In both cases, they are treated as continuous. 
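To illustrate the answer to the 0/1 question above: in a simple least-squares regression of a continuous y on a 0/1 covariate, the slope equals the mean of y when x = 1 minus the mean of y when x = 0. The Python sketch below uses made-up numbers; with other covariates in the model, the coefficient is that difference holding the other covariates constant.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on a single covariate x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def group_mean_difference(x, y):
    """Mean of y for x == 1 minus mean of y for x == 0."""
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    return mean(y1) - mean(y0)


x = [0, 0, 0, 1, 1, 1]              # dummy covariate
y = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # continuous outcome (made-up values)
print(ols_slope(x, y), group_mean_difference(x, y))  # both 3.0
```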


Dear Linda, I am not sure which coefficients I should report for my results, given that I have one binary independent variable regressed on an ordinal dependent variable, and this ordinal variable regressed on a continuous independent variable. Under the STANDARDIZED option, the user's guide says that for a binary independent variable STDY should be used rather than STDYX. However, because I have one categorical dependent variable and I am using WLSMV, I cannot obtain STDY. In this case, what should I do?


Use STDY for the binary covariate. Take STDY for the continuous covariate and multiply it by the standard deviation of that covariate to obtain STDYX. 
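The rescaling in this answer follows from the definitions: for a coefficient b, STDY = b / SD(y) and STDYX = b * SD(x) / SD(y), so STDYX = STDY * SD(x) and, going the other way, STDY = STDYX / SD(x). With a categorical outcome, SD(y) refers to the latent response variable, but it is the same in both quantities, so only the covariate's standard deviation is needed for the conversion. A minimal Python sketch with made-up numbers; sd_x stands for the standard deviation of the covariate in your own data.

```python
def stdy(b, sd_y):
    """Standardize with respect to y only."""
    return b / sd_y


def stdyx(b, sd_x, sd_y):
    """Standardize with respect to both x and y."""
    return b * sd_x / sd_y


def stdyx_from_stdy(stdy_value, sd_x):
    """STDYX = STDY * SD(x)."""
    return stdy_value * sd_x


def stdy_from_stdyx(stdyx_value, sd_x):
    """STDY = STDYX / SD(x), e.g. for a binary covariate when only STDYX is printed."""
    return stdyx_value / sd_x


b, sd_x, sd_y = 0.4, 0.5, 2.0  # made-up values
print(stdy(b, sd_y), stdyx(b, sd_x, sd_y))  # 0.2 0.1
print(stdyx_from_stdy(stdy(b, sd_y), sd_x), stdy_from_stdyx(stdyx(b, sd_x, sd_y), sd_x))
```

For a 0/1 covariate with proportion p, the population-formula standard deviation is sqrt(p * (1 - p)); the value Mplus uses internally may differ slightly depending on the divisor.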


So I have a binary on a categorical, a binary on a continuous, and a categorical on a continuous. When I ask for STDY in the OUTPUT command, I get the error message "STDY option is not available for analysis with categorical outcomes and estimators WLS, WLSM, WLSMV or ULSMV. Request for STDY is ignored." So I can only get STDYX, which seems not to be correct. What can I do? Should I not report the standardized results?


A dependent variable is regressed on an independent variable. The way you describe the regressions it is not clear what you mean. Please send your output and license number to support@statmodel.com. 


I have run a path analysis with one categorical (dichotomous) endogenous variable using Analysis: Estimator = ML, but I do not receive standardized coefficients. However, running the analysis to obtain indirect effects with Analysis: Bootstrap 5 5000 provided standardized coefficients. Are the standardized coefficients given by the indirect-effects output interpretable for all coefficients (categorical and continuous variables)? What is the best method to test model fit with categorical data? Any information on reporting results with observed categorical endogenous variables would be very helpful!
Syntax for the direct effects:
INPUT INSTRUCTIONS
Variable: NAMES ARE y1 u1 u2 z1 z2;
USEVARIABLES y1 u1 u2 z1 z2;
CATEGORICAL IS z2;
Analysis: Estimator = ML;
Model: z2 ON u2 z1;
y1 ON u2 u1 z1 z2;
Output: standardized; SAMPSTAT CINTERVAL TECH1;
Syntax for the indirect effects:
INPUT INSTRUCTIONS
Variable: NAMES ARE y1 u1 u2 z1 z2;
USEVARIABLES y1 u1 u2 z1 z2;
CATEGORICAL IS z2;
Analysis: Bootstrap 5 5000;
Model: z2 ON u2 z1;
y1 ON u2 u1 z1 z2;
Model Indirect: y1 ind z2 u2;
y1 ind z2 u1;
Output: standardized; stdyx; SAMPSTAT CINTERVAL TECH1; CINTERVAL (BCBOOTSTRAP);


Please send the relevant outputs and your license number to support@statmodel.com. 


Dear Dr. Muthen, I am running a two-level SEM. I fitted my measurement part with no problem and then added my level-1 variables. However, when I added my first between-level variable, I got this error: "*** FATAL ERROR THE WEIGHT MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE." I tried the other level-2 variables one by one, and all gave me the same error. Right now I am trying to fix my residual variances to zero to see if that helps. Would you have any other suggestions? Thank you! This is the model:
CLUSTER = sch;
WEIGHT = mp_wt_w2;
within = female zaage zses ahisp aafr aasian aoth su1 ap1;
between = e1;
ANALYSIS: TYPE=TWOLEVEL; ESTIMATOR=WLSMV;
!integration=montecarlo; processors=3;
model:
%within%
suW2 by smoke2 nsmoke2 alc2 nmar2;
smoke2 with nsmoke2;
apW2 by egpaw2a mgpaw2a hsgpaw2a sgpaw2a;
apW2 on female zaage zses ahisp aafr aasian aoth su1 ap1;
suw2 on female zaage zses ahisp aafr aasian aoth su1 ap1;
su1 on female zaage zses ahisp aafr aasian aoth;
ap1 on female zaage zses ahisp aafr aasian aoth;
su1 with ap1;
%between%
suB2 BY smoke2 nsmoke2 alc2 nmar2;
apB2 BY egpaw2a mgpaw2a hsgpaw2a sgpaw2a;
suB2 on e1;
apB2 on e1;


Fixing the residual variances to zero on between is a good idea. If that does not work, send the output and your license number to support@statmodel.com. 


Hello Dr. Muthen, We are modeling a path from age (defined using a five-category variable) through chronic conditions (eight separate binary variables) to health care costs (which have been logged). We are wondering if there is any way to convert the estimates generated by Mplus into dollar values to facilitate interpretation, for example the mean health care cost for the eldest group compared with the youngest group. Thanks in advance.


It sounds like you want to change a coefficient estimated for log(cost) to a coefficient suitable for cost. I assume you can do that by exponentiating the coefficient. In Mplus, you label the coefficient in the MODEL command, like logcost ON chronic (b); and then in MODEL CONSTRAINT you say NEW(expb); expb = exp(b); so that you get the estimate and SE of the new coefficient expb.
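A Python sketch of what the MODEL CONSTRAINT lines compute, assuming b is the coefficient for log(cost) and se_b its standard error (the values below are made up). The standard error shown is the first-order delta-method approximation, SE(exp(b)) approximately equal to exp(b) * SE(b). Because the outcome is log cost, exp(b) is a multiplicative effect (a ratio of geometric-mean costs), not a dollar difference; turning it into dollars needs a reference cost level, which is a hypothetical input here.

```python
import math


def exp_coefficient(b, se_b):
    """Return exp(b) and its delta-method standard error exp(b) * SE(b)."""
    expb = math.exp(b)
    return expb, expb * se_b


def dollar_difference(expb, reference_cost):
    """Implied cost difference relative to a reference (geometric-mean) cost.

    reference_cost is a hypothetical input, e.g. the youngest group's
    geometric-mean cost; it is not something the model above estimates.
    """
    return reference_cost * (expb - 1.0)


expb, se_expb = exp_coefficient(0.5, 0.1)  # made-up estimate and SE
print(round(expb, 4), round(se_expb, 4))  # 1.6487 0.1649
print(round(100 * (expb - 1), 1), "% higher cost")
print(round(dollar_difference(expb, 1000.0), 2))
```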


Thanks, Bengt. I was more concerned about the mediated path and the assumption of underlying latent variables associated with the mediating dichotomous variables. Is the interpretation the same as for the direct path? Can I use the same method for the indirect effect to estimate the change in dollar amount? Thanks


If your analysis works with latent continuous response variables underlying the binary mediators so that these are used as the predictors of the cost outcome, then the usual formulas apply. This would be the case with the WLSMV estimator or the Bayes estimator, but not with ML. With ML, the observed binaries are the predictors and you would have to use the formulas of my "Causal effects" paper. 
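When the mediator enters as a continuous latent response variable (the WLSMV or Bayes case in this answer), the usual product-of-coefficients formula applies: with a the effect of a predictor on the latent response and b the effect of that latent response on log cost, the indirect effect on the log-cost scale is a * b, which can be exponentiated like a direct coefficient. A minimal Python sketch with made-up estimates; the standard error is the first-order delta-method approximation, and cov_ab (the sampling covariance of a and b) defaults to zero only for illustration. For a given age dummy with eight parallel binary mediators, the total indirect effect would be the sum of the eight products.

```python
import math


def indirect_effect(a, se_a, b, se_b, cov_ab=0.0):
    """Product-of-coefficients indirect effect a*b with a delta-method SE.

    a: predictor -> latent response of the mediator; b: latent response -> log(cost).
    Var(a*b) is approximately b^2 Var(a) + a^2 Var(b) + 2ab Cov(a, b).
    """
    ab = a * b
    var_ab = b**2 * se_a**2 + a**2 * se_b**2 + 2 * a * b * cov_ab
    return ab, math.sqrt(var_ab)


def total_indirect(pairs):
    """Sum of a_k * b_k over parallel mediators for one predictor."""
    return sum(a * b for a, b in pairs)


ab, se_ab = indirect_effect(a=0.5, se_a=0.1, b=0.4, se_b=0.2)  # made-up values
print(round(ab, 3), round(se_ab, 4), round(math.exp(ab), 4))
```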


Thank you. 

Dale Hardy posted on Tuesday, October 13, 2015  8:31 pm



I would like to run path analysis models stratified on BMI and WHR (waist-to-hip ratio) categories: normal weight and overweight. Can you show me how to do this? I am including STRATIFICATION = BMI (or whr), but I don't think the model is giving me statistics for normal weight/obese, or for those with a high WHR and a normal WHR. Please help. Dale


STRATIFICATION = captures the sampling design of a stratified sample. I think you just want to use USEOBSERVATIONS = x, where x is a condition that selects a subset of people, for example USEOBSERVATIONS = bmi gt 25;
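USEOBSERVATIONS keeps only the records that satisfy the condition, so getting results for both groups means running the model twice, once per subset (the complementary condition would presumably be written bmi le 25). The Python sketch below mimics the two selections on made-up records, mainly to show the boundary: someone with a BMI of exactly 25 is excluded by gt 25 and included by le 25. Leaving records with missing BMI out of both subsets is a choice of this sketch, not a statement about how Mplus handles them.

```python
def split_by_cutoff(records, key="bmi", cutoff=25.0):
    """Mimic the selections 'bmi gt 25' and 'bmi le 25' on a list of dicts.

    Records whose value is missing (None or absent) are left out of both
    subsets in this sketch.
    """
    observed = [r for r in records if r.get(key) is not None]
    above = [r for r in observed if r[key] > cutoff]         # bmi gt 25
    at_or_below = [r for r in observed if r[key] <= cutoff]  # bmi le 25
    return above, at_or_below


people = [{"id": 1, "bmi": 22.0}, {"id": 2, "bmi": 25.0},
          {"id": 3, "bmi": 31.5}, {"id": 4, "bmi": None}]
over, normal = split_by_cutoff(people)
print([p["id"] for p in over], [p["id"] for p in normal])  # [3] [1, 2]
```

The same function works for waist-to-hip ratio by passing key="whr" and the relevant cutoff.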

m fatemi posted on Monday, September 04, 2017  2:24 am



I want to do a mediation analysis in the following situations:
1. One predictor (x: quantitative), one response (y: quantitative), and one mediator (m: categorical with 4 levels). I'm interested in the direct effect of x on y and the indirect effect.
2. Five predictors (x1, x2, x3, x4, x5: quantitative), one response (y: quantitative), and one mediator (m: categorical with 4 levels). I'm interested in the direct effect of x on y and the indirect effect.
Which model is best in Mplus? I couldn't find anything on the case where the mediator is ordinal.
Do these commands work for this purpose?
VARIABLE: NAMES ARE X M Y;
usevariables are X M Y;
CATEGORICAL ARE M;
MISSING ARE .;
ANALYSIS: ESTIMATOR IS WLSMV;
bootstrap = 1000;
model: Y on X M;
M on X;
model indirect: Y ind M X;
output: tech1 tech8 sampstat patterns cinterval(bootstrap);
plot: type = plot3;

m fatemi posted on Monday, September 04, 2017  2:27 am



I forgot to mention: the categorical mediator is ordinal.


Use WLSMV, which defines the indirect effect as going via the continuous latent response variable underlying the ordinal mediator.
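A minimal Python sketch of the latent-response idea behind this answer, assuming a probit setup with a standard normal residual (Mplus's exact scaling depends on the parameterization): the ordinal mediator M is obtained by cutting a continuous M* = a * x + e at increasing thresholds, and Y is regressed on M*. The indirect effect of x on Y is then the single product a * b; the categories of M only reflect where M* is cut, which is why there is no separate indirect effect per category. The thresholds and coefficients below are made up.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(x, a, thresholds):
    """P(M = k | x) when M* = a*x + e, e ~ N(0, 1), is cut at the given thresholds."""
    mean = a * x
    cuts = [-math.inf, *thresholds, math.inf]
    return [normal_cdf(hi - mean) - normal_cdf(lo - mean)
            for lo, hi in zip(cuts[:-1], cuts[1:])]


a, b = 0.8, 0.5                # x -> M*, M* -> Y (made-up)
thresholds = [-1.0, 0.0, 1.0]  # 3 thresholds give a 4-category M
indirect = a * b               # one indirect effect, whatever category M falls in
for x in (0.0, 1.0):
    print(x, [round(p, 3) for p in category_probs(x, a, thresholds)])
print("indirect effect:", indirect)
```

Raising x shifts probability toward the higher categories of M, but the effect transmitted to Y is summarized by the one number a * b.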

m fatemi posted on Monday, September 04, 2017  9:04 pm



Thanks, I did it. One other question: if I want to see the indirect effect of X on Y at each level of the ordinal mediator, what should I do? That is, is it possible to have a plot that shows these effects simultaneously (the indirect effect in each category of the mediator)? On the other hand, if my mediator is nominal, what is the solution to the problem I described in my previous post?


No, you can't get an indirect effect for each category of the ordinal mediator. The indirect effect "collapses/integrates" over the mediator, so particular categories are not relevant. The same holds for a nominal mediator; mediation with a nominal mediator is discussed in our RMA book.
