Benedetta posted on Wednesday, February 21, 2001 - 9:21 am
I read that Mplus has been greatly expanded in version 2. I have the first version and I'm very interested in it. I would like to know whether you use a full maximum likelihood approach to estimate multilevel structural equation models with binary and/or ordered categorical indicator variables, and which type of approach. I would also like to know whether factor scores are available for this type of model.
Mplus does estimate factor scores for ordered polytomous variables. You request them through the SAVEDATA command.
Per Togo posted on Tuesday, May 20, 2003 - 8:10 pm
I have used a CFA model with three factors and categorical variables (8 categories) x BY a b c d; y BY e d f g; z BY h i j k;
I could not find a description of how the program computes factor scores for variables with more than two categories (using CATEGORICAL, SAVEDATA, FSCORES). (I'm not that experienced in matrix calculations and need a description for a paper.) Can you help me out?
Appendix 11 of the Mplus User's Guide contains a description of how factor scores are estimated in Mplus. The description for binary variables which begins with formula 227 applies also to polytomous variables. For categorical outcomes, factor score estimation is an iterative technique.
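To make the "iterative" part concrete, here is a minimal Python sketch of a maximum a posteriori (MAP) factor score for a single factor with ordered-categorical probit indicators. It is a generic illustration under stated assumptions (standard normal factor, latent responses scaled to unit variance so each residual variance is 1 - loading**2, hypothetical loadings and thresholds), not a transcription of the Appendix 11 formulas; the golden-section search simply stands in for whatever iterative optimizer Mplus uses.

```python
from math import inf, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def category_prob(y, eta, loading, thresholds):
    """P(y | eta) for one ordered item coded 0..C-1 under a probit model.

    Assumption: the latent response y* = loading * eta + e has unit variance,
    so the residual standard deviation is sqrt(1 - loading**2).
    """
    resid_sd = sqrt(1.0 - loading ** 2)
    cuts = [-inf, *thresholds, inf]
    upper = STD_NORMAL.cdf((cuts[y + 1] - loading * eta) / resid_sd)
    lower = STD_NORMAL.cdf((cuts[y] - loading * eta) / resid_sd)
    return max(upper - lower, 1e-300)  # guard against log(0) far in the tails


def log_posterior(eta, responses, loadings, thresholds):
    """Log-likelihood of the response pattern plus a N(0, 1) log-prior (up to a constant)."""
    loglik = sum(
        log(category_prob(y, eta, lam, tau))
        for y, lam, tau in zip(responses, loadings, thresholds)
    )
    return loglik - 0.5 * eta ** 2


def map_factor_score(responses, loadings, thresholds, lo=-6.0, hi=6.0, tol=1e-8):
    """Maximize the (concave) log-posterior iteratively by golden-section search."""
    ratio = (sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    while b - a > tol:
        if log_posterior(c, responses, loadings, thresholds) > log_posterior(
            d, responses, loadings, thresholds
        ):
            b, d = d, c  # maximum lies in [a, d]
            c = b - ratio * (b - a)
        else:
            a, c = c, d  # maximum lies in [c, b]
            d = a + ratio * (b - a)
    return (a + b) / 2.0


# Hypothetical example: four 3-category items with equal loadings and thresholds
loadings = [0.7, 0.7, 0.7, 0.7]
thresholds = [(-0.5, 0.5)] * 4
for pattern in ([0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 1, 2]):
    print(pattern, round(map_factor_score(pattern, loadings, thresholds), 3))
```

Each response pattern gets its own optimization, which is why factor scores for categorical outcomes cannot be written as a single closed-form matrix product the way they can for continuous indicators.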
Anonymous posted on Monday, June 02, 2003 - 1:17 pm
I would like to know how Mplus estimates the thresholds.
I'm a little confused. You say you have both observed and latent independent variables. Are the factor indicators for the latent independent variables continuous or categorical? What is the dependent variable in the model? Is it observed or latent? If it is observed, the statement in the MODEL command would be:
y1 ON x1 f1;
where y1 is an observed binary or ordered categorical variable, x1 is an observed independent variable, and f1 is a latent independent variable. You would also need to identify the y1 variable as categorical by using the CATEGORICAL statement of the VARIABLE command.
Please let me know if this does not answer your question.
TYPE=LOGISTIC is for a single dependent variable. If you do TYPE = GENERAL with the CATEGORICAL statement, you will estimate a probit regression for the categorical dependent variable. I think this is what you want to do. There is a summary of analysis types and estimators available for each analysis type on page 38 of the Mplus User's Guide.
Anonymous posted on Thursday, November 06, 2003 - 10:30 pm
I want to create a latent trait variable with categorical indicators (same variable across 6 waves) and the WLSMV estimator and wanted to check to see if my code was correct.
sui by x1 x2 x3 x4 x5 x6;
Since these indicators are categorical, do I have to specify some code for the thresholds? Also, how would I interpret the thresholds at each time point?
The way you are coding this does not take time into account. But you don't mention that you are interested in change over time, so the specification is fine. If you are not taking time into account, it is not necessary to include thresholds in your analysis.
Anonymous posted on Tuesday, January 04, 2005 - 11:57 am
I am wondering what is the most appropriate way to handle observed (not latent) categorical covariates in a WLSMV framework. In my study X1 is an ordinal variable (6 classes) and X2 is binary.
Currently, this is how I am specifying the model:
CATEGORICAL ARE p1-p7 a1-a8 d1-d6;
f1 by p1-p7; f2 by a1-a7; f3 by d1-d6;
f3 on f1 f2 X1 X2;
Is it alright that the independent X variables are not specified as categorical?
In addition, if I try to estimate the covariance between X variables and F1 F2 (i.e., X1 X2 with F1 F2) I get an error message about X1 being binary (the program then tells me to specify it as categorical). However, if I do not free the covariance between the X1 X2 and F1 F2 the model has a poor fit.
You should not put x1 and x2 on the CATEGORICAL list. The CATEGORICAL list is only for dependent variables. You should treat your covariates as you would in regular regression. The binary variable can be left as is. The ordinal variable should be turned into five binary dummy variables. You should not specify any covariance between x1 and x2. The model is estimated conditioned on x1 and x2. If you want to know the covariance between x1 and x2, you can do a BASIC run and obtain this.
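With six ordered classes, one class serves as the reference and the other five each get a 0/1 indicator. In Mplus this recoding could be done in the DEFINE command or before the data are read; the Python sketch below only illustrates the coding logic, with hypothetical data, hypothetical names d2-d6, and class 1 chosen as the reference.

```python
def dummy_code(values, categories, reference):
    """Return one 0/1 indicator column per non-reference category."""
    kept = [c for c in categories if c != reference]
    return {f"d{c}": [1 if v == c else 0 for v in values] for c in kept}


x1 = [1, 3, 6, 2, 6, 4, 5, 1]  # hypothetical 6-class ordinal covariate
dummies = dummy_code(x1, categories=range(1, 7), reference=1)
for name, column in dummies.items():
    print(name, column)
```

Each dummy's slope is then the difference between that class and the reference class, holding the other covariates constant, and the dummies go only on the right-hand side of ON.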
Anonymous posted on Tuesday, January 04, 2005 - 12:31 pm
A few follow-ups:
1) Should I specify the covariance between the observed categorical variables (either the binary, ordinal, or both) and the other latent predictors? I know that Mplus does not do this as the default, but it degrades the model fit.
2) If the observed X variables were continuous would your answer to question 1 be different?
1. You should not specify the covariances among the x variables. By doing so, you bring them into the model estimation and thereby the assumption of normality applies to them. You are not estimating the model with these covariances fixed to zero. The model is estimated conditioned on the x's. Therefore, the assumption of normality does not apply to them. They are not part of the model so if the fit is poor, it is not because the covariances are zero. They are not. It's the same as regular regression. If you have ten covariates, the parameters of the regression model are an intercept and ten slopes. The model does not estimate the covariances among the x variables.
Anonymous posted on Tuesday, January 04, 2005 - 1:51 pm
When I include the categorical X variables into the model (without specifying any covariance), I get the following error message:
*** FATAL ERROR VARIABLE X1 CAUSES A SINGULAR WEIGHT MATRIX PART. THIS MAY BE DUE TO THE VARIABLE BEING DICHOTOMOUS BUT DECLARED AS CONTINUOUS. RESPECIFY THE VARIABLE AS CATEGORICAL.
The only place the categorical x variables should be is on the right-hand side of ON. You should not mention their means, variances, or covariances.
Anonymous posted on Tuesday, January 04, 2005 - 5:27 pm
Last question in this vein. Please excuse my ignorance.
When observed variables are added as covariates to the right side of an ON in an SEM model, should I be concerned that their presence dramatically reduces the fit of the overall model? The modification indices are indicating that regressing them onto other latent predictor variables in the model would improve model fit. However, I really am only interested in examining if the latent predictors are significantly related to the latent DV after controlling for observed variables like race, ses, prior delinquency etc.
Thanks for your help.
bmuthen posted on Tuesday, January 04, 2005 - 10:03 pm
Typically, other latent predictors would be regressed on the covariates since the covariates are often demographics and therefore antecedents - so such relationships should be included. Model fit can also suffer from left-out direct effects from the covariates to the indicators of endogenous factors.
Anonymous posted on Wednesday, January 05, 2005 - 3:03 pm
When I regress the covariates onto the latent predictor variables the model becomes extremely poor fitting. I have been racking my brain trying to figure out why this is occurring, but have made no headway. Do you have time to take a quick peek if I send you the data and syntax?
CATEGORICAL = persiste; GROUPING IS ethn69 (1=white 2=latino);
Analysis: type = general missing MEANSTRUCTURE h1; estimator = ML;
*** ERROR in Analysis command ALGORITHM = INTEGRATION is not available for multiple group analysis. Try using the KNOWNCLASS option for TYPE = MIXTURE.
I have tried using the KNOWNCLASS option for TYPE=MIXTURE as recommended but I get a message telling me (correctly so) that I don't have a categorical latent variable. I have four latent variables and a binary outcome variable, how/where should I put this variable?
Check KNOWNCLASS in the index of the user's guide. It should point to an example.
Anonymous posted on Sunday, January 23, 2005 - 12:26 am
Thank you for your prompt reply Linda. I want to thank Thuy for his valuable help with my model as well.
I have the following questions:
I began doing a multiple group analysis and have developed group specific models using the modification indices provided by the outputs.
Now that I will be using the KNOWNCLASS option with TYPE=MIXTURE,
1) How is a group specific model developed?
2) Is it possible to analyze the measurement model first and then the structural model? I have not added covariates, but I would like to at least add "primary language" and "income".
2) Do I use modification indices from the multiple group analysis models, or how do I determine measurement invariance?
3) How are starting values selected for each group and for each observed factor indicator of each latent variable? I have four latent variables: GC at time 1 (five factor indicators), GC at time 2 (five factor indicators), academic integration (2 factor indicators), and social integration (2 factor indicators); and two categorical variables: persistence and performance (GPA).
4) Does Mplus analyze indirect effects with TYPE=MIXTURE models?
In addition, I am measuring one latent variable at two time points:
I know that growth mixture modeling is not appropriate since I only have two time periods, how is effect of time taken into account in measuring this model?
Pardon my Mplus illiteracy.
Thank you for your help.
bmuthen posted on Sunday, January 23, 2005 - 9:43 pm
1) In the MODEL command you specify group-specific parameters within %c#% statements. For example,
%c#1%
f;
says that the variance of the factor f is specific to class 1.
2) I think MIs are available in this environment too, but you can also use chi-square difference testing via 2 times the logL differences of nested models.
3) Mplus provides default starting values.
4) Please check with Thuy.
5) Analyze the outcomes for the 2 time points together: if you have 5 outcomes per time point you analyze 10 variables.
Anonymous posted on Friday, March 25, 2005 - 12:54 pm
Drs. Muthén & Muthén, I have what I hope is a very quick question. I have been reading (and rereading) the Muthén, du Toit, & Spisic (1997) paper on your robust methods. From what I understand, Mplus uses Equation 44 to obtain parameter estimates when WLSMV is invoked. The weight matrix in Equation 44 is the inverse of a diagonal matrix, W^-1. For WLSMV (and WLSM), I could rewrite Equation 44 by replacing W with Gamma_D, where Gamma_D is the diagonal matrix formed from the diagonal elements in Equation 48. Is my understanding correct?
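As a rough illustration of what a diagonal weight matrix does, the Python sketch below evaluates a least-squares fit function of the form (s - sigma)' W^-1 (s - sigma) when W is diagonal: the quadratic form collapses to a sum of squared residuals, each divided by its own asymptotic variance. The sample statistics, model-implied values, and variances are hypothetical, any sample-size scaling is left out, and the sketch is not a check on Equation 44 itself.

```python
def diagonal_wls_fit(sample_stats, implied_stats, asymptotic_variances):
    """(s - sigma)' W^-1 (s - sigma) with W diagonal: a weighted sum of squared residuals."""
    if not (len(sample_stats) == len(implied_stats) == len(asymptotic_variances)):
        raise ValueError("all inputs must have the same length")
    return sum(
        (s - sig) ** 2 / w
        for s, sig, w in zip(sample_stats, implied_stats, asymptotic_variances)
    )


# Hypothetical thresholds and a polychoric correlation: sample vs. model-implied values
s = [-0.84, 0.25, 0.41]
sigma = [-0.84, 0.25, 0.36]
w = [0.004, 0.003, 0.0025]  # hypothetical diagonal of the asymptotic covariance matrix
print(diagonal_wls_fit(s, sigma, w))
```

With a full weight matrix the cross-products between residuals would also enter, which is the difference between WLS and the diagonal approach.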
bmuthen posted on Friday, March 25, 2005 - 3:08 pm
Anonymous posted on Thursday, April 28, 2005 - 7:45 am
I'm trying to construct a two-way ANOVA-like interaction in Mplus 3.11 using two observed categorical variables. The interaction is between gender (two categories) and family structure (five categories). The dependent variable is a continuous measure of antisocial behavior. In the Mplus User's Guide (p. 420) you list various ways of obtaining interactions for different variable types, but you have not included the interaction between an observed categorical and an observed categorical variable. Which is the best method to use in that case? I have used the following method: (1) Coded the family structure variable into four dummy codes (the omitted category is the reference group) predicting antisocial behavior. (2) Used multigroup analysis to examine whether there are any significant differences between the unstandardized betas (mean values) for boys and girls.
Is this the correct way to do it? If it is, do you know any literature that deals with this categorical-by-categorical interaction using multigroup SEM? Are you allowed to use multigroup analysis when you have a multilevel design?
You can use DEFINE or the multiple group approach as you have done above. I don't know of any reference to this. For some multilevel models, the GROUPING option can be used for multiple group analysis. If not, then the KNOWNCLASS option can be used instead.
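One way to set up the DEFINE route is to create the four family-structure dummies plus four gender-by-dummy product terms and regress the outcome on gender, the dummies, and the products; the four product coefficients then carry the interaction. The Python sketch below only shows how such columns could be built; the 0/1 gender coding, the variable names, and the choice of category 1 as the reference are hypothetical.

```python
def interaction_terms(gender, family, family_categories, reference):
    """Dummy-code family structure, then multiply each dummy by a 0/1 gender code."""
    dummies = {
        f"fam{c}": [1 if f == c else 0 for f in family]
        for c in family_categories
        if c != reference
    }
    products = {
        f"g_x_{name}": [g * d for g, d in zip(gender, column)]
        for name, column in dummies.items()
    }
    return dummies, products


gender = [0, 1, 1, 0, 1, 0]  # hypothetical: 1 = girl, 0 = boy
family = [1, 2, 5, 3, 2, 4]  # hypothetical 5-category family structure
dummies, products = interaction_terms(gender, family, range(1, 6), reference=1)
for name, column in products.items():
    print(name, column)
```

A joint test that all four product coefficients are zero plays the role of the interaction test that the multiple group comparison of slopes gives in the other approach.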
Fred Li posted on Friday, November 18, 2005 - 9:39 pm
I am a new Mplus user. Is there anyone who can help me with the following Mplus program that has an error message at the end of it? My intended task was to create a polychoric correlation matrix for categorical variables for further CFA analysis.
Thanks in ADVANCE!!
TITLE: Number Sense in 2005; Note that this program has saved out the polychoric correlation matrix in a file called test.pcm;
DATA: FILE IS "C:\Documents and Settings\USR1\®à±\NS2005.dat";
It looks like you have not used the FILE option of the SAVEDATA command to give the name of the file in which to save the correlation matrix. Note that with categorical outcomes, you cannot analyze a correlation matrix and a weight matrix in Mplus but need raw data. You should send these types of questions to Mplus support and include your license number.
adthrash posted on Thursday, March 02, 2006 - 3:37 pm
I have a SEM model using weighted data that will converge with no errors when all the variables are considered continuous but won't run when some indicators are considered categorical (binary) or count. f1 has 7 binary indicators (yes/no) and f2 has 4 count indicators (0-7 days) and one continuous indicator.
... CATEGORICAL ARE x1 x2 x3 x4 x5 x6 x7; COUNT ARE y1 y2 y3 y4; ...
ANALYSIS: TYPE IS GENERAL COMPLEX; ESTIMATOR IS MLR;
MODEL: f1 BY x1 x2 x3 x4 x5 x6 x7; ... f2 BY y1 y2 y3 y4; ...
MODEL INDIRECT: f2 IND x8; f2 IND f1; f2 IND f3;
The error message I receive is "MODEL INDIRECT is not available for analysis with ALGORITHM=INTEGRATION," which I did not specify. If I comment out either the CATEGORICAL or COUNT lines, I get the same error. If I comment out both, everything runs fine.
When you use MLR with either CATEGORICAL or COUNT, numerical integration is required. Therefore, MODEL INDIRECT is not available.
adthrash posted on Friday, March 03, 2006 - 11:56 am
Thanks for the quick response. The model terminated normally when CATEGORICAL was used and I commented out the COUNT and MODEL INDIRECT. However, now I get the error "MODINDICES option is not available for ALGORITHM=INTEGRATION." Could I consider all of my variables continuous in order to get MODINDICES, but use the parameter estimates derived when considering the vars categorical when reporting results?
bmuthen posted on Friday, March 03, 2006 - 12:33 pm
That would only be very approximate. Instead, you might want to explore model variations and do chi-square difference testing via -2*loglikelihood difference.
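The -2*loglikelihood difference can be computed by hand from the two outputs, as in the minimal Python sketch below (the loglikelihood values are hypothetical). Because MLR is used here, the plain difference is generally not chi-square distributed; the second function sketches the commonly used Satorra-Bentler-style scaled difference, which combines each model's number of free parameters and scaling correction factor. Verify that formula against the Mplus documentation before relying on it.

```python
from math import exp, lgamma, log


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = exp(a * log(z) - z - lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def lr_difference_test(loglik_restricted, loglik_full, df_diff):
    """Plain likelihood-ratio test: -2 * (logL_restricted - logL_full)."""
    stat = -2.0 * (loglik_restricted - loglik_full)
    return stat, chi2_sf(stat, df_diff)


def scaled_lr_difference_test(l0, p0, c0, l1, p1, c1):
    """Scaled difference test for MLR loglikelihoods (Satorra-Bentler-style correction).

    l0, p0, c0: loglikelihood, number of free parameters, and scaling correction
    factor of the nested (restricted) model; l1, p1, c1: the same for the
    comparison model with more parameters.
    """
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)
    stat = -2.0 * (l0 - l1) / cd
    return stat, chi2_sf(stat, p1 - p0)


# Hypothetical values read off two outputs
print(lr_difference_test(-5012.4, -5004.1, df_diff=3))
print(scaled_lr_difference_test(-5012.4, 20, 1.21, -5004.1, 23, 1.18))
```

When both scaling correction factors equal 1, the scaled test reduces to the plain one.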
Sorry, I've just found the answer in one of your web seminar files. We do not distinguish between scales if the variable is an IV (x). However, I guess that in SPSS the user has to specify whether a variable is categorical. But this seems not to be necessary with Mplus, right? Regards, Stephan
In regression analysis, independent variables (covariates) can be binary or continuous. In both cases, they are treated as continuous. In SPSS, perhaps the specification of categorical for an independent variable automatically creates a set of dummy variables for it. In Mplus, if you have a nominal independent variable, you must create a set of dummy variables.
Soyoung Lee posted on Wednesday, March 21, 2007 - 11:17 pm
I have a question about the dimension of the weight matrix of WLS/WLSM/WLSMV with categorical indicators in factor analysis. I understand it is [p(p+1)/2 x p(p+1)/2] in the dichotomous case, where p = # indicators and p(p+1)/2 = # univariate and bivariate marginal proportions. I wanted to know what it would be in the polytomous case. Is it [((C-1)p + p(p-1)/2) x ((C-1)p + p(p-1)/2)], with C = # categories, so that (C-1)p + p(p-1)/2 = (# thresholds) + (# polychorics)? Also, I found that the WLS and WLSMV estimates of the thresholds are slightly different. Aren't they the same, since thresholds are the normal z-scores corresponding to the univariate probabilities? Or are they updated in the third stage of Muthen's WLS procedure? Thanks.
Yes on your first question (assuming all items have the same number of categories C).
The model-estimated thresholds are different between WLS and WLSMV because in WLS it is acknowledged that the sample statistic thresholds are correlated among themselves and with the polychoric correlations, while with WLSMV a diagonal weight matrix is used (no correlations at all). The Muthen (1978) Psychometrika article details this in terms of WLS vs ULS.
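The counting in the question, and the first-stage sample thresholds as normal quantiles of cumulative proportions, can be checked with a short Python sketch (all items assumed to have the same number of categories C, as above). The weight matrix is then an n x n matrix, where n is the number of sample statistics. The category counts are hypothetical, and the final WLS or WLSMV threshold estimates are model estimates that, per the answer above, can differ from these first-stage values.

```python
from statistics import NormalDist


def n_sample_statistics(p, c):
    """Thresholds plus polychoric correlations for p items with c categories each."""
    return p * (c - 1) + p * (p - 1) // 2


def sample_thresholds(counts):
    """Thresholds as standard normal quantiles of the cumulative category proportions."""
    n = sum(counts)
    cumulative = 0
    taus = []
    for count in counts[:-1]:  # C categories give C - 1 thresholds
        cumulative += count
        taus.append(NormalDist().inv_cdf(cumulative / n))
    return taus


print(n_sample_statistics(10, 2))  # binary: p(p+1)/2 = 55
print(n_sample_statistics(10, 5))  # 40 thresholds + 45 polychorics = 85
print(sample_thresholds([20, 30, 50]))  # hypothetical counts for a 3-category item
```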
Hi, we are trying to perform CFA with Mplus, including both continuous indicators and ordered polytomous indicators (5-category Likert). I ran the CFA indicating which variables were categorical (CATEGORICAL statement) with the WLSMV estimator. Result: the continuous items have large residual variances (R-square < 0.2), small loadings (STDYX < 0.45), and not very good GOF indices. However, when treating all the variables as continuous (with the MLM estimator), I obtain better results. The results are not awesome, but the most commonly used GOF indices meet the recommended cutoffs (CFI and TLI > 0.95, RMSEA = 0.015, Standardized Root Mean Square Residual = 0.049). Also, residual variances are relatively small (except for one item, FD21 = 0.756). The chi-square test is significant in both analyses, but the sample size is 4400 individuals. I understand that it is more appropriate to treat the variables as categorical, especially given the highly skewed distributions of our items, but the fact that I obtain better results with the continuous-variables approach makes me doubt. Could you tell me which is the best approach to analyze these data? Does it make sense to compare the GOF indices of both methods and use the one that provides better fit? Or is it incorrect to analyze ordered categorical items without taking into account that they are actually categorical (i.e., without using the CATEGORICAL statement)?
If your items have strong floor or ceiling effects, treating them as continuous results in attenuated correlations. The low correlations can result in better model fit because there is less power to reject the null hypothesis. Categorical items with strong floor or ceiling effects should be treated as categorical.
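The attenuation argument can be seen in a small simulation. The Python sketch below draws two latent responses with correlation 0.6, cuts both at the same hypothetical thresholds placed far in the upper tail (a strong floor effect), and compares the Pearson correlation of the resulting item scores with the latent correlation. The seed, sample size, and cut points are arbitrary choices for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def categorize(value, cuts):
    """Item score 0..len(cuts): the number of cut points the latent value exceeds."""
    return sum(value > c for c in cuts)


rng = random.Random(2024)
rho, n = 0.6, 20000
skewed_cuts = [1.0, 1.5, 2.0, 2.5]  # hypothetical: most responses land in category 0
latent_x, latent_y = [], []
for _ in range(n):
    u = rng.gauss(0.0, 1.0)
    latent_x.append(u)
    latent_y.append(rho * u + sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0))

item_x = [categorize(v, skewed_cuts) for v in latent_x]
item_y = [categorize(v, skewed_cuts) for v in latent_y]
r_latent = pearson(latent_x, latent_y)
r_items = pearson(item_x, item_y)
print(round(r_latent, 3), round(r_items, 3))
```

The item-score correlation comes out well below the latent correlation, which is what treating such items as continuous analyzes; the CATEGORICAL approach instead targets the latent (polychoric) correlation.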
David Kerr posted on Monday, May 28, 2007 - 3:50 pm
Hello---I’m trying to test a model that has latent variables each comprised of categorical indicators (questions that can be answered agree, disagree, or don’t know).
The “don’t knows” are of interest, so I thought I would create two dummy coded variables: one coded 1 = agree, 0 = disagree/don’t know, and another coded don’t know = 1, agree/disagree = 0. My thought was that I might be able to create a “don’t know” factor that would capture this response type, and make the other factors more interpretable in terms of agree/disagree.
However, when I put these variables in the model at the same time, I got error messages indicating that correlations between variables result in empty cells (i.e., no one can have a value of 1, 1). Is such a model with dummy coded dependent variables possible?
The first indicator is fixed to one as the default to fix the metric of the factor. See the BY option in Chapter 16 of the Mplus User's Guide for a description of this. A fixed parameter has no standard error.
Hi, I'm trying to specify an interaction model using MPLUS 5.1
The interaction is between age (H9AGE, continuous) and perceived diversity (C1SAGEr, six categories). The dependent variables are categorical measures of emotional exhaustion, depersonalization, and personal accomplishment.
In the Mplus user guide (p.522) you list various ways of obtaining interactions for different variable types.
1. Means, variances, and covariances of exogenous observed variables should not be part of the MODEL command. The model is estimated conditioned on these variables. If you want to know these values, ask for SAMPSTAT in the OUTPUT command.
2. You cannot use multiple group analysis when one of the variables is continuous. The way you specified it is fine.
I'm doing CFA for a new developed test. It comprises 23 items which belong to 14 testlets. Hence, testlets comprise 1 to 4 items. Items are scored as either right (1) or wrong (0). To account for dependencies among items belonging to one testlet, testlet scores are computed (sum of correct items per testlet). Maximum testlet scores vary between 1 (for testlets with only one item) and 4 (for testlets with four items). I've done CFA, using the CATEGORICAL option.
Because I was not sure whether the indicators need to have the same maximum value, I rescaled all testlet scores, with 12 being the maximum (all items correct) and 0 the minimum. For testlets with four items, the testlet score can be 0, 3, 6, 9, 12; for testlets with three items 0, 4, 8, 12; for testlets with two items 0, 6, 12; for testlets with one item 0, 12. When I redo the CFA, my results differ in magnitude with regard to chi-square, fit indices, and factor loadings.
Which results to trust now? Is a common scale with the same maximum score for all testlets necessary or shall I leave the data as it is?
There is a paper by Browne and Du Toit from 1992 ("Automated Fitting of Nonstandard Models", Multivariate Behavioral Research, 27(2), 269-300) in which they suggest estimating the "Root Deterioration per Restriction" (RDR) for model comparison. The formula is RDR = square root of ((delta chi² - delta df)/(N * delta df)).
I'm using the WLSMV estimator for categorical data. Am I correct in assuming I should use the delta chi² and delta df I get in my output when I do the chi-square difference test using the DIFFTEST option for the above formula?
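For reference, the RDR formula quoted above can be written as a small Python function. Truncating at zero when delta chi-square is smaller than delta df is an assumption borrowed from the usual RMSEA convention and is not stated in the post; whether DIFFTEST output may be plugged in is exactly the open question here, so the sketch only does the arithmetic.

```python
from math import sqrt


def root_deterioration_per_restriction(delta_chi2, delta_df, n):
    """RDR = sqrt((delta_chi2 - delta_df) / (n * delta_df)), truncated at zero (assumption)."""
    if delta_df <= 0 or n <= 0:
        raise ValueError("delta_df and n must be positive")
    return sqrt(max(delta_chi2 - delta_df, 0.0) / (n * delta_df))


# Hypothetical values: delta chi-square = 20 on 5 df with N = 400
print(round(root_deterioration_per_restriction(20.0, 5, 400), 4))  # 0.0866
```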
Apparently, y21 does not have the value 5 in group 2. With weighted least squares, each group must have the same categories for categorical variables. You could collapse categories in group 1. With maximum likelihood estimation, you can use the * setting of the CATEGORICAL option.
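Collapsing means recoding so that both groups share the same set of observed categories, for example merging the top category of y21 into the next one in group 1. In Mplus such a recode would typically be done in DEFINE or before the data are read; the Python sketch below shows only the recoding logic, with hypothetical data.

```python
def collapse_categories(values, mapping):
    """Recode values using mapping; values not in the mapping are kept unchanged."""
    return [mapping.get(v, v) for v in values]


y21_group1 = [1, 2, 5, 4, 5, 3]  # hypothetical group-1 responses
print(collapse_categories(y21_group1, {5: 4}))  # merges category 5 into 4: [1, 2, 4, 4, 4, 3]
```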
Sanja Franic posted on Thursday, November 17, 2011 - 4:08 am
Thanks!! I hope I can do the same/similar with WLSMV estimation.
Sarah Ryan posted on Friday, September 21, 2012 - 7:04 pm
I just want to confirm something and I have searched the discussion board and web, but can't find an explicit answer.
Is it correct that the output section "UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES" provides the model estimated (as opposed to observed) proportions and counts?
I am a student at Ege University, Turkey. I managed to generate data sets with continuous indicators. But my homework is to generate data sets for my friends' homework using Mplus. The program is only available in computer labs and I usually have limited time to use it. I am trying to generate data sets with ten items, 6 items for factor 1 and 4 items for factor 2, which are on a 5-point Likert scale. I know it sounds simple, but I couldn't do it.
Here is my syntax:
montecarlo: names = y1-y10; nobservations = 200; nreps = 25; seed = 2344887; generate = y1-y10 (5 p); categorical are y1-y10; repsave = all; save = odev*.dat; MODEL POPULATION: f1 ON y1-y6*.65; f2 ON y7-y10*.70; f1-f2@1; f1 WITH f2*.2; y1-y10*.32; OUTPUT: TECH9;
As you see, I can't reach the solution. Would you please help with my syntax or show me a simple example?
I am running a structural model where I have ten 3-point Likert indicators that are theorized to measure 4 constructs, 3 being predictors and 1 being the outcome. I want to declare these variables as categorical ordinal. However, one of the predictor constructs has only one indicator, so if I just include it as an independent variable, it cannot be declared as categorical ordinal. Is it OK to include a factor for it (to make it a y so I can declare it as categorical) and fix both the loading and variance of that one-indicator factor at 1? Would this be equivalent to using the underlying continuous version of the categorical variable?
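Conceptually, the population model for 5-point items is a factor model for continuous latent responses that are then cut at four thresholds per item. The Python sketch below mimics that data-generating process outside Mplus, using the loadings (.65 and .70), the factor correlation (.2), the sample size, and the seed from the syntax above; the threshold values and the unit-variance scaling of the latent responses are assumptions, and this is not Mplus MONTECARLO syntax.

```python
import random
from math import sqrt


def generate_two_factor_likert(n, loadings1, loadings2, factor_corr, thresholds, seed):
    """Simulate ordinal (1..5) items from a two-factor model via latent responses.

    Assumptions: standardized factors, latent responses y* with unit variance
    (residual variance 1 - loading**2), and the same four thresholds for every item.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        f1 = rng.gauss(0.0, 1.0)
        f2 = factor_corr * f1 + sqrt(1 - factor_corr ** 2) * rng.gauss(0.0, 1.0)
        row = []
        for factor, loadings in ((f1, loadings1), (f2, loadings2)):
            for lam in loadings:
                y_star = lam * factor + sqrt(1 - lam ** 2) * rng.gauss(0.0, 1.0)
                row.append(1 + sum(y_star > t for t in thresholds))  # category 1..5
        rows.append(row)
    return rows


data = generate_two_factor_likert(
    n=200,
    loadings1=[0.65] * 6,
    loadings2=[0.70] * 4,
    factor_corr=0.2,
    thresholds=[-1.5, -0.5, 0.5, 1.5],  # hypothetical cut points giving 5 categories
    seed=2344887,
)
print(data[0])
```

Each row holds y1-y6 (factor 1) followed by y7-y10 (factor 2); repeating the call with different seeds gives replicated data sets.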
Below is my input. The variable in question here is var7.
Yes, this would be equivalent to doing that. I'm not sure why you would want to do this given that you make unnecessary normality assumptions. I would use the observed variable as the covariate or create a set of dummy variables to represent the categories and use them as covariates.
Thank you. Yes, I need to think about whether to do this or to use dummy variables. The pro of using an assumed normal continuous version of var7 is the result will be simple -- just one regression coefficient, similar to one coefficient for each of f1 and f2. Conceptually these are three things hypothesized to influence f4, and it would be nice to have three coefficients instead of four. The con is introducing inaccuracy through normality assumption.
Var7 has three ordinal categories and there are roughly the same number of people in each category. Does this suggest that it is perhaps OK to assume normality for the underlying continuous variable? Or should I still worry that it might be skewed and the equal proportions could be just an artificial result of the thresholds?
In practice, is it something that is sometimes done, to treat an ordered categorical independent variable this way? Or is it a no no?
Thanks much for advising me on this. I am quite inexperienced in SEM.
I vote for treating the observed 3-category Var7 as a continuous variable - that is, as a regular covariate. I say that because you say its distribution is symmetric and because when you treat it as a regular covariate you don't introduce an assumption of underlying normality.
I would like to model some kind of twin model: that is two latent variables, each measured by 3 binary indicators. Measurements might be parallel. I have other covariates to add.
I tried to model this with factor analysis, that is, two continuous latent variables measured by 3 indicators each. I also have a grouping variable. This results in very good model fit (chi-square, RMSEA, TLI, chi-square difference tests). So I conclude that the indicators are good measures of an underlying factor.
But if I try to fit the model in an LCA framework, that is, with both latent variables as binary latent class variables (three classes give the same result), the model fit turns out to be awful.
Is there any reasoning why this doesn't work, although the indicators seems to be fine? Is there anything to try out or to look at as an explanation?
With the LCA for twins, you should have two latent class variables, one for twin a and one for twin b. Each one should have two classes. You need to use PARAMETERIZATION = LOGLINEAR and use WITH to correlate the two categorical latent variables, for example,
c1 WITH c2;
Wendong posted on Thursday, April 25, 2013 - 12:04 am
Hi Dr. Muthen,
May I ask a quick question?
I know that, strictly speaking, we should treat Likert-type (e.g., 1-5 point) item-level data as ordinal, and thus in SEM we should use ESTIMATOR = WLSMV.
But it seems that many people treat such scales as continuous. Thus my question is, could we still treat such items as continuous, but use WLSMV as estimator, to deal with data skewness at item level?
I have a question of how to interpret and to best standardize the estimates of the model below. My main interest lies on the regression coefficient of continuous time survival outcome Y on factor F, which is measured by one count (marker) indicator (X1) and two binary indicators (X2, X3).
COUNT = X1; CATEGORICAL = X2 X3; SURVIVAL = Y; TIMECENSORED = cens; ANALYSIS: BASEHAZARD = ON; ESTIMATOR = MLR; MODEL: F BY X1 X2 X3; Y ON F;
1) Not sure how the factor F is scaled: a count indicator X1 is used to set the scale, but based on the model output, the link function used is logit. Shouldn’t the link in count outcomes be log instead? So, in this case, how to interpret the factor scale?
2) Given the above, what’s the best way to standardize here? I know that it makes no sense to standardize the binary indicators, but should I use STDYX standardization when interpreting the regression coefficient of Y on F?
Thanks Bengt. I did as suggested and the results changed a bit - the regression coefficient of Y on F had a different sign now (although non-significant in both cases). Is this what to expect?
Moreover, as someone not too familiar with latent variables, does that scaling approach change the interpretation of the latent variable (e.g. is the latent still measured with "error" although its variance is fixed)?