

I need to do a confirmatory factor analysis on nested data (days within individuals). Can Mplus do this for me? Is it possible to do such a factor analysis with different numbers of data points from different individuals? 


Yes, that can be done in several ways in Mplus if your outcomes can be viewed as continuous. First, you can do it as multilevel factor analysis; see Muthén, B. (1994). Multilevel covariance structure analysis. In J. Hox & I. Kreft (eds.), Multilevel Modeling, a special issue of Sociological Methods & Research, 22, 376-398. Second, you can treat the days within individuals as multivariate outcomes in a single-level model, just as is done in growth modeling. The differing number of days per individual is not a problem in either approach. In the first approach, it means different cluster sizes. In the second approach, it means missing data.

Anonymous posted on Wednesday, May 31, 2000  11:37 am



You mention on the website that with Mplus a MIMIC model can include indirect effects, that is, the effect of a background variable on a factor indicator via the factor. Is this illustrated in the Mplus manual? If not, is it a straightforward procedure? Many thanks.


There are examples of MIMIC models in Chapter 21 of the Mplus User's Guide. Indirect effect parameters are not automatically calculated in Mplus. You would have to multiply the factor loading by the regression coefficient to obtain the parameter estimate. The standard error of the estimate would have to be obtained using the Delta method. 
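The hand calculation described in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and all numbers are hypothetical, the first-order Delta method is used, and the covariance between the two estimates is set to zero unless you supply it from your own output.

```python
import math

def indirect_effect_delta(loading, slope, se_loading, se_slope, cov_ls=0.0):
    """Return (estimate, SE) of loading * slope using the first-order Delta method.

    cov_ls is the sampling covariance between the two estimates
    (assumed 0.0 here when it is not available).
    """
    estimate = loading * slope
    variance = (
        (slope ** 2) * se_loading ** 2
        + (loading ** 2) * se_slope ** 2
        + 2.0 * loading * slope * cov_ls
    )
    return estimate, math.sqrt(variance)

# Hypothetical values, not taken from any real Mplus output.
est, se = indirect_effect_delta(loading=0.8, slope=0.5, se_loading=0.10, se_slope=0.05)
print(f"indirect effect = {est:.3f}, SE = {se:.4f}")
```

If the covariance between the loading and the regression coefficient is available, passing it as cov_ls gives a more accurate standard error than the zero-covariance shortcut.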

melady posted on Sunday, August 13, 2000  9:01 am



I have done a factor analysis with disaggregated data using different cluster sizes. What I would like to do now is see whether the factors hold across clusters. I am thinking I could compare the goodness of fit for a confirmatory analysis using all the data without regard for clustering to the goodness of fit for a confirmatory factor analysis using the cluster as a part of the model. Is this reasonable? 

Anonymous posted on Wednesday, February 21, 2001  7:59 am



I am doing CFA on a model with three latent variables with four or five parameters each. Nevertheless, my model is not identified. I have fixed the starting value of each parameter to 1 and correlated the three latent variables. There is prior research using this scale which suggests that at least two of the factors are highly correlated. Would fixing the correlation between factors help identify the model? Are there other reasonable options for overcoming the model identification problem? Thanks for your time.


If you have 12 to 15 factor indicators which I think is what you are saying, a three factor model should be identified. Fixing all of the starting values to one is not a solution. Have you perhaps freed the factor means? They must be fixed at zero (the Mplus default) for the model to be identified. Otherwise, I would need to see your output to fully understand your problem. 

melady posted on Tuesday, March 06, 2001  2:34 pm



I wish to conduct a factor analysis with data on specific individuals provided by multiple informants. How can Mplus help me to do this? I have the program, but I don't fully understand what I'm doing with it! Melady Preece 


As I understand it, you have data where a trait is measured for individuals by several informants, for example, teacher, peers, and parents. You could do a factor analysis in the following way: factor BY teacher peers parents;


I would like to use Mplus to estimate a nonlinear relationship among latent variables [interaction]. Joreskog and Yang (1996) demonstrated that such a model can be estimated using SEM if an observed product variable is used as an indicator of the latent product variable. Bollen (1995) used two-stage least squares with instrumental variables to estimate the interaction. How can I use Mplus to estimate an interaction model?


Mplus cannot do what Joreskog and Yang demonstrated because it requires nonlinear constraints. The Bollen approach can be done in Mplus but it is not directly implemented. It would have to be done in a series of steps. 

Anonymous posted on Thursday, May 10, 2001  3:57 pm



I read in the Mplus 2.0 manual (page 346, bottom) that: "When x variables are present, the conditional normality assumption allows nonnormality for y* as a function of nonnormal x variables". I also note that in his chapter "Some Uses of Structural Equation Modeling in Validity Studies: Extending IRT to External Variables" Bengt writes (pg 218, middle): "We will add the multivariate probit assumption that y* | x is multivariate normal. Note that this does not mean that we assume normality for the y*'s or for the [Eta], but normality is merely required for the [residual and error terms]. The distribution of [Eta] and the y*'s is actually to some extent generated by the x's". That is, in a CFA "in isolation" with ordered categorical indicators but no background (x) variables, one assumes that the latent y*'s are normal (but makes no assumptions about y* variances). However, introducing covariates relaxes, or makes less crucial, the assumption of the y*'s normality? How is this so? Does "nonnormality of y*'s when x's are introduced" hold for all Mplus estimators (i.e., WLS as well as robust WLSMV, etc.)?


The reason that conditional normality can be assumed is that the sample statistics used for estimation are not correlations but rather probit regression coefficients. The following article explains this: Muthén, B. (1983). Latent variable structural equation modeling with categorical data. Journal of Econometrics, 22, 48-65. (#9) This conditional normality holds for WLS, WLSM, and WLSMV.

Anonymous posted on Monday, May 14, 2001  11:28 am



Responding to your May 11 reply above: Is this why I find my indicator item R-squares increase by 10-15 percentage points when external variables are introduced into the model?


No. The reason your R-square values increase is that the variables that you are adding explain the variability of the indicators. The R-square values are not dependent on which estimation method you use but on the model.

Anonymous posted on Wednesday, June 13, 2001  2:48 pm



I have a question concerning the R-squares that are in my output. I have dichotomous dependent variables, so are the R-square values equivalent to adjusted R-squares? Thanks.


The R-squares for categorical dependent variables are defined for the y* variables. See Appendix 1 of the Mplus User's Guide. It is not an adjusted R-square.

Anonymous posted on Monday, August 27, 2001  1:40 pm



Provided that one isn't performing nested chi-square tests of model fit, are there any circumstances when you would recommend one *not* use the robust (WLSMV) estimator? Why does Mplus include the WLSM estimator if the WLSMV estimator is available (and, I'm guessing, technically superior)?


We actually recommend WLSMV in all situations based on current knowledge of WLSMV and the other alternatives. We include WLS and WLSM so that they can be studied further. It may be that in certain situations one of them will behave better.

Anonymous posted on Tuesday, August 28, 2001  12:11 pm



As a follow-up to the above response, would you provide a reference / citation discussing nested chi-square tests of fit for the robust estimator?


I don't know of any for categorical outcomes. 

David Klein posted on Friday, October 05, 2001  3:15 pm



I saved the factor scores from a CFA (using the "SAVE = F_SCORES" option) and read them into another software program. When I calculate simple statistics on the saved data, the means & variances of my latent variables do not match the means & variances shown in the Mplus "Tech4" output. Not even close, really. Why not? My goal, if it matters for your answer, is to create some kind of residual that represents the unexplained variance of an indicator (that is, the variance not explained by the latent variable on to which the indicator was loaded) Thanks!!! 


Be sure to include TYPE=MEANSTRUCTURE if you are using Version 2.0. Means were not turned on automatically in that version. Subsequent updates have corrected this problem. If this does not help, please send data and output and I will take a look at it. 

Anonymous posted on Wednesday, November 14, 2001  1:21 pm



I'm trying to save factor scores in CFA. The output file has all the variables on the usevariables list and the factor scores appended at the end. I'd like to merge the factor scores with a larger data set, but am not sure how to save an identifying variable with the factor scores. When I include an identifier on the usevariables list, the model doesn't converge. 


You cannot do this in Version 1. In Version 2, there is an IDVARIABLE = statement in the VARIABLE command. 

Jef Kahn posted on Friday, November 16, 2001  9:29 am



I am estimating a CFA with 4 factors and 49 items using ML. The fit of the model is not very good. However, when looking at the modification indices I notice that dozens and dozens of the fixed factor loadings have MI values of 999 with a StdYX S.P.C. of 0. What does this mean? 


We print 999 when the denominator in the formula for modification indices becomes zero or very close to zero. You can find the formula on page 373 of the following article: Sörbom, D. (1989). Model modification. Psychometrika, 54, 371-384.

Jef Kahn posted on Friday, November 16, 2001  10:59 am



Following up from the last post, does a M.I. value of 999 indicate that the M.I. value would likely be very large or very small? Or is that simply indeterminable in this case? 

bmuthen posted on Friday, November 16, 2001  4:04 pm



Simply indeterminable. 

Hervé CACI posted on Saturday, November 24, 2001  2:14 am



I'm conducting a set of CFAs. How can I check my datasets of 0-3 Likert-type variables for outliers? Can I use the usual methods (i.e., those for continuous variables)?


I would suggest using the usual methods for continuous variables. 

Anonymous posted on Thursday, December 06, 2001  5:53 am



1: I have run a multiple group confirmatory factor analysis on categorical data (3 factors, 3 groups) using meanstructure and save fscores in vers. 2. Can you explain in words how the scores are estimated? (I have trouble understanding the formulas in the manual, pp. 385-6.) 2: Is it reasonable to use multiple group analysis (default settings) to compute scores for the same individuals at 3 points in time (= 3 'groups') to study changes in scores?

bmuthen posted on Thursday, December 06, 2001  5:51 pm



A factor score is an estimate of an individual's most likely value on the factor given both the estimated model and the individual's observed variable values. For example, in measuring an ability dimension with multiple-choice items, an individual who got many items right gets a high estimated score, but the estimated score is also affected by group membership so that the estimated score is lower for a group that has a lower estimated factor mean. This is in line with Bayesian estimation where the estimated model is the "prior" to which information on the individual's data is added to get a "posterior". The estimated score is the maximum of the posterior distribution. Multiple-group analysis is concerned with independent data from the different groups. Longitudinal data do not give independent data from the different time points. You can, however, use (longitudinal) factor analysis with across-time measurement invariance to estimate and compare scores over time. To study change in score over time, however, it would seem advantageous to use growth modeling.
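To make the "maximum of the posterior" idea concrete, here is a minimal Python sketch for a single factor measured by dichotomous items. It is not the Mplus implementation: it assumes a one-factor probit model with a normal factor distribution, the loadings, thresholds, and group means are hypothetical, and the maximum is found with a simple ternary search rather than whatever iterative routine the program uses.

```python
import math

def norm_cdf(z):
    # Standard normal CDF via erfc, which stays accurate in the tails.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def log_posterior(eta, responses, loadings, thresholds, prior_mean, prior_var):
    # Log of the normal "prior" for the factor (constant terms dropped).
    value = -0.5 * (eta - prior_mean) ** 2 / prior_var
    # Add the probit log-likelihood of each 0/1 item response.
    for u, lam, tau in zip(responses, loadings, thresholds):
        z = lam * eta - tau
        value += math.log(norm_cdf(z) if u == 1 else norm_cdf(-z))
    return value

def map_factor_score(responses, loadings, thresholds,
                     prior_mean=0.0, prior_var=1.0, lo=-8.0, hi=8.0, iters=200):
    # Ternary search works here because the log-posterior is concave in eta.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        f1 = log_posterior(m1, responses, loadings, thresholds, prior_mean, prior_var)
        f2 = log_posterior(m2, responses, loadings, thresholds, prior_mean, prior_var)
        if f1 < f2:
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# Hypothetical item parameters for four dichotomous items.
loadings = [1.0, 0.8, 1.2, 0.9]
thresholds = [-0.5, 0.0, 0.5, 1.0]

all_right = map_factor_score([1, 1, 1, 1], loadings, thresholds)
all_wrong = map_factor_score([0, 0, 0, 0], loadings, thresholds)
# Same responses, but in a group whose estimated factor mean is lower.
all_right_low_group = map_factor_score([1, 1, 1, 1], loadings, thresholds, prior_mean=-1.0)
print(all_right, all_wrong, all_right_low_group)
```

The last call shows the group-membership effect mentioned above: the same response pattern yields a lower estimated score when the group's factor mean is lower.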

Anonymous posted on Tuesday, December 11, 2001  11:59 pm



I am trying CFA on ordinal data using the WLS estimator. I have constructed a second-order factor model consisting of five first-order factors and twelve observed variables. Of these five first-order factors, two have only one observed variable. For model identification, we tried fixing at 1 the loadings from these first-order factors to their observed variables. In addition, we fixed at 0 the error variances of these observed variables. However, this model could not be identified. Can I identify this type of model in Mplus?


If the factors have only one indicator, don't create a factor. With categorical outcomes, the residual variances are not parameters in the model, so this changes things from the continuous case. Just use the indicator as an indicator of the second-order factor. Let Mplus do what is necessary. If you ask for TECH1, you will be able to see which parameter is causing the identification problem. It is most likely the residual covariances among the first-order factors, which need to be fixed to zero for identifiability. If you cannot solve your problem this way, send me the output including TECH1 and the data and I will take a look at it.

JND posted on Tuesday, December 18, 2001  9:55 am



I hate to express my ignorance publicly on this . . . . I ran the CFA with three latent variables and 28 observed variables (per theory). Only five of the estimates were less than 1.0. (None were negative.) How do I attack this? 

bmuthen posted on Tuesday, December 18, 2001  10:29 am



The unstandardized loading estimates do not have to be less than one; this depends on the scale of the y. The StdYX standardized values can still be less than one. If you fix to one the loading that has the largest unstandardized estimate in your run, you will most likely get unstandardized loadings less than one.

Anonymous posted on Monday, March 18, 2002  10:11 am



Is there a way to check the multivariate normality assumption in ML estimation in Mplus? 

Anonymous posted on Monday, March 18, 2002  12:20 pm



I am estimating a three-factor model with ML, MLM, and WLS. My 18 observed variables are Likert-type and multivariate nonnormal (Mardia's coefficient normalized estimate = 160). As expected, the MLM chi-square estimate of 2272.9 (df=125) is lower than the ML estimate of 2750.966; other fit indices behave similarly since they are corrected for nonnormality (scaling correction factor 1.21). But when I ran the same model with the WLS estimation method, the results were considerably different: chi-square value 1326.607 (df=125). Both CFI and TLI are considerably lower than their previous estimates (.94; .92), at CFI = 0.718 and TLI = 0.655, although RMSEA is lower with WLS (0.046) than with MLM (.061). Which results should I trust in this case? Is there a better method than the ones I used? Thank you very much for your help.


There is no way to do this in the current version. We are currently exploring this topic. 


Regarding the threefactor model, can you send the three outputs and data if possible to support@statmodel.com? I will take a look at them. 


Thank you for sending the outputs. Getting such different results is unusual. There are a couple of things that may be going on. 1. WLS is not suited for many observed variables like the 18 you have. See Muthen and Kaplan, 1982. 2. You may have hit a local solution with ML and MLM. You might want to try the WLS estimates as starting values and rerun these analyses. It may be that a local solution is also the problem in the baseline model. Note that for ML and MLM, the chi-square for the baseline model is about ten times as large as for the target model, while for WLS, the chi-square for the baseline model is about four times as large as for the target model. 3. You might consider doing a simulation study to compare the three estimators for a problem similar to yours and see which estimator behaves best in practice. This is the only way to really know which outcome to trust.
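Two pieces of arithmetic in this exchange can be checked by hand, and the sketch below does so in Python. It assumes the usual relations (MLM chi-square = ML chi-square divided by the scaling correction factor, and the standard CFI formula). The ML chi-square and scaling factor are the values quoted in the post; the baseline chi-squares and the baseline df of 153 are hypothetical round numbers chosen only to mimic the "ten times" and "four times" ratios.

```python
def scaled_chi_square(ml_chi_square, scaling_correction):
    # Mean-adjusted (MLM-type) chi-square: ML value divided by the correction factor.
    return ml_chi_square / scaling_correction

def cfi(chi_target, df_target, chi_baseline, df_baseline):
    # Comparative fit index from target and baseline chi-squares.
    d_target = max(chi_target - df_target, 0.0)
    d_baseline = max(chi_baseline - df_baseline, d_target)
    return 1.0 if d_baseline == 0.0 else 1.0 - d_target / d_baseline

# Values quoted in the post: ML chi-square and scaling correction factor.
mlm_chi_square = scaled_chi_square(2750.966, 1.21)
print(round(mlm_chi_square, 1))  # close to the reported 2272.9; the factor is rounded

# Hypothetical illustration: the same target misfit looks very different
# depending on how large the baseline chi-square is.
cfi_ten = cfi(1000.0, 125, 10000.0, 153)   # baseline about ten times the target
cfi_four = cfi(1000.0, 125, 4000.0, 153)   # baseline about four times the target
print(round(cfi_ten, 3), round(cfi_four, 3))
```

This is why a comparatively small baseline chi-square under WLS can pull CFI and TLI down even when the target chi-square itself is smaller.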

Anonymous posted on Tuesday, March 19, 2002  11:46 am



Thank you very much for your answers to my post dated March 18. If I may, I would like to follow up on your answers. I tried your second suggestion, and the ML and MLM estimates and fit indices did not change. But maybe the problem lies in the baseline model as you suggested. Is there a way to fix the local solution problem for the baseline model for ML or MLM? Just to make sure I understand, is a local solution what happens when the iterations converge on a plateau even though it is not the best one? With regard to your first answer, I tried to look up the reference but I wasn't able to locate the exact one. I was able to locate the following two references but their years are different. I am guessing you meant one of these: 1. Muthén, B., & Kaplan D. (1985). A comparison of some methodologies for the factor analysis of nonnormal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189. 2. Muthén, B., & Kaplan D. (1992). A comparison of some methodologies for the factor analysis of nonnormal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30. I don't have a lot of knowledge on doing simulations. Are there any sources (e.g., previous simulation studies) that you can recommend I read on this topic? Thank you very much for your help.


You could run the baseline model yourself. It consists of means and variances only, no covariances. I am not sure that a local solution is the problem, however. And yes, your understanding of what a local solution is is correct. I apologize about the reference. It is the 1992 paper. Our website will have a new feature in the next couple of days. It will be called Mplus Web Notes and will show how to use simulations to answer research questions. This may help you.
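For continuous outcomes under ML, the baseline model described here (free means and variances, all covariances zero) has a closed-form fit, so its chi-square can be checked outside the program. The Python sketch below is an illustration under assumptions: it uses the normal-theory ML fit function, a multiplier of N - 1 (some programs use N instead), and a made-up 2 x 2 covariance matrix.

```python
import math

def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                chol[i][j] = (matrix[i][j] - s) / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))

def baseline_chi_square(cov, n_obs):
    """ML chi-square and df for the variances-only (no covariances) model.

    With the model-implied matrix equal to diag(S), the ML fit function
    reduces to sum(log s_ii) - log|S|.
    """
    p = len(cov)
    f_min = sum(math.log(cov[i][i]) for i in range(p)) - log_det_spd(cov)
    df = p * (p - 1) // 2  # one df per covariance fixed at zero
    return (n_obs - 1) * f_min, df

# Hypothetical sample covariance matrix and sample size.
sample_cov = [[1.0, 0.5],
              [0.5, 1.0]]
chi2, df = baseline_chi_square(sample_cov, n_obs=101)
print(round(chi2, 3), df)
```

Because this fit has a closed form, a value that differs sharply from the baseline chi-square printed by the software would point to something other than a local solution.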

Anonymous posted on Monday, June 03, 2002  11:37 am



I have conducted a confirmatory factor analysis using dichotomous indicators on a very large sample (n=8008). I would like to use the results of this model to approximate factor scores for individuals not in the analysis dataset. I understand that strictly proper factor scores cannot be estimated through a factor score coefficient matrix for models with categorical indicators; they must be obtained iteratively. I am not a mathematician and could not write a program to do this operation for individual cases as they come up. Is there a way I can fudge a "good enough" estimate of factor scores using the Mplus output for this particular application?

bmuthen posted on Tuesday, June 04, 2002  6:35 am



It should be possible to use your estimated model from the n=8008 as fixed parameters in a new analysis where you enter the new individuals and get factor scores the correct way. In this new analysis you don't estimate any parameters but only estimate factor scores. 

Anonymous posted on Tuesday, June 04, 2002  7:06 am



Is there any way to use the results from my estimated model to construct factor scores on a case-by-case basis? I need a way to convert responses to factor scores as an assessment tool for nonscientists to use with individuals.

Anonymous posted on Tuesday, June 04, 2002  7:08 am



BTW, the users of the new instrument will not necessarily have access to Mplus, so I wanted to come up with a solution that can be used outside the program. 

bmuthen posted on Tuesday, June 04, 2002  9:30 am



You can do the factor score estimation on a case-by-case basis by the approach I described, that is, using an n=1 analysis with fixed parameters. Regarding your last question, the factor score estimation with categorical outcomes is an iterative process, so there is no explicit formula such as a factor score coefficient matrix that can be used to easily obtain factor scores. The only approximation would be to ignore that the outcomes are categorical and treat them as continuous, but that would seem to defeat the purpose of your analysis.

Rich Jones posted on Wednesday, June 05, 2002  9:26 am



Couldn't Anonymous generate a data set where each record is one of the 2^p possible response patterns for the items in the instrument (where p is the number of items), and then estimate factor scores for each record with fixed parameters as in the n=1 case? The result would be a table of response patterns and factor scores conditional on the parameter estimates returned from the n=8,008 model. Anonymous could then distribute this table to other investigators interested in using the new instrument or prepare a simple program to associate response patterns with the appropriate factor score.


Yes, this could be done. It's a good idea. Thanks for the suggestion. 
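The lookup-table idea can be sketched as follows. This Python fragment is a hedged illustration, not a substitute for running the fixed-parameter analysis in Mplus: it assumes a one-factor probit model with a standard normal factor, uses hypothetical loadings and thresholds for three items, and scores each pattern by maximizing the posterior with a ternary search.

```python
import itertools
import math

def norm_cdf(z):
    # Standard normal CDF via erfc, accurate in the tails.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def map_score(pattern, loadings, thresholds, lo=-8.0, hi=8.0, iters=200):
    # Posterior mode of the factor for one 0/1 response pattern,
    # assuming a standard normal factor and fixed item parameters.
    def log_post(eta):
        value = -0.5 * eta * eta
        for u, lam, tau in zip(pattern, loadings, thresholds):
            z = lam * eta - tau
            value += math.log(norm_cdf(z) if u == 1 else norm_cdf(-z))
        return value
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_post(m1) < log_post(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def score_table(loadings, thresholds):
    # One entry per response pattern: 2**p rows for p dichotomous items.
    p = len(loadings)
    return {pattern: map_score(pattern, loadings, thresholds)
            for pattern in itertools.product((0, 1), repeat=p)}

# Hypothetical fixed parameters standing in for the large-sample estimates.
table = score_table(loadings=[1.0, 0.8, 1.2], thresholds=[-0.5, 0.0, 0.5])
for pattern, score in sorted(table.items()):
    print(pattern, round(score, 3))
```

Note that the table grows as 2^p, so this is practical only for instruments with a modest number of items; with many items, scoring each observed pattern on demand is the more realistic route.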

Anonymous posted on Wednesday, June 12, 2002  6:48 am



Thanks, Rich! Nice problem solving! I'll try that approach. 

Anonymous posted on Thursday, June 27, 2002  1:40 pm



I'm doing a CFA with ordered, categorical indicators. The CFA model by itself fits quite well, but when I go to include it in a fuller SEM, some of the scale factors become very small and insignificant. If the scale factors are related to error, a scale factor of zero doesn't seem to make sense. Should insignificant scale factors be cause for alarm ? 

Anonymous posted on Thursday, June 27, 2002  1:49 pm



Hi, I have a latent variable called participation measured by 6 items asking individuals if they have participated in 6 different activities within the last two years. Activities are not necessarily correlated with each other. These six items are considered to be "causal" indicators (formative model, or spurious model) rather than "effect" indicators (reflective model) for the dependent variable (Bollen & Lennox, 1991; Edwards & Bagozzi, 2000). This implies I need to have an unobserved latent variable (Blalock, 1971) in my model. Can Mplus handle models with such unobserved latent variables? Thank you very much. Blalock (1971). Causal models involving unobserved variables in stimulus-response situations. In H. M. Blalock (ed.) Causal models in the social sciences (pp. 335-347). Chicago: Aldine. Bollen & Lennox (1991). Conventional wisdom on measurement: A structural equation perspective. Psych Bull, 110 (2), 305-314. Edwards & Bagozzi (2000). On the nature and direction of relationships between constructs and measures. Psych Methods, 5 (2), 155-174.

bmuthen posted on Thursday, June 27, 2002  3:43 pm



Scale factors close to zero correspond to latent response variable (y*) variances that are very large. This can certainly be a sign of model misspecification. 
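The relationship stated here can be illustrated numerically. The sketch assumes, as I understand the Delta parameterization, that a scale factor is the inverse of the standard deviation of the latent response variable y*; the one-factor decomposition of the y* variance and all numbers are hypothetical.

```python
import math

def ystar_variance(loading, factor_variance, residual_variance):
    # Model-implied variance of y* for an item loading on a single factor.
    return loading ** 2 * factor_variance + residual_variance

def scale_factor(variance_of_ystar):
    # Assumed definition: inverse of the y* standard deviation.
    return 1.0 / math.sqrt(variance_of_ystar)

# Hypothetical items: the second has a very large implied y* variance.
moderate = scale_factor(ystar_variance(1.0, 1.0, 3.0))   # variance 4
extreme = scale_factor(ystar_variance(1.0, 1.0, 99.0))   # variance 100
print(moderate, extreme)
```

Under this definition, a scale factor can only approach zero if the implied y* variance grows without bound, which is why very small values deserve a second look at the model specification.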

bmuthen posted on Thursday, June 27, 2002  3:49 pm



I think by "causal" indicators you are referring to a situation where indicators are influencing rather than being influenced by a latent variable. Yes, Mplus can handle this situation. Although you don't have a factor indicator in the usual sense, you can always specify factor BY anyvble with the loading fixed at 0, and factor ON x1-x5, where x1-x5 are your "causal" indicators and anyvble is any observed dependent variable in the model. Don't forget to include the identifying restrictions that this type of model requires. If I remember correctly offhand, this involves fixing one of the slopes on x at 1 and fixing the factor residual variance at zero, but you'd better check this.

Anonymous posted on Wednesday, September 11, 2002  7:15 am



You write that a CFA requires at least m*m restrictions (m is the number of factors) to be identifiable. Is it possible to give a more precise definition of when a CFA is identifiable and when it is not? Are there any general rules for when a CFA is identifiable / nonidentifiable?

bmuthen posted on Wednesday, September 11, 2002  8:50 am



This is a big topic. A good treatment is in this reference from the Mplus web site reference list: Joreskog, K.G. (1979). Author's addendum. In Advances in Factor Analysis and Structural Equation Models, J. Magidson (Ed.). Cambridge, Massachusetts: Abt Books, pp. 40-43. Apart from that, rules of thumb are generally helpful. For instance, having 3 indicators per factor. Or, 2 indicators per factor if there is more than one factor.
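A quick companion to these rules of thumb is the counting rule: the number of free parameters cannot exceed the number of distinct variances and covariances. The Python sketch below applies it to a simple-structure CFA with continuous indicators. It is a necessary condition only, not a proof of identification, and it assumes no mean structure, one loading per factor fixed at 1, free factor variances and covariances, and uncorrelated residuals.

```python
def cfa_degrees_of_freedom(indicators_per_factor):
    """Counting-rule df for a simple-structure CFA with continuous indicators.

    indicators_per_factor: list with the number of indicators of each factor.
    Assumes one loading per factor fixed at 1, all factor variances and
    covariances free, and uncorrelated residuals.
    """
    p = sum(indicators_per_factor)          # observed variables
    m = len(indicators_per_factor)          # factors
    moments = p * (p + 1) // 2              # distinct variances and covariances
    free_loadings = p - m
    residual_variances = p
    factor_variances_covariances = m * (m + 1) // 2
    return moments - (free_loadings + residual_variances + factor_variances_covariances)

for design in ([3], [2], [2, 2], [4, 4, 4], [5, 5, 5]):
    print(design, cfa_degrees_of_freedom(design))
```

A negative result, as for a single factor with two indicators, means the model cannot be identified as specified; a non-negative result only says the count does not rule it out.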

Anonymous posted on Friday, October 11, 2002  12:16 am



I am estimating a CFA model with about 10 3-level indicators and 2 latent variables on about 1000 subjects, with 5 indicators per latent variable. I fix the latent variables to be uncorrelated. However, when I compute factor scores, I get a decidedly nonzero correlation between the factor scores (about 0.5), and the scores do not have mean zero and variance 1. I understand that the factor score empirical distribution is necessarily discrete according to the pattern of indicator values, but shouldn't the scores be uncorrelated? If not, what is the meaning of correlated factor scores from a model with uncorrelated factors?

bmuthen posted on Sunday, October 13, 2002  10:44 am



The distribution of estimated factor scores does not have the same means, variances, and covariances as the factors themselves. This is shown, for example, in the Lawley-Maxwell factor analysis book (look under the "regression method"); for the reference, see the Mplus web site. With many good indicators (high loadings), however, the estimated factor scores tend to behave more and more like the true scores. This is true whether your indicators are continuous or categorical. Your correlation of 0.5 between estimated factor scores seems high, however, unless your indicators are weak. I think you are saying that you have 3-category indicators. Perhaps their loadings are quite small. If you like, you can send your Mplus input, output, and data to support@statmodel.com
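For the simplest case, a single factor with continuous indicators, the variance of regression-method factor scores has a closed form that shows this shrinkage directly. The Python sketch below is an illustration under those assumptions only; the loadings and residual variances are hypothetical standardized values, and the categorical case discussed in the thread does not have such a formula.

```python
def regression_score_variance(loadings, residual_variances, factor_variance=1.0):
    """Variance of regression-method factor scores in a one-factor model.

    Uses Var(score) = phi**2 * lambda' Sigma^-1 lambda, which simplifies to
    phi * (phi * a) / (1 + phi * a) with a = sum(lambda_j**2 / theta_j).
    """
    a = sum(lam ** 2 / theta for lam, theta in zip(loadings, residual_variances))
    return factor_variance * (factor_variance * a) / (1.0 + factor_variance * a)

# Hypothetical standardized items: five weak versus five strong indicators.
weak = regression_score_variance([0.4] * 5, [0.84] * 5)
strong = regression_score_variance([0.9] * 5, [0.19] * 5)
print(round(weak, 3), round(strong, 3))
```

Both values fall below the true factor variance of 1, and the gap closes as the loadings increase, which matches the point that estimated scores behave more like true scores when the indicators are good.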


I am introducing Mplus to my CFA class and have not been able to find any studies in which the WLSMV, WLSM, and WLS estimators have been compared in terms of bias in standard errors and chi-square values and/or efficiency. Are any such studies available?

bmuthen posted on Monday, November 25, 2002  10:59 am



These two references cover this (I'd be happy to send them): Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Accepted for publication in Psychometrika. (#75) Muthén, B. (1993). Goodness of fit with categorical and other nonnormal variables. In K. A. Bollen, & J. S. Long (Eds.), Testing Structural Equation Models (pp. 205-243). Newbury Park, CA: Sage. (#45)


Thanks Bengt. Would you please send the Muthen, du Toit & Spisic? I have the other article. I am at: 325V Aderhold University of Georgia Athens, GA 30602 or email at dbandalo@coe.uga.edu. Thanks, Debbi 

bmuthen posted on Tuesday, November 26, 2002  6:46 am



Will send it. 

Anonymous posted on Thursday, August 14, 2003  7:58 am



Hi, I am calculating factor scores for dichotomous items using confirmatory factor analysis in Mplus. In general, what is the range of these factor scores?


The range depends on the estimated factor means and factor variances. 

Anonymous posted on Tuesday, September 09, 2003  12:20 pm



I'm trying to locate a piece Muthen did with Kaplan a while back that evaluated the performance of various estimators when doing CFA with categorical indicators and I'm having some difficulties. I believe the piece was Muthen and Kaplan, 1985 (British Journal of Mathematical and Statistical Psychology). Does this sound right, or did M+K do another piece which made similar comparisons ? 


Are these the references you want? Muthén, B., & Kaplan D. (1985). A comparison of some methodologies for the factor analysis of nonnormal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189. Muthén, B., & Kaplan D. (1992). A comparison of some methodologies for the factor analysis of nonnormal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.

Mpduser1 posted on Tuesday, September 09, 2003  3:02 pm



Those might be the ones; I'll give them a try. I've actually read the piece I'm attempting to refer to: M+K do a series of simulations and find that the Muthen estimator (whatever it was called at the time, but I assume it's the WLSMV estimator now) provides superior results over WLS and GLS. Thanks.


I am planning on fitting a CFA on a large sample and obtaining parameter estimates. I then plan to use/fix those parameter estimates in a CFA on another small sample and obtain factor score estimates based on the original large sample parameter estimates. I haven't seen this done, so I am wondering if there is a logical/technical problem with doing this? 


If the small sample is not from the same population as the large sample, this would not make sense. If it is, it would seem reasonable. I don't know of any literature on this. 


Thanks. Yes the small sample is very much like the large sample (actually a seemingly representative subset of the large sample) but the small sample has an interesting criterion variable available. I want to know the relationship between the factor scores and the criterion variable. The large sample is big enough to let me estimate the CFA parameters, while the small sample is not. 


Both samples should be randomly sampled from the same population for your logic to hold. I think you will run into critique if you can't claim that. 

Anonymous posted on Thursday, November 13, 2003  5:57 am



I am estimating several CFA models for a questionnaire with 112 dichotomous items. The sample size is 685. What estimator should I use (WLS, WLSM, or WLSMV)? (Item difficulties range from .2 to .8.)


We recommend WLSMV. It is the Mplus default. 

Dustin posted on Monday, December 22, 2003  11:27 am



Is there any way to turn off the default estimation of the covariances between continuous latent variables?

bmuthen posted on Monday, December 22, 2003  12:08 pm



If you want factors f1 and f2 to be uncorrelated, you simply say f1 with f2@0; 

dustin posted on Monday, December 22, 2003  12:39 pm



Is fixing the covariance to be equal to zero the same as not estimating the parameter to begin with, in terms of the df in the model?

bmuthen posted on Monday, December 22, 2003  12:53 pm



Yes. 


Drs. Muthen & Muthen, I am conducting a CFA with categorical indicators (using WLSMV and Type=missing). I am sometimes obtaining modification indices = 999.000. I've read on the discussion board that this is because the modification index is indeterminate (zero or near-zero denominator). My question is: Does this mean that the parameter estimates for my model are suspect? (The fit of my model is good; the chi-square is nonsignificant.) Or should I just ignore these modification indices and retain the model, since the model results make sense to me? Thanks, Scott

bmuthen posted on Saturday, May 22, 2004  10:03 am



No, this most likely has no implication for the quality of the estimates or the model; just ignore them.

Lieven posted on Thursday, October 14, 2004  3:06 am



I have 1056 variables (equity portfolios) with the number of observations ranging from 10 to 95 (monthly dollar returns). For some pairs of variables the covariance is missing because the observations may not overlap in time. These missing covariances can be set to zero. There are 74 factors: 1 world factor, on which each variable can have an unrestricted loading; 39 country factors, where each portfolio belongs to a specific country and can have an unrestricted loading on only that country factor, the rest being zero; and 34 industry factors, where each portfolio (variable) belongs to a specific industry and can have an unrestricted loading on only that industry factor, the rest being zero. I have initial values for the factors and the loadings, coming from a two-step regression method. Our aim is twofold: (1) getting estimates for the loadings and the factors in one shot, and (2) testing whether the 3 loadings per variable are equal. A reliable estimate of the fit of the model is less important. Is it possible to perform such an analysis? 


Mplus has a limit of 500 variables. This is an arbitrary limit, but it is nevertheless there in the current version. An analysis with considerably more variables than observations is difficult to carry out, particularly in terms of getting good inference (in your case, testing of equalities). The situation is similar to that of recent factor analyses of microarray data; see, for example, the work of Geoff McLachlan for some recent developments in this area. 

dana posted on Thursday, November 18, 2004  7:00 am



Hi, I would like more information about the fact that 'with categorical outcomes, the residual variances are not parameters in the model'. So that is different from when we have continuous outcomes, right? I wonder why we can't estimate the residual variances. I read this article: B. Muthén (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, vol 43, no 4, 551-560. On page 552, it is said that there is one 'necessary restriction to make since there is no possibility to identify the diagonal elements of sigma, only observing dichotomous variables'. I still don't understand why it is not possible to identify the diagonal elements. Can you please explain this to me? So if we need to make this necessary restriction (which is: diag(sigma) = I), does this mean that to know the psi covariance matrix of the errors, I need to do this calculation:

psi = I - diag(lambda * phi * t(lambda))

where
psi = covariance matrix of the errors
I = identity matrix
lambda = matrix of factor loadings
phi = covariance matrix of the factors
t(lambda) = transpose of the matrix lambda

If yes, does this mean that we can't fix or free any elements of the psi matrix? Thanks for your help! dana 

bmuthen posted on Thursday, November 18, 2004  1:50 pm



Look at Technical Appendix 1 for details about this from the perspective of probit regression. The issue arises from a binary variable having mean p (the probability) and variance p(1 - p), which means that there is no separate information about the variance beyond the information about the mean. This is different from the case of continuous variables. Yes, you compute the residual variance as in the formula you give. You can only estimate a residual variance if you work with longitudinal or multigroup data, where you have equality of measurement parameters; see Mplus Web Note #4. 
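
As a small worked example of the formula confirmed above, the following sketch computes 1 - diag(lambda * phi * t(lambda)) in plain Python. The loadings and factor covariance matrix are made-up values, and the function name is hypothetical; it assumes the latent response variables are scaled to unit variance, as in the restriction diag(sigma) = I.

```python
def implied_residual_variances(loadings, factor_cov):
    """Return 1 - diag(Lambda * Phi * Lambda') for each indicator."""
    residuals = []
    for row in loadings:
        # Variance of the latent response explained by the factors.
        explained = sum(
            row[j] * factor_cov[j][k] * row[k]
            for j in range(len(row))
            for k in range(len(row))
        )
        residuals.append(1.0 - explained)
    return residuals


# Hypothetical values: four binary indicators, two correlated factors.
lam = [[0.8, 0.0],
       [0.7, 0.0],
       [0.0, 0.6],
       [0.0, 0.5]]
phi = [[1.0, 0.3],
       [0.3, 1.0]]

for value in implied_residual_variances(lam, phi):
    print(round(value, 2))
```

With these values the implied residual variances are 0.36, 0.51, 0.64, and 0.75: they follow from the loadings and factor covariances rather than being estimated separately.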

dana posted on Friday, November 19, 2004  7:46 am



Thank you very much! I'll read the paper! Many thanks! dana 

Anonymous posted on Tuesday, November 23, 2004  3:41 am



What is the difference between using WLS and MLR when indicators in a CFA are categorical? 


With WLS, probit regressions are estimated. With MLR, logistic regressions are estimated. 
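
To make the distinction concrete, here is a minimal sketch of the two link functions using only the Python standard library. The function names are illustrative. The rescaling constant 1.702 is the usual approximation relating the two curves, which is why logistic coefficients tend to be roughly 1.7 times the corresponding probit coefficients.

```python
import math


def probit_probability(z):
    """P(u = 1) under a probit link: the standard normal CDF at z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def logit_probability(z):
    """P(u = 1) under a logit link: the logistic CDF at z."""
    return 1.0 / (1.0 + math.exp(-z))


# Compare the probit curve with a rescaled logistic curve on a grid.
grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(
    abs(probit_probability(z) - logit_probability(1.702 * z)) for z in grid
)
print(max_gap < 0.01)  # prints True
```

The two curves have nearly the same shape, so the choice mostly changes the scale of the loadings and thresholds, not the substantive conclusions.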

Anonymous posted on Wednesday, November 24, 2004  10:36 am



Hi, when doing a CFA with binary indicators and missing values (so TYPE = MISSING), is it possible to get CFI, TLI, RMSEA, and SRMR as we do when we do not use TYPE = MISSING? Thanks 


Add H1 to TYPE = MISSING and I think you will get what you want. 

Anonymous posted on Wednesday, November 24, 2004  5:48 pm



thanks! I have another question. What is the meaning of this warning: THE MODEL ESTIMATION TERMINATED NORMALLY WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE F2. thanks a lot 

dana posted on Friday, November 26, 2004  12:10 pm



Hi, I would like to add another question about the psi matrix (message from 18 November 2004). Following the formula:

psi = I - diag(lambda * phi * t(lambda))

where
psi = covariance matrix of the errors
I = identity matrix
lambda = matrix of factor loadings
phi = covariance matrix of the factors
t(lambda) = transpose of the matrix lambda

So that means that the errors are not correlated! Because, as I understand the equation, the identity matrix is, for example:

1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1

If yes, I thought that CFA could allow correlated errors! Thanks, dana 

bmuthen posted on Friday, November 26, 2004  12:24 pm



Correlated errors are allowed. Your computations above are only for the diagonal of the matrix you call Psi, not the off-diagonal elements. See the Mplus Technical Appendix 2, formulas 42 and on (where the residual covariance matrix is called Theta) for a more precise formulation. 

dana posted on Friday, November 26, 2004  12:37 pm



Thanks. Just to know: can Mplus give me the covariance matrix for the errors in the output? It seems a little complicated to calculate. Thanks 

bmuthen posted on Friday, November 26, 2004  2:15 pm



The residual covariances are parameters that can be identified and estimated in the model. Just say e.g. u1 with u2. 

dana posted on Monday, November 29, 2004  5:03 am



OK! So if I understand correctly: 1. In a CFA with binary indicators, I can't estimate the error variances, but I can calculate them with the formula above. 2. But I can estimate the error covariances by writing "u1 with u2" in the program. 3. So if I don't write "u1 with u2", that means that the errors of u1 and u2 are uncorrelated? Is that right? Thanks! dana 


1. Yes. 2. Yes. 3. Yes. 

Scott Engel posted on Thursday, December 09, 2004  12:01 pm



I ran a CFA with categorical factor indicators on 25 items with four scales. I got an excellent fit for this model. I followed this up with a second-order CFA to determine whether these four factors are well represented by a single latent construct. This model also fit well, and it appears that a single latent construct does, in fact, represent the four scales well. My question is this: given that I’ve run nonlinear factor analysis, can I use a simple sum score of the scales as a total score? Or do I need to consider weighting the scale scores based upon how they loaded on the second-order latent construct? For practical reasons I’d prefer to use a simple unweighted score. If the weighted scores are ideal, is using the unweighted scores defensible and an adequate rough approximation of the latent construct? 

bmuthen posted on Thursday, December 09, 2004  5:10 pm



The optimal approach would be to get the estimated factor scores for the second-order construct. You get these by using the FSCORES option of the SAVEDATA command. A sum of the 25 items is probably a decent rough approximation of the factor score. But note that you then revert to assuming interval scale for the four categories and also don't take into account the differential loadings on each of the two levels. 


A student is interested in doing a confirmatory factor analysis with categorical variables and has specified several items with multiple thresholds in a large sample. She wants to accomplish a change to her confirmatory model in which error variances are covaried. How can one do this, given the variables are categorical? A search of the manual revealed that it's possible to specify that thresholds are correlated. Does this accomplish the desired correlation between error terms? I'm somewhat unsure of what correlating thresholds means and where I'd read more about it. Given that the data are polytomous, what would it mean, by the way, if one specified a covariance between one of the thresholds and not the other between two manifests? As an interim workaround, I told the student to model the desired error covariance as a factor with two indicators, but of course, it would be good to know what the best answer is. thanks! 

BMuthen posted on Wednesday, January 12, 2005  4:33 pm



Residuals can be correlated using the default WLSMV estimator. These are the residuals for the underlying y* variables. The thresholds are not correlated. They are not random variables. 


Is it possible to get Mardia's coefficient in MPlus version 3? 


Technical output 13 provides univariate, bivariate, and multivariate sample skewness and kurtosis. This output is available for mixture models only. Non-mixture models can be estimated as a single-class mixture to get that output. 


Earlier I mentioned that I will be doing a CFA with 24 binary outcomes and 1 latent variable. It was suggested that I use WLSMV instead of WLS and that nested models should be compared using a chi-square difference test. On the Mplus output, however, a warning message says that "The chi-square value for MLM, MLMV, MLR, WLSM and WLSMV cannot be used for chi-square difference tests." It further says that MLM, MLR and WLSM chi-square difference testing is described in the Mplus Technical Appendices. How about WLSMV? Thank you! 


You need to use the DIFFTEST option to do chi-square difference testing using WLSMV. See the Mplus User's Guide for a description of the DIFFTEST option. See Example 12.12 for an illustration of this option. 

Anonymous posted on Thursday, March 17, 2005  10:39 pm



I am trying to reconcile my understanding of web note 4 with page 480 of the manual. My model is

MODEL:
f1 BY y1-y7;
f2 BY y8-y15;
f1 ON x1-x10;
f2 ON f1 x1-x15;

where y1-y15 are ordinal, f1 and f2 are continuous, and x1-x15 are a mix of continuous and categorical. My question is: are the output coefficients for the “ON” part of the model simple linear regression coefficients or probit (WLS) / logistic (MLE) regression coefficients? From web note 4 and a technical appendix, I started thinking the “ON” portion is a “latent response variable formulation” and could be viewed as a simple linear regression with a continuous outcome, since the latent variable is continuous. From p. 480 of the manual, I’m not sure this is the case. Either way, can I assume the parameter estimates are normal and use the estimates with their standard errors to get p-values? Another question: should I expect to see a coefficient on the output for f1? Thank you. 


Because f1 and f2 are continuous latent variables, the coefficients are simple linear regression coefficients. The factor loadings are probit regression coefficients. The ratio of the parameter estimate to its standard error can be used to assess significance. You should expect to see coefficients for x1-x10 for f1 and x1-x15 for f2. 

Anonymous posted on Monday, March 21, 2005  3:24 pm



This is a follow-up question about the model in my March 17, 2005 post above. I have been reading the technical appendix for Version 2 (although I’m using 3.12), but I am still unclear on how my model is being estimated. I’m not specifying an estimator, so the default WLSMV is being used. More specifically, I’m confused by the statement that the coefficients for the “ON” part of the model are simple linear regression coefficients, which would imply that maximum likelihood is being applied. Is the estimation applied to the “BY” part of the model different from that applied to the “ON” part of the model? Is WLSMV applied only to the “BY” part of the model? I also see in Tech 5 that gradient calculations and quasi-Newton steps are being applied. Are these steps required for both ML and WLSMV? On a separate topic, I need some help understanding the model fit: Chi-Square, Chi-Square for baseline (what is baseline?), CFI, TLI, RMSEA, WRMR. Can you recommend a reference? Thank you so much! 

bmuthen posted on Monday, March 21, 2005  5:36 pm



Say that a model has both (1) factor indicators and (2) regressions of factors on covariates. With categorical indicators, part (1) is a nonlinear regression (probit or logit), while part (2) is a linear regression. The type of regression is chosen as a function of the dependent variable being categorical in (1) and continuous in (2). The fact that linear regression is used in (2) does not imply that ML is used. With the WLSMV estimator you get probit regression in (1) and linear regression in (2), and all of this is done in a single WLSMV estimation step encompassing both (1) and (2). Yes, both ML and WLSMV require numerical optimization using gradient and quasi-Newton steps. See the Yu dissertation on the Mplus web site and references therein for these fit index definitions. 

Anonymous posted on Thursday, March 24, 2005  7:44 pm



Hi, I am using CFA to evaluate a model of 5 correlated first-order factors. My understanding is that it is appropriate to constrain error covariances to zero. It has been suggested to me that, instead, I estimate the error covariances among the items specified as indicators of a given factor, but not estimate error covariances for items across different factors. The argument given to me is that this is acceptable because, theoretically, the items serving as indicators of a given factor are measuring something distinct from the other factors. To me, this is already represented by the fact that such indicators are specified to load on one and only one of the factors. Is there ever a time to allow these errors to covary? The only situation that comes to my mind is if there are method effects, maybe reverse-scored items, but even in this situation I wouldn't be that comfortable doing it. Any thoughts about this situation or any other? How justifiable is it to estimate these parameters? 


It is justifiable to estimate residual covariances of factor indicators if there is a reason that the parameter makes sense in the model, for example, method effects or minor factors. I do not believe that this would apply to reverse-scored items. 


I am running a multiple-group CFA in which I want to compare the loadings on the same 6 factors in 4 different age groups. Each factor is defined using the same 2 observed variables in each of the 4 groups (factor1 by var1 var2 in group1, group2, group3, and group4). There are no covariates. Following the documentation in the Mplus 3.0 User's Guide, I first specified an overall model, followed by group-specific models for each of the age groups, listing only the second observed variable for each factor in the group-specific models (since the first variable will always have a loading of 1.0 to establish the scale). My questions: 1. When I specify the group-specific models, do I need to leave out one of the age categories, as I would in multiple regression to prevent multicollinearity? Curiously, I get the same results whether or not I omit one of the age categories. 2. The User's Guide says that, if I do not request a model for one of the groups that I have defined with the GROUPING option, the overall model will be fitted to the omitted group. Could you explain why I get the same output (identical statistics) if I don't include a group-specific model command for the age 15-24 group as when I do? I would have expected to get different loadings for the second observed variable on my 6 factors when I allow them to be freely estimated in the age 15-24 group compared to when I force that group to have the overall model by not including a model statement for it. 3. I have observed that I get exactly the same output (all statistics the same) regardless of which age group I omit (i.e., don't include a specific model for) or whether I specify that I want the loadings to be freely estimated separately for all 4 age groups. 
When I fit the same models (6 factors, 2 observed vars per factor, first var loading fixed at 1.0) across 3 education levels, I of course get different loadings than when I model by age groups, but again the output is the same across the education groups. To summarize: Can you explain this consistency regardless of how I specify that I want models built for each group? Whether I leave a group out of my model statements or don't, and regardless of which group I omit, I get the same results. Thank you so very much for whatever clarification you can provide. 

Thuy Nguyen posted on Tuesday, April 12, 2005  10:30 am



If possible, please send your input, output and data to support@statmodel.com. I think these questions will be better answered if we are able to see the results that you are seeing and how the model is set up. 

Anonymous posted on Monday, August 15, 2005  2:06 pm



First, let me say thank you for maintaining this discussion site. It is invaluable. I have found that your responses frequently answer questions I have had but did not post. I am running a CFA with two factors. I get the warning message that the psi matrix is not positive definite. The Tech4 option shows that the estimated correlation between my two factors is 1.098, which is the cause of the warning, but the output under Model Results shows under "factor1 with factor2" a correlation of just 0.118. Computationally, what's the difference between the estimated correlation between factors produced by the Tech4 option and the one that appears under the model results? 

bmuthen posted on Monday, August 15, 2005  3:47 pm



Thanks for the encouragement regarding our site. Tech4 gives the correlation which is the same as what is given in the regular output if the factors are exogenous in the model, but if the factors are dependent variables the regular output concerns the residual correlation. 
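
The distinction can be sketched numerically. The poster's model is not shown, so the example below assumes a hypothetical setup in which two factors are each regressed on a single covariate x; all values and the function name are invented for illustration. The model-implied (total) correlation, the kind of quantity Tech4 reports, combines the part due to x with the residual part, while the WITH entry for dependent factors reflects only the residual part.

```python
import math


def total_and_residual_correlation(b1, b2, var_x, res_var1, res_var2, res_cov):
    """Correlations for two factors that are both regressed on one covariate x."""
    # Model-implied moments: the part due to x plus the residual part.
    total_cov = b1 * b2 * var_x + res_cov
    total_var1 = b1 ** 2 * var_x + res_var1
    total_var2 = b2 ** 2 * var_x + res_var2
    total_corr = total_cov / math.sqrt(total_var1 * total_var2)
    # Residual correlation: what is left after accounting for x.
    residual_corr = res_cov / math.sqrt(res_var1 * res_var2)
    return total_corr, residual_corr


# Hypothetical values: slopes of 1, unit variances, residual covariance 0.2.
total_corr, residual_corr = total_and_residual_correlation(
    1.0, 1.0, 1.0, 1.0, 1.0, 0.2)
print(round(total_corr, 3), round(residual_corr, 3))
```

Here the total correlation is 0.6 while the residual correlation is only 0.2, so a large gap between the two outputs is expected whenever the factors share predictors.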

Anonymous posted on Wednesday, August 17, 2005  9:30 am



Could you please explain how to interpret the Residual Variances and R-SQUARE values listed in the output for a MIMIC model? 1. Are the residual variances the differences between the variances estimated for the observed variables (factor indicators) from the model and the actual variances for these variables calculated from the data, such as what is measured in the chi-square goodness-of-fit measure? 2. Or is the residual variance of a factor indicator the variation that's unaccounted for by its loadings on the factors? If this is the case, one would expect that the R-square value would be equal to (1 - the residual variance), but this does not seem to be the case judging from the values in my output. Is R-square the same as the communality in EFA? 3. I know that the R-square value is equal to 1 - the StdYX of the residual variance. But how do the R-square values and the raw residual variances relate to one another? Thank you for the information! 

BMuthen posted on Wednesday, August 17, 2005  2:18 pm



1. No. 2. That's right. In the MIMIC model, however, the R-square for a factor is not 1 - the residual variance; it is the ratio of the variance in the factor explained by the covariates to the total variance of the factor, where the total variance includes both the explained variance and the residual variance. 3. See number 2. 
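
The ratio described in point 2 can be written out as a short calculation. This is a sketch with made-up regression slopes, covariate covariances, and residual variance; the function name is hypothetical and the numbers do not come from any output in this thread.

```python
def factor_r_square(slopes, covariate_cov, residual_variance):
    """R-square of a factor regressed on covariates: explained / total variance."""
    # Explained variance is the quadratic form gamma' * Sigma_x * gamma.
    explained = sum(
        slopes[j] * covariate_cov[j][k] * slopes[k]
        for j in range(len(slopes))
        for k in range(len(slopes))
    )
    return explained / (explained + residual_variance)


# Hypothetical values: two covariates with unit variances and covariance 0.2.
slopes = [0.5, 0.3]
covariate_cov = [[1.0, 0.2],
                 [0.2, 1.0]]
r_square = factor_r_square(slopes, covariate_cov, residual_variance=0.6)
print(round(r_square, 3))
```

In this example the explained variance is 0.4 and the residual variance is 0.6, so the R-square is 0.4. It equals 1 - the residual variance only because the total variance happens to be 1 here.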

Anonymous1 posted on Monday, October 03, 2005  6:02 am



Good morning, I am a relatively new user of your software. I am currently attempting to conduct a CFA using the latest version of MPlus but keep encountering two series of error messages: 1. COMPUTATIONAL PROBLEMS ESTIMATING THE CORRELATION FOR x2 AND x4. INCREASING THE ITERATION OR CONVERGENCE OPTIONS MAY RESOLVE THIS PROBLEM. When I follow the suggested advice I then encounter the second message which pertains to a completely different variable: 2. SERIOUS COMPUTATIONAL PROBLEMS OCCURRED IN THE UNIVARIATE ESTIMATION OF THE THRESHOLDS/MEANS, VARIANCES AND/OR SLOPES FOR VARIABLE x10. What might these messages and their associated problems be attributable to? 

bmuthen posted on Monday, October 03, 2005  10:34 am



It sounds like your variables are categorical, so you may want to look at their univariate and bivariate distributions (frequency tables) to see if there is anything unexpected, such as very skewed distributions. If this doesn't help, send your input, data, and output to support@statmodel.com. 

Anonymous1 posted on Tuesday, October 04, 2005  1:37 pm



Thanks, Dr. Muthen. I followed your advice and reexamined the variable distributions. Several variables showed negative or positive skew. Transformations solved the problems that I'd been encountering. 

JI posted on Tuesday, October 04, 2005  2:03 pm



Hi, I'm a new Mplus user. I just ran a CFA with my .inp as shown below. However, I keep getting the error message "ERROR in Variable command Duplicate variable on NAMES list". I truly hope I can get some help with this issue. Thanks.

TITLE: CFA for Mach and Religiosity
DATA: FILE IS Mach_Rel_BE_EPbeg_EPnow.dat;
VARIABLE: NAMES ARE T1-T9 V1-V9 M1-M2 R1-R4 Be1-Be16 Epb1-Epb10 Epn1-Epn10;
USEVARIABLES ARE T1 T3 T5 T7 V1 V2 T2 T4 T6 T8 T9 M1 V3 V4 V6 V8 V9 R1-R4;
CATEGORICAL ARE T1 T3 T5 T7 V1 V2 T2 T4 T6 T8 T9 M1 V3 V4 V6 V8 V9 R1-R4;
MODEL: M BY T1 T3 T5 T7 V1 V2;
Mi BY T2 T4 T6 T8 T9 M1;
Fio BY V3 V4 V6 V8 V9;
Rel BY R1-R4; 

Thuy Nguyen posted on Tuesday, October 04, 2005  5:30 pm



JI, From your input, I don't see what might be causing a problem. If you send your input, output and data to support@statmodel.com, I can check into this for you. 


Hi, I'm trying to estimate the effect of a national-level observed variable on the covariance between two individual-level latent constructs. Is this possible using Mplus? I would like to be referred to some examples or papers that have done this type of analysis. Thanks in advance for your help, Pancho 


Can you describe this in a little more detail? For example, do you want to explain the covariance between an individual's income and SES by the nation's GNP? I don't think I understand what you are asking. 


Hi Linda, Yes your example is correct. Can I do that using Mplus? Thanks a lot! Pancho 


Explain to me how the nation variable has any variability. It seems it would be the same for each person. Or are you looking at several nations? 


Linda, Yes, I have a total of 25 nations, and their GNPs vary. So you are correct: people are nested within nations. Thanks so much, Pancho 


This will be available in Version 4 of Mplus. 

Ed Wu posted on Tuesday, November 15, 2005  5:01 pm



"In the MIMIC model, the R-square is... the ratio of variance in the factor explained by the covariates over the total variance of the factor, where the total variance includes both the explained variance and the residual variance." What is the relationship between R-square and communalities in models without covariates? 

bmuthen posted on Tuesday, November 15, 2005  5:47 pm



Communalities refer to explained variance in an item as a function of the factors influencing that item. So it is the R-square for the item instead of for the factor. 
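
For a continuous item loading on a single factor, the item-level R-square (communality) can be sketched as follows. The loading, factor variance, and residual variance are invented values and the function name is hypothetical. The example also shows why the raw residual variance is generally not 1 minus the R-square, whereas the standardized residual variance is.

```python
def item_r_square(loading, factor_variance, residual_variance):
    """Communality of a continuous item that loads on a single factor."""
    explained = loading ** 2 * factor_variance
    total = explained + residual_variance
    return explained / total, residual_variance / total


# Hypothetical item: loading 1.2, factor variance 0.5, raw residual variance 0.48.
r_square, standardized_residual = item_r_square(1.2, 0.5, 0.48)
print(round(r_square, 3), round(standardized_residual, 3))
```

The item's total variance is 1.2, so the R-square is 0.6 and the standardized residual variance is 0.4, while the raw residual variance remains 0.48.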

Blaze Aylmer posted on Saturday, November 19, 2005  4:33 am



Linda, can you recommend any sources that can help with the interpretation of CFA output in Mplus? Blaze 


Chapter 17 of the Mplus User's Guide has a description of the Mplus output. The scale of the factor indicators tells you what type of regression coefficient the factor loading is. As far as how to understand other things about CFA, see a book like the Bollen book or other references on the Mplus website. 

fati posted on Tuesday, January 17, 2006  6:55 am



I am estimating a CFA with 7 factors and 38 items using WLSMV. My output shows RMSEA=0.095 and CFI=0.94. I know that this is a poor model, and I have some questions; this is the first time that I have done this analysis.

1. What statistics can help me obtain a good model? My program is:

TITLE: CFA FOR PCAS (Categorical factor indicators)
DATA: FILE IS PCAS.dat;
VARIABLE: NAMES ARE q1b q2 q3b q4b q6a q6b q9a q9b q9c q9d q9e q10 q11a q11b q11c q11d q11e q12b q12d q12g q14a q14b q14c q14d q15 q19a q19b q19c q19d q19e q19f q7r q8r q12ar q12cr q12er q12fr q13REC;
CATEGORICAL ARE ALL;
missing = ;
MODEL: ACCES BY q1b q2 q3b q4b q6a q6b;
CR_VISIT BY q7r q8r;
CR_KNOW BY q14a q14b q14c q14d q15;
IC_COMM BY q9a q9b q9c q9d q9e q10;
IC_TRUST BY q12ar q12b q12cr q12d q12er q12fr q12g q13rec;
MC_INTEG BY q19a q19b q19c q19d q19e q19f;
RESP BY q11a q11b q11c q11d q11e;
OUTPUT: tech2 modindices;

2. Can I use the modification indices for this? My modification indices output is:

MODEL MODIFICATION INDICES
Minimum M.I. value for printing the modification index 10.000

                       M.I.    E.P.C.  Std E.P.C.  StdYX E.P.C.
BY Statements
ACCES    BY Q10      17.281    0.423     0.237       0.237
CR_KNOW  BY Q6B      17.053    0.254     0.226       0.226
IC_COMM  BY Q3B      10.165    0.155     0.146       0.146
IC_COMM  BY Q6B      26.019    0.268     0.253       0.253
IC_COMM  BY Q15      12.545    0.317     0.299       0.299
IC_TRUST BY Q6B      31.992    0.361     0.278       0.278
IC_TRUST BY Q15      13.594    0.405     0.312       0.312
MC_INTEG BY Q6B      18.600    0.312     0.256       0.256
RESP     BY Q6B      24.439    0.270     0.249       0.249
RESP     BY Q15      12.753    0.348     0.321       0.321

How can I apply this change in my model? For example, I understand that q10 would fit better with the factor ACCES. How can I do this? Do I move item q10 to the factor ACCES?

3. How can I use the estimates in order to get a better model? What does it mean if I have:

MODEL RESULTS
               Estimates    S.E.   Est./S.E.
ACCES BY
  Q1B            1.000     0.000     0.000
  Q2             1.235     0.101    12.262
  Q3B            1.329     0.118    11.280
  Q4B            1.200     0.117    10.230
  Q6A            1.317     0.117    11.227
  Q6B            1.534     0.147    10.406

For q1b, can I say that q1b is not significant? What can I do to change this? Thank you very much in advance for your response. 


Given that the modification indices suggest a lot of possible cross-loadings, I would suggest starting with an EFA. It may be that the data you have do not support the theory that you are testing. EFA will give you a better idea of whether the variables are behaving as you expect. 

fati posted on Tuesday, January 17, 2006  9:19 am



Thank you for your response. I have done an EFA and used promax-rotated loadings (>0.4) to compare the items in each factor. There is a little difference, but when I run a CFA for the changed model, my RMSEA is still >0.05 (RMSEA=0.10, CFI=0.936). What does that mean? My modification indices are:

MODEL MODIFICATION INDICES
Minimum M.I. value for printing the modification index 10.000

                       M.I.    E.P.C.  Std E.P.C.  StdYX E.P.C.
BY Statements
ACCES    BY Q10      16.162    0.223     0.223       0.223
CR_KNOW  BY Q6B      17.360    0.226     0.226       0.226
IC_RESP  BY Q6B      25.570    0.246     0.246       0.246
IC_RESP  BY Q14A     10.447    0.252     0.252       0.252
IC_RESP  BY Q15      17.956    0.352     0.352       0.352
IC_TRUST BY Q6B      31.391    0.277     0.277       0.277
MC_INTEG BY Q6B      18.837    0.257     0.257       0.257
OTHER    BY Q3B      10.089    0.163     0.163       0.163
OTHER    BY Q6B      31.372    0.307     0.307       0.307
OTHER    BY Q14A     10.620    0.247     0.247       0.247
OTHER    BY Q15      31.692    0.427     0.427       0.427

ON/BY Statements
IC_RESP ON CR_KNOW  / CR_KNOW BY IC_RESP    999.000    0.000    0.000    0.000
IC_RESP ON IC_RESP  / IC_RESP BY IC_RESP    999.000    0.000    0.000    0.000
IC_RESP ON IC_TRUST / IC_TRUST BY IC_RESP   999.000    0.000    0.000    0.000
IC_RESP ON MC_INTEG / MC_INTEG BY IC_RESP   999.000    0.000    0.000    0.000
IC_RESP ON OTHER    / OTHER BY IC_RESP      999.000    0.000    0.000    0.000

What can I do to get a good model? Thanks 


I'm afraid that I cannot tell you how to get your model to fit. I don't think you would have so many factor loading modification indices if your EFA clearly pointed to the factors in your CFA. I would revisit the EFA. 

jad posted on Wednesday, January 18, 2006  12:16 pm



I am conducting a CFA with 40 categorical factor indicators (7 constructs) using the default estimator (WLSMV). I understand from the article by David B. Flora and Patrick J. Curran (2004) that robust WLS is robust to modest violations of underlying normality. How can I determine whether nonnormality is modest? Do I use skewness and kurtosis? If so, how can I do this in Mplus? Do I check the normality of each item used in my analysis? I have 40 items with 350 observations. Thanks 


The normality assumption is for the u* variables underlying the observed u variables. It is difficult to test their normality. From a practical point of view, if you did test the normality of the u* variables and found that they were extremely nonnormal, what would you do? You would probably just use a robust weighted least squares estimator. To read more about these issues, see the following paper: Muthén, B. (1993). Goodness of fit with categorical and other nonnormal variables. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 205-243). Newbury Park, CA: Sage. (#45) If you don't have access to the paper, you can request paper 45 from bmuthen@ucla.edu. 

JAD posted on Monday, January 23, 2006  7:54 am



THANK YOU 


I'm running a 5-factor CFA with 44 dichotomous items. Here is my input syntax:

VARIABLE: NAMES ARE q1-q44;
CATEGORICAL ARE q1-q44;
ANALYSIS: ESTIMATOR=WLSMV;
MODEL: f1 by q5* q8* q12* q21* q24* q43*;
f2 by q2* q6* q10* q11* q13* q15*;
f3 by q3* q4* q14* q16* q18* q19* q22* q31* q33* q35* q37* q40* q42*;
f4 by q9* q17* q20* q23* q25* q26* q27* q32* q34* q36* q39* q44*;
f5 by q1* q7* q28* q29* q30* q38* q41*;
f1 @ 1; f2 @ 1; f3 @ 1; f4 @ 1; f5 @ 1;

I get the following warning: WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE F2. I'm confused as to what this means. 1) Is this the covariance matrix of the item residuals? I don't think so, but I need to clarify, and wouldn't it be called theta? 2) Is this the covariance matrix of the 5 latent factors? If so, why is it referring to residuals when my factors are just freely correlated (i.e., nothing is predicting them, so there should be no residual), and wouldn't this be called phi? Thanks a lot 


It is referring to the covariance matrix of the factors. Most likely f2 has a negative or zero variance. I would have to see the entire output to understand why residual is being printed. You can send it along with your license number to support@statmodel.com. 


I ran two CFA programs with exactly the same MODEL command but with different USEVARIABLES statements, and they generated two very different outputs. The first statement listed three variables (USEVARIABLES) and generated an RMSEA of 0.000. (Extremely perfect!!!) The second statement listed four variables (USEVARIABLES) and generated an RMSEA of 1.581. (Extremely bad!!!) Below are my commands for this simple CFA:

DATA: FILE = "y:\carmen\carmen2.txt";
FORMAT = free;
VARIABLE: NAMES ARE id fib211 fib212 fib213 fib214;
USEVARIABLES =
! Statement 1:
! fib211 fib212 fib214;
! Statement 1 produces an RMSEA score of 0.000 ;
! Statement 2:
fib211 fib212 fib213 fib214;
! Statement 2 produces an RMSEA score of 1.581;
CATEGORICAL =
! Statement 1:
! fib211 fib212 fib214;
! Statement 2:
fib211 fib212 fib213 fib214;
MISSING ARE ALL(99);
ANALYSIS: TYPE = GENERAL MISSING H1 ;
ESTIMATOR = WLSMV;
MODEL: f1 by fib211 fib212 fib214;
OUTPUT: SAMPSTAT STANDARDIZED;

Will someone please tell me what is wrong here? Thank you very much for your help. 


When you have four variables on the USEVARIABLES list and only three variables in the MODEL command, all four variables are used in the analysis. The variable not mentioned in the model command is not correlated with any of the other variables. This could make the model not fit. You will find a message to this effect in the output. If you have further questions of this type, send them along with the input, data, output, and your license number to support@statmodel.com. 


i have a well-fitting model for an instrument that mixes categorical and continuous items. in the model, items load on three lower-level factors (two of which include categorical items), which then load on a single factor. the categorical items are all "yes/no." a somewhat arbitrary scoring system exists for this measure. based on the above discussion, i get the sense that, at least for the factors that include categorical items, an iterative process must be gone through in order to create the factor scores. i want to be able to describe a scoring system that does not require use of mplus, so i have several questions: 1. there is some discussion above about creating a table that includes all possible answers and the resulting score on the factor. however, it's not particularly clear to me how this would be accomplished. can someone provide a little more detail? 2. am i correct that the iterative process is necessary because of the categorical items? that is, for the one lower-order factor that is defined by continuous variables, can i instead use the loadings on the factor to estimate factor scores? 3. when categorical and continuous variables are both included, are the continuous variables part of the iterative process, or is their contribution to the factor score more direct? i went into this thinking that it should be easy to come up with something better than the arbitrary scoring method that now exists, but reading the posts above i'm realizing i don't really know much about the issues here! thanks in advance for any thoughts on this question. 


Whenever some observed factor indicators are categorical, the factor score estimates have to be obtained using an iterative optimization procedure. There is no simple approximation (short of just summing the items). The procedure is described in the technical appendix for Version 2 posted on the web site. So the estimation has to be done in Mplus. A tabulation approach is not feasible. 


thanks for the answer. just to clarify: an above answer to a similar question seemed to suggest that, for example, if a factor is defined by 5 yes/no items, then a table could be constructed that would look something like (spacing isn't coming out exactly as i meant to, but i hope the idea is clear enough):

1  2  3  4  5  score
n  n  n  n  n  0
y  n  n  n  n  2
y  y  n  n  n  3
y  n  y  n  n  3.5
etc...

where all possible combinations are expressed and the corresponding factor score is indicated. i am not sure this ends up being at all feasible in my data set (one of the factors has 7 yes/no items and one item with responses from 1-6, so that would be a *lot* of rows in a table), but is it *theoretically* possible to create such a table? (if not, i'm giving up; if so, it would be nice to have a pointer or two.) thanks! tom 


Each distinct response pattern (row in your table) does give rise to a distinct estimated factor score value. To fill in the value for each row, you would need estimates for the model parameters and then compute the estimated factor scores using the iterative optimization technique. If this has been computed, the table can be created and then applied to any other individuals with these values. 
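
As a rough illustration of what such a table looks like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the loadings and thresholds are invented rather than taken from any fitted model, and the score is a posterior-mean (EAP) estimate computed on a grid under a probit model with a standard normal factor. That is a simpler stand-in for the iterative optimization Mplus uses, so the numbers would not match Mplus factor scores.

```python
import math
from itertools import product

# Hypothetical probit item parameters for five yes/no items.
# These are made-up values, NOT estimates from any real model.
LOADINGS = [0.8, 1.0, 1.2, 0.9, 1.1]
THRESHOLDS = [-0.5, 0.0, 0.3, 0.8, 1.0]


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def pattern_likelihood(pattern, eta):
    """Probability of a 0/1 response pattern given factor value eta."""
    like = 1.0
    for y, lam, tau in zip(pattern, LOADINGS, THRESHOLDS):
        p_yes = normal_cdf(lam * eta - tau)
        like *= p_yes if y == 1 else 1.0 - p_yes
    return like


def eap_score(pattern, n_points=161, lo=-4.0, hi=4.0):
    """Posterior mean of the factor on an evenly spaced grid, N(0, 1) prior."""
    step = (hi - lo) / (n_points - 1)
    num = 0.0
    den = 0.0
    for i in range(n_points):
        eta = lo + i * step
        weight = math.exp(-0.5 * eta * eta) * pattern_likelihood(pattern, eta)
        num += eta * weight
        den += weight
    return num / den


# One row per distinct response pattern: 2**5 = 32 rows for five binary items.
score_table = {p: eap_score(p) for p in product((0, 1), repeat=5)}

for pattern, score in sorted(score_table.items(), key=lambda kv: kv[1]):
    print("".join("y" if y else "n" for y in pattern), round(score, 3))
```

Once the table exists, scoring a new respondent is only a lookup on the response pattern. The number of rows doubles with every added binary item, which is why the table gets large quickly.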


thanks for your answer, that clarifies the task. i'll go bang my head against that particular wall for a while, but i may be back with further questions. thanks again! 

yang posted on Friday, April 21, 2006  6:58 am



Is it permitted for a factor to have continuous and categorical (binary) indicators at the same time? 


A factor may have a combination of continuous and categorical factor indicators. 


Am I understanding correctly? (1) The scale factor in categorical data CFA is the latent response variable associated with each observed categorical factor indicator? (2) If we want to compare two models, one nested in the other, we should use the uncorrected chi-square that results from a non-robust estimation method. Thanks Linda and Bengt! 


The scale factor is one divided by the standard deviation of the latent response variable. For WLS, you can use a standard chi-square difference. For WLSM, you need to use the scaling correction factor provided in the output. For WLSMV, you need to use the DIFFTEST option. 
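
For the WLSM case, the hand calculation with scaling correction factors can be sketched in Python as below. This follows the scaled difference formula as I understand it from the Mplus website; the function and argument names are my own, and the numbers in the usage lines are invented, so treat it as a sketch rather than a reference implementation.

```python
def scaled_chisq_difference(t0, df0, c0, t1, df1, c1):
    """Scaled chi-square difference for two nested models.

    t0, df0, c0: scaled chi-square, degrees of freedom and scaling
                 correction factor of the nested (more restrictive) model.
    t1, df1, c1: the same quantities for the less restrictive model.
    Returns (difference statistic, difference in degrees of freedom).
    """
    # Scaling correction for the difference test.
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)
    # Undo each model's own scaling, then rescale the difference.
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, df0 - df1


# Hypothetical values, for illustration only.
stat, df_diff = scaled_chisq_difference(t0=120.5, df0=50, c0=1.20,
                                        t1=98.3, df1=48, c1=1.15)
print(round(stat, 3), df_diff)
```

When both correction factors equal 1 the result reduces to the ordinary chi-square difference, which is the WLS case mentioned above.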


Hi Linda, I'm running a CFA with 5 factors and 29 items. As have others above, I received the message: THE MODEL ESTIMATION TERMINATED NORMALLY WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE avoid. I checked the tech4 and the correlation with another factor is larger than 1. (same in model result). What is the fix for this? Theoretically I would expect these factors to be negatively correlated, so I hesitate to fix it to 0. Thanks! 


This means that the two factors with a correlation greater than one are statistically indistinguishable from each other. You either need to rethink them or use only one of them. Did you ever do an EFA with the set of items to see if they are loading as expected? It may be that your items are not valid measures of what they are meant to measure. 

yang posted on Monday, May 15, 2006  12:45 pm



Will confirmatory factor analysis allow one indicator to be loaded on two factors? Thanks. 


Yes, cross loadings are possible in CFA. 


Response to your May 12 posting: That's a good thought, but it doesn't seem to fit. I did do an EFA (both 4 and 5 factor solution) and these two factors are the ones that have the loadings most consistent with the theory (the others have some problems, but these two are very robust and they are generally not correlated more than .30). The other 3 factors, which don't have the correlation greater than 1, are not as distinct as I would like. Could this be affecting the parameters of the more robust factors? Thanks 


I don't think so. I would do an EFA to see where the items load. If they don't load according to theory, I would think about the validity of the items. 


I am puzzled by a discrepancy between EFA and CFA. In the EFA the factor intercorrelations are all within reasonable range, whereas in the CFA two factors (1 and 5) end up with a correlation >1. I know the solution below is not very good, but this ALSO happens when I eliminate the items that are cross-correlated or don't load according to the theory and when I assign items in the CFA to the factors they load on in the EFA. Any other thoughts on this?

EFA result

EXPLORATORY ANALYSIS WITH 5 FACTOR(S) :
CHI-SQUARE VALUE 522.915
DEGREES OF FREEDOM 271
PROBABILITY VALUE 0.0000
RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) : ESTIMATE IS 0.052
ROOT MEAN SQUARE RESIDUAL IS 0.0373

PROMAX ROTATED LOADINGS
              1      2      3      4      5
AWARE1    0.770  0.233  0.056  0.140  0.006
AWARE2    0.610  0.077  0.070  0.057  0.031
AWARE3    0.662  0.059  0.019  0.098  0.013
AWARE4    0.513  0.400  0.121  0.197  0.074
AWARE5    0.584  0.280  0.117  0.248  0.015
AWARE6    0.021  0.262  0.074  0.111  0.409
AWARE7    0.030  0.021  0.009  0.057  0.806
AWARE8    0.055  0.145  0.032  0.125  0.860
AWARE9    0.277  0.197  0.022  0.284  0.007
AWARE10   0.005  0.100  0.012  0.018  0.578
GATHER1   0.062  0.617  0.129  0.244  0.052
GATHER2   0.105  0.587  0.012  0.176  0.026
GATHER3   0.000  0.530  0.068  0.295  0.044
GATHER4   0.141  0.107  0.054  0.376  0.010
GATHER5   0.002  0.231  0.230  0.139  0.075
GATHER6   0.116  0.678  0.133  0.036  0.009
GATHER7   0.050  0.605  0.071  0.264  0.065
PREFER1   0.081  0.614  0.016  0.441  0.088
PREFER2   0.054  0.177  0.138  0.495  0.060
PREFER3   0.048  0.019  0.599  0.395  0.001
PREFER4   0.093  0.113  0.586  0.076  0.048
PREFER5   0.155  0.371  0.284  0.387  0.004
PREFER6   0.037  0.293  0.394  0.348  0.029
PLANS1    0.027  0.202  0.056  0.674  0.063
PLANS2    0.035  0.113  0.047  0.597  0.009
PLANS3    0.059  0.498  0.158  0.174  0.050
PLANS4    0.022  0.133  0.010  0.263  0.050
PLANS5    0.021  0.038  0.060  0.512  0.059
PLANS6    0.011  0.081  0.061  0.290  0.028

PROMAX FACTOR CORRELATIONS
        1      2      3      4      5
1   1.000
2   0.424  1.000
3   0.249  0.337  1.000
4   0.234  0.295  0.258  1.000
5   0.036  0.268  0.036  0.088  1.000

CFA result

Chi-Square Test of Model Fit
  Value 4942.984
  Degrees of Freedom 367
  P-Value 0.0000
Chi-Square Test of Model Fit for the Baseline Model
  Value 22500.021
  Degrees of Freedom 406
  P-Value 0.0000
CFI/TLI
  CFI 0.793
  TLI 0.771
Loglikelihood
  H0 Value 41460.522
  H1 Value 38989.030
Information Criteria
  Number of Free Parameters 68
  Akaike (AIC) 83057.044
  Bayesian (BIC) 83329.977
  Sample-Size Adjusted BIC 83114.201 (n* = (n + 2) / 24)
RMSEA (Root Mean Square Error Of Approximation)
  Estimate 0.175
  90 Percent C.I. 0.170 0.179
  Probability RMSEA <= .05 0.000
SRMR (Standardized Root Mean Square Residual)
  Value 0.114

MODEL RESULTS
             Estimates    S.E.  Est./S.E.      Std   StdYX
AWARE BY
  AWARE1         1.000   0.000      0.000    9.144   0.727
  AWARE2         1.169   0.082     14.332   10.691   0.686
  AWARE3         1.343   0.069     19.605   12.280   0.918
  AWARE4         1.485   0.080     18.542   13.575   0.872
  AWARE5         1.463   0.091     16.073   13.381   0.764
  AWARE9         1.611   0.076     21.202   14.734   0.985
AVOID BY
  AWARE6         1.000   0.000      0.000   11.760   0.668
  AWARE7         1.128   0.072     15.678   13.271   0.818
  AWARE8         1.059   0.063     16.749   12.459   0.884
  AWARE10        1.241   0.068     18.204   14.596   0.979
GATHER BY
  GATHER1        1.000   0.000      0.000   10.569   0.625
  GATHER2        1.605   0.104     15.380   16.963   0.934
  GATHER3        1.495   0.102     14.631   15.801   0.869
  GATHER4        1.643   0.121     13.534   17.364   0.780
  GATHER5        1.622   0.104     15.566   17.141   0.951
  GATHER6        1.626   0.107     15.234   17.182   0.921
  GATHER7        1.631   0.105     15.552   17.242   0.950
DECIDE BY
  PREFER1        1.000   0.000      0.000   18.141   0.999
  PREFER2        1.004   0.003    303.259   18.210   0.999
  PREFER3        0.926   0.013     72.939   16.802   0.965
  PREFER4        0.989   0.018     53.961   17.934   0.937
  PREFER5        0.928   0.018     50.476   16.834   0.929
  PREFER6        0.452   0.034     13.314    8.198   0.550
CONCRETE BY
  PLANS1         1.000   0.000      0.000   14.628   0.866
  PLANS2         1.013   0.050     20.421   14.813   0.788
  PLANS3         0.980   0.039     25.265   14.338   0.883
  PLANS4         0.952   0.042     22.604   13.921   0.834
  PLANS5         0.859   0.047     18.399   12.571   0.739
  PLANS6         0.934   0.030     30.998   13.667   0.968
AVOID WITH
  AWARE        109.749  10.896     10.072    1.021   1.021
GATHER WITH
  AWARE         61.594   7.458      8.258    0.637   0.637
  AVOID         81.230  10.001      8.122    0.654   0.654
DECIDE WITH
  AWARE        106.331  10.965      9.697    0.641   0.641
  AVOID        141.003  14.838      9.503    0.661   0.661
  GATHER       188.091  17.704     10.624    0.981   0.981
CONCRETE WITH
  AWARE        115.983  10.850     10.690    0.867   0.867
  AVOID        149.943  14.608     10.264    0.872   0.872
  GATHER       104.788  11.798      8.882    0.678   0.678
  DECIDE       167.761  16.404     10.227    0.632   0.632
Variances
  AWARE         83.616   9.716      8.606    1.000   1.000
  AVOID        138.302  17.817      7.762    1.000   1.000
  GATHER       111.711  15.937      7.009    1.000   1.000
  DECIDE       329.089  23.066     14.267    1.000   1.000
  CONCRETE     213.979  19.474     10.988    1.000   1.000
Residual Variances
  AWARE1        74.442   5.142     14.477   74.442   0.471
  AWARE2       128.728   8.912     14.445  128.728   0.530
  AWARE3        28.213   1.924     14.667   28.213   0.158
  AWARE4        57.915   3.952     14.654   57.915   0.239
  AWARE5       127.639   8.796     14.512  127.639   0.416
  AWARE6       171.665  11.655     14.729  171.665   0.554
  AWARE7        87.099   5.784     15.058   87.099   0.331
  AWARE8        43.288   2.868     15.092   43.288   0.218
  AWARE9         6.641   0.581     11.431    6.641   0.030
  AWARE10        9.108   0.759     11.996    9.108   0.041
  GATHER1      174.124  12.288     14.170  174.124   0.609
  GATHER2       41.799   3.255     12.842   41.799   0.127
  GATHER3       81.028   5.930     13.665   81.028   0.245
  GATHER4      193.663  13.851     13.982  193.663   0.391
  GATHER5       30.818   2.518     12.241   30.818   0.095
  GATHER6       52.600   4.008     13.123   52.600   0.151
  GATHER7       32.027   2.604     12.299   32.027   0.097
  PREFER1        0.759   0.172      4.404    0.759   0.002
  PREFER2        0.677   0.172      3.940    0.677   0.002
  PREFER3       21.023   1.491     14.100   21.023   0.069
  PREFER4       44.379   3.127     14.193   44.379   0.121
  PREFER5       44.784   3.152     14.207   44.784   0.136
  PREFER6      154.765  10.827     14.294  154.765   0.697
  PLANS1        71.324   5.564     12.819   71.324   0.250
  PLANS2       134.137   9.927     13.512  134.137   0.379
  PLANS3        57.917   4.622     12.530   57.917   0.220
  PLANS4        84.616   6.418     13.184   84.616   0.304
  PLANS5       131.413   9.575     13.725  131.413   0.454
  PLANS6        12.449   1.765      7.053   12.449   0.062


You are comparing an EFA to a simple structure CFA. These are two different models and therefore would not necessarily end up with the same results. Also, please don't paste large portions of output on the discussion board as it takes too much room. If it is necessary to show this much output, please send the question to support@statmodel.com along with your license number. 
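
For readers following the thread, the out-of-range correlation in the CFA output can be reproduced by hand from the reported covariance and factor variances. The short Python check below uses only the three numbers printed in the post (AVOID WITH AWARE, and the AWARE and AVOID variances); the function name is just a label for this sketch.

```python
import math


def correlation_from_covariance(cov, var_a, var_b):
    """Convert a covariance into a correlation using the two variances."""
    return cov / math.sqrt(var_a * var_b)


# Values copied from the CFA output posted above.
cov_avoid_aware = 109.749
var_aware = 83.616
var_avoid = 138.302

r_avoid_aware = correlation_from_covariance(cov_avoid_aware, var_aware, var_avoid)
print(round(r_avoid_aware, 3))
```

The estimated covariance is larger than the geometric mean of the two variances, which is what pushes the standardized value past 1 and makes the factor covariance matrix not positive definite.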

Lois Downey posted on Thursday, June 29, 2006  10:24 am



The following statement appears in B. Muthen's post of 10/13/02, 10:44 AM: "With many good indicators (high loadings), ... the estimated factor scores tend to behave more and more like the true scores." I computed factor scores for a single-factor complex missing model with 10 dichotomous indicators. 801 patients were clustered under 92 physicians. Standardized loadings ranged from 0.934 to 0.858. Am I correct in interpreting this to constitute "many good indicators"? Estimated mean and variance for the latent variable were 0.00 and 0.79, respectively, whereas the mean and variance for the factor scores were 0.07 and 0.44, respectively. Would one expect greater correspondence between the two distributions than this, given the number of indicators and the size of the loadings? 


With binary indicators you need closer to 20 indicators for the factor scores to behave well. 


Dear prof. I am trying to fit a latent class analysis model to four manifest variables, each with two levels, and one covariate, i.e. age. Intuitively, since I am working on diagnostic tests, I would rarely expect the number of classes to go beyond 3; two would be better. Unfortunately, the model with 3 classes seems to fit better than the one with two, and a model with four may still be significant. One of the problems I noticed is that there is a high correlation between manifest variables, about 0.95. Could you please advise me on how to handle the effects of correlation in modelling latent classes? Second, I saw a correlation of about 0.95. Can you kindly give me an idea of how it is obtained? To my knowledge correlation is for continuous variables, and we commonly use the odds ratio as a measure of association for categorical variables. Find the attached Mplus code for my analysis.

Title: Summer Latent Class Analysis.
Data: File is C:\Documents and Settings\Maruwa1\Desktop\mps2.txt;
Variable: names = id age viap paplsilp colpohgrp hpv;
  usevariables = id age viap paplsilp colpohgrp hpv ;
  categorical = viap paplsilp colpohgrp hpv;
  classes = c(3);
  missing=all(9999);
Analysis: Type=missing mixture ;
MODEL: %oVERALL% C#1 ON AGE;
Plot: type is plot3;
  series is viap(1) paplsilp(2) colpohgrp(3) hpv(4);
Savedata: file is mps_save.txt ;
  save is cprob;
  format is free;
Output: tech11 tech14; 


When I look at your input, I wonder if you want to include the ID variable in the analysis. By including it on the USEVARIABLES list, it will be used as a latent class indicator along with the four categorical variables. You would need to send your output and license number to support@statmodel.com for me to comment on the .95 correlation. I can't see where that would come from. 

SC posted on Tuesday, November 21, 2006  3:11 pm



TECH4 shows model estimated means. For example, my latent factor "F1" has three indicator variables X1, X2, and X3, all on a 7-point Likert scale. The sample statistics tell me that the means for the indicators X1, X2, and X3 are 5.105, 4.643, and 4.827. However, the mean for the latent factor F1 is only "0.123".

            Estimates    S.E.  Est./S.E.    Std   StdYX
F1 BY
  X1            1.000   0.000      0.000  1.155   0.760
  X2            1.274   0.091     13.949  1.472   0.917
  X3            1.215   0.102     11.893  1.403   0.888

SAMPLE STATISTICS
     X1       X2       X3
  5.105    4.643    4.827

TECHNICAL 4 OUTPUT
ESTIMATED MEANS FOR THE LATENT VARIABLES
     F1
  0.123
ESTIMATED COVARIANCE MATRIX FOR THE LATENT VARIABLES
       F1
F1  1.333

How is the mean "0.123" for the latent factor calculated? Why is it so small when compared to the means of the respective indicator variables? In journals, we need to report the means and std.devs, so is this the value we report? 


Remember that the mean of an indicator y is: Mean(y) = intercept + loading*Mean(factor) 
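
To make the relation concrete, here is a small Python sketch using the X2 numbers from the post above (loading 1.274, factor mean 0.123, sample mean 4.643). The intercept is not shown in the posted output, so it is backed out here under the assumption that the model reproduces the sample mean exactly; that assumption and the variable names are mine.

```python
def implied_indicator_mean(intercept, loading, factor_mean):
    """Mean(y) = intercept + loading * Mean(factor)."""
    return intercept + loading * factor_mean


# Values taken from the post above.
loading_x2 = 1.274
factor_mean = 0.123
sample_mean_x2 = 4.643

# Assumption: the model-implied mean of X2 equals its sample mean,
# so the intercept is whatever is left after the factor's contribution.
implied_intercept = sample_mean_x2 - loading_x2 * factor_mean

print(round(implied_intercept, 3))
print(round(implied_indicator_mean(implied_intercept, loading_x2, factor_mean), 3))
```

Most of the indicator's mean sits in the intercept. The factor mean is on its own scale and is not an average of the indicator means, which is why it looks small next to them.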

A. Dyrlund posted on Wednesday, January 31, 2007  9:21 am



If I have a simple CFA model where:

VARIABLE: NAMES ARE psq1-psq15;
MODEL: reward BY psq1, psq6, psq11;
  coercive BY psq2, psq7, psq12;
  referent BY psq3, psq8, psq13;
  legit BY psq4, psq9, psq14;
  expert BY psq5, psq10, psq15;

What is the syntax for placing this CFA model in EFA in order to obtain the promax rotation results and eigenvalues? 


With EFA, all you can specify is the number of factors, not which variables load on which factors. All variables load on all factors. You would say TYPE= EFA 5 5; to obtain the five-factor solution. 

A. Dyrlund posted on Wednesday, January 31, 2007  11:20 am



Then is there any way to obtain eigenvalues and an oblique rotation while specifying specific items loading on specific factors? 


You can get an oblique rotation if you do "EFA within a CFA"; see our course material, "Day 1". But Mplus does not give eigenvalues except for EFAs. 

A. Dyrlund posted on Wednesday, January 31, 2007  12:25 pm



I have purchased the manual already. Is the course material separate, and does it also have to be purchased? I can't find a link on your site to the Day 1 course material. 


Note that the user's guide is available in pdf form on the website. See online ordering for the course handouts. What are you trying to do? 

jenny yu posted on Saturday, February 10, 2007  8:26 pm



Dear Drs. Muthen, I have some questions about building my MIMIC model with DIF effects. I would appreciate it if you could give me some clues. 1) I am implementing an iterative model building process by dropping a variable each time and using DIFFTEST to test the significance of nested models. Mplus V3 requires WLSMV as the estimator to do DIFFTEST; however, my model fit is better when I use WLS. I read an earlier discussion here and noticed that WLS was able to run DIFFTEST in an earlier version. So I am wondering whether there is any way to run DIFFTEST with the WLS estimator. 2) In the earlier discussion, I also noticed a strategy of doing DIFFTEST with WLS and using WLSMV for the final model. Similarly, can I do DIFFTEST with WLSMV to arrive at the final model and then use WLS to run the final one? In my case, the model fit (CFI and RMSEA) is better with WLS. 3) With 'residual' in the 'output' statement, I get the covariance/residual correlation/correlation matrix. Is there any way to output p-values related to this matrix? Thank you very much in advance for your time and help. 


I am not sure choosing an estimator because it gives better fit is justifiable but I will let you make that decision. DIFFTEST is used only with WLSMV. With WLS, the difference in chi-square and degrees of freedom for the two nested models is used. I am confused by two things that you say. One is that you are using difference testing for a MIMIC model. In a MIMIC model, DIF of intercepts is looked at by regressing the items on one or more covariates. I am also confused about what you mean by dropping a variable. Nested models should have the same set of observed variables. Mplus does not give p-values for residuals. 

jenny yu posted on Monday, February 12, 2007  7:41 am



I apologize for the confusion. I am trying a model building process; that's why I was dropping variables. I probably misunderstood the definition of 'nested model'. Isn't it a full model vs. a restricted model (with fewer variables than the full model)? I think my question is: given a bunch of variables, the DIF effect of which variables should be added to the model so as to achieve a parsimonious model (resulting in better model fit), instead of looking at DIF with all variables? When we look at the significance of the coefficients of a variable for all items (indicators), DIF effects on some items were significant and some were not. How can we decide whether this variable should be kept in the model or not? What is a valid and doable strategy for selecting variables? Also, is there any function in Mplus similar to a macro in SAS which can be used when we run something iterative? 


Nesting in reference to chi-square difference testing refers to models using the same set of observed variables where restrictions are placed on a more general model. Following are some papers you might find useful:

Gallo, J.J., Anthony, J. & Muthén, B. (1994). Age differences in the symptoms of depression: A latent trait analysis. Journals of Gerontology: Psychological Sciences, 49, 251-264. (#52)
Muthén, B. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, 54, 557-585. (#24)
Muthén, B., Tam, T., Muthén, L., Stolzenberg, R. M. & Hollis, M. (1993). Latent variable modeling in the LISCOMP framework: Measurement of attitudes toward career choice. In D. Krebs & P. Schmidt (Eds.), New Directions in Attitude Measurement, Festschrift for Karl Schuessler (pp. 277-290). Berlin: Walter de Gruyter. (#46)

If by the macro in SAS you are asking about running several analyses at the same time, you can use a DOS batch file for this purpose. 

jenny yu posted on Monday, February 12, 2007  10:29 am



Thank you very much for your answers and the references. They are helpful. In addition, could you give me some instructions on my questions about building parsimonious model of DIF effect within MIMIC (the 2nd and 3rd paragraph in my previous post)? 


This is discussed in the papers I suggested. Models to study measurement invariance are also described in Chapter 13 at the end of the discussion of multiple group analysis. 

jenny yu posted on Sunday, February 18, 2007  3:31 pm



Thank you for the explanation. I would still like to ask: can I do DIFFTEST with the WLSMV estimator to arrive at the final model and then use the WLS estimator to run the final model to get model coefficients and other statistics? Thanks. 


You should stay with the same estimator. You should either use WLSMV for the entire analysis or use WLS for the entire analysis. 


I am looking for statistical help in doing SEM analyses using LISREL (or any other software for SEM analyses). I already did the EFA (using SPSS) and I only need to run the CFA. I do have all the conditions (correlations...) that need to be entered in order to get the right model fit; however, I don't know how to operate the LISREL software very well. I am confused by the LISREL language!!! Thanks 


If you find the LISREL language too difficult, try Mplus. If you want to use LISREL, post your request for help on SEMNET. 


How can I post on SEMNET? Thank you. 


You can sign on to SEMNET by going to the following link: http://bama.ua.edu/archives/semnet.html 


I tried to subscribe to SEMNET at the link you sent me but it keeps telling me that there is a cookie problem and I can't get rid of it. It has been taking me on a circle ride for the last three days. HELP!! I need to know whether Cronbach's alpha can be determined by LISREL or whether it needs to be calculated. If so, how? Thanks 


I don't know how to help you with the SEMNET problem. You should contact LISREL support to ask your question if you are using LISREL. 


I try to fit the following model (CFA):

DATA: FILE=girls.dat; Format=free; TYPE=individual;
VARIABLE: NAMES= famid y1 y2 y3 y4 Y5 y6 y7 y8 y9 y10 y11 y12;
  USEVARIABLES= y1 y2 y3 y4 Y5 y6 y7 y8 y9 y10 y11 y12;
  MISSING= ALL (999999);
  CLUSTER = famid ;
ANALYSIS: TYPE= MEANSTRUCTURE COMPLEX MISSING h1;
MODEL: f1 by y1* y2 y3 y4 y5 y6;
  f2 BY y7* y8 y9 y4;
  f3 by y10* y2 y4 y6 y11 y12;
  F1@1 F2@1 F3@1;
  [F1@0 F2@0 F3@0];
  ![y1 y2 y3 y4 Y5 y6 y7 y8 y9 y10 y11y 12];
OUTPUT: TECH1 STANDARDIZED RESIDUAL MODINDICES(0);

However, when I try to estimate intercepts, or when I start y12 at a certain value, the program gives a 'parsing error':

*** ERROR Error in parsing line: "[OA VOC SIM AR COM PC PA BD SS CO DIG INF]"

or

*** ERROR Error in parsing line: "VOC* SIM AR COM dig INF*4"

This error seems to be linked to variable y12. I checked the data; y12 is not strangely distributed, has no uncoded missing values, and is not correlated 1 with other variables. Any suggestions? best sophie 


You would need to send your input, data, output, and license number to support@statmodel.com. I cannot see what the problem is from what you have included. 

Alex posted on Tuesday, June 26, 2007  8:28 am



Greetings, Todd Little (Little et al., 1999, SEM) recommends putting equality constraints on factor loadings when estimating a model with only two indicators per latent variable (fixing both loadings to be equal). In this case, I use continuous indicators. (1) How do we do that in Mplus? (2) I tried doing it by fixing both loadings to 1. In this case, the estimates I obtained are both fixed to 1 without S.E. (0), the Std estimates are equal to one another but the StdYX estimates differ. If I wish to report standardized loadings, what should I do (if that is the right way to do it; see question 1)? 


He must fix the metric of the factor by fixing the factor variance to one and have both factor loadings free and equal as follows: f1 BY y1* y2 (1); f1@1; 

Nina Zuna posted on Wednesday, July 11, 2007  2:05 pm



Dear Linda, I have a very elementary question. I would like to set a correlation to 1 between two factors. When I used the WITH command and @1, I received this error: NO CONVERGENCE. SERIOUS PROBLEMS IN ITERATIONS. ESTIMATED COVARIANCE MATRIX NONINVERTIBLE. CHECK YOUR STARTING VALUES. My goal is to do a chi sq diff test between a model in which the correlation between 2 factors is set to 1 vs a model in which the correlation is freely estimated. Below is the model: ChildFoc by q2_1 q2_2 q2_3 q2_4 q2_5 q2_6 q2_7 q2_8 q2_9; FamFoc by q2_10 q2_11 q2_12 q2_13 q2_14 q2_15 q2_16 q2_17 q2_18; ChildFoc WITH FamFoc@1; Thank you for your kind assistance, Nina 


I think a better way to do this is to use MODEL TEST. See the user's guide under MODEL CONSTRAINT to see how to label the parameters. And then see MODEL TEST which performs a Wald test. 

Nina Zuna posted on Wednesday, July 11, 2007  7:27 pm



Thank you, Linda; your speedy response was very much appreciated. I reviewed pp. 484-488 in the UG as advised. My new syntax is:

ChildFoc by q2_1* q2_2 q2_3 q2_4 q2_5 q2_6 q2_7 q2_8 q2_9;
FamFoc by q2_10* q2_11 q2_12 q2_13 q2_14 q2_15 q2_16 q2_17 q2_18;
ChildFoc@1; FamFoc@1;
ChildFoc WITH FamFoc (p1);
MODEL CONSTRAINT: p1=1;
MODEL TEST p1=.93

The correlation on the output is now 1.0; however, I want to make sure I interpreted your suggestion and the UG correctly. It appears that I cannot conduct a model test (Wald test) for p1 since I constrained p1, right? If I am constraining it to 1, then I probably can't test it for a different value? Incidentally, I noticed that the constrained model is the same as having all 18 indicators load on 1 factor (which makes sense since I am saying the two factors correlate perfectly). However, I am back to square 1: given these two scenarios, what is the best way to test between the 2 models: Model 1 that allows the two factors to correlate freely vs. a model in which the correlation is fixed to 1, or a model with 1 factor identified by 18 observed indicators? Is my only option to run the two models separately and conduct the chi sq diff test by hand, since I got the error message for the Wald test (WALD'S TEST COULD NOT BE COMPUTED BECAUSE OF A SINGULAR COVARIANCE MATRIX)? 


Don't use MODEL CONSTRAINT. Instead use: MODEL TEST: p1 = 1; 

Nina Zuna posted on Thursday, July 12, 2007  9:20 am



Dear Linda, Great...I have the Wald's test at the end of my output!! If I may bother you to ask one final question: I want to ensure I am interpreting it correctly. It is significant. I removed the constraint so now the correlation between my two factors is freely estimated on my output (completely standardized r=.93). I added the model test only as advised. Am I correct to assume the Wald test (p1=1) tested the significant difference between a correlation of .93 and 1.0? Thank you in advance for your final thoughts, Nina 


Yes, it tested the difference between .93 and 1.0. It is the same as the square of the z test, where z = (.93 - 1) / (std. error of .93). 
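
As a worked illustration of that relation, the Python sketch below squares the z statistic and converts it to a p-value on 1 degree of freedom. The estimate .93 comes from the thread, but the standard error of 0.02 is a made-up value, since the actual one was not posted.

```python
import math


def wald_test_single(estimate, null_value, std_error):
    """Wald statistic for one parameter: the square of the z statistic.

    Returns (wald, p_value), with the p-value taken from a chi-square
    distribution with 1 degree of freedom.
    """
    z = (estimate - null_value) / std_error
    wald = z * z
    # For 1 df, P(chi-square > w) equals erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


# The estimate is from the thread; the standard error is hypothetical.
wald, p_value = wald_test_single(estimate=0.93, null_value=1.0, std_error=0.02)
print(round(wald, 2), p_value)
```

A smaller standard error makes the same .07 gap more clearly different from 1, so whether .93 is "significantly" below 1 depends entirely on how precisely the correlation is estimated.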

Nina Zuna posted on Thursday, July 12, 2007  12:56 pm



Thank you so much!! The assistance you provide and the speed with which you respond to queries is phenomenal. With much gratitude, Nina 


Greetings, Drs. Muthen, Pardon such a basic question, but sometimes it is helpful to check one's understanding of the basics. Would you be so kind to please clarify if the interpretation of the Covariances and Variances sections of the CFA output would differ if factor variances WERE vs. WERE NOT set @1? Let's say we have a simple 3-factor model, with no freed parameters (i.e., no freed factor loadings or covariances). Let's also say that for Model 1, f1@1 f2@2 f3@1 and for Model 2 there are no such specifications. Obviously, the StdYX values would = 1.0 in the Variances section for Model 1. If your time permits, can you please clarify everything else (i.e., Estimates, S.E., Est./S.E., Std, StdYX)? Thank you, Masha. 


Greetings~ I apologize in advance, but I am relatively new to CFA and Mplus, so I had some rather basic questions. 1) I wanted to check my understanding of the use and computation of factor scores. I'm gathering that factor scores are individuals' predicted scores on a factor created by multiplying their score on each predictor by that predictor's factor loading and then summing these values. Is this accurate? And then, is it comparable to a composite of those indicators? Do factor scores mean the same thing for categorical variables? 2) I was trying to find the matrix to use to calculate the composite reliability based on an equation I was provided in a SEM class, and I believe I can request the matrix for categorical indicators using the Tech 4 output request. But, when I was looking into this on this discussion forum, if I did not completely misunderstand what was meant, it was suggested that with binary variables, the reliability is better considered using IRT. Additionally, not all of my indicators are binary; most are, but the others have 3-4 unordered categories. So, can I calculate the reliability in the same way with my categorical variables, and if so, will Tech 4 provide the matrix I need to use? If not, any basic references on IRT? 


Hi Brandi, Linda posted the following reply a few years ago that I think addresses your first question: "A factor loading is a regression coefficient. If factor indicators are continuous, they are simple linear regression coefficients and are interpreted as such. They can be greater than one. There is a discussion of this on the LISREL website under Karl's Corner. If the factor indicators are categorical, then the factor loadings are probit or logistic regression coefficients depending on the estimator used in Mplus." If you haven't already done so, use the search function in this discussion forum and I bet you'll be able to find posts that provide even more information for your first question, and then posts that address your second question. If not, you'll need to wait a few more days since the Muthens are on vacation. 


Masha: When factor variances are fixed to one, correlations are estimated. When factor variances are free, covariances are estimated. See Chapter 17 of the Mplus User's Guide for a description of the output. 


Brandi: 1. For continuous outcomes, this is approximately correct. See Technical Appendix 11 for a description of how factor scores are estimated in Mplus. 2. The information functions are available in the PLOT command. See the IRT section on the website for more information. 


Thank you, Dr. Muthen, This forum is an unbelievable tool! Your responsiveness and patience are amazing. 


Hello, I'm referring to a post posted on May 04, 2001: "I would like to use Mplus to estimate a nonlinear relationship among latent variables [interaction]. Joreskog and Yang (1996) demonstrated such a model can be estimated using SEM if an observed product variable is used as an indicator of the latent product variable. Bollen (1995) used a two-stage least squares with instrumental variables to estimate the interaction. How can I use Mplus to estimate an interaction model? Linda K. Muthen posted on Thursday, May 10, 2001  10:07 am Mplus cannot do what Joreskog and Yang demonstrated because it requires nonlinear constraints. The Bollen approach can be done in Mplus but it is not directly implemented. It would have to be done in a series of steps." Is this still the case or does Mplus deal with the 2SLS approach? What would be the mentioned steps? Thanks for your help. Stephan 


Since 2001, Mplus has added the XWITH option for latent variable interactions and MODEL CONSTRAINT for linear and nonlinear constraints. Latent variable interactions are estimated using maximum likelihood according to the principles described in the following paper: Klein, A. & Moosbrugger, H. (2000). Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65, 457-474. 


Dear Linda, thanks for your fast response and the literature. Best, Stephan 

Linda posted on Thursday, September 27, 2007  10:23 am



Hello, I ran a CFA with three continuous indicator variables, and I get a chi-square value of 0.0000, degrees of freedom = 0, CFI=1.0, TLI=1.0, RMSEA= 0.0000. It's a just-identified model. In this situation, do I report the fit indices? Thanks in advance! Linda 


No. You can't test model fit in this situation. 
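
The reason is a simple parameter count, sketched below in Python for a one-factor CFA. The sketch assumes the usual default setup (first loading fixed at 1, free factor variance, no residual covariances) and ignores the mean structure, where the sample means and the intercepts cancel out; the function name is my own.

```python
def one_factor_cfa_df(n_indicators):
    """Degrees of freedom for a one-factor CFA of the covariance matrix.

    Assumes the first loading is fixed at 1, the factor variance is free,
    and there are no residual covariances.
    """
    p = n_indicators
    sample_moments = p * (p + 1) // 2   # variances and covariances
    free_loadings = p - 1               # first loading fixed for scaling
    factor_variance = 1
    residual_variances = p
    free_parameters = free_loadings + factor_variance + residual_variances
    return sample_moments - free_parameters


for p in (3, 4, 5):
    print(p, one_factor_cfa_df(p))
```

With three indicators there are six variances and covariances and six free parameters, so the model reproduces the data exactly and there is nothing left over to test.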

Linda posted on Friday, September 28, 2007  7:17 pm



Thank you for your prompt response. Then,what do I report?...the loadings and the r square of the continuous variables? Linda 


Also, you would report the standard errors of the estimates. 


I'm trying to save factor scores in CFA using the "idvariable is" statement in the variable command. The problem is that I need to merge the "factor score" dataset with a larger data set using 2 different identifiers (a family ID as well as an individual ID). Is there a way to save two identifying variables with the factor scores? I wasn't able to get the code to run with two variables listed in the idvariable statement. Thank you! 


The only thing I can think of is to create one variable from the two variables and then do the same thing in the larger data set when you do the merge. The length of the id variable is increasing to 16 in Version 5 so this might help. 


I am running a CFA with Mplus for the first time. I would like to obtain fit indices such as GFI, AGFI, Gamma 1, Gamma 2, TFI, NFI, and NNFI. I don't know what option to type. Thank you for your help. 


These fit statistics are not available in Mplus. All available fit statistics are printed as the default. 


Thank you for your fast response. Can I submit an article on CFA without these fit statistics? 


I think the fit statistics we provide should be sufficient. 

Paul Silvia posted on Monday, November 05, 2007  1:38 pm



You might consult a recent paper by Bentler in Personality and Individual Differences, in which he suggests "best practices" for reporting fit statistics. (Less is more, I think: a few well-chosen ones are better than a laundry list of every statistic that a program will compute.) 

Matthew Cole posted on Wednesday, November 07, 2007  4:06 am



Bentler, P.M. (2007). On tests and indices for evaluating structural models. Personality and Individual Differences, 42, 825–829. 


Hi, I am running a CFA and I want to run an LM (Lagrange multiplier) test to identify which fixed parameters, if set free, would lead to a significantly better fit. Which option should I use for that in Mplus? My second question is about modeling. Can I correlate a second-order factor with a first-order factor? For example, is it correct to write: Model: f1 by v1-v3; f2 by v4-v8; f3 by v9-v10; f4 by v11-v14; f5 by v15-v18; f6 by f1 f4 f5; Y on f6; f2 with f3; f6 with f2 f3; Thank you in advance 


You can obtain modification indices using the MODINDICES option of the OUTPUT command. I believe that you can correlate f6 with f2 and f3 because they are not part of the second-order factor. 
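As a minimal sketch of that request, here is the poster's model with modification indices turned on. The cutoff of 3.84 is an illustrative assumption (the .05 critical value of chi-square with 1 df), not something the reply specifies; without a value in parentheses Mplus applies its default cutoff.

```
MODEL:
  f1 BY v1-v3;
  f2 BY v4-v8;
  f3 BY v9-v10;
  f4 BY v11-v14;
  f5 BY v15-v18;
  f6 BY f1 f4 f5;      ! second-order factor
  Y ON f6;
  f2 WITH f3;
  f6 WITH f2 f3;       ! f2 and f3 are not indicators of f6
OUTPUT:
  MODINDICES (3.84);   ! list modification indices of 3.84 or larger
```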


Thank you Dr. Muthen. It was very helpful. 


Drs. Muthen, I am new to Mplus and have a question regarding a CFA that includes both categorical and continuous variables. I ran the following input and received a message that says "MODINDICES option is not available for ALGORITHM=INTEGRATION. Request for MODINDICES is ignored. 1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS." Here's my syntax: Title: Family Characteristics CFA Data: file is H:\sociology\lgriepenstroh\runfeb4.dat; format is F8.2(41); Variable: NAMES ARE female grade hetero bioHH momed phyab sexab neglect dadwarm parcon depress depres1 depres2 depres3 depres4 depres5 depres6 depres7 depres8 depres9 depres10 depres11 depres12 depres13 depres14 depres15 depres16 depres17 depres18 depres19 run1 run2 binge bingeD marij drug delinq scheng pvict dpeer run3; MISSING ARE ALL (999); CATEGORICAL ARE sexab; USEVAR = phyab sexab neglect dadwarm parcon; Analysis: ESTIMATOR = ML; Model: FAMCHAR BY phyab sexab neglect dadwarm parcon; Output: Standardized; Modindices; Am I missing something that will enable me to run the model with both categorical and continuous indicators? Thanks so much for your help! 


When you have one or more categorical indicators and use maximum likelihood estimation, numerical integration is required. The message concerns modification indices, not the mix of categorical and continuous indicators. You can use ESTIMATOR=WLSMV as an alternative. If this does not help, please send your input, data, output, and license number to support@statmodel.com. 
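The suggested change might look like the fragment below. It is only a sketch: the DATA command and the NAMES list from the question are assumed to stay as they were, and only the estimator line differs.

```
VARIABLE:
  USEVARIABLES = phyab sexab neglect dadwarm parcon;
  CATEGORICAL = sexab;
  MISSING = ALL (999);
ANALYSIS:
  ESTIMATOR = WLSMV;       ! weighted least squares, no numerical integration
MODEL:
  famchar BY phyab sexab neglect dadwarm parcon;
OUTPUT:
  STANDARDIZED MODINDICES;
```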


Thank you for your help. 


I am running two CFA models, a three-factor model and a second-order model. However, I am getting the same result for both models (model fit and estimates). Below is the syntax used. Am I missing something? Thanks. Model 1: Three factor Model VARIABLE: NAMES ARE v1-v62; USEVARIABLES ARE v1-v62; CATEGORICAL ARE v1-v62; ANALYSIS: ESTIMATOR= wlsmv; MODEL: f1 BY v1-item20; f2 BY v21-item38 v61-v62; f3 BY v39-v60; OUTPUT: sampstat tech4; Model 2: Second Order VARIABLE: NAMES ARE v1-v62; USEVARIABLES ARE v1-v62; CATEGORICAL ARE v1-v62; ANALYSIS: ESTIMATOR= wlsmv; MODEL: f1 BY v1-item20; f2 BY v21-item38 v61-v62; f3 BY v39-v60; f4 BY f1 f2 f3; OUTPUT: sampstat tech4; 


The second-order factor that you add is just-identified. This is why it makes no difference. 

Zsuzsa Londe posted on Thursday, February 28, 2008  5:47 pm



Hi, You say that Mardia's coefficient can be generated using TECH13, which has to be used with a MIXTURE model, which has to be of mixed classes, and that "Non-mixture models can be estimated as a single class mixture to get that output." Could you please help me figure out how to set up a model with all continuous variables as a "single class mixture" in order to get Mardia's? Thank you, Zsuzsa 


You need to use TYPE=MIXTURE with the CLASSES option. You specify the CLASSES option as CLASSES = c (1); One class is the same as a one group analysis. 

Zsuzsa Londe posted on Thursday, February 28, 2008  6:41 pm



Thank you very much for the incredibly speedy answer. I have been trying your suggestion but am not succeeding. I'm a new user and it is possible that I'm doing something very wrong but keep getting an error message "This analysis is only available with the Mixture or Combination AddOn." This is my input; Variable: Names are id digs_hu reads_hu words_hu nonwd_hu liss_hu corsi digs_en reads_en words_en nonwd_en liss_en comp_esl; Usevariables are digs_hu reads_hu words_hu nonwd_hu liss_hu digs_en reads_en words_en nonwd_en liss_en; Missing are all (9999) ; CLASSES = c(1); Data: LISTWISE=ON; MODEL: STM BY digs_hu words_hu nonwd_hu digs_en words_en nonwd_en; WM by reads_hu liss_hu reads_en liss_en; ANALYSIS: TYPE=MIXTURE; OUTPUT: tech13; 


You can obtain TECH13 only if you have the Mixture or Combination add-on. It sounds like you don't. 


Hi Linda, Would you have any suggestions how to check multivariate normality with Mplus? I do have univariate nonnormality and my committee members would also like me to provide multivariate comparisons. Thank you, Zsuzsa 


This is the only way to do the check in Mplus. If you send your input, data, and license number to support@statmodel.com, I can do the run for you as a one time favor. 


I am attempting to use DIFFTEST in a CFA with categorical indicators to test a 1-factor structure compared to a 2-factor structure with 5 (binary) observed indicators. When I follow the instructions in Example 12.12, I receive a message that " THE CHI-SQUARE DIFFERENCE TEST COULD NOT BE COMPUTED BECAUSE THE H0 MODEL IS NOT NESTED IN THE H1 MODEL." Key parts of my setup follow: File 1: Snip> Usevariables are cohab divorce samesex DIsinbir DIpmsex ; Categorical are cohab divorce samesex DIsinbir DIpmsex ; Analysis: Type = general; ESTIMATOR = wlsmv ; Model: f1 BY cohab@1 samesex DIpmsex divorce DIsinbir; Savedata: DIFFTEST IS deriv.dat; File 2: Snip> Usevariables are cohab divorce samesex DIsinbir DIpmsex ; Categorical are cohab divorce samesex DIsinbir Analysis: Type = general; ESTIMATOR = wlsmv ; DIFFTEST IS deriv.dat; Model: f1 BY cohab@1 samesex DIpmsex; f2 BY divorce@1 DIsinbir; f1 with f2; 


The one-factor model is nested in the two-factor model, so you should run the two-factor model first. Note that you have different variables on the two CATEGORICAL lists. 
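The order described in the reply can be sketched as two separate input files, shown here in one block and separated by comments. The variable names and deriv.dat come from the question; both files are assumed to share the same USEVARIABLES and CATEGORICAL lists.

```
! File 1: less restrictive two-factor model (H1); saves the derivatives
ANALYSIS:
  ESTIMATOR = WLSMV;
MODEL:
  f1 BY cohab samesex DIpmsex;
  f2 BY divorce DIsinbir;
  f1 WITH f2;
SAVEDATA:
  DIFFTEST IS deriv.dat;

! File 2: nested one-factor model (H0); reads the derivatives
ANALYSIS:
  ESTIMATOR = WLSMV;
  DIFFTEST IS deriv.dat;
MODEL:
  f1 BY cohab samesex DIpmsex divorce DIsinbir;
```

The chi-square difference test is then reported in the output of the second run.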


Linda, Thanks for your quick response. The different variables listed were just a result of my cut and paste to get down to the maximum character number for the message. I had initially run it the other way (2 factor first, 1 factor second) without success. When I tried again today it worked. I'm running on a remote terminal server and it is almost as if it wasn't having time to register or something. Anyway, I've got my difftest results now! Thanks! Mick 

Vivian Towe posted on Wednesday, May 14, 2008  2:08 pm



Linda, I ran a very simple CFA with categorical data (Likert scale responses). My model fit was very poor (chi-square, CFI/TLI, RMSEA) according to Yu's dissertation. I was thinking that one way to improve fit would be to see if the item residuals are correlated, but I don't know how to model that. Can you point me to some Mplus examples of CFA that specify correlated item errors? Or is there another way to deal with fit problems? Thank you. 


I don't know your situation, but if you have more than two factors, I would start with an EFA. The WITH option is used to correlate residuals, for example, u1 WITH u2; You can look at modification indices to see other possible sources of model misfit. 
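A minimal sketch of a CFA with one correlated residual follows. The factor names f1 and f2 and the items u1-u8 are hypothetical; which pair of residuals to free would have to come from theory or from the modification indices.

```
VARIABLE:
  CATEGORICAL = u1-u8;   ! hypothetical Likert items
MODEL:
  f1 BY u1-u4;
  f2 BY u5-u8;
  u1 WITH u2;            ! residual correlation between items u1 and u2
OUTPUT:
  MODINDICES;
```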

Vivian Towe posted on Wednesday, May 21, 2008  10:53 am



I am running a CFA multiple group analysis. I am following your handouts for multiple group analysis. The 2 groups are male and female. When I ran a single group analysis restricting to female using the useobservations (gender eq 2) statement, the analysis ran. However, when I ran the analysis for males and females simultaneously using the statement grouping is gender (male=1 female = 2), I received the following error message: *** ERROR Group 2 does not contain all values of categorical variable: ESTEEM *** ERROR Group 2 does not contain all values of categorical variable: SEXREG *** WARNING This is true, for females, no one answered with the value '1' for either of these variables, but not sure how to fix problem. Any ideas on why the single analysis works but not the multi group? 


The groups are expected to have the same categories with weighted least squares estimation. You can collapse categories or use maximum likelihood estimation with the * setting of the CATEGORICAL option. See the user's guide for more information. 


My first question is whether there is difference between the "confirmatory factor analysis (CFA)" and "CFA conducted by using SEM framework or by using SEM software"? My second question is whether analysis of CFA would be suggested to be conducted by following typical analysis/ software for CFA or by using SEM software? Thanks for your feedback! 


If you have the same model, the same data, and the same estimator, all programs should give the same results. I see no distinction between CFA and SEM software. 


Hi there, I hope you can help me with this: I'm testing the factor structure of the SOC scale (12 items collected using 7-point ordinal scales) with two different hypothesised structures (one-factor model vs. second-order factor model with three latent factors of four items each, which in turn load on the higher-order factor) in a national study (using survey commands). The fit of the second-order factor model is relatively better (CFI 0.98 RMSEA 0.083 AIC 231201.21) than that of the one-factor model (CFI 0.98 RMSEA 0.092 AIC 231559.79); however, some correlations between latent factors are higher than one (Heywood cases?). These are my questions: 1) any idea why that is happening? 2) should I dismiss the second-order factor model (which was my personal bet in this research) because of this inadmissible solution? 3) is there any way of constraining correlations to avoid values higher than one? Thanks in advance for your help, E 


Factors that correlate one are not statistically distinguishable. A second-order factor model with three first-order factors is the same as a model with three correlated first-order factors. I suggest doing an EFA for 1-4 factors as a first step. 

Derek Kosty posted on Wednesday, June 18, 2008  1:23 pm



Hi, This question is regarding CFA. I am trying to comparatively evaluate 5 models, some nested and some non-nested, by simultaneously taking into account the goodness of fit, sample size, and the number of parameters estimated. I know that information criteria are good for this purpose (e.g., BIC and AIC). However, the problem is that my observed variables (lifetime diagnosis of different mental disorders) are dichotomous and have low base rates. I am aware that BIC/AIC indices are not appropriate when using the WLS estimator, but I am unsure of the appropriateness of using ML and specifying my observed variables as continuous (due to low base rates, which cause a highly skewed distribution). I have seen multiple papers reporting BIC statistics while claiming that parameters were estimated using weighted least squares. This does not make sense to me. How would you recommend that I compare my models in this context? Thanks for your support! 


If you have only a small number of factors, you can use ML. Using ML does not mean that you have to assume that the variables are continuous, which it sounds like you are implying. If you specify the variables as categorical and the estimator as ML (or MLR), then the appropriate logit (or probit) model parts will be used in Mplus. I don't see how BIC can be computed with weighted least squares since it builds on the likelihood. 
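A sketch of that setup is given below, assuming two hypothetical factors f1 and f2 measured by binary items u1-u12. Because estimation is by maximum likelihood, the output includes the loglikelihood along with AIC and BIC.

```
VARIABLE:
  CATEGORICAL = u1-u12;   ! hypothetical binary diagnosis indicators
ANALYSIS:
  ESTIMATOR = MLR;
  LINK = LOGIT;           ! LINK = PROBIT is the alternative
MODEL:
  f1 BY u1-u6;
  f2 BY u7-u12;
```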

Derek Kosty posted on Wednesday, June 18, 2008  5:27 pm



Dr. Muthen, Thanks for the quick reply. The model I am currently trying to run has six factors, each having between 3 and 6 observed variables loading on them. My sample size is 816. I specified the variables as categorical, requested the MLR estimator, and reduced the number of integration points to 5. The compiler has been working for about 3 hours now, is this normal? If so, what should I be looking for in the DOS window that could give a clue to how much longer the process will take? Maybe the slow pace is a result of running version 4.2? D 


As indicated in the User's Guide, six factors give very time-consuming computations, particularly if you are not using a computer with at least 2 and preferably 4 or even 8 processors. Mplus takes advantage of multiple processors using parallelized code, which gives considerable time savings (using the PROCESSORS option in the ANALYSIS command). You should also use version 5.1. The DOS window shows you the iteration history and the time each iteration takes; usually you can get an idea from this of how long it will take to converge. But with this many factors I would recommend using WLSMV and, instead of BIC, comparing models via fit measures such as SRMR. 
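For readers who still want to try the maximum likelihood run, the relevant ANALYSIS settings might be written as follows. The value 5 is the number of integration points the poster mentioned; the processor count of 4 is an assumption about the machine.

```
ANALYSIS:
  ESTIMATOR = MLR;
  ALGORITHM = INTEGRATION;
  INTEGRATION = 5;        ! integration points per dimension
  PROCESSORS = 4;         ! assumes four processors are available
```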

Derek Kosty posted on Thursday, June 19, 2008  10:15 am



Yu (2002) suggests that SRMR is not good when dealing with binary outcomes. It is unclear to me if his critique of the SRMR is with respect to the cutoff recommendation, or with the statistic in general. For example, can the values between models still be compared (e.g. which one is lower) and it be meaningful? 


It is not clear-cut what to do here, but I think maybe measures such as CFI, which Yu (2002) found useful, may not be able to discriminate between neighboring models that are not far apart in terms of fit. Perhaps SRMR is more useful for this. 


In Mplus version 4.2 the SRMR is included in the output for a model in which the outcomes are all categorical, the latent variables are continuous, and WLSMV is the estimator. However, SRMR is not included in the output for version 5.1. What is the reason behind this and can I request the SRMR to be computed in version 5.1? Here is my model: MODEL: distr by LMDD4 LDYS4 LDPD4 LGOA4 LPTS4; fear by LSPE4 LSOC4 LPAN4 LOBC4; intern by fear distr;! LBIP4; fear@0; Thanks again. 


SRMR is not available when thresholds are in the model which is the default starting with Version 5. Add MODEL = NOMEANSTRUCTURE; to the ANALYSIS command. 
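Applied to the model in the question, the suggested setting would sit in the ANALYSIS command as sketched here. This follows the reply as given for Version 5; other versions may behave differently.

```
ANALYSIS:
  ESTIMATOR = WLSMV;
  MODEL = NOMEANSTRUCTURE;   ! setting named in the reply
MODEL:
  distr BY LMDD4 LDYS4 LDPD4 LGOA4 LPTS4;
  fear BY LSPE4 LSOC4 LPAN4 LOBC4;
  intern BY fear distr;
  fear@0;
```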


If I run a model using MLR as the estimator with categorical outcomes, I notice that fit indices such as CFI, TLI, RMSEA, SRMR, and WRMR do not appear in the output. Can they be requested as well? 


Never mind, I discovered (from another thread) that with maximum likelihood and categorical outcomes, these fit statistics are not available because sample statistics are not sufficient statistics for model estimation. 


I am conducting a CFA with dichotomous observed variables with low base rates and an n=816. What method of estimation is most appropriate (MLSMV or MLR) and do you know of any articles that discuss this issue? In trying to resolve the question, I stumbled across an article in which Beauducel and Herzberg (2006) compare MLSMV with ML (not MLR). They use categorical data with Mplus version 3.11 and somehow are reporting CFI, TLI, RMSEA and SRMR for both methods of estimation. This contradicts my earlier discovery that "with maximum likelihood and categorical outcomes, these fit statistics are not available because sample statistics are not sufficient statistics for model estimation". Is this due to a difference in Mplus itself across versions, or do you think that the authors did not actually specify their variables as categorical within the model? Sorry about so many questions. I really appreciate all of your support! 


What is MLSMV? Do you mean WLSMV? 


Sorry, I did mean WLSMV. 


WLSMV uses weighted least squares estimation. Chi-square and related fit statistics are available with this estimator. MLR uses maximum likelihood estimation. With categorical outcomes, chi-square and related fit statistics are not available. With maximum likelihood and categorical outcomes, each factor requires one dimension of integration, which can be computationally demanding. More than 3 or 4 factors is not feasible. Weighted least squares is a better option when you have many factors. 


Two more questions: 1.) Does your previous answer imply that Beauducel and Herzberg (2006) had to be using continuous data in order to get CFI, TLI, RMSEA and SRMR for both methods of estimation? 2.) Do you know of any articles that discuss using SRMR to compare across models with binary outcomes? 


1. I would think if the data were categorical, they were not using the CATEGORICAL option so it was being treated as continuous. 2. No. 


I have done a CFA using the default analysis TYPE=GENERAL. What is the rotation method used for TYPE=GENERAL? I thought the method and type of rotation was principal axis factoring with promax (oblique) rotation. Is this correct? Thank you. 


There is no rotation involved with CFA, only with EFA measurement structures. Are you referring to the new "exploratory SEM" approach? For EFA, Mplus does not use PAF, but estimators such as ML and ULS. A multitude of rotations are available; see the User's Guide. 


Thank you very much for your help yesterday! I have another question: Are those three syntaxes equivalent, when I use mplus 5 1. model: WMS BY mnamb4 mnam mvis mvisb4 mver mverb4; GS BY mifa miss minc; OBJ BY mchsmfu mchsmfi mchpwp mchpww; GenCog BY Zcrsa Zmcmu raven WMS GS OBJ; ________________________________________________________________________ 2. model: WMS BY mnamb4@1 mnam mvis mvisb4 mver mverb4; GS BY mifa@1 miss minc; OBJ BY mchsmfu@1 mchsmfi mchpwp mchpww; GenCog BY Zcrsa@1 Zmcmu raven WMS GS OBJ; _________________________________________________________________________ 3. model: WMS BY mnamb4* mnam mvis mvisb4 mver mverb4; WMS@1; GS BY mifa* miss minc; GS@1; OBJ BY mchsmfu* mchsmfi mchpwp mchpww; OBJ@1; GenCog BY Zcrsa* Zmcmu raven WMS GS OBJ; GenCog@1; with 3. I get other results as with 1 or 2, but I thought, those are equivalent specifications. Thank you! 


Numbers one and two are equivalent because you choose the same factor indicator to set the metric of the factor. In Number three you set the metric of the factor using a different factor indicator. Model fit will be the same but not parameter estimates. 


Thank you very much for your answer! Unfortunately, there is something wrong. Model fit is not the same between those models; model 3 fits much better. Which indicator will be used to set the metric in model 3, where the first indicator has a *? The second in the list? Thank you for your support! 


When you free the first indicator and fix the factor variance to one as in model 3, no factor indicator is fixed to one. You would need to send the output from model 3, the output from either model 1 or 2, and your license number to support@statmodel.com so I can see exactly what you are doing. 

RDU posted on Thursday, December 11, 2008  8:18 am



Hello. I am trying to run a MIMIC model for eight ordinal indicators. My sample size is around 500. My data are nested (students clustered within schools), and due to my research interests I chose to use the aggregated or design-based approach (i.e., Type=complex in conjunction with cluster=school) to model my data. I have read several articles about MIMIC models, but have yet to see one where school-level effects are used as covariates. I have given some thought to looking at the direct effect of several school-level covariates on my latent factor. Though, perhaps this isn't substantively meaningful or correct in terms of modeling population heterogeneity. Questions: 1.) I was wondering what your thoughts are on using a combination of student and school-level covariates as predictors for the latent factor(s) in an aggregated MIMIC model? 2.) If it is true that regressing the latent factor(s) on a combination of student and school-level covariates is feasible (i.e., substantively interpretable), then how would one interpret the school-level covariates? This is a bit confusing to me since the aggregated approach doesn't disentangle the between-school and within-school effects. Thank you so much for your time. Best, RDU 


You can do what you want but I would recommend using multilevel analysis because I do not believe that a MIMIC model is aggregatable. 

Anonymous posted on Tuesday, December 30, 2008  9:26 am



Hi: I am running a CFA with dichotomous response items and modeling on 8 latent factors. I want to run a logistic SEM model with ML estimator. I keep getting this error. I have tried to run this model on couple of computers but it does not seem to solve the problem. Where could I be wrong? *** FATAL ERROR THERE IS NOT ENOUGH MEMORY SPACE TO RUN Mplus ON THE CURRENT INPUT FILE. THE ANALYSIS REQUIRES 8 DIMENSIONS OF INTEGRATION RESULTING IN A TOTAL OF 0.25629E+10 INTEGRATION POINTS. THIS MAY BE THE CAUSE OF THE MEMORY SHORTAGE. YOU CAN TRY TO FREE UP SOME MEMORY BY CLOSING OTHER APPLICATIONS THAT ARE CURRENTLY RUNNING. NOTE THAT THE MODEL MAY REQUIRE MORE MEMORY THAN ALLOWED BY THE OPERATING SYSTEM. REFER TO SYSTEM REQUIREMENTS AT www.statmodel.com FOR MORE INFORMATION ABOUT THIS LIMIT. 


With maximum likelihood and categorical outcomes, each factor is one dimension of integration. We recommend no more than four dimensions of integration. I suggest using weighted least squares estimation when you have several factors. 
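The size of the problem follows from simple arithmetic. The count in the error message is consistent with 15 integration points per dimension, which to my knowledge is the default for standard integration in Mplus (an assumption; the thread does not state it):

```latex
\[
15^{8} = 2{,}562{,}890{,}625 \approx 0.25629 \times 10^{10},
\qquad\text{whereas}\qquad
15^{4} = 50{,}625 .
\]
```

Each added factor multiplies the number of points by 15, which is why four dimensions is manageable and eight is not.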

Anonymous posted on Wednesday, December 31, 2008  12:31 pm



My main concern with weighted least squares estimation is that it does not allow one to run a logit model. This is true, right? Because my outcome is dichotomous I would like to get a logit estimate. Can you please advise? Thanks. 


In Mplus weighted least squares estimates a probit model. If you want logistic regression, you need to use maximum likelihood estimation in Mplus. 

Anonymous posted on Wednesday, December 31, 2008  1:57 pm



So if I have more than 4 factors to estimate, Mplus will not allow me to run a logistic regression? 


Numerical integration is required with CFA and categorical factor indicators. A model with more than four factors is not feasible in this case. Your only option is using weighted least squares and probit regression. I don't see this as a problem. 

Anonymous posted on Wednesday, December 31, 2008  2:40 pm



Thank you so much for your quick responses. I apologize if I am belaboring this point. I am having a problem with interpreting the results of the probit estimates in my context due to the differences in the distribution used. I am thinking of taking the latent variable scores from Mplus and using them in a logit regression in STATA. Do you think that is a feasible approach? 


It sounds like you mean that you would first do WLS probit factor analysis, then estimate factor scores, and then regress each item on those scores to get logit results. If I am interpreting that correctly, that would seem to be less precise than using the probit factor analysis results and doing the usual approximate translation to logit by the Sqrt(pi^2/3) factor. But I don't see enough of a difference between probit and logit in practice to warrant any interpretational concerns. Probit modeling certainly seems accepted in IRT. 
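The translation mentioned in the reply rests on the fact that the standard logistic distribution has variance pi^2/3 while the standard normal has variance 1. As a worked sketch (the symbols are mine, not the thread's):

```latex
\[
\beta_{\text{logit}} \;\approx\; \sqrt{\frac{\pi^{2}}{3}}\;\beta_{\text{probit}}
\;=\; \frac{\pi}{\sqrt{3}}\,\beta_{\text{probit}}
\;\approx\; 1.81\,\beta_{\text{probit}} .
\]
```

For example, a probit coefficient of 0.50 corresponds to roughly 0.91 on the logit scale.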

Anonymous posted on Wednesday, December 31, 2008  3:29 pm



Thank you for your quick response. I think your suggestion is a reasonable way forward. 

Anonymous posted on Friday, January 02, 2009  10:12 am



I did manage to run my models with WLS. Thank you. I am now wanting to run an interaction model. I am running a CFA with dichotomous response items and modeling on 8 latent factors. I want to create interaction between 2 of these 8 latent factors. Is it possible to do this? I read "5.13: SEM with continuous factor indicators and an interaction between two factors" in the Mplus user guide, but I don't think the command listed there is helping me. Is there any modification you can suggest? Thank you. 


The XWITH option is available only for TYPE=RANDOM which is available only with maximum likelihood estimation. 
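For reference, a latent interaction specified this way looks roughly like the sketch below, patterned on the kind of setup in User's Guide example 5.13. The names f1, f2, f1xf2, and y1-y7 are hypothetical.

```
ANALYSIS:
  TYPE = RANDOM;
  ALGORITHM = INTEGRATION;
MODEL:
  f1 BY y1-y3;
  f2 BY y4-y6;
  f1xf2 | f1 XWITH f2;     ! latent variable interaction
  y7 ON f1 f2 f1xf2;
```

With categorical indicators under maximum likelihood, every factor adds a dimension of integration, so the limit of about four factors discussed earlier in this thread applies here as well.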


Hi, Mplus computes standard errors of factor loadings in a CFA model. So, I was wondering if we can use a pooled standard error to compare 2 factor loadings across groups instead of using chi-square difference testing. If yes, 1) what are the advantages of using one test instead of the other? 2) Is it possible for the 2 tests to give different results? Thank you in advance. 


I do not know how one would calculate pooled standard errors in the maximum likelihood framework. I would use either a difference test or a Wald test via MODEL TEST. 
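A Wald test of one loading across two groups might be set up as in this sketch. The grouping variable g, the group labels g1 and g2, the items y1-y4, and the parameter labels p1 and p2 are all hypothetical.

```
VARIABLE:
  GROUPING = g (1 = g1 2 = g2);   ! hypothetical grouping variable
MODEL:
  f BY y1-y4;
MODEL g1:
  f BY y2 (p1);                   ! loading of y2 in group 1
MODEL g2:
  f BY y2 (p2);                   ! loading of y2 in group 2
MODEL TEST:
  0 = p1 - p2;                    ! Wald test that the two loadings are equal
```

Mentioning the loading in the group-specific MODEL commands with different labels relaxes the default cross-group equality for that loading, so the test compares two freely estimated values.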

Derek Kosty posted on Wednesday, March 11, 2009  3:49 pm



Dear Mplus team, I am running a series of CFA models, each including a different outcome measure regressed on the latent factors. Looking at the R-squared values associated with the outcomes across the different models suggests that the significance of the R-squared value is not based on magnitude alone. In other words, some outcomes have significant R-squared values that are less than the nonsignificant R-squared values of another outcome. I am sure that the answer lies in how the standard error is computed for R-squared. Can you provide any input on this issue? Thanks for your support! 


The two tests are not the same. One tests whether the regression coefficient is significantly different from zero. The other tests whether the variance explained in the dependent variable is significantly different from zero. 

Derek Kosty posted on Wednesday, March 11, 2009  4:31 pm



Sorry, I don't think my question was clear enough. I am only looking at the test of variance explained in the dependent variable. Here is an example: 16.1% of the variance in years of school completed is explained by disruptive behavior and substance use factors. This R-squared is statistically significant. 19.6% of the variance in lifetime prevalence of depression is explained by disruptive behavior and substance use factors. This R-squared is not statistically significant. I am wondering how the first R-squared is significant while the latter is not, given the relative magnitudes. 


Each R-square has its own standard error based on information related to that dependent variable. So one may have a larger standard error, resulting in nonsignificance even though its value is larger. 

Derek Kosty posted on Thursday, March 12, 2009  9:01 am



Can you provide further information on what goes into the computation of the standard error? 


The Delta method of computing standard errors is used. Google this for further information. 

Maša Vidmar posted on Wednesday, April 08, 2009  8:20 pm



Hi, I am running a CFA with 1 construct and three indicators. It was my belief that this should result in a just-identified model. But I get the following error message: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 5. Parameter 5 is the loading for one of the indicators. What is causing this? And how can I avoid it? Another (unrelated) question: in a different model I have 2 constructs at two time points (so actually 4 latent variables) with 38 indicators. Fit indices and all estimates are exactly the same in the CFA (no order imposed) and SEM (paths) models. Is this expected? Thank you! 


A factor model with three indicators is just identified. You may have all factor loadings and the factor variance free. If this is not the case, you need to send the full output and your license number to support@statmodel.com. It sounds like you have a model where the four factors have an unrestricted covariance matrix such that covarying all factors gives the same number of parameters as regressing two of them on the other two. If so, the models are statistically equivalent and the fit should be the same. If not, I would need to see your full output and license number at support@statmodel.com to answer your question. 


Hi, I have attempted a CFA with 59 dichotomous items and 1000 observations. My fit statistics were less than ideal. I am attempting to analyze the misfit and report my findings to a nontechnical audience. There are some issues I could use your help on: Is the input matrix tetrachoric? If so, is there a way to obtain a print out of the input matrix? I have several positive residuals (as high as .283) Does this indicate that my model implied correlations are smaller than my observed correlations? 


Yes, the sample statistics used for model estimation are tetrachoric correlations with WLSMV and binary outcomes. These are printed if you ask for SAMPSTAT in the OUTPUT command. They can be saved using the SAMPLE option of the SAVEDATA command. A positive residual means that the observed value is larger than the model-estimated value. 
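The two requests in the reply could be written as follows. The file name tetra.dat is a hypothetical choice, and RESIDUAL is added on the assumption that the poster also wants the residuals printed.

```
OUTPUT:
  SAMPSTAT RESIDUAL;        ! print sample correlations and residuals
SAVEDATA:
  SAMPLE IS tetra.dat;      ! save the sample statistics to a file
```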

Greg posted on Wednesday, April 22, 2009  9:09 am



Hi, I'm new to the forum, so excuse me if my question seems basic. I'm running a CFA on 29 reflective indicators (4 factors) and 2 observed continuous dependent variables (scores on a test). The output says: *** WARNING in MODEL command Variable is uncorrelated with all other variables: scorea *** WARNING in MODEL command Variable is uncorrelated with all other variables: scoreb *** WARNING in MODEL command At least one variable is uncorrelated with all other variables in the model. Check that this is what is intended. 3 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS The output gives relationships between the latent variables (e.g. f1 WITH f2) but not between the latent variables and the scores (e.g. f1 WITH scorea). Do I need to specify the hypothesized relationships between the latent variables and the scores in the CFA already? Then, isn't it already a structural model? Thanks for the answer! 


Yes and yes. 

Ben Saville posted on Wednesday, May 06, 2009  9:33 am



Dr. Muthen, I have a 2-level one-factor CFA that I'm trying to fit in Mplus. I found a very similar model in the article Muthén, B. (1994). Multilevel covariance structure analysis. In J. Hox & I. Kreft (eds.), Multilevel Modeling, a special issue of Sociological Methods & Research, 22, 376-398. Can you tell me how to code the one-factor model with 8 indicators, in which students are nested within schools (Figure 1)? My best guess is the following: Variable: Names are y1-y8; cluster = school; ANALYSIS: TYPE= TWOLEVEL; Model: %within% fw by y1-y8; %between% fb by y1-y8; Thanks in advance. 


This looks fine. Note, however, that between-level residual variances are often very small and may need to be fixed at zero. See the Topic 7 course handout for more information. 
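If the between-level residual variances do turn out to be near zero, the poster's input could be extended as in this sketch. It assumes that school is a column in the data file, which the NAMES list in the question did not show.

```
VARIABLE:
  NAMES = school y1-y8;    ! assumes school is in the data file
  USEVARIABLES = y1-y8;
  CLUSTER = school;
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  fw BY y1-y8;
  %BETWEEN%
  fb BY y1-y8;
  y1-y8@0;                 ! between-level residual variances fixed at zero
```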


I am using complex sample data with cluster, weight, and groups (2 groups). Could you please advise me how I can compute maximum likelihood (ML), pseudo maximum likelihood (PML), and pseudo-maximum loglikelihood (PLL) estimators in Mplus Version 5.2? I am also interested in the corresponding CFI values. I didn't find the specific commands for these estimators in the Mplus manual. Thanks in advance. 


With TYPE=COMPLEX, ESTIMATOR=ML gives ML and ESTIMATOR=MLR gives PML. I don't know what you mean by PLL. In all cases, if a chi-square test statistic is given, CFI is given. 


Thanks Dr. Linda for your response. But I ran it with TYPE=COMPLEX, ESTIMATOR=ML. This command gives me the following error: *** ERROR in ANALYSIS command Estimator ML is not allowed with TYPE = COMPLEX. Default will be used. 1 ERROR(S) FOUND IN THE INPUT INSTRUCTIONS. So, I can't use TYPE=COMPLEX and ESTIMATOR=ML. What is the alternative way to obtain the ML estimator? I found the pseudo log-likelihood (PLL) estimator in the article: Asparouhov, T. & Muthen, B. (2006). Comparison of estimation methods for complex survey data analysis. In that article, the estimator is described by equation 10. I am calling it PLL. Do you think PML and PLL (in equation 10 of the above article) are two different estimators or the same? If these two (PML and PLL) are different, how can I estimate them in Mplus 5.2? In the literature I found that both are available in Mplus. 


I want to compare a "full" model with a "basic" model and include the same variables in both models. However, in the full model, some variables are treated as 1-item measures. For example, the basic model has two factors (the first factor has two variables), and the full model is one where these same two variables are treated as separate one-item measures (i.e., are not loaded on a factor). I am new to Mplus, and I am not sure what the correct procedure is for specifying these two one-item measures in the full model. Should I fix their variance at 1 or leave them out of the model statement but in the USEVARIABLES statement? What different assumptions would I be making with this change? Ex: For the models below: All variables are continuous ANALYSIS: estimator=ml; OUTPUT: standardized mod(3.84) tech4; (Basic model) MODEL: f1 by v1 v2; f2 by v3 v4 v5; compared to (Full model #1) MODEL: v1@1; v2@2; f2 by v3 v4 v5; or compared to (Full model #2) MODEL: f2 by v3 v4 v5; >>in Full model #2, v1 and v2 are included in the "USEVARIABLES" statement but not specified in the MODEL statement<< Thank you in advance. 


I think the following is the way to go: MODEL: f2 by v3 v4 v5; Check to be sure v1 and v2 are correlated with f2 as the default. If not, add a WITH statement. 
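
A sketch of the model with the WITH statements written out, for the case where the covariances are not estimated by default. The variable names are the poster's; adding v1 WITH v2 is an assumption about what the full model should contain.

```mplus
MODEL:
  f2 BY v3 v4 v5;
  v1 v2 WITH f2;   ! single-item measures correlated with the factor
  v1 WITH v2;      ! and with each other (assumed to be wanted)
```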


Dr. Muthen, I have a few outliers in my data and I do not want to eliminate them. Is there a way to run a CFA model with certain cases identified as outliers? Or should I just run two models (one with outliers and one without) and compare them. Arina. 


You would have to do the analysis twice as you suggest. You could use the USEOBSERVATIONS option to select cases that are not outliers. 
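
One way the second run could be set up, assuming a hypothetical indicator variable outlier (coded 1 for flagged cases, 0 otherwise) has been added to the data set beforehand. The other variable names are also illustrative.

```mplus
VARIABLE:
  NAMES ARE id y1-y6 outlier;       ! hypothetical variable names
  USEVARIABLES ARE y1-y6;
  USEOBSERVATIONS = outlier EQ 0;   ! keep only cases not flagged as outliers
MODEL:
  f BY y1-y6;
```

Running the same input without the USEOBSERVATIONS line gives the analysis that includes the outliers.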


Is it possible to use weights in conventional SEM analysis? In particular, how can I use weights to estimate CFA model parameters when the ML estimator is used in Mplus? Another question: I found that the correlation between two factors is more than 1 in a CFA. This gives me an error. Is there any way to solve this problem? 


Yes, it is possible to use weights in conventional SEM analysis using any estimator. See the following paper which is available on the website: Asparouhov, T. (2005). Sampling weights in latent variable modeling. Structural Equation Modeling, 12, 411-434. See also the WEIGHT option in the user's guide and the topic Complex Survey Data in the user's guide for information about all of the complex survey data features in Mplus. When two factors correlate more than 1, they are not statistically distinguishable. Only one can be used in the model. You may want to go back to an EFA. 
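
A minimal sketch of a weighted CFA, assuming a hypothetical weight variable wt and six continuous indicators. MLR is shown here because it is the robust maximum likelihood choice; other estimators could be requested on the same line.

```mplus
VARIABLE:
  NAMES ARE y1-y6 wt;      ! hypothetical variable names
  USEVARIABLES ARE y1-y6;
  WEIGHT = wt;             ! sampling weight variable
ANALYSIS:
  ESTIMATOR = MLR;
MODEL:
  f1 BY y1-y3;
  f2 BY y4-y6;
```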


I ran a CFA and the chi-square degrees of freedom were greater than the sample size. Am I violating any assumptions here? 


There would be a problem if there were more free parameters than number of observations. You are fine. 


Hi, I'm trying to do a 3-level (students in classes in schools) multilevel CFA in Mplus. Is it possible to incorporate the 3rd level by using the COMPLEX function in the commands? Thanks! 


Yes. You don't get 3-level modeling, so no estimated 3rd-level random effects, but you do correct the 2-level model SEs for the level-3 nesting. 
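
A sketch of how this could be set up, with hypothetical variable names. It assumes, as I understand the CLUSTER option, that with TYPE = COMPLEX TWOLEVEL the higher-level cluster variable (used only for the standard error correction) is listed first and the modeled level-2 cluster variable second; check the user's guide for your version.

```mplus
VARIABLE:
  NAMES ARE y1-y6 class school;   ! hypothetical variable names
  USEVARIABLES ARE y1-y6;
  CLUSTER = school class;         ! school: SE correction; class: modeled level
ANALYSIS:
  TYPE = COMPLEX TWOLEVEL;
MODEL:
  %WITHIN%
  fw BY y1-y6;
  %BETWEEN%
  fb BY y1-y6;
```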


Dear Muthen, I ran two confirmatory factor analyses with Mplus 5.12, but some results could not be estimated. 

STDYX Standardization 
                                            Two-Tailed 
              Estimate    S.E.   Est./S.E.   P-Value 
F BY 
  X1             0.279   0.000    999.000    999.000   ****** (unestimated) 
  X2             0.637   0.017     36.761      0.000 
X2 WITH 
  X1             0.230   0.030      7.648      0.000 
Intercepts 
  X1             0.000   0.026      0.000      1.000 
  X2             0.000   0.026      0.000      1.000 
Variances 
  F              1.000   0.000    999.000    999.000   ****** (unestimated) 
Residual Variances 
  X1             0.922   0.000    999.000    999.000   ****** (unestimated) 
  X2             0.594   0.022     26.894      0.000 

Thank you, tam 


These parameters are fixed by you and are therefore not estimated. "999" says that something should not or could not be computed, which clearly is the case here. 

Anne DeField posted on Tuesday, September 15, 2009  2:12 pm



I am running a CFA with both metric and dichotomous items. Here I assume the following structure: a latent variable is represented by four observed variables. Three of them consist of metric items and one consists of dichotomous items (yes/no). How do I handle these items in this factor analysis? 


Put the dichotomous one on the CATEGORICAL list in the VARIABLE command. The default estimator for this situation is WLSMV. You can ask for maximum likelihood using the ESTIMATOR option of the ANALYSIS command. 
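
A minimal sketch of this setup, assuming hypothetical names y1-y3 for the three metric variables and u4 for the dichotomous one.

```mplus
VARIABLE:
  NAMES ARE y1-y3 u4;    ! hypothetical variable names
  CATEGORICAL ARE u4;    ! only the dichotomous indicator is listed
ANALYSIS:
  ESTIMATOR = ML;        ! omit this line to get the default, WLSMV
MODEL:
  f BY y1-y3 u4;
```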

Anne DeField posted on Thursday, September 17, 2009  7:38 am



Thanks! 

Da C posted on Tuesday, November 03, 2009  7:41 pm



Hi, I ran a 3-factor CFA model with categorical/ordinal indicators. The means of the 3 factors were set to 0 and their variances set to 1. The loading of the first indicator within each factor was set free. This analysis was weighted and I used the WLSMV estimator. The model results and STDYX results were identical. Are these results the structure or pattern loadings? Whichever it is, how would I estimate one from the other? Thank you! 


The raw coefficients and STDYX are the same because factor variances are one and variances of the latent response variables underlying the categorical variables are one. The coefficients are factor pattern coefficients. The matrix for these coefficients is lambda. The matrix for the factor variances and covariances is psi. The product of the two matrices gives the factor structure coefficients. 
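
To illustrate the matrix product described above, here is a small worked example with made-up numbers (two indicators, two factors with variances of one and a correlation of 0.5). None of these values come from the poster's output.

```latex
S = \Lambda \Psi
  = \begin{pmatrix} 0.8 & 0 \\ 0 & 0.7 \end{pmatrix}
    \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}
  = \begin{pmatrix} 0.8 & 0.4 \\ 0.35 & 0.7 \end{pmatrix}
```

In this example the first indicator has a pattern coefficient of zero on the second factor but a structure coefficient of 0.4, which arises only through the factor correlation.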

Da C posted on Wednesday, November 04, 2009  1:46 pm



Thank you very much for your quick reply! I have yet another question concerning my analyses. I first ran an EFA with categorical/ordinal indicators. This analysis was weighted and I used the WLSMV estimator. Based on the interpretability of this EFA and the fact that there were 3 eigenvalues > 1, I chose the 3-factor solution. Based on the 3-factor simple structure I identified in the EFA, I ran a CFA model. As I described in the previous post, I ran a 3-factor CFA model with categorical/ordinal indicators. The means of the 3 factors were set to 0 and their variances set to 1. The loading of the first indicator within each factor was set free. This analysis was weighted and I used the WLSMV estimator. My question concerns the latent correlations from the EFA 3-factor solution and the CFA confirming the 3-factor model. The CFA correlations are much greater than the EFA correlations: 1st & 2nd factor: 0.47 vs. 0.79 1st & 3rd factor: 0.57 vs. 0.92 2nd & 3rd factor: 0.46 vs. 0.73 Why would this occur? Does this indicate an issue with the model or latent correlation estimation in EFA or CFA? Which one is the correct one? Thank you! 


When you create the simple structure CFA and fix factor loadings to zero, this influences the correlations between the factors by forcing the relationship to go through the factors. See the Asparouhov and Muthen and Marsh papers on the website under ESEM. This issue is discussed. 

Daiwon Lee posted on Monday, March 08, 2010  2:32 pm



Hi, I am trying to confirm my measurement model by applying CFA to each latent construct. However, when I run a CFA using three items, a perfect fit seems to occur automatically. Every time I use CFA with three-item constructs I get .000 for RMSEA and SRMR and 1.000 for CFI/TLI. Please see the syntax below. TITLE: CFA for material strain w1 base model; DATA: File is .dat ; VARIABLE: NAMES ARE ..... usevariables are clothes1 allow1 goods1; Missing are all (9999) ; MODEL: matstrw1 BY clothes1 allow1 goods1; OUTPUT: sampstat modindices(all) residual patterns FSDETERMINACY; Please help me! Thank you. 


The reason that you get this is that a factor model with three indicators is just-identified. Fit cannot be assessed. There are no degrees of freedom. 
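
The parameter count behind this statement can be written out as follows, for a one-factor model with p continuous indicators, the first loading fixed at one, and the mean structure set aside (the intercepts use up exactly the p sample means, so including them does not change the result).

```latex
\text{df} = \underbrace{\frac{p(p+1)}{2}}_{\text{sample variances and covariances}}
          - \underbrace{\bigl[(p-1) + 1 + p\bigr]}_{\text{loadings} \,+\, \text{factor variance} \,+\, \text{residual variances}}

p = 3:\quad \text{df} = 6 - 6 = 0
\qquad
p = 4:\quad \text{df} = 10 - 8 = 2
```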

Daiwon Lee posted on Monday, March 08, 2010  9:26 pm



Thank you for the note. Then, is there any way to determine whether the just-identified model is a good fit or not? In other words, how can we determine whether a three-item construct model is a good fit or not? Thank you! 


No. That is why it is a good idea to have no fewer than four factor indicators. 

Brian Hall posted on Thursday, March 18, 2010  3:24 pm



Hi, I am working on a couple of CFA models. Here are the model statements: MODEL: F1 BY x1 x2 x3 x4 x5; F2 BY y1 y2 y3 y4 y5 y6 y7; F3 BY z1 z2 z3 z4 z5; F4 BY F1 F2 F3; MODEL: F1 BY x1 x2 x3 x4 x5; F2 BY y1 y2; F3 BY z3 z4 z5 z6 z7 z1 z2 z3; F4 BY zz4 zz5; F5 BY F1 F2 F3 F4; I am getting the following error: WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. Model1 PROBLEM INVOLVING VARIABLE F2. Model2 PROBLEM INVOLVING VARIABLE F4. I do have in both cases correlations with the latent variable F2 or F4 that exceed 1, and errors that are negative. My question is how best to deal with this issue. I have tried fixing the offending variances to 0 using this command: F2@0 or F4@0. When I do this, my models become unidentified. I have tried using different start values, even high values using this command: F2*.7 or F2*2, and this did not change the PSI error. I have tried fixing the variance of those latent variables to 1 using this command F2@1 or F4@1. This allowed the models to run (Rindskopf; Psychometrika, 1983). Is this a correct thing to do? I appreciate your help. Brian 


When you fix the factor variances to one, you should free the first factor loading, for example, F1 BY x1* x2 x3 x4 x5; f1@1; This might allow you to see if the problem is that the first factor loading is not close to one so fixing it to one is a problem. You should not both fix one factor loading to one and fix the factor variance to one. 
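
The two ways of setting the metric of a factor can be put side by side as below, using the first factor from the post. Only one of the two should be active in a given run; the first is left as comments here.

```mplus
MODEL:
  ! Option 1 (Mplus default): first loading fixed at one,
  ! factor variance free
  ! F1 BY x1 x2 x3 x4 x5;

  ! Option 2: all loadings free, factor variance fixed at one
  F1 BY x1* x2 x3 x4 x5;
  F1@1;
```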

finnigan posted on Tuesday, April 20, 2010  6:41 am



Hi there, I have a 56-item scale answered on a 5-point Likert scale. I am planning to do a two-level CFA. Unfortunately the scale items are not normally distributed despite attempts to use log transformations. Am I correct in saying that a factor analysis can be conducted in Mplus using MLR, which adjusts the chi-square for non-normality, while using ML under non-normality should give larger standard errors? It might be useful to compare the two outputs. Can Mplus provide corrected correlations as part of the output? Thanks 


If the items have floor or ceiling effects, you should not transform them. You should treat them as categorical variables. You can then use the default of weighted least squares or use maximum likelihood. The categorical data methodology using either estimator deals with the floor and ceiling effects. If you do not have floor or ceiling effects, you should not transform them but instead use MLR. 

finnigan posted on Tuesday, April 20, 2010  8:17 am



Hi Linda How are floor and ceiling effects detected in data? 


By looking at the univariate frequencies. If the lower or upper categories have a piling up of frequencies, this indicates a floor or ceiling effect. In this case, the variables should be treated as categorical. 


I would not log transform, i.e. producing something that you don't have (normal distribution) and making interpretation of your results more difficult. I would use MLR instead. In case of non-normal distributions, I guess, MLR SEs are always more trustworthy than ML SEs, so a comparison may make no sense. Floor and ceiling effects should be detected by inspecting a graphical display of the distribution, like a histogram. Many values at the higher end of your distribution indicate a ceiling effect, many values at the lower end a floor effect. 

finnigan posted on Wednesday, April 21, 2010  9:01 am



Is there any recommended cutoff criterion, such as 20% of observations located in the lowest or highest response category? If so, are there any references to support the cutoff? Thanks again for the help. 


It is the bivariate tables that are most important. They should not contain zero cells. 

Enrique posted on Thursday, April 22, 2010  10:07 am



I used a questionnaire with 25 Likert-type items (5 levels), sample n = 129. Ordinal variables, and multivariate non-normal distribution. I ran an EFA with promax rotation, which showed that the 25 items are grouped in 4 factors: all variables loading > 0.55 except 1, and 4 variables with cross-loadings > 0.25. When I ran a CFA on these factors, the fit was very poor (RMSEA = 0.20; CFI = 0.84, SRMR = 0.17). The same happens when I change paths according to the suggested modification indices, choosing 1, 2 or 3 factors, or eliminating the items with cross-loadings. I need some help testing the source of misfit. Thanks 


Why don't you use the Mplus default EFA rotation method, which will give you SEs for all factor loadings so you can see which ones should not be fixed at zero in the CFA? It also gives you Modification Indices for residual correlations. Or, don't move to CFA but stay with ESEM; see our web site. 
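
A sketch of both suggestions, assuming hypothetical item names y1-y25. Treating the items as categorical follows the poster's description of them as ordinal and is an assumption, not part of the reply above.

```mplus
VARIABLE:
  NAMES ARE y1-y25;          ! hypothetical item names
  CATEGORICAL ARE y1-y25;    ! assumes the Likert items are treated as ordinal
ANALYSIS:
  TYPE = EFA 1 4;            ! 1- to 4-factor solutions, default rotation

! ESEM alternative: drop the TYPE = EFA line above and add
! MODEL:
!   f1-f4 BY y1-y25 (*1);
```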

Enrique posted on Friday, April 23, 2010  2:51 am



Thanks, Sorry but I'm not familiar with that acronym, what is SEs? 


These are standard errors of the parameter estimates. The ratio of the parameter estimate to its standard error is a z-test which assesses significance. 

finnigan posted on Saturday, April 24, 2010  12:16 am



Linda, I have data that appear to be MCAR but demonstrate significant non-normality. Can MLR still be used if there are no ceiling effects? Or should FIML be used on account of the missing data, and if so, what deals with the non-normality of the data? Thanks 


MLR is the best choice you have in this case. It is full-information maximum likelihood. 

finnigan posted on Tuesday, April 27, 2010  8:16 am



Do MLM and MLR estimation of non-normal data adjust correlations among observed and latent variables for non-normality? 


MLM and MLR are robust to nonnormality. This does not affect the correlations among observed or latent variables. It affects the standard errors. 


Hello, I'm running a longitudinal CFA with 6 time points and 4 indicators per time point. I specified my model as in Example 5.1, as the initial model in a sequence testing measurement invariance. Although my sample is N=380, I got a large significant chi-square. Could there be a reason for this other than poor model fit? Thanks M. 


Perhaps the four indicators are not unidimensional. Or they may be nonnormal. 

ann bell posted on Tuesday, July 13, 2010  11:05 am



What does it mean when the output for MPLUS reads this: NO CONVERGENCE. NUMBER OF ITERATIONS EXCEEDED. as opposed to giving me fit scores. 


It means that the model was not able to converge in the default number of iterations. See pages 415-417 of the Version 6 User's Guide for suggestions. If these do not help, send the output and your license number to support@statmodel.com. 


I am having some trouble with my analysis. I am running a CFA for my path analysis. The model fit of the latent factors is good (CFI = 1, RMSEA = 0); however, the two-tailed p-values of the unstandardized results are non-significant for some indicators. I already rescaled (log-transformed) my variables since the variance was very large, and I constrained the variance of the factors (@1). Furthermore, I receive the warning: THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE. The problem involves my indicators. My data include negative values; could this be the problem? And could it also affect the significance of the indicators? Thank you for your time! 


Please send the full output and your license number to support@statmodel.com. 


Dear Dr. Muthén, I have performed a simple two-factor CFA and obtained a positive correlation between the factors (.50); however, this is contrary to my expectation (I expected a negative correlation). Then I calculated the scores by hand by summing the appropriate items, and calculated the correlation, which was negative (-.15), as I would expect. Do you have any idea about this discrepancy? Which correlation should I rely on and which should be reported? Best regards, Robert 


It sounds like you may be reading the data incorrectly or that the factor loadings are all negative resulting in a sign change for the covariance. I would need to see the full output, your calculations, and your license number at support@statmodel.com to say for sure. 

nanda mooij posted on Wednesday, August 04, 2010  11:17 am



Hi, I have a question about my model; it doesn't fit (not positive definite). This is what I'm trying to fit: f1 BY v1-v16; f2 BY v17-v32; f3 BY v33-v48; h1 BY f1 f2 f3; In the output I saw that the correlation between f1 and h1 is above 1. I've tried to fix f1 at zero, like this: f1@0, but that doesn't help. I also tried to add a correlation to the model: h1 WITH f1, and tried to fix the correlation with the factor loadings freed, as I read in the discussions, but that also doesn't make the model fit. So what more can I do to fit the model when there is a high correlation between two factors? Thanks very much! 


How does the model look without the second-order factor, for example, f1 BY v1-v16; f2 BY v17-v32; f3 BY v33-v48; I suspect you have problems already then. 

nanda mooij posted on Wednesday, August 04, 2010  12:07 pm



I tried this, and it fits, no problems. What does this mean? I want to present the results of the original model, so is there a way to fit this model with a second-order factor? Thanks, Nanda 


When you add the second-order factor, I assume that the residual variance of f1 is negative and this is why you fix it to zero. Is this the case? 

nanda mooij posted on Wednesday, August 04, 2010  3:20 pm



Yes, Linda, this is the case. Also, the output says the R-square of f1 is undefined. When I fix the res. var. of f1 to zero, Mplus says the covariance matrix is not positive definite, while I don't see anything strange about it, no negative values in that matrix. 

nanda mooij posted on Thursday, August 05, 2010  6:37 am



How can I solve this problem? Do I have to leave the second order factor out of the model or can I still fit this model in another way? Thanks for all your help! 


If the negative residual variance is small and not significant, you can fix the residual variance at zero and ignore the not positive definite message. If the residual variance is zero, that means that the 1st-order factor represents the 2nd-order factor perfectly. 

nanda mooij posted on Thursday, August 05, 2010  2:01 pm



Thanks for the answer. If I want to get the factor scores of f1, f2, f3 and h1, can I fit the model without h1, so that the factor scores will be calculated for f1, f2 and f3. Can I then assume that the factor scores of f1 are the same as the factor scores of h1? Thanks 


I don't think this will work. Try fixing the f1 variance at .001. 
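
Put together with the model from the earlier post, the suggestion could be sketched as follows. Because f1 is an indicator of h1, the statement f1@.001 refers to its residual variance.

```mplus
MODEL:
  f1 BY v1-v16;
  f2 BY v17-v32;
  f3 BY v33-v48;
  h1 BY f1 f2 f3;
  f1@.001;    ! residual variance of f1 fixed at a small positive value
```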


Hello, I am conducting a CFA with ordered categorical indicators. I conducted the CFA with the categorical option specified, and was wondering how best to interpret the fit indices obtained when the WLSMV estimator is used. I ask because it seems highly unlikely to me that the fit indices under WLSMV are directly and perfectly comparable to fit index values that are obtained in the "usual" CFAs conducted in psychology with maximum likelihood and indicators that are treated as continuous. I realize that "rules of thumb" are usually overly simplistic, but are there any journal articles that address how to interpret fit indices under WLSMV? Thank you for your time. 


See the Yu dissertation on our website where cutoffs are looked at for binary and continuous outcomes. She finds similar cutoffs in both cases. Yu, C.Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes. Doctoral dissertation, University of California, Los Angeles. 

Dallas posted on Saturday, October 30, 2010  4:28 am



I have a question. I'd like to conduct a full-information factor analysis of categorical data. I have 19 indicators of a single trait. I understand that I simply have to specify the estimator to get this. I've done that. However, I would like to allow some of the uniquenesses to correlate in the model. When I do this with ML, I get an error saying "Covariances for categorical, censored, count or nominal variables with other observed variables are not defined". How can I go about allowing correlated errors in a factor model with categorical data and maximum likelihood estimation? 


You can do this using the BY option as illustrated in Example 7.16. Note that each residual covariance adds one dimension of integration to the analysis. 
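
One common way to use BY for this purpose is sketched below. This is an illustration under stated assumptions, not necessarily the specification used in Example 7.16: the item names and the choice of u3 and u7 as the correlated pair are hypothetical, and with both loadings fixed at one this version can only represent a positive residual association.

```mplus
VARIABLE:
  NAMES ARE u1-u19;          ! hypothetical item names
  CATEGORICAL ARE u1-u19;
ANALYSIS:
  ESTIMATOR = ML;
MODEL:
  f BY u1-u19;
  r BY u3@1 u7@1;   ! extra factor shared only by u3 and u7;
                    ! its variance carries their residual association
  r WITH f@0;       ! kept uncorrelated with the trait factor
```

Here the model has two dimensions of integration, one for f and one for r.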

Resmi Gupta posted on Tuesday, November 02, 2010  7:42 pm



I am conducting a CFA for polytomous items using WLSMV. Is there a way to get factor scores? Here is my code: y1 by a1*, a2, a3, a4, a5; y2 by a6*, a7, a8, a9; y1@1 , y2@1; y1 with y2; Thanks 


Use the FSCORES setting of the SAVE option in the SAVEDATA command. See the user's guide for further information. 
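
A sketch of how this could be added to the poster's model. The output file name fscores.dat is hypothetical.

```mplus
MODEL:
  y1 BY a1* a2 a3 a4 a5;
  y2 BY a6* a7 a8 a9;
  y1@1 y2@1;
  y1 WITH y2;
SAVEDATA:
  FILE IS fscores.dat;   ! hypothetical output file name
  SAVE = FSCORES;        ! writes the estimated factor scores to the file
```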

Resmi Gupta posted on Wednesday, November 10, 2010  7:23 pm



My factor score distribution is not normal. Is it because I have highly skewed items? Would a mixture distribution be a better choice here? Thanks. 


Dear Dr. Muthen, I need your advice on conducting a simple CFA with one factor. There is an error message I don't know. Even though everything is right with the names of the variables and missing values are defined as 99, there is this error: "unable to expand" the factor. Please help me. Thanks. The following syntax was used: VARIABLE: Names are tsk1_a tsk2_a tsk3_a tsk5_a tsk6_a tsk7_a tsk9_a tsk10_a tsk11_a tsk13_a tsk14_a tsk15_a tsk17_a; Missing is ALL (99); Model: TSK-13 by tsk1_a tsk2_a tsk3_a tsk5_a tsk6_a tsk7_a tsk9_a tsk10_a tsk11_a tsk13_a tsk14_a tsk15_a tsk17_a; Analysis: estimator=mlr; Output: Standardized; modindices; residual; *** ERROR Unable to expand: TSK-13 


Jane: Variable names cannot contain a dash. That is used to specify a list. Rename the factor tsk13 or something without a dash. 
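
With the factor renamed, the MODEL command from the post would read as follows.

```mplus
MODEL:
  tsk13 BY tsk1_a tsk2_a tsk3_a tsk5_a tsk6_a tsk7_a tsk9_a
           tsk10_a tsk11_a tsk13_a tsk14_a tsk15_a tsk17_a;
```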


Resmi: Skewed items can cause a non-normal factor score distribution. This is not necessarily an indication that the normality assumption for the factor is wrong, but rather that the items are not optimal: they don't discriminate well between people with high or low factor scores. 

ywang posted on Friday, November 12, 2010  11:43 am



Dear Drs. Muthen, We would like to report the t-test results of a latent factor. Is it possible to conduct a t-test for a latent factor underlying three indicator variables with Mplus? Thanks! 


What would the t-test test? 

ywang posted on Friday, November 12, 2010  7:17 pm



For example, the t-test will test whether the factor score is equal between males and females. Thanks! 


For this you do a 2-group analysis with measurement invariance and estimate the factor mean in one group with the mean fixed at zero in the other group. The z-test for that mean is then what you want. See Topic 1. You don't do it via estimated factor scores. 
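
A minimal sketch of such a two-group run for continuous indicators, with hypothetical variable names and group coding. It relies on the Mplus multiple-group defaults for continuous outcomes, under which loadings and intercepts are held equal across groups, the factor mean is fixed at zero in the first group, and it is free in the other group.

```mplus
VARIABLE:
  NAMES ARE y1-y3 gender;                     ! hypothetical variable names
  USEVARIABLES ARE y1-y3;
  GROUPING IS gender (0 = male 1 = female);   ! coding is assumed
MODEL:
  f BY y1-y3;
```

The factor mean reported for the female group, together with its Est./S.E., is then the test of the male-female difference.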


Dear all, I have 2000 respondents who have each rated two out of a set of 10 (rotating randomly) stimuli. The respondents have answered 20 questions for each stimulus. So, a total of 400 responses per stimulus X 20 variables. The grand total dataset contains 4000 judgments [2000 respondents X 2 stimuli each] x 20 variables. I want to perform EFA and CFA for each stimulus, but I am still unclear about the treatment of the 'repeated measures' nature of the exercise given the random rotation of the stimuli shown to respondents. Any comments? Furthermore, a separate issue: some of the stimuli are nested within others in terms of attributes. So, this also is a 'nested' EFA / CFA problem. Any comments on that front too? Regards 


I should preface this by saying that I am not an expert on this type of design, but here is my quick take. The random rotation of stimuli assures that the subject groups corresponding to different pairs of stimuli are randomly equivalent, so that you can say that all stimuli responses draw from the same population. I would just do one analysis per stimulus pair, so using 400 subjects for each analysis where the number of variables is 2x20=40. If you feel there are important differences between stimulus pairs, you could do a multigroup analysis for the 5 groups to compare factor solutions. Regarding your last question, I don't see a special modeling for this, but would merely keep this nesting in mind when interpreting the findings. 

ywang posted on Monday, November 29, 2010  12:23 pm



Dear Drs. Muthen: For the CFA model using MLR, the scaling factor for the log-likelihood for H0 and H1 is 1.168, but the scaling factor for the chi-square is 1.0. For this model, should we use MLR or ML? Is there any cutoff for the scaling factor which indicates whether we should use MLR or ML? Also, are the chi-square and log-likelihood scaling factors always different? Thanks! 


The scaling correction factors differ for chi-square and the log-likelihood due to the fact they are in different metrics. Generally speaking you should use MLR if the scaling correction factor is different from one. 

Catherine posted on Friday, March 04, 2011  4:47 am



Dear Dr Muthen, I want to test a 3-factor model with categorical variables. Now the 3 factors only contain 19 of the total 26 variables. What should I do with the other 7 variables? Should they still be in the USEVARIABLES option or not? And how will I know if these variables load on one of the three factors? Hope you can help me, Cat 


All variables on the USEVARIABLES list are used in the analysis. You should leave the variables off if you don't want them included. If you want to see how they load on the factors, you need to include them. It sounds like you should start with an EFA of the total set of variables. 


For a one-factor CFA solution, I am getting an error message: "chi-square value could not be computed because of too many categories". Can you please help me understand why I am getting this message? I have a total of 11 items which are ordinal in nature (1-5 scale). Thanks Reshmi Gupta 


I have not seen this message. Please send the full output and your license number to support@statmodel.com. 


Hi, I have 3 latent variables; each latent variable consists of 3 indicators, and the indicators are all on a 5-point Likert scale. My question is: is a Likert-scale indicator treated as a continuous or a categorical variable, and hence can I say that my latent variable is categorical (expressed by c) or continuous (expressed by f) when I run a CFA model? Thanks 


If you put the Likert variables on the CATEGORICAL list, they are treated as categorical. If you do not, they are treated as continuous. In both cases, the latent variables specified using the BY option are continuous. 


I ran a CFA and the residuals and the fit indices sound too good to be true (RMSEA = .00, CFI = 1.00). The factor coefficients are between 0.6 and 0.91, the alphas for the factors are .89-.95, and the correlations between items belonging to the same factor range between 0.70 and 0.98. 1. I am guessing there is a lot of multicollinearity in this setup. Hence, the high fit. Correct me if I am wrong. Is there a way to deal with this problem? I read somewhere that centering might be a technique that people use. Any suggestions? 2. A factor requires at least 4 items. He is left with two items in a factor (in another model). But these are important items. How do people normally deal with this situation (you want to include the items, but do not want to construct a factor with 2 items)? Thanks Rob 


Regarding the fit, please send the output and your license number to support@statmodel.com. Regarding the two items, if that is all you have you have no choice but to use them knowing the pitfalls. The model is not identified unless information is borrowed from other parts of the model and model fit for that factor cannot be assessed. 


Hi, I have two latent variables (the two main constructs of my model). I ran a CFA for the first latent variable (first construct), then I ran a CFA for the second latent variable (second construct), and I got good fit indices for each construct. Now I want to check the covariance between those two latent variables (constructs), but I do not know how to do that. Could you please give any advice? Thanks. 


If you include both factors in the MODEL command, the covariance will be estimated as the default. 
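
For example, with hypothetical indicator names, a combined model could be written as below. The WITH statement is shown only to make the default explicit.

```mplus
MODEL:
  f1 BY y1-y3;    ! hypothetical indicator names
  f2 BY y4-y6;
  f1 WITH f2;     ! estimated by default; listed here only for clarity
```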


Hello, I am interested in testing measurement equivalence of a model across levels of several ordered categorical variables. Can you recommend any citations related to best practices in this type of situation? Also, can Mplus handle this type of procedure? If so, can you point me in the (very) general direction of how to conduct such an analysis? Thank you very much. Richard Hermida 


I'm sending you a paper you may find helpful. You can start from the inputs found in the Topic 2 course handout on the website under multiple group analysis. This looks at binary items. You can extend it using the inputs shown in the Topic 4 course handout under multiple indicator growth. This uses ordinal items. 


Dr. Muthen, Thank you very much for the response and for the sent article. I will read the article (probably multiple times) with great interest. I think my last post (4/26) might have been a bit unclear. I am interested in assessing if a measure differs not between different groups, but across the continuum of a continuous variable (technically measured by ordered categorical items). For example, if I were to hypothesize a particular 4factor model for my data, could I test if this model holds as values for a continuous variable increased or decreased across a continuum of values? Is that possible? Or would I need to split the continuous variable into multiple groups to conduct any test of measurement equivalence? Thanks again. My apologies if this is indeed covered in the materials you indicate earlier. Richard Hermida 


There are two things you can do: 1. Categorize the continuous variable and use multiple group analysis. 2. Use the XWITH option to create an interaction between the factor and the continuous variable and regress the factor indicators on the interaction. 
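
A rough sketch of the second option for a single factor with continuous indicators. The variable names are hypothetical, and leaving y1 out of the ON statement as an anchor item is an assumption made here for identification, not something stated in the reply. For ordered categorical items a CATEGORICAL list would be added.

```mplus
VARIABLE:
  NAMES ARE y1-y5 x;        ! hypothetical variable names
ANALYSIS:
  TYPE = RANDOM;
  ALGORITHM = INTEGRATION;
MODEL:
  f BY y1-y5;
  fx | f XWITH x;           ! interaction between the factor and x
  y2-y5 ON x fx;            ! y1 left out as an anchor item (assumption)
```

Significant coefficients for fx would suggest that the corresponding loadings change along the continuum of x.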


Hi, I have an annoying problem. I am running a multigroup CFA. When I try to save factor scores using SAVEDATA, the fit statistics and factor loadings I get are different from the model without the SAVEDATA command. Otherwise everything is identical between the two specifications. Any ideas? 


Please send the files and your license number to support@statmodel.com. 


Hi, Do you know of any rules of thumb for the maximum number of indicator variables you should have on a latent factor? I read some ideas about parceling variables but it seems controversial, so I'm wondering at what point you start worrying about the number of items you have for a scale. Thanks. 


I would recommend no less than four indicators for the reason that you can't test the fit of a factor model with less than four indicators. 


Hi, Thanks but I was concerned about the maximum of indicators. For example, is 12 items, 20 items, etc. too many items for one latent factor? Thank you. 


I don't think there is an upper limit on the number of factor indicators. If you have 15 unidimensional items, that should be sufficient for creating a sum score. So between 4 and 15 should be optimal. 


Drs Muthens, I have run a CFA with 28 categorical items and three factors. I have a fairly large, weighted sample (n = 875). The fit of the model is not acceptable. However, if I delete 2 items which have low R2 and nonsignificant factor loadings, I get a better fit. I wonder, however, if there is a way to test whether the fit of this modified 26-item model is significantly better than that of the original 28-item model? Thank you, Elisabet


I don't know of such a test. 

Oana Lup posted on Tuesday, May 31, 2011  5:44 am



Could you please help? I am testing the significance of the difference between intercepts for East European and West European countries. USEVARIABLES ARE poldisc polint media pid female age agesq educlev emp mar urban swi rus por den wger eger nl slo nor ro mold sp; CENTERING = GRANDMEAN (poldisc, polint, media, age, agesq, educlev); Missing are all (999); MODEL: poldisc ON polint media pid female age agesq educlev emp mar urban swi(mswi) rus(mrus) por(mpor) den(mden) wger(mwger) eger(meger) nl(mnl) slo(mslo) nor(mnor) ro(mro) mold(mmold) sp(msp) !swe (mswe) ; [poldisc] (mint); MODEL CONSTRAINT: NEW = (hm); hm = (((5*mint)+mrus+meger+mslo+mro+mmold)/5) - (((8*mint)+mswi+mpor+mwger+mnl+mnor+msp+mswe)/8); OUTPUT: TECH1; and I get these warnings: *** WARNING Warning: No specification of mean structure analysis in 'ANALYSIS' paragraph. *** ERROR in Model Constraint command Unknown parameter label in MODEL CONSTRAINT: NEW in assignment: NEW = (HM) Thanks very much, Oana


It sounds like you may be using an old version of the program where you must specify TYPE=MEANSTRUCTURE; to include means in the model. NEW should not be followed by an equal sign. It should be NEW (HM); 
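A minimal sketch of the corrected form follows. The variables y, x1, x2 and the labels b1, b2, mint, diff are hypothetical stand-ins, not the poster's actual model. Each labeled parameter is placed on its own line.

```
ANALYSIS:
  TYPE = MEANSTRUCTURE;     ! only needed in older versions, per the reply above
MODEL:
  y ON x1 (b1)
       x2 (b2);
  [y] (mint);               ! label the intercept
MODEL CONSTRAINT:
  NEW(diff);                ! no equal sign after NEW
  diff = (mint + b1) - (mint + b2);
```

The new parameter diff is then reported with its own estimate and standard error.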

Oana Lup posted on Tuesday, May 31, 2011  2:23 pm



Thanks very much! I added TYPE=MEANSTRUCTURE and this indeed sorted out the problem. I also removed the equal sign and ran it again, but I am still getting this error message: ERROR in Model Constraint command Unknown parameter label in MODEL CONSTRAINT: NEW(HM) in assignment: NEW(HM) = I really hope you can find out what this is. Many thanks! Oana


There should not be an = sign. Please see the NEW option in the user's guide for the correct specification. 

Oana Lup posted on Wednesday, June 01, 2011  4:25 am



Yes, I did. Now my script looks like this: MODEL CONSTRAINT: NEW(hm); hm = (((5*mint)+mrus+meger+mslo+mro+mmold)/5) - (((7*mint)+mswi+mpor+mwger+mnl+mnor+msp)/7); but I am still getting the error message: *** ERROR in Model Constraint command Unknown parameter label in MODEL CONSTRAINT: NEW(HM) in assignment: NEW(HM) = I really do not understand where the problem is. Thanks, Oana


Please send the full output and your license number to support@statmodel.com. 

Oana Lup posted on Thursday, June 02, 2011  5:33 am



thanks very much. there was a problem with our school program i think. now it works. many thanks, Oana 


Drs Muthens, I am running a CFA with 28 categorical indicators and three latent factors. I am trying to test for gender invariance: using the command: Grouping is GENDER (1 = female 2 = male); However I get the following error message: *** ERROR Based on Group 2: Group 1 contains inconsistent categorical value for STRS26_1: 5 What does this mean? Is there a way to work around it? Thank you, Elisabet 


With the default WLSMV estimator, categorical variables must have the same categories in each group. You can collapse categories to achieve this.
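One way to collapse categories is in the DEFINE command. The sketch below assumes that category 5 of STRS26_1 is the one missing in a group and that merging it into category 4 is substantively acceptable; which categories to merge is a judgment call not addressed in the reply.

```
DEFINE:
  IF (strs26_1 EQ 5) THEN strs26_1 = 4;   ! merge category 5 into 4 (assumed choice)
```

After recoding, both groups are analyzed with the same set of categories for this item.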

chuma owums posted on Tuesday, June 21, 2011  12:08 pm



Drs Muthens, I am new to Mplus and was wondering how to interpret the confidence intervals of a bootstrapped indirect effect. In particular, the output from my analysis shows that zero was not within the upper and lower limits of the 2.5% bootstrapped CI, but it was within the limits at .5%. Do these percentages correspond to 97.5% and 99.5% confidence levels? Any help with this would be greatly appreciated.


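The confidence intervals are 95% and 99% and are interpreted in the regular way: the lower and upper 2.5% limits bound the 95% interval, and the lower and upper .5% limits bound the 99% interval.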
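For reference, a minimal sketch of an input that requests bootstrapped confidence intervals for an indirect effect. The variables x, m, y and the number of bootstrap draws are hypothetical choices for illustration.

```
ANALYSIS:
  BOOTSTRAP = 1000;         ! number of draws is an arbitrary choice here
MODEL:
  m ON x;
  y ON m x;
MODEL INDIRECT:
  y IND m x;                ! indirect effect of x on y via m
OUTPUT:
  CINTERVAL(BOOTSTRAP);
```

If zero lies outside the Lower 2.5% and Upper 2.5% limits but inside the Lower .5% and Upper .5% limits, the effect is significant at the 5% level but not at the 1% level.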


Dear Linda and Bengt, In the situation where the CFA produces the following message: MINIMIZATION FAILED WHILE COMPUTING FACTOR SCORES FOR THE FOLLOWING OBSERVATION(S) : 457 FOR VARIABLE V32011 484 FOR VARIABLE V32001 860 FOR VARIABLE V32001 how does Mplus identify such cases (in this example they are cases 457, 484, 860)? Can you please point me to a reference that describes the methodology? Thanks, Amang


Perhaps you have categorical outcomes in which case this type of iterative optimization takes place for each subject in line with our appendix 11 of http://www.statmodel.com/download/techappen.pdf The message means that for these subjects the optimum could not be found by the iterative technique, perhaps due to an unusual response vector. Also, be sure that you use the latest 6.11 version of Mplus. 


I am having a very strange problem. In SPSS one of my variables has a mean of 0 but then in mplus it says that the mean is 11. I have tried everything, including combing through the spss and text files by hand to see if there are large values or if something got read in incorrectly. I have no idea what could be going on. Thank you! 


It sounds like you are reading the data incorrectly. You may have blanks in your data set. Free format data cannot contain blanks. If you can't see the problem, please send your output, data, and license number to support@statmodel.com. 

Lena Herich posted on Friday, August 19, 2011  11:16 am



Hello! I was reading your article “Applications of continuous-time survival in latent variable models for the analysis of oncology randomized clinical trial data using Mplus.” In chapter five, several different latent variable models are fit to the data. My question is why exploratory factor analysis, and not confirmatory factor analysis, was used for the 1f, 2f and 3f models. Would it also have been possible to fit confirmatory models, and what would the differences be?


That's certainly possible, but there was not really any well-formed substantive theory behind the measurement instrument that called for specific CFAs. But note that M4 is a CFA.

Eric Teman posted on Friday, August 26, 2011  8:31 pm



I have a 3 factor CFA (with 4 indicators per factor). When I output the results ASCII file, factor loading estimates only appear for 3 indicators per latent variable. Is there a reason that all the factor loading estimates do not appear in the ASCII file? 


The first indicator is fixed to set the metric of the factor. 

Eric Teman posted on Saturday, August 27, 2011  3:30 pm



Yes, but shouldn't the standardized output include all factor loading estimates including the one that is fixed to set the metric? 


No, only free parameters are saved. 

Nidhi Kohli posted on Wednesday, September 14, 2011  5:03 pm



I am running a CFA with 61 continuous indicators and 4 latent factors. However, I get the following error message: *** ERROR Mismatched parentheses: WAI2 WITH WAI4( I have checked the parentheses and they are correctly specified. Why then am I getting this error message? Is there a way to work around it? Thank you


Please send the full output and your license number to support@statmodel.com. 


Hello, I am running a two-factor CFA on a dataset containing 107 variables. We keep getting the following warning when trying to run the code: Warning: The estimation of a model with 107 variables with the WLSMV estimator may be slow. Using the VLSMV estimator will produce more timely results. If analysis with the WLSMV estimator is desired, try specifying NOSERROR and NOCHISQUARE in the output command to reduce computation command. Here is my input. Do you see any error that could be causing this warning? ANALYSIS: ESTIMATOR=WLSMV; MODEL: PsyCog by P_088 P_023 P_085 P_101 P_022 P_007 P_019 P_059 P_107 P_054 P_020 P_061 P_004 P_057 P_016 P_097 P_012 P_052 P_011 P_064 P_056 P_006 P_018 P_051 P_065 P_105 P_089 P_047 P_008 P_021 P_111 P_090 P_055 P_118 P_108 P_099 P_106 P_046 P_122 P_049 P_084 P_045 P_048 P_095 P_067 P_014 P_015 P_017 P_109 P_058 P_050 P_102 P_063 P_091 P_112 P_113 P_119 P_041 P_003 P_120 P_002 P_060 P_062 P_121; Som by S_070 S_029 S_059 S_031 S_017 S_044 S_040 S_063 S_055 S_039 S_048 S_069 S_030 S_019 S_026 S_053 S_062 S_024 S_032 S_072 S_013 S_037 S_042 S_015 S_052 S_058 S_064 S_045 S_066 S_046 S_034 S_054 S_067 S_065 S_033 S_043 S_060 S_049 S_068 S_035 S_011 S_038 S_021; OUTPUT: standardized res;


The warning is not due to an error. It is just telling you that with 107 categorical variables, the estimation may be slow. 
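If run time does become a problem, the two options named in the warning go in the OUTPUT command. A minimal sketch, assuming standard errors and the chi-square test can be skipped in an exploratory run:

```
ANALYSIS:
  ESTIMATOR = WLSMV;
OUTPUT:
  NOCHISQUARE NOSERROR;     ! skip chi-square and standard error computation
```

A final run without these options would still be needed to obtain fit statistics and standard errors.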


Dear Linda, Regarding an older post of yours from 2010, in my multilevel CFA I also get a warning for a negative residual variance of an item at the between level. You said that if it is very small and nonsignificant, it can be fixed to 0 or 0.001. It's actually -.001 and nonsignificant in my case. If I fix it, do you think I have to report that in my paper? Is there a reference for that? Thank you!


This will not change your solution so I would simply mention it in a footnote. See the following paper which is available on the website: Muthén, B. & Asparouhov, T. (2011). Beyond multilevel regression modeling: Multilevel analysis in a general latent variable framework. In J. Hox & J.K. Roberts (eds), Handbook of Advanced Multilevel Analysis, pp. 15-40. New York: Taylor and Francis.
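For illustration, fixing a between-level residual variance at zero looks roughly like this. The factor names fw and fb, the indicators y1-y4, and the choice of y3 as the offending item are hypothetical.

```
MODEL:
  %WITHIN%
  fw BY y1-y4;
  %BETWEEN%
  fb BY y1-y4;
  y3@0;                     ! between-level residual variance of y3 fixed at zero
```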


Thank you very much Linda! 

yezi posted on Thursday, November 03, 2011  5:52 am



Hi, Do group factors correlate in bifactor model? Thank you very much! Kind regards Yours sincerely Ye 


No. See the input on Slide 159 of the Topic 1 course handout on the website. 
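A generic bifactor specification along these lines might look as follows. This is a sketch with hypothetical names (g, s1-s3, y1-y9), not a reproduction of the slide's input.

```
MODEL:
  g  BY y1-y9;              ! general factor
  s1 BY y1-y3;              ! group (specific) factors
  s2 BY y4-y6;
  s3 BY y7-y9;
  g  WITH s1-s3@0;          ! general factor uncorrelated with group factors
  s1 WITH s2-s3@0;          ! group factors uncorrelated with each other
  s2 WITH s3@0;
```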


Is there a way to output standardized factor scores when using the following command: SAVEDATA: SAVE IS FSCORES; Thanks. 


No, this is not possible. 

yezi posted on Tuesday, November 08, 2011  10:41 pm



Hi, When we do ESEM, is the construct of the test ECFA (that is, ESEM)? That is to say, does each item load on each factor? Thank you very much! Kind regards Yours sincerely Ye


Hi, We have a factor analysis model fit in two independent groups with invariance constraints. One unique variance estimate is negative. There are no identification problems. As an experiment, we imposed a positivity constraint on the unique variance in both groups using the model constraint command. The result of that is a model that converges, but we get a message about the standard errors not being computed, and a possible identification problem with parameter 8. Yet there are only 7 free parameters; no parameter 8 exists in the parameter count. Can you tell us what is going on? Why would the model be identified without the constraint, but not identified with the constraint? 


Roger, when a model is estimated with inequality constraints, the constrained parameter is replaced by a so-called slack parameter; in your case Variance = slack*slack, which keeps the variance positive. The actual parameter in the model is the slack parameter, and that is parameter 8. Typically, in a situation like the one you describe, this parameter will be estimated at a value near 0, and thus the variance itself will be estimated at 0, which is a value on the border of the admissible solutions. In that case the standard methodology for computing SEs is not valid, and currently Mplus just reports that as a problem. The bottom line is that in this case the standard error for the variance parameter, if estimated at the borderline value of 0, is not reliable. All other results are fine. Without looking at the exact results I am not 100% sure of the above answer, so if this doesn't make sense for your situation, send the example to support@statmodel.com. Tihomir

yezi posted on Wednesday, November 09, 2011  4:04 pm



Hi, When we do ESEM, is the construct of the test ECFA (that is, ESEM)? That is to say, does each item load on each factor? Thank you very much! Kind regards Yours sincerely Ye


The ESEM measurement model is EFA. Each item loads on all factors after rotation. 

yezi posted on Thursday, November 10, 2011  8:09 am



Hi, Thank you very much! How to estimate reliability of the test whose model is ESEM measurement model? Thank you. Kind regards Ye 


For ESEM, look at the StdY estimates for each item's residual variance. Reliability would be 1 minus the StdY residual variance. 

yezi posted on Thursday, November 10, 2011  4:30 pm



Hi, Thank you very much! For ESEM, why not look at the StdYX or Std estimates for each item's residual variance? Is reliability 1 minus the StdYX or Std residual variance? Thanks! Kind regards Ye 


You would not use StdYX because ESEM as EFA has no independent variables. You would not use Std because ESEM analyzes a covariance matrix. 

yezi posted on Friday, November 11, 2011  6:33 pm



Hi, Thank you very much! Kind regards Ye 


Dear Linda and Bengt, we used the WLSMV estimator for a CFA with 3 factors. There are 11 observed variables on each factor. For 2 of the 3 factors the indicators are binary, while the indicators for the 3rd factor have 3 categories. A reviewer now asks how binary-based correlations were corrected. He/she claims this to be necessary whenever limited information is used in Mplus. However, we did not use limited information but raw data of item responses. Is a correction necessary in this case, and if so, is it done automatically? Can I find any information about this in the manual or the technical reports? Thank you very much, Samuel


I'm not sure what the reviewer is asking about. The sample statistics for model estimation for WLSMV for a model with no covariates are tetrachoric and polychoric correlations. 

WenHsu Lin posted on Thursday, March 08, 2012  1:41 am



I am trying to fit a simple CFA model. I have 4 observed variables, which are all categorical, and want to see if they load on one latent variable. However, Mplus keeps giving me the following error message. What is wrong? INPUT INSTRUCTIONS data: file is c:\crime1.dat; type is individual; format is 4f1.0; variable: names are w1c1-w1c4; usevariable are w1c1-w1c4; categorical are w1c1 w1c2 w1c3 w1c4; missing is blank; model: dev1 by w1c4 w1c1 w1c2 w1c3; output: sampstat stand mod (4); *** ERROR The number of observations is 0. Check your data and format statement. Data file: c:\crime1.dat *** ERROR Invalid symbol in data file: "ï»¿0000" at record #: 1, field #: 1


You seem to have a problem in the data file. Please send your output, data, and license number to support@statmodel.com. 

WenHsu Lin posted on Thursday, March 08, 2012  5:38 pm



I am using a public computer to analyze it. I know it looks like I have a problem in the data file. However, I checked it and did not see a problem, nor does SPSS report any strange numbers or anything like that.


It sounds like the dataset may be saved in an incorrect format. Try opening your dataset in Excel and resaving it as a txt file. 

seefeh posted on Thursday, March 29, 2012  3:31 am



Hello, I would like to run a multiple group (males vs females) CFA using indicator values that are nonnormally distributed. I have tried a number of transformations (log, square root and reciprocal), but indicator skew is not reduced to <2. I have tried running a CFA with the indicators as count data, but this doesn't seem to work when running a multiple group analysis. Is there a way around this issue? Kind regards 


If your variables are continuous and do not have a piling up at either end, using a nonnormality robust estimator like MLR should be sufficient. If they have a piling up at either end, you can consider treating them as censored. 


I conducted a CFA on a two-factor model, where each factor had 2 indicators. The residual variances of all 4 indicators were constrained equal. I wanted to test whether a 2-factor or a 1-factor solution was best; thus, I ran the model once where the two factors were allowed to freely correlate, and again where I constrained the correlation between the two factors to 1. I determined that the 1-factor solution was better than the 2-factor solution using a chi-square difference test of the nested models. Subsequently, I ran a 1-factor model, with the same 4 indicators as before. The residual variances of all 4 indicators were still constrained equal. However, I found that when run as a 1-factor model, the model fit indices changed significantly, as did the residual variances. Could you explain why these parameters changed when I changed the model from a 2-factor model (with the factors correlated @1) to a 1-factor model? Below is my syntax for reference. 2-factor model: DISORG by ydiseng ychaotic; CONTROL by yrigid yenmesh; ydiseng(1); ychaotic(1); yrigid(1); yenmesh(1); 1-factor model: UNBAL by ydiseng ychaotic yrigid yenmesh; ydiseng(1); ychaotic(1); yrigid(1); yenmesh(1);


You don't show the setup where you constrain the 2 factors to correlate 1. 


Hi, I apologize for the omission. Thanks for pointing it out. When I constrain the 2 factors to correlate 1, I used the syntax below. Thanks! DISORG by ydiseng ychaotic; CONTROL by yrigid yenmesh; DISORG WITH CONTROL@1; ydiseng(1); ychaotic(1); yrigid(1); yenmesh(1); 


This setup does not constrain the factor correlation to 1, but the factor covariance. This is because your factor variances are not one but freely estimated. You can instead free all factor loadings by using * and fix the factor variances to 1, and then fix the covariance, which is then a correlation. But note that the chi-square difference test is suspect here because you are on the border of the admissible parameter space, namely a correlation of 1.
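Applied to the poster's model, the suggested parameterization would look roughly like this. It is a sketch based on the syntax posted above, not an input checked against the data.

```
MODEL:
  DISORG BY ydiseng* ychaotic;     ! * frees the first loading
  CONTROL BY yrigid* yenmesh;
  DISORG@1; CONTROL@1;             ! factor variances fixed at 1
  DISORG WITH CONTROL@1;           ! covariance is now a correlation
  ydiseng(1); ychaotic(1); yrigid(1); yenmesh(1);
```

With both factor variances fixed at 1, fixing the covariance at 1 does fix the correlation at 1, although the caveat about testing on the border of the parameter space still applies.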


Thank you for your response. Would you recommend using a Wald test instead of the chi-square difference test in this case?


The same issue holds for the Wald test. You have a parameter on the border of the admissible parameter space. 

seefeh posted on Tuesday, May 08, 2012  7:15 am



Hello, I have a question regarding the Scaling Correction Factor. When running CFAs I get Scaling Correction Factors for MLR of >3 even though there is high goodness of fit (e.g., CFI = 0.962; TLI = 0.958; and RMSEA = 0.043). What does the high Scaling Correction Factor tell me about the model? Also, is there a rule of thumb regarding what should be classified as a satisfactory SCF value? Many thanks, Seefeh


The scaling correction factor tells how nonnormal the data are. The larger the scaling correction factor, the more nonnormal the data. It is not a fit statistic. 


I am trying to run a multilevel CFA where I have children nested in families; less than half of the families have more than 1 child. The purpose of the multilevel modeling is simply to account for dependency in the data, not to draw strong conclusions about factors which vary within and between. I have run the following using raw data: VARIABLE: NAMES ARE famid relrf1-relrf8 fmon1-fmon4 psinv1-psinv2; CLUSTER = famid ANALYSIS: TYPE=TWOLEVEL; ESTIMATOR=ML MODEL: %WITHIN% relatew by relrf1-relrf8; monitorw by fmon1-fmon4; involvew by psinv1-psinv2; %BETWEEN% relateb by relrf1-relrf8; monitorb by fmon1-fmon4; involveb by psinv1-psinv2; OUTPUT: STDYX modindices residual; TECH4; First, Mplus doesn't seem to be recognizing my families by famid because it says "Number of groups 1". Next, I'm getting the error that the correlations among many of my items are either 1.00 or 0.994. However, when I examine a correlation matrix, none of the items appear to be correlated greater than r=.49. I've attempted examining these data as a single-level model, in case the small number of clusters with more than 1 case was causing a problem, and I'm still getting the same correlation messages. Do you have any idea what might be causing this?


You need a semicolon after the CLUSTER option for it to be recognized. For the other problem, please send the relevant files and your license number to support@statmodel.com. 
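With the semicolons added, the relevant part of the posted input would read as follows. This is a sketch of the punctuation fix only; it does not address the correlation messages.

```
VARIABLE:
  NAMES ARE famid relrf1-relrf8 fmon1-fmon4 psinv1-psinv2;
  CLUSTER = famid;          ! semicolon was missing here
ANALYSIS:
  TYPE = TWOLEVEL;
  ESTIMATOR = ML;           ! and here
```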


Hello, I am trying to include in a SEM model an acquiescence style factor like the one proposed by Billiet and McClendon (2000). I can't figure out how to impose the constraints to measure the style factor. This is also complicated by the fact that the items use different scales and are coded in different directions (sometimes a large code represents agreement, sometimes disagreement). Thank you, Alex


We are not familiar with the Billiet and McClendon article. If you can briefly describe the model, we can try to help you impose the constraints using MODEL CONSTRAINT. 


The basic idea is to model acquiescence (i.e., responding positively regardless of the question content) using a latent variable. For this, at least two balanced sets of items are needed (i.e., for some questions answering positively represents positive attitudes while for others it represents negative attitudes). In the article the authors used LISREL8, and the figure presented shows only a "+1" for the relationships from the items to the acquiescence factor (for all the questions a higher score represented agreement, and they assumed the effect to be equal for all the questions). Considering that I have different scales (5-7-10 categories) that are ordered in different directions (agreement represented by the smallest code or by the highest), I was wondering if I can impose something like positive or negative relationships (eventually with the possibility of freeing the equality constraint) so I won't need to recode all the questions in the same direction. Thank you again, Alex


You might want to take a look at Confirmatory Factor Analysis for Applied Research by Timothy A. Brown. He uses Mplus for multitrait-multimethod models. You might get some ideas from his Mplus syntax that you can apply to your situation.

Sam Hawes posted on Wednesday, August 08, 2012  8:42 am



Hello, I've run a model attempting to identify a latent trait. The fit indices are good (CFI 1.00, TLI 1.00, RMSEA 0.00), but I wanted to check and see if anyone sees any problems in attempting to identify a latent trait with the following model setup. Thank you for your help. im by im1* im2 (1); el by el1* el2 (2); ca by ca1* ca2 (3); im-ca@1; psy by im el ca; im1 with el1 (4); im1 with ca1 (5); el1 with ca1 (6); im2 with el2 (4); im2 with ca2 (5); el2 with ca2 (6);


If Mplus doesn't complain about the model not being identified, it most likely is. It seems that it could be. 

Sam Hawes posted on Thursday, August 09, 2012  7:01 am



Thank you for your quick response. Would it be accurate to say that the latent variables in the model represent the trait-like stable aspects of the three constructs across the two timepoints? Thank you again.


Questions not specific to Mplus are best posted on a general discussion forum like SEMNET. 


I have a categorical (5-point Likert scale) single-item latent variable. I know I can treat the single-item latent variable directly as observed in Mplus, but in the CFA framework this would not allow me to assess the model fit of the one-factor vs. two-factor solution, which is what I am after. Therefore, my syntax would look like: F1 BY y1* y2 y3; F2 BY y4; [F1@0]; F1@1; F2@0; My questions are: 1.) I know that F2 as a single-item latent variable should be specified as F2 BY y4@1; y4@a; where a = (1 - reliability) * sample variance. However, is this also valid for categorical (Likert scale) variables? 2.) In case it is, I apologize for the lack of knowledge, but is there a way in Mplus to actually obtain the reliability and sample variance? And if there is, could you please provide me with the relevant syntax? 3.) The initial syntax works when I do not specify y4 as a categorical variable. When I do, the THETA parametrization is necessary. However, when I use the THETA parametrization, the model does not work. Is there a solution to this problem? Thank you very much in advance for your answer!


You cannot correct for reliability with categorical indicators. To put a factor behind a categorical variable, say f BY u@1; The variance of a categorical variable is not an estimated parameter in a cross-sectional study, so you can't fix it at zero.

Cindy Masaro posted on Wednesday, November 07, 2012  3:44 pm



Hi Linda, I am hoping you can help me. I am running 5 separate CFAs. Each CFA has one latent factor measured by three indicators. All indicators (in each CFA) have been measured on a 7-point scale. I have specified these indicators as categorical and used WLSMV as the estimator. For each of these CFAs I get an RMSEA=0.000, CFI=1.000, TLI=1.000. Standardized factor loadings are high with small standard errors. Residuals for covariances/correlations/residual correlations are all 0.000. The modification indices (ON statements for all indicators) all show an MI of 999.000 and an EPC of 0.000. I suspect something isn't quite right, so my question is: why am I getting 999.000 for the MI, and can I put any faith in the parameter estimates, fit indices, etc.?


A factor with three indicators is just-identified. The model has no degrees of freedom so there are no modification indices. There can be no modifications to the model.
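The counting behind this can be sketched for the simplest case of continuous indicators and a covariance structure, with the first loading fixed at 1 to set the metric; this parameterization is an assumption made for the illustration. With categorical indicators the same comparison of 3 correlations against 3 free parameters also gives zero degrees of freedom.

```latex
\begin{aligned}
\text{sample variances and covariances} &= \frac{p(p+1)}{2} = \frac{3 \cdot 4}{2} = 6,\\
\text{free parameters} &= \underbrace{2}_{\text{loadings}} + \underbrace{1}_{\text{factor variance}} + \underbrace{3}_{\text{residual variances}} = 6,\\
df &= 6 - 6 = 0 .
\end{aligned}
```

With four indicators the same count gives 10 - 8 = 2 degrees of freedom, which is why fit can be tested only from four indicators upward.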


Dear Linda, thank you very much for your prompt answer! However, I still have a few questions. If I understood you correctly, you are saying to run the syntax with y1-y4 categorical: F1 BY y1* y2 y3; F2 BY y4@1; [F1@0]; F1@1; When I do, I get the following warning: WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE F2. THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 51. Therefore, it does not work! 1.) Is there any solution to this? 2.) If I decide to treat the 5-point Likert scale variable y4 as continuous, so that F2 will be continuous, can I say F2 BY y4; y4@0; or do I have to set y4 at some number other than 0 (given that it is a 5-point Likert scale variable)? 3.) And if I do have to set it to (1 - reliability) * sample variance, how do I actually find out the "reliability" and "sample variance"? Thank you very much in advance for your answer!


Please send the full output and your license number to support@statmodel.com. 


Hello, Is there any way to save the standardized coefficients for the fixed parameters (e.g., loading to set the scale of a latent variable) and other linked indicators (residuals, R2). Thank you, Alex 


No, we save them only for free parameters. 


Hello, I am hoping that this is the correct place to post. I want to ask a question about a CFA study that has a sample of 3276 individuals, 191 items, 8 factors, and the WLSMV estimation method. When I run it, I get an error message that says the physical memory of the computer is not enough (i5 processor, 8GB RAM, 64 bit). Then I reduced the items to 131 and got the same error again. I did not try many combinations, but the program worked when I had 77 items. I have two questions now: 1) What is the highest number of items I could analyse at a time? How can I know it? 2) The criterion for reducing the items was a previously conducted EFA study. I simply referred to the size of the loadings: I excluded the items if they had a loading of .40 or lower. Is there a better way to make this decision? I mean statistically, without considering the content background of the instrument. I never see the whole picture since I cannot run everything at the same time. So forgive me, I am a naive user of Mplus and a learner of CFA. Thanks a lot in advance, regards, Mustafa Yildiz


Please send your input, data, and license number to support@statmodel.com and we will give it a try.

Lucy Hebert posted on Wednesday, February 13, 2013  1:57 pm



I have a 2 factor model with categorical indicators and solid sample size. I have 6 very distinct groups by sex and city and I am wanting to simply compare the loadings for my hypothesized CFA model between these 6 groups. Is it most appropriate to compare the unstandardized loadings or the standardized in this case? (All indicators have the same response options). Thank you in advance. 

Lucy Hebert posted on Wednesday, February 13, 2013  1:59 pm



And to clarify, I am not doing a multi-group comparison analysis, simply comparing between different strata in separate analyses. Thanks.


It is hard to make that sort of comparison because it is confounded by group variation in factor means and factor variances. So I wouldn't use either approach. The advantage of the multiplegroup analysis is that you put the factor on a common scale. 

ywang posted on Monday, March 04, 2013  10:02 am



Hello, We conducted a confirmatory factor analysis (one factor with three indicator variables) and linked the factor to another variable using the "with" command. The sample size is 54. The unstandardized coefficient is significant, but the standardized correlation coefficient is not significant. Is this inconsistency due to the small sample size? My other question is which coefficient to report in this situation. Should we report the standardized or the unstandardized coefficient? Thanks!


Raw and standardized significance can be different because the sampling distributions of the two coefficients are different. It would be your decision which coefficient to report. 

Lucy Hebert posted on Wednesday, March 06, 2013  1:40 pm



Unless I am mistaken, Tech10 provides estimates of standardized residuals, but if every indicator I am using has the exact same scale, then wouldn't the fitted (unstandardized) residuals be indicative of strain, and if so, what is an acceptable cutoff value for that? Also, if Tech10 is only used for mixture models, how does one get standardized residuals with a non-mixture model? Thanks in advance for your help!


For unstandardized residuals you don't get a statistical test so the cutoff is arbitrary. You can do a singleclass mixture analysis to get Tech10. 
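A single-class mixture setup of the kind mentioned might be sketched as below. The indicator names u1-u6, the factor f, and the use of numerical integration for categorical indicators are assumptions of this sketch, not details given in the reply.

```
VARIABLE:
  NAMES = u1-u6;            ! hypothetical categorical indicators
  CATEGORICAL = u1-u6;
  CLASSES = c(1);           ! one latent class, so no actual mixture
ANALYSIS:
  TYPE = MIXTURE;
  ALGORITHM = INTEGRATION;
MODEL:
  %OVERALL%
  f BY u1-u6;
OUTPUT:
  TECH10;
```

With a single class the model is the same factor model as before, estimated in the mixture framework so that TECH10 is available.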

John Nelson posted on Monday, March 18, 2013  11:39 am



I ran a CFA using MPlus 6.12. My sample had 186 parameters. I used a randomized sample of 2,500, drawn from a sample of 5,000. My model fit very well. I now want to test this model in a sample from another country but I have only 82 in my sample. I would like to run a partial model, even just the factor loadings of the 51 items, from 10 different factors. I have read through the discussion board and searched on the web for MPlus solutions but cannot find. Could you please tell me if this is possible using my version of MPlus? Thank you! 


You say that you have 10 factors. It isn't clear to me if you have 51 parameters (loadings) in the run with n=82. You don't say if your items are categorical or continuous. A sample size of n=82 is rather small unless your items are continuous or unidimensional. 


I'm not sure how you could do this. With a sample size of 82, you should have less than 82 parameters. 

John Nelson posted on Tuesday, March 19, 2013  7:19 pm



Thank you for this response! I do agree I should have fewer than 82 parameters. Thus, I am seeking to minimize my parameters. My plan was to set all error variances to 1; the command, as I understand it, is f1@1; f2@1; etc. That should remove 50 parameters. I then plan to remove all the correlations/covariances between factors, as I am interested in confirming the CFA from a previous sample from the USA and not in how correlated the factors are. My plan was to 1. Set factor variances to 1: f1@1; f2@1; etc. 2. Set residual variances to 0: Q1MD-Q58EL@0; 3. Eliminate certain correlations between factors: f1 with f5-f9@0; this allows the correlations between f1 and f2, f3, f4 but eliminates the rest. All of the items in my measures/subscales use 7-point Likert scales. I would be grateful for your view on whether my plan is sound or flawed. If it is sound, I would like to know where in the Mplus documents I can find the commands to use in the syntax so I can execute this analysis. I also cannot find guidance on how to minimize the number of parameters in a small sample.


I don't think fixing parameters is a good idea. You need a different model for the small group. 

John Nelson posted on Saturday, March 23, 2013  1:15 pm



I have done extensive research on this model in 10 different facilities using large samples, and the model held up well using CFA. Those studies were in the USA. This study was conducted in the Caribbean, so I would like to stick with this model to see if it applies in other countries. I did try setting the variances to 1 and so on as I stated above, but it did not help. I see that parceling my factors makes sense. However, after I defined how the indicators should be parceled and ran the model, I received the following error:
*** ERROR in MODEL command
Unknown variable(s) in a BY statement: S1
I have a 2-factor model with 4 parcel scores in the first factor and 4 parcel scores in the second factor. I have checked and rechecked but cannot see what is wrong. Any ideas? Thanks!


Please send the output and your license number to support@statmodel.com. 


Hi, I'm trying to confirm the structure of a questionnaire. This questionnaire has been used in four samples with n > 2000 in each sample. For most scales, a four-point answering scheme has been used for the single items. I use scale scores as indicators. One sample differs from the others in that a five-point answering scheme has been used for two scales and a seven-point answering scheme for one scale. I've been wondering whether this might lead to problems when Mplus estimates the models. Does Mplus use correlations when estimating the models? Thanks for your help!


Are you treating the factor indicators as continuous or categorical? 


Dear Linda, the factor indicators are mean scores over up to 8 items. We have therefore decided to treat them as continuous. Thanks for your help.


Then there should be no problem in estimating the model. I would not compare the items across groups that did not have the same answering scheme. 


Dear Linda, thanks for your reply; that's reassuring! Does Mplus use correlations when the factor indicators are treated as continuous? Thanks again!


The sample statistics used for model estimation for continuous variables and CFA are means, variances, and covariances. 


Hi, I am getting the error message below when I try to run a CFA:
*** ERROR Invalid symbol in data file: "ï»¿5" at record #: 1, field #: 1
I have tried saving the file as a txt file, manually checking the data file, and altering the export from SPSS to specify the field size rather than use tab-delimited output, and I continue to get variants of this error. I am not sure what to try next.


Open the file in the Mplus Editor. The symbol is the first entry in the data file. Delete it and save the file. This seems to be related to a new version of SPSS. 
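For readers who prefer to clean the file outside the Mplus Editor: the characters "ï»¿" are what the three bytes of a UTF-8 byte-order mark look like when read as Latin-1, so the stray symbol can also be removed programmatically. The following is only a minimal sketch, assuming the data file is otherwise plain ASCII; the function names and the file paths you would pass to `clean_file` are hypothetical.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 byte-order mark, if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def clean_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, dropping a leading BOM (paths are placeholders)."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_utf8_bom(data))


# In-memory illustration: a first record "5 3 4" preceded by the BOM bytes.
example = codecs.BOM_UTF8 + b"5 3 4\n"
cleaned = strip_utf8_bom(example)
```

Working on raw bytes avoids guessing the encoding of the rest of the file; a file without the mark is returned unchanged.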


This worked, Thanks Linda. 


I just read the version 7.1 addendum to the manual and the invariance testing setups sound so much simpler than in version 7. Thank you! 


Hi, I ran a CFA with the MLM estimator and I don't know the command to get the p-values for the correlation matrix. Thank you for your help.


Which correlation matrix do you mean? The one from SAMPSTAT or from TECH4?


From SAMPSTAT. But I can't find them in TECH4, either. Thank you!


We don't provide them with SAMPSTAT. With TYPE=BASIC we give standard errors for correlations for categorical variables. We also give standard errors for TECH4 in most cases. You would need to compute the ratio of the estimate to the standard error to get a z-value and get the p-value from that.
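The calculation described above can be sketched as follows. This is a minimal illustration, assuming a two-sided test against a standard normal reference distribution; the function name and the numbers are made up, not taken from any output.

```python
import math
from typing import Tuple


def z_and_p(estimate: float, std_error: float) -> Tuple[float, float]:
    """Return the z-value (estimate / SE) and its two-sided normal p-value."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical correlation of 0.30 with a standard error of 0.10.
z, p = z_and_p(0.30, 0.10)
```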

Scott Smith posted on Thursday, October 10, 2013  10:54 am



I am running a two-level CFA with three composites. Two of the composites have three items each. One composite has only two items. After the initial run, one of the items in the two-item composite had an STDYX estimate above 1 at the within level. I set it to @.001 and reran the model. Now the other item in the two-item composite has an STDYX estimate above 1. Will I get valid results if I set both items in a two-item composite to @.001, in conjunction with two three-item composites?


Standardized factor loadings can be greater than one. There is a FAQ about this on the website. 


I am running a bifactor model CFA, and I am interested in the estimated common variance accounted for by the general factor. Mplus does not give this value by default. How can I calculate/obtain this value? 


Perhaps you mean the amount of variance in all the indicators explained by the general factor. If the general factor variance is set at 1, you sum the squared factor loadings and divide them by the sum of the indicator variances. 
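As a worked illustration of the ratio just described, here is a small sketch. It assumes the general factor variance is fixed at 1 and that the loadings are the unstandardized loadings on the general factor; the function name and all numbers are invented for the example.

```python
def general_factor_variance_share(general_loadings, indicator_variances):
    """Sum of squared general-factor loadings divided by the sum of indicator variances."""
    if len(general_loadings) != len(indicator_variances):
        raise ValueError("need one loading and one variance per indicator")
    explained = sum(loading ** 2 for loading in general_loadings)
    total = sum(indicator_variances)
    return explained / total


# Three hypothetical indicators: (0.36 + 0.49 + 0.25) / (1.0 + 1.2 + 0.9)
share = general_factor_variance_share([0.6, 0.7, 0.5], [1.0, 1.2, 0.9])
```

With these made-up values the general factor accounts for roughly 35% of the total indicator variance.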


I am conducting a CFA to test for measurement invariance for whites and blacks on a scale. Here are my models:

Constrained model:
VARIABLE: NAMES ARE RACE ACS1-ACS10;
USEVARIABLES ARE RACE ACS1-ACS10;
CATEGORICAL ARE ACS1-ACS10;
MISSING ARE ALL (999);
GROUPING IS RACE (1=black 0=white);
MODEL: pos BY ACS1 ACS2 ACS3 ACS4 ACS5;
neg BY ACS6 ACS7 ACS8 ACS9 ACS10;
OUTPUT: STDYX MODINDICES;

Unconstrained model:
VARIABLE: NAMES ARE RACE ACS1-ACS10;
USEVARIABLES ARE RACE ACS1-ACS10;
CATEGORICAL ARE ACS1-ACS10;
MISSING ARE ALL (999);
GROUPING IS RACE (1=black 0=white);
MODEL: pos BY ACS1* ACS2 ACS3 ACS4 ACS5;
neg BY ACS6* ACS7 ACS8 ACS9 ACS10;
pos@1 neg@1;
MODEL white: pos BY ACS1* ACS2 ACS3 ACS4 ACS5;
neg BY ACS6* ACS7 ACS8 ACS9 ACS10;
OUTPUT: STDYX MODINDICES;

I am unsure what Mplus is constraining to be equal across groups (e.g., factor loadings, intercepts, error variances) as a default in the first model. Also, is there a way to individually constrain factor loadings, intercepts, and error variances to be equal across race so that I can conduct tests for weak, strong, and strict invariance?


See the discussion in Chapter 14 on multiple group analysis. This should answer your questions. For the models to test for measurement invariance, see the Topic 1 course handout under multiple group analysis. See also the Version 7.1 Language Addendum on the website with the user's guide where a new feature that automatically tests for measurement invariance is described. 


I have 25 different variables and only 2 of them give a good fit. For all the rest, the chi-square is significant and the RMSEA is high. However, for many variables the CFI and TLI are close to 1. Can I assume that the model fits the data because of the CFI and TLI values and ignore the chi-square?


No. 

Sarah Hafidz posted on Wednesday, January 22, 2014  7:28 pm



Hi, I ran a CFA model with 75 indicators loading on 12 latent variables. The fit indices showed that the model fit is good. However, it also came up with a warning as follows:
THE MODEL ESTIMATION TERMINATED NORMALLY
WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE FACTOR4.
Does this essentially mean that I can't accept the model and use it for other analyses? Thanks


This message cannot be ignored. It means you should change your model. Perhaps looking at an EFA will reveal the problem. 

SY Khan posted on Wednesday, March 12, 2014  5:40 am



Hi Dr. Muthen, Is there a way in Mplus to save the actual covariance matrix that the program uses rather than an abbreviated/estimated version saved through the TECH3 option? Is there some way that the actual covariance matrix used in a negative residual variance case in CFA can be saved? If yes, can you please guide me to the syntax to get that? Many thanks for your guidance.


TECH3 is the matrix of variances and covariances among the parameters. See the SAMPLE option of the SAVEDATA command to save the observed variable covariance matrix. See the RESIDUAL option for the model estimated covariance matrix. 

SY Khan posted on Thursday, March 13, 2014  3:30 am



Thanks for your guidance on the above Dr. Muthen. I have been able to get the RESIDUAL option working. However, since my data are categorical the SAMPLE SAVEDATA command is giving the correlation matrix as default. I have tried using the TYPE option along with the SAMPLE option, but it is still not giving the covariance matrix. SAVEDATA: TYPE=COVARIANCE; SAMPLE=COVSAMPLE.dat; Is it possible to obtain a covariance matrix for categorical dependent variables at all? Many thanks for your time and guidance in advance. 


The covariance matrix is not the matrix analyzed for categorical variables. If you want a covariance matrix for these variables, remove the CATEGORICAL option. 

Tom Bailey posted on Sunday, April 06, 2014  8:56 am



Dear Linda, I was hoping you (or someone else on the board) might be so kind as to answer a question about the items on one of the latent factors in a CFA model using the WLSMV estimator in Mplus. When I run an EFA in SPSS or Mplus, or a CFA in AMOS, all the item loadings on my latent variable are positive (with the exception of one). However, when I run the CFA model in Mplus, the loadings on this variable are all negative (again with the exception of one). Is there a rational explanation for this, or is it something that I am doing wrong when specifying the model? Regards, Tom


In factor analysis, this reversal can happen. You can just reverse all signs if you like to interpret the factor that way. 

Tom Bailey posted on Monday, April 07, 2014  5:04 pm



Thanks Linda Actually, once I make modifications to my measurement model, the factor loadings then revert back to the direction they were in the EFA in Mplus and the ML CFA in AMOS; so it is no longer an issue. Tom 


Hi Linda: I just completed two-level CFAs involving ordinal variables using Bayesian estimation. I get PSR values, but not PPP values. Can you advise how I can obtain PPP values for these runs? Thank you. Richard


PPP has not been developed for multilevel models. It is not available. 

Nara Jang posted on Monday, April 28, 2014  12:30 pm



Dear Dr. Muthen, I got the following warning after running a CFA. Would you tell me how I can solve this problem? Thank you very much for your great help!
WARNING: THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR AN OBSERVED VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO OBSERVED VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO OBSERVED VARIABLES. CHECK THE RESULTS SECTION FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE I16CAT.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 7.

Nara Jang posted on Monday, April 28, 2014  1:27 pm



Dear Dr. Muthen, This is a follow-up question. I found that the theta value of the numhsl variable in the parameter specification showed "8" while the other variables had "0". So I removed "numhsl". The model fit indices are as follows:

Chi-Square Test of Model Fit
Value 0.000*
Degrees of Freedom 0
P-Value 0.0000
RMSEA (Root Mean Square Error Of Approximation)
Estimate 0.000
90 Percent C.I. 0.000 0.000
Probability RMSEA <= .05 0.000
CFI/TLI
CFI 1.000
TLI 1.000
WRMR (Weighted Root Mean Square Residual)
Value 0.000

Would you tell me if it is correct to remove the numhsl variable? And is it OK to conclude that the model is good based on the CFI/TLI values? I tried removing i16cat instead, but the result indicated a negative variance/residual variance for the numhsl variable. So I kept "i16cat" and deleted "numhsl", which gave the results shown above. Thank you so much for your expert explanation in advance!


Regarding your first question, I would need to see the output and your license number. The fit of a model with zero degrees of freedom cannot be assessed. 
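To see why a model can end up with zero degrees of freedom, it helps to count sample moments against free parameters. The sketch below is a simplified illustration that assumes continuous indicators and counts only variances and covariances (means are left out); with categorical indicators the count of sample statistics differs. The function name is hypothetical.

```python
def cfa_degrees_of_freedom(n_indicators: int, n_free_parameters: int) -> int:
    """Unique variances/covariances among the indicators minus free parameters."""
    n_moments = n_indicators * (n_indicators + 1) // 2
    return n_moments - n_free_parameters


# One factor, three indicators, first loading fixed at 1:
# 2 free loadings + 1 factor variance + 3 residual variances = 6 parameters,
# against 3 * 4 / 2 = 6 sample moments.
df_three = cfa_degrees_of_freedom(3, 6)

# One factor, four indicators: 3 loadings + 1 factor variance + 4 residuals = 8,
# against 4 * 5 / 2 = 10 sample moments.
df_four = cfa_degrees_of_freedom(4, 8)
```

With zero degrees of freedom the model reproduces the sample statistics exactly, so a chi-square of 0 and a CFI/TLI of 1 carry no information about fit.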

Nara Jang posted on Saturday, May 03, 2014  10:15 pm



Dear Muthen, I conducted a CFA with a random half of my dataset. I would like to use weighting, because women and older people were oversampled. So I downloaded data for the 5 zip code areas (the surveyed areas) from the U.S. census website. Would you tell me whether I should apply the weights only to the CFA half or to the total sample? The first random half sample was used for EFA. Thank you very much!

Nara Jang posted on Sunday, May 04, 2014  5:48 am



Dear Dr. Muthen, The weighting method I posted earlier is for regression/logistic regression. Would you mind explaining weighting methods for CFA or recommend any reference regarding weighting methods for CFA? Thank you very much for your expert explanation in advance! Have a great day! 


You should use the weight variable that comes with the data set. See the Special Topic on complex survey data on the website. 

Nara Jang posted on Sunday, May 04, 2014  12:13 pm



Dear Dr. Muthen, Thank you so much for your information! 

f f posted on Tuesday, May 06, 2014  7:15 pm



Dear Prof.Muthen, I'd like to know whether I can scratch variables with absolute value less than .30 when I am using Mplus for CFA. In SPSS, there is this option. 


Mplus has no such option. 


To test the factor structure of a screening tool for mental health problems (yes or no items; 7 scales but no total score; some items appear on two scales), I ran a CFA (WLSMV):

ADU by M10 M19 M23 M24 M33 M37 M40 M45;
AI by M2 M6 M7 M8 M13 M35 M39 M42 M44;
DA by M3 M14 M17 M21 M34 M35 M41 M47 M51;
SC by M27 M28 M29 M30 M31 M43;
SI by M11 M16 M18 M22 M47;
TD by M9 M20 M25 M26 M32;
TE by M46 M48 M49 M51 M52;

I then checked whether the MODINDICES suggest that correlations between factors are needed. The model estimation terminated normally but with warnings (non-positive definite). Removing the items that caused trouble did not solve the problem. The model may be too complex to test in 1600 boys. Yet I wonder if it makes sense to just perform seven separate CFAs (e.g., one input file testing ADU by M10 M19 M23 M24 M33 M37 M40 M45; then a next input file testing AI by M2 M6 M7 M8 M13 M35 M39 M42 M44; etcetera). After all, conceptually, this tool was designed to assess seven separate constructs that are not supposed to load on a higher-order factor. This would avoid having scales in one and the same model that include the same items, and having scales that are related to each other. So, do you think this strategy makes sense? I would really appreciate your input, as I have had difficulty finding discussion or examples related to my question. Cheers, Olivier


This general modeling question is more appropriate for a general discussion forum like SEMNET. 

Hyojeong Seo posted on Friday, September 12, 2014  10:27 am



Hello Dr. Muthen, I am wondering how I can obtain the chi-square value. I ran a simple CFA and the chi-square is shown as ***********. What do I need to do to see actual numbers in the output? Thank you for your help in advance!


Sounds like you either have a huge sample or a very ill-fitting model, or both, so the value is too big to print. Your RMSEA and CFI results are probably also poor.


Hello Dr. Muthen, Thank you for your response. Yes, I do have a large sample (n = 134,984). Is there anything I can do in the syntax to see the chi-square? Thank you.


Please send the output and your data to support with your license number. 


Hi, I calculated factor scores using CFA and I need to export the data into STATA for further analysis. Can you please indicate how to do that? Below is part of my input file. Thank you! Maria

Data: File is C:\Users\Mcarrasc\Documents\Maria\Latent.dat ;
MODEL: cohesion BY CSPC_01* CSPC_02 CSPC_03 CSPC_05 CSPC_06 CSPC_07 CSPC_08 CSPC_10 CSPC_10A;
cohesion@1;
CSPC_03 WITH CSPC_02;
CSPC_02 WITH CSPC_01;
CSPC_06 WITH CSPC_07;
OUTPUT: sampstat tech1 stdyx modindices (all);
SAVEDATA: File is C:\Users\Mcarrasc\Documents\Maria\LCvul.dat;
Save = FSCORES;


The factor scores will be in the file lcvul.dat. This is an ASCII file. You will need to read that using STATA. See the STATA user's guide to see how to do this. 


I have a second-order CFA model, and I would like to get a histogram of the distribution of estimated factor scores. I succeeded in getting that with the PLOT3 option. However, in terms of layout I would prefer a frequency table over the histogram, so that I can build the figure in Excel in the format of the journal. Is there a command to give me the estimated factor scores in a table?


No, there is no such option. You can save the factor scores and create the table using another software. 
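As one example of building the table in other software, the sketch below bins a list of factor scores into equal-width intervals. It is only an illustration: the scores are made up, the function name is hypothetical, and reading the scores out of the saved file is left out because the column layout depends on the particular run.

```python
import math
from collections import Counter


def frequency_table(scores, bin_width):
    """Return (lower, upper, count) rows for equal-width bins [lower, upper)."""
    counts = Counter(math.floor(score / bin_width) for score in scores)
    return [
        (index * bin_width, (index + 1) * bin_width, counts[index])
        for index in sorted(counts)
    ]


# Six hypothetical factor scores, binned in steps of 0.5.
table = frequency_table([-1.2, -0.4, -0.1, 0.3, 0.4, 1.1], 0.5)
for lower, upper, count in table:
    print(f"{lower:6.2f} to {upper:6.2f}: {count}")
```

Empty bins are omitted from the result; add them back if the journal format needs every interval listed.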

Djangou C posted on Friday, January 09, 2015  7:24 pm



Hi I am doing a simulation study with ESTIMATOR=BAYES. And I am interested in the median, mean and the mode for point estimate. The default in Mplus is the median. Is there a way to get the same stat for mean and mode in simulation studies? Thank you. 


Use the POINT= option in the Analysis command. 

Djangou C posted on Sunday, January 11, 2015  1:08 am



Thank you. 

Fatih Koca posted on Monday, January 26, 2015  10:14 am



Hi, I need help with the following. How can I modify this code to use the effects-coding method of identification and introduce phantom constructs for each of the lower-order constructs, to convert the variances and covariances into standard deviations and correlations?

Grouping is SchLev(1=ELEM, 2=MIDDLE, 3=HIGH) ;
idvariable=ID;
Missing are all (99, 777);
Auxiliary = (m) auxvar1 auxvar2 auxvar3 auxvar4 auxvar5 auxvar6 auxvar7 auxvar8 auxvar9 auxvar10 auxvar11 auxvar12 auxvar13 auxvar14 auxvar15 auxvar16 auxvar17 auxvar18 auxvar19 Auxvar20 Auxvar21 ;
model: f1 by IASTH1* IASTH2 IASTH3;
f1@1 ;
F2 BY TSGR1* TSGR2 TSGR3;
F2@1 ;
Model Constraint: f1=2f2;
output: standardized;


I don't understand what you want to do. Is there a reference that you are going by? 

Fatih Koca posted on Tuesday, January 27, 2015  8:01 am



Dr. Muthen, The script is above. What I want is to use the effects-coding method of identification and introduce phantom constructs for each of the lower-order constructs, to convert the variances and covariances into standard deviations and correlations. However, I really could not figure out how to do it.


I don't know what you mean by lower-order constructs in this context. The relationship between f1 and f2 is a correlation. Perhaps you want to ask this question on a general discussion list like SEMNET.

lee posted on Thursday, January 29, 2015  8:47 am



Hi, I am handling a single item measure. I have checked the slide 44 in topic 1 but still not sure how to calculate the reliability. How can I get sample variance, psi and reliability? Thank you 


The sample variance is obtained using Type=Basic. The reliability you have to provide yourself, using prior information of some sort. Psi is estimated.
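A common way to combine these pieces for a single-indicator factor is to fix the indicator's residual variance at (1 - reliability) times its sample variance and let the factor variance (psi) be estimated. The sketch below shows only that arithmetic; it assumes this is the approach the slide describes, and the function name and numbers are hypothetical.

```python
def single_indicator_residual_variance(sample_variance: float, reliability: float) -> float:
    """Residual variance to fix for one indicator, given an assumed reliability."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return (1.0 - reliability) * sample_variance


# Hypothetical indicator with sample variance 2.5 and assumed reliability 0.80.
theta = single_indicator_residual_variance(2.5, 0.80)
```

Here the residual variance would be fixed at about 0.5, leaving the remaining variance to the factor.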


Dear Prof. Muthen, I am running CFA for my path analysis. However, I received an error message as below: "WARNING: THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR AN OBSERVED VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO OBSERVED VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO OBSERVED VARIABLES. CHECK THE RESULTS SECTION FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE ATT1." I checked the ATT1 variable. It has negative residual variance. It does not look How can I fix this problem? Thanks 


If the negative value isn't large, I would fix it to zero. If it is large, it may indicate an important model misspecification. 

Cecily Na posted on Monday, May 18, 2015  10:37 am



Hi I want to do a weighted CFA and weighted SEM. Do I just add a line for weight variable "weight= "? Thanks. 

Cecily Na posted on Monday, May 18, 2015  2:15 pm



I have a follow-up question. When there are missing data and I need to normalize the weights, how can I do it in Mplus?


Yes, you would add a line for the weight variable, "weight= ". The normalization is done automatically for you.

JianBin Li posted on Wednesday, May 27, 2015  7:48 am



Hi, I am using Mplus 7.0 to construct a CFA model. The data are 25 observed ordinal variables. The model consists of 5 first-order factors and an additional factor that several items are cross-loaded on. Following is my input:

variable: NAMES ARE nation gender period sdq1-sdq25 sdqm1-sdqm25 sdqp1-sdqp25;
USEV ARE sdqm1-sdqm25;
categorical are sdqm1-sdqm25;
model: emo by sdqm3 sdqm8 sdqm13 sdqm16 sdqm24;
con by sdqm5 sdqm7 sdqm12 sdqm18 sdqm22;
hyp by sdqm2 sdqm10 sdqm15 sdqm21 sdqm25;
peer by sdqm6 sdqm11 sdqm14 sdqm19 sdqm23;
pro by sdqm1 sdqm4 sdqm9 sdqm17 sdqm20;
method by sdqm1 sdqm4 sdqm9 sdqm17 sdqm20 sdqm21 sdqm25 sdqm7 sdqm11 sdqm14;
!method is not correlated with other latent factors
method with emo@0;
method with con@0;
method with hyp@0;
method with peer@0;
method with pro@0;
method@1;
output: standardized modindices(3.84) sampstat tech4

Question: I cannot get the standardized solution of the "pro" factor. In the output, loadings of its five items are zero. How can I fix this problem? Thank you in advance.


This can be answered only by looking at your full output  please send to support along with your license number. 


Dear Prof. Muthen, I am using Mplus 6.1 to construct a CFA model. The data are 12 observed ordinal variables. The model consists of 3 first-order factors. Because the variables are ordinal, I am not sure whether I should use Maximum Likelihood (ML) estimation or, as proposed by DiStefano and Morgan (2014), Weighted Least Squares Mean and Variance adjusted (WLSMV) estimation. Which one do you prefer, and do you know a reference to support this decision? Thank you very much.


For the choice of estimators, see our FAQ "Estimator choices with categorical outcomes". You should update to version 7.31.


Dear Prof. Muthen, Thank you very much for your prompt reply. ML seems to be the best choice. After running a CFA and looking at the modification indices, I want to add an additional parameter between two items within one factor in order to improve the model fit. In the user's guide I cannot find the proper command to do so. Could you please help me with this? Thank you very much.


y1 WITH y2; But if you use ML with categorical items, this won't work unless you say Parameterization = Rescov; I think you have to use Type = Mixture for this, and you would say classes = c(1), etc. See the Mplus Version 7.2 Language Addendum.


What estimator should be used in the case of a second-order factor analysis model where all the observed variables are binary? Also, is it possible to weight the data (which are household survey data), or should they be weighted prior to inputting them to Mplus? When trying to weight the data, I receive the error message "Categorical variable SL6 contains non-integer values". SL6 is the last variable listed just before the weight variable.


Q1. WLSMV, ML, or Bayes. Q2. Yes. Send problematic output to Support. 

Ejlis posted on Saturday, June 13, 2015  1:06 pm



Hi! I have run a CFA with three correlated factors (WLSMV estimation). Why is it that when I run the same model as a hierarchical model, it produces exactly the same fit results as the correlated-factors model? How, then, should I choose between the two?


You can't choose on statistical grounds since they produce the same correlation matrix. They are just two ways of looking at the same thing. We have this situation often, for instance with EFA and correlated vs uncorrelated factors. Go with whichever alternative is most useful to you.


Hi, Is it possible to set minimum and maximum item loadings in CFA using Mplus? For instance, I would want one item to load with a .7 or higher on a latent factor and another to load with a .5 or lower. I am trying to simulate several models for fit comparison. Thank you very much. 


You can do this using MODEL CONSTRAINT. See the user's guide for further information. 

Yanxia WANG posted on Tuesday, August 11, 2015  1:32 am



Hi Professor Muthen, I tried to do a CFA using Mplus; however, the output keeps giving the warning "unexpected end of file reached in data file". I checked the related responses on the board, then checked the number of variables in the NAMES list and found that it was the same as the number of columns in the data set. I really do not know how to deal with this issue; please help me to figure it out. Thanks so much! YX


It sounds like you have blanks in the data set and are reading it free format where blanks are not allowed. If you can't see the problem, send the output, data set, and your license number to support@statmodel.com. 

Yanxia WANG posted on Tuesday, August 11, 2015  11:38 pm



Thanks Muthen, I have already fixed it by dealing with the missing values in the data file. Thanks again for your reply. YX

Robert Buch posted on Monday, October 19, 2015  12:16 am



Dear Dr. Muthen, when using:

Analysis: Type = COMPLEX;
MODEL=NOMEANSTRUCTURE;
ESTIMATOR = wlsmv;

I obtain the SRMR fit, but when adding:

CATEGORICAL =....

I no longer obtain the SRMR. Is there a way to obtain it when using the "categorical=" option? From reading an earlier post I thought "MODEL=NOMEANSTRUCTURE;" would do the trick, but it seems this does not help.


With CATEGORICAL outcomes, SRMR is available only when there are no thresholds and no covariates. If this is your situation and you don't get SRMR, please send the output and your license number to support@statmodel.com. 


(I accidentally posted this in the "mean structures" thread, but I cannot figure out how to delete that comment; apologies.) Greetings, I am using Mplus to test the theorized factor structure of a 6-item unidimensional measure. The items are rated on a 1-5 scale with "strongly disagree" to "strongly agree" anchors. My question is this: is there a minimum percentage of my cases that need to select a given answer option (e.g., "1: Strongly Disagree") in order to assume that my data are continuous? For example, say only 2% of my cases indicate "1: Strongly Disagree" for item 1. Is this a problem? What if only 2% of my cases indicate "1: Strongly Disagree" for the measure as a whole? I have heard that there is a rough rule around 5%, but I have yet to find a citation for it. That is, I have heard that if less than 5% of my cases indicate a response (e.g., "1: Strongly Disagree"), then A) I can no longer assume my data are continuous, and B) I should combine that response with another (e.g., combine "1: Strongly Disagree" with "2: Disagree" to create a "Disagree" category). I would appreciate any input on this. Thank you.


If you have floor or ceiling effects, you should treat the variable as categorical. If you don't, you can treat it as continuous. If you have small frequencies, you can collapse categories. 
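Checking category frequencies and collapsing a sparse category can be done before the data are read into Mplus. The sketch below is a minimal illustration with invented responses; the function names are hypothetical, and the choice to merge category 1 into category 2 is only an example, not a recommended cutoff.

```python
from collections import Counter


def category_proportions(responses):
    """Proportion of cases in each observed response category."""
    counts = Counter(responses)
    n = len(responses)
    return {category: counts[category] / n for category in sorted(counts)}


def collapse(responses, mapping):
    """Recode responses using mapping; categories not in mapping are unchanged."""
    return [mapping.get(response, response) for response in responses]


# 100 hypothetical answers to one item on a 1-5 scale; only 2% chose category 1.
item = [1] * 2 + [2] * 18 + [3] * 30 + [4] * 30 + [5] * 20
before = category_proportions(item)

# Merge "1: Strongly Disagree" into "2: Disagree".
after = category_proportions(collapse(item, {1: 2}))
```

After the recode the item has four categories, with the lowest one holding 20% of the cases.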


I am working with a model that is well established in the literature, showing good model fit in several cross-cultural studies. I'm trying to test this model (18 observed variables split equally across 6 latent variables) with my data (22,000 subjects from 20 countries), but the model fit indices are not satisfactory, and I'm not sure whether I am using the correct approach to analyze it. One of the differences between my data and the data on which the model has been tested is that I have greater variability in age, and I'm not sure whether this makes for a heterogeneous population and consequently influences the model fit (for the construct that I'm working with, variations in the scores over the life span are expected). I have tested the model with robust estimators for non-normal samples (MLR, WLS), and I have also tested the model using age as a covariate (MIMIC). However, I did not get any improvement in the model fit. I wonder if it would be reasonable to try CFA mixture modeling in this case. I appreciate any comments.


It sounds like you are using a model that was validated on a sample from a different population. I am not sure mixture modeling would help here because the population it was validated on was not unobserved. Try an EFA to see if the CFA you are using is close to what the data show.
