I need to do a confirmatory factor analysis on nested data (days within individuals). Can Mplus do this for me? Is it possible to do such a factor analysis with different numbers of data points from different individuals?
Yes, that can be done in several ways in Mplus if your outcomes can be viewed as continuous. First, you can do it as multilevel factor analysis, see
Muthén, B. (1994). Multilevel covariance structure analysis. In J. Hox & I. Kreft (eds.), Multilevel Modeling, a special issue of Sociological Methods & Research, 22, 376-398.
Second, you can treat the days within individuals as multivariate outcomes in a single-level model, just as is done in growth modeling.
The issue of different number of days within individuals is not a problem in either approach. In the first approach, this means different cluster sizes. In the second approach this means missing data.
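To make the second approach concrete, here is a minimal sketch of turning long-format diary data (one row per person-day) into one wide row per person, where unreported days simply become missing values. The function name `long_to_wide`, the record layout, and the example values are illustrative assumptions, not anything Mplus requires.

```python
from collections import defaultdict

def long_to_wide(records, n_days):
    """Turn (person, day, value) rows into one row per person with n_days slots.

    Days a person did not report stay as None, i.e. missing data.
    """
    wide = defaultdict(lambda: [None] * n_days)
    for person, day, value in records:
        wide[person][day - 1] = value  # days are assumed to be numbered from 1
    return dict(wide)

# Hypothetical diary data: person "a" reports 3 days, person "b" only 2.
records = [("a", 1, 4.0), ("a", 2, 3.5), ("a", 3, 5.0),
           ("b", 1, 2.0), ("b", 3, 2.5)]
wide = long_to_wide(records, n_days=3)

# In the multilevel view the same data give unequal cluster sizes.
cluster_sizes = {p: sum(v is not None for v in row) for p, row in wide.items()}
```

The same records therefore support both views: unequal cluster sizes in the multilevel approach, or a wide file with missing values in the single-level approach.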
Anonymous posted on Wednesday, May 31, 2000 - 11:37 am
You mention on the website that with Mplus a MIMIC model can include indirect effects, that is, the effect of a background variable on a factor indicator via the factor. Is this illustrated in the Mplus manual? If not, is it a straightforward procedure? Many thanks.
There are examples of MIMIC models in Chapter 21 of the Mplus User's Guide. Indirect effect parameters are not automatically calculated in Mplus. You would have to multiply the factor loading by the regression coefficient to obtain the parameter estimate. The standard error of the estimate would have to be obtained using the Delta method.
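The hand calculation described above can be sketched as follows. This is a first-order Delta-method approximation for the product of two estimates; the function name and all numeric inputs are hypothetical, and the variances and covariance of the two estimates would have to be taken from the estimated parameter covariance matrix.

```python
import math

def indirect_effect(loading, slope, var_loading, var_slope, cov=0.0):
    """Product of two estimates and its first-order Delta-method standard error.

    Var(a*b) is approximated by b^2 Var(a) + a^2 Var(b) + 2ab Cov(a, b).
    """
    estimate = loading * slope
    variance = (slope ** 2) * var_loading \
        + (loading ** 2) * var_slope \
        + 2.0 * loading * slope * cov
    return estimate, math.sqrt(variance)

# Hypothetical estimates: loading 0.8 (variance 0.01), slope 0.5 (variance 0.04),
# with the covariance between the two estimates assumed to be zero.
est, se = indirect_effect(loading=0.8, slope=0.5,
                          var_loading=0.01, var_slope=0.04)
```

If the two estimates are correlated, the `cov` argument should not be left at zero.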
melady posted on Sunday, August 13, 2000 - 9:01 am
I have done a factor analysis with disaggregated data using different cluster sizes. What I would like to do now is see whether the factors hold across clusters. I am thinking I could compare the goodness of fit for a confirmatory analysis using all the data without regard for clustering to the goodness of fit for a confirmatory factor analysis using the cluster as a part of the model. Is this reasonable?
Anonymous posted on Wednesday, February 21, 2001 - 7:59 am
I am doing CFA on a model with three latent variables with four or five parameters each. Nevertheless, my model is not identified. I have fixed the starting value of each parameter to 1 and correlated the three latent variables. There is prior research using this scale which suggests that at least two of the factors are highly correlated. Would fixing the correlation between factors help identify the model? Are there other reasonable options for overcoming the model identification problem? Thanks for your time.
If you have 12 to 15 factor indicators which I think is what you are saying, a three factor model should be identified. Fixing all of the starting values to one is not a solution. Have you perhaps freed the factor means? They must be fixed at zero (the Mplus default) for the model to be identified. Otherwise, I would need to see your output to fully understand your problem.
melady posted on Tuesday, March 06, 2001 - 2:34 pm
I wish to conduct a factor analysis with data on specific individuals provided by multiple informants. How can Mplus help me to do this? I have the program, but I don't fully understand what I'm doing with it!
I would like to use Mplus to estimate a non-linear relationship among latent variables [interaction]. Joreskog and Yang (1996) demonstrated such a model can be estimated using SEM if an observed product variable is used as an indicator of the latent product variable. Bollen (1995) used a two-stage least squares with instrumental variables to estimate the interaction. How can I use Mplus to estimate an interaction model?
Mplus cannot do what Joreskog and Yang demonstrated because it requires non-linear constraints. The Bollen approach can be done in Mplus but it is not directly implemented. It would have to be done in a series of steps.
Anonymous posted on Thursday, May 10, 2001 - 3:57 pm
I read in the Mplus 2.0 manual (page 346, bottom) that: "When x variables are present, the conditional normality assumption allows non-normality for y* as a function of non-normal x variables".
I also note that in his chapter "Some Uses of Structural Equation Modeling in Validity Studies: Extending IRT to External Variables" Bengt writes (pg 218, middle): "We will add the multivariate probit assumption that y*|x is multivariate normal. Note that this does not mean that we assume normality for the y*'s or for the [Eta], but normality is merely required for the [residual and error terms]. The distribution of [Eta] and the y*'s is actually to some extent generated by the x's".
I.e., in a CFA "in isolation" with ordered categorical indicators but no background (x) variables, one assumes that the latent y*'s are normal (but makes no assumptions about y* variances). However, introducing covariates relaxes or makes less crucial the assumption of the y*'s normality ?
How is this so ?
Does "non-normality of y*'s when x's are introduced" hold for all Mplus estimators (i.e., WLS as well as robust WLSMV, etc) ?
No. The reason your r-square values increase is that the variables that you are adding explain the variability of the indicators. The r-square values are not dependent on which estimation method you use but on the model.
Anonymous posted on Wednesday, June 13, 2001 - 2:48 pm
I have a question concerning the r-squares that are in my output. I have dichotomous dependent variables, so are the r-square values equivalent to adjusted r-squares? thanks.
I saved the factor scores from a CFA (using the "SAVE = F_SCORES" option) and read them into another software program. When I calculate simple statistics on the saved data, the means & variances of my latent variables do not match the means & variances shown in the M-plus "Tech4" output. Not even close, really. Why not? My goal, if it matters for your answer, is to create some kind of residual that represents the unexplained variance of an indicator (that is, the variance not explained by the latent variable on to which the indicator was loaded)
Be sure to include TYPE=MEANSTRUCTURE if you are using Version 2.0. Means were not turned on automatically in that version. Subsequent updates have corrected this problem. If this does not help, please send data and output and I will take a look at it.
Anonymous posted on Wednesday, November 14, 2001 - 1:21 pm
I'm trying to save factor scores in CFA. The output file has all the variables on the usevariables list and the factor scores appended at the end. I'd like to merge the factor scores with a larger data set, but am not sure how to save an identifying variable with the factor scores. When I include an identifier on the usevariables list, the model doesn't converge.
You cannot do this in Version 1. In Version 2, there is an IDVARIABLE = statement in the VARIABLE command.
Jef Kahn posted on Friday, November 16, 2001 - 9:29 am
I am estimating a CFA with 4 factors and 49 items using ML. The fit of the model is not very good. However, when looking at the modification indices I notice that dozens and dozens of the fixed factor loadings have MI values of 999 with a StdYX S.P.C. of 0. What does this mean?
I would suggest using the usual methods for continuous variables.
Anonymous posted on Thursday, December 06, 2001 - 5:53 am
1: I have run a multiple group confirmatory factor analysis on categorical data (3 factors, 3 groups) using meanstructure and save fscores in vers. 2. Can you explain in words how the scores are estimated? (I have trouble understanding the formulas in the manual pp 385-6.) 2: Is it reasonable to use multiple group analysis (default settings) to compute scores for the same individuals on 3 points in time (= 3'groups') to study changes in scores?
bmuthen posted on Thursday, December 06, 2001 - 5:51 pm
A factor score is an estimate of an individual's most likely value on the factor given both the estimated model and the individual's observed variable values. For example, in measuring an ability dimension with multiple-choice items, an individual who got many items right gets a high estimated score, but the estimated score is also affected by group membership so that the estimated score is lower for a group that has a lower estimated factor mean. This is in line with Bayesian estimation where the estimated model is the "prior" to which information on the individual's data is added to get a "posterior". The estimated score is the maximum of the posterior distribution.
Multiple-group analysis is concerned with independent data from the different groups. Longitudinal data do not give independent data from the different time points. You can, however, use (longitudinal) factor analysis with across-time measurement invariance to estimate and compare scores over time. To study change in score over time, however, it would seem advantageous to use growth modeling.
Anonymous posted on Tuesday, December 11, 2001 - 11:59 pm
I am trying CFA on ordinal data using the WLS estimator. I have constructed a second-order factor model consisting of five first-order factors and twelve observed variables. Of these five first-order factors, two have only one observed variable. For model identification, we fixed at 1 the loadings from these first-order factors to their observed variables. In addition, we fixed the error variances of these observed variables at 0. However, this model could not be identified. Can this type of model be identified in Mplus?
If the factors have only one indicator, don't create a factor. With categorical outcomes, the residual variances are not parameters in the model, so this changes things from the continuous case. Just use the indicator as an indicator of the second-order factor. Let Mplus do what is necessary. If you ask for TECH1, you will be able to see which parameter is causing the identification problem. It is most likely the residual covariances among the first-order factors which need to be fixed to zero for identifiability. If you cannot solve your problem this way, send me the output including TECH1 and the data and I will take a look at it.
JND posted on Tuesday, December 18, 2001 - 9:55 am
I hate to express my ignorance publicly on this . . . .
I ran the CFA with three latent variables and 28 observed variables (per theory). Only five of the estimates were less than 1.0. (None were negative.)
How do I attack this?
bmuthen posted on Tuesday, December 18, 2001 - 10:29 am
The unstandardized loading estimates do not have to be less than one; this depends on the scale of the y. The Stdyx standardized values can still be less than one. If you fix to one the loading that has the largest unstandardized estimate in your run, you will most likely get unstandardized loadings less than one.
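A small numeric sketch of this point, assuming a single-factor model with continuous indicators in which Var(y) = loading^2 * Var(factor) + residual variance. The function name and all values are hypothetical.

```python
import math

def stdyx_loading(loading, factor_var, residual_var):
    """Loading standardized with respect to both the factor and y."""
    y_var = loading ** 2 * factor_var + residual_var
    return loading * math.sqrt(factor_var) / math.sqrt(y_var)

# An unstandardized loading well above 1 can still standardize below 1.
raw = 2.5
std = stdyx_loading(raw, factor_var=1.0, residual_var=4.0)

# Changing the marker in a one-factor model rescales all loadings by the
# same constant, so fixing the largest one at 1 puts the rest at or below 1.
loadings = [1.0, 1.8, 2.5, 0.9]          # first loading was the marker
rescaled = [l / max(loadings) for l in loadings]
```

The standardized value depends only on how much of the variance of y the factor explains, not on the metric chosen for the factor.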
Anonymous posted on Monday, March 18, 2002 - 10:11 am
Is there a way to check the multivariate normality assumption in ML estimation in Mplus?
Anonymous posted on Monday, March 18, 2002 - 12:20 pm
I am estimating a three-factor model with ML, MLM, and WLS. My 18 observed variables are Likert-type and are multivariate non-normal (Mardia's coefficient normalized estimate = 160). As expected, the MLM chi-square estimate of 2272.9 (df=125) is lower than the ML estimate of 2750.966, and the other fit indices behave similarly since they are corrected for non-normality (scaling correction factor 1.21). But when I ran the same model with the WLS estimation method, the results were considerably different: the chi-square value is 1326.607 (df=125), and both CFI and TLI are much lower than their previous estimates of (.94; .92), with CFI = 0.718 and TLI = 0.655, although RMSEA is lower with WLS (0.046) than with MLM (.061). Which results should I trust in this case? Is there a better method than the ones I used? Thank you very much for your help.
Thank you for sending the outputs. Getting such different results is unusual. There are a couple of things that may be going on.
1. WLS is not suited for many observed variables like the 18 you have. See Muthen and Kaplan, 1982.
2. You may have hit a local solution with ML and MLM. You might want to try the WLS estimates as starting values and rerun these analyses. It may be that a local solution is also the problem in the baseline model. Note that for ML and MLM, the chi-square for the baseline models is about ten times as large as for the target models, while for WLS, the chi-square for the baseline model is about four times as large as for the target model.
3. You might consider doing a simulation study to compare the three estimators for a problem similar to yours and see which estimator behaves best in practice. This is the only way to really know which outcome to trust.
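To see why the size of the baseline chi-square matters so much for CFI and TLI, here is a sketch using the standard textbook definitions of the two indices. The chi-square values and the baseline degrees of freedom below are hypothetical, chosen only to mimic a baseline that is about ten times versus about four times the target chi-square; they are not the values from the outputs discussed here, and a given program may compute the indices with estimator-specific adjustments.

```python
def cfi(chi2_target, df_target, chi2_base, df_base):
    """Comparative fit index from target and baseline chi-squares."""
    d_target = max(chi2_target - df_target, 0.0)
    d_base = max(chi2_base - df_base, d_target)
    return 1.0 if d_base == 0 else 1.0 - d_target / d_base

def tli(chi2_target, df_target, chi2_base, df_base):
    """Tucker-Lewis index from target and baseline chi-squares."""
    ratio_base = chi2_base / df_base
    ratio_target = chi2_target / df_target
    return (ratio_base - ratio_target) / (ratio_base - 1.0)

# Same hypothetical target model, two different hypothetical baselines.
cfi_ten = cfi(250.0, 125, 2500.0, 153)    # baseline ~10x the target
cfi_four = cfi(250.0, 125, 1000.0, 153)   # baseline ~4x the target
tli_ten = tli(250.0, 125, 2500.0, 153)
tli_four = tli(250.0, 125, 1000.0, 153)
```

With the target chi-square held fixed, the smaller baseline alone pulls both indices down, while RMSEA, which does not use the baseline model, is unaffected.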
Anonymous posted on Tuesday, March 19, 2002 - 11:46 am
Thank you very much for your answers to my post dated March 18. If I may, I would like to follow up on your answers. I tried your second suggestion and ML or MLM estimates and fit indices did not change. But maybe the problem lies in the baseline model as you suggested. Is there a way to fix the local solution problem for the baseline model for ML or MLM? Just to make sure I understand, is the local solution problem the case when the iterations converge when they hit a plateau even though it is not the best one?
With regard to your first answer, I tried to look up the reference but I wasn't able to locate the exact one. I was able to locate the following two references but their years are different. I am guessing you meant one of these: 1. Muthén, B., & Kaplan D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189. 2. Muthén, B., & Kaplan D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.
I don't have a lot of knowledge on doing simulations. Are there any sources (e.g., previous simulation studies) that you can recommend me to read on this topic?
You could run the baseline model yourself. It consists of means and variances only, no covariances. I am not sure that a local solution is the problem, however. And yes, your understanding of what a local solution is, is correct.
I apologize about the reference. It is the 1992 paper.
Our website will have a new feature in the next couple of days. It will be called Mplus Web Notes and will show how to use simulations to answer research questions. This may help you.
Anonymous posted on Monday, June 03, 2002 - 11:37 am
I have conducted a confirmatory factor analysis using dichotomous indicators on a very large sample (n=8008). I would like to use the results of this model to approximate factor scores for individuals not in the analysis dataset. I understand that strictly proper factor scores cannot be estimated through a factor score coefficient matrix for models with categorical indicators - they must be iteratively obtained. I am not a mathematician and could not create programming language to do this operation for individual cases as they come up. Is there a way I can fudge a "good enough" estimate of factor scores using the Mplus output for this particular application?
bmuthen posted on Tuesday, June 04, 2002 - 6:35 am
It should be possible to use your estimated model from the n=8008 as fixed parameters in a new analysis where you enter the new individuals and get factor scores the correct way. In this new analysis you don't estimate any parameters but only estimate factor scores.
Anonymous posted on Tuesday, June 04, 2002 - 7:06 am
Is there any way to use the results from my estimated model to construct factor scores on a case-by-case basis? I need a way to convert responses to factor scores as an assessment tool for nonscientists to use with individuals.
Anonymous posted on Tuesday, June 04, 2002 - 7:08 am
BTW, the users of the new instrument will not necessarily have access to Mplus, so I wanted to come up with a solution that can be used outside the program.
bmuthen posted on Tuesday, June 04, 2002 - 9:30 am
You can do the factor score estimation on a case-by-case basis by the approach I described; so using an n=1 analysis with fixed parameters. Regarding your last question, the factor score estimation with categorical outcomes is an iterative process, so there is no explicit formula such as a factor score coefficient matrix that can be used to easily obtain factor scores. The only approximation would be to ignore that the outcomes are categorical and treat them as continuous, but that would seem to forfeit the purpose of your analysis.
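For readers who want to see what "iterative" means here, the following is a rough sketch of a posterior-mode factor score for one person, with all model parameters held fixed. It assumes a deliberately simple setup that is not taken from this thread: one standard normal factor, binary items, a probit link, and latent response variances standardized to 1, so that each residual variance is 1 minus the squared loading. The function names, loadings, and thresholds are hypothetical, and this is not a description of the algorithm Mplus itself uses.

```python
import math

def log_norm_cdf(z):
    """Log of the standard normal CDF, via erfc so the tails stay accurate."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def log_posterior(eta, responses, loadings, thresholds):
    """Log posterior of the factor value eta, up to an additive constant."""
    total = -0.5 * eta ** 2  # standard normal prior on the factor
    for u, lam, tau in zip(responses, loadings, thresholds):
        z = (lam * eta - tau) / math.sqrt(1.0 - lam ** 2)
        total += log_norm_cdf(z) if u == 1 else log_norm_cdf(-z)
    return total

def map_factor_score(responses, loadings, thresholds,
                     lo=-6.0, hi=6.0, iters=100):
    """Posterior mode by ternary search (the log posterior is concave)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_posterior(m1, responses, loadings, thresholds) < \
                log_posterior(m2, responses, loadings, thresholds):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# Hypothetical fixed parameters, standing in for estimates from a large sample.
loadings = [0.7, 0.6, 0.8, 0.5]
thresholds = [0.0, 0.0, 0.0, 0.0]
score_high = map_factor_score([1, 1, 1, 1], loadings, thresholds)
score_low = map_factor_score([0, 0, 0, 0], loadings, thresholds)
score_mid = map_factor_score([1, 1, 0, 0], loadings, thresholds)
```

There is no closed-form weight matrix in this calculation: the score for each response pattern comes out of a search, which is the sense in which no factor score coefficient matrix exists for categorical outcomes.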
Rich Jones posted on Wednesday, June 05, 2002 - 9:26 am
Couldn't Anonymous generate a data set where each record was one of all 2^p combinations of the items in the instrument (where p is the number of items), and then estimate factor scores for each record with fixed parameters as in the n=1 case. The result would be a table of response patterns and factor scores conditional on the parameter estimates returned from the n=8,008 model. Anonymous could then distribute this table to other investigators interested in using the new instrument or prepare a simple program to associate response patterns with the appropriate factor score.
Yes, this could be done. It's a good idea. Thanks for the suggestion.
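A sketch of the lookup-table idea: enumerate all 2^p response patterns and attach a score to each. The scoring function is passed in as an argument; the `placeholder_score` used below is only a stand-in so the example runs and is not a factor score. In practice the scores would come from the fixed-parameter factor score run for each pattern.

```python
from itertools import product

def pattern_table(n_items, score_fn):
    """Map every 0/1 response pattern of length n_items to a score."""
    return {pattern: score_fn(pattern)
            for pattern in product((0, 1), repeat=n_items)}

def placeholder_score(pattern):
    """Stand-in only: proportion of items endorsed, NOT a factor score."""
    return sum(pattern) / len(pattern)

# With p = 3 items there are 2**3 = 8 patterns.
table = pattern_table(3, placeholder_score)
```

The table grows as 2^p, so this is practical for a short instrument but quickly becomes unwieldy as the number of items increases.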
Anonymous posted on Wednesday, June 12, 2002 - 6:48 am
Thanks, Rich! Nice problem solving! I'll try that approach.
Anonymous posted on Thursday, June 27, 2002 - 1:40 pm
I'm doing a CFA with ordered, categorical indicators. The CFA model by itself fits quite well, but when I go to include it in a fuller SEM, some of the scale factors become very small and insignificant. If the scale factors are related to error, a scale factor of zero doesn't seem to make sense. Should insignificant scale factors be cause for alarm ?
Anonymous posted on Thursday, June 27, 2002 - 1:49 pm
I have a latent variable called participation measured by 6 items asking individuals if they have participated in 6 different activities within the last two years. Activities are not necessarily correlated with each other. These six items are considered to be "causal" indicators (formative model, or spurious model) rather than "effect" indicators (reflective model) for the dependent variable (Bollen & Lennox, 1991; Edwards & Bagozzi, 2000). This implies I need to have an unobserved latent variable (Blalock, 1971) in my model. Can Mplus handle models with such unobserved latent variables?
Thank you very much. Blalock (1971). Causal models involving unobserved variables in stimulus-response situations. In H. M. Blalock (ed.) Causal models in the social sciences (pp.335-347). Chicago: Aldine. Bollen & Lennox (1991). Conventional Wisdom on measurement: A structural equation perspective. Psych Bull, 110 (2), 305-314. Edwards & Bagozzi (2000). On the nature and direction of relationships between constructs and measures. Psych Methods, 5 (2), 155-174.
bmuthen posted on Thursday, June 27, 2002 - 3:43 pm
Scale factors close to zero correspond to latent response variable (y*) variances that are very large. This can certainly be a sign of model misspecification.
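A quick numeric illustration, assuming the scale factor is defined as the inverse standard deviation of the latent response variable, so that Var(y*) = 1 / (scale factor)^2. That definition is stated here as an assumption; the function name and values are hypothetical.

```python
def ystar_variance(scale_factor):
    """Implied y* variance when the scale factor is 1 / sd(y*)."""
    return 1.0 / scale_factor ** 2

# Shrinking the scale factor toward zero inflates the implied variance.
variances = {d: ystar_variance(d) for d in (1.0, 0.5, 0.1, 0.01)}
```

Under this assumption a scale factor of 0.1 already implies a y* variance of about 100, and 0.01 implies about 10,000.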
bmuthen posted on Thursday, June 27, 2002 - 3:49 pm
I think by "causal" indicators you are referring to a situation where indicators are influencing rather than being influenced by a latent variable. Yes, Mplus can handle this situation. Although you don't have a factor indicator in the usual sense, you can always say
factor BY anyvble@0;
factor ON x1-x5;
where x1-x5 are your "causal" indicators and anyvble is any observed dependent variable in the model. Don't forget to include the identifying restrictions that this type of model requires. If I remember correctly off hand, this involves one of the slopes on x fixed at 1 and fixing the factor residual variance at zero, but you'd better check this.
Anonymous posted on Wednesday, September 11, 2002 - 7:15 am
You write that a CFA requires at least m*m restrictions (m is the number of factors) to be identifiable. Is it possible to give a more precise definition of when a CFA is identifiable or when it is not? Is there any general rules for when a CFA is identifiable / non-identifiable?
bmuthen posted on Wednesday, September 11, 2002 - 8:50 am
This is a big topic. A good treatment is in the following reference from the reference list on the Mplus web site:
Joreskog, K.G. (1979). Author's addendum. In Advances in Factor Analysis and Structural Equation Models, J. Magidson (Ed.). Cambridge, Massachusetts: Abt Books, pp. 40-43.
Apart from that, rules of thumb are generally helpful. For instance, having 3 indicators per factor. Or, 2 indicators per factor if there is more than one factor.
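The rules of thumb can be checked against a simple counting rule: the number of free parameters may not exceed the number of distinct variances and covariances. The sketch below assumes a simple-structure CFA with continuous indicators, no mean structure, the first loading of each factor fixed at 1, and uncorrelated residuals; the function name is hypothetical. The counting rule is necessary but not sufficient, so a non-negative result does not by itself prove identification.

```python
def cfa_degrees_of_freedom(indicators_per_factor, correlated_factors=True):
    """Degrees of freedom of a simple-structure CFA (covariance structure only)."""
    p = sum(indicators_per_factor)      # number of observed indicators
    m = len(indicators_per_factor)      # number of factors
    moments = p * (p + 1) // 2          # distinct variances and covariances
    free_loadings = p - m               # one loading per factor fixed at 1
    residual_vars = p
    factor_vars = m
    factor_covs = m * (m - 1) // 2 if correlated_factors else 0
    free = free_loadings + residual_vars + factor_vars + factor_covs
    return moments - free

df_one_factor_three = cfa_degrees_of_freedom([3])    # just-identified
df_one_factor_two = cfa_degrees_of_freedom([2])      # negative: not identified
df_two_factors_two = cfa_degrees_of_freedom([2, 2])  # two indicators each
```

One factor with 3 indicators gives 0 degrees of freedom, one factor with 2 indicators gives a negative count, and two correlated factors with 2 indicators each give a positive count, in line with the rules of thumb above.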
Anonymous posted on Friday, October 11, 2002 - 12:16 am
I am estimating a CFA model with about 10 3-level indicators with 2 latent variables on about 1000 subjects, with 5 indicators per latent variable. I fix the latent variables to be uncorrelated. However, when I compute factor scores, I get a decidedly non-zero correlation between the factor scores (about 0.5), and the scores do not have mean zero and variance 1. I understand that the factor score empirical distribution is necessarily discrete according to the pattern of indicator values, but shouldn't the scores be uncorrelated? If not, what is the meaning of correlated factor scores from a model with uncorrelated factors?
bmuthen posted on Sunday, October 13, 2002 - 10:44 am
The distribution of estimated factor scores does not have the same means, variances, and covariances as the factors themselves. This is shown for example in the Lawley-Maxwell factor analysis book (look under the "regression method"); for ref. see the Mplus web site. With many good indicators (high loadings), however, the estimated factor scores tend to behave more and more like the true scores. This is true whether your indicators are continuous or categorical. Your correlation of 0.5 between estimated factor scores seems high however, unless your indicators are weak. I think you are saying that you have 3-category indicators. Perhaps their loadings are quite small. If you like, you can send your Mplus input, output, and data to firstname.lastname@example.org
I am introducing MPlus to my CFA class and have not been able to find any studies in which the WLSMV, WLSM, and WLS estimators have been compared in terms of bias in standard errors and chi-square values and/or efficiency. Are any such studies available?
bmuthen posted on Monday, November 25, 2002 - 10:59 am
These two references cover this (I'd be happy to send them):
Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Accepted for publication in Psychometrika. (#75)
Muthén, B. (1993). Goodness of fit with categorical and other non-normal variables. In K. A. Bollen, & J. S. Long (Eds.), Testing Structural Equation Models (pp. 205-243). Newbury Park, CA: Sage. (#45)
Muthén, B., & Kaplan D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.
Muthén, B., & Kaplan D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.
Mpduser1 posted on Tuesday, September 09, 2003 - 3:02 pm
Those might be the ones; I'll give them a try.
I've actually read the piece I'm attempting to refer to -- M+K do a series of simulations and find that the Muthen estimator (whatever it was called at the time, but I assume it's the WLSMV estimator now) provides superior results over WLS, GLS.
I am planning on fitting a CFA on a large sample and obtaining parameter estimates. I then plan to use/fix those parameter estimates in a CFA on another small sample and obtain factor score estimates based on the original large sample parameter estimates. I haven't seen this done, so I am wondering if there is a logical/technical problem with doing this?
Thanks. Yes the small sample is very much like the large sample (actually a seemingly representative subset of the large sample) but the small sample has an interesting criterion variable available. I want to know the relationship between the factor scores and the criterion variable. The large sample is big enough to let me estimate the CFA parameters, while the small sample is not.
Drs. Muthen & Muthen, I am conducting a CFA with categorical indicators (using WLSMV and Type=missing). I am sometimes obtaining modification indices = 999.000. I've read on the discussion board that this is because the modification index is indeterminate (zero or near zero denominator). My question is: Does this mean that the parameter estimate for my model are suspect? (The fit of my model is good - chi-square is non-significant.) Or should I just ignore these modification indices and retain the model since the model results make sense to me? Thanks, Scott
bmuthen posted on Saturday, May 22, 2004 - 10:03 am
No, this most likely has no implication for quality of estimates or the model - just ignore.
Lieven posted on Thursday, October 14, 2004 - 3:06 am
I have 1056 variables (equity portfolios) with the number of observations ranging from 10 to 95 (monthly dollar returns). For some variables the covariance is missing because the observations may not overlap in time. These missing covariances can be set to zero. There are 74 factors. 1 world-factor: each variable can have an unrestricted loading on it; 39 country-factors: each portfolio belongs to a specific country and can have an unrestricted loading on only one country-factor, the rest zero; and 34 industry-factors: each portfolio (variable) belongs to a specific industry and can have an unrestricted loading on only one industry-factor, the rest zero. I have initial values, coming from a two-step regression method, for the factors and the loadings. Our aim is twofold: (1) getting estimates for the loadings and the factors in one shot (2) testing whether the 3 loadings per variable are equal. A reliable estimate of the fit of the model is less important. Is it possible to perform such an analysis?
Mplus has a limit of 500 variables. This is an arbitrary limit but it nevertheless is there in the current version. An analysis with considerably more variables than observations is difficult to carry out, particularly in terms of getting good inference, in your case, testing of equalities. The situation is similar to that of recent factor analyses of microarray data; see, for example, the work of Geoff McLachlan for some recent developments in this area.
dana posted on Thursday, November 18, 2004 - 7:00 am
I would like more information about the fact that 'with categorical outcome, the residual variances are not parameters in the model'. So that is different when we have continuous outcomes, right? I wonder why we can't estimate the residual variances?
I read this article: B. Muthén (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, vol 43, no 4, 551-560.
And on page 552, it is said that there is one 'necessary restriction to make since there is no possibility to identify the diagonal elements of sigma, only observing dichotomous variables'. I still don't understand why it is not possible to identify the diagonal elements. Can you please explain this to me?
So if we need to make this necessary restriction (which is: diag(sigma) = I ), does this mean that to know the psi covariance matrix of the errors, I need to do this calculation: psi = I - diag((lambda)*(phi)*(t(lambda)))
where
psi = covariance matrix of the errors
I = identity matrix
lambda = matrix of factor loadings
phi = covariance matrix of the factors
t(lambda) = transpose of the matrix lambda
if yes, does this mean that we can't fix or let free any elements of the psi matrix?
Thanks for your help!
bmuthen posted on Thursday, November 18, 2004 - 1:50 pm
Look at technical appendix 1 for details about this from the perspective of probit regression. The issue arises from a binary variable having mean p (the probability) and variance p (1-p), which means that there is not separate information about the variance beyond information about the mean. This is different than for continuous variables.
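A tiny check of the point about binary variables: once the mean p of a 0/1 variable is known, its variance is fixed at p(1-p), so the data carry no separate information about it. The function name and data values are hypothetical.

```python
def bernoulli_moments(values):
    """Sample mean and (population-style) variance of a 0/1 variable."""
    n = len(values)
    p = sum(values) / n
    variance = sum((v - p) ** 2 for v in values) / n
    return p, variance

data = [1, 0, 0, 1, 1, 0, 1, 1]
p, var = bernoulli_moments(data)
# var equals p * (1 - p): the variance adds nothing beyond the mean.
```

For a continuous variable the same two statistics can vary independently, which is why a residual variance can be estimated there but not here.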
Yes, you compute the residual variance as in the formula you give.
You can only estimate a residual variance if you work with longitudinal or multigroup data, where you have equality of measurement parameters - see Mplus web note #4.
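The residual variance calculation psi = I - diag(lambda * phi * t(lambda)) can be sketched in plain Python as follows. The loading matrix and factor covariance matrix are hypothetical, and the calculation assumes the restriction diag(sigma) = I discussed above.

```python
def residual_variances(lam, phi):
    """Diagonal of I - Lambda * Phi * Lambda' for a p x m loading matrix."""
    m = len(phi)
    out = []
    for row in lam:
        # Variance in this indicator's y* that is explained by the factors.
        explained = sum(row[j] * phi[j][k] * row[k]
                        for j in range(m) for k in range(m))
        out.append(1.0 - explained)
    return out

# Hypothetical two-factor model with four indicators.
lam = [[0.8, 0.0],
       [0.7, 0.0],
       [0.0, 0.6],
       [0.0, 0.5]]
phi = [[1.0, 0.3],
       [0.3, 1.0]]
theta = residual_variances(lam, phi)
```

Only the diagonal is computed this way; the off-diagonal elements (residual covariances) are separate quantities and are not implied by this formula.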
dana posted on Friday, November 19, 2004 - 7:46 am
Thank you very much! I'll read the paper! Many thanks!
Anonymous posted on Tuesday, November 23, 2004 - 3:41 am
What is the difference between using WLS and MLR when indicators in a CFA are categorical?
Add H1 to TYPE = MISSING and I think you will get what you want.
Anonymous posted on Wednesday, November 24, 2004 - 5:48 pm
I have another question. What is the meaning of this warning:
THE MODEL ESTIMATION TERMINATED NORMALLY
WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE F2.
thanks a lot
dana posted on Friday, November 26, 2004 - 12:10 pm
I would like to add another question about the psi matrix (message from 18 November 2004). Following the formula: psi = I - diag((lambda)*(phi)*(t(lambda)))
where
psi = covariance matrix of the errors
I = identity matrix
lambda = matrix of factor loadings
phi = covariance matrix of the factors
t(lambda) = transpose of the matrix lambda
So that means that the errors are not correlated! Because as I understand the equation, the identity matrix is for example like:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
If yes, I thought the CFA can allow correlated errors, ...!!
bmuthen posted on Friday, November 26, 2004 - 12:24 pm
Correlated errors are allowed. Your computations above are only for the diagonal of the matrix you call Psi, not the off-diagonal elements. See the Mplus Technical Appendix 2, formulas 42 and on (where the residual cov matrix is called Theta) for a more precise formulation.
dana posted on Friday, November 26, 2004 - 12:37 pm
Just to know: can Mplus give me the covariance matrix for the errors in the output? It seems a little complicated to calculate.
bmuthen posted on Friday, November 26, 2004 - 2:15 pm
The residual covariances are parameters that can be identified and estimated in the model. Just say e.g. u1 with u2.
dana posted on Monday, November 29, 2004 - 5:03 am
So if I understand well: 1. In a CFA with binary indicators, I can't estimate the error variances but I can calculate them with the formula above.
2. But I can estimate the error covariances by writing "u1 with u2" in the program.
3. So if I don't write "u1 with u2", that will mean that the errors of u1 and u2 are uncorrelated?
Scott Engel posted on Thursday, December 09, 2004 - 12:01 pm
I ran a CFA with categorical factor indicators on 25 items with four scales. I got an excellent fit for this model. I followed this up with a second order CFA to determine if these four factors are well represented by a single latent construct. This model also fit well and it appears that a single latent construct does, in fact, represent the four scales well. My question is this- given that I’ve run nonlinear factor analysis, can I use a simple sum score of the scales as a total score? Or, do I need to consider weighting of the scaled scores based upon how they loaded on the second order latent construct? For practical reasons I’d prefer to use a simple unweighted score. If the weighted scores are ideal, is using the unweighted scores defensible and an adequate rough approximation of the latent construct?
bmuthen posted on Thursday, December 09, 2004 - 5:10 pm
The optimal approach would be to get the estimated factor scores for the second-order construct. You get this by using the FSCORES option of the SAVEDATA command.
A sum of the 25 items is probably a decent rough approximation of the factor score. But note that you then revert to assuming interval scale for the four categories and also don't take into account the differential loadings on each of the two levels.
A student is interested in doing a confirmatory factor analysis with categorical variables and has specified several items with multiple thresholds in a large sample. She wants to accomplish a change to her confirmatory model in which error variances are covaried. How can one do this, given the variables are categorical? A search of the manual revealed that it's possible to specify that thresholds are correlated. Does this accomplish the desired correlation between error terms? I'm somewhat unsure of what correlating thresholds means and where I'd read more about it. Given that the data are polytomous, what would it mean, by the way, if one specified a covariance between one of the thresholds and not the other between two manifests? As an interim workaround, I told the student to model the desired error covariance as a factor with two indicators, but of course, it would be good to know what the best answer is. thanks!
BMuthen posted on Wednesday, January 12, 2005 - 4:33 pm
Residuals can be correlated using the default WLSMV estimator. These are the residuals for the underlying y* variables. The thresholds are not correlated. They are not random variables.
Technical output 13 provides univariate, bivariate and multivariate sample skew and kurtosis. This output is available for mixture models only. Non-mixture models can be estimated as a single-class mixture to get that output.
Earlier I mentioned that I will be doing a CFA with 24 binary outcomes and 1 latent variable. It was suggested that I use WLSMV instead of WLS and that nested models should be compared using chi-square difference. On the Mplus output, however, a warning message says that "The chi-square value for MLM, MLMV, MLR, WLSM and WLSMV cannot be used for chi-square difference tests." It further says that MLM, MLR and WLSM chi-square difference testing is described in the Mplus Technical Appendices. How about WLSMV? Thank you!
You need to use the DIFFTEST option to do chi-square difference testing using WLSMV. See the Mplus User's Guide for a description of the DIFFTEST option. See Example 12.12 for an illustration of this option.
Anonymous posted on Thursday, March 17, 2005 - 10:39 pm
I am trying to reconcile my understanding of web note 4 with page 480 of the manual. My model is:
MODEL:
f1 BY y1-y7;
f2 BY y8-y15;
f1 ON x1-x10;
f2 ON f1 x1-x15;
Where y1-y15 are ordinal, f1 and f2 are continuous, and x1-x15 are a mix of continuous and categorical.
My question is: are the output coefficients for the “ON” part of the model simple linear regression coefficients or probit (WLS) / logistic (MLE) regression coefficients? From web note 4 and a technical appendix, I started thinking the “ON” portion is a “latent response variable formulation” and could be viewed as a simple linear regression with continuous outcome since the latent variable is continuous. From p 480 of the manual, I’m not sure this is the case.
Either way, can I assume the parameter estimates are normal and use the estimates with their std errors to get p-values?
Another question- should I expect to see a coefficient on the output for f1? Thank you.
Because f1 and f2 are continuous latent variables, the coefficients are simple linear regression coefficients. The factor loadings are probit regression coefficients.
The ratio of the parameter estimate to its standard error can be used to assess significance.
You should expect to see coefficients for x1-x10 for f1 and x1-x15 for f2.
Anonymous posted on Monday, March 21, 2005 - 3:24 pm
This is a follow-up question for the model in my March 17, 2005 post above. I have been reading technical appendix for version 2 (although I’m using 3.12) but still unclear on how my model is being estimated. I’m not specifying an analysis type so the default WLSMV is being used.
More specifically, I’m confused by the statement that the coefficients for the “ON” part of the model are simple linear regression coefficients, which would imply that maximum likelihood is being applied. Is the estimation applied to the “BY” part of the model different than that applied to the “ON” part of the model? Is WLSMV applied only to the “BY” part of the model?
I also see in Tech 5 that gradient calculations and quasi-newton are being applied…are these steps required for both ML and WLSMV?
On a separate topic, I need some help understanding the model fit: Chi-Square, Chi-Square for baseline (what is baseline?), CFI, TLI, RMSEA, WRMR…can you recommend a reference? Thank you so much!
bmuthen posted on Monday, March 21, 2005 - 5:36 pm
Say that a model has both (1) factor indicators and (2) regressions of factors on covariates. With categorical indicators, part (1) is a non-linear regression (probit or logit), while part (2) is a linear regression. The choice of regression is done as a function of the dependent variable being categorical in (1) and continuous in (2). The fact that linear regression is used in (2) does not imply that ML is used. With the WLSMV estimator you get probit in (1) and linear regression in (2), and all of this is done in a single WLSMV estimation step encompassing both (1) and (2).
Yes, both ML and WLSMV require numerical optimization using gradient and QN steps.
See the Yu dissertation on the Mplus web site and references therein for these fit index definitions.
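To make the distinction between the two parts concrete, here is a minimal Python sketch. All parameter values are hypothetical, and the residual variance of the latent response variable is assumed to be fixed at 1; it is not a reproduction of how Mplus estimates the model, only of what the two kinds of coefficients mean.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Hypothetical estimates, for illustration only.
gamma = 0.4   # part (2): linear regression slope of factor f on covariate x
lam = 1.2     # part (1): probit slope (factor loading) of binary item u on f
tau = 0.5     # threshold of the binary item u

def expected_factor(x):
    """Part (2): linear regression, E[f | x] = gamma * x."""
    return gamma * x

def item_probability(f):
    """Part (1): probit regression, P(u = 1 | f) = Phi(lam * f - tau).

    Assumes the residual variance of the latent response u* is fixed at 1.
    """
    return norm_cdf(lam * f - tau)

for x in (-1.0, 0.0, 1.0):
    f = expected_factor(x)
    # Probability is evaluated at the expected factor value for this x.
    print(x, round(f, 2), round(item_probability(f), 3))
```

The factor changes linearly with x, while the item probability changes along the normal CDF curve as the factor changes.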
Anonymous posted on Thursday, March 24, 2005 - 7:44 pm
Hi, I am using CFA to evaluate a model of 5 correlated first-order factors. My understanding is that it is appropriate to constrain error covariances to zero. It has been suggested to me that, instead, I estimate the error covariances among the items specified as indicators of a given factor, but not estimate error covariances for items across different factors. The argument provided to me for doing this is that this is okay because theoretically the items serving as indicators of a given factor are measuring something distinct from the other factors. To me, this is represented by the fact that such indicators are specified to load on one and only one of the factors. Is there ever a time to allow these errors to covary? The only situation that comes to my mind is if there are method effects, maybe reverse-scored items, but even in this situation I wouldn't be that comfortable doing it. Any thoughts about this situation or any other? How justifiable is it to estimate these parameters?
It is justifiable to estimate residual covariances of factor indicators if there is a reason that this parameter makes sense in the model, for example, method effects or minor factors. I do not believe that this would apply to reverse order items.
I am running a multiple group CFA where I want to compare the loadings on the same 6 factors in 4 different age groups. Each factor is defined using the same 2 observed variables in each of the 4 groups (factor1 by var1 var2 in group1, group2, group3, and group4). There are no covariates.
Following the documentation in the Mplus 3.0 User's guide I have first specified an overall model, followed by group specific models for each of the age groups, listing only the second observed variable for each factor on the group-specific models (since the first variable will always have a loading of 1.0 to establish scale).
1. When I specify the group-specific models, do I need to leave out one of the age categories, as I would need to do in multiple regression to prevent multicollinearity? Curiously, I get the same results whether I do or don't omit one of the age categories.
2. The User's Guide says that, if I do not request a model for one of the groups that I have defined with the GROUPING command, the omitted group will have the overall model fitted. Could you explain why I get the same output (identical stats) if I don't include a group-specific model command for the age 15-24 group as when I do? I would have expected to get different loadings for the second observed variable on my 6 factors when I allow them to be freely estimated in the age 15-24 group compared to when I force that group to have the overall model by not including a model statement for it.
3. I have observed that I get exactly the same output (all stats the same) regardless of which age category group I omit (i.e. don't include a specific model for) or whether I specify that I want the loadings to be freely estimated separately for all 4 age groups. When I fit the same models (6 factors, 2 observed vars per factor, first var loading fixed at 1.0) across 3 education levels, I of course get different loadings than when I model by age groups, but again the output is the same across the education groups.
To summarize: Can you explain this consistency regardless of how I specify that I want models built for each group? Whether I leave a group out of my model statements or don't, and regardless of which group I omit, I get the same results.
Thank you so very much for whatever clarification you can provide.
If possible, please send your input, output and data to email@example.com. I think these questions will be better answered if we are able to see the results that you are seeing and how the model is set up.
Anonymous posted on Monday, August 15, 2005 - 2:06 pm
First, let me say thank you for maintaining this discussion site. It is invaluable. I have found that your responses frequently answer questions I have had but did not post.
I am running a CFA with two factors. I get the warning message that the psi matrix is not positive definite. The Tech4 option shows that the estimated correlation between my two factors is 1.098, which is the cause of the warning, but the output under Model Results shows under "factor1 with factor2" a correlation of just 0.118.
Computationally, what's the difference between the estimated correlation between factors produced by the Tech4 option and the one that appears under the model results?
bmuthen posted on Monday, August 15, 2005 - 3:47 pm
Thanks for the encouragement regarding our site. Tech4 gives the correlation which is the same as what is given in the regular output if the factors are exogenous in the model, but if the factors are dependent variables the regular output concerns the residual correlation.
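The difference between the two quantities can be sketched numerically. The Python example below assumes a hypothetical model in which both factors are regressed on a single covariate x; every value is invented for illustration and does not come from the poster's output.

```python
import math

# Hypothetical estimates: f1 and f2 are both regressed on one covariate x.
g1, g2 = 0.9, 0.8          # slopes of f1 and f2 on x
var_x = 1.0                # variance of x
psi11, psi22 = 0.2, 0.3    # residual variances of f1 and f2
psi12 = 0.03               # residual covariance of f1 and f2

# Residual correlation: uses only the residual variances and covariance.
residual_corr = psi12 / math.sqrt(psi11 * psi22)

# Model-implied (total) correlation: adds the part transmitted through x.
var_f1 = g1 ** 2 * var_x + psi11
var_f2 = g2 ** 2 * var_x + psi22
cov_f1f2 = g1 * g2 * var_x + psi12
total_corr = cov_f1f2 / math.sqrt(var_f1 * var_f2)

print(round(residual_corr, 3), round(total_corr, 3))
```

With these made-up numbers the residual correlation is small while the total correlation is large, which is the same kind of gap the poster describes between the two parts of the output.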
Anonymous posted on Wednesday, August 17, 2005 - 9:30 am
Could you please explain how to interpret the Residual Variances and R-SQUARE values listed in the output for a MIMIC model?
1. Are the residual variances the differences between the variances estimated for the observed variables (factor indicators) from the model and the actual variances for these variables calculated from the data? -- such as what is measured in the Chi-square goodness of fit measure?
2. Or is the residual variance of a factor indicator variable the variation that's unaccounted for by its loadings of the factors? If this is the case one would expect that the R-Square value would be equal to (1 - the residual variance), but this does not seem to be the case judging from the values in my output. Is R-square the same as the communality in EFA?
3. I know that the R-Square value is equal to 1 - the StdYX of the Residual variance. But how do the R-square values and the raw residual variances relate to one another?
Thank you for the information!
BMuthen posted on Wednesday, August 17, 2005 - 2:18 pm
2. That's right. In the MIMIC model, the R-square is not 1 - the residual variance but it is the ratio of variance in the factor explained by the covariates over the total variance of the factor, where the total variance includes both the explained variance and the residual variance.
3. See number 2.
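The R-square definition given in point 2 can be checked with a small worked example. The Python sketch below uses hypothetical slopes, covariate covariances, and a hypothetical residual variance; it is only meant to show why R-square differs from 1 minus the raw residual variance while still equaling 1 minus the standardized residual variance.

```python
# Hypothetical MIMIC estimates: factor f regressed on two covariates.
gamma = [0.5, 0.3]            # regression slopes of f on x1 and x2
cov_x = [[1.0, 0.2],
         [0.2, 1.0]]          # covariance matrix of the covariates
residual_variance = 0.8       # raw residual variance of f

# Variance in f explained by the covariates: t(gamma) * cov_x * gamma.
explained = sum(gamma[a] * cov_x[a][b] * gamma[b]
                for a in range(2) for b in range(2))
total = explained + residual_variance

r_square = explained / total
standardized_residual = residual_variance / total

print(round(r_square, 3))                 # explained / total
print(round(1 - residual_variance, 3))    # not the same quantity
print(round(standardized_residual, 3))    # equals 1 - R-square
```

The raw residual variance is on the scale of the factor, so it has to be divided by the total factor variance before it can be subtracted from 1.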
Anonymous1 posted on Monday, October 03, 2005 - 6:02 am
I am a relatively new user of your software. I am currently attempting to conduct a CFA using the latest version of MPlus but keep encountering two series of error messages:
1. COMPUTATIONAL PROBLEMS ESTIMATING THE CORRELATION FOR x2 AND x4. INCREASING THE ITERATION OR CONVERGENCE OPTIONS MAY RESOLVE THIS PROBLEM.
When I follow the suggested advice I then encounter the second message which pertains to a completely different variable:
2. SERIOUS COMPUTATIONAL PROBLEMS OCCURRED IN THE UNIVARIATE ESTIMATION OF THE THRESHOLDS/MEANS, VARIANCES AND/OR SLOPES FOR VARIABLE x10.
What might these messages and their associated problems be attributable to?
bmuthen posted on Monday, October 03, 2005 - 10:34 am
It sounds like your variables are categorical, so you may want to look at their univariate and bivariate distributions (frequency tables) to see if you see anything unexpected, such as very skewed distributions. If this doesn't help, send your input, data, and output to firstname.lastname@example.org
Anonymous1 posted on Tuesday, October 04, 2005 - 1:37 pm
Thanks Dr. Muthen. I followed your advice and reexamined the variable distributions. Several variables had negative or positive skews. Transformations solved the problems that I'd been encountering.
Hi, I'm a new Mplus user. I have just run a CFA with my .inp as stated below. However, I've been getting a series of error messages on "ERROR in Variable command Duplicate variable on NAMES list". I truly hope I can get some help with this issue. Thanks.
TITLE: CFA for Mach and Religiosity
DATA: FILE IS Mach_Rel_BE_EPbeg_EPnow.dat;
VARIABLE: NAMES ARE T1-T9 V1-V9 M1-M2 R1-R4 Be1-Be16 Epb1-Epb10 Epn1-Epn10;
USEVARIABLES ARE T1 T3 T5 T7 V1 V2 T2 T4 T6 T8 T9 M1 V3 V4 V6 V8 V9 R1-R4;
CATEGORICAL ARE T1 T3 T5 T7 V1 V2 T2 T4 T6 T8 T9 M1 V3 V4 V6 V8 V9 R1-R4;
MODEL: M BY T1 T3 T5 T7 V1 V2;
Mi BY T2 T4 T6 T8 T9 M1;
Fio BY V3 V4 V6 V8 V9;
Rel BY R1-R4;
Can you describe this in a little more detail? For example, do you want to explain the covariance between an individual's income and ses by by the nation's GNP? I don't think I understand what you are asking.
Ed Wu posted on Tuesday, November 15, 2005 - 5:01 pm
"In the MIMIC model, the R-square is... the ratio of variance in the factor explained by the covariates over the total variance of the factor, where the total variance includes both the explained variance and the residual variance."
What is the relationship between R-square and communalities in models without covariates?
bmuthen posted on Tuesday, November 15, 2005 - 5:47 pm
Communalities refer to explained variance in an item as a function of the factors influencing that item. So it is the R-square for the item instead of for the factor.
Chapter 17 of the Mplus User's Guide has a description of the Mplus output. The scale of the factor indicators tells you what type of regression coefficient the factor loading is. As far as how to understand other things about CFA, see a book like the Bollen book or other references on the Mplus website.
fati posted on Tuesday, January 17, 2006 - 6:55 am
I am estimating a CFA with 7 factors and 38 items using WLSMV. My output shows RMSEA = 0.095 and CFI = 0.94. I know that this is a poor model, and I have some questions, as this is the first time that I have done this analysis.
1. What statistics can help me to get a good model? My program is:
TITLE: CFA FOR PCAS (Categorical factor indicators)
DATA: FILE IS PCAS.dat;
VARIABLE: NAMES ARE q1b q2 q3b q4b q6a q6b q9a q9b q9c q9d q9e q10 q11a q11b q11c q11d q11e q12b q12d q12g q14a q14b q14c q14d q15 q19a q19b q19c q19d q19e q19f q7r q8r q12ar q12cr q12er q12fr q13REC;
CATEGORICAL ARE ALL; missing = ;
MODEL: ACCES BY q1b q2 q3b q4b q6a q6b;
CR_VISIT BY q7r q8r;
CR_KNOW BY q14a q14b q14c q14d q15;
IC_COMM BY q9a q9b q9c q9d q9e q10;
IC_TRUST BY q12ar q12b q12cr q12d q12er q12fr q12g q13rec;
MC_INTEG BY q19a q19b q19c q19d q19e q19f;
RESP BY q11a q11b q11c q11d q11e;
OUTPUT: tech2 modindices;
2. Can I use the modification indices for this? My modification indices output is:
MODEL MODIFICATION INDICES
Minimum M.I. value for printing the modification index 10.000
                     M.I.     E.P.C.   Std E.P.C.   StdYX E.P.C.
ACCES BY Q10         17.281    0.423    0.237    0.237
CR_KNOW BY Q6B       17.053    0.254    0.226    0.226
IC_COMM BY Q3B       10.165   -0.155   -0.146   -0.146
IC_COMM BY Q6B       26.019    0.268    0.253    0.253
IC_COMM BY Q15       12.545    0.317    0.299    0.299
IC_TRUST BY Q6B      31.992    0.361    0.278    0.278
IC_TRUST BY Q15      13.594    0.405    0.312    0.312
MC_INTEG BY Q6B      18.600    0.312    0.256    0.256
RESP BY Q6B          24.439    0.270    0.249    0.249
RESP BY Q15          12.753    0.348    0.321    0.321
How can I apply this change in my model? For example, I understand that q10 would fit better with factor ACCES. How can I do this? Do I move item q10 to factor ACCES? 3. How can I use the estimates in order to have a better model? What does it mean if I have: MODEL RESULTS
Given that the modification indices suggest a lot of possible cross-loadings, I would suggest starting with an EFA. It may be that the data you have does not support the theory that you are testing. EFA will give you a better idea of whether the variables are behaving as you expect.
fati posted on Tuesday, January 17, 2006 - 9:19 am
Thank you for the response. I have done an EFA and used the promax-rotated loadings (> 0.4) to compare the items in each factor. There is a little difference, but when I run a CFA for the changed model, my RMSEA is always > 0.05 (RMSEA = 0.10, CFI = 0.936). What does that mean? My modification indices are: MODEL MODIFICATION INDICES
Minimum M.I. value for printing the modification index 10.000
                     M.I.     E.P.C.   Std E.P.C.   StdYX E.P.C.
ACCES BY Q10         16.162    0.223    0.223    0.223
CR_KNOW BY Q6B       17.360    0.226    0.226    0.226
IC_RESP BY Q6B       25.570    0.246    0.246    0.246
IC_RESP BY Q14A      10.447    0.252    0.252    0.252
IC_RESP BY Q15       17.956    0.352    0.352    0.352
IC_TRUST BY Q6B      31.391    0.277    0.277    0.277
MC_INTEG BY Q6B      18.837    0.257    0.257    0.257
OTHER BY Q3B         10.089   -0.163   -0.163   -0.163
OTHER BY Q6B         31.372    0.307    0.307    0.307
OTHER BY Q14A        10.620    0.247    0.247    0.247
OTHER BY Q15         31.692    0.427    0.427    0.427
IC_RESP ON CR_KNOW / CR_KNOW BY IC_RESP      999.000    0.000    0.000    0.000
IC_RESP ON IC_RESP / IC_RESP BY IC_RESP      999.000    0.000    0.000    0.000
IC_RESP ON IC_TRUST / IC_TRUST BY IC_RESP    999.000    0.000    0.000    0.000
IC_RESP ON MC_INTEG / MC_INTEG BY IC_RESP    999.000    0.000    0.000    0.000
IC_RESP ON OTHER / OTHER BY IC_RESP          999.000    0.000    0.000    0.000
I'm afraid that I cannot tell you how to get your model to fit. I don't think you would have so many factor loading modification indices if your EFA clearly pointed to the factors in your CFA. I would revisit the EFA.
jad posted on Wednesday, January 18, 2006 - 12:16 pm
I am conducting a CFA with 40 categorical factor indicators (7 constructs) using the default estimator (WLSMV). I understand from the article by David B. Flora and Patrick J. Curran (2004) that robust WLS is robust to modest violations of underlying normality. I want to know how I can determine what counts as modest nonnormality. Do I use skewness and kurtosis? If yes, how can I do this in Mplus? Do I check normality for each item used in my analysis? I have 40 items with 350 observations. Thanks
The normality assumption is for the u* variables underlying the observed u variables. It is difficult to test their normality. From a practical point of view, if you did test the normality of the u* variables and found that they were extremely non-normal, what would you do? You would probably just use a robust weighted least squares estimator. To read more about these issues, see the following paper:
Muthén, B. (1993). Goodness of fit with categorical and other non-normal variables. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 205-243). Newbury Park, CA: Sage. (#45)
If you don't have access to the paper, you can request paper 45 from email@example.com.
I'm running a 5 factor CFA with 44 dichotomous items. Here is my input syntax:
VARIABLE: NAMES ARE q1 - q44; CATEGORICAL ARE q1 - q44;
MODEL: f1 by q5* q8* q12* q21* q24* q43*; f2 by q2* q6* q10* q11* q13* q15*; f3 by q3* q4* q14* q16* q18* q19* q22* q31* q33* q35* q37* q40* q42*; f4 by q9* q17* q20* q23* q25* q26* q27* q32* q34* q36* q39* q44*; f5 by q1* q7* q28* q29* q30* q38* q41*; f1 @ 1; f2 @ 1; f3 @ 1; f4 @ 1; f5 @ 1;
I get the following warning: WARNING: THE RESIDUAL COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. PROBLEM INVOLVING VARIABLE F2.
I'm confused as to what this means. 1) Is this the covariance matrix of the item residuals? I don't think so but I need to clarify & wouldn't it be called theta? 2) Is this the covariance matrix of the 5 latent factors? If so, why is it referring to residuals when my factors are just freely correlated (i.e., nothing is predicting them so there should be no residual) & wouldn't this be called phi?
It is referring to the covariance matrix of the factors. Most likely f2 has a negative or zero variance. I would have to see the entire output to understand why residual is being printed. You can send it along with your license number to firstname.lastname@example.org.
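A rough way to see what "not positive definite" means for a factor covariance matrix is to attempt a Cholesky factorization, which succeeds only for positive definite matrices. The Python sketch below is an illustration under that assumption, not a description of the check Mplus performs internally; the example matrices are hypothetical, except that the value 1.098 echoes the out-of-range factor correlation reported earlier in this thread.

```python
import math

def is_positive_definite(matrix):
    """Return True if a symmetric matrix admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:      # zero or negative pivot: not positive definite
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True

# Factor correlation above 1 (hypothetical unit variances).
print(is_positive_definite([[1.0, 1.098], [1.098, 1.0]]))
# A negative factor variance (hypothetical).
print(is_positive_definite([[1.0, 0.0], [0.0, -0.1]]))
# An admissible matrix (hypothetical).
print(is_positive_definite([[1.0, 0.3], [0.3, 1.0]]))
```

Both a zero or negative variance and a correlation at or beyond 1 in absolute value make the factorization fail, which matches the two usual causes of this warning.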
When you have four variables on the USEVARIABLES list and only three variables in the MODEL command, all four variables are used in the analysis. The variable not mentioned in the model command is not correlated with any of the other variables. This could make the model not fit. You will find a message to this effect in the output. If you have further questions of this type, send them along with the input, data, output, and your license number to email@example.com.
i have a well-fitting model for an instrument that mixes categorical and continuous items. in the model, items load on three lower-level factors (two of which include categorical items), which then load on a single factor. the categorical items are all "yes/no."
a somewhat arbitrary scoring system exists for this measure. based on the above discussion, i get the sense that, at least for the factors that include categorical items, an iterative process must be gone through in order to create the factor scores. i want to be able to describe a scoring system that does not require use of mplus, so i have several questions:
1. there is some discussion above about creating a table that includes all possible answers and the resulting score on the factor. however, it's not particularly clear to me how this would be accomplished. can someone provide a little more detail?
2. am i correct that the iterative process is necessary because of the categorical items? that is, for the one lower-order factor that is defined by continuous variables, can i instead use the loadings on the factor to estimate factor scores?
3. when categorical and continuous variables are both included, are the continuous variables part of the iterative process, or is their contribution to the factor score more direct?
i went into this thinking that it should be easy to come up with something better than the arbitrary scoring method that now exists, but reading the posts above i'm realizing i don't really know much about the issues here!
thanks in advance for any thoughts on this question.
Whenever some observed factor indicators are categorical, the factor score estimates have to be obtained using an iterative optimization procedure. There is no simple approximation (short of just summing the items). The procedure is described in the technical appendix for Version 2 posted on the web site. So the estimation has to be done in Mplus.
an above answer to a similar question seemed to suggest that, for example, if a factor is defined by 5 yes/no items, then a table could be constructed that would look something like (spacing isn't coming out exactly as i meant to, but i hope the idea is clear enough):
1 2 3 4 5 score n n n n n 0 y n n n n 2 y y n n n 3 y n y n n 3.5 etc...
where all possible combinations are expressed and the corresponding factor score is indicated. i am not sure this ends up being at all feasible in my data set (one of the factors has 7 yes/no items and one item with responses from 1-6--so that would be a *lot* of rows in a table), but is it *theoretically* possible to create such a table? (if not, i'm giving up; if so, it would be nice to have a pointer or two.)
Each distinct response pattern (row in your table) does give rise to a distinct estimated factor score value. To fill in the value for each row, you would need estimates for the model parameters and then compute the estimated factor scores using the iterative optimization technique. If this has been computed, the table can be created and then applied to any other individuals with these values.
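The table described above can be sketched as follows. This Python example assumes a one-factor probit model with hypothetical loadings and thresholds, a factor variance of 1, and latent response variables scaled to unit variance. It uses the posterior mode as the factor score and finds it by a plain grid search, which stands in for the iterative optimization; it is not claimed to be the exact algorithm Mplus uses.

```python
import math
from itertools import product

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Hypothetical estimates for 5 yes/no items measuring one factor.
LOADINGS = [0.8, 0.7, 0.6, 0.7, 0.5]
THRESHOLDS = [0.0, 0.2, -0.3, 0.5, 0.1]

def log_posterior(eta, pattern):
    """Log of the normal prior for eta times the probit likelihood of the pattern."""
    value = -0.5 * eta * eta
    for u, lam, tau in zip(pattern, LOADINGS, THRESHOLDS):
        resid_sd = math.sqrt(1.0 - lam * lam)        # latent response has variance 1
        p_yes = norm_cdf((lam * eta - tau) / resid_sd)
        p_yes = min(max(p_yes, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        value += math.log(p_yes if u == 1 else 1.0 - p_yes)
    return value

def factor_score(pattern):
    """Posterior mode of eta, located by a grid search from -4 to 4 in steps of 0.01."""
    grid = [i / 100.0 for i in range(-400, 401)]
    return max(grid, key=lambda eta: log_posterior(eta, pattern))

# One row per distinct response pattern: 2**5 = 32 rows.
score_table = {p: factor_score(p) for p in product((0, 1), repeat=5)}

for pattern in [(0, 0, 0, 0, 0), (1, 0, 0, 0, 0), (1, 1, 1, 1, 1)]:
    print(pattern, score_table[pattern])

# Number of rows needed for 7 yes/no items plus one item with 6 categories.
print(2 ** 7 * 6)
```

Once such a table exists, scoring a new respondent is a dictionary lookup on the response pattern, so no further optimization is needed at scoring time.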
Am I understanding correctly? (1) The scale factor in categorical data CFA is associated with the latent response variable underlying each observed categorical factor indicator? (2) If we want to compare two models, one nested in the other, we should use the uncorrected chi-square that results from a nonrobust estimation method? Thanks Linda and Bengt!
This means that the two factors with a correlation greater than one are statistically indistinguishable from each other. You either need to rethink them or use only one of them. Did you ever do an EFA with the set of items to see if they are loading as expected? It may be that your items are not valid measures of what they are meant to measure.
Response to your May 12 posting: That's a good thought, but it doesn't seem to fit. I did do an EFA (both 4 and 5 factor solution) and these two factors are the ones that have the loadings most consistent with the theory (the others have some problems, but these two are very robust and they are generally not correlated more than -.30).
The other 3 factors, which don't have the correlation greater than 1, are not as distinct as I would like. Could this be affecting the parameters of the more robust factors?
I am puzzled by a discrepancy between EFA and CFA. In the EFA the factor intercorrelations are all within a reasonable range, whereas in the CFA Factors 1 and 5 end up with a correlation > 1.
I know the solution below is not very good, but this ALSO happens when I eliminate the items that are cross-correlated or don't load according to the theory and when I assign items in the CFA to the factors they load on in the EFA.
Any other thoughts on this?
EXPLORATORY ANALYSIS WITH 5 FACTOR(S) :
CHI-SQUARE VALUE 522.915 DEGREES OF FREEDOM 271 PROBABILITY VALUE 0.0000
RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) : ESTIMATE IS 0.052
You are comparing an EFA to a simple structure CFA. These are two different models and therefore would not necessarily end up with the same results. Also, please don't paste large portions of output on the discussion board as it takes too much room. If it is necessary to show this much output, please send the question to firstname.lastname@example.org along with your license number.
Lois Downey posted on Thursday, June 29, 2006 - 10:24 am
The following statement appears in B. Muthen's post of 10/13/02, 10:44 AM: "With many good indicators (high loadings), ... the estimated factor scores tend to behave more and more like the true scores."
I computed factor scores for a single-factor complex missing model with 10 dichotomous indicators. 801 patients were clustered under 92 physicians. Standardized loadings ranged from 0.934 to 0.858. Am I correct in interpreting this to constitute "many good indicators"?
Estimated mean and variance for the latent variable were 0.00 and 0.79, respectively, whereas the mean and variance for the factor scores were -0.07 and 0.44, respectively. Would one expect greater correspondence between the two distributions than this, given the number of indicators and the size of the loadings?
Dear Prof., I am trying to fit a latent class analysis model to four manifest variables, each with two levels, and one covariate, i.e. age. Intuitively, since I am working on diagnostic tests, I rarely expect the number of classes to go beyond 3; two would be better. But unfortunately the model with 3 classes seems to fit better than the one with two, and a model with four may still be significant. One of the problems I noticed is that there is a high correlation between the manifest variables, about 0.95.
Please could you advise me on how to handle the effects of correlation in latent class modelling?
Second, I saw a correlation of about 0.95. Can you kindly give me an idea of how it is obtained? To my knowledge, correlation is for continuous variables, and we commonly use the odds ratio as a measure of association for categorical variables.
Find the attached Mplus codes for my analysis.
Title: Summer Latent Class Analysis.
Data: File is C:\Documents and Settings\Maruwa1\Desktop\mps2.txt;
Variable: names = id age viap paplsilp colpohgrp hpv;
usevariables = id age viap paplsilp colpohgrp hpv ;
categorical = viap paplsilp colpohgrp hpv;
classes = c(3);
missing=all(9999);
Analysis: Type=missing mixture ;
MODEL: %OVERALL%
C#1 ON AGE;
Plot: type is plot3;
series is viap(1) paplsilp(2) colpohgrp(3) hpv(4);
Savedata: file is mps_save.txt ;
save is cprob;
format is free;
Output: tech11 tech14;
When I look at your input, I wonder if you want to include the ID variable in the analysis. By including it on the USEVARIABLES list, it will be used as a latent class indicator along with the four categorical variables. You would need to send your output and license number to email@example.com for me to comment on the .95 correlation. I can't see where that would come from.
TECH4 shows model estimated means. For example, my latent factor "F1" has three indicator variables X1, X2, and X3, all on a 7-point likert scale. The sample statistics tell me that the means for the indicators X1, X2, and X3 are 5.105, 4.643, and 4.827. However, the mean for the latent factor F1 is only "0.123".
ESTIMATED MEANS FOR THE LATENT VARIABLES
      F1
      0.123
ESTIMATED COVARIANCE MATRIX FOR THE LATENT VARIABLES
      F1
F1    1.333
How is the mean "0.123" for the latent factor calculated? Why is it so small when compared to the means of the respective indicator variables? In journals, we need to report the means and std.devs, so is this the value we report?
A. Dyrlund posted on Wednesday, January 31, 2007 - 9:21 am
If I have a simple CFA model where: VARIABLE: NAMES ARE psq1-psq15; MODEL: reward BY psq1, psq6, psq11; coercive BY psq2, psq7, psq12; referent BY psq3, psq8, psq13; legit BY psq4, psq9, psq14; expert BY psq5, psq10, psq15;
What is the syntax for placing this CFA model in EFA in order to obtain the promax rotation results and eigenvalues?
Note that the user's guide is available in pdf form on the website. See online ordering for the course handouts. What are you trying to do?
jenny yu posted on Saturday, February 10, 2007 - 8:26 pm
Dear Drs. Muthen,
I have some questions when I built my MIMIC model with DIF effects. I would appreciate if you can give me some clues.
1) I am implementing an iterative model-building process, dropping one variable at a time and using DIFFTEST to test the significance of nested models. Mplus Version 3 requires the WLSMV estimator for DIFFTEST; however, my model fit is better when I use WLS. I read an earlier discussion here and noticed that DIFFTEST could be run with WLS in an earlier version. So I am wondering whether there is any way to run DIFFTEST with the WLS estimator.
2) In the earlier discussion, I also noticed a strategy of doing DIFFTEST with WLS and using WLSMV for the final model. Similarly, can I run DIFFTEST with WLSMV to arrive at the final model and then use WLS to estimate that final model? In my case, the model fit (CFI and RMSEA) is better with WLS.
3) With RESIDUAL in the OUTPUT command, I get the covariance, residual correlation, and correlation matrices. Is there any way to output p-values for these matrices?
Thank you very much for your time and help in advance.
I am not sure choosing an estimator because it gives better fit is justifiable but I will let you make that decision.
DIFFTEST is used only with WLSMV. With WLS, the difference in chi-square and degrees of freedom for the two nested models is used.
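As a rough illustration of the WLS case just described, the following Python sketch takes the chi-square values and degrees of freedom of two nested models and refers their difference to a chi-square distribution. The fit values in the example are hypothetical and are not taken from any model in this thread, and the p-value routine is a plain series approximation written for this sketch, not something Mplus outputs. It does not apply to WLSMV, where DIFFTEST is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate.

    Uses the power series for the regularized lower incomplete gamma
    function, which is adequate for the moderate values seen in
    difference testing.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi2_difference(chi_restricted, df_restricted, chi_general, df_general):
    """Chi-square difference test for two nested models (plain WLS or ML)."""
    diff = chi_restricted - chi_general
    ddf = df_restricted - df_general
    if ddf <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    return diff, ddf, chi2_sf(diff, ddf)

# Hypothetical fit values, for illustration only.
diff, ddf, p = chi2_difference(112.4, 40, 98.1, 35)
print(f"chi-square difference = {diff:.1f}, df = {ddf}, p = {p:.4f}")
```

Both models must be fitted to the same set of observed variables for the difference to be meaningful.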
I am confused by two things that you say. One is that you are using difference testing for a MIMIC model. In a MIMIC model, DIF of intercepts is looked at by regressing the items on one or more covariates. I am also confused about what you mean by dropping a variable. Nested models should have the same set of observed variables.
Mplus does not give p-values for residuals.
jenny yu posted on Monday, February 12, 2007 - 7:41 am
I apologize for the confusion.
I am going through a model-building process, which is why I was dropping variables. I probably misunderstood the definition of 'nested model'. Isn't it a full model vs. a restricted model (with fewer variables than the full model)?
I think my question is this: given a bunch of variables, the DIF effect of which variables should be added to the model to achieve a parsimonious model (resulting in better model fit), instead of looking at DIF with all variables?
When we look at the significance of a variable's coefficients for all items (indicators), the DIF effects on some items were significant and some were not. How can we decide whether this variable should be kept in the model? What is a valid and practical strategy for selecting variables?
Also, is there any function in Mplus similar to a SAS macro that can be used to run something iteratively?
Nesting in reference to chi-square difference testing refers to models using the same set of observed variables where restrictions are placed on a more general model. Following are some papers you might find useful:
Gallo, J.J., Anthony, J. & Muthén, B. (1994). Age differences in the symptoms of depression: A latent trait analysis. Journals of Gerontology: Psychological Sciences, 49, 251-264. (#52)
Muthén, B. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, 54, 557-585. (#24)
Muthén, B., Tam, T., Muthén, L., Stolzenberg, R. M. & Hollis, M. (1993). Latent variable modeling in the LISCOMP framework: Measurement of attitudes toward career choice. In D. Krebs & P. Schmidt (Eds.), New Directions in Attitude Measurement, Festschrift for Karl Schuessler (pp. 277-290). Berlin: Walter de Gruyter. (#46)
If by the macro in SAS you are asking about running several analyses at the same time, you can use a DOS batch file for this purpose.
jenny yu posted on Monday, February 12, 2007 - 10:29 am
Thank you very much for your answers and the references. They are helpful. In addition, could you give me some instructions on my questions about building parsimonious model of DIF effect within MIMIC (the 2nd and 3rd paragraph in my previous post)?
I am looking for statistical help in doing SEM analyses using LISREL (or any other software for SEM analyses). I already did the EFA (using SPSS) and I only need to run the CFA. I do have all the conditions (correlations...) that need to be entered in order to get the right model fit; however, I don't know how to operate the LISREL software very well. I am confused by the LISREL language! Thanks
I tried to subscribe to SEMNET at the link you sent me but it keeps telling me that there is a cookie problem and I can't get rid of it. It has been taking me in circles for the last three days. HELP!!
I need to know whether Cronbach's alpha can be obtained from LISREL or whether it needs to be calculated by hand. If so, how? Thanks
Todd Little (Little et al., 1999, SEM) recommends putting equality constraints on factor loadings when estimating a model with only two indicators per latent variable (fixing both loadings to be equal). In this case, I use continuous indicators.
(1) How do we do that in Mplus? (2) I tried doing it by fixing both loadings to 1. In this case, the estimates I obtained are both fixed to 1 without S.E. (0), the STD estimates are equal to one another but the StdYX estimates differ. If I wish to report standardized loadings, what should I do (if that is the right way to do it - ref question 1)?
Nina Zuna posted on Wednesday, July 11, 2007 - 2:05 pm
I have a very elementary question. I would like to set a correlation to 1 between two factors. When I used the WITH command and @1, I received this error: NO CONVERGENCE. SERIOUS PROBLEMS IN ITERATIONS. ESTIMATED COVARIANCE MATRIX NON-INVERTIBLE. CHECK YOUR STARTING VALUES. My goal is to do a chi sq diff test between a model in which the correlation between 2 factors is set to 1 vs a model in which the correlation is freely estimated.
Below is the model: ChildFoc by q2_1 q2_2 q2_3 q2_4 q2_5 q2_6 q2_7 q2_8 q2_9; FamFoc by q2_10 q2_11 q2_12 q2_13 q2_14 q2_15 q2_16 q2_17 q2_18; ChildFoc WITH FamFoc@1;
I think a better way to do this is to use MODEL TEST. See the user's guide under MODEL CONSTRAINT to see how to label the parameters. And then see MODEL TEST which performs a Wald test.
Nina Zuna posted on Wednesday, July 11, 2007 - 7:27 pm
Thank you, Linda; your speedy response was very much appreciated. I reviewed pp. 484-488 in the UG as advised. My new syntax is: ChildFoc by q2_1* q2_2 q2_3 q2_4 q2_5 q2_6 q2_7 q2_8 q2_9; FamFoc by q2_10* q2_11 q2_12 q2_13 q2_14 q2_15 q2_16 q2_17 q2_18; ChildFoc@1; FamFoc@1; ChildFoc WITH FamFoc (p1); MODEL CONSTRAINT: p1=1; MODEL TEST p1=.93 The correlation on the output is now 1.0; however, I want to make sure I interpreted your suggestion and the UG correctly: it appears that I cannot conduct a model test (Wald test) for p1 since I constrained p1, right? If I am constraining it to 1, then I probably can't test it for a different value? Incidentally, I noticed that the constrained model is the same as having all 18 indicators load on 1 factor (which makes sense, since I am saying the two factors correlate perfectly). However, I am back to square 1: given these two scenarios, what is the best way to test between the 2 models: Model 1, which allows the two factors to correlate freely, vs. a model in which the correlation is fixed to 1 or a model with 1 factor identified by 18 observed indicators? Is my only option to run the two models separately and conduct the chi-square difference test by hand, since I got the error message for the Wald test (WALD'S TEST COULD NOT BE COMPUTED BECAUSE OF A SINGULAR COVARIANCE MATRIX)?
Nina Zuna posted on Thursday, July 12, 2007 - 9:20 am
Great...I have the Wald's test at the end of my output!! If I may bother you to ask one final question---I want to ensure I am interpreting it correctly. It is significant. I removed the constraint so now the correlation between my two factors is freely estimated on my output (completely stdzd r=.93). I added the model test only as advised. Am I correct to assume the Wald test (p1=1) tested the significant difference between a correlation of .93 and 1.0?
Thank you in advance for your final thoughts, Nina
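For readers following this exchange, a Wald test of a single restriction such as p1 = 1 can be sketched in Python as below. The estimate 0.93 is the value quoted above; the standard error of 0.02 is a hypothetical number chosen only for illustration, since the thread does not report it. With both factor variances fixed at 1, the covariance labeled p1 is a correlation. This is the textbook one-degree-of-freedom statistic and is not guaranteed to match the MODEL TEST output digit for digit.

```python
import math

def wald_single(estimate, hypothesized, se):
    """Wald chi-square (1 df) and p-value for H0: parameter == hypothesized."""
    w = ((estimate - hypothesized) / se) ** 2
    # For 1 df, P(chi-square > w) equals the two-sided normal tail area.
    p = math.erfc(math.sqrt(w / 2.0))
    return w, p

# 0.93 is from the post above; the SE of 0.02 is hypothetical.
w, p = wald_single(0.93, 1.0, 0.02)
print(f"Wald chi-square = {w:.2f}, p = {p:.5f}")
```

Read this way, a significant result says the freely estimated correlation differs from 1 by more than its sampling error would explain.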
Pardon such a basic question, but sometimes it is helpful to check one’s understanding of the basics.
Would you be so kind as to clarify whether the interpretation of the Covariances and Variances sections of the CFA output would differ if factor variances WERE vs. WERE NOT set @1? Let’s say we have a simple 3-factor model, with no freed parameters (i.e., no freed factor loadings or covariances). Let’s also say that for Model 1, f1@1 f2@1 f3@1, and for Model 2 there are no such specifications. Obviously, the StdYX values would = 1.0 in the Variances section for Model 1. If your time permits, can you please clarify everything else (i.e., Estimates, S.E., Est./S.E., Std, StdYX)?
Greetings~ I apologize in advance, but I am relatively new to CFA and Mplus, so I had some rather basic questions.
1)I wanted to check my understanding of the use and computation of factor scores. I'm gathering that factor scores are individuals' predicted scores on a factor created by multiplying their score on each predictor by that predictor's factor loading and then summing these values. Is this accurate? And then, is it comparable to a composite of those indicators? Do factor scores mean the same thing for categorical variables?
2)I was trying to find the matrix to use to calculate the composite reliability based on an equation I was provided in a SEM class, and I believe I can request the matrix for categorical indicators using the Tech 4 output request. But, when I was looking into this on this discussion forum, if I did not completely misunderstand what was meant, it was suggested that with binary variables, the reliability is better considered using IRT. Additionally, not all of my indicators are binary, most are, but the others have 3-4 unordered categories. So, can I calculate the reliability in the same way with my categorical variables, and if so, will Tech 4 provide the matrix I need to use? If not, any basic references on IRT?
Linda posted the following reply a few years ago that I think addresses your first question:
"A factor loading is a regression coefficient. If factor indicators are continuous, they are simple linear regression coefficients and are interpreted as such. They can be greater than one. There is a discussion of this on the LISREL website under Karl's Corner.
If the factor indicators are categorical, then the factor loadings are probit or logistic regression coefficients depending on the estimator used in Mplus. "
If you haven't already done so, use the search function in this discussion forum and I bet you'll be able to find posts that provide even more information for your first question, and then posts that address your second question.
If not, you'll need to wait a few more days since the Muthéns are on vacation.
Hello, I'm referring to a post posted on May 04, 2001 "I would like to use Mplus to estimate a non-linear relationship among latent variables [interaction]. Joreskog and Yang (1996) demonstrated such a model can be estimated using SEM if an observed product variable is used as an indicator of the latent product variable. Bollen (1995) used a two-stage least squares with instrumental variables to estimate the interaction. How can I use Mplus to estimate an interaction model?
Linda K. Muthen posted on Thursday, May 10, 2001 - 10:07 am Mplus cannot do what Joreskog and Yang demonstrated because it requires non-linear constraints. The Bollen approach can be done in Mplus but it is not directly implemented. It would have to be done in a series of steps."
Is this still the case or does Mplus deal with the 2SLS approach? What would be the mentioned steps? Thanks for your help.-Stephan
Since 2001, Mplus has added the XWITH option for latent variable interactions and MODEL CONSTRAINT for linear and non-linear constraints. Latent variables interactions are estimated using maximum likelihood according to the principles described in the following paper:
Klein, A. & Moosbrugger, H. (2000). Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65, 457-474.
Dear Linda, thanks for your fast response and the literature. Best, Stephan
Linda posted on Thursday, September 27, 2007 - 10:23 am
Hello, I ran a CFA with three continuous indicator variables, and I get a chi-square value of 0.0000, degrees of freedom = 0, CFI=1.0, TLI=1.0, RMSEA= 0.0000. It's a just identified model. In this situation, do I report the fit indices?
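The zero degrees of freedom in this question follow from simple counting, which the Python sketch below walks through for a one-factor CFA with continuous indicators. It assumes the usual setup (first loading fixed at 1 to set the metric, a mean structure with free intercepts and the factor mean fixed at 0); the function name is made up for this illustration.

```python
def one_factor_cfa_df(n_indicators):
    """Degrees of freedom for a one-factor CFA with continuous indicators.

    Assumes the first loading is fixed at 1 and the factor mean at 0.
    """
    p = n_indicators
    # Sample information: variances and covariances, plus means.
    moments = p * (p + 1) // 2 + p
    # Free parameters: loadings (one fixed), factor variance,
    # residual variances, intercepts.
    parameters = (p - 1) + 1 + p + p
    return moments - parameters

for p in (3, 4, 5):
    print(p, "indicators -> df =", one_factor_cfa_df(p))
```

With three indicators the model reproduces the sample moments exactly, so the chi-square of zero and the perfect CFI, TLI, and RMSEA carry no information about fit.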
I'm trying to save factor scores in CFA using the "idvariable is" statement in the variable command. The problem is that I need to merge the "factor score" dataset with a larger data set using 2 different identifiers (a family ID as well as an individual ID). Is there a way to save two identifying variables with the factor scores? I wasn't able to get the code to run with two variables listed in the idvariable statement. Thank you!
The only thing I can think of is to create one variable from the two variables and then do the same thing in the larger data set when you do the merge. The length of the id variable is increasing to 16 in Version 5 so this might help.
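One way to build such a single identifier before the data reach Mplus is sketched below in Python. The function names, the choice of four digits for the individual ID, and the example values are all assumptions made for illustration; the 16-digit ceiling is the Version 5 length mentioned in the reply. The same function would have to be applied to the larger data set before merging.

```python
def combine_ids(family_id, person_id, person_width=4, max_digits=16):
    """Build one numeric ID: family digits followed by zero-padded person digits."""
    if family_id < 0 or person_id < 0:
        raise ValueError("IDs are assumed to be non-negative integers")
    if person_id >= 10 ** person_width:
        raise ValueError("person_id does not fit in person_width digits")
    combined = family_id * 10 ** person_width + person_id
    if len(str(combined)) > max_digits:
        raise ValueError("combined ID exceeds max_digits")
    return combined

def split_id(combined, person_width=4):
    """Recover (family_id, person_id) from a combined ID."""
    return divmod(combined, 10 ** person_width)

# Hypothetical family 1023, person 7.
merged = combine_ids(1023, 7)
print(merged, split_id(merged))
```

Fixing the width of the individual part keeps the mapping reversible, so the two original identifiers can be recovered after the merge.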
I think the fit statistics we provide should be sufficient.
Paul Silvia posted on Monday, November 05, 2007 - 1:38 pm
You might consult a recent paper by Bentler in Personality and Individual Differences, in which he suggests "best practices" for reporting fit statistics. (Less is more, I think: a few well-chosen ones are better than a laundry list of every statistic that a program will compute.)
Matthew Cole posted on Wednesday, November 07, 2007 - 4:06 am
Bentler, P.M. (2007). On tests and indices for evaluating structural models. Personality and Individual Differences, 42, 825–829.
Hi, I am running a CFA and I want to run an LM (Lagrange multiplier) test to identify which fixed parameters, if set free, would lead to a significantly better fit. Which option is good for that in Mplus? My second question is about modeling. Can I correlate a second-order factor with a first-order factor? For example, is it correct to write:
f1 by v1-v3; f2 by v4-v8; f3 by v9-v10; f4 by v11-v14; f5 by v15-v18; f6 by f1 f4 f5; Y on f6;
I am new to Mplus and have a question regarding a CFA that includes both categorical and continuous variables. I ran the following input and received a message that says "MODINDICES option is not available for ALGORITHM=INTEGRATION. Request for MODINDICES is ignored.1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS."
Here's my syntax:
Title: Family Characteristics CFA Data: file is H:\sociology\lgriepenstroh\runfeb4.dat; format is F8.2(41);
When you have one or more categorical indicators and use maximum likelihood estimation, numerical integration is required. The error message is about modification indices, not about having categorical and continuous indicators. You can use ESTIMATOR=WLSMV as an alternative. If this does not help, please send your input, data, output, and license number to firstname.lastname@example.org.
I am running two CFA models, a three-factor model and a second-order model. However, I am getting the same result for both models (model fit and estimates). Below are the syntax used. Am I missing something?
Model 1: Three factor Model
VARIABLE: NAMES ARE v1-v62; USEVARIABLES ARE v1-v62; CATEGORICAL ARE v1-v62; ANALYSIS: ESTIMATOR= wlsmv; MODEL: f1 BY v1-item20; f2 BY v21-item38 v61-v62; f3 BY v39-v60; OUTPUT: sampstat tech4;
Model 2: Second Order
VARIABLE: NAMES ARE v1-v62; USEVARIABLES ARE v1-v62; CATEGORICAL ARE v1-v62; ANALYSIS: ESTIMATOR= wlsmv; MODEL: f1 BY v1-item20; f2 BY v21-item38 v61-v62; f3 BY v39-v60; f4 BY f1 f2 f3; OUTPUT: sampstat tech4;
The second-order factor that you add is just-identified. This is why it makes no difference.
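The point about just-identification can be checked by counting the parameters in the factor part of each model. The Python sketch below does this under the assumption that the first second-order loading is fixed at 1 (the usual default); the function names are invented for this illustration.

```python
def correlated_factors_params(k):
    """Free parameters among k correlated first-order factors:
    k variances plus k*(k-1)/2 covariances."""
    return k + k * (k - 1) // 2

def second_order_params(k):
    """Free parameters when one second-order factor explains k first-order factors:
    k-1 loadings (first fixed at 1), one second-order factor variance,
    and k first-order residual variances."""
    return (k - 1) + 1 + k

for k in (3, 4, 5):
    saved = correlated_factors_params(k) - second_order_params(k)
    print(f"{k} first-order factors: second-order structure adds {saved} df")
```

With three first-order factors both counts are six, so the second-order structure imposes no restriction and the fit and estimates cannot differ. From four first-order factors onward the second-order model becomes testable.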
Zsuzsa Londe posted on Thursday, February 28, 2008 - 5:47 pm
Hi, You say that Mardia's coefficient can be generated using TECH13, which has to be used with a MIXTURE model, which has to be of mixed classes, and that "Non-mixture models can be estimated as a single class mixture to get that output." Could you please help me figure out how to set up a model with all continuous variables as a "single class mixture" in order to get Mardia's coefficient? Thank you, Zsuzsa
You need to use TYPE=MIXTURE with the CLASSES option. You specify the CLASSES option as CLASSES = c (1); One class is the same as a one group analysis.
Zsuzsa Londe posted on Thursday, February 28, 2008 - 6:41 pm
Thank you very much for the incredibly speedy answer. I have been trying your suggestion but am not succeeding. I'm a new user and it is possible that I'm doing something very wrong but keep getting an error message "This analysis is only available with the Mixture or Combination Add-On." This is my input;
Variable: Names are id digs_hu reads_hu words_hu nonwd_hu liss_hu corsi digs_en reads_en words_en nonwd_en liss_en comp_esl; Usevariables are digs_hu reads_hu words_hu nonwd_hu liss_hu digs_en reads_en words_en nonwd_en liss_en; Missing are all (-9999) ; CLASSES = c(1);
Data: LISTWISE=ON; MODEL: STM BY digs_hu words_hu nonwd_hu digs_en words_en nonwd_en; WM by reads_hu liss_hu reads_en liss_en;
Hi Linda, Would you have any suggestions how to check multivariate normality with Mplus? I do have univariate non-normality and my committee members would also like me to provide multivariate comparisons. Thank you, Zsuzsa
I am attempting to use DIFFTEST in a CFA with categorical indicators to test a 1-factor structure compared to a 2-factor structure with 5 (binary) observed indicators. When I follow the instructions in Example 12.12, I receive a message that " THE CHI-SQUARE DIFFERENCE TEST COULD NOT BE COMPUTED BECAUSE THE H0 MODEL IS NOT NESTED IN THE H1 MODEL."
Key parts of my setup follows: File 1: Snip>
Usevariables are cohab divorce samesex DIsinbir DIpmsex ; Categorical are cohab divorce samesex DIsinbir DIpmsex ; Analysis: Type = general; ESTIMATOR = wlsmv ;
Model: f1 BY cohab@1 samesex DIpmsex divorce DIsinbir;
Savedata: DIFFTEST IS deriv.dat;
File 2: Snip> Usevariables are cohab divorce samesex DIsinbir DIpmsex ; Categorical are cohab divorce samesex DIsinbir
Analysis: Type = general; ESTIMATOR = wlsmv ; DIFFTEST IS deriv.dat;
Linda, Thanks for your quick response. The different variables listed were just a result of my cut and paste to get down to the maximum character number for the message. I had initially run it the other way (2 factor first, 1 factor second) without success. When I tried again today it worked. I'm running on a remote terminal server and it is almost as if it wasn't having time to register or something. Anyway, I've got my difftest results now! Thanks! Mick
Vivian Towe posted on Wednesday, May 14, 2008 - 2:08 pm
I ran a very simple CFA with categorical data (Likert scale responses). My model fit was very poor (chi-square, CFI/TLI, RMSEA) according to Yu's dissertation.
I was thinking that one way to improve fit would be to see if the item residuals are correlated, but I don't know how to model that? Can you point me to some Mplus examples of CFA that specify correlated item errors? Or is there another way to deal with fit problems?
I don't know your situation, but if you have more than two factors, I would start with an EFA. The WITH option is used to correlate residuals, for example, u1 WITH u2; You can look at modification indices to see other possible model misfit.
Vivian Towe posted on Wednesday, May 21, 2008 - 10:53 am
I am running a CFA multiple group analysis. I am following your handouts for multiple group analysis. The 2 groups are male and female.
When I ran a single group analysis restricting to female using the useobservations (gender eq 2) statement, the analysis ran.
However, when I ran the analysis for males and females simultaneously using the statement grouping is gender (male=1 female = 2), I received the following error message:
*** ERROR Group 2 does not contain all values of categorical variable: ESTEEM *** ERROR Group 2 does not contain all values of categorical variable: SEXREG *** WARNING
This is true: for females, no one answered with the value '1' for either of these variables, but I am not sure how to fix the problem.
Any ideas on why the single analysis works but not the multi group?
The groups are expected to have the same categories with weighted least squares estimation. You can collapse categories or use maximum likelihood estimation with the * setting of the CATEGORICAL option. See the user's guide for more information.
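Collapsing categories is usually done in data preparation before the file is read by Mplus. The Python sketch below first lists which categories are unobserved in each group and then merges category 1 into category 2 for everyone. The data values are made up to mimic the situation described (no female with the value 1), and the function names are hypothetical.

```python
def missing_categories(values, groups):
    """Return {group: [categories never observed in that group]}."""
    all_cats = set(values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, set()).add(v)
    return {g: sorted(all_cats - cats)
            for g, cats in by_group.items() if all_cats - cats}

def collapse(values, mapping):
    """Recode values using mapping; values not in mapping are unchanged."""
    return [mapping.get(v, v) for v in values]

# Made-up data: gender 1 = male, 2 = female; no female has esteem == 1.
esteem = [1, 2, 3, 4, 2, 3, 4, 2]
gender = [1, 1, 1, 1, 2, 2, 2, 2]

print(missing_categories(esteem, gender))
esteem_collapsed = collapse(esteem, {1: 2})   # merge category 1 into 2
print(missing_categories(esteem_collapsed, gender))
```

The recoding has to be applied to the whole sample, not just the group with the empty category, so that both groups end up with the same set of categories.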
I'm testing the factor structure of the SOC scale (12 items collected using 7-point ordinal scales) with two different hypothesised structures (a one-factor model vs. a second-order factor model with three latent factors of four items each, which in turn load on the higher-order factor) in a national study (using survey commands). The fit of the second-order factor model is somewhat better (CFI 0.98 RMSEA 0.083 AIC 231201.21) than that of the one-factor model (CFI 0.98 RMSEA 0.092 AIC 231559.79); however, some correlations between latent factors are higher than one (Heywood cases?).
These are my questions:
1) any idea why is that happening?
2) should I dismiss the second-order factor model (which was my personal bet in this research) because of this inadmissible solution?
3) is there any way of constraining correlations to avoid values higher than one?
Factors that correlate one are not statistically distinguishable. A second-order factor model with three first-order factors is the same as a model with three correlated first-order factors. I suggest doing an EFA for 1-4 factors as a first step.
Derek Kosty posted on Wednesday, June 18, 2008 - 1:23 pm
This question is regarding CFA. I am trying to comparatively evaluate 5 models, some nested and some non-nested, by simultaneously taking into account the goodness of fit, sample size, and the number of parameters estimated.
I know that information criteria are good for this purpose (e.g., BIC and AIC). However, the problem is that my observed variables (lifetime diagnosis of different mental disorders) are dichotomous and have low base rates.
I am aware that BIC/AIC indices are not appropriate when using the WLS estimator, but I am unsure of the appropriateness of using ML and specifying my observed variables as continuous (due to low base rates, which cause a highly skewed distribution).
I have seen multiple papers reporting BIC statistics while claiming that parameters were estimated using weighted least squares. This does not make sense to me.
How would you recommend that I compare my models in this context?
If you have only a small number of factors you can use ML. Using ML does not mean that you have to assume that the variables are continuous which it sounds like you are implying. If you specify the variables as categorical and the estimator as ML (or MLR), then the appropriate logit (or probit) model parts will be used in Mplus.
I don't see how BIC can be computed with weighted least squares since it builds on the likelihood.
Derek Kosty posted on Wednesday, June 18, 2008 - 5:27 pm
Thanks for the quick reply. The model I am currently trying to run has six factors, each having between 3 and 6 observed variables loading on them. My sample size is 816. I specified the variables as categorical, requested the MLR estimator, and reduced the number of integration points to 5. The compiler has been working for about 3 hours now, is this normal? If so, what should I be looking for in the DOS window that could give a clue to how much longer the process will take?
Maybe the slow pace is a result of running version 4.2?
As indicated in the User's Guide, six factors give very time-consuming computations, particularly if you are not using a computer with at least 2 and preferably 4 or even 8 processors. Mplus takes advantage of multiple processors using parallelized code, which gives considerable time savings (using the PROCESS= option in the ANALYSIS command). You should also use version 5.1. The DOS window shows you the iteration history and the time each iteration takes; usually you can get an idea from this of how long it will take to converge. But with this many factors I would recommend using WLSMV and, instead of BIC, comparing models via fit measures such as SRMR.
Derek Kosty posted on Thursday, June 19, 2008 - 10:15 am
Yu (2002) suggests that SRMR is not good when dealing with binary outcomes. It is unclear to me if his critique of the SRMR is with respect to the cutoff recommendation, or with the statistic in general. For example, can the values between models still be compared (e.g. which one is lower) and it be meaningful?
It is not clearcut what to do here, but I think maybe measures such as CFI, which Yu (2002) found useful, may not be able to discriminate between neighboring models that are not far apart in terms of fit. Perhaps SRMR is more useful for this.
In Mplus version 4.2 the SRMR is included in the output for a model in which the outcomes are all categorical, the latent variables are continuous, and WLSMV is the estimator. However, SRMR is not included in the output for version 5.1. What is the reason behind this and can I request the SRMR to be computed in version 5.1?
Here is my model: MODEL: distr by LMDD4 LDYS4 LDPD4 LGOA4 LPTS4; fear by LSPE4 LSOC4 LPAN4 LOBC4; intern by fear distr;! LBIP4; fear@0;
Nevermind, I discovered (from another thread) that with maximum likelihood and categorical outcomes, these fit statistics are not available because sample statistics are not sufficient statistics for model estimation.
I am conducting a CFA with dichotomous observed variables with low base rates and an n=816. What method of estimation is most appropriate (WLSMV or MLR) and do you know of any articles that discuss this issue?
In trying to resolve the question, I stumbled across an article in which Beauducel and Herzberg (2006) compare WLSMV with ML (not MLR). They use categorical data with Mplus version 3.11 and somehow are reporting CFI, TLI, RMSEA and SRMR for both methods of estimation. This contradicts my earlier discovery that "with maximum likelihood and categorical outcomes, these fit statistics are not available because sample statistics are not sufficient statistics for model estimation". Is this due to a difference in Mplus itself across versions, or do you think that the authors did not actually specify their variables as categorical within the model?
Sorry about so many questions. I really appreciate all of your support!
WLSMV uses weighted least squares estimation. Chi-square and related fit statistics are available with this estimator. MLR uses maximum likelihood estimation. With categorical outcomes, chi-square and related fit statistics are not available. With maximum likelihood and categorical outcomes, each factor requires one dimension of integration, which can be computationally demanding. More than 3 or 4 factors is not feasible. Weighted least squares is a better option when you have many factors.
I have done a CFA using the default analysis TYPE=GENERAL. What is the rotation method used for TYPE=GENERAL? I thought the method and type of rotation was principal axis factoring with promax (oblique) rotation. Is this correct? Thank you.
There is no rotation involved with CFA, only with EFA measurement structures. Are you referring to the new "exploratory SEM" approach? For EFA, Mplus does not use PAF, but estimators such as ML and ULS. A multitude of rotations are available - see the User's Guide.
Numbers one and two are equivalent because you choose the same factor indicator to set the metric of the factor. In Number three you set the metric of the factor using a different factor indicator. Model fit will be the same but not parameters estimates.
When you free the first indicator and fix the factor variance to one as in model 3, no factor indicator is fixed to one. You would need to send the output from model 3, the output from either model 1 or 2, and your license number to email@example.com so I can see exactly what you are doing.
RDU posted on Thursday, December 11, 2008 - 8:18 am
Hello. I am trying to run a MIMIC model for eight ordinal indicators. My sample size is around 500.
My data are nested (students clustered within schools), and due to my research interests I chose to use the aggregated or design-based approach (i.e., Type=complex in conjunction with cluster=school) to model my data.
I have read several articles about MIMIC models, but have yet to see one where school-level effects are used as covariates. I have given some thought to looking at the direct effect of several school-level covariates on my latent factor. Though, perhaps this isn't substantively meaningful or correct in terms of modeling population heterogeneity.
1.) I was wondering what your thoughts are on using a combination of student and school-level covariates as predictors for the latent factor(s) in an aggregated MIMIC model? 2.) If it is true that regressing the latent factor(s) on a combination of student and school-level covariates is feasible (i.e., substantively interpretable) then how would one interpret the school level covariates? This is a bit confusing to me since the aggregated approach doesn't disentangle the between school and within school effects.
You can do what you want but I would recommend using multilevel analysis because I do not believe that a MIMIC model is aggregatable.
Anonymous posted on Tuesday, December 30, 2008 - 9:26 am
I am running a CFA with dichotomous response items and modeling on 8 latent factors. I want to run a logistic SEM model with ML estimator. I keep getting this error. I have tried to run this model on couple of computers but it does not seem to solve the problem. Where could I be wrong?
*** FATAL ERROR THERE IS NOT ENOUGH MEMORY SPACE TO RUN Mplus ON THE CURRENT INPUT FILE. THE ANALYSIS REQUIRES 8 DIMENSIONS OF INTEGRATION RESULTING IN A TOTAL OF 0.25629E+10 INTEGRATION POINTS. THIS MAY BE THE CAUSE OF THE MEMORY SHORTAGE. YOU CAN TRY TO FREE UP SOME MEMORY BY CLOSING OTHER APPLICATIONS THAT ARE CURRENTLY RUNNING. NOTE THAT THE MODEL MAY REQUIRE MORE MEMORY THAN ALLOWED BY THE OPERATING SYSTEM. REFER TO SYSTEM REQUIREMENTS AT www.statmodel.com FOR MORE INFORMATION ABOUT THIS LIMIT.
With maximum likelihood and categorical outcomes, each factor is one dimension of integration. We recommend no more than four dimensions of integration. I suggest using weighted least squares estimation when you have several factors.
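The figure in the error message shows why the limit is recommended: the number of integration points grows as a power of the number of dimensions. The short Python sketch below assumes 15 points per dimension, a value inferred here from the message itself (15 to the 8th power is 2,562,890,625, which matches 0.25629E+10); the function name is made up for illustration.

```python
def total_integration_points(dimensions, points_per_dimension=15):
    """Grid size for rectangular numerical integration over several factors."""
    return points_per_dimension ** dimensions

for dims in (1, 2, 4, 6, 8):
    print(f"{dims} dimension(s): {total_integration_points(dims):,} points")

# Reducing the points per dimension helps, but the growth is still exponential.
print(f"6 dimensions, 5 points each: {total_integration_points(6, 5):,} points")
```

Going from four to eight factors multiplies the grid by more than fifty thousand, which is why weighted least squares is suggested instead.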
Anonymous posted on Wednesday, December 31, 2008 - 12:31 pm
My main concern with weighted least squares estimation is that it does not allow one to run a logit model. This is true, right? Because my outcome is dichotomous, I would like to get a logit estimate. Can you please advise? Thanks.
Numerical integration is required with CFA and categorical factor indicators. A model with more than four factors is not feasible in this case. Your only option is using weighted least squares and probit regression. I don't see this as a problem.
Anonymous posted on Wednesday, December 31, 2008 - 2:40 pm
Thank you so much for your quick responses. I apologize if I am belaboring this point. I am having a problem interpreting the results of the probit estimates in my context due to the differences in the distribution used. I am thinking of taking the latent variable scores from Mplus and using them in a logit regression in STATA. Do you think that is a feasible approach?
It sounds like you mean that you would first do WLS probit factor analysis, then estimate factor scores, and then regress each item on those scores to get logit results. If I am interpreting that correctly, that would seem to be less precise than using the probit factor analysis results and do the usual approximate translation to logit by the Sqrt(pi^2/3) factor. But I don't see much of a difference between probit and logit in practice to warrant any interpretational concerns. Probit modeling certainly seems accepted in IRT.
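The approximate translation mentioned above amounts to one multiplication, sketched here in Python. The probit loading of 0.80 is a hypothetical value used only to show the arithmetic; the constant is the Sqrt(pi^2/3) factor named in the reply, which is the standard deviation of the standard logistic distribution.

```python
import math

# Standard deviation of the standard logistic distribution, about 1.814.
PROBIT_TO_LOGIT = math.sqrt(math.pi ** 2 / 3)

def probit_to_logit(coefficient):
    """Approximate logit-scale coefficient from a probit-scale coefficient."""
    return coefficient * PROBIT_TO_LOGIT

probit_loading = 0.80  # hypothetical probit estimate
print(f"scale factor = {PROBIT_TO_LOGIT:.4f}")
print(f"approximate logit coefficient = {probit_to_logit(probit_loading):.3f}")
```

The conversion is only approximate because the normal and logistic curves differ slightly in shape, which fits the remark that the two models rarely differ much in practice.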
Anonymous posted on Wednesday, December 31, 2008 - 3:29 pm
Thank you for your quick response. I think your suggestion is a reasonable way forward.
Anonymous posted on Friday, January 02, 2009 - 10:12 am
I did manage to run my models with WLS. Thank you. I am now wanting to run an interaction model. I am running a CFA with dichotomous response items and modeling on 8 latent factors. I want to create interaction between 2 of these 8 latent factors. Is it possible to do this? I read "5.13: SEM with continuous factor indicators and an interaction between two factors" in the Mplus user guide, but I don't think the command listed there is helping me. Is there any modification you can suggest? Thank you.
Hi, Mplus computes standard errors of factor loadings in a CFA model. So, I was wondering if we can use a pooled standard error to compare 2 factor loadings across groups instead of using chi-square difference testing. If yes, 1) what are the advantages of using one test instead of the other? 2) Is it possible for the 2 tests to give different results? Thank you in advance.
I do not know how one would calculate pooled-standard errors in the maximum likelihood framework. I would use either a difference test or a Wald test via MODEL TEST.
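For readers unfamiliar with the Wald test, here is a generic sketch of the statistic for the hypothesis that two parameters are equal. It is not a description of what MODEL TEST does internally; all numbers are hypothetical, and the covariance between the two estimates is assumed to be available from the estimated parameter covariance matrix.

```python
import math


def wald_test_equal(est_a, est_b, var_a, var_b, cov_ab=0.0):
    """Wald chi-square (1 df) and p-value for H0: a = b."""
    difference = est_a - est_b
    variance_of_difference = var_a + var_b - 2.0 * cov_ab
    wald = difference ** 2 / variance_of_difference
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(W / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


# Hypothetical loadings 0.8 and 0.6, each with sampling variance 0.01.
wald, p_value = wald_test_equal(0.8, 0.6, 0.01, 0.01)
print(round(wald, 3), round(p_value, 4))
```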
Derek Kosty posted on Wednesday, March 11, 2009 - 3:49 pm
Dear Mplus team,
I am running a series of CFA models each including a different outcome measure regressed on the latent factors. Looking at the R-squared values associated with the outcomes across the different models suggests that the significance of the R-squared value is not based on magnitude alone. In other words, some outcomes have significant R-squared values that are less than the non-significant R-squared values of another outcome. I am sure that the answer lies in how the standard error is computed for R-squared. Can you provide any input on this issue?
The two tests are not the same. One tests whether the regression coefficient is significantly different from zero. The other tests whether the variance explained in the dependent variable is significantly different from zero.
Derek Kosty posted on Wednesday, March 11, 2009 - 4:31 pm
Sorry, I don't think my question was clear enough. I am only looking at the test of variance explained in the dependent variable. Here is an example:
16.1% of the variance in years of school completed is explained by disruptive behavior and substance use factors. This R-squared is statistically significant.
19.6% of the variance in lifetime prevalence of depression is explained by disruptive behavior and substance use factors. This R-squared is not statistically significant.
I am wondering how the first R-squared is significant while the latter is not, given the relative magnitudes.
Each R-square has its own standard error based on information related to that dependent variable. So one may have a larger standard error, resulting in non-significance even though the absolute value is larger.
Derek Kosty posted on Thursday, March 12, 2009 - 9:01 am
Can you provide further information on what goes into the computation of the standard error?
The Delta method of computing standard errors is used. Google this for further information.
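As a rough illustration of the Delta method, the sketch below approximates the standard error of an R-square that is a function of three parameters: a slope b, a factor variance psi, and a residual variance theta. The one-predictor model, the parameter values, and the diagonal covariance matrix are all hypothetical, and the gradient is taken numerically, which is a simplification.

```python
import math


def r_square(params):
    """R-square for an outcome regressed on one factor (hypothetical model)."""
    b, psi, theta = params
    explained = b * b * psi
    return explained / (explained + theta)


def numerical_gradient(func, params, step=1e-6):
    """Central-difference gradient of func at params."""
    gradient = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += step
        down[i] -= step
        gradient.append((func(up) - func(down)) / (2 * step))
    return gradient


def delta_method_se(func, params, cov):
    """sqrt(g' V g), with g the gradient and V the parameter covariance."""
    g = numerical_gradient(func, params)
    size = len(g)
    variance = sum(g[i] * cov[i][j] * g[j] for i in range(size) for j in range(size))
    return math.sqrt(variance)


estimates = [0.5, 1.0, 1.0]                                    # hypothetical b, psi, theta
cov = [[0.01, 0.0, 0.0], [0.0, 0.02, 0.0], [0.0, 0.0, 0.015]]  # hypothetical
print(r_square(estimates), delta_method_se(r_square, estimates, cov))
```

Because the standard error depends on the parameter estimates and on their sampling variances, two outcomes with similar R-square values can end up with quite different standard errors.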
Maša Vidmar posted on Wednesday, April 08, 2009 - 8:20 pm
I am running a CFA with 1 construct and three indicators. It was my belief that this should result in a just-identified model. But I get the following error message: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 5. Parameter 5 is the loading for one of the indicators. What is causing this? And how can I avoid it?
Another (unrelated) question: in a different model I have 2 constructs at two time points (so actually 4 latent variables) with 3-8 indicators. Fit indices and all estimates are exactly the same in the CFA (no order imposed) and SEM (paths) models. Is this expected?
A factor model with three indicators is just identified. You may have all factor loadings and the factor variance free. If this is not the case, you need to send the full output and your license number to firstname.lastname@example.org.
It sounds like you have a model where the four factors have an unrestricted covariance matrix such that covarying all factors gives the same number of parameters as regressing two of them on the other two. If so, the models are statistically equivalent and the fit should be the same. If not, I would need to see your full output and license number at email@example.com to answer your question.
Hi, I have attempted a CFA with 59 dichotomous items and 1000 observations. My fit statistics were less than ideal. I am attempting to analyze the misfit and report my findings to a non-technical audience. There are some issues I could use your help on:
Is the input matrix tetrachoric? If so, is there a way to obtain a print out of the input matrix?
I have several positive residuals (as high as .283) Does this indicate that my model implied correlations are smaller than my observed correlations?
Yes, the sample statistics used for model estimation are tetrachoric correlations with WLSMV and binary outcomes. These are printed if you ask for SAMPSTAT in the OUTPUT command. They can be saved using the SAMPLE option of the SAVEDATA command.
A positive residual means that the observed value is larger than the model-estimated value.
Greg posted on Wednesday, April 22, 2009 - 9:09 am
I'm new to the forum, so excuse me if my question may seem straightforward.
I'm running a CFA on 29 reflective indicators (4 factors) and 2 observed continuous dependent variables (scores on a test). The output says: *** WARNING in MODEL command Variable is uncorrelated with all other variables: scorea *** WARNING in MODEL command Variable is uncorrelated with all other variables: scoreb *** WARNING in MODEL command At least one variable is uncorrelated with all other variables in the model. Check that this is what is intended. 3 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
The output gives relationships between the latent variables (e.g. f1 WITH f2) but not between the latent variables and the scores (e.g. f1 WITH scorea).
Do I need to specify the hypothesized relationships between the latent var and the scores in the CFA already? Then, isn't it already a structural model?
I am using complex sample data with cluster, weight, and groups (2 groups). Could you please advise me on how I can compute maximum likelihood (ML), pseudo maximum likelihood (PML) and pseudo-maximum log-likelihood (PLL) estimators in Mplus Version 5.2? I am also interested in the corresponding CFI values. I didn't find the specific commands for these estimators in the Mplus manual. Thanks in advance.
Thanks, Dr. Linda, for your response. But I ran with TYPE=COMPLEX, ESTIMATOR=ML. This command gives me the following error: *** ERROR in ANALYSIS command Estimator ML is not allowed with TYPE = COMPLEX. Default will be used. 1 ERROR(S) FOUND IN THE INPUT INSTRUCTIONS. So, I can't use TYPE=COMPLEX and ESTIMATOR=ML. What is the alternative way to obtain the ML estimator?
I found the Pseudo log-likelihood (PLL) estimator in the article: Asparouhov, T. & Muthen, B. (2006). Comparison of estimation methods for complex survey data analysis.
In that article, the estimator is described by equation 10. I am calling it PLL. Do you think PML and PLL (in equation 10 of the above article) are two different estimators or the same? If these two (PML and PLL) are different, how can I estimate them in Mplus 5.2? In the literature I found that both are available in Mplus.
I want to compare a "full" model with a "basic" model and include the same variables in both models. However, in the full model, some variables are treated as 1-item measures. For example, the basic model has two factors - the first factor has two variables - and the full model is one where these same two variables are treated as separate one-item measures (i.e., are not loaded on a factor).
I am new to Mplus, and I am not sure what is the correct procedure for specifying these two one-item measures in the full model. Should I fix their variance at 1 or leave them out of the model statement but in the USEVARIABLES statement? What different assumptions would I be making with this change?
Ex: For the models below: All variables are continuous ANALYSIS: estimator=ml; OUTPUT: standardized mod(3.84) tech4;
(Basic model) MODEL: f1 by v1 v2; f2 by v3 v4 v5;
(Full model #1) MODEL: v1@1; v2@2; f2 by v3 v4 v5;
or compared to
(Full model #2) MODEL: f2 by v3 v4 v5;
>>in Full model #2, v1 and v2 are included in the "USEVARIABLES" statement but not specified in the MODEL statement<<
Dr. Muthen, I have a few outliers in my data and I do not want to eliminate them. Is there a way to run a CFA model with certain cases identified as outliers? Or should I just run two models (one with outliers and one without) and compare them.
Is it possible to use weights in conventional SEM analysis? In particular, how can I use weights to estimate CFA model parameters, when ML estimator is used in Mplus? Another question is, I found the correlation between two factors is more than 1 in CFA. This gives me error. So, is there any way to solve this problem?
These parameters are fixed by you and are therefore not estimated. "999" says that something should not or could not be computed, which clearly is the case here.
Anne DeField posted on Tuesday, September 15, 2009 - 2:12 pm
I am running a CFA with both metric and dichotomous items. Here I assume the following structure: a latent variable is represented by four observed variables. Three of them consist of metric items and one consists of dichotomous items (yes/no). How do I handle these items in this factor analysis?
Put the dichotomous one on the CATEGORICAL list in the VARIABLE command. The default estimator for this situation is WLSMV. You can ask for maximum likelihood using the ESTIMATOR option of the ANALYSIS command.
Anne DeField posted on Thursday, September 17, 2009 - 7:38 am
Da C posted on Tuesday, November 03, 2009 - 7:41 pm
I ran a 3-factor CFA model with categorical/ordinal indicators. The means of the 3 factors were set to 0 and their variances set to 1. The loading of the first indicator within each factor was set free. This analysis was weighted and I used the WLSMV estimator.
The model results and STDYX results were identical.
Are these results the structure or pattern loadings? Whichever it is, how would I estimate one from the other?
The raw coefficients and STDYX are the same because factor variances are one and variances of the latent response variables underlying the categorical variables are one.
The coefficients are factor pattern coefficients. The matrix for these coefficients is lambda. The matrix for the factor variances and covariances is psi. The product of the two matrices gives the factor structure coefficients.
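The product described above can be sketched with a small example. The loadings in lambda and the factor correlation in psi are hypothetical values for a two-factor simple-structure model with unit factor variances; the helper name is illustrative only.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical pattern coefficients (rows: indicators, columns: factors).
lambda_matrix = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.5]]
# Hypothetical factor covariance matrix (variances fixed at one).
psi_matrix = [[1.0, 0.5], [0.5, 1.0]]

structure = matmul(lambda_matrix, psi_matrix)
for row in structure:
    print([round(value, 3) for value in row])
```

Note that an indicator with a zero pattern coefficient on a factor still has a nonzero structure coefficient on it whenever the factors are correlated.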
Da C posted on Wednesday, November 04, 2009 - 1:46 pm
Thank you very much for your quick reply!
I have yet another question concerning my analyses.
I first ran an EFA with categorical/ordinal indicators. This analysis was weighted and I used the WLSMV estimator. Based on the interpretability of this EFA and the fact that there were 3 eigenvalues > 1, I chose the 3-factor solution.
Based on the 3-factor simple structure I identified in the EFA, I ran a CFA model. As I described in the previous post, I ran a 3-factor CFA model with categorical/ordinal indicators. The means of the 3 factors were set to 0 and their variances set to 1. The loading of the first indicator within each factor was set free. This analysis was weighted and I used the WLSMV estimator.
My question concerns the latent correlations from the EFA 3-factor solution and the CFA confirming the 3-factor model. The CFA correlations are much greater than the EFA correlations:
1st & 2nd factor: 0.47 vs. 0.79 1st & 3rd factor: 0.57 vs. 0.92 2nd & 3rd factor: 0.46 vs. 0.73
Why would this occur? Does this indicate an issue with the model or latent correlation estimation in EFA or CFA? Which one is the correct one?
When you create the simple structure CFA and fix factor loadings to zero, this influences the correlations between the factors by forcing the relationship to go through the factors. See the Asparouhov and Muthen and Marsh papers on the website under ESEM. This issue is discussed.
Daiwon Lee posted on Monday, March 08, 2010 - 2:32 pm
I am trying to confirm my measurement model by applying CFA to each latent construct. However, when I run a CFA using three items, misfitting seems to occur automatically. Every time I use CFA with three-item constructs I get .000 for RMSEA and SRMR and 1.000 for CFI/TLI. Please see the syntax below.
TITLE: CFA for material strain w1 base model;
DATA: File is .dat ;
VARIABLE: NAMES ARE .....
usevariables are clothes1 allow1 goods1; Missing are all (-9999) ;
The reason that you get this is that a factor model with three indicators is just-identified. Fit cannot be assessed. There are no degrees of freedom.
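The degrees-of-freedom count behind this can be sketched as below. The sketch assumes continuous indicators, a covariance structure only (means and intercepts cancel out), no correlated residuals, and the scale set by fixing either the first loading or the factor variance; the function name is illustrative.

```python
def one_factor_cfa_df(n_indicators: int) -> int:
    """Degrees of freedom of a one-factor CFA for continuous indicators."""
    # Unique variances and covariances among the indicators.
    sample_moments = n_indicators * (n_indicators + 1) // 2
    # (p - 1) free loadings + 1 factor variance + p residual variances = 2p.
    free_parameters = 2 * n_indicators
    return sample_moments - free_parameters


for p in (2, 3, 4, 5):
    print(p, one_factor_cfa_df(p))
```

Three indicators give zero degrees of freedom (just-identified), four give two, and two indicators give a negative count, meaning a two-indicator factor is not identified on its own.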
Daiwon Lee posted on Monday, March 08, 2010 - 9:26 pm
Thank you for the note. Then, is there any way to determine whether the just-identified model is a good fit or not? In other words, how can we determine whether a three-item construct model is a good fit or not? Thank you!
No. That is why it is a good idea to have no fewer than four factor indicators.
Brian Hall posted on Thursday, March 18, 2010 - 3:24 pm
Hi, I am working on a couple of CFA models. Here are the model statements: MODEL: F1 BY x1 x2 x3 x4 x5; F2 BY y1 y2 y3 y4 y5 y6 y7; F3 BY z1 z2 z3 z4 z5; F4 BY F1 F2 F3;
MODEL: F1 BY x1 x2 x3 x4 x5; F2 BY y1 y2; F3 BY z3 z4 z5 z6 z7 z1 z2 z3; F4 BY zz4 zz5; F5 BY F1 F2 F3 F4;
I am getting the following error: WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. Model1 PROBLEM INVOLVING VARIABLE F2. Model2 PROBLEM INVOLVING VARIABLE F4.
I do have in both cases correlations with the latent variable F2 or F4 that exceed 1, and errors that are negative.
My question is how best to deal with this issue.
I have tried fixing the variance of the offending variances to 0 using this command: F2@0 or F4@0. When I do this, my models become unidentified.
I have tried using different start values, even high values using this command: F2*.7 or F2*2, and this did not change the PSI error.
I have tried fixing the variance of those latent variables to 1 using this command: F2@1 or F4@1. This allowed the models to run (Rindskopf; Psychometrika, 1983). Is this a correct thing to do? I appreciate your help. Brian
This might allow you to see if the problem is that the first factor loading is not close to one so fixing it to one is a problem. You should not both fix one factor loading to one and fix the factor variance to one.
finnigan posted on Tuesday, April 20, 2010 - 6:41 am
I have a 56-item scale answered on a 5-point Likert scale. I am planning to do a two-level CFA.
Unfortunately the scale items are not normally distributed despite attempts to use log transformations.
Am I correct in saying that a factor analysis can be conducted in Mplus using MLR, which adjusts chi-square for non-normality, while using ML under non-normality should give larger standard errors? It might be useful to compare the two outputs.
Can Mplus provide corrected correlations as part of the output?
If the items have floor or ceiling effects, you should not transform them. You should treat them as categorical variables. You can then use the default of weighted least squares or use maximum likelihood. The categorical data methodology using either estimator deals with the floor and ceiling effects.
If you do not have floor or ceiling effects, you should not transform them but instead use MLR.
finnigan posted on Tuesday, April 20, 2010 - 8:17 am
How are floor and ceiling effects detected in data?
By looking at the univariate frequencies. If the lower or upper categories have a piling up of frequencies, this indicates a floor or ceiling effect. In this case, the variables should be treated as categorical.
I would not log transform, i.e. producing something that you don't have (normal distribution) and making interpretation of your results more difficult. I would use MLR instead. In case of nonnormal distributions, I guess, MLR SEs are always more trustworthy than ML SEs, so a comparison may make no sense. Floor and ceiling effects should be detected by inspecting a graphical display of the distribution, like a histogram. Many values at the higher end of your distribution indicate a ceiling effect, many values at the lower end a floor effect.
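A quick way to look at univariate frequencies outside Mplus is sketched below. The responses are hypothetical and the helper name is illustrative; no cutoff is applied because none is stated here.

```python
from collections import Counter


def category_proportions(responses):
    """Proportion of responses in each observed category, in sorted order."""
    counts = Counter(responses)
    total = len(responses)
    return {category: counts[category] / total for category in sorted(counts)}


# Hypothetical 5-point item with responses piling up in the lowest category.
item = [1, 1, 1, 1, 1, 1, 2, 2, 3, 5]
print(category_proportions(item))
```

Here 60% of the responses fall in category 1, which is the kind of piling up that suggests a floor effect.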
finnigan posted on Wednesday, April 21, 2010 - 9:01 am
Is there any recommended cut off criteria such as 20% of observations located on the lowest or highest response category?
Thanks again for the help. If so, are there any references to support the cutoff?
It is the bivariate tables that are most important. They should not contain zero cells.
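Checking a bivariate table for zero cells can be sketched like this. The two items and their responses are hypothetical, and the helper simply lists category combinations that never occur together.

```python
from collections import Counter
from itertools import product


def empty_cells(item_a, item_b, categories):
    """Category pairs with zero frequency in the cross-table of two items."""
    table = Counter(zip(item_a, item_b))
    return [cell for cell in product(categories, repeat=2) if table[cell] == 0]


# Hypothetical responses of five subjects to two 3-category items.
item_a = [1, 1, 2, 2, 3]
item_b = [1, 2, 2, 3, 3]
print(empty_cells(item_a, item_b, categories=(1, 2, 3)))
```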
Enrique posted on Thursday, April 22, 2010 - 10:07 am
I used a questionnaire with 25 Likert-type items (5 levels), sample n = 129. Ordinal variables, and multivariate non-normal distribution. I ran an EFA with promax rotation, which showed that the 25 items are grouped in 4 factors: all variables loading > 0.55 except 1, and 4 variables with crossloadings > 0.25. When I ran a CFA on these factors, the fit was very poor (RMSEA = 0.20; CFI = 0.84; SRMR = 0.17). The same happens when I change paths according to the suggested modification indices, choose 1, 2 or 3 factors, or eliminate the items with crossloadings. I need some help testing the source of misfit.
Why don't you use the Mplus default EFA rotation method, which will give you SEs for all factor loadings so you can see which ones should not be fixed at zero in the CFA? It also gives you Modification Indices for residual correlations.
Or, don't move to CFA but stay with ESEM - see our web site.
Enrique posted on Friday, April 23, 2010 - 2:51 am
Thanks. Sorry, but I'm not familiar with that acronym. What are SEs?
These are standard errors of the parameter estimates. The ratio of the parameter estimate to its standard error is a z-test which assesses significance.
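The ratio described above and its two-tailed p-value under a standard normal reference distribution can be sketched as follows. The estimate and standard error are hypothetical.

```python
import math


def z_test(estimate: float, standard_error: float):
    """z statistic and two-tailed p-value under a standard normal."""
    z = estimate / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical loading of 0.98 with a standard error of 0.50.
z, p_value = z_test(0.98, 0.50)
print(round(z, 2), round(p_value, 3))
```

A ratio near 1.96 in absolute value corresponds to a two-tailed p-value of about 0.05.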
finnigan posted on Saturday, April 24, 2010 - 12:16 am
If I have data that appear MCAR but demonstrate significant non-normality, can MLR still be used if there are no ceiling effects? Or should FIML be used on account of the missing data, and if so, what deals with the non-normality of the data?
I'm running a longitudinal CFA with 6 time points and 4 indicators per time point. I specified my model as in example 5.1 as the initial model for a sequence of measurement invariance tests. Although my sample is N=380, I got a large significant chi-square. Could there be some reason for this other than poor model fit?
It means that the model was not able to converge in the default number of iterations. See pages 415-417 of the Version 6 User's Guide for suggestions. If these do not help, send the output and your license number to firstname.lastname@example.org.
I am having some trouble with my analysis. I am running a CFA for my path analysis. The model fit of the latent factors is good (CFI = 1, RMSEA = 0); however, the two-tailed p-values of the unstandardized results are non-significant for some indicators. I already rescaled (log-transformed) my variables since the variance was very large, and I constrained the variance of the factors (@1).
Furthermore, I receive the warning: THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE. The problem involves my indicators. My data include negative values; could this be the problem? And could it also affect the significance of the indicators?
I have performed a simple two-factor CFA and received a positive correlation between factors (.50), however it is contradictory to my expectation (I expected a negative correlation). Then I calculated by hand the scores with summarizing the appropriate items, and calculated the correlation which was negative (-.15) as I would expect it. Do you have any idea about this discrepancy? Which correlation should I rely on and which should be reported?
It sounds like you may be reading the data incorrectly or that the factor loadings are all negative resulting in a sign change for the covariance. I would need to see the full output, your calculations, and your license number at email@example.com to say for sure.
nanda mooij posted on Wednesday, August 04, 2010 - 11:17 am
Hi, I have a question about my model, it doesn't fit (not positive definite). This is what I'm trying to fit: f1 BY v1-v16; f2 BY v17-v32; f3 BY v33-v48; h1 BY f1 f2 f3; In the output I saw that the correlation between f1 and h1 is above 1. I've tried to fix f1 at zero, like this: f1@0, but that doesn't help. I also tried to add a correlation to the model: h1 WITH f1, and tried to fix the correlation with the factor loadings freed, as I read in the discussions, but that also doesn't make the model fit. So what more can I do to fit the model when there is a high correlation between two factors?
When you add the second-order factors, I assume that the residual variance of f1 is negative and this is why you fix it to zero. Is this the case?
nanda mooij posted on Wednesday, August 04, 2010 - 3:20 pm
Yes, Linda, this is the case. Also, the output says the R-square of f1 is undefined. When I fix the res. var. of f1 to zero, Mplus says the covariance matrix is not positive definite, while I don't see anything strange about it, no negative values in that matrix.
nanda mooij posted on Thursday, August 05, 2010 - 6:37 am
How can I solve this problem? Do I have to leave the second order factor out of the model or can I still fit this model in another way?
If the negative residual variance is small and not significant, you can fix the residual variance at zero and ignore the not positive definite message. If the residual variance is zero that means that the 1st-order factor represents the 2nd-order factor perfectly.
nanda mooij posted on Thursday, August 05, 2010 - 2:01 pm
Thanks for the answer. If I want to get the factor scores of f1, f2, f3 and h1, can I fit the model without h1, so that the factor scores will be calculated for f1, f2 and f3. Can I then assume that the factor scores of f1 are the same as the factor scores of h1?
I am conducting a CFA with order categorical indicators.
I conducted the CFA with the categorical option specified, and was wondering how best to interpret the fit indices associated when the WLSMV estimator is used.
I ask because it seems highly unlikely to me that the fit indices under WLSMV are directly and perfectly comparable to fit index values that are obtained in the "usual" CFAs conducted in Psychology with Max.Likelihood and indicators that are treated as continuous.
I realize that "rules of thumb" are usually overly simplistic, but are there any journal articles that address how to interpret fit indices under WLSMV?
See the Yu dissertation on our website where cutoffs are looked at for binary and continuous outcomes. She finds similar cutoffs in both cases.
Yu, C.Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes. Doctoral dissertation, University of California, Los Angeles.
Dallas posted on Saturday, October 30, 2010 - 4:28 am
I have a question. I'd like to conduct a full-information factor analysis of categorical data. I have 19 indicators of a single trait. I understand that I simply have to specify the estimator to get this. I've done that. However, I would like to allow some of the uniquenesses to correlate in the model. When I do this with ML, I get an error saying "Covariances for categorical, censored, count or nominal variables with other observed variables are not defined". How can I go about allowing correlated errors in a factor model with categorical data and maximum likelihood estimation?
I need your advice on conducting a simple CFA with one factor. There is an error message I don't know. Even though everything is right with the names of the variables and missing values are defined as 99, there is this error: "unable to expand" the factor. Please help me. Tnx
The following syntax was used: VARIABLE: Names are tsk1_a tsk2_a tsk3_a tsk5_a tsk6_a tsk7_a tsk9_a tsk10_a tsk11_a tsk13_a tsk14_a tsk15_a tsk17_a; Missing is ALL (99); Model: TSK-13 by tsk1_a tsk2_a tsk3_a tsk5_a tsk6_a tsk7_a tsk9_a tsk10_a tsk11_a tsk13_a tsk14_a tsk15_a tsk17_a; Analysis: estimator=mlr; Output: Standardized; modindices; residual;
Skewed items can cause a non-normal factor score distribution. This is not necessarily an indication that the normality assumption for the factor is wrong, but rather that the items are not optimal - they don't discriminate well between people with high or low factor scores.
ywang posted on Friday, November 12, 2010 - 11:43 am
Dear Drs. Muthen,
We would like to report the t-test results of a latent factor. Is it possible to conduct a T-test for a latent factor underlying three indicator variables with Mplus?
For this you do a 2-group analysis with measurement invariance and estimate the factor mean in one group with the mean fixed at zero in the other group. The z-test for that mean is then what you want. See Topic 1. You don't do it via estimated factor scores.
Dear all, I have 2000 respondents who have each rated two out of a set of 10 (rotating randomly) stimuli. The respondents have answered 20 questions for each stimulus. So, a total of 400 responses per stimulus X 20 variables. The grand total dataset contains 4000 judgments [2000 respondents X 2 stimuli each] x 20 variables. I want to perform EFA and CFA for each stimulus, but I am still unclear about the treatment of the 'repeated measures' nature of the exercise given the random rotation of the stimuli shown to respondents. Any comments?
Furthermore, a separate issue.. Some of the stimuli are nested within others in terms of attributes. So, this also is a 'nested' EFA / CFA problem. Any comments on that front too?
I should preface this with saying that I am not an expert on this type of design, but here is my quick take.
The random rotation of stimuli assures that the subject groups corresponding to different pairs of stimuli are randomly equivalent so that you can say that all stimuli responses draw from the same population.
I would just do one analysis per stimuli pair, so using 400 subjects for each analysis where the number of variables is 2x20=40.
If you feel there are important differences between stimuli pairs, you could do a multi-group analysis for the 5 groups to compare factor solutions.
Regarding your last question, I don't see a special modeling for this, but would merely keep this nesting in mind when interpreting the findings.
ywang posted on Monday, November 29, 2010 - 12:23 pm
Dear Drs. Muthen: For the CFA model using MLR, the scaling correction factor for the loglikelihood under H0 and H1 is 1.168, but the scaling correction factor for the chi-square is 1.0. For this model, should we use MLR or ML? Is there any cutoff for the scaling correction factor that indicates whether we should use MLR or ML? Also, are the chi-square and loglikelihood scaling correction factors always different?
The scaling correction factors differ for chi-square and the loglikelihood due to the fact they are in different metrics. Generally speaking you should use MLR if the scaling correction factor is different from one.
Catherine posted on Friday, March 04, 2011 - 4:47 am
Dear Dr Muthen,
I want to test a 3-factor model with categorical variables. Now the 3 factors only contain 19 of the total 26 variables. What should I do with the other 7 variables?
Should they still be in the USEVARIABLES option or not?
And how will I know if these variables load on one of the three factors?
All variables on the USEVARIABLES list are used in the analysis. You should leave the variables off if you don't want them included. If you want to see how they load on the factors, you need to include them. It sounds like you should start with an EFA of the total set of variables.
For a one factor CFA solution, I am getting an error message - "chi-square value could not be computed because of too many categories" - Can you please help me understanding why am I getting this message ? I have total 11 items which are ordinal in nature ( 1- 5 scale) Thanks Reshmi Gupta
Hi, I have 3 latent variables, each latent variable consists of 3 indicators, the indicators are all Likert scale 5-points.
My question is: Likert scale as indicator is treated as continuous or categorical variable, hence can i say that my latent variable is categorical (expressed by c) or continuous (expressed by f)when i run CFA model. Thanks
If you put the Likert variables on the CATEGORICAL list, they are treated as categorical. If you do not, they are treated as continuous. In both cases, the latent variables specified using the BY option are continuous.
I ran a CFA and the residuals and the fit indices sound too good to be true (RMSEA = .00, CFI= 1.00). The factor coefficients are between 0.6 and 0.91, the alphas for the factors are .89-.95, and the correlations between items belonging to the same factor range between 0.7-0.98.
1. I am guessing there is a lot of multicollinearity in this set up. Hence, the high fit. Correct me if I am wrong. ---Is there a way to deal with this problem? I read somewhere that centering might be a technique that people use. Any suggestions?
2. A factor requires at least 4 items. He is left with two items in a factor (in another model). But these are important items. How do people normally deal with this situation (you want to include the items, but do not want to construct a factor with 2 items)?
Regarding the two items, if that is all you have you have no choice but to use them knowing the pitfalls. The model is not identified unless information is borrowed from other parts of the model and model fit for that factor cannot be assessed.
Hi, I have two latent variables (my model two main constructs), i have run CFA for the first latent variable (first construct) then I have run CFA for the second latent variable (the second construct) and i got good fit indices for each construct, Now I want to check the covariance between those two latent variables (constructs) but i do not know how can i do that? could you please give any advice. Thanks.
You can start from the inputs found in the Topic 2 course handout on the website under multiple group analysis. This looks at binary items. You can extend it using the inputs shown in the Topic 4 course handout under multiple indicator growth. This uses ordinal items.
1. Categorize the continuous variable and use multiple group analysis. 2. Use the XWITH option to create an interaction between the factor and the continuous variable and regress the factor indicators on the interaction.
I have an annoying problem. I am running a multigroup CFA. When I try to save factor scores using SAVEDATA the fit statistics and factor loadings I get are different from the model without the SAVEDATA command. Otherwise everything is identical between two specificaitons.
Do you know of any rules-of-thumb for the maximum number of indicator variables you should have on a latent factor? I read some ideas about parceling variables but it seems controversial….so I’m wondering at what point you start worrying about the number of items you have for a scale.
I don't think there is an upper limit on the number of factor indicators. If you have 15 unidimensional items, that should be sufficient for creating a sum score. So between 4 and 15 should be optimal.
Oana Lup posted on Tuesday, May 31, 2011 - 5:44 am
Could you please help?
I test for significance of the difference between intercepts for east european and west european countries.
USEVARIABLES ARE poldisc polint media pid female age agesq educlev emp mar urban swi rus por den wger eger nl slo nor ro mold sp; CENTERING = GRANDMEAN (poldisc, polint, media, age, agesq, educlev); Missing are all (-999);
poldisc ON polint media pid female age agesq educlev emp mar urban swi(mswi) rus(mrus) por(mpor) den(mden) wger(mwger) eger(meger) nl(mnl) slo(mslo) nor(mnor) ro(mro) mold(mmold) sp(msp) !swe (mswe) ;
and I get these warnings : *** WARNING Warning: No specification of mean structure analysis in 'ANALYSIS' paragraph. *** ERROR in Model Constraint command Unknown parameter label in MODEL CONSTRAINT: NEW in assignment: NEW = (HM)
With the default WLSMV estimator, categorical variables must have the same categories in each class. You can collapse categories to achieve this.
chuma owums posted on Tuesday, June 21, 2011 - 12:08 pm
I am new to Mplus and was wondering how to interpret the confidence intervals of a bootstrapped indirect effect. In particular, the output from my analysis shows that zero was not within the lower and upper 2.5% limits of the bootstrapped CI, but it was within the .5% limits. Do these percentages correspond to 97.5% and 99.5% confidence levels? Any help with this would be greatly appreciated.
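A note on reading such limits, as a minimal sketch and not Mplus's own routine: in a percentile bootstrap, a limit labelled 2.5% cuts off 2.5% of the draws in one tail, so the lower and upper 2.5% limits together bound a 95% interval, and the lower and upper .5% limits bound a 99% interval. The draws below are simulated and the function names are hypothetical.

```python
import random


def percentile_ci(draws, tail):
    """Return (lower, upper) limits that each cut off `tail` of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    lo_idx = int(tail * n)       # index of the lower limit
    hi_idx = n - 1 - lo_idx      # mirror-image index for the upper limit
    return ordered[lo_idx], ordered[hi_idx]


def coverage(tail):
    """Two-sided confidence level implied by a per-tail cutoff."""
    return 1.0 - 2.0 * tail


# Simulated bootstrap draws of an indirect effect (made-up numbers).
random.seed(1)
draws = [random.gauss(0.30, 0.12) for _ in range(5000)]

for tail in (0.025, 0.005):
    lower, upper = percentile_ci(draws, tail)
    print(f"{tail:.3f} in each tail: {coverage(tail):.0%} CI = [{lower:.3f}, {upper:.3f}]")
```

Under this reading, zero falling outside the 2.5% limits but inside the .5% limits would mean the effect is distinguishable from zero at the 95% level but not at the 99% level.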
The message means that for these subjects the optimum could not be found by the iterative technique, perhaps due to an unusual response vector. Also, be sure that you use the latest 6.11 version of Mplus.
I am having a very strange problem. In SPSS one of my variables has a mean of 0 but then in mplus it says that the mean is 11. I have tried everything, including combing through the spss and text files by hand to see if there are large values or if something got read in incorrectly. I have no idea what could be going on. Thank you!
It sounds like you are reading the data incorrectly. You may have blanks in your data set. Free format data cannot contain blanks. If you can't see the problem, please send your output, data, and license number to firstname.lastname@example.org.
Lena Herich posted on Friday, August 19, 2011 - 11:16 am
I was reading your article “Applications of continuous-time survival in latent variable models for the analysis of oncology randomized clinical trial data using Mplus.” In chapter five, several different latent variable models are fit to the data. My question is why exploratory factor analysis, and not confirmatory factor analysis, was used for the 1f, 2f and 3f models. Would it also have been possible to fit confirmatory models, and what would be the differences?
That's certainly possible, but there was not really any well-formed substantive theory behind the measurement instrument that called for specific CFAs.
But note that M4 is a CFA.
Eric Teman posted on Friday, August 26, 2011 - 8:31 pm
I have a 3 factor CFA (with 4 indicators per factor). When I output the results ASCII file, factor loading estimates only appear for 3 indicators per latent variable. Is there a reason that all the factor loading estimates do not appear in the ASCII file?
I am running a two-factor CFA on a dataset containing 107 variables. We keep getting the following warning when trying to run the code:
Warning: The estimation of a model with 107 variables with the WLSMV estimator may be slow. Using the VLSMV estimator will produce more timely results. If analysis with the WLSMV estimator is desired, try specifying NOSERROR and NOCHISQUARE in the output command to reduce computation command.
Here is my input. Do you see any error that could be causing this warning?
Regarding an older post of yours from 2010, in my multilevel CFA I also get a warning for a negative residual variance of an item at the between level. You said that if it is very small and non-significant, it can be fixed to 0 or 0.001. It's actually -.001 and non-significant in my case. If I fix it, do you think I have to report that in my paper? Is there a reference for that?
This will not change your solution so I would simply mention it in a footnote. See the following paper which is available on the website:
Muthén, B. & Asparouhov, T. (2011). Beyond multilevel regression modeling: Multilevel analysis in a general latent variable framework. In J. Hox & J.K. Roberts (eds), Handbook of Advanced Multilevel Analysis, pp. 15-40. New York: Taylor and Francis.
Hi, We have a factor analysis model fit in two independent groups with invariance constraints. One unique variance estimate is negative. There are no identification problems. As an experiment, we imposed a positivity constraint on the unique variance in both groups using the model constraint command. The result of that is a model that converges, but we get a message about the standard errors not being computed, and a possible identification problem with parameter 8. Yet there are only 7 free parameters; no parameter 8 exists in the parameter count. Can you tell us what is going on? Why would the model be identified without the constraint, but not identified with the constraint?
When a model is estimated with inequality constraints, the constrained parameter is replaced by a so-called slack parameter. In your case, Variance = slack*slack, and this keeps the variance from going negative. The actual parameter in the model is the slack parameter, and that is parameter 8. Typically, in a situation like the one you describe, this parameter will be estimated at a value near 0, and thus the variance itself will be estimated at 0, which is a parameter value on the border of the admissible solutions. In that case the standard methodology for computing SEs is not valid, and currently Mplus just reports that as a problem. The bottom line is that in this case the standard error for the variance parameter, if estimated at the borderline value of 0, is not reliable. All other results are fine.
Without looking at the exact results I am not 100% sure of the above answer so if this doesn't make sense for your situation send the example to email@example.com
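The slack substitution described above can be sketched as follows. This is only an illustration of the idea, not Mplus's internal code, and the function names and numbers are hypothetical. It shows that the squared slack can never be negative, and that a first-order (Delta method) standard error for the variance shrinks to zero as the slack approaches zero, which is one way to see why the reported SE is not trustworthy at the boundary.

```python
def variance_from_slack(slack):
    """Variance expressed through a slack parameter; never negative."""
    return slack * slack


def delta_method_se(slack, se_slack):
    """First-order SE of the variance: |d(variance)/d(slack)| * SE(slack)."""
    derivative = 2.0 * slack
    return abs(derivative) * se_slack


# Hypothetical slack estimates moving toward the boundary value of 0.
for slack in (0.5, 0.05, 0.0):
    variance = variance_from_slack(slack)
    se = delta_method_se(slack, se_slack=0.1)
    print(f"slack={slack:.2f}  variance={variance:.4f}  first-order SE={se:.4f}")
```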
yezi posted on Wednesday, November 09, 2011 - 4:04 pm
Hi, When we do ESEM, is the construct of the test ECFA (that is, ESEM)? That is to say, does each item load on each factor? Thank you very much!
We used the WLSMV estimator for a CFA with 3 factors. There are 11 observed variables on each factor. For 2 of the 3 factors, the indicators are binary, while the indicators for the 3rd factor have 3 categories.
A reviewer now asks how binary-based correlations were corrected. He/she claims this to be necessary whenever limited information is used in MPlus.
However, we did not use limited information but raw data of item responses. Is, in this case, a correction necessary and if so, is this done automatically? Can I find any information about this in the manual or the technical reports?
I'm not sure what the reviewer is asking about. The sample statistics for model estimation for WLSMV for a model with no covariates are tetrachoric and polychoric correlations.
Wen-Hsu Lin posted on Thursday, March 08, 2012 - 1:41 am
I try to fit a simple CFA model. I have 4 observable variables, which are all categorical variables, and want to see if they load on one latent variable. However, Mplus kept giving me the following error message. What is wrong? INPUT INSTRUCTIONS data: file is c:\crime1.dat; type is individual; format is 4f1.0; variable: names are w1c1-w1c4; usevariable are w1c1-w1c4; categorical are w1c1 w1c2 w1c3 w1c4; missing is blank; model: dev1 by w1c4 w1c1 w1c2 w1c3; output: sampstat stand mod (4); *** ERROR The number of observations is 0. Check your data and format statement. Data file: c:\crime1.dat *** ERROR Invalid symbol in data file: "ï»¿0000" at record #: 1, field #: 1
You seem to have a problem in the data file. Please send your output, data, and license number to firstname.lastname@example.org.
Wen-Hsu Lin posted on Thursday, March 08, 2012 - 5:38 pm
I am using a public computer to analyze it. I know it looks like I have a problem in the data file. However, I checked it and did not see a problem, nor does SPSS report any strange numbers or anything like that.
It sounds like the dataset may be saved in an incorrect format. Try opening your dataset in Excel and resaving it as a txt file.
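One possible cause worth checking, offered as an assumption and not a diagnosis: the characters "ï»¿" in the error message are what a UTF-8 byte-order mark (the bytes EF BB BF) looks like when read as a single-byte encoding. If that is what happened here, removing those three bytes from the start of the file should leave plain text that can be read in. A minimal sketch, with hypothetical function and file names:

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 byte-order mark if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def clean_file(src_path: str, dst_path: str) -> None:
    """Copy a data file, removing a leading BOM (paths are hypothetical)."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_utf8_bom(raw))


# In-memory demonstration with made-up records.
sample = codecs.BOM_UTF8 + b"0000\n1010\n"
print(repr(sample.decode("latin-1")[:7]))   # how the first field looks with the BOM
print(strip_utf8_bom(sample))               # the same records without the BOM
```

A call such as `clean_file("crime1.dat", "crime1_clean.dat")` would then write a copy without the mark; the second file name is made up for illustration.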
seefeh posted on Thursday, March 29, 2012 - 3:31 am
Hello, I would like to run a multiple group (males vs females) CFA using indicator values that are non-normally distributed. I have tried a number of transformations (log, square root and reciprocal), but indicator skew is not reduced to <2. I have tried running a CFA with the indicators as count data, but this doesn't seem to work when running a multiple group analysis. Is there a way around this issue?
If your variables are continuous and do not have a piling up at either end, using a non-normality robust estimator like MLR should be sufficient. If they have a piling up at either end, you can consider treating them as censored.
I conducted a CFA on a two-factor model, where each factor had 2 indicators. The residual variances of all 4 indicators were constrained equal. I wanted to test whether a 2-factor or a 1 factor solution was best- thus, I ran the model once where the two factors were allowed to freely correlate, and again where I constrained the correlation between the two-factors to 1. I determined that the 1 factor solution was better than the 2 factor solution using a chi-square difference test of the nested models.
Subsequently, I ran a 1-factor model, with the same 4 indicators as before. The residual variances of all 4 indicators were still constrained equal. However, I found that when run as a 1-factor model, the model fit indices changed significantly, as did the residual variances. Could you explain why these parameters changed when I changed the model from a 2-factor model (with the factors correlated @1) to a 1-factor model?
Below are my syntax for reference.
2-factor model: DISORG by ydiseng ychaotic; CONTROL by yrigid yenmesh; ydiseng(1); ychaotic(1); yrigid(1); yenmesh(1);
This setup does not constrain the factor correlation to 1, but the factor covariance. This is because your factor variances are not one but freely estimated.
You can instead free all factor loadings by using * and fix the factor variances to 1. And then fix the covariance which is then a correlation. But note that the chi-square difference test is suspect here because you are on the border of the admissible parameter space, namely a correlation of 1.
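The difference between fixing a covariance and fixing a correlation can be checked with a few lines of arithmetic. This is a sketch with made-up variance values, and the function name is hypothetical. A covariance of 1 equals a correlation of 1 only when both factor variances are 1.

```python
import math


def factor_correlation(covariance, variance_1, variance_2):
    """Correlation implied by a covariance and the two factor variances."""
    return covariance / math.sqrt(variance_1 * variance_2)


# Covariance fixed at 1 with freely estimated (hypothetical) factor variances.
print(factor_correlation(1.0, 1.6, 0.9))

# Covariance fixed at 1 with both factor variances fixed at 1.
print(factor_correlation(1.0, 1.0, 1.0))
```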
I have a question regarding the Scaling Correction Factor.
When running CFAs I get Scaling Correction Factors for MLR of >3 even though there is high Goodness of Fit (e.g., CFI = 0.962; TLI = 0.958; and RMSEA = 0.043). What does the high Scaling Correction Factor tell me about the model?
Also - is there a rule of thumb regarding what should be classified as a satisfactory SCF value?
I am trying to run a multilevel CFA where I have children nested in families; fewer than half of the families have more than 1 child. The purpose of the multilevel modeling is simply to account for dependency in the data, not to draw strong conclusions about factors which vary within and between.
I have run the following using raw data:
VARIABLE: NAMES ARE famid relrf1-relrf8 fmon1-fmon4 psinv1-psinv2; CLUSTER = famid ANALYSIS: TYPE=TWOLEVEL; ESTIMATOR=ML MODEL: %WITHIN% relatew by relrf1-relrf8; monitorw by fmon1-fmon4; involvew by psinv1-psinv2; %BETWEEN% relateb by relrf1-relrf8; monitorb by fmon1-fmon4; involveb by psinv1-psinv2; OUTPUT: STDYX modindices residual; TECH4;
First, Mplus doesn't seem to be recognizing my families by famid because it says "Number of groups 1". Next, I'm getting the error that the correlations among many of my items are either 1.00 or 0.994. However, when I examine a correlation matrix, none of the items appear to be correlated greater than r=.49.
I've attempted examining these data as a single level model, in case the small number of clusters with more than 1 case was causing a problem, and I'm still getting the same correlation messages.
I am trying to include in an SEM model an acquiescence style factor like the one proposed by Billiet and McClendon (2000). I can't figure out how to impose the constraints to measure the style factor. This is also complicated by the fact that the items use different scales and are coded in different directions (sometimes a large code represents agreement, sometimes disagreement).
The basic idea is to model acquiescence (i.e., responding positively regardless of the question content) using a latent variable. For this at least two balanced sets of items are needed (i.e., for some questions answering positively represents positive attitudes while for others negative attitudes). In the articles the authors used LISREL8 and the figure presented shows only a "+1" (for all the questions a higher score represented agreement and they assumed the effect to be equal for all the questions) for the relationships from the items to the acquiescence factor. Considering that I have different scales (5-7-10 categories) that are ordered in different directions (agreement represented by a smallest code or by the highest) I was wondering if I can impose something like positive or negative relationships (eventually with the possibility of freeing the equality constraint) so I won't need to recode all the questions in the same direction.
You might want to take a look at Confirmatory Factor Analysis for Applied Research by Timothy A. Brown. He uses Mplus for Multi-Trait Multi-Methods models. You might get some ideas from his Mplus syntax that you can apply to your situation.
Sam Hawes posted on Wednesday, August 08, 2012 - 8:42 am
I've run a model attempting to identify a latent trait. The fit indices are good (CFI = 1.00, TLI = 1.00, RMSEA = 0.00), but I wanted to check and see if anyone sees any problems in attempting to identify a latent trait with the following model setup. Thank you for your help.
im by im1* im2 (1); el by el1* el2 (2); ca by ca1* ca2 (3);
If Mplus doesn't complain about the model not being identified, it most likely is. It seems that it could be.
Sam Hawes posted on Thursday, August 09, 2012 - 7:01 am
Thank you for your quick response. Would it be accurate to say that the latent variables in the model represent the trait-like stable aspects of the three constructs across the two timepoints? Thank you again.
I have a categorical (5-point Likert scale) single-item latent variable. I know I can treat the single-item latent variable directly as observed in Mplus, but in the CFA framework this would not allow me to assess the model fit of the one-factor vs. two-factor solution, which is what I am after. Therefore, my syntax would look like: F1 BY y1* y2 y3; F2 BY y4; [F1@0]; F1@1; F2@0;
My questions are: 1.) I know the F2 as a single item latent variable should be fixed to F2 BY y4@1; y4@a; where a = (1 - reliability)* sample variance. However, is this valid also for categorical (Likert scale) variables?! 2.) In case it is, I apologize for the lack of knowledge, but is there a way in Mplus how to actually obtain the reliability and sample variance? And if there is, could you please provide me with the relevant syntax? 3.) the initial syntax works when I do not specify y4 as a categorical variable. When I do, the THETA parametrization is necessary. However, when I use the THETA parametrization, the model does not work. Is there a solution to this problem?
The variance of a categorical variable is not an estimated parameter in a cross-sectional study so you can't fix it at zero.
Cindy Masaro posted on Wednesday, November 07, 2012 - 3:44 pm
Hi Linda, I am hoping you can help me. I am running 5 separate CFAs. Each CFA has one latent factor measured by three indicators. All indicators (in each CFA) have been measured on a 7-point scale. I have specified these indicators as categorical and used WLSMV as the estimator. For each of these CFAs I get RMSEA=0.000, CFI=1.000, TLI=1.000. Standardized factor loadings are high with small standard errors. Residuals for covariances/correlations/residual correlations are all 0.000. The modification indices (ON statements for all indicators) all show an MI of 999.000 and an EPC of 0.000. I suspect something isn't quite right, so my question is: why am I getting 999.000 for the MI, and can I put any faith in the parameter estimates, fit indices, etc.?
Dear Linda, thank you very much for your prompt answer! However, I still have a few questions. If I understood you correctly, you are saying to run the syntax with y1-y4 categorical: F1 BY y1* y2 y3; F2 BY y4@1; [F1@0]; F1@1;
When I do, I get the following warning:
WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE F2. THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 51.
Therefore, it does not work! 1.) Is there any solution to this? 2.) If I decide to treat the 5-point Likert scale variable y4 as continuous and therefore F2 will be continuous, can I say F2 BY y4; y4@0;
or do I have to set y4 at some other number than 0 (given that it is a 5-point scale Likert variable)? 3.) And if I do have to set it to (1 - reliability)* sample variance, how do I actually find out "reliability" and "sample variance".
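The quantity (1 - reliability) * sample variance mentioned in the question can be computed by hand once a reliability value is available. This is a minimal sketch that assumes y4 is treated as continuous and that the reliability comes from an outside source such as earlier studies of the item. The scores and the value 0.8 are made up, and the function name is hypothetical.

```python
import statistics


def fixed_residual_variance(scores, reliability):
    """Residual variance to fix for a single indicator: (1 - reliability) * variance."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    sample_variance = statistics.variance(scores)  # uses the n - 1 denominator
    return (1.0 - reliability) * sample_variance


# Hypothetical responses on a 5-point item and an assumed reliability of 0.8.
y4 = [1, 2, 3, 4, 5]
print(fixed_residual_variance(y4, reliability=0.8))
```

The printed value is what would follow the @ sign in a statement of the form y4@a. Whether this approach is appropriate for an item declared as categorical is a separate question that the sketch does not address.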
I hope this is the correct place to post. I want to ask a question about a CFA study that has a sample of 3276 individuals, 191 items, 8 factors, and the WLSMV estimation method. When I run it, I get an error message that says the physical memory of the computer is not enough (i5 processor, 8GB RAM, 64 bit). When I reduced the items to 131, I got the same error again. I did not try many other sizes, but the program worked when I had 77 items. I have two questions now: 1) What is the highest number of items I could analyse at a time? How can I know it? 2) The criterion for reducing the items was a previously conducted EFA study. I simply referred to the size of the loadings: I excluded items if they had a loading of .40 or lower. Is there a better way to make this decision? I mean statistically, without considering the content background of the instrument.
I never see the whole picture since I cannot run everything at the same time. So, forgive me I am a naive user of Mplus and a learner of CFA. Thanks a lot in advance regards Mustafa Yildiz
Lucy Hebert posted on Wednesday, February 13, 2013 - 1:57 pm
I have a 2 factor model with categorical indicators and solid sample size. I have 6 very distinct groups by sex and city and I am wanting to simply compare the loadings for my hypothesized CFA model between these 6 groups. Is it most appropriate to compare the unstandardized loadings or the standardized in this case? (All indicators have the same response options). Thank you in advance.
Lucy Hebert posted on Wednesday, February 13, 2013 - 1:59 pm
And to clarify: I am not doing a multi-group comparison analysis, simply comparing between different strata in separate analyses. Thanks.
It is hard to make that sort of comparison because it is confounded by group variation in factor means and factor variances. So I wouldn't use either approach. The advantage of the multiple-group analysis is that you put the factor on a common scale.
Hello, We conducted a confirmatory factor analysis (one factor measured by three indicator variables) and linked the factor to another variable using the WITH command. The sample size is 54. The unstandardized correlation coefficient is significant, but the standardized correlation coefficient is not significant. Is this inconsistency due to the small sample size? My other question is which correlation coefficient to report in this situation. Should we report the standardized or the unstandardized correlation coefficient?
Raw and standardized significance can be different because the sampling distributions of the two coefficients are different. It would be your decision which coefficient to report.
Lucy Hebert posted on Wednesday, March 06, 2013 - 1:40 pm
Unless I am mistaken, Tech10 provides estimates of standardized residuals, but if every indicator I am using has the exact same scale, then wouldn't the fitted residuals (unstandardized) be indicative of strain, and if so, what is an acceptable cut-off value for that? Also, if Tech10 is only used for mixture models, how does one get standardized residuals with a non-mixture model? Thanks in advance for your help!
For unstandardized residuals you don't get a statistical test so the cutoff is arbitrary.
You can do a single-class mixture analysis to get Tech10.
John Nelson posted on Monday, March 18, 2013 - 11:39 am
I ran a CFA using MPlus 6.12. My sample had 186 parameters. I used a randomized sample of 2,500, drawn from a sample of 5,000. My model fit very well. I now want to test this model in a sample from another country but I have only 82 in my sample. I would like to run a partial model, even just the factor loadings of the 51 items, from 10 different factors. I have read through the discussion board and searched on the web for MPlus solutions but cannot find. Could you please tell me if this is possible using my version of MPlus? Thank you!
You say that you have 10 factors. It isn't clear to me if you have 51 parameters (loadings) in the run with n=82. You don't say if your items are categorical or continuous. A sample size of n=82 is rather small unless your items are continuous or uni-dimensional.
I'm not sure how you could do this. With a sample size of 82, you should have less than 82 parameters.
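To see how far a model of this size is from the "fewer parameters than observations" guideline, the free parameters can be counted by hand. The sketch below assumes a simple-structure CFA with continuous indicators, one loading per item, and the first loading of each factor fixed at 1. These are assumptions about the model, which the post does not fully describe, and the function name is hypothetical.

```python
def cfa_free_parameters(n_items, n_factors, intercepts=True, factor_covariances=True):
    """Count free parameters in a simple-structure CFA with continuous indicators."""
    loadings = n_items - n_factors            # one loading per factor is fixed at 1
    residual_variances = n_items
    factor_variances = n_factors
    covariances = n_factors * (n_factors - 1) // 2 if factor_covariances else 0
    item_intercepts = n_items if intercepts else 0
    return loadings + residual_variances + factor_variances + covariances + item_intercepts


sample_size = 82
full_model = cfa_free_parameters(51, 10)
reduced_model = cfa_free_parameters(51, 10, intercepts=False, factor_covariances=False)
print(full_model, full_model < sample_size)
print(reduced_model, reduced_model < sample_size)
```

Under these assumptions, even dropping the intercepts and all factor covariances leaves more free parameters than the 82 observations.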
John Nelson posted on Tuesday, March 19, 2013 - 7:19 pm
Thank you for this response! I do agree I should have less than 82 parameters. Thus, I am seeking to minimize my parameters. My plan was to set all error variances to 1, the command, as I understand, is f1@1; f2@1; etc. That should remove 50 parameters.
I then plan to remove all the correlations/covariances between factors as I am interested in confirming the CFA from a previous sample from the USA and not how correlated the factors are.
My plan was to 1. Set factor variances to 1: f1@1; f2@1; etc
3. Eliminate certain correlations between factors: f1 with f5 - f9@0; this allows the correlation between f1 and f2, f3, f4 but eliminates the rest.
All of the items in my measures/subscales use 7-point Likert scales.
I would be grateful for your reflection that my plan is accurate or flawed. If I am accurate, I would like to know where I can find in the MPlus documents what the command is for me to use in the syntax so I can execute this analysis. I can also not find direction in how to minimize the number of parameters in a small sample.
I don't think fixing parameters is a good idea. You need a different model for the small group.
John Nelson posted on Saturday, March 23, 2013 - 1:15 pm
I have done extensive research on this model in 10 different facilities using large samples and the model held up well using CFA. Those studies were in the USA. This study was conducted in the Caribbean and so I would like to stick with this model to see if it applies in other countries. I did try setting the variances to 1 and so on as I stated above, but it did not help.
I see that parceling my factors makes sense. I did define how the indicators should be parceled. However, after I defined how the indicators should be parceled and ran the model, I received the following error: *** ERROR in MODEL command Unknown variable(s) in a BY statement: S1
I have a 2-factor model with 4 parcel scores in first factor and 4 parcel scores in the second factor. I have checked and rechecked but cannot see what is wrong. Any ideas?
I am getting the error message below when I try to run a CFA:
*** ERROR Invalid symbol in data file: "ï»¿5" at record #: 1, field #: 1
I have tried saving the file as a txt file, manually checking the data file, and altering the input from the SPSS file to specify the field size rather than use tab-delimited format, and I continue to get variants of this error.
We don't provide them with SAMPSTAT. With TYPE=BASIC we give standard errors for correlations for categorical variables. We also give standard errors for TECH4 in most cases. You would need to compute the ratio of the estimate to the standard error to get a z-value and get the p-value from that.
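The hand calculation described above, the ratio of the estimate to its standard error followed by a two-sided p-value from the standard normal distribution, can be sketched as follows. The estimate and standard error are made-up numbers and the function name is hypothetical.

```python
import math


def z_and_p(estimate, standard_error):
    """Return the z-value and the two-sided p-value under a standard normal."""
    z = estimate / standard_error
    # Two-sided tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical correlation estimate and standard error read from the output.
z, p = z_and_p(0.42, 0.15)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```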
Scott Smith posted on Thursday, October 10, 2013 - 10:54 am
I am running a two level CFA with three composites. Two of the composites have three items each. One composite only has two items. After the initial run, one of the items in the two-item composite had an STDYX estimate above 1 at the within level. I set it to @.001 and reran the model. Now the other item in the two item composite has an STDYX estimate above 1. Will I get valid results if I set both items in a two-item composite to @.001, in conjunction with two three-item composites?
I am running a bifactor model CFA, and I am interested in the estimated common variance accounted for by the general factor. Mplus does not give this value by default. How can I calculate/obtain this value?
Perhaps you mean the amount of variance in all the indicators explained by the general factor. If the general factor variance is set at 1, you sum the squared factor loadings and divide them by the sum of the indicator variances.
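The calculation in the reply above can be written out directly. This sketch assumes the general factor variance is fixed at 1, as the reply states. The loadings and indicator variances are made-up values and the function name is hypothetical.

```python
def general_factor_share(general_loadings, indicator_variances):
    """Share of total indicator variance explained by the general factor."""
    explained = sum(loading * loading for loading in general_loadings)
    total = sum(indicator_variances)
    return explained / total


# Hypothetical unstandardized general-factor loadings and indicator variances.
loadings = [0.6, 0.8, 0.7, 0.5]
variances = [1.0, 1.2, 0.9, 1.1]
print(round(general_factor_share(loadings, variances), 3))
```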
I am conducting a CFA to test for measurement invariance for whites and blacks on a scale. Here are my models:
VARIABLE: NAMES ARE RACE ACS1-ACS10; USEVARIABLES ARE RACE ACS1-ACS10; CATEGORICAL ARE ACS1-ACS10; MISSING ARE ALL (999); GROUPING IS RACE (1=black 0=white); MODEL: pos BY ACS1 ACS2 ACS3 ACS4 ACS5; neg BY ACS6 ACS7 ACS8 ACS9 ACS10; OUTPUT: STDYX MODINDICES;
VARIABLE: NAMES ARE RACE ACS1-ACS10; USEVARIABLES ARE RACE ACS1-ACS10; CATEGORICAL ARE ACS1-ACS10; MISSING ARE ALL (999); GROUPING IS RACE (1=black 0=white); MODEL: pos BY ACS1* ACS2 ACS3 ACS4 ACS5; neg BY ACS6* ACS7 ACS8 ACS9 ACS10; pos@1; neg@1; MODEL white: pos BY ACS1* ACS2 ACS3 ACS4 ACS5; neg BY ACS6* ACS7 ACS8 ACS9 ACS10; OUTPUT: STDYX MODINDICES;
I am unsure what Mplus is constraining to be equal across groups (e.g., factor loadings, intercepts, error variances) as a default in the first model. Also, is there a way to individually constrain factor loadings, intercepts, and error variances to be equal across race so that I can conduct tests for weak, strong, and strict invariance?
See the discussion in Chapter 14 on multiple group analysis. This should answer your questions. For the models to test for measurement invariance, see the Topic 1 course handout under multiple group analysis. See also the Version 7.1 Language Addendum on the website with the user's guide where a new feature that automatically tests for measurement invariance is described.
I have 25 different variables and only 2 of them give a good fit. In all the rest, the chi-square is significant and the RMSEA is high. However, for many variables the CFI and TLI are close to 1. Can I assume that the model fits the data because of the CFI and TLI values and ignore the chi-square?
Sarah Hafidz posted on Wednesday, January 22, 2014 - 7:28 pm
I ran a CFA model for 75 indicators into 12 latent variables. The fit indices showed that the model fit is good. However, it also came up with a warning as follows:
THE MODEL ESTIMATION TERMINATED NORMALLY
WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE FACTOR4.
Does this essentially mean that I can't accept the model and use it for other analyses?
TECH3 is the matrix of variances and covariances among the parameters. See the SAMPLE option of the SAVEDATA command to save the observed variable covariance matrix. See the RESIDUAL option for the model estimated covariance matrix.
SY Khan posted on Thursday, March 13, 2014 - 3:30 am
Thanks for your guidance on the above Dr. Muthen.
I have been able to get the RESIDUAL option working. However, since my data are categorical the SAMPLE SAVEDATA command is giving the correlation matrix as default.
I have tried using the TYPE option along with the SAMPLE option, but it is still not giving the covariance matrix.
Is it possible to obtain a covariance matrix for categorical dependent variables at all?
Many thanks for your time and guidance in advance.
The covariance matrix is not the matrix analyzed for categorical variables. If you want a covariance matrix for these variables, remove the CATEGORICAL option.
Tom Bailey posted on Sunday, April 06, 2014 - 8:56 am
I was hoping you (or someone else on the board) may be so kind as to answer an issue related to items on one of latent factors in a CFA model using the WLSMV estimator in MPlus.
When I run an EFA in SPSS or MPlus or a CFA in AMOS all the item loadings on my latent variable are positive (with the exception of one). However, when I run the CFA model in MPlus loadings on this variable are now all negative (again with the exception of 1). Is there a rational explanation for this, or is it something that perhaps I am doing wrong when specifying the model?
PPP has not been developed for multilevel models. It is not available.
Nara Jang posted on Monday, April 28, 2014 - 12:30 pm
Dear Dr. Muthen,
I got the following warning after running a CFA. Would you tell me how I can solve this problem? Thank you very much for your great help!
WARNING: THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR AN OBSERVED VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO OBSERVED VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO OBSERVED VARIABLES. CHECK THE RESULTS SECTION FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE I16CAT.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 7.
Nara Jang posted on Monday, April 28, 2014 - 1:27 pm
Dear Dr. Muthen,
This is follow-up question. I found out the theta value of the numhsl variable in parameter specification showed "8" and the other variables had "0". So I removed the "numhsl". The model fit indices are as follows:
Chi-Square Test of Model Fit Value 0.000* Degrees of Freedom 0 P-Value 0.0000
RMSEA (Root Mean Square Error Of Approximation) Estimate 0.000 90 Percent C.I. 0.000 0.000 Probability RMSEA <= .05 0.000
CFI/TLI CFI 1.000 TLI 1.000
WRMR (Weighted Root Mean Square Residual) Value 0.000
Would you tell me if it is correct to remove the numhsl variable? And is it OK to conclude that the model is good based on the CFI/TLI values?
I tried removing i16cat, but the result then indicated a negative variance/residual variance for the numhsl variable. So I kept "i16cat" and deleted the "numhsl" variable. The results were then as shown above.
Thank you so much for your expert explanation in advance!
Regarding your first question, I would need to see the output and your license number.
The fit of a model with zero degrees of freedom cannot be assessed.
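Why a model can end up with zero degrees of freedom can be seen by counting. The sketch below covers only the covariance structure of continuous indicators. With categorical indicators the sample statistics are counted differently, though the same logic of statistics minus free parameters applies. The function name and the parameter counts are hypothetical examples.

```python
def covariance_structure_df(n_indicators, n_free_parameters):
    """Degrees of freedom: distinct variances/covariances minus free parameters."""
    n_moments = n_indicators * (n_indicators + 1) // 2
    return n_moments - n_free_parameters


# One factor, three indicators: 2 free loadings + 1 factor variance + 3 residual variances.
print(covariance_structure_df(3, 2 + 1 + 3))

# One factor, four indicators: 3 free loadings + 1 factor variance + 4 residual variances.
print(covariance_structure_df(4, 3 + 1 + 4))
```

A just-identified model reproduces the sample statistics exactly, so fit indices such as CFI = 1.000 and RMSEA = 0.000 carry no information about how well the model fits.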
Nara Jang posted on Saturday, May 03, 2014 - 10:15 pm
I conducted a CFA with a random half sample drawn from my dataset. I would like to use weighting, because women and older people were oversampled. So I downloaded data for the 5 zip code areas (the surveyed areas) from the U.S. Census website. Would you tell me whether I should use the weights only for the CFA half sample or for the total sample? The first random half sample was used for EFA.
Thank you very much!
Nara Jang posted on Sunday, May 04, 2014 - 5:48 am
Dear Dr. Muthen,
The weighting method I posted earlier is for regression/logistic regression. Would you mind explaining weighting methods for CFA or recommend any reference regarding weighting methods for CFA?
Thank you very much for your expert explanation in advance!
To test the factor structure of a screening tool for mental health problems (yes/no items; 7 scales but no total score; some items appear on two scales), I ran a CFA (WLSMV):
ADU by M10 M19 M23 M24 M33 M37 M40 M45;
AI by M2 M6 M7 M8 M13 M35 M39 M42 M44;
DA by M3 M14 M17 M21 M34 M35 M41 M47 M51;
SC by M27 M28 M29 M30 M31 M43;
SI by M11 M16 M18 M22 M47;
TD by M9 M20 M25 M26 M32;
TE by M46 M48 M49 M51 M52;
I then checked whether the MODINDICES output suggested that correlations between factors were needed. The model estimation terminated normally but with warnings (non-positive definite). Removing the items that caused trouble did not solve the problem. The model may be too complex to test in 1600 boys.
Yet, I wonder if it makes sense to just perform seven separate CFAs (e.g., one input file testing: ADU by M10 M19 M23 M24 M33 M37 M40 M45; then a next input file AI by M2 M6 M7 M8 M13 M35 M39 M42 M44; etcetera). After all, and conceptually, this tool was designed to assess 7 separate constructs that are not supposed to load on a higher-order factor. This would avoid having scales in one and the same model that include the same items, and having scales that are related to each other.
So, do you think this is a strategy that makes sense? I would really appreciate your input, as I have had difficulty finding discussions or examples related to my question. Cheers Olivier
I have a second-order CFA model, and I would like to get a histogram of the distribution of the estimated factor scores. I succeeded in getting that with the plot3 command. However, in terms of layout I would prefer a frequency table over the histogram, so that I can rebuild the figure in Excel in the journal's format. Is there a command that gives me the estimated factor scores in a table?
No, there is no such option. You can save the factor scores and create the table using another software.
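One way to build such a table outside Mplus is sketched below. It assumes the factor scores were saved (for example with the SAVEDATA command) to a whitespace-delimited text file; the file name, the column position, and the number of bins are hypothetical and need to be matched to the column order listed in your own output.

```python
def read_scores(path, column):
    """Read one whitespace-delimited column, skipping entries that are not numbers."""
    scores = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if column < len(fields):
                try:
                    scores.append(float(fields[column]))
                except ValueError:
                    pass  # e.g. a missing-value flag in the saved file
    return scores


def frequency_table(scores, n_bins=10):
    """Return (lower, upper, count) for equal-width bins spanning the observed range."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all scores are equal
    counts = [0] * n_bins
    for s in scores:
        # The maximum value is placed in the last bin.
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


# Small in-memory demonstration; with real data one would call, for instance,
# scores = read_scores("fscores.dat", column=5)   # hypothetical file and column
demo = [-1.2, -0.4, -0.1, 0.0, 0.3, 0.8, 1.5]
for lower, upper, count in frequency_table(demo, n_bins=3):
    print(f"{lower:6.2f} to {upper:6.2f}: {count}")
```

The printed rows can be pasted into Excel and charted in whatever format the journal requires.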
Djangou C posted on Friday, January 09, 2015 - 7:24 pm
Hi, I am doing a simulation study with ESTIMATOR=BAYES, and I am interested in the median, mean, and mode as point estimates. The default in Mplus is the median. Is there a way to get the same statistics for the mean and mode in simulation studies? Thank you.
Djangou C posted on Sunday, January 11, 2015 - 1:08 am
Fatih Koca posted on Monday, January 26, 2015 - 10:14 am
Hi, I need help with the following. How can I modify this code to use the effects coding method of identification and introduce phantom constructs for each of the lower-order constructs, in order to convert the variances and covariances into standard deviations and correlations?
Grouping is SchLev(1=ELEM, 2=MIDDLE, 3=HIGH) ; idvariable=ID; Missing are all (99, 777);
I don't understand what you want to do. Is there a reference that you are going by?
Fatih Koca posted on Tuesday, January 27, 2015 - 8:01 am
Dr. Muthen, the script is above. What I want is to use the effects coding method of identification and introduce phantom constructs for each of the lower-order constructs, in order to convert the variances and covariances into standard deviations and correlations. However, I could not figure out how to do this.
I don't know what you mean by lower-order constructs in this context. The relationship between f1 and f2 is a correlation. Perhaps you want to ask this question on a general discussion list like SEMNET.
lee posted on Thursday, January 29, 2015 - 8:47 am
I am handling a single-item measure. I have checked slide 44 in Topic 1 but am still not sure how to calculate the reliability. How can I get the sample variance, psi, and the reliability?
I am running a CFA for my path analysis. However, I received the error message below:
"WARNING: THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR AN OBSERVED VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO OBSERVED VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO OBSERVED VARIABLES. CHECK THE RESULTS SECTION FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE ATT1."
I checked the ATT1 variable. It has a negative residual variance. How can I fix this problem? Thanks
Yes you would add a line for the weight variable "weight= ".
The normalization is done automatically for you.
Jian-Bin Li posted on Wednesday, May 27, 2015 - 7:48 am
Hi, I am using Mplus 7.0 to construct a CFA model. The data are 25 observed ordinal variables. The model consists of 5 first-order factors and an additional factor that several items are cross-loaded on. Following is my input:
variable:
NAMES ARE nation gender period sdq1-sdq25 sdqm1-sdqm25 sdqp1-sdqp25;
USEV ARE sdqm1-sdqm25;
categorical are sdqm1-sdqm25;
model:
emo by sdqm3 sdqm8 sdqm13 sdqm16 sdqm24;
con by sdqm5 sdqm7 sdqm12 sdqm18 sdqm22;
hyp by sdqm2 sdqm10 sdqm15 sdqm21 sdqm25;
peer by sdqm6 sdqm11 sdqm14 sdqm19 sdqm23;
pro by sdqm1 sdqm4 sdqm9 sdqm17 sdqm20;
method by sdqm1 sdqm4 sdqm9 sdqm17 sdqm20 sdqm21 sdqm25 sdqm7 sdqm11 sdqm14;
I am using Mplus 6.1 to construct a CFA model. The data are 12 observed ordinal variables. The model consists of 3 first-order factors. Because the variables are ordinal, I am not sure whether I have to use Maximum Likelihood (ML) estimation or, as proposed by DiStefano and Morgan (2014), Weighted Least Squares Mean and Variance adjusted (WLSMV) estimation. Which one do you prefer, and do you know a reference to support this decision?
Thank you very much for your prompt reply. ML seems to be the best choice. After running a CFA and looking at the modification indices, I want to add an additional parameter between two items within one factor in order to improve the model fit. I cannot find the proper command for this in the user's guide. Could you please help me with this?
What estimator should be used in the case of a second order factor analysis model where all the observed variables are binary?
Also, is it possible to weight the data (which is household survey data), or should it be weighted prior to inputting it to Mplus? When trying to weight the data, I receive the error message that "Categorical variable SL6 contains non-integer values". SL6 is the last variable listed just before the weight variable.
Hi! I have run a CFA with three correlated factors (WLSMV estimation). Why is it that when I run the same model as a hierarchical model, it produces the very same fit results as the correlated-factors model? How, then, do I choose between the two?
You can't choose on statistical grounds since they produce the same correlation matrix. It is just two ways of looking at the same thing. We have this situation often - for instance, with EFA and correlated vs uncorrelated factors. Go with whatever alternative is most useful to you.
Hi, Is it possible to set minimum and maximum item loadings in CFA using Mplus? For instance, I would want one item to load with a .7 or higher on a latent factor and another to load with a .5 or lower. I am trying to simulate several models for fit comparison.
You can do this using MODEL CONSTRAINT. See the user's guide for further information.
Yanxia WANG posted on Tuesday, August 11, 2015 - 1:32 am
Hi Professor Muthen,
I tried to do a CFA using Mplus; however, the output keeps warning that "unexpected end of file reached in data file". I read the earlier responses on this issue, then checked the number of variables in the "names" part and found that it is the same as the number of variable columns in the data set. I really do not know how to deal with this issue; please help me figure it out. Thanks so much!
It sounds like you have blanks in the data set and are reading it free format where blanks are not allowed. If you can't see the problem, send the output, data set, and your license number to email@example.com.
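A quick way to look for such blanks before running Mplus is to count the fields on each line of the data file. The sketch below assumes a whitespace-delimited file with one observation per line; the function name, file name, and variable count are hypothetical.

```python
def rows_with_wrong_field_count(lines, n_variables):
    """Return (line_number, n_fields) for every line whose field count differs from n_variables."""
    bad = []
    for line_number, line in enumerate(lines, start=1):
        n_fields = len(line.split())
        if n_fields != n_variables:
            bad.append((line_number, n_fields))
    return bad


# In-memory demonstration: the second row has a blank where a value should be,
# so it splits into only two fields.
demo_lines = ["1 2 3", "4   6", "7 8 9"]
print(rows_with_wrong_field_count(demo_lines, n_variables=3))

# With a real file (hypothetical name and variable count):
# with open("mydata.dat") as fh:
#     print(rows_with_wrong_field_count(fh, n_variables=78))
```

Any line reported here has fewer or more values than the NAMES list expects. Blank cells can then be replaced with an explicit missing-value code that is declared in the MISSING option.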
Yanxia WANG posted on Tuesday, August 11, 2015 - 11:38 pm
Thanks, Dr. Muthen. I have already fixed it by dealing with the missing values in the data file. Thanks again for your reply.
Robert Buch posted on Monday, October 19, 2015 - 12:16 am
Dear Dr. Muthen,
Analysis: Type = COMPLEX; MODEL=NOMEANSTRUCTURE; ESTIMATOR = wlsmv;
I obtain the SRMR fit, but when adding: CATEGORICAL =....
Then I no longer obtain the SRMR. Is there a way to obtain it when using the "categorical=" option?
From reading an earlier post I thought "MODEL=NOMEANSTRUCTURE;" would do the trick, but it seems this does not help.
With CATEGORICAL outcomes, SRMR is available only when there are no thresholds and no covariates. If this is your situation and you don't get SRMR, please send the output and your license number to firstname.lastname@example.org.
(I accidentally posted this in the "mean structures" thread, but I cannot figure out how to delete that comment; apologies)
I am using Mplus to test the theorized factor structure of a 6-item unidimensional measure. The items are rated on a 1-5 scale with strongly disagree <-> strongly agree anchors.
My question is this: is there a minimum percentage of my cases that need to select a given answer option (e.g., "1: Strongly Disagree") in order to assume that my data are continuous? I.e., say only 2% of my cases indicate "1: Strongly Disagree" for item 1. Is this a problem? What about if only 2% of my cases indicate "1: Strongly Disagree" for the measure as a whole?
I have heard that there is a rough rule around 5%, but I have yet to find a citation for it. I.e., I have heard that if less than 5% of my cases indicate a response (e.g., "1: Strongly Disagree"), that A) I can no longer assume my data is continuous, and B) that I should combine that response with another (e.g., combine "1: Strongly Disagree" with "2: Disagree" to create a "Disagree" category).
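To see which response options fall below such a cutoff for a given item, the proportions can be tabulated as in the sketch below. The 5% threshold is only the uncited rule of thumb mentioned in this post, not an established criterion, and the function names and example responses are hypothetical.

```python
from collections import Counter


def category_proportions(responses, categories=(1, 2, 3, 4, 5)):
    """Proportion of cases choosing each response option for one item."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    n = len(responses)
    return {c: counts.get(c, 0) / n for c in categories}


def sparse_categories(responses, threshold=0.05, categories=(1, 2, 3, 4, 5)):
    """Response options chosen by fewer than `threshold` of the cases."""
    proportions = category_proportions(responses, categories)
    return [c for c, p in proportions.items() if p < threshold]


# Hypothetical item with 100 cases, 2 of which chose "1: Strongly Disagree".
item1 = [1] * 2 + [2] * 18 + [3] * 30 + [4] * 30 + [5] * 20
print(category_proportions(item1))
print(sparse_categories(item1))
```

Running this per item shows where sparse options occur; whether to collapse them or treat the items as categorical remains a modeling decision.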
I am working with a model that is well established in the literature and has shown good model fit in several cross-cultural studies. I'm trying to test this model (18 observed variables split equally across 6 latent variables) with my data (22,000 subjects in 20 countries), but the model fit indices are not satisfactory, and I'm not sure whether I am using the correct approach to analyze it. One of the differences between my data and the data on which the model was previously tested is that I have greater variability in age. I'm not sure whether this makes the population heterogeneous and consequently affects the model fit indices (for the construct I'm working with, scores are expected to vary across the life span). I have tested the model with robust estimators for non-normal samples (MLR, WLS), and I have also tested the model using age as a covariate (MIMIC). However, the model fit did not improve. I wonder if it would be reasonable to try CFA mixture modeling in this case. I appreciate any comments.
It sounds like you are using a model that was validated on a sample from a different population. I am not sure mixture modeling would help here because the population it was validated on was not unobserved. Try an EFA to see if the CFA you are using is close to what the data show.