
anonymous posted on Monday, January 22, 2001  2:44 pm



Therapists evaluate their supervisor on 43 questions, and we need to determine a stable factor structure that reflects qualities of supervisors. What complicates things is that several therapists share, and thus evaluate, the same supervisor, which creates a multilevel situation. We have been doing exploratory factor analysis with SAS PROC FACTOR, ignoring this multilevel structure, which is of course not completely right. Your 1994 paper (which I read second hand in the book by Heck and Thomas, 2000, where it is referenced heavily as the definitive answer to this problem) proposed decomposing the total covariance into within and between parts. I understand that Mplus can do this. My questions are: 1. Does this strategy apply to our multilevel situation? If yes, can we use SAS PROC FACTOR to analyze the within and between covariance matrices from Mplus? What is the interpretation of the factor structure based on the within vs. the between covariance? 2. What analytic strategy do you recommend to assess the similarity/difference in factor structure generated by the within vs. between analyses? 3. Is there an example on your Mplus support site that matches our problem closely?


Yes, this sounds like it would be suitable for multilevel factor analysis as I described it in the 1994 article. The supervisors form clusters of therapists, with unequal numbers of them within clusters. Mplus offers a couple of different forms of this analysis. The Type=twolevel analysis draws on a model with both between and within sources of variation. The within-level part of the model describes the factor structure for how the therapists' 43 evaluations covary across therapists. The between-level part of the model describes how the 43 supervisor means covary across supervisors. The estimator is maximum likelihood, with the between and within parts of the model analyzed simultaneously. The number of factors, loading values, and factor distribution parameters can be different on the between and within levels. Tests of equalities can be made. Exploratory factor analysis (EFA) can be carried out in this confirmatory framework, using the m**2 restrictions needed (m being the number of factors) on Lambda and Psi, where the restrictions are placed on both levels. Experience shows that the between structure is often different from and simpler than the within structure (see, e.g., the reference in the 1989 Muthen Psychometrika article as referenced on this web site).

A simpler approach is to use the pooled-within sample covariance matrix (scaled to a correlation matrix) for a regular EFA. This gives estimates close to those that are obtained for the within-level parameters in the two-level analysis described above. Use sample size N - C, where N is the total number of therapists and C is the total number of supervisors. For the between-level parameters, the sample between covariance matrix can be used, although this is not an unbiased estimator of the between covariance matrix; the unbiased estimator can be used instead (for details, see the User's Guide, Technical Appendix 10). Between-level estimates obtained this way can differ quite a lot from those obtained through the two-level analysis. The Mplus web site Example section has a similar analysis under Continuous, cont10.
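The simpler approach described above can be sketched in a few lines of Python. This is only an illustration, not Mplus's own computation: the function names are made up for the example, and the formulas are the usual sample definitions, namely the pooled-within matrix with divisor N - C, the sample between matrix with divisor C - 1, and the bias-corrected between estimate (S_B - S_PW) / s, where s is the ad hoc cluster-size constant that equals the common cluster size when clusters are balanced.

```python
import math


def cluster_covariances(rows, clusters):
    """Return (S_PW, S_B, Sigma_B_hat, s) for clustered data.

    rows     -- list of equal-length lists of floats, one per individual
    clusters -- list of cluster ids, one per row
    """
    p = len(rows[0])
    n_total = len(rows)
    groups = {}
    for row, cid in zip(rows, clusters):
        groups.setdefault(cid, []).append(row)
    n_clusters = len(groups)
    grand_mean = [sum(r[j] for r in rows) / n_total for j in range(p)]

    within_ss = [[0.0] * p for _ in range(p)]
    between_ss = [[0.0] * p for _ in range(p)]
    for members in groups.values():
        n_c = len(members)
        mean_c = [sum(r[j] for r in members) / n_c for j in range(p)]
        # Deviations of individuals from their own cluster mean.
        for r in members:
            d = [r[j] - mean_c[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    within_ss[a][b] += d[a] * d[b]
        # Deviation of the cluster mean from the grand mean, weighted by n_c.
        g = [mean_c[j] - grand_mean[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                between_ss[a][b] += n_c * g[a] * g[b]

    s_pw = [[v / (n_total - n_clusters) for v in row] for row in within_ss]
    s_b = [[v / (n_clusters - 1) for v in row] for row in between_ss]
    # Ad hoc cluster size; equals the common size n for balanced data.
    sum_sq = sum(len(m) ** 2 for m in groups.values())
    s = (n_total ** 2 - sum_sq) / (n_total * (n_clusters - 1))
    # S_B estimates Sigma_W + s * Sigma_B, so subtract S_PW and divide by s.
    sigma_b = [[(s_b[a][b] - s_pw[a][b]) / s for b in range(p)]
               for a in range(p)]
    return s_pw, s_b, sigma_b, s


def cov_to_corr(cov):
    """Scale a covariance matrix to a correlation matrix."""
    sd = [math.sqrt(cov[j][j]) for j in range(len(cov))]
    return [[cov[a][b] / (sd[a] * sd[b]) for b in range(len(cov))]
            for a in range(len(cov))]
```

The matrix returned by `cov_to_corr(s_pw)` is what would go into a regular EFA with sample size N - C. With few clusters the bias-corrected between matrix need not be positive definite, which is one reason its results can differ from the two-level ML analysis.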


How do you format data to perform multilevel analysis? That is to say, are cluster-level data repeated for each element in the second level, or are they written only once? Gianbattista FLEBUS

Anonymous posted on Wednesday, July 03, 2002  10:37 am



Cluster-level data are repeated for each element in the cluster.
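As an illustration of that layout, the following Python sketch builds one record per individual and copies each cluster's values onto every member of that cluster. The variable names (`clus`, `w`, `y1`) and the helper `to_long_format` are hypothetical and chosen only for the example.

```python
def to_long_format(individuals, cluster_values, cluster_key="clus"):
    """Attach cluster-level values to every individual record.

    individuals    -- list of dicts, each with a cluster id under cluster_key
    cluster_values -- dict mapping cluster id -> dict of cluster-level values
    """
    rows = []
    for person in individuals:
        row = dict(person)                           # individual-level values
        row.update(cluster_values[person[cluster_key]])  # repeated per member
        rows.append(row)
    return rows


# Hypothetical data: two clusters, one cluster-level covariate w.
individuals = [
    {"clus": 1, "y1": 2.5},
    {"clus": 1, "y1": 3.1},
    {"clus": 2, "y1": 1.9},
]
cluster_values = {1: {"w": 0.7}, 2: {"w": -0.2}}
long_rows = to_long_format(individuals, cluster_values)
```

Each row of `long_rows` would then be written as one line of the data file, so the value of `w` appears once for every member of its cluster.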

Anonymous posted on Tuesday, January 21, 2003  10:29 pm



Is the quasi-ML fitting function for multilevel factor analysis (MFA) robust to designs that are unbalanced and have a non-normal multivariate distribution? In any case, please supply references. Also, please supply some guidelines and limitations for using the MLM estimator in MFA under non-normal situations (e.g., normalized versions of Mardia's coefficients are greater than 10 for skew and kurtosis). Please send your responses to: tolandmd@unlserve.unl.edu Thank you for any and all of your help.

bmuthen posted on Wednesday, January 22, 2003  6:45 pm



Mplus 2.1 has both MUML (limited-information, quasi-ML) and FIML (full-information ML) estimators, each with a non-normality robust version for the standard errors (called MUMLM and MLR, respectively). Two-level MLM in earlier Mplus versions is now called MUMLM. More about these estimators is stated in the Addendum to Mplus Version 2.1 on the Mplus web site. It is my experience that MUML agrees well with FIML even with data that have very different cluster sizes (unbalanced data), and this has also been seen in studies by Joop Hox. MUMLM non-normality robustness for s.e.'s is expected but has not been thoroughly studied or written about; simulation studies can explore this. MLR non-normality robustness for s.e.'s is expected due to the sandwich formula used. With very skewed data it is often the case that there are also strong floor or ceiling effects, in which case linear models are not suitable, so the s.e. correction does not help.


We have data on teachers evaluating leadership at school. I did a multilevel factor analysis, which gave nice results. Now I want to calculate scores for these underlying concepts at the school level to do some exploratory analysis with pupil data. How do I calculate these scores for the schools? Are there some good practices that someone could advise? Do I calculate the pooled mean scores for the individual items at the school level and afterwards sum up these scores weighted by the inverse of the error variance? Are there better ways of doing this?

bmuthen posted on Monday, February 23, 2004  8:34 am



Mplus prints out the estimated factor scores for factors on the between level. 


See page 88 of the Mplus User's Guide to see how to use the SAVEDATA command to save factor scores. 


Inserting the following line: SAVE=FSCORES results only in the original scores. In your User's Guide you mention that this option is not available for TYPE=TWOLEVEL. Neither is the option FSCOEFFICIENTS available in this type of analysis. Am I making a mistake?


I just ran an example and got factor scores using Version 2.14. What version of the program are you using? 


I also use that version. I'll send you an example of the output. 


Similar to the therapist example above, I have personality items whose factor structure I want to examine. The respondents are grouped within families (actually twin pairs). Unlike the cont10 example, my items have only 2 levels and need categorical methods. I do not have any predictors for the family/twin level, and am not clear about the need for, or interpretation of, the within and between factors. Also, I wanted to do a multigroup analysis to look at age groupings. I am running into trouble because the TWOLEVEL option forces an ML estimator (or MLR) with numerical integration, and this is incompatible with the THETA parameterization and MGROUP option I am using. I have evidence for strong age invariance, and could collapse into one group and use DELTA (because I will no longer test strict invariance - or even invariance), but would prefer to keep the multiple groups if possible. The error messages said something about using MIXTURE and KNOWNCLASS, but I am unsure of the implications of this. Can you elaborate or provide a reference? Given the basic problem of multigroup (age or gender) factor analysis with multilevel (family) data, what suggestions would you offer for me to pursue? Any paper or example that had similar issues would be great. Sorry to make such a sweeping request.


I would suggest taking the multivariate approach to family data described in Khoo and Muthen. See our reference list. Then you can use WLSMV and multiple group. 


OK - I will look at this paper. It does appear to use longitudinal data, while mine are cross-sectional. My levels are twin within twin pair, and my groups are age bands. I still want to get a single factor using all the twins. Perhaps this will be clearer in the paper.


The general idea is helpful. It can be applied to both longitudinal and cross-sectional settings.


Linda - I have reviewed the paper and am uncertain about the applicability. Recall my goal is to handle the dependencies due to twins in a multigroup factor analysis with binary indicators. In the paper, family was the unit of analysis and each sibling was allowed to have their own growth parameters. If I follow your suggestion, twin pair would become the unit of analysis, and I would estimate a common factor for each side of the twinship. I think I would need to ensure that the same common factor was produced on each side of the twinship - I'm not sure if this would be by constraining loadings or by directly constraining the factor mean and variance. At this point, I am not sure that I haven't undone any "handling of the dependency" that was addressed by the multivariate approach. Please, could you comment on this? Also, I have been thinking about TYPE=MIXTURE using KNOWNCLASS to indicate my two or three groupings. In this scenario, could I also use TWOLEVEL clustered on twin? Please, could you also comment on this? Thanks


Linda - Since it is arbitrary which twin is placed on each side of the twinship, I am wondering if there is a Monte Carlo scenario where I could get Mplus to repeatedly randomly divide the twins, estimate the model, and present me with averages. Thanks again


To ensure that the same factor operates for both twins, your multivariate model for the joint analysis of the twins should have one factor for each twin, where the factor correlation (assuming factor variances are fixed at one to set the metric) is fixed at one and the factor loadings are held equal. If you go to MIXTURE TWOLEVEL you will have the same numerical integration issue you started with. The twins are not mixed together in the analysis. If you have two factors as described above, a random arrangement is not an issue.

Sungworn posted on Tuesday, June 14, 2005  2:13 pm



I am wondering if Mplus can do multilevel factor analysis of dichotomous data (i.e., achievement test data where 1=right, and 0=wrong)? Thanks. 

bmuthen posted on Wednesday, June 15, 2005  7:43 am



Yes, this can be done using ML. 

Sungworn posted on Thursday, June 16, 2005  1:04 pm



Dr. Muthen, Are you familiar with NOHARM, a computer software for multidimensional IRT? If so, do Mplus and NOHARM yield the same results in terms of factor loadings and thresholds? 

BMuthen posted on Friday, June 17, 2005  1:38 am



I am not very familiar with NOHARM but I don't think the estimates would be the same. 


Hello Linda and Bengt, I'm wondering if I can conduct the following analysis in Mplus. I modified examples 9.9 and 9.10 from the Mplus Version 3 User's Guide on pages 205-207. I have 31 clusters; would that be large enough?

TITLE: this is an example of two-level CFA with continuous factor indicators, covariates, and random slopes
DATA: FILE IS ex9.9.dat;
VARIABLE: NAMES ARE y1-y4 x1-x4 w clus;
  CLUSTER = clus;
  BETWEEN = w;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
  ALGORITHM = INTEGRATION;
  INTEGRATION = 10;
MODEL:
  %WITHIN%
  fw1 BY y1-y4;
  fw2 BY x1-x4;
  s | fw1 ON fw2;
  %BETWEEN%
  fb BY y1-y4;
  y1-y4@0;
  fb s ON w;

Thanks a lot, Pancho

bmuthen posted on Thursday, November 17, 2005  5:15 am



See answer to the same question under SEM. 

Marco posted on Monday, December 19, 2005  1:21 pm



Hello Linda, hello Bengt, I sometimes find that an MFA with estimator=MLR yields an undefined scaling factor. Judging from the preliminary steps (from Muthén, 1994), the chi-square statistic and the fit indices seem to be OK. So, what is the meaning of, or reason for, an undefined scaling factor? Is there a way to conduct a chi-square difference test with these results? Many thanks! By the way, is it possible to see somewhere on the homepage what exactly has been updated? That would be a good idea, since the homepage contains so much important information.

bmuthen posted on Tuesday, December 20, 2005  5:47 am



The scaling correction factor comes out negative, i.e., the estimation gives a poor approximation to the chi-square asymptotic distribution. Wald testing is an alternative, but is not easy to do by hand; it will be available in a future Mplus version.

Marco posted on Tuesday, December 20, 2005  5:55 am



I guess that the "poor approximation" refers to the estimation of the scaling factor. Does this imply that the chi-square statistic itself is unreliable?

bmuthen posted on Tuesday, December 20, 2005  6:09 am



Yes. 

Marco posted on Tuesday, December 27, 2005  3:20 pm



Hello, based on my limited trials, I found an undefined scaling factor only in models where an indicator is specified as within (despite having little between variance). The scaling factor becomes positive after eliminating the indicator entirely from the analysis or allowing the indicator to vary both within and between. Is this data-specific or generally expected? Thanks!


Most likely data specific. 


Hello, After watching the latest training on multilevel analysis and reading the Grilli paper on multilevel factor analysis with ordinal variables, I have several questions regarding multilevel CFA in Mplus. 1) Are there any other examples of papers that discuss the interpretation of the Mplus output for multilevel CFAs with categorical variables? 2) Dr. Muthen said that the categorical multilevel CFA is essentially a 2-parameter IRT model. Is this still the case when the model doesn't have random slopes, or does it then become a Rasch model? 3) Are the factor variances at the within and between levels directly interpretable in the case of categorical CFA, and can I use them to calculate an ICC? 4) What do the thresholds mean in the categorical CFA output? Thank you for your help. Sincerely, Magdalena Cerda


1) There are papers on multilevel CFA, but none discussing the Mplus output per se as far as I know. An early paper with continuous outcomes is: Muthén, B. (1991). Multilevel factor analysis of class and student achievement components. Journal of Educational Measurement, 28, 338-354. (#37) 2) Rasch has the same slope for all items. Whether the slope is random or not (cluster-level variation or not) is another matter. One can have a Rasch or a non-Rasch model and have fixed or random slopes. 3) For that you need to hold the loadings invariant across the two levels, and that often does not fit as well as letting them be different. 4) With binary outcomes they are the same as the negative of the intercepts. For translations between Mplus and IRT parameterizations, see our Short Course handout from Day 3, which can be requested off the web.
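For readers without the handout, the algebra behind point 4 can be sketched in Python for a single-level binary item. This is not taken from the handout; it assumes a logit link, a factor with mean `factor_mean` and variance `factor_var`, and an optional scaling constant `D` (1.0 here; 1.7 is another common convention). The function names are invented for the example.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def mplus_probability(loading, threshold, eta):
    """P(u = 1 | eta) in threshold/loading form: logit = -threshold + loading * eta.

    The intercept is the negative of the threshold.
    """
    return logistic(-threshold + loading * eta)


def mplus_to_irt(loading, threshold, factor_var=1.0, factor_mean=0.0, D=1.0):
    """Convert (loading, threshold) to 2PL discrimination a and difficulty b.

    Uses eta = factor_mean + sqrt(factor_var) * theta with theta ~ (0, 1),
    so that logit = D * a * (theta - b).
    """
    sd = math.sqrt(factor_var)
    a = loading * sd / D
    b = (threshold - loading * factor_mean) / (loading * sd)
    return a, b


def irt_probability(a, b, theta, D=1.0):
    """P(u = 1 | theta) in 2PL form."""
    return logistic(D * a * (theta - b))
```

Both forms give the same response probability once `eta` is rescaled to `theta = (eta - factor_mean) / sqrt(factor_var)`; only the parameter labels change.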


Dear Dr. Muthen, Thank you very much for your reply to my questions. I have ordered the handouts from the short course for lectures 3 and 5. In the meantime however, could you tell me how I can calculate the ICC from the output obtained from a categorical multilevel CFA? I cannot hold the loadings invariant across the two levels because I have 2 factors at level 1 and 1 factor at level 2. Thank you. Magdalena Cerda 


ICC is a concept for a continuous variable where the variance is a freely estimated parameter. This is not the case with a categorical variable because you don't estimate a free variance parameter for the dependent variable (the mean p and the variance p(1-p) are mathematically linked). You can talk about an ICC for a factor as I did in the article I mentioned, but for that you need loadings that are invariant across levels. So I don't see how you can meaningfully compute an ICC here. On the other hand, I don't see the need for it either, because the estimated model has all the information you need: the amount of between-cluster variation tells you how much 2-level modeling is needed.
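The two points in this reply can be written down as a small Python sketch. The function names are invented for the example, and the factor ICC shown is the usual between-variance share, which is interpretable only under the stated condition that loadings are held equal across levels.

```python
def bernoulli_variance(p):
    """Variance of a binary variable with mean p; it is not a free parameter."""
    return p * (1.0 - p)


def factor_icc(within_factor_var, between_factor_var):
    """Share of factor variance that lies between clusters.

    Meaningful only when the loadings are invariant across the within and
    between levels, so both variances are expressed in the same metric.
    """
    return between_factor_var / (between_factor_var + within_factor_var)


# Hypothetical variances from a model with loadings equal across levels.
icc = factor_icc(within_factor_var=3.0, between_factor_var=1.0)
```

With different numbers of factors on the two levels, as in the model discussed here, the two variances do not refer to the same construct and the ratio has no clear meaning.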


Dear Dr. Muthen, Thank you for your reply. Is there a way, given a multilevel factor analysis with different loadings at the two levels and categorical variables, to calculate the level 2 reliability coefficient from the Mplus output? For example, as proposed by Raudenbush in some of his papers on three-level logistic Rasch measurement models? Sincerely, Magdalena


Could you give me a reference to a key paper on this? 


Dear Dr. Muthen, Two papers which discuss three-level logistic Rasch measurement models, and present equations to calculate level 1 and level 2 reliability, are: Raudenbush, S.W., Johnson, C. and Sampson, R.J. (2003). A multivariate, multilevel Rasch model for self-reported criminal behavior. Sociological Methodology, 33(1), 169-211. Cheong, Y.F. & Raudenbush, S.W. (2000). Measurement and structural models for children’s problem behaviors. Psychological Methods, 5(4), 477-495. Thanks, Magdalena


Dear Dr. Muthen, I have read the short course handouts and I still have some questions about the output from the Mplus categorical multilevel factor analysis. 1) Should the thresholds be divided by the level 1 factor loadings or by the level 2 factor loadings to get IRT parameters? What level do these thresholds correspond to? 2) How can one calculate item and scale information from a two-level logistic IRT model as output in Mplus? 3) How can one calculate within and between reliability from a two-level logistic IRT model as output in Mplus? 4) How can one specify in Mplus a Rasch two-level model otherwise equivalent to a two-level logistic IRT model? Thank you for your help. Sincerely, Magdalena


You may find it useful to study the paper posted on our web site under Recent Papers: Grilli, L. & Rampichini, C. (2004). Multilevel factor models for ordinal variables. Submitted for publication. This gives details in an Mplus framework. 1) In two-level modeling, means/intercepts/thresholds are given on level 2 (see 2-level linear regression in the Raudenbush-Bryk book as an example). It sounds like you are using a model that has different loadings for the 2 levels. I don't know that multilevel IRT models have addressed the issue of differing loadings (discriminations) on the different levels. If you compare to the Raudenbush et al. (2003) article in Soc Meth, eqn (2) is written in classic IRT form, but when adding the page 183 multilevel features, it looks like the discrimination parameter lambda has been dropped. And more to the point, I don't see why it is necessary to connect to IRT; all you are interested in is being able to plot your item characteristic curve as a function of within and between variation in the ability. You can do that straight from the Mplus model (again see Grilli & Rampichini). But if you show me a multilevel IRT model with different loadings, I will make the translation. 2) Regarding within- and between-level reliability, I still have to read up on the references you gave me to answer this. These are not quantities I am used to looking at (I think). 3) A two-level Rasch model in line with the Raudenbush article would seem to be easy to specify in Mplus using ML logit. You simply set loadings equal across items and across levels. Keep me informed about your progress. And I will try to find time to read those 2 articles you suggested (my reading stack is just a little high right now...).
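A rough Python sketch of the item characteristic curve mentioned in point 1 follows. It assumes a binary item with a logit link, one within-level and one between-level factor, a threshold on the between level, and no between-level residual for the item; these are simplifying assumptions for illustration, and the function name and all numeric values are hypothetical.

```python
import math


def two_level_item_probability(threshold, loading_w, loading_b, eta_w, eta_b):
    """P(u = 1) given the within-level and between-level factor values.

    logit = -threshold + loading_w * eta_w + loading_b * eta_b
    """
    logit = -threshold + loading_w * eta_w + loading_b * eta_b
    return 1.0 / (1.0 + math.exp(-logit))


# Curve over the within-level factor for a cluster at eta_b = 0.5,
# using hypothetical parameter values.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
curve = [two_level_item_probability(0.4, 1.1, 0.8, eta_w, 0.5) for eta_w in grid]
```

Under the same assumptions, the Rasch-type model of point 3 is the special case in which `loading_w` and `loading_b` are equal to each other and to the loadings of all other items.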


Thank you very much for your reply and for taking the time to keep answering my many questions! 1) I did read the Grilli and Rampichini paper, but I would like to make a statement about the overall precision, or reliability, of the scale at the individual and neighborhood levels, and this paper only proposes formulas to calculate communalities at the item levels. That's why I thought I could at least calculate the scale information from the transformed IRT parameters... 2) Am I right in assuming that if one has two factors at the subject level and one factor at the neighborhood level, one should not constrain the items to have equal loadings at the two levels? This is the model I have specified:

MODEL RESULTS

                 Estimates     S.E.

Within Level

 FW1 BY
    Q12AR            1.000    0.000
    Q12BR            1.064    0.097
    Q12CR            0.766    0.134
    Q12ER            0.651    0.056
    Q12FR            0.562    0.146

 FW2 BY
    Q11AR            1.000    0.000
    Q11BR            1.297    0.090
    Q11ER            1.439    0.178
    Q11FR            0.698    0.059
    Q11KR            0.582    0.048
    Q11MR            0.993    0.077
    Q11GR            1.033    0.067

 FW2 WITH
    FW1              2.830    0.186

 Variances
    FW1              5.784    0.508
    FW2              2.761    0.265

Between Level

 FB BY
    Q12AR            1.000    0.000
    Q12BR            1.314    0.192
    Q12CR            0.516    0.106
    Q12ER            0.847    0.122
    Q12FR            0.641    0.285
    Q11AR            0.578    0.081
    Q11BR            1.022    0.163
    Q11ER            1.842    0.324
    Q11FR            1.265    0.211
    Q11KR            0.976    0.149
    Q11MR            1.764    0.262
    Q11GR            1.100    0.162

 Thresholds
    Q12AR$1          0.101    0.093
    Q12BR$1          1.267    0.133
    Q12CR$1          0.094    0.163
    Q12ER$1          0.402    0.110
    Q12FR$1          1.280    0.097
    Q11AR$1          0.764    0.060
    Q11BR$1          0.052    0.090
    Q11ER$1          1.644    0.146
    Q11FR$1          0.895    0.103
    Q11KR$1          0.469    0.090
    Q11MR$1          0.208    0.145
    Q11GR$1          0.840    0.092

 Variances
    FB               0.371    0.129


1) Just so that we have the language clear, when you say "scale information", do you refer to estimated factor scores and their precision (SE)? 2) That's right. The factors mean very different things on the two levels so that equal loadings would not make sense. 


Hi, By "scale information" I mean the sum of item information for the items in a scale (information provided by a specific response category times the probability of a respondent with trait level x choosing the gth response category), assessed across the range of the underlying latent construct. In essence, however, I would just like to get a measure of the precision of the scale and be able to make a statement about its measurement quality at the subject and neighborhood levels. Thank you for all your help and time with this problem. I very much appreciate it! Sincerely, Magdalena


When you describe what you mean by "scale information", it sounds like what IRT people call "information function". That is, the standard errors for the factor score estimates expressed as a function of the true ability. See for example the 1985 Hambleton-Swaminathan book, chapter 6. Do you agree? If so, Mplus has not yet implemented this, but we will do so, also for multilevel models.


Yes, it is. OK, that's good to know, thanks. Then is there a way I can characterize the precision/reliability of the scale as a whole with these models in Mplus?


The reason calculating precision/reliability for a scale (= a latent variable construct, estimated as factor scores) is not included yet in Mplus is that this is not typically central to latent variable modeling in the following sense. You have a measurement model with 2 within-level constructs, one for 5 and one for 7 categorical items. The measurement modeling is typically not an aim in itself, but is related to other variables, either predictors or consequences. Those other variables can be brought together with the measurement model to create a structural equation model that is estimated in a single step. The precision/reliability aspect of the measurement model then translates to how well you can estimate structural regression slopes, and that is assessed by their SEs. Few research questions need to be approached in a 2-step fashion (measurement model producing a scale, scale used for some purpose). What kind of use of your measurement model do you have in mind?


I would like to make a statement about the quality of a measure across two different sites. So one of the things I would like to compare is the level of reliability of a measure of the latent construct in the two sites, i.e., can the same construct be measured with comparable reliability in the two sites? That's why I wanted to compute the reliability. Maybe there's a way to do it manually, as proposed by Raudenbush in those articles I cited. Also, I have another question. Does Mplus have a way of doing differential item functioning analysis for multilevel structures? I would like to compare measurement equivalence across the two sites for this scale as well, but since the items are dichotomous, I can't use multiple group CFA. Thanks, as always, for all your help! Sincerely, Magdalena


Regarding comparing reliability of the measure across the 2 sites, what I was referring to would amount to using a 2-group latent variable analysis instead of estimating factor scores and comparing them across groups. In the 2-group analysis, you can test group invariance of the item parameters directly. The reliability of the factor score estimates that you are referring to will not be very high with only 5-7 categorical items. In contrast, the 2-group latent variable analysis comparing, say, the means of the latent variable across sites (assuming measurement invariance) can give good power/precision in the estimation. I guess the above answers your second question as well. You can do 2-group analysis for categorical items. In the ML estimation framework you would do that using Type = mixture and the Knownclass option to capture the 2 groups.


Dear Dr. Muthen, Using STREAMS with Mplus, I performed a two-level analysis. To obtain start values, I first performed a regular confirmatory factor analysis on the total covariance matrix. This led to a model with six latent variables and covariances among most of them. The fit of this model is as follows: RMSEA = .061 and chi-square/df = 2.18. Although the RMSEA indicates some room to adjust the model, the chi-square/df shows a good model fit. Since further adjustments would lead to difficulties in interpretation, I decided to use this model as a starting point at the within level of the two-level analysis. At first I presupposed no between structure by allowing the manifest variables at the between level to covary freely. However, running this two-level analysis, I encountered several problems. Sometimes I received the remark: 'Estimated between covariance matrix is not positive definite as it should be. Computation could not be completed. Model estimation did not terminate normally. Change model and/or starting values.' I tried some small changes, like covariances among all latent variables, but that did not help. At other times I received an internal error code (GH1006), or even a fatal error code indicating that there is not enough memory space to run the program on the current input file. However, at other times the model did run (see remark above). Trying to specify the between-level structure also led to the internal error code or the fatal error code. Encountering these problems, I was wondering if they are linked to the poorer fit of my model for the total covariance matrix, though this is only indicated by the RMSEA. If not, what are your suggestions to overcome these problems? If necessary I can send in my input file and data, as asked for in the internal error code GH1006. Thanks for your help. Ellen D'Haenens


The only way to see what is happening is for you to send your input, data, output, and license number to support@statmodel.com. 


Dear Dr. Muthen, I have a couple of follow-up questions to the issue that Magdalena Cerda raised. I am also interested in calculating the level-2 test information function for a model with many level 2 units. I realize that calculating level 2 precision estimates of factor scores is often "unnecessary" in latent variable modeling since the goal is to keep the measurement and structural components in a single model rather than taking out the factor scores to use in a path analysis. However, there are many instances in educational research where test scores are used in a multilevel analysis and are not treated as latent variables. The current state of educational research is that test scores are much more widely available as scaled scores than as raw data. When student test scores (scaled theta estimates computed from an IRT model, so equivalent to factor scores from a categorical FA) are used in multilevel models where the effect of interest is at level 2 (such as looking at an effect of a teacher intervention on student achievement where teachers are the unit of assignment), it seems that it would be important to know the reliability of the latent mean achievement at the teacher level. It is possible that the test information function at level 2 may be very different from the test information function at level 1, which would suggest that the same measures may not be appropriate for inferences that involve students and teachers. I realize that Mplus does not currently plot the IRT information function at level 2 or save the standard errors of factor scores at level 2. However, if Mplus can be used to obtain level 2 loadings and thresholds for binary indicators, then can't I use this information myself to compute a test information function at level 2? Thank you for your help. Sharyn Rosenberg


Mplus does give information curves for level 2 latent variables. If you don't get them, there might be another reason for it. You might want to send your input, output, data, and license number to support@statmodel.com 


Thanks. I actually did get the level 2 information curve when I tried it (I didn't realize that Mplus now had this capability when I first posted the message). 

wendy posted on Friday, July 21, 2006  4:17 pm



Hi, Dr. Muthen: I simulated a 2-level CFA model in which the within and between levels have equivalent structures: 2 factors, each with 3 indicators. The output showed an error message indicating that my between-level covariance estimate is not positive definite. It says: THE RESIDUAL CORRELATION BETWEEN Factor2 AND Factor1 IS 1.000. Does that mean the two between-level factors are highly correlated? What does it mean, and how could I correct starting values such as factor variances?


It is likely that in MODEL MONTECARLO you specify a high correlation between factor2 and factor1 and that in one random draw, the correlation becomes one. If you want more information on this, send your input, output, and license number to support@statmodel.com. 


Dear Dr. Muthen, I estimated a two-level factor analysis, with two factors at the respondent level (4 items each) and one factor at the neighborhood level (8 items), and an n=2494, nested within 166 neighborhoods. I get very different results for the neighborhood-level factor loadings and the neighborhood-level reliability (using the information curves), depending on the scale item I decide to fix to 1 in the neighborhood-level factor and one of the respondent-level factors. If I select one particular item, the reliability is low (highest information statistic about 0.8) but the item loadings are significant. If I select another item, the reliability becomes sky-high (an information statistic of about 100), but the factor loadings for many of the items at the neighborhood level become insignificant. Changing the reference item doesn't have an impact on factor loadings or reliability at the respondent level. Would you know why this is? Thank you, Magdalena Cerda


Please send your input, output, data and license number to support@statmodel.com. Include both choices for item loading fixings. 


Different items chosen to set the metric of the factor give different results for significance of loadings and for information functions because the factor is expressed in a different metric. For loadings, this is in line with standardized coefficients not being significant at the same time as the raw coefficients. The information curves will differ in the two runs, but they will be proportional by lambda^2; see formula (8) in http://statmodel.com/download/MplusIRT1.pdf. I recommend using the conventional IRT metric of all loadings free and the factor variances fixed at 1 on both levels. As an alternative to Monte Carlo integration I would suggest INTEGRATION = 7.
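The proportionality can be checked numerically with a small Python sketch. It is a single-level simplification with binary logit items and hypothetical parameter values, and the information here is taken with respect to the factor in whatever metric the run uses; the function names are invented for the example and formula (8) of the cited note is not reproduced.

```python
import math


def switch_reference_item(loadings, factor_var, new_ref):
    """Re-express loadings and factor variance with item new_ref fixed at 1.

    The factor is rescaled by the old loading of the new reference item.
    """
    scale = loadings[new_ref]
    new_loadings = [lam / scale for lam in loadings]
    return new_loadings, factor_var * scale * scale, scale


def information(loadings, thresholds, eta):
    """Sum over items of loading^2 * P * (1 - P) at factor value eta."""
    total = 0.0
    for lam, tau in zip(loadings, thresholds):
        p = 1.0 / (1.0 + math.exp(-(-tau + lam * eta)))
        total += lam * lam * p * (1.0 - p)
    return total


# Hypothetical run with item 0 as the reference item.
loadings = [1.0, 2.0, 0.5]
thresholds = [0.0, 0.5, -0.5]
new_loadings, new_var, scale = switch_reference_item(loadings, 4.0, new_ref=1)

eta = 0.3                                    # a point in the original metric
info_old = information(loadings, thresholds, eta)
info_new = information(new_loadings, thresholds, scale * eta)  # same point
ratio = info_old / info_new                  # equals scale ** 2
```

The response probabilities are identical in the two runs, so the fit is unchanged. Only the unit of the factor, and with it the height of the information curve, differs by the squared loading, which is how one run can peak near 0.8 and another near 100.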


Hello, I have estimated a multilevel confirmatory factor analysis with one factor at each of two levels, and covariates at the two levels as well. One of the neighborhood covariates consists of a set of dummy variables for different neighborhood "types" (poor and cohesive vs. poor and non-cohesive, for example). I would like to construct a bar graph showing the estimated level of the latent outcome (perceived violence) for the different neighborhood types, with average values for all other covariates. The question is, how does one obtain this in a two-level factor model that has covariates at both levels? Does one just use the beta estimates at the between level to estimate a prediction, as one would in a single-level model, or does one also use thresholds at the between level, and does one also need to use anything at the within level? The concern arises particularly since there is no one intercept in the model, but several... Thank you for your help, as always! Sincerely, Magdalena Cerda


Please send your full output and license number to support@statmodel.com so we can see your full model. 


Dear authors, I am carrying out a multilevel factor analysis for ordinal variables. As suggested by Grilli and Rampichini (2007), I want to carry out a separate EFA on the estimated between and within correlation matrices of the latent responses. "The decomposition of the latent response correlation matrix into the between and within components can be obtained by means of a multivariate two-level ordinal model with unconstrained covariance structure." How is it possible to obtain this decomposition with Mplus? Sincerely, Roberta


Yes, see the SAVEDATA command in the Mplus User's Guide. 

elisa posted on Thursday, March 08, 2007  4:38 am



Dear Dr. Muthen, I'm working on MFA following the strategy suggested by Muthen (1994) and Grilli (2007). I have 12 indicators and I want to estimate the between and within covariance matrices. I referred mainly to the Mplus web site Example, cont10.

USEVARIABLES ARE ...;
MISSING ARE all(999);
cluster = cdl;
ANALYSIS: TYPE = TWOLEVEL;
estimator = ml;
model:
%within%
fW BY giud RAPCOL RAPDOC RAPNDOC RAPRELA RAPSTUD STRAULE STRBLB STRLAB indietr R1452 R1482;
%between%
fB BY giud RAPCOL RAPDOC RAPNDOC RAPRELA RAPSTUD STRAULE STRBLB STRLAB indietr R1452 R1482;
OUTPUT: SAMPSTAT STANDARDIZED;
SAVEDATA: SIGB IS "d:\...\SIGB.txt";
SAMPLE IS "d:\...\SAMPLE.txt";

The output gave me the estimated sample statistics for between and within, but it says:

THE ESTIMATED BETWEEN COVARIANCE MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE. COMPUTATION COULD NOT BE COMPLETED. PROBLEM INVOLVING VARIABLE R1482. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES

What do I have to do? And another question: the values in the SIGB file I saved are different from the ESTIMATED SAMPLE STATISTICS FOR BETWEEN. Why? Which values do I have to use in order to carry out an EFA on the between covariance matrix? Many thanks, best regards, Roberta


This message usually points to a problem of zero variance on the between level. You can fix the variance to zero. If this is not the issue, please send the input, data, output, and your license number to support@statmodel.com. I would need more information to answer your other question. Please send the input, data, output, and your license number to support@statmodel.com. 

elisa posted on Tuesday, March 13, 2007  4:12 am



Dear Mrs. Muthen, another question. I have 12 indicators and I want to estimate the between and within covariance matrices in order to follow the strategy of Muthen (1994) and Grilli (2007) for the subsequent analysis. Snijders and Bosker (1999) suggest fitting a multilevel multivariate empty model, but I think this is not possible in Mplus. So I have to use a TWOLEVEL analysis to obtain the estimates of the between and within covariance matrices. But, at every level, do I have to "create" as many factors as there are observed variables, or do I have to "create" just one factor? Are there any references on this topic with Mplus examples? And when I use the SAVEDATA command, do I obtain SIGMAB and "SIGMAW" or the pooled SIGMAW? After obtaining this matrix, may I use an EFA? So, ANALYSIS: TYPE = EFA 2 4; MATRIX = COVARIANCE; Thanks a lot, sincerely, Elisa

elisa posted on Wednesday, March 14, 2007  8:40 am



Dear authors, I have 12 indicators and I want to estimate the between and within covariance matrices in order to follow the strategy of Muthen (1994) and Grilli (2007) for the subsequent analysis. After saving the estimated between and within covariance matrices through the SAVEDATA command, how may I use them for an EFA and a CFA? If I use: VARIABLE: NAMES ARE item1-item12; USEVARIABLES ARE item1-item12; ANALYSIS: MATRIX=COVARIANCE; Mplus does not read the data correctly. How can I solve the problem? Thanks a lot, sincerely, Elisa


Example 12.1 shows how to read a covariance matrix as data. The MATRIX option of the ANALYSIS command should not be used. If you continue to have problems, please send your input, data, output, and license number to support@statmodel.com. 


Hello, I am working with item-level data for a 3-item scale (y1-y3) collected on 150 people (id) once a day for 14 days (day). I would like to break down the variance components for the scale in order to calculate various scale reliability estimates (e.g., within-person reliability across days). In order to do this, I would like to calculate variance components for person, day, item, person*day, person*item, day*item, and error. I was wondering if someone might be able to help me with the Mplus syntax to get these estimates?


I would Google George Marcoulides. He has done work in this area. Get an input from one of his papers and translate it into Mplus if he has not used Mplus. 


Dear Dr. Muthen, I'm working on multilevel CFA. The Mplus 5.1 demo does not display chi-square or modification suggestions. How can I solve the problem? Result:

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.668D-16. THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 11.

Thank you so much.


We would have to see your exact model to be able to comment. Please send input, output, data, and license number to support@statmodel.com. 


I have done a longitudinal content analysis of 128 websites of candidates for President. I have 81 variables, most of which are dichotomous. Overall, 20 candidates' websites were coded in 11 observations, but not all of them were coded in all observations: websites were taken in and out of the target population based on the candidates' decisions to enter or quit the race. Thus, a candidate that entered early and stayed throughout the race would have 11 observations, while one that entered early but dropped out quickly would have, say, 4 observations, and so on. I would like to know whether multilevel factor analysis would be suitable to analyze my data. I am concerned that the sample size is too small, especially considering that both between- and within-group analyses need to be conducted. Thanks a lot for your help.


It sounds like you have a sample of 20 subjects, which is quite low. But I am not sure what the difference is between your "81 variables" and your "11 observations". Perhaps an "observation" is a time point at which you observe several variables. If you have 20 subjects observed at 11 time points (some subjects have fewer time points) you have longitudinal data and that means your information is increased substantially. See for example the discussion of growth modeling sample size in Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 4, 599-620. With only 20 subjects, however, you can do only very limited "level-2" (person-level) modeling.

robertav posted on Thursday, October 08, 2009  7:13 am



Dear authors, I'm applying a multilevel factor model to my data. The Mplus output gives me some relative goodness-of-fit measures (BIC, AIC, etc.). I would like to know if there are any references on absolute goodness-of-fit measures that should be used in multilevel factor models. Thanks, best regards, Roberta


I don't know of any such references. If you want chi-square and related fit statistics, you can use weighted least squares estimation. See the ESTIMATOR option in the User's Guide.


Dear Drs. Muthen, working on a paper on intergenerational relations, I am analyzing the following three-level data structure: Respondents (level 2) rate several aspects (items) of their relations (level 1) with 6 different kin. Additionally, the study is cross-national in that respondents are nested within 11 countries (level 3). I am currently working on the measurement models (CFAs) which, if successful, are to be extended to full multilevel SEMs. I planned on modeling the first two levels explicitly while correcting standard errors for the clustering by country (using the TYPE=TWOLEVEL COMPLEX command). From my understanding, 11 level-3 units is not enough to be modeled as a third level, anyway. However, in this analysis Mplus gives implausible fit indices (a CFI of 0.0 and even a negative TLI). As soon as I take the country cluster variable out, results become plausible (CFI=.92, TLI=.85).
1. Do you have any suggestion why this occurs or what I can do about it?
2. In my case, would you recommend a multigroup approach to establish measurement invariance across countries instead of simply correcting the standard errors for country clustering?
3. Do you know of any literature on clustered or multigroup two-level CFA?
Many thanks in advance, O. Arránz Becker


11 countries is typically not sufficient for Type=Complex. At least 20 are needed. So I would switch from random country effects to fixed ones, using country dummies as covariates. Or use country as a grouping variable as you say. I wonder if you need multilevel modeling at all: couldn't the different items for the different kin simply be a multivariate observation vector where the items and the kin are correlated via the model? Multiple-group two-level CFA was written about in Muthén, B., Khoo, S.T. & Gustafsson, J.E. (1997). Multilevel latent variable modeling in multiple populations. Unpublished technical report. It is on our web site under Papers, Multilevel SEM.


Thanks a lot for your interesting suggestions. Some new questions arise:
1. I wonder what the suggested multivariate approach would imply for my CFA. For every kin, there would be as many factors as there are constructs, with measurement error variances per item correlating between kin, is that right?
2. I also wonder how one would then build covariates measured on different levels into an SEM extension (e.g., a dummy for relation with same-gender kin (level 1) and respondent's education (level 2))? Via 2nd-order factors (i.e., relational aspects across kin)? I am a bit confused here and would appreciate any suggestion.
3. If I use fixed country dummies, are standard errors for lower-level covariates (e.g., respondent's education) estimated correctly?
4. Finally, do you have any references on the equivalence (or similarity) of the multilevel and the multivariate approach to hierarchical data?
Thank you so much in advance, best regards, Oliver


See the Khoo et al reference on slide 54 of our Topic 8 handout on our web site (and surrounding slides) about multivariate versus multilevel approaches. 


Multilevel CFA and sample size: I am developing a group-level scale, and need to know the minimum number of individuals and groups required to run a multilevel EFA (and CFA). In the final scale, I expect about 20-25 items and 3 subscales. Would you recommend running a Monte Carlo simulation to determine (approximately) the required sample size, and which references would be useful for that? Thanks a lot. PS: I have (Muthén, 1994), (Muthén & Muthén, 2002) and the doc "v5.1 ex. Addendum".


I think a Monte Carlo study could be helpful to determine the sample size needed for your study. I don't know of any references beyond what you mention. 


Hi, I have a follow-up question concerning my multilevel analysis of perceptions of relationships (level 1) nested within respondents (level 2) nested within 11 countries. Concerning your suggestion to include fixed country dummies, does this actually solve the problem of biased standard errors due to country clustering? I noticed that adding a cluster command for countries (despite concerns about the small number of countries) does affect standard errors considerably even if country dummies are already in the model. Do I also get correct standard errors if I include country-level aggregate covariates (e.g., gross national income) instead of country dummies? Thanks in advance, Oliver


Using dummy variables will take care of some but not all of the non-independence of observations. Including country-level covariates can also help. The standard errors are not trustworthy using TYPE=COMPLEX with only 11 countries.


Does "not trustworthy" mean that standard errors may be both over and underestimated? Can you provide any reference on the minimum number of clusters for TYPE=COMPLEX (in case some reviewer objects)? Thanks, Oliver 


Yes. See writings by Joop Hox. 


I have run a multilevel factor analysis using this code:

MODEL:
%WITHIN%
LNw BY RTw Logitw;
LNp BY RTp Logitp;
s | Logitp ON LNw;
%BETWEEN%
s;

My question is whether the factor scores are also influenced by the s part, or are the factor scores only determined by the first two lines?


Factor scores are estimated using the entire estimated model. 

Teresa Dubb posted on Tuesday, March 29, 2011  6:01 pm



Hi, I have an experiment where 50 participants are assigned to two conditions (good/bad), and in each condition each participant is presented with two different alternatives (A/B) and asked to provide judgments on both alternatives A and B. I understand that the judgments are clustered by participant, but I can't quite figure out how to use TWOLEVEL EFA to perform an exploratory factor analysis with CLUSTER = Participant. Below is an example of the data, where X1, X2, and X3 are judgments on different aspects of the alternatives. Thanks very much.

Participant  Condition  Alternative  X1  X2  X3  ...
1            good       A            10  12  15  ...
1            good       B            30  25  38  ...
2            good       A            20  22  33  ...
2            good       B            60  35  50  ...
3            bad        A            30  30  40  ...
3            bad        B            20  40  50  ...


If the alternatives aren't randomly equivalent, it seems like you might want to spread their judgements in a wide, multivariate fashion instead of doing a two-level model. The data would then look like:

1 good 10 12 15 ... 30 25 38 ...
2 good 20 ...
etc.

You handle this by "longitudinal factor analysis", which can be done in an EFA framework using "ESEM"; see exploratory structural equation modeling in the UG index. You can then check if the judgement factors are the same for A and B.
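The long-to-wide rearrangement described above can be sketched outside Mplus before the data file is written. The snippet below is only an illustration in plain Python using the six example rows from the question; the function name and the A-then-B column order are assumptions, not anything Mplus prescribes.

```python
# Long format: one row per participant x alternative, as in the question.
long_rows = [
    (1, "good", "A", 10, 12, 15),
    (1, "good", "B", 30, 25, 38),
    (2, "good", "A", 20, 22, 33),
    (2, "good", "B", 60, 35, 50),
    (3, "bad", "A", 30, 30, 40),
    (3, "bad", "B", 20, 40, 50),
]

def to_wide(rows, alternatives=("A", "B")):
    """Return one record per participant: id, condition, A scores, B scores."""
    by_participant = {}
    for participant, condition, alternative, *scores in rows:
        entry = by_participant.setdefault(
            participant, {"condition": condition, "scores": {}})
        entry["scores"][alternative] = scores
    wide = []
    for participant in sorted(by_participant):
        entry = by_participant[participant]
        record = [participant, entry["condition"]]
        for alternative in alternatives:
            # Raises KeyError if a participant lacks one alternative, so
            # incomplete cases must be handled before calling this.
            record.extend(entry["scores"][alternative])
        wide.append(record)
    return wide

wide_rows = to_wide(long_rows)
for record in wide_rows:
    print(*record)
```

Each output line then holds all judgments of one participant, which is the layout a single-level multivariate analysis expects.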


Hello, I am trying to run a two-level SEM (with a cluster level) with continuous factor indicators (latent factors 1-2), categorical factor indicators (latent factors 3-5) and 2 observed covariates on the within level. Additionally, I have 2 observed covariates on the between level. I wonder which estimator is appropriate in this analysis, WLSMV or MLR? Thanks for your advice, Sofie


If you have three factors with categorical indicators, that will require three dimensions of integration, which is computationally demanding and would suggest using WLSM or WLSMV. If you have a lot of missing data, you may want to use MLR, or multiple imputation followed by WLSM or WLSMV.


Hello, I am trying to run a relatively simple latent variable model using a complex data set (TIMSS). I was under the impression that the most recent version of Mplus had the capacity to apply SEM analysis to complex data, accommodating the weighting variable and the replicate weighting variable. However, I am getting an error suggesting otherwise:

EFA factors are not allowed with replicate weights. EFA factors are declared with (*label).

Can you not use the complex data options with latent variable modeling in version 6.1? Thank you in advance for your time! Below is my syntax.

VARIABLE: NAMES ARE BS4GSEX BS4GBOOK TOTWGT JKZONE JKREP USBSGQ6A USBSGQ6B PATM VALUE SCONF APP01 KNO01 REA01;
USEVARIABLES ARE BS4GBOOK TOTWGT JKREP USBSGQ6A USBSGQ6B PATM VALUE SCONF APP01 KNO01 REA01;
MISSING ARE ALL (9);
CATEGORICAL ARE BS4GBOOK USBSGQ6A-SCONF;
WEIGHT IS TOTWGT;
REPWEIGHTS = JKREP;
MODEL: F1 BY BS4GBOOK USBSGQ6A USBSGQ6B (*1);
F2 BY PATM VALUE SCONF;
F3 BY APP01 KNO01 REA01;
F3 ON F1-F2;
ANALYSIS: TYPE = COMPLEX;
ESTIMATOR IS WLS;
REPSE = BRR;
ITERATIONS = 1000;
CONVERGENCE = 0.00005;


You are combining EFA and CFA factors in the MODEL command. Replicate weights are not allowed with the EFA factor: F1 BY BS4GBOOK USBSGQ6A USBSGQ6B (*1); They are allowed for the CFA factors: F2 BY PATM VALUE SCONF; F3 BY APP01 KNO01 REA01; 


Dear Dr. Muthen, I have a 3-level nested cross-sectional dataset where the outcome variable is binary. The level-1 data are on patients' characteristics, the level-2 data are on physicians' characteristics, and the level-3 data are on clinics' characteristics. The patient data include a number of observed variables, some continuous, some binary (and ordinal), and the rest count variables. The physician data also include a mixture of continuous, binary, and count variables. The clinic data include only one binary variable. Given the mix of exogenous variable types (continuous, binary, and count) and the binary outcome variable, can Mplus handle this as a 3-level factor model? Furthermore, I was thinking that before going ahead with the 3-level model, maybe I should first do EFA on just the patient-level data so as to reduce the number of predictors to a smaller but representative set of composites. Later, I can use those composites to define the level-1 model in the 3-level factor model. Do you think it is okay to do that? Thank you.


For cross-sectional models, Mplus currently handles two levels. You can use TYPE=COMPLEX TWOLEVEL for three-level data, where the standard errors are computed taking clustering at the third level into account.


Thank you. A follow-up question. I was thinking that before going ahead with the 3-level model, I should first do EFA on just the patient-level data so as to reduce the number of predictors to a smaller but representative set of composites. Later, I can use those composites to define the level-1 model in the 3-level factor model. Do you think it is okay to do that?


I really can't answer this question. I would need to know far more about your data and study than is allowed on Mplus Discussion. 

Nidhi Kohli posted on Tuesday, July 26, 2011  12:52 pm



Dear Linda, Thank you for your response. I tried to run a 2-level CFA model on the clustered dataset that I described in my earlier post. I am getting the following error message:

*** ERROR in ANALYSIS command
Estimator WLSMV is not allowed with TYPE=TWOLEVEL COMPLEX.

I have categorical factor indicators, hence I wanted to use WLSMV. Can you tell me what other estimator I can use for a dataset like mine? Below is the relevant Mplus code:

ANALYSIS: TYPE = COMPLEX TWOLEVEL;
ESTIMATOR = WLSMV
MODEL:
%WITHIN%
fw BY td_mh;
fw ON gender race mar emp anxiety phq2 MH_meds Non_MH_meds out_visits;
%BETWEEN%
fb BY td_mh;
fb ON age_md gender_md race_md img family panel_pts PCP_lateness;

Thank you.


You need to use the MLR estimator which is the default for this type of analysis. 


Thanks! I used the default estimator, but I still have not had any success in running the Mplus program. I am getting the following error message:

*** ERROR
One or more between-level variables have variation within a cluster for one or more clusters. Check your data and format statement.

Between Variable    Cluster ID with variation in this variable
                    (only one cluster ID will be listed)
PCP_LATE            395

I am not able to understand what this message is trying to say. Can you please explain it to me? Thanks.


The message means that in cluster 395, the variable PCP_LATE does not have the same value for each individual. Having the same value for every individual within a cluster is a requirement for a between-level variable. For further help send your output, data, and license number to support@statmodel.com.
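Because the message lists only one offending cluster, it can help to screen the data file for all of them before rerunning. The following is a minimal sketch in plain Python, not an Mplus feature; the rows and the `md_id` cluster key are made-up stand-ins, and only the cluster number 395 is taken from the message above.

```python
from collections import defaultdict

def clusters_with_variation(rows, cluster_key, variable):
    """Return the cluster IDs in which `variable` takes more than one value."""
    values_by_cluster = defaultdict(set)
    for row in rows:
        values_by_cluster[row[cluster_key]].add(row[variable])
    return sorted(cluster for cluster, values in values_by_cluster.items()
                  if len(values) > 1)

# Hypothetical patient rows: md_id is the cluster (physician) identifier and
# pcp_late is meant to be a physician-level (between-level) variable.
rows = [
    {"md_id": 394, "pcp_late": 2},
    {"md_id": 394, "pcp_late": 2},
    {"md_id": 395, "pcp_late": 1},
    {"md_id": 395, "pcp_late": 3},  # differs within the cluster
]

bad_clusters = clusters_with_variation(rows, "md_id", "pcp_late")
print(bad_clusters)
```

Any cluster reported this way needs its between-level value corrected, or the variable has to be treated as a within-level variable instead.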


Thanks. I have sent the data and output file to support@statmodel.com, along with the license number. 

Jacky Luo posted on Friday, November 04, 2011  6:47 pm



Dear Dr. Muthen, I want to fit a 2-level Rasch testlet model to my data, which consist of 4 testlets, each with 9 binary items. I am also interested in getting the group variance. I ran the following code, but the output seemed a little bit off, and it seemed that I could not get a direct estimate of the group variance. Would you please take a quick look at it to see whether there is something wrong with my code?

TITLE: Two level IRT testlet model
DATA: FILE IS C:\Users\Jacky\Downloads\r1.dat;
VARIABLE: NAMES ARE u1-u36 group;
CATEGORICAL = u1-u36;
CLUSTER = group;
MISSING = ALL (999);
ANALYSIS: TYPE = TWOLEVEL;
MODEL=nocovariances;
INTEGRATION = MONTECARLO;
MODEL:
%WITHIN%
f1 BY u1-u9@(1);
f2 BY u10-u18@(1);
f3 BY u19-u27@(1);
f4 BY u28-u36@(1);
f5 BY u1-u36@(1);
f5@1;
%BETWEEN%
u1-u36*;
OUTPUT: TECH1 TECH8;


When you hold the loadings equal you should not use @ because @ means fixed. Also, you are only fixing the f5 variance at 1 and I think you want to fix all five variances. It will be impossible to handle all 36 random intercept variances on between (unless you use estimator = wlsmv). Instead you want to formulate a factor on between where those 36 random intercepts are the indicators. 

Jacky Luo posted on Sunday, November 06, 2011  11:55 am



Dear Dr. Muthen, Thanks very much for your prompt response. Based on your suggestions, I used the following code:

%WITHIN%
f1 BY u1-u9*(1);
f2 BY u10-u18*(1);
f3 BY u19-u27*(1);
f4 BY u28-u36*(1);
f5 BY u1-u36*(1);
f1@1; f2@1; f3@1; f4@1; f5@1;
%BETWEEN%
fb BY u1-u36;

Is this consistent with your suggestions? I am also wondering why I have to fix all 5 factor variances to 1, since f1-f4 are testlet factors and they are simulated not to have the same variance as f5. Thanks very much, Jacky


I would say

%WITHIN%
f1 BY u1-u9*(1);
f2 BY u10-u18*(2);
f3 BY u19-u27*(3);
f4 BY u28-u36*(4);
f5 BY u1-u36*(5);
f1@1; f2@1; f3@1; f4@1; f5@1;
f1-f5 WITH f1-f5@0;
%BETWEEN%
fb BY u1-u36;
u1-u36@0;

If you are going to have a Rasch-like model I would not generate data where the factor variances differ but instead the loadings.

Jacky Luo posted on Monday, November 07, 2011  10:33 am



Thanks so much, Dr. Muthen! 

Morag T posted on Thursday, November 17, 2011  3:09 am



I wonder if anyone can help me, please. I have 5 waves of a birth cohort study (data collected annually) and I have a series of categorical variables on ‘closeness to family and friends’ collected in each sweep (26 variables in total). I did 3 principal components analyses on these (I couldn’t just do one as not all the variables were sufficiently correlated). The resulting 3 factors regressed nicely with my DV, ‘Children’s Stress and Difficulties score’. However, as I used variables across waves this makes the data multilevel and my PCA was not. Can anyone advise me whether this data would be suitable for a multilevel factor analysis in Mplus? I have Mplus but it is new to me. Just to complicate matters, my data is a complex sample, with PSU (cluster), Strata and a longitudinal sample weight. Should these be incorporated into a multilevel factor analysis, and if so, how? If it is possible and you could show me the necessary syntax, I would be eternally grateful. What I had tried to do in Mplus was:

Title:
Data: File is \
Variable: Names are ;
Missing are all (9999) ;
Categorical are
Usevariables are
Stratification = DcStrat;
Cluster = DcPSU;
Weight = DcWTbrth;
Analysis: Type = Complex EFA 1 4 ;

Could you please advise on this? And many thanks for reading this far!


This seems to be a reasonable approach. 

Morag T posted on Friday, November 18, 2011  3:26 am



Many thanks for your prompt response. However, I'm sorry, I'm not quite understanding, do you mean the approach I already took was reasonable, or to take the multilevel factor approach would be reasonable? 


You can take the non-independence of observations into account using COMPLEX or TWOLEVEL. If you use TWOLEVEL, then in addition to taking non-independence of observations into account you would be interested in studying the factor structure on both within and between.

Morag T posted on Tuesday, November 22, 2011  2:30 am



Thank you very much! 

finnigan posted on Tuesday, December 13, 2011  6:06 am



Linda/Bengt, I want to run a multiple indicator growth model using three waves of data, but the data are collected from individuals nested in companies. Arguably, I have a multilevel data structure. I am trying to conduct measurement invariance testing using CFA prior to running the growth model. Should I run a CFA with a single factor using a multilevel analysis, i.e., run a two-level CFA at time one and repeat this analysis for times two and three, or would it be more appropriate to run a two-level longitudinal factor analysis to check for multilevel effects? Groups are unbalanced and less than 30. Sample sizes are t1 = 130, t2 = 118, t3 = 110. The multilevel structure is not of interest; I am trying to rule it out so that I can focus on the measurement invariance testing for the growth model. Does Mplus calculate sampling weights or do they have to be specified for Mplus?


I would run the analysis with and without TYPE=COMPLEX to see if there is a big difference in the standard errors. If not, I would ignore the clustering. You can see the Topic 4 course handout on the website for the steps for multiple indicator growth modeling.

finnigan posted on Thursday, December 15, 2011  2:10 am



Thanks Linda Should the multilevel factor analysis be run longitudinally ie all three time points or one factor analysis per time point? 


If you look at the Topic 4 course handout, you will see the steps we recommend. 


Hello, I am working on a two-level EFA/CFA model with Likert scale items from three scales. Items from two of the scales have response categories ranging from 1-4 and items from the third scale have response categories from 0-3. I am running continuous models. I am wondering if it is preferable or acceptable to standardize the responses (using z-scores) when running the multilevel factor analyses to get the items on the same scale. Results from the EFA/CFAs using standardized and non-standardized items are very similar in the within-level part of the model, but are a little different in the between-level part of the model. Is there a standard practice for whether or not to standardize items on different response scales in factor analysis, and does this differ for multilevel factor analysis? Thank you.


I would not standardize variables. 

Juliette posted on Wednesday, March 14, 2012  3:04 am



Thank you for the response. Could you elaborate on why I wouldn't want to standardize? Would standardizing not allow each variable to contribute one unit of variance to the solution? Thank you. 


You can only use standardized variables if you have a scale-free model. In this case, it does not matter whether you analyze a correlation or a covariance matrix. If you have a model with any constraints, you will obtain different results analyzing a correlation versus a covariance matrix. Standardizing also presents problems for across-group or across-time comparisons.


In TWOLEVEL EFA with ordinal indicators, I obtain the within and between sample covariances with SAMPSTAT. Consistent with the LRV formulation of categorical models, I see 1.00 for the within variance of all indicators and item-specific variances in the between matrix (unlike TWOLEVEL EFA with continuous indicators, where item-specific variances appear in both the within and between covariance matrices). However, an ICC is given for each item, and some algebra with the ICC and the between variance will yield the within variance consistent with the ICC, and it is not 1.00. How are within variances and ICCs calculated with categorical items and the LRV formulation? Is it appropriate for me to use the ICC to calculate the within variance for reporting purposes? Thanks


Is this probit or logistic regression? 


I don't understand probit or logistic in this context. It is an EFA.

Data: File is tone.dat ;
Variable: Names are ss video clip SU NU CA WA PO RE AF PA DO CO BO DI;
usevar SU CA PO RE DO CO BO DI;
categorical SU CA PO RE DO CO BO DI;
useobs video==0;
Missing are all (9999) ;
Cluster = ss;
Analysis: Type = TWOLEVEL EFA 1 2 UW 1 2 UB;
Output: SAMPSTAT;


An EFA with a categorical indicator uses a logit or probit regression of the indicator on the factor. See the IRT literature for instance  the topic is "item characteristic curves". WLSMV uses probit, while with ML you have a choice. 


OK, but I am still missing something, unless this relates to the intraclass correlation coefficient calculation. I hope I am not being dense. That is my entire program, so whatever the default estimator is for two-level EFA with categorical indicators, that's what I used. But my question is about the sample statistics obtained with SAMPSTAT, and about the ICC. I am just trying to get some semblance of total, within, and between estimated sample correlations. Papers with continuous indicators, like Reise's 2005 paper demonstrating MFA, report these correlations. I think logit vs. probit has to do with interpreting the loading. I am after the within variance used to calculate the ICC.


You can see which estimator and link you are using in the output summary. The default for 2-level categorical EFA is WLSMV, which uses a probit link. The reason we were asking is that the ICC calculation differs depending on the link. In this case, the within-level variance for a factor indicator is fixed at 1, so the ICC is

between variance / (between variance + 1)

If that doesn't seem right in your run, please send output to Support.


Thanks for confirming that the within variance is fixed at 1. This morning I rechecked my calculations and found that the problem was that I had transposed the between variance (.125) and the ICC (.111) when I made my calculations, and so I did not compute 1.0. (I computed .777 where I expected 1.0, and got confused.) Apologies for troubling the board with an issue that stemmed from me transposing two numbers. Many thanks for your help in getting me back on track. If I can ask a follow-up question, do I understand that with some other estimators, the within variance would not be fixed at 1.0? And is there any advantage/disadvantage to choosing different estimators with this EFA I am conducting?


For logistic regression, it is fixed at pi squared divided by 3, where pi is 3.14. No advantage to choosing different estimators.
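The arithmetic in this exchange can be checked with a few lines of Python. This is only a worked sketch of the ICC formula quoted in the thread, using the between variance (.125) and ICC (.111) reported by the poster; the function names are illustrative and nothing here is Mplus output.

```python
import math

PROBIT_WITHIN_VAR = 1.0               # within variance under the probit link
LOGIT_WITHIN_VAR = math.pi ** 2 / 3   # within variance under the logit link

def icc(between_var, within_var):
    """Intraclass correlation: between / (between + within)."""
    return between_var / (between_var + within_var)

def implied_within_var(between_var, icc_value):
    """Solve the ICC formula for the within variance."""
    return between_var * (1.0 - icc_value) / icc_value

icc_probit = icc(0.125, PROBIT_WITHIN_VAR)   # about .111, as reported
icc_logit = icc(0.125, LOGIT_WITHIN_VAR)     # smaller, since within is about 3.29

# Back-solving with the right numbers recovers the fixed within variance of 1.
within_ok = implied_within_var(0.125, icc_probit)

# Swapping the two numbers reproduces the puzzling .777 from the earlier post.
within_swapped = implied_within_var(0.111, 0.125)

print(round(icc_probit, 3), round(icc_logit, 3),
      round(within_ok, 3), round(within_swapped, 3))
```

The same between variance gives a different ICC under the two links only because the fixed within variance differs, which is why the link had to be established first.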


Dear Drs. Muthen, Hello. I am a novice at using multilevel factor models and I have a question concerning the use of a two-level factor model with four ordinal (Likert-type) items via WLSMV estimation. My sample size is 589 with 50 clusters. There is only one factor that I am trying to assess using this model (there is a unidimensional factor structure at level 1 and at level 2), but in all of the models tested the between-level (level-2) factor variance is non-significant using an alpha level of .05. Thus, I am wondering if the non-significant between-level factor variance warrants that I continue to use multilevel factor analysis, or if I should use Mplus to assess the factor structure of these items using an aggregated, single-level CFA? Put another way, is a multilevel factor model warranted when the level-2 factor variance is non-significant? Thank you for your assistance!


Yes, it is warranted if the residual variances are significant. 

KUN YUAN posted on Thursday, November 08, 2012  10:56 am



Dear Dr. Muthen, I have a four-level data set with 10 categorical factor indicators and would like to conduct multilevel EFA with it using Mplus. I've seen the examples with 2 levels. I wonder whether Mplus is capable of running a four-level EFA with categorical variables. If so, could you please point me to any examples you have? Thank you!


No, Mplus can only do 2-level EFA. I haven't seen any other program that can do more than 1-level EFA.

KUN YUAN posted on Friday, November 09, 2012  12:29 pm



Thank you for your answer, Dr. Muthen. 


Dear Dr. Muthen, I have conducted, for the first time, a multilevel factor analysis. I estimated a two-level factor analysis using a teacher survey and a principal survey (teachers are nested within schools). I did exploratory factor analysis (EFA) using the principal components analysis extraction method with Geomin rotation on 19 items in Mplus ver. 6.11. Now I have three results: 1. EXPLORATORY FACTOR ANALYSIS WITH 2 WITHIN FACTOR(S) AND 2 BETWEEN FACTOR(S): 2. EXPLORATORY FACTOR ANALYSIS WITH UNRESTRICTED WITHIN COVARIANCE AND 2 BETWEEN FACTOR(S): 3. EXPLORATORY FACTOR ANALYSIS WITH 2 WITHIN FACTOR(S) AND UNRESTRICTED BETWEEN COVARIANCE: What is the reasoning for choosing one of those three solutions over the others? Is there anything I should look out for before I use one of them? For example, if I choose the solution with 2 within factors and 2 between factors rather than one with an unrestricted level, what should be the rationale? And vice versa? Thank you 


There are fit statistics to help you choose. However, it may be difficult to choose based only on these. I would say ultimately, the interpretability of the factors based on your theory should guide you. 


Hi! I am trying to conduct a twolevel CFA on a data set with two observations per participant (ID) and categorical factor indicators (variables rated on a 4point likert scale). When I run the analyses, the output states that the model terminates normally, and there are no further warnings. However, the only model fit indicators I get are Loglikelihood and Information criteria, and no CFI, TLI or RMSEA. The command I use is this: TITLE: Twolevel factor analysis, categorical indicators DATA: FILE IS M:\Data.dat; VARIABLE: NAMES ARE ID Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12; CATEGORICAL IS Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12; CLUSTER IS ID; ANALYSIS: TYPE IS twolevel; MODEL: %between% F1Bet BY Y2 Y4 Y7 Y9 Y11 Y12; F2Bet BY Y1 Y3 Y5 Y6 Y8 Y10; %within% F1With BY Y2 Y4 Y7 Y9 Y11 Y12; F2With BY Y1 Y3 Y5 Y6 Y8 Y10; Is there anything I should ask for in the OUTPUT to receive CFI, TLI etc.? Thank you very much for your help! 


With categorical dependent variables and maximum likelihood estimation, means, variances, and covariances are not sufficient statistics for model estimation. Because of this, chisquare and related fit statistics are not available. 


Hello, I was hoping I could get advice on what type of EFA to perform in MPLUS for my dataset. In our study, participants retrieve several memories and rate them on 13 fixed variables. Since memories are personal, they vary between subjects. We are interested in examining 1) the factor structure underlying the variables characterizing the memories (i.e. withinlevel EFA) and 2) the factor structure at the betweensubject level, such that a factor reflects the fact that people who (on average) rate their memories high on one variable also rate their memories as high on another variable. Factor scores are most meaningful to me on the betweensubject level as I would like to correlate them with other measures. My question is whether it would make sense to perform a twolevel EFA (saving factor scores for the betweenlevel), OR to perform two separate EFAs, each with slightly different interpretations: 1) a withinlevel EFA where each row reflects a single memory and I ignore subjects alltogether and 2) a betweenlevel EFA where each row represents the average ratings across all memories from a single subject. Finally, if I chose the twolevel route, will MPLUS accept a datafile where each row reflects a single memory and there are 13 different columns and a separate “SubjectID” column? Thank you in advance! 


I think it makes sense to do a 2level factor analysis where the focus is on level 2 factors as you say. Regarding your last question, yes Mplus accepts this format. 


Hi! Thanks very much for your response. We ran a twolevel EFA and the algorithm identified an unrestricted within and 4factor between as providing the best fit to the data. I want to extract factor scores for the betweenlevel factors (I don’t care about the within) but am having trouble. I gather that saving factor scores is not possible with a twolevel EFA, so I ran a twolevel CFA and have gotten the following error. However, we played around with changing the number of factors and removing some of the variables (just to see if the model would run), and we were successful at getting a 3 factor model and producing factor scores. Do you have any insight into this error? We also tried different start values with no avail. Thank you very much! 


I apologize, I forgot to include the error: MAXIMUM LOGLIKELIHOOD VALUE FOR THE UNRESTRICTED (H1) MODEL IS 74757.670 THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILLCONDITIONED FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NONPOSITIVE DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.153D13. THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 53. 


Please send the output and your license number to support@statmodel.com. 

burak aydin posted on Friday, November 22, 2013  8:58 am



Hello, Our data have 130 (N) clusters, 15 kids in each cluster (n~ ); and we are trying to confirm 2 latent factors assessed with Likert-type questions (scale 1-3). We have 14 indicators for f1, 30 indicators for f2. We are not interested in the factor structure at the between level; also, running MCFA is computationally demanding (and we don't want to get into parceling). We plan to run a single-level CFA with declared clusters (CLUSTER command). Do you think this model is sound/valid/satisfactory to run and report a CFA for clustered data? Thanks 


I think you are asking about Type=Complex versus Type=Twolevel. The factor model is not "aggregatable" (see Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316) and may be somewhat distorted by taking the Complex approach relative to the Twolevel approach. But it may be a reasonable approximation. 

burak aydin posted on Monday, November 25, 2013  10:12 am



Yes, I am asking Complex vs. Twolevel. I was encouraged to ask this question because complex vs twolevel provided consistent findings for a two level regression model, not in a simulation study but with 25 different outcomes. For a two level regression I always prefer using the twolevel approach.(this might come to GEE vs multilevel) We also wanted to use twolevel approach for CFA, given it was the nature of the data set. But we faced some difficulties, both computational and theoretical. Thank you very much for your response. 

SY Khan posted on Monday, December 09, 2013  6:58 am



Dear Drs. Muthen, I am working on secondary data in which the independent variables (employed HR practices) are measured at workplace level (level2) and are binary. Employees responses to the HR practices in terms of their job satisfaction, depression and organizational support etc are measured at level1 and are intervening and outcome data in my model (ordinal). So the employees are grouped within workplaces and evaluate the effects of their corresponding HR/organizational practices. The EFA results at level2 highlight four factors for HR practices confirmed by CFA at level 2. At the level1 EFA and CFA confirm seven constructs. So, four dimensions of HR practices (level2)affect perceptions of employees’ outcomes on seven aspects (level1). Kindly advise: 1 Because out of the total 11 constructs in the SEM model, four are at level 2 and 7 at level 1 is it appropriate the overall measurement model (CFA) be evaluated as a twolevel CFA? 2 I would like to omit the covariates in the twolevel CFA and introduce the covariates in regression/path analysis. I have found example 9.7, in Users Guide 7.11, but this is with covariates. Can you please recommend a more suitable example and syntax in my case. Thank you in advance. 


1. Yes. 2. Remove the covariates from Example 9.7. 

SY Khan posted on Monday, December 09, 2013  3:34 pm



Hi Linda, Thanks for your prompt reply. I have I made following changes to example 9.7 syntax: 1As I have seven factors at level 1 formed by Categorical factor indicators so in the WITHIN and BETWEEN: MODEL: %WITHIN% PJC1 BY AUTOM15; JSATS1 BY JS19; And so on for remaining five factors. (All level 1 indicators) %BETWEEN% PJC2 BY AUTOM15; JSATS2 BY JS19; And so on for remaining five factors. (All level 1 indicators) 2As I am not interested in covariates so removed the WITHIN=x1 x2; BETWEEN=w; 3 INTEGRATION=MONTECARLO; 4 CLUSTER=WORKPLACE#; BUT I get the following FATAL ERROR: *** FATAL ERROR THERE IS NOT ENOUGH MEMORY SPACE TO RUN Mplus ON THE CURRENT INPUT FILE. NOTE THAT THE MAXIMUM MEMORY USAGE BY Mplus 32BIT IS LIMITED BY THE OPERATING SYSTEM. Kindly advise if: 1 I have altered the syntax correctly? 2Also the syntax does not include any of the HR factors at level2 that are affecting the perceptions of employees at level 1 about their job satisfaction and six other employee outcome aspects. i.e. the overall measurement model is incomplete as it does not have four constructs at level 2 included. How can I include the effects of the level 2 constructs in the model? Thank you for your time and help 


Please send your output and license number to support@statmodel.com. 

SY Khan posted on Wednesday, December 11, 2013  11:44 am



Dear Dr. Muthen, Thank you for your email regarding the use of the WLSMV estimator for computational ease of the level 2 CFA. I am still not clear on the link between level 1 and level 2 factors though. You recommend that I create a factor at level 2 following Example 9.7 of the User's Guide. 1- My question is that in the %WITHIN% part I have seven factors with categorical indicators at level 1. So in the %BETWEEN% part should I create one factor with all level-1 factor indicators (which form seven factors), OR rewrite the seven level-1 factors in the %BETWEEN% part again? 2- What will creating a factor at level 2 in the %BETWEEN% part show? 3- Is it possible to have a diagram of this model? Thank you for your time and guidance. 


MODEL: %WITHIN% fw BY y1 y2 y3 y4; %BETWEEN% fb BY y1 y2 y3 y4; fw uses factor indicators that are the within part of y1, y2, y3, and y4. fw cannot be used in the between part of the model. fb uses factor indicators that are the between part of y1, y2, y3, and y4. This can be used in the between part of the model. A model with a random slope can have a cross-level interaction. See Example 9.2. Path diagrams are not available for multilevel models. 

SY Khan posted on Wednesday, December 11, 2013  3:08 pm



Hi Linda, Thanks for your prompt reply. Kindly excuse my repetition, as I am getting confused by the same y1 y2 y3 y4 variables that appear in both the between and within parts. Kindly correct me if I understand incorrectly. Although the within and between parts both have y1 y2 y3 y4, these are different variables. In the within part these are factor indicators that form level 1 factors. In my case, all different factor indicators that form seven different factors at level 1. So I will have for example fw1 BY y1-y4 fw2 BY y5-y6 fw3 BY y7-y9 fw5 BY y10-y13 fw6 BY y14-y18 fw7 BY y19-y24. But in the between part y1 y2 y3 y4 refer to some other level 2 factor indicators that form level 2 factors. In my case, all variables that form four different factors at level 2? So I have fb1 fb2 fb3 fb4 at the between level, as for example fb1 BY y25-y28 fb2 BY y29-y32 fb3 BY y33-y36 fb4 BY y37-y39 OR fb BY y1-y24. Which one at the between level is correct? Thank you for your continuous help. 


Please see Example 9.1 where there is a discussion of the latent variable decomposition of a variable measured on the individual level. This may clarify things for you. How you specify the model on between is your decision. There are often fewer factors on between than on within. You may also want to see the Topic 7 course video and handout on the website. 

SY Khan posted on Friday, December 13, 2013  10:24 am



Dear Dr. Muthen, As recommended I have watched the short course video on multilevel modelling and found it to be very useful. Thanks for that. However, I still have some problem in successfully running the MCFA. I ran a two level CFA with five factors at level 1 and one at level 2 on categorical factor indicators. the analysis ran for approx. 15 hrs (I am not sure if this is norm in these kind of analysis?). At the end I got the message that THE MODEL ESTIMATION TERMINATED NORMALLY THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 121. THE CONDITION NUMBER IS 0.665D12. Also I did not get a chisquare, RMSEA, CFI or SRMR for within and between levels.or any standardized loadings/correlation. I used the following input analysis instructions: ANALYSIS: ESTIMATOR=WLSMV; TYPE=TWOLEVEL; INTEGRATION=MONTECARLO; OUTPUT: TECH1 TECH8; 1 Am I missing anything in these commands? 2Is the test run time usual? Thanks for your help. 

SY Khan posted on Sunday, December 15, 2013  5:23 am



Hello Dr. Muthen, In continuation of my query above regarding MCFA and the standard errors not being computed due to probable model nonidentification, I altered the %BETWEEN% part of the model to include only the level 2 predictor factors and the %WITHIN% part to include only level 1 factors. The run went OK and took 16-17 hrs, but at the end it did not give any output, just the message that input data terminated normally. Kindly advise where I am misspecifying the model. Thank you for your kind cooperation. 


Please send your output and license number to support@statmodel.com. 


Dear Dr. Muthen, I am trying to compute an MCFA according to Muthen (1990) and Dyer et al. (2005). So, I computed the within- and between-level covariance matrices: DATA: FILE IS "D:\analysis\bd.dat"; VARIABLE: NAMES ARE country team q25_c q25_f q25_k q25_nr; USEVARIABLES ARE team q25_c q25_f q25_k q25_nr; CLUSTER IS team; ANALYSIS: TYPE = TWOLEVEL BASIC; OUTPUT: sampstat; SAVEDATA: SAMPLE=wincov.dat; SIGB=betcov.dat; This syntax automatically created the two files. For example, the within covariance matrix file (wincov.dat) contains: 0.81027265E+01 0.15827253E+01 0.32850610E+01 0.12791074E+01 0.90138411E+00 0.13141025E+01 0.99105353E+00 0.89080775E+00 0.50168173E+00 0.87720516E+00 Then I try to compute the confirmatory factor analysis on the within matrix: DATA: FILE IS "D:\analisis\wincov.dat"; TYPE IS COVA; FORMAT IS FREE; NOBSERVATIONS IS 1415; VARIABLE: NAMES ARE country team q25_c q25_f q25_k q25_nr; USEVARIABLES ARE q25_c q25_f q25_k q25_nr; MODEL: F1 by q25_c q25_f q25_k q25_nr; OUTPUT: stand mod res; But unfortunately, I get this error message (the same error occurs for both files): *** ERROR Insufficient data in "D:\analisis\wincov.dat" Thanks in advance for your suggestions to solve the problem. 


The NAMES statement in the second analysis should use the variables on the USEVARIABLES list in the first analysis. 
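
One possible reading of the "Insufficient data" error can be sketched with a simple count. This assumes that a summary covariance file holds the lower triangle of the matrix, diagonal included, for every variable listed on NAMES. The helper name `lower_triangle_count` is made up for illustration.

```python
def lower_triangle_count(n_vars):
    """Number of entries in the lower triangle (diagonal included) of an n x n matrix."""
    return n_vars * (n_vars + 1) // 2

saved_values = 10  # number of entries in wincov.dat as posted above

# NAMES list used in the failing second run (six variables).
names_second_run = ["country", "team", "q25_c", "q25_f", "q25_k", "q25_nr"]
# The four analysis variables; the cluster variable is not part of the saved matrix.
analysis_vars = ["q25_c", "q25_f", "q25_k", "q25_nr"]

needed_second_run = lower_triangle_count(len(names_second_run))
needed_analysis = lower_triangle_count(len(analysis_vars))

print(needed_second_run, needed_analysis, saved_values)  # 21 10 10
```

Under that assumption, a NAMES list with six variables asks for 21 values while the file holds only 10, which is exactly the count for the four analysis variables.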


Dear Dr. Muthen, Firstly, thank you very much for your quick answer. The only one variable that differs on the USEVARIABLES statement is "team", which is the grouping variable (cluster is team;). Thus, it does not appear in the covariances matrix. Anyway, I have included it and computed the analysis...just to try... Below, you can see the error message. DATA: FILE IS "D:\Bea\Proyecto Clima Job Insecurity_UIBK_2010\ articulos\paper_conceptualizacion jinsc\analisis\wincov.dat"; TYPE IS COVA; FORMAT IS FREE; NOBSERVATIONS IS 1415; VARIABLE: NAMES ARE country team q25_c q25_f q25_k q25_nr; USEVARIABLES ARE team q25_c q25_f q25_k q25_nr; MODEL: F1 by q25_c q25_f q25_k q25_nr; OUTPUT: stand mod res; *** WARNING in Data command Summary data must be in free format. The FORMAT option will be ignored. *** WARNING in Model command Variable is uncorrelated with all other variables: TEAM *** WARNING in Model command All least one variable is uncorrelated with all other variables in the model. Check that this is what is intended. *** ERROR Insufficient data in "D:\Bea\Proyecto Clima Job Insecurity_UIBK_2010\articulos 


Please send the input, data, output, and your license number to support@statmodel.com. 

Rebecca posted on Tuesday, December 16, 2014  12:26 am



Hi After running a twolevel CFA (input below), this error message occurs "Residual covariance matrix is not positive defined". Input: %within% wInt by int1 int2 int3; %between% bInt by int1 int2 int3; Fixing the res. var to zero in the between part helps. The model terminates normally. Is it acceptable to keep the restricted res. var.@0 for further steps in model building (twolevel SEM) or does the error message indicate a problem in the data/model? 


Fixing the residual variance to zero for further steps is reasonable. 

Rebecca posted on Thursday, December 18, 2014  5:56 am



Thanks! For better understanding  what does it mean when the residual variance of some latent factor indicators is fixed at zero at the between level? Why is it sometimes necessary? 


Correlations on between tend to be large which leads to high reliability and small residual variances. 
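
A minimal numerical sketch of that point, under a deliberately simple assumption: a one-factor model with standardized indicators and equal loadings, so that the common inter-item correlation equals the squared loading. The correlation values below are hypothetical and not taken from any analysis in this thread.

```python
import math

def equal_loading_solution(common_corr):
    """For standardized indicators with a common correlation r under one factor:
    loading = sqrt(r), residual variance = 1 - r."""
    loading = math.sqrt(common_corr)
    residual_var = 1.0 - loading ** 2
    return loading, residual_var

# Hypothetical correlations: modest on within, very high on between.
within_loading, within_resid = equal_loading_solution(0.30)
between_loading, between_resid = equal_loading_solution(0.95)

print(f"within:  loading {within_loading:.2f}, residual variance {within_resid:.2f}")
print(f"between: loading {between_loading:.2f}, residual variance {between_resid:.2f}")
```

With correlations this high on between, the residual variance is so close to zero that a sampling fluctuation can push its estimate negative, which is one way a non-positive definite residual covariance matrix can arise.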

Rebecca posted on Friday, December 19, 2014  1:01 am



Is there any article/webnote etc. stating that it is acceptable to fix res. variances on the between level @0? What could I cite? Thank you very much. 


See Papers/Multilevel SEM. There are several papers that probably address this issue. Try Muthen 1994 as a first step. 


Dear Dr. Muthen, I am doing a multiple group multilevel CFA test. The group variable is a betweenlevel variable. I want to examine the measurement invariance property of an instrument relevant to the group so that I can compare the difference on the mean of the betweenlevel latent factor between the two groups. Are the following example codes right for testing intercept equivalence model? Many thanks! John !!!!! Intercept equivalence model; model: %within% f_w by x1@1 x2 (wfl2) x3 (wfl3) x4 (wfl4); %between% f_B by x1@1 x2 (bfl2) x3 (bfl3) x4 (bfl4); [x1*] (I1); [x2*] (I2); [x3*] (I3); [x4*] (I4); model g2: %within% f_w by x1@1 x2 (wfl2) x3 (wfl3) x4 (wfl4); %between% f_B by x1 x2 (bfl2) x3 (bfl3) x4 (bfl4); [x1*] (I1); [x2*] (I2); [x3*] (I3); [x4*] (I4); [f_B*]; output: standardized mod(3.84); 


It looks right. 


Dear Dr. Muthen, Thank you so much! 


Hello, Can you recommend a paper or example on longitudinal CFA with a multilevel approach? I wish to run a multilevel CFA (7 time points within 710 persons). I would guess the basic syntax is below. Next I would want to regress some observed timevarying variables on these latent factors. Thank you! usevariables s11s13 s23 s24 s25 s26; cluster is id; Analysis: Type = TWOLEVEL; Model: WITHIN: W_f2 by s23s26; W_f4 by s11s13; BETWEEN: B_f2 by s23s26; B_f4 by s11s13; 


Take a look at Dunn, C., Masyn, K.E., Jones, S.M., Subramanian, S.V., & Koenen, K.C. (2014). Measuring psychosocial environments using individual responses: an application of multilevel factor analysis to examining students in schools. Prevention Science. DOI 10.1007/s11121-014-0523-x 


Dear Dr. Muthen, I am doing MCFA on a team engagement scale (e.g. "My team feels very motivated to do a good job"). I followed the steps of Dyer et al. (2005): (1) conventional CFA; (2) create within + between level covariance matrices + obtain ICC's; (3) CFA on within and between matrix; (4) MCFA. Conventional CFA and CFA on within matrix fitted well. ICC's range from 0.084 to 0.178. When doing CFA on between level I get the following warning (fit is also bad): WARNING: THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE. For the MCFA, I also got warning messages (about illconditioned fisher information matrix). I do not know what is wrong. I also wonder whether I should do multilevel CFA in this case (I do not aim to aggregate the data). Thank you 


Please send the MCFA output to Support along with your license number. 

Djangou C posted on Wednesday, June 08, 2016  2:12 am



Dear Dr Muthén, I am running a multilevel CFA model. Below is the message from Mplus. THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NONPOSITIVE DEFINITE FIRSTORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.844D16. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 27, %BETWEEN%: [ Y6 ] THE NONIDENTIFICATION IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS. REDUCE THE NUMBER OF PARAMETERS. I am not sure to understand. My model has 52 df but I have more parameters than the number of clusters. I have three questions in this regard. 1) In multilevel models positive degree of freedom is not enough for identification, the information matrix must be positive definite. Is this true? 2) What Mplus does in this case? For instance will Mplus impose constraints on Parameter 27 by fixing it to a value to achieve identification? 3) Could you please recommend some readings on this issue? Thank you for your invaluable help. 


This is a warning. Independence of observations is at the cluster level with clustered data. So having more parameters than clusters is like having more parameters than observations. The impact of this on the results has not been wellstudied but you should be aware of it. 


Dear professors, I am running a multilevel EFA analysis on an elevenitem scale with both binary and ordinal variables (5 binary and 6 ordinal variables). After examining my within and between correlation matrices, I noticed that the between correlation matrix shows that two items within my scale have perfect correlation. After running within and between group correlation matrices on a different statistical program, I am getting different results (this program would assume that all variables are continuous). I can’t seem to determine why I am getting perfect correlation between these two variables in Mplus. Any ideas about why this might be the case? 


Please send input, output, data and license number to Support. 


Dear Dr Muthén, I am new to the field of 2-level exploratory factor analysis and have difficulties in interpreting my results. The model-fit indices show the best results with unrestricted variance at the within level and a 3-factor solution at the between level. The model-fit indices for the model with 3 factors at the within level and unrestricted variance at the between level are also very good. I do not really understand what that means. Unfortunately, the model with 3 factors at each level shows rather unacceptable model-fit indices (but compared to the other models with fixed factors it is still the best). I do not know which model I should take as the basis for interpretation, since from a theoretical standpoint the solution with 3 factors at both levels would make sense (which is also what I find as long as I choose an unrestricted solution for the other level). Maybe the quality of my data is just not good enough? I tried to find some literature for a better understanding of my problem, but until now I have not found anything helpful. Maybe you can refer me to a paper? Please excuse my limited knowledge. 


Take a look at the papers on our website under Multilevel SEM. For instance, Muthen 1994, which gives an analysis strategy, and Dunn et al 2014. Typically, the between level has fewer factors than within. It sounds like there isn't a clear factor structure on both within and between. One alternative in that case is to simply use Type=Complex instead of Type = Twolevel. 

Malin Anniko posted on Thursday, November 17, 2016  9:51 am



Dear Dr Muthén, I recently received a review asking that I take into account that my data are nested. The main analysis is a CFA with 9 factors (24 indicators per factor). My data are individuals nested within classrooms within schools (3 levels). I checked the ICCs, which are moderate to low; however, the design effect is above 2 for several of the items at both the class and school level, indicating that it would be appropriate to take the nestedness into account. However, my first problem is that I only have 18 clusters at the school level. As I understand it, you should have at least 20 and preferably at least 30 for both COMPLEX and THREELEVEL. Is there any way to still account for the school level within the analysis? My second problem is that I have far too many parameters to estimate in relation to the number of clusters; even at the class level, where I have 138 clusters, this becomes a problem (I have a total of 27 indicators). So my second question is whether there is a way I could move forward from here if reducing parameters is not an option. I'm just interested in accounting for the nestedness in my estimates, s.e. and model fit; I have no predictors at the class or school level that I'd like to model. Kind regards 


You should think about how many between-level parameters you have relative to 138 and to 18. The within-level parameters are probably well covered by data points on the student level. 18 is very low. Perhaps use Twolevel Complex to see if SEs are any different than with Twolevel alone, Twolevel referring to students within classrooms. You can also use 17 dummy variables for schools. 
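
For readers unfamiliar with the design effect mentioned in the question, the commonly used approximation is 1 + (average cluster size - 1) * ICC. The sketch below uses hypothetical numbers (a cluster size of 15 and an ICC of .08), not values from the poster's data.

```python
def design_effect(avg_cluster_size, icc):
    """Approximate design effect: 1 + (average cluster size - 1) * ICC."""
    return 1 + (avg_cluster_size - 1) * icc

# Hypothetical values for illustration only.
avg_cluster_size = 15
item_icc = 0.08

deff = design_effect(avg_cluster_size, item_icc)
print(f"design effect: {deff:.2f}")  # 2.12
```

The example shows how even a low ICC can push the design effect above 2 once clusters are moderately large, which is the situation the question describes.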

Sophie Dan posted on Friday, March 31, 2017  3:32 am



Dear Mr. Muthen, It's great to find this discussion area! I have a problem with CFA. The CFA result indicates that the factor correlations of my data are relatively high, I suppose that one of the reasons might be no control for the second level factor. So I turn to the two level CFA. Because in the Mplus User Guide, there is an example of two level CFA with continuous factor indicators, covariates and random slopes, in my design there is no covariates,I just want to do the basic confirmatory factor analysis. Do you have the syntax of this? 


Use Example 9.6. Leave out the covariates. 

Sophie Dan posted on Friday, March 31, 2017  6:26 am



Dear Mr. Muthen, Thank you! What about the "y1-y4@0" in Example 9.6? Should I fix all of these to 0? For example, if I have factor A with indicators A1 A2 A3, and factor B with indicators B1 B2 B3, should I now specify in the %BETWEEN% part A1-A3@0, B1-B3@0? 


You can or cannot. They are often close to zero on between so most multilevel programs fix them to zero. Here you have the flexibility to see if they are zero. 

Sophie Dan posted on Friday, March 31, 2017  2:46 pm



I have tried, thank you! There is another problem: I have 50 groups in my SPSS file, but when I ran the CFA, the output indicated that the number of clusters is 4. What could be the cause of this? (I created an IDCLASS variable in SPSS and labeled the cases that belong to the same group with the same number, so the variable IDCLASS takes the values 1-50. Can this be identified by Mplus?) 


Please send the output, data set, and your license number to support@statmodel.com 

Sophie Dan posted on Saturday, April 22, 2017  10:20 pm



Dear Dr. Muthen, I'm doing a two-level CFA with five factors on the first level. I found that no matter how I set up the between-level model, the result is not ideal; for example, with 2 to 5 factors on the between level, they all have very high correlations with each other. But it is also not good to have 1 factor because some of the items are negative while others are positive. Is there a way to just partial out the impact of the between level and see the result for the within level? It is hard to explain the between-level result. I tried to make the variables on the between level correlated, but it makes the TLI decrease to a number like 0.8, which does not meet the requirement. Do you have some suggestions for this? Thank you! 


The partialling out approach using S_PW is described in my 1994 article on our website: Muthén, B. (1994). Multilevel covariance structure analysis. In J. Hox & I. Kreft (eds.), Multilevel Modeling, a special issue of Sociological Methods & Research, 22, 376-398. But note that having one factor with negative and positive factor loadings does not need to be a problem. 

Sophie Dan posted on Wednesday, April 26, 2017  12:08 am



Many thanks for your help! I have another question: what is the between-level factor used for? Is it for predicting other between-level factors in other scales? I didn't find books explaining the between-level factor in factor analysis. Do you have some recommendations? Thank you a lot! 

Sophie Dan posted on Wednesday, April 26, 2017  1:39 am



Dear Dr. Muthen, I read your paper, so "partialling out approach using S_PW" means a factor structure at the within level with a unrestricted between covariance? But it seems that the result is the same as estimate the between and within level structure at the same time? 

Sophie Dan posted on Wednesday, April 26, 2017  6:54 am



Dear Dr. Muthen, sorry for the last post, I now understand. But another problem comes up: in the syntax, the number of observations for S_PW is defined as N-G, but in the handout at "http://www.statmodel.com/bmuthen/ED231e/Handouts/Lecture19.pdf" it is N-C. I'm not sure if I understand it right. Does it mean "the number of observations minus the mean cluster size"? Thank you! 


G and C both refer to the number of clusters. 
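
As a rough sketch of the quantity discussed in this exchange: the pooled-within matrix S_PW sums the cross-products of deviations from each cluster's own mean and divides by the total sample size minus the number of clusters (written N - C or N - G). The toy data and the function name `pooled_within_cov` are invented for illustration. This is plain Python, not Mplus output.

```python
from collections import defaultdict

def pooled_within_cov(rows):
    """rows: list of (cluster_id, [y1, ..., yp]).
    Returns (S_PW, df), where df = N - C is the divisor."""
    clusters = defaultdict(list)
    for cluster_id, y in rows:
        clusters[cluster_id].append(y)

    p = len(rows[0][1])
    ss = [[0.0] * p for _ in range(p)]  # summed within-cluster cross-products
    for members in clusters.values():
        n = len(members)
        means = [sum(m[j] for m in members) / n for j in range(p)]
        for m in members:
            dev = [m[j] - means[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    ss[a][b] += dev[a] * dev[b]

    df = len(rows) - len(clusters)  # N - C
    return [[v / df for v in row] for row in ss], df

# Toy data: 5 observations in 2 clusters, 2 variables.
rows = [("A", [1, 2]), ("A", [3, 4]),
        ("B", [5, 1]), ("B", [7, 3]), ("B", [9, 2])]

s_pw, df = pooled_within_cov(rows)
print(df)    # 3, i.e. N - C = 5 - 2
print(s_pw)
```

The divisor here is 3 (five observations minus two clusters), which is the value that would be supplied as the number of observations when S_PW is analyzed as an ordinary covariance matrix.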

lopisok posted on Wednesday, June 28, 2017  5:46 am



Dear group members, I'm running a Multilevel exploratory factor analysis based on the MPLUS syntax descriptions in this paper: " Dunn (2014) Modeling contextual effects using individuallevel data and without aggregation" My model works fine (34 countries; n=38000) and I get several possible solutions. My question is: How do I decide which model is best? As I understood there are no established fit statistics for multilevel EFA's? In the paper they use the cut off point of .10 for SRMR but this looks arbitrary. Based on theory it would be expected that there is only one factor on both levels. But is it acceptable to claim that the one factor model on both levels is a good fitting model based on the fit statistics and theory we have? Or should we go for the two factor model on both levels based on the fit statistics? How important is the SRMR in this story? EXPLORATORY FACTOR ANALYSIS WITH 1 WITHIN FACTOR(S) AND 1 BETWEEN FACTOR(S): RMSEA = 0.028 CFI = 0.975 TLI = 0.962 SRMR within = 0.142 SRMR between = 0.195 EXPLORATORY FACTOR ANALYSIS WITH 2 WITHIN FACTOR(S) AND 2 BETWEEN FACTOR(S): RMSEA = 0.013 CFI = 0.997 TLI = 0.991 SRMR within = 0.043 SRMR between = 0.071 Is there any reference paper on choosing the best model for MEFA? Kind regards, 


Research on these fit indices for multilevel EFA is still lacking so I have no clear cut answers for you. I know of no references for this. It is not clear that SRMR should be prioritized. I suggest you go by interpretability to a large degree. Also, you want to check the model with 2 within factors and 1 between factor. Often there are fewer factors on between. 


Dear all, I am new to Mplus, and I am having some trouble. I have a data set with repeated measures. Each subject (n = 80) answered the same 15 items (binary) at 8 different times. I therefore believe I could treat it with MFA. However, my main point is to extract the factor over time. In my understanding, the repeated measures account for the within source of variance. As I am not used to the Mplus language yet, I am not sure if my code is right: VARIABLE: NAMES ARE suj u1-u8; CATEGORICAL ARE u1-u8; USEVARIABLES = suj u1-u8; WITHIN u1-u8; ANALYSIS: TYPE = TWOLEVEL EFA 1 2 UW 1 2 UB; MODEL: %WITHIN% fw BY u1-u8; %BETWEEN% fb BY suj; I appreciate any help. 


It sounds like you want to study the change in the factor over time. This can be done, for example, using a growth model for the factor (see the UG examples of growth models and Topics 3 and 4 of our short course on our website). This assumes that you have measurement invariance for the 15 items across time. Usually, this is done as a single-level, wide model, but you have 15*8 variables and only n = 80, so that won't work. As an alternative, a two-level analysis with a linear growth model can let the factor score change over time as follows, where time is scored, say, 0, 1, 2, ..., 7 for the 8 time points:

Model:
%within%
fw by u1-u15* (p1-p15);
s | fw on time;
%between%
fb by u1-u15* (p1-p15);
fb@1;
s with fb;

Here, loadings are assumed equal over the two levels, which simplifies things. Apart from getting their means and variances, you can ask to plot s and fb. They correspond to the slope and intercept growth factors.
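For orientation, here is a minimal sketch of what a complete input around that MODEL command might look like. It assumes the data are in long format (one row per subject per occasion), and the file name repeated.dat and the variable name time are placeholders. TYPE = TWOLEVEL RANDOM is included because the "|" random-slope statement requires it.

```
TITLE:    Two-level growth model for a factor (sketch)
DATA:     FILE = repeated.dat;        ! placeholder, long format assumed
VARIABLE: NAMES = suj time u1-u15;    ! suj = subject id, time = 0,...,7
          USEVARIABLES = u1-u15 time;
          CATEGORICAL = u1-u15;       ! binary items
          CLUSTER = suj;              ! occasions nested in subjects
          WITHIN = time;              ! time varies only within subject
ANALYSIS: TYPE = TWOLEVEL RANDOM;     ! needed for the | statement
MODEL:    %WITHIN%
          fw BY u1-u15* (p1-p15);     ! loadings labeled p1-p15
          s | fw ON time;             ! random slope of the factor on time
          %BETWEEN%
          fb BY u1-u15* (p1-p15);     ! same labels: equal loadings
          fb@1;                       ! sets the factor metric
          s WITH fb;                  ! slope-intercept covariance
```

With 15 categorical items this can be computationally heavy, so run time may be long.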


Hello, I have a dataset originating from a simple random sample of employees. When running CFAs, I'd like to simply control for the nested structure of the data, where employees are part of directorates (n=96), which are part of branches (n=24), which are part of divisions (n=3). Now, I'm assuming I should use TYPE=COMPLEX and CLUSTER IS in this particular case. What should my cluster variable be? Poststratification weights were computed based on divisions only; should I also use this weight variable in my analyses? Many thanks for your help! 


Having only 3 divisions is too few units for an extra level (you want as the bare minimum about 20). So use Type=Threelevel for employee, directorate, and branch. Divisions can be reflected by 2 dummy variables. Instead of weights, you can use the demographic variables that are used to form the weights as further covariates. 
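A rough sketch of the suggested three-level setup follows. All names are hypothetical: y1-y6 stand for the CFA indicators, direct and branch for the cluster identifiers, and d1 and d2 for the two division dummies. Placing the dummies on the branch level and regressing the branch-level factor on them is my assumption about how "reflected by 2 dummy variables" could be implemented; the original reply does not spell this out.

```
TITLE:    Three-level CFA sketch (hypothetical names)
DATA:     FILE = employees.dat;       ! placeholder data file
VARIABLE: NAMES = branch direct d1 d2 y1-y6;
          USEVARIABLES = y1-y6 d1 d2;
          CLUSTER = branch direct;    ! highest level listed first
          BETWEEN = (branch) d1 d2;   ! division dummies vary across branches
ANALYSIS: TYPE = THREELEVEL;
MODEL:    %WITHIN%
          fw BY y1-y6;                ! employee-level factor
          %BETWEEN direct%
          fd BY y1-y6;                ! directorate-level factor
          %BETWEEN branch%
          fbr BY y1-y6;               ! branch-level factor
          fbr ON d1 d2;               ! division differences via dummies
```

With only 24 branches, the branch-level part may need to be simplified if estimation problems arise.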


Dear Mplus users, I'm testing a multilevel CFA model (individuals nested within teams). Although I am not interested in any factor structure at the between level, I do want to take into account the fact that the data are nested. Therefore, I use type = twolevel but I only declare my CFA model at the within level (not at the between level). The results that I get, though, are quite different from the results I get in AMOS when I do not do multilevel analysis. In AMOS everything looks fine, but in the multilevel analysis one factor loading becomes nonsignificant. I repeated the analyses in AMOS with group-mean centering for all items (which I assume Mplus is doing at the within level?). But the results are again fine and I don't get the nonsignificant loading. Does anyone have experience with this? Is there a way to reproduce the nonsignificant loading in AMOS? I'm just trying to understand why this discrepancy occurs. Thanks! Paris


You should check to see if the model you get on Between allows the variables to correlate. If not, you can add a single factor on Between. You can use Mplus to do a single-level analysis of S_PW, that is, the covariance matrix for group-mean-centered variables. This gets you close to the within-level results you would get in a well-fitting two-level analysis. See the paper on our website under Multilevel SEM: Muthén, B. (1994). Multilevel covariance structure analysis. In J. Hox & I. Kreft (eds.), Multilevel Modeling, a special issue of Sociological Methods & Research, 22, 376-398.


Dear Bengt, Thank you, that really helps. There were no correlations at the between level, so I added one factor at the between level. The model runs successfully, but the factor loading is still nonsignificant at the within level. Do you maybe know what could explain the difference from the group-mean-centered CFA results in AMOS? Also, I had a look at the Mplus User's Guide but I can't find an example for multilevel covariance structure analysis. Do you maybe have any input available? Thank you in advance! Paris


Multilevel factor analysis is shown in the UG; see, e.g., ex9.6. You can also save the pooled-within covariance matrix (S_PW) in Mplus (see UG) and analyze it in a second step.
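A possible two-step sketch of that approach is given below. File and variable names are placeholders. It rests on my understanding that, with TYPE = TWOLEVEL, the SAMPLE option of SAVEDATA writes the pooled-within sample covariance matrix; please confirm this, and the layout of the saved file, against the User's Guide before relying on it.

Step 1, save S_PW:

```
DATA:     FILE = teams.dat;           ! placeholder raw data file
VARIABLE: NAMES = team y1-y6;         ! placeholder names
          USEVARIABLES = y1-y6;
          CLUSTER = team;
ANALYSIS: TYPE = TWOLEVEL BASIC;      ! sample statistics only, no model
SAVEDATA: SAMPLE = spw.dat;           ! assumed to hold S_PW
```

Step 2, single-level CFA on the saved matrix:

```
DATA:     FILE = spw.dat;
          TYPE = COVARIANCE;          ! summary data, not raw data
          NOBSERVATIONS = 450;        ! N - C, e.g. 500 people in 50 teams
VARIABLE: NAMES = y1-y6;
MODEL:    f BY y1-y6;                 ! same CFA as on the within level
```

The sample size N - C (total number of individuals minus number of clusters) follows the recommendation made earlier in this thread; the value 450 is made up for illustration.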


How are the standard errors for R-squared values determined at the within and between levels in multilevel CFA?


Using the Delta method based on the parameter estimates and their covariance matrix. 
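To make the idea concrete, here is a worked sketch for a single indicator with one factor on a given level. The notation is my own and is only an illustration of the general technique, not a statement of the exact computation inside Mplus: lambda_j is the loading, psi the factor variance, theta_j the residual variance, and V-hat the estimated covariance matrix of these three parameter estimates.

```latex
R_j^2 = \frac{\lambda_j^2 \psi}{\lambda_j^2 \psi + \theta_j},
\qquad
\widehat{\operatorname{Var}}\bigl(\hat{R}_j^2\bigr)
  \approx \mathbf{g}^{\top} \hat{\mathbf{V}} \, \mathbf{g},
\qquad
\operatorname{SE}\bigl(\hat{R}_j^2\bigr)
  = \sqrt{\mathbf{g}^{\top} \hat{\mathbf{V}} \, \mathbf{g}},

\mathbf{g}
= \left(
    \frac{\partial R_j^2}{\partial \lambda_j},\;
    \frac{\partial R_j^2}{\partial \psi},\;
    \frac{\partial R_j^2}{\partial \theta_j}
  \right)^{\top}
= \frac{1}{\bigl(\lambda_j^2 \psi + \theta_j\bigr)^2}
  \left(
    2 \lambda_j \psi \theta_j,\;
    \lambda_j^2 \theta_j,\;
    -\lambda_j^2 \psi
  \right)^{\top}
```

The gradient g is evaluated at the parameter estimates. The same recipe applies separately on the within and between levels, each with its own loadings and variances.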


Dear Dr. Muthen, I performed a multilevel CFA with 3 factors and a bifactor at the within level, and 3 factors at the between level. Fit indices are:

Chi-Square Test of Model Fit = 83.790, df = 71, p-value = 0.1423
RMSEA = 0.016, CFI = 0.995, TLI = 0.992, SRMR Within = 0.036, SRMR Between = 0.258

While the overall fit is good, the SRMR Between is poor. It has been stated that you can get level-specific fit indices by specifying a saturated model at one level and your specified model at the other (Ryu, 2014). Specifying a saturated within model and then the same 3-factor between model as above yields generally excellent fit indices (except the SRMR Between):

Chi-Square Test of Model Fit = 25.224, df = 41, p-value = 0.9749
RMSEA = 0.000, CFI = 1.000, TLI = 1.018, SRMR Within = 0.022, SRMR Between = 0.220

Why is the fit based on the partially saturated model excellent, despite a poor SRMR Between? Does this suggest the between model is acceptable? Item ICCs range between .068 and .184. Thanks so much for your help!


The SRMR should not be used at all in this situation. The SRMR is a fit index that can help you establish an "approximately fitting model". If the model already fits well based on the chi-square (i.e., you have more than an approximately well-fitting model), SRMR provides no further information about the model and should not be used. SRMR is intended to counter the chi-square's sensitivity in large samples, where even the smallest difference between the data and the model becomes a reason to reject the model. In such a situation SRMR can be used to affirm that the model is approximately well fitting. We do not recommend using SRMR for small or even moderate sample sizes, because in those cases model fit is best established via the chi-square. SRMR is an average of fits and can hide substantial model misfits that should not be treated as good enough approximations.

In two-level models the SRMR Between is driven by the between-level sample size, i.e., the number of clusters. Unless you have a large number of clusters, SRMR Between should not be used as a fit measure, i.e., the chi-square takes precedence. In your situation the explanation is very simple. The number of clusters in your sample is fairly small, and thus the misfit between the estimated model and the data is not statistically significant: while according to SRMR Between the differences between the H0 and the H1 model are large, with a small number of clusters these differences are acceptable. You can look at the residual output in Mplus to understand more about why you get a large SRMR Between; however, there is little you can make of these differences because they are not significant. Most likely, small covariance differences on the between level are exaggerated on the correlation scale.
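The remark that SRMR is an average that can hide misfit can be illustrated with one common textbook definition of the index for p variables. This is a sketch under that assumption; the exact formula used by Mplus (which may also involve means, and on the between level uses between-level matrices) should be checked in the technical appendices. Here s_jk is a sample covariance and sigma-hat_jk the model-implied one.

```latex
r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\, s_{kk}}}
       - \frac{\hat{\sigma}_{jk}}{\sqrt{\hat{\sigma}_{jj}\, \hat{\sigma}_{kk}}},
\qquad
\mathrm{SRMR} = \sqrt{\frac{2}{p(p+1)} \sum_{j=1}^{p} \sum_{k=1}^{j} r_{jk}^{2}}

% Illustration with made-up numbers, p = 10 (55 elements):
% 10 diagonal residuals of 0, 44 off-diagonal residuals of 0.02,
% and a single off-diagonal residual of 0.30:
\mathrm{SRMR} = \sqrt{\frac{44 \cdot 0.02^{2} + 0.30^{2}}{55}}
              = \sqrt{\frac{0.1076}{55}} \approx 0.044
```

In this made-up case the index stays well below the usual cutoffs even though one correlation is misfit by 0.30, which is why inspecting individual residuals is more informative than the average.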


Dear Dr. Asparouhov, The analysis was run on a twin and family dataset, with the family groups forming the higher-level clusters. As such, there are a large number of clusters (433) at the higher level, and a large ratio of clusters to individuals. There are only 2 siblings within each cluster. However, the problem seems to be the SRMR Between. In this case, is using the chi-square test still the best way? The fit of the multilevel model overall is better than that of the within-only model according to the CFI, TLI and RMSEA. Thanks for your help.


If you can send the data and example to support@statmodel.com, we can try to provide more insight. It is an unusual situation; I would expect SRMR to be more in line with the chi-square for that many clusters. You can take a look at the residual output and see if you can spot the discrepancies. Nevertheless, since the chi-square does not reject the model, you should ignore SRMR Between.
