Multilevel factor analysis
Mplus Discussion > Multilevel Data/Complex Sample
 anonymous posted on Monday, January 22, 2001 - 2:44 pm
Therapists evaluate their supervisor
on 43 questions and we need to determine a stable factor structure that
reflects qualities of supervisors. What complicates things is that
several therapists share and thus evaluate the same supervisor which
creates a multi-level situation. We have been doing factor analysis
using SAS proc factor (exploratory factor analysis), ignoring this
multi-level structure, which is of course not completely right. Your
1994 paper (which I know secondhand from the book by Heck and Thomas, 2000,
which referenced your paper heavily as the definitive answer to this
problem) proposed decomposing the total covariance into within and
between parts. I understand that Mplus can do this. My questions are:

1. Does this strategy apply to our multi-level situation? If yes, can
we use SAS proc factor to analyze the within and between covariance
matrices from Mplus? How should factor structures based on within vs.
between covariance be interpreted?

2. What analytic strategy do you recommend to assess the
similarity/difference in factor structure generated by the within vs.
between analyses?

3. Is there an example on your Mplus support site that matches our
problem closely?
 Bengt O. Muthen posted on Monday, January 22, 2001 - 3:08 pm
Yes, this sounds like it would be suitable for multilevel factor analysis as I described it in the 1994 article. The supervisors form clusters of therapists, with unequal numbers of them within clusters. Mplus offers a couple of different forms of this analysis.

The Type=twolevel analysis draws on a model where there are both between- and within-variation sources. The within-level part of the model describes the factor structure for how the therapists' 43 evaluations covary across therapists. The between-level part of the model describes how the 43 supervisor means covary across supervisors.

The estimator is based on maximum likelihood, where the between and within parts of the model are analyzed simultaneously. The number of factors, loading values, and factor distribution parameters can be different on the between and within levels. Tests of equalities can be made. Exploratory factor analysis (EFA) can be carried out in this confirmatory framework by imposing the m**2 restrictions needed (m being the number of factors) on Lambda and Psi, with the restrictions placed on both levels. Experience shows that the between structure is often different from and simpler than the within structure (see, e.g., the references in the 1989 Muthen Psychometrika article listed on this web site).

A simpler approach is to use the pooled-within sample covariance matrix (scaled to a correlation matrix) for a regular EFA. This gives estimates close to those obtained for the within-level parameters in the twolevel analysis described above. Use sample size N - C, where N is the total number of therapists and C is the total number of supervisors.
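A minimal Python sketch of the pooled-within computation described above, assuming the data arrive as (cluster id, item vector) rows. The function names and toy data are illustrative only, not Mplus output. Deviations are taken from each cluster's own mean, the cross-products are divided by N - C, and the result is rescaled to a correlation matrix for a regular EFA.

```python
from collections import defaultdict
from math import sqrt


def pooled_within_cov(data):
    """Pooled-within covariance S_PW from rows of (cluster_id, [y1, ..., yp]).

    Deviations are taken from each cluster's own mean and the summed
    cross-products are divided by N - C. Returns (S_PW, N - C).
    """
    groups = defaultdict(list)
    for cid, y in data:
        groups[cid].append(y)
    p = len(data[0][1])
    n_total, n_clusters = len(data), len(groups)
    sums = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        for r in rows:
            dev = [r[j] - mean[j] for j in range(p)]
            for j in range(p):
                for k in range(p):
                    sums[j][k] += dev[j] * dev[k]
    df = n_total - n_clusters  # sample size to use for the EFA
    return [[v / df for v in row] for row in sums], df


def cov_to_corr(s):
    """Rescale a covariance matrix to a correlation matrix."""
    sd = [sqrt(s[j][j]) for j in range(len(s))]
    return [[s[j][k] / (sd[j] * sd[k]) for k in range(len(s))]
            for j in range(len(s))]


# Toy data: two supervisors (clusters), two therapists each, two items.
rows = [("a", [1.0, 2.0]), ("a", [3.0, 4.0]),
        ("b", [5.0, 5.0]), ("b", [7.0, 9.0])]
s_pw, n_eff = pooled_within_cov(rows)
r_pw = cov_to_corr(s_pw)
```

The correlation matrix `r_pw` and sample size `n_eff` are what would be handed to an ordinary EFA routine.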

For the between level parameters, the sample between covariance matrix can be used, although this is not an unbiased estimator of the between covariance matrix; the unbiased estimator can be used instead (for details, see the User's Guide, Technical Appendix 10). Between-level estimates can differ quite a lot from those obtained through the twolevel analysis.
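The between-level quantities in this paragraph can be sketched in Python under one stated assumption: the moment results of the 1994 article mentioned above, namely that the sample between matrix S_B, computed from cluster means, has expectation Sigma_W + c * Sigma_B with c = (N^2 - sum of n_c^2) / (N(C - 1)). Then (S_B - S_PW) / c is an unbiased estimator of Sigma_B. Names and toy data are hypothetical; in balanced data c reduces to the common cluster size.

```python
from collections import defaultdict


def between_decomposition(data):
    """Return (S_PW, S_B, c, Sigma_B_hat) from rows of (cluster_id, [y1..yp]).

    S_B    = sum_c n_c (ybar_c - ybar)(ybar_c - ybar)' / (C - 1)
    c      = (N^2 - sum_c n_c^2) / (N (C - 1))
    E(S_B) = Sigma_W + c * Sigma_B, so Sigma_B_hat = (S_B - S_PW) / c
    """
    groups = defaultdict(list)
    for cid, y in data:
        groups[cid].append(y)
    p = len(data[0][1])
    n_total, n_clusters = len(data), len(groups)
    grand = [sum(y[j] for _, y in data) / n_total for j in range(p)]
    s_pw = [[0.0] * p for _ in range(p)]
    s_b = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        n_c = len(rows)
        mean = [sum(r[j] for r in rows) / n_c for j in range(p)]
        bdev = [mean[j] - grand[j] for j in range(p)]
        for j in range(p):
            for k in range(p):
                s_b[j][k] += n_c * bdev[j] * bdev[k]  # size-weighted cluster means
        for r in rows:
            dev = [r[j] - mean[j] for j in range(p)]
            for j in range(p):
                for k in range(p):
                    s_pw[j][k] += dev[j] * dev[k]
    s_pw = [[v / (n_total - n_clusters) for v in row] for row in s_pw]
    s_b = [[v / (n_clusters - 1) for v in row] for row in s_b]
    sum_sq = sum(len(rows) ** 2 for rows in groups.values())
    c = (n_total ** 2 - sum_sq) / (n_total * (n_clusters - 1))
    sigma_b = [[(s_b[j][k] - s_pw[j][k]) / c for k in range(p)]
               for j in range(p)]
    return s_pw, s_b, c, sigma_b


rows = [("a", [1.0, 2.0]), ("a", [3.0, 4.0]),
        ("b", [5.0, 5.0]), ("b", [7.0, 9.0])]
s_pw, s_b, c, sigma_b = between_decomposition(rows)
```

Using S_B directly for the between-level EFA corresponds to the biased option in the text; `sigma_b` is the unbiased alternative. With few clusters, `sigma_b` can fail to be positive definite.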

The Mplus web site Example section has a similar analysis under Continuous, cont10.
 Gianbattista FLEBUS posted on Wednesday, July 03, 2002 - 3:08 am
How do you format data to perform multilevel analysis? That is to say, are cluster-level data repeated for each element in the cluster, or are they written only once?
Gianbattista FLEBUS
 Anonymous posted on Wednesday, July 03, 2002 - 10:37 am
Cluster-level data are repeated for each element in the cluster.
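To illustrate this layout, here is a small Python sketch with hypothetical variable names. It merges the cluster-level values onto every member row and writes whitespace-delimited lines of the kind Mplus reads in free format. The cluster-level variable `w` appears once per person, with the same value for everyone in a cluster.

```python
def to_mplus_rows(individual, cluster_vars, cluster_key="clus"):
    """Attach cluster-level variables to every individual in that cluster.

    individual:   list of dicts, each holding its cluster id under cluster_key
    cluster_vars: {cluster_id: {name: value}}, stored once per cluster
    Returns new dicts with the cluster-level values repeated on each row.
    """
    out = []
    for person in individual:
        merged = dict(person)
        merged.update(cluster_vars[person[cluster_key]])  # repeat per member
        out.append(merged)
    return out


def write_free_format(rows, names):
    """Render rows as whitespace-delimited lines in a fixed column order."""
    return "\n".join(" ".join(str(r[n]) for n in names) for r in rows)


students = [{"y1": 3, "clus": 1}, {"y1": 5, "clus": 1}, {"y1": 4, "clus": 2}]
schools = {1: {"w": 10}, 2: {"w": 20}}
text = write_free_format(to_mplus_rows(students, schools), ["y1", "w", "clus"])
```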
 Anonymous posted on Tuesday, January 21, 2003 - 10:29 pm
Is the quasi-ML fitting function for multilevel factor analysis (MFA) robust to designs that are unbalanced and have a non-normal multivariate distribution? In any case, please supply references. Also, please supply some guidelines and limitations for using the MLM estimator in MFA under non-normal conditions (e.g., when normalized versions of Mardia's coefficients are greater than 10 for skew and kurtosis).

please send your responses to:

Thank you for any and all of your help.
 bmuthen posted on Wednesday, January 22, 2003 - 6:45 pm
Mplus 2.1 has both MUML (limited-information, quasi-ML) and FIML (full-information ML) estimators, each with a non-normality robust version for the standard errors (called MUMLM and MLR, respectively). Two-level MLM in earlier Mplus versions is now called MUMLM. More about these estimators is given in the Addendum to Mplus Version 2.1 on the Mplus web site. In my experience, MUML agrees well with FIML even for data with very different cluster sizes (unbalanced data), and this has also been seen in studies by Joop Hox. MUMLM non-normality robustness for s.e.'s is expected but has not been thoroughly studied or written about; simulation studies can explore this. MLR non-normality robustness for s.e.'s is expected because of the sandwich formula used. With very skewed data there are often also strong floor or ceiling effects, in which case linear models are not suitable, so the s.e. correction does not help.
 Sven De Maeyer posted on Monday, February 23, 2004 - 7:51 am
We have data on teachers evaluating leadership at their school. I did a multilevel factor analysis, which gave nice results. Now I want to calculate scores for these underlying concepts at the school level to do some exploratory analysis with pupil data. How do I calculate these scores for the schools?

Are there some good practices that someone could advise me on? Do I calculate, at the school level, the pooled mean scores for the individual items and afterwards sum these scores weighted by the inverse of the error variance? Are there better ways of doing this?
 bmuthen posted on Monday, February 23, 2004 - 8:34 am
Mplus prints out the estimated factor scores for factors on the between level.
 Linda K. Muthen posted on Monday, February 23, 2004 - 8:50 am
See page 88 of the Mplus User's Guide to see how to use the SAVEDATA command to save factor scores.
 Sven De Maeyer posted on Monday, February 23, 2004 - 8:56 am
Inserting the following line:
SAVE=FSCORES results only in the original scores. In your User's Guide you mention that this option is not available for TYPE=TWOLEVEL. Neither is the option FSCOEFFICIENTS available in this type of analysis. Am I making a mistake?
 Linda K. Muthen posted on Monday, February 23, 2004 - 9:29 am
I just ran an example and got factor scores using Version 2.14. What version of the program are you using?
 Sven De Maeyer posted on Monday, February 23, 2004 - 9:43 am
I also use that version. I'll send you an example of the output.
 Daniel Bontempo posted on Thursday, January 13, 2005 - 10:08 am
Similar to the therapist example above, I have personality items whose factor structure I want to examine. The respondents are grouped within families (actually twin pairs).

Unlike the cont10 example, my items have only 2 levels and need categorical methods.

I do not have any predictors for the family/twin level, and am not clear about the need/interpretation of the within and between factors.

Also, I wanted to do a multi-group analysis to look at age groupings. I am running into trouble because the TWOLEVEL option forces an ML estimator (or MLR) with numerical integration, and this is incompatible with the THETA parameterization and MGROUP option I am using.

I have evidence for strong age invariance and could collapse into one group and use DELTA (because I will no longer test strict invariance, or even invariance), but would prefer to keep the multiple groups if possible.

The error messages said something about using MIXTURE and KNOWNCLASS, but I am unsure of the implications of this. Can you elaborate or provide a reference?

Given the basic problem of multigroup (age or gender) factor analysis with multilevel (family) data, what suggestions would you offer for me to pursue? Any paper or example that had similar issues would be great. Sorry to make such a sweeping request.
 Linda K. Muthen posted on Thursday, January 13, 2005 - 4:27 pm
I would suggest taking the multivariate approach to family data described in Khoo and Muthen. See our reference list. Then you can use WLSMV and multiple group.
 Daniel Bontempo posted on Thursday, January 13, 2005 - 9:04 pm
OK, I will look at this paper. It does appear to be about longitudinal data, while mine are cross-sectional. My levels are twin within twin pair, and my groups are age bands.

I still want to get a single factor using all the twins. Perhaps this will be clearer in the paper.
 Linda K. Muthen posted on Friday, January 14, 2005 - 8:26 pm
The general idea is helpful. It can be applied to both longitudinal and cross-sectional settings.
 Daniel Bontempo posted on Wednesday, January 19, 2005 - 12:53 am
Linda -

I have reviewed the paper and am uncertain about its applicability.

Recall that my goal is to handle the dependencies due to twins in a multi-group factor analysis with binary indicators.

In the paper, family was the unit of analysis and each sibling was allowed to have their own growth parameters. If I follow your suggestion, twin pair would become the unit of analysis, and I would estimate a common factor for each side of the twinship.

I think I would need to ensure that the same common factor was produced on each side of the twinship; I'm not sure whether this would be done by constraining loadings or by directly constraining the factor mean and variance. At this point, I am not sure that I haven't undone any "handling of the dependency" that was addressed by the multivariate approach.

Please, could you comment on this?

Also, I have been thinking about TYPE=MIXTURE using KNOWNCLASS to indicate my two or three groupings. In this scenario, could I also use TWOLEVEL clustered on twin?

Please, could you also comment on this?

 Daniel Bontempo posted on Wednesday, January 19, 2005 - 12:57 am
Linda -

Since it is arbitrary which twin is placed on each side of the twinship, I am wondering if there is a Monte Carlo scenario where I could get Mplus to repeatedly divide the twins at random, estimate the model, and present me with averages.

Thanks again
 Linda K. Muthen posted on Thursday, January 20, 2005 - 8:09 pm
To ensure that the same factor operates for both twins, your multivariate model for the joint analysis of the twins should have one factor for each twin, where the factor correlation (assuming factor variances are fixed at one to set the metric) is fixed at one and the factor loadings are held equal.

If you go to MIXTURE TWOLEVEL, you will have the same numerical integration issue you started with.

The twins are not mixed together in the analysis. If you have two factors as described above, a random arrangement is not an issue.
 Sungworn posted on Tuesday, June 14, 2005 - 2:13 pm
I am wondering if Mplus can do multilevel factor analysis of dichotomous data (i.e., achievement test data where 1=right, and 0=wrong)? Thanks.
 bmuthen posted on Wednesday, June 15, 2005 - 7:43 am
Yes, this can be done using ML.
 Sungworn posted on Thursday, June 16, 2005 - 1:04 pm
Dr. Muthen,

Are you familiar with NOHARM, a computer program for multidimensional IRT? If so, do Mplus and NOHARM yield the same results in terms of factor loadings and thresholds?
 BMuthen posted on Friday, June 17, 2005 - 1:38 am
I am not very familiar with NOHARM but I don't think the estimates would be the same.
 Pancho Aguirre posted on Wednesday, November 16, 2005 - 7:06 pm
Hello Linda and Bengt,

I'm wondering if I can conduct the following analysis in Mplus. I modified examples 9.9 and 9.10 from the Mplus Version 3 User's Guide on pages 205-207. I have 31 clusters; would that be a large enough number of clusters?

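TITLE: this is an example of a two-level CFA with continuous factor indicators, covariates, and random slopes
DATA: FILE IS ex9.9.dat;
VARIABLE: NAMES ARE y1-y4 x1-x4 w clus;
  BETWEEN = w;                ! w is measured at the cluster level only
  CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL RANDOM;  ! RANDOM is needed for the | statement
  ALGORITHM = INTEGRATION;        ! random slope on a latent covariate
MODEL:
  %WITHIN%
  fw1 BY y1-y4;
  fw2 BY x1-x4;
  s | fw1 ON fw2;             ! slope s varies across clusters
  %BETWEEN%
  fb BY y1-y4;
  fb s ON w;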

Thanks a lot,

 bmuthen posted on Thursday, November 17, 2005 - 5:15 am
See answer to the same question under SEM.
 Marco posted on Monday, December 19, 2005 - 1:21 pm
Hello Linda, hello Bengt,

I sometimes find that an MFA with estimator=MLR yields an undefined scaling factor. Judging from the preliminary steps (from Muthén, 1994), the chi² statistic and the fit indices seem to be OK. So what is the meaning of, or the reason for, an undefined scaling factor? Is there a way to conduct a chi²-difference test with these results?

Many thanks! By the way, is it possible to see somewhere on the homepage what exactly has been updated? That would be a good idea, since the homepage contains so much important information.
 bmuthen posted on Tuesday, December 20, 2005 - 5:47 am
The scaling correction factor comes out negative, i.e., the estimation gives a poor approximation to the asymptotic chi2 distribution. Wald testing is an alternative, but it is not easy to do by hand; it will be available in a future version of Mplus.
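To show where a scaling factor enters a chi²-difference test, here is a sketch of the commonly used Satorra-Bentler-style scaled difference. It assumes t0/t1 are the scaled chi-square values and c0/c1 the scaling correction factors printed for the nested (more restrictive) model and the comparison model. When the difference-test scaling factor cd is not positive, the scaled difference is undefined, which mirrors the problem raised in the question. This is a generic formula sketch, not an Mplus command.

```python
def scaled_chi2_difference(t0, c0, d0, t1, c1, d1):
    """Scaled chi-square difference test (Satorra-Bentler style).

    t0, c0, d0: scaled chi-square, scaling correction factor and df
                of the nested (more restrictive) model
    t1, c1, d1: the same quantities for the less restrictive model
    Returns (TRd, df). Raises ValueError when the difference-test
    scaling factor cd is not positive (the test is then undefined).
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    if cd <= 0:
        raise ValueError(f"difference-test scaling factor is {cd:.3f}; test undefined")
    trd = (t0 * c0 - t1 * c1) / cd  # t * c recovers the unscaled ML chi-square
    return trd, d0 - d1
```

The result TRd is referred to a chi-square distribution with d0 - d1 degrees of freedom.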
 Marco posted on Tuesday, December 20, 2005 - 5:55 am
I guess that the "poor approximation" refers to the estimation of the scaling factor. Does this imply that the chi²-statistic itself is unreliable?
 bmuthen posted on Tuesday, December 20, 2005 - 6:09 am
 Marco posted on Tuesday, December 27, 2005 - 3:20 pm

Based on my limited trials, I found an undefined scaling factor only in models where an indicator is specified as a within-level variable (even though it has a small amount of between variance). The scaling factor becomes positive after eliminating the indicator entirely from the analysis or allowing the indicator to vary both within and between. Is this data-specific or generally expected? Thanks!
 Linda K. Muthen posted on Wednesday, December 28, 2005 - 8:33 am
Most likely data specific.
 Magdalena Cerda posted on Wednesday, March 08, 2006 - 8:20 pm

After watching the latest training on multilevel analysis and reading the Grilli paper on multilevel factor analysis with ordinal variables, I have several questions regarding multilevel CFA in MPLUS.

1) Are there any other examples of papers that discuss the interpretation of the MPLUS output for multilevel CFAs with categorical variables?

2) Dr. Muthen said that the categorical multilevel CFA is essentially a 2-parameter IRT model. Is this still the case when the model doesn't have random slopes or does it then become a Rasch model?

3) Are the factor variances at the within and between levels directly interpretable in the case of categorical CFA and can I use it to calculate an ICC?

4) What do the thresholds mean in the case of the categorical CFA output?

Thank you for your help.

Magdalena Cerda
 Bengt O. Muthen posted on Thursday, March 09, 2006 - 6:41 am
1) There are papers on multilevel CFA, but not discussing the Mplus output per se as far as I know. An early paper with continuous outcomes is:

Muthén, B. (1991). Multilevel factor analysis of class and student achievement components. Journal of Educational Measurement, 28, 338-354. (#37)

2) Rasch has the same slope for all items. If the slope is random or not (cluster level variation or not) is another matter. One can have a Rasch or not a Rasch model and have fixed or random slopes.

3) For that you need to hold the loadings invariant across the two levels and that often does not fit as well as letting them be different.

4) With binary outcomes they are the same as the negative of the intercepts. For translations between Mplus and IRT parameterizations, see our Short Course handout from Day 3 which can be requested off the web.
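A sketch of the binary logit translation mentioned in 4), under stated assumptions: a single factor with mean 0 and variance psi, and an Mplus-style logit model logit P(u = 1) = -threshold + loading * eta, so the threshold is the negative intercept. Rescaling to theta with variance 1 gives the 2PL discrimination a = loading * sqrt(psi) and difficulty b = threshold / a. The sketch ignores the multilevel structure and uses the plain logistic metric, without the 1.7 scaling constant some IRT programs apply.

```python
from math import exp, sqrt


def mplus_logit_to_irt(loading, threshold, factor_var=1.0):
    """Translate a binary-item logit factor model to 2PL IRT (a, b).

    Assumes factor mean 0 and logit P(u=1 | eta) = -threshold + loading * eta.
    With theta = eta / sqrt(factor_var): a = loading * sqrt(factor_var),
    b = threshold / a, so that logit P = a * (theta - b).
    """
    a = loading * sqrt(factor_var)
    return a, threshold / a


def p_correct(theta, a, b):
    """2PL logistic item characteristic curve."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))
```

For example, a loading of 2.0 and a threshold of 1.0 with factor variance 1 give a = 2.0 and b = 0.5, and the item is answered correctly with probability 0.5 at theta = b.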
 Magdalena Cerda posted on Thursday, March 09, 2006 - 10:46 am
Dear Dr. Muthen,

Thank you very much for your reply to my questions. I have ordered the handouts from the short course for lectures 3 and 5. In the meantime however, could you tell me how I can calculate the ICC from the output obtained from a categorical multilevel CFA? I cannot hold the loadings invariant across the two levels because I have 2 factors at level 1 and 1 factor at level 2.

Thank you.
Magdalena Cerda
 Bengt O. Muthen posted on Friday, March 10, 2006 - 6:37 am
ICC is a concept for a continuous variable whose variance is a freely estimated parameter. This is not the case with a categorical variable, because you don't estimate a free variance parameter for the dependent variable (the mean p and the variance p(1-p) are mathematically linked). You can talk about an ICC for a factor as I did in the article I mentioned, but for that you need loadings that are invariant across levels. So I don't see how you can meaningfully compute an ICC here. On the other hand, I don't see the need for it either, because the estimated model has all the information you need: the amount of between-cluster variation tells you how much two-level modeling is needed.
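A short sketch of the two quantities in this answer, assuming loadings held equal across levels so that the between and within factor variances share one metric. The function names and numbers are arbitrary illustrations.

```python
def factor_icc(psi_between, psi_within):
    """ICC for a factor whose loadings are held equal across levels.

    Only with equal loadings are the two factor variances on the same
    metric, so that their ratio to the total is meaningful.
    """
    return psi_between / (psi_between + psi_within)


def bernoulli_variance(p):
    """Variance of a binary item: fully determined by its mean p."""
    return p * (1.0 - p)
```

With psi_between = 1 and psi_within = 3, the factor ICC is 0.25. A binary item with p = 0.5 has variance 0.25, and no separate variance parameter is available to estimate.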
 Magdalena Cerda posted on Friday, March 10, 2006 - 11:43 am
Dear Dr. Muthen,

Thank you for your reply. Is there a way, given a multilevel factor analysis with different loadings at the two levels and categorical variables, to calculate the level 2 reliability coefficient from the MPLUS output? For example, as proposed by Raudenbush in some of his papers on three-level logistic Rasch measurement models?

 Bengt O. Muthen posted on Friday, March 10, 2006 - 6:35 pm
Could you give me a reference to a key paper on this?
 Magdalena Cerda posted on Saturday, March 11, 2006 - 2:32 pm
Dear Dr. Muthen,

Two papers which discuss three-level logistic Rasch measurement models, and present equations to calculate level 1 and level 2 reliability are:

Raudenbush, S. W., Johnson, C., & Sampson, R. J. (2003). A multivariate, multilevel Rasch model for self-reported criminal behavior. Sociological Methodology, 33(1), 169-

Cheong, Y. F., & Raudenbush, S. W. (2000). Measurement and structural models for children’s problem behaviors. Psychological Methods, 5(4), 477-495.

 Magdalena Cerda posted on Wednesday, March 22, 2006 - 11:46 am
Dear Dr. Muthen,

I have read the short course handouts and I still have some questions about the output from the MPLUS categorical multi-level factor analysis.

1) Should the thresholds be divided by the level 1 factor loadings or by the level 2 factor loadings to get IRT parameters? What level do these thresholds correspond to?

2) how can one calculate item and scale information from a two-level logistic IRT as output in MPLUS?

3) how can one calculate within and between reliability from a two-level logistic IRT as output in MPLUS?

4) how can one specify in MPLUS a Rasch two-level model otherwise equivalent to a two-level logistic IRT?

Thank you for your help.

 Bengt O. Muthen posted on Wednesday, March 22, 2006 - 5:45 pm
You may find it useful to study the paper posted on our web site under Recent Papers:

Grilli, L. & Rampichini, C. (2004). Multilevel factor models for ordinal variables. Submitted for publication.

This gives details in an Mplus framework.

1) In two-level modeling, means/intercepts/thresholds are given on level 2 (see 2-level linear regression in the Raudenbush-Bryk book as an example). It sounds like you are using a model that has different loadings for the 2 levels. I don't know that multilevel IRT models have addressed the issue of differing loadings (discriminations) on the different levels. If you compare to the Raudenbush et al (2003) article in Soc Meth, eqn (2) is written in classic IRT form, but when adding the page 183 multilevel features, it looks like the discrimination parameter lambda has been dropped. And more to the point, I don't see why it is necessary to connect to IRT - all you are interested in is being able to plot your item characteristic curve as a function of within and between variation in the ability. You can do that straight from the Mplus model (again see Grilli & Rampichini). But if you show me a multilevel IRT model with different loadings, I will make the translation.

2) Regarding within- and between-level reliability, I still have to read up on the references you gave me to answer this. These are not quantities I am used to looking at (I think).

3) A two-level Rasch model in line with the Raudenbush article would seem to be easy to specify in Mplus using ML logit. You simply set loadings equal across items and across levels.

Keep me informed about your progress. And I will try to find time to read those 2 articles you suggested (my reading stack is just a little high right now...).
 Magdalena Cerda posted on Wednesday, March 22, 2006 - 8:39 pm
Thank you very much for your reply and for taking the time to keep answering my many questions!

1) I did read the Grilli and Rampichini paper, but I would like to make a statement about the overall precision, or reliability, of the scale at the individual and neighborhood levels, and this paper only proposes formulas to calculate communalities at the item level. That's why I thought I could at least calculate the scale information from the transformed IRT parameters...

2) Am I right in assuming that if one has two factors at the subject level and one factor at the neighborhood level, one should not constrain the items to have equal loadings at the two levels?

This is the model I have specified:

                   Estimates    S.E.
Within Level

 FW1 BY
    Q12AR            1.000     0.000
    Q12BR            1.064     0.097
    Q12CR            0.766     0.134
    Q12ER            0.651     0.056
    Q12FR            0.562     0.146

 FW2 BY
    Q11AR            1.000     0.000
    Q11BR            1.297     0.090
    Q11ER            1.439     0.178
    Q11FR            0.698     0.059
    Q11KR            0.582     0.048
    Q11MR            0.993     0.077
    Q11GR            1.033     0.067

 FW2 WITH
    FW1              2.830     0.186

 Variances
    FW1              5.784     0.508
    FW2              2.761     0.265

Between Level

 FB BY
    Q12AR            1.000     0.000
    Q12BR            1.314     0.192
    Q12CR            0.516     0.106
    Q12ER            0.847     0.122
    Q12FR            0.641     0.285
    Q11AR            0.578     0.081
    Q11BR            1.022     0.163
    Q11ER            1.842     0.324
    Q11FR            1.265     0.211
    Q11KR            0.976     0.149
    Q11MR            1.764     0.262
    Q11GR            1.100     0.162

 Thresholds
    Q12AR$1         -0.101     0.093
    Q12BR$1         -1.267     0.133
    Q12CR$1          0.094     0.163
    Q12ER$1         -0.402     0.110
    Q12FR$1         -1.280     0.097
    Q11AR$1         -0.764     0.060
    Q11BR$1         -0.052     0.090
    Q11ER$1         -1.644     0.146
    Q11FR$1         -0.895     0.103
    Q11KR$1          0.469     0.090
    Q11MR$1         -0.208     0.145
    Q11GR$1         -0.840     0.092

 Variances
    FB               0.371     0.129
 Bengt O. Muthen posted on Thursday, March 23, 2006 - 8:31 pm
1) Just so that we have the language clear, when you say "scale information", do you refer to estimated factor scores and their precision (SE)?

2) That's right. The factors mean very different things on the two levels so that equal loadings would not make sense.
 Magdalena Cerda posted on Friday, March 24, 2006 - 5:06 am

In terms of "scale information", I mean the sum of item information for the items in a scale (information provided by specific response category times probability of respondent with trait level x choosing gth response category), assessed across the range of the underlying latent construct.

In essence however, I would just like to get a measure of the precision of the scale and be able to make a statement about its measurement quality at the subject and neighborhood levels.

Thank you for all your help and time with this problem--I very much appreciate it!

 Bengt O. Muthen posted on Friday, March 24, 2006 - 5:52 am
When you describe what you mean by "scale information", it sounds like what IRT people call "information function". That is, the standard errors for the factor score estimates expressed as a function of the true ability. See for example the 1985 Hambleton-Swaminathan book, chapter 6. Do you agree? If so, Mplus has not yet implemented this, but we will do so, also for multilevel models.
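As a sketch of the information function described here, assume binary items in the logistic 2PL metric (ordinal items would need category-level terms). Test information is then the sum of a_j^2 P_j(1 - P_j) over items, and the approximate standard error of the ability estimate is 1/sqrt(I). The function names are illustrative, and the (a, b) pairs would come from a translation of the estimated loadings and thresholds.

```python
from math import exp, sqrt


def scale_information(theta, items):
    """2PL test information I(theta) = sum_j a_j^2 * P_j * (1 - P_j).

    items: list of (a, b) discrimination/difficulty pairs, logistic metric.
    """
    total = 0.0
    for a, b in items:
        p = 1.0 / (1.0 + exp(-a * (theta - b)))
        total += a * a * p * (1.0 - p)
    return total


def se_of_theta(theta, items):
    """Approximate standard error of the ability estimate, 1 / sqrt(I)."""
    return 1.0 / sqrt(scale_information(theta, items))
```

Evaluating `scale_information` over a grid of theta values traces the information curve. Each item contributes most near its difficulty b.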
 Magdalena Cerda posted on Friday, March 24, 2006 - 9:20 am
Yes, it is. OK, that's good to know--thanks. Then is there a way I can characterize the precision/reliability of the scale as a whole with these models in MPLUS?
 Bengt O. Muthen posted on Friday, March 24, 2006 - 9:54 am
The reason calculating precision/reliability for a scale (= a latent variable construct, estimated as factor scores) is not included yet in Mplus is that this is not typically central to latent variable modeling in the following sense. You have a measurement model with 2 within-level constructs, one for 5 and one for 7 categorical items. The measurement modeling is typically not an aim in itself, but is related to other variables, either predictors or consequences. Those other variables can be brought together with the measurement model to create a structural equation model that is estimated in a single step. The precision/reliability aspect of the measurement model then translates to how well you can estimate structural regression slopes and that is assessed by their SEs. Few research questions need to be approached in a 2-step fashion (measurement model producing a scale, scale used for some purpose). - What kind of use of your measurement model do you have in mind?
 Magdalena Cerda posted on Thursday, March 30, 2006 - 1:12 pm
I would like to make a statement about the quality of a measure across two different sites. So one of the things I would like to compare is the level of reliability of a measure of the latent construct in the two sites--i.e. can the same construct be measured with comparable reliability in the two sites? That's why I wanted to compute the reliability. Maybe there's a way to do it manually, as proposed by Raudenbush in those articles I cited.

Also, I have another question. Is there a way that MPLUS has of doing differential item functioning for multilevel structures? I would like to compare measurement equivalence across the two sites for this scale as well, but since the items are dichotomous, I can't use multiple group cfa.

Thanks, as always, for all your help!

 Bengt O. Muthen posted on Thursday, March 30, 2006 - 3:29 pm
Regarding comparing reliability of the measure across the 2 sites, what I was referring to would amount to using a 2-group latent variable analysis instead of estimating factor scores and comparing them across groups. In the 2-group analysis, you can test group invariance of the item parameters directly. The reliability of the factor score estimates that you are referring to will not be very high with only 5-7 categorical items. In contrast, the 2-group latent variable analysis comparing say the means of the latent variable across sites (assuming measurement invariance), can give good power/precision in the estimation.

I guess the above answers your second question as well. You can do 2-group analysis for categorical items. In the ML estimation framework you would do that using Type = mixture and the Knownclass option to capture the 2 groups.
 Ellen D'Haenens posted on Tuesday, April 25, 2006 - 5:14 am
Dear Dr. Muthen,

Using STREAMS with Mplus, I performed a twolevel analysis. To obtain start values, I first performed a regular confirmatory factor analysis on the total covariance matrix. This led to a model with six latent variables and covariances among most of them. The fit of this model is as follows: RMSEA = .061 and chi-square/df = 2.18. Although the RMSEA suggests some room for adjusting the model, the chi-square/df indicates good model fit. Since further adjustments would make interpretation difficult, I decided to start with this model at the within level of the twolevel analysis. At first I presupposed no between structure, allowing the manifest variables at the between level to covary freely.
However, running this twolevel analysis, I encountered several problems.
-Sometimes I received the message: 'Estimated between covariance matrix is not positive definite as it should be. Computation could not be completed. Model estimation did not terminate normally. Change model and/or starting values.' I tried some small changes, like allowing covariances among all latent variables, but that did not help.
-At other times I received an internal error code (GH1006), or even a fatal error code that pointed out that there is not enough memory space to run the program on the current input file. However, at other times the model did run (see remark above).
Trying to specify the between-level structure also led to the internal error code or the fatal error code.
Encountering these problems, I was wondering if these are linked to the poorer model fit of my model for the total covariance matrix, though only indicated by the RMSEA.
If not, what are your suggestions to overcome these problems?
If necessary I can send in my input file and data, as asked for in the internal error code GH1006.

Thanks for your help.

Ellen D'Haenens
 Linda K. Muthen posted on Tuesday, April 25, 2006 - 6:20 am
The only way to see what is happening is for you to send your input, data, output, and license number to
 Sharyn L. Rosenberg posted on Wednesday, July 05, 2006 - 9:34 am
Dear Dr. Muthen,

I have a couple of follow-up questions to the issue that Magdalena Cerda raised. I am also interested in calculating the level-2 test information function for a model with many level 2 units. I realize that calculating level 2 precision estimates of factor scores is often "unnecessary" in latent variable modeling since the goal is to keep the measurement and structural components in a single model rather than taking out the factor scores to use in a path analysis. However, there are many instances in educational research where test scores are used in a multilevel analysis and are not treated as latent variables. The current state of educational research is that test scores are much more widely available as scaled scores than as raw data. When student test scores (scaled theta estimates computed from an IRT model, so equivalent to factor scores from a categorical FA) are used in multilevel models where the effect of interest is at level 2 (such as looking at an effect of a teacher intervention on student achievement where teachers are the unit of assignment), it seems that it would be important to know the reliability of the latent mean achievement at the teacher level. It is possible that the test information function at level 2 may be very different from the test information function at level 1, which would suggest that the same measures may not be appropriate for inferences that involve students and teachers.

I realize that Mplus does not currently plot the IRT information function at level 2 or save the standard errors of factor scores at level 2. However, if Mplus can be used to obtain level 2 loadings and thresholds for binary indicators, then can't I use this information myself to compute a test information function at level 2?

Thank you for your help.

Sharyn Rosenberg
 Bengt O. Muthen posted on Monday, July 17, 2006 - 9:45 am
Mplus does give information curves for level 2 latent variables. If you don't get them, there might be another reason for it. You might want to send your input, output, data, and license number to
 Sharyn L. Rosenberg posted on Tuesday, July 18, 2006 - 5:21 am
Thanks. I actually did get the level 2 information curve when I tried it (I didn't realize that Mplus now had this capability when I first posted the message).
 wendy posted on Friday, July 21, 2006 - 4:17 pm
Hi, Dr. Muthen:
I simulated a two-level CFA model in which the within and between levels have the same structure: 2 factors, each measured by 3 indicators. The output showed an error message indicating that my between-level covariance matrix estimate is not positive definite.
Does that mean the two between-level factors are highly correlated? What does it mean, and how could I correct starting values such as factor variances?
 Linda K. Muthen posted on Saturday, July 22, 2006 - 11:36 am
It is likely that in MODEL MONTECARLO you specify a high correlation between factor2 and factor1 and that in one random draw, the correlation becomes one. If you want more information on this, send your input, output, and license number to
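A quick way to see the problem: a 2 x 2 between-level factor covariance matrix is positive definite only if the factor correlation stays strictly below 1 in absolute value. This sketch, with hypothetical helper names, checks positive definiteness through a Cholesky factorization; a matrix whose correlation is exactly 1 fails the check.

```python
from math import sqrt


def is_positive_definite(m, tol=1e-10):
    """Cholesky-based check: every pivot must stay above tol."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:  # zero or negative pivot: not positive definite
                    return False
                low[i][i] = sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def factor_cov(var1, var2, corr):
    """2 x 2 factor covariance matrix from two variances and a correlation."""
    cov = corr * sqrt(var1 * var2)
    return [[var1, cov], [cov, var2]]
```

Lowering the population correlation in MODEL MONTECARLO, or increasing the number of clusters, makes such draws less likely.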
 Magdalena Cerda posted on Wednesday, January 10, 2007 - 9:44 am
Dear Dr. Muthen,

I estimated a two-level factor analysis with two factors at the respondent level (4 items each) and one factor at the neighborhood level (8 items), with n = 2494 respondents nested within 166 neighborhoods.

I get very different results for the neighborhood-level factor loadings and the neighborhood-level reliability (using the information curves), depending on the scale item I decide to fix to 1 in the neighborhood-level factor and one of the respondent-level factors. If I select one particular item, the reliability is low (highest information statistic about 0.8) but the item loadings are significant. If I select another item, the reliability becomes sky-high (an information statistic of about 100), but the factor loadings for many of the items at the neighborhood level become nonsignificant. Changing the reference item doesn't have an impact on factor loadings or reliability at the respondent level.

Would you know why this is?

Thank you,
Magdalena Cerda
 Bengt O. Muthen posted on Wednesday, January 10, 2007 - 11:19 am
Please send your input, output, data and license number to Include both choices for item loading fixings.
 Linda K. Muthen posted on Thursday, January 11, 2007 - 10:53 am
Choosing different items to set the metric of the factor gives different results for the significance of the loadings and for the information functions, because the factor is expressed in a different metric. For loadings, this is in line with standardized and raw coefficients not necessarily being significant at the same time. The information curves will differ in the two runs, but they will be proportional by lambda^2; see formula (8) in

I recommend using the conventional IRT metric of all loadings free and the factor variances fixed at 1 on both levels.

As an alternative to Monte Carlo integration I would suggest INTEGRATION = 7.
 Magdalena Cerda posted on Thursday, February 08, 2007 - 10:31 am

I have estimated a multilevel confirmatory factor analysis with one factor at each of two levels, and covariates at the two levels as well. One of the neighborhood covariates consists of a set of dummy variables for different neighborhood "types" (poor and cohesive vs poor and non-cohesive, for example). I would like to construct a bar graph showing the estimated level of the latent outcome (perceived violence) for the different neighborhood types, with average values for all other covariates.

The question is, how does one obtain this in a two-level factor model that has covariates at both levels? Does one just use the beta estimates at the between level to estimate a prediction, as one would in a single-level model, or does one also use thresholds at the between level, and does one also need to use anything at the within level? The concern arises particularly since there is no one intercept in the model, but several...

Thank you for your help, as always!

Magdalena Cerda
 Linda K. Muthen posted on Friday, February 09, 2007 - 8:39 am
Please send your full output and license number to so we can see your full model.
 roberta varriale posted on Thursday, March 01, 2007 - 3:24 am
Dear authors,
I'm carrying out a multilevel factor analysis of ordinal variables.

As suggested by Grilli and Rampichini (2007), I want to carry out separate EFAs on the estimated between and within correlation matrices of the latent responses. "The decomposition of the latent response correlation matrix into the between and within components can be obtained by means of a multivariate two-level ordinal model with unconstrained covariance structure."

How is it possible to obtain this decomposition with Mplus?

 Linda K. Muthen posted on Thursday, March 01, 2007 - 7:06 am
Yes, see the SAVEDATA command in the Mplus User's Guide.
 elisa posted on Thursday, March 08, 2007 - 4:38 am
Dear Dr. Muthen,
I'm working on MFA following the strategy suggested by Muthen (1994) and Grilli (2007).
I have 12 indicators and I want to estimate the between and within covariance matrix.
I referred mainly to the Mplus web site example cont10.

VARIABLE:
MISSING ARE all(999);
cluster = cdl;
ANALYSIS:
estimator = ml;
SAVEDATA:
SIGB IS "d:\...\SIGB.txt";
SAMPLE IS "d:\...\SAMPLE.txt";

The output gave me the estimated sample statistics for the between and within levels, but it says:

What do I have to do?
And another question: why are the values in the SIGB file I saved different from the ESTIMATED SAMPLE STATISTICS FOR BETWEEN?
Which values should I use in order to carry out an EFA on the between covariance matrix?
Many thanks,
best regards
 Linda K. Muthen posted on Thursday, March 08, 2007 - 8:51 am
This message usually points to a problem of zero variance on the between level. You can fix the variance to zero. If this is not the issue, please send the input, data, output, and your license number to

I would need more information to answer your other question. Please send the input, data, output, and your license number to
 elisa posted on Tuesday, March 13, 2007 - 4:12 am
Dear Mrs. Muthen,
another question.
I have 12 indicators and I want to estimate the between and within covariance matrices in order to follow the strategy of Muthen (1994) and Grilli (2007) for the subsequent analysis.
Snijders and Bosker (1999) suggest fitting a multilevel multivariate empty model, but I think this is not possible in Mplus.
So, I have to use a TWOLEVEL analysis to obtain the estimates of the between and within covariance matrices. But, at each level, do I have to "create" as many factors as there are observed variables, or do I have to "create" just one factor?
Are there some references on this topic with Mplus examples?

And, when I use the SAVEDATA command, do I obtain SIGMAB and "SIGMAW", or the pooled SIGMAW?
After obtaining this matrix, may I use an EFA?
TYPE = EFA 2 4;

Thanks a lot,
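For readers following the Muthén (1994) strategy, here is a minimal sketch of the two sample matrices it starts from, computed from raw data: the pooled-within matrix S_PW = sum over clusters and members of (y - cluster mean)(y - cluster mean)' / (N - G), and the between matrix S_B = sum over clusters of n_g (cluster mean - grand mean)(cluster mean - grand mean)' / (G - 1). In that framework S_B estimates Sigma_W + c * Sigma_B with scaling factor c = (N^2 - sum n_g^2) / (N (G - 1)), which equals the common cluster size when clusters are balanced. These are sample statistics, not the model-estimated matrices from a TYPE=TWOLEVEL run. The function name is hypothetical and pure Python is used to keep the sketch dependency-free.

```python
from collections import defaultdict


def pooled_within_and_between(rows, cluster_ids):
    """Return (S_PW, S_B, c) for a list of p-dimensional observations."""
    groups = defaultdict(list)
    for row, cid in zip(rows, cluster_ids):
        groups[cid].append([float(x) for x in row])

    n_total = sum(len(g) for g in groups.values())
    n_groups = len(groups)
    p = len(rows[0])

    def mean(vectors):
        return [sum(v[j] for v in vectors) / len(vectors) for j in range(p)]

    def add_outer(acc, d, weight=1.0):
        # acc += weight * d d'
        for j in range(p):
            for k in range(p):
                acc[j][k] += weight * d[j] * d[k]

    grand = mean([v for g in groups.values() for v in g])
    ss_within = [[0.0] * p for _ in range(p)]
    ss_between = [[0.0] * p for _ in range(p)]
    for members in groups.values():
        m = mean(members)
        for v in members:
            add_outer(ss_within, [v[j] - m[j] for j in range(p)])
        add_outer(ss_between, [m[j] - grand[j] for j in range(p)], weight=len(members))

    s_pw = [[x / (n_total - n_groups) for x in r] for r in ss_within]
    s_b = [[x / (n_groups - 1) for x in r] for r in ss_between]
    c = (n_total ** 2 - sum(len(g) ** 2 for g in groups.values())) / (n_total * (n_groups - 1))
    return s_pw, s_b, c
```

In Muthén (1994), S_PW is factor analyzed for the within structure, and (S_B - S_PW) / c serves as an ad hoc estimate of the between covariance matrix; with few clusters that difference need not be positive definite.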
 elisa posted on Wednesday, March 14, 2007 - 8:40 am
Dear authors,
I have 12 indicators and I want to estimate the between and within covariance matrices in order to follow the strategy of Muthen (1994) and Grilli (2007) for the subsequent analysis.

After saving the estimated between and within covariance matrices through the SAVEDATA command, how can I use them for an EFA and a CFA? If I use:

NAMES ARE item1-item12;
USEVARIABLES ARE item1-item12;

Mplus does not read data in the correct way. How can I solve the problem?

Thanks a lot,
 Linda K. Muthen posted on Saturday, March 17, 2007 - 9:05 am
Example 12.1 shows how to read a covariance matrix as data. The MATRIX option of the ANALYSIS command should not be used. If you continue to have problems, please send your input, data, output, and license number to
 Thomas Fuller-Rowell posted on Sunday, August 03, 2008 - 11:26 am
I am working with item level data for a 3-item scale (y1-y3) collected on 150 people (id) once a day for 14 days (day). I would like to break down the variance components for the scale in order to calculate various scale reliability estimates (e.g., within person reliability across days). In order to do this, I would like to calculate variance components for person, day, item, person*day, person*item, day*item, and error. I was wondering if someone might be able to help me with the Mplus syntax to get these estimates?
 Linda K. Muthen posted on Monday, August 04, 2008 - 9:05 am
I would Google George Marcoulides. He has done work in this area. Get an input from one of his papers and translate it into Mplus if he has not used Mplus.
 kanreutai klangphahol posted on Tuesday, March 17, 2009 - 6:53 pm
Dear Dr. Muthen,

I'm working on a multilevel CFA. The Mplus 5.1 demo does not display the chi-square or modification indices. How can I solve the problem?

Result:

NUMBER IS -0.668D-16.


thank you so much.
 Bengt O. Muthen posted on Friday, March 20, 2009 - 11:55 am
We would have to see your exact model to be able to comment. Please send input, output, data, and license number to
 Cristian Vaccari posted on Monday, August 03, 2009 - 12:24 pm
I have done a longitudinal content analysis of 128 websites of candidates for President. I have 81 variables, most of which are dichotomous. Overall, 20 candidates’ websites were coded in 11 observations, but not all of them were coded in all observations: websites were taken in and out of the target population based on the candidates’ decisions to enter or quit the race. Thus, a candidate who entered early and stayed throughout the race would have 11 observations, while one who entered early but dropped out quickly would have, say, 4 observations, and so on.
I would like to know whether multilevel factor analysis would be suitable to analyze my data. I am concerned that the sample size is too small, especially considering that both between- and within-group analyses need to be conducted.
Thanks a lot for your help.
 Bengt O. Muthen posted on Monday, August 03, 2009 - 4:25 pm
It sounds like you have a sample of 20 subjects, which is quite low. But I am not sure what the difference is between your "81 variables" and your "11 observations". Perhaps an "observation" is a time point at which you observe several variables. If you have 20 subjects observed at 11 time points (some subjects have fewer time points), you have longitudinal data and that means your information is increased substantially. See, for example, the discussion of growth modeling sample size in

Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620.

With only 20 subjects, however, you can do only very limited "level-2" (person-level) modeling.
 robertav posted on Thursday, October 08, 2009 - 7:13 am
Dear authors,
I'm applying a multilevel factor model to my data.

Mplus output gives me some relative goodness of fit measures (BIC, AIC, etc).
I would like to know if there are some references about some absolute goodness of fit measures that have to be used in multilevel factor models.

best regards,
 Linda K. Muthen posted on Thursday, October 08, 2009 - 8:17 am
I don't know of any such references.

If you want chi-square and related fit statistics, you can use weighted least squares estimation. See the ESTIMATOR option in the user's guide.
 Oliver Arranz-Becker posted on Thursday, April 22, 2010 - 9:23 am
Dear Drs. Muthen,

working on a paper on intergenerational relations, I am analyzing the following three-level data structure: Respondents (level 2) rate several aspects (items) of their relations (level 1) with 6 different kin. Additionally, the study is cross-national in that respondents are nested within 11 countries (level 3). I am currently working on the measurement models (CFAs) which, if successful, are to be extended to full multilevel SEMs.
I planned on modeling the first two levels explicitly while correcting standard errors for the clustering by country (using TYPE=TWOLEVEL COMPLEX). From my understanding, 11 level-3 units are not enough to model country as a third level anyway.
However, in this analysis Mplus gives implausible fit indices (a CFI of 0.0 and even a negative TLI). As soon as I take the country cluster variable out, results become plausible (CFI=.92, TLI=.85).

1. Do you have any suggestion why this occurs or what I can do about it?
2. In my case, would you recommend a multigroup approach to establish measurement invariance across countries instead of simply correcting the standard errors by country clustering?
3. Do you know of any literature on clustered or multigroup two-level CFA?

Many thanks in advance,
O. Arránz Becker
 Bengt O. Muthen posted on Thursday, April 22, 2010 - 6:11 pm
11 countries is typically not sufficient for Type=Complex. At least 20 are needed. So I would switch from random country effects to fixed ones, using country dummies as covariates. Or use country as a grouping variable as you say.

I wonder if you need multilevel modeling - couldn't the different items for the different kins simply be a multivariate observation vector where the items and the kins are correlated via the model?

Multiple-group two-level CFA was written about in

Muthén, B., Khoo, S.T. & Gustafsson, J.E. (1997). Multilevel latent variable modeling in multiple populations. Unpublished technical report.

which is on our web site under Papers, Multilevel SEM.
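A minimal sketch of the fixed-effects alternative suggested above: with 11 countries, one country serves as the reference category and the remaining 10 enter the model as 0/1 dummy covariates. The country codes, the `d_` naming scheme and the choice of reference are hypothetical; the dummies would be added to the data file (or created in Mplus) before the run.

```python
def country_dummies(countries, reference=None):
    """Return (dummy names, one 0/1 row per observation), omitting the reference country."""
    levels = sorted(set(countries))
    if reference is None:
        reference = levels[0]  # first code in sorted order is the reference
    kept = [c for c in levels if c != reference]
    names = [f"d_{c}" for c in kept]
    rows = [[1 if country == c else 0 for c in kept] for country in countries]
    return names, rows


names, rows = country_dummies(["AT", "BE", "AT", "CZ"])
print(names)  # ['d_BE', 'd_CZ']
print(rows)   # [[0, 0], [1, 0], [0, 0], [0, 1]]
```

With G countries this always yields G - 1 dummies, so 11 countries give 10 covariates.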
 Oliver Arranz-Becker posted on Friday, April 23, 2010 - 1:17 am
Thanks a lot for your interesting suggestions. Some new questions arise:
1. I wonder what the suggested multivariate approach would imply for my CFA. For every kin, there would be as many factors as there are constructs, with the measurement error variances of each item correlated across kin. Is that right?
2. I also wonder how one would then build in covariates in an SEM extension measured on different levels (e.g., a dummy for relation with same-gender kin (level 1) and respondent's education (level 2))? Via 2nd order factors (i.e., relational aspects across kin)? I am a bit confused here, would appreciate any suggestion.
3. If I use fixed country dummies, are standard errors for lower-level covariates (e.g., respondent's education) estimated correctly?
4. Finally, do you have any references on the equivalence (or similarity) of the multilevel and the multivariate approach to hierarchical data?

Thank you so much in advance,
best regards,
 Bengt O. Muthen posted on Friday, April 23, 2010 - 9:09 am
See the Khoo et al reference on slide 54 of our Topic 8 handout on our web site (and surrounding slides) about multivariate versus multilevel approaches.
 Sébastien Fosse posted on Thursday, April 29, 2010 - 12:55 am
Multilevel CFA and sample size:
I am developing a group-level scale, and need to know the minimum number of individuals and groups required to run a multilevel EFA (and CFA). In the final scale, I expect about 20-25 items and 3 sub-scales.
Would you recommend running a Monte Carlo simulation to know (approximately) the required sample size, and which references would be useful for that?
Thanks a lot.
PS: I have (Muthén,1994),(Muthén&Muthén, 2002) and the doc “v5.1 ex. Addendum".
 Linda K. Muthen posted on Thursday, April 29, 2010 - 6:10 am
I think a Monte Carlo study could be helpful to determine the sample size needed for your study. I don't know of any references beyond what you mention.
 Oliver Arranz-Becker posted on Tuesday, May 11, 2010 - 5:30 am
I have a follow-up question concerning my multilevel analysis of perceptions of relationships (level 1) nested within respondents (level 2) nested within 11 countries.
Concerning your suggestion to include fixed country dummies, does this actually solve the problem of biased standard errors due to country-clustering? I noticed that adding a cluster command for countries (despite concerns about the small number of countries) does affect standard errors considerably even if country dummies are already in the model.
Do I also get correct standard errors if I include country-level aggregate covariates (e.g., gross national income) instead of country dummies?

Thanks in advance,
 Linda K. Muthen posted on Tuesday, May 11, 2010 - 9:44 am
Using dummy variables will take care of some but not all of the non-independence of observations. Including country-level covariates can also help. The standard errors are not trustworthy using TYPE=COMPLEX with only 11 countries.
 Oliver Arranz-Becker posted on Monday, May 17, 2010 - 12:57 am
Does "not trustworthy" mean that standard errors may be both over- and underestimated?
Can you provide any reference on the minimum number of clusters for TYPE=COMPLEX (in case some reviewer objects)?

 Linda K. Muthen posted on Monday, May 17, 2010 - 6:22 am
Yes. See writings by Joop Hox.
 Wim Van den Broeck posted on Monday, September 20, 2010 - 8:37 am
I have run a multilevel factor-analysis using this code:
LNw BY RTw Logitw;
LNp BY RTp Logitp;
s| Logitp ON LNw;
My question is whether the factorscores are also influenced by the s | part, or are the factorscores only determined by the first two lines?
 Linda K. Muthen posted on Monday, September 20, 2010 - 9:27 am
Factor scores are estimated using the entire estimated model.
 Teresa Dubb posted on Tuesday, March 29, 2011 - 6:01 pm

I have an experiment where 50 participants are assigned into two conditions (good/bad) and in each of the conditions, each participant is presented with two different alternatives (A/B) and they are asked to provide their judgments on both alternatives A and B. I understand that the judgments are clustered by participants but I can't quite figure out how to use TWOLEVEL EFA to perform an exploratory factor analysis with CLUSTER = Participant. Below is an example of the data where X1, X2, and X3 are judgments on different aspects of the alternatives. Thanks very much.

Participant Condition Alternative X1 X2 X3 ...
1 good A 10 12 15 ...
1 good B 30 25 38 ...
2 good A 20 22 33 ...
2 good B 60 35 50 ...
3 bad A 30 30 40 ...
3 bad B 20 40 50 ...
 Bengt O. Muthen posted on Tuesday, March 29, 2011 - 6:13 pm
If the alternatives aren't randomly equivalent it seems like you might want to spread their judgements in a wide, multivariate fashion instead of doing a twolevel model. The data would then look like:

1 good 10 12 15 ... 30 25 38...
2 good 20... etc

You handle this by "longitudinal factor analysis" which can be done in an EFA framework using "ESEM" - see exploratory structural equation modeling in the UG index. You can then check if the judgement factors are the same for A and B.
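The wide layout described above can be produced with a simple reshape. This minimal sketch uses the example rows from the question (only X1 to X3; the remaining "..." columns are omitted) and assumes every participant rated both alternatives A and B; a missing alternative raises a KeyError. The function name is hypothetical.

```python
def long_to_wide(rows, alternatives=("A", "B")):
    """Turn one row per (participant, alternative) into one row per participant."""
    by_participant = {}
    for participant, condition, alternative, *judgements in rows:
        entry = by_participant.setdefault(participant, {"condition": condition})
        entry[alternative] = list(judgements)
    wide = []
    for participant in sorted(by_participant):
        entry = by_participant[participant]
        row = [participant, entry["condition"]]
        for alt in alternatives:
            row.extend(entry[alt])  # KeyError if this alternative was not rated
        wide.append(row)
    return wide


long_rows = [
    (1, "good", "A", 10, 12, 15),
    (1, "good", "B", 30, 25, 38),
    (2, "good", "A", 20, 22, 33),
    (2, "good", "B", 60, 35, 50),
    (3, "bad", "A", 30, 30, 40),
    (3, "bad", "B", 20, 40, 50),
]

for row in long_to_wide(long_rows):
    print(*row)
# prints:
# 1 good 10 12 15 30 25 38
# 2 good 20 22 33 60 35 50
# 3 bad 30 30 40 20 40 50
```

Each participant then contributes one multivariate observation, with the A judgements and the B judgements as separate variables for the longitudinal-style factor analysis.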
 Sofie Henschel posted on Monday, June 27, 2011 - 6:25 am
I am trying to run a two-level SEM (with a cluster level) with continuous factor indicators (latent factors 1-2), categorical factor indicators (latent factors 3-5), and 2 observed covariates on the within level. Additionally, I have 2 observed covariates on the between level. I wonder which estimator is appropriate for this analysis, WLSMV or MLR?
Thanks for your advice,
 Linda K. Muthen posted on Monday, June 27, 2011 - 6:56 am
If you have three factors with categorical indicators, that will require three dimensions of integration, which is computationally demanding; this would suggest using WLSM or WLSMV. If you have a lot of missing data, you may want to use MLR, or multiple imputation followed by WLSM or WLSMV.
 Sunny Duerr posted on Sunday, July 24, 2011 - 10:01 am

I am trying to run a relatively simple latent variable model using a complex data set (TIMSS). I was under the impression that the most recent version of Mplus had the capacity to apply SEM analysis to complex data, accommodating the weighting variable and the replicate weighting variable. However, I am getting an error suggesting otherwise:

EFA factors are not allowed with replicate weights.
EFA factors are declared with (*label).

Can you not use the complex data options with LVM modeling in version 6.1?

Thank you in advance for your time!

Below is my syntax.


F3 BY APP01 KNO01 REA01;
F3 ON F1-F2;

CONVERGENCE = 0.00005;
 Linda K. Muthen posted on Sunday, July 24, 2011 - 11:49 am
You are combining EFA and CFA factors in the MODEL command. Replicate weights are not allowed with the EFA factor:


They are allowed for the CFA factors:

F3 BY APP01 KNO01 REA01;
 Nidhi Kohli posted on Monday, July 25, 2011 - 10:03 am
Dear Dr. Muthen,

I have a 3-level nested cross-sectional dataset where the outcome variable is binary. The level-1 data is on patients’ characteristics, the level-2 data is on Physicians’ characteristics, and the level-3 data is on clinics characteristics.

Patients’ characteristics data include a bunch of observed variables where some variables are continuous, some are binary (and ordinal), and the remaining are count variables. Physicians’ characteristics data also include a mixture of continuous, binary, and count variables. Clinics’ characteristics data only includes one binary variable.

Given the mix of exogenous variable types (continuous, binary, and count), and binary outcome variable, can Mplus handle this as a 3-level factor model? Furthermore, I was thinking that before going ahead with 3-level model, maybe I should first do EFA on just the patient-level data so as to reduce the number of predictors to a smaller but representative set of composites. Later, I can use those composites to define level-1 model in the 3-level factor model. Do you think it is okay to do that?

Thank you.
 Linda K. Muthen posted on Monday, July 25, 2011 - 2:25 pm
For cross-sectional models, Mplus currently handles two-levels. You can use TYPE=COMPLEX TWOLEVEL for three-level where the standard errors at the third level are computed taking clustering into account.
 Nidhi Kohli posted on Monday, July 25, 2011 - 2:38 pm
Thank you.

A follow-up question. I was thinking that before going ahead with 3-level model, I should first do EFA on just the patient-level data so as to reduce the number of predictors to a smaller but representative set of composites. Later, I can use those composites to define level-1 model in the 3-level factor model. Do you think it is okay to do that?
 Linda K. Muthen posted on Monday, July 25, 2011 - 2:58 pm
I really can't answer this question. I would need to know far more about your data and study than is allowed on Mplus Discussion.
 Nidhi Kohli posted on Tuesday, July 26, 2011 - 12:52 pm
Dear Linda,

Thank you for your response.

I tried to run a two-level CFA model on the clustered dataset that I described in my earlier post. I am getting the following error message:

*** ERROR in ANALYSIS command
Estimator WLSMV is not allowed with TYPE=TWOLEVEL COMPLEX.

I have categorical factor indicators; hence, I wanted to use WLSMV. Can you tell me what other estimator I can use for a dataset like mine? Below is the relevant Mplus code:


fw BY td_mh;
fw ON gender race mar emp anxiety
phq2 MH_meds Non_MH_meds out_visits;

fb BY td_mh;
fb ON age_md gender_md race_md img family panel_pts PCP_lateness;

Thank You.
 Linda K. Muthen posted on Tuesday, July 26, 2011 - 1:40 pm
You need to use the MLR estimator which is the default for this type of analysis.
 Nidhi Kohli posted on Tuesday, July 26, 2011 - 2:19 pm

I used the default estimator, but still I have not got any success in running the Mplus program. I am getting the following error message:

One or more between-level variables have variation within a cluster for
one or more clusters. Check your data and format statement.

Between Cluster ID with variation in this variable
Variable (only one cluster ID will be listed)


I am not able to understand what this warning message is trying to say. Can you please explain it to me? Thanks.
 Linda K. Muthen posted on Tuesday, July 26, 2011 - 2:57 pm
The message means that in cluster 395, the variable PCP-LATE is not the same for each individual. This is a requirement for a between-level variable. For further help send your output, data, and license number to
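Before rerunning, the offending clusters can be located in the raw data. This minimal sketch flags every cluster in which a variable intended for the between level takes more than one value. The cluster IDs and values below are hypothetical, and missing-value codes such as 999 are treated as ordinary values here, so a cluster mixing a real value and a missing code is also flagged.

```python
def clusters_with_within_variation(cluster_ids, values):
    """Return sorted cluster IDs in which a between-level variable is not constant."""
    first_value = {}
    offending = set()
    for cid, value in zip(cluster_ids, values):
        if cid not in first_value:
            first_value[cid] = value
        elif first_value[cid] != value:
            offending.add(cid)
    return sorted(offending)


# Hypothetical data: cluster 395 has two different values.
ids = [394, 394, 395, 395, 395]
vals = [1, 1, 0, 1, 0]
print(clusters_with_within_variation(ids, vals))  # [395]
```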
 Nidhi Kohli posted on Tuesday, July 26, 2011 - 3:44 pm
Thanks. I have sent the data and output file to, along with the license number.
 Jacky Luo posted on Friday, November 04, 2011 - 6:47 pm
Dear Dr. Muthen,
I want to fit a 2-level Rasch testlet model to my data, which consist of 4 testlets with each having 9 binary items. I am also interested in getting the group variance.
I ran with the following code, but the output seemed a little bit off, and it seemed that I could not get a direct estimate of the group variance. Would you please take a quick look at it to see whether there is something wrong with my code?

TITLE: Two level IRT testlet model
DATA: FILE IS C:\Users\Jacky\Downloads\r1.dat;
VARIABLE: NAMES ARE u1-u36 group;
CLUSTER = group;
MISSING = ALL (999);
f1 BY u1-u9@(1);
f2 BY u10-u18@(1);
f3 BY u19-u27@(1);
f4 BY u28-u36@(1);
f5 BY u1-u36@(1);


 Bengt O. Muthen posted on Friday, November 04, 2011 - 8:31 pm
When you hold the loadings equal you should not use @ because @ means fixed.

Also, you are only fixing the f5 variance at 1 and I think you want to fix all five variances.

It will be impossible to handle all 36 random intercept variances on between (unless you use estimator = wlsmv). Instead you want to formulate a factor on between where those 36 random intercepts are the indicators.
 Jacky Luo posted on Sunday, November 06, 2011 - 11:55 am
Hi Dear Dr. Muthen,
Thanks very much for your prompt response.
Based on your suggestions, I used the following code:
f1 BY u1-u9 *(1);
f2 BY u10-u18*(1);
f3 BY u19-u27*(1);
f4 BY u28-u36*(1);
f5 BY u1-u36*(1);
fb BY u1-u36;

Is this consistent with your suggestions? I am also wondering why I have to fix all 5 factor variances to be 1, since f1-4 are testlet factors and they are simulated not to have the same variance as f5.

Thanks very much,
 Bengt O. Muthen posted on Sunday, November 06, 2011 - 6:16 pm
I would say

f1 BY u1-u9 *(1);
f2 BY u10-u18*(2);
f3 BY u19-u27*(3);
f4 BY u28-u36*(4);
f5 BY u1-u36*(5);
f1-f5 WITH f1-f5@0;
fb BY u1-u36;

If you are going to have a Rasch-like model I would not generate data where the factor variances differ but instead the loadings.
 Jacky Luo posted on Monday, November 07, 2011 - 10:33 am
Thanks so much, Dr. Muthen!
 Morag T posted on Thursday, November 17, 2011 - 3:09 am
I wonder can anyone help me please. I have 5 waves of a birth cohort study (data collected annually) and I have a series of categorical variables on ‘closeness to family and friends’ collected in each sweep (26 variables in total).

I did 3 principal components analyses on these (I couldn’t just do one as not all the variables were sufficiently correlated). The resulting 3 factors regressed nicely with my DV, ‘Children’s Stress and Difficulties score’.

However, as I used variables across waves this makes the data multi-level and my PCA was not. Can anyone advise me whether this data would be suitable for a multi-level factor analysis in MPlus? I have MPlus but it is new to me.

Just to complicate matters, my data is a complex sample, with PSU (cluster), Strata and a longitudinal sample weight. Should these be incorporated into a multi-level factor analysis, and if so, how? If it is possible and you could show me the necessary syntax, I would be eternally grateful.

What I had tried to do in MPlus was:

File is \ Variable:
Names are ;
Missing are all (-9999) ;
Categorical are
Usevariables are
Stratification = DcStrat;
Cluster = DcPSU;
Weight = DcWTbrth;
Type = Complex EFA 1 4 ;

Could you please advise on this? And, many thanks for reading this far!
 Linda K. Muthen posted on Thursday, November 17, 2011 - 10:41 am
This seems to be a reasonable approach.
 Morag T posted on Friday, November 18, 2011 - 3:26 am
Many thanks for your prompt response. However, I'm sorry, I'm not quite understanding, do you mean the approach I already took was reasonable, or to take the multilevel factor approach would be reasonable?
 Linda K. Muthen posted on Friday, November 18, 2011 - 1:45 pm
You can take the non-independence of observations into account using COMPLEX or TWOLEVEL. If you use TWOLEVEL, then in addition to taking non-independence of observations into account, you would be interested in studying the factor structure on both within and between.
 Morag T posted on Tuesday, November 22, 2011 - 2:30 am
Thank you very much!
 finnigan posted on Tuesday, December 13, 2011 - 6:06 am

I want to run a multiple indicator growth model using three waves of data, but the data are collected from individuals nested in companies. Arguably, I have a multilevel data structure.

I am trying to conduct measurement invariance testing using CFA prior to running the growth model.

Should I run a single-factor CFA as a multilevel analysis, i.e., run a two-level CFA at time one and repeat this analysis for times two and three, or would it be more appropriate to run a two-level longitudinal factor analysis to check for multilevel effects? Groups are unbalanced and fewer than 30. Sample sizes are T1 = 130, T2 = 118, and T3 = 110. The multilevel structure is not of interest; I am trying to rule it out so that I can focus on the measurement invariance testing for the growth model.

Does Mplus calculate sampling weights, or do they have to be specified for Mplus?
 Linda K. Muthen posted on Wednesday, December 14, 2011 - 2:47 pm
I would run the analysis with and without TYPE=COMPLEX to see if there is a big difference in the standard errors. If not, I would ignore the clustering. You can see the Topic 4 course handout on the website for the steps for multiple indicator growth modeling.
 finnigan posted on Thursday, December 15, 2011 - 2:10 am
Thanks Linda

Should the multilevel factor analysis be run longitudinally, i.e., with all three time points, or as one factor analysis per time point?
 Linda K. Muthen posted on Thursday, December 15, 2011 - 8:32 am
If you look at the Topic 4 course handout, you will see the steps we recommend.
 Juliette Berg posted on Sunday, February 26, 2012 - 3:01 pm
I am working on a two-level EFA/CFA model with Likert scale items from three scales. Items from two of the scales have response categories ranging from 1-4 and items from the third scale have response categories from 0-3. I am running continuous models. I am wondering if it is preferable or acceptable to standardize the responses (using z-scores) when running the multilevel factor analyses, to put the items on a common scale. Results from the EFA/CFAs using standardized and non-standardized items are very similar in the within-level part of the model, but are a little different in the between-level part of the model. Is there standard practice for whether or not to standardize items on different response scales in factor analysis, and does this differ for multilevel factor analysis?

Thank you.
 Linda K. Muthen posted on Monday, February 27, 2012 - 10:33 am
I would not standardize variables.
 Juliette  posted on Wednesday, March 14, 2012 - 3:04 am
Thank you for the response. Could you elaborate on why I wouldn't want to standardize? Would standardizing not allow each variable to contribute one unit of variance to the solution?
Thank you.
 Linda K. Muthen posted on Wednesday, March 14, 2012 - 2:40 pm
You can only use standardized variables if you have a scale-free model. In this case, it does not matter whether you analyze a correlation or a covariance matrix. If you have a model with any constraints, you will obtain different results analyzing a correlation versus a covariance matrix. Standardizing also presents problems for across-group or across-time comparisons.
 Daniel E Bontempo posted on Wednesday, July 18, 2012 - 10:37 am
In TWOLEVEL EFA with ordinal indicators, I obtain the within and between sample covariance with SAMPSTAT. Consistent with LRV formulation of categorical models, I see 1.00 for the within variance of all indicators and item-specific variances in the between matrix (unlike TWOLEVEL EFA with continuous indicators, where item-specific variances appear in both within and between covariance matrix).

However, an ICC is given for each item, and some algebra with the ICC and the between-variance yields a within-variance consistent with the ICC, and it is not 1.00.

How are within variances and ICC calculated with categorical items and LRV formulation?

Is it appropriate for me to use ICC to calculate the within-variance for reporting purposes?

 Linda K. Muthen posted on Wednesday, July 18, 2012 - 1:13 pm
Is this probit or logistic regression?
 Daniel E Bontempo posted on Wednesday, July 18, 2012 - 2:31 pm
I don't understand probit or logistic in this context. It is an EFA:

DATA:
File is tone.dat ;
VARIABLE:
Names are
ss video clip SU NU CA WA PO RE AF PA DO CO BO DI;
categorical ARE SU CA PO RE DO CO BO DI;
useobs video==0;
Missing are all (-9999) ;
Cluster = ss;
ANALYSIS:
Type = TWOLEVEL EFA 1 2 UW 1 2 UB;
 Bengt O. Muthen posted on Wednesday, July 18, 2012 - 3:26 pm
An EFA with a categorical indicator uses a logit or probit regression of the indicator on the factor. See the IRT literature for instance - the topic is "item characteristic curves".

WLSMV uses probit, while with ML you have a choice.
 Daniel E Bontempo posted on Wednesday, July 18, 2012 - 3:45 pm
Ok, but I am still missing something, unless this relates to the intraclass correlation coefficient calculation. I hope I am not being dense.

That is my entire program, so whatever the default estimator is for TWOLEVEL EFA with categorical indicators, that's what I used.

But my question is about the sample statistics obtained with SAMPSTAT. ... and about the ICC.

I am just trying to get some semblance of total, within, and between estimated sample correlations. Papers with continuous indicators, like Reise's 2005 paper demonstrating MFA, report these correlations.

I think logit vs probit has to do with interpreting the loading. I am after the within-variance used to calculate ICC.
 Bengt O. Muthen posted on Wednesday, July 18, 2012 - 4:03 pm
You can see which estimator and link you are using in the output summary. The default for 2-level categorical EFA is WLSMV which uses a probit link.

The reason we were asking is that the icc calculation differs depending on the link. In this case, the within-level variance for a factor indicator is fixed at 1 so the icc is

between-variance/(between-variance + 1)
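Applied to the between-level variance of .125 mentioned in the next post, this formula gives the printed ICC of about .111. A one-line sketch (hypothetical function name):

```python
def icc_probit(between_var):
    """ICC when the within-level variance of the indicator is fixed at 1 (probit link)."""
    return between_var / (between_var + 1.0)


print(round(icc_probit(0.125), 3))  # 0.111
```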

If that doesn't seem right in your run, please send output to Support.
 Daniel E Bontempo posted on Thursday, July 19, 2012 - 8:01 am
Thanks for confirming the within-variance is fixed at 1.

This morning I rechecked my calculations and found that the problem was that I had transposed the between-variance (.125) and the ICC (.111) when I made my calculations, so I did not obtain 1.0. (I computed .777 where I expected 1.0, and got confused.)
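The back-calculation can be checked by solving ICC = b / (b + w) for the within-level variance, which gives w = b(1 - ICC)/ICC. A sketch with the two numbers from this post, in the correct and in the transposed order (the function name is hypothetical):

```python
def within_from_icc(between_var, icc):
    """Solve icc = b / (b + w) for the within-level variance w."""
    return between_var * (1.0 - icc) / icc


correct = within_from_icc(0.125, 0.111)     # about 1.001, i.e. the fixed value 1
transposed = within_from_icc(0.111, 0.125)  # about 0.777, the confusing result
```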

Apologies for troubling the board with an issue that stemmed from me transposing two numbers. Many thanks for your help in getting me back on track.

If I can ask a followup question, do I understand that with some other estimators, the within-variance would not be fixed at 1.0?

.... and, is there any advantage/disadvantage to choosing different estimators with this EFA I am conducting?
 Linda K. Muthen posted on Thursday, July 19, 2012 - 10:56 am
For logistic regression, it is fixed at pi squared divided by 3, where pi is 3.14 (so about 3.29).
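A sketch covering both links, assuming the within-level latent response variance is 1 under probit and pi^2/3 under logit, as stated above. The dictionary and function names are hypothetical:

```python
import math

# Within-level latent response variance for a categorical indicator, by link
WITHIN_VAR = {"probit": 1.0, "logit": math.pi ** 2 / 3}  # logit: about 3.29


def icc(between_var, link="probit"):
    """Intraclass correlation on the latent response scale: b / (b + w)."""
    return between_var / (between_var + WITHIN_VAR[link])
```

Because the logit within-level variance is larger, the same between-level variance corresponds to a smaller ICC under logit than under probit; between-level variances from the two links are on different scales and should not be compared directly.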

No advantage to choosing different estimators.
 rachel upton posted on Friday, August 03, 2012 - 1:01 pm
Dear Drs. Muthen.

Hello. I am a novice to using multilevel factor models and I have a question concerning the use of a two-level factor model with four ordinal (Likert type) items via WLSMV estimation. My sample size is 589 with 50 clusters.

There is only one factor that I am trying to assess using this model (there is a unidimensional factor structure at level-1 and at level-2), but in all of the models tested the between-level (level-2) factor variance is nonsignificant using an alpha level of .05.

Thus, I am wondering if the non-significant between-level factor variance warrants that I continue to use multilevel factor analysis, or if I should use Mplus to assess the factor structure of these items using an aggregated, single-level CFA?

Put another way, is a multilevel factor model warranted when the level-2 factor variance is nonsignificant?

Thank you for your assistance!
 Linda K. Muthen posted on Saturday, August 04, 2012 - 11:14 am
Yes, it is warranted if the residual variances are significant.
 KUN YUAN posted on Thursday, November 08, 2012 - 10:56 am
Dear Dr. Muthen,

I have a four-level data set with 10 categorical factor indicators and would like to conduct multilevel EFA with it using Mplus. I've seen the examples with 2 levels. I wonder whether Mplus is capable of running a four-level EFA with categorical variables. If so, could you please point me to any examples you have?

Thank you!
 Bengt O. Muthen posted on Thursday, November 08, 2012 - 11:26 am
No, Mplus can only do 2-level EFA. I haven't seen any other program that can do more than 1-level EFA.
 KUN YUAN posted on Friday, November 09, 2012 - 12:29 pm
Thank you for your answer, Dr. Muthen.
 SE WOONG LEE posted on Thursday, January 03, 2013 - 9:37 am
Dear Dr. Muthen

I have conducted a multilevel factor analysis for the first time.
I estimated a two-level factor analysis using teacher survey and principal survey. (Teachers are nested within schools). I did Exploratory Factor Analysis (EFA) using principal components analysis extraction method with Geomin rotation on 19 items through Mplus ver. 6.11.

Now I have three results



What is the reasoning for choosing one of those three solutions over the others? Is there anything I should look out for before I use one of them?
For example, if I choose to use the solution with 2 within factors and 2 between factors rather than the unrestricted one, what should the rationale be? And vice versa?

Thank you
 Linda K. Muthen posted on Thursday, January 03, 2013 - 12:09 pm
There are fit statistics to help you choose. However, it may be difficult to choose based only on these. I would say ultimately, the interpretability of the factors based on your theory should guide you.
 Silje M. Ormhaug posted on Saturday, March 02, 2013 - 4:24 am

I am trying to conduct a two-level CFA on a data set with two observations per participant (ID) and categorical factor indicators (variables rated on a 4-point Likert scale). When I run the analyses, the output states that the model terminates normally, and there are no further warnings. However, the only model fit indicators I get are the loglikelihood and information criteria, and no CFI, TLI, or RMSEA.

The command I use is this:
TITLE: Two-level factor analysis, categorical indicators
DATA: FILE IS M:\Data.dat;
Y6 Y7 Y8 Y9 Y10 Y11 Y12;
Y6 Y7 Y8 Y9 Y10 Y11 Y12;
MODEL: %BETWEEN%
F1Bet BY Y2 Y4 Y7 Y9 Y11 Y12;
F2Bet BY Y1 Y3 Y5 Y6 Y8 Y10;
%WITHIN%
F1With BY Y2 Y4 Y7 Y9 Y11 Y12;
F2With BY Y1 Y3 Y5 Y6 Y8 Y10;

Is there anything I should ask for in the OUTPUT to receive CFI, TLI etc.?

Thank you very much for your help!
 Linda K. Muthen posted on Saturday, March 02, 2013 - 7:48 am
With categorical dependent variables and maximum likelihood estimation, means, variances, and covariances are not sufficient statistics for model estimation. Because of this, chi-square and related fit statistics are not available.
 Jessica Andrews posted on Wednesday, March 27, 2013 - 4:30 pm
Hello, I was hoping I could get advice on what type of EFA to perform in MPLUS for my dataset.

In our study, participants retrieve several memories and rate them on 13 fixed variables. Since memories are personal, they vary between subjects. We are interested in examining

1) the factor structure underlying the variables characterizing the memories (i.e. within-level EFA)
2) the factor structure at the between-subject level, such that a factor reflects the fact that people who (on average) rate their memories high on one variable also rate their memories as high on another variable.

Factor scores are most meaningful to me on the between-subject level as I would like to correlate them with other measures.

My question is whether it would make sense to perform a two-level EFA (saving factor scores for the between-level), OR to perform two separate EFAs, each with slightly different interpretations: 1) a within-level EFA where each row reflects a single memory and I ignore subjects all-together and 2) a between-level EFA where each row represents the average ratings across all memories from a single subject.

Finally, if I chose the two-level route, will MPLUS accept a datafile where each row reflects a single memory and there are 13 different columns and a separate “SubjectID” column?

Thank you in advance!
 Bengt O. Muthen posted on Wednesday, March 27, 2013 - 5:59 pm
I think it makes sense to do a 2-level factor analysis where the focus is on level 2 factors as you say. Regarding your last question, yes Mplus accepts this format.
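If one also wants the aggregated alternative mentioned in the question (one row per subject, holding that subject's mean rating on each variable), it can be built from the same long-format data. A plain-Python sketch, assuming no missing ratings and data already read as (subject id, list of ratings) pairs; the function name is hypothetical:

```python
from collections import defaultdict


def subject_means(rows, n_items):
    """Average each item's ratings over all memories of a subject.

    rows: iterable of (subject_id, [r1, ..., r_n_items]), one row per memory,
    i.e. the long format read by a two-level run with the subject id as CLUSTER.
    Returns {subject_id: [mean_1, ..., mean_n_items]}.
    """
    sums = defaultdict(lambda: [0.0] * n_items)
    counts = defaultdict(int)
    for sid, ratings in rows:
        for k, r in enumerate(ratings):
            sums[sid][k] += r
        counts[sid] += 1
    return {sid: [s / counts[sid] for s in sums[sid]] for sid in sums}
```

Observed subject means still contain within-subject noise (with equal numbers of memories per subject, their covariance matrix is roughly the between-level covariance plus the within-level covariance divided by the number of memories), which is one reason the two-level model, with its latent between-level part, is usually preferred over an EFA of averaged rows.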
 Jessica Andrews posted on Thursday, March 28, 2013 - 5:20 pm
Hi! Thanks very much for your response. We ran a two-level EFA and the algorithm identified an unrestricted within and 4-factor between as providing the best fit to the data. I want to extract factor scores for the between-level factors (I don’t care about the within) but am having trouble. I gather that saving factor scores is not possible with a two-level EFA, so I ran a two-level CFA and have gotten the following error.

However, we played around with changing the number of factors and removing some of the variables (just to see if the model would run), and we were successful at getting a 3 factor model and producing factor scores.

Do you have any insight into this error? We also tried different start values, to no avail.

Thank you very much!
 Jessica Andrews posted on Thursday, March 28, 2013 - 10:09 pm
I apologize, I forgot to include the error:



 Linda K. Muthen posted on Friday, March 29, 2013 - 5:54 am
Please send the output and your license number to
 burak aydin posted on Friday, November 22, 2013 - 8:58 am
Our data have 130 clusters (N), with approximately 15 kids in each cluster (n), and we are trying to confirm 2 latent factors assessed with Likert-type questions (scale 1-3).
We have 14 indicators for f1 and 30 indicators for f2.

We are not interested in the factor structure at the between level; also, running MCFA is computationally demanding (and we don't want to get into parceling).

We plan to run single level CFA with declared clusters (CLUSTER command).

Do you think this model is sound/valid/ satisfactory to run and report a CFA for clustered data?
 Bengt O. Muthen posted on Sunday, November 24, 2013 - 3:17 pm
I think you are asking about Type = Complex versus Type=Twolevel. The factor model is not "aggregatable" (see the Muthen-Satorra article

Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.)

and may be somewhat distorted by taking the Complex approach relative to the Twolevel approach. But it may be a reasonable approximation.
 burak aydin posted on Monday, November 25, 2013 - 10:12 am
Yes, I am asking Complex vs. Twolevel.
I was encouraged to ask this question because Complex and Twolevel provided consistent findings for a two-level regression model, not in a simulation study but with 25 different outcomes. For a two-level regression I always prefer the two-level approach (this may come down to GEE vs. multilevel).
We also wanted to use two-level approach for CFA, given it was the nature of the data set. But we faced some difficulties, both computational and theoretical.
Thank you very much for your response.
 SY Khan posted on Monday, December 09, 2013 - 6:58 am
Dear Drs. Muthen,

I am working on secondary data in which the independent variables (employed HR practices) are measured at the workplace level (level 2) and are binary. Employees' responses to the HR practices in terms of their job satisfaction, depression, organizational support, etc. are measured at level 1 and are the intervening and outcome variables in my model (ordinal). So the employees are grouped within workplaces and evaluate the effects of their corresponding HR/organizational practices.

The EFA results at level 2 highlight four factors for HR practices, confirmed by CFA at level 2. At level 1, EFA and CFA confirm seven constructs. So, four dimensions of HR practices (level 2) affect employees' perceptions of seven outcome aspects (level 1).

Kindly advise:

1- Because, of the total 11 constructs in the SEM model, four are at level 2 and seven at level 1, is it appropriate for the overall measurement model (CFA) to be evaluated as a two-level CFA?

2- I would like to omit the covariates in the two-level CFA and introduce them later in the regression/path analysis. I have found Example 9.7 in the User's Guide (7.11), but it includes covariates. Can you please recommend a more suitable example and syntax for my case?

Thank you in advance.
 Linda K. Muthen posted on Monday, December 09, 2013 - 11:00 am
1. Yes.
2. Remove the covariates from Example 9.7.
 SY Khan posted on Monday, December 09, 2013 - 3:34 pm
Hi Linda,

Thanks for your prompt reply. I made the following changes to the Example 9.7 syntax:

1- As I have seven factors at level 1 formed by categorical factor indicators, in the WITHIN and BETWEEN parts:


And so on for remaining five factors. (All level 1 indicators)

And so on for remaining five factors. (All level 1 indicators)

2- As I am not interested in covariates, I removed the line

WITHIN=x1 x2;


BUT I get the following FATAL ERROR:

Kindly advise if:

1- Have I altered the syntax correctly?

2- Also, the syntax does not include any of the level-2 HR factors that affect employees' level-1 perceptions of their job satisfaction and six other outcome aspects; i.e., the overall measurement model is incomplete because the four level-2 constructs are not included. How can I include the effects of the level-2 constructs in the model?

Thank you for your time and help
 Linda K. Muthen posted on Tuesday, December 10, 2013 - 11:16 am
Please send your output and license number to
 SY Khan posted on Wednesday, December 11, 2013 - 11:44 am
Dear Dr. Muthen,

Thank you for your email regarding the use of WLSMV estimator for computational ease of the level 2 CFA.

I am still not clear on the link between the level-1 and level-2 factors, though. You recommend that I create a factor at level 2 following Example 9.7 of the User's Guide.

1- My question is: in the %WITHIN% part I have seven factors with categorical indicators at level 1. So in the %BETWEEN% part, should I create one factor with all the level-1 factor indicators (which form the seven factors), OR should I rewrite the seven level-1 factors in the %BETWEEN% part again?

2- What will creating a factor at level 2 in the %BETWEEN% part show?
3- Is it possible to have a diagram of this model?

Thank you for your time and guidance.
 Linda K. Muthen posted on Wednesday, December 11, 2013 - 1:08 pm
%WITHIN%
fw BY y1 y2 y3 y4;
%BETWEEN%
fb BY y1 y2 y3 y4;

fw uses factor indicators that are the within part of y1, y2, y3, and y4. fw cannot be used in the between part of the model.

fb uses factor indicators that are the between part of y1, y2, y3, and y4. This can be used in the between part of the model.
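The latent variable decomposition behind fw and fb can be illustrated by simulation: each observed y is the sum of a cluster-level (between) part and an individual-level (within) part. The plain-Python sketch below uses hypothetical loadings of 1 and made-up variances, and computes simple descriptive statistics rather than Mplus's ML estimates. It shows the pooled within-cluster covariance of y1 and y2 tracking the within-factor variance, while the covariance of the observed cluster means tracks the between-factor variance plus a within-level contribution divided by the cluster size.

```python
import random


def simulate(n_clusters=500, n_per=10, seed=1):
    """Simulate items y1, y2 = between part + within part (hypothetical values).

    Between: one cluster-level factor fb with variance 0.25, loadings 1.
    Within: one individual-level factor fw with variance 1, loadings 1,
    plus independent residuals with variance 0.25.
    """
    rng = random.Random(seed)
    data = []  # rows of (cluster id, y1, y2)
    for j in range(n_clusters):
        fb = rng.gauss(0.0, 0.5)  # cluster-level factor, SD 0.5
        for _ in range(n_per):
            fw = rng.gauss(0.0, 1.0)  # individual-level factor, SD 1
            y1 = fb + fw + rng.gauss(0.0, 0.5)
            y2 = fb + fw + rng.gauss(0.0, 0.5)
            data.append((j, y1, y2))
    return data


def decompose(data):
    """Return (pooled within-cluster covariance, covariance of cluster means) of y1, y2."""
    clusters = {}
    for j, y1, y2 in data:
        clusters.setdefault(j, []).append((y1, y2))
    means = {j: (sum(a for a, _ in rows) / len(rows), sum(b for _, b in rows) / len(rows))
             for j, rows in clusters.items()}
    n, n_clusters = len(data), len(clusters)
    # Within: cross-products of deviations from each person's own cluster mean
    s_within = sum((y1 - means[j][0]) * (y2 - means[j][1])
                   for j, y1, y2 in data) / (n - n_clusters)
    # Between: covariance of the cluster means around the mean of the means
    g1 = sum(m[0] for m in means.values()) / n_clusters
    g2 = sum(m[1] for m in means.values()) / n_clusters
    s_means = sum((m[0] - g1) * (m[1] - g2) for m in means.values()) / (n_clusters - 1)
    return s_within, s_means


sw, sb = decompose(simulate())
# Roughly: sw near 1.0 (within-factor variance);
# sb near 0.25 + 1.0 / 10 = 0.35 (between-factor variance plus within covariance / cluster size)
```

The same y1 and y2 thus carry two distinct covariance structures, which is why fw and fb are separate factors even though they are measured by the same variables.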

A model with a random slope can have a cross-level interaction. See Example 9.2.

Path diagrams are not available for multilevel models.
 SY Khan posted on Wednesday, December 11, 2013 - 3:08 pm
Hi Linda,

Thanks for your prompt reply. Kindly excuse my repetition, as I am getting confused by the same variables y1 y2 y3 y4 appearing in both the between and within parts.

Kindly correct me if I understand incorrectly. Although the within and between parts both have y1 y2 y3 y4, these are different variables. In the within part, these are the factor indicators that form the level-1 factors. In my case, these are the indicators of seven different level-1 factors. So I will have, for example:

fw1 BY y1-y4;
fw2 BY y5-y6;
fw3 BY y7-y9;
fw5 BY y10-y13;
fw6 BY y14-y18;
fw7 BY y19-y24;

But in the between part, do y1 y2 y3 y4 refer to other, level-2 factor indicators that form level-2 factors? In my case, these would be all the variables that form four different factors at level 2, so I would have fb1 fb2 fb3 fb4 at the between level, for example:

fb1 BY y25-y28;
fb2 BY y29-y32;
fb3 BY y33-y36;
fb4 BY y37-y39;

OR

fb BY y1-y24;

Which one at Between level is correct?

Thank you for your continuous help.
 Linda K. Muthen posted on Wednesday, December 11, 2013 - 3:41 pm
Please see Example 9.1 where there is a discussion of the latent variable decomposition of a variable measured on the individual level. This may clarify things for you.

How you specify the model on between is your decision. There are often fewer factors on between than on within. You may also want to see the Topic 7 course video and handout on the website.
 SY Khan posted on Friday, December 13, 2013 - 10:24 am
Dear Dr. Muthen,

As recommended I have watched the short course video on multi-level modelling and found it to be very useful. Thanks for that. However, I still have some problem in successfully running the MCFA.

I ran a two-level CFA with five factors at level 1 and one at level 2 on categorical factor indicators. The analysis ran for approximately 15 hours (I am not sure if this is the norm in these kinds of analyses).

At the end I got the message that




Also, I did not get a chi-square, RMSEA, CFI, or SRMR for the within and between levels, or any standardized loadings/correlations.

I used the following input analysis instructions:



1- Am I missing anything in these commands?
2- Is this run time usual?

Thanks for your help.
 SY Khan posted on Sunday, December 15, 2013 - 5:23 am
Hello Dr. Muthen,

Following up on my query above regarding the MCFA and the standard errors not being computed due to probable model non-identification:

I altered the %BETWEEN% part of the model to include only the level-2 predictor factors and the %WITHIN% part to include only the level-1 factors. The analysis ran for 16-17 hours, but at the end it did not give any output, just the message that the input data terminated normally.

Kindly advise where I am misspecifying the model.

Thank you for your kind co-operation.
 Linda K. Muthen posted on Sunday, December 15, 2013 - 7:21 am
Please send your output and license number to