anonymous posted on Monday, January 22, 2001 - 2:44 pm
Therapists evaluate their supervisor on 43 questions, and we need to determine a stable factor structure that reflects qualities of supervisors. What complicates things is that several therapists share, and thus evaluate, the same supervisor, which creates a multilevel situation. We have been doing exploratory factor analysis with SAS PROC FACTOR, ignoring this multilevel structure, which is of course not completely right. Your 1994 paper (which I know second hand from the book by Heck and Thomas, 2000, which cites your paper heavily as the definitive answer to this problem) proposed decomposing the total covariance into within and between parts. I understand that Mplus can do this. My questions are:
1. Does this strategy apply to our multilevel situation? If yes, can we use SAS PROC FACTOR to analyze the within and between covariance matrices from Mplus? What is the interpretation of the factor structure based on the within vs. the between covariance matrix?
2. What analytic strategy do you recommend to assess the similarity/difference in factor structure generated by the within vs. between analyses?
3. Is there an example on your Mplus support site that matches our problem closely?
Yes, this sounds like it would be suitable for multilevel factor analysis as I described it in the 1994 article. The supervisors form clusters of therapists, with unequal numbers of them within clusters. Mplus offers a couple of different forms of this analysis.
The Type=twolevel analysis draws on a model where there are both between- and within-variation sources. The within-level part of the model describes the factor structure for how the therapists' 43 evaluations covary across therapists. The between-level part of the model describes how the 43 supervisor means covary across supervisors.
The estimator is based on maximum likelihood, where the between and within parts of the model are analyzed simultaneously. The number of factors, loading values, and factor distribution parameters can be different on the between and within levels. Tests of equalities can be made. Exploratory factor analysis (EFA) can be done in this confirmatory framework by placing the m^2 restrictions needed (m being the number of factors) on Lambda and Psi on both levels. Experience shows that the between structure is often different from and simpler than the within structure (see, e.g., the reference in the 1989 Muthen Psychometrika article as referenced on this web site).
A simpler approach is to use the pooled-within sample covariance matrix (scaled to a correlation matrix) for a regular EFA. This gives estimates close to those that are obtained for the within-level parameters in the twolevel analysis described above. Use sample size N - C, where N is the total number of therapists and C is the total number of supervisors.
For the between level parameters, the sample between covariance matrix can be used, although this is not an unbiased estimator of the between covariance matrix; the unbiased estimator can be used instead (for details, see the User's Guide, Technical Appendix 10). Between-level estimates can differ quite a lot from those obtained through the twolevel analysis.
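The two sample matrices mentioned above can be computed by hand. The following is a minimal sketch, not Mplus output: it assumes the usual definitions (pooled-within matrix divided by N - C, between matrix built from cluster means weighted by cluster size and divided by C - 1) and an adjusted between estimator of the form (S_B - S_PW)/s, which is how I read the 1994 article; check it against Technical Appendix 10 before relying on it. The data and all function names are made up for illustration.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def add_outer(acc, d, weight=1.0):
    """Add weight * d d' to the square matrix acc in place."""
    for j in range(len(d)):
        for k in range(len(d)):
            acc[j][k] += weight * d[j] * d[k]


def within_between(clusters):
    """clusters: list of clusters, each a list of observation vectors."""
    p = len(clusters[0][0])
    C = len(clusters)
    N = sum(len(c) for c in clusters)
    grand = mean_vector([row for c in clusters for row in c])
    ssw = [[0.0] * p for _ in range(p)]
    ssb = [[0.0] * p for _ in range(p)]
    for c in clusters:
        m = mean_vector(c)
        for row in c:
            add_outer(ssw, [row[j] - m[j] for j in range(p)])
        add_outer(ssb, [m[j] - grand[j] for j in range(p)], weight=len(c))
    s_pw = [[v / (N - C) for v in row] for row in ssw]   # pooled-within
    s_b = [[v / (C - 1) for v in row] for row in ssb]    # sample between
    # Assumed scaling constant for unequal cluster sizes (equals n if balanced).
    s = (N * N - sum(len(c) ** 2 for c in clusters)) / (N * (C - 1))
    sigma_b = [[(s_b[j][k] - s_pw[j][k]) / s for k in range(p)]
               for j in range(p)]
    return s_pw, s_b, sigma_b, N - C


def to_correlation(cov):
    """Scale a covariance matrix to a correlation matrix."""
    sd = [cov[j][j] ** 0.5 for j in range(len(cov))]
    return [[cov[j][k] / (sd[j] * sd[k]) for k in range(len(cov))]
            for j in range(len(cov))]


# Hypothetical data: 3 supervisors rated by 3, 2 and 4 therapists on 2 items.
clusters = [
    [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]],
    [[4.0, 4.0], [6.0, 7.0]],
    [[7.0, 9.0], [8.0, 8.0], [9.0, 11.0], [10.0, 12.0]],
]
s_pw, s_b, sigma_b, n_eff = within_between(clusters)
r_pw = to_correlation(s_pw)  # input for a regular EFA with sample size n_eff
```

Here `r_pw` with sample size `n_eff` (N - C) is what would go into a conventional EFA program for the within structure.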
The Mplus web site Example section has a similar analysis under Continuous, cont10.
How do you format data to perform multilevel analysis? That is to say, are cluster-level data repeated for each element in the second level, or are they written only once? Gianbattista FLEBUS
Anonymous posted on Wednesday, July 03, 2002 - 10:37 am
Cluster-level data are repeated for each element in the cluster.
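As an illustration of that layout, here is a small sketch that builds one record per individual and copies the cluster-level value onto every record of the cluster. The variable names (`y1`, `y2`, `w`, `clus`) and the values are hypothetical.

```python
# Hypothetical cluster-level data: one value of w per supervisor.
supervisors = {101: {"w": 3.5}, 102: {"w": 1.2}}

# Hypothetical individual-level data: one record per therapist.
therapists = [
    {"clus": 101, "y1": 4, "y2": 5},
    {"clus": 101, "y1": 3, "y2": 4},
    {"clus": 102, "y1": 2, "y2": 2},
    {"clus": 102, "y1": 5, "y2": 3},
    {"clus": 102, "y1": 4, "y2": 4},
]

# One row per therapist; w is repeated for every member of the cluster.
rows = [[t["y1"], t["y2"], supervisors[t["clus"]]["w"], t["clus"]]
        for t in therapists]

# Free-format text lines in the order y1 y2 w clus.
lines = [" ".join(str(v) for v in row) for row in rows]
```

Each line of `lines` is one therapist; the third column is constant within a cluster.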
Anonymous posted on Tuesday, January 21, 2003 - 10:29 pm
Is the quasi-ML fitting function for multilevel factor analysis (MFA) robust to designs that are unbalanced and have a non-normal multivariate distribution? In any case, please supply references. Also, please supply some guidelines and limitations for using the MLM estimator in MFA under non-normal situations (e.g., normalized versions of Mardia's coefficients are greater than 10 for skew and kurtosis).
bmuthen posted on Wednesday, January 22, 2003 - 6:45 pm
Mplus 2.1 has both MUML (limited-information, quasi-ML) and FIML (full-information ML) estimators, each with a non-normality robust version for the standard errors (called MUMLM and MLR, respectively). Two-level MLM in earlier Mplus versions is now called MUMLM. More about these estimators is stated in the Addendum to Mplus Version 2.1 on the Mplus web site. It is my experience that MUML agrees well with FIML also with data that have very different cluster sizes (unbalanced data), and this has also been seen in studies by Joop Hox. MUMLM non-normality robustness for s.e.'s is expected but has not been thoroughly studied nor written about; simulation studies can explore this. MLR non-normality robustness for s.e.'s is expected due to the sandwich formula used. With very skewed data it is often the case that there are also strong floor or ceiling effects, in which case linear models are not suitable, so the s.e. correction does not help.
We have data on teachers evaluating leadership at school. I did a multilevel factor analysis, which gave nice results. Now I want to calculate scores for these underlying concepts on the school level to do some exploratory analysis with pupil data. How do I calculate these scores for the schools?
Are there some good practices that someone could recommend? Do I calculate the pooled mean scores for the individual items on the school level and afterwards sum these scores, weighted by the inverse of the error variance? Are there better ways of doing this?
bmuthen posted on Monday, February 23, 2004 - 8:34 am
Mplus prints out the estimated factor scores for factors on the between level.
Inserting the following line: SAVE=FSCORES results only in the original scores. In your User's Guide you mention that this option is not available for TYPE=TWOLEVEL. Neither is the option FSCOEFFICIENTS available in this type of analysis. Am I making a mistake?
Similar to the therapist example above, I have personality items whose factor structure I want to examine. The respondents are grouped within families (actually twin pairs).
Unlike the cont10 example, my items have only 2 levels and need categorical methods.
I do not have any predictors for the family/twin level, and am not clear about the need/interpretation of the within and between factors.
Also, I wanted to do a multi-group analysis to look at age groupings. I am running into trouble because the TWOLEVEL option forces an ML estimator (or MLR) with numerical integration, and this is incompatible with the THETA parameterization and MGROUP option I am using.
I have evidence for strong age invariance, and could collapse into one group and use DELTA (because I will no longer test strict invariance, or even invariance), but would prefer to keep the multiple groups if possible.
The error messages said something about using MIXTURE and KNOWNCLASS, but I am unsure of the implications of this. Can you elaborate or provide a reference?
Given the basic problem of multigroup (age or gender) factor analysis with multilevel (family) data, what suggestions would you offer for me to pursue? Any paper or example that had similar issues would be great. Sorry to make such a sweeping request.
I have reviewed the paper and am uncertain about its applicability.
Recall that my goal is to handle the dependencies due to twins in a multi-group factor analysis with binary indicators.
In the paper, family was the unit of analysis and each sibling was allowed to have their own growth parameters. If I follow your suggestion, twin pair would become the unit of analysis, and I would estimate a common factor for each side of the twinship.
I think I would need to ensure that the same common factor was produced on each side of the twinship. I'm not sure if this would be by constraining loadings or by directly constraining the factor mean and variance. At this point, I am not sure that I haven't undone any "handling of the dependency" that was addressed by the multivariate approach.
Please, could you comment on this?
Also, I have been thinking about TYPE=MIXTURE using KNOWNCLASS to indicate my two or three groupings. In this scenario, could I also use TWOLEVEL clustered on twin?
Since it is arbitrary which twin is placed on each side of the twinship, I am wondering if there is a Monte Carlo scenario where I could get Mplus to repeatedly randomly divide the twins, estimate the model, and present me with averages.
To ensure that the same factor operates for both twins, your multivariate model for the joint analysis of the twins should have one factor for each twin where the factor correlation (assuming factor variances are fixed at one to set the metric) is fixed at one and the factor loadings are held equal.
If you go to MIXTURE TWOLEVEL you will have the same numerical integration issue you started with.
The twins are not mixed together in the analysis. If you have two factors as described above, a random arrangement is not an issue.
Sungworn posted on Tuesday, June 14, 2005 - 2:13 pm
I am wondering if Mplus can do multilevel factor analysis of dichotomous data (i.e., achievement test data where 1=right, and 0=wrong)? Thanks.
bmuthen posted on Wednesday, June 15, 2005 - 7:43 am
Yes, this can be done using ML.
Sungworn posted on Thursday, June 16, 2005 - 1:04 pm
Are you familiar with NOHARM, a computer software for multidimensional IRT? If so, do Mplus and NOHARM yield the same results in terms of factor loadings and thresholds?
I'm wondering if I can conduct the following analysis in Mplus. I modified examples 9.9 and 9.10 from the Mplus Version 3 User's Guide, pages 205-207. I have 31 clusters. Would that be a large enough number of clusters?
TITLE: this is an example of two-level CFA with continuous factor indicators, covariates, and random slopes
DATA: FILE IS ex9.9.dat;
VARIABLE: NAMES ARE y1-y4 x1-x4 w clus;
  CLUSTER = clus;
  BETWEEN = w;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
  ALGORITHM = INTEGRATION;
  INTEGRATION = 10;
MODEL:
  %WITHIN%
  fw1 BY y1-y4;
  fw2 BY x1-x4;
  s | fw1 ON fw2;
  %BETWEEN%
  fb BY y1-y4;
  y1-y4@0;
  fb s ON w;
Thanks a lot,
bmuthen posted on Thursday, November 17, 2005 - 5:15 am
See answer to the same question under SEM.
Marco posted on Monday, December 19, 2005 - 1:21 pm
Hello Linda, hello Bengt,
I sometimes find that an MFA with estimator=MLR yields an undefined scaling factor. Judging from the preliminary steps (from Muthén, 1994), the chi²-statistic and the fit indices seem to be OK. So, what is the meaning of, or reason for, an undefined scaling factor? Is there a way to conduct a chi²-difference test with these results?
Many thanks! Btw, is it possible to see somewhere on the homepage what exactly has been updated? That would be a good idea, since the homepage contains so much important information.
bmuthen posted on Tuesday, December 20, 2005 - 5:47 am
The scaling correction factor comes out negative, i.e., the estimation gives a poor approximation to the chi2 asymptotic distribution. Wald testing is an alternative, but is not easy to do by hand; will be available in future Mplus.
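For readers wondering what a hand-computed chi²-difference test with scaled statistics looks like when the scaling factors are available, here is a minimal sketch. It uses the commonly documented Satorra-Bentler style hand calculation for MLM/MLR chi-squares; this formula is not stated in the thread, the input numbers are invented, and the function name is hypothetical. It also shows how the difference-test scaling correction can itself come out non-positive, in which case no test value is returned.

```python
def scaled_difference(t0, c0, d0, t1, c1, d1):
    """Scaled chi-square difference test, computed by hand.

    Model 0 is the nested (more restrictive) model, model 1 the comparison
    model. t = reported scaled chi-square, c = scaling correction factor,
    d = degrees of freedom. Returns (cd, trd); trd is None when the
    difference-test scaling correction cd is not positive.
    """
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    if cd <= 0:
        return cd, None          # the test statistic is not defined
    trd = (t0 * c0 - t1 * c1) / cd
    return cd, trd


# Invented example values, for illustration only.
cd_ok, trd_ok = scaled_difference(120.0, 1.20, 50, 100.0, 1.25, 45)
cd_bad, trd_bad = scaled_difference(120.0, 1.10, 50, 100.0, 1.30, 45)
```

In the first call the statistic would be referred to a chi-square distribution with d0 - d1 = 5 degrees of freedom; in the second call the correction is negative and the calculation stops.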
Marco posted on Tuesday, December 20, 2005 - 5:55 am
I guess that the "poor approximation" refers to the estimation of the scaling factor. Does this imply that the chi²-statistic itself is unreliable?
bmuthen posted on Tuesday, December 20, 2005 - 6:09 am
Marco posted on Tuesday, December 27, 2005 - 3:20 pm
Based on my limited trials, I found an undefined scaling factor only in models where an indicator is specified as within (despite having little between variance). The scaling factor becomes positive after eliminating the indicator entirely from the analysis or allowing the indicator to vary within and between. Is this data-specific or generally expected? Thanks!
After watching the latest training on multilevel analysis and reading the Grilli paper on multilevel factor analysis with ordinal variables, I have several questions regarding multilevel CFA in MPLUS.
1) Are there any other examples of papers that discuss the interpretation of the MPLUS output for multilevel CFAs with categorical variables?
2) Dr. Muthen said that the categorical multilevel CFA is essentially a 2-parameter IRT model. Is this still the case when the model doesn't have random slopes or does it then become a Rasch model?
3) Are the factor variances at the within and between levels directly interpretable in the case of categorical CFA and can I use it to calculate an ICC?
4) what do the thresholds mean in the case of the categorical CFA output?
1) There are papers on multilevel CFA, but not discussing the Mplus output per se as far as I know. An early paper with continuous outcomes is:
Muthén, B. (1991). Multilevel factor analysis of class and student achievement components. Journal of Educational Measurement, 28, 338-354. (#37)
2) Rasch has the same slope for all items. If the slope is random or not (cluster level variation or not) is another matter. One can have a Rasch or not a Rasch model and have fixed or random slopes.
3) For that you need to hold the loadings invariant across the two levels and that often does not fit as well as letting them be different.
4) With binary outcomes they are the same as the negative of the intercepts. For translations between Mplus and IRT parameterizations, see our Short Course handout from Day 3 which can be requested off the web.
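A small numerical check of point 4 may help. The sketch below assumes a logit link and a factor with mean 0 and variance 1; under those assumptions the threshold is the negative of the intercept, and the loading and threshold map to a 2PL discrimination a and difficulty b as shown. The conversion is my own algebra under these assumptions, not a quotation from the Short Course handout, and the parameter values are invented.

```python
import math


def prob_threshold_form(f, loading, threshold):
    """P(u = 1 | f) written with a threshold: logit = -threshold + loading * f."""
    return 1.0 / (1.0 + math.exp(threshold - loading * f))


def prob_intercept_form(f, loading, intercept):
    """Same probability written with an intercept: logit = intercept + loading * f."""
    return 1.0 / (1.0 + math.exp(-(intercept + loading * f)))


def to_irt(loading, threshold):
    """2PL discrimination and difficulty, assuming factor mean 0, variance 1."""
    return loading, threshold / loading


def prob_irt_form(theta, a, b):
    """2PL form: logit = a * (theta - b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


loading, threshold = 1.4, 0.7          # invented values
a, b = to_irt(loading, threshold)
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
p_thr = [prob_threshold_form(f, loading, threshold) for f in grid]
p_int = [prob_intercept_form(f, loading, -threshold) for f in grid]
p_irt = [prob_irt_form(f, a, b) for f in grid]
```

All three lists contain the same probabilities, which is the sense in which the parameterizations are translations of one another.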
Thank you very much for your reply to my questions. I have ordered the handouts from the short course for lectures 3 and 5. In the meantime however, could you tell me how I can calculate the ICC from the output obtained from a categorical multilevel CFA? I cannot hold the loadings invariant across the two levels because I have 2 factors at level 1 and 1 factor at level 2.
ICC is a concept for a continuous variable where the variance is a freely estimated parameter. This is not the case with a categorical variable because you don't estimate a free variance parameter for the dependent variable (the mean p and the variance p(1-p) are mathematically linked). You can talk about an ICC for a factor as I did in the article I mentioned, but for that you need loadings that are invariant across levels. So I don't see how you can meaningfully compute an ICC here. On the other hand, I don't see the need for it either because the estimated model has all the information you need: the amount of between-cluster variation tells you how much 2-level modeling is needed.
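Two small calculations illustrate the points in this reply. The first shows that for a binary variable the variance is fully determined by the mean. The second shows the variance-ratio form of an intraclass correlation for a factor, which is only meaningful when the loadings are held equal across levels so that both factor variances are in the same metric. The variance values are invented and the function names are hypothetical.

```python
def binary_mean_and_variance(values):
    """Mean p and population variance of 0/1 data; the variance equals p(1 - p)."""
    n = len(values)
    p = sum(values) / n
    var = sum((v - p) ** 2 for v in values) / n
    return p, var


def factor_icc(psi_between, psi_within):
    """Share of factor variance at the between level.

    Assumes loadings are invariant across the within and between levels.
    """
    return psi_between / (psi_between + psi_within)


p, var = binary_mean_and_variance([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
icc = factor_icc(psi_between=0.2, psi_within=0.8)  # invented variances
```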
Thank you for your reply. Is there a way, given a multilevel factor analysis with different loadings at the two levels and categorical variables, to calculate the level 2 reliability coefficient from the MPLUS output? For example, as proposed by Raudenbush in some of his papers on three-level logistic Rasch measurement models?
You may find it useful to study the paper posted on our web site under Recent Papers:
Grilli, L. & Rampichini, C. (2004). Multilevel factor models for ordinal variables. Submitted for publication.
This gives details in an Mplus framework.
1) In two-level modeling, means/intercepts/thresholds are given on level 2 (see 2-level linear regression in the Raudenbush-Bryk book as an example). It sounds like you are using a model that has different loadings for the 2 levels. I don't know that multilevel IRT models have addressed the issue of differing loadings (discriminations) on the different levels. If you compare to the Raudenbush et al (2003) article in Soc Meth, eqn (2) is written in classic IRT form, but when adding the page 183 multilevel features, it looks like the discrimination parameter lambda has been dropped. And more to the point, I don't see why it is necessary to connect to IRT - all you are interested in is being able to plot your item characteristic curve as a function of within and between variation in the ability. You can do that straight from the Mplus model (again see Grilli & Rampichini). But if you show me a multilevel IRT model with different loadings, I will make the translation.
2) Regarding within- and between-level reliability, I still have to read up on the references you gave me to answer this. These are not quantities I am used to looking at (I think).
3) A two-level Rasch model in line with the Raudenbush article would seem to be easy to specify in Mplus using ML logit. You simply set loadings equal across items and across levels.
Keep me informed about your progress. And I will try to find time to read those 2 articles you suggested (my reading stack is just a little high right now...).
Thank you very much for your reply and for taking the time to keep answering my many questions!
1) I did read the Grilli and Rampichini paper, but I would like to make a statement about the overall precision, or reliability, of the scale at the individual and neighborhood levels, and this paper only proposes formulas to calculate communalities at the item levels. That's why I thought I could at least calculate the scale information from the transformed IRT parameters...
2) Am I right in assuming that if one has two factors at the subject level and one factor at the neighborhood level, one should not constrain the items to have equal loadings at the two levels?
In terms of "scale information", I mean the sum of item information for the items in a scale (information provided by specific response category times probability of respondent with trait level x choosing gth response category), assessed across the range of the underlying latent construct.
In essence however, I would just like to get a measure of the precision of the scale and be able to make a statement about its measurement quality at the subject and neighborhood levels.
Thank you for all your help and time with this problem--I very much appreciate it!
When you describe what you mean by "scale information", it sounds like what IRT people call "information function". That is, the standard errors for the factor score estimates expressed as a function of the true ability. See for example the 1985 Hambleton-Swaminathan book, chapter 6. Do you agree? If so, Mplus has not yet implemented this, but we will do so, also for multilevel models.
The reason calculating precision/reliability for a scale (= a latent variable construct, estimated as factor scores) is not included yet in Mplus is that this is not typically central to latent variable modeling in the following sense. You have a measurement model with 2 within-level constructs, one for 5 and one for 7 categorical items. The measurement modeling is typically not an aim in itself, but is related to other variables, either predictors or consequences. Those other variables can be brought together with the measurement model to create a structural equation model that is estimated in a single step. The precision/reliability aspect of the measurement model then translates to how well you can estimate structural regression slopes and that is assessed by their SEs. Few research questions need to be approached in a 2-step fashion (measurement model producing a scale, scale used for some purpose). - What kind of use of your measurement model do you have in mind?
I would like to make a statement about the quality of a measure across two different sites. So one of the things I would like to compare is the level of reliability of a measure of the latent construct in the two sites--i.e. can the same construct be measured with comparable reliability in the two sites? That's why I wanted to compute the reliability. Maybe there's a way to do it manually, as proposed by Raudenbush in those articles I cited.
Also, I have another question. Is there a way that MPLUS has of doing differential item functioning for multilevel structures? I would like to compare measurement equivalence across the two sites for this scale as well, but since the items are dichotomous, I can't use multiple group cfa.
Regarding comparing reliability of the measure across the 2 sites, what I was referring to would amount to using a 2-group latent variable analysis instead of estimating factor scores and comparing them across groups. In the 2-group analysis, you can test group invariance of the item parameters directly. The reliability of the factor score estimates that you are referring to will not be very high with only 5-7 categorical items. In contrast, the 2-group latent variable analysis comparing say the means of the latent variable across sites (assuming measurement invariance), can give good power/precision in the estimation.
I guess the above answers your second question as well. You can do 2-group analysis for categorical items. In the ML estimation framework you would do that using Type = mixture and the Knownclass option to capture the 2 groups.
Using STREAMS with Mplus, I performed a twolevel analysis. To obtain start values, I first performed a regular confirmatory factor analysis on the total covariance matrix. This led to a model with six latent variables and covariances among most of them. The fit of this model is RMSEA = .061 and chi-square/df = 2.18. Although the RMSEA suggests some room to adjust the model, the chi-square/df ratio indicates good model fit. Since further adjustments would make the model difficult to interpret, I decided to use this model as the starting point for the within level of the twolevel analysis. At first I assumed no between structure, allowing the manifest variables at the between level to covary freely. However, running this twolevel analysis, I encountered several problems.
- Sometimes I received the message: 'Estimated between covariance matrix is not positive definite as it should be. Computation could not be completed. Model estimation did not terminate normally. Change model and/or starting values.' I tried some small changes, such as allowing covariances among all latent variables, but that did not help.
- At other times I received an internal error code (GH1006), or even a fatal error code indicating that there is not enough memory space to run the program on the current input file. However, at other times the model did run (with the message above).
Trying to specify the between-level structure also led to the internal error code or the fatal error code. Given these problems, I was wondering whether they are linked to the poorer fit of my model for the total covariance matrix, though this is only indicated by the RMSEA. If not, what are your suggestions to overcome these problems? If necessary I can send in my input file and data, as asked for in the internal error code GH1006.
I have a couple of follow-up questions to the issue that Magdalena Cerda raised. I am also interested in calculating the level-2 test information function for a model with many level 2 units. I realize that calculating level 2 precision estimates of factor scores is often "unnecessary" in latent variable modeling since the goal is to keep the measurement and structural components in a single model rather than taking out the factor scores to use in a path analysis. However, there are many instances in educational research where test scores are used in a multilevel analysis and are not treated as latent variables. The current state of educational research is that test scores are much more widely available as scaled scores than as raw data. When student test scores (scaled theta estimates computed from an IRT model, so equivalent to factor scores from a categorical FA) are used in multilevel models where the effect of interest is at level 2 (such as looking at an effect of a teacher intervention on student achievement where teachers are the unit of assignment), it seems that it would be important to know the reliability of the latent mean achievement at the teacher level. It is possible that the test information function at level 2 may be very different from the test information function at level 1, which would suggest that the same measures may not be appropriate for inferences that involve students and teachers.
I realize that Mplus does not currently plot the IRT information function at level 2 or save the standard errors of factor scores at level 2. However, if Mplus can be used to obtain level 2 loadings and thresholds for binary indicators, then can't I use this information myself to compute a test information function at level 2?
Mplus does give information curves for level 2 latent variables. If you don't get them, there might be another reason for it. You might want to send your input, output, data, and license number to firstname.lastname@example.org
Hi, Dr. Muthen: I simulated a 2-level CFA model in which the within and between levels have equivalent structures: 2 factors, each measured by 3 indicators. The output showed an error message indicating that my between-level covariance estimate is not positive definite. It says: THE RESIDUAL CORRELATION BETWEEN Factor2 AND Factor1 IS 1.000. Does that mean the two between-level factors are highly correlated? What does it mean, and how could I correct starting values such as factor variances?
It is likely that in MODEL MONTECARLO you specify a high correlation between factor2 and factor1 and that in one random draw, the correlation becomes one. If you want more information on this, send your input, output, and license number to email@example.com.
I estimated a two-level factor analysis, with two factors at the respondent level (4 items each) and one factor at the neighborhood level (8 items), and an n=2494, nested within 166 neighborhoods.
I get very different results for the neighborhood-level factor loadings and the neighborhood-level reliability (using the information curves), depending on the scale item I decide to fix to 1 in the neighborhood-level factor and one of the respondent-level factors. If I select one particular item, the reliability is low (highest information statistic about 0.8) but the item loadings are significant. If I select another item, the reliability becomes sky-high (an information statistic of about 100), but the factor loadings for many of the items at the neighborhood level become insignificant. Changing the reference item doesn't have an impact on factor loadings or reliability at the respondent level .
Different items chosen to set the metric of the factor give different results for significance of loadings and for information functions because the factor is expressed in a different metric. For loadings, this is in line with standardized coefficients not being significant at the same time as the raw coefficients. The information curves will differ in the two runs, but they will be proportional by lambda^2; see formula (8) in
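The proportionality can be checked numerically. The sketch below is an illustration under assumptions, not the formula (8) referred to above (the citation is cut off in this thread): it takes binary 2PL items, rescales the factor by a constant c (as happens when a different reference item sets the metric), and compares the test information in the two metrics. All parameter values are invented.

```python
import math


def item_information(theta, a, b):
    """Information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def rescale(a, b, c):
    """Item parameters after the factor is re-expressed as theta* = c * theta."""
    return a / c, b * c


items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7)]  # invented (a, b) pairs
c = 0.4                                        # invented rescaling constant
theta = 0.3

info_old = sum(item_information(theta, a, b) for a, b in items)
info_new = sum(item_information(c * theta, *rescale(a, b, c)) for a, b in items)
ratio = info_new / info_old                    # equals 1 / c**2
```

The response probabilities are unchanged by the rescaling; only the unit of the factor changes, so the information at corresponding points differs by the squared scale factor.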
I have estimated a multilevel confirmatory factor analysis with one factor at each of two levels, and covariates at the two levels as well. One of the neighborhood covariates consists of a set of dummy variables for different neighborhood "types" (poor and cohesive vs poor and non-cohesive, for example). I would like to construct a bar graph showing the estimated level of the latent outcome (perceived violence) for the different neighborhood types, with average values for all other covariates.
The question is, how does one obtain this in a two-level factor model that has covariates at both levels? Does one just use the beta estimates at the between level to estimate a prediction, as one would in a single-level model, or does one also use thresholds at the between level, and does one also need to use anything at the within level? The concern arises particularly since there is no one intercept in the model, but several...
Dear authors, I'm carrying out a multilevel factor analysis for ordinal variables.
As suggested by Grilli and Rampichini (2007), I want to carry out a separate EFA on the estimated between and within correlation matrices of the latent responses. "The decomposition of the latent response correlation matrix into the between and within components can be obtained by means of a multivariate two-level ordinal model with unconstrained covariance structure."
How is it possible to obtain this decomposition with Mplus?
Yes, see the SAVEDATA command in the Mplus User's Guide.
elisa posted on Thursday, March 08, 2007 - 4:38 am
Dear Dr. Muthen, I'm working on MFA following the strategy suggested by Muthen(1994) and Grilli(2007). I have 12 indicators and I want to estimate the between and within covariance matrix. I referred mainly to the Mplus web site Example, cont10.
USEVARIABLES ARE ...;
MISSING ARE all(999);
cluster = cdl;
ANALYSIS: TYPE = TWOLEVEL;
  estimator = ml;
model:
  %within%
  fW BY giud RAPCOL RAPDOC RAPNDOC RAPRELA RAPSTUD STRAULE STRBLB STRLAB indietr R1452 R1482;
  %between%
  fB BY giud RAPCOL RAPDOC RAPNDOC RAPRELA RAPSTUD STRAULE STRBLB STRLAB indietr R1452 R1482;
OUTPUT: SAMPSTAT STANDARDIZED;
SAVEDATA: SIGB IS "d:\...\SIGB.txt";
  SAMPLE IS "d:\...\SAMPLE.txt";
The output gave me the estimated sample statistics for between and within, but it says: THE ESTIMATED BETWEEN COVARIANCE MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE. COMPUTATION COULD NOT BE COMPLETED. PROBLEM INVOLVING VARIABLE R1482. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES
What do I have to do? And another question: the values in the SIGB file I saved are different from ESTIMATED SAMPLE STATISTICS FOR BETWEEN. Why? Which values do I have to use in order to carry out an EFA on the between covariance matrix? Many thanks, best regards Roberta
This message usually points to a problem of zero variance on the between level. You can fix the variance to zero. If this is not the issue, please send the input, data, output, and your license number to firstname.lastname@example.org.
I would need more information to answer your other question. Please send the input, data, output, and your license number to email@example.com.
Dear Mrs. Muthen, another question. I have 12 indicators and I want to estimate the between and within covariance matrices in order to follow the strategy of Muthen (1994) and Grilli (2007) for the subsequent analysis. Snijders and Bosker (1999) suggest fitting a multilevel multivariate empty model, but I think this is not possible in Mplus. So I have to use a TWOLEVEL analysis to obtain the estimates of the between and within covariance matrices. But at each level, do I have to "create" as many factors as there are observed variables, or do I have to "create" just one factor? Are there some references on this topic with Mplus examples?
And when I use the SAVEDATA command, do I obtain SIGMAB and SIGMAW, or the pooled SIGMAW? After obtaining this matrix, may I use an EFA? So, ANALYSIS: TYPE = EFA 2 4; MATRIX = COVARIANCE;
Thanks a lot, sincerely Elisa
elisa posted on Wednesday, March 14, 2007 - 8:40 am
Dear authors, I have 12 indicators and I want to estimate the between and within covariance matrix in order to follow the strategy of Muthen(1994) and Grilli(2007) for the subsequent analysis.
After saving the estimated between and within covariance matrix through SAVEDATA command, how may I use it for an EFA and a CFA? If I use:
VARIABLE: NAMES ARE item1-item12; USEVARIABLES ARE item1-item12; ANALYSIS: MATRIX=COVARIANCE;
Mplus does not read data in the correct way. How can I solve the problem?
Example 12.1 shows how to read a covariance matrix as data. The MATRIX option of the ANALYSIS command should not be used. If you continue to have problems, please send your input, data, output, and license number to firstname.lastname@example.org.
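As a sketch of preparing such a file, the following writes the lower triangle of a covariance matrix row by row in free format, which is the layout I understand Mplus to expect for summary data; the matching DATA command settings (such as TYPE and NOBSERVATIONS) should be taken from the User's Guide example cited above rather than from this sketch. The matrix values are invented and the function name is hypothetical.

```python
def lower_triangle_lines(cov, fmt="{:.6f}"):
    """Return text lines holding the lower triangle of cov, one row per line."""
    lines = []
    for j, row in enumerate(cov):
        lines.append(" ".join(fmt.format(row[k]) for k in range(j + 1)))
    return lines


# Invented 3 x 3 between-level covariance matrix.
cov = [
    [1.5, 0.4, 0.2],
    [0.4, 2.0, 0.3],
    [0.2, 0.3, 1.0],
]
lines = lower_triangle_lines(cov)
text = "\n".join(lines) + "\n"   # contents to save as the summary data file
```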
Hello, I am working with item level data for a 3-item scale (y1-y3) collected on 150 people (id) once a day for 14 days (day). I would like to break down the variance components for the scale in order to calculate various scale reliability estimates (e.g., within person reliability across days). In order to do this, I would like to calculate variance components for person, day, item, person*day, person*item, day*item, and error. I was wondering if someone might be able to help me with the Mplus syntax to get these estimates?
I'm working on a multilevel CFA. The Mplus 5.1 demo does not display chi-square or modification indices. How can I solve the problem?
Result: THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS -0.668D-16.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 11.
I have done a longitudinal content analysis of 128 websites of candidates for President. I have 81 variables, most of which are dichotomous. Overall, 20 candidates' websites were coded in 11 observations, but not all of them were coded in all observations: websites were taken in and out of the target population based on the candidates' decisions to enter or quit the race. Thus, a candidate that entered early and stayed throughout the race would have 11 observations, while one that entered early but dropped out quickly would have, say, 4 observations, and so on. I would like to know whether multilevel factor analysis would be suitable to analyze my data. I am concerned that the sample size is too small, especially considering that both between- and within-group analyses need to be conducted. Thanks a lot for your help.
It sounds like you have a sample of 20 subjects, which is quite low. But I am not sure what the difference is between your "81 variables" and your "11 observations". Perhaps an "observation" is a time point at which you observe several variables. If you have 20 subjects observed at 11 time points (some subjects have fewer time points) you have longitudinal data and that means your information is increased substantially. See for example discussion of growth modeling sample size in
Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620.
With only 20 subjects, however, you can do only very limited "level-2" (person-level) modeling.
robertav posted on Thursday, October 08, 2009 - 7:13 am
Dear authors, I'm applying a multilevel factor model to my data.
Mplus output gives me some relative goodness of fit measures (BIC, AIC, etc). I would like to know if there are some references about some absolute goodness of fit measures that have to be used in multilevel factor models.
Working on a paper on intergenerational relations, I am analyzing the following three-level data structure: Respondents (level 2) rate several aspects (items) of their relations (level 1) with 6 different kin. Additionally, the study is cross-national in that respondents are nested within 11 countries (level 3). I am currently working on the measurement models (CFAs) which, if successful, are to be extended to full multilevel SEMs. I planned on modeling the first two levels explicitly while correcting standard errors for the clustering by country (using TYPE=TWOLEVEL COMPLEX). From my understanding, 11 level-3 units are not enough to be modeled as a third level anyway. However, in this analysis Mplus gives implausible fit indices (a CFI of 0.0 and even a negative TLI). As soon as I take the country cluster variable out, results become plausible (CFI=.92, TLI=.85).
1. Do you have any suggestion why this occurs or what I can do about it? 2. In my case, would you recommend a multigroup approach to establish measurement invariance across countries instead of simply correcting the standard errors by country clustering? 3. Do you know of any literature on clustered or multigroup two-level CFA?
11 countries is typically not sufficient for Type=Complex. At least 20 are needed. So I would switch from random country effects to fixed ones, using country dummies as covariates. Or use country as a grouping variable as you say.
I wonder if you need multilevel modeling - couldn't the different items for the different kins simply be a multivariate observation vector where the items and the kins are correlated via the model?
Thanks a lot for your interesting suggestions. Some new questions arise: 1. I wonder what the suggested multivariate approach would imply for my CFA. For every kin, there would be as many factors as there are constructs, having measurement error variances per item correlate between kins, is that right? 2. I also wonder how one would then build in covariates in an SEM extension measured on different levels (e.g., a dummy for relation with same-gender kin (level 1) and respondent's education (level 2))? Via 2nd order factors (i.e., relational aspects across kin)? I am a bit confused here, would appreciate any suggestion. 3. If I use fixed country dummies, are standard errors for lower-level covariates (e.g., respondent's education) estimated correctly? 4. Finally, do you have any references on the equivalence (or similarity) of the multilevel and the multivariate approach to hierarchical data?
Thank you so much in advance, best regards, Oliver
Multilevel CFA and sample size: I am developing a group-level scale, and need to know the minimum number of individuals and groups required to run a multilevel EFA (and CFA). In the final scale, I expect about 20-25 items and 3 sub-scales. Would you recommend running a Monte Carlo simulation to find (approximately) the required sample size, and which references would be useful for that? Thanks a lot. PS: I have (Muthén, 1994), (Muthén & Muthén, 2002) and the doc "v5.1 ex. Addendum".
Hi, I have a follow-up question concerning my multilevel analysis of perceptions of relationships (level 1) nested within respondents (level 2) nested within 11 countries. Concerning your suggestion to include fixed country dummies, does this actually solve the problem of biased standard errors due to country-clustering? I noticed that adding a cluster command for countries (despite concerns about the small number of countries) does affect standard errors considerably even if country dummies are already in the model. Do I also get correct standard errors if I include country-level aggregate covariates (e.g., gross national income) instead of country dummies?
Using dummy variables will take care of some but not all of the non-independence of observations. Including country-level covariates can also help. The standard errors are not trustworthy using TYPE=COMPLEX with only 11 countries.
I have run a multilevel factor analysis using this code: MODEL: %WITHIN% LNw BY RTw Logitw; LNp BY RTp Logitp; s | Logitp ON LNw; %BETWEEN% s; My question is whether the factor scores are also influenced by the s | part, or whether they are determined only by the first two lines?
Factor scores are estimated using the entire estimated model.
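For reference, factor scores are requested in the SAVEDATA command. A minimal sketch, with a hypothetical output file name; since the scores come from the whole estimated model, the random slope part of the model feeds into them as well.

```
SAVEDATA: FILE IS fscores.dat;   ! hypothetical output file name
          SAVE = FSCORES;        ! scores are based on the full estimated model
```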
Teresa Dubb posted on Tuesday, March 29, 2011 - 6:01 pm
I have an experiment where 50 participants are assigned into two conditions (good/bad) and in each of the conditions, each participant is presented with two different alternatives (A/B) and they are asked to provide their judgments on both alternatives A and B. I understand that the judgments are clustered by participants but I can't quite figure out how to use TWOLEVEL EFA to perform an exploratory factor analysis with CLUSTER = Participant. Below is an example of the data where X1, X2, and X3 are judgments on different aspects of the alternatives. Thanks very much.
Participant Condition Alternative X1 X2 X3 ...
1 good A 10 12 15 ...
1 good B 30 25 38 ...
2 good A 20 22 33 ...
2 good B 60 35 50 ...
3 bad A 30 30 40 ...
3 bad B 20 40 50 ...
If the alternatives aren't randomly equivalent it seems like you might want to spread their judgements in a wide, multivariate fashion instead of doing a twolevel model. The data would then look like:
1 good 10 12 15 ... 30 25 38 ...
2 good 20 ... etc.
You handle this by "longitudinal factor analysis" which can be done in an EFA framework using "ESEM" - see exploratory structural equation modeling in the UG index. You can then check if the judgement factors are the same for A and B.
Hello, I am trying to run a two-level SEM (with a cluster level) with continuous factor indicators (latent factors 1-2), categorical factor indicators (latent factors 3-5) and 2 observed covariates on the within level. Additionally, I have 2 observed covariates on the between level. I wonder which estimator is appropriate in this analysis, WLSMV or MLR? Thanks for your advice, Sofie
If you have three factors with categorical indicators, that will require three dimensions of integration, which is computationally demanding and would suggest using WLSM or WLSMV. If you have a lot of missing data, you may want to use MLR, or multiple imputation followed by WLSM or WLSMV.
I am trying to run a relatively simple latent variable model using a complex data set (TIMSS). I was under the impression that the most recent version of Mplus had the capacity to apply SEM analysis to complex data, accommodating the weighting variable and the replicate weighting variable. However, I am getting an error suggesting otherwise:
EFA factors are not allowed with replicate weights. EFA factors are declared with (*label).
Can you not use the complex data options with LVM modeling in version 6.1?
Thank you in advance for your time!
Below is my syntax.
VARIABLE: NAMES ARE BS4GSEX BS4GBOOK TOTWGT JKZONE JKREP USBSGQ6A USBSGQ6B PATM VALUE SCONF APP01 KNO01 REA01; USEVARIABLES ARE BS4GBOOK TOTWGT JKREP USBSGQ6A USBSGQ6B PATM VALUE SCONF APP01 KNO01 REA01; MISSING ARE ALL (9); CATEGORICAL ARE BS4GBOOK USBSGQ6A-SCONF; WEIGHT IS TOTWGT; REPWEIGHTS = JKREP;
MODEL: F1 BY BS4GBOOK USBSGQ6A USBSGQ6B (*1); F2 BY PATM VALUE SCONF; F3 BY APP01 KNO01 REA01; F3 ON F1-F2;
ANALYSIS: TYPE = COMPLEX; ESTIMATOR IS WLS; REPSE = BRR; ITERATIONS = 1000; CONVERGENCE = 0.00005;
I have a 3-level nested cross-sectional dataset where the outcome variable is binary. The level-1 data is on patients’ characteristics, the level-2 data is on Physicians’ characteristics, and the level-3 data is on clinics characteristics.
Patients’ characteristics data include bunch of observed variables where some variables are continuous, some are binary (and ordinal), and the remaining are count variables. Physicians’ characteristics data also include a mixture of continuous, binary, and count variables. Clinics’ characteristics data only includes one binary variable.
Given the mix of exogenous variable types (continuous, binary, and count), and binary outcome variable, can Mplus handle this as a 3-level factor model? Furthermore, I was thinking that before going ahead with 3-level model, maybe I should first do EFA on just the patient-level data so as to reduce the number of predictors to a smaller but representative set of composites. Later, I can use those composites to define level-1 model in the 3-level factor model. Do you think it is okay to do that?
For cross-sectional models, Mplus currently handles two levels. You can use TYPE=COMPLEX TWOLEVEL for three-level data, where the standard errors are computed taking the third-level clustering into account.
A follow-up question. I was thinking that before going ahead with 3-level model, I should first do EFA on just the patient-level data so as to reduce the number of predictors to a smaller but representative set of composites. Later, I can use those composites to define level-1 model in the 3-level factor model. Do you think it is okay to do that?
The message means that in cluster 395, the variable PCP-LATE is not the same for each individual. This is a requirement for a between-level variable. For further help send your output, data, and license number to email@example.com.
Jacky Luo posted on Friday, November 04, 2011 - 6:47 pm
Dear Dr. Muthen, I want to fit a 2-level Rasch testlet model to my data, which consist of 4 testlets with each having 9 binary items. I am also interested in getting the group variance. I ran with the following code, but the output seemed a little bit off, and it seemed that I could not get a direct estimate of the group variance. Would you please take a quick look at it to see whether there is something wrong with my code?
TITLE: Two level IRT testlet model DATA: FILE IS C:\Users\Jacky\Downloads\r1.dat; VARIABLE: NAMES ARE u1-u36 group; CATEGORICAL = u1-u36; CLUSTER = group; MISSING = ALL (999); ANALYSIS: TYPE = TWOLEVEL; MODEL=nocovariances; INTEGRATION = MONTECARLO; MODEL: %WITHIN% f1 BY u1-u9@(1); f2 BY u10-u18@(1); f3 BY u19-u27@(1); f4 BY u28-u36@(1); f5 BY u1-u36@(1); f5@1;
When you hold the loadings equal you should not use @ because @ means fixed.
Also, you are only fixing the f5 variance at 1 and I think you want to fix all five variances.
It will be impossible to handle all 36 random intercept variances on between (unless you use estimator = wlsmv). Instead you want to formulate a factor on between where those 36 random intercepts are the indicators.
Jacky Luo posted on Sunday, November 06, 2011 - 11:55 am
Hi Dear Dr. Muthen, Thanks very much for your prompt response. Based on your suggestions, I used the following code: %WITHIN% f1 BY u1-u9 *(1); f2 BY u10-u18*(1); f3 BY u19-u27*(1); f4 BY u28-u36*(1); f5 BY u1-u36*(1); f1@1; f2@1; f3@1; f4@1; f5@1; %BETWEEN% fb BY u1-u36;
Is this consistent with your suggestions? I am also wondering why I have to fix all 5 factor variances to be 1, since f1-4 are testlet factors and they are simulated not to have the same variance as f5.
%WITHIN% f1 BY u1-u9 *(1); f2 BY u10-u18*(2); f3 BY u19-u27*(3); f4 BY u28-u36*(4); f5 BY u1-u36*(5); f1@1; f2@1; f3@1; f4@1; f5@1; f1-f5 WITH f1-f5@0; %BETWEEN% fb BY u1-u36; u1-u36@0;
If you are going to have a Rasch-like model I would not generate data where the factor variances differ but instead the loadings.
Jacky Luo posted on Monday, November 07, 2011 - 10:33 am
Thanks so much, Dr. Muthen!
Morag T posted on Thursday, November 17, 2011 - 3:09 am
I wonder can anyone help me please. I have 5 waves of a birth cohort study (data collected annually) and I have a series of categorical variables on ‘closeness to family and friends’ collected in each sweep (26 variables in total).
I did 3 principal components analyses on these (I couldn’t just do one as not all the variables were sufficiently correlated). The resulting 3 factors regressed nicely with my DV, ‘Children’s Stress and Difficulties score’.
However, as I used variables across waves this makes the data multi-level and my PCA was not. Can anyone advise me whether this data would be suitable for a multi-level factor analysis in MPlus? I have MPlus but it is new to me.
Just to complicate matters, my data is a complex sample, with PSU (cluster), Strata and a longitudinal sample weight. Should these be incorporated into a multi-level factor analysis, and if so, how? If it is possible and you could show me the necessary syntax, I would be eternally grateful.
What I had tried to do in MPlus was:
Title: Data: File is \ Variable: Names are ; Missing are all (-9999) ; Categorical are Usevariables are Stratification = DcStrat; Cluster = DcPSU; Weight = DcWTbrth; Analysis: Type = Complex EFA 1 4 ;
Could you please advise on this? And, many thanks for reading this far!
Morag T posted on Friday, November 18, 2011 - 3:26 am
Many thanks for your prompt response. However, I'm sorry, I'm not quite understanding, do you mean the approach I already took was reasonable, or to take the multilevel factor approach would be reasonable?
You can take the non-independence of observations into account using COMPLEX or TWOLEVEL. If you use TWOLEVEL, then in addition to taking non-independence of observations into account you would be interested in studying the factor structure on both within and between.
Morag T posted on Tuesday, November 22, 2011 - 2:30 am
Thank you very much!
finnigan posted on Tuesday, December 13, 2011 - 6:06 am
I want to run a multiple indicator growth model using three waves of data , but the data are collected from individuals nested in companies. Arguably , I have a multilevel data structure.
I am trying to conduct measurement invariance testing using CFA prior to running the growth model.
Should I run a single-factor CFA using a multilevel analysis, i.e., run a two-level CFA at time one and repeat this analysis for times two and three, or would it be more appropriate to run a two-level longitudinal factor analysis to check for multilevel effects? Groups are unbalanced and less than 30. Sample sizes are t1 = 130, t2 = 118, t3 = 110. The multilevel structure is not of interest; I am trying to rule it out so that I can focus on the measurement invariance testing for the growth model.
Does Mplus calculate sampling weights, or do they have to be specified for Mplus?
I would run the analysis with and without TYPE=COMPLEX to see if there is a big difference in the standard errors. If not, I would ignore the clustering. You can see the Topic 4 course handout on the website for the steps for multiple indicator growth modeling.
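The comparison suggested here can be sketched as two runs of the same input. The variable names (company, y1-y4) and the single-factor model are hypothetical stand-ins for the poster's data, not taken from the thread.

```
! Run 1: standard errors adjusted for clustering within companies
VARIABLE: NAMES ARE company y1-y4;   ! hypothetical variable names
          USEVARIABLES ARE y1-y4;
          CLUSTER = company;
ANALYSIS: TYPE = COMPLEX;
MODEL:    f BY y1-y4;

! Run 2: remove the CLUSTER and TYPE = COMPLEX lines, keep everything else,
! then compare the standard errors from the two outputs.
```

If the standard errors barely move between the two runs, that is the situation in which the answer above suggests ignoring the clustering.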
finnigan posted on Thursday, December 15, 2011 - 2:10 am
Should the multilevel factor analysis be run longitudinally ie all three time points or one factor analysis per time point?
Hello, I am working on a two-level EFA/CFA model with Likert scale items from three scales. Items from two of the scales have response categories ranging from 1-4 and items from the third scale have response categories from 0-3. I am running continuous models. I am wondering if it is preferable or acceptable to standardize the responses (using z-scores) when running the multilevel factor analyses to get the items on the same scale. Results from the EFAs/CFAs using standardized and non-standardized items are very similar in the within-level part of the model, but are a little different in the between-level part of the model. Is there a standard practice for whether or not to standardize items on different response scales in factor analysis, and does this differ for multilevel factor analysis?
You can only use standardized variables if you have a scale-free model. In this case, it does not matter whether you analyze a correlation or a covariance matrix. If you have a model with any constraints, you will obtain different results analyzing a correlation versus a covariance matrix. Standardizing also presents problems for across-group or across-time comparisons.
In TWOLEVEL EFA with ordinal indicators, I obtain the within and between sample covariance with SAMPSTAT. Consistent with LRV formulation of categorical models, I see 1.00 for the within variance of all indicators and item-specific variances in the between matrix (unlike TWOLEVEL EFA with continuous indicators, where item-specific variances appear in both within and between covariance matrix).
However, an ICC is given for each item, and some algebra with the ICC and the between-variance will yield the within-variance consistent with the ICC, and it is not 1.00.
How are within variances and ICC calculated with categorical items and LRV formulation?
Is it appropriate for me to use ICC to calculate the within-variance for reporting purposes?
I don't understand probit or logistic in this context. It is an EFA
Data: File is tone.dat ; Variable: Names are ss video clip SU NU CA WA PO RE AF PA DO CO BO DI; usevar SU CA PO RE DO CO BO DI; categorical SU CA PO RE DO CO BO DI; useobs video==0; Missing are all (-9999) ; Cluster = ss; Analysis: Type = TWOLEVEL EFA 1 2 UW 1 2 UB; Output: SAMPSTAT;
Ok, but I am still missing something, unless this relates to the intraclass correlation coefficient calculation. I hope I am not being dense.
That is my entire program, so whatever the default estimator is for two-level EFA with categorical indicators, that's what I used.
But my question is about the sample statistics obtained with SAMPSTAT. ... and about the ICC.
I am just trying to get some semblance of total, within, and between estimated sample correlations. Papers with continuous indicators, like Reise's 2005 paper demonstrating MFA, report these correlations.
I think logit vs probit has to do with interpreting the loading. I am after the within-variance used to calculate ICC.
Thanks for confirming the within-variance is fixed at 1.
This morning I rechecked my calculations and found that the problem was that I had transposed the between-variance (.125) and the ICC (.111) when I made my calculations, and so I did not compute 1.0. (I computed .777, where I expected 1.0, and got confused.)
Apologies for troubling the board with an issue that stemmed from me transposing two numbers. Many thanks for your help in getting me back on track.
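For readers following the arithmetic, the numbers quoted in this exchange can be checked by hand. The symbols below are notation introduced here, not Mplus output labels: the between variance is written as \sigma^2_B and the within variance as \sigma^2_W, with the within variance fixed at 1 as confirmed above.

```latex
\mathrm{ICC} = \frac{\sigma^2_B}{\sigma^2_B + \sigma^2_W}
             = \frac{0.125}{0.125 + 1} \approx 0.111,
\qquad
\sigma^2_W = \sigma^2_B \, \frac{1 - \mathrm{ICC}}{\mathrm{ICC}}
           = 0.125 \times \frac{0.889}{0.111} \approx 1.00
```

Swapping the two inputs gives 0.111 x (1 - 0.125) / 0.125 = 0.777, which is the value the poster first obtained.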
If I can ask a followup question, do I understand that with some other estimators, the within-variance would not be fixed at 1.0?
.... and, is there any advantage/disadvantage to choosing different estimators with this EFA I am conducting?
Hello. I am a novice to using multilevel factor models and I have a question concerning the use of a two-level factor model with four ordinal (Likert type) items via WLSMV estimation. My sample size is 589 with 50 clusters.
There is only one factor that I am trying to assess using this model (there is a unidimensional factor structure at level-1 and at level-2), but in all of the models tested the between-level (level-2) factor variance is nonsignificant using an alpha level of .05.
Thus, I am wondering if the non-significant between-level factor variance warrants that I continue to use multilevel factor analysis, or if I should use Mplus to assess the factor structure of these items using an aggregated, single-level CFA?
Put another way, is a multilevel factor model warranted when the level-2 factor variance is nonsignificant?
Yes, it is warranted if the residual variances are significant.
KUN YUAN posted on Thursday, November 08, 2012 - 10:56 am
Dear Dr. Muthen,
I have a four-level data set with 10 categorical factor indicators and would like to conduct multilevel EFA with it using Mplus. I've seen the examples with 2 levels. I wonder whether Mplus is capable of running a four-level EFA with categorical variables. If so, could you please point me to any examples you have?
I have conducted, for the first time, Multilevel factor Analysis. I estimated a two-level factor analysis using teacher survey and principal survey. (Teachers are nested within schools). I did Exploratory Factor Analysis (EFA) using principal components analysis extraction method with Geomin rotation on 19 items through Mplus ver. 6.11.
Now I have three results 1. EXPLORATORY FACTOR ANALYSIS WITH 2 WITHIN FACTOR(S) AND 2 BETWEEN FACTOR(S):
2. EXPLORATORY FACTOR ANALYSIS WITH UNRESTRICTED WITHIN COVARIANCE AND 2 BETWEEN FACTOR(S):
3. EXPLORATORY FACTOR ANALYSIS WITH 2 WITHIN FACTOR(S) AND UNRESTRICTED BETWEEN COVARIANCE:
What is the reasoning for choosing one of those three results over the others? Is there anything I should look out for before I use one of them? For example, if I choose to use the solution with 2 within factors and 2 between factors rather than an unrestricted one, what should the rationale be, and vice versa?
There are fit statistics to help you choose. However, it may be difficult to choose based only on these. I would say ultimately, the interpretability of the factors based on your theory should guide you.
I am trying to conduct a two-level CFA on a data set with two observations per participant (ID) and categorical factor indicators (variables rated on a 4-point Likert scale). When I run the analyses, the output states that the model terminates normally, and there are no further warnings. However, the only model fit indicators I get are the loglikelihood and information criteria, and no CFI, TLI or RMSEA.
The command I use is this: TITLE: Two-level factor analysis, categorical indicators DATA: FILE IS M:\Data.dat; VARIABLE: NAMES ARE ID Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12; CATEGORICAL IS Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12; CLUSTER IS ID; ANALYSIS: TYPE IS twolevel; MODEL: %between% F1Bet BY Y2 Y4 Y7 Y9 Y11 Y12; F2Bet BY Y1 Y3 Y5 Y6 Y8 Y10; %within% F1With BY Y2 Y4 Y7 Y9 Y11 Y12; F2With BY Y1 Y3 Y5 Y6 Y8 Y10;
Is there anything I should ask for in the OUTPUT to receive CFI, TLI etc.?
With categorical dependent variables and maximum likelihood estimation, means, variances, and covariances are not sufficient statistics for model estimation. Because of this, chi-square and related fit statistics are not available.
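One alternative that comes up elsewhere in this thread is the weighted least squares estimator WLSMV, which is a limited-information method and, to my understanding, reports chi-square-based fit statistics for two-level models with categorical indicators. The sketch below reuses the variable names and factor structure from the question and changes only the estimator; whether WLSMV is suitable for a given data set (for example, with a lot of missing data) is a separate judgment.

```
VARIABLE: NAMES ARE ID Y1-Y12;
          CATEGORICAL ARE Y1-Y12;
          CLUSTER IS ID;
ANALYSIS: TYPE IS TWOLEVEL;
          ESTIMATOR IS WLSMV;      ! replaces the maximum likelihood default
MODEL:    %BETWEEN%
          F1Bet BY Y2 Y4 Y7 Y9 Y11 Y12;
          F2Bet BY Y1 Y3 Y5 Y6 Y8 Y10;
          %WITHIN%
          F1With BY Y2 Y4 Y7 Y9 Y11 Y12;
          F2With BY Y1 Y3 Y5 Y6 Y8 Y10;
```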
Hello, I was hoping I could get advice on what type of EFA to perform in MPLUS for my dataset.
In our study, participants retrieve several memories and rate them on 13 fixed variables. Since memories are personal, they vary between subjects. We are interested in examining
1) the factor structure underlying the variables characterizing the memories (i.e. within-level EFA) and 2) the factor structure at the between-subject level, such that a factor reflects the fact that people who (on average) rate their memories high on one variable also rate their memories as high on another variable.
Factor scores are most meaningful to me on the between-subject level as I would like to correlate them with other measures.
My question is whether it would make sense to perform a two-level EFA (saving factor scores for the between level), OR to perform two separate EFAs, each with slightly different interpretations: 1) a within-level EFA where each row reflects a single memory and I ignore subjects altogether and 2) a between-level EFA where each row represents the average ratings across all memories from a single subject.
Finally, if I chose the two-level route, will MPLUS accept a datafile where each row reflects a single memory and there are 13 different columns and a separate “SubjectID” column?
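A minimal sketch of an input for the long layout described in the question, with one row per memory and the subject identifier used as the cluster variable. The file name, the variable names (subjid, r1-r13) and the 1-to-4 factor range are hypothetical; the TYPE line follows the TWOLEVEL EFA form used earlier in this thread.

```
DATA:     FILE IS memories.dat;      ! hypothetical file, one row per memory
VARIABLE: NAMES ARE subjid r1-r13;   ! subject ID plus the 13 ratings
          USEVARIABLES ARE r1-r13;
          CLUSTER = subjid;          ! memories are nested within subjects
ANALYSIS: TYPE = TWOLEVEL EFA 1 4 UW 1 4 UB;   ! UW/UB also fit unrestricted within/between
```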
Hi! Thanks very much for your response. We ran a two-level EFA and the algorithm identified an unrestricted within and 4-factor between as providing the best fit to the data. I want to extract factor scores for the between-level factors (I don’t care about the within) but am having trouble. I gather that saving factor scores is not possible with a two-level EFA, so I ran a two-level CFA and have gotten the following error.
However, we played around with changing the number of factors and removing some of the variables (just to see if the model would run), and we were successful at getting a 3 factor model and producing factor scores.
Do you have any insight into this error? We also tried different start values, to no avail.
MAXIMUM LOG-LIKELIHOOD VALUE FOR THE UNRESTRICTED (H1) MODEL IS -74757.670
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.153D-13.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 53.
burak aydin posted on Friday, November 22, 2013 - 8:58 am
Hello, our data have 130 clusters (N), with about 15 kids in each cluster (n), and we are trying to confirm 2 latent factors assessed with Likert-type questions (scale 1-3). We have 14 indicators for f1 and 30 indicators for f2.
We are not interested in the factor structure at the between level, and running an MCFA is computationally demanding (we also don't want to get into parceling).
We plan to run single level CFA with declared clusters (CLUSTER command).
Do you think this model is sound/valid/ satisfactory to run and report a CFA for clustered data? Thanks
I think you are asking about Type = Complex versus Type=Twolevel. The factor model is not "aggregatable" (see the Muthen-Satorra article
Muthén, B. & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316.)
and may be somewhat distorted by taking the Complex approach relative to the Twolevel approach. But it may be a reasonable approximation.
burak aydin posted on Monday, November 25, 2013 - 10:12 am
Yes, I am asking Complex vs. Twolevel. I was encouraged to ask this question because complex vs twolevel provided consistent findings for a two level regression model, not in a simulation study but with 25 different outcomes. For a two level regression I always prefer using the two-level approach.(this might come to GEE vs multilevel) We also wanted to use two-level approach for CFA, given it was the nature of the data set. But we faced some difficulties, both computational and theoretical. Thank you very much for your response.
SY Khan posted on Monday, December 09, 2013 - 6:58 am
Dear Drs. Muthen,
I am working on secondary data in which the independent variables (employed HR practices) are measured at the workplace level (level 2) and are binary. Employees' responses to the HR practices in terms of their job satisfaction, depression, organizational support, etc. are measured at level 1 and are the intervening and outcome data in my model (ordinal). So the employees are grouped within workplaces and evaluate the effects of their corresponding HR/organizational practices.
The EFA results at level-2 highlight four factors for HR practices confirmed by CFA at level 2. At the level-1 EFA and CFA confirm seven constructs. So, four dimensions of HR practices (level-2)affect perceptions of employees’ outcomes on seven aspects (level-1).
1- Because, out of the total of 11 constructs in the SEM model, four are at level 2 and seven at level 1, is it appropriate for the overall measurement model (CFA) to be evaluated as a two-level CFA?
2- I would like to omit the covariates in the two-level CFA and introduce the covariates in regression/path analysis. I have found example 9.7, in Users Guide 7.11, but this is with covariates. Can you please recommend a more suitable example and syntax in my case.
1. Yes. 2. Remove the covariates from Example 9.7.
SY Khan posted on Monday, December 09, 2013 - 3:34 pm
Thanks for your prompt reply. I have made the following changes to the Example 9.7 syntax:
1-As I have seven factors at level 1 formed by Categorical factor indicators so in the WITHIN and BETWEEN:
%WITHIN% PJC1 BY AUTOM1-5; JSATS1 BY JS1-9; And so on for remaining five factors. (All level 1 indicators)
%BETWEEN% PJC2 BY AUTOM1-5; JSATS2 BY JS1-9; And so on for remaining five factors. (All level 1 indicators)
2-As I am not interested in covariates so removed the
WITHIN=x1 x2; BETWEEN=w;
3- INTEGRATION=MONTECARLO; 4- CLUSTER=WORKPLACE#;
BUT I get the following FATAL ERROR: *** FATAL ERROR THERE IS NOT ENOUGH MEMORY SPACE TO RUN Mplus ON THE CURRENT INPUT FILE. NOTE THAT THE MAXIMUM MEMORY USAGE BY Mplus 32-BIT IS LIMITED BY THE OPERATING SYSTEM.
Kindly advise if:
1- I have altered the syntax correctly?
2-Also the syntax does not include any of the HR factors at level2 that are affecting the perceptions of employees at level 1 about their job satisfaction and six other employee outcome aspects. i.e. the overall measurement model is incomplete as it does not have four constructs at level 2 included. How can I include the effects of the level 2 constructs in the model?
SY Khan posted on Wednesday, December 11, 2013 - 11:44 am
Dear Dr. Muthen,
Thank you for your email regarding the use of WLSMV estimator for computational ease of the level 2 CFA.
I am still not clear on the link between level 1 and level 2 factors, though. You recommend that I create a factor at level 2 following Example 9.7 of the User's Guide.
1- In the %WITHIN% part I have seven factors with categorical indicators at level 1. So in the %BETWEEN% part, should I create one factor with all level 1 factor indicators (which form seven factors), or rewrite the seven level 1 factors in the %BETWEEN% part again?
2- What will creating a factor at level 2 in the %BETWEEN% part show? 3- Is it possible to have a diagram of this model?
MODEL:
%WITHIN%
fw BY y1 y2 y3 y4;
%BETWEEN%
fb BY y1 y2 y3 y4;
fw uses factor indicators that are the within part of y1, y2, y3, and y4. fw cannot be used in the between part of the model.
fb uses factor indicators that are the between part of y1, y2, y3, and y4. This can be used in the between part of the model.
A model with a random slope can have a cross-level interaction. See Example 9.2.
Path diagrams are not available for multilevel models.
SY Khan posted on Wednesday, December 11, 2013 - 3:08 pm
Thanks for your prompt reply. Kindly excuse my repetition, as I am getting confused by the same y1 y2 y3 y4 variables appearing in both the between and within parts.
Kindly correct me if I understand incorrectly. Although the within and between parts both have y1 y2 y3 y4, these are different variables. In the within part these are factor indicators that form level 1 factors. In my case, these are all the different factor indicators that form seven different factors at level 1. So I will have, for example:
fw1 BY y1-y4;
fw2 BY y5-y6;
fw3 BY y7-y9;
fw5 BY y10-y13;
fw6 BY y14-y18;
fw7 BY y19-y24;
But in the between part, y1 y2 y3 y4 refer to some other level 2 factor indicators that form level 2 factors. In my case, are these all the variables that form four different factors at level 2? So I have fb1 fb2 fb3 fb4 at the between level, for example:
fb1 BY y25-y28;
fb2 BY y29-y32;
fb3 BY y33-y36;
fb4 BY y37-y39;
1- Am I missing anything in these commands? 2-Is the test run time usual?
Thanks for your help.
SY Khan posted on Sunday, December 15, 2013 - 5:23 am
Hello Dr. Muthen,
In continuation to my query above regarding MCFA and no standard errors calculation due to probable model non-identification--
I altered the %BETWEEN% part of the model to include only the level 2 predictor factors, and the %WITHIN% part to include only the level 1 factors. The test ran OK for 16-17 hours, but at the end it did not give any output, just the message that input data terminated normally.
Kindly advise where I am misspecifying the model.
Dear Dr. Muthen, I am trying to compute a MCFA according to Muthen (1990) and Dyer et al (2005). So, I computed within and between level covariance matrices:
DATA: FILE IS "D:\analysis\bd.dat";
VARIABLE: NAMES ARE country team q25_c q25_f q25_k q25_nr;
USEVARIABLES ARE team q25_c q25_f q25_k q25_nr;
CLUSTER IS team;
ANALYSIS: TYPE = TWOLEVEL BASIC;
OUTPUT: sampstat;
SAVEDATA: SAMPLE=wincov.dat; SIGB=betcov.dat;
This syntax automatically created the two files. For example, the within covariance matrix file (wincov.dat) contains:
0.81027265E+01
-0.15827253E+01 0.32850610E+01
0.12791074E+01 -0.90138411E+00 0.13141025E+01
-0.99105353E+00 0.89080775E+00 -0.50168173E+00 0.87720516E+00
Then I try to compute the confirmatory factor analysis on the within matrix:
DATA: FILE IS "D:\analisis\wincov.dat";
TYPE IS COVA;
FORMAT IS FREE;
NOBSERVATIONS IS 1415;
VARIABLE: NAMES ARE country team q25_c q25_f q25_k q25_nr;
USEVARIABLES ARE q25_c q25_f q25_k q25_nr;
MODEL: F1 by q25_c q25_f q25_k q25_nr;
OUTPUT: stand mod res;
But unfortunately, I get this error message (this error is presented for both files):
*** ERROR Insufficient data in "D:\analisis\wincov.dat"
Thanks in advance for your suggestions to solve the problem.
Dear Dr. Muthen, Firstly, thank you very much for your quick answer. The only variable that differs in the USEVARIABLES statement is "team", which is the grouping variable (cluster is team;). Thus, it does not appear in the covariance matrix. Anyway, I have included it and run the analysis, just to try. Below, you can see the error message.
DATA: FILE IS "D:\Bea\Proyecto Clima Job Insecurity_UIBK_2010\articulos\paper_conceptualizacion jinsc\analisis\wincov.dat";
TYPE IS COVA;
FORMAT IS FREE;
NOBSERVATIONS IS 1415;
VARIABLE: NAMES ARE country team q25_c q25_f q25_k q25_nr;
USEVARIABLES ARE team q25_c q25_f q25_k q25_nr;
MODEL: F1 by q25_c q25_f q25_k q25_nr;
OUTPUT: stand mod res;
*** WARNING in Data command
Summary data must be in free format. The FORMAT option will be ignored.
*** WARNING in Model command
Variable is uncorrelated with all other variables: TEAM
*** WARNING in Model command
At least one variable is uncorrelated with all other variables in the model. Check that this is what is intended.
*** ERROR
Insufficient data in "D:\Bea\Proyecto Clima Job Insecurity_UIBK_2010\articulos
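One plausible reading of the "Insufficient data" error, which the thread itself does not confirm: with TYPE IS COVA, Mplus reads a lower-triangular matrix covering every variable listed under NAMES, so six names call for more values than the ten stored in wincov.dat. The short Python sketch below only does that counting; the function name is made up for illustration.

```python
def lower_triangle_size(n_variables: int) -> int:
    """Entries in the lower triangle (diagonal included) of a covariance matrix."""
    return n_variables * (n_variables + 1) // 2


# Names as listed under NAMES in the failing input.
names_in_input = ["country", "team", "q25_c", "q25_f", "q25_k", "q25_nr"]
# Variables actually present in the saved within-level covariance matrix.
variables_in_file = ["q25_c", "q25_f", "q25_k", "q25_nr"]

expected = lower_triangle_size(len(names_in_input))      # values implied by NAMES
available = lower_triangle_size(len(variables_in_file))  # values stored in wincov.dat

print("expected:", expected, "available:", available)
```

If this reading is right, listing only q25_c q25_f q25_k q25_nr under NAMES would make the two counts agree.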
I am doing a multiple group multilevel CFA test. The group variable is a between-level variable. I want to examine the measurement invariance property of an instrument relevant to the group so that I can compare the difference on the mean of the between-level latent factor between the two groups. Are the following example codes right for testing intercept equivalence model? Many thanks!
Can you recommend a paper or example on longitudinal CFA with a multilevel approach? I wish to run a multilevel CFA (7 time points within 710 persons). I would guess the basic syntax is below. Next I would want to regress some observed time-varying variables on these latent factors. Thank you!
USEVARIABLES ARE s11-s13 s23 s24 s25 s26;
CLUSTER IS id;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
%WITHIN%
W_f2 BY s23-s26;
W_f4 BY s11-s13;
%BETWEEN%
Dunn, C., Masyn, K.E., Jones, S.M., Subramanian, S.V., & Koenen, K.C. (2014). Measuring psychosocial environments using individual responses: an application of multilevel factor analysis to examining students in schools. Prevention Science. DOI 10.1007/s11121-014-0523-x
I am doing MCFA on a team engagement scale (e.g. "My team feels very motivated to do a good job"). I followed the steps of Dyer et al. (2005): (1) conventional CFA; (2) create within- and between-level covariance matrices and obtain ICCs; (3) CFA on the within and between matrices; (4) MCFA. The conventional CFA and the CFA on the within matrix fitted well. ICCs range from 0.084 to 0.178. When doing the CFA on the between level I get the following warning (fit is also bad): WARNING: THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE.
For the MCFA, I also got warning messages (about an ill-conditioned Fisher information matrix).
I do not know what is wrong. I also wonder whether I should do multilevel CFA in this case (I do not aim to aggregate the data). Thank you
Please send the MCFA output to Support along with your license number.
Djangou C posted on Wednesday, June 08, 2016 - 2:12 am
Dear Dr Muthén, I am running a multilevel CFA model. Below is the message from Mplus.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS -0.844D-16. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 27, %BETWEEN%: [ Y6 ] THE NONIDENTIFICATION IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS. REDUCE THE NUMBER OF PARAMETERS.
I am not sure I understand. My model has 52 df, but I have more parameters than the number of clusters. I have three questions in this regard. 1) In multilevel models, positive degrees of freedom are not enough for identification; the information matrix must be positive definite. Is this true? 2) What does Mplus do in this case? For instance, will Mplus impose constraints on Parameter 27 by fixing it to a value to achieve identification? 3) Could you please recommend some readings on this issue? Thank you for your invaluable help.
This is a warning. Independence of observations is at the cluster level with clustered data. So having more parameters than clusters is like having more parameters than observations. The impact of this on the results has not been well-studied but you should be aware of it.
I am running a multilevel EFA analysis on an eleven-item scale with both binary and ordinal variables (5 binary and 6 ordinal variables). After examining my within and between correlation matrices, I noticed that the between correlation matrix shows that two items within my scale have perfect correlation. After running within and between group correlation matrices on a different statistical program, I am getting different results (this program would assume that all variables are continuous). I can’t seem to determine why I am getting perfect correlation between these two variables in Mplus. Any ideas about why this might be the case?
I am new to the field of two-level exploratory factor analysis and have difficulties interpreting my results. The model fit indices show the best results with unrestricted variance at the within level and a 3-factor solution at the between level. The model fit indices for the model with 3 factors at the within level and unrestricted variance at the between level are also very good. I do not really understand what that means. Unfortunately, the model with 3 factors at each level shows rather unacceptable model fit indices (but compared to the other models with fixed factors it is still the best). I do not know which model I should take as the basis for interpretation, since from a theoretical standpoint a solution with 3 factors at both levels would make sense (which is also what I find as long as I choose an unrestricted solution for the other level). Maybe the quality of my data is just not good enough? I tried to find some literature for a better understanding of my problem, but until now I have not found anything helpful. Maybe you can refer me to a paper? Please excuse my limited knowledge.
Take a look at the papers on our website under Multilevel SEM. For instance, Muthen 1994, which gives an analysis strategy, and Dunn et al 2014.
Typically, the between level has fewer factors than within. It sounds like there isn't a clear factor structure on both within and between. One alternative in that case is to simply use Type=Complex instead of Type = Twolevel.
Malin Anniko posted on Thursday, November 17, 2016 - 9:51 am
Dear Dr Muthén, I recently received a review asking that I take into account that my data are nested. The main analysis is a CFA with 9 factors (2-4 indicators per factor). My data are individuals nested within classrooms within schools (3 levels). I checked the ICCs, which are moderate to low; however, the design effect is above 2 for several of the items at both the class and school level, indicating that it would be appropriate to take the nesting into account.
However, my first problem is that I only have 18 clusters at the school level. As I understand it you should have at least 20 and preferably at least 30 for both COMPLEX and THREELEVEL. Is there any way to still account for the school level within the analysis?
My second problem is that I am trying to estimate far too many parameters relative to the number of clusters. Even at the class level, where I have 138 clusters, this becomes a problem (I have a total of 27 indicators). So my second question is whether there is a way to move forward from here if reducing parameters is not an option.
I'm just interested in accounting for the nesting in my estimates, standard errors, and model fit. I have no predictors at the class or school level that I'd like to model.
You should think about how many between-level parameters you have relative to 138 and to 18. The within-level parameters are probably well covered by data points on the student level.
18 is very low. Perhaps use Twolevel Complex to see if SEs are any different than Twolevel alone. Twolevel referring to students within classrooms.
You can also use 17 dummy variables for schools.
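The "design effect above 2" rule of thumb mentioned in the question is usually computed as 1 + (average cluster size - 1) × ICC. A minimal Python sketch of that arithmetic follows; the ICC and cluster size are hypothetical values chosen for illustration, not figures from this thread.

```python
def design_effect(icc: float, avg_cluster_size: float) -> float:
    """Approximate design effect: 1 + (average cluster size - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


# Hypothetical inputs: a modest ICC combined with fairly large clusters.
example_icc = 0.05
example_cluster_size = 25

deff = design_effect(example_icc, example_cluster_size)
print(round(deff, 2))  # roughly 2.2, i.e. above the cutoff of 2
```

The point of the sketch is that even a low ICC can push the design effect past 2 when clusters are large, which is consistent with the poster seeing low-to-moderate ICCs alongside design effects above 2.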
Sophie Dan posted on Friday, March 31, 2017 - 3:32 am
Dear Mr. Muthen,
It's great to find this discussion area! I have a problem with CFA. The CFA result indicates that the factor correlations in my data are relatively high. I suppose one of the reasons might be that I did not control for the second-level factor, so I turned to two-level CFA.
The Mplus User's Guide has an example of two-level CFA with continuous factor indicators, covariates, and random slopes, but my design has no covariates. I just want to do a basic confirmatory factor analysis. Do you have the syntax for this?
You can or cannot. They are often close to zero on between so most multilevel programs fix them to zero. Here you have the flexibility to see if they are zero.
Sophie Dan posted on Friday, March 31, 2017 - 2:46 pm
I have tried, thank you!
There is another problem: I have 50 groups in the SPSS file, but when I ran the CFA, the output indicated that cluster=4. What could be the cause? (I created an IDCLASS variable in SPSS and labeled the cases that belong to the same group with the same number, so the values 1-50 appear under IDCLASS. Can Mplus identify this?)
Sophie Dan posted on Saturday, April 22, 2017 - 10:20 pm
Dear Dr. Muthen,
I'm doing a two-level CFA with five factors on the first level. I found that no matter how I set the between-level model, the result is not ideal. For example, with 2 to 5 factors on the between level, the factors all have very high correlations with each other. But it is also not good to have 1 factor, because some of the items are negative while others are positive.
Is there a way to just partial out the impact of the between level, and see the result of the within level, because it is hard to explain the result of between level.
I tried to make the variables on between level correlated, but it makes the TLI decrease to a number like 0.8 which cannot meet the requirement.
The partialling out approach using S_PW is described in my 1994 article on our website:
Muthén, B. (1994). Multilevel covariance structure analysis. In J. Hox & I. Kreft (eds.), Multilevel Modeling, a special issue of Sociological Methods & Research, 22, 376-398.
But note that having one factor with negative and positive factor loadings does not need to be a problem.
Sophie Dan posted on Wednesday, April 26, 2017 - 12:08 am
Many thanks for your help!
I have another question: what is the between-level factor used for? Is it for predicting between-level factors in other scales? I didn't find books explaining the between-level factor in factor analysis. Do you have some recommendations? Thanks a lot!
Sophie Dan posted on Wednesday, April 26, 2017 - 1:39 am
Dear Dr. Muthen,
I read your paper. So the "partialling out approach using S_PW" means a factor structure at the within level with an unrestricted between covariance matrix? But it seems that the result is the same as estimating the between- and within-level structures at the same time?
Sophie Dan posted on Wednesday, April 26, 2017 - 6:54 am
Dear Dr. Muthen,
Sorry for my last post; I now understand. But another problem comes up: in the syntax that defines the number of observations, the value is N-G for S_PW, but according to "http://www.statmodel.com/bmuthen/ED231e/Handouts/Lecture19.pdf" it should be N-c. I'm not sure I understand this correctly. Does it mean "the number of observations minus the mean cluster size"?
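A sketch using the notation of Muthén (1994), where N is the total sample size and G the number of clusters: the pooled-within matrix S_PW has N - G as its divisor, so N - G is observations minus the number of clusters, not minus the mean cluster size. The c in that article is a separate scaling constant, c = (N² - Σ N_g²) / (N(G - 1)), which behaves like an average cluster size. Whether the symbol in the handout matches either quantity is not settled in this thread, and the cluster sizes below are hypothetical.

```python
def within_sample_size(cluster_sizes):
    """N - G: total observations minus the number of clusters."""
    return sum(cluster_sizes) - len(cluster_sizes)


def scaling_constant_c(cluster_sizes):
    """c = (N^2 - sum of squared cluster sizes) / (N * (G - 1))."""
    n_total = sum(cluster_sizes)
    n_clusters = len(cluster_sizes)
    sum_sq = sum(size * size for size in cluster_sizes)
    return (n_total * n_total - sum_sq) / (n_total * (n_clusters - 1))


sizes = [10, 12, 8, 10]  # hypothetical unbalanced cluster sizes, N = 40, G = 4

n_minus_g = within_sample_size(sizes)                # 40 - 4
c_unbalanced = scaling_constant_c(sizes)             # slightly below the mean size of 10
c_balanced = scaling_constant_c([10, 10, 10, 10])    # equals the common cluster size

print(n_minus_g, c_unbalanced, c_balanced)
```

With equal cluster sizes c reduces to the common cluster size, and with unequal sizes it stays close to the mean, which is why the two quantities are easy to confuse with one another.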
lopisok posted on Wednesday, June 28, 2017 - 5:46 am
Dear group members,
I'm running a Multilevel exploratory factor analysis based on the MPLUS syntax descriptions in this paper: " Dunn (2014) Modeling contextual effects using individual-level data and without aggregation"
My model works fine (34 countries; n=38000) and I get several possible solutions. My question is: how do I decide which model is best? As I understand it, there are no established fit statistics for multilevel EFAs. In the paper they use a cutoff of .10 for SRMR, but this looks arbitrary. Based on theory, only one factor would be expected on both levels. Is it acceptable to claim that the one-factor model on both levels is a good-fitting model based on the fit statistics and theory we have? Or should we go for the two-factor model on both levels based on the fit statistics? How important is the SRMR in this story?
EXPLORATORY FACTOR ANALYSIS WITH 1 WITHIN FACTOR(S) AND 1 BETWEEN FACTOR(S): RMSEA = 0.028 CFI = 0.975 TLI = 0.962 SRMR within = 0.142 SRMR between = 0.195
EXPLORATORY FACTOR ANALYSIS WITH 2 WITHIN FACTOR(S) AND 2 BETWEEN FACTOR(S): RMSEA = 0.013 CFI = 0.997 TLI = 0.991 SRMR within = 0.043 SRMR between = 0.071
Is there any reference paper on choosing the best model for MEFA?
Research on these fit indices for multilevel EFA is still lacking so I have no clear cut answers for you. I know of no references for this. It is not clear that SRMR should be prioritized. I suggest you go by interpretability to a large degree. Also, you want to check the model with 2 within factors and 1 between factor. Often there are fewer factors on between.
It sounds like you want to study the change in the factor over time. This can be done for example using a growth model (see UG example of growth models and Topics 3 and 4 of our short course on our website) for the factor.
This assumes that you have measurement invariance for the 15 items across time. Usually, this is done as a single-level, wide model but you have 15*8 variables and only n=80 so that won't work.
As an alternative, a two-level analysis with a linear growth model can let the factor score change over time as follows where time is scored say 0, 1, 2, ..., 7 for the 8 time points:
Model:
%within%
fw by u1-u15* (p1-p15);
s | fw on time;
%between%
fb by u1-u15* (p1-p15);
fb@1;
s with fb;
Here, loadings are assumed equal over the two levels which simplifies things. Apart from getting their means and variances, you can ask to plot s and fb. They correspond to the slope and intercept growth factors.
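The two-level setup above needs the data in long format, with one record per person per time point and time scored 0, 1, 2, and so on. As a hedged illustration, assuming the wide record is ordered by time point and then by item, a reshaping step could look like the Python sketch below. The function name, the ordering, and the toy sizes (2 items, 3 time points) are assumptions made for the example.

```python
def wide_to_long(person_id, wide_row, n_items, n_times):
    """Split one wide record into one row per time point.

    Assumes wide_row is ordered by time point, then by item.
    Each output row is [person_id, time, item_1, ..., item_n].
    """
    if len(wide_row) != n_items * n_times:
        raise ValueError("wide_row length must equal n_items * n_times")
    long_rows = []
    for t in range(n_times):
        items = wide_row[t * n_items:(t + 1) * n_items]
        long_rows.append([person_id, t] + list(items))
    return long_rows


# Toy example: 2 items measured at 3 time points for person 7.
rows = wide_to_long(7, [1, 0, 1, 1, 0, 0], n_items=2, n_times=3)
for row in rows:
    print(row)
```

In the thread's case the same idea would turn 15*8 columns per person into 8 rows of 15 items each, with the person identifier serving as the cluster variable.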
I have a dataset originating from a simple random sample of employees. When running CFAs, I'd like to simply control for the nested structure of the data, where employees are part of directorates (n=96), which are part of branches (n=24), which are part of divisions (n=3).
Now, I'm assuming I should use TYPE=COMPLEX and CLUSTER IS in this particular case. What should my cluster variable be? Post-stratification weights were computed based on divisions only; should I also use this weight variable in my analyses?
Having only 3 divisions is too few units for an extra level (you want as the bare minimum about 20). So use Type=Threelevel for employee, directorate, and branch. Divisions can be reflected by 2 dummy variables. Instead of weights, you can use the demographic variables that are used to form the weights as further covariates.
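Representing 3 divisions by 2 dummy variables means picking one division as the reference and creating a 0/1 indicator for each of the others. A minimal Python sketch of that coding follows; the division codes and the choice of reference category are hypothetical.

```python
def dummy_code(values, reference):
    """Return the non-reference levels and one 0/1 row per observation."""
    levels = sorted(set(values))
    levels.remove(reference)  # the reference category gets no dummy of its own
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


# Hypothetical division codes for four employees; division 1 is the reference.
divisions = [1, 2, 3, 1]
levels, dummies = dummy_code(divisions, reference=1)

print(levels)
print(dummies)
```

Employees in the reference division get zeros on both dummies, so the two dummy coefficients are read as differences from that division.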