

Hello, I am attempting to conduct a multilevel CFA with Mplus. Repeated-measures data from daily surveys are nested within individuals, so I have a two-level model with a series of repeated measures on the within-individual level and individual differences on the between-individual level. Now I'd like to conduct a CFA with level-1 variables that were measured 18 times. I guess the basic syntax for my analysis might be Example 9.6 in the handbook, but I have a very simple question concerning this syntax:

TITLE: this is an example of a two-level CFA with continuous factor indicators and covariates
DATA: FILE IS ex9.6.dat;
VARIABLE: NAMES ARE y1-y4 x1 x2 w clus;
WITHIN = x1 x2;
BETWEEN = w;
CLUSTER = clus;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
%WITHIN%
fw BY y1-y4;
fw ON x1 x2;
%BETWEEN%
fb BY y1-y4;
y1-y4@0;
fb ON w;

I am not sure which variable I should use for "w" in my data. Actually, I am interested in testing variables y1-y11 (measured on level 1) and would like to figure out whether a one-factor or a two-factor model is appropriate. So I do not understand which variables in our model would parallel "w" in this example. I would be grateful for your help! Thanks in advance.


It doesn't sound like you have a w or an x in your example. And given that you can treat longitudinal data as single-level data, you may also not need two-level modeling. It is not clear how many items you measure per time point. If only 1 item, then the number of items is the number of time points in your single-level factor analysis. If a set of items, then you have a factor model at each time point, like the top part of UG ex 6.14.


I am asking for advice on how I should model a multilevel CFA. I have a model where subjects observe variables under specific conditions, hence I have repeated measures (conditions are within subjects). The dataset is structured as follows:

Subj  VarX  Condition
111   ...   11
...   ...   12
...   ...   13
...   ...   ...
222   ...   21
...   ...   22
...   ...   23
...   ...   ...
333   ...   11
...   ...   12
...   ...   13
...

The total number of conditions is 16 and each subject observes only 8 of them (random block). What is the most appropriate example I should follow in the manual? I am avoiding complex-sample techniques (with sandwich estimation) because I want to capture the variance due to subjects as well as the variance explained by the different conditions. Thanks in advance.


With data in the wide format, multivariate modeling takes care of the fact that several variables are measured for each person. The 8 conditions not measured should be represented as missing data. There is no need for multilevel modeling. 
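As a rough illustration of the wide layout described in this reply, here is a minimal Python sketch (not Mplus syntax) that pivots long records into one row per subject and fills unobserved conditions with a missing-value flag. The helper `long_to_wide`, the toy data, and the flag -999 are assumptions made for illustration only.

```python
MISSING = -999  # hypothetical missing-value flag chosen for this sketch


def long_to_wide(records, conditions, missing=MISSING):
    """Pivot (subject, condition, value) triples into one row per subject.

    Conditions a subject never observed are filled with the missing flag.
    """
    wide = {}
    for subj, cond, value in records:
        # Start each new subject with every condition marked as missing.
        row = wide.setdefault(subj, {c: missing for c in conditions})
        row[cond] = value
    return wide


# Toy data: 4 possible conditions, each subject observes only 2 of them.
conditions = [1, 2, 3, 4]
records = [
    (111, 1, 3.2), (111, 2, 4.1),
    (222, 3, 2.7), (222, 4, 3.9),
]
wide = long_to_wide(records, conditions)
for subj in sorted(wide):
    print(subj, [wide[subj][c] for c in conditions])
```

Each printed row has one column per condition, so a subject who saw only half of the conditions contributes missing values for the rest rather than dropping out of the analysis.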


I don't understand exactly what you mean. If you mean to model all the variables in the CFA after transposing the dataset, such as Subj VarXCond11 VarXCond12 ..., then my model would have an enormous number of variables and won't converge, as I would have more parameter estimates than observations. Instead, I was thinking about using a two-level CFA where I would cluster on the subject, but I still need another level to account for the condition. Is there any possibility of running a kind of random-effects CFA model?


The parameter reductions you get by using the long format impose measurement invariance, something you cannot then test. You should consider this. You can impose those same restrictions in the wide format. 


It's an impossible model to run (in the wide format), as I would have 16 conditions x 32 variables and would end up with 512 variables for ~220 observations. I think the CFA would have >1,500 parameter estimates. What kind of restrictions were you suggesting? Loadings, error variances, and factor variance held equal across the items/variables and freely estimated across the conditions (being orthogonal?)


It sounds like you have 32 variables per condition and 8 conditions per person. If so, do the 32 variables measure a single factor in all conditions? If so, can it be assumed that the factor indicators (the 32 variables) have measurement invariance (at least loadings, perhaps also intercepts) across all 16 conditions?
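To make the parameter counts in this exchange concrete, the following Python sketch tallies free parameters for a wide-format CFA with one factor per condition, with and without loading and intercept invariance. The function name and the model structure it assumes (first loading of each factor fixed at 1, all factor covariances free, no residual covariances, factor means free in all but one condition under invariance) are illustrative assumptions, not the count Mplus would necessarily report for a particular setup.

```python
def count_cfa_parameters(n_conditions, n_items, invariant=False):
    """Count free parameters for a wide-format CFA, one factor per condition.

    Assumed structure (illustration only): first loading per factor fixed
    at 1, free residual variances, all factor variances and covariances
    free, no residual covariances.
    """
    k, p = n_conditions, n_items
    factor_cov = k * (k + 1) // 2      # factor variances and covariances
    residual_var = k * p               # one residual variance per variable
    if invariant:
        loadings = p - 1               # one shared set of free loadings
        intercepts = p                 # one shared set of intercepts
        factor_means = k - 1           # means free except in a reference condition
    else:
        loadings = k * (p - 1)
        intercepts = k * p
        factor_means = 0               # factor means fixed at 0
    return loadings + intercepts + factor_means + residual_var + factor_cov


print(count_cfa_parameters(16, 32))                  # 1656
print(count_cfa_parameters(16, 32, invariant=True))  # 726
```

Under these assumptions the unrestricted count is consistent with the ">1,500" figure quoted above, and invariance of loadings and intercepts removes more than half of the parameters, although 512 residual variances remain.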


Hello Mplus Team: I am running the following multilevel mediational model: observed within- and between-level covariates -> multilevel CFA (leading to their respective parts) -> 2 individual measured outcome variables (with random intercepts; fixed slopes). I have grand-mean centered the within-level covariates and I have manifest group averages of the same covariates at the between level. As I understand it, by default a multilevel CFA is essentially constructed by having group-mean centered within-level indicators, and the between-level indicators are a latent mean average that contains both within- and between-group variance - please correct me if I am wrong here. By having the within-level covariates grand-mean centered, am I getting contextual effects at the between level without putting any constraints on the multilevel CFA indicators? If I do need to put constraints on the MLCFA, what would be the best way to go about doing this? Ideally, I would like to get only within-level variation on the within level and between-level variation on the between level. Thank you for any help.


Multilevel FA formulates a model for the population SigmaW and SigmaB, corresponding to the within and between variation decomposition. In terms of the analysis, the factor indicators are not centered or averaged. The decomposition is in line with one-way random-effects ANOVA: y_{ij} = eta_j + epsilon_{ij}, where eta_j has between variance and epsilon_{ij} has within variance. In line with Raudenbush & Bryk (2002), page 140, you get contextual effects on the between level when you grand-mean center, as opposed to group-mean center, the covariate.
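The one-way random-effects decomposition y_{ij} = eta_j + epsilon_{ij} can be illustrated with a small Python simulation. This is only a sketch: it uses balanced data and the classical ANOVA (method-of-moments) estimators rather than the maximum-likelihood estimation Mplus performs, and the true variances (0.5 between, 1.0 within), the seed, and the function name are assumptions chosen for the example.

```python
import random


def anova_variance_components(groups):
    """Balanced one-way random-effects ANOVA estimates.

    Returns (between-cluster variance, within-cluster variance).
    """
    n_clusters = len(groups)
    n = len(groups[0])  # common cluster size (balanced design assumed)
    grand = sum(sum(g) for g in groups) / (n_clusters * n)
    means = [sum(g) / n for g in groups]
    ss_between = n * sum((m - grand) ** 2 for m in means)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_between = ss_between / (n_clusters - 1)
    ms_within = ss_within / (n_clusters * (n - 1))
    # E[MS_between] = sigma2_within + n * sigma2_between
    var_between = max((ms_between - ms_within) / n, 0.0)
    return var_between, ms_within


random.seed(1)
n_clusters, n = 2000, 10
groups = []
for _ in range(n_clusters):
    eta = random.gauss(0.0, 0.5 ** 0.5)  # between part, variance 0.5
    groups.append([eta + random.gauss(0.0, 1.0) for _ in range(n)])  # within variance 1.0

var_b, var_w = anova_variance_components(groups)
icc = var_b / (var_b + var_w)
print(f"between={var_b:.3f} within={var_w:.3f} icc={icc:.3f}")
```

With many clusters the estimates should land close to the generating values, and the intraclass correlation close to 0.5 / 1.5 = 1/3; no centering or averaging of the observed variable is needed to recover the two components.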

Sandra N. posted on Wednesday, September 21, 2011 - 8:16 am



Hi, I conducted a multilevel CFA with 6 latent factors, each specified by 4 indicators on the within and between level (12 latent variables and 24 manifest variables in total). I used data from 620 students nested in 45 classes. This model converged fine and model fit was satisfactory (RMSEA 0.035, CFI/TLI 0.928, 0.917, SRMR within 0.039, SRMR between 0.125). However, I obtained the following error:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.125D-15. PROBLEM INVOLVING PARAMETER 46. THE NONIDENTIFICATION IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS. REDUCE THE NUMBER OF PARAMETERS.

I tried to reduce the number of parameters, but it seems I cannot get them below 45, so I am not sure how to deal with that error. Is the model trustworthy (loadings, s.e.)? Furthermore, I obtained 3 negative residual variances (between -0.007 and -0.028) on the between level. As those values are small, I fixed them to zero. Could you give me some advice on how to deal with those Heywood cases in reporting the results, as I am currently working on a publication of these analyses? Is the model trustworthy despite the Heywood cases? Thank you for your help. Sandra


It is common to have small residual variances on the between level and common to fix these to zero. See the following paper, which is available on the website: Muthén, B. & Asparouhov, T. (2011). Beyond multilevel regression modeling: Multilevel analysis in a general latent variable framework. In J. Hox & J.K. Roberts (eds), Handbook of Advanced Multilevel Analysis, pp. 15-40. New York: Taylor and Francis. Regarding the other problem, it is not known what effect having more parameters than clusters has on model results. This has not been studied. You would need to do a simulation to see this.

sarah posted on Tuesday, August 28, 2012 - 12:20 pm



Hi, I am doing something similar to this topic's first message posted by Antje. I have 20 individuals measured every month for 30 years (30 x 12 = 360 observations). I have 3 items and want to do a CFA. I don't think I can with my data because of the many observations, so I need to go multilevel. I looked at ex. 6.14 and also 9.15 and 9.16, but it's unclear. My data is set up in long format, with each line containing one individual at one time period (with 3 items, time, cluster ID). In 9.16: WITHIN = time a3; BETWEEN = x1 x2; What would my MODEL syntax be for within and between?


There will be a new method in Version 7 that can handle your situation.

sarah posted on Wednesday, September 05, 2012 - 12:20 pm



Hi, I need to do what I wrote in the previous post ASAP for my dissertation. When is Version 7 coming out? Is it going to be in September or October? Is it going to allow me to conduct multilevel analysis and still test for measurement invariance across time? This would definitely solve my problem.


While waiting, you may want to study the handouts from the Version 7 course last week in Utrecht posted at http://www.statmodel.com/v7workshops.shtml 


Late summer or early fall. 


Hi Dr. Muthens, I'm now analyzing a two-level CFA. The model is a one-factor model. My code is:

title: this is mCFA for daily ego depletion, job engagement, CWB
data: file is mCFA.txt ;
variable: names = x1-x3 y1-y3 z1-z4 id ;
usevariables = x1-x3 y1-y3 z1-z4 id ;
cluster = id ;
analysis: type=twolevel ;
estimator = MLR ;
model:
%within%
fw BY x1-x3 y1-y3 z1-z4 ;
%between%
fb BY x1-x3 y1-y3 z1-z4 ;
x1-x3@0 ; y1-y3@0 ; z1-z4@0 ;
output: tech1 tech8 ;

Then the result said:

THE LOGLIKELIHOOD DECREASED IN THE LAST EM ITERATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.

What does this mean? Please let me know!


Please send your output and license number to support@statmodel.com. 


Dear Dr. Muthens, I am trying to analyze a two-level CFA with two second-order factors with the same lambdas for both levels:

analysis: type = twolevel;
model:
%within%
kh_w by ...
dk_w by ...
cc_w by kh_w@1 dk_w (cc2);
%between%
kh_b by ...
dk_b by ...
cc_b by kh_b@1 dk_b (cc2);

I get the following:

MAXIMUM LOGLIKELIHOOD VALUE FOR THE UNRESTRICTED (H1) MODEL IS 118332.266

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.135D-12. THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 43.

(Parameter 43 is Psi of cc_w with cc_w.) Can you help me? Kind regards.


You can't identify a second-order factor with only 2 first-order factors as indicators; you need at least 3.
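A simple counting argument shows why 2 first-order factors are not enough. The Python sketch below compares the number of distinct variances and covariances among the first-order factors with the number of free parameters in the second-order part of the model. It assumes one second-order factor whose first loading is fixed at 1, considers a single level in isolation, and checks only a necessary condition for identification; the function name is hypothetical.

```python
def second_order_counts(n_first_order):
    """Information vs. free parameters for one second-order factor.

    Information: distinct variances/covariances among first-order factors.
    Parameters: free second-order loadings (first one fixed at 1), the
    second-order factor variance, and one disturbance variance per
    first-order factor.
    """
    m = n_first_order
    information = m * (m + 1) // 2
    parameters = (m - 1) + 1 + m
    return information, parameters


for m in (2, 3, 4):
    info, params = second_order_counts(m)
    if params > info:
        status = "under-identified"
    elif params == info:
        status = "just-identified"
    else:
        status = "over-identified"
    print(m, info, params, status)
```

With 2 first-order factors there are 3 pieces of information but 4 parameters; with 3 the counts match at 6, which is the minimum referred to in the reply.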

Joao Garcez posted on Wednesday, September 27, 2017 - 12:29 pm



Dear Dr Muthens, I want to explore the model fit of latent variables prior to testing a structural model, and I want to account for the clustering in the data (45 clusters). I saw that in the manual this is how a multilevel CFA is defined:

%within%
YW BY X1-X5;
%between%
YB BY X1-X5;
X1-X5@0;

However, I am not interested in modelling the measurement model on the between level; I just want to account for the clustering. Would it be wrong to have it only on the within level, leaving the between level empty and allowing Mplus to estimate only the variances on the between level, like below?

%within%
YW BY X1-X5;
%between%

Thank you.


You want to allow the X1-X5 random intercepts to correlate on between. A 1-factor model is a good way to accomplish this.

Joao Garcez posted on Thursday, September 28, 2017 - 12:20 am



Dear Drs Muthen & Muthen, Thank you for your quick answer and availability. I really appreciate it. Have a nice day. 


I'm attempting to run a CFA of a scale (2 factors of ~10 indicators each). However, I have 4 repeated measures of this scale. Running a multiple-indicator CFA in wide form results in more estimated parameters than observations. Can I just run the CFA at the first time point without considering the repeated measures? What are my other options?


I'm attempting to run a CFA of a scale (2 factors of ~10 indicators each). However, I have 4 repeated measures of this scale. Running a multiple-indicator CFA in wide form results in more estimated parameters than observations. Is it appropriate to run the CFA at the first time point only? What are my other options?


To reduce the number of parameters, you can assume measurement invariance across the 4 time points (but it's a strong assumption). Under this assumption, you can do the analysis in a 2-level, long format with time as level 1 and subject as level 2. Or, you can do 2 time points at a time in wide format.


Thank you for your response, Dr. Muthen. Would the 2-level approach look something like this, where all the x items are in long form, indexed by time?

USEVARIABLES ARE id x1-x9;
MISSING ARE ALL (999);
CLUSTER = id;
ANALYSIS: TYPE = TWOLEVEL;
ESTIMATOR = MLR;
MODEL:
%WITHIN%
fwithin1 by x1@1 x2 (1) x3 (2) x4 (3);
fwithin2 by x5@1 x6 (4) x7 (5) x8 (6) x9 (7);
%BETWEEN%
fbetween1 by x1@1 x2 (1) x3 (2) x4 (3);
fbetween2 by x5@1 x6 (4) x7 (5) x8 (6) x9 (7);


That's one way of doing it. Assuming invariance of loadings across levels as you do, you may instead want to hold all loadings equal across levels, setting the metric by fwithin1@1 and fwithin2@1, and let the fbetween factors have free variances to be estimated.


Thank you! Do you mean something like this? From what I understand, the code below allows you to estimate the factor variation at the between level (and not at the within level, because we have assumed invariance across time).

ANALYSIS: TYPE = TWOLEVEL;
ESTIMATOR = MLR;
MODEL:
%WITHIN%
fwithin1 by x1 (4) x2 (1) x3 (2) x4 (3);
fwithin1@1;
fwithin2 by x5 (8) x6 (4) x7 (5) x8 (6) x9 (7);
fwithin2@1;
%BETWEEN%
fbetween1 by x1 (4) x2 (1) x3 (2) x4 (3);
fbetween2 by x5 (8) x6 (4) x7 (5) x8 (6) x9 (7);


Right. 


You can then see how much smaller or larger the between factor variances are relative to the within factor variances, which are fixed at one.


Dear Dr. Muthen, I'm very new to using Mplus and struggling a lot with the multilevel CFA. I used a diary study to collect my data, so I now have repeated measures (5 consecutive days). That's why I think I have to analyze my CFA with a two-level model, with Level 1 being the subject and Level 2 being the day. Am I correct to think that the cluster should then be the variable "day"? My code looks as follows:

Analysis: Type = twolevel;
estimator=mlf ;
ALGORITHM=EM;
Model:
%Within%
Perm_home_w by PE01_01 PE01_02R PE01_03 PE01_04R PE01_05R PE01_06 PE01_07 PE01_08 ;
Perm_work_w by PW01_01R PW01_02R PW01_03 PW01_04 PW01_05R PW01_06R PW01_07 PW01_08;
Perm_home_w with Perm_work_w;
%Between%
Perm_home_b by PE01_01 PE01_02R PE01_03 PE01_04R PE01_05R PE01_06 PE01_07 PE01_08;
Perm_work_b by PW01_01R PW01_02R PW01_03 PW01_04 PW01_05R PW01_06R PW01_07 PW01_08;
Perm_home_b with Perm_work_b;

But I receive this error message:

THE VARIANCE OF PE01_02R APPROACHES 0. FIX THIS VARIANCE AND THE CORRESPONDING COVARIANCES TO 0, DECREASE THE MINIMUM VARIANCE, OR SPECIFY THE VARIABLE AS A WITHIN VARIABLE. [....]

I tried to run the analysis without the item PE01_02R, but it makes no difference. I still get an error message, "complaining" about other items. Thank you in advance!


Yes, cluster=day. Try fixing all between-level residual variances at zero. They are typically not very big. Try using MLR instead of MLF.

shuang posted on Sunday, November 18, 2018 - 11:53 pm



Dear Dr. Muthen, I did a multilevel CFA. However, SRMR at the between level is very high. Could you advise what the problem is and how to fix it, please? Thank you.

SRMR (Standardized Root Mean Square Residual)
Value for Within 0.058
Value for Between 0.424

The code I used is as follows.

ANALYSIS: TYPE = TWOLEVEL;
H1ITERATIONS = 5000;
MODEL:
%WITHIN%
TPerfW BY TPERF1 TPERF2 TPERF3 TPERF4 TPERF5 TPERF6;
CPerfW BY CPERF1 CPERF2 CPERF3 CPERF4 CPERF5 CPERF6 CPERF7;
SCDW BY SCD1 SCD2 SCD3 SCD4 SCD5 SCD6;
EEW BY EE1 EE2 EE3 EE4;
EEW WITH SCDW TPerfW CPerfW;
SCDW WITH TPerfW CPerfW;
TPerfW WITH CPerfW;
%BETWEEN%
TPerfB BY TPERF1 TPERF2 TPERF3 TPERF4 TPERF5 TPERF6;
CPerfB BY CPERF1 CPERF2 CPERF3 CPERF4 CPERF5 CPERF6 CPERF7;
TPERF1 TPERF2 TPERF3 TPERF4 TPERF5 TPERF6 @ 0;
SCD1 SCD2 SCD3 SCD4 SCD5 SCD6 @ 0;
OUTPUT: SAMPSTAT RESIDUAL TECH1 STANDARDIZED;


SRMR on between can be high when the number of clusters is not large, even while the chi-square for the model is still good. You may want to consult SEMNET for analysis strategies.
