

Hi support team, I am running a CFA across three waves of longitudinal data. I would like to follow the approach of A. Farrell (1994) to test the consistency of the measurement model across time. Therefore, I need the factor loadings of the latent constructs to be equal across the time points. How can I model this in Mplus without running a multiple-group CFA that treats the three time points as three groups? Thank you for your help!


Please see the multiple indicator growth example in the Topic 4 course handout. The first part of the example tests for measurement invariance across time. It is not correct to use the three time points as three groups because the groups would not be independent. 
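A minimal sketch of the single-level, wide-format setup the reply refers to, assuming three continuous indicators per wave. The data file name and the variable names (t1y1 = indicator 1 at wave 1, and so on) are hypothetical. Reusing the labels L2 and L3 holds the corresponding loadings equal across the three waves. The first loading at each wave is fixed at 1 by default. Fitting the same model without the labels gives the configural model to compare against.

```mplus
TITLE:    Three-wave CFA with equal loadings across time (sketch)
DATA:     FILE = wide.dat;       ! hypothetical file, one row per person
VARIABLE: NAMES = id t1y1-t1y3 t2y1-t2y3 t3y1-t3y3;
          USEVARIABLES = t1y1-t1y3 t2y1-t2y3 t3y1-t3y3;
MODEL:
  ! Same label = same estimated loading at every wave
  f1 BY t1y1
        t1y2 (L2)
        t1y3 (L3);
  f2 BY t2y1
        t2y2 (L2)
        t2y3 (L3);
  f3 BY t3y1
        t3y2 (L2)
        t3y3 (L3);
  ! Residuals of the same indicator are allowed to correlate across waves
  t1y1-t1y3 PWITH t2y1-t2y3;
  t2y1-t2y3 PWITH t3y1-t3y3;
  t1y1-t1y3 PWITH t3y1-t3y3;
```

All three waves stay in one row per person, so the dependence across time points is modeled directly rather than being ignored as it would be in a multiple-group setup.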

Elan Cohen posted on Tuesday, August 23, 2011  8:03 am



Dr. Muthen, I'm trying to run a longitudinal CFA over 15 years of data. There are 50 items per year (all dichotomous), 1200 observations, and 2 latent factors per year. 1. Does this seem feasible? (It seems like too many variables and not enough observations to me.) 2. Could you tell me if the following input will give me the correct model (shown only for 3 years of data)? MODEL: f1a5 by item1a5-item25a5; f2a5 by item26a5-item50a5; f1a6 by item1a6-item25a6; f2a6 by item26a6-item50a6; f1a7 by item1a7-item25a7; f2a7 by item26a7-item50a7; item1a5-item25a5 pwith item1a6-item25a6; item26a5-item50a5 pwith item26a6-item50a6; item1a6-item25a6 pwith item1a7-item25a7; item26a6-item50a6 pwith item26a7-item50a7; Thank you very much.


With that many items and time points, I would try a two-level approach. You would then have 50 variables and up to 15 members per cluster, where the cluster is id (subject). This assumes measurement invariance across time. You can then specify a random intercept and slope which vary across subjects to see how factor means change over time. You can look at UG ex9.16 and combine it with ex9.15.
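A rough sketch of the two-level setup described above, assuming long-format data with one row per subject and year. The file name, the variable year, and the factor names are hypothetical. The items are treated as continuous to keep the example small. Declaring the 50 dichotomous items CATEGORICAL would make the model considerably heavier to estimate. The random slopes s1 and s2 let the within-level factors change over time at different rates for different subjects.

```mplus
TITLE:    Two-level factor model with random time slopes (sketch)
DATA:     FILE = long.dat;        ! hypothetical file, one row per subject-year
VARIABLE: NAMES = id year item1-item50;
          USEVARIABLES = item1-item50 year;
          CLUSTER = id;           ! subjects are the clusters
          WITHIN = year;          ! time varies only within subjects
ANALYSIS: TYPE = TWOLEVEL RANDOM;
          ! Mplus may ask for ALGORITHM = INTEGRATION; for some variants
MODEL:
  %WITHIN%
  f1w BY item1-item25;            ! one set of loadings used for every year
  f2w BY item26-item50;
  s1 | f1w ON year;               ! subject-specific trend for each factor
  s2 | f2w ON year;
  %BETWEEN%
  f1b BY item1-item25;            ! subject-level (random intercept) factors
  f2b BY item26-item50;
  f1b WITH s1;
  f2b WITH s2;
```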

Susan Pe posted on Monday, June 25, 2012  12:33 pm



I am using panel data on 57 firms over 30 years (16 observations each year, although not all firms have 16 observations). I am trying to do a CFA, but I am not sure if this is correct. Other than the observed measures, I add CLUSTER IS firms; under VARIABLE and ANALYSIS: TYPE IS TWOLEVEL; I am not sure about the %within% and %between% commands, and which is more appropriate. I am also not sure whether I need to add WITHIN = TIME, or whether I should fix the latent variable @1. Thank you!!!


There are many ways to do this, but let me ask you some questions first. How many items are you considering at each time point, and are they continuous or categorical? Note also that longitudinal data need not be analyzed as two-level, but can be handled in a single-level approach where a wide instead of a long format is considered. See our handouts for Topics 3-4 in our courses.

Susan Pe posted on Tuesday, June 26, 2012  2:47 am



I have 7 items; some are continuous and some are categorical. Given that I have over 12000 observations x 7 items over 500 different time points, do you think a single-level approach with a wide format is doable?


No, 500 time points is not doable in wide form in the current Mplus. In the current Mplus version you want to take a two-level approach: time and firms. This has the disadvantage that you have to assume measurement invariance across time. In the upcoming Version 7 of Mplus, however, there will be more choices, such as a 3-level approach and also the choice of allowing random forms of measurement non-invariance.

Susan Pe posted on Wednesday, June 27, 2012  7:24 am



Thank you so much for your reply. Do you think I can do this in long format with cluster = firm; within = time; analysis: type = twolevel random; model: %within% latent variables by observed variables;? But how do I incorporate time in the model? Also, the gap between time points is typically 2 weeks, but it is longer (a couple of months) when the year changes. I see in the example that time is coded in the data as 0, 1, 2, 3, ... Is it a problem that the time between observations varies? Is it possible to code time as dates?


See Example 9.16 for the long format. 

Susan Pe posted on Tuesday, August 28, 2012  2:16 pm



Hi, I saw the example, but it's still not clear. Is this input correct? cluster = firmid; Analysis: Type = twolevel; Model: %within% RD by item1 item2 item3; Don't I need to specify %between%?


Example 9.16 is a growth model. You are showing a factor model. What exactly are you trying to do? 

Susan Pe posted on Wednesday, August 29, 2012  2:06 pm



I just want to confirm the factor structure by doing a CFA, not estimate a growth model. I have panel data, but because there are too many time points I cannot use the wide format, so I opted for a two-level model. Level 1 (WITHIN) is the data points over time; level 2 (BETWEEN) is firms. I want to validate my factor structure with a confirmatory factor analysis using longitudinal data. I am not particularly interested in BETWEEN- and WITHIN-level differences, so I am not sure which factor scores to look at or whether I am using the commands correctly.


If you have a measurement instrument that is given at many time points, you can do two-level factor analysis with data in long format. This assumes measurement invariance across time. You say: %within% fw by y1-y10; %between% fb by y1-y10; although you can have a different number of factors on the two levels.
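A complete input along these lines might look as follows, assuming ten continuous items and a firm identifier. The file name and variable names are hypothetical. Each row is one firm at one time point, so the Within factor describes covariation across time points within firms and the Between factor describes covariation across firms.

```mplus
TITLE:    Two-level CFA with long-format panel data (sketch)
DATA:     FILE = panel_long.dat;  ! hypothetical file, one row per firm and time point
VARIABLE: NAMES = firmid time y1-y10;
          USEVARIABLES = y1-y10;
          CLUSTER = firmid;
ANALYSIS: TYPE = TWOLEVEL;
          ESTIMATOR = MLR;
MODEL:
  %WITHIN%
  fw BY y1-y10;   ! variation across time points within firms
  %BETWEEN%
  fb BY y1-y10;   ! variation across firms
OUTPUT:   STANDARDIZED;
```

Items that are categorical would additionally be listed in a CATEGORICAL statement under VARIABLE.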

June Zhou posted on Wednesday, November 07, 2012  7:47 am



I have a question on the interpretation of longitudinal invariance CFA results. Can I interpret them as "The responses on the items are consistent across time, given that intercept invariance, or strong invariance, holds (factor loadings and thresholds were constrained)"?


If you have intercept and factor loading invariance, you can say that. 
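For continuous indicators, intercept invariance can be added to the loading constraints by giving the intercepts (in square brackets) shared labels in the same way. The factor mean is then fixed at 0 at the first time point and estimated at the later one. The sketch below assumes two waves and hypothetical variable names.

```mplus
TITLE:    Loading and intercept invariance across two waves (sketch)
DATA:     FILE = wide.dat;       ! hypothetical file
VARIABLE: NAMES = id t1y1-t1y3 t2y1-t2y3;
          USEVARIABLES = t1y1-t1y3 t2y1-t2y3;
MODEL:
  f1 BY t1y1
        t1y2 (L2)
        t1y3 (L3);
  f2 BY t2y1
        t2y2 (L2)
        t2y3 (L3);
  [t1y1 t2y1] (I1);    ! intercepts held equal over time
  [t1y2 t2y2] (I2);
  [t1y3 t2y3] (I3);
  [f1@0 f2*];          ! time-1 mean as reference, time-2 mean estimated
```

For categorical items, the thresholds (for example [t1y1$1 t2y1$1] (T1);) take the place of the intercepts. That case involves further parameterization choices that this sketch does not cover.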

June Zhou posted on Sunday, November 11, 2012  9:24 am



Thank you so much for your reply, Dr. Muthen. 


Dear Linda and Bengt, I am looking for help coding a two-level longitudinal CFA which uses a long data structure. Level 1 = time and level 2 = participant. For example: Title: Two-level multilevel CFA with long data Data: File is example.dat; Variable: Names are id time y1 y2 y3; ! Assume that I have 4 time points, i.e., within each y-variable column participants have four entries, each pertaining to a separate time point. WITHIN = time; CLUSTER = id; ANALYSIS: TYPE = TWOLEVEL; MODEL: %WITHIN% f1 BY y1 y2 y3; %BETWEEN% f1 on time; I realise the inputs to %WITHIN% and %BETWEEN% are incorrect. Where would I estimate a CFA for each time point, and where would I then estimate its slope? One cannot estimate the factor at the within level (UG example 9.15) with a long data structure (UG example 9.16). Any guidance with specific code would be greatly appreciated. Kind regards, Matthew


Time is a Within variable, so say: %WITHIN% f1 BY y1 y2 y3; f1 on time; %BETWEEN% f1b by y1-y3; And perhaps hold the loadings equal across levels. The long, two-level approach assumes measurement invariance across time.
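Written out as a full input, the reply's suggestion might look like this, reusing Matthew's file and variable names. The labels L2 and L3 implement the optional step of holding the loadings equal across levels. The first loading on each level is fixed at 1 by default.

```mplus
TITLE:    Two-level CFA with long data and a time trend (sketch)
DATA:     FILE = example.dat;
VARIABLE: NAMES = id time y1 y2 y3;
          WITHIN = time;          ! time varies within participants
          CLUSTER = id;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  f1 BY y1
        y2 (L2)
        y3 (L3);
  f1 ON time;                     ! linear trend in the within-level factor
  %BETWEEN%
  f1b BY y1
         y2 (L2)                  ! same labels = loadings equal across levels
         y3 (L3);
```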


Dear Bengt, Many thanks for your response. There are a few things I would like to clarify. I am having problems interpreting the CFA results at the within and between levels in the model you describe. From my understanding, the %within% portion of the model estimates a factor which predicts the (co)variation in Y1-Y3, based on data organized by the four time points, and how this factor changes over time. Latent variables based on random intercepts for Y1-Y3 are then used as factor indicators at the %between% level. In other words, the %between% portion reflects how participant ID predicts (co)variation in the random intercepts for Y1-Y3. Critically, whilst the %within%-level CFA uses observations from each time point, the time points are clustered by participant ID. Q1: To understand how a factor changes over time, is the %within% portion enough, since it is a hybrid of within-level (time) and between-level (cluster ID) data? Q2: What is the point of %between%? The model works fine without it, except that the model fit estimates reduce greatly. Q3: Would I add covariates, both time-varying and time-invariant, to the %within% portion of the model? Q4: Which CFA results does one use to understand the factors which predict data for a single group of participants collapsed across time points, and how that factor changes over time?


The Within level captures the variation across time and the Between level the variation across subjects. This gives the answers: Q1: The Within level is not a hybrid. Q2: You certainly want to know how big the variation is across subjects. Q3: Time-varying covariates go on Within and time-invariant covariates go on Between. Q4: I will send you a paper of mine that is coming out in SM&R. See the section on random effects and how intercept variation across subjects can be seen as a form of measurement invariance.
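A sketch of the Q3 answer, extending the model from Bengt's earlier reply with one time-varying covariate (tvc) and one time-invariant covariate (tic). Both names and the data file are hypothetical placeholders, not from the thread. The WITHIN and BETWEEN lists tell Mplus on which level each covariate is modeled.

```mplus
TITLE:    Two-level CFA with time-varying and time-invariant covariates (sketch)
DATA:     FILE = example_cov.dat; ! hypothetical file that also holds tvc and tic
VARIABLE: NAMES = id time tvc tic y1 y2 y3;
          WITHIN = time tvc;      ! vary across time points
          BETWEEN = tic;          ! constant within a participant
          CLUSTER = id;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  f1 BY y1-y3;
  f1 ON time tvc;                 ! time-varying predictors of the within factor
  %BETWEEN%
  f1b BY y1-y3;
  f1b ON tic;                     ! time-invariant predictor of the between factor
```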


Dear Bengt, Thanks so much for clarifying that. It makes sense now: a within-level factor predicts the covariances in an indicator across time, whereas the between-level factor predicts the covariances in an indicator across people. I have a few follow-up questions: 1. In the two-level longitudinal model, what does it mean to predict a factor with time (e.g., ‘F1 on Time;’)? Does the resultant beta value indicate a factor’s temporal stability? 2a. A single-level linear growth approach (i.e., assessing how a factor changes across time) suits my needs, but what is it about the factor that changes? In the command f1@0 f2@1 f3@2; are we estimating how the factor’s predictive strength of covariances in a set of indicators changes across time? 2b. When we correlate two factors (e.g., f1 WITH z1), are we examining the covariance between their predictive strengths? 2c. A factor’s predictive strength is different from factor scores, which estimate one’s position on a factor (regardless of its predictive strength), right? To examine how factor scores change over time, we would have to extract scores (e.g., ‘SAVE IS Fscores;’) and use those as the observed measures of a growth model. Your thoughts would be most appreciated. Matthew


I don't think you should try to learn this topic through a series of Discussion questions. Instead, find papers/books about it, such as Grimm's new growth book. But here are brief answers: 1. It means that F1 has a linear trend that you want to describe. 2a. See our multiple-indicator growth video from the Hopkins short courses. 2b-2c. Ask general analysis questions not pertaining to Mplus specifically on SEMNET.
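For reference, a single-level multiple-indicator linear growth model in wide format might be set up as below, assuming three indicators at each of three time points and hypothetical variable names. Loadings and intercepts are held equal over time so that the growth factors i and s describe change in the same factor. The intercept growth factor mean is fixed at 0 for identification and the slope mean is estimated. This is a sketch of the general technique, not a reproduction of the course example.

```mplus
TITLE:    Multiple-indicator linear growth model (sketch)
DATA:     FILE = wide.dat;       ! hypothetical file
VARIABLE: NAMES = id t1y1-t1y3 t2y1-t2y3 t3y1-t3y3;
          USEVARIABLES = t1y1-t1y3 t2y1-t2y3 t3y1-t3y3;
MODEL:
  f1 BY t1y1
        t1y2 (L2)
        t1y3 (L3);
  f2 BY t2y1
        t2y2 (L2)
        t2y3 (L3);
  f3 BY t3y1
        t3y2 (L2)
        t3y3 (L3);
  [t1y1 t2y1 t3y1] (I1);   ! equal intercepts over time
  [t1y2 t2y2 t3y2] (I2);
  [t1y3 t2y3 t3y3] (I3);
  i s | f1@0 f2@1 f3@2;    ! linear time scores for the factors
  [i@0 s];                 ! intercept factor mean fixed, slope mean estimated
```

With loadings and intercepts held equal, the slope mean describes change in the factor mean itself rather than in how strongly the factor relates to its indicators.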


Bengt, I have been reading Grimm's 'Growth Modeling' book and would recommend it to anyone running growth models in Mplus. Thank you for the recommendation. For those new to longitudinal CFA, I want to clarify a few things which tripped me up: 1. There are two general approaches to growth modelling: multilevel modelling (MLM) and structural equation modelling (SEM). Statistically, these approaches are very similar. Practically, they each have their pros and cons. Mplus can run both types of techniques (and even integrate them; see UG 9.10).


2. We create factors or latent variables in growth models even if our goal isn’t a CFA. For example, to understand how math ability changed over time, I would estimate a slope and an intercept, both of which are latent variables, for maths scores at each time point (TP). This can be confusing for those who want to run a CFA over time, because we're essentially creating latent growth variables of latent factors. 3. A CFA run via the SEM approach (e.g., F1 by depr1-depr25; F2 by anx1-anx14;) requires a sufficient number of indicators to identify the model. Running a CFA for each TP (e.g., F1a by depr1_1-depr1_25; F1b by depr2_1-depr2_25; F2a by anx1_1-anx1_14; F2b by anx2_1-anx2_14; i_depr s_depr | F1a@0 F1b@1; i_anx s_anx | F2a@0 F2b@1;) is thus a resource-hungry process, especially when we have at least 3 TPs and a questionnaire's worth of indicators. An alternative is to use MLM (‘two/three-level’ in Mplus) with long data. This allows you to estimate a single factor across TPs (rather than a factor per TP). But you are also partitioning the variance, such that a within-level factor is derived from covariances between TPs lumped across participants (see ‘P data’), whereas the between-level factor is based on inter-individual differences in covariances lumped across TPs.


Bengt, Your guidance has been invaluable thus far. I wondered whether you could address the following: 1. Is there a way of running nonlinear two-level CFAs for long data (where WITHIN = Time;)? I suspect that we'd have to transform the column of time values from linear (e.g., 1, 2, 3, 4) to nonlinear (e.g., 1, 1.3, 1.7, 2.4, 3.9). 2. How can we even tell if a within-level factor has a nonlinear trend, if we cannot estimate means at each time point? Many thanks, Matthew


1. CFAs that have indicators which are nonlinear in the factors can be handled via XWITH. Models that are nonlinear in time can be handled via time scores as you indicate, or more generally as shown in the Grimm book. 2. You can estimate factor means if you do the analysis in wide format. Or, using a much more advanced approach, you can use cross-classified growth modeling (I don't volunteer to guide you through it, though; see our Utrecht Aug 2012 short course handouts and videos); see also Version 8, coming out next week. Given that you didn't ask questions in your first 2 postings, I will let it go that you violated our rule of posting in no more than one window.
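One way to let a within-level factor follow a curved trend in the long, two-level setup is to add a squared time term created with DEFINE, assuming the file and variable names from Matthew's earlier input. This quadratic version is one illustrative choice. Recoding the time column with nonlinear time scores, as suggested in the question, is another.

```mplus
TITLE:    Two-level CFA with a quadratic time trend (sketch)
DATA:     FILE = example.dat;
VARIABLE: NAMES = id time y1 y2 y3;
          USEVARIABLES = y1 y2 y3 time time2;   ! DEFINE variables go last
          WITHIN = time time2;
          CLUSTER = id;
DEFINE:   time2 = time*time;     ! squared time term
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  f1 BY y1-y3;
  f1 ON time time2;              ! linear and quadratic trend
  %BETWEEN%
  f1b BY y1-y3;
```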


Dr. Muthen, After reading several threads, I am still unsure how to navigate the modelling of a longitudinal CFA with clustering. Set-up: I am working with a latent factor "governance" which is measured by 6 manifest indicators (VA PS GE RQ RL CC). Additionally, these data are drawn from multiple countries (ccode) at multiple time periods (year). So far, this is my model: variable: names are ccode year VA PS GE RQ RL CC; Missing are all (999); usevariables = year VA PS GE RQ RL CC; cluster = ccode; within = year; analysis: type = twolevel random; estimator = mlr; However, the more I read up on the %within% and %between% commands, the less I understand exactly how to code these for my specific problem. Any assistance you could provide would be appreciated. Justin


Are the data for the multiple time points collected for the same individuals or different ones? How many countries do you have? 


The data are collected for the same individuals across time (the countries/territories), and there are about 180 of them.


If you don't have too many time points, you can approach it as on slide 165 of our handout for our Topic 1 short course, that is, in wide, single-level format. Then add the two-level feature for the 180 clusters.


Dear Bengt and Linda, Goal: Compare the model fit of a bifactor, a second-order, and a one-factor CFA. Implementation: two-level long-data approach (a wide-data approach is too computationally taxing, and I want to use all available data points). Example syntax (bifactor): https://goo.gl/EybTmW Questions: a. Do you see any problems with the syntax? b. DIFFTEST is not available for two-level models: is there another way to compare two-level models (either via Mplus or manually)? c. I will test how the slopes differ for 2 treatment groups by first estimating the random slopes (S_GEN S_SPEC1 S_SPEC2 S_SPEC3 | GEN SPEC1 SPEC2 SPEC3 by TIME;) and then regressing group membership on each slope at the %between% level. The analysis requires 8 integration points and crashes with a ‘memory space’ error. Even if I reduce the number of integration points, could you advise on a computationally simpler way to compare the slopes between groups in a two-level, long-format CFA? d. Is it possible to increase the speed of each iteration (e.g., via hardware improvements)? e. I seem to be able to run two analyses simultaneously without a cost in speed. Are there any issues with this? What if I were to run 3-4 analyses, each with a ‘very heavy’ number of dimensions of integration?


When the syntax doesn't fit in one window, please send the output to Support along with your license number.


I'm comparing bifactor, second-order, and one-factor CFAs with growth factors using a two-level long-data approach. Example syntax (bifactor): USEVARIABLES = TIME ID V1-V20; CATEGORICAL = V1-V20; MISSING ARE all (999); CLUSTER = ID; WITHIN = TIME V1-V20; BETWEEN = ; ANALYSIS: TYPE = TWOLEVEL; ESTIMATOR = WLSMV; MODEL = NOCOVARIANCES; MODEL: %within% GEN by V1* V2-V20; GEN@1; SPEC1 by V1* V2-V10; SPEC1@1; SPEC2 by V10* V11-V20; SPEC2@1; GEN SPEC1 SPEC2 by TIME; %between% ; SAVEDATA: DIFFTEST IS BIFAC.dat; a. DIFFTEST is not available for two-level models: is there another way to compare two-level models (either via Mplus or manually)? c. I will test how the slopes differ for 2 treatment groups by first estimating the random slopes (S_GEN S_SPEC1 S_SPEC2 | GEN SPEC1 SPEC2 on TIME;) and then regressing group membership on each slope at the %between% level. The analysis requires 8 integration points and crashes with a ‘memory space’ error. Even if I reduce the number of integration points, could you advise on a computationally simpler way to compare the slopes between groups in a two-level, long-format CFA?


If you are using an estimator that requires DIFFTEST, I know of no other way to do a difference test. 
