

Hi support team, I am running a CFA across three waves of longitudinal data. I would like to follow the approach of A. Farrel (1994) to test the consistency of the measurement model across time. I therefore need the factor loadings of the latent constructs to be equal across time points. How can I model this in Mplus without running a multiple-group CFA that treats the three time points as three groups? Thank you for your help!!


Please see the multiple indicator growth example in the Topic 4 course handout. The first part of the example tests for measurement invariance across time. It is not correct to use the three time points as three groups because the groups would not be independent. 
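As a rough illustration of the kind of setup the answer above points to, here is a minimal sketch of loading and intercept invariance across three waves in wide, single-level format. The labeling style follows the multiple-indicator growth example in the User's Guide; the file name and the variable names (y11-y33, first digit = indicator, second digit = wave) are assumptions made for illustration only.

```
TITLE:    Longitudinal CFA, three waves, wide format (sketch)
DATA:     FILE = waves.dat;          ! hypothetical file name
VARIABLE: NAMES = y11 y21 y31        ! wave 1 indicators
                  y12 y22 y32        ! wave 2 indicators
                  y13 y23 y33;       ! wave 3 indicators
MODEL:    f1 BY y11
                y21-y31 (1-2);       ! loadings held equal across waves
          f2 BY y12
                y22-y32 (1-2);
          f3 BY y13
                y23-y33 (1-2);
          [y11 y12 y13] (3);         ! intercepts held equal across waves
          [y21 y22 y23] (4);
          [y31 y32 y33] (5);
          [f1@0 f2 f3];              ! wave 1 factor mean fixed, later means free
```

Removing the equality labels gives the unconstrained comparison model, so the invariance test amounts to comparing the fit of the two runs.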

Elan Cohen posted on Tuesday, August 23, 2011  8:03 am



Dr. Muthen, I'm trying to run a longitudinal CFA over 15 years of data. There are 50 items per year (all dichotomous), 1200 observations, and 2 latent factors per year. 1. Does this seem feasible? (It seems like too many variables and not enough observations to me.) 2. Could you tell me if the following input will give me the correct model (shown only for 3 years of data)?
MODEL:
f1a5 by item1a5-item25a5;
f2a5 by item26a5-item50a5;
f1a6 by item1a6-item25a6;
f2a6 by item26a6-item50a6;
f1a7 by item1a7-item25a7;
f2a7 by item26a7-item50a7;
item1a5-item25a5 pwith item1a6-item25a6;
item26a5-item50a5 pwith item26a6-item50a6;
item1a6-item25a6 pwith item1a7-item25a7;
item26a6-item50a6 pwith item26a7-item50a7;
Thank you very much.


With that many items and time points, I would try a two-level approach. You would then have 50 variables and 15 members (max) per cluster, where cluster is id (subject). This presupposes measurement invariance across time. You can then specify a random intercept and slope which vary across subjects to see how factor means change over time. You can look at UG ex9.16 and combine it with ex9.15.

Susan Pe posted on Monday, June 25, 2012  12:33 pm



I am using panel data with 57 firms over 30 years (16 observations each year; not all firms have 16 observations). I am trying to do a CFA, but I am not sure if this is correct. Other than the observed measures, I add under VARIABLE: CLUSTER IS firms; and ANALYSIS: TYPE IS TWOLEVEL; I am not sure about the %within% and %between% commands, and which is more appropriate. I am also not sure if I need to add within = TIME. I am not sure whether I should fix the latent variable @1 either. Thank you!!!


There are many ways to do this, but let me ask you some questions first. How many items are you considering at each time point and are they continuous or categorical? Note also that longitudinal data need not be analyzed as two-level, but can be handled in a single-level approach where a wide instead of a long format is considered. See our handouts for Topics 3-4 in our courses.

Susan Pe posted on Tuesday, June 26, 2012  2:47 am



I have 7 items; some are continuous and some are categorical. Given that I have over 12000 observations x 7 items over 500 different time points, do you think using a single-level approach with a wide format is doable?


No, 500 time points is not doable in wide form in the current Mplus. In the current Mplus version you want to take a two-level approach: time and firms. This has the disadvantage that you have to assume measurement invariance across time. In the upcoming Version 7 of Mplus, however, there will be more choices, such as a 3-level approach and also the choice of allowing random forms of measurement non-invariance.

Susan Pe posted on Wednesday, June 27, 2012  7:24 am



Thank you so much for your reply. Do you think I can do this in long format with cluster = firm; within = time; analysis: type = twolevel random; model: %within% latent variables by observed variables; But how do I incorporate time under model? Also, the gap between time points is typically 2 weeks, but is longer when the year changes (a couple of months). I see in the example that time is written in the data as 0,1,2,3... Is it a problem that the time between observations varies? Is it possible to write time as dates?


See Example 9.16 for the long format. 

Susan Pe posted on Tuesday, August 28, 2012  2:16 pm



Hi, I saw the example but it's still not clear. Is this command correct? cluster = firmid; Analysis: Type = twolevel; Model: %within% RD by item1 item2 item3; Don't I need to specify between?


Example 9.16 is a growth model. You are showing a factor model. What exactly are you trying to do? 

Susan Pe posted on Wednesday, August 29, 2012  2:06 pm



I just want to confirm the factor structure by doing a CFA, not estimate a growth model. I have panel data, but because there are too many time points I cannot use the wide format, so I opted for a two-level model. Level 1 (WITHIN) is data points over time, level 2 (BETWEEN) is firms. I want to validate my factor structure with a confirmatory factor analysis, using longitudinal data. I am not particularly interested in BETWEEN and WITHIN level differences. So I am not sure which factor scores to look at or whether I am using the commands


If you have a measurement instrument that is given at many time points, you can do two-level factor analysis with data in long format. This assumes measurement invariance across time. You say: %within% fw by y1-y10; %between% fb by y1-y10; although you can have a different number of factors on the two levels.
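Put into a complete input, the two statements above might look like the following minimal sketch. The file name and the variable names (firmid, y1-y10) are assumptions for illustration, and the data are taken to be in long format with one row per firm per occasion.

```
TITLE:    Two-level CFA, long format (sketch)
DATA:     FILE = long.dat;           ! hypothetical file name
VARIABLE: NAMES = firmid y1-y10;
          USEVARIABLES = y1-y10;
          CLUSTER = firmid;          ! level 2 = firm, level 1 = occasion
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          fw BY y1-y10;              ! factor for variation across time
          %BETWEEN%
          fb BY y1-y10;              ! factor for variation across firms
OUTPUT:   STANDARDIZED;
```

Categorical items would additionally need to be listed on a CATEGORICAL statement, which is left out here to keep the sketch short.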

June Zhou posted on Wednesday, November 07, 2012  7:47 am



I have a question on the interpretation of longitudinal invariance CFA results. Can I interpret them as "The responses on the items are consistent across time, given that intercept invariance or strong invariance holds (factor loadings and thresholds were constrained)"?


If you have intercept and factor loading invariance, you can say that. 

June Zhou posted on Sunday, November 11, 2012  9:24 am



Thank you so much for your reply, Dr. Muthen. 


Dear Linda and Bengt, I am looking for help coding a two-level longitudinal CFA which uses a long data structure. Level 1 = time and level 2 = participant. For example:
Title: Two-level multilevel CFA with long data
Data: File is example.dat ;
Variable: Names are id time y1 y2 y3;
! Assume that I have 4 timepoints, i.e. within each y-variable column participants have four entries each pertaining to a separate timepoint.
WITHIN = time;
CLUSTER = id;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
%WITHIN%
f1 BY y1 y2 y3;
%BETWEEN%
f1 on time;
I realise the inputs to %WITHIN% and %BETWEEN% are incorrect. Where would I estimate a CFA for each timepoint and where would I then estimate its slope? One cannot estimate the factor at the within level (UG example 9.15) with a long data structure (UG example 9.16). Any guidance with specific code would be greatly appreciated. Kind regards, Matthew


Time is Within, so say: %WITHIN% f1 BY y1 y2 y3; f1 on time; %BETWEEN% f1b by y1-y3; And perhaps hold loadings equal across levels. The long, two-level approach assumes measurement invariance across time.
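Written out as a full input, the suggestion above could be sketched as follows. The variable names come from Matthew's example; the equality labels are one possible way to "hold loadings equal across levels" and are an assumption, not something spelled out in the reply.

```
TITLE:    Two-level longitudinal CFA with time on Within (sketch)
DATA:     FILE = example.dat;
VARIABLE: NAMES = id time y1-y3;
          WITHIN = time;             ! time varies only within subject
          CLUSTER = id;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          f1 BY y1
                y2-y3 (1-2);         ! labels 1 and 2 shared with Between
          f1 ON time;                ! linear trend in the within factor
          %BETWEEN%
          f1b BY y1
                 y2-y3 (1-2);        ! same loadings as on Within
```

Dropping the (1-2) labels frees the loadings on each level, which gives a comparison model for checking whether the cross-level equality is reasonable.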


Dear Bengt, Many thanks for your response. There are a few things I would like to clarify. I am having problems interpreting the CFA results at the within and between levels in the model you describe. From my understanding, the %within% portion of the model estimates a factor which predicts the (co)variation in Y1-Y3, based on data organized by the four timepoints, and how this factor changes over time. Latent variables based on random intercepts for Y1-Y3 are then used as factor indicators at the %between% level. In other words, the %between% portion reflects how participant id predicts (co)variation in random intercepts for Y1-Y3. Critically, whilst the %within% level CFA uses observations based on each timepoint, timepoints are clustered by participant ID. Q1: To understand how a factor changes over time, is the %within% portion enough, as it is a hybrid of within-level (time) and between-level (cluster ID) data? Q2: What is the point of %between%? The model works fine without it, except that model fit estimates reduce greatly. Q3: Would I add covariates, both time-varying and time-invariant, to the %within% portion of the model? Q4: Which CFA results does one use to understand the factors which predict data for a single group of participants collapsed across time points, and how that factor changes over time?


The Within level captures the variation across time and the Between across subject. This gives the answers: Q1: The Within is not a hybrid. Q2: You certainly want to know how big the variation is across subjects. Q3: Time-varying covariates go on Within and time-invariant covariates go on Between. Q4: I will send you a paper of mine that is coming out in SM&R. See the section on random effects and how intercept variation across subjects can be seen as a form of measurement invariance.
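The answer to Q3 can be sketched in input form as below. The covariate names x (time-varying) and w (time-invariant) are hypothetical and are not part of the original example.

```
VARIABLE: NAMES = id time x w y1-y3;
          WITHIN = time x;           ! time-varying covariates
          BETWEEN = w;               ! time-invariant covariate
          CLUSTER = id;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          f1 BY y1-y3;
          f1 ON time x;              ! within-subject predictors
          %BETWEEN%
          f1b BY y1-y3;
          f1b ON w;                  ! between-subject predictor
```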


Dear Bengt, Thanks so much for clarifying that. It makes sense now: a within-level factor predicts the covariances in an indicator across time, whereas the between-level factor predicts the covariances in an indicator across people. I have a few follow-up questions: 1. In the two-level longitudinal model, what does it mean to predict a factor with time (e.g., ‘F1 on Time;’)? Does the resultant beta value indicate a factor’s temporal stability? 2a. A single-level linear growth approach (i.e. assessing how a factor changes across time) suits my needs, but what is it about the factor that changes? In the command: f1@0 f2@1 f3@2; are we estimating how the factor’s predictive strength of covariances in a set of indicators changes across time? 2b. When we correlate two factors (e.g., f1 WITH z1), are we examining the covariance between their predictive strengths? 2c. A factor’s predictive strength is different from factor scores, which estimate one’s position on a factor (regardless of its predictive strength), right? To examine how factor scores change over time, we would have to extract scores (e.g., ‘SAVE IS Fscores;’) and use those as the observed measures of a growth model. Your thoughts would be most appreciated. Matthew


I don't think you should try to learn this topic by a series of Discussion questions. Instead find papers/books about it such as Grimm's new growth book. But here are brief answers: 1. It means that F1 has a linear trend that you want to describe. 2a. See our multiple-indicator growth video from Hopkins short courses. 2b-2c. Ask general analysis questions not pertaining to Mplus specifically on SEMNET.


Bengt, I have been reading Grimm's 'Growth Modelling' book and would recommend it to anyone running growth models in Mplus. Thank you for the recommendation. I want to clarify a few things for those new to longitudinal CFA which tripped me up: 1. There are two general approaches to growth modelling: multilevel modelling (MLM) and structural equation modelling (SEM). Statistically, these approaches are very similar. Practically, they each have their pros and cons. Mplus can run both types of techniques (and even integrate them, see UG 9.10).


2. We create factors or latent variables in growth models even if our goal isn’t a CFA. For example, to understand how math ability changed over time, I would estimate a slope and intercept, both of which are latent variables, for maths scores at each timepoint (TP). This can be confusing for those who want to run a CFA over time, because we're essentially creating latent growth variables of latent factors. 3. A CFA run via the SEM approach (e.g., F1 by depr1-depr25; F2 by anx1-anx14;) requires a sufficient number of indicators to identify the model. To run a CFA for each TP (e.g., F1a by depr1_1-depr1_25; F1b by depr2_1-depr2_25; F2a by anx1_1-anx1_14; F2b by anx2_1-anx2_14; i_depr s_depr | F1a@0 F1b@1; i_anx s_anx | F2a@0 F2b@1;) is thus a resource-hungry process, especially when we have at least 3 TPs and a questionnaire's worth of indicators. An alternative is to use MLM (‘two/three-level’ in Mplus) with long data. This allows you to estimate a single factor across TPs (rather than a factor per TP). But you are also partitioning the variance, such that a within-level factor is derived from covariances between TPs lumped across participants (see ‘P data’), whereas the between-level factor is based on inter-individual differences in covariances lumped across TPs.


Bengt, Your guidance has been invaluable thus far. I wondered whether you could address the following: 1. Is there a way of running nonlinear two-level CFAs for long data (where WITHIN = Time;)? I suspect that we'd have to manipulate the column of time values from linear (e.g., 1, 2, 3, 4) to nonlinear (e.g., 1, 1.3, 1.7, 2.4, 3.9). 2. How can we even tell if a within-level factor has a nonlinear trend, if we cannot estimate means at each timepoint? Many thanks, Matthew


1. CFAs that have indicators which are nonlinear in the factors can be handled via XWITH. Models that are nonlinear in time can be handled via time scores as you indicate or more generally as shown in the Grimm book. 2. You can estimate factor means if you do the analysis in wide format. Or, using a much more advanced approach, you can use cross-classified growth modeling (I don't volunteer to guide you through it though; see our Utrecht Aug 2012 short course handouts and videos); see also Version 8 coming out next week. Given that you didn't ask questions in your first 2 postings, I will let it go by that you violated our rule of posting in no more than one window.
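For readers wondering what "nonlinear in time" could look like in the long, two-level setup, here is one simple possibility: adding a squared time term created in DEFINE. This is a swapped-in illustration, not the recoded-time-scores device mentioned above and not something stated in the thread; the variable names are assumptions.

```
VARIABLE: NAMES = id time y1-y3;
          USEVARIABLES = time y1-y3 timesq;  ! DEFINE variables go last
          WITHIN = time timesq;
          CLUSTER = id;
DEFINE:   timesq = time*time;        ! quadratic time term
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          f1 BY y1-y3;
          f1 ON time timesq;         ! linear plus quadratic trend
          %BETWEEN%
          f1b BY y1-y3;
```

A clearly non-zero coefficient on timesq would be one indication of curvature in the within-level factor, subject to the measurement invariance assumption of the long format.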


Dr. Muthen, After reading several threads, I am still unsure how to navigate the modelling of a longitudinal CFA with clustering. Set up: I am working with a latent factor "governance" which is measured by 6 manifest indicators (VA PS GE RQ RL CC). Additionally, this data is drawn across multiple countries (ccode) at multiple time periods (year). So far, this is my model: variable: names are ccode year VA PS GE RQ RL CC; Missing are all (999); usevariables = year VA PS GE RQ RL CC; cluster = ccode; within = year; analysis: type = twolevel random; estimator = mlr; However, the more I read up on the %within% and %between% commands, the less I understand exactly how to code these for my specific problem. Any assistance you could provide would be appreciated. Justin


Are the data for the multiple time points collected for the same individuals or different ones? How many countries do you have? 


The data collected is for the same individuals across time (the countries/ territories) and there are about 180 of them 


If you don't have too many time points you can approach it like in slide 165 of our handout for our Topic 1 short course, that is, in wide, single-level format. Then add the two-level feature for the 180 clusters.


Dear Bengt and Linda, Goal: Compare the model fit between a bifactor, second-order, and one-factor CFA. Implementation: Two-level long data approach (a wide data approach is too computationally taxing + I want to use all available data points). Example syntax (bifactor): https://goo.gl/EybTmW Questions: a. Do you see any problems with the syntax? b. DIFFTEST is not available for two-level models: is there another way to compare two-level models (either via Mplus or manually)? c. I will test how the slopes differ for 2 treatment groups by first estimating the random slopes (S_GEN S_SPEC1 S_SPEC2 S_SPEC3 | GEN SPEC1 SPEC2 SPEC3 by TIME;) and then regressing group membership on each slope at the %between% level. The analysis requires 8 integration points and crashes with a ‘memory space’ error. Even if I reduce the number of integration points, I wondered whether you could advise on a computationally simpler way to compare the slopes between groups in a two-level, long format CFA? d. Is it possible to increase the speed of each iteration (e.g., via hardware improvements)? e. I seem to be able to run two analyses simultaneously without a cost in speed. Are there any issues with this? What if I were to run 3-4 analyses each with a ‘very heavy’ number of dimensions for integration?


When the syntax doesn't fit in one window, please send the output to Support along with license number. 


I'm comparing bifactor, second-order, and one-factor CFAs with growth factors using a two-level long data approach. Example syntax (bifactor):
USEVARIABLES = TIME ID V1-V20;
CATEGORICAL = V1-V20;
MISSING ARE all (999);
CLUSTER = ID;
WITHIN = TIME V1-V20;
BETWEEN = ;
ANALYSIS: TYPE = TWOLEVEL;
ESTIMATOR = WLSMV;
MODEL = NOCOVARIANCES;
MODEL:
%within%
GEN by V1* V2-V20; GEN@1;
SPEC1 by V1* V2-V10; SPEC1@1;
SPEC2 by V10* V11-V20; SPEC2@1;
GEN SPEC1 SPEC2 by TIME;
%between% ;
SAVEDATA: DIFFTEST IS BIFAC.dat;
a. DIFFTEST is not available for two-level models: is there another way to compare two-level models (either via Mplus or manually)? c. I will test how the slopes differ for 2 treatment groups by first estimating the random slopes (S_GEN S_SPEC1 S_SPEC2 | GEN SPEC1 SPEC2 on TIME;) and then regressing group membership on each slope at the %between% level. The analysis requires 8 integration points and crashes with a ‘memory space’ error. Even if I reduce the number of integration points, I wondered whether you could advise on a computationally simpler way to compare the slopes between groups in a two-level, long format CFA?


If you are using an estimator that requires DIFFTEST, I know of no other way to do a difference test. 


Dear Dr. Muthens, We want to conduct a longitudinal CFA. We have a multilevel dataset (respondents from different companies) gathered via surveys at two time points. This would be three-level data: time, individuals and companies. The thing is that we do not have follow-up of the same individuals. At the first time point in one company we have, for example, 50 answers for each item, whereas at the second time point we may have 30 for the same company. We have the dataset organized in wide format, so in that example we have 20 cases with missing data for that company at the second measurement point. Our questions are: a) If we want to conduct a longitudinal CFA in order to test for measurement invariance, does it make sense to conduct a two-level analysis for companies and individuals (with the data in wide format for the different time points, the time level is not included, right?) even without follow-up of the individuals? Or is it better to aggregate to company level and conduct the longitudinal CFA there? b) Is it necessary to include a mean structure in a longitudinal CFA? Thank you very much for your help in advance.


In the case of the approach highlighted in this thread (multilevel factor analyses, data structured by time in long format), is it possible to save factor scores (or, say, Bayesian Plausible values) for each factor x time point even though the model will 'aggregate' across the time points when generating the overall 'single' factor(s) that one models? Or would it be necessary to save factor scores for each factor x time point in a separate standalone analyses? In short, would one be able to save within time factor scores for each person in the multilevel setup? Thanks in advance for insight. 


So time is L1 and subject is L2? 


Yes, time is L1 and subject is L2. Each indicator for each factor is modeled for each time point for each person. 


Saving factor x time factor scores would seem possible only with Type=Crossclassified in a time series context (DSEM/RDSEM), which requires quite a few time points (> 15?). Crossclassified allows the factor scores to vary across time, even though the data are in long format.


Thanks, Bengt, this was my inclination. It leaves me with a follow-up, which is: in an ML CFA framework, when the model 'aggregates' over time at the within level (e.g., when estimating the factor(s)), what does the factor then become for each observation? That is to say, what source of variation does the 'aggregate' factor reflect? Wouldn't it essentially become a between-subjects average for each person (grand mean, not person mean) at L2? Thus, if a factor or set of factors estimated in the ML setup were to be saved as factor scores, they would reflect between-person variation, correct? My instincts incline me to this interpretation because one is averaging across time rather than separately estimating the factor structure at each time point at the within-time level, which (if executed) would yield a given set of factors depending on how many time points the investigator was modeling across.


Level 1 concerns within-subject variation, that is, variation across time. Use a between-level factor to pull out the variation across subjects. You may also want to consult the latent state-trait literature; there are a lot of writings on this for cases where you don't have a lot of time points. That's a clean way of looking at things. In that literature, you also consider that observations at different time points correlate not only because of them coming from the same person but also because observations at one time point may influence the next time point.


Thanks, Bengt. That jibes with my understanding. Final question: If one in fact runs an ML CFA in Mplus, any factor score that was saved (I can't recall from the UG if factor scores in ML CFA are available, but that's irrelevant to my question right now) would be at the between-subjects level, yes? If saving factor scores in ML is not an option, obviously the above question is moot.


You can save factor scores for both the within- and between-level factors.
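A minimal sketch of how factor scores might be requested in a two-level run is shown below. The model, variable names and output file name are assumptions for illustration; the relevant part is the SAVEDATA command.

```
VARIABLE: NAMES = id time y1-y3;
          USEVARIABLES = y1-y3;
          CLUSTER = id;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          fw BY y1-y3;
          %BETWEEN%
          fb BY y1-y3;
SAVEDATA: FILE = fscores.dat;        ! hypothetical output file name
          SAVE = FSCORES;            ! request factor scores
```

The end of the Mplus output lists the order and format of the saved variables, which is the place to check which columns hold the within-level and between-level scores.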


I see, and the saved within-level factor score would be 1 factor or set of factors (depending on model etc.) aggregated across time, regardless of the # of time points modeled.


Right. 


Curious to know the implications of the following scenario based on the above: One estimates the within-subject factor in the ML FA setup. This aggregated factor for each person is then used in a separate ML growth analysis, where it serves as an indicator of each person's factor score at each of 1....k time points (the same # of time points the original ML FA was based on). Because the factor indicator for each person's 1....k time points in the second growth analysis will be the same within aggregate score generated in the original ML FA, *assuming no other steps taken*, how would it be possible to track growth in a factor score that is ostensibly the same across individual time points in the 2nd (growth) analysis?


But if that's where you're heading, why not just do the longitudinal modeling in wide format to begin with? 


Let's assume it is because it is more efficient in the within setting with this sort of model and categorical indicators, and that the wide format has already been attempted. In the example above, even if one takes the aggregate within factor(s) and estimates values from that average for each person at different time points, wouldn't one be starting that estimation process from the same value (whereas in contrast, if one had scores from the wide format, the estimation process would start with different person x time values)?


With two-level (L1 = time, L2 = subject), the L1 factor score would be the same at all time points, so it is not suitable for growth modeling. Sounds like you may want cross-classified analysis (time x subject) where the factor score can change over time.


Thanks Bengt. Yes, cross-classified would be the ideal (esp. as relates to MI). Two brief further questions: 1) Is the ONLY reason you say that the within-time factor described above is not suitable for growth modeling because one cannot establish MI in this setup? For example, if (assuming MI) in a secondary GLM analysis one estimates a distribution on the aggregate within factor(s) at each time point for each person, would that approach be formally suitable? 2) If one regresses the within-time factor aggregate on a time dummy (again assuming MI), what precisely would the betas reflect? Only between-subjects change (variance), since one is only generating a fixed effect in that scenario?


If you have a time variable on Within, you can of course have change over time of the Within factor; e.g., you can do growth modeling there. When you say time as a dummy, I assume you don't want to commit to a specific growth function. If not committing to a growth function is the goal, I think you'll like Type=Crossclassified. We discuss examples of that in our time series analysis Short Course Topic 12 on our website.


Thanks Bengt, this is precisely how I was thinking about it. And cross-classified is what I intend to consider using. But to be sure, saving a within-time factor score (ML CFA) and then using that factor score or set of factor scores (aggregated over time) as a person x time point indicator in a *separate* ML growth model would not be the best approach per se, correct? Would your answer to the above change if MI was assumed?


Q1: Correct. Q2: Yes, MI is assumed. Using Crossclassified, you don't have to assume that because measurement parameters can vary over time.
