I am running a CFA across three waves of longitudinal data. I would like to follow the approach of A. Farrel 1994 to test the consistency of the measurement model across time, so I need the factor loadings of the latent constructs to be equal at each time point. How can I model this in Mplus if I do not run a multiple-group CFA treating the three time points as three groups?
Please see the multiple indicator growth example in the Topic 4 course handout. The first part of the example tests for measurement invariance across time. It is not correct to use the three time points as three groups because the groups would not be independent.
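For readers looking for concrete syntax, a minimal wide-format sketch of this loading-invariance test might look like the following (variable names are hypothetical: one factor, three indicators, three waves; numbered equality labels hold the loadings equal across time):

```
MODEL:
  ! Wave 1: first loading fixed at 1 by default for scaling
  f1 BY y11
        y21 (1)
        y31 (2);
  ! Waves 2 and 3: the same labels impose equal loadings across time
  f2 BY y12
        y22 (1)
        y32 (2);
  f3 BY y13
        y23 (1)
        y33 (2);
  ! Residual covariances for the same item measured at different waves
  y11 WITH y12 y13;  y12 WITH y13;
  y21 WITH y22 y23;  y22 WITH y23;
  y31 WITH y32 y33;  y32 WITH y33;
```

Comparing this model against one with the equality labels removed (freely estimated loadings) via a chi-square difference test gives the check on loading invariance.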
Elan Cohen posted on Tuesday, August 23, 2011 - 8:03 am
I'm trying to run a longitudinal CFA over 15 years of data. There are 50 items per year (all dichotomous), 1200 observations, and 2 latent factors per year.
1. Does this seem feasible? (It seems like too many variables and not enough observations to me).
2. Could you tell me if the following input will give me the correct model (shown only for 3 years of data).
MODEL:
f1a5 by item1a5-item25a5;
f2a5 by item26a5-item50a5;
f1a6 by item1a6-item25a6;
f2a6 by item26a6-item50a6;
f1a7 by item1a7-item25a7;
f2a7 by item26a7-item50a7;
item1a5-item25a5 pwith item1a6-item25a6;
item26a5-item50a5 pwith item26a6-item50a6;
item1a6-item25a6 pwith item1a7-item25a7;
item26a6-item50a6 pwith item26a7-item50a7;
With that many items and time points, I would try a twolevel approach. You would then have 50 variables and 15 members (max) per cluster, where cluster is id (subject). This pre-assumes measurement invariance across time. You can then specify a random intercept and slope which vary across subjects to see how factor means change over time. You can look at UG ex9.16 and combine it with ex9.15.
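A rough long-format sketch of that two-level idea (hypothetical names; the 50 dichotomous items stacked over years, subject as the cluster; measurement invariance across time is assumed because a single set of loadings is estimated):

```
VARIABLE:
  NAMES = id year item1-item50;
  CATEGORICAL = item1-item50;
  CLUSTER = id;          ! subject is the level-2 unit
  WITHIN = year;         ! the time variable exists only at level 1
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  f1w BY item1-item25;
  f2w BY item26-item50;
  f1w f2w ON year;       ! fixed-effect change in the factors over time
  %BETWEEN%
  f1b BY item1-item25;
  f2b BY item26-item50;
```

The random intercept and slope version mentioned above would replace the fixed `ON year` regressions with `|` statements; see the UG examples cited in the reply for the exact setup.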
Susan Pe posted on Monday, June 25, 2012 - 12:33 pm
I am using panel data with 57 firms over 30 years (each year 16 observations; not all firms have 16 observations). I am trying to do a CFA but am not sure if this is correct.
Other than the observed measures, I add under VARIABLE: CLUSTER IS firms; and under ANALYSIS: TYPE IS TWOLEVEL;
I am not sure about the %within% and %between% commands and which is more appropriate. I am also not sure if I need to add within = TIME;, or whether I should fix the latent variable @1. Thank you!!!
There are many ways to do this, but let me ask you some questions first. How many items are you considering at each time point and are they continuous or categorical? Note also that longitudinal data need not be analyzed as twolevel, but can be handled in a single-level approach where a wide instead of a long format is considered. See our handouts for Topics 3-4 in our courses.
Susan Pe posted on Tuesday, June 26, 2012 - 2:47 am
I have 7 items; some are continuous and some are categorical. Given that I have over 12,000 observations x 7 items over 500 different time points, do you think using a single-level approach with a wide format is doable?
No, 500 time points is not doable in wide format in the current Mplus. In the current Mplus version you want to take a two-level approach: time and firms. This has the disadvantage that you have to assume measurement invariance across time. The upcoming Version 7 of Mplus, however, will offer more choices, such as a 3-level approach and also the option of allowing random forms of measurement non-invariance.
Susan Pe posted on Wednesday, June 27, 2012 - 7:24 am
Thank you so much for your reply. Do you think I can do this in long format with cluster = firm; within = time; analysis: type = twolevel random; model: %within% latent variables by observed variables;? But how do I incorporate time under MODEL? Also, the gap between time points is typically 2 weeks but is longer when the year changes (a couple of months). I see in the example that time is written in the data as 0, 1, 2, 3... Is it a problem that the time between observations varies? Is it possible to write time as dates?
Example 9.16 is a growth model. You are showing a factor model. What exactly are you trying to do?
Susan Pe posted on Wednesday, August 29, 2012 - 2:06 pm
I want to just confirm the factor structure by doing a CFA, not estimate a growth model. I have panel data, but because there are too many time points I cannot use wide format, so I opted for a two-level model. Level 1 (WITHIN) is data points over time; level 2 (BETWEEN) is firms. I want to validate my factor structure with a confirmatory factor analysis using longitudinal data. I am not particularly interested in BETWEEN- and WITHIN-level differences, so I am not sure which factor scores to look at or whether I am using the commands correctly.
If you have a measurement instrument that is given at many time points, you can do two-level factor analysis with the data in long format. This assumes measurement invariance across time. You say:
%within% fw by y1-y10;
%between% fb by y1-y10;
although you can have a different number of factors on the two levels.
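Put together as a complete input, the two-level CFA just described might look like this (file name and variable names are placeholders):

```
TITLE:    Two-level CFA, long format (time within subject);
DATA:     FILE = data.dat;
VARIABLE: NAMES = id time y1-y10;
          USEVARIABLES = y1-y10;
          CLUSTER = id;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  fw BY y1-y10;     ! within-subject (across-time) factor
  %BETWEEN%
  fb BY y1-y10;     ! between-subject factor on the random intercepts
```

As noted above, the number of factors (and their structure) need not be the same on the two levels.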
June Zhou posted on Wednesday, November 07, 2012 - 7:47 am
I have a question about interpreting longitudinal invariance CFA results. Can I interpret them as "the responses to the items are consistent across time, given that intercept (strong) invariance holds (factor loadings and thresholds were constrained)"?
I am looking for help coding a two-level longitudinal CFA which uses a long data structure. Level 1 = time and level 2 = participant.
-----------------
Title: Twolevel multilevel CFA with long data
Data: File is example.dat;
Variable: Names are id time y1 y2 y3;
! Assume that I have 4 time-points, i.e. within each y-variable column
! participants have four entries, each pertaining to a separate time-point.
WITHIN = time;
CLUSTER = id;
ANALYSIS: TYPE = TWOLEVEL;
MODEL: %WITHIN% f1 BY y1 y2 y3;
%BETWEEN% f1 on time;
I realise the inputs to %WITHIN% and %BETWEEN% are incorrect. Where would I estimate a CFA for each time-point and where would I then estimate its slope? One cannot estimate the factor at the within level (UG example 9.15) with a long data structure (UG example 9.16).
Any guidance with specific code would be greatly appreciated.
Many thanks for your response. There are a few things I would like to clarify.
I am having problems interpreting the CFA results at the within and between levels in the model you describe.
From my understanding, the %within% portion of the model estimates a factor which predicts the (co)variation in Y1-Y3, based on data organized by the four time-points, and how this factor changes over time.
Latent variables based on random intercepts for Y1-Y3 are then used as factor indicators at the %between% level. In other words, the %between% portion reflects how participant id predicts (co)variation in random intercepts for Y1-Y3.
Critically, whilst the %within% level CFA uses observations based on each time-point, time-points are clustered by participant ID.
Q1: To understand how a factor changes over time, is the %within% portion enough, given that it is a hybrid of within-level (time) and between-level (cluster ID) data?
Q2: What is the point of %between%? The model runs fine without it, except that the model fit estimates worsen greatly.
Q3. Would I add covariates, both time-varying and time-invariant, to the %within% portion of the model?
Q4. Which CFA results does one use to understand the factors which predict data for a single group of participants collapsed across time points, and how that factor changes over time?
Thanks so much for clarifying that. It makes sense now: a within level factor predicts the covariances in an indicator across time, whereas the between level factor predicts the covariances in an indicator across people.
I have a few follow-up questions:
1. In the two-level longitudinal model, what does it mean to predict a factor with time (e.g., 'F1 on Time;')? Does the resulting beta value indicate a factor's temporal stability?
2a. A single-level linear growth approach (i.e., assessing how a factor changes across time) suits my needs, but what is it about the factor that changes? In the command f1@0 f2@1 f3@2; are we estimating how the factor's predictive strength for the covariances in a set of indicators changes across time?
2b. When we correlate two factors (e.g., f1 WITH z1), are we examining the covariance between their predictive strengths?
2c. A factor’s predictive strength is different from factor scores, which estimate one’s position on a factor (regardless of its predictive strength), right? To examine how factor scores change over time, we would have to extract scores (e.g., ‘SAVE IS Fscores;’) and use those as the observed measures of a growth model.
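To make the growth-of-a-factor idea concrete, here is a hedged wide-format sketch of a multiple-indicator linear growth model (hypothetical names, three waves; loadings and intercepts must be held invariant across time for the growth factors to be interpretable):

```
MODEL:
  ! Loadings held equal across waves via numbered labels
  f1 BY y11 y21 (1) y31 (2);
  f2 BY y12 y22 (1) y32 (2);
  f3 BY y13 y23 (1) y33 (2);
  ! Intercepts of like items held equal across waves
  [y11 y12 y13] (3);
  [y21 y22 y23] (4);
  [y31 y32 y33] (5);
  ! Linear growth of the factor itself
  i s | f1@0 f2@1 f3@2;
```

Here the growth factors i and s describe change in the factor's mean over time, not change in its loadings; the loadings are constrained equal precisely so that such change is meaningful.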
Bengt, I have been reading Grimm's 'Growth Modelling' book and would recommend it to anyone running growth models in Mplus. Thank you for the recommendation.
I want to clarify a few things for those new to longitudinal CFA which tripped me up:
1. There are two general approaches to growth modelling - multilevel modelling (MLM) and structural equation modelling (SEM). Statistically, these approaches are very similar. Practically, they each have their pros and cons. Mplus can run both types of techniques (and even integrate them, see UG 9.10).
2. We create factors or latent variables in growth models even if our goal isn't a CFA. For example, to understand how math ability changed over time, I would estimate a slope and an intercept (both of which are latent variables) for math scores at each time point (TP). This can be confusing for those who want to run a CFA over time, because we're essentially creating latent growth variables of latent factors.
3. A CFA run via the SEM approach (e.g., F1 by depr1-depr25; F2 by anx1-anx14;) requires a sufficient number of indicators to identify the model. Running a CFA for each TP (e.g., F1a by depr1_1-depr1_25; F1b by depr2_1-depr2_25; F2a by anx1_1-anx1_14; F2b by anx2_1-anx2_14; i_depr s_depr | F1a@0 F1b@1; i_anx s_anx | F2a@0 F2b@1;) is thus a resource-hungry process, especially with at least 3 TPs and a questionnaire's worth of indicators. An alternative is to use MLM ('two/three-level' in Mplus) with long data. This allows you to estimate a single factor across TPs (rather than a factor per TP). But you are also partitioning the variance, such that the within-level factor is derived from covariances between TPs pooled across participants (see 'P data'), whereas the between-level factor is based on inter-individual differences in covariances pooled across TPs.
Your guidance has been invaluable thus far. I wondered whether you could address the following:
1. Is there a way of running non-linear two-level CFAs for long data (where WITHIN = Time;)? I suspect that we'd have to change the column of time values from linear (e.g., 1, 2, 3, 4) to non-linear (e.g., 1, 1.3, 1.7, 2.4, 3.9).
2. How can we even tell if a within-level factor has a non-linear trend, if we cannot estimate means at each time-point?
1. CFAs that have indicators which are non-linear in the factors can be handled via XWITH. Models that are non-linear in time can be handled via time scores as you indicate or more generally as shown in the Grimm book.
2. You can estimate factor means if you do the analysis in wide format. Or, using a much more advanced approach, you can use Cross-classified growth modeling (I don't volunteer to guide you through it though; see our Utrecht Aug 2012 short course handouts and videos); see also Version 8 coming out next week.
Given that you didn't ask questions in your first 2 postings, I will let it go that you violated our rule of posting in no more than one window.
After reading several threads, I am still unsure how to navigate the modelling of a longitudinal CFA with clustering.
Set up: I am working with a latent factor "governance" which is measured by 6 manifest indicators (VA PS GE RQ RL CC). Additionally, this data is drawn across multiple countries (ccode) at multiple time periods (year).
So far, this is my model:
variable: names are ccode year VA PS GE RQ RL CC;
Missing are all (-999);
usevariables = year VA PS GE RQ RL CC;
cluster = ccode;
within = year;
analysis: type = twolevel random; estimator = mlr;
However, the more I read up the %within% and %between% commands, the less I understand exactly how to code these for my specific problem.
Any assistance you could provide would be appreciated.
If you don't have too many time points you can approach it like in slide 165 of our handout for our Topic 1 short course, that is, in wide, single-level format. Then add the two-level feature for the 180 clusters.
b. DIFFTEST is not available for two-level models: is there another way to compare two-level models (either via Mplus or manually)?
c. I will test how the slopes differ for the 2 treatment groups by first estimating the random slopes (S_GEN S_SPEC1 S_SPEC2 S_SPEC3 | GEN SPEC1 SPEC2 SPEC3 on TIME;) and then regressing each slope on group membership at the %between% level. The analysis requires 8 integration points and crashes with a 'memory space' error. Even if I reduce the number of integration points, I wondered whether you could advise on a computationally simpler way to compare the slopes between groups in a two-level, long-format CFA?
d. Is it possible to increase the speed of each iteration (e.g., via hardware improvements)?
e. I seem to be able to run two analyses simultaneously without a cost in speed. Are there any issues with this? What if I were to run 3-4 analyses each with a ‘very heavy’ number of dimensions for integration?
I'm comparing bifactor, second-order, and one-factor CFAs with growth factors using a twolevel long data approach.
Example syntax (bifactor):
USEVARIABLES = TIME ID V1-V20;
CATEGORICAL = V1-V20;
MISSING ARE all (-999);
CLUSTER = ID;
WITHIN = TIME V1-V20;
BETWEEN = ;
ANALYSIS:
TYPE = TWOLEVEL;
ESTIMATOR = WLSMV;
MODEL = NOCOVARIANCES;
MODEL:
%within%
GEN by V1* V2-V20;
GEN@1;
%between% ;
SAVEDATA: DIFFTEST IS BIFAC.dat;
Dear Drs. Muthén, We want to conduct a longitudinal CFA. We have a multilevel dataset (respondents from different companies) gathered via surveys at two time points. This makes for three-level data: time, individuals, and companies. The complication is that we do not have follow-up on the same individuals: at the first time point in one company we have, for example, 50 answers for each item, whereas at the second time point we may have 30 for the same company. We have the dataset organized in wide format, so in that example we have 20 cases with missing data for that company at the second measurement point. Our questions are: a) If we want to conduct a longitudinal CFA to test for measurement invariance, does it make sense to conduct a two-level analysis for companies and individuals (with the data in wide format across time points, that level is not included, right?) even without follow-up of the individuals? Or is it better to aggregate to the company level and conduct the longitudinal CFA there? b) Is it necessary to include a mean structure in a longitudinal CFA? Thank you very much for your help in advance.
In the case of the approach highlighted in this thread (multi-level factor analyses, data structured by time in long format), is it possible to save factor scores (or, say, Bayesian Plausible values) for each factor x time point even though the model will 'aggregate' across the time points when generating the overall 'single' factor(s) that one models?
Or would it be necessary to save factor scores for each factor x time point in a separate stand-alone analyses?
In short, would one be able to save within time factor scores for each person in the multi-level setup?
Saving factor x time factor scores would seem possible only with Type=Crossclassified in a time series context (DSEM/RDSEM), which requires quite a few time points (> 15?). Cross-classified allows the factor scores to vary across time - even though the data are in long format.
Thanks, Bengt, this was my inclination. It leaves me with a follow-up which is: in a ML CFA framework, when the model 'aggregates' over time at the within level (e.g., when estimating the factor(s)), what does the factor then become for each observation? That is to say, what source of variation does the 'aggregate' factor reflect?
Wouldn't it essentially become a between-subjects average for each person (grand mean, not person mean) at L2? Thus, if a factor or set of factors estimated in the ML setup were saved as factor scores, they would reflect between-person variation, correct? My instincts incline me to this interpretation because one is averaging across time rather than separately estimating the factor structure at each time point at the within-time level, which (if executed) would yield a given set of factors depending on how many time points the investigator was modelling across.
Level 1 concerns within-subject variation, that is variation across time. Use a between-level factor to pull out the variation across subjects. You may also want to consult the latent state-trait literature - there are a lot of writings on this for cases where you don't have a lot of time points. That's a clean way of looking at things. In that literature, you also consider that observations at different time points correlate not only because of them coming from the same person but also because observations at one time point may influence the next time point.
Final question: if one in fact runs an ML CFA in Mplus, any factor score that was saved (I can't recall from the UG whether factor scores are available in ML CFA, but that's irrelevant to my question right now) would be at the between-subjects level, yes?
If saving factor scores in ML is not an option, obviously above question is moot.
Curious to know the implications of the following scenario based on the above: one estimates the within-subject factor in the ML FA setup. This aggregated factor for each person is then used in a separate ML growth analysis, where it serves as an indicator of each person's factor score at each of 1...k time points (the same number of time points the original ML FA was based on).
Because the factor indicator for each person's 1...k time points in the second growth analysis will be the same within-aggregate score generated in the original ML FA, *assuming no other steps are taken*, how would it be possible to track growth in a factor score that is ostensibly the same across individual time points in the second (growth) analysis?
Let's assume it is because estimation is more efficient in the within setting with this sort of model and categorical indicators, and that the wide format has already been attempted.
In the example above, even if one takes the aggregate within factor(s) and estimates values from that average for each person at different time points, wouldn't one be starting that estimation process from the same value (whereas in contrast, if one had scores from the wide format, the estimation process would start with different person x time values)?
With twolevel (L1=time, L2= subject), the L1 factor score would be the same at all time points so not suitable for growth modeling. Sounds like you may want cross-classified analysis (time x subject) where the factor score can change over time.
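As a rough illustration of the cross-classified alternative (Bayesian estimation; hypothetical names and a sketch in the style of Mplus cross-classified models, so treat the details as assumptions to check against the UG):

```
VARIABLE: NAMES = subject time y1-y6;
          CLUSTER = subject time;   ! two crossed cluster variables
ANALYSIS: TYPE = CROSSCLASSIFIED;
          ESTIMATOR = BAYES;
MODEL:
  %WITHIN%
  fw BY y1-y6;       ! factor whose score can vary by subject x time
  %BETWEEN subject%
  fs BY y1-y6;       ! stable between-person component
  %BETWEEN time%
  y1-y6;             ! time-specific variation in the indicators
```

Because subject and time are crossed rather than nested, factor scores from this setup can differ across time points even though the data stay in long format.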
Thanks Bengt. Yes, cross-classified would be the ideal (esp. as relates to MI).
Two brief further questions:
1) Is the ONLY reason you say that the within-time factor described above is not suitable for growth modeling that one cannot establish MI in this setup? For example, if (assuming MI) in a secondary GLM analysis one estimates a distribution on the aggregate within factor(s) at each time point for each person, would that approach be formally suitable?
2) If one regresses the within-time factor aggregate on a time dummy (again assuming MI), what precisely would the betas reflect? Only between-subjects change (variance), since one is only generating a fixed effect in that scenario?
If you have a time variable on Within, you can of course have change over time of the Within factor - e.g., you can do growth modeling there. When you say time as a dummy, I assume you don't want to commit to a specific growth function.
If not committing to a growth function is the goal, I think you'll like Type=Crossclassified. We discuss examples of that in our time series analysis Short Course Topic 12 on our website.
Thanks Bengt, this is precisely how I was thinking about it. And cross-classified is what I intend to consider using.
But to be sure: saving a within-time factor score (ML CFA) and then using that factor score or set of factor scores (aggregated over time) as a person x time point indicator in a *separate* ML growth model would not be the best approach per se, correct?
Would your answer to the above change if MI was assumed?