Longitudinal Factor Analysis
 Bernhard Swoboda posted on Monday, August 01, 2011 - 10:08 am
Hi support team,

I am running a CFA across three waves of longitudinal data.
I would like to follow the approach by A. Farrel 1994, to test consistency of the measurement model across time. Therefore I need the factor loadings of the latent constructs to be equal across each time point.
How can I model this in Mplus without running a multiple-group CFA that treats the three time points as three groups?

 Linda K. Muthen posted on Monday, August 01, 2011 - 10:41 am
Please see the multiple indicator growth example in the Topic 4 course handout. The first part of the example tests for measurement invariance across time. It is not correct to use the three time points as three groups because the groups would not be independent.
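A minimal sketch of what a wide-format test of loading invariance across three waves could look like. It is not taken from the Topic 4 handout: the file name, the variable names (item j at wave k written as yjtk), and the choice of three indicators per wave are all assumptions made for illustration.

```mplus
TITLE:    Sketch: longitudinal CFA, loadings equal across 3 waves
DATA:     FILE = waves.dat;        ! hypothetical file name
VARIABLE: NAMES = y1t1 y2t1 y3t1   ! hypothetical names
                  y1t2 y2t2 y3t2
                  y1t3 y2t3 y3t3;
MODEL:
  ! The first loading of each factor is fixed at 1 by default.
  ! Labels (1) and (2) hold the loadings of items 2 and 3
  ! equal across the three waves.
  f1 BY y1t1
        y2t1 (1)
        y3t1 (2);
  f2 BY y1t2
        y2t2 (1)
        y3t2 (2);
  f3 BY y1t3
        y2t3 (1)
        y3t3 (2);
  ! Optional assumption: residuals of the same item
  ! are allowed to correlate over time.
  y1t1 WITH y1t2 y1t3;
  y1t2 WITH y1t3;
  y2t1 WITH y2t2 y2t3;
  y2t2 WITH y2t3;
  y3t1 WITH y3t2 y3t3;
  y3t2 WITH y3t3;
```

Running the same input without the (1) and (2) labels gives the unconstrained model, so the two fits can be compared with a chi-square difference test. Because all waves sit in one row per subject, the dependence across time is part of the model, which is what the three-groups setup would miss.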
 Elan Cohen posted on Tuesday, August 23, 2011 - 8:03 am
Dr. Muthen,

I'm trying to run a longitudinal CFA over 15 years of data. There are 50 items per year (all dichotomous), 1200 observations, and 2 latent factors per year.

1. Does this seem feasible? (It seems like too many variables and not enough observations to me).

2. Could you tell me if the following input will give me the correct model (shown only for 3 years of data).

MODEL:
f1a5 by item1a5-item25a5;
f2a5 by item26a5-item50a5;
f1a6 by item1a6-item25a6;
f2a6 by item26a6-item50a6;
f1a7 by item1a7-item25a7;
f2a7 by item26a7-item50a7;
item1a5-item25a5 pwith item1a6-item25a6;
item26a5-item50a5 pwith item26a6-item50a6;
item1a6-item25a6 pwith item1a7-item25a7;
item26a6-item50a6 pwith item26a7-item50a7;

Thank you very much.
 Bengt O. Muthen posted on Tuesday, August 23, 2011 - 1:28 pm
With that many items and time points, I would try a twolevel approach. You would then have 50 variables and 15 members (max) per cluster, where cluster is id (subject). This pre-assumes measurement invariance across time. You can then specify a random intercept and slope which vary across subjects to see how factor means change over time. You can look at UG ex9.16 and combine it with ex9.15.
 Susan Pe posted on Monday, June 25, 2012 - 12:33 pm
I am using panel data with 57 firms over 30 years (16 observations each year, though not all firms have 16 observations). I am trying to do a CFA, but I am not sure if this is correct.

Other than observed measures, I add under VARIABLE
CLUSTER IS firms;
ANALYSIS: TYPE IS TWOLEVEL;

I am not sure about %within% and %between% commands, and which is more appropriate. I am also not sure if I need to add within = TIME. I am not sure whether I should fix the latent variable @1 either. Thank you!!!
 Bengt O. Muthen posted on Monday, June 25, 2012 - 8:18 pm
There are many ways to do this, but let me ask you some questions first. How many items are you considering at each time point and are they continuous or categorical? Note also that longitudinal data need not be analyzed as twolevel, but can be handled in a single-level approach where a wide instead of a long format is considered. See our handouts for Topics 3-4 in our courses.
 Susan Pe posted on Tuesday, June 26, 2012 - 2:47 am
I have 7 items, some are continuous and some are categorical. Given that I have over 12000 observations x 7 items over 500 different time points, do you think using a single-level approach with a wide format is doable?
 Bengt O. Muthen posted on Tuesday, June 26, 2012 - 9:37 am
No, 500 time points is not doable in wide form in the current Mplus. In the current Mplus version you want to take a two-level approach: time and firms. This has the disadvantage that you have to assume measurement invariance across time. In the upcoming Version 7 of Mplus, however, there will be more choices, such as a 3-level approach and also the choice of allowing random forms of measurement non-invariance.
 Susan Pe posted on Wednesday, June 27, 2012 - 7:24 am
Thank you so much for your reply. Do you think I can do this in long format with
cluster = firm;
within = time ;
analysis: type = twolevel random;
model:
%within%
latent variables by observed variables;
But, how do I incorporate time under model?
Also, the gap between time points is typically 2 weeks, but is longer when the year changes (a couple of months). I see in the example that time is written in the data as 0,1,2,3... Is it a problem that the time between observations varies? Is it possible to write time as dates?
 Linda K. Muthen posted on Wednesday, June 27, 2012 - 11:39 am
See Example 9.16 for the long format.
 Susan Pe posted on Tuesday, August 28, 2012 - 2:16 pm
Hi I saw the example but it's still not clear. Is this command correct?
cluster = firmid;
Analysis:
Type = twolevel;
Model:
%within%
RD by item1 item2 item3;

Don't I need to specify %between%?
 Linda K. Muthen posted on Tuesday, August 28, 2012 - 2:54 pm
Example 9.16 is a growth model. You are showing a factor model. What exactly are you trying to do?
 Susan Pe posted on Wednesday, August 29, 2012 - 2:06 pm
I want to just confirm the factor structure by doing a cfa, not estimate a growth model. I have panel data but due to too many time points I cannot do wide format, so I opted for two level model. Level 1 (WITHIN) is data points over time, level 2 (BETWEEN) is firms. I want to validate my factor structure with a confirmatory factor analysis, using longitudinal data. I am not particularly interested in BETWEEN and WITHIN level differences.
So I am not sure which factor scores to look at or whether I am using the commands
 Bengt O. Muthen posted on Friday, August 31, 2012 - 2:36 pm
If you have a measurement instrument that is given at many time points, you can do two-level factor analysis with data in long format. This assumes measurement invariance across time. You say:

%within%
fw by y1-y10;

%between%
fb by y1-y10;

although you can have a different number of factors on the two levels.
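For orientation, a sketch of how these two MODEL sections might sit in a complete long-format input. The file name and variable names are hypothetical, and the sketch assumes all ten indicators are continuous; categorical indicators would additionally need a CATEGORICAL statement and a suitable estimator.

```mplus
TITLE:    Sketch: two-level CFA, long format
DATA:     FILE = panel_long.dat;   ! hypothetical file name
VARIABLE: NAMES = firmid y1-y10;   ! hypothetical names
          USEVARIABLES = y1-y10;
          CLUSTER = firmid;        ! level 2 = firm
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  ! One row per firm-by-occasion: variation over time
  ! within a firm.
  fw BY y1-y10;
  %BETWEEN%
  ! Variation across firms in the random intercepts
  ! of y1-y10.
  fb BY y1-y10;
```

Each row of the data file is one occasion for one firm, so the same loadings apply to every occasion. That is the measurement invariance across time that this approach assumes rather than tests.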
 June Zhou posted on Wednesday, November 07, 2012 - 7:47 am
I have a question on the interpretation of longitudinal invariance CFA results. Can I interpret them as "The responses on items are consistent across time, given that intercept invariance or strong invariance holds (factor loadings and thresholds were constrained)"?
 Linda K. Muthen posted on Thursday, November 08, 2012 - 9:37 am
If you have intercept and factor loading invariance, you can say that.
 June Zhou posted on Sunday, November 11, 2012 - 9:24 am
 Matthew Constantinou posted on Monday, March 27, 2017 - 5:04 pm
Dear Linda and Bengt,

I am looking for help coding a two-level longitudinal CFA which uses a long data structure. Level 1 = time and level 2 = participant.

For example:

-----------------
Title:
Twolevel multilevel CFA with long data
Data:
File is example.dat ;
Variable:
Names are id time y1 y2 y3;
! Assume that I have 4 time-points, i.e. within each y-variable column participants have four entries each pertaining to a separate time-point.

WITHIN = time;
CLUSTER = id;
ANALYSIS:
TYPE = TWOLEVEL;
MODEL:
%WITHIN%
f1 BY y1 y2 y3;

%BETWEEN%
f1 on time;

-----------------

I realise the inputs to %WITHIN% and %BETWEEN% are incorrect. Where would I estimate a CFA for each time-point and where would I then estimate its slope? One cannot estimate the factor at the within level (UG example 9.15) with a long data structure (UG example 9.16).

Any guidance with specific code would be greatly appreciated.

Kind regards,
Matthew
 Bengt O. Muthen posted on Thursday, March 30, 2017 - 9:14 am
Time is Within, so say:

%WITHIN%
f1 BY y1 y2 y3;
f1 on time;

%BETWEEN%
f1b by y1-y3;

The long, two-level approach assumes measurement invariance across time.
 Matthew Constantinou posted on Sunday, April 02, 2017 - 7:05 am
Dear Bengt,

Many thanks for your response. There are a few things I would like to clarify.

I am having problems interpreting the CFA results at the within and between levels in the model you describe.

From my understanding, the %within% portion of the model estimates a factor which predicts the (co)variation in Y1-Y3, based on data organized by the four time-points, and how this factor changes over time.

Latent variables based on random intercepts for Y1-Y3 are then used as factor indicators at the %between% level.
In other words, the %between% portion reflects how participant id predicts (co)variation in random intercepts for Y1-Y3.

Critically, whilst the %within% level CFA uses observations based on each time-point, time-points are clustered by participant ID.

Q1: To understand how a factor changes over time, the %within% portion is enough as it is a hybrid of within-level (time) and between-level (cluster ID) data.

Q2. What is the point of %between%? The model works fine without it, except that model fit estimates reduce greatly.

Q3. Would I add covariates, both time-varying and time-invariant, to the %within% portion of the model?

Q4. Which CFA results does one use to understand the factors which predict data for a single group of participants collapsed across time points, and how that factor changes over time?
 Bengt O. Muthen posted on Monday, April 03, 2017 - 8:10 am
The Within level captures the variation across time and the Between across subject. This gives the answers:

Q1: The Within is not a hybrid.

Q2: You certainly want to know how big the variation is across subjects

Q3: Time-varying covariates go on Within and Time-invariant covariates go on Between.

Q4: I will send you a paper of mine that is coming out in SM&R. See the section on random effects and how intercept variation across subjects can be seen as a form of measurement invariance.
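The answer to Q3 could be sketched as follows, building on the model given earlier in this thread. The covariates stress (time-varying) and female (time-invariant) are hypothetical names added purely for illustration.

```mplus
VARIABLE: NAMES = id time y1 y2 y3 stress female;
          USEVARIABLES = time y1 y2 y3 stress female;
          CLUSTER = id;
          WITHIN = time stress;    ! vary across occasions
          BETWEEN = female;        ! constant within a person
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  f1 BY y1 y2 y3;
  ! Linear trend plus the time-varying covariate.
  f1 ON time stress;
  %BETWEEN%
  f1b BY y1-y3;
  ! Time-invariant covariate predicts subject-level factor.
  f1b ON female;
```

Listing a variable on WITHIN or BETWEEN restricts it to that level of the model, which matches the split described in the answer to Q3.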
 Matthew Constantinou posted on Wednesday, April 05, 2017 - 11:00 am
Dear Bengt,

Thanks so much for clarifying that. It makes sense now: a within level factor predicts the covariances in an indicator across time, whereas the between level factor predicts the covariances in an indicator across people.

I have a few follow-up questions:

1. In the two-level longitudinal model, what does it mean to predict a factor with time (e.g., ‘F1 on Time;’)? Does the resultant beta value indicate a factor’s temporal stability?

2a. A single-level linear growth approach (i.e. assessing how a factor changes across time) suits my needs, but what is it about the factor that changes? In the command: f1@0 f2@1 f3@2; are we estimating how the factor’s predictive strength of covariances in a set of indicators changes across time?

2b. When we correlate two factors (e.g., f1 WITH z1), are we examining the covariance between their predictive strengths?

2c. A factor’s predictive strength is different from factor scores, which estimate one’s position on a factor (regardless of its predictive strength), right? To examine how factor scores change over time, we would have to extract scores (e.g., ‘SAVE IS Fscores;’) and use those as the observed measures of a growth model.

Your thoughts would be most appreciated.
Matthew
 Bengt O. Muthen posted on Wednesday, April 05, 2017 - 3:41 pm
I don't think you should try to learn this topic by a series of Discussion questions. Instead find papers/books about it such as Grimm's new growth book. But here are brief answers:

1. It means that F1 has a linear trend that you want to describe.

2a. See our multiple-indicator growth video from Hopkins short courses.

2b-2c. Ask general analysis questions not pertaining to Mplus specifically on SEMNET.
 Matthew Constantinou posted on Wednesday, April 12, 2017 - 10:40 am
Bengt, I have been reading Grimm's 'Growth Modelling' book and would recommend it to anyone running growth models in Mplus. Thank you for the recommendation.

I want to clarify a few things for those new to longitudinal CFA which tripped me up:

1. There are two general approaches to growth modelling - multilevel modelling (MLM) and structural equation modelling (SEM). Statistically, these approaches are very similar. Practically, they each have their pros and cons. Mplus can run both types of techniques (and even integrate them, see UG 9.10).
 Matthew Constantinou posted on Wednesday, April 12, 2017 - 10:47 am
2. We create factors or latent variables in growth models even if our goal isn’t a CFA. For example, to understand how math ability changed over time, I would estimate a slope and an intercept (both of which are latent variables) for maths scores at each time-point (TP). This can be confusing for those who want to run a CFA over time, because we're essentially creating latent growth variables of latent factors.

3. A CFA run via the SEM approach (e.g., F1 by depr1- depr25; F2 by anx1-anx14;) requires a sufficient number of indicators to identify the model. To run a CFA for each TP (e.g., F1a by depr1_1-depr1_25; F1b by depr2_1-depr2_25; F2a by anx1_1-anx1_14; F2b by anx2_1-anx2_14; i_depr s_depr | F1a@0 F1b@1; i_anx s_anx | F2a@0 F2b@1;) is thus a resource-hungry process, especially when we have at least 3 TPs and a questionnaire's worth of indicators. An alternative is to use MLM (‘two/three-level’ in Mplus) with long data. This allows you to estimate a single factor across TPs (rather than a factor per TP). But you are also partitioning the variance, such that a within-level factor is derived from covariances between TPs lumped across participants (see ‘P data’), whereas the between-level factor is based on inter-individual differences in covariances lumped across TPs.
 Matthew Constantinou posted on Wednesday, April 12, 2017 - 11:02 am
Bengt,

Your guidance has been invaluable thus far. I wondered whether you could address the following:

1. Is there a way of running non-linear two-level CFAs for long data (where WITHIN = Time;)? I suspect that we'd have to manipulate the column of time values from linear (e.g., 1, 2, 3, 4) to non-linear (e.g., 1, 1.3, 1.7, 2.4, 3.9).

2. How can we even tell if a within-level factor has a non-linear trend, if we cannot estimate means at each time-point?

Many thanks,
Matthew
 Bengt O. Muthen posted on Thursday, April 13, 2017 - 4:02 pm
1. CFAs that have indicators which are non-linear in the factors can be handled via XWITH. Models that are non-linear in time can be handled via time scores as you indicate or more generally as shown in the Grimm book.

2. You can estimate factor means if you do the analysis in wide format. Or, using a much more advanced approach, you can use Cross-classified growth modeling (I don't volunteer to guide you through it though; see our Utrecht Aug 2012 short course handouts and videos); see also Version 8 coming out next week.
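As one illustration of a model that is non-linear in time in the long, two-level setup, a quadratic time term can be created with DEFINE. This is a common alternative to recoding the time scores, not something the reply above specifies, and the variable name timesq is hypothetical.

```mplus
VARIABLE: NAMES = id time y1 y2 y3;
          ! Variables created in DEFINE go last on the list.
          USEVARIABLES = time y1 y2 y3 timesq;
          CLUSTER = id;
          WITHIN = time timesq;
DEFINE:   timesq = time*time;      ! quadratic time term
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  f1 BY y1 y2 y3;
  ! Linear and quadratic trend in the within-level factor.
  f1 ON time timesq;
  %BETWEEN%
  f1b BY y1-y3;
```

A coefficient on timesq that is clearly different from zero would suggest curvature in the trend of f1, which is one rough way to look for non-linearity without estimating a factor mean at each time point.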

Given that you didn't ask questions in your first 2 postings, I will let it go by that you violated our rule of posting in no more than one window.
 Justin Ponkow posted on Friday, April 21, 2017 - 2:15 pm
Dr. Muthen,

After reading several threads, I am still unsure how to navigate the modelling of a longitudinal CFA with clustering.

Set up: I am working with a latent factor "governance" which is measured by 6 manifest indicators (VA PS GE RQ RL CC). Additionally, this data is drawn across multiple countries (ccode) at multiple time periods (year).

So far, this is my model:
variable:
names are ccode year VA PS GE RQ RL CC;
Missing are all (-999);
usevariables = year VA PS GE RQ RL CC;
cluster = ccode;
within = year;

analysis:
type = twolevel random;
estimator = mlr;

However, the more I read up the %within% and %between% commands, the less I understand exactly how to code these for my specific problem.

Any assistance you could provide would be appreciated.

- Justin
 Bengt O. Muthen posted on Friday, April 21, 2017 - 5:38 pm
Are the data for the multiple time points collected for the same individuals or different ones?

How many countries do you have?
 Justin Ponkow posted on Friday, April 21, 2017 - 5:48 pm
The data collected is for the same individuals across time (the countries/ territories) and there are about 180 of them
 Bengt O. Muthen posted on Saturday, April 22, 2017 - 5:27 pm
If you don't have too many time points you can approach it like in slide 165 of our handout for our Topic 1 short course, that is, in wide, single-level format. Then add the two-level feature for the 180 clusters.
 Matthew Constantinou posted on Friday, July 21, 2017 - 11:05 am
Dear Bengt and Linda,

Goal: Compare the model fit between a bifactor, second-order, and one-factor CFA.

Implementation: Two-level long data approach (a wide data approach is too computationally taxing + I want to use all available data points).

Example syntax (bifactor): https://goo.gl/EybTmW

Questions:

a. Do you see any problems with the syntax?

b. DIFFTEST is not available for two-level models: is there another way to compare two-level models (either via Mplus or manually)?

c. I will test how the slopes differ for 2 treatment groups by first estimating the random slopes (S_GEN S_SPEC1 S_SPEC2 S_SPEC3 | GEN SPEC1 SPEC2 SPEC3 by TIME;) and then regressing group membership on each slope at the %between% level. The analysis requires 8 integration points and crashes with a ‘memory space’ error. Even if I reduce the number of integration points, I wondered whether you could advise on a computationally simpler way to compare the slopes between groups in a two-level, long format CFA?

d. Is it possible to increase the speed of each iteration (e.g., via hardware improvements)?

e. I seem to be able to run two analyses simultaneously without a cost in speed. Are there any issues with this? What if I were to run 3-4 analyses each with a ‘very heavy’ number of dimensions for integration?
 Bengt O. Muthen posted on Friday, July 21, 2017 - 5:08 pm
When the syntax doesn't fit in one window, please send the output to Support along with license number.
 Matthew Constantinou posted on Saturday, July 22, 2017 - 12:58 am
I'm comparing bifactor, second-order, and one-factor CFAs with growth factors using a twolevel long data approach.

Example syntax (bifactor):

USEVARIABLES = TIME ID V1-V20;
CATEGORICAL = V1-V20;
MISSING ARE all (-999);
CLUSTER = ID;
WITHIN = TIME V1-V20;
BETWEEN = ;
----
ANALYSIS:
TYPE = TWOLEVEL;
ESTIMATOR = WLSMV;
MODEL = NOCOVARIANCES;
MODEL:
%within%
GEN by V1* V2-V20;
GEN@1;

SPEC1 by V1* V2-V10;
SPEC1@1;

SPEC2 by V10* V11-V20;
SPEC2@1;

GEN SPEC1 SPEC2 by TIME;

%between%
;
-----
SAVEDATA: DIFFTEST IS BIFAC.dat;

a. DIFFTEST is not available for two-level models: is there another way to compare two-level models (either via Mplus or manually)?

c. I will test how the slopes differ for 2 treatment groups by first estimating the random slopes (S_GEN S_SPEC1 S_SPEC2 | GEN SPEC1 SPEC2 on TIME;) and then regressing group membership on each slope at the %between% level. The analysis requires 8 integration points and crashes with a ‘memory space’ error. Even if I reduce the number of integration points, I wondered whether you could advise on a computationally simpler way to compare the slopes between groups in a two-level, long format CFA?
 Linda K. Muthen posted on Saturday, July 22, 2017 - 10:16 am
If you are using an estimator that requires DIFFTEST, I know of no other way to do a difference test.