Multilevel Time Series Question
Mplus Discussion > Multilevel Data/Complex Sample >
 Anonymous posted on Tuesday, March 12, 2002 - 5:05 am
I have a rather complex data set that I'm going to examine.


1. Survey data were collected at three distinct time periods of an event (pre-, mid-, and post-event - approximately 9 months from beginning to end) from one cohort. Each time period the sampling frame was sampled and about 700 respondents completed the survey all three times. An additional 500+/- responded to the survey, of which some may have completed surveys for two of the three time periods (I have not received the database yet so I do not know a more accurate figure).

2. The respondents are members of distinct subgroups and group level data were collected to allow for conceptually relevant multi-level analysis.

3. At time one, group level variables (3 variables) and one individual predictor variable were collected. At time two, individual predictor variables (4 variables) and outcome variables (3 variables) were collected in addition to group level variables described. At time three the same variables as time two were collected.


1. I would like to use as much of the time one through three data as possible. Is it possible to use SEM and Mplus to model the missing values for those respondents who did not answer at all three time periods? Is there a reference you are aware of that provides a model for such a procedure?

2. Each of the variables, I conjecture, is a latent factor rather than a manifest indicator (although composite scores for the variables can be derived by summing the values for each variable). Do I have too many latent variables to adequately model this, or should I consider an SEM path analysis strategy using manifest variables?

3. I'm not certain how to include covariates (two) in an SEM model. Please clarify.

Thanks in advance

Steve Lewis
 bmuthen posted on Tuesday, March 12, 2002 - 9:39 am
This type of analysis should not be problematic. It sounds like you have longitudinal data with missingness, where the interest is not in growth over time, but rather a path analysis model with variables on both individual and group levels. Missing data can be handled in SEM using standard ML estimation under MAR (see missing data references under the Reference section on this web site); here you use all available data. I am not clear on the groups you mention - have you sampled the groups so that this could be considered a random effect (such as sampling schools), or are they fixed? The former leads to multilevel modeling and the latter to multiple-group or MIMIC modeling using covariates (for basic SEM concepts, see Bollen and other references under References on this web site). Mplus 2.02 cannot handle multilevel SEM with missing data, but this is forthcoming in version 2.1. Latent variables typically need at least 2 manifest indicators, and preferably more.
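As a rough illustration of the "all available data" approach described above, a single-level path model with missing values might be set up as follows. This is only a sketch: the file name, the variable names, and the missing-value code -999 are hypothetical. It uses the syntax of later Mplus versions, where declaring a missing-value flag is enough to obtain ML under MAR; earlier versions required TYPE = MISSING in the ANALYSIS command.

```mplus
TITLE:    Path analysis using all available data (sketch)
DATA:     FILE IS survey.dat;         ! hypothetical file name
VARIABLE: NAMES ARE id x1 y2 y3;      ! hypothetical variables
          USEVARIABLES ARE x1 y2 y3;
          MISSING ARE ALL (-999);     ! hypothetical missing-value code
ANALYSIS: ESTIMATOR = ML;
MODEL:    y3 ON y2 x1;
          y2 ON x1;
          x1;       ! mentioning the variance of x1 brings it into the
                    ! model so that cases missing on x1 are kept
OUTPUT:   PATTERNS; ! prints the missing data patterns
```

Bringing x1 into the model this way adds a distributional assumption for x1, so whether it is appropriate depends on the variable.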
 Anonymous posted on Wednesday, March 27, 2002 - 2:47 pm
Thank you for responding so quickly. I finally received the database and am now looking at the patterns of missingness and have questions about minimum number of observations to obtain unbiased estimates using a FIML estimation for missing values. Here is how the database is arrayed:

Completed T1 & T2 = 178
Completed T2 & T3 = 482
Completed T1 & T3 = 132
Completed all 3 = 186
Completed only 1 time = 3,789

I've done some reading on missing values and ML but have not run across acceptable missingness levels.

Finally, the groups are random & unbalanced (military units) so I plan to test the data using a multilevel path analysis. I've used Mplus for CFA and basic SEM models so I'm venturing into new territory and trying to avoid as many mistakes as possible.

Thanks in advance,

Steve Lewis
 Linda K. Muthen posted on Thursday, March 28, 2002 - 9:16 am
Good references for acceptable missingness levels would be the Little and Rubin and Schafer books that are referenced on our website. You will have computational difficulties if you have more than 90% missing. Information about this is given in the coverage output. However, with a large percentage missing, the analysis relies very strongly on model and missing data assumptions.

The current version of Mplus cannot combine multilevel and missing. Version 2.1 which is due out in a few months (a free update for Version 2 users) will allow this.
 Alicia Merline posted on Friday, April 07, 2006 - 3:01 pm
I want to estimate a multilevel model where individuals are nested within couples and the dependent variable is measured repeatedly and the main predictor of interest is measured repeatedly as well. There are no latent components to my model.

I am not interested in estimating change over time. Rather, I would like to estimate the overall relation between X and Y, but take into consideration the non-independence of my data.
I have looked through the manual, the handouts from the one week Mplus training and documents you provide on your website. The only examples of multilevel repeated measures modeling I can find estimate latent curves.

On page 72 of your publication “Multilevel modeling with latent variables using Mplus” there is a model estimating the intercept and slope in math scores, but data on attendance are available at all 4 time points. Using this example, what if we wanted to know the effect of attendance on math scores in any given year? How would the model be altered so that we would be estimating the association between attendance and math scores?

Here is my stab at some syntax using your example on page 7 :

VARIABLE: NAMES ARE cohort id school weight math7 math8 math9 math10 att7 att8
att9 att10 gender mothed homres;
USEOBS = (gender EQ 1 AND cohort EQ 2);
MISSING = ALL (999);
USEVAR = math7-math10 att7-att10 mothed homres;
CLUSTER = school;

math7 ON att7; ! within individual, time-varying
math8 ON att8;
math9 ON att9;
math10 ON att10;
math7 ON mothed; ! within individual, time-invariant
math8 ON mothed;
math9 ON mothed;
math10 ON mothed;

I can use the widelong command to change math7-10 to an across-time “math”, but I have no way of “telling” Mplus that the repeated observations are not independent. Is there a way to estimate a repeated measures TWOLEVEL model without making the latent slope the DV?
 Bengt O. Muthen posted on Friday, April 07, 2006 - 5:37 pm
You can do this in a couple of different ways. Growth modeling is not needed; it is ok to consider only a regression of y on x. One way is to use a multivariate approach to individuals within couples (since there are only 2), taking care of the couple correlation, and let the time dependence be handled by Type = Complex. This means that you would say

cluster = couple;

and have your data arranged as

y1 y2 x1 x2

where the subscript refers to person within couple for a given time point. So you give the data in long form with respect to time - each couple has as many rows as there are time points. The number of rows a couple has is their "cluster size". In this way, Type = Complex computes SEs that take the correlatedness over time within couple into account.

Your model statement would be:

y1 on x1;
y2 on x2;

where the x's are correlated by default and so are the residuals of the y's.
 Bengt O. Muthen posted on Friday, April 07, 2006 - 6:17 pm
Just to add a clarification, your variable list would be:

Names = couple y1 y2 x1 x2;


Usev = y1 y2 x1 x2;

and your data set would then have the same couple value for the rows of that couple (the repeated measures on y and x).
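Putting the pieces from the two posts above together, a complete input might look roughly like this. It is a sketch only: the file name is hypothetical, and the data are assumed to be in the layout described, with one row per couple per time point.

```mplus
TITLE:    Regression of y on x for partners within couples
DATA:     FILE IS couples_long.dat;  ! hypothetical file name
VARIABLE: NAMES = couple y1 y2 x1 x2;
          USEV = y1 y2 x1 x2;
          CLUSTER = couple;          ! same value on all rows of a couple
ANALYSIS: TYPE = COMPLEX;            ! SEs adjusted for dependence over time
MODEL:    y1 ON x1;                  ! partner 1
          y2 ON x2;                  ! partner 2
```

Because each row holds both partners at one time point, the couple correlation is handled by the multivariate model and the dependence across rows by the cluster-adjusted standard errors.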
 Alicia Merline posted on Monday, April 10, 2006 - 12:15 pm
Thanks, your response cleared things up very well for me.

 Alicia Merline posted on Wednesday, April 12, 2006 - 11:10 am
Just a follow-up. If I combine husbands and wives into one line of data, I will be modeling men and women separately, so I will not be estimating any effect of gender on the DV. Because I have to estimate the regressions separately for each time I would not get a time-independent estimate of the regression Y ON X.

What I am interested in is the effect of marital status on mental health. Because there is no intervention in this study and because the points of data collection don't have developmental significance, I'd like an overall estimate of the association between marital status and depression, irrespective of time. Because the data are nested in individuals, who are nested in couples, I feel the data require a multilevel model. Otherwise, this would simply be a single logistic regression. But I feel I cannot ignore the nestedness here.

It is looking like I can't do what I'm trying to do if I use MPLUS.

I suppose there is another option if I select only cases where no remarriage takes place after divorce and then align the data among divorced individuals such that all respondents are married at times 1 and 2 and divorced at times 3 through 5 (data from non-divorced people would be left as is). Then I could estimate a piecewise lcm at the WITHIN level and regress the intercepts and slopes on the within and between subjects variables. This would allow me to do things like see if couple-level characteristics relate to slope before divorce differently than they relate to slope after divorce. If I do this, is there also a way to compare intercept 1 to intercept 2 and slope 1 to slope 2?

Thanks again,

 Bengt O. Muthen posted on Thursday, April 13, 2006 - 10:47 am
Your first paragraph suggests that you misunderstood my recommendation. You don't do the regression separately for each time. My suggestion implies that you do get a time-independent estimate of the regression of y on x - you get only one intercept and one slope. You do take into account the nestedness of the data by using Type = Complex.

You are right that my suggestion estimates separate regressions for men and women, thus allowing both different intercepts and different slopes.

You don't need to do multilevel modeling to take nestedness into account. But you can do multilevel modeling in Mplus if that is what you want.

If this is unclear, let me know how I can help clarify further.
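If a gender difference is of interest, one possibility is to label the two slopes and test their equality. This goes beyond what was suggested in the thread, so treat it as an assumption rather than a recommendation. The labels b1 and b2 are arbitrary, and the assignment of husbands and wives to y1/x1 and y2/x2 is hypothetical.

```mplus
MODEL:      y1 ON x1 (b1);   ! slope for partner 1 (e.g., husbands)
            y2 ON x2 (b2);   ! slope for partner 2 (e.g., wives)
MODEL TEST: 0 = b1 - b2;     ! Wald test of equal slopes
```

If the test suggests the slopes do not differ, giving both regressions the same label would yield a single pooled slope.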
 ide katrine birkeland posted on Sunday, April 15, 2012 - 11:58 pm
Dear Muthen(s).

I am turning to Mplus because of my need for a complex analysis. I'd like to test for mediation and moderation (in separate analyses) in a longitudinal dataset consisting of three waves. I would like to do this using multilevel modeling, in order to investigate changes both within and between subjects. I have been searching for papers that show examples of how to do this, and preferably something even more hands-on, but have not been successful. Do you have any reading tips or online courses where I can learn more? Perhaps also a syntax or example to practise with?

Thank you in advance.
 Linda K. Muthen posted on Monday, April 16, 2012 - 8:51 am
I think what you want is the model in Example 9.3 at three time points.
 ide katrine birkeland posted on Tuesday, April 17, 2012 - 2:06 am
Excellent, thank you!
 ide katrine birkeland posted on Tuesday, April 24, 2012 - 2:06 am
Dear Prof.

I apologize in advance for any errors in my interpretation of my current problems.

I have now read more on the suggested literature and see how example 9.3 might apply to my mediated model. I only have a couple of questions.

My mediated research model has continuous variables (except time and person). Thus, I wonder whether having u as a dependent variable makes sense in this case. It seems more correct that y is the DV, x2 is the M and x1 is the IV. If I understand Bollen & Curran (2006) correctly, time should then be regressed on (or multiplied with?) all the included variables. Further, I have to admit I am not sure whether u should represent time or person, and this goes for w as well. I see that w is meant to represent the cluster-level covariate, but since this is longitudinal all my variables are at the same level, so I'm confused.

My next question pertains to a moderated research model. This is based on the same dataset, where time is within person, and the moderator variable is time-varying like the rest of the variables.
Again, if I read B&C correctly, time should then be multiplied with all the variables. How to describe this in the syntax and how to interpret the results is very difficult to understand.

Thank you very much in advance.

 Linda K. Muthen posted on Tuesday, April 24, 2012 - 8:06 am
I see now that you don't have clustering except for time. In this case, you don't need multilevel modeling. When data are in a wide multivariate format, you have a single-level model. The multivariate analysis takes care of clustering due to repeated measures.
 ide katrine birkeland posted on Wednesday, April 25, 2012 - 11:37 pm
OK, thank you very much. I'm still not sure how to calculate the moderating variable and write syntax to test for moderation with longitudinal data, though. I've been investigating the examples in ch. 6, but I can't seem to find anything similar. Any suggestions? I apologize for my ignorance and thank you very much for your time.

 Linda K. Muthen posted on Thursday, April 26, 2012 - 1:46 pm
Moderation can be tested for categorical moderators using multiple group analysis. If a parameter is not equal across groups, this is an interaction. It can also be tested by including an interaction in the regression:

y ON x1 x2 x1x2;
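For readers wondering where x1x2 comes from: it is not created automatically. A minimal sketch, assuming it is computed as a product term in the DEFINE command (the variable names in NAMES are hypothetical):

```mplus
VARIABLE: NAMES = y x1 x2;              ! hypothetical data columns
          USEVARIABLES = y x1 x2 x1x2;  ! x1x2 is created below, so last
DEFINE:   x1x2 = x1*x2;                 ! product (interaction) term
MODEL:    y ON x1 x2 x1x2;
```

A significant coefficient for x1x2 indicates that the effect of x1 on y depends on the level of x2.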
 ide katrine birkeland posted on Monday, April 30, 2012 - 1:50 am
Thank you very much. How do you then include the time variable in this? I am familiar with moderation, but have never done it with a longitudinal data set. My hypothesis concerns whether and how x1x2 moderates the x1-y relationship over time. It is theoretically relevant to investigate this both within and between subjects. My initial thought was to do a triple interaction in multilevel, x1x2t; however, it seems problematic as t is then a categorical variable.

Here's an idea I had, but I see that it's just a start. Can I write the syntax something like this?

ix1 sx2 | x1@0 x1@1 x1@2;
iy sy | y@0 y@1 y@2;
ix1x2 sx1x2 | x1x2@0 x1x2@1 x1x2@2;
iy ON x1;
sy ON x1;
ix1x2 ON x1 XWITH y;
sx1x2 ON x1 XWITH y;
 ide katrine birkeland posted on Monday, April 30, 2012 - 1:53 am
*Continues from post above:

...or maybe something like this?

ix1 sx2 | x1@0 x1@1 x1@2;
iy sy | y@0 y@1 y@2;
ix1x2 sx1x2 | x1x2@0 x1x2@1 x1x2@2;
iy ON x1 x1x2;
sy ON x1 x1x2;
 Linda K. Muthen posted on Monday, April 30, 2012 - 2:13 pm
Please keep your posts to one window.

I am confused about your model. The | symbol for growth is used for wide format data where you would have variables x1, x2, and x3. It seems to me you have long format data because you show only x1 three times. See Example 9.16 of the user's guide. Then you would create an interaction with time using the DEFINE statement, for example,

int = time*moderator;

And then in the MODEL command, you would have:

s1 | y ON time;
s2 | y ON int;
 ide katrine birkeland posted on Wednesday, May 02, 2012 - 3:10 am
Thank you very much for your patience, and my apologies for the double posting.

Actually, I have both long and wide versions of the data set, so with regards to your comment I am now using the long format of my data.

Using Example 9.16, I have tried to create a new syntax.

When I run it I get the following error: *** ERROR in MODEL command
Unknown variable: INT

Below are parts of the syntax I ran:

Usevariable = OPce POSxOP CY ID time;
Cluster = ID;

DEFINE: int = time*POSxOP;

ANALYSIS: Type = Twolevel Random;

s1 | CY ON time;
s2 | CY ON int;

CY s1 s2 ON OPce POSxOP;
CY WITH s1 s2;
 Linda K. Muthen posted on Wednesday, May 02, 2012 - 6:26 am
Any new variable created in DEFINE that is used in the analysis must be put at the end of the USEVARIABLES list.
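Applied to the syntax in the previous post, the fix might look roughly like this. It is a sketch; ID is left off the USEVARIABLES list here on the assumption that it serves only as the cluster variable.

```mplus
VARIABLE: USEVARIABLES = OPce POSxOP CY time int;  ! int comes last
          CLUSTER = ID;
DEFINE:   int = time*POSxOP;   ! new variable used in the MODEL command
```

The rest of the input (ANALYSIS and MODEL) would stay as posted.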
 ide katrine birkeland posted on Thursday, May 10, 2012 - 12:30 am
Dear Prof. Muthen,

Thank you, I am now able to run the full model, but I get the following message:


Of course it is logical that time and int are correlated, as int is defined by time*moderator. However, I do not understand why the correlation is so high; in SPSS, the correlation is -.058**. Second, I am unsure how to set the "ALGORITHM=EM AND MCONVERGENCE

Thank you for your help.
 Linda K. Muthen posted on Thursday, May 10, 2012 - 11:19 am
Please send the output and your license number to
 Alberto Canarini posted on Tuesday, October 13, 2015 - 5:26 pm
I'd like to estimate a path analysis model (no latent variables) the same as Alicia Merline posted on Wednesday, April 12, 2006.
I have three time reps (I'm not interested in time variation, but just want to correct for the dependency) and two groups represented by two soil depths.

I was thinking to do the same approach but to input:
CLUSTER : time
and organize the data by depth, as:
y1 y2 x1 x2
where 1 and 2 represent the two depths, and the model would be:

y1 on x1
y2 on x2

Does this sound correct?

Also, when I graphically represent my model, is there any way to represent a total model rather than separate models for the two depths? When I get the output from Mplus, the estimates are grouped as Y1 on X1 and Y2 on X2, obviously representing the two separate groups.

Thanks in advance for any suggestions.
 Bengt O. Muthen posted on Wednesday, October 14, 2015 - 2:10 pm
With 2 groups (2 soil depths) influencing the outcome y, you should create a single dummy variable x where x=0 is depth 1 and x=1 is depth 2. So your data have only 2 columns. Then say

y on x;
 Alberto Canarini posted on Wednesday, October 14, 2015 - 8:43 pm
Dear Prof Muthen,

Thanks for your answer. However I still have doubts.
As I mentioned, I will use CLUSTER=time to correct for the 3 time reps. Then I want to see the effect of x on y (well, the model is a little more complicated), where both x and y have data from the two depths (therefore it is not 2 groups influencing 1, and it's not multilevel). Let's say x corresponds to microbial biomass and y is soil carbon. Therefore the data for microbial biomass and soil carbon come from both depths. But since the two soil depths lie one directly below the other, I believe there is some sort of correlation I should correct for. Does that make sense?

I hope I explained a little clearer this time.

 Bengt O. Muthen posted on Thursday, October 15, 2015 - 3:27 pm
I see, so y1 and y2 are seen as 2 different variables, so your original approach seems right:

CLUSTER : time
and organize the data by depth, as:
y1 y2 x1 x2
where 1 and 2 represent the two depths, and the model would be:

y1 on x1
y2 on x2

where you could use Type=Complex in the Analysis command.

What throws me off in your original message though is "the total model". What would that mean?
 Alberto Canarini posted on Thursday, October 15, 2015 - 3:37 pm
Thanks again for your answer.

What I meant when asking for a "total model" was: the output when calculating

y1 on x1
y2 on x2

it's obviously going to give me two results (in terms of estimates, s.e., p-values etc.), one for y1x1 and the other for y2x2. My question was: is there a way to combine them, in order to have a graphical output that includes both?
 Bengt O. Muthen posted on Thursday, October 15, 2015 - 4:08 pm
If you mean the estimated regression lines, yes they can be plotted in one graph for example using LOOP and PLOT where you define

reg1 = a1 + b1*x
reg2 = a2 + b2*x

as the two estimated regression lines, where the a's and the b's are labels given in the Model command. See UG ex 3.18.
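A sketch of how the LOOP and PLOT options might be combined for this, following the general pattern of UG ex 3.18. The range and step for x (-2 to 2 in steps of 0.1) are hypothetical and should be chosen to match the observed range of the predictor.

```mplus
MODEL:    [y1] (a1);           ! intercept, depth 1
          y1 ON x1 (b1);       ! slope, depth 1
          [y2] (a2);           ! intercept, depth 2
          y2 ON x2 (b2);       ! slope, depth 2
MODEL CONSTRAINT:
          PLOT(reg1 reg2);     ! quantities to be plotted
          LOOP(x, -2, 2, 0.1); ! hypothetical range and step for x
          reg1 = a1 + b1*x;
          reg2 = a2 + b2*x;
PLOT:     TYPE = PLOT2;
```

The two lines can then be viewed from the plot menu after the run.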
 Alberto Canarini posted on Thursday, October 15, 2015 - 9:56 pm
Sorry if I was confused in the previous message.

In my case it's a path analysis, with more variables than just x and y (I used only two to simplify my example). For instance I would have something like:

y1 on x1;
y2 on X2;
X1 on a1 b1;
X2 on a2 b2;
a1 with b1;
a2 with b2;

And so the output gives me the estimates (and standardized estimates) and R-square, which I use to graphically show interactions between different soil parameters as a path analysis. But if I have two estimates (and 2 R-squares) for each path (because of the two depths), is it possible to combine the two values for each path, in order to have only one value per path and then graphically represent only one model?
 Bengt O. Muthen posted on Friday, October 16, 2015 - 1:39 pm
I don't see a natural way to combine the two - unless you see the 1 and 2 measurements as 2 indicators of a Y factor and 2 indicators of an X factor and it is the factors that follow the path model.
 Alberto Canarini posted on Friday, October 16, 2015 - 4:29 pm
It's a pity there's no way to combine the two.
Thanks anyway for all your help and all the quick answers.
 Ok-young, JI posted on Sunday, March 27, 2016 - 7:06 pm
Hi Drs. Muthen,

I'm studying for a master's degree in organizational psychology.

I'm now analyzing diary study data.

My data consist of three levels:

level 1 : time (10 days)
level 2 : person
level 3 : team

I read the Mplus user's guide, and it said my data can be analyzed using 'twolevel random', but I can't find an example of that.

How can I analyze this model with twolevel random?

Please let me know.
 Bengt O. Muthen posted on Monday, March 28, 2016 - 6:36 pm
Why do you want "Random"? And why don't you want 3-level analysis given that you have three levels?
 Ok-young, JI posted on Monday, March 28, 2016 - 8:50 pm
Because the level 1 data are nested in level 2, and the level 2 data are nested in level 3.

My study design assumes both within and between variance.

So I need random slopes, random intercepts, and three-level modeling.

I saw some studies that used three-level modeling, but I can't find the syntax.

According to the Mplus user's guide, my data can be analyzed with twolevel random,

but I can't find any example of that.
 Linda K. Muthen posted on Tuesday, March 29, 2016 - 8:18 am
See Example 9.20 in the user's guide.
 Ok-young, JI posted on Tuesday, March 29, 2016 - 8:03 pm
I looked at user's guide Example 9.20,
but this example is threelevel random, not twolevel random.

On page 253 of the user's guide: "Three level analysis where time is the first level, individual is the second level, and cluster is the third level is handled by two-level modeling."

My data are exactly as described above.

What I need is a syntax example for that model.
 Linda K. Muthen posted on Wednesday, March 30, 2016 - 1:08 pm
See Example 9.2. It is TWOLEVEL RANDOM. Example 9.20 is THREELEVEL RANDOM.
 Ok-young, JI posted on Friday, April 01, 2016 - 1:11 am
I looked at user's guide Example 9.2,
but this example has a two-level design analyzed with twolevel random.

What I need is a three-level design with a twolevel random analysis.

Is this a rare model?
 Linda K. Muthen posted on Friday, April 01, 2016 - 7:13 am
I don't know what you mean. With THREELEVEL RANDOM, you can have a random slope on level 1, on level 2, or on both level 1 and level 2. What exactly are you trying to do?
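For orientation, a three-level random-slope setup for long-format diary data might look roughly like the following, loosely patterned on Example 9.20. All variable names are hypothetical: y is the daily outcome, x a daily predictor, and person and team identify the two higher levels.

```mplus
VARIABLE: NAMES = team person y x;   ! hypothetical names
          USEVARIABLES = y x;
          CLUSTER = team person;     ! highest level first
          WITHIN = x;                ! daily (level-1) predictor
ANALYSIS: TYPE = THREELEVEL RANDOM;
MODEL:    %WITHIN%
          s | y ON x;                ! random slope of y on x
          %BETWEEN person%
          y s;                       ! level-2 variances
          y WITH s;
          %BETWEEN team%
          y s;                       ! level-3 variances
          y WITH s;
```

Here the random intercept and random slope vary across both persons and teams; whether both are needed is a substantive question the sketch does not settle.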
 Matthew Burgess posted on Thursday, August 04, 2016 - 9:52 pm
Hi Drs. Muthen.

I have a query regarding whether I have appropriately specified and interpreted a model using time as a predictor.

My syntax is:
!where tScore represents minutes from first observation

MISSING = ALL (-9999);
CLUSTER = PNum; !individual
WITHIN = tNA tScore;
DEFINE: center tNA(groupmean);
s1c | tERC ON tNA;
s2e | tERE ON tNA;
s1tc | tERC ON tScore;
s2te | tERE ON tScore;

tERC s1c tERE s2e;
tERC WITH s1c;
tERE WITH s2e;
tERC tERE WITH s1tc s2te;

I've interpreted the output:

s1tc -0.001 (sig .000)

s2te .001 (sig .000)

as higher individual mean TERC predicts a reduction in tERC over time

and higher individual mean TERE predicts a reduction in tERE over time.

1. Is this an appropriate way to model time? (Note that missing data has led to difficulty getting an LGM to converge.)

2. Have I interpreted this appropriately (in particular the 'over time' statement)?

Kind Regards
 Bengt O. Muthen posted on Friday, August 05, 2016 - 3:00 pm
I assume that tNA is a time-varying covariate and that tScore has a zero somewhere (preferably at time 1) to give a clear interpretation of your intercept and slope growth factors (random effects).

I would change "predicts" to "correlates with". Otherwise, it seems ok.
 Matthew Burgess posted on Sunday, August 07, 2016 - 5:41 pm
Yes tNA is a time-varying covariate and 0 for each individual is set at time 1.

Thank you for your reply.
 Bengt O. Muthen posted on Monday, August 08, 2016 - 3:29 pm
 Katerina Rnic posted on Monday, August 31, 2020 - 7:38 am

I am wondering whether I have correctly specified a model.

We conducted a study using twice-daily diaries. Youth were asked to indicate how much they co-ruminated in person, over text, over phone, and over social media at each prompt. They were also asked to indicate their positive affect. We created time-lagged variables so that we could regress positive affect on levels of co-rumination across each type of communication at the previous time point. (So we are examining how co-rumination at time T-1 predicts positive affect at T.) We have already run the model in DSEM and now want to try it using MLR.

Is the below syntax correct?
Please note that Inp_M_1, t_M_1, sm_M_1, and p_M_1 are the lagged variables for co-rumination in minutes in person, over text, over social media, and over the phone, respectively. PosAf is positive affect.

usevariables are Inp_M_1 t_M_1 sm_M_1 p_M_1 PosAf PosAf_1;
within are Inp_M_1 t_M_1 sm_M_1 p_M_1 PosAf_1 PosAf;
Cluster is ID;

type is twolevel random;

s_n | PosAf ON PosAf_1;
s_i | PosAf ON Inp_M_1;
s_t | PosAf ON t_M_1;
s_s | PosAf ON sm_M_1;
s_o | PosAf ON p_M_1;


output: sampstat tech1 tech8;
 Bengt O. Muthen posted on Monday, August 31, 2020 - 3:49 pm
Looks fine. The correlation within subjects over time is here captured only by the cluster effects on Between. The correlation between adjacent observations close in time is not captured as it is with DSEM and its autocorrelation. This means that SEs may be too small.
 Katerina Rnic posted on Monday, August 31, 2020 - 3:55 pm
Dear Dr. Muthen,

Thank you so much for your comments. Is there a way to capture autocorrelations using MLR? I do have information on the time that each observation was completed.

Thanks very much,
 Bengt O. Muthen posted on Monday, August 31, 2020 - 4:02 pm
Only if you arrange the data in single-level wide format which you would do only with a small number of time points. Then you can capture AR using UG ex 6.17.
 Katerina Rnic posted on Monday, August 31, 2020 - 4:08 pm
We have 28 time points, is that too many?
 Bengt O. Muthen posted on Monday, August 31, 2020 - 4:38 pm
With 1 DV, that's not too much. But for T=28, I would use DSEM or RDSEM instead with much more modeling flexibility.
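For comparison with the MLR setup earlier in this thread, a bare-bones DSEM specification with a person-specific autoregression for positive affect might look like this. It is a sketch that omits the co-rumination predictors; the BITERATIONS value is a hypothetical choice.

```mplus
VARIABLE: USEVARIABLES = PosAf;
          CLUSTER = ID;
          LAGGED = PosAf(1);       ! creates the lag-1 variable PosAf&1
ANALYSIS: TYPE = TWOLEVEL RANDOM;
          ESTIMATOR = BAYES;       ! DSEM is estimated with Bayes
          BITERATIONS = (2000);    ! hypothetical minimum iterations
MODEL:    %WITHIN%
          s_n | PosAf ON PosAf&1;  ! random autoregressive slope
          %BETWEEN%
          PosAf s_n;               ! variances of mean and AR slope
          PosAf WITH s_n;
```

With LAGGED, the lag is created by the program, so a manually lagged column such as PosAf_1 is not needed in this setup.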
 Katerina Rnic posted on Monday, August 31, 2020 - 7:13 pm
Thanks very much for your advice.

We noted that our DSEM models and models run in the software HLM (which uses MLR) were producing different results. Do you know why this might be? For example, besides using different estimators, do DSEM and the HLM program apply different assumptions, or handle missing data differently?
 Bengt O. Muthen posted on Tuesday, September 01, 2020 - 8:35 am
There must be a difference in the models used so the runs are not comparable. Send the outputs from DSEM and HLM to Support so we can see where the differences lie.