Dynamic panel data models in Mplus?
Mplus Discussion > Multilevel Data/Complex Sample
 Mike Zyphur posted on Wednesday, October 05, 2016 - 3:50 am
With panel data in a multilevel model, autoregressive (AR) effects have positive bias unless unit effects are removed. However, removing them via group-mean centering creates a negative bias, because that treats the unit effects as observed. This is a classic econometric problem (Baltagi, 2008). I am wondering if Mplus can creatively solve it by estimating a single between-level latent variable for y and lagged y (ly):

y on ly;
f by y ly@1; y@0; ly@0;

However, this causes a singularity because y and ly are almost identical.

Thousands of papers and many Stata programs have been written to address this problem. SEM can handle it in a 'wide' format because there is only one y (Moral-Benito, 2013), but this causes estimation problems with large T and random slopes. Any multilevel solutions in Mplus would be great!


Baltagi, B. (2008). Econometric analysis of panel data. John Wiley & Sons.

Moral-Benito, E. (2013). Likelihood-based estimation of dynamic panels with predetermined regressors. Journal of Business & Economic Statistics, 31(4), 451-472.
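The two biases described above are easy to demonstrate by simulation. The following is a minimal sketch (my own illustration, not from the thread): it simulates a stationary AR(1) panel with unit effects in the "small T, large N" setting, then compares pooled OLS (no centering; unit effects inflate the AR estimate) against the within estimator (group-mean centering; the negative Nickell bias of order -(1+phi)/T).

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, phi = 2000, 5, 0.5  # many units, few time points: "small T, large N"

# Simulate a stationary AR(1) around unit means:
#   y_it = mu_i + phi * (y_{i,t-1} - mu_i) + e_it
mu = rng.normal(0.0, 1.0, N)
y = np.empty((N, T))
y[:, 0] = mu + rng.normal(0.0, 1.0, N) / np.sqrt(1.0 - phi**2)
for t in range(1, T):
    y[:, t] = mu + phi * (y[:, t - 1] - mu) + rng.normal(0.0, 1.0, N)

# Pooled OLS with no removal of unit effects: AR estimate is biased upward,
# because the persistent mu_i masquerades as autoregression
xr = y[:, :-1].ravel() - y[:, :-1].mean()
zr = y[:, 1:].ravel() - y[:, 1:].mean()
phi_pooled = float(xr @ zr / (xr @ xr))

# Within estimator: group-mean centering treats mu_i as observed (using the
# sample mean ybar_i), which induces the negative Nickell bias
yc = y - y.mean(axis=1, keepdims=True)
x = yc[:, :-1].ravel()
z = yc[:, 1:].ravel()
phi_within = float(x @ z / (x @ x))

print(f"true phi = {phi}, pooled OLS = {phi_pooled:.3f}, within = {phi_within:.3f}")
```

With these settings the pooled estimate lands well above 0.5 and the within estimate well below it, bracketing the truth from opposite sides, which is exactly the dilemma the thread is about.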
 Bengt O. Muthen posted on Wednesday, October 05, 2016 - 4:05 pm
Don't know about unit effects but I'll have a look at the article. Large T (like > 50) with random slopes is coming in Version 8.
 Mike Zyphur posted on Wednesday, October 05, 2016 - 7:36 pm
Hi Bengt,
Thanks for this. Sorry for not being clear. By 'unit effect' I mean time-invariant factors, so the Between part of the model. If the problem can be addressed in Mplus without positively or negatively biasing the AR parameter in a multilevel model, it would be profound.

 Mike Zyphur posted on Saturday, October 08, 2016 - 12:10 am
Hi Bengt,
Apologies for posting again, but the bias associated with dynamic panel data models in a multilevel framework is probably best described in Bond (2002).


The amazing thing is that no one has been able to solve the problem in a typical multilevel framework. The issue is that the outcome variable and its lag as a predictor must simultaneously have their Between parts 'integrated out', so to speak, in order to avoid bias. Whether through plausible values or another approach, the problem might be solvable in Mplus (even if the econometricians have failed to solve it for over 30 years). It would be an amazing feat. Any thoughts are greatly appreciated!

 Tihomir Asparouhov posted on Monday, October 10, 2016 - 9:14 am

We think that the bias you are talking about is resolved in the upcoming version 8 release.

Some quick links

pages 8, 9

pages 13, 35, 36
 Mike Zyphur posted on Tuesday, October 11, 2016 - 3:26 am
Hi Tihomir,
Thank you for this. In the first set of slides, the "LAGVAR=" and "&1" options reflect a solution to the problem of needing to estimate, yet treat as observed, the Between part used for centering the lag in an unbiased way. This looks like a watershed development. The messy econometric tradition of GMM with instruments seems much less useful now. An amazing achievement!

Can I ask how you've managed this? "BUT: CMC in Mplus is not associated with this bias (nor is it in WinBUGS, see Jongerling et al., 2015), probably because the same (individual) parameter is used as the intercept and for CMC of the lagged predictor."

The past few days I have been developing a plausible values approach that now seems irrelevant.

Thanks again,
 Tihomir Asparouhov posted on Tuesday, October 11, 2016 - 8:28 am

We use Bayes estimation via MCMC. The between part of the centering is now on both the left and the right hand side of the model (as opposed to regular multilevel models where it is only on the left side). It is basically the same as what you are doing via plausible values, but we do it as one step MCMC estimation. Btw, as usual, we have this in multivariate settings and for any length of the lag ... and for latent variables as well.
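For concreteness, the latent-centering idea described here can be sketched as follows (my own notation, not from the thread): the unit mean is a latent parameter appearing on both sides of the equation and is integrated over in the MCMC run.

```latex
% Latent (model-based) centering: \mu_i appears on both the left and the
% right-hand side and is estimated jointly with \phi
y_{it} = \mu_i + \phi \, (y_{i,t-1} - \mu_i) + \varepsilon_{it},
\qquad \mu_i \sim N(\mu, \psi), \quad \varepsilon_{it} \sim N(0, \sigma^2)
```

Observed group-mean centering would instead replace \mu_i on the right-hand side with the sample mean \bar{y}_i, treating an estimated quantity as known, which is the source of the negative bias discussed earlier in the thread.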

 Bengt O. Muthen posted on Tuesday, October 11, 2016 - 10:48 am
Mike - what's the typical value of T (number of time points) in these applications? And what's N?
 Mike Zyphur posted on Tuesday, October 11, 2016 - 5:03 pm
Hi guys,
This is an amazing result. Typically, it is "small T, large N", with asymptotics often examined in N.

It's hard to overstate your advance. As you probably know, econometrics often uses GMM with instruments in the Arellano-Bond tradition. The citation counts here are large:


The Stata 'xtabond2' procedure is often used, but it creates many dilemmas as its author notes:


Yet, econometrics seems trapped. Indeed, the current issue of the Stata Journal has a new panel VAR program using GMM:


The new Mplus method offers a clear path out, with many GMM dilemmas eliminated. It's an amazing result.

 Tihomir Asparouhov posted on Wednesday, October 12, 2016 - 3:58 pm

The simulation study on pages 35 and 36 of the above slides produces good results with T as low as 10.

When T is 5 or so, the results are not as great, and we would recommend using ML estimation as in User's Guide example 6.17.
 Mike Zyphur posted on Thursday, October 13, 2016 - 3:45 am
Hi Tihomir,
If T is small enough, then SEM would seem possible as long as numerical integration could be handled. For example:

Eta by y1-y5; ! Allow time-varying unit effects
AR | y2 on y1;
AR | y3 on y2;
AR | y4 on y3;
AR | y5 on y4;
y1-y5@.01; ! Required for numerical integration
u_y1-u_y5 by y1@0;
RES | y1 on u_y1;
RES | y2 on u_y2;
RES | y3 on u_y3;
RES | y4 on u_y4;
RES | y5 on u_y5;
u_y1-u_y5 Eta with u_y1-u_y5@0 Eta@0;
Eta RES AR with Eta RES AR;

If Mplus were to allow Bayes estimation for single-level models with random slopes, this could be a nice alternative to the multilevel specification. It makes very few assumptions (e.g., it allows time-varying variances), but it is harder to identify, and numerical integration quickly blows up.
 Tihomir Asparouhov posted on Thursday, October 13, 2016 - 2:56 pm

> If Mplus were to allow Bayes estimation for single-level models with random slopes

You can trick Mplus into doing this by running it as a two-level model with CLUSTER = ID (all cluster sizes are 1); in fact, this is how we do the "no algorithm integration" estimation.

You will be happy to hear more exciting V8 news: a new command that does your RES function. v | y; would estimate Var(y) = exp(v).
 Mike Zyphur posted on Thursday, October 13, 2016 - 4:06 pm
Thank you for this. The past few days have been truly exciting thanks to all of you! Simply stunning developments.
