Autoregressive errors in multi-level ...
Mplus Discussion > Multilevel Data/Complex Sample >
 Shige Song posted on Wednesday, June 18, 2003 - 3:01 am
Hi,

Is it possible to estimate a two-level model with autoregressive errors at the second level using Mplus? Some other packages (HLM, MLwiN, aML) can estimate multilevel models with autoregressive errors at the first level but not at the second level, as described in DiPrete and Grusky (1990).

Since Mplus is more flexible in handling covariance structures, maybe it can do a better job? Thanks!

Best,
Shige Song
Department of Sociology, UCLA

----------------
Reference
DiPrete, Thomas A., and David B. Grusky. 1990. "The Multilevel Analysis
of Trends with Repeated Cross-Sectional Data." Sociological Methodology
20:337-368.
 bmuthen posted on Wednesday, June 18, 2003 - 4:38 am
Yes, this is possible. In 3-level modeling of growth - which is handled as a 2-level model in Mplus - the time-specific residual variances for the outcomes are in fact assumed zero on the cluster level (level 3) in multilevel modeling, whereas they can be estimated in Mplus.
 Shige Song posted on Wednesday, June 18, 2003 - 1:41 pm
Thanks for your reply! I did not find the terms "autoregressive error" or "autocorrelation" in the user manual. Could you tell me how to specify them in the Mplus language?

Best,
Shige
 Linda K. Muthen posted on Wednesday, June 18, 2003 - 3:52 pm
You would use the WITH option. To specify the residual covariance between y1 and y2, you would say:

y1 WITH y2;
 Linda K. Muthen posted on Wednesday, June 18, 2003 - 6:00 pm
If you want to impose a first-order autoregressive residual structure, you respecify the model so that the residual is expressed as a factor. This means that the observed outcome has a factor influencing it and zero residual.

MODEL:

f1 BY y1@1; y1@0;
f2 BY y2@1; y2@0;
f3 BY y3@1; y3@0;

f3 ON f2 (1);
f2 ON f1 (1);

This gives a first-order autoregressive structure for the residuals, which are represented by the factors f1, f2, and f3.
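As a reading of the specification above (a sketch, not part of the original reply): taking every loading as fixed at 1 and every outcome residual variance as fixed at 0, the factors stand in for the residuals, and the shared label (1) holds the two regression coefficients equal. The symbols below are notation introduced here for illustration: \beta is the common coefficient, \zeta_t is the disturbance of factor f_t (assumed uncorrelated with earlier factors), and \mu_t stands for whatever else the model specifies for y_t.

```latex
\begin{aligned}
y_t &= \mu_t + f_t, \qquad t = 1, 2, 3,\\
f_t &= \beta f_{t-1} + \zeta_t, \qquad t = 2, 3,\\
\operatorname{Cov}(f_1, f_2) &= \beta \operatorname{Var}(f_1),\\
\operatorname{Cov}(f_2, f_3) &= \beta \operatorname{Var}(f_2),\\
\operatorname{Cov}(f_1, f_3) &= \beta^{2} \operatorname{Var}(f_1).
\end{aligned}
```

The lag-2 covariance arises only through the lag-1 path, which is what makes the structure first-order.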
 Shige Song posted on Friday, June 20, 2003 - 3:25 am
Dear Linda,

Thanks for the response. I am fairly new to Mplus and am still reading through the User's Guide. In the meantime, maybe you can give me a quick answer to something that might otherwise take me days to figure out.

Suppose I have 8 different surveys collected in one country at different time points with each survey containing 1000 observations. I pooled them together and want to do a two-level analysis.

At level-1 (individual) I have a dependent variable y and two independent variables x1, x2; at level-2 I have two independent variables z1, z2. All 5 variables are continuous.

If I ignore the fact that the data were created by stacking 8 cross-sectional data sets from the same country at different time points, this is a fairly straightforward 2-level model that can be estimated using any multilevel package such as HLM, MLwiN, or aML. The Mplus code probably looks like this:
-----------------------------------------
...
VARIABLE: NAMES ARE y z1 z2 x1 x2 cohort;
CLUSTER IS cohort;
ANALYSIS: TYPE=TWOLEVEL;
MODEL:
%BETWEEN%
y ON z1 z2 x1 x2;
%WITHIN%
y ON z1 z2 x1 x2;
----------------------------------------

But now I want to take this fact into account by including a first-order autoregressive residual term at the second level. How do I incorporate the code you presented above into this specific problem? Specifically, what are the "y"s in your code: are they the "x"s or the "z"s in my question? Thanks a lot!

---------------------
MODEL:

f1 BY y1@1; y1@0;
f2 BY y2@1; y2@0;
f3 BY y3@1; y3@0;

f3 ON f2 (1);
f2 ON f1 (1);
---------------------
 Linda K. Muthen posted on Friday, June 20, 2003 - 8:57 am
For clarification, when you say you have stacked your data, it appears that you are doing a cross-sectional design where you have 8 clusters and a total sample size of 8,000. Is this true?
 Shige Song posted on Friday, June 20, 2003 - 11:53 am
That's right. I have 8 cross-sectional data sets (each with a sample size of 1000), which I pooled into one data set, so the total sample size is now 8000.
 bmuthen posted on Friday, June 20, 2003 - 12:05 pm
Let me jump in and ask some more questions. So, you have only 8 clusters then? This seems very small for 2-level analysis. Also, I don't understand how you can have level 2 autocorrelation if level 2 is cohort and the cohorts are 8 independent samples. But perhaps that is explained in the Soc Meth article you referred to? Typically, it is assumed that the units at the highest level, here cohorts, are independently observed. In single-level models, however, I am aware of the use of auto-correlated errors and how that changes the likelihood. I haven't seen that in 2-level models. Mplus handles auto-correlated observations in a multivariate approach, but I don't see immediately how that plays in here.
 Shige Song posted on Friday, June 20, 2003 - 12:35 pm
Hi Linda,

8 cohorts is just an example (a bad one, apparently) to simplify the question. In my real data, I have data from 30 countries, with 20-50 cohorts within each country (for now I want to put the cross-country comparison aside and focus on one country).

The reason for a cohort-level autoregressive error, as described in the paper I cited, is that in each country the data were compiled by combining many cross-sectional data sets collected at different time points, and they are not completely independent samples in the sense that they are samples of the same population at different times!

As you mentioned, level-1 autoregressive errors can be handled in other multilevel packages (HLM, MLwiN, aML). But they cannot handle autoregressive errors at the aggregate level; it was my hope that the flexibility of Mplus in handling variance/covariance structures could push one step further. Even if it is not feasible in the current version, maybe it's something to think about for the next version?

Thanks!

Shige
 bmuthen posted on Sunday, June 22, 2003 - 3:25 pm
I think I have to read the original Soc Meth article to be able to understand what you want to do and to be able to say how/if Mplus can be helpful here. One source of my confusion is your statement that your samples are not independent because they are samples from the same population at different times. I don't see how you get non-independent samples this way, unless the population is very small. Unfortunately, I don't have time right now to study the original article.
 Larry Kurdek posted on Monday, April 12, 2004 - 11:48 am
I am trying to replicate in Mplus 3 the findings from a growth-curve (intercept and slope) multilevel model that I have estimated in both HLM and the multilevel module in LISREL. At this point, I am just interested in estimating intercepts and slopes with no covariates. In HLM terms, the model is a 3-level model with time (8 annual assessments with some missing values) nested in partner (only 2 per couple), which, in turn, is nested in couple. Fixing the variance of the slope at the partner level to 0, I can get the following syntax to run just fine in Mplus (using Example 9.12 as a model):
USEVARIABLES ARE
sat1 sat2 sat3 sat4 sat5 sat6 sat7 sat8;
ANALYSIS: TYPE = TWOLEVEL MISSING;
MITERATIONS=5000;
MODEL:
%WITHIN%
iw sw | sat1@-7 sat2@-5 sat3@-3 sat4@-1 sat5@1
sat6@3 sat7@5 sat8@7;
sat1-sat8 (1);
sw@0;
%BETWEEN%
ib sb | sat1@-7 sat2@-5 sat3@-3 sat4@-1 sat5@1
sat6@3 sat7@5 sat8@7;
sat1-sat8@0;

My problem is that I want to explore different error covariance structures at "level 1" involving sat1-sat8. I have tried simply adding WITH statements after the last model entry without success. (For example, just adding "sat1 WITH sat2" to the above syntax to get the correlated error for sat1 and sat2 yields the following error message:

THE ESTIMATED BETWEEN COVARIANCE MATRIX IS NOT POSITIVE DEFINITE AS IT
SHOULD BE. COMPUTATION COULD NOT BE COMPLETED.
PROBLEM INVOLVING VARIABLE SAT8.
THE CORRELATION BETWEEN SAT2 AND SAT1 IS 0.996
THE CORRELATION BETWEEN SAT3 AND SAT2 IS 0.994
THE CORRELATION BETWEEN SAT4 AND SAT3 IS 0.991
THE RESIDUAL CORRELATION BETWEEN SAT2 AND SAT1 IS -3.150


THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE
COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.)

Is using the WITH statements the proper way to model the level 1 error covariance matrix? Do I need to change statements for the WITHIN and BETWEEN portions?
 Larry Kurdek posted on Monday, April 12, 2004 - 12:05 pm
I am interested in using intercepts and slopes from 3-level (time, partner, couple) growth curves from MULTIPLE variables to predict a categorical (dichotomous) outcome. More specifically, I want to see whether average values and linear change over 8 annual assessments for satisfaction, commitment, and investment (6 predictors) contribute unique information regarding eventual divorce (no, yes). Usually, I run this kind of analysis in 2 rather tedious steps. In the first step, I run the growth curves for each outcome in a multilevel program and then output the estimates so I can create a new data set from them. Then I merge the files from the separate outcomes and run a logistic (or multinomial) regression. Can I run this problem in one step in Mplus 3? Example 6.13 (2 parallel processes) comes close to what I am looking for, but it is not set in a multilevel context.
 bmuthen posted on Tuesday, April 13, 2004 - 10:26 am
Larry - here is an answer to your first question above about the correlated residuals on level 1. Using a WITH statement is correct. I wonder if you are incorrectly doing this on the Between level instead of the Within level (the Within level is level 1). The error message concerns the between-level covariance matrix and says that you have a problem with residual covariances - you should not have any such covariances on the Between level since you have zero residual variances there.
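A minimal sketch of that placement, reusing the MODEL command from the question: the WITH statement sits under %WITHIN%, and the between part keeps its zero residual variances with no residual covariances. The pair sat1 and sat2 simply follows the example in the question; other pairs would be added the same way.

```mplus
MODEL:
%WITHIN%
iw sw | sat1@-7 sat2@-5 sat3@-3 sat4@-1 sat5@1
sat6@3 sat7@5 sat8@7;
sat1-sat8 (1);
sw@0;
sat1 WITH sat2;   ! level-1 residual covariance goes here
%BETWEEN%
ib sb | sat1@-7 sat2@-5 sat3@-3 sat4@-1 sat5@1
sat6@3 sat7@5 sat8@7;
sat1-sat8@0;      ! zero residual variances, so no WITH statements here
```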
 Larry Kurdek posted on Tuesday, April 13, 2004 - 11:51 am
Yes, that is exactly the error I made. It also seems to be the case that if I model the error covariance matrix at level 1, I need to set the variances for BOTH the within intercept and the within slope to 0. Is that correct? (I assume so, because the results I got were an exact match to the results I got from a parallel run in LISREL).
 bmuthen posted on Tuesday, April 13, 2004 - 12:15 pm
Regarding your second question, about predicting the dichotomous outcome: yes, you can do this in a single analysis step in Mplus Version 3. Version 3 allows two-level analysis of growth with a dichotomous distal outcome where you can have random effects that vary across the between-level units.
 bmuthen posted on Tuesday, April 13, 2004 - 12:19 pm
Regarding your question about having to set the variances to 0 for both the within intercept and slope - no, you can still estimate these variances and you would typically want them free.
 Larry Kurdek posted on Tuesday, April 13, 2004 - 4:52 pm
Does the Version 3 manual have an example of a two-level analysis of growth with a dichotomous distal outcome where you can have random effects that vary across the between-level units?
I can run the growth curve analyses for the multiple outcomes in one analysis, but can't see any examples of how to link the intercepts and slopes from these analyses to a separate and common categorical outcome.
 bmuthen posted on Tuesday, April 13, 2004 - 6:08 pm
Not an explicit example, but you get it if you piece together examples from different chapters. We couldn't fit all the combinations... Here's the idea of how you do it.

You specify your iw, sw and ib, sb growth factors on within and between. And you have a distal categorical, say a binary u, and you have within-level covariates x and between-level covariates w. So regarding u, you say on within

u on x;

or

u on iw sw w;

and on between you say

u on ib sb w;

where on between, u is the random intercept (a continuous latent variable) in the logistic regression of u on x etc.
 bmuthen posted on Tuesday, April 13, 2004 - 6:08 pm
I meant to say

u on iw sw x;
 Larry Kurdek posted on Wednesday, April 14, 2004 - 2:10 pm
Thank you for your suggestions. I tried running the following syntax to predict a dichotomous outcome (sep) by an intercept and slope. I assumed I needed to declare 'sep' as a categorical variable:

TITLE: Trial run
DATA: FILE IS C:\DATA\MPLUS\BARRIERS\gl252.txt;
FORMAT IS 24F3/16F3,5F2,F4;
VARIABLE: NAMES ARE
sat1 sat2 sat3 sat4 sat5 sat6 sat7 sat8
alt1 alt2 alt3 alt4 alt5 alt6 alt7 alt8
inv1 inv2 inv3 inv4 inv5 inv6 inv7 inv8
bar1 bar2 bar3 bar4 bar5 bar6 bar7 bar8
cmt1 cmt2 cmt3 cmt4 cmt5 cmt6 cmt7 cmt8
sep wth drp lng fu cpl;
CATEGORICAL=sep;
WITHIN=;
BETWEEN=;
CLUSTER=cpl;
MISSING ARE ALL (-9);
USEVARIABLES ARE
sat1-sat8 sep;
ANALYSIS: TYPE = TWOLEVEL MISSING;
MODEL:
%WITHIN%
iw sw | sat1@-7 sat2@-5 sat3@-3 sat4@-1 sat5@1
sat6@3 sat7@5 sat8@7;
sat1-sat8 (1);
sep ON iw sw;
%BETWEEN%
ib sb | sat1@-7 sat2@-5 sat3@-3 sat4@-1 sat5@1
sat6@3 sat7@5 sat8@7;
sat1-sat8@0;
sep ON ib sb;


and received the following fatal error message:

*** FATAL ERROR
THERE IS NOT ENOUGH MEMORY SPACE TO RUN THE PROGRAM ON THE CURRENT
INPUT FILE. YOU CAN TRY TO FREE UP SOME MEMORY BY CLOSING OTHER
APPLICATIONS THAT ARE CURRENTLY RUNNING. ANOTHER SUGGESTION IS
CLEANING UP YOUR HARD DRIVE BY DELETING UNNECESSARY FILES.

I have ample memory and hard drive space, so the error does not make sense to me. Thanks!
 bmuthen posted on Thursday, April 15, 2004 - 11:37 am
When you do this type of analysis, numerical integration is required for maximum-likelihood estimation. You will see this in the screen output if you request Tech8 and in the printed output (for successful runs). Your example uses 4 or 5 dimensions of integration which is very high, using many integration points with the default settings. The memory requirement and computational time go up essentially as a function of the product of integration points and sample size. So even having a lot of RAM won't always be sufficient and even if it were, the run could take an extremely long time. All this is described in the User's Guide section on numerical integration. To get your analysis going, you can reduce the number of integration points per dimension to say 10 or 7, or you can simplify the model at least as a first step. You want to build up your model in small steps, starting with simple parts of it. For example, you can first do the model without the distal outcome. Perhaps the sb variance is ignorable, which would reduce the integration dimensionality by 1. Building up the model like this also gives you good starting values for the final run, which then goes faster. Let me know how it goes and don't hesitate to send input, output, and data to support@statmodel.com.
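As a sketch of the first suggestion, the number of integration points per dimension can be set with the INTEGRATION option of the ANALYSIS command, and TECH8 can be requested in the OUTPUT command. The value 7 is one of the two values mentioned above; the rest of the input is assumed unchanged from the posted run.

```mplus
ANALYSIS: TYPE = TWOLEVEL MISSING;
INTEGRATION = 7;   ! integration points per dimension
OUTPUT: TECH8;     ! optimization history, also shown on screen
```

Assuming a full product grid, q points per dimension over d dimensions give q^d grid points in total. With 4 dimensions that is 7^4 = 2,401 points at 7 per dimension and 10^4 = 10,000 at 10 per dimension; a fifth dimension multiplies each count by q again, which is why dropping a dimension (for example, by fixing an ignorable variance at zero) helps so much.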
 bmuthen posted on Saturday, April 17, 2004 - 11:36 am
Larry - can you please send me your input and data for the run above where you ran out of memory so I can check where this happens?
 Zoogah posted on Sunday, April 25, 2004 - 11:55 am
Are there student rates for Mplus? I am new to Mplus and would want to get it because of its ability to model categorical variables.
 zoogah posted on Sunday, April 25, 2004 - 11:59 am
bmuthen,
I have a model that has continuous indicators on a categorical latent variable (1). The latter in turn relates to another latent variable (2) that is continuous and has continuous indicators. Together, the model involves LPA and LCA. Can I analyze the model all at once? I believe I need to do the analysis separately (LPA and LCA). Can you help me? Thanks
 bmuthen posted on Sunday, April 25, 2004 - 12:05 pm
This analysis can be done in a single step in Mplus Version 3. For information about student pricing, see the top of the Mplus home page. There is also a free demo version, as described on the website.
 Anonymous posted on Wednesday, February 23, 2005 - 8:46 am
Dear Dr. Muthen,

I'm trying to fit the following growth curve model
with binary outcomes:

DATA:
FILE IS "C:\Data\data123.txt";

VARIABLE:
NAMES = SCHOOL x1 x2 x3 y1 y2 y3;
USEV = SCHOOL x1 x2 x3 y1 y2 y3;
CATEGORICAL ARE y1 y2 y3;
MISSING ARE ALL (999);
WITHIN = x1 x2 x3;
CLUSTER = SCHOOL;

ANALYSIS:
TYPE = TWOLEVEL MISSING H1 RANDOM;

MODEL:
%WITHIN%
iw sw | y1@0 y2@1 y3@2;
iw sw ON x1 x2 x3;
%BETWEEN%
ib sb | y1@0 y2@1 y3@2;
y1-y3@0;
OUTPUT: TECH1 SAMPSTAT CINTERVAL;

Here's the error message I get:
*** FATAL ERROR
THERE IS NOT ENOUGH MEMORY SPACE TO RUN THE PROGRAM ON THE CURRENT
INPUT FILE. YOU CAN TRY TO FREE UP SOME MEMORY BY CLOSING OTHER
APPLICATIONS THAT ARE CURRENTLY RUNNING. ANOTHER SUGGESTION IS
CLEANING UP YOUR HARD DRIVE BY DELETING UNNECESSARY FILES.

Could you point out why this happens even though I have enough memory
and disk space? Thanks.
 Thuy Nguyen posted on Wednesday, February 23, 2005 - 11:29 am
This model requires numerical integration which can be computationally heavy. There is a section in Chapter 13 of the Mplus User's Guide that discusses numerical integration and suggestions for using numerical integration. If you still have problems, please send your input and data to support@statmodel.com.
 bmuthen posted on Sunday, February 27, 2005 - 11:24 am
In the current version of Mplus you will be informed about the number of dimensions of integration. I think it is 4 in your case, which leads to heavy computations and can cause memory shortage with large sample sizes. You can try integration = montecarlo instead to reduce the computations.
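A sketch of that change, assuming the rest of the input stays as posted: the INTEGRATION option goes in the ANALYSIS command.

```mplus
ANALYSIS:
TYPE = TWOLEVEL MISSING H1 RANDOM;   ! as posted
INTEGRATION = MONTECARLO;            ! Monte Carlo integration instead of the default
```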