Chaoyang Li posted on Wednesday, January 17, 2001 - 4:07 pm
We have a data set that contains about 2,200 subjects who were surveyed on cigarette use longitudinally for more than 10 years across 65 schools. First we are interested in looking at the growth trend at the individual level, and then in examining differences in the slopes and intercepts across schools. A three-level growth model might be used for these purposes. Referring to the examples in the Mplus manual and the workshop handout, we wrote the following program:
VARIABLE: NAMES ARE ID SCHID GROUP WNC6 WNC8 WNC9 WNC10 WNC11 WNC18 WNC21;
  MISSING IS .;
  USEVAR = GROUP SCHID WNC6-WNC21;
  CLUSTER = SCHID;
ANALYSIS: TYPE = TWOLEVEL;
  ITERATIONS = 1200;
  ESTIMATOR = MLM;
THE SAMPLE COVARIANCE MATRIX FOR THE VARIABLES IN THE MODEL CANNOT BE INVERTED. THIS CAN OCCUR IF A VARIABLE HAS NO VARIATION OR IF TWO VARIABLES ARE PERFECTLY CORRELATED. CHECK YOUR DATA. *** FATAL ERROR
Could you please explain why the error occurred and how to modify the program?
This appears to be a problem with your data. It is most likely due to little variability on the between level. If you send the input and data to email@example.com, I can take a look at it.
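Before sending the data, a rough descriptive check of between-level variability can be run outside Mplus. The following is a minimal Python sketch, not the computation Mplus performs: the function name `cluster_variability` and the toy values are hypothetical, and the variance of observed cluster means is only a crude stand-in for the estimated between-level variance.

```python
from collections import defaultdict
from statistics import mean, pvariance


def cluster_variability(rows):
    """Summarize between- and within-cluster variation for one variable.

    rows: iterable of (cluster_id, value); values given as None are skipped.
    Returns the variance of the cluster means, the average within-cluster
    variance, and the share of the two that lies between clusters.
    """
    by_cluster = defaultdict(list)
    for cluster_id, value in rows:
        if value is not None:
            by_cluster[cluster_id].append(value)

    cluster_means = [mean(values) for values in by_cluster.values()]
    between = pvariance(cluster_means)
    within = mean(pvariance(values) for values in by_cluster.values())
    total = between + within
    share = between / total if total > 0 else float("nan")
    return {"between": between, "within": within, "between_share": share}


# Hypothetical toy data: (SCHID, WNC6). Every school has the same mean,
# so there is no between-school variation at all.
toy = [(1, 0.0), (1, 2.0), (2, 1.0), (2, 1.0), (3, 0.0), (3, 2.0)]
result = cluster_variability(toy)
print(result)
```

A variable whose `between_share` is at or near zero is a candidate for the kind of problem described above.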
Anonymous posted on Wednesday, July 10, 2002 - 7:05 am
I have data on 850 children attending about 300 different schools (about 3 kids per class) with about six waves of data collection. Is this too few children to estimate a multilevel growth curve? Also, if the children change schools during the course of the study, will this impact my ability to estimate a multi-level model?
Anonymous posted on Thursday, July 11, 2002 - 9:14 am
It is possible to estimate a multilevel growth curve even with 2 subjects per cluster. To take into account children changing schools, you have to set up a multiple membership model. This is not very easy to do, but here is how it goes. First, form new clusters that are clusters of schools, so that children who move from one school to another stay within the same cluster. Then set up dummy variables for school membership for each student. Finally, specify the model as between-level intercept | y on dummy and between-level slope | y on dummy x time.
Anonymous posted on Thursday, March 25, 2004 - 7:14 am
We have student achievement data over a five-year period on tests in math, reading and writing. The students are nested in classes within schools within boards. We would like to investigate school improvement over the five years. The constraints in these data are:
1. Different tests were taken in each year, although they were equated from year to year.
2. Different students took the different tests; for example, grade 3 students in year 1 are different from grade 3 students in year 2 in the same school.
3. There is minimal information on schools, e.g. average income.
Is it possible to fit a cross-sectional longitudinal model to examine school improvement? Thanks
bmuthen posted on Thursday, March 25, 2004 - 4:19 pm
That's a big topic. You may be interested in looking at my UCLA colleague Yeow Meng Thum's work in this area:
finnigan posted on Friday, August 29, 2008 - 9:35 am
I am using a multiple indicator growth model to model the variability of individuals within schools, where the schools are in different regions, using 4 measurement occasions.
In this case I take it that this is a three-level model: individuals in schools within regions.
Are there any Mplus examples you are aware of that use a multiple indicator growth model at three levels?
I plan to use individual times of observation to examine within- and between-person change. Does the introduction of time as a variable add a fourth level, i.e. individuals within schools, within regions, within time?
I am running a multilevel (students within school) latent growth curve model looking at achievement scores over three time points. I have three covariates at the student level (race, ses, and gender-- all binary).
I am computing the intercepts (starting point) and I'm wondering why only the intercept and slope at the between level are shown by default in the output. For computing the intercept (with race, ses, and gender set to 0), shouldn't the starting year be:
There is only one parameter for the mean of the intercept growth factor and it appears at the between level (you can think of this as the intercept growth factor having zero mean on level 1). Note that the intercept growth factor mean is the mean of the outcome at time 1. Just like there is only one outcome mean, there is only one intercept mean.
1. So I take it that the within level intercepts and slopes are usually not reported? What is the main purpose of the intercept(within) in the tech4 output? When I add the intercept(between) and the intercept(within) I get the overall mean and my intercept(within) is negative and not zero (it's only zero when I don't have any covariates in the model).
2. Also- would you have a recommendation of any article that does a good job of reporting the output of analyses using MLGC?
Slide 58 of our Topic 8 handout makes it clear what iw, sw, ib, and sb consist of in multilevel terms.
You see there that with a covariate x,
iw = beta*x+r
and this is why Tech4 shows a non-zero value for the mean of iw - it is simply beta*x-bar. There is no intercept parameter for iw, which is the same as no mean parameter for iw when there are no covariates.
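The arithmetic behind this can be checked with a small Python sketch. The numbers are hypothetical (a binary covariate, an assumed beta of -0.5, and residuals chosen to average exactly zero); the point is only that the mean of iw equals beta times the mean of x, with no separate intercept parameter involved.

```python
import math
from statistics import mean

# Hypothetical values, not taken from any output in this thread.
beta = -0.5                     # within-level slope of iw on x
x = [0, 1, 1, 0]                # binary covariate, x-bar = 0.5
r = [0.2, -0.2, 0.1, -0.1]      # residuals with mean zero

# iw = beta*x + r, as on the handout slide
iw = [beta * xi + ri for xi, ri in zip(x, r)]

mean_iw = mean(iw)              # what a TECH4-style mean of iw reflects
implied = beta * mean(x)        # beta * x-bar

print(mean_iw, implied)
```

With a negative beta and a covariate that is not centered, the implied mean of iw is negative, which matches what the poster saw in TECH4.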
I cannot think of articles off hand - anybody else? I would think the Raudenbush-Bryk (2002) book has examples of this kind.
Generally, a TECH4 quantity is a function of several model parameter estimates, not a single model parameter estimate. This is the case for the variance of an endogenous variable, for example. To get the SE you would have to define a NEW parameter in Model Constraint and express the new parameter as a function of the model parameters using their labels. An approximate approach is, say, to drop the covariates so that you get the variance as a model parameter.
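To make "a function of several model parameter estimates" concrete, here is a minimal Python sketch of a delta-method standard error, the usual way an SE is obtained for a function of parameters. Everything in it is an assumption for illustration: the variance of iw is written as beta^2 * var(x) + var(r), var(x) is treated as an estimated parameter, and the estimates and their diagonal covariance matrix are invented.

```python
import math


def new_parameter_variance(beta, var_x, var_r):
    """Variance of iw implied by iw = beta*x + r (x and r uncorrelated)."""
    return beta ** 2 * var_x + var_r


def delta_method_se(beta, var_x, var_r, cov):
    """First-order (delta-method) SE of beta^2 * var_x + var_r.

    cov is the 3x3 covariance matrix of the estimates, ordered as
    (beta, var_x, var_r).
    """
    # Partial derivatives of the function with respect to each parameter.
    grad = [2 * beta * var_x, beta ** 2, 1.0]
    variance = sum(
        grad[i] * cov[i][j] * grad[j] for i in range(3) for j in range(3)
    )
    return math.sqrt(variance)


# Hypothetical estimates and sampling covariance matrix.
estimates = (0.5, 0.25, 1.0)
cov = [
    [0.01, 0.0, 0.0],
    [0.0, 0.0004, 0.0],
    [0.0, 0.0, 0.0025],
]

var_iw = new_parameter_variance(*estimates)
se_var_iw = delta_method_se(*estimates, cov)
print(var_iw, se_var_iw)
```

In Mplus the same function would be written in terms of parameter labels under MODEL CONSTRAINT, and the program would supply the covariance matrix of the estimates itself.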
Gabriela R posted on Tuesday, February 15, 2011 - 11:50 am
Hello, I hope you can give me some advice on the following: I have modeled a questionnaire at 4 time-points, obtaining 4 factors, one at each time point. I then applied equal structure, equal loadings and equal thresholds constraints. The next step was to apply an LGM on the 4 factors. In order to reduce the number of variables in my model, I was thinking of fitting the intercept and slope straight on the factor scores of the 4 factors. The factor scores would be obtained from the invariance model, letting the 4 factors correlate.
My question is: If I let the factors correlate in the model from which I save the factor scores, would this bias the LGM estimation?
No, correlating the factors would be in line with the LGM because LGM implies a certain factor correlation.
To use factor scores, however, you should have a sufficient number of high-loading items for the factor. It does help of course that you draw on information from all 4 time points.
You can also do a 1-step ML analysis, although with categorical items that will involve 4 dimensions of numerical integration which gives heavy computations. You can also use Bayesian analysis which avoids the integration; see papers on our web site.
I am trying to run a three-level model that estimates whether neighborhood-level poverty (measured only at Wave 1) impacts individual youths' academic growth trajectories, here measured by WISC scores.
When I run the following code Mplus balks and says that 'povtycon' (the neighborhood-level covariate of interest) has no variation when in reality it does.
Tao Yang posted on Tuesday, December 04, 2012 - 8:30 pm
Dear Dr. Muthen, I am running a two-level linear growth model with individually varying time scores. I would like to model the cross-level moderating effect of a between-level variable (w) on the within-level effects of latent intercept and slope on an outcome variable (z) respectively. My syntax is as below.
VARIABLE: USEVAR = y1-y5 t1-t5 z w clustid;
  TSCORES = t1-t5;
  MISSING ARE ALL (-9999);
  CLUSTER = clustid;
  BETWEEN = w;
MODEL:
  %WITHIN%
  i s | y1-y5 AT t1-t5;
  si | z ON i;
  ss | z ON s;
  %BETWEEN%
  y1-y5@0;
  si ss ON w;
I got the error message "THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-ZERO DERIVATIVE OF THE OBSERVED-DATA LOGLIKELIHOOD..CHECK YOUR STARTING VALUES OR INCREASE THE NUMBER OF MITERATIONS".
I increased the number of MITERATIONS to 2000 and got the same message. I then increased the Monte Carlo integration points to 10000 and MITERATIONS to 10000, and got this message:
"THE ESTIMATED BETWEEN COVARIANCE MATRIX COULD NOT BE INVERTED..CHANGE YOUR MODEL AND/OR STARTING VALUES.THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION..."
I am not sure what might be the cause(s) of the error and/or whether there were errors in the model specification.
Based on the 3-level LG model mentioned above, I am running a model in which child cohort membership moderates the effect of the between-level covariate neighborhood affluence ("AFF"). I also want to include a mediator ("MED"), but two issues come up. First, I do not get any fit indices. Second, I am not able to use MODEL INDIRECT to do an effect decomposition. Is it possible to test for mediated moderation within a 3-level latent growth model? If so, how would I need to change the syntax below?
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:
  %WITHIN%
  iw sw | wrat@0 wrat2@1 wrat3@2;
  s9i | iw on cohort9;
  s12i | iw on cohort12;
  s9s | sw on cohort9;
  s12s | sw on cohort12;
  iw sw on ...;
I have a negative binomial model with time (4 timepoints) nested within students (N=18921) nested within schools (N=132). I am predicting intercepts and longitudinal slopes of alcohol consumption (a negative binomial outcome). I am having trouble getting these models to converge in wide format (they are taking several days, and often not converging). However, when I switch the data to long format, the models run much more quickly. I understand the differences between the models in long versus wide format. But is running these models in the long format still a valid way to analyze these data? Thanks in advance.
One more question regarding running the model in wide format. I want to use both individual-level covariates (sex, race, SES, etc.) and school-level covariates (public vs. private, school locale, etc.) to predict individual intercepts and slopes. I created within-level latent intercept, slope and slope squared terms (intcptw slopew squarew). However, I get an error message when I try to use these terms on the between level ("Within-level variables cannot be used on the between level."). I am worried, however, that by creating a second set of latent terms (intcptb slopeb squareb) on the between level I am actually predicting school-level intercepts and slopes and not individual-level intercepts and slopes. How can I be sure I am predicting individual-level and not school-level intercepts and slopes when using "between level" variables?
Model:
  %Within%
  intcptw slopew squarew | dyam1@0 dyam2@1 dyam3@6 dyam4@13;
  intcptw slopew squarew sex race SES;
  dyam1-dyam4 (1);
  %Between%
  intcptb slopeb squareb | dyam1@0 dyam2@1 dyam3@6 dyam4@13;
  intcptb slopeb squareb public locale1 locale2;
  dyam1-dyam4@0;
For your first question you have to send the outputs for the same model done in wide and long for us to see.
For your second question, intcptb etc on Between are the between-level parts of the growth factors, so it is correct to regress them on between-level covariates. So, between-level covariates predict the between-level part of the growth factors which in turn predict the between-level part of the dyam outcomes, which therefore predict the observed dyam outcomes.
Using complex survey data, I am trying to run a three-level growth model where children are nested in 4 time points as well as neighborhoods. My data are in long format due to non-constant weights over time. I've written code based on example 9.16 where y2= outcome, x1-x5= time-invariant covariates, a1-a5= time-varying covariates. When I attempt to run the model I get the following error:
*** ERROR IN MODEL command Between-level variables cannot be used on the within level. Between-level variables used: Y2.
How do I need to structure my code to avoid this error? Also, how would I modify my code to set the covariance between the random intercept and slope to zero?
TITLE: Random Intercept & Slope Model with Time-Invariant & Time Varying Covariates;
DATA: FILE IS C:...\Mplus\math_long.csv;
VARIABLE: NAMES = id clus strat iptw x1 y3 y4 y1 y2 x2-x5 a1-a5 time;
  USEVARIABLE = clus strat iptw x1 y2 x2-x5 a1 a2 a4 a5 time;
  MISSING = ALL (-1234);
  CLUSTER = clus;
  STRATIFICATION = strat;
  WEIGHT = iptw;
  WITHIN = time y2 a1 a2 a4 a5;
  BETWEEN = x1-x5;
ANALYSIS: TYPE = TWOLEVEL COMPLEX RANDOM;
MODEL:
  %WITHIN%
  s | y2 ON time;
  y2 ON a1 a2 a4 a5;
  %BETWEEN%
  y2 s ON x1-x5;
  y2 WITH s;
OUTPUT: SAMPSTAT TECH4 TECH8;
Thank you for the fast response. I apologize for being unclear. Each of the 'a' variables represents a different time-varying covariate, e.g., SES advantage, residential stability, rather than the same variable at different time points. After I removed y (my y2) from the Within list, I still received the same error.
Bep Uink posted on Thursday, October 15, 2015 - 12:33 am
Hello, I am trying to model the effect of stress on change in emotion across the day using the univariate (i.e. multilevel) format. I have centered time on the stressful event (which is a level 1 IV). So t = 0 when the stressful event occurs; t = 1, t = 2 etc. are time points after the event, and t = -1, t = -2 etc. are time points before the event. I am unsure how to interpret significant main effects of time. There is a significant negative relationship between Time of Event and emotion; can I say that as time moves toward values > 0 (i.e. post-stress), values of emotion decrease?
I have a question about the interpretation of results.
I ran a multilevel (members in teams) latent growth model and included an individual-level time-varying covariate with a random slope that varies on both the within and between levels. So it is similar to Example 9.14 in the Mplus User's Guide.
1. How do I know whether a1-a4 have an impact on y1-y4?
2. For "S" reported under variances at the within level, what does it mean if I get a positive estimate with a significant p-value? Does it mean that there is a between-person difference?
3. For "S" reported under means at the between level, I get a negative estimate with a significant p-value. What does it mean?
4. For "S" reported under variances at the between level, I get a positive estimate with a significant p-value. Does it mean that there is a between-team difference?
5. If I add a team-level predictor to predict "S" at the between level and get a positive estimate with a significant p-value, what does it mean?
I am a new user of Mplus, so please help answer these questions. Many thanks.
We conducted a randomized controlled trial (treatment and control groups assessed at 4 time points), and we are interested in finding out whether different profiles of those subjects (we previously conducted an LPA based on their personality and found a 4-class solution) change differently over time (in variables such as anger, shame, paranoia, ...), also considering whether they are in the treatment or the control group.
We tried GMM, but that did not solve our problem, because we do not want to classify people according to how they change over time. We want to see how the different profiles change over time, also considering the treatment/control condition.
Is there any way to enter those 4 different profiles as multiple dummy variables in an LGCM?
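For readers wondering what "multiple dummy variables" would look like here, the following is a minimal Python sketch of conventional reference coding, prepared before the data are read into Mplus. It is an illustration of the question, not a recommendation from this thread: the function names are hypothetical, profile 1 is arbitrarily taken as the reference, and the profile-by-treatment products are one assumed way to let the treatment effect differ by profile.

```python
def profile_dummies(profile, reference=1, n_profiles=4):
    """Reference-coded dummies for a profile label in 1..n_profiles.

    Returns n_profiles - 1 indicators; the reference profile is all zeros.
    """
    if not 1 <= profile <= n_profiles:
        raise ValueError(f"profile must be in 1..{n_profiles}")
    return [
        1 if profile == k else 0
        for k in range(1, n_profiles + 1)
        if k != reference
    ]


def design_row(profile, treated):
    """Covariates for one subject: dummies, treatment, and their products."""
    dummies = profile_dummies(profile)
    interactions = [d * treated for d in dummies]
    return dummies + [treated] + interactions


# Column order: d2 d3 d4 treat d2*treat d3*treat d4*treat
row_a = design_row(3, 1)   # profile 3, treatment group
row_b = design_row(1, 0)   # reference profile, control group
print(row_a, row_b)
```

The growth factors would then be regressed on these columns. Note that treating the most likely LPA class as an observed variable ignores classification uncertainty.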
We face the problem of a rather complex longitudinal data structure based on which we would like to recover a developmental score scale (e.g., latent variable growth score) and would very much appreciate your advice.
The data structure is as follows:
- Intensive longitudinal data consisting of multiple digital assessments that the students completed throughout the school year on a digital platform / formative assessment system. We face imbalanced data per student and unequal time intervals (e.g. 5 to 100 assessments per student per school year). Further, the students completed different assessments (different items, maybe some overlap), and the items are binary coded (correct/false).
Our aim is to gain insight into students' development in subject domains, and we would like to extract a developmental score for further analyses (based on as few assumptions as possible about e.g. functional form, dimensionality etc.; previous analyses on other data may show that we need to fit a multidimensional model with different dimensions for e.g. mathematics …).
Could you give us a hint on a potentially suitable latent variable (growth) modelling technique?
Many thanks for your expertise in this regard! We highly appreciate it
Regarding analysis of intensive longitudinal data, including unequal time intervals, you may want to study the article
Hamaker, E.L., Asparouhov, T., Brose, A., Schmiedek, F. & Muthén, B. (2018). At the frontiers of modeling intensive longitudinal data: Dynamic structural equation models for the affective measurements from the COGITO study. Multivariate Behavioral Research, DOI: 10.1080/00273171.2018.1446819 (Online supporting material).
See also our Short Course Topic 12 video and handout at
Growth modeling of latent variable constructs is treated in our Short Course Topic 4. See especially the section on Multiple Indicator growth.
Rachel Dew posted on Wednesday, March 20, 2019 - 9:08 am
I have been working on a growth model of behavioral variables over four waves of data, using age as a time score. A reviewer has asked that I also use age as a control variable. Is that necessary and if not, how would I explain that?
If the starting age varies to a substantively important degree, you should take this into account. A flexible and interesting way is to treat the different starting ages as multiple cohorts, as in UG ex 6.18.