
Chaoyang Li posted on Wednesday, January 17, 2001  4:07 pm



We have a data set of about 2,200 subjects who were surveyed on cigarette use longitudinally for more than 10 years across 65 schools. First we want to look at the growth trend at the individual level, and then we want to examine differences in the slopes and intercepts across schools. A three-level growth model might be used for these purposes. Referring to the examples in the Mplus manual and the workshop handout, we wrote the following program:
  ...
  VARIABLE: NAMES ARE ID SCHID GROUP WNC6 WNC8 WNC9 WNC10 WNC11 WNC18 WNC21;
  MISSING is .;
  USEVAR = GROUP SCHID WNC6-WNC21;
  CLUSTER=SCHID;
  ANALYSIS: TYPE = TWOLEVEL;
  ITERATIONS = 1200;
  ESTIMATOR = MLM;
  MODEL:
  %BETWEEN%
  levelb BY WNC6-WNC21@1;
  trendb BY WNC6@0 WNC7@1 WNC8@2 WNC9@3 WNC10@4 WNC11@5.5 wnc18@7 wnc19@8.5 WNC21@10;
  [WNC6-WNC11@0 WNC18@0 WNC19@0 WNC21@0];
  [levelb trendb];
  levelb ON GROUP;
  trendb ON group;
  %within%
  levelw BY WNC6-WNC11@1 wnc18@1 wnc19@1 wnc21@1;
  trendw BY WNC6@0 WNC7@1 WNC8@2 WNC9@3 WNC10@4 WNC11@5.5 wnc18@7 wnc19@8.5 WNC21@10;
  levelw ON GROUP;
  trendw ON group;
  OUTPUT: SAMPSTAT STANDARDIZED;
However, we got the following error message:
  THE SAMPLE COVARIANCE MATRIX FOR THE VARIABLES IN THE MODEL CANNOT BE INVERTED. THIS CAN OCCUR IF A VARIABLE HAS NO VARIATION OR IF TWO VARIABLES ARE PERFECTLY CORRELATED. CHECK YOUR DATA.
  *** FATAL ERROR
Could you please explain why the error occurred and how to modify the program? Thank you for your help.


This appears to be a problem with your data. It is most likely due to little variability on the between level. If you send the input and data to support@statmodel.com, I can take a look at it. 

Anonymous posted on Wednesday, July 10, 2002  7:05 am



I have data on 850 children attending about 300 different schools (about 3 kids per class) with about six waves of data collection. Is this too few children to estimate a multilevel growth curve? Also, if the children change schools during the course of the study, will this impact my ability to estimate a multilevel model? Thank you. 

Anonymous posted on Thursday, July 11, 2002  9:14 am



It is possible to estimate a multilevel growth curve even with 2 subjects per cluster. To take into account children changing schools, you have to set up a multiple membership model. This is not very easy to do, but here is how it goes. First, form new clusters that are clusters of schools, such that children can move from one school to another only within the same cluster. Then set up dummy variables for school membership for each student. Finally, specify the model as between-level intercept: y on dummy, and between-level slope: y on dummy x time.

Anonymous posted on Thursday, March 25, 2004  7:14 am



We have student achievement data over a five-year period on tests in math, reading and writing. The students are nested in classes within schools within boards. We would like to investigate school improvement over the five years. The constraints in these data are:
  1. Different tests were taken in each year, although equated from year to year.
  2. Different students took the different tests; for example, grade 3 students in year 1 are different from grade 3 students in year 2 in the same school.
  3. Minimal information on schools, e.g. average income.
Is it possible to fit a cross-sectional longitudinal model to examine school improvement? Thanks

bmuthen posted on Thursday, March 25, 2004  4:19 pm



That's a big topic. You may be interested in looking at my UCLA colleague Yeow Meng Thum's work in this area: http://www.gseis.ucla.edu/faculty/thum/Papers/SMR1103.pdf. 

finnigan posted on Friday, August 29, 2008  9:35 am



Dear Linda/Bengt, I am using a multiple indicator growth model to model variability of individuals within schools, and the schools are in different regions, using 4 measurement occasions. In this case I take it that this is a three-level model: individuals in schools within regions. Are there any Mplus examples you are aware of that use a multiple indicator growth model at three levels? I plan to use individual times of observation to examine within- and between-person change. Does the introduction of time as a variable add a fourth level, i.e. individuals within schools, within regions, within time? Thanks


The first question is whether you have many regions, where many means at least, say, 20. If not, then treat region as a fixed mode, using dummy covariates. If yes, then use Type = Complex Twolevel; where Complex covers the region and Twolevel covers the schools (see UG). AT (individually-varying times of observation) should not add a level. You can use AT in the wide, multivariate approach to growth modeling that we prefer.

finnigan posted on Friday, August 29, 2008  10:06 am



Bengt, thanks for this. Just to recap: I would have 4 regions at most (east, west, midlands and south). Hence I have a two-level model, individuals within schools. Region would act as a covariate, and time of observation does not add any level. Thanks


That's right. 
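
The fixed-effect treatment of region confirmed above might be set up roughly as follows. This is only a minimal sketch under several assumptions: a single outcome y1-y4 stands in for the multiple-indicator measurement part, the school identifier is called school, and three region dummies (west, midl, south, with east as the reference category) are already coded in the data. All of these names are hypothetical and do not come from the poster's input.

```
VARIABLE:
  USEVARIABLES = y1-y4 west midl south;
  CLUSTER = school;               ! hypothetical school identifier
  BETWEEN = west midl south;      ! region dummies vary only across schools
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  iw sw | y1@0 y2@1 y3@2 y4@3;    ! individual-level growth factors
  %BETWEEN%
  ib sb | y1@0 y2@1 y3@2 y4@3;    ! school-level growth factors
  ib sb ON west midl south;       ! region as fixed dummy covariates
```

With only four regions, the dummy coefficients describe how school-level intercepts and slopes in each region differ from the reference region; no region-level variance is estimated.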


Dear Linda & Bengt, is it possible to measure the growth of individuals (nested in groups) on group-level outcomes? Do you have any references or Mplus examples on that? Many thanks in advance.


Yes, this is possible. We don't have any refs or examples, but you specify it just like you would in regular growth. 


I am running a multilevel (students within school) latent growth curve model looking at achievement scores over three time points. I have three covariates at the student level (race, ses, and gender all binary). I am computing the intercepts (starting point) and I'm wondering why only the intercept and slope at the between level are shown by default in the output. For computing the intercept (with race, ses, and gender set to 0), shouldn't the starting year be: intercept(between)+intercept(within)? Thanks. 


There is only one parameter for the mean of the intercept growth factor and it appears at the between level (you can think of this as the intercept growth factor having zero mean on level 1). Note that the intercept growth factor mean is the mean of the outcome at time 1. Just like there is only one outcome mean, there is only one intercept mean.


Thank you for the reply. A few follow up questions: 1. So I take it that the within level intercepts and slopes are usually not reported? What is the main purpose of the intercept(within) in the tech4 output? When I add the intercept(between) and the intercept(within) I get the overall mean and my intercept(within) is negative and not zero (it's only zero when I don't have any covariates in the model). 2. Also would you have a recommendation of any article that does a good job of reporting the output of analyses using MLGC? Many thanks. 


Slide 58 of our Topic 8 handout makes it clear what iw, sw, ib, and sb consist of in multilevel terms. You see there that with a covariate x, iw = beta*x + r, and this is why Tech4 shows a non-zero value for the mean of iw: it is simply beta*xbar. There is no intercept parameter for iw, which is the same as no mean parameter for iw when there are no covariates. I cannot think of articles offhand. Anybody else? I would think the Raudenbush-Bryk (2002) book has examples of this kind.
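
The point made in the reply above can be written out as a short derivation. This is a sketch in assumed notation (the handout slide itself is not reproduced here): i_ij is the intercept growth factor for student i in school j, x_ij is a within-level covariate with sample mean x-bar, alpha is the single intercept mean that appears on the between level, and r_ij and u_j are zero-mean residuals.

```latex
\begin{aligned}
i_{ij}  &= ib_j + iw_{ij}, \\
iw_{ij} &= \beta\, x_{ij} + r_{ij}, \qquad E(r_{ij}) = 0, \\
ib_j    &= \alpha + u_j,   \qquad\qquad E(u_j) = 0, \\[4pt]
E(iw_{ij}) &= \beta\, \bar{x}, \\
E(i_{ij})  &= \alpha + \beta\, \bar{x}.
\end{aligned}
```

Under these assumptions, adding the between-level and within-level TECH4 means reproduces the overall mean of the intercept factor, while alpha alone is the expected starting level when x = 0. Without covariates the within-level mean is zero because iw has no intercept parameter.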


Thank you! Will check out your handout. 


Wanted to also ask: how do you get the corresponding standard errors of the output of TECH4 (variances)? 


Generally, a TECH4 quantity is a function of several model parameter estimates, not a single model parameter estimate. This is the case for the variance of an endogenous variable, for example. To get the SE you would have to define a NEW parameter in Model Constraint and express the new parameter as a function of the model parameters using their labels. An approximate approach is to, say, drop the covariates so that you get the variance as a model parameter.
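
As an illustration of the NEW-parameter approach described above, here is a minimal sketch for the within-level variance of an intercept factor regressed on one covariate. The variable names (y1-y3, x) and the labels (b, resiw, varx, variw) are hypothetical. The sketch also assumes that the variance of x is mentioned in the model so that it becomes a labeled parameter, which brings x into the model and may not suit every analysis.

```
MODEL:
  %WITHIN%
  iw sw | y1@0 y2@1 y3@2;
  iw ON x (b);          ! regression slope of iw on x
  sw ON x;
  iw (resiw);           ! residual variance of iw
  x (varx);             ! variance of x, made a model parameter
  %BETWEEN%
  ib sb | y1@0 y2@1 y3@2;
MODEL CONSTRAINT:
  NEW(variw);
  variw = b**2*varx + resiw;   ! model-implied variance of iw
```

The estimate and standard error of variw are then reported with the other new/additional parameters, and the estimate can be compared against the corresponding TECH4 value as a check.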

Gabriela R posted on Tuesday, February 15, 2011  11:50 am



Hello, I hope you can give me some advice on the following: I have modeled a questionnaire at 4 timepoints, obtaining 4 factors, one at each time point. I then applied equal structure, equal loadings and equal thresholds constraints. The next step was to apply an LGM on the 4 factors. In order to reduce the number of variables in my model, I was thinking of fitting the intercept and slope straight on the factor scores of the 4 factors. The factor scores would be obtained from the invariance model, letting the 4 factors correlate. My question is: If I let the factors correlate in the model from which I save the factor scores, would this bias the LGM estimation? Thank you! Gabriela 


No, correlating the factors would be in line with the LGM because LGM implies a certain factor correlation. To use factor scores, however, you should have a sufficient number of high-loading items for the factor. It does help of course that you draw on information from all 4 time points. You can also do a 1-step ML analysis, although with categorical items that will involve 4 dimensions of numerical integration, which gives heavy computations. You can also use Bayesian analysis, which avoids the integration; see papers on our web site.


Hello, I am trying to run a three-level model that estimates whether neighborhood-level poverty (measured only at Wave 1) impacts individual youths' academic growth trajectories, here measured by WISC scores. When I run the following code, Mplus balks and says that 'povtycon' (the neighborhood-level covariate of interest) has no variation when in reality it does. Any ideas? Thanks! William
  USEVAR = subid nc wiscraw wiscraw2 wiscraw3 needsratio povtycon race_d2 race_d3 race_d4 race_d5 sex_r;
  CLUSTER = nc;
  Analysis: Type = TWOLEVEL;
  ESTIMATOR = MUML;
  MODEL:
  %WITHIN%
  iw sw | wiscraw@0 wiscraw2@1 wiscraw3*2 (1);
  iw sw ON needsratio race_d2 race_d3 race_d4 race_d5 sex_r;
  %BETWEEN%
  ib sb | wiscraw@0 wiscraw2@1 wiscraw3*2 (1);
  ib sb ON povtycon;
  OUTPUT: SAMPSTAT STANDARDIZED RESIDUAL;


Please send the output, data, and your license number to support@statmodel.com. 

Tao Yang posted on Tuesday, December 04, 2012  8:30 pm



Dear Dr. Muthen, I am running a two-level linear growth model with individually varying time scores. I would like to model the cross-level moderating effect of a between-level variable (w) on the within-level effects of the latent intercept and slope, respectively, on an outcome variable (z). My syntax is as below.
  VARIABLE: USEVAR = y1-y5 t1-t5 z w clustid;
  TSCORES = t1-t5;
  MISSING ARE ALL (9999);
  CLUSTER = clustid;
  BETWEEN = w;
  ANALYSIS: TYPE = TWOLEVEL RANDOM;
  ALGORITHM = INTEGRATION;
  INTEGRATION = MONTECARLO(5000);
  MODEL:
  %WITHIN%
  i s | y1-y5 AT t1-t5;
  si | z ON i;
  ss | z ON s;
  %BETWEEN%
  y1-y5@0;
  si ss ON w;
I got the error message "THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-ZERO DERIVATIVE OF THE OBSERVED-DATA LOGLIKELIHOOD..CHECK YOUR STARTING VALUES OR INCREASE THE NUMBER OF MITERATIONS". I increased the number of MITERATIONS to 2000 and got the same message. I then increased the Monte Carlo integration points to 10000 and MITERATIONS to 10000, and got this message: "THE ESTIMATED BETWEEN COVARIANCE MATRIX COULD NOT BE INVERTED..CHANGE YOUR MODEL AND/OR STARTING VALUES.THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION..." I am not sure what might be the cause(s) of the error and/or whether there were errors in the model specification. Thanks!


Please send the output and your license number to support@statmodel.com. 


Based on the 3-level LG model mentioned above, I am running a model in which child cohort membership moderates the effect of the between-level covariate neighborhood affluence ("AFF"). I also want to include a mediator ("MED"), but two issues come up. First, I do not get any fit indices. Second, I am not able to use MODEL INDIRECT to do an effect decomposition. Is it possible to test for mediated moderation within a 3-level latent growth model? If so, how would I need to change the syntax below?
  ANALYSIS: TYPE = TWOLEVEL RANDOM;
  MODEL:
  %WITHIN%
  iw sw | wrat@0 wrat2@1 wrat3@2;
  s9i | iw on cohort9;
  s12i | iw on cohort12;
  s9s | sw on cohort9;
  s12s | sw on cohort12;
  iw sw on ...;
  %BETWEEN%
  ib sb | wrat@0 wrat2@1 wrat3@2;
  s9i@0; s12i@0; s9s@0; s12s@0;
  wrat@0; wrat2@0; wrat3@0;
  ib sb s9i s12i s9s s12s on AFF MED;
  MED on AFF;
  MODEL INDIRECT:
  s9i IND MED AFF;
  s12i IND MED AFF;
  s9s IND MED AFF;
  s12s IND MED AFF;
  ib IND MED AFF;
  sb IND MED AFF;


Chi-square and related fit statistics are not available when means, variances, and covariances are not sufficient statistics for model estimation. This is the case with TYPE=RANDOM. MODEL INDIRECT is also not available. You can use MODEL CONSTRAINT to specify the indirect effects.
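
A between-level fragment along the lines suggested above might look like the sketch below. It is illustrative only: the labels (a, bib, bsb) and the new parameter names (indib, indsb) are hypothetical, only the indirect effects on ib and sb are shown, and the rest of the poster's model is assumed unchanged.

```
MODEL:
  %BETWEEN%
  MED ON AFF (a);         ! effect of AFF on the mediator
  ib ON MED (bib);        ! effect of the mediator on ib
  sb ON MED (bsb);        ! effect of the mediator on sb
  ib sb ON AFF;           ! direct effects
MODEL CONSTRAINT:
  NEW(indib indsb);
  indib = a*bib;          ! indirect effect AFF -> MED -> ib
  indsb = a*bsb;          ! indirect effect AFF -> MED -> sb
```

The same pattern would extend to s9i, s12i, s9s and s12s by labeling their regressions on MED and adding one product per outcome.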


I have a negative binomial model with time (4 timepoints) nested within students (N=18921) nested within schools (N=132). I am predicting intercepts and longitudinal slopes of alcohol consumption (a negative binomial outcome). I am having trouble getting these models to converge in wide format (they are taking several days, and often not converging). However, when I switch the data to long format, the models run much more quickly. I understand the differences between the models in long versus wide format. But is running these models in the long format still a valid way to analyze these data? Thanks in advance.


One more question regarding running the model in wide format. I want to use both individual-level covariates (sex, race, SES, etc.) and school-level covariates (public vs. private, school locale, etc.) to predict individual intercepts and slopes. I created within-level latent intercept, slope and slope squared terms (intcptw slopew squarew). However, I get an error message when I try to use these terms on the between level ("Within-level variables cannot be used on the between level."). I am worried, however, that by creating a second set of latent terms (intcptb slopeb squareb) on the between level I am actually predicting school-level intercepts and slopes and not individual-level intercepts and slopes. How can I be sure I am predicting individual-level and not school-level intercepts and slopes when using "between level" variables?
  Model:
  %Within%
  intcptw slopew squarew | dyam1@0 dyam2@1 dyam3@6 dyam4@13 ;
  intcptw slopew squarew sex race SES ;
  dyam1-dyam4 (1);
  %Between%
  intcptb slopeb squareb | dyam1@0 dyam2@1 dyam3@6 dyam4@13 ;
  intcptb slopeb squareb public locale1 locale2 ;
  dyam1-dyam4@0;


For your first question, you have to send the outputs for the same model done in wide and long format for us to see. For your second question, intcptb etc. on Between are the between-level parts of the growth factors, so it is correct to regress them on between-level covariates. So, between-level covariates predict the between-level part of the growth factors, which in turn predict the between-level part of the dyam outcomes, which therefore predict the observed dyam outcomes.


Using complex survey data, I am trying to run a three-level growth model where children are nested in 4 time points as well as neighborhoods. My data are in long format due to non-constant weights over time. I've written code based on example 9.16 where y2 = outcome, x1-x5 = time-invariant covariates, a1-a5 = time-varying covariates. When I attempt to run the model I get the following error:
  *** ERROR IN MODEL command
  Between-level variables cannot be used on the within level. Between-level variables used: Y2.
How do I need to structure my code to avoid this error? Also, how would I modify my code to set the covariance between the random intercept and slope to zero?
  TITLE: Random Intercept & Slope Model with Time-Invariant & Time Varying Covariates;
  DATA: FILE IS C:...\Mplus\math_long.csv;
  VARIABLE: NAMES= id clus strat iptw x1 y3 y4 y1 y2 x2-x5 a1-a5 time;
  USEVARIABLE= clus strat iptw x1 y2 x2-x5 a1 a2 a4 a5 time;
  MISSING= ALL (1234);
  CLUSTER= clus;
  STRATIFICATION= strat;
  WEIGHT= iptw;
  WITHIN= time y2 a1 a2 a4 a5;
  BETWEEN= x1-x5;
  ANALYSIS: TYPE= TWOLEVEL COMPLEX RANDOM;
  MODEL:
  %WITHIN%
  s | y2 ON time;
  y2 ON a1 a2 a4 a5;
  %BETWEEN%
  y2 s ON x1-x5;
  y2 WITH s;
  OUTPUT: SAMPSTAT TECH4 TECH8;


Note that UG ex 9.16 does not put y (your y2) on the Within list. Note also that this UG ex shows that you should not say y2 ON a1 a2 a3 a4; Instead, this statement should only have one variable "a", which like y2 repeats 4 times. On our website you will find UG ex9.16 input and data so you can see how the data are structured. 


Thank you for the fast response. I apologize for being unclear. Each of the 'a' variables represents a different time-varying covariate, e.g., SES advantage, residential stability, rather than the same variable at different time points. After I removed y (my y2) from the Within list, I still received the same error.


Please send your output, data, and license number to Support@statmodel.com. 


Hello, I keep on getting...
  THE MODEL ESTIMATION TERMINATED NORMALLY
  WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE...
  WITHIN = CumRiskTb;
  BETWEEN = ScholClim08 ScholClim09 ScholClim10;
  MISSING=ALL (999);
  IDVARIABLE IS ID;
  CLUSTER IS SchId;
  ANALYSIS: TYPE IS TWOLEVEL;
  MODEL:
  %WITHIN%
  iEMO sEMO | MAMS_emo_08@0 MAMS_emo_09@1 MAMS_emo_10@2;
  iEMO sEMO on CumRiskTb;
  %BETWEEN%
  iSCLIM sSCLIM | ScholClim08b@0 ScholClim09b@1 ScholClim10b@2;
  iEMOB sEMOB | MAMS_emo_08@0 MAMS_emo_09@1 MAMS_emo_10@2;
  iEMOB on iSCLIM;
  sEMOB on sSCLIM;
There are no negative variances. But some correlations are 999.00 (i.e. Iext & Sext). Can the warning be due to that? If so, is there anything I can do to fix it? Thank you for your time!


Please send the output and your license number to support@statmodel.com. 

Bep Uink posted on Thursday, October 15, 2015  12:33 am



Hello, I am trying to model the effect of stress on change in emotion across the day using the univariate (i.e. multilevel) format. I have centered time on the stressful event (which is a level 1 IV). So, t = 0 when the stressful event occurs; t = 1, t = 2 etc. are time points after the event and t = -1, t = -2 etc. are time points before the event. I am unsure how to interpret significant main effects of time. There is a significant negative relationship between Time of Event and emotion; can I say that as time moves toward values > 0 (i.e. post-stress), values of emotion decrease?


Yes. 


Hi Bengt, I have a question about the interpretation of results. I ran a multilevel (members in teams) latent growth model and included an individual-level time-varying covariate with a random slope that varies on both the within and between levels. So, it is similar to example 9.14 in the Mplus user's guide.
  1. How do I know whether a1-a4 has an impact on y1-y4?
  2. For "S" reported under variances at the within level, what does it mean if I get a positive estimate with a significant p-value? Does it mean that there is a between-person difference?
  3. For "S" reported under "means" at the between level, I get a negative estimate with a significant p-value. What does it mean?
  4. For "S" reported under "variance" at the between level, I get a positive estimate with a significant p-value. Does it mean that there is a between-team difference?
  5. If I add a team-level predictor to predict "S" at the between level, I get a positive estimate with a significant p-value. What does it mean?
I am a new user of Mplus, so please help answer these questions. Many thanks.


Answered on Mplus Support. 


Dear Muthen & Muthen, We conducted a randomized controlled trial (treatment and control groups assessed at 4 time points) and we are interested in finding out whether different profiles of those subjects (we previously conducted an LPA based on their personality and found a 4-class solution) change over time differently (in variables such as anger, shame, paranoia, ...), also considering whether they are in the treatment or in the control group. We tried GMM, but that did not solve our problem, because we do not want to classify people according to how they change over time. We want to see how different profiles change over time, considering also the treatment/control condition. Is there any way to enter those 4 different profiles as multiple dummy variables in a LGCM? Or is there a better way to do it? Thank you so much


Look at Latent Transition Analysis examples in the UG and papers on this topic (under Papers) on our website. 


Dear Mr. and Mrs. Muthén, We face a rather complex longitudinal data structure from which we would like to recover a developmental score scale (e.g., a latent variable growth score), and we would very much appreciate your advice. The data structure is as follows: intensive longitudinal data consisting of multiple digital assessments that students completed throughout the school year on a digital platform / formative assessment system. The data are imbalanced across students and the time intervals are unequal (e.g., 5 to 100 assessments per student per school year). Further, the students completed different assessments (different items, maybe some overlap), and the items are binary coded (correct/false). Our aim is to gain insight into students' development in subject domains, and we would like to extract a developmental score for further analyses, based on as few assumptions as possible about, e.g., functional form and dimensionality (maybe previous analyses on other data will show that we need to fit a multidimensional model with different dimensions for, e.g., mathematics …). Could you give us a hint about a potentially suitable latent variable (growth) modeling technique? Many thanks for your expertise in this regard! We highly appreciate it.


Regarding analysis of intensive longitudinal data, including unequal time intervals, you may want to study the article Hamaker, E.L., Asparouhov, T., Brose, A., Schmiedek, F. & Muthén, B. (2018). At the frontiers of modeling intensive longitudinal data: Dynamic structural equation models for the affective measurements from the COGITO study. Multivariate Behavioral Research, DOI: 10.1080/00273171.2018.1446819 (Online supporting material). See also our Short Course Topic 12 video and handout at http://www.statmodel.com/course_materials.shtml. Growth modeling of latent variable constructs is treated in our Short Course Topic 4. See especially the section on Multiple Indicator growth.

Rachel Dew posted on Wednesday, March 20, 2019  9:08 am



I have been working on a growth model of behavioral variables over four waves of data, using age as a time score. A reviewer has asked that I also use age as a control variable. Is that necessary and if not, how would I explain that? Rachel 


If the starting age varies to a substantively important degree, you should take this into account. A flexible and interesting way is to see the different starting ages as multiple cohorts, as in UG ex 6.18.
