

using the multivariate and the multilevel approach, i get the same parameter values but a different likelihood. the data consist of 89 persons, measured 18 times. there are some person-level covariates as well. the model i want is akin to the multilevel approach, but the data are actually censored from above on the later measurements. since mplus (version 3) does not allow the multilevel approach for this, i was trying to work out how to formulate the multilevel model under the multivariate approach. below is the code for the two approaches and the loglikelihood.

multilevel (long data format, resp = dep var)

VARIABLE: NAMES ARE id sex leeft type1 zitbal incont resp tijd;
  USEVARIABLE = id leeft zitbal incont resp tijd typ2 typ3;
  missing are all(99);
  CLUSTER = id;
  WITHIN = tijd;
  BETWEEN = leeft zitbal incont typ2 typ3;
define: typ2=0; typ3=0;
  if type1==2 then typ2=1;
  if type1==3 then typ3=1;
ANALYSIS: TYPE = TWOLEVEL RANDOM; type=missing; estimator=ml;
MODEL:
  %WITHIN%
  resp ON tijd;
  %BETWEEN%
  resp on leeft typ2 typ3 zitbal incont;

************************

multivariate (wide data format, bar* dep vars) loglik = 4422.653

VARIABLE: NAMES ARE sex leeft type1 zitbal incont bar1-bar10 bar12 bar14 bar16 bar18 bar20 bar26 bar38 bar52;
  missing are all(99);
  usevariables are leeft zitbal incont bar1-bar10 bar12 bar14 bar16 bar18 bar20 bar26 bar38 bar52 typ2 typ3;
  !censored are resp1-resp18 (a);
define: typ2=0; typ3=0;
  if type1==2 then typ2=1;
  if type1==3 then typ3=1;
ANALYSIS: type=missing; estimator=ml;
MODEL: i s q | bar1@.1 bar2@.2 bar3@.3 bar4@.4 bar5@.5 bar6@.6 bar7@.7 bar8@.8 bar9@.9 bar10@1.0 bar12@1.2 bar14@1.4 bar16@1.6 bar18@1.8 bar20@2.0 bar26@2.6 bar38@3.8 bar52@5.2;
  !fixed effects for time and time**2
  s@0; q@0;
  i on typ2 typ3 zitbal incont leeft;
  bar1-bar52 (1);


for a growth model with censored dependent variables, is there a way to define the censoring limits yourself? i have a set of 18 measurements censored from above. at the first measurement occasions, the censoring limit is not reached. however, in the model statement, all variables should be of the same type. the consequence is that mplus sets the censoring limit for these variables at the maximal observed value. this is not correct for the first measurements. 


Wide versus long will have different loglikelihood metrics. You cannot declare the censoring limit but you can change it by transforming the variable using the DEFINE command. 


dear Bengt, thx for the reply, but could you be a bit more detailed? in what respect do the metrics differ? censoring: how could i set the censoring limit the same for all variables (at 20), where this limit is reached for some but not all variables? thanks 


I am not sure what more to say. The loglikelihoods are on different scales because the number of variables differs. You can treat some variables as censored and some as not. This is allowed in Mplus. 


that puzzles me. the models are identical, with identical parameter estimates. a random intercept model (multilevel) boils down to a compound symmetry model (multivariate), with identical likelihoods. i am sure i am missing something. if only i knew what...

censored data: this (partial) code gives the following error message

VARIABLE: NAMES ARE sex leeft type1 zitbal incont bar1-bar10 bar12 bar14 bar16 bar18 bar20 bar26 bar38 bar52;
  missing are all(99);
  usevariables are leeft zitbal incont bar1-bar10 bar12 bar14 bar16 bar18 bar20 bar26 bar38 bar52 typ2 typ3;
  censored are bar4-bar10 bar12 bar14 bar16 bar18 bar20 bar26 bar38 bar52 (a);
ANALYSIS: type=missing; estimator=ml;
  INTEGRATION = GAUSSHERMITE (25);
MODEL: i s q | bar1@.1 bar2@.2 bar3@.3 bar4@.4 bar5@.5 bar6@.6 bar7@.7 bar8@.8 bar9@.9 bar10@1.0 bar12@1.2 bar14@1.4 bar16@1.6 bar18@1.8 bar20@2.0 bar26@2.6 bar38@3.8 bar52@5.2;
  !fixed effects for time and time**2
  s@0; q@0;
  i on typ2 typ3 zitbal incont leeft;
  bar1-bar52 (1);

** Observed outcomes in a growth process must be measured on the same scale. **

i do not get this message when i declare all variables to be censored. but then the first three (bar1-bar3) get censoring limits at values lower than the true one because the latter is not reached. 


The loglikelihood is different with the long versus wide format because with the long format you have one variable to represent the repeated measures and with the wide format you have several variables to represent the repeated measures. You can specify the same model and get the same parameter estimates but not the same loglikelihood. With a growth model, all variables must be measured on the same scale. So all variables must be included on the CENSORED list. 


You can also trick Mplus into setting the censoring limits by adding one observation at the censoring limit with weight 0 (and weight 1 for all other observations). 
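The data preparation behind this trick can be sketched outside Mplus. The following is only an illustration under stated assumptions: the function name `add_limit_row`, the toy values, and the limit of 20 are made up for the example, and the step that passes the weight column to Mplus is not shown in the thread.

```python
import numpy as np

def add_limit_row(data, limit):
    """Append one dummy row sitting at the censoring limit, plus a weight vector.

    data: 2-D array, one row per person, one column per repeated measure.
    Returns (augmented_data, weights); the dummy row gets weight 0.
    """
    data = np.asarray(data, dtype=float)
    limit_row = np.full((1, data.shape[1]), float(limit))
    augmented = np.vstack([data, limit_row])
    weights = np.ones(augmented.shape[0])
    weights[-1] = 0.0  # the dummy row should not contribute to the likelihood
    return augmented, weights

# Toy wide data: the limit of 20 is never reached at the first occasion (column 0).
y = np.array([[12.0, 18.0, 20.0],
              [ 9.0, 20.0, 20.0],
              [15.0, 17.0, 19.0]])

aug, w = add_limit_row(y, limit=20)
max_before = y.max(axis=0)    # largest observed value per occasion, original data
max_after = aug.max(axis=0)   # largest observed value per occasion, with dummy row
```

Since the limit for a variable censored from above is taken from its largest observed value, the dummy row moves that maximum to 20 for every occasion, while its zero weight is meant to keep it out of the estimation.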


thanks for the censoring trick. wrt the first issue: if 2 models are the same, shouldn't they have the same likelihood? i can only think of the situation of logistic regression, where the likelihoods of the grouped and ungrouped case are proportional. in our case, the likelihoods must be at least proportional as well. otherwise comparing nested models will give different LR test values under each of the formats. 


You are right that for LR difference testing nested models should give the same p value across the univariate long and the multivariate wide approaches. As an example of equivalent long and wide model versions, you may want to check the log likelihood and the relationship between the User's Guide ex 9.16 and 6.10 with the restrictions indicated. 


here is what i did; the results are unsatisfactory. 1. I ran ex 9.16 on the ex 6.10 data that i reformatted (since mplus 3.13 does not yet have the wide-to-long format). the model has 11 parameters and a loglik of 3075.853. this result is also obtained when using spss mixed. the model without an effect of 'a3' has a loglik of 3165.564 (again, the same is obtained with spss mixed). i then ran the 2 equivalent models with the wide format (syntax see below). for both models, parameter estimates are almost identical to the previous ones. the logliks are 7261.105 and 4566.702 respectively. it is clear that the LR statistic has a substantially different value for the wide and the long format. 


**** with effect of a3 *****

DATA: FILE IS ex6.10.dat;
VARIABLE: NAMES ARE y11-y14 x1 x2 a31-a34;
ANALYSIS: ESTIMATOR = ML;
MODEL: i s | y11@0 y12@1 y13@2 y14@3;
  i s ON x1 x2;
  y11 ON a31 (2);
  y12 ON a32 (2);
  y13 ON a33 (2);
  y14 ON a34 (2);
  y11-y14 (1);

**** without effect of a3 *****

DATA: FILE IS ex6.10.dat;
VARIABLE: NAMES ARE y11-y14 x1 x2 a31-a34;
  usevariable= y11-y14 x1 x2;
ANALYSIS: ESTIMATOR = ML;
MODEL: i s | y11@0 y12@1 y13@2 y14@3;
  i s ON x1 x2;
  y11 ON a31 (2);
  y12 ON a32 (2);
  y13 ON a33 (2);
  y14 ON a34 (2);
  y11-y14 (1);


Please send the two Mplus wide inputs and outputs, the two Mplus long inputs and outputs, the wide data, the long data, and your license number to support@statmodel.com so we can see the full picture. 


The problem is that when you did the wide analysis without the regression on the time-varying covariate, you removed the four time-varying covariates from the analysis. You should have instead kept the variables and fixed the regression coefficients to zero. If you do that, then the two loglikelihood differences are the same. I will send you the new output. 


Perhaps the issue is that you have covariates in the model. With the long, 2-level approach I think the likelihood is considered for y conditional on the x's (the x part does not impose any restrictions so this is ok), whereas with the wide multivariate version, the x's are part of the likelihood (although the x part is still unrestricted). Try this conjecture by doing the wide approach with Type = random so it would also condition on the x's. 


thanks. it is indeed a different loglik under the two approaches:

long: log(y|x)
wide: log(y,x) = log(y|x) + log(x)

empty models, without predictors, have indeed the same loglik for the wide and long format. it explains as well why, in the wide format, a model with a covariate whose effect is put to zero results in a different likelihood than the model without the covariate. type=random does the job 
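The decomposition log(y,x) = log(y|x) + log(x) can be checked numerically outside Mplus. The sketch below is not taken from the thread: it uses simulated data with a single normal outcome and a single normal covariate (all names and values are assumptions for illustration) and compares maximized loglikelihoods in closed form.

```python
import numpy as np

def normal_loglik(ml_variance, n):
    """Maximized loglikelihood of n normal observations with ML (residual) variance."""
    return -0.5 * n * (np.log(2 * np.pi * ml_variance) + 1.0)

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

# Conditional likelihood log f(y|x): regression of y on x.
slope, intercept = np.polyfit(x, y, 1)
ll_y_given_x = normal_loglik(np.mean((y - intercept - slope * x) ** 2), n)

# Marginal pieces: x alone, and y alone (effect of x fixed at zero).
ll_x = normal_loglik(np.var(x), n)
ll_y = normal_loglik(np.var(y), n)

# Joint likelihood log f(y,x): unrestricted bivariate normal.
S = np.cov(np.vstack([y, x]), bias=True)
ll_joint = -0.5 * n * (2 * np.log(2 * np.pi) + np.log(np.linalg.det(S)) + 2.0)

# Likelihood-ratio statistics for the effect of x on y.
lr_conditional = 2 * (ll_y_given_x - ll_y)          # conditional on x
lr_joint_x_kept = 2 * (ll_joint - (ll_y + ll_x))    # joint, x kept, effect fixed at 0
lr_joint_x_dropped = 2 * (ll_joint - ll_y)          # joint, x removed from the model
```

In this toy setting `ll_joint` equals `ll_y_given_x + ll_x`, and the LR statistic matches the conditional one only when x stays in the restricted model. Dropping x shifts the statistic by `2 * ll_x`, which is the kind of discrepancy reported for the wide runs earlier in the thread.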


Hello, I am wondering whether it is still true that Mplus uses joint ML, log(y,x) = log(y|x) + log(x), for growth modeling with wide data and conditional ML, log(y|x), when data are in long format? I am using version 7. If I am missing documentation regarding estimation techniques under different growth modeling situations, I would appreciate a reference to this information. Thanks very much! 


Mplus now uses [y | x] for all analyses. This was introduced in Version 6.1 and is described in Version History under the heading Analysis Conditional on Covariates. 


Thank you. Does this also mean that y is conditioned on the random effects in the likelihood function, so that the FIML estimation procedure would be identical to that used in other multilevel software, and different from the estimation method used in other SEM software? 


Multilevel software doesn't condition on the random effects, but integrates them out to get [y | x]. In Rasch modeling, there is a conditional ML estimator (which gives rise to the opposite term, marginal ML). 
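To illustrate what integrating out the random effects means, here is a minimal sketch for a random intercept model with one person and three repeated measures. It is not Mplus code, and the parameter values and function names are assumptions chosen for the example. It integrates the random intercept out by Gauss-Hermite quadrature and compares the result with the compound-symmetry multivariate normal loglikelihood mentioned earlier in the thread.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def loglik_integrated(y, mu, tau2, sigma2, n_nodes=30):
    """log of the integral over u of prod_j N(y_j | mu + u, sigma2) * N(u | 0, tau2)."""
    t, w = hermgauss(n_nodes)              # nodes and weights for weight exp(-t^2)
    u = np.sqrt(2.0 * tau2) * t            # change of variable so that u ~ N(0, tau2)
    resid = y[:, None] - mu - u[None, :]   # shape (T, n_nodes)
    log_f = -0.5 * np.sum(np.log(2 * np.pi * sigma2) + resid**2 / sigma2, axis=0)
    return np.log(np.sum(w * np.exp(log_f)) / np.sqrt(np.pi))

def loglik_compound_symmetry(y, mu, tau2, sigma2):
    """Multivariate normal loglikelihood with covariance sigma2 * I + tau2 * J."""
    T = y.size
    cov = sigma2 * np.eye(T) + tau2 * np.ones((T, T))
    r = y - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (T * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(cov, r))

y = np.array([1.2, 0.7, 1.9])   # toy repeated measures for one person
ll_int = loglik_integrated(y, mu=1.0, tau2=0.5, sigma2=2.0)
ll_cs = loglik_compound_symmetry(y, mu=1.0, tau2=0.5, sigma2=2.0)
```

The two values agree up to quadrature error, which is the sense in which the multilevel (random intercept) and multivariate (compound symmetry) formulations describe the same marginal distribution of y when there are no covariates.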

xiaoyu bi posted on Thursday, January 02, 2014  9:03 am



Dear Dr. Muthen, Happy New Year! I have a question about the wide versus long format approach. I have 5 waves of data: 424 people in the first wave, and among them 231 completed all 5 waves. I want to use all 424 people's information in my analyses. Should I use the long or the wide format approach given there are so many missing values? Also, are there any differences between the wide format approach and the long format approach using multilevel analysis if I use all 424 people? Thank you so much! 


The missing value issue is the same whether the data are in long or wide format. Wide format has more flexibility because more parameters are available, for example, residual variances in a growth model do not need to be held equal as they are in the long format. For the same model, data, and estimator, you will obtain the same results with the long and wide formats. 

xiaoyu bi posted on Friday, January 03, 2014  11:55 am



Hi, Linda, Thank you so much for your response. One more question: for longitudinal modeling in wide format, we can use the multilevel modeling approach (between- and within-cluster variation) or the multivariate approach (latent variable growth modeling where the cluster members are the repeated measures over time). Are there any differences between the two approaches using the same wide format data? If the data have a lot of missing values, is the multilevel approach better than the multivariate approach? Thank you! 


See the Topic 8 course handout and video on the website. These issues are thoroughly described there. 

Emily Blood posted on Friday, August 01, 2014  9:09 am



I get a warning that a variable specified as an x in BETWEEN and a y in WITHIN will be treated as a Y in both. I don't want this to happen because I wanted to regress a distal outcome (BDI) on the latent intercept and slope. Does the warning above indicate that I'm not able to do this? Output seems like y was treated as x in BETWEEN and y in WITHIN. Syntax is below:

CLUSTER=userid;
WITHIN = days prompted;
BETWEEN = BDI;
ANALYSIS: type=twolevel random;
MODEL:
  %within%
  s | y on days;
  y on prompted;
  %between%
  BDI on y s;


You can use y on the right-hand side of ON as you have done. Because it is a dependent variable on within, distributional assumptions will be made about it. 


Dear Linda and Bengt, I estimate latent class trajectories using both the multilevel and the multivariate approach. I find equal estimates (after applying robust SEs in the multivariate model as well, using MLR). When I want to use the MIXTURE algorithm, I find very different results. The multivariate approach splits the data almost equally while the multilevel approach splits 98% to 2%. Why is that, and is there a setting with which I might reproduce the multivariate results with the multilevel approach?

* Multilevel: ***

TITLE:
DATA: FILE IS ex8.1.dat;
DATA WIDETOLONG:
  WIDE = y1-y4;
  LONG = y;
  IDVARIABLE = person;
  REPETITION = time;
VARIABLE: NAMES ARE y1-y4 x;
  USEVARIABLES ARE y time person;
  CLASSES = c (2);
  Cluster = person;
  Within = time;
ANALYSIS: TYPE = TWOLEVEL MIXTURE;
  STARTS = 20 2;
MODEL:
  %Within%
  %OVERALL%
  y on time;
  %Between%
  %OVERALL%
  y;
OUTPUT: TECH1 TECH8;

* Multivariate: ***

TITLE:
DATA: FILE IS ex8.1.dat;
VARIABLE: NAMES ARE y1-y4 x;
  USEVARIABLES ARE y1-y4;
  CLASSES = c (2);
ANALYSIS: TYPE = MIXTURE;
  STARTS = 20 2;
MODEL:
  %OVERALL%
  i s | y1@0 y2@1 y3@2 y4@3;
  y1-y4 (A);
  i with s@0;
  s@0;
OUTPUT: TECH1 TECH8;


I came a step closer to the solution: when I explicitly mention the different classes, the estimated class sizes and parameters are much more similar. Nevertheless, the models are not identical yet and I don't see why.

* Multilevel ***

MODEL:
  %Within%
  %OVERALL%
  y on time;
  %c#2%
  y on time;
  y; ! Frees the residual variance of y to be independently estimated in each class;
  %Between%
  %OVERALL%
  y;

* Multivariate: ***

MODEL:
  %OVERALL%
  i s | y1@0 y2@1 y3@2 y4@3;
  i;
  y1-y4 (A);
  s@0;
  %c#2%
  i s | y1@0 y2@1 y3@2 y4@3;
  i;
  y1-y4 (B);
  s@0;


In the multivariate approach you have a mean of s even though you fix the variance at zero. You can get that mean of s in the two-level approach by changing the within statement to s | y on time; and then fixing s@0 on Between to estimate only the s mean. 


Thank you for the advice, Bengt! I have now specified the model as follows and run it with TYPE = TWOLEVEL MIXTURE RANDOM:

  %Within%
  %OVERALL%
  s | y on time;
  %c#2%
  s | y on time;
  y;
  %Between%
  %OVERALL%
  y;
  s@0;

This leads, however, to the next problem:

*** ERROR in MODEL command Random effect variables can only be declared in the OVERALL model.

If I comment out "s | y on time;" in the 2nd class, the model runs fine but the parameters are held equal across classes. 


Instead of saying

  %c#2%
  s | y on time;

you want to say

  %c#2%
  [s];

or

  %c#2%
  [s];
  s;


Dear Bengt, thanks for the advice, but if I ask for "%c#2% s;" I receive an error saying that a between-level variable is not allowed to vary between classes. I decided to set the variances of s to 0. When I run these models without mixture specs, I get very similar findings.

* Multivariate:
  i s | y1@0 y2@1 y3@2 y4@3;
  s@0;
  y1-y4 (A);

* Twolevel random:
  %Within%
  s | y on time;
  y;
  %Between%
  s@0;

If I run these models as a mixture with 2 classes (without class-specific specs) I get different class sizes and different estimates:
              multilevel   multivariate
#class 1
  mean I         2.650         1.472
  mean S         0.179         0.208
  var I          2.995         1.928
  var S          0.000         0.000
  res I          0.580         0.649
#class 2
  mean I         1.162         2.291
  mean S         1.299         1.118
  var I          2.995         1.928
  var S          0.000         0.000
  res I          0.580         0.649
mean c1          0.100         0.241
In both models, variances and error terms are set to be equal over classes. But why do the classes differ so much from each other? 


Check that the 2 mixture runs have the same number of parameters. If this doesn't help, send those 2 mixture outputs to Support along with your license number. 
