

Using the multivariate and the multilevel approach, I get the same parameter values but a different likelihood. The data consist of 89 persons, each measured 18 times, plus some person-level covariates. The model I want is akin to the multilevel approach, but the data are censored from above on the later measurements. Since Mplus (version 3) does not allow the multilevel approach for such data, I was trying to formulate the multilevel model under the multivariate approach. Below is the code for the two approaches, with the loglikelihood.

Multilevel (long data format, resp = dependent variable)

VARIABLE: NAMES ARE id sex leeft type1 zitbal incont resp tijd;
  USEVARIABLE = id leeft zitbal incont resp tijd typ2 typ3;
  missing are all(99);
  CLUSTER = id;
  WITHIN = tijd;
  BETWEEN = leeft zitbal incont typ2 typ3;
define: typ2=0; typ3=0;
  if type1==2 then typ2=1;
  if type1==3 then typ3=1;
ANALYSIS: TYPE = TWOLEVEL RANDOM; type=missing; estimator=ml;
MODEL:
  %WITHIN%
  resp ON tijd;
  %BETWEEN%
  resp on leeft typ2 typ3 zitbal incont;

************************

Multivariate (wide data format, bar* dependent variables), loglik = 4422.653

VARIABLE: NAMES ARE sex leeft type1 zitbal incont
    bar1-bar10 bar12 bar14 bar16 bar18 bar20 bar26 bar38 bar52;
  missing are all(99);
  usevariables are leeft zitbal incont
    bar1-bar10 bar12 bar14 bar16 bar18 bar20 bar26 bar38 bar52
    typ2 typ3;
  !censored are resp1-resp18 (a);
define: typ2=0; typ3=0;
  if type1==2 then typ2=1;
  if type1==3 then typ3=1;
ANALYSIS: type=missing; estimator=ml;
MODEL: i s q | bar1@.1 bar2@.2 bar3@.3 bar4@.4 bar5@.5 bar6@.6
    bar7@.7 bar8@.8 bar9@.9 bar10@1.0 bar12@1.2 bar14@1.4
    bar16@1.6 bar18@1.8 bar20@2.0 bar26@2.6 bar38@3.8 bar52@5.2;
  !fixed effects for time and time**2
  s@0; q@0;
  i on typ2 typ3 zitbal incont leeft;
  bar1-bar52 (1);


For a growth model with censored dependent variables, is there a way to define the censoring limits yourself? I have a set of 18 measurements censored from above. At the first measurement occasions the censoring limit is not reached. However, in the model statement all variables must be of the same type, so Mplus sets the censoring limit for these variables at the maximum observed value. This is not correct for the first measurements.


The wide and long formats will have different loglikelihood metrics. You cannot declare the censoring limit, but you can change it by transforming the variable using the DEFINE command.


Dear Bengt, thanks for the reply, but could you be a bit more detailed? In what respect do the metrics differ? On censoring: how could I set the censoring limit to be the same for all variables (at 20), when this limit is reached for some but not all variables? Thanks.


I am not sure what more to say. The loglikelihoods are on different scales because the number of variables differs. You can treat some variables as censored and some as not. This is allowed in Mplus.


That puzzles me. The models are identical, with identical parameter estimates. A random intercept model (multilevel) boils down to a compound symmetry model (multivariate), with identical likelihoods. I am sure I am missing something. If only I knew what...

Censored data: this (partial) code gives the following error message.

VARIABLE: NAMES ARE sex leeft type1 zitbal incont
    bar1-bar10 bar12 bar14 bar16 bar18 bar20 bar26 bar38 bar52;
  missing are all(99);
  usevariables are leeft zitbal incont
    bar1-bar10 bar12 bar14 bar16 bar18 bar20 bar26 bar38 bar52
    typ2 typ3;
  censored are bar4-bar10 bar12 bar14 bar16 bar18 bar20
    bar26 bar38 bar52 (a);
ANALYSIS: type=missing; estimator=ml;
  INTEGRATION = GAUSSHERMITE (25);
MODEL: i s q | bar1@.1 bar2@.2 bar3@.3 bar4@.4 bar5@.5 bar6@.6
    bar7@.7 bar8@.8 bar9@.9 bar10@1.0 bar12@1.2 bar14@1.4
    bar16@1.6 bar18@1.8 bar20@2.0 bar26@2.6 bar38@3.8 bar52@5.2;
  !fixed effects for time and time**2
  s@0; q@0;
  i on typ2 typ3 zitbal incont leeft;
  bar1-bar52 (1);

** Observed outcomes in a growth process must be measured on the same scale. **

I do not get this message when I declare all variables to be censored. But then the first three (bar1-bar3) get censoring limits at values lower than the true one, because the true limit is not reached at those occasions.
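The equivalence the poster appeals to can be written out explicitly. This is only a sketch: the symbols $u_i$, $e_{it}$, $\sigma_u^2$, $\sigma_e^2$ and $T$ are notation introduced here for illustration and do not appear in the thread.

```latex
\begin{align}
y_{it} &= \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + u_i + e_{it},
  \qquad u_i \sim N(0,\sigma_u^2), \quad e_{it} \sim N(0,\sigma_e^2), \\
\operatorname{Cov}(\mathbf{y}_i \mid \mathbf{x}_i)
  &= \sigma_u^2\,\mathbf{1}_T\mathbf{1}_T^{\top} + \sigma_e^2\,\mathbf{I}_T .
\end{align}
```

Every diagonal element is $\sigma_u^2 + \sigma_e^2$ and every off-diagonal element is $\sigma_u^2$, which is the compound symmetry structure. In the wide input, $\sigma_u^2$ corresponds to the variance of the intercept factor i (with s@0 and q@0 removing the other random effects), and $\sigma_e^2$ to the common residual variance imposed by the label (1).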


The loglikelihood is different with the long versus wide format because with the long format you have one variable to represent the repeated measures and with the wide format you have several variables to represent the repeated measures. You can specify the same model and get the same parameter estimates but not the same loglikelihood. With a growth model, all variables must be measured on the same scale. So all variables must be included on the CENSORED list.


You can also trick Mplus into setting the censoring limits by adding one observation at the censoring limit with weight 0 (and weight 1 for all other observations).
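A minimal sketch of how this trick might look for the wide data in this thread. It assumes a censoring limit of 20 (the value mentioned earlier in the thread), a hypothetical data file barwide_aug.dat holding the original rows plus one appended dummy row in which every bar variable equals 20, and a hypothetical weight column w that is 0 for the dummy row and 1 for all other rows. Only the VARIABLE command changes. Whether a zero weight is accepted may depend on the Mplus version.

```mplus
DATA: FILE IS barwide_aug.dat;  ! hypothetical: original rows + 1 dummy row
VARIABLE: NAMES ARE sex leeft type1 zitbal incont
    bar1-bar10 bar12 bar14 bar16 bar18 bar20 bar26 bar38 bar52
    w;                          ! hypothetical weight column
  MISSING ARE ALL (99);
  USEVARIABLES ARE leeft zitbal incont
    bar1-bar10 bar12 bar14 bar16 bar18 bar20 bar26 bar38 bar52
    typ2 typ3;
  ! all outcomes on the CENSORED list, censored from above
  CENSORED ARE bar1-bar10 bar12 bar14 bar16 bar18 bar20
    bar26 bar38 bar52 (a);
  ! the dummy row has w = 0, so it should not contribute to the estimates
  WEIGHT IS w;
! DEFINE, ANALYSIS and MODEL stay as in the poster's input
```

The idea is that the dummy row pushes the maximum observed value of bar1-bar3 up to 20, so Mplus picks 20 as the censoring limit for every occasion, while the zero weight keeps the row out of the estimation.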


Thanks for the censoring trick. On the first issue: if two models are the same, shouldn't they have the same likelihood? The only comparable situation I can think of is logistic regression, where the likelihoods of the grouped and ungrouped cases are proportional. In our case the likelihoods must be at least proportional as well. Otherwise, comparing nested models would give different LR test values under each of the formats.


You are right that for LR difference testing nested models should give the same p value across the univariate long and the multivariate wide approaches. As an example of equivalent long and wide model versions, you may want to check the log likelihood and the relationship between the User's Guide ex 9.16 and 6.10 with the restrictions indicated. 


Here is what I did; the results are unsatisfactory. I ran ex 9.16 on the ex 6.10 data, which I reformatted myself (since Mplus 3.13 does not yet have the widetolong option). The model has 11 parameters and a loglik of 3075.853. This result is also obtained with SPSS MIXED. The model without an effect of 'a3' has a loglik of 3165.564 (again, the same as with SPSS MIXED). I then ran the two equivalent models in the wide format (syntax below). For both models the parameter estimates are almost identical to the previous ones. The logliks are 7261.105 and 4566.702, respectively. Clearly the LR statistic has a substantially different value for the wide and the long format.
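In later Mplus versions the manual reformatting step is not needed, because a DATA WIDETOLONG command converts the wide file on the fly. The following is a sketch in the spirit of User's Guide example 9.16, applied to the wide ex6.10.dat file; the details should be checked against the guide for your version. The names person and time are new variables created by the command, and y and a3 are the stacked long-format variables.

```mplus
DATA: FILE IS ex6.10.dat;
DATA WIDETOLONG:
  WIDE = y11-y14 | a31-a34;   ! wide outcome and time-varying covariate
  LONG = y | a3;              ! their long-format counterparts
  IDVARIABLE = person;        ! created: person identifier
  REPETITION = time;          ! created: occasion indicator
VARIABLE: NAMES ARE y11-y14 x1 x2 a31-a34;
  USEVARIABLES ARE x1 x2 y a3 person time;
  CLUSTER = person;
  WITHIN = time a3;
  BETWEEN = x1 x2;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
MODEL:
  %WITHIN%
  s | y ON time;              ! random slope for time
  y ON a3;
  %BETWEEN%
  y s ON x1 x2;
  y WITH s;
```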


**** with effect of a3 *****

DATA: FILE IS ex6.10.dat;
VARIABLE: NAMES ARE y11-y14 x1 x2 a31-a34;
ANALYSIS: ESTIMATOR = ML;
MODEL: i s | y11@0 y12@1 y13@2 y14@3;
  i s ON x1 x2;
  y11 ON a31 (2);
  y12 ON a32 (2);
  y13 ON a33 (2);
  y14 ON a34 (2);
  y11-y14 (1);

**** without effect of a3 *****

DATA: FILE IS ex6.10.dat;
VARIABLE: NAMES ARE y11-y14 x1 x2 a31-a34;
  usevariable = y11-y14 x1 x2;
ANALYSIS: ESTIMATOR = ML;
MODEL: i s | y11@0 y12@1 y13@2 y14@3;
  i s ON x1 x2;
  y11 ON a31 (2);
  y12 ON a32 (2);
  y13 ON a33 (2);
  y14 ON a34 (2);
  y11-y14 (1);


Please send the two Mplus wide inputs and outputs, the two Mplus long inputs and outputs, the wide data, the long data, and your license number to support@statmodel.com so we can see the full picture. 


The problem is that when you did the wide analysis without the regression on the time-varying covariate, you removed the four time-varying covariates from the analysis. You should have instead kept the variables and fixed the regression coefficients to zero. If you do that, then the two loglikelihood differences are the same. I will send you the new output.
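The fix described above might look like this for the "without effect of a3" run. It is a sketch that reuses the poster's wide input and changes only the four regressions: a31-a34 stay in the analysis, but their coefficients are fixed at zero instead of being estimated.

```mplus
DATA: FILE IS ex6.10.dat;
VARIABLE: NAMES ARE y11-y14 x1 x2 a31-a34;
  ! no USEVARIABLES restriction: a31-a34 remain in the analysis
ANALYSIS: ESTIMATOR = ML;
MODEL: i s | y11@0 y12@1 y13@2 y14@3;
  i s ON x1 x2;
  ! effect of the time-varying covariate fixed at zero
  y11 ON a31@0;
  y12 ON a32@0;
  y13 ON a33@0;
  y14 ON a34@0;
  y11-y14 (1);
```

The likelihood-ratio test then compares this run with the run in which the four coefficients are estimated but held equal by the label (2), a difference of one parameter.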


Perhaps the issue is that you have covariates in the model. With the long, two-level approach I think the likelihood is considered for y conditional on the x's (the x part does not impose any restrictions, so this is OK), whereas with the wide, multivariate version the x's are part of the likelihood (although the x part is still unrestricted). Test this conjecture by running the wide approach with TYPE = RANDOM, so that it also conditions on the x's.


Thanks. It is indeed a different loglik under the two approaches:

long: log(y|x)
wide: log(y,x) = log(y|x) + log(x)

Empty models, without predictors, do indeed have the same loglik in the wide and long formats. This also explains why, in the wide format, a model with a covariate whose effect is fixed at zero has a different likelihood than the model without the covariate. TYPE = RANDOM does the job.
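The same decomposition shows why likelihood-ratio tests agree across the two formats as long as the same covariates stay in the analysis. This is a sketch with notation introduced here: $\theta$ collects the parameters of the $y \mid x$ part, $\psi$ the unrestricted parameters of the $x$ part, and $M_0 \subset M_1$ are two nested models for $y \mid x$.

```latex
\begin{align}
\log L_{\text{wide}}(\theta,\psi)
  &= \sum_i \log f(\mathbf{y}_i \mid \mathbf{x}_i;\theta)
   + \sum_i \log f(\mathbf{x}_i;\psi)
   = \log L_{\text{long}}(\theta) + \log L_{x}(\psi), \\
2\bigl[\log L_{\text{wide}}(\hat\theta_1,\hat\psi)
     - \log L_{\text{wide}}(\hat\theta_0,\hat\psi)\bigr]
  &= 2\bigl[\log L_{\text{long}}(\hat\theta_1)
     - \log L_{\text{long}}(\hat\theta_0)\bigr].
\end{align}
```

Because the $x$ part is unrestricted, $\hat\psi$ is the same under both models and $\log L_x(\hat\psi)$ cancels in the difference. Dropping a covariate from the wide analysis removes its contribution from $\log L_x$, so the two wide loglikelihoods are no longer on the same footing and their difference is not a valid test statistic.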


Hello, I am wondering whether it is still true that Mplus uses joint ML, log(y,x) = log(y|x) + log(x), for growth modeling with wide data and conditional ML, log(y|x), when data are in long format? I am using version 7. If I am missing documentation regarding estimation techniques under different growth modeling situations, I would appreciate a reference to this information. Thanks very much!


Mplus now uses [y | x] for all analyses. This was introduced in Version 6.1 and is described in Version History under the heading Analysis Conditional on Covariates.


Thank you. Does this also mean that y is conditioned on the random effects in the likelihood function, so that the FIML estimation procedure would be identical to that used in other multilevel software, and different from the estimation method used in other SEM software? 


Multilevel software doesn't condition on the random effects, but integrates them out to get [y | x]. In Rasch modeling, there is a conditional ML estimator (giving rise to the opposite term, marginal ML).
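The distinction can be written out as follows. This is a sketch with notation introduced here: $u_i$ is the vector of random effects for cluster (person) $i$ with density $g$, and in the Rasch case $s_i = \sum_j y_{ij}$ is the raw total score of person $i$ and $\beta$ the item parameters.

```latex
\begin{align}
L_i^{\text{marginal}}(\theta)
  &= f(\mathbf{y}_i \mid \mathbf{x}_i;\theta)
   = \int f(\mathbf{y}_i \mid \mathbf{x}_i, \mathbf{u}_i;\theta)\,
          g(\mathbf{u}_i;\theta)\, d\mathbf{u}_i, \\
L_i^{\text{CML}}(\beta)
  &= P(\mathbf{y}_i \mid s_i;\beta).
\end{align}
```

In the first line the random effects are integrated out, which is what both multilevel software and the wide, multivariate approach do. In the second line the person parameter is eliminated by conditioning on its sufficient statistic, the total score, which is what "conditional ML" means in the Rasch literature.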

xiaoyu bi posted on Thursday, January 02, 2014  9:03 am



Dear Dr. Muthen, Happy New Year! I have a question about the wide versus long format approach. I have 5 waves of data: 424 people in the first wave, of whom 231 completed all 5 waves. I want to use the information from all 424 people in my analyses. Should I use the long or the wide format approach, given that there are so many missing values? Also, are there any differences between the wide format approach and the long format approach using multilevel analysis if I use all 424 people? Thank you so much!


The missing value issue is the same whether the data are in long or wide format. Wide format has more flexibility because more parameters are available, for example, residual variances in a growth model do not need to be held equal as they are in the long format. For the same model, data, and estimator, you will obtain the same results with the long and wide formats. 
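The extra flexibility of the wide format can be seen in a single line of the MODEL command. A sketch, reusing the User's Guide ex6.10.dat data mentioned earlier in the thread rather than this poster's five-wave data:

```mplus
DATA: FILE IS ex6.10.dat;
VARIABLE: NAMES ARE y11-y14 x1 x2 a31-a34;
  USEVARIABLES ARE y11-y14 x1 x2;
ANALYSIS: ESTIMATOR = ML;
MODEL: i s | y11@0 y12@1 y13@2 y14@3;
  i s ON x1 x2;
  ! long-format equivalent: one residual variance for all occasions
  ! y11-y14 (1);
  ! wide-format option: no equality label, so each occasion
  ! gets its own residual variance
  y11-y14;
```

With the label (1) active instead of the last line, the model matches what the long, two-level specification imposes. Without it, the wide model has three additional free parameters.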

xiaoyu bi posted on Friday, January 03, 2014  11:55 am



Hi, Linda, Thank you so much for your response. One more question: for longitudinal modeling in wide format, we can use the multilevel modeling approach (between- and within-cluster variation) or the multivariate approach (latent variable growth modeling where the cluster members are the repeated measures over time). Are there any differences between the two approaches using the same wide format data? If the data have a lot of missing values, is the multilevel approach better than the multivariate approach? Thank you!


See the Topic 8 course handout and video on the website. These issues are thoroughly described there. 
