

Extreme skew in longitudinal model 



Hello: I have 4 waves of measurement on children’s behavioral problem scores (normally distributed) and 4 corresponding measurements of the number of placements experienced between waves for 5500 children in the child welfare system. The placement data are extremely skewed and have many zeros. The percentages of children experiencing, respectively, 0, 1, 2, and 3 or more placements in the four waves are: wave 1: 85, 10, 3, 1; wave 2: 75, 12, 7, 6; wave 3: 88, 7, 3, 2; and wave 4: 95, 2, 1, 2. I am using a cross-lagged panel design, with the number of placements predicting behavior problems and vice versa. For now, I am treating the number of placements as a 4-category ordinal variable, but I wonder whether the percentage of zeros and/or the skew are such that this model ceases to be viable. How do I assess this? Would a zero-inflated Poisson model be preferred? A problem is that the inflated portion can’t be used as a predictor (page 521). Can Mplus build overdispersion into a Poisson model? The semicontinuous model seems to hold possibilities. If I don’t switch to a growth model, can the binary and continuous variables created from the placement variable at an earlier wave function as predictors of outcomes in a subsequent one? I may exclude wave 4, due to its low variability and other problems. I can also exclude a subgroup of 1000 kids who experienced hardly any placements. Would these changes help? Thanks. Jim 


In Version 5.1, we added several models for count variables. These are documented in the Version 5.1 Language Addendum and the Version 5.1 Examples Addendum which are on the website with the user's guide. Try the negative binomial model. I would not use a semicontinuous model. There are not enough values in the tail. See how the negative binomial model works before you consider other changes. 
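
As a rough illustration of why a negative binomial model is a natural first try here, the wave percentages quoted in the question can be compared with what a plain Poisson model would imply. The sketch below is not Mplus syntax and not part of the original reply. It assumes that the "3 or more" category can be treated as exactly 3 placements (which understates the true mean and variance) and it renormalizes the rounded percentages; the function name `summarize` is made up for this example.

```python
import math

# Percentages of children with 0, 1, 2, and 3+ placements, as quoted in the question.
waves = {
    1: [85, 10, 3, 1],
    2: [75, 12, 7, 6],
    3: [88, 7, 3, 2],
    4: [95, 2, 1, 2],
}

def summarize(pcts):
    """Mean, variance/mean ratio, and zero fractions for one wave.

    Assumption: the top category ("3 or more") is coded as exactly 3.
    """
    total = sum(pcts)  # reported percentages are rounded, so renormalize
    probs = [p / total for p in pcts]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return {
        "mean": mean,
        "dispersion": var / mean,         # equals 1 under a Poisson model
        "observed_zero": probs[0],
        "poisson_zero": math.exp(-mean),  # P(0) for a Poisson with this mean
    }

summary = {wave: summarize(pcts) for wave, pcts in waves.items()}

for wave, s in summary.items():
    print(
        f"wave {wave}: mean={s['mean']:.2f} "
        f"var/mean={s['dispersion']:.2f} "
        f"zeros observed={s['observed_zero']:.2f} "
        f"zeros under Poisson={s['poisson_zero']:.2f}"
    )
```

Under these assumptions, every wave has a variance larger than its mean and more zeros than a Poisson distribution with the same mean would produce, which is the kind of overdispersion the negative binomial model is meant to absorb. This is only a descriptive check on the marginal counts; it says nothing about fit once covariates and the cross-lagged structure are in the model.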


Hello, Is it possible to do a group-based trajectory analysis based on cost data? There will be 10+ time points. There are obviously many options for fitting longitudinal cost data, but I don't know whether these options would be limited within the GBTM framework. Thank you 


You can do growth mixture modeling. I don't know what GBTM means. 


How skewed do data have to be such that the MLR estimator is no longer appropriate in the context of LCGA/GMM? I am attempting to identify longitudinal trajectories in my data on ADHD symptoms across 4 ages in which there are many zeros (over 50% zeros). Using TECH13, both the skew and kurtosis tests are statistically significant (p < 0.0001). If I ignore the skewness and treat the outcome as continuous, I get solutions for a 4-class LCGA with quite high entropy (0.94). But are these results invalidated by the skew of the data? I have also run the analyses with a negative binomial distribution, but the entropy is much lower (about 0.74). If you have any advice about the most appropriate way to proceed, I would really appreciate it. Many thanks. 


When you have that high a degree of floor effect, you should not treat the variable as continuous. Instead, you can treat it as censored or do two-part modeling; both have examples in the UG. 
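
To make the two-part idea concrete, here is a minimal sketch of the recoding it relies on: each outcome is split into a binary indicator (at the floor or above it) and a continuous part that is defined only for values above the floor. This is a plain Python illustration, not Mplus syntax and not taken from the UG examples. The function name `two_part`, the cutpoint of 0, the log transform of the positive values, and the sample scores are all assumptions made for the example.

```python
import math

def two_part(values, cutpoint=0.0):
    """Split a floor-heavy variable into a binary part and a continuous part.

    binary:     1 if the value is above the cutpoint, 0 if at or below it
    continuous: log of the value when above the cutpoint, otherwise missing
    Missing inputs (None) stay missing in both parts.
    """
    binary, continuous = [], []
    for v in values:
        if v is None:
            binary.append(None)
            continuous.append(None)
        elif v > cutpoint:
            binary.append(1)
            continuous.append(math.log(v))
        else:
            binary.append(0)
            continuous.append(None)
    return binary, continuous

# Hypothetical symptom scores at one age, with many zeros and one missing value.
scores = [0, 0, 3, 0, 7, None, 1]
u, y = two_part(scores)

print("binary part:    ", u)
print("continuous part:", y)
```

The binary part would then be modeled as a categorical outcome and the continuous part as a continuous outcome with missing values for cases at the floor, so the pile of zeros no longer distorts the continuous part of the model.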


