
Message/Author 


Hello, We are considering whether to use Mplus to examine daily diary data from a large randomized clinical trial of patch vs. placebo for smoking cessation. The primary data of interest are number of cigarettes smoked per day (if any) and self-reported responses [e.g., self-efficacy (SE)] during each day of the study period (6 weeks, or about 45 days). Our primary aim is to examine the effect of responses to lapse on subsequent lapse (yes/no and # cigs) and ultimately relapse. We assume that we will need to fit a ZIP or two-part model to deal with zeros. We would then want to fit responses to lapse as a parallel process, and examine the relationship between the two processes.

1. Is it possible to do this sort of modeling, given the large number of time points? We also have ~5 reports per day, so we could potentially have an even greater number of time points per subject. How many time points are too many? Are there ways to reduce the computational demands of such models (would you recommend a piecewise approach)?

2. Following Curran, Stice, and Chassin (Jrl of Consult & Clin Psy, 1997), would it be possible to remove the latent growth parameters and fit an autoregressive cross-lagged model that would capture the bidirectional influence of lapse and response over time (e.g., time_1_lapse > time_1_selfefficacy > time_2_lapse ...)? If so, what would the Mplus code look like?

3. Could you offer other recommendations re: the use of Mplus to explore these data? Thanks in advance!


1. 45 variables (time points) is already rather heavy for growth modeling in the wide, multivariate form. But a growth model with many time points, be it for a single process or parallel processes, can be turned into a two-level model with individual as level 2 (cluster = id) by arranging the data in the long form. With 2 outcomes per time point (1 for each process) and say 100 time points, your single-level growth model has 200 variables, but your two-level model has only 2 variables; the 100 time points become the size of the clusters. I am not sure that growth modeling is the appropriate tool here, however. It seems like the smoking process is an on/off process, and efficacy also does not follow a growth process. Autoregressive models may be more to the point.

2. Such autoregressive models need the data in the wide form. With y for lapse and z for efficacy, you simply say

y2 on z1; ! or perhaps y2 on y1 z1;
z2 on y1;

etc.

3. Don't have a good, simple answer. For the lapse process, particularly for long series, you might want to study statistics books on Hidden Markov models, such as MacDonald & Zucchini (1997; Chapman & Hall), although I don't know software availability. You may also want to contact Joe Schafer at Penn State who is working on similar problems of long series.
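To make the autoregressive cross-lagged idea concrete, a minimal sketch of an Mplus input is given below, extended to three occasions. The variable names (y1-y3 for lapse, z1-z3 for self-efficacy) are placeholders, and treating lapse as a binary categorical outcome is an assumption, not something stated in the thread:

```
! Hypothetical sketch: autoregressive cross-lagged model in wide
! form, 3 occasions. y = lapse (binary), z = self-efficacy.
VARIABLE:
  NAMES = y1-y3 z1-z3;
  CATEGORICAL = y1-y3;     ! lapse treated as binary (assumption)
MODEL:
  ! autoregressive paths
  y2 ON y1;   y3 ON y2;
  z2 ON z1;   z3 ON z2;
  ! cross-lagged paths
  y2 ON z1;   y3 ON z2;
  z2 ON y1;   z3 ON y2;
```

With 20 or 45 occasions the same pattern simply repeats across adjacent pairs of time points.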


Dr. Muthen, Thanks very much for your response. There are ways we could cut down the number of time points. For instance, we could use only the ~20 days that follow initial lapse. What exactly are the disadvantages of using a large number of time points in Mplus? Would running a model with this many points take a long time? We have previously used multilevel modeling to look at smoking withdrawal responses over time and found interesting results (using a similar approach to Piasecki et al., 2003). Given that n = 211 of 306 in our trial lapsed, smoking an average of 29 cigs between initial lapse and relapse, our hope was that we could use a ZIP approach to model this alongside withdrawal/urge/SE etc. Could you explain why you don't think responses like these would follow a "growth process" in the usual sense (obviously I'm new to this area)? Thanks again.


In a multivariate analysis (wide form of longitudinal analysis), having many variables, say > 20-30, makes the computations heavy and the analysis take a long time. Using 20 days and 2 parallel processes would lead to 40 variables. I don't know enough about this phenomenon to say if growth modeling is appropriate. I don't know the Piasecki ref. I guess growth modeling would be relevant if, for example, after lapse, the number of cigs smoked increases over time. Or, does the smoking tendency bounce up and down over time with no specific trend, in which case a random intercept only growth model might be useful? In other words, there should be a growth/stability question of interest. If I were a consultant, I would have to hear detailed research questions related to the course of smoking to understand what the proper modeling would be.
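For concreteness, a random-intercept-only growth model in the wide form could be sketched as below. The 20 daily outcomes y1-y20 are placeholder names, and fixing every time score at zero is what removes the slope, leaving only a stable person-level intercept:

```
! Hypothetical sketch: random-intercept-only growth model, wide
! form, 20 daily measurements (placeholder names y1-y20).
VARIABLE:
  NAMES = id y1-y20;
  USEVARIABLES = y1-y20;
MODEL:
  ! all time scores fixed at 0: intercept factor only, no slope
  i | y1@0 y2@0 y3@0 y4@0 y5@0 y6@0 y7@0 y8@0 y9@0 y10@0
      y11@0 y12@0 y13@0 y14@0 y15@0 y16@0 y17@0 y18@0 y19@0 y20@0;
```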


Dear Dr. Muthen, when working with a large number of observations per person (varying from 10 to 50), you recommend using the long approach to data structuring. However, when estimating latent growth trajectories, the model assigns cases to latent classes at the within level. Specifically, in Example 10.1 of the User's Guide, the number of clusters is 110, but individual observations are assigned to 2 classes such that class 1 contains 573 cases and class 2 contains 427 cases. This is fine for a students-nested-within-schools example, but is problematic for longitudinal data. Is there a way to change the syntax so that the class assignment is done at the person (between) level, similarly to how it is done in the wide data approach? THANK YOU!


See Example 10.2 for the language for between classes. 
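For readers without the User's Guide at hand, the pieces that place the latent class variable on the between (person) level are roughly the following; this is a sketch in the style of Example 10.2 with placeholder variable names, not the example's full input:

```
! Sketch of a between-level latent class specification
! (placeholder variable names; see Example 10.2 for the full input).
VARIABLE:
  NAMES = id u x;
  CLUSTER = id;        ! person is the level-2 (between) unit
  CLASSES = cb(2);
  BETWEEN = cb;        ! cb varies across persons, not time points
ANALYSIS:
  TYPE = TWOLEVEL MIXTURE;
```

The BETWEEN statement is what makes class membership a person-level attribute rather than an attribute of individual time points.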


Thank you for the reference to ex 10.2. While the syntax is helpful, it's the same problem: the assignment to latent classes is happening on the 1st level of the model. Here is a small excerpt:

Number of observations    1000
Number of clusters         110

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
BASED ON ESTIMATED POSTERIOR PROBABILITIES

Latent Classes
  1    503.43696    0.50344
  2    496.56304    0.4965

Is it possible to assign clusters and not individual observations? I am thinking of longitudinal data where we can't assign individual time points to latent classes; instead we want to classify trajectories. THANK YOU!


In ex 10.2, "cb" is defined as a level 2 variable. The final class counts concern individuals, but that doesn't mean the classification is on level 1. Saving the "cprobs", you will see individuals being partial members of different level-2 classes.
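Saving those posterior probabilities can be done with the SAVEDATA command; the file name below is arbitrary:

```
! Save estimated posterior class probabilities (and most likely
! class membership) for inspection; file name is a placeholder.
SAVEDATA:
  FILE = cprobs.dat;
  SAVE = CPROBABILITIES;
```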


Thank you for your response. I have another question: I am working on a GMM model (set up as a multilevel model) where latent trajectories are estimated from count data. When I introduce a distal outcome u, the class membership changes drastically. It appears that the model takes into account only u, assigning people with positive u to one trajectory class and people with negative u to another, so there is a perfect separation. From previous posts on this topic, when a situation like this arises, the estimated classes are questionable. How can I investigate this problem? Are there any solutions? I cannot trust the results as they are now. Thank you in advance.


That outcome suggests that the GMM for the count growth does not have clearly defined latent classes, but a solution that is easily influenced by an extra variable u. You want to make sure BIC and other indices clearly point to k>1 classes when analyzing without u. And, for the analysis with u, you also want to use a large number of random starts to make sure you have found the best solution. 


Thank you, this is very helpful. To follow up on my question: if all model fit indices (BIC, AIC, LogL, LMR test, and BLRT) favor K > 1 classes for the model that doesn't include u, is it possible that once u is included, the fit changes in favor of 1 class? What is the best way to introduce u to the model: should all parameters be fixed to the values obtained in models without u? Also, what would be a large number of random starts (100?)? Finally, when I compare models with different numbers of latent classes to each other, they have slightly different sets of covariates, as some covariates are only useful for some latent classes; so, strictly speaking, the models aren't nested. Is that a problem for model comparison stats?


Another clarification: when a u variable is added, for example, to a 2-latent-class model, it sets the model off balance completely, changing most parameter estimates, significance of covariates, number of people in latent classes, and fit stats. Is this an indicator that the model was not estimated reliably in the first place? Or does it point to the u variable being inappropriate?


It is possible that once u is added, one class might be preferred. I would test this as a first step. You should not fix parameters when you add u. A large number of random starts is 1000 100 or 5000 250, where the first number is the number of initial-stage starts and the second the number of final-stage optimizations. As long as the same set of dependent variables is used, this is okay. The model is estimated conditioned on the covariates.
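In the ANALYSIS command, those random-start settings would look like this:

```
! STARTS = <initial-stage starts> <final-stage optimizations>;
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 1000 100;   ! or STARTS = 5000 250; for harder problems
```

Checking that the best loglikelihood value is replicated across several final-stage solutions is the usual way to gain confidence that the global maximum has been found.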


Hello Linda, I have a quick question. Is it appropriate to use model fit indices (e.g. BIC and ABIC) to compare growth mixture models with and without a distal outcome? For example, for a three-class GMM without a distal outcome the BIC may be smaller than for the same model with a distal outcome. Can these models be compared statistically (besides making a conceptual decision)? Thank you, Mariya


No, you have to have the same dependent variables in the model to be able to compare the BIC values in the same metric. 
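The reason can be seen in how BIC is computed; the log likelihood is taken over the model's dependent variables jointly, so adding a distal outcome changes the quantity being maximized, not just the fit:

```
BIC = -2*logL + p*ln(n)

  logL : log likelihood of the dependent variables jointly
  p    : number of free parameters
  n    : sample size
```

With u included, logL also covers u's distribution, so the two logL values (and hence the two BICs) are on different scales and cannot be compared directly.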


