Message/Author 

Anonymous posted on Wednesday, May 07, 2003  9:58 am



I have not seen any conversation on LTA in Mplus Discussion. Can we conduct LTA in Mplus? 


Only with two timepoints. Version 3 will have LTA with more than 2 timepoints. 

Anonymous posted on Friday, May 09, 2003  7:44 am



Great! When will Mplus v.3 be released? 


Fall 2003. 

Anonymous posted on Tuesday, May 04, 2004  10:23 am



Hi. I'm trying to fit a latent transition model where the latent class indicators are continuous variables. Is it possible to fit this type of model in Mplus? 


Yes. 

Anonymous posted on Tuesday, May 04, 2004  12:06 pm



Could you point me to an example for this? Thanks. 


It would be Example 8.13 but without the CATEGORICAL statement and with intercepts instead of thresholds. So refer to the intercept as [u21] rather than the threshold as [u21$1]. 

Anonymous posted on Wednesday, May 05, 2004  10:57 am



Great! Thanks! 

Anonymous posted on Saturday, December 11, 2004  11:18 am



Is the model in Example 8.13 exactly the same as the one in Reboussin et al. (1998)? 

bmuthen posted on Sunday, December 12, 2004  11:05 am



It looks from page 460 of the Reboussin et al article that they do not allow c1 and x to interact as shown in Ex 8.13. The broken arrow from c1 to the arrow from x to c2 is not accounted for in their model because the page 460 formula does not have subscript k on the gamma slope for x. Otherwise the models are the same. 

Anonymous posted on Monday, December 13, 2004  2:26 pm



Thank you, Dr. Muthen, for answering my previous question. Continuing with Example 8.13, I ran the program with 3 classes in the model and set class 3 as the reference class, so I should have 6 specific transition parameters (beta_km, k=1,2,3, m=1,2), as the paper says. But in the Mplus output I only got 4 parameters (k=1,2; m=1,2). What about the transitions from the reference class to the other two classes? 


The latent transition table is described at the end of Chapter 13. I think the two parameters that you are missing are the intercepts of c2. 


I have conducted a multigroup longitudinal LCA (2 occasions, gender as "knownclass"). It appears from the output that Mplus constrains the latent transition probabilities (t1 class > t2 class) to be equal across gender (since they are only reported for the entire sample). Is there a way to set these probs free (it would be interesting for me to study whether the groups differ with respect to the transition probs)? Thank you. 

bmuthen posted on Tuesday, May 03, 2005  10:02 am



To obtain gender differences, you want to regress the true class variables (c1 and c2, say) on the gender knownclass class variable (cg, say): c1#1-c1#... ON cg#1; c2#1-c2#... ON cg#1; See the User's Guide for details. 


I have actually done what you propose (in order to obtain different class sizes for males and females): MODEL: %OVERALL% c1#1 on csex#1; c1#2 on csex#1; c1#3 on csex#1; c1#4 on csex#1; c2#1 on csex#1; c2#2 on csex#1; c2#3 on csex#1; c2#4 on csex#1; Nevertheless the output only contains transition probabilities c1 > c2 for the entire sample. No separate transition probs for each group are reported. 


I think it is more efficient for you to send your output to support@statmodel.com and describe exactly which parameters you want to obtain. 


I'm sorry to ask this again, but I think it's an important question (maybe also for others) and I have not yet received a satisfying answer. I have estimated an LTA model with 5 classes and 12 indicators on each of 2 occasions as a multigroup model (i.e., I have used gender as a knownclass variable). I assumed measurement invariance across groups and across occasions. Now there are different constraints that I'm interested in. First I estimated a model with unequal initial class proportions delta but equal latent transition probabilities (tau) across genders. No problem: the number of parameters (89) is correct, and the fit and the estimates are the same as in PANMARK. Then I wanted to test a less restrictive model in which not only the deltas but also the taus are allowed to vary across genders. This seemed to work as well, since I got the correct number of parameters (109) and exactly the same fit as in PANMARK. However, Mplus reported only a single tau matrix, while PANMARK reports separate matrices for both genders. I wonder whether it is a bug or whether I'm missing something. Furthermore, I would like to know if there is a way to get standard errors for the deltas and taus in Mplus. And a very simple question: What is the correct citation for Mplus (sorry if it's on the homepage, I just couldn't find it)? Thank you once again for your excellent support! 

BMuthen posted on Saturday, November 12, 2005  6:10 pm



Mplus does not print a tau matrix for each gender but instead the marginal transition matrix mixing the two genders. If you want the gender-specific tau matrix, you will have to compute it using the parameter estimates. This can be done in line with Chapter 13. Mplus does not provide standard errors for taus but they can be computed using the Delta method. The citation for Mplus is the citation of the user's guide which is shown on the second page of the user's guide. 


I am working on a latent transition analysis with more than two timepoints. Do you know of any LTA examples or papers that use more than two timepoints that I can use as a guide? I am unsure how to build the model. Thanks in advance for your help! 

bmuthen posted on Tuesday, January 10, 2006  8:53 am



I know of references with multi-timepoint LTA using a single indicator per timepoint (which is also referred to as Markov Modeling). Here are two articles for which we also have the Mplus inputs: Langeheine, R. & van de Pol, F. (2002). Latent Markov chains. In Hagenaars, J.A. & McCutcheon, A.L. (eds.), Applied latent class analysis (pp. 304-341). Cambridge, UK: Cambridge University Press. Mooijaart, A. (1998). Log-linear and Markov modeling of categorical longitudinal data. In Bijleveld, C. C. J. H., & van der Kamp, T. (eds). Longitudinal data analysis: Designs, models, and methods. Newbury Park: Sage. 


Dear Dr Muthen, I am trying to run an LTA model with 4 time points, 3 variables with three categories, and 4 latent classes. I have run the model once and have then used the model thresholds as starting values for further model runs. However, in this rerun the TECH8 output indicates that each iteration is taking about 20 minutes or so (time = approx. 1600.00). Is there any way in which I can speed this up? Would fixing those thresholds that are very large (- or +) reduce the estimation time? Thanks for your help, Andy 


The generality of the current LTA implementation allows for not only 1st-order, but also 2nd-order and higher-order Markov processes. This generality makes the computing slow with many time points. Essentially, with 4 time points and 4 latent classes, you end up with a latent class model with 4^4 = 256 classes. This is on our list to simplify. In the meanwhile, looking at fewer timepoints at a time saves time. 

Boliang Guo posted on Wednesday, March 29, 2006  7:20 am



In Mplus 4 ex8.14, if the logit of c2#1 is fixed at 15, the same as for u11$1-u14$1, the results change; the same happens when 20 is changed to 15. What is the rule for fixing a logit or logistic coefficient at an extreme value to get a probability of 0 or 1? What is the difference between 15 and 10, and 4 (I remember you mentioning 4 elsewhere)? exp(-10) and exp(-15) are both extremely small! Thanks. 


Using 10 or 15 for extreme logits is (almost) equivalent. Using 5 (or 4) is only approximate. In this example, we fix [c2#1@-10] and in Model c we fix c2#1 ON c1#1@20. Because of this, c2 gets the logit -10 for c1=0 and +10 (= -10+20) for c1=1. So the net logit obtained by combining -10 and +20 (= 10) is what is critical here; it should be at least 10. We could have used -15 and +30 instead. 
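
As a rough numerical illustration of why 10 and 15 are nearly equivalent while 4 or 5 is only approximate, the sketch below converts a logit into a probability for a binary outcome. The function name logit_to_prob is made up for this example and is not part of Mplus; the intercept and slope values are the ones quoted in the answer above.

```python
import math

def logit_to_prob(logit):
    """Probability of the first category of a binary variable, given its logit."""
    return 1.0 / (1.0 + math.exp(-logit))

# Compare the extreme values discussed above with the milder 4 and 5.
for value in (4, 5, 10, 15):
    p_low = logit_to_prob(-value)   # probability implied by a logit of -value
    p_high = logit_to_prob(value)   # probability implied by a logit of +value
    print(f"logit -/+{value:>2}: {p_low:.8f}  {p_high:.8f}")

# Intercept fixed at -10 and slope fixed at 20, as in the answer above.
intercept, slope = -10.0, 20.0
print(logit_to_prob(intercept))          # slope does not apply: logit -10
print(logit_to_prob(intercept + slope))  # slope applies: logit +10
```

With a logit of 4 the implied probability is still a little under 2 percent away from 0 or 1, which is why such a value only approximates a fixed 0/1 probability.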


Dear Dr Muthen: I am trying to run an LTA with covariates. I have a question about how the latent transition probabilities in the output are obtained. Reboussin et al. (1998) express the transition probabilities for each individual. In the output, is this the "average" transition probability over individuals? How does Mplus compute the estimated transition probabilities? Is there literature about the estimation of transition probabilities in Mplus? Thanks for your help. TSAI 


Transition probabilities are population parameters in the LTA model, not individual characteristics. Mplus prints the estimates of these parameters. The Mplus User's Guide has several references to LTA that define transition probabilities; see e.g. the Mooijaart ref. If you think that Reboussin et al. compute individual values, please send an email with a pdf of the article and point me to the page. 


Dear Dr Muthen: Thank you for answering my previous question, but I have not yet received a satisfying answer. Maybe I did not express my question clearly, so I am asking again. I fit a latent transition model with individual covariates similar to ex8.13 of Mplus Version 3, in which c2 depends on c1 and Xi, i=1,...,N. I get the latent transition probabilities based on the estimated model in the output. How are these transition probabilities computed in the Mplus output? On page 317 of the Mplus User's Guide, the transition probabilities are expressed as P(c2=r | c1=1) = exp(a_r + b_r1)/sum, and so on. But the model without individual covariates is not my model. The transition probabilities of the model with individual covariates are expressed on page 461 of Reboussin et al. (1998) as P(c2=r | c1=1, Xi) = exp(a_r + b_r1 + c_r*Xi)/sum. Are the transition probabilities in the Mplus output computed as {P(c2=r | c1=1, X1) + P(c2=r | c1=1, X2) + ... + P(c2=r | c1=1, XN)}/N? Thanks for your help. 


With an x variable you simply add x to the formula P(c2=r | c1=1) = exp(a_r + b_r1)/sum so that you have P(c2=r | c1=1, x_i) = exp(a_r + b_r1 + beta*x_i)/sum 


Dear Dr Muthen: Yes, I have P(c2=r | c1=1, x_i) = exp(a_r + b_r1 + beta*x_i)/sum. But it depends on "i", so it is an individual probability. In the Mplus output, a transition probability table is given. Therefore, are the transition probabilities in the Mplus output computed as {P(c2=r | c1=1, x_1) + P(c2=r | c1=1, x_2) + ... + P(c2=r | c1=1, x_N)}/N? However, this is just my guess, so I want to ask whether I guessed right or not. Thanks for your help. TSAI 


Yes, my formula is individual-specific. This is why transition tables are not given in Mplus when there are x's (too many tables). I don't recognize the formula you give. 


Dear Dr Muthen: But in my fitted model with individual covariates (x), the transition table is also given. The label of the table is "LATENT TRANSITION PROBABILITIES BASED ON THE ESTIMATED MODEL". I don't know how this transition table is computed in Mplus. So I guess it is the average transition table over individuals; is that right or not? Sorry, my English is not good, and I did not express my question clearly before. Thanks very much. TSAI 


I misspoke. Mplus does give transition tables with x's. The estimated transition table is computed by summing over each person using the formula I gave. 
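
A minimal sketch of this idea, under stated assumptions: each person's transition probabilities are computed from the multinomial logit formula quoted above and then averaged with equal weights over persons. The equal weighting, the class-specific covariate slopes, the function names, and all numeric values are illustrative assumptions; the exact weighting Mplus uses is not spelled out in this thread.

```python
import math

def transition_probs(a, b_row, beta, x):
    """P(c2 = r | c1 = k, x) for all classes r of c2, last class as reference.

    a      : intercepts a_r for the non-reference classes of c2
    b_row  : slopes b_rk for the given c1 class k (all 0 for the last c1 class)
    beta   : covariate slopes for the non-reference classes of c2 (assumed)
    x      : covariate value of one person
    """
    logits = [a_r + b_rk + beta_r * x for a_r, b_rk, beta_r in zip(a, b_row, beta)]
    logits.append(0.0)  # the reference class has logit 0
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]

def averaged_row(a, b_row, beta, xs):
    """Equal-weight average of the individual-specific rows (an assumption)."""
    rows = [transition_probs(a, b_row, beta, x) for x in xs]
    return [sum(row[j] for row in rows) / len(rows) for j in range(len(rows[0]))]

# Hypothetical estimates for a 3-class c2 and persons starting in c1 class 1.
a = [-1.0, -0.5]
b_row = [2.0, 0.5]
beta = [0.8, -0.3]
xs = [-1.2, 0.0, 0.4, 1.5]  # hypothetical covariate values

print(averaged_row(a, b_row, beta, xs))
```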

AnnaG posted on Thursday, May 04, 2006  11:18 am



Dear Dr. Muthen, I did an LCA on 9 binary indicators of risky behaviors in a sample of N=2500, and found that 4 latent classes best fitted the data, based on BIC, entropy, etc., and theoretically it fits well too. I have these data for 3 time points, and the same 4-class solution fits the data over time. Now I want to do an LTA for transitions between the latent classes over time. I am referring to the LTA Example 8.13 in your Mplus Version 3 Manual (April 2004). Would I have to specify c1 (4) C2 (4) and c3 (4) in an LTA, or do I have to import the class membership variables from the LCA in a data file? I just don't get whether I can put the number of classes in myself, or whether I should do something different. Furthermore, I am interested in whether the LTA would use the same classes as indicated by the LCA. Thanks very much for your response, Anna 


You would specify c1 (4) C2 (4) and c3 (4) as you say, not import class membership. The classes that you found in the LCA for each time point should be found by Mplus also in the LTA without any problem if those classes are well-defined. Try it. Start with 2 time points. 

AnnaG posted on Thursday, May 04, 2006  12:41 pm



Thanks very much. I have a question about the numbers to put in, for example, the Model c1 subcommands $1*1: can I limit those at any number, say: MODEL c1: %c1#1% [catggaw2$1*1] (1); etc. %c1#2% [catggaw2$1*1] (12); %c1#3% [catggaw2$1*2] (23); %c1#4% [catggaw2$1*3] (34); or should I put them to 1 and 1 for class 1 and 2, and limit them to 0 for the other classes? Or something else? I could not find anything on this in the manual. Thanks again, Anna 


You don't have to put in any numbers. But you can if you want to indicate which class is which, and that is discussed in the LCA examples in the User's Guide in the context of threshold starting values. 


Anna, just sharing something on LTA because I just finished my LTA analysis. As Prof. Muthen said above, you must be clear about "which class is which" based on the conditional probability pattern. Class 1 at time 1 may not be the same as class 1 at time 2; you MUST check the conditional probability pattern/intercept pattern to make sure what each class means at each time! 


When you hold the threshold parameters equal across time for each class, as you would assuming measurement invariance across time, you get the correct ordering of the classes. 

AnnaG posted on Tuesday, May 09, 2006  2:42 pm



I did my LTAs, with four latent classes in each latent c1 variable. In the Tech 11 Output, I get the Model Results for the Categorical Latent Variables, stating for example: C2#1 ON C1#1, and then the Estimates, SE, etc. I wondered what these values mean: the estimate for staying in class 1 from time 1 to time 2, compared to the reference class? And what does the estimate mean? Should I run my LTA with 4 different reference classes in order to get all my estimates? Thanks for your help, Anna 


Read the Version 4 User's Guide which is on our web site, Chapter 13, pages 357-359. Bottom of page 358 gives a table with logit components and page 359 gives you the corresponding Mplus names. For example, this makes it clear that c2#1 ON c1#1 is the logit slope b_11. If positive, this says that membership in class 1 at time 1 makes it more likely to be a member of class 1 also at time 2. The regular output also gives the translation of these logits into the corresponding transition probability table. No, you don't have to run with 4 different reference classes; the transition table should be all that you need. Also look at the Mplus UG references to LTA and Markov modeling. 
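
The translation from logits to a transition table can be sketched as follows, assuming the parameterization discussed in this thread: the logit for c2 class r given c1 class k is a_r + b_rk, the last class of c2 is the reference (logit 0), and the slopes are 0 for the last class of c1. The function name and all numeric values are hypothetical, and the sketch assumes the same number of classes at both time points.

```python
import math

def transition_table(a, b):
    """Transition probabilities with rows = c1 classes and columns = c2 classes.

    a : list of K-1 intercepts, a[r] for c2 class r+1
    b : (K-1) x (K-1) nested list, b[r][k] is the slope of c2#(r+1) ON c1#(k+1)
    """
    n_classes = len(a) + 1
    table = []
    for k in range(n_classes):
        logits = []
        for r in range(n_classes - 1):
            # No slope applies when c1 is in its last (reference) class.
            slope = b[r][k] if k < n_classes - 1 else 0.0
            logits.append(a[r] + slope)
        logits.append(0.0)  # reference class of c2
        denom = sum(math.exp(v) for v in logits)
        table.append([math.exp(v) / denom for v in logits])
    return table

# Hypothetical estimates for 3 classes at each time point.
a = [-1.0, -0.5]
b = [[2.5, 0.3],
     [0.4, 1.8]]

for row in transition_table(a, b):
    print([round(p, 3) for p in row])
```

In this made-up example the positive slope b_11 = 2.5 makes the probability of being in class 1 at time 2 higher for those in class 1 at time 1 than for those in the reference class at time 1.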

AnnaG posted on Tuesday, May 30, 2006  10:52 am



Dear Dr. Muthen, I wondered whether it is possible to use grouping variables in Mplus LTA (e.g., to see if latent statuses, prevalences, and/or transitions differ across groups, for example if women differ from men in transition probabilities) and whether it is possible to do a significance test for this in Mplus. Thanks very much, Anna 


You can use the KNOWNCLASS option for this. And you can do a difference test of nested models using 2 times the loglikelihood difference. 
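
A minimal sketch of such a difference test, assuming two nested models whose loglikelihoods are already known. The function name and the loglikelihood values are hypothetical. The p-value uses the closed-form chi-square tail that holds only for an even number of degrees of freedom, and the sketch ignores any scaling correction that a robust estimator might require.

```python
import math

def lr_test_even_df(loglik_restricted, loglik_general, df):
    """Likelihood ratio test for nested models (even, positive df only).

    Returns the test statistic 2 * (LL_general - LL_restricted) and its
    chi-square p-value.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs an even, positive df")
    stat = 2.0 * (loglik_general - loglik_restricted)
    half = stat / 2.0
    # For df = 2m: P(X > x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total

# Hypothetical loglikelihoods: transitions equal across groups vs. free.
stat, p_value = lr_test_even_df(-5230.4, -5215.1, df=20)
print(stat, p_value)
```

Here df is the difference in the number of free parameters between the two models.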


Hello, I am trying to run an LTA (4 classes, 3 time points) on a data set composed of elderly people. The problem is that I see dead people and I do not wish to ignore them. I would like to build a model with death as an absorbing state at T1 and T2. In a sense, it would mean 4 classes at T0 but 5 classes at T1 and T2. How can I do this with Mplus... or can I? Thank you in advance for your precious help. Louise 


I bet there are 4 classes at time 1 and 5 classes at time 2 and time 3, with death as the 5th class, and the death class cannot be transitioned out of. Am I right? You can write the Mplus code in the general way: 4 classes at the first time point and 5 classes at the 2nd and 3rd time points. Then maybe fix the transition probability from the death class to the death class at 1, or do not let them transition; that is, there is no transition out of the death class. 


As Boliang said, the absorbing state is such that later transition probabilities are 1 for staying in this death class. Also, the conditional item probabilities for the death class should be zero for observed categories other than missing data. See also chapter 13 for more information on transition probability modeling. 


Hello, I have 6 continuous indicators, and each of the 6 indicators is measured at two timepoints. Based on LPA analyses at each time point, there appear to be 3 latent classes at each time point. Here is what I want to know: 1) What are the class-specific parameters (means, variances, covariances) at each time point? 2) What are the transition probabilities of being in a particular class at time 2 given membership in a certain class at time 1? From what I can tell, example 8.13 in the Mplus manual is the best to follow. I did this, and have two questions: 1) I do not understand what to put for the MODEL %OVERALL% ON statement. Could I just say "c2 on c1"? 2) In the output, I did not see the class-specific parameters I mentioned above at each time point. I only saw the class-specific parameters for each of the 9 possible sequences of change. How can I obtain the class-specific parameters for the 3 classes at each time point? Thanks. 


1) With 3 classes, you say c2#1-c2#2 on c1#1-c1#2; so that you refer to all parameters of the multinomial logistic regression; the last class is not referred to (see Chapter 13). 2) If you follow ex 8.13, you will see that the class-specific parameters for the 3 classes at each time point (in your case means of continuous outcomes) are repeated over different patterns of classes. So with 3 classes and 6 variables, you only get 3x6=18 distinct means. This is due to measurement invariance across time (as specified by ex8.13). 


Hello, Thank you so much for your responses to my 11/16/06 post. I have a follow-up question. First of all, I should have clarified that we are NOT assuming measurement invariance as example 8.13 does. Thus, we did NOT get 18 distinct means, as you mentioned we should in your response to my 11/13 post. Rather, we got 54 distinct means at each time point, 108 total (9 latent sequences x 12 indicators for each latent sequence). Besides the fact that we are not assuming measurement invariance across time, and that our indicators are continuous as opposed to categorical, our analysis should follow that of 8.13. That being said, we understand that in the output we get the class counts and proportions for each time point. We also understand that we get a transition probability matrix in the output, followed by the parameters (means, variances, and covariances) for each of the 9 cells/sequences of change in the transition probability matrix. However, we also want to obtain the parameters (means, variances, covariances) for each of the 3 latent classes at time 1, and each of the three latent classes at time 2. It is these parameters that will help us name and define our latent classes. Question: Is there a way to get Mplus to give us these class-specific parameters at each time point? 


Mplus prints the means for each latent class at each time point. If you cannot find this in the output, please send your input, output, data and license number to support@statmodel.com. 


Hello Dr. Muthen, I am interested in estimating a latent transition analysis over three waves, with individuals nested within neighborhoods. I am interested in finding out whether the transition probabilities vary between neighborhoods. I initially thought the most appropriate model would be the multilevel latent transition analysis as you have it in the Mplus 4.2 addendum (example 7). However, I think I may also need a random slope, and I'm not sure how to specify it within the model. Is this the appropriate model, or would model 8 be better? Moreover, when I estimate the model following example 7, I get a message saying "one or more multinomial logit parameters were fixed to avoid singularity of the information matrix. The singularity is most likely because the model is not identified or because of empty cells in the joint distribution of the categorical latent variables and any independent variables." Thank you for your feedback. Sincerely, Magdalena Cerda 


I would start with example 7. Ex 8 is more advanced. The message you get is not related to 2-level LTA per se but can also be seen with regular LCA with covariates. It most often means that some classes do not have variation in some covariates, so regression coefficients cannot be determined. That is ok and often good in that it means that classes are clearly different with respect to the covariate. 


Hi Dr. Muthen, Thank you for your response. I will use example 7 then. But I still think I need to add a random slope to determine whether the transition probabilities vary by neighborhood. Am I right? If so, how would I do that? If not, what parameters from example 7 would tell me how the transition probabilities vary between neighborhoods? Is there any paper that you are aware of that applies examples 7 and 8? The problem is that I can't find any resource that interprets the output, so I am trying to put bits and pieces together from different sources to interpret it. Thank you for your help. Sincerely, Magdalena Cerda 


You are at the research frontier here, so little is written so far. The 2006 Asparouhov-Muthen paper on our web site has a first application. I would do ex7 first. If that works out well, I would turn to ex8 to look at transition probabilities that vary across neighborhoods. The "cb" latent class variable allows different within-level "c2 on c1" relationships in different types of neighborhoods. It is not a random slope, but the slope is allowed to have distinct values in different cb classes (non-parametric representation of a random slope). 

Sara posted on Tuesday, March 27, 2007  8:21 am



I have a question regarding missing data and latent transition analysis. Specifically, I have 6 continuous variables measured at 2 time points. I have 1,300 respondents at time point 1 and 1,000 at time point 2. A total of 612 respondents have scores at both time points. I ran a latent profile analysis at both time points and found 3 classes (which were nearly the same: same pattern & magnitude of means). I am now going to run the LTA to examine the probability of moving across classes over time. My question concerns the use of a missing data technique. Basically, I expect that readers will question whether it makes sense to model all data if only 619 had actual data at both time points. However, I suspect I should use the 1,300 data points at time 1 and the 1,000 at time 2 with a missing data technique when running the LTA. Specifically, I suspect that this would be advised over using only the 619 who had data at both time points for the LTA. Is this true? If so, is it because the transition probabilities would be less biased (more accurate) when using the missing data technique than when using the smaller sample of 619? Any advice would be much appreciated. 


The best approach is to use all available information: those who have data at both occasions plus those who have data at only one of the 2 occasions. This is accomplished in Mplus using Type=Missing. Although the data for those who have information at only one of the 2 occasions do not contribute to the estimation of the transition parameters, they do contribute to estimating the time-specific parameters and therefore help give better results. 

Sara posted on Wednesday, March 28, 2007  10:58 am



Thanks for the information Bengt. We ran this model. We believe we have an issue with sparseness of cells. We have three classes at T1 (freshman) and T2 (sophomores). We got the following warning.

ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. THE FOLLOWING PARAMETERS WERE FIXED: 56 57

This transition matrix is produced.

FRESHPWB Classes (Rows) by SOPHPWB Classes (Columns)
         1       2       3
1    1.000   0.000   0.000
2    0.114   0.818   0.068
3    0.025   0.202   0.773

Parameter 56=change in log odds of being in Soph Class 1 compared to Soph Class 3 if a respondent is in Freshman Class 1 vs. Class 3.
Parameter 57=change in log odds of being in Soph Class 2 vs. 3 if a respondent is in Freshman Class 1 vs. 3.

Categorical Latent Variables
SOPH#1 ON
  FRESH#1    56805.586    0.000    0.000
  FRESH#2        3.957    0.901    4.392
SOPH#2 ON
  FRESH#1       10.766    0.000    0.000
  FRESH#2        3.830    0.817    4.686

I suspect I should be setting parameters to handle this issue, not simply allowing Mplus to fix them? If so, how do I know which parameters to fix and to what values? 


No need to do anything; just let Mplus fix these. The large values give the probabilities of 1 0 0 in the first row, and these probabilities are clearly interpretable. 
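
As a rough check on how the reported logits relate to the transition matrix in the question above, the sketch below goes in the reverse direction: it takes log odds against the last class within each row of the table and recovers the FRESH#2 slopes. It assumes the last class is the reference at both time points, as described elsewhere in this thread; the variable and function names are made up. The first row is skipped because its zero probabilities have no finite log odds, which is exactly where Mplus fixed the parameters.

```python
import math

# Transition matrix copied from the question above (rows: freshman classes).
table = [
    [1.000, 0.000, 0.000],
    [0.114, 0.818, 0.068],
    [0.025, 0.202, 0.773],
]

def row_logits(row):
    """Log odds of each non-reference column against the last column."""
    ref = row[-1]
    return [math.log(p / ref) for p in row[:-1]]

# The last freshman class is the reference, so its row gives the intercepts.
intercepts = row_logits(table[2])

# Slopes for freshman class 2: row 2 logits minus the intercepts.
slopes_fresh2 = [v - a for v, a in zip(row_logits(table[1]), intercepts)]

print(intercepts)
print(slopes_fresh2)  # compare with the reported 3.957 and 3.830
```

Because the table is printed to three decimals, the recovered slopes can only be expected to agree with the reported estimates up to rounding.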

Sara posted on Wednesday, March 28, 2007  11:19 am



With respect to the model above and your comments on missing data, I have an additional question: I understand that the latent transition probabilities are estimated using only the subset of sample that has data at both occasions. Examining the number of respondents in each latent class pattern, it is classifying all respondents. It seems that this table isn't interpretable given that 1/2 the respondents don't have data at one point. Is this correct?

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS BASED ON THE ESTIMATED MODEL

Latent Class Pattern
1 1    166.75043    0.10009
1 2      0.00000    0.00000
1 3      0.00000    0.00000
2 1     90.04993    0.05405
2 2    645.00756    0.38716
2 3     53.56662    0.03215
3 1     17.64928    0.01059
3 2    143.66123    0.08623
3 3    549.31494    0.32972

Also, we get a warning that says WARNING in Model command. All variables are uncorrelated with all other variables within class. Check that this is what is intended. We didn't intend to set all variables to be uncorrelated within a class. We set the within class correlations to be equal across classes and time. Looking at the results it appears that the latter is what happened but this warning concerned us. 


The table is totally interpretable. Estimated probabilities for people with information at only one timepoint are based on information from people with information at both timepoints who are like the people with one timepoint at that timepoint. This is the strength of the method. The warning about variables being uncorrelated is given for analysis variables that are not mentioned in the MODEL command. I would have to see the output and your license number at support@statmodel.com to give specific information about your analysis. 
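
To see how the pattern counts in the question relate to the transition matrix reported earlier in this thread, the sketch below divides each pattern count by the total for its time 1 class. The counts are copied from the post; the function name is made up for this illustration.

```python
# Estimated counts for each (time 1 class, time 2 class) pattern, from the post.
counts = {
    (1, 1): 166.75043, (1, 2): 0.00000, (1, 3): 0.00000,
    (2, 1): 90.04993, (2, 2): 645.00756, (2, 3): 53.56662,
    (3, 1): 17.64928, (3, 2): 143.66123, (3, 3): 549.31494,
}

def transition_rows(pattern_counts, classes=(1, 2, 3)):
    """Row-normalize pattern counts into P(time 2 class | time 1 class)."""
    rows = {}
    for k in classes:
        row_total = sum(pattern_counts[(k, m)] for m in classes)
        rows[k] = [pattern_counts[(k, m)] / row_total for m in classes]
    return rows

for k, row in transition_rows(counts).items():
    print(k, [round(p, 3) for p in row])
```

The rows obtained this way can be compared directly with the 3 x 3 transition matrix posted earlier.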


Dear Mplus team, I have a quick question concerning latent transition probabilities. I want to test an LTA model with 2 categorical latent variables (c1 and c2; each with 5 classes) in which no change occurs. Thus I tried to constrain the transition matrix to be an identity matrix using the following statement: c2#1 on c1#1@20; c2#2 on c1#1@20; c2#3 on c1#1@20; c2#4 on c1#1@20; c2#1 on c1#2@20; c2#2 on c1#2@20; c2#3 on c1#2@20; c2#4 on c1#2@20; c2#1 on c1#3@20; c2#2 on c1#3@20; c2#3 on c1#3@20; c2#4 on c1#3@20; c2#1 on c1#4@20; c2#2 on c1#4@20; c2#3 on c1#4@20; c2#4 on c1#4@20; However, this does not seem to be fully correct, as some of the taus are estimated > 0 or < 1, and the number of parameters is also larger than expected. How can I constrain ALL taus to 1 / 0? Thank you, Christian 


You are forgetting about the intercept parameters that are referred to in brackets. See Examples 8.13 and 8.14 and the last section in Chapter 13, Parameterizations of Model With More Than One Categorical Latent Variable. If you don't have the most recent user's guide, see the one on the website. 
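
A numerical sketch of why the intercepts matter, under the same assumed parameterization as elsewhere in this thread (logit a_r + b_rk, last class as reference, no slope for the last c1 class). The particular values, intercepts at -10 with slopes of 20 on the diagonal and 0 elsewhere, are an illustrative choice modeled on the -10/+20 example discussed earlier, not a statement of what the User's Guide examples specify; the function name is hypothetical.

```python
import math

def transition_table(a, b):
    """Rows = c1 classes, columns = c2 classes, last class as reference."""
    n_classes = len(a) + 1
    table = []
    for k in range(n_classes):
        logits = [a[r] + (b[r][k] if k < n_classes - 1 else 0.0)
                  for r in range(n_classes - 1)]
        logits.append(0.0)  # reference class of c2
        denom = sum(math.exp(v) for v in logits)
        table.append([math.exp(v) / denom for v in logits])
    return table

n = 5

# All 16 slopes at 20 with intercepts left at 0: not an identity matrix.
slopes_only = transition_table([0.0] * (n - 1),
                               [[20.0] * (n - 1) for _ in range(n - 1)])

# Intercepts at -10, slopes of 20 on the diagonal only, 0 off the diagonal.
a = [-10.0] * (n - 1)
b = [[20.0 if r == k else 0.0 for k in range(n - 1)] for r in range(n - 1)]
near_identity = transition_table(a, b)

for row in slopes_only:
    print([round(p, 3) for p in row])
for row in near_identity:
    print([round(p, 3) for p in row])
```

In the second table every diagonal entry is within rounding of 1, including the last row, which depends only on the intercepts.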


I am running three independent LCA models for criminal offending types (7 indicators with zero-inflated counts) at three time points as a precursor to running an LTA. For two of the years I found that a four-class solution was reasonable (although thresholds were fixed by Mplus in one of those years). For the third year, I was having trouble with local maxima and used the OPTSEED approach suggested in the manual. I found a (twice) replicated log likelihood about two points below the highest LL. The estimates are similar. Should I accept those estimates? If so, are there any special considerations that I have to make in the subsequent LTA to account for this? 


Did you try STARTS=1000 100; or greater? If you did not, try more starts. Another factor to consider is whether all final stage starts converged. If you did use many starts and compared the results from the best loglikelihood to one of the replicated second best loglikelihoods and they look the same, I think you can trust them. Are these results also similar to times 1 and 2? It may be that the class structure is not as clear at time 3. You may find that when you impose measurement invariance in your LTA this will help stabilize the model. 


Thank you. I attempted to run the LTA as suggested but received the following error messages:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.347D-10. PROBLEM INVOLVING PARAMETER 95.

ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY DUE TO THE MODEL IS NOT IDENTIFIED, OR DUE TO A LARGE OR A SMALL PARAMETER ON THE LOGIT SCALE. THE FOLLOWING PARAMETERS WERE FIXED: 20 98

My workshop notes indicate that identification in such models can be difficult, but I was unsure of exactly how to go forward from here or further check the model. 


You would need to send the input, data, output, and your license number to support@statmodel.com for us to understand the problem. 


Follow up to May 2006, re: whether LTA uses the "same" classes as indicated by a LCA model: I am using LTA to examine transitions between classes of victimization with former and current partners. A 3 3 model best fits the data; the first class for each latent variable is fixed such that all lc probs are 0. I can calculate the CP of each category and have given appropriate "names" to each class. The problem comes when I add in covariates. First, I guess I don't know quite how to calculate the CP for each class (the output reports thresholds rather than the lc pattern probs which I used to calculate CP). But, does the ordering of each of the latent classes remain the same? I've looked at the final class counts for the lc and it *seems* that the classes change order for C2 even though I have done nothing but add the covariates, i.e. C 2 1 now becomes C 2 2 (substantively) in the LTA model, based on the final class counts, and so on. Questions: 1) How to calculate the conditional probabilities when covariates are included. (And would it be more appropriate to report those as opposed to the unconditional model?) 2) How can I be sure that my classes in the conditional model are substantively the same as in the unconditional model? I apologize for any repeats of previous postings and for my naivete; thanks in advance for your response. 


When you add covariates, do you regress the categorical latent variable on the covariates or the latent class indicators on the covariates? I am assuming it is the categorical latent variable. You can choose the order of the classes by using user-specified starting values. There are several examples of this in Chapter 7. You take the starting values from a previous analysis. 1. The formula is shown in Technical Appendix 8 formula 153. Note that an intercept is the negative of a threshold. See also the section on Calculating Probabilities from Logistic Regression Coefficients in Chapter 13. 2. You can check whether class counts change and see also if the thresholds of the latent class indicators change. 


Dear all, I am working on an LTA and I am using the KNOWNCLASS command to model gender (c) differences. CLASSES = c(2) c1(4) c2(4); I have managed to calculate different tau matrices and different rho values for the two groups, but still cannot manage to calculate different delta values. Where should I specify that I want different delta for male and female? In MODEL: %OVERALL% or in MODEL c:? Thanks a lot, Luca 


What is delta? 


Hi Dr. Muthen, My question is related to example 8.13 in the user guide. But I will drop the covariate 'x'. I have 3 time points, a two-class model with no restriction on the transition probabilities. I have the following code: CLASSES = c1 (2) c2 (2) c3(2); ANALYSIS: TYPE = MIXTURE; MODEL: %OVERALL% c2#1 ON c1#1 ; c3#1 ON c2#1 ; Now I want to write the transition model for the logit of the transition probabilities (baseline category logit model with the last class being my reference class). Here is the model I think: log(p_km/p_kc)=alpha_mt + Beta_km(t-1) where m=2,3,...,C (C in this case is 2) k=1,2,...,C Beta_km,(1) = 0 for all k,m Beta_cm(t)= 0 for all t t=1,2,3. p_km = transition probability of moving from class k in time t-1 to class m in time t. Is this correct? Thanks, Chinthaka. 


I meant to write Beta_km(t), not Beta_km(t-1). Sorry about that. 


Sorry, I found another typo: m=1,2,3,...,C-1, not m=2,3,...,C. Thanks. 


See the section at the end of Chapter 13 called Parameterization of Models With More Than One Categorical Latent Variables. 
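As a rough illustration of that parameterization, the sketch below turns the multinomial logit parameters of c2 ON c1 into a transition table. It assumes the last class of each latent variable is the reference class (its intercept and slopes are zero); the function name and the numeric values are hypothetical.

```python
import math

def transition_matrix(a, b):
    """Transition probabilities P(c2 = m | c1 = k).

    a[m]    : intercept of c2#(m+1); the last c2 class has intercept 0.
    b[m][k] : slope of c2#(m+1) on c1#(k+1); slopes involving the last
              class of either variable are 0.
    Returns one row per c1 class, one column per c2 class.
    """
    n_to = len(a) + 1
    n_from = len(b[0]) + 1
    rows = []
    for k in range(n_from):
        logits = []
        for m in range(n_to):
            if m == n_to - 1:
                logits.append(0.0)  # reference class of c2
            else:
                slope = b[m][k] if k < n_from - 1 else 0.0
                logits.append(a[m] + slope)
        denom = sum(math.exp(v) for v in logits)
        rows.append([math.exp(v) / denom for v in logits])
    return rows

# Hypothetical estimates for a 2 x 2 model: [c2#1] = -1.0, c2#1 ON c1#1 = 3.0.
table = transition_matrix(a=[-1.0], b=[[3.0]])
for k, row in enumerate(table, start=1):
    print(f"c1 = {k}:", [round(p, 3) for p in row])
```

Note that the row for the reference class of c1 depends only on the intercepts of c2, which is why those intercepts carry the transitions out of the reference class.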

Sarah Dauber posted on Tuesday, November 27, 2007  10:15 am



Hello, I am running a LTA model with 2 timepoints and 4 latent classes at each timepoint. I am trying to follow example 8.13 in the user's guide. I got the following error and don't know what it means: *** ERROR in Model command Ordered thresholds 1 and 2 for class indicator QUANTITY1 are not increasing. Check your starting values. This appears in reference to several of my latent class indicators. I'm not sure what it means or what to do. Thanks, Sarah Dauber 


Please send your input, data, output, and license number to support@statmodel.com. 


I'm comparing time1 to time2 latent profile transition models. Two meaningful models work: 2-class to 2-class, and 2-class to 3-class solutions. AIC goes down with the 2-2 compared to the 2-3 solution (7070 to 7034, rounded), but the BIC goes up (7215 to 7237). LL goes from -3489.937 to -3453.896. If I understand these correctly, the AIC indicates that the 2-3 class transition fits better, but the BIC indicates the 2-2 class transition fits better. Ns are reasonable for both solutions. Any suggestions about how to choose the model to report? Thanks! 


I would suggest looking at more than AIC and BIC. The following dissertation discusses the steps to take when carrying out a Latent Transition Analysis: Nylund, K. (2007). Latent transition analysis: Modeling extensions and an application to peer victimization. Doctoral dissertation, University of California, Los Angeles. This dissertation is available on the website under Papers. 


I think I'm having a brain failure! I have done an LTA for 4 continuous variables measured at two times to identify transitions in latent profile classes. I've been thinking that C1 represented the latent class variable for the 4 indicators at time 1, and C2 likewise for time 2. It appears that they really just represent 2 latent class variables that together define class membership for the 8 indicators (same 4 at the two times). I realize this from seeing that the solution for C1(3) and C2(2) is exactly the same as the solution for C1(2) and C2(3) regarding class memberships. Perhaps I really want a "mover/stayer" model, but I haven't figured out how to make the syntax work for continuous indicators despite notes here about it and information in the manual. Do you have any examples for this sort of LTA on latent profile classes for continuous variables I could work from? Thanks! Bruce 


Your LTA should not allow c1 to influence the 4 indicators at time 2, and not allow c2 to influence the 4 indicators at time 1. See UG ex8.13. Here, "not influence" implies that the means do not vary over those classes. 


Thanks, Bengt - I've been using ex8.13 but not correctly, it seems. I have meaningful LPA classes at T1 (3) and T2 (2), but I can't get even a 2-2 LTA to work. I can live with zero corr within classes, but my attempts to specify even equal diag matrices at each time haven't worked. Going with the defaults for covar matrices, this syntax gives a class at each time with no cases. MODEL: %OVERALL% c2 ON c1 ; MODEL c1: %c1#1% [t1v1*1 t1v2*3 t1v3*40 t1v4*6] ; %c1#2% [t1v1*5 t1v2*6 t1v3*55 t1v4*15] ; MODEL c2: %c2#1% [t2v1*1 t2v2*4 t2v3*42 t2v4*8] ; %c2#2% [t2v1*4 t2v2*7 t2v3*60 t2v4*18] ; If I drop one set of starting values at each time, I get a solution, but it's not similar to any combination of LPA models. Thanks for any help with this! Bruce 


If you get a meaningful 3-class LPA at t1 and a 2-class at t2 (with BIC supporting those choices), it would seem that an input like the one you have here (although with 3 classes at t1) should work fine (I assume the starting values come from the individual LPA's). If using many random starts doesn't get you a solution that looks like the individual LPAs at each time point, perhaps (a) those solutions weren't stable enough, or (b) the sample size is rather small, or (c) putting the two time points together creates a model misfit, such that the correlations among indicators across time are not well modeled by the conventional LPA. One example of (c) would be that a given indicator has a residual correlation with itself across time. It is hard to say more than that without doing analyses. 


Thank you, Bengt - I've revised my models a bit more trying to get a solution when specifying starting values from the prior LPAs, but I am still getting strange results: one class gets fixed as having 0 obs at each time, after this warning: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.371D-15. PROBLEM INVOLVING PARAMETER 26. (The ALPHA(C) for C2#1) This happens with a 2c to 2c and a 3c to 2c model. When I allow Mplus to choose its own starting values, I get solutions both ways. Any ideas what is going on? E.g., MODEL c1: %c1#1% [t1v1*6 t1v2*5 t1v3*68 t1v4*19]; %c1#2% [t1v1*1 t1v2*3 t1v3*46 t1v4*10]; MODEL c2: %c2#1% [t2v1*6 t2v2*4 t2v3*63 t2v4*17]; %c2#2% [t2v1*1 t2v2*3 t2v3*50 t2v4*12]; Produces: 1 1 67.97338 1 2 0.00000 2 1 0.00000 2 2 119.02662 Thanks! Bruce 


It's hard to say without seeing the outputs and running it ourselves. For example, I don't know if you have used a large number of random starts or used the default, and I don't know how the loglikelihoods compare across the models. The nonidentification message clearly appears when you have empty classes, since you don't have people supporting parameters in those classes. The results you show indicate no transitions. A good way to start an LTA is to do the two LCA's first with K1 and K2 classes and then cross-classify people into a K1 x K2 frequency table to see if people fill the cells so that you have transitions. You say "When I allow Mplus to choose its own starting values, I get solutions both ways." If that result comes from many random starts with the best LL replicated several times and better than the LL of the solution you show here, then that's what the LTA gives. If that doesn't help, send the input, output, data and license number to support@statmodel.com. 
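The cross-classification check suggested above can be done outside Mplus once each person's most likely class from the two separate LCA runs is available. The sketch below assumes those modal class assignments have already been read into two lists in the same person order; the function name and the data are made up for illustration.

```python
from collections import Counter

def cross_classify(c1_labels, c2_labels, k1, k2):
    """K1 x K2 frequency table of modal class at time 1 by time 2."""
    counts = Counter(zip(c1_labels, c2_labels))
    return [[counts[(i, j)] for j in range(1, k2 + 1)]
            for i in range(1, k1 + 1)]

# Hypothetical most-likely class memberships for eight people.
c1 = [1, 1, 2, 2, 2, 1, 2, 1]
c2 = [1, 2, 2, 2, 1, 1, 2, 1]

table = cross_classify(c1, c2, k1=2, k2=2)
for i, row in enumerate(table, start=1):
    print(f"time-1 class {i}:", row)

# Off-diagonal cells that are all zero would indicate no observed transitions.
off_diagonal = sum(table[i][j] for i in range(2) for j in range(2) if i != j)
```

Because this table is based on modal assignment, it ignores classification uncertainty; it is only a quick screen before fitting the LTA itself.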

Kim D'zatko posted on Thursday, October 30, 2008  1:31 pm



Hello all, I ran an LTA with three classes each over three timepoints. I included two covariates, as well. This converged in ~2.5 hours. The current model includes a mover/stayer latent variable, but no covariates. After 36 hours, it has progressed through only 22 sets of starting values. Is this typical or should I stop the run and check my code? Take care 


It's hard to say without more information. If you are not using Version 5.1, I suggest that you do. If you are, please send the output from the model with covariates, the input and data for the model without covariates, and your license number to support@statmodel.com. 


I receive the following message when running an IRT mixture model with 2 latent classes:  ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX ....  In this case, the singularity is due to empty cells in the joint distributions of some of the categorical variables. Consequently, the SEs for the fixed parameters cannot be computed. However, the values of the fixed parameter estimates are provided in the output. How are the values of the fixed parameter estimates determined given that there are empty cells in the joint distributions? 


I can imagine that a univariate outcome gets prob zero or one in a certain class and therefore a large or small threshold that gets fixed. The choice of value of this fixing is innocuous because, say, 15 or 20 gives the same zero probability. A joint distribution being the cause seems odd to me given the IRT mixture model. If this doesn't help, feel free to send input, output, data, and license number to support@statmodel.com. 


I have 2 questions about how to specify measurement noninvariance and correlations among continuous manifest indicators in the context of LTA. I'm running an LTA with 2 timepoints with 5 classes at time1 and 4 at time2. (Those solutions are supported theoretically and statistically following your recommendations with LCAs for continuous indicators at each timepoint.) I am trying to follow UG 8.13 and the very helpful Nylund (2007), but I don't know how to: 1) Allow correlations within class 2) Account for measurement noninvariance across time. I'm interested in the transition probabilities and want to allow the means to differ across time. I apologize if I have missed this answer here or in the user guide. I'd appreciate any direction from you or recent references. Abbreviated model syntax is pasted below: MODEL: %OVERALL% C2#1-C2#3 ON C1#1-C1#4; Model C1: %C1#1% [map1 pap1 pav1] . . . %C1#5% [map1 pap1 pav1] Model C2: %C2#1% [map3 pap3 pav3] . . . %C2#4% [map3 pap3 pav3] 


1) You can accomplish that by adding a factor measured by the items at each time. See the chapter on our web site: Muthén, B. (2008). Latent variable hybrids: Overview of old and new models. In Hancock, G. R., & Samuelsen, K. M. (Eds.), Advances in latent variable mixture models, pp. 1-24. Charlotte, NC: Information Age Publishing, Inc. 2) You can use the "dot" option in the Model statement: %c1#1.c2#1% [map1-pav1] (1-3); [map3-pav3] (4-6); %c1#1.c2#2% [map1-pav1] (1-3); [map3-pav3] (14-16); %c1#2.c2#1% [map1-pav1] (11-13); [map3-pav3] (4-6); etc., which gives noninvariance for the means [map1-pav1] versus the means [map3-pav3]. The invariance version is obtained by allowing only 2 sets of means instead of these 4 sets of means. 


Thank you for your quick response. I have gone back to your 2008 chapter on hybrids and it sounds like you are suggesting an FMA-LTA approach in preference to the conventional LTA. This makes sense to me and your data example in the chapter makes a compelling case. Can you direct me to syntax examples of how to run FMA-LTA? Would I just add these lines to the model command: f1 by map1-pav1; f2 by map3-pav3; Thank you for your help. 


Yes, but you also have to fix the factor means to zero in the %overall% part since the outcome means are free over the classes. And say f2 on f1 to cover that relationship. 


Hello, I have a total of 7 time points, and am working on LTA analyses with 3 classes at each time point. I have successfully completed the LTA with 3 time points, but when I add a 4th time point it takes a VERY long time to iterate (more than a day with a single processor). Can Mplus handle more than 3 timepoints for LTA right now? Should I try to use a computer with numerous processors to decrease the iteration time? Or do I just need to do a series of 3 timepoints at different combinations to get the transition tables I want? Any suggestions? Thanks. 


I would do the LTA's three at a time. Beyond that it is computationally demanding. 


Hello, I described LTA results in a recent paper. A reviewer wants to know whether it is possible to provide standard errors for the latent transition probabilities. I could not find any information on this and tried to bootstrap CIs ("Bootstrap" in the Analysis command and "Cinterval" in the Output command). However, Mplus would not provide bootstrapped CIs for the posterior probabilities. Is it possible (and how) to obtain the CIs given that bootstrapping does not work? Thank you! 


You can use MODEL CONSTRAINT to obtain standard errors of the latent transition probabilities. You would need to define them according to the formulas found at the end of Chapter 13. 


Thank you for your reply. However, I'm not sure exactly which formula you mean at the end of Chapter 13. Could you be somewhat more specific? Thank you in advance! 


See pages 411-414 of the Mplus User's Guide. 


Hello, I have a set of data with 2 time points. When I performed LPA's separately for each time point, I obtained a 5-class solution as the best-fitting for both time points. I couldn't assume measurement invariance in my data, and specified the starting values in LTA based on the previous LPA results. (see below) === MODEL: %overall% c2 ON c1 sex NS HA; c1 ON sex NS HA; MODEL c1: %c1#1% [DEP_t*10 DELIN_t*3]; %c1#2% [DEP_t*25 DELIN_t*5]; %c1#3% [DEP_t*16 DELIN_t*18]; %c1#4% [DEP_t*45 DELIN_t*7]; %c1#5% [DEP_t*29 DELIN_t*38]; MODEL c2: %c2#1% [DEP2_t*12 DELIN2_t*3]; %c2#2% [DEP2_t*21 DELIN2_t*24]; %c2#3% [DEP2_t*17 DELIN2_t*12]; %c2#4% [DEP2_t*31 DELIN2_t*5]; %c2#5% [DEP2_t*26 DELIN2_t*45]; === However, when I performed this LTA, the means of the 5 classes at time 1 were very different from the starting values I specified in my input, and became very similar to the 5-class pattern of time 2. Why does this happen? Thank you in advance! 


Hi~ In regard to my last question above, I have tried a c1(3) c2(3) LTA model as an alternative. The result of this analysis was very different from the individual 3-class LPA for each time point. Specifically, in the 2 separate LPA's for time 1 and time 2, I obtained "normal", "depression", and "depression-delinquency comorbid" classes in both. But, in LTA, I got a "delinquency" class instead of the "depression" class. How can I interpret this kind of result? Thank you again. 


It is hard to say without knowing your data analysis situation. I don't know how many random starts you have used and how many times the best LL was replicated. It is not clear from your question that you had the same covariates in the model for a given time point and for the two time points together. Generally speaking, however, when you put the two time points together your model has more content than for each time point separately: you are saying, for example, that c1 does not influence the c2 indicators directly but only indirectly via c2. This means that results can change. Typically, this does not happen with a well-defined solution for each time point. 


Hello, I am attempting to establish measurement invariance for a LTA across three time points. I was wondering if there is an empirical indicator of which constraints might be particularly problematic in the analysis (e.g., an LM test). I have not yet been able to establish invariance and will likely have to settle for partial invariance, but am not sure how to empirically decide which constraints to free in order to improve model fit. Thank you, Miguel Villodas 


You can look at modification indices (LM). They don't work as well in mixture modeling but may give you some idea where parameters are not equal. 


Thank you very much Dr. Muthen for your quick reply. I did try to run the model with the MODINDICES option in the output section, but received the following message. Is there another way to obtain these? *** WARNING in OUTPUT command MODINDICES option is not available for TYPE=MIXTURE with more than one categorical latent variable. Request for MODINDICES is ignored. 


In this case, your only option is to look at each parameter separately. 


That's what I was afraid of. Thank you very much for clarifying this for me. Miguel Villodas 

Evgenia posted on Friday, February 12, 2010  1:52 am



Hello. I'm trying to fit a hybrid model with two latent classes with binary indicators. In one class I want to fit a two-parameter logistic model, and the other class is assumed homogeneous without any latent structure, with a given probability of a positive response on item i for a subject belonging to this class (similar to the classic latent class model). Is it possible to fit this type of model in Mplus? Can I simulate data from this model? Thank you. Evgenia 


I think in the IRT mixture literature a similar situation arises when one class of subjects use their knowledge (a factor f, say) to solve a problem and another class of subjects guesses. The second class would then be specified with f@0; [f@0]; to eliminate the factor in that class. This would make for independent items in that class, which may make sense with guessing. One could also contemplate other models in the second class. A totally unrestricted model for the second class is hard to estimate by ML because it would involve correlating all items (ML for categorical outcomes does not allow WITH), although the use of many factors could approximate this. 


Hello Dr. Muthen, I have a LTA model with three time points and three classes at the first two timepoints and four classes at the last time point. Although I could not achieve statistical invariance for any of the classes in their entirety, the conditional response probabilities indicate that three of the classes are very similar and that a fourth emerges at the final time point. However, the ordering of the classes seems to change at the fourth time point. After identifying the latent class orders at each time point, I ran an LTA, but the transition probabilities did not seem valid. I checked some of the conditional response probabilities for the specific class membership patterns and concluded that the order seemed to have changed again when the LTA was run. Is this possible? Is there a way to get CRPs for each class at each time point to confirm? I appreciate your feedback about this issue. 


Class switching can happen. The solution is to use starting values for the thresholds. User-specified starting values are shown in examples in Chapter 7. 


Thank you Linda! So I assume that I should enter the CRPs from each measurement model as start values for each threshold and turn off the random starts in order to assure that my class orders do not change. Is this correct? 


You should use the logit threshold values as starting values. You can specify STARTS = 0; 
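If only the conditional response probabilities are at hand, they can be converted to the logit threshold scale before being used as starting values. The sketch below assumes binary indicators and the logit link; the item names, the probabilities, and the clipping constant are hypothetical choices for illustration.

```python
import math

def threshold_from_prob(p, eps=1e-6):
    """Logit threshold for a binary item with P(u = 1 | class) = p."""
    # Clip so probabilities of exactly 0 or 1 give a large finite value.
    p = min(max(p, eps), 1.0 - eps)
    # P(u = 1) = 1 / (1 + exp(tau))  =>  tau = log((1 - p) / p).
    return math.log((1.0 - p) / p)

# Hypothetical conditional response probabilities for one class.
crps = {"u1": 0.90, "u2": 0.50, "u3": 0.05}
starts = {item: threshold_from_prob(p) for item, p in crps.items()}

for item, tau in starts.items():
    # Printed in the [item$1*value] start-value style used in this thread.
    print(f"[{item}$1*{tau:.3f}];")
```

High response probabilities give negative thresholds and low ones give positive thresholds, so a sign mistake here is an easy way to end up with switched classes.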


Thank you very much Linda, I tried this and it worked very nicely. However, now I am trying to add a distal outcome that will be regressed on my final four-class variable at my third time point. There did not seem to be much in the User's Guide about this issue, so I read through Karen Nylund's dissertation and her syntax example and added my continuous distal outcome variable the same way. In the output, it seems that means and variances are estimated for each latent class pattern, but I could not find means for each class of the final latent class variable. In other words, I was hoping for three means and ended up with 36. Is there an easy way to get the means that I am looking for or are they printed in another section of the output? Thank you so much for all of your help with this. 


You need to impose equality constraints using the dot (.) labelling feature, which is described on pages 560-561 of the user's guide and in Example 8.14. For further help on this, contact support@statmodel.com. 


I am attempting to run an LTA (mover-stayer model) using example 8.14 in the Mplus user's guide. I've run the model with constraints (which allowed for a graph) and free, without constraints (which did not allow for a graph). However, even when I remove the constraints (1-3) (4-6), are there some default constraints that Mplus imposes? I ask because even though all the mean estimates are no longer held to be the same for one latent class, it appears that there are patterns of means which are identical across latent classes. Also, based on the constrained model there are 74 movers and 356 stayers. Based on the free-to-vary model there are 333 movers and 97 stayers. Why would the number of movers and stayers change so much? I am following Karen Nylund's papers, which report size of classes, % of individuals in each pattern (movers then stayers), etc. 


It's hard to say what is going on without looking at your 2 runs. There are default constraints when using Model c1, Model c2, etc, namely what you would expect: time 1 means only change as function of c1 classes, not c2 classes, etc. If the model is correctly set up, the changing numbers of movers and stayers might indicate a model misfit. If that doesn't help, please send your input, output, data, and license number to support@statmodel.com. 


Is it possible to have more than 2 latent classes for the higher order latent variable "c" in the mover-stayer model (8.14)? Specifically, is it possible to create a 4-class latent variable c? This way, instead of lumping all movers and all stayers together (respectively), you could estimate: stayer type 1, stayer type 2, mover type 1 and mover type 2. 


This is possible in principle but I think there could be some difficulties in doing it. 


Hi, After finding out that LTA with more than 3 time points is very computationally demanding, I've decided to pool the data, and do the analysis in a multilevel LTA framework, in which the different transitions are nested in individual cases. As I'm not interested in cross-level interactions, and just want to control for the fact that the cases are not statistically independent, I've read I can use the "TYPE=COMPLEX MIXTURE" option, specifying the variable by which the nesting is indicated. However, as the individuals are also nested in families, I actually have 2 nesting variables, something "TYPE=COMPLEX MIXTURE" cannot handle. Instead, I found out I should use the "TYPE=TWOLEVEL COMPLEX MIXTURE" option. The model is running fine with this last option. However, there are 2 problems I run into. (BTW, I'm using Mplus v5.0) 1) The output only reports the thresholds for the item probabilities. Is there some way to also get the item probabilities themselves, or should I calculate these by hand? 2) The output does not report the chi-square test of model fit. Is there some way to get these nonetheless? Again, please keep in mind that I'm not interested in any cross-level interactions, but that I just want to adjust the standard errors for the nested structure of the data. Thank you for your future comments! 


To add to my last post, in question 2 I was referring to the output of the LCAs I conducted on the two time points before turning to the LTA. 


We don't give probabilities at this time when numerical integration is involved. You cannot compute the probabilities by hand in this case. You may find the following paper, which is available on the website, helpful: Henry, K. & Muthén, B. (2009). Multilevel latent class analysis: An application of adolescent smoking typologies with individual and contextual predictors. Forthcoming in Structural Equation Modeling. If the chi-square tests are not given automatically, there is no way to request them. 


Thanks for your quick reply and the helpful reference. Examining the results in more detail, I seem to find the same fit indices and thresholds for the 'fixed effects model' (in which I control for the clustering of the cases in families) and the 'random effects model' in which I add a third level (cases nested in individuals). Am I doing something wrong in the syntax? The fixed effects model syntax looks like: USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10; CATEGORICAL ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10; CLUSTER = FAM; CLASSES = c(#); MISSING ARE all (9999); ANALYSIS: TYPE = COMPLEX MIXTURE MISSING; The random effects model syntax looks like: USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10; CATEGORICAL ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10; WITHIN ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10; CLUSTER = FAM ID; CLASSES = c(#); MISSING ARE all (9999); ANALYSIS: TYPE = TWOLEVEL COMPLEX MIXTURE MISSING; Again, please keep in mind that I only want to adjust the standard errors for the nested structure of the data, and that I am not interested in any cross-level interactions. 


You should compare TYPE = COMPLEX MIXTURE MISSING; with TYPE = MIXTURE MISSING; You need to include a multilevel model with TYPE = TWOLEVEL COMPLEX MIXTURE MISSING; See the Henry and Muthen article. 


Would using the STRATIFICATION option be a solution? So the syntax would look like: USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10; CATEGORICAL ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10; WITHIN ARE x1 x2 x3 x4 x5 x6 x7 x8 x9 x10; CLUSTER = ID; STRATIFICATION=FAM; CLASSES = c(#); MISSING ARE all (9999); ANALYSIS: TYPE = TWOLEVEL COMPLEX MIXTURE MISSING; Or can I leave out the WITHIN and TWOLEVEL statements altogether? Also, the IDs are technically not randomly drawn within the stratification variable, so I don't know whether that is a problem. 


When you have both COMPLEX and TWOLEVEL, you need two cluster variables. You can also have stratification. Subjects should be randomly sampled from strata. 


With four continuous variables and one categorical variable, I generated a four-class LPA solution, good fit, makes sense. My question: Is it proper to use cross-sectional LPA class means to constrain latent variables, each with a four-class solution, in an LTA? Syntax: CATEGORICAL = male latino aa coh_one coh_plus; classes = c1(4) c2(4); ANALYSIS: TYPE = MIXTURE; MITERATIONS = 1000; Starts= 100 10; MODEL: %overall% c2 ON c1 male latino aa coh_one coh_plus; c1 ON male latino aa coh_one coh_plus; Model c1: %c1#1% [T1OVERT @ 0.068]; [TOCA1SC @ 4.133]; [TOCA1C @ 4.088]; [RELAGG1 @ 0.50]; [TA1_25 @ 4.835]; %c1#2% [T1OVERT @ 0.168]; [TOCA1SC @ 3.044]; [TOCA1C @ 2.862]; [RELAGG1 @ 0.891]; [TA1_25 @ 4.241]; %c1#3% [T1OVERT @ 0.828]; [TOCA1SC @ 2.636]; [TOCA1C @ 2.851]; [RELAGG1 @ 1.673]; [TA1_25 @ 3.835]; %c1#4% [T1OVERT @ 1.742]; [TOCA1SC @ 2.083]; [TOCA1C @ 2.311]; [RELAGG1 @ 2.265]; [TA1_25 @ 2.902]; Model c2: %c2#1% [T2OVERT @ 0.068]; [TOCA2SC @ 4.133]; [TOCA2C @ 4.088]; [RELAGG2 @ 0.50]; [TA2_26 @ 4.835]; %c2#2% ditto... %c2#3% and so on... %c2#4% OUTPUT: tech1 tech8; 


We would not recommend fixing the parameter estimates. 


I want to estimate the effect of three tx conditions on four classes at 2 time points. My observed variables are continuous. I have been using the example from your Berlin lectures, slides 48-51, "LTA with Intervention studies." However, my model differs in that my knownclass would have three tx groups in it and there are four classes at both T1 and T2. I am having difficulty with programming the model language to reflect this. Can you assist me in the most efficient way to state this model? Also, can covariates (i.e., race & gender) be included in this model? 


Yes, covariates can be added to this model. Try generalizing the example to three groups and four classes. If you fail, send the full output and your license number to support@statmodel.com. 


Dr. Muthen, Is there a way to calculate the regression estimates and significance levels for the reference class and the reference tx condition in an LTA with an intervention model? I have a three-class model at two time points for 4 tx conditions, and I am unable to know the estimates and significance levels for the third class at each time point and for the fourth tx condition. Also, the output does not explain the regression coeffs for my gender and race covariates. Thank you in advance! 


The coefficients for the reference class are zero. You can compute the probability of being in that class. Or you can change the reference class. 
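To make the point about the reference class concrete, here is a small sketch of the multinomial logistic calculation with one covariate. It assumes the last class is the reference, so its intercept and slope are zero; the function name and the coefficient values are hypothetical.

```python
import math

def class_probs(intercepts, slopes, x):
    """Class probabilities given covariate value x.

    intercepts, slopes: estimates for classes 1..K-1.
    The reference class K contributes a logit of exactly 0.
    """
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]

# Hypothetical estimates for a three-class variable regressed on one covariate.
intercepts = [0.5, -0.2]
slopes = [1.0, 0.3]

p_x0 = class_probs(intercepts, slopes, x=0.0)
p_x1 = class_probs(intercepts, slopes, x=1.0)
print("x = 0:", [round(p, 3) for p in p_x0])
print("x = 1:", [round(p, 3) for p in p_x1])
```

The last entry of each list is the probability of the reference class, which still changes with the covariate even though that class has no estimated coefficients of its own.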


Dr. Muthen, I have been working on the model that you reviewed several weeks ago for me. If you recall, I am treating 4 tx groups as a latent variable with two time points using six continuous measures with high reliabilities. So, in a 3 class model, there are 3x3= 9 cells for 4 tx groups = 36 transition patterns, and a model with 4 classes will produce a 4x4= 16 cells for 4 tx groups = 48 transition patterns. So, I have excellent fit indices and entropy over .92 for models with 2-5 classes. However, the 3 class model has the largest drop in BIC magnitude from a 2 group model (over 1,200 pts lower from a 2 class model) compared to the other models (less than 500 pts diff between a 3 and 4 group model and 250 diff between 4 and 5 groups). In addition, the model w/ 4 groups has a comparable entropy to the 3, but many empty cells. Kline would suggest a more parsimonious model would be the way to go, but... 1. Is there a citation I can go to that would discuss the drop in magnitude in the BIC as a criterion for deciding on model selection? 2. Do models with empty transition cells/patterns produce stable estimates? I have good fit with a four group model, and there is literature to support the meaning of the class, but 25% of the cells are empty in the 4 group model. 3. Are there citations that you are aware of investigating the stability/instability of models with empty cells? Thank you for your time. 


1. Here are some citations of interest: Wasserman (2000) in J of Math Psych gives a formula (27) which implies that a BIC-related difference between two models is log Bij, where B is the Bayes factor for choosing between model i and j. Wasserman's (27) says that log Bij is approximately what Mplus calls minus 1/2 BIC. This means that 2 log Bij is in the Mplus BIC scale apart from the ignorable sign difference. Kass and Raftery (1995) in J of the Am Stat Assoc gives rules of evidence on page 777 for 2 log_e Bij which say that >10 is very strong evidence in favor of the model with largest value. So, to conclude, this says that an Mplus BIC difference > 10 is strong evidence against the model with the highest Mplus BIC value (I hope I got that right). Raftery has a Soc Meth chapter from around 1995 (?) that talks about Bij from a SEM perspective. 

Rob Dvorak posted on Wednesday, July 14, 2010  6:45 pm



Hi Michael, Here's the Raftery cite: Raftery, A. E. (1995). Bayesian Model Selection in Social Research. Sociological Methodology, 25, 111-163. There's also a good discussion about this here: http://www.statmodel.com/discussion/messages/23/2232.html?1209409498 


2-3. I wouldn't worry about empty cells. 


Dr. Muthen, Thank you for the great resources. I hate to belabor this point, but I am a stickler for accuracy and I am an intervention researcher - not a mathematician. Last summer, I took an ICPSR course and learned about the Raftery citation for calculating a more interpretable BIC using the Mplus chi2 in the formula "chi2 - df*ln(N)". This calculation produces a BIC that is comparable across non-nested models following the Raftery rule >10. However, as Mplus LTA output does not give a chi2, but only a LgLkd chi2, I am assuming that I cannot use this statistic in this calculation; am I correct in my understanding? Therefore, following your suggestions using the results from my models, 2ln of the BIC (19355.681) for model i = 19.741, 2ln of the BIC (18956.107) for model j = 19.699. The difference between Bij is less than 10. Thus, according to your explanation, this is "strong" statistical evidence for retaining the more parsimonious model with the larger BIC (i.e. keep model i over model j). Is my interpretation of this accurate? Thanks again for your time and consideration. 


Comparing models using the formula "chi2 - df (ln(N))" is the same as using the Mplus BIC = -2logL + p*ln(N), where p is the number of parameters. Note that chi2 = -2(logL_a - logL_b), where a is a model nested within b. In the usual SEM case b is the totally unrestricted model called H1. Note also that df = p_b - p_a, where p is the number of parameters. So when you look at the difference between the BIC of two models using the formula chi2 - df (ln(N)) there is a canceling out of the terms 2logL_b and of the terms p_b*ln(N). This means that BIC differences are the same for both formulas. And this means that we should view a BIC difference > 10 as strong evidence that the model with lower BIC is better.
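The cancellation described above can be checked numerically. The following is a minimal Python sketch, not Mplus output: the log-likelihoods, parameter counts, and sample size are hypothetical, and the function names are illustrative. Only the two BIC values at the end come from the question above.

```python
import math

def mplus_bic(loglik, n_params, n):
    """Mplus-style BIC: -2 logL + p * ln(N)."""
    return -2.0 * loglik + n_params * math.log(n)

def chi2_based_bic(loglik, n_params, loglik_h1, n_params_h1, n):
    """chi2 - df * ln(N), with chi2 and df taken against the unrestricted model H1."""
    chi2 = -2.0 * (loglik - loglik_h1)
    df = n_params_h1 - n_params
    return chi2 - df * math.log(n)

# Hypothetical values, for illustration only: (log-likelihood, number of parameters).
n = 500
h1 = (-4000.0, 60)
model_i = (-4100.0, 20)
model_j = (-4080.0, 28)

diff_mplus = mplus_bic(*model_i, n) - mplus_bic(*model_j, n)
diff_chi2 = chi2_based_bic(*model_i, *h1, n) - chi2_based_bic(*model_j, *h1, n)
print(diff_mplus, diff_chi2)  # the H1 terms cancel, so the two differences agree

# The BIC values quoted in the question are compared by direct subtraction,
# not by taking 2*ln of each BIC.
thread_diff = 19355.681 - 18956.107
print(round(thread_diff, 3))
```

Under the rule stated in the reply, a difference of this size (far above 10) would count as strong evidence for the model with the lower BIC.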

Dana Wood posted on Wednesday, September 15, 2010  12:59 am



I have a question about how to interpret the latent classes in my latent transition analysis. When I ran the preliminary latent class analyses, Mplus provided results both in terms of thresholds and in probability scale. However, when I ran the latent transition analysis, only the thresholds were provided. Is it possible to manually convert these thresholds to probability scale? All indicator variables for the latent classes are ordered categorical variables (3 categories in each). Thank you. 


The computation of probabilities for ordered polytomous variables is shown in Technical Appendix 1 on the website. 
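As a rough illustration of that conversion, here is a Python sketch that assumes a logit link and a latent class indicator with no covariates, so that the cumulative probability up to a category is the logistic function of the corresponding threshold. The threshold values are hypothetical; Technical Appendix 1 remains the authoritative description.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(thresholds):
    """Category probabilities for an ordered item from increasing logit thresholds.

    Assumes P(u <= category k) = logistic(threshold k) within a class.
    """
    cumulative = [logistic(t) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # difference of adjacent cumulative probabilities
        previous = c
    return probs

# Hypothetical thresholds u$1 and u$2 for a 3-category item in one class.
probs = category_probs([-1.0, 1.5])
print([round(p, 3) for p in probs])
```

The same function would be applied to each item's thresholds within each class at each time point.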

csulliva posted on Sunday, October 10, 2010  3:11 pm



I am attempting to run a two stage LTA model with latent classes comprised of categorical, censored, and count measures. When I try to incorporate equality constraints over time, I get a series of warnings stating "There are more equality labels given than there are parameters" and a termination message that reads "***FATAL ERROR EQUALITIES BETWEEN PARAMETERS ARE NOT POSSIBLE IN THIS SITUATION." I was wondering what these messages mean and whether anything can be done about them. 


Please send the full output and your license number to support@statmodel.com. 


Dear Dr. Muthen, I have two questions: 1. Can I use LTA for modeling parallel processes? 2. If yes, what does the input look like? If no, then which model should I use? By the way, one model contains 1 class and the other contains 2 classes. Thank you for your time, Jaap U. v. Opstelten


You can specify two LTA models in the same MODEL command. You would need to expand Example 8.13. 


I am conducting an LTA w/ 3 classes at 2 time points. The measurement model is LPA. I added gender as a covariate. Every time I run the model w/ gender included, the output has the 1st class as the largest class. Unfortunately, I would like for this class (a normative group) to be the last class so I can use it as the reference group in the logistic regression. Based on earlier posts, I tried to reorder classes by putting the original start values for the last class (i.e., the mean estimates (Nu)) as the start values for the 1st class, & using 0 random starts. When I do this, the largest (reference) class becomes class 2 and I get an error: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.851D-10. PROBLEM INVOLVING PARAMETER 69. ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. THE FOLLOWING PARAMETERS WERE FIXED: 90 Should I be using different start values or is there another way to reorder classes? Any suggestions you have are appreciated.


You should be using the ending values from the results section of the first analysis as starting values in the second analysis. If you continue to have problems, send the relevant outputs and your license number to support@statmodel.com.


Hi Dr. Muthen, I attended the August mixture modeling course in Baltimore, and I have a question about an LTA I am trying to do. In following the steps outlined in the course handout (page 42 topic 6), I am on step two. I also have been following the Nylund (2007) dissertation, and I am a bit stuck. To explore transitions based on cross sectional results, I am unsure about how to do this. In the handout and in the dissertation, they talk about crosstabs based on most likely class membership. Is this something I would synthesize from my LCA output files from each time point? If this is not the case, I am wondering if I need to run a new model, such as example 8.14 in the version 6 user’s guide (but without model constraints)? Thank you for your time. 


For each LCA, you can use the CPROBABILITIES option of the SAVEDATA command to save the posterior probabilities and most likely class membership. You can use the CROSSTABS option of the OUTPUT command to do crosstabs between most likely class membership for time 1 and 2, 2 and 3, etc. to see what the transitioning looks like.
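If the saved class memberships are inspected outside Mplus, the crosstab can be sketched in a few lines of Python. This is only an illustration: the two lists below are hypothetical most-likely class assignments, assumed to be already merged by case ID, and the function name is made up for the example. Crosstabs of most likely class ignore classification uncertainty, so they are a descriptive first look rather than model estimates.

```python
from collections import Counter

def transition_crosstab(class_t1, class_t2, n_classes):
    """Counts and row proportions of time-1 class (rows) by time-2 class (columns)."""
    counts = Counter(zip(class_t1, class_t2))
    table = [
        [counts.get((r, c), 0) for c in range(1, n_classes + 1)]
        for r in range(1, n_classes + 1)
    ]
    row_props = []
    for row in table:
        total = sum(row)
        row_props.append([x / total if total else 0.0 for x in row])
    return table, row_props

# Hypothetical most-likely class memberships for eight cases.
t1 = [1, 1, 2, 2, 3, 3, 1, 2]
t2 = [1, 2, 2, 2, 3, 1, 1, 3]
table, props = transition_crosstab(t1, t2, 3)
for row in table:
    print(row)
```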


Hi Dr. Muthen, I am at the point in the LTA building process where I am testing for invariance between fully constrained and partially constrained models (I have a 3 class solution at 3 time points). While I am able to free and constrain classes at different time points, I am having trouble figuring out how to constrain individual variables in each class at each time point in my analysis. I have written the following syntax. Is this how I would constrain individual variables in each class at a single time point? CLASSES = C1(3) C2(3) C3(3); USEVAR = PYYOUDRK EVFOOLSH EVACCID EVINJSOM EVFIGHTS m6q21 m6q59 m6q73 m6q74 m6q61 q21 q59 q73 q74 q61; CATEGORICAL = PYYOUDRK EVFOOLSH EVACCID EVINJSOM EVFIGHTS m6q21 m6q59 m6q73 m6q74 m6q61 q21 q59 q73 q74 q61; ANALYSIS: TYPE = mixture; MODEL: %Overall% MODEL C1: %C1#1% [PYYOUDRK$1* EVFOOLSH$1* EVACCID$1@1 EVINJSOM$1* EVFIGHTS$1*]; %C1#2% [PYYOUDRK$1* EVFOOLSH$1* EVACCID$1@2 EVINJSOM$1* EVFIGHTS$1*]; %C1#3% [PYYOUDRK$1* EVFOOLSH$1* EVACCID$1@3 EVINJSOM$1* EVFIGHTS$1*]; Thank you for your time. 


What you show under MODEL C1 is the default where thresholds are free across classes. With mixture modeling if you want to test the equality of thresholds, instead of constraining the thresholds to be equal across classes which can cause the classes to change, label the threshold parameters and use MODEL TEST to test the equalities. 


Dear Drs. Muthen, I have a question about the model Mplus estimates when running an LTA with covariates. Based on my reading of the Nylund (2007) dissertation, the coefficients for the multinomial regression that occurs when a covariate is included in the model are for the latent status at a given time point. Stated otherwise, it is the effect of the covariate on latent class/status membership at a given time point. Other programs (e.g., PROC LTA, Collins & Lanza, 2010) provide the effect of the covariate on the transition probabilities. Thus, in Mplus we get the effect on class membership at a given time point for a covariate, whereas in PROC LCA and others it is the effect on transitioning into a given class membership at a given time point. Am I correct in the way I am distinguishing these effects and the model I understand Mplus to be running? If so, is there a way to make it so that Mplus runs the model where the effect of the covariate on transition probabilities is tested? Thank you in advance for your help. Aidan


Mplus does allow for transition probabilities to vary as a function of a covariate. Essentially such a phenomenon is an interaction between the latent class variable, say c1 at time 1, and the x covariate in their influence on the latent class variable c2 at time 2. As usual, an interaction can be viewed as a moderated effect, either by (1) c1 moderating the effect of x on c2 or (2) by x moderating the effect of c1 on c2. Estimates from either approach can be used to compute estimates from the other approach. In Mplus, the transformation can be done in Model Constraint. Approach (1) is shown in UG ex 8.13 with the broken line from c1 to the arrow from x to c2 indicating the interaction through c1 moderating x's influence on c2. Approach (2) is shown in UG ex 8.14, where c takes the role of x. The c variable can be latent as shown in that example (this is not possible in proc lta as far as I understand), or it can be observed-categorical. The observed case is handled by using the Knownclass approach making the observed x identical to the latent class variable. An example of this approach is given in the Topic 6 handout of 8/17/2009, slides 48-50. That's an example where x is a binary treatment/control variable in an intervention. Various intervention effects of interest are expressed using new parameters defined in Model Constraint. Approach (2) is used in proc LTA and does not use a latent c. An illustration is given in the Lanza-Collins (2008) article in Dev Psych. Their x is binary, representing past-year drunkenness. This model can also be done in Mplus.
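To make approach (1) concrete, the Python sketch below computes a 2 x 2 transition table at several covariate values. It assumes a logit parameterization in the style of ex 8.13 with the last class as reference: an intercept a for c2#1, a slope b of c2#1 on c1#1, and a slope of c2#1 on x that differs by c1 class (g1, g2). All parameter values are hypothetical, and the exact Mplus parameterization should be checked against the User's Guide.

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def transition_matrix(a, b, g1, g2, x):
    """P(c2 | c1, x) for two classes, with class 2 as the reference class.

    a  : intercept of c2#1
    b  : slope of c2#1 on c1#1
    g1 : slope of c2#1 on x within c1 class 1
    g2 : slope of c2#1 on x within c1 class 2
    """
    p11 = inv_logit(a + b + g1 * x)   # c1 = 1 -> c2 = 1
    p21 = inv_logit(a + g2 * x)       # c1 = 2 -> c2 = 1
    return [[p11, 1.0 - p11], [p21, 1.0 - p21]]

# Hypothetical estimates, for illustration only.
a, b, g1, g2 = -1.0, 2.5, 0.5, -0.3
for x in (0.0, 1.0, 2.0):
    rows = transition_matrix(a, b, g1, g2, x)
    print(x, [[round(p, 3) for p in row] for row in rows])
```

When g1 equals g2 the log odds ratio between the two rows stays equal to b at every x, which is the no-interaction case; unequal slopes let the transition table change shape with x.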


Thank you very much for the reply and for directing me to the handout. As a follow up, is it also possible to do this with an observed continuous variable instead of an observed categorical? Presumably Knownclass wouldn't be the appropriate choice there, but something else? Thanks again, very helpful. Aidan 


In principle yes, via approach (1); the Mplus approach (2) could not be used unless you categorize it (more than 2 categories are possible). But it is probably wise to first dichotomize it and use approach (2) as in the handout. As you saw in the Lanza-Collins article, even approach (2) with a binary x sometimes has problems in practice.


I should add that a more advanced way of doing approach (2) with a continuous covariate x in Mplus is to use the Constraint=x option in the Variable command. This is then applied to the c2 on c1 regression. For an example, look for quantitative trait locus in the index of the UG. 

F Lamers posted on Wednesday, April 06, 2011  10:06 am



I’m modeling an LTA with 3 classes at two time points and I am in the process of evaluating measurement invariance. The 3 classes have the same interpretation at the two time points, but some of the items turn out not to be invariant across measurements, so if I understand correctly I should use partial measurement invariance in my final LTA model. The 3 items (out of 10) that aren’t invariant don’t change the interpretation of the classes. I’ve seen some studies having the same situation enforce full MI in the final model, because of the conceptual similarity of classes and to aid interpretability. Is assuming full MI justifiable in such a situation? Are there any serious drawbacks to this approach? 


I would not enforce full invariance if you found partial invariance. I would allow for the partial invariance. The interpretation of the transitions is still valid if you model the partial invariance.


Dear Professors, I am fitting an LTA with 3 manifest indicators, 3 classes and 6 time points. I am trying to establish whether there is stationarity and measurement invariance over time. Based on BIC, I come to the conclusion that item response probabilities do not vary over time but transition probabilities are time heterogeneous. However, I get the following message: ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. I checked through TECH1 which parameters these are and they all refer to the betas from regressing the latest time point latent variable on the previous one. All of them have s.e. = 0. 1) Should I worry? If yes, should I impose restrictions on the transition probabilities? 2) Can I try the same model with 4 latent classes? Many thanks, Artemis


If the betas are large, this means that you have transition probabilities of zero or one which is not a problem. I don't think increasing the number of classes would change this. 


Thanks very much Linda, this is very helpful! 

F Lamers posted on Thursday, April 14, 2011  10:21 am



Thanks for your answer, Linda! I will use partial MI in my LTA model then. I have one other question: Can you do a mover/stayer model, when you have partial measurement invariance? 


Yes. 


Dear Dr Muthén, I have time series data on dyads (mother-child, with 3 categorical variables) with the number of time points for each dyad varying between 10-30. First, can I perform LTA with up to 30 time points? Second, can I have a varying number of time points in LTA? Many thanks, Håkan


Having an LTA with 30 time points would be very computationally demanding. I would suggest using fewer time points, for example, early, middle, and late. The varying number of time points can be handled by missing data. 


Thank you! The research question is such that I need all time points though. 


Then LTA would be very difficult. You could do two or three time points at a time. 


Dear Professors, I am trying to fix the transition probability from C1#2 to C2#1 (0.011) at zero. I am not sure what the model specification is: [c2#1]; c2 ON c1; Do you know where I could find an example? I am trying to deal with inconsistencies when a respondent reports having some experience with a drug, and then at a later occasion reports never having tried it.

C1 Classes (Rows) by C2 Classes (Columns)
         1      2      3
1    0.815  0.092  0.093
2    0.011  0.848  0.141
3    0.153  0.040  0.807

Thank you in advance, Sebastian


See the following paper which is available on the website: Kaplan, D. (2008). An overview of Markov chain methods for the study of stage-sequential developmental processes. Developmental Psychology, 44, 457-467.
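The link between a transition table and the underlying multinomial logits can be sketched in Python, using the estimated table quoted in the question. The sketch assumes the last c2 class is the reference category, works with the combined logit for each row, and uses -15 as an illustrative "effectively zero" logit value. How that combined logit is split between the c2#1 intercept and the c2 ON c1 slope depends on the Mplus parameterization, which this sketch does not reproduce.

```python
import math

# Estimated transition probabilities quoted in the question
# (rows: c1 classes, columns: c2 classes).
P = [
    [0.815, 0.092, 0.093],
    [0.011, 0.848, 0.141],
    [0.153, 0.040, 0.807],
]

def logits_from_row(row):
    """Logits of each c2 class against the last (reference) c2 class."""
    ref = row[-1]
    return [math.log(p / ref) for p in row[:-1]]

def row_from_logits(logits):
    """Inverse mapping: multinomial logits back to a row of probabilities."""
    expo = [math.exp(z) for z in logits] + [1.0]   # reference logit is 0
    total = sum(expo)
    return [e / total for e in expo]

row2_logits = logits_from_row(P[1])        # logits for cases starting in C1#2
print([round(z, 3) for z in row2_logits])

# Replacing the first logit by a large negative value drives that
# transition probability to essentially zero.
row2_fixed = row_from_logits([-15.0, row2_logits[1]])
print([round(p, 6) for p in row2_fixed])
```

In an actual re-estimation the remaining logits would change as well; the sketch only shows why a logit near -15 corresponds to a transition probability of practically zero.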


Thank you! 


Dear Professors, I am trying to replicate Example 10.12. I do not want yet to include any covariates at the individual or cluster level. I have CLASSES = C1(3) C2(3) C3(3) C4(3) C5(3); In the overall part of the between part of the model, I wonder if the following code for my problem is correct: C2#1 ON C1#1; C3#1 ON C2#1; C4#1 ON C3#1; C5#1 ON C4#1; C1#1 C2#1 C3#1 C4#1 C5#1; If yes then I get the following message: THERE IS NOT ENOUGH MEMORY SPACE TO RUN THE PROGRAM ON THE CURRENT INPUT FILE. THE ANALYSIS REQUIRES 5 DIMENSIONS OF INTEGRATION RESULTING IN A TOTAL OF 0.75938E+06 INTEGRATION POINTS. THIS MAY BE THE CAUSE OF THE MEMORY SHORTAGE. YOU CAN TRY TO FREE UP SOME MEMORY BY CLOSING OTHER APPLICATIONS THAT ARE CURRENTLY RUNNING. NOTE THAT THE MODEL MAY REQUIRE MORE MEMORY THAN ALLOWED BY THE OPERATING SYSTEM. REFER TO SYSTEM REQUIREMENTS AT www.statmodel.com FOR MORE INFORMATION ABOUT THIS LIMIT. I have tried 4 timepoints but the problem persists and then 2 timepoints, the program runs but with warning msgs of nonidentification and se's that they cannot be computed. I read the paper entitled ‘Multilevel Mixture Models’ and I wonder if it is a problem the fact that I have 184 clusters with 114 individuals in each of them. Any advice from you would help greatly. With Kind Regards, Artemis 


A couple of points: Saying C2#1 on C1#1 is only correct if both latent class variables have 2 categories. Use the general form C2 on C1. There should be no need for integration in this model unless you have a continuous latent variable which is not in ex 10.12. 5 timepoints with 3 classes each leads to very heavy computations. 


Dear Prof Muthen, Thanks so much for the speedy response; can I just confirm please that I understand correctly? So in the overall part of the between part of the model, is the following code correct, if all 5 latent class variables have 3 categories? C2 ON C1; C3 ON C2; C4 ON C3; C5 ON C4; C1#1 C2#1 C3#1 C4#1 C5#1; C1#1 C2#2 C3#2 C4#2 C5#2; Can you please advise? All latent variables are categorical. I think I can guess what you mean about very heavy computations. Thanks again, Kind Regards, Artemis


The statement (with one correction) C1#1 C2#1 C3#1 C4#1 C5#1; C1#2 C2#2 C3#2 C4#2 C5#2; refers to the random intercepts (so continuous latent variables) of the within-level categorical latent variables C1-C5. They do not refer to categorical variables. That's a very high dimensionality (=10) which is difficult to work with. With 2 timepoints you would get 4 dimensions and will already then need a factor trick like in Henry and Muthen (2010).
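The dimension counts in this reply, and the integration-point total in the earlier error message, follow from simple arithmetic. The Python sketch below assumes 15 integration points per dimension; that figure is an assumption chosen because it reproduces the reported total of 0.75938E+06 for 5 dimensions. The function names are illustrative.

```python
def random_intercept_dimensions(n_timepoints, n_classes):
    """One random intercept per non-reference class per time point."""
    return n_timepoints * (n_classes - 1)

def integration_points(dimensions, points_per_dimension=15):
    """Total grid points for numerical integration over all dimensions."""
    return points_per_dimension ** dimensions

print(integration_points(5))                   # matches the 0.75938E+06 in the message
print(random_intercept_dimensions(5, 3))       # 5 time points, 3 classes each
print(random_intercept_dimensions(2, 3))       # 2 time points, 3 classes each
print(integration_points(random_intercept_dimensions(5, 3)))
```

The last line shows why 10 dimensions is impractical on a full grid: the count grows exponentially with the number of random intercepts.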


Thank you so much Prof Muthen for all your valuable comments, I will read again also the paper entitled 'Multilevel Latent Class Analysis: An Application of Adolescent Smoking Typologies with Individual and Contextual Predictors' as I see there are examples of MPLUS codes for both parametric and non parametric approaches for estimating an MLCA. Hopefully I can edit these and estimate a reasonable MLTA. Thanks again! 

csulliva posted on Saturday, July 23, 2011  8:31 am



I am trying to run a 2-wave LTA model with covariates but am getting a "non-positive definite" message regarding the standard errors. Looking at Tech 1, it appears that the problem is with a transition estimate where, if you look at the output, there don't seem to be any cases making that designated transition. I did run a model where I tried to fix that parameter but it doesn't seem to have helped with that issue (the same warning appears with a different parameter number). I have two questions: (a) how would I work with that parameter to determine whether it's a problem? It seems that only that part of the multinomial estimates is problematic. (b) This model is intended to be a precursor to a mover-stayer model. Will the use of that type of model alleviate this problem, as that second-order latent class variable is designed to capture those cases in the stayer class?


Typically, when no one transitions there will be an extreme estimate for a logit parameter and the program fixes it, avoiding the singular information matrix (SE) issue you refer to. So I am not sure why you have a problem here; I think you need to send it to support.


Thank you. Support answered the question. I had a quick follow-up on the Mover-Stayer model. Basically, I'm trying to follow ex. 8.14, but am wondering what changes to the input need to be made to accommodate (a) latent class variables with three rather than two classes and (b) the highlighted portions of the within-class specifications on page 226 with alternative levels of measurement (I have categorical, inflated count, and censored items).


We don't have an example of that. Try generalizing it yourself and if you have problems contact support@statmodel.com. 

csulliva posted on Sunday, July 31, 2011  6:35 pm



Thank you. After reviewing a comment above in response to a question from 3/2/11, I decided it might be more straightforward to look at covariate interaction effects on the transition probabilities based on example 8.13. It appears that the model runs, but I am getting a message that "the sample covariance of the independent variables in class 2 is singular." 


You may want to take a look at the new note we just wrote on this topic, Muthén, B. and Asparouhov, T. (2011). LTA in Mplus: Transition probabilities influenced by covariates. Mplus Web Notes: No. 13. July 27, 2011. This explains how ex 8.13 can be used for your purpose. It sounds like you have covariates and that in class 2 there is no variation in one of them (everybody in this class having the same covariate value). This is ok, at least if it makes substantive sense.
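The situation described in this reply can be illustrated with a small Python sketch: when one covariate takes the same value for every case in a class, its within-class variance and covariances are zero, so the covariance matrix has determinant zero. The data below are hypothetical (an age-like covariate and a constant binary covariate), and the helper functions are written for the two-covariate case only.

```python
def sample_covariance(rows):
    """Sample covariance matrix of the columns of `rows` (list of tuples)."""
    n = len(rows)
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical cases assigned to one class; the second covariate is constant.
class2 = [(10.0, 1.0), (12.0, 1.0), (11.0, 1.0), (14.0, 1.0)]
cov = sample_covariance(class2)
print(cov)
print(det2(cov))  # zero determinant: the matrix is singular
```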

Julia Lee posted on Tuesday, August 16, 2011  12:07 pm



I'm in the process of planning my analysis using both latent profile analysis and latent transition analysis in my study. I am trying to understand the use of covariates and concurrent variables in both analyses. Regarding LTA, the Mplus manual example 8.13 shows a diagram with covariate x. Q1. I am assuming that the covariate is an antecedent variable. Is this correct? Q2. If I am planning a study that examines the transition of classes from Time 1 (fall of Grade 1) to Time 2 (spring of Grade 1), are concurrent variables, i.e., variables that were administered at Time 1 in the fall of Grade 1, feasible? Because Grade 1 data is all that I have, I am hoping to use the variables in the fall of Grade 1 to predict the latent transition from fall to spring of Grade 1. Based on a paper recommended on the Mplus website, i.e., Marsh et al. (2009), I understand that in Mplus, the term covariate refers strictly to antecedents. I would like to iron out any misconceptions I still have in my mind. Q3. I would also be most grateful if you would recommend papers on covariates and concurrent variables and time-varying and time-invariant predictors in LTA analyses. Thank you very much!


Covariates are variables for which your model doesn't specify a relationship. They can be time-invariant, antecedent, or time-varying. All these types can be used in LTA with Mplus. You find LTA papers using Mplus on our web site under Papers, Latent Transition Analysis. You may be interested in the new Mplus Web Note #13 describing LTA, http://www.statmodel.com/examples/LTAwebnote.pdf

Julia Lee posted on Thursday, September 08, 2011  2:07 pm



I edited the syntax on Example 8.14 on page 226 to analyze 4 classes with continuous variables instead of categorical variables for the LTA mover-stayer model. I would appreciate your input regarding the correct syntax for the Model c.c1 and Model c.c2 for continuous variables. The highlighted sections (i.e., the syntax on measurement error) must be where I had a problem because my Mplus software ran for 8 hours without converging and I had to cancel the analysis. This highlighted section is easier to understand in terms of categorical variables; I am still trying to understand it in terms of continuous variables. I would be most grateful if you would provide some insight/explanation. Thank you very much.


Please send your input to support. 


Hi there, I am running an LTA with two time points with four classes at each time. I have received the message: "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX..." Examining the TECH1 output, I see the issue is with a parameter listed in the beta parameterization matrix. In looking at the regression coefficient itself, I don't see an issue, per se. That is, the estimate does not look problematic. What more can I do to identify and fix the problem? Is there a resource that might have some tips? Any ideas would be greatly appreciated.


Please send your output and license number to support@statmodel.com. 


Dear Drs. Muthen, I have a question regarding LTA and confounding. I have conducted a multiple group LTA (2 time points and home smoking ban as "known class"). Some of the transition probabilities in my output are counterintuitive and I was wondering if this could be due to possible confounding. Is confounding an issue in GMM and, if it is, how can I account for confounding in a multiple group LTA in Mplus? I will appreciate your help. Thanks a lot, Charu Mathur


First be sure your classes are ordered the same in both groups and at both time points. Confounding can be helped by adding a covariate. See Web Note 13 on the website.

Julia Lee posted on Tuesday, February 28, 2012  1:35 pm



I am conducting LTA mover-stayer. I read Nylund's dissertation. On p.57 she wrote that interpretation of the stayers may not be meaningful without measurement invariance. My question: what if full measurement invariance is not observed? Can the LTA mover-stayer still be used? How should the analysis be conducted or interpreted if full measurement invariance is not met? Thanks. I appreciate your response.


You can still do the analysis, but you just change your interpretation since you don't consider movement among the same classes at the different time points. But you should have measurement invariance across the mover-stayer classes for it to make sense.


Dear Drs. Muthen, I want to compute a transition model with unordered categorical variables at three time points. At every time point there is only one variable. The variables represent different states of school/occupational status and so the categories of the variables aren't equal over time (especially from t1 to t2):

Var1t1: schooltype1, schooltype2, schooltype3, schooltype4
Var2t2: schooltype5, schooltype6, schooltype7, vocational training
Var3t3: schooltype5, schooltype6, schooltype7, vocational training, unemployed

Is it possible to model the transition between those variables over time in a hidden Markov process, or do I have to recode the different categories into dummy variables? I was not able to find an example in the literature that fits this kind of problem. If it's possible, do you have a recommendation for an article where such an LTA/hidden Markov process is performed? Thank you very much for your help!! Andy


Yes, you can model the transitions in a hidden Markov process. See the Topic 6 course handout starting on Slide 15. There is an article cited there. 


Thank you Linda!! 


Dear Mplus team, I am running an LTA with nominal data. Please tell me how to constrain measurement equivalence across the two time points. Thank you! Wen


See Example 8.13. Instead of thresholds referred to with a $ sign, with nominal you have intercepts referred to with a # sign. 


Dear Linda, May I set the reference category of my choice for the intercepts with nominal variables? Thank you. Wen


The highest value is the reference class. You can redefine this using the DEFINE command. 


Dear Linda, I have another question. When I conduct the GMM, may I let Mplus estimate the nonlinear parameters in each class? For example, I set c(4), %c#1% i s | y1@0 y2* y3* y4@1; %c#2% i s | y1@0 y2* y3* y4@1; %c#3% i s | y1@0 y2* y3* y4@1; How do I interpret the four-class outcome? Just as in FMM, if the factor loadings are different in each class, these four classes have different meanings. Am I right? Thank you. Wen


In GMM the goal is to find trajectories that differentiate people. You do not expect to find the same trajectory in each class. You do not compare the means, variances, and covariances of the growth factors across classes. To do this you would need measurement invariance which you would have only if the same growth model was found in all classes. You compare the different trajectory shapes. 


So, I can only set the nonlinear parameters in %overall%, not in all classes. Right? Another problem is that there are many possible settings across classes. For example, I can let the variances of the growth factors be freely estimated in each class or let the residual variances of the Ys be free in each class. Then I choose the best model from BIC or some other indices. So, it is not necessary to hold the variances and covariances of the growth factors equal across classes. Am I right? Thank you. Wen


You need to study this topic to do the analysis. You can see the Topic 6 course handout and video on the website. You might also find the following paper which is available on the website helpful: Jung, T. & Wickrama, K.A.S. (2008). An introduction to latent class growth analysis and growth mixture modeling. Social and Personality Psychology Compass, 2, 302-317.


Dear Dr. Muthen, I am interested in the members of a class A at time 2 and I would like to know in which classes these subjects were at time 1. Therefore, I conducted an LTA and used the final class proportions for the latent class patterns based on the estimated model to calculate class membership probabilities at time 1 conditional on class membership at time 2. Is it statistically correct to do this? Thank you! Maartje Basten 


I don't think this is correct. Use the RESPONSE option in the OUTPUT command and see if this helps you. 


I could not find the RESPONSE option in the OUTPUT command in the users guide. I tried to use the TECH10 option in the OUTPUT command and the RESPONSE option in SAVEDATA command, but both do not work because the indicators in the model are continuous. Is it possible to use LTA to investigate class membership at time 1 conditional on class membership at time 2? What part of the output do I need to calculate these probabilities? Do you have suggestions for further readings? Thank you, Maartje 


I am analyzing the data from a violence prevention intervention study with 4 cohorts. Two cohorts received the intervention and two cohorts serve as a comparison group. I am using a latent profile transition model to group participants based on five measures (i.e., overt aggression, social aggression, social competence, cognitive concentration, and liked by peers). Next, the pre- to posttest transition of individuals between groups is conditioned on their membership in a knownclass (i.e., treatment or comparison cohort). One of the cohorts in the study is significantly different at pretest on the social aggression measure, which leads me to my two questions. First, are there guidelines for using the social aggression measure along with sociodemographic measures in both the latent portion of the model and in the autoregressive statements of the model to control for the pretest differences between groups? Second (and since I have already analyzed the model with and without the social aggression measure as a control), if there is no significant change in the model as a result of including the social aggression measure (indicated by no drop in BIC, no change in LgLkd chi square, and a nonsignificant regression parameter for social aggression), is it defensible to opt for the more parsimonious model without the social aggression measure? Thank you in advance for your time and consideration.


Aaron: This should be posted on a general discussion forum like SEMNET where you will get a broader range of responses.


Maartje, have you tried saving CPROBABILITIES?


Dear Dr Muthén, Thank you for the previous remarks. I want to perform a multiple group analysis to examine whether my LTA model is equal for boys and girls. This is part of my syntax: VARIABLE: CLASSES = cg(2) c1(3) c2(3); KNOWNCLASS = cg (gender=1 gender=2); MODEL: %overall% c2 on c1 cg; c1 on cg; MODEL C1: %c1#1% [var1-var4]; %c1#2% [var1-var4]; %c1#3% [var1-var4]; MODEL C2: %c2#1% [var5-var8]; %c2#2% [var5-var8]; %c2#3% [var5-var8]; MODEL CG: %cg#1% var1-var8; %cg#2% var1-var8; This syntax results in a model with equal thresholds for both groups. Do you know how I can let the thresholds vary across groups? Thank you


You need to use the dot language, for example, MODEL cg.c1: MODEL cg.c2: 


Hello, I am currently trying to create an LTA analysis over 3 different time points. I have already conducted the LCA analysis and have settled on a 3-class solution for T1, a 4-class solution for T2, and a 3-class solution for T3. My problem is this: The variable structure is not identical across all three time points. For instance, I have 6 variables at T1; at T2, three new variables were added to the 6 at T1; and at T3, a few of the variables from T2 were dropped and a couple more were added. This is appropriate cross-sectionally because the appropriateness of the variables changed as the respondents aged. I thought that Mplus was capable of conducting an LTA analysis even if the variable structure was not identical across all time points, but I have not been able to find examples of this. Am I incorrect? Must the variables be identical across time points to conduct an LTA? If not, could I be directed to some sample syntax and/or articles? Thank you, Vic


The variables do not need to be identical over time and can also vary in number. The interpretation of the transitions will be different because you don't have the same classes over time, but that is ok. All you need to do is to relax the measurement invariance restrictions that are shown in the UG examples for LTA. 


Wonderful. Thank you very much. 


Drs. Muthen, Example 8.13 in the current manual is for an LTA model with 2 latent categorical variables with 2 classes each. It is stated that the regression of c2 on x in the overall model gives the effect in a multinomial logistic regression of x on c2 when comparing class 1 to class 2 of c2. Similarly, the regression of c1 on x gives the effect in a multinomial logistic regression of x on c1 when comparing class 1 to class 2 of c1. It is stated that because both c1 and c2 have two classes, there is only one parameter to be estimated for x for each latent categorical variable. My model has 2 latent categorical variables with THREE classes each. When I regress c1 and c2 on covariates, I receive TWO sets of effects for each covariate for each latent categorical variable. How do I interpret this? For example, I see:
C2#1 ON
  INTER_AV   0.074   1.246   0.060   0.952
  EXTER_AV   0.137   0.871   0.157   0.875
  ADVERS11   0.174   0.124   1.407   0.159
C2#2 ON
  INTER_AV   0.780   0.882   0.884   0.377
  EXTER_AV   0.024   0.639   0.037   0.970
  ADVERS11   0.013   0.073   0.173   0.863
The above output (just pasting effects for c2) shows effects of the three covariates on classes of the latent categorical variable c2, but this is comparing what with what? Class 1 to Class 3, and Class 2 to Class 3 of c2? Is Class 3 similar to an omitted class (as in dummy coding)? 


See pages 443-445 of the user's guide. 


Hello! I have a question about a warning I received when conducting an LTA over two time points with 3 classes at each time point. Some of the variables are the same across time, and some are different; I have relaxed the measurement invariance to reflect this. I found that it is helpful to include STScale=1 in other LTAs, so I included it here as well. My starting values are STARTS = 5000 100. I receive the following warning:
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.122D-10. PROBLEM INVOLVING PARAMETER 47.
Now, I was unable to determine exactly what parameter 47 is (the Tech1 output was not terribly helpful here), but I did notice that only 1 case transitioned from T1C2 to T2C1; only 2 cases from T1C3 to T2C1; and 0 cases from T1C2 to T2C3. I think this must be problematic. Is it possible to "fix" the transition from T1C2 to T2C3 since no one transitions here (should I do this for the other transitions with such low counts)? How would I write that in the input? Do you think this may be the problem causing the warning message? What other sorts of problems might I consider? Any advice would be appreciated. Thank you. 


Hi Linda, if I see the following in my output, should I be concerned about the p levels of 999.000? Do those numbers indicate a problem in my model, or are these p values so small simply because the SEs are near zero? Thanks.
Categorical Latent Variables
                 Est     S.E.  Est./S.E.  P-Value
C2#1 ON
  C1#1        30.086    0.000    999.000  999.000
  C1#2        30.439    0.392     77.657    0.000
C2#2 ON
  C1#1         2.629    0.000    999.000  999.000
  C1#2        29.877    0.000    999.000  999.000
C2#1 ON
  INTER_AV     0.074    1.246      0.060    0.952
  EXTER_AV     0.137    0.871      0.157    0.875
  ADVERS11     0.174    0.124      1.407    0.159
C2#2 ON
  INTER_AV     0.780    0.882      0.884    0.377
  EXTER_AV     0.024    0.639      0.037    0.970
  ADVERS11     0.013    0.073      0.173    0.863
C1#1 ON
  INTER_AV     1.000    1.900      0.526    0.599
  EXTER_AV     0.319    3.142      0.101    0.919
  ADVERS11     0.194    0.162      1.198    0.231
C1#2 ON
  INTER_AV     0.246    0.722      0.341    0.733
  EXTER_AV     0.061    0.538      0.113    0.910 


Also, Linda, is there a way to switch the reference category in these LTA models with predictors, so that Class 2 is the reference category instead of Class 3? Many thanks. 


Victoria: Please send your output and license number to support@statmodel.com. 


Lisa: Please send your output and license number to support@statmodel.com. 


Linda, I have explored several predictors of class membership in my LTA models. Some predictors improved model fit (AIC, BIC) and had significant effects on class membership, while other predictors worsened model fit slightly and had no significant effect on class membership. Only one of our predictors was dichotomous: a marker of Ethnicity (0/1) for our two groups. When we added this predictor, we had to specify ALGORITHM=INTEGRATION under the ANALYSIS command to accommodate the dichotomous nature of this variable. However, the fit of the model worsened much more dramatically for this predictor than for the other predictors that ended up not having significant effects. Could this be due to the dichotomous nature of the predictor, or to ALGORITHM=INTEGRATION? Is the fit of a model with ALGORITHM=INTEGRATION reasonable to compare with our unconditional model, which was estimated without it? Or does the ALGORITHM=INTEGRATION specification make the model so different in nature from a model without this line of code that it is not a fair comparison? Thank you. 


It sounds like you have ethnicity on the CATEGORICAL list if this causes numerical integration. The CATEGORICAL list is for dependent variables only. 


Thank you for this point, Linda! 


Hi, I am running a two time point LTA, three classes at each time point. Is it possible to get confidence intervals or standard errors for the class probability estimates and the transition probability estimates? I read above that there was a formula in Chapter 13, but I think the manual must have changed since then because the page numbers given don't exist in Chapter 13. Thanks 


It is now Chapter 14. 


Hi Linda, Thanks. However, I searched through Chapter 14 but I couldn't find where it talks about how to calculate standard errors. O 


To get SEs of transition probabilities, you have to express transition probabilities in terms of your logit parameters in Model Constraint using the formulas of Chapter 14. Version 7 will have a probability parameterization where such SEs are produced directly. 
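
As a rough illustration of the formulas referred to here (Python, not Mplus syntax), the sketch below turns multinomial logit parameters into a transition matrix. It assumes the usual logit parameterization with the last class as reference: intercepts a_k for c2, slopes b_kj for c2#k ON c1#j, and zeros for anything involving a reference class. The function name and all numeric values are made up for illustration; they are not taken from any output in this thread.

```python
import math

def transition_probs(a, b):
    """Return P[j][k] = P(c2 = k+1 | c1 = j+1).

    a[k]    : intercept of c2 class k+1 (0.0 for the reference class)
    b[k][j] : slope of c2#(k+1) ON c1#(j+1) (0.0 where either class is the reference)
    """
    n_c2 = len(a)
    n_c1 = len(b[0])
    rows = []
    for j in range(n_c1):
        # exponentials across the classes of c2 for a given class j of c1
        expo = [math.exp(a[k] + b[k][j]) for k in range(n_c2)]
        total = sum(expo)
        rows.append([e / total for e in expo])
    return rows

# Hypothetical logit values for a 3 x 3 transition table
a = [0.5, -0.3, 0.0]
b = [[2.0, 0.4, 0.0],
     [0.7, 1.5, 0.0],
     [0.0, 0.0, 0.0]]

P = transition_probs(a, b)
for row in P:
    print([round(p, 3) for p in row])
```

Each row sums to 1, and the row for the reference class of c1 depends only on the intercepts. In Mplus itself the same expressions would be written for NEW parameters in MODEL CONSTRAINT, so that standard errors are produced along with the point estimates.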


Hi Bengt, Thanks for the pointers. I am still not sure exactly how to get the standard errors of the transition probabilities. I have been able to use the formulas on pages 446 and 447 of the User's Guide to get from the logit parameters output in MPLUS to the transition probabilities also output, so I know that I am using the formulas correctly. However, I am not sure how to apply these to the standard errors. In the formulas on page 446, the sum is defined as the 'sum of the exponentials across the classes of c2 for c1 = j'. I can do this for the point estimates, for example: sum1 = exp(a1 + b11) + exp(a2 + b21) + exp(a3 + b31), where a1, a2, b11, b21 are all in the output, and a3 and b31 are 0. However, when I try to do this for the standard errors, I use the standard error values corresponding to a1, a2, b11, b21 in the formula, but I do not know what to use for the values of a3 and b31, and so I can't calculate the sum. Regards, O 


You get the SEs automatically for any NEW parameter that you define in Model Constraint; you don't have to do anything. Mplus does it using the Delta method. 


Hi Bengt, Oh I see. Great, thank you, I have got that working now for the transition probabilities. Is there a way to get standard errors for the latent class statuses? Regards, O 


You get SEs for the latent class statuses expressed as logit parameters. You can transform those logits to probabilities in Model Constraint as described in Chapter 14 of the UG (in version 7 you can request probability parameterization and get this directly). 


I am sorry, can you tell me exactly what page in the User's Guide? I have looked in Chapter 14, and I can only find formulas for the latent class statuses conditional on covariate x. I wish to get standard errors for the unconditional latent class status probabilities, as per the output 'FINAL CLASS COUNTS AND PROPORTIONS FOR EACH LATENT CLASS VARIABLE BASED ON THE ESTIMATED MODEL'. Thanks, O 


The bottom of page 443, top of page 444 discuss the case where all the x's are zero and therefore use only the intercepts. You can follow this example for your case without covariates. 

Jon Heron posted on Friday, September 14, 2012  8:18 am



I just programmed this earlier in the week. Say you have a 4-class variable X:
Model: %overall%
[x#1] (cp1);
[x#2] (cp2);
[x#3] (cp3);
model constraint:
new(temp_c1 temp_c2 temp_c3 sum p_c1 p_c2 p_c3 p_c4);
temp_c1 = exp(cp1);
temp_c2 = exp(cp2);
temp_c3 = exp(cp3);
sum = 1 + temp_c1 + temp_c2 + temp_c3;
p_c1 = temp_c1/sum;
p_c2 = temp_c2/sum;
p_c3 = temp_c3/sum;
p_c4 = 1/sum; 

Jon Heron posted on Friday, September 14, 2012  8:21 am



here's the extra output you get:
New/Additional Parameters
TEMP_C1    9.081   0.741   12.261   0.000
TEMP_C2    1.187   0.179    6.616   0.000
TEMP_C3    1.828   0.212    8.639   0.000
SUM       13.096   1.021   12.826   0.000
P_C1       0.693   0.011   65.074   0.000
P_C2       0.091   0.010    9.117   0.000
P_C3       0.140   0.010   13.587   0.000
P_C4       0.076   0.006   12.826   0.000
and here's the latent class distribution you usually get:
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THE ESTIMATED MODEL
Latent Classes
1   2270.19508   0.69340
2    296.87514   0.09068
3    456.92251   0.13956
4    250.00727   0.07636
I do wish there was a truetype font option on the forum :( 
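
The arithmetic behind this output can be checked by hand. The Python sketch below (an illustration only, not Mplus syntax) recomputes the class probabilities from the three TEMP estimates, and applies the Delta method to the simplest case, p_c4 = 1/sum, using the reported S.E. of SUM. Only values printed in the output are used; the variable names are arbitrary.

```python
# Estimates copied from the New/Additional Parameters output
temp = [9.081, 1.187, 1.828]      # exp(cp1), exp(cp2), exp(cp3)
total = 1 + sum(temp)             # the reference class contributes exp(0) = 1
probs = [t / total for t in temp] + [1 / total]

# Delta method for p_c4 = 1/sum: the derivative of 1/s is -1/s**2,
# so SE(p_c4) is approximately SE(sum) / sum**2
se_sum = 1.021                    # S.E. of SUM from the output
se_p4 = se_sum / total ** 2

print(round(total, 3))
print([round(p, 3) for p in probs])
print(round(se_p4, 3))
```

To three decimals this gives the SUM, P_C1 to P_C4 and the S.E. of P_C4 shown in the output. It also explains why SUM and P_C4 share the same Est./S.E. ratio. The standard errors of P_C1 to P_C3 cannot be checked this way because they also depend on the covariances among the logit estimates, which are not shown.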


Hi Jon and Bengt, Hmm, thanks, that is what I tried to do but it doesn't seem to be working, so I thought I must have been looking at the wrong thing. For example, I have a three class variable c1 at time 1, so I used the following code (extra stuff on transitions omitted):
MODEL: %OVERALL%
[c1#1] (a_1);
[c1#2] (a_2);
MODEL CONSTRAINT:
NEW(tempa_1 tempa_2 proba_1 proba_2 proba_3 suma_12);
tempa_1 = exp(a_1);
tempa_2 = exp(a_2);
suma_12 = 1 + tempa_1 + tempa_2;
proba_1 = tempa_1/suma_12;
proba_2 = tempa_2/suma_12;
proba_3 = 1/suma_12;
Which gives:
PROBA_1 0.826
PROBA_2 0.174
PROBA_3 0.000
But these don't match the previous output from FINAL CLASS COUNTS AND PROPORTIONS FOR EACH LATENT CLASS VARIABLE BASED ON THE ESTIMATED MODEL:
C1
1 0.24989
2 0.47318
3 0.27693
So I am not sure what I have done wrong? Thanks again, O 


Are you sure you don't have covariates in the model? You can send your output to Support. 


Hi Bengt, Sorry, I misread your earlier message and thought you said it could be used for the case with covariates. At least now I know why it wasn't working. Is there a way to calculate the probabilities when covariates are present? O 


See page 444 in the UG. 


The formulas on page 444 are for the probabilities at specific values of the covariates: in the two examples, there is the case where all covariates = 0, and then the case where all covariates = 1. However, I have tried both of these and neither gives me the probabilities output by MPLUS under 'FINAL CLASS COUNTS AND PROPORTIONS FOR EACH LATENT CLASS VARIABLE BASED ON THE ESTIMATED MODEL'. I think what I need is a way to calculate the marginal probabilities (and s.e.), not those for a specific value of the covariates. Basically what I need is a way to calculate the s.e. for the 'Proportion for each latent class variable based on the estimated model'. Is there a way to do this? 

Jon Heron posted on Monday, September 17, 2012  6:33 am



Presumably if your covariate is categorical then this would just be a weighted sum of those two figures you have just derived, with the weights depending on the distribution of your covariate. 


With covariates, the marginal probabilities of the latent classes are not parameters in the model so this is not straightforward to get. As Jon says, the point estimates of the probabilities can be obtained by computing the probabilities for each subject's covariate values and averaging, but I see no easy way to get the SEs. Some would argue that you get the class probabilities and their SEs from the unconditional model and that the conditional model is used for explaining the class membership; problem is that the class probabilities may change between these two models, but the reason for this can be explored. 
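
The averaging described above can be sketched in a few lines of Python (an illustration only, not Mplus syntax). It assumes a single covariate x and a multinomial logit model with the last class as reference; the function names, the intercepts, the slopes and the covariate values are all hypothetical. As noted above, this yields point estimates only, not standard errors.

```python
import math

def class_probs(x, intercepts, slopes):
    """Class probabilities at covariate value x; the last class is the reference."""
    logits = [a + g * x for a, g in zip(intercepts, slopes)] + [0.0]
    m = max(logits)                       # subtract the max for numerical stability
    expo = [math.exp(v - m) for v in logits]
    total = sum(expo)
    return [e / total for e in expo]

def marginal_probs(xs, intercepts, slopes):
    """Average the conditional class probabilities over the observed covariate values."""
    n_classes = len(intercepts) + 1
    sums = [0.0] * n_classes
    for x in xs:
        for k, p in enumerate(class_probs(x, intercepts, slopes)):
            sums[k] += p
    return [s / len(xs) for s in sums]

# Hypothetical 3-class model with one binary covariate
intercepts = [0.2, -0.5]
slopes = [1.0, -0.8]
xs = [0, 0, 1, 1, 1]                      # covariate values for five subjects

marg = marginal_probs(xs, intercepts, slopes)
print([round(p, 3) for p in marg])
```

With a binary covariate this reduces to the weighted sum Jon mentions: here 40% of the weight goes to the probabilities at x = 0 and 60% to those at x = 1.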


OK, thank you Jon and Bengt for your continued patience with all my questions, which you have answered more than satisfactorily. 


Hi, I have a query. I have a data set with 4 time points and 3 choices (political parties, vote choice), n = 1,400. I want to do a mover-stayer analysis. What example should I use? 


I am not sure if your observed variables are nominal and if you have more than one observed variable per time point. Look at the Version 7 UG on our web site, ex8.15. Also, read the Langeheine & van de Pol, 2002 reference given there. 


Thanks for the reply. Yes, my observed variables are nominal, one per time point. I tried to replicate example 8.15 and I got an error:
*** ERROR in ANALYSIS command
Unrecognized setting for PARAMETERIZATION option: PROBABILITY
I have Mplus 7. 


Check the top of your output to be sure you are using Version 7. You may have more than one version of Mplus on your computer. 

Laure posted on Thursday, November 08, 2012  1:46 am



Dear Linda and Bengt I am running a LTA with 15 binary variables, 3 timepoints, 3 latent classes and a covariate. I would like to estimate the missing values of the variables based on the existing information of the other timepoints. My syntax is based on ex8.13part2.inp and I am using Mplus 7. Could you please give me an example of how to specify the syntax for the imputation of the missing values? Thank you so much. 


See Example 11.5 and also the section in the user's guide for DATA IMPUTATION. 

Laure posted on Saturday, November 10, 2012  5:39 am



Thank you, Linda. Unfortunately, with ex11.5 the following warning occurred:
*** FATAL ERROR THE CONVERGENCE CRITERION IS NOT SATISFIED. INCREASE THE MAXIMUM NUMBER OF ITERATIONS OR INCREASE THE CONVERGENCE CRITERION. PROBLEM OCCURRED DURING THE DATA IMPUTATION. YOU MAY BE ABLE TO RESOLVE THIS PROBLEM BY SPECIFYING THE USEVARIABLES OPTION TO REDUCE THE NUMBER OF VARIABLES USED IN THE IMPUTATION MODEL. SPECIFYING A DIFFERENT IMPUTATION MODEL MAY ALSO RESOLVE THE PROBLEM.
I would not like to reduce the number of variables unless it is absolutely necessary. What can I do to remedy this issue? For information: with only two timepoints ex11.5 works fine. 


Please send the output and your license number to support@statmodel.com. 


Hi! I am running an LTA following the Nylund dissertation, but I get:
ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL VARIABLES IN THE MODEL. THE FOLLOWING PARAMETERS WERE FIXED…
Also, I am running a mover-stayer LTA for three time points using a probability parameterization (following example 8.15), but I get:
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.375D-16. THIS MAY ALSO BE DUE TO LARGE THRESHOLDS. DECREASING (INCREASING) LOGHIGH (LOGLOW) MAY RESOLVE THIS PROBLEM. LARGE THRESHOLDS WERE FOUND… 


Please send the outputs and your license number to support@statmodel.com. 

sojung park posted on Monday, November 19, 2012  4:21 pm



Hi, I also ran into the same problem as above ========================================= ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. THE FOLLOWING PARAMETERS WERE FIXED: 37 38 39 40 41 42 43 44 ========================================= Could you please help me with this problem? thank you so much! 


Please send the output and your license number to support@statmodel.com. 


Hope this is the right topic heading. What kind of analysis is the following: Categorical variables measured at baseline (e.g. big, heavy, smart, wise) would give you something like two latent classes -> say, weight and IQ; you also measure change (i.e. not using the same set of variables as at baseline) as in bigGER, heavIER, smartER and wiseR at some point later on -> that also gives two classes, as in weightIER and IQIER ;). What's of interest is how class membership at baseline affects the class membership of change, and how these are associated with a later outcome, say life expectancy (could be categorical or continuous). Would it be latent transition analysis with distal outcomes, of sorts? Can one model it? Would one use the AUXILIARY setting? Cheers! 


Sounds like LTA with an added distal outcome. It's doable in Mplus. 

Andy Daniel posted on Wednesday, January 23, 2013  6:49 am



Hi, I have a question concerning a Hidden Markov Chain Model with more than 2 unordered (nominal) categories in the variables. I have two nominal variables, one at each timepoint. Before building a Markov chain I have to define a latent variable at each timepoint with the nominal variable as an indicator. My question is: how do I set up the measurement model? Do I have to define a latent class for each category, with the number of thresholds (k-1) defined as in the examples for ordinal variables, just replacing the $ with #? Here is my example for the first timepoint (nominal variable, 5 categories):
!Measurement Model for Latent Variable c1
MODEL c1:
%c1#1% [var1#4];
%c1#2% [var1#4];
%c1#3% [var1#4];
%c1#4% [var1#4];
%c1#5% [var1#4];
If I do this for the two timepoints Mplus produces an output, but I'm not sure if the results are valid. Many thanks for your help!! Best regards Andy 


You have to mention all 4 nominal intercepts, not just the last one. 

Andy Daniel posted on Thursday, January 24, 2013  1:39 am



Thank you for the quick answer!! You mean like this:
MODEL c1:
%c1#1% [var1#4]; [var1#3]; [var1#2]; [var1#1];
%c1#2% [var1#4]; [var1#3]; [var1#2]; [var1#1];
%c1#3% [var1#4]; [var1#3]; [var1#2]; [var1#1];
%c1#4% [var1#4]; [var1#3]; [var1#2]; [var1#1];
%c1#5% [var1#4]; [var1#3]; [var1#2]; [var1#1]; 


Yes but you want to place equalities as shown in Example 8.13. 

Andy Daniel posted on Monday, January 28, 2013  5:53 am



Thank you Linda!! 

Tania Wood posted on Wednesday, January 30, 2013  6:46 am



Dear Mplus team, I'm working on an LTA model with covariates and have tried to follow the input used by Nylund in her thesis. My input looks like this:
MODEL: %OVERALL%
ptype4 ON ptype2;
MODEL ptype2:
ptype2#1 ptype2#2 ptype2#3 ptype2#4 ON zwrkhrs2 sw2anx;
ptype4#1 ptype4#2 ptype4#3 ptype4#4 ON zwrkhrs4 sw4anx anybaby ONSSECch;
MODEL ptype2:
%ptype2#1% etc, etc.
I keep getting the error message:
*** ERROR in MODEL command
No OVERALL or class label for the following MODEL statement(s): PTYPE2#1 PTYPE2#2 PTYPE2#3 PTYPE2#4 ON ZWRKHRS2 SW2ANX;
and I can't work out what I'm doing wrong. I've tried putting the model statement on the same line and adding % to the class labels but I get the same error message. I'd be really grateful for any ideas. Tania Wood 


I think the problem is MODEL ptype2: After MODEL should come the name of the categorical latent variable in the following format if c is the categorical latent variable named in the CLASSES statement: %c#1% for class one. It should be MODEL %c#1%: If this doesn't help, send your output and license number to support@statmodel.com. 


Hello, I am running a LTA model with continuous covariates and would like to utilize the new LTA calculator function. The model runs without error, but I am unable to click on the LTA calculator in the drop down menu. Is there something I need to write in the syntax to make this option available? Thanks! 


Do you say type = plot2? The LTA calculator option sits under the "Mplus" menu. 


Hi, Linda I am running SEM analysis using WLSMV estimation method. I have three latent variables for IVs and three mediating variables (they are all continuous) and one latent variable for DV (three items were categorical). I have a question about the indirect effect. The Mplus example code shows that I can include a code for indirect effect. However, when I read over posting on the website, so many people talk about the bootstrapping method. It was my understanding that the indirect effect on the output can be used for the report of the results, correct? Or should I use the code for bootstrapping to test the indirect effects? If I need to use this bootstrapping method, how to set the code? Thanks for your time in advance. 


Bootstrapping is used for small samples because the indirect effect may be nonnormal. See the BOOTSTRAP option in the user's guide. 


Hello, I am having some trouble with an LTA with 3 time points. There are 3, 3 and 4 classes at the respective time points; the classes represent psychological disorder classes. I have two covariates, gender and a continuous, time-varying covariate. I have been through examples 8.13 and 8.14 as well as the webnote and have developed the following input (excluding threshold constraints to save space):
MODEL: %Overall%
C1 ON Gender ACES04;
C2 ON C1 Gender ACES04 ACES48;
C3 ON C2 Gender ACES04 ACES48 ACES812;
MODEL C1:
%C1#1% C2 ON Gender ACES04 ACES48;
%C1#2% C2 ON Gender ACES04 ACES48;
%C1#3% C2 ON Gender ACES04 ACES48;
MODEL C2:
%C2#1% C3 ON Gender ACES04 ACES48 ACES812;
%C2#2% C3 ON Gender ACES04 ACES48 ACES812;
%C2#3% C3 ON Gender ACES04 ACES48 ACES812;
I have tried this with both parameterizations from the webnote. Both gave me the logits and odds ratios predicting class membership at each time point and for transitioning to each class, given specific Latent Class Patterns (i.e., 111, 121, 131, 211, 311). However, what I am really looking for is whether or not the transition probabilities from each of the two transition matrices (3x3 and 3x4) are dependent on my covariates. Do these parameters have to be created manually or is there a way to print them? 


Please send your output and license number to support@statmodel.com. 


Hello! I am trying to run an LTA with two concurrent growth models (ie., childhood ADHD and Depression symptoms) that predict a second set of two concurrent growth models (ie., early adolescent ADHD and Depression symptoms). I am evaluating the transitions from childhood symptoms to adolescent symptoms. I also have binge eating in late adolescence as an outcome. My problem is with testing significant mean differences on the outcome among classes within each growth model. I was able to identify how to place binge eating in the model as an outcome, based on syntax referenced in Karen Nylund's dissertation. However, I understand from the discussion board that I need to use the Wald test in Model Test to identify statistical means differences across classes. Unfortunately, I keep getting this warning: "WALD'S TEST COULD NOT BE COMPUTED BECAUSE OF A SINGULAR COVARIANCE MATRIX." I may have too many empty cells for these tests to be estimated. Do you have any suggestions for how to remedy this? Thank you for your help! Kathryn 


Please send the output and your license number to support@statmodel.com. 


Hi, I am trying to run an LTA, but have problems with testing measurement invariance. I have four timepoints and four continuous indicators, and from individual LCA's at each time point I find that a 4 class solution has the best fit for every time point. I am now running a model assuming full invariance (as described in the Nylund dissertation), but I am very confused to find that the model has 2100 free parameters (and of course takes ages to run). Any idea what I might be doing wrong? 


It sounds like you are not setting up the model correctly. Please send input, output, data, and license number to support@statmodel.com. 


Nylund (2007, p. 100) notes that in LTA "It is important to explore measurement invariance of the classes before imposing structure on their relationship across time (i.e., through the autoregressive relationship)." The examples on invariance that I have seen, however, seem to include the autoregressive paths (UG ex 8.13, posting of May 12, 2009). Should I omit the "ON" statements between the classes in the %OVERALL% section for exploring invariance? 


You should not omit ON. Nylund's comment is relevant when you have more than two time points and say: c2 ON c1; c3 ON c2; etc but not c3 ON c1; That omission gives "a structure". But typically this aspect is ignored. 


Hello, I am running an LTA with two time points. The measurement model is LCA with four binary indicators and 3 classes at Time 1, 2 classes at Time 2. When I run the analysis, I get the same error message reported by several others above:
ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. THE FOLLOWING PARAMETERS WERE FIXED: 23 24 25
Based on the Tech 1 output, I believe the parameters correspond to the following:
23 = Alpha(c) c2#1
24 = Beta(c) c1#1:c2#1
25 = Beta(c) c2#1:c1#2
The transition matrix produced is as follows:
      1      2
1   0.00   1.00
2   1.00   0.00
3   1.00   0.00
My questions are: 1) How can I determine whether the singularity is due to empty cells or to the model not being identified? 2) What is the meaning of the alpha and beta values? 3) How problematic is this error message in terms of the validity of my model? Any input would be greatly appreciated. 


Please send the output and your license number to support@statmodel.com. 

Tait Medina posted on Wednesday, September 18, 2013  4:34 am



if i conduct an LTA with 2 time points and full measurement noninvariance, should i get the same conditional item probabilities that i obtain when conducting an LCA separately in each time period? 


Not necessarily. The two time points are not independent. 

Oliver Perra posted on Thursday, September 19, 2013  6:00 am



Hello, I am reading with interest the paper regarding the 3-step approach implemented in Mplus 7 (see Mplus Web Notes No. 15, version 7). I have a couple of general questions and I would appreciate your comments. 1. When applying this method to LTA, what is the best way to test for measurement invariance across occasions? Looking at the two examples in the webnote (one with and one without measurement invariance), I cannot figure out how one would compare models with and without measurement invariance. 2. Assuming one has partial measurement invariance (e.g. two classes invariant in a 3-class model), what would be the best strategy to apply the 3-step method? Would one estimate the SVALUES for the invariant part in step 1 and then use them to fix parameters for the invariant part as one would do for a completely invariant model? Thanks Oliver 


I would do the invariance testing as a separate step before doing 3-step. Then in 3-step, if you want partial invariance, you apply the invariance version of the 3-step with relaxations of invariance added. 


Hello, do indicators of classes in an LTA have to be categorical, or can they be continuous? Typically, I see indicators as categorical in LCA and LTA because the classes are determined by probability of endorsement on the group of indicators. I also saw the examples in the Mplus manual for GMM and LCGA (also in the family of mixture modeling), where indicators can be continuous. But for those, continuous indicators are measured multiple times in the growth model, which is different. But can indicators be continuous in LCA and LTA, such as ratings on items on a scale from 1 to 100, or age, or a percent score? I would think that the answer is no, because I don't see that as allowing classes to be defined according to endorsement probabilities. Only categorical indicators lend themselves to probabilistic interpretations. Is this right? Thank you. 


Yes, you can have any combination of variable types in both LCA and LTA. With a continuous indicator, classes are determined based on means. 


Very interesting! Dated literature on LTA (e.g., Collins et al. 1994) describes it as being based on categorical items only. This is intriguing, but I appreciate this expansion to the method. It is as if the analysis has changed in definition since then, perhaps because of expanded software capabilities. Thank you. 

Tait Medina posted on Wednesday, September 25, 2013  2:59 pm



I am wondering if a model such as this can be estimated and if you think it makes sense: Say I have 3 time periods. I allow the conditional item probabilities to be noninvariant over time, but I fix the transition probabilities to be 1 on the diagonal (everyone in c1 at t1 is in c1 at t2 and t3, everyone in c2 at t1 is in c2 at t2 and t3, etc.). I want to do this b/c the LCAs estimated separately at each time reveal interesting and reasonable differences in conditional item probabilities for classes with the same substantive meaning. Absent the data, I also think this makes sense theoretically. I am trying to capture developmental trajectories during emerging adulthood (the period from 18 to 24 yo). If we think of a "normative class," we can envision the probability of leaving the parents' house, finishing college, working full-time, and having a first child increasing over this time period. We can also think of a "u-turn" class who leave the parents' house and begin full-time employment only to later return home and enroll in school. I believe the above model will be able to capture this, but am not sure if it is a reasonable approach. 


This sounds similar to the mover-stayer model of UG ex 8.15. 

Tait Medina posted on Thursday, September 26, 2013  9:32 am



Thank you for your reply. Doesn't the model of UG ex 8.15 assume measurement invariance across time for the five latent class indicators? So it would be the same as 8.15 if measurement noninvariance of latent class indicators was allowed and everyone was in the stayer class. Is this reasonable? 


Right. But having only a strict stayer class like that might not fit well. 


I'm trying to conduct an LTA with count and continuous indicators & a continuous covariate as in Ex. 8.14. I am getting the errors "There is at least one count variable that has only one unique value..." and "One or more variables in the data set have no nonmissing values." referring to T2 indicators. When I check these variables in Stata, they have few missing values. Here's part of the output:
Continuous   Number of
Variable     Observations   Variance
MMOHW1       609            48.373
[omitted]
SOCHW1       610            53.635
**MMOHW2     0
**BROWHW2    0
**OFVGHW2    0
**IMHW2      0
**SOCHW2     0
LOGVAT1      501            0.12
**LOGVAT2    0
OLF1SQ       614            29.647
What's very confusing is that there should be roughly 992 observations at each time point (although some may be missing one or more indicators). Any additional help would be appreciated. Thanks, Michelle 


You need to check the data set that Mplus is reading not the Stata data set. It sounds like you may be reading the data set incorrectly in Mplus. Check the data set for blanks and be sure the number of variable names in the NAMES statement is the same as the number of columns in the data set. 


Hi, I am very new to Mplus (only got it this week in fact) but I have a few (probably basic) questions. I have done my best to look through the discussions above but I may have missed solutions to my problems. I am trying to fit an LTA model. Quite simply, I have 2 time points and 7 indicator variables. I want to know the number of classes to select and then to interpret what I have found. I have some notes from an Mplus course I attended, and have been looking at the User's Guide a fair bit. Anyway, my questions are: a) Is it possible to use an ordinal indicator to form the latent variable (each of my indicator variables is on a scale, most with 5 levels) instead of a binary one? If so, how are they coded? b) Is it possible to get the Bootstrapped Likelihood Ratio Test (BLRT) for LTA? In the notes I have, it is possible to get it with the TECH14 command for LCA; however, when I run it, it isn't accepted in the LTA. c) From what I have read, one of the key sets of parameters in LTA is the item-response probabilities (rho's). From the output, I can't see clearly where they are. I have found the proportions in the classes and the transition probabilities but can't find the bit for the rho's. Those are my queries for now; I hope they aren't too trivial! Many thanks, Dan 


a) Yes; just declare the variables as categorical and Mplus will find how many categories you have. You don't need to code the variables in any special way. b) No, we don't have BLRT when there is more than one latent class variable. Do LCA BLRT for each time point separately. Or, use BIC for either LTA or LCA. I find that BIC is so much easier to work with. c) The rho's are the thresholds for each class and variable. See our Topic 5 teaching handouts and videos on our website covering LCA and Topic 6 on LTA. 


Hi Thank you very much for the response (and at a weekend). That info was very helpful (as are the videos). Dan 


Hello, Example 8.13 in the Mplus manual shows an interaction between c1 and x on c2 in a two-class, two time point LTA as follows. MODEL c1: %c1#1% [u11$1-u14$1*1] (1-4); c2 ON x; %c1#2% [u11$1-u14$1*1] (5-8); MODEL c2: %c2#1% [u21$1-u24$1*1] (1-4); %c2#2% [u21$1-u24$1*1] (5-8); Why is the regression of c2 on x shown in the MODEL c1 portion of code, rather than in the MODEL c2 portion of code, given that it is an interaction on the c2 factor? I like this model very much, but am confused about the placement of that one line in the code. What would be modeled if it were placed in the MODEL c2 portion of code? Thank you. 


Actually, I think the above code was from a prior version of the manual, but I suppose the concept is the same. Apologies. I am checking the current manual now, but am still interested in the question above. Thank you. 


Go by ex 8.13 in the version 7 UG on our website. 


Hello, in example 8.13 there is an option for using PARAMETERIZATION = PROBABILITY. When the transition from class 1 to class 2 is considered, it gives the following results: P(C2=2|CG=1,C1=1)=0.547 for cg1 and P(C2=2|CG=2,C1=1)=0.475 for cg2. The probabilities of staying in class 1 are: P(C2=1|CG=1,C1=1)=0.194 for cg1 and P(C2=1|CG=2,C1=1)=0.244 for cg2. When cg2 is the reference class: OR=(0.547/0.194)/(0.475/0.244)=1.4. That OR means that members of the cg1 group, in comparison to the cg2 group, are more likely to move from class 1 to class 2 (rather than stay in class 1). Am I getting this right? Is it possible to calculate the significance level of this OR in Mplus as in Web Note 13 using the MODEL CONSTRAINT command, but with the probability parameterization instead of the logit parameterization? Thank you. 


That looks right. You can give parameter labels also to the probability parameters and use Model constraint to get to the ORs. 
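As a quick arithmetic check of the odds ratio in the question, here is a minimal Python sketch. It reproduces only the point estimate from the four quoted probabilities; the standard error and significance test would still have to come from Model constraint in Mplus. The function name is hypothetical.

```python
def odds_ratio(p_move_a, p_stay_a, p_move_b, p_stay_b):
    """Odds of moving versus staying in group a, relative to the same odds in group b."""
    return (p_move_a / p_stay_a) / (p_move_b / p_stay_b)


# Transition probabilities quoted in the question, all conditional on c1 = 1:
# cg1: P(c2=2) = 0.547, P(c2=1) = 0.194; cg2: P(c2=2) = 0.475, P(c2=1) = 0.244.
or_cg1_vs_cg2 = odds_ratio(0.547, 0.194, 0.475, 0.244)
print(round(or_cg1_vs_cg2, 2))
```

The result is about 1.45, which agrees with the rounded value of 1.4 reported in the post.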


Thank you very much for your advice. I've tried to label the probability parameters: MODEL cg: %cg#1% c2#1 ON c1#1 (t011); c2#1 ON c1#2 (t021); c2#1 ON c1#3 (t031); c2#2 ON c1#1 (t012); c2#2 ON c1#2 (t022); c2#2 ON c1#3 (t032); ...and the same for x=1. Then I used MODEL CONSTRAINT to get the ORs using the code below: t013 = 1-(t012+t011); t023 = 1-(t021+t022); t033 = 1-(t032+t031); ...and the same for x=1. oddsx012 = t012/t011; oddsx013 = t013/t011; oddsx021 = t021/t022; oddsx023 = t023/t022; oddsx031 = t031/t033; oddsx032 = t032/t033; ...and the same for x=1. or12 = oddsx112/oddsx012; or13 = oddsx113/oddsx013; or21 = oddsx121/oddsx021; or23 = oddsx123/oddsx023; or31 = oddsx131/oddsx031; or32 = oddsx132/oddsx032; The results seem to give me a correct calculation of the ORs, but they are based on transition probabilities that differ a lot from those of the model without the MODEL CONSTRAINT command. I will be very grateful for your suggestions on how to solve this problem. 


Send the two outputs (with and w/out Model constraint) to Support. 


Thank you for your answers to my previous questions. I would be grateful for your opinion about yet another thing. I have an LTA model with assumed measurement invariance. Adding a second-order path to this model resulted in a significant improvement of fit based on the -2*loglikelihood difference test, but the BIC for the second-order model is still higher (difference = 4.03). How would you interpret these results? In theory it makes sense to add the second-order path. Thank you 


We would use the LL test whenever it is available rather than BIC. 
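To see how the two criteria can disagree, here is a small Python sketch with made-up numbers (the log-likelihoods, parameter counts, and sample size are hypothetical, not from the poster's output). It assumes ordinary ML log-likelihoods; with the MLR estimator the difference test would also need the scaling correction. The chi-square tail formula used here is the closed form that holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def bic(loglik, n_params, n_obs):
    """BIC = -2 logL + (number of parameters) * ln(sample size)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical first-order and second-order models.
ll_first, params_first = -5000.0, 30
ll_second, params_second = -4990.0, 34
n_obs = 2000

lr = 2.0 * (ll_second - ll_first)                       # likelihood-ratio statistic
p_value = chi2_sf_even_df(lr, params_second - params_first)
bic_diff = bic(ll_second, params_second, n_obs) - bic(ll_first, params_first, n_obs)
print(lr, p_value, bic_diff)
```

In this illustration the likelihood-ratio test clearly favors the larger model while its BIC is higher, because the BIC penalty of ln(n) per added parameter exceeds the gain in -2 logL.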


Thank you very much, greetings from Stockholm! 


Hi, I have a couple of queries regarding the LTA investigations I am running at the moment. I have about 9700 observations in my dataset, and I am currently looking at 9 binary indicator variables measured at 2 time points. The optimum number of classes is 3, with latent class proportions of roughly 16% (class 1), 67% (class 2), and 17% (class 3). So am I correct in thinking that 67% of the 9700 (so about 6500 observations) belong to class 2? I would presume this; however, when I look at the item-response probabilities, class 2 has these at or nearly at 1.00, which would imply that there are 6500 observations in my dataset that have a '1' for all of the 9 variables. But I know that that isn't correct by a long way. Only about 600 observations have a '1' in every indicator variable. The majority of the dataset has '0' in the indicator variables, as I have coded them that way by design. Am I getting muddled about what this means, or is my data not being read into Mplus correctly? My second question is that I am using the SAVE = CPROB option, which gives me the probabilities in a txt file. Is there a program that I can open this file in so that I can rearrange the data, create frequencies, etc.? Regards 


Do a TYPE=BASIC with no MODEL command to be sure you are reading the data correctly. You can open the data set in Excel. 


Hello, can I run Example 8.14 in the UG using three time points? Or do three time points necessitate using Example 8.15 (the mover-stayer model, or something similar to it)? I think I can extend Example 8.14 to three time points; is that right? Thank you. 


Also, if I extend Example 8.14 to three time points, would the transition that occurs between time points 2 and 3 be independent of the transition that occurred between points 1 and 2? This idea is reflected in literature for latent trajectory analysis, but I was unsure whether it applies to this latent transition analysis model. For example, for latent trajectory analysis: "For a given trajectory class j, conditional independence is assumed for the sequential realizations of the elements of Yi, yit [i.e., the dependent variables] over the T periods of measurement." Thank you. 

Kathleen posted on Friday, January 31, 2014  5:53 pm



Hello, I'm looking at UG ex 8.15, but would like to know how to constrain transitions without specifying a mover or stayer class, more along the lines of the description of the ECLS-K LTA. I would like to do this with the probability parameterization in Mplus 7.1. We have 2 time points with 4 classes at each time point. Looking at cross tabs of our results from latent class analysis at times 1 and 2, there are some transitions that hardly happen (e.g., 0.3% of the sample). Related to the above is how to specify the transition constraints for the 4th class, since it is the reference class and therefore not mentioned in the multinomial regressions. Thanks much. 


When you mention the ECLS-K LTA, perhaps you are referring to the V7Part2.pdf handout that you can get at http://www.statmodel.com/v7workshops.shtml; see slide 83 of Part 2 of the Handouts for New Developments in Mplus Version 7. The handouts for the Mplus Version 7 workshops at Utrecht University on August 27-29, 2012 are posted there in 4-per-page format and in regular format (Parts 1 to 3). 


Answer to Yarnell: Ex 8.1 can be extended to any number of time points. By default in Mplus, the transition matrices are allowed to differ across time points. 

Jane Smith posted on Thursday, February 06, 2014  6:53 am



Drs. Muthen, I have an LTA model with 3 time points and a second-order effect, and would like to compute the transition probabilities between time 2 and time 3 based on class membership at time 1. Can the new LTA calculator be used to compute transition probabilities across different levels of a given latent class membership? If not, can I use the equations for computing transition probabilities across covariates? Thanks. 


Yes, the LTA calculator should be able to do this. Maybe I have examples in one of my talks that are posted (e.g Utrecht, August 2012). 

Kathleen posted on Sunday, February 09, 2014  9:50 am



Sorry if this is a repost; I thought I had posted but cannot find it on the board. I have two questions: 1) After reading Web Note 15 and Chapter 14 in the UG, I’m not able to figure out how to constrain transition probabilities for the last class in an LTA. I’d like to constrain movement from class 3 at Time 1 to class 4 at Time 2, and vice versa. In the logit parameterization, “c2#4 on c1#3@0;” is not allowed. How could I specify this with a MODEL CONSTRAINT command? I seek to do this because the transition patterns for these groups are too small for analysis. I thought my options would be to either constrain the model or examine the item response probabilities and reclassify respondents post hoc. Which do you think is the better approach? 2) I am not using a covariate in my LTA, as I have no theoretical or substantive reason to do so. But I’d still like to improve the model to account for misclassification at each time point. I applied the manual 3-step approach, saving class probabilities at each time point in a 2-time-point LTA, and applying the logits in the LTA to the nominal variable. The BIC and ABIC of the model increased, but the entropy also increased substantially compared to results without the 3-step procedure. Would I compare the models using the measurement invariance LRT with the scaling correction factor? Many thanks for your help. 


Dear Dr. Muthen, I have a question about the model on page 236 of Chapter 8 of the Mplus User's Guide. The variable cg is drawn with overlapping circle and square symbols. Does this mean that it is a known class variable? If I have a known class variable, can I apply this model in my research? Could you tell me about this symbol? Thank you so much. 


cg is specified as a latent categorical variable, but it is in fact observed. This dual background motivates the choice of a circle in a square. 


Answers to Kathleen: 1)Try using Parameterization = Probability and the examples given on our website. 2) I don't understand which analyses you are doing here and what you want to accomplish with 3step since you don't mention a covariate or a distal. You can send the relevant outputs to Support. 


Hi, I am aware that my question(s) have been asked previously, but after reading the responses and the relevant sections of the User's Guide and searching online, I can't seem to achieve what I am after. My next 2 steps in my LTA involve investigating any differences between certain groups (initially gender), and then investigating whether a covariate predicts the transition probabilities. I have started with the first step, the multiple-group analysis, but the output doesn't, intuitively, provide me with what I am after. The model code at the moment (with some help taken from the post by Maartje Basten in July 2012) has the parts: Variable: CLASSES = cg(2) c1(3) C2(3); KNOWNCLASS = cg (gender = 1 gender = 2); MODEL: %OVERALL% c2 ON c1; c1 on cg; MODEL c1: %c1#1% [pn13$1] (1); etc. and likewise for MODEL c2. MODEL CG: %cg#1% [pn13$1]; etc. %cg#2% [pn13$1]; etc. I also want to test if there is a difference between males and females. I can see I will also struggle with the covariate prediction stage. The User's Guide seems to be the best example I can find, but my data aren't clustered (or two-level, as phrased in the example). Do I simply remove the terms that are related to the clustering for it to work? Regards 

Jane Smith posted on Tuesday, February 11, 2014  9:41 am



Thanks for your quick response to my last question. I was able to use TECH15 to get the transitions split by class membership at the first time point. I then added continuous covariates to my model, and the TECH15 output gave me ten outputs with slightly different numbers but all with the same heading: "ESTIMATED CONDITIONAL PROBABILITIES FOR THE CLASS VARIABLES EVALUATED AT THE SAMPLE MEAN FOR ALL COVARIATES." Which output is most appropriate to use? Is this perhaps because I use multiple imputation? Thanks. 


Answer to Daniel: You will want to look at UG ex 8.13 as well as Muthén, B. and Asparouhov, T. (2011). LTA in Mplus: Transition probabilities influenced by covariates. Paper can be downloaded from here. Mplus Web Notes: No. 13. July 27, 2011. 


I am examining the moderating effect of a continuous covariate on transition probabilities, similar to the model shown in the Muthen & Asparouhov web notes entitled “LTA in MPlus: Transition Probabilities Influenced By Covariates.” I am using the LTA Calculator function to produce the transition probabilities at different values of the covariate. My question: Is there any available estimate of the statistical significance of the moderation effect, or are the apparent differences in transition probabilities at different values of the covariate merely descriptive? 


Merely descriptive. 


You may also want to take a look at our Web Note 13: Muthén, B. and Asparouhov, T. (2011). LTA in Mplus: Transition probabilities influenced by covariates. Paper can be downloaded from here. Mplus Web Notes: No. 13. July 27, 2011. 


Dear Linda and Bengt, I am running one- and two-level LTA models with dichotomous indicators. When I ran 1-level LTA models, I received class endorsement probabilities in both threshold and probability scales. In my 2-level LTA output, I see class endorsement probabilities only in the threshold scale, without the conversion to probability scale. What command may I use to obtain the class endorsement probabilities for my 2-level LTA in the probability scale, as seen below from the 1-level models? Alternatively, is there a way to convert them by hand?
RESULTS IN PROBABILITY SCALE
Latent Class Pattern 1 1 1
ALCYEAR1
  Category 1   0.768   0.019   39.478   0.000
  Category 2   0.232   0.019   11.926   0.000


I believe this involves numerical integration so you could not do it by hand. If you cannot tell, send the output. 


I found a formula on slide 72 of Topic 5 of the Mplus Short Course "Categorical Latent Variable Modeling Using Mplus": P(u=1|c) = 1/(1+e^(-logit)). When I input the threshold as the logit, however, this formula seems to give the conditional probability of a score of 0, not a score of 1, so I just subtracted the result from 1. I tested this against my 1-level LTA output, and it reproduced the probability scale results for the 1-level LTA perfectly. Can I just convert the thresholds for my 2-level LTA using this same formula? 


Also, yes, I did use numerical integration for the 2-level LTA. Would the above formula still work? Or would it produce incorrect results, given the numerical integration? I tried doing this using the formula functions in Excel, and the results seem reasonable. I am just not sure if they are correct, or if I need to do something else. 


The formula is not correct for a 2-level model; numerical integration is needed. The formula only says what the probability is at the value zero of the random effect(s), i.e. it is a conditional probability so it underestimates the full (marginal) probability. 
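The difference between the two quantities can be illustrated with a short Python sketch. It assumes a single normally distributed random intercept on the logit scale with mean zero, which is a simplification and not necessarily the poster's model; the standard deviation of 1.0 is made up. The threshold is back-calculated from the single-level output quoted above (Category 1 = 0.768), and the integration is a plain trapezoid rule, not the algorithm Mplus uses.

```python
import math


def prob_category2(threshold, random_effect=0.0):
    """P(u = 1 | class, random effect) for a binary item with a logit link."""
    return 1.0 / (1.0 + math.exp(threshold - random_effect))


def marginal_prob_category2(threshold, sd, n_points=2001, width=8.0):
    """Average prob_category2 over a N(0, sd^2) random intercept (trapezoid rule)."""
    if sd == 0.0:
        return prob_category2(threshold)
    lo, hi = -width * sd, width * sd
    step = (hi - lo) / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        b = lo + i * step
        weight = 0.5 if i in (0, n_points - 1) else 1.0
        density = math.exp(-0.5 * (b / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += weight * density * prob_category2(threshold, b) * step
    return total


# Threshold implied by the single-level output: logit of the Category 1 probability.
threshold = math.log(0.768 / 0.232)
conditional = prob_category2(threshold)                 # value at random effect = 0
marginal = marginal_prob_category2(threshold, sd=1.0)   # hypothetical sd of 1.0
print(round(conditional, 3), round(marginal, 3))
```

With no random-effect variance the two numbers coincide; with a nonzero variance the marginal probability moves away from the hand-computed value, which is the gap the reply describes.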


OK, if I send the output, would you or Linda be able to help provide the accurate conditional probabilities for item endorsement by class? Or is there a command to employ in Mplus? How can I obtain the numbers I need? 


I could also report the values produced using the formula above, and say that results for each school will deviate somewhat from these values, given that I've allowed for random effects across schools; the degree of variation from the numbers I provide will differ by school? Or, please do let me know if there is a better way to do this. 


We don't have those algorithms handy. But the probability should be similar to what you get in singlelevel modeling. 


Hi, I know that my question has been partially asked before, but to be sure I want to ask you 2 questions about LTA. (1) I have data for 2 time points, and according to LPA, T1 yielded 6 classes and T2 3 classes. I am curious whether LTA is possible with different numbers of classes. (2) If it is possible, are there any papers or input examples I can take into account (especially some with a covariate)? 


The number of classes can differ over time. I don't know of any examples where this is shown. 


Hello, I ran a 3-time-point LTA with 1 level, according to UG Ex 8.14. Could you assist in interpreting the output below? The names of the 3 class variables for the 3 time points are c1, c2, and c3. There are 4 classes. My questions are: (1) For latent class pattern 1 2 1 on the class variables, shown directly below, I see numbers that suggest the impact of age on being in classes 1, 2, and 3 for c3. But, if this is for pattern 1 2 1, it is not the case that anyone with this transition pattern was in class 2 or 3 at time 3 (i.e., for their status on c3). Could you explain? (2) Also, the output seems to provide some of the regressions of latent status on age that I requested, but not all. Why is that? Thank you sincerely.
                     Estimate   S.E.   Est./S.E.   P-Value
Latent Class Pattern 1 2 1
  C3#1 ON AGE1       0.512     0.875   0.585       0.559
  C3#2 ON AGE1       0.027     0.850   0.032       0.975
  C3#3 ON AGE1       0.026     0.983   0.027       0.979
Latent Class Pattern 1 3 1
  C3#1 ON AGE1       0.342     0.504   0.679       0.497
  C3#2 ON AGE1       0.253     0.548   0.462       0.644
  C3#3 ON AGE1       1.128     0.425   2.651       0.008


Please send the output and your license number to support@statmodel.com. 


Hello, I entered two predictors of latent status at baseline and transition probability at two subsequent time points in a 3-class (3 time point) LTA. The predictors marked ethnic membership, "afamer" and "latino," which were mutually exclusive, with white as the omitted reference category, as in traditional regression analysis (# of dummy variables = # of groups - 1). But in the output for the LTA model, I am not obtaining p values for the estimates of both predictors on baseline latent status and transition probability, e.g.:
C1#1 ON
  AFAMER   25.412   0.189   134.114   0.000
  LATINO   25.058   0.000   999.000   999.000
and
C2#1 ON
  AFAMER   0.463    2.040   0.227     0.821
  LATINO   1.248    0.000   999.000   999.000
Why am I not obtaining p values for the second predictor? I understand that since these are categorical predictors, I could run this model in accord with UG Ex. 8.13, with ethnic group as a known class variable. But then I would not obtain a beta regression weight for the impact of ethnic group on baseline status, and hence would not be able to compare these effects with those from other (continuous) predictors. What can I do to keep the model as is, in the form of UG Ex. 8.14, regressing the latent class variables directly on ethnic group rather than treating ethnicity as a known class variable, and get estimates for both of these predictors? Thank you. 


I suspect that latino in c2#1 is fixed because there is no variability in latino for that class. 


Hi, I am running an LTA model with 3 time points on cohort data where the available sample shrinks at each time point (through nonresponse). It has been suggested that, to use all the data, I analyse it in one go (so time 1 > time 2 > time 3) and, rather than dropping the people who disappear at time 3, code them as missing on all time 3 variables. The LTA would then include those not responding at time 3 in the time 1 to time 2 analysis (dropping them cuts the numbers in half) and analyse the rest on time 1 > time 2 > time 3. My model has run OK with the 9,705 observations, but I am unsure how Mplus is treating those who are missing on all time 3 variables. In the output section titled "CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS PATTERN", where you can see each individual's pattern of transitions over the 3 time points, the total number of people equates to 9,705, although I know that nearly 5,000 people shouldn't be 'transitioning' at time 3, as they have no data for time 3. This is where I am getting confused. Is Mplus predicting what states these people should be in (based on their time 1 > time 2 transitions), or basing it on other people (a bit like multiple imputation), or do they remain at time 3 in the same state they ended up in at time 2? Regards 


Mplus uses the "FIML" principle of MAR where you use all available data. So, yes, include those with missing on time 3 as long as they have at least one observation earlier. For a person missing on time 3 FIML in essence does predict what time 3 responses would be given earlier information on that person as well as parameter estimates for time 3 obtained from information on those not missing at time 3. So they may not end up in the same state as earlier. 
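The arithmetic behind "they may not end up in the same state" can be sketched in a few lines of Python. This is a simplified illustration only: it ignores covariates and any direct effect of time 1 class on time 3 class, and both the posterior probabilities and the transition matrix are made-up numbers. For a person with no time 3 data, the time 3 class probabilities are the time 2 posterior probabilities pushed through the estimated time 2 to time 3 transition matrix.

```python
def predict_next_class(posterior_prev, transition):
    """P(c3 = j) = sum over k of P(c2 = k | data) * P(c3 = j | c2 = k)."""
    n_next = len(transition[0])
    return [
        sum(posterior_prev[k] * transition[k][j] for k in range(len(posterior_prev)))
        for j in range(n_next)
    ]


# Hypothetical posterior class probabilities at time 2 for one person.
posterior_c2 = [0.7, 0.2, 0.1]
# Hypothetical estimated transition probabilities; row = class at time 2.
transition_23 = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]

p_c3 = predict_next_class(posterior_c2, transition_23)
print([round(p, 2) for p in p_c3])
```

Here the person is most likely in class 1 at time 2 but still has a sizeable probability of being in class 2 at time 3, driven entirely by transition estimates obtained from respondents who were observed at time 3.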


Thank you for your previous response. I am still working on the same LTA idea (3 time points, with 5 states at each time point), but in a slightly different dataset (6423 observations). I want to see how well my population has been placed into the groups (looking at class probabilities) and also describe the type of people in each state at baseline (age, gender, etc.). I have used the SAVE = CPROBS option and get a text file that I open in Excel, which I am aiming to merge with my demographic data to then explain the states. However, for some reason, there are twice as many lines (so 6423*2 = 12846). The id numbers only appear once; the other 6423 lines hold either a 0 or a decimal value (not in my original dataset). In my Mplus output, where it states what each column is, there are more labels than there are columns in the dataset. There should be 125 columns (to represent the 125 different transition patterns with the 5 states), but there appear to be only 75 columns for this. Am I not reading the text file into Excel properly? Also, is there an alternative method for finding out the class memberships at baseline? I am happy to calculate them the current way (adding all the probabilities for transition patterns that begin with 1, so 111, 112, etc., all 25 of them for each state), but wondered if there was a quicker technique. 


I assume you don't have age, gender etc included in the model. If not, you can use Auxiliary R3STEP or E to get the information you need on these covariates. Regarding the save=cprobs issue, you may want to send the relevant files to support@statmodel.com along with your license number. 
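The manual calculation described in the question, summing the pattern probabilities that start with a given class, can be sketched in Python as follows. The sketch assumes that the saved pattern probabilities are ordered with the last class variable changing fastest (1 1 1, 1 1 2, ...); that ordering should be checked against the column list in the Mplus output. The function name and the example probabilities are hypothetical, and the example uses 2 classes at 3 time points to stay short.

```python
from itertools import product


def baseline_class_probs(pattern_probs, n_classes, n_times):
    """Sum latent class pattern probabilities over all patterns sharing the same first class."""
    # product() varies the last position fastest: (1,1,1), (1,1,2), ...
    patterns = list(product(range(1, n_classes + 1), repeat=n_times))
    if len(pattern_probs) != len(patterns):
        raise ValueError("expected %d pattern probabilities" % len(patterns))
    totals = [0.0] * n_classes
    for pattern, prob in zip(patterns, pattern_probs):
        totals[pattern[0] - 1] += prob
    return totals


# One person's hypothetical probabilities for the 8 patterns of 2 classes x 3 time points.
person = [0.40, 0.10, 0.05, 0.05, 0.05, 0.05, 0.10, 0.20]
print(baseline_class_probs(person, n_classes=2, n_times=3))
```

With 5 classes and 3 time points the same function would expect 125 values per person, matching the 125 patterns mentioned in the post.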

Nicole posted on Monday, July 07, 2014  5:46 pm



Hello, I am wondering if it is possible to run LTA with multiple groups and a covariate variable. If so could you point me in the direction of syntax I can reference? Thank you! Nicole 

Qin Xie posted on Tuesday, July 08, 2014  3:18 am



continue The model runs well when I specified the overall model with structure imposed (e.g. autoregress C1 on C2, C2 on C3). Q2: Are the LR differences between two structured LTA (with and without measurement invariance imposed) the same as the LR differences between the two unstructured ones (correlated latent variables)? Thank you! QX 

Qin Xie posted on Tuesday, July 08, 2014  3:26 am



Hello, Profs. Muthen, Following Nylund (2007), "It is important to explore measurement invariance of the classes before imposing structure on their relationship across time (i.e., through the autoregressive relationship)" (p. 100), I tried to test full measurement invariance as follows: MODEL: %OVERALL% C1 with C2; C2 with C3; C1 with C3; MODEL c1: %c1#1% [BC1b$1-BC5b$1] (1-5); %c1#2% [BC1b$1-BC5b$1] (6-10); %c1#3% [BC1b$1-BC5b$1] (11-15); %c1#4% [BC1b$1-BC5b$1] (16-20); MODEL c2: ... MODEL c3: ... I get this message: *** ERROR in MODEL command This model is not supported by LOGIT parameterization. Use LOGLINEAR parameterization. Something is wrong with the overall model; can you advise? Thanks QX 

Qin Xie posted on Tuesday, July 08, 2014  9:50 pm



Continue My model is for 3 time points, C1(4)C2(4)C3(4), each time point has 5 categorical indicators. I haven't added in any covariate to the model. I've tested the structured model (LTA) with partial measurement invariance fixed, it runs well. It is odd why the simple correlation model went wrong. Thanks Qx 


The measurement invariance testing will probably be only a little different when you have an unrestricted versus restricted C1, C2, C3 model. I would not bother with using the unrestricted model. If you use the unrestricted model you can use WITH only if you use Parameterization = Loglinear; see the UG pages 498-500. Equivalently in the Logit parameterization, you can say C3 ON C2 C1 to make it unrestricted. 


Answer for Nicole: See UG ex 8.13. 


Hello: I am attempting to replicate (using the same data) some LTA estimates performed using much earlier versions of Mplus than the one I am using (7.11). I am getting very 'close' estimates to the class estimates from the original analyses although there are some differences. I am wondering if the different versions of Mplus may have something to do with this (i.e., why I am not able to replicate the estimates precisely). Is this a possibility? Thank you. 


Several little algorithmic improvements have been made. First make sure you get the same loglikelihood. 


Hello, I want to conduct LTA with a 2-wave survey on psychological resilience, but the second time point has considerably fewer respondents because of attrition. I was wondering if I could add a fixed category (i.e., dropout) to the LCA solution for the second time point to understand whether the nonresponse was equally distributed among the t1 classes or whether some of the t1 classes had a higher probability of dropping out of the survey. That would imply having a latent class variable at t2 with one category of missing values; is that possible to do? If so, what is the syntax to do that, or where can I find it? Thank you in advance, Davide 


You can try to do that, but it is advanced modeling and not recommended unless you are an expert; I haven't tried it. That is, add a missing data category to the observed outcome, treating it as nominal. And then specify a latent class for which the probability is zero to have any other outcome than the missing data category. 


Hello, I have three time points with 4 continuous indicators at each. I have run latent profile analyses at each time point, and during this process, I chose a set of constraints that best fit my data: means are freely estimated, and variances/covariances are invariant across classes within time. When I run my LTA with all three time points, I get very different prevalence rates than I did when I ran each LPA individually, and the means within my classes change slightly. I gather that slight changes in prevalence rates can occur, but not changes this dramatic (one group went from 15.82% to 49%). 1) Do I maintain the constraints from my LPAs in my LTA model? I have tried this, and it helps with fit, but not with the means and prevalences. 2) Is "fixing" means ever advised? I tried fixing them to maintain the same probabilities, but the fit decreased (BIC increased). Is it possible that indicators at one time point are influencing class membership at other time points? Any insight you have regarding how best to solve the discrepancy between my LPA and LTA prevalences would be greatly appreciated. 


Things you can check include making sure that you have your classes in the same order for the different analyses. And you would want to hold the indicator means equal across time for a given class (measurement invariance). But still, class percentages can be different in the analysis of all time points than for specific time points. You can check if they are more similar when you analyze only two time points at a time. If that is the case, perhaps you need to let time 1 classes influence time 3 classes directly in your full analysis. 1) If by "constraints" you mean which within-class covariances to have free, then yes. 2) I think one should in general avoid fixing parameters; equality constraints are better. In principle you can have direct influence from the latent class variable at time t to indicators at time t+1, but I would think this being significant may be more rare. A more reasonable extension might be to correlate the residual of a given indicator over time. 


Thank you for your help. I found that allowing the residuals of my indicators to correlate across time was very helpful: my prevalence rates within each group (as well as other statistics and indices) in the longitudinal model are now much closer to what they were in the individual LPAs. 


Great. 


Like Kathleen (see 2/9/2014 in this thread), I used the probability parameterization to constrain one of four latent transitions to be 0 (the last class). I would also like to include some covariates (e.g., c2 ON c1 x; c1 ON x), but an error message indicates that latent class regressions are not allowed with this parameterization. Is there some other way to constrain a model in this way and also examine the effect of covariates? 


Then you have to do it via logit constraints. I discuss this in my handout for the August 2012 Utrecht workshop on our website. Web Note 13 is also useful when you have covariate effects on transitions (parameterization 2 is used in the V7 UG ex 8.14). 


Thanks for your reply. If I'm understanding your suggestion correctly, I'm still having some trouble. When I specify the following logit constraint, I get an error that says "No reference to the slopes of the last class is allowed." %OVERALL% [c1#1]; c2 ON c1 x; c1 ON x; c2#1 ON c1#2 @ 45; If I switch to PARAMETERIZATION=PROBABILITY, I get an error implying that I cannot include covariates in the way I'd like. 


"No reference to the slopes of the last class is allowed." This message comes out because the last class of c1 is class 2 and that slope is not a free parameter but is zero (I assume that c1 has 2 classes). See top of page 499 of the V7 UG to see the parameterization (there are no b's on the last line). Looking at page 499 you see that the "a1" logit has to be large negative for the transition probability to be zero for c1=2 transitioning to c2=1. 


Thank you very much; pages 498-499 of V7 UG were helpful. 
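The point about the "a1" logit needing to be large and negative can be checked numerically. The Python sketch below assumes a multinomial logit in which the last class of c2 is the reference with its logit fixed at zero, and in which the row for the last class of c1 contains only the intercepts (no slopes), as the reply describes. The value -15 is illustrative, not a recommendation from the thread.

```python
import math


def transition_row(logits):
    """Class probabilities from logits for all but the last class (reference logit = 0)."""
    exps = [math.exp(value) for value in logits] + [1.0]
    total = sum(exps)
    return [e / total for e in exps]


# Row for the last class of c1 with two c2 classes: only the intercept a1 enters.
a1 = -15.0  # hypothetical large negative logit
row = transition_row([a1])
print(row)
```

With a1 = -15 the probability of moving into c2 class 1 is on the order of 1e-7, which is effectively zero for practical purposes, while a1 = 0 would give equal probabilities.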

Eric Deemer posted on Monday, January 05, 2015  7:30 am



Hello, I have a question about the coefficients in the model population statement in mc example 8.13: c1#1 on cg#1*0.5; c1#2 on cg#1*0.2; c1#1 on cg#2*0.2; c1#2 on cg#2*0.5; Are these probabilities or odds ratios? Thank you. Eric 


They are logits. 

Eric Deemer posted on Monday, January 05, 2015  8:53 am



Thanks, Linda. Eric 

Eric Deemer posted on Monday, January 05, 2015  11:57 am



Hello, I'm getting the following error message:
*** ERROR The following MODEL statements are ignored:
* Statements in Class %C1#1% of MODEL C1: [ EK1$1 ] [ EI1$1 ]
* Statements in Class %C1#2% of MODEL C1: [ EK1$1 ] [ EI1$1 ]
* Statements in Class %C2#1% of MODEL C2: [ EK2$1 ] [ EI2$1 ]
* Statements in Class %C2#2% of MODEL C2: [ EK2$1 ] [ EI2$1 ]
*** ERROR One or more MODEL statements were ignored. These statements may be incorrect or are only supported by ALGORITHM=INTEGRATION.
Do these thresholds need to be represented by "#" rather than "$"? Eric 


Try adding ALGORITHM=INTEGRATION; to the ANALYSIS command. 

Eric Deemer posted on Monday, January 05, 2015  2:49 pm



Hi Linda, I tried adding algorithm=integration to the ANALYSIS command but that didn't work. Eric 


Please send the output and your license number to support@statmodel.com. 


I’m using LTA with 1 continuous and 4 categorical covariates with 5 classes at each time point and am having a hard time understanding the output. I am imputing covariates, fixing conditional probabilities rather than using equality constraints, then specifying regressions as described in UG ex 8.14. My understanding is that this will produce class-specific regressions of c2 on x that allow x to influence the transition probabilities. My confusion comes when interpreting the output. I first see under “Categorical Latent Variables” the usual class-specific regressions of C1 on X (I think). I then see under the headings “Latent Class Pattern k 1” (for k = 1-5) additional regressions of C2#1, C2#2, C2#3, and C2#4 for each pattern from 1 1 to 5 1. Could you please clarify what these are, exactly? At TECH15 I have the following heading repeated 5 times, but with slightly different probabilities, e.g.:
ESTIMATED CONDITIONAL PROBABILITIES FOR THE CLASS VARIABLES EVALUATED AT THE SAMPLE MEAN FOR ALL COVARIATES
[first appearance] P(C1=1)=0.000 P(C1=2)=0.223 …
[second appearance] P(C1=1)=0.000 P(C1=2)=0.221
Finally, I would like to use BCH to specify a distal outcome but am unsure whether this is possible using the approach above. What would be the considerations? Thanks, Michelle 


Q1. The regressions of C2#1, C2#2, C2#3, and C2#4 for each pattern from 1 1 to 5 1 are the regressions of C2 on x, which vary across the 5 classes of C1. You can read about this in our web note 13. UG ex 8.14 uses "parameterization 2" where the corresponding logits are shown in Table 6 of that web note. Table 6 indicates that the transition probabilities change as a function of x. Q2. Please send the output and license number to Support. Q3. BCH is limited to models with a single latent class variable. For a "manual" alternative, however, see the LTA section in Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. Appendices with Mplus scripts are available here. See also Nylund-Gibson, K., Grimm, R., Quirk, M., & Furlong, M. (2014): A latent transition mixture model using the three-step specification. Structural Equation Modeling: A Multidisciplinary Journal, 21, 439-454. 


Thank you for the clarification. I ran the model without imputation and found only one set of estimated conditional probabilities in Tech 15; it seems like that must be the problem. Would you like to see output for both files? 


Sounds like you got TECH15 for each imputation (5 of them), which is normal. You can use the average. 
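If the averaging is done by hand, it amounts to taking the mean of each probability across the imputation-specific TECH15 tables. A minimal Python sketch, assuming the values have been copied out of the output manually; the function name and the numbers below are hypothetical placeholders, not results from this thread.

```python
# Average TECH15 probabilities across imputed data sets.
# All values below are hypothetical placeholders typed in by hand.

def average_probabilities(tables):
    """tables: list of dicts mapping a label such as 'P(C1=2)' to a probability,
    one dict per imputation. Returns the mean of each label across imputations."""
    labels = tables[0].keys()
    return {label: sum(t[label] for t in tables) / len(tables) for label in labels}

tech15_by_imputation = [
    {"P(C1=1)": 0.10, "P(C1=2)": 0.90},
    {"P(C1=1)": 0.20, "P(C1=2)": 0.80},
]
averaged = average_probabilities(tech15_by_imputation)
print(averaged)
```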


Hello again, I just discovered that the Mplus implementation of FIML means that I don't have to limit my longitudinal sample to those who have more than one observation (essentially what Sara talked about in this thread in 2007). However, I'm unclear about a few things: I would like Time 1 in my LTA to be values for ages 13 OR 14 and Time 2 to be values for ages 15 OR 16. Right now my data are in long format. What would be the next step to make sure that any values that are present at age 13 or 14 would contribute to class 1, and any values present at 15 or 16 would contribute to class 2? Would I just reshape them and have class1 be defined by time1 and time2 indicators? Also, would I need to use the Data Missing command? Would I do this only for the latent class indicators, or later for when I include covariates in the model? I'm confused about how the Missing Data chapter examples apply to longitudinal mixture modeling such as LTA. And does it matter that I have dropout and drop-in? Thanks for your help with this. Michelle 


I just realized that I don't need to limit my sample to people with more than one time point when I do LTA. But I’m still confused about how, and for my research question I want to combine two timepoints into one, i.e. lump together 13 and 14 year olds and compare them to 15 and 16 year olds. That would be fine if people didn’t sometimes have observations at all 3 or 4 time points, but a few of them do. My questions are:
 My data are in long format. I assume I have to reshape them to wide format for the LTA?
 Is it possible to lump together those observations in Mplus? If so, how would I do that and then designate the responses from 13 and 14 year olds as Time 1 and 15 and 16 year olds as Time 2? I realize I could just use indicators from age 13 and age 14 as my Time 1 latent class variable and those from 15 and 16 as my Time 2 latent class variable, but this produces a model with too many parameters.
 What’s the best way to model covariates, given all the above?
 Do I have to use more than just Analysis: type= mixture missing to deal with those who don’t have observations at all time points? I.e., Do I need Data Missing?
Thanks a lot! Michelle 


Sorry about the double post! My computer was trying to tell me that the first one didn't post when the library turned off wifi yesterday. 


Yes, LTA is best done as a wide-format analysis. Why not "just use indicators from age 13 and age 14 as my Time 1 latent class variable and those from 15 and 16 as my Time 2 latent class variable"? Why do you say "but this produces a model with too many parameters"? 
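For readers preparing such a file outside Mplus, the long-to-wide step can be sketched in Python. This is only an illustration: the variable names (id, age, u1, u2), the sample rows, and the "." missing-data code are assumptions. Each age keeps its own columns, so that the age 13 and 14 columns can serve as Time 1 indicators and the age 15 and 16 columns as Time 2 indicators.

```python
# Reshape long-format records (one row per person per age) into one wide row
# per person, with a separate column for each indicator at each age.
# Variable names, ages, and the "." missing code are illustrative assumptions.

AGES = [13, 14, 15, 16]
INDICATORS = ["u1", "u2"]
MISSING = "."

def long_to_wide(rows):
    """rows: iterable of dicts with keys 'id', 'age', and each indicator name."""
    wide = {}
    for row in rows:
        # Start every person with all age-by-indicator cells set to missing.
        person = wide.setdefault(
            row["id"],
            {f"{u}_{age}": MISSING for age in AGES for u in INDICATORS},
        )
        for u in INDICATORS:
            person[f"{u}_{row['age']}"] = row[u]
    return wide

long_rows = [
    {"id": 1, "age": 13, "u1": 0, "u2": 1},
    {"id": 1, "age": 15, "u1": 1, "u2": 1},
    {"id": 2, "age": 14, "u1": 1, "u2": 0},  # observed at one age only
]
wide_rows = long_to_wide(long_rows)
for pid, values in sorted(wide_rows.items()):
    print(pid, *[values[f"{u}_{age}"] for age in AGES for u in INDICATORS])
```

People observed at only one age stay in the file with missing codes in the other columns, rather than being dropped.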


Dear Dr. Muthen, I tried to run a random slope in an LTA model for two time points, based on Example 8.13 in the Mplus User’s Guide, but I can’t get it to run. If I apply this model in my research, could you tell me the syntax and type of analysis needed to run a random slope in an LTA model? My syntax: VARIABLE: NAMES = CLUS PRS11 RES12 COM13 CON14 CRE15 PRS21 RES22 COM23 CON24 CRE25 LCA_TS1 LCA_TS2 LCA_TS3 LCA_TS4; USEVARIABLES = PRS11 RES12 COM13 CON14 CRE15 PRS21 RES22 COM23 CON24 CRE25 LCA_TS1 LCA_TS2 LCA_TS3 LCA_TS4; CLASSES = C1(3) C2(3); ANALYSIS: TYPE IS MIXTURE ; MODEL: %OVERALL% S | C2 ON C1; C1 ON LCA_TS1 LCA_TS2 LCA_TS3 LCA_TS4; C2 ON LCA_TS1 LCA_TS2 LCA_TS3 LCA_TS4; C2 ON C1; S ON LCA_TS1 LCA_TS2 ; Thank you so much. 


You cannot have a random slope with categorical latent variables. 


Hello, I have a 3-time-point LTA model based on categorical indicators with covariates (time-invariant: gender, age; time-varying: occupation). I would be grateful if you could help me with the following questions: 1. I want to estimate how my final classes differ in terms of distal outcomes, but I don’t want the outcomes to impact the established model. According to Kam et al. (2013). Journal of Management doi:10.1177/0149206313503010 supplementary materials, to achieve that I can specify starting values with * and use STARTS = 0. When I do that, the thresholds are still a bit different from those in the model without outcomes. When I fix all the parameters in the model using @ instead of *, I get a solution identical to the model without outcomes. Thus, isn't it better to use @ instead of *? 2. The reviewer asked me whether the differences in outcomes between classes are controlled for gender. Given that gender is a time-invariant covariate in my model, i.e. it influences the class structure at t1 (and this further predicts the class structure at t2 and t3 due to autoregressive paths), is it OK for me to answer yes to the reviewer’s question? Thank you! 


1. See the papers on our website: Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. Appendices with Mplus scripts are available here. Nylund-Gibson, K., Grimm, R., Quirk, M., & Furlong, M. (2014): A latent transition mixture model using the three-step specification. Structural Equation Modeling: A Multidisciplinary Journal, 21, 439-454. 2. "differences in outcomes between classes are controlled for gender" sounds like an investigation of measurement invariance wrt gender. That would involve examining the significance of direct effects from gender to an outcome. Having an effect from gender to the latent class variable does not do that, so you can say no and explain what you did, and if need be add the direct effect investigation (which is a bigger task). 


Hello, I am running an LTA across 4 time points with 5 initial parameters at the first time point (c4) and 6 at each subsequent time point to include death as an absorbing state (c5). I have fixed the class structure to remain constant across time. The code looks like this at the start: VARIABLE: NAMES = id U1-U10 M1 U11-U15 M2 U16-U20 M3; USEVARIABLES = U1-M3; CATEGORICAL = U1-M3; AUXILIARY = ID; MISSING ARE .; CLASSES = c1(4) c2(5) c3(5) c4(5); With variations on this: MODEL c4: %c4#1% [u16$1-u20$1] (1-5); %c4#2% [u16$1-u20$1] (6-10); %c4#3% [u16$1-u20$1] (11-15); %c4#4% [u16$1-u20$1] (16-20); %c4#5% [M3$1@15] (21); After 28 hrs the output completed. To confirm, I matched the frequency output for each parameter on the output to the frequencies for the original data. Everything matches, with the exception of a single parameter: U18. The values for U18 correspond to the frequencies for U13, the same variable for the previous time point. I have double checked the original data and the code, but I can't find the source of the error. Do you have any ideas? Thanks, Kim 


Send output and license number to support@statmodel.com. I assume you are looking at Tech10. And when you say "parameter" you mean latent class indicator variable, I think. 


Shall do. I am looking at the section of the output labelled "UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES" and yes I mean latent class indicator variable. I am new to this type of method, so excuse the errors. Kim 


I am using a continuous covariate to predict transition probabilities using MODEL CONSTRAINT. In the Web Note "LTA in Mplus: Transition probabilities influenced by covariates," it says "Transition probability tables can be computed via MODEL CONSTRAINT using specific x values. For instance, the above x=0 case may correspond to the mean of x and the x=1 case may correspond to one standard deviation above the mean." My question: How does the code need to be altered to specify an x value other than 0 or 1? 


I think you are looking at the Table 9 input which refers to the Table 6 parameterization also used in the UG LTA. So simply plug in a specific x value into the Table 6 formulas on which Table 9 input is based. You may also consider using the LTA calculator as mentioned in UG ex 11.14, page 240. 


Thank you, Bengt. If I am interpreting this correctly, I would simply multiply the g for each equation by the specific x-value I'm wanting to test. I did use the LTA calculator, but a reviewer wants significance estimates for the conditional probabilities, which the LTA calculator doesn't provide. As far as I can tell, the only way to obtain these is by using MODEL CONSTRAINT and examining the significance of (from your example) the parameters or12 and or21, for the odds of transitioning from class 1 to class 2 and vice versa. Is that correct? 


Correct on both counts. 
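As a hand check of the "multiply g by the x-value" reading, the arithmetic can be sketched in Python. The sketch assumes that, for a given origin class, each non-reference destination class has a logit of the form a + g*x, with the last class as reference (logit 0). The names a and g, the function name, and the numbers are hypothetical; the exact parameterization should be taken from Table 6 of the web note.

```python
import math

def transition_row(intercepts, slopes, x):
    """Probabilities of moving to each destination class from one origin class.

    intercepts, slopes: one entry per non-reference destination class
    (hypothetical a and g values). The reference class is last, with logit 0.
    """
    logits = [a + g * x for a, g in zip(intercepts, slopes)] + [0.0]
    largest = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-class example: a = 0, g = log(3).
a, g = [0.0], [math.log(3.0)]
row_at_mean = transition_row(a, g, x=0.0)       # x = 0
row_at_plus_one = transition_row(a, g, x=1.0)   # x = 1
row_at_plus_two = transition_row(a, g, x=2.0)   # any other x value is plugged in the same way
print(row_at_mean, row_at_plus_one, row_at_plus_two)
```

Only the value passed as x changes between the rows; the intercepts and slopes stay the estimated ones.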


Thank you. One more question: does the p-value for the odds ratios (or12 and or21) refer to the significance of their difference from zero or from 1? Since they are odds ratios I would think their difference from 1 would be the p-value of interest. 


Anything in Model Constraint tests against zero. The program doesn't know it is an OR. So you need to change to testing against 1. 
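The re-centering can be done by hand from the estimate and standard error printed for the odds ratio. A minimal Python sketch, assuming a normal-theory (Wald) z test; the function name and the estimate and standard error below are hypothetical placeholders, not values from this thread.

```python
import math

def wald_test(estimate, se, null_value=0.0):
    """Two-sided Wald z test of H0: parameter == null_value."""
    z = (estimate - null_value) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

or12, se_or12 = 2.0, 0.5  # hypothetical estimate and standard error

z_zero, p_zero = wald_test(or12, se_or12)                 # the default test against 0
z_one, p_one = wald_test(or12, se_or12, null_value=1.0)   # the test of interest for an OR
print(f"against 0: z = {z_zero:.2f}, p = {p_zero:.4f}")
print(f"against 1: z = {z_one:.2f}, p = {p_one:.4f}")
```

An alternative that stays inside Mplus would be to define an additional new parameter in MODEL CONSTRAINT equal to the odds ratio minus 1 (or to the log odds ratio), so that the printed test against zero is already the test of interest. That is a suggestion, not something stated in this thread.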


Dear Dr. Muthen, I tried to run an LTA model for two time points based on Example 8.13, with 3 classes at each time point. The output shows, under Categorical Latent Variables: C2#1 ON C1#1 C1#2 and C2#2 ON C1#1 C1#2. My questions are: 1) The last class is the reference class, right? 2) Can I reorder the reference class to see the estimates for C3#3? 3) Could you explain the meaning of the ON statement so that I can interpret the results for my research? Thank you so much. 


1) Yes. 2) Yes, by choosing starting values so that you get the classes in the order you want. 3) That's a topic of how to understand multinomial logistic regression. We talk about that in Topic 2 of our handout and video from the Johns Hopkins series on our website. See also the papers under Papers, Latent Transition Analysis on our website. 


Thank you so much, Dr. Muthen. I have one question: when I tried to choose starting values, the reference class did not change. Could you suggest syntax for starting values that reorders the reference class? This is my syntax: MODEL: %OVERALL% MPS2 ON MPS1; MODEL MPS1: %MPS1#1% [PRS11-CRE15] (1-5); %MPS1#2% [PRS11-CRE15] (6-10); %MPS1#3% [PRS11-CRE15] (11-15); MODEL MPS2: %MPS2#1% [PRS21-CRE25] (1-5); %MPS2#2% [PRS21-CRE25] (6-10); %MPS2#3% [PRS21-CRE25] (11-15); Thank you so much for your help. 


Your statements don't show any starting values. 


Dr. Muthen, I want to use MPS#1 as the reference class, but I'm not sure about the starting values. Could you advise? Thank you so much. My LTA model has two time points with 3 classes at each time point: MODEL: %OVERALL% MPS2 ON MPS1; MODEL MPS1: %MPS1#1% [PRS11-CRE15*3] (1-5); %MPS1#2% [PRS11-CRE15*1] (6-10); %MPS1#3% [PRS11-CRE15*2] (11-15); MODEL MPS2: %MPS2#1% [PRS21-CRE25*3] (1-5); %MPS2#2% [PRS21-CRE25*1] (6-10); %MPS2#3% [PRS21-CRE25*2] (11-15); 


Ask for SVALUES in your first run. Then use them as starting values for the next run where you switch the classes. 


Dear Dr. Muthen, I got the SVALUES from the first run and want to use them as starting values to switch the reference class. My questions are: 1) Do I give the starting values with * for the indicators or for the classes? 2) Must I give starting values only for the reference class or for all classes? I tried setting starting values in the model, but I can't change the reference class. Could you advise again? Thank you so much. 


If in your first run you got SVALUES like %c#1% [y*5]; %c#2% [y*10]; then if you want to switch the class order in your second run, you say Starts=0 and: %c#1% [y*10]; %c#2% [y*5]; That's all you have to change. 


Hello again, I have another question about our LTA model. I have run the LTA with no covariates (4 time points, 4 classes at T1 and 5 thereafter to account for mortality) full measurement invariance. When I add covariates into the model the latent class composition changes. Should the latent class composition remain consistent between the model with no covariates and the one where covariates are added? Thanks also for the info above on reference categories. Very helpful. Kim 


If your data have no need for direct effects from covariates to latent class indicators, that is you have measurement invariance wrt the covariate values (e.g. male-female invariance), the 2 analyses should give the same latent class definition. To avoid class changes, you can always do 3-step LTA as shown in the Asparouhov-Muthen and Nylund-Gibson et al. articles. 


Thanks for the speedy (as always; I have no idea how you keep on top of everything) response. I do have complete measurement invariance with all covariates, which leads me to think I am specifying the model wrong somewhere. I'll reread the references you suggested. Thanks again, Kim 


Dear Dr. Muthen , I have questions again . This is SVALUES for first run %OVERALL% mps2#1 ON mps1#1*1.35706; [ mps1#1*1.64971 ]; [ mps2#1*3.32805 ]; %MPS1#1.MPS2#1% prs11*3.00642 (3); res12*2.85507 (4); prs21*3.23058 (5); res22*2.33574 (6); %MPS1#1.MPS2#2% prs11*3.00642 (3); res12*2.85507 (4); prs21*3.23058 (5); res22*2.33574 (6); %MPS1#2.MPS2#1% prs11*3.00642 (3); res12*2.85507 (4); prs21*3.23058 (5); res22*2.33574 (6); %MPS1#2.MPS2#2% prs11*3.00642 (3); res12*2.85507 (4); prs21*3.23058 (5); res22*2.33574 (6); MODEL MPS1: %MPS1#1% [ prs11*4.09228 ] (1); [ res12*3.69243 ] (2); %MPS1#2% [ prs11*7.41578 ] (7); [ res12*7.07305 ] (8); MODEL MPS2: %MPS2#1% [ prs21*4.09228 ] (1); [ res22*3.69243 ] (2); %MPS2#2% [ prs21*7.41578 ] (7); [ res22*7.07305 ] (8); 


These are the model starting values for the second run to reorder the classes; I’m not sure they are correct. Even with them, I can’t switch the classes. ANALYSIS: TYPE IS MIXTURE; starts=0; MODEL: %OVERALL% mps2 ON mps1; MODEL MPS1: %MPS1#1% [ PRS11*7.41578] (1); [ RES12*7.07305] (2); %MPS1#2% [ PRS11*4.09228] (3); [ RES12*3.69243] (4); MODEL MPS2: %MPS2#1% [ PRS21*7.41578] (1); [ RES22*3.69243] (2); %MPS2#2% [ PRS21*4.09228] (3); [ RES22*3.69243] (4); Could you please advise? 


We request no double-posting on Mplus Discussion. For these output-specific questions, instead send the relevant outputs, input, data, and license number to support@statmodel.com. Clearly state how you want which classes to be reordered. 


Hello again, I am back with another question about the 3-step LTA procedure. I have scaled back the model we are estimating to run a 3-step model with measurement invariance, 4 classes at time 1 and 5 classes at time 2 (to account for mortality). I have a couple of questions about the parameters to enter for the fixed threshold values for the N variables at step 3. I have input the values from the columns of the logit table for each C run at step 2 (I have tried both the first and the last columns). My questions are: 1) Do I need to do any manual calculations on the values from these tables or can I enter them from a specific row/column? If the info I need is contained directly in the table for a C4/C5 model, which row/columns should I use? 2) I am receiving the following error message: THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A CHANGE IN THE LOGLIKELIHOOD DURING THE LAST E STEP. AN INSUFFICIENT NUMBER OF E STEP ITERATIONS MAY HAVE BEEN USED. INCREASE THE NUMBER OF MITERATIONS OR INCREASE THE MCONVERGENCE VALUE. ESTIMATES CANNOT BE TRUSTED. Can you offer any guidance? Thanks again, Kim 


Hello yet again, I think I may have figured it out. The threshold values are based on the entire table with rows including unique threshold components for each status of the given class less one, which represents the reference category. This would mean for %c1#1% I would have three threshold values as in: %c1#1% [N1#1@5.203]; [N1#2@8.607]; [N1#3@8.607]; I have rerun without error messages. Can you please confirm if I am on the right track or still way off base? Thanks again, Kim 


Please send the output from the different steps, plus data if possible, to Support@statmodel.com so we can look at it more closely. 


Thank you for your response. I am unable to send the data as it is confidential, but I can send the full output with the starting values. Our institution is experiencing connectivity issues today, but I'll get it to you asap. Appreciate your support immensely. Kim 


Dear Drs. Muthen, I am trying to run an LTA with multiple groups, and I need the means to vary between males and females. My model looks as follows: VARIABLE: NAMES ARE id g Zinter1 Zexter1 Zinter2 Zexter2 Ztrint3 Ztrext3 Zmal4 Zmal5 Zmal6 Zdep7 Zanx7 Zmal8; USEVARIABLES ARE g Zinter1 Zexter1 Zinter2 Zexter2; CLASSES = cg (2) c1 (2) c2 (2); KNOWNCLASS = cg (g = 1 g = 2); ANALYSIS: ESTIMATOR = MLR; TYPE = MIXTURE; STARTS = 50 5; MODEL: %OVERALL% c2 on c1 cg; c1 on cg; MODEL c1: %c1#1% [Zinter1 Zexter1]; %c1#2% [Zinter1 Zexter1]; MODEL c2: %c2#1% [Zinter2 Zexter2]; %c2#2% [Zinter2 Zexter2]; MODEL cg: %cg#1% Zinter1 Zexter1 Zinter2 Zexter2; %cg#2% Zinter1 Zexter1 Zinter2 Zexter2; My questions are: 1) Running this model results in all means being the same for males and females. What am I doing wrong? 2) I wonder whether it's possible to run these models and get transition probabilities? All the best, Nejra 


1) use the dot option to refer to combinations of classes, e.g. %cg#1.c1#1% [Zinter1 Zexter1]; %cg#1.c1#2% etc 2) TECH15 will give you that. 


Hi Dr. Muthen, thanks for a prompt reply! I have tried using the dot language (my syntax is the same as above), but every time I do I end up with the following message: *** ERROR in MODEL command Unknown class label in MODEL CG: %CG#1.C1#1% What am I doing wrong? Best, Nejra 


Hello: My question follows up on the post above by [Aidan G. Wright posted on Wednesday, March 02, 2011  10:53 am]: In short, in the Nylund dissertation (and available ms for dl) the role of the covariates as modeled does NOT involve an interaction between c1 and the covariate(s) in predicting subsequent transition probabilities. Rather, they describe the covariates' influence on class membership at a given time point. I am specifically referring to gender in this case. Is my understanding correct? I ask because using similar data I have modeled the putative interaction effect of c1 and x (sex) on transition probabilities to subsequent classes as outlined in the webnote and vis a vis UG 8.13 (knownclass option). In this model, how do I interpret the latent class regression of sex on class membership at each time point? I imagine the same way I would as detailed by Nylund, but when the interaction is modeled, these estimates (of time-specific class membership based on sex[x]) are somewhat different from those of the former model (with no interaction). Thank you! 


I think all this is outlined in Web Note 13. 


Hi again Dr. Muthen, I am trying to run a 2-wave multiple group LTA comparing group differences across 3 classes per wave. My syntax is as follows: USEVARIABLES ARE g a1 b1 a2 b2; CLASSES = cg (2) c1 (3) c2 (3); KNOWNCLASS = cg (g = 1 g = 2); TYPE = MIXTURE; MODEL: %OVERALL% c1 ON cg; c2 ON c1 cg; MODEL c1: %c1#1% [a1 b1]; %c1#2% [a1 b1]; %c1#3% [a1 b1]; MODEL c2: %c2#1% [a2 b2]; %c2#2% [a2 b2]; %c2#3% [a2 b2]; MODEL cg: %cg#1.c1#1% [a1 b1]; %cg#1.c1#2% [a1 b1]; %cg#1.c1#3% [a1 b1]; %cg#2.c1#1% [a1 b1]; %cg#2.c1#2% [a1 b1]; %cg#2.c1#3% [a1 b1]; %cg#1.c2#1% [a2 b2]; %cg#1.c2#2% [a2 b2]; %cg#1.c2#3% [a2 b2]; %cg#2.c2#1% [a2 b2]; %cg#2.c2#2% [a2 b2]; %cg#2.c2#3% [a2 b2]; According to your previous suggestion, I used dot language to allow the means to vary across the gender groups. However, I get the following error message: *** ERROR in MODEL command Unknown class label in MODEL CG: %CG#1.C1#1% Am I missing something completely obvious? Best, Nejra 


Don't use the dot statements within Model Cg, but in the Overall part. 
