

LCA + nonobserved patterns 


Jon Heron posted on Monday, October 19, 2009  10:16 am



Hi there,

Unless one has a very large sample, incorporating more and more indicators into a latent class model will invariably result in a smaller and smaller proportion of the possible data patterns being realised in one's dataset.

In Bartholomew's discussion of latent class analysis (Bartholomew, Galbraith, Moustaki, Steele: Analysis and Interpretation of Multivariate Data for Social Scientists, Chapman and Hall/CRC, February 2002), the number of patterns has an impact on the global goodness of fit G, as does the number of cases in those patterns that *are* observed, hence the suggestion to collapse over patterns to improve matters.

Since Mplus uses different fit indices, and the d.f. are not affected by the number of patterns, I was wondering whether there is a rule of thumb regarding this issue when using Mplus. Would it be detrimental to have 50% of all patterns unobserved, for example?

Many thanks for any insight you may have,
Jon
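To make the sparseness point concrete, here is a minimal Python sketch (not Mplus syntax) that counts how many of the possible response patterns actually occur in a dataset. The function name `pattern_coverage` and the simulated data are assumptions made purely for illustration.

```python
import random
from collections import Counter

def pattern_coverage(data, n_categories=2):
    """Return (observed, possible, proportion) of distinct response patterns.

    `data` is a list of rows, one per case, each holding one coded
    response per indicator. Hypothetical helper, for illustration only.
    """
    n_items = len(data[0])
    possible = n_categories ** n_items          # e.g. 2**10 = 1024 patterns
    observed = len(Counter(tuple(row) for row in data))
    return observed, possible, observed / possible

# Simulated example: 500 cases on 10 binary indicators (made-up data).
random.seed(1)
sample = [[random.randint(0, 1) for _ in range(10)] for _ in range(500)]
obs, poss, prop = pattern_coverage(sample)
print(f"{obs} of {poss} possible patterns observed ({prop:.1%})")
```

With 10 binary indicators there are 1024 possible patterns, so a sample of 500 cannot realise even half of them, whatever the model.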


Perhaps their global goodness of fit G is a chi-square test of the frequency table? If so, such tests are typically not useful with more than, say, 8 variables, due to empty cells. In Mplus you can instead look at the TECH10 bivariate residual testing.
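As a rough illustration of why empty cells matter, the sketch below assumes that G is the likelihood-ratio chi-square, G² = 2 Σ O ln(O/E), and compares it with the Pearson statistic on a tiny made-up table. The function names and the counts are hypothetical; this is not how Mplus computes its output.

```python
import math

def g_squared(observed, expected):
    """Likelihood-ratio statistic; cells with O = 0 contribute nothing."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

def pearson_x2(observed, expected):
    """Pearson chi-square; empty cells still contribute E each."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up sparse table: two of the four cells are empty.
obs_counts = [5, 0, 3, 0]
exp_counts = [2.0, 2.0, 2.0, 2.0]
g2 = g_squared(obs_counts, exp_counts)
x2 = pearson_x2(obs_counts, exp_counts)
print(f"G2 = {g2:.3f}, X2 = {x2:.3f}")
```

When the two statistics disagree noticeably, as they do here, the chi-square approximation for the full frequency table is usually not trustworthy, which is the situation the reply describes for models with many indicators.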

Jon Heron posted on Wednesday, October 21, 2009  2:55 am



Thanks Bengt,

Yes, I think you are right regarding G.

A supplementary question I have is about structural zeros: some patterns might not be possible given the nature of the data. I would expect that in output such as TECH10 there is still an expected cell frequency for all 4 cells of each bivariate comparison. Can I restrict the model in some way to guard against this, or is this not something worth worrying about?

I have seen LCAs with structural zeros in smoking research, whereby the bottom category of a repeated ordinal measure is a 'never-smoker' category, which obviously can never be revisited once it is left.
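To give a sense of how many patterns such a constraint rules out, here is a small Python sketch that enumerates the admissible patterns when category 0 cannot be re-entered once left. The helper names, and the choice of 3 occasions with 3 or 5 categories, are assumptions for illustration only.

```python
from itertools import product

def is_admissible(pattern):
    """True if category 0 never reappears after a nonzero category."""
    left_zero = False
    for category in pattern:
        if category != 0:
            left_zero = True
        elif left_zero:
            return False  # returned to 'never smoker': a structural zero
    return True

def count_admissible(n_occasions, n_categories):
    """Count admissible patterns among n_categories ** n_occasions."""
    patterns = product(range(n_categories), repeat=n_occasions)
    return sum(is_admissible(p) for p in patterns)

# Hypothetical design: an ordinal item measured on 3 occasions.
for k in (3, 5):
    total = k ** 3
    ok = count_admissible(3, k)
    print(f"{k} categories: {ok} admissible of {total} patterns")
```

The remaining patterns are structural zeros: they are impossible by definition rather than merely unobserved.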


I haven't seen structural zeros used in LCA, but they have been used in LTA, that is, longitudinal LCA. It sounds like you may have an LTA situation, given that you use the word "repeated". See, for example, the Kaplan article on LTA on our web site.

Jon Heron posted on Thursday, October 22, 2009  1:20 am



My memory was rusty; it was actually a GMM.

"The five ordered categories are as follows: 0 = never smoker; 1 = puffer (not ever having smoked a whole cigarette); 2 = experimenter (having smoked a whole cigarette but <100 cigarettes total in a lifetime); 3 = current smoker (smoked 1–19 days in the last 30 days and 100 cigarettes in a lifetime); 4 = frequent smoker (>=20 days smoked in the last 30 days and >100 cigarettes in a lifetime)."

It's from here: Rodriguez D, Moss HB, Audrain-McGovern J. Developmental heterogeneity in adolescent depressive symptoms: associations with smoking behavior. Psychosom Med. 2005 Mar-Apr;67(2):200-10.

My concern about category zero remains, although the authors state that "This categorization is entirely consistent with the literature on adolescent smoking". I am inclined to agree with you that an LTA seems sensible. Methinks I will contact the authors, as we have mutual acquaintances.

All the best,
Jon

Jon Heron posted on Thursday, October 22, 2009  2:37 am



Having thought this over some more, I guess in reality a GMM on these data is little different to a similar model on physical growth data where subjects are pretty unlikely to shrink between measurement occasions. 


