mike zyphur posted on Wednesday, February 08, 2006 - 1:22 pm
Hi Bengt/Linda, If one wants to compare LC structures across samples, how similar should the sample sizes be? It would seem that having very disparate sample sizes would stack the deck against finding the same class structure (even if it truly were the same), because it would be easier to justify additional latent classes with a larger sample size. I've gone through the Hagenaars & McCutcheon text, but haven't found a place that addresses this issue specifically.
Anyone have any thoughts?
Thanks for your time!
bmuthen posted on Wednesday, February 08, 2006 - 6:36 pm
I assume LC means latent class models. I guess it could happen that more classes are found with larger samples; I haven't seen a literature on that. On the other hand, the larger sample has a chance to reveal more classes that are real, while a smaller sample might have too volatile a likelihood for finding them. Also, a bootstrapped LRT (Mplus Version 4) is probably protective against sample size matters.
I am doing LCGA with 2 groups and there is a large difference in sample size (N=700 versus N=4600). In the larger group I find an additional class. I understand there is no reference stating that more classes are found with larger samples?
About the LRT test, what do you mean by 'is probably protective against sample size matters'? Could I use the bootstrapped LRT test in both samples, and would the results then be adjusted for the difference in sample size?
I know of no methods reference pointing to finding more classes with larger samples - I have not seen studies of BIC and BLRT as a function of increasing sample size among sample sizes that are large (as in your case with n=700 and 4600). Perhaps that is a good methods topic. My assumption is that BLRT is less prone than BIC to pointing to more classes as large samples get larger, but that is no more than a conjecture. We know that BIC tends to underestimate the number of classes for small n (see Nylund et al.).
I would focus on how the classes relate in the 700 and 4600 groups. Perhaps some of the classes found in the 700 group split into subclasses in the 4600 group, in which case one can see the increased sample size as giving more power to detect more classes.
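
To make the BIC side of this discussion concrete, here is a minimal Python sketch (not Mplus output, and not an analysis of any real data). It uses the standard definition BIC = -2 logL + p ln(n) and assumes, purely for illustration, that adding one class improves the log-likelihood by the same small amount per observation in both samples and costs the same number of extra parameters. The gain of 0.004 per observation, the 3 extra parameters, and the function names are all hypothetical; only the sample sizes 700 and 4600 come from the thread. The sketch says nothing about how BLRT behaves.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: -2 * logL + (number of parameters) * ln(sample size)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def bic_change_from_extra_class(n_obs, gain_per_obs, extra_params):
    """BIC(k+1 classes) minus BIC(k classes).

    Assumes (hypothetically) that the extra class raises the total
    log-likelihood by gain_per_obs * n_obs and adds extra_params
    free parameters. A negative value means BIC prefers the extra class.
    """
    return -2.0 * n_obs * gain_per_obs + extra_params * math.log(n_obs)


if __name__ == "__main__":
    # Sample sizes from the thread; gain and parameter count are made up.
    for n in (700, 4600):
        change = bic_change_from_extra_class(n, gain_per_obs=0.004, extra_params=3)
        verdict = "extra class preferred" if change < 0 else "extra class not preferred"
        print(f"n={n}: BIC change = {change:.2f} ({verdict})")
```

Under these assumed numbers the BIC change is positive at n=700 and negative at n=4600: the fit improvement grows in proportion to n while the penalty grows only with ln(n), so the same per-person signal supports the extra class only in the larger sample. This matches the point above that a larger sample has more power to detect additional classes.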