

Hi! For many of my LC models the response pattern frequencies can be pretty sparse, which makes the use of the chi-square suspect. I remember a neat little paper in MBR, I think, by Linda Collins suggesting a Monte Carlo approach to assessing fit with such data. It strikes me that Mplus has MC capabilities; is there some way to implement Linda's approach in Mplus? And while I'm here... :) will it become possible to use LC with complex survey data at some point? Thanks, and thanks for such a great product! Chuck


That is an interesting line of thought. Which MBR issue was that in? I'd like to get back to you about whether this can be done inside the current Mplus or needs to be done by generating data outside Mplus and running it through Mplus using the RUNALL Utility (see new web info). Further Mplus developments are going full speed ahead, and latent class modeling with complex sample data is one of several things the team is working on.


Thank you, Bengt! I believe this is the article (I lifted it from Linda's CV; the article is in the office): Collins, L. M., Fidler, P. L., Wugalter, S. E., & Long, J. L. (1993). Goodness-of-fit testing for latent class models. Multivariate Behavioral Research, 28, 375-389. It is (IMHO) a pretty cool little article. Thanks again, Chuck


Thanks, Chuck. Will get back to you about this. 

Anonymous posted on Wednesday, December 05, 2001  7:23 am



I am interested in creating a latent class analysis using several series of mutually dependent variables (e.g. race dummies, age dummies). In this data, people can only have one race and one age category. Is it even possible to use these types of variables to indicate latent class? All of the potential variables that may be included in the model are in the form of mutually exclusive series. They overlap with other series, but not within series. There are a number of potential series that can be included in this model (e.g. zip code, aid categories, family status, gender). Second, in a separate run, where there were more variables that were not mutually exclusive, some categories were set to 15. Does this represent a potential problem, or does it merely indicate that everyone in that class falls into one category of that indicator?

bmuthen posted on Thursday, December 06, 2001  6:02 pm



I can imagine a class being defined by a high probability of being in a certain age range, family status, etc. Mplus does not yet handle unordered categorical latent class indicators. Although I have not tried this, it would seem possible to approximate such an analysis using a series of dummy binary indicators created from the polytomous variables, as long as only K-1 binary variables are derived from a K-category item. Other readers of Mplus Discussion may have an opinion on this. Logits fixed at +15 do not imply a problem. In fact, they help the interpretation in that such an indicator definitely is switched on or off. For instance, a class that is defined as consisting of people positively inclined toward purchasing a youth item might have a low probability for the "age > 65" indicator being switched on.

Anonymous posted on Monday, December 10, 2001  8:45 pm



It would be wrong to use such mutually dependent variables as class indicators, since class indicators should be independent given the class. Represent each series as an unordered categorical latent class indicator, rather than through dummy variables. As long as there is no direct effect on this indicator (U on X), there is no difference between ordered and unordered indicators. As for using "zip code" as a class indicator, you may want to explore latent class regression to model the impact of exogenous variables on class membership.

YI posted on Friday, May 10, 2002  4:38 am



Dear all, When I run a 2-class latent variable model, an error message like this shows up: "THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO CHANGE IN THE LIKELIHOOD DURING THE LAST E STEP. AN INSUFFICENT NUMBER OF E STEP ITERATIONS MAY HAVE BEEN USED. INCREASE THE NUMBER OF MITERATIONS. ESTIMATES CANNOT BE TRUSTED. " Where do I increase the number of MITERATIONS? How do I fix this problem? The model that I input is identifiable. Your information and suggestions will be appreciated. Thank you. Yi


MITERATIONS is an option of the ANALYSIS command. See page 40 of the Mplus User's Guide. 


Greetings. I'm working with an LCA with a large pool of dichotomous indicators (42). I started this project with a more traditional LCA (WinLTA), which, as I understand it, uses the crosstabulation of all of the items and the latent variable. Based on how long it was taking to iterate, I estimated it might finish sometime in the 22nd century. I reduced the indicator pool to 21 and had better luck. Does LCA in Mplus have the same limitation? That is, is computation time based on the 2^n cells? Or can I expand the item pool without as many worries? I'm also running into apparent identification problems pretty early with the 21-item set, where the solution is dependent on starting values (for a 6-class model I tried 20 different sets of starting values and got 20 different solutions). Is this likely to get better or worse with larger numbers of items? Sample size is 890, with a modest degree of missing data (10-20%). Thanks.

bmuthen posted on Thursday, August 15, 2002  7:26 am



Mplus has quite efficient LCA computations. I have an example with 17 binary items, n=7,300, and 4 classes that takes 50 seconds on my 1 Mhz computer. The number of items should not be a problem. Having as many local solutions as you report may be an indication that the 6-class model does not have much information in the data to support it, i.e. it may not be suitable for the data. The new Mplus version 2.12, which will shortly be out, offers a new likelihood ratio test (TECH11) that allows you to see if a smaller number of classes is sufficient. Testing the model fit is difficult with many items because of sparse cells, so this approach of testing the incremental fit when adding classes is useful.
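To make the "sparse cells" point concrete, here is a small illustrative calculation (not Mplus code, and not part of the original reply). It counts the cells in the full cross-tabulation of binary items and compares that with the sample size, using the item counts and sample sizes mentioned in this thread. The function name is hypothetical.

```python
def sparseness_summary(n_items, n_obs, n_categories=2):
    """Compare the size of the full cross-tabulation with the sample size."""
    cells = n_categories ** n_items
    return {
        "cells": cells,
        # Average number of observations per cell if spread evenly.
        "mean_count_per_cell": n_obs / cells,
        # Upper bound on the share of cells that can be non-empty.
        "max_share_of_cells_observed": min(n_obs, cells) / cells,
    }


# Item counts and sample sizes taken from the posts above.
for n_items, n_obs in ((17, 7300), (21, 890), (42, 890)):
    print(n_items, n_obs, sparseness_summary(n_items, n_obs))
```

With 21 items and 890 cases, fewer than one cell in two thousand can be occupied, which is why a chi-square test against the full table is hard to trust in this setting.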


Great, thanks. 1 Ghz, I take it? These are taking 20-30 minutes per run on a 1.6 or 1.8 GHz (I don't quite remember) machine. I run a SAS job to generate the random starting values, and then run Mplus in batch mode. So it's basically a day devoted to each set of runs. I did go with a simpler model (5 classes; 4 out of 20 runs converged on the same solution, and the BIC is only marginally worse than the best BIC from the 6-class models). I look forward to 2.12. Meanwhile, I'll experiment with adding items. Thanks.


Following myself up. I went to the 42-item models, and was surprised to find that each ran in less than 40 seconds. I've got the warning that the crosstab was too big to allow for calculating the chi-square. But, since I'm not using the chi-square, that's fine with me. Everything I do need (including -2LL) seems to be present. Apparently the chi-square calculation was what was taking most of the 20-30 minutes. If I'm interpreting this correctly, it makes me wonder if in future versions it might be possible to turn that off while running smaller models . . . Thanks.

bmuthen posted on Thursday, August 15, 2002  9:58 am



Yes, I meant 1 Ghz. Please send the input and data for the run that took 20-30 minutes to support@statmodel.com for further investigation.

Andy Ross posted on Wednesday, October 12, 2005  9:55 am



Hi, In an LCA, I was wondering at what level of sparseness I should be concerned. As far as I have understood it, any level of sparseness will undermine the chi-square test of model fit, and we should therefore use BIC instead. However, does there come a point when the level of sparseness will undermine the whole solution? And if so, what is that point? Many thanks for your support, Andy

bmuthen posted on Wednesday, October 12, 2005  11:39 am



It depends very much on the model and how many parameters need to have information from the sparse parts of your data. You need to have at least a couple of observations contributing to a parameter's estimation. To get good SEs one needs larger samples than for point estimates. Probably the best way to get a rough sense of this is to do a Monte Carlo simulation using Mplus. There are general guidelines for doing this (although not specifically for LCA) in the Muthen & Muthen (2000) SEM article referenced on our web site. See also the Monte Carlo inputs on your Mplus CD.
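As a rough illustration of the data-generation half of such a Monte Carlo study, the sketch below draws binary response patterns from a latent class model and reports how thinly each replication covers the full cross-tabulation. It is plain Python, not Mplus input; the function names, class proportions, and item probabilities are all hypothetical. A full study as described above would also re-estimate the model in every replication, which this sketch does not do.

```python
import random
from collections import Counter


def simulate_patterns(n_obs, class_probs, item_probs, rng):
    """Draw n_obs binary response patterns from a latent class model.

    class_probs[k]   = P(class k)
    item_probs[k][j] = P(item j = 1 | class k)
    """
    classes = range(len(class_probs))
    patterns = []
    for _ in range(n_obs):
        k = rng.choices(classes, weights=class_probs)[0]
        patterns.append(tuple(int(rng.random() < p) for p in item_probs[k]))
    return patterns


def sparseness_by_replication(n_reps, n_obs, class_probs, item_probs, seed=0):
    """For each replication, summarize how sparse the observed table is."""
    rng = random.Random(seed)
    n_cells = 2 ** len(item_probs[0])
    summaries = []
    for _ in range(n_reps):
        counts = Counter(simulate_patterns(n_obs, class_probs, item_probs, rng))
        summaries.append({
            "share_of_cells_observed": len(counts) / n_cells,
            "patterns_seen_once": sum(1 for c in counts.values() if c == 1),
        })
    return summaries


# Hypothetical population values: 2 classes, 10 binary items, n = 200.
class_probs = [0.6, 0.4]
item_probs = [[0.9] * 10, [0.2] * 10]
for summary in sparseness_by_replication(5, 200, class_probs, item_probs):
    print(summary)
```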


I'm experimenting with multiple imputation of a large categorical dataset using latent class analysis, as proposed by Vermunt et al. (2008). To do so I need to first run an LCA to estimate the density of many categorical variables (about 300). Vermunt et al. recommended suppressing the Newton-Raphson algorithm and standard error estimation in this procedure, since we are only interested in the posterior class probabilities of the subjects. How can I suppress the Newton-Raphson algorithm and standard error estimation in LCA?


You would fix all parameters. 


Dr. Muthen, The only parameters that I'm interested in are class probabilities. Do you mean that I should fix all variances? Would doing so suppress the Newton-Raphson algorithm?


I mean fix all model parameters. That's the only way to avoid the optimization. You still get estimated posterior probabilities for each individual and each class.
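The reason posterior probabilities are still available when every parameter is fixed is that they follow from Bayes' rule applied to the fixed class probabilities and item probabilities; no optimization is involved. The sketch below shows that calculation for binary indicators under local independence. It is illustrative Python, not Mplus internals, and the function name and parameter values are hypothetical. Treating a missing response (None) as contributing nothing to the likelihood is an assumption of this sketch.

```python
def posterior_class_probs(pattern, class_probs, item_probs):
    """P(class k | response pattern) for binary items with fixed parameters.

    pattern          : tuple of 0/1 responses, None for a missing item
    class_probs[k]   : P(class k)
    item_probs[k][j] : P(item j = 1 | class k)
    """
    joint = []
    for p_class, probs in zip(class_probs, item_probs):
        likelihood = p_class
        for response, p in zip(pattern, probs):
            if response is None:
                continue  # assumption: a missing item is simply skipped
            likelihood *= p if response == 1 else 1.0 - p
        joint.append(likelihood)
    total = sum(joint)  # zero only if the pattern is impossible under the model
    return [value / total for value in joint]


# Hypothetical fixed parameters: 2 classes, 3 binary items.
class_probs = [0.5, 0.5]
item_probs = [[0.9, 0.8, 0.1], [0.2, 0.3, 0.7]]
print(posterior_class_probs((1, 1, 0), class_probs, item_probs))
print(posterior_class_probs((1, None, 0), class_probs, item_probs))
```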


Dr. Muthen, It appears that Mplus does not allow fixing variances for categorical outcomes and all my 293 variables are categorical. What would you recommend me to do to avoid variance estimation? 


Categorical outcomes do not have variances. You don't need to fix them. Just don't mention them. 


Hi, I'm interested in conducting LCA on lifestyle data (n=217) with 11 two-category indicators (ordinal) and 1 three-category indicator (ordinal). I'm concerned that I am being excessive with the number of indicators/variables given the small sample size. Although I am concerned about sparse data, the computation time is short and the class profiles make a lot of sense. Are there assumptions for LCA in terms of minimum sample size relative to number of indicators (and number of categories per indicator)? Looking forward to your reply.


I think LCA is perhaps less sensitive to small sample sizes due to it being a parsimonious model. But to get a good feeling for it, you need to do a Monte Carlo study for your particular case, which is easy to do in Mplus (see Chapter 12 of the v6 UG). 


Also concerning the problem of sparse data: I am applying a mover-stayer model with four time points and 1,700 cases. For each time point, there is one nominal manifest indicator. I can estimate the model if my nominal indicator (and hence also the latent variables) has fewer than five classes. However, the nominal variable that is of interest to me has seven classes. If I try to fit the mover-stayer model with more than four classes, either Mplus runs indefinitely (I quit the program after it had run for three weeks) or tells me that it does not have enough memory to run it. I understand that the problem has to do with sparse data. With seven classes and four panel waves, I would ultimately arrive at a cross-table with 7^4 = 2401 cells, with a lot of them being empty. Before I give up, however, I would like to ask whether there is anything I can do about this problem. Do you know of a way to run a mover-stayer model with seven manifest and latent classes?


The number of classes is not determined by the number of categories of the nominal variable. I would start with 2 classes. 


Hello, I wanted to know if I am overfitting my latent class model; although it is estimated in Mplus, I am not sure if it is empirically identified. I have 5 binary indicators of my latent variable (discrimination) and have estimated four classes of discrimination among a sample of 965 adolescents. Any help would be greatly appreciated. Thanks, Bernice


Do you have an LCA with five binary latent class indicators or a model with one continuous factor that has five binary factor indicators? 


Sorry if I was not clear. I have an LCA with five binary latent class indicators. Thanks for your help, Bernice 


How many classes does your model have? 


My model has four classes. Thanks! Bernice 


You can identify up to 5 classes with 5 binary indicators. See Slide 72 of the Topic 5 course handout to see how to figure this out. 


Thank you Linda for the reference and associated formula... very helpful. Is there an equivalent way to estimate degrees of freedom and LCA parameters when you have both binary and ordinal indicators? In another analysis I have estimated a 2-class LCA with 3 indicators (one of which is binary and the other 2 are three-level ordinal indicators). I realize that a 2-class LCA model with 3 binary indicators is not empirically identified. Thank you! Bernice


You need to count the cells in your H1 model and subtract 1 to get the number of H1 parameters. For the H0 model, you need to count the number of thresholds and the number of categorical latent variable means. Just expand on the slide I referred you to.
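The counting rule described above can be written out as a short calculation. This is an illustrative sketch, not the formula from the course handout: it assumes an unrestricted LCA in which every indicator has class-specific thresholds (categories minus 1 per class) and there are classes minus 1 latent class means. The function name is hypothetical. Note that non-negative degrees of freedom are only a necessary counting condition; they do not by themselves guarantee that a model is identified.

```python
from math import prod


def lca_degrees_of_freedom(n_classes, categories_per_indicator):
    """Return (H1 parameters, H0 parameters, degrees of freedom)."""
    # H1: one parameter per cell of the full cross-tabulation, minus 1.
    h1_params = prod(categories_per_indicator) - 1
    # H0: class-specific thresholds plus categorical latent variable means.
    thresholds = n_classes * sum(c - 1 for c in categories_per_indicator)
    class_means = n_classes - 1
    h0_params = thresholds + class_means
    return h1_params, h0_params, h1_params - h0_params


# Five binary indicators, as in the question above.
for k in (4, 5, 6):
    print(k, lca_degrees_of_freedom(k, [2, 2, 2, 2, 2]))

# One binary and two three-category ordinal indicators, 2 classes.
print(lca_degrees_of_freedom(2, [2, 3, 3]))
```

Under these assumptions, five binary indicators leave positive degrees of freedom up to 5 classes and negative degrees of freedom at 6, which matches the limit quoted earlier in the thread.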


In the TECH10 output, response patterns are printed. For example:

1  00
2  10
3  01
4  11
5  *0
6  *1
7  0*

What do response patterns 5, 6, and 7 indicate, where there is a "*" as one of the values? This is an analysis with two dichotomous variables. Thank you!


Please send the output and your license number to support@statmodel.com so I can see the context. 


Hi Linda, thank you for your response. I don't presently have an active support contract. I looked at this again, it appears that the *'s are not present when I use listwise=on, so I assume the *'s represent response patterns with missing data? 


I think that is right. 


Thank you Linda and Bengt for your responses! 

mpduser1 posted on Friday, June 14, 2013  1:23 pm



I am fitting a series of LC models in Mplus using weighted data and the newer model fitting procedures (specifically the LMR chi-square), as well as AIC and BIC. I'm noticing that the LMR chi-square and BIC indicate a reasonable number of latent classes (T < 5), and substantively the classes seem reasonable. However, AIC does not indicate a reasonable number of latent classes. So, my question is, is the wide divergence between the AIC and BIC results a function of sparse data? I would guess that AIC is less conservative than BIC in identifying the best latent class solution. Yet I've not seen published examples where the AIC fails to suggest a reasonable number of latent classes.


I am not sure that sparse data would lead to this discrepancy. Don't know what might. 
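For readers comparing the two criteria, the sketch below uses the standard textbook definitions, AIC = -2LL + 2p and BIC = -2LL + p ln(n), to show how they can point to different numbers of classes purely because of their penalties. It is not an explanation of the poster's result: the log-likelihood values are invented for illustration, the data are treated as unweighted, and the parameter count assumes an unrestricted LCA with 10 binary items.

```python
from math import log


def information_criteria(loglik, n_params, n_obs):
    """Standard AIC and BIC from a log-likelihood."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * log(n_obs)
    return aic, bic


def n_params(n_classes, n_items=10):
    """Free parameters in an unrestricted LCA with binary items."""
    return n_classes * n_items + (n_classes - 1)


# Hypothetical log-likelihoods for 2 to 6 classes, n = 1000.
n_obs = 1000
logliks = {2: -6000.0, 3: -5900.0, 4: -5860.0, 5: -5845.0, 6: -5832.0}

criteria = {k: information_criteria(ll, n_params(k), n_obs) for k, ll in logliks.items()}
best_by_aic = min(criteria, key=lambda k: criteria[k][0])
best_by_bic = min(criteria, key=lambda k: criteria[k][1])
print("AIC prefers", best_by_aic, "classes; BIC prefers", best_by_bic, "classes")
```

Because each extra parameter costs 2 under AIC but ln(n) under BIC, small likelihood gains from extra classes can keep lowering AIC long after BIC has started to rise.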


Hello, I am running a latent class model with 9 binary indicators. Looking at the univariate response proportions, I notice that very few study participants fail to endorse one of the items (i.e., 2 out of almost 500). Given that this item provides very little information with which to differentiate individuals, I am inclined to drop it from the model. This item is also a key item when determining study eligibility, so one would expect 100% endorsement. My question: Can items with such skewed distributions lead to problems with an LCA, and is it generally advisable to consider removing them when performing a latent class analysis?


I don't think such an item necessarily leads to problems in the analysis, but it probably doesn't contribute much and you are wise to remove it. 

Sam Daneils posted on Thursday, November 14, 2019  1:14 pm



Dear Dr. Muthen, I have a question on how to improve LCA class homogeneity and separation for indicators with sparse response patterns. I initially estimated an LCA model with 14 binary indicators. But because all of them contain sparse responses and the initial LCA model with 14 indicators showed high bivariate residuals among different indicators, I combined the 14 indicators into 6 indicators. However, my models (4-class results shown below as an example) still have issues with low class homogeneity and separation. I tried constraining posterior parameters, but that didn't help much. Inspired by your other publications, I think I will try including a covariate in the model. Could you please point me to other approaches I should consider? Any suggestion is greatly appreciated!

Loglikelihood: -31052.761
Akaike (AIC): 62159.521
Sample-Size Adjusted BIC: 62277.897
Chi-square: 68.19
Entropy: 0.642

      C1      C2      C3      C4
I1    0.076   0.322   1       0.065
I2    0.025   0.358   0.043   0.017
I3    0.007   0.359   0.09    0.012
I4    0.774   0.443   0.236   0.35
I5    0.635   0.494   0.271   0.366
I6    0.194   0.233   0.01    0

Thanks in advance! Sam


You can work with your 14 items and study their item-specific entropy contribution. See http://www.statmodel.com/download/UnivariateEntropy.pdf
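As a rough illustration of the idea, the sketch below computes a normalized entropy for a single binary item, using the posterior class probabilities implied by that item alone. This is an assumption about what an item-specific entropy looks like, modeled on the usual relative entropy (1 minus average posterior entropy divided by ln K); the linked note should be consulted for the exact definition. The item probabilities are taken from the 4-class table posted above, but the class proportions were not reported, so equal proportions are assumed here purely for illustration.

```python
from math import log


def univariate_entropy(class_probs, p_item_given_class):
    """Normalized entropy of class membership given one binary item alone.

    Returns a value in [0, 1]; higher means the item separates classes better.
    """
    n_classes = len(class_probs)
    expected_entropy = 0.0
    for value in (0, 1):
        joint = [
            pc * (p if value == 1 else 1.0 - p)
            for pc, p in zip(class_probs, p_item_given_class)
        ]
        marginal = sum(joint)
        if marginal == 0.0:
            continue  # this response value never occurs under the model
        posterior = [j / marginal for j in joint]
        expected_entropy += marginal * -sum(q * log(q) for q in posterior if q > 0.0)
    return 1.0 - expected_entropy / log(n_classes)


# Assumption: equal class proportions (not reported in the post).
class_probs = [0.25, 0.25, 0.25, 0.25]
items = {
    "I1": [0.076, 0.322, 1.0, 0.065],
    "I2": [0.025, 0.358, 0.043, 0.017],
    "I3": [0.007, 0.359, 0.09, 0.012],
    "I4": [0.774, 0.443, 0.236, 0.35],
    "I5": [0.635, 0.494, 0.271, 0.366],
    "I6": [0.194, 0.233, 0.01, 0.0],
}
for name, probs in items.items():
    print(name, round(univariate_entropy(class_probs, probs), 3))
```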
