

I want to compare latent class analyses of symptoms in girls and boys. Can I do a two-group analysis in LCA (where I could then force the groups to be equivalent or not and check whether this causes worse fit), or should I analyze the two groups separately? Another question: I have been looking at Tech 8 to check for smooth convergence. Occasionally, a "QN" or "FS" pops up in the algorithm column, and this is often accompanied by a big jump in the change in loglikelihood. So the change in loglikelihood is not smoothly converging to zero (which is what I thought I was hoping to see in the Tech 8 output). Is this a problem? Can you explain a little more what I should be checking for in the Tech 8 output? Thanks for all the help, Jennie


It is not necessary to use multiple group analysis to compare the symptom items across gender. You can just regress the symptom items on gender and accomplish your goal. The symptom items have no variances or covariances, so you don't need multiple group analysis. The QN and FS in the algorithm column indicate that the estimation algorithm has changed. QN stands for quasi-Newton and FS stands for Fisher scoring. This is not something to worry about. Following are the things you should be looking at in the TECH8 output:
1. The loglikelihood should increase smoothly and reach a stable maximum; with a change in algorithm, there may be a larger jump.
2. The absolute and relative changes should go to zero.
3. The class counts should remain stable.
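Checks 1 and 2 can be applied to an iteration history copied out of TECH8 with a rough screen like the one below. The loglikelihood values are hypothetical and the tolerance is an arbitrary choice. The definitions of absolute and relative change used here (last step in the loglikelihood, and that step divided by the loglikelihood) are assumptions and may not match Mplus's own columns exactly. The function is not part of Mplus, and it does not cover check 3 (stable class counts).

```python
def check_tech8_history(logliks, tol=1e-3):
    """Screen a loglikelihood iteration history for checks 1 and 2.

    Returns (never_decreases, converged):
      never_decreases: no step lowered the loglikelihood (tiny rounding allowed)
      converged: the final absolute and relative changes are below tol
    """
    changes = [b - a for a, b in zip(logliks, logliks[1:])]
    never_decreases = all(c >= -1e-9 for c in changes)
    last_abs = abs(changes[-1])
    last_rel = last_abs / abs(logliks[-1])
    return never_decreases, (last_abs < tol and last_rel < tol)


# Hypothetical iteration history: large early steps (as after a switch to QN or FS)
# are fine as long as the loglikelihood keeps rising and the changes shrink to zero.
history = [-5400.2, -5312.7, -5290.4, -5288.9, -5288.85, -5288.8499]
print(check_tech8_history(history))
```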


Dear Linda, I want to check whether a latent class model with 12 binary LC indicators and 4 classes is the same for males and females. Therefore, I used the KNOWNCLASS option to do a multigroup analysis. I have managed to constrain the response probabilities to be equal across gender, but now I also want to test whether the class sizes are equal for males and females. I tried to specify this with the following model statement: MODEL: %OVERALL% [csex#1.c#1*0.882] (49); [csex#1.c#2*0.216] (50); [csex#1.c#3*0.716] (51); [csex#2.c#1*0.882] (49); [csex#2.c#2*0.216] (50); [csex#2.c#3*0.716] (51); However, it didn't work. Could you give me a hint about what I must change? Thanks a lot!

bmuthen posted on Sunday, February 27, 2005  11:30 am



I think perhaps the easiest way to get the gender invariance of class probabilities that you want is to instead let c and cg be uncorrelated as they are by default. Note that in ex 8.8 you have c#1 on cg#1; which means that the c class probabilities vary as a function of the cg (Knownclass) classes. Leaving out that line makes the c probabilities the same for the classes of cg, which is what you want. Try that. 


Thank you Bengt. This was actually what I did (I left out that line), but it appears from my output that the class sizes are only approximately (but not perfectly) identical:

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS BASED ON THE ESTIMATED MODEL

  cg   c   Count       Proportion
  1    1   247.64496   0.14610
  1    2   116.88705   0.06896
  1    3   148.71330   0.08774
  1    4   221.79412   0.13085
  1    5   115.96057   0.06841
  2    1   245.60793   0.14490
  2    2   115.92559   0.06839
  2    3   147.49004   0.08701
  2    4   219.96972   0.12978
  2    5   115.00672   0.06785

When I do the (restricted) analysis in PANMARK, the class counts in both groups match perfectly (though the parameter estimates appear to be identical to those of Mplus). Do you have an explanation?


Please send your input, output, and data to support@statmodel.com so we can answer your question. 

bmuthen posted on Thursday, March 03, 2005  1:47 pm



Christian, it looks like the difference in estimated class probabilities is merely due to the slightly different group sizes. You have sample size 851 in the first group and 844 in the second (a ratio of 1.008). Since the estimated proportions that you give above are from the joint distribution of the 2 x 5 table (the 10 proportions add to 1), you would only see the same 5 class proportions in the two groups if you account for the sample size difference. For instance, 0.14610/1.0083 is approximately 0.1449.
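The adjustment can be checked directly from the counts in Christian's output. Divide each group's counts by that group's own total, rather than by the combined N of 1695, to get conditional class proportions within each KNOWNCLASS group. The sketch below does only this arithmetic; it is not an Mplus feature.

```python
# Estimated class counts from the output above, for latent classes c = 1..5.
group1 = [247.64496, 116.88705, 148.71330, 221.79412, 115.96057]  # cg = 1
group2 = [245.60793, 115.92559, 147.49004, 219.96972, 115.00672]  # cg = 2


def within_group(counts):
    """Convert one group's joint counts into class proportions within that group."""
    n = sum(counts)
    return [c / n for c in counts]


p1 = within_group(group1)
p2 = within_group(group2)
for k, (a, b) in enumerate(zip(p1, p2), start=1):
    print(f"class {k}: group 1 = {a:.4f}, group 2 = {b:.4f}")
```

The group totals come out as 851 and 844, and the within-group proportions agree to about five decimals. This is what you would expect once c is unrelated to cg.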


Dear Bengt, thank you very much. Later when I was looking at the transition probabilities I realized that the class probabilities were actually identical in both groups. I have another question. I did my multigroup LCA both in Mplus and PANMARK. Now, I found that the Loglikelihood, AIC and BIC values as well as the parameter estimates are identical in both programs. However, df, Pearson X^2 and the LR test statistic were different for multigroup analysis (not for single group LCA!). Do you have an explanation? Thanks again, Christian 

bmuthen posted on Thursday, March 24, 2005  7:52 am



There was a glitch in the computation of those statistics when Knownclass was used; this has been fixed in version 3.12. Let us know if you still have discrepancies after trying that.

Anonymous posted on Thursday, September 01, 2005  9:46 am



I have cross-sectional data from a psychological inventory, and I want to test the hypothesis that there are different classes (or profiles) for different age groups (2 age groups). What is the best strategy to test this hypothesis: using age as a predictor, or using Knownclass (e.g., ex 8.8)? In addition, what paper do you recommend that could help me interpret the data for group differences? I need to be as descriptive as possible because my audience will not be very sophisticated in terms of statistical knowledge. Thanks, P

bmuthen posted on Thursday, September 01, 2005  11:30 am



Using age as a predictor is perhaps most straightforward. There is a 1985 Clogg & Goodman paper in Sociological Methodology that discusses group differences in latent class analysis.

anonymous posted on Thursday, January 12, 2006  5:23 am



Is it feasible to regard latent classes generated through LCA as subpopulations? In other words, I would like to use the 4 latent classes yielded by my LCA in a general SEM model, but rather than use the classes as predictors in this model, I would like to carry out multigroup comparisons based on individuals' most likely class membership. Is this possible, or does this result in estimation error? Or would it be better to include the actual class probabilities, rather than most likely membership, as predictors in a model?


It is not a good idea to use most likely class membership as a grouping variable. You will be introducing estimation error, and your standard errors will not be correct. You could use the class probabilities as predictors, but it would be better to do the analysis simultaneously rather than in two steps.

anonymous posted on Thursday, January 12, 2006  9:03 am



Thanks for this. Can you point me towards an example of how this is done in a single step, or suggest any paper or reference that has applied this? Many thanks


The way to have a latent class variable as a covariate, that is, to regress a dependent variable on the latent class variable, is to allow the means of the dependent variable to vary across classes. 

RJM posted on Tuesday, January 24, 2006  4:13 pm



I would like to estimate a multiple-group (KNOWNCLASS = cg) hidden Markov model with two classes at each occasion, testing the equality across the two groups of the probability matrix linking each latent class variable and its indicator. Is this possible? I tried coding the model as follows, but Mplus v3.13 returns an error that the class label is unknown. MODEL x: %cg#1.x#1% [f$1] (111); %cg#1.x#2% [f$1] (121); %cg#2.x#1% [f$1] (211); %cg#2.x#2% [f$1] (221);


Please send your input, data, output, and license number to support@statmodel.com. 

Andy Ross posted on Tuesday, May 09, 2006  11:03 am



Dear Prof. Muthen, I am attempting to run a multigroup analysis comparing a latent class solution for two groups. In the first step I ran a four-class solution for each group simultaneously using the KNOWNCLASS command, allowing both class and conditional probabilities to vary across groups: TITLE: slca DATA: FILE IS c:\slca; VARIABLE: NAMES ARE pt re kd em hq g; USEVARIABLES ARE pt re kd em hq; CLASSES = cg(2) c(4); KNOWNCLASS = cg (g = 1 g = 2); CATEGORICAL = pt re kd em hq; ANALYSIS: TYPE = MIXTURE; STARTS = 10 5; STITERATIONS = 200; MITERATIONS = 3000; MODEL: %overall% c#1-c#3 on cg#1; In the next step I wanted to run a restricted model in which the class probabilities are equal across groups (structural homogeneity). However, I have not been able to set this up. I tried inputting the start thresholds for the first group and constraining the conditional probabilities to be equal across groups using the following syntax: TITLE: slca DATA: FILE IS c:\slca; VARIABLE: NAMES ARE pt re kd em hq g; USEVARIABLES ARE pt re kd em hq; CLASSES = cg(2) c(4); KNOWNCLASS = cg (g = 1 g = 2); CATEGORICAL = pt re kd em hq; ANALYSIS: TYPE = MIXTURE; STARTS = 10 5; STITERATIONS = 200; MITERATIONS = 3000; MODEL: %overall% %cg#1.c#1% [pt$1*4.1 pt$2*2.7](p1); [re$1*2.7 re$2*7.8](p2); [kd$1*3.5 kd$2*1.5](p3); [em$1*0.3 em$2*1.5 em$3*3.7](p4); [hq$1*2.8 hq$2*0 hq$3*0.8](p5); %cg#1.c#2% [pt$1*0.7 pt$2*0.2](p6); [re$1*2.2 re$2*12](p7); [kd$1*3.1 kd$2*12](p8); [em$1*2.8 em$2*3.1 em$3*3.2](p9); [hq$1*3.3 hq$2*0.4 hq$3*0.3](p10); %cg#1.c#3% [pt$1*1.4 pt$2*0.6](p11); [re$1*1.2 re$2*3.6](p12); [kd$1*3.2 kd$2*0.5](p13); [em$1*0.8 em$2*0.1 em$3*1.8](p14); [hq$1*0.5 hq$2*2.3 hq$3*3.2](p15); %cg#1.c#4% [pt$1*4.0 pt$2*5.0](p16); [re$1*3.7 re$2*0.9](p17); [kd$1*3.8 kd$2*6.8](p18); [em$1*0.8 em$2*1.0 em$3*1.1](p19); [hq$1*1.2 hq$2*0.7 hq$3*1.5](p20); %cg#2.c#1% [pt$1*4.1 pt$2*2.7](p21); [re$1*2.7 re$2*7.8](p22); [kd$1*3.5 kd$2*1.5](p23); [em$1*0.3 em$2*1.5 em$3*3.7](p24); [hq$1*2.8 hq$2*0 hq$3*0.8](p25); %cg#2.c#2% [pt$1*0.7 pt$2*0.2](p26); [re$1*2.2 re$2*12](p27); [kd$1*3.1 kd$2*12](p28); [em$1*2.8 em$2*3.1 em$3*3.2](p29); [hq$1*3.3 hq$2*0.4 hq$3*0.3](p30); %cg#2.c#3% [pt$1*1.4 pt$2*0.6](p31); [re$1*1.2 re$2*3.6](p32); [kd$1*3.2 kd$2*0.5](p33); [em$1*0.8 em$2*0.1 em$3*1.8](p34); [hq$1*0.5 hq$2*2.3 hq$3*3.2](p35); %cg#2.c#4% [pt$1*4.0 pt$2*5.0](p36); [re$1*3.7 re$2*0.9](p37); [kd$1*3.8 kd$2*6.8](p38); [em$1*0.8 em$2*1.0 em$3*1.1](p39); [hq$1*1.2 hq$2*0.7 hq$3*1.5](p40); MODEL CONSTRAINT: p1=p21; p2=p22; p3=p23; p4=p24; p5=p25; p6=p26; p7=p27; p8=p28; p9=p29; p10=p30; p11=p31; p12=p32; p13=p33; p14=p34; p15=p35; p16=p36; p17=p37; p18=p38; p19=p39; p20=p40; However, this did not work. Could you please tell me how I can set up and run the structural homogeneity model for the above example? Also, to run the next step, in which I also restrict the class probabilities to be equal across groups (complete homogeneity), do I simply run the original syntax without the line MODEL: %overall% c#1-c#3 on cg#1; ? Is this correct? Many thanks for your support, Andy


You need to send your input, data, output, and license number to support@statmodel.com to get help on this. 


Hi, I want to test the hypothesis that the proportions of 2 classes in a low-educated group are the same as the proportions of the 2 classes in a high-educated group, using education as a predictor. However, I could not find the correct command to compare the proportions of classes between 2 groups. Is there a command for this test? Although I know how to test my hypothesis using a KNOWNCLASS model, the output of this model did not report df (I don't know why). Because I want to do a statistical test using the X^2 distribution, I need the df. Many thanks


Instead of using the education variable as a grouping variable, use it as a covariate and regress the categorical latent variable on it using the ON option of the MODEL command. 


I would like to test group differences in 3-class LCA profiles across two variables: sex and affection status for a disorder. I tried to do this using the KNOWNCLASS option by creating 4 groups (male/unaffected, female/unaffected, male/affected, female/affected), then equating the response probabilities step by step, starting with groups 1 and 3 vs. 2 and 4, etc., and comparing the BICs for these models. I coded this as follows: KNOWNCLASS = cg (group=1 group=2 group=3 group=4); classes = cg(4) c(3) ; and then: Model: %OVERALL% C#1 ON cg#1; C#2 ON cg#1; C#1 ON cg#2; C#2 ON cg#2; C#1 ON cg#3; C#2 ON cg#3; I have two questions: 1) Did I specify this model correctly (I have never seen any input using more than 2 groups)? 2) Is it a valid approach to create four groups the way I did, or is there a better way to do this?


1. It looks correct. In Version 5, you can simply say c ON cg; 2. I would use 2 times the loglikelihood difference, not BIC.

C. Sullivan posted on Thursday, July 24, 2008  6:00 am



Hi, I'm trying to conduct a multigroup LCA using KNOWNCLASS (adstat), and while I can run a model constrained in terms of conditional item probabilities, I'm having difficulty holding the latent class probabilities equal. Any advice on how to constrain those latent class probabilities would be much appreciated. This is the input that I have so far. MODEL: %Overall% drgcrm#1 on adstat#1; %adstat#1.drgcrm#1% [coc$1] (2); [op$1] (3); [pcp$1] (4); [mj$1] (5); %adstat#1.drgcrm#2% [coc$1] (6); [op$1] (7); [pcp$1] (8); [mj$1] (9); %adstat#2.drgcrm#1% [coc$1] (2); [op$1] (3); [pcp$1] (4); [mj$1] (5); %adstat#2.drgcrm#2% [coc$1] (6); [op$1] (7); [pcp$1] (8); [mj$1] (9);


Try removing the statement: drgcrm#1 on adstat#1; If you continue to have problems, send your files and license number to support@statmodel.com. 


Sorry for this very simple question, but when comparing a restricted versus an unrestricted LCA model with known classes, do you compare the statistics printed as the loglikelihood values or the likelihood ratio chi-squares? Thanks, James


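To test nested LCA models, the regular loglikelihood values are compared: 2 times the loglikelihood difference is used as a chi-square statistic, with degrees of freedom equal to the difference in the number of free parameters.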
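The arithmetic of this test is sketched below using only the Python standard library. The loglikelihood values and the df of 12 are hypothetical. The chi-square tail function uses the closed-form expressions for integer degrees of freedom. The test applies to nested models with the same number of classes, such as measurement parameters held equal versus free across KNOWNCLASS groups. It is not meant for deciding on the number of classes.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: the 1-df tail plus sqrt(2x/pi) * exp(-x/2) * (1 + x/3 + x^2/15 + ...)
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def lr_test(loglik_restricted: float, loglik_free: float, df: int):
    """Likelihood-ratio chi-square for two nested models with the same number of classes."""
    stat = 2.0 * (loglik_free - loglik_restricted)
    return stat, chi2_sf(stat, df)


# Hypothetical loglikelihoods: parameters held equal across groups (restricted)
# versus freely estimated (free); df = difference in the number of free parameters.
stat, p = lr_test(loglik_restricted=-5000.0, loglik_free=-4990.0, df=12)
print(f"LR chi-square = {stat:.1f}, df = 12, p = {p:.4f}")
```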


Hi, I'm comparing latent class structures between groups. I've performed the analysis separately on the two samples, and the results are that a 3-class model is good for group 1, while for group 2 a 4-class model is preferable. Since classes 1 to 3 are in effect similar in both groups (e.g., each class is defined in the same way by the same items), can I specify a model that considers at the same time the measurement equivalence of classes 1 to 3 and the fact that group 2 has 1 additional class?


That seems to be a reasonable approach. 


OK, but how can I specify the model? If I constrain the means of class 4 of group 1 (cg#1.c#4) to be zero, I have 0 subjects in that class in the model with no other constraints, but 114 subjects in the model where I constrain measurement invariance across groups. Do I have to constrain some other parameter? The syntax is: CLASSES = cg (2) c(4) ; KNOWNCLASS = cg (country = 1 country = 2); ANALYSIS: TYPE = MIXTURE; STARTS = 500 50; MODEL: %OVERALL% c ON cg ; %cg#1.c#4% [v1-v6@0]; MODEL c: %c#1% [v1-v6]; %c#2% [v1-v6]; %c#3% [v1-v6];


In %OVERALL%, for the class you want to be the empty class, for example class 1, specify: c#1 ON cg#1@-15; For further support, please send your question and license number to support@statmodel.com.


Hello! I want to do an LCA with multiple groups. I have 4 groups, and for 3 of the 4, a 2-class solution fits the data best, but for one group a 3-class solution fits best. Is there a way to free the number of classes for that one group, or some other way to handle this situation? Thanks!


It is not easy to work with different numbers of classes in different groups. Instead, you could investigate the 3-class solution in all groups; perhaps in the 3 groups where 2 classes fit best, the 3-class solution is just a minor extension of the 2-class theme.


I am working on a model similar to those described above (gender comparison of latent classes based on 13 binary indicators) using the KNOWNCLASS approach. We have been able to run models that freely estimate item-response probabilities across classes by gender, and also models in which the item-response probabilities are restricted within class by gender (i.e., males and females in a given class are restricted to have the same item-response probabilities). Where I am having some difficulty is in restricting the class probabilities (i.e., the prevalence of each class) to be the same across gender. I have removed the regression of the latent classes on the knownclass variable from the overall statement (as suggested previously), but the class probabilities still differ by gender.


Please send your input, output, data, and license number to support@statmodel.com so we can see your exact setup. 

F Lamers posted on Friday, October 22, 2010  1:32 pm



I’m trying to run an LCA using the KNOWNCLASS command to evaluate whether profiles are similar between two groups. First, I ran an unrestricted model where class and conditional probabilities are allowed to vary across groups: VARIABLE: NAMES ARE sampleid d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 group; USEVARIABLES ARE d1-d10 ; CATEGORICAL= d1 d2 d3 d4 d5 d6 d7 d8 d9 d10; MISSING= all (1234); IDVARIABLE IS sampleid; CLASSES= cg (2) c(2); KNOWNCLASS = cg (group=0 group=1); ANALYSIS: TYPE= MIXTURE; STARTS= 400 100 ; PROCESSORS=2; MODEL: %OVERALL% c on cg; Now, I would like to equalize the response probabilities in the classes between the two groups (and then compare the models with a loglikelihood difference test). I’ve been trying to write the input for this, but I’m not sure if I’m doing it the right way. MODEL:%OVERALL% c on cg; %cg#1.c#1% [d1$1 d2$1 d3$1 d3$2 d4$1 d4$2 d5$1 d5$2 d6$1 d6$2 d7$1 d8$1 d9$1 d10$1] (1-14); %cg#1.c#2% [d1$1 d2$1 d3$1 d3$2 d4$1 d4$2 d5$1 d5$2 d6$1 d6$2 d7$1 d8$1 d9$1 d10$1] (15-28); %cg#2.c#1% [d1$1 d2$1 d3$1 d3$2 d4$1 d4$2 d5$1 d5$2 d6$1 d6$2 d7$1 d8$1 d9$1 d10$1] (1-14); %cg#2.c#2% [d1$1 d2$1 d3$1 d3$2 d4$1 d4$2 d5$1 d5$2 d6$1 d6$2 d7$1 d8$1 d9$1 d10$1] (15-28); Is this the right way?


That looks correct. The best way to check is to run it and see if you get what you expect. 


I am trying to run a similar model to the one above, where response probabilities are equalized across different groups, and I have the following queries: 1) when the dependent variables are categorical, how many constraints do I need? Is it the number of categories minus 1? 2) Do the groups need to be the same size when doing multiple group analysis with knownclass? 


1. The number of thresholds for a categorical variable is the number of categories minus one. 2. No. 


Hi: I have a question similar to one posed above, but I am not sure I am interpreting my output correctly. I have a 4-class model (generated from 7 binary latent class indicators), and the substantive question I want to ask is whether membership in these classes is the same for males and females (gender as a covariate). I have regressed the categorical latent variable on gender using: MODEL: %OVERALL% C#1 on sex; However, I am not sure how to interpret the estimates, specifically the overall C#1 on sex estimate and the 3 intercept estimates (C#1, C#2, C#3). Any insight would be greatly appreciated. Also, is there a separate plot command to generate a plot of the relationship between class probabilities and the covariate (as opposed to the item probabilities for each class)?


You should use the specification c ON x. For this multinomial logistic regression with four classes, you will obtain three regression coefficients and three intercepts. The intercepts are used along with the regression coefficients to compute probabilities. See pages 443-445 of the Version 6 user's guide. As for the regression coefficients, the last class is the reference class. Let's say that for a continuous covariate x, the regression coefficient for class 1 is positive. The interpretation is that as x increases, the log odds of being in class 1 rather than the reference class increase.
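The sketch below shows how intercepts and slopes from c ON x become class probabilities in a multinomial logit where the last class is the reference. The intercept and slope values are hypothetical, not taken from any output in this thread. The function is an illustration of the formula, not an Mplus routine.

```python
import math


def class_probabilities(intercepts, slopes, x):
    """P(c = k | x) for a multinomial logit with the last class as reference.

    intercepts, slopes: one value per non-reference class (K - 1 entries);
    the reference class has intercept 0 and slope 0 by construction.
    """
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4-class estimates: intercepts [c#1], [c#2], [c#3] and their slopes on x.
intercepts = [-0.8, 0.4, 0.1]
slopes = [0.6, -0.3, 0.0]

for x in (0.0, 1.0):
    probs = class_probabilities(intercepts, slopes, x)
    print(x, [round(p, 3) for p in probs])

# exp(slope) is the odds ratio for class k versus the reference class
# for a one-unit increase in x.
print([round(math.exp(b), 3) for b in slopes])
```

A positive slope raises the odds of class k relative to the reference class. Because all the probabilities must sum to one, it does not by itself guarantee that the absolute probability of class k rises.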


Thanks! Is it the case that if one uses ALGORITHM = INTEGRATION (to regress the covariate on the latent classes), plots of the relationship between class probabilities and the covariate cannot be generated? I have seen examples of these plots and would like to see them for my data, but I cannot run the {above} model without ALGORITHM = INTEGRATION.


I think that is true. Do you really need algorithm = integration? (Also, c ON x doesn't mean to regress the covariate on the latent classes, but the other way around.) 


Thanks, and thanks for the language clarification. I worked with my syntax and was able to get the plots of probabilities for the classes as a function of the covariate (I was able to run it without ALGORITHM = INTEGRATION). One conceptual question: given that the covariate is not assumed independent amongst the classes, what is the usual justification for including a covariate in class generation? In other words, if there is a theoretical basis for including the covariate in the model, but the logistic regression coefficients are not significant, does it make sense NOT to include the covariate in subsequent latent class generation? Relatedly, the approach I have been taking with my data is to use the classes to ascertain or predict differences on various relevant antecedent and sequelae variables (e.g., ANOVA/logistic regression). But does it make more sense to use variables that might explain the classes as covariates in the model (aside from the binary behavioral indicators of the phenomenon)?


Answer to your 1st question: Yes. Answer to your 2nd question: if you think of certain variables as antecedents of the latent class variable, I would include them as covariates ("c ON x"). Then you can also see how the covariate means change over the classes. This is different from having these variables as indicators of the latent classes, because there is no assumption of conditional independence among covariates.


Thanks again for the helpful remarks. I am still in the process of fully grasping some of the conceptual aspects of the LCA method. One further point of clarification on the above: is it the case that I cannot compare models (same number of classes) with and without the covariates included? Although my BIC and adjusted BIC values are more favorable with the covariates included, their estimates are not significant. That said, I am not certain such a comparison is tenable (i.e., perhaps one can only compare the same model when adjudicating between different numbers of classes).


Because the likelihood is computed for outcomes conditional on covariates, you can compare BIC between models with and without covariates as long as the models have the same outcomes. But if none of the covariates are significant, why include them? Note that you can test the joint significance of the covariates by using MODEL TEST to test whether all their slopes are zero, instead of looking at the z score for each slope separately. I'm not sure, but I think this is what you were asking.
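Mplus prints BIC and sample-size adjusted BIC itself. The sketch below only shows the arithmetic behind them, so it is clear what changes when covariates add parameters. All loglikelihoods are hypothetical. The parameter counts assume a 4-class model with 7 binary indicators, optionally with 5 covariates, and n = 177 as mentioned later in this thread. The comparison is only meaningful if both runs use the same cases; listwise deletion on missing covariates would change n.

```python
import math


def bic(loglik: float, n_params: int, n: int) -> float:
    """BIC = -2 logL + p ln(n); smaller is better."""
    return -2.0 * loglik + n_params * math.log(n)


def adjusted_bic(loglik: float, n_params: int, n: int) -> float:
    """Sample-size adjusted BIC: n is replaced by (n + 2) / 24."""
    return -2.0 * loglik + n_params * math.log((n + 2) / 24.0)


# Hypothetical 4-class runs with 7 binary indicators and n = 177:
# 28 thresholds + 3 class intercepts = 31 parameters without covariates;
# 5 covariates add 3 x 5 = 15 slopes.
n = 177
without_cov = dict(loglik=-1200.0, n_params=31)
with_cov = dict(loglik=-1190.0, n_params=46)
for label, m in (("without covariates", without_cov), ("with covariates", with_cov)):
    print(label, round(bic(n=n, **m), 2), round(adjusted_bic(n=n, **m), 2))
```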


Just to be sure of myself: by outcomes you mean classes, correct? Yes, since the covariates are not significant, I likely will not include them in the model. The somewhat odd result I am having trouble grasping is that with the same indicators a 4-class solution is best without the covariates, but a 3-class solution seems best with them (indeed, model estimates are more favorable for the 3-class solution with covariates than without). Substantive theory in my area would point to either a 3- or 4-class solution, so in that sense it is not a problem. One thought I had was that, even though the covariates in the 3-class solution are not significant (the risk index comes close), the resulting 3 classes may have better predictive yield for the substantive question I am looking to address with the classes (i.e., unique predictive correlates). Is this a reasonable way to approach the issue?


I actually meant the observed outcomes being the same, which I assume you have in your case. The topic of deciding on the number of classes with or without covariates has been studied by several authors; you may want to email Katherine Masyn at Harvard School of Ed, for example. Some covariates may have direct effects on the observed outcomes, which makes things more complex; in that case you should include the direct effects. See also my 2004 chapter on GMM in the Kaplan handbook. If you have theoretical reasons for a set of covariates influencing the latent class variable, I would not be against reporting results from such an analysis even if none of the covariates turns out to be significant.


Dr. Muthen, thanks again, this is very helpful. If I am correct that observed outcomes = observed latent class indicators, then yes, they were the same 7 for each model. Given that a 4-class model emerged without the covariates but a 3-class model emerged with them (albeit non-significant), it must still be the case that group membership has changed (4 to 3 classes, etc.) as a function of including the covariates? Or am I misunderstanding? I have not used the three-class solution in analyses yet, but I am anxious to see whether it shows a different pattern of results (in terms of its relationship to subsequent variables). One thing that may be limiting the model is that the N (cases) is 177. This seems rather low based on my reading of most LCA literature. However, given the relative rarity of the behavioral phenotype of interest, 177 is probably the largest sample on which this type of analysis will be conducted, at least at present.


If your 3-class model with covariates has some significant direct effects from some covariates to some of the observed latent class indicators, it may be that it has a worse BIC than a 4-class model. So the group membership may not change when the modeling with covariates is done in more depth. A sample of less than 200 can make BIC underestimate the number of classes; see the article on our web site: Nylund, K.L., Asparouhov, T., & Muthén, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling, 14, 535-569.


Thanks again, I read over the Nylund piece in detail and it was quite helpful. One thing that a number of colleagues have raised with me is that it may be circular to claim, in my case for example, that a given class shows associations with a covariate (e.g., maternal depression) if that covariate is informing the class membership in the first place (i.e., their position is that it is more plausible to use just the latent class indicators to inform class enumeration and then test for differences on antecedents in a separate step). I have received this critique a number of times, and I am curious what the advantage is of using covariates in the LCA modeling process (i.e., what is the rebuttal to this claim)?


That's a big topic. I think it all depends on the application. It is interesting that this comes up often in LCA but not in factor analysis. A factor model with factors regressed on covariates (a MIMIC model) typically would not be questioned: the covariates contribute information to the formation of the factor scores, so you include them. There are exceptions, though; take the MIMIC model used by ETS to score student performance. On the individual level, using only the indicators makes sense: you wouldn't want, say, gender to influence your factor score, only your performance. But on the group level, covariates improve the scores. In your case, if you want to show how strongly a covariate predicts class membership, you do that best in a single run including the covariate. Doing it in several steps (estimate without covariates, classify, regress class on covariates) has its own complications. However, I can see an argument for not using the covariates when deciding on the classes. For instance, if the covariates are genotypes and the indicators are phenotypes, you want to define the phenotype without using the genetic information and then see the strength of the relationship.


Thanks, Dr. Muthen. Would one include a direct effect of a covariate on a given indicator if one had a priori theory or evidence that the covariate influences, say, one of the X indicators? This is certainly not the case for me (i.e., I only suspect that the chosen covariates will in some way influence class membership; I have no prior evidence to suggest that they are associated with a given indicator), but I was curious as to when one would include such a direct effect of a covariate on an indicator (in addition to regressing the class on the covariate).


You would do it either by theory, or when less is known, simply by regressing one indicator at a time on all covariates in an exploratory way to see which direct effects are significant. 


Returning again to this thread to make sure I am interpreting some things correctly. I just finished reading some of Collins and Lanza (2010) and had a few questions. If I fit an LCA model with 5 covariates (simultaneously), is it the case that the odds ratios will tell me the increase in probability of a given class membership for a one-unit increase in a given covariate, but that to test whether a covariate is significant I would need to run separate models with and without that covariate and perform the chi-square test of model fit? Also, I have not standardized the covariates prior to entry into the model. I realize this will not affect the results other than easing interpretation, but I did find out that obtaining standardized parameters for this type of model requires numerical integration. I was just curious why this (numerical integration for standardized coefficients) was the case.


The printed z tests (Est/SE) for the coefficient of each covariate give the test of whether the covariate has a significant influence. I don't see how the request for standardized coefficients calls for numerical integration; better send this to support so we can see the full picture.


Thanks, Dr. Muthen. This was my original understanding, and I got a bit turned around after reading Chapter 6 of Collins and Lanza. They note that in an LCA with covariates the latent class prevalences are expressed as functions of the regression coefficients and individuals' values on the corresponding covariates. I get this. They then mention that hypothesis testing in LCA with covariates is done by means of a likelihood-ratio chi-square test. This is where I became a bit confused. By hypothesis testing I now understand the authors to mean testing competing models (i.e., one with a covariate and one without). So, if there is not a significant improvement in model fit with the addition of the covariate (even though it may have a significant odds-ratio estimate for a given class relative to the reference class), how does one evaluate the covariate? I guess that, since in this case I am using a hypothetical example of one covariate, if it has a significant odds-ratio estimate there would have to be an improvement in class prediction relative to a baseline model with no covariate? To be clearer: in my example of a 5-covariate model, 2 of the 5 covariates had significant odds-ratio estimates depending on the reference class (as theory expected). I can say that these covariates significantly influence class membership, correct? There is no need, say, to compare to a baseline model without these covariates?


You can test the significance of a covariate by the z test for its slope that Mplus prints. You can also run 2 models, one with the slope free and one with the slope fixed at zero. 2 times the loglikelihood difference for these runs is a chi-square with 1 df, which is approximately z-squared, so these two tests should agree. To test whether more than one covariate has zero slopes, you can still do the likelihood difference testing; I think that's the testing mentioned in the book.
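Here is a small sketch of why the two tests line up. For 1 df, a chi-square statistic equal to z squared gives exactly the two-sided p-value of the z test. The Wald z (Est./S.E.) and the likelihood-ratio statistic are still computed differently, so they agree closely in large samples rather than exactly. The slope estimate, SE, and loglikelihoods below are hypothetical.

```python
import math


def wald_test(estimate: float, se: float):
    """Est./S.E. z statistic and its two-sided p-value."""
    z = estimate / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


def lr_test_1df(loglik_fixed: float, loglik_free: float):
    """2*(logL_free - logL_fixed) referred to a chi-square with 1 df."""
    stat = 2.0 * (loglik_free - loglik_fixed)
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


# Hypothetical slope estimate and hypothetical loglikelihoods for the two runs
# (slope fixed at zero versus slope free).
z, p_wald = wald_test(0.42, 0.19)
stat, p_lr = lr_test_1df(loglik_fixed=-1190.1, loglik_free=-1187.6)
print(f"z^2 = {z**2:.2f}, p = {p_wald:.3f};  LR chi-square = {stat:.2f}, p = {p_lr:.3f}")
```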

Stata posted on Monday, April 30, 2012  10:30 am



Dr. Muthen, I'd like to confirm with you what Nylund et al. (2007) mean by "the commonly used log likelihood difference test cannot be used to test nested latent class models". Does it mean the loglikelihood (H0 in Tech 8) value cannot be used for LCA comparison, or does it mean something else (Tech 11)? Thank you.


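It means that to decide on the number of classes, you should not use loglikelihood difference tests, because the difference does not follow the usual chi-square distribution when the models being compared differ in the number of classes. You should instead use BIC, TECH11, TECH14, etc.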
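As a minimal illustration of BIC-based class enumeration, the Python sketch below compares hypothetical 2-, 3-, and 4-class fits. It assumes an unrestricted LCA with J binary items, where the free parameters are K*J thresholds plus K-1 class logits; in practice, read the number of free parameters and the loglikelihood from each Mplus run rather than relying on this formula. TECH11 and TECH14 are tests computed by Mplus itself and are not reproduced here.

```python
import math


def lca_free_params(n_items: int, n_classes: int) -> int:
    """Free parameters in an unrestricted LCA with binary items (assumption):
    one threshold per item per class, plus n_classes - 1 class logits."""
    return n_classes * n_items + (n_classes - 1)


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """BIC = -2 logL + p ln(n); smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def best_by_bic(logliks: dict[int, float], n_items: int, n_obs: int) -> int:
    """Return the number of classes whose run has the lowest BIC."""
    return min(
        logliks,
        key=lambda k: bic(logliks[k], lca_free_params(n_items, k), n_obs),
    )


if __name__ == "__main__":
    # Hypothetical loglikelihoods for 2-, 3-, and 4-class runs (12 items, n = 1000).
    lls = {2: -7000.0, 3: -6900.0, 4: -6880.0}
    for k, ll in lls.items():
        print(k, round(bic(ll, lca_free_params(12, k), 1000), 2))
    print("lowest BIC:", best_by_bic(lls, 12, 1000))
```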

Tracy Witte posted on Thursday, October 17, 2013  4:57 am



I am running a 3-group LCA and have used the 3-step procedure for testing equality of means of auxiliary variables across classes. I would like to provide effect sizes for differences in means on the auxiliary variables across the classes. Would it be appropriate to convert the SEs to SDs and then calculate Cohen's d from the means and standard deviations? Or is there some other, preferred approach for determining effect sizes?


Yes, that would be appropriate. 
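A minimal Python sketch of that conversion, under two assumptions that are not from the thread: the SE of each class-specific mean is treated as SD/sqrt(n), with n taken as the estimated class count, and the two class SDs are pooled in the usual Cohen's d way. Because class membership is estimated rather than observed, the resulting d is best treated as approximate. All numbers are hypothetical.

```python
import math


def sd_from_se(se: float, n: float) -> float:
    """Back out an SD from the SE of a mean, assuming SE = SD / sqrt(n)."""
    return se * math.sqrt(n)


def cohens_d(m1: float, se1: float, n1: float, m2: float, se2: float, n2: float) -> float:
    """Cohen's d for two class means, using the pooled SD of the two classes."""
    s1, s2 = sd_from_se(se1, n1), sd_from_se(se2, n2)
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled


if __name__ == "__main__":
    # Hypothetical class means, SEs, and estimated class counts.
    print(round(cohens_d(10.0, 0.5, 100, 8.0, 0.4, 100), 3))
```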


We have an LTA that displays measurement non-invariance over time. However, upon visual inspection, one of the classes appears relatively unchanged. Is there a way to compare item probabilities over time to see which ones differ and which don't? I know there are ways to compare proportions in independent and matched samples, but this class at two time points doesn't qualify as either independent or matched because it contains some of the same people and some different people. Thank you!


You can test invariance one item at a time. 


Do you mean by using the likelihood ratio test to compare models (i.e., two times the difference in loglikelihoods)? If so, how do you interpret the case where the two models are not different from each other? I imagine that if the constrained model fits better, the item probability does not change over time, and if the free model fits better, the item probability changes over time. However, I wouldn't know how to interpret a nonsignificant result. Thank you! 


Yes, an LR chi-square test. If this test is not significant, we can't reject equality of the parameters, so we take them to be invariant.
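For an item-by-item invariance test, the LR statistic is referred to a chi-square whose degrees of freedom equal the number of parameters the constraint removes (e.g., 1 when one item's threshold is held equal across two time points). The Python sketch below computes the p-value with exact closed forms for integer degrees of freedom; the loglikelihoods are hypothetical, and a nonsignificant result is read as in the reply above: equality cannot be rejected, so the parameters are treated as invariant.

```python
import math


def chisq_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square with integer df >= 1 (closed-form series)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (i + 0.5)
    return total


def lr_invariance_test(ll_constrained: float, ll_free: float, df: int, alpha: float = 0.05):
    """Return (LR statistic, p-value, treat-as-invariant flag) for constrained vs free runs."""
    lr = 2.0 * (ll_free - ll_constrained)
    p = chisq_sf(lr, df)
    return lr, p, p >= alpha  # not significant -> equality not rejected


if __name__ == "__main__":
    # Hypothetical: one item's threshold held equal over time (df = 1).
    print(lr_invariance_test(-2510.8, -2509.9, df=1))
```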


Thank you! 


I am trying to run an LCA using the KNOWNCLASS command to evaluate whether profiles are similar between two groups. I have 4 classes based on the observed items. If I want to test, for some specific classes (e.g., c#2 and c#4, based on the results), whether the conditional probabilities are equal between the two groups, how do I specify the script?


You can use MODEL CONSTRAINT to test whether the thresholds or probabilities are equal across classes, for example: MODEL: %overall% [u$1 u$2]; %c#1% [u$1] (p1); %c#2% [u$1] (p2); MODEL CONSTRAINT: NEW (diff); diff = p1 - p2; To test the probabilities, you need to create them in MODEL CONSTRAINT using the thresholds.
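For a binary item, Mplus logit thresholds map to probabilities as P(u = 1 | class) = 1/(1 + exp(tau)), so a probability difference can be built from the labeled thresholds p1 and p2. The Python sketch below, with hypothetical threshold values, only illustrates that arithmetic; inside Mplus the same expressions should be definable as additional NEW parameters in MODEL CONSTRAINT.

```python
import math


def prob_u1(threshold: float) -> float:
    """P(u = 1 | class) for a binary item with logit threshold tau: 1 / (1 + exp(tau))."""
    return 1.0 / (1.0 + math.exp(threshold))


if __name__ == "__main__":
    p1, p2 = -1.2, 0.4  # hypothetical labeled thresholds for [u$1] in c#1 and c#2
    prob1, prob2 = prob_u1(p1), prob_u1(p2)
    print(f"prob1 = {prob1:.3f}, prob2 = {prob2:.3f}, diff = {prob1 - prob2:.3f}")
```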


I am having trouble interpreting my LCA multigroup output. In my first multigroup model (with two groups, 5 classes), I am not finding similar patterns of probabilities for the classes across groups. I had presumed that even though classes can vary across groups, that the patterns for the 1st, 2nd, etc., classes should be roughly similar. Is that not the case? Are classes just put in different order for each group? Similarly, when I constrain thresholds to be equal within one class, I am not finding that the paths are actually the same for the same class across the two groups (although the Wald test is significant). Why would that be the case? I thought that I should see the thresholds as being identical? Is it because the groups are different sizes? 


I assume that you are using 2 latent class variables and that one of them is specified as KNOWNCLASS. If so, the unknown classes can come out in a different order for the 2 known classes, but the mean/probability profiles will tell you which class is comparable to which. For the question in your second paragraph, we would need to see your output (send it to Support with your license number) and you should point out what is not equal that you expected to be equal.


Just to clarify: with the KNOWNCLASS option, the classes could show up in a different order across the known groups, so class 1 1 might not be the same class as class 2 1 in the output. Do I need to use the predicted probabilities to match them? Also, what if I find that the overall predicted N for the classes is different with the KNOWNCLASS option than it was for the initial model? Shouldn't they be the same, or can they shift once you disaggregate by groups? I figured out part two. Thanks


Q1. Yes. Q2. They can shift. 
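One way to do the matching in Q1 systematically is to pair each class in one known group with the class in the other group whose item-probability profile is closest. The Python sketch below is an illustration of that idea, not an Mplus feature: it tries every one-to-one pairing and keeps the one with the smallest total squared difference between profiles. The profiles are hypothetical, and brute force is fine for the handful of classes typical in LCA (5 classes give 120 pairings).

```python
from itertools import permutations


def match_classes(profiles_a, profiles_b):
    """Return a tuple m where class i in group A is matched to class m[i] in group B.

    Each profile is a list of item probabilities; the chosen pairing minimizes the
    total squared difference between matched profiles.
    """
    k = len(profiles_a)

    def cost(perm):
        return sum(
            (pa - pb) ** 2
            for i in range(k)
            for pa, pb in zip(profiles_a[i], profiles_b[perm[i]])
        )

    return min(permutations(range(k)), key=cost)


if __name__ == "__main__":
    # Hypothetical 3-class profiles on 2 items for the two known groups.
    group1 = [[0.90, 0.80], [0.10, 0.20], [0.50, 0.50]]
    group2 = [[0.50, 0.40], [0.85, 0.80], [0.10, 0.15]]
    print(match_classes(group1, group2))
```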


Dear Dr. Muthen, I have questions on testing measurement invariance in multiple-group LCA. I have fitted two models with binary indicators: one model without any restrictions on thresholds and another model with equality constraints on all thresholds between the sexes. 1) Can I use the difference in 2*H0 loglikelihood between the two models to test threshold equality between the sexes? 2) Alternatively, I attempted to test threshold equality between the sexes using the MODEL TEST option. I manually checked the order of the classes and selected the pairs to test that I thought were appropriate. OPTSEED was used to fix the order. Is this approach correct? 3) The test results differed between the two methods above: the test was nonsignificant with 1) (p = .31) but highly significant with 2) (p < .001). I think method 1) is more reasonable, mainly because (a) there is still arbitrariness in selecting the pairs to test in method 2), (b) I'm not sure whether the methods are equivalent, and (c) the sample sizes differed between the sexes. 4) Is there a better way to test measurement invariance? Thank you for your help!


See the Version 7.1 Language Addendum on the website with the user's guide under convenience features for multiple group analysis. This shows the models to use for testing measurement invariance and convenience features that can help you do this. 


Thank you for your response, Dr. Muthen. I think the relevant part of the 7.1 Language Addendum is the last part, about KNOWNCLASS and the DO option. As far as I understand, what I did using the MODEL TEST command is equivalent to what's described in the document, and the DO option is a convenience feature that helps users do the same thing more easily. I'm still wondering whether my first question (using the 2*H0 loglikelihood difference) has a yes answer, because I still think this is a better way of testing threshold invariance for the reasons I described in my third question above. Thank you for your help!


I forgot to mention this part. Is this page also relevant as I used MLR? http://statmodel.com/chidiff.shtml Thank you! 


See "Multiple group factor analysis: Convenience features". This has convenience features for testing measurement invariance. I think this is what you want to do.


Thank you, Dr. Muthen. I found the section and still wonder whether it's relevant to LCA, because it says "It is available for CFA and ESEM models for continuous variables with the maximum likelihood and Bayes estimators..." I'm running LCA using binary indicators. My main question is whether it's appropriate to use the 2*H0 loglikelihood from the output to test threshold invariance. I'm using the MLR estimator. In other words, I wonder whether the difference in 2*H0 loglikelihood between nested models follows a chi-squared distribution. I assumed it does, based on my previous search of this forum: http://www.statmodel.com/discussion/messages/13/254.html?1401311492 http://www.statmodel.com/discussion/messages/23/393.html?1261082914 I'd also appreciate it if you could recommend relevant readings. Thank you for your help.


Yes, the LRT is correct to use for threshold invariance. I don't know about references; there may be some in the LTA context.
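Because the question is about MLR: the chidiff page referenced above describes scaling the loglikelihood difference with the scaling correction factors that Mplus prints alongside each MLR loglikelihood, rather than using the raw difference. The Python sketch below follows that loglikelihood-based formula, cd = (p0*c0 - p1*c1)/(p0 - p1) and TRd = -2*(L0 - L1)/cd, where model 0 is the more constrained model (here, thresholds equal across sex) with p0 free parameters, loglikelihood L0, and scaling correction factor c0. The inputs are hypothetical; check the formula against that page before relying on it.

```python
def mlr_scaled_lr_difference(l0, p0, c0, l1, p1, c1):
    """Scaled LR difference for MLR loglikelihoods (model 0 nested in model 1).

    l: H0 loglikelihood, p: number of free parameters,
    c: H0 scaling correction factor, as printed for each run.
    Returns (TRd, df); TRd is referred to a chi-square with df degrees of freedom.
    """
    if p1 <= p0:
        raise ValueError("model 1 must have more free parameters than model 0")
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)  # scaling correction for the difference
    if cd <= 0:
        raise ValueError("scaling correction for the difference is not positive")
    trd = -2.0 * (l0 - l1) / cd
    return trd, p1 - p0


if __name__ == "__main__":
    # Hypothetical: thresholds equal across sex (model 0) vs free (model 1).
    print(mlr_scaled_lr_difference(-5000.0, 40, 1.20, -4990.0, 50, 1.25))
```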

db40 posted on Monday, April 06, 2015  2:33 pm



Hi, I'm just trying new things and I'm wondering if I can use Knownclass to examine a 5 class solution by gender? Thank you. 


Yes, you can do that. 


Hi. I have two data sets. I get a 4-class solution separately in each sample as well as when I combine the samples. I'd like to get a better sense of whether the class solutions differ across samples and/or whether the solution holds when controlling for sample. I tried to run the analysis with group included using the KNOWNCLASS option. This is my code: VARIABLE: NAMES ARE g p1-p20; CLASSES = cg (2) c (2); CATEGORICAL = p1-p10; KNOWNCLASS = cg (g = 0 g = 1); ANALYSIS: TYPE = MIXTURE; ALGORITHM=INTEGRATION; MODEL: %OVERALL% c ON CG; MODEL c: %c#1% [p1-p20]; %c#2% [p1-p20]; MODEL cg: %cg#1% p1-p20; %cg#2% p1-p20; I get the following error (for all categorical variables): Variances for categorical outcomes can only be specified using PARAMETERIZATION=THETA with estimators WLS, WLSM, or WLSMV. Variance given for: P1. It seems to be related to the fact that 10 of my variables are binary. When I remove the line about categorical variables, the analysis seems to run, but I get a small error (too big to reproduce here). Am I able to include the binary variables here? Also, why can't I run TECH11 and TECH14 here (I will use them to compare the different models)? Is this the right approach for what I'm trying to do? Thanks!


Your errors stem from the variance statements: MODEL cg: %cg#1% p1-p20; %cg#2% p1-p20; Apart from there being no variance parameters for categorical items, I don't know why you specify this. Do you mean to use brackets here, [p1-p20]? If you do, then you are saying that there is no measurement invariance across the Knownclasses.


Thanks for your quick and helpful response. I used the variance statements from another example I found, but evidently that wasn't correct. So, to reiterate: 1. I have two samples of data. 2. When I run the LPA separately, I get a vaguely similar pattern of results. 3. When I run the LPA on the combined data sets, I get a set of results that converge with other research, but the samples are disproportionately represented across the different classes. I'd like to get a better sense of the extent to which the results are independent of the sample. I don't have any strong predictions about how the two samples would differ, however. Is including sample as the KNOWNCLASS variable worthwhile? Could you provide an example from the manual that I could use as a guide with regard to the covariance constraints? Alternatively, would it be better to include sample as a covariate? Thanks for your help!


Look at UG ex 8.8, which uses an approach where you let the outcome means vary across the cross-classification of the 2 latent class variables. That's the most flexible LPA (with categorical outcomes there is no variance modeling to consider). You can then see how the means change over those cross-classifications. In that model you can also test measurement invariance in 2 ways. One, place different kinds of restrictions on the means and compare the resulting loglikelihoods with likelihood-ratio chi-square testing. Or two, apply Wald testing using MODEL TEST.

Alvin posted on Monday, May 04, 2015  4:56 pm



Hi Prof. Muthen, I am working on a three-class multigroup LCA. While I couldn't get full invariance in response probabilities, I was able to get partial invariance after letting the response probabilities of class 2 vary across the 2 groups. This was based on significant differences across groups in the probabilities of some of the items within class 2, while classes 1 and 3 are relatively homogeneous in terms of probabilities of endorsement. Next, I am going to constrain the class distribution to be equal across groups. Do you constrain the means to be equal, or something else? Kankaras recommends BIC as a key indicator for comparing nested models (rather than the LRT, which is sensitive to sample size); is this acceptable? I notice that in the case of complete homogeneity there is no interaction effect between group and class; is this the model of complete invariance?


Q1. Yes. Q2. BIC is good. But since it is a function of logL it is also influenced by sample size. Q3. I don't know how one looks at an interaction between group and class. 
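On Q1: within each known group, the class distribution is carried by the multinomial logits of c (the "means" [c#1], [c#2], ..., with the last class as the reference fixed at 0). Holding those means equal across groups therefore gives identical class proportions, while letting them differ (e.g., via c ON cg) lets the proportions vary by group. The Python sketch below converts logits to proportions; the logit values are hypothetical.

```python
import math


def class_proportions(logits):
    """Class proportions from multinomial logits; the last class is the reference (logit 0)."""
    full = list(logits) + [0.0]
    top = max(full)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in full]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical logits [c#1], [c#2] for a 3-class model.
    equal_means = class_proportions([0.8, 0.2])  # same values in both groups -> same proportions
    group2_free = class_proportions([0.1, 0.5])  # group-specific means -> different proportions
    print([round(p, 3) for p in equal_means], [round(p, 3) for p in group2_free])
```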


Hello, I see above that you stated that differing group sizes are not an issue when doing multigroup comparisons using KNOWNCLASS. Could you explain what Mplus does to adjust for the different sample sizes? Also, I am conducting a multigroup analysis using the KNOWNCLASS option with 8 groups and want to compare coefficients across these groups (a total of 28 possible pairwise comparisons). I have used MODEL TEST to produce a Wald test. I simply changed the coefficients being compared and reran the syntax to produce Wald tests for each of the comparisons. I want to know whether Mplus makes any corrections for multiple comparisons such as these, or whether something like a Bonferroni correction is required (which would be less than ideal given the 28 comparisons being made). Thank you for any assistance! Will


I don't think that multigroup and knownclass are any different with respect to different group sizes. No, Mplus does not make adjustments for multiple testing. The analyst has to do that. 
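Since Mplus leaves the correction to the analyst, here is a minimal Python sketch of two standard options for 8 groups (8 choose 2 = 28 pairwise Wald tests): the Bonferroni rule (reject if p <= alpha/m) and, as a less conservative alternative that still controls the familywise error rate, Holm's step-down procedure. Holm is a suggestion added here, not something from the thread; the p-values are hypothetical.

```python
from itertools import combinations

N_GROUPS = 8
PAIRS = list(combinations(range(1, N_GROUPS + 1), 2))  # all pairwise group comparisons


def bonferroni(pvals, alpha=0.05):
    """Reject H0 for each p <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Holm step-down: compare sorted p-values with alpha / (m - rank); stop at the first failure."""
    m = len(pvals)
    reject = [False] * m
    for rank, i in enumerate(sorted(range(m), key=lambda j: pvals[j])):
        if pvals[i] > alpha / (m - rank):
            break
        reject[i] = True
    return reject


if __name__ == "__main__":
    print(len(PAIRS), "comparisons")
    pvals = [0.0004, 0.0015, 0.0019, 0.03] + [0.5] * 24  # hypothetical Wald p-values
    print("Bonferroni rejections:", sum(bonferroni(pvals)), "Holm rejections:", sum(holm(pvals)))
```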

Chee Wee Koh posted on Thursday, February 25, 2016  8:28 pm



Hi there, Following up on Bengt's response to Devin on Apr 13 above, I tested MI in my model using UG ex. 8.8 as a reference (mine is an LPA, not a GMM). I'd like to check whether I have done it correctly. DATA: FILE IS wvdat2.dat; VARIABLE: NAMES ARE u1-u2 g; USEVARIABLES ARE u1-u2; CLASSES = cg (2) c (4); KNOWNCLASS = cg (g = 0 g = 1); ANALYSIS: TYPE = MIXTURE; MODEL: %OVERALL% u1 ON u2; c ON cg; %cg#1.c#1% [u1] (1); ![u2] (5); %cg#1.c#2% ![u1] (2); ![u2] (6); . . %cg#2.c#1% [u1] (1); ![u2] (5); . . %cg#2.c#4% ![u1] (4); ![u2] (8); As I progressively fixed the means of u1 to be equal across the two known classes, the LL changed very little; however, when I then proceeded to fix the means of u2, the LL increased drastically (almost doubled, and much higher than when no mean was fixed). 1. Does this imply there is no MI? 2. Have I done the MI test correctly? Thank you.


Please send your output to Support along with your license number. 


Hi, I am following up on my post above (the syntax in that post had an error, which has since been fixed). I am interested in establishing measurement invariance across male and female data. First, I conducted LPA on the male and female data separately. There were 4 profiles in each group, which appeared similar across the 2 groups. I specified a model where all parameters were freed. Then I specified another model where all indicator means in group 1 were freed and the corresponding indicator means in group 2 were constrained equal to those in group 1. I used TECH1 to track parameter estimation. I computed 2(LL diff) to compare the two models, and the chi-square was not significant. It appears, however, that class proportions differed between the two groups. I would like to verify whether I have grasped the implications of the results: 1. Have the analyses sufficiently established that gender has no direct effect on any of the profile indicators? 2. Do the results imply that class-specific response probabilities do not differ between males and females? 3. Can I pool the male and female data for further analysis and specify gender as a covariate affecting latent class only? 4. If the answers to the above are 'yes', then why did I not have to constrain factor loadings across groups, as is done to show scalar invariance in CFA? Thank you!


Sorry, for (4), I meant metric equivalence. 


1-3: Yes. 4. There are no loadings when the latent variable is categorical.


Oh, that's right! Thank you! So, when we specify the cross-classification model and constrain all measurement parameters in the profiles to be equal across groups to test structural equivalence, we have essentially also fixed the class-by-knownclass interaction to zero, thereby ensuring metric equivalence. Is this interpretation correct?


Right. 

Ali posted on Tuesday, October 11, 2016  5:11 am



I am using an LCA model with four nominal variables for 7 countries. 6 of the 7 countries have 3 classes, but among those 6 countries, two have different interpretations from the other countries. And 1 of the 7 countries has 2 classes. When I put all the countries together, I get a 3-class solution. My purpose is to identify a typology of the use of learning strategies for each country, but now I have a different number of latent classes across countries. So, I don't know whether this violates measurement invariance or whether it still allows a reasonable interpretation.


With these country differences, I would just analyze each country separately and report the similarities and differences as you have here.


Dear Bengt and Linda, I am conducting LCA on data about companies. There are 2 groups in my sample (say public and private) and I want to know if the best solution in terms of number of classes is the same for the 2 groups (subsamples). I plan to do the LCA on each subsample separately. Is that the right way or is there a better way? My understanding of KNOWNCLASS is that it will create the same number of classes within the 2 groups and that its purpose is to see if the means and thresholds are the same or different across the 2 groups (so it's not useful to me at this point). Is that correct? 


Q1. Since no parameters are held equal across the groups there is no benefit to analyzing them together. Q2. Yes. Knownclass (for cg say) can also have c on cg which means that the class percentages can vary as a function of cg. 

Jordan davis posted on Saturday, January 21, 2017  11:50 am



Hi Dr. Muthen, I'm wondering if it is possible to test mean differences for distal outcomes (the BCH method described in your webnote) across multiple groups. We have a 3 class solution for our LCA and wanted to look at Time 2 distal outcomes stratified by Sex (males and females). Is this possible in Mplus yet? Thanks! 


You can let sex be a covariate and use the approach in section 3.2 of web note 21. 


Thanks, Dr. Muthen. A couple of clarifying questions after reading web note 21: 1. I don't see any mention of setting logits for the classes. I thought this was the most up-to-date practice, for example: MODEL: %C1#1% [CL3_W1#1@2.937]; [CL3_W1#2@0.727]; 2. In Example 3.2, is this suggesting I use Female as my X variable to predict my distal outcome by class? For example: %C#1% SHVICT6 on Female; .... 3. If I do set logits as noted above, does this have implications for interpretation? 4. Are results interpreted as a multiple-group analysis? If so, should we test equality across the regression coefficients? Thanks! Jordan


You should read our first paper on this. See Recent papers on our website: Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. Download appendices with Mplus scripts.
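As background to question 1, the manual three-step approach described in that paper fixes the measurement part of the most likely class variable N at logits of the classification probabilities. For latent class c, with P(N = k | C = c) taken from the classification-probability table for most likely class membership, the logit for category k is log(p_k / p_K), using the last category as the reference. Mplus prints these logits in its output, so the Python sketch below (hypothetical probabilities) only shows where such fixed values come from; near-zero probabilities are not handled here.

```python
import math


def three_step_logits(row):
    """Logits log(p_k / p_K) for one latent class, where row[k] = P(N = k | C = c)
    and the last category K is the reference. Probabilities must be positive."""
    ref = row[-1]
    return [math.log(p / ref) for p in row[:-1]]


if __name__ == "__main__":
    # Hypothetical classification probabilities for a 3-class model (rows = latent class).
    table = [
        [0.90, 0.06, 0.04],
        [0.05, 0.88, 0.07],
        [0.03, 0.05, 0.92],
    ]
    for c, row in enumerate(table, start=1):
        print(f"class {c}:", [round(v, 3) for v in three_step_logits(row)])
```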

Ali posted on Tuesday, February 28, 2017  5:32 am



I am comparing latent class structure between countries. First, I analyzed the two samples separately: country 1 had a 3-class model, while country 2 had a 2-class model. When I interpreted the latent classes, the two countries shared only one class with a similar interpretation. That is, class 1 had a similar interpretation in country 1 and country 2, but classes 2 and 3 in country 1 had different interpretations from class 2 in country 2. Second, I conducted the LCA on the whole sample, combining the two countries' data. The results showed that a 3-class model fitted better based on BIC. Finally, I conducted the multiple-group LCA with 3 classes. When I checked the estimated parameters, I found that the 3rd class in country 2 could be a minor extension of the 2nd class in country 2. So now I have only one class with a similar interpretation in both countries in the multiple-group LCA. How could I test measurement invariance across the two countries with the multiple-group LCA?


You could either (a) use 3 classes and apply measurement invariance for only the one class, or (b) use 2 classes as an approximation and apply measurement invariance for both classes. You may want to discuss this further on SEMNET.

Ann Nguyen posted on Saturday, May 20, 2017  4:23 am



Hello, I would like to run a latent class regression analysis stratified by 3 age groups using the manual 3-step method. I am interested in testing how the relationship between a set of predictors and the latent class variable (2 classes) might vary across age groups (variable name = agegroup). In step 3, I used the KNOWNCLASS option and the following syntax in the MODEL command: MODEL: %Overall% c on sex educ; c on agegroup; Model c: %c#1% [n#1@2.507]; %c#2% [n#1@3.350]; Model agegroup: %agegroup#1% c on sex educ; %agegroup#2% c on sex educ; %agegroup#3% c on sex educ; With this syntax, I receive the following message: ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT VARIABLES. THE FOLLOWING PARAMETERS WERE FIXED: Parameter 115, MODEL AGEGROUP: %AGEGROUP#3%: C#1 ON SEX Parameter 116, MODEL AGEGROUP: %AGEGROUP#3%: C#1 ON EDUC Am I receiving this message because there is an error in my syntax, or because the sample size for agegroup 3, class 1 is too small?


To answer this we need to see your full output. Send to Support along with your license number. 


Dear Drs. Muthén, I am using the new BCH method to conduct an LCA distal outcome analysis. My primary research interest after enumerating the latent class model is to explore whether the correlation between a distal outcome (z) and a covariate of interest (x1) varies as a function of class membership, controlling for the influence of two other covariates (x2 and x3) on the distal outcome. Syntax from the MODEL statement is below: MODEL: %OVERALL% z on x1 x2 x3; %C#1% z on x1 (b1) ; z on x2 ; z on x3 ; %C#2% z on x1 (b2) ; z on x2 ; z on x3 ; %C#3% z on x1 (b3) ; z on x2 ; z on x3 ; MODEL TEST: 0 = b1 - b2 ; 0 = b1 - b3 ; I have two questions: 1) Is the MODEL TEST statement (an omnibus Wald test) appropriate here for testing whether the correlation between z and x1 differs in at least one class? 2) I have missing data on my distal outcome (z). So, when I run the distal outcome analysis, I have fewer cases included in the full structural model than in my step 1 measurement model. Is there a way to use FIML on my distal outcome variable so that I can use all cases in the step 3 structural model? As it is, it seems to be listwise deleting cases that are missing on my distal outcome. Thank you very much for your time,


Q1: Yes. Q2: No. 
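For Q1, the two MODEL TEST lines jointly test b1 = b2 = b3 with a Wald statistic on 2 degrees of freedom. The Python sketch below reproduces that computation, W = d'(R V R')^(-1) d with d = R b, from the three slope estimates and their covariance matrix (which could be read from TECH3). Mplus performs this test itself, so the sketch only shows what the omnibus statistic is; the numbers are hypothetical.

```python
import math


def wald_equal_slopes(b, v):
    """Wald test of b1 = b2 = b3 using the contrasts (b1 - b2, b1 - b3).

    b: the three slope estimates; v: their 3x3 covariance matrix.
    Returns (W, p) with W referred to a chi-square with 2 df.
    """
    r = [[1.0, -1.0, 0.0], [1.0, 0.0, -1.0]]  # contrast matrix R
    d = [sum(r[i][j] * b[j] for j in range(3)) for i in range(2)]  # d = R b
    # S = R V R' (2 x 2)
    s = [
        [sum(r[i][a] * v[a][c] * r[k][c] for a in range(3) for c in range(3)) for k in range(2)]
        for i in range(2)
    ]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    s_inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    w = sum(d[i] * s_inv[i][k] * d[k] for i in range(2) for k in range(2))
    return w, math.exp(-w / 2.0)  # chi-square(2) upper tail is exp(-W/2)


if __name__ == "__main__":
    # Hypothetical slopes of z on x1 in classes 1-3 and an uncorrelated covariance matrix.
    slopes = [0.5, 0.2, 0.2]
    cov = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
    print(wald_equal_slopes(slopes, cov))
```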


Thank you for your response! 
