
Andy Ross posted on Thursday, October 18, 2007 - 7:12 am



Dear Prof Muthen, I wish to extend Example 7.21 (mixture modelling with known classes, multiple group analysis) to include predictors, whereby x predicts c, and also to allow this prediction to vary by cg. I would then like to test whether this variation is significant. Please could you advise me? Andy


You would specify c ON x in the class-specific parts of the MODEL command for the KNOWNCLASS variable.
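
The advice above could be laid out roughly as follows. This is a minimal sketch, not a tested input: the file name, the variable names g, x and y1-y4, and the labels b1 and b2 are assumptions, and the class-specific measurement statements of Example 7.21 are omitted for brevity. The labels together with MODEL TEST request a Wald test of whether the effect of x on c differs between the two known classes.

```mplus
TITLE:    c ON x allowed to differ across known classes (sketch)
DATA:     FILE IS mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = g x y1-y4;            ! hypothetical variable names
          CLASSES = cg (2) c (2);
          KNOWNCLASS = cg (g = 0 g = 1);
ANALYSIS: TYPE = MIXTURE;
MODEL:
          %OVERALL%
          c ON cg x;
MODEL cg:
          %cg#1%
          c ON x (b1);                  ! slope of c on x in known class 1
          %cg#2%
          c ON x (b2);                  ! slope of c on x in known class 2
MODEL TEST:
          0 = b1 - b2;                  ! Wald test of equal slopes
```

An alternative to the Wald test is to run the model with and without the class-specific c ON x statements and compare the two loglikelihoods.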

Andy Ross posted on Tuesday, October 23, 2007 - 10:06 am



Many thanks, Linda. I have another query, if I may. I ran an analysis using the KNOWNCLASS option, holding the conditional probabilities equal across two samples. The Mplus output tells me that I have achieved this aim. However, when I save the cprobs and use them to create a weight variable so that I can recreate the solution in SPSS, the solutions for the two samples are only roughly equivalent. Would you know why this is? I saved the cprobs to 16 decimal places, so I expected the SPSS solution to be a highly accurate representation of the Mplus one (it has been before). Many thanks for your support, Andy


I'm not sure what you mean. What solution are you trying to reproduce in SPSS and how are you using the posterior probabilities to do this? If you cannot describe this briefly, please send the relevant information and your license number to support@statmodel.com. 


Hello, I am also working on a multigroup mixture model with known classes. My question is about the decision regarding the best model. If I run two models that are nested, how can I decide which model is the best? I would say that it is not enough to look at the BIC only. Thank you very much in advance! Thessa 


You can use 2 times the loglikelihood difference, which is distributed as chi-square, to test nested models.


But don't forget to use the scaling correction factors, since by default Mplus uses MLR to estimate mixtures (unless you changed this default). See: http://www.statmodel.com/chidiff.shtml
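
For reference, the loglikelihood version of the scaled difference test can be written as below. The symbols are notation chosen for this note: L0, p0 and c0 are the loglikelihood, number of free parameters and scaling correction factor of the nested (more restrictive) model, and L1, p1 and c1 are the same quantities for the comparison model.

```latex
cd = \frac{p_0 c_0 - p_1 c_1}{p_0 - p_1},
\qquad
TRd = \frac{-2\,(L_0 - L_1)}{cd}
```

TRd is then referred to a chi-square distribution with p1 - p0 degrees of freedom.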


Hello, I have another question about my model. Again I am doing a multiple group analysis using mixture modelling with known classes (i.e. sex), since my dependent variable is Poisson distributed. When I tried to analyze a model with latent variables, the computation could not be completed (even when the means/regression coefficients were not allowed to differ across groups). The program advises giving starting values (see below). Can you tell me the best way to choose starting values for this model? Thank you in advance!

    Unperturbed starting value run did not converge.
    1 perturbed starting value run(s) did not converge.

    THE ESTIMATED COVARIANCE MATRIX IN CLASS 1 COULD NOT BE INVERTED.
    COMPUTATION COULD NOT BE COMPLETED IN ITERATION 389.
    CHANGE YOUR MODEL AND/OR STARTING VALUES.

    THE ESTIMATED COVARIANCE MATRIX IN CLASS 1 COULD NOT BE INVERTED.
    COMPUTATION COULD NOT BE COMPLETED IN ITERATION 389.
    CHANGE YOUR MODEL AND/OR STARTING VALUES.

    WARNING: WHEN ESTIMATING A MODEL WITH MORE THAN TWO CLASSES, IT MAY BE
    NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION
    TO AVOID LOCAL MAXIMA.

    THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE
    COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.


Please send your input, data, output, and license number to support@statmodel.com. 


I have a question about using the KNOWNCLASS option to do a multiple group analysis in a latent class model. I have covariates predicting my latent class variable, and I want the relationship between the covariates and the latent classes to vary between my KNOWNCLASS groups. How do I incorporate that into the syntax? C is my latent class variable with 3 latent classes. G is my KNOWNCLASS variable, which references 2 gender groups (0=male and 1=female). If I do it the following way, I get one set of coefficients for the regression of C on the covariates that does not vary by G.

    MODEL:
    %Overall%
    C on G;
    C#1 on covariates;
    C#2 on covariates;

And the following doesn't seem to give me the full set of regression coefficients for C on covariates.

    MODEL:
    %Overall%
    C on G;
    C#1 on covariates;
    C#2 on covariates;
    MODEL G:
    %G#1%
    C#1 on covariates;
    C#2 on covariates;
    %G#2%
    C#1 on covariates;
    C#2 on covariates;

Do I need the MODEL G: command, or do I list the %G#1% and %G#2% under the first MODEL: section? I am confused about that.


Please send your input, data, output, and license number to support@statmodel.com. 


Hi, is it possible to run a model like Example 7.21 in the Mplus manual, except using categorical indicators? If so, what would the input file look like? When I try to run such a model I get the following for each of my indicators:

    ERROR in MODEL command
    Variances for categorical outcomes can only be specified using PARAMETERIZATION=THETA with estimators WLS, WLSM, or WLSMV.

Thanks


Yes, but variances are not estimated for categorical variables, so you need to remove y1-y4;
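
A categorical-indicator version of Example 7.21 might look like the sketch below. The file name and the indicator names u1-u4 are assumptions, and the indicators are taken to be binary so that each has a single threshold. Compared with Example 7.21, the class-specific means become thresholds and the MODEL cg part that mentioned the variances is dropped, in line with the reply above.

```mplus
TITLE:    known classes with binary indicators (sketch)
DATA:     FILE IS mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = g u1-u4;              ! hypothetical variable names
          CLASSES = cg (2) c (2);
          KNOWNCLASS = cg (g = 0 g = 1);
          CATEGORICAL = u1-u4;
ANALYSIS: TYPE = MIXTURE;
MODEL:
          %OVERALL%
          c ON cg;
MODEL c:
          %c#1%
          [u1$1-u4$1];                  ! thresholds specific to latent class 1
          %c#2%
          [u1$1-u4$1];                  ! thresholds specific to latent class 2
OUTPUT:   TECH1 TECH8;
```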


Hi there, I specified TYPE=MIXTURE and used the KNOWNCLASS feature to run a multigroup model that should have been run using Bayesian estimation due to a small n. I learned that Bayesian estimation is not possible for multiple groups models, but was able to run the model this way. Can you tell me how parameters are estimated in LCA/mixture models (e.g., using KNOWNCLASS), and how this is different from default settings for multiple groups? Is parameter estimation more robust? Best regards, Adrianne 


You will obtain the same results using the GROUPING option as using the KNOWNCLASS option if all else is the same. 


Hi again, Thank you for your response. The same model using the GROUPING option gives an error message that reads, "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 35." When I use the KNOWNCLASS option with LCA, some parameters are fixed automatically "TO AVOID SINGULARITY OF THE INFORMATION MATRIX." Can you see any problem with this? Are the results the same as they would be if I constrained the parameter myself? Best regards, Adrianne 


Please send your output and license number to support@statmodel.com. It may be that you are mentioning the first factor indicator in the groups and classes. If you do that, you relax the fact that it is fixed at one. 

Laura Baams posted on Saturday, November 26, 2011 - 12:13 pm



Hi, I have a question about a parallel process model with KNOWNCLASS and Bayes. I would like to have the regressions s1 ON i2 and s2 ON i1 estimated freely for both classes. The syntax below enables me to do that. However, I would also like to have the correlations s1 WITH s2 and i1 WITH i2 estimated freely for both classes. With the multigroup MLR option I was able to do this. However, with Bayes I get the following error message:

    *** FATAL ERROR
    VARIANCE COVARIANCE MATRIX IS NOT SUPPORTED WITH ESTIMATOR=BAYES.
    PARTIAL EQUALITY BETWEEN TWO VARIANCE COVARIANCE BLOCKS.
    IF TWO PARAMETERS FROM TWO DIFFERENT VARIANCE COVARIANCE BLOCKS ARE HELD EQUAL
    THEN ALL THE PARAMETERS HAVE TO BE EQUAL IN THE TWO BLOCKS.

Any help would be great! Thanks so much! This is the ANALYSIS and MODEL part of the syntax:

    ANALYSIS:
    TYPE = MIXTURE;
    ESTIMATOR = BAYES;
    STVALUES=ML;
    STITERATIONS = 100;
    CHAINS = 2;
    PROCESSOR = 2;
    ALGORITHM = GIBBS(RW);
    MODEL:
    %OVERALL%
    i1 s1 | w1_A@0 w2_A@1 w3_A@2;
    i2 s2 | w1_B@0 w2_B@1 w3_B@2;
    %PR_2#1%
    s1 on i2;
    s2 on i1;
    s1 with s2;
    i1 with i2;
    %PR_2#2%
    s1 on i2;
    s2 on i1;
    s1 with s2;
    s1 with i2;


Are you using version 6.12? 

Laura Baams posted on Sunday, November 27, 2011 - 12:59 am



No, I have version 6.1.


Please send your input, data, output, and license number to support@statmodel.com. 


Hi, I would like to do a simple class analysis (measurement and not structural model) in three cultures. I failed to find anything on the website or in the book concerning multiple-group analysis and measurement invariance testing. Can you please let me know if I can define groups (countries) for LCA? Can you also give me some info about equality constraints in multiple group LCA? This is an example of what I (and many other psychologists) need to do with LCA: http://www.springerlink.com/content/l3u1l343u202gn25/ Many thanks, Ebi


For LCA because the only parameters are thresholds, you can simply regress the latent class indicators on the dummy variables representing the groups. It is not necessary to do multiple group analysis. If the group dummy variables influence the latent class variable but not the latent class indicators directly, you have measurement invariance. If there are some direct effects, you have measurement noninvariance. See Chapter 14 of the user's guide where there is a section on multiple group analysis. Everything in this section also applies to known classes. See the Topic 5 course handout on the website where multiple group analysis with mixture models is discussed. 


Thank you very much indeed for your reply. The method you are suggesting reminds me of a MIMIC model, and I am comfortable with it. I just made a hypothetical syntax (for three countries, 8 dichotomous items) based on your suggestion, using the Topic 5 course handout. I made dummies for two countries and not three. This is totally hypothetical and I have not run it yet. I just want to double check with you whether my syntax is going to work well for my purpose. Do you find any problem in this syntax? Or do I need to add anything else for this multigroup analysis?

    TITLE: LCA for three countries
    DATA: FILE IS asb.dat;
    VARIABLE:
    NAMES ARE B1-B8 US Thai;
    USEVARIABLES ARE B1-B8 Us Thai;
    CLASSES = c(3);
    CATEGORICAL ARE B1-B8 Us Thai;
    ANALYSIS: TYPE = MIXTURE;
    MODEL:
    %OVERALL%
    c#1-c#3 ON US Thai;
    OUTPUT: TECH1 TECH8;

Many thanks, Ebi


This is the correct model to start. You should then add each b variable regressed on US and Thai one at a time. 
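
The next step described in this reply could be sketched as follows, using the variable names from the post above. This is an untested illustration, and one detail is an assumption about how the input should be adjusted: US and Thai are left off the CATEGORICAL list here, because that option is meant for dependent variables and the two dummies act only as covariates.

```mplus
TITLE:    LCA for three countries, one direct effect added (sketch)
DATA:     FILE IS asb.dat;
VARIABLE: NAMES ARE b1-b8 US Thai;
          CLASSES = c (3);
          CATEGORICAL ARE b1-b8;        ! indicators only (assumption)
ANALYSIS: TYPE = MIXTURE;
MODEL:
          %OVERALL%
          c ON US Thai;                 ! country differences in class membership
          b1 ON US Thai;                ! direct effects on one indicator
OUTPUT:   TECH1 TECH8;
```

The run would be repeated with b2, b3 and so on in place of b1. A significant direct effect for an indicator points to measurement noninvariance for that item.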


I am using the KNOWNCLASS option to conduct a multiple-group analysis because my dependent variable is nominal. This multinomial DV is regressed on an exogenous latent factor (CFA) composed of a mix of categorical and continuous items, some of which I know are not invariant across the known classes. When I specify my baseline model with configural invariance, I see that the residual variance for the continuous indicator is fixed across known classes. This seems to be different from the normal defaults. Is the behavior of the KNOWNCLASS statement documented somewhere, regarding how to use it to evaluate measurement invariance and adjust for noninvariance? Thank you.


The defaults differ in different tracks of the program. You can relax the equality by mentioning the parameter in the class-specific parts of the MODEL command. The steps to test for measurement invariance do not differ, just the defaults.


Great, I appreciate the clarification. Thanks for your speedy response as always. 


I'm trying to conduct a MG LCA with 10 binary indicators using the KNOWNCLASS option (for gender), but 2 of the indicators are answered by men only. This means that the data for women can only be missing on these variables. My question is: can this MG analysis be conducted (perhaps using constraints?) if some of the variables are not common across men and women?


You can either analyze each group separately or not use the indicators with missing data for women. 


Hi Linda,
1. Can you explain (briefly) the purpose of the KNOWNCLASS option in mixture modeling? If classes are known, and mixture modeling accounts for group membership that is only probabilistic, what is the value of the KNOWNCLASS option? Why does Mplus recommend the KNOWNCLASS option in certain scenarios over a multiple-group approach?
2. Can results of a mixture model estimated with the KNOWNCLASS option be compared with results from a model estimated outside of the mixture modeling framework? Or is estimation inherently different?
3. Is it possible to estimate a KNOWNCLASS model where there really is only one group?
Thank you.


1. Sometimes in mixture modeling, people want to compare groups like males and females. The KNOWNCLASS option is a way to do this. Sometimes multiple group analysis is not available using the GROUPING option and must be done using the KNOWNCLASS option. This has no statistical implications.
2. KNOWNCLASS and GROUPING do the same thing, and if you do the same analysis in both ways you will obtain the same results.
3. Yes, but the results would be the same as not using the KNOWNCLASS option. I'm not sure if we give a message in this case.


Hi Linda, can you explain the difference between the "c" and "cg" factors in Example 7.21 of the Version 7 User's Guide? It seems that both "c" and "cg" represent class membership. I'm confused about that and the implications for drawing up my general and class-specific models, where I will ultimately want to test some different patterns of beta weight (regression) parameters across the classes. Is it that cg will reflect class differences for variances/covariances/beta weights, and c will reflect class differences for means/intercepts? Thank you!


cg is based on an observed variable. The classes are therefore not estimated but are known. It is identical to a grouping variable. c is a categorical latent variable for which class membership is estimated. In both cases, parameters can vary across the classes. 


OK, so if the model I am working with is not a measurement model but just a structural path model, no "c" would be needed, only the "cg" factor, which will allow the model as a whole to differ in certain ways across the classes? Is that right?


There is no relationship between c and cg and which parameters can vary across classes. The only difference is that with c classes are estimated and with cg classes are not estimated. 


Why do I need a c factor if there is no latent portion to my model? cg will reflect class membership, and that's all that is needed, right? No c is needed when there is no latent portion to the model itself? 


Basically, Linda, this is my model (below). Since there are no latent variables in this model, can I just use "cg" but not "c"? I have 5 classes. I am running this as a KNOWNCLASS model because aud_grp (the dependent variable) is a count variable, which works outside of the mixture modeling framework with 1 group, but not with 5 groups. Thank you.

    MODEL:
    aud_grp on pardrink cond_col age_col (b_agecol);
    pardrink with cond_col@0;
    age_col with cond_col@0;
    age_col with pardrink@0;
    age_col on pardrink (a_pard) cond_col (a_cond);
    MODEL CONSTRAINT:
    NEW (ind_pard ind_cond);
    ind_pard = a_pard*b_agecol;
    ind_cond = a_cond*b_agecol;


If all classes are known, you need only a KNOWNCLASS variable. 


Great! Thank you. So I only need either cg or c, but not both. (Correct me if I have misinterpreted.) Thanks! 


You need one categorical latent variable. It can be called anything. It is also a KNOWNCLASS variable. 
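
A stripped-down input with a single KNOWNCLASS variable might look like the sketch below. It borrows the variable names from the posts above, but the file name, the order of the NAMES list and the choice of which slope to free are assumptions, and the mediation part of the posted model is left out. With only one categorical latent variable, the class-specific parts are written directly under MODEL: using %cg#1%-style labels, as in the user's guide mixture examples that have a single class variable.

```mplus
TITLE:    count outcome with five known classes (sketch)
DATA:     FILE IS mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = eth_gene aud_grp pardrink cond_col age_col;
          COUNT = aud_grp;
          CLASSES = cg (5);
          KNOWNCLASS = cg (eth_gene = 1 eth_gene = 2 eth_gene = 3
                           eth_gene = 4 eth_gene = 5);
ANALYSIS: TYPE = MIXTURE;
MODEL:
          %OVERALL%
          aud_grp ON pardrink cond_col age_col;
          %cg#1%
          aud_grp ON age_col;           ! slope specific to known class 1
```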


Hi Linda, why is it that in some class models one must state the MODEL command a second time (after it is stated for the overall model), and in other class models you do not need a second MODEL command? I have a single class variable in my model (which I called "cg"). I did not restate the MODEL command when I started writing code for class-specific parameters. The model ran and the results are great. But then I noticed that I had not written the MODEL statement again for the class-specific estimates, like this:

    CLASSES = cg (5);
    KNOWNCLASS = cg (eth_gene=1 eth_gene=2 eth_gene=3 eth_gene=4 eth_gene=5);
    ANALYSIS: TYPE = MIXTURE;
    ALGORITHM=INTEGRATION;
    INTEGRATION = MONTECARLO;
    MODEL:
    %OVERALL%
    (overall parameters here)
    MODEL cg:
    %cg#1%
    (parameters that differ in class 1)

The model runs great when I DO NOT include the "MODEL cg" line, but when I do include it, I get the message:

    *** ERROR in MODEL command
    Unknown class model name CG specified in C-specific MODEL command.

Why is that? I am surprised that my initial run went smoothly without the second statement of the MODEL command. Thanks for helping me understand, Linda.


Please send the output and your license number to support@statmodel.com. 


Dear All, If I may kindly ask you for advice ... I received review of my paper, which applies multigroup latent class. One of the points reviewer makes is: Please provide a brief discussion on estimation when there is a different number of observations in each group. As my groups are time points and the number of responses at different time points ranges from 3000 to 10000 I would like to ask you: Does the different size of groups affect the results in Mplus? Could you help me with some references to deal with the issue raised by the reviewer? I already tried hard on my own to find the answer but was completely unsuccessful. Thanks in advance Piotr 


Are the same people in your groups? Do you have repeated measures of the same people across time?


Dear Linda, I have a panel, but I don't use this information. I just treat each period separately as a different group. It will be the next step of my research to use latent transition. I think that the reviewer is just interested in technical issues of estimating multigroup LCA with different number of respondents in each group. Thank you very much Piotr 


Multiple group analysis requires the groups to consist of different people. You should be comparing across time in a single group analysis. 

Mike Todd posted on Wednesday, August 07, 2013 - 11:41 am



We have data from 2200 individuals sampled from 2 different cities. Our goal is to use 7 individual-level indicators to obtain meaningful latent profile solutions. In exploring the possibility that the profile solutions differ between cities via the KNOWNCLASS option, we have obtained somewhat confusing results. Allowing only the estimated item means to vary across cities (KNOWNCLASS categories) results in a large increase in the number of parameters (30 vs. 52) but *worse* fit as judged by absolute differences in -2LL, BIC, and AIC. I estimated a series of 4 nested(?) models, each with 3 derived/estimated classes and 2 observed/known classes. Model 1 ignores city altogether (no KNOWNCLASS option); Model 2 allows item means to vary across cities; Model 3 allows item means and class probabilities to vary across cities; and Model 4 allows item means, item variances, and class probabilities to vary across cities. Model 4 fit better than Model 3, which fit better than Model 2, which makes sense to me. But only Model 4 fit better than Model 1, which confuses me. I feel like I must be missing something fundamental about the nestedness (or non-nestedness) of my models. The results suggest that Models 2 and 3 are not actually less constrained versions of Model 1. Is this true?


Model 1 is not on a loglikelihood metric comparable to the other models, which also means that BIC and AIC are not on a comparable metric. The reason is that Knownclass contributes to the likelihood (imagine an observed indicator, the probability of which is estimated). 


Hi, I have an LTA model with two time points and 4 classes at both time points. I need to test whether the transition probabilities differ by gender by testing 1) an LTA model in which the transition probabilities are constrained to equality across gender, and 2) a model where the probabilities are allowed to vary between genders. I haven't been able to find out how to constrain transition probabilities to equality. I tried it like this:

    CLASSES = csex (2) c1(4) c2(4);
    KNOWNCLASS IS csex (SEX=1 SEX=2);
    ANALYSIS: TYPE = mixture;
    STARTS = 100 25;
    MODEL:
    %OVERALL%
    c2#1 ON c1#1 csex#1 (p1);
    c2#1 ON c1#2 csex#1 (p2);
    c2#1 ON c1#3 csex#1 (p3);
    c2#2 ON c1#1 csex#1 (p4);
    c2#2 ON c1#2 csex#1 (p5);
    c2#2 ON c1#3 csex#1 (p6);
    c2#3 ON c1#1 csex#1 (p7);
    c2#3 ON c1#2 csex#1 (p8);
    c2#3 ON c1#3 csex#1 (p9);

And I can't add

    c2#1 ON c1#1 csex#2 (p1);
    c2#1 ON c1#2 csex#2 (p2);

etc., because I can't refer to the last class of csex in the MODEL command. Any help would be appreciated.


Sorry, I had to shorten my message and the meaning was lost. What I tried above does constrain the simple class probabilities as equal for men and women, but not the transition probabilities. 


Take a look at Mplus Web Note 13, parameterization 2. 


Thank you, I modified my code based on Web Note 13, like this:

    MODEL:
    %OVERALL%
    c1 ON csex;
    c2 ON c1;
    MODEL c1:
    %c1#1%
    c2#1 ON csex;
    c2#2 ON csex;
    c2#3 ON csex;
    %c1#2%
    c2#1 ON csex;
    etc.

but I get an error message: "Invalid ON statement: C2#1 ON CSEX#1. The order of categorical latent variables does not allow for this regression." But csex is mentioned first in the CLASSES command? So I'm puzzled as to what to do.


It should not be first. See the CLASSES option in the user's guide where the order is discussed. 


Dear Linda, the CLASSES option in the User's Guide tells me that the class on which other classes are regressed on should be first. Because I try to regress c1 and c2 on csex, I figured csex should be first in the CLASSES command. But be that as it may, I get a similar type of error message regardless of which order I use in the CLASSES. If csex is second or last, the error message says "Invalid ON statement: C1#1 ON CSEX#1. The order...", and if csex is first, I get the above error message "Invalid ON statement: C2#1 ON CSEX#1...". 


Please send the output and your license number to support@statmodel.com. 


Dear Prof Muthen, I have conducted two separate LPAs with two age groups (adolescents and young adults). I used the Lo-Mendell-Rubin likelihood ratio test and the bootstrapped likelihood ratio test to determine the best fitting and most parsimonious models. For adolescents this is clearly a 3-class solution and for young adults a 2-class solution. My understanding is that when it is clear, as in this case, that a 2-class solution won't fit for both groups, a multiple-group analysis is not feasible. Is this your view? If so, is it possible to test indicator mean differences across models, given that this is not a multiple group model? Thank you.


I agree that a multiple group analysis is not appropriate when the classes for the two groups are not the same. Indicator means across models cannot be tested. 

Shuai Chen posted on Thursday, October 16, 2014 - 7:25 pm



Hello, I am fitting a multigroup mixture model with the known class gender and 3 latent classes, following Example 7.21 but without parameter restrictions:

    CLASSES = cg (2) c (3);
    KNOWNCLASS = cg (male = 0 male = 1);
    MODEL:
    %OVERALL%
    c ON cg;

and comparing it with the model without multiple groups:

    CLASSES = c (3);

I expected to get a larger loglikelihood for the multigroup mixture model, but it is -5587.569, smaller than the -4874.786 from the model without multiple groups. I also fitted models with 1-5 latent classes, and each time the multigroup mixture model has the smaller loglikelihood. Any explanation? Thank you very much!


The Knownclass loglikelihood is not on a scale comparable to that without Knownclass. This is because Knownclass essentially has an observed binary indicator as an extra DV. If you want to make this type of comparison I think you have to use gender as a covariate of c in your "multiplegroup" model (in the model that takes gender into account): c ON gender; That way, you have the same DVs in the different models. 
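
The covariate version suggested here might be set up roughly as in the sketch below. The file name and the indicator names u1-u4 are assumptions; male is the gender variable named in the question. Because gender enters as a covariate and not as a KNOWNCLASS variable, the dependent variables are the same as in the model with CLASSES = c (3) and no gender.

```mplus
TITLE:    gender as a covariate of c (sketch)
DATA:     FILE IS mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = male u1-u4;           ! hypothetical indicator names
          CLASSES = c (3);
          CATEGORICAL = u1-u4;
ANALYSIS: TYPE = MIXTURE;
MODEL:
          %OVERALL%
          c ON male;                    ! class membership depends on gender
```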

Shuai Chen posted on Wednesday, October 22, 2014 - 7:03 am



Thanks for the suggestion. My DVs are categorical variables. However, I tried your suggested approach with 3 classes and found that the thresholds for the male group are the same as the thresholds for the female group. Can I fit a model with different thresholds for the two gender groups and still get the right loglikelihood?

J.D. Smith posted on Friday, October 24, 2014 - 11:55 am



Hi, I receive this error when trying to run a multiple group model with the Bayesian estimator using the mixture model with the KNOWNCLASS option:

    *** FATAL ERROR
    VARIANCE COVARIANCE MATRIX IS NOT SUPPORTED WITH ESTIMATOR=BAYES.
    PARTIAL EQUALITY BETWEEN TWO VARIANCE COVARIANCE BLOCKS.
    IF TWO PARAMETERS FROM TWO DIFFERENT VARIANCE COVARIANCE BLOCKS ARE HELD EQUAL
    THEN ALL THE PARAMETERS HAVE TO BE EQUAL IN THE TWO BLOCKS.
    USE ALGORITHM=MH TO RESOLVE THIS PROBLEM.

The following model runs with ESTIMATOR = ML:

    CLASSES = CG (3);
    KNOWNCLASS = CG (Cond = 1 Cond = 2 Cond = 3);
    Analysis:
    estimator = BAYES;
    TYPE = MIXTURE;
    Model:
    %OVERALL%
    Con2 on Con1;!(a1);
    Con2 on Age;!(b1);
    Con2 on TCGen;!(c1);
    Con2 on COACHS;!(d1);
    Age Tcgen Con1 COACHS;
    %CG#1%
    Con2 on Con1;!(a1);
    Con2 on Age;!(b1);
    Con2 on TCGen;!(c1);
    Con2 on COACHS;!(d1);
    Age Tcgen Con1 COACHS;
    %CG#2%
    Con2 on Con1;!(a2);
    Con2 on Age;!(b2);
    Con2 on TCGen;!(c2);
    Con2 on COACHS;!(d2);
    Age Tcgen Con1 COACHS;

%CG#3% is the same as above (the message is too long if it is included). The version is 7.2. Any help is much appreciated.


You have two options, both giving the same estimated conditional distribution [Con2 | Age Tcgen Con1 COACHS]:
1. Preferable, since you don't estimate as many parameters: remove all the lines
    Age Tcgen Con1 COACHS;
   With this option the distribution of [Age Tcgen Con1 COACHS] is not estimated, i.e., they are treated as true covariates and no assumptions are made about their distribution.
2. In each class add
    Age Tcgen Con1 COACHS with Age Tcgen Con1 COACHS;
   so that the covariances are also class-specific (not just the variances).
If you have missing data on these variables, only option 2 will be possible.
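
Applied to the posted input, option 1 might look like the sketch below. It is untested, and the four separate ON statements per class are combined into one line purely for compactness.

```mplus
MODEL:
          %OVERALL%
          Con2 ON Con1 Age TCGen COACHS;
          %CG#1%
          Con2 ON Con1 Age TCGen COACHS;
          %CG#2%
          Con2 ON Con1 Age TCGen COACHS;
          %CG#3%
          Con2 ON Con1 Age TCGen COACHS;
```

For option 2, each class-specific part would keep its variance line and gain a WITH line, shown here for the first class only:

```mplus
          %CG#1%
          Con2 ON Con1 Age TCGen COACHS;
          Age TCGen Con1 COACHS;        ! class-specific variances
          Age TCGen Con1 COACHS WITH Age TCGen Con1 COACHS;
```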
