
Anonymous posted on Monday, December 10, 2001  4:00 pm



Can you please explain the major differences between latent class analysis and cluster analysis? What are the advantages of using LCA?

bmuthen posted on Tuesday, December 11, 2001  4:57 pm



LCA can be seen as a special case of cluster analysis, within the family of mixture cluster analysis. Mixture cluster analysis has been advocated by McLachlan and other statisticians as perhaps a better clustering method than the traditional ones (see McLachlan ref. in the Mplus reference section). The real question is which criterion used to form the clusters is most relevant to the particular application. LCA assumes that it is relevant to find clusters of individuals for whom the observed variables are independent, which is another way of saying that in the total sample the latent class variable is the only thing that causes the observed variables to be related to each other. This in turn is in line with factor analysis. If instead you believe that the observed variables have direct relationships, perhaps LCA is not a good method for clustering. 
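The local independence idea in the reply above can be checked with a little arithmetic. The sketch below is only an illustration: the class proportions and item probabilities are made-up numbers, not estimates from any data set. It shows two binary items that are independent within each class yet positively related in the total sample, purely because class membership differs across people.

```python
# Two latent classes, two binary items; all numbers are hypothetical.
class_probs = [0.5, 0.5]
# P(item = 1 | class) for items u1 and u2
item_probs = [
    (0.9, 0.9),  # class 1
    (0.1, 0.1),  # class 2
]

def marginal_covariance(class_probs, item_probs):
    """Covariance of u1 and u2 in a mixture that assumes local independence."""
    p1 = sum(w * a for w, (a, b) in zip(class_probs, item_probs))
    p2 = sum(w * b for w, (a, b) in zip(class_probs, item_probs))
    # Within a class the joint probability is the product a * b.
    p12 = sum(w * a * b for w, (a, b) in zip(class_probs, item_probs))
    return p12 - p1 * p2

# Total sample: the items covary because the classes differ.
cov_total = marginal_covariance(class_probs, item_probs)
# A single class on its own: the covariance is zero by construction.
cov_within = marginal_covariance([1.0], [item_probs[0]])
print(round(cov_total, 4), round(cov_within, 4))
```

If the observed within-class covariance in real data is clearly not zero, that is the situation in which the reply suggests plain LCA may not be the right clustering criterion.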


Is it possible to correlate the error terms in MPlus, if the assumption of independence of the latent class indicators does not hold? In other words, is it possible to fit a LCA, and a cluster model, to see which one fits the data better? What happens under circumstances where indicators are correlated (despite being conditional on different classes), but their correlation is low? If cluster analysis is available, would I be able to include a predictor? Thanks in advance for your reply. 


If the indicators are continuous, you can use WITH statements. If they are categorical, they cannot be correlated. 


Thanks. 

npark posted on Thursday, February 19, 2004  7:31 am



I thought I posted this message yesterday to the list, but it might not have gone through. If it is a duplicate, please ignore this. I obtained 6 cluster solutions using 26 variables (LCA with binary and continuous indicators). A colleague questioned the possibility of multicollinearity among those 26 observed variables and the consequences of it. I examined the correlation matrix of the 26 variables, and found that .06 is the highest bivariate relationship and others are less than .06. I think there is a possibility of giving redundant information using correlated items, but I am not sure if the collinearity is a problem in mixture modeling and, if it is, at what point a researcher should be alarmed. I would greatly appreciate your answer, or please point me to the relevant materials (references) related to these issues. Thanks very much.


I'm surprised that your low correlations result in finding six classes. I think of multicollinearity as being a problem of extremely high correlations.

npark posted on Friday, February 20, 2004  8:33 am



I am not surprised by your answer! Sorry for the typo: .06 should be .6. Let me ask you my questions again. (1) How high would correlations among variables have to be to cause problems in mixture modeling? (2) Are data reduction techniques (e.g., exploratory factor analysis) recommended before mixture modeling? Thanks for your answer in advance.


1. I don't really know. But .6 seems fine to me. 2. It's always a good idea to look at data in several ways to better understand it. It is likely that if you find two factors, you will find three classes. If by data reduction, you mean creating factor scores and using the factor scores in mixture modeling, I have never heard that recommendation. 

npark posted on Friday, February 20, 2004  8:56 am



Thanks very much! 

Anonymous posted on Friday, October 08, 2004  5:26 pm



Dear Dr. Muthen: 1) GMMs with different numbers of groups are not nested; therefore, it is inappropriate to use the likelihood ratio test for model comparison (Ghosh & Sen, 1985; Nagin, 1999). Is this the same situation for LCA? 2) How is the sample-size adjusted BIC defined? Sometimes, I find that BIC is smaller for K classes than for K+1 classes, but adjusted BIC is smaller for K+1 classes than for K classes. Which one should be used for choosing a model?

Anonymous posted on Sunday, October 10, 2004  6:36 pm



Please ignore the above questions. I got it. 

Anonymous posted on Tuesday, November 16, 2004  7:55 am



I am using MPLUS to run a Latent Profile Analysis. I am using 14 scales (on a continuous metric) as input for the analyses. These scales measure 4 latent variables. However, the scale-level data were used in the analyses to estimate parameter information for each of the 14 scales. I was able to find a 3-class solution that made sense both in terms of interpretation and fit evidence. A few questions on the procedure: Is it true that the optimal class solution found by MPLUS is the number of latent traits - 1? Also, I know that probability information is used in the class assignment, but does MPLUS allow individual cases to "switch" classes throughout the iterative process (like a K-means cluster analysis)? Finally, since latent profile analysis is in the SEM framework, I'm not sure of the role of error. Are the input variables assumed to be measured without error? Or could error be 'partitioned' (like in CFA and uniqueness terms) into the MPLUS parameter estimates which result for each latent class? Error isn't assumed to be zero just because covariance matrix input is used with the analyses, right? Thank you for your assistance.

hildebtb posted on Tuesday, November 16, 2004  11:18 am



In terms of LCA versus cluster analysis, is it fair to say that LCA would be preferable to cluster analysis if you were trying to determine if there were subtypes of a particular diagnostic category? For instance, suppose you had 10 criterion variables that were indicative of types of body image disturbance. You hypothesize that these criteria are met in different patterns, with each pattern representing a different type of body image disturbance with different etiology, phenomenology, genetic predisposition, and comorbidity. Would LCA be more appropriate than cluster analysis to identify the different subtypes of body image disturbance?

bmuthen posted on Tuesday, November 16, 2004  1:01 pm



It is not always the case that the number of classes found (k) relates to the number of factors (m) as k = m+1, but it seems to often happen and does have a psychometric reason (see e.g. Bartholomew's book on our web site). Yes, individuals' class probabilities change over iterations and therefore most likely class membership also changes. You can think of LPA as having error variances, in this case the within-class variance for each outcome. So, the latent class variable explains some of the variation in the outcome and the residual the rest.
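To make the point about class probabilities changing over iterations concrete, here is a minimal sketch of a generic EM algorithm for a two-class mixture of normals on a single outcome. It is not a description of what Mplus does internally; the data, starting values, and the small variance floor are all assumptions chosen for illustration. Each class has its own mean and its own within-class (residual) variance.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(data, weights, means, variances):
    """One EM iteration for a univariate mixture of normals."""
    k = len(weights)
    # E step: posterior class probabilities for every observation.
    post = []
    for x in data:
        joint = [weights[c] * normal_pdf(x, means[c], variances[c]) for c in range(k)]
        total = sum(joint)
        post.append([j / total for j in joint])
    # M step: update class proportions, means, and within-class variances.
    n = len(data)
    new_w, new_m, new_v = [], [], []
    for c in range(k):
        nc = sum(p[c] for p in post)
        mean_c = sum(p[c] * x for p, x in zip(post, data)) / nc
        var_c = sum(p[c] * (x - mean_c) ** 2 for p, x in zip(post, data)) / nc
        new_w.append(nc / n)
        new_m.append(mean_c)
        new_v.append(max(var_c, 1e-6))  # assumed floor to avoid a collapsing variance
    return post, new_w, new_m, new_v

data = [0.1, 0.4, 0.5, 2.0, 3.8, 4.1, 4.4]  # hypothetical scores
weights, means, variances = [0.5, 0.5], [1.0, 3.0], [1.0, 1.0]
for iteration in range(5):
    post, weights, means, variances = em_step(data, weights, means, variances)
    # The posterior for the middle observation (x = 2.0) shifts as estimates change.
    print(iteration, [round(p, 3) for p in post[3]])
```

Unlike K-means, no case is ever hard-assigned during estimation; the most likely class is simply the largest posterior probability at the end.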


I am having trouble conceptually with using LCA in a particular data set that I have of steroid users. I am particularly interested in determining if there are unique patterns of steroid use. I have a list of drugs (14 total) that have different properties and are likely used in different ways to achieve different goals (build muscle, reduce fat, etc.). They can also be broken down in several ways (some are injected, some are taken orally, some speed up metabolism, others help build muscle, etc.). I also have quantity and frequency data for the larger constructs (how much taken orally and for what duration). My question is whether an LCA model is the most effective way to determine unique patterns of use. I am particularly concerned with violations of local independence, because amount and frequency should be correlated even within class (although the directions may be different), and the number of drugs taken will in most cases also be related to amount, so taking a drug vs. not taking a drug will usually be positively related to amount. Given the interrelationships between most of the indicator variables, I'm wondering whether the LCA model would be untrustworthy, given that I would have to allow most variables to be correlated within class to fit an accurate model.

bmuthen posted on Monday, December 27, 2004  2:55 pm



Sounds like you have binary use/no-use variables for each drug and for many drugs also Q-F information. LCA can handle within-class correlations, although with binary variables it is hard to estimate a model where there are many of these; it is easier with continuous outcomes. I wonder if a 2-part mixture model is relevant here; this is often useful with strong floor effects (many zeros). In your context 2-part modeling would consider, for each drug, a variable that has one binary part indicating if the drug is used and another continuous part indicating how much it is used. For the second part you could multiply Q and F into a continuous amount variable. You can then have a mixture model for each of the 2 parts analyzed simultaneously. The 2 parts are typically strongly correlated. This modeling can be done in Mplus and we have some positive experiences with it.
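As a rough illustration of the 2-part coding described above, the sketch below splits a Q x F amount into a binary use indicator and a continuous part that is only defined for users. The record layout, the variable names, and the log transform of the positive amounts are assumptions made for this example; the reply itself only specifies a binary part and a continuous amount part.

```python
import math

def two_part(amount):
    """Split a semicontinuous amount into (binary use, log amount or None)."""
    if amount is None:
        return None, None           # missing stays missing in both parts
    if amount > 0:
        return 1, math.log(amount)  # user: continuous part (log scale is an assumption)
    return 0, None                  # non-user: continuous part is treated as missing

# Hypothetical quantity (q) and frequency (f) values for one drug.
records = [
    {"q": 0.0, "f": 0.0},
    {"q": 50.0, "f": 4.0},
    {"q": 10.0, "f": 12.0},
]
for r in records:
    r["use"], r["log_amount"] = two_part(r["q"] * r["f"])
    print(r)
```

The two resulting variables per drug (or per drug group) would then enter the mixture model together, so that the zeros are handled by the binary part rather than distorting the continuous part.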


Thank you Dr. Muthen for the suggestion. Let me give you a bit more detail to make sure that this would work. The data are as follows:
3 steroids taken orally (binary use/no use for each drug)
Quantity of oral steroids
Frequency of oral steroids
7 steroids taken through injection (binary use/no use for each drug)
Quantity of injectable steroids (continuous)
Frequency of injectable steroids (continuous)
1 over-the-counter fat burning drug (binary use/no use)
Quantity of OTC fat burner (continuous)
Frequency of OTC fat burner (continuous)
3 illegal fat burning drugs (binary use/no use for each drug)
Quantity of each fat burning drug (continuous)
Frequency of all illegal fat burning drugs (continuous)
It seems as though this data structure would make the 2-part mixture model a bit more complicated, as we don't have Q and F for each individual drug. Would the 2-part mixture model still be possible and could you suggest an example?

bmuthen posted on Monday, December 27, 2004  5:46 pm



That data structure makes it more complex. I wouldn't throw all the variables in one LCA analysis, 2-part or not. You can always simplify. One way is to analyze only the 14 binary variables by LCA; that can be informative in itself. Another is to do 2-part LCA with 4 variables, where the binary variable is "any oral", any injection, any legal fat burning, and any illegal fat burning.


Thank you for the advice again. I have run the 14 binary indicator LCA and am happy with the results. I wanted to try the 2-part mixture model as you suggested, though, to see if adding the quantity and frequency variables adds interesting information. Is there an example that you could recommend? Also, would it be possible to allow Q and F to correlate within class using this model instead of combining them into a single variable? I believe that there are users who have High Q and Low F, High Q and High F, Low Q and High F, or Low Q and Low F for each of these drugs, and I think that creating a combined QxF variable would mask these groups.

bmuthen posted on Thursday, December 30, 2004  3:53 pm



Yes, you could correlate Q and F within class and that would not be problematic in 2-part LCA/LPA. Although the User's Guide does not have an example of exactly this kind, Ex 6.16 from the Version 3 User's Guide, although a growth model, could be used to generalize to mixture modeling. We encourage such combinations of examples based on the UG components. There is a paper by Brown et al on the Mplus web site that has a 2-part growth application, although not a mixture. I have a paper on 2-part growth mixture modeling and also have some setups for 2-part factor mixture analysis that I could share. One question I have found important is if the mixture holds for both parts or only one of them; these variations can be studied.  You might publish before I have time to

Blaze Aylmer posted on Thursday, September 29, 2005  2:16 am



What algorithms does MPLUS use to undertake cluster analysis? Thanks in advance 

bmuthen posted on Thursday, September 29, 2005  5:46 am



Latent class analysis can be used for clustering. This method has been found to perform better than k-means clustering. You can also use more general forms of latent class analysis where you allow for within-class (within-cluster) correlations among the variables.

K Faouzy posted on Monday, December 12, 2005  9:50 pm



I am having trouble conceptually with using LCA in a particular data set that I have of business strategy. I am particularly interested in determining if there are unique patterns of different strategy types. I have a list of variables (18 total) that represent different types of strategy, have different properties, and are likely used in different ways to achieve different types of strategy (Innovators, Followers, etc.). For instance, the respondents were asked to indicate the importance of product innovation to the accomplishment of their business strategy, using a seven-point Likert scale with end points “Least Important” (1) and “Extremely Important” (7). My sample is 120 companies that answered all 18 questions. I am expecting to obtain three to four distinct groups (clusters) of companies, with each group following one type of business strategy. Latent Class Analysis will be performed to identify three to four groups (clusters) as suggested by the literature. My first question is whether an LCA model is the most effective way to determine unique patterns of use to identify distinct groups (clusters). My second question, in terms of LCA versus cluster analysis, is whether it is fair to say that LCA would be preferable to cluster analysis in this situation.


It sounds like latent class analysis would be a good approach. LCA is a type of cluster analysis. It performs better than k-means clustering due to the fact that variances do not need to be equal across classes.

Tonia F posted on Friday, January 27, 2006  10:32 am



I would really appreciate some advice as I've never done cluster analysis before. I am working with population-based data of about 10,000 women. I have 5 binary/dichotomous (coded 0, 1) variables for types of violence (control, fear, demean, physical, sexual). We want to know which forms of violence are likely to co-occur. Others in the field have used cluster analysis to identify patterns. Is this an appropriate technique? Is hierarchical clustering best? K-means? If hierarchical, is single, complete or average linkage appropriate based on my data? There are so many decisions to make, but I am unsure which are the right ones based on my data. Many many thanks!


I think Latent Class Analysis would be appropriate for your data and research question. I think it is a preferred clustering technique over K-means clustering, for example.

Tonia F posted on Tuesday, January 31, 2006  12:42 pm



Thank you for your response Dr. Muthen. Is it possible to talk briefly about how one would determine the number of latent classes in a LCA? I am confused as to whether one would start with the minimum or maximum number of possible classes. In my situation, the maximum number is quite a lot. Many many thanks!


In my experience, one starts with the two-class solution and goes up from there. I suggest you look at the B. Muthen paper in a book edited by Kaplan which can be downloaded from the website. It shows a strategy for determining the number of classes.

anon posted on Wednesday, February 08, 2006  5:05 pm



i have a question about how to determine the most optimal clustering solution. can it be done by examining the BIC's alone? can entropy be used? both? what do you advise? have seen bits and pieces of this question asked, but never in a straightforward manner. thanks for your help. 

bmuthen posted on Wednesday, February 08, 2006  6:29 pm



There are several criteria in addition to BIC, and another one coming in Mplus Version 4 (bootstrapped LRT). For an overview, see my 2004 chapter in the Kaplan handbook; the pdf is on our web site: http://www.statmodel.com/recpapers.shtml
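For readers comparing the information criteria mentioned in this thread, the following sketch computes AIC, BIC, and the sample-size adjusted BIC from a log-likelihood, a parameter count, and a sample size. The adjusted BIC is written here in its commonly cited form, with n replaced by (n + 2) / 24; the log-likelihoods and parameter counts are invented for the example and do not come from any post above.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Return AIC, BIC, and sample-size adjusted BIC for one fitted model."""
    aic = -2 * loglik + 2 * n_params
    bic = -2 * loglik + n_params * math.log(n_obs)
    # Sample-size adjusted BIC: n is replaced by (n + 2) / 24.
    sabic = -2 * loglik + n_params * math.log((n_obs + 2) / 24)
    return {"AIC": aic, "BIC": bic, "SABIC": sabic}

# Hypothetical log-likelihoods and parameter counts for 2 to 4 classes.
n_obs = 500
fits = {2: (-3120.0, 11), 3: (-3095.0, 17), 4: (-3085.0, 23)}
table = {k: information_criteria(ll, p, n_obs) for k, (ll, p) in fits.items()}
for k, row in table.items():
    print(k, {name: round(v, 1) for name, v in row.items()})

best_bic = min(table, key=lambda k: table[k]["BIC"])
best_sabic = min(table, key=lambda k: table[k]["SABIC"])
print("lowest BIC:", best_bic, "lowest SABIC:", best_sabic)
```

With these made-up numbers the two criteria point to different numbers of classes, because the adjusted version penalizes each extra parameter less. That is the kind of disagreement several posters describe, and it is one reason the replies recommend also weighing the substantive meaning of the classes.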


Can you tell me where to find this paper? This link is no longer valid. "There are several criteria in addition to BIC, and another one coming in Mplus Version 4 (bootstrapped LRT). For an overview, see my 2004 chapter in the Kaplan handbook; the pdf is on our web site: http://www.statmodel.com/recpapers.shtml"


Following is the current link: http://www.statmodel.com/papers.shtml 

mpduser1 posted on Tuesday, February 16, 2010  9:05 am



I have a question about interpreting / reconfiguring the results from a latent class regression in Mplus. Specifically, in a 4-class model, is it possible to reconfigure or transform the Mplus output so that my regression results pertain to the log odds of being in class 1 versus all other classes, then class 2 versus all other classes, etc. (rather than, say, the log odds of being in class 1 vs. class 2, class 2 versus class 4, class 3 versus class 4, etc.)? Thanks very much.


The multinomial regression coefficients get their interpretation as the log odds of a class relative to the last class. I may be wrong, but I don't think there is a simple transformation to get a coefficient that portrays the log odds of a class relative to all others; unlike the coefficients of multinomial regression, it would probably depend on the values of the covariates. But one could compute the log odds you want for a certain set of values for the covariates.
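The last suggestion, computing the class-versus-all-others log odds at chosen covariate values, can be sketched as follows. The intercepts and slopes are hypothetical numbers for a 4-class model with one covariate and the last class as reference; they are not taken from any output in this thread.

```python
import math

def class_probabilities(intercepts, slopes, x):
    """Multinomial logit with the last class as reference (its logit is fixed at 0)."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]

def log_odds_vs_rest(probs, k):
    """Log odds of class k (0-based) against all other classes combined."""
    return math.log(probs[k] / (1.0 - probs[k]))

# Hypothetical estimates for classes 1-3; class 4 is the reference class.
intercepts = [0.5, -0.2, 1.0]
slopes = [0.8, 0.3, -0.4]
results = {}
for x in (0.0, 1.0, 2.0):
    probs = class_probabilities(intercepts, slopes, x)
    results[x] = log_odds_vs_rest(probs, 0)
    print(x, round(results[x], 3))

# The change per unit of x is not constant, unlike an ordinary logit slope.
step_1 = results[1.0] - results[0.0]
step_2 = results[2.0] - results[1.0]
print(round(step_1, 3), round(step_2, 3))
```

Because the two steps differ, no single coefficient summarizes the class 1 versus rest comparison, which is consistent with the caution in the reply.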

Anne Chan posted on Wednesday, February 17, 2010  1:46 pm



Hello. I ran an LCA on 4 motivation constructs in learning. My data set is quite big, with about 25000 respondents. According to AIC and BIC, the 13-class solution fits the data best. However, there are too many clusters in the solution and the characteristics of some clusters are too similar. Why are there so many clusters in the best-fit solution? Is it related to the big sample size? Do you have any suggestions for how I can work on this dataset and still get a best solution with only a few clusters?


I assume your 4 outcomes are continuous. Have you checked if BIC is better with a 1-factor model? It is not always the case that a simple LCA is a good model for the data. There is also the possibility of using a Factor Mixture Model so that the within-class correlations are not restricted to be zero. See the Mplus web site under Papers for articles on that.

Anne Chan posted on Thursday, February 18, 2010  5:24 am



Thanks for your suggestion. The 1-factor model is not a better fit. I will check the Factor Mixture Model. Thanks a lot!

anonymous posted on Saturday, February 20, 2010  7:45 am



Hello, I am conducting a LCA (n=1510) of psychiatric diagnoses in males. I have a few questions: 1. In terms of assessing the optimal number of classes, indices are somewhat contradictory. The LL estimate continues to decrease at 4 classes, the LMR likelihood statistic is significant at 3 classes (but no longer at 4 classes), the BIC begins to increase at 2 classes, and the sample-size adjusted BIC begins to increase at 3 classes. 2. For one psychiatric diagnosis or variable, 4 categories are generated, which is strange since it is a dichotomous variable. 3. I receive the following information which I'm not familiar with:
IN THE OPTIMIZATION, ONE OR MORE LOGIT THRESHOLDS APPROACHED AND WERE SET AT THE EXTREME VALUES. EXTREME VALUES ARE -15.000 AND 15.000.
THE FOLLOWING THRESHOLDS WERE SET AT THESE VALUES:
* THRESHOLD 1 OF CLASS INDICATOR DSM_SP_N FOR CLASS 1 AT ITERATION 400
* THRESHOLD 2 OF CLASS INDICATOR DSM_SP_N FOR CLASS 1 AT ITERATION 400
* THRESHOLD 3 OF CLASS INDICATOR DSM_SP_N FOR CLASS 1 AT ITERATION 400
* THRESHOLD 1 OF CLASS INDICATOR DSM_PDS_ FOR CLASS 1 AT ITERATION 400
* THRESHOLD 1 OF CLASS INDICATOR DSM_SAD_ FOR CLASS 2 AT ITERATION 400
* THRESHOLD 1 OF CLASS INDICATOR DSM_GAD_ FOR CLASS 2 AT ITERATION 400
Thank you!


1. Fit statistics can be contradictory. You need to also consider the theoretical meaning of the classes. 2. If this is the case, it sounds like you are not reading your data correctly. 3. When thresholds become large, they are fixed reflecting probabilities of zero and one. This can be helpful in defining the classes. 

anonymous posted on Sunday, February 21, 2010  4:40 pm



Thanks very much for your response. Do I understand you correctly that Mplus fixes thresholds by default to -15 and +15 when they are small or large, reflecting probabilities of 0 (-15) and 1 (+15)? Thanks!


Yes. 
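To see why thresholds at the extreme values correspond to probabilities of essentially zero and one, the sketch below converts a logit threshold into category probabilities for a binary indicator. It assumes the logistic parameterization in which P(u = 1 | class) = 1 / (1 + exp(threshold)); under that assumption a threshold of +15 puts nearly all the probability on the lower category and -15 puts nearly all of it on the higher category. If a different sign convention is in use, the two roles swap.

```python
import math

def prob_item_equals_one(threshold):
    """P(u = 1 | class) for a binary item, assuming P = 1 / (1 + exp(threshold))."""
    return 1.0 / (1.0 + math.exp(threshold))

for tau in (-15.0, 0.0, 15.0):
    p1 = prob_item_equals_one(tau)
    # Print the threshold, P(u = 1), and P(u = 0).
    print(tau, p1, 1.0 - p1)
```

A threshold of 0 gives a probability of one half, and by the time a threshold reaches 15 in absolute value the probability is within about 3 in 10 million of the boundary, so fixing it there changes the fit negligibly.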

anonymous posted on Monday, February 22, 2010  11:41 am



Is the parametric bootstrap method, BLRT, available in Mplus to help determine the optimal number of classes? 


Yes, it is the TECH14 option in the OUTPUT command. 

anonymous posted on Tuesday, February 23, 2010  12:11 pm



Thank you. I am noticing that for some models the entropy is perfect (1.0), does this indicate any type of overfitting problem? 


We've never seen entropy of one. It could be overfitting but I can't say for sure. 
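For context on what an entropy of 1.0 would mean, here is a small sketch of the relative entropy statistic as it is usually defined for mixture models: one minus the summed classification uncertainty, scaled by n times the log of the number of classes. The posterior probabilities below are invented. The statistic reaches 1 only when every case has a posterior probability of exactly 0 or 1 for every class.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; 1 means every case is classified with certainty."""
    n = len(posteriors)
    k = len(posteriors[0])
    # Terms with p = 0 are skipped, treating 0 * log(0) as 0.
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (n * math.log(k))

crisp = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # hypothetical posteriors
fuzzy = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
mixed = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
print(relative_entropy(crisp), relative_entropy(fuzzy), round(relative_entropy(mixed), 3))
```

Seen this way, a reported value of exactly 1.0 says the classes are perfectly separated in the sample, which is worth a second look at the data and at any thresholds fixed at extreme values.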

anonymous posted on Wednesday, March 10, 2010  1:20 pm



I am now attempting to include a continuous covariate in a 5-class LCA of complex survey data. All models ran successfully, except the 5-class model: WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS. I increased the initial and second-stage starts to 3000 and 2000, respectively, but continued to receive this message. I've also tried changing the starting values by using the STSEED function. What do you suggest?


Please send the full output and your license number to support@statmodel.com. 

Erika Wolf posted on Thursday, December 09, 2010  9:02 am



I'm running an LPA with random starts and I'm homing in on a 3-class solution. However, in the 3-class model I get the following message: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.194D-19. PROBLEM INVOLVING PARAMETER 20. The parameter this is referring to is estimated at 0 (and I would expect it to be 0 in that class). I'm wondering if this is causing the problem and whether I can ignore this message. Or is there something else I can do to resolve the issue? Thanks!


A parameter estimated at zero should not cause that message. Please send the output and your license number to support@statmodel.com. 


I am doing a set of LCAs using ordinal indicators with 3 categories. For the first LCA that I am doing I have 26 indicators of the latent class and 255 observations. Beginning with the 2-class model, I get the following error message: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.373D-18. PROBLEM INVOLVING PARAMETER 8. Any input on how I could address this would be great. Thank you.


Please send the output and your license number to support@statmodel.com. 

Susan Pe posted on Tuesday, September 18, 2012  6:36 am



Hi, I am considering doing a latent class analysis using 3 variables. Is it possible to use latent variables for the latent class analysis? I think the observed proxy for the variable may not be good enough and using a latent variable made up of 3 items may be a better proxy for the variable. Or should I try to come up with the best observed proxy for the variables in the latent class analysis? Thank you. 

Jon Heron posted on Tuesday, September 18, 2012  7:11 am



Continuous latent variables would be a second order Latent Profile Analysis I guess. 


If the variables are binary, no more than two classes can be extracted. See Example 7.17 where a factor is used as you suggest. You could try it both ways. One issue with using a factor is that the indicators may have a direct relationship to the categorical latent variable. 


Dear Bengt, I have 20 continuous variables to perform latent profile analysis with a sample size of around 900. 1) How do I determine the identifiability of my model? 2) Generally, should residual variances be held equal across classes, and what is the theoretical basis for this? Any good papers on this topic will be greatly appreciated. Thanks in advance, Selahadin


For both of these questions see the book, Finite Mixture Distributions, by Everitt and Hand. It is more difficult to estimate a model with variances unconstrained across classes.


Hi Linda, I picked a 4-class solution for my unconditional LCA model. I am now running conditional models. I notice that the item endorsement probabilities for each class change slightly as I add predictors. Is it possible for me to constrain the loadings of the indicators in the conditional models to what they were in the unconditional model? This way, the classes mean the same thing upon adding predictors to the model. Note that all of my indicators are dichotomous. Thank you.


This may indicate the need for direct effects. See the following paper on the website: Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368).


You may also consider the 3-step approach "R3STEP" that is introduced in the just-released Version 7. This holds the class proportions fixed at the values of the unconditional model. See Mplus Web Note 15 as well as the V7 training videos from Utrecht that are referred to on our home page.


Thanks, Bengt! I will take a look. 


Bengt and Linda, I have Mplus Version 6.12. Is there a way to constrain the solution to what it was in the unconditional model manually, without R3STEP? I was thinking that I could do this by using @ values for the thresholds of the dichotomous items, rather than start values (where * is used), which is demonstrated in the Mplus manual. Will this work? Thank you! 


Actually, Linda and Bengt, I will try to upgrade to Version 7. Is R3STEP preferable over trying to constrain the solution by hand (manually)? 


It is much more convenient to do this using R3STEP. You should upgrade. 


Hi Linda and Bengt, I am now using R3STEP and I think it is great! However, in comparing the effect of different predictors on class membership in univariate fashion, I notice that the fit indices are the same regardless of what predictor I enter. Specifically, the -2LL is 2765.7 and the AIC is 5753.4 regardless of whether I enter Internalizing, Externalizing, or Adversity. I would expect the fit of the model to be different depending on what predictor I enter. Or is it the case that these fit statistics refer to the fit of the initial model without the predictor (the first step of the 3-step process)? The numbers above are indeed very close to those from the unconditional model. Is there a way to get the fit of the model with the predictor added, enabling me to compare the fit of models with various predictors?


Variables tested using R3STEP are not part of the analysis model so will not affect fit indices. The only way for this to happen is if they are included in the MODEL command.


Dear Linda, I am running a LCA with both continuous and binary indicators. I am interested in reporting odds ratios and confidence intervals for a 2-class solution. I am trying to make one specific class the referent group to obtain the odds ratios in the direction I want. Without using the CINTERVAL statement, I am able to do this by specifying starting values for the classes in my model input statement. However, when I add the CINTERVAL statement, regardless of using starting values (both exact and extreme), I cannot seem to change the class that is used as the referent in the output. Do you know if there is a more appropriate way to do this?


I can't see how adding the CINTERVAL option to the OUTPUT command would affect the estimation of the model. Please send the two outputs and your license number to support@statmodel.com. 


Linda and Bengt, Is it a requirement that responses on the indicators in an LCA be independent of each other? That is, there should not be contingency between two indicators of the latent class, right? In other words, there should not be correlations between responses on the indicators above and beyond what is accounted for by the latent class factor, right? Is this stated in the Mplus manual so that I may cite this idea? Thank you, Lisa


LCA indicators can be highly correlated, but the model says that they are not correlated within class. This is the standard LCA conditional independence assumption that you will find in any LCA writing including in the LCA book our UG refers to: Hagenaars & McCutcheon (2002). You add classes until this is reasonably fulfilled (you can check by TECH10). 


Bengt, I tried requesting this information using TECH10, but received this message: TECHNICAL 10 OUTPUT TECH10 OUTPUT FOR CATEGORICAL VARIABLES IS NOT AVAILABLE BECAUSE THE FREQUENCY TABLE FOR THE LATENT CLASS INDICATOR MODEL PART IS TOO LARGE. Do I have too many indicators to check that we have met the conditional independence assumption? Is there any way that we can amend our code to check that we've met this criterion for a good solution? 


It is hard to do model fit testing with many categorical variables. In those cases I would take the more practical approach and increase the number of classes to see if new substantively meaningful classes come out.


Hi, I'm running a 3-step LCGA including a range of auxiliary variables (using x(r3step)). I have added CINTERVAL to the output command and expected to find ORs and CIs from the logistic regression in the output. However, they do not seem to be there. Am I missing something? Thanks


This is not implemented yet. 


Hello, This is a simple question. In order to run a latent class model, is it necessary to purchase the mixture or combination add-on for the base program? Thank you


Just the mixture add-on.


Hello: I understand that in LCA (binary indicators), when thresholds become large, they are fixed, reflecting probabilities of zero and one. In the case of LPA (count indicators), when one gets extreme logit parameters set at -15 and 15, does this mean that the probability of the mean for a given indicator is zero or one?


Please send that output to Support. 


Hello, I am new to LCA/LPA and have a few questions before I get started. I am interested in determining clusters of people with arthritis. Previous work using hierarchical cluster analysis has formed clusters mainly based on one construct (psychological profiles) and then looked at associations with other factors once the clusters were established. So my questions are: 1. With LCA/LPA, am I able to cluster on more than one construct, e.g. psych profiles (3 variables), performance of physical activity tests (4 variables), sensory testing (2 variables) and patient-reported pain and function (3 variables)? If this is possible, are there concerns in using multiple factors/constructs? 2. I understand that once the clusters are formed there is the assumption of independence of the variables. As per Dr. Muthen's first comment in this thread about suspecting that there are direct relationships between the observed variables, does one consider a minimal correlation, e.g. r<0.4, to determine whether direct relationships exist? Thank you


1. Perfectly fine to cluster based on all constructs jointly, working with all the indicators jointly. You have to choose the model specification of letting the factor means vary across classes (invariant indicator intercepts), or letting the indicator intercepts vary across classes (factor means fixed at zero). See my "hybrid" paper from 2008 on our website. 2. Note that having a latent class variable implies that the indicators are correlated; the model says that's why they are correlated. The question is if there is residual, or within-class, correlation between some indicators. I don't know if you have continuous or categorical indicators. In both cases, however, you can use WITH to capture some of these residual correlations.


Hello, I am conducting an LCA, but I keep receiving the following error message: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.243D-16. PROBLEM INVOLVING PARAMETER 12. According to the parameter specification, the problem parameter is tau (a threshold). I was initially testing ordinal indicators (3 points), so I changed the data to binary indicators (2 points), but I still receive the same error message. Could you suggest any solutions for this problem? 


Send the output to Support along with your license number. 


I am running an LCA attempting to identify classes within a diagnostic category. I am using complex survey data and have run into a few issues. I am using the analysis type COMPLEX MIXTURE. Any insight into possible resolutions would be greatly appreciated. My first question is about weighting. For the LCA, do I use the probability weight, the stratification variable, and the clustering variable? I received some advice that I only need to use the weight plus either the stratification or the clustering variable, and not both. My second question is more of a coding question. I have all the variables entered correctly, but if I remove the stratification variable from the use variables command and delete the stratification command, I get a warning that there is missing data. 


My apologies. Let me clarify question 2. If I delete the "stratification=" command, but leave the stratification variable under the "use variables" command the model runs fine with no error messages. If I remove the stratification variable from use variables, then I get a missing data error message. 


The stratification variable should not be on the USEVARIABLES list. There must be a problem with your model. Send the output and your license number to support@statmodel.com. 

Witold Orlik posted on Tuesday, September 26, 2017  9:56 am



Please help. I converted a data set from Stata to Mplus, then ran some latent class analyses in Mplus. Now I would like to transfer the 3-class solution from Mplus back to Stata for other analyses. In detail, I want to add a variable in Stata indicating which class each participant is in (so for the 3-class solution, participants would have a value of 1, 2, or 3). It would be based on the 3-class solution output file from Mplus. I hope that is a clear explanation; if not, please let me know and I will amend it. Thank you very much for any help. Regards, Witold 


Try using the SAVEDATA command with "FILE = name.dat; SAVE = CPROBS;". The last column of the saved file gives you the most likely class. 


Thank you Bengt. I used that command, saved the file as .sav, opened it in Notepad, transferred it to Excel, and then to Stata. Anyway, I am glad it worked. 
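
The manual route through a text editor and a spreadsheet can also be scripted. The sketch below is only an illustration under stated assumptions: it takes the saved file to be whitespace-delimited with the most likely class in the last column, as described in the reply above, and the function names, the sample rows, and the output column names `row` and `lclass` are all made up. Matching rows back to participants is left to the user; in practice an ID variable carried through the analysis is the safer way to merge.

```python
import csv
import io


def most_likely_class(lines):
    """Return the last column of each non-empty line as an integer class."""
    classes = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Class labels may be written as floats such as "2.000".
        classes.append(int(float(fields[-1])))
    return classes


def write_class_csv(classes, out):
    """Write one row per record: running row number and class membership."""
    writer = csv.writer(out)
    writer.writerow(["row", "lclass"])
    for i, c in enumerate(classes, start=1):
        writer.writerow([i, c])


if __name__ == "__main__":
    # Made-up rows standing in for the saved file: two indicators,
    # three class probabilities, then the most likely class.
    sample = [
        "1.000 0.000 0.900 0.050 0.050 1.000",
        "0.000 1.000 0.100 0.200 0.700 3.000",
    ]
    buffer = io.StringIO()
    write_class_csv(most_likely_class(sample), buffer)
    print(buffer.getvalue())
```

With a real file, the lines would come from `open("name.dat")` instead of the sample list, and the resulting CSV can be read into Stata with `import delimited`.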

Peng qian posted on Thursday, November 30, 2017  7:02 am



Dear Dr. Muthen: How can I simulate two equal groups in LCA? The syntax of mcex7.21.inp provided by Mplus is below.
montecarlo:
    names are y1-y4 g;
    generate = g(1);
    categorical = g;
    genclasses = cg(2) c(2);
    classes = cg(2) c(2);
    nobs = 1000;
    seed = 3454367;
    nrep = 1;
    save = ex7.21.dat;
We just obtain two random groups, not 500-500. Thanks! Sincerely 


Send your full output to Support along with your license number. Also explain what you mean by 500-500. 


Greetings, professor. We conducted a K-means cluster analysis to distinguish 4 types of parental styles based on 2 parental practices. A reviewer said that, given the fairly large sample size (600+), we should use LPA. But when we tried to do so, the number of latent classes came out as 2, which can hardly be interpreted theoretically. We noticed that LPA analyses usually have more than five indicators. Does the fact that we have ONLY TWO indicators (i.e., parental practices) make our findings statistically unreliable? Thank you! 


I would suggest taking the approach of UG ex 7.22. This is a more general model than LPA. 


Is there a limit on how many items I can run in Mplus for an LPA? I'm trying to perform an LPA on 63 traits, so that I can run regressions between these clustered traits and 8 predictors. However, every time I run this syntax, Mplus shuts down and quits working.
VARIABLE:
    NAMES ARE Q2_1 Q2_2 Q2_3 Q2_4 Q2_5 Q2_6 Q2_7 Q2_8 Q2_9 Q2_10 ...63 total traits Q7_1R Q7_25R Q7_28R Q7_40R Q7_4R Q7_15R Q7_27R Q7_38R Q7_3R Q7_9R Q7_17R Q7_29R Q7_26R Q7_36R Q7_35R Q7_37R;
    USEVARIABLES ARE Q2_1 Q2_2 Q2_3 Q2_4 Q2_5 Q2_6 Q2_7 Q2_8 Q2_9 Q2_10 Q2_1 Q2_2 Q2_3 Q2_4 Q2_5 Q2_6 Q2_7 Q2_8 Q2_9 Q2_10 ...63 total traits ;
    MISSING = ALL (99);
    CLASSES = C(3);
Analysis:
    TYPE = MIXTURE;
    iterations = 3000;
    starts = 100 10;
Plot:
    type is plot3;
OUTPUT:
    TECH11 TECH14; !SAMPSTAT RESIDUAL; 


There is no such limit. Send your output and data to Support along with your license number so we can see what's going on. 


My question is: what is the minimum required relative class size in latent class analysis or latent transition analysis? 


There is no such known quantity as far as I know. 


Dear Dr. Muthen, What is your recommendation? Thank you, Wen 


There are so many factors involved that nothing general can be stated. 
