LCA and cluster analysis

Anonymous posted on Monday, December 10, 2001 - 4:00 pm

Can you please explain the major differences between latent class analysis and cluster analysis? What are the advantages of using LCA?

bmuthen posted on Tuesday, December 11, 2001 - 4:57 pm

LCA can be seen as a special case of cluster analysis, within the family of mixture cluster analysis. Mixture cluster analysis has been advocated by McLachlan and other statisticians as perhaps a better clustering method than the traditional ones (see McLachlan ref. in the Mplus reference section). The real question is which criterion used to form the clusters is most relevant to the particular application. LCA assumes that it is relevant to find clusters of individuals for whom the observed variables are independent, which is another way of saying that in the total sample the latent class variable is the only thing that causes the observed variables to be related to each other. This in turn is in line with factor analysis. If instead you believe that the observed variables have direct relationships, perhaps LCA is not a good method for clustering.
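The conditional independence idea in this reply can be illustrated numerically. The following is a minimal Python sketch, not Mplus code; the class proportions and item probabilities are made-up values chosen only for illustration. It shows that two binary items that are independent within each class are still related in the total sample, because class membership is the only thing tying them together.

```python
# Hypothetical 2-class LCA with two binary items (all numbers are illustrative).
class_props = [0.4, 0.6]   # P(c = k)
p_item1 = [0.9, 0.2]       # P(u1 = 1 | c = k)
p_item2 = [0.8, 0.1]       # P(u2 = 1 | c = k)


def marginal_covariance(props, p1, p2):
    """Covariance of two binary items in the total sample when the items
    are independent within each class (the LCA assumption)."""
    # Within class k, P(u1 = 1, u2 = 1 | c = k) = p1[k] * p2[k].
    joint = sum(w * a * b for w, a, b in zip(props, p1, p2))
    mean1 = sum(w * a for w, a in zip(props, p1))
    mean2 = sum(w * b for w, b in zip(props, p2))
    return joint - mean1 * mean2


cov = marginal_covariance(class_props, p_item1, p_item2)
print(f"total-sample covariance: {cov:.4f}")
```

With a single class the same function returns zero, which is the sense in which the latent class variable alone accounts for the association.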

Carlos Elordi posted on Thursday, April 04, 2002 - 1:48 pm

Is it possible to correlate the error terms in Mplus if the assumption of independence of the latent class indicators does not hold?
In other words, is it possible to fit an LCA and a cluster model to see which one fits the data better? What happens when indicators are correlated (even after conditioning on class), but their correlation is low?
If cluster analysis is available, would I be able to include a predictor?
Thanks in advance for your reply.

Linda K. Muthen posted on Thursday, April 04, 2002 - 4:46 pm

If the indicators are continuous, you can use WITH statements. If they are categorical, they cannot be correlated.

Carlos Elordi posted on Friday, April 05, 2002 - 6:39 am

Thanks.

npark posted on Thursday, February 19, 2004 - 7:31 am

I thought I posted this message to the list yesterday, but it might not have gone through. If it is a duplicate, please ignore this.

I obtained a 6-cluster solution using 26 variables (LCA with binary and continuous indicators). A colleague raised the possibility of multicollinearity among those 26 observed variables and asked about its consequences. I examined the correlation matrix of the 26 variables and found that .06 is the highest bivariate correlation; the others are lower. I think correlated items may provide redundant information, but I am not sure whether collinearity is a problem in mixture modeling and, if it is, at what point a researcher should be alarmed.

I would greatly appreciate your answer, or a pointer to relevant materials (references) on these issues. Thanks very much.

Linda K. Muthen posted on Thursday, February 19, 2004 - 3:37 pm

I'm surprised that your low correlations result in finding six classes. I think of multicollinearity as being a problem of extremely high correlations.

npark posted on Friday, February 20, 2004 - 8:33 am

I am not surprised by your answer! Sorry for the typo -- .06 should be .6. Let me ask you again my questions.

(1) How high must correlations among variables be to cause problems in mixture modeling?

(2) Are data reduction techniques (e.g., exploratory factor analysis) recommended before mixture modeling?

Thanks for your answer in advance.

Linda K. Muthen posted on Friday, February 20, 2004 - 8:50 am

1. I don't really know. But .6 seems fine to me.

2. It's always a good idea to look at data in several ways to better understand it. It is likely that if you find two factors, you will find three classes. If by data reduction, you mean creating factor scores and using the factor scores in mixture modeling, I have never heard that recommendation.

npark posted on Friday, February 20, 2004 - 8:56 am

Thanks very much!

Anonymous posted on Friday, October 08, 2004 - 5:26 pm

Dear Dr. Muthen:

1) GMMs with different number of groups are not nested; therefore, it is inappropriate to use the likelihood ratio test for model comparison (Ghosh & Sen, 1985; Nagin, 1999). Is this the same situation for LCA?

2) How is the Sample-Size Adjusted BIC defined? Sometimes, I find that BIC is smaller for K classes than for K+1 classes, but adjusted BIC is smaller for K+1 classes than for K classes. Which one should be used for model selection?

Anonymous posted on Sunday, October 10, 2004 - 6:36 pm

Please ignore the above questions. I got it.
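For readers with the same question about the sample-size adjusted BIC: the usual definition replaces the sample size n in the BIC penalty with (n + 2) / 24. The Python sketch below assumes that definition; the log-likelihoods, parameter counts, and sample size are invented solely to show how the two criteria can disagree, as described in the question.

```python
import math


def bic(loglik, n_params, n):
    """Bayesian information criterion: -2 logL + p * ln(n)."""
    return -2.0 * loglik + n_params * math.log(n)


def adjusted_bic(loglik, n_params, n):
    """Sample-size adjusted BIC: same form, with n replaced by (n + 2) / 24."""
    return -2.0 * loglik + n_params * math.log((n + 2.0) / 24.0)


# Hypothetical results for a K-class and a (K+1)-class model, n = 500.
n = 500
model_k = {"loglik": -3000.0, "n_params": 10}
model_k1 = {"loglik": -2985.0, "n_params": 16}

for name, m in (("K", model_k), ("K+1", model_k1)):
    print(name,
          round(bic(m["loglik"], m["n_params"], n), 1),
          round(adjusted_bic(m["loglik"], m["n_params"], n), 1))
```

Because the adjusted criterion penalizes each parameter less, it can favor the larger model when the ordinary BIC favors the smaller one. Neither is automatically the right one to use; the thread's later replies suggest weighing several criteria together with the substantive meaning of the classes.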

Anonymous posted on Tuesday, November 16, 2004 - 7:55 am

I am using Mplus to run a Latent Profile Analysis. I am using 14 scales (on a continuous metric) as input for the analyses. These scales measure 4 latent variables. However, the scale-level data were used in the analyses to estimate parameter information for each of the 14 scales. I was able to find a 3-class solution that made sense -- both in terms of interpretation and fit evidence. A few questions on the procedure:

Is it true that the optimal class solution found by Mplus is the number of latent traits - 1?

Also, I know that probability information is used in the class assignment, but does Mplus allow individual cases to "switch" classes throughout the iterative process (like a K-means cluster analysis)?

Finally, since Latent Profile Analysis is in the SEM framework, I'm not sure of the role of error. Are the input variables assumed to be measured without error? Or could error be 'partitioned' (like in CFA and uniqueness terms) into the Mplus parameter estimates that result for each latent class? Error isn't assumed to be zero just because covariance matrix input is used with the analyses, right?

Thank you for your assistance

hildebtb posted on Tuesday, November 16, 2004 - 11:18 am

In terms of LCA versus cluster analysis, is it fair to say that LCA would be preferable to cluster analysis if you were trying to determine whether there were subtypes of a particular diagnostic category? For instance, say you had 10 criterion variables that were indicative of types of body image disturbance. You hypothesize that these criteria are met in different patterns, with each pattern representing a different type of body image disturbance with different etiology, phenomenology, genetic predisposition, and comorbidity. Would LCA be more appropriate than cluster analysis to identify the different subtypes of body image disturbance?

bmuthen posted on Tuesday, November 16, 2004 - 1:01 pm

It is not always the case that the number of classes found (k) relates to the number of factors (m) as k = m+1, but it seems to often happen and does have a psychometric reason (see e.g. Bartholomew's book on our web site).

Yes, individuals' class probabilities change over iterations, and therefore most likely class membership also changes.

You can think of LPA as having error variances - in this case the within-class variance for each outcome. So, the latent class variable explains some of the variation in the outcome and the residual the rest.
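The split between variation explained by the latent class variable and residual (within-class) variation follows the law of total variance. Below is a small Python sketch of that decomposition for one outcome; the class proportions, class means, and within-class variances are hypothetical values, and the function name is only illustrative.

```python
def variance_decomposition(props, means, within_vars):
    """Return (between-class variance, average within-class variance)
    for one continuous outcome in a latent profile model."""
    grand_mean = sum(w * m for w, m in zip(props, means))
    # Variation in the outcome explained by class membership.
    between = sum(w * (m - grand_mean) ** 2 for w, m in zip(props, means))
    # Residual variation: the within-class ("error") variance, averaged over classes.
    within = sum(w * v for w, v in zip(props, within_vars))
    return between, within


# Hypothetical 3-class solution with equal within-class variances.
between, within = variance_decomposition(
    props=[0.3, 0.5, 0.2],
    means=[1.0, 3.0, 6.0],
    within_vars=[0.8, 0.8, 0.8],
)
share = between / (between + within)
print(f"share of variance explained by class: {share:.2f}")
```

The two parts sum to the total variance of the outcome, so the within-class variance plays the role of a residual variance rather than being assumed to be zero.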

Tom Hildebrandt posted on Monday, December 27, 2004 - 1:38 pm

I am having trouble conceptually with using LCA in a particular data set that I have of steroid users. I am particularly interested in determining if there are unique patterns of steroid use. I have a list of drugs (14 total) that have different properties and are likely used in different ways to achieve different goals (build muscle, reduce fat, etc.). They can also be broken down in several ways (some are injected, some are taken orally, some speed up metabolism, others help build muscle, etc.). I also have quantity and frequency data for the larger constructs (how much taken orally and for what duration). My question is whether an LCA model is the most effective way to determine unique patterns of use. I am particularly concerned with violations of local independence, because amount and frequency should be correlated even within class (although the directions may be different), and the number of drugs taken will in most cases also be related to amount, so taking a drug vs. not taking a drug will usually be positively related to amount. Given the inter-relationships among most of the indicator variables, I'm wondering if the LCA model wouldn't be trustworthy, given that I would have to allow most variables to be correlated within class to fit an accurate model?

bmuthen posted on Monday, December 27, 2004 - 2:55 pm

Sounds like you have binary use/no-use variables for each drug and for many drugs also QF information. LCA can handle within-class correlations, although with binary variables it is hard to estimate a model where there are many of these; it is easier with continuous outcomes.

I wonder if a 2-part mixture model is relevant here; this is often useful with strong floor effects (many zeros). In your context 2-part modeling would consider - for each drug - a variable that has one binary part indicating if the drug is used and another continuous part indicating how much it is used. For the second part you could multiply Q and F into a continuous amount variable. You can then have a mixture model for each of the 2 parts analyzed simultaneously. The 2 parts are typically strongly correlated. This modeling can be done in Mplus and we have some positive experiences with it.
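The data preparation behind the 2-part idea can be sketched in a few lines of Python. This is only an illustration of the variable construction described above, not of the mixture model itself; the function name is hypothetical, and taking the log of the positive amounts is a common choice that is assumed here rather than stated in the reply.

```python
import math


def two_part(quantity, frequency):
    """Split one drug's use into a binary part and a continuous part.

    Returns (used, log_amount): used is 1 if any amount was taken, else 0;
    log_amount is the log of quantity * frequency for users and None
    (treated as missing) for non-users.
    """
    amount = quantity * frequency          # combine Q and F into one amount
    used = 1 if amount > 0 else 0
    log_amount = math.log(amount) if amount > 0 else None
    return used, log_amount


# Hypothetical (quantity, frequency) records for one drug.
records = [(0, 0), (2, 3), (10, 1), (0, 5)]
for q, f in records:
    print((q, f), "->", two_part(q, f))
```

Coding the continuous part as missing for non-users is what lets the many zeros be handled by the binary part alone instead of distorting the distribution of amounts.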

Tom Hildebrandt posted on Monday, December 27, 2004 - 5:35 pm

Thank you Dr. Muthen for the suggestion. Let me give you a bit more detail to make sure that this would work. The data is as follows:

3 steroids taken orally (binary use/no use for each drug)
Quantity of oral steroids
Frequency of oral steroids

7 steroids taken through injection (binary use/no use for each drug)
Quantity of injectable steroids (continuous)
Frequency of injectable steroids (continuous)

1 over the counter fat burning drug (binary use/no use)
Quantity of OTC fat burner (continuous)
Frequency of OTC fat burner (continuous)

3 illegal fat burning drugs (binary use/no use for each drug)
Quantity of each fat burning drug (continuous)
Frequency of all illegal fat burning drugs (continuous)

It seems as though this data would make the 2-part mixture model a bit more complicated as we don't have Q and F for each individual drug. Would the 2-part mixture model still be possible and could you suggest an example?

bmuthen posted on Monday, December 27, 2004 - 5:46 pm

That data structure makes it more complex. I wouldn't throw all the variables in one LCA analysis, 2-part or not. You can always simplify. One way is to analyze only the 14 binary variables by LCA; that can be informative in itself. Another is to do 2-part LCA with 4 variables, where the binary variable is "any oral", any injection, any legal fat burning, and any illegal fat burning.

Tom Hildebrandt posted on Thursday, December 30, 2004 - 3:30 pm

Thank you for the advice again. I have run the 14 binary indicator LCA and am happy with the results. I wanted to try the 2-part mixture model as you suggested, though, to see if adding the quantity and frequency variables adds interesting information. Is there an example that you could recommend?

Also, would it be possible to allow Q and F to correlate within class using this model instead of combining them into a single variable? I believe that there are those who have High Q and Low F, High Q and High F, Low Q and High F, and Low Q and Low F for each of these drugs, and I think that creating a combined QxF variable would mask these groups.

bmuthen posted on Thursday, December 30, 2004 - 3:53 pm

Yes, you could correlate Q and F within class and that would not be problematic in 2-part LCA/LPA. Although the User's Guide does not have an example of exactly this kind, Ex 6.16 from the Version 3 User's Guide - although a growth model - could be used to generalize to mixture modeling. We encourage such combinations of examples based on the UG components. There is a paper by Brown et al on the Mplus web site that has a 2-part growth application, although not a mixture. I have a paper on 2-part growth mixture modeling and also have some setups for 2-part factor mixture analysis that I could share. One question I have found important is if the mixture holds for both parts or only one of them; these variations can be studied. - You might publish before I have time to

Blaze Aylmer posted on Thursday, September 29, 2005 - 2:16 am

What algorithms does MPLUS use to undertake cluster analysis?

Thanks in advance

bmuthen posted on Thursday, September 29, 2005 - 5:46 am

Latent class analysis can be used for clustering. This method has been found to perform better than k-means clustering. You can also use more general forms of latent class analysis where you allow for within-class (within-cluster) correlations among the variables.

K Faouzy posted on Monday, December 12, 2005 - 9:50 pm

I am having trouble conceptually with using LCA in a particular data set that I have on business strategy. I am particularly interested in determining if there are unique patterns of different strategy types. I have a list of variables that represent different types of strategy (18 total) that have different properties and are likely used in different ways to achieve different types of strategy (Innovators, Followers, etc.). For instance, the respondents were asked to indicate the importance of product innovation to the accomplishment of their business strategy, using a seven-point Likert scale with end points "Least Important" (1) and "Extremely Important" (7). My sample is 120 companies that answered all 18 questions. I am expecting to obtain three to four distinct groups (clusters) of companies, with each group following one type of business strategy. Latent Class Analysis will be performed to identify three to four groups (clusters) as suggested by the literature. My first question is whether an LCA model is the most effective way to identify distinct groups (clusters). My second question is, in terms of LCA versus cluster analysis, whether it is fair to say that LCA would be preferable to cluster analysis in this situation.

Linda K. Muthen posted on Tuesday, December 13, 2005 - 9:19 am

It sounds like latent class analysis would be a good approach. LCA is a type of cluster analysis. It performs better than k-means clustering due to the fact that variances do not need to be equal across classes.

Tonia F posted on Friday, January 27, 2006 - 10:32 am

I would really appreciate some advice as I've never done cluster analysis before. I am working with population-based data on about 10,000 women. I have 5 binary/dichotomous (coded 0, 1) variables for types of violence (control, fear, demean, physical, sexual). We want to know which forms of violence are likely to co-occur. Others in the field have used cluster analysis to identify patterns. Is this an appropriate technique? Is hierarchical clustering best? K-means? If hierarchical, is single, complete, or average linkage appropriate for my data? There are so many decisions to make, but I am unsure which are the right ones for my data.
Many many thanks!

Linda K. Muthen posted on Saturday, January 28, 2006 - 9:13 am

I think Latent Class Analysis would be appropriate for your data and research question. I think it is a preferred clustering technique over K-means clustering, for example.

Tonia F posted on Tuesday, January 31, 2006 - 12:42 pm

Thank you for your response, Dr. Muthen. Is it possible to talk briefly about how one would determine the number of latent classes in an LCA? I am confused as to whether one would start with the minimum or maximum number of possible classes. In my situation, the maximum number is quite large.
Many many thanks!

Linda K. Muthen posted on Tuesday, January 31, 2006 - 1:15 pm

In my experience, one starts with the two class solution and goes up from there. I suggest you look at the B. Muthen paper in a book edited by Kaplan which can be downloaded from the website. It shows a strategy for determining the number of classes.

anon posted on Wednesday, February 08, 2006 - 5:05 pm

i have a question about how to determine the most optimal clustering solution. can it be done by examining the BIC's alone? can entropy be used? both? what do you advise?

have seen bits and pieces of this question asked, but never in a straightforward manner.

thanks for your help.

bmuthen posted on Wednesday, February 08, 2006 - 6:29 pm

There are several criteria in addition to BIC - and another one coming in Mplus Version 4 (bootstrapped LRT). For an overview, see my 2004 chapter in the Kaplan handbook - the pdf is on our web site:

http://www.statmodel.com/recpapers.shtml

Elaine Zanutto posted on Friday, September 25, 2009 - 10:43 am

Can you tell me where to find this paper? This link is no longer valid.

"There are several criteria in addition to BIC - and another one coming in Mplus Version 4 (bootstrapped LRT). For an overview, see my 2004 chapter in the Kaplan handbook - the pdf is on our web site:

http://www.statmodel.com/recpapers.shtml "

Linda K. Muthen posted on Friday, September 25, 2009 - 11:10 am

Following is the current link:

http://www.statmodel.com/papers.shtml

mpduser1 posted on Tuesday, February 16, 2010 - 9:05 am

I have a question about interpreting / reconfiguring the results from a latent class regression in Mplus.

Specifically, in a 4-class model, is it possible to reconfigure or transform the Mplus output so that my regression results pertain to the log odds of being in class 1 versus all other classes, then class 2 versus all other classes, etc. (rather than, say, the log odds of being in class 1 vs. class 4, class 2 versus class 4, class 3 versus class 4, etc.)?

Thanks very much.

Bengt O. Muthen posted on Tuesday, February 16, 2010 - 2:28 pm

The multinomial regression coefficients get their interpretation as the log odds of a class relative to the last class. I may be wrong, but I don't think there is a simple transformation to get a coefficient that portrays the log odds of a class relative to all others - unlike the coefficients of multinomial regression, it would probably depend on the values of the covariates. But one could compute the log odds you want for a certain set of values for the covariates.
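The computation suggested in this reply, the log odds of one class versus all others at chosen covariate values, can be sketched as follows. The Python code assumes a multinomial logit with the last class as reference (its intercept and slope fixed at zero) and a single covariate; the coefficient values and function names are hypothetical, not taken from any Mplus output.

```python
import math


def class_probabilities(intercepts, slopes, x):
    """Multinomial logit class probabilities for one covariate value x.
    The last class is the reference, so its intercept and slope are 0."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)]
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]


def log_odds_vs_rest(intercepts, slopes, x, k):
    """Log odds of class k (0-based) versus all other classes combined."""
    p = class_probabilities(intercepts, slopes, x)[k]
    return math.log(p / (1.0 - p))


# Hypothetical 4-class estimates; class 4 is the reference class.
intercepts = [0.5, -0.2, 0.1, 0.0]
slopes = [1.0, 0.3, -0.4, 0.0]

for x in (-1.0, 0.0, 1.0):
    value = log_odds_vs_rest(intercepts, slopes, x, k=0)
    print(f"x = {x:+.1f}: log odds of class 1 vs. rest = {value:.3f}")
```

The change in these log odds per unit of x is not constant, which is why no single transformed coefficient can stand in for "class 1 versus all others"; the quantity has to be evaluated at specific covariate values.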

Anne Chan posted on Wednesday, February 17, 2010 - 1:46 pm

Hello. I ran an LCA on 4 motivation constructs in learning. My data set is quite big, with about 25000 respondents. According to AIC and BIC, the 13-class solution fits the data best. However, there are too many clusters in the solution, and the characteristics of some clusters are too similar.

Why are there so many clusters in the best-fitting solution? Is it related to the big sample size? Do you have any suggestions for how I can work with this dataset but arrive at a best solution with only a few clusters?

Bengt O. Muthen posted on Wednesday, February 17, 2010 - 5:03 pm

I assume your 4 outcomes are continuous. Have you checked if BIC is better with a 1-factor model? It is not always the case that a simple LCA is a good model for the data. There is also the possibility of using a Factor Mixture Model so that the within-class correlations are not restricted to be zero. See the Mplus web site under Papers for articles on that.

Anne Chan posted on Thursday, February 18, 2010 - 5:24 am

Thanks for your suggestion. 1-factor model is not a better fit. I will check the Factor Mixture Model. Thanks a lot!

anonymous posted on Saturday, February 20, 2010 - 7:45 am

Hello,
I am conducting an LCA (n=1510) of psychiatric diagnoses in males. I have a few questions:
1. In terms of assessing the optimal number of classes, indices are somewhat contradictory. The LL estimate continues to decrease at 4 classes, the LMR likelihood statistic is significant at 3 classes (but no longer at 4 classes), the BIC begins to increase at 2 classes, and the sample-size adjusted BIC begins to increase at 3 classes.
2. For one psychiatric diagnosis or variable, 4 categories are generated - which is strange since it is a dichotomous variable.
3. I receive the following information which I'm not familiar with:

IN THE OPTIMIZATION, ONE OR MORE LOGIT THRESHOLDS APPROACHED AND WERE SET
AT THE EXTREME VALUES. EXTREME VALUES ARE -15.000 AND 15.000.
THE FOLLOWING THRESHOLDS WERE SET AT THESE VALUES:
* THRESHOLD 1 OF CLASS INDICATOR DSM_SP_N FOR CLASS 1 AT ITERATION 400
* THRESHOLD 2 OF CLASS INDICATOR DSM_SP_N FOR CLASS 1 AT ITERATION 400
* THRESHOLD 3 OF CLASS INDICATOR DSM_SP_N FOR CLASS 1 AT ITERATION 400
* THRESHOLD 1 OF CLASS INDICATOR DSM_PDS_ FOR CLASS 1 AT ITERATION 400
* THRESHOLD 1 OF CLASS INDICATOR DSM_SAD_ FOR CLASS 2 AT ITERATION 400
* THRESHOLD 1 OF CLASS INDICATOR DSM_GAD_ FOR CLASS 2 AT ITERATION 400

thank you!

Linda K. Muthen posted on Saturday, February 20, 2010 - 9:53 am

1. Fit statistics can be contradictory. You need to also consider the theoretical meaning of the classes.
2. If this is the case, it sounds like you are not reading your data correctly.
3. When thresholds become large, they are fixed reflecting probabilities of zero and one. This can be helpful in defining the classes.

anonymous posted on Sunday, February 21, 2010 - 4:40 pm

Thanks very much for your response. Do I understand you correctly that Mplus by default fixes thresholds at -15 and +15 when they become very small or very large, reflecting probabilities of 0 (-15) and 1 (+15)?
thanks!

Bengt O. Muthen posted on Sunday, February 21, 2010 - 4:43 pm

Yes.
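The link between a threshold of -15 or +15 and a probability of essentially 0 or 1 can be checked with a few lines of Python. This sketch assumes the logit parameterization for a binary indicator in which P(u = 1 | class) = 1 / (1 + exp(threshold)); under that assumption, which probability goes to 0 and which goes to 1 depends on the category being described. The function name is hypothetical.

```python
import math


def prob_upper_category(threshold):
    """P(u = 1 | class) for a binary indicator, assuming the logit
    parameterization P(u = 1) = 1 / (1 + exp(threshold))."""
    return 1.0 / (1.0 + math.exp(threshold))


for tau in (-15.0, 0.0, 15.0):
    p1 = prob_upper_category(tau)
    # P(u = 0) is the complement of P(u = 1).
    print(f"threshold {tau:+5.1f}: P(u=1) = {p1:.7f}, P(u=0) = {1.0 - p1:.7f}")
```

At a threshold of +15 nearly all of the probability sits in the lower category, and at -15 nearly all of it sits in the upper category, so fixing the threshold at these values amounts to fixing the item probability at 0 or 1 in that class.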

anonymous posted on Monday, February 22, 2010 - 11:41 am

Is the parametric bootstrap method, BLRT, available in Mplus to help determine the optimal number of classes?

Linda K. Muthen posted on Monday, February 22, 2010 - 5:18 pm

Yes, it is the TECH14 option in the OUTPUT command.

anonymous posted on Tuesday, February 23, 2010 - 12:11 pm

Thank you. I am noticing that for some models the entropy is perfect (1.0), does this indicate any type of over-fitting problem?

Linda K. Muthen posted on Tuesday, February 23, 2010 - 4:21 pm

We've never seen entropy of one. It could be over-fitting but I can't say for sure.
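For context on what an entropy of 1.0 would mean, here is a short Python sketch of the commonly cited relative entropy measure, 1 - sum_i sum_k (-p_ik ln p_ik) / (n ln K), where p_ik is case i's posterior probability for class k. It is assumed, not confirmed in this thread, that this matches the value reported in the output; the posterior probabilities below are invented.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; 1 means every case is classified with
    certainty. `posteriors` holds one row of class probabilities per case."""
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:                  # treat 0 * ln(0) as 0
                total += -p * math.log(p)
    return 1.0 - total / (n * math.log(k))


# Hypothetical posterior class probabilities for three cases, two classes.
crisp = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
fuzzy = [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]

print("crisp:", relative_entropy(crisp))
print("fuzzy:", round(relative_entropy(fuzzy), 3))
```

Under this formula a value of exactly 1.0 requires every posterior probability to be 0 or 1, which is consistent with the reply's suspicion that something unusual, possibly over-fitting, is going on.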

anonymous posted on Wednesday, March 10, 2010 - 1:20 pm

I am now attempting to include a continuous covariate in a 5-class LCA of complex survey data. All models ran successfully, except the 5-class model:

WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS.

However, I increased the initial and second starting value to 3000 and 2000, respectively, but continued to receive this message. I've also tried changing the starting values by using the STSEED function.

What do you suggest?

Linda K. Muthen posted on Wednesday, March 10, 2010 - 2:45 pm

Please send the full output and your license number to support@statmodel.com.

Erika Wolf posted on Thursday, December 09, 2010 - 9:02 am

I'm running an LPA with random starts and I'm homing in on a 3-class solution. However, in the 3-class model I get the following message: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.194D-19. PROBLEM INVOLVING PARAMETER 20.

The parameter this is referring to is estimated at 0 (and I would expect it to be 0 in that class). I'm wondering if this is causing the problem and I can ignore this message? Or is there something else I can do to resolve the issue? Thanks!

Linda K. Muthen posted on Thursday, December 09, 2010 - 12:18 pm

A parameter estimated at zero should not cause that message. Please send the output and your license number to support@statmodel.com.

Rashelle J. Musci posted on Monday, August 22, 2011 - 6:15 pm

I am doing a set of LCAs using ordinal indicators with 3 categories. For the first LCA, I have 26 indicators of the latent class and 255 observations. Beginning with the 2-class model, I get the following error message:
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE
TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE
FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING
VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE
CONDITION NUMBER IS -0.373D-18. PROBLEM INVOLVING PARAMETER 8.

Any input on how I could address this would be great. Thank you.

Linda K. Muthen posted on Monday, August 22, 2011 - 8:25 pm

Please send the output and your license number to support@statmodel.com.

Susan Pe posted on Tuesday, September 18, 2012 - 6:36 am

Hi, I am considering doing a latent class analysis using 3 variables. Is it possible to use latent variables for the latent class analysis? I think the observed proxy for the variable may not be good enough and using a latent variable made up of 3 items may be a better proxy for the variable. Or should I try to come up with the best observed proxy for the variables in the latent class analysis? Thank you.

Jon Heron posted on Tuesday, September 18, 2012 - 7:11 am

Continuous latent variables would be a second order Latent Profile Analysis I guess.

Linda K. Muthen posted on Tuesday, September 18, 2012 - 10:44 am

If the variables are binary, no more than two classes can be extracted.

See Example 7.17 where a factor is used as you suggest. You could try it both ways. One issue with using a factor is that the indicators may have a direct relationship to the categorical latent variable.

Selahadin Ibrahim posted on Thursday, September 20, 2012 - 6:50 am

Dear Bengt,

I have 20 continuous variables to perform latent profile analysis with a sample size of around 900.
1) How do I determine the identifiability of my model?
2) Generally, should residual variances be held equal across classes? And what is the theoretical basis for this?

Any good papers on this topic will be greatly appreciated.

Thanks in advance

Selahadin

Linda K. Muthen posted on Thursday, September 20, 2012 - 2:57 pm

For both of these questions, see the book Finite Mixture Distributions by Everitt and Hand. It is more difficult to estimate a model with variances unconstrained across classes.

Lisa M. Yarnell posted on Tuesday, September 25, 2012 - 5:18 pm

Hi Linda, I picked a 4-class solution for my unconditional LCA model. I am now running conditional models. I notice that the item endorsement probabilities for each class change slightly as I add predictors. Is it possible for me to constrain the loadings of the indicators in the conditional models to what they were in the unconditional model? This way, the classes mean the same thing upon adding predictors to the model. Note that all of my indicators are dichotomous. Thank you.

Linda K. Muthen posted on Tuesday, September 25, 2012 - 5:28 pm

This may indicate the need for direct effects. See the following paper on the website:

Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368).

Bengt O. Muthen posted on Tuesday, September 25, 2012 - 6:12 pm

You may also consider the 3-step approach "R3STEP" that is introduced in the just released Version 7. This holds the class proportions fixed at the values of the unconditional model. See Mplus Web Note 15 as well as the V7 training videos from Utrecht that are referred to on our home page.

Lisa M. Yarnell posted on Wednesday, September 26, 2012 - 8:09 pm

Thanks, Bengt! I will take a look.

Lisa M. Yarnell posted on Thursday, September 27, 2012 - 12:48 am

Bengt and Linda, I have Mplus Version 6.12. Is there a way to constrain the solution to what it was in the unconditional model manually, without R3STEP? I was thinking that I could do this by using @ values for the thresholds of the dichotomous items, rather than start values (where * is used), which is demonstrated in the Mplus manual. Will this work? Thank you!

Lisa M. Yarnell posted on Thursday, September 27, 2012 - 12:58 am

Actually, Linda and Bengt, I will try to upgrade to Version 7. Is R3STEP preferable over trying to constrain the solution by hand (manually)?

Linda K. Muthen posted on Thursday, September 27, 2012 - 8:34 am

It is much more convenient to do this using R3STEP. You should upgrade.

Lisa M. Yarnell posted on Friday, September 28, 2012 - 5:50 pm

Hi Linda and Bengt, I am now using R3STEP and I think it is great! However, in comparing the effect of different predictors on class membership in univariate fashion, I notice that the fit indices are the same regardless of what predictor I enter.

Specifically, the -2LL is -2765.7 and the AIC is 5753.4 regardless of whether I enter Internalizing, Externalizing, or Adversity. I would expect the fit of the model to be different depending on what predictor I enter.

Or, is it the case that these fit statistics refer to the fit of the initial model without the predictor (the first step of the 3-step process)? The numbers above are indeed very close to those from the unconditional model.

Is there a way to get the fit of the model with the predictor added, enabling me to compare the fit of models with various predictors?

Linda K. Muthen posted on Sunday, September 30, 2012 - 12:30 pm

Variables tested using R3STEP are not part of the analysis model, so they will not affect fit indices. The only way for this to happen is if they are included in the MODEL command.

William Arguelles posted on Wednesday, December 12, 2012 - 9:21 am

Dear Linda,

I am running an LCA with both continuous and binary indicators. I am interested in reporting odds ratios and confidence intervals for a 2-class solution. I am trying to make one specific class the referent group to obtain the odds ratios in the direction I want. Without using the CINTERVAL statement, I am able to do this by specifying starting values for the classes in my model input statement. However, when I add the CINTERVAL statement, regardless of using starting values (both exact and extreme), I cannot seem to change the class that is used as the referent in the output. Do you know if there is a more appropriate way to do this?

Linda K. Muthen posted on Wednesday, December 12, 2012 - 12:29 pm

I can't see how adding the CINTERVAL option to the OUTPUT command would affect the estimation of the model. Please send the two outputs and your license number to support@statmodel.com.

Lisa M. Yarnell posted on Sunday, January 06, 2013 - 3:43 pm

Linda and Bengt,

Is it a requirement that responses on the indicators in an LCA be independent of each other? That is, there should not be contingency between two indicators of the latent class, right? In other words, there should not be correlations between responses on the indicators above and beyond what is accounted for by the latent class factor, right?

Is this stated in the Mplus manual so that I may cite this idea?

Thank you,
Lisa

Bengt O. Muthen posted on Sunday, January 06, 2013 - 5:44 pm

LCA indicators can be highly correlated, but the model says that they are not correlated within class. This is the standard LCA conditional independence assumption that you will find in any LCA writing including in the LCA book our UG refers to: Hagenaars & McCutcheon (2002). You add classes until this is reasonably fulfilled (you can check by TECH10).

Lisa M. Yarnell posted on Monday, January 07, 2013 - 3:08 pm

Bengt, I tried requesting this information using TECH10, but received this message:

TECHNICAL 10 OUTPUT

TECH10 OUTPUT FOR CATEGORICAL VARIABLES IS NOT AVAILABLE BECAUSE THE FREQUENCY
TABLE FOR THE LATENT CLASS INDICATOR MODEL PART IS TOO LARGE.

Do I have too many indicators to check that we have met the conditional independence assumption? Is there any way that we can amend our code to check that we've met this criterion for a good solution?

Bengt O. Muthen posted on Monday, January 07, 2013 - 4:37 pm

It is hard to do model fit testing with many categorical variables. In those cases I would take the more practical approach and increase the number of classes to see if new substantively meaningful classes come out.

Karen-Inge Karstoft posted on Monday, January 14, 2013 - 10:43 am

Hi,

I'm running a 3-step LCGA including a range of auxiliary variables (using x(r3step)). I have added CINTERVAL to the output command and expected to find ORs and CIs from the logistic regression in the output. However, they do not seem to be there. Am I missing something?

Thanks

Bengt O. Muthen posted on Tuesday, January 15, 2013 - 12:51 pm

This is not implemented yet.

Brandy Gatlin posted on Friday, November 22, 2013 - 3:11 pm

Hello,

This is a simple question. In order to run a latent class model, is it necessary to purchase the mixture or combination add-on for the base program?

Thank you

Bengt O. Muthen posted on Saturday, November 23, 2013 - 6:06 am

Just the mixture add-on.

J.D. Haltigan posted on Monday, January 27, 2014 - 1:11 pm

Hello:

I understand in LCA (binary indicators) that when thresholds become large, they are fixed reflecting probabilities of zero and one. In the case of LPA (count indicators) when one gets extreme logit parameters set at -15 and 15, does this mean that the probability of the mean for a given indicator is zero or one?

Bengt O. Muthen posted on Monday, January 27, 2014 - 3:00 pm

Please send that output to Support.
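
For the binary-indicator case mentioned in the question, the link between a large logit threshold and a probability of zero or one can be sketched in Python. This assumes the usual logit parameterization, in which P(u = 1 | class) = 1 / (1 + exp(threshold)); it does not address the count-indicator case, which the question leaves open:

```python
import math

def prob_u1_given_class(threshold):
    """P(u = 1 | class) for a binary indicator with a logit threshold."""
    return 1.0 / (1.0 + math.exp(threshold))

# Thresholds at the extreme values mentioned in the question.
p_at_plus_15 = prob_u1_given_class(15.0)    # effectively 0
p_at_minus_15 = prob_u1_given_class(-15.0)  # effectively 1

print(p_at_plus_15, p_at_minus_15)
```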

lisa Carlesso posted on Friday, May 16, 2014 - 8:28 am

Hello

I am new to LCA/LPA and have a few questions before I get started. I am interested in determining clusters of people with arthritis. Previous work using hierarchical cluster analysis has analyzed clusters mainly based on one construct (psychological profiles) and then looked at associations with other factors once the clusters were established. So my questions are:

1. With LCA/LPA am I able to cluster on more than one construct e.g. psych profiles (3 variables), performance of physical activity tests (4 variables), sensory testing (2 variables) and patient reported pain and function (3 variables)? If this is possible are there concerns in using multiple factors/constructs?

2. I understand that once the clusters are formed that there is the assumption of independence of the variables. As per Dr. Muthen's first comment in this thread about suspecting that there are direct relationships between the observed variables, does one consider a minimal correlation e.g. r<0.4 to determine whether direct relationships exist?

thank you

Bengt O. Muthen posted on Saturday, May 17, 2014 - 11:41 am

1. Perfectly fine to cluster based on all constructs jointly, working with all the indicators jointly. You have to choose the model specification of letting the factor means vary across classes (invariant indicator intercepts), or letting the indicator intercepts vary across classes (factor means fixed at zero). See my "hybrid" paper from 2008 on our website.

2. Note that having a latent class variable implies that the indicators are correlated; the model says that's why they are correlated. The question is if there is residual, or within-class, correlation between some indicators. I don't know if you have continuous or categorical indicators. In both cases, however, you can use WITH to capture some of these residual correlations.

Tomo Umemura posted on Thursday, April 13, 2017 - 6:14 pm

Hello,

I am conducting LCA. But I keep receiving the following error message:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE
TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE
FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING
VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE
CONDITION NUMBER IS -0.243D-16. PROBLEM INVOLVING PARAMETER 12.

According to the parameter specification, the problem parameter is tau (a threshold). I was initially testing ordinal indicators (3 points), so I changed the data to binary indicators (2 points), but I still receive the same error message. Could you suggest any solutions for this problem?

Bengt O. Muthen posted on Friday, April 14, 2017 - 3:55 pm

Send the output to Support along with your license number.

Benjamin Trachik posted on Saturday, May 27, 2017 - 7:15 am

I am running an LCA attempting to identify classes within a diagnostic category. I am using complex survey data and have run into a few issues. I am using the analysis type Complex mixture. Any insight into some possible resolutions would be greatly appreciated.

The first question I have is with weighting. For the LCA do I use the probability weight, stratification, and clustering variable? I received some advice that I only need to use the weight plus either the stratification or clustering variable and not both.

My second question is more of a coding question. I have all the variables entered correctly, but if I remove the stratification variable from the use variables command and delete the stratification command, I get a warning that there is missing data.

Benjamin Trachik posted on Sunday, May 28, 2017 - 3:21 pm

My apologies. Let me clarify question 2.

If I delete the "stratification=" command, but leave the stratification variable under the "use variables" command the model runs fine with no error messages. If I remove the stratification variable from use variables, then I get a missing data error message.

Linda K. Muthen posted on Monday, May 29, 2017 - 6:48 am

The stratification variable should not be on the USEVARIABLES list. There must be a problem with your model. Send the output and your license number to support@statmodel.com.

Witold Orlik posted on Tuesday, September 26, 2017 - 9:56 am

Please help.
I converted a data set from Stata to Mplus, then ran some latent class analyses using Mplus. Now I would like to transfer the 3-class solution back from Mplus to Stata for other analyses. In detail, I want to add a variable to Stata indicating, for each participant, which class they are in (so for the 3-class solution, participants would have a value of 1, 2, or 3). It would be based on the 3-class solution output file from Mplus. I hope that is a clear explanation; if not, please let me know and I will amend it.
Thank you very much for any help.

Regards

Witold

Bengt O. Muthen posted on Tuesday, September 26, 2017 - 6:01 pm

Try using the Savedata command with

File = name.dat;
Save = cprobs;

The last column gives you the most likely class.
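
One way to move the saved classes into another package is to read the saved file as whitespace-delimited text and keep the last column. The Python sketch below assumes that layout; the sample rows are made up for illustration, and the helper name most_likely_class is hypothetical:

```python
import csv

def most_likely_class(lines):
    """Return the last column of each non-empty row as an integer class label."""
    classes = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # The class may be written as e.g. "2.000", so go through float first.
        classes.append(int(float(fields[-1])))
    return classes

# Made-up rows: two indicators, three class probabilities, most likely class.
sample_rows = [
    "1.000 0.000 0.900 0.050 0.050 1.000",
    "0.000 1.000 0.100 0.200 0.700 3.000",
    "1.000 1.000 0.150 0.800 0.050 2.000",
]
classes = most_likely_class(sample_rows)
print(classes)

# To use a real file, read it line by line and write a CSV that Stata can import:
# with open("name.dat") as f:
#     classes = most_likely_class(f)
# with open("classes.csv", "w", newline="") as out:
#     writer = csv.writer(out)
#     writer.writerow(["class"])
#     writer.writerows([c] for c in classes)
```

The rows come out in the same order as the saved file, so the result lines up with the original data only if the rows can be matched, for example through an ID variable included in the saved file.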

Witold Orlik posted on Friday, September 29, 2017 - 1:05 am

Thank you Bengt,
I used that command, saved the file as .sav, then opened it using Notepad, transferred it to Excel, and then to Stata. Anyway, I am glad it worked.

Peng qian posted on Thursday, November 30, 2017 - 7:02 am

Dear Dr. Muthen:

How can I simulate two equal groups in LCA?

The syntax of mcex7.21.inp provided by Mplus is below.

montecarlo:
names are y1-y4 g;
generate = g(1);
categorical = g;
genclasses = cg(2) c(2);
classes = cg(2) c(2);
nobs = 1000;
seed = 3454367;
nrep = 1;
save = ex7.21.dat;

We just obtain two random groups, not 500|500.

Thanks!
sincerely

Bengt O. Muthen posted on Thursday, November 30, 2017 - 4:08 pm

Send your full output to Support along with your license number. Also explain what you mean by 500 | 500.

ZHANG Liang posted on Friday, May 18, 2018 - 2:02 am

Greetings, professor.

We conducted a K-means cluster analysis to distinguish 4 types of parental styles based on 2 parental practices. A reviewer said that, given the fairly large sample size (600+), we are supposed to use LPA. But when we tried to do so, the number of latent classes came out as 2, which can hardly be interpreted theoretically.

We noticed that LPA analyses usually have more than five indicators. Does the fact that we have ONLY TWO indicators (i.e., parental practices) make our findings statistically unreliable?

Thank you!

Bengt O. Muthen posted on Friday, May 18, 2018 - 1:37 pm

I would suggest taking the approach of UG ex 7.22. This is a more general model than LPA.

Anna Kallschmidt posted on Wednesday, July 11, 2018 - 2:58 pm

Is there a limit on how many items I can run in Mplus for an LPA? I'm trying to perform an LPA on 63 traits, so that I can run regressions between these clustered traits and 8 predictors. However, every time I run this syntax, Mplus shuts down and quits working.
VARIABLE:
NAMES ARE
Q2_1 Q2_2 Q2_3 Q2_4 Q2_5 Q2_6 Q2_7 Q2_8 Q2_9 Q2_10
...63 total traits
Q7_1R Q7_25R Q7_28R Q7_40R Q7_4R Q7_15R
Q7_27R Q7_38R Q7_3R Q7_9R Q7_17R Q7_29R Q7_26R Q7_36R
Q7_35R Q7_37R;

USEVARIABLES ARE Q2_1 Q2_2 Q2_3 Q2_4 Q2_5 Q2_6 Q2_7 Q2_8 Q2_9 Q2_10
Q2_1 Q2_2 Q2_3 Q2_4 Q2_5 Q2_6 Q2_7 Q2_8 Q2_9 Q2_10
...63 total traits ;

MISSING = ALL (-99);
CLASSES = C(3);

Analysis:
TYPE = MIXTURE;
iterations = 3000;
starts = 100 10;
Plot:
type is plot3;

OUTPUT: TECH11 TECH14;!SAMPSTAT RESIDUAL;

Bengt O. Muthen posted on Wednesday, July 11, 2018 - 5:42 pm

There is no such limit. Send your output and data to Support along with your license number so we can see what's going on.

Wen, Fur-Hsing posted on Friday, March 15, 2019 - 5:08 am

My question is what is the minimum requirement for the relative class size in the latent class analysis/latent transition analysis.

Bengt O. Muthen posted on Friday, March 15, 2019 - 5:49 pm

There is no such known quantity as far as I know.

Wen, Fur-Hsing posted on Saturday, March 16, 2019 - 8:01 pm

Dear Dr. Muthen,

What is your recommendation?

Thank you,

Wen

Bengt O. Muthen posted on Sunday, March 17, 2019 - 4:25 pm

There are so many factors involved that nothing general can be stated.