
Anonymous posted on Friday, April 30, 2004  7:41 am



I have been using the Mplus Demo and hope to use the full Mplus software in the near future. Could one of the expert programmers help me list the class membership for each subject? That is, I would like a variable that stands for class membership, in the same order used for the outcome variables (ordered by id, for example). Many thanks.


I'm not sure I totally understand your question. You can save class membership by using SAVE=CPROBABILITIES in the SAVEDATA command, and you can save the id variable by using the IDVARIABLE option of the VARIABLE command. Then you can sort the data in Excel.
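As an alternative to sorting in Excel, here is a minimal Python sketch of the same step. It assumes a whitespace-delimited saved file and a hypothetical column layout (two indicators, the id, two class probabilities, then the most likely class); the actual order of saved variables should be checked against what Mplus reports for the saved file.

```python
def sort_saved_rows(text, id_col):
    """Sort whitespace-delimited rows of a saved data file by the ID column."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    return sorted(rows, key=lambda r: float(r[id_col]))

# Hypothetical layout (an assumption): y1 y2 id cprob1 cprob2 class
sample = """\
1 0 103 0.91 0.09 1
0 0 101 0.12 0.88 2
1 1 102 0.70 0.30 1
"""

sorted_rows = sort_saved_rows(sample, id_col=2)

# Map each id to its most likely class (assumed to be the last column).
class_by_id = {int(float(r[2])): int(float(r[-1])) for r in sorted_rows}
```

The resulting `class_by_id` mapping can then be merged back onto the original data by id.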

Anonymous posted on Wednesday, June 16, 2004  9:19 am



I have an average of 30 nurses nested in each of 200 hospitals, with about 10 nurse-level and 5 hospital-level measures. I want to use LCA to find typologies of hospitals. In addition, I would like to have nurse-level variables as predictors of class membership. Can I use LCA with two-level models? Are there any examples available? Can I do this analysis in one step? Or should I do the hospital LCA separately and then combine it with a two-level model?

bmuthen posted on Wednesday, June 16, 2004  9:54 am



Two-level LCA is described in the version 3 User's Guide in chapter 10. One question is if the LCA class membership is (1) a hospital-level variable or (2) a nurse-level variable. You mention finding typologies of hospitals, which suggests (1), but then mention prediction of class membership by nurse-level variables, which suggests (2). The Mplus multilevel LCA is primarily aimed at (2), i.e. finding nurse-level latent classes while accounting for the multilevel data by allowing LCA coefficients that vary across hospitals, e.g. latent class intercepts in the regression of class membership on nurse variables. If you are really interested in (1), perhaps you want to aggregate nurse variables to the hospital level and do a regular LCA on the 200 hospitals. In any case, that would seem a natural starting point.
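The aggregation step suggested for option (1) can be sketched in Python as below. This is only an illustration under assumptions: the hospital ids and values are made up, and the mean is used as the aggregate, which is one possible choice among several.

```python
from collections import defaultdict
from statistics import mean

def aggregate_to_hospital(rows):
    """rows: (hospital_id, [nurse-level values]) pairs.
    Returns hospital_id -> list of per-variable means across that hospital's nurses."""
    grouped = defaultdict(list)
    for hospital_id, values in rows:
        grouped[hospital_id].append(values)
    return {h: [mean(col) for col in zip(*vals)] for h, vals in grouped.items()}

# Hypothetical data: two nurse-level variables per nurse.
nurses = [("H1", [1.0, 4.0]), ("H1", [3.0, 2.0]), ("H2", [5.0, 5.0])]
hospital_means = aggregate_to_hospital(nurses)
```

The hospital-level file (one row per hospital) would then be the input to a regular, single-level LCA.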

jbond posted on Tuesday, October 19, 2004  10:03 am



Bengt, after reading Nagin and Tremblay's Psych Methods paper "Analyzing developmental trajectories of distinct but related behaviors: a group-based approach," I was wondering if you could outline, or if there is a paper discussing, the differences between the likelihood-based approach to LCGA that he uses and the method that is used in Mplus. I know Mplus is quite a bit more flexible in its ability to incorporate more complex dependencies between variables and/or factors, and it also allows for things like variation around the growth trajectory, but I believe that the technical specification of the model is slightly different. Unfortunately, he doesn't give many details of what is being done. It is interesting that his prediction, on page 31, that "we suspect that adapting this modeling framework to a structural equation modeling framework would make it difficult to retain two key strengths of the framework - the flexibility to handle a variety of different data types and to accommodate missing data" turned out to be somewhat off the mark. Thanks for any input, Jason

bmuthen posted on Tuesday, October 19, 2004  10:07 am



Will take a look at that paper and get back to you. 

bmuthen posted on Wednesday, October 20, 2004  5:48 pm



Looks to me like the Nagin-Tremblay 2001 Psych Methods paper corresponds to a parallel-process LCGA, where you have one latent class variable for each of the two LCGA growth processes and you are interested in relating the two latent class variables to each other (so a "multiple c" application; see the end of Chapter 13 of the Version 3 User's Guide). So this can be done in Mplus, and Mplus also allows for various generalizations that give important flexibility.

Anonymous posted on Thursday, December 30, 2004  7:16 am



I am running an LCA with 21 binary class predictors and 4 covariates (direct & indirect) in Mplus 3. I need to save the most likely class membership to use in another model. I did so using the (partial) input below:

ANALYSIS:
  TYPE = MIXTURE MISSING;
  STARTS = 100 10;
  STITERATIONS = 20;
  MITERATIONS = 1000;
SAVEDATA:
  FILE IS c:\rnp_saved_data_4C_dir&indi
  RECORDLENGTH = 1000;
  SAVE = CPROB;

This resulted in:

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

Class Counts and Proportions

Latent Classes
  1   12568   0.22747
  2    6053   0.10955
  3    9818   0.17770
  4   26812   0.48528

However, when I look at the saved data file, of these 55,251 people, only 9,079 actually have a class assignment, distributed as follows:

Saved Latent Classes
  1   2282
  2    919
  3   1657
  4   4221

1. Why weren't all 55,251 people in the original data set assigned to classes in the new data file? How can I get the saved file to give everyone a class assignment?

2. I also noticed that the data in some of the original variables changed. For example, some binary variables were no longer binary, and unique IDs changed and were no longer unique, so I cannot merge with other data files to continue my analyses. How can I ensure that the data for all USEVARIABLES are copied as is when using the SAVEDATA command?

Thanks


Can you send your full output to support@statmodel.com? It sounds like your data are not being read correctly. If you could also send a portion of the data (the first ten cases), that would help. Note that Mplus has a maximum record length of 1000.

Anonymous posted on Friday, March 04, 2005  3:32 am



Hello, I have a rather general question concerning latent class growth analysis. Is this procedure suitable for a multivariate approach? By this I mean determining class membership on two or more variables (that are theoretically closely interlinked) simultaneously? Many thanks in advance! 

bmuthen posted on Friday, March 04, 2005  6:19 am



Yes, certainly. You can either do this using two separate latent class variables that are related, or use a single latent class variable that is determined by your multiple processes jointly. It sounds like the second approach fits your theory best.


Many thanks for your quick reply, Dr Muthen!

Anonymous posted on Tuesday, March 15, 2005  12:37 pm



Do you have any suggestions for working around the recordlength=1000 limitation? I have slightly more than that (around 1200-1300) in my data and am wondering how I can run a latent profile analysis and get back latent class probabilities and assignments for all of my data. This seems like a surprising limitation in this day and age of big data sets, or am I missing or misunderstanding something?


Each observation can have more than one record. So I suggest saving the data such that you have two records per observation. It's a long story that probably wouldn't mean much to you why we have this limit. So far no one has found it insurmountable.
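For anyone preparing such a file outside Mplus, here is a rough Python sketch of writing one observation across several records so that no line exceeds 1000 characters. The helper name and the example values are hypothetical, and how Mplus should be told to read the multi-record layout is not shown here; that part should be checked in the User's Guide.

```python
def split_records(values, max_len=1000):
    """Pack one observation's values into lines of at most max_len characters."""
    lines, current = [], ""
    for v in values:
        candidate = v if not current else current + " " + v
        if len(candidate) > max_len and current:
            # Adding this value would exceed the limit: start a new record.
            lines.append(current)
            current = v
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

# Hypothetical observation: 150 values of 8 characters each (about 1349 characters on one line).
obs = ["0.123456"] * 150
records = split_records(obs)
```

With equal-width values every observation ends up with the same number of records, which keeps the file layout regular.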

Katharina posted on Wednesday, May 25, 2005  9:43 am



Hi, I have only recently started using Mplus, so my question is very basic. I have carried out LCA with covariates and would now like to use these classes (i.e. the categorical variable underlying these classes) as an independent variable in a general SEM framework to predict certain outcomes, i.e. a number of continuous latent variables. Is this possible and if so, how? Also, are there any references that cover this area in greater depth? Many thanks

bmuthen posted on Wednesday, May 25, 2005  10:03 am



This is possible to do in a single step in Mplus. For examples and further references, see the Muthen (2004) chapter in the Kaplan handbook, which is available on our web site.


Hi, I have 2 questions: Is it possible to handle individually varying times of observation in a mixture model? I tried the AT statement but it doesn't work. And what does c#1 mean in your example? I don't understand it. Thank you


AT should work here. Please send your input, output, data, and license number to support@statmodel.com. I am not sure which example you refer to, but the User's Guide has many mixture examples that may be helpful. As for learning mixtures, you might also consider the upcoming course in Montreal.


YES!!! For sure I'm gonna be at your course in Montreal. For my first question, can I write: type = mixture random missing; Thank you


Yes. You can see the combinations of options that can be used together in Chapter 15 where the ESTIMATOR option is discussed. 


Thank you for your time!! Since I changed to the next input, I cannot see the plot anymore, and there are latent classes that have the same intercept and slope in the output:

variable:
  names are id age1 age2 age3 onset y1 y2 y3;
  usevariables are age1-age3 y1-y3;
  tscores = age1-age3;
  classes = c(6);
  missing = . ;
analysis:
  type = mixture random missing;
  starts = 20 2;
model:
  %overall%
  i s | y1-y3 at age1-age3;
plot:
  type = plot3;

Before:

variable:
  names are preco y1-y3;
  usevariables = y1-y3;
  classes = c(4);
  missing = . ;
analysis:
  type = mixture missing;
  starts = 20 2;
model:
  %overall%
  i s | y1@0 y2@1 y3@2;
plot:
  type = plot3;
  series = y1-y3(*);


I don't see you in the list of course registrants. 


Yes, it's Annie Lemieux from University Sherbrooke! Desrosiers is my husband's name. In Quebec we can't take the husband's name, but when it's not formal I use Desrosiers. I'm sorry for the confusion, I didn't realize!


In my data file, I have seven categorical variables. However, I want to conduct a latent variable mixture model using data from individuals who have endorsed only one or two of any of the seven categorical variables. Is this possible? Thank you for your time. 


Are you asking if you can select individuals who have endorsed only one or two items?


Hi Linda, I know how to select individuals who have endorsed only one or two variables. If I specifically know prior to conducting the analysis that an individual has information on one or two variables only, can a latent variable mixture model be conducted? I am concerned about model identification. I hope this question is clear.


If you mean that you have observations who have missing on all but one or two items, this should be fine. Identification would be compromised only when no one provides information for a pair of variables. Note that standard errors may be larger where less information is available. 


Individuals who have information on one or two variables (i.e., have answered the question 'yes') have answered 'no' to the other variables. In other words, I have some individuals who have answered 'yes' to any one variable and 'no' to the other six variables. I also have individuals who have answered 'yes' to any two variables and 'no' to the other five variables. Does this mean that identification could be compromised? I hope this question is clear. Thanks for your help.


If for each pair of variables you have some people who have observations on both variables, you are most likely ok from an identification point of view. Try it; Mplus will tell you if your model is not identified.
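The pairwise condition described here can be checked on the raw data before running the model. The Python sketch below is only an illustration, with missing values coded as `None` and a made-up data set; it lists every pair of variables that no individual has observed jointly.

```python
from itertools import combinations

def uncovered_pairs(data):
    """data: list of rows, with None marking a missing value.
    Returns the variable-index pairs that no row has observed jointly."""
    n_vars = len(data[0])
    return [
        (i, j)
        for i, j in combinations(range(n_vars), 2)
        if not any(row[i] is not None and row[j] is not None for row in data)
    ]

# Hypothetical data: three variables, three individuals.
rows = [
    [1, 0, None],
    [None, 1, 1],
    [0, None, None],
]
missing_pairs = uncovered_pairs(rows)
```

An empty result means every pair of variables has at least some joint information. Note that a 'no' answer counts as observed, so data where everyone answered every item would return no uncovered pairs.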


Hi Dr. Muthen, Is there a way to see if the latent class variable has better predictive utility for a distal outcome than any indicator by itself? Thanks


Say that all your variables are dichotomous: the latent class variable, the indicators and the distal. The latent class model, including the distal, gives you an estimate of the distal probability for each of the two latent classes. Alternatively, you don't include the distal in the latent class analysis but put it as auxiliary = distal(e); which gives you the distal mean (proportion) for each class. As a comparison you can then instead do regular logistic regression of the distal on an indicator and get an estimate of the distal probability for each of the two categories of the indicator. 


Thanks Dr. Muthen, I ran the model with the distal as an auxiliary variable and it gave me 3 estimated probabilities for the distal outcome, one for each class (3-class model). I have P[attempt=1|class=1] = .2. From the regular logistic regression I got P[attempt=1|Depression=1] = .15 for one of the indicators. How do I compare these 2 probabilities and claim that the latent class variable has more predictive ability than the individual indicator (or vice versa)? Or does it make sense to compare these 2 probabilities? I hope this question makes sense. Thanks,


You need to also compute the other probabilities: P[attempt=1|class=2] = a as well as P[attempt=1|depression=0] = b. Then you compute a - 0.2 and b - 0.15 and see which of those 2 differences is the biggest. You may also want to include the distal in your LCA.


Thanks Dr. Muthen, I do not understand how to interpret the results of the method you suggested. For example, if I got a - 0.2 = .1 and b - .15 = .02, does this suggest that the latent classes are more predictive than the indicator (depression) of the distal outcome 'attempt'?


Well, the latent class approach says that the distal probability changes .1 when changing latent class, whereas the logistic approach based on one indicator says that the distal changes only .02 when the indicator changes categories. So that says that the latent classes are more influential on the distal than this indicator is. You can also quantify this as an odds ratio. How well the two approaches predict is another matter where you have to compare estimated probabilities with actual distal outcomes in line with what you do in regular logistic regression (see such books). 
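The comparison described above can be written out as a small Python sketch. The probabilities are the illustrative values from this exchange (a = 0.3 and b = 0.17 follow from the differences quoted); the function names are just labels chosen for this example.

```python
def risk_difference(p1, p0):
    """Difference between two outcome probabilities."""
    return p1 - p0

def odds_ratio(p1, p0):
    """Odds ratio comparing outcome probability p1 against p0."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Latent classes: P[attempt=1|class=2] = 0.30 vs P[attempt=1|class=1] = 0.20
class_diff = risk_difference(0.30, 0.20)
class_or = odds_ratio(0.30, 0.20)

# Indicator: P[attempt=1|depression=0] = 0.17 vs P[attempt=1|depression=1] = 0.15
indicator_diff = risk_difference(0.17, 0.15)
indicator_or = odds_ratio(0.17, 0.15)
```

Both the difference and the odds ratio are larger for the latent classes in this illustration. As noted above, that speaks to influence on the distal; how well each approach predicts is a separate question.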


Thank you Dr. Muthen. This helps a lot. 


Hi Dr. Muthen, When I computed the differences in probabilities as you suggested, I found one indicator giving a higher change in probability than the classes. The rest of the indicators gave a smaller change in probability than the classes. Does this mean I'm better off using that one indicator to predict my distal outcome than using latent classes?


Maybe. You also have to look at how well you predict; see logistic regression textbooks.


Hi Dr. Muthen, Is there a way to see if the latent class structure varies by other variables, say gender, using a single model rather than running separate LCAs for males and females? Thanks.


You can run a model using the KNOWNCLASS option, using males and females as the known classes. But I would analyze them separately first, to see if the same LCA model, at least as far as the number of classes, holds for them both before I put them together.


Thank you Dr. Muthen. 


Hi Dr. Muthen, Can I estimate the transition probability matrix for males and females in a 3-class LTA model with 3 time points? I tried the following code: C2#1 on C1#1 GENDER; C2#1 on C1#1 GENDER; . . . etc. But I only get two matrices for the 2 time points. How can we get separate transition probability matrices for males and females? Thanks


You have to create these yourself. They are not given as part of the Mplus output. 


How can I do it? Do I have to run two LTAs, one for males and one for females? Is there any example in the Mplus manual? Thanks


You analyze males and females together using gender as a "Knownclass" latent class variable. Then you look at the output showing the estimated probabilities for all latent classes. From this table you can compute the transition matrices for males and females.
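The hand computation described here amounts to conditioning the joint class probabilities on gender and on the earlier class. A minimal Python sketch follows, with made-up joint probabilities and only two classes per time point to keep it short; with three time points, the joint table would first be summed over the third class variable to get the t1-to-t2 transitions.

```python
def transition_matrix(joint, g):
    """joint[(g, c1, c2)] = estimated proportion in that cell.
    Returns P(c2 | c1, g) as a nested dict {c1: {c2: probability}}."""
    c1_levels = sorted({k[1] for k in joint if k[0] == g})
    c2_levels = sorted({k[2] for k in joint if k[0] == g})
    matrix = {}
    for c1 in c1_levels:
        row_total = sum(joint.get((g, c1, c2), 0.0) for c2 in c2_levels)
        matrix[c1] = {c2: joint.get((g, c1, c2), 0.0) / row_total for c2 in c2_levels}
    return matrix

# Hypothetical joint probabilities for (gender, class at t1, class at t2).
joint = {
    (1, 1, 1): 0.20, (1, 1, 2): 0.05, (1, 2, 1): 0.10, (1, 2, 2): 0.15,
    (2, 1, 1): 0.10, (2, 1, 2): 0.15, (2, 2, 1): 0.05, (2, 2, 2): 0.20,
}
males = transition_matrix(joint, g=1)
females = transition_matrix(joint, g=2)
```

Each row of a returned matrix sums to one, so the two results can be read directly as gender-specific transition matrices.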


Thanks 


Hi Dr. Muthen, I tried the following code to estimate the transition probabilities. This is part of the code:

CLASSES: CG(2) C1(3) C2(3) C3(3);
KNOWNCLASS: CG (GENDER=1 GENDER=2);
MODEL:
  %OVERALL%
  C2#1 ON C1#1 CG#1;
  C2#2 ON C1#1 CG#1;
  C2#1 ON C1#2 CG#1;
  C2#2 ON C1#2 CG#1;
  . . . etc.

Here I'm trying to estimate the t1 to t2 transition probabilities for each gender. But I'm not getting a meaningful output. What is wrong in this code? Please help. Thanks


Please send your input, data, output, and license number to support@statmodel.com. 


Dear Dr. Muthen, I have a distal outcome and I used the auxiliary = distal(e); option to get class-varying means (proportions) for my binary distal outcome in a 3-class LCA model. I have p1 for class 1, p2 for class 2, and so on. If I need to compare p1 and p2 (i.e. pairwise contrasts), how can I get the standard errors of these estimated means? Thanks.


In Version 5.1, you will get the standard errors. 


I have ver 5.0 . How do I upgrade to ver 5.1 ? Thanks. 


If you have a current upgrade and support contract, you can go to Support - Mplus Updates to download Version 5.1.


Thanks Dr. Muthen, I have one more question. How can I get a G square goodness of fit statistic for a LCA model with 3 classes in Mplus ? Thanks 


If you mean a frequency table chi-square test, yes, we give both the Pearson and the likelihood ratio chi-square.
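For reference, the two statistics mentioned here can be computed from observed and model-expected response-pattern frequencies as in the Python sketch below. The frequencies are made up for illustration, and degrees of freedom and p-values are left out.

```python
import math

def pearson_chi2(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def lr_chi2(observed, expected):
    """Likelihood ratio chi-square (G-square): 2 * sum of O * ln(O / E).
    Cells with O == 0 contribute zero."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# Hypothetical observed and expected frequencies for four response patterns.
obs_freq = [30, 10, 20, 40]
exp_freq = [25.0, 15.0, 25.0, 35.0]

x2 = pearson_chi2(obs_freq, exp_freq)
g2 = lr_chi2(obs_freq, exp_freq)
```

The two values are usually close when expected frequencies are not too small; large disagreement between them is often taken as a sign of sparse cells.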

finnigan posted on Tuesday, February 10, 2009  2:53 pm



Linda/Bengt, I am undertaking longitudinal research and want to examine typologies of significant events at work, significant events outside of work, and significant events in personal life. The survey question asks if a respondent has experienced a significant life event; the answer is yes or no. If yes, the respondent is asked to describe it as positive, negative, or neither. The person is then asked to indicate if the event made them focus more on the past, present, or future. The data structure will be multilevel, i.e. persons within organisations. I am trying to find out if I can create typologies of, say, a positive future-based work life event, a past negative-based personal life event, etc. Would latent class analysis be appropriate to do this? Thanks


It is not clear to me how your variables can be used as latent class indicators. Do you plan on combining them in some way? 


Hello, I have a question concerning the sample size. I have conducted a latent class analysis with 34 subjects (thus, I have a very small sample). How does the sample size influence the estimates/fit indices? Thank you 


Hard to say in general and much of this depends on how large the class separation is. The standard errors might not be that good. You need a Monte Carlo study to get a better feel for this. The BIC tends to underestimate the number of classes at low sample sizes as shown in the Nylund et al article. 


Hello, I have recently submitted a paper where I've used latent class analysis (with binary indicators) to investigate subtypes of cognitive impairment. The results suggested a four-class solution; two of these classes represented global cognitive impairment across domains (a mild and a severe class) while the other two represented impairment in specific domains. A reviewer commented that the global findings differed from a number of previous factor analyses of cognitive data and asked for more explanation. I used the same dataset and ran an exploratory factor analysis on the same binary indicator variables, and this suggested a three-factor solution. The first two factors looked like my domain-specific deficit classes, but there was no global type of factor. So the observation of the global classes doesn't seem to be related to anomalies in my dataset but rather to differences between the two methods. My question then is whether there is any theoretical reason that factor analysis might be less likely to yield global factors (i.e. high or low scores across all observed indicators) than latent class analysis? I haven't been able to find anything in the literature yet. Thanks!


Exploratory factor analysis is not well suited for finding a global (general) factor. You would have to do a CFA to capture that: a "general-factor, specific-factor" model (see handout for Mplus short course topic 1). But you should also investigate if EFA/CFA or LCA fit the data better by comparing their BIC values. One model may very well be a lot better than the others in terms of BIC and should therefore be chosen on statistical grounds. For an example, also using factor mixture analysis, see http://www.statmodel.com/download/Muthen_tobacco_2006.pdf
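As a reminder of what is being compared, the Python sketch below computes BIC from a loglikelihood, the number of free parameters, and the sample size, using the standard formula BIC = -2 logL + p ln(n). The loglikelihoods and parameter counts are hypothetical, not taken from any model in this thread.

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: -2 * logL + p * ln(n). Lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical values for two competing models fitted to the same 200 cases.
bic_lca = bic(loglik=-600.0, n_params=20, n_obs=200)
bic_fa = bic(loglik=-640.0, n_params=15, n_obs=200)

preferred = "LCA" if bic_lca < bic_fa else "factor model"
```

The comparison is only meaningful when both models are fitted to the same data with the same outcome variable types, so that the likelihoods are on the same scale.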

Julie Mount posted on Thursday, June 18, 2009  3:23 pm



Thank you, this is really useful. For the LCA (four class) the BIC is 1316 and for the EFA (3 factors) it's 8232. A big difference! 

mpduser1 posted on Tuesday, June 21, 2011  10:40 am



I'm generating a series of models in Mplus in an attempt to determine if latent class membership differentiates classes on some dichotomous distal outcome, Y, after adjusting for a series of covariates, X. I want to make sure I've set the problem up correctly (and am interpreting the results correctly) within the Mplus framework. My latent class model (L) has three classes, predicted based on a series of indicators (I). What I've done is first estimated:

L -> I
L -> Y

In doing so I found that my classes differ on Y. In other words, the probability of endorsing the dichotomous item Y varies by latent class. Then, I want to see if the cross-class variation in the probability of endorsing Y changes after using covariates X. So, I estimate:

L -> I
L -> Y
X -> Y

In many instances, the results appear to make sense. But for one particular Y, I obtain latent class probabilities of endorsing Y as 0 (or close to it) for all of my latent classes. I'm wondering if this makes sense or if I've set the model up wrong. Essentially, are none of my respondents, regardless of latent class, endorsing Y after I adjust for X?


Please send the output for the last analysis and your license number to support@statmodel.com. 

Leslie Roos posted on Tuesday, March 13, 2012  1:40 pm



Hello! I am a student new to Mplus and LCA and am trying to determine which module would be best for my analyses: the Mplus Base Program or the Mplus Base Program with the Combination Add-On. The project I am working on is with a large population database and will investigate whether 'classes' derived from a range of childhood variables (some multilevel, varying in frequency and severity) will predict a binary adult outcome. Any advice would be much appreciated. I'm not sure I fully understand what '2-level LCA' would look like.


If you want to do latent class analysis, you would need either the Base plus the Mixture Add-On or the Base plus the Combination Add-On. If you don't have multilevel clustered data, you do not need to worry about multilevel analysis.


I am struggling with model interpretation. I have approximately 104,000 respondents (37 items) and 99,000 different response patterns, so my response space is almost a uniform distribution. It is my understanding that Mplus classifies on the basis of response pattern, so this distribution presumably affects interpretation. Can you please tell me how LCA decides class membership? And does the response pattern composition affect class membership?


The response patterns are a summary of the categorical data. They do not affect class membership. Classes are formed by finding the model with the lowest correlations among the latent class indicators for each class. The model assumes conditional independence. The interpretation of the classes is based on the profiles of the latent class indicators.
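To make the role of conditional independence concrete, here is a minimal Python sketch of how posterior class probabilities for one binary response pattern follow from estimated class proportions and item probabilities via Bayes' rule. All parameter values are made up, and the function name is just a label for this illustration.

```python
def posterior_class_probs(pattern, class_probs, item_probs):
    """pattern: list of 0/1 responses.
    class_probs[k]: proportion in class k.
    item_probs[k][j]: P(item j = 1 | class k).
    Returns P(class k | pattern) for each class."""
    joint = []
    for pk, probs in zip(class_probs, item_probs):
        likelihood = pk
        # Conditional independence: multiply item probabilities within a class.
        for y, p in zip(pattern, probs):
            likelihood *= p if y == 1 else (1 - p)
        joint.append(likelihood)
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical 2-class model with two binary items.
post = posterior_class_probs(
    pattern=[1, 1],
    class_probs=[0.5, 0.5],
    item_probs=[[0.9, 0.8], [0.2, 0.1]],
)
most_likely = post.index(max(post))  # zero-based index of the most likely class
```

Two people with the same response pattern get the same posterior probabilities, which is why the number of distinct patterns in the data does not by itself determine class membership.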

Tim Stump posted on Saturday, September 22, 2012  8:32 pm



I'm running LCA with the bayes estimator. I don't seem to get the Bayesian Information Criterion (BIC) to compare models with different numbers of classes, as I do with the ML or MLR estimator. I've just started to learn this type of model. Maybe I'm missing something easy about how to obtain BIC. Key parts of the input program are below:

ANALYSIS:
  TYPE = mixture;
  estimator=bayes;
  fbiterations=10000;
  processors=4;
MODEL:
  %overall%
  %c#1%
  svdrng1-svdrng11 with svdrng1-svdrng11 (p1-p55);
  %c#2%
  svdrng1-svdrng11 with svdrng1-svdrng11 (q1-q55);
MODEL PRIORS:
  p1-p55~IW(0,100000);
  q1-q55~IW(0,100000);
OUTPUT:
  TECH1 tech10 tech11 tech14;


This is not yet available. 

Tim Stump posted on Monday, September 24, 2012  7:55 am



If BIC is not available, are there other suggestions/guidelines on determining number of classes when using bayes estimator for LCA? 

Tim Stump posted on Monday, September 24, 2012  8:54 am



For LCA with the bayes estimator, is the deviance information criterion (DIC) available? Is there an option I can specify to get this displayed in the output?

Tim Stump posted on Monday, September 24, 2012  9:25 am



Sorry, I'm showing my ignorance in this topic. But, looking back at slides from a previous workshop, I see that the posterior predictive p-value is useful for determining model fit with the bayes estimator, along with the posterior predictive checking histogram (the difference in chi-square between real and replicated data). Would this be a useful method to determine the number of classes from an LCA model?


Take a look at section 6.5.3 of https://www.statmodel.com/download/BayesAdvantages18.pdf about using PPP as a class enumeration technique. BIC and DIC are not available yet.

Tim Stump posted on Friday, September 28, 2012  7:20 am



Thank you for referencing that article, very helpful. 

Ed Dunne posted on Monday, November 12, 2012  5:41 pm



I have a question about within-LCA class comparisons. To your knowledge, has anyone ever compared participants within a specific class? For example, if I have four classes, can I take only ONE of those classes and make comparisons based on gender (i.e., males in this group were more likely to do X when compared to females in this group)? Thank you!


You can use gender as a KNOWNCLASS variable. This is like multiple group analysis. See Example 7.21. 


Dear Drs. Muthen, I have a question similar to the third of this thread, but since it's ten years later now, the views and possibilities with regard to this question might have changed. My dataset contains responses of approx. 1750 teachers who work in approx. 100 schools. The questions I asked the teachers are about the school culture and the school leader. A latent class analysis seems the best approach, but how do I take the nested structure into account? I would like to assign schools to latent classes of school culture and assign schools to latent classes of educational leadership. In the next step, I would like to examine relationships between the two. Would it be possible to conduct these analyses in Mplus, and if so, how? Thank you in advance, Marieke

IYH Boon posted on Tuesday, November 11, 2014  1:14 pm



I was wondering if you know of a paper that lays out the basic estimating equations for (longitudinal) latent class analysis? Something akin to what you lay out in Eq. (18)-(20) in Muthen 2001 ("Latent Variable Mixture Modeling"), but for LCA with repeated measures. Thanks in advance for any help you're able to offer. IYH


Our UG refers to the 1998 Reboussin et al article in MBR  that gives a good statistical account of it. 


I have a relatively small dataset of 28 teams (108 individuals). I want to make classes with LCA, based on a team’s growth pattern (of a team-level variable). When I perform the LCA, Mplus delivers classes. The entropy, class sizes and latent class probabilities are all good. However, looking at the standardized model results, the estimates, S.E. etc. are all 999.000. Do you have any suggestions or possible explanations?


Please send the output and your license number to support@statmodel.com. 


Hi Dr. Muthen, I am a new Mplus user. I am running this model with all binary indicators:

Variable:
  Names are codice_scuola_8_14_15 GESTIONE d10a_d [...] d11k_d;
  Missing are all (9999);
  IDVARIABLE = codice_scuola_8_14_15;
  Usevariables are d10a_d-d11k_d;
  Categorical are d10a_d-d11k_d;
  classes = c (3);
Analysis:
  Type = mixture;
  MITERATION = 500;
  STARTS = 2500 25;
  STITERATIONS = 10;
Output:
  SAMPSTAT STANDARDIZED tech11;
SAVEDATA:
  file is lca_mgm.txt;
  save is cprob;
  format is free;

But I cannot get the SAMPSTAT and STANDARDIZED outputs. I get this message:

*** WARNING in OUTPUT command
STANDARDIZED (STD, STDY, STDYX) options for TYPE=MIXTURE with categorical, count, censored or nominal outcomes are available only with ALGORITHM=INTEGRATION. Request for STANDARDIZED (STD, STDY, STDYX) is ignored.

*** WARNING in OUTPUT command
SAMPSTAT option is not available when all outcomes are censored, ordered categorical, unordered categorical (nominal), count or continuous-time survival variables. Request for SAMPSTAT is ignored.

How can I modify the code to have these outputs? Thank you.


This is not offered in the Mixture case. Standardization is not needed here. For sampstat you can ask for Crosstabs or run a Basic run not declaring the variables as categorical. 
