Anonymous posted on Friday, April 30, 2004 - 7:41 am
I have been using the Mplus Demo and hope to use the full Mplus software in the near future. Could one of the expert programmers help me list the class membership for each subject? That is, I would like a variable that stands for class membership, in the same order used for the outcome variables, ordered by id for example. Many thanks.
I'm not sure I totally understand your question. You can save class membership by using SAVE=CPROBABILITIES in the SAVEDATA command, and you can save the id variable by using the IDVARIABLE option of the VARIABLE command. Then you can sort the data in Excel.
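For readers who prefer a script to Excel, the sorting step might look like the Python sketch below. It assumes the saved file is whitespace-delimited and that the ID sits in a known column; the column layout, the sample rows, and the helper name `sort_saved_rows` are illustrative assumptions, so check the variable order reported for your own saved file before relying on it.

```python
def sort_saved_rows(text, id_col):
    """Parse whitespace-delimited rows and sort them by the ID column.

    id_col is the zero-based position of the ID variable (an assumption;
    confirm it against the variable order listed for the saved file).
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    return sorted(rows, key=lambda row: float(row[id_col]))


# Hypothetical saved rows with columns: u1 u2 id cprob1 cprob2 class
saved = """
1 0 103 0.900 0.100 1
0 0 101 0.200 0.800 2
1 1 102 0.650 0.350 1
"""

sorted_rows = sort_saved_rows(saved, id_col=2)
# Keep only the ID and the most likely class (last column) for later merging.
membership = [(row[2], row[-1]) for row in sorted_rows]
```

The `membership` list then holds one (id, class) pair per subject in ID order.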
Anonymous posted on Wednesday, June 16, 2004 - 9:19 am
I have an average of 30 nurses nested in each of 200 hospitals, with about 10 nurse-level and 5 hospital-level measures. I want to use LCA to find typologies of hospitals. In addition, I would like to have nurse-level variables as predictors of class membership. Can I use LCA with TWO LEVEL models? Are there any examples available? Can I do this analysis in one step? Or should I do the hospital LCA separately and then combine it with a TWO LEVEL model?
bmuthen posted on Wednesday, June 16, 2004 - 9:54 am
Twolevel LCA is described in the version 3 User's Guide in chapter 10. One question is if the LCA class membership is (1) a hospital-level variable or (2) a nurse-level variable. You mention finding typologies of hospitals which suggests (1), but then mention prediction of class membership by nurse-level variables, which suggests (2). The Mplus multilevel LCA is primarily aimed at (2), i.e. finding nurse-level latent classes while accounting for the multilevel data by allowing LCA coefficients that vary across hospitals, e.g. latent class intercepts in the regression of class membership on nurse variables. If you are really interested in (1), perhaps you want to aggregate nurse variables to the hospital level and do a regular LCA on the 200 hospitals. In any case, that would seem a natural starting point.
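The aggregation suggested as a starting point can be sketched in Python as below. This is only an illustration under stated assumptions: the variable names (`burnout`, `workload`), the use of simple means, and the helper `aggregate_to_hospital` are hypothetical and not part of the reply above.

```python
from collections import defaultdict


def aggregate_to_hospital(records, hospital_key, variables):
    """Return {hospital: {variable: mean over that hospital's nurses}}."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for rec in records:
        h = rec[hospital_key]
        for v in variables:
            if rec.get(v) is not None:  # skip missing values
                sums[h][v] += rec[v]
                counts[h][v] += 1
    return {
        h: {v: sums[h][v] / counts[h][v] for v in variables if counts[h][v]}
        for h in sums
    }


# Hypothetical nurse-level records (one dict per nurse).
nurses = [
    {"hospital": "A", "burnout": 2.0, "workload": 4.0},
    {"hospital": "A", "burnout": 4.0, "workload": None},
    {"hospital": "B", "burnout": 1.0, "workload": 3.0},
]
hospital_means = aggregate_to_hospital(nurses, "hospital", ["burnout", "workload"])
```

The resulting file has one row per hospital, which could then serve as input to a regular LCA on the 200 hospitals.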
jbond posted on Tuesday, October 19, 2004 - 10:03 am
After reading Nagin and Tremblay's Psych Methods paper "Analyzing developmental trajectories of distinct but related behaviors: a group-based approach," I was wondering if you could outline, or if there is a paper discussing, the differences between the likelihood-based approach to LCGA that he uses and the method that is used in Mplus. I know Mplus is quite a bit more flexible in its ability to incorporate more complex dependencies between variables and/or factors, and it also allows for things like variation around the growth trajectory, but I believe that the technical specification of the model is slightly different. Unfortunately, he doesn't give many details of what is being done. It is interesting that his prediction, on page 31, that "we suspect that adapting this modeling framework to a structural equation modeling framework would make it difficult to retain two key strengths of the framework - the flexibility to handle a variety of different data types and to accommodate missing data" turned out to be somewhat off the mark. Thanks for any input,
bmuthen posted on Tuesday, October 19, 2004 - 10:07 am
Will take a look at that paper and get back to you.
bmuthen posted on Wednesday, October 20, 2004 - 5:48 pm
Looks to me like the Nagin-Tremblay 2001 Psych Methods paper corresponds to a parallel-process LCGA, where you have one latent class variable for each of the two LCGA growth processes and you are interested in relating the two latent class variables to each other (so a "multiple c" application - see the end of Chapter 13 of the Version 3 User's Guide). So this can be done in Mplus, and Mplus also allows for various generalizations that give important flexibility.
Anonymous posted on Thursday, December 30, 2004 - 7:16 am
I am running a LCA with 21 binary class predictors and 4 covariates (direct & indirect) in Mplus 3. I need to save the most likely class membership to use in another model. I did so using the (partial) input below:
However when I look at the saved data file, of these 55,251 people, only 9,079 of them actually have a class assignment -distributed as follows:
Saved Latent Classes
1    2282
2     919
3    1657
4    4221
1. Why weren't all 55,251 people in the original data set assigned to classes in the new data file? How can I get the saved file to give everyone a class assignment?
2. I also noticed that the data in some of the original variables changed. For example, some binary variables were no longer binary, and unique IDs changed and were no longer unique, so I cannot merge with other data files to continue my analyses. How can I ensure that the data for all USEVARIABLES are copied as is when using the SAVEDATA command?
Can you send your full output to email@example.com? It sounds like your data are not being read correctly. If you could send a portion of the data -- first ten cases -- that would also help. Note that Mplus has a maximum record length of 1000.
Anonymous posted on Friday, March 04, 2005 - 3:32 am
I have a rather general question concerning latent class growth analysis. Is this procedure suitable for a multivariate approach? By this I mean determining class membership on two or more variables (that are theoretically closely interlinked) simultaneously?
Many thanks in advance!
bmuthen posted on Friday, March 04, 2005 - 6:19 am
Yes, certainly. You can either do this using two separate latent class variables that are related, or use a single latent class variable that is determined by your multiple processes jointly - sounds like the second approach fits your theory best.
Anonymous posted on Tuesday, March 15, 2005 - 12:37 pm
Do you have any suggestions for working around the recordlength=1000 limitation? I have slightly more than that (around 1200-1300) in my data and am wondering how I can run a latent profile analysis and get back latent class probabilities and assignments for all of my data. This seems like a surprising limitation in this day and age of big data sets--or am I missing or misunderstanding something?
Each observation can have more than one record. So I suggest saving the data such that you have two records per observation. It's a long story that probably wouldn't mean much to you why we have this limit. So far no one has found it insurmountable.
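One way to prepare such a file is sketched below in Python. It splits every observation at the same variable position so that all cases share one layout, and it checks each piece against the 1000-character limit mentioned in this thread. The helper name `to_two_records`, the split point, and the sample values are assumptions for illustration; how the two records are then declared in the Mplus input is not covered here.

```python
def to_two_records(values, n_first, max_len=1000, sep=" "):
    """Write one observation as two lines, splitting after n_first values.

    Raises ValueError if either line is still longer than max_len.
    """
    first = sep.join(str(v) for v in values[:n_first])
    second = sep.join(str(v) for v in values[n_first:])
    for rec in (first, second):
        if len(rec) > max_len:
            raise ValueError("record still exceeds the length limit")
    return first, second


# Hypothetical observation: 200 values, 1399 characters if kept on one line.
observation = ["1.2345"] * 200
first, second = to_two_records(observation, n_first=100)
```

Splitting at a fixed variable count, rather than wherever the line happens to fill up, keeps the same variables on the same record for every case.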
Katharina posted on Wednesday, May 25, 2005 - 9:43 am
Hi, I have only recently started using Mplus, so my question is very basic. I have carried out LCA with covariates and would now like to use these classes (i.e. the categorical variable underlying these classes) as an independent variable in a general SEM framework to predict certain outcomes, i.e. a number of continuous latent variables. Is this possible and if so, how? Also, are there any references that cover this area in greater depth? Many thanks
bmuthen posted on Wednesday, May 25, 2005 - 10:03 am
This is possible to do in a single step in Mplus. For examples, and further references, see the Muthen (2004) chapter in the Kaplan handbook - this is available on our web site.
Yes it's Annie Lemieux from University Sherbrooke! Desrosiers is my husband's name, because in Quebec we can't have the husband's name, but when it's not formal I use Desrosiers. I'm sorry for the confusion, I didn't realize!
In my data file, I have seven categorical variables. However, I want to conduct a latent variable mixture model using data from individuals who have endorsed only one or two of any of the seven categorical variables. Is this possible?
I know how to select individuals who have endorsed only one or two variables. If I specifically know prior to conducting the analysis that an individual has information on one or two variables only, can a latent variable mixture model be conducted? I am concerned about model identification.
If you mean that you have observations who have missing on all but one or two items, this should be fine. Identification would be compromised only when no one provides information for a pair of variables. Note that standard errors may be larger where less information is available.
Individuals who have information on one or two variables (i.e., have answered the question 'yes') have answered 'no' to the other variables. In other words, I have some individuals who have answered 'yes' to any one variable and 'no' to the other six variables. I also have individuals who have answered 'yes' to any two variables and 'no' to the other five variables.
Does this mean that identification could be compromised? I hope this question is clear.
If for each pair of variables you have some people who have observations on both variables, you are most likely ok from an identification point of view. Try it - Mplus will tell you if your model is not identified.
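The pairwise condition described here can be checked before running the model. The Python sketch below lists every pair of variables that no case observes jointly; it assumes missing values are coded as `None`, and the helper name `uncovered_pairs` and the tiny data set are hypothetical.

```python
from itertools import combinations


def uncovered_pairs(data, missing=None):
    """Return (i, j) variable-index pairs that no case observes jointly."""
    n_vars = len(data[0])
    bad = []
    for i, j in combinations(range(n_vars), 2):
        jointly_seen = any(
            row[i] is not missing and row[j] is not missing for row in data
        )
        if not jointly_seen:
            bad.append((i, j))
    return bad


# Hypothetical cases with three items; None marks a missing response.
cases = [
    [1, 0, None],
    [None, 1, 1],
    [0, None, None],
]
problem_pairs = uncovered_pairs(cases)
```

An empty result means every pair is covered. When everyone answers 'yes' or 'no' to all items, as in the follow-up question, no pair is uncovered.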
Say that all your variables are dichotomous: the latent class variable, the indicators and the distal. The latent class model, including the distal, gives you an estimate of the distal probability for each of the two latent classes. Alternatively, you don't include the distal in the latent class analysis but put it as
auxiliary = distal(e);
which gives you the distal mean (proportion) for each class.
As a comparison you can then instead do regular logistic regression of the distal on an indicator and get an estimate of the distal probability for each of the two categories of the indicator.
Well, the latent class approach says that the distal probability changes .1 when changing latent class, whereas the logistic approach based on one indicator says that the distal changes only .02 when the indicator changes categories. So that says that the latent classes are more influential on the distal than this indicator is. You can also quantify this as an odds ratio. How well the two approaches predict is another matter where you have to compare estimated probabilities with actual distal outcomes in line with what you do in regular logistic regression (see such books).
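The comparison in this reply, including the odds ratio, can be written out as a short Python sketch. The four probabilities are hypothetical values chosen only to reproduce the .1 and .02 differences mentioned above; they are not estimates from any model in this thread.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def compare(p_a, p_b):
    """Return (difference, odds ratio) for two outcome probabilities."""
    return p_a - p_b, odds(p_a) / odds(p_b)


# Hypothetical distal probabilities in latent class 1 vs. class 2.
class_diff, class_or = compare(0.30, 0.20)
# Hypothetical distal probabilities for the two categories of one indicator.
item_diff, item_or = compare(0.26, 0.24)
```

With these illustrative numbers the latent class contrast gives an odds ratio of about 1.71 and the indicator contrast about 1.11, which expresses the same ordering as the probability differences.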
When I computed the differences in probabilities as you suggested, I found one indicator giving a higher change in probability than the classes. The rest of the indicators gave a smaller change in probability than the classes. Does this mean I'm better off using that one indicator to predict my distal outcome than using latent classes?
You can run a model using the KNOWNCLASS option using males and females as the known classes. But I would analyze them separately to see if the same LCA model at least as far as number of classes holds for them both before I put them together.
You analyze males and females together using gender as a "Knownclass" latent class variable. Then you look at the output showing the estimated probabilities for all latent classes - from this table you can compute the transition matrices for males and females.
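The computation described here amounts to conditioning the joint class probabilities on gender and time-1 class. A Python sketch follows, assuming the joint probabilities have been copied by hand into a dictionary keyed by (gender class, time-1 class, time-2 class); the numbers and the function name `transition_matrices` are hypothetical.

```python
def transition_matrices(joint, n_groups, n_classes):
    """joint[(g, c1, c2)] -> {g: matrix with rows c1 and columns c2}.

    Each row is divided by its own total, giving P(c2 | c1, g).
    Assumes every (g, c1) row has a nonzero total.
    """
    result = {}
    for g in range(1, n_groups + 1):
        matrix = []
        for c1 in range(1, n_classes + 1):
            row = [joint[(g, c1, c2)] for c2 in range(1, n_classes + 1)]
            total = sum(row)
            matrix.append([p / total for p in row])
        result[g] = matrix
    return result


# Hypothetical estimated joint probabilities (they sum to 1).
joint = {
    (1, 1, 1): 0.20, (1, 1, 2): 0.05, (1, 2, 1): 0.10, (1, 2, 2): 0.15,
    (2, 1, 1): 0.10, (2, 1, 2): 0.15, (2, 2, 1): 0.05, (2, 2, 2): 0.20,
}
matrices = transition_matrices(joint, n_groups=2, n_classes=2)
```

Here `matrices[1]` and `matrices[2]` are the time-1 to time-2 transition matrices for the two known classes.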
C2#1 ON C1#1 CG#1; C2#2 ON C1#1 CG#1; C2#1 ON C1#2 CG#1; C2#2 ON C1#2 CG#1; . . . etc. Here I'm trying to estimate the t1 to t2 transition probabilities for each gender, but I'm not getting meaningful output. What is wrong in this code? Please help.
If you mean a frequency table chi-square test, yes, we give both the Pearson and the Likelihood ratio chi2.
finnigan posted on Tuesday, February 10, 2009 - 2:53 pm
I am undertaking a longitudinal research and want to examine typologies of significant events at work, significant events outside of work and significant events in personal life.
The survey question asks whether a respondent has experienced a significant life event; the answer is yes or no. If yes, the respondent is asked to describe it as positive, negative or neither, and then to indicate whether the event made them focus more on the past, present or future.
The data structure will be multilevel, i.e. persons within organisations.
I am trying to find out if I can create typologies of say Positive future based work life event, past negative based personal life event etc
Would latent class analysis be appropriate to do this?
I have a question concerning the sample size. I have conducted a latent class analysis with 34 subjects (thus, I have a very small sample). How does the sample size influence the estimates/fit indices?
Hard to say in general and much of this depends on how large the class separation is. The standard errors might not be that good. You need a Monte Carlo study to get a better feel for this. The BIC tends to underestimate the number of classes at low sample sizes as shown in the Nylund et al article.
I have recently submitted a paper where I've used latent class analysis (with binary indicators) to investigate subtypes of cognitive impairment. The results suggested a four-class solution; two of these classes represented global cognitive impairment across domains (a mild and a severe class) while the other two represented impairment in specific domains.
A reviewer commented that the global findings differed from a number of previous factor analyses of cognitive data and asked for more explanation.
I used the same dataset and ran an exploratory factor analysis on the same binary indicator variables and this suggested a three factor solution. The first two factors looked like my domain-specific deficit classes but there was no global type of factor. So the observation of the global classes doesn't seem to be related to anomalies in my dataset but rather with differences between the two methods.
My question then is whether there is any theoretical reason that factor analysis might be less likely than latent class analysis to yield global factors (i.e. high or low scores across all observed indicators). I haven't been able to find anything in the literature yet.
Exploratory factor analysis is not well suited for finding a global (general) factor. You would have to do a CFA to capture that - a "general-factor, specific-factor" model (see handout for Mplus short course topic 1). But you should also investigate if EFA/CFA or LCA fit the data better by comparing their BIC values. One model may very well be a lot better than the others in terms of BIC and should therefore be chosen on statistical grounds. For an example, also using factor mixture analysis, see
Julie Mount posted on Thursday, June 18, 2009 - 3:23 pm
Thank you, this is really useful. For the LCA (four class) the BIC is 1316 and for the EFA (3 factors) it's 8232. A big difference!
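For reference, the BIC used in such comparisons is BIC = -2 logL + p ln(n), where p is the number of free parameters and n the sample size, and the model with the lowest value is preferred. A small Python sketch follows; the log-likelihoods, parameter counts and sample size are hypothetical and are not the values from this thread. The comparison is meaningful only when the models are fit to the same data with likelihoods on the same scale.

```python
import math


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2*logL + p*ln(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical fits of two candidate models to the same 200 cases.
candidates = {
    "LCA, 4 classes": bic(loglik=-700.0, n_params=19, n_obs=200),
    "CFA, general + specific": bic(loglik=-720.0, n_params=12, n_obs=200),
}
best = min(candidates, key=candidates.get)  # lowest BIC is preferred
```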
mpduser1 posted on Tuesday, June 21, 2011 - 10:40 am
I'm generating a series of models in Mplus in an attempt to determine if latent class membership differentiates classes on some dichotomous distal outcome, Y, after adjusting for a series of covariates, X.
I want to make sure I've set the problem up correctly (and am interpreting the results correctly) within the Mplus framework.
My latent class model (L) has three classes, predicted based on a series of indicators (I).
What I've done is first estimated:
L --> I L --> Y
In doing so I found that my classes differ on Y. In other words, the probability of endorsing the dichotomous item Y varies by latent classes.
Then, I want to see if the cross-class variation in probability of endorsing Y changes after using covariates X. So, I estimate:
L --> I L --> Y X --> Y
In many instances, the results appear to make sense. But for one particular Y, I obtain latent class probabilities of endorsing Y as 0 (or, close to it) for all of my latent classes.
I'm wondering if this makes sense or if I've set the model up wrong -- essentially none of my respondents, regardless of latent class, are endorsing Y after I adjust for X?
Leslie Roos posted on Tuesday, March 13, 2012 - 1:40 pm
I am a student new to MPlus and LCA and trying to determine which module would be best for my analyses:
M-Plus Base Program or M-Plus Base program with Combination Add on.
The project I am working on is with a large population database and will investigate whether 'classes' derived from a range of childhood variables (some multi-level varying in frequency and severity) will predict a binary adult outcome.
Any advice would be much appreciated. I'm not sure I fully understand what '2-level LCA' would look like.
I am struggling with model interpretation. I have approximately 104,000 respondents (37 items) and 99,000 different response patterns so my response space is almost a uniform distribution. It is my understanding that MPLUS classifies on the basis of response pattern so this distribution presumably affects interpretation.
Can you please tell me how LCA decides class membership? And, does the response pattern composition affect class membership?
The response patterns are a summary of the categorical data. They do not affect class membership. Classes are formed by finding the model with the lowest correlations among the latent class indicators within each class. The model assumes conditional independence. The interpretation of the classes is based on the profiles of the latent class indicators.
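Under conditional independence, a case's posterior class probabilities follow from Bayes' rule: the class proportion times the product of the item probabilities for the observed responses, normalized over classes. The Python sketch below illustrates this for binary items. The class proportions and item probabilities are hypothetical, and this is a textbook-style illustration rather than a description of Mplus internals.

```python
def posterior_class_probs(pattern, class_weights, item_probs):
    """Posterior P(class | response pattern) for binary items.

    pattern: list of 0/1 responses.
    item_probs[k][j]: P(item j = 1 | class k).
    """
    joint = []
    for weight, probs in zip(class_weights, item_probs):
        like = weight
        for u, p in zip(pattern, probs):
            like *= p if u == 1 else (1.0 - p)  # conditional independence
        joint.append(like)
    total = sum(joint)
    return [x / total for x in joint]


# Hypothetical two-class model with three binary indicators.
weights = [0.6, 0.4]
item_probs = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3]]
post = posterior_class_probs([1, 1, 0], weights, item_probs)
most_likely = post.index(max(post)) + 1  # modal class, numbered from 1
```

Two cases with the same response pattern get the same posterior probabilities, regardless of how common that pattern is in the sample.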
Tim Stump posted on Saturday, September 22, 2012 - 8:32 pm
I'm running LCA with bayes estimator. I don't seem to get Bayes Information Criteria (BIC) to compare models with different number of classes as with ML or MLR estimator. I've just started to learn this type of model. Maybe I'm missing something easy about how to obtain BIC. Key parts of input program are below:
ANALYSIS:
  TYPE = mixture;
  estimator = bayes;
  fbiterations = 10000;
  processors = 4;
MODEL:
  %overall%
  %c#1%
  svdrng1-svdrng11 with svdrng1-svdrng11 (p1-p55);
  %c#2%
  svdrng1-svdrng11 with svdrng1-svdrng11 (q1-q55);
MODEL PRIORS:
  p1-p55 ~ IW(0,100000);
  q1-q55 ~ IW(0,100000);
OUTPUT:
  TECH1 tech10 tech11 tech14;
Tim Stump posted on Monday, September 24, 2012 - 7:55 am
If BIC is not available, are there other suggestions/guidelines on determining number of classes when using bayes estimator for LCA?
Tim Stump posted on Monday, September 24, 2012 - 8:54 am
For LCA with bayes estimator, is deviance information criterion (DIC) available? Is there an option I can specify to get this displayed in the output.
Tim Stump posted on Monday, September 24, 2012 - 9:25 am
Sorry, I'm showing my ignorance in this topic. But, looking back at slides from a previous workshop, I see that posterior predictive p-value is useful for determining model fit with bayes estimator along with posterior predictive checking histogram - difference in chi-square (difference between real/replicated data). Would this be a useful method to determine number of classes from an LCA model?
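As a reminder of what the posterior predictive p-value summarizes, here is a generic Python sketch: across MCMC draws, it is the proportion of draws in which the discrepancy (e.g. chi-square) for replicated data exceeds the discrepancy for the observed data. The numbers are hypothetical and the function is a textbook-style illustration, not the Mplus implementation.

```python
def posterior_predictive_p(observed_stats, replicated_stats):
    """Proportion of draws where the replicated discrepancy exceeds the observed."""
    pairs = list(zip(observed_stats, replicated_stats))
    exceed = sum(1 for obs, rep in pairs if rep > obs)
    return exceed / len(pairs)


# Hypothetical discrepancy values from five posterior draws.
obs = [12.1, 10.4, 15.0, 11.2, 13.3]
rep = [9.8, 11.0, 14.2, 12.5, 8.7]
ppp = posterior_predictive_p(obs, rep)
```

A value close to zero means the observed data look consistently worse than data generated from the model, which signals poor fit.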
Tim Stump posted on Friday, September 28, 2012 - 7:20 am
Thank you for referencing that article, very helpful.
Ed Dunne posted on Monday, November 12, 2012 - 5:41 pm
I have a question about within-LCA class comparisons. To your knowledge, has anyone ever compared participants within a specific class? For example, if I have four classes, can I take only ONE of those classes and make comparisons based on gender (i.e., males in this group were more likely to do X when compared to females in this group)?
I have a question similar to the third of this thread, but since it's ten years later now, the views and possibilities with regard to this question might have changed.
My dataset contains responses of approx 1750 teachers who work in approx 100 schools. The questions I asked the teachers are about the school culture and the school leader. A latent class analysis seems the best approach, but how do I take the nested structure into account? I would like to assign schools to latent classes of school culture and assign schools to latent classes of educational leadership. In the next step, I would like to examine relationships between the two.
Would it be possible to conduct these analyses in MPlus, and if so: how?
Thank you in advance, Marieke
IYH Boon posted on Tuesday, November 11, 2014 - 1:14 pm
I was wondering if you know of a paper that lays out the basic estimating equations for (longitudinal) latent class analysis? Something akin to what you lay out in Eq (18)-(20) in Muthen 2001 ("Latent Variable Mixture Modeling"), but for LCA with repeated measures.
Thanks in advance for any help you're able to offer.
I have a relatively small dataset of 28 teams (108 individuals). I want to make classes with LCA, based on a team’s growth pattern (of a team-level variable). When I perform the LCA, Mplus delivers classes. The entropy, class-sizes and latent class probabilities are all good. However, looking at the standardized model results, the estimates, S.E. etc. are all 999.000.
Do you have any suggestions or possible explanations?
Hi Dr. Muthen, I am a new Mplus user. I am running this model with all binary indicators:
Variable:
  Names are codice_scuola_8_14_15 GESTIONE d10a_d [...] d11k_d;
  Missing are all (9999);
  IDVARIABLE = codice_scuola_8_14_15;
  Usevariables are d10a_d - d11k_d;
  Categorical are d10a_d - d11k_d;
  classes = c (3);
Analysis:
  Type = mixture;
  MITERATION = 500;
  STARTS = 2500 25;
  STITERATIONS = 10;
Output:
  SAMPSTAT STANDARDIZED tech11;
SAVEDATA:
  file is lca_mgm.txt;
  save is cprob;
  format is free;
But I cannot get the SAMPSTAT and STANDARDIZED outputs. I get these messages:
*** WARNING in OUTPUT command
  STANDARDIZED (STD, STDY, STDYX) options for TYPE=MIXTURE with categorical, count, censored or nominal outcomes are available only with ALGORITHM=INTEGRATION. Request for STANDARDIZED (STD, STDY, STDYX) is ignored.
*** WARNING in OUTPUT command
  SAMPSTAT option is not available when all outcomes are censored, ordered categorical, unordered categorical (nominal), count or continuous-time survival variables. Request for SAMPSTAT is ignored.
How can I modify the code to have these outputs? Thank you.
Hi, I am running a 2-time LTA model with 5 indicators and 5 classes. When I save the data using the SAVEDATA command I nicely get the raw scores of the indicators and the id variable followed by 25 cprob columns, a c1 and a c2 columns and an MLCJOINT column. Three questions please:
1. what is the meaning of the ordered cprob1-cprob25 columns? c2#1 ON c1#1 c2#1 ON c1#2 etc? they go up to 20 not 25! What am I missing?
2. Are the C1 and C2 columns the class assignment using posterior probabilities for Time 1 and Time 2, respectively?
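If the C1 and C2 columns do hold each case's most likely class at Time 1 and Time 2 (this is what the question asks and is not confirmed in this thread), an observed transition table can be tabulated from them as in the Python sketch below. The class vectors and the function name `crosstab_transitions` are hypothetical.

```python
def crosstab_transitions(c1, c2, n_classes):
    """Cross-tabulate modal classes and convert each row to proportions.

    Returns (counts, row_proportions); rows are Time 1 classes and
    columns are Time 2 classes, with classes numbered from 1.
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(c1, c2):
        counts[a - 1][b - 1] += 1
    proportions = []
    for row in counts:
        total = sum(row)
        proportions.append([x / total if total else 0.0 for x in row])
    return counts, proportions


# Hypothetical modal class assignments for eight cases.
c1 = [1, 1, 1, 2, 2, 2, 2, 1]
c2 = [1, 2, 1, 2, 2, 1, 2, 1]
counts, proportions = crosstab_transitions(c1, c2, n_classes=2)
```

Because modal assignment discards each case's classification uncertainty, a table built this way need not match transition probabilities estimated by the model itself.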
With two latent class variables, how can I determine the correct number of classes for each of them? I find that TECH11 and TECH14 are suitable only for one variable. Should I first analyze each of the two latent class variables separately before the overall analysis?
Dear Dr. Muthen, I hope you are well. I ran a 5-class LTA model with two time points. I posited invariant solutions at both time points with freely estimated probabilities, I used SVALUES, and I saved cprob values (C1 and C2) using modal assignment. My question relates to the fact that the transition probabilities I obtained from the output (right below) deviate markedly from the probabilities I get when I cross-tabulate the saved C1 and C2 class memberships. I did not employ the 3-step approach. Can you please advise? I am obviously doing something wrong. Thank you, GS
LATENT TRANSITION PROBABILITIES BASED ON THE ESTIMATED MODEL