Latent Profile Analysis

Mplus Discussion > Latent Variable Mixture Modeling >


Monica Oxford posted on Monday, March 19, 2001 - 4:25 pm

Hi. I have a question regarding Latent Profile Analysis. I have several measures of child "executive function" that include behavior (e.g. impulsivity and attention) and language (e.g. expressive and reflective) that I am using in a profile analysis. These measures are popular in the field and are measured on different scales and thus have different variances. I was wondering if there is a general rule about the degree of difference (between smallest and largest) in variances for continuous items in a latent profile analysis (I know this is an issue in other forms of "profile analysis", e.g. Tabachnick & Fidell, and an issue in SEM, e.g. Kline or Bentler). Is it a requirement that all items are measured on the same scale and have similar variances? Thanks in advance.

Linda K. Muthen posted on Thursday, March 22, 2001 - 9:39 am

No, it is not a requirement that all items be measured on the same scale and have similar variances. Putting items on the same scale may, however, help convergence.

Monica posted on Monday, May 07, 2001 - 1:41 pm

Hi again. Continued discussion on "child executive function" from the earlier post (3/19). I tested the adequacy of my three class latent profile model by giving each class different start values to make sure the solution I got was the "right" one. The model appeared stable (results and log likelihood). Next, I simply changed the order of class 1 and 2 and kept the same start values (so, in my mind the results shouldn't have changed just the order of the results--what were class "2" results should have become class "1" results).

I ended up with different results both on the means within class and on class sizes (prior I had 27%, 36%, 36% and now have 37%, 37%, and 25%, where the third class [the same class in both analyses] was 36% is now 25%, class one was 27% now 37%) leading to different conclusions, which makes me a little concerned.

Is there something I have overlooked or should be concerned about given the changes in results? My operating assumption was that the start values were important, not the order of the classes (by the way, I am using actual start values, instead of 1 and -1, for convergence, it helps because the items I am using are on different scales). Advice??

Thanks in advance.

Linda K. Muthen posted on Tuesday, May 08, 2001 - 7:48 am

I would need to see both of the outputs to answer this question. Please send them to support@statmodel.com.

Monica posted on Monday, May 21, 2001 - 3:15 pm

Thanks for the offer to look at my output. However, I discovered that the issue I raised was a mistake on my part. The models are the same even though I changed the order. Thanks anyway.

Beth McCreary posted on Wednesday, November 21, 2001 - 10:26 am

I've completed a confirmatory factor analysis with three continuous LVs each represented by a set of indicators (which are items on a paper-and-pencil scale), and the fit appears acceptable. The indicators are each scored from "1" to "4" in a Likert-type response format. I have a preconceived hypothesis that the participants should fall into five separate categories based on their scores on the three continuous factors. For example, those "high" on the first factor and "low" on the other two will form one group, those "high" on the second and third factors will form a second group (regardless of scores on the first factor), etc. In addition, I expect to see certain gender differences in the proportions of participants assigned to each category. Is there a way to conduct a confirmatory latent profile analysis to test this hypothesis? Would this be an appropriate thing to do? If so, could you please route me to a reference and/or example in MPlus2? Thank you and happy Thanksgiving!

Bmuthen posted on Thursday, November 22, 2001 - 8:55 am

You may be interested in a new paper by Lubke, Muthen, Larsen (2001), Global and local identifiability of factor mixture models. This can be requested from bmuthen@ucla.edu by mentioning paper 94.

Anonymous posted on Friday, March 08, 2002 - 10:00 am

Hi -- I am doing a latent profile analysis, using six indicators of "social capital", each measured on a 1-10 Likert-type scale. My model converges with all variances constrained (the BIC continues to decrease up to a six-class model, but a 3-class model fits better with theory, gives better class probabilities, and the entropy measure is higher). When I free variances past a 1-class model I have a variety of problems -- including within-class means that are outside the scale, and/or at the min or max, and with 0 variance (and a variety of error messages re. the model not converging). Also, the class sizes change and patterns between the variable means within classes change when any variances are freed. My data is negatively skewed (less skewed within the classes than in the full group), but within the limits recommended by Kline. Should I trust my results with the variances constrained? Or, can you recommend how to proceed?

bmuthen posted on Friday, March 08, 2002 - 5:47 pm

Latent profile analysis can have these types of behaviors when the variances are allowed to vary across classes. The literature so far seems to have little guidance to offer in this area. You may want to consider the following approach. Using the model with class-invariant variances, you can classify individuals into the latent classes using their posterior probabilities. You can then go back to the raw data and study the variation of each variable in each class. If a variable is considerably more or less variable in a certain class, you can modify the model to allow that variable to have a class-specific variance for that class.
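A minimal sketch of this two-step approach in Mplus input syntax. The file names, the indicator names y1-y6, and the choice of y3 in class 2 are hypothetical placeholders, not taken from the poster's analysis.

```mplus
! Step 1: LPA with class-invariant variances (the Mplus default).
! Save posterior probabilities and most likely class membership.
DATA:      FILE IS socap.dat;          ! hypothetical file name
VARIABLE:  NAMES ARE y1-y6;            ! hypothetical indicator names
           CLASSES = c(3);
ANALYSIS:  TYPE = MIXTURE;
SAVEDATA:  FILE IS socap_cprob.dat;    ! hypothetical file name
           SAVE = CPROBABILITIES;

! Step 2 (a separate run with the same DATA, VARIABLE, ANALYSIS):
! add a MODEL command that frees one variance in one class only.
MODEL:
  %OVERALL%
  %c#2%
  y3;      ! variance of y3 becomes class-specific in class 2
```

Between the two runs, the saved file can be used to compute the observed variance of each indicator within each most-likely class, which is what guides the choice of which variance to free.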

Anonymous posted on Monday, June 10, 2002 - 3:43 pm

I want to classify respondents from several ethnic groups into classes, three classes for each ethnic group. There are 24 5-point Likert variables (never to always) that measure five latent constructs. The classification will be based on the five latent constructs. The minimum and maximum subsample sizes are 190 and 300, totaling about 1,000. I want to see how the class proportions differ from one group to another.
Could you give some guidance on how to run this analysis? Thanks!

bmuthen posted on Tuesday, June 11, 2002 - 9:31 am

Let me first ask you if by classes you refer to a latent class (latent profile; LPA) analysis using the 5 latent constructs? If so, have you done preliminary LPA analyses of the factor scores within each ethnic group?

Anonymous posted on Tuesday, June 11, 2002 - 10:04 am

I have tried LPA with the factor scores for one subsample and the result looked ok! I am not quite sure if I should proceed with this approach. Should I obtain the factor scores from a multigroup CFA or from a single-group CFA? If multigroup CFA is preferred, what constraints are needed on what parameters?

bmuthen posted on Tuesday, June 11, 2002 - 11:26 am

A multiple-group analysis is very valuable to do first because you want to make sure that you have a sufficient degree of measurement invariance before you compare the latent variables (or classes from them) across groups. You should use the default Mplus setup for a multiple-group mean-structure analysis, which holds intercepts and loadings equal across groups. You can then look at modification indices to see if some items are not invariant with respect to either parameter type (intercept or loading).
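A rough sketch of such a multiple-group setup follows. The file name, group codes, number of groups, and item-to-factor assignment are all hypothetical. Depending on the Mplus version, a mean structure may have to be requested explicitly with TYPE = MEANSTRUCTURE in the ANALYSIS command.

```mplus
! Multiple-group CFA; loadings and intercepts are held equal
! across groups by default when a mean structure is analyzed.
DATA:      FILE IS likert.dat;           ! hypothetical file name
VARIABLE:  NAMES ARE ethnic y1-y24;      ! hypothetical names
           USEVARIABLES ARE y1-y24;
           GROUPING IS ethnic (1 = g1 2 = g2 3 = g3);
MODEL:     f1 BY y1-y5;                  ! hypothetical assignment
           f2 BY y6-y10;                 ! of items to factors
           f3 BY y11-y15;
           f4 BY y16-y20;
           f5 BY y21-y24;
OUTPUT:    MODINDICES;                   ! flags non-invariant items
```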

Anonymous posted on Saturday, February 28, 2004 - 6:51 pm

Hello
I am running a Latent Profile Analysis using a set of 15 behavioral characteristics. Some of the characteristics are highly correlated (e.g., .7 to .8), but the majority of characteristics have moderate to low relationships. Only 10 pairs of variables from the entire correlation matrix showed correlations above .7. Also, all variables are on the same metric (T scores).

In one run, the variables were considered independent, where the latent class variable was driving the relationship between the observed variables. In a second run, those variables which were highly related were allowed to correlate (using the WITH statement).
In the run which considered the variables to be independent, the results were much more meaningful (e.g., lower BIC, higher entropy, MUCH easier to interpret) than the results in which the selected variables were correlated.

Can the LPA solution which considers the variables to be independent be interpreted? Or, is this solution 'invalid' due to the high correlations between some of the variables? How strong is the assumption of independent variables when running/interpreting LPAs?

Thank you for your comments and also for MPLUS.

bmuthen posted on Sunday, February 29, 2004 - 7:49 am

The sample correlations should be significant for LPA. It is the within-class correlations that are zero. Although LPA specifies zero within-class correlations among the variables, it reproduces correlations among the variables because the variables are all influenced by the latent class variable, so the variables become correlated when mixing across the classes. If some variables correlate more than others this can be due to these variables differing more in means across the classes than other variables. This means that you don't have to include WITH statements to make your model fit. Perhaps you need to include more classes, which have particularly high across-class mean differences on the highly correlated variables. It is also the case that IF you allow WITH for some variables, you may be able to use a smaller number of classes and still get the same model fit. WITH represents within-class correlation and should have a well-interpretable substantive meaning such as a measurement methods effect. So, to some extent classes and WITHs have similar effects on model fit, and substantive arguments will have to be brought in to make a choice. Related to this, you may also study chapter 3 of the Hagenaars-McCutcheon latent class book of 2002 published by Cambridge Univ Press, "Applied Latent Class Analysis".
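The point that mixing produces correlation can be written out directly. The notation here is introduced only for illustration: class proportions are \pi_c and the class-specific mean of variable j is \mu_{jc}. With zero within-class covariance, the covariance between two different variables in the mixed population is:

```latex
\begin{align*}
\mu_j &= \sum_{c} \pi_c \, \mu_{jc}, \\
\operatorname{Cov}(y_j, y_k)
  &= \sum_{c} \pi_c \, (\mu_{jc} - \mu_j)(\mu_{kc} - \mu_k),
  \qquad j \neq k, \\
\intertext{and for two classes this reduces to}
\operatorname{Cov}(y_j, y_k)
  &= \pi_1 \pi_2 \, (\mu_{j1} - \mu_{j2})(\mu_{k1} - \mu_{k2}).
\end{align*}
```

A pair of variables whose class means both differ strongly therefore shows a large overall covariance even though the model specifies none within class.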

Tom Hildebrandt posted on Tuesday, November 30, 2004 - 10:25 am

As I've seen LPA used and described as a way to identify homogeneous populations within a larger heterogeneous population, indicator variables are usually either all continuous or all binary/categorical. What are the potential problems of combining binary/categorical indicators and continuous indicators in the use of LPA?

Linda K. Muthen posted on Tuesday, November 30, 2004 - 10:46 am

This should not present a problem. It can be done in Mplus.

Tom Hildebrandt posted on Tuesday, November 30, 2004 - 10:54 am

Thank you for the quick response.

Do you know of a good example of where this mixed model has been applied using LPA to describe subpopulations within a heterogeneous group? I'm curious as to how descriptions of the differences between groups among the indicator variables are made (means for continuous and item endorsement probabilities for binary/categorical)?

Linda K. Muthen posted on Tuesday, November 30, 2004 - 10:57 am

I don't know of any reference for this.

bmuthen posted on Tuesday, November 30, 2004 - 11:00 am

Some of this is discussed in the Vermunt-Magidson chapter 3 in the Hagenaars-McCutcheon book Applied Latent Class Analysis.

Tom Hildebrandt posted on Tuesday, November 30, 2004 - 11:48 am

Thank you both again.

I'm still waiting for the book to arrive. I'm anxious to get a chance to read through it, given your previous recommendations for LCA-related questions.

Scott Roesch posted on Tuesday, November 30, 2004 - 6:25 pm

Can anyone point me to a resource in which latent profile analysis was used with MPlus, and/or a general introduction to latent profile analysis including a description of the parameters that the analysis generates to determine these profiles? Thanks!

bmuthen posted on Tuesday, November 30, 2004 - 7:49 pm

Although not using Mplus, the Vermunt-Magidson chapter 3 in the Hagenaars-McCutcheon book Applied Latent Class Analysis is useful in this regard. An introduction using Mplus has yet to be written.

Scott C. Roesch posted on Friday, January 28, 2005 - 4:41 pm

We have just run a latent profile analysis using Mplus. We have 18 variables that are continuous in nature and 1 variable that is categorical with 4 levels or groups. With respect to the output, we understand how to interpret the output for the 18 continuous variables. However, the output for the 1 categorical variable is unclear to us. Values for this variable are listed under the heading Means, and give us values only for 3 of the 4 groups that compose this categorical variable. Our questions include (a) why are these categories listed under Means? (b) shouldn't we be getting proportions for this variable since it is categorical? and (c) in general, if these means are interpretatively meaningful, what do negative means tell us? Thank you for any help you can provide.

Linda K. Muthen posted on Friday, January 28, 2005 - 8:13 pm

If this is an observed categorical variable, then you should get thresholds. This variable should be on the CATEGORICAL list. If this is a categorical latent variable, you should get means. I think you mean the former but am not totally certain.

Scott C. Roesch posted on Saturday, January 29, 2005 - 9:18 am

I have now changed the categorical variable to be listed as CATEGORICAL rather than NOMINAL, and received the thresholds. I guess I am still confused about why I did not receive probabilities for these as well, like one receives in an LCA. Thanks!

Linda K. Muthen posted on Saturday, January 29, 2005 - 1:46 pm

With binary outcomes, CATEGORICAL and NOMINAL should yield the same results. I suggest that you send the two outputs and data to support@statmodel.com to be checked. You may not be using the most recent version of Mplus or there may be another explanation. I would need more information to determine this.
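For readers with the same question, the thresholds and the NOMINAL "means" can be turned into class-specific probabilities by hand. The formulas below are a sketch that assumes the logit parameterization used with maximum-likelihood mixture estimation; the symbols \tau and a are illustrative labels for the printed thresholds and means.

```latex
% Ordered indicator u with categories 0,...,3 (CATEGORICAL list),
% thresholds \tau_{1c} < \tau_{2c} < \tau_{3c} in class c:
\begin{align*}
F(t) &= \frac{1}{1 + e^{-t}}, \\
P(u = 0 \mid c) &= F(\tau_{1c}), \\
P(u = m \mid c) &= F(\tau_{m+1,c}) - F(\tau_{m,c}), \qquad m = 1, 2, \\
P(u = 3 \mid c) &= 1 - F(\tau_{3c}).
\end{align*}
% Unordered indicator with 4 categories (NOMINAL list), means
% a_{1c}, a_{2c}, a_{3c} printed, a_{4c} = 0 for the last category:
\begin{equation*}
P(u = m \mid c) = \frac{e^{a_{mc}}}{\sum_{l=1}^{4} e^{a_{lc}}},
\qquad m = 1, \dots, 4.
\end{equation*}
```

Under this reading, only three means are printed because the last category is the reference, and a negative mean says that the category is less likely than the last category within that class.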

JJ posted on Friday, February 04, 2005 - 4:34 pm

I have a question regarding the determination of the appropriate number of classes in an LPA. For example, if the Vuong-Lo-Mendell-Rubin likelihood ratio test is not significant for a 3-class solution (compared to a 2-class solution) but the BIC is smaller for the 3-class solution, which should trump? Meaning... how do I go about evaluating whether the 2- or 3-class solution is superior?

bmuthen posted on Friday, February 04, 2005 - 6:03 pm

This does not have a simple answer. BIC and LMR can disagree. You may also want to consider sample-size-adjusted BIC which has shown superior results in some studies. When fit indices do not give a clear answer I would go with interpretability - often a k-class solution is merely an elaboration of a (k-1)-class solution, not a contradictory finding.
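For reference, the two information criteria mentioned here differ only in how the sample size enters the penalty. In the standard definitions below, \log L is the maximized loglikelihood, p the number of free parameters, and n the sample size:

```latex
\begin{align*}
\mathrm{BIC} &= -2 \log L + p \,\ln(n), \\
\mathrm{BIC}_{\text{adj}} &= -2 \log L + p \,\ln\!\left(\frac{n + 2}{24}\right).
\end{align*}
```

Because the adjusted penalty is smaller, the sample-size-adjusted BIC tends to favor more classes than BIC does for the same data.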

Also, are you sure you are interpreting the LMR p value correctly? See the User's Guide.

JJ posted on Monday, February 14, 2005 - 8:30 pm

Could you tell me how MPlus sorts results files (.dat)? I have imported a results file into SPSS and want to be able to link subjects to their original case IDs. I should be able to do this if I can figure out how Mplus is sorting the file. Just so you have a little background (if necessary to answer the question), the LPA that I conducted includes only a subset of the total subjects in the original data file. The original file includes 3 sets of subjects, and I used the command syntax to include only those subjects with a code=1 on a categorical variable in the data set. Thus, the LPA was only conducted on these subjects in this specific analysis.

Linda K. Muthen posted on Tuesday, February 15, 2005 - 6:57 am

Sorting varies. If you are saving individual data, you should be able to use the IDVARIABLE option of the SAVEDATA command.

JJ posted on Tuesday, February 15, 2005 - 9:34 pm

I have now been able to save the ID, but it is being saved like this: 10.000**********, which is not the proper format. The subject IDs are supposed to look like this: 030100102. Can you suggest how I might change the commands so that the subject IDs are accurately saved?

Linda K. Muthen posted on Wednesday, February 16, 2005 - 7:21 am

See the Mplus User's Guide where it states that the length of the ID variable can not exceed seven. You will have to shorten this variable. There is usually a unique part that does not exceed seven.

Anonymous posted on Thursday, February 17, 2005 - 4:56 pm

Regarding the interpretation of the MLR, discussed on Feb.4...If the p value of the MLR is less than .05, this means that the solution is superior to the k-1 solution? Conversely, if the p value is greater than .05, the k-1 solution is superior. Is this correct? Thank you.

bmuthen posted on Thursday, February 17, 2005 - 5:08 pm

You mean LMR (Lo-Mendell-Rubin). Yes, your description is correct.

Anonymous posted on Saturday, March 19, 2005 - 1:51 pm

I am running a latent profile analysis (LPA) of four count variables that index health care utilization (e.g. # ER visits). Initially I plunged ahead and did the LPA and found that a two-class solution was indicated by the Vuong-Lo-Mendell-Rubin and Lo-Mendell-Rubin likelihood ratio tests (i.e. the two-class solution was superior to the one-class solution, and the three-class solution did not improve on the two-class solution). At the same time the BIC argued for a single class. I became concerned with the inconsistency and (as I should have done originally) I investigated the "Poissonness" of the utilization variables. On convexity plots three of the four variables showed deviations from Poissonness. I suppose my initial question is, "In the latent mixture model context of an LPA, how robust are findings to violations of dispersion for count (Poisson) variables?" I took the additional step of running LPAs with inflation parameters. This showed more consistent results in terms of the likelihood ratio tests and the BIC, and argued for the existence of three groups. The problem with this is that I cannot seem to test 3 vs. 4 groups in order to establish this classification scheme with more certainty. I am receiving several error messages and do not think I am going to get the model to run. So I suppose my next question is this: "Assuming that I do not get the 3 vs. 4 class model to run, would it be reasonable to acknowledge the existence of three classes, establish that the three-class solution is a variation on the two-class (k-1) model and move on with my analyses using two groups?"

bmuthen posted on Saturday, March 19, 2005 - 4:12 pm

It sounds like you needed the zero-inflated version of the Poisson model. But you say you don't get a solution for 4 classes - or perhaps you don't get a tech11 (LMR) result in the 4-class run; I am not sure from your message. If you have tried to use many random starts (say starts = 100 5) and still fail, it may be due to 4 classes being too ill defined in these data, and staying with 3 is the way to go. So my inclination would be to say yes to your last question.
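A sketch of a 4-class run along these lines, with zero-inflated Poisson indicators, many random starts, and the LMR test requested. The file and variable names are hypothetical.

```mplus
DATA:      FILE IS util.dat;       ! hypothetical file name
VARIABLE:  NAMES ARE u1-u4;        ! hypothetical count variables
           COUNT ARE u1-u4 (i);    ! (i) adds zero inflation
           CLASSES = c(4);
ANALYSIS:  TYPE = MIXTURE;
           STARTS = 100 5;         ! 100 random starts, 5 final runs
OUTPUT:    TECH11;                 ! LMR test: 4 vs. 3 classes
```

If the best loglikelihood is not replicated across the final-stage runs, that is one sign that the 4-class solution is poorly defined in the data.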

Anonymous posted on Friday, April 22, 2005 - 10:02 am

Hello
I have run a k-means cluster analysis and an LPA on the same set of data. I found an 8-cluster solution that made sense. But I only found 4 classes (5 classes would not converge).
I've tried varied start values for the LPA, allowed variables to correlate within class, etc. in attempts to try to get the same number of groups across both methods.

My question is: should I expect the procedures to uncover the same number of classes/clusters or could I find different solutions because one method is uncovering latent groups/subpopulations and one method is working more on the observed level?

bmuthen posted on Friday, April 22, 2005 - 11:07 am

I think k-means clustering uses a more restrictive model than LPA - doesn't it also assume equal variances across variables (in addition to the assumption of equality of variances across clusters)? See for example McLachlan's new Wiley book on Microarray analysis. In Mplus you can add the equal variance restriction.

Anonymous posted on Friday, April 22, 2005 - 11:51 am

Thank you for your reply.
You're correct - in k-means, variances should be roughly equal across variables.
I was wondering if the differences in solutions were related to the "level" of results (latent classes vs. observable clusters). In general, I haven't seen LPA models uncover as many groups as cluster analysis (mainly 2-4 classes found). I know that hierarchical cluster methods (e.g. Ward's) let you 'see' the different cluster solutions & was wondering if this was similar to differences between k-means and LPA.

In MPLUS, the default is equal variance across clusters, correct? Is this relaxed with the WITH statement to allow correlations between variables?

bmuthen posted on Friday, April 22, 2005 - 3:20 pm

I don't see the "level" of results as being different between the two approaches. You can "visualize" the LPA results by using Mplus to plot the observed variable mean profiles for the different classes. You probably get more LPA classes when you hold variances equal across variables (try it). Yes, the Mplus default is equal variances across classes (but not across variables). And adding WITH statements relaxes the conditional independence assumption, allowing correlations. See also the Vermunt-Magidson article in the Hagenaars-McCutcheon Applied LCA book (Mplus web site refs).

Anonymous posted on Monday, April 25, 2005 - 8:03 am

Thank you again for your reply.

How do you hold the variances equal across variables?
I'm not sure if this is needed, since I am dealing with T-scores, but the variances should differ by class.

Also, on p.121 of the MPLUS (Ver 3) manual, an example mentions that by mentioning the variances of the latent class indicators, the default equality constraint of equal variances (across classes) is relaxed. Will this allow for estimates of different variances within each class as well as different variances for individual variables?

However, to compare to k-means, which creates groups based on minimum w/in cluster error, shouldn't the MPLUS default be imposed?

Linda K. Muthen posted on Monday, April 25, 2005 - 2:56 pm

To hold variances equal across variables, list the variable names (which is how you refer to variances) followed by a number in parentheses to represent the equality constraint. For example,

y1 y2 y3 (1);

holds the variances of y1, y2, and y3 equal.

In the example, the equality constraint on a regression slope is relaxed. If you want to relax the equality constraint on another parameter such as a variance, then you would mention that parameter.

If you want to compare to k-means, then you should place the same constraints as k-means does.
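Putting these pieces together, an LPA set up to resemble k-means might look roughly like the sketch below. The file name, variable names, and number of classes are hypothetical.

```mplus
DATA:      FILE IS tscores.dat;    ! hypothetical file name
VARIABLE:  NAMES ARE y1-y4;        ! hypothetical indicator names
           CLASSES = c(4);         ! hypothetical number of classes
ANALYSIS:  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  y1-y4 (1);    ! one common variance for all indicators;
                ! equality across classes is the default
```

Because the variances are not mentioned in any class-specific part, the single common variance applies to every class, which matches the constraints discussed above.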

bmuthen posted on Monday, April 25, 2005 - 3:01 pm

You mention T scores so it sounds like you are standardizing your observed variables. This may be necessary for k-means clustering. I would, however, recommend not doing that in the LPA - and if the variables have different metrics then also not hold the variances equal across variables (only across classes).

Anonymous posted on Tuesday, April 26, 2005 - 1:36 pm

Regarding yesterday's discussion about comparisons between LPA & k-means - Thank you very much. You both cleared up a lot of questions.

Dr Muthen, you mentioned that I would probably get more LPA classes when variances were held equal across variables (4/22 note) -- and this did produce results very similar to k-means (up to 8 classes found before nonconvergence).

However, when variances were allowed to vary across classes (but not across variables), fewer classes were found (up to 4 classes).

Why would relaxing an assumption lead to finding fewer classes?
Thanks again for your assistance

bmuthen posted on Tuesday, April 26, 2005 - 2:02 pm

The more flexible the model is for each class, the better it can fit data and therefore the fewer classes you need. Your finding suggests that the "true" classes have different variances (across classes). If class-varying variances is the true state of nature and you force classes to have equal variances in your analysis, you have to have more classes in order to fit the data. Same thing if the true state of nature is within-class covariance - if you force classes to be formed with uncorrelated variables within class, then you need more classes to fit the data (this can be visualized if you draw a 2-dimensional plot with a single correlated pair of variables - that 1-class data situation needs 2 or more uncorrelated classes to be fit).

Anonymous posted on Wednesday, April 27, 2005 - 1:07 pm

Re: yesterday's conversation: Thank you very much.
So, if I have this right, with LPA we may want to start with a restrictive model (essentially k-means) and systematically "relax" assumptions (allow different variances across classes, allow covariances w/in class) until we find the model that fits the best in terms of parsimony, interpretability, and fit indices -- correct?

Is there any reference for this procedure or is it just standard practice?
Thanks again -- this conversation has been most helpful.

BMuthen posted on Wednesday, April 27, 2005 - 5:58 pm

Sounds correct.

See Chapter 3 by Vermunt and Magidson, Latent cluster analysis, in Hagenaars and McCutcheon's book Applied Latent Class Analysis.
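As a sketch of the later steps in such a sequence, the input below frees the variances across classes and adds one within-class covariance. The names, the number of classes, and the choice of the pair y1 and y2 are hypothetical; in practice each relaxation would be added in a separate run and compared on fit and interpretability.

```mplus
DATA:      FILE IS tscores.dat;    ! hypothetical file name
VARIABLE:  NAMES ARE y1-y4;        ! hypothetical indicator names
           CLASSES = c(2);         ! hypothetical number of classes
ANALYSIS:  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  y1 WITH y2;    ! within-class covariance, equal across classes
  %c#1%
  y1-y4;         ! class-specific variances in class 1
  %c#2%
  y1-y4;         ! class-specific variances in class 2
```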

Anonymous posted on Saturday, May 21, 2005 - 5:29 pm

I am trying to specify a latent profile analysis with covariates. I want the latent class variable to be measured by one set of variables, and class membership to be "predicted" using a *different* set of variables. Most of the examples in Chapter 7 of the User's Guide have the covariates ALSO affecting (or covarying with) the indicators of class membership.

I've tried this:

model: %overall%
c#1 by Fsamed FsameBm AshrCC blauCCm blauCCv;
c#1 on meanphd acadappl psoc pmale quant sameIB samephdB;

But MPLUS output tells me it is no longer allowed, and I should see chapter 9, which is about multilevel modeling and complex data ... I couldn't see the link. Can you tell me how to model this?

Linda K. Muthen posted on Saturday, May 21, 2005 - 5:45 pm

The BY option was used in Version 1 for latent profile analysis. It is no longer used. See Example 7.12 for the Version 3 specification. Just delete the CATEGORICAL option because your indicators are continuous and delete the direct effect u4 ON x; from the MODEL command.
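Following that advice, the poster's model might be written roughly as below. The file name, the order of variables in NAMES, and the number of classes are assumptions (three classes are mentioned only later in the thread).

```mplus
DATA:      FILE IS dept.dat;       ! hypothetical file name
VARIABLE:  NAMES ARE Fsamed FsameBm AshrCC blauCCm blauCCv
                     meanphd acadappl psoc pmale quant
                     sameIB samephdB;     ! order is hypothetical
           CLASSES = c(3);
ANALYSIS:  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  ! covariates predict class membership only; no direct effects
  c#1 ON meanphd acadappl psoc pmale quant sameIB samephdB;
  c#2 ON meanphd acadappl psoc pmale quant sameIB samephdB;
```

The five variables that never appear on the right-hand side of ON act as the continuous class indicators, with means that vary across classes by default.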

Anonymous posted on Sunday, May 22, 2005 - 6:26 pm

Thanks for the quick reply, that worked!
I am now wondering about how to get all possible contrasts for the multinomial logistic regression of the latent class variable on the covariates. I am working with 3 classes.

When I type:

c#1 on meanphd acadappl psoc pmale quant sameIB samephdB;
c#2 on meanphd acadappl psoc pmale quant sameIB samephdB;

MPLUS appears to give me the effect that each of these covariates has on the probability of being in the stated class (1 or 2) relative to being in class 3. But what about the probability of being in class 2 relative to class 3? MPLUS would not allow me to make any reference to the "last" class (#3) at all.

Linda K. Muthen posted on Sunday, May 22, 2005 - 6:39 pm

c#2 on meanphd acadappl psoc pmale quant sameIB samephdB;

gives the probability of being in class 2 relative to class 3. You can't make reference to the last class. It is the reference class with coefficients zero. See Chapter 13 of the Version 3 Mplus User's Guide for a description of multinomial logistic regression.

Anonymous posted on Thursday, May 26, 2005 - 2:52 pm

oops, sorry, I wasn't clear.

c#1 on meanphd acadappl psoc pmale quant sameIB samephdB;
gives the probability of being in class 1 relative to class 3.

c#2 on meanphd acadappl psoc pmale quant sameIB samephdB;
gives the probability of being in class 2 relative to class 3.

How do I get the probability of being in class 1 relative to class 2? (In STATA, "Mcross" gives you such results).

thanks!

Linda K. Muthen posted on Thursday, May 26, 2005 - 4:18 pm

You would have to make class 2 the last class to do this. You can do this by using the old class 2 ending values as user-specified starting values for class 3 in the run where you want to compare class 1 to class 3.
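The relation between the printed coefficients and the missing contrast can be sketched as follows, using illustrative symbols a_k (intercept) and b_k (slope vector) for class k, with the last class K as reference:

```latex
\begin{align*}
P(c = k \mid x) &= \frac{\exp(a_k + b_k' x)}
                        {\sum_{s=1}^{K} \exp(a_s + b_s' x)},
  \qquad a_K = 0,\; b_K = 0, \\
\log \frac{P(c = 1 \mid x)}{P(c = 2 \mid x)}
  &= (a_1 - a_2) + (b_1 - b_2)' x .
\end{align*}
```

The point estimates for class 1 versus class 2 are therefore differences of the printed coefficients. Their standard errors are not differences of the printed standard errors, which is one reason to re-run the model with the classes reordered.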

Stephen Gilman posted on Tuesday, September 20, 2005 - 8:05 am

Hello, I am considering estimating a latent profile analysis using a set of behavior ratings measured on a 5-point Likert scale. An alternative to this would be treating the items as ordinal, and estimating a latent class analysis. Another alternative is to consider the items as nominal. Is there any empirical way to determine which parameterization is most appropriate? The BIC from the 3 models is: 289368.688 from the LPA of the ratings treated as continuous indicators; 290569.173 from the LCA of the ratings treated as ordinal/categorical; and 290953.619 from the LCA of the ratings treated as nominal (2-class solution for each model). Thanks for your advice.

Linda K. Muthen posted on Wednesday, September 21, 2005 - 7:37 am

I don't think you can make this determination by comparing BICs. I would need to know more about these variables to answer this, but basically if this is an ordered polytomous variable, it is best to treat it that way. If it does not have strong floor or ceiling effects, you may be able to treat it as continuous. I am not sure why you would want to treat it as nominal.

Sandra posted on Thursday, October 20, 2005 - 7:41 am

Hello,

I'm working on a latent profile analysis using seven scales which measure different life goals. I tried some mixture models where I allowed variables to be correlated within classes and with variances allowed to vary across classes.

My problem is that even in the two-class solution I receive a class in which one scale (a_aibz) has a variance of zero. This scale measures relationship goals and has already in the empirical data set a very small variance. Fixing the variance to zero in one class does not solve the problem, because Mplus tells me that the covariance matrix could not be inverted.

Is there anything I can do to avoid this? Shall I drop the scale from the analysis? If not, what is the reason for this problem?

I attach the output of the two class mixture solution with variances set free across the classes:

VARIABLE: Names are
            a_aipw a_aibz a_aigs
            a_aige a_aiws a_airu a_aiat a_aihe;
          Usevar are a_aipw a_aibz a_aigs a_aige a_aiws a_airu a_aiat;
          missing are all (-99);
          classes = c(2);
Analysis: Type = mixture;
          start = 50 10;
          miterations = 1000;
Model:
          %overall%
          a_aipw a_aibz a_aigs a_aige a_aiws a_airu a_aiat;
          %c#1%
          a_aipw a_aibz a_aigs a_aige a_aiws a_airu a_aiat;
          %c#2%
          a_aipw a_aibz a_aigs a_aige a_aiws a_airu a_aiat;

Output:
          sampstat tech1 tech2 tech3 tech11 tech13 stand;
Plot:
          Type = plot3;
          Series = a_aipw a_aibz a_aigs a_aige a_aiws a_airu a_aiat(*);
SAVEDATA:
          file is AI_2cl.dat;
          save = cprobabilities;

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A CHANGE IN THE
CLASS COUNTS DURING THE LAST E STEP.

AN INSUFFICENT NUMBER OF E STEP ITERATIONS MAY HAVE BEEN USED. INCREASE
THE NUMBER OF MITERATIONS. ESTIMATES CANNOT BE TRUSTED. THE CLASS COUNTS CHANGED IN THE LAST EM ITERATION FOR CLASS 1.

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THE ESTIMATED MODEL

Latent Classes
1 2591.21830 0.60727
2 1675.78170 0.39273

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
BASED ON ESTIMATED POSTERIOR PROBABILITIES

Latent Classes
1 2591.21830 0.60727
2 1675.78170 0.39273

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

Class Counts and Proportions
Latent Classes
1 2593 0.60769
2 1674 0.39231

Average Latent Class Probabilities for Most Likely Latent Class Membership (Row)
by Latent Class (Column)
1 2
1 0.999 0.001
2 0.001 0.999

MODEL RESULTS

Estimates

Latent Class 1

Means
A_AIPW 3.692
A_AIBZ 4.000
A_AIGS 3.137
A_AIGE 3.567
A_AIWS 2.663
A_AIRU 2.047
A_AIAT 2.692

Variances
A_AIPW 0.089
A_AIBZ 0.000
A_AIGS 0.280
A_AIGE 0.128
A_AIWS 0.453
A_AIRU 0.380
A_AIAT 0.383

Latent Class 2

Means
A_AIPW 3.358
A_AIBZ 3.497
A_AIGS 2.828
A_AIGE 3.195
A_AIWS 2.605
A_AIRU 1.945
A_AIAT 2.402

Variances
A_AIPW 0.204
A_AIBZ 0.196
A_AIGS 0.319
A_AIGE 0.261
A_AIWS 0.420
A_AIRU 0.333
A_AIAT 0.377

Categorical Latent Variables

Means
C#1 0.436

I'm looking forward to hearing from you. Thank you very much.

Linda K. Muthen posted on Thursday, October 20, 2005 - 9:53 am

Your options are to increase the MITERATIONS as the error message suggests, hold the variance of the problem variable equal across classes, or remove the variable from the analysis.
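
A minimal sketch of how the first two options might look in the input posted above, assuming the same variable names. The MITERATIONS value of 5000 is only an illustrative choice, not a recommended setting. Leaving a_aibz out of the class-specific statements keeps its variance at the default, which holds it equal across classes:

```mplus
ANALYSIS: TYPE = MIXTURE;
  STARTS = 50 10;
  MITERATIONS = 5000;   ! illustrative value, raised from 1000
MODEL:
  %OVERALL%
  a_aipw a_aibz a_aigs a_aige a_aiws a_airu a_aiat;
  %c#1%
  ! a_aibz is not mentioned here, so its variance
  ! stays equal across classes
  a_aipw a_aigs a_aige a_aiws a_airu a_aiat;
  %c#2%
  a_aipw a_aigs a_aige a_aiws a_airu a_aiat;
```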

As a rule, if it is necessary to show output to describe a problem, you should send your input, data, output, and license number to support@statmodel.com. We try to reserve Mplus Discussion for shorter posts.

Kris Anderson posted on Monday, March 20, 2006 - 2:23 pm

I would like to run a LPA on personality trait information we have collected. This data includes both probands and siblings. I would like to examine the LPAs but feel the siblings relations should be modeled. How would I best do this?

Bengt O. Muthen posted on Monday, March 20, 2006 - 3:44 pm

See the LCA section of my paper under Recent Papers on our web site:

Muthén, B., Asparouhov, T. & Rebollo, I. (2006). Advances in behavioral genetics modeling using Mplus: Applications of factor mixture modeling to twin data. Forthcoming in the special issue "Advances in statistical models and methods", Twin Research and Human Genetics.

Kris Anderson posted on Wednesday, April 05, 2006 - 11:30 am

I have downloaded this paper and am trying to recreate these models. However, I am new to LPA, and I am not accounting for the presence of two latent class variables (and two groups of individuals) in the model. Is there another resource you might recommend?

Linda K. Muthen posted on Thursday, April 06, 2006 - 8:36 am

See Example 7.18 in the Version 4 Mplus User's Guide which is available on the website. You can also email bmuthen@ucla.edu to request the inputs.

Kris Anderson posted on Thursday, April 06, 2006 - 10:10 am

Thank you. I haven't had much luck using the sample for this model. I'll e-mail him.

Michael Beets posted on Wednesday, August 02, 2006 - 5:05 am

I am running an LTA for two time points on 10 Likert-scale items at each time point and have arrived at a 4-class model (2 classes at each wave). I am attempting to run the model with the variances freed so that they are estimated separately for each class. I am unsure whether I have specified the correct model commands to request this.

Model c1:

%c1#1%
[s3ptp1-s3ptp12*] ;

s3ptp1-s3ptp12 ;

%c1#2%
[s3ptp1-s3ptp12*] ;

s3ptp1-s3ptp12 ;

Model c2:

%c2#1%
[s4ptp1-s4ptp12*] ;

s4ptp1-s4ptp12 ;

%c2#2%
[s4ptp1-s4ptp12*] ;

s4ptp1-s4ptp12 ;

Further, when I run this I receive the following error message:

THE LOGLIKELIHOOD DECREASED IN THE LAST EM ITERATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.

WARNING: WHEN ESTIMATING A MODEL WITH MORE THAN TWO CLASSES, IT MAY BE NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION TO AVOID LOCAL MAXIMA.

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.

Any suggestion would be appreciated.

Linda K. Muthen posted on Wednesday, August 02, 2006 - 9:40 am

I would need to see your input, data, output, and license number at support@statmodel.com to answer this.

Phil Herzberg posted on Monday, October 02, 2006 - 1:30 am

Hello,

I have two questions concerning LPA:
1) In the LCA & Cluster Analysis discussion, bmuthen posted on Wednesday, February 08, 2006 - 6:29 pm that the LRT can be bootstrapped in M+4. How do I bootstrap the LRT (I assume it is not possible with the MLR estimator)?

2) In the output we got the message that IT MAY BE NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION TO AVOID LOCAL MAXIMA.
We have 5 continuous indicators with a range from 1 to 5. What is a good way to obtain starting values, and are the starting values means in this case?
Can you give an example?

Any suggestion would be appreciated.

Bengt O. Muthen posted on Monday, October 02, 2006 - 2:42 pm

1) Use TECH14 in the OUTPUT command.

2) See the "STARTS" option in the version 4.1 UG on our web site.
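
As a rough sketch of both points, assuming five indicators named y1-y5 and three classes (the names, the number of classes, and all values are hypothetical, and the DATA command is omitted): TECH14 requests the bootstrapped likelihood ratio test, STARTS sets the number of random starts, and user-specified starting values for the class-specific means can be given after an asterisk.

```mplus
VARIABLE: NAMES = y1-y5;
  CLASSES = c(3);
ANALYSIS: TYPE = MIXTURE;
  STARTS = 500 50;    ! illustrative: 500 initial-stage starts,
                      ! 50 final-stage optimizations
MODEL:
  %OVERALL%
  %c#1%
  [y1-y5*2];          ! hypothetical starting values for the means
  %c#2%
  [y1-y5*3];
  %c#3%
  [y1-y5*4];
OUTPUT: TECH14;
```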

Kelly Hand posted on Tuesday, October 03, 2006 - 6:57 pm

Hello

I have run both a latent class analysis and a latent profile analysis using 5 ordinally scaled items (5-point scale of agreement with 3 indicating "mixed feelings") about mothers' attitudes to employment and child care to create a typology of mothers' employment "preferences". I plan to test this typology with a subsample of qualitative interviews.

I have found that the LPA solution is easier to interpret and is a better solution (although they are both good). But I am concerned that it may not be acceptable to use 5 ordinal items in this manner. Is this ok to do in your opinion?

Unfortunately the survey only used a very limited number of items about this topic so I am unable to include any more items or create a scale.

I have also tried to search for a reference to support this but have had no luck. If you think it is an appropriate approach to take do you have any suggestions for a reference I could include in my paper?

Bengt O. Muthen posted on Tuesday, October 03, 2006 - 8:16 pm

This boils down to the usual choice of treating ordinal variables as categorical or continuous. I think treating them as continuous, using linear models, is often reasonable unless you have strong floor or ceiling effects. I would however worry if the two approaches gave different interpretations - if they do, I would be more inclined to rely on the categorical version. I would check that I had used a sufficient number of random starts (STARTS=) to make sure you have obtained the correct maximum likelihood solution. I can't think of relevant literature here.

Phil Herzberg posted on Wednesday, October 04, 2006 - 6:37 am

Dear Dr. Muthén,

We would like to test four models (4 variables):
1) Variances are held equal across classes, covariances among latent class indicators are fixed to zero
2) allowed for class-dependent variances but constrained covariance terms to zero
3) allowed for class-dependent variances and held selected covariances equal across classes
4) allowed for class-dependent variances and allowed free estimation of selected covariance estimates within class

Are these the corresponding model-inputs?

Ad 1) %OVERALL%
Ad 2) %OVERALL%

%c#1%
y2 y3 y4 y5;

%c#2%
y2 y3 y4 y5;

We have no idea how to write the syntax for models 3 and 4. Could you help with an example for models 3 and 4?

For model 2 we get the message: All variables are uncorrelated with all other variables within class. Check that this is what is intended. Does the model 2 syntax correspond to what is intended by hypothesis 2?

Thank you very much in advance,
Phil

Bengt O. Muthen posted on Thursday, October 05, 2006 - 7:02 am

Class-specific variances are obtained by mentioning them within each class, e.g.

%c#1%
y2-y5;

Free covariances are obtained by saying e.g.

y2 with y3;
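
Putting the two pieces together, model 4 from the question above might be sketched as follows. This is only an illustration: y2 WITH y3 stands in for whichever covariances are selected, and it assumes the usual Mplus convention that a parameter mentioned in a class-specific part is estimated separately in that class.

```mplus
MODEL:
  %OVERALL%
  y2 WITH y3;    ! frees the covariance (otherwise fixed at zero)
  %c#1%
  y2-y5;         ! class-specific variances
  y2 WITH y3;    ! class-specific covariance
  %c#2%
  y2-y5;
  y2 WITH y3;
```

Dropping the two class-specific WITH lines while keeping the one under %OVERALL% should give model 3, with the covariance free but held equal across classes.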

Phil Herzberg posted on Thursday, October 05, 2006 - 8:24 am

Dear Dr. Muthén,

thank you very much for your help.
Can we ignore the following message for model 2? (That is, does the model 2 syntax correspond to what is intended by hypothesis 2?)

All variables are uncorrelated with all other variables within class. Check that this is what is intended.

Thank you!

Linda K. Muthen posted on Thursday, October 05, 2006 - 9:45 am

Yes, you can ignore it if that is what you intended.

Phil Herzberg posted on Tuesday, October 24, 2006 - 3:25 am

Dear Linda,

thank you for your help. I was successful in reordering the classes while maintaining all other parameters. My last question is how to reorder the classes (so that the last class is the largest) in a model with this structure:

MODEL:
%OVERALL%

What happens with the bootstrapped Lo-Mendell-Rubin Likelihood ratio when the last class is not the largest one (for this model and in general)?

Thanks again, this conversation has been most helpful.

Linda K. Muthen posted on Tuesday, October 24, 2006 - 7:34 am

If you do not have class-specific MODEL parts, then you can't make the largest class last.

We delete the first class when testing the k and k-1 classes. This is why we suggest putting the largest class last. You would not want it to be deleted.
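
One common way to control the class order, sketched here with hypothetical variable names and values: take the estimated means from a previous run, supply them as starting values in class-specific parts in the desired order (largest class last), and turn off random starts so that estimation begins from those values.

```mplus
ANALYSIS: TYPE = MIXTURE;
  STARTS = 0;           ! no random starts; use the values below
MODEL:
  %OVERALL%
  %c#1%
  [y2*1.2 y3*2.5];      ! means copied from a smaller class
  %c#2%
  [y2*3.4 y3*3.9];      ! means copied from the largest class
```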

Bruce A. Cooper posted on Tuesday, February 20, 2007 - 5:31 pm

I'd like to obtain an LPA but allow correlations/covariances within class to be nonzero. Is this the way to do it?
E.G.:
...
CLASSES = c(3);
ANALYSIS:
TYPE = MIXTURE ;
MODEL:
%OVERALL%
y1 WITH y2 y3 y4 y5 y6 y7 y8 ;
y2 WITH y3 y4 y5 y6 y7 y8 ;
y3 WITH y4 y5 y6 y7 y8 ;
y4 WITH y5 y6 y7 y8 ;
y5 WITH y6 y7 y8 ;
y6 WITH y7 y8 ;
y7 WITH y8 ;

Thuy Nguyen posted on Wednesday, February 21, 2007 - 11:12 am

Yes, this will free the covariances within class while holding them equal across class.

anonymous posted on Thursday, March 29, 2007 - 11:41 am

Good day,

I'm sorry in advance if my question appears naive; I am new to these methods and work in a geographic area where few "coaches" exist.

I am trying to allow for conditional dependence (within-class correlations) in a latent profile analysis of seven different variables. I found at least four different ways of allowing conditional dependence:

(1) Including within-class WITH statements between all of my indicators.
(2) Running a model with conditional independence and relying on modification indices to allow for partial conditional independence (including within-class WITH statements between the variables that "could" be correlated according to the modification indices).
(3) Doing a factor mixture model without within-class BY statements (fixing the factor loadings to remain equivalent across classes).
(4) Doing a factor mixture model with within-class BY statements (allowing for differential factor loadings across classes).

I believe that the main advantage of models 3 and 4 is that they result in fewer parameters being estimated.

However, I believe that the real "essence" of conditional dependence is more clearly captured by models 1 or 2. Am I right?

Are there any other arguments, or advantages and disadvantages, for doing it one way or the other?

Thank you very much for your time.

Bengt O. Muthen posted on Thursday, March 29, 2007 - 8:59 pm

1. Leads to an unstable model in line with the Everitt-Hand book that we cite under Mplus Examples - not recommended.

2. MI's don't work very well with mixture models, probably due to non-smooth likelihood surface - not recommended

3. Good idea; works well

4. Ok; not always needed beyond 3. Class-varying factor variances can be introduced instead.

anonymous posted on Friday, March 30, 2007 - 3:20 am

Thank you very much for this answer.

It clarifies things a lot.

Could you please expand a bit on your answer to 4 (or suggest a reading on this topic)? I'm not sure that I properly understand why freeing the within-class factor variances would be equivalent to the model with free within-class "BY" statements, or why the more complex model will not be needed past 3 classes.

Best regards

Bengt O. Muthen posted on Friday, March 30, 2007 - 8:34 am

Letting factor variances vary across classes is not the same as letting factor loadings vary across classes. However, I have found that a model with class-invariant loadings and class-varying variances often is suitable. I have tried several variations on the factor mixture modeling theme in my articles listed under "Papers" on our web site - see especially articles under the topics General Mixture Modeling and Factor Mixture Analysis.

Alex posted on Tuesday, April 03, 2007 - 9:17 pm

My goal is to do an LPA (with two classes) of 3 variables (XX, XY, XZ). After trying the classical model, I can restrict the indicator variances to be equal within class. Then I can try a less restricted model by allowing the variances to vary between classes.

Following on from the previous discussion, if I want to allow for conditional dependence, I should rely on a factor mixture model, letting the factor variances (and maybe the loadings) vary across classes.

My question is how I can combine conditional dependence (factor mixture) with the previous modifications of equal within-class variances (A) and of unequal between-class variances (B). Can I use commands such as these, or is there any additional "twist"?
A:
%OVERALL%
f BY XX XY XZ ;
[f@0];
%c#1%
f;
[XX XY XZ];
XX (1);
XY (1);
XZ (1);
%c#2%
f;
[XX XY XZ];
XX (2);
XY (2);
XZ (2);

B:
%OVERALL%
f BY XX XY XZ ;
[f@0];
%c#1%
f;
[XX XY XZ];
XX XY XZ;
%c#2%
f;
[XX XY XZ];
XX XY XZ;

If these commands are right, it would mean that example 7.27 reflects a traditional LCA with conditional dependence, equal between-class variances and unequal within-class variances?

Bengt O. Muthen posted on Wednesday, April 04, 2007 - 7:10 am

Yes, you can do this. A couple of comments:

- it seems unusual to restrict variances to be equal across variables within classes as you do in Model A. Typically, the variables are different and therefore variances are not comparable

- your equality statements in Model A can be simplified to, say

xx xy xz (1);

- the residual variance differences across classes that you specify in Model B may not be easy to achieve because this is a less well-defined model

- Ex 7.27 is different in that it has class-varying loadings. This may not be needed, however.

Alex posted on Wednesday, April 04, 2007 - 7:32 am

Thank you very much for this answer.
In fact, the models with the variance restrictions do indeed fit worse, suggesting differences.

Michael Giang posted on Saturday, April 28, 2007 - 12:24 am

I ran an LPA with 6 continuous indicators (values ranging from -2 to +5). Here's the dilemma:

The LMR indicates a 5 class model, and this makes substantive sense.

However, the AIC/BIC/ABIC values continue to decline (never rising), and I've tested this up to an 8-class model. BUT, like a scree test, the differences in IC values between models do decline greatly after the 5-class model.

In addition, the BLRT remains non-significant at every step/model.

No warnings were found and I did "start 500 20".

I'm in the process of correlating the variables (which I am not a fan of), but thus far this has not resolved anything.

What do I make of this? Advice/suggestions?

Linda K. Muthen posted on Saturday, April 28, 2007 - 8:46 am

Sometimes statistics do not provide a clear indication of the number of classes. In this case, you need to rely on substance. It may be that LPA is not the best model for the data.

Michael Giang posted on Saturday, April 28, 2007 - 11:10 am

I'm satisfied with the 5-class model. In addition to making substantive sense, I chose it because 1) the LMR is, and remains, non-significant after the 5-class model, and 2) the IC values begin to level off after the 5-class model. And what do I make of the BLRT being non-significant at all steps? I plan on reporting the BLRT but indicating that it is a potential limitation of the study. Is this all sufficient?

Bengt O. Muthen posted on Saturday, April 28, 2007 - 12:06 pm

Your statement that BLRT is non-significant for all classes confuses me. In a k-class run, BLRT gives a p value for a k-1-class model being true versus the k-class model. So, a non-significant result (p >0.05) says that the k-1-class model is acceptable. So your statement implies that the k=1-class model is acceptable as judged by BLRT. Is this what you mean? If so, I would think BLRT is not applied correctly because it would imply that your variables are uncorrelated.

Michael Giang posted on Saturday, April 28, 2007 - 12:13 pm

Apologies. I meant that the BLRT remains significant for all classes/models, and it was tested up to the 8-class model.

Bengt O. Muthen posted on Saturday, April 28, 2007 - 12:39 pm

If BLRT is correctly applied (no warning messages), that could be a sign of having a lot of power due to a large sample size, in which case I would rely on the substantive reasons for choosing number of classes.

Michael Giang posted on Saturday, April 28, 2007 - 1:29 pm

Thanks for the quick replies. The sample size was large (2000+). However, there was one warning "to increase the number of random starts using the starts option to avoid local maxima". I got this warning after increasing starts (500 20, and 1000 20), but read that this warning is typically issued?

So the take-home messages are to rely on substantive reasons, report LMR & IC values for statistical support, and also report the BLRT (being significant at each model) while noting that it is sensitive to large sample size (thus power)?

Linda K. Muthen posted on Saturday, April 28, 2007 - 3:11 pm

The starts referred to in the warning is the LRTSTARTS option not the STARTS option. The default is LRTSTARTS = 0 0 20 5; You might try LRTSTARTS = 0 0 40 10;

I don't consider a sample size of 2000+ to be that large.

I assume that you have replicated the loglikelihood of your analysis model.
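
In the input, the LRTSTARTS suggestion above might look like the sketch below (the DATA, VARIABLE, and MODEL commands are omitted; the STARTS values are the ones mentioned earlier in this thread). TECH11 and TECH14 request the LMR test and the BLRT, respectively.

```mplus
ANALYSIS: TYPE = MIXTURE;
  STARTS = 500 20;
  LRTSTARTS = 0 0 40 10;   ! default is 0 0 20 5
OUTPUT: TECH11 TECH14;
```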

Michael Giang posted on Saturday, April 28, 2007 - 3:57 pm

I've tried increasing to LRTSTARTS = 0 0 40 10, and the warning remains. No change in IC/LMR/loglikelihood values.

Linda K. Muthen posted on Saturday, April 28, 2007 - 4:33 pm

This sounds like a problem that is specific to your model and data. If you would like us to look at it further, please send your input, data, output, and license number to support@statmodel.com.

Sanjoy Bhattacharjee posted on Wednesday, May 09, 2007 - 1:10 pm

Dear Dr. Muthen(s),

We have Y11 ... Y1T, Y21 ... Y2T, ..., Yn1 ... YnT;
Yit is continuous and Yit=f(Xit), where "i" indicates the unit and T is the Tth time period.

We want to extract the possible grouping using the Y's as indicators.

I believe our Panel-mixture analysis will be in the line of Latent-profile mixture analysis (with covariates) rather than Latent-cluster mixture analysis since the Y's are continuous. Am I right? However, there is serial correlation, or at least there is likely to be, and we need to test that.

Q1. Could you kindly suggest any established research on Panel-mixture analysis (rho across the error terms has to be calculated)?
Q2. Could we estimate the model using MPlus?

Thanks and regards
Sanjoy

Linda K. Muthen posted on Wednesday, May 09, 2007 - 2:35 pm

Yes, if the outcomes are continuous it is referred to as a Latent Profile Analysis rather than a Latent Class Analysis.

Q1. I don't know of any literature.
Q2. Yes.

Sanjoy Bhattacharjee posted on Wednesday, May 09, 2007 - 2:59 pm

Thank you Madam.
Sanjoy

Alex posted on Tuesday, May 22, 2007 - 3:01 pm

Greetings,

I'm doing a latent profile analysis (7 indicators) with covariates (4). I'm running models with conditional independence and models with conditional dependence (CD). For CD models, I rely on factor mixture models with class-varying intercepts (only).

Are there any problems if I run these analyses with standardized variables (indicators and covariates)?

Thanks in advance.

Linda K. Muthen posted on Tuesday, May 22, 2007 - 3:23 pm

I would work with the raw data. I don't know why you want to standardize. If it is because the variables have large variances, I would rescale them by dividing them by a constant. A constant is not sample dependent as are the mean and standard deviation used for standardizing.
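
A small sketch of such a rescaling, assuming a hypothetical variable y1 whose variance is large. The divisor 10 is an arbitrary fixed constant chosen for illustration:

```mplus
DEFINE: y1 = y1/10;   ! divide by a fixed constant, not the sample SD
```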

Alex posted on Tuesday, May 22, 2007 - 8:38 pm

Greetings and thanks for the fast answer,

In fact, suppose that we already did the analyses with standardized variables following the suggestion of a colleague (because it made it easier to compare the latent classes).
What kind of problems might it cause ?

Thank you very much in advance.

Linda K. Muthen posted on Wednesday, May 23, 2007 - 12:56 pm

When you standardize variables, you are analyzing a correlation matrix not a covariance matrix. This is fine if your model is scale free but not if it is not. One example of a model that is not scale free is a model that holds variances equal across classes. If a model is scale free, the same results will be obtained whether a correlation or covariance matrix is analyzed. If I were you, I would rerun the analysis using the raw data.

Alex posted on Wednesday, May 23, 2007 - 3:06 pm

Thank you again, so much for our laziness...

I imagine that the default LPM model is not scale free since variances are held equal between classes.

Does this hold for the covariates, the indicators or both?

Linda K. Muthen posted on Wednesday, May 23, 2007 - 3:20 pm

I believe if the model is not scale free, all results would be affected. Analyzing raw data is your solution.

Alex posted on Wednesday, May 23, 2007 - 3:29 pm

Thank you again. The day we manage to get these analyses done, Mplus support will clearly be at the beginning of our thank-you list.

Michael P. Marshal posted on Monday, June 18, 2007 - 12:11 pm

Hello Bengt and Linda,

Thanks once again for this valuable resource and for the workshops you have conducted. I am attempting to estimate an LPA with four continuous indicators which are age variables ranging from (age) 5 to 40. The variables represent how old participants were when they reached each of four developmental milestones. Our analyses are attempts to identify sub-groups of individuals who progress through these milestones at different paces. Not an ideal way to model developmental phenomena, of course, but the best we can do with the cross-sectional data we have. My question is whether or not this seems conceptually and statistically reasonable (assuming the models fit well, etc.) and if you know of any other published data that uses similar (age) indicator variables in LPA? I'm a little concerned with the validity of our approach.

Mike

Linda K. Muthen posted on Tuesday, June 19, 2007 - 8:17 am

Bengt and I discussed this and see no objections. It sounds like an interesting approach to looking at developmental milestones. Neither of us know of any articles that use age in this way.

Michael P. Marshal posted on Tuesday, June 19, 2007 - 12:38 pm

Thanks for your quick response. Glad to hear that you don't see any major red flags! Whew. We passed the first test... :-)

Selahadin Ibrahim posted on Tuesday, August 07, 2007 - 2:04 pm

Hello,

My statistician is helping me with a latent class analysis. We are looking at latent classes in a group of workers with low back pain. The variables we are using to distinguish between the classes are pain, functional status, depression, fear and some workplace factors. We used age, duration of complaint and time on the job as predictors of class membership. We know that pain, functional status (f.s.), depression and fear probably correlate, so we added these correlations to the model. In the output I see that there is (amongst others) a significant covariance between pain and f.s. within the 1st class (in a 2 class solution). Estimate: 24.463, S.E.: 5.0079, Est./S.E.: 4.816. How should I interpret this? Are assumptions violated?

We also saw in earlier analyses that a 4 class solution turned out to be the best fit.

Help is much appreciated, with kind regards,

Ivan Steenstra

Matthew Cole posted on Tuesday, August 07, 2007 - 3:57 pm

Hi Selahadin,

The covariance suggests that among the members of class 1, there is a relationship between pain and functional status. If the correlation is negative, then the two move in opposite directions. If the correlation is not significant in class 2, or if it runs in a different direction there, then that's a really neat finding and supports the contention that there is heterogeneity in your sample.

Regarding your 4 class solution question, are you saying that in an earlier analysis without the covariates you found a 4 class solution fit best, but that after adding the covariates only the 2 class solution fit? Bengt notes that class fit will change when covariates are added, and he has advocated that you should consider using the solution when covariates are added.

Matt

Selahadin Ibrahim posted on Wednesday, August 08, 2007 - 7:38 am

Hi Matt,

The correlations are as expected. They are somewhat different between the classes. In a few cases they are present in one class and not in the other.

We haven't done the 3, 4, (and 5) class analyses yet. (In previous analyses the 6 class solution didn't converge.) We realised we should do the analyses with the covariates added after reading Bengt's opinion on it, and to us his point makes sense. (Starting out with SPSS K-means, it's getting better all the time :-) We expect that again the 4 class solution will be best, since the individual membership doesn't change that much, but it gives great info on how the constructs fit together. I expect to find more heterogeneity in the four class solution. We were a bit worried about our n (approx. 400) when adding all these extras. We might want to look into a subgroup in our next step.

Thanks,

This is great help.

Selahadin and Ivan

Bengt O. Muthen posted on Tuesday, August 14, 2007 - 6:51 pm

Note that whenever there are at least 2 latent classes, the observed variables will correlate. If in a latent class analysis you choose in addition to correlate variables *within* classes, saying e.g.

%overall%
y1 with y2;

then this means that your y1 and y2 variables correlate more than their common influence from the latent class variable can explain - so it is like a residual correlation. Often this comes about due to similar question wording or variables logically tied to each other.

Note that this is not a model violation. Although the standard LCA assumption of "conditional independence" no longer holds, you are using a perfectly legitimate generalized latent class model.

Selahadin Ibrahim posted on Wednesday, August 15, 2007 - 12:16 pm

Dear Bengt,

OK thanks very much. This helps a lot. The variables seems to be logically tied together. The LCA shows that some do in certain classes and some don't, which is good information.

A different question. We are now looking into latent classes within a subgroup (from n=441 to n=183). Only those people that haven't returned to work at baseline interview are now included in the LCA. Same variables, same observed independent variables, but without the (residual?)correlations in the model. The 4 class solution now doesn't converge (minimum number in 1 class is 33). I expected the 3 class solution to be optimal because the " low risk class" seemed to overlap with those who had returned to work. And that's an easier variable compared to the 5 we've used to determine classes. Unfortunately we now don't get information on model fit. Is there a way to get around this? Or does it just tell us that the results should be interpreted with caution?

With kind regards,

Ivan

Linda K. Muthen posted on Wednesday, August 15, 2007 - 2:52 pm

I would not expect a four-class solution to be optimal if you have basically removed one of the classes. I would expect the three-class solution to be better. You will not get any fit statistics if the model does not converge. Is this what you mean?

Selahadin Ibrahim posted on Thursday, August 16, 2007 - 9:14 am

Hi Linda,

I also expected the 3 class solution to be the best fit.

Yes, that is what I mean. In the previous analysis (n=441) we also could get fit statistics for a model with 5 (optimal fit +1 class) classes. I was hoping to get it (for the 4 class model) in this analysis (n=183)as well. We changed the setting to 500 iterations, but that doesn't help. Well, it's a sensitivity analysis anyway.

Thanks for the great support!

Ivan

Linda K. Muthen posted on Friday, August 17, 2007 - 8:44 am

Try STARTS = 8000 800; Five hundred starts may not be enough.

Selahadin Ibrahim posted on Monday, November 19, 2007 - 7:55 am

OK that worked, thanks for that. When submitting a paper a reviewer might ask: Why 8000 iterations? Any suggestions? By the way: it seems that the SE decreases.

Another question:
One of the people in the team asked me what the main drivers for the class solution were. We now have a description of what the classes look like, and we know that the total model has a better fit in the 3 (and the 4) class solution, but which factors predict or drive the class membership? "Predict" might be a bit confusing since we are also using some confounding variables as "predictors". We were thinking of using a multinomial logistic regression to get the estimates. Do you have any other suggestions, perhaps on how we should model this using Mplus?

Thanks,

Selahadin and Ivan

Linda K. Muthen posted on Monday, November 19, 2007 - 11:01 am

It is not 8000 iterations. It is 8000 sets of initial random starts and 800 solutions carried out completely. Read in the user's guide under STARTS. You may not need that many. You should not compare standard errors from a local solution to a replicated solution.

The model is estimated with the objective of conditional independence of the latent class indicators within each class. If you want to use covariates to predict latent class membership, you can regress the categorical latent variable on a covariate or set of covariates.
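
A minimal sketch of the last point, using hypothetical variable names for the indicators and covariates described in this thread (pain, functional status, depression, fear; age, duration of complaint, time on the job) and three classes. The DATA command is omitted.

```mplus
VARIABLE: NAMES = pain fs depress fear age duration jobtime;
  USEVARIABLES = pain fs depress fear age duration jobtime;
  CLASSES = c(3);
ANALYSIS: TYPE = MIXTURE;
  STARTS = 8000 800;   ! 8000 initial-stage random starts,
                       ! 800 final-stage optimizations
MODEL:
  %OVERALL%
  c ON age duration jobtime;   ! class regressed on covariates
```

The c ON statement gives a multinomial logistic regression of class membership on the covariates, estimated jointly with the class model.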

Selahadin Ibrahim posted on Thursday, November 22, 2007 - 8:23 am

OK thanks, it's now quite obvious that it would be useful to do the course next year :-). You have been a great help.

Ivan

Vilma posted on Tuesday, November 27, 2007 - 6:35 am

I was running an LPA with 3 continuous indicators. My sample size is 210. According to the fit criteria, there could be 5 profiles, but one of the profiles has only a few people. My choice was 4 profiles (it makes more sense from a theoretical point of view). These profiles differ from each other. The reviewers are giving me a hard time about LPA with a small sample and only 3 indicators. Basically, they said that I cannot get 4 profiles from 3 indicators (it is stretching the data too far). That might be true. But could I check that somehow? Or would it be better to have more indicators?

Bengt O. Muthen posted on Tuesday, November 27, 2007 - 4:57 pm

I think what you are trying to do is possible, although the solution may not be very stable. The classic Fisher's Iris data had n=150 with 4 continuous indicators and 3 latent classes - see the Everitt & Hand (1981) book. That model did not use the LPA assumption of zero correlation within classes and so is harder to fit. Perhaps the reviewers are thinking LCA with binary indicators in which case only 2 classes can be obtained with 3 indicators.

To convince the reviewers (and yourself) you can do two things. You can use the Mplus Tech14 facility to test for the number of latent classes. You can also use the Mplus Monte Carlo facility to simulate data with exactly your parameter values and see how well or poorly the model is recovered.

Having more indicators, however, certainly helps.
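
For the Monte Carlo suggestion, a rough sketch of what such a setup might look like for 3 indicators, 4 classes, and n = 210. All population values below are placeholders and would need to be replaced by the estimates from the actual 4-class solution; the variable names, seed, and number of replications are arbitrary.

```mplus
MONTECARLO:
  NAMES = y1-y3;
  NOBSERVATIONS = 210;
  NREPS = 500;
  SEED = 53487;
  GENCLASSES = c(4);
  CLASSES = c(4);
ANALYSIS:
  TYPE = MIXTURE;
MODEL POPULATION:
  %OVERALL%
  [c#1*0 c#2*0 c#3*0];   ! placeholder class logits
  y1-y3*1;               ! placeholder variances
  %c#1%
  [y1-y3*-1];            ! placeholder class means
  %c#2%
  [y1-y3*0];
  %c#3%
  [y1-y3*1];
  %c#4%
  [y1-y3*2];
MODEL:
  %OVERALL%
  [c#1*0 c#2*0 c#3*0];
  y1-y3*1;
  %c#1%
  [y1-y3*-1];
  %c#2%
  [y1-y3*0];
  %c#3%
  [y1-y3*1];
  %c#4%
  [y1-y3*2];
```

The output then summarizes, across replications, how closely the estimates match the population values.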

Vilma posted on Wednesday, November 28, 2007 - 12:49 am

Thanks a lot!

Julie Mount posted on Friday, May 16, 2008 - 2:54 pm

I have a question related to variable types (continuous, categorical, count) within LCA/LPA. I'm working with 11 variables that are neuropsychological test subscales. These subscale scores are really sums of successes on a number of binary items (eg remember name y/n). The subscale variables range in levels from 2 to 37 and some are very skewed. I fit an initial LCA model with binary variables, dichotomising the subscale scores at the median values in this sample. A 4-class model seemed to fit the data well (with interesting results) but the modelling approach was criticised for not using all the information available in the data. I then tried modeling all the variables as count variables but ran into problems with the variables with fewer than 3 levels. A mixed categorical and count variable model ran without errors but the results are difficult to interpret (variables are on pretty different scales)and not terribly interesting, plus I have concerns with model fit (one class with very low probability and Lo-Mendell Rubin and BIC fit results conflict).

In short, I'm more comfortable with the initial binary model. My question, then, is whether I really should be concerned over potential loss of information in the binary classification model - how much value does using the full scales really add? Could my binary model findings be invalid?
Apologies if this is obvious; I'm new to latent variable modelling and Mplus.

Bengt O. Muthen posted on Friday, May 16, 2008 - 5:26 pm

No obvious decision here. Seems to me that the reduced-information binary approach is fine as long as the 11 subscales are a good summary. You could finesse it by using ordered polytomous representations of the number of successes (e.g. low, medium, high). Or you could stay with binary items, but go fancy by working with the original, total set of binary items used to create all of the 11 subscales. That large original set can be used for (1) LCA, or (2) factor analysis to see if 11 dimensions - or fewer - turn out, and then perhaps do LCA on the factors (either in 2 steps, or better still, by having a mixture for the factor means).

Julie Mount posted on Thursday, May 29, 2008 - 3:15 am

Thanks very much for your response.

Unfortunately I don't have access to the original item-level questionnaire responses, only the 11 aggregated subscale scores. Am sure there would be an interesting underlying factor structure as many of the questions could measure multiple domains of cognition. May aim to look at this in a replication analysis!

devin terhune posted on Tuesday, July 29, 2008 - 11:53 am

I am running an LPA that is very similar to example 7.9 with only minor changes. I keep getting an error that reads: "Mplus was unable to start. Please try again."

I have double-checked to make sure that my .dat file and my data path are acceptable. Any solutions would be great. Many thanks in advance.

Linda K. Muthen posted on Tuesday, July 29, 2008 - 1:07 pm

This means that the directory where Mplus.exe is stored is not part of your path environment variable. See System Requirements - The Path Environment Variable.

Mark LaVenia posted on Sunday, September 21, 2008 - 5:37 pm

We are running a latent profile analysis with 12 continuous variables. The Tech11 and Tech14 p values are very dissimilar:

Profiles   TECH11   TECH14
2          0.027    0.000
3          0.498    0.000
4          0.197    0.000
5          0.773    0.000
6          0.467    0.000
7          0.722    0.000

Are these results interpretable, that is, do they suggest going with 2 profiles, 3 profiles, or continuing until Tech14 is no longer significant? More to the point, do these divergent results suggest that there is something fundamentally wrong with our data or input specifications?

Gratefully yours, Mark

Linda K. Muthen posted on Monday, September 22, 2008 - 8:42 am

If you are not using Version 5.1, I would do that. If you are, please send your files and license number to support@statmodel.com.

Mark LaVenia posted on Saturday, September 27, 2008 - 8:07 am

Dear Dr. Muthen - Thank you for your reply. I regret to report that we ran it on version 4. Could I bother you for a brief explanation on why 5 might give different results? Gratefully yours, Mark

Linda K. Muthen posted on Saturday, September 27, 2008 - 9:07 am

We are constantly making improvements to the program and correcting problems.

Kelly Schmeelk posted on Tuesday, January 13, 2009 - 12:06 pm

I have found the proportions of membership in each class under the output for the LPA, but I was wondering how I could find out where each case was placed among the classes. Is there specific syntax for output I could request? Thanks!

Linda K. Muthen posted on Tuesday, January 13, 2009 - 1:02 pm

See the CPROBABILITIES option of the SAVEDATA command.
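
For readers wondering what saved class probabilities represent, here is a rough Python sketch of how posterior class probabilities and the most likely class follow from an estimated LPA. It assumes conditionally independent normal indicators with variances held equal across classes; all parameter values and function names are made up for illustration and this is not Mplus code.

```python
import math


def normal_pdf(x: float, mean: float, variance: float) -> float:
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def posterior_probabilities(case, proportions, means, variances):
    """Posterior class probabilities for one case under a conditionally independent LPA."""
    joint = []
    for k, proportion in enumerate(proportions):
        density = proportion
        for j, value in enumerate(case):
            density *= normal_pdf(value, means[k][j], variances[j])
        joint.append(density)
    total = sum(joint)
    return [d / total for d in joint]


def most_likely_class(posteriors):
    """1-based index of the class with the largest posterior probability."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k]) + 1


if __name__ == "__main__":
    # Hypothetical 2-class, 2-indicator model; every number here is invented.
    proportions = [0.6, 0.4]
    means = [[0.0, 0.0], [2.0, 2.0]]
    variances = [1.0, 1.0]  # equal across classes in this sketch
    probs = posterior_probabilities([1.8, 2.1], proportions, means, variances)
    print([round(p, 3) for p in probs], most_likely_class(probs))
```

Each case gets one probability per class, and the most likely class is simply the class with the largest of those probabilities.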

Alexander Kapeller posted on Thursday, February 05, 2009 - 5:38 am

small number of observations

Hi,
I am performing an LCA with items measured on a 6-point Likert scale. My number of observations is 70 for one sample and 170 for another. How can I check via Monte Carlo whether the model is appropriate? (You stated in an answer above: "You can also use the Mplus Monte Carlo facility to simulate data with exactly your parameter values and see how well or poorly the model is recovered.")

thanks
Alex

Linda K. Muthen posted on Thursday, February 05, 2009 - 6:47 am

You can use the parameter values from an LCA as population values in a Monte Carlo study with sample size 70 for the parameter values from the analysis of the sample with 70 observations and sample size 170 for the parameter values from the analysis of the sample with 170 observations. Use mcex7.6.inp as a starting point.

Alexander Kapeller posted on Friday, February 06, 2009 - 5:24 am

HI Linda,

Sorry that I didn't express myself precisely.
Which indicators in the results should I especially look at? Or is it enough to look at the power indicator?

Having already read some power studies with MC, I have problems interpreting the different outcomes for, e.g., 70 observations:

a) alpha is sign. and power over 0.8 --> everything is fine

b) alpha is sign. and power is below 0.8 --> increase of power is necessary by reducing complexity or more obs.

c) alpha is not sign. and power is over 0.8 --> no clue

d) alpha is not sign. and power is less 0.8 --> no clue

Could you give me a hint whether these interpretations are correct and/or how to interpret these cases?

thanks in advance

Linda K. Muthen posted on Friday, February 06, 2009 - 11:09 am

The type of power study I was referring to is described in the following paper which is available on the website:

Muthén, L.K. & Muthén, B.O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620.

I am unclear about what the rest of your message means.

Alexander Kapeller posted on Tuesday, February 10, 2009 - 2:47 pm

hi Linda,

I will try to explain with an example. I hope this is clearer now.
I conducted an SEM with sample size 65. The SEM results are:

V25 ON [last number=p-value]
LOY_ACC -0.210 0.424 -0.495 0.621
REPURCH 1.318 0.551 2.394 0.017
AUSDEHN -0.817 0.458 -1.786 0.074

to check for power I ran a monte carlo with nobservations=400 and nreps=500. here are the results for % Sig coeff

V25 ON
LOY_ACC 0.054
REPURCH 0.106
AUSDEHN 0.251

The interpretation gives me some headache:
For LOY_ACC and AUSDEHN: the path is non-significant and shows low power, so an increase in sample size might help. This is clear to me - is this right?
For REPURCH: the path is significant but also has low power. This is puzzling me.

Shouldn't a significant path also have high power? Or is the positive path found in only 10.6% of the replications and so nearly an artefact?
And what if there is a non-significant path in the SEM but a power of more than 0.8? Would that then also be one of the 20% of cases of not rejecting H0 although it is wrong?

I would be really glad if you could shed some light on that.

Thanks in advance

Alexander

Linda K. Muthen posted on Wednesday, February 11, 2009 - 10:04 am

The method we proposed is used to assess the power of a single parameter not a set of parameters.

It sounds like you are making a mistake in your setup. Please send your output and license number to support@statmodel.com.
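
As a generic illustration of the single-parameter idea, power in a Monte Carlo study is the proportion of replications in which the parameter is significant when data are generated from a chosen population value at the sample size of interest. The Python sketch below does this for one regression slope. The model, the slope value, and the function names are assumptions made for the example; it is not a reproduction of the Mplus Monte Carlo facility.

```python
import math
import random


def slope_is_significant(n: int, true_slope: float, rng: random.Random) -> bool:
    """Simulate one sample from y = true_slope * x + e and test the slope at the 5% level."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0.0, 1.0) for xi in x]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return abs(slope / standard_error) > 1.96  # normal approximation to the test


def estimated_power(n: int, true_slope: float, n_reps: int = 500, seed: int = 1) -> float:
    """Proportion of replications in which the slope is significant."""
    rng = random.Random(seed)
    hits = sum(slope_is_significant(n, true_slope, rng) for _ in range(n_reps))
    return hits / n_reps


if __name__ == "__main__":
    for n in (20, 65, 200):
        print(n, estimated_power(n, true_slope=0.3))
```

With a true slope of zero, the same proportion estimates the Type I error rate and should sit near 0.05, so a value close to 0.05 in such a column is not evidence of power.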

Anne Chan posted on Monday, September 21, 2009 - 4:23 am

Hello! I am doing a study which compares boys and girls in relation to their motivation, parental support and their learning outcomes. I have two questions:

1) I applied LPA to classify students into different motivational groups. When I include Gender and Parent Support as covariates, the 6-class solution is perfect, both in terms of model fit and theoretical meaning. However, if I run the LPA without covariates, no theoretically meaningful solution can be generated. How should I interpret these results?

2) I am planning to save the LPA class membership (with Gender and Parent Support) of individuals and conduct further analysis to study the differences between boys and girls, both within and between classes. However, I am still a bit confused about how to understand having gender as a covariate in LPA. Is it appropriate for me to use Gender as a covariate in LPA, particularly when the goal of my study is to compare the two genders? I mean, if gender is included in the LPA, then gender will affect the classification result, and is it methodologically inappropriate to use this "biased" classification to conduct subsequent gender comparison? Or, instead of thinking the classification is "biased", is it actually more robust to use Gender as a covariate, as it can more accurately reflect the data?

Bengt O. Muthen posted on Tuesday, September 22, 2009 - 10:52 am

1) Differences in latent classes when using and not using covariates are usually a sign that there are direct effects of the covariates on the outcomes, not only indirect effects via the latent class variable. Try exploring direct effects (you cannot identify all of them). Although that may move you away from your favorite solution, your 2 runs (without and with covariates) may agree more.

2) My opinion is that if Gender influences class membership you are fine including it in the model - the estimates will be better. The same is true for factor scores in MIMIC models.

However, doing analyses in several steps is not always desirable, particularly not with low entropy. Why not do your "further analysis" as part of this model?

For related topics, see also

Clark, S. & Muthén, B. (2009). Relating latent class analysis results to variables not included in the analysis. Submitted for publication.

under Papers, Latent Class Analysis on our web site.

Anne Chan posted on Thursday, September 24, 2009 - 4:20 am

Thanks a lot for your kind suggestion. As a follow-up of question (1), I will explore the direct effects. May I ask how can I do it? Can you please kindly point me to some examples or references? Thank you very much!

Linda K. Muthen posted on Thursday, September 24, 2009 - 10:56 am

A direct effect is the regression of an outcome on a covariate. I would do each outcome one at a time to see if there is significance.

Matt Thullen posted on Friday, October 16, 2009 - 10:45 am

Hello-

I am running LPA with 12 dimensions of relationship quality (N = 180), using a common set of 6 dimensions for two different relationships to assess patterns across the relationships.

I have a problem similar to others I have seen posted where BIC, adjBIC, and BLRT are not useful in selecting class solutions. The LMR is providing some guidance but interpretation is also an issue.

class   BIC       aBIC      LMR     BLRT
2       6203.87   6086.69   <.001   <.001
3       6070.96   5911.67   .0577   <.001
4       5975.35   5775.83   .145    <.001

I figured I could justify a 3-class solution given close to sig. LMR but the means for the 3-class solution are providing almost no differentiation among the indicators for one relationship. I am not expecting much variation in terms of shape but I think at least two levels is more accurate. LPA for the six indicators for that relationship alone provides fairly clear (again LMR only) 2-class solution.

The means for the 4-class solution provide some differentiation among the indicators for that relationship and a more interpretable set, though there is a small group (n = 10). I recognize the small N overall and in one class, and the possibility that the LMR tends to overestimate (Nylund et al., 2007), but I am considering using the 4-class solution based on substantive meaning. Could I please have any insights you may have?
Thank you

Bengt O. Muthen posted on Saturday, October 17, 2009 - 12:27 pm

The non-significance (p=0.0577) for LMR in the 3-class run says that 2 classes cannot be rejected in favor of 3 classes.

Personally, I tend to often simply listen to what BIC says, in a first step. In your case it suggests to me that because you don't have a minimum BIC you may not be in the right model ball park. Perhaps you need to add a factor to your LPA ("factor mixture analysis") and then you might find a BIC minimum.
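
To make the "no minimum BIC" point concrete, here is a small Python sketch that checks whether BIC reaches a minimum inside the range of class counts tried. The BIC values are those quoted in the table above; the helper names are illustrative.

```python
import math


def bic(log_likelihood: float, n_parameters: int, n_obs: int) -> float:
    """Schwarz BIC: smaller is better."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_obs)


def interior_bic_minimum(bic_by_classes: dict):
    """Return the class count with the smallest BIC, or None when the smallest
    BIC sits at the largest class count tried (no minimum reached yet)."""
    best = min(bic_by_classes, key=bic_by_classes.get)
    if best == max(bic_by_classes):
        return None
    return best


if __name__ == "__main__":
    # BIC values quoted in the post above (2 to 4 classes).
    posted = {2: 6203.87, 3: 6070.96, 4: 5975.35}
    print(interior_bic_minimum(posted))
```

For the posted values BIC is still falling at 4 classes, so the check returns None, which is the situation described as not having a BIC minimum.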

Matt Thullen posted on Saturday, October 17, 2009 - 7:18 pm

Thanks Bengt-

What do you mean by "add a factor"? I am basically familiar with how FMA integrates factors and classes but did you mean something specific other than "try FMA"?

I do question whether a latent variable approach is appropriate here. The dimensions are rather skewed in the positive direction for one relation and mostly bipolar for the other. With a relatively small sample for LCA this is probably why a 2-class solution emerges.

Also, I've run separate LPAs for each relationship to look at different class combinations as across-relationship patterns. With this I get a clear 2-class solution for one relation and a clear 3-class solution for the other, but again with these there is no lowest BIC, BLRT is not useful (just .000), and LMR is my only solid indicator, with more definitive p-values this time.

So if I continue to not find any lowest BIC is that evidence that latent variable approach may not be appropriate even if the LMRs are suggesting reasonable classes?

I did find that k-means cluster analyses provided an almost identical set group as the 4-class run (means and proportions) but not the 3-class run.

thanks for your help
mjt

Bengt O. Muthen posted on Sunday, October 18, 2009 - 12:44 pm

Yes, I meant try FMA. Such as a 2-class, 1-factor FMA where the item intercepts vary across the classes (factor means fixed for identification). So you could try 1-4 classes and see if you find a BIC minimum (where 1 class is a regular 1-factor model).

Matt Thullen posted on Sunday, October 18, 2009 - 7:25 pm

I ran FMA with 1 factor for classes 1-5. Still no lowest BIC, though after 3-class it decreases much less in each subsequent run. LMR points to 3-classes with this approach

2 questions:
1) Should factor means be fixed across classes? If so where?
2) How would using FMA with the one factor change my interpretation of the classes compared to LPA?

thank you

Bengt O. Muthen posted on Sunday, October 18, 2009 - 8:17 pm

Will shortly be able to send you a new paper which answers your questions. Email me your email address.

Matt Thullen posted on Monday, October 19, 2009 - 7:46 am

Thanks Bengt - I emailed you and look forward to reading that paper.

As a backup plan do you have any concerns about doing separate LPA for each relationship and looking at how combinations of classes are associated with an outcome.

The LPAs in that strategy give definitive LMR for 2-classes for one relation and 3-classes for the other which makes more sense substantively but still no lowest BIC.

thanks

Bengt O. Muthen posted on Monday, October 19, 2009 - 10:16 am

That may be a reasonable approximation.

Matt Thullen posted on Tuesday, October 27, 2009 - 8:05 pm

Bengt - that paper you sent on FMA was great...very helpful.

One question I have is what to make of residual variances greater than 1.

I searched the board and did not find anything about this. It comes up for one indicator in a 2f/2c FMM-5.

thanks

Alexandre Morin posted on Wednesday, October 28, 2009 - 7:31 am

Greetings,
Would it be possible to also receive a copy of this paper.
Thank you in advance.

Linda K. Muthen posted on Wednesday, October 28, 2009 - 7:50 am

It will be posted on the website in the next week.

Alexandre Morin posted on Wednesday, October 28, 2009 - 8:11 am

Thanks Linda,
What would be the reference so I can find it.
By the way, I just realized that Matt Thullen did ask another question in his last post. I just wanted to make sure that my posting did not "erase" this question.
Thanks again

Linda K. Muthen posted on Wednesday, October 28, 2009 - 8:40 am

The topic is Factor Mixture. The authors are Clark and Muthen.

Bengt O. Muthen posted on Wednesday, October 28, 2009 - 10:53 am

Answering Matt Thullen - residual variances can be greater than 1 when they correspond to raw estimates.

Robin Segerer posted on Monday, April 26, 2010 - 8:10 am

Hi,

I'd like to build my doctoral thesis on Latent profile Analysis, but I don't know whether I have enough statistical power to identify all relevant classes.
So I'm looking for any recommendations about sample size in Latent Profile Analysis.
Are there any articles discussing that issue? Thank you very much in advance and
best greetings from Germany...

Robin

Linda K. Muthen posted on Tuesday, April 27, 2010 - 10:14 am

You may find something in the following paper which is available on the website:

Marsh, H.W., Lüdtke, O., Trautwein, U., & Morin, A.J.S. (2009). Classical latent profile analysis of academic self-concept dimensions: Synergy of person- and variable- centered approaches to theoretical models of self-concept. Structural Equation Modeling, 16:2,191-225.

Sample size depends on the characteristics of the data and the separation of the classes. Doing a Monte Carlo study may be helpful.

Linda K. Muthen posted on Tuesday, April 27, 2010 - 11:19 am

Here are a few references you may find useful:

Lubke, G. & Muthén, B. (2007). Performance of factor mixture models as a function of model size, covariate effects, and class-specific parameters. Structural Equation Modeling, 14(1), 26-47.

Lubke, G.H. & Muthén, B. (2005). Investigating population heterogeneity with factor mixture models. Psychological Methods, 10, 21-39.

Nylund, K.L., Asparouhov, T., & Muthén, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling, 14, 535-569.

Mike Gillespie posted on Tuesday, November 02, 2010 - 10:44 am

I used two sets of four dichotomous items as measures of two factors per set (for a total of four factors) in seven-class factor mixture models in separate analyses of data from seven national election surveys. (I didn't combine the surveys in a known-group analysis because I believed the computation would be too heavy, and, in any event, I also conducted a separate stacked SEM of the same data.)
The factors have no variance, so I assume that I did an LPA? My computer doesn't have enough memory to estimate the variances (using algorithm = integration), so I guess I'm stuck with LPA.
Three (more) questions:
(1) Is the Vermunt-Magidson article still the best reference? In particular, has anyone else used probit factor analysis in a mixture model?
(2) How does one interpret a factor with no variance? Would it be correct to say that the factor means + the (largely invariant) item intercepts determine the class-specific probability distribution of an item associated with the two factors it measures?
(3) How does one interpret the variance and covariance of the two continuous variables that I also used (along with some additional categorical variables)? Ideally, I would like to interpret these quantities as the measurement error in these variables.

Jason Chen posted on Tuesday, November 02, 2010 - 12:32 pm

I would like to conduct a Latent Profile Analysis to form clusters of students based on 4 variables (x1, x2, x3, and x4). These 4 variables are considered "sources" of another variable (y). There are theoretical arguments that another variable (m1) might moderate the relationship between the sources and y. I would like to use a person-centered approach because these 4 sources do not operate in isolation. However, if I wanted to test whether m1 moderated the relationship between the sources (x1-x4) and y, how would I test that if the sources are clustered within a person?

In regression, I could compute y = x1 + m1 + x1*m1. And if the interaction term was significant, that would be evidence of moderation. But If I'm clustering the 4 sources and exploring how m1 moderates the relationship between these clusters and y, how could that be done?

Bengt O. Muthen posted on Wednesday, November 03, 2010 - 12:31 pm

It sounds like you want the latent class variable (say c) behind the x's to influence y. With a continuous y this implies that the mean of y changes across latent classes.

If you have a binary moderator m1 you can simply use that to form a Knownclass latent class variable (say cg) and let the y means change over both latent class variables (that is the default) - and then use Model Test to see if the y means for the c classes are the same across the cg classes.

Bengt O. Muthen posted on Wednesday, November 03, 2010 - 12:43 pm

Answer to Mike Gillespie:

Seven classes and 2 factors is a lot of latents. Typically, when factors are added to a latent class model you don't need as many latent classes. Conversely, if you have a lot of latent classes, the factor variances can go to zero. I would use BIC to compare the alternative models, varying the number of classes and factors.

1) You might consider my overview:

Muthén, B. (2008). Latent variable hybrids: Overview of old and new models. In Hancock, G. R., & Samuelsen, K. M. (Eds.), Advances in latent variable mixture models, pp. 1-24. Charlotte, NC: Information Age Publishing, Inc.

which is on our web site under Papers.

2) Typically a model with no intercept (or threshold) invariance across classes has a much better BIC than letting only the factor mean vary. If you don't have factor variance, the factor is not motivated except as a non-parametric device of describing the factor by a mixture - and again it may be due to having too many classes.

3) With LPA there is no within-class covariance between the continuous outcomes. The variance is a within-class variance, but not necessarily measurement error, perhaps just "severity variation".

Jason Chen posted on Wednesday, November 10, 2010 - 9:11 am

Thanks very much for the reply, Bengt. If my moderator (m1) is not binary, I'm assuming that there is no other way to test for this moderation effect other than artificially creating one on my own (e.g., median splits?).

Bengt O. Muthen posted on Wednesday, November 10, 2010 - 12:58 pm

Continuous moderation (m1) of the effect of a latent class variable on a distal y? Can't you think of that as the m1 influence on y varying over the latent classes (at the same time as the latent classes influence y by the y means varying over the classes)? So a c-m1 interaction. That's doable in Mplus.

luke fryer posted on Monday, November 29, 2010 - 6:55 am

Dr. Muthen would you please expand on your comment from " on Saturday, October 17, 2009 - 12:27 pm...":

"Personally, I tend to often simply listen to what BIC says, in a first step. In your case it suggests to me that because you don't have a minimum BIC you may not be in the right model ball park. Perhaps you need to add a factor to your LPA ("factor mixture analysis") and then you might find a BIC minimum."

I am facing a problem similar to the original post--failing to arrive at a minimum BIC for my analysis--and BLRT is also not proving to be useful, while entropy is occasionally useful. Would it be worth adding categorical variables (Gender, department, etc.) to my LPA in order to create a more decisive model? What other alternatives might I have?

Thank you,

Luke

luke fryer posted on Monday, November 29, 2010 - 7:16 am

Drs. Muthen,

One more question... At what point does the software's request for more starts--WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA. INCREASE THE NUMBER OF RANDOM STARTS.--start to be an indication of anything other than "time to increase the number of starts". The Mplus Manual gives clear advice up to 500 starts. If the warning persists, does one just continue to increase the number of starts? I have never had an analysis fail to converge, but I consistently get this warning.

thank you

luke

Linda K. Muthen posted on Monday, November 29, 2010 - 9:53 am

Not being in the right model ballpark means that LPA might not be an appropriate model for your data. Perhaps a factor analysis model is more appropriate or a factor mixture model.

You should increase your starts keeping the second number about 1/4 of the first until the best loglikelihood is replicated. It may be that you have the wrong model.
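
The rule of thumb above (second number about 1/4 of the first) can be written down as a simple schedule. This Python sketch only generates candidate pairs for the two numbers of the STARTS option and checks a list of final-stage loglikelihoods for replication; the doubling step, the tolerance, and the function names are assumptions for illustration, not Mplus behavior.

```python
def starts_schedule(initial_stage: int = 500, n_steps: int = 4, factor: int = 2):
    """Successive (initial-stage, final-stage) start pairs, final about 1/4 of initial."""
    schedule = []
    first = initial_stage
    for _ in range(n_steps):
        schedule.append((first, max(1, first // 4)))
        first *= factor
    return schedule


def best_loglikelihood_replicated(final_stage_loglikelihoods, tolerance: float = 0.01) -> bool:
    """True when at least two final-stage runs reach the best loglikelihood."""
    best = max(final_stage_loglikelihoods)
    matches = sum(1 for ll in final_stage_loglikelihoods if best - ll <= tolerance)
    return matches >= 2


if __name__ == "__main__":
    for first, second in starts_schedule():
        print(f"STARTS = {first} {second};")
```

One would move to the next pair only while the best loglikelihood is still not replicated; if it never replicates, that points back to the possibility that the model itself is wrong for the data.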

luke fryer posted on Monday, December 06, 2010 - 10:02 pm

Dr. Linda Muthen,

Could you point me, and anyone else in a similar predicament, in the right direction with regard to making this distinction between LPA and Factor Mixture Models (when using continuous variables).

I consistently fail to get a minimum BIC...

Thank you

luke

Linda K. Muthen posted on Tuesday, December 07, 2010 - 1:49 pm

It seems like the models you are trying do not capture your data. This can happen with some data.

Elizabeth Bell posted on Saturday, April 16, 2011 - 11:01 am

Hello,
I have a question in response to Robin Segerer's post on Monday, April 26, 2010 - 8:10am about determining sample size for an lpa.

I am submitting a grant for funding for my dissertation work. I will be conducting a latent profile analysis using continuous indicators of children's behavior and using demographic covariates to predict class membership. In addition, I plan to simultaneously estimate the lpa as well as a lgca of longitudinal distal outcomes to examine mean differences across the profiles in intercept and slope parameters of this distal outcome.

I read through the articles that you suggested for Robin but am wondering how to determine sample size needed for estimating the lpa and lgca simultaneously. In addition, are there any examples that you know of where this has been done to determine profile differences in intercept and slope parameters of a distal outcome?

I would greatly appreciate any guidance.
Elizabeth Bell

Linda K. Muthen posted on Sunday, April 17, 2011 - 2:17 pm

I don't have such an example. You would need to put the two together yourself. Start with the Monte Carlo counterpart inputs for the two user's guide examples that come closest to your LPA and LCGA.

Alma Boutin-Martinez posted on Tuesday, May 03, 2011 - 1:57 pm

I'm running a cross-sectional LCA with continuous (4-point Likert items) and dichotomous items. In this analysis, when I specify a 4-class solution, the mean of one of the continuous variables is fixed for three of the four classes. Why would this happen? I know that with dichotomous variables, when the logit is very small it is fixed at -15 or +15, is this similar to what is happening with these continuous variables? If so, how is the value it is fixed at chosen? Is this problematic?

Below is the output with one mean fixed for class 2.

Latent Class 2

Means
B2E 2.833 0.028 99.436 0.000
B2G 2.703 0.034 79.621 0.000
B2D 3.000 0.000 ********* 0.000
B2B 4.087 0.062 65.951 0.000

Linda K. Muthen posted on Tuesday, May 03, 2011 - 2:42 pm

I think this happens when there is no variability for the item in a class. All members have the same value.

Alma Boutin-Martinez posted on Thursday, May 05, 2011 - 3:30 pm

Thank you Dr. Muthen.

Stata posted on Friday, June 17, 2011 - 9:29 am

Dr. Muthen,

Is it possible to use ordinal variable for latent trait analysis? Thank you.

Linda K. Muthen posted on Friday, June 17, 2011 - 1:09 pm

Yes.

James L. Lewis posted on Wednesday, July 06, 2011 - 10:21 am

Hello,
I am doing LPA with N=2000 and 6 continuous indicators. Each of the indicators is a parcel of three 5-pt Likert items (created by taking the mean of the 3 items). In most solutions, certainly in all interpretable solutions, I have modification indices indicating residual within-class correlations among parcels. This would seem to indicate that conditional independence is violated.

My question is whether modification indices are the only way to get a look at conditional independence when doing LPA in Mplus. Clearly if I had categorical items I could use TECH10, but is there anything like that for the continuous indicator case, or is there otherwise another way within Mplus? I have considered freeing within-class bivariate correlations, but it seems in a previous correspondence that this was not recommended in general as a way of modeling conditional dependence (note these models are largely exploratory) -- [see post from anonymous on March 29, 2007 11:41 a.m.]. When I allow residual correlations in %overall% (which I believe constrains the correlations to be equal across classes), the within-class correlations often remain to a large extent in the modification indices. Note I have also attempted FMA but would prefer to stay with "manifest parcels" if possible.

In a related question, is there a recommendable way to create categorical indicators using the parcels (15 levels seems too many)?

Thanks much.

James L. Lewis posted on Wednesday, July 06, 2011 - 10:29 am

Sorry I may have been a bit unclear above -- each parcel consists of taking the mean of three 5-pt Likert items.

Also, when I say at the end that "15 levels seems too many" I was thinking in terms of summing (which I guess would be a maximum of 12 levels - still seems too many), but did not mean to indicate there may not be a recommendable way using the means of the items within parcels or some other way as well.

Thanks again.
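
On the number of levels: a quick enumeration suggests that a parcel of three 5-point items (scored 1 to 5) takes 13 distinct values, whether summed (3 to 15) or averaged, rather than 12 or 15. The Python sketch below shows the count and one way to collapse the sums into a few ordered categories; the cut points are purely hypothetical and would need substantive justification.

```python
from itertools import product


def parcel_sum_levels(n_items: int = 3, n_categories: int = 5):
    """Distinct values of the sum of n_items Likert items scored 1..n_categories."""
    scores = range(1, n_categories + 1)
    return sorted({sum(combo) for combo in product(scores, repeat=n_items)})


def collapse(value: int, cut_points):
    """Ordered category (0-based) for a summed score, given ascending cut points."""
    return sum(1 for cut in cut_points if value > cut)


if __name__ == "__main__":
    levels = parcel_sum_levels()
    print(len(levels), levels[0], levels[-1])
    # Hypothetical cut points giving low / medium / high.
    print([collapse(v, [6, 11]) for v in levels])
```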

Bengt O. Muthen posted on Wednesday, July 06, 2011 - 12:53 pm

It sounds like you have 6 variables, and each variable is created as the mean of three 5-point Likert items. So you have 18 Likert items - is that right?

James L. Lewis posted on Wednesday, July 06, 2011 - 1:41 pm

Yes that is correct. Thanks.

Bengt O. Muthen posted on Wednesday, July 06, 2011 - 1:46 pm

But you see within-class correlations among the 6 variables, not among the 18?

James L. Lewis posted on Wednesday, July 06, 2011 - 1:48 pm

I am doing the LPA based on the 6 parcels - so I mean within-class correlations between the parcels (in other words, lack of conditional independence).

Bengt O. Muthen posted on Wednesday, July 06, 2011 - 3:42 pm

You can use the estimated LPA model and classify subjects according to it. Then for each class see how correlated the variables are - and for which pairs.

But if the model which allows class-invariant within-class covariances has a much better BIC, or if a one-factor FMM has a much better BIC, then it is questionable to stay with the conditionally independent LPA.
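
The first suggestion above (classify subjects, then look at the correlations within each class) can be sketched in Python as follows. It assumes the indicator values and a most likely class for each case have already been saved and read in; the function names are illustrative, and a class in which an indicator has zero variance would need separate handling.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def within_class_correlations(rows, classes):
    """rows: one tuple of indicator values per case; classes: most likely class per case.
    Returns {class: {(i, j): r}} for every indicator pair i < j."""
    by_class = defaultdict(list)
    for row, c in zip(rows, classes):
        by_class[c].append(row)
    result = {}
    for c, members in by_class.items():
        columns = list(zip(*members))
        result[c] = {
            (i, j): pearson(columns[i], columns[j])
            for i in range(len(columns))
            for j in range(i + 1, len(columns))
        }
    return result
```

Pairs with sizable correlations in several classes are the candidates for class-invariant within-class covariances, which can then be compared with the conditionally independent model by BIC as described above.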

James L. Lewis posted on Wednesday, July 06, 2011 - 5:15 pm

Thanks. I will try this. I of course do not mind modeling the dependencies as long as it doesn't mean my class membership assignments and probabilities are untrustworthy?

Did you have any thoughts on whether there is a recommendable way to create categorical indicators using the parcels?

Thanks very much.

Jean Christophe Meunier posted on Sunday, July 31, 2011 - 7:40 am

I am running an LPA with N=900 and 4 continuous indicators of parenting to explore profiles of parents on both parenting and differential parenting. As parenting is intrinsically related to children's characteristics, it seems that some child variables (e.g., age) must be incorporated as endogenous covariates. However, the literature on covariates in LPA is quite vague. Here are some options that seem OK to me. I would be pleased to have your advice about the best option:
1. Regressing the latent class variable on the covariates.
This is the most commonly used approach. However, the covariates I'd like to incorporate are uniquely related to some indicators but not to others (e.g., child's age is related to parenting whereas the age gap between siblings is related to differential parenting). I'm wondering if it wouldn't be better to "link" the covariates more specifically to their relevant indicators. Other options that I imagine:
2. Regressing each indicator on its "relevant" covariates.
3. Regressing both the latent class variable and the indicators on the covariates (latent class variable on each covariate; indicators on their specific covariates).
4. Residualizing the indicators after regressing them on their specific covariates.
Also, I'd like to test the role of some predictors (not endogenous) on the latent classes.
What is the best way to test models that would include both covariates and predictors?
Thanks a lot in advance.

Linda K. Muthen posted on Sunday, July 31, 2011 - 9:42 am

I would choose 3.

Melinda Gonzales-Backen posted on Monday, August 01, 2011 - 10:02 pm

Hi,
I am looking at a 5 factor, 4 cluster solution. I requested the cluster membership variable using
SAVE = CPROBABILITIES;

I then read it into SPSS and examined the descriptives of each cluster.

First, the number of cases it labeled as each cluster differs substantially. In addition, although I would interpret each cluster based on the estimated parameters given in Mplus, when I select each cluster and examine the means, the interpretation would be much different. For example, clusters that were once classified as "low" on variable X are now classified as "high" on variable X.

Any help would be greatly appreciated.

Linda K. Muthen posted on Tuesday, August 02, 2011 - 8:04 am

You won't get the same results unless your entropy is extremely high. You are using most likely class membership not the fractional class probabilities used in the analysis.
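
A minimal sketch in Python of why the two sets of results differ, using made-up posterior probabilities (every number and function name below is hypothetical, not Mplus output). It compares class means and class sizes computed with the fractional posterior probabilities against those computed after assigning each case to its most likely class.

```python
# Hypothetical posterior class probabilities for five cases and two classes
# (each row sums to 1); x holds one observed indicator. All values are made up.
posteriors = [
    [0.95, 0.05],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.30, 0.70],
    [0.10, 0.90],
]
x = [1.0, 4.0, 5.0, 6.0, 9.0]


def weighted_class_means(x, posteriors):
    """Class means with every case weighted by its posterior probability."""
    n_classes = len(posteriors[0])
    means = []
    for k in range(n_classes):
        weight_sum = sum(p[k] for p in posteriors)
        means.append(sum(xi * p[k] for xi, p in zip(x, posteriors)) / weight_sum)
    return means


def estimated_class_proportions(posteriors):
    """Class proportions as the average posterior probability per class."""
    n = len(posteriors)
    return [sum(p[k] for p in posteriors) / n for k in range(len(posteriors[0]))]


def modal_class_means(x, posteriors):
    """Class means after hard-assigning each case to its most likely class."""
    n_classes = len(posteriors[0])
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for xi, p in zip(x, posteriors):
        k = p.index(max(p))  # most likely class for this case
        sums[k] += xi
        counts[k] += 1
    means = [s / c if c else float("nan") for s, c in zip(sums, counts)]
    return means, counts


weighted = weighted_class_means(x, posteriors)
proportions = estimated_class_proportions(posteriors)
modal, modal_counts = modal_class_means(x, posteriors)
```

In this toy example the probability-weighted means are 3.52 and 6.48 with class proportions of 0.5 each, while the hard-assigned means are about 3.33 and 7.50 with a 3 versus 2 split. The gap shrinks as the posterior probabilities approach 0 and 1, that is, as entropy approaches 1.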

Melinda Gonzales-Backen posted on Tuesday, August 02, 2011 - 8:39 am

Thank you, Dr. Muthen. So this is not a problem at all, correct? I should just be using the estimated parameters from the LPA model, correct?

Linda K. Muthen posted on Tuesday, August 02, 2011 - 8:43 am

Yes.

Meredith O'Connor posted on Monday, August 22, 2011 - 4:24 pm

Hi everyone,
I recently conducted an LPA and identified six profiles of positive functioning in my sample of 19 year olds. I then used MANOVA to compare the profiles on a number of variables measured when they were 17 years old.
A journal editor has asked me to consider using "conductional LPA analyses" rather than MANOVA, however I am unfamiliar with this technique. Would anybody know of a paper that would point me in the right direction?
Thank you!

Bengt O. Muthen posted on Monday, August 22, 2011 - 5:13 pm

I have not heard of this technique.

Meredith O'Connor posted on Monday, August 22, 2011 - 5:19 pm

Thank you for replying Dr Muthen, this clarifies for me that it must be a little used technique.
Thanks again!
Meredith

Bengt O. Muthen posted on Monday, August 22, 2011 - 5:22 pm

It doesn't Google either.

Anto John Verghese posted on Tuesday, September 06, 2011 - 1:11 pm

Dear Dr. Muthen,

I am using 22 continuous predictors (based on a 5 point Likert scale) to carry out latent profile analysis.

Is there any way to obtain the probability of each item (indicator) to each class?

Thanks!

Linda K. Muthen posted on Tuesday, September 06, 2011 - 1:45 pm

If you treat the variables as continuous, probabilities are not relevant. If you treat them as categorical, you will obtain the probabilities automatically.

Bengt O. Muthen posted on Tuesday, September 06, 2011 - 2:01 pm

I assume that when you say 22 predictors, these are actually latent class indicators. You can get a plot of the means for the indicators.

Anto John Verghese posted on Tuesday, September 06, 2011 - 2:34 pm

Thanks!

Julia Lee posted on Wednesday, September 07, 2011 - 8:15 pm

I am new to latent class analyses. I have been reading about the issue of 'minimum BIC' on the discussion board. I have n = 521. I conducted an LPA with 5 indicators (all are continuous variables).

My interpretation of the fit indices below suggests that the 4-class model is the best model. I used VLMR and LMR to help me make the final decision on the number of classes because BLRT was significant for models 2 to 6. VLMR and LMR suggested that the 4-class model was better than the 3-class model. In addition, the entropy for the 4-class model seems closer to 1.

One researcher in one of the posts mentioned his concern about the declining BIC/AIC/ABIC. What does minimum BIC mean? My BIC values were declining all the way from model 2 to 6. I did not continue to check a model with 7 classes because it didn't make sense to me to continue without a substantive reason to do so. Should I be concerned about my results and consider using FMA? I'm unclear whether I am on the right track or not.

classes  loglikelihood  AIC      BIC      ABIC     BLRT   VLMR   LMR    Entropy
2        -2836.91       5705.82  5773.91  5723.13  0.000  0.000  0.000  0.925
3        -2518.82       5081.63  5175.26  5105.42  0.000  0.604  0.608  0.893
4        -2297.61       4651.22  4770.38  4681.50  0.000  0.003  0.003  0.908
5        -2198.93       4465.85  4610.55  4502.63  0.000  0.117  0.121  0.903
6        -2120.59       4321.18  4491.41  4354.45  0.000  0.142  0.147  0.901

Thank you, in advance, for your expert advice!

Bengt O. Muthen posted on Wednesday, September 07, 2011 - 8:50 pm

It looks like you are not reading the LMR results correctly. The first instance that you get a high p-value implies that one less class should be chosen. If I am reading your table correctly, you have the p-values 0.000 (for 2 classes), and 0.608 (for 3 classes), which then implies that you should choose 2 classes.

In my experience, when BIC continues to decline with increasing number of classes without hitting a minimum, better models can be found. For instance, an FMA should be explored.
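
The two rules above can be checked against the posted table with a short Python sketch. The loglikelihoods and LMR p-values are copied from the table; the parameter count is my assumption (a default-style LPA with class-specific means, class-invariant variances, and no within-class covariances), although it does reproduce the reported AIC and BIC to within rounding. The function names and the 0.05 cutoff are illustrative.

```python
import math

N = 521  # sample size reported in the post

# (number of classes, loglikelihood, LMR p-value), copied from the table
rows = [
    (2, -2836.91, 0.000),
    (3, -2518.82, 0.608),
    (4, -2297.61, 0.003),
    (5, -2198.93, 0.121),
    (6, -2120.59, 0.147),
]


def n_free_parameters(k, n_indicators=5):
    # Assumption: k*J class-specific means, J class-invariant variances,
    # and k-1 class probabilities.
    return k * n_indicators + n_indicators + (k - 1)


def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n):
    return -2.0 * loglik + n_params * math.log(n)


def classes_by_first_nonsignificant(rows, alpha=0.05):
    """Return k-1 for the first k whose LMR p-value exceeds alpha."""
    for k, _, p_value in rows:
        if p_value > alpha:
            return k - 1
    return None


bic_values = [bic(ll, n_free_parameters(k), N) for k, ll, _ in rows]
lmr_choice = classes_by_first_nonsignificant(rows)
bic_still_declining = all(a > b for a, b in zip(bic_values, bic_values[1:]))
```

Here `lmr_choice` is 2 and `bic_still_declining` is true: BIC keeps dropping through 6 classes, so no minimum has been reached within the range that was fitted.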

Julia Lee posted on Thursday, September 08, 2011 - 11:53 am

Dr. Muthen, thank you for your feedback regarding my fit indices. I will try FMA. Is the syntax similar to the CFA mixture modeling syntax in Example 7.17 of the Mplus Version 6 manual? The factor mean in this example for class 1 of c was fixed to 1. The factor mean was fixed for identification purposes, correct?

Bengt O. Muthen posted on Saturday, September 10, 2011 - 8:49 am

For ex 7.17, the factor mean is fixed only in the last class. In the first class the factor mean is given a starting value of 1 to show that it is free in this class. Note that @ means fixed and * means free.

craig neumann posted on Thursday, September 22, 2011 - 11:40 am

Are there any materials which discuss conducting LPA when the sample is restricted to only those with high scores?

Given that some clinical measures have cut-offs, some might argue that only those above the cut should be examined to explore subtypes. However, I am wondering about finding spurious latent classes when sample variance is significantly restricted from using only cases with high scores. Put another way, shouldn't the high score classes naturally emerge when the entire sample is used?

My take from the Bauer and Curran papers is that LPA (and more generally LCA) could result in spurious latent classes if high score only samples result in nonnormal data.

Any help and guidance would be appreciated.

Bengt O. Muthen posted on Friday, September 23, 2011 - 9:44 pm

I don't know of any papers on this, but we have had similar concerns in analyzing ADHD symptoms in general population surveys versus treatment samples. It seems that when a treatment sample is used, you get subclasses of ADHD such as hyperactive only, inattentive only, whereas with a population sample some of that detail gets lost due to broader distinctions being made.

I wonder what would happen if you oversampled the high scorers.

Li xiaomin posted on Monday, October 03, 2011 - 9:07 pm

Dear Dr. Muthen,
I have a question. Suppose there are 3 data files, named "file1.dat", "file2.dat", and "file3.dat", and 3 input files, "file1.inp", "file2.inp", and "file3.inp". How can I use Mplus to analyze the 3 data files automatically and generate the associated output files (file1.out, file2.out, file3.out)?

Thank you in advance!

Linda K. Muthen posted on Tuesday, October 04, 2011 - 6:50 am

You cannot do this. You could create a bat file with the set of inputs that you want and you will receive a set of outputs. You may want to check if MplusAutomation can help you. See the website under Using Mplus Via R.
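
As a rough equivalent of the bat file idea, here is a minimal Python sketch that runs a list of input files one after another. It assumes an executable named `mplus` is on the PATH and that it accepts an input file followed by an output file on the command line; both are assumptions to check against your own installation. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(input_files, executable="mplus"):
    """Return one command per input file; file1.inp is paired with file1.out."""
    commands = []
    for name in input_files:
        inp = Path(name)
        out = inp.with_suffix(".out")
        commands.append([executable, str(inp), str(out)])
    return commands


def run_all(input_files, executable="mplus"):
    """Run each input file in turn, stopping at the first failure."""
    if shutil.which(executable) is None:
        # The executable name is an assumption; adjust it for your system.
        raise FileNotFoundError(f"'{executable}' was not found on the PATH")
    for command in build_commands(input_files, executable):
        subprocess.run(command, check=True)


# Example call (not executed here):
# run_all(["file1.inp", "file2.inp", "file3.inp"])
```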

Li xiaomin posted on Saturday, October 08, 2011 - 8:10 pm

thanks for the suggestions!

Junqing Liu posted on Thursday, October 27, 2011 - 11:56 am

Dr. Muthen,

I used the following command to save the class membership based on an LPA into a separate dataset.

SAVEDATA: SAVE=CPROBABILITIES;
FILE IS ebppostprobs.dat;

I need to do analysis using the class membership and some other variables that are included in the original dataset, but not in the class membership dataset.

How may I merge the two datasets or possibly directly save the class membership into the original dataset? I am new to Mplus. Thanks a lot!

Linda K. Muthen posted on Thursday, October 27, 2011 - 1:29 pm

You should use the AUXILIARY option for the variables from the original data set that were not used in the analysis. Then they will be saved also.

AnneMarie Conley posted on Monday, January 09, 2012 - 3:58 pm

(I apologize if this has been answered elsewhere, but I can't figure it out from the userguide.)

Is there an easy way to save the class means (and variances) from an LPA? I'm using 6.1 on a mac and need to export the means to plot the solution in a separate program. I know I can save the parameter estimates to an outfile using the ESTIMATES option, but it's not an ideal way to extract just some of the parameter estimates. If there's a faster way to do this, I'd appreciate knowing about it.

Thanks,

AnneMarie

Linda K. Muthen posted on Monday, January 09, 2012 - 4:00 pm

No, there is no way to export only certain parameter estimates. You need to save them all.

Claudia Recksiedler posted on Thursday, January 19, 2012 - 3:05 am

Dear Dr. Muthén,

I have a question concerning the treatment of missing cases in Latent Profile. I am dealing with cross-sectional data of three same-aged birth cohorts (18-29 years old) on four transitions marking adulthood: moving out of the parental home, starting the first job, getting married, and becoming a parent. For each transition, I have a status variable stating if a person already experienced the respective transition and if yes, the precise age. First, I analyzed the timing of the transitions separately using Cox regression because of the large number of censored cases for marriage and first child.
Second, I am interested in looking at all four transitions simultaneously to explore different pathways/patterns into adulthood using Latent Profile (or Latent Class Analysis) for each cohort separately in Mplus.
I am just concerned about the large number of missing cases because many subjects have not married or had children yet. Is mixture modeling able to handle the censored cases, or do I need to address this specifically in the program? Moreover, is it possible to run a Latent Profile Analysis based on the precise age at transitions, or do I have to run a Latent Class Analysis based on categorical status variables?

Thank you and kind regards,
Claudia

Bengt O. Muthen posted on Thursday, January 19, 2012 - 8:55 am

You can do LCA/LPA with continuous age and/or categorical status - that is, you can mix scale types in Mplus mixture modeling.

Mixture modeling does not handle censored cases as in survival analysis. It seems complicated to come up with a model that both determines when an event happens and then apply LPA/LCA to it, so some simpler approach is needed. For instance, restrict your analysis of marriage, child timings to the older subjects to reduce the amount of missing data.

Melinda Gonzales-Backen posted on Thursday, January 26, 2012 - 2:47 pm

I am running an LPA with 6 indicators. I want to see if the profiles differ based on ethnicity and gender. Can I use the KNOWNCLASS command for this? Should I run separate LPA models for each group first to make sure that they have a similar latent profile structure?
Thank you!

Bengt O. Muthen posted on Thursday, January 26, 2012 - 4:15 pm

Yes, you should first run separate group analyses. You can use Knownclass, but it is somewhat simpler to have the 2 variables be covariates. If the covariates influence only the latent class variable ("c ON x" in Mplus language), then you have measurement invariance, that is, the same profiles - but you allow for different class prevalences. If you have some direct effects from the covariates to the LPA indicators, then you don't have measurement invariance. The covariate analysis also shows you the class-specific means of the covariates.
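
A minimal Python sketch of what "c ON x" implies for class prevalences, assuming the usual multinomial logit form with the last class as the reference class. The intercepts and slopes are made-up values for a hypothetical 3-class model with one binary covariate; the function name is also hypothetical. The class-specific indicator means do not appear in the function at all, which is the measurement invariance part: x shifts how many people are in each class, not what the profiles look like.

```python
import math


def class_probabilities(x, intercepts, slopes):
    """P(c = k | x) under a multinomial logit; the last class is the reference."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical estimates for classes 1 and 2 (class 3 is the reference)
intercepts = [0.5, -0.2]
slopes = [1.0, 0.0]

prevalence_x0 = class_probabilities(0.0, intercepts, slopes)
prevalence_x1 = class_probabilities(1.0, intercepts, slopes)
```

With these values the prevalence of class 1 rises from about 0.48 at x = 0 to about 0.71 at x = 1. Setting both slopes to zero would make the prevalences identical in the two groups.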

Melinda Gonzales-Backen posted on Saturday, January 28, 2012 - 10:17 am

Thank you so much for your response. In the case that the groups have different structures (example: I just ran the LPA for one group and found that a 2-profile model was the best fit, whereas when all data are used, an 8-profile model is the best fit), I would not use KNOWNCLASS or the covariate method, correct? In this instance, I would assume that it would be most appropriate for me to discuss these as separate models from separate subsamples, correct? Thanks so much for your help!

Bengt O. Muthen posted on Saturday, January 28, 2012 - 11:15 am

Right.

Melissa Kimber posted on Wednesday, February 15, 2012 - 9:00 am

Hello,
Is there an input example of a latent profile analysis somewhere on the website? I have 3 continuous indicators that I would like to run an LPA on.
Thank you.

Linda K. Muthen posted on Wednesday, February 15, 2012 - 10:15 am

See Example 7.9 in the user's guide on the website. LPA is LCA with continuous latent class indicators.

Anthony Rosellini posted on Tuesday, February 21, 2012 - 9:49 am

Hello,
I am using LPA to examine 7 indicators coming from a variety of self-report and clinical interview data (i.e., scales in different metrics) and had a few questions.

1) Under what circumstances would one override the assumption of conditional independence and allow freely estimated indicator covariances within classes?
Should this decision be made primarily based on model fit (e.g., if the conditional dependence model provides lower BIC)?

Based on prior posts, it sounds like conditional dependence should be specified if method effects are suspected and not solely because of high correlations between indicators. In what other circumstances would conditional dependence be an appropriate approach?

2) It seems like the below syntax can be used to specify a model with conditional dependence (3 class model):

MODEL:
%OVERALL%
y1 y2 y3;
y1 WITH y2 y3;
y2 WITH y3;

However, is it necessary to also specify freely estimated indicator covariances within each class, or would this be redundant coding e.g.,

MODEL:
%OVERALL%
y1 y2 y3;
y1 WITH y2 y3;
y2 WITH y3;
%C#1%
y1 y2 y3;
y1 WITH y2 y3;
y2 WITH y3;
%C#2%
y1 y2 y3;
y1 WITH y2 y3;
y2 WITH y3;
%C#3%
y1 y2 y3;
y1 WITH y2 y3;
y2 WITH y3;

Thank you for the help,
Anthony

Melinda Gonzales-Backen posted on Tuesday, February 21, 2012 - 10:10 am

Hi,
I have run an LPA model in which 2 profiles emerged. I would like to see if these profiles predict a continuous outcome and if this association is moderated by a continuous variable.

My entropy is only .68 so I don't think a class-analysis strategy would be particularly appropriate here. Is there a way to look at this interaction within the LPA framework (eg, by specifying the model when I specify the 2-profile solution)?

Thank you!
-Mindy

Julia Lee posted on Tuesday, February 21, 2012 - 5:59 pm

I had my prospectus defense recently and I was asked to check the data set for nonlinearity by a committee. I am conducting LPA and LTA using Mplus to answer my research questions. I have read several book chapters and papers related to LPA and LTA prior to my proposal defense but I did not come across the issue to check for nonlinearity. Is it an assumption of these two statistical techniques to check for nonlinearity? Thanks.

Bengt O. Muthen posted on Tuesday, February 21, 2012 - 6:21 pm

I would say no. To me, nonlinearity is something that is relevant for the regression of a continuous variable on other continuous variables or with regular Pearson Product-Moment correlations. The LPA model does not consider such regressions because the continuous latent class indicators are related to a categorical (latent) variable. Nor are correlations analyzed or fitted.

Bengt O. Muthen posted on Wednesday, February 22, 2012 - 10:14 am

Answer to Anthony:

1) Note that LPA describes the correlations among the indicators. It does so as soon as you have more than one latent class. So conditional non-independence is a correlation among the indicators that is beyond what is explained by the latent class variable.

I would explore conditional non-independence if I had a priori reasons such as the methods effects that you refer to, or similar question wording.

2) It is not redundant coding but says that you believe the within-class correlations to be different in different classes. I would not recommend within-class WITH statements as a starting point - this is perhaps giving too much flexibility and may result in an unstable model (hard to replicate the best logL).
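
Point 1 can be illustrated with a small worked example in Python: even when two indicators are uncorrelated within every class, differences in class means induce a covariance between them in the full sample. A within-class covariance (conditional non-independence) is then whatever is left over beyond that. All numbers and the function name are hypothetical, and the within-class covariance is assumed equal across classes.

```python
def marginal_covariance(class_probs, means_1, means_2, within_cov=0.0):
    """Covariance of two indicators implied by a mixture.

    class_probs: class proportions (sum to 1)
    means_1, means_2: class-specific means of indicators 1 and 2
    within_cov: within-class covariance, assumed equal across classes
    """
    grand_1 = sum(p * m for p, m in zip(class_probs, means_1))
    grand_2 = sum(p * m for p, m in zip(class_probs, means_2))
    # Between-class part: how the class means move together
    between = sum(
        p * (m1 - grand_1) * (m2 - grand_2)
        for p, m1, m2 in zip(class_probs, means_1, means_2)
    )
    return between + within_cov


# Two equally sized classes, conditional independence within class
cov_lpa = marginal_covariance([0.5, 0.5], [0.0, 2.0], [0.0, 2.0])

# Same classes, plus a residual within-class covariance of 0.3
cov_dependent = marginal_covariance([0.5, 0.5], [0.0, 2.0], [0.0, 2.0], within_cov=0.3)

# A single class with no within-class covariance explains no covariance at all
cov_one_class = marginal_covariance([1.0], [3.0], [5.0])
```

Here the conditionally independent two-class model already implies a covariance of 1.0 between the indicators, while the one-class model implies none.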

Bengt O. Muthen posted on Wednesday, February 22, 2012 - 10:21 am

Answer to Melinda:

You can do this in a single analysis. Say that you have latent class variable c influencing continuous outcome y, moderated by continuous predictor x. Moderation is handled by letting y ON x be different in the different c classes. This is so, because moderation is an interaction between c and x.
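
A minimal Python sketch of that idea, with made-up class-specific intercepts and slopes for the regression of y on x (all names and values are hypothetical). The moderation effect is simply the difference between the class-specific slopes.

```python
# Hypothetical class-specific regression of y on x for a 2-class model
class_params = {
    1: {"intercept": 2.0, "slope": 0.5},
    2: {"intercept": 1.0, "slope": -0.25},
}


def expected_y(x, latent_class):
    """Expected y for a case in the given class at covariate value x."""
    params = class_params[latent_class]
    return params["intercept"] + params["slope"] * x


def slope_difference(class_a, class_b):
    """How much the effect of x on y differs between two classes."""
    return class_params[class_a]["slope"] - class_params[class_b]["slope"]


interaction = slope_difference(1, 2)
```

If the two slopes were equal, x would still predict y, but the class effect on y would be the same at every value of x, so there would be no moderation.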

AnneMarie Conley posted on Wednesday, February 22, 2012 - 1:01 pm

I really want to use LPA for a person-centered analysis I'm doing, but I'm having trouble getting it to perform as well as cluster analysis. This is frustrating, as I find Mplus so much easier to use than programs like Sleipner. I have tried a variety of ways of specifying the LPAs (including fixing and freeing variances across classes and variables). After I decide on the number of classes, I compare the LPA solution with a cluster solution from a recently published paper using the same data. In all cases, across multiple outcome variables, the cluster solution does a better job of explaining variability.

Can you help me figure out what I'm doing wrong? Based on the readings suggested on this site, I feel like the right specification of a latent variable model should be as good (if not better) at producing a useful classification system. I understand the problems inherent in the clustering algorithms, especially in the presence of heterogeneity of variances. Still, why would the CA produce a more precise classification?

If you have any ideas or people I could talk to about this (I'm local), I'd appreciate it. Thanks.

Bengt O. Muthen posted on Wednesday, February 22, 2012 - 1:20 pm

Doesn't one of the early chapters in the 2002 LCA book by Hagenaars & McCutcheon claim that LPA with equal variances across variables is similar to k-means clustering?

Regarding the published paper, how do you know which solution is better - what does "explaining variability" mean? That sounds like a principal component criterion.

You could do a small Monte Carlo simulation using Mplus and then see which approach best recovers the model.
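
On the similarity to k-means mentioned above, here is a minimal Python sketch of the assignment step only. It assumes a single variance shared by all indicators and classes, no within-class covariances, and equal class proportions; under those assumptions the most likely class is the class with the nearest centroid, which is the k-means assignment rule. The centroids, points, and function names are hypothetical, and the sketch says nothing about how the two methods estimate the centroids.

```python
import math


def nearest_centroid(point, centroids):
    """k-means style assignment: index of the closest centroid."""
    distances = [sum((x - m) ** 2 for x, m in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def most_likely_class(point, centroids, variance, class_probs):
    """LPA style assignment with one common variance and independent indicators."""
    scores = []
    for prob, c in zip(class_probs, centroids):
        squared_distance = sum((x - m) ** 2 for x, m in zip(point, c))
        # Log posterior up to a constant that is shared by all classes
        scores.append(math.log(prob) - squared_distance / (2.0 * variance))
    return scores.index(max(scores))


centroids = [(0.0, 0.0), (4.0, 4.0)]
points = [(0.5, 1.0), (1.9, 1.9), (3.0, 2.5), (5.0, 4.0)]

same_with_equal_probs = all(
    nearest_centroid(p, centroids) == most_likely_class(p, centroids, 1.0, [0.5, 0.5])
    for p in points
)

# With unequal class proportions the two rules can disagree near the boundary
kmeans_label = nearest_centroid((1.9, 1.9), centroids)
lpa_label = most_likely_class((1.9, 1.9), centroids, 1.0, [0.1, 0.9])
```

For the point (1.9, 1.9) the nearest centroid is the first one, but with class proportions of 0.1 and 0.9 the most likely class is the second. Freeing variances across classes or indicators moves the two approaches further apart.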

AnneMarie Conley posted on Wednesday, February 22, 2012 - 3:50 pm

Yes on the Hagenaars & McCutcheon book. I have it on my desk right now (open to Vermunt & Magidson's chapter 3). (You recommended it on this site. It was very helpful).

Regarding the published paper (mine--out this month in Jrnl of Ed Psych) I used CA to describe patterns of motivation at time 1, then tested for differences between clusters in affect and achievement outcomes at a later wave. Like others, I argue that a person-centered approach gives a better picture of what motivation is than traditional variable-centered approaches. However, CA is very time consuming and (based on the readings suggested here) likely to make assumptions about the data that may not be appropriate.

My next step is to use a similar approach to describe developmental changes in patterns of motivation by conducting CA (or LPA) by grade for kids in grades 7-12. I want some evidence that the LPA solution is as trustworthy (and useful) as the CA solution. It seems like it should be at least as good, but I'm having trouble finding evidence of it. Can you direct me to other ways of establishing the utility of an LPA solution (as compared with CA)? Or is that not even a question I need to answer, in your opinion? Thanks for your help, by the way.

Bengt O. Muthen posted on Wednesday, February 22, 2012 - 4:28 pm

For a description of the advantages of LCA/LPA over k-means clustering, see also Magidson, J. and Vermunt, J.K. (2002). Latent class modeling as a probabilistic extension of K-means clustering. Quirk's Marketing Research Review, March 2002, 20 & 77-80. (pdf)

with pdf at http://spitswww.uvt.nl/~vermunt/quirk2002.pdf

I have also seen more recent papers comparing the two approaches, with LCA/LPA not always winning, but I can't find those published. Anyone else?

My prior would be to go with LPA instead of cluster analysis. Particularly if you want to study changes - I don't know of a longitudinal cluster analysis procedure that relates clusters over time.

AnneMarie Conley posted on Thursday, February 23, 2012 - 9:42 am

Thanks for that 2002 ref. It does a great job summarizing reasons for preferring LCA/LPA over clustering.

Bergman, Magnusson, and El-Khouri (2003) describe a few procedures for longitudinal CA (e.g., LICUR), but I agree with you that LPA is preferable. I think a better test may be to use the posterior probabilities instead of the most likely class when computing time 2 means for the LPA solution. In that way I could take advantage of the probability-based classification (the first point in the article you posted). If you can think of any papers taking this approach, I'd be grateful for the direction. Thanks again for the help. If you are ever in Orange County I'll gladly buy you lunch.

Alexandre Morin posted on Thursday, February 23, 2012 - 10:30 am

Hi Dr. Muthen,
Is this the paper? Steinley & Brusco (2011). Evaluating mixture modeling for clustering: Recommendations and cautions. Psychological Methods, 16(1), 63-79.
It shows that CA can outperform LPA. The full Psychological Methods issue also includes answers by McLachlan and Vermunt. Things are tricky with clean simulated data. In my experience, mixture analysis of real, messy data always involves interacting with the data through error messages, relaxing restrictions, etc., to get at a final model that is never more than the best "approximation" of reality. In the end, I think the decision is practical. I prefer the flexibility of mixture models since they are part of the generic latent variable family. Assumptions can be relaxed and imposed, and fully latent models can be specified with various degrees of class invariance (see the 2011 special ORM issue on latent class procedures, which includes illustrations of CFA invariance testing across unobserved subgroups), as can factor mixture models and even cross-group LPA invariance. These can be implemented in mixture models (and result in substantively interesting new parameter estimates), but not in CA. This is especially true of growth mixture models, where the "developmental trends" cannot be clearly taken into account in CA models (see the recent Morin et al. article in SEM, 2011, 18, 4, pp. 613+ on the advantages of flexibility).

AnneMarie Conley posted on Thursday, February 23, 2012 - 11:18 am

Alexandre, this is really helpful and answers many questions. Thank you. Offer of lunch extended to you, too.

Bengt O. Muthen posted on Thursday, February 23, 2012 - 11:23 am

Yes, that's one of the ones I was thinking of.

Anthony Rosellini posted on Tuesday, February 28, 2012 - 6:10 am

Hello,
I have conducted an LPA on 7 indicators in a sample of 1200 individuals and found that the BIC continues to decline up until a 9-class model (BIC increases for 10- and 11-class models). However, the LMR indicates a 4-class solution (i.e., the first non-significant LMR was found for the 5-class model). It is noteworthy that I am using a clinical sample and the majority of the indicators are positively skewed.

1) Should I be concerned that LPA may not be the appropriate model given that the BIC continues to decline up until a 9-class model but the LMR indicates a 4-class solution? Or is it safe to assume that LPA is appropriate given that the BIC eventually did reach a minimum? I have also conducted a single-factor FMA and found that the BIC declines up until a 6-class solution, but that LMR still indicates a 4-class solution.

2) Is the interpretation of LMR influenced by size of my sample or the fact that my indicators are positively skewed? Many LPA papers I have read seem to have convergence in deciding the number of classes using BIC and LMR, however many of these studies used much smaller samples (e.g., N=200 or 300). I also found one study using a larger sample that rescaled positively skewed indicators into ordered categories. Would you recommend doing something like this?

Thanks for the help!

Linda K. Muthen posted on Wednesday, February 29, 2012 - 6:07 pm

1. Sometimes statistics cannot guide you. I would let the substantive interpretation of the classes guide me. Many of the class profiles may be very similar.

2. Not that we know of. I would not rescale skewed indicators into ordered categories.

Anthony Rosellini posted on Thursday, March 22, 2012 - 9:48 am

I have arrived at a 6 class solution for my latent profile analysis, but have noticed that the saved class probabilities differ in the solution with random starting values vs. the solution in which I specify the last class to be the largest (e.g., to interpret tech11 and tech14).

Is this discrepancy expected?

Which class probabilities should I use for secondary analyses?

Linda K. Muthen posted on Thursday, March 22, 2012 - 2:16 pm

Make sure you have replicated the best loglikelihood several times in the first analysis. When you specify the largest class to be the last class, be sure you obtain that loglikelihood.

Julia Lee posted on Saturday, March 24, 2012 - 5:06 pm

I am conducting:

LPA cross-sectional data (spring of first grade) and LTA longitudinal data (fall and spring of first grade).
a) Are the LPA and LTA robust to floor effects and outliers? Is this an issue since mixture distribution is allowed but normality within each latent class is assumed? Is there a way to check for normality within each subgroup or should I assume in theory that normality was met for each subgroup? Because this is an unselected sample of first graders, some of the variables were positively skewed and there were outliers.

Bengt O. Muthen posted on Saturday, March 24, 2012 - 5:54 pm

No, LPA and LTA are not robust to floor effects and outliers. As in my other reply to you, it may be better to treat the outcomes as censored-normal (there are other alternatives too).

Outliers can be detected by several methods in Mplus, for example the likelihood contribution - see UG.

Anthony Rosellini posted on Wednesday, April 04, 2012 - 1:31 pm

I am trying to include covariates in my latent profile analysis in order to evaluate meaningful between-class differences (e.g., multinomial logistic regressions) on various outcomes. I am noticing that the classes change substantially when regressed onto continuous covariates that are closely related to the latent profile indicators (using one self-report measure of depression as one of the profile indicators; regressing classes onto a different self-report measure of depression). In contrast, the classes do not change in any meaningful way when I regress them onto related categorical covariates (e.g., depression diagnosis).

Is it appropriate to model direct effects between class indicators and closely related covariates in a situation such as this?

It seems like you sometimes recommend modeling direct effects between covariates and class indicators (i.e., if classes are changing substantially after including covariates). However, in other posts you also caution against accepting a mixture solution that changes substantially with the addition of covariates.

Linda K. Muthen posted on Wednesday, April 04, 2012 - 1:39 pm

You might find the following paper which is available on the website helpful:

Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.

Junqing Liu posted on Tuesday, April 10, 2012 - 10:23 am

I ran the following syntax in Mplus 6 to conduct an LPA. All the observed variables are continuous. I kept getting the error message "ERROR in VARIABLE command
CLASSES option not specified. Mixture analysis requires one categorical latent variable." How may I fix the problem? Thanks a lot.

VARIABLE:

Usevariables =turnov1 turnov2 turnov3 transf1 transf2 transf3 transf4 late1 late2r absent1 absent2r absent3 absent4 behave1 behave4 behave6 behave7 behave8 behave9 behave10 search1r search2r search3 search4 county;
Classes=c(3);
Cluster=county;
Within=turnov1 turnov2 turnov3 transf1 transf2 transf3 transf4 late1 late2r absent1 absent2r absent3 absent4 behave1 behave4 behave6 behave7 behave8 behave9 behave10 search1r search2r search3 search4;
Idvariable=pin;

ANALYSIS:
type = mixture twolevel;
Process=8(STARTS);

Model:
%WITHIN%
%OVERALL%
%BETWEEN%
%OVERALL%
C#1; C#2; C#1 WITH C#2;

output:

tech11 tech14;

SAVEDATA: SAVE=27var3cl2lvPROB;

Julia Lee posted on Tuesday, April 10, 2012 - 2:06 pm

I have a question about LPA. Because I have missing data in the covariates, I used multiple imputation. However, Tech 11 and Tech 14 are not available with multiple imputation. Is there some other way I can get the VLMR, LMR, and BLRT p-values? If these are not available, does this mean that entropy and looking for a reduction in LL, AIC, BIC, and ABIC would be the only way to decide how many classes best fit the data and compare it to substantive theory? Thanks. I appreciate your response.

Bengt O. Muthen posted on Tuesday, April 10, 2012 - 6:21 pm

Answer to Junqing Liu:

You must include a NAMES= statement that tells Mplus about the variables in your data. USEV is for the variables to be used in the analysis.

Bengt O. Muthen posted on Tuesday, April 10, 2012 - 6:22 pm

Yes, information criteria would be the only available approach. But BIC isn't bad.

J.D. Haltigan posted on Thursday, April 19, 2012 - 12:07 pm

Quick question re: LPA that may have already been answered ad infinitum; I just have not had a moment to do a thorough search.

In the case of a three-class solution with 8 continuous indicators, how is it that the estimated mean parameter for a given indicator yields a significant z-value in the LPA framework, yet when I use the resultant groups to compare b/w group differences on the indicators in the ANOVA framework, the groups do not significantly differ from one another on a subset of the indicators that yielded significant z-values in the LPA framework?

Does this have to do with the local independence assumption? Or is it that the z-value tells one that the estimated mean for that latent class is significantly different from zero (yet may not be significantly different between the classes)?

Many thanks

Linda K. Muthen posted on Thursday, April 19, 2012 - 12:24 pm

I assume that when you use the ANOVA framework, you are using most likely class membership. This is not what is used in the LPA where each person is in each class proportionally. Depending on entropy, these can be different.

J.D. Haltigan posted on Thursday, April 19, 2012 - 12:47 pm

Thanks, Linda. Yes, in the ANOVA framework, I am using most likely class membership. When you say that in the LPA each person is in each class proportionally what do you mean exactly?

Linda K. Muthen posted on Thursday, April 19, 2012 - 2:54 pm

A posterior probability is estimated for each person in each class. After model estimation, most likely class membership is determined by examining these model estimated posterior probabilities.
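
A minimal Python sketch of how such posterior probabilities follow from Bayes' rule once the model parameters are known, assuming normally distributed indicators that are independent within class and have class-invariant variances. The parameter values and function names are hypothetical.

```python
import math


def normal_logpdf(x, mean, variance):
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mean) ** 2 / variance)


def posterior_probabilities(observed, class_probs, class_means, variances):
    """P(class k | observed indicators) for one case."""
    log_joint = []
    for prob, means in zip(class_probs, class_means):
        # log of (class proportion times within-class density of the indicators)
        value = math.log(prob) + sum(
            normal_logpdf(x, m, v) for x, m, v in zip(observed, means, variances)
        )
        log_joint.append(value)
    largest = max(log_joint)  # subtract the maximum for numerical stability
    weights = [math.exp(v - largest) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical 2-class, 2-indicator model
class_probs = [0.5, 0.5]
class_means = [[0.0, 0.0], [2.0, 2.0]]
variances = [1.0, 1.0]

clear_case = posterior_probabilities([0.0, 0.0], class_probs, class_means, variances)
ambiguous_case = posterior_probabilities([1.0, 1.0], class_probs, class_means, variances)
most_likely = clear_case.index(max(clear_case)) + 1  # 1-based class label
```

The case sitting on the class 1 means gets a posterior of about 0.98 for class 1, while the case halfway between the two profiles gets 0.5 for each class. Most likely class membership keeps only the larger of the two numbers.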

J.D. Haltigan posted on Thursday, April 19, 2012 - 4:32 pm

Yes, I understand the posterior probabilities are used to derive class enumeration. So I guess my question is still why would the derived latent classes (used in an ANOVA framework as a manipulation check) show no significant b/w group differences on a given indicator which has evidenced a significant estimated mean parameter in the LPA itself. Apologies if I am missing the obvious.

Linda K. Muthen posted on Thursday, April 19, 2012 - 6:14 pm

Because in the LPA the means are not compared across the most likely class a person is in but the posterior probabilities for all classes are used for each person. Only if classification is perfect will they be the same. What is your entropy?
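
The distinction between posterior-weighted means and means computed from most likely class membership can be illustrated with a small sketch. Everything here is hypothetical (a two-class model for a single indicator with made-up parameters and data, not values from this thread), and the weighted mean shown is only one update step under fixed parameters rather than a full model estimation.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posteriors(x, weights, means, sd):
    """Posterior class probabilities for one observation via Bayes' rule."""
    joint = [w * normal_pdf(x, m, sd) for w, m in zip(weights, means)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical model and data (assumptions for illustration only).
weights = [0.5, 0.5]   # class proportions
means = [0.0, 2.0]     # class means of the indicator
sd = 1.0               # common within-class standard deviation
data = [-1.0, 0.0, 0.8, 1.2, 2.0, 3.0]

post = [posteriors(x, weights, means, sd) for x in data]

def weighted_mean(k):
    """Class mean where every person contributes in proportion to p(class k)."""
    num = sum(p[k] * x for p, x in zip(post, data))
    den = sum(p[k] for p in post)
    return num / den

def modal_mean(k):
    """Class mean using only the people whose most likely class is k."""
    members = [x for p, x in zip(post, data) if p.index(max(p)) == k]
    return sum(members) / len(members)

for k in range(2):
    print(k, round(weighted_mean(k), 4), round(modal_mean(k), 4))
```

In this toy case the two kinds of means differ because several observations have posterior probabilities well away from 0 and 1; they would coincide only if every posterior probability were exactly 0 or 1.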

J.D. Haltigan posted on Thursday, April 19, 2012 - 8:13 pm

Entropy is .921
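
For readers unfamiliar with the entropy value, here is a minimal sketch of the relative entropy commonly reported for mixture models, computed from a matrix of posterior probabilities. The two posterior matrices are invented for illustration, and the formula is the usual 1 - sum(-p ln p) / (n ln K) definition, assumed here to correspond to the value reported in the output.

```python
import math

def relative_entropy(posteriors):
    """1 - sum_i sum_k(-p_ik * ln p_ik) / (n * ln K).

    Values near 1 indicate sharp classification; values near 0 indicate
    that the posterior probabilities are close to uniform.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n * math.log(k))

# Hypothetical posterior probabilities for three people and three classes.
sharp = [[0.99, 0.01, 0.00], [0.02, 0.97, 0.01], [0.00, 0.03, 0.97]]
fuzzy = [[0.50, 0.30, 0.20], [0.40, 0.40, 0.20], [0.30, 0.30, 0.40]]

print(round(relative_entropy(sharp), 3))
print(round(relative_entropy(fuzzy), 3))
```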

Bengt O. Muthen posted on Thursday, April 19, 2012 - 9:09 pm

Your initial message talked about significant z-values for the indicators in an LPA. I assume that you meant significant differences in indicator means across classes? If so, I think you need to send relevant files to be able to answer this.

J.D. Haltigan posted on Friday, April 20, 2012 - 9:11 am

Thank you. I actually think Linda's answer above is what I am trying to ask. Perhaps if I restate my question more clearly just to be sure.

I have k = 8 indicators, all continuous. I fit a 3-class model (as well as 2 and 4). 3 seems best from the perspective of all of the fit indicators available.

The profile plot clearly shows that it is the latter 4 indicators that best separate the groups (for the first four, the lines are tightly packed together). It gives the impression of a lightning bolt across the sky.

The estimated means for each of the three classes all have significant z-value parameters for the first four indicators (the ones whose lines are tightly packed in the plot).

I then ran the ANOVA on the derived classes and sure enough, the three groups did not show b/w group differences on the first four indicators but did on the latter four (as the plot would suggest). This got me confused as to the following:

Why would the estimated mean parameter values in the LPA for the first four indicators be significant, yet fail to reveal these differences in the context of the ANOVA (as a manip check). If the sig. value of the est. means in the LPA is a function of the posterior probabilities (rather than the most likely class membership) I follow. If not, I am still conceptually unclear.

Linda K. Muthen posted on Friday, April 20, 2012 - 10:00 am

Yes, the LPA is a function of the posterior probabilities not the most likely class membership.

J.D. Haltigan posted on Friday, April 20, 2012 - 1:34 pm

So, a significant z-value for an indicator in a given class in the output means that within that class the estimated mean for that indicator is significantly different than zero? In other words, what exactly does the significant mean estimate 'technically' mean as a function of class membership (particularly in the context of my current situation, where the estimated means by class are significant yet the resultant classes themselves are not different [ANOVA] on a subset of the indicators).

Linda K. Muthen posted on Friday, April 20, 2012 - 1:44 pm

Please send files that show exactly what you are comparing and your license number to support@statmodel.com.

J.D. Haltigan posted on Saturday, April 21, 2012 - 1:59 pm

I actually was able to reach out to a friend who clarified my question for me. Simply put (and perhaps my question was not clear), the significance value for a mean estimate for a given indicator references whether that mean is significantly different from 0. In the context of an ordinary LPA (no covariates, no grouping variable), is this interpretation correct?

Bengt O. Muthen posted on Saturday, April 21, 2012 - 2:18 pm

z values in the Mplus output are always testing against zero. That this does not test class differences in means is what I was trying to say in my response.
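
A small numerical sketch of this point: each class mean can be far from zero while the difference between two class means is small relative to its standard error. The estimates and standard errors below are hypothetical, and the covariance between the two estimates is assumed to be zero for simplicity; a proper test of the difference would use the estimated covariance from the model.

```python
import math

def z_against_zero(estimate, se):
    """z statistic for the null hypothesis that the parameter equals zero."""
    return estimate / se

def z_difference(est_a, se_a, est_b, se_b, cov_ab=0.0):
    """z statistic for the null hypothesis that two parameters are equal.

    cov_ab is the covariance between the two estimates (assumed 0 here).
    """
    var_diff = se_a ** 2 + se_b ** 2 - 2.0 * cov_ab
    return (est_a - est_b) / math.sqrt(var_diff)

# Hypothetical class-specific means of one indicator and their standard errors.
m1, se1 = 3.10, 0.20
m2, se2 = 3.25, 0.22

print(round(z_against_zero(m1, se1), 2))        # large: mean differs from zero
print(round(z_against_zero(m2, se2), 2))        # large: mean differs from zero
print(round(z_difference(m1, se1, m2, se2), 2)) # small: means do not differ
```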

J.D. Haltigan posted on Saturday, April 21, 2012 - 2:26 pm

I got it. Apologies for the lack of clarity in my question(s). I was switching from LCA to LPA and got a bit bewildered in the process moving from item response probabilities and thresholds to means and variances.

Katy Roche posted on Monday, April 23, 2012 - 9:27 am

What is the best approach for conducting latent profile analysis with 20 imputed data sets (created in SPSS)? Do I need to create one combined data file from those in order to conduct the LPA?

Linda K. Muthen posted on Monday, April 23, 2012 - 1:50 pm

They should be in separate files. See Example 13.13. Note that you can impute data in Mplus. See DATA IMPUTATION and examples in Chapter 11 of the user's guide.
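
As background on why the imputed data sets are kept separate: the analysis is run on each data set and the results are then combined. The sketch below shows the standard combining rules (Rubin's rules) for one parameter. The estimates and standard errors are hypothetical, and it is an assumption here that this matches what the software does internally.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Combine one parameter's results across m imputed data sets.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical results for one class mean from three imputed data sets.
pooled, pooled_se = pool_rubin([1.0, 1.2, 0.8], [0.1, 0.1, 0.1])
print(round(pooled, 3), round(pooled_se, 3))
```

Note that the pooled standard error is larger than any single-data-set standard error because it also reflects the variation between imputations.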

Maartje Basten posted on Friday, May 11, 2012 - 9:24 am

Dear Dr. Muthen,

I performed a latent profile analysis with 6 continuous variables, N=5000.
I examined 1 to 4 classes. For the 2,3 and 4 class solutions there were a
number of starting value runs that did not converge. In addition, for the
3 and 4 class solutions I got the following warnings:

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE
DEFINITE FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING
VALUES.

THE MODEL ESTIMATION HAS REACHED A SADDLE POINT OR A POINT WHERE THE
OBSERVED AND THE EXPECTED INFORMATION MATRICES DO NOT MATCH.
THE CONDITION NUMBER IS -0.136D+00.
THE PROBLEM MAY ALSO BE RESOLVED BY DECREASING THE VALUE OF THE
MCONVERGENCE OR LOGCRITERION OPTIONS.

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE
COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE
AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR
STARTING VALUES. PROBLEM INVOLVING PARAMETER 2.
RESULTS ARE PRESENTED FOR THE MLF ESTIMATOR.

I increased the STARTS to 500 50 and the MITERATIONS to 5000, but this did
not help. I found that one of the variables is causing these problems, but
I do not want to exclude this variable. Do you know how I could solve
these problems? Thank you.

Linda K. Muthen posted on Friday, May 11, 2012 - 10:24 am

Please send your output and license number to support@statmodel.com.

Anat Zaidman-Zait posted on Wednesday, June 06, 2012 - 10:42 am

Hello, I am conducting a Latent Profile Analysis using a set of 8 behavioral characteristics. Based on the results, I have identified 5 classes. Currently, I am interested in including covariates in the model. When I run the model with the covariates, with 5 classes, I end up having a different number of participants in each of the classes in comparison to my initial analysis. Hence, I thought to set the class means for each of the variables based on the initial analysis results. How do I set the class means for each class in the syntax?
Thank you.

Linda K. Muthen posted on Wednesday, June 06, 2012 - 4:21 pm

When this happens, it points to the need for direct effects. See the following paper on the website for more information:

Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.

deana desa posted on Friday, June 08, 2012 - 8:12 am

Hello Dr. Muthen,

I have fundamental questions regarding latent class analysis and latent profile analysis, which are confusing me.

Here is the situation. The data that I have measure 6 behaviors. Each behavior has 9 categories, and is evaluated (scored) for 12 different cases. Thus, the cases can be considered cross-sectional(?) measures.

Is LCA a correct analysis to profile the data I have?

Thanks! I appreciate it.

Do you have a recommendation for literature for me to start with for this LCA for profile analysis?

Linda K. Muthen posted on Friday, June 08, 2012 - 11:24 am

The basic distinction between LCA and LPA is the scale of the latent class indicators. In LCA, they are categorical. In LPA, they are continuous. If you treat your latent class indicators as categorical, you have an LCA.

See the Topic 5 course handout and video on the website. There are many references there.

Ting posted on Thursday, June 21, 2012 - 8:48 am

Hi Drs. Muthen,

I am interested in the latent profiles of students (n = 488). I have two questions about LCA:

1) If the variables (Likert-scale questions on epistemic beliefs) I use to generate the latent classes are multi-dimensional (i.e., I ran an EFA with them and three factors were extracted), then should I still use LCA (Mplus ex 7.9)? If not, what model (and Mplus example) should I use?

2) The smallest BIC is when modeling 6 classes (class sizes ranging from 23 to 100). Is the 6-class result trustworthy?

Thanks!

Linda K. Muthen posted on Friday, June 22, 2012 - 9:50 am

1. You could consider 7.17.

2. I would consider other factors. See the Topic 5 course handout and video.

Robert Young posted on Friday, June 29, 2012 - 8:32 am

Hello,
I have a question concerning Latent Profile Analysis (LPA). J.D. Haltigan touched on the issue in an earlier post.

I have conducted a LPA of 12 different coping methods (12 items), each measured on a 1-5 scale. All fairly normally distributed.

In LCA the output reports which indicators differ between the latent classes (Latent class 1 vs. latent class 2 might significantly differ on three items). Is there an equivalent output for LPA? If not, is there an easy way to make this comparison? I suppose a series of equality constraints could be used, but that seems incredibly cumbersome.

A related question: how can I determine which indicators contribute most to discriminating between the latent classes and which are minor? Ideally I would like to rank the importance/contribution of each item.

Thanks in advance.
Robert

Bengt O. Muthen posted on Friday, June 29, 2012 - 1:24 pm

I don't know what you mean when you say that LCA output reports which indicators differ between the latent classes.

For both LCA and LPA the interpretation of the classes is most easily obtained by the PLOT command, asking for SERIES and getting the mean/probability profiles over the indicators for each class.

You can as you say test differences across classes between the means of the LPA, but that is cumbersome. There is not an automatic way that I know of to determine which indicators best discriminate between classes, but one has to look at for which indicators the mean estimates differ the most across classes. Some indicators may be good at discriminating between some classes and some other indicators good for discriminating other classes. It would be hard to get a simple summary of this I would think. To get at the significance you can use Model Test, but again it is cumbersome to do it for all possible differences.

J.D. Haltigan posted on Friday, June 29, 2012 - 1:35 pm

Just as an addendum to Robert's post, the issue that I finally got straight after figuring out how to articulate my question properly is that the significance test of the indicator in the LPA analyses tests whether the indicator is significantly different from zero with respect to a given class. This is usually of little substantive import although I guess one could make the case that if the indicator is not significantly different from zero (for all classes?) then perhaps it could be dropped from the indicator set?

Bengt O. Muthen posted on Friday, June 29, 2012 - 1:53 pm

Mplus gives the test of significantly different from zero as a standard and in some cases this test is not of interest at all - this is one such case. The means don't have to be significantly different from zero for the indicators to be useful.

Robert Young posted on Monday, July 02, 2012 - 2:33 am

Dear Dr Muthen (and J.D. Haltigan), Thank you both for the comments. Very helpful.

I suppose one way would be to center and/or standardize the variables - then at least that way I will know which items in any latent class differ from the average/typical response!

RE: 'I don't know what you mean when you say that LCA output reports which indicators differ between the latent classes.'

Perhaps I am misinterpreting or misrepresenting this, but Mplus latent class analysis provides a test of differences in odds ratios between classes:

e.g.
http://www.ats.ucla.edu/stat/mplus/seminars/introMplus_part2/lca.htm

*******************************
LATENT CLASS ODDS RATIO RESULTS

Latent Class 1 Compared to Latent Class 2
*******************************

Regards

Robert

Bengt O. Muthen posted on Monday, July 02, 2012 - 7:50 am

I see what you mean about the odds ratios - yes, something analogous can be added for continuous indicators.

Maria Kapantzoglou posted on Tuesday, July 03, 2012 - 8:06 am

Hello,
I conducted a LPA and I was wondering if you could explain why:

(i) the FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
BASED ON ESTIMATED POSTERIOR PROBABILITIES

and

(ii) the CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

provide different estimates. I am not sure which one to report. Both are very similar but not the same.

Thank you,
Maria

Linda K. Muthen posted on Tuesday, July 03, 2012 - 10:53 am

The first one is based on the model where individuals have a posterior probability for class membership in each class. The second is based on the largest class membership for each individual. I would report the first one.
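
To make the difference between the two sets of counts concrete, here is a minimal sketch with a hypothetical matrix of posterior probabilities (four people, two classes). The first set of counts sums each person's posterior probabilities; the second counts each person once, in the class with the largest posterior probability.

```python
# Hypothetical posterior probabilities: one row per person, one column per class.
post = [
    [0.90, 0.10],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.20, 0.80],
]
n_classes = 2

# Counts based on estimated posterior probabilities (fractional membership).
posterior_counts = [sum(row[k] for row in post) for k in range(n_classes)]

# Counts based on most likely class membership (each person assigned once).
modal_counts = [0] * n_classes
for row in post:
    modal_counts[row.index(max(row))] += 1

print([round(c, 2) for c in posterior_counts])
print(modal_counts)
```

Here the posterior-based counts are 2.25 and 1.75 while the most-likely-class counts are 3 and 1; the two agree more closely as classification becomes sharper.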

Mario Mueller posted on Thursday, August 09, 2012 - 5:15 am

Dear Linda,

I performed an LPA in a sample of about n = 9,000 and got 2 classes. Then, based on a sample of selected high scorers (10%), I performed another LPA and got 3 classes, of which two were similar to the total sample solution plus one more class that was differently shaped.
I'm interested in whether these two similar classes of both models are comparable? I tried to compare proportions via crosstabs but I'm uncertain what it reveals. Any suggestions?

Thanks, Mario

Linda K. Muthen posted on Thursday, August 09, 2012 - 10:35 am

I would look at the profiles of the means/thresholds and compare them visually. You can use the PLOT command with the SERIES option to plot the profiles.

Mario Mueller posted on Monday, August 13, 2012 - 4:50 am

Dear Linda,

Thank you for your reply.
We have already used the PLOT command to visualize the profiles of both models.
Visually, the profiles of the two classes in solution #1 (all participants) look very similar indeed to the first two classes of solution #2 (subgroup of high scorers only).
We are just wondering whether there is a test to examine whether this similarity can be statistically proven? For example, could test X tell us that, yes, the two solutions, albeit stemming from different samples, are not significantly different?

Thank you,
Mario

Linda K. Muthen posted on Monday, August 13, 2012 - 9:09 am

I don't know of any such test.

Adam Myers posted on Thursday, August 30, 2012 - 11:53 am

In LPA, how important is it that the continuous variables used to estimate the class solution approximate a normal distribution? Is it customary to run the typical diagnostics (histograms, etc.) and correct for non-normality by taking the logs of the variables, etc.? Does doing this sort of thing make an important difference? I haven't been able to find advice on this matter in the literature. Your input would be much appreciated. Thanks in advance.

Linda K. Muthen posted on Thursday, August 30, 2012 - 1:25 pm

I would deal with non-normality by using the MLR estimator which is robust to non-normality.

Susan Pe posted on Thursday, September 20, 2012 - 12:10 pm

I am doing a Latent Profile Analysis. Other than using the Vuong-Lo-Mendell-Rubin test, the Lo-Mendell-Rubin adjusted LRT, and the parametric bootstrapped likelihood ratio test, someone recommended that I also check with MANOVA to make sure the groups differ, as people do with cluster analysis. Does that make sense for LPA? Thank you.

Linda K. Muthen posted on Thursday, September 20, 2012 - 2:55 pm

To do a MANOVA you would need to use most likely class membership. I think a better approach is to test if the means are different across classes in your LPA model using MODEL TEST.
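
For intuition about what such a test of equal means computes, here is a sketch of a Wald test that one indicator's mean is equal across three classes. The estimates and their covariance matrix are hypothetical (off-diagonal covariances are set to zero only to keep the numbers simple), and this hand calculation is an illustration of the general Wald statistic, not a reproduction of the program's output.

```python
import math

def wald_equal_three_means(means, cov):
    """Wald chi-square (2 df) for H0: m1 = m2 = m3.

    means: three estimated class means.
    cov:   3x3 covariance matrix of those estimates.
    Returns the statistic and its p-value.
    """
    d1 = means[0] - means[1]
    d2 = means[1] - means[2]
    # Variances and covariance of the two contrasts d1 and d2.
    a = cov[0][0] + cov[1][1] - 2.0 * cov[0][1]
    c = cov[1][1] + cov[2][2] - 2.0 * cov[1][2]
    b = cov[0][1] - cov[0][2] - cov[1][1] + cov[1][2]
    det = a * c - b * b
    # Quadratic form d' S^{-1} d using the closed-form 2x2 inverse.
    wald = (c * d1 * d1 - 2.0 * b * d1 * d2 + a * d2 * d2) / det
    # Chi-square survival function with 2 df is exactly exp(-x / 2).
    p_value = math.exp(-wald / 2.0)
    return wald, p_value

# Hypothetical estimates for one indicator in a three-class model.
est_means = [3.10, 3.25, 3.05]
est_cov = [
    [0.0400, 0.0, 0.0],
    [0.0, 0.0484, 0.0],
    [0.0, 0.0, 0.0500],
]
wald, p_value = wald_equal_three_means(est_means, est_cov)
print(round(wald, 3), round(p_value, 3))
```

Unlike a MANOVA on most likely class membership, a test of this form works with the model-estimated means and their sampling variability.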

Jung-Ah Choi posted on Friday, September 21, 2012 - 3:51 am

Dear Linda,

Thank you, as always, for your help. I'm doing a Latent Profile Analysis. A 4-class solution was most appropriate. Next, I added predictors of the classes. However, all the coefficients were the same across classes. I'm not sure what the problem is. My syntax and output are as follows.

Syntax: ...

MODEL:%OVERALL%
Zdep WITH Zanx Zagg;
Zanx WITH Zagg;
c#1 ON grade se sef fb fa sex school;

output :

Parameterization using Reference Class 1

C#2 ON
GRADE 0.079 0.029 2.760 0.006
SE -0.450 0.055 -8.114 0.000
SEF 0.033 0.060 0.547 0.585
FB -0.362 0.044 -8.166 0.000
FA -0.303 0.045 -6.787 0.000
SEX -0.042 0.051 -0.821 0.412
SCHOOL -0.201 0.091 -2.212 0.027

C#3 ON
GRADE 0.079 0.029 2.760 0.006
SE -0.450 0.055 -8.114 0.000
SEF 0.033 0.060 0.547 0.585
FB -0.362 0.044 -8.166 0.000
FA -0.303 0.045 -6.787 0.000
SEX -0.042 0.051 -0.821 0.412
SCHOOL -0.201 0.091 -2.212 0.027
......

Linda K. Muthen posted on Friday, September 21, 2012 - 6:08 am

These coefficients are held equal across the classes as the default. You need to mention the ON statement in the class-specific parts of the MODEL command to relax this equality.

Vinay K. posted on Monday, September 24, 2012 - 7:12 am

Hello Drs. Muthen,

I ran an LPA model, where latent clusters were extracted from two latent
variables (say, depression and anxiety), each of which consist of three item
scales.

The three-cluster solution was judged the best according to LMR-LRT test and
other fit indices as well as meaningfulness of the cluster profiles.

A journal reviewer asked me to test the conditional independence assumption
and to report pairwise residuals.
So I inserted Tech 10 in the Mplus output, but it gave me the warning
"TECH10 option is only available with categorical or count outcomes.
Request for TECH10 is ignored."

So it seems that Tech10 cannot be used for continuous variables. What
should I do to get pairwise residuals?

I have not used Mplus a lot. I'd appreciate it if you could help me out on
this.

Linda K. Muthen posted on Monday, September 24, 2012 - 11:51 am

TECH10 is available for categorical outcomes.

Jung-Ah Choi posted on Monday, October 01, 2012 - 12:26 am

Dear Linda,

I always do appreciate your help.
I'm running an LPA. I'd like to examine the effects of the classes (4 classes) on one continuous outcome variable. Is it possible to analyze the classes as a predictor? I got error messages when I used a MODEL command like "sa (continuous outcome variable) ON c(4);". Would you tell me how to specify the syntax, if it is possible? Thanks in advance.

Linda K. Muthen posted on Monday, October 01, 2012 - 6:16 am

This effect is found in the varying of the means or thresholds across classes. You don't use an ON statement.

Oxnard Montalvo posted on Wednesday, October 31, 2012 - 6:55 pm

Hi,
I am running an LPA on 9 continuous observed variables. What is the implication for my results if I allow the variance of the observed variables to vary across the classes?

And is it correct that there would be no 'equivalent' to this in LCA (i.e. equivalent to freeing the variance of indicators across classes in LPA), since the indicators are binary?

Thanks

Linda K. Muthen posted on Thursday, November 01, 2012 - 11:38 am

If you relax the equalities of the variances across classes, the model is less stable and it may be more difficult to replicate the best loglikelihood. You can look at profiles of the indicators for each class to assess how much within class variability there is and relax the necessary variances.
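
One way to see why the model becomes less stable is to count the free parameters. The sketch below assumes an LPA with no within-class covariances among the indicators, and the 3-class case is a hypothetical choice since the number of classes was not stated in the question.

```python
def lpa_free_parameters(n_indicators, n_classes, class_specific_variances):
    """Number of free parameters in an LPA with uncorrelated indicators."""
    means = n_indicators * n_classes
    if class_specific_variances:
        variances = n_indicators * n_classes  # one variance per indicator per class
    else:
        variances = n_indicators              # variances held equal across classes
    class_probabilities = n_classes - 1       # proportions must sum to 1
    return means + variances + class_probabilities

# 9 indicators as in the question; 3 classes is an assumed example.
equal = lpa_free_parameters(9, 3, class_specific_variances=False)
free = lpa_free_parameters(9, 3, class_specific_variances=True)
print(equal, free)
```

With these assumed numbers, freeing every variance raises the count from 38 to 56 parameters, which is why relaxing only the variances that appear to need it can be a reasonable compromise.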

John G. Orme posted on Saturday, December 29, 2012 - 9:37 am

Hi Linda,

Suppose that you are doing a latent class analysis with standardized measures that have arbitrary and different scales (e.g., a standardized measure of marital satisfaction with a potential range from 0 to 100, and a measure of marital conflict with a potential range of 0 to 20). Also, suppose that you allow the means and variances of the indicators to vary across classes. Would there be a problem with transforming the raw score to standard scores in this situation? I wonder because it seems like there are advantages to doing this (e.g., it makes it a lot easier to interpret the profile plot because you can interpret differences between classes and other differences as differences in standard deviation units).

Thanks for any advice you can give me about this. My apologies if I'm missing the obvious here!

Bengt O. Muthen posted on Saturday, December 29, 2012 - 2:33 pm

I don't think standardization would be problematic here. Your modeling is not making comparisons across variables (or the same variable across time as with growth), but only across classes.

Yan Liu posted on Saturday, December 29, 2012 - 8:26 pm

Just want to follow up this question. What if I were doing a latent transition profile analysis? Would standardization work across time? Thanks!

Linda K. Muthen posted on Sunday, December 30, 2012 - 6:29 am

No, you should not standardize when you are comparing across time.

Yan Liu posted on Sunday, December 30, 2012 - 8:51 am

Is that because we will not be able to compare the changes of means over time after standardization? Thanks.

Linda K. Muthen posted on Sunday, December 30, 2012 - 4:47 pm

Yes.
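
A tiny numerical sketch of this point, using hypothetical scores for three people at two time points: standardizing each time point separately forces both means to zero, so the raw change of 2 points disappears.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert values to z-scores using their own mean and standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical scores for the same three people at two time points.
time1 = [8.0, 10.0, 12.0]
time2 = [10.0, 12.0, 14.0]

raw_change = mean(time2) - mean(time1)

# Standardizing within each time point removes the mean difference.
z1 = standardize(time1)
z2 = standardize(time2)
standardized_change = mean(z2) - mean(z1)

print(raw_change)
print(abs(standardized_change) < 1e-12)
```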

Niamh Bennett posted on Friday, March 08, 2013 - 10:52 am

Hello,

I have run the following model and am wondering how to interpret the value of [gpa2] for each class. Are these values simply the mean of gpa2 for each class, while holding sex1 and gpa1 at the level of the sample mean?

MODEL:

%OVERALL%

c on sex1 gpa1;

%C#1%
[papp pavoid efficacy mastery];
[gpa2];

%C#2%
[papp pavoid efficacy mastery];
[gpa2];

%C#3%
[papp pavoid efficacy mastery];
[gpa2];

Linda K. Muthen posted on Friday, March 08, 2013 - 11:20 am

If gpa2 is a continuous variable, this is a mean.

Niamh Bennett posted on Monday, March 11, 2013 - 4:59 pm

Hello again,

Yes, gpa2 is a continuous variable, and I understand that [gpa2] is the code to request a mean. However, what is unclear to me is how the variables in the "c on sex1 gpa1" portion of the model affect the estimated means for each of the latent classes. In my situation, are the means for each class estimated while holding gpa1 and sex1 at the sample mean?

Linda K. Muthen posted on Tuesday, March 12, 2013 - 9:44 am

Sex1 and gpa1 predict the classes. Gpa2 varies across the classes that are predicted by sex1 and gpa1. There is no direct relationship between gpa2 and the other variables.

Jason A Chen posted on Thursday, March 14, 2013 - 7:17 am

Dear Drs. Muthen,

I am forming profiles based on four variables (Immersion, Interest, Usefulness, and Relatedness). These items were assessed immediately following (T2) a technology activity that students participated in. I would like to see if certain pre-intervention variables (T1) predict membership into these profiles, and whether these profiles are related to outcomes that I assessed after the intervention was over (T4). In Mplus, I know that there is the AUXILIARY (e) and AUXILIARY (r) functions, which perform this function. Does it make sense to do this given that the variables predicting latent class membership occur at Time 1 (T1) and the correlates of latent class membership occur at Time 4 (T4)? Or is there some other code that I should enter to account for this difference in time?

Thank you!

Kari Visconti posted on Thursday, March 14, 2013 - 11:31 am

Hello!
I am currently running an LPA with four indicators of class membership. I am interested in including a control variable to directly predict one of these class indicators. Is it possible to simply include an "ON" statement in the overall model command? If so, what are the implications for interpreting output? For example, the indicator that is being predicted by another variable is presented in the output within each class as an intercept rather than a mean.
Thank you!

Linda K. Muthen posted on Thursday, March 14, 2013 - 2:34 pm

Jason:

I don't see a problem with this. See Web Note 15 which shows a new 3-step approach. It currently needs to be done manually.

Linda K. Muthen posted on Thursday, March 14, 2013 - 2:35 pm

Kari:

Yes, you can include an ON statement in the overall part of the MODEL command. And yes, you will then be estimating an intercept instead of a mean.

Christine McWayne posted on Monday, March 18, 2013 - 4:05 pm

Hi Dr. Muthen,

We are using Mplus to run an LPA to see if different profiles of family engagement exist and to examine the relations between these profiles and demographic characteristics and child outcomes.

When we looked at the results, all but 2 of the auxiliary variables were not in the expected metric. We then looked at the class membership information that was saved, and also found the variables not in the order that was identified in the output.

Can you help us understand why this happened and how this can be resolved? I tried looking at the forum but couldn't seem to find anything about this.

Linda K. Muthen posted on Monday, March 18, 2013 - 4:08 pm

Please send the files and your license number to support@statmodel.com. Please show in the output what you mean.

Niamh Bennett posted on Thursday, March 21, 2013 - 10:58 am

I have a three class model with distal outcomes. Is there any way in mplus to test the effect of class membership on the distal outcomes while controlling for other variables? In particular, I'm interested in knowing whether the effect of class membership is related to a distal outcome, even when controlling for prior levels of the distal outcome.

Linda K. Muthen posted on Thursday, March 21, 2013 - 1:37 pm

You can regress the distal outcome on the control variables. The relationship between class membership and the distal outcome is then the varying of the intercept rather than the mean of the distal outcome across classes.

Andreas Mokros posted on Friday, April 12, 2013 - 12:50 am

Dear Sir/Madam:
I noticed that in LPA the means and variances for the latent classes differ from the means/variances that would result if one computed them solely based on the most likely class a person is in. As you mentioned in response to an earlier posting, this is due to the fact that "the posterior probabilities for all classes are used for each person". Now I am wondering which values to report in a paper. Wouldn't it be easier for the purposes of replication to focus solely on the means/variances as implied through the most likely class instead of the posterior probabilities for all classes? Especially since we would like to provide a Bayesian classification function for assigning new cases to the classes? It would be most appreciated if you could point us in the right direction.
Thank you and kind regards,
Andreas

Linda K. Muthen posted on Friday, April 12, 2013 - 10:31 am

I would not use most likely class membership. I would use the model estimated values. And for prediction, I would use the SVALUES option of the OUTPUT command to obtain the input including ending values as starting values. I would change the asterisk to the @ symbol, which fixes the parameter values, and use that input as a prediction mechanism.
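
For readers who want to see what classification with fixed parameter values amounts to, here is a minimal sketch of assigning a new case by Bayes' rule. All parameter values are hypothetical stand-ins for model estimates, and the indicators are assumed to be uncorrelated within class.

```python
import math

def log_normal_pdf(x, mean, variance):
    """Log density of a normal distribution at x."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mean) ** 2 / variance)

def classify(case, class_probs, class_means, class_variances):
    """Posterior class probabilities for a new case under fixed parameters."""
    log_joint = []
    for prob, means, variances in zip(class_probs, class_means, class_variances):
        total = math.log(prob)
        for x, m, v in zip(case, means, variances):
            total += log_normal_pdf(x, m, v)  # indicators independent within class
        log_joint.append(total)
    top = max(log_joint)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(v - top) for v in log_joint]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]

# Hypothetical fixed values for a two-class model with two indicators.
class_probs = [0.6, 0.4]
class_means = [[0.0, 0.0], [2.0, 2.0]]
class_variances = [[1.0, 1.0], [1.0, 1.0]]

new_case = [2.0, 1.5]
posterior = classify(new_case, class_probs, class_means, class_variances)
print([round(p, 4) for p in posterior])
```

The new case is assigned to the class with the largest posterior probability, and reporting the full set of probabilities preserves the classification uncertainty.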

Ting Dai posted on Thursday, April 25, 2013 - 7:56 am

Dear Drs. Muthen,

I have 2 measures, each with 16 items (continuous variables), and there are 3 latent factors for each measure.

If I want to see the classification of individuals with these 32 items, what LPA model should I use?

I thought about a regular LPA with all 32 items (ex7.9), but because there are 2 measures I think perhaps I should do a LPA model with two latent class variables (ex7.14)?

A general but related question is:
If the observed indicators are known to be multidimensional (i.e., loaded on multiple factors), should LPA/LCA be used to do classification at all?

Thanks in advance for your reply!

Bengt O. Muthen posted on Thursday, April 25, 2013 - 9:04 pm

You can do many different model variations for this, either letting latent class variables influence the factors or the items directly. In the former case, you can specify one latent class variable for each set of 3 factors for the 2 measures.

You have models of this kind shown in UG examples 17, 26, 27.

Carey posted on Monday, June 24, 2013 - 2:21 pm

I am trying to decide whether to use LPA or a cluster analysis with my data, but am having trouble finding resources that may help me decide. Is the simplified answer that a profile analysis is used more when you are looking at several variables that are somehow related (e.g., scales on one measure like the MMPI), whereas in a cluster analysis you can use several different measures and kinds of measures?

My research question involves looking at several risk and protective factors (individual, family, environmental) which I hypothesize will create several distinct classes that differ on their potential for "resilience" (e.g., high on risk factors, few protective factors in one group; low on risk factors, high on protective factors in another). I do not believe that my factors are necessarily related as latent constructs, so am not sure if LPA is the right approach.

Finally, I will have two data points and am interested in looking at how the classes predict to outcomes.

Thank you!

Bengt O. Muthen posted on Tuesday, June 25, 2013 - 9:02 am

I would say that the two methods can be used for the same sort of data and research questions.

Here is a good paper comparing K-mean clustering and mixture modeling:

Vermunt, J.K.. (2011). K-means may perform as well as mixture model clustering but may also be much worse: Comment on Steinley and Brusco (2011). Psychological Methods, 16, 82-88.

With two time points and an interest in classes predicting outcomes, I would choose LPA and LTA. See also our Web Note 15.

Carey posted on Tuesday, June 25, 2013 - 1:36 pm

Thank you! Is there a limit to the number of measures that one can use for LPA? And should the measures used to predict your classes be related?

I ask because I am using several individual level variables (self-esteem, IQ), family level variables (parental monitoring, quality of home life), peer level (friendships), and environmental (neighborhood) to predict my classes. Does this limit the usefulness of LPA since I do not expect these variables to necessarily "hang together" as latent constructs?

Bengt O. Muthen posted on Tuesday, June 25, 2013 - 2:00 pm

No, no, and no.

The predictors of your latent class variable do not have to be thought of as a construct. The class variable is the construct.

If you have many predictors, you may want to put the predictors on the auxiliary list and request R3STEP. See UG and Web Note 15.

Wilfried Smidt posted on Tuesday, July 09, 2013 - 6:46 am

Dear Prof. Muthen,

I have conducted a Latent Profile Analysis and I am considering the possibility of rejecting a model due to conditional probabilities of 1 or 0, as stated by some researchers. Is this an appropriate way to deal with this problem?
Thank you very much.

Wilfried

Linda K. Muthen posted on Wednesday, July 10, 2013 - 2:32 pm

You should not reject a model due to probabilities of 1 or 0. This can help define the classes.

Reem Saeed posted on Monday, October 14, 2013 - 8:21 am

Hello,

I am new to Mplus and LCA, and I am trying to come up with LCs of socioeconomic status in my country. I have 118 groups (i.e., IDs or areas). I've done the analysis with indicators that are proportions calculated from the total population for each area, and have also tried it using simple counts, e.g. the number of people who are illiterate as an indicator.

Input:

Variable:
Names are ..... (all variables in data);
usevariables = ..... (variables I have chosen to use, includes the ID or area);
classes = c(2);
Analysis:
Type = mixture ;
starts = 500 100;
stiterations = 50;

output:
WARNING: THE BEST LOGLIKELIHOOD VALUE WAS NOT REPLICATED. THE
SOLUTION MAY NOT BE TRUSTWORTHY DUE TO LOCAL MAXIMA ...

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE
TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE
FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING
VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE
CONDITION NUMBER IS -0.185D-16. PROBLEM INVOLVING PARAMETER 61.

ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE
INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE
MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT

Advice please?
Thanks

Linda K. Muthen posted on Tuesday, October 15, 2013 - 10:39 am

Please send your output and license number to support@statmodel.com.

emmanuel bofah posted on Wednesday, October 30, 2013 - 7:40 pm

Chapter 7 of the user's manual states that "In contrast to factor analysis, however, LCA provides classification of individuals". Does this mean that if I am saving my SPSS data into a .dat file, I should flip the data so that individuals become the columns and variables become the rows?

Bengt O. Muthen posted on Wednesday, October 30, 2013 - 8:30 pm

No, you are thinking of Q factor analysis which is something different. In LCA the latent variable is nominal, so unordered categorical - people get placed in categories.

emmanuel bofah posted on Wednesday, October 30, 2013 - 9:20 pm

OK. Thank you very much. I was thinking of something between Q factor analysis and latent class profiling. Is Q factor analysis possible in Mplus?

Linda K. Muthen posted on Thursday, October 31, 2013 - 10:24 am

Mplus does not have an option for this. You can try changing the data to have variables as rows and observations as columns. Using that for a CFA is Q factor analysis. I'm not sure if having more columns than rows will create an estimation problem.

Kelly DeMartini posted on Thursday, November 07, 2013 - 1:20 pm

Good afternoon,

I have conducted an LPA with a 10-item scale. A reviewer would like us to assess the local independence assumption of our LPA and assess the bivariate residuals. As Tech10 is not available for continuous data, is there a way for me to get this information? If there is not, how should I best assess this assumption? Thanks for your help.

Bengt O. Muthen posted on Thursday, November 07, 2013 - 2:41 pm

You can request RESIDUAL. Although no test is provided, this might give you an idea for which within-class covariances you want to let free. You can also look at Modindices to guide you in this way. TECH12 is a further possibility. And if you have a high entropy (> 0.8, say), you can divide subjects into classes using most likely class and compute within-class correlations.
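The last suggestion (assign each subject to the most likely class, then compute within-class correlations) can be sketched outside Mplus. The following is a minimal Python illustration, not Mplus syntax. The function names and the toy data are hypothetical, and it assumes the indicator values and most likely class memberships have already been exported, for example from a saved CPROBABILITIES file.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def within_class_correlations(rows, classes):
    """rows: one tuple of indicator values per subject.
    classes: most likely class for each subject.
    Returns {class: {(j, k): r}} for every indicator pair j < k."""
    by_class = {}
    for row, c in zip(rows, classes):
        by_class.setdefault(c, []).append(row)
    out = {}
    for c, members in by_class.items():
        cols = list(zip(*members))  # transpose to one sequence per indicator
        out[c] = {
            (j, k): pearson(cols[j], cols[k])
            for j in range(len(cols))
            for k in range(j + 1, len(cols))
        }
    return out

# Hypothetical toy data: two indicators, two classes.
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9),
        (10.0, 3.0), (11.0, 2.0), (12.0, 1.2)]
classes = [1, 1, 1, 2, 2, 2]
result = within_class_correlations(rows, classes)
```

Large within-class correlations in such a table would point to the indicator pairs where a freed covariance might be considered. Because hard assignment ignores classification uncertainty, this is only reasonable when entropy is high, as noted above.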

Kelly DeMartini posted on Friday, November 08, 2013 - 6:27 am

Thanks very much. That's very helpful. I have requested the RESIDUAL output. Is there a rule of thumb that can be used to determine whether a within-class covariance is too high and/or how many covariances can be freed before the model is invalid?

Bengt O. Muthen posted on Friday, November 08, 2013 - 8:17 am

Short answer: No.

The too high question may be informed by the Modindices.

With continuous indicators you can add all covariances as in UG ex 7.22.

Adar Ben-Eliyahu posted on Thursday, November 14, 2013 - 10:16 am

Dear Linda, I was told that in order to provide support that the best-fitting LCA/LPA model is "good enough" to justify the use of MANOVA, I should use a 3-step procedure available in Version 7 of Mplus (Asparouhov & Muthen, 2013) that allows researchers to examine the relation between the latent profile variable and the other variables of interest independently while still incorporating the classification uncertainty associated with the latent profile models.
As I was searching for what this meant, I came across Web Note 15. However, I am having difficulties running this analysis because 1) the variables used to categorize individuals into latent classes are continuous, not categorical, and 2) I am unsure what is meant by auxiliary variables. Would that be the "outcome" or "dependent" variable of interest?
I think that this advice might be for CFAs or for categorical grouping variables, not necessarily for the data I was using. Could you possibly advise?
Thank you

Brianna H posted on Friday, November 15, 2013 - 9:11 am

Hello Drs. Muthen-- In the output of a latent profile analysis with four continuous indicators (all ratio-scored), I received the following warning message for the input instructions: "All variables are uncorrelated with all other variables within class. Check that this is what is intended."

(1) The observed indicators are, in fact, correlated with each other. Does this message refer to the default in Mplus that covariances among latent class indicators are fixed at zero? Could you please advise about what this message means and possible ways to proceed?
(2) Is there a citation that you recommend for the use of ratio-scoring in latent profile analyses or SEM generally? Thank you.

Bengt O. Muthen posted on Friday, November 15, 2013 - 4:26 pm

Answer to Adar

3-step modeling can be used also with continuous latent class indicators.

Auxiliary variable can be a predictor of the latent class variable or a distal outcome, that is, a variable influenced by the latent class variable.

Bengt O. Muthen posted on Friday, November 15, 2013 - 4:30 pm

Answer to Brianna

(1) It means that within class the variables are specified to be uncorrelated. You can make them correlated by using WITH. The fact that the variables are correlated in the sample is captured by the variables being influenced by the same latent class variable, so a within-class correlation parameter is not necessarily needed.

(2) No, I can't think of any. Others?

Adar Ben-Eliyahu posted on Friday, November 15, 2013 - 6:54 pm

Thank you so much for your prompt reply Dr. Muthen!
I was able to get this to run, but am now struggling with a MANOVA as the 3rd step...I entered all the DVs as the auxiliary vars and it seemed like it was alright, but then I could not find the actual across group comparison. Please find my code below:
Thank you!

VARIABLE:
NAMES ARE
ID Q911A1 RQ911A2 Q911A3 Q911A4 Q946A1 Q946A2 Q946A3 Q946A4 ZQ911A1 ZQ911A3 ZQ911A4 ZQ946A1 ZQ946A2 ZQ946A3 ZQ946A4;

USEVAR ARE ID ID Q911A1 RQ911A2 Q911A3 Q911A4 Q946A1 Q946A2 Q946A3 Q946A4 ZQ911A1 ZQ911A3 ZQ911A4 ZQ946A1 ZQ946A2 ZQ946A3 ZQ946A4;
IDVARIABLE IS ID;
missing are BLANK ;
class=C(3);
Auxiliary=ZQ911A1 ZQ911A3 ZQ911A4 ZQ946A1 ZQ946A2
ZQ946A3 ZQ946A4(R3STEP);
MODEL:
ANALYSIS: type = mixture;
starts = 200 10;
%overall%
Q911A1(1)
Q911A3(2)
Q911A4(3)
Q946A1(4)
Q946A2(5)
Q946A3(6)
Q946A4(7);

%C#2%
Q911A1(1)
Q911A3(2)
Q911A4(3)
Q946A1(4)
Q946A2(5)
Q946A3(6)
Q946A4(7);

%C#3%
Q911A1(1)
Q911A3(2)
Q911A4(3)
Q946A1(4)
Q946A2(5)
Q946A3(6)
Q946A4(7);
[Q911A1 Q911A3 Q911A4 Q946A1 Q946A2 Q946A3 Q946A4];

Ginnie posted on Friday, November 15, 2013 - 10:22 pm

Hello Dr. Muthen,

I am conducting a LPA, and would like to incorporate additional observable (continuous) variables to predict latent profiles.

I am thus wondering how I can run this kind of analysis in Mplus.

Thanks,
Ginnie

Linda K. Muthen posted on Saturday, November 16, 2013 - 11:26 am

Ginnie:

See Example 7.12.

Bengt O. Muthen posted on Saturday, November 16, 2013 - 6:12 pm

Answer to Adar:

Please send your output to Support.

Brianna H posted on Sunday, November 17, 2013 - 2:23 pm

Thank you for your replies.

Tan Bee Li posted on Sunday, January 05, 2014 - 9:49 am

Hi,

1. Will there be an issue if the covariates and the indicators consist of both discrete and continuous data?

2. Distal outcomes are similar to the indicators but are identified based on time. Does Mplus require a specific research design, such as repeated measures, for this analysis?

3. Within-class correlations: Since 5 of my indicators are derived from the same construct, they are correlated (these are facets within a particular construct). Moreover, the executive function skills that make up my 6th, 7th, 8th and 9th indicators were found to be associated with some of these five indicators and may be skills that contribute to their development, and vice versa. Would it then be redundant for me to use LPA?
Is there a hierarchical version of LPA, as HLA is for LCA?

4. I have 1 questionnaire (Likert) that measures 5 facets that make up a construct. I am hoping to use the composite score for facet A as indicator 1, the composite score for facet B as indicator 2, etc. Is that allowable?

5. What is the minimum sample size Mplus requires for meaningful analysis? Or how is it determined by the no. of indicators?

Based on my description, if LPA is not suitable, could you recommend a better model?

Thanks.

Bengt O. Muthen posted on Monday, January 06, 2014 - 8:29 am

1. No. But remember that covariates should not be put on the CATEGORICAL= list.

2. No.

3. No. Yes.

4. Yes.

5. No general rule, but you want more observations than parameters generally.

I don't want to make a general recommendation because it would require a much deeper understanding of what you do.

Tan Bee Li posted on Tuesday, January 07, 2014 - 10:50 pm

Thanks for your response.

What would be the hierarchical version of LPA?

Also, the EF skills (indicator 6 to 9) have also been proposed to be outcomes instead of predictors (outcome of the 5 facets; predictor 1-5). Is there a meaningful way for me to examine if indicators 6 to 9 are better as predictors vs. distal outcomes?

Thanks.

Bengt O. Muthen posted on Wednesday, January 08, 2014 - 10:43 am

By hierarchical version for LPA I assume you mean two-level LPA, in which case you want to look at

Henry, K. & Muthén, B. (2010). Multilevel latent class analysis: An application of adolescent smoking typologies with individual and contextual predictors. Structural Equation Modeling, 17, 193-215.

Carrere posted on Wednesday, March 05, 2014 - 9:45 am

Hi,

I am running an LPA and would like to free the variances across classes. It would be great if you could let me know what code has to be added.

Here is my current program:
VARIABLE:
NAMES ARE id icog scog ihea shea ieng seng;
USEVARIABLES ARE icog scog ihea shea ieng seng;
CLASSES = group(3);

ANALYSIS:
TYPE IS MIXTURE;
LOGHIGH = +15;
LOGLOW = -15;
UCELLSIZE = 0.01;
ESTIMATOR IS ML;
LOGCRITERION = 0.0000001;
ITERATIONS = 1000;
CONVERGENCE = 0.000001;
MITERATIONS = 500;
MCONVERGENCE = 0.000001;
MIXC = ITERATIONS;
MCITERATIONS = 2;
MIXU = ITERATIONS;
MUITERATIONS = 2;

STARTS=100 10;
MODEL:
%OVERALL%
icog with scog ihea shea ieng seng;
scog with ihea shea ieng seng;
ihea with shea ieng seng;
shea with ieng seng;
ieng with seng;

OUTPUT: STANDARDIZED TECH11;

Many thanks!

Linda K. Muthen posted on Wednesday, March 05, 2014 - 10:45 am

You mention the variances in the class-specific part of the MODEL command, for example,

%group#1%
icog scog ihea shea ieng seng;
%group#2%
icog scog ihea shea ieng seng;

Carrere posted on Thursday, March 06, 2014 - 2:56 am

Many thanks!
I have one additional question. What option has to be checked to get the class number each participant has been assigned?

Linda K. Muthen posted on Thursday, March 06, 2014 - 5:59 am

Use the CPROBABILITIES option of the SAVEDATA command. See the user's guide for more information.

Joshua Wilson posted on Friday, March 14, 2014 - 11:03 am

Hello,

I'm running a LPA model with a four-class solution, and would like to reorder the classes so that the largest class is last. To do this, I'm taking the SVALUES from the first output file and then reordering the class labels so the largest class is last. I'm setting STARTS = 0 and using the OPTSEED option. This process worked fine for reordering the 2-class and 3-class solution, but it isn't working for the 4-class solution.

The problem that I'm having is that this changes the entire model. It doesn't reproduce the first model's H0 likelihood value, changes the entropy and other fit statistics, and has very different class assignment values.

Is there anything you can suggest to rectify this?

Bengt O. Muthen posted on Friday, March 14, 2014 - 12:48 pm

When you use SVALUES as start values with Starts=0 you should not also use OPTSEED.

Joshua Wilson posted on Friday, March 14, 2014 - 1:09 pm

Thank you, Dr. Muthen.

That worked to rearrange the classes--now the largest class is last. But, it is still not reproducing the original H0 likelihood value and the original class counts.

FYI--I ran the original model twice, once with Starts = 1000 200; and again with Starts = 2000 500; to ensure that the issue wasn't one of a local maximum. In both cases, I got the same H0 value and it was reproduced dozens of times.

Is there anything else I can try?

Thanks.

Joshua Wilson posted on Friday, March 14, 2014 - 1:23 pm

Hello again,

I figured it out.

The problems arise when I try to reorder more than two classes at a time. Switching class labels of two classes at a time results in no errors.

Thanks!

Joshua Wilson posted on Wednesday, March 19, 2014 - 11:18 am

Hello,

I have a question about interpreting the output of LPA.

I have three continuous indicator variables which were used to estimate a 3-class solution.

Are the class-specific means of the indicator variables interpreted as the actual means for that class? Or are they interpreted as the 'mean difference' in values for that indicator between that class and the reference class?

For example, suppose Indicator1 (i.e., Y1) has an estimated mean of -0.831 (p < .001) for class 1. Is this interpreted as the actual estimated mean of Y1 for class 1 being -0.831, with this value statistically significantly different from zero?

Or is this interpreted as the mean of Y1 for class 1 being 0.831 units less than the mean of Y1 for class 3, with this mean difference statistically significantly different from zero?

Thanks for your help!

Joshua Wilson posted on Wednesday, March 19, 2014 - 2:23 pm

Also,

How do I interpret the latent categorical means in LPA? What is their 'meaning'?

Thanks!

Bengt O. Muthen posted on Wednesday, March 19, 2014 - 4:29 pm

First post: They are actual means. The test concerns whether it is statistically different from zero.

Second post: These are logit values which correspond to the class probabilities printed at the top of the results (Model-estimated...).
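The link between the logit values and the class probabilities can be sketched as a multinomial (softmax) transformation. This is a minimal Python illustration, assuming the last class is the reference class with its logit fixed at zero. The function name and the example logits are hypothetical.

```python
from math import exp

def class_probabilities(logits):
    """Convert the K-1 mean logits of the categorical latent variable
    into K class probabilities. The last class is treated as the
    reference class with a logit of 0 (assumption stated in the lead-in)."""
    exponentiated = [exp(v) for v in logits] + [1.0]  # exp(0) = 1 for the reference class
    total = sum(exponentiated)
    return [e / total for e in exponentiated]

# Hypothetical logits for a 3-class model.
probs = class_probabilities([-0.3, 0.4])
```

A logit of 0 for a class means it is estimated to be the same size as the reference class. A negative logit means it is smaller, and a positive logit means it is larger.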

Joshua Wilson posted on Wednesday, March 19, 2014 - 5:56 pm

Thank you, Bengt!

That really helps.

Brianna H posted on Wednesday, April 09, 2014 - 4:58 pm

Hello -- I have been working on a latent profile analysis following the instructions of Asparouhov and Muthen (2012) for using TECH 11 and TECH 14 to determine the optimal number of classes (link below).

In my 5-class solution TECH 14 output, I receive the warning message "WARNING: THE BEST LOGLIKELIHOOD VALUE FOR THE 4 CLASS MODEL FOR THE GENERATED DATA WAS NOT REPLICATED IN 5 OUT OF 5 BOOTSTRAP DRAWS[...]"

Although I have increased the number of random starts in LRTSTARTS (to 0 0 3600 720), I still receive this warning message. Asparouhov and Muthen (2012) state that this warning message should go away if the # of LRTSTARTS are increased.

I am wondering if this issue might be occurring because the 5-class solution has two classes with less than 5% of the sample in the class. Could the model be unstable because of the low # of cases in each class?

Although the 4-class solution is easier to interpret (and has only 1 class with <5% of the sample), most of the model fit statistics (e.g. Entropy, BLRT, LL, AIC, BIC, adjusted BIC, and Tech 11 output) favor the 5-class solution.

Thank you for your help.

http://www.statmodel.com/examples/webnotes/webnote14.pdf

Brianna H posted on Wednesday, April 09, 2014 - 5:07 pm

Actually, all of the fit statistics *except* Entropy favor the 5-class model.

Entropy of the 4-class model is 0.944 and of the 5-class model is 0.939.

BLRT of the 5-class model is 923.04, p<.001.

Thank you.

Bengt O. Muthen posted on Friday, April 11, 2014 - 6:04 am

If you have followed the suggestions of web note 14 and still have problems I would simply go with BIC. I would not use entropy to choose the number of classes. Entropy is a description of the usefulness of the latent class model, not a measure of how well it fits the data.
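To illustrate why entropy describes classification quality rather than fit, here is a minimal Python sketch of the relative entropy statistic computed only from posterior class probabilities. The function name is hypothetical, and the formula used is the common normalized form, 1 minus the summed classification uncertainty divided by n times ln(K).

```python
from math import log

def relative_entropy(posteriors):
    """posteriors: one list of K posterior class probabilities per subject.
    Returns a value between 0 (no separation) and 1 (perfect separation)."""
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                uncertainty -= p * log(p)
    return 1.0 - uncertainty / (n * log(k))

# Hypothetical posterior probabilities for three subjects, two classes.
example = relative_entropy([[0.95, 0.05], [0.10, 0.90], [0.99, 0.01]])
```

Nothing in this calculation involves the likelihood of the observed data, which is consistent with the advice above not to use entropy for choosing the number of classes.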

Johanna van Rijn posted on Thursday, April 24, 2014 - 5:32 am

How is the number of free parameters calculated for a Latent Profile Analysis? I can't seem to find the algorithm anywhere...

Linda K. Muthen posted on Friday, April 25, 2014 - 8:44 am

It is the number of means and variances of the latent class indicators times the number of classes plus the mean logits of the categorical latent variable. There are the number of classes minus one mean logits.
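The count described above can be written as a small Python sketch. The function name and its flag are hypothetical. The description above corresponds to class-specific means and variances; to my understanding the Mplus default holds the variances equal across classes, so the sketch offers both variants and labels that as an assumption. It also assumes no within-class covariances are estimated.

```python
def lpa_free_parameters(n_indicators, n_classes, class_specific_variances=True):
    """Number of free parameters in a latent profile model with
    uncorrelated indicators within class (no WITH statements)."""
    means = n_indicators * n_classes
    # Assumption: if variances are held equal across classes,
    # only one variance per indicator is estimated.
    variances = n_indicators * (n_classes if class_specific_variances else 1)
    mean_logits = n_classes - 1  # the categorical latent variable
    return means + variances + mean_logits

# Six indicators, three classes.
free_all = lpa_free_parameters(6, 3, class_specific_variances=True)
equal_var = lpa_free_parameters(6, 3, class_specific_variances=False)
```

The result can be checked against the number of free parameters printed in the Mplus output for the same model.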

stella posted on Wednesday, April 30, 2014 - 2:34 pm

Dr. Muthén,

I'm interested in running LPA using a 17 item continuous scale. I'm wondering if it's appropriate to use LPA with items instead of scales/measures (i.e., create latent profiles based on the items from this scale rather than multiple scales)? Thank you in advance!

Bengt O. Muthen posted on Wednesday, April 30, 2014 - 7:21 pm

Either way works - you should do what is closest to your research question, that is, on which level are the latent classes likely to be operating?

stella posted on Thursday, May 01, 2014 - 4:36 pm

Thank you!

Johanna van Rijn posted on Wednesday, May 07, 2014 - 1:50 am

Thank you for your reply!

Diana Westerberg posted on Tuesday, July 01, 2014 - 1:23 pm

In reference to your response to Brianna on Nov 15th, 2013, you wrote that using the with statement would allow for variables to be correlated within class. However, I have included a with statement and continue to get that warning. Am I doing something wrong? My input is pasted below.

MODEL: %c#1%
y1 y2 y3 y4 y5 y6;
y1 y2 y3 y4 y5 y6 with y1 y2 y3 y4 y5 y6;

%c#2%
y1 y2 y3 y4 y5 y6;
y1 y2 y3 y4 y5 y6 with y1 y2 y3 y4 y5 y6

%c#3%
y1 y2 y3 y4 y5 y6;
y1 y2 y3 y4 y5 y6 with y1 y2 y3 y4 y5 y6

Bengt O. Muthen posted on Tuesday, July 01, 2014 - 5:21 pm

Please send your output to support@statmodel.com together with your license number.

Janina Roloff Henoch posted on Tuesday, July 08, 2014 - 3:31 am

Hello, I am using latent profile analysis to identify the number of latent classes depending on students' answers on eight scales. As I theoretically expect a solution with four classes, I compare the fit of five models from a two- to a six-class solution. I do this for each of the three time points I've got data for. For my second time point, Mplus has a problem with the three-class solution:

"THE ESTIMATED COVARIANCE MATRIX FOR THE Y VARIABLES IN CLASS 1 COULD NOT BE INVERTED. PROBLEM INVOLVING VARIABLE DAVE_OP. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 22. CHANGE YOUR MODEL AND/OR STARTING VALUES. THIS MAY BE DUE TO A ZERO ESTIMATED VARIANCE, THAT IS, NO WITHIN-CLASS VARIATION FOR THE VARIABLE. THE LOGLIKELIHOOD DECREASED IN THE LAST EM ITERATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES."

I use starts = 5000 100 and there is no problem with other numbers of classes in this time point or with the three-class solution of the other two time points.
Do you have an idea how to solve this problem? Does this finding mean that a three-class solution does not fit the data? Thanks in advance!

Linda K. Muthen posted on Tuesday, July 08, 2014 - 5:35 am

Try STARTS = 2000 500; The second number should be about 1/4 of the first number.

Janina Roloff Henoch posted on Tuesday, July 08, 2014 - 6:23 am

Thanks for your answer! I tried what you said, but the problem remains the same. Is there another possible solution for my problem?

Linda K. Muthen posted on Tuesday, July 08, 2014 - 6:55 am

Please send the output and your license number to support@statmodel.com.

Kathryn Modecki posted on Monday, July 14, 2014 - 9:00 pm

Dear Drs. Muthen, I have conducted latent profile analyses with 2 age groups (children and adults). Among children I get a unique profile that does not emerge in the adult group. I have received feedback that this unique profile may not be due to age but could be due to SES differences between the age groups (although my child profiles do not differ based on SES). The suggestion is to residualize the outcomes after controlling for SES and rerun the analyses. I am wondering how to approach this, as I haven't located anything through a search of the Mplus board. Further, does this appear to be a solid solution in your view? Many thanks.

Bengt O. Muthen posted on Wednesday, July 16, 2014 - 11:50 am

Residualizing sounds complicated in this setting since you can let SES influence both the latent class variable and its indicators. Why not simply do separate analyses for different SES categories?

Chrissy posted on Tuesday, September 09, 2014 - 6:43 am

Hi,

I have a question in relation to Latent Profile Analysis (LPA), multi-collinearity and compositional natured data. I am trying to identify weekly patterns of physical behaviour using % of daily time spent in sedentary behaviour, light activity and moderate-to-vigorous activity across 7 days using LPA. All variables (21 indicator variables) are expressed as % of daily time spent in activity and thus together equal 100% for each day. I was wondering whether multi-collinearity and the compositional nature of the data are an issue in LPA? Thank you in advance.

Linda K. Muthen posted on Tuesday, September 09, 2014 - 9:12 am

I think summing to 100 percent will cause a non-positive definite matrix. You can try and see. You may need to leave out one behavior.
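The concern about percentages that sum to 100 can be checked on a small scale. The following Python sketch uses hypothetical data for one day (sedentary, light, moderate-to-vigorous). It shows that the sample covariance matrix of all three variables is singular, while dropping one behavior gives a matrix with a positive determinant. The helper names are made up for this illustration.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of the columns of rows."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
         for k in range(p)]
        for j in range(p)
    ]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical percentages of daily time; every row sums to 100.
days = [(60.0, 30.0, 10.0), (55.0, 33.0, 12.0), (70.0, 25.0, 5.0),
        (50.0, 35.0, 15.0), (65.0, 28.0, 7.0)]

full_det = det3(covariance_matrix(days))                    # numerically zero
reduced_det = det2(covariance_matrix([r[:2] for r in days]))  # one behavior dropped
```

With 21 indicators made of seven such daily triplets, the same linear dependency would appear once per day, which is in line with the suggestion to leave out one behavior.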

Eric posted on Wednesday, September 10, 2014 - 12:36 pm

Dear all,

is the method which is described for LCA models in Asparouhov & Muthen (2014, Auxiliary Variables in Mixture Modeling: Three-Step Approaches Using Mplus, Appendix F-I) also applicable to LPA?

Bengt O. Muthen posted on Wednesday, September 10, 2014 - 12:53 pm

Yes.

Steven L Lancaster posted on Friday, September 12, 2014 - 10:09 am

Hello,
I have searched for this information but not found it, so if this is a duplicate, I apologize. I am interested in using LPA to examine a number of emotion variables. However, I am not sure if all those variables are relevant. Thus, in the spirit of Raftery and Dean (2006; https://www.stat.washington.edu/raftery/Research/PDF/dean2006.pdf) and Dean and Raftery (2010), I am hoping to use variable selection techniques to determine which of the emotions should be included in the clustering procedure. However, I am not sure how to do this using Mplus. Has syntax been developed/published for this procedure? Dean and Raftery note this can be done during the clustering procedure itself. Is this possible in Mplus? Thanks!

Bengt O. Muthen posted on Friday, September 12, 2014 - 5:57 pm

I don't think Mplus can do this. But in the upcoming version 7.3 we have added a simple descriptive device to see which latent class indicators are particularly useful for distinguishing among the classes. It is called univariate entropy.

Eric posted on Saturday, September 13, 2014 - 12:03 am

When will version 7.3 be available?

Linda K. Muthen posted on Saturday, September 13, 2014 - 6:17 am

At the end of the month.

Yellowdog posted on Friday, September 19, 2014 - 3:31 am

Dear Linda,

I have derived 4 latent classes that I would like to use as an outcome in a mediated path model. Furthermore, I would like to compare direct and indirect effects between males and females using the GROUPING option.
Can I use classes that were derived from the total sample (with sex used as a covariate for class estimation) for an MSA, or do I have to estimate classes separately for both groups? Furthermore, is it recommended to use a two-step approach, or is it better to do everything in one step (LPA and mediation analyses)?

Thank you, Mario

Bengt O. Muthen posted on Friday, September 19, 2014 - 5:58 am

Are you saying that in an X->M->Y mediation model your Y is a nominal variable based on 4 latent classes? A nominal Y requires special mediation formulas, but can be done.

I would first do an invariance study of gender for the latent class part of the model. Gender can affect c only or also the c indicators directly. In the latter case you don't have measurement invariance. If you have measurement invariance you can use classes derived from the total sample.

Yellowdog posted on Monday, September 22, 2014 - 1:38 am

Dear Bengt,

thank you for your reply.

1. Does "a special mediation formula" mean to create NEW parameters (indirect effects=products of single paths) using MODEL CONSTRAINT?
2. Could you please give an example of how to test for measurement invariance of c and its indicators?

Thanks, Mario

Bengt O. Muthen posted on Monday, September 22, 2014 - 12:55 pm

1. The paper on our website

Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus

deals with a nominal Y.

2. In a model with c on x1-xp, you can regress each indicator on all x's, one indicator at a time. And see which direct effects are significant.

Sam Courtney posted on Monday, September 29, 2014 - 8:29 pm

Hello,

I am conducting an LPA using 8 continuous variables and have arrived at a 5-class solution. My goal is to identify which class each subsequent participant is most likely to belong to, based on their own responses to these measures.

Is there a way to calculate the likelihood that new participants belong to the previously identified latent classes? If there are multiple solutions, is there a way to do this without re-running all participants each time?

Thank you very much.

Bengt O. Muthen posted on Tuesday, September 30, 2014 - 8:46 am

You can do this by fixing all model parameters and use data for say just one new subject, running this to get the estimated posterior probabilities for a subject for all classes. We discuss examples of this in our Topic 6 handout and video.

You do this using SVALUES in your first run, then changing * to @ and running your second run with Starts=0 to get the posterior probabilities by asking for Save=cprob in the Savedata command.
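What the second run computes for a new subject can be sketched directly. This is a minimal Python illustration of posterior class probabilities under a latent profile model with fixed parameters and uncorrelated indicators within class. The function name and all parameter values are hypothetical; in practice they would be the estimates from the first run.

```python
from math import exp, log, pi

def posterior_class_probabilities(y, class_probs, means, variances):
    """y: indicator values for one new subject.
    class_probs[k]: estimated probability of class k.
    means[k][j], variances[k][j]: estimated mean and variance of
    indicator j in class k. Indicators are assumed to be normally
    distributed and uncorrelated within class."""
    log_joint = []
    for k, pk in enumerate(class_probs):
        value = log(pk)
        for j, yj in enumerate(y):
            v = variances[k][j]
            value += -0.5 * log(2.0 * pi * v) - (yj - means[k][j]) ** 2 / (2.0 * v)
        log_joint.append(value)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_joint)
    weights = [exp(v - top) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-class, two-indicator model and one new subject.
post = posterior_class_probabilities(
    y=[1.8, 0.4],
    class_probs=[0.6, 0.4],
    means=[[0.0, 0.0], [2.0, 1.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
```

The most likely class is the one with the largest posterior probability. Since the parameters are fixed, each new subject can be scored without refitting the model to the original sample.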

Sam Courtney posted on Tuesday, September 30, 2014 - 10:44 am

Dr. Muthen,

Thank you for getting back to me so quickly. I will go through the examples and videos, but this sounds like a very promising solution. Thank you for the help!

Katrin Lintorf posted on Monday, October 06, 2014 - 6:55 am

Hello,

In your older postings I read that you recommend using raw data in an LPA. I tried raw data as well as standardized data, and it seems to make a difference because the fit changes. Do you know of any reference that explains the consequences of using (un)standardized data in LPA?

Kind regards
Katrin

Bengt O. Muthen posted on Monday, October 06, 2014 - 7:57 am

LogL and BIC will change because the outcomes are in different metric, but I don't know what you mean by fit changing.
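The effect of the metric on the loglikelihood can be seen in the simplest possible case. This Python sketch is a one-class, one-variable illustration with hypothetical data, not a mixture model. Dividing a normally distributed variable by a constant shifts its maximized loglikelihood by n times the log of that constant.

```python
from math import log, pi

def normal_max_loglik(y):
    """Maximized loglikelihood of a univariate normal model,
    using the ML estimates of the mean and variance."""
    n = len(y)
    mu = sum(y) / n
    var = sum((v - mu) ** 2 for v in y) / n
    return -0.5 * n * (log(2.0 * pi * var) + 1.0)

# Hypothetical raw scores and a rescaling constant.
raw = [12.0, 15.0, 9.0, 20.0, 14.0]
scale = 10.0
rescaled = [v / scale for v in raw]

shift = normal_max_loglik(rescaled) - normal_max_loglik(raw)
expected_shift = len(raw) * log(scale)
```

This only shows why LogL and BIC values are not comparable across metrics. It does not by itself say anything about differences in the LRT p-values.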

Katrin Lintorf posted on Tuesday, October 07, 2014 - 3:02 am

I do not only have different information criteria but also different p-values in the Vuong-Lo-Mendell-Rubin LRT and in the Lo-Mendell-Rubin adj LRT. With a change from standardized data to raw data, the p-value decreases. So with standardized data the tests indicate a 2-class-solution whereas with raw data they rather indicate 3 classes.

Bengt O. Muthen posted on Tuesday, October 07, 2014 - 9:19 am

Does deciding on the number of classes by BIC also differ for raw vs stand'd?

Katrin Lintorf posted on Wednesday, October 08, 2014 - 5:24 am

Sorry, I can't answer your question clearly. In both cases the ICs decrease with every class added. According to this I should probably adopt a model with 6 classes or more, which seems neither theoretically plausible nor parsimonious (I would expect 3 or 4 classes).
I tried to find out, whether every drop in (B)IC, as small as it may be, is necessarily meaningful or whether there are rules of thumb/formulas, which indicate a negligible drop. Unfortunately, I am not yet satisfied with what I found. Could you perhaps recommend further reading?

A follow-up question to your answer: Would you generally prefer the BIC over the above mentioned LRTs? Or are there certain conditions under which the LRTs are not trustworthy?

Bengt O. Muthen posted on Wednesday, October 08, 2014 - 10:01 am

When BIC keeps on decreasing with increasing number of classes there is often another kind of model that is more suited to the data. It could be something minor like residual covariances.

I usually use BIC together with looking at how different the solutions are (in terms of mean profiles) for k and k+1 class models that have close BICs. Sometimes the k+1th class is just a minor variation on one of the k classes.
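For comparing k and k+1 class models, BIC can be recomputed by hand from the loglikelihood, the number of free parameters, and the sample size. The Python sketch below uses the standard definition, BIC = -2 LogL + p ln(n). The function name and all numbers are hypothetical and only illustrate two models whose BICs are close.

```python
from math import log

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; smaller is better."""
    return -2.0 * loglik + n_params * log(n_obs)

# Hypothetical results for a 3-class and a 4-class model, n = 500.
bic_3 = bic(loglik=-4210.0, n_params=26, n_obs=500)
bic_4 = bic(loglik=-4190.0, n_params=33, n_obs=500)
difference = bic_3 - bic_4
```

When the difference is small, as in these made-up numbers, comparing the mean profiles of the two solutions, as described above, helps to see whether the extra class is a minor variation on an existing one.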

Katrin Lintorf posted on Wednesday, October 08, 2014 - 1:33 pm

Thank you for the advice. I will have a look at the different solutions in the way you suggest.
Concerning your first point: Which criterion do you use in order to decide whether an additional class is necessary or whether some residual covariances should be permitted in some classes?

S Elaine posted on Thursday, October 16, 2014 - 9:35 pm

I am conducting LPA to examine profiles of socioemotional functioning among children in my study.

I tested the prelim syntax:

data: file is socio.dat;

variable: names= Pnumb CBCL_INTP CBCL_EXTP
CBCL_PTSprob CBCL_INTtot_Tscore
CBCL_EXTtot_Tscore CBCL_PTSprob_Tscore;

idvariable = Pnumb;

classes= c(5);

Analysis:
type=mixture;
starts=500 100;
processors=2;

Model:
%overall%
[ CBCL_INTP$1 CBCL_EXTP$1 CBCL_PTSprob$1];

%c#1%
[ CBCL_INTP$1 CBCL_EXTP$1 CBCL_PTSprob$1];

%c#2%
[ CBCL_INTP$1 CBCL_EXTP$1 CBCL_PTSprob$1];

%c#3%
[ CBCL_INTP$1 CBCL_EXTP$1 CBCL_PTSprob$1];

%c#4%
[ CBCL_INTP$1 CBCL_EXTP$1 CBCL_PTSprob$1];

%c#5%
[ CBCL_INTP$1 CBCL_EXTP$1 CBCL_PTSprob$1];

OUTPUT:
TECH1 TECH5 TECH10;

SAVEDATA:
FILE IS sociotest.dat;
SAVE = CPROBABILITIES;

My output file includes:
WARNING in VARIABLE command
Note that only the first 8 characters of variable names are used in the output. Shorten variable names to avoid any confusion.
*** ERROR
The number of observations is 0. Check your data and format statement. Data file: /Users/shelbyelainemcdonald/Desktop/diss/socio.dat
ERROR Invalid symbol in data file:
"Pnumb" at record #: 1, field #: 1

any suggestions?

Linda K. Muthen posted on Thursday, October 16, 2014 - 10:01 pm

It sounds like the variable names are in the first record. Delete them.
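Removing a header record can also be scripted before the file is read by Mplus. This is a minimal Python sketch that works on a list of lines. The function names are hypothetical, and it assumes that the first field of a genuine data record is numeric (for example an ID), so a non-numeric first field marks a header.

```python
def is_number(token):
    """True if the token can be read as a number."""
    try:
        float(token)
        return True
    except ValueError:
        return False

def strip_header(lines):
    """Drop the first record when its first field is not numeric.
    Fields may be separated by spaces, tabs, or commas."""
    if lines:
        fields = lines[0].replace(",", " ").split()
        if fields and not is_number(fields[0]):
            return lines[1:]
    return lines

# Hypothetical file content with variable names in the first record.
cleaned = strip_header(["Pnumb CBCL_INTP CBCL_EXTP", "1 55 60", "2 48 52"])
```

The cleaned lines can then be written to a new .dat file, keeping the original file unchanged.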

Caitlin Turpyn posted on Thursday, November 13, 2014 - 9:52 am

Dear Dr. Muthen,

If I am conducting an LPA with covariates in the same step ("c on covariate" rather than the 3-step approach), does this increase the probability that the covariate will be related to latent classes, thus increasing type I errors?

Thank you.

Bengt O. Muthen posted on Friday, November 14, 2014 - 1:08 pm

If the true model is one where the covariate influences c but not the indicators of c directly, there should be no real difference between 1- and 3-step results.

Lauren Little posted on Monday, December 15, 2014 - 11:30 am

Dear Dr. Muthen,
I am running an LPA with four variables to determine number of profiles. I get this ERROR:
One or more variables have a variance greater than the maximum allowed of 1000000. Check your data and format statement or rescale the variable(s) using the DEFINE command.
Do you have any suggestions? Thank you!

Linda K. Muthen posted on Monday, December 15, 2014 - 11:49 am

Please send the output, data, and your license number to support@statmodel.com.

Leigh Greiner posted on Tuesday, January 06, 2015 - 7:59 am

Dear Dr. Muthen,
I am carrying out a Latent Profile Analysis using 10 dependent variables: 6 continuous, 1 categorical, and 3 count (and 2 covariates). I am using archival data, and two of my continuous variables are actually subscale means, with scores ranging from 0 to 4 (each has approximately 25-30% zeros, but are comprised mainly of non-integers; e.g., 2.83). Although I believe these two variables are technically count variables, I decided to treat them as continuous, as count variables cannot have non-integer values. When run this way, a 3-class solution is generated that is meaningful and consistent with theory.

I do have one concern--the distributions of these two variables (both measuring mental health symptoms) are positively skewed (as I would have anticipated them to be in my population). I am using the MLR estimator, which I understand is robust to non-normality. However, when discussing my results with a colleague, I was told that given the degree of skewness, treating them as count variables might be more appropriate. So I tried re-coding these two variables into integers (using the CUT option) to be able to treat them as count variables. However, when run this way a four-class solution fits best, and my classes are no longer consistent with theory and are more difficult to interpret.

Do you have any advice as to the most statistically valid approach here?
Thanks,
Leigh

Linda K. Muthen posted on Tuesday, January 06, 2015 - 8:31 am

I do not think you should treat these variables as count variables. I think the best way to treat them is as continuous or censored.

Leigh Greiner posted on Tuesday, January 06, 2015 - 8:50 am

Thank you for your quick reply!

Treating them as continuous was my first instinct too, but I want to be sure I am not simply choosing the easiest solution to interpret.

I have one more question, just to make sure I am on the right track. The variables I am having trouble with are measures of anxiety and depression (equal to the mean number of symptoms present for each disorder, which is likely why my colleague suggested they be treated as count variables). If I treat these variables as continuous, is it possible that the tail end of the skewed distribution is driving the formation of one of the classes? I only ask because when I treat them as count variables, the mental health variables do little to distinguish the classes, but when they are treated as continuous, one class has much higher mean scores on these mental health variables relative to the other 2 classes.

Thanks,
Leigh

Bengt O. Muthen posted on Tuesday, January 06, 2015 - 1:44 pm

You can see how much of a difference it makes to treat them as censored; this takes the piling up of zeros into account.

IW posted on Thursday, January 08, 2015 - 1:37 pm

Dear Drs. Muthen,

Does modeling specific within-class covariances (i.e., indicator "with" statements) affect class membership?

Thanks!

Bengt O. Muthen posted on Thursday, January 08, 2015 - 5:08 pm

Yes to some extent.

Leigh Greiner posted on Wednesday, January 14, 2015 - 8:49 am

Hello again Drs. Muthen,
As per your earlier suggestion, I have tried treating my skewed continuous mental health variables as censored (from below-zero inflated). My results appear to be similar to when I treat them as continuous-normal, in terms of class differences/optimal number of classes.

However, I am confused as to how to interpret my output. When run as censored variables, "one or more logit scale parameters approached and were set at the extreme values. Extreme values are -15.000 and 15.000."

1. What do the means represent in the output for censored variables? Are they log-odds, as they would be with zero-inflated count variables?

2. I am confused as to how I would interpret negative means, given that one can't have a score less than zero on the anxiety/depression measures I am using. For example, how would I interpret this for class 1?

Means
Anx#1 -15.000
Anx -0.168

Thank you for any clarification you can provide. Or if you know of any relevant readings that might help, that would be great too.

Thanks!

Bengt O. Muthen posted on Wednesday, January 14, 2015 - 10:44 am

1. When you do censored-inflated you have a binary latent variable indicating if the subject is in the zero class or not (see our UG or zero-inflated Poisson literature). A logit of -15 that your output shows indicates zero probability of being in the zero class, which means that you don't need the inflation part but can specify the variable simply as censored.

2. Censored modeling assumes an underlying continuous, unlimited latent response variable and the mean refers to the mean of that variable so it can be less than the lower censoring point. See the censored-normal literature.

A good book for both topics is by Scott Long (Sage "White" series).
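
A small sketch of the two points above, using only the standard library. It converts the logit of -15 into a probability, and it applies the standard censored-normal formula for the expected observed score when the latent response variable is censored from below at zero. The standard deviation of 1.0 is an assumption made purely for illustration (it is not in the posted output), and the function names are hypothetical.

```python
import math

def inv_logit(x):
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_below_zero_mean(mu, sigma):
    """E[max(0, y*)] for y* ~ N(mu, sigma^2): mu*Phi(mu/sigma) + sigma*phi(mu/sigma)."""
    z = mu / sigma
    return mu * normal_cdf(z) + sigma * normal_pdf(z)

# Logit for Anx#1 taken from the posted output.
p_zero_class = inv_logit(-15.0)

# Latent-response mean from the posted output; sigma = 1.0 is assumed, not reported.
observed_mean = censored_below_zero_mean(-0.168, 1.0)

print(p_zero_class, observed_mean)
```

So a negative mean for the latent response variable is compatible with an observed score that is never below zero.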

Leigh Greiner posted on Wednesday, January 21, 2015 - 9:09 am

Thank you for your help thus far and for your quick replies! Treating my variables as censored seems to have done the trick. But I do have another follow-up question: I read in the UG that in order to model the covariance between two censored variables, you need to use special modeling (e.g., use a latent variable that influences both variables). Can you elaborate a bit on how this might be done, or perhaps direct me to an example in the UG?

Linda K. Muthen posted on Wednesday, January 21, 2015 - 10:51 am

With weighted least squares estimation, you can use the WITH option. With maximum likelihood estimation, you can use BY.

f BY y1@1 y2;
f@1;
[f@0];

where y1 and y2 are censored variables. The covariance value is found in the factor loading for y2.
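
A short worked derivation of why the loading equals the covariance. The symbols $\nu$, $\lambda_2$, and $\varepsilon$ are notation introduced here, the starred variables are the underlying latent response variables of the censored outcomes, and the residuals are assumed uncorrelated with f and with each other:

```latex
\begin{aligned}
y_1^* &= \nu_1 + 1 \cdot f + \varepsilon_1, \\
y_2^* &= \nu_2 + \lambda_2 f + \varepsilon_2, \\
\operatorname{Cov}(y_1^*, y_2^*) &= 1 \cdot \lambda_2 \cdot \operatorname{Var}(f)
  = \lambda_2 \cdot 1 = \lambda_2 .
\end{aligned}
```

With the loading of y1 and the variance of f both fixed at 1, the only free quantity in the product is the loading for y2.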

HwaYoung Lee posted on Thursday, January 22, 2015 - 11:48 am

Dear Dr. Muthen,
I ran several LPA using four indicators, one covariate and one distal outcome.
The output for the 2-class model showed only one best likelihood value even though I increased the number of random starts and final-stage optimizations (STARTS= 20000 8000;). When I used optseed to run the LMR and BLRT tests (2 classes vs. 1 class), the LMR p-value was .06, but BLRT was significant (<.00001).
Then I ran a 3-class model. For the 3-class model, the LMR values varied. When I used several optseed numbers with different best likelihood values, the LMR p-value (2 classes vs. 3 classes) was significant for one best likelihood value but not significant for two others.
AIC, BIC, and aBIC, as well as BLRT, consistently decreased as the number of latent classes increased.
I ran a 6-class model; AIC, BIC, and aBIC values still decreased, but BLRT was not trustworthy due to local maxima even though LRTbootstrap=200 was used.
My questions are:
1) If the output provides only one best likelihood value for the 2-class model, does that indicate a problem with my code or model?
2) The results of LMR and BLRT were not consistent. Also, AIC, BIC, and aBIC consistently decreased as the number of latent classes increased. How can I decide which model is optimal?
3) Is there any way to evaluate whether this population is homogeneous or not?
I ask because the LMR test comparing the 1-class model with the 2-class model was not significant.
Any help would be appreciated.

Bengt O. Muthen posted on Thursday, January 22, 2015 - 1:39 pm

1) With only one best logL the model is typically not very stable; such situations should be avoided. Modify the model.

2) If BIC keeps decreasing when increasing the number of classes, that may indicate that the model should be modified. Perhaps you need to add some WITH statements to allow for within-class correlations among your indicators. A k-class model with some WITH statements may have a much better BIC than a k+1-class model without any WITH statements.

HwaYoung Lee posted on Monday, January 26, 2015 - 12:06 pm

Thank you Dr. Muthen.

I checked the correlation coefficients among indicators such as vitamin supplement use, # of drinks, and # of cigarettes before running any analyses. These correlation coefficients were very low. For example, the coefficients between one indicator and the others were .020, -.042, .012, and .029, with a sample size above 4,000.
Can I use LPA even though the correlation coefficients among indicators were low?

Bengt O. Muthen posted on Monday, January 26, 2015 - 4:06 pm

Yes, you need non-zero correlations among your indicators for there to be more than 1 latent class.

Perhaps you have strong floor effects for your outcomes, causing small sizes of regular Pearson product-moment correlations.

HwaYoung Lee posted on Wednesday, January 28, 2015 - 7:12 am

Thank you for your advice.
Well, I tried adding WITH statements to the K-1 class model and compared that model with the K-class model. However, differences in BIC values (as well as AIC and aBIC) were larger across numbers of latent classes, so it didn't work even though I added WITH statements to the K-1 class model.
By the way, the variables' ranges were large; for example, the ranges of the variables are 1) 0 to 277; 2) 1 to 130; 3) 1 to 140; 4) 1 to 34, and so on. Does this feature affect the number of classes? (The sample size is above 4,000.)
As I said above, BIC (as well as BLRT, AIC, etc.) keeps decreasing (I even tried running a 20-class model).
Any advice would be appreciated.

HwaYoung Lee posted on Wednesday, January 28, 2015 - 11:47 am

One more question.
BIC and other fit indices indicate that a larger number of latent classes (say a 9-class model) is better, but only a few people belong to a couple of the latent classes. Can I just collapse some classes into one? That would mean choosing a smaller number of latent classes (say a 4-class model). Is that ok?

Bengt O. Muthen posted on Wednesday, January 28, 2015 - 4:37 pm

Perhaps you want to investigate a Factor Mixture Model (see the UG for examples) to solve the problem of no BIC minimum.

You can also explore how the interpretation of the classes changes from say K=4 to K=9. Perhaps the classes found for K=4 are still there for K=9 and perhaps the extra 5 classes are not of substantive interest.

Tessa posted on Thursday, February 05, 2015 - 11:57 am

Dear Dr. Muthen and Dr. Muthen,

I am working on a Latent Profile Analysis making use of 4 continuous indicators that were multiply imputed (n = 40 data sets) using NORM prior to reading them into MPlus.

I would like to compare the fit between models with differing numbers of classes; however, when using the TECH11 command to request model fit statistics, I receive the following error statement: TECH11 option is not available with DATA IMPUTATION or TYPE=IMPUTATION in the DATA command. Request for TECH11 is ignored.

Could you please advise on whether there is an alternate command to request model fit statistics (LMRT, BLRT) in this case? I have specified Type is Imputation under the DATA command and Type = Mixture under the VARIABLE command. Thank-you!

Tihomir Asparouhov posted on Friday, February 06, 2015 - 8:48 am

Tessa

If you think of the number of classes as a parameter to be estimated, you can estimate the number of classes for each imputed data set and then combine the information as for other parameters. Hopefully they all point to the same number of classes. If not, use the mode for the number of classes.

Tihomir
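
A minimal sketch of the "use the mode" step, assuming the class enumeration has already been done separately on each imputed data set. The list of selected class numbers is made up for illustration, and the helper name is hypothetical. The function returns every value tied for most frequent, so a tie is visible rather than silently resolved.

```python
from collections import Counter

def most_frequent_class_count(selected):
    """Return the number(s) of classes chosen most often across imputed data sets."""
    counts = Counter(selected)
    top = max(counts.values())
    return sorted(k for k, c in counts.items() if c == top)

# Hypothetical: number of classes favored in each of 8 imputed data sets.
selected_k = [3, 3, 4, 3, 3, 2, 3, 4]
print(most_frequent_class_count(selected_k))
```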

Carey posted on Saturday, February 07, 2015 - 12:53 pm

I have performed a LPA and ended up with 4 classes as the best solution. I am now interested in using the classes as IVs in a moderation analyses with a continuous moderator and a continuous dependent variable. What is the best way to do this?

Bengt O. Muthen posted on Saturday, February 07, 2015 - 2:50 pm

If you have a high entropy (say > 0.8) you can use Most Likely class membership to create C-1 dummy variables for the C-class model. Then proceed as in regular mediation modeling.
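
A minimal sketch of building the C-1 dummy variables from most likely class membership. The class assignments and the choice of reference class are hypothetical, and `class_dummies` is an illustrative helper, not an Mplus feature.

```python
def class_dummies(most_likely, n_classes, reference):
    """Code most likely class membership as C-1 dummy variables.

    Classes are numbered 1..n_classes; the reference class gets all zeros.
    """
    kept = [k for k in range(1, n_classes + 1) if k != reference]
    return [[1 if c == k else 0 for k in kept] for c in most_likely]

# Hypothetical most likely class for five subjects in a 4-class model.
membership = [1, 2, 3, 4, 2]
dummies = class_dummies(membership, n_classes=4, reference=4)
for row in dummies:
    print(row)
```

Products of these dummies with the continuous moderator can then be formed in the usual regression way.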

JL posted on Monday, February 09, 2015 - 3:31 am

I am a new MPlus- user and would like to use the Mixture add-on (latent profile analysis) to model heterogeneity in one continuous variable. All the examples I have found have typically been multivariate with multiple class indicators.

Question: do you see any caveats or objections to using LPA with one class indicator and if so, would you recommend another analysis for my purpose within Mplus?

Many thanks.

Linda K. Muthen posted on Monday, February 09, 2015 - 9:09 am

This is possible. With only one indicator, it can be difficult to know if you are modeling the non-normality.

Katrin Mägi posted on Thursday, February 26, 2015 - 5:47 am

Dear Dr. Muthen,
I have run a five-class LPA model with distal outcomes (using the manual 3-step approach). I'm interested in knowing whether class membership is related to a distal outcome even when controlling for prior levels of the distal outcome. To do this I've regressed the distal outcome on the control variables. As I understand it, the relationship between class membership and the distal outcome should then show up as variation across classes in the intercept, rather than the mean, of the distal outcome. What puzzles me is that the difference test results for intercepts across classes (using Model Constraint) differ depending on whether I standardize my control variables (with Define: standardize) or not. I am unsure whether I should use standardized or unstandardized control variable scores, and how one or the other approach affects the interpretation of the differences in intercepts of my distal outcomes across classes.
Thank you!

Bengt O. Muthen posted on Thursday, February 26, 2015 - 7:29 am

Please send the 2 outputs to support@statmodel.com along with your license number.

S Elaine posted on Monday, March 09, 2015 - 12:59 pm

I'm using 6 indicators of children's socioemotional functioning in an LPA.
Relationships among the majority of the indicators are, as expected, low to moderate. However, there are 3 pairs with the following r values: .690** .714** .652**

I'm under the impression that correlations among the indicators are expected for LPA, and did not use WITH statements for my LPA. However, I see comments in articles that mislead me, such as this:
"We note that (a) vulnerability appraisals, (b) depression, (c) injuries at incident, (d) physical health functioning, and (e) social relations--both positive and negative--were primarily selected for inclusion in the LPA analysis based for the substantive and theoretical reasons discussed early. In addition and prior to the LPA analysis, we examined the bivariate correlations among these variables as well as between each of these variables and the IPV exposure variables. As anticipated, these analyses showed statistically significant relationships among the variables. Nonetheless, no statistical relationship was so strong (i.e., .7 and higher) to indicate that we had measured similar constructs with these various measures." Nurius & Macy (2010)

Should I have included the WITH statements due to the fact that I have pairs of indicators that are strongly correlated?

S Elaine posted on Monday, March 09, 2015 - 1:00 pm

As a follow up to my previous post, the resulting model was characterized as follows:

Fit stats for the 3 profile model I selected are:
Log-likelihood (-6291.148); AIC (12634.297); BIC (12729.803); Adjusted BIC (12647.352); Entropy (.92); LMRT|BLRT p-value (03|0); No. classes with n<5% study sample (0)

The lowest Average Latent Class Probability was .95

Thank you!
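
As a rough consistency check on the fit statistics above: if the 3-profile model used class-specific means, class-invariant variances, and no within-class covariances (an assumption, though it matches the statement that no WITH statements were used), the free-parameter count can be worked out and plugged into AIC = -2 logL + 2p. The function names are illustrative, and the sample size is not reported in the post, so BIC is given only as a formula.

```python
import math

def lpa_free_parameters(n_indicators, n_classes):
    """Free parameters with class-specific means, class-invariant variances,
    no within-class covariances, and n_classes - 1 class probabilities."""
    means = n_indicators * n_classes
    variances = n_indicators
    class_probabilities = n_classes - 1
    return means + variances + class_probabilities

def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    return -2.0 * loglik + n_params * math.log(n_obs)

# 6 indicators and 3 profiles, with the log-likelihood reported in the post.
p = lpa_free_parameters(6, 3)
aic_check = aic(-6291.148, p)
print(p, round(aic_check, 3))
```

The result agrees with the reported AIC of 12634.297 up to rounding of the log-likelihood.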

Bengt O. Muthen posted on Monday, March 09, 2015 - 6:18 pm

An LPA with at least 2 classes implies that the items are correlated; they are all influenced by the same latent class variable. If some items correlate more that could either signal the need for one more class or the need for WITH statements among some of the items in the 2-class model.

S Elaine posted on Monday, March 09, 2015 - 6:48 pm

Thank you. I should have stated that the correlations I listed were the coefficients for the sample as a whole. Am I correct in interpreting that you mean if some items correlate more within a class, then WITH statements among some of the items are needed? If so, I do not have this issue for correlations among the items within each of the three classes; they are zero or very weak.

Danli Li posted on Thursday, March 12, 2015 - 6:45 am

Hi

I was wondering if it is possible to carry out LPA on a sample size of less than 100, with 14 latent class indicators that are expected to form 3 or 4 classes?

Bengt O. Muthen posted on Thursday, March 12, 2015 - 8:26 am

That's a small sample size for a model with so many parameters. But if the classes are well separated it may work ok.

Alvaro Camacho posted on Friday, March 27, 2015 - 9:48 am

Hello,
We have a 3 class solution of an anxious-depression construct (low, moderate and high). We used the high class as the reference to run our logistic model to determine the association with ethnic groups.
The reviewer for a paper would like us to use the low class as the reference category.

This is part of the syntax that we use (excluding USERVARIABLES):
CLUSTER = psu_id;
STRAT = strat;
WEIGHT = weight_final_norm_overall;
MISSING ARE all (999);
IDVARIABLE IS ID;
CLASSES = C(3);
ANALYSIS: TYPE = COMPLEX MIXTURE;
STARTS = 500 50;
K-1STARTS = 20 5;
MODEL:
%OVERALL%
C#1 C#2 ON
mex_do mex_ca
mex_cu mex_pu
mex_sa mex_oth
income_c4 AGE
LANG_PREF GENDERNUM EDUCATION_C3 ;
OUTPUT: SAMPSTAT TECH11;

Where C1=low C2=moderate C3=high

I wonder if we need to change
C#3 C#2 ON and that will give C1 (low) as the reference category?

Thx much
Al

Bengt O. Muthen posted on Friday, March 27, 2015 - 11:52 am

You don't show your USEV list so I don't know what your latent class indicators are. Let's call the latent class indicators y1-y2. To change the class order you want to use the SVALUES option of the Output command to get statements with final estimates in the output and then use those statements to give starting values in a new run with STARTS = 0, where you have changed the class order. For instance, say that your first run has
SVALUES

class 1: [y1*5 y2*10]
class 2: [y1*1 y2*5]

If you want class 1 last, the starting values for your second run are

class 1: [y1*1 y2*5]
class 2: [y1*5 y2*10]

Make sure that the two runs have the same loglikelihood value. If not, you have to give starting values for more parameters.

Alysia Blandon posted on Wednesday, April 08, 2015 - 8:00 am

I have a couple of questions about running a latent profile analysis:

(1) Is it possible to do the DU3step procedure when using multiply imputed data?

(2) Is it possible to include control variables when running the DU3step procedure? So control for earlier levels of a behavior when exploring whether there are differences in the distal outcomes across the different classes?

Thanks!

Tihomir Asparouhov posted on Wednesday, April 08, 2015 - 12:07 pm

On both questions: yes - but using the manual DU3step approach. See Section 3 in
http://statmodel.com/download/webnotes/webnote15.pdf

Alysia Blandon posted on Friday, April 10, 2015 - 12:50 pm

Hi Tihomir,

Thanks. I have a couple of follow-up questions.

(1) Is it possible to use the manual DU3step approach if I am using ALGORITHM = INTEGRATION, INTEGRATION = MONTECARLO to run the latent profile analyses with covariates that have missing data? (rather than impute the data)

(2) I realized that when I run the LPA with imputed data TECH11 AND TECH14 are not available, are there recommendations for choosing the model with the best number of classes in these situations?

Thanks!

Tihomir Asparouhov posted on Friday, April 10, 2015 - 3:55 pm

(1) Yes

(2) I would run/use tech11/tech14/BIC for each imputed data set separately and use the number of classes obtained most frequently across the imputed data sets. If the amount of imputed data is not substantial I doubt there will be any differences in the class enumeration across the imputed data sets.

Chris posted on Friday, April 10, 2015 - 7:45 pm

Hello - new MPLUS user. I have performed a Latent Profile Analysis with a set of 8 continuous variables for about 370 observations, resulting in a 9-class best-fit solution. Everything appears to be working great, however I am trying to get to the bottom of local independence for our particular case. For a similar question above Bengt recommended calling for RESIDUAL, MODINDICES, and maybe TECH12, as well as looking at in-class correlations if Entropy was >.8, which ours is.

Taken together, I am unclear on what, if anything, may be wrong, and how best to assess any issues. Regarding the last method you mention in particular: if I were to create correlation matrices for the 8 variables within each of the 9 classes separately, model assumptions are such that within-class correlations should be minimal or non-existent, correct? What if we find some correlations here that are moderate or strong? Is this method the best indicator of local dependence? Or would relying on RESIDUAL and MODINDICES output be better? I am unclear on the best way to test for local dependence in the LPA case without TECH10.

Bengt O. Muthen posted on Sunday, April 12, 2015 - 5:29 pm

Q1 Right.

Q2. Then you add WITH for those pairs.

Q3-Q4. That's a methods research question that I don't think is resolved.

You can also try Factor Mixture Modeling, where you introduce a factor and try different numbers of classes. We have UG examples of that. Large factor loadings can indicate which pairs of variables are in need of a WITH statement (instead of the factor).

HwaYoung Lee posted on Friday, April 24, 2015 - 9:11 am

Hello Dr. Muthen,
I ran LPA using six indicators (physical activities, individual smoking, second-hand smoke, vitamin use, restaurant meals and alcohol use).
When adding one more class, fit indices kept decreasing (this never resolved).
One issue is that the two-class model had only one best log L value. Increasing the number of starting values didn't help to resolve this issue. Among the indicators, individual smoking and second-hand smoke were correlated (r= .5), so a residual covariance between these two indicators was added to the model. But the model with the residual cov still had only one best log L. [I don't want to use a Factor Mixture model, because these indicators, except for the two smoking variables, were not related (correlation coefficients were small)].
However, when adding one more class (e.g., a three-class model), a couple of best log L values were replicated.
So, here are my questions.
1) Do I just ignore the two-class model and go to a three-class or four-class model?
2) When I compared the model without the residual cov and the model with the residual cov, class membership was substantially different (1st model: c1=2267, c2=801; second model: c1=3033, c2=35). Also, entropy substantially improved (.858 to .999). But adding one more class (c-class model) gave better fit than the c-1 class model with the residual cov.
Do I choose a model with residual cov? Any suggestions would be greatly appreciated.

Bengt O. Muthen posted on Friday, April 24, 2015 - 5:06 pm

I would use BIC, but to do that in a trustworthy way you need to replicate the best 2-class logL. Here are some suggestions:

If your variables are on very different scales (large diffs in variances), use Define to change that.

Use many starts, such as starts = 800 200.

Perhaps you need a smaller degree of perturbation, so use STSCALE = 1.

Use a 1-factor factor mixture model to see which pairs of indicators have the largest loadings and therefore could use WITH added to the original LPA.

Randi J Bertelsen posted on Wednesday, May 13, 2015 - 2:50 pm

Hi, I have data from a questionnaire where approx 7000 women were asked how they would describe themselves (body shapes) at different ages. Each question had 9 pictures of bodyshapes varying from very thin to very thick and they were asked to choose the one they thought resembled themselves at different ages (8 yrs, menarche, 30y and "now").So there are 4 variables with 9 categories and some missing (at random). I did a simple LCA model:

VARIABLE:NAMES ARE id2 wom8y womarche wom30y wom45y womow wompm ;
USEV = wom8y womarche wom30y womow ;
IDVARIABLE IS id2;
CLASSES = c (6);
CATEGORICAL = wom8y womarche wom30y womow;
MISSING ARE ALL (-9999);
ANALYSIS: TYPE = MIXTURE;
ALGORITHM = INTEGRATION;
STARTS= 1000 100;
SAVEDATA:FILE IS bodywomenLCA.dat;
SAVE = CPROB;
I did find a good fit for 6 classes, but I want to take into account the age-effect so that wom8Y comes before womarche and womow is always the last "observation".
I tried by adding:
Model: %OVERALL% i s | wom8y@1 womarche@2 wom30y@3 womow@4;
But get the following Error Message: This analysis is only available with the Mixture or Combination Add-On.
Am I using a wrong approach here?

Bengt O. Muthen posted on Thursday, May 14, 2015 - 6:41 am

Please send these 2 outputs to support along with your license number.

Randi J Bertelsen posted on Thursday, May 14, 2015 - 11:42 pm

Thank you. I found that there were some issues with the program I had on my Laptop. But when I ran the syntax on my stationary computer I got the following error:
*** ERROR in MODEL command
The categorical variables in the growth model do not have the same number of
categories. Use the CATEGORICAL option to allow the number of categories to
differ for maximum likelihood estimation. Problem with: I S
One of my ordinal categories had categories scoring from 1-8, and the others 1-9, so I added the following command: "CATEGORICAL = wom8y(1-8)| womarche(1-9)| wom30y(1-9)|womow(1-9)";
But I still get the error saying that the categorical variables in the growth model do not have the same number of categories.

Bengt O. Muthen posted on Friday, May 15, 2015 - 7:57 am

Check the (*) option on page 544 of the UG.

Merlijn Venus posted on Monday, June 22, 2015 - 4:35 am

Dear Bengt and Linda,

I just performed my first latent profile analysis with 5 indicators. I have 2 questions.

1) My estimated within-class means are outside the range (i.e., >5 on a 5-point scale). What could this indicate?

2) Ultimately I would like to see whether my latent profile variable (c) predicts a distal outcome (y) above and beyond the latent class indicators (u1 - u5), because the indicators are typically associated with y. Can I simply perform the appropriate 3-step approach and also regress y on u1-u5, or does it require something else?

Thank you for your reply.

Best,
Merlijn

Bengt O. Muthen posted on Monday, June 22, 2015 - 2:19 pm

1) Please send output and license number to support.

2) A 1-step model is not identified when both the latent class variable and all its indicators predict a distal, so I would not trust a 3-step approach that attempts this. The indicators would be associated with the distal if the latent class variable predicts the distal (they have a common cause), so that is not an argument for including both types of predictors of the distal.

Tony Bonadio posted on Wednesday, September 16, 2015 - 4:01 pm

Hello Drs. Muthen,

We are running a latent profile analysis using the 8 syndrome subscales from the Child Behavior Checklist (CBCL; parent report measure) and the 8 syndrome subscales from the Youth Self Report (YSR; parallel youth report) as indicators. We are interested in incorporating multiple informants to identify distinct subgroups of youth and explore patterns of reporter discrepancy. We feel that using all 16 continuous indicators provides information regarding patterns of symptom severity that could be lost by using the standardized difference scores. However, my concern is that we are violating the assumption of local independence as parent and youth are both reporting on the same individual. Although these reports are not highly correlated, theoretically they are not independent of each other.

So my questions are:
1) Do I need to be concerned about this violation?

2) If so, would allowing the parameters between corresponding indicators (e.g., CBCL Withdrawn Depressed and YSR Withdrawn/Depressed) to correlate WITHIN class account for the shared variance attributed to the multiple reporters for each child?

Thanks,

Tony

Bengt O. Muthen posted on Wednesday, September 16, 2015 - 5:23 pm

Yes, you should be concerned (a bit). If you can include the 16 variables for both parent and youth, a 32-variable analysis would account for the dependent observations. This is called a wide approach.

S.Arunachalam posted on Friday, October 02, 2015 - 6:42 pm

Respected. Prof. Muthen. I am running a simple LPA with one continuous variable, two latent classes. I tried two models, one with variance of the variable being equal (default setting) and the other variance relaxed to be un-equal.
The Tech 11 outputs for the two models are completely different. Webnote 14 is greatly useful in deciding the no. of classes; however, the variables are categorical in that webnote. Please advise on how to decide whether the variance should be set to equal or unequal, and on the number of classes. I am just running a simple test model before getting to my project, which has 3 continuous variables.

--- Equal variance --

LO-MENDELL-RUBIN ADJUSTED LRT TEST
Value 28.594
P-Value 0.0000
------ Unequal variance ----

LO-MENDELL-RUBIN ADJUSTED LRT TEST
Value 29.511
P-Value 0.5927

Bengt O. Muthen posted on Sunday, October 04, 2015 - 4:52 pm

Decide based on BIC and interpretability.

S.Arunachalam posted on Sunday, October 04, 2015 - 6:53 pm

Thanks a lot.
BIC for equal variance is 61.667; for unequal variance it is 66.894. So I could potentially choose the equal one? Could you advise on interpretability please, as it is subjective. Should I be checking whether the p-value of the variance in the unequal model is statistically significant? Right now it is insignificant. In sum, given the lower BIC and insignificant equal variance, shall I go for the equal-variance (default) model?

The class counts and proportion are almost equal
--equal variance--
1 101 0.50249
2 100 0.49751
--unequal variance--
1 91 0.45274
2 110 0.54726

---> May I kindly understand the reasons as to why in Mplus the default setting is equal variance, though that is a restricted model of a more general un-equal variance please.

Bengt O. Muthen posted on Monday, October 05, 2015 - 5:25 pm

I cannot advise on interpretability.

Mixture models can get too flexible so that allowing class-varying variances can create problems such as small classes with very small variance.

S.Arunachalam posted on Monday, October 05, 2015 - 6:12 pm

Thank you Prof. Muthen.

Simon Coulombe posted on Saturday, October 17, 2015 - 9:32 am

Hi, I have obtained a three-profile solution using 6 continuous indicators. I now want to examine how different continuous, ordinal and binary covariates are related to profile membership. Theoretically, some of these covariates are more likely to be predictors and others more likely to be outcomes. However, all indicators and covariates were measured at the same time.

For continuous outcome, I was planning to use Auxiliary DCON.
For binary predictors, R3STEP.
However, for continuous and ordinal predictors, I'm not sure. Is it appropriate to use R3STEP? Should I treat them as outcome instead and use DCON or DCAT?

Sorry if this might be obvious, but I'm relatively new to this type of analysis.

Thank you very much!

Linda K. Muthen posted on Saturday, October 17, 2015 - 3:29 pm

The AUXILIARY option works differently for covariates versus distal outcomes. For covariates, the coefficients are partial regression coefficients. For distal outcomes, each one is done independently of the others. For covariates, you should use R3STEP. For a continuous distal, BCH is recommended. For categorical distals, DCAT is recommended.

Simon Coulombe posted on Saturday, October 17, 2015 - 3:57 pm

Thank you very much for the fast answer.

I have three more sub-questions:
a) Is it ok to use ordinal or continuous predictors (covariates) with R3STEP (auxiliary)?
b) For ordinal distal, can I use DCAT?
c) The variables that I consider to be "outcomes" have been measured at the same time as the profile indicators variables, so is it ok to consider them "distal outcomes" (the study is cross-sectional)?

Thank you so much for your clarifications. This is so helpful!

Simon

Bengt O. Muthen posted on Sunday, October 18, 2015 - 11:01 am

a) Like in regular regression such ordinal variables are treated as continuous. So, yes.

b) Yes.

c) Yes, all that distal means in this context is that they don't predict latent class membership and don't measure/indicate the latent class variable (they are not part of forming the latent classes).

For recommended 3-step methods, see the tables at the end of Web Note 21.

Simon Coulombe posted on Sunday, October 18, 2015 - 8:13 pm

Thank you very much!

Simon Coulombe posted on Wednesday, October 21, 2015 - 7:14 am

Hi, I'm still having trouble deciding if I should treat one of the covariates as an outcome or as a predictor of latent class membership.

Is this just a theoretical matter? Basically, I want to know whether the variable is associated with latent class membership. I feel that treating the variable as a predictor or as an outcome will give a similar answer to this question (although with different statistics). Is that right?

Thanks again! Your answers are very helpful.

Simon

Jon Heron posted on Wednesday, October 21, 2015 - 8:51 am

Hi Simon

your approach should ideally reflect your thinking about causality.

It's true that in some cases - e.g. a logistic model with a single binary covariate - you can swap them round with no effect, however more generally this is not the case.

Not only does your chosen approach potentially lead to a different interpretation, you may also be making different assumptions. Zuzana Bakk discusses this in her paper about the LTB method (which turns distal outcomes into predictors of class membership)

http://members.home.nl/jeroenvermunt/bakk2014b.pdf

best, Jon

Bengt O. Muthen posted on Wednesday, October 21, 2015 - 3:39 pm

If the variable is chosen as an outcome it is specified as independent of the latent class predictors conditional on latent class. If it is one more predictor that is not the case.

An outcome is like one more latent class indicator.

Simon Coulombe posted on Thursday, October 22, 2015 - 3:43 pm

Thank you very much.

ywang posted on Thursday, November 12, 2015 - 8:25 am

I have a quick question on class counts in the LCA output. The output provides two sets of class counts: one is FINAL CLASS COUNTS AND PROPORTIONS and the other is class counts based on MOST LIKELY LATENT CLASS MEMBERSHIP. I read the previous posts, where it was recommended to report the former. However, when I checked the file saved with save=Cprob, it gave me the class counts based on the latter. Could you please clarify? Thanks.

Linda K. Muthen posted on Thursday, November 12, 2015 - 9:58 am

The CPROBABILITIES option gives the posterior probabilities for each class and also gives most likely class membership. You should report FINAL CLASS COUNTS AND PROPORTIONS.

coulombe.simon@uqam.ca posted on Monday, November 30, 2015 - 8:40 pm

Hi,

I have conducted an LPA with distal outcomes (BCH). I'm reporting the results from the EQUALITY TESTS OF MEANS ACROSS CLASSES. I'm wondering what exactly S.E. refers to.

Is there a way to transform S.E. into S.D.?

Thanks in advance.

Simon

Bengt O. Muthen posted on Tuesday, December 01, 2015 - 5:27 pm

The SE is the SD of the estimate.

coulombe.simon@uqam.ca posted on Tuesday, December 01, 2015 - 5:35 pm

Thanks for the answer. So, I can simply report SE as SD, if I understand correctly.

Linda K. Muthen posted on Wednesday, December 02, 2015 - 6:52 am

Yes.

Laura Healy posted on Thursday, December 03, 2015 - 6:50 am

Hello,

I have used LPA within some of my research to create profiles of goal motivation within student athletes. I have also used the AUXILIARY function to analyse between-profile differences in a range of outcomes. Having received some feedback from a peer-reviewer, I was wondering if it is possible to generate effect sizes for the analyses?

Kind regards,
Laura

Bengt O. Muthen posted on Thursday, December 03, 2015 - 6:15 pm

You can compute (outside Mplus) the ratio of the mean difference and its SE.

Michelle Lalonde posted on Sunday, December 06, 2015 - 6:46 pm

Hello,
Thank you for the post indicating that it is not a requirement that all items be measured on the same scale and have similar variances when conducting an LPA.

Would you please be able to provide a title for a paper indicating this (for citation purposes)?

Thank you,

Michelle

Bengt O. Muthen posted on Monday, December 07, 2015 - 1:39 pm

I think there is a paper by Vermunt and Magidson where they compare k-means clustering and LCA/LPA and they might have mentioned this in that context. It's a chapter in the Applied Latent Class Analysis book by Hagenaars and McCutcheon.

coulombe.simon@uqam.ca posted on Sunday, December 13, 2015 - 5:55 pm

Hi,

I have run a latent profile analysis and I'm examining the profiles' associations with categorical covariates with the R3STEP command. However, there are missing values on the covariates (under 5% for all covariates, except 10% for one of them). What is the best way of dealing with these missing values ?

Bengt O. Muthen posted on Sunday, December 20, 2015 - 5:33 pm

There is really no good way to deal with that using R3STEP. Multiple imputation creates limited options in the next step. If you are just interested in one variable at a time you can use DCAT instead.

Katrina Brewsaugh posted on Sunday, February 14, 2016 - 7:18 am

I'm running an LPA with 4 observed continuous variables. Some of the models with the best fit indices (BIC, LMR, etc.) had low proportions of final starts reaching convergence and/or reaching the best LL. Should I re-run these models using the final start values obtained via SValues?

Bengt O. Muthen posted on Sunday, February 14, 2016 - 5:14 pm

Try getting more best logLs by using a smaller degree of random starts perturbation: STSCALE=1.
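
A minimal sketch of where this setting goes. The STARTS values here are only illustrative, not a recommendation from the thread.

```mplus
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 500 100;   ! 500 initial-stage starts, 100 final-stage
  STSCALE = 1;        ! smaller random perturbation of start values
```

The point of the smaller perturbation is that more of the final-stage runs should end at the same best loglikelihood, which can then be checked in the list of final-stage loglikelihood values in the output.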

coulombe.simon@uqam.ca posted on Wednesday, February 17, 2016 - 10:05 am

Hello

I have conducted a latent profile analysis with the default Mplus setting (means freely estimated across classes, but variances held equal). I have learned that allowing free variance estimation could also be pertinent. What is your point of view on this, and in what context would it be most pertinent?

Also, concerning model selection (number of profiles) what would be your advice when the BIC, CAIC and AIC suggest four profiles, but the VLMR and ALMR suggest three? Also, the level of entropy is the highest for three profiles (0.90 vs 0.85 for four profiles)

Thanks for your point of view.

Simon

Bengt O. Muthen posted on Wednesday, February 17, 2016 - 4:54 pm

Q1. I don't think you want to freely estimate the variances to be different across groups - that can cause problems of tiny classes. I think you want to first analyze with equal variances across classes. Then check the variation in each class for those classified into it and see if any of the classes need a free variance for any of the variables.

Q2. I would simply stick to BIC.

coulombe.simon@uqam.ca posted on Thursday, February 18, 2016 - 10:37 am

Thanks for the answers. Here is a follow-up on these questions.

Q1. How would you proceed to check such variation within each class (what to look for in the Mplus output)?

Q2. From some of my readings, I thought that it was appropriate to stop adding more profiles as soon as the VLMR and/or ALMR indicates so?
And what about the entropy? Shouldn't we favour the profile with the best entropy?

I just want to make sure that I understand the rationale. Thanks in advance for your support.

Simon

Bengt O. Muthen posted on Thursday, February 18, 2016 - 5:02 pm

Q1. Classify people by most likely class and then check.

Q2. If the indices disagree, I would go with BIC. I would not choose a model based on entropy - it's like R2 in SEM: A model can have a bad R2 but a good fit and a model can have a good R2 but bad fit.
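
For Q1, one possible workflow is sketched below. The names (y1-y4, lpa.dat, cprobs.dat) are hypothetical, and the choice of class 2 and y3 is only an example: save the most likely class from the default model, inspect the within-class variation outside Mplus, then free a variance only where it seems needed.

```mplus
! Run 1: default LPA (variances equal across classes)
DATA:
  FILE = lpa.dat;
VARIABLE:
  NAMES = y1-y4;
  CLASSES = c(3);
ANALYSIS:
  TYPE = MIXTURE;
SAVEDATA:
  FILE = cprobs.dat;
  SAVE = CPROBABILITIES;   ! posterior probs and most likely class

! Run 2 (only if needed): add a MODEL command that frees
! one variance in one class
! MODEL:
!   %OVERALL%
!   %c#2%
!   y3;                    ! mentioning it here frees it in class 2
```

The two runs can then be compared on BIC, in line with the advice for Q2.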

Katrina Brewsaugh posted on Saturday, February 20, 2016 - 9:14 am

I lowered the degree of random starts using STSCALE as suggested and the model did not converge. I've tried using the svalues and increasing the number of starts to 500/100 (also no convergence). Is it appropriate to now conclude that this model has failed, even though in my initial runs it had the best fit on multiple indices (that had only 10% converge)? As this is my first time doing LPA, I'm not sure if the results I'm getting are due to 'user error'.

Linda K. Muthen posted on Saturday, February 20, 2016 - 10:33 am

Please send your output and license number to support@statmodel.com.

coulombe.simon@uqam.ca posted on Sunday, February 21, 2016 - 10:54 am

Hello,

Thank you very much for the time you devote to answering our questions.

Concerning my question 1 above (whether or not I should allow free variance across classes), I have produced a classification with the default setting (equal variance across classes), then I have classified people by most likely class and looked at the standard deviation within each class for each indicator. How large a difference in standard deviation (or variance) between classes would warrant freeing the variance across classes? Should I use a statistical test (e.g. Levene) to determine this?

Another question: I want to predict class (profile) membership from a list of several binary variables (sex, etc.) plus one continuous variable (age). Before including all these variables at the same time in the R3STEP command (multivariate), is there a way I can test them one by one as a first clean-up, retaining only the few most likely to be pertinent for the R3STEP? (My sample size is relatively limited, so I want to limit the number of variables.)

Last question: Can the command 'auxiliary (e)' be used with binary auxiliary variable?

Thanks for everything. If these are too specific questions and you prefer to refer me to relevant literature, I would totally understand.

Simon

Bengt O. Muthen posted on Monday, February 22, 2016 - 5:56 pm

Q1 Just look for very large variance differences - it's up to your judgement and trial and error.

Q2. Use BCH or DCAT, treating each variable at a time as a distal outcome.

Q3 Don't use "e" - it is outdated. See the summary tables at the end of the paper on our website:

Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Using the BCH method in Mplus to estimate a distal outcome model and an arbitrary second model. Web note 21.

coulombe.simon@uqam.ca posted on Monday, February 22, 2016 - 6:33 pm

Thank you so much for your answer.

Q2: In my question above, I was specifically talking about testing predictors of profile membership. If I understand correctly, it's OK to use BCH or DCAT in this context, even if I conceptualize the variables as predictors and not outcomes? Is this the only way to test the variables one by one? Or should I instead use R3STEP and re-run the analysis with a different variable as the predictor each time?

Another question: I want to produce "equality of means" tests to characterize the profiles on the very variables used for deriving the profiles (the indicators). In this case, is it OK to use R3STEP? It has been suggested that I use "auxiliary (e)" in this context because these variables are neither predictors nor outcomes.

Thank you so much!

Simon

Bengt O. Muthen posted on Tuesday, February 23, 2016 - 6:00 am

With one variable, viewing it as a predictor or distal outcome is the same thing statistically - conditional on the latent classes you assume uncorrelatedness between this variable and the latent class indicators. With several variables used as covariates in R3STEP you don't assume uncorrelatedness among the covariates (conditional on the classes).

R3STEP does not provide equality tests of those means. Use BCH (or DCAT for categorical). Read web note 21.

Sara N. posted on Tuesday, February 23, 2016 - 1:47 pm

Hi,
I have conducted a LPA analysis with three indicators and the best fitting model is a two-class model. Using the three-step approach, I then regressed my classes on continuous predictor and moderator as well as their interaction terms.
However, the percentage of cases in each class changes because some of the cases are dropped from the analysis in the third step. I tried to include the variances in the overall model, but I get an error message. Is there any possible way to keep the cases? Below is the Mplus code for step 3:
Thank you!

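VARIABLE:
  ! (DATA command and NAMES list not shown in the post)
  usevar = class X Z Product;   ! Product is created in DEFINE
  missing = all(999.000, 9999.000);
  classes = c(2);
  nominal = class;              ! most likely class from step 1

DEFINE:
  Product = X*Z;                ! predictor-by-moderator interaction

ANALYSIS:
  type = mixture;
  starts = 0;
  processors = 4(starts);

MODEL:
  %OVERALL%
  c ON Product (b3)
       X (b1)
       Z (b2);

  ! Step-3 logits fixed at the values taken from the step-1 run
  %c#1%
  [class#1@1.170];
  %c#2%
  [class#1@-3.038];

MODEL CONSTRAINT:
  NEW(LOW_W MED_W HIGH_W OR_LO OR_MED OR_HI);
  ! Moderator at mean - 1 SD, mean, and mean + 1 SD
  LOW_W = 4.805 - sqrt(1.182);
  MED_W = 4.805;
  HIGH_W = 4.805 + sqrt(1.182);

  ! Odds ratio for X at each level of the moderator
  OR_LO = exp(b1 + b3*LOW_W);
  OR_MED = exp(b1 + b3*MED_W);
  OR_HI = exp(b1 + b3*HIGH_W);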

Bengt O. Muthen posted on Wednesday, February 24, 2016 - 4:13 pm

You can do multiple imputation first. And then do the 3rd step repeatedly over the imputation draws. But you would have to compute the SEs yourself based on the usual imputation formulas.
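
Assuming "the usual imputation formulas" refers to Rubin's combining rules, the pooling over M imputed data sets can be written as follows. Here \hat{\theta}_m and SE_m denote the step-3 estimate and its standard error from the m-th imputation. The notation is introduced here for illustration and does not come from the Mplus output.

```latex
\bar{\theta} = \frac{1}{M} \sum_{m=1}^{M} \hat{\theta}_m
\qquad \text{(pooled estimate)}

\bar{W} = \frac{1}{M} \sum_{m=1}^{M} SE_m^{2}
\qquad \text{(within-imputation variance)}

B = \frac{1}{M-1} \sum_{m=1}^{M} \left( \hat{\theta}_m - \bar{\theta} \right)^{2}
\qquad \text{(between-imputation variance)}

T = \bar{W} + \left( 1 + \frac{1}{M} \right) B,
\qquad SE_{\text{pooled}} = \sqrt{T}
```

The same pooling would be applied separately to each coefficient of interest (for example b1, b2 and b3 in the input above).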

TA posted on Wednesday, February 24, 2016 - 5:05 pm

Hi,

Is the auxiliary dcon/dcat available in 7.0 Mplus?

Thanks.

Bengt O. Muthen posted on Wednesday, February 24, 2016 - 6:22 pm

Not sure off hand - check Version History on our website - or just try and see if you get stopped.

afshin zilanawala posted on Monday, February 29, 2016 - 6:26 am

hi,

i'm doing a latent profile analysis and have extracted 3 profiles using 4 continuous indicators. i'd like to look at a distal, binary outcome of the profiles.

(a) can DCAT handle sample weights?
(b) if i am using the subpopulation command, do i need to be doing the manual steps by first saving class probabilities, then following the example in Appendix A in the appendices to Web Note 15?
(c) if the automatic method can handle the subpopulation command, do i instead follow the automatic syntax of Appendix A?

thanks so much!

Taylor BC posted on Wednesday, March 02, 2016 - 7:49 am

I am conducting an LPA with 11 continuous indicators and 2 dichotomous covariates. I have been comparing models using random starts and things seem to be going smoothly: no warnings aside from variables not being correlated within classes, and the fit indices are helping me choose a solution. After reading through all of these threads, however, I am wondering if I need to try running the models with specified start values. I basically have picked arbitrary values (i.e., 0 and 100) because it is not clear to me how to choose the start values. I am getting different results from my original models with random starts. How do I know whether I need to worry about specifying start values?

Tihomir Asparouhov posted on Wednesday, March 02, 2016 - 3:01 pm

To afshin

(a) No
(b) There is no manual for BCH
(c) Appendix A is for 3 step methods only, not DCAT

I would use BCH instead of DCAT.

afshin zilanawala posted on Thursday, March 03, 2016 - 1:06 am

hi Tihomir,

thanks for the help. i have one follow up question:

in previous posts i've noticed that DCAT is recommended for categorical outcomes, but you suggested BCH for a binary outcome? can you point to some insight on this? thanks.

Tihomir Asparouhov posted on Thursday, March 03, 2016 - 9:48 am

The mean for the binary is the same as the probability of being one. You can use also the 3-step and/or the BCH manual approaches with categorical variable specification as an alternative.

Also take a look at
http://statmodel.com/examples/webnotes/webnote21.pdf

Eric posted on Thursday, March 31, 2016 - 2:42 pm

Dear all,

I have conducted a LPA with seven variables and a 2 or 3-class solution, respectively.

Is there any way to test if the means of the variables differ between the classes?

Bengt O. Muthen posted on Thursday, March 31, 2016 - 3:20 pm

Yes, just label the means in each class and then use Model Test.
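
A minimal sketch of this for one indicator in a 3-class model. The indicator name y1 and the labels m1-m3 are hypothetical.

```mplus
MODEL:
  %OVERALL%
  %c#1%
  [y1] (m1);      ! mean of y1 in class 1
  %c#2%
  [y1] (m2);
  %c#3%
  [y1] (m3);
MODEL TEST:
  0 = m1 - m2;    ! both lines are tested jointly (Wald, 2 df)
  0 = m2 - m3;
```

To test a single pair of classes, keep only the corresponding line in MODEL TEST.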

anonymous Z posted on Tuesday, April 05, 2016 - 12:37 pm

Dear all,

I want to examine the occurrence pattern of different drug use (drug A, drug B, and drug C) across four time points. The distribution of all these drugs includes a preponderance of zeros. I am thinking of doing a joint trajectory latent class analysis, where class membership is determined by the change trajectories of drug A, drug B, and drug C. I know how to do joint trajectory LCA if the outcome variables are normally distributed. But how should I go about joint trajectory LCA when my outcome variables include a lot of zeros?

Thanks so much!

Bengt O. Muthen posted on Tuesday, April 05, 2016 - 6:56 pm

You can use categorical, count, or two-part modeling. All are described in the UG.

anonymous Z posted on Wednesday, April 06, 2016 - 8:52 am

Dr. Muthen,

Thanks for your response. I have experience with two-part modeling and LPA separately, but I have never integrated them into one analysis. What should the syntax look like? I cannot find a relevant example in the User's Guide. Could you give an example?

Thanks so much!

Bengt O. Muthen posted on Wednesday, April 06, 2016 - 6:05 pm

I assume you would want an LCA for the binary part of your two-part approach. So you would have two latent class variables that you can relate to each other using WITH. I have no such example but it seems possible to do.

Shraddha Kashyap posted on Wednesday, April 13, 2016 - 1:56 am

Drs Muthen,

I'm not sure how to interpret the output of a multinomial logistic regression which I used after conducting an LPA (using the auxiliary function).

Estimate S.E. Est./S.E. P-Value

Is the "estimate" beta, or an exponential B?

Thanks!

Bengt O. Muthen posted on Wednesday, April 13, 2016 - 12:48 pm

It is a beta in logit scale. See the end of Chapter 14 in our UG for a description of multinomial logistic regression.
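
For reference, with K classes, a single covariate x, and the last class as the reference class, the model and the conversion to an odds ratio can be written as below. The symbols a_k and b_k are notation introduced here for the intercepts and slopes.

```latex
P(c = k \mid x) = \frac{\exp(a_k + b_k x)}{\sum_{j=1}^{K} \exp(a_j + b_j x)},
\qquad a_K = b_K = 0

\log \frac{P(c = k \mid x)}{P(c = K \mid x)} = a_k + b_k x,
\qquad \mathrm{OR}_k = \exp(b_k)
```

So exponentiating the reported estimate gives the odds ratio for being in class k rather than the reference class per one-unit increase in x.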

Shraddha Kashyap posted on Thursday, April 14, 2016 - 8:31 pm

great, thanks!

Paulina Pizarro Laborda posted on Monday, April 18, 2016 - 11:48 am

Hi, I'm doing an LPA with a sample of 31 pre-school teachers. Given the size of the sample, I need to know how many items or variables should be included in the analysis (5, 6 or 7?), or where I can find this information. The articles I have read give criteria for selecting the sample size or the number of latent classes, but not the number of items. Thank you very much.

Linda K. Muthen posted on Monday, April 18, 2016 - 4:36 pm

This sounds like a question for a general discussion forum like SEMNET.

Paulina Pizarro Laborda posted on Tuesday, April 19, 2016 - 2:08 am

Thank you, but if I'm doing latent profile analysis, why is my question one for a general discussion forum like SEMNET? (Are you referring to the Structural Equation Modeling Discussion Network?)
Thank you

Linda K. Muthen posted on Tuesday, April 19, 2016 - 1:40 pm

They cover a variety of topics beyond SEM. There may be another better general discussion forum but I don't know what it would be.

Sara N. posted on Friday, April 22, 2016 - 1:57 pm

Dear Dr. Muthen,
I would like to ask a follow up question regarding your response to my previous post (posted on Tuesday, February 23). From your response, I conclude that including variances in the overall model to keep the cases, which is an option when running SEM and path models, is not available for LCA/LPA models. Is that correct?
Thanks!
Sara

Bengt O. Muthen posted on Saturday, April 23, 2016 - 9:27 am

At the very least, it leads to heavy computations.

Judith Scott posted on Saturday, June 04, 2016 - 1:23 pm

Good day!

I ran 3 latent variable models to identify profiles for three types of parental discipline. For each selected profile solution, entropy was between .997 to 1 and profile probabilities ranged from .996 to 1.

With such high entropy and probabilities, I created profile categorical variables. I then wanted to identify groups of parents across the profiles associated with discipline1, 2 and 3. It was suggested that a cross tabulation would work because of the certainty of the profiles. I then created group categorical variables and used them to predict a distal outcome in the same wave using binary logistic regression.

A peer raised concerns about my methods. I am confused as to what to do:

1) Was treating my profiles as a certainty despite such high entropy and probabilities a mistake?

2) Do I need to use Lanza's method even if my distal outcome is in the same wave and I use chi-squares to followup on my logistic regression in order to understand the relationship?

3) Was the cross tabulation a mistake? it was important to determine whether profiles existed within each discipline because I could not find any other study that demonstrated variance in how people used strategies within a discipline type. I then wanted to show how parents combined the 3 profiles of discipline1, 2 and 3.

Thank you!

WEN Congcong posted on Saturday, June 04, 2016 - 7:30 pm

Greetings,
I have a question about my study. After running the LPA, an extension of the LPA model is to include covariates to predict latent class membership. Theoretically speaking, covariates should be included in the LCA; otherwise, the model may be misspecified, leading to distorted parameter estimates (Muthen, 2004).

As is well known, applications of LCA and LPA usually assume measurement invariance, with covariates influencing only the latent class variable and influencing the observed variables only indirectly via the latent variable. But in practice, the assumption is often violated, which causes problems.

I want to ask you: before proceeding to the LPA, I analyzed the population with multiple-group ESEM and successfully tested factor loading invariance and intercept invariance. Do I still need to include covariates in the LPA model to predict class membership? Please give me some advice. Thank you!

Bengt O. Muthen posted on Sunday, June 05, 2016 - 12:07 pm

It is good that you have established measurement invariance using multi-group analysis. But that is for the mixture of classes, not for each class. So there is still a threat that you may have some covariates that not only influence the latent class variable but also the latent class indicators directly, in which case leaving out the covariates when determining the number of classes can lead you astray.

WEN Congcong posted on Sunday, June 05, 2016 - 5:04 pm

Thanks for your advice!

Sophie T. Hebert posted on Friday, June 10, 2016 - 7:49 am

Hi,
I just received a comment from a reviewer who wants to know about the collinearity of my independent variables in a 3-step LPA-D model. I assume the reviewer is talking about my indicators (n=5). To your knowledge, is there a limit beyond which correlation between the indicators can affect the model?

I have read a lot on this issue but could not find a clear answer. Do you know a reference that can help me figure out whether the correlation between my indicators is problematic? To be more precise, I would appreciate references on the correlation between indicators in LPA.

For info, I have a sample of 315 and I got 30 parameters estimated for a 3-profile model. The entropy is 97%.

Thank you for your advice!

Bengt O. Muthen posted on Friday, June 10, 2016 - 10:36 am

I am not sure the reviewer is referring to the indicators - they are not independent variables. Perhaps you have covariates in the model? You may want to ask on SEMNET about how to spot collinearity.

Dennis Li posted on Thursday, July 28, 2016 - 1:21 am

Hello,
Please forgive my beginner-level question, but I am trying to understand how Mplus handles missing data. I am running an LPA with 10 indicators, each representing age at a certain developmental milestone (similar to Michael Marshal's post on June 18, 2007). Not all milestones may have occurred for all individuals, and non-occurrence is theoretically as important as occurrence; however, non-occurrence is coded as missing. How might I get LPA to consider such structural zeros as part of the patterning, or does it already do that? Would it be better to recode structural zeros with an actual value (e.g., 0 or 100) to differentiate them from true missing values?

Bengt O. Muthen posted on Thursday, July 28, 2016 - 9:17 am

Perhaps a two-part approach is suitable where the occurrence/non-occurrence is also modeled, not only the age. See the paper on our website under Factor Mixture:

Kim, Y.K. & Muthén, B. (2009). Two-part factor mixture modeling: Application to an aggressive behavior measurement instrument. Structural Equation Modeling, 16, 602-624.

Dennis Li posted on Friday, July 29, 2016 - 5:41 pm

Thank you, Dr. Muthen, for that reference. I am in the process of applying it to my research question to see if it fits my needs.

In the meantime, I read a few times on this message board that modification indices perform strangely for mixture models. My entropy for k>2 is currently above .8, but I am tempted to put in some correlated indicators based on MIs to improve those numbers, with some theoretical justification as well. First, do you think the MI values are trustworthy to follow, and second, would it be better to specify equivalent correlations across classes through the %overall% model or allow correlations to vary by class?

Bengt O. Muthen posted on Friday, July 29, 2016 - 5:50 pm

I would not rely on MIs here but instead specify class-invariant WITH statements to see where they are needed.
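
A minimal sketch of such a specification, with hypothetical indicator names y1 and y2. It relies on the covariance being mentioned only in the overall part of the model, so that it is held equal across classes.

```mplus
MODEL:
  %OVERALL%
  y1 WITH y2;     ! within-class covariance, equal across classes
  ! Class-varying alternative (mention it in a class-specific part):
  ! %c#1%
  ! y1 WITH y2;
```

Models with and without a given WITH statement can be compared on BIC to see where a covariance is needed.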

Tenelle Porter posted on Wednesday, September 21, 2016 - 1:42 pm

I am running an LPA clustering on three continuous variables. The model runs fine with 1 cluster. When I run two or more clusters, I get the following error:

THE ESTIMATED COVARIANCE MATRIX FOR THE Y VARIABLES IN CLASS 2 COULD NOT BE INVERTED. PROBLEM INVOLVING VARIABLE T1MGCP. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 11. CHANGE YOUR MODEL AND/OR STARTING VALUES. THIS MAY BE DUE TO A ZERO ESTIMATED VARIANCE, THAT IS, NO WITHIN-CLASS VARIATION FOR THE VARIABLE. THE LOGLIKELIHOOD DECREASED IN THE LAST EM ITERATION.CHANGE YOUR MODEL AND/OR STARTING VALUES.

I used up to STARTS = 8000 2000

Any suggestions? Thanks

Bengt O. Muthen posted on Wednesday, September 21, 2016 - 2:45 pm

If that variable's variance is zero it means that there aren't 2 classes with respect to that variable. You can either drop the variable or hold its mean equal across the 2 classes.
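
The second option can be sketched as follows for the variable named in the error message. The label m is arbitrary; giving the mean the same label in both classes constrains it to be equal.

```mplus
MODEL:
  %OVERALL%
  %c#1%
  [t1mgcp] (m);   ! same label in both classes
  %c#2%
  [t1mgcp] (m);   ! holds the mean equal across classes
```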

Tenelle Porter posted on Wednesday, September 21, 2016 - 3:15 pm

Thanks.

afshin zilanawala posted on Monday, October 03, 2016 - 4:16 am

i'm following up on a conversation with Tihomir on march 3 whereby it was indicated that a binary outcome used in a BCH analysis should be interpreted as the mean and is the same as the probability of being one. how does one write results for the estimates presented in 'model results'? if i take the intercept coefficient and let's say it's .10, can i talk about this as i would in a linear probability model? should i say a 10% probability of the outcome being 1, or should i say a 10% prevalence? thanks!

Bengt O. Muthen posted on Monday, October 03, 2016 - 6:57 am

The BCH part of the output talks in terms of means - which as you say can be expressed as either a 10% probability or prevalence. But the regular Model results output is in terms of thresholds (the negative of an intercept).
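
Assuming a logit link for the binary outcome, a threshold \tau_k reported for class k converts to a probability as follows. The numerical value is made up for illustration.

```latex
P(u = 1 \mid c = k) = \frac{1}{1 + \exp(\tau_k)}

\text{e.g. } \tau_k = 2.197:\quad
P(u = 1 \mid c = k) = \frac{1}{1 + e^{2.197}} \approx \frac{1}{1 + 9} = 0.10
```

A positive threshold therefore corresponds to a probability below 0.5, which is why the sign is the reverse of an intercept.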

afshin zilanawala posted on Monday, October 03, 2016 - 7:30 am

thanks Bengt. my follow up question to your response is i'm following example 3.1 in Webnote 21, and focusing on my results from the 2nd step. Tihomir suggested using BCH when the Y is binary, thus, in the example from the webnote, how would you interpret the coefficient on X? it would still be mean or prevalence? thanks very much!

Bengt O. Muthen posted on Monday, October 03, 2016 - 8:09 am

We need to see your output - please send to Support along with your license number.

Youjoung Lee posted on Thursday, October 20, 2016 - 10:33 pm

Hello,

I'm running an LPA and I'm wondering if I can use MIs to make the model fit better. Or are there other indices I can refer to in terms of model fit?

Bengt O. Muthen posted on Friday, October 21, 2016 - 11:07 am

Use BIC. MIs don't always work well with mixtures.

Youjoung Lee posted on Sunday, October 23, 2016 - 12:54 am

Thanks, Dr. Muthen!

Here is one more question. If I'd like to add some within group covariances, can I refer to MIs?

Bengt O. Muthen posted on Sunday, October 23, 2016 - 1:19 pm

not sure they are great for that either because adding a covariance may change the class formation.

Youjoung Lee posted on Monday, October 24, 2016 - 11:39 pm

I'm trying to run an LPA with 5 continuous variables. I have not yet been able to decide on the number of classes for my data, but I'm assuming that it will be more than 2.

After reviewing the previous discussion, I found out that if I'd like to try conditional dependence, running a factor mixture model is a better option for me.

1) where can I find a syntax for factor mixture modeling with continuous variables?
2) Can I do exploratory factor mixture modeling?

Thank you for your advice in advance!

Bengt O. Muthen posted on Tuesday, October 25, 2016 - 10:04 am

You will find examples of both types of analyses in the User's Guide.

Jenny Gu posted on Wednesday, October 26, 2016 - 4:38 am

Hi Dr. Muthen,

I've run latent profile analysis and used the 'standardize' option under the 'define' command to obtain standardised estimates for each indicator under each profile. I originally noted down the 'STDYX' estimates but noticed that these values differ from the values shown in the plot, which seem to just correspond with the default estimates under model results (e.g., for one profile, the STDYX estimate for one indicator is 1.06 but it looks like it's 0.6 on the plot). Is there any way to create a standardised plot of STDYX values? Or am I approaching this incorrectly and the 'default' model results estimates (which correspond to the plot) are the standardised values I need (rather than the STDYX values)?

Many thanks in advance!

Jenny

Bengt O. Muthen posted on Wednesday, October 26, 2016 - 2:34 pm

You say

"used the 'standardize' option under the 'define' command to obtain standardised estimates"

This option does not do that.

The standardize option in the Define command is not related to the standardized option in the Output command. In Define you standardize the observed variables by their sample variances. In Output we standardize parameter estimates by estimated variances.
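
The two options sit in different commands, as in this sketch (indicator names y1-y5 are hypothetical):

```mplus
DEFINE:
  STANDARDIZE y1-y5;   ! z-scores the observed variables (sample based)
OUTPUT:
  STDYX;               ! standardizes estimates (model based)
```

With the DEFINE option, the class means under MODEL RESULTS are already expressed relative to a sample mean of 0, which is also what the plot shows.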

Jenny Gu posted on Friday, October 28, 2016 - 10:09 am

Hi Bengt,

Sorry, I meant to say, does the 'standardize' option under 'define' standardise the means of indicators so that positive values are above the mean (0) and negative below the mean? And the values I need from the output (for standardised means) are just the ones under default 'model results', rather than under 'STDYX'? I hope this makes sense!

Thanks very much,
Jenny

Bengt O. Muthen posted on Friday, October 28, 2016 - 12:19 pm

Q1 Yes.

Q2. This is hard to answer given that I don't know what you are after. Perhaps this answer is helpful: Standardizing the variables doesn't imply that you don't need the standardized output. The latter standardization is with respect to the model-estimated variances, not the sample variances used in standardizing the variables.

Jenny posted on Friday, October 28, 2016 - 12:44 pm

Thanks very much Bengt, that's cleared things up for me!

Hamed Q. Saremi posted on Monday, November 14, 2016 - 1:11 pm

Dear Dr. Muthen;

I have a quick question about how to interpret the "model results" in an LPA in Mplus.

I am doing an LPA on five standardized continuous factors. The "model results" section in the output file has a column for two-tailed p-values for each of the five variables in each class. My understanding is that this is the p-value of comparison between the mean of a variable in a class to the mean of that variable in the whole sample. In my case, because variables are standardized, it would be comparison of mean of the variable in a class with zero (mean of the variable in the sample). Is my understanding correct? If so, what is the justification for using mean of the whole sample as the threshold for comparison?

Thank you
Hamed

p.s., a sample of results:

MODEL RESULTS

                                                    Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

Latent Class 1

 Means
    Z1AVAP            -1.432      0.088    -16.288      0.000
    Z1AVAC            -1.752      0.076    -22.900      0.000
    Z1AVPF            -1.974      0.139    -14.168      0.000
    Z1AVSB            -1.196      0.078    -15.278      0.000
    Z1AVSS            -1.075      0.075    -14.254      0.000
    Z1AVPC            -0.914      0.113     -8.120      0.000

Bengt O. Muthen posted on Monday, November 14, 2016 - 5:50 pm

The p-value refers to whether the estimate is significantly different from zero - not a comparison against other means.

Hamed Q. Saremi posted on Wednesday, November 23, 2016 - 7:27 pm

Thank you, Dr. Muthen;

Regarding a different LPA, I am using four continuous variables and one binary variable. The binary variable simply indicates whether a company has a CIO. When I run the analysis, regardless of the number of latent classes that I specify (C(#)), I get a non-positive definite error message. I have tried to set starting values manually, but the problem persists. What should I do to resolve this issue? Is there any specific setting needed when a mix of continuous and binary variables is used?

Any help will be much appreciated;

Thank you
Hamed

p.s., The error message is:

"THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE
CONDITION NUMBER IS 0.546D-13. PROBLEM INVOLVING THE FOLLOWING PARAMETER:
Parameter 13, %C#1%: [ CIO$1 ]"

Bengt O. Muthen posted on Friday, November 25, 2016 - 11:06 am

There is nothing specific to a mix of variable types. You should look at the distribution of the binary variable in the different classes using Most Likely Class (saving cprobs). To be able to say more, we need to see the full output - send to Support along with license number.

Christian Gross posted on Thursday, February 09, 2017 - 11:41 am

Dear Dr. Muthen,

I have two related questions about the Wald test of means equality across classes using the automatic BCH procedure. We have a 4 latent profile solution and 1 continuous distal outcome variable.
- Is it possible to interpret a sig. mean difference between two profiles in light of a non-sig. overall test (see output below: class 2 vs. 4)?
- What exactly does the overall test tell us (we thought: it tells if there is any mean difference, i.e., at least between 2 profiles)?

Many thanks for your help!
Christian

Output:
                     Mean       S.E.

Class 1            -0.203      0.168
Class 2             0.152      0.132
Class 3            -0.003      0.089
Class 4            -0.492      0.258

               Chi-Square    P-Value

Overall test        6.189      0.103
Class 1 vs. 2       2.581      0.108
Class 1 vs. 3       1.089      0.297
Class 1 vs. 4       0.880      0.348
Class 2 vs. 3       0.840      0.359
Class 2 vs. 4       4.951      0.026
Class 3 vs. 4       3.212      0.073

Bengt O. Muthen posted on Thursday, February 09, 2017 - 6:10 pm

Q1: Yes

Q2: Right
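
A small Python sketch that reproduces two of the posted p-values from their chi-square values. It assumes the overall test has 3 degrees of freedom (four class means give three independent differences) and each pairwise test has 1; these degrees of freedom are an assumption, not something stated in the output. The function names are hypothetical.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

overall_p = chi2_sf_df3(6.189)      # overall test from the posted output
pair_2_vs_4_p = chi2_sf_df1(4.951)  # class 2 vs. 4 from the posted output
print(f"overall: {overall_p:.3f}, class 2 vs. 4: {pair_2_vs_4_p:.3f}")
```

Under these assumptions the overall chi-square of 6.189 is larger than the pairwise 4.951, yet its p-value is larger too, because it is referred to a distribution with more degrees of freedom.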

Christian Gross posted on Friday, February 10, 2017 - 6:39 am

Thanks for your immediate answer - I really appreciate it!

However, maybe I am missing something obvious, but I'm still wondering why the p-value of the overall test is .103 (non-sig.) although the output indicates a mean difference between class 2 and 4?

Thanks again for your help!
Christian

Bengt O. Muthen posted on Friday, February 10, 2017 - 10:33 am

What's your sample size?

Christian Gross posted on Saturday, February 11, 2017 - 2:07 am

268

Tihomir Asparouhov posted on Monday, February 13, 2017 - 2:39 pm

The omnibus test has less power to detect significance than pairwise comparison.

With larger sample size there should be no such logical contradiction.

If the number of classes is larger, one should worry about "one out of 20 is supposed to show as significant even if they are not", but with 4 classes I wouldn't worry about that. Multiple testing is an issue if the classes don't have a clear definition. In that case consider these independent variables
Xi ~ N(0, Vi), for i = 1, ..., 4
Var(X_max - X_min) is not Var(X_max) + Var(X_min)
You can simulate this and determine what this variance is and thereby adjust for multiple testing. More reading in that direction is here
https://en.wikipedia.org/wiki/Extreme_value_theory#Univariate_theory
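
One way such a simulation could look, as a rough Python sketch. It assumes four independent normal variables with mean zero and, purely for illustration, uses the standard errors of the four class means posted earlier in the thread as the standard deviations sqrt(Vi). The function names `simulate_extremes` and `variance` are hypothetical.

```python
import random

def simulate_extremes(sds, n_draws=50_000, seed=1):
    """Draw independent N(0, sd^2) variables and record the max and min of each draw."""
    rng = random.Random(seed)
    maxima, minima = [], []
    for _ in range(n_draws):
        draws = [rng.gauss(0.0, sd) for sd in sds]
        maxima.append(max(draws))
        minima.append(min(draws))
    return maxima, minima

def variance(xs):
    """Population variance of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

# Assumption: S.E.s of the four class means stand in for sqrt(Vi)
sds = [0.168, 0.132, 0.089, 0.258]
maxima, minima = simulate_extremes(sds)
ranges = [hi - lo for hi, lo in zip(maxima, minima)]

print("Var(X_max - X_min):      ", variance(ranges))
print("Var(X_max) + Var(X_min): ", variance(maxima) + variance(minima))
# Simulated 95th percentile of the largest pairwise difference under equal means
print("95% point of the range:  ", sorted(ranges)[int(0.95 * len(ranges))])
```

The simulated 95th percentile gives a reference point for the largest observed difference among the class means when no pair was singled out in advance.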

Randy Mowes posted on Thursday, February 23, 2017 - 6:48 am

Hello,

I am running an LPA with five classes and when I use the save = CPROBABILITIES command, I receive a file in which the probabilities are always 1 or 0. E.g. for a person in the fourth cluster 00010 or for a person in the second cluster 01000. I would expect to see a value between 0 and 1 per cluster, with one of them being higher than the other which would then lead to a person being sorted into that cluster, but not all probabilities being 1 for one cluster and 0 for all other clusters.

What might cause this and how can I receive realistic class assignment probabilities?

Danyel Moosmann posted on Thursday, February 23, 2017 - 7:46 am

Hello,

I have been trying to run an LPA model since yesterday, but the command window has said "waiting for thread 2" since yesterday. Any advice? This doesn't happen when I run the model without starts, but I want to make sure that I'm replicating the best loglikelihood. I'm following the steps from mplus web notes no.14. When I add the below, this is when the issue arises.

starts = 100 20;
processors = 4(starts);

Thanks in advance,
Danyel

Bengt O. Muthen posted on Thursday, February 23, 2017 - 6:22 pm

Randy Mowes:

That is an unusual result. To answer your question we need to see your files - send to Support along with your license number.

Bengt O. Muthen posted on Thursday, February 23, 2017 - 6:24 pm

Danyel Moosmann:

That sounds like thread 2 has malfunctioned somehow; don't wait for it. If you are running version 7.4, send your input and data to Support along with your license number. If not, update to 7.4.

Allison Stuppy-Sullivan posted on Tuesday, March 07, 2017 - 1:49 pm

I am completely new to Mplus and am trying to run a latent profile analysis on 2 continuous variables to fit 2 to 4 classes. I am likely doing a million things wrong. I put together the following syntax for the 3-class model (having adapted it for the 2 and 3 class), and received the error at the bottom of the screen:

Title: 3-Class Latent Profile Analysis;
Data: file is RPTTMplus.dat;
variable:
names=x1-x2;
MISSING ARE ALL (-999.00000);
classes=c(3);
analysis:
type=mixture;
model:
%OVERALL%
%C#1%
x1-x2;
%C#2%
x1-x2;
%C#3%
x1-x2;

OUTPUT:
TECH1 TECH5 TECH8;
PLOT:
TYPE=PLOT3;
SERIES IS x1(1) x2(2);

SAVEDATA:
FILE IS myfile3c.dat;
SAVE = CPROBABILITIES;

*** ERROR in ANALYSIS command
TYPE=MIXTURE is not available for multiple group analysis.

Does anyone have any advice? I am a complete newb.

Bengt O. Muthen posted on Tuesday, March 07, 2017 - 5:49 pm

We need to see your full output - send to Support along with your license number.

Daniel Brown posted on Thursday, March 16, 2017 - 6:10 am

Dear Mplus team,

I was wondering whether you could please tell me if the COOKS SAVEDATA/PLOT option used in Mplus represents Cook's distance or generalized Cook's distance? Or are these statistics the same thing?

Thanks in advance,
Daniel

Bengt O. Muthen posted on Friday, March 17, 2017 - 5:27 pm

Cook's distance.

Daniel Brown posted on Monday, March 20, 2017 - 3:58 am

Great, thank you.
Dan

Gen Li posted on Wednesday, April 26, 2017 - 6:06 am

Dear Dr. Muthen,
I'm doing my first LPA now and have some problems with adding a covariate to predict the classes.
1. I want to add "age" as a covariate, so I add "C ON age" to the model statement. Will age then also be included as an indicator for the LPA? If so, how can I avoid that?
2. If I don't include the covariate in the model but instead perform an additional multinomial logistic regression with the covariate as the independent variable and class as the dependent variable, will the result be the same as when the covariate is included in the LPA model?
Thank you very much!

Bengt O. Muthen posted on Wednesday, April 26, 2017 - 2:54 pm

1. No, saying c on age does not imply that age is another indicator of c. But it will change class formation (which can be ok).

2. No, the results won't be the same. Read our article:

Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. Download appendices with Mplus scripts.

Bengt O. Muthen posted on Wednesday, April 26, 2017 - 2:59 pm

See also our videos and handouts from the short course in Utrecht 2012 on our website.

Morgan DeBusk-Lane posted on Tuesday, May 30, 2017 - 6:47 pm

Likely a simple question. My apologies in advance.

I'd like to test for statistical differences between class means on a three class LPA model.

Would using the posterior probability output and therefore the most likely class membership to generate class means per indicator variable by class membership be logical to compare across the classes?

I assume there is a more accurate method, as the means from this and those derived from the output are different.

Thank you.

Linda K. Muthen posted on Wednesday, May 31, 2017 - 6:49 am

You can use MODEL TEST to test these means as part of the analysis. See MODEL TEST in the user's guide.

Morgan DeBusk-Lane posted on Wednesday, May 31, 2017 - 4:34 pm

Thank you for the quick reply.

Because my LPA model does not include any specific model instructions, what is the best way to go about labeling each class' variable to use with MODEL TEST? As I understand it, I need to label each class' input variable so that it will compare it with another class' variable mean.

This is what I have so far--however, it is making my three class model come out very different than without it.

I've only labeled the first two input variables for the first two classes to see if it works... it does not.

DATA: FILE = W3_HSMS_LPA.csv;
VARIABLE:
Names = (cleared for this blog post);
UseVariables = SE_ZIdea SE_ZMech SE_ZSR;
Classes = c(3);
MISSING ARE all(-99);
ANALYSIS: type = mixture;
Starts = 1000 100;
STITERATIONS = 50;
Process = 4 (Starts);
MODEL:
%c#1%
SE_ZIdea (m1);
SE_ZMech(m2);
SE_ZSR (m3);
%c#2%
SE_ZIdea (m4);
SE_ZMech(m5);
SE_ZSR (m6);
Model Test:
M1 = M4; !simply trying to see if this works.
OUTPUT:
(many requested outputs)

Thank you again.

Bengt O. Muthen posted on Wednesday, May 31, 2017 - 6:08 pm

Instead use brackets to refer to means. Your input (using the variable names) talks about variances which is not what you want.

Daniel Lee posted on Sunday, August 27, 2017 - 3:09 pm

Hi Dr. Muthen,

I estimated a series of LPA models and the model with 3 profiles (entropy = .95) fit the data better than the model with 2 profiles (entropy = .85). In addition, model fit between the 3 and 4 profile model were not significantly different.

One of the profiles in the model with 3 profiles, however, consisted of less than 10 participants (of 150 [total sample]). I really want to examine differences between these profiles, but a sample size of 10 seems just too small. I'm wondering if I should just exclude the 10 cases (i.e., 1 profile) from further analysis due to the small sample size, or just examine profiles from the model with 2 profiles (despite the fit being worse). Any insight on the matter is greatly appreciated. Thank you!

Rich Mohn posted on Thursday, August 31, 2017 - 8:16 am

Hello,

Can you use ranked (ordinal) data in an LCA/LPA? . . . either with all ranked data or in combination with continuous data?

Daniel Lee posted on Thursday, August 31, 2017 - 1:34 pm

I apologize about my previous post. Upon reading it further, I realized that I didn't articulate my question properly. Here is the question:

I estimated a series of LPA models and the model with 3 profiles fit the data better than the model with 2 profiles. In addition, model fit between the 3 and 4 profile model were not significantly different. I would like to conduct a multigroup analysis (since entropy = .95) with groups representing the profile classification (group 1 = profile 1). In each group, I would run a simple regression analysis.

One of the profiles, however, consisted of less than 10 participants (of 150 [total sample]). I really would like to conduct simple regression analyses within these profiles, but a sample size of 10 just seems too small. I'm wondering if I should exclude the 1 profile consisting of 10 participants from further analysis and just examine the desired effects (Y on X's) within the two profiles with decent sample size. Would this be possible?

Bengt O. Muthen posted on Thursday, August 31, 2017 - 4:57 pm

Mohn:

I don't think so.

Lee:

I would just delete these 10.

Anshuman Sharma posted on Thursday, September 28, 2017 - 6:49 pm

Hi Dr. Muthen,

I want to regress Categorical latent variable on a Continuous latent variable (example 7.19 of chapter 7 Mplus user guide). Having said that, I don't want the categorization to be affected by the Continuous latent variable or the indicators corresponding to the Continuous latent variable.

Shall I follow the commands presented in the Ex. 7.19 as it is? Following is my try:

VARIABLE: NAMES ARE y1-y6 u1-u8 s1-s8 T1-T25;
USEVARIABLES ARE y1-y6 T1-T25;
CLASSES = c (3);

ANALYSIS: TYPE = MIXTURE;
STARTS = 1000 100;
ITERATIONS = 20;
LRTBOOTSTRAP = 200;
LRTSTARTS =20 5 100 25;
ALGORITHM=INTEGRATION;
MODEL: %overall%
TR BY T1-T25;
c on TR;
%c#1%
y1-y6; ! y1-y6 are continuous
%c#2%
y1-y6;
%c#3%
y1-y6;
OUTPUT: TECH7 TECH11 TECH14;
The objective is that the categorization of c be based only on y1-y6, not on TR or T1-T25.

Thank you so much.
Regards,
Anshuman

Bengt O. Muthen posted on Friday, September 29, 2017 - 1:40 pm

See the manual approach described in Web Note 15.

Anshuman Sharma posted on Saturday, September 30, 2017 - 7:10 pm

Hi Dr. Muthen,

Thank you so much. Most of the things are quite clear now. I am using Mplus version 7. Logits for the Classification Probabilities for the Most Likely Latent Class Membership (Column) by Latent Class (Row) are not available in the output. Does Mplus version 7 provide this output?

Thanks.

Bengt O. Muthen posted on Sunday, October 01, 2017 - 12:45 pm

If you don't see it - your version doesn't have it (it is provided automatically).
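
For readers whose version lacks the logit table, a hedged Python sketch of the hand calculation. It assumes each logit is the log of a classification probability relative to the last column of the same row, as in the manual approach of Web Note 15; check that orientation against your own output. The probabilities below are hypothetical, and the function name `classification_logits` is made up for this example.

```python
import math

def classification_logits(prob_rows):
    """For each row, return log(p_j / p_last); raise if any probability is zero."""
    logits = []
    for row in prob_rows:
        if any(p <= 0 for p in row):
            raise ValueError("a zero probability makes the logit undefined")
        reference = row[-1]
        logits.append([math.log(p / reference) for p in row])
    return logits

# Hypothetical classification probabilities (each row sums to 1)
probs = [
    [0.90, 0.06, 0.04],
    [0.05, 0.85, 0.10],
    [0.02, 0.08, 0.90],
]
for row in classification_logits(probs):
    print(["%.3f" % value for value in row])
```

The sketch deliberately stops with an error when a probability is zero rather than guessing at a substitute value.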

Anshuman Sharma posted on Sunday, October 01, 2017 - 1:28 pm

Thanks. I will do it manually.

Regards,
Anshuman

Anshuman Sharma posted on Sunday, October 01, 2017 - 3:59 pm

Hi Dr. Muthen,

Sorry to bother you again. Some of the logits are undefined because the corresponding classification probabilities are zero. Here is a summary.

Entropy: 0.964
Average latent class probabilities for the Most Likely Latent Class Membership (Column) by Latent Class (Row)
         1        2        3
1    0.981    0.003    0.016
2    0        1        0
3    0.013    0        0.987

Classification Probabilities for the Most Likely Latent Class Membership (Column) by Latent Class (Row)
         1        2        3
1    0.987    0.000    0.013
2    0        0.994    0
3    0.016    0        0.984

As you can see, some of the probabilities are zero, so the corresponding logits are undefined. Could you please suggest how I can go forward? Also, am I doing something wrong?

Thanks and regards,
Anshuman

Bengt O. Muthen posted on Monday, October 02, 2017 - 5:04 pm

See how it is done in this paper

Nylund-Gibson, K., Grimm, R., Quirk, M., & Furlong, M. (2014): A latent transition mixture model using the three-step specification. Structural Equation Modeling: A Multidisciplinary Journal, 21, 439-454.
contact first author

Anshuman Sharma posted on Monday, October 02, 2017 - 5:40 pm

Thank you so much, Dr. Muthen.

Regards,
Anshuman

JuliaSchmid posted on Monday, October 09, 2017 - 5:21 am

Above, you recommended loglikelihood contribution to detect outliers in a data set. Is there an accepted cut-off value for this analysis? Furthermore, I'm wondering, if there is a paper, which refers to this topic. Thank you in advance!

Bengt O. Muthen posted on Monday, October 09, 2017 - 10:35 am

We refer to this in our book Regression and Mediation Analysis using Mplus. I don't think there are any rules of thumb for cutoffs - instead, see how results are affected by deleting the observation with the largest outlier value.

Angie Starrett posted on Tuesday, October 10, 2017 - 8:52 am

Below is the error message I am getting from Mplus. I get this message when running a 2-class and 3-class LPA solution. The variable MCcomp is mean-centered perceived competence. This is skewed data as most values are high. The resulting variance in class 2 (either model) is zero. I have tried different starting values, but I get the same error. What should I do to clear up this error message?

THE ESTIMATED COVARIANCE MATRIX FOR THE Y VARIABLES IN CLASS 2 COULD NOT
BE INVERTED. PROBLEM INVOLVING VARIABLE MCCOMP. COMPUTATION COULD
NOT BE COMPLETED IN ITERATION 11. CHANGE YOUR MODEL AND/OR STARTING
VALUES. THIS MAY BE DUE TO A ZERO ESTIMATED VARIANCE, THAT IS, NO WITHIN-CLASS
VARIATION FOR THE VARIABLE.

Linda K. Muthen posted on Tuesday, October 10, 2017 - 9:11 am

You should send the output and your license number to support@statmodel.com.

JuliaSchmid posted on Wednesday, October 11, 2017 - 2:01 am

Thank you, Bengt, for your answer above!
We run a LPA with raw data. However, to interpret the profiles, we would like to have an output with standardized (z-scores) values of each profile. How can we do this in Mplus?

Bengt O. Muthen posted on Wednesday, October 11, 2017 - 2:25 pm

The output gives you the estimated mean and variance for the LPA indicators in each class - use those to compute z-scores.
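
One possible reading of this advice, sketched in Python with hypothetical numbers: combine the class proportions, class means, and class variances of one indicator into the mixture-implied overall mean and variance, then express each class mean as a z-score on that scale. The function name and all values below are assumptions for illustration only.

```python
import math

def standardized_class_means(props, means, variances):
    """Express class means as z-scores using the mixture-implied overall mean and SD."""
    overall_mean = sum(p * m for p, m in zip(props, means))
    # Overall variance = average within-class variance + between-class variance
    overall_var = sum(p * (v + m * m)
                      for p, m, v in zip(props, means, variances)) - overall_mean ** 2
    overall_sd = math.sqrt(overall_var)
    return [(m - overall_mean) / overall_sd for m in means]

# Hypothetical estimates for one indicator in a 3-class model
props = [0.30, 0.45, 0.25]
means = [12.0, 18.5, 25.0]
variances = [9.0, 9.0, 9.0]
print(standardized_class_means(props, means, variances))
```

Positive values then indicate a class mean above the model-implied overall mean, negative values a class mean below it.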

Gregory M. Dams posted on Wednesday, December 20, 2017 - 9:02 am

Dr. Muthen, I have some questions regarding checking LPA assumptions.

For the measurement model:
I'm aware that LPA assumes that the indicator variable distributions within classes are normal. How problematic is it when small clusters (n = 12-33; 2-6% of sample) do not have normal distributions of the indicator variables? What should be done in response/should this model be rejected because of the normality assumption violation?
Also, other than the assumption that data is missing at random, are there other assumptions that must be checked in the measurement model?

For the structural model:
Must the variables predicted by latent class membership be normally distributed? Are there other assumptions that must be checked in the structural model?

Bengt O. Muthen posted on Wednesday, December 20, 2017 - 12:15 pm

Take a look at the article on our website:

Muthén, B. & Asparouhov T. (2015). Growth mixture modeling with non-normal distributions. Statistics in Medicine, 34:6, 1041-1058. DOI: 10.1002/sim.6388

This shows how using a skew-t distribution within each class can reduce the number of classes due to non-normality within class. Your small class may be due to picking up strong non-normality. You also want to check that your variables are uncorrelated within class, for instance using WITH statements to see significant deviations.

Same considerations for the structural model.

You can also categorize very skewed variables and consider them ordinal instead.

Gregory M. Dams posted on Thursday, December 28, 2017 - 8:36 am

Thank you very much for your quick reply Dr. Muthen!

Vinu Ilakkuvan posted on Thursday, December 28, 2017 - 9:37 am

I am conducting an LCA/LPA with both continuous and dichotomous variables. I started with an LPA with just the continuous variables (which capture different dimensions of social media use) and based on BIC/VLMR/etc determined the 5 class model was the best. Then, I added the dichotomous variables (which capture different dimensions of digital device ownership) and ran all the models again - I was surprised to see the latent class composition was essentially identical (and the 5 class model again the best fit, since nothing really changed as a result of adding the dichotomous variables).

Does this suggest the digital device questions add no value in terms of distinguishing latent classes (and therefore I shouldn't include them in the LPA/LCA)? Or is this because the digital device questions are dichotomous and the sm use questions are continuous so the greater variation available with the sm use questions ends up driving the formation of the latent classes? Or is it b/c digital device access really isn't meaningful in distinguishing groups? Or does it suggest something is off or wrong with the models? If anyone has any thoughts/insights, I would greatly appreciate it, thank you!

Bengt O. Muthen posted on Thursday, December 28, 2017 - 1:06 pm

Take a look at our tech note on variable-specific entropy

http://www.statmodel.com/download/UnivariateEntropy.pdf

This shows you how each variable contributes to the classification quality.

Vinu Ilakkuvan posted on Thursday, December 28, 2017 - 7:30 pm

Thank you so much for the response, this is very helpful!

Gregory M. Dams posted on Wednesday, January 17, 2018 - 2:09 pm

Good day Dr. Muthen,
I ran an LPA with three continuous variables and initially found a 5-cluster class-Varying, diagonal solution to be optimal following the recommendations of Masyn's (2013) chapter on LCA and LPA. However, the 5 cluster solution has violated the assumptions of within-cluster normality. Following your recommendation to examine skewed T distribution models, I compared the candidate models from my initial LPA to Skewed T, T-distribution, and Skew Normal models surrounding (+/- 1 cluster) the normal distribution candidate models. I am having trouble picking a best model with this second round of comparing models.

In particular, I found that the best two models (looking at BIC and cmP values) are a skew normal 2-cluster class-varying, unrestricted model and my initial best model that was a normal 5-cluster class-varying diagonal model. These two models have nearly the same BIC and cmP values, although the 2-cluster model has less than a point lower BIC score. Also, the 2-cluster model has theoretical significance, whereas the 5-cluster model does not have theoretical significance.

1. How do you recommend I pick between models?

2. Is there a preferred statistical test or even a preferred descriptive statistic that should be used to pick between models with different distributions?

Bengt O. Muthen posted on Wednesday, January 17, 2018 - 3:43 pm

When BIC doesn't help deciding, I would go with theoretical significance/substantive interpretation first and parsimony second.

I don't know about a statistical test to choose.

Gregory M. Dams posted on Saturday, January 20, 2018 - 3:40 pm

Thank you for the help!

Gregory M. Dams posted on Thursday, January 25, 2018 - 7:02 pm

Hi Dr. Muthen,
I've run a number of LPA models, checking skewed T, skew normal, normal, and T distributions, however I get many models with an odd replication situation. Such models do not have identical best -2LL value found, however those models have a second-best -2LL value that is within one point of the best -2LL. In such cases, Mplus software states that "The best loglikelihood value has been replicated".

1. Is this problematic? Are the models actually considered replicated in such situations? If no amount of additional random starts actually finds two identical best -2LL values (rather than a second best that is incredibly close), must the model be rejected?

2. I heard that model starts can be seeded in an attempt to increase the chance of finding replication of the seeded values. Do you have syntax you can share for how to seed the starts?

Bengt O. Muthen posted on Friday, January 26, 2018 - 4:13 pm

For some algorithms the precision is not high enough that it is possible to require exact replication.

1. No

2. If you do 2 runs and want different random starts you can use SEED= to start the process differently. This doesn't help in replicating logLs.

JuliaSchmid posted on Thursday, February 08, 2018 - 2:59 am

Hi Bengt and Linda

I analysed my data with an LPA (profiles of motivation and volition, 3-profile solution) and a multinomial logistic regression analysis (independent variable: profiles; dependent variable: known classes of behaviour change).
Mplus calculated the conditional probabilities of "transition". However, the values for class/profile 3 puzzled me, because they seem to contradict the "latent class pattern":

P(CG=1|C=3)=0.196
P(CG=2|C=3)=0.643
P(CG=3|C=3)=0.161

Latent Class
Pattern

3 1; 35; 0.08929
3 2; 12; 0.03061
3 3; 3; 0.00765

C3 is my reference group. When I change the reference group, the conditional probabilities for the same class (now C2 in the output) change drastically:

P(CG=1|C=2)=0.758
P(CG=2|C=2)=0.205
P(CG=3|C=2)=0.038

What should I do? Is there an explanation for these different findings?

Thank you very much in advance!

Bengt O. Muthen posted on Thursday, February 08, 2018 - 10:25 am

It's probably better if you send the 2 outputs to Support along with your license number because it is not clear to me which is the knownclass variable and which is the unknown class variable. "cg" is our standard notation for knownclass but it is seldom a DV, but rather an IV.

Stephen Ungvary posted on Tuesday, February 20, 2018 - 2:52 pm

I am running an LPA with four predictors and found that my best fitting model is one with four latent profiles. As a next step, I am interested in seeing how the classes differ across 8 continuous outcome variables but I would also like to include covariates in my model.

My understanding is that it is not possible to include both covariates and distal outcomes in the same model, whether the 1-step or 3-step approach is used. Is this correct?

If it is, would you recommend that I re-estimate my models with the covariates included (c ON x in the overall command) and then move on to step 3 or including my covariates as latent class predictors?

If my initial assumption is incorrect, do you have a resource that you could point me to? Thank you.

Bengt O. Muthen posted on Tuesday, February 20, 2018 - 3:28 pm

See Web Note 21 on our website, Section 3.2.

I would not include c on x in Step 1.

Stephen Ungvary posted on Wednesday, February 21, 2018 - 7:41 am

Thank you for the quick reply. I have revisited webnote 21.

Based on my understanding of it, it is not possible to simultaneously assess a covariate of the latent classes and distal outcomes in the same model using R3STEP/DU3STEP. With this approach, I would have to first run a model with my covariates, then run a model with my distal outcomes as auxiliary?

However, section 5 of webnote 21 states that I can use the manual 3-step approach and then regress my outcomes on my covariates. I have just done this, but I believe that the output does not capture my research question. If I am interpreting the outcome correctly, it will tell me how Y relates to X within each of the latent classes, but cannot necessarily compare the mean differences between the latent classes.

If a goal of my research is to see how the latent classes differ on the distal outcomes after controlling for covariates, is this how you would recommend addressing it? Is a viable option to use residualized change scores?

Stephen Ungvary posted on Wednesday, February 21, 2018 - 8:49 am

I am amending my response directly above. I ran the BCH method successfully in the section of webnote 21 that you directed me to. The explanation in the first paragraph captures exactly what I hope to accomplish, but I believe I am misunderstanding the output.

This appears to be very similar to what I had done previously in my post above - the output gives me the regression of my outcomes on my covariates within each latent class, but does not compare the mean level of the outcomes across classes. Is this correct?

Bengt O. Muthen posted on Wednesday, February 21, 2018 - 4:04 pm

You get different Y intercepts in the different classes, right? That's because of the influence of the latent class variable on Y.

Stephen Ungvary posted on Thursday, February 22, 2018 - 8:37 am

Thank you again. I now understand that in order to test whether class specific intercepts differ from one another I will need to use the MODEL TEST: command. I would like to have an overall test before examining comparisons between two classes. I am using the following syntax:

MODEL:
%OVERALL%
C ON x1 x2 x3;
y1 ON x1 x2 x3;
%c#1%
y1 ON x1 x2 x3;
[y1](a);
%c#2%
y1 ON x1 x2 x3;
[y1](b);
%c#3%
y1 ON x1 x2 x3;
[y1](c);
%c#4%
y1 ON x1 x2 x3;
[y1](d);

MODEL TEST:
a = b; a = c; a = d;
b = c; b = d;
c = d;

If I attempt to run all of the tests, I am given the error "WALD'S TEST COULD NOT BE COMPUTED BECAUSE OF A SINGULAR COVARIANCE MATRIX."

If I run just one group of test, such as
a = b; a = c; a = d; I do not get an error.

Do you have advice on how to address this? Thank you again.

Bengt O. Muthen posted on Thursday, February 22, 2018 - 4:21 pm

You have redundancies in Model Test. Just say

b = a;
c = a;
d = a;

The idea is - if b, c, d are equal to a, they are equal to each other.
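
A short Python sketch of why the six-line version fails, under the assumption that the Wald test needs linearly independent restrictions: written as contrasts on (a, b, c, d), the six pairwise equalities have rank 3, the same as the three restrictions above. The helper `matrix_rank` is a hypothetical stand-in written with exact fractions, not part of Mplus.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix via Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

# Each row is one restriction on (a, b, c, d), e.g. a - b = 0
all_six = [[1, -1, 0, 0], [1, 0, -1, 0], [1, 0, 0, -1],
           [0, 1, -1, 0], [0, 1, 0, -1], [0, 0, 1, -1]]
three = [[-1, 1, 0, 0], [-1, 0, 1, 0], [-1, 0, 0, 1]]

print("rank of six restrictions:  ", matrix_rank(all_six))
print("rank of three restrictions:", matrix_rank(three))
```

Six restrictions with rank 3 carry three redundant rows, which is consistent with the singular covariance matrix reported in the error message.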

JuliaSchmid posted on Friday, February 23, 2018 - 7:15 am

Hi Bengt or Linda

I am running an LPA with four continuous variables. One of the variables ranges from 0-100 (percent value). The distribution of this variable is bimodal in that 50% of the sample have a value of 0 and 25% of the sample have a value of 100%. I wanted to run an LPA with class-varying variances, but it didn't work. What do you recommend? In general: how should I handle the distribution/variance of the aforementioned variable in the LPA?

Bengt O. Muthen posted on Friday, February 23, 2018 - 4:34 pm

Sounds like a variable that is censored from both below and above. Mplus can handle one but not both. You could cut it into 3 categories and treat it as categorical.

JuliaSchmid posted on Saturday, February 24, 2018 - 2:28 am

Thanks for your answer! One follow-up question: is it possible to run a LPA with 3 continuous variables and 1 categorical variable (with 3 or 4 levels)? Or isn't it allowed to run a LPA with mixed levels of measurement?

Bengt O. Muthen posted on Saturday, February 24, 2018 - 9:02 am

You can have different variable types - see e.g. UG ex7.11

Jan Hoeltge posted on Monday, February 26, 2018 - 3:20 am

Dear Bengt and Linda,

I want to run a LPA and use R3STEP and DU3Step.

This is my code:
data: file = diff8id70ns.dat;

variable: names = x1 x2 x3 x4 x5 x6 x7 x8 id y1;
usevariables are x1 - x8;
classes = class(3);
missing = all(-99);
idvariable is id;
auxiliary = y1(R3STEP);

analysis: type = mixture;
optseed = 181293;

plot: type = plot3;
series = x1(1) x2(2) x3(3) x4(4) x5(5) x6(6) x7(7) x8(8);

savedata: file = diff_38id70nsopt.dat;
save = cprobabilities;

output: tech14 tech11;

The optseed is based on a preliminary step to find the right model.

And this is my problem:
In my preliminary analysis I used a dataset that included the 8 variables to build my classes and the id variable only. When I now use a dataset that additionally contains my independent variable for the R3Step procedure, I get a different solution for my profiles. I can see on the graph and output that the analysis takes the id variable into account now, even if I actually defined it as the idvariable. The same problem persists even if I take out "auxiliary = y1(R3STEP);". So this happened just because I included an additional variable into my dataset.

Do you have any suggestions? Thank you for any help,
Jan

Bengt O. Muthen posted on Monday, February 26, 2018 - 11:20 am

Send the Step 1 output and the R3Step output to Support along with your license number.

JuliaSchmid posted on Tuesday, February 27, 2018 - 12:21 am

I'd like to run a LPA (Class-Varying, diagonal; see input below) with three continuous and one categorical variable. Which estimator do you recommend in this case: WLS, WLSM, or WLSMV?

Thanks for any help!

USEVARIABLES ARE
bmi_1 sskM_1 soFrM_1 svVK4_1;

CLASSES = c(2);
CATEGORICAL = svVK4_1;
Missing = ALL(-999);

ANALYSIS:
TYPE = MIXTURE;
STARTS = 500 50;
STITERATIONS = 50;
LRTBOOTSTRAP = 500;
LRTSTARTS = 50 10 50 10;

MODEL:
%c#1%
bmi_1 sskM_1 soFrM_1 svVK4_1;

%c#2%
bmi_1 sskM_1 soFrM_1 svVK4_1;

Gregory M. Dams posted on Tuesday, February 27, 2018 - 5:40 am

Good day Dr. Muthen,
1. What is the difference between the LPA measurement model with estimated parameters based on the estimated posterior probabilities and the measurement model based on the most likely latent class membership?

2. Which of these models should be presented in a publication as the measurement model?

Bengt O. Muthen posted on Tuesday, February 27, 2018 - 10:45 am

Julia:

LPA being mixture modeling needs either ML or Bayes. WLSMV can't do it.
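
A minimal sketch of how the ANALYSIS command from the question might look with a maximum-likelihood estimator named explicitly. The STARTS and STITERATIONS values are carried over from the posted input; ESTIMATOR = MLR is an assumption here (it is the mixture default, and ML or BAYES could be named instead).

```mplus
ANALYSIS:
  TYPE = MIXTURE;
  ESTIMATOR = MLR;      ! ML with robust standard errors; WLS, WLSM and WLSMV are not available here
  STARTS = 500 50;      ! 500 initial-stage random starts, 50 final-stage optimizations
  STITERATIONS = 50;
```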

Bengt O. Muthen posted on Tuesday, February 27, 2018 - 10:49 am

Gregory:

There are 3 parts to the latent class formation as shown in the output: (1) Based on the estimated model, (2) based on the estimated posterior probabilities, and (3) based on Most likely class.

You should report (1). (2) bases it on what would be called factor scores with continuous latent variables and (3) bases it on which class has the highest posterior probability for each subject.

Gregory M. Dams posted on Tuesday, February 27, 2018 - 4:19 pm

Thank you Dr. Muthen,
I had assumed the estimated model (1) and the model based on posterior probabilities (2) were the same. Could you either explain what makes (1) and (2) different or suggest an article that explains the difference?

Tihomir Asparouhov posted on Tuesday, February 27, 2018 - 4:57 pm

These are the three classifications stated right under the Information Criteria. They are all based on the estimated model and are based on these
(1) P(C|X)
(2) P(C|Y,X)
(3) max_C P(C|Y,X)
where X are the class predictors and Y are the class indicators.
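
Read as class proportions, the three quantities can be written out as below. This is a sketch of one reading, assuming N subjects indexed by i and K classes indexed by k, with f denoting the class-specific density of the indicators; the notation is illustrative and not taken from the Mplus output.

```latex
\begin{align*}
\text{(1)}\quad \hat\pi_k^{(1)} &= \frac{1}{N}\sum_{i=1}^{N} P(C = k \mid x_i) \\
\text{(2)}\quad \hat\pi_k^{(2)} &= \frac{1}{N}\sum_{i=1}^{N} P(C = k \mid y_i, x_i),
\qquad
P(C = k \mid y_i, x_i) =
  \frac{P(C = k \mid x_i)\, f(y_i \mid C = k, x_i)}
       {\sum_{j=1}^{K} P(C = j \mid x_i)\, f(y_i \mid C = j, x_i)} \\
\text{(3)}\quad \hat\pi_k^{(3)} &= \frac{1}{N}\sum_{i=1}^{N}
  \mathbf{1}\!\left[\, k = \arg\max_{j} P(C = j \mid y_i, x_i) \right]
\end{align*}
```

With no X in the model, (1) reduces to the estimated class probability itself, which is one way to see why (1) and (2) are usually the same or very close.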

Jin Qu posted on Wednesday, February 28, 2018 - 7:26 am

I would like to ask how I could better address reviewers' comments from a second-round revision regarding my choice of 4 profiles in LPA. My LPA results showed that the BIC values keep decreasing, and the VLMR and LMR tests favored the 3-profile solution. I picked the 4-profile solution (the line chart of BIC values decreases with a flatter slope after 4 profiles), and it is consistent with the conceptual perspective. However, the reviewers are not convinced that the 4-profile solution should be picked over the 5- or 3-profile solution. I added the bootstrap likelihood ratio test (showing that 4 profiles fit better than 3), but this test cannot confirm that the 4-profile solution is better than the 5-, 6-, or 7-profile solution.

Do you have any suggestions on other things that I could write to justify my decision? Thanks!

Gregory M. Dams posted on Wednesday, February 28, 2018 - 8:23 am

Hi Jin,
One thing you can explore is that the statistics used in model selection have not been found to have the same degree of power to detect the correct number of latent classes. Check out the following article:
Deciding on the Number of Classes in Latent Class Analysis and Growth Mixture Modeling: A Monte Carlo Simulation Study (Nylund, Asparouhov, Muthen, 2007).

Bengt O. Muthen posted on Wednesday, February 28, 2018 - 3:40 pm

Gregory:

I just wanted to add to Tihomir's posting regarding (1) versus (2). They are typically the same or very close. They can differ when there are X's that influence c. They tend to differ when there are several c's with restrictions on their relationships. For instance, in LTA with more than 2 time points, the typical model is lag 1 but if the data follow a lag 2 model, (1) and (2) can differ.

(1) is clearly representing the model which says that the c distribution is a multinomial logistic regression function of X. Instead, (2) draws on information from the Y outcomes as well; this is how you estimate a certain subject's likely class membership.

Neither one of us can think of an article that discusses this.

Bengt O. Muthen posted on Wednesday, February 28, 2018 - 3:44 pm

Jin:

I think it is simplest to rely on BIC. If BIC doesn't hit a minimum, I would try to modify the model by e.g. allowing some residual covariance between the outcomes in some class or all classes. In that way, you may be more likely to find a minimum. As an example with categorical outcomes, see the paper on our website:

Asparouhov, T. & Muthen, B. (2015). Residual associations in latent class and latent transition analysis. Structural Equation Modeling: A Multidisciplinary Journal, 22:2, 169-177, DOI: 10.1080/10705511.2014.935844. Download Mplus files.
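
A minimal sketch of the residual-covariance modification for continuous indicators. The indicator names y1 and y2 are hypothetical; in practice the pair would be chosen from residual diagnostics or theory.

```mplus
MODEL:
  %OVERALL%
  y1 WITH y2;     ! frees the y1-y2 residual covariance, held equal across classes by default
  %c#1%
  y1 WITH y2;     ! optional: mentioning it in a class part gives that class its own value
```

Each freed covariance adds a parameter, so BIC is then compared across numbers of classes again with the modified model.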

Jill Rabinowitz posted on Wednesday, February 28, 2018 - 8:40 pm

Hi there,

I ran an LPA using several continuous indicators measured at 1 time point and I identified a 3-class model that best fit the data. I want to see whether probability of class membership predicts a distal outcome (totext5) while controlling for multiple covariates.

I saved the probabilities of class membership and I conducted a regression which involved regressing the outcome on class 1, 2, and 3 probabilities in the same model. When I try to enter the 3 variables that reflect class membership into the regression model, I get the message below.

How would you recommend proceeding? Should I just run each regression in separate models (i.e., regress the outcome on class 1 probabilities in 1 regression; regress the outcome on class 2 probabilities in another regression, etc)?

THE MODEL ESTIMATION TERMINATED NORMALLY

THE STANDARD ERRORS FOR THE STANDARDIZED COEFFICIENT
COULD NOT BE COMPUTED.

THE CHI-SQUARE STATISTIC IS NEGATIVE.
THE LOGLIKELIHOOD VALUES MAY NOT BE RELIABLE.

SYNTAX FOR YOUR REFERENCE:

IDVARIABLE IS famid;
MISSING ARE ALL (999);
USEVARIABLES ARE class1prob class2prob class3prob icsex icethnic groupid ytotext3 totext4 totext5;

MODEL: totext5 on icsex icethnic groupid ytotext3 totext4 class1 class2 class3;

OUTPUT:stdyx;
Plot: type is plot1;

Best,
Jill

Bengt O. Muthen posted on Thursday, March 01, 2018 - 6:28 pm

The 3 probabilities sum to 1 for each subject so you can't include all 3.

Better approaches are discussed in our Web Notes 15 and 21.
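
If the saved-probability regression is kept anyway, the collinearity can be removed by leaving one probability out as the reference. A minimal sketch, assuming the variable names from the USEVARIABLES list in the question; the choice of class 3 as reference is arbitrary.

```mplus
USEVARIABLES ARE class1prob class2prob icsex icethnic groupid
                 ytotext3 totext4 totext5;

MODEL:
  totext5 ON icsex icethnic groupid ytotext3 totext4
             class1prob class2prob;    ! class3prob omitted: the three probabilities sum to 1
```

This still ignores classification uncertainty, which is what the web notes address.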

Jill Rabinowitz posted on Monday, March 05, 2018 - 7:29 pm

Thank you! I have an additional question. I identified a 3-profile model and I ran some auxiliary analyses. When I looked at the omnibus chi-squared results, for the pairwise comparisons, many of the SDs exceed the means and are sometimes double or triple the mean. Is this a problem?

Bengt O. Muthen posted on Tuesday, March 06, 2018 - 2:56 pm

Please send your output to Support along with your license number.

Stephen Ungvary posted on Wednesday, March 07, 2018 - 3:09 pm

I have a follow up question to one I posted a couple of weeks ago.

I am using the following syntax to test whether the means of y1 are significantly different across the four classes. Everything runs smoothly, but the Wald test is non-significant despite very clearly different intercepts. If I include only y1 ON x1 x2 x3; in the %OVERALL% model and not within each class, the Wald test is significant and the effect of y on x1 x2 and x3 still appears in the output for each class. However, if I do include the command within each class the Wald test is non-significant.

Based on webnote 21 I should have the syntax set up as below, but is it necessary to have y on x within each class? Your advice is appreciated.

MODEL:
%OVERALL%
C ON x1 x2 x3;
y1 ON x1 x2 x3;
%c#1%
y1 ON x1 x2 x3;
[y1](a);
%c#2%
y1 ON x1 x2 x3;
[y1](b);
%c#3%
y1 ON x1 x2 x3;
[y1](c);
%c#4%
y1 ON x1 x2 x3;
[y1](d);

MODEL TEST:
a = b;
a = c;
c = d;

Bengt O. Muthen posted on Wednesday, March 07, 2018 - 3:46 pm

Having the slope of y ON x varying across classes means that you are specifying an x*c interaction which is not necessary. Think of the ANCOVA analogy where the slope is the same across conditions so that the effect of condition is captured in the intercept differences.
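
A minimal sketch of the ANCOVA-style setup described above, reusing the labels from the question: the slopes appear only in %OVERALL%, so they are held equal across classes, while the intercepts carry the class differences.

```mplus
MODEL:
  %OVERALL%
  c ON x1 x2 x3;
  y1 ON x1 x2 x3;     ! slopes held equal across classes
  %c#1%
  [y1] (a);           ! class-specific intercepts
  %c#2%
  [y1] (b);
  %c#3%
  [y1] (c);
  %c#4%
  [y1] (d);

MODEL TEST:           ! joint Wald test that all four intercepts are equal
  a = b;
  a = c;
  c = d;
```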

Gregory M. Dams posted on Friday, March 09, 2018 - 6:10 pm

Hi Drs. Muthen,
I'm seeing more LPA articles being published using standardized variables for indicators. I've seen your past recommendations on this message board for not standardizing indicators.
Have your recommendations changed or do you both still hold that not standardizing indicators is best?

Bengt O. Muthen posted on Saturday, March 10, 2018 - 10:45 am

I wouldn't standardize. Have you seen any good reasons for standardizing?

Gregory M. Dams posted on Saturday, March 10, 2018 - 12:18 pm

I've seen no good reasons provided for standardizing. At worst, the indicators are described as having been standardized without explanation as to why. At best, there is ambiguity regarding whether standardization was used for the indicators or if it only occurred post hoc once the measurement model has been selected to help clarify latent class differences in figures.
Thanks for your response!

Mark Peterman posted on Saturday, March 10, 2018 - 1:46 pm

I conducted an LPA and identified a 4-profile model. I want to look at whether a covariate interacts with class membership (covariate x c) to predict a distal outcome (y). Where can I find syntax to do this? I checked webnote 21, but I don't think the procedures outlined there are what I need. Thanks.

Bengt O. Muthen posted on Saturday, March 10, 2018 - 2:06 pm

Section 3.2 of WN 21 gives you what you want. The interaction is picked up by Y ON X being class-specific.
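
A minimal sketch of what "Y ON X being class-specific" looks like for four classes. The names y and x and the labels s1-s4 are hypothetical, and this shows only the regression part; the BCH or training-weight setup of Web Note 21 is not reproduced here.

```mplus
MODEL:
  %OVERALL%
  y ON x;
  %c#1%
  y ON x (s1);    ! mentioning the slope within each class frees it across classes
  %c#2%
  y ON x (s2);
  %c#3%
  y ON x (s3);
  %c#4%
  y ON x (s4);

MODEL TEST:       ! Wald test of equal slopes, i.e. no x-by-class interaction
  s1 = s2;
  s1 = s3;
  s1 = s4;
```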

Mark Peterman posted on Saturday, March 10, 2018 - 4:36 pm

Thank you, Dr. Muthen.

Gregory M. Dams posted on Sunday, March 11, 2018 - 8:18 pm

What syntax would I use to get within cluster median values reported for each indicator in an LPA that allows for skewnormal distributions?
I assume that when a cluster has skewed distributions that the median is still worth reporting as a better measure of central tendency.

Tihomir Asparouhov posted on Tuesday, March 13, 2018 - 10:18 am

You can use the command plot:type=plot3; Click on "estimated distribution". The median is reported under the "50%-tile".

Mark Peterman posted on Wednesday, March 14, 2018 - 9:24 am

I am using the 3-step approach outlined in webnote 21. In webnote 21, it states that if the entropy is low, the 3-step approach may not be appropriate. I have a model where the other fit indices look ok, but the entropy is .70. When I requested BCH weights, I noticed some of them were negative. Given the negative BCH weights, is there an alternative approach you would recommend?

Tihomir Asparouhov posted on Wednesday, March 14, 2018 - 3:40 pm

Some of the BCH weights are always negative - nothing unusual there. The entropy of 0.7 is not very low. I see no problems with using BCH.

Jacqueline Kim posted on Wednesday, March 14, 2018 - 11:36 pm

Hello,

I would like to do a Bayes Exploratory LPA as in v.7 pt.2, p.100. How would the syntax change for using continuous indicators rather than categorical?

Here is how it looks on the slide for categorical.

VARIABLE:
NAMES = u1-u4;
CATEGORICAL = u1-u4;
CLASSES = c(2);

ANALYSIS:
TYPE = MIXTURE;
ESTIMATOR = BAYES;
FBITERATIONS = 10000;

MODEL:
%OVERALL%
%c#1%
u1-u4 WITH u1-u4 (p1-p6);
%c#2%
u1-u4 WITH u1-u4 (q1-q6);

MODEL PRIORS:
p1-p6~IW(0, 15);
q1-q6~IW(0, 15);

thank you for your guidance.

Bengt O. Muthen posted on Friday, March 16, 2018 - 6:32 am

Delete the CATEGORICAL= line. Also, I don't know what you intend with the Model Priors statements.
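
A minimal sketch of the VARIABLE and ANALYSIS commands from the slide with that one line removed; the MODEL command stays as posted. The variable names are kept from the slide only for comparison.

```mplus
VARIABLE:
  NAMES = u1-u4;          ! with no CATEGORICAL option, u1-u4 are treated as continuous
  CLASSES = c(2);

ANALYSIS:
  TYPE = MIXTURE;
  ESTIMATOR = BAYES;
  FBITERATIONS = 10000;
```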

Jacqueline Kim posted on Friday, March 16, 2018 - 3:13 pm

Thank you Dr. Muthen, I am very new to Bayesian methods, so if you can point to good beginner's guides for determining model priors I would very much appreciate it.

warmly,

Bengt O. Muthen posted on Friday, March 16, 2018 - 3:45 pm

If you are new to Bayes, I would not use priors but instead rely on the Mplus default non-informative priors. You may also want to read about Bayes in Mplus in our book Regression and Mediation Analysis using Mplus.

Jacqueline Kim posted on Friday, March 16, 2018 - 8:33 pm

I will try the default priors and take a look at your book, thank you!

Jin Qu posted on Thursday, March 29, 2018 - 8:14 am

I used race as a predictor to predict profile membership (I already obtained 4 profiles from a previous analysis). First, I use race as a predictor:
auxiliary (r3step) = race

I get a logistic regression estimate for each profile in reference to a specific profile (showing significance in these pairwise comparisons), but not an omnibus test showing whether profiles differ based on race.

In my manuscript, I would like to write that profile membership differs based on race. How can I provide an omnibus test to support this statement?

I also tried
auxiliary(de3step) = race
and I obtain an overall chi-square test and p-value. However, I don't think this is correct conceptually (race as distal outcome in this case). Thanks for your feedback.

Bengt O. Muthen posted on Thursday, March 29, 2018 - 12:24 pm

One way is to consider the manual approach to R3STEP shown in Appendix A of

Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. Download appendices with Mplus scripts.

and then based on parameter labels in the Model command use Model Test to check if all c on x slopes are zero.
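
A minimal sketch of the labeling and Model Test part only, assuming four classes (so three slopes against the last class as reference) and a single covariate race. The labels b1-b3 are hypothetical, and the step-3 setup from Appendix A that fixes the classification logits is not shown.

```mplus
MODEL:
  %OVERALL%
  c#1 ON race (b1);
  c#2 ON race (b2);
  c#3 ON race (b3);

MODEL TEST:         ! joint Wald test that race is unrelated to class membership
  b1 = 0;
  b2 = 0;
  b3 = 0;
```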

sfhellman posted on Tuesday, April 03, 2018 - 8:08 am

I'm running an LPA and trying to compare two different 3-profile models - one where the covariances are constrained to be equal across profiles and one where the covariances are 0 across all profiles. I used model test, but in the output got the following message: WALD'S TEST COULD NOT BE COMPUTED BECAUSE OF A SINGULAR COVARIANCE MATRIX.

Any ideas why this may be happening? In the model where I constrained the covariances to be equal across classes, I also got the following message: ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE
MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT
DISTRIBUTION OF THE CATEGORICAL VARIABLES IN THE MODEL.
THE FOLLOWING PARAMETERS WERE FIXED:

Bengt O. Muthen posted on Tuesday, April 03, 2018 - 10:26 am

A common mistake is to say in Model Test:

0 = m2-m1;
0 = m3-m1;
0 = m3-m2;

The last statement is redundant and leads to the singularity message.

For your second question we need to see the full output. Send to Support along with your license number.
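
For the three labels used above, a non-redundant version keeps only two of the three statements, since m3 = m2 already follows from them:

```mplus
MODEL TEST:
  0 = m2 - m1;
  0 = m3 - m1;
```

In general, testing equality of k parameters takes k - 1 statements.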

sfhellman posted on Tuesday, April 03, 2018 - 10:48 am

I have actually labeled the parameters the same across classes and then in model test, listed them equal to zero:

%C#1%
variable 1 with variable 2 (C1);
variable 1 with variable 3 (C2);
and so on
...

%C#2%
variable 1 with variable 2 (C1);
variable 1 with variable 3 (C2);
and so on
...

%C#3%
variable 1 with variable 2 (C1);
variable 1 with variable 3 (C2);
and so on
...

Then:

MODEL TEST:
C1 = 0;
C2 = 0;
and so on

etc.

Bengt O. Muthen posted on Tuesday, April 03, 2018 - 11:05 am

Then we need to see your full output - send to Support along with your license number.

Gregory M. Dams posted on Monday, April 09, 2018 - 9:23 am

Hi Dr. Muthen,
I ran an LPA using three continuous indicators. Initially, the best model was a 5 latent class model, however two of the classes had skewed distributions of the indicators within each class (and they were small classes, probably formed from the nominal distribution tails).
I compared Skewed T distribution models and found the best model based on BIC scores (and cmP) was a 2-class class-varying, unrestricted, skewnormal model.
The problem I'm facing is that the means generated by the 2-class solution are strange. More specifically, one of the two classes (about 70% of sample) is reported as having a mean value for one of the indicators that is almost the same as the minimum score possible on that indicator. It is hard to conceptualize that roughly 70% of the population have virtually no levels of this indicator. In addition, within the same class, the SE for the indicator is strangely small and deviates greatly from the nominal SD on this indicator.
1. What could this mean?
2. Is it problematic?
3. Must the model be discarded even though it has the best BIC value?

It's worth noting that this 2-class LPA solution has roughly the same means and SEs (with the exception of the one indicator noted above) as past analyses that relied on K-means cluster analysis to identify two-cluster models using the same indicators on samples drawn from the same population.

Bengt O. Muthen posted on Monday, April 09, 2018 - 4:12 pm

We have to see the full output and data - send to Support along with your license number.

Jacqueline Kim posted on Monday, April 23, 2018 - 6:11 pm

Hello, would you have a paper/guide akin to...

Asparouhov, T. & Muthen, B. (2015). Residual associations in latent class and latent transition analysis. Structural Equation Modeling: A Multidisciplinary Journal, 22:2, 169-177, DOI: 10.1080/10705511.2014.935844.

...that addresses the same issue but for continuous indicators in Latent Profile Analysis and examples for Mplus?

Would you use the mixed residual covariances instead of the bivariate Pearson? And is there a value you would consider a "large" residual covariance? In the paper it seems >30 was considered large for bivariate Pearson.

Unfortunately I do not have a large enough sample size to use the second method modeling all uniform associations.

thank you very much for your help!

Tihomir Asparouhov posted on Tuesday, April 24, 2018 - 4:09 pm

I would recommend this paper

Chapter 4
https://www.statmodel.com/download/muthen1.pdf

also the references in there

also User's guide example 7.9

Youjoung Lee posted on Saturday, May 19, 2018 - 2:35 am

Hello,

I ran LPA for one continuous variable (3 subscales) and chose five groups based on model fit and theoretical background.
I'm wondering if I can recode the group membership into a dummy variable and use this dummy variable as an independent variable for regression analysis.

Bengt O. Muthen posted on Saturday, May 19, 2018 - 3:42 pm

If your entropy is high (say > 0.8) that may be an ok approach. Otherwise, see the Mplus Web Note 21 on our website.

secondon posted on Saturday, July 14, 2018 - 6:14 pm

Hello,

Is there a way to tell from the conditional response means and/or latent class probabilities which indicators are contributing the most to differentiating participants into latent profiles? I suppose I'm wondering if there is an LPA equivalent of the conditional response probabilities generated in LCA?

Bengt O. Muthen posted on Sunday, July 15, 2018 - 2:42 pm

See our note under Recent Papers:

Asparouhov & Muthén (2014). Variable-specific entropy contribution. Technical appendix.

Silke Hertel posted on Friday, July 20, 2018 - 7:01 am

Dear Linda and Bengt,
we are running LPA with 4 manifest indicators in samples of 100-220 students. We tried different starting values and numbers of iterations. Anyhow, we are having problems with retrieving solid cluster solutions. Quite often, the following warning shows up:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE
TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE
FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING
VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE
CONDITION NUMBER IS 0.490D-12. PROBLEM INVOLVING THE FOLLOWING PARAMETER:
Parameter 4, %C#1%: [ ITSRL_REL ]

This is our syntax:

DATA: FILE IS ILPA_IT_SRL_IT_IQ_20180720_2.dat;

VARIABLE: NAMES ARE
ITIQ_MAL ITIQ_REL ITIQ_Com
ITSRL_MAL ITSRL_REL ITSRL_Com;

USEVARIABLES ARE
ITIQ_MAL ITIQ_REL
ITSRL_MAL ITSRL_REL;

MISSING ARE ALL (-99999);

CLASSES = c (4);

ANALYSIS: TYPE = MIXTURE;
STARTS = 1000 100;
stiterations = 50;
convergence = 0.0000001;

OUTPUT: TECH7 TECH11 TECH13 standardized Modindices(ALL);

We would appreciate your help and advice.
All the best
Silke and Kathi

Bengt O. Muthen posted on Friday, July 20, 2018 - 10:45 am

This can happen if ITSRL_REL is binary but not declared categorical or if one of your classes is very small.

Silke Hertel posted on Friday, July 20, 2018 - 11:06 am

Dear Bengt,
thanks for your quick reply. ITSRL_REL is not binary. One class is very small (Class 1):

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THE ESTIMATED MODEL

Latent
Classes

1 5.97782 0.05639
2 17.66961 0.16669
3 42.89680 0.40469
4 39.45577 0.37222

Do you have any suggestions how to handle this problem?
Thank you
Silke and Kathi

Bengt O. Muthen posted on Friday, July 20, 2018 - 11:14 am

Perhaps you have more parameters specific to class 1 than number of people (6) in class 1. If that doesn't help, send your output and preferably also data to Support along with your license number.

Eddie Liebmann posted on Monday, August 06, 2018 - 12:09 pm

Hello,

I'm working on a factor mixture LPA. I'm attempting to override the default behavior of constraining the last profile's latent means to zero. It is my understanding that the below code should work to override the default behavior, but the model estimation does not terminate normally. A snippet of my code is as follows:
CLASSES = c(2);
MODEL:
%OVERALL%
AUTO BY it1* it10 it14 it25@1;
CRAV BY it4* it17 it23 it29@1;
LOC BY it2* it16 it21 it35@1;
[it25@0]; [it29@0]; [it35@0];

AUTO WITH CRAV@0 LOC@0;
CRAV WITH LOC@0;

%c#1%
[AUTO*]; [CRAV*]; [LOC*];

%c#2%
[AUTO*]; [CRAV*]; [LOC*];

When the model is identified by setting the LV variance@1 and freely estimating the item loadings, the model estimates without issue. Am I incorrect in thinking that this default behavior can be overridden this way? Also, is it correct to expect that how the measurement model is identified should not influence the obtained fit statistics (log likelihood, AIC, BIC, aBIC)?

Thank you for your help!

Eddie

Bengt O. Muthen posted on Monday, August 06, 2018 - 3:37 pm

Perhaps the non-identification is due to intercepts not being held equal across the latent classes (as they are by default with multiple groups).

Jin Qu posted on Wednesday, August 08, 2018 - 5:26 pm

I would like to see whether the 4-profile LPA solution differs based on attachment classification (i.e., using a binary variable, secure (0) vs. insecure (1), and a categorical variable, secure (1), avoidant (2), resistant (3)) with the code auxiliary (de3step). I obtained output with means for each profile and chi-square tests for significance. It does not make sense to me that attachment classification for profile 1 is 1.516. I wonder how I can get logits/odds ratios that I could report from predicting categorical outcomes. Thanks!

Bengt O. Muthen posted on Wednesday, August 08, 2018 - 5:57 pm

Please send your full output to Support along with your license number.

Silvia Colonna posted on Thursday, November 29, 2018 - 6:19 am

Dear all,

I have some questions about LPA.

1) After performing an LPA, I was looking at the model results and noticed that the p-value associated with one estimated mean is not significant, but I am not sure how to interpret this result.
Does it mean that that particular variable is not relevant for identifying different profiles within my dataset?

2) I would also like to add covariates to my LPA. By running my model without covariates, it looks like the profiles identified don't really differ that much on 3 DV. Would it make more sense to remove these 3 DV for further analyses with covariates?
I know that simpler models are preferable but after all, my profiles have been identified considering all my dependent variables. I wish to add covariates using the 3 step approach so that the profiles I have identified won't change.
Could you please help me clarify these issues?

Kind regards
Silvia

Bengt O. Muthen posted on Thursday, November 29, 2018 - 2:49 pm

1) The important thing is not if means are different from zero but if they are different across the classes. Regarding the contribution to the classification by each variable, see the paper on our website under Recent Papers:

Asparouhov & Muthén (2014). Variable-specific entropy contribution. Technical appendix.

2) If the variable-specific entropy suggests it, you can delete poor variables.

Silvia Colonna posted on Friday, November 30, 2018 - 10:20 am

Thank you Dr. Muthen this is really helpful.

So I will disregard the p-value associated with the model estimates.

I am wondering when I should use the variable-specific entropy. Shall I use it when I decide what my best model is and re-run it after I removed the variables that lower entropy? or straight at the beginning when I test a 2 class model and then keep testing only the variables selected?

Also, I have a very skewed and kurtotic continuous variable. However it is not of a count type. Is the MLR estimator enough to take this into account?

Kind regards,
Silvia

Bengt O. Muthen posted on Friday, November 30, 2018 - 4:45 pm

Q1: I would try to find the best number of classes first and then consider getting rid of items if you need to.

Q2: If you have a high percentage (say > 25%) at the lowest or highest value, you can use the Censored option.

Silvia Colonna posted on Tuesday, December 04, 2018 - 5:05 am

Hi Dr. Muthen,

I tried to write a script following your suggestions but I would like to double check with you if this is correct.

VARIABLE:
NAMES = ID Irr RT_go PI K L_N ModU Car TErr PErr LL SL LW SW adhd odd cd cdagg badhd;
USEVARIABLES = RT_go PI K L_N ModU Car TErr PErr LL SL LW SW;
CENSORED = K (b);
CLASSES = C (2);
IDVARIABLE is ID;
MISSING ARE ALL (999);

ANALYSIS:
type = mixture;

OUTPUT: STDYX PATTERNS TECH14 TECH11;

As far as I understood, because my censored variable is very skewed and kurtotic at the very left side of the distribution (hence the "b") and it doesn't have any normal distributed data within, I am not going to use the censored-inflated option.

I would also leave the default estimator (MLR) despite the fact that the rest of my variables are normally distributed as I am interested in the bootstrapped likelihood ratio test to decide which model is the best one and it cannot be obtained with ML estimator.
Thank you for your precious help.

Kind regards,
Silvia

Bengt O. Muthen posted on Tuesday, December 04, 2018 - 3:10 pm

You can use ML with bootstrapping, just not MLR because R refers to other SEs than bootstrapped ones.

Silvia Colonna posted on Friday, December 07, 2018 - 3:17 am

Hi Dr. Muthen,

I tried to use the bootstrapping option but TECH14 is not available and, as I mentioned before, I normally use the BLRT as well to decide my best model. I was also thinking that it is not quite clear why I should use ML with bootstrapping if my variables are normally distributed apart from one that is censored. Would it be incorrect to simply use the ML estimator? Is it incorrect because I have a censored variable?
Sorry for these many questions.

Kind regards
Silvia

Bengt O. Muthen posted on Friday, December 07, 2018 - 3:57 pm

I would use BIC instead of TECH14 to decide on the model.

Gregory M. Dams posted on Friday, January 18, 2019 - 7:29 am

Hi Dr. Muthen,
When comparing non-normal LPA mixture models, are the VLMR and LMR adj tests trustworthy in guiding model selection? Are these tests intended just for models with within-class normality?

Bengt O. Muthen posted on Saturday, January 19, 2019 - 1:23 pm

I don't know that this has been studied. You may be better off with BIC.
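
For reference, the standard definition of BIC, written with log L for the maximized loglikelihood, p for the number of free parameters, and n for the sample size (symbols chosen here for illustration):

```latex
\mathrm{BIC} = -2 \log L + p \,\ln(n)
```

Lower values are preferred, and because the penalty term does not depend on within-class normality, the comparison stays the same across normal and non-normal mixture models fitted to the same data.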

Nicolas Berger posted on Thursday, February 07, 2019 - 2:20 am

Hello,

I am trying to run the following LPA, with different variances for each latent class. I have censored variables. I would like to assess model fit with TECH11 and TECH14.

VARIABLE:
NAMES ARE x1 x2 x3 x4 x5 ;
USEVARIABLES ARE x1 x2 x3 x4 x5 ;
CENSORED ARE x1 x2 x3 x4 x5 (b);
idvar=id;
CLASSES = c (2);

ANALYSIS: TYPE = MIXTURE;
MODEL:
%overall%
x1 x2 x3 x4 x5;

%c#2%
x1 x2 x3 x4 x5;

OUTPUT: TECH11 TECH14 ;

I have the following questions:

1) Why are the VLMR test and bootstrapped LRT not calculated? I obtain the following message, no matter the number of random starts used: THE LIKELIHOOD RATIO TEST COULD NOT BE COMPUTED. AN ERROR HAS OCCURRED DURING THE ESTIMATION OF THE H0 MODEL WITH ONE LESS CLASS.

2) Are those tests generally valid in this context? BIC values keep improving as I increase the number of classes, and therefore do not seem very useful to assess model fit. How would you then select the number of classes?

Many thanks.

Nico

Bengt O. Muthen posted on Friday, February 08, 2019 - 7:01 am

1) Send your full output - and data if possible - to Support along with your license number.

2) Finding no BIC minimum may be due to a need for (some) residual covariances which can be added using WITH statements.

Simon Coulombe posted on Tuesday, February 12, 2019 - 9:43 am

Hello,
I'm working on a latent profile analysis, and for one of the indicators used to create the profiles, about half of the participants have a missing value. The variable asked about support from their partner, but these participants don't have a partner, so they rightfully were not asked that question.

Should I remove that variable for the analysis, or can I still include it and the analysis will still provide the profiles' "patterns" for that variable using those participants who have a partner?
Is there a risk that the FIML process will create a bias in such a case when missing is not at random?
Thank you

Simon

Bengt O. Muthen posted on Tuesday, February 12, 2019 - 5:23 pm

I don't see a problem using the usual FIML approach.

Simon Coulombe posted on Wednesday, February 20, 2019 - 3:43 pm

Thank you so much for your answer above.

We have another question. We are running a LPA, with several covariates using the R3STEP method.

In the results section, under the title "tests of categorical latent variable multinomial logistic regressions using the 3-step procedure", we obtain results that we are not sure how to interpret.

A) In one case, the estimate is 67.595, the SE is 0.000, the Est/S.E. is 999.000, and the p-value is 0.000. OUR QUESTION: What does an SE of 0.000 mean? Does it have to do with a potential small cell?

B) Similar case: the estimate is -130.494, the SE is 0.000, the Est/S.E. is *********, and the p-value is 0.000. OUR QUESTION: Why is the Est/SE represented by asterisks here compared to 999.000 above?

Thanks again! Your support is very valuable!

Simon

Bengt O. Muthen posted on Wednesday, February 20, 2019 - 4:25 pm

Those large estimates, which are on a logit scale, occur when the parameter cannot be estimated. As you mentioned, this can be due to zero cells. These estimates should not be interpreted but just noted as parameters that could not be estimated - there was not enough information about them.

John R Anderson posted on Monday, February 25, 2019 - 3:33 pm

Hello-

I am trying to replicate findings with a certain LPA model in order to re-examine its graph. I have been unable to match the -2LLs using the same data. Is there syntax to set the seed or is there another way to replicate?

Thanks

Simon Coulombe posted on Wednesday, February 27, 2019 - 2:13 pm

Hi,
I'm running a LPA.
I want to look at the association of the profiles with several variables considered to be potential predictors of profile membership.
I know R3STEP does that but it includes all predictors in a single regression, which I don't want to do because of the high number of missing values when all predictors are considered together and also because I'm using a more exploratory approach.
What is the auxiliary command that can be used to consider the predictor variables one by one instead of all together in the same regression?
Thank you!
Simon

Bengt O. Muthen posted on Wednesday, February 27, 2019 - 4:30 pm

There is no such option. If you want to look at one variable at a time, you may want to view the variable simply as one that has different means for different classes, that is, as a distal outcome using either the DCAT or the BCH auxiliary options.

John R Anderson posted on Monday, March 04, 2019 - 5:41 am

Hi,

I posted this above:

"I am trying to replicate findings with a certain LPA model in order to re-examine its graph. I have been unable to match the -2LLs using the same data. Is there syntax to set the seed or is there another way to replicate?"

I was hoping to get a response soon.

Thanks

Bengt O. Muthen posted on Monday, March 04, 2019 - 3:31 pm

I would use the solution with the best logL. If this doesn't help, can you please send your 2 outputs that disagree to Support so we can check. Also include your license number.
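
One possible way to look for the best logL, sketched under assumptions: increase the number of random starts and check that the best loglikelihood is replicated. The OPTSEED line is an addition not mentioned in the reply; the seed value 123456 is a placeholder for whichever seed the output lists next to the best loglikelihood.

```
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 500 100;      ! more initial-stage and final-stage random starts
  ! OPTSEED = 123456;    ! alternative: rerun only the seed of the best logL
```

If two runs still disagree after this, the outputs would need to go to Support as the reply says.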

Andre Maharaj posted on Monday, April 01, 2019 - 7:45 am

Hello everyone,

I am getting the following error with a 3 class LPA:

THE BEST LOGLIKELIHOOD VALUE HAS BEEN REPLICATED. RERUN WITH AT LEAST TWICE THE
RANDOM STARTS TO CHECK THAT THE BEST LOGLIKELIHOOD IS STILL OBTAINED AND REPLICATED.

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE
TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE
FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING
VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE
CONDITION NUMBER IS 0.147D-18. PROBLEM INVOLVING THE FOLLOWING PARAMETER:
Parameter 4, %C#1%: [ RSAW ]

-----
RSAW is not a dichotomous variable.

Are the estimates trustworthy?
The classes have a reasonable number of persons (27, 32, 123) so does not seem to indicate outlier(s).

Thanks.

Bengt O. Muthen posted on Monday, April 01, 2019 - 5:14 pm

We need to see your full output - send to Support along with your license number.

wang_hao_ran posted on Thursday, April 04, 2019 - 9:28 pm

Hi Dr. Muthen,
Recently I ran an LPA with raw data and found a few issues quite puzzling. First, to interpret our results more effectively, we prefer to use Z-scores for each profile. How can we get those? I found a similar question on the board, but your response used the means and variances provided in MODEL RESULTS. However, to calculate Z-scores we also need raw data. Should we calculate the mean score of each class's raw data, subtract the estimated mean presented in MODEL RESULTS, and then divide by the S.D. presented in MODEL RESULTS? Or should we replace the sample means and variances with the estimated means and variances presented in MODEL RESULTS, calculate a Z-score for every participant within a class, and then average these Z-scores to represent the Z-score of each class? Thanks!

wang_hao_ran posted on Friday, April 05, 2019 - 7:56 pm

Maybe I did not make myself clear. Could you please tell me specifically how to use the MODEL RESULTS to calculate Z-scores?
That is puzzling, and our manuscript is nearly complete except for that point. I know it seems like quite an easy question, but please help me. Thank you very much, Dr. Muthen!

Bengt O. Muthen posted on Saturday, April 06, 2019 - 12:55 pm

I wonder what you refer to as the "Z-score for each profile". Perhaps you mean

(a-b)/c

where b and c are the mean and SD for the variable overall and a is the mean for the variable in a specific class? If so, you can express this as a New parameter in the Model Constraint command using parameter labels from the Model command.
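
A sketch of how (a-b)/c might be written, assuming a two-class model and an indicator named y1 (hypothetical). The values 3.20 and 0.85 are placeholders for the overall mean b and SD c, entered here as fixed constants taken from the sample statistics; that choice is an assumption, not something stated in the reply.

```
MODEL:
  %OVERALL%
  y1-y4;                 ! variances held equal across classes
  %c#1%
  [y1] (a1);             ! class 1 mean of y1, labeled a1
  %c#2%
  [y1] (a2);             ! class 2 mean of y1, labeled a2
MODEL CONSTRAINT:
  NEW(z1 z2);
  ! 3.20 and 0.85 are placeholders for the overall mean and SD of y1
  z1 = (a1 - 3.20) / 0.85;
  z2 = (a2 - 3.20) / 0.85;
```

The new parameters z1 and z2 are reported with standard errors under New/Additional Parameters. With b and c entered as constants, those standard errors do not reflect sampling variability in the overall mean and SD.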

wang_hao_ran posted on Saturday, April 06, 2019 - 8:38 pm

That's exactly what we want, thank you Dr. Muthen!

Silvia Colonna posted on Thursday, April 11, 2019 - 6:57 am

Hi all,

I have a couple of questions related to my LPA.

1) is it possible to compare the mean scores of my LPA indicators across the different classes identified to see if they are significantly different? e.g. to see if the mean score of RT in class 1 is significantly different than in class 2. Sort of a between-subjects ANOVA but with Mplus.

2) I am very new to Mplus. Is there a book that goes through every step of LPA, explains the output file, and illustrates different LPA models in more detail than the user's guide?

Thank you for your help.

Silvia

Bengt O. Muthen posted on Thursday, April 11, 2019 - 4:16 pm

1) Yes, use Model Test.

2) We list the book by Geiser on our website under Books. See also the articles on LPA on our web site under Papers, Latent Class Analysis.
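
For point 1, a minimal sketch of a Model Test setup, assuming an indicator named rt and hypothetical labels m1 and m2 for its means in classes 1 and 2:

```
MODEL:
  %OVERALL%
  rt;                    ! variance held equal across classes
  %c#1%
  [rt] (m1);             ! mean of rt in class 1
  %c#2%
  [rt] (m2);             ! mean of rt in class 2
MODEL TEST:
  0 = m1 - m2;           ! Wald test of equal means in classes 1 and 2
```

The output then reports a Wald chi-square test for the restriction. Listing several restrictions under MODEL TEST gives one joint test rather than separate pairwise tests.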

Jessica Smith posted on Wednesday, April 17, 2019 - 12:29 pm

Hello,
I have a few questions and hope you can help out.

1) If I run LPA with covariates and distal outcomes, what are the differences between Mplus output and Latent Gold output? What kinds of output does Mplus have that Latent Gold does not, or vice versa?

2) I tried to use gender as a covariate but was warned that a categorical variable cannot be used as an auxiliary variable. So Mplus does not allow a categorical variable to be an auxiliary variable?

3) I know how to get the means of distal outcome in each class, and their mean differences. But how can I know if those mean differences are significant?

4) I had a few negative weights by using BCH and the entropy is more than .8, does it mean I need to switch to 3-step?

Thank you very much,

Bengt O. Muthen posted on Wednesday, April 17, 2019 - 3:19 pm

1) I don't know what kinds of outputs Latent Gold has.

2) Covariates should not be put on the Categorical list.

3) Use Model Constraint with e.g.

New(diff);

diff = a - b;

where a and b are parameter labels given in the Model command.

4) Negative weights with BCH are normal.

Jessica Smith posted on Thursday, April 18, 2019 - 7:40 am

Thank you very much Dr. Muthen,
3) Yes. I got the mean difference. But how can I know if this mean difference is significant between these two groups?

New/Additional Parameters
DIFF -46.216

Bengt O. Muthen posted on Thursday, April 18, 2019 - 3:01 pm

The output gives you the estimate and its SE. Est/SE is a z-score which gives you a p-value.

Jessica Smith posted on Thursday, April 18, 2019 - 6:53 pm

Thank you very much Dr. Muthen.

I think that is the problem. The output does not have SE or Est/SE; it only reports the parameter estimate. I put "standardized;" in the OUTPUT command, but nothing showed up.

I ran LPA in Mplus without any problem, but when I include the distal outcome and covariates, I just can't get it to work.

I know usually at this time you will ask me to send output along with license to the support. But since I am using a university copy, I do not have a license.

Thank you for the response. I will see what else I can do.

Linda K. Muthen posted on Friday, April 19, 2019 - 8:33 am

It sounds like your model did not converge. We need to see the output to say more and you must be a registered user of a license with a current support contract to be eligible for support.

Jessica Smith posted on Friday, April 19, 2019 - 9:03 am

Thank you very much. I understand that.

Jilian Halladay posted on Friday, May 03, 2019 - 8:11 am

Hello Dr. Muthen,

When assessing profile separation within Latent Profile Analyses, is there an easy way to calculate the standardized mean differences between the indicator variable means and variances for each profile?

It is discussed in this book: The Oxford Handbook of Quantitative Methods, edited by Todd D. Little.

Thanks in advance!

Gregory M. Dams posted on Friday, May 03, 2019 - 12:08 pm

Hi Jilian,
I just sent you a spreadsheet by email with a faster way to calculate the standardized mean differences as discussed in Masyn's (2013) chapter in that textbook.

Best,
Greg

DavidBoyda posted on Sunday, May 26, 2019 - 11:52 pm

Dear Dr. Muthen

I have an LPA model where the BIC keeps improving. If I were to add WITH statements, on what basis do I select which variables to correlate?

DavidBoyda posted on Monday, May 27, 2019 - 3:04 am

Oh example 7.22 defines this.

Ads posted on Sunday, June 02, 2019 - 1:41 pm

To address outliers in LPA, I was considering Winsorizing all of the indicators first. Would there be any issue with MLR estimation of variance or SEs (or any other issues you would foresee)?

I ask because the univariate variance/SE equations are different for Winsorizing than in traditional variance estimation. I am considering just trimming the variables 20% as well, and have the same question for that approach.

Bengt O. Muthen posted on Monday, June 03, 2019 - 2:54 pm

Don't know really, but Winsorizing or trimming would seem to risk affecting the class formation - perhaps instead allow for outlier classes.

Ads posted on Monday, June 03, 2019 - 4:04 pm

Thank you Dr. Muthen.

Jessica Smith posted on Thursday, June 06, 2019 - 10:21 am

Dear Dr. Muthen,

My question is like a follow-up on the question asked by Stephen Ungvary on 2/22/2018.

I am interested in the class specific means, suppose this is the code.
MODEL:
%OVERALL%
C ON x1 x2 x3;
y1 ON x1 x2 x3;
%c#1%
y1 ON x1 x2 x3;
[y1](a);
%c#2%
y1 ON x1 x2 x3;
[y1](b);
%c#3%
y1 ON x1 x2 x3;
[y1](c);
%c#4%
y1 ON x1 x2 x3;
[y1](d);

MODEL TEST:
a = b;
a = c;
c = d;

If I do not put "y1 ON x1 x2 x3" in each class, I feel like I don't see the involvement of covariate influence when I request the distal mean in each class. In other words, if I just write like this:

%c#4%
[y1](d);

Why is the class-specific intercept beyond the influence of covariates? Just by looking at these two lines, this is only about the latent class variable and the distal outcome.

Thank you very much.

Bengt O. Muthen posted on Thursday, June 06, 2019 - 5:20 pm

When you say Y ON... in the overall part of the model only, you are saying that C does not interact with the X's in their influence on Y; only the intercept will differ (it differs by default without mentioning it in each class; see Mixture defaults in the UG).
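
The distinction can be sketched as follows, reusing the names from the question (y1, x1-x3, labels a and b) and showing two classes only:

```
MODEL:
  %OVERALL%
  c ON x1 x2 x3;
  y1 ON x1 x2 x3;        ! slopes estimated once, equal across classes
  %c#1%
  [y1] (a);              ! intercept of y1 in class 1
  %c#2%
  [y1] (b);              ! intercept of y1 in class 2
  ! adding "y1 ON x1 x2 x3;" here would free the slopes in class 2,
  ! i.e. let c interact with the x's in their influence on y1
```

In this version the covariates still influence y1 in every class through the overall slopes, so a and b are intercepts adjusted for x1-x3 rather than raw class means.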

Jessica Smith posted on Sunday, June 09, 2019 - 12:34 pm

Thank you Dr. Muthen. It is clearer.

Sanna Oinas posted on Wednesday, July 03, 2019 - 9:56 am

Hello,

I have conducted an LPA with data (N=1900) on 3 variables measured on a 1-5 Likert scale. I know that around 5% of participants reported the value 1 (defined as "never" in the options) for every variable. However, the profile solutions from 1 to 10 do not recognize a profile in which participants indicate they "never" experienced any of the measured phenomena, although there is a profile for even 0.5% of participants. As it would be necessary for the results and later analysis to "reveal" a profile answering "never" to all the items, my question is: why is there no such profile? Is there a way to define it?

Thank you for your answer in advance!

Bengt O. Muthen posted on Wednesday, July 03, 2019 - 3:39 pm

You can define such a never class but typically that doesn't give the best BIC. The never people may behave similar to the close to never people as judged by BIC.

Sanna Oinas posted on Thursday, July 04, 2019 - 2:33 am

Thank you for your quick response. The existence of those "never" participants is important to report, so I think I have (at least) two options to continue. Do you think it would be methodologically totally inappropriate to first exclude those never-participants, then run the LPA, and then include the never-participants as one group in the further analysis? Will this have an effect on the other profiles defined? (I actually have done this already, and did not notice a difference when I compared the profiles with those defined from the whole data.) If this rather easy solution is not appropriate, do you have any recommendations on where I could find the syntax for defining these never-participants?

Best regards,
Sanna

Bengt O. Muthen posted on Friday, July 05, 2019 - 3:36 pm

I think it is better to present a model for all subjects. Then you can also have predictors of being in the never class or not. Have a look at these 2 papers on our website under Papers, Growth Mixture Modeling:

Kreuter, F. & Muthén, B. (2008). Analyzing criminal trajectory profiles: Bridging multilevel and group-based approaches using growth mixture modeling. Journal of Quantitative Criminology, 24, 1-31.

Kreuter, F. & Muthén, B. (2008). Longitudinal modeling of population heterogeneity: Methodological challenges to the analysis of empirically derived criminal trajectory profiles. In Hancock, G. R., & Samuelsen, K. M. (Eds.), Advances in latent variable mixture models, pp. 53-75. Charlotte, NC: Information Age Publishing, Inc.

Anne Black posted on Thursday, August 01, 2019 - 9:44 am

Hello,
For LPA, we would like to report the number of respondents likely belonging to each class. I'm finding fairly large differences in class counts (>60 for one class) based on posterior probabilities vs. using the class assignment (c) variable in the saved data file based on cprobabilities. Should the counts be more closely aligned? My sample size is 4200 and entropy is .744.

Thank you for advising!

Bengt O. Muthen posted on Thursday, August 01, 2019 - 10:36 am

Use the numbers reported under the heading

FINAL CLASS COUNTS AND PROPORTIONS FOR EACH LATENT CLASS VARIABLE
BASED ON THE ESTIMATED MODEL

That is, it is the model-estimated values you should go by.

Nitya Chawla posted on Monday, October 21, 2019 - 8:52 pm

Hello!

I'm running a multilevel latent profile analysis. The profile enumeration process suggests that the 4-profile solution is best. When trying to run antecedent analyses with R3STEP and the 4-profile solution, the coefficients for one of the variables are extremely strange (e.g., 1123.427, 1124.052). For the last parameterization, the SE is simply .0000. We first believed this was because the variable we're considering is most frequently reported with a 0 (we're measuring number of offers). However, this problem disappears if I specify a 5-profile solution, even though the 5-profile solution exhibits worse fit. Is there something I'm missing?

Thank you!!

USEVARIABLES ARE ID IDorder PAACTW NAACTW
FIRST1 OFFERS1;
IDVARIABLE = IDorder ;
CLUSTER = ID ;
CLASSES = c(4);
MISSING ARE ALL (-99.00);
AUXILIARY = (R3STEP) FIRST1 OFFERS1;

ANALYSIS: TYPE = mixture complex;
ESTIMATOR = MLR ;
STARTS = 5000 200;
LRTBOOTSTRAP = 100;
LRTSTARTS = 10 5 80 20;

MODEL:

%OVERALL%
PAACTW; NAACTW;

OUTPUT: Tech7; Tech11; Tech14;
PLOT: TYPE= Plot3;
SERIES= PAACTW NAACTW (*);
SAVEDATA: FILE=S2.dat; SAVE=CPROB;

Bengt O. Muthen posted on Tuesday, October 22, 2019 - 5:01 pm

We have to see your full output to say - send to Support@statmodel.com along with your license number.

Note also that you save a lot of computing time if you follow the advice about Tech11 and tech14 that we give in Web Note 14.

Ayca Alayli posted on Wednesday, November 13, 2019 - 11:44 am

Hello Dr. Muthen,

My sample consists of participants from 3 countries and I have 4 indicators. I ran LPA and found 4 profiles (based on class number comparisons, entropy value, AIC and aBIC).
How can I be sure of measurement invariance as I have three different countries? Should I run the LPA for each country and see whether the same number and type of classes emerge? or is it meaningful to use KNOWNCLASS option following the syntax in the guide as below?:

DATA: FILE IS ex7.21.dat;
VARIABLE: NAMES = g y1-y4;
CLASSES = cg (2) c (2);
KNOWNCLASS = cg (g = 0 g = 1);
ANALYSIS: TYPE = MIXTURE;
MODEL:
%OVERALL%
c ON cg;
MODEL c:
%c#1%
[y1-y4];
%c#2%
[y1-y4];
MODEL cg:
%cg#1%
y1-y4;
%cg#2%
y1-y4;
OUTPUT: TECH1 TECH8;

Also, if knownclass is meaningful to run, in my case cg would be the country which is not dichotomous but 3 countries. Do I need to use dummycoded country variable?
I am quite new to MPlus and LPA, I hope my question is clear.
Thank you!

Bengt O. Muthen posted on Wednesday, November 13, 2019 - 1:02 pm

You create a variable that has 3 values, say 0, 1, 2 and let cg correspond to that variable. You hold all means equal across the 3 cg classes/countries in a first run. Then you allow one variable at a time to have a different mean over countries and do a likelihood-ratio chi-square test to check for significant non-invariance.
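
A sketch of the first run, adapted from the Example 7.21 layout quoted in the question and assuming a variable named country coded 0, 1, 2 (hypothetical name) with four profiles:

```
VARIABLE:
  NAMES = country y1-y4;
  USEVARIABLES = y1-y4;
  CLASSES = cg (3) c (4);
  ! one known class per country; no dummy coding needed
  KNOWNCLASS = cg (country = 0 country = 1 country = 2);
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  c ON cg;
MODEL c:
  %c#1%
  [y1-y4];               ! means vary over profiles only
  %c#2%
  [y1-y4];
  %c#3%
  [y1-y4];
  %c#4%
  [y1-y4];
```

Because the means are mentioned only under MODEL c, they differ across the four profiles but not across countries, which corresponds to the fully invariant first run. The later runs that free one variable's mean over countries are not shown here.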

Brandi Rollins posted on Monday, March 09, 2020 - 10:04 pm

Hi, please help. I would like to make sure I am doing this correctly. I am trying to run an LPA with 6 measured indicators and 2 latent variables. I would eventually like to plot a bar chart showing the means of all of the variables (measured and latent) by class...but the factor scores I am getting for the latent variables do not have a mean of 0. So I am not sure if I am writing the syntax wrong.

USEVARIABLES = slowwalk TCBQ_att TCBQ_imp TCBQ_inh CBQ_imp CBQ_inh CEBQ_sr CEBQ_se CEBQ_ef CEBQ_eoe CEBQ_fr age sex;

IDVARIABLE IS famid;

CLASSES = c (3);

ANALYSIS:
TYPE = mixture;
ESTIMATOR=MLR;
PROCESSORS = 8;
STARTS = 100 10;
STITERATIONS = 10;

MODEL:
%OVERALL%

INHIBIT BY TCBQ_inh CBQ_inh SLOWWALK;
IMPULSE BY TCBQ_imp CBQ_imp;
TCBQ_imp CBQ_imp;
IMPULSE WITH INHIBIT @ 0;
C ON AGE SEX;

%c#1%
[INHIBIT*1] (i1);
[IMPULSE*1] (i2);

%c#2%
[INHIBIT*-1] (il);
[IMPULSE*-1] (i2);

Please advise. This is supposed to be a straightforward LPA with 6 measured indicators and 2 latent variables.

Bengt O. Muthen posted on Tuesday, March 10, 2020 - 10:32 am

The estimated factor means are not zero - which is what you want - so the estimated factor scores won't be close to zero either.

Brandi Rollins posted on Tuesday, March 10, 2020 - 11:16 am

Thank you for your help! One last question...Am I right to force the correlation between the latent variables to be zero? From what I recall, the manifest variables are uncorrelated...or is it the errors?

Brandi Rollins posted on Tuesday, March 10, 2020 - 11:37 am

Also, from the previous discussions on other threads, it says that the intercepts of the latent variables should be constrained to be equal. Am I doing that correctly?

%c#1%
[INHIBIT*1] (i1);
[IMPULSE*1] (i2);

%c#2%
[INHIBIT*-1] (il);
[IMPULSE*-1] (i2);

Bengt O. Muthen posted on Tuesday, March 10, 2020 - 2:51 pm

Factor correlations should not be zero. Latent variable means should not be held equal - their differences are identified (can be estimated) assuming that you impose invariance across classes in the intercepts of the outcomes.

If that doesn't help, send your output to Support along with your license number.
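
A sketch of the MODEL command revised along the lines of the reply above, using the poster's variable names. It assumes the usual mixture defaults: indicator intercepts held equal across classes, and factor means fixed at zero in the last class and free elsewhere.

```
MODEL:
  %OVERALL%
  INHIBIT BY TCBQ_inh CBQ_inh slowwalk;
  IMPULSE BY TCBQ_imp CBQ_imp;
  ! no "IMPULSE WITH INHIBIT@0;" so the factor covariance is estimated
  c ON age sex;
  %c#1%
  [INHIBIT*1 IMPULSE*1];       ! start values only, no equality labels
  %c#2%
  [INHIBIT*-1 IMPULSE*-1];
  ! class 3: factor means left at the default reference value
```

Dropping the parameter labels matters because a label repeated across classes holds those parameters equal, which is what the reply advises against for the factor means.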

Elke Sekeris posted on Monday, March 16, 2020 - 1:03 am

Hello,

I am trying to run a latent profile analysis to explore the different strategy patterns in my data. A 1-class model runs fine. However, starting from a 2-class model an error message is given:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE
TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE
FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING
VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE
CONDITION NUMBER IS -0.178D-16. PROBLEM INVOLVING THE FOLLOWING PARAMETER:
Parameter 1, %C#1%: [ EA_M4 ]

For models with more classes the same variable (EA_M4) is given as parameter with problems.

Increasing the starting values does not seem to help and since the best log likelihood was replicated I am not sure whether this is the correct way in trying to solve this problem?
Is there anything else I can do?

Thank you in advance!

Bengt O. Muthen posted on Monday, March 16, 2020 - 10:45 am

Perhaps this variable has zero within-class variance in the data. We need to see your full output - send to Support along with your license number.

Friedrich Platz posted on Monday, May 18, 2020 - 6:31 am

Hello!

I'm trying to model a LPA with class/profiles-varying indicators' variances. If I understand correctly, Mplus constrained the indicators' variance across all classes.
For variance estimation varying across classes I used the following syntax:

Variable:
names = Y1-Y5;
usevariables = Y1-Y5;
classes = c(4);

Model:
%OVERALL%
[Y1-Y5];
Y1-Y5;

However, this doesn't work correctly; MPLUS is estimating indicators' variances held constant across all classes.
Would you please help me?

Thank you in advance!

Bengt O. Muthen posted on Monday, May 18, 2020 - 5:26 pm

You need to put this line in every class:

Y1-Y5;
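
Spelled out for the four-class model in the question, that would look roughly like this:

```
MODEL:
  %OVERALL%
  [Y1-Y5];
  Y1-Y5;
  %c#1%
  Y1-Y5;                 ! variances estimated separately in class 1
  %c#2%
  Y1-Y5;
  %c#3%
  Y1-Y5;
  %c#4%
  Y1-Y5;
```

Mentioning the variances only in %OVERALL% leaves the default equality across classes in place; mentioning them inside each class section frees them.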

DavidBoyda posted on Thursday, July 30, 2020 - 12:21 pm

Dear Professor,

I have an error in my LPA: "...NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION."

In every run, whatever the number of classes, one class contains a single observation.

How can I solve this problem in Mplus? I checked my data and there are no major outliers.

Class Counts and Proportions

Latent Classes
1 238 0.41391
2 157 0.27304
3 179 0.31130
4 1 0.00174

Bengt O. Muthen posted on Thursday, July 30, 2020 - 3:00 pm

Seems like you should go with 3 classes.

DavidBoyda posted on Friday, July 31, 2020 - 2:03 am

Dear Bengt,

This is what I mean: that single observation also appears when I run a 3-class solution.

I would appreciate your guidance on what to do about this.

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

Class Counts and Proportions

Latent
Classes

1 202 0.35130
2 372 0.64696
3 1 0.00174

Bengt O. Muthen posted on Sunday, August 02, 2020 - 4:42 pm

We need to see your full output and data to diagnose this - send to Support along with your license number.

Mary Troxel posted on Thursday, September 03, 2020 - 8:28 am

Hi, I am new to MPlus and Latent Profile Analysis. I hope to conduct a confirmatory latent profile analysis. Do you have any resources that you would suggest I read in order to better understand how to conduct these analyses? I have only found Finch & Bronk (2011) and Schmiege, Masyn, Bryan (2017) (which focus primarily on LCA/ binary indicator variables).
Thank you in advance!

Bengt O. Muthen posted on Thursday, September 03, 2020 - 11:21 am

Maybe you can find something useful on our website under Papers, Latent Class Analysis.

Mary Troxel posted on Monday, October 12, 2020 - 5:20 pm

Hi again, I now have a more specific question and am hoping you may be able to provide some guidance.

I am still trying to conduct a confirmatory latent profile analysis. I think that constraining each variable mean in my analyses/classes to the precise mean found in the original study's classes would be too stringent. So, I want to constrain the mean of, for example, variable x in class 1 to the 95% confidence interval of the mean found for variable x in class 1 of the original study. Is this specification/constraint possible? What would it look like? The * or @ commands don't seem appropriate for this.

Thank you!

Bengt O. Muthen posted on Wednesday, October 14, 2020 - 3:40 pm

You can use the Bayes estimator with priors on the means in each class that capture the information in your CIs by using a normal prior with corresponding mean and variance.
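
A sketch of what such a prior could look like, assuming a hypothetical original-study 95% CI of 2.30 to 2.70 for the class 1 mean of x. The prior mean is taken as the CI midpoint and the prior SD as the half-width divided by 1.96; the second argument of N() is a variance.

```
ANALYSIS:
  TYPE = MIXTURE;
  ESTIMATOR = BAYES;
MODEL:
  %OVERALL%
  x;
  %c#1%
  [x] (m1);              ! class 1 mean of x, labeled m1
MODEL PRIORS:
  ! hypothetical CI 2.30 to 2.70: mean = 2.50
  ! SD = 0.20 / 1.96 = 0.102, variance = 0.0104
  m1 ~ N(2.50, 0.0104);
```

This is a soft constraint: the prior pulls the estimate toward the original interval without strictly bounding it, and the same pattern would be repeated for each mean in each class.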