
Rebecca posted on Thursday, November 20, 2003  5:18 am



Hello. I have been able to develop an interesting 5-class latent profile model in the analyses I’ve been running. Now I am thinking of adding covariates to the model. To see if this might be worthwhile, I quickly ran a few chi-square tests to see if at this level there would be statistically significant differences (e.g., race/ethnicity with the 5 classes). Unfortunately, I did not find significant differences with these analyses. My question is this: might it still be worthwhile to run an LPA with covariates? Would this type of multivariate analysis find nuanced significant differences that I am not picking up in the straightforward chi-square test? Thanks in advance for any guidance you can provide.


I think what you did is classify people into their most likely class and then run a chi-square test of class membership by each covariate. If so, and you found no significance, there probably is not any. With a stepwise approach like this, you obtain standard errors that are too small, so the risk is falsely finding significance rather than missing it. You can always add covariates to the LPA model and see what happens.

Rebecca posted on Thursday, November 20, 2003  10:43 am



Yes. This is exactly what I did. Thank you for your guidance. As you suggest, I think I will add a covariate to the LPA model and see what happens, because I have substantive questions that this type of analysis may help answer. Now I have one follow-up question based on your comment about the too-small standard errors and false significance. In order to assess the LPA model’s interpretability and usefulness, I have been running follow-up analyses (e.g., MANOVAs) using substantively relevant auxiliary variables as outcomes to determine if there are significant differences among the classes on these variables. (I’ve run these after creating the LPA model and confirming that I do not have a local solution.) Is this appropriate? Or will I also have too-small standard errors for such analyses because I am using a stepwise procedure here as well? If it helps to know, I am using cross-sectional data for these analyses. Again, thank you for your helpful guidance and speedy response!


Whenever you assign a person to the most likely class and treat class membership as a given, i.e., ignore sampling variability, you are going to have bias in your standard errors. They will be too small. It is always best to estimate the entire model at the same time. You can add your auxiliary variables to your analysis as covariates or distal outcomes whichever is most appropriate. 
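To make the point about ignored uncertainty concrete, here is a minimal Python sketch (not Mplus code). The posterior probabilities are invented for illustration, and the function names are hypothetical. It shows that most-likely-class assignment treats a person with a posterior of 0.55 the same as one with 0.98, and it computes the usual relative entropy summary of how sharp the classification is.

```python
from math import log

def modal_assignment(posteriors):
    """Return the 1-based most likely class for each person."""
    return [max(range(len(row)), key=lambda k: row[k]) + 1 for row in posteriors]

def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; 1 means classification without uncertainty."""
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = sum(-p * log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n * log(k))

# Hypothetical posterior class probabilities: four people, three classes.
posteriors = [
    [0.98, 0.01, 0.01],
    [0.55, 0.40, 0.05],
    [0.10, 0.85, 0.05],
    [0.34, 0.33, 0.33],
]

assigned = modal_assignment(posteriors)
entropy = relative_entropy(posteriors)
print("most likely class:", assigned)
print("relative entropy:", round(entropy, 3))
```

Persons 1, 2, and 4 all land in class 1 even though their posteriors differ greatly; a follow-up analysis that takes the assignment as given cannot see that difference, which is why its standard errors come out too small.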

Rebecca posted on Thursday, December 04, 2003  2:14 pm



Thanks for your comments from a few weeks ago. I have continued to work on the latent profile analysis and now have a follow-up question. I have three continuous background variables that I wanted to add to the LPA to determine if class membership varies as a function of these background variables. I have been using the latent class analysis with covariates example (example 25.10 in the Mplus manual) as my guide in this analysis, although my class indicators are interval, not binary (i.e., this is an LPA model, not LCA). Is this acceptable? I have been able to obtain an identified model, but I want to make certain that I am on the right track. And if I am on the right track, can I interpret the output in the same way for the LPA as I would for an LCA with covariates? That is, is this still multinomial logistic regression? In advance, thanks very much for your help!


Yes and yes. 

Anonymous posted on Wednesday, August 18, 2004  1:02 pm



I have created latent classes using factor mixture modeling. When I add covariates (children's scores on mental health measures), the classes change. I want to continue to examine the scores as covariates (rather than including them as indicators), as this fits best with my theory. In other words, children's scores are not part of the latent construct I wish to model, but I am interested to know how scores vary according to latent class probability. Am I going about this the correct way? Thank you. 

bmuthen posted on Wednesday, August 18, 2004  1:06 pm



Covariates, not only indicators, can and should influence the class formation. Think of it this way: any observed variable correlated with class membership carries information about class membership. In factor analysis with covariates you have the same situation, and in fact ETS uses an extensive list of covariates to produce its factor scores (called "proficiencies" and printed in your morning paper now and then). The issue of changing class membership due to covariates is discussed explicitly in Muthen (2004), which is available as a PDF on the Mplus home page.


I've run a latent profile model and ended up with a 3-class solution. I’ve also included covariates to predict class membership, but am unsure of how to interpret the covariate output. In the output below, my initial assumption was that the first column represents the parameter estimate, the second a standard error, and the third a test statistic. If this is correct, my second question is in regard to interpretation of the test statistic. I was originally thinking it could be evaluated on a z distribution (e.g., absolute values of 1.96), but am now confused because this is a multinomial regression, right? For example, from this output, my interpretation was that both Class 2 and Class 3 had significantly lower scores on the “lastsex1” variable compared to Class 1 and that Class 3 had significantly lower scores than Class 2, but that none of the three classes differed on the “relation” variable. Could you please tell me if that is an accurate assessment, or if this should be interpreted differently?

Parameterization using Reference Class 1

 C#2 ON
    LASTSEX1    1.526    0.570    2.680
    RELATION    0.211    0.464    0.456

 C#3 ON
    LASTSEX1    3.710    0.520    7.141
    RELATION    0.665    0.462    1.439

Parameterization using Reference Class 2

 C#3 ON
    LASTSEX1    2.184    0.579    3.771
    RELATION    0.454    0.501    0.905


The test is a z-score. Your interpretation sounds correct, but it is not a matter of lower values but of being less likely to be in the class.
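As a rough check on how the columns relate, the following Python sketch (not Mplus output) recomputes the z-scores and two-tailed p-values from the estimates and standard errors as they appear in the post. Any minus signs may have been lost when the output was pasted, so the values are taken at face value here. It also shows that the coefficient for Class 3 versus Class 2 is the difference of the two coefficients against Class 1; its standard error cannot be recovered this way because it depends on the covariance of the two estimates.

```python
from math import erfc, sqrt

def z_and_p(estimate, se):
    """Return the z-score and its two-tailed normal p-value."""
    z = estimate / se
    return z, erfc(abs(z) / sqrt(2.0))

# (estimate, standard error) as printed in the post, reference class 1.
c2_vs_c1 = {"LASTSEX1": (1.526, 0.570), "RELATION": (0.211, 0.464)}
c3_vs_c1 = {"LASTSEX1": (3.710, 0.520), "RELATION": (0.665, 0.462)}

for label, block in (("C#2 ON", c2_vs_c1), ("C#3 ON", c3_vs_c1)):
    for name, (est, se) in block.items():
        z, p = z_and_p(est, se)
        print(f"{label} {name}: z = {z:.3f}, p = {p:.4f}")

# Changing the reference class from 1 to 2 only re-expresses the same model.
c3_vs_c2 = {name: c3_vs_c1[name][0] - c2_vs_c1[name][0] for name in c2_vs_c1}
print("C#3 ON (reference class 2):", {k: round(v, 3) for k, v in c3_vs_c2.items()})
```

The differences reproduce the 2.184 and 0.454 printed under "Parameterization using Reference Class 2".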


Thanks for your reply. That makes perfect sense, and thank you for clarifying that I'm still dealing with likelihoods. As a quick follow-up question, I was wondering if it would be appropriate to calculate the odds ratios and confidence limits from the parameter estimates and standard errors to report in the manuscript I am writing.


We give odds ratios as part of the results and if you ask for CINTERVAL in the OUTPUT command, you will obtain confidence intervals. 
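For readers who want to verify the printed values by hand, a minimal Python sketch of the standard calculation follows. It assumes a normal-theory (Wald) interval on the logit scale that is then exponentiated; the function name is hypothetical, and the example estimate and standard error are the LASTSEX1 values quoted earlier in the thread, taken at face value.

```python
from math import exp

def odds_ratio_ci(estimate, se, z_crit=1.96):
    """Odds ratio with a Wald confidence interval (95% when z_crit = 1.96)."""
    return (
        exp(estimate),
        exp(estimate - z_crit * se),
        exp(estimate + z_crit * se),
    )

# Logit coefficient and standard error for C#2 ON LASTSEX1, as printed in the thread.
odds_ratio, lower, upper = odds_ratio_ci(1.526, 0.570)
print(f"OR = {odds_ratio:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An interval that excludes 1 corresponds to a coefficient whose z-score exceeds 1.96 in absolute value, so the two ways of reporting agree.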


Good morning, I am conducting an LPA with 4 classes and 2 continuous predictor variables. I would like to change the order of the classes so that I have a different reference class for the odds ratios. I have read numerous threads in the discussion, and I know that I need to use the ending values of the desired reference class as starting values for the last class. I also know that these values can be found in the output. I have two questions. 1) Which values do I use? and 2) What is the input syntax that I need to use? It seems that example 7.10 is the closest example of what I want to do. I have included my syntax below.

TITLE: PE LPA with gender as a covariate
DATA: FILE IS D:\data\Masterdata.dat;
VARIABLE:
  NAMES ARE EUID CMExp CSExp D1stGen CertType YOB Ethnic Gender
    Mk12ex Sk12ex MPtotal SPtotal k12ex Cex hstotal hmtotal
    cmtotal cstotal pmte1 mtoe1 pste1 stoe1 pmte3 mtoe3 pste3 stoe3;
  IDVARIABLE = EUID;
  MISSING are all(9);
  CLASSES = c(4);
  USEVAR = Gender pmte1 pste1;
DEFINE:
  IF (Gender eq 1) THEN Male = 0;
  IF (Gender eq 2) THEN Female =1;
ANALYSIS:
  TYPE = mixture;
  STARTS = 0;
MODEL:
  %OVERALL%
  c#1 ON Gender;
  c#2 ON Gender;
  c#3 ON Gender;

Thanks for your help!


You use the values under means and the syntax shown in Example 7.10. 


Linda, Thanks so much for the help! One point of clarification. Do I want to use the means from the baseline model (the 4-class LPA without covariates) or the means from the first run with a particular covariate?


You want to use the means for the analysis for which you want to change the reference class. 


I am running a multigroup LPA model using the KNOWNCLASS command. I've run the groups separately, and in both cases a 3-profile solution was the best fit, based on the VLMR. The interpretation of profiles was the same across groups as well. These profiles were also the same for the total sample. Is there a way to get the VLMR for a multigroup LPA model? I get the following warning: "TECH11 option is not available for TYPE=MIXTURE with the TRAINING option. Request for TECH11 is ignored." Is there a way to confirm the number of profiles that is the best fit in a multigroup LPA model? Thanks.


I would do the LPA with KNOWNCLASS for 2, 3, 4, etc. classes and look at BIC to see which is best. 
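The comparison suggested here amounts to tabulating BIC across the runs and picking the smallest value. A small Python sketch under stated assumptions: the loglikelihoods, parameter counts, and sample size below are invented for illustration, and `bic` and `best_by_bic` are hypothetical helper names. Mplus prints BIC directly, so this only shows what is being compared.

```python
from math import log

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; smaller is better."""
    return -2.0 * loglik + n_params * log(n_obs)

def best_by_bic(runs, n_obs):
    """Return (number of classes, BIC) for the run with the smallest BIC."""
    scored = {k: bic(ll, p, n_obs) for k, (ll, p) in runs.items()}
    best = min(scored, key=scored.get)
    return best, scored[best]

# Hypothetical KNOWNCLASS runs: classes -> (loglikelihood, free parameters).
runs = {2: (-5120.4, 13), 3: (-5060.2, 18), 4: (-5052.9, 23)}
n_obs = 500  # hypothetical sample size

best_classes, best_bic = best_by_bic(runs, n_obs)
for k, (ll, p) in runs.items():
    print(f"{k} classes: BIC = {bic(ll, p, n_obs):.2f}")
print("smallest BIC:", best_classes, "classes")
```

In this made-up example the 4-class run has the highest loglikelihood, but its extra parameters are penalized enough that the 3-class run has the smallest BIC.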


Thank you! 


We are using Mplus to run an LPA to see if different profiles of family engagement exist and if there are relations between these profiles and child and parent demographic characteristics and child outcomes. When we looked at the results, all but 2 of the auxiliary variables were not in the expected metric. When we looked at the class membership information that was saved, we also found that the variables did not seem to be in the order that was identified in the output. Can you help us understand why this happened and how this can be resolved? Thanks!


Are the variables in the NAMES statement in the order of the columns of the data set? This is the first thing I would check. Also, is the number of variable names in the NAMES statement the same as the number of columns in the data set? It sounds like you may be reading the data incorrectly. Use TYPE=BASIC with no MODEL command to investigate this.

anonymous posted on Tuesday, March 19, 2013  12:10 pm



When including covariates in an LPA, is there ever a time when you would interpret the intercepts that are presented in the output below the covariate information? (For example):

Categorical Latent Variables

 C#1 ON
    GRADE    0.174    0.231    0.754    0.451
    SEX      0.287    0.502    0.572    0.567

 C#2 ON
    GRADE    0.347    0.355    0.978    0.328
    SEX      1.662    0.950    1.749    0.080

 C#3 ON
    GRADE    0.054    0.249    0.215    0.830
    SEX      0.121    0.520    0.233    0.816

 Intercepts
    C#1      1.754    1.180    1.482    0.138
    C#2      4.329    1.545    2.801    0.005
    C#3      1.970    1.274    1.547    0.122


No, these would not be interpreted. They are simply related to the class probabilities which you know. 
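To show in what sense the intercepts are "related to the class probabilities", here is a hedged Python sketch of the multinomial logit calculation with the last class as reference (its intercept and slopes are zero). The numbers are the estimates as they appear in the post, taken at face value; minus signs may have been lost when the output was pasted, so the resulting probabilities are purely illustrative. The function name is hypothetical.

```python
from math import exp

def class_probabilities(intercepts, slopes, x):
    """Class probabilities from a multinomial logit with the last class as reference.

    intercepts: one value per non-reference class
    slopes:     one list of covariate coefficients per non-reference class
    x:          covariate values, in the same order as the coefficients
    """
    logits = [a + sum(b * xi for b, xi in zip(bs, x)) for a, bs in zip(intercepts, slopes)]
    logits.append(0.0)  # reference class
    top = max(logits)   # subtract the maximum for numerical stability
    weights = [exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Estimates as printed in the post; covariate order is (GRADE, SEX).
intercepts = [1.754, 4.329, 1.970]
slopes = [[0.174, 0.287], [0.347, 1.662], [0.054, 0.121]]

probs_at_zero = class_probabilities(intercepts, slopes, [0.0, 0.0])
print([round(p, 3) for p in probs_at_zero])
```

With covariates in the model, the intercepts give the class logits only for a person whose covariates all equal zero, which is one reason they are rarely of substantive interest on their own.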


I have recently completed an LPA with auxiliary variables and am trying to obtain more detailed information from my model results. My specific questions are:

1. Since my auxiliary variables are categorical, I know I can’t show means on these added variables, but how can I show frequency distributions on each variable by class membership produced from the LPA?

2. Do you have any recommendations on how to show or discuss the chi-square results? My output shows p-values between pairs of classes, but I can't infer any differences in representation that are greater than expected, as you would from standardized residuals in a chi-square analysis.

3. How do these chi-square tests differ methodologically from a standard chi-square test based on class membership? I only ask this because my chi-square values produced in trying to get answers using SPSS were much larger for one of my variables, almost by a factor of 4.

Thanks so much, Jonathan Steinberg


Please send your output to Support@statmodel.com. 


Dear Professors Muthen, I have run a Latent Profile Analysis with 8 variables. The sample is composed of two subsamples (1 recruited online and 1 recruited offline). The two subsamples differ on 2 of the 8 variables (p<.001). Therefore, I have run a multiple-group LPA in order to account for the subsample differences (the observed classes correspond to the online/offline subsamples). Does this make sense to you, or would you suggest another solution? Thank you very much! Andrea


When you say "differ on 2 of the 8 variables", do you mean that the means of those variables are different or that the variables themselves are different? And when you do the knownclass run, how are the known and unknown class variables specified to be related? 


Hi, Thank you for the prompt reply. Yes, their means differ (I have run a t-test). The variables are the same in the two groups. If I regress the unknown on the known, I find a significant association.


You may also want to explore direct effects from the Knownclass variable to those 2 variables, i.e. mean variation for the 2 variables as a function of Knownclass classes. Instead of Knownclass you can use an observed binary covariate x, with c on x and y on x for those 2. 

S Ernestus posted on Sunday, March 15, 2015  12:52 pm



Hello, I am having trouble understanding how to interpret and report covariate effects for an LCA. For a 4-class solution with 2 covariates:

 C#1 ON
    STRESS1    5.087    7.635    0.666    0.505
    RACE       0.225    0.630    0.357    0.721

 C#2 ON
    STRESS1    0.674    0.322    2.094    0.036
    RACE       0.342    0.255    1.343    0.179

 C#3 ON
    STRESS1    0.607    0.410    1.483    0.138
    RACE       0.416    0.439    0.948    0.343

 Intercepts
    C#1        2.036    0.744    2.737    0.006
    C#2        2.109    0.670    3.145    0.002
    C#3        0.731    0.856    0.855    0.393

From what I understand, each of these is providing the statistics to compare each class against the reference class. So, for example, when compared to class 4, the probability of being in class 1 decreases as stress increases, but this is not significant. Is that correct? Is there a way to interpret or report the significance of the covariate influence overall (e.g., overall did stress predict class membership)? Thank you so much for your time.


Q1. Yes. Q2. You can use Model Test to test if the 3 stress coefficients are jointly zero. 


I've read through a lot of posts regarding when covariates should be included in the models, but I'm still confused.

Setup: Develop sexism profiles (LV) as measured by 4 sexism scales (continuous). Determine if sexism profiles are a greater predictor of attitudes towards father involvement than demos.

Original plan: Conduct an LPA using sexism scales as observed y's to determine the best-fitting model. Assign cases to classes based on posterior probabilities, and examine differences in the demographics of the classes using crosstabs, etc. Ultimately, class assignment will be entered as the first block in an HMR, with demos in a second block, to examine the relationship with father involvement attitudes.

Alternative plan: Conduct an LCA using sexism scores as y's and the demos as u's. After selecting the best-fitting model, examine classes to determine if demos differ between classes. Then continue with the HMR as planned.

This is where I start to get confused, because some posts say to remove covariates (e.g., demos) one at a time and see if it changes the model solution, while others say to add them in one at a time. Which is the best/accepted practice? Also, I know that when doing an LPA, I have to run each class enumeration 4 times to account for each of the main within-class var/cov structures. Can I still do this with the y's if I'm including categorical u's?


I am guessing that "demos" are demographic variables. If you don't want to do a single-step analysis, I would follow the manual 3-step approach in Section 3.2 of the paper on our website: Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Using the BCH method in Mplus to estimate a distal outcome model and an arbitrary second model. Web note 21.


Hello! I'm new to Mplus and have been getting a lot of help from your homepage. Thank you so much. However, there are still problems I'm dealing with, and they are as follows. My research model, a conditional LPA model, contains (1) 3 variables used for the LPA, which are continuous and correlated with each other, and (2) 8 covariates.

1. Is it possible to use 3 variables from different times for LPA? For example, among the 3 variables used for LPA in my study, one was measured in 2014 and the others were measured in 2004.

2. I think the data used in the analysis are truncated because I have only chosen the data of people who have jobs and did listwise deletion of the others. I understand that this can cause selection bias in OLS. In this case, do I need to use a Heckman model in my analysis?

3. Should all variables in LPA meet the normality assumption?

Thank you very much in advance for your help!


1. No problem if they are measured on the same sample. 2. That would be too complex to try. Just draw your conclusions for your particular, selected sample. 3. No, a mixture implies non-normality.


Hi, I intend to use BCH for an LPA with auxiliary variables (antecedents/distal outcomes). Three questions: 1. Does the "mixture implies no normality" argument apply only to indicator variables of LPA, or does it apply to all variables in the model? 2. I have an outcome variable (salary) whose distribution is negatively skewed, leptokurtic, and possibly contains outliers. Would you recommend any preprocessing? 3. I read in several parts of this forum that as long as the antecedent has a direct effect on any of the profile indicators, there would not be measurement invariance. However, according to Lubke & Muthen (2005), if the direct effect does not vary across profiles, MI will still hold. Is this correct? Thank you.


1. Only to variables influenced by the latent class variable, so not the x's in c ON x. 2. Only if there are clear outliers that you don't want to devote a class to. 3. Did we really say that? It seems like there would be measurement noninvariance in all classes. I am thinking of measurement noninvariance as a difference in response mean/probability across x values even when conditioning on latent class, and that's the direct effect.


Thank you. Page 29 of the article states that when the direct path is specified to be class invariant, the latent classes can still be compared in a straightforward way... I just took that to mean MI, but I see your point now. 1. My model has 2 latent profile indicators, and the covariate (gender) in my model appears to have a profile-invariant effect on one of the indicators when I explored the data using R3STEP (Appendix O to Webnote 15). Will I still be able to use BCH in this case? 2. If I can, do I need to take any special care when I interpret the inter-profile differences on distal outcomes? 3. Gender correlates with some of my distal outcomes as well, so I also intend to regress the outcome on it. This is like the example in section 3.2 of Webnote 21, but I also want to specify a profile-invariant direct effect u1 on x2 (where x2 is a duplicate of x). Is this feasible? Thank you!

Chee Wee Koh posted on Wednesday, February 24, 2016  3:48 pm



Dear Dr Muthen, I have done some further reading (Kankaras, Moors, & Vermunt, 2010) and analyses, and I would appreciate your advice on the way forward, please: 1. I ran LPA for males and females separately. A 4-profile model is the best fit for both groups. 2. As mentioned, I have two indicator variables. The indicator means within each profile did not replicate across the gender groups. Also, the entropy in the male group was higher (.79 vs. .71). 3. I ran LPA for the pooled sample (with gender as covariate). Now a 5-profile model provides the best fit (the increase in the number of profiles with the pooled sample is consistent with the example in Kankaras et al.). The model with gender influencing class assignment only (structural equivalence, MI) has a BIC of 11785, whereas the model with a direct effect from gender to one of the indicator variables (no MI, just metric equivalence) has a slightly better BIC (11780). My questions remain the same as those in my earlier post. My study is exploratory. I am more interested in the impact of profiles on distal outcomes than in gender differences, but if the non-MI means that I should analyze male and female data separately, I can go that way too. Thank you!


First post: 1. I don't know what this means: "a profile invariant effect on one of the indicators when I explored the data using R3STEP". But I don't think you can use BCH correctly when there are direct effects on the indicators. 3. I think BCH assumes conditional independence of the indicators and other variables given the latent class variable, so no.

Second post: Try posting this more general analysis question on SEMNET.

Harmen Zoet posted on Tuesday, October 25, 2016  1:43 am



I'm planning to do an LPA of treatment outcome, after which I will try to find predictors of class membership (using continuous as well as categorical predictors). Is there basically a difference between inserting my predictors in the LPA as covariates vs. first conducting an LPA and then using the latent variable as the dependent variable in a multinomial logistic regression, which is conducted afterwards? 


See Nylund-Gibson, K. & Masyn, K. (2016). Covariates and mixture modeling: Results of a simulation study exploring the impact of misspecified effects on class enumeration. Structural Equation Modeling: A Multidisciplinary Journal, DOI: 10.1080/10705511.2016.1221313


I completed an LPA on a mixed-gender sample (n=408, 50% male). Using the steps in Finch, 2015, I then checked to see if the 4-class model I found was consistent across males and females. I got the same 4 class profiles for both males and females (and for the sample as a whole). When doing this, someone suggested I check whether people "jumped" groups depending on whether the analyses were done together or by gender. Although the girls did not jump much, about 50 boys moved groups, some from an extreme group into a non-extreme group. My questions are: 1) is this a valid analysis, and 2) does this mean anything from a statistical standpoint, or is the important part that the same 4 groups were found?


I think it is important that boys jumped when you analyzed them together with girls compared to analyzing them alone even if 4 classes and perhaps also the class percentages are the same. That seems to speak to model misfit in the joint analysis. I assume you checked measurement invariance in the joint analysis. 


I tried to do the gender invariance using cg (2) c(4) and c(4) cg (2), but I am not sure how to interpret these analyses. I cannot seem to find any reference that tells me whether to look at BIC differences, or model fit, or what not. With the parameters free, the BIC is 7918 with an entropy of .866. With the parameters constrained, the BIC is 7923 and the entropy is .853. These are so similar that, from what I can tell, the models are the same and therefore there is no variance across gender. Am I understanding that correctly?


By default c and cg are uncorrelated. If you say c on cg; you allow the c class percentages to vary with cg. 


Where would I find out how to tell whether I have gender invariance with that analysis? I still come out with 4 groups for both males and females. I have been looking for resources on how to determine this and I cannot find anything.


You can use the dot command %cg#1.c#1% etc. for the combination of classes. Then, within each such combination, impose equality constraints on the outcome means/thresholds across the cg classes. This is the run for measurement invariance across cg classes. Then run without those equalities. Then create the usual likelihood ratio chi-square test from the two loglikelihood values.


As suggested, I now have the following.

MODEL:
  %overall%
  c ON cg;
  %cg#1.c#1%
  %cg#1.c#2%
  %cg#1.c#3%
  %cg#1.c#4%
  %cg#2.c#1%
  %cg#2.c#2%
  %cg#2.c#3%
  %cg#2.c#4%

How do I constrain these to be equal? What is the best place for me to find literature on how to do this? I have been using the Mplus user's guide and these forums, and I am feeling really lost.


Say that you have 5 latent class indicators y1-y5. Measurement invariance across the cg classes is then stated via equalities of parameter labels p for the y means as

  %cg#1.c#1%
  [y1-y5] (p1-p5);
  %cg#1.c#2%
  [y1-y5] (p11-p15);
  %cg#1.c#3%
  [y1-y5] (p21-p25);
  %cg#1.c#4%
  [y1-y5] (p31-p35);
  %cg#2.c#1%
  [y1-y5] (p1-p5);
  %cg#2.c#2%
  [y1-y5] (p11-p15);
  %cg#2.c#3%
  [y1-y5] (p21-p25);
  %cg#2.c#4%
  [y1-y5] (p31-p35);

The model without measurement invariance is obtained by deleting the parameter labels in parentheses.
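Once both runs are done, the likelihood ratio chi-square test described earlier in the thread can be computed by hand. The Python sketch below is only an illustration: the loglikelihoods and parameter counts are invented, the helper names are hypothetical, and the chi-square tail probability is computed from a series expansion so that no external library is needed. One assumption to flag: if the models were estimated with a robust estimator such as MLR, the plain difference shown here is not the appropriate statistic, and a scaled difference test using the scaling correction factors printed in the output would be needed instead.

```python
from math import exp, lgamma, log

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variate (series for the incomplete gamma)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    for n in range(1, 100000):
        term *= z / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = exp(a * log(z) - z - lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)

def lr_test(ll_constrained, ll_free, params_constrained, params_free):
    """Likelihood ratio statistic, degrees of freedom, and p-value."""
    stat = 2.0 * (ll_free - ll_constrained)
    df = params_free - params_constrained
    return stat, df, chi2_sf(stat, df)

# Hypothetical results: with 5 indicators and 4 classes, the invariance run
# holds 20 means equal across the two cg classes, so it has 20 fewer parameters.
stat, df, p_value = lr_test(
    ll_constrained=-3905.0, ll_free=-3890.0,
    params_constrained=40, params_free=60,
)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

A small p-value would indicate that holding the means equal across the cg classes worsens fit, that is, evidence against measurement invariance.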

Fan Xizhen posted on Wednesday, February 22, 2017  1:27 am



Dear Prof. Muthen, I'm working on a multigroup profile analysis with known classes with two cross-national samples, and I'm confused about how to interpret the analyses. I ran 4 MLPA models: (1) a completely unrestricted MLPA in which within-profile means and variances were allowed to vary freely over groups, in addition to profile size; (2) semi-constrained model 1, in which profile size was still allowed to vary freely, but conditional means and variances were constrained to be equal across groups; (3) semi-constrained model 2, in which profile size and variances were still allowed to vary freely, but conditional means were constrained to be equal across groups; and (4) a fully constrained model, in which both profile size and within-profile means and variances were fixed to be equivalent across groups. I used BIC to select the best-fitting model, and it selected semi-constrained model 2 (i.e., profile size and variances were still allowed to vary freely, and conditional means were constrained to be equal across groups). So, does this mean there was no measurement variance across nations? What is the difference between semi-constrained model 1 and semi-constrained model 2? Do I have to constrain the means and variances at the same time to obtain measurement invariance? Thanks in advance; your reply would be highly appreciated!


Knownclass is a way to do multiple-group analysis in the mixture context. Regarding MLPA, I assume you impose measurement invariance, which implies that you should use these results for 3-stepping.

Julie Nguyen posted on Wednesday, September 06, 2017  9:34 pm



I ran a latent profile analysis without covariates using the entire sample. I found 4 profiles to be the best model. Now I want to determine if there are group differences, but I have 4 groups I want to compare. 1) Am I able to compare the 4 latent profiles across the four groups in one model using KNOWNCLASS, or can I only compare two groups in a model at a time? I've only found examples with two groups. 2) Rather than comparing the same number of latent profiles (4) for each group in a model, is there a way I can allow the number of latent profiles to vary (be estimated separately) per group for 4 different groups, but done all in one model (i.e., in one analysis/model output, group 1 could have 4 profiles, group 2 could have 5 profiles, etc.)? Or is it best to run the latent profile analysis for each group separately? I want to be able to have 4 groups in one model to later see if there are differences in covariates relating to a profile per group.


1) Yes, 4 grps is not a problem. 2) If you have no parameters held equal across groups, a joint analysis of all groups gives the same results as separate analysis of each group. So nothing is gained. 

Julie Nguyen posted on Thursday, September 07, 2017  3:59 pm



Thank you! I have some follow up questions. 1) would I need to dummy code the 4 group variable? 2)what changes would I make to a multigroup latent profile syntax to run a joint analysis with all 4 groups in the model without specifying the number of profiles but rather explore how many latent profiles there are per group? 


1) No, you can use Define to create a grouping variable which you then use for Knownclass. 2) See a UG example that has cg in it, such as ex 8.8. 


Good morning, I am running an LPA with covariates. I have found that a 4-class solution best fits the data. When I include the covariates, the proportions in each class change significantly. I read in another discussion that when this happens, this disparity in class proportions requires direct effects between class indicators and covariates. When I include these direct effects, the class proportions are still different. Do you have any suggestions to help fix this issue? Thank you for your help.


You don't include direct effects for the results to be the same as with no covariates, you include them to get good estimates for the model. You can also use R3STEP if you don't want the covariates to influence the classification. 

Abeer Alamri posted on Saturday, December 15, 2018  5:34 am



Hello, I am doing an LPA (5-profile) and using gender as a covariate. I selected 5 profiles and used R3STEP. It seems the algorithm cannot converge on an estimate for one of the levels (*****). In the first set, it can’t come up with an estimate of 4 vs. 5. We need to figure out what is happening here if we want to speak about gender. 1) Could you please explain why we have ***** and can't get an estimate? 2) How could we interpret this? Thank you. Here is a part of the output:

THE 3-STEP PROCEDURE

                                                  Two-Tailed
              Estimate      S.E.    Est./S.E.      P-Value

 C#4 ON
    G        *********     0.000      999.000        0.000

Parameterization using Reference Class 1

 C#4 ON
    G        *********     1.303     9834.578        0.000

Parameterization using Reference Class 2

 C#4 ON
    G        *********    24.407      527.947        0.000


Maybe the variable G has a very small scale (very small variance) or maybe class 4 is very small or maybe G is binary and has nobody or only 1 person in class 4. If this doesn't help, send your output to Mplus Support along with your license number. 
