Rebecca posted on Thursday, November 20, 2003 - 5:18 am
Hello. I have been able to develop an interesting 5-class latent profile model in the analyses I’ve been running. Now I am thinking of adding covariates to the model. To see if this might be worthwhile, I quickly ran a few chi-square tests to see whether there would be statistically significant differences at this level (e.g., race/ethnicity by the 5 classes). Unfortunately, I did not find significant differences with these analyses. My question is this: might it still be worthwhile to run an LPA with covariates? Would this type of multivariate analysis find nuanced significant differences that I am not picking up in the straightforward chi-square? Thanks in advance for any guidance you can provide.
I think what you did was classify people into their most likely class and then run a chi-square test of class membership by each covariate. If so, and you found no significance, there probably is none: a stepwise approach like this gives standard errors that are too small, so if anything it tends to find significance falsely. You can always add covariates to the LPA model and see what happens.
Rebecca posted on Thursday, November 20, 2003 - 10:43 am
Yes. This is exactly what I did. Thank you for your guidance. As you suggest, I think I will add a covariate to the LPA model and see what happens, because I have substantive questions that this type of analysis may help answer.
Now I have one follow-up question based on your comment about the too-small standard errors and false significance:
In order to assess the LPA model’s interpretability and usefulness, I have been running follow-up analyses (e.g., MANOVAs) using substantively relevant auxiliary variables as outcomes to determine whether there are significant differences among the classes on these variables. (I ran these after creating the LPA model and confirming that I do not have a local solution.) Is this appropriate? Or will I also have too-small standard errors for such analyses because I am using a stepwise procedure here as well? If it helps to know, I am using cross-sectional data for these analyses.
Again, thank you for your helpful guidance and speedy response!
Whenever you assign a person to the most likely class and treat class membership as a given, i.e., ignore sampling variability, you are going to have bias in your standard errors. They will be too small. It is always best to estimate the entire model at the same time. You can add your auxiliary variables to your analysis as covariates or distal outcomes, whichever is most appropriate.
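To make the point about ignored classification uncertainty concrete, here is a minimal Python sketch. The posterior probabilities are made-up illustration values, not output from any model in this thread, and the function names are hypothetical. It contrasts hard (most likely class) assignment with the posterior-weighted class counts, and computes the relative entropy summary of classification certainty.

```python
import math

# HYPOTHETICAL posterior class probabilities for five people in a 3-class model
# (illustration values only, not taken from any analysis in this thread).
posteriors = [
    [0.90, 0.07, 0.03],
    [0.55, 0.40, 0.05],
    [0.34, 0.33, 0.33],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
]

def modal_class(probs):
    """Return the 1-based index of the most likely class."""
    return probs.index(max(probs)) + 1

def relative_entropy(post):
    """Relative entropy in [0, 1]; values near 1 mean near-certain classification."""
    n, k = len(post), len(post[0])
    total = sum(-p * math.log(p) for row in post for p in row if p > 0)
    return 1 - total / (n * math.log(k))

# Hard assignment keeps only the largest probability per person.
modal = [modal_class(row) for row in posteriors]
modal_counts = [modal.count(c) for c in (1, 2, 3)]

# Posterior-weighted counts keep the uncertainty that hard assignment discards.
expected_counts = [sum(row[c] for row in posteriors) for c in range(3)]

print("modal assignment:", modal)
print("modal class counts:", modal_counts)
print("posterior-weighted counts:", [round(x, 2) for x in expected_counts])
print("relative entropy:", round(relative_entropy(posteriors), 3))
```

The second and third persons are counted fully in class 1 under hard assignment even though their posterior probabilities are spread across classes. That discarded uncertainty is what a follow-up chi-square or MANOVA on assigned classes ignores.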
Rebecca posted on Thursday, December 04, 2003 - 2:14 pm
Thanks for your comments from a few weeks ago. I have continued to work on the latent profile analysis and now have a follow-up question.
I have three continuous background variables that I want to add to the LPA to determine whether class membership varies as a function of these background variables. I have been using the latent class analysis with covariates example (example 25.10 in the Mplus manual) as my guide in this analysis, although my class indicators are interval, not binary (i.e., this is an LPA model, not an LCA). Is this acceptable? I have been able to obtain an identified model, but I want to make certain that I am on the right track. And if I am on the right track, can I interpret the output for the LPA in the same way as I would for an LCA with covariates? That is, is this still multinomial logistic regression?
Anonymous posted on Wednesday, August 18, 2004 - 1:02 pm
I have created latent classes using factor mixture modeling. When I add covariates (children's scores on mental health measures), the classes change. I want to continue to examine the scores as covariates (rather than including them as indicators), as this fits best with my theory. In other words, children's scores are not part of the latent construct I wish to model, but I am interested to know how scores vary according to latent class probability.
Am I going about this the correct way? Thank you.
bmuthen posted on Wednesday, August 18, 2004 - 1:06 pm
Covariates can and should influence the class formation, not only indicators. Think of it this way: any observed variable correlated with class membership carries information about class membership. In factor analysis with covariates you have the same situation, and in fact ETS uses an extensive list of covariates to produce their factor scores (called "proficiencies" and printed in your morning paper now and then). The issue of changing class membership due to covariates is discussed explicitly in Muthen (2004), which is available as a pdf on the Mplus home page.
I've run a latent profile model and end up with a 3 class solution. I’ve also included covariates to predict class membership, but am unsure of how to interpret the covariate output. In the output below, my initial assumption was that the first column represents the parameter estimate, the second a standard error, and the third a test statistic. If this is correct, my second question is in regard to interpretation of the test statistic. I was originally thinking it could be evaluated on a z distribution (e.g., absolute values of 1.96), but am now confused because this is a multinomial regression, right? For example, from this output, my interpretation was that both Class 2 and Class 3 had significantly lower scores on the “lastsex1” variable compared to Class 1 and that Class 3 had significantly lower scores than Class 2, but that none of the three classes differed on the “relation” variable. Could you please tell me if that is an accurate assessment, or if this should be interpreted differently?
Parameterization using Reference Class 1
 C#2 ON
    LASTSEX1    -1.526    0.570    -2.680
    RELATION    -0.211    0.464    -0.456
 C#3 ON
    LASTSEX1    -3.710    0.520    -7.141
    RELATION    -0.665    0.462    -1.439
Parameterization using Reference Class 2
 C#3 ON
    LASTSEX1    -2.184    0.579    -3.771
    RELATION    -0.454    0.501    -0.905
Thanks for your reply - that makes perfect sense and thank you for clarifying that I'm still dealing with likelihoods. As a quick follow-up question, I was wondering if it would be appropriate to calculate the odds ratios and confidence limits from the parameter estimates and standard errors to report in the manuscript I am writing.
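As a rough illustration of the calculation being asked about, the sketch below exponentiates the logit coefficients from the output posted above to get odds ratios, with Wald-type 95% limits computed as exp(estimate ± 1.96 × S.E.). The function name is hypothetical, and the sketch assumes the three columns are estimate, standard error, and their ratio, as the poster supposed.

```python
import math

def odds_ratio_ci(estimate, se, z_crit=1.96):
    """Return (odds ratio, lower limit, upper limit) from a logit coefficient and its S.E."""
    return (
        math.exp(estimate),
        math.exp(estimate - z_crit * se),
        math.exp(estimate + z_crit * se),
    )

# Estimates and standard errors copied from the posted output (reference class 1).
rows = {
    ("C#2", "LASTSEX1"): (-1.526, 0.570),
    ("C#2", "RELATION"): (-0.211, 0.464),
    ("C#3", "LASTSEX1"): (-3.710, 0.520),
    ("C#3", "RELATION"): (-0.665, 0.462),
}

for (cls, cov), (est, se) in rows.items():
    odds, lower, upper = odds_ratio_ci(est, se)
    print(f"{cls} ON {cov}: z = {est / se:.2f}, "
          f"OR = {odds:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")

# Changing the reference class only re-expresses the same model: the class 3
# versus class 2 slope equals the difference of the two class-1-referenced slopes.
c3_vs_c2 = rows[("C#3", "LASTSEX1")][0] - rows[("C#2", "LASTSEX1")][0]
print(f"C#3 vs C#2 on LASTSEX1: {c3_vs_c2:.3f}")
```

An interval that excludes 1 corresponds to an estimate/S.E. ratio beyond ±1.96 on the logit scale, so the odds ratios and the z tests tell the same story.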
I am conducting an LPA with 4 classes and 2 continuous predictor variables. I would like to change the order of the classes so that I get the odds ratios relative to a different reference class. I have read numerous threads in the discussion, and I know that I need to use the ending values of the desired reference class as starting values for the last class. I also know that these values can be found in the output. I have two questions. 1) Which values do I use? and 2) What is the input syntax that I need to use? It seems that example 7.10 is the closest example of what I want to do. I have included my syntax below.
Thanks so much for the help! One point of clarification. Do I want to use the means from the baseline model (the 4-class LPA without covariates) or the means from the first run with a particular covariate?
I am running a multigroup LPA model using the KNOWNCLASS command. I've run the groups separately and in both cases a 3-profile solution was the best fit, based on the VLMR. The interpretation of profiles was the same across groups as well. These profiles were also the same for the total sample.
Is there a way to get the VLMR for a multigroup LPA model? I get the following Warning: TECH11 option is not available for TYPE=MIXTURE with the TRAINING option. Request for TECH11 is ignored.
Is there a way to confirm the number of profiles that are the best fit in a multigroup LPA model?
We are using Mplus to run an LPA to see whether different profiles of family engagement exist and whether there are relations between these profiles and child and parent demographic characteristics and child outcomes.
When we looked at the results, all but 2 of the auxiliary variables were not in the expected metric. When we looked at the class membership information that was saved, we also found that the variables did not seem to be in the order identified in the output.
Can you help us understand why this happened and how this can be resolved?
Are the variables in the NAMES statement in the order of the columns of the data set? This is the first thing I would check. Also, is the number of variable names in the NAMES statement the same as the number of columns in the data set? It sounds like you may be reading the data incorrectly. Use TYPE=BASIC with no MODEL command to investigate this.
anonymous posted on Tuesday, March 19, 2013 - 12:10 pm
When including covariates in an LPA, is there ever a time when you would interpret the intercepts that are presented in the output below the covariate information? (For example):
Categorical Latent Variables
 C#1 ON
    GRADE    -0.174    0.231    -0.754    0.451
    SEX       0.287    0.502     0.572    0.567
 C#2 ON
    GRADE     0.347    0.355     0.978    0.328
    SEX       1.662    0.950     1.749    0.080
 C#3 ON
    GRADE    -0.054    0.249    -0.215    0.830
    SEX      -0.121    0.520    -0.233    0.816
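One way to see what the intercepts contribute is to turn the multinomial logit coefficients into class probabilities. The sketch below is only an illustration: the slopes are copied from the output above, but the intercepts were not posted, so the intercept values used here are hypothetical placeholders, as is the function name. The last class is treated as the reference class, with its logit fixed at 0.

```python
import math

def class_probabilities(intercepts, slopes, x):
    """Multinomial-logit class probabilities; the last class is the reference (logit 0)."""
    logits = [
        a + sum(b * xi for b, xi in zip(bs, x))
        for a, bs in zip(intercepts, slopes)
    ]
    logits.append(0.0)  # reference class
    top = max(logits)   # subtract the maximum for numerical stability
    expo = [math.exp(v - top) for v in logits]
    total = sum(expo)
    return [e / total for e in expo]

# Slopes for (GRADE, SEX) copied from the posted output.
slopes = [(-0.174, 0.287), (0.347, 1.662), (-0.054, -0.121)]

# HYPOTHETICAL intercepts: the posted output does not show them.
intercepts = [0.5, -1.0, 0.2]

# With every covariate at 0, the probabilities depend on the intercepts alone.
at_zero = class_probabilities(intercepts, slopes, (0, 0))
print([round(p, 3) for p in at_zero])

# Raising SEX from 0 to 1 shifts each non-reference logit by its SEX slope.
sex_one = class_probabilities(intercepts, slopes, (0, 1))
print([round(p, 3) for p in sex_one])
```

Under this parameterization each intercept is the log odds of that class versus the reference class when all covariates equal 0, so its usefulness depends on whether 0 is a meaningful covariate value (for example, after centering).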
I have recently completed an LPA with auxiliary variables and am trying to obtain more detailed information from my model results. My specific questions are:
1. Since my auxiliary variables are categorical, I know I can’t show means on these added variables, but how can I show frequency distributions on each variable by class membership produced from the LPA?
2. Do you have any recommendations on how to show or discuss the chi-square results? My output shows p-values between pairs of classes, but I can't infer which cells are over-represented relative to expectation, as you could from standardized residuals in a chi-square analysis.
3. How do these chi-square tests differ methodologically from a standard chi-square test based on class membership? I only ask this because my chi-square values produced in trying to get answers using SPSS were much larger for one of my variables almost by a factor of 4.
Dear Professors Muthen, I have run a Latent Profile Analysis with 8 variables. The sample is composed of two subsamples (1 recruited online and 1 recruited offline). The two subsamples differ on 2 of the 8 variables (p<.001). Therefore, I have run a multiple-group LPA in order to account for the subsample differences (the known classes correspond to the online/offline subsamples). Does this make sense to you, or would you suggest another solution? Thank you very much! Andrea
From what I understand, each of these is providing the statistics to compare each class against the reference class. So for example, when compared to class 4, the probability of being in class 1 decreases as stress increases but this is not significant. Is that correct?
Is there a way to interpret or report the significance of the covariate influence overall (e.g. overall did stress predict class membership)?
I've read through a lot of posts regarding when covariates should be included in the models, but I'm still confused.
Setup: Develop sexism profiles (LV) as measured by 4 sexism scales (continuous). Determine if sexism profiles are a greater predictor of attitudes towards father involvement than demos.
Original plan: Conduct an LPA using sexism scales as observed y's to determine the best-fitting model. Assign cases to classes based on post probs, examine differences in the demographics of the classes using cross-tabs, etc. Ultimately class assignment will be entered as the first block in a HMR, with demos in a second block, to examine relationship with father involvement attitudes.
Alternative plan: Conduct an LCA using sexism scores as y's and the demos as u's. After selecting the best fitting model, examine classes to determine if demo differ between classes. Then continue with the HMR as planned.
This is where I start to get confused, because some posts say remove covariates (e.g. demos) one at a time and see if it changes the model solution while others say to add them in one at a time. Which is the best/accepted practice?
Also, I know that when doing an LPA, I have to run each class enumeration 4 times to account for each of the main within-class var/cov structures. Can I still do this with the y's if I'm including categorical u's?
Hello! I'm new to Mplus and getting the help of your homepage a lot. Thank you so much. However, there are still problems I'm dealing with and they are as follows.
My research model, a conditional LPA model, contains (1) 3 variables used for the LPA, which are continuous and correlated with each other, and (2) 8 covariates.
1. Is it possible to use 3 variables from different time points for the LPA? For example, of the 3 variables used for the LPA in my study, one was measured in 2014 and the others were measured in 2004.
2. I think the data used in the analysis are truncated, because I selected only people who have jobs and listwise-deleted the others. I understand that this can cause selection bias in OLS. In this case, do I need to use a Heckman model in my analysis?
3. Should all variables in the LPA meet the normality assumption?
I intend to use BCH for a LPA with auxiliary variables (antecedents/distal outcomes). Three questions:
1. Does the "mixture implies no normality" argument apply only to indicator variables of LPA, or does it apply to all variables in the model?
2. I have an outcome variable (salary) whose distribution is negatively skewed, leptokurtic, and possibly contains outliers. Would you recommend any pre-processing?
3. I read in several parts of this forum that as long as the antecedent has a direct effect on any of the profile indicators, there would not be measurement invariance. However, according to Lubke & Muthen (2005), if the direct effect does not vary across profiles, MI will still hold. Is this correct?
1. Only to variables influenced by the latent class variable, so not x's in c ON x.
2. Only if there are clear outliers that you don't want to devote to a class.
3. Did we really say that? Seems like there would be measurement noninvariance in all classes. I am thinking of measurement noninvariance as a difference in response mean/prob across x values even when conditioning on latent class - and that's the direct effect.
Thank you. Page 29 of the article states that when the direct path is specified to be class invariant, the latent classes can still be compared in a straightforward way... I just took that to mean MI but I see your point now.
1. My model has 2 latent profile indicators and the covariate (gender) in my model appears to have a profile invariant effect on one of the indicators when I explored the data using R3STEP (Appendix O to Webnote 15). Will I still be able to use BCH in this case?
2. If I can, do I need to take any special care when I interpret the inter-profile differences on distal outcomes?
3. Gender correlates with some of my distal outcomes as well so I also intend to regress outcome on it. This is like the example in section 3.2 of Webnote 21 - but I also want to specify a profile-invariant direct effect u1 on x2 (where x2 is a duplicate of x). Is this feasible?
Chee Wee Koh posted on Wednesday, February 24, 2016 - 3:48 pm
Dear Dr Muthen,
I have done some further reading (Kankaras, Moors, & Vermunt, 2010) and analyses, and I would appreciate your advice on the way forward please:
1. I ran LPA for males and females separately. A 4-profile model is the best fit for both groups.
2. As mentioned, I have two indicator variables. The indicator means within each profile did not replicate across the gender groups. Also, the entropy in the male group was higher (.79 vs. .71).
3. I ran LPA for the pooled sample (with gender as covariate). Now a 5-profile model provides the best fit (the increase in the number of profiles with the pooled sample is consistent with the example in Kankaras et al.). The model with gender influencing class assignment only (structural equivalence, MI) has a BIC of 11785, whereas the model with a direct effect from gender to one of the indicator variables (no MI, just metric equivalence) has a slightly better BIC (11780).
My questions remain the same as those in my earlier post. My study is exploratory. I am more interested in the impact of profiles on distal outcomes rather than gender differences but if the non-MI means that I should analyze male and female data separately, I can go that way too.
" a profile invariant effect on one of the indicators when I explored the data using R3STEP"
But I don't think you can use BCH correctly when there are indicator effects.
3. I think BCH assumes conditional independence of the indicators and other variables given the latent class variable, so no.
Try posting this more general analysis question on SEMNET.
Harmen Zoet posted on Tuesday, October 25, 2016 - 1:43 am
I'm planning to do an LPA of treatment outcome, after which I will try to find predictors of class membership (using continuous as well as categorical predictors). Is there basically a difference between inserting my predictors in the LPA as covariates vs. first conducting an LPA and then using the latent variable as the dependent variable in a multinomial logistic regression, which is conducted afterwards?
Nylund-Gibson, K. & Masyn, K. (2016). Covariates and mixture modeling: Results of a simulation study exploring the impact of misspecified effects on class enumeration. Structural Equation Modeling: A Multidisciplinary Journal, DOI: 10.1080/10705511.2016.1221313
I completed an LPA on a mixed-gender sample (n=408, 50% male). Using the steps in Finch (2015), I then checked whether the 4-class model I found was consistent across males and females. I got the same 4 class profiles for both males and females (and for the sample as a whole). When doing this, someone suggested I check whether people "jump" groups depending on whether the analyses are done together or by gender. Although the girls did not jump much, about 50 boys moved groups, some from an extreme group into a non-extreme group. My questions are: 1) is this a valid analysis, and 2) does this mean anything at a statistical level, or is the important part that the same 4 groups were found?
I think it is important that boys jumped when you analyzed them together with girls compared to analyzing them alone even if 4 classes and perhaps also the class percentages are the same. That seems to speak to model misfit in the joint analysis. I assume you checked measurement invariance in the joint analysis.
I tried to do the gender invariance using cg (2) c(4) and c(4) cg (2) but I am not sure how to interpret these analyses. I cannot seem to find any reference that tells me whether to look at BIC differences, or model fit, or what not.
For the parameters being free the BIC is 7918 with an entropy of .866
For the parameters constrained the BIC is 7923 and the entropy is .853.
These are so similar that from what I can tell the models are the same and therefore there is no variance across gender. Am I understanding that correctly?
Where can I find out how to determine whether I have gender invariance with that analysis? I still come out with 4 groups for both males and females. I have been looking for resources on how to determine this and I cannot find anything.
etc for the combination of classes. And then within each such combination impose equality constraints on the outcome means/thresholds across the cg classes. This is the run for measurement invariance across cg classes. Then run without those equalities. Then create the usual likelihood ratio chi-square test from the two loglikelihood values.
The run without measurement invariance is obtained by deleting the parameter labels in parentheses.
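The likelihood ratio test described above can be sketched in a few lines of Python. The loglikelihood values and parameter counts below are hypothetical placeholders, not results from this thread, and the function names are made up for illustration. The sketch assumes the loglikelihoods come from plain maximum likelihood; with a robust estimator such as MLR the difference would first need a scaling correction, which is not shown here.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with positive integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(h) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * term
        term *= h / (i + 0.5)
    return total

def likelihood_ratio_test(ll_constrained, ll_free, npar_constrained, npar_free):
    """Return (chi-square statistic, df, p-value) for two nested models."""
    stat = 2.0 * (ll_free - ll_constrained)
    df = npar_free - npar_constrained
    return stat, df, chi2_sf(stat, df)

# HYPOTHETICAL values: the invariance run (equalities imposed) versus the
# run with those equalities removed.
stat, df, p = likelihood_ratio_test(
    ll_constrained=-3920.4, ll_free=-3905.1,
    npar_constrained=26, npar_free=34,
)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.5f}")
```

A small p-value would suggest that the equality constraints worsen fit, i.e., evidence against measurement invariance across the cg classes. A large one means the constrained model is not rejected.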
Fan Xizhen posted on Wednesday, February 22, 2017 - 1:27 am
Dear Prof. Muthen, I'm working on a multigroup latent profile analysis with known classes using two cross-national samples, and I'm confused about how to interpret the analyses. I ran 4 MLPA models:
(1) a completely unrestricted MLPA in which within-profile means and variances, in addition to profile size, were allowed to vary freely across groups;
(2) a semi-constrained model-1 in which profile size was still allowed to vary freely, but conditional means and variances were constrained to be equal across groups;
(3) a semi-constrained model-2 in which profile size and variances were still allowed to vary freely, but conditional means were constrained to be equal across groups; and
(4) a fully constrained model in which profile size and within-profile means and variances were all fixed to be equal across groups.
I used BIC to select the best-fitting model, and it favored semi-constrained model-2 (i.e., profile size and variances allowed to vary freely, conditional means constrained to be equal across groups). So, does this mean there was no measurement variance across nations? What is the difference between semi-constrained model-1 and semi-constrained model-2? Do I have to constrain the means and variances at the same time to obtain measurement invariance? Thanks in advance, your reply would be highly appreciated!