Group differences in LCA
Mplus Discussion > Latent Variable Mixture Modeling >
 Jennie Jester posted on Monday, February 10, 2003 - 11:56 am
I want to compare latent class analysis for symptoms in girls and boys. Can I do a 2-group analysis in LCA (where I could then force the groups to be equivalent or not and check whether this causes worse fit), or do I do the two groups separately?

Another question: I have been looking at Tech 8 to check on smooth convergence. Occasionally, a "QN" or "FS" pops up in the algorithm column, and this is often accompanied by a big jump in the change in log likelihood. So the change in log likelihood is not smoothly converging to zero (which is what I was hoping to see in the Tech 8 output). Is this a problem? Can you explain a little more what I should be checking for in the Tech 8 output?

Thanks for all the help,

Jennie
 Linda K. Muthen posted on Monday, February 10, 2003 - 1:37 pm
It is not necessary to use multiple group analysis to compare the symptom items across gender. You can just regress the symptom items on gender and accomplish your goal. The symptom items have no variances or covariances, so you don't need multiple group analysis.

The QN and FS in the algorithm column indicate that the estimation algorithm has changed. QN stands for quasi-Newton and FS stands for Fisher scoring. This is not something to worry about. Following are the things you should be looking at in the TECH8 output:

1. loglikelihood should increase smoothly and reach a stable maximum -- with a change in algorithm, there may be more of a change

2. absolute and relative changes should go to zero

3. class counts should remain stable
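To make the checklist concrete, here is a minimal Python sketch of what "smooth convergence" looks like in a TECH8-style loglikelihood trace. The loglikelihood values are invented for illustration; TECH8 itself prints these quantities for you.

```python
# Hypothetical loglikelihood trace across iterations (made-up values).
loglik = [-5200.0, -5120.5, -5090.2, -5081.7, -5080.9, -5080.88, -5080.879]

# 1. The loglikelihood should increase toward a stable maximum.
increasing = all(b >= a for a, b in zip(loglik, loglik[1:]))

# 2. Absolute and relative changes should approach zero.
abs_change = abs(loglik[-1] - loglik[-2])
rel_change = abs_change / abs(loglik[-2])

print(increasing)          # True
print(abs_change < 1e-2)   # True
print(rel_change < 1e-5)   # True
```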
 Christian Geiser posted on Friday, February 25, 2005 - 2:09 am
Dear Linda,

I want to check whether a latent class model with 12 binary LC indicators and 4 classes is the same for males and females. Therefore, I used the KNOWNCLASS option to do a multigroup analysis. I arrived at constraining the response probabilities to be equal across gender but now I also want to test if the class sizes are equal for both males and females. I tried to specify this with the following model statement:

MODEL: %OVERALL%
[csex#1.c#1*0.882] (49);
[csex#1.c#2*0.216] (50);
[csex#1.c#3*0.716] (51);

[csex#2.c#1*0.882] (49);
[csex#2.c#2*0.216] (50);
[csex#2.c#3*0.716] (51);

However, it didn't work. Could you give me a hint about what I must change? Thanks a lot!
 bmuthen posted on Sunday, February 27, 2005 - 11:30 am
I think perhaps the easiest way to get the gender invariance of class probabilities that you want is to instead let c and cg be uncorrelated as they are by default. Note that in ex 8.8 you have

c#1 on cg#1;

which means that the c class probabilities vary as a function of the cg (Knownclass) classes. Leaving out that line makes the c probabilities the same for the classes of cg, which is what you want. Try that.
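To see why omitting that line does the job: the c class probabilities follow a multinomial logit, and the slope on cg is the only route by which the known classes can shift them. A minimal Python sketch of that logic (the intercepts are illustrative numbers, not from any real run):

```python
import math

def class_probs(intercepts, slopes, g):
    """Multinomial-logit class probabilities; the last class is the
    reference with its logit fixed at 0. g indicates the cg class."""
    logits = [a + b * g for a, b in zip(intercepts, slopes)] + [0.0]
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]

a = [0.882, 0.216, 0.716]   # illustrative intercepts for c#1-c#3

# With 'c#1 ON cg#1' left out, the slopes are zero, so both cg classes
# share exactly the same c class probabilities:
p_group1 = class_probs(a, [0.0, 0.0, 0.0], g=1)
p_group2 = class_probs(a, [0.0, 0.0, 0.0], g=0)
print(p_group1 == p_group2)   # True
```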
 Christian Geiser posted on Monday, February 28, 2005 - 7:09 am
Thank you Bengt. This was actually what I did (I left out that line) but it appears from my output that the class sizes are only approximately (but not perfectly) identical:

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
BASED ON THE ESTIMATED MODEL

1 1 247.64496 0.14610
1 2 116.88705 0.06896
1 3 148.71330 0.08774
1 4 221.79412 0.13085
1 5 115.96057 0.06841
2 1 245.60793 0.14490
2 2 115.92559 0.06839
2 3 147.49004 0.08701
2 4 219.96972 0.12978
2 5 115.00672 0.06785

When I do the (restricted) analysis in PANMARK, the class counts in both groups match perfectly (though the parameter estimates appear to be identical to those of Mplus). Do you have an explanation?
 Linda K. Muthen posted on Monday, February 28, 2005 - 8:58 am
Please send your input, output, and data to support@statmodel.com so we can answer your question.
 bmuthen posted on Thursday, March 03, 2005 - 1:47 pm
Christian - it looks like the difference in estimated class probabilities is merely due to the slightly different group sizes. You have sample size 851 in the first group and 844 in the second (a ratio of 1.0083). Since the estimated proportions that you give above are from the joint distribution of the 2 x 5 table (the 10 proportions add to 1), you would only see the same 5 class proportions in the two groups if you account for the sample size difference. For instance, 0.14610/1.0083 is approximately 0.14490.
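The renormalization can be checked directly. Here is a short Python sketch using the estimated class counts quoted above; dividing each group's counts by its own sample size shows that the conditional class probabilities are identical across groups:

```python
# Estimated class counts from the output above (cg group -> c class counts).
counts = {
    1: [247.64496, 116.88705, 148.71330, 221.79412, 115.96057],
    2: [245.60793, 115.92559, 147.49004, 219.96972, 115.00672],
}

for g, cnt in counts.items():
    n = sum(cnt)  # 851.0 for group 1, 844.0 for group 2
    print(g, [round(x / n, 4) for x in cnt])
# Both groups print [0.291, 0.1374, 0.1748, 0.2606, 0.1363]:
# the within-group (conditional) class probabilities match.
```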
 Christian Geiser posted on Thursday, March 24, 2005 - 1:34 am
Dear Bengt, thank you very much. Later, when I was looking at the transition probabilities, I realized that the class probabilities were actually identical in both groups. I have another question. I did my multigroup LCA both in Mplus and in PANMARK. I found that the loglikelihood, AIC, and BIC values, as well as the parameter estimates, are identical in both programs. However, the df, Pearson X^2, and LR test statistics were different for the multigroup analysis (not for single-group LCA!). Do you have an explanation? Thanks again, Christian
 bmuthen posted on Thursday, March 24, 2005 - 7:52 am
There was a glitch in the computations of those statistics when Knownclass was used - this has been fixed in version 3.12. Let us know if you still have discrepancies after trying that.
 Anonymous posted on Thursday, September 01, 2005 - 9:46 am
I have cross-sectional data from a psychological inventory and I want to test the hypothesis that there are different classes (or profiles) for different age groups (2 age groups). What is the best strategy to test this hypothesis: using age as a predictor, or using Knownclass, e.g., ex 8.8?
In addition, what paper do you recommend that could help me interpret the data for group differences?
I need to be as descriptive as possible because my audience will not be very sophisticated in terms of stats knowledge.
Thanks,
P
 bmuthen posted on Thursday, September 01, 2005 - 11:30 am
Using age as a predictor is perhaps most straightforward.

There is a 1985 Clogg & Goodman paper in Sociological Methodology which discusses group differences in latent class analysis.
 anonymous posted on Thursday, January 12, 2006 - 5:23 am
Is it feasible to regard latent classes generated through LCA as subpopulations? In other words, I would like to use the 4 latent classes yielded by my LCA in a general SEM model, but rather than use the classes as predictors in this model, I would like to carry out multigroup comparisons based on individuals' most likely class membership.
Is this possible, or does this result in estimation error?
Or would it be better to include the actual class probabilities, and not most likely membership, as predictors in a model?
 Linda K. Muthen posted on Thursday, January 12, 2006 - 8:40 am
It is not a good idea to use most likely class membership as a grouping variable. You will be introducing estimation error and your standard errors will not be correct. You could use the class probabilities as predictors, but it would be better to do the analysis simultaneously not in two steps.
 anonymous posted on Thursday, January 12, 2006 - 9:03 am
Thanks for this. Can you point me towards an example of how this is done in a single step, or can you suggest any paper/reference which has applied this? Many thanks
 Linda K. Muthen posted on Thursday, January 12, 2006 - 9:11 am
The way to have a latent class variable as a covariate, that is, to regress a dependent variable on the latent class variable, is to allow the means of the dependent variable to vary across classes.
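In other words, "regressing y on c" amounts to class-varying means for y, and the marginal mean of y is then the class-probability-weighted mixture. A small sketch with made-up numbers:

```python
# Hypothetical class probabilities and class-specific means of a
# dependent variable y (illustrative values only).
probs = [0.25, 0.40, 0.35]
means = [1.2, 2.8, 0.5]

# The marginal mean of y is the mixture of the class-specific means:
marginal = sum(p * m for p, m in zip(probs, means))
print(round(marginal, 3))   # 1.595
```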
 RJM posted on Tuesday, January 24, 2006 - 4:13 pm
I would like to estimate a multiple group (KNOWNCLASS = cg) hidden markov model with two classes at each occasion, testing the equality for the two groups of the probability matrix linking each latent class variable and its indicator. Is this possible?

I tried coding the model as follows but Mplus v3.13 returns an error that the class label is unknown.

MODEL x:
%cg#1.x#1%
[f$1] (111);
%cg#1.x#2%
[f$1] (121);

%cg#2.x#1%
[f$1] (211);
%cg#2.x#2%
[f$1] (221);
 Linda K. Muthen posted on Tuesday, January 24, 2006 - 4:50 pm
Please send your input, data, output, and license number to support@statmodel.com.
 Andy Ross posted on Tuesday, May 09, 2006 - 11:03 am
Dear Prof. Muthen

I am attempting to run a multi-group analysis comparing a latent class solution for two groups

In the first step I ran a four class solution for each group simultaneously using the KNOWNCLASS command, allowing both class and conditional probabilities to vary across groups:

TITLE: slca
DATA: FILE IS c:\slca;
VARIABLE: NAMES ARE pt re kd em hq g;
USEVARIABLES ARE pt re kd em hq;
CLASSES = cg(2) c(4);
KNOWNCLASS = cg (g = 1 g = 2);
CATEGORICAL = pt re kd em hq;
ANALYSIS: TYPE = MIXTURE;
STARTS = 10 5;
STITERATIONS = 200;
MITERATIONS = 3000;

MODEL: %overall%
c#1-c#3 on cg#1;

In the next step I wanted to run a restricted model in which the class probabilities are equal across groups (structural homogeneity). However I have not been able to set this up. I tried inputting the start thresholds for the first group, and constraining the conditional probabilities to be equal across groups using the following command syntax:

TITLE: slca
DATA: FILE IS c:\slca;
VARIABLE: NAMES ARE pt re kd em hq g;
USEVARIABLES ARE pt re kd em hq;
CLASSES = cg(2) c(4);
KNOWNCLASS = cg (g = 1 g = 2);
CATEGORICAL = pt re kd em hq;
ANALYSIS: TYPE = MIXTURE;
STARTS = 10 5;
STITERATIONS = 200;
MITERATIONS = 3000;

MODEL: %overall%
%cg#1.c#1%
[pt$1*-4.1 pt$2*-2.7](p1);
[re$1*2.7 re$2*7.8](p2);
[kd$1*-3.5 kd$2*1.5](p3);
[em$1*0.3 em$2*1.5 em$3*3.7](p4);
[hq$1*-2.8 hq$2*0 hq$3*0.8](p5);
%cg#1.c#2%
[pt$1*-0.7 pt$2*0.2](p6);
[re$1*2.2 re$2*12](p7);
[kd$1*3.1 kd$2*12](p8);
[em$1*2.8 em$2*3.1 em$3*3.2](p9);
[hq$1*-3.3 hq$2*-0.4 hq$3*0.3](p10);
%cg#1.c#3%
[pt$1*-1.4 pt$2*-0.6](p11);
[re$1*-1.2 re$2*3.6](p12);
[kd$1*-3.2 kd$2*0.5](p13);
[em$1*-0.8 em$2*0.1 em$3*1.8](p14);
[hq$1*-0.5 hq$2*2.3 hq$3*3.2](p15);
%cg#1.c#4%
[pt$1*4.0 pt$2*5.0](p16);
[re$1*-3.7 re$2*-0.9](p17);
[kd$1*3.8 kd$2*6.8](p18);
[em$1*0.8 em$2*1.0 em$3*1.1](p19);
[hq$1*-1.2 hq$2*0.7 hq$3*1.5](p20);
%cg#2.c#1%
[pt$1*-4.1 pt$2*-2.7](p21);
[re$1*2.7 re$2*7.8](p22);
[kd$1*-3.5 kd$2*1.5](p23);
[em$1*0.3 em$2*1.5 em$3*3.7](p24);
[hq$1*-2.8 hq$2*0 hq$3*0.8](p25);
%cg#2.c#2%
[pt$1*-0.7 pt$2*0.2](p26);
[re$1*2.2 re$2*12](p27);
[kd$1*3.1 kd$2*12](p28);
[em$1*2.8 em$2*3.1 em$3*3.2](p29);
[hq$1*-3.3 hq$2*-0.4 hq$3*0.3](p30);
%cg#2.c#3%
[pt$1*-1.4 pt$2*-0.6](p31);
[re$1*-1.2 re$2*3.6](p32);
[kd$1*-3.2 kd$2*0.5](p33);
[em$1*-0.8 em$2*0.1 em$3*1.8](p34);
[hq$1*-0.5 hq$2*2.3 hq$3*3.2](p35);
%cg#2.c#4%
[pt$1*4.0 pt$2*5.0](p36);
[re$1*-3.7 re$2*-0.9](p37);
[kd$1*3.8 kd$2*6.8](p38);
[em$1*0.8 em$2*1.0 em$3*1.1](p39);
[hq$1*-1.2 hq$2*0.7 hq$3*1.5](p40);

MODEL CONSTRAINT:
p1=p21;
p2=p22;
p3=p23;
p4=p24;
p5=p25;
p6=p26;
p7=p27;
p8=p28;
p9=p29;
p10=p30;
p11=p31;
p12=p32;
p13=p33;
p14=p34;
p15=p35;
p16=p36;
p17=p37;
p18=p38;
p19=p39;
p20=p40;

However this did not work. Could you please tell me how I can set up and run the structural homogeneity model for the above example?

Also, can I check: in order to run the next step, in which I also restrict the class probabilities to be equal across groups (complete homogeneity), do I simply run the original syntax, except for removing the model statement:

MODEL: %overall%
c#1-c#3 on cg#1;


Is this correct?

Many thanks for your support

Andy
 Linda K. Muthen posted on Tuesday, May 09, 2006 - 11:36 am
You need to send your input, data, output, and license number to support@statmodel.com to get help on this.
 Khoun Bok Lee posted on Tuesday, October 02, 2007 - 5:44 am
hi

I want to test the hypothesis that the proportions of the 2 classes in the low-educated group are the same as the proportions of the 2 classes in the high-educated group, using education as a predictor.
However, I could not find the correct command to compare the class proportions between the 2 groups.
Is there a command for my test?
Although I know how to test my hypothesis using a KNOWNCLASS model, the output of this model did not report df (I don't know the reason). Because I want a statistical test using the X^2 distribution, df is needed.
many thanks
 Linda K. Muthen posted on Tuesday, October 02, 2007 - 8:59 am
Instead of using the education variable as a grouping variable, use it as a covariate and regress the categorical latent variable on it using the ON option of the MODEL command.
 Lannie Ligthart posted on Thursday, April 10, 2008 - 2:49 am
I would like to test group differences in 3-class LCA profiles across two variables: sex and affection status for a disorder.
I tried to do this using the KNOWNCLASS option, by creating 4 groups: male/unaffected, female/unaffected, male/affected, female/affected, and then equating the response probabilities step by step, starting with group 1 and 3 vs. 2 and 4 etc., and comparing the BICs for these models.

I coded this as follows:

KNOWNCLASS = cg (group=1 group=2 group=3 group=4);
classes = cg(4) c(3) ;

and then:

Model: %OVERALL%
C#1 ON cg#1;
C#2 ON cg#1;
C#1 ON cg#2;
C#2 ON cg#2;
C#1 ON cg#3;
C#2 ON cg#3;

I have two questions:
1) Did I specify this model correctly (I have never seen any scripts using more than 2 groups)?
2) Is it a valid approach to create four groups the way I did, or is there a better way to do this?
 Linda K. Muthen posted on Thursday, April 10, 2008 - 9:15 am
1. It looks correct. In Version 5, you can simply say c ON cg;
2. I would use -2 times the loglikelihood difference not BIC.
 C. Sullivan posted on Thursday, July 24, 2008 - 6:00 am
Hi, I'm trying to conduct a multigroup LCA using "knownclass" (adstat), and while I can run a model constrained in terms of conditional item probabilities, I'm having difficulty holding the latent class probabilities equal. Any advice on how to constrain those latent class probabilities would be much appreciated. This is the input that I have so far.

MODEL:
%Overall%
drgcrm#1 on adstat#1;

%adstat#1.drgcrm#1%
[coc$1] (2);
[op$1] (3);
[pcp$1] (4);
[mj$1] (5);
%adstat#1.drgcrm#2%
[coc$1] (6);
[op$1] (7);
[pcp$1] (8);
[mj$1] (9);

%adstat#2.drgcrm#1%
[coc$1] (2);
[op$1] (3);
[pcp$1] (4);
[mj$1] (5);
%adstat#2.drgcrm#2%
[coc$1] (6);
[op$1] (7);
[pcp$1] (8);
[mj$1] (9);
 Linda K. Muthen posted on Thursday, July 24, 2008 - 9:11 am
Try removing the statement:

drgcrm#1 on adstat#1;

If you continue to have problems, send your files and license number to support@statmodel.com.
 James Swartz posted on Sunday, March 22, 2009 - 6:14 pm
Sorry for this very simple question, but when comparing a restricted versus unrestricted LCA model with known classes, do you compare the statistics printed as the loglikelihood values or the likelihood ratio chi-squares...

Thanks,
James
 Linda K. Muthen posted on Sunday, March 22, 2009 - 9:39 pm
To test nested LCA models, the regular loglikelihood values are compared. -2 times the loglikelihood difference is used.
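For instance, with made-up loglikelihoods for a restricted model nested in an unrestricted one, the test statistic would be computed like this (all numbers are invented for illustration):

```python
# Illustrative loglikelihoods (invented, not from any real run).
ll_restricted = -5123.417    # H0: e.g., thresholds constrained equal across groups
ll_unrestricted = -5110.254  # H1: thresholds free
df = 20                      # difference in number of free parameters

lr = -2 * (ll_restricted - ll_unrestricted)
print(round(lr, 3))   # 26.326

# Compare to the chi-square critical value (31.41 for df = 20, alpha = .05);
# here the restrictions would not be rejected.
print(lr < 31.41)     # True
```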
 davide morselli posted on Monday, February 08, 2010 - 6:52 am
Hi,
I'm comparing latent class structures between groups. I've performed the analysis separately on the two samples, and the results are that a 3-class model is good for group 1, while for group 2 a 4-class model is preferable. Since classes 1 to 3 are in effect similar in both groups (e.g., each class is defined in the same way by the same items), can I specify a model that considers at the same time the measurement equivalence of classes 1 to 3 and the fact that group 2 has 1 additional class?
 Linda K. Muthen posted on Monday, February 08, 2010 - 8:18 am
That seems to be a reasonable approach.
 davide morselli posted on Monday, February 08, 2010 - 9:04 am
ok, but how can I specify the model?
If I constrain the means of class 4 of group 1 (cg#1.c#4) to be equal to zero, I get 0 subjects in the model with no other constraints, but 114 subjects in the model where I constrain measurement invariance across groups. Do I have to constrain some other parameter?

the syntax is:
CLASSES = cg (2) c(4) ;
KNOWNCLASS = cg (country = 1 country = 2);
ANALYSIS:
TYPE = MIXTURE;
STARTS = 500 50;
MODEL:
%OVERALL%
c ON cg ;
%cg#1.c#4%
[v1-v6@0];

MODEL c:
%c#1%
[v1-v6];
%c#2%
[v1-v6];
%c#3%
[v1-v6];
 Linda K. Muthen posted on Monday, February 08, 2010 - 11:01 am
In the %OVERALL% part of the model, for the class you want to be the empty class, for example class 1, specify:

c#1 ON cg#1@-15;

For further support, please send your question and license number to support@statmodel.com.
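Why a value like -15 works: on the multinomial-logit scale, a logit that large and negative drives the class probability to essentially zero for that group. A quick sketch (treating -15 as the combined logit for class 1 in that group; the other logits are made-up):

```python
import math

# Class-1 logit pushed to -15 for the cg#1 group; the other logits are
# illustrative values, with the last class as the reference (logit 0).
logits = [-15.0, 0.3, 0.2, 0.0]
denom = sum(math.exp(v) for v in logits)
p_class1 = math.exp(logits[0]) / denom
print(p_class1 < 1e-6)   # True: the class is effectively empty
```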
 Maria Guadalupe posted on Thursday, April 15, 2010 - 8:23 am
Hello! I want to do an LCA with multiple groups. I have 4 groups, and for 3 of the 4, a 2-class solution fits the data best, but for one group a 3-class solution fits best. Is there a way to free the number of classes for that one group with 3 classes, or some other way to handle this situation?

Gracias!
 Bengt O. Muthen posted on Friday, April 16, 2010 - 10:30 am
It is not easy to work with different numbers of classes in different groups. Instead, you could investigate the 3-class solution in all groups - perhaps in the 3 groups where 2 classes fit best, the 3-class solution is just a minor extension of the 2-class theme.
 Christian M. Connell posted on Friday, April 23, 2010 - 1:28 pm
I am working on a model similar to those described above (gender comparison of latent classes based upon 13 binary indicators) using the knownclass approach. We have been able to run models that freely estimate item-response probabilities across class by gender and also to run models in which the item-response probabilities are restricted within class by gender (i.e., males and females in a given class are restricted to have the same item-response probabilities).

Where I am having some difficulty is in restricting the class probabilities (i.e., the prevalence of each class) to be the same across gender. I have removed the regression of the latent classes on the knownclass variable from the overall statement (as suggested previously), but the class probabilities still differ by gender.
 Bengt O. Muthen posted on Saturday, April 24, 2010 - 12:56 pm
Please send your input, output, data, and license number to support@statmodel.com so we can see your exact setup.
 F Lamers posted on Friday, October 22, 2010 - 1:32 pm
I'm trying to run an LCA using the KNOWNCLASS command to evaluate whether profiles are similar between two groups. First, I ran an unrestricted model where class and conditional probabilities are allowed to vary across groups:
VARIABLE: NAMES ARE sampleid d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 group;
USEVARIABLES ARE d1-d10 ;
CATEGORICAL= d1 d2 d3 d4 d5 d6 d7 d8 d9 d10;
MISSING= all (-1234);
IDVARIABLE IS sampleid;
CLASSES= cg (2) c(2);
KNOWNCLASS = cg (group=0 group=1);
ANALYSIS: TYPE= MIXTURE;
STARTS= 400 100 ;
PROCESSORS=2;
MODEL: %OVERALL%
c on cg;

Now, I would like to equalize the response probabilities in the classes between the two groups (and then compare the models with a -2 loglikelihood test). I've been trying to write the input for this, but I'm not sure if I'm doing things the right way.

MODEL:%OVERALL%
c on cg;

%cg#1.c#1%
[d1$1 d2$1 d3$1 d3$2 d4$1 d4$2 d5$1 d5$2 d6$1 d6$2 d7$1 d8$1 d9$1 d10$1] (1-14);
%cg#1.c#2%
[d1$1 d2$1 d3$1 d3$2 d4$1 d4$2 d5$1 d5$2 d6$1 d6$2 d7$1 d8$1 d9$1 d10$1] (15-28);

%cg#2.c#1%
[d1$1 d2$1 d3$1 d3$2 d4$1 d4$2 d5$1 d5$2 d6$1 d6$2 d7$1 d8$1 d9$1 d10$1] (1-14);
%cg#2.c#2%
[d1$1 d2$1 d3$1 d3$2 d4$1 d4$2 d5$1 d5$2 d6$1 d6$2 d7$1 d8$1 d9$1 d10$1] (15-28);

Is this the right way?
 Linda K. Muthen posted on Sunday, October 24, 2010 - 9:04 am
That looks correct. The best way to check is to run it and see if you get what you expect.
 Jennifer Buckley posted on Wednesday, February 02, 2011 - 2:34 am
I am trying to run a similar model to the one above, where response probabilities are equalized across different groups, and I have the following queries:

1) when the dependent variables are categorical, how many constraints do I need? Is it the number of categories minus 1?

2) Do the groups need to be the same size when doing multiple group analysis with knownclass?
 Linda K. Muthen posted on Wednesday, February 02, 2011 - 6:48 am
1. The number of thresholds for a categorical variable is the number of categories minus one.

2. No.
 J.D. Haltigan posted on Friday, May 06, 2011 - 7:22 pm
Hi:

I have a similar question that has been posed above but I am not sure if I am interpreting my output correctly.

I have a 4 class model (generated from 7 binary latent class indicators) and the substantive question I want to ask is whether membership in these classes is the same for males and females (gender as covariate).

I have regressed the categorical latent variable on gender using:

MODEL: %OVERALL%
C#1 on sex;

However, I am not sure how to interpret the estimates. Specifically the overall C#1 on sex estimate and then the 3 intercept estimates (C#1, C#2, C#3). Any insight would be greatly appreciated.

Also, is there a separate plot command to generate the plot for relationship between class probabilities and covariate (as opposed to the item probabilities for each class)?
 Linda K. Muthen posted on Saturday, May 07, 2011 - 10:38 am
You should use the specification c ON x. For this multinomial logistic regression with four classes, you will obtain three regression coefficients and three intercepts. The intercepts are used along with the regression coefficients to compute probabilities. See pages 443-445 of the Version 6 user's guide.

As far as the regression coefficients, the last class is the reference class. Let's say for a continuous covariate x in class 1, the regression coefficient for x is positive. The interpretation is that as x increases, the log odds increases for those in class 1 compared to the reference class.
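The same logits can be turned into actual class probabilities at any value of x. A sketch with hypothetical intercepts and slopes (the multinomial-logit form follows the user's guide pages cited above; the numbers are invented):

```python
import math

# Hypothetical intercepts and slopes for c#1-c#3; class 4 is the reference.
intercepts = [0.5, -0.2, 0.1]
slopes     = [0.8,  0.0, -0.4]

def probs(x):
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]

# A positive class-1 slope means the class-1 odds (relative to the
# reference class) grow as x increases:
p_low, p_high = probs(-1.0), probs(1.0)
print(p_high[0] / p_high[3] > p_low[0] / p_low[3])   # True
```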
 J.D. Haltigan posted on Monday, May 09, 2011 - 1:58 pm
Thanks! Is it the case that if one uses ALGORITHM=INTEGRATION (to regress the covariate on the latent classes) the plots for the relationship between class probabilities and the covariate cannot be generated? I have seen examples of these plots and would like to be able to see them for my data, but I cannot run the above model without ALGORITHM=INTEGRATION.
 Bengt O. Muthen posted on Monday, May 09, 2011 - 4:30 pm
I think that is true. Do you really need algorithm = integration? (Also, c ON x doesn't mean to regress the covariate on the latent classes, but the other way around.)
 J.D. Haltigan posted on Monday, May 09, 2011 - 10:40 pm
Thanks, and thanks for the language clarification. I worked with my syntax and was able to get the plots of probabilities for the classes as a function of the covariate (I was able to run it without ALGORITHM=INTEGRATION).

One conceptual question: given that the covariates are not assumed to be conditionally independent within classes, what is the usual justification for including a covariate in class generation? In other words, if there is a theoretical basis for including the covariate in the model but the logistic regression coefficients are not significant, does it make sense NOT to include the covariate in subsequent latent class generation?

Relatedly, the approach I have been taking with my data is to use the classes to ascertain or predict differences on various relevant antecedent and sequelae variables (e.g., ANOVA/logistic regression). But does it make more sense to use variables that might explain the classes as covariates in the model (aside from the binary behavioral indicators of the phenomenon)?
 Bengt O. Muthen posted on Tuesday, May 10, 2011 - 10:26 am
Answer to your 1st question: Yes.

Answer to your 2nd question:

If you think of certain variables as antecedents of the latent class variable, I would include them as covariates ("c ON x"). Then you can also see how the covariate means change over the classes. This is different from having these variables as indicators of the latent classes, because there is no assumption of conditional independence among covariates.
 J.D. Haltigan posted on Thursday, May 12, 2011 - 12:43 am
Thanks again for the helpful remarks. I am still in the process of fully grasping certain aspects of the conceptual aspects of the LCA method.

One further point of clarification on the above: is it the case that I cannot compare models (with the same number of classes) with and without the covariates included? Although my BIC and adjusted BIC values are more favorable with the covariates included, the covariate estimates are not significant. That said, I am not certain such a comparison is tenable (i.e., perhaps one can only compare versions of the same model when adjudicating different numbers of classes).
 Bengt O. Muthen posted on Thursday, May 12, 2011 - 8:10 am
Because the likelihood is computed for outcomes conditional on covariates, you can compare BIC between models with and without covariates as long as the models have the same outcomes. But if none of the covariates are significant, why include them? Note that you can test joint significance of the covariates by Model Test testing if all their slopes are zero - instead of looking at the z score for each slope separately. I'm not sure, but I think this is what you were asking.
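The BIC trade-off can be written out: each extra covariate slope adds ln(n) to the penalty, so the covariate model is preferred only if the loglikelihood gain is large enough. A sketch with invented numbers:

```python
import math

def bic(loglik, n_params, n):
    """BIC = -2*logL + p*ln(n); smaller values are preferred."""
    return -2 * loglik + n_params * math.log(n)

# Made-up loglikelihoods for two models with the same outcomes:
n = 500
without_cov = bic(-5110.25, 15, n)   # no covariates
with_cov    = bic(-5104.80, 18, n)   # 3 covariate slopes added
print(round(without_cov, 2), round(with_cov, 2))   # 10313.72 10321.46
print(with_cov < without_cov)   # False: the loglikelihood gain of 5.45
                                # does not offset 3 * ln(500) of penalty
```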
 J.D. Haltigan posted on Thursday, May 12, 2011 - 4:57 pm
Just to be sure of myself, when you say outcomes, you mean classes, correct? Yes, since the covariates are not significant, I likely will not include them in the model. The somewhat odd result I am having trouble grasping is that with the same indicators, a 4-class solution is best without the covariates, but a 3-class solution seems best with them (indeed, model estimates are more favorable for the 3-class model with covariates than without). Substantive theory in the area I am working in would point to either a 3- or 4-class solution, so in that sense it is not a problem.

One thought I had was that, even though the covariates in the 3-group solution are not significant (risk index comes close), the resultant classes (3) may have better predictive yield for the substantive question I am looking to address with the classes (i.e., unique predictive correlates). Is this a reasonable strategy to approach the issue from?
 Bengt O. Muthen posted on Thursday, May 12, 2011 - 6:50 pm
I actually meant the observed outcomes being the same - which I assume you have in your case.

The topic of deciding on the number of classes with or without covariates has been studied by several authors - you may want to email Katherine Masyn at the Harvard School of Ed, for example. Some covariates may have direct effects on the observed outcomes, which makes things more complex; in that case you should include the direct effects. See also my 2004 chapter on GMM in the Kaplan handbook.

If you have theoretical reasons for a set of covariates influencing the latent class variables, I would not be against reporting results from such an analysis even if none of the covariates turns out to be significant.
 J.D. Haltigan posted on Thursday, May 12, 2011 - 7:58 pm
Dr. Muthen, thanks again; this is very helpful. If I am correct that observed outcomes = the observed latent class indicators, then yes, they were the same 7 for each model.

Given that a 4-class model emerged without the covariates but a 3-class model emerged with them (albeit non-significant), it must still be the case that group membership has changed (4 to 3 classes, etc.) as a function of the inclusion of the covariates? Or am I misunderstanding? I have not used the three-class solution yet in analyses, but I am anxious to see if the classes show a different pattern of results (in terms of their relationship to subsequent variables).

One thing that may be limiting the model is that the N (cases) is 177. This seems rather low based on my reading of most of the LCA literature. However, given the relative rarity of the behavioral phenotype of interest, 177 is probably the largest sample on which this type of analysis will be conducted--at least at present.
 Bengt O. Muthen posted on Monday, May 16, 2011 - 9:04 am
If your 3-class model with covariates has some significant direct effects from some covariates to some observed latent class indicators, it may be that it has a worse BIC than a 4-class model. So the group membership may not change when the modeling with covariates is done in more depth.

A sample of less than 200 can make BIC underestimate the number of classes - see the article on our web site:

Nylund, K.L., Asparouhov, T., & Muthén, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling, 14, 535-569.
 J.D. Haltigan posted on Wednesday, May 18, 2011 - 12:55 am
Thanks again, I read over the Nylund piece in detail and it was quite helpful.

One thing that a number of colleagues have broached with me is that it may be circular to claim, in my case for example, that a given class shows associations with a covariate (e.g., maternal depression) if that covariate informed the class membership in the first place (i.e., their position is that it is more plausible to just use the latent class indicators to inform class enumeration and then test for differences on antecedents in a separate step). I have received this critique a number of times and am curious as to the advantage of using covariates in the LCA modeling process (i.e., what is the rebuttal to this claim)?
 Bengt O. Muthen posted on Wednesday, May 18, 2011 - 9:16 am
That's a big topic. I think it all depends on the application situation. It is interesting that this comes up often in LCA but not in factor analysis. A factor model with factors regressed on covariates (a MIMIC model) typically would not be questioned - the covariates contribute information to the formation of the factor scores so you include them. Although there are exceptions - take the MIMIC model used by ETS to score student performance. On the individual level using only the indicators makes sense - you wouldn't want to have say gender influence your factor score, only your performance. But on the group level, covariates improve the scores.

In your case, if you want to show how strongly a covariate predicts class membership, you do that best in a single run including the covariate. Doing it in several steps (without covariates, classify, regress class on covariates) has its own complications. However, I can see an argument for not using the covariates when deciding on the classes. For instance, if covariates are genotypes and the indicators are phenotypes - here you want to define the phenotype without using the genetic information, and then see the strength of relationship.
 J.D. Haltigan posted on Friday, May 20, 2011 - 12:26 am
Thanks, Dr. Muthen. Would one include a direct effect of a covariate on a given indicator if they had a priori theory or evidence that the covariate influences, say, one of the X indicators? This is certainly not the case for me (i.e., I only suspect that the chosen covariates will in some way influence class membership--I have no prior evidence to suggest that they are associated with a given indicator) but I was curious as to when one would include such a direct effect of a covariate on an indicator (in addition to regressing the class on the covariate).
 Bengt O. Muthen posted on Saturday, May 21, 2011 - 9:52 am
You would do it either by theory, or when less is known, simply by regressing one indicator at a time on all covariates in an exploratory way to see which direct effects are significant.
 J.D. Haltigan posted on Friday, December 02, 2011 - 1:18 pm
Returning again to this thread to make sure I am interpreting some things correctly...Just finished reading some of Collins and Lanza (2010) and had a few questions.

If I fit an LCA model with 5 covariates (simultaneously), is it the case that the odds ratios will tell me the increase in the odds of a given class membership for a one-unit increase in a given covariate, but that in order to test whether a covariate is significant, I would need to run separate models with and without that covariate and perform the chi-square test of model fit?

Also, I have not standardized the covariates prior to entry into the model. I realize this will not affect the results other than easing interpretation, but I did find out that to obtain standardized parameters for this type of model, numerical integration is required. I was just curious why this (numerical integration for standardized coefficients) was the case.
 Bengt O. Muthen posted on Friday, December 02, 2011 - 2:12 pm
The printed z tests (Est/SE) for the coefficient of each covariate give the test of whether the covariate has a significant influence.

I don't see how the request for standardized calls for numerical integration - better send to support to see the full picture.
 J.D. Haltigan posted on Friday, December 02, 2011 - 10:52 pm
Thanks, Dr. Muthen. This was my original understanding, and I got a bit turned around after reading Chapter 6 of Collins and Lanza. They note that in an LCA with covariates the latent class prevalences are expressed as functions of the regression coefficients and individuals' values on the corresponding covariates. I get this. They then mention that hypothesis testing in LCA with covariates is done by means of a likelihood ratio chi-square test. This is where I became a bit confused. By hypothesis testing I now understand the authors to mean testing competing models (i.e., one with a covariate and one without). So, if there is not a significant improvement in model fit with the addition of the covariate (even though it may have a significant odds-ratio estimate on a given class relative to the reference class), how does one evaluate the covariate? I guess since in this case I am using a hypothetical example of one covariate, if it has a significant odds-ratio estimate, there would have to be an improvement in class prediction relative to a baseline model with no covariate?

To be more clear, in my example of a 5-covariate model, 2 of the 5 covariates had significant odds-ratio estimates depending on the reference class (theory-expected). I can say that these covariates significantly influence class membership correct? There is no need, say, to compare to a baseline model without these covariates?
 Bengt O. Muthen posted on Saturday, December 03, 2011 - 4:49 pm
You can test the significance of a covariate by the z test for its slope that Mplus prints. You can also run 2 models, one with the slope free and one with the slope fixed at zero. 2 times the loglikelihood difference for these runs is chi-square which is z-squared - so these two tests should agree. To test more than one covariate having zero slopes you can still do the likelihood difference testing - I think that's the testing mentioned in the book.
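The agreement between the z test and the 1-df likelihood-ratio test can be verified numerically. A minimal Python sketch, using made-up loglikelihood values (not real Mplus output):

```python
from math import erfc, sqrt

# Hypothetical loglikelihoods from two runs (illustrative values only):
ll_h0 = -1250.40   # model with the slope fixed at zero
ll_h1 = -1247.15   # model with the slope free

# Likelihood-ratio chi-square with 1 df (one constrained parameter)
lr_stat = 2 * (ll_h1 - ll_h0)

# For 1 df, the chi-square survival function is erfc(sqrt(x/2)),
# which equals the two-sided p-value of z = sqrt(x):
p_lrt = erfc(sqrt(lr_stat / 2))
z = sqrt(lr_stat)
p_z = erfc(z / sqrt(2))   # = 2 * P(Z > z)
```

Because a chi-square variable with 1 df is the square of a standard normal, the square root of the LR statistic reproduces the z statistic and the two p-values match exactly.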
 Stata posted on Monday, April 30, 2012 - 10:30 am
Dr. Muthen,

I'd like to confirm with you regarding Nylund et al. (2007), which mentioned that "the commonly used log likelihood difference test cannot be used to test nested latent class models". Does it mean the loglikelihood (H0 in TECH8) value cannot be used for LCA comparison, or does it mean something else (TECH11)?


Thank you.
 Linda K. Muthen posted on Monday, April 30, 2012 - 2:00 pm
It means that to decide on the number of classes, you should not use loglikelihood difference tests. You should instead use BIC, TECH11, TECH14, etc.
 Tracy Witte posted on Thursday, October 17, 2013 - 4:57 am
I am running a 3-group LCA and have used the 3-step procedure for testing equality of means in auxiliary variables across classes. I would like to provide effect sizes for differences in means on the auxiliary variables across the classes. Would it be appropriate to convert the SE's to SD's and then calculate Cohen's d from the means and standard deviations? Or, is there some other, preferred approach for determining effect sizes?
 Bengt O. Muthen posted on Thursday, October 17, 2013 - 6:26 am
Yes, that would be appropriate.
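For readers who want the arithmetic spelled out: since the standard error of a mean is SD/sqrt(n), the class-specific SDs can be backed out from the printed SEs and the estimated class counts, and Cohen's d computed from there. A small Python sketch with hypothetical values (not from any real analysis):

```python
from math import sqrt

# Hypothetical class-specific means, standard errors, and class counts
# from a 3-step auxiliary-variable run (illustrative numbers only):
m1, se1, n1 = 3.20, 0.10, 150
m2, se2, n2 = 2.70, 0.12, 120

# Back out the standard deviations: SE = SD / sqrt(n)  =>  SD = SE * sqrt(n)
sd1 = se1 * sqrt(n1)
sd2 = se2 * sqrt(n2)

# Pooled SD and Cohen's d for the mean difference between the two classes
sd_pooled = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
d = (m1 - m2) / sd_pooled
```

Note this treats the estimated class counts as the effective sample sizes, which is an approximation when classification is imperfect.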
 Jonathan Larson posted on Wednesday, February 19, 2014 - 11:37 am
We have an LTA that displays measurement non-invariance over time. However, upon visual inspection, one of the classes appears relatively unchanged. Is there a way to compare item probabilities over time to see which ones differ and which don't?

I know there are ways to compare proportions in independent and matched samples, but this class at the two time points is neither independent nor matched, because it contains some of the same people and some different people.

Thank you!
 Bengt O. Muthen posted on Wednesday, February 19, 2014 - 11:47 am
You can test invariance one item at a time.
 Jonathan Larson posted on Wednesday, February 19, 2014 - 2:38 pm
Do you mean by using the likelihood ratio test to compare models (i.e., two times the difference in log-likelihoods)?

If so, how do you interpret the case where the two models are not different from each other? I imagine that if the constrained model fits better, the item probability does not change over time, and if the free model fits better, the item probability changes over time. However, I wouldn't know how to interpret a non-significant result.

Thank you!
 Bengt O. Muthen posted on Thursday, February 20, 2014 - 2:09 pm
Yes, an LR chi-square test.

If this test does not give significance we can't reject equal parameters, so we take them to be invariant.
 Jonathan Larson posted on Friday, February 21, 2014 - 7:29 am
Thank you!
 Wen, Fur-Hsing posted on Monday, March 24, 2014 - 7:08 pm
I am trying to run an LCA using the KNOWNCLASS option to evaluate whether profiles are similar between two groups. I have 4 classes based on the observed items. If I want to test, for specific classes (e.g., c#2 and c#4 based on the results), whether the conditional probabilities are equal between the two groups, how do I specify the syntax?
 Linda K. Muthen posted on Tuesday, March 25, 2014 - 1:33 pm
You can use MODEL CONSTRAINT to test if the thresholds or probabilities are equal across classes, for example,

MODEL:
%overall%
[u$1 u$2];
%c#1%
[u$1] (p1);
%c#2%
[u$1] (p2);

MODEL CONSTRAINT:
NEW (diff);
diff = p1 - p2;

To test the probabilities, you need to create them in MODEL CONSTRAINT using the thresholds.
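As an illustration of that last step: for a binary item, the Mplus logit threshold implies P(u = 1 | class) = 1/(1 + exp(threshold)), so in MODEL CONSTRAINT the analogous probability-difference line would be diff = 1/(1+exp(p1)) - 1/(1+exp(p2)). A Python sketch of the same transformation (the threshold values are made up):

```python
from math import exp

def item_probability(threshold: float) -> float:
    """P(u = 1 | class) implied by an Mplus logit threshold."""
    return 1.0 / (1.0 + exp(threshold))

# Hypothetical thresholds for one item in two classes (illustrative only):
tau_class1 = -1.386   # implies a probability of roughly .80
tau_class2 = 0.0      # implies a probability of exactly .50

# The quantity MODEL CONSTRAINT would test against zero:
prob_diff = item_probability(tau_class1) - item_probability(tau_class2)
```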
 Sara Anderson posted on Tuesday, May 27, 2014 - 11:27 am
I am having trouble interpreting my LCA multi-group output. In my first multigroup model (with two groups, 5 classes), I am not finding similar patterns of probabilities for the classes across groups. I had presumed that even though classes can vary across groups, the patterns for the 1st, 2nd, etc., classes should be roughly similar. Is that not the case? Are classes just put in a different order for each group?

Similarly, when I constrain thresholds to be equal within one class, I am not finding that the paths are actually the same for the same class across the two groups (although the Wald test is significant). Why would that be the case? I thought that I should see the thresholds as being identical? Is it because the groups are different sizes?
 Bengt O. Muthen posted on Tuesday, May 27, 2014 - 6:24 pm
I assume that you are using 2 latent class variables and that one of them is specified as Knownclass. If so, the unknown classes can come out in different order for the 2 known classes. But the mean/probability profiles will tell you which class should be comparable to which.

For the question in your second paragraph we would need to see your output - send that to Support with your license number - and point out what is not equal that you expected to be equal.
 Sara Anderson posted on Tuesday, May 27, 2014 - 6:49 pm
Just to clarify - with the knownclass option, the classes could show up in different order across the known groups. So, 1 1 might not be the same class as 2 1 in the output. I need to use the predicted probabilities to match them?

Also, what if I find that the overall predicted N for the classes is different with the KNOWNCLASS option from what it was in the initial model? Shouldn't they be the same, or can they shift once you disaggregate by groups?

I figured out part two. Thanks-
 Bengt O. Muthen posted on Wednesday, May 28, 2014 - 2:11 pm
Q1. Yes.

Q2. They can shift.
 Seung Bin Cho posted on Friday, September 05, 2014 - 7:53 am
Dear Dr. Muthen,

I have questions on testing measurement invariance in multiple group LCA.
I have fitted two models with binary indicators: one model without any restriction on thresholds and another model with equality constraints on all thresholds between sex.

1) Can I use difference of -2*H0 Log likelihood between the two models to test thresholds equality between sex?

2) Alternatively, I attempted to test thresholds equality between sex using model test option. I have manually checked the order of classes and selected pairs to test that I think were appropriate. Optseed was used to fix the order. Is this way correct?

3) Test results were different between the two methods above. The test was insignificant for 1) (p=.31) but highly significant for 2) (p<.001). I think method 1) is more reasonable mainly because 1. there is still arbitrariness in selecting pairs to test in method 2), 2. I'm not sure if the methods are equivalent, and 3. sample sizes were different between the sexes.

4) Is there a better way to test measurement invariance?

Thank you for your help!
 Linda K. Muthen posted on Friday, September 05, 2014 - 9:34 am
See the Version 7.1 Language Addendum on the website with the user's guide under convenience features for multiple group analysis. This shows the models to use for testing measurement invariance and convenience features that can help you do this.
 Seung Bin Cho posted on Wednesday, September 10, 2014 - 7:55 am
Thank you for your response, Dr. Muthen.

I think the relevant part of the 7.1 Language Addendum is the last part, about KNOWNCLASS and the DO option.
As far as I understand, what I did using the MODEL TEST command was equivalent to what's described in the document, and the DO option is a convenience feature that helps the user do the same thing more easily.
I'm still wondering about my first question - whether using the -2*H0 loglikelihood difference is valid - because I still think this is a better way of testing threshold invariance, for the reasons I described in my third question above.

Thank you for your help!
 Seung Bin Cho posted on Wednesday, September 10, 2014 - 8:10 am
I forgot to mention this part.

Is this page also relevant as I used MLR?
http://statmodel.com/chidiff.shtml

Thank you!
 Linda K. Muthen posted on Wednesday, September 10, 2014 - 8:46 am
See

Multiple group factor analysis: Convenience features

This has convenience features for testing measurement invariance. I think this is what you want to do.
 Seung Bin Cho posted on Wednesday, September 10, 2014 - 9:07 am
Thank you Dr. Muthen.

I found the section and still wonder if it's relevant to LCA, because it says
"It is available for CFA and ESEM models for continuous variables with the maximum likelihood and Bayes estimators..."

I'm running an LCA using binary indicators. My main question is whether it's appropriate to use the -2*H0 loglikelihood from the output to test threshold invariance. I'm using the MLR estimator. In other words, I wonder whether the difference in -2*H0 loglikelihood between nested models follows a chi-squared distribution. I assumed it does, based on my previous search on this forum.

http://www.statmodel.com/discussion/messages/13/254.html?1401311492
http://www.statmodel.com/discussion/messages/23/393.html?1261082914

I'd also appreciate it if you could recommend relevant readings.
Thank you for your help.
 Bengt O. Muthen posted on Wednesday, September 10, 2014 - 12:55 pm
Yes, the LRT is correct to use for threshold invariance. I don't know about references; there may be some in the LTA context.
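Since the MLR estimator was used here, the loglikelihood difference should be scaled by the correction factors, as described at statmodel.com/chidiff.shtml (linked earlier in the thread). A Python sketch of that calculation with hypothetical output values (not from a real analysis):

```python
# Hypothetical MLR output: nested H0 model (thresholds held equal across
# the known classes) vs. the free H1 model. Illustrative numbers only.
L0, p0, c0 = -2410.50, 20, 1.20   # loglikelihood, #parameters, scaling factor
L1, p1, c1 = -2398.30, 32, 1.15

# Scaled difference test for MLR loglikelihoods (statmodel.com/chidiff.shtml):
cd = (p0 * c0 - p1 * c1) / (p0 - p1)   # difference-test scaling correction
trd = -2 * (L0 - L1) / cd              # scaled chi-square statistic
df = p1 - p0                           # 12 constrained thresholds here

# Refer trd to a chi-square distribution with df degrees of freedom.
```

With ML (rather than MLR), the scaling factors are 1 and this reduces to the ordinary 2*(L1 - L0) difference.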