Multigroup factor mixture model
Mplus Discussion > Latent Variable Mixture Modeling >
 Mike Cheung posted on Tuesday, May 04, 2004 - 7:48 am
I am interested in conducting a multigroup factor mixture model in which "groups" are treated as the "subjects" of the factor mixture model. Is it possible to use Mplus to classify the groups into several homogeneous classes of models, say 2 or 3?
 Linda K. Muthen posted on Tuesday, May 04, 2004 - 10:34 am
It sounds like you want your unit of analysis to be group rather than individual. Is this what you mean? If not, please explain in more detail. It is possible to do multiple group factor mixture modeling using the KNOWNCLASS option, but in this case the unknown classes correspond to classes of individuals, not classes of groups.
 Mike Cheung posted on Tuesday, May 04, 2004 - 6:26 pm
Yes, the unit of analysis is group rather than individual. The KNOWNCLASS option may not be appropriate for my case because the interest is in the classes of groups, not individuals. Thank you for your suggestion.
 Linda K. Muthen posted on Wednesday, May 05, 2004 - 6:44 am
If that is the case, then create a data set where each record represents one group and do the analysis. You will get classes of groups.
 Mike Cheung posted on Wednesday, May 05, 2004 - 7:36 am
Are you suggesting using the summary statistics (e.g., the covariance matrix) as the input and conducting a factor mixture analysis? If I have 3 variables for a CFA model, the data structure would be something like:
Group_ID var11 cov21 var22 cov31 cov32 var33

Can Mplus fit models with such data structures where one row represents one group? Moreover, how can I tell the sample size per group in Mplus?

I have ordered Mplus 3. Could you point me to the relevant pages for the syntax or examples? Thanks a lot!
 bmuthen posted on Wednesday, May 05, 2004 - 11:11 am
No, I don't think you want to do mixture modeling on covariance matrix elements. Instead, you can create the group-averages of all of your variables and then do the (single-level) factor mixture modeling on those new variables, where your sample size is the number of groups. Is that of interest to you?

FYI - in twolevel analysis, Mplus is intended for mixture modeling with classes that vary across individuals, not for classes that vary only across clusters. I think some limited forms of the latter can be done by tricks, however.
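
The group-average step suggested above can be sketched in a few lines outside Mplus. This is only an illustration, assuming raw individual-level data are available; the variable names (`group`, `y1`, `y2`), the data values, and the helper `group_means` are hypothetical. The collapsed records, one per group, would then be saved and analyzed as an ordinary single-level data set whose sample size is the number of groups.

```python
from collections import defaultdict
from statistics import mean


def group_means(rows, group_key, variables):
    """Collapse individual-level rows to one record per group (group averages)."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)

    collapsed = []
    for group_id in sorted(by_group):
        members = by_group[group_id]
        # Keep the group size alongside the averages for reference.
        record = {group_key: group_id, "n": len(members)}
        for var in variables:
            record[var] = mean(m[var] for m in members)
        collapsed.append(record)
    return collapsed


# Hypothetical individual-level data: two groups, two observed variables.
rows = [
    {"group": 1, "y1": 2.0, "y2": 4.0},
    {"group": 1, "y1": 4.0, "y2": 6.0},
    {"group": 2, "y1": 1.0, "y2": 3.0},
]
group_level = group_means(rows, "group", ["y1", "y2"])
```

Each element of `group_level` is one record of the new data set; the `n` field is kept only for bookkeeping and is not needed by the mixture model itself.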
 Mike Cheung posted on Wednesday, May 05, 2004 - 6:40 pm
Indeed, I want to do mixture modeling on correlation matrices which are the only available summary statistics in my data set.

I will try to work around with Mplus. Thanks a lot for your suggestions.
 Linda K. Muthen posted on Thursday, May 06, 2004 - 10:03 am
Are you saying that you don't have raw data but only summary data to analyze?
 Mike Cheung posted on Thursday, May 06, 2004 - 8:24 pm
Yes, I only have the summary data such as the correlation matrices and sample sizes. Since they are heterogeneous, I want to see whether there are several classes of CFA models or not.
 Linda K. Muthen posted on Friday, May 07, 2004 - 6:10 am
The correlation matrices alone are not sufficient to build a mixture model. Roughly speaking, mixture models fit third- and fourth-order moments. Mplus can build mixture models only with raw data.
 Chuck Green posted on Monday, February 27, 2006 - 9:55 am
We are currently running a multigroup model (path analysis) in which we have two normally distributed predictors and a Poisson-distributed outcome. The purpose of the analysis is to evaluate: 1) whether the model is invariant across groups, and 2) if there is some invariance, whether one of the predictors functions differentially as a mediator in the two groups (as a buffer in one group and as a risk factor in the other group). If I understand correctly, since the outcome is Poisson distributed, this necessitates use of the integration algorithm, which in turn requires that the multigroup analysis be carried out as a mixture model with the KNOWNCLASS option being used to fix group membership. We are puzzled because our path coefficients change dramatically (i.e., in magnitude and direction) in each group when this analysis is performed. The change occurs in a direction opposite to theoretical prediction. Moreover, analyzing the data while ignoring the count nature of the outcome, as well as analyzing the data with a series of Poisson regressions in the Baron and Kenny approach, both yield relations in the direction predicted by theory. We are wondering if we are misinterpreting the coefficients in the multigroup/mixture model.

Our code is as follows:

CLASS = C(2);
MITERATIONS = 1000000;
STARTS = 100 5;

ETOHB_T on apcqaexp;
ETOHB_T on apcqabeh;
apcqabeh on apcqaexp;

ETOHB_T on apcqaexp;
ETOHB_T on apcqabeh;
!apcqabeh on apcqaexp;

Finally, when we output a data set from this analysis, the class defining variable from the KNOWNCLASS statement and the predicted class membership "C" differ. Should they be the same if we are fixing class membership?

Chuck Green
 Linda K. Muthen posted on Monday, February 27, 2006 - 10:31 am
I would need to see your input, data, output, saved data, and license number at to answer this.
 Jennifer Hamilton posted on Thursday, April 27, 2006 - 2:15 pm
I would like to run a multigroup growth mixture model with two classes. I would like group membership to predict the manifest variables. Furthermore, I would like the resulting estimates to be by class and not by groupxclass. Is this possible? This is the best that I could come up with so far....

title: GMM with group membership predicting scores - test

FILE = "C:\mixed.dat";

NAMES ARE v1 v2 v3 v4 class strata;
CLASSES = cg (2) c (2);
KNOWNCLASS = cg (strata = 0 strata = 1);


i s | v1@0 v2@1 v3@2 v4@3;
i WITH s@0;
i*5; s*1;

[i* s*];
!v1-v4 ON cg#1;

[i* s*];

[i* s*];

[i* s*];

OUTPUT: stand; tech4; SAMPSTAT;
 Bengt O. Muthen posted on Friday, April 28, 2006 - 11:04 am
The way you have specified the model, you are saying that the growth means vary across all 4 group and class combinations. That seems natural when you have a grouping variable. You would get the same effect if you simply used a dummy x variable to represent group and regressed the growth factors on x.

Note that you may also want c#1 on cg#1 in the overall part of the model as in UG ex 7.21. Otherwise, they are unrelated.

Another approach makes it clear which parameters vary across group only and which vary across class only. For this you would use as an example

Model cg:


Model c:


so that only [i] varies across group and only [s] varies across class.
 Jennifer Hamilton posted on Friday, April 28, 2006 - 11:31 am
Thank you so much for responding so quickly! The problem is that I do not want the resulting parameter estimates by group (i.e. pattern 11 and so on). I only want the results by class. So I would have an overall mean intercept estimate for class 1 and another one for class 2. Is there a way to do this? Thank you once again for your patience.
 Bengt O. Muthen posted on Friday, April 28, 2006 - 11:45 am
One way is to ignore group. But perhaps you want to allow for the group difference and then simply report the combined estimate, weighted across the groups? If that is what you mean, I think it can be done.
 Jennifer Hamilton posted on Friday, April 28, 2006 - 11:53 am
Yes, that is exactly what I would like to do (allow for the group difference and then simply report the combined estimates by class)! Delighted to hear that it may be possible. But how would one go about doing such a thing?
 Bengt O. Muthen posted on Friday, April 28, 2006 - 2:43 pm
One way would be to use Model Constraint. See the V4 UG. In the Model statements, you give the class-specific parameters labels and then you refer to those in Model constraint. For example,


[i] (i1);
[i] (i2);

Model constraint:

icomb = i1*n1 + i2*n2;

where in place of n1 and n2 you give the sample sizes for the two groups. The new parameter icomb will have the combined (weighted) estimate and its SE.
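
The arithmetic behind such a combined estimate can be checked by hand. The sketch below assumes the intended quantity is a sample-size-weighted average, so each weight is a group's share of the total sample, n_g / (n1 + n2), and the weights sum to one. The intercept means and group sizes are hypothetical, and the function `combined_estimate` is illustrative only. It reproduces the point estimate, not the standard error that Mplus reports for the new parameter.

```python
def combined_estimate(estimates, sample_sizes):
    """Sample-size-weighted average of group-specific estimates.

    Assumption: each weight is the group's proportion of the total sample.
    """
    total = sum(sample_sizes)
    return sum(est * n / total for est, n in zip(estimates, sample_sizes))


# Hypothetical intercept means for one latent class in the two known groups.
i1, i2 = 10.0, 14.0
# Hypothetical group sample sizes.
n1, n2 = 300, 100

icomb = combined_estimate([i1, i2], [n1, n2])
```

With these made-up numbers the weights are 0.75 and 0.25, so the larger group pulls the combined value toward its own estimate.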
 Mirjana Radovanovic posted on Friday, August 17, 2007 - 7:43 am
I will be grateful for any advice and also correction if I'm totally off in the following two questions:

1) We would like to be able to do a multigroup comparison of latent class factor analysis (LCFA) and factor mixture models (FMA). Would using the KNOWNCLASS option to define the grouping variable (countries in our case) solve this issue?

2) We have two completely different samples (gated and non-gated) with data on the same variables (diagnostic criteria). Is there a way to use LCFA or FMA modeling on both samples and compare results from the two samples (and modeling approaches)? Could the KNOWNCLASS approach (to define which sample a subject comes from) be applied in such a situation?

Thank you in advance! Mirjana
 Linda K. Muthen posted on Friday, August 17, 2007 - 8:52 am
1. With mixture modeling, the KNOWNCLASS option is used for multiple group modeling instead of the GROUPING option.

2. I don't know what the difference between gated and non-gated is. If one sample is selected on some criteria and the other is not, they should not be compared without taking this into account which is complex. See Pearson-Lawley selection bias.
 Mirjana Radovanovic posted on Friday, August 17, 2007 - 9:19 am
Thank you very much for your reply.
ad 1) We'll use KNOWNCLASS for comparisons.

ad 2) Thank you for the reference on selection bias.
I hope I will be able to explain the situation more clearly: we have two different samples about the same disorder. In one sample, every participant was asked all the questions for all criteria. In the other sample, participants were asked a few questions first and only a subgroup of them was asked the rest of the questions (a hurdle or a gate was imposed). So the data from the second sample refer to a specific subgroup of participants in the study (those who passed the gate). We would like to tease out the effect of this gate on the estimates, as compared with the situation where no gate was imposed (in the other sample, which is totally unrelated in every aspect to the gated one, except that the same instrument was used). By mimicking the gating process in the ungated sample, the change in estimates explains the situation in that particular sample. How (if at all) would it be possible to get insight into the gating effect in the other (irreversibly gated) sample? Would running an analysis on the combined data (by merging the two samples) and using a sample_id variable as known class membership help in getting the right answer?
Thank you for all suggestions! Mirjana
 Bengt O. Muthen posted on Friday, August 17, 2007 - 6:25 pm
The sample that used gating (the "irreversibly gated" sample) has observations for all subjects on the gate items, but observations on the non-gate items for only the subset of subjects who passed the gate. Those who did not pass the gate have missing data on the non-gate items. Drawing inference to the full population of subjects is a "MAR" missing data problem. Since your gate fully explains the missing data, MAR holds (see the Little & Rubin missing data book), which means that you should simply use Mplus TYPE=MISSING, where you have entered missing data symbols for the non-gate items among subjects who did not pass the gate. This means that the subjects with observations on the gate items but not the non-gate items will contribute to the estimation of the parameters for the non-gate items.

You can check this approach in the sample where a gate was not used and where you simulated a gate.

No Knownclass matter is involved here.
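
Simulating a gate in the ungated sample, as suggested above, amounts to recoding the non-gate items to a missing-data symbol for every subject who would not have passed. A minimal sketch follows. The item names, the gate rule (at least one gate item endorsed), and the missing code -999 are all hypothetical stand-ins for whatever the actual instrument and data file use.

```python
MISSING = -999  # hypothetical missing-data symbol used in the data file


def apply_gate(record, non_gate_items, passes_gate):
    """Return a copy of the record with non-gate items set to missing
    when the subject does not pass the gate."""
    gated = dict(record)
    if not passes_gate(record):
        for item in non_gate_items:
            gated[item] = MISSING
    return gated


def passes(record):
    """Hypothetical gate rule: at least one of the two gate items is endorsed."""
    return any(record[item] == 1 for item in ("g1", "g2"))


# Hypothetical ungated sample: g1, g2 are gate items; d1, d2 are non-gate items.
sample = [
    {"g1": 1, "g2": 0, "d1": 1, "d2": 0},
    {"g1": 0, "g2": 0, "d1": 1, "d2": 1},
]
gated_sample = [apply_gate(r, ["d1", "d2"], passes) for r in sample]
```

Estimates from the recoded data, analyzed with the missing-data code declared, can then be compared with estimates from the complete ungated data.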
 Mirjana Radovanovic posted on Tuesday, August 21, 2007 - 8:36 am
Thank you very much for the advice. We'll do that and see what happens.
Thank you again,
 Stephen Tueller posted on Tuesday, August 28, 2007 - 1:20 pm
For the


option for input data, can Mplus read in multiple group data? I have tried the following structure in my data file without success (mean vectors are row vectors and covariance matrices are lower right triangular with blank upper left as shown in the Mplus Users Guide):

mean vector for group 1
covariance matrix for group 1
mean vector for group 2
covariance matrix for group 2

Thank you
 Linda K. Muthen posted on Tuesday, August 28, 2007 - 1:30 pm
For multiple group analysis, you need to also specify the NOBSERVATIONS and NGROUPS options. This is described in Chapter 13 under Summary Data One Dataset in the Multiple Group discussion. If this is not the problem, please send your input, data, output, and license number to
 Anthony Mancini posted on Friday, July 18, 2008 - 1:00 pm
I have a 5-wave longitudinal dataset in which some participants are siblings. To account for the nested structure of the data, I would like to conduct a multi-level longitudinal growth mixture analysis. My outcome variables are categorical, not continuous.

The Mplus manual includes examples for continuous outcomes (10.8 - 10.10) but not for categorical.

Can multi-level GMMs be done with categorical outcomes?

Thank you in advance.
 Bengt O. Muthen posted on Friday, July 18, 2008 - 5:02 pm
 Mahima Hada posted on Saturday, July 24, 2010 - 11:22 am

I am estimating a two-level path analysis, where Level 1 is i and Level 2 is j. The Level 1 equations are:

Bias_ij = a1_0j + b1_ij * X1_ij + e_ij1
Eval_ij = a2_0j + b2_ij * X2_ij + e_ij2

and the manager-level (Level 2) equations are:

a1_0j = v100 + r0_j1
a2_0j = v200 + r0_j2

Next, I have estimated the same equations to check for latent classes at the i level. I understand how two-level latent class regression works statistically (from Muthen and Asparouhov 2009).

Is there a paper you could point me to that describes the two-level latent class model for a path analytic regression? How is the multinomial regression equation set up? And how are the errors correlated across the two sets of equations, the random effects, and the classes?

Thanks for your guidance,
 Bengt O. Muthen posted on Sunday, July 25, 2010 - 11:49 am
The only other paper of some relevance to this (although it is not about path analysis) is:

Henry, K. & Muthén, B. (2010). Multilevel latent class analysis: An application of adolescent smoking typologies with individual and contextual predictors. Structural Equation Modeling, 17, 193-215.

Regarding the multinomial regression, I don't think it is set up any differently from Muthen-Asparouhov (2009; eqn 4). As in that paper, the errors can be correlated within each of the two levels.
 Mahima Hada posted on Tuesday, July 27, 2010 - 1:27 pm
 Jan-Willem Kroon posted on Thursday, October 07, 2010 - 6:49 am
Dear Dr. Muthen,

I am trying to run a multiple group (2) growth mixture model with two known classes (gender). The purpose of the analysis is to look for different latent classes in each group. I already ran a gender-specific model, but the results differed from the multiple group analysis for both groups. I found the same number of latent classes per group, but the counts and proportions for each latent class variable and the means and variances of the intercept, slope, and quadratic slope differed significantly. I wonder what I'm doing wrong. Because I am still a rookie in this area, I could use some help here.

My input is:

Multiple groups:

classes=cg(2) c(2);
knownclass=cg(g1sex=0 g1sex=1);
starts=500 20;
lrtstarts=2 1 50 15;
i s q| lat10@0 lat11@1 lat12@2 lat13@3 lat14@4 lat15@5 lat16@6 lat17@7;

i s q on Evertoba p1ses3;
c on cg Evertoba p1ses3;

Thank you in advance, jan-Willem
 Linda K. Muthen posted on Thursday, October 07, 2010 - 10:01 am
If you don't find similar trajectory classes for each group in the separate analyses, it does not make sense to compare the groups in a known class analysis. If you want us to look at this, send the outputs for the two separate analyses and the joint analysis along with your license number to
 Jan-Willem Kroon posted on Monday, October 11, 2010 - 6:45 am
Thank you for your response. I think I've found the problem.

With kind regards,

jan-Willem Kroon
 Yunfei Wu posted on Wednesday, March 09, 2011 - 8:17 am
I am trying to test factor invariance across gender using the maximum likelihood estimator with categorical outcomes. Because I have 30 variables, I got the message "THE CHI-SQUARE TEST CANNOT BE COMPUTED BECAUSE THE FREQUENCY TABLE FOR THE LATENT CLASS INDICATOR MODEL PART IS TOO LARGE." Based on the suggestion on the website, it seems that I can still use the difference in loglikelihoods and in numbers of free parameters to get the difference test result. But the problem is that I got the same loglikelihood values from my two nested models. Is that normal? Does that mean that there is factor invariance across gender? Thanks.
 Bengt O. Muthen posted on Wednesday, March 09, 2011 - 6:06 pm
Getting exactly the same loglikelihood probably means that you have set up the models incorrectly - send relevant information to support.
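
For reference, the loglikelihood difference test mentioned in the question can be computed by hand once the two nested models are set up correctly. The sketch below assumes plain ML estimation, where twice the loglikelihood difference is referred to a chi-square distribution with degrees of freedom equal to the difference in free parameters. With the MLR estimator, the difference must first be adjusted by the scaling correction factors, which this sketch does not do. The loglikelihood values and parameter counts are hypothetical, and `chi2_sf` is a small helper written here, not a library call.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability, using the series expansion of the
    regularized lower incomplete gamma function."""
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def lr_difference_test(ll_restricted, ll_general, params_restricted, params_general):
    """Likelihood ratio test of a restricted model nested in a more general one."""
    stat = 2.0 * (ll_general - ll_restricted)
    df = params_general - params_restricted
    return stat, df, chi2_sf(stat, df)


# Hypothetical values: invariance model (restricted) vs. non-invariance model (general).
stat, df, p = lr_difference_test(-5120.4, -5112.9, 45, 51)
```

Identical loglikelihoods would give a test statistic of zero, and identical parameter counts would give zero degrees of freedom, which is why that outcome usually points to a setup problem rather than to invariance.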
 Tom Carpenter posted on Monday, March 11, 2019 - 12:45 am

I wish to (1) sort cases into latent classes and (2) use that class variable in an SEM model. I wish to define the class by three continuous indicators. I then wanted to examine how latent class interacted with other latent variables to predict outcomes. My questions are as follows:

1) How do I tell Mplus to define the latent class from specific variables? It appears that it uses all variables in "USEVARIABLES ARE"? I am used to typical factors (e.g., f1 by x1 x2 x3). How do I sort cases into latent classes on the basis of specific indicators when I am analyzing relations with a larger set of variables?

2) I would like to examine interactions with other latent variables. Can I use XWITH when one variable is a latent class? Is it possible to use latent classes in a multigroup analysis in Mplus?

Thank you
 Bengt O. Muthen posted on Tuesday, March 12, 2019 - 6:01 pm
1) See UG ex 7.19 to get ideas.

2) No, XWITH cannot be used. Yes, multigroup analysis is shown in several UG examples.