

I am interested in conducting a multigroup factor mixture model in which "groups" are treated as "subjects". Is it possible to use Mplus to classify the groups into several homogeneous classes of models, say 2 or 3?


It sounds like you want your unit of analysis to be group rather than individual. Is this what you mean? If not, please explain in more detail. It is possible to do multiple group factor mixture modeling using the KNOWNCLASS option, but in this case the unknown classes correspond to classes of individuals, not classes of groups.


Yes, the unit of analysis is group rather than individual. The KNOWNCLASS option may not be appropriate for my case because the interest is in the classes of groups, not individuals. Thank you for your suggestion.


If that is the case, then create a data set where each record represents one group and do the analysis. You will get classes of groups. 

Mike Cheung posted on Wednesday, May 05, 2004 - 7:36 am



Are you suggesting using the summary statistics (e.g., the covariance matrix) as the input and conducting factor mixture analysis? If I have 3 variables for a CFA model, the data structure would be something like: Group_ID var11 cov21 var22 cov31 cov32 var33. Can Mplus fit models with such data structures, where one row represents one group? Moreover, how can I tell Mplus the sample size per group? I have ordered Mplus 3. Could you point me to the relevant pages for the syntax or examples? Thanks a lot!

bmuthen posted on Wednesday, May 05, 2004 - 11:11 am



No, I don't think you want to do mixture modeling on covariance matrix elements. Instead, you can create the group averages of all of your variables and then do the (single-level) factor mixture modeling on those new variables, where your sample size is the number of groups. Is that of interest to you? FYI: in two-level analysis, Mplus is intended for mixture modeling with classes that vary across individuals, not for classes that vary only across clusters. I think some limited forms of the latter can be done by tricks, however.
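As a rough illustration of the group-averages step described above, the following Python sketch collapses individual-level records into one record per group. The function name `group_averages` and the variable names `group_id`, `y1`, and `y2` are hypothetical and used only for illustration; the resulting rows would then be saved as the raw data file for the single-level analysis.

```python
from collections import defaultdict
from statistics import mean


def group_averages(rows, group_key, variables):
    """Collapse individual-level rows into one row of means per group."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)
    # One output record per group, sorted by group identifier.
    return [
        {group_key: g, **{v: mean(r[v] for r in members) for v in variables}}
        for g, members in sorted(by_group.items())
    ]


# Hypothetical individual-level data: two members in group 1, one in group 2.
rows = [
    {"group_id": 1, "y1": 2.0, "y2": 4.0},
    {"group_id": 1, "y1": 4.0, "y2": 6.0},
    {"group_id": 2, "y1": 1.0, "y2": 3.0},
]
averaged = group_averages(rows, "group_id", ["y1", "y2"])
```

The number of records in `averaged` equals the number of groups, which is the sample size for the mixture analysis.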

Mike Cheung posted on Wednesday, May 05, 2004 - 6:40 pm



Indeed, I want to do mixture modeling on correlation matrices which are the only available summary statistics in my data set. I will try to work around with Mplus. Thanks a lot for your suggestions. 


Are you saying that you don't have raw data but only summary data to analyze? 


Yes, I only have the summary data such as the correlation matrices and sample sizes. Since they are heterogeneous, I want to see whether there are several classes of CFA models or not. Thanks! 


The correlation matrices alone are not sufficient to build a mixture model. Roughly speaking, mixture models fit third- and fourth-order moments. Mplus can build mixture models only with raw data.

Chuck Green posted on Monday, February 27, 2006 - 9:55 am



We are currently running a multigroup model (path analysis) in which we have two normally distributed predictors and a Poisson distributed outcome. The purpose of the analysis is to evaluate: 1) whether the model is invariant across groups, and 2) if there is some invariance, whether one of the predictors functions differentially as a mediator in the two groups (as a buffer in one group and as a risk factor in the other group). If I understand correctly, since the outcome is Poisson distributed, this necessitates use of the integration algorithm, which in turn requires that the multigroup analysis be carried out as a mixture model with the "KNOWNCLASS" option being used to fix group membership. We are puzzled because our path coefficients change dramatically (i.e., in magnitude and direction) in each group when this analysis is performed. The change occurs in a direction opposite to theoretical prediction. Moreover, analyzing the data while ignoring its count nature, as well as analyzing the data using a series of Poisson regressions in the Baron and Kenny approach, both yield relations in the direction predicted by theory. We are wondering if we are misinterpreting the coefficients in the multigroup/mixture model. Our code is as follows:

    USEVARIABLES ARE APCQAEXP APCQABEH ETOHB_T;
    COUNT ARE ETOHB_T;
    CLASS = C(2);
    KNOWNCLASS = C(MOODGRPI = 0 MOODGRPI = 1);
    AUXILIARY ARE CID MOODGRPI;
    ANALYSIS:
    TYPE = MIXTURE;
    ESTIMATOR IS MLF;
    ALGORITHM = INTEGRATION;
    INTEGRATION = STANDARD (15);
    MITERATIONS = 1000000;
    STARTS = 100 5;
    STITERATIONS = 20;
    MODEL:
    %OVERALL%
    ETOHB_T on apcqaexp;
    ETOHB_T on apcqabeh;
    apcqabeh on apcqaexp;
    %C#1%
    ETOHB_T on apcqaexp;
    ETOHB_T on apcqabeh;
    !apcqabeh on apcqaexp;

Finally, when we output a data set from this analysis, the class-defining variable from the KNOWNCLASS statement and the predicted class membership "C" differ. Should they be the same if we are fixing class membership?

Thanks, Chuck Green


I would need to see your input, data, output, saved data, and license number at support@statmodel.com to answer this. 


I would like to run a multigroup growth mixture model with two classes. I would like group membership to predict the manifest variables. Furthermore, I would like the resulting estimates to be by class and not by group x class. Is this possible? This is the best that I could come up with so far:

    title: GMM with group membership predicting scores - test
    DATA: FILE = "C:\mixed.dat";
    VARIABLE:
    NAMES ARE v1 v2 v3 v4 class strata;
    USEVARIABLES ARE v1 v2 v3 v4;
    CLASSES = cg (2) c (2);
    KNOWNCLASS = cg (strata = 0 strata = 1);
    ANALYSIS:
    TYPE = MIXTURE;
    ESTIMATOR = MLR;
    MODEL:
    %OVERALL%
    i s | v1@0 v2@1 v3@2 v4@3;
    [v1-v4@0];
    i WITH s@0;
    v1-v4@1;
    i*5; s*1;
    %cg#1.c#1%
    [i* s*];
    !v1-v4 ON cg#1;
    %cg#1.c#2%
    [i* s*];
    %cg#2.c#1%
    [i* s*];
    %cg#2.c#2%
    [i* s*];
    OUTPUT: stand; tech4; SAMPSTAT;


The way you have specified the model, you are saying that the growth means vary across all 4 group and class combinations. That seems natural when you have a grouping variable. You would get the same effect if you simply used a dummy x variable to represent group and regressed the growth factors on x. Note that you may also want c#1 on cg#1 in the overall part of the model, as in UG ex 7.21. Otherwise, they are unrelated. Another approach makes it clear which parameters vary across group only and which vary across class only. For this you would use, as an example:

    Model cg:
    %cg#1%
    [i];
    %cg#2%
    [i];
    Model c:
    %c#1%
    [s];
    %c#2%
    [s];

so that only [i] varies across group and only [s] varies across class.


Thank you so much for responding so quickly! The problem is that I do not want the resulting parameter estimates by group (i.e. pattern 11 and so on). I only want the results by class. So I would have an overall mean intercept estimate for class 1 and another one for class 2. Is there a way to do this? Thank you once again for your patience. 


One way is to ignore group. But perhaps you want to allow for the group difference and then simply report the combined estimate, weighted across the groups? If that is what you mean, I think it can be done. 


Yes, that is exactly what I would like to do (allow for the group difference and then simply report the combined estimates by class)! Delighted to hear that it may be possible. But how would one go about doing such a thing? 


One way would be to use Model Constraint. See the V4 UG. In the Model statements, you give the class-specific parameters labels and then you refer to those in Model constraint. For example:

    Model:
    %c#1%
    [i] (i1);
    %c#2%
    [i] (i2);
    Model constraint:
    new(icomb);
    icomb = i1*n1 + i2*n2;

where in place of n1 and n2 you give the sample sizes for the two groups. The new parameter icomb will have the combined (weighted) estimate and its SE.
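The arithmetic behind such a combined estimate can be sketched in Python. This is only an illustration under one stated assumption: the sketch normalizes the sample sizes into proportions so that the result is a weighted mean, which is a guess at the intent of the weights n1 and n2 and not something the post spells out. The function name `combined_estimate` and the numbers are hypothetical, and unlike Model Constraint the sketch produces no standard error.

```python
def combined_estimate(estimates, sample_sizes):
    """Weighted mean of estimates, with weights proportional to sample size."""
    total = sum(sample_sizes)
    # Assumption: weights are sample-size proportions that sum to 1.
    weights = [n / total for n in sample_sizes]
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical values: estimates 10.0 and 20.0 with sample sizes 30 and 10.
icomb = combined_estimate([10.0, 20.0], [30, 10])
```

With these made-up inputs the weights are 0.75 and 0.25, so the combined value sits closer to the estimate from the larger sample.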


I will be grateful for any advice, and also correction if I'm totally off, on the following two questions: 1) We would like to be able to do multigroup comparison of latent class factor analysis (LCFA) and factor mixture model (FMA). Would the use of the KNOWNCLASS subcommand to define the grouping variable (countries in our case) solve this issue? 2) We have two completely different samples (gated and non-gated) with data on the same variables (diagnostic criteria). Is there a way to use LCFA or FMA modeling on both samples and compare results from the two samples (and modeling approaches)? Could the approach via KNOWNCLASS (to define which sample a subject is from) be applied in such a situation? Thank you in advance! Mirjana


1. With mixture modeling, the KNOWNCLASS option is used for multiple group modeling instead of the GROUPING option. 2. I don't know what the difference between gated and non-gated is. If one sample is selected on some criteria and the other is not, they should not be compared without taking this into account, which is complex. See Pearson-Lawley selection bias.


Thank you very much for your reply. ad 1) We'll use KNOWNCLASS for comparisons. ad 2) Thank you for the reference on selection bias. I hope I will be able to explain the situation more clearly: we have two different samples about the same disorder; in one sample every participant was asked all the questions for all criteria; in the other sample participants were asked a few questions first and only a subgroup of these was asked the rest of the questions (a hurdle or a gate was imposed). So the data from the second sample refer to a specific subgroup of participants in the study (those who passed the gate). We would like to tease out the effect of this gate on the estimates as compared with the situation where no gate was imposed (in the other sample, totally unrelated in every aspect to the gated one, but the same instrument was used). By mimicking the gating process in the ungated sample, the change in estimates explains the situation in that particular sample. How (if at all) would it be possible to get insight into the gating effect in the other (irreversibly gated) sample? Would running an analysis on the combined data (by merging the two samples) and using a sample_id variable as known class membership help in getting the right answer? Thank you for all suggestions! Mirjana


The sample that used gating (the "irreversibly gated" sample) has observations for all subjects on the gate items, but observations on the non-gate items for only the subset of subjects who passed the gate; those who did not pass the gate have missing data on the non-gate items. Drawing inference to the full population of subjects is a "MAR" missing data problem. Since your gate fully explains the missing data, MAR holds (see the Little & Rubin missing data book), which means that you should simply use Mplus TYPE=MISSING, where you have entered missing data symbols for the non-gate items among subjects who did not pass the gate. This means that the subjects with observations on the gate items but not the non-gate items will contribute to the estimation of the parameters for the non-gate items. You can check this approach in the sample where a gate was not used and where you simulated a gate. No KNOWNCLASS matter is involved here.
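A minimal Python sketch of the data-preparation step described above, useful for simulating a gate in the ungated sample: non-gate items are replaced by a missing-data code for every subject who does not pass the gate. The missing code `-999`, the item names, and the gate rule (at least one endorsed gate item) are all hypothetical choices made for illustration and are not taken from the thread.

```python
MISSING = -999  # hypothetical missing-data code, to be declared as missing in the analysis


def apply_gate(record, nongate_items, passes_gate):
    """Return a copy of the record with non-gate items set to MISSING if the gate is failed."""
    out = dict(record)
    if not passes_gate(record):
        for item in nongate_items:
            out[item] = MISSING
    return out


def endorsed_any_gate_item(record):
    """Hypothetical gate rule: the subject passes if any gate item is endorsed."""
    return record["g1"] + record["g2"] >= 1


subjects = [
    {"g1": 1, "g2": 0, "q1": 1, "q2": 0},  # passes the gate, keeps all answers
    {"g1": 0, "g2": 0, "q1": 1, "q2": 1},  # fails the gate, non-gate items masked
]
gated = [apply_gate(s, ["q1", "q2"], endorsed_any_gate_item) for s in subjects]
```

Comparing estimates from the full ungated data with estimates from the masked copy is one way to carry out the check suggested in the post.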


Thank you very much for the advice. We'll do that and see what happens. Thank you again, Mirjana 


For the TYPE IS MEANS COVARIANCE; option for input data, can Mplus read in multiple group data? I have tried the following structure in my data file without success (mean vectors are row vectors and covariance matrices are lower right triangular with blank upper left as shown in the Mplus User's Guide):

    mean vector for group 1
    covariance matrix for group 1
    mean vector for group 2
    covariance matrix for group 2

Thank you


For multiple group analysis, you need to also specify the NOBSERVATIONS and NGROUPS options. This is described in Chapter 13 under Summary Data One Dataset in the Multiple Group discussion. If this is not the problem, please send your input, data, output, and license number to support@statmodel.com. 


I have a 5-wave longitudinal dataset in which some participants are siblings. To account for the nested structure of the data, I would like to conduct a multilevel longitudinal growth mixture analysis. My outcome variables are categorical, not continuous. The Mplus manual includes examples for continuous outcomes (10.8 - 10.10) but not for categorical. Can multilevel GMMs be done with categorical outcomes? Thank you in advance.


Yes. 

Mahima Hada posted on Saturday, July 24, 2010 - 11:22 am



Hi, I am estimating a two-level path analysis. Level 1 is i and Level 2 is j. Equations:

    Bias_ij = a1_0j + b1_ij*X1_ij + e_ij1
    Eval_ij = a2_0j + b2_ij*X2_ij + e_ij2

and the manager-level (Level 2) equations are:

    a1_0j = v1_00 + r0_j1
    a2_0j = v2_00 + r0_j2

Next, I have estimated the same equations to check for latent classes at the i level. I understand how two-level latent class regression works statistically (from Muthen and Asparouhov 2009). Is there a paper you could guide me to that describes the two-level latent class model for a path analytic regression? How is the multinomial regression equation set up? And how are the errors correlated across the two sets of equations, random effects, and the classes? Thanks for your guidance, Mahima


The only other paper of some relevance to this (although it is not about path analysis) is: Henry, K. & Muthén, B. (2010). Multilevel latent class analysis: An application of adolescent smoking typologies with individual and contextual predictors. Structural Equation Modeling, 17, 193-215. Regarding the multinomial regression, I don't think it is set up any differently from Muthén & Asparouhov (2009; eqn 4). As in that paper, the errors can be correlated within each of the two levels.


Thanks! 


Dear Dr. Muthen, I am trying to run a multiple group (2) growth mixture model with two known classes (gender). The purpose of the analysis is to look for different latent classes in each group. I already ran a gender-specific model, but the results differed from the multiple group analysis, for both groups. I found the same number of latent classes per group, but the counts and proportions for each latent class variable and the means and variances of the intercept, slope, and quadratic slope differed significantly. I wonder what I'm doing wrong. Because I am still a rookie in this area, I could use some help here. My input is:

    Multiple groups:
    idvariable=idno;
    classes=cg(2) c(2);
    knownclass=cg(g1sex=0 g1sex=1);
    missing=all(1.00);
    analysis:
    type=mixture;
    starts=500 20;
    stiterations=20;
    lrtstarts=2 1 50 15;
    coverage=0;
    model:
    %overall%
    i s q | lat10@0 lat11@1 lat12@2 lat13@3 lat14@4 lat15@5 lat16@6 lat17@7;
    q@0;
    i s q on Evertoba p1ses3;
    c on cg Evertoba p1ses3;

Thank you in advance, janWillem


If you don't find similar trajectory classes for each group in the separate analyses, it does not make sense to compare the groups in a known class analysis. If you want us to look at this, send the outputs for the two separate analyses and the joint analysis along with your license number to support@statmodel.com.


Thank you for your response. I think I've found the problem. With kind regards, janWillem Kroon 

Yunfei Wu posted on Wednesday, March 09, 2011 - 8:17 am



I am trying to test factor invariance across gender using the maximum likelihood estimator with categorical outcomes. Because I have 30 variables, I got the message "THE CHI-SQUARE TEST CANNOT BE COMPUTED BECAUSE THE FREQUENCY TABLE FOR THE LATENT CLASS INDICATOR MODEL PART IS TOO LARGE." Based on the suggestion on the website, it seems that I can still use the difference in loglikelihoods and in the number of free parameters to get the difference test result. But the problem is that I got the same loglikelihood values from my two nested models. Is that normal? Does that mean that there is factor invariance across gender? Thanks.


Getting exactly the same loglikelihood probably means that you have set up the models incorrectly. Send the relevant information to support.
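For reference, the loglikelihood difference test mentioned in the question can be sketched in Python as follows. This assumes plain maximum likelihood, where the statistic is twice the loglikelihood difference; robust estimators such as MLR need an additional scaling correction that is not shown. The function name `lr_difference` and the numbers are hypothetical.

```python
def lr_difference(loglik_restricted, n_params_restricted, loglik_free, n_params_free):
    """Return the likelihood ratio statistic and its degrees of freedom for two nested models."""
    statistic = 2.0 * (loglik_free - loglik_restricted)
    df = n_params_free - n_params_restricted
    return statistic, df


# Hypothetical nested models: the restricted model has 5 fewer free parameters.
stat, df = lr_difference(-1010.5, 10, -1000.0, 15)

# Identical loglikelihoods give a statistic of exactly zero despite nonzero df.
stat_same, df_same = lr_difference(-1000.0, 10, -1000.0, 15)
```

The statistic is referred to a chi-square distribution with `df` degrees of freedom. A statistic of exactly zero with a nonzero `df`, as in the second call, is the pattern described in the question.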
