You would specify c ON x in the class-specific parts of the MODEL command for the KNOWNCLASS variable.
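Here is a minimal sketch of that suggestion. It assumes a 2-group known-class variable cg built from a hypothetical variable g, 3 latent classes, a hypothetical covariate x, and hypothetical binary indicators u1-u4. There are two categorical latent variables, so the cg-specific statements go under a labeled MODEL cg: section:

```mplus
VARIABLE:
  NAMES = g x u1-u4;              ! hypothetical variable names
  CATEGORICAL = u1-u4;
  CLASSES = cg (2) c (3);
  KNOWNCLASS = cg (g = 0 g = 1);
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  c ON cg;                        ! class proportions may differ by known group
  c ON x;                         ! overall regression of c on x
  MODEL cg:
  %cg#1%
  c ON x;                         ! c ON x estimated separately in group 1
  %cg#2%
  c ON x;                         ! c ON x estimated separately in group 2
```

Each %cg#k% section should then give its own c#1 ON x and c#2 ON x coefficients.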
Andy Ross posted on Tuesday, October 23, 2007 - 10:06 am
Many thanks Linda
I have another query if I may...
I ran an analysis using the knownclass function holding the conditional probabilities equal across two samples.
In the Mplus output I am informed that I have achieved this aim. However, I saved the cprobs and used them to create a weight variable so that I could recreate the solution in SPSS, and the solutions for the two samples are only fairly equivalent. Would you know why this is? I saved the cprobs to 16 decimal places, so I expected the SPSS solution to be a highly accurate representation of the Mplus one (it has been before).
I'm not sure what you mean. What solution are you trying to reproduce in SPSS and how are you using the posterior probabilities to do this? If you cannot describe this briefly, please send the relevant information and your license number to firstname.lastname@example.org.
I am also working on a multigroup mixture model with known classes. My question is about the decision regarding the best model. If I run two models that are nested, how can I decide which model is the best? I would say that it is not enough to look at the BIC only.
Hello, I have another question about my model. Again I am doing a multiple group analysis using mixture modelling with known classes (i.e., sex), since my dependent variable is Poisson distributed. When I tried to analyze a model with latent variables, the computation could not be completed (even when the means/regression coefficients were not allowed to differ across groups). The program advises giving starting values (see below). Can you tell me the best way to choose starting values for this model?
Thank you in advance!
Unperturbed starting value run did not converge.
1 perturbed starting value run(s) did not converge.
THE ESTIMATED COVARIANCE MATRIX IN CLASS 1 COULD NOT BE INVERTED. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 389. CHANGE YOUR MODEL AND/OR STARTING VALUES.
THE ESTIMATED COVARIANCE MATRIX IN CLASS 1 COULD NOT BE INVERTED. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 389. CHANGE YOUR MODEL AND/OR STARTING VALUES.
WARNING: WHEN ESTIMATING A MODEL WITH MORE THAN TWO CLASSES, IT MAY BE NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION TO AVOID LOCAL MAXIMA.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.
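The output suggests two remedies: more random starts and user-supplied starting values. The sketch below shows both. It assumes a hypothetical count outcome y, a hypothetical factor f measured by y1-y3, and a sex variable coded 0/1. The numbers are placeholders to adapt, not recommended values:

```mplus
VARIABLE:
  NAMES = sex y y1-y3;            ! hypothetical variable names
  COUNT = y;                      ! Poisson-distributed outcome
  CLASSES = cg (2);
  KNOWNCLASS = cg (sex = 0 sex = 1);
ANALYSIS:
  TYPE = MIXTURE;
  ALGORITHM = INTEGRATION;        ! count outcome regressed on a continuous latent variable
  STARTS = 200 50;                ! 200 initial-stage and 50 final-stage random starts
MODEL:
  %OVERALL%
  f BY y1-y3;
  f*1;                            ! starting value for the factor variance
  y ON f*0.5;                     ! starting value for the regression slope
OUTPUT:
  SVALUES;                        ! prints the estimates as input statements with starting values
```

A common way to get sensible values is to first fit a simpler model that does converge, such as a single-group version. You can then paste the statements printed by SVALUES into the new input as starting values.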
I have a question about using the KNOWNCLASS option to do a multiple group analysis in a latent class model. I have covariates predicting my latent class variable and I want the relationship between the covariates and the latent classes to vary between my KNOWNCLASS groups. How do I incorporate that into the syntax?
C is my latent class variable with 3 latent classes. G is my knownclass variable, which references 2 gender groups (0=male and 1=female).
If I do it the following way, I get one set of coefficients for the regression of C on the covariates that does not vary by G.
MODEL: %Overall% C on G; C#1 on covariates; C#2 on covariates;
And the following doesn't seem to give me the full set of regression coefficients for C on covariates.
MODEL: %Overall% C on G; C#1 on covariates; C#2 on covariates;
MODEL G: %G#1% C#1 on covariates; C#2 on covariates; %G#2% C#1 on covariates; C#2 on covariates;
Do I need the MODEL G: command or do I list the %G#1% and %G#2% under the first MODEL: section? I am confused about that.
Hi, is it possible to run a model like Example 7.21 in the Mplus manual but with categorical indicators? If so, what would the input file look like? When I try to run such a model I get the following for each of my indicators:
ERROR in MODEL command Variances for categorical outcomes can only be specified using PARAMETERIZATION=THETA with estimators WLS, WLSM, or WLSMV.
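The sketch below shows what a categorical version of Example 7.21 might look like. It assumes hypothetical binary indicators u1-u4 and a 0/1 grouping variable g. With categorical outcomes and maximum likelihood, the class-specific parameters are thresholds rather than means and variances. The cg-specific variance statements from the continuous example are therefore dropped, because mentioning variances of categorical outcomes is what triggers the error above:

```mplus
VARIABLE:
  NAMES = g u1-u4;                ! hypothetical variable names
  CATEGORICAL = u1-u4;
  CLASSES = cg (2) c (2);
  KNOWNCLASS = cg (g = 0 g = 1);
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  c ON cg;                        ! class probabilities differ by known group
  MODEL c:
  %c#1%
  [u1$1-u4$1];                    ! thresholds vary across the latent classes
  %c#2%
  [u1$1-u4$1];
  ! no "u1-u4;" statements: variances of categorical outcomes are not
  ! free parameters under ML, so they cannot be mentioned here
```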
I specified TYPE=MIXTURE and used the KNOWNCLASS feature to run a multigroup model that should have been run using Bayesian estimation due to a small n. I learned that Bayesian estimation is not possible for multiple groups models, but was able to run the model this way.
Can you tell me how parameters are estimated in LCA/mixture models (e.g., using KNOWNCLASS), and how this is different from default settings for multiple groups? Is parameter estimation more robust?
Thank you for your response. The same model using the GROUPING option gives an error message that reads, "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 35."
When I use the KNOWNCLASS option with LCA, some parameters are fixed automatically "TO AVOID SINGULARITY OF THE INFORMATION MATRIX." Can you see any problem with this? Are the results the same as they would be if I constrained the parameter myself?
Please send your output and license number to email@example.com. It may be that you are mentioning the first factor indicator in the groups and classes. If you do that, you relax the fact that it is fixed at one.
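Here is a sketch of the point about the first factor indicator. It assumes a hypothetical factor f with indicators y1-y4 and two categorical latent variables: the known-class variable cg and a latent c. Listing y2-y4 rather than y1-y4 in the class-specific part frees those loadings. The loading of y1 stays fixed at one to set the metric:

```mplus
MODEL:
  %OVERALL%
  f BY y1-y4;        ! loading of y1 fixed at 1 by default
  MODEL cg:
  %cg#1%
  f BY y2-y4;        ! y1 not mentioned, so its loading stays fixed at 1
  %cg#2%
  f BY y2-y4;
  ! writing "f BY y1-y4;" here would free the y1 loading as well,
  ! which can leave the factor metric unidentified
```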
Laura Baams posted on Saturday, November 26, 2011 - 12:13 pm
I have a question about a parallel process model with Knownclass and Bayes.
I would like to have the regressions s1 ON i2 and s2 ON i1 estimated freely for both classes. The syntax below enables me to do that. However, I would also like to obtain the correlations s1 WITH s2 and i1 WITH i2 estimated freely for both classes.
In the multigroup MLR option I was able to do this. However with Bayes I get the following error message:
*** FATAL ERROR VARIANCE COVARIANCE MATRIX IS NOT SUPPORTED WITH ESTIMATOR=BAYES. PARTIAL EQUALITY BETWEEN TWO VARIANCE COVARIANCE BLOCKS. IF TWO PARAMETERS FROM TWO DIFFERENT VARIANCE COVARIANCE BLOCKS ARE HELD EQUAL THEN ALL THE PARAMETERS HAVE TO BE EQUAL IN THE TWO BLOCKS.
Any help would be great! Thanks so much!
This is the Analysis and Model part of the syntax:
Hi, I would like to do a simple latent class analysis (a measurement model, not a structural model) in three cultures. I failed to find anything on the website or in the book concerning multiple-group analysis and measurement invariance testing. Can you please let me know if I can define groups (countries) for LCA? Can you also give me some information about equality constraints in multiple-group LCA? This is an example of what I (and many other psychologists) need to do with LCA:
For LCA, because the only parameters are thresholds, you can simply regress the latent class indicators on the dummy variables representing the groups. It is not necessary to do multiple group analysis. If the group dummy variables influence the latent class variable but not the latent class indicators directly, you have measurement invariance. If there are some direct effects, you have measurement non-invariance.
See Chapter 14 of the user's guide where there is a section on multiple group analysis. Everything in this section also applies to known classes. See the Topic 5 course handout on the website where multiple group analysis with mixture models is discussed.
Thank you very much indeed for your reply. The method you are suggesting reminds me of a MIMIC model, and I am comfortable with it. I just made a hypothetical syntax (for three countries, 8 dichotomous items) based on your suggestion using Topic 5 course handout. I made dummies for two countries and not three. This is totally hypothetical and I have not run it yet. Just want to double check with you if my syntax is going to work well for my purpose. Do you find any problem in this syntax? Or do I need to add any other thing for this multi-group analysis?
TITLE: LCA for three countries
DATA: FILE IS asb.dat;
VARIABLE: NAMES ARE B1-B8 US Thai;
USEVARIABLES ARE B1-B8 Us Thai;
CLASSES = c(3);
CATEGORICAL ARE B1-B8 Us Thai;
ANALYSIS: TYPE = MIXTURE;
MODEL: %OVERALL%
c#1-c#3 ON US Thai;
OUTPUT: TECH1 TECH8;
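Below is one possible revision of this syntax, offered as an unchecked sketch. Two details look worth changing. First, the CATEGORICAL option is meant for dependent variables, so the country dummies (covariates) are left out of it. Second, in a multinomial regression of c the last class is the reference category, so only c#1 and c#2 receive ON coefficients, which is what "c ON" gives. The third country is the reference group for the two dummies:

```mplus
TITLE:    LCA for three countries
DATA:     FILE IS asb.dat;
VARIABLE: NAMES ARE B1-B8 US Thai;
          USEVARIABLES ARE B1-B8 US Thai;
          CLASSES = c (3);
          CATEGORICAL ARE B1-B8;  ! indicators only; US and Thai are covariates
ANALYSIS: TYPE = MIXTURE;
MODEL:    %OVERALL%
          c ON US Thai;           ! country differences in class membership
          ! To probe measurement non-invariance, add direct effects, e.g.:
          ! B1 ON US Thai;        ! a direct effect signals non-invariance for B1
OUTPUT:   TECH1 TECH8;
```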
I am using the KNOWNCLASS to conduct a multiple-group analysis because my dependent variable is nominal. This multinomial DV is regressed on an exogenous latent factor (CFA) composed of a mix of categorical and continuous items, some of which I know are not invariant across the known classes. When I specify my baseline model with configural invariance, I see that the residual variance for the continuous indicator is fixed across known classes. This seems to be different from the normal defaults. Is the behavior of the KNOWNCLASS statement documented somewhere, regarding how to use it to evaluate measurement invariance and adjust for non-invariance? Thank you.
The defaults differ in different tracks of the program. You can relax the equality by mentioning the parameter in the class-specific parts of the MODEL command. The steps to test for measurement invariance do not differ; only the defaults do.
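Here is a minimal sketch of relaxing that equality. It assumes a single known-class variable cg, so the class-specific parts follow %OVERALL% directly with no MODEL cg: label. It also assumes a hypothetical factor f with categorical indicators u1-u3 and a continuous indicator y4, and a hypothetical grouping variable group coded 1/2:

```mplus
VARIABLE:
  CATEGORICAL = u1-u3;
  CLASSES = cg (2);
  KNOWNCLASS = cg (group = 1 group = 2);
ANALYSIS:
  TYPE = MIXTURE;
  ALGORITHM = INTEGRATION;   ! categorical factor indicators under ML
MODEL:
  %OVERALL%
  f BY u1-u3 y4;
  %cg#1%
  y4;                        ! mentioning y4 frees its residual variance in group 1
  %cg#2%
  y4;                        ! and estimates a separate one in group 2
```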
I'm trying to conduct a MG LCA with 10 binary indicators using the knownclass command (for gender) but 2 of the indicators are answered by men only. This means that the data for women can only be missing on these variables.
My question is: can this MG analysis be conducted (perhaps using constraints?) if some of the variables are not common across men and women?
1. Can you explain (briefly) the purpose of the KNOWNCLASS option in mixture modeling? If classes are known, and mixture modeling accounts for group membership that is only probabilistic, what is the value of the KNOWNCLASS option? Why does Mplus software recommend the KNOWNCLASS option in certain scenarios over a multiple-group approach?
2. Can results of a mixture model estimated with the KNOWNCLASS option be compared with results from a model estimated outside of the mixture modeling framework? Or is estimation inherently different?
3. Is it possible to estimate a KNOWNCLASS model where there really is only one group?
1. Sometimes in mixture modeling, people want to compare groups like males and females. The KNOWNCLASS option is a way to do this. Sometimes multiple group analysis is not available using the GROUPING option and must be done using the KNOWNCLASS option. This has no statistical implications.
2. KNOWNCLASS and GROUPING do the same thing and if you do the same analysis in both ways you will obtain the same results.
3. Yes but the results would be the same as not using the KNOWNCLASS option. I'm not sure if we give a message in this case.
Hi Linda, can you explain the difference between the "c" and "cg" factors in Example 7.21 of Version 7 Users Guide?
It seems that both "c" and "cg" represent class membership. I'm confused about that and the implications for drawing up my general and class-specific models, where I will ultimately want to test some different patterns of beta weight (regression) parameters across the classes.
Is it that cg will reflect class differences for variances/covariances/beta weights, and c will reflect class differences for means/intercepts?
cg is based on an observed variable. The classes are therefore not estimated but are known. It is identical to a grouping variable. c is a categorical latent variable for which class membership is estimated. In both cases, parameters can vary across the classes.
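The sketch below makes the distinction concrete using the general shape of Example 7.21, as described later in this thread. It is a reconstruction, so check it against the User's Guide. cg comes from the observed variable g, c is estimated, and each gets its own labeled class-specific section:

```mplus
VARIABLE:
  NAMES = g y1-y4;
  CLASSES = cg (2) c (2);
  KNOWNCLASS = cg (g = 0 g = 1);  ! cg: classes known from the observed variable g
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  c ON cg;                        ! latent class probabilities differ by known group
  MODEL c:
  %c#1%
  [y1-y4];                        ! means vary across the latent classes of c
  %c#2%
  [y1-y4];
  MODEL cg:
  %cg#1%
  y1-y4;                          ! variances vary across the known groups of cg
  %cg#2%
  y1-y4;
```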
OK, so if the model I am working with is not a measurement model but just a structural path model, no "c" would be needed--only the "cg" factor, which will allow the model as a whole to differ in certain ways across the classes?
Basically, Linda, this is my model (below). Since there are no latent variables in this model, can I just use "cg" but not "c"?
I have 5 classes. I am running this as a KNOWNCLASS model because aud_grp (the dependent variable) is a count variable, which works outside of the mixture modeling framework with 1 group, but not with 5 groups. Thank you.
Why is it that in some class models, one must state the MODEL command a second time (after it is stated for the overall model), and in other class models, you do not need a second model command?
I have a single class variable in my model (which I called "cg"). I did not re-state the MODEL command when I started writing code for class-specific parameters. The model ran and the results are great.
But then I noticed that I had not written the MODEL statement again for the class-specific estimates, like this:
ANALYSIS: TYPE = MIXTURE; ALGORITHM=INTEGRATION; INTEGRATION = MONTECARLO;
MODEL: %OVERALL% (overall parameters here)
MODEL cg: %cg#1% (parameters that differ in class 1)
The model runs great when I DO NOT include the "MODEL cg" line, but when I do include it, I get the message: *** ERROR in MODEL command Unknown class model name CG specified in C-specific MODEL command.
Why is that? I am surprised that my initial run went smoothly without the second statement of the MODEL command.
I received reviews of my paper, which applies multigroup latent class analysis. One of the points the reviewer makes is:
Please provide a brief discussion on estimation when there is a different number of observations in each group.
As my groups are time points and the number of responses at different time points ranges from 3000 to 10000 I would like to ask you: Does the different size of groups affect the results in Mplus? Could you help me with some references to deal with the issue raised by the reviewer?
I have already tried hard on my own to find the answer but was completely unsuccessful. Thanks in advance, Piotr
I have a panel, but I don't use this information. I just treat each period separately as a different group. It will be the next step of my research to use latent transition. I think that the reviewer is just interested in technical issues of estimating multigroup LCA with different number of respondents in each group.
Multiple group analysis requires the groups to consist of different people. You should be comparing across time in a single group analysis.
Mike Todd posted on Wednesday, August 07, 2013 - 11:41 am
We have data from 2200 individuals sampled from 2 different cities. Our goal is to use 7 individual-level indicators to obtain meaningful latent profile solutions.
In exploring the possibility that the profile solutions differ between cities via the KNOWNCLASS command, we have obtained somewhat confusing results.
Allowing only the estimated item means to vary across cities (KNOWNCLASS categories) results in a large increase in the number of parameters (30 vs. 52) but *worse* fit as judged by absolute differences in -2LL, BIC, and AIC.
I estimated a series of 4 nested(?) models each with 3 derived/estimated classes and 2 observed/known classes. Model 1 ignores city altogether (no KNOWNCLASS command); Model 2 allows item means to vary across cities, Model 3 allows item means and class probabilities to vary across cities; and Model 4 allows item means, item variances, and class probabilities to vary across cities.
Model 4 fit better than Model 3, which fit better than Model 2, which makes sense to me. But only Model 4 fit better than Model 1, which confuses me.
I feel like I must be missing something fundamental about the nestedness (or non-nestedness) of my models. The results suggest that Models 2 and 3 are not actually less constrained versions of Model 1. Is this true?
Model 1 is not on a loglikelihood metric comparable to the other models, which also means that BIC and AIC are not on a comparable metric. The reason is that Knownclass contributes to the likelihood (imagine an observed indicator, the probability of which is estimated).
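Here is one way to write out this point, as a sketch of one reading of "an observed indicator, the probability of which is estimated." With KNOWNCLASS, the likelihood includes the probability of each person's observed group $g_i$. The model without KNOWNCLASS (or with gender as a covariate of c) does not include it:

$$
\log L_{\mathrm{KC}} = \sum_{i=1}^{n} \log P(g_i) + \sum_{i=1}^{n} \log \sum_{k=1}^{K} P(c_i = k \mid g_i)\, f(\mathbf{y}_i \mid c_i = k, g_i)
$$

At the maximum likelihood estimate $\hat P(g) = n_g / n$, so the first term is

$$
\sum_{i=1}^{n} \log \hat P(g_i) = \sum_{g} n_g \log \frac{n_g}{n} \le 0 .
$$

Suppose neither the class probabilities nor the indicator parameters depend on $g$. Then the second sum is exactly the loglikelihood of the model without KNOWNCLASS, and the KNOWNCLASS model sits lower by the first term alone. For example, 2200 people split evenly into two groups would give $2200 \log(0.5) \approx -1524.9$. This offset carries over into -2LL, AIC, and BIC.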
Hi, I have an LTA model with two time points and 4 classes at both time points. I need to test whether the transition probabilities differ by gender by comparing 1) an LTA model in which the transition probabilities are constrained to equality across gender, and 2) a model where the probabilities are allowed to vary by gender. I haven't been able to find out how to constrain transition probabilities to equality. I tried this:
CLASSES = csex (2) c1(4) c2(4);
KNOWNCLASS IS csex (SEX=1 SEX=2);
ANALYSIS: TYPE = mixture;
STARTS = 100 25;
MODEL: %OVERALL%
c2#1 ON c1#1 csex#1 (p1);
c2#1 ON c1#2 csex#1 (p2);
c2#1 ON c1#3 csex#1 (p3);
c2#2 ON c1#1 csex#1 (p4);
c2#2 ON c1#2 csex#1 (p5);
c2#2 ON c1#3 csex#1 (p6);
c2#3 ON c1#1 csex#1 (p7);
c2#3 ON c1#2 csex#1 (p8);
c2#3 ON c1#3 csex#1 (p9);
And I can't add
c2#1 ON c1#1 csex#2 (p1); c2#1 ON c1#2 csex#2 (p2); etc. because I can't refer to the last class of csex on MODEL command.
Thank you, I modified my code based on the Web Note 13, like this: MODEL: %OVERALL% c1 ON csex; c2 ON c1; MODEL c1: %c1#1% c2#1 ON csex; c2#2 ON csex; c2#3 ON csex; %c1#2% c2#1 ON csex; etc.
but I get an error message "Invalid ON statement: C2#1 ON CSEX#1. The order of categorical latent variables does not allow for this regression." But, csex is mentioned first in the CLASSES command? So, I'm puzzled as to what to do.
The description of the CLASSES option in the User's Guide tells me that the latent class variable that other latent class variables are regressed on should be listed first. Because I am trying to regress c1 and c2 on csex, I figured csex should be first in the CLASSES command. Be that as it may, I get a similar type of error message regardless of which order I use in CLASSES. If csex is second or last, the error message says "Invalid ON statement: C1#1 ON CSEX#1. The order...". If csex is first, I get the error message above, "Invalid ON statement: C2#1 ON CSEX#1...".
Dear Prof Muthen, I have conducted two separate LPAs with two age groups (adolescents and young adults). I used the Lo-Mendell-Rubin likelihood ratio test and the bootstrapped likelihood ratio test to determine the best-fitting and most parsimonious models. For adolescents this is clearly a 3-class solution and for young adults a 2-class solution. My understanding is that when it is clear, as in this case, that a 2-class solution won't fit both groups, a multiple-group analysis is not feasible. Is this your view? If so, is it possible to test indicator mean differences across models, given that this is not a multiple-group model? Thank you.
I agree that a multiple group analysis is not appropriate when the classes for the two groups are not the same. Indicator means across models cannot be tested.
Shuai Chen posted on Thursday, October 16, 2014 - 7:25 pm
I am fitting a multigroup mixture model with the known-class variable gender and 3 latent classes, following Example 7.21 but without parameter restrictions: CLASSES = cg (2) c (3); KNOWNCLASS = cg (male = 0 male = 1);
MODEL: %OVERALL% c ON cg;
and compare it with the model without multiple group: CLASSES = c (3);
I expected to get a larger loglikelihood for the multigroup mixture model, but it is -5587.569, smaller than the -4874.786 from the model without multiple groups. I also fitted models with 1-5 latent classes, and each time the multigroup mixture model had the smaller loglikelihood. Any explanation?
The Knownclass loglikelihood is not on a scale comparable to that without Knownclass. This is because Knownclass essentially has an observed binary indicator as an extra DV. If you want to make this type of comparison I think you have to use gender as a covariate of c in your "multiple-group" model (in the model that takes gender into account):
c ON gender;
That way, you have the same DVs in the different models.
Shuai Chen posted on Wednesday, October 22, 2014 - 7:03 am
Thanks for the suggestion. My DVs are categorical variables. However, I tried your suggested approach with 3 classes and found that the thresholds for the male group are the same as the thresholds for the female group. Can I fit a model with different thresholds for the two gender groups and still get the comparable loglikelihood I need?
J.D. Smith posted on Friday, October 24, 2014 - 11:55 am
Hi, I receive this error when trying to run a multiple group model with the Bayesian estimator using the mixture model with KNOWNCLASS command:
*** FATAL ERROR VARIANCE COVARIANCE MATRIX IS NOT SUPPORTED WITH ESTIMATOR=BAYES. PARTIAL EQUALITY BETWEEN TWO VARIANCE COVARIANCE BLOCKS. IF TWO PARAMETERS FROM TWO DIFFERENT VARIANCE COVARIANCE BLOCKS ARE HELD EQUAL THEN ALL THE PARAMETERS HAVE TO BE EQUAL IN THE TWO BLOCKS. USE ALGORITHM=MH TO RESOLVE THIS PROBLEM.
You have two options both giving the same conditional distribution estimation [Con2 | Age Tcgen Con1 COACHS]
1. Preferable, since you don't estimate as many parameters: remove all the lines Age Tcgen Con1 COACHS; With this option the distribution of [Age Tcgen Con1 COACHS] is not estimated, i.e., they are treated as true covariates and no assumptions are made about their distribution.
2. In each class add Age Tcgen Con1 COACHS WITH Age Tcgen Con1 COACHS; so that the covariances are also class specific (not just the variances).
If you have missing data on these variables only option 2 will be possible.
Hi, I computed a multigroup latent profile analysis with two groups using the knownclass command.
Using the knownclass command, I get neither the adjusted Lo-Mendell-Rubin likelihood ratio test, nor the Vuong-Lo-Mendell-Rubin likelihood ratio test, nor the parametric bootstrapped likelihood ratio test.
My reviewers have requested these tests nevertheless.
What would be a good way to provide similar information with the knownclass command, or what would be a useful workaround for this problem?
I am trying to use TYPE=MIXTURE RANDOM and the KNOWNCLASS option for a multiple group analysis using XWITH (for the interaction between a latent continuous variable [nevro] and an observed continuous variable [sle]). My latent variable [nevro] has categorical indicator variables.
%OVERALL% nevro BY sado011 sado007 sado008 sado027 sado019; mod | nevro XWITH sle; scl6 On sle nevro mod;
%cg#1.c#1% scl6 On sle nevro mod;
However, I get the warning: "ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX." etc. Unfortunately, the fixed parameters are the regression coefficients for my interaction variables [mod].
A quick clarification: I do get an estimate for the regression coefficient for my interaction variable [mod] but the standard error is zero and accordingly I don't get a significance estimate (999.000).
In general, does computational burden increase with the number of known classes that are modeled? I am interested in running two-level mixture models with seven cohorts each represented in a known class.
However, I am concerned about running a seven-class model, when the two-level model itself already has the WITHIN and BETWEEN parts.
Is there a number of known classes beyond which computation will generally become too burdensome? In that case, I might collapse some cohorts by age group, but I want to see if I can keep all age groups separate to discern the most about the process at each age, if possible.
P.S. I know that computational burden also depends on the complexity of the model--such as the number of latent variables and random slopes that are estimated. However, I wonder if you have recommendations about sheer number of known classes as a separate factor. Does it depend on the number of observations in each class?
I don't think the number of classes is an issue at all because the classes are known. All between random effects will have to be integrated however so if you have more than two you should use montecarlo integration with 5000 points. Soon Mplus will have the Bayesian estimation for these models and it will be easy to estimate.
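Here is a sketch of the ANALYSIS settings that this advice points to. It assumes seven cohorts coded 1 to 7 in a hypothetical variable cohort and a hypothetical cluster identifier school. The %WITHIN% and %BETWEEN% parts of the MODEL command are omitted:

```mplus
VARIABLE:
  CLUSTER = school;                                   ! hypothetical cluster id
  CLASSES = cg (7);
  KNOWNCLASS = cg (cohort = 1 cohort = 2 cohort = 3 cohort = 4
                   cohort = 5 cohort = 6 cohort = 7); ! one known class per cohort
ANALYSIS:
  TYPE = TWOLEVEL MIXTURE;
  ALGORITHM = INTEGRATION;
  INTEGRATION = MONTECARLO (5000);                    ! Monte Carlo integration, 5000 points
```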
Fan Xizhen posted on Thursday, February 16, 2017 - 1:56 am
Dear Prof. Muthen, I'm working on a multigroup profile analysis with known classes, but I'm still confused about the syntax. I need your help with the following problems: 1) Example 7.21 (Mixture modelling with known classes - multiple group analysis) in the Mplus User's Guide Ver_7.0 is a fully unconstrained MLPA, right? So the means of y1, y2, y3, and y4 vary across the classes of c, while the variances of y1, y2, y3, and y4 vary across the classes of cg. 2) Example 7.21 does not mention whether the profile sizes vary freely across samples or stay equal. If I want to constrain the profile sizes to be equal across samples, which syntax should I use? I searched through the manual but could not find it. 3) How do I constrain the means or variances of LPA indicators across samples; what is the syntax?
Thanks in advance, your reply would be highly appreciated!
Jon Heron posted on Thursday, February 16, 2017 - 2:31 am
I wonder whether you might get along better following the convention shown in Example 8.8.
You have two grouping variables, CG and C; one is measured perfectly and the other is not. If each has two categories, then you are just talking about 4 groups defined by their combination.
By specifying the means and variances of Yi within each combination, you can apply whatever constraints you are interested in.
Fan Xizhen posted on Thursday, February 16, 2017 - 3:28 am
Hi Jon, thank you very much! Actually, no, I'm struggling with this syntax. Let me check if I got what you said. If I want the means and variances to be equal, I just need to put it like this: MODEL: %OVERALL% c ON cg; %cg#1.c#1% [y1-y5] (p1-p5); %cg#2.c#1% [y1-y5] (p6-p10); %cg#1.c#2% [y1-y5] (p1-p5); %cg#2.c#2% [y1-y5] (p6-p10); I'm also confused about how to constrain the profile sizes.
Jon Heron posted on Thursday, February 16, 2017 - 3:53 am
To be on the safe side, I would specify the variance terms too. It's then a trivial step to relax them if you choose. Perhaps I am over-cautious, but I don't like getting estimates I haven't explicitly asked for.
As for the profile sizes, I think removing the "c on cg;" statement will do that for you.
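The sketch below puts Jon's two suggestions together for two samples (cg) and two profiles (c), with hypothetical indicators y1-y5. Each profile's means and variances carry the same labels in both samples. That holds them equal across samples while letting them differ between profiles. "c ON cg;" is left out, so the profile sizes are also held equal across samples:

```mplus
MODEL:
  %OVERALL%
  ! "c ON cg;" omitted: c is then independent of cg, so the
  ! profile sizes are held equal across the two samples
  %cg#1.c#1%
  [y1-y5] (m1-m5);      ! profile 1 means, shared across samples
  y1-y5 (v1-v5);        ! profile 1 variances, shared across samples
  %cg#2.c#1%
  [y1-y5] (m1-m5);
  y1-y5 (v1-v5);
  %cg#1.c#2%
  [y1-y5] (m6-m10);     ! profile 2 gets its own labels
  y1-y5 (v6-v10);
  %cg#2.c#2%
  [y1-y5] (m6-m10);
  y1-y5 (v6-v10);
```

To let a parameter differ across samples again, give it a different label (or no label) in the %cg#2.c#k% sections.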