Mixture modeling with multilevel data
 Anonymous posted on Tuesday, February 19, 2002 - 3:06 pm
I would like to know if a multilevel or multisample mixture model is possible with Mplus. If so, can you please point me to further readings?

Many thanks!
 Linda K. Muthen posted on Tuesday, February 19, 2002 - 3:55 pm
Mplus can currently estimate TYPE=MIXTURE COMPLEX models, with and without missing data. It can also estimate multisample mixture models, using training data to specify the groups. Multilevel mixture modeling is currently under development.
 Scott Grey posted on Wednesday, June 14, 2006 - 11:47 am
Linda,

I'm trying to run an LPA using 'TYPE IS MIXTURE COMPLEX', but I do not get standard errors in my output. Here's the code:

DATA:
FILE IS "C:\Documents and Settings\insthealthsa4\My Documents\DARE\
External prevention programming\CFA3.dat";

VARIABLE:
NAMES ARE n1q44a_7 n1q44a_8 n1q44a_9 n1q44b_7 n1q44b_8 n1q44b_9
n1q44c_7 n1q44c_8 n1q44c_9 n1q44d_7 n1q44d_8 n1q44d_9 v10q52 v10q55
n2q49 utq39 crsswlk ms7dist treatms inclass outclass dare other
t7 t8 t9 t10 msdist;
USEVARIABLES ARE inclass outclass dare other;
CLUSTER IS ms7dist;
CLASSES = class(5);

ANALYSIS:
TYPE IS MIXTURE COMPLEX;
STARTS = 50 5;
LOGHIGH = +25;
LOGLOW = -25;
UCELLSIZE = 0.01;
ESTIMATOR IS MLR;
LOGCRITERION = 0.0000001;
ITERATIONS = 1000;
CONVERGENCE = 0.000001;
MITERATIONS = 500;
MCONVERGENCE = 0.000001;
MIXC = ITERATIONS;
MCITERATIONS = 2;
MIXU = ITERATIONS;
MUITERATIONS = 2;

MODEL: %OVERALL%
inclass-other WITH outclass-other;

OUTPUT: TECH8 TECH11;

SAVEDATA:
FILE IS LC3;
FORMAT IS FREE;
SAVE = CPROB;

THANKS FOR YOUR HELP!
 Linda K. Muthen posted on Wednesday, June 14, 2006 - 1:47 pm
I'm afraid this information does not tell me why you don't get standard errors. Please send your input, data, output, and license number to support@statmodel.com.
 Hao Duong posted on Saturday, October 18, 2008 - 11:02 am
Dr. Muthen,
I am confused about the interpretation of "ib sb ON w" and "c ON w" in Example 10.9 of the 2007 Mplus User's Guide.
Would you please explain them for me?
Thank you
I appreciate all your help!
Hao Duong
 Linda K. Muthen posted on Monday, October 20, 2008 - 8:43 am
All regressions are linear regressions. ib and sb are continuous growth factors. c#1 is a random intercept, also a continuous latent variable.
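To make this concrete for a two-class case: the between-level c#1 is a continuous random intercept, so "c#1 ON w" is an ordinary linear regression of that intercept on w (just as "ib sb ON w" regresses the growth factors on w), and the class probability then follows from the inverse logit. A minimal Python sketch, assuming two classes with class 2 as the reference; the names alpha, gamma, and zeta and all numeric values are hypothetical, not Mplus output labels:

```python
import math


def class1_probability(alpha: float, gamma: float, w: float, zeta: float = 0.0) -> float:
    """P(c = 1) for a cluster with covariate w in a two-class model.

    c1_j = alpha + gamma * w + zeta is the between-level random intercept,
    a continuous latent variable that is linear in w; class 2 is the
    reference class whose logit is fixed at 0.
    """
    c1_j = alpha + gamma * w + zeta       # linear regression of c#1 on w
    return 1.0 / (1.0 + math.exp(-c1_j))  # inverse logit gives P(c = 1)


# Hypothetical parameter values, for illustration only.
for w in (-1.0, 0.0, 1.0):
    print(w, round(class1_probability(alpha=0.2, gamma=0.8, w=w), 3))
```

The regression itself is linear; only the mapping from the random intercept to a probability is nonlinear.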
 Miguel Villodas posted on Monday, August 22, 2011 - 10:22 am
Hello,

I have two questions. I am attempting to establish a GMM with four time points. However, the time points are not equally spaced. I do have an indicator of time (age at each assessment).

1. Would this preclude me from using a traditional GMM? I was under the impression that I would have to use a two-level model rather than an LGCM because of the unequal time between interviews, and regress my outcome on my time variable.

2. Also, if multilevel modeling is required, would it still be possible to examine classes of trajectories?
 Bengt O. Muthen posted on Monday, August 22, 2011 - 10:39 am
1. No. A single-level model is sufficient. Just use time scores that reflect the non-equidistance.

2. Yes, you can use between-level latent class variables.
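Point 1 above can be illustrated with a small calculation: when everyone is assessed at the same (unequally spaced) ages, the time scores are just the elapsed time from a chosen baseline. A minimal Python sketch, assuming hypothetical assessment ages of 12, 13, 15, and 18 and hypothetical outcome names y1-y4:

```python
def time_scores(ages, baseline=None, unit=1.0):
    """Convert assessment ages into growth-model time scores.

    Time scores are (age - baseline) / unit, so the intercept factor is
    defined at the baseline age and the unequal spacing is preserved.
    """
    if baseline is None:
        baseline = ages[0]
    return [(a - baseline) / unit for a in ages]


# Hypothetical design: assessments at ages 12, 13, 15, and 18.
scores = time_scores([12, 13, 15, 18])
print(scores)  # [0.0, 1.0, 3.0, 6.0]
# In Mplus these would become fixed loadings, e.g.  i s | y1@0 y2@1 y3@3 y4@6;
```

Changing `baseline` moves the point in time at which the intercept factor is defined without changing the slope.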
 Miguel Villodas posted on Monday, August 22, 2011 - 11:52 am
Thank you very much Dr. Muthen. The problem that I have run into is that the spacing of each assessment varies individually. Is this still possible?
 Miguel Villodas posted on Monday, August 22, 2011 - 2:24 pm
I should clarify, as this is unclear. I have been able to run these models using the TYPE = RANDOM option for the individually varying time scores. I am wondering if there is a way, using this option, to examine classes of trajectories. If so, how would I specify this model in Mplus?
 Bengt O. Muthen posted on Monday, August 22, 2011 - 3:49 pm
Individually varying times of observation are handled by the TSCORES option (see the User's Guide), and this can be combined with TYPE = MIXTURE, although estimation takes a little longer.
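With individually varying times, each person's own ages become data columns rather than fixed loadings. In the User's Guide style this pairs something like TSCORES = a1-a4 with "i s | y1-y4 AT a1-a4;" (variable names here are hypothetical). A minimal Python sketch, assuming ages are centered at a common age of 12, of how those per-person time-score columns could be built and what they imply for a linear trajectory:

```python
def individual_time_scores(ages_by_person, center_age):
    """Per-person time scores: each age at assessment minus a common centering age.

    Each inner list holds one person's ages at the waves; the resulting
    columns would be saved as extra variables (hypothetically a1-a4) for TSCORES.
    """
    return [[round(age - center_age, 2) for age in ages] for ages in ages_by_person]


def predicted_outcome(intercept, slope, time_score):
    """Linear growth: expected outcome at a person's own time score."""
    return intercept + slope * time_score


# Hypothetical ages for two people assessed at slightly different times.
ages = [[12.1, 13.4, 15.0, 18.2],
        [11.8, 13.0, 14.6, 17.9]]
scores = individual_time_scores(ages, center_age=12.0)
print(scores)  # [[0.1, 1.4, 3.0, 6.2], [-0.2, 1.0, 2.6, 5.9]]
```

Centering at a common age keeps the intercept factor interpretable as the expected level at that age for everyone.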
 FANG BEE LING posted on Thursday, November 01, 2012 - 2:33 am
Hi,

I'm doing a multilevel latent class analysis with 12 items, each with 4 response categories. I tried a few times but keep getting an error message, and I'm not sure what's gone wrong. I hope someone can advise me on this. Many thanks.

my input instruction:
VARIABLE:
NAMES ARE wel mor dif enj str qui bor lik hel oth uni job schid ;
USEVARIABLES = wel mor dif enj str qui bor lik hel oth uni job ;
CATEGORICAL ARE wel mor dif enj str qui bor lik hel oth uni job ;
MISSING ARE all (9);
CLASSES=C(3);
CLUSTER=schid;
WITHIN=wel mor dif enj str qui bor lik hel oth uni job ;
ANALYSIS:
TYPE = MIXTURE TWOLEVEL ;
STARTS = 20 10;
PROCES = 8 (STARTS);
MODEL:
%WITHIN%
%OVERALL%
%BETWEEN%
%OVERALL%
C#1; C#2; C#1 WITH C#2;

*** ERROR
Categorical variable WEL contains 158 categories.
This exceeds the maximum allowed of 10.
 Linda K. Muthen posted on Thursday, November 01, 2012 - 7:46 am
It sounds like you are reading your data incorrectly. Perhaps the number of variable names is not the same as the number of columns in your data set or you have blanks in your data. If you can't figure this out, send your files and license number to support@statmodel.com.
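Both suggestions above can be checked outside Mplus before rerunning. A minimal Python sketch, assuming a free-format data file (whitespace- or comma-delimited); `check_columns` is a hypothetical helper, and the limit of 10 categories is taken from the error message above:

```python
def check_columns(text, names, categorical=(), max_categories=10):
    """Scan free-format data for two common reading problems.

    Returns (bad_rows, too_many): rows whose field count differs from
    len(names), and categorical columns with more distinct values than allowed.
    """
    bad_rows = []
    values = {n: set() for n in names}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip completely empty lines
        fields = line.replace(",", " ").split()
        if len(fields) != len(names):
            bad_rows.append((lineno, len(fields)))  # missing/extra columns or blank fields
            continue
        for name, value in zip(names, fields):
            values[name].add(value)
    too_many = {n: len(values[n]) for n in categorical if len(values[n]) > max_categories}
    return bad_rows, too_many


# Hypothetical three-column example: the third row has a blank field.
sample = "1 2 101\n3 4 101\n2 102\n"
print(check_columns(sample, ["wel", "mor", "schid"], categorical=("wel", "mor")))
```

In practice the file would be read with `open(path).read()` and `names` would mirror the NAMES list in the input.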
 jilke@fsw.eur.nl posted on Thursday, April 18, 2013 - 3:03 am
I am implementing an IRT multilevel mixture model with item bias effects in Mplus. My structure has items nested in individuals, nested in countries.

I would like to code the mixture components as nominal, yielding discrete random effects. For identification I use effect coding for the mixtures. But with the following syntax, the latent means do not differ across mixture components; both are zero.

Am I applying the correct syntax for effect-coding of mixtures?

VARIABLE:
NAMES = id item1 item2 item3 country;
USEVARIABLES = item1 item2 item3;
CATEGORICAL = item1 item2 item3;
CLUSTER= country;
CLASSES= eta3 (2);
BETWEEN= eta3;

ANALYSIS:
ALGORITHM=integration;
TYPE=TWOLEVEL MIXTURE;

MODEL:
%WITHIN%
%OVERALL%
ETA2w BY
item1* (a)
item2 (b)
item3 (c);
ETA2w@1;
[ETA2w@0];

%BETWEEN%
%OVERALL%
ETA2b BY
item1* (a)
item2 (b)
item3 (c);

%eta3#1% !Mixture #1
[ETA2b] (d); !Mean of latent var for mixture #1

item2; !item bias effects
item3;

%eta3#2% !Mixture #2
[ETA2b] (e); !Mean of latent var for mixture #2

item2; !item bias effects
item3;

model constraint:
0= d+e; !effect coding for mixtures
 Linda K. Muthen posted on Thursday, April 18, 2013 - 9:33 am
The latent variable mean must be fixed at zero in one class for model identification. So d and e in MODEL CONSTRAINT are not both identified.
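Why one class mean has to be pinned down: with thresholds held equal across classes, adding the same constant to every class's factor mean can be undone by shifting each item threshold by its loading times that constant, which leaves every predicted probability unchanged. A minimal Python sketch with hypothetical loadings and thresholds (logit link, evaluated at the factor mean for simplicity):

```python
import math


def item_prob(tau, loading, factor_mean):
    """P(item = 1) at the factor mean under a logit link."""
    return 1.0 / (1.0 + math.exp(tau - loading * factor_mean))


loadings = [1.0, 0.8, 1.3]              # hypothetical loadings for item1-item3
taus = [0.5, -0.2, 0.1]                 # hypothetical class-invariant thresholds
means = {"class1": 0.7, "class2": 0.0}  # class 2 mean fixed at zero

shift = 2.5  # add the same constant to every class mean ...
shifted_means = {k: m + shift for k, m in means.items()}
shifted_taus = [t + l * shift for t, l in zip(taus, loadings)]  # ... and absorb it in the thresholds

for k in means:
    for t, st, l in zip(taus, shifted_taus, loadings):
        assert math.isclose(item_prob(t, l, means[k]), item_prob(st, l, shifted_means[k]))
print("identical probabilities: only the difference between class means is identified")
```

Only the difference between class means (here 0.7) is pinned down by the data, which is why one class mean is fixed at zero.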
 Artur Pokropek posted on Thursday, February 13, 2014 - 8:51 am
Hello,
I'm running a multilevel latent class model similar to the model presented in Henry & Muthén (2010). The core of the model is a multilevel logistic model, resp ON negd1 negd2 negd3, and 2 latent classes are specified. Estimation of the model works when I have the regression in each of the two latent classes:

MODEL:
%WITHIN%
%OVERALL%
resp on negd1 negd2 negd3;
cw#1 on time ;
%CW#1%
resp on negd1 negd2 negd3 ;
%CW#2%
resp on negd1 negd2 negd3 ;
%BETWEEN%
%OVERALL%
resp ;
[resp$1] ;
%CW#1%
resp ;
[resp$1] ;
%CW#2%
resp ;
[resp$1] ;

But the problem appears when I want an empty logistic model in class 2, i.e., a model without explanatory variables. I have tried several specifications, and none of them worked. When I declare the ON statement in the %OVERALL% part, I get ON statements in all classes. When the ON statement is not declared in the %OVERALL% part, I am not allowed to specify it for class 1. Is there a way to specify a logistic model with explanatory variables in class 1 and an empty model in class 2?

Thank You!
 Bengt O. Muthen posted on Friday, February 14, 2014 - 12:02 pm
Please send the data and output to Support for the case where you specify the ON statement in the %OVERALL% part and fix the slopes in class 2.
 Lisa M. Yarnell posted on Wednesday, April 09, 2014 - 6:25 pm
Hello,

I plan to use UG Example 10.12 (two-level LTA with a covariate) for my analyses. I have students nested in schools. I understand the code in the example, but I want to clarify one aspect of it. In the code below, why are the indicators for the latent classes modeled at the between level? If these reflect individual responses (such as from individual students), wouldn't they be on the within level? Or is it that because we are estimating probabilities (or mean responses) for the items for persons, conditional on class, this becomes an average across persons and is no longer on the within level?

Thank you.

MODEL:
%WITHIN%
%OVERALL%
c2 ON c1 x;
c1 ON x;
%BETWEEN%
%OVERALL%
c1#1 ON w;
c2#1 ON c1#1 w;
c1#1 c2#1;
MODEL c1:
%BETWEEN%
%c1#1%
[u11$1-u14$1] (1-4);
%c1#2%
[u11$1-u14$1] (5-8);
MODEL c2:
%BETWEEN%
%c2#1%
[u21$1-u24$1] (1-4);
%c2#2%
[u21$1-u24$1] (5-8);
 Linda K. Muthen posted on Thursday, April 10, 2014 - 9:45 am
In multilevel modeling, all mean/threshold/intercept parameters are on the highest level.
 Lisa M. Yarnell posted on Thursday, April 10, 2014 - 5:18 pm
OK, may I request your input on the following questions regarding this section of the code:

%BETWEEN%
%OVERALL%
c1#1 ON w;
c2#1 ON c1#1 w;
c1#1 c2#1;

Understanding the code above:

1) Why is c2#2 ON c1#2 (or other combinations such as c2#2 ON c1#1) not included, similar to the second line? I believe that the code above regresses the cluster-level (average or intercept) latent status for class 1 of c2 on that for class 1 of c1. This is part of the random intercept setup. But wouldn't regressing the second class of each latent class variable make sense as well?

2) By the same token, why does the above code not include "c1#2 c2#2;" as well? Is it that by allowing the intercepts for the first class of each latent class variable to vary across clusters, these are already free to differ from those for the second latent classes?

Altering the code for my data:

3) If I have 3 latent class variables rather than 2 and want random intercepts, I would also model c3#1 ON c2#1, right?

4) Finally, if I try a model without random intercepts, would I remove both the "c2#1 ON c1#1" and "c1#1 c2#1;" sections of code?

Thank you sincerely.
 Bengt O. Muthen posted on Friday, April 11, 2014 - 11:36 am
There are only 2 classes, in which case there is no between-level counterpart for the second class, just as in multinomial logistic regression. With 3 classes you have between-level counterparts for the first 2.
 Jiwon Shannon Choi posted on Thursday, May 01, 2014 - 5:02 am
Dear Muthen,

I am writing this post to ask about MLCA. Is it possible to run a "multilevel LCA" with covariates and a distal outcome simultaneously? There are 580 individual cases nested within 30 organizations.

Thank you in advance.
 Linda K. Muthen posted on Thursday, May 01, 2014 - 10:34 am
Thirty organizations is the minimum you should have. Yes, this model is possible.
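Bengt Muthén's answer to Lisa Yarnell mirrors multinomial logistic regression: with K classes only K-1 logits are free and the last class is the reference, so a latent class variable has K-1 between-level random intercepts (one for 2 classes, two for 3). A minimal Python sketch of how those cluster-specific intercepts map to class probabilities; the values are hypothetical:

```python
import math


def class_probabilities(logits):
    """Multinomial logit with the last class as the reference.

    `logits` holds the K-1 free between-level random intercepts (c#1, ..., c#K-1);
    the last class's logit is fixed at 0, so it has no between-level counterpart.
    """
    full = list(logits) + [0.0]
    m = max(full)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in full]
    total = sum(exps)
    return [e / total for e in exps]


print(class_probabilities([0.0]))        # 2 classes, one random intercept -> [0.5, 0.5]
print(class_probabilities([0.5, -0.3]))  # 3 classes, two random intercepts
```

Adding a random intercept for the last class would not change any probability, because only differences from the reference logit matter.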
Yasumasa Otsuka posted on Monday, October 06, 2014 - 6:44 pm
I'm now considering calculating the MOR (median odds ratio) proposed by Larsen & Merlo (2005). But due to my limited statistical background, I cannot understand the mathematical expression shown on page 83, first line. Could anyone give me an example with actual numbers for calculating the MOR in this article?
 Bengt O. Muthen posted on Tuesday, October 07, 2014 - 9:23 am
Try contacting the authors, or post on Multilevelnet.
 Chris Kenaszchuk posted on Friday, June 03, 2016 - 2:33 pm
Some segments of Example 10.1, 'Two-level mixture regression for a continuous dependent variable,' are below. I'd like to modify the program for two objectives:

(1) Incorporate the measurement error table output from step 1/step 2 of manual 3-step estimation, i.e., "Logits for the Classification Probabilities for the Most Likely..."
(2) Temporarily omit the regressions from the overall model in the between part of the model.

The example does not mention %C#2% in the within part of the model. Is it necessary to omit %C#2%? If so, then how could the parameter N#1@ be used for %C#2%?

VARIABLE:
CLASSES = C(2);
WITHIN = X1 X2 N; ! N is my addition
! BETWEEN = W; ! Not using for now
CLUSTER = CLUS;
NOMINAL = N;
ANALYSIS:
TYPE = TWOLEVEL MIXTURE;
STARTS=0;
MODEL:
%WITHIN%
%OVERALL%
Y ON X1 X2;
C ON X1;
%C#1%
[N#1@1.901]; ! As per Webnote 15, appendix E, step 3 of
! manual 3-step estimation
Y ON X2;
Y;
%BETWEEN%
%OVERALL%
! Y ON W; ! No between-level regression
! C#1 ON W; ! No between-level regression
! C#1*1; ! A starting value is not needed
%C#1%
[Y*2];
OUTPUT: TECH1 TECH8;
 Bengt O. Muthen posted on Sunday, June 05, 2016 - 11:54 am
It is not necessary to omit %C#2%. We did so because we didn't need to say anything about c#2. So go ahead and try it.
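On Yasumasa Otsuka's MOR question above: assuming the definition in Larsen & Merlo (2005), where sigma^2 is the between-cluster variance of the random intercept on the log-odds scale, the MOR is exp(sqrt(2 * sigma^2) * Phi^{-1}(0.75)), roughly exp(0.95 * sigma). Whether this matches the exact expression on page 83 should be checked against the article. A minimal Python sketch with a hypothetical variance of 0.5:

```python
import math
from statistics import NormalDist


def median_odds_ratio(between_variance: float) -> float:
    """MOR = exp(sqrt(2 * sigma^2) * z_0.75).

    sigma^2 is the between-cluster variance of the random intercept on the
    logit scale; z_0.75 is the 75th percentile of the standard normal.
    """
    z75 = NormalDist().inv_cdf(0.75)  # about 0.6745
    return math.exp(math.sqrt(2.0 * between_variance) * z75)


# Hypothetical between-cluster variance of 0.5 on the logit scale:
print(round(median_odds_ratio(0.5), 2))  # 1.96
```

An MOR of about 1.96 means that, comparing two randomly chosen clusters, the median odds ratio between the higher-risk and lower-risk cluster is roughly 2; with no between-cluster variance the MOR is 1.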
Timothy Tuti posted on Wednesday, June 29, 2016 - 5:40 am
Dear Muthen,

Is it possible to have a two-level mixture model in Mplus where the LCA classes are predictor variables, the outcome is a binary observed variable, and the covariates include a continuous latent factor? I.e.,

Y = Classes + continuous latent factor + covariates

The code below is an attempt to do this, but I'm having difficulty keeping the binary Y separate from the observed categorical variables that are used for the LCA classes. Is that possible? How do I achieve that?

UseVariables are Dailies Radio TV Maln Brstfed Eat_Freq Diet_Div C_Age Urban Wealth M_Age Female sexXage Edu;
Classes=HLTH_BHV(2);
Categorical are Brstfed Eat_Freq Diet_Div;
WITHIN = C_Age Female sexXage;
BETWEEN = Dailies Radio TV;
CLUSTER = Comm_Hse;
Auxiliary = (r)Wealth Urban Edu M_Age;
Define:
sexXage=C_Age*Female;
CENTER C_Age(GRANDMEAN);
Analysis:
Type=TWOLEVEL MIXTURE;
Starts=4000 40;
Processors = 4;
Algorithm=integration;
Model:
%WITHIN%
%OVERALL%
HLTH_BHV ON Female C_Age sexXage;
%BETWEEN%
%OVERALL%
MED_USE BY Dailies Radio TV;
Maln ON HLTH_BHV MED_USE;
HLTH_BHV ON MED_USE;

How do I ensure the outcome variable is treated as binary when LCA classes are regressed on it?
 Bengt O. Muthen posted on Friday, July 01, 2016 - 9:37 am
You don't say which variable is the binary outcome. Perhaps it is "Maln", but it is not declared categorical. You say "when LCA classes are regressed on it". If an outcome is a function of the latent class variable, you should say "the outcome is regressed on the latent class variable", not the other way around. But note that in Mplus you don't say "y ON c"; instead, you let the default class-specific change in the y means/thresholds represent the effect of c on y. See also our 3-step papers on the website, such as Web Note 21.
 Chris Kenaszchuk posted on Friday, July 22, 2016 - 9:52 am
I want to change the reference class from class 3 to class 2.
Would I use the outputted values of class 2 as user-specified starting values for class 3 in the next run? Where do I insert the desired starting values in the code below? Thanks.

USEVARIABLES = y x1 x2 w n;
NOMINAL = n;
CATEGORICAL = y;
CLASSES = C(3);
CLUSTER = clus;
WEIGHT = wgt;
WITHIN = n x1 x2;
BETWEEN = w;
ANALYSIS:
TYPE = MIXTURE TWOLEVEL;
MODEL:
%WITHIN%
%OVERALL%
y ON x1 x2;
C ON x1 x2;
%C#1%
[N#1@4.2]; [N#2@1.4]; [N#3@2.9];
y ON x1 x2;
%C#2%
[N#1@0.8]; [N#2@3.9]; [N#3@1.9];
y ON x1 x2;
%C#3%
[N#1@2.6]; [N#2@2.3]; [N#3@6.4];
y ON x1 x2;
%BETWEEN%
%OVERALL%
y ON w;
C#1 ON w; C#1*1;
C#2 ON w; C#2*1;
%C#1%
y ON w;
[y$1*2];
%C#2%
y ON w;
[y$1*2];
%C#3%
y ON w;
[y$1*2];
 Bengt O. Muthen posted on Friday, July 22, 2016 - 12:56 pm
Try switching the values only for the nominal statements:

[N#1@...]
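For reference, the fixed [N#k@...] values in the manual 3-step setup used in this thread are, assuming the approach of Web Note 15, logits of the classification probabilities: within latent class c, log(p_ck / p_cK) for most likely class k, with the last category as reference. Mplus prints these in the "Logits for the Classification Probabilities..." table, so this is only a check. A minimal Python sketch with a hypothetical 2-class classification table:

```python
import math


def classification_logits(prob_table):
    """Row c holds P(most likely class = k | latent class = c) for k = 1..K.

    Returns log(p_ck / p_cK) for k = 1..K-1 in each row, the last category being
    the reference, rounded to 3 decimals.
    """
    logits = []
    for row in prob_table:
        ref = row[-1]
        logits.append([round(math.log(p / ref), 3) for p in row[:-1]])
    return logits


# Hypothetical 2-class classification probability table (rows sum to 1).
table = [[0.95, 0.05],
         [0.10, 0.90]]
print(classification_logits(table))  # [[2.944], [-2.197]]
```

Each row's values would go into the [N#k@...] statements of the corresponding %C#c% block.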
 Youmi Suk posted on Monday, February 27, 2017 - 11:24 am
In multilevel mixture regression, if we do not use any particular indicators for the latent classes and have a model with categorical and continuous covariates, do we have to specify the variable type of the covariates?

In the example 10.1,
NAMES ARE y x1 x2 w1 w2 class clus;
USEVARIABLES = y x1 x2 w1 w2;
CLASSES = c (2);
WITHIN = x1 x2;
BETWEEN = w1 w2;
CLUSTER = clus;

If x1 (a within level variable) w1 (a between-level variable) are binary variables,
should we specify x1 w1 as categorical variables (CATEGORICAL = x1 w1)?

Like this,

NAMES ARE y x1 x2 w1 w2 class clus;
USEVARIABLES = y x1 x2 w1 w2;
CATEGORICAL = x1 w1;
CLASSES = c (2);
WITHIN = x1 x2;
BETWEEN = w1 w2;
CLUSTER = clus;

If I specify them using "CATEGORICAL," I get threshold information, as for indicators.

 Bengt O. Muthen posted on Monday, February 27, 2017 - 6:46 pm
You should not declare a variable type such as categorical for a covariate.
 Youmi Suk posted on Monday, February 27, 2017 - 7:48 pm
Thanks much for the quick reply.

I have one more question about the variable type of the dependent variable, in the same situation as above except that the dependent variable is binary rather than continuous.

Given that we cannot declare a variable type, can we not use a categorical dependent variable for multilevel mixture (logistic) regression?

I guess that if we cannot specify a categorical dependent variable in Mplus, we would use the linear probability model (i.e., standard regression) rather than a logistic regression model?

I really appreciate your help in advance.
 Bengt O. Muthen posted on Tuesday, February 28, 2017 - 6:15 pm
You can declare a dependent variable (DV) as categorical or anything else - I don't know why you think you cannot. I was referring to a covariate, not a DV.
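Declaring the binary DV with CATEGORICAL = y gives a logistic (logit-link) regression under ML, rather than the linear probability model asked about above. A small Python comparison with hypothetical coefficients shows the practical difference: logistic predictions stay inside (0, 1), while linear probability predictions can leave [0, 1]:

```python
import math


def logistic_prob(b0, b1, x):
    """Logistic regression: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))), always in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def linear_prob(b0, b1, x):
    """Linear probability model: P(y = 1 | x) = b0 + b1 * x, which can leave [0, 1]."""
    return b0 + b1 * x


# Hypothetical coefficients: compare predictions across a range of x.
for x in (-3, 0, 3):
    print(x, round(logistic_prob(0.0, 1.2, x), 3), round(linear_prob(0.5, 0.2, x), 3))
```

At x = 3 the linear model predicts a probability above 1 and at x = -3 a negative one, which the logistic model cannot do.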
 Youmi Suk posted on Friday, March 10, 2017 - 12:56 pm
(1) We are trying to fit 2-class multilevel models with Level 1 latent classes

(2) With a binary DV, adding "DV;" provides class-specific variance estimates for the random intercept. It worked.

VARIABLE:
NAMES ARE y x w clus;
USEVARIABLES = y x w;
CLASSES = cw (2);
CATEGORICAL = y;
WITHIN = x;
BETWEEN = w;
CLUSTER = clus;

MODEL:
%WITHIN%
%OVERALL%
y on x;
%cw#1%
y on x;
%cw#2%
y on x;

%BETWEEN%
%OVERALL%
y on w;
%cw#1%
y on w;
%cw#2%
y on w;
y;

(3) However, with a continuous DV, adding "DV;" did not work. The program stopped with the error message.

VARIABLE:
NAMES ARE y x w clus;
USEVARIABLES = y x w;
CLASSES = cw (2);
WITHIN = x;
BETWEEN = w;
CLUSTER = clus;

MODEL:
%WITHIN%
%OVERALL%
y on x;
%cw#1%
y on x;
%cw#2%
y on x;
y;

%BETWEEN%
%OVERALL%
y on w;
%cw#1%
y on w;
%cw#2%
y on w;
y;

The error message is as follows:
-------------
*** FATAL ERROR
CLASS-SPECIFIC BETWEEN VARIABLE PROBLEM.
-------------
We would like to allow class-specific random intercept variances with a continuous DV, identifying Level-1 latent classes. Could you help me out with this problem?

Thank you in advance for your help.
 Bengt O. Muthen posted on Friday, March 10, 2017 - 6:08 pm
Please send the 2 outputs to Support along with your license number.
 Virginia Rangel posted on Tuesday, September 10, 2019 - 1:57 pm
I am trying to run a two-level LCA and am having trouble because the program keeps freezing my computer right after it starts estimating the model. Our IT guy suggested that the data file might be corrupt, but I can run other, more basic, analyses using the same file, so I assume that is not the issue. I have checked for blanks and random characters in the data file and there are none. I'm out of ideas as to what might be causing the program to freeze and crash my computer.
 Bengt O. Muthen posted on Wednesday, September 11, 2019 - 11:56 am
Send your input and data to Support along with your license number.
 Virginia Rangel posted on Tuesday, September 17, 2019 - 9:44 am
Variable:

missing=All(-9);

Weight=w1;

IDvariable=stu_id;

usevariables=sch_ID female latinx black asian other firstgen math ses scise sciint sciid sciut mathse mathin mathid mathut STEM GPA private city town rural frl pctlat pctblack scifair summer mentor pdlearn pdint calc compsci chem phys;

categorical=scifair summer mentor pdlearn pdint calc compsci chem phys;

classes=cb(4) cw(4);

within=female latinx black asian other firstgen math ses scise sciint sciid sciut mathse mathin mathid mathut;

between=private city town rural frl pctlat pctblack scifair summer mentor pdlearn pdint calc compsci chem phys;

cluster=sch_id;

Analysis:
type=mixture twolevel;
processors=8(starts);
miteration=5000;
starts=20000 2000;
stiterations=100;

model:
%within%
%overall%
cw ON female latinx black asian other firstgen math ses;

%between%
%overall%
cb ON private city town rural frl pctlat pctblack;

cw on cb;

MODEL cw:
%cw#1%
[scise sciint sciid sciut]
[mathse mathin mathid mathut];

%cw#2%
[scise sciint sciid sciut]
[mathse mathin mathid mathut];

%cw#3%
[scise sciint sciid sciut]
[mathse mathin mathid mathut];

%cw#4%
[scise sciint sciid sciut]
[mathse mathin mathid mathut];
 Bengt O. Muthen posted on Tuesday, September 17, 2019 - 2:08 pm
What is your question? Note that cb should be put on the Between= list.
 Virginia Rangel posted on Tuesday, September 17, 2019 - 2:19 pm
This was my question from above; I ran out of space, sorry. I also cannot send my data file, sorry.

 Bengt O. Muthen posted on Tuesday, September 17, 2019 - 2:56 pm
Are you running version 8.3? If so, the only way we can help you with the potential freezing is to run your data.

Also, you specify:

starts=20000 2000;
stiterations=100;

You should not need that many starts to replicate the best loglikelihood. And you should use the default for Stiterations (which I believe is 20); I have never encountered a situation needing that many Stiterations.
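The advice about STARTS rests on checking whether the best loglikelihood is replicated across the final-stage optimizations, which Mplus lists in its output as final stage loglikelihood values. A minimal Python sketch of that check, assuming hypothetical loglikelihood values copied from such a listing and a tolerance chosen for illustration:

```python
def best_ll_replicated(loglikelihoods, tol=1e-3, min_count=2):
    """Return (best, count, replicated).

    best is the highest final-stage loglikelihood, count is how many starts
    reached it within `tol`, and replicated is count >= min_count.
    """
    best = max(loglikelihoods)
    count = sum(1 for ll in loglikelihoods if abs(ll - best) <= tol)
    return best, count, count >= min_count


# Hypothetical final-stage loglikelihoods copied from an output listing.
lls = [-10234.512, -10234.512, -10236.871, -10234.512, -10240.003]
print(best_ll_replicated(lls))  # (-10234.512, 3, True)
```

If the best value is already reached several times with a moderate number of starts, adding many more starts mainly adds run time.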
 Virginia Rangel posted on Tuesday, September 17, 2019 - 4:31 pm
Yes, I'm running the most recent version of Mplus. I'll make the changes you have suggested and see if that helps. I appreciate the suggestions.
 Youmi Suk posted on Thursday, April 09, 2020 - 3:09 am
Here is my code:

VARIABLE:
USEVARIABLES = Z X1 W1;
WITHIN = X1 ;
BETWEEN = W1 CB;
CLUSTER = id;
CLASSES = CB (2);
CATEGORICAL = Z;

ANALYSIS:
TYPE = TWOLEVEL MIXTURE;
STARTS = 20 3;
ESTIMATOR = ML;

MODEL:
%WITHIN%
%OVERALL%
Z ON X1 ;
%CB#1%
Z ON X1 ;

%BETWEEN%
%OVERALL%
Z on W1 ;
%CB#1%
Z on W1 ;
Z;

individual = i and cluster (e.g., school) = j.
Z_ij = binary individual-level dependent variable
X1_ij = individual-level independent variable
W1_j = cluster-level independent variable
id = cluster id

With the between-level categorical latent variable, K_j, I get cluster-specific posterior probabilities, P(K_j =k | X1_ij, W1_j).
Using Bayes rule, P(K_j =k | X1_ij, W1_j) = [P(K_j=k)P(Z_ij=1 |K_j=k, X1_ij, W1_j)] / [\sum P(K_j=k)P(Z_ij=1 |K_j=k, X1_ij, W1_j)]

P(K_j=k) is the constant across all the observations from the current model only with intercept. P(Z_ij=1 |K_j=k, X1_ij, W1_j) should be individual-specific due to individual-level variable, X1_ij. Then, P(K_j=k)P(Z_ij=1 |K_j=k, X1_ij, W1_j) should be individual-specific. But, we obtain cluster-specific posterior probabilities. Could you please explain how we can get cluster-specific posterior probabilities in this model?
 Tihomir Asparouhov posted on Thursday, April 09, 2020 - 5:18 pm
This is easier to look at if you use a probit link, as it will avoid the numerical integration associated with ZB.

[ZB_j|W1_j,K_j=k] ~N(mb_j, v_j), where mb_j=beta_b*W1_j

[ZW_ij|X1_ij,K_j=k] ~N(mw_ij, 1), where mw_ij=beta_w*X1_ij

P(Z_ij=1 |K_j=k, X1_ij, W1_j)=P(ZW_ij+ZB_j>tau)=1-Phi((tau-mb_j-mw_ij)/sqrt(1+v_j))

where Phi is the standard normal distribution function.
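The probit formula above can be evaluated directly. A minimal Python sketch, assuming hypothetical values for one class k (tau, beta_b, beta_w, and v would all be class-specific in the fitted model); note that this probability depends on X1_ij and is therefore individual-specific:

```python
import math


def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_z1(tau, beta_b, w1, beta_w, x1, v):
    """P(Z_ij = 1 | K_j = k, X1_ij, W1_j), marginal over ZB_j:
    1 - Phi((tau - mb_j - mw_ij) / sqrt(1 + v_j))."""
    mb = beta_b * w1  # between-level mean mb_j
    mw = beta_w * x1  # within-level mean mw_ij
    return 1.0 - normal_cdf((tau - mb - mw) / math.sqrt(1.0 + v))


# Hypothetical values for one class k.
print(round(prob_z1(tau=0.3, beta_b=0.5, w1=1.0, beta_w=0.4, x1=-0.5, v=0.6), 3))
```

The sqrt(1 + v_j) in the denominator comes from adding the unit within-level residual variance to the between-level variance v_j.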
 Youmi Suk posted on Monday, April 13, 2020 - 8:43 am
Thank you for your explanation, but I'm still not getting it. I changed the code, now without W1, to make it more straightforward.

VARIABLE:
USEVARIABLES = Z X1;
WITHIN = X1 ;
BETWEEN = CB;
CLUSTER = id;
CLASSES = CB (2);
CATEGORICAL = Z;

ANALYSIS:
TYPE = TWOLEVEL MIXTURE;
STARTS = 20 3;
ESTIMATOR = ML;

MODEL:
%WITHIN%
%OVERALL%
Z ON X1 ;
%CB#1%
Z ON X1 ;

No %BETWEEN% model (assume an intercept only for the between-level categorical latent variable, CB).

In this setting only with X1_ij, I get cluster-specific posterior probabilities, P(K_j =k | X1_ij) for the between-level categorical latent variable (CB), K_j.
Using Bayes rule, P(K_j =k | X1_ij) = [P(K_j=k)P(Z_ij=1 |K_j=k, X1_ij)] / [\sum P(K_j=k)P(Z_ij=1 |K_j=k, X1_ij )]

P(K_j=k) is the constant from the current model only with intercept. P(Z_ij=1 |K_j=k, X1_ij) should be individual-specific due to individual-level variable, X1_ij. Then, P(K_j=k)P(Z_ij=1 |K_j=k, X1_ij) should be individual-specific. But, we obtain cluster-specific posterior probabilities, P(K_j =k | X1_ij). Could you please elaborate on how to get cluster-specific posterior probabilities, P(K_j =k | X1_ij) in this model? Thank you for your help in advance.
 Tihomir Asparouhov posted on Monday, April 13, 2020 - 7:34 pm
I think I understand now what you are asking. You want to find out how we compute the posterior class probabilities that you get in the SAVEDATA file with the CPROB option.

For that there is no way to avoid the numerical integration, i.e., there is no way to make it into a simple formula. You need this

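P(Z_1j, Z_2j, ... | K_j=k, X1_1j, X1_2j, ...) = integral over ZB_j of [product_over_i P(Z_ij | K_j=k, X1_ij, ZB_j)] f(ZB_j | K_j=k) dZB_j

where f is the density of the between-level random intercept ZB_j. The posterior P(K_j=k | Z_1j, Z_2j, ..., X1_1j, X1_2j, ...) then follows from Bayes rule applied to this cluster-level likelihood, which is why it is cluster-specific.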

All the relevant formulas are in here
http://www.statmodel.com/bmuthen/articles/Article_128.pdf
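The cluster-level computation above can be sketched numerically: the likelihood for cluster j in class k integrates, over ZB_j, the product of the individual probabilities for everyone in the cluster, so Bayes rule operates on the whole cluster's responses and yields one posterior per cluster. A minimal Python sketch, assuming a probit link, ZB_j ~ N(0, v) (no W1, as in the simplified model), and hypothetical parameters. `cluster_likelihood` and `cluster_posterior` are illustrative names, not Mplus routines, and Gauss-Hermite quadrature is used here as one way to do the integration, not necessarily what Mplus does internally:

```python
import math

import numpy as np


def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cluster_likelihood(z, x1, tau, beta_w, v, n_nodes=30):
    """P(Z_1j, Z_2j, ... | K_j = k, X1_1j, X1_2j, ...) for one cluster and one class k.

    Integrates the product over individuals i of P(Z_ij | K_j = k, X1_ij, ZB_j)
    against ZB_j ~ N(0, v) with Gauss-Hermite quadrature (probit link).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    total = 0.0
    for node, weight in zip(nodes, weights):
        zb = math.sqrt(2.0 * v) * node  # change of variables so that ZB_j ~ N(0, v)
        prod = 1.0
        for zi, xi in zip(z, x1):
            p1 = 1.0 - normal_cdf(tau - beta_w * xi - zb)  # P(Z_ij = 1 | ZB_j = zb)
            prod *= p1 if zi == 1 else 1.0 - p1
        total += weight * prod
    return total / math.sqrt(math.pi)


def cluster_posterior(z, x1, class_params, class_probs):
    """Bayes rule on the whole cluster: P(K_j = k | Z_1j, Z_2j, ..., X1_1j, X1_2j, ...)."""
    joint = [pk * cluster_likelihood(z, x1, **params)
             for pk, params in zip(class_probs, class_params)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-class example: class 1 has a lower threshold (more Z = 1).
params = [dict(tau=-0.5, beta_w=0.8, v=0.4), dict(tau=0.7, beta_w=0.8, v=0.4)]
z = [1, 1, 0, 1]            # binary outcomes of the individuals in one cluster
x1 = [0.2, -0.4, 0.1, 1.0]  # their X1 values
print(cluster_posterior(z, x1, params, class_probs=[0.5, 0.5]))
```

The output is a single set of class probabilities for the cluster, even though every individual's X1_ij enters the product inside the integral.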