LCGA and Zero-Inflated Poisson Model

Anonymous posted on Tuesday, March 08, 2005 - 10:45 am

Just a quick question. If I am using c# to capture the zero inflation, do I still have to use the statement ii si | u1#1@0 u2#1@1 u3#1@2 u4#1@3; as found in 8.11? I am also confused about how to read the ii and si in the output, so any suggestions about where I could go to figure this part out would help. I really appreciate it.

Anonymous posted on Tuesday, March 08, 2005 - 10:50 am

I am sorry. I specifically am referring to the means portion of the output. What does it mean when the ii and/or the si have a significant mean?

bmuthen posted on Tuesday, March 08, 2005 - 11:47 am

"ut#1" refers to a dichotomous latent variable where the focus is on the probability of being in the class that cannot obtain an observed count other than zero (the "zero class" at time t). The estimated ii and si means are interpreted as in growth modeling of a binary outcome. For instance, the mean of ii is the logit for the probability of being in the zero class at the time point with time score 0, and the mean of si gives the change in that logit over time.

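As a rough numerical illustration of the interpretation above, the following Python sketch turns a linear growth curve on the logit scale into zero-class probabilities. The values of `ii_mean` and `si_mean` are made up for illustration and do not come from any output in this thread; the time scores 0 to 3 follow the statement quoted in the first post.

```python
import math

def inv_logit(x):
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical values (not taken from any output in this thread).
ii_mean = -1.0   # logit of being in the zero class at time score 0
si_mean = -0.5   # change in that logit per unit of time
time_scores = [0, 1, 2, 3]

# Zero-class probability implied at each time score.
zero_class_probs = [inv_logit(ii_mean + si_mean * t) for t in time_scores]
for t, p in zip(time_scores, zero_class_probs):
    print(f"time score {t}: P(zero class) = {p:.3f}")
```

With a negative slope mean, the implied probability of being in the zero class shrinks at each later time score.
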
Jason Bond posted on Thursday, July 13, 2006 - 10:37 am

Bengt and Linda,

When I try to run a zero-inflated Poisson LCGM, I very often encounter convergence problems. One issue may be the range of the variables (0-365, with a fairly big pile-up at 0: 20-50% of the cases across the 4 waves) and missingness (9-30% of the cases across the 4 waves). Censored and even censored-inflated analyses seem to be a little easier to get to converge. Is the procedure quite sensitive to whether the distributional assumptions of the outcome variables are satisfied? Similar output is obtained for linear instead of quadratic growth. Even a model assuming only a Poisson response (not zero-inflated) would not run, giving the error:
THE LOGLIKELIHOOD DECREASED IN THE LAST EM ITERATION. CHANGE YOUR MODEL
AND/OR STARTING VALUES.

That Poisson model was basically the same as the one below, but without the (i) on the Count statement and without the zero-inflated parameters line. Any thoughts you have would be appreciated.

Jason

------------------------------

Mplus VERSION 3.01
MUTHEN & MUTHEN
07/12/2006 5:18 PM

INPUT INSTRUCTIONS

TITLE: LCA For Number of AA Meetings;

DATA:
FILE IS "I:\MyFiles\Trajectories\AA-Tx-Careers\Rep-Orig-Traj\AA-TX-traj.dat";

VARIABLE:
NAMES = id dataset2 age1829 age3049 gender blckhisp white black hisp
aacapt1 aacapt2 aacapt3 aacapt4;

USEVARIABLES ARE aacapt1 aacapt2 aacapt3 aacapt4;

Classes = C(4);
MISSING ARE ALL (-9);
IDvariable = id;
Count = aacapt1-aacapt4 (i);

SAVEDATA:
FILE = "I:\MyFiles\Trajectories\AA-Tx-Careers\Rep-Orig-Traj\output.out";
SAVE = CPROBABILITIES;

ANALYSIS:
TYPE = Mixture Missing;
STARTS = 10 2;

OUTPUT:
TECH1 TECH8;

PLOT:
Type is PLOT3;
Series = aacapt1 (0) aacapt2 (1) aacapt3 (3) aacapt4 (5);

MODEL:
%OVERALL%
i s q | aacapt1@0 aacapt2@1 aacapt3@3 aacapt4@5;
ii si qi | aacapt1#1@0 aacapt2#1@1 aacapt3#1@3 aacapt4#1@5;

INPUT READING TERMINATED NORMALLY

LCA For Number of AA Meetings;

SUMMARY OF ANALYSIS

Number of groups 1
Number of observations 389

Number of dependent variables 4
Number of independent variables 0
Number of continuous latent variables 6
Number of categorical latent variables 1

Observed dependent variables

Count
AACAPT1 AACAPT2 AACAPT3 AACAPT4

Continuous latent variables
I S Q II SI QI

Categorical latent variables
C

Variables with special functions

ID variable ID

Estimator MLR
Information matrix OBSERVED
Optimization Specifications for the Quasi-Newton Algorithm for
Continuous Outcomes
Maximum number of iterations 1000
Convergence criterion 0.100D-05
Optimization Specifications for the EM Algorithm
Maximum number of iterations 500
Convergence criteria
Loglikelihood change 0.100D-06
Relative loglikelihood change 0.100D-06
Derivative 0.100D-05
Optimization Specifications for the M step of the EM Algorithm for
Categorical Latent variables
Number of M step iterations 1
M step convergence criterion 0.100D-05
Basis for M step termination ITERATION
Optimization Specifications for the M step of the EM Algorithm for
Censored, Binary or Ordered Categorical (Ordinal), Unordered
Categorical (Nominal) and Count Outcomes
Number of M step iterations 1
M step convergence criterion 0.100D-05
Basis for M step termination ITERATION
Maximum value for logit thresholds 15
Minimum value for logit thresholds -15
Minimum expected cell size for chi-square 0.100D-01
Maximum number of iterations for H1 2000
Convergence criterion for H1 0.100D-03
Optimization algorithm EMA
Random Starts Specifications
Number of initial stage starts 10
Number of final stage starts 2
Number of initial stage iterations 10
Initial stage convergence criterion 0.100D+01
Random starts scale 0.500D+01
Random seed for generating random starts 0

Input data file(s)
I:\MyFiles\Trajectories\AA-Tx-Careers\Rep-Orig-Traj\AA-TX-traj.dat
Input data format FREE

SUMMARY OF DATA

Number of patterns 0
Number of y patterns 0
Number of u patterns 0

COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value 0.100

RANDOM STARTS RESULTS RANKED FROM THE BEST TO THE WORST LOGLIKELIHOOD VALUES

Initial stage loglikelihood values, seeds, and initial stage start numbers:

-51887.783 462953 7
-51887.783 127215 9
-51887.784 939021 8
-51887.787 903420 5
-60970.569 unperturbed 0

6 perturbed starting value run(s) did not converge.

Loglikelihood values at local maxima, seeds, and initial stage start numbers:

-51887.783 462953 7
-51887.783 127215 9

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE
TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE
FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING
VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE
CONDITION NUMBER IS -0.178D-16. PROBLEM INVOLVING PARAMETER 10.

ONE OR MORE MULTINOMIAL LOGIT PARAMETERS WERE FIXED TO AVOID SINGULARITY
OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE
MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT
DISTRIBUTION OF THE CATEGORICAL LATENT VARIABLES AND ANY INDEPENDENT
VARIABLES. THE FOLLOWING PARAMETERS WERE FIXED:
2 3
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE
COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE
AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR
STARTING VALUES. PROBLEM INVOLVING PARAMETER 5.

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THE ESTIMATED MODEL

Latent
Classes

1 29.25021 0.07519
2 29.25021 0.07519
3 301.24938 0.77442
4 29.25021 0.07519

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
BASED ON ESTIMATED POSTERIOR PROBABILITIES

Latent
Classes

1 29.25021 0.07519
2 29.25021 0.07519
3 301.24938 0.77442
4 29.25021 0.07519

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

Class Counts and Proportions

Latent
Classes

1 26 0.06684
2 23 0.05913
3 308 0.79177
4 32 0.08226

Average Latent Class Probabilities for Most Likely Latent Class Membership (Row)
by Latent Class (Column)

1 2 3 4

1 0.250 0.250 0.250 0.250
2 0.250 0.250 0.250 0.250
3 0.029 0.029 0.912 0.029
4 0.250 0.250 0.250 0.250

MODEL RESULTS

Estimates

Latent Class 1

I |
AACAPT1 1.000
AACAPT2 1.000
AACAPT3 1.000
AACAPT4 1.000

S |
AACAPT1 0.000
AACAPT2 1.000
AACAPT3 3.000
AACAPT4 5.000

Q |
AACAPT1 0.000
AACAPT2 1.000
AACAPT3 9.000
AACAPT4 25.000

II |
AACAPT1#1 1.000
AACAPT2#1 1.000
AACAPT3#1 1.000
AACAPT4#1 1.000

SI |
AACAPT1#1 0.000
AACAPT2#1 1.000
AACAPT3#1 3.000
AACAPT4#1 5.000

QI |
AACAPT1#1 0.000
AACAPT2#1 1.000
AACAPT3#1 9.000
AACAPT4#1 25.000

Intercepts
AACAPT1#1 -1.471
AACAPT1 0.000
AACAPT2#1 -1.471
AACAPT2 0.000
AACAPT3#1 -1.471
AACAPT3 0.000
AACAPT4#1 -1.471
AACAPT4 0.000

Means
I 4.735
S 0.355
Q -0.073
II 0.000
SI 0.377
QI -0.011

Latent Class 2

I |
AACAPT1 1.000
AACAPT2 1.000
AACAPT3 1.000
AACAPT4 1.000

S |
AACAPT1 0.000
AACAPT2 1.000
AACAPT3 3.000
AACAPT4 5.000

Q |
AACAPT1 0.000
AACAPT2 1.000
AACAPT3 9.000
AACAPT4 25.000

II |
AACAPT1#1 1.000
AACAPT2#1 1.000
AACAPT3#1 1.000
AACAPT4#1 1.000

SI |
AACAPT1#1 0.000
AACAPT2#1 1.000
AACAPT3#1 3.000
AACAPT4#1 5.000

QI |
AACAPT1#1 0.000
AACAPT2#1 1.000
AACAPT3#1 9.000
AACAPT4#1 25.000

Intercepts
AACAPT1#1 -1.471
AACAPT1 0.000
AACAPT2#1 -1.471
AACAPT2 0.000
AACAPT3#1 -1.471
AACAPT3 0.000
AACAPT4#1 -1.471
AACAPT4 0.000

Means
I 4.735
S 0.355
Q -0.073
II 0.000
SI 0.377
QI -0.011

Latent Class 3

I |
AACAPT1 1.000
AACAPT2 1.000
AACAPT3 1.000
AACAPT4 1.000

S |
AACAPT1 0.000
AACAPT2 1.000
AACAPT3 3.000
AACAPT4 5.000

Q |
AACAPT1 0.000
AACAPT2 1.000
AACAPT3 9.000
AACAPT4 25.000

II |
AACAPT1#1 1.000
AACAPT2#1 1.000
AACAPT3#1 1.000
AACAPT4#1 1.000

SI |
AACAPT1#1 0.000
AACAPT2#1 1.000
AACAPT3#1 3.000
AACAPT4#1 5.000

QI |
AACAPT1#1 0.000
AACAPT2#1 1.000
AACAPT3#1 9.000
AACAPT4#1 25.000

Intercepts
AACAPT1#1 -1.471
AACAPT1 0.000
AACAPT2#1 -1.471
AACAPT2 0.000
AACAPT3#1 -1.471
AACAPT3 0.000
AACAPT4#1 -1.471
AACAPT4 0.000

Means
I 3.524
S 0.490
Q -0.088
II 0.000
SI 0.377
QI -0.011

Latent Class 4

I |
AACAPT1 1.000
AACAPT2 1.000
AACAPT3 1.000
AACAPT4 1.000

S |
AACAPT1 0.000
AACAPT2 1.000
AACAPT3 3.000
AACAPT4 5.000

Q |
AACAPT1 0.000
AACAPT2 1.000
AACAPT3 9.000
AACAPT4 25.000

II |
AACAPT1#1 1.000
AACAPT2#1 1.000
AACAPT3#1 1.000
AACAPT4#1 1.000

SI |
AACAPT1#1 0.000
AACAPT2#1 1.000
AACAPT3#1 3.000
AACAPT4#1 5.000

QI |
AACAPT1#1 0.000
AACAPT2#1 1.000
AACAPT3#1 9.000
AACAPT4#1 25.000

Intercepts
AACAPT1#1 -1.471
AACAPT1 0.000
AACAPT2#1 -1.471
AACAPT2 0.000
AACAPT3#1 -1.471
AACAPT3 0.000
AACAPT4#1 -1.471
AACAPT4 0.000

Means
I 4.735
S 0.355
Q -0.073
II 0.000
SI 0.377
QI -0.011

Categorical Latent Variables

Means
C#1 0.000
C#2 0.000
C#3 2.332

TECHNICAL 1 OUTPUT

PARAMETER SPECIFICATION FOR LATENT CLASS 1

PARAMETER SPECIFICATION FOR LATENT CLASS 2

PARAMETER SPECIFICATION FOR LATENT CLASS 3

PARAMETER SPECIFICATION FOR LATENT CLASS 4

PARAMETER SPECIFICATION FOR LATENT CLASS REGRESSION MODEL PART

ALPHA(C)
C#1 C#2 C#3 C#4
________ ________ ________ ________
1 1 2 3 0

PARAMETER SPECIFICATION FOR THE CENSORED/NOMINAL/COUNT MODEL PART

NU(P) FOR LATENT CLASS 1
AACAPT1# AACAPT1 AACAPT2# AACAPT2 AACAPT3#
________ ________ ________ ________ ________
1 4 0 4 0 4

NU(P) FOR LATENT CLASS 1
AACAPT3 AACAPT4# AACAPT4
________ ________ ________
1 0 4 0

LAMBDA(P) FOR LATENT CLASS 1
I S Q II SI
________ ________ ________ ________ ________
AACAPT1# 0 0 0 0 0
AACAPT1 0 0 0 0 0
AACAPT2# 0 0 0 0 0
AACAPT2 0 0 0 0 0
AACAPT3# 0 0 0 0 0
AACAPT3 0 0 0 0 0
AACAPT4# 0 0 0 0 0
AACAPT4 0 0 0 0 0

LAMBDA(P) FOR LATENT CLASS 1
QI
________
AACAPT1# 0
AACAPT1 0
AACAPT2# 0
AACAPT2 0
AACAPT3# 0
AACAPT3 0
AACAPT4# 0
AACAPT4 0

ALPHA(P) FOR LATENT CLASS 1
I S Q II SI
________ ________ ________ ________ ________
1 5 6 7 0 8

ALPHA(P) FOR LATENT CLASS 1
QI
________
1 9

NU(P) FOR LATENT CLASS 2
AACAPT1# AACAPT1 AACAPT2# AACAPT2 AACAPT3#
________ ________ ________ ________ ________
1 4 0 4 0 4

NU(P) FOR LATENT CLASS 2
AACAPT3 AACAPT4# AACAPT4
________ ________ ________
1 0 4 0

LAMBDA(P) FOR LATENT CLASS 2
I S Q II SI
________ ________ ________ ________ ________
AACAPT1# 0 0 0 0 0
AACAPT1 0 0 0 0 0
AACAPT2# 0 0 0 0 0
AACAPT2 0 0 0 0 0
AACAPT3# 0 0 0 0 0
AACAPT3 0 0 0 0 0
AACAPT4# 0 0 0 0 0
AACAPT4 0 0 0 0 0

LAMBDA(P) FOR LATENT CLASS 2
QI
________
AACAPT1# 0
AACAPT1 0
AACAPT2# 0
AACAPT2 0
AACAPT3# 0
AACAPT3 0
AACAPT4# 0
AACAPT4 0

ALPHA(P) FOR LATENT CLASS 2
I S Q II SI
________ ________ ________ ________ ________
1 10 11 12 0 8

ALPHA(P) FOR LATENT CLASS 2
QI
________
1 9

NU(P) FOR LATENT CLASS 3
AACAPT1# AACAPT1 AACAPT2# AACAPT2 AACAPT3#
________ ________ ________ ________ ________
1 4 0 4 0 4

NU(P) FOR LATENT CLASS 3
AACAPT3 AACAPT4# AACAPT4
________ ________ ________
1 0 4 0

LAMBDA(P) FOR LATENT CLASS 3
I S Q II SI
________ ________ ________ ________ ________
AACAPT1# 0 0 0 0 0
AACAPT1 0 0 0 0 0
AACAPT2# 0 0 0 0 0
AACAPT2 0 0 0 0 0
AACAPT3# 0 0 0 0 0
AACAPT3 0 0 0 0 0
AACAPT4# 0 0 0 0 0
AACAPT4 0 0 0 0 0

LAMBDA(P) FOR LATENT CLASS 3
QI
________
AACAPT1# 0
AACAPT1 0
AACAPT2# 0
AACAPT2 0
AACAPT3# 0
AACAPT3 0
AACAPT4# 0
AACAPT4 0

ALPHA(P) FOR LATENT CLASS 3
I S Q II SI
________ ________ ________ ________ ________
1 13 14 15 0 8

ALPHA(P) FOR LATENT CLASS 3
QI
________
1 9

NU(P) FOR LATENT CLASS 4
AACAPT1# AACAPT1 AACAPT2# AACAPT2 AACAPT3#
________ ________ ________ ________ ________
1 4 0 4 0 4

NU(P) FOR LATENT CLASS 4
AACAPT3 AACAPT4# AACAPT4
________ ________ ________
1 0 4 0

LAMBDA(P) FOR LATENT CLASS 4
I S Q II SI
________ ________ ________ ________ ________
AACAPT1# 0 0 0 0 0
AACAPT1 0 0 0 0 0
AACAPT2# 0 0 0 0 0
AACAPT2 0 0 0 0 0
AACAPT3# 0 0 0 0 0
AACAPT3 0 0 0 0 0
AACAPT4# 0 0 0 0 0
AACAPT4 0 0 0 0 0

LAMBDA(P) FOR LATENT CLASS 4
QI
________
AACAPT1# 0
AACAPT1 0
AACAPT2# 0
AACAPT2 0
AACAPT3# 0
AACAPT3 0
AACAPT4# 0
AACAPT4 0

ALPHA(P) FOR LATENT CLASS 4
I S Q II SI
________ ________ ________ ________ ________
1 16 17 18 0 8

ALPHA(P) FOR LATENT CLASS 4
QI
________
1 9

STARTING VALUES FOR LATENT CLASS 1

STARTING VALUES FOR LATENT CLASS 2

STARTING VALUES FOR LATENT CLASS 3

STARTING VALUES FOR LATENT CLASS 4

STARTING VALUES FOR LATENT CLASS REGRESSION MODEL PART

ALPHA(C)
C#1 C#2 C#3 C#4
________ ________ ________ ________
1 0.000 0.000 0.000 0.000

STARTING VALUES FOR THE CENSORED/NOMINAL/COUNT MODEL PART

NU(P) FOR LATENT CLASS 1
AACAPT1# AACAPT1 AACAPT2# AACAPT2 AACAPT3#
________ ________ ________ ________ ________
1 -0.321 0.000 -0.321 0.000 -0.321

NU(P) FOR LATENT CLASS 1
AACAPT3 AACAPT4# AACAPT4
________ ________ ________
1 0.000 -0.321 0.000

LAMBDA(P) FOR LATENT CLASS 1
I S Q II SI
________ ________ ________ ________ ________
AACAPT1# 0.000 0.000 0.000 1.000 0.000
AACAPT1 1.000 0.000 0.000 0.000 0.000
AACAPT2# 0.000 0.000 0.000 1.000 1.000
AACAPT2 1.000 1.000 1.000 0.000 0.000
AACAPT3# 0.000 0.000 0.000 1.000 3.000
AACAPT3 1.000 3.000 9.000 0.000 0.000
AACAPT4# 0.000 0.000 0.000 1.000 5.000
AACAPT4 1.000 5.000 25.000 0.000 0.000

LAMBDA(P) FOR LATENT CLASS 1
QI
________
AACAPT1# 0.000
AACAPT1 0.000
AACAPT2# 1.000
AACAPT2 0.000
AACAPT3# 9.000
AACAPT3 0.000
AACAPT4# 25.000
AACAPT4 0.000

ALPHA(P) FOR LATENT CLASS 1
I S Q II SI
________ ________ ________ ________ ________
1 0.000 0.000 0.000 0.000 0.000

ALPHA(P) FOR LATENT CLASS 1
QI
________
1 0.000

NU(P) FOR LATENT CLASS 2
AACAPT1# AACAPT1 AACAPT2# AACAPT2 AACAPT3#
________ ________ ________ ________ ________
1 -0.321 0.000 -0.321 0.000 -0.321

NU(P) FOR LATENT CLASS 2
AACAPT3 AACAPT4# AACAPT4
________ ________ ________
1 0.000 -0.321 0.000

LAMBDA(P) FOR LATENT CLASS 2
I S Q II SI
________ ________ ________ ________ ________
AACAPT1# 0.000 0.000 0.000 1.000 0.000
AACAPT1 1.000 0.000 0.000 0.000 0.000
AACAPT2# 0.000 0.000 0.000 1.000 1.000
AACAPT2 1.000 1.000 1.000 0.000 0.000
AACAPT3# 0.000 0.000 0.000 1.000 3.000
AACAPT3 1.000 3.000 9.000 0.000 0.000
AACAPT4# 0.000 0.000 0.000 1.000 5.000
AACAPT4 1.000 5.000 25.000 0.000 0.000

LAMBDA(P) FOR LATENT CLASS 2
QI
________
AACAPT1# 0.000
AACAPT1 0.000
AACAPT2# 1.000
AACAPT2 0.000
AACAPT3# 9.000
AACAPT3 0.000
AACAPT4# 25.000
AACAPT4 0.000

ALPHA(P) FOR LATENT CLASS 2
I S Q II SI
________ ________ ________ ________ ________
1 0.000 0.000 0.000 0.000 0.000

ALPHA(P) FOR LATENT CLASS 2
QI
________
1 0.000

NU(P) FOR LATENT CLASS 3
AACAPT1# AACAPT1 AACAPT2# AACAPT2 AACAPT3#
________ ________ ________ ________ ________
1 -0.321 0.000 -0.321 0.000 -0.321

NU(P) FOR LATENT CLASS 3
AACAPT3 AACAPT4# AACAPT4
________ ________ ________
1 0.000 -0.321 0.000

LAMBDA(P) FOR LATENT CLASS 3
I S Q II SI
________ ________ ________ ________ ________
AACAPT1# 0.000 0.000 0.000 1.000 0.000
AACAPT1 1.000 0.000 0.000 0.000 0.000
AACAPT2# 0.000 0.000 0.000 1.000 1.000
AACAPT2 1.000 1.000 1.000 0.000 0.000
AACAPT3# 0.000 0.000 0.000 1.000 3.000
AACAPT3 1.000 3.000 9.000 0.000 0.000
AACAPT4# 0.000 0.000 0.000 1.000 5.000
AACAPT4 1.000 5.000 25.000 0.000 0.000

LAMBDA(P) FOR LATENT CLASS 3
QI
________
AACAPT1# 0.000
AACAPT1 0.000
AACAPT2# 1.000
AACAPT2 0.000
AACAPT3# 9.000
AACAPT3 0.000
AACAPT4# 25.000
AACAPT4 0.000

ALPHA(P) FOR LATENT CLASS 3
I S Q II SI
________ ________ ________ ________ ________
1 0.000 0.000 0.000 0.000 0.000

ALPHA(P) FOR LATENT CLASS 3
QI
________
1 0.000

NU(P) FOR LATENT CLASS 4
AACAPT1# AACAPT1 AACAPT2# AACAPT2 AACAPT3#
________ ________ ________ ________ ________
1 -0.321 0.000 -0.321 0.000 -0.321

NU(P) FOR LATENT CLASS 4
AACAPT3 AACAPT4# AACAPT4
________ ________ ________
1 0.000 -0.321 0.000

LAMBDA(P) FOR LATENT CLASS 4
I S Q II SI
________ ________ ________ ________ ________
AACAPT1# 0.000 0.000 0.000 1.000 0.000
AACAPT1 1.000 0.000 0.000 0.000 0.000
AACAPT2# 0.000 0.000 0.000 1.000 1.000
AACAPT2 1.000 1.000 1.000 0.000 0.000
AACAPT3# 0.000 0.000 0.000 1.000 3.000
AACAPT3 1.000 3.000 9.000 0.000 0.000
AACAPT4# 0.000 0.000 0.000 1.000 5.000
AACAPT4 1.000 5.000 25.000 0.000 0.000

LAMBDA(P) FOR LATENT CLASS 4
QI
________
AACAPT1# 0.000
AACAPT1 0.000
AACAPT2# 1.000
AACAPT2 0.000
AACAPT3# 9.000
AACAPT3 0.000
AACAPT4# 25.000
AACAPT4 0.000

ALPHA(P) FOR LATENT CLASS 4
I S Q II SI
________ ________ ________ ________ ________
1 0.000 0.000 0.000 0.000 0.000

ALPHA(P) FOR LATENT CLASS 4
QI
________
1 0.000

TECHNICAL 8 OUTPUT

INITIAL STAGE ITERATIONS

TECHNICAL 8 OUTPUT FOR UNPERTURBED STARTING VALUE SET

ITER LOGLIKELIHOOD ABS CHANGE REL CHANGE CLASS COUNTS ALGORITHM
1 -0.66904778D+05 0.0000000 0.0000000 97.250 97.250 EM
97.250 97.250
2 -0.60970582D+05 5934.1962440 0.0886962 97.248 97.251 EM
97.253 97.248
3 -0.60970569D+05 0.0126947 0.0000002 97.145 97.262 EM
97.376 97.216

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 1

ITER LOGLIKELIHOOD ABS CHANGE REL CHANGE CLASS COUNTS ALGORITHM
1 -0.65423857D+05 0.0000000 0.0000000 65.990 149.512 EM
95.524 77.974
2 -0.53137657D+05 ************ 0.1877939 32.262 294.741 EM
32.461 29.536
3 -0.51867844D+05 1269.8122781 0.0238967 29.250 301.250 EM
29.250 29.250
4 -0.51887949D+05 -20.1047530 -0.0003876 29.250 301.250 EM
29.250 29.250

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 2

ITER LOGLIKELIHOOD ABS CHANGE REL CHANGE CLASS COUNTS ALGORITHM
1 -0.72616222D+05 0.0000000 0.0000000 62.402 58.775 EM
51.383 216.440
2 -0.57352476D+05 ************ 0.2101975 30.821 31.049 EM
32.522 294.608
3 -0.51783555D+05 5568.9212783 0.0970999 29.250 29.251 EM
29.253 301.245
4 -0.51888221D+05 -104.6658649 -0.0020212 29.250 29.250 EM
29.250 301.249

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 3

ITER LOGLIKELIHOOD ABS CHANGE REL CHANGE CLASS COUNTS ALGORITHM
1 -0.67321388D+05 0.0000000 0.0000000 56.303 59.723 EM
141.281 131.693
2 -0.56825460D+05 ************ 0.1559078 30.240 32.082 EM
292.102 34.576
3 -0.52827678D+05 3997.7819975 0.0703520 29.422 29.424 EM
300.708 29.446
4 -0.52901771D+05 -74.0925036 -0.0014025 29.404 29.404 EM
300.787 29.404

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 4

ITER LOGLIKELIHOOD ABS CHANGE REL CHANGE CLASS COUNTS ALGORITHM
1 -0.59878285D+05 0.0000000 0.0000000 107.659 42.138 EM
115.763 123.440
2 -0.56451827D+05 3426.4580370 0.0572237 124.445 30.052 EM
200.179 34.324
3 -0.54143781D+05 2308.0451419 0.0408852 48.404 29.446 EM
281.704 29.446
4 -0.53064222D+05 1079.5598127 0.0199388 29.473 29.419 EM
300.689 29.419
5 -0.52902861D+05 161.3608945 0.0030409 29.405 29.405 EM
300.785 29.405
6 -0.52903814D+05 -0.9535982 -0.0000180 29.407 29.407 EM
300.779 29.407

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 5

ITER LOGLIKELIHOOD ABS CHANGE REL CHANGE CLASS COUNTS ALGORITHM
1 -0.66884666D+05 0.0000000 0.0000000 71.464 119.866 EM
110.843 86.827
2 -0.55244074D+05 ************ 0.1740398 31.676 292.003 EM
33.073 32.248
3 -0.52043695D+05 3200.3790473 0.0579316 29.250 301.250 EM
29.250 29.250
4 -0.51888655D+05 155.0405820 0.0029790 29.250 301.250 EM
29.250 29.250
5 -0.51887787D+05 0.8675706 0.0000167 29.250 301.249 EM
29.250 29.250

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 6

ITER LOGLIKELIHOOD ABS CHANGE REL CHANGE CLASS COUNTS ALGORITHM
1 -0.66852854D+05 0.0000000 0.0000000 100.432 73.852 EM
83.106 131.609
2 -0.57556789D+05 9296.0651102 0.1390526 145.265 31.968 EM
35.019 176.747
3 -0.55004486D+05 2552.3035206 0.0443441 187.576 30.108 EM
30.108 141.209
4 -0.54512109D+05 492.3769669 0.0089516 208.005 29.501 EM
29.501 121.992
5 -0.54150738D+05 361.3708978 0.0066292 289.204 29.420 EM
29.420 40.955
6 -0.53003087D+05 1147.6512364 0.0211936 300.734 29.422 EM
29.422 29.422
7 -0.52901865D+05 101.2214061 0.0019097 300.787 29.404 EM
29.404 29.404
8 -0.52903923D+05 -2.0572518 -0.0000389 300.779 29.407 EM
29.407 29.407

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 7

ITER LOGLIKELIHOOD ABS CHANGE REL CHANGE CLASS COUNTS ALGORITHM
1 -0.84085127D+05 0.0000000 0.0000000 88.851 60.809 EM
174.555 64.786
2 -0.56756134D+05 ************ 0.3250158 78.879 32.274 EM
244.466 33.381
3 -0.54042887D+05 2713.2471442 0.0478054 41.906 29.752 EM
287.591 29.752
4 -0.52376326D+05 1666.5613406 0.0308378 29.251 29.251 EM
301.246 29.251
5 -0.51905521D+05 470.8049273 0.0089889 29.250 29.250 EM
301.249 29.250
6 -0.51887802D+05 17.7192435 0.0003414 29.250 29.250 EM
301.249 29.250
7 -0.51887783D+05 0.0189753 0.0000004 29.250 29.250 EM
301.249 29.250

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 8

ITER LOGLIKELIHOOD ABS CHANGE REL CHANGE CLASS COUNTS ALGORITHM
1 -0.71135676D+05 0.0000000 0.0000000 112.422 90.486 EM
94.292 91.800
2 -0.57277354D+05 ************ 0.1948154 211.785 33.742 EM
33.631 109.841
3 -0.53220600D+05 4056.7539099 0.0708265 294.926 29.257 EM
29.257 35.560
4 -0.51906613D+05 1313.9871940 0.0246894 301.250 29.250 EM
29.250 29.250
5 -0.51887819D+05 18.7942098 0.0003621 301.250 29.250 EM
29.250 29.250
6 -0.51887784D+05 0.0343058 0.0000007 301.249 29.250 EM
29.250 29.250

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 9

ITER LOGLIKELIHOOD ABS CHANGE REL CHANGE CLASS COUNTS ALGORITHM
1 -0.78133250D+05 0.0000000 0.0000000 59.251 59.376 EM
77.906 192.468
2 -0.56783824D+05 ************ 0.2732438 32.630 32.635 EM
40.482 283.254
3 -0.52338884D+05 4444.9408355 0.0782783 29.250 29.250 EM
29.250 301.249
4 -0.51905364D+05 433.5195405 0.0082829 29.250 29.250 EM
29.250 301.249
5 -0.51887812D+05 17.5517942 0.0003381 29.250 29.250 EM
29.250 301.249
6 -0.51887783D+05 0.0294882 0.0000006 29.250 29.250 EM
29.250 301.249

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 10

ITER LOGLIKELIHOOD ABS CHANGE REL CHANGE CLASS COUNTS ALGORITHM
1 -0.76622768D+05 0.0000000 0.0000000 71.071 64.460 EM
123.621 129.848
2 -0.98855630D+05 ************ -0.2901600 90.931 104.362 EM
113.457 80.250

FINAL STAGE ITERATIONS

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 7

7 -0.51887783D+05 0.0189753 0.0000004 29.250 29.250 EM
301.249 29.250
8 -0.51887783D+05 0.0000000 0.0000000 29.250 29.250 EM
301.249 29.250

TECHNICAL 8 OUTPUT FOR STARTING VALUE SET 9

6 -0.51887783D+05 0.0294882 0.0000006 29.250 29.250 EM
29.250 301.249
7 -0.51887783D+05 0.0000010 0.0000000 29.250 29.250 EM
29.250 301.249
8 -0.51887783D+05 0.0000000 0.0000000 29.250 29.250 EM
29.250 301.249

SAVEDATA INFORMATION

Order and format of variables

AACAPT1 F10.3
AACAPT2 F10.3
AACAPT3 F10.3
AACAPT4 F10.3
ID F10.3
CPROB1 F10.3
CPROB2 F10.3
CPROB3 F10.3
CPROB4 F10.3
C F10.3

Save file
I:\MyFiles\Trajectories\AA-Tx-Careers\Rep-Orig-Traj\output.out

Save file format
10F10.3

Save file record length 1000

Beginning Time: 17:18:36
Ending Time: 17:18:53
Elapsed Time: 00:00:17

Linda K. Muthen posted on Friday, July 14, 2006 - 3:40 pm

There have been changes to the Poisson algorithm since Version 3.01. You should upgrade to the most recent version of Mplus. I think your problems may be solved by this.
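
The decrease behind the error message can be seen directly in the TECH8 output posted above. The Python sketch below uses the four loglikelihood values printed for starting value set 1, recomputes the change at each iteration (including the one printed as asterisks, presumably because it did not fit the print field), and reports the first iteration at which the loglikelihood drops. The helper name `first_decrease` is only illustrative.

```python
# Loglikelihood values copied from the TECH8 output for starting value set 1.
loglik = [-65423.857, -53137.657, -51867.844, -51887.949]

def first_decrease(values):
    """Return the 1-based iteration at which the loglikelihood first drops, or None."""
    for k in range(1, len(values)):
        if values[k] < values[k - 1]:
            return k + 1
    return None

# Change from each iteration to the next; positive means improvement.
changes = [b - a for a, b in zip(loglik, loglik[1:])]
print([round(c, 3) for c in changes])
print("first decrease at iteration:", first_decrease(loglik))
```

In a standard EM algorithm the loglikelihood is not expected to go down from one iteration to the next, so a negative change like the one at iteration 4 is the kind of event the error message refers to.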

Elia Femia posted on Thursday, August 31, 2006 - 3:36 pm

Referring to the post from March 8, 2005, I'd like to clarify the meaning of a significant estimated si mean. In this part of the model, are we predicting the probability of being in the zero class? If I have group membership as a covariate (0=control and 1=treatment), and the si coefficient is negative and significant, is that interpreted as the control group having a higher probability of being in the zero class over time? And what if, in addition, the qi coefficient is also significant (and positive)?

Thank you for your help.

Bengt O. Muthen posted on Thursday, August 31, 2006 - 6:43 pm

That post referred to zero-inflated Poisson modeling. So this is a 2-class model in line with the Roeder et al. (1999) JASA article. One class of people follows the regular Poisson, where the dependent variable is the log rate for the counts. The other class of people can only have zero counts, and here the dependent variable is the probability of being in this zero class. See ex 6.7 in the User's Guide. "si" in that notation refers to the inflation part and is the slope in the growth model for changes over time in individual probabilities of being in the zero class. A negative significant si slope mean implies that the probability goes down over time. Regressing si on a covariate, you have 2 parameters: the intercept and the slope in this regression. If you are saying that this latter slope is negative, then yes, your interpretation is correct. I won't answer the qi part because it is not clear whether you refer to the mean of qi or the regression of qi on the covariate. In any case, it is always difficult in any growth model to single out effects on linear and quadratic slopes.
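
To make the covariate interpretation concrete, here is a minimal Python sketch of the regression of si on a 0/1 treatment indicator. Every number in it is hypothetical, the growth factors are treated as fixed (no variances), and both groups are given the same starting logit purely to isolate the effect of the slope.

```python
import math

def inv_logit(x):
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# All values are hypothetical and chosen only for illustration.
ii_intercept = -1.0   # logit of the zero class at time score 0 (same in both groups)
si_intercept = 0.2    # intercept of the regression of si on x (slope for control, x = 0)
si_on_x = -0.4        # slope of the regression of si on x (x = 1 is treatment)
time_scores = [0, 1, 2, 3]

def zero_class_probs(x):
    """Zero-class probabilities over time for covariate value x."""
    si = si_intercept + si_on_x * x
    return [inv_logit(ii_intercept + si * t) for t in time_scores]

control = zero_class_probs(0)
treatment = zero_class_probs(1)
for t, pc, pt in zip(time_scores, control, treatment):
    print(f"time score {t}: control {pc:.3f}, treatment {pt:.3f}")
```

With a negative regression slope, the treatment group's logit grows more slowly (here it declines), so after time score 0 the control group has the higher probability of being in the zero class.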

B Lee posted on Tuesday, March 27, 2007 - 8:22 pm

Regarding the above message, with ZIP in a growth model over several time points: does ZIP assume that a portion of the sample will be zero at every single time point?
Or is it that at each time point there are more individuals at zero than expected for a regular Poisson, but not necessarily the same individuals at zero at every time point?

My data reflect the latter situation. Trying to fit a ZIP model results in the mean of the growth inflation term being zero at each time point. Does this suggest that Poisson (without inflation) would be a better fit?

Bengt O. Muthen posted on Wednesday, March 28, 2007 - 9:14 am

On your first paragraph, ZIP growth modeling lets people move in and out of zero across the time points, so it is not necessarily the same individuals at zero at every time point. ZIP mixture modeling can additionally be used to specify a zero class throughout, if you want that.

Inflation estimates close to -15 suggest that inflation is not needed. I don't know why you would get (exactly) zero values at each time point unless you inadvertently specify that.
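
A small Python sketch of the zero-inflated Poisson probability function may help show why an inflation logit near -15 means inflation is not needed. The rate `lam` and the inflation logit of -1.0 are hypothetical; only the value -15 comes from the discussion above.

```python
import math

def zip_pmf(y, lam, logit_pi):
    """P(Y = y) for a zero-inflated Poisson with rate lam and inflation logit logit_pi."""
    pi = 1.0 / (1.0 + math.exp(-logit_pi))            # probability of the zero class
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    extra_zero = pi if y == 0 else 0.0                # the zero class only produces zeros
    return extra_zero + (1.0 - pi) * poisson

lam = 2.0                                        # hypothetical Poisson rate
p0_poisson = math.exp(-lam)                      # P(Y = 0) under a plain Poisson
p0_inflated = zip_pmf(0, lam, logit_pi=-1.0)     # hypothetical inflation logit
p0_boundary = zip_pmf(0, lam, logit_pi=-15.0)    # inflation logit at -15

print(f"Poisson P(0) = {p0_poisson:.6f}")
print(f"ZIP P(0), logit -1  = {p0_inflated:.6f}")
print(f"ZIP P(0), logit -15 = {p0_boundary:.6f}")
```

At a logit of -15 the zero-class probability is so close to zero that the ZIP distribution is practically indistinguishable from the ordinary Poisson.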

Michael Hallquist posted on Tuesday, May 22, 2007 - 8:03 pm

With a ZIP latent growth model, I am somewhat confused about the use of random effects versus fixed effects. I am fitting a ZIP LGM with three time points. The slope parameter for the binary part is negligible, so I dropped it such that the binary part is a means model, not a growth model.

When I freely estimate variances for both count and binary intercepts, I get a singularity of the information matrix (possible underidentification?). When I constrain the variance for the binary intercept part to zero (making it a fixed effect), the estimated inflation probability across waves is 9.4%. When I constrain the variance for the count part to zero (fixed effect), the estimated inflation probability jumps to 48.3%. Reading over Hall's (2000) article on ZIP and ZIB regression with random effects, I am inclined to allow the random intercept effect for the count part of the model.

The AIC favors the count random intercept model over the binary random intercept model. But what can I make of the vast differences in inflation probabilities? And, do I really need the inflation component? The non-inflated model AIC is basically comparable, but there is a preponderance (~50% at each wave) of zeros in the data, suggesting the utility of zero-inflation.

Thanks for your help.

Linda K. Muthen posted on Wednesday, May 23, 2007 - 6:06 pm

I would look at the following models:

1. Count model without inflation and with random intercept and slope growth factors: see BIC.
2. Count inflated model with no growth model for the inflated part of the variable and random intercept and slope factors for the continuous part of the variable: is BIC better or worse? If worse, work with the count model without inflation and adjust steps 3 and 4.
3. Count inflated model with a growth model also for the inflated part, but with fixed effects for both the intercept and slope growth factors for the inflated part and random intercept and slope growth factors for the continuous part of the variable: how does BIC compare to 1 and 2?
4. Count inflated model with a growth model for the inflated part and random effects for both the intercept and slope growth factors for both growth models. I think this is where you get singularity. What exactly does the message say?

Regarding the probabilities, if you are computing these for a model with random effects, I think they will be incorrect because they cannot be computed with numerical integration.

Michael Hallquist posted on Wednesday, May 23, 2007 - 9:19 pm

Dr. Muthen,

Thanks for your useful reply. I fit the model without inflation (Model 1 in your reply) and an inflated model without a growth model for the inflated piece (Model 2). The BIC for the non-inflated model is 1105, whereas the BIC for the inflated model without growth for the inflation part is 1115. So, it looks like the inflated model is not improving things (residuals look similar for Model 1 and Model 2). The thresholds for Model 2 are -15, -2.0, and -3.7 for the three waves, yielding low inflation probabilities (.00, .12, and .02, respectively). To clarify, when you mentioned that inflation probabilities will be incorrect for random effects models (because they cannot be computed with numerical integration), does that apply to a model with random effects in the count and/or inflation parts, or just to random effects in the inflation growth parameters?
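
The inflation probabilities quoted for Model 2 can be reproduced with a few lines of Python, assuming the three reported values are read as logits of being in the zero class (which is how the quoted probabilities treat them).

```python
import math

# Values quoted in the post for waves 1 to 3, read here as inflation logits.
reported_logits = [-15.0, -2.0, -3.7]

# Inverse logit gives the implied probability of the zero class at each wave.
inflation_probs = [1.0 / (1.0 + math.exp(-v)) for v in reported_logits]
print([round(p, 2) for p in inflation_probs])  # [0.0, 0.12, 0.02]
```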

Although Models 3 and 4 are probably contraindicated because the BIC favors the non-inflated model, I fit them to learn more about thinking through this problem. Model 3 (fixed effects inflation growth model) yields a BIC of 1113.5 (slightly better than Model 2) and finds inflation probabilities of .05, .06, and .09 across the waves.

For Model 4, I used numerical integration (15 points) and again received the singularity of information matrix message, which was:

Michael Hallquist posted on Wednesday, May 23, 2007 - 9:20 pm

ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE
INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY DUE TO THE
MODEL IS NOT IDENTIFIED, OR DUE TO A LARGE OR A SMALL PARAMETER
ON THE LOGIT SCALE. THE FOLLOWING PARAMETERS WERE FIXED:
1 4 8 9 10 11 12 13 14

Parameter 1 is the inflation intercept (mean) from the nu matrix. Parameter 4 is the slope for the inflation part from the alpha matrix. Parameters 8-14 are the covariances of the inflation intercept and slope parameters with all other growth parameters (Psi matrix), as well as the variances of the inflation intercept and slope.

The output for Model 4 looks severely out of whack (e.g., the estimate for the inflation slope parameter is 193!). BIC for Model 4 is 1144. I guess Model 4 was not meant to be, given that Model 1 has the best BIC. To conclude, it appears that the non-inflated model works best. Is that accurate, and is there any more to the story that I should be thinking about? Thanks again.
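
For reference, the four BIC values reported in these posts can be lined up with a short Python sketch. The model labels are paraphrases of the descriptions in the thread; lower BIC is better.

```python
# BIC values as reported in the posts above (labels are paraphrased).
bic = {
    "Model 1 (no inflation)": 1105.0,
    "Model 2 (inflation, no growth model for inflation)": 1115.0,
    "Model 3 (fixed-effects growth model for inflation)": 1113.5,
    "Model 4 (random-effects growth model for inflation)": 1144.0,
}

best = min(bic, key=bic.get)  # model with the smallest BIC

# Difference from the best model, listed from best to worst.
delta = {name: value - bic[best]
         for name, value in sorted(bic.items(), key=lambda kv: kv[1])}
for name, d in delta.items():
    print(f"{name}: BIC difference = {d:.1f}")
```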

Linda K. Muthen posted on Thursday, May 24, 2007 - 9:06 am

The numerical integration is needed only for the probabilities related to the inflation growth parameters.

Although I don't think you are interested in Model 4, it looks like you have hit an odd solution. You can use the STARTS option in the ANALYSIS command, for example, STARTS = 100 10; This may help.

Note that posts should not exceed one window.

J.W. posted on Wednesday, March 05, 2008 - 8:52 am

I am testing LGM for a count outcome using a zero-inflated Poisson model:
1) I ran the Mplus example program ex6.7.inp. The mean of Ii was 0. Then, I freed the parameter Ii, but the estimated mean of Ii was still 0 (Mplus fixed it to zero to avoid singularity of the information matrix according to the message in Mplus output)
2) I noticed that the data set (i.e., ex6.7.dat) used in Mplus example program ex6.7.inp was generated from Monte Carlo simulation in Mplus example program mcex6.7.inp where Ii was set to 0. Then, I re-set Ii to 0.1 in the program and generated a new data set by re-running the Monte Carlo simulation.
3) I ran the program ex6.7.inp again using the new data set. The estimated mean of Ii was still zero. It seems that Ii was set to 0 by default in Mplus. Is this right?
4) I freed the parameter Ii and re-ran the model. Then, I got the message "...TO AVOID SINGULARITY OF THE...THE FOLLOWING PARAMETERS WERE FIXED: 4".
However, it was not parameter 4 (i.e., the mean of Ii), but its S.E. was fixed to zero.
Your answers to my questions will be highly appreciated!

Linda K. Muthen posted on Wednesday, March 05, 2008 - 11:21 am

The mean of ii is fixed at zero as part of the growth model parameterization for the inflation part of the model. If you want to free the mean of ii, you must also fix the intercepts of the inflation outcome to zero instead of having them held equal.

J.W. posted on Friday, March 07, 2008 - 10:16 am

Linda, thanks a lot! A few more questions:
1) Holding the intercepts of the inflation outcome equal is actually holding the thresholds equal (threshold=-intercept), right?
2) In regard to interpretations of threshold of the inflation outcome and mean of Ii:
-- Is the estimate of a threshold (e.g., U14#1=-2.139) the negative value of logodds of having extra zeros in the sample at a specific time point (e.g., Time=4)?
-- Is the mean of Ii the negative value of logodds of having extra zeros on average over time?
The model results are:
Ii=-0.162 (estimated by freeing [Ii] and fixing [U11#1- U14#1@0]);
U11#1=-0.262, U12#1=-0.606, U13#1=-0.801, U14#1=-2.139 (estimated by fixing [Ii@0] and freeing [U11#1- U14#1]).
Your help will be appreciated!

Bengt O. Muthen posted on Sunday, March 09, 2008 - 10:31 am

Look at UG ex 3.8 for ZIP regression and its explanation of inflation. In that example the u#1 on x refers to the logistic regression probability of being in the zero class, that is the class that is unable to have positive counts. That class can be seen as the inflation class (extra zeros). Here we estimate a logistic regression intercept, not a threshold. The higher the intercept, the higher the probability. And the higher the logodds for being in the zero class vs the other class. With growth modeling, this translates to an outcome at a certain time point. The mean of Ii is on the same scale as the intercept because Ii takes the role of x in regression (now regression of the count outcome on Ii). - So higher values give higher prob of zero class.
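
A minimal sketch of the interpretation above, assuming the inflation parameters are read as logistic regression intercepts so that the probability of the zero class is the inverse logit of the intercept. The helper name `zero_class_prob` is made up for this illustration; the intercepts are the ones J.W. reported for the run with [Ii@0].

```python
import math

def zero_class_prob(intercept):
    """Inverse logit: probability of the zero (inflation) class for a logit intercept."""
    return 1.0 / (1.0 + math.exp(-intercept))

# Intercepts J.W. reported with [Ii@0] and the u#1 intercepts free
intercepts = {"U11#1": -0.262, "U12#1": -0.606, "U13#1": -0.801, "U14#1": -2.139}

for name, b in intercepts.items():
    print(f"{name}: logit = {b:7.3f}  P(zero class) = {zero_class_prob(b):.3f}")
```

Under this reading, the more negative intercept at the last time point corresponds to a lower probability of extra zeros, in line with "the higher the intercept, the higher the probability".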

krisitne amlund hagen posted on Monday, March 10, 2008 - 5:10 am

Dear drs. muthen,
I am testing a three-wave SEM model. My independent, wave 1, variable (IV) is a four-group categorical (categories are qualitatively different). It is my understanding that IVs need not be specified as such in the syntax. My model fits very well, and the path from the IV to the DV is significant. I am unsure however, how to interpret the output concerning the regression of
W2_SOCI ON W1_SOCGR being
Est./S.E = -2.762 (p= 0.006).
Does this mean that the greater the group dummy code (1, 2, 3, or 4), the lower the social competence at w2 (w2_soci)? Because that doesn't make any sense with my categorical variable. I've tried the define command (defining three of the four groups), but it doesn't work. Relatedly, with the new version, estimates are provided for unstand. model results, STDYX Stand., STDY stand., and Std. What is the difference between all of these? And finally, how can the indicator of a factor be significant, but variance explained in the indicator not? thank you

Linda K. Muthen posted on Monday, March 10, 2008 - 6:45 am

If you have a nominal independent variable, you need to create a set of dummy variables. I'm not sure why this does not work. Please send your input, data, output, and license number to support@statmodel.com for help with this.

Please see the STANDARDIZED option in the user's guide for a description of the various standardized estimates.

The reason the two tests might be different is that one tests if the size of the factor loading is different from zero. The other tests whether the variance explained in the dependent variable is different from zero. The latter is a function of more than the factor loading parameter.

J.W. posted on Tuesday, March 11, 2008 - 2:02 pm

Dear Dr. Muthén,
You mentioned in your response on March 09, 2008 that Mplus estimates "a logistic regression intercept, not a threshold" in the logit part of the ZIP model. As I recall, Mplus reports a threshold instead of an intercept for a logit model. So, when does Mplus report an intercept and when a threshold for a logit or probit model? Thanks!

Linda K. Muthen posted on Tuesday, March 11, 2008 - 2:32 pm

For probit and logistic, we give thresholds. For multinomial logistic, we give intercepts and also for the inflated part of ZIP. As I am sure you know, the threshold and intercept differ only by sign.

Bengt O. Muthen posted on Tuesday, March 11, 2008 - 4:08 pm

Thresholds for observed dependent variables, providing for ordered polytomous variables. Intercepts for latent binary and unordered dependent variables.

Tom Hildebrandt posted on Sunday, March 08, 2009 - 8:53 pm

I am interested in estimating a growth model with a continuous DV and a TVC that is ZIP.

I want to compare the above model to a cross-lagged model.

Is it possible to regress a count variable on a continuous outcome?

And if so, can Mplus handle cross-lagged models?

Thanks in advance!

Linda K. Muthen posted on Monday, March 09, 2009 - 10:12 am

Yes to both questions.

Tom Hildebrandt posted on Monday, March 09, 2009 - 8:54 pm

Thank you Linda. I've tried to estimate the first model from above (a growth model where the DV is continuous and the TVC is ZIP):

If this is a traditional TVC model:
i s | bmi2@0 bmi3@1 bmi4@2 bmi5@3 bmi6@4;
bmi2 ON x2;
bmi3 ON x3;
bmi4 ON x4;
bmi5 ON x5;
bmi6 ON x6;
i s ON x1 bmi1;

When you make x a ZIP how do you specify an effect of the inflation part of the model on the DV? When I list x as a count I get just a "bmi on x" parameter but I am also interested in the effect of the inflation on BMI.

Thank you!

Linda K. Muthen posted on Tuesday, March 10, 2009 - 9:02 am

The COUNT option has various settings. See the Version 5.1 Language Addendum for the full set. You can find it at:

http://www.statmodel.com/ugexcerpts.shtml

See also Example 3.8.

Tom Hildebrandt posted on Tuesday, March 10, 2009 - 8:21 pm

Linda,

Thank you for the direction. I understand how to specify the inflation part of the ZIP variable when it is a DV but I get an error when I list x#1 on the right side of the ON statement. I am interested in the effect of the inflation on the continuous DV. Is there a way to do this?

Linda K. Muthen posted on Wednesday, March 11, 2009 - 8:52 am

Please send your input, data, output, and license number to support@statmodel.com.

socrates posted on Sunday, June 07, 2009 - 8:46 pm

Hi

Do you know a reference where I can find the formula to calculate the expected counts based on the growth parameter estimates in a GMM for a count outcome using a zero-inflated Poisson model?

Many thanks!

Linda K. Muthen posted on Monday, June 08, 2009 - 11:33 am

See:

Hilbe, J. M. (2007). Negative binomial regression. Cambridge, UK: Cambridge
University Press.
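
As a rough sketch of the calculation being asked about, assuming fixed (non-random) growth factors within a class: the expected count at a time point is (1 - pi) * lambda, where lambda is the exponentiated linear predictor of the count part and pi is the inverse logit of the inflation part. This is the same structure as the Model Constraint expressions posted later in this thread. The parameter values and the name `zip_expected_count` are hypothetical, not taken from any output here.

```python
import math

def zip_expected_count(count_logmean, inflation_logit):
    """E[Y] for a zero-inflated Poisson: (1 - pi) * lambda."""
    lam = math.exp(count_logmean)                    # Poisson mean of the count part
    pi = 1.0 / (1.0 + math.exp(-inflation_logit))    # probability of the zero class
    return (1.0 - pi) * lam

# Hypothetical class-specific estimates (not from any output in this thread)
i_mean, s_mean = 0.5, -0.2      # count part: intercept and linear slope means
infl_intercept = -1.0           # inflation part: logit intercept, no growth
time_scores = [0, 1, 2, 3]

expected = [zip_expected_count(i_mean + s_mean * t, infl_intercept) for t in time_scores]
for t, m in zip(time_scores, expected):
    print(f"time score {t}: expected count = {m:.3f}")
```

With random growth factors the plug-in of factor means is no longer exact, because the expectation has to be taken over the factor distribution.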

Sarah Pedersen posted on Wednesday, May 19, 2010 - 11:00 am

Dear Dr. Muthen,

I am running a latent growth curve model (6 time points) and then using the intercept and slope parameters to predict the outcome (1 time point) using a ZIP model. I am getting noticeably different estimates when I look at the unstandardized versus standardized output. For example:

Unstandardized
                Estimate     S.E.  Est./S.E.  P-Value
TDRK29C#1 ON
    I             -3.521    6.170     -0.571    0.568
    S1          -104.887  144.716     -0.725    0.469

Standardized (STDYX)
                Estimate     S.E.  Est./S.E.  P-Value
TDRK29C#1 ON
    I             -0.174    0.185     -0.937    0.349
    S1            -0.958    0.202     -4.754    0.000

Is this ok? Should I report unstandardized results? Thanks!

Linda K. Muthen posted on Thursday, May 20, 2010 - 11:37 am

Unstandardized and standardized coefficients will be different. The amount of difference depends on the standard deviations of the variables involved. If you don't have a reason to use standardized, I would use unstandardized.

LAS posted on Tuesday, June 15, 2010 - 9:57 am

My colleague and I have been running lcga models using 37 waves of count data. The data have a large percentage of 0s at each wave and when 0s are disregarded, the data are highly skewed, even after top coding at 75. We are interested in comparing the results obtained using proc traj and mplus. We ran 2-class zip models in both programs (in mplus fixing the variances of the count and zero inflated portions of the model to 0). The fit in sas is ok, but the fit for the mplus model is very poor, especially for one of the classes (the posterior probabilities are near .4 and the estimated mean trajectory is much lower than the observed mean trajectory) and the model will only converge using mlf. Moreover, in mplus 90% of the cases were placed in one class while the split was 60%/40% for proc traj. Because it is unlikely that the count portion of the model follows a poisson distribution, we reran the lcga 2-class model in mplus using the zero inflated negative binomial model (zinb). The fit was much improved and the percentage of the sample falling into each class was similar to that obtained from the proc traj zip model. I have read that proc traj and mplus will not give you the exact same results for the zip model, but why are they so different? Maybe you could recommend an article that discusses this issue? Most of what I have read says that sas has more flexibility to account for dormancy and exposure time, but provides little elaboration. Thank you.

Bengt O. Muthen posted on Tuesday, June 15, 2010 - 11:38 am

Because the models used in TRAJ are restricted special cases of the Mplus models, you get exactly the same results as in TRAJ when you set up the model correctly in Mplus. This is also true for the zip model. You need to send both the TRAJ and Mplus outputs for the zip model where your two runs disagree so we can see where the input error lies. Please include your licence number. I don't think that sas has more flexibility as you suggest; perhaps you can point me to such written claims.

AeLy Park posted on Friday, August 13, 2010 - 12:40 pm

I tested a ZIP model with repeated measures first and then put covariates into the model to predict the binary part and the count part. Then I got the following warning messages. So the model loses 816 cases when I add the covariates. In the program, I put <Type = Missing;
Integration=5;>

Is there anything else I need to put into the model? How can I use full information including covariates?

*** WARNING in ANALYSIS command
Starting with Version 5, TYPE=MISSING is the default for all analyses.
To obtain listwise deletion, use LISTWISE=ON in the DATA command.
*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 816
*** WARNING
Data set contains cases with missing on all variables except
x-variables. These cases were not included in the analysis.
Number of cases with missing on all variables except x-variables: 7
3 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

Linda K. Muthen posted on Friday, August 13, 2010 - 2:19 pm

Missing data theory does not apply to covariates. You can mention the variances of the covariates in the MODEL command. They will then be treated as dependent variables and distributional assumptions will be made about them. Or you can use multiple imputation to create imputed data sets. I think both approaches are just about the same.

Chien-Ti Lee posted on Friday, October 28, 2011 - 2:22 pm

Hi Drs. Muthen,

I am running a ZIP growth model with KNOWNCLASS option in order to look at the estimated means as well as the estimated probability for the subgroups that I am interested in.

However, I don't receive differential estimated probability for each subgroup. Therefore, I am wondering whether there is a specific save command or an alternative approach that I can use to reveal the estimated probability for binary outcomes for each subgroup?

Thank you,

Chien-Ti

Bengt O. Muthen posted on Friday, October 28, 2011 - 8:16 pm

No, but you can compute the estimated probability of being at zero or not from the estimated parameter values. This is done in line with a regular binary logit growth model for being in the zero class (see UG ex for ZIP).

Chien-Ti Lee posted on Monday, October 31, 2011 - 9:40 am

Hi Dr. Muthen,

I don't quite understand. Did you suggest that I could calculate the estimated probability for time 1 by using the estimated mean, which equals 1.191, and the inflation probability of .755 at time 1?

If so, can you point out the formula for calculating Pr(Yit=0)?

I think that Pr(Yit=0)=Pr(zero-class)+Pr(yit=0|non-zero class)*Pr(non_zero class)

So, I can use .755+Pr(Yit=0|non-zero class)*(1-0.755)

I don't know the detailed equation for calculating Pr(Yit=0| non-zero class). Is it e^(-1.191)*1.191^0?

Thank you so much,

Chien-Ti

Bengt O. Muthen posted on Monday, October 31, 2011 - 4:24 pm

You asked for "the estimated probability for binary outcomes for each subgroup", so I interpreted that to mean that you wanted the probability of being in the zero class for each subgroup. That is the probability of inflation. See our Topic 2 handout. Your P(Yit=0) is correct, but that is another matter.
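
A small sketch of the P(Yit=0) formula confirmed above, using the numbers from Chien-Ti's post. It assumes, as the post does, that 1.191 is the Poisson mean of the count part in the non-zero class; if it is instead the overall estimated mean, the Poisson mean would have to be recovered first. The function name `zip_prob_zero` is made up for this illustration.

```python
import math

def zip_prob_zero(pi_zero_class, poisson_mean):
    """P(Y = 0) under ZIP: structural zeros plus Poisson zeros in the non-zero class."""
    poisson_zero = math.exp(-poisson_mean)   # Poisson P(Y = 0) = e^(-lambda) * lambda^0 / 0!
    return pi_zero_class + (1.0 - pi_zero_class) * poisson_zero

# Values quoted in the post: inflation probability .755, mean 1.191 (assumed Poisson mean)
p0 = zip_prob_zero(0.755, 1.191)
print(f"P(Y = 0) = {p0:.3f}")
```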

Chien-Ti Lee posted on Tuesday, November 01, 2011 - 6:39 am

Hello Dr. Muthen,

I am so sorry for the confusion. I meant to graph the trajectories of the binary outcome (being at 0) for 5 KNOWNCLASS for 20 time points. However, I did not know how to request that Mplus output it for me. I tried the "residual" command and also requested plot3. However, I only have trajectories of estimated means for each KNOWNCLASS.

Previously, I interpreted your advice to me was to compute the trajectories of the binary outcome for 5 KNOWNCLASS by a) using the estimated means across time points for each KNOWNCLASS, and b)the overall probability of inflation.

I wonder now a) is it one of the correct procedures get trajectories of binary outcome (with and without co-variates), and b) is there a better way to do this?

Many Thanks,

Chien-Ti

Bengt O. Muthen posted on Tuesday, November 01, 2011 - 5:22 pm

If you want to graph the estimated trajectory for the binary part of the ZIP model you can look at page 682 of the V6 UG where the bar (|) function is used in the SERIES option to plot two growth curves, in your case the binary and the count curves.

You get these curves for each KNOWNCLASS.

Melanie Wall posted on Wednesday, November 09, 2011 - 12:36 pm

We are trying to use Model Constraint commands to directly estimate the expected values from a LCGM Zero-Inflated Poisson, but we are not getting the same values as those output by the SERIES option in the Plot command.

Snippet of code...

model:%overall%
i s q|dsmdep12@0 dsmdep13@.1 dsmdep14@.2 dsmdep15@.3;
ii si qi|dsmdep12#1@0 dsmdep13#1@.1 dsmdep14#1@.2 dsmdep15#1@.3
;
s-q@0; ii-qi@0;
%c#1%
[i*-0.4](a1);
[s*-0.3](a2);
[q*1.1](a3);
[si*0.5](a5);
[qi*-0.7](a6);
[dsmdep12#1-dsmdep23#1*2] (a4);
%c#2%
MODEL CONSTRAINT:
NEW (point112 point113);
point112 = (1/(1+exp(a4)))*exp(a1);
point113 = (1/(1+exp(a4+a5*.1+a6*.01)))*exp(a1+a2*.1+a3*.01);

New/Additional Parameters
POINT112 0.071 POINT113 0.067

However, Mplus SERIES gives.
0.00000 0.07990
1.00000 0.07512

Can you help us figure out why the Model Constraint and the SERIES do not agree?

Bengt O. Muthen posted on Wednesday, November 09, 2011 - 1:54 pm

Looks like your Model Constraint statements are correct. (In the Model command I don't think you mean dsmdep23#1*2, but dsmdep15#1*2.)

Please send input and data to Support so we can investigate the discrepancy.

Melanie Wall posted on Wednesday, November 09, 2011 - 2:04 pm

Actually thanks to our diligent colleague, Mei-Chen Hu, we figured out our constraints were wrong. Because we are allowing the intercept of the Poisson part to be random, we need to also include the variability of that intercept when calculating the mean back on the original scale. So below, we label the variance of the intercept as "av1" and then put it into the model constraints...

%c#1%
[i*-0.4](a1);
i(av1);
[s*-0.3](a2);
[q*1.1](a3);
[si*0.5](a5);
[qi*-0.7](a6);
[dsmdep12#1-dsmdep23#1*2] (a4);

MODEL CONSTRAINT:

point112 = (1/(1+exp(a4)))*exp(a1+av1/2);
point113 = (1/(1+exp(a4+a5*.1+a6*.01)))*exp(a1+av1/2+a2*.1+a3*.01);

Now we get the same estimates as the SERIES command.

Bengt O. Muthen posted on Thursday, November 10, 2011 - 11:34 am

I missed that you had specified the intercept growth factor i as random. This means that numerical integration over i is done resulting in the values given in the RESIDUAL output which are then plotted.

Bengt O. Muthen posted on Thursday, November 10, 2011 - 1:15 pm

P.S. I guess your explicit formula gives the same as the numerical integration - I haven't checked by completing the square in the exp.
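
A quick numerical check of the point raised in this exchange, assuming the random intercept i is normally distributed: the mean of exp(i) is exp(mu + var/2), which is what the added av1/2 term provides. The sketch compares that closed form with a plain trapezoid-rule integration. The mean and variance values and the function name `mean_exp_normal` are hypothetical, not estimates from the posted output.

```python
import math

def mean_exp_normal(mu, var, n=20001, width=10.0):
    """Trapezoid-rule approximation of E[exp(i)] for i ~ Normal(mu, var)."""
    sd = math.sqrt(var)
    lo, hi = mu - width * sd, mu + width * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * h
        density = math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n - 1) else 1.0   # trapezoid end-point weights
        total += weight * math.exp(x) * density
    return total * h

mu, var = -0.4, 0.25                     # hypothetical mean and variance of i
numeric = mean_exp_normal(mu, var)       # numerical integration over i
closed_form = math.exp(mu + var / 2.0)   # formula with the variance term
plug_in = math.exp(mu)                   # ignores the variance of i
print(f"numeric = {numeric:.5f}, closed form = {closed_form:.5f}, plug-in = {plug_in:.5f}")
```

The plug-in value is smaller than the other two, which is the direction of the discrepancy reported between the first Model Constraint results and the SERIES values.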

Seung Bin Cho posted on Monday, February 17, 2014 - 11:16 am

Hello, Dr. Muthen,

I have questions of two-part growth model for zero-inflated data.

Is it right that 1) all cases are included in the logistic part, and 2) the cases included in the Poisson part are only those who have non-zero responses at all time points?

Thank you for your help.

Bengt O. Muthen posted on Monday, February 17, 2014 - 2:39 pm

Are you asking about two-part growth modeling or zero-inflated growth modeling? These two are different. Or, are you looking at some sort of combination?

Seung Bin Cho posted on Monday, February 17, 2014 - 3:08 pm

Thank you for your response, and sorry for not being clear.
I followed UG 6.7, so I assume it's a zero-inflated growth model.

I have an added question. If I want to compute the estimated probabilities of p(y=0) at each time point from estimated parameters, what's the correct formula?
I tried exp(I+S*time)/(1+exp(I+S*time)), but it doesn't seem correct. I think the intercept parameter (not the growth factor I) comes into play, but I haven't figured out how.

I am analyzing substance use data (count) with p(y=0) varying from .5 to .25 over time. Do you think a zero-inflated Poisson growth model is appropriate, or would you recommend another type of model such as a two-part growth model?

Thank you very much for your help!

Bengt O. Muthen posted on Tuesday, February 18, 2014 - 3:15 pm

UG 6.7 is a zero-inflated model, so everyone contributes to every part of its estimation. For two-part models only the ones with non-zero response contribute to the continuous part.

It is difficult for you to compute the estimated probabilities because the growth factors are random variables. This means that you can't just insert their means in the formula but have to integrate over their distributions, which Mplus does using numerical integration. I think we print the estimated probabilities.
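
To illustrate why inserting the factor mean is not enough, here is a minimal sketch assuming a single normally distributed growth factor on the logit scale. It compares the plug-in probability with the probability averaged over the factor distribution by a simple trapezoid rule. The mean and variance are hypothetical, and this is not a reproduction of the integration Mplus performs.

```python
import math

def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def mean_logistic_normal(mu, var, n=20001, width=10.0):
    """Trapezoid-rule approximation of E[logistic(eta)] for eta ~ Normal(mu, var)."""
    sd = math.sqrt(var)
    lo, hi = mu - width * sd, mu + width * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * h
        density = math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n - 1) else 1.0   # trapezoid end-point weights
        total += weight * logistic(x) * density
    return total * h

mu, var = -1.0, 4.0                          # hypothetical factor mean and variance
plug_in = logistic(mu)                       # probability at the factor mean
integrated = mean_logistic_normal(mu, var)   # probability averaged over the factor
print(f"plug-in = {plug_in:.3f}, integrated = {integrated:.3f}")
```

The larger the factor variance, the further the averaged probability moves away from the plug-in value and toward .5.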

There isn't a clear choice. If you want to view this as there being 2 types of people who answer zero I would use zero-inflated modeling: Those who do not participate in the activity and those who do but didn't in the time period studied.

Laura posted on Wednesday, March 19, 2014 - 12:15 pm

Hi,

I have a question on LCGA with a count variable that is very skewed: about 80% have zero values in each time point.
I have compared the results of models with zip, zinb and negative binomial distribution. As for the BIC values, it seems that "zinb" fits the data best (in 2 to 5 latent class models) and "zip" is the second best alternative,
although the differences in BIC values are quite small. The form of the trajectories is quite different in these models (with zip and zinb). However, both of them make sense substantively.
Is it possible in this case to choose the model (zip or zinb) based on BIC values?

Bengt O. Muthen posted on Wednesday, March 19, 2014 - 4:15 pm

It's hard to make a statistical choice when BIC values are close. You can also look at TECH10 and count the number of significant bivariates; see the below Muthen-Asparouhov chapter where we use TECH10 information for crime curve model fit:

Muthén, B. & Asparouhov, T. (2009). Growth mixture modeling: Analysis with non-Gaussian random effects. In Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G. (eds.), Longitudinal Data Analysis, pp. 143-165. Boca Raton: Chapman & Hall/CRC Press.

You say LCGA, which then raises the possibility of generalizing to a GMM.

Laura posted on Tuesday, August 19, 2014 - 7:33 am

Thanks for your reply! Related to the previous post, I would still like to ask about the choice of the distribution. Should the choice be based more on substantial interpretation or on statistics, e.g. BIC values? With negative binomial and zero-inflated negative binomial I get quite similar solutions that are also substantially interesting. With ZIP, the trajectories are also clear, but one distinct group identified by ZINB and NB is missing. The BIC values are, in general, the best with ZINB, but the models do not converge very easily and the best log likelihood value is not always replicated. The BIC values are the second best with ZIP, but the model does not identify the distinct trajectory that was identified with ZINB and NB. With NB the interpretation is good (and the average posterior probabilities are the highest) but the BIC values are the worst. Overall the differences between BIC values are quite small. What do you think is more important in this kind of situation; statistical criteria (BIC, converging) or interpretation?

Bengt O. Muthen posted on Tuesday, August 19, 2014 - 5:35 pm

These are difficult choices. Being able to replicate the best logL many times is important in order to trust the solution. If BIC values are close, I would rely on interpretability and usefulness of the model - for instance by relating the classes to antecedents and consequences.

But you say LCGA - why not GMM? Our Topic 6 handout on our website, slides 127-137 discusses the choices and in particular slide 130 compares GMM and LCGA, with GMM doing better.

namer posted on Friday, April 03, 2015 - 1:49 am

Dear Dr. Muthen(s),

I am running a 5 wave LCGA on skewed count data (roughly 50-60% zeroes at any time point). Additionally, in these models I need to identify a class of people who score zero at every wave.

1. I understand from above that if I want a zero class with the same people at each time point I need to use a fixed zero class as the zero inflation allows people to move in and out at any given wave?

2. Does using a fixed zero class negate the need for zero inflation, or does this depend on model fit?

3. I have variances much larger than my means, does this indicate that negative binomial models would be better suited than poisson models?

4. I am trying to compare poisson, zip, negative binomial and zinb models all with a fixed zero class - can I do this using BIC/AIC etc? Is a larger BIC in the ZI models a true indication of poorer fit, or just that these models are more complex with more parameters than the non ZI models?

5. Finally, the ZINB and ZIP latent growth (non mixture) models have incredibly large values for inflation growth means and variance (i.e. a slope of -35.00 and variance of 1800.00). The intercept means are 0.00 and the intercept variances are even larger (e.g 41,000). How do you interpret such large values? Is this an indicator of a larger problem?

Thank you for your time and help!

Namer

Bengt O. Muthen posted on Friday, April 03, 2015 - 3:45 pm

1. In my experience, BIC typically does not favor a zero class across time. Instead, a solution with a low class (almost zero) comes out as the winner.

We talk about ZIP growth modeling in the video and handout for Topic 6, slides 128-137. For a count outcome U, the inflation is referred to as U#, where u# is a binary latent inflation variable and u#=1 indicates that the individual is unable to assume any value except 0. In the output you see an instance of U# = -15 which means that there is no inflation (prob of being in the zero class for this outcome is zero). Conversely, if you want to force a zero class you use +15 and say [u#@15]; and then also fix any growth factor parameters at zero.

2. If you specify a fixed zero class you are using an inflation model.

3. Variance larger than mean typically calls for an inflation model. This doesn't mean that negbin fits better than ZIP.

4. BIC tends to make good choices. See also Topic 2 for regression examples using BIC to choose among a multitude of model variations.

5. Note that the Topic 6 slides 128-137 don't use a growth model for the inflation part, but simply use an intercept/mean parameter for it. Use that model first.
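
As a side illustration of point 3, using the standard moments of a zero-inflated Poisson: with inflation probability pi and Poisson mean lambda, the mean is (1 - pi) * lambda and the variance is (1 - pi) * lambda * (1 + pi * lambda), so inflation alone already pushes the variance above the mean. The values below and the function name `zip_mean_variance` are hypothetical.

```python
import math

def zip_mean_variance(pi_zero, lam):
    """Mean and variance of a zero-inflated Poisson with inflation probability pi_zero."""
    mean = (1.0 - pi_zero) * lam
    variance = (1.0 - pi_zero) * lam * (1.0 + pi_zero * lam)
    return mean, variance

lam = 3.0   # hypothetical Poisson mean of the count part
for pi_zero in (0.0, 0.3, 0.6):   # hypothetical inflation probabilities
    mean, variance = zip_mean_variance(pi_zero, lam)
    print(f"pi = {pi_zero:.1f}: mean = {mean:.2f}, variance = {variance:.2f}, "
          f"variance/mean = {variance / mean:.2f}")
```

With pi = 0 the model reduces to a plain Poisson (variance equal to mean), so a variance larger than the mean does not by itself decide between ZIP and negative binomial.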

Joe posted on Thursday, February 04, 2016 - 3:26 pm

In a ZIP growth model (Ex. 6.7) for the inflation part, if the intercept of the outcome variable (e.g., u11#1) is -1.37, can I interpret this parameter as the probability of 0.25 (e^-1.37) of being unable to assume any value except zero for each time point?

Bengt O. Muthen posted on Friday, February 05, 2016 - 6:09 pm

See our Topic 6 handout from our courses.
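
For what it is worth, a small sketch of the arithmetic in Joe's question, assuming the inflation intercept is a logit as described earlier in this thread: exponentiating the intercept gives the odds of the zero class, and the probability follows from odds / (1 + odds).

```python
import math

intercept = -1.37                # inflation intercept quoted in the question
odds = math.exp(intercept)       # odds of being in the zero class
prob = odds / (1.0 + odds)       # probability of being in the zero class
print(f"odds = {odds:.3f}, probability = {prob:.3f}")
```

Under that assumption, the 0.25 in the question is the odds rather than the probability, and the probability is somewhat lower.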

Almar Kok posted on Monday, December 12, 2016 - 12:37 am

Dear Dr. Muthén,

I am in doubt whether to specify skewed variables in an LCGA as censored or let the model assume their distribution is normal.

On the one hand, given the skewness of the variables it seems plausible to define them as censored. On the other hand, in other discussions on this forum you state the following: "If you expect a latent class (mixture) model underlying your data it is natural for you to see non-normal outcomes; that's what the mixture can explain", and "the skewness is part of what is expected in mixtures and part of what determines the classes."

My questions are:
1. By these statements, do you mean that I should NOT specify skewed variables as censored in an LCGA?

2. I have compared results from a censored model vs a not-censored model, and they are quite different. The types of trajectories are about the same, but the percentages in the latent classes differ substantially. Also, the entropy in the censored models is quite a bit lower than in the non-censored models. Which one should I choose?

I hope you can provide some guidance regarding these questions.

Many thanks in advance!

Bengt O. Muthen posted on Monday, December 12, 2016 - 4:28 pm

1. Censored is only needed when you have a strong floor or ceiling effect, for example when more than 25% are at the lowest value. This is a more important factor in the choice than the skewness itself.

2. Choose according to the above.

J Jack posted on Tuesday, May 23, 2017 - 6:54 am

Hello,

I am running a zero-inflated Poisson LCGM but I get strange results when I check the plot options while the output looks ok. Now I am unsure whether to trust my results. The model converges, produces classes and gives estimates that seem to make sense.
But the problem is: when I plot the observed means against the estimated means with the plot option, the estimated means for all classes lie far outside the possible range of values (all straight lines at value 999 for all classes) while the observed class values are far below, in the given range of weeks per year (0 to 52).
Furthermore, if I use the numerical integration algorithm (which I suspect can be used in my case), then the plot option to check the distribution of means is not available. Does that mean that with numerical integration it is not possible to visually check how the sample means are distributed around the class means?

Any hints on what could have gone wrong would be greatly appreciated.

Bengt O. Muthen posted on Tuesday, May 23, 2017 - 5:51 pm

Is it an Mplus plot you are looking at or a plot you have made outside Mplus?

Remember that the model considers log(mean) and the plot is for the mean - so an exponentiation is involved.

With random effects - such as growth factors - there is also numerical integration involved in the Mplus plots (your output Summary shows if integration is done).

J Jack posted on Wednesday, May 24, 2017 - 1:20 am

Thanks Bengt.

It is indeed an Mplus plot: the plot of estimated means and observed values and the same happens for the estimated means and observed means plot.
I have been considering that it could have to do with the exponentiation. But I could not find any logical explanation why the log transformation should affect the estimated values differently than the observed values, resulting in the estimated values being far outside the range of the observed values and on the same line for all classes.

As it seems you did not encounter such problems with the plot function before - unfortunately I have no idea what could be wrongly specified in the model in order to achieve that all estimated means would be incredibly inflated while the model still converges and estimates classes as expected.

Bengt O. Muthen posted on Wednesday, May 24, 2017 - 6:02 pm

I would need to try this myself on your data to know exactly what you are looking at. Send files to Support along with your license number.

John C posted on Thursday, April 02, 2020 - 5:37 pm

Hello, I am testing a ZIP GMM model with several latent classes and have stable meaningful results. However, if I run using two-part semicontinuous the BICs are an order of magnitude lower, unless I specify "transform = none." From my understanding, the two-part approach uses a logged version of the outcome in addition to the binary part by default, so the BIC should be directly comparable with a ZIP model by default, but this is not what I'm observing.

Further, if I plot the outcomes, the counts from the ZIP version are in the original scale for the non-binary process (a count of 1 to 12) whereas when using a two-part model the plot appears to be in the log scale.

Could you kindly clarify?

Bengt O. Muthen posted on Saturday, April 04, 2020 - 11:11 am

The two-part model has 2 DVs so its loglikelihood is not on the same scale as the ZIP. We discuss this for regression in our RMA book, chapter 7.

Derek posted on Thursday, April 09, 2020 - 2:01 pm

Hi Dr. Muthens,

Thanks for answering my questions below and hope you are doing well.

I have some questions regarding how to understand the inflation part of a zero-inflated Poisson model (EXAMPLE 8.11: LCGA FOR A COUNT OUTCOME USING A ZERO-INFLATED POISSON MODEL). My understanding is that the inflation part is used for modeling the probability of extra-Poisson zeros.

1) How can we tell if we actually need to include the inflation part for any specific one group or all groups when estimating LCGA using a zero-inflated Poisson model? In other words, when is LCGA using a Poisson model WITHOUT zero-inflation adequate?

2) How can we tell if we need separate inflation part parameters for each group (i.e., adding [ii si] in example 8.11) or just one inflation part parameter across all groups (the default in example 8.11)?

3) How do we determine the best polynomial function for the inflation part parameter(s)?

Bengt O. Muthen posted on Thursday, April 09, 2020 - 4:21 pm

First, see our Short Course Topic 6 video and handout, slides 127-136. See also our RMA book, Chapter 6.

John C posted on Thursday, April 09, 2020 - 5:51 pm

Hello,
I would just like to follow up on your response above "The two-part model has 2 DVs so its loglikelihood is not on the same scale as the ZIP. We discuss this for regression in our RMA book, chapter 7."

As it turns out, I was led to consider the two-part model after reading chapter 7 of the RMA book. There you have a table (6.1) that lists the BIC for, among others, the ZIP model and the two-part (hurdle) model.

So just to be clear, are you saying that even though the ZIP involves two regression equations (p. 262), you do not regard this as having two DVs whereas the two-part model does (have two DVs)? Just want to be clear because in that chapter, it is also stated that "The two-part model is an alternative to the zero-inflated model" (p. 263).

I'm happy with my results from the ZIP model but I wanted to test some other options, but the two-part alternative seems pointless if you're saying the BICs are not comparable.

Bengt O. Muthen posted on Sunday, April 12, 2020 - 12:08 pm

I think my answer to you wasn't the best. You initially mentioned the two-part semicontinuous model of Chapter 7, for which one of its 2 DVs is continuous. Its logL/BIC cannot be compared to the zero-inflated Poisson (ZIP) of Chapter 6 because the ZIP DV is a count, so its likelihood describes probabilities and its logL is therefore on a different scale. The Chapter 6 Table 6.1 compares various count models. At the bottom is a two-part (hurdle) model. All of those Table 6.1 models have comparable loglikelihoods and therefore comparable BICs. Tables 6.7 and 6.8 show two different ways to analyze this two-part model. But neither of the 2 parts of the model analyzed in Table 6.8 involves a continuous variable like two-part in Chapter 7. It is interesting that the Table 6.7 and 6.8 loglikelihoods come out to be the same despite differing in the number of DVs. But both represent probabilities of the one count outcome; just splitting the counts in two.