PSMG Meeting, Feb. 13-15, 2002 - Current Research Interests

My current statistical interests fall into two broad categories. The first is the analysis of correlated times-to-events, such as arise when one samples families and observes age at disease onset (say) in each of several members per family. Second, and perhaps more relevant to the upcoming meeting, I've been studying hierarchical and latent variable (LV) models. Much of my motivation springs from the diverging opinions of many quantitative researchers in substantive fields, versus many generic statisticians and biostatisticians, as to the usefulness of these models. I believe that the models are useful because they elucidate the extent to which each outcome indicator reflects influences other than the intended aspect of health or behavior; describe heterogeneity within populations; and incorporate theory in relating underlying health status to the measured indicators. However, the models rely on assumptions that may be hard to verify but may materially affect analytic findings. To meet this criticism, my work over the past few years has developed model checking procedures that identify how each of an LV model's statistical assumptions may be contradicted in the data being analyzed. My current work focuses on aspects of the criticism that I believe are not met by model checking: (i) elucidating the degree to which data, as opposed to model assumptions, identify parameter estimates; (ii) developing methods to describe the set of models that are consistent with a given data set; and (iii) investigating and quantifying the degree to which latent variable analysis can be well approximated with analyses that do not rely on latent variables. To exemplify (i): a class of models for analyzing data sets that have some covariates non-ignorably missing assumes that the covariates are distributed as multivariate normal. Model parameters are identifiable, but (very roughly) by essentially imputing the missing data so that the joint distribution of observed and missing data is normal. Thus, the normal assumption almost entirely serves to identify the parameters.
My current substantive interests are primarily in human aging and adolescent health. In aging, I've been studying the etiology and course of frailty and physical disability in older adults. In adolescent health, I've been involved in a study of how neighborhood factors and parenting interact in influencing adolescent behaviors. In both cases, constructs of interest as outcomes and as predictors are difficult to measure precisely, and there is substantial heterogeneity in individual trajectories and within neighborhoods, so that hierarchical and latent variable models usefully describe the data to be analyzed.

Hendricks Brown, USF

Hendricks Brown is involved in developing designs and analyses for preventive field trials across the prevention research cycle, spanning the pre-intervention, efficacy, effectiveness, and dissemination and implementation phases. He has also examined the breadth of the field of prevention, first by identifying commonalities and differences between prevention perspectives that focus on mental health, drug use, HIV, suicide, delinquency and violence, and second by quantifying those elements of design, analysis, and measurement that lead to valid scientific evidence. Several cross-cutting themes have also been important: a population-based or public health approach to prevention, examining the effects of an intervention on differential developmental pathways, and incorporating ecological or contextual effects of both the intervention and the natural environments across the life span. His methodologic interests include procedures for missing data, ranging from selection bias and participation bias to attrition, as well as methods for growth modeling and the design of randomized field trials. Much of his experience comes from school-based randomized preventive trials that target classroom behavior and reading achievement. Recently he has been interested in examining the effects of universal interventions on low-base-rate disorders such as suicide, schizophrenia, and substance abuse. Specific areas of interest include:

• Handling nonignorable missing data in intervention trials.
• Extensions of pattern mixture modeling of missing data to longitudinal data.
• Diagnosing model inadequacies in growth mixture modeling.
• Identifying mixtures from behavioral observation data.

Getachew Dagne, USF

My research interest is in developing models for analyzing higher-dimensional behavioral observation data using a Bayesian approach. Bayesian modeling facilitates the incorporation of various sources of variation in small-sample cases or with sparse data. Summarizing effects in behavioral data (e.g., couple interaction) helps build developmental models that relate subject-specific measures derived from observational data to antecedents and consequences, including models of both mediation and moderation. Computational and model selection issues are also discussed.

Dan Feaster, Univ. of Miami

Daniel Feaster has general interests in longitudinal and other multi-level data analysis and trial planning. Substantive areas of application include interventions for the HIV infected and their families to improve adaptation and medication adherence, as well as trials for HIV and drug abuse prevention and treatment. His recent research includes jointly modeling the stress process of individual family members to uncover systemic effects of the family on these individual stress processes. This work creates family means of the stress process variables and includes these, along with each individual's deviation from the family mean, as predictors of outcome. Differences in responsiveness to the family mean and to the individual deviation from the mean are indicative of a systemic effect of the family on the individual. Additional methodological interests include procedures for accounting for informative missing data (particularly differential drop-out across conditions) and variability in effect sizes across different sites (or other levels) in trials.
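The decomposition described above (a family mean plus each member's deviation from that mean) might be sketched as follows. This is only an illustration of how the two predictors could be constructed; the function name, the record layout, and the example values are hypothetical and do not come from the actual study.

```python
from collections import defaultdict


def split_family_effects(records):
    """Split each member's score into a family mean and a within-family deviation.

    `records` is a list of (family_id, member_id, stress_score) tuples
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for family_id, _member_id, score in records:
        totals[family_id] += score
        counts[family_id] += 1

    rows = []
    for family_id, member_id, score in records:
        family_mean = totals[family_id] / counts[family_id]
        rows.append({
            "family_id": family_id,
            "member_id": member_id,
            "family_mean": family_mean,        # between-family predictor
            "deviation": score - family_mean,  # within-family predictor
        })
    return rows


# Made-up scores for two families
example = [("A", 1, 4.0), ("A", 2, 6.0), ("B", 1, 3.0)]
rows = split_family_effects(example)
```

Both columns would then enter the outcome model as separate predictors, so that their coefficients can be compared.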

Paul Greenbaum, USF

During the last year, I have been involved with two studies using general growth mixture modeling (GGMM). One study was a quasi-experimental evaluation of a services intervention program for children with serious emotional disorders, and the other examined the etiology of drinking during the first year of college. I would be interested in discussing with the group some of the issues and problems that were encountered in implementing GGMM analyses with these data. Potential discussion topics in implementing GGMM are described below.
Fitting a conventional latent growth curve model. This procedure worked very well. Strengths include:
	• able to fit nonstandard models,
	• handles a large number of repeated measures (>30),
	• modification indices supply useful diagnostic information about error structure over time.
Enumerating latent classes. The procedure provides a powerful tool to cluster individual growth trajectories in theoretically meaningful ways (e.g., strong vs. weak growth, initiators vs. desisters). A number of recurring patterns across the different data sets were observed:
	(a) model fit was always improved by allowing for multiple classes;
	(b) as the number of potential classes was increased, the number of potentially testable models increased geometrically; and
	(c) as the number of classes was increased, among successful models, the number of random parameters decreased.
These patterns suggest expanding our understanding of how variances are modeled (random vs. fixed, freely estimated vs. invariant across classes) and their linkage to substantive theory, and the need to assess when the model is overfitted vs. theory-driven.
Assessing the role of theoretically interesting covariates. I have had difficulty achieving a proper solution when regressing some covariates between classes. Convergence problems or improper solutions may be a function of sample size, the model, or distributional characteristics of the covariate (low frequencies). Unfortunately, the covariates that have been problematic have also been the most interesting theoretically. Among within-class covariate analyses, small sample sizes and large numbers of covariates have also been problematic. Propensity scoring was explored as a solution.

George Howe, George Washington University

I am currently involved in three research projects for which the work of the PSMG is relevant.

1. Study of couple interaction using microcoded behavioral observations. This is part of a study of how couples respond in the face of job loss by one of the partners. 254 couples were videotaped for 15 minutes while discussing a problem, and each behavior in each interaction sequence was microcoded. I have been particularly interested in studying contingencies among behaviors in these sequences. Getachew, Hendricks, Bengt, and I have been working on Bayesian and empirical Bayesian random effects approaches for modeling the hierarchical structure in first-order contingencies in these data, and have a paper in press on modeling contingencies in two-by-two tables (involving two antecedent and two consequent codes). I am interested in a number of extensions of these models, including: ways of studying structure when coding involves several behavioral categories, ways of studying structure when two different coding systems are applied to the same set of behaviors, and applications to higher-order (second- or third-order) Markov processes.
2. Randomized prevention trial targeted to reduce risk for depression in couples following job loss. This trial involved a collaboration with Rick Price and Amiram Vinokur at the University of Michigan PIRC. We accrued a sample of 1477 couples in the greater Baltimore and Detroit areas, and randomly assigned them to intervention or control conditions. This study was one of the first to use a community sample with a stressor-based sampling frame and a prevention program requiring the involvement of both members of the couple in an intent-to-treat (ITT) design. Participation in the intervention itself was low, with only 30% of the assigned couples actually participating. In addition, our initial analyses indicated that participation in the intervention group was differentially predictive of later continuation in follow-up data collection, both directly and in interaction with baseline characteristics. (Liz Ginexi and I have just submitted a paper for review that details these findings.) This poses major challenges to the assumptions of both standard ITT analyses and those using complier average causal effect (CACE) estimation. This has led to an interest in statistical methods that can handle nonignorable missing data in a CACE framework.
3. Longitudinal study of the development of coping in children whose parents have become unemployed. This study of risk and protective processes, a collaboration with Tim Ayers and Irwin Sandler at the ASU PIRC and Nick Ialongo at the Hopkins PIRC, uses a four-wave longitudinal design, tracking children for 18 months after accrual. It focuses on children's coping with major stressful events that occur in the aftermath of parental job loss, as well as family factors that may facilitate or inhibit productive coping. I am interested in using growth mixture modeling to study patterns of change in internalizing and externalizing symptoms over the four time points.
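For the first project above, a first-order (lag-1) contingency table with two antecedent and two consequent codes can be tabulated from a single coded sequence roughly as follows. This is only a descriptive sketch for one sequence, not the Bayesian random effects model mentioned in the text; the code labels "neg" and "pos", the function names, and the 0.5 continuity correction are illustrative assumptions.

```python
import math


def lag1_table(sequence, codes=("neg", "pos")):
    """Count lag-1 transitions: rows = antecedent code, columns = consequent code."""
    index = {code: i for i, code in enumerate(codes)}
    table = [[0, 0], [0, 0]]
    for antecedent, consequent in zip(sequence, sequence[1:]):
        table[index[antecedent]][index[consequent]] += 1
    return table


def log_odds_ratio(table, correction=0.5):
    """Log odds ratio of a 2x2 table; the correction (an assumption) guards against empty cells."""
    (a, b), (c, d) = table
    a, b, c, d = (cell + correction for cell in (a, b, c, d))
    return math.log((a * d) / (b * c))


# Made-up sequence of eight coded behaviors
sequence = ["neg", "neg", "pos", "neg", "neg", "pos", "pos", "neg"]
table = lag1_table(sequence)
lor = log_odds_ratio(table)
```

A positive log odds ratio would indicate that a code tends to be followed by the same code more often than by the other one; pooling such tables across couples is where the hierarchical structure enters.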

Alka Indurkhya, Harvard

My current research interests are in applying general growth mixture models developed by our PSMG colleagues to mental health service use data. I am also developing alternative frameworks that include developmental trajectories for conducting economic evaluations of school-based and community-based preventive interventions. My current psychometric interests include:
	(a) addressing power and sample size issues in person-oriented analyses,
	(b) using latent growth mixture models to assess item relevance to factors in developmental psychopathology.

Booil Jo, UCLA

My general theme is estimating the efficacy of interventions in trials while accounting for subpopulation heterogeneity, including compliance (adherence) types. Data from JHU PIRC cohort 3 and JOBS II are mainly used.

1. Accounting for treatment assignment effects. First, I explored the bias mechanism that arises when assignment effects are ignored in estimating intervention effects; second, I developed alternative models that allow for assignment effects. Key words: CACE estimation, exclusion restriction. Drafts available for both issues.
2. Based on topic 1, I am planning to expand CACE methods to estimate dosage effects. To begin, I will examine existing methods that may be applicable. Then, I will explore both ML-EM and Bayes-Gibbs approaches to solve this problem. Key words: dosage effects, nonlinear treatment effects.
3. Simultaneous modeling of nonignorable missing data and non-adherence. With Bengt Muthén and Hendricks Brown. This project is based on the idea that adherers and non-adherers may show different nonresponse (attrition, dropout) rates at later follow-ups in longitudinal intervention trials. Key words: nonignorable missingness, missing at random, latent ignorability.
4. Multilevel CACE modeling. Intervention trials often suffer from both non-adherence and clustering of data. The goal is to obtain correct parameter estimates and standard errors and to examine inferential issues related to intervention protocols. With Bengt Muthén, Nick Ialongo and Hendricks Brown. Key words: intra class correlation, sandwich estimator, adherence, implementation, multilevel mixtures. Working paper.
5. Statistical power and design issues. What affects power, how to improve power, how to reduce cost given various complications in intervention trials. Key words: covariate information, outcome distributions, study design, power estimation. Draft available regarding power and non-adherence issues.
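As background for the CACE topics above, the simplest textbook estimator divides the intent-to-treat effect by the proportion of compliers among those assigned to treatment. The sketch below assumes randomization, the exclusion restriction (assignment has no effect on never-takers), and that controls cannot access the intervention; it is not one of the alternative models described in the list, and the function names and data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cace_estimate(assigned, received, outcome):
    """Ratio estimator: ITT effect divided by the compliance rate in the treatment arm.

    Assumes (for this sketch) that nobody in the control arm receives the intervention
    and that assignment affects outcomes only through receipt (exclusion restriction).
    """
    treated_y = [y for z, y in zip(assigned, outcome) if z == 1]
    control_y = [y for z, y in zip(assigned, outcome) if z == 0]
    itt_effect = mean(treated_y) - mean(control_y)

    compliance_rate = mean([d for z, d in zip(assigned, received) if z == 1])
    if compliance_rate == 0:
        raise ValueError("no compliers in the treatment arm; CACE is undefined")
    return itt_effect / compliance_rate


# Made-up data: 4 assigned to intervention (2 of them participate), 4 controls
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
received = [1, 1, 0, 0, 0, 0, 0, 0]
outcome = [6, 8, 4, 4, 4, 4, 5, 3]
cace = cace_estimate(assigned, received, outcome)
```

When assignment itself affects non-participants, the exclusion restriction fails and this ratio is biased, which is the situation the first topic addresses.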

Andreas Klein, UCLA

1) Moderator Models & Elementary Latent Interaction Models. Estimation and interpretation of latent interaction effects. Application of the LMS estimation method.
	- Klein, A. & Moosbrugger, H. (2000). Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65 (4), 457-474.
	- Klein, A. (2000). Moderator Models. Methods for the analysis of moderator effects in structural equation models. Monograph (in German).
2) Complex Nonlinear Structural Equation Models. Estimation of multiple latent interaction and quadratic effects. Identification of latent confounding variables in the context of causal modeling. Development of fit measures for latent interaction models.
	- Klein, A. & Muthén, B.O. (under review). Quasi Maximum Likelihood Estimation of Structural Equation Models with Multiple Interaction and Quadratic Effects.
3) Heterogeneous Growth Curve Models. Modeling heterogeneity of development of subgroups on the latent variable level. Identification of heterogeneous subgroups in longitudinal designs.
	- Klein, A. (in prep.). Modeling Heterogeneity in Growth Curve Models.
Computer Programs:
	1. LMS 1.2: Elementary Latent Interaction Models.
	2. QUASI-ML 1.0: Complex Nonlinear Structural Equation Models, Multiple Interaction Effects, Latent Confounder Testing.
	3. HGM 1.0: Heterogeneous Growth Curve Modeling. Modeling of heterogeneous developments.
Current Applications: Drug abuse data (cross-sectional study). Depression data (longitudinal study). I'm particularly interested in new applications of the prototype computer programs and the newly developed methodology.

Klaus Larsen, UCLA

I use Cox's proportional hazards model for the analysis of survival data in situations where time to event is measured continuously. I am particularly interested in models that have a latent variable among the predictors of survival. This latent variable can be either continuous or categorical (a class variable), and it is measured by a number of ordinal items. The work includes the development of estimation algorithms (maximum likelihood estimation via EM), methods for evaluating model fit (graphical methods and formal tests), and illustration with real data (the relationship between physical function and death, using data from Johns Hopkins). I am currently working on two papers, one with a latent class variable as a predictor of survival, and another with a continuous latent variable as a predictor of survival. In short, the aim of the papers is to bring the Cox model and factor analytic models together and to address the statistical and interpretational aspects of this new model. Perspectives/extensions: competing risks, time-varying covariates.
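One way to write down the kind of model described above is sketched here. The notation is an assumption made for illustration and is not taken from the papers: x_i are observed covariates, \eta_i is the continuous latent variable, and the proportional-odds form of the item model is only one possible choice for ordinal items.

```latex
% Hazard for subject i with observed covariates x_i and latent variable \eta_i
h_i(t \mid x_i, \eta_i) = h_0(t)\, \exp\!\left( \beta^{\top} x_i + \gamma\, \eta_i \right)

% Assumed measurement model for ordinal item j with category thresholds \tau_{jk}
\Pr(Y_{ij} \le k \mid \eta_i) = \frac{1}{1 + \exp\{ -(\tau_{jk} - \lambda_j \eta_i) \}}
```

In the latent class version, \eta_i would be replaced by class indicators, with one log hazard ratio per class relative to a reference class.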

Gitta Lubke, UCLA

My general interest is the analysis of heterogeneous populations using latent variable models. The heterogeneous populations I am interested in consist of a small number of groups, where group membership is known for all subjects, or latent classes, where class membership is at least partially unknown. The latent variable models I have considered so far are mainly factor analysis models. More specifically, I am interested in four issues or areas.
	(1) Measurement invariance (MI). To compare groups or latent classes with respect to factor means, it is necessary to first investigate whether the test or questionnaire measures the same factors across these groups or classes. In the context of confirmatory factor analysis, investigations of MI are carried out by restricting the factor model to have equal intercepts, factor loadings, and residual variances across groups or classes. The fit of the restricted model is compared to the fit of a more lenient model. Adequate fit of the restricted model has a number of interesting implications.
	(2) Factor mixture models. Factor mixture models are models for the analysis of test or questionnaire data when it is not known which test taker belongs to which of a small number of latent classes. Along with the factor model that is estimated for each of the classes, the model assigns each test taker to the most likely class. Problems arising when fitting these models can be related to empirical identifiability. I have looked at some factor mixture models in more detail to find out what might cause identifiability problems and how to improve the results when fitting these models.
	(3) Categorical outcomes. Likert scale data, running for example from 'strongly disagree' to 'strongly agree', are ordered categorical outcomes. However, these data are often analyzed with models for continuous (e.g., multivariate normal) data. My presentation on Wednesday concerns the effects of analyzing Likert scale data with factor models (including growth models) when the interest is in the comparison of several groups or latent classes.
	(4) Analysis of genetic data. Recently, genotyping has been added to ongoing studies or included in new studies. This leads to a situation where the number of variables is often much larger than the number of subjects. Usually, the interest is in finding out which possibly interacting genes contribute to some behavioral outcome such as alcohol dependence. The number of candidate genes investigated in present studies is usually small (e.g., smaller than 5). Trying to connect larger numbers of genes to behavioral outcomes requires a different statistical approach. That approach may consist of combining data mining techniques such as mixtures of factor analyzers with traditional factor analysis models.

Hanno Petras, Johns Hopkins

My current interests are in the area of developmental psychopathology in childhood and early adulthood and its prevention. As laid out in the Lifecourse/Social Field theory, I view human development as a staged and cumulative sequence of individual responses to field-specific demands, which may vary in their level of success. Unsuccessful responses to specific demands may then increase the risk for later maladaptive outcomes, such as school expulsion, Antisocial Personality Disorder, or juvenile and adult arrest. Importantly, early antecedents (risk factors) of these negative outcomes are viewed as potentially malleable targets for preventive interventions. For this research I have predominantly used longitudinal data from the Baltimore Prevention Program (PI: Sheppard Kellam), a randomized community-epidemiological preventive intervention trial of Baltimore City Public School children. This conceptual framework, in combination with a strong interest in the new modeling opportunities implemented in Mplus, has resulted in several research projects, which are summarized in the following paper drafts:
	Paper 1: Developmental Antecedents and Malleability of Antisocial Personality Disorder: Long-term Effects of a Universal Classroom Based Preventive Intervention
	Paper 2: Aggression, Poverty, and school removal: An analysis of Moderation/Mediation in Mixture Survival Analysis
	Paper 3: Specificity/Specificity of predicting Violent Juvenile Arrest, using Teacher rated levels of aggression
All three papers, at varying levels of completion, contribute to two predominant topics in growth mixture modeling: the examination of growth heterogeneity and time-to-event data in growth modeling.

Katherine Masyn, UCLA

My general areas of interest are longitudinal data analysis and finite mixture modeling. I have collaborated with Bengt on a paper (currently in revision) on discrete-time survival mixture analysis (DTSMA) using an LCA framework. I am working right now on extending that work to include recurring event data. I am using survival data provided by Bill Fals-Stewart, a Senior Research Scientist at the Research Institute for Addictions at SUNY-Buffalo. My work with Bill includes finding ways to apply new methods in longitudinal analysis to his randomized intervention studies of drug addiction; his main focus has been on the interplay between drug abuse and violence in couples. He is also looking at the relationship between drug abuse and work absence over time.
Extending my work on discrete-time survival mixture analysis, I plan to explore the following issues: model identification and stability for k>1 classes, evaluation of overall fit, power, competing events, and simultaneous modeling with parallel and sequential growth processes.
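As background for the discrete-time survival work above, the single-class building blocks are the period-specific hazard and the survival function it implies. The sketch below is a plain life-table calculation, not the mixture model itself; the function name, the data, and the convention that a subject censored in period j still counts as at risk in period j are assumptions made for illustration.

```python
def discrete_time_life_table(last_period, censored, n_periods):
    """Return (hazards, survival) for periods 1..n_periods.

    last_period[i] is the last period in which subject i was observed;
    censored[i] is True if no event occurred in that period.
    hazard_j   = events in period j / subjects at risk in period j
    survival_j = product over k <= j of (1 - hazard_k)
    """
    hazards, survival = [], []
    surviving = 1.0
    for period in range(1, n_periods + 1):
        at_risk = sum(1 for t in last_period if t >= period)
        if at_risk == 0:
            break  # nobody left to observe; later hazards are undefined
        events = sum(
            1 for t, c in zip(last_period, censored) if t == period and not c
        )
        hazard = events / at_risk
        surviving *= 1.0 - hazard
        hazards.append(hazard)
        survival.append(surviving)
    return hazards, survival


# Made-up data for six subjects followed over three periods
last_period = [1, 2, 2, 3, 3, 3]
censored = [False, False, True, False, True, True]
hazards, survival = discrete_time_life_table(last_period, censored, 3)
```

In a mixture version, each latent class would have its own set of period-specific hazards, which is where questions of identification for k>1 classes arise.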
I have also been working for a while on the issue of latent class enumeration in the general growth mixture modeling (GGMM) framework, with mixed success (no pun intended). I continue to pursue that vein as well.
Finally, I have always had an interest in propensity score adjustment and causal inference in quasi-experimental and observational studies. I plan to pursue such topics more actively as they relate to GGMM and DTSMA in this new year.

Bengt Muthén, UCLA

1. Explorations of substantive examples of growth mixture modeling, particularly for randomized trials. One current example is Andrew Leuchter's UCLA depression research on placebo response, see http://www.MentalHealth.ucla.edu/cgi-bin/av-npi-rs8?gr020205al
I have started to analyze these data with growth mixture modeling using latent trajectory classes, and the method looks promising, although the current sample is small (n=51), producing low power given the amount of variability. A brief summary may be useful given that this may have general interest. In the current data, there are 2 pretreatment measurement occasions (baseline and 1 week) with follow-ups at 48 hours, 1 week, 2 weeks, 4 weeks and 8 weeks. Treatment is placebo or medication. The primary outcome is the Hamilton depression score, and brain activity measures (QEEG recordings) are also available. Using growth mixture modeling in the control group suggests two distinct classes of development after the treatment has started, and in line with Leuchter's research they can be characterized as placebo responders and placebo nonresponders. Given randomization, the same two classes can be sought in the treatment group, allowing for change in slopes due to treatment. Brain activity measures are promising for distinguishing among subject classes even before treatment. I am seeking other substantively well-motivated applications.
2. Two-part growth mixture modeling for data with a preponderance of zeroes (floor effects), e.g. when studying early development. This connects with the substantive interests of for example Mike Stoolmiller and Jim Snyder. Draft available.
3. Growth and time-to-event (survival) analysis combined for studies of onset and subsequent development. Data being sought.
4. Non-ignorable missing data using latent variable modeling (with Hendricks). Applications include growth mixture modeling. Also connects with "terminal decline" issues in aging research (Scott Hofer Penn State data). Draft available.
5. Generalizations of latent variable modeling to combinations of complexities not in the current Mplus program, such as categorical outcomes, missing data, random effects, mixtures, multilevel. For example, growth mixture modeling with categorical outcomes. Ongoing with the Mplus group.
6. Non-ignorable missing data CACE modeling.
7. Multilevel CACE modeling.
8. Assessing model fit in mixture models. Much research has focused on comparing fit for models with different numbers of classes, while less attention seems to have been paid to the fit to data. Connects with Karen's, Hendricks' and Chen-Pin's diagnostic work.
9. Genetic modeling related to development of alcohol problems and conduct disorder. Planned collaboration with Bob Zucker and genetic researchers connected with the Univ of Michigan alcohol research center.

Jim Snyder, Wichita State University

Mike and I share common interests as described in his attachment in terms of the OZ project. I would add the following additional interests:
	1. measuring and modeling growth in behavioral phenomena when they first emerge developmentally, in particular sneaky, surreptitious or covert antisocial behavior (e.g., steals, lies, cheats, drug experimentation)
	2. modeling growth in antisocial behavior across different settings (e.g., home, playground, classroom) with confounds of setting and informant/method sources of variation.

Mike Stoolmiller, OSLC

I am involved in two different research projects. I will list them in order of my level of involvement in terms of FTE and mention the models of main interest.
The "Oz" grant (so called because it is from Kansas, as in the movie "The Wizard of Oz"). This project involves Jim Snyder and Jerry Patterson. The sample is 3 consecutive kindergarten cohorts (total N=250 families) from a school in Wichita, Kansas, which serves an urban neighborhood that has high levels of social disorganization (poverty, broken families, etc.). Extensive parent-child (two 2-hour occasions) and peer-child (five 30-minute occasions) social interaction data were collected via videotape and coded with both the family-peer process (FPP) code and the specific affect (SPAFF) code. In addition, multi-method and multi-informant outcome data on child antisocial behavior were collected twice during both kindergarten and first grade (4 repeated assessments). The aim of the study is to test 3 different theories of the development of antisocial behavior: coercion theory (focus on negative reinforcement), cognitive theory (focus on social information processing) and emotion regulation theory (focus on the regulation of negative emotions). Models of interest:
	1) Growth models of antisocial behavior,
	2) Growth mixture models of antisocial behavior,
	3) Multi-level log-linear models,
	4) Multi-level continuous time proportional hazards model.
The analytic challenges are to incorporate information from the coded social-interaction data as predictors of growth in antisocial behavior. Negative reinforcement is a key predictor, and our current definition of negative reinforcement is the log odds of a child ending a conflict episode (either with parent or peers) with an aversive behavior. Growth over time in antisocial behavior is our key outcome. Statistically efficient models that can incorporate both our key predictors and outcomes are of critical interest.
The Colorado Longitudinal Youth Study (CLYS). This project involves Elaine Blechman at the University of Colorado. The sample is consecutive juvenile referrals to the Boulder County Justice Center, in Boulder County, Colorado, for 5 years (N=505). The entire life history of arrest events was determined, and an extensive battery of psychological tests relevant to various theories of delinquency was administered to each youth and the youth's parent or guardian. The aim was to test competing theories of delinquency. Models of interest:
	1) Growth mixture models of the annual frequency of arrest,
	2) Continuous time proportional hazards models of recidivism, first re-arrest and all re-arrests,
	3) Continuous time proportional hazards models with random effects or mixtures.
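For the Oz project's negative-reinforcement predictor defined above (the log odds of a child ending a conflict episode with an aversive behavior), a per-child calculation might look like the sketch below. The function name, the counts, and the 0.5 continuity correction (added so that children with zero or all aversive endings still get a finite value) are illustrative assumptions, not the project's actual scoring rules.

```python
import math


def log_odds_aversive_ending(n_aversive_endings, n_episodes, correction=0.5):
    """Log odds that a conflict episode ends with an aversive child behavior."""
    if n_episodes <= 0 or not 0 <= n_aversive_endings <= n_episodes:
        raise ValueError("need 0 <= n_aversive_endings <= n_episodes and n_episodes > 0")
    ended_aversive = n_aversive_endings + correction
    ended_other = (n_episodes - n_aversive_endings) + correction
    return math.log(ended_aversive / ended_other)


# Made-up counts: 6 of 20 conflict episodes ended with an aversive behavior
score = log_odds_aversive_ending(6, 20)
```

A raw score like this ignores how many episodes each child contributed, which is one reason a multi-level model of the coded data is attractive.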

Beth Vanfossen, Towson University

My current research with colleagues at Towson, JHU, and AIR focuses on the impacts of community context, family structure and dynamics, classroom interventions, and gender and other child demographics on the development of aggressive behavior in children. The collaborative team wants to explore not only which neighborhood characteristics are consistent with the development of positive social adaptations in children and adolescents, but also whether two first-grade classroom interventions that have been found to affect the developmental path of children may help children cope with economic difficulty and neighborhood crime and violence. The neighborhood (census tract) measures of employment, income, and violence come from the U.S. Census and police records. The developmental data come from the Baltimore school prevention trials already conducted by the Kellam et al. Prevention Research Center. These are longitudinal data centering on the life-course development of a sample of 2000 Baltimore children. The children live in 75 eastern Baltimore census tracts, which fall in the middle-to-low range of median income and violence rates. At the present time, we are focusing on the development of aggressive behavior between the 1st and 7th grades as the dependent variable. We have been using multilevel and growth models within a SEM framework. We want to continue with this methodology in order to examine the separate effects of different levels. Also, in the future we want to explore trajectory classes of students to attempt to identify differences in antecedents.

Chen-Pin Wang, USF

My research agenda with PSMG has been motivated by the randomized preventive intervention trial aimed at reducing children's aggressive behavior in the Baltimore City public school area. We employ general growth mixture modeling to address the heterogeneity of developmental trajectories across these longitudinal follow-ups. My work on this project includes developing diagnostic methodology to examine model fit in terms of misspecification of growth patterns, covariance structures among latent growth classes, and enumeration of growth classes. The main technique anchoring this development is the adoption and extension of the pseudo-class approach proposed by Bandeen-Roche et al. Based on the theory we derived, the pseudo-class adjusted residuals averaged across multiple pseudo-class draws give rise to three useful diagnostics. Currently, I am exploring how the construct of the growth mixtures can be influenced by adding or dropping observed characteristics such as poverty status or intervention condition. The objective of my research is to develop statistical procedures for building the mixture model that best suits the data.