Adam Hafdahl posted on Wednesday, October 23, 2002 - 3:46 pm
I'd appreciate any thoughts about how to analyze some data that may or may not be well-suited for multilevel analysis. (Similar queries posted to the SEMNET and MULTILEVEL lists have generated little response, except a few suggestions that Mplus may offer something appropriate.)
A colleague in Agricultural Economics has mail-survey data from about 76 respondents (staff members at various agencies) who each rated 17 policies (all aimed at resolving the same natural resources issue) on 6 qualities (e.g., fairness, efficacy, farmer resistance, preference) using a nine-point Likert-type response scale. She's interested mainly in relationships among qualities and has in mind a particular path model with preference as the ultimate outcome and various correlations and paths among the other qualities. The respondents were a fairly representative convenience sample from the types of agencies to which she'd like to generalize; although the policies cover most of the conceivable ones germane to this specific issue, her broader research agenda concerns policies regarding other natural resources issues as well.
As someone who's relatively naive about multilevel modeling, I suspect there's a multilevel structure to these data -- at least in a repeated-measures sense -- but can't describe it or see how to respect it when analyzing the relationships among qualities. What seems clear is that treating each respondent-policy pair as an independent 6-variate observation is inappropriate.
Some colleagues and I have considered a few strategies, but I'll defer describing them for now. I'd appreciate any advice, including references of published examples with a similar data structure. Thanks in advance.
Adam Hafdahl University of Missouri-Columbia
bmuthen posted on Wednesday, October 23, 2002 - 5:47 pm
I think you are right that this can be seen as a multilevel situation. You have 76 independent observations and in the multilevel perspective that would constitute level 2, i.e. the "cluster unit". Level 2 covariates such as agency type can be included. Level 1 is the 17 policies, i.e. the "members" of the clusters. And you have multivariate outcomes - 6 qualities. In multilevel modeling, you would treat the members of the clusters (say students in a school, or repeated measures in a person) as statistically equivalent, that is obeying the same model; of course, the means of the outcomes can shift as a function of level 1 covariates. In the student/school analogy, the 17 policies should be thought of as a random sample of students in that school. Typically, multilevel modeling considers level 1 relationships - such as the path analysis among the 6 outcomes that you refer to - as possibly varying across clusters (across your 76 respondents), either in terms of intercepts or in terms of slopes. And, you want to study how much such variation there is and what predictors of this variation you might have.
If you think this captures what you want to do, Mplus is ready to do it.
Thanks for your explanation of how these data might be considered multilevel in nature. I'd like to clarify one point and pose three follow-up questions. By way of clarification, each respondent rated the same 17 policies, so unlike the typical student/school situation the clusters (respondents) and members (policies) are fully crossed.
1. Is this crossed design likely to influence how the analysis should be handled?
2. Is it important that these policies are nearly the entire population of conceivable policies relevant to this particular natural resources issue? By analogy, this might be like having access to nearly all the children of interest in each school, or measuring nearly all time points of interest. Could these 17 policies be treated as fixed realizations of the Policy variable? (Again, we might hope the model for policies about this particular issue would generalize to other issues, but we have no empirical evidence.)
3. Are the numbers of respondents and/or policies dangerously close to being too small for an appropriate analysis? Where might I look for guidelines about this?
bmuthen posted on Friday, October 25, 2002 - 9:19 am
Ah, I see. Perhaps I jumped to conclusions. A straightforward way of viewing this is as a data matrix with 76 rows and 17 x 6 = 102 columns. I think you said the 6 variables were going into a path analysis. Perhaps the 17 policies, particularly if they cover nearly all policies, could be viewed as "fixed" rather than "random" effects, and therefore be treated as covariates in the path analysis.
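To make the two layouts concrete, here is a minimal pandas sketch (using simulated ratings, not the actual survey data) of the wide 76 x 102 matrix described above, reshaped to a long format with one row per respondent-policy pair and the 17 policies coded as fixed-effect dummy covariates. Column names and the `q{k}_p{j}` naming scheme are illustrative assumptions, not anything from the original study.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_resp, n_pol, n_qual = 76, 17, 6

# Wide layout: one row per respondent, 17 x 6 = 102 rating columns
# named q{quality}_p{policy} (an assumed naming scheme).
wide = pd.DataFrame(
    rng.integers(1, 10, size=(n_resp, n_pol * n_qual)),
    columns=[f"q{q}_p{p}" for p in range(n_pol) for q in range(n_qual)],
)
wide["resp"] = range(n_resp)

# Long layout: one row per respondent-policy pair (76 x 17 = 1,292 rows),
# with 6 quality columns and 16 policy dummies (policy 0 as reference),
# ready for a "policies as fixed covariates" path analysis.
long = pd.wide_to_long(
    wide, stubnames=[f"q{q}_p" for q in range(n_qual)],
    i="resp", j="policy"
).reset_index()
long = long.rename(columns={f"q{q}_p": f"q{q}" for q in range(n_qual)})
dummies = pd.get_dummies(long["policy"], prefix="pol", drop_first=True)
long = pd.concat([long, dummies], axis=1)

print(long.shape)  # 1,292 rows; resp + policy + 6 qualities + 16 dummies
```

Note that the 1,292 long-format rows are not independent observations; only the 76 respondents are, which is the point raised later in the thread.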
Upon returning to this problem after several weeks away from it, I'm still unable to think clearly about reasonable modeling strategies. The central issue that has confused me from the beginning is how to handle the two sources of (co)variation: Is it covariation across respondents, policies, or both sources that we wish our path model to reflect? Frankly, I'm unable to distinguish among the substantive questions each choice would address. For instance, what types of research questions would warrant modeling covariation among respondents versus among policies? Does it even make sense to model covariation among both?
Although I've made a bit of progress by thinking about these data in terms of multivariate generalizability theory, three-mode factor analysis, and multilevel SEM, I'm still at a loss. It's quite possible this problem is simpler than I'm making it.
My currently favored ad-hoc strategy is as follows: Assuming the same covariance matrix among the six qualities holds for all 1,292 respondent-policy pairs, first subtract the respondent and policy means from the data and estimate this covariance matrix from the residuals, then use standard SEM software to conduct path analyses on this single covariance matrix. Am I correct that this strategy models covariation among respondents and policies simultaneously? If this is a reasonable strategy, I have four remaining questions:
1. What is the appropriate sample size for standard errors and other results that need this (e.g., fit indices), and how can I trick the software into handling this appropriately (or adjust the results manually)?
2. Is there a more straightforward way to implement this strategy, such as with dummy variables for respondents and policies?
3. Can the assumption of homogeneity of covariance matrices be relaxed to allow, say, the covariance matrix -- or perhaps coefficients in the path model -- to vary across respondents or across policies?
4. With only one replication per respondent-policy pair, can I do anything to investigate whether there's a substantial Respondent x Policy interaction remaining after the marginal means are subtracted?
I would be grateful for any further thoughts about conceptualizing this problem and about the above strategy or other approaches (e.g., it's unclear to me exactly how to implement your suggestion to treat Policy as a covariate or what the substantive interpretation of this would be). References to similar data sets or related methodologies would also be helpful.
bmuthen posted on Monday, January 13, 2003 - 10:35 am
If I understand your suggestion correctly, you are proposing the use of a 6 x 6 sample covariance matrix created from 76 x 17 = 1,292 observations. This would be acting as if you have independent observations among all 1,292 observations, whereas there is true independence only among the 76 respondents. I would instead lean towards looking at a 6 x 6 sample covariance matrix for the 76 respondents for one policy at a time among the 17, or analyzing all policies jointly by putting policy in as dummy covariates. Other Mplus Discussion readers are encouraged to jump into the discussion.
Dear Drs. Muthen and Muthen, Our data consist of students nested within classrooms nested within teachers nested within schools (with no weight and N=60). And we want to investigate a simple mediation analysis (A --> B --> C, and A --> C) on level 1 only. Our main question is how to take the random effects into account in order to compute the standard errors accurately. We have decided to ignore the random effects and to use the bootstrap approach (syntax below). Is this approach correct? If not, what is the best way to proceed? Thank you.
ANALYSIS:
BOOTSTRAP = 1000;
MODEL:
B ON A (a);
C ON B (b);
C ON A;
MODEL CONSTRAINT:
NEW(m);
m = a*b;
If you use 3-level analysis, bootstrap is not available. Instead just let random intercept variances be estimated on levels 2 and 3. See UG for examples. The indirect effect on level 1 follows the same formula.
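The indirect effect referred to here is the product-of-coefficients formula from the MODEL CONSTRAINT above: m = a*b, where a is the B-on-A slope and b is the C-on-B slope. A minimal single-level sketch of that formula, using simulated data and ordinary least squares (it ignores the multilevel structure, which the actual 3-level Mplus run would handle via random intercepts):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Simulated mediation: A -> B (slope 0.5), B -> C (slope 0.7),
# plus a direct A -> C path (slope 0.2). All values assumed.
A = rng.normal(size=n)
B = 0.5 * A + rng.normal(size=n)
C = 0.7 * B + 0.2 * A + rng.normal(size=n)

def ols_slopes(y, X):
    """Least-squares slopes of y on the columns of X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[1:]

a = ols_slopes(B, A[:, None])[0]                # B ON A
b = ols_slopes(C, np.column_stack([B, A]))[0]   # C ON B, controlling for A
indirect = a * b                                # m = a*b, as in MODEL CONSTRAINT
print(indirect)
```

With these simulated slopes the estimate should land near 0.5 * 0.7 = 0.35; in the 3-level analysis the same product formula applies to the level 1 (within) slopes.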