I am attempting to replicate the missing-data modelling approach of Falkenström, Granström, and Holmqvist (2013; see http://tinyurl.com/y9kochuv). In an Mplus two-level model, they regressed between-level slopes on dummy-coded variables capturing k-1 monotone missing-data patterns (i.e., Pattern Mixture Modeling), and within-level intercepts on a dummy-coded variable reflecting whether observations were missing or present (i.e., Selection Modeling).
Q1: I tried to integrate the NMAR syntax into a TWO-LEVEL model by adding the essentials of UG v8 Ex 11.3 to the %WITHIN% level and the essentials of Ex 11.4 to the %BETWEEN% level, but failed. Is this even possible? Note that I am using a long data structure.
Q2: I manually created dummy-coded variables for each level/NMAR method. The %BETWEEN% level (Pattern Mixture) regressions work well, but the %WITHIN% level (Selection Model) variable is flagged as having zero variance. It is simply a variable with 1s where data are missing and 0s where data are present. There are no missing values at the first of the 4 time points. Any ideas?
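For reference, this is roughly how I construct both sets of dummies from the long file before reading it into Mplus. This is a minimal sketch in Python with toy data; the variable names (person id, time, outcome) and the 4-time-point layout are just illustrative:

```python
from collections import defaultdict

# Toy long-format records: (person id, time point, outcome; None = missing).
rows = [
    (1, 1, 2.0), (1, 2, 1.5), (1, 3, None), (1, 4, None),   # drops out after t2
    (2, 1, 3.1), (2, 2, 2.8), (2, 3, 2.5), (2, 4, 2.2),     # completer
    (3, 1, 1.9), (3, 2, None), (3, 3, None), (3, 4, None),  # drops out after t1
]

# Selection-model indicator (within level): 1 if the observation is missing.
miss = [(pid, t, 0 if y is not None else 1) for pid, t, y in rows]

# Monotone dropout pattern per person (between level): last observed time point.
last_obs = defaultdict(int)
for pid, t, y in rows:
    if y is not None:
        last_obs[pid] = max(last_obs[pid], t)

# k-1 pattern dummies: one per dropout pattern, completers as reference group.
n_times = 4
pattern_dummies = {
    pid: [1 if last == t else 0 for t in range(1, n_times)]
    for pid, last in last_obs.items()
}
```

The pattern dummies are constant within person (between level); the missingness indicator varies across rows within person (within level).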
I wondered whether the "DATA MISSING" function was available in multilevel models (MLMs), but it appears not: DATA MISSING operates on wide-format data, which is incompatible with MLMs, which use the long format.
A potential solution is to create the necessary dummy variables manually. For Selection Modeling, this would involve regressing a variable coding whether an observation is missing (= 1) or present (= 0) onto the outcome measure at the %WITHIN% level of an MLM. The problem is that there is no missing data at the baseline time point, which is a common scenario (e.g., STAR*D), and a zero-variance error is thrown.
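To illustrate what I take the error to stem from: restricted to the baseline rows, the indicator is constant, even though it varies across the full long file. A sketch with hypothetical data:

```python
from statistics import pvariance

# Missingness indicator per (person, time); nobody is missing at baseline (t = 1).
miss = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 1),
    (2, 1, 0), (2, 2, 0), (2, 3, 0), (2, 4, 0),
    (3, 1, 0), (3, 2, 1), (3, 3, 1), (3, 4, 1),
]

baseline = [m for _, t, m in miss if t == 1]
overall = [m for _, _, m in miss]

print(pvariance(baseline))   # constant at baseline -> variance is 0.0
print(pvariance(overall))    # positive across the full long file
```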
I am wondering whether there is an alternative approach to achieve this, or whether you have any suggestions for an amendment.
True, but as I'm sure you know, studies in the social sciences typically use the long format to take advantage of its missing-data handling and computational efficiency.
So, basically, DATA MISSING is incompatible with long data because it is designed to convert wide-format data.
The issue is that when I manually create dummy-coded dropout variables and regress them onto the intercept and slope factors with the restrictions mentioned in the STAR*D paper, the model is not identified (this applies to both the PM and SM models).
Unless you have very long longitudinal data, I think the single-level wide approach is preferable to the two-level long approach, and that's why we promote and focus on the wide approach. It is more flexible, for instance easily handling tests of measurement non-invariance over time and allowing residual variances to vary over time. The wide and long approaches handle missing data under MAR the same way, contrary to what you seem to suggest.
I don't think a two-level, long dropout approach could more easily handle PM and SM models.