I am trying to impute missing data in a complex survey data set and would appreciate your help in getting it right. For the design, I have the variables strata (strata), com (clusters), and wt (individual weights). I did:
The best solution is to add the weight variable in the imputation
USEVARIABLES = ... wt;
and remove the command
WEIGHT = wt;
Usually the weight variable is computed from other variables such as race, gender, and SES. If that is the case, the best solution is to include those variables in the imputation instead of the weight variable.
You can also add dummy variables for each stratum if you want to use that information.
Bayesian estimation (which is used for the imputation) currently cannot use the weight variable directly.
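A minimal sketch of the setup described above, in case it helps. It treats wt as an ordinary variable in the imputation and leaves out the WEIGHT statement. The file name survey.dat, the analysis variables y1-y4, the missing-value code -999, and the stratum dummies s1-s3 (assumed to be already present in the data file, for a design with four strata) are all hypothetical placeholders, not taken from the original input.

```
DATA:
  FILE = survey.dat;                  ! hypothetical file name
VARIABLE:
  NAMES = y1-y4 strata com wt s1-s3;  ! y1-y4 and s1-s3 are hypothetical
  USEVARIABLES = y1-y4 wt s1-s3;      ! wt enters as a regular variable
  MISSING = ALL (-999);               ! hypothetical missing-value code
  ! note: no WEIGHT = wt; statement
ANALYSIS:
  TYPE = BASIC;
DATA IMPUTATION:
  IMPUTE = y1-y4;                     ! variables to be imputed
  NDATASETS = 20;                     ! number of imputed data sets
  SAVE = imp*.dat;                    ! * is replaced by the data set number
```

If the weights are derived from known background variables, those variables would replace wt in the USEVARIABLES list.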
The weight variable is actually based on sampling probabilities, so it depends on which stratum/cluster one is in and not on individual characteristics. Since I am going to include the stratum variable, I think the weight variable will not carry any additional information.
In that case, should I still use CLUSTER = com and TYPE = BASIC TWOLEVEL? Or should I use just TYPE = BASIC?
You should use TYPE = BASIC TWOLEVEL if you can, unless the cluster effects are very small. Look at the ICCs of the variables, and also take a look at https://www.statmodel.com/download/Imputations7.pdf, in particular Section 3.3 and the other sections on multilevel imputation.
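A sketch of what the two-level version of the imputation run might look like, under the same assumptions as before: survey.dat, y1-y4, and the missing-value code -999 are hypothetical placeholders. Only com and wt come from the question.

```
DATA:
  FILE = survey.dat;            ! hypothetical file name
VARIABLE:
  NAMES = y1-y4 strata com wt;  ! y1-y4 are hypothetical
  USEVARIABLES = y1-y4 wt;
  CLUSTER = com;                ! cluster variable from the question
  MISSING = ALL (-999);         ! hypothetical missing-value code
ANALYSIS:
  TYPE = BASIC TWOLEVEL;
DATA IMPUTATION:
  IMPUTE = y1-y4;
  NDATASETS = 20;
  SAVE = imp2l*.dat;            ! * is replaced by the data set number
```

Running the same input without the DATA IMPUTATION command first should be one way to inspect the estimated intraclass correlations before deciding between the two-level and single-level setups.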
Is it possible to use the DEFINE command to create variables in multiply imputed data sets? I would like to obtain exogenous indicator variables from a multiply imputed ordinal variable. (This seems preferable to creating the indicators before the imputation stage, since doing so would sacrifice information about the ordinal relationship among the indicators.)
But after adding the DEFINE command, the input file that had been working normally now produces an output file containing only the input instructions, with no results or error messages. Also, the "Mplus" activity box does not show that multiple data sets are being analyzed.