Missing data and DEFINE
 Ivan Jacob Agaloos Pesigan posted on Saturday, April 30, 2016 - 12:53 pm
I have measured variables with missing data and I wish to create aggregate scores using DEFINE. I specified missingness using MISSING ARE ALL (-99). How does FIML work in this case? Does FIML estimate values for missing data at the level of the variables defined in NAMES before the aggregate scores are computed in DEFINE? Or are the means computed first in DEFINE and the missing values estimated afterwards? Any insight regarding this is welcome. Thanks very much.

VARIABLE:
NAMES ARE X1-X5 Y1-Y5 Z1-Z5;
USEVARIABLES ARE MEANX MEANY MEANZ;
MISSING ARE ALL (-99);

DEFINE:
MEANX = MEAN(X1 X2 X3 X4 X5);
MEANY = MEAN(Y1 Y2 Y3 Y4 Y5);
MEANZ = MEAN(Z1 Z2 Z3 Z4 Z5);

MODEL:
MEANZ ON MEANX MEANY;
 Linda K. Muthen posted on Saturday, April 30, 2016 - 2:29 pm
The DEFINE command creates or transforms variables prior to model estimation. FIML is not involved. See the DEFINE command in the user's guide, which explains how missing values are handled.
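
To illustrate the order of operations described above, here is a minimal Python sketch, not Mplus code. It assumes an available-item rule: the composite is the average of the non-missing items and is itself missing only when every item is missing. That rule is an assumption made for illustration; the user's guide remains the authority on what MEAN actually does. The function name `define_mean` and the example rows are hypothetical.

```python
import math

MISSING = -99  # missing-value flag, as in MISSING ARE ALL (-99)


def define_mean(items, missing=MISSING):
    """Average the non-missing items; return NaN if every item is missing.

    Assumption: an available-item mean, used here only for illustration.
    """
    observed = [v for v in items if v != missing]
    if not observed:
        return math.nan
    return sum(observed) / len(observed)


# Hypothetical rows of X1-X5 for three respondents
rows = [
    [4, 5, 3, 4, 4],            # complete
    [4, -99, 3, -99, 5],        # partly missing
    [-99, -99, -99, -99, -99],  # fully missing
]

# The composite is built before any model is estimated
meanx = [define_mean(r) for r in rows]
```

Under this assumption, only the third respondent has a missing MEANX when the model is estimated. The item-level gaps in the second row have already been absorbed into the composite, so estimation never sees them.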
 Aurelie Lange posted on Monday, May 02, 2016 - 6:37 am
Dear Dr Muthen,

I have 3 consecutive waves. I would like to define people who sustain their good scores during those 3 waves, and those that show a relapse at any of those 3 waves. I have 3 variables for which I would like to define the 'sustainers' and 'relapsers'; two of them are binary variables. The third variable is continuous. Here, a relapse is defined as a score below a certain threshold.

As I have missing data on these waves, and as the DEFINE command is run before FIML is used, I wanted to use multiple imputation. However, I can't run the DEFINE command in combination with type=imputation either, because I get a different number of 'sustainers' and 'relapsers' per imputed dataset. Therefore, I thought of defining the sustainers and relapsers (1 and 0) in each imputed dataset and then averaging these values across all imputed datasets to get a variable similar to 'chance of being a sustainer'.

Is this an appropriate solution?

Thank you so much!
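
The sustainer/relapser rule for the continuous variable can be sketched in Python as follows. This is only an illustration of the definition given in this post (a relapse is a score below the threshold at any of the 3 waves). The function name `classify`, the threshold value, and the scores are all hypothetical.

```python
def classify(scores, threshold):
    """Return 1 (sustainer) if no wave falls below the threshold, else 0 (relapser)."""
    return 1 if all(s >= threshold for s in scores) else 0


THRESHOLD = 10.0  # hypothetical cut-off, not taken from the thread

steady = [12.0, 11.5, 13.0]  # stays at or above the cut-off in all 3 waves
dipped = [12.0, 9.0, 13.0]   # falls below the cut-off at wave 2

steady_flag = classify(steady, THRESHOLD)
dipped_flag = classify(dipped, THRESHOLD)
```

Applied separately to each imputed dataset, a rule like this can give a different flag for the same person whenever an imputed score lands on the other side of the threshold.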
 Bengt O. Muthen posted on Wednesday, May 04, 2016 - 9:33 am
Creative, but I am not sure how reliable that approach is. Why not keep the continuous variable as is and do an LTA? Or do the dichotomization of the continuous variable first, outside Mplus.
 Aurelie Lange posted on Thursday, May 05, 2016 - 6:28 am
Dear Dr Muthen,

Thank you for your reply. However, I am not quite sure I understand what you mean. I impute the values for the three variables (2 binary and 1 continuous) which have been measured over 3 waves (let us call them a, b, and c, respectively). Then, outside of Mplus, I compute whether a participant declined or sustained on these three variables (a, b, and c), thus creating 3 binary variables (d, e, and f). Still outside of Mplus, I then compute an average across all 40 imputed datasets. So, if a participant is a decliner on variable d in 30 of the datasets and a sustainer in 10 of them, he would get a score of .75 (d_mean). This d_mean score is used in Mplus as a continuous dependent variable. Is this what you suggested in your post? If not, could you explain which continuous variable you are referring to?

Thank you!

Sincerely,
Aurelie
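
The averaging step described in this post amounts to the following arithmetic, sketched in Python. The sketch assumes the flag is coded 1 for the state observed in 30 of the 40 imputed datasets (the decliner state in the example above). The function name `flag_share` is hypothetical.

```python
def flag_share(flags):
    """Proportion of imputed datasets in which the binary flag equals 1."""
    return sum(flags) / len(flags)


# One hypothetical participant: flag d is 1 in 30 imputed datasets and 0 in 10
d_flags = [1] * 30 + [0] * 10

d_mean = flag_share(d_flags)  # 30 / 40
```

The result is a proportion between 0 and 1 for each participant, which is then treated as a continuous outcome in the model.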
 Bengt O. Muthen posted on Thursday, May 05, 2016 - 6:55 pm
I was referring to the variable for which you say "the third variable is continuous".

I think you should post this general analysis question on SEMNET.
 Aurelie Lange posted on Sunday, May 08, 2016 - 11:58 pm