Fit with missing data
 Maren Winkler posted on Monday, January 18, 2010 - 3:32 am
Hi,

In their recent paper, Wu, West and Taylor (2009) point out the following:
"However, as noted by Enders (2001), when FIML is used for Type II longitudinal data, the chi-square test statistic cannot be calculated using the general form, (N - 1)F(FIML), because there is no single N that is applicable to the entire sample. Researchers can calculate the fit indices by following a two-step procedure: (a) estimate the saturated model and the hypothesized model using FIML, and (b) calculate the chi-square test statistics using the formula -2[lnL_FIML(hypothesized) - lnL_FIML(saturated)]. Then, the fit indices based on the chi-square test statistics can be calculated by hand using standard formulas (see Table 3)."

I'm using Mplus 5.21.
I have missing data and include five auxiliary variables in my model.

Does the chi-square test statistic I get use the above formula?
If not, does the output contain information on both log-likelihoods so that I can do the calculations myself?

Thank you very much
 Linda K. Muthen posted on Monday, January 18, 2010 - 9:17 am
With TYPE=MISSING, Mplus uses the formula -2[lnL_FIML(hypothesized) - lnL_FIML(saturated)].
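For readers who want to check the statistic by hand, here is a minimal Python sketch of the two-step calculation quoted above. The log-likelihood values and parameter counts are made-up numbers for illustration, not output from any real run, and the function names are hypothetical.

```python
def lr_chi_square(loglik_hypothesized, loglik_saturated):
    """Likelihood-ratio chi-square: -2 * (lnL_hypothesized - lnL_saturated)."""
    return -2.0 * (loglik_hypothesized - loglik_saturated)


def lr_degrees_of_freedom(n_params_hypothesized, n_params_saturated):
    """Degrees of freedom: difference in the number of free parameters."""
    return n_params_saturated - n_params_hypothesized


# Illustrative values only (assumed, not taken from the thread).
loglik_h0 = -2450.30   # hypothesized model, estimated with FIML
loglik_h1 = -2431.10   # saturated (unrestricted) model, estimated with FIML

chi_square = lr_chi_square(loglik_h0, loglik_h1)
df = lr_degrees_of_freedom(n_params_hypothesized=25, n_params_saturated=44)

print(f"chi-square = {chi_square:.2f}, df = {df}")
```

The saturated model can never fit worse than the hypothesized one, so the statistic should come out non-negative; a negative value usually means the two log-likelihoods were swapped or come from different data.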
 Maren Winkler posted on Tuesday, January 19, 2010 - 12:51 am
Dear Linda,

1. Are fit indices (e.g. CFI, TLI, RMSEA, SRMR) calculated according to the above formula?

2. In addition, Davey, Savla and Luo (2005) point out the following:
"Comparison of model fit indexes generated by EM and FIML may thus be informative about the extent to which one's evaluation of a model is likely to be affected by missing data."
I use the following commands in Mplus:

" MISSING ARE BLANK;

AUXILIARY = (m) z1 z2 z3 z4 z5;

ANALYSIS:

TYPE = MISSING;
ESTIMATOR = ML;"

Does Mplus use an EM algorithm for estimating the model?

 Linda K. Muthen posted on Tuesday, January 19, 2010 - 8:19 am
1. Yes.
2. Yes.
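As a rough illustration of computing fit indices "by hand using standard formulas", the Python sketch below derives RMSEA, CFI and TLI from a model chi-square and a baseline (independence model) chi-square. The formulas are the common textbook versions; the input values are invented, and whether N or N - 1 appears in the RMSEA denominator differs between programs, so treat that choice as an assumption here.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA using N in the denominator (some programs use N - 1 instead)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * n))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index based on non-centrality estimates."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_base - df_base, 0.0)
    return 1.0 if den == 0.0 else 1.0 - num / den


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


# Illustrative values only (assumed, not taken from the thread).
chi2_m, df_m = 38.4, 19      # hypothesized model
chi2_b, df_b = 500.0, 28     # baseline (independence) model
n_total = 1200               # total sample size, including incomplete cases

print(f"RMSEA = {rmsea(chi2_m, df_m, n_total):.3f}")
print(f"CFI   = {cfi(chi2_m, df_m, chi2_b, df_b):.3f}")
print(f"TLI   = {tli(chi2_m, df_m, chi2_b, df_b):.3f}")
```

SRMR is left out because it is computed from residual correlations rather than from the chi-square.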
 Maren Winkler posted on Thursday, January 21, 2010 - 1:44 am
Dear Linda,

I've got an additional question after having read a paper by Bodner (2008).
He recommends reporting the parameter lambda which "represents the fraction of missing information for a given parameter in a particular data set". It is defined as

lambda = 1 - [(v(m)+1)*U(m)] / [(v(m)+3)*T(m)]

Is it possible to estimate this value with TYPE = MISSING? If so, what do I have to add to the syntax?

Thanks a lot!
 Bengt O. Muthen posted on Thursday, January 21, 2010 - 10:46 am
Those quantities cannot currently be obtained from the Mplus output.
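For context, the fraction of missing information is normally computed from a set of multiply imputed analyses using Rubin's (1987) combining rules, not from a single ML run. The Python sketch below follows the standard form of those rules, lambda = 1 - [(v + 1) / (v + 3)] * (U / T), where U is the average within-imputation variance, T the total variance and v the degrees of freedom. The estimates and variances are made-up numbers, and the function name is hypothetical.

```python
from statistics import mean, variance


def fraction_missing_information(estimates, sampling_variances):
    """Rubin's fraction of missing information for one parameter.

    estimates:          parameter estimate from each of the m imputed data sets
    sampling_variances: squared standard error from each imputed data set
    """
    m = len(estimates)
    u_bar = mean(sampling_variances)       # within-imputation variance U
    b = variance(estimates)                # between-imputation variance B
    if b == 0:
        return 0.0                         # no between-imputation variation
    t = u_bar + (1 + 1 / m) * b            # total variance T
    r = (1 + 1 / m) * b / u_bar            # relative increase in variance
    nu = (m - 1) * (1 + 1 / r) ** 2        # degrees of freedom v
    return 1 - ((nu + 1) / (nu + 3)) * (u_bar / t)


# Illustrative values only: m = 5 imputations of a single slope.
est = [0.50, 0.54, 0.46, 0.52, 0.48]
var = [0.01, 0.01, 0.01, 0.01, 0.01]

print(f"lambda = {fraction_missing_information(est, var):.3f}")
```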
 Maren Winkler posted on Wednesday, April 21, 2010 - 10:29 am
Dear Drs. Muthén,

I have a question concerning the amount of missing data.

On our predictor side, we have data for nearly 1200 subjects (8 variables + five auxiliary variables). However, on the criterion side we only have data for 80 subjects (11 variables).
We assume missing at random - subjects were admitted (and thus have criterion data) due to their test results (predictors).

We are using the auxiliary option and ML in Mplus to model our data.

How much can I trust the results I get - parameter estimates, model fit?
For multiple imputation, there are "rules of thumb" on how many imputations one needs in order to get reliable estimates. Are there any rules of thumb for using FIML?

Thank you very much!
 Bengt O. Muthen posted on Wednesday, April 21, 2010 - 1:00 pm
It sounds like you regress 1 DV on 8 predictors and to that you add 5 auxiliary(m) variables as missing data correlates. If that's right, the Mplus aux(m) ML approach under MAR ("FIML") for the 1200 subjects would work well since you have only 10 regression parameters (8 slopes, 1 intercept, and 1 residual variance) to estimate based on the 80 subjects who got selected. I think it would be difficult to have reliable general rules of thumb for either MI or FIML since it depends so much on the specific setting.
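The reasoning that selection on observed predictors does not distort the regression itself can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not from the thread: one predictor instead of eight, ordinary least squares on the selected cases instead of FIML, and a much larger simulated sample so that the result is stable. The admission rate of 80/1200 mirrors the numbers in the question; everything else is invented.

```python
import random


def slope_and_corr(xs, ys):
    """OLS slope of y on x and the Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


def simulate(n=200_000, admit_rate=80 / 1200, true_slope=0.5, seed=1):
    """Admit only the top scorers on x and compare full vs. selected sample."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0, 1) for x in xs]
    cutoff = sorted(xs)[int(n * (1 - admit_rate))]
    selected = [(x, y) for x, y in zip(xs, ys) if x >= cutoff]
    sel_x = [x for x, _ in selected]
    sel_y = [y for _, y in selected]
    return slope_and_corr(xs, ys), slope_and_corr(sel_x, sel_y)


(full_slope, full_corr), (sel_slope, sel_corr) = simulate()
print(f"full sample:     slope = {full_slope:.2f}, corr = {full_corr:.2f}")
print(f"selected sample: slope = {sel_slope:.2f}, corr = {sel_corr:.2f}")
```

In runs like this the slope in the selected group should stay close to the true value, while the correlation shrinks because of range restriction. That is why regression-type parameters are the safer quantities to interpret under this kind of selection.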
 Maren Winkler posted on Thursday, April 22, 2010 - 11:11 pm
This sounds pretty reassuring!
Just to make sure I've described my model sufficiently:

On the predictor side I have eight variables, loading on two factors in a nested factor model:

F1 BY x1 TO x8;
F2 BY x4 TO x8;
F1 WITH F2@0;

(This model fits the data very well in the sample of our 1200 subjects).

For the 80 subjects who were admitted, I have grades in 11 courses, which load well on one factor:

F3 BY y1 TO y11;

In my SEM I regress F3 on F1 and F2.

I define my five auxiliary variables as aux (m).

Would your optimistic view hold for this scenario?

Thank you so much for your help!
 Bengt O. Muthen posted on Friday, April 23, 2010 - 9:22 am
Yes. In fact, I have an article on a similar situation regarding GMAT selection:

Muthén, B. (1989). Factor structure in groups selected on observed scores. British Journal of Mathematical and Statistical Psychology, 42, 81-90.

This is paper #23 at my UCLA web site

http://www.gseis.ucla.edu/faculty/muthen/full_paper_list.htm

It discusses the selection issues involved.
 Maren Winkler posted on Tuesday, May 11, 2010 - 6:25 am
Thanks for this article! Have you also run analyses on the predictive validity of GMAT's factor structure in predicting grade point average?
 Linda K. Muthen posted on Tuesday, May 11, 2010 - 9:42 am
No.