I am performing standard multivariable linear regression (interval dependent variable) with a dataset that has 12% missing cases under listwise deletion. I am assuming MCAR or MAR. I would like to avail of full information maximum likelihood (FIML) estimation in Mplus as a means of handling the missing data. I am using the following estimator option: ANALYSIS: ESTIMATOR = MLR; However, the output indicates that 29 cases are excluded from the analysis (see warning message below), which I would not expect under FIML. How do I get Mplus to run FIML? Thank you. –James McMahon
*** WARNING Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 29
The model is estimated conditional on the x variables. Their means, variances, and covariances are not model parameters. If you don't want to lose these cases, mention the variances of all of the x variables in the MODEL command. Then distributional assumptions will be made about these variables, and their means, variances, and covariances will be model parameters.
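To relate the count in the warning to the listwise-deletion count, it can help to tabulate missingness patterns outside Mplus. The Python sketch below uses a small made-up dataset; the rows, the variable layout (y, x1, x2), and the use of None for missing values are all assumptions for illustration, not data from this thread. It counts the cases with missing on any x variable, which are the cases the warning refers to.

```python
# Hypothetical toy data: each row is (y, x1, x2); None marks a missing value.
rows = [
    (2.1, 0.5, 1.0),
    (None, 0.3, 0.8),   # missing on y only
    (1.7, None, 1.2),   # missing on an x variable
    (3.0, 0.9, None),   # missing on an x variable
    (None, None, 0.4),  # missing on y and on an x variable
    (2.4, 0.1, 0.7),
]

def tally_missingness(data):
    """Count cases by where their missing values fall."""
    counts = {"complete": 0, "missing_x": 0, "missing_y_only": 0}
    for y, *xs in data:
        if any(x is None for x in xs):
            counts["missing_x"] += 1        # the cases the warning counts
        elif y is None:
            counts["missing_y_only"] += 1
        else:
            counts["complete"] += 1
    return counts

counts = tally_missingness(rows)
# Listwise deletion drops every case with any missing value.
listwise_dropped = counts["missing_x"] + counts["missing_y_only"]
print(counts, listwise_dropped)
```

In this toy example listwise deletion would drop four of the six cases, three of them because of a missing x value.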
Like the original poster, I am performing standard multivariable linear regression with a dataset of 188 cases, 32.5% of which would be lost under listwise deletion, and I would like to use FIML instead. As you recommended to the original poster, I've included a line in the MODEL command that lists all of the x variables, as follows:
MODEL: y on x1 x2 x3; x1 x2 x3;
However, I have two follow-up questions:
1) We also have some missing data on the y variable. Is it okay to list the y variable on that second line as well, so that our full sample of 188 cases is included? E.g.:
MODEL: y on x1 x2 x3; y x1 x2 x3;
2) Our model has a large number of observed covariates. If I include all of those covariates in the first line (the regression line) as well as in the second line (to model their distributional parameters), Mplus gives me an error that I have more parameters than observations. If I exclude just a few of the variables from the distributional parameters line, I avoid this problem while still keeping the full sample of 188. Is it problematic to model the distributional parameters for only some of the covariates in the regression, rather than all of them? E.g., is this okay to do?:
MODEL: y on x1 x2 x3 x4 x5 x6; x1 x2 x4;
If so, is there a principled way (either conceptually- or statistically-driven) to choose which variables to leave out of the distributional parameters line?
This is discussed in section 10.4.2 of our new book.
1) You don't have to list the residual variance of y because it is in the model by default. Note, however, that subjects with complete data on the x's and missing on y do not contribute to the estimation of the regression coefficients (see that section).
2) If you exclude some of the x's from the variance list, they won't be allowed to correlate with the other x's. There really isn't a good solution to this problem.
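The parameter-count problem raised in question 2 can be made concrete with some arithmetic. The Python sketch below is an illustration under stated assumptions, not Mplus output. It assumes a single outcome regressed on p covariates, that k of those covariates are brought into the model by mentioning their variances, and that the means, variances, and pairwise covariances of those k variables are all freely estimated. The function name and the example values of p are hypothetical; only the sample size of 188 comes from the post above.

```python
def count_free_parameters(p, k):
    """Free parameters for y ON p covariates, with k covariates brought into the model.

    Assumption for illustration: the k variables get freely estimated means,
    variances, and pairwise covariances; the other p - k stay conditioned on.
    """
    if not 0 <= k <= p:
        raise ValueError("k must be between 0 and p")
    regression = p + 2                           # p slopes, 1 intercept, 1 residual variance
    covariate_part = 2 * k + k * (k - 1) // 2    # k means, k variances, k(k-1)/2 covariances
    return regression + covariate_part

n_cases = 188  # sample size reported in the post above
for p in (3, 10, 17, 20):                        # hypothetical numbers of covariates
    total = count_free_parameters(p, p)
    print(p, total, total > n_cases)
```

Under these assumptions the count grows quadratically in the number of covariates brought into the model, and with 188 cases it first exceeds the sample size at 17 covariates. That is consistent with the error disappearing once a few variables are dropped from the variance list, at the cost described in the reply above.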
Thanks, Bengt. A reviewer recently stated that FIML may not be an appropriate method for handling missing data in an LCA when you're working with non-normal data. Is this correct? If it's not, do you have any web notes that support FIML as an appropriate estimation method for non-normal data in an LCA framework?
"FIML" is ML estimation under the MAR (missing at random) assumption. For continuous outcomes, normality is assumed. This is probably what the reviewer refers to - "FIML" isn't perfect when you have non-normal continuous outcomes. But normality is not assumed with categorical or count outcomes. Therefore, there is no problem applying ML under MAR to count outcomes. I don't know which source could be a good reference - perhaps the book by Little & Rubin talks about counts; I don't have it here at the moment.