Ordered Polytomous data
Mplus Discussion > Missing Data Modeling >
 Sanjoy posted on Sunday, May 01, 2005 - 10:06 pm
Dear Professor/s ... I have a couple of quick questions regarding missing data analysis ...

Mine is an SEM with a 5-point categorical outcome indicator. In my final data set I don't have any missing covariates (i.e. no missing X's) ... missingness occurs only in the outcome indicator variable

Following is my code ...

TITLE: response WITH MISSING
DATA: FILE IS d:\mpluspaper1_missing.txt;
VARIABLE:
NAMES ARE X1-X19 Y1-Y4 XB1-XB6 XP1-XP9 R1-R9 B1-B11 T1-T4 MB MR MB3-MB5;
USEVARIABLES ARE X2 X5 X7-X12 X15 Y1 R7-R9 MB3-MB5;
CATEGORICAL ARE Y1 R7-R9 MB3-MB5;
MISSING ARE .;

! M in model statement indicates missing dependent variables


ANALYSIS: TYPE=MISSING;
PARAMETERIZATION=THETA;
ESTIMATOR=WLSMV;

MODEL: B by MB3-MB5;
R by R7-R9;
Y1 on B R X7-X12;
B on R X2 X8 X9 X11 X15 X10;
R on B X5 X9 X10 X12;

OUTPUT:
STANDARDIZED;
SAMPSTAT;


Q1. How do I know exactly what Mplus is doing ... I mean the mathematics behind it, the way we can be sure about WLS(MV) once we read your papers (83, 84, 95, 97)? Actually, Professor, this is something I have to report in my thesis.

Q2. Where should I put "H1" in the ANALYSIS command? Mplus says that in order to get SAMPSTAT with TYPE=MISSING I need to turn "H1" on.

Q3. Once we change the parameterization from theta to delta, the significance of the parameter(s) changes ... why?

Q4. I guess my results will be better if we treat my missing data as non-ignorable. What changes are needed in my Mplus code to do that?

Actually, Prof. ... apart from testing my model hypotheses I'm also checking three other things ... what happens to the overall fit of the model when we replace "Don't Know" by
1. 0 (where "don't know" stands for no importance)
2. 3 (where "don't know" stands for the neutral point)
3. a genuine missing value

We have "don't know" responses on those 3 indicator variables, which we represent as MB3-MB5. In our particular situation it is quite reasonable to assume that "don't know"/missingness could be a function of X, such as the respondent's demographic features. As you can see from our model statement, MB3-MB5 load on the latent factor B, which in turn is regressed on different X's.

Thanks in zillions, with Regards
 bmuthen posted on Tuesday, May 03, 2005 - 10:06 am
Q1. The answers are in the Version 3 User's Guide (see e.g. chapter 1).

Q2. In the Analysis command, TYPE= ...H1;

Q3. Long story - see web note #4. Basically, this is in line with standardized slopes not having the same SEs as raw slopes.

Q4. See Q1 answer

The last questions are better put forth on SEMNET and discussed with your advisor.
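
As a sketch of the Q2 answer, the ANALYSIS command from the original post would gain H1 on the TYPE line. The other options are copied unchanged from that post; this assumes the Version 3 syntax discussed in this thread, and later Mplus versions may treat MISSING and H1 differently.

```mplus
ANALYSIS: TYPE = MISSING H1;        ! H1 added so SAMPSTAT can be requested with MISSING
          PARAMETERIZATION = THETA;
          ESTIMATOR = WLSMV;
OUTPUT:   STANDARDIZED SAMPSTAT;
```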
 Sanjoy posted on Tuesday, May 03, 2005 - 6:32 pm
Thank you, Professor ... web note #4 is really helpful, and "H1" is working fine now. Regarding the User's Guide, it describes everything Mplus can do but not the logic behind the program ... I mean something like the way your articles explain things. Today I got two of your articles (the later one is a note) on missing data (#47 and #93), thanks to Maija. These are in fact the kind of article I was looking for, especially No. 93, which helped me a lot to understand how non-ignorable missing data are handled in the latent variable framework.

With regards
 bmuthen posted on Tuesday, May 03, 2005 - 6:38 pm
With WLSMV and no exogenous observed variables (no "x's"), Mplus simply uses the pairwise present approach (see e.g. Little & Rubin's missing data book). With x's, missingness is allowed to be predicted by x's in the MAR sense of Little-Rubin.

With ML, regular MAR is used.
 Sanjoy posted on Thursday, May 05, 2005 - 8:45 pm
Thank you, Professor ... I hope I am now starting to understand the issues behind missing data handling and its analysis.

Now, Professor ... with Mplus, unlike other software, we can do a great deal with missing data, particularly when we have multivariate dependent variables with categorical indicators. To the best of my knowledge, no other econometric software can do such things. However, one thing seems to be missing here, and that is imputation. Is there a statistical reason behind this? I mean, in your experience, do you find imputation techniques inefficient, or something like that?

If not, this is what I plan to do: use your WLSMV estimator, since it is the only estimator that can handle my situation efficiently, and run it over 5 or 10 imputed data sets (though I suppose 5 is OK under moderate missingness).

I have three very quick questions

1. What is your advice ... should I go for it?

2. Are all ".dat" files the same in nature (like .dat in Mplus or in GAUSS)? I'm doing the imputation in GAUSS and I have noticed that the ".dat" file GAUSS creates is some kind of encrypted file. I can convert it back into a ".txt" file with GAUSS, but I'm just wondering.

3. I now have five files ready (in ASCII / txt format). Following your example 12.13, how can I combine them so that I can run the imputation analysis? I read the page, but can't understand how one FILE option will take care of five files!

Thanks and regards
 Linda K. Muthen posted on Friday, May 06, 2005 - 7:09 am
Imputation and maximum likelihood estimation for missing data are asymptotically equivalent.

1. You should check the literature for the number of imputed data sets to use.

2. I don't know if all .dat files are the same.

3. Look up IMPUTATION in the index of the Mplus User's Guide. It shows how the file should look.
 Sanjoy posted on Friday, May 06, 2005 - 4:48 pm
Oops, madam ... thanks for your suggestion, but I couldn't get it to run. This is what I have written. I have made 5 imputed data sets, saved on the "D" drive ...

Each data set has 240 rows and 8 columns.


TITLE: imputation TEST
DATA: FILE IS d:\impute1.txt;
FILE IS d:\impute2.txt;
FILE IS d:\impute3.txt;
FILE IS d:\impute4.txt;
FILE IS d:\impute5.txt;
TYPE=IMPUTATION;
NOBSERVATIONS=240;
VARIABLE:
NAMES ARE A1-A4 B1-B4;
CATEGORICAL = B1-B4;

MODEL: a by A1-A4;
b by B1-B4;

Mplus is saying
"*** ERROR in Data command
There are fewer NOBSERVATIONS entries than groups in the analysis."

I have tried replacing 240 with 240*5=1200 in NOBSERVATIONS ... it gives the same error message.

Could you suggest the correct input, please?

thanks and regards
 Linda K. Muthen posted on Friday, May 06, 2005 - 5:12 pm
Example 12.13 shows an input for multiple imputation. Please compare your input to that. The names of the five data sets should be in an external ASCII file not in the input file. The ASCII file with the names of the data sets is the file name that should be referenced in the FILE option.
 Sanjoy posted on Saturday, May 07, 2005 - 4:35 pm
Sorry, madam, I'm still struggling with this ... in Mplus example 12.13 it says "the FILE option of the DATA command is used to give the names of the multiple imputation data set to be analyzed. the file named using the FILE option of the DATA command must contain a list of the names of the multiple imputation data sets to be analyzed"

I tried it this way, which failed:

TITLE: imputation TEST
DATA: FILE IS d:\impute1.txt
impute2.txt
impute3.txt
impute4.txt
impute5.txt;
TYPE=IMPUTATION;

Mplus is saying
*** ERROR in Data command
The file specified for the FILE option cannot be found. Check that this
file exists: d:\impute1.txt d:\impute2.txt d:\impute3.txt d:\impute4.txt d:\i

while there are five data sets and they do exist on the "d" drive ... to make sure, I ran them separately and they work

I have ALSO tried putting ";" after each data set name; that did not work either


How can I do this ... "The names of the five data sets should be in an external ASCII file not in the input file. The ASCII file with the names of the data sets is the file name that should be referenced in the FILE option."... as you have advised me earlier


thanks for your patience , regards ...sanjoy
 Linda K. Muthen posted on Saturday, May 07, 2005 - 5:05 pm
That is not what it says. Following is what it says: "The FILE option is used to give the name of the file that contains the names of the multiple imputation data sets to be analyzed." So the names should be in a file. You should not list all of the names using the FILE option. If you look at the example, there is one file name, imput.dat. The file imput.dat contains the names of the data sets and this is shown in the example.
 Sanjoy posted on Saturday, May 07, 2005 - 6:51 pm
this time I got it ... thannnnnnnk you so much, for your advice and your patience, of course :-)...

let me know madam if I'm still wrong

(for the folks who are doing imputation for the first time)

1. open a NOTEPAD window

2. paste five names of the file that you have created through imputation ... say e.g.
impute1.txt
impute2.txt
impute3.txt
impute4.txt
impute5.txt
(do NOT mention the directory name here like d:\impute1.txt ... mine is here "d" drive)

3. Save the file under a name, say "multiple.txt", on the "d" drive and close the window, so in your command it will look like
DATA: FILE IS d:\multiple.txt;
TYPE=IMPUTATION;

(Now if you have a partitioned drive, like I have "c" and "d", you can NOT save this file on one drive and keep those 5 imputed files on the other drive ...)
 Linda K. Muthen posted on Sunday, May 08, 2005 - 6:44 am
Looks correct.
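
Putting the steps from the post above together, a complete input might look like the following sketch. It reuses the variable names and model from the earlier failed attempt and assumes the five imputed files hold individual-level data and sit in the same folder as the list file. NOBSERVATIONS is left out on that assumption.

```mplus
TITLE:    imputation test
DATA:     FILE IS d:\multiple.txt;   ! the list file, not the imputed data sets themselves
          TYPE = IMPUTATION;
          ! d:\multiple.txt contains one data set name per line:
          !   impute1.txt
          !   impute2.txt
          !   impute3.txt
          !   impute4.txt
          !   impute5.txt
VARIABLE: NAMES ARE A1-A4 B1-B4;
          CATEGORICAL ARE B1-B4;
MODEL:    a BY A1-A4;
          b BY B1-B4;
```

Only one FILE option appears in the input. Listing several FILE options, as in the earlier attempt, is what led Mplus to ask for more NOBSERVATIONS entries.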
 Huang Wu posted on Saturday, June 09, 2018 - 2:52 pm
The answers are insightful. I am still wondering whether all the imputed data sets should be combined in one file, or kept in separate files distinguished by file name?
 Linda K. Muthen posted on Saturday, June 09, 2018 - 6:15 pm
Each imputation data set should be in a separate file. The names of the data sets are listed in a file that is specified using the FILE option. See Example 13.13 in the user's guide.
 Jan Stochl posted on Wednesday, June 05, 2019 - 6:21 am
Dear Drs Muthen and/or Asparouhov,


**FIRST PART
I have a dataset with missingness on several variables. I imputed the data (with multiple imputations), as follows:

NDATASETS = 10;
SAVE = FRN_missimp*.dat;
ANALYSIS: TYPE = BASIC;
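
For readers following along, a fuller version of an imputation step like this might look like the sketch below. The input file name, variable names, and missing-value code are hypothetical placeholders, not taken from the post; only NDATASETS, SAVE, and TYPE = BASIC come from it.

```mplus
DATA:     FILE IS FRN_original.dat;   ! hypothetical name for the data with missing values
VARIABLE: NAMES ARE u1-u6;            ! hypothetical variable names
          MISSING ARE ALL (-999);     ! hypothetical missing-value code
DATA IMPUTATION:
          IMPUTE = u1-u6 (c);         ! (c) marks variables to be imputed as categorical
          NDATASETS = 10;
          SAVE = FRN_missimp*.dat;    ! the asterisk is replaced by the data set number
ANALYSIS: TYPE = BASIC;
```

A run like this should also write a list file naming the saved data sets (expected here to be FRN_missimplist.dat). That list file is what a later analysis would reference with FILE together with TYPE = IMPUTATION.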

I then use the resulting data set to run several analyses on it. One is a factor model for which I use several variables from the imputed data and for which I would like to save the pooled factor scores. However, when doing so I get the following warning (I have Mplus 8):


*** WARNING in SAVEDATA command
The FILE option is not available for TYPE=MONTECARLO or TYPE=IMPUTATION.
The FILE option will be ignored.
*** WARNING in SAVEDATA command
The SAVE option is not available for TYPE=MONTECARLO or TYPE=IMPUTATION.
The SAVE option will be ignored.
2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS


To give you a bit of background on my analyses, I am running a longitudinal CFA for 2 time points (with different levels of invariance) and categorical indicators.

to be continued:
 Jan Stochl posted on Wednesday, June 05, 2019 - 6:22 am
**SECOND PART
Therefore my first question is, is there at all a way to retrieve the pooled factor scores when using imputed data sets?

If not, I think there are three other options: First, I could use my original data set with missingness and run the (longitudinal/invariant) CFAs with WLSMV (given all variables are ordered categorical) and pairwise present (excluding the variables that are entirely missing at time 1 and/or time 2). Second, I could use my original data set with missingness and run the (longitudinal/invariant) CFAs with MLR (although my variables are ordered categorical) and FIML (excluding the variables that are entirely missing at time 1 and/or time 2). This way incidental missingness would be accounted for, but not attrition. At first glance the WLSMV option gives a better fit than the FIML option. However, a third option may be to use Stochastic Regression Imputation (one single imputed data set) and to just feed this imputed data set into the longitudinal CFA. This should take the attrition into account, but would probably have other limitations, e.g. attenuated standard errors.

**to be continued
 Jan Stochl posted on Wednesday, June 05, 2019 - 6:22 am
**THIRD PART
So my second question would be, if multiple imputation is not an option, which of those three options would you recommend?

My third and last question is whether there would be any other way to deal with the missing data, so that I could also account for attrition?

Thank you very much for your help,

Jan Stochl and Jessica Fritz
 Bengt O. Muthen posted on Thursday, June 06, 2019 - 4:41 pm
Not sure what you mean by pooled factor scores but perhaps you mean some sort of average over MI draws. That seems more suitable for a Bayes treatment for which plausible values factor scores (several draws) are available.

But why complicate things with MI. Why not use FIML instead of MI? ML and therefore FIML is available also for categorical outcomes. As for attrition, see the paper on our website:

Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.
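
As a rough sketch of the FIML suggestion, a two-time-point CFA with categorical indicators could be run on the original, non-imputed data along these lines. The file name, variable names, and missing-value code are hypothetical, and the invariance constraints mentioned in the question are omitted for brevity.

```mplus
DATA:     FILE IS FRN_original.dat;         ! hypothetical: original data with missing values
VARIABLE: NAMES ARE u11-u13 u21-u23;        ! hypothetical: 3 items at each of 2 time points
          CATEGORICAL ARE u11-u13 u21-u23;
          MISSING ARE ALL (-999);           ! hypothetical missing-value code
ANALYSIS: ESTIMATOR = MLR;                  ! ML with categorical outcomes relies on numerical integration
MODEL:    f1 BY u11-u13;                    ! time 1 factor
          f2 BY u21-u23;                    ! time 2 factor
```

With an ML estimator, all available observations contribute to the likelihood under MAR. The non-ignorable dropout models in the cited paper go beyond this sketch.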

We ask that postings be limited to 1 window.