Dear Professor(s) ... I have a couple of quick questions regarding missing data analysis ...
Mine is an SEM with a 5-point categorical outcome indicator; in my final-use data set I don't have any missing covariates (i.e., no missing X's) ... missingness is present only in the outcome indicator variable
Following is my code ...
TITLE: response WITH MISSING
DATA: FILE IS d:\mpluspaper1_missing.txt;
VARIABLE: NAMES ARE X1-X19 Y1-Y4 XB1-XB6 XP1-XP9 R1-R9 B1-B11 T1-T4 MB MR MB3-MB5;
  USEVARIABLES ARE X2 X5 X7-X12 X15 Y1 R7-R9 MB3-MB5;
  CATEGORICAL ARE Y1 R7-R9 MB3-MB5;
  MISSING ARE .;
! M in model statement indicates missing dependent variables
MODEL: B BY MB3-MB5;
  R BY R7-R9;
  Y1 ON B R X7-X12;
  B ON R X2 X8 X9 X11 X15 X10;
  R ON B X5 X9 X10 X12;
OUTPUT: STANDARDIZED; SAMPSTAT;
Q1. How do I know what exactly Mplus is doing ... I mean the mathematics behind it, the way we can say for sure what WLS(MV) does once we read your papers (83, 84, 95, 97) ... actually, Professor, this is something I have to report in my thesis
Q2. Where should I put "H1" in the ANALYSIS command, since Mplus is saying that in order to access SAMPSTAT under missing data I need to turn "H1" on?
Q3. Once we change the parameterization from theta to delta, the significance of the parameter(s) changes ... why?
Q4. I guess my results will be better if we can treat my missing data as non-ignorable; what are the necessary changes to my Mplus code in order to do that?
Actually, Prof. ... apart from testing my model hypotheses, I'm also checking three other things ... what happens to the overall fit of the model when we replace "Don't Know" by: 1. 0 (where "don't know" stands for no importance); 2. 3 (where "don't know" stands for the neutral point); 3. "don't know" being treated as a genuine missing value
We have "don't know" on those 3 indicator variables, which we represent as MB3-MB5 ... it's quite reasonable to assume that in our particular situation "don't know"/missingness is, or could be, a function of X, i.e. the respondent's different demographic features; as you can see from our MODEL statement, MB3-MB5 load onto the latent factor B, which in turn is regressed on different X's
Thanks in zillions, with Regards
bmuthen posted on Tuesday, May 03, 2005 - 10:06 am
Q1. The answers are in the Version 3 User's Guide (see e.g. chapter 1).
Q2. In the Analysis command, TYPE= ...H1;
Q3. Long story - see web note #4. Basically, this is in line with standardized slopes not having the same SEs as raw slopes.
Q4. See Q1 answer
The last questions are better put forth on SEMNET and discussed with your advisor.
Thank you, Professor ... web note #4 is really helpful, and "H1" is working fine now ... regarding the User's Guide: it describes everything that Mplus can do, but not the inner workings ... I mean, something like the way your articles explain things ... today I got those two of your articles on missing data (#47 and #93; the latter is a note) ... thanks to Maija ... I was in fact looking for articles like these two, especially No. 93, which helped me a lot to understand how non-ignorable missing data are handled in a latent variable framework
With WLSMV and no exogenous observed variables (no "x's"), Mplus simply uses the pairwise present approach (see e.g. Little & Rubin's missing data book). With x's, missingness is allowed to be predicted by x's in the MAR sense of Little-Rubin.
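To see what "pairwise present" means in practice: every bivariate statistic (e.g. a correlation) is computed from all cases where both variables are observed, so different statistics can use different subsets of cases. A minimal sketch with toy data (the numbers are illustrative, not from this thread):

```python
import math

# Pairwise-present ("available case") correlation: each pair of
# variables uses every row in which BOTH variables are observed
# (None marks a missing value).

def pairwise_corr(x, y):
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sx = math.sqrt(sum((a - mx) ** 2 for a, _ in pairs))
    sy = math.sqrt(sum((b - my) ** 2 for _, b in pairs))
    return sxy / (sx * sy)

# Toy data with missingness only in y (as in this thread's outcome):
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, None, 8, None, 12]
print(pairwise_corr(x, y))  # computed from the 4 complete pairs
```

Complete-case ("listwise") deletion would instead drop any row with a missing value before computing all statistics; pairwise present keeps more information per statistic.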
Thank you, Professor ... I hope I'm now starting to grasp the issues behind missing data handling and its analysis
Now, Professor ... with Mplus, unlike other software, we can do a great deal with missing data, particularly when we have multivariate dependent variables with categorical indicators ... at least to the best of my knowledge, I can't recall any other econometric software that can do such things. However, there is one thing we have been missing here, and that is imputation ... is there a statistical reason behind that ... I mean, on the whole, has your experience been that imputation is not efficient, or something like that?
If it is not ... then this is what I have planned: I'm going to use your WLSMV, since this is the only estimator that can handle my situation efficiently ... and I'm going to run it over 5/10 imputed data sets (though I suppose 5 is OK under moderate missingness).
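A rough way to justify the choice between 5 and 10 imputations is Rubin's relative-efficiency formula, RE = (1 + γ/m)^(-1), where γ is the fraction of missing information and m is the number of imputations. A minimal sketch (the γ = 0.3 value is an illustrative assumption for "moderate" missingness, not a number from this thread):

```python
# Rubin's (1987) relative efficiency of an estimate based on
# m imputations, relative to infinitely many imputations:
#   RE = 1 / (1 + gamma / m)
# where gamma is the fraction of missing information.

def relative_efficiency(gamma: float, m: int) -> float:
    """Relative efficiency of m imputations vs. infinitely many."""
    return 1.0 / (1.0 + gamma / m)

gamma = 0.3  # assumed fraction of missing information (illustrative)
for m in (5, 10):
    print(m, round(relative_efficiency(gamma, m), 3))
# With gamma = 0.3: m = 5 gives ~0.943, m = 10 gives ~0.971,
# so 5 imputations already recover most of the efficiency.
```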
I have three very quick questions
1. What is your advice ... should I go for it?
2. Are all ".dat" files the same in nature (like a .dat in Mplus versus in GAUSS)? ... since I'm doing the imputation in GAUSS, and I have noticed that the ".dat" file GAUSS creates is some kind of binary (encoded) file ... well, I can convert them back into ".txt" files with GAUSS ... but I'm just wondering
3. Now I have made five files ready (in ASCII/txt format) ... following your Example 12.13, HOW can I COMBINE them so that I can run the analysis over the imputations ... I read the page, but can't understand how one "FILE" command will take care of five files!
Oops, madam ... thanks for your suggestion ... but I couldn't get it to run ... this is what I have written ... I have made 5 imputed data sets saved on the "D" drive ...
each data set has 240 rows and 8 columns
TITLE: imputation TEST
DATA: FILE IS d:\impute1.txt;
  FILE IS d:\impute2.txt;
  FILE IS d:\impute3.txt;
  FILE IS d:\impute4.txt;
  FILE IS d:\impute5.txt;
  TYPE = IMPUTATION;
  NOBSERVATIONS = 240;
VARIABLE: NAMES ARE A1-A4 B1-B4;
  CATEGORICAL = B1-B4;
MODEL: a by A1-A4; b by B1-B4;
MPlus is saying "*** ERROR in Data command There are fewer NOBSERVATIONS entries than groups in the analysis."
I have tried replacing 240 by 240*5 = 1200 in NOBSERVATIONS ... it gives the same error message
Example 12.13 shows an input for multiple imputation. Please compare your input to that. The names of the five data sets should be in an external ASCII file not in the input file. The ASCII file with the names of the data sets is the file name that should be referenced in the FILE option.
Sorry, madam, I'm still struggling with this ... Mplus Example 12.13 says: "the FILE option of the DATA command is used to give the names of the multiple imputation data sets to be analyzed. The file named using the FILE option of the DATA command must contain a list of the names of the multiple imputation data sets to be analyzed"
I have tried in this way which failed
TITLE: imputation TEST
DATA: FILE IS d:\impute1.txt impute2.txt impute3.txt impute4.txt impute5.txt;
  TYPE = IMPUTATION;
MPlus is saying *** ERROR in Data command The file specified for the FILE option cannot be found. Check that this file exists: d:\impute1.txt d:\impute2.txt d:\impute3.txt d:\impute4.txt d:\i
while there are five data sets and they do exist on the "d" drive ... to make sure, I ran them separately and they work
I have ALSO tried putting ";" after each data set name; that did not work either
How can I do this ... "The names of the five data sets should be in an external ASCII file not in the input file. The ASCII file with the names of the data sets is the file name that should be referenced in the FILE option."... as you have advised me earlier
That is not what it says. Following is what it says: "The FILE option is used to give the name of the file that contains the names of the multiple imputation data sets to be analyzed." So the names should be in a file. You should not list all of the names using the FILE option. If you look at the example, there is one file name, imput.dat. The file imput.dat contains the names of the data sets and this is shown in the example.
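To make this two-level setup concrete, here is a sketch of what the input and the list file would look like, using the file names from this thread (the list-file name "multiple.txt" and the drive letter are just the poster's example, and the sketch follows the structure of UG Example 12.13):

```
! Contents of d:\multiple.txt -- nothing but the five data set names:
!   impute1.txt
!   impute2.txt
!   impute3.txt
!   impute4.txt
!   impute5.txt

TITLE: imputation TEST
DATA: FILE IS d:\multiple.txt;
  TYPE = IMPUTATION;
VARIABLE: NAMES ARE A1-A4 B1-B4;
  CATEGORICAL = B1-B4;
MODEL: a BY A1-A4;
  b BY B1-B4;
```

The FILE option names one file, and that file in turn lists the imputed data sets; Mplus then analyzes each data set and pools the results.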
This time I got it ... thank you so much, for your advice and your patience, of course :-) ...
let me know madam if I'm still wrong
(for the folks who are doing imputation for the first time)
1. Open a NOTEPAD window.
2. Paste the five names of the files that you created through imputation, one per line, e.g. impute1.txt, impute2.txt, impute3.txt, impute4.txt, impute5.txt (do NOT include the directory name here, like d:\impute1.txt ... mine is the "d" drive).
3. Close the window and save the file under a name, say "multiple", on the "d" drive, so in your command it will look like: DATA: FILE IS d:\multiple.txt; TYPE=IMPUTATION;
(Now, if you have a partitioned drive like I have ("c" and "d"), you can NOT save this file on one drive and keep those 5 imputed files on the other ...)
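The list file itself is just plain text, one data set name per line. If you prefer to build it programmatically rather than in Notepad, a minimal Python sketch (the file names are the ones from this thread; it writes to the current directory instead of a hard-coded drive):

```python
# Build the list file that Mplus' FILE option will point at when
# TYPE = IMPUTATION is used: it contains only the data set names,
# one per line -- no Mplus commands, no directory prefixes.

names = [f"impute{i}.txt" for i in range(1, 6)]

with open("multiple.txt", "w") as f:
    f.write("\n".join(names) + "\n")

# The imputed data sets and multiple.txt should sit in the same
# directory (as noted above, splitting them across drives fails).
```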
Each imputation data set should be in a separate file. The names of the data sets are listed in a file that is specified using the FILE option. See Example 13.13 in the user's guide.
Jan Stochl posted on Wednesday, June 05, 2019 - 6:21 am
Dear Drs Muthen and/or Asparouhov,
**FIRST PART I have a dataset with missingness on several variables. I imputed the data (with multiple imputation), as follows:
DATA IMPUTATION:
  NDATASETS = 10;
  SAVE = FRN_missimp*.dat;
ANALYSIS: TYPE = BASIC;
I then use the resulting data sets to run several analyses. One is a factor model for which I use several variables from the imputed data, and for which I would like to save the pooled factor scores. However, when doing so I get the following warnings (I have Mplus 8):
*** WARNING in SAVEDATA command
The FILE option is not available for TYPE=MONTECARLO or TYPE=IMPUTATION. The FILE option will be ignored.
*** WARNING in SAVEDATA command
The SAVE option is not available for TYPE=MONTECARLO or TYPE=IMPUTATION. The SAVE option will be ignored.
2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
To give you a bit of background on my analyses: I am running a longitudinal CFA for 2 time points (with different levels of invariance) and categorical indicators.
to be continued:
Jan Stochl posted on Wednesday, June 05, 2019 - 6:22 am
**SECOND PART Therefore my first question is, is there at all a way to retrieve the pooled factor scores when using imputed data sets?
If not, I think there are three other options. First, I could use my original data set with missingness and run the (longitudinal/invariance) CFAs with WLSMV (given all variables are ordered categorical) and pairwise present (excluding the variables that are entirely missing at time 1 and/or time 2). Second, I could use my original data set with missingness and run the (longitudinal/invariance) CFAs with MLR (although my variables are ordered categorical) and FIML (excluding the variables that are entirely missing at time 1 and/or time 2). This way incidental missingness would be accounted for, but not attrition. At first glance, the WLSMV option gives a better fit than the FIML option. However, a third option may be to use stochastic regression imputation (a single imputed data set) and just feed this imputed data set into the longitudinal CFA. This should take attrition into account, but would probably have other limitations, e.g. attenuated standard errors.
**to be continued
Jan Stochl posted on Wednesday, June 05, 2019 - 6:22 am
**THIRD PART So my second question would be: if multiple imputation is not an option, which of those three options would you recommend?
My third and last question is whether there would be any other way to deal with the missing data, so that I could also account for attrition.
Not sure what you mean by pooled factor scores but perhaps you mean some sort of average over MI draws. That seems more suitable for a Bayes treatment for which plausible values factor scores (several draws) are available.
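For a scalar quantity estimated in each imputed data set, "averaging over MI draws" is usually done with Rubin's combining rules: the pooled estimate is the mean of the per-imputation estimates, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. A minimal sketch (the estimates and SEs below are made up for illustration, not from any real analysis):

```python
from statistics import mean, variance

def pool_rubin(estimates, ses):
    """Combine one scalar estimate across m imputations via Rubin's rules.

    Returns (pooled estimate, pooled standard error)."""
    m = len(estimates)
    qbar = mean(estimates)              # pooled point estimate
    w = mean(se ** 2 for se in ses)     # within-imputation variance
    b = variance(estimates)             # between-imputation variance
    t = w + (1 + 1 / m) * b             # total variance
    return qbar, t ** 0.5

# Illustrative estimates and SEs from m = 5 imputed data sets:
est = [0.50, 0.55, 0.45, 0.52, 0.48]
se = [0.10, 0.10, 0.10, 0.10, 0.10]
print(pool_rubin(est, se))
```

Note that the pooled SE exceeds each per-imputation SE, since it also carries the between-imputation uncertainty.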
But why complicate things with MI? Why not use FIML instead of MI? ML, and therefore FIML, is available also for categorical outcomes. As for attrition, see the paper on our website:
Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.