Data with structural missingness
Mplus Discussion > Missing Data Modeling >
 Andrew Mackinnon posted on Tuesday, June 05, 2001 - 2:39 am
I'm just getting familiar with the capabilities of Mplus and am seeking comments
about a particular type of problem I have encountered doing structural analyses of
psychiatric symptom data with missing items. It arises from the use of algorithmic
methods for determining diagnoses (usually to DSM criteria). Only the minimum
number of questions is asked to arrive at or rule out a positive diagnosis for a
particular disorder.

For instance, for DSM Major Depressive Episode, positive cases must have one of
"depressed mood" or "loss of interest or pleasure" symptoms plus at least three others.
Inquiry stops with negative responses to both of the first two items. If positive to either or
both gatekeeper questions, inquiry continues until three positive responses are
obtained or the list is exhausted. Items on the list do not seem to be ordered according
to any statistical criterion such as frequency of occurrence. This means that a very
large number of questions are not asked of every subject.

The current study is a large epidemiological one (N~10000). Other sources of
missing data are negligible. The researchers were planning to undertake some form of
factor analysis of the symptom data. My suggestions included (1) LCA of the data
including the "Not asked" category in addition to Present and Absent. (2) Estimation
of a factor model using multiple imputation methods to impute values for the unasked
questions. (It seems to me that the data may be considered missing at random.) My
concern is that both (and particularly the LCA) may simply give back the structure
that was used to decide what questions to ask. The imputation approach is also rather
messy, but seems to be the only way to get a MAR analysis with categorical data.

I'd welcome suggestions for alternative approaches I might try in Mplus or other
software, or comments on what I've proposed.
 Bengt O. Muthen posted on Friday, June 08, 2001 - 5:42 pm
It doesn't seem that this study was designed to study the dimensionality of depression due to the skip patterns employed. Factor analysis and LCA would seem to require yes/no answers to all of the questions in order to determine dimensions and classes. I don't believe the data can be easily used for more than classifying individuals as depressed/not depressed.

Regarding your suggestions, approach 2 using multiple imputation would seem to depend too heavily on modeling assumptions, given that such a large fraction of individuals have missing data. You have MAR conditional on the first two questions, but for the population of non-depressed you have missing data for all depression items. Therefore, these individuals cannot be included in the analysis.

The proper model here may be a kind of sequential model, sometimes used in econometrics. In the first stage you predict who will give a positive answer to the first two questions. In the second stage, you model responses for the remaining items for those individuals who gave a positive response to one of the first two questions. However, this does not provide information about dimensionality for the whole population. I am not aware of any commercial software that does this.
 Hee-Jin Jun posted on Wednesday, January 24, 2007 - 4:54 pm
I want to do a BMI trajectory analysis. The population is adolescents and young adults (N equals approximately 15000). At the beginning of the study there were 6 age groups (ages 9 to 14), and they have been followed for 8 years (7 times from 1996 to 2003, skipping 2002). Because age is an important factor, we want to treat age as the time point, but this gives us lots of systematic missing observations; e.g., if a kid was 10 in 1996, then we don't have BMI at age 9 or at ages 17 through 22.
We saw example 6.12, individually varying times of observation. Can we apply the method described in example 6.12 to avoid the systematic missing observations?

example of coding
i s | y9-y23, a9-a23;
s9 | y9 on a9
s9 | y10 on a10
s9 | y11 on a11
s9 | y12 on a12
s9 | y13 on a13
s9 | y14 on a14
s9 | y15 on a15
s9 | y16 on a16

s10 | y10 on a10
s10 | y11 on a11
s10 | y12 on a12
s10 | y13 on a13
s10 | y14 on a14
s10 | y15 on a15
s10 | y16 on a16
s10 | y17 on a17
. . .

s14 | y14 on a14
s14 | y15 on a15
s14 | y16 on a16
s14 | y17 on a17
s14 | y18 on a18
s14 | y19 on a19
s14 | y20 on a20
s14 | y21 on a21

Thanks a lot for your help.
 Linda K. Muthen posted on Wednesday, January 24, 2007 - 5:43 pm
Yes, you can use individually-varying times of observation in this case but your code would be:

i s | y1-y8 AT a1-a8;

The outcome and age variables are from each measurement occasion, not each age. It is not necessary to have time-varying covariates.
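
A minimal sketch of a complete input built around this statement, assuming seven BMI and age variables named as in the later posts of this thread. The file name and the missing-value code are hypothetical; the TSCORES and TYPE = RANDOM settings follow the setup of User's Guide example 6.12.

```
TITLE:    Growth model, individually-varying times (sketch)
DATA:     FILE = bmi.dat;            ! hypothetical file name
VARIABLE: NAMES = bmi96 bmi97 bmi98 bmi99 bmi00 bmi01 bmi03
                  age96 age97 age98 age99 age00 age01 age03;
          ! age at each occasion supplies the time scores
          TSCORES = age96 age97 age98 age99 age00 age01 age03;
          MISSING = ALL (-999);      ! hypothetical missing code
ANALYSIS: TYPE = RANDOM;             ! needed for AT
MODEL:    i s | bmi96 bmi97 bmi98 bmi99 bmi00 bmi01 bmi03 AT
                age96 age97 age98 age99 age00 age01 age03;
```

Each case contributes its own ages as time scores, so the data stay organized by measurement occasion and no age-by-age columns with structurally empty cells are needed.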
 Hee-Jin Jun posted on Friday, January 26, 2007 - 6:47 pm
Dear Linda,

Thank you so much for your response.
We tried what you suggested and got the error message

*** ERROR in Model command
Growth factor indicators must be all observed or all latent.

We tried two ways,
1.
MODEL:
%OVERALL%
i s | bmi96@age96 bmi97@age97 bmi98@age98 bmi99@age99 bmi00@age00 bmi01@age01 bmi03@age03;

2.
MODEL:
%OVERALL%
i s | bmi96-bmi03@age96-age03;

Thank you for your help.
 Hee-Jin Jun posted on Friday, January 26, 2007 - 7:35 pm
Dear Linda,

Please disregard our last question.
We figured out the problem, but we got a new problem...

New error message says,
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED
FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES.

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE
DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES
BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION
NUMBER IS 0.182D-16.

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE
COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE
AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR
STARTING VALUES. PROBLEM INVOLVING PARAMETER 28.

We increased the starting value

STARTS = 500 200;
STITERATIONS = 100;

Is there any way other than increasing the starting values?

Thanks for your help.
 Linda K. Muthen posted on Friday, January 26, 2007 - 8:46 pm
Please send your input, data, output, and license number to support@statmodel.com.
 Joop Hox posted on Friday, April 20, 2007 - 5:21 pm
A simple question on modeling incomplete data. I have panel data with dropout. I want to do a pattern mixture analysis. So I have 3 patterns: complete, random missing, and attrition. In the last group, by definition, ALL variables are missing at the last measurement occasion. Is there a way to use 4 variables in one group and only 3 in the other? I know you could work with dummies and ghost variables, but this is not very elegant. I am posting this here instead of mailing support because I think others might find this useful.

Joop Hox
 Linda K. Muthen posted on Saturday, April 21, 2007 - 2:08 pm
You need to have the same number of variables in each group. In the last group, the variables measured at the last time point will have all missing value flags. I think you can avoid problems of no variance for those variables by adding the option:

VARIANCES = NOCHECK;

to the DATA command.
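
As a sketch of where the option goes, assuming a hypothetical data file, hypothetical variable names, and a pattern indicator coded 1 to 3 (none of these come from the original post):

```
DATA:
  FILE = panel.dat;        ! hypothetical file name
  VARIANCES = NOCHECK;     ! skip the zero-variance check
VARIABLE:
  NAMES = pattern y1-y4;   ! hypothetical; y4 = last occasion
  GROUPING = pattern (1 = comp 2 = interm 3 = drop);
  MISSING = ALL (-999);    ! hypothetical missing value flag
```

In this layout every group keeps all four variables; in the attrition group (labelled drop here) y4 simply carries the missing value flag for every case.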
 Michel Nivard posted on Tuesday, June 21, 2011 - 11:57 am
I am fitting a simplex model on repeatedly measured twin data. I have very high missingness in combination with structural missingness. This leads to problems with H1 estimation. Is there a way to disable H1 estimation in Mplus so I can at least inspect the parameter estimates my model would result in?
 Linda K. Muthen posted on Tuesday, June 21, 2011 - 4:26 pm
Ask for NOCHISQUARE in the OUTPUT command.
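
In an input file this is a one-line addition; a minimal sketch, with the rest of the input left as it is:

```
OUTPUT:
  NOCHISQUARE;   ! skip the H1 model; no chi-square is printed
```

The parameter estimates of the specified model are still reported; only the chi-square test of fit against the unrestricted model is dropped.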