Mplus Discussion >> Missing Data Modeling >> Data with structural missingness

Andrew Mackinnon posted on Monday, June 04, 2001 - 8:39 pm

I'm just getting familiar with the capabilities of Mplus and am seeking comments
about a particular type of problem I have encountered doing structural analyses of
psychiatric symptom data with missing items. It arises from the use of algorithmic
methods for determining diagnoses (usually to DSM criteria). Only the minimum
number of questions is asked to arrive at or rule out a positive diagnosis for a
particular disorder.

For instance, for DSM Major Depressive Episode, positive cases must have one of the
"depressed mood" or "loss of interest or pleasure" symptoms plus at least three others.
Inquiry stops if both of the first two items are answered negatively. If either or both
gatekeeper questions are answered positively, inquiry continues until three positive
responses are obtained or the list is exhausted. Items on the list do not seem to be ordered
according to any statistical criterion such as frequency of occurrence. This means that
many questions are not asked of all subjects.

The current study is a large epidemiological one (N~10000). Other sources of
missing data are negligible. The researchers were planning to undertake some form of
factor analysis of the symptom data. My suggestions included (1) LCA of the data
including the "Not asked" category in addition to Present and Absent. (2) Estimation
of a factor model using multiple imputation methods to impute values for the unasked
questions. (It seems to me that the data may be considered missing at random.) My
concern is that both (and particularly the LCA) may simply give back the structure
that was used to decide what questions to ask. The imputation approach is also rather
messy, but seems to be the only way to get a MAR analysis with categorical data.

I'd welcome suggestions for alternative approaches I might try in Mplus or other
software, or comments on what I've proposed.
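Suggestion (1) could be set up in Mplus roughly as follows. This is only a sketch. The file name symptoms.dat, the item names dep1-dep9, the coding (0 = absent, 1 = present, 2 = not asked), and the choice of three classes are all hypothetical. The two gatekeeper items are always asked, so they are declared as binary. The remaining items are declared nominal so that "not asked" enters as its own response category.

```
TITLE:    Hypothetical LCA with "not asked" as a response category
DATA:     FILE = symptoms.dat;           ! hypothetical file name
VARIABLE: NAMES = id dep1-dep9;          ! hypothetical item names
          USEVARIABLES = dep1-dep9;
          CATEGORICAL = dep1 dep2;       ! gatekeepers: 0 = absent, 1 = present
          NOMINAL = dep3-dep9;           ! 0 = absent, 1 = present, 2 = not asked
          CLASSES = c(3);                ! number of classes is an assumption
ANALYSIS: TYPE = MIXTURE;
          STARTS = 200 50;
```

Whether an item is "not asked" is completely determined by earlier answers. Classes estimated this way may therefore largely reproduce the interview's skip rules, which is the concern raised above.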

Bengt O. Muthen posted on Friday, June 08, 2001 - 11:42 am

It doesn't seem that this study was designed to study the dimensionality of depression due to the skip patterns employed. Factor analysis and LCA would seem to require yes/no answers to all of the questions in order to determine dimensions and classes. I don't believe the data can be easily used for more than classifying individuals as depressed/not depressed.

Regarding your suggestions, approach 2 using multiple imputation would seem to depend too heavily on modeling assumptions, given that such a large fraction of individuals have missing data. You have MAR conditional on the first two questions, but for the non-depressed population (those who answer no to both gatekeeper questions) all remaining depression items are missing. Therefore, these individuals cannot be included in the analysis.

The proper model here may be a kind of sequential model, sometimes used in econometrics. In the first stage, you predict who will give a positive answer to the first two questions. In the second stage, you model responses to the remaining items for those individuals who gave a positive response to one of the first two questions. However, this does not provide information about dimensionality for the whole population. I am not aware of any commercial software that does this.
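A minimal sketch of the second stage only might look like this. It rests on several hypothetical choices:

- The file is symptoms.dat.
- dep1 and dep2 are the gatekeeper items, coded 0/1.
- dep3-dep9 are the remaining symptoms, coded 0/1, with unasked items set to a missing-value flag of -9.

The sketch restricts the sample to people who endorsed at least one gatekeeper question and fits a one-factor model by maximum likelihood. Within that subsample, ML treats the stop-after-three-positives missingness as MAR, because that missingness depends on answers already observed. The first-stage model for the gatekeeper responses is not shown. As noted above, this does not describe dimensionality in the whole population.

```
DATA:     FILE = symptoms.dat;              ! hypothetical file name
VARIABLE: NAMES = id dep1-dep9;             ! hypothetical item names
          USEVARIABLES = dep3-dep9;
          CATEGORICAL = dep3-dep9;          ! 0 = absent, 1 = present
          MISSING = ALL (-9);               ! -9 = not asked
          USEOBSERVATIONS = dep1 EQ 1 OR dep2 EQ 1;   ! gatekeeper-positive cases
ANALYSIS: ESTIMATOR = ML;                   ! full-information ML, MAR-based
MODEL:    f BY dep3-dep9;                   ! one factor is an assumption
```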

Hee-Jin Jun posted on Wednesday, January 24, 2007 - 10:54 am

Linda K. Muthen posted on Wednesday, January 24, 2007 - 11:43 am

Yes, you can use individually-varying times of observation in this case but your code would be:

i s | y1-y8 AT a1-a8;

The outcome and age variables correspond to each measurement occasion, not to each age. It is not necessary to have time-varying covariates.
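A fuller input around the MODEL statement above might look like the following. The names y1-y8 and a1-a8 come from the example. The id variable, the file name, and the data layout are assumptions.

The AT form needs two other pieces. The age variables must be declared with TSCORES, and the analysis must be declared as TYPE = RANDOM. With latent classes and %OVERALL%, use TYPE = MIXTURE RANDOM instead.

```
DATA:     FILE = growth.dat;          ! hypothetical file name
VARIABLE: NAMES = id y1-y8 a1-a8;     ! outcome and age at each occasion
          USEVARIABLES = y1-y8 a1-a8;
          TSCORES = a1-a8;            ! individually-varying times of observation
ANALYSIS: TYPE = RANDOM;              ! TYPE = MIXTURE RANDOM with latent classes
MODEL:    i s | y1-y8 AT a1-a8;
```

In a growth statement, @ fixes time scores to numbers (for example y1@0 y2@1). Time scores that vary by person are given as variables through AT and TSCORES instead.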

Hee-Jin Jun posted on Friday, January 26, 2007 - 12:47 pm

Dear Linda,

Thank you so much for your response.
We tried what you suggested and got the error message

*** ERROR in Model command
Growth factor indicators must be all observed or all latent.

We tried two ways:
1.
MODEL:
%OVERALL%
i s | bmi96@age96 bmi97@age97 bmi98@age98 bmi99@age99 bmi00@age00 bmi01@age01 bmi03@age03;

2.
MODEL:
%OVERALL%
i s | bmi96-bmi03@age96-age03;

Thank you for your help.

Hee-Jin Jun posted on Friday, January 26, 2007 - 1:35 pm

Dear Linda,

Please disregard our last question.
We figured out the problem, but now we have a new problem.

The new error message says:
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED
FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES.

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE
DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES
BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION
NUMBER IS 0.182D-16.

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE
COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE
AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR
STARTING VALUES. PROBLEM INVOLVING PARAMETER 28.

We increased the number of random starts:

STARTS = 500 200;
STITERATIONS = 100;

Is there any way to fix this other than increasing the number of random starts?

Thanks for your help.
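For readers who hit the same messages, here is a sketch of where these options sit. It assumes a growth mixture run like the one above (%OVERALL% with AT time scores). STARTS and STITERATIONS are the settings quoted in the post; the TYPE line and TECH1 are additions. TECH1 prints the parameter specification with each free parameter numbered. That is one way to see which parameter "PARAMETER 28" in the error message refers to.

```
ANALYSIS: TYPE = MIXTURE RANDOM;   ! assumed: growth mixture with AT time scores
          STARTS = 500 200;        ! initial-stage and final-stage random starts
          STITERATIONS = 100;      ! iterations for each initial-stage start
OUTPUT:   TECH1;                   ! numbered parameter specification
```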

Linda K. Muthen posted on Friday, January 26, 2007 - 2:46 pm

Please send your input, data, output, and license number to support@statmodel.com.

Joop Hox posted on Friday, April 20, 2007 - 11:21 am

A simple question on modeling incomplete data. I have panel data with dropout. I want to do a pattern-mixture analysis, so I have 3 patterns: complete, random missing, and attrition. In the last group, by definition, ALL variables at the last measurement occasion are missing. Is there a way to use 4 variables in one group and only 3 in the other? I know you could work with dummies and ghost variables, but this is not very elegant. I am posting this here instead of mailing support because I think others might find this useful.

Joop Hox

Linda K. Muthen posted on Saturday, April 21, 2007 - 8:08 am

You need to have the same number of variables in each group. In the last group, the variables measured at the last time point will contain only missing-value flags. I think you can avoid problems with zero variance for those variables by adding the option:

VARIANCES = NOCHECK;

to the DATA command.
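A sketch of the multiple-group setup described here follows. It assumes:

- four repeated measures, y1-y4;
- a missing-value flag of -9;
- a hypothetical variable, pattern, coded 1 = complete, 2 = random missing, 3 = attrition.

In group 3, y4 holds only flags. VARIANCES = NOCHECK in the DATA command is intended to keep the resulting zero variance from stopping the run. The MODEL command is left out, including any group-specific restrictions on y4 in the attrition group.

```
DATA:     FILE = panel.dat;                ! hypothetical file name
          VARIANCES = NOCHECK;             ! skip the variance check (y4 in group 3)
VARIABLE: NAMES = id y1-y4 pattern;        ! hypothetical variable names
          USEVARIABLES = y1-y4;
          MISSING = ALL (-9);
          GROUPING = pattern (1 = complete 2 = rndmiss 3 = dropout);
```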

Michel Nivard posted on Tuesday, June 21, 2011 - 5:57 am

I am fitting a simplex model on repeatedly measured twin data. I have very high missingness in combination with structural missingness. This leads to problems with H1 estimation. Is there a way to disable H1 estimation in Mplus so I can at least inspect the parameter estimates my model would produce?

Linda K. Muthen posted on Tuesday, June 21, 2011 - 10:26 am

Ask for NOCHISQUARE in the OUTPUT command.