

I'm just getting familiar with the capabilities of Mplus and am seeking comments about a particular type of problem I have encountered doing structural analyses of psychiatric symptom data with missing items. It arises from the use of algorithmic methods for determining diagnoses (usually to DSM criteria). Only the minimum number of questions is asked to arrive at or rule out a positive diagnosis for a particular disorder. For instance, for DSM Major Depressive Episode, positive cases must have one of the "depressed mood" or "loss of interest or pleasure" symptoms plus at least three others. Inquiry stops if the responses to both of these first two items are negative. If the response to either or both gatekeeper questions is positive, inquiry continues until three positive responses are obtained or the list is exhausted. Items on the list do not seem to be ordered according to any statistical criterion such as frequency of occurrence. This means that a very large number of questions are not asked of every subject. The current study is a large epidemiological one (N~10000). Other sources of missing data are negligible. The researchers were planning to undertake some form of factor analysis of the symptom data. My suggestions included (1) LCA of the data, including a "Not asked" category in addition to Present and Absent; (2) estimation of a factor model using multiple imputation to impute values for the unasked questions. (It seems to me that the data may be considered missing at random.) My concern is that both approaches (and particularly the LCA) may simply give back the structure that was used to decide which questions to ask. The imputation approach is also rather messy, but seems to be the only way to get a MAR analysis with categorical data. I'd welcome suggestions for alternative approaches I might try in Mplus or other software, or comments on what I've proposed.


Because of the skip patterns employed, it doesn't seem that this study was designed to study the dimensionality of depression. Factor analysis and LCA would seem to require yes/no answers to all of the questions in order to determine dimensions and classes. I don't believe the data can easily be used for more than classifying individuals as depressed/not depressed. Regarding your suggestions, approach 2 using multiple imputation would seem to depend too heavily on modeling assumptions, given that such a large fraction of individuals have missing data. You have MAR conditional on the first two questions, but for the nondepressed population you have missing data on all of the remaining depression items. Therefore, these individuals cannot be included in the analysis. The proper model here may be a kind of sequential model, sometimes used in econometrics. In the first stage you predict who will give a positive answer to the first two questions. In the second stage, you model responses to the remaining items for those individuals who gave a positive response to one of the first two questions. However, this does not provide information about dimensionality for the whole population. I am not aware of any commercial software that does this.

HeeJin Jun posted on Wednesday, January 24, 2007 - 4:54 pm



I want to do a BMI trajectory analysis. The population is adolescents and young adults (N is approximately 15000). At the beginning of the study there were 6 age groups (ages 9 to 14), and they have been followed for 8 years (7 times from 1996 to 2003, skipping 2002). Because age is an important factor, we want to treat age as the time point, but that gives us a lot of systematically missing observations; e.g., if a kid was 10 in 1996, then we don't have BMI at age 9 or at ages 17 through 22. We saw Example 6.12, individually-varying times of observation. Can we apply the method described in Example 6.12 to avoid the systematically missing observations? Example of coding:

i s | y9-y23, a9-a23;
s9 | y9 on a9
s9 | y10 on a10
s9 | y11 on a11
s9 | y12 on a12
s9 | y13 on a13
s9 | y14 on a14
s9 | y15 on a15
s9 | y16 on a16
s10 | y10 on a10
s10 | y11 on a11
s10 | y12 on a12
s10 | y13 on a13
s10 | y14 on a14
s10 | y15 on a15
s10 | y16 on a16
s10 | y17 on a17
. . .
s14 | y14 on a14
s14 | y15 on a15
s14 | y16 on a16
s14 | y17 on a17
s14 | y18 on a18
s14 | y19 on a19
s14 | y20 on a20
s14 | y21 on a21

Thanks a lot for your help.


Yes, you can use individually-varying times of observation in this case, but your code would be:

i s | y1-y8 AT a1-a8;

The outcome and age variables are from each measurement occasion, not each age. It is not necessary to have time-varying covariates.
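For readers following along, a fuller input around that MODEL line might look like the sketch below. It follows the pattern of User's Guide Example 6.12, where the AT option is paired with a TSCORES declaration in the VARIABLE command and TYPE = RANDOM in the ANALYSIS command. The file name, the id variable, and the missing-value code are assumptions made for illustration; they are not taken from the poster's data.

```
TITLE:    Growth model, individually-varying times (sketch)
DATA:     FILE = bmi_wide.dat;       ! hypothetical file name
VARIABLE: NAMES = id y1-y8 a1-a8;    ! y = BMI, a = age, per occasion
          USEVARIABLES = y1-y8 a1-a8;
          TSCORES = a1-a8;           ! ages serve as time scores
          MISSING = ALL (-999);      ! assumed missing-value code
ANALYSIS: TYPE = RANDOM;             ! used together with AT
MODEL:    i s | y1-y8 AT a1-a8;      ! line given in the reply
```

Each row holds one person's outcomes and ages by measurement occasion, so no columns are needed for ages at which a person was never observed.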

HeeJin Jun posted on Friday, January 26, 2007 - 6:47 pm



Dear Linda, Thank you so much for your response. We tried what you suggested and got the error message

*** ERROR in Model command
Growth factor indicators must be all observed or all latent.

We tried two ways:

1. MODEL: %OVERALL% i s | bmi96@age96 bmi97@age97 bmi98@age98 bmi99@age99 bmi00@age00 bmi01@age01 bmi03@age03;
2. MODEL: %OVERALL% i s | bmi96-bmi03@age96-age03;

Thank you for your help.

HeeJin Jun posted on Friday, January 26, 2007 - 7:35 pm



Dear Linda, Please disregard our last question. We figured out the problem, but now we have a new one. The new error message says:

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES.

THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.182D-16.

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING PARAMETER 28.

We increased the number of random starts:

STARTS = 500 200; STITERATIONS = 100;

Is there any remedy other than increasing the number of starts? Thanks for your help.


Please send your input, data, output, and license number to support@statmodel.com. 

Joop Hox posted on Friday, April 20, 2007 - 5:21 pm



A simple question on modeling incomplete data. I have panel data with dropout. I want to do a pattern mixture analysis. So I have 3 patterns: complete, random missing, and attrition. In the last group, by definition, ALL variables are missing at the last measurement occasion. Is there a way to use 4 variables in one group and only 3 in the other? I know you could work with dummies and ghost variables, but this is not very elegant. I am posting this here instead of mailing support because I think others might find this useful. Joop Hox


You need to have the same number of variables in each group. In the last group, the variables measured at the last time point will have all missing value flags. I think you can avoid problems of no variance for those variables by adding the option: VARIANCES = NOCHECK; to the DATA command. 
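As a rough sketch of what that setup could look like: the input below keeps the same four variables in every group and turns off the variance check in the DATA command. The file name, variable names, group labels, missing-value code, and the placeholder growth model are all assumptions for illustration. Whether estimation then runs cleanly for the attrition group is not something this sketch can guarantee; the reply above is itself tentative on that point.

```
DATA:     FILE = panel.dat;          ! hypothetical file name
          VARIANCES = NOCHECK;       ! skip the zero-variance check
VARIABLE: NAMES = pattern y1-y4;     ! hypothetical variable names
          USEVARIABLES = y1-y4;
          GROUPING = pattern (1 = complete 2 = randmiss 3 = attrit);
          MISSING = ALL (-999);      ! assumed code; y4 all -999 in attrit
MODEL:    i s | y1@0 y2@1 y3@2 y4@3; ! placeholder linear growth model
```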


I am fitting a Simplex model on repeatedly measured twin data. I have very high missingness in combination with structural missingness. This leads to problems with h1 estimation. Is there a way to disable H1 estimation in Mplus so I can at least inspect the parameter estimates my model would result in? 


Ask for NOCHISQUARE in the OUTPUT command. 
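In an input file that would be a single line; a minimal sketch, with the rest of the input left unchanged:

```
OUTPUT:   NOCHISQUARE;   ! do not compute the chi-square fit statistic
```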
