Missing by design

Anonymous posted on Monday, October 22, 2001 - 12:19 pm

Dear Linda/Bengt

May I ask you one question? I am trying to use the missing-by-design method following example 24.2 in the Mplus manual. My data have two time points, and my subjects' ages range from 4 to 9. I am planning to fit a latent growth model using age rather than the data collection times. My question is whether it is OK to conduct latent growth modeling in this situation. That is, I have 6 time points (ages) in my LGM, but each subject has only two observations and four missing points. Does my plan sound OK? Do I have too many missing points relative to the number of observations?
Thanks so much.

bmuthen posted on Tuesday, October 23, 2001 - 9:41 am

It is possible to do growth modeling in your case. The growth model that you can fit is essentially determined by the number of time points a given individual has - that is, 2 in your case. With 2 time points you can only fit a restricted model, such as random intercept but fixed (not random) slope. This can be specified as zero slope variance and zero intercept-slope covariance.
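
A minimal sketch of the restricted model described above, in Mplus input syntax. The file name, the variable names (y4-y9 for the outcome at ages 4 to 9), and the missing-value code -99 are assumptions for illustration; each child would have values in two of the six columns and the missing flag in the rest.

```mplus
DATA:     FILE IS growth_by_age.dat;   ! hypothetical file name
VARIABLE: NAMES = id y4-y9;            ! hypothetical: outcome at ages 4-9
          USEVARIABLES = y4-y9;
          MISSING = ALL (-99);         ! assumed missing-value flag
ANALYSIS: TYPE = MISSING;
MODEL:    i s | y4@0 y5@1 y6@2 y7@3 y8@4 y9@5;
          s@0;                         ! slope variance fixed at zero
          i WITH s@0;                  ! intercept-slope covariance at zero
```

The two `@0` statements are the "zero slope variance and zero intercept-slope covariance" restrictions; the slope mean is still estimated.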

Anonymous posted on Tuesday, March 15, 2005 - 8:15 am

We are fitting a growth curve model to cohort-sequential data covering ages 3-22, which we are handling with FIML estimation using type = missing (there are many different missing data patterns). We have time-varying covariates that follow the same patterns of missingness. But because these are exogenous observed variables, they are not allowed to have missing values. Is it possible with Mplus to estimate the effects of time-varying covariates that themselves have multiple patterns of missingness (without doing imputation)?

Thanks!

Linda K. Muthen posted on Tuesday, March 15, 2005 - 8:41 am

You would have to bring them into the model to do this. This would mean that they would take on distributional assumptions as if they were endogenous. To do this, you can mention their variances in the model command.
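
As a rough illustration of "mentioning their variances", the MODEL command might look like the sketch below. The names y1-y4 (outcomes) and x1-x4 (time-varying covariates) are hypothetical, and the growth specification is only a placeholder.

```mplus
MODEL:    i s | y1@0 y2@1 y3@2 y4@3;
          y1 ON x1;      ! time-varying covariate effects
          y2 ON x2;
          y3 ON x3;
          y4 ON x4;
          x1-x4;         ! naming the variances brings x1-x4 into the model
```

Once x1-x4 are part of the model, they are handled like the other dependent variables, including the distributional assumptions noted above.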

Anonymous posted on Monday, May 30, 2005 - 11:47 pm

Is it correct that it is not possible to use the pattern statement for defining data missing by design when using categorical variables in an EFA or CFA? If so, is there a way to get around this other than treating the variables as interval variables?

Linda K. Muthen posted on Tuesday, May 31, 2005 - 6:46 am

I am assuming that you must have tried this and gotten an error message to that effect. This seems possible given that missing was originally developed for continuous outcomes. You can get around this by doing your analysis in two steps. In the first step, do not treat the variables as categorical. Use the PATTERN option and do a TYPE = MISSING BASIC; in conjunction with the SAVEDATA command. In the second step, analyze the saved data while declaring the outcomes as categorical.
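
The two steps might be set up roughly as follows. This is a sketch under assumptions, not a tested setup: the file names, the pattern variable `booklet`, and the items u1-u6 are all hypothetical, and the two parts are separate input files.

```mplus
! ---- Step 1 (first input file): items treated as continuous ----
DATA:     FILE IS items.dat;            ! hypothetical file name
VARIABLE: NAMES = booklet u1-u6;        ! booklet = pattern variable
          USEVARIABLES = u1-u6;
          PATTERN IS booklet (1 = u1 u2 u3 u4
                              2 = u1 u2 u5 u6);
ANALYSIS: TYPE = MISSING BASIC;
SAVEDATA: FILE IS step1.dat;            ! hypothetical file name

! ---- Step 2 (second input file): outcomes declared categorical ----
DATA:     FILE IS step1.dat;
VARIABLE: NAMES = u1-u6;                ! check order in step 1 output
          CATEGORICAL = u1-u6;
          MISSING = *;                  ! saved file marks missing with *
ANALYSIS: TYPE = MISSING;
MODEL:    f BY u1-u6;                   ! placeholder one-factor model
```

The step 1 output lists the order and format of the saved variables, which is what the NAMES statement in step 2 should follow.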

Anonymous posted on Thursday, June 09, 2005 - 4:55 pm

That's right. Thank you very much for your response. I will try this.

S. Oesterle posted on Saturday, June 11, 2005 - 2:18 pm

I would like to estimate a Multiple Cohort Growth Model similar to the example you have on your Special Analyses page. However, my dataset is already arranged by age (not wave). That is, I don't need to rearrange my data. How do I set up the model in this case? I tried using the PATTERN option in combination with my cohort variable to define the missing by design pattern, but I had to set the covariance coverage value to zero in order for Mplus to estimate my model. How would you set up the model in this case?

Linda K. Muthen posted on Monday, June 13, 2005 - 12:23 am

In that case, you don't need to do anything special. The special functions are just for rearranging the data. Use TYPE=MISSING;

Eveline posted on Tuesday, June 14, 2005 - 8:15 pm

I think I have followed your advice on using the PATTERN statement for CATEGORICAL variables in two steps. I don't think it worked, but I might have misunderstood you. This is what I did.

(1) TYPE=MISSING BASIC in analysis statement, using the PATTERN statement, treating variables as continuous, 1 iteration and saving the response data.

(2) I've tried two versions of the second step. I defined the indicators as categorical and tried to run it both with and without PATTERN statement:

- When I did not use the PATTERN statement the following statement appeared in the output "COMPUTATIONAL PROBLEMS ESTIMATING THE CORRELATION FOR S404QRA AND S269QRA. THE MISSING DATA COVARIANCE COVERAGE FOR THIS PAIR IS ZERO." Same for many other pairs of variables that do not appear in the same booklet.

- When I did use the PATTERN statement the output gave the error "*** ERROR in Variable command. Analysis with categorical variables is not available with PATTERN, COHORT, COPATTERN features."

Is this what you meant I could try? Thanks!

bmuthen posted on Wednesday, June 15, 2005 - 7:49 am

Please send this question to support@statmodel.com.

fati posted on Friday, October 07, 2005 - 11:26 am

I have used the PATTERN option with TYPE = MISSING BASIC; in conjunction with the SAVEDATA command, in order to analyze the saved file in a mixture model in the second step. But I have a problem with the number of observations, which is counted incorrectly. I start with 978 observations. As the pattern variable I used a variable with no missing values, which has 3 categories, but within each category of the pattern variable I also have other missing values, so the number of observations is not correct (just 783). How can I use a pattern variable and still count the observations that have other missing values?

Linda K. Muthen posted on Saturday, October 08, 2005 - 2:05 pm

You need to send your input, data, output, and license number to support@statmodel.com. It is not clear from your description exactly what is happening. There will be cases deleted if they do not have the variables listed using the PATTERN option for their pattern.

fati posted on Thursday, October 20, 2005 - 9:29 am

I would like to do an LCA with data missing by design, so I have done the analysis in two steps. In the first step, I use the PATTERN option and do a TYPE = MISSING BASIC; in conjunction with the SAVEDATA command. In the second step, I analyze the saved data while declaring the outcomes as categorical.
But in my first output, I get the following message:
THE MISSING DATA EM ALGORITHM FOR THE H1 MODEL HAS NOT CONVERGED WITH RESPECT TO THE PARAMETER ESTIMATES. THIS MAY BE DUE TO SPARSE DATA LEADING TO A SINGULAR COVARIANCE MATRIX ESTIMATE.
INCREASE THE NUMBER OF H1 ITERATIONS.

RESULTS FOR BASIC ANALYSIS

ESTIMATED SAMPLE STATISTICS

NO CONVERGENCE IN THE MISSING DATA ESTIMATION OF THE SAMPLE STATISTICS.

I have tried increasing the number of H1ITERATIONS to 2000, but I always get the same message. What can I do to resolve this?
Thank you for your help.

bmuthen posted on Thursday, October 20, 2005 - 6:52 pm

If you have missing by design, MCAR (missing completely at random) holds for this part of the missingness. I would not do the analysis in two steps, but in one step using Type = Missing. This takes care of both missing by design and other missingness giving ML estimation under MAR. This may solve your H1 model problem.

If your H1 problem persists, you can either drop H1 from the estimation or try a multiple-group analysis where each missing-by-design pattern is a group. Here you should also use Type = Missing.

fati posted on Thursday, October 27, 2005 - 5:48 am

I think it is not possible to use the PATTERN option with categorical variables, which is why I used two steps in my analysis. Is that correct?

Linda K. Muthen posted on Thursday, October 27, 2005 - 6:12 am

Yes.

fati posted on Thursday, October 27, 2005 - 8:25 am

Thank you.

I have a question about the second step of my analysis. The saved file contains * for the values missing by design. In the second step, do I specify MISSING = *?

Linda K. Muthen posted on Thursday, October 27, 2005 - 11:41 am

Yes.

jad posted on Friday, February 03, 2006 - 8:44 am

Can I use more than one pattern variable? I have 9 pattern variables in my data, and I am running an LCA.

Linda K. Muthen posted on Friday, February 03, 2006 - 9:26 am

What do you mean by a pattern variable?

JAD posted on Tuesday, February 07, 2006 - 10:18 am

I have data missing by design, but with more than one design. My question is: can I use more than one variable in the PATTERN command to define the missing-data design?
Thanks.

Linda K. Muthen posted on Tuesday, February 07, 2006 - 6:42 pm

The pattern command refers to one design variable. If you have more than one design variable, you could create a variable that reflects this and then specify which variables individuals with each condition should have.
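
One way to read this suggestion, sketched under assumptions: two hypothetical design variables d1 and d2, each coded 1 or 2, are combined into a single variable `design` before the data are read, and PATTERN then lists the variables expected for each combined value. All names and codes here are illustrative.

```mplus
! design = 10*d1 + d2, computed beforehand in the data file (assumed),
! so that 11, 12, 21, 22 identify the four design combinations
VARIABLE: NAMES = design y1-y6;         ! hypothetical names
          USEVARIABLES = y1-y6;
          PATTERN IS design (11 = y1 y2 y3 y4
                             12 = y1 y2 y5 y6
                             21 = y3 y4 y5 y6
                             22 = y1 y2 y3 y4 y5 y6);
```

With nine design variables the same idea applies, although the number of combined values can grow quickly.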

Michelle Williams posted on Tuesday, October 31, 2006 - 9:39 am

I am trying to do a multigroup CFA model, but my grouping variable is defined by missingness. I have used the PATTERN command in order to allow for the missingness, but the only way that I can get the model to run is if I leave out the grouping variable, thus having only one model. If I leave the grouping command in the model the program gives me an error saying that the grouping/pattern variable has multiple uses. I have included a sample of my syntax to clarify what I am trying:

GROUPING IS patgrp(1=1 2=2 3=3);
PATTERN IS patgrp(1=a b c d e f 2=a b c d e 3=a b c d);

ANALYSIS: TYPE=General MISSING h1; ESTIMATOR=ML;

MODEL: f1 by a b;
f2 BY c d;
MODEL 1: f2 by e f;
f1 WITH f2 (cv1);
MODEL 2: f2 by e;
f1 WITH f2 (cv2);
MODEL 3:
f1 WITH f2 (cv3);

If I remove the MODEL 1 - MODEL 3 statements and the GROUPING statement, and include the variables with missingness in the by part of the MODEL statement it runs, but it is only one model, which is not what I want.

I would greatly appreciate any suggestions you might have for rectifying this problem, or if you can point me in the direction of some literature that I might read.

Linda K. Muthen posted on Wednesday, November 01, 2006 - 8:11 am

If your groups represent different patterns of missing data, then you don't also need the PATTERN option. The same variable cannot be used as both a grouping variable and a pattern variable.

Michelle Williams posted on Wednesday, November 01, 2006 - 8:19 am

Thank you for your response. I initially tried running the model without the pattern statement but the missing data gave me problems. Specifically, I kept getting errors that said a variable had no non-missing values (meaning the variable that is missing for a group), even if I did not invoke that variable in that group's MODEL statement.

For example, the following code gives me problems because for group 2 there are no non-missing values for variable f and for group 3 there are no non-missing values for variables e or f.

GROUPING IS patgrp(1=1 2=2 3=3);

ANALYSIS: TYPE=General MISSING h1; ESTIMATOR=ML;

MODEL: f1 by a b;
f2 BY c d;
MODEL 1: f2 by e f;
f1 WITH f2 (cv1);
MODEL 2: f2 by e;
f1 WITH f2 (cv2);
MODEL 3:
f1 WITH f2 (cv3);

I guess the question is then, how do I get Mplus to ignore the missing values for group 2 and group 3?

Linda K. Muthen posted on Thursday, November 02, 2006 - 1:16 pm

In your case, you should use the PATTERN option and not the GROUPING option.

Michelle Williams posted on Thursday, November 02, 2006 - 2:07 pm

Thank you for your response, again. I am sorry to be a pain about this, but I think that I may not be making my problem clear. I need to do a multigroup CFA. I have three groups and they are defined by missingness. For example, group 1 has data for all variables, group 2 has data for all but variable A, and group 3 has data for all but variables A and B. I am constructing my CFA so that I have three models: MODEL 1 has all variables in it, MODEL 2 has all but variable A in it, and MODEL 3 has all but variables A and B in it. When I use the grouping option, Mplus gives me errors saying that (even though I do not invoke the missing variables in their corresponding model statement) there are no non-missing values. I tried using the PATTERN option to fix this, and then I could get a single, overall model to run. However, as mentioned in my first post, I need separate models for each group. As far as I can figure, I cannot see how to do a multigroup CFA with the groups defined by missingness, since I cannot use the GROUPING and PATTERN options at the same time.

To boil it down, is there a way that I can use the PATTERN option and run multiple group analyses, or alternatively, can I use the GROUPING option and tell Mplus to ignore the missingness by design so that I don't get errors when I define the groups by missing variables?

Thanks again.

Linda K. Muthen posted on Friday, November 03, 2006 - 8:22 am

I assume you want to do multiple group analysis to test for the invariance of certain parameters across groups. You can do the same thing in a single group analysis using a set of dummy variables to represent the three groups. Usually when groups represent patterns of missing data, equalities are placed such that the results represent a single-group analysis.

Maren Winkler posted on Friday, October 12, 2007 - 2:35 am

In your post of May 31, 2005, you propose a two-step procedure for using the PATTERN option in combination with CATEGORICAL data, using the SAVEDATA command in the first step. It is not clear to me which data I should save in the first step.
I want to establish a SEM for one test. Subjects worked on one of three different testbooks, leading to missing by design in my data.
Thank you very much for your help.

Linda K. Muthen posted on Friday, October 12, 2007 - 4:15 pm

You would run the analysis with your full set of data. The PATTERN option would specify for each value of the pattern variable, the variable for which listwise deletion would be carried out. You save that data and use it in your analysis.

Maren Winkler posted on Tuesday, November 06, 2007 - 7:29 am

Dear Linda,

thanks for your last advice - it helped a lot with one dataset. Now I've encountered a new problem with two other datasets.
In these two studies, we had three groups. All students worked on the same six items. Additionally, each group of students worked on 8 to 9 unique items. Therefore, we cannot estimate correlations between the unique items, e.g. from version A and version B.
I tried the two-step-procedure for this data. In step 2, I don't get any results but the following information:
"THE MISSING DATA COVARIANCE COVERAGE FOR THIS PAIR IS ZERO" for all pairs of variables from two different test versions.
Is there a possibility to circumvent this problem or does the data simply not allow estimation of a SEM?
Thank you very much for your help.

Linda K. Muthen posted on Tuesday, November 06, 2007 - 7:53 am

Please send your input, data, output, and license number to support@statmodel.com so I can see the whole picture. Covariance coverage of zero should not be a problem if this is by design.

daniel adkins posted on Friday, February 15, 2008 - 11:37 am

i have an r&r on a paper where i have reorganized 3 waves of data into 14 ages and fit a quadratic LGM trajectory. Everything looks good--no flags, good fit indices and nice parameter estimates. unfortunately, i have a reviewer who doesn't comprehend using FIML to address data which is missing by design due to the wave->age rearrangement. the reviewer claims that i have used 3 waves of data to impute 8 additional repeated measures and accordingly, argues that this is an invalid method. do you know of a good citation supporting the use of FIML to address data which is missing by design in such an "accelerated" longitudinal design? i know the relevant passages in the Mplus users manual and also the Bollen and Curran (2006) book. any other citations you'd recommend?

Linda K. Muthen posted on Friday, February 15, 2008 - 3:21 pm

If you are interested in growth over age and you have collected data on several occasions, you need to arrange the data by age to study this development or use age as an individually-varying time of observation. See the following paper where this type of analysis was done:

Muthén, B. & Muthén, L. (2000). The development of heavy drinking and alcohol-related problems from ages 18 to 37 in a U.S. national sample. Journal of Studies on Alcohol, 61, 290-300.

Missing by design is MCAR. See the Little and Rubin book for background.
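
For the second option mentioned above, age as an individually-varying time of observation, a minimal sketch could look like this. The variable names (y1-y3 for the three waves, age1-age3 for each person's age at each wave) are hypothetical, and only a linear model is shown.

```mplus
VARIABLE: NAMES = y1-y3 age1-age3;      ! hypothetical names
          TSCORES = age1-age3;          ! person-specific time scores
ANALYSIS: TYPE = RANDOM;
MODEL:    i s | y1-y3 AT age1-age3;     ! growth over age, not wave
```

Here the data stay in their wave layout, so no age-by-age columns with missing-by-design values are created; the alternative is to rearrange the outcomes into one column per age and rely on the missing data estimation.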

daniel adkins posted on Wednesday, April 16, 2008 - 4:42 pm

could you suggest a reference that would allow me to substantiate the claim that missing by design is MCAR? i'm familiar with rubin's work, but do not recall him specifically addressing missing by design. i could be wrong though.

Linda K. Muthen posted on Wednesday, April 16, 2008 - 5:16 pm

See page 16 of the Little and Rubin book.

Catherine Close posted on Thursday, April 17, 2008 - 1:46 pm

I have only categorical (binary) variables and it's one of those cases where the data are missing by design. I have demographic data for all examinees, a set of common items for all examinees, some examinees taking two test forms A and B (but not C and D), and the rest taking two separate test forms C and D (but not A and B). I have tried the PATTERN option but I know that I am not doing it right. For now, I am just computing the tetrachoric correlations, and any help as to how I can specify the missing-by-design data will be appreciated.

Linda K. Muthen posted on Friday, April 18, 2008 - 8:52 am

You should set your data up with four columns representing A, B, C, and D. Individuals who have not taken C and D will have missing data on C and D. Individuals who have not taken A and B will have missing data on A and B. Then use the default of TYPE=MISSING;

Catherine Close posted on Friday, April 18, 2008 - 10:54 am

Dear Linda,
Thank you very much for the quick response. This is a step ahead.

Jon Elhai posted on Friday, June 20, 2008 - 1:16 pm

Linda,
When using the savedata option (with file is data.dat), is the resulting saved dataset supposed to include values that were previously missing but that Mplus estimated by ML?

Linda K. Muthen posted on Friday, June 20, 2008 - 3:02 pm

No, Mplus does not estimate values for those that are missing. The model is estimated using all possible information.

Abdel posted on Saturday, August 09, 2008 - 7:05 am

Hi,

I am trying to run a CFA with dichotomous variables. I have 4 cohorts and 10 items. Of one of those four cohorts I only have 5 of the 10 items. I try to use the PATTERN IS command like this:

VARIABLE:
NAMES =
trappreg trappext sex1to7 twzyg
y9 y18 y36 y40 y46 y63
y66 y70 y84 y85 lijst group;

USEVARIABLES =
y9 - y85 ;

CATEGORICAL =
y9 - y85 ;

PATTERN IS lijst (1= y9 y18 y36 y40 y46 y63 y66 y70 y84 y85
3= y9 y18 y36 y40 y46 y63 y66 y70 y84 y85 4= y9 y18 y36 y40
y46 y63 y66 y70 y84 y85 5= y18 y40 y46 y70 y85) ;

CLUSTER = trappreg ;
MISSING = ALL (-1) ;

ANALYSIS:
TYPE = COMPLEX MISSING h1 ;
ESTIMATOR = WLSMV ;
PARAMETERIZATION = DELTA ;
ITERATIONS = 1000 ;

MODEL:

etc....

But Mplus 3 does not run that for me, and it doesn't give me a reason why. It just acts like it is running and returns an output file with only my script in it. It does run well, with good output, without the "PATTERN IS" line, however. Do you have any idea why it is not running with the PATTERN IS line?

Thanks in advance!

Bengt O. Muthen posted on Saturday, August 09, 2008 - 9:20 am

No, I can't say. You should first update to version 5.1 and if the problem persists then send it to support@statmodel.com with your license number.

Abdel posted on Saturday, August 09, 2008 - 11:04 am

Thanks for the quick reply! It will be a little difficult for me to get that update on time, since I got sort of a deadline.. Are there any readings you know about that can tell me something about how large the bias is without the PATTERN IS option? Or some paper(s) about how Mplus handles missings by design in a case like this?

Bengt O. Muthen posted on Saturday, August 09, 2008 - 1:56 pm

Dropping PATTERN and using Type = Complex Missing H1 should be sufficient here.

Cristina Sturaro posted on Wednesday, September 24, 2008 - 5:40 am

Hi Linda/Bengt,

I have this model:

analysis: type is missing complex;
model:
a11 by ra11 (1)
pa11 (2);
a21 by ra21 (1)
pa21 (2);
a22 by ra22 (1);
a31 by pa31* (2);
a32 by ra32 (1)
pa32 (2);

ia sa | a11@0 a21@0.5 a22@1 a31@1.5 a32@2;

ia;
sa;
ia with sa@0;

[ra11] (10);
[ra21] (10);
[ra22] (10);
[ra32] (10);
[pa11] (11);
[pa21] (11);
[pa31] (11);
[pa32] (11);
a11 (20);
a21 (20);
a22 (20);
a31 (20);
a32 (20);

ra21 with ra22;

Because of the two missing variables (pa22 and ra31) I used missing by design when I created the 5 latent variables I needed to model the growth of a.

My question is, would it be possible (and if yes, how?) to obtain the values of the means of these 5 latent variables?

Thank you

Linda K. Muthen posted on Wednesday, September 24, 2008 - 10:12 am

If pa22 and ra31 are missing for everyone, don't include them in the analysis.

Ask for TECH4 in the OUTPUT command to obtain the mean values.

Cristina Sturaro posted on Thursday, September 25, 2008 - 12:28 am

Thank you Linda,

I actually asked for TECH4 (sorry I forgot to add it to the post), but it gives me the factor scores standardized with a mean of 0.

Is there any other way to get them not standardized?

Thanks

Cristina Sturaro posted on Thursday, September 25, 2008 - 12:47 am

And just another thing.

I just saved a .dat file with the factor scores. I see that in some cases they have negative values, which is impossible with regard to my variables.

Can I solve that problem and obtain only positive factor scores?

Thanks again.

Linda K. Muthen posted on Thursday, September 25, 2008 - 10:23 am

Please send your input, data, output, and license number to support@statmodel.com.

Maren Winkler posted on Thursday, May 14, 2009 - 1:11 am

Dear Linda,

I have used the two-step procedure you've suggested on May 31, 2005 to analyse my categorical data with missing by design.
In step 1, I have the following warning: "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.154D-13. PROBLEM INVOLVING PARAMETER 36."

Nevertheless, I proceeded with step 2 where I did not get this message. However, I'm not sure whether I can trust my results now.

Linda K. Muthen posted on Friday, May 15, 2009 - 8:10 am

In the first step, it sounds like you included a MODEL command. Rerun the analysis with TYPE=BASIC MISSING; and no MODEL command. I think this will take care of the problem.

Maren Winkler posted on Monday, May 18, 2009 - 1:55 am

Dear Linda,

thanks, it helped.

Richard E. Zinbarg posted on Sunday, December 05, 2010 - 5:57 pm

A collaborator has collected data for a joint project via a website with large numbers of subjects but with a lot of missing data by design (he administers a relatively small subset of the item pool to each subject to reduce subject burden). In fact, the covariance coverage is below the 10% Mplus default cutoff for every pair of items. I would very much appreciate being pointed in the direction of information re: how the 10% cutoff was derived. Are there simulations, for example, that suggested this cutoff? And might a different value be acceptable with a very large overall sample? (That is, is it truly the % that is important or is it the number of cases each covariance is based on that is important?). Many thanks in advance!

Linda K. Muthen posted on Monday, December 06, 2010 - 10:14 am

Missing by design usually involves zero coverage, which is no problem when it is by design. Low coverage is a problem. The 10% cutoff is not suggested as an okay amount. Much higher coverage is recommended. It is just where we draw the line. Lower coverage typically gives problems with the unrestricted H1 model. See the Schafer book in the user's guide reference list for further information about coverage.
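
For reference, the cutoff discussed here is set with the COVERAGE option of the ANALYSIS command. The sketch below only shows where the option goes; the value .05 is an arbitrary illustration, not a recommended setting.

```mplus
ANALYSIS: TYPE = MISSING H1;
          COVERAGE = .05;    ! minimum coverage accepted; default is .10
```

Lowering the value lets the run proceed, but it does not address the estimation problems that low coverage tends to cause for the unrestricted H1 model.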

roofia galeshi posted on Friday, October 14, 2011 - 8:10 am

hi,

Would you please direct me to syntax examples for MISSING BY DESIGN? I have 14 booklets; each subject worked on 2 of the 14 test booklets, leading to missing data. The items are categorical and are used in a latent class analysis. I am not able to find syntax examples with explanations of the code.

Thank you

Linda K. Muthen posted on Friday, October 14, 2011 - 8:40 am

There is no syntax. This is the default in Mplus. Just use the MISSING option of the VARIABLE command to specify the missing value flag(s) in your data set.
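
In practice this could look like the sketch below for a latent class analysis with booklet data. The file name, the item names u1-u20, the missing flag -9, and the number of classes are assumptions for illustration.

```mplus
DATA:     FILE IS booklets.dat;         ! hypothetical file name
VARIABLE: NAMES = id u1-u20;            ! hypothetical item names
          USEVARIABLES = u1-u20;
          CATEGORICAL = u1-u20;
          CLASSES = c (3);              ! number of classes is illustrative
          MISSING = ALL (-9);           ! flag for items not administered
ANALYSIS: TYPE = MIXTURE;
```

Items from booklets a subject did not receive simply carry the missing flag; nothing else identifies the design.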

Andre Plamondon posted on Thursday, October 20, 2011 - 1:00 pm

Hi,
what is the logic behind the two step procedure outlined above? The resulting dataset seems to contain the same data but reorganized.

Andre Plamondon posted on Thursday, October 20, 2011 - 1:23 pm

I also have another question. I tried the two-step procedure and I want to assess DIF. When I use the MLR estimator it works, but when I use WLSMV I get this error (for the items that are missing by design):

THE WEIGHT MATRIX PART OF VARIABLE I9T1 IS NON-INVERTIBLE. THIS MAY
BE DUE TO ONE OR MORE CATEGORIES HAVING TOO FEW OBSERVATIONS. CHECK
YOUR DATA AND/OR COLLAPSE THE CATEGORIES FOR THIS VARIABLE.
PROBLEM INVOLVING THE REGRESSION OF I9T1 ON AGEGROUP. THE PROBLEM
MAY BE CAUSED BY AN EMPTY CELL IN THE BIVARIATE TABLE.

Is there a solution other than using MLR?

Linda K. Muthen posted on Friday, October 21, 2011 - 10:18 am

The two steps are required because the data are categorical and the procedure is restricted to continuous.

You can look at the variable and collapse categories. Take a look at the crosstab for the two variables to see what the problem is. See the CROSSTABS option.

Andre Plamondon posted on Friday, October 21, 2011 - 11:03 am

I should have said that this error only appears for the variables which are missing by design. I'm trying to regress the indicators on the grouping variable (the one used in the PATTERN option in the first step) to see if there's DIF for the items that appear in both groups. I get the error even though I don't regress the variables missing by design on the grouping variable, but this doesn't happen when using MLR.

Linda K. Muthen posted on Friday, October 21, 2011 - 2:15 pm

This makes sense. Variables missing by design may have sparse cells in the bivariate tables. MLR does not use univariate and bivariate frequencies for model estimation so this message would not come up although the issue of sparse cells is still there. I would look at the CROSSTABS.

Andre Plamondon posted on Friday, October 21, 2011 - 2:42 pm

I fixed the problem by fixing the covariations between the grouping variable and the missing by design variables at zero.
Thanks!

Malte Jansen posted on Monday, November 28, 2011 - 9:01 am

Dear Linda/Bengt,

I am trying to analyse data that both has a nested structure (students in classes in schools) and an incomplete design.

I tried using PATTERN to specify which variables are missing by design (there were 3 different designs; thus each student was administered 2/3 of the total items) and using
STRATIFICATION IS school;
CLUSTER IS class;
to adjust standard errors to the nested structure.

However, PATTERN doesn't seem to work with TYPE=Complex and STRATIFICATION only works with TYPE=Complex. I would be very grateful if you could help me address this problem.

Best regards

Linda K. Muthen posted on Tuesday, November 29, 2011 - 11:52 am

You can do the analysis in two steps. In the first step, do not use COMPLEX and STRATIFICATION. Simply use the PATTERN option and TYPE=BASIC, putting the CLUSTER and STRATIFICATION variables on the AUXILIARY list. Save the data in this step. In the second step, use the saved data and the COMPLEX and STRATIFICATION options.
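
A sketch of how these two steps might be written, as two separate input files. The variable names (school, class, the pattern variable `form`, items y1-y9), the file names, and the placeholder model are assumptions; the three patterns each cover two thirds of the items, as in the question.

```mplus
! ---- Step 1 (first input file): PATTERN without COMPLEX ----
DATA:     FILE IS raw.dat;              ! hypothetical file name
VARIABLE: NAMES = school class form y1-y9;
          USEVARIABLES = y1-y9;
          AUXILIARY = school class;     ! carried into the saved file
          PATTERN IS form (1 = y1 y2 y3 y4 y5 y6
                           2 = y1 y2 y3 y7 y8 y9
                           3 = y4 y5 y6 y7 y8 y9);
ANALYSIS: TYPE = BASIC;
SAVEDATA: FILE IS step1.dat;            ! hypothetical file name

! ---- Step 2 (second input file): saved data with COMPLEX ----
DATA:     FILE IS step1.dat;
VARIABLE: NAMES = y1-y9 school class;   ! check order in step 1 output
          USEVARIABLES = y1-y9;
          STRATIFICATION = school;
          CLUSTER = class;
          MISSING = *;                  ! assumed flag in the saved file
ANALYSIS: TYPE = COMPLEX;
MODEL:    f BY y1-y9;                   ! placeholder one-factor model
```

The order of variables in the saved file should be taken from the step 1 output rather than from this sketch.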

Kou Murayama posted on Tuesday, February 07, 2012 - 1:27 am

I have data with missing by design, like x1-x8 with x6-7 being missing in group 1 and x5-6 being missing in group 2. So the data have zero covariance coverage for some pairs.

I tried the "pattern" option (with Mplus 6) in the VARIABLE command with ML, and got a result. Then I omitted this option and ran the model again. The results were identical. What does the "pattern" option specifically do?

Linda K. Muthen posted on Tuesday, February 07, 2012 - 5:42 pm

The PATTERN option is for use with listwise deletion not the TYPE=MISSING default in Mplus.

Sandra N. posted on Thursday, March 08, 2012 - 1:44 am

Hi Linda,
we collected data by using a multi-matrix-design (35 booklets, balanced incomplete block design, N=1200). We tried running a CFA based on these data and had to lower the covariance coverage limit as it is around .03 for the variables we are interested in. The fit indices were implausibly high, some latent correlations were above 1. We tried if the pattern command would help, but it did not. Is there any possibility how we can deal with these problems? Is there a minimum for the covariance coverage for running the analyses? Many thanks in advance!

Linda K. Muthen posted on Thursday, March 08, 2012 - 12:46 pm

It seems odd you would have coverage of .03 when you have planned missingness. Usually it is zero for the planned missingness.

Christoph Weber posted on Tuesday, May 08, 2012 - 5:28 am

Dear Dr. Muthen,
I'm running an IRT-model with planned missing data. I have 4 booklets, 8 items occur in each booklet, 16 items vary from booklet to booklet. I tried to use the PATTERN command. But an error in the output indicates, that the PATTERN command can't be used in conjunction with categorical data.

Is it correct to use FIML-estimation and not to specify the missing by design pattern?

Thanks
Christoph Weber

Linda K. Muthen posted on Tuesday, May 08, 2012 - 5:54 am

Yes, this is what you should do. The PATTERN option was used when the default was listwise deletion to minimize the cases deleted.

Christoph Weber posted on Wednesday, May 09, 2012 - 3:38 am

Dear Dr. Muthen,
another question: I want to compare a 1p and a 2p IRT model (45 items). Mplus can't compute Chi² because
"THE CHI-SQUARE TEST CANNOT BE COMPUTED BECAUSE THE FREQUENCY TABLE FOR THE LATENT CLASS INDICATOR MODEL PART IS TOO LARGE."
Should I use AIC and BIC to compare the models, or is there another possibility?

Thanks
Christoph Weber

Bengt O. Muthen posted on Wednesday, May 09, 2012 - 8:29 am

In this case you can use the likelihood ratio chi-square difference test. Two times the difference in loglikelihoods for the two models is chi-square with df equal to the difference in number of parameters.
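
Written out, the test described above is as follows. The symbols are introduced here for illustration: $\ell_{1p}$ and $\ell_{2p}$ are the maximized loglikelihoods of the 1p and 2p models, and $q_{1p}$ and $q_{2p}$ are their numbers of free parameters.

```latex
\[
\chi^2_{\text{diff}} = 2\,\bigl(\ell_{2p} - \ell_{1p}\bigr),
\qquad
df = q_{2p} - q_{1p}
\]
```

As an example of the df count, assuming 45 binary items with one threshold each, a 1p model with a single common loading has 46 free parameters and a 2p model with 45 free loadings has 90, which gives df = 44. The exact count depends on how the two models are parameterized.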

Christoph Weber posted on Wednesday, May 09, 2012 - 12:24 pm

Dear Dr. Muthen,
just another question. What would be the best way to do item selection using Mplus? I have 45 items and want to select those that conform to the 2p or 1p model. Sorry for these questions, but it's my first time doing IRT models.

Should I start deleting items based on factor loadings and then using ICC and Information Curves? Further inspect output of tech10. Starting with univariate fit information and deleting those items with z > 1.96 and so on??

Is there a possibility to evaluate person fit? Can I use the standardised residuals for the response pattern?

Thanks
Christoph Weber

Hallie Bregman posted on Thursday, May 10, 2012 - 10:27 am

Hi, I want to run a cohort-sequential LGM. I have 20 cohorts each with 4 measurement occasions, across 20 time-structured timepoints, set-up in a wide data format. I have a fairly small sample (N= 152), and each of my cohorts are small. I also have missing data due to dropout, and have 71 missing data. I set the residuals of all timepoints equal. Even after setting COVERAGE = 0 and increasing my H1Iterations, I receive errors in convergence.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE
COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL.
PROBLEM INVOLVING PARAMETER 5.
Parameter 5 is the Psi between intercept and slope, and I do not notice any blatant problems with that value. How do you suggest that I proceed? I have also considered running this model in a multilevel framework, or using TScores. Do you suggest either of these approaches instead of the LGM described above, keeping in mind that I intend to eventually conduct a parallel process model using this LGM? Thanks very much for your input.

Linda K. Muthen posted on Thursday, May 10, 2012 - 1:24 pm

Christoph:

I think EFA is a useful tool for seeing if your items are behaving the way intended.

To evaluate person fit you can look at the loglikelihood outliers.

Linda K. Muthen posted on Thursday, May 10, 2012 - 1:25 pm

Hallie:

Please send the output and your license number to support@statmodel.com.

Samuli Helle posted on Wednesday, February 06, 2013 - 6:34 am

I'm rather new to Mplus and to missing data procedures. I think I have a case where part of my data is missing by design: I have measured three DVs from three cohorts of women and for one DV, there are missing values (by design since it was not recorded?) for all women belonging to a specific cohort. My impression is that as I'm using Mplus 7.0 I actually shouldn't do anything "special" because FIML can handle this sort of missingness (and I have used e.g. MISSING ARE ALL(-99))? Is this correct?

Linda K. Muthen posted on Wednesday, February 06, 2013 - 11:51 am

You don't need to do anything special. FIML is the default in Mplus.

Samuli Helle posted on Wednesday, February 06, 2013 - 1:06 pm

Thank you. So I can ignore the following note:

ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY DUE TO THE MODEL IS NOT IDENTIFIED, OR DUE TO A LARGE OR A SMALL PARAMETER ON THE LOGIT SCALE. THE FOLLOWING PARAMETERS WERE FIXED: 8

...and that this parameter has estimate and SE of 0.0000.

Linda K. Muthen posted on Wednesday, February 06, 2013 - 2:00 pm

Please send your output and license number to support@statmodel.com.

Stata posted on Wednesday, March 20, 2013 - 3:04 pm

Does Mplus accept missing values coded as 0? Would that affect the results?

Missing = all (0);

Thank you.

Linda K. Muthen posted on Wednesday, March 20, 2013 - 3:40 pm

As long as 0 is not a valid number for any of the variables, I can't see that this would be a problem.

Tor Neilands posted on Tuesday, November 05, 2013 - 8:46 pm

Hi,

Is the approach implemented via the PATTERN option of the VARIABLE command the same as the multiple-groups SEM methodology described in Muthen, Kaplan, and Hollis (1987)? I am interested in replicating their approach using Mplus with a toy example to demonstrate how FIML works "under the hood" for an upcoming presentation I'm delivering.

Thanks,

Tor Neilands

Linda K. Muthen posted on Wednesday, November 06, 2013 - 10:00 am

I think these would be the same.

Tor Neilands posted on Wednesday, November 06, 2013 - 3:59 pm

Thanks, Linda.

I noticed PATTERN only works with single-level data with all continuous endogenous variables. How would one manually implement this approach for other models in Mplus if one needed to do it manually?

Thanks again,

Tor

Linda K. Muthen posted on Thursday, November 07, 2013 - 6:25 am

I can't think of a way unless you treat the variables as continuous, save the data, and then do the analysis using that data and treating the variables as categorical.

Jasmine Park posted on Monday, February 24, 2014 - 1:31 pm

I am trying to run CFA with categorical indicators. The data have both designed missingness (approximately 60%) and truly omitted responses (approximately 3.4% of valid data).
As many people have asked previously, the "pattern" statement does not work with categorical variables, so I tried the 2-step approach you suggested on May 31st, 2005. In this approach, can you elaborate on how missing data (missing by design) are treated? I am using the default estimator (WLSMV).
Also, is there any way I can model both designed missingness and truly omitted responses when running CFA with categorical indicators?

Bengt O. Muthen posted on Tuesday, February 25, 2014 - 12:21 pm

I would suggest using multiple groups to handle the missing by design. And then ML to handle missing within each design group.

Jasmine Park posted on Thursday, March 20, 2014 - 7:41 am

Hello, I have a question about how factor scores are computed when data are missing by design.

I ran a CFA (3-factor structure) with 14 variables (ordinal variables) using the WLSMV estimation, and data are missing by design. For group 1, all 14 variables were administered. For group 2, 10 out of 14 variables were administered. For group 3, 4 out of 14 variables were administered. Group 2 and 3 do not share any common items.

When I looked at the correlation among the factor scores by each group, the correlations were 1.0 for group 3 (those who got 4 out of 14 items).

So, I am wondering how factor scores were computed for group 2 (missing 4 variables) and 3 (missing 10 variables) for factors representing the variables that were not administered.

Is there a formula for computing factor scores that I can read?

Thank you,

Bengt O. Muthen posted on Thursday, March 20, 2014 - 3:21 pm

I assume you use the Mplus default of measurement parameter invariance across groups. The factor scores for a person are estimated using the estimated model parameter values and the person's observed scores. Perhaps group 3 didn't have items measuring all three factors, or in any case the factors must be poorly measured by only the 4 variables in group 3.

The Technical Appendix for Version 2 is on our website and appendix 11 describes WLSMV factor score estimation.

Susanne Schmidt posted on Monday, May 12, 2014 - 3:05 am

I have the same problem as fati:
"fati posted on Friday, October 07, 2005 - 11:26 am
I have use the PATTERN option with TYPE = MISSING BASIC; in conjunction with the SAVEDATA command. in order to analyse the saved file in mixture model in the second step, but i have a problem with the number of observations, the number of observations is counted incorrectly, i have in first 978 observations, i have use a variable with no missing value as pattern, which have 3 category, but in each category of pattern variable, i have also the missing so the number of observations is not correct (just 783), how can i use pattern variable and counts this observations with other missing"

Is there a way to include in the analysis the observations that have missing values due to nonresponse within each pattern?

Linda K. Muthen posted on Monday, May 12, 2014 - 10:02 am

I would not use the PATTERN option. Using only TYPE=MISSING will keep all of your observations.

Aidan posted on Monday, February 23, 2015 - 3:26 pm

Dear Profs. Muthén and Mplus community,

I have three waves of data, involving several overlapping cohorts. Following readings of the User Guide and previous posts on this forum, I have rearranged the waves into an age-based structure with the DATA COHORT command:

Data cohort:
Cohort = birthyear (1993 1994 1995 1996 1997);
Timemeasures = wave1 (2011) wave2 (2012) wave3 (2013);
Tnames = age;

However, this results in an error message (One or more variables in the data set have no non-missing values. Check your data and format statement) - which appears to be because there are no participants aged 14 (i.e., 1997-2011) or 20 (i.e., 1993-2013), so that all values are missing for 14 and 20. This is as intended - it is the intervening ages that I am looking at. I can't see a way to proceed by specifying that it is ages 15-19 that I want to analyse. Is this possible?

Linda K. Muthen posted on Tuesday, February 24, 2015 - 7:34 am

Use all of the variables except 14 and 20. Make the time scores of the model reflect that the distance from 13 to 15 and 19 to 21 is twice as long as between the other times, for example,

0 1 3 4 6 7

Aidan posted on Wednesday, February 25, 2015 - 9:12 am

Dear Linda,
Thank you for your response. Your suggestion of "using all the variables except 14 and 20" is what I am (unsuccessfully!) trying to do. I have tried to do this using both versions of the Model command below (on separate runs, of course). In each case, the program terminates with an error message, copied below. How do I exclude the age14 and age20 terms from the model?

Model:
int slope | age15@0 age16@1 age17@2 age18@3 age19@4;

Model:
int slope | age14@0 age15@0 age16@1 age17@2 age18@3 age19@4 age20@4;

*** ERROR - One or more variables in the data set have no non-missing values. Check your data and format statement.

Variable    Observations    Variance
**AGE14                0
  AGE15              360       0.751
  AGE16              933       0.685
  AGE17             1132       0.727
  AGE18              772       0.787
  AGE19              199       0.733
**AGE20                0

Bengt O. Muthen posted on Wednesday, February 25, 2015 - 1:59 pm

Perhaps you still have AGE14 in your USEV list.

Aidan posted on Thursday, February 26, 2015 - 9:57 am

Dear Bengt,

Thank you for the suggestion - however, that doesn't appear to be the problem. My Usevariables list contains the variables named in the Cohort and Timemeasures commands (as per my post above) and nothing else. I believe this to be the correct procedure according to my reading and previous examples of this command that I have seen.

Following your suggestion, I have tried re-running the analysis while explicitly adding age15-age19 to the Usevariables list, or adding age14-age20. However, I get the same error message, with no change to the output posted above.

As far as I can see, the program calculates the hypothetical existence of age14 (from 1997-2011) and age20 (from 1993-2013) from either extreme of the Data Cohort command. However, there are no cases matching either of those categories using the observed data. This results in the error of variables with "no non-missing values", which halts the analysis even though these variables are not named or used in any other command. I remain unsure as to how else I could explicitly exclude these variables that I know are completely empty.

Thank you for your patience.
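
To make the point about the implied age variables concrete, here is a small Python sketch that lists which birth-year and wave-year combinations produce each age. It is not Mplus code and it assumes, as the post describes, that age is formed as wave year minus birth year; the function name `implied_ages` is hypothetical.

```python
def implied_ages(birth_years, wave_years):
    """Map each implied age to the (birth year, wave year) cells that produce it."""
    cells = {}
    for b in birth_years:
        for w in wave_years:
            cells.setdefault(w - b, []).append((b, w))
    return dict(sorted(cells.items()))


# Values taken from the Data cohort specification quoted in this thread.
birth_years = [1993, 1994, 1995, 1996, 1997]
wave_years = [2011, 2012, 2013]

for age, combos in implied_ages(birth_years, wave_years).items():
    print(age, combos)
```

Ages 14 and 20 each come from a single cell (1997 measured in 2011, and 1993 measured in 2013), so if nobody was observed in those cells the corresponding age variables are entirely missing.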

Linda K. Muthen posted on Thursday, February 26, 2015 - 6:06 pm

Please send the output and your license number to support@statmodel.com.

Anne Moehring posted on Tuesday, March 17, 2015 - 4:33 am

Hello,
I'm trying to run a CFA with my data. Our subjects worked on one of three tests - with some overlapping items and some that were exclusive for each test. I've tried the 2-step procedure for categorical data as described above but I only get the Error Message
"THE MISSING DATA COVARIANCE COVERAGE FOR THIS PAIR IS ZERO" for the items belonging to only one test.
Do you have any idea what I can do with this?

Thank you for your help.
Anne

Bengt O. Muthen posted on Tuesday, March 17, 2015 - 8:28 am

I don't understand what you mean by a 2-step procedure for CFA. Perhaps you need to send your output to Support.
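
For readers who hit the same coverage message: covariance coverage for a pair of variables is the proportion of cases in which both variables are observed. The Python sketch below uses a tiny made-up data set (one common item, one item exclusive to test 1, one item exclusive to test 2) to show how a pair of items that are never administered together ends up with zero coverage. The function name and the data are hypothetical, and this is not how Mplus itself is implemented.

```python
def covariance_coverage(rows):
    """Proportion of cases with both variables observed, for every pair of columns.

    rows: list of equal-length lists; None marks a missing value.
    """
    n = len(rows)
    p = len(rows[0])
    coverage = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            both = sum(1 for r in rows if r[i] is not None and r[j] is not None)
            coverage[i][j] = both / n
    return coverage


# Hypothetical data. Columns: common item, item only on test 1, item only on test 2.
rows = [
    [1, 0, None],  # took test 1
    [0, 1, None],  # took test 1
    [1, None, 1],  # took test 2
    [0, None, 0],  # took test 2
]

for line in covariance_coverage(rows):
    print(line)
```

The common item has nonzero coverage with both exclusive items, while the two exclusive items have zero coverage with each other, so their covariance cannot be estimated from the data directly.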

Uzay Dural posted on Friday, March 11, 2016 - 10:44 am

Dear Dr. Muthen,

I have longitudinal data with 3 repeated measurement points (3-month intervals). At Time 2, I added new participants (to increase the sample size) and likewise measured them three times.

Hence, the original sample (n = 172) responded at T1, T2, and T3. The others (n = 36) responded at T2, T3, and T4. I want to fit a multiple-group (exposure group versus control group), multiple-indicator LGM to the data.

My questions are:
1) Is it legitimate to treat the data as missing by design?

2) Should I run a) a 4-wave MLGM, or b) a 3-wave cohort design?

3) If a): how can I deal with the missingness? Can I use FIML, even though there are only 36 subjects at Time 4?

If b): how can I use my original grouping variable (exposure vs. control) in addition to cohort as a grouping variable?

Thanks in advance!

Bengt O. Muthen posted on Friday, March 11, 2016 - 5:46 pm

1) Yes

2) I would choose b) as more flexible.

3) Just work with 2 x 2 = 4 groups and put the parameter equality restrictions in the right places.
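
One way to picture the 2 x 2 = 4 groups is as a single group code formed by crossing cohort and exposure. The Python sketch below does that recoding; the coding of `cohort` and `exposure` and the function name are hypothetical, and the parameter equality restrictions themselves would still have to be specified in the Mplus model.

```python
def combined_group(cohort, exposure):
    """Cross two binary codes into one group code from 1 to 4.

    cohort: 1 = T1-T3 sample, 2 = T2-T4 sample (hypothetical coding)
    exposure: 0 = control, 1 = exposure (hypothetical coding)
    """
    if cohort not in (1, 2) or exposure not in (0, 1):
        raise ValueError("unexpected cohort or exposure code")
    return 2 * (cohort - 1) + exposure + 1


# Hypothetical records: (cohort, exposure) for four participants.
records = [(1, 0), (1, 1), (2, 0), (2, 1)]
groups = [combined_group(c, e) for c, e in records]
print(groups)
```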

Uzay Dural posted on Saturday, March 12, 2016 - 1:04 am

Thank you very much!

saravanelst posted on Monday, August 08, 2016 - 8:46 am

Hello,

I have a question concerning missing data. I am using a longitudinal dataset with multiple cohorts. Patients' BMI was measured annually from 2009 to 2014. As a time variable I am using disease duration. In my analyses I only included those patients with a disease duration <1 year when they entered the dataset between 2009 and 2014. This means that the disease duration ranges from <1 yr to 6 yrs. At t=0 (<1 yr disease duration) n=5000. At t=1 (disease duration between 1 and 2 yrs) n=4253. The longer the disease duration, the lower the N. I think this is missing by design. I conducted a GMM with BMI as the outcome variable and no covariates. It worked fine. I'm wondering, though, whether the FIML default option is the right way to handle the missing data in this dataset. Could you please advise me on this?

Thank you.

Bengt O. Muthen posted on Monday, August 08, 2016 - 3:31 pm

I don't think it is missing by design, but MAR. In which case our default approach - which some call FIML - is appropriate.

See also the paper on our website:

Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.

Simon L Chrétien posted on Friday, September 09, 2016 - 11:21 am

Hi, I have a dataset with missing variables and I am trying to do a CFA. I'm getting these outputs:

SAMPLE STATISTICS

ESTIMATED SAMPLE STATISTICS

NO CONVERGENCE IN THE MISSING DATA ESTIMATION OF THE SAMPLE STATISTICS.

MAXIMUM LOG-LIKELIHOOD VALUE FOR THE UNRESTRICTED (H1) MODEL IS -35337.201

and this:

THE MISSING DATA EM ALGORITHM FOR THE H1 MODEL HAS NOT CONVERGED WITH RESPECT TO THE PARAMETER ESTIMATES. THIS MAY BE DUE TO SPARSE DATA LEADING TO A SINGULAR COVARIANCE MATRIX ESTIMATE. INCREASE THE NUMBER OF H1 ITERATIONS.

Is there a command I can use? I consider my missing data MCAR, but I'd like to verify MAR too.

Thanks!

Bengt O. Muthen posted on Friday, September 09, 2016 - 1:56 pm

Try H1iterations=10000;

Simon L Chrétien posted on Monday, September 12, 2016 - 5:13 am

Thanks!

Dominique W posted on Sunday, May 13, 2018 - 11:34 pm

My cross-sectional survey data is characterised by missing by design. This was done to reduce survey load by about 25%.
Now I conceptually wonder about the benefits of specifying the missing data patterns (in my case three groups): What do I gain by specifying this via the PATTERN command when compared to simply including the MISSING command (using FIML)?

*I have read the manual and here on the board but still cannot fully clarify the benefits and distinctions under the hood.

Thanks!

Bengt O. Muthen posted on Monday, May 14, 2018 - 4:38 pm

UG ex 6.18 shows the best approach. It avoids problems related to low coverage.