Mplus Discussion >> Missing data

Missing data

Anonymous posted on Friday, October 29, 1999 - 11:42 am

Does Mplus impute values for those that are missing?

Linda K. Muthen posted on Friday, October 29, 1999 - 11:42 am

No, Mplus does not impute values for those that are missing. It uses all data that is available to estimate the model using full information maximum likelihood. Each parameter is estimated directly without first filling in missing data values for each individual.

Matthew Archibald posted on Saturday, March 18, 2000 - 5:06 pm

I am having difficulty getting Mplus to converge on H1 (and thus to get a chi-sq test) for a missing-data-latent-growth-curve model, even when I fiddle with the starting values and convergence criteria. It runs fine when I do not ask for "type= missing h1;" but then I can't get the chi-sq. Am I missing some fundamental piece of the puzzle?

bmuthen@ucla.edu posted on Monday, March 27, 2000 - 8:18 am

If you give the Mplus statement type=missing h1, the program first does H1 and then H0. You may want to first do a type=basic missing. The H1 estimation that this leads to can be difficult if there are large percentages of missing data - see the Covariance Coverage output. Starting values are not needed for H1. You can try to sharpen the convergence criterion as described in the User's Guide.
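For readers who want to check coverage before running the model, the following is a minimal Python sketch. It assumes that covariance coverage means the proportion of cases with both variables of a pair observed; the function name and the small data set are hypothetical, and `None` stands in for a missing value.

```python
from typing import List, Optional

Row = List[Optional[float]]


def covariance_coverage(rows: List[Row]) -> List[List[float]]:
    """Proportion of cases in which both variable j and variable k are observed."""
    n = len(rows)
    p = len(rows[0])
    coverage = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            both = sum(1 for r in rows if r[j] is not None and r[k] is not None)
            coverage[j][k] = both / n
    return coverage


# Hypothetical data: three repeated measures, None marks a missing value.
data = [
    [1.0, 2.0, 3.0],
    [1.5, None, 2.5],
    [0.8, 1.9, None],
    [1.2, None, None],
]
cov = covariance_coverage(data)
for line in cov:
    print(["%.2f" % value for value in line])
```

Low off-diagonal values flag the variable pairs for which little joint information is available, which is where the H1 estimation is likely to struggle.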

Anonymous posted on Wednesday, February 14, 2001 - 9:07 am

I have data that do not fit the assumptions Mplus imposes for SEM with missing data so I am using a multivariate, multiple imputation approach such as that advocated by Little and Rubin.

My question is whether the coefficients and standard errors generated by the Mplus WLSMV estimator present particular problems for those planning on combining results from several separate imputed data sets.

Bengt O. Muthen posted on Wednesday, February 14, 2001 - 3:02 pm

As far as I understand, combining estimates and s.e.'s from analyses of multiple-imputed data could be done in the usual way also when using WLSMV.
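As an illustration, and assuming "the usual way" refers to Rubin's combining rules, pooling one parameter across imputed data sets could be sketched in Python as follows. The function name and the numbers are hypothetical; the inputs would be the estimate and standard error of the same parameter from each imputed-data analysis.

```python
import math
from typing import List, Tuple


def pool_rubin(estimates: List[float], std_errors: List[float]) -> Tuple[float, float]:
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                          # pooled point estimate
    within = sum(se ** 2 for se in std_errors) / m      # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between              # total variance
    return q_bar, math.sqrt(total)


# Hypothetical estimates and standard errors from five imputed data sets.
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
std_errors = [0.10, 0.10, 0.10, 0.10, 0.10]
pooled_estimate, pooled_se = pool_rubin(estimates, std_errors)
print(pooled_estimate, pooled_se)
```

The pooled standard error is larger than the average within-imputation standard error whenever the estimates differ across imputations, which is how the uncertainty due to missing data enters.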

Anonymous posted on Thursday, May 10, 2001 - 1:25 pm

In the manual, you point out that "Mplus has two special data handling features when data are missing because of the design of the study."

I understand that using the "not by design" missing data features, models assume that data are missing at random or missing completely at random. I have a data problem that doesn't technically seem to fit either scenario.

The study is looking at drug/alcohol treatment over time. It follows 2 cohorts of over 1,300 adults at baseline, 6 months, 18 months, 24 months, and 36 months. Because of funding constraints only 1 cohort (n about 700) was interviewed at 18 months. Both cohorts were interviewed at each of the other waves. I am wondering whether or not we should simply drop the entire 18-month wave data in a growth curve model or if we can somehow include the existing data from the 1 cohort who was interviewed. Technically, the missing cohort at 18 months was not missing at random and it did not seem to be similar to the missing patterns by design examples either. In addition, because this is a longitudinal study, there are data at waves other than 18 months that are missing, but these are more defensibly considered “missing at random.”

Any advice? Thanks in advance.

Linda K. Muthen posted on Friday, May 11, 2001 - 10:02 am

I would not get rid of the data for 18 months. Measuring one cohort only at 18 months constitutes missing by design and is MCAR. You also have attrition which may be MAR but I couldn't comment on that. I would analyze all of the data using TYPE=MISSING. This is if your outcomes are continuous. TYPE=MISSING is not available for categorical outcomes.

Anonymous posted on Friday, May 11, 2001 - 1:55 pm

I guess it is MCAR (i.e., a random event caused it). I wasn't thinking of it as such since I was wondering if any existing differences between the 2 cohorts would pose a problem in the missing data estimation. But it was not cohort differences that "caused" the missing data, just the flip of the cohort coin. Your advice is very helpful. Thanks!

And yes, we do have attrition too that we would consider MAR.

Mike W posted on Monday, September 10, 2001 - 9:42 am

I'm interested in analyzing repeated measures data using a latent growth curve model. The data come from a complex sampling design (individual cases have sampling weights), are non-normal, AND have both planned/unplanned missing values (i.e., cohort sequential design/sample attrition).

I'm interested in using 3 of Mplus' features:
1. complex sampling (type=complex)
2. FIML missing data estimation
3. Robust estimators (MLM, MLMV, WLSM, WLSMV).

My question is whether these 3 features can be used in conjunction. If not, I'm wondering if it would make sense to do multiple imputation on the missing data, and then use the complex sampling & robust estimators in conjunction?

I appreciate any thoughts you may have. -MW

Linda K. Muthen posted on Tuesday, September 11, 2001 - 9:15 am

We are having an update in about two weeks that will include crossing TYPE=COMPLEX with MISSING AND MIXTURE, but weights will not be allowed at this time. You can do a one class mixture and thereby cross complex without weights and missing. Perhaps that might help. The estimator is MLR. MLR has maximum likelihood estimates and robust standard errors. This may help. Otherwise, multiple imputation would be the way to go.

Anonymous posted on Wednesday, February 20, 2002 - 4:20 am

Does Mplus have any methods to estimate a model when the missing data is non-ignorable?

Bengt O. Muthen posted on Wednesday, February 20, 2002 - 7:33 am

Work is being done in this area but nothing definitive is available at this time. A possibility is to use the pattern mixture approach (see Little's 1995 JASA paper), using covariates, a multiple group approach, or a mixture approach.

Daniel Zimprich posted on Monday, March 25, 2002 - 2:09 am

I am trying to estimate the Diggle and Kenward (1994) model in order to account for non-ignorable missing data. This would, however, require estimating a multiple-groups model with heterogeneous structures (as it is called in AMOS) using MPLUS. Thus, different groups should be allowed to have different variables included in the analyses (e.g., in a longitudinal setting with one outcome variable measured up to six times, for group1 outcome1 would be modeled, for group2 outcome1 and outcome2 would be modeled, etc.). I did not find such an option in MPLUS, so my question is: Is it possible to estimate a multiple groups model with heterogeneous structures in MPLUS (which would also be helpful for "pattern mixture models")?

bmuthen posted on Tuesday, March 26, 2002 - 8:37 am

In the Diggle and Kenward approach, Mplus would need to model the growth part among the "y's" and the missingness as a function of previous observed y's among the "u's", where the quotation marks are used to refer to the general model parts of the Mplus framework. The D & K approach therefore needs to be able to allow missingness on y's as well as "u ON y" (logit) regressions. This combination cannot be done in the current Mplus, but is planned for version 3 due out early 2003.

The pattern mixture approach, however, would seem to be possible to carry out in the current Mplus. This would not use the regular multiple-group track because that requires non-zero variance for equal numbers of observed variables in the groups, which is not present here due to missingness. Instead the mixture track (type=mixture) would be used. In the mixture track, there is no problem due to missing data and zero variance for y's. The groups corresponding to the dropout patterns can be represented by latent classes ("c"), where the known class membership is handled using the "training data" feature. The growth model parameters for the y's can then be allowed to vary across classes (groups, patterns) to the extent desired. We can help with trying this approach.
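To make the idea of known class membership concrete, here is a hedged Python sketch that derives one 0/1 indicator column per dropout pattern from the observed data. The helper names and the data are hypothetical, and the exact layout Mplus expects for training variables should be checked in the User's Guide; the sketch only shows how patterns map to classes.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]


def dropout_pattern(row: Row) -> Tuple[int, ...]:
    """Flags per occasion: 1 if the outcome is observed, 0 if it is missing."""
    return tuple(0 if value is None else 1 for value in row)


def training_indicators(rows: List[Row]):
    """Return the distinct patterns and, per case, a 0/1 column for each pattern."""
    patterns = sorted({dropout_pattern(r) for r in rows}, reverse=True)
    index = {pattern: i for i, pattern in enumerate(patterns)}
    indicators = []
    for r in rows:
        flags = [0] * len(patterns)
        flags[index[dropout_pattern(r)]] = 1   # the case belongs to exactly one pattern class
        indicators.append(flags)
    return patterns, indicators


# Hypothetical outcome measured on three occasions; None marks dropout.
rows = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, None, None],
    [1.2, 2.2, 3.1],
]
patterns, indicators = training_indicators(rows)
for row, flags in zip(rows, indicators):
    print(row, flags)
```

Each indicator column would then correspond to one latent class, with class membership treated as known rather than estimated.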

Anonymous posted on Thursday, May 23, 2002 - 7:09 am

As I understand it, the new analyses in Mplus 2.1 assume MAR, but they use White corrected S.E.s, which I thought assume MCAR. Am I mistaken in my assumption?

Bengt O. Muthen posted on Thursday, May 23, 2002 - 4:57 pm

Theory supports the fact that the corrected standard errors (sandwich or White) for missing data are correct under MCAR with non-normal data. For normal data, they are correct under MAR. We have found that these corrected standard errors also work better than regular standard errors under MAR and non-normality. However, there appears to be no theory to support this (see Yuan and Bentler in Soc Methods, 2000).

Sherry Chen posted on Wednesday, June 12, 2002 - 11:07 am

I am trying to replicate the table on page 27 of Allison's Missing Data using Mplus. I have trouble specifying the model correctly. In AMOS syntax, the model is like this:

gradrat = () + csat + .. + (1) error
act (correlated with) error

This implies that variable act is also correlated with all the independent variables in the model. How should I specify the same model in Mplus?

I have tried the following and a few variations of it, but they don't seem to work:

VARIABLE:
NAMES ARE csat act stufac gradrat rmbrd private lenroll ;
USEVARIABLES ARE csat act stufac gradrat rmbrd private lenroll ;
MISSING ARE ALL (-999);

MODEL:
gradrat on csat stufac rmbrd private lenroll ;
act with gradrat csat stufac rmbrd private lenroll;

ANALYSIS:
TYPE IS MISSING H1 Meanstructure;
ESTIMATOR IS ML;
CONVERGENCE = 0.005;
COVERAGE = 0.10;

Another try is to specify the model as follows and the results are very close, but I don't think it models the same model:

MODEL:
gradrat on csat stufac rmbrd private lenroll ;
act on gradrat csat stufac rmbrd private lenroll;

Linda K. Muthen posted on Wednesday, June 12, 2002 - 11:58 am

Could you send the AMOS output and the two Mplus outputs to support@statmodel.com? If you can send the data, that would also be helpful.

Linda K. Muthen posted on Wednesday, June 12, 2002 - 5:02 pm

Thank you for sending the outputs. The correct model in Mplus is the one using the WITH statements. The reason the answers did not agree is that this model did not converge. I added two starting values for the variables GRADRAT and CSAT, which have large variances, and the model converged to the same solution as AMOS.

Levent Dumenci posted on Monday, July 08, 2002 - 9:43 am

The following input does what I want to do:
TITLE:
Modified from http://www.statmodel.com/mplus/examples/categorical/cat4.html
DATA:
FILE IS wmimicd.dat;
VARIABLE:
NAMES ARE x1-x3 y1-y16;
USEV = y6-y10;
CATEGORICAL = y6-y10;
GROUPING = x3 (1 = groupA 2 = groupB);
ANALYSIS:
TYPE = MGROUP MEANSTRUCTURE;
MODEL:
f1 BY y6-y10;
OUTPUT: standardized;

What changes do I need to make in this input file when y10 is missing by design in groupB (e.g., Type = mixture missing)? Also, are there any fit indices unavailable after respecifying this missing data problem as a mixture analysis?

Anonymous posted on Tuesday, July 09, 2002 - 9:26 am

You can split groupB into two groups, one group for the observations with y10 present and the other with y10 missing. Add f1 by y10@0 for the last group and use type = mgroup meanstructure.
Type = mixture missing is not going to give you what you want.

bmuthen posted on Monday, July 15, 2002 - 4:16 pm

The idea of the solution proposed above is good - that y10 should not influence the fitting function in the last group where it is missing - but there seem to be two complications. Mplus will complain that y10 has zero variance in the last group. This can be circumvented by letting one person have a different value for y10 in the missing data group to give a quasi-nonzero y10 variance. Also, I think the weight matrix will be singular with zero variance and I don't know its quality if a quasi-nonzero variance is introduced for y10. I don't know if some other trick can be used. Categorical missing facilities are forthcoming in future Mplus versions.

Anonymous posted on Thursday, July 25, 2002 - 7:24 am

Hello. I am working on an LCA model with some missing data, and I would appreciate some advice on its behavior. The (binary) latent class indicators include 5 behaviors measured at each of 2 posttreatment follow-ups (for a total of 10 indicators). About 40% of the sample were interviewed at the short-term follow-up, but not the long-term assessment. Some noninterviews were by design, and some represent attrition. I also have several pretreatment covariates I am using to predict the classes. There are several points I am wrestling with:

1. The "Test for MCAR..." is clearly nonsignificant. What practical effect should this have on modeling strategy?

2. Group membership changes noticeably when I add predictors. I suspect many "changers" are individuals with only 1 interview, because they simply have less "u" information available for classification, increasing the importance of their "x" information. If this is true, is it reasonable to believe that the LCA with covariates is more likely to be the "correct" model? Should decisions about the number of classes be made in the presence of the covariates?

3. To test for possible nonignorable missingness, is it appropriate in the context of LCA to try a "pattern mixture" approach (in the spirit of Little or Hedeker & Gibbons)? That is, adding a "missing interview" indicator and interaction terms to the set of covariates.

Thank you in advance for any suggestions.

bmuthen posted on Friday, July 26, 2002 - 10:05 am

Good questions.

Re 1, your MCAR test would seem to suggest that you can feel more comfortable using the ML approach that you are using. The ML approach is correct under the less strict MAR assumption. So, having support for MCAR is comforting, but I don't see that it changes your modeling strategy.

Re 2, changing group membership may point to a misspecification. If in the true model the predictors influence only class membership and not the latent class indicators directly, then you should get statistically the same membership with and without predictors in the model. But if the true model has some direct effects of predictors on latent class indicators, then class membership will change when including predictors but not allowing for the direct effects. The solution is to examine the need for direct effects by including one at a time and looking at chi-square differences (2*logL differences). It is also correct that predictors help to better determine class membership when the latent class indicator information is not strong, but with a correctly specified model this additional information should not cause essential changes in membership.

Re 3, yes, a pattern mixture approach could be useful here.

Anonymous posted on Wednesday, July 31, 2002 - 6:54 am

I am trying to run a Discrete-Time Survival Analysis, but I have missing data in my X values. Is the only way for me to estimate a model with missing data to use a program such as NORM and impute the missing values?

bmuthen posted on Wednesday, July 31, 2002 - 9:38 am

Yes, unless the x variables are such that they do not influence class membership, in which case they can be turned into "y variables" (for which missingness is handled) by referring to a parameter for x (e.g. its mean).

Anonymous posted on Saturday, November 02, 2002 - 2:54 pm

I need help running an EFA with missing data. Missingness is due to use of a 3-form design for 180 participants; data from 49 additional participants who completed either the first or second half of the 64-item set are also included. Covariance coverages range from .262 to .633. I used the following code--my first MPLUS experience.

Data: file is deedataII.txt;
Variable: names are I1-I94;
Usevariables are I1-I64;
Missing = .;
ANALYSIS:
TYPE IS EFA 1 5 MISSING;
ESTIMATOR = ML;
H1ITERATIONS = 500;
H1CONVERGENCE = 0.0001;
COVERAGE = 0.10;

I have tried lowering the coverage criterion to .08, running the model with up to 16 of the 64 variables of interest deleted, eliminating the H1ITERATIONS & H1CONVERGENCE statements, and using analysis type missing basic. The messages I get go something like this...

THE MISSING DATA EM ALGORITHM FOR THE H1 MODEL
HAS NOT CONVERGED WITH RESPECT TO THE LOGLIKELIHOOD
FUNCTION. THIS COULD BE DUE TO LOW COVARIANCE COVERAGE OR A NOT SUFFICIENTLY STRICT EM PARAMETER CONVERGENCE CRITERION.
CHECK THE COVARIANCE COVERAGE, OR SHARPEN THE EM PARAMETER CONVERGENCE CRITERION, OR RERUN WITHOUT H1 TO OBTAIN H0 PARAMETER ESTIMATES AND STANDARD ERRORS.
NOTE THAT THE NUMBER OF H1 PARAMETERS (MEANS, VARIANCES, AND COVARIANCES) IS GREATER THAN THE NUMBER OF OBSERVATIONS.
NUMBER OF H1 PARAMETERS : 2144
NUMBER OF OBSERVATIONS : 229

I think that the covariance coverage is adequate--how do I go about changing the convergence criterion or running the model without H1?

bmuthen posted on Sunday, November 03, 2002 - 8:38 am

You can try sharpening the H1convergence criterion to say 0.00001. One question is if your missingness is by design - you mention a 3-form design. If so, there may be alternative approaches.

Anonymous posted on Sunday, November 03, 2002 - 10:11 am

In response to your question, yes the missingness is by design; 180 participants completed 3-form design questionnaires containing 2 of the three subsets of items. I have additional data from 49 participants, each of whom completed half of the 64 items of interest. What options does this give me?

bmuthen posted on Sunday, November 03, 2002 - 5:38 pm

Here is an answer about what one can do in principle with missing by design - without claiming that this is how you should try to do your analysis. If I understand your design correctly, apart from the 49 subjects, there are 3 groups of subjects, each of which has missingness on parts of the variables. In a CFA, these 3 groups could be handled via multiple-group modeling where in each group only the reduced set of variables actually observed in the group would be considered, so that each group would only have missingness that is not by design. This would be an analysis with only about 2/3 of the variables and therefore perhaps less heavy. This approach has 2 complications for you. One, it is not clear how to handle the 49 participants since each group needs to have the same number of variables. Two, you want to do an EFA.

Regarding your analysis, what is the lowest coverage value that gets printed?

Anonymous posted on Sunday, November 03, 2002 - 6:11 pm

The lowest covariance coverage is .262.

bmuthen posted on Tuesday, November 05, 2002 - 8:42 am

Have you had any success using the sharpened H1convergence criterion? If not, perhaps you want to send your input and data to Mplus support so they can help.

Anonymous posted on Saturday, December 07, 2002 - 12:36 pm

I am thinking about using multiple imputation with data on which I am doing a structural equation model. The outcome variable in this model is dichotomous, which limits my options for handling missing data. I am considering using multiple imputation, and am wondering how to approach doing this in Mplus. I can create my multiple data sets in other software packages. I have read on your website about the RUNALL facility. Would it make sense to run the analyses with that? Also, are there any other features in Mplus that might be useful in this, including anything that would combine the results from the multiple runs to give final estimates? (Or is that step something I need to do by hand?)
Thanks.

Linda K. Muthen posted on Saturday, December 07, 2002 - 5:18 pm

RUNALL would be the way to go. Results for the analysis of each data set are saved in an ASCII file which can subsequently be analyzed to obtain means of the parameter estimates etc.

anonymous posted on Monday, January 27, 2003 - 7:11 pm

I would like to use the MLR option across multiple imputations. Because I have no missing data I am not specifying type=missing. When I try to run the model I get a message telling me that the MLR estimator is not available with type=general. Is MLR only available if you have missing data?

Linda K. Muthen posted on Tuesday, January 28, 2003 - 7:02 am

You can say TYPE=MISSING even if you don't have any missing data. Then you will be able to use MLR.

Anonymous posted on Tuesday, June 03, 2003 - 4:02 pm

My sense is that Mplus can only account for data missing on Y variables.

Is this because the computation is too intensive to include imputation on X's, or because it's empirically incorrect to impute on Y's and X's at the same time?

I ask because I've noticed that one of the HLM packages allows multiple imputations on X and Y in the same model run. This would appear to imply that such models borrow information from the X's to impute Y's (and vice-versa).

Will Mplus allow for imputation on X's and Y's in the near future (version 3.0) ?

Thanks.

bmuthen posted on Tuesday, June 03, 2003 - 5:33 pm

Modeling typically concerns a specification of the distribution of y | x (y conditional on x), whereas the marginal distribution of x is not involved in the model. When there is missing on x, a model for the marginal x part needs to be added. This is true for imputations as well as other modeling. That's why missingness on x's changes the picture and is not trivial - it calls for an extended model that may be hard to specify realistically.

I am not clear on what type of x modeling HLM does for imputations in the x part - I am not sure that this is stated; please let me know if I am wrong. Mplus does not do imputations, but handles missing data in a general way using ML under MAR. Mplus can handle missing on x's if they are brought into the model as "y's". This is done automatically in some tracks of the program (such as non-mixture, non-categorical). In other tracks, x's can be moved into the y set by mentioning parameters related to them in the model. Missing on x is then handled by a normality model for the x's. Normality may not be suitable if x's are, say, binary and skewed. In Schafer's imputation programs, missingness on categorical x's is handled by loglinear modeling. Mplus Version 3 will have more facilities related to missingness on categorical variables and missingness for variables that have random slopes.

Yongyun Shin posted on Thursday, June 19, 2003 - 1:32 pm

Dear Dr. Muthen,
Would it be possible to estimate in Mplus

1. multilevel model with missing data
2. multilevel SEM with missing data?

For question 2, I would appreciate your
recommended reference.

Thank you.
Yongyun Shin

Linda K. Muthen posted on Thursday, June 19, 2003 - 2:18 pm

Both 1 and 2 can be estimated in the current version of Mplus. These techniques have been available since Version 2.1, which came out in May 2002. The use of these techniques is described in the Addendum to the Mplus User's Guide, which can be found at www.statmodel.com under Product Support. More features are coming in Version 3 in the Fall.

The Mplus techniques for multilevel SEM with missing data are described in a paper that we will be happy to make available at the end of the Summer. We are not aware of any other references on this topic.

Peter Elliott posted on Saturday, January 31, 2004 - 6:38 pm

I am using Mplus to perform a stepwise regression analysis. I am using Mplus because some of the data is missing (N=363). For Step 1, the Mplus syntax is:
Title: Injury analysis;
Data: file is "filename";
Variable: Names are DV IV1 IV2 IV3 IV4 IV5;
Usevariables are DV IV1;
Missing are all (999);
Analysis: Type = H1 Meanstructure missing;
Model: DV on IV1;
Output: Standardized;
The output shows:
Chi-square test of model fit for the baseline model
Value 3.080
DF 1
P-Value 0.0000
(R sq = 0.011)
For Step 2:
Title: Injury analysis;
Data: file is "filename";
Variable: Names are DV IV1 IV2 IV3 IV4 IV5;
Usevariables are DV IV1 IV2 IV3 IV4 IV5;
Missing are all (999);
Analysis: Type = H1 Meanstructure missing;
Model: DV on IV1 IV2 IV3 IV4 IV5;
Output: Standardized;
The output shows:
Chi-square test of model fit for the baseline model
Value 17.652
DF 5
P-Value 0.0000
(R sq = 0.062)

To calculate the significance of the R sq change (0.062 - 0.011 = 0.051) can I simply calculate the change in Chi sq (17.652 - 3.080 = 14.572), the change in DF (5 - 1 = 4) and conclude that the R sq change is significant @ p<.01? (The critical value of Chi sq for 4 df and p < .01 is 13.28). Or am I on the wrong track completely??
For your advice please,

Peter Elliott

Linda K. Muthen posted on Saturday, January 31, 2004 - 7:19 pm

A chi-square difference test cannot be used to determine whether an R-square difference is significant. It can be used to see if parameters in nested models are significant. For example, you could compare

Model: DV on IV1 IV2 IV3 IV4 IV5; to

Model: DV on IV1 IV2@0 IV3@0 IV4@0 IV5@0;

to determine if the four covariates in the model are jointly significant.
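The nested-model comparison described above can be worked through numerically. This is a minimal Python sketch assuming ordinary ML chi-square values (robust estimators such as MLR require a scaled difference instead); the chi-square values passed in are hypothetical, and the function names are illustrative.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_difference_test(chi2_restricted: float, df_restricted: int,
                         chi2_full: float, df_full: int):
    """Difference in chi-square and df between nested models, with its p-value."""
    diff = chi2_restricted - chi2_full
    df_diff = df_restricted - df_full
    return diff, df_diff, chi2_sf(diff, df_diff)


# Hypothetical fit values: the restricted model fixes four slopes at zero,
# the full model frees them, so the difference has 4 degrees of freedom.
diff, df_diff, p_value = chi2_difference_test(14.6, 4, 0.0, 0)
print(diff, df_diff, p_value)
```

A small p-value indicates that fixing the four slopes at zero worsens fit significantly, that is, the covariates are jointly significant.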

Peter Elliott posted on Sunday, February 01, 2004 - 4:51 pm

You are so quick and so helpful. Thank you Linda.

Best wishes,

Peter Elliott

Anonymous posted on Monday, June 28, 2004 - 1:34 pm

Hi there,

I wish to run a multi-group analysis (women vs. men) using the missing option.

1. Would my analysis code be type=missing h1; or type = missing h1 mgroup?

2. Can I compare nested models (e.g., restricting covariances to be equal across groups) using a chi-square difference test when using the missing command?

3. When I run a mgroup analysis (not specifying missing) leaving the 'estimator=' blank, Mplus uses the ML estimator. Can I always trust what Mplus picks? For example, I have some categorical ivs and some categorical indicators of a latent variable.

Thanks in advance!

bmuthen posted on Monday, June 28, 2004 - 3:19 pm

1. The former.

2. Yes

3. With categorical factor indicators Mplus defaults to WLSMV - and yes, the defaults have good reasons behind them.

Anonymous posted on Tuesday, June 29, 2004 - 7:20 am

I am trying to use missing data analysis to run a simple path model. However, Mplus only analyzes cases with no missing data (i.e., listwise). How can I use FIML in this situation? Thanks!

Mplus VERSION 3.01
MUTHEN & MUTHEN
06/29/2004 10:00 AM

INPUT INSTRUCTIONS

TITLE: PATH ANALYSIS
DATA: FILE IS C:\a1.DAT;
VARIABLE: NAMES ARE id x1 x2 x3 y x4;
CATEGORICAL ARE y;
USEVARIABLES ARE x2 x3 y x4;
missing are x2 x3 y x4 (-9);
analysis: type = basic missing;
MODEL: y ON x2 x3 x4;
OUTPUT: stand tech1;

*** WARNING
Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 115
*** WARNING
Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 318
*** WARNING
Data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis. Number of cases with missing on all variables except x-variables: 209

3 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

bmuthen posted on Tuesday, June 29, 2004 - 9:50 am

Regression on x's does not include a model for the x distribution, but the model concerns the y outcome conditional on the x's. To handle missing data on x's, you need to expand the model to include a model for the x's, e.g. assuming normality. This can be done in several ways. One way is to first do a multiple imputation step outside Mplus. Note that Mplus can take multiply imputed data as input. Another way is to include the x's in the model in Mplus - this is done by mentioning say their variances:

x2-x4;

You can use ML estimation by the Analysis option

estimator = ml;

in which case the missing data on your 3 x's results in 3 dimensions of numerical integration.

David J. DeWit posted on Monday, October 25, 2004 - 11:35 am

Hello:

I am running a parallel process latent growth curve model in version 3.11 (3 equally spaced time points) involving two outcomes measured continuously (depression and smoking). The latent intercepts and slopes are regressed on two x's: gender and number of siblings. The data are nested (individuals nested within schools). There are missing values on both the Y's and on the sibling "X" variable.

The model appears to run fine (no warnings or error messages) and generates results that make sense. However, when I attempt to evaluate the plausibility of the model for girls and boys separately, I get a warning message that states "data set contains cases with missing on x-variables. These cases were not included in the analysis". Below is the syntax used for the latent growth model using the multiple group procedure:

USEVAR=DEPPSH1 DEPPSH2 DEPPSH3
CIGS1 CIGS2 CIGS3 NUMSIB;
GROUPING IS SEX (1=MALE 0=FEMALE);
MISSING=BLANK;
CLUSTER=SCHID;
ANALYSIS: TYPE=MISSING H1 MEANSTRUCTURE COMPLEX;
MODEL: I1 BY DEPPSH1-DEPPSH3@1;
S1 BY DEPPSH1@0 DEPPSH2@1 DEPPSH3@2;
I2 BY CIGS1-CIGS3@1;
S2 BY CIGS1@0 CIGS2@1 CIGS3@2;
S1 ON I2;
S2 ON I1;
[DEPPSH1-DEPPSH3@0 I1-S1];
[CIGS1-CIGS3@0 I2-S2];
I1 WITH S1;
I2 WITH S2;
I1 WITH I2;
S1 WITH S2;
I1 ON NUMSIB;
S1 ON NUMSIB;
I2 ON NUMSIB;
S2 ON NUMSIB;
OUTPUT:STANDARDIZED;

Am I making an error somewhere in the syntax? Does Mplus offer FIML for LGC models where there is missing data on both the Y's and the X's in the context of running complex models (nested data) and multiple group comparisons?

I look forward to your response.

D.J. DeWit

Linda K. Muthen posted on Monday, October 25, 2004 - 11:52 am

I don't see how you would not have missing on x's when you read the full data set and have missing on x's when you look at part of the data set. If you send the two outputs, the one that worked and the one that didn't, and the data, to support@statmodel.com, I can figure this out.

Regarding missing on x's, the following is from Chapter 1 in the Mplus User's Guide:

"In all models, missingness is not allowed for the observed covariates because they are not part of the model. The outcomes are modeled conditional on the covariates and the covariates have no distributional assumption. Covariate missingness can be modeled if the covariates are explicitly brought into the model and given a distributional assumption."

Anonymous posted on Tuesday, October 26, 2004 - 3:01 am

I have a question concerning missing data. I am constructing a twolevel model. Some of my within-level variables have missing values. Should I specify that type is twolevel missing or is it not necessary? I am using mlr as an estimator.

Thank you!

Linda K. Muthen posted on Tuesday, October 26, 2004 - 8:24 am

If you want the model estimated using all observations, you must use the MISSING keyword as part of the TYPE option. The default is listwise deletion.

Jacob Felson posted on Monday, November 22, 2004 - 9:40 pm

I had a general question about using the H1 feature with missing data present. The manual (p.361) says that H1 allows the estimation of an "unrestricted mean and covariance model with TYPE=MISSING."

Appendix 6 of the technical manual says basically the same thing--that the H1 model does not restrict u-g or sigma-g.

I am just wondering if this could be unpacked a little because I'm trying to understand what's going on underneath the hood.

Thanks!

Linda K. Muthen posted on Tuesday, November 23, 2004 - 6:44 am

The unrestricted model is the model of all means, variances, and covariances of the observed variables being free. There are no restrictions on any of the parameters. It is the H1 model. The reason that it is not automatically estimated with TYPE=MISSING in all cases is that it can be slow and is needed only to compute chi-square. So Mplus has it as an option. Without it, you will get parameter estimates and standard errors but not chi-square.
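To see why H1 can be slow, it helps to count its free parameters. The sketch below assumes a single group with p observed variables, so H1 has p means plus p(p+1)/2 variances and covariances; the function name is illustrative. With the 64 items of the EFA post earlier in this thread it gives the 2144 H1 parameters reported in that output.

```python
def h1_parameter_count(p: int) -> int:
    """Free parameters in the unrestricted (H1) model for p observed variables."""
    means = p
    variances_and_covariances = p * (p + 1) // 2
    return means + variances_and_covariances


# 64 observed items, as in the EFA example earlier in the thread.
print(h1_parameter_count(64))  # 2144
```

The count grows quadratically in the number of variables, so it quickly exceeds the sample size in item-level analyses.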

Anonymous posted on Thursday, January 27, 2005 - 8:33 am

I have a model where I am testing for invariance of structural paths across gender in the multiple-group context (all observed, continuous variables) but I am concerned that I have data that are NMAR. One of my endogenous variables is frequency/quantity of alcohol use and I have strong reason to believe that missingness on alcohol use is related to true levels of alcohol use. Consistent with suggestions made in earlier postings, I have constructed training data to represent 4 gender X missing data groups (i.e., males w/complete data, males w/incomplete data, etc. - missing data patterns are too sparse for additional patterns). In order to get weighted averages for structural coefficients (and intercepts) across missing data patterns (as you would via Hedeker/Gibbons 97), I have constrained all parameters to equality within gender for all models (i.e., male incompletes and male incompletes equated, female incompletes and female incompletes equated).

My base model would be the fully constrained model (all parameters equated across all 4 groups) - In order to test for invariance across gender, I allowed males to differ from females (but maintained equality constraints between the within-gender missing data groupings). I then used 2(deviance1-deviance2) for single df X^2 difference tests for invariance across gender.
I wanted to know if a) this approach to pattern mixture modeling is generally defensible, b) could I compare the deviance from a fully saturated model against my base model so I can give an indication of "model fit" (and hand calculate RMSEA and the like) and c) if this is defensible, is there a citation to justify this approach specifically in SEM other than Hedeker/Gibbons 97 or Little 93 (i.e., Muthen/Brown 01 - is this manuscript available)?
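
For readers who want to reproduce the single-df difference test described above, the following is a minimal sketch. It assumes plain ML log-likelihoods from two nested models (if you start from deviances, i.e. -2 log L, the difference of the deviances is already the test statistic), and it only handles the 1-df case so that the standard library suffices. The function name is hypothetical. Robust estimators such as MLR need a scaled difference test instead.

```python
import math


def lr_difference_test_1df(loglik_restricted: float, loglik_free: float):
    """Likelihood-ratio chi-square difference test with 1 degree of freedom.

    loglik_restricted: log-likelihood of the more constrained (nested) model
    loglik_free:       log-likelihood of the less constrained model
    Returns (chi_square, p_value).
    """
    chi2 = 2.0 * (loglik_free - loglik_restricted)
    if chi2 < 0:
        raise ValueError("the restricted model cannot have the higher log-likelihood")
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p = lr_difference_test_1df(-1000.0, -998.0795)
print(round(chi2, 3), round(p, 3))
```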

bmuthen posted on Thursday, January 27, 2005 - 1:12 pm

A pattern-mixture approach of this kind seems generally ok, but I need some clarification. You seem to have a typo in the parenthesis at the end of the last paragraph and the second parenthesis of the second paragraph sounds strange to me. Seems like you want to test gender invariance while allowing for missing data differences within each gender. Also, does Hedeker's work give formulas for weighting regression coefficients across the missing data groups? I have not seen a reference to pattern-mixture for SEM. Muthen-Brown is still not available and is focused on actually letting latent variables predict missingness.

Anonymous posted on Thursday, January 27, 2005 - 2:09 pm

Yes, I am looking to test gender invariance and adjust for differences across the missing data groups.........Hedeker does give formulas for weighting regression coefficients across missing data groups. He does this by weighting the estimates by the observed proportions among the missing data groups in his 97 Psych Methods paper (an illustration of a conditional LGM with NMAR dropout in Proc Mixed). Here is the link to the .pdf http://tigger.uic.edu/%7Ehedeker/RRMPAT.pdf. Formulas 12 and 14 are the formulas for the weighted estimates and standard errors respectively.
The corresponding dataset and SAS IML program that performs the matrix operation described therein is at http://tigger.uic.edu/~hedeker/ml.html. I simulated longitudinal data that were NMAR for 2 missing data patterns and analyzed the simulated data both in Proc Mixed/IML with his approach and in GGMM in Mplus with the two missing data groups identified w/training data and got nearly identical estimates. So the approach seemed viable but I did not want to move forward without consultation. I have had great difficulty finding a published analog to Hedeker's approach in SEM and had wondered whether Muthen-Brown was the SEM analog but also wanted your take on the approach before going forward........
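
The idea of weighting pattern-specific estimates by the observed pattern proportions can be sketched as follows. This is not a transcription of formulas 12 and 14 in the linked paper; it is a generic delta-method version that assumes the pattern-specific estimates are independent of each other and of the pattern proportions, and that the proportions are multinomial. All names are hypothetical.

```python
import math


def pattern_weighted_estimate(estimates, std_errors, counts):
    """Average a parameter over missing-data patterns.

    estimates:  pattern-specific estimates of the same parameter
    std_errors: their standard errors
    counts:     number of cases observed in each pattern
    Returns (weighted_estimate, standard_error).
    """
    if not (len(estimates) == len(std_errors) == len(counts)):
        raise ValueError("inputs must have the same length")
    n = sum(counts)
    props = [c / n for c in counts]

    avg = sum(p * b for p, b in zip(props, estimates))

    # Variance treating the proportions as fixed weights
    var_fixed = sum((p * s) ** 2 for p, s in zip(props, std_errors))
    # Extra variance because the proportions are themselves estimated
    var_props = (sum(p * b * b for p, b in zip(props, estimates)) - avg ** 2) / n

    return avg, math.sqrt(var_fixed + var_props)


est, se = pattern_weighted_estimate([1.0, 3.0], [0.2, 0.4], [75, 25])
print(round(est, 3), round(se, 3))
```

The second variance term vanishes when the patterns share the same estimate, which is what one would expect if missingness were ignorable for that parameter.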

BMuthen posted on Friday, January 28, 2005 - 2:42 pm

I will look at this when I am back in the country -- after February 2. It looks like you're on the right track and may be the first to do pattern-mixture in SEM.

Anonymous posted on Friday, January 28, 2005 - 4:58 pm

Thank you so much Bengt - look forward to hearing more from you when you are back. Bidding you safe travels on your return to LA......

Anonymous posted on Friday, February 04, 2005 - 3:03 am

Hello and thanks for a great program.

I'm running a simple linear regression analysis in Mplus3 where I want to correct the standard errors for the design effect (two-level structure) as well as estimate this model with missing data.

The "problem" is that I have got missing data on both the dependent variable and several of the independent variables. In another posting on this page you wrote

"Regression on x's does not include a model for the x distribution, but the model concerns the y outcome conditional on the x's. To handle missing data on x's, you need to expand the model to include a model for the x's, e.g. assuming normality. This can be done in several ways. One way is to first do a multiple imputation step outside Mplus. Note that Mplus can take multiply imputed data as input. Another way is to include the x's in the model in Mplus - this is done by mentioning say their variances:

x2-x4;"

This leaves me a bit confused on exactly how to write it in syntax. Could you please take a look at my syntax and see if this is the correct way to write a model where we have got missing data in a complex design (where there is also missing on the independent variables) as well as correcting the standard errors.

Names: The names of the variables in the dataset;
Missing = All (-99);
centering GRANDMEAN(age gender education);
cluster is klnr1;
Usevariables are
antisos age gender education singlem1 singlem2 stepf JPC singlef;
Analysis:
type=complex;
type=missing;
Model:
antisos on age gender education singlem1 singlem2 stepf JPC singlef;
age gender education singlem1 singlem2 stepf JPC singlef;

******
In this model we wish to see how six different family structures expressed as five dummy variables (singlem1 singlem2 stepf JPC singlef) predicts antisos after we have controlled for age gender education. I have got no missing data on the dummy variables-only on the control variables and the dependent variable.

Is this the correct way to do it? Do you think it's best to impute the missing data with multiple imputation (NORM) before you use Mplus, or to include the x's in the model in Mplus by mentioning their variances (as I think I have done here)? And related: Is it ok to impute missing data with NORM and use the imputed datasets in Mplus3 even when you have got a nested data set?

Thank you in advance.

Linda K. Muthen posted on Friday, February 04, 2005 - 10:11 am

I think this is the correct approach. I don't think NORM handles clustered data. You may have to correlate the observed variables using the WITH option. I'm not sure of the default. You could, however, remove the dummy variables from the variance list given that they have no missing data.

Anonymous posted on Wednesday, March 09, 2005 - 11:32 am

What approach does Mplus use to compute the S.E.'s for the H1 model with missing data? Is it using the observed information matrix evaluated at the final estimates?

Linda K. Muthen posted on Wednesday, March 09, 2005 - 12:25 pm

The observed information matrix is used. With ML and MLR, there is an option to use the expected information matrix.

Anonymous posted on Friday, March 18, 2005 - 10:02 pm

I have longitudinal survey data at 5 time points. I am interested in using multiple imputation to handle missing data. I plan on using available data from time 1 for the imputation model used to impute values for the missing values at time 1. I would like to use the imputed (i.e., complete) data from time 1 to help impute the missing data in time 2, and so on.

I have two main questions about this:

Would you recommend doing a single imputation for each wave of data? Otherwise, I would have, say, m=5 imputed data sets for time 1, and then it is not clear how I would go about using time 1 to help impute the time 2 data.

Also, do you have recommendations about whether to use individual items vs. scale scores in the imputation model? I would like to have complete item-level data (for subsequent factor analyses), not just complete scale scores (for path analyses, for example). I have seen examples of multiple imputation and they all use scale scores in the model. Is it ever appropriate to use individual items in the imputation model? I can't seem to find anything about this in the literature.

Thank you.

bmuthen posted on Saturday, March 19, 2005 - 4:55 am

In principle, a good approach would be to use item-level data for all 5 time points jointly, perhaps adding covariates, analyzing these variables by ML under the usual MAR assumption. This approach is certainly doable on the scale score (or IRT theta) level. But perhaps the approaches you discuss are motivated by this ML-MAR approach involving too many variables when working on the item level. Perhaps that is why you suggest a time-by-time approach. However, the use of complete (partly imputed) data from time 1 for imputing values for time 2 does not seem like a good approach to me since it is acting as if the imputed time 1 data are real. And a single imputation would not give the desired result of multiple imputation - showing the true variability. Staying with the idea of imputing item-level data for each time point separately, it seems feasible to do this using observed (not imputed) data on items (and covariates) at all other time points. I am not familiar with literature on these matters.

Anonymous posted on Saturday, March 19, 2005 - 10:49 am

Thank you for your quick reply.
When you say "use the item-level data for all 5 time points jointly, perhaps adding covariates," isn't it the case that the covariates would already be factored in due to having all the survey items in the imputation model already? So I'm not sure what you mean here. Are you saying that if I want to use depression info as part of the model to impute values for missing anxiety scale items, I should use the depression scale score instead of the depression scale items?

Also, just to clarify, you think it is appropriate to use data collected at subsequent time points to impute values from previous time points? (I'm not arguing against the view, just want to clarify). Would this still be the case if there is a reason to expect measurements to change over time (e.g., some of the participants belong to an anxiety treatment group)?

Thank you again.

bmuthen posted on Saturday, March 19, 2005 - 4:19 pm

When I mentioned covariates I was making a distinction between background variables (e.g. demographics) and the (test?) items - it sounds like you are calling all of these variables "survey items" so we were probably just using different vocabulary. So my answer is no to your question at the end of your first paragraph. Regarding your second paragraph, yes my inclination would be to use any variable that might be correlated with the items with missing data.

Anonymous posted on Saturday, March 19, 2005 - 5:12 pm

Thanks again. This has been very helpful.

Anonymous posted on Tuesday, April 05, 2005 - 6:51 pm

I am using GGMM to analyse a longitudinal dataset with missing values. It seems that if "missing" is specified in the variable and analysis commands, the FIML method will be used and the default algorithm is EM, i.e. the observed-data log likelihood will be maximized. Am I right about this?
In the output I got a warning saying that the Fisher information matrix and the standard error matrix related to some parameters cannot be inverted. What does this imply in general?

One more question: is MCMC ever considered in Mplus when dealing with latent variables and missing values?

BMuthen posted on Wednesday, April 06, 2005 - 3:16 am

Yes, you are correct about your first question.

Regarding the information matrix, this implies that your model is not identified. Ask for TECH1 to see which parameter causes this.

The current version of Mplus does not include MCMC analysis.

Anonymous posted on Thursday, April 07, 2005 - 10:13 am

Many thanks for your prompt answer.

Anonymous posted on Wednesday, June 01, 2005 - 10:23 am

Hi there,

I am using type = missing h1 (with the ML estimator) for a structural equation model using all continuous variables (latent and manifest). I am trying to provide a brief description of what missing h1 does for a manuscript. I read the manual but was confused. Could you provide a brief description for inclusion in the manuscript?

Thanks in advance,

Courtney

bmuthen posted on Wednesday, June 01, 2005 - 5:56 pm

"Missing H1" says that we want to do ML estimation of an unconstrained (saturated) covariance matrix for the observed variables taking missingness into account under the MAR assumption (see the Little & Rubin book). This ML-MAR estimation is carried out using the EM algorithm in line with the L-R book. The estimated covariance matrix is used to compute a chi-square test of model fit, comparing H0 to H1.

Anonymous posted on Saturday, June 04, 2005 - 2:29 pm

Is it appropriate to use H1 with outcomes that are from all categorical data? That is, using theta parameterization and the WLSMV estimator?

Thanks.

Linda K. Muthen posted on Sunday, June 05, 2005 - 6:49 am

Yes, it is. There is a table in Chapter 15 of the Mplus User's Guide that shows which TYPE options are available for various estimators and outcome scales. See ESTIMATOR in the index of the user's guide to find this table.

R. MacIntosh posted on Tuesday, June 21, 2005 - 9:48 am

Is there any new information available on implementing the Hedeker-Gibbons pattern-mixture approach in Mplus? (See the 1/27/05 post above.)

BMuthen posted on Wednesday, June 22, 2005 - 12:15 am

No. Is there anything in particular you would like to know?

kristine amlund posted on Monday, July 04, 2005 - 7:36 am

Dear Dr. Muthen,
I am running a path analysis with 7 IVs at Time1 predicting 2 DVs at Time2. I have some missing data (not a huge amount, the coverage is around .9 for all variables). I have specified the Type = missing h1 under the analysis command. I have the following questions:
1. Does this missing data command take into account ALL variables that are listed under NAMES ARE, or does it use the variables that are listed in the USE VARIABLES ARE only?
2. If the latter is true, how do I go about letting other variables into the missing value analysis (for instance relevant covariates listed in the NAMES ARE list)?
3. One of the Time one variables is gender. The way I have written the syntax now is just listed gender after the ON statement. Should I specify that gender is categorical? If so, how do I write that in the syntax?
Thank you in advance and thanks for a wonderful help-page

Linda K. Muthen posted on Monday, July 04, 2005 - 8:24 am

1. USEVARIABLES only.

2. The USEVARIABLES list should contain all variables in the MODEL command -- independent and dependent variables.

3. You should not place independent variables on the CATEGORICAL list. This list is for dependent variables only.

Anonymous posted on Wednesday, July 06, 2005 - 9:47 am

Dear Dr. Muthen,

I have a question about missing value treatment.
If I want to use FIML instead of the EM algorithm, how do I do that?

Analysis=missing?

According to your previous response related to missing value treatment,

"Missing H1" says that we want to do ML estimation of an unconstrained (saturated) covariance matrix for the observed variables taking missingness into account under the MAR assumption (see the Little & Rubin book). This ML-MAR estimation is carried out using the EM algorithm in line with the L-R book.

Do you think that the MCMC option in LISREL does the same thing as multiple imputation under NORM or SAS proc MI?

Thank you very much!!!

bmuthen posted on Wednesday, July 06, 2005 - 4:44 pm

FIML is an estimator and EM is one algorithm for computing FIML estimates. Other algorithms include Quasi-Newton, Fisher Scoring, and Newton-Raphson. Mplus uses the EM algorithm for the unrestricted H1 models and the other algorithms for H0 models.

Saying Analysis type = missing implies using all available data. With FIML this is the standard "MAR" approach to missingness.

MCMC stands for Markov Chain Monte Carlo. I don't know how the LISREL approach relates to NORM.
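
The distinction above between FIML as an estimator and EM as an algorithm may be easier to see from the quantity being maximized. The sketch below evaluates the observed-data log-likelihood for two normally distributed variables, where each case contributes only through the variables it has observed. This is the criterion that EM, quasi-Newton, or any other algorithm would maximize under MAR. It is an illustration under the assumption of bivariate normality, not a description of how Mplus computes it internally, and the function name is made up.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)


def fiml_loglik_bivariate(data, mean, cov):
    """Observed-data log-likelihood for bivariate normal data.

    data: list of (y1, y2) pairs, with None marking a missing value
    mean: (m1, m2)
    cov:  ((s11, s12), (s12, s22))
    """
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    if s11 <= 0 or s22 <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")

    total = 0.0
    for y1, y2 in data:
        if y1 is not None and y2 is not None:
            # Complete case: bivariate normal density
            d1, d2 = y1 - mean[0], y2 - mean[1]
            quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
            total += -LOG_2PI - 0.5 * math.log(det) - 0.5 * quad
        elif y1 is not None:
            # Only y1 observed: univariate normal density of y1
            d = y1 - mean[0]
            total += -0.5 * (LOG_2PI + math.log(s11) + d * d / s11)
        elif y2 is not None:
            # Only y2 observed: univariate normal density of y2
            d = y2 - mean[1]
            total += -0.5 * (LOG_2PI + math.log(s22) + d * d / s22)
        # Cases missing on both variables contribute nothing
    return total


cases = [(0.0, 0.0), (0.0, None), (None, None)]
print(fiml_loglik_bivariate(cases, (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))))
```

No value is filled in for the incomplete case; it simply contributes a lower-dimensional likelihood term.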

Ad Vermulst posted on Friday, October 07, 2005 - 1:42 am

Dear Bengt/Linda,
I am using type=missing H1 in combination with the WLSMV estimator for ordered categorical dependent variables. Can you tell me how MPLUS 3 deals with missing values in this situation? I have read appendix 6 of your technical appendices, but this appendix is restricted to normally distributed y-variables. Maybe you can give me a lit. reference?
Thank you very much.
Ad Vermulst

bmuthen posted on Saturday, October 08, 2005 - 2:07 pm

See the missing data section of Chapter 1 of the version 3 User's Guide - which has the same content as the intro paragraphs for the Missing data topic here on Mplus Discussion.

Essentially, pair-wise information is used with categorical outcomes using the WLSMV estimator.

Reetu posted on Wednesday, December 14, 2005 - 2:19 pm

I am trying to do an exploratory factor analysis with both categorical and continuous variables. I have missing values in both and I'm getting an error telling me that I can only use the missing option if all my dependent variables are continuous. Is there a way of getting around this? How should I treat my categorical missing values?

Linda K. Muthen posted on Wednesday, December 14, 2005 - 2:44 pm

I'm not sure which version of the program you are using but I just tried this in Version 3.13 and it is fine.

Reetu posted on Wednesday, December 14, 2005 - 2:58 pm

I'm using version 2.14.

Linda K. Muthen posted on Wednesday, December 14, 2005 - 3:56 pm

That does not have missing data estimation for categorical outcomes. This came out in Version 3.

Annonymous posted on Wednesday, January 11, 2006 - 10:50 am

Is the missing data estimation for categorical outcomes appropriate even if it does not appear that the data is MAR or MCAR? How can one test to know for certain if the data is not missing at random?

bmuthen posted on Wednesday, January 11, 2006 - 11:02 am

There is no test for MAR. If one suspects ways in which MAR is violated, non-ignorable missing data modeling can be attempted to see if results differ. Although it is not always easy, you can do non-ignorable modeling in Mplus - see for example the model diagrams posted at

http://www.gseis.ucla.edu/faculty/muthen/ED231e/Handouts/Lecture17.pdf

Annonymous posted on Wednesday, January 11, 2006 - 11:23 am

I'm finding those diagrams a bit hard to follow - can you explain in words what the non-ignorable missing data approach is, and what the MPlus code would look like?

william ryan posted on Tuesday, February 07, 2006 - 10:13 am

Hi, out there:

Full information maximum likelihood and multiple imputation are clearly superior to other ad hoc approaches. I am debating which one to use for modeling my path analyses. Does anyone know if MI has a clear advantage over FIML?

bmuthen posted on Tuesday, February 07, 2006 - 6:24 pm

The approaches should give about the same results. I have heard Joe Schafer say that if you can do FIML, do it. - MI is mostly intended for when it is too hard to do FIML.

Antonio A. Morgan-Lopez posted on Wednesday, March 15, 2006 - 2:53 pm

I wanted to get a sense for whether or not there is a mathematical and/or conceptual relationship between three approaches to the modeling of non-ignorable missingness - the first two are: a) MI where the missing data pattern indicators are included (along with the variables of interest) in the imputation model but only the variables of interest are included in the analysis model (Schafer, 2003, Stat. Neerlandica) and b) FIML with auxiliary variables where the missingness indicators are additional outcomes predicted by the IV(s) of interest (along with the DVs of substantive interest) with residual correlations between the missingness indicator(s) and the substantive DVs (Graham, 2003, SEM).

I came across Schafer's (2003) suggestion on a simple approach to pattern mixture modeling where he says in contrast to traditional PMMS "......this process of averaging the results across patterns may be carried out by MI. Suppose that we generate imputations Y1mis.....YMmis under a pattern mixture model. Once these imputations exist, we may forget about "R" (the missing data pattern indicators) and use the imputed datasets to estimate the parameters of P(Ycom) directly."(bottom of p.27) (link to the paper on Schafer's site is @

http://www.stat.psu.edu/~jls/reprints/schafer_2003_neerlandica.pdf)

Using R in the imputation model and throwing it out in the analysis model sounded very much like using R as a special-case auxiliary variable a la Graham (2003). In Graham (2003), Collins et al., (2001, Psych. Methods) and elsewhere, the equivalence between MI with auxiliary variables and FIML with auxiliary variables is either discussed or illustrated. But one of the key models that is suggested by Graham (2003) (the correlated residuals model described above) looked very similar to a third model (Muthen/Jo/Brown 03 JASA - specifically the model on page 6 of your lecture17.pdf) except for two things: a) mixtures of longitudinal trajectories (which is not an important difference per se) and b) latent missing data classes (e.g., CU in the diagram) that are correlated with (or at least account for differences in conditional means on) the growth parameters. Now to my real question - assuming the same model structure of interest across the two approaches (e.g., single-population LGM), is it safe or reasonable to say that Graham's (2003) model is a special case of your "CU" JASA model where missing data pattern class is "known" (or at least captured with observed measure(s) of missing data class)?

Bengt O. Muthen posted on Friday, March 17, 2006 - 7:12 am

I like the Schafer and Graham (2002) Psych Methods paper and their discussion of MI and FIML. Consider cases where you have variables (Z, say) that relate to missing data and that don't belong in your model of interest for x and y. With MI you would use z in the imputation model but not in the analysis model. With FIML you would use z as extra y variables that are freely related to y and x.

The modeling with the missing data indicators (u say as in Lecture 17) is different. If you have MAR, modeling the u's in an unrestricted way in addition to x and y gives the same ML results as analyzing x and y only (ignorability of missingness). Modeling the u's aims to handle non-ignorability. Lecture 17 suggests several possible alternatives for doing such u modeling. Page 6 that you point to tries to simplify the u structure. This relates to pattern-mixture modeling where you have to use all missingness patterns as covariates. The pattern-mixture model essentially corresponds to a latent class model (the model with cu) that has as many classes as there are missing data patterns. With a latent class model for u, you essentially reduce the number of patterns to the number of classes. You can combine the u modeling idea of Lecture 17 with the z modeling idea above.

Antonio A. Morgan-Lopez posted on Monday, March 20, 2006 - 12:01 pm

Thanks so much for your response Bengt; it was very helpful. I had an additional question on u modeling in general and cu modeling in particular. Other approaches to NMAR have a mechanism for (weighted) averaging of parms and se's across the missingness patterns such as hand-calculation, equality constraints (e.g., Allison 87, MKH 87), combining via matrix manipulation (HG 97) or the multiple imputation approach to NMAR that Schafer discusses in the .pdf linked above. For CU modeling of NMAR, it seems like constraining the estimates to get a weighted average of the covariate>growth parameter effects (i.e., X>I, X>S) is no problem (of course, modeling X>I and X>S only in the %overall% part of the model is less code to do the same thing). But it also seems like if one is interested in getting a weighted averaged estimate of the growth parameter intercepts (GPIs) (across all the latent missing data groups), ( E[ I | X, CU] and E[ S | X, CU] ), you may not be able to estimate them directly in the analysis because if you constrain the GP intercepts to equality, the problem reduces back to an MAR solution - it seems like constraining the GPIs in each CU class to equality eliminates the relation between the growth parameters and CU which seems like the very part of the model that handles non-ignorability. But if you allow the GPIs to vary across CU, you do not get a single (weighted averaged) estimate. Is my understanding of this off-base? If so, any additional guidance you could provide would definitely be appreciated. If this is not off-base, then would you recommend hand-calculation of the weighted average if one was interested in inferences on the GPIs?

Bengt O. Muthen posted on Tuesday, March 21, 2006 - 7:49 am

I think your understanding is correct. You don't want to hold these parameters equal across classes, and this does lead to the problem of how one presents the results mixing over classes. I don't think this is resolved, but needs research. On the other hand, with a cu approach you have fewer patterns (number of classes) and therefore perhaps you are interested in presenting the results for each class by itself without weighting (mixing) them together - the classes may be so fundamentally different that you would rather treat them separately.

anna kryzicek posted on Thursday, April 20, 2006 - 12:03 pm

I was wondering what missing data strategy you would recommend for a small longitudinal SEM model? More specifically, I ran a SEM model in which there were 58 subjects at the first time point and only 50 subjects at the second time point (i.e. 8 subjects had missing data). I ran the model two ways (1) with listwise cases deleted and (2) with the means in the place of the missing data. Both models fit the data almost equally well and the same paths were significant in both models. Is the listwise strategy more rigorous than running the model with the means? Does this depend on the percentage of the sample that is missing data? Should I run the model another way?

Linda K. Muthen posted on Thursday, April 20, 2006 - 12:10 pm

What happens if you use TYPE=MISSING in the ANALYSIS command? I think this would be far better than using the means.

anna kryzicek posted on Thursday, April 20, 2006 - 1:10 pm

That works, my missing data are computed. Do you recommend using EM or regression imputation?

Bengt O. Muthen posted on Thursday, April 20, 2006 - 1:59 pm

Mplus uses the EM algorithm for ML estimation under the "MAR" assumption; see the Little & Rubin missing data book. In this approach, missing data are not imputed, but parameters of the model are estimated directly using all available data.

anna kryzicek posted on Thursday, April 20, 2006 - 2:33 pm

Is it alright to use the EM algorithm when your data are non-normal?

Bengt O. Muthen posted on Thursday, April 20, 2006 - 3:08 pm

There is little theory on ML under MAR with non-normal outcomes. I think the results are still better than listwise deletion. See also Mplus Web Note#2 posted on our web site.

Christina Gibson-Davis posted on Friday, April 28, 2006 - 7:45 am

Hello,
In Stata, I created a data set that has several multiply imputed data sets. When I try to read this data set into MPLUS, however, I get the same two error messages repeated until the program finally aborts:

-Errors for replication with data file [and then it lists a bunch of numbers].

-*** ERROR in Data command
The file specified for the FILE option cannot be found. Check that this
file exists: [and then again, a bunch of numbers].

As far as I can tell, the Stata file contains 5 multiply imputed data sets, but can you tell from the above message if this is a problem with the data in Stata or in MPlus?

Thank you,
Christina

Linda K. Muthen posted on Friday, April 28, 2006 - 8:14 am

The message means that the file you have named using the FILE option cannot be found. Perhaps you have misspelled it or it has an extra extension that you are not aware of. If the file contains 5 data sets, you need to separate them if you plan to use the IMPUTATION option of Mplus. If you have further questions on this topic, please send them along with your license number to support@statmodel.com.
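
Separating a stacked file into one file per imputed data set, plus a list file naming them, can be sketched as below. This assumes the data have already been exported from Stata as rows of text values with one column holding the imputation number, and that imputation number 0 marks the original incomplete data, as in some Stata formats. The function name, the column position, and the file names are all hypothetical. The list file is what the FILE option would then point to when the DATA command uses TYPE = IMPUTATION.

```python
import os


def write_imputation_files(rows, m_col, out_dir, stem="imp", drop_zero=True):
    """Split stacked imputed data into one file per imputation.

    rows:      list of rows, each a list of strings
    m_col:     index of the column holding the imputation number
    out_dir:   directory to write into
    drop_zero: skip rows with imputation number 0 (assumed to be the
               original, incomplete data)
    Returns (list_file_name, data_file_names).
    """
    by_m = {}
    for row in rows:
        m = int(row[m_col])
        if drop_zero and m == 0:
            continue
        # Drop the imputation-number column from the written data
        values = [v for i, v in enumerate(row) if i != m_col]
        by_m.setdefault(m, []).append(values)

    names = []
    for m in sorted(by_m):
        name = f"{stem}{m}.dat"
        with open(os.path.join(out_dir, name), "w") as fh:
            for values in by_m[m]:
                fh.write(" ".join(values) + "\n")
        names.append(name)

    # The list file contains one data file name per line
    list_name = f"{stem}list.dat"
    with open(os.path.join(out_dir, list_name), "w") as fh:
        fh.write("\n".join(names) + "\n")
    return list_name, names
```

If the FILE option instead points at the stacked data itself, each data line is read as a file name, which would be consistent with error messages that list "a bunch of numbers".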

Sally Czaja posted on Friday, May 12, 2006 - 11:45 am

In the intro to the Missing Data Modeling Discussion board, there's a reference to a paper I can't find:
"Non-ignorable missing data modeling is possible using maximum likelihood where categorical outcomes represent indicators of missingness and where missingness may be influenced by continuous and categorical latent variables (Muthén et al., 2003)."
Can you provide a link or more information?

Bengt O. Muthen posted on Friday, May 12, 2006 - 6:58 pm

That is the JASA article which you find on our web site under References.

Andy Ross posted on Wednesday, June 21, 2006 - 7:09 am

Dear Prof. Muthen

I am attempting to run a MIMIC LCA model with missingness on the covariates.

A colleague of mine recommended an alternative to including the x's in the model by mentioning their variances, which would require integration to estimate the model: create a new variable with mean zero and small variance, and give each case a random value on it. Then regress all the covariates on this random variable. The covariates are then no longer independent variables in the model and can have missing values.

The syntax for this model is as follows (rg is the new, random variable)

Data: file = c:\soton\ncdmis2.dat;

Variable: names = sx sc2 sc3 me ma ha br pv cd ep pa sm ex
pt re kd em hq rg;
classes = c (4);
categorical = pt re kd hq;
nominal = em;
missing are all(99);

Analysis: type = mixture missing;
starts (0);

Model: %overall%
c#1-c#3 on sx-ex;
sx sc2 sc3 me ma ha br pv cd ep pa sm ex on rg;

%c#1%
[pt$1*-0.688 pt$2*0.243 re$1*2.218 re$2*15];
[kd$1*3.054 kd$2*15];
[em#1*3.121 em#2*-0.819 em#3*1.908];
[hq$1*-3.320 hq$2*-0.432 hq$3*0.287];

%c#2%
[pt$1*-1.408 pt$2*-0.572 re$1*-1.164 re$2*3.578];
[kd$1*-3.190 kd$2*0.464];
[em#1*0.814 em#2*0.523 em#3*0.879];
[hq$1*-0.499 hq$2*2.253 hq$3*3.224];

%c#3%
[pt$1*3.985 pt$2*4.999 re$1*-3.658 re$2*-0.867];
[kd$1*3.799 kd$2*6.832];
[em#1*0.943 em#2*-2.388 em#3*-3.821];
[hq$1*-1.205 hq$2*0.686 hq$3*1.473];

%c#4%
[pt$1*-4.116 pt$2*-2.653 re$1*2.708 re$2*7.775];
[kd$1*-3.485 kd$2*1.518];
[em#1*3.180 em#2*2.270 em#3*1.908];
[hq$1*-2.777 hq$2*0.057 hq$3*0.848];

Output: tech1 tech8 modindices;

However for some reason, whilst this works for my colleague, it does not for me - the outcome is that integration is still requested.

Could you please answer me two questions?

Firstly, what do you think of my colleague's suggestion? Does it sound feasible, at least in principle?

Secondly, is there any indication in the syntax why the solution is not working here?

Many thanks

Andy

Linda K. Muthen posted on Wednesday, June 21, 2006 - 8:12 am

My initial reaction is that this is not a good idea. I would have to hear why your colleagues think it is a good idea to say more.

In your case, I would use multiple imputation. You can use the NORM program to generate imputed data sets and analyze them in Mplus using the IMPUTATION option.

Problems with the Mplus syntax should be sent to support@statmodel.com. Please include the input, data, output, and license number.

Andy Ross posted on Wednesday, June 21, 2006 - 8:37 am

Dear Prof. Muthen

Many thanks for your speedy response - I will be certain to pass your thoughts on to my colleague.

Multiple imputation has been our method of choice so far, however the problem is we now want to save the probabilities in a data file, which is something you cannot do when working with multiple datasets.

Can I ask, are you suggesting that in our case FIML wouldn't really be an option, i.e. that we should use MI and accept that we will not be able to save the probabilities?

With many thanks

Andy

Linda K. Muthen posted on Wednesday, June 21, 2006 - 10:45 am

If you have more than two or three covariates with missing data, it is impractical to bring the covariates into the model because the computational burden of numerical integration would be heavy. If only two or three of your covariates have missing data, then FIML should be fine. You should study the missing data in your covariates. Perhaps there are some with very little missing data such that you could allow the listwise deletion on those and bring the others into the model.

Andy Ross posted on Thursday, June 22, 2006 - 7:32 am

Many thanks again.

I tried running the model again, mentioning the variance of the three variables which had the greatest missingness as suggested.

The model requested that i use ALGORITHM=INTEGRATION method of estimation. However when including this term under the analysis command I got the following error message:

*** FATAL ERROR
THIS MODEL CAN BE DONE ONLY WITH MONTECARLO INTEGRATION.

Is this to be expected? Could you please give me some indication of how i should set up the estimation?

Many thanks

Andy

Linda K. Muthen posted on Thursday, June 22, 2006 - 7:59 am

Add INTEGRATION=MONTECARLO to the ANALYSIS command.

Andy Ross posted on Thursday, June 22, 2006 - 8:21 am

I tried this however i get the following error message:

*** WARNING in Analysis command
The INTEGRATION option is not available with this analysis.
INTEGRATION will be ignored.

and it then goes on to say, as before:

*** WARNING in Model command
This latent class regression requires numerical integration. Add
ALGORITHM=INTEGRATION to the ANALYSIS command.
Problem with: C#1 ON EX

etc.

Linda K. Muthen posted on Thursday, June 22, 2006 - 8:57 am

It sounds like you need to send your input, data, output, and license number to support@statmodel.com so we can see the whole picture.

Julie Hall posted on Thursday, June 22, 2006 - 9:51 am

Hello,
I would like to use FIML to accommodate my missing data. Is that possible with WLSMV?
Thanks in advance!

Linda K. Muthen posted on Thursday, June 22, 2006 - 10:08 am

Missing data estimation using FIML is available for categorical outcomes by using the maximum likelihood estimator. Missing data estimation is also available using the weighted least squares estimator.

Julie Hall posted on Thursday, June 22, 2006 - 10:30 am

Thanks so much. How does Mplus deal with missing data when using WLSMV?

Linda K. Muthen posted on Thursday, June 22, 2006 - 12:32 pm

Pairwise present if there are no covariates. Missing as function of the covariates if there are covariates.

Jennifer Tom posted on Thursday, July 13, 2006 - 3:50 pm

I am confused about how M-plus handles missing data. When I type the following into M-plus:
/*
DATA: FILE IS fulljoin.txt;

VARIABLE: NAMES ARE t1-t10 y1-y80 p1-p10 t11-t20 m1-m20;
TSCORES = t1-t10;
MISSING=ALL (999);
USEVARIABLES ARE t1-t10 p1-p10;
ANALYSIS: TYPE=MISSING;

MODEL: i s | p1-p10 AT t1-t10;

OUTPUT: SAMP;
*/

I receive the means for each of the variables p1 - p10. However, only the variables that do not have any missing data in them, match up with the means calculated in excel. I am certain that the missing values are set to 999 in MPlus file. Thank you for your help.

Linda K. Muthen posted on Friday, July 14, 2006 - 10:07 am

If you do not specify TYPE=MISSING; in the ANALYSIS command, Mplus uses listwise deletion of any observation with a missing value on one or more of the analysis variables.
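
To make the distinction in this reply concrete, here is a minimal Python sketch contrasting listwise-deletion means with the per-variable means a spreadsheet would give. The data are made up, only the missing code 999 is taken from the post, and nothing here reproduces Mplus output.

```python
# Hypothetical toy data; 999 is the missing-value code used in the post above.
MISSING = 999
rows = [
    [1.0, 2.0],
    [2.0, MISSING],
    [3.0, 6.0],
    [4.0, MISSING],
]

def column_means(data):
    """Mean of each column, skipping cells equal to the missing code."""
    means = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] != MISSING]
        means.append(sum(observed) / len(observed))
    return means

# Available-case means: each column uses every value observed in that column,
# which is what averaging the non-missing cells in a spreadsheet gives.
available_case = column_means(rows)

# Listwise deletion: drop any row with at least one missing value first.
complete_rows = [row for row in rows if MISSING not in row]
listwise = column_means(complete_rows)

print(available_case)  # [2.5, 4.0]
print(listwise)        # [2.0, 4.0]
```

Note that under listwise deletion even the fully observed first column changes, because rows are dropped for missingness on the second column.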

peter kane posted on Tuesday, July 18, 2006 - 1:12 pm

question about missing data.

i am analyzing some longitudinal data in a cross-lag model and have about 15% of subjects missing data on one variable at the first time point. these same subjects are missing subsequent time points for this specific variable. essentially, for this 15% of the sample, there is no data on this one variable. however, these missing subjects have observations on other variables.

my question is whether i should delete the subjects who are missing this variable, or conduct the analysis on the entire sample by employing a missing data estimation procedure such as FIML? i guess i am not sure if the data is "missing at random". thank you very much for your ideas/suggestions.

Linda K. Muthen posted on Tuesday, July 18, 2006 - 3:59 pm

I would use missing data estimation even if the data are not missing at random if it meant losing 15 percent of the sample. You might want to do the analysis both ways and see if it affects the interpretation of the results.

Orla Mc Bride posted on Thursday, July 27, 2006 - 8:54 am

Dear Bengt,

I am conducting some analyses using data from NESARC. In a recent article (Grant et al., 2006) analyses were conducted using a sample of past year drinkers (n = 26946). I hope you could answer a query that I have.

I am aware that you have conducted analyses on this dataset and am interested to know how you and your colleagues dealt with missing data among this sample. I have read in the literature that listwise deletion of missing data is quite popular. I am aware however that the NESARC dataset contains a weighting variable. I have read on the MPlus discussion board that deleting missing data can have an adverse effect on the weighting variable. I want to use the weighting variable in my analyses and I am therefore reluctant to delete cases with missing data.

In an attempt to overcome this problem, I have included the following commands in the input:

Variable:
Missing are all (-9);

Type:
Complex mixture missing;

However, I am aware that other people have used the algorithm command in their analyses. Is this an appropriate solution to the issue of missing data? If not, what command(s) would you suggest I use/change in my analyses?

Thank you for your time.

Bengt O. Muthen posted on Thursday, July 27, 2006 - 5:05 pm

By using Type=MISSING, you are not doing listwise deletion but using the standard MAR approach. I would advise against listwise deletion.

Orla Mc Bride posted on Sunday, July 30, 2006 - 11:47 am

I would like to extend on a query from my previous post (July 27 2006). I have missing values for approximately 4% of my data. I am considering recoding my dataset from values of 'yes' to 'criteria present' and values of 'no' or 'missing' to 'all other responses'.

Do you think that I could statistically defend this treatment of missing data? I am aware that treating unknowns or missing values as negative has a certain element of risk (as there may be false negatives), but given the low proportion of missing data, I am unsure as to whether this is a problem. I was wondering however if you could perhaps suggest any references or authors that may have utilised such a technique?

Thank you for your time.

Linda K. Muthen posted on Sunday, July 30, 2006 - 6:19 pm

I would treat the missing as missing and use TYPE=MISSING; I think it is dangerous to start recoding. You may want to search the literature to see if you can find anyone who advocates the approach that you suggest.

Julie Hall posted on Wednesday, August 09, 2006 - 8:55 am

I am using Mplus for my dissertation analyses and I want to make sure that I understand how my missing data will be handled. I am using WLSMV (with covariates) and my understanding is that the data will be treated as missing as a function of the covariates. Could someone explain what this means? Thanks so much!

Bengt O. Muthen posted on Wednesday, August 09, 2006 - 2:11 pm

The ML-MAR approach to missing data allows missingness to be predicted by variables that are not missing for the individual. So both y and x variables. However, with WLSMV, if missing is predicted by y variables, the results are distorted, while they are not if missing is predicted by x variables.

Antonio A. Morgan-Lopez posted on Tuesday, October 03, 2006 - 2:35 pm

Hello Bengt (I am having to send this in 2 parts...),
I wanted to follow-up with you on our discussion from March 2006 on this thread about Latent Class Pattern Mixture Models (LCPMMs, i.e., "CU" models for NMAR dropout). With a very small sample (N = 128), I have looked at a series of K-class (i.e., single-class through 4-class) LCPMMs where CU (treatment attendance classes) jointly accounts for a) three-piece linear growth in alcohol use over time across 12 weekly alc. assessments, b) observed measures of treatment attendance (i.e., missingness) from weeks 2-12 of tx (i.e., everyone "shows up" for week 1) and c) the (calendar) week of the trial when the person started treatment. BIC and entropy suggest that a 3-class model fits best and, in fact, when you mix estimates (i.e., growth parameter intercepts, tx effects for each piece) across classes (weighted averaging outside the analysis), you make a different inference than you would have made if you took the results of the 1-class model (e.g., standard LGM under ignorability - but with the missingness indicators left in the model to compare BICs). (Part 1 ends here.....)

Antonio A. Morgan-Lopez posted on Tuesday, October 03, 2006 - 2:39 pm

(Part 2 starts here....) My question is the 3-class model has 64 parameters - exactly half the number of people in the dataset, which is a dangerously low ratio of people-to-parameters (i.e., 2-to-1 - though the class-specific estimates do not look strange and I reproduce the log likelihood value multiple times with 500 starts). But 33 of those parameters (11 indicators x 3 classes) are the thresholds for the missingness (show/no-show) indicators. Lin et al (2004; Biometrics) say that for CU models for NMAR, data are MAR within each class, after conditioning on class membership - seems to me that once you condition on CU, you could ignore these missingness indicators (just as you would never need the missingness indicators in single-population models) and not be penalized for having such a low ratio because more than half the parameters in the model would not even be there if class membership were known. I wanted to get your thoughts on this and see if this was off-base........

Lin, H.Q., McCulloch, C.E., & Rosenheck, R.A. (2004) Latent pattern mixture model for informative intermittent missing data in longitudinal studies. Biometrics, 60, 295-305.

Bengt O. Muthen posted on Tuesday, October 03, 2006 - 8:11 pm

I see what you are saying, but it seems that you cannot get at CU status without estimating those thresholds, so I think they are necessary. It is an empirical question if you do better with such a 64-parameter model than not trying NMAR at all.

Antonio A. Morgan-Lopez posted on Wednesday, October 04, 2006 - 5:23 am

Thanks again Bengt. I agree that this would be a very different model w/o the thresholds. I just worried a little bit about the low ratio, especially given that this particular CU model looks better empirically than 1-class model under MAR (though I realize this is not necessarily a "test" for or against MAR). No one has brought up the ratio problem with these data and it seems like it doesn't worry you either.....

Bengt O. Muthen posted on Thursday, October 05, 2006 - 7:04 am

The ratio worries me - a simulation might indicate how much one should worry.

Antonio A. Morgan-Lopez posted on Thursday, October 05, 2006 - 1:26 pm

I conducted a small sim as part of this work (as part of a poster at Yih-ing Hser's CALDAR conference and a talk I gave Oct 2 at Bud MacCallum's brown bag @ UNC), which focused on confidence interval coverage for the mixing of the class-specific parameters in the meanstructure, using all the class-specific parameter estimates (e.g., growth parameter intercepts, treatment effects, show/no-show thresholds, variance components, etc.) from the 3-class model as population parameters with simulation N=128 (500 replications). I looked at coverage under 1-class through 4-class models, given there were 3 classes in the pop. There were two things that were encouraging for the three-class solution: 1) coverage was excellent for the 4 effects I looked at (weighted average treatment effects on the three linear pieces and the intercept at the last week of treatment), between 92-98% coverage across all replications and 2) no non-converged solutions/local maxima in any replication. Coverage was bad for 2-class and terrible for 1-class, with the majority of the confidence interval misses (relative to the pop. tx effect(s)) coming because the (class-mixed) tx effect was overestimated. 4-class is where the % of non-converged solutions was so high (even with 700 random starts in all conditions), I stopped studying anything beyond 3-class. Does this help?..

Bengt O. Muthen posted on Friday, October 06, 2006 - 7:08 pm

The 3-class results sound encouraging.
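
For readers unfamiliar with what "coverage" measures in a simulation like the one described above, here is a minimal Python sketch. It is deliberately simplified: a single normal mean stands in for the mixture-model parameters, and the true value, standard deviation, and seed are hypothetical. Only N=128 and 500 replications echo the post. It does not reproduce the LCPMM study.

```python
import math
import random

def ci_coverage(true_mean, sd, n, replications, seed=12345):
    """Share of replications whose 95% CI contains the population value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replications):
        # Draw one simulated data set from the known population.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        # Normal-theory 95% interval around the estimate.
        half_width = 1.96 * math.sqrt(var / n)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / replications

coverage = ci_coverage(true_mean=0.5, sd=1.0, n=128, replications=500)
print(f"95% CI coverage over 500 replications: {coverage:.3f}")
```

Coverage well below the nominal 95% would suggest biased estimates or standard errors that are too small.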

Orla Mc Bride posted on Thursday, November 09, 2006 - 1:15 am

I am conducting analyses using data from NESARC, a complex survey design study. My analyses are concerned with a sub-sample of respondents, which I identify in my set-up using the subpopulation command. My query concerns the coding of respondents who are not members of the sub-sample. How should they be coded?

Linda K. Muthen posted on Thursday, November 09, 2006 - 10:08 am

There is no need for any special coding for observations not in the subpopulation. This is handled internally.

Orla Mc Bride posted on Friday, November 10, 2006 - 4:57 am

Just to clarify, those respondents who are not included in my subpopulation are coded as missing in the dataset (due to 2 screener questions). I have identified these people in the set-up (missing are all -9 and type = mixture complex missing). Is this correct?

Also, in my output, should the number of observations reflect the number of respondents in my subsample or rather the entire sample?

Antonio A. Morgan-Lopez posted on Friday, February 16, 2007 - 11:56 am

Hello Bengt/Linda,

I wanted to follow-up on an Oct 2006 thread on Latent Class Pattern Mixtures (e.g., MJB, 2003) on an issue that probably comes up in any K=>2 GMM. In working with the covariance matrix of the estimates (covb) (and a Jacobian matrix of 1st-order derivatives) to generate delta method standard errors for weighted-averaged estimates from LCPMM, I noticed that there were non-zero covariances *across* classes. I initially thought that was strange, as I was expecting covb to be block-diagonal (0s for all parameter covariances across classes). But then I wondered if these non-zero covariances were one of the places in the model where the uncertainty in class membership was reflected; in fact the two class combination (in my K=3 model) that has the largest off-diagonal in the matrix of average latent class probabilities also has the largest cross-class covariances. The other sets of cross-class covariances are 0 (or so small as to be functionally 0). Are my suspicions on-base? If not, any explanation as to why covb isn't block diagonal would be very helpful......

Bengt O. Muthen posted on Friday, February 16, 2007 - 7:46 pm

That's right - it has to do with the posterior probabilities which are spread over all classes, so due to uncertainty as you say.
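
As a rough illustration of why the cross-class covariances matter for the delta-method standard error, here is a Python sketch for a weighted average of class-specific estimates. All numbers are hypothetical, and the mixing weights are treated as fixed constants, which is a simplifying assumption: in a real mixture model the weights are themselves estimated, so the Jacobian would have more columns.

```python
import math

def delta_method_se(weights, cov):
    """SE of sum_k weights[k] * estimate[k].

    With fixed weights the Jacobian of this linear combination is the
    weight vector itself, so the variance is w' * cov * w.
    """
    k = len(weights)
    variance = sum(
        weights[i] * cov[i][j] * weights[j]
        for i in range(k)
        for j in range(k)
    )
    return math.sqrt(variance)

# Hypothetical class weights and covariance matrix of three class-specific
# treatment effects; classes 1 and 2 have a non-zero cross-class covariance.
weights = [0.5, 0.3, 0.2]
covb = [
    [0.04, 0.02, 0.00],
    [0.02, 0.09, 0.00],
    [0.00, 0.00, 0.16],
]

# Same matrix with the cross-class covariances set to zero.
block_diag = [
    [covb[i][j] if i == j else 0.0 for j in range(3)]
    for i in range(3)
]

se_full = delta_method_se(weights, covb)
se_block = delta_method_se(weights, block_diag)
print(round(se_full, 4), round(se_block, 4))
```

With a positive cross-class covariance, zeroing the off-diagonal terms understates the standard error of the mixed estimate.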

Antonio A. Morgan-Lopez posted on Friday, February 16, 2007 - 8:29 pm

Thanks so much as always; hope "retirement" is treating you well.....

Richard E. Zinbarg posted on Friday, February 16, 2007 - 9:34 pm

Dear Linda and/or Bengt,
I am analyzing data from a longitudinal study of risk for anxiety disorders and depression in 600+ high school juniors. At T1, we obtained self-reports on vulnerability measures for all subjects and tried to obtain peer-report versions of the same measures for all subjects. However, because some subjects refused to nominate peers and some peers refused to participate, we actually obtained peer-reports on roughly 50% of our subjects only. I was thinking about incorporating the missing data by using the multiple group approach to missing data. However, I have come across some references suggesting that the FIML approach to missing data is conceptually equivalent to the multiple group approach. If this is true, it certainly seems preferable to me to go the FIML route based on ease of model specification. Can you confirm that these approaches are equivalent? If not, when would you use the one and when would you use the other.
Thanks for your time!

Linda K. Muthen posted on Saturday, February 17, 2007 - 8:08 am

The FIML and multiple group approaches are the same.

Richard E. Zinbarg posted on Saturday, February 17, 2007 - 8:26 pm

thanks very much Linda!

Renee Thompson posted on Monday, March 12, 2007 - 6:20 pm

My N is 99 but when i run the following syntax, the number of observations in the output is only 70.

VARIABLE: NAMES ARE (I deleted this for brevity);
MISSING = ALL (99);
USEVAR = ad1 ad2 ad3 satbf1 percrit1
percrit2 percrit3;
ANALYSIS: TYPE = MEANSTRUCTURE MISSING;
MODEL: i s | ad1@0 ad2@1 ad3@2;
i s ON satbf1;
ad1 ON percrit1;
ad2 ON percrit2;
ad3 ON percrit3;
OUTPUT: SAMPSTAT STANDARDIZED MODINDICES (3.84)

Do you know what I am doing incorrectly?

Bengt O. Muthen posted on Thursday, March 15, 2007 - 5:14 pm

Look to see if the output says that individuals with missing on all variables or missing on x variables are deleted. If you don't see this, please send your input, output, data and license number to support@statmodel.com.

Jan Hochweber posted on Friday, March 23, 2007 - 7:45 am

I'd like to use estimated sigma within and between matrices for multilevel regression and path analysis. In part, these matrices would serve as input for multiple group analysis.

Many variables in my dataset are treated as covariates. As far as I know, covariate missingness leads to listwise deletion when using FIML.

When using the sigma matrices as input for analysis with covariate missingness,
I wonder what would be the right N for the sample/the groups. Has missingness in covariates to be taken in account to determine N for covariance matrix input?

Thanks a lot!

Linda K. Muthen posted on Friday, March 23, 2007 - 9:14 am

For the pooled-within matrix use the sample size shown in the output where you saved the pooled-within matrix minus the number of clusters. This takes into account any observations lost due to missing data. For the sigma between matrix the sample size is the number of clusters.

Lisa Liu posted on Friday, May 18, 2007 - 12:33 am

Hi, I am trying to run a two-level path analysis but am having trouble estimating the missing data. When I take out the level two data and just run it as a path analysis, the model successfully estimates the missing data. But when I add in the level two data, it stops working. This is strange because all of the missing data is level 1 data. Any suggestions? Thanks for your time!

Linda K. Muthen posted on Friday, May 18, 2007 - 6:22 am

I would need more information to help you. Please send your input, data, output, and license number to support@statmodel.com.

V X posted on Friday, August 24, 2007 - 3:35 am

1. While using "Type = imputation " option, how does Mplus generate S.E. of the estimates? Does it apply Rubin's rules? I compared the results of the same model with FIML and MI , the S.E. is quite different.

2. How can I request the output of Relative Efficiency, Relative Increase in Variance, and Fraction of Missing Information using Mplus?

Thank you.

Linda K. Muthen posted on Friday, August 24, 2007 - 7:10 am

1. We estimate standard errors for multiple imputation according to the Schafer 1997 reference listed in the user's guide. FIML and MI are asymptotically equivalent. Differences can come about with small samples.

2. These items are not currently available in Mplus.
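
Since these quantities are not printed, they can be computed by hand from the per-data-set estimates and standard errors. The Python sketch below uses the standard combining rules (Rubin, 1987). That these match the Schafer (1997) procedure cited above is my assumption, and the input numbers are hypothetical. The fraction of missing information uses the simple large-sample form without the degrees-of-freedom correction.

```python
import math

def pool_rubin(estimates, std_errors):
    """Combine one parameter's estimates across m imputed data sets."""
    m = len(estimates)
    q_bar = sum(estimates) / m                                # pooled estimate
    w = sum(se ** 2 for se in std_errors) / m                 # within variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)    # between variance
    t = w + (1 + 1 / m) * b                                   # total variance
    riv = (1 + 1 / m) * b / w        # relative increase in variance
    fmi = (1 + 1 / m) * b / t        # fraction of missing information (approx.)
    rel_eff = 1 / (1 + fmi / m)      # relative efficiency of using m imputations
    return {
        "estimate": q_bar,
        "se": math.sqrt(t),
        "riv": riv,
        "fmi_approx": fmi,
        "rel_eff": rel_eff,
    }

# Hypothetical results from m = 3 imputed data sets.
pooled = pool_rubin([1.4, 1.5, 1.6], [0.2, 0.2, 0.2])
for name, value in pooled.items():
    print(f"{name}: {value:.4f}")
```

The pooled standard error can never be smaller than the root mean square of the per-data-set standard errors, because the between-imputation variance only adds to the total.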

V X posted on Monday, August 27, 2007 - 3:46 pm

This is a follow-up question to my previous inquiry about analyzing data using multiple imputation.

I generated multiple imputed data sets (40 replications) using PROC MI in SAS. I then analyzed the data using both Mplus and PROC CALIS, taking into account that the data were generated by multiple imputation methods. Below are examples of the resulting parameter estimates with standard errors in parentheses. For comparison, I have also included estimates obtained from Mplus using full information maximum likelihood.

Analyzing data based on multiple imputation procedures (using the identical sets of data in both analyses)
estimate from Mplus: 1.47 (se = .04)
estimate from CALIS: 1.54 (se = .23)

Analyzing data without multiple imputation
estimates from Mplus (FIML): 1.52 (.24)

It is interesting to note that the standard error obtained using PROC CALIS based on MI is quite comparable to that obtained using Mplus with FIML. The result from Mplus based on MI is strikingly different. How does one account for the large discrepancy?

Linda K. Muthen posted on Monday, August 27, 2007 - 4:05 pm

I would need more information to comment on this. Given that the parameter estimates are simple averages over the replications, I wonder why they are different. Unless they are the same, I wouldn't expect the standard errors to be the same. If you send the three outputs and your license number to support@statmodel.com, I can take a look at it.

Jie Lu posted on Tuesday, September 18, 2007 - 11:17 pm

Hi,

I am trying to fit a structural model with imputed data sets generated through the ICE procedure in STATA. One of the endogenous variables is a dummy. After I fit the model, I do get the averaged CFI, TLI, and RMSEA and their respective standard deviations. However, I did not get those for the chi-square. How can I get them?

Thanks

Linda K. Muthen posted on Thursday, September 20, 2007 - 11:22 am

The current version of Mplus gives a mean and standard deviation for chi-square.

LJ posted on Sunday, September 23, 2007 - 6:35 am

Hi, Linda,

My version of Mplus is 4.21. When I changed the type of all my endogenous variables to continuous, I can get the averaged chi-square and its standard deviation. But if some endogenous variables are categorical, I cannot get the averaged chi-square and standard deviation. Is there some constraint for the WLSMV estimator?

Thanks

Linda K. Muthen posted on Sunday, September 23, 2007 - 8:15 am

With WLSMV, the chi-square test statistic and the degrees of freedom are adjusted to obtain a correct p-value. So the degrees of freedom varies across the replications and therefore we do not report its average.

V X posted on Thursday, November 22, 2007 - 11:59 am

I do not fully understand the "integration = montecarlo;" option in Mplus. Would you please provide some references so that I can better understand the mechanism and when to apply it?

Happy thanksgiving !

Linda K. Muthen posted on Friday, November 23, 2007 - 10:04 am

Monte Carlo integration can be useful with many dimensions of integration and in other special cases described in the user's guide. You can search for this in the computational statistics literature. I don't know of a particular reference offhand.

Graciela Muniz posted on Friday, February 08, 2008 - 5:41 am

Hi Bengt and Linda,
I am interested in using Mplus for fitting a growth model to a data set with missing values on the outcome variable.
I could use TYPE= RANDOM MISSING and the model would produce factor scores and other estimates under a MAR assumption
(missing data mechanism depends on observed data)
However, my question is the following. If I want to model the missing data mechanism as in Diggle and Kenward, I could use the "missing data indicator at time t ON outcome at time (t-1)" (u ON y) type of code but would still need the TYPE=MISSING bit of code to avoid listwise deletion. Am I not "overriding" the missing data code with the inclusion of TYPE= MISSING?
Should not factor scores obtained under the two model specifications (with and without the explicit missing data model) be different due to the presence of the model for the missing data mechanism as I am only including "outcome at time (t-1)" in the missing data model?
thanks,Graciela

Bengt O. Muthen posted on Friday, February 08, 2008 - 8:12 am

The alternative of using missing data indicators (u_t on y*_t-1) in the modeling (so allowing for MNAR) takes the same approach of using all available data as MAR does, so Type = Missing does not override this. Factor scores should come out different in the two approaches given that the models are different.

Hillary Schafer posted on Thursday, February 21, 2008 - 6:43 am

Hi,

I have a question about the MLR estimator, the WLSMV estimator, and missing data.

Regardless of which estimator I use, I get the following message:

"Data set contains cases with missing on all variables except
x-variables. These cases were not included in the analysis."

and only the cases that have been observed on y are included.

I was under the impression that only the cases with missing on X should be deleted from an analysis estimated with FIML so I am surprised that both MLR and WLSMV use the same number of cases.

Can you please explain why this is happening and how I might change the syntax in order to utilize the cases that have been observed on x?

Thank you.

Linda K. Muthen posted on Thursday, February 21, 2008 - 9:23 am

In a conditional model, information on x does not contribute to the estimation of the regression coefficient in the regression of y on x, and the mean and variance of x are not estimated. So an observation with only information on x is not used because it has no information to contribute.

In an unconditional model where the means, variances, and covariance between y and x are estimated, cases with information only on x are included in the analysis.

The only thing you can do to avoid this exclusion is to mention the variances of the x variables that have missing on x in the MODEL command. This will cause them to be treated as y variables. Their means, variances, and covariances will be estimated and distributional assumptions will be made about them.

Hillary Schafer posted on Thursday, February 21, 2008 - 10:42 am

Thank you for your quick reply. Would you suggest estimating the (co)variances for the x-side variables (and declaring nonnormal variables on the categorical line)? Could this cause other problems or violate assumptions?

Linda K. Muthen posted on Thursday, February 21, 2008 - 10:50 am

I would not do this. If your x variables are continuous normal, it would probably be okay and in line with multiple imputation programs. If they are categorical, it would change the model. The bottom line is that if you are interested in regression coefficients, bringing the cases with only x's into the model will not change the results.

Hillary Schafer posted on Thursday, February 21, 2008 - 10:58 am

Thank you

anonymous posted on Tuesday, March 18, 2008 - 6:54 am

Hello,
I understand that 10% covariance coverage is the default setting in Mplus. However, is there any "rule-of-thumb" regarding the proportion of data that must be present to get reliable estimates?

Linda K. Muthen posted on Thursday, March 20, 2008 - 10:51 am

I would be happy if I had no lower than 80 percent coverage. I don't know of any rule of thumb.

Antonio A. Morgan-Lopez posted on Friday, March 21, 2008 - 1:40 pm

Hi Bengt/Linda,

Have you published and/or are you aware of any articles that illustrate methods for modeling the conditional expectation of the likelihood given the data and current values of the parameter set (i.e., EM for model parameters) for conventional (single-class, single-level) SEMs? I am either coming across applications of EM a) for means and covariances (e.g., Schafer, 1997, p.163-181), b) for model parameters within the multilevel track (e.g., Lee & Poon, 1998, Stat. Sinica; Liang & Bentler, 2004, Psychometrika) or c) for model parameters in the mixture track (Muthén/Shedden, 1999). For b and c, some places are obvious where the model, within the E-step, would be modified/structured to fit a conventional model as a special case - and not-so-obvious in other places. And many texts that discuss casewise ML for MAR data for conventional SEM seem to say little about EM, N-R or any other optimization techniques (while there is plenty of talk on this for multilevel and/or mixture SEM). Any help either of you could provide on this would be greatly appreciated………

Bengt O. Muthen posted on Saturday, March 22, 2008 - 12:47 pm

EM for single-level, single-class SEM is described in a Rubin et al article on factor analysis (Rubin-Thayer?).

Antonio A. Morgan-Lopez posted on Saturday, March 22, 2008 - 1:35 pm

Thank you Bengt,

Looks like they have two papers that are relevant:

Rubin, D.B. & Thayer, D.T. (1982) EM Algorithms for Maximum Likelihood Factor Analysis. Psychometrika, 47, 69-76.

Rubin, D.B. & Thayer, D.T. (1983) More on EM for ML Factor Analysis. Psychometrika, 48, 253-257.

There was also this:

Bentler, P.M. & Tanaka. J.S. (1983) Problems with EM algorithms for ML factor analysis. Psychometrika, 48, 371-375.

Won't be able to get them until Mon (have hardcopy access but no electronic). The 2nd Rubin paper appears to be a response to the paper by Bentler & Jeff Tanaka where both groups traded concerns about susceptibility of another optimization method (N-R maybe? I'll see on Monday...) to local maxima. Thanks for pointing me in the right direction......

Richard E. Zinbarg posted on Tuesday, March 25, 2008 - 7:16 pm

Hi Linda or Bengt,
I know that a model with more parameters than subjects has identification problems (presumably related to the fact that the rank of the data matrix is limited by the number of subjects in this case) but I am not clear on how missing data impacts this. If I am fitting a model say with 200 parameters free to be estimated and 600 subjects total but have complete data from say 150 subjects (self-report is collected from all 600 but more expensive measures such as peer-report and diagnostic interview are collected from subsamples) would we have the same identification problems that we would have had if we didn't have the 450 subjects with partial data? or do the additional subjects with partial data help in this regard? Thanks for any insight you can provide!

Linda K. Muthen posted on Wednesday, March 26, 2008 - 8:25 am

I don't think this will result in identification problems. I think that you may obtain large standard errors for parameters for which you have little information to compute the H1 model.

Richard E. Zinbarg posted on Wednesday, March 26, 2008 - 10:52 am

thanks for the very quick reply Linda! I don't understand the last part about parameters for which we have little information to compute the H1 model. If you would be able to elaborate a bit I would appreciate it, as I am not sure which parameters those would be.

Linda K. Muthen posted on Wednesday, March 26, 2008 - 3:08 pm

Let's say that the sample covariance between y1 and y2 has only 10 percent coverage. Then the standard errors of any parameters that draw on that information may be large.
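
To see where a figure like 10 percent coverage comes from, here is a minimal Python sketch that computes, for each pair of variables, the proportion of cases in which both are observed. This is how I understand the Covariance Coverage output to be defined. The data set is hypothetical, and None marks a missing value.

```python
def covariance_coverage(data):
    """Proportion of cases with both variables observed, for every pair."""
    n = len(data)
    p = len(data[0])
    coverage = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            both_observed = sum(
                1 for row in data
                if row[i] is not None and row[j] is not None
            )
            coverage[i][j] = both_observed / n
    return coverage

# Hypothetical data: y1 observed for all 10 cases, y2 for only the first case.
data = [[float(k), 1.0 if k == 0 else None] for k in range(10)]

coverage = covariance_coverage(data)
print(coverage[0][0])  # 1.0 (y1 with itself: every case observed)
print(coverage[0][1])  # 0.1 (only 1 of 10 cases has both y1 and y2)
```

The diagonal gives the proportion observed for each variable, and a small off-diagonal entry flags a covariance that rests on very few cases.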

Richard E. Zinbarg posted on Wednesday, March 26, 2008 - 7:44 pm

crystal clear now, thanks Linda!

Eduardo Bernabe posted on Tuesday, May 06, 2008 - 12:12 pm

Dear Drs Muthen, this may be a very silly question but I am struggling to figure out how Mplus handles missing data (MD). I have a dataset of 8028 people, with 248 variables (both categorical and continuous) including missing data in all of them. I know that Mplus 5 takes into account MD by default, but I want to know why the number of analysed cases differs according to the number of variables in the dataset. For example, when I regress age on sex (neither has MD) the number of analysed cases is only 5189 instead of 8028. On the other hand, when I create a new dataset including only age and sex as variables the number of analysed cases is 8028. Why do these differences occur? Is there any default mechanism that I am missing?

Thanks in advance for your reply

Linda K. Muthen posted on Tuesday, May 06, 2008 - 12:46 pm

I would have to see the two full outputs and your license number at support@statmodel.com to answer that.

Yu Kyoum Kim posted on Wednesday, July 30, 2008 - 2:50 pm

Dear Drs. Muthen,

I am confused which approach Mplus uses to handle the missing data. Especially default setting (TYPE = MISSING & H1 as the default ).

Would you mind elaborating the approach used in default setting. I am not expecting statistical instruction but I want to know exact approach which Mplus uses.

Thank you,

Linda K. Muthen posted on Wednesday, July 30, 2008 - 4:11 pm

Chapter 1 of the user's guide has a description of missing data handling in Mplus along with a reference.

Yu Kyoum Kim posted on Thursday, July 31, 2008 - 7:53 am

Dear Dr. Muthen,

Chapter 1 of the user's guide says this.

"Mplus provides maximum likelihood estimation under MCAR
(missing completely at random) and MAR (missing at random; Little &
Rubin, 2002) for continuous, censored, binary, ordered categorical
(ordinal), unordered categorical (nominal), counts, or combinations of
these variable types"

Does this mean Mplus used the method Rubin (2002) recommended?

This might be due to my lack of knowledge.
Could you please specify the method you used?
Can I just say Mplus used "special" ML to handle missing data?

Please let me know.

Thank you so much for your help in advance.

Linda K. Muthen posted on Friday, August 01, 2008 - 9:52 am

I am not familiar with Rubin (2002).

Just quote the user's guide if you cannot paraphrase it.

Nikolai Eton posted on Wednesday, October 22, 2008 - 8:40 am

Dear Linda and Bengt,

I am interested in LGC with multiple indicators and multilevel growth mixture models. As my dataset is "different" in a way, I would like to hear your advice on how to handle missing data in it using Mplus.

The dataset consists of several variables (cont. & cat.) across 5 timepoints on the item level (every variable shows up 5 times) and is hierarchical with individuals nested in groups. The whole dataset is related to one country. The special thing about the dataset is that there is a high fluctuation across groups (that I can control), but also across countries (that I can't control). Thus, individuals sometimes change groups. In addition to this, sometimes they leave the country (and probably return later), which appears as a missing value during the time of absence.

Overall, I have about 1000 individuals nested in about 25 groups with all 5 timepoints available for 106, 4 timepoints for 87, 3 timepoints for 165, 2 timepoints for 220 and 1 timepoint for 442 individuals. I do not necessarily want to explain the absence.

What is your recommendation on taking care of missing values in here?

Thank you very much for your help.

Nikolai Eton posted on Wednesday, October 22, 2008 - 8:58 am

One add-on to my question:
the minimum covariance coverage is .113 (although that var has about 70% missy data information). Thus, basic analysis results in this error message:

THE COVARIANCE COVERAGE FALLS BELOW THE SPECIFIED LIMIT.

Bengt O. Muthen posted on Thursday, October 23, 2008 - 9:41 am

I think there are two separate issues here.

If you use group as a level 2, you are treating group as a random mode of variation. In this case, changing group membership implies the need to use a "crossed random effects" approach (see the multilevel lit.), which Mplus currently cannot do.

The leaving the country at some time points is a missing data question which probably is handled fine by the standard ML MAR approach of Mplus. But you want to pay attention to coverage that is lower than the Mplus default limit of 0.10 (which can be altered) - that may already be too low (just put there to prevent convergence problems) for seriously relying on the results. It depends on where the low coverage occurs. If it is for a covariance between say the first and last time point, but coverage is otherwise high, then that is not so problematic because you typically don't have a growth model parameter corresponding to the covariance between first and last. A problem would be if low coverage happens for a variable (the diagonal of the coverage report) or for variables at close time points.
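
As an illustration of the coverage check described above, here is a minimal sketch in plain Python (not Mplus code). It computes the proportion of cases with both variables observed for every pair of variables, with the diagonal giving the proportion observed per variable, and flags pairs below a chosen limit. The function names, the toy data, and the limit of 0.25 used in the example are hypothetical and chosen only for illustration.

```python
def covariance_coverage(rows):
    """Proportion of cases with both variables observed, for every pair.

    rows: list of equal-length lists; None marks a missing value.
    The diagonal holds the proportion observed for each variable.
    """
    n = len(rows)
    p = len(rows[0])
    coverage = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            both = sum(1 for r in rows
                       if r[j] is not None and r[k] is not None)
            coverage[j][k] = both / n
    return coverage


def low_coverage_pairs(coverage, limit=0.10):
    """List (j, k, value) for j <= k where coverage falls below the limit."""
    p = len(coverage)
    return [(j, k, coverage[j][k])
            for j in range(p) for k in range(j, p)
            if coverage[j][k] < limit]


# Hypothetical data: 5 cases measured at 4 time points, None = missing.
rows = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 2.5, None, None],
    [0.5, None, None, None],
    [2.0, 2.2, 2.9, None],
    [1.1, None, 3.1, 4.2],
]

coverage = covariance_coverage(rows)
flagged = low_coverage_pairs(coverage, limit=0.25)

for line in coverage:
    print(["%.2f" % v for v in line])
print("pairs below limit:", flagged)
```

Reading the result row by row shows where low coverage sits: on the diagonal (a variable that is rarely observed), between adjacent time points, or only between distant time points, which is the distinction drawn in the post above.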

Jeff Cookston posted on Friday, October 24, 2008 - 10:06 am

We have a data set of 300 adolescents who were sampled at three waves in a cohort sequential design. It's a typical longitudinal data set and we have a reasonable amount of missingness.

We did a series of unconditional and conditional LGM models to describe and predict constructs and concluded our paper with a conditional parallel process model between two constructs. To maximize our power and sample, we included cases that had data at two time points and used the missing estimator (our rationale being that two points provides a line if not a curve). When we looked at our data everything made sense and we wrote up and submitted our manuscript.

Upon receiving our reviews to this and two similar papers, we received consistent critiques that I'm hoping you can help me clarify for the reviewers. I'm writing today to ask if you can direct me to literature that address these issues.

1) Reviewers were concerned that latent growth curve models cannot be properly identified or stably fit with 3 (or fewer) time points. Is there evidence that the models are trustworthy?

2) Is our rationale to keep cases with at least two time points and use the missing estimator something we can justify?

3) For some of our estimates, the size of the estimate is very small (e.g., slope = .008). Although significant, how are small estimates to be interpreted?

Thanks for your time.

Bengt O. Muthen posted on Saturday, October 25, 2008 - 12:12 pm

1) Typically, at least 4 time points are desirable for good growth modeling. With only 3 time points, there are several model misspecification risks that cannot be countered due to having too few time points to identify more flexible models. This is discussed in our Mplus Short Courses, Topic 3 (see videos and handouts on our web site). Still, many published studies have used only 3 time points. If all individuals have only 2 time points, only a very limited growth model is possible.

2) It sounds like you have 3 time points for a majority of individuals and 2 for some. The percentages of each should be given. And, in fact, you could have included individuals with only 1 time point. This is what ML estimation under the "MAR" assumption of missing data theory (see the Little & Rubin book) would do. If a majority has 3 time points, I don't see a serious problem with this approach.

3) I think you are talking about a slope mean. The size of this depends on the time scores. The real question is what the implied change in mean is for the outcome from one time point to the next. You find that in the Mplus output when requesting RESIDUAL.
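
To make point 3 concrete, here is a small sketch in plain Python, assuming an unconditional linear growth model in which the model-implied outcome mean at each occasion is the intercept mean plus the slope mean times the time score. The intercept mean of 2.0 and both sets of time scores are hypothetical; only the slope mean of .008 comes from the question.

```python
def implied_means(intercept_mean, slope_mean, time_scores):
    """Model-implied outcome means for a linear growth model (no covariates)."""
    return [intercept_mean + slope_mean * t for t in time_scores]


def mean_changes(means):
    """Change in the implied mean from each time point to the next."""
    return [later - earlier for earlier, later in zip(means, means[1:])]


slope_mean = 0.008       # value quoted in the question
intercept_mean = 2.0     # hypothetical

# Same slope mean, two hypothetical codings of time for three waves.
waves = implied_means(intercept_mean, slope_mean, [0, 1, 2])
months = implied_means(intercept_mean, slope_mean, [0, 12, 24])

print("time scores 0, 1, 2  :", mean_changes(waves))
print("time scores 0, 12, 24:", mean_changes(months))
```

The same slope mean implies a change per occasion that is twelve times larger under the second coding, which is why the slope value cannot be judged without the time scores.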

Hemant Kher posted on Thursday, October 30, 2008 - 2:53 pm

Dr. Muthen -- Greetings,

I am working on fitting growth models to survey data and have a question about missing data.

There were 233 students in the sample. We collected data at 4 equally spaced time points. With regards to our key variables used to fit growth models, here is the breakdown of how often students provided data:

107 students provided data at all 4 time points (46%)
64 students provided data at 3 points (27%)
36 students provided data at 2 points (15%)
23 students provided data at 1 point (10%)
3 students did not provide any data (1%)

I have read somewhere that for growth models, we need at least 1 time point -- but I am not sure if having close to 10% people that provided only 1 observation will affect our growth models.

Bengt O. Muthen posted on Friday, October 31, 2008 - 9:49 am

Assume that you have a linear growth model. The most important factor is how many individuals have at least 3 time points because that's how many you need to identify all the parameters. The individuals with fewer time points also contribute to the estimation of some parameters so they are helpful to include. Of those who have at least 3 time points you also want to know how representative they are of the whole group - a simple thing to check is if the mean of the outcome at the first time point is significantly different across the 4 missing data groups you list. If different, you may consider "pattern-mixture modeling".

Michael Strambler posted on Friday, December 19, 2008 - 6:57 pm

In an earlier posting, it was mentioned that FIML was available for categorical outcomes. However, whenever I have tried this I get a warning stating, "Data set contains cases with missing on x-variables. These cases were not included in the analysis." This has been the case when I have run logistic regression analyses and when I have run SEM models with binary indicators of latent variables. Can you clear this up for me? Is there a way I can get Mplus to use FIML with such analyses?

Linda K. Muthen posted on Sunday, December 21, 2008 - 6:40 am

A regression model is estimated conditioned on the observed exogenous variables. Means, variances, and covariances of these variables are not part of the regression model. Missing data theory applies to observed endogenous variables. You can include the observed exogenous variables in the model by mentioning their variances in the MODEL command. By doing this they are treated as endogenous variables and distributional assumptions are made about them.

Michael Strambler posted on Wednesday, December 24, 2008 - 9:08 am

Thank you for the helpful response. From looking at the manual I am not clear on the code for mentioning the variances in the MODEL command. Can you provide a little more detail on this?

Linda K. Muthen posted on Friday, December 26, 2008 - 2:25 pm

See variances in the index of the Mplus User's Guide. It points to page 524 where how to refer to variances is explained.

Michael Karcher posted on Tuesday, February 03, 2009 - 2:37 pm

Hello,

I have a missing data question that I was hoping someone could answer. We developed a ten factor measure of connectedness, with one factor measuring connectedness to siblings. As expected, some of our subjects do not have siblings and thus their data is missing appropriately. To avoid losing those subjects on the other factors, I estimated the missing data using a multiple imputation procedure and followed it with an invariance analysis that compared sibling and non-sibling samples across the factor loadings, intercepts, residuals, and covariance matrix.

My first question is do you find this analytic approach appropriate? As I expected, the results were nearly identical across the two samples, both when testing the ten factor model and single factor model (i.e., sibling connectedness scale). The only caveat is that the sibling connectedness results should only generalize to subjects with siblings.

My second question is whether mixture modeling with known classes is a better approach to answer this question. If my understanding of mixture modeling is correct, I would draw the same conclusion. Am I correct?

Thanks for your time and consideration.

Linda K. Muthen posted on Tuesday, February 03, 2009 - 5:44 pm

I would look at subjects with and without siblings separately. Then if you want to compare them, do so on the factors that are not about sibling connectedness. Imputing siblings for those without seems a little iffy.

Mixture modeling with only a known class variable is the same as multiple group analysis.

anonymous posted on Thursday, February 26, 2009 - 3:52 pm

Hello,
I am trying to determine how to approach a missing data issue. I have ratings of depression severity across time for about N=400. The timing of observations varies across individuals, so I plan to nest time points within individuals. One problem, however, is that the number of data points also varies across individuals. For instance, the number of data points for the sample ranges from 1-16, with a mean of 8 data points, SD=2.8, and variance=7.7. I am not certain what would be the best way to approach this. For example, would it be best to include only the first 8 time-points for the analysis? Any thoughts would be very much appreciated.

Bengt O. Muthen posted on Thursday, February 26, 2009 - 6:53 pm

I would include all the data. The varying timings are handled by the AT option of the growth language (using |) and the varying number of time points can be handled either by

(1) using a single-level, wide approach letting the observation vector be of length 16 and using a missing data symbol for time points not available

or

(2) using a two-level (time points within individuals), long approach with a univariate outcome, where the different number of time points per individual is merely resulting in different cluster sizes and is therefore inconsequential.
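
The two layouts can be sketched in plain Python (not Mplus code) as follows. The wide form keeps one record per individual, padded to the maximum number of occasions with a missing marker; the long form keeps one record per observed occasion, so individuals simply end up with different numbers of rows. The function name, the tuple layout, and the toy data (padded to length 4 rather than 16) are hypothetical.

```python
def wide_to_long(wide_rows):
    """Convert wide records to long records, dropping missing occasions.

    wide_rows: list of (person_id, outcomes, times), where outcomes and
    times are equal-length lists and None marks an unavailable occasion.
    Returns a list of (person_id, time, outcome) tuples.
    """
    long_rows = []
    for person_id, outcomes, times in wide_rows:
        for y, t in zip(outcomes, times):
            if y is not None:
                long_rows.append((person_id, t, y))
    return long_rows


# Hypothetical wide data: individually varying times, None = missing.
wide = [
    (1, [10.0, 12.0, 9.0, 8.0], [0.0, 1.5, 3.0, 6.2]),
    (2, [14.0, 13.0, None, None], [0.0, 2.1, None, None]),
    (3, [7.0, None, None, None], [0.0, None, None, None]),
]

long = wide_to_long(wide)

# Cluster size per individual in the long layout.
cluster_sizes = {}
for person_id, _, _ in long:
    cluster_sizes[person_id] = cluster_sizes.get(person_id, 0) + 1

print(long)
print(cluster_sizes)
```

In the long layout the unequal number of occasions shows up only as unequal cluster sizes, which matches the description in option (2).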

anonymous posted on Friday, February 27, 2009 - 10:14 am

Hi Dr. Muthen,
I have attempted a LGMM using the first option, however the model does not converge. Do you think that convergence would be more feasible with the second option?
Thanks very much for your help!

Bengt O. Muthen posted on Friday, February 27, 2009 - 4:00 pm

I am not sure. It would depend on the reason for non-convergence. You would have to contact support@statmodel.com to have this diagnosed.

anonymous posted on Saturday, February 28, 2009 - 2:46 pm

The error I obtain is the following:
*** ERROR
One or more variables have a variance of zero.
Check your data and format statement.

There is one variable with only 3 subjects and the variance is 0.027. However, Mplus indicates that the variance is 0.000 for this variable. Is this possible or is the data file incorrect?
Many thanks!

Linda K. Muthen posted on Saturday, February 28, 2009 - 3:40 pm

This cannot be answered without seeing your input, data, output, and license number at support@statmodel.com.

anonymous posted on Monday, March 02, 2009 - 7:15 am

Dr. Muthen,
Many thanks. I will send you the input, data, and output.

I know that Mplus does not generate graphs when the type=random analysis is used to account for individually-varying times of observation. I'm guessing this is also true if type=random is used in the LGMM framework, correct?

Thanks very much for your assistance.

David Thomson posted on Monday, March 02, 2009 - 4:13 pm

I'm using the ECLS-B database from NCES. There are about 12 weights to be applied to various variable sets. I think I've identified the correct weight for my variables and should receive confirmation from NCES soon.

However, even though the output in SPSS shows I have a variable weighted, my Mplus output shows only something like 51 cases were analyzed. There are missing weight values for some cases, predictably. And yet, I'm told I cannot impute any value, neither a 1 or 0 for example, in Mplus.

Do you have any transformation or filtering suggestions, so that I can do a CFA with a larger sample?

Linda K. Muthen posted on Tuesday, March 03, 2009 - 1:41 pm

In my understanding, missing is not allowed for a weight variable. You need to check with NCES to determine which weight variable you should use. It should not have missing.

anonymous posted on Wednesday, March 04, 2009 - 11:45 am

Hello Dr. Muthen,
Regarding your response to my query concerning how to approach missing data (Bengt O. Muthen posted on Thursday, February 26, 2009 - 6:53 pm), what are the advantages and disadvantages to taking a long vs. wide approach? I have more familiarity with the wide approach and would prefer it, but will certainly consider the long approach if it has definite advantages over missing data.
Also, a few more questions:
1) Can the long approach be used in conjunction with a LGMM?
2) Is it possible to graph the classes of trajectories with an HLM that uses the analysis=random option?
Thanks very much!

Bengt O. Muthen posted on Wednesday, March 04, 2009 - 5:39 pm

I think the wide approach is generally preferable, but not always. For example, you can allow the residual variances for the outcomes to vary across time. But the wide approach has to use the max number of observations per subject which may lead to a long observation vector (very wide). And with individually-varying times of observation having a different residual variance for each time point makes for many parameters. Furthermore, some time points may not have variation in the outcome if the missingness is extreme.

1) Yes. In this case the latent class variable is a between-level variable (see UG for examples).

2) I think so; try it.

anonymous posted on Thursday, March 05, 2009 - 7:19 am

Hi Dr. Muthen,
Yes, using the wide approach, I've found that some time points do not have variation in the outcome b/c the missingness is extreme. I was thinking of simply only including time points where the covariance coverage equals or exceeds .60 (although I have no reference to justify this approach).
1. Does this seem to be an acceptable solution?
2. Do you know of any references that recommend such a covariance coverage?
thanks!

Bengt O. Muthen posted on Thursday, March 05, 2009 - 8:13 am

You can manipulate the data to fit better with the wide approach by deleting time points or combining them with adjacent timepoints, but such manipulation does not seem right. Given what you see, I would instead take the long approach. You may find the DATA WIDETOLONG option helpful.

Andrea Vocino posted on Wednesday, March 11, 2009 - 10:42 pm

Just wondering if it's possible in any way to run MLM estimation when having missing data. I have noticed that MLM requires the raw data (so it must be a FIML type estimation) so even if I feed the model with a covariance matrix it won't work.

Thanks in advance!

Linda K. Muthen posted on Thursday, March 12, 2009 - 7:23 am

Missing data are not allowed with MLM.

anonymous posted on Thursday, March 12, 2009 - 12:46 pm

Hello Dr. Muthen,
I've attempted to transform the data from wide to long per your suggestion (Bengt O. Muthen posted on Thursday, March 05, 2009 - 8:13 am). Thankfully, the model ran! However, I have a few questions regarding interpretation:
1) How might I obtain a graph of the LGM trajectory? When I attempt to view the individually-fitted curves, only two data points are plotted on the y-axis.
2) If I am using the long option, I no longer need TSCORES correct?
3) How might I compare LGM models? When using the wide approach, I've conducted chi-square diff tests for nested models (intercept vs. intercept + slope vs. Intercept + slope + quadratic slope), but I am not certain how to do this using the log likelihood test.
Many thanks!

Bengt O. Muthen posted on Thursday, March 12, 2009 - 4:50 pm

1) You need to send this to support with the usual information.

2) You need to use AT (and TSCORES) also here if you want to take into account individually-varying times of observation.

3) 2 times the difference in loglikelihood for two models is chi-square distributed.
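
For point 3, here is a minimal sketch in plain Python of the loglikelihood difference test, assuming ML estimation and two nested models fitted to the same data. The degrees of freedom are taken as the difference in the number of free parameters. The loglikelihood values and parameter counts are hypothetical, and `chi2_sf` is a small helper written for this sketch (upper-tail chi-square probability for integer degrees of freedom), not a library function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        a = 0.5
        q = math.erfc(math.sqrt(half))   # exact for df = 1
    else:
        a = 1.0
        q = math.exp(-half)              # exact for df = 2
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0 - 1e-12:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def loglik_difference_test(ll_restricted, k_restricted, ll_general, k_general):
    """Return (statistic, df, p) for two nested ML models."""
    statistic = 2.0 * (ll_general - ll_restricted)
    df = k_general - k_restricted
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical values: a 5-parameter model nested in an 8-parameter model.
stat, df, p = loglik_difference_test(-2050.3, 5, -2044.1, 8)
print("2 * difference in loglikelihood = %.3f, df = %d, p = %.4f" % (stat, df, p))
```

This plain difference applies to ML. With MLR the loglikelihoods carry scaling correction factors, so the statistic would need to be adjusted before it is referred to the chi-square distribution.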

Sanja Franic posted on Monday, April 20, 2009 - 3:30 am

Is it ok to write that Mplus estimates models under missing data theory using all available data (when the 'missing' option is used)? I.e. it does not perform pairwise deletion, does it?

Amir Sariaslan posted on Monday, April 20, 2009 - 6:16 am

Sanja,

In the newer versions of Mplus, TYPE = MISSING is the default, where missing cases are handled under the Missing at Random (MAR) assumption using Full-Information Maximum Likelihood (FIML). You may also specify models with listwise deletion through LISTWISE=ON in the DATA-command. More information is provided in the User's Guide, pp. 7-8.

Sincerely,
Amir

Bengt O. Muthen posted on Monday, April 20, 2009 - 3:26 pm

With ML estimators all available data are used, using "MAR". Mplus only does pairwise with categorical outcomes and WLSMV.

Jeremy Miles posted on Tuesday, May 26, 2009 - 1:10 pm

Hi,

I've received this comment from a reviewer, regarding a confirmatory factor analytic study:

"Were missing data patterns missing at random (this can be done in Mplus by specifying a mixture analysis and using only a single class latent variable, using the %OVERALL% syntax at the beginning of the model statement and declaring the outcome variable as categorical variables)."

I don't understand what they are suggesting, and even if I did understand, I don't see how any test could tell if the data were MAR/MCAR vs MNAR.

Is this documented anywhere?

Thanks,

Jeremy

Bengt O. Muthen posted on Thursday, May 28, 2009 - 11:22 am

I also don't understand what is suggested here. And there is no general test of MAR/MCAR versus MNAR. You can only test for MCAR.

Jeremy Miles posted on Thursday, May 28, 2009 - 11:31 am

Thanks for confirming.
Jeremy

Paola Zaninotto posted on Wednesday, June 10, 2009 - 9:17 am

I have Mplus version 5, I am running a path analysis and I understand that the default is to estimate the model under missing data theory.
How can I turn off this option?
I just want to use complete case analysis in order to compare my results with another package.
Thank you

Linda K. Muthen posted on Wednesday, June 10, 2009 - 9:27 am

Add LISTWISE=ON; to the DATA command.

Paola Zaninotto posted on Thursday, June 11, 2009 - 4:22 am

Thank you

Richard Rivera posted on Sunday, June 14, 2009 - 2:39 pm

i tried adding LISTWISE=on, to the data command, but I get error:

*** ERROR in Data command
Unknown option:
LISTWISE

Linda K. Muthen posted on Sunday, June 14, 2009 - 5:54 pm

This option came out with Version 5. Perhaps you are using an older version where listwise deletion is the default. If not you need to send your full output and license number to support@statmodel.com.

Richard Rivera posted on Monday, June 15, 2009 - 6:03 pm

In V 4.21, I tried running a logistic model on a binary categorical outcome with Type=missing:

ANALYSIS:
Type = missing;
ESTIMATOR = ML;

But I got the following message:

*** FATAL ERROR
THIS MODEL CAN BE DONE ONLY WITH MONTECARLO INTEGRATION.

Please advise.

Linda K. Muthen posted on Monday, June 15, 2009 - 6:20 pm

Add INTEGRATION=MONTECARLO; to the ANALYSIS. Also download Version 5.21.

Michael Businelle posted on Monday, July 27, 2009 - 10:29 am

Dear Dr. Muthen.

A reviewer of one of my manuscripts requested that I report how Mplus handles missing data. I have a complex structural equation model (see below). I used the WLSMV estimator and MISSING = ALL (999). The outcome variable is categorical (1=relapse, 0=abstinent) and no subjects are missing on this variable. However, some subjects have missing data on some of the other observed variables. For instance, some subjects do not have data for c1-c4 (each of the observed variables that make up the crave latent variable). Is my description of what Mplus does in this situation correct? Syntax for the model is below.

“Intent to treat abstinence was the dependent variable in the current study. Thus, none of the participants were missing on the dependent variable (i.e., missing were counted as relapse). However, some participants did not complete all of the study measures. Mplus handles these missing values by estimating them using the other variables in the model.”

MODEL: SES by s1 s2 s3 s4;
Neigh by h1-h4;
Support by i1-i3;
NA by n1-n4;
agency by a1-a5;
Crave by c1-c4;
neigh on ses;
support on ses neigh;
NA on neigh support crave;
agency on crave na;
w4itt on agency ses;

Linda K. Muthen posted on Monday, July 27, 2009 - 4:38 pm

Factor indicators are dependent variables. For censored and categorical outcomes using weighted least squares estimation, missingness is allowed to be a function of the observed covariates but not the observed outcomes. When there are no covariates in the model, this is analogous to pairwise present analysis.

I think that what you are saying is that all of your dependent variables are continuous except abstinence. If this is the case, I would use maximum likelihood estimation where maximum likelihood estimation under MCAR (missing completely at random) and MAR (missing at random; Little & Rubin, 2002) is available for continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types. MAR means that missingness can be a function of observed covariates and observed outcomes.

Andrea K. Wittenborn, Ph.D. posted on Monday, July 27, 2009 - 5:41 pm

I am interested in receiving your suggestions for analyzing my data using Mplus. The data comes from an intervention study for couples transitioning to marriage. 18 couples completed pre-test data and 12 couples completed post-test data. I am interested in assessing change in couple's attachment, affect regulation, empathy, and trust (continuous variables) following intervention. This was a very preliminary study which is why I want to use the FIML capabilities of Mplus to keep the sample size higher than 12.

Bengt O. Muthen posted on Monday, July 27, 2009 - 6:17 pm

The standard approach to analyzing longitudinal data is to use FIML under the "MAR" assumption (see missing data lit.). This means that you use all available data - 18 couples at time 1 and 12 couples at time 2. I assume that the 12 couples are a subset of the 18. Couple, not individual, represents the mode of variation for which independence of observations is assumed to hold, so the sample size is 18. Because of this, note that with only 2 timepoints the sample size of 18 is quite low and does not allow the estimation of a model with many variables and parameters.

Newcomer posted on Tuesday, July 28, 2009 - 8:44 am

Hi, I am using Mplus to run linear regression and just wonder if Mplus can save the adjusted means (predicted values from a multiple regression). If so, what is the command?

Thanks!

Linda K. Muthen posted on Tuesday, July 28, 2009 - 12:55 pm

You cannot obtain these automatically. You would need to use the DEFINE command in a subsequent analysis to obtain them.

Newcomer posted on Tuesday, July 28, 2009 - 2:00 pm

Thanks so much Linda! Just to confirm--so, I will need to plug in the regression coefficients in the equation to calculate the predicted values using the DEFINE command, and then use the SAVEDATA command to save it, right?

Linda K. Muthen posted on Tuesday, July 28, 2009 - 4:06 pm

Right.

Gabriel Schlomer posted on Wednesday, July 29, 2009 - 11:51 am

In a previous post the following was stated by another user:

"In the newer versions of Mplus, TYPE = MISSING is the default, where missing cases are handled under the Missing at Random (MAR) assumption using Full-Information Maximum Likelihood (FIML)."

This was then followed up with a statement by Bengt:

"With ML estimators all available data are used, using "MAR"."

I have seen this sort of wording, "all available data are used", by Drs. Muthen in regard to missing data in several places, but I have not seen either of them directly state that when using TYPE=MISSING, FIML is being employed.

Is it fair to say that when you specify TYPE=MISSING (which is now the default) Mplus is using FIML?

Bengt O. Muthen posted on Wednesday, July 29, 2009 - 12:17 pm

"FIML" is used in some literature to mean full-information maximum-likelihood estimation (most often with continuous outcomes, but that is not necessary) and with missing data the "MAR" assumption of missing data theory is utilized. (As an aside, I think the "full-information" part is superfluous because maximum-likelihood estimation uses full information; to me it is not a good idea to add unnecessary acronyms beyond those in mainstream statistics.) Mplus uses ML to refer to maximum-likelihood estimation. ML under MAR is therefore the same as "FIML" and uses all available data.

So, TYPE=MISSING together with ESTIMATOR=ML gives "FIML". TYPE=MISSING together with ESTIMATOR=WLSMV, however, does not use MAR but a less flexible assumption detailed in the UG.
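
What "uses all available data" means for ML under MAR can be illustrated with a small sketch in plain Python for two continuous, normally distributed outcomes. Each case contributes the log density of whichever variables it has observed: the bivariate density if both are present, the univariate marginal if only one is present. This only illustrates the principle and is not a description of the algorithm Mplus uses; the function names, parameter values, and toy data are hypothetical.

```python
import math


def normal_logpdf(y, mean, var):
    """Log density of a univariate normal."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def bivariate_logpdf(y1, y2, m1, m2, v1, v2, c12):
    """Log density of a bivariate normal with covariance c12."""
    det = v1 * v2 - c12 ** 2
    d1, d2 = y1 - m1, y2 - m2
    quad = (v2 * d1 ** 2 - 2.0 * c12 * d1 * d2 + v1 * d2 ** 2) / det
    return -0.5 * (2.0 * math.log(2.0 * math.pi) + math.log(det) + quad)


def observed_data_loglik(rows, m1, m2, v1, v2, c12):
    """Sum casewise log densities over the observed part of each case.

    rows: list of (y1, y2) with None for missing.
    Returns (loglikelihood, number of cases that contributed).
    """
    total, used = 0.0, 0
    for y1, y2 in rows:
        if y1 is not None and y2 is not None:
            total += bivariate_logpdf(y1, y2, m1, m2, v1, v2, c12)
        elif y1 is not None:
            total += normal_logpdf(y1, m1, v1)
        elif y2 is not None:
            total += normal_logpdf(y2, m2, v2)
        else:
            continue  # missing on everything: no information to add
        used += 1
    return total, used


# Hypothetical data and parameter values.
rows = [(1.0, 2.0), (0.5, None), (None, 1.8), (None, None)]
loglik, n_used = observed_data_loglik(rows, 0.8, 1.9, 1.0, 1.2, 0.4)
print("loglikelihood = %.4f from %d of %d cases" % (loglik, n_used, len(rows)))
```

Maximizing this function over the means, variances, and covariance gives estimates that draw on the partially observed cases as well as the complete ones; only a case missing on every variable contributes nothing.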

Gabriel Schlomer posted on Wednesday, July 29, 2009 - 12:59 pm

Many thanks for the clarification!

Michelle Williams posted on Tuesday, August 04, 2009 - 8:27 am

I am running growth models with a lot of missing data. I need to compare nested models. However, my understanding is that it is not appropriate to use traditional chi-square difference tests to compare the fit of the nested models when modeling missing data due to the approximated chi-square values. Further, the manual states that the DIFFTEST option can only be used with MLMV or WLSMV estimators, yet I am using the default (ML) estimator. What is the most appropriate way to compare the relative fit of the nested models in this case? Should I be changing my estimator, or using some other approach?

Thank you.

Linda K. Muthen posted on Tuesday, August 04, 2009 - 11:45 am

The presence of missing data should not be an issue with difference testing. It is only the estimator that dictates the type of difference testing. ML uses a simple difference in chi-square. MLR requires the use of a scaling correction factor. Estimators ending in MV can use the DIFFTEST option.
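
The two hand calculations mentioned above can be sketched in plain Python. The first is the simple chi-square difference for ML. The second is the commonly used scaled difference for MLR, which takes the reported chi-square values, their degrees of freedom, and the scaling correction factors from the two outputs. Subscript 0 refers to the nested (more restricted) model and 1 to the comparison model. The function names and all numeric values are hypothetical.

```python
def ml_chi2_difference(t0, d0, t1, d1):
    """Simple chi-square difference for ML: returns (difference, df)."""
    return t0 - t1, d0 - d1


def scaled_chi2_difference(t0, d0, c0, t1, d1, c1):
    """Scaled chi-square difference for MLR-type estimators.

    t0, t1: scaled chi-square values as printed in the output
    d0, d1: degrees of freedom
    c0, c1: scaling correction factors
    Returns (scaled difference, df, difference-test scaling correction).
    """
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1, cd


# Hypothetical output values for a nested and a comparison model.
diff_ml, df_ml = ml_chi2_difference(120.0, 50, 100.0, 47)
trd, df_scaled, cd = scaled_chi2_difference(120.0, 50, 1.20, 100.0, 47, 1.15)

print("ML difference: %.3f on %d df" % (diff_ml, df_ml))
print("MLR scaled difference: %.3f on %d df (cd = %.4f)" % (trd, df_scaled, cd))
```

When both scaling correction factors equal 1 the scaled formula reduces to the simple difference. In some samples the correction `cd` can come out negative, in which case this simple formula is not usable.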

Michelle Williams posted on Tuesday, August 04, 2009 - 3:10 pm

Linda,

Thank you for your response. I realize that ML chi-square is typically the difference in the chi-square values (with the difference in the degrees of freedom as the df). However, when I estimate these models using regular ML estimation, the df change between samples. For example, if I run a model with sample A, and then run the exact same code with sample B, my chi-square df changes from 70 to 71, respectively. This implies to me that regular chi-square difference testing might not be ok. Am I totally off base?

Linda K. Muthen posted on Tuesday, August 04, 2009 - 3:30 pm

If the model is the same, changing the sample does not change the degrees of freedom with ML. If you send the two outputs and your license number to support@statmodel.com, I will find the explanation of this difference.

Linda K. Muthen posted on Wednesday, August 05, 2009 - 9:42 am

Both models estimate the same number of parameters. The difference in degrees of freedom is due to a different number of parameters in the unrestricted models due to different patterns of missing data in the two samples.

Michelle Williams posted on Wednesday, August 05, 2009 - 9:58 am

In that case, can I still use the traditional chi-square difference test to compare these results to the results from a nested model, even though the df change?

Linda K. Muthen posted on Wednesday, August 05, 2009 - 11:29 am

Nested models are tested for the same data set so this should not be a problem.

Harald Gerber posted on Thursday, August 06, 2009 - 11:03 am

Do you deem it necessary to conduct an analysis of sample selectivity when FIML is used? I thought of comparing those with full data with those having at least one missing value on the main study variables of interest. However, I'm not sure whether this analysis is theoretically needed because FIML uses all data available to estimate the model. In case there are differences between both groups...is MAR violated?

Bengt O. Muthen posted on Thursday, August 06, 2009 - 12:08 pm

MAR is not necessarily violated - the missingness can still be predicted by the variables that are observed. You cannot test if MAR holds. Although of interest in itself, your comparison can only reject MCAR. So unless you try to get into NMAR (not missing at random) modeling, you might just as well go ahead with ML under MAR (i.e. what is often called FIML).

Paul Tremblay posted on Thursday, October 29, 2009 - 11:04 am

I'm trying to understand how missingness in x variables is handled in Mplus. I have tried the simplest case with two continuous variables based on a sample size N=415 with X missing 22 cases while Y is missing only 1 case. If I regress y on x I get a message that 1 case is missing on all variables (N = 414). I had expected a message indicating that the analysis would be based on 393 cases (415 - 22). Is the analysis based on 414 cases or on 393 cases (i.e., is it a listwise deletion, or are the cases with missing Xs somehow adjusted for missingness rather than omitted from the analysis)? I tried to find information on this and don't understand one of your statements that "Covariate missingness can be modeled if the covariates are explicitly brought into the model and given a distributional assumption." Have I done this in my example? Thank you.

Linda K. Muthen posted on Thursday, October 29, 2009 - 11:41 am

Your analysis uses TYPE=GENERAL with continuous outcomes. In this special case, there is no difference between estimating the model conditioned on the x variables or treating the x variables as y variables. This is why the 22 cases are not deleted from the analysis. In other cases, it does make a difference how the x variables are treated and cases with missing on x are deleted unless they are explicitly brought into the model by, for example, mentioning their variances in the MODEL command. In this case, they are treated as y variables and distributional assumptions are made about them.

Bjarte Furnes posted on Friday, December 04, 2009 - 3:46 am

Hi,

I am doing a longitudinal study of 1000 children followed at four time points to assess language and literacy growth. Since this study is still ongoing, there are some children who have not been assessed at time 3 and time 4 yet. In one of my papers I'm focusing on time 2 to time 4, doing SEM, to examine how various language skills are related to later literacy development. I'm not very familiar with missing data, but in my data I have some missing values due to the fact that some children have not been assessed yet. What type of missingness is this, and how do I handle it?

Thank you!

Linda K. Muthen posted on Friday, December 04, 2009 - 8:54 am

It sounds like it would be MCAR so using our default of MAR with TYPE=MISSING should be fine.

John Mallett posted on Thursday, January 07, 2010 - 5:59 am

Hi Linda

I want to compare a measurement model obtained from a complete sample (N=1041) with the same measurement model obtained by multiple imputation using the same data with approximately 30% planned missingness MCAR. I want to see if the MI approach gets close to the original measurement model in a real data set. The items are scaled on a 7-point Likert scale.

I have managed to run the measurement model using both methods and the models look similar but I wondered whether the data could be combined in one measurement invariance type analysis (multigroup?).

Is this possible?

For the MI analysis I used a .dat file with the names of the 30 imputed datafiles.

Thanks for any pointers

Bengt O. Muthen posted on Thursday, January 07, 2010 - 8:49 am

The complete-data sample and the MI samples are not independent so multigroup analysis would not be correct.

What you could do is to divide the sample into groups that have different planned missingness (variables for which everyone has data plus variables that some have data for) and then do a multigroup analysis where you can test invariance over model parts that are in common for the different groups. So this would not use MI.

John Mallett posted on Friday, January 08, 2010 - 1:41 am

Thanks

Holly Burke posted on Thursday, January 14, 2010 - 2:03 pm

Hello,

I was wondering how Mplus handles missing data with WLSM in categorical factor analysis?

I thought Mplus handled missing data using maximum likelihood, but when I run the following analysis code: TYPE = COMPLEX EFA 1 5 MISSING; the output says the program used the WLSM estimator so how could the program also be using the ML estimator?

Thank you.

Linda K. Muthen posted on Thursday, January 14, 2010 - 2:55 pm

In this case, Mplus uses a pairwise present approach.

S. Jeanne Horst posted on Friday, January 29, 2010 - 7:18 am

We have collected student self-report data at seven time points and are interested in doing MM, which may lead into GMM or LGM, depending on the results of the MM. However, we have missing data (total n = 1434; listwise n = ~1271). We have determined that the missing data are not MCAR, and for now are treating them as MAR (will eventually do MNAR models, but are starting with MAR). We would like to do FIML.

My question is two-part:

1. Is the following syntax FIML?

TITLE: 2ClassA means free and var free but fix equal 0 covars
DATA: File is 'EffortMM99.dat';
VARIABLE: names are
id eff1 eff2 eff3 eff4 eff5 eff6 eff7;
Usevariables are eff1 eff2 eff3 eff4 eff5 eff6 eff7;
missing are all (99);
classes=c(2);
ANALYSIS:
type=mixture missing;
estimator = MLR;
starts 500 500;
MODEL:
%overall%
eff1 eff2 eff3 eff4 eff5 eff6 eff7;
OUTPUT: TECH1;

2. We have quite a number of external covariates, which we are hoping to use with the auxiliary command. However, some of the external covariate data are missing, as well. Can we use these data with FIML and the auxiliary command? Or, what is your recommendation?

Thank you for any advice that you can offer! It is much appreciated.

Linda K. Muthen posted on Friday, January 29, 2010 - 8:48 am

1. Whenever you combine TYPE=MISSING and maximum likelihood estimation, you have FIML.

2. The AUXILIARY setting for missing data correlates cannot be used with TYPE=MIXTURE. You would need to do this yourself.

S. Jeanne Horst posted on Friday, January 29, 2010 - 10:22 am

Dr. Muthen,

Thank you for your response.

Would the best approach be to compute MI separately for the external criteria, based upon mixture (class membership from the MM) and then use the imputed data set as auxiliary variables?

Or, is there another approach that would be preferable?

Thank you.
Jeanne

Bengt O. Muthen posted on Friday, January 29, 2010 - 4:59 pm

By MI I assume you mean Multiple Imputation. I don't know about MI software with a mixture (I assume your MM notation means mixture modeling), but perhaps you mean doing MI for subjects grouped by most likely class, which might be an alright approximation. But perhaps you could simply do MI for the external covariates without involving mixtures.

If your substantive model can reasonably be extended to include those multiply imputed external covariates among your other covariates, that might be the most straightforward approach. Otherwise, you can include the externals as auxiliaries, either with them imputed or with their missingness.

I hope I understood your questions.

S. Jeanne Horst posted on Saturday, January 30, 2010 - 8:48 am

Dr. Muthen,

Thank you; that is extremely helpful.

Jeanne

Dr. Walter H. Schreiber posted on Thursday, February 04, 2010 - 12:07 am

Dear Dr. Muthen,
In my dataset some data are missing. I have 13 measurement occasions.

I use the line
LISTWISE=ON;

This drops every subject that has a missing value on any of those 13 measurement occasions.

My question:
Is there a command to set a criterion for the amount of missing data per subject?

For example, I only want to drop those subjects that have more than 3 missing values out of the 13 possible.

thank you for your support.
walter

Linda K. Muthen posted on Thursday, February 04, 2010 - 7:44 am

I would not recommend using any rule related to missing data. This can cause the sample to be skewed. I would use all available data.

Maartje van Stralen posted on Wednesday, February 17, 2010 - 2:48 am

Dear Dr. Muthen,

I have come across some problems running my measurement model. I would like to run the Theory of Planned Behavior on a dataset containing 2,000 participants. My file consists of 30 observed variables that load on 5 factors: intention (3 obs. var); pros (9 obs. var); cons (7 obs. var); self-efficacy (9 obs. var); and social influence (2 obs. var).

If I try to run this model:

MISSING ARE ALL (-9);

ANALYSIS:
ITERATIONS = 1000;
CONVERGENCE = 0.00005;
COVERAGE = 0.10;

OUTPUT: SAMPSTAT MODINDICES(10) STANDARDIZED TECH4;

Model:
intenT0 by inten1t0-inten3t0;
prost0 by pros1t0-pros9t0;
cont0 by con1t0-con7t0;
EEt0 by EE1t0- EE9t0;
SIt0 by SSt0 SMt0;

No model results are shown (at least only the estimate is shown without s.e., p-values, MI etcetera) and I receive the following text:
MAXIMUM LOG-LIKELIHOOD VALUE FOR THE UNRESTRICTED (H1) MODEL IS -63112.478
NO CONVERGENCE. NUMBER OF ITERATIONS EXCEEDED.

I have already tried to increase the number of iterations but this didn't help. Can the high number of missing values explain this error (the number of missing data patterns is 133)? If yes, how can I solve this? If not, do you have another suggestion that explains this error? Thank you very much for your help.
Best wishes!

Maartje van Stralen posted on Wednesday, February 17, 2010 - 5:40 am

Please forget my previous question!! A colleague of mine solved the problem. My apologies for this inconvenience!
very best wishes,

Maartje

Tom Hildebrandt posted on Monday, February 22, 2010 - 9:13 am

I am wondering about the best way to handle a standard CFA with dichotomous indicators where 1 indicator has missing data for all members of a dichotomous covariate. I get the following error when I run the model:

THE WEIGHT MATRIX PART OF VARIABLE AMEN IS NON-INVERTIBLE. THIS MAY BE DUE TO ONE OR MORE CATEGORIES HAVING TOO FEW OBSERVATIONS. CHECK YOUR DATA AND/OR COLLAPSE THE CATEGORIES FOR THIS VARIABLE. PROBLEM INVOLVING THE REGRESSION OF AMEN ON GENDER. THE PROBLEM MAY BE CAUSED BY AN EMPTY CELL IN THE BIVARIATE TABLE.

Linda K. Muthen posted on Tuesday, February 23, 2010 - 9:09 am

To be sure we understand the model, please send your full output and license number to support@statmodel.com.

Brian Hall posted on Friday, March 26, 2010 - 1:27 pm

Dear Dr. Muthen,
A quick question:
I'm using the MLR estimator for CFA analyses. I have opted to use multiple imputation in order to test CFA models separately in multiple waves of data (sample size precludes normal temporal invariance investigation using FIML).

Given the robust estimation, I am concerned that Mplus is not providing a scaled correction factor in the imputed results. Is this a valid concern? Do I need to compute the scaling factor? And if so, how?
Thanks in advance,
Brian

Linda K. Muthen posted on Sunday, March 28, 2010 - 10:39 am

I'm not sure that using multiple imputation rather than FIML helps with a small sample size. You can test for invariance over time without looking at each time point separately. See the Topic 4 course handout starting with Slide 78 where multiple indicator growth is shown. The first steps test for measurement invariance.

If you are using TYPE=IMPUTATION and MLR, you will obtain an average MLR chi-square and standard deviation over imputations. These chi-square values have been corrected using the scaling correction factor. How to use a scaling correction factor with multiple imputation is a research question.
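
For the parameter estimates and standard errors (not the chi-square, which is the open question above), the usual combining rules are Rubin's rules. A minimal Python sketch follows; the function name and the five sets of estimates are hypothetical and only illustrate the arithmetic.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed-data analyses with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputations")
    q_bar = mean(estimates)                          # pooled point estimate
    within = mean([se ** 2 for se in std_errors])    # average sampling variance
    between = variance(estimates)                    # variance across imputations (m - 1 divisor)
    total = within + (1 + 1 / m) * between           # total variance
    return q_bar, math.sqrt(total)

# Hypothetical estimates and standard errors from five imputed data sets.
est = [0.50, 0.54, 0.46, 0.52, 0.48]
se = [0.10, 0.10, 0.10, 0.10, 0.10]
pooled_est, pooled_se = pool_rubin(est, se)
print(round(pooled_est, 3), round(pooled_se, 4))
```

The pooled standard error is larger than any single-imputation standard error whenever the estimates differ across imputations, which is how the uncertainty due to missing data enters.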

Kristin Voegtline posted on Thursday, April 22, 2010 - 8:44 am

Hello-

I am running a latent profile analysis with imputed data. I have generated 40 imputations with SAS proc mi, and created an ASCII file containing the names of the 40 data sets as described in the Mplus User’s guide. Although I am able to get the LPA models to converge I am concerned about the range of the indicator estimates across the classes - I have three continuous variables as indicators, all of which have been z-scored. Is it possible that the profiles may well change meaning from imputation to imputation in Mplus? In other words, across the 40 datasets is it necessary to verify that profile 1 always has the same meaning across imputations, as do profile 2, profile 3, etc. How does Mplus handle this?

Thank you for your help!

Linda K. Muthen posted on Thursday, April 22, 2010 - 9:05 am

You should use starting values to ensure that the classes don't switch across imputations. Run one data set with sufficient random starts to get stable starting values.
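
If one wants to check afterwards whether class labels lined up across separately analyzed imputations, a rough Python sketch is to match each imputation's class-mean profiles to a reference solution. This is only an illustrative check, not something Mplus does; the function name and the z-scored means are hypothetical.

```python
from itertools import permutations

def align_classes(reference, candidate):
    """Return the ordering of candidate classes that best matches the reference.

    Each argument is a list of class-mean profiles (one list of indicator
    means per class). result[k] is the index of the candidate class matched
    to reference class k, chosen by smallest total squared distance.
    """
    def distance(order):
        return sum(
            (r - c) ** 2
            for ref_profile, idx in zip(reference, order)
            for r, c in zip(ref_profile, candidate[idx])
        )
    return min(permutations(range(len(candidate))), key=distance)

# Hypothetical z-scored class means for three indicators.
ref = [[-1.0, -0.8, -0.9], [0.0, 0.1, -0.1], [1.1, 0.9, 1.0]]
imp7 = [[1.0, 1.0, 0.9], [-0.9, -0.9, -1.0], [0.1, 0.0, 0.0]]  # same classes, labels permuted
order = align_classes(ref, imp7)
switched = order != tuple(range(len(ref)))
print(order, switched)
```

Trying every permutation is only practical for a small number of classes, which is the typical case in latent profile analysis.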

Diego A. Carrasco Ogaz posted on Sunday, August 29, 2010 - 8:07 pm

Dear Linda & Bengt,

I have a question about handling missing data in SEM analysis for panel studies. Marini, Olsen, & Rubin (1980) suggest this method should be used with nested missing data patterns, that is, each subsequent wave should be a subsample of the previous wave, like this:

t1 t2 t3
n 1 1 1
n 1 1 0
n 1 0 0
etc...

A few reviews I found don't make clear statements on this issue (Enders, 2001; Newman, 2003). For example, what if the missing data pattern is like this:

t1 t2 t3
n 1 1 1 = all 3 waves complete
n 1 1 0 = t1 and t2; not t3
n 1 0 0 = just t1
n 1 0 1 = t1 and t3; not t2
n 0 1 1 = just t2 and t3
n 0 0 1 = just t3
n 0 1 0 = just t2

My concern is what is recommended for the non-nested cases in the available data:
drop them, or keep them for the analysis of panel data when using ML?

Another concern is the 'few cases' in the panel paths: is this only a problem of sample size (enough data to estimate the parameters), or is there a relation between the N of the within covariances (panel cases) and that of the between covariances (cross cases)?

I'll welcome any comments or directions on this issue, thanks in advance!

Diego.

Bengt O. Muthen posted on Monday, August 30, 2010 - 9:00 am

I think you make a distinction between dropout (monotone missingness) and intermittent missingness. I would think it is ok to make the standard MAR assumption for intermittent missing; perhaps it is even MCAR. You should certainly keep these cases in your analyses. The principle should be to use all available data. MAR for dropout may hold close enough, but for dropout one may want to also investigate other modeling (see for instance my paper under Missing data). But this is more advanced since it means that the missingness is part of what you model.

You then bring up coverage which Mplus prints for each outcome and pairs of outcomes. You want both types to be high in a longitudinal model.
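
The two ideas in this reply, monotone versus non-monotone patterns and coverage for single outcomes and pairs of outcomes, can be sketched in Python as follows. The function names are hypothetical, and the data use one made-up case per pattern from the table in the question, so the proportions are purely illustrative.

```python
from itertools import combinations

def is_monotone_dropout(pattern):
    """True if nothing is observed after the first missing wave (1 = observed, 0 = missing)."""
    seen_missing = False
    for observed in pattern:
        if not observed:
            seen_missing = True
        elif seen_missing:
            return False
    return True

def coverage(patterns):
    """Proportion of cases observed at each wave and jointly at each pair of waves."""
    n = len(patterns)
    waves = range(len(patterns[0]))
    single = {j: sum(p[j] for p in patterns) / n for j in waves}
    pairs = {(j, k): sum(p[j] and p[k] for p in patterns) / n
             for j, k in combinations(waves, 2)}
    return single, pairs

# One hypothetical case per pattern (t1, t2, t3).
patterns = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (1, 0, 1), (0, 1, 1), (0, 0, 1), (0, 1, 0)]
non_monotone = [p for p in patterns if not is_monotone_dropout(p)]
single, pairs = coverage(patterns)
print(non_monotone)
print(single, pairs)
```

Low values in `pairs` flag covariances that rest on few jointly observed cases, which is the situation the coverage output is meant to reveal.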

Michael Strambler posted on Thursday, September 02, 2010 - 9:37 pm

Dr. Muthen,
I'm running a simple model examining one indirect effect with one mediator. I have missing values on all variables (x, m, and y) and mentioned the x variable in the model command with the aim of FIML handling all missing data. However, Mplus is dropping cases that have missing values on all variables. Can FIML not address such cases? When I run the same model in Amos (which from what I understand also uses FIML) it appears to use the entire sample. Can you please explain what is happening here? Thank you.

Linda K. Muthen posted on Friday, September 03, 2010 - 9:23 am

Cases with missing on all variables have nothing to contribute to the modeling. Although AMOS does not tell you these observations are not used, they are not.

Michael Strambler posted on Friday, September 03, 2010 - 2:31 pm

Thank you. There are other variables not in the model that these cases have values on. Would Mplus stop dropping the cases if I brought these in as auxiliary variables? If so, do I only mention these variables in the auxiliary command or do they also need to be mentioned in the usevariables command?

Linda K. Muthen posted on Friday, September 03, 2010 - 3:38 pm

No, that will not change things.

Katy Roche posted on Friday, September 17, 2010 - 7:23 am

I have 20 imputed .dat files. Can you point me to syntax that creates the .dat file which lists the 20 imputed data sets?

Linda K. Muthen posted on Friday, September 17, 2010 - 8:08 am

If you imputed the data in Mplus, the file is created for you. See DATA IMPUTATION. If you did this outside of Mplus, you need to create the file yourself.

Katy Roche posted on Friday, September 17, 2010 - 8:50 am

I did this outside of Mplus. Is there syntax or guidance on how to create the file myself? I know that the file is to list the 20 data sets but am unclear about how to create this.

Linda K. Muthen posted on Friday, September 17, 2010 - 9:07 am

See Example 13.13 in the most recent user's guide which is on the website.
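
For data imputed outside Mplus, the list file is simply a plain text file with one data set name per line. A minimal Python sketch for producing it is shown here; the names imp1.dat to imp20.dat and implist.dat are hypothetical, and the demo writes to a temporary folder rather than a real project directory.

```python
import tempfile
from pathlib import Path

def write_imputation_list(data_files, list_path):
    """Write one data set file name per line to list_path."""
    Path(list_path).write_text("\n".join(data_files) + "\n")

# Hypothetical file names for 20 data sets imputed outside Mplus.
names = [f"imp{i}.dat" for i in range(1, 21)]

with tempfile.TemporaryDirectory() as folder:
    list_file = Path(folder) / "implist.dat"
    write_imputation_list(names, list_file)
    lines = list_file.read_text().splitlines()

print(len(lines), lines[0], lines[-1])
```

In practice the list file would be saved alongside the imputed data sets and named on the FILE option of the DATA command together with TYPE=IMPUTATION.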

Antonio A. Morgan-Lopez posted on Thursday, September 23, 2010 - 1:13 pm

I have a question about the individual LL values output under the SAVEDATA option. In trying to reproduce individual LogLikelihood values from a single-rep simulated dataset under MAR missingness, the values I calculated for a single case (in Proc IML) were slightly different than the value(s) produced in the SAVEDATA output. I was originally using the model-implied means and covariances under H0 to calculate the LogLike in Proc IML but then switched to the H1 means and covariances; the H1 sufficient stats seemed to reproduce the proper individual LL values in the output dataset. So a) am I right in my understanding that the LL values under the SAVEDATA command are the H1 LL values and, if so, b) is there any way to also output the individual values under H0?

Linda K. Muthen posted on Thursday, September 23, 2010 - 5:48 pm

Try using TYPE=RANDOM. There is a problem with TYPE=GENERAL and all continuous outcomes for the saved LL's which will be changed in the next update.

Antonio A. Morgan-Lopez posted on Thursday, September 23, 2010 - 7:23 pm

Thank you Linda - confirmed on my end that TYPE=RANDOM + SAVEDATA/LOGLIKE gives H0 LLs. Might both H0 and H1 LLs be available in SAVEDATA for the next update?

Linda K. Muthen posted on Friday, September 24, 2010 - 4:51 pm

No, just H0. Why would you want this for the H1 model?

Antonio A. Morgan-Lopez posted on Friday, September 24, 2010 - 5:17 pm

More for illustrative purposes for case-level discrepancies between H1 and H0 LLs - was thinking about them for a module on FIML for a seminar on missing data. Your point is well taken though because it is difficult to think of how H1 LLs would be useful in practice when your concern is H0 in real applications.
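
For a teaching module like this, the case-level calculation can be sketched in Python for the simplest setting of two continuous outcomes: each case contributes the normal log-likelihood of whatever it has observed. The function name and both sets of moments below are hypothetical; plugging in model-implied moments gives an H0 contribution and plugging in unrestricted moments gives an H1 contribution.

```python
import math

def casewise_loglik(y1, y2, mu, var, cov):
    """Normal log-likelihood of one case using only its observed values (None = missing)."""
    observed = [i for i, y in enumerate((y1, y2)) if y is not None]
    if not observed:
        return None  # a case with nothing observed contributes nothing
    if len(observed) == 1:
        i = observed[0]
        y = (y1, y2)[i]
        # Univariate normal for the single observed outcome.
        return -0.5 * (math.log(2 * math.pi * var[i]) + (y - mu[i]) ** 2 / var[i])
    # Bivariate normal when both outcomes are observed.
    d1, d2 = y1 - mu[0], y2 - mu[1]
    det = var[0] * var[1] - cov ** 2
    quad = (var[1] * d1 ** 2 - 2 * cov * d1 * d2 + var[0] * d2 ** 2) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)

# Hypothetical moments: model-implied (H0) and unrestricted (H1).
h0 = dict(mu=(0.0, 0.5), var=(1.0, 1.2), cov=0.4)
h1 = dict(mu=(0.1, 0.6), var=(1.1, 1.3), cov=0.5)

case = (0.3, None)  # second outcome missing for this case
ll_h0 = casewise_loglik(*case, **h0)
ll_h1 = casewise_loglik(*case, **h1)
print(ll_h0, ll_h1)
```

Summing these contributions over cases gives the log-likelihood that is maximized, so no values are filled in at any point.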

J.D. Haltigan posted on Tuesday, September 28, 2010 - 4:17 pm

Just to be sure of myself: when type=missing is specified, what is the default method that Mplus uses to handle missing data? Or is this a function of the model specified? In my case, it is a simple path analysis; all variables are observed.

Linda K. Muthen posted on Tuesday, September 28, 2010 - 5:46 pm

It is a function of the estimator and variable type not the model.

Mplus provides maximum likelihood estimation under MCAR (missing completely at random), MAR (missing at random), and NMAR (not missing at random) for continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types (Little & Rubin, 2002). MAR means that missingness can be a function of observed covariates and observed outcomes. For censored and categorical outcomes using weighted least squares estimation, missingness is allowed to be a function of the observed covariates but not the observed outcomes. When there are no covariates in the model, this is analogous to pairwise present analysis.

Sarah Ryan posted on Wednesday, October 06, 2010 - 10:53 am

I am working on the prospectus for my dissertation in which I will be using a national (NCES) longitudinal dataset. A number of exogenous variables are categorical, as is the mediating variable (5 levels) and the outcome (7 levels- so could be considered continuous). I have some missingness (data meet MAR)- seldom more than 10% and a good deal less in most cases. I have two questions, and I apologize if these seem horribly basic:

1) Is it best to use the WLSMV estimator?

and, if so,

2) Do I need to employ MI to deal with missingness as a first step? I have had some faculty at a training I attended suggest that it is always better to impute first, even in the SEM framework.

I'm still learning, so I'm having trouble understanding when and why I would or would not be wise to use MI.

Thanks.

Linda K. Muthen posted on Thursday, October 07, 2010 - 10:17 am

With categorical outcomes, you can use either weighted least squares or maximum likelihood estimation. If your model has more than four factors, maximum likelihood would not be feasible because numerical integration is required. If you use maximum likelihood, you can use the default missing data estimation which is asymptotically equivalent to multiple imputation. If you use weighted least squares estimation, I would use multiple imputation because missing data estimation with weighted least squares estimation is not as good as with maximum likelihood.
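
The feasibility limit comes from how quickly a rectangular integration grid grows with the number of dimensions. A small Python sketch of that arithmetic follows; the value of 15 points per dimension is an assumption used for illustration, and the function name is hypothetical.

```python
def integration_grid_size(n_dimensions, points_per_dimension=15):
    """Number of grid points when every dimension uses the same number of points."""
    return points_per_dimension ** n_dimensions

# Assumed 15 points per dimension, for one to five dimensions of integration.
sizes = {d: integration_grid_size(d) for d in range(1, 6)}
print(sizes)
```

Each added factor multiplies the work per iteration by the number of points per dimension, which is why a model with five or more such factors quickly becomes impractical.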

Alice posted on Thursday, November 04, 2010 - 11:27 am

Question for Mplus discussion board:

I am a new user of Mplus. I am trying to run a latent class analysis. The data file covers two waves of data. The data file has complex sample survey features: stratification, clustering, and weights.

And I need to use subpopulation option. The sample is wave4 sample.

My questions are:

(1) After I limit my sample to the wave 4 sample, my data still have missing values on all variables. Income is a continuous variable and all others are categorical variables. Am I able to use Full Information Maximum Likelihood to deal with missingness?
I googled this and found a claim that Mplus FIML is only for continuous variables. Is that true?

(2) FIML is able to deal with complex survey sample?

(3) Can I use multiple waves to run latent class analysis?

Please see my code in the next message for reference.

Linda K. Muthen posted on Thursday, November 04, 2010 - 2:00 pm

1. The default in Mplus is estimating the model using all information using maximum likelihood. A person who has missing values on all variables does not contribute anything so they are deleted.

2. Yes.

3. This would be LCGA or GMM. See the Chapter 8 examples in the user's guide.

Alice posted on Friday, November 05, 2010 - 10:55 am

Thanks for the reply, Linda. Although my data are longitudinal (two waves), there are no repeated measures. Wave 1 data are respondents' reporting of their parents socioeconomic variables and wave 4 data are respondents' socioeconomic variables. I want to use latent class analysis to capture intergenerational mobility. And I want to identify individuals into different class membership. For example, I want to classify people into different groups, like moving up, staying the same as their parents, or moving down. For this kind of model, can I do simple latent class analysis (treating the longitudinal data as cross-sectional data) instead of LCGA or GMM?

BTW, using Full Information Maximum Likelihood to deal with missing data, do I need to specify it in the code?

bushra farah nasir posted on Friday, November 05, 2010 - 6:14 pm

I am trying to analyse clinical + genetic data from a patient cohort as part of my PhD. I have started using LCA (LatentGolD) to classify any underlying latent classes within my data, however after reading the manuals and a few tutorials, I am still confused as to how to determine the best cluster model. Some places I have noticed they just opt for the lowest BIC, however in other places they select the lowest L2 value. Is there any set criteria to select the best model?

p.s. I have a very basic statistical background!

Bengt O. Muthen posted on Saturday, November 06, 2010 - 5:54 am

It is very common to use BIC with mixtures - take a look at

Nylund, K.L., Asparouhov, T., & Muthén, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling. A Monte Carlo simulation study. Structural Equation Modeling, 14, 535-569.

which is on the Mplus web site.
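
As a reminder of what is being compared, BIC is -2 times the log-likelihood plus the number of free parameters times the log of the sample size, and the model with the lowest value is preferred. A minimal Python sketch follows; the log-likelihoods, parameter counts, and sample size are hypothetical.

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: lower values indicate the preferred model."""
    return -2 * loglik + n_params * math.log(n_obs)

# Hypothetical fits: number of classes -> (log-likelihood, free parameters).
fits = {2: (-5230.4, 13), 3: (-5180.9, 20), 4: (-5172.3, 27)}
n_obs = 1000

table = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
best = min(table, key=table.get)
print(table, best)
```

In this made-up example the four-class model has the highest log-likelihood, but its extra parameters are penalized enough that the three-class model has the lowest BIC.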

Bengt O. Muthen posted on Saturday, November 06, 2010 - 9:52 am

Answer for Alice:

FIML has come to mean ML under the MAR assumption of missing data. It is true that the term is typically used with continuous outcomes, but ML under MAR can be used with categorical outcomes as well, not just continuous ones. It is the default in the current Mplus version. You obtain ML under MAR simply by specifying what missing data symbol you have in the data (Missing = in the Variable command). By requesting Patterns in the Output command you will see what missingness there is in your data.

It sounds like you want to do Latent Transition Analysis. This is a Latent Class Analysis at several time points where you can study changes in class membership over time. The User's Guide has several such examples and there are several papers posted on our web site on this topic.

Xi Chen posted on Monday, November 15, 2010 - 2:14 pm

Hi Dr. Muthen,
I ran a simple regression in Mplus and SPSS. The number of valid cases in SPSS with listwise deletion is 409, while the number of observations in Mplus is only 290, together with this warning:
*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 190

I checked the way I read data and I did not find any problem. Looking forward to your suggestions. Thanks!

Linda K. Muthen posted on Monday, November 15, 2010 - 2:18 pm

That message suggests that you did not do listwise deletion in Mplus. That message is related to TYPE=MISSING where missing data theory does not apply to independent variables. To do listwise deletion in Mplus, specify LISTWISE=ON in the DATA command.

Xi Chen posted on Monday, November 15, 2010 - 2:40 pm

the result was the same with listwise=on.

Linda K. Muthen posted on Monday, November 15, 2010 - 2:45 pm

Please send the two outputs and your license number to support@statmodel.com.

Xi Chen posted on Monday, November 15, 2010 - 5:15 pm

It turns out that the total N is 712 but Mplus only used 480 observations. The variables in the model do not have missing data and there are 712 rows in the data file. Is there any way to find out which part of the data is used in Mplus?
Thanks!

Linda K. Muthen posted on Monday, November 15, 2010 - 6:05 pm

You can save the data using the SAVEDATA command.

Xi Chen posted on Monday, November 15, 2010 - 9:22 pm

Hi Dr. Muthen,
I have checked the data used in the analysis and the original data. It looks like Mplus deleted some observations for reasons other than missing data (some observations without missing data were also deleted). Is there any reason why Mplus would delete observations from the analysis? Thanks!

Linda K. Muthen posted on Tuesday, November 16, 2010 - 6:26 am

No.

Alain Girard posted on Tuesday, February 15, 2011 - 7:55 am

Hi,
I just upgraded Mplus to 6.1 and ran an old program, and the number of subjects is now lower.

I want to estimate a regression model using FIML. But now I receive the following message:

*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 29
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

How can I use FIML with 6.1?

Thanks

Alain Girard
University of Montreal

Linda K. Muthen posted on Tuesday, February 15, 2011 - 8:02 am

In Version 5, the default changed from listwise deletion to TYPE=MISSING. You can obtain listwise deletion by adding LISTWISE=ON; to the DATA command.

Missing data theory does not apply to observed exogenous covariates. That is why observations with missing on x are excluded. If you want them included, you must mention their variances in the MODEL command. When you do that they are treated as dependent variables and distributional assumptions are made about them.

Dawit Getnet posted on Friday, February 25, 2011 - 2:04 am

I have a question.

Data could be MCAR, MAR, or MNAR. My question is: how can we identify which type of missingness the data have? I am using SAS. So, what is the mechanism?

Bengt O. Muthen posted on Friday, February 25, 2011 - 8:29 am

You can't identify whether the data are missing as MAR or NMAR, the two key contenders. For how to approach this dilemma, see the 2 papers on missing data by Enders and Muthen et al. mentioned on our home page, which also show how to do NMAR modeling in Mplus.

Nicholas Bishop posted on Thursday, March 17, 2011 - 11:43 am

Hello,
I have a question regarding a discrepancy between the estimated sample statistics produced by the descriptives produced for example 11.2, and the estimated sample statistics provided for a LGM. For the outcome trajectory, the estimated means for the baseline observation are the same, but diverge for later observations. Specifically, the estimated means are substantially lower for the LGM produced SAMPLE STATISTICS: ESTIMATED SAMPLE STATISTICS at later observations than the RESULTS FOR BASIC ANALYSIS: ESTIMATED SAMPLE STATISTICS.

What is the source of this difference? Thanks.

Nick

Linda K. Muthen posted on Thursday, March 17, 2011 - 12:04 pm

Please send the output files that show this and your license number to support@statmodel.com.

Aleksandra Holod posted on Sunday, March 27, 2011 - 1:33 pm

I am a doctoral student attempting to test second-order factors that I will include in a future SEM analysis. I am using complex survey data which require weights. My dataset does have two different versions of the weights – a base weight with a Taylor Series strata and PSU, or replicate weights. When I was proposing this project, I was advised by my mentor to use FIML in Mplus to address missing data. I have been using the replicate weights with bootstrap standard errors, but it appears that FIML is not available with this method of weighting.

Am I correct in my understanding that FIML is not available when using replicate weights with bootstrap standard errors?

In this case, what approach do you recommend? If at all possible, I would like to avoid listwise deletion.

Linda K. Muthen posted on Sunday, March 27, 2011 - 2:09 pm

I don't think this is the case. Please send the output and your license number to support@statmodel.com so I can see what message you are getting.

Sabine Spindler posted on Tuesday, March 29, 2011 - 7:54 am

Dear Dr. Muthen,

I am unsure whether I understand above explanations correctly.

My model is a simple 1 factor, 20 items model with ordinal scaled data. I am using the WLSMV estimator.

However, I have missing data. How can I estimate these missing data points? Can I use FIML?

Thank you very much, Sabine

Bengt O. Muthen posted on Tuesday, March 29, 2011 - 8:17 am

You can use FIML with your model and this will handle the missing data well.

Martie Thompson posted on Wednesday, April 27, 2011 - 11:55 am

Can FIML be used when WLSMV is specified as the estimator, and is this accomplished by the type = missing command? An earlier reply post by Linda to another user's question indicated that "if you use TYPE=MISSING with WLSMV, the missing data technique is pairwise present."
Thanks!

Linda K. Muthen posted on Wednesday, April 27, 2011 - 2:23 pm

No, FIML cannot be used with WLSMV.

Sarah Ryan posted on Wednesday, April 27, 2011 - 3:24 pm

I am running a mediation (mediator is latent continuous) model with four latent factors, and predominantly categorical indicators. The outcome is ordered categorical with six levels. Would it be advisable to treat the outcome as continuous in this case (rather than specifying it as CATEGORICAL) in order to reduce the computation burden of the numerical integration that will be required for this model?

Linda K. Muthen posted on Wednesday, April 27, 2011 - 5:57 pm

It depends on whether the ordinal variable piles up at either end, that is, has a floor or ceiling effect. If it does, it should be treated as a categorical variable. If not, you are probably safe to treat it as a continuous variable. You can also consider using the WLSMV estimator. If you have categorical factor indicators, each factor is one dimension of integration with maximum likelihood estimation.
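
One simple way to look for piling up at either end is to compute the share of responses in the lowest and highest categories. A minimal Python sketch follows; the function name and the six-level response counts are hypothetical, and what share counts as a floor or ceiling effect is left to judgment.

```python
from collections import Counter

def end_category_shares(values, lowest, highest):
    """Share of observed responses in the lowest and highest category (None = missing)."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    n = len(observed)
    return counts[lowest] / n, counts[highest] / n

# Hypothetical responses on a six-level ordinal outcome, with three missing values.
responses = [1] * 45 + [2] * 20 + [3] * 15 + [4] * 10 + [5] * 6 + [6] * 4 + [None] * 3
floor_share, ceiling_share = end_category_shares(responses, lowest=1, highest=6)
print(floor_share, ceiling_share)
```

In this made-up example nearly half of the observed responses sit in the lowest category, the kind of pile-up that would argue for treating the variable as categorical.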

Sarah Ryan posted on Thursday, April 28, 2011 - 2:26 pm

Okay. If I use the WLSMV estimator with imputed data sets, however, is there a way to test multigroup invariance?

My reading has led me to think that difference testing with imputed data is relatively unexplored.

Also, is a corrected/scaled Wald statistic provided in the output when using Mplus to analyze imputed data sets using WLSMV?

Linda K. Muthen posted on Sunday, May 01, 2011 - 10:17 am

I don't know of a way to do a difference test with imputed data.

A correct Wald test using MODEL TEST is provided with imputed data.

J.D. Haltigan posted on Monday, June 13, 2011 - 10:17 am

A quick question I had:

I realize that beginning with v. 5 Mplus uses missing and Type=H1 as the default in model analyses. However, I was curious as to why the same exact model run (same command-line syntax) would indicate different missingness (on the same input data file) between versions 4.1 and 6.0. This was noted when an adviser ran the same analyses on a different version than I am using (the estimates are basically the same, but in the analyses using 4.1, all the data is indicated as being present whereas in V. 6 it indicates that there are 2 cases with data missing on x-variables and 150 cases where data is missing on all variables except x variables). I know from a previous analyses using FIML in v 4.1 that the warning for missing data is worded such that missing data are noted as 'number of cases with missing on all variables'. Is this difference b/c v 5 and higher looks at missingness as a function of x and y variables whereas v 4.1 looks at it with respect to all variables considered simultaneously?

Linda K. Muthen posted on Monday, June 13, 2011 - 10:34 am

This difference is explained under Version History. See Analysis Conditional on Covariates under Version 6.1.

J.D. Haltigan posted on Tuesday, June 14, 2011 - 10:30 pm

Thanks Linda, read through this. To make sure I am understanding the difference fully would it be correct to say that pre v6.0 cases were only deleted if they were missing values for all variables (i.e., endogenous and exogenous) whereas currently cases are deleted if they are missing either all x vars, all y vars, and/or both considered in total?

Linda K. Muthen posted on Wednesday, June 15, 2011 - 10:05 am

Yes, for pre Version 6. For Version 6 and on, cases are deleted if they have missing for one or more x variables or all y variables. These are considered separately.
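For readers who want to anticipate these counts before running a model, the rule above can be sketched outside Mplus. This is a minimal Python illustration, assuming missing values are coded as NaN and that the x and y variables sit in separate arrays; the function name `count_excluded` is hypothetical and the sketch is not how Mplus itself is implemented.

```python
import numpy as np

def count_excluded(x, y):
    """Count cases dropped under the rule described above.

    x: (n, p) array of observed exogenous variables, NaN = missing.
    y: (n, q) array of dependent variables, NaN = missing.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    missing_on_x = np.isnan(x).any(axis=1)      # missing on one or more x
    missing_all_y = np.isnan(y).all(axis=1)     # missing on every y
    excluded = missing_on_x | missing_all_y
    return {
        "missing_on_x": int(missing_on_x.sum()),
        "missing_on_all_y": int(missing_all_y.sum()),
        "kept": int((~excluded).sum()),
    }

# Hypothetical toy data: 5 cases, 1 covariate, 2 outcomes.
x = [[1.0], [np.nan], [2.0], [3.0], [4.0]]
y = [[1.0, 2.0], [1.0, 2.0], [np.nan, np.nan], [np.nan, 3.0], [5.0, 6.0]]
counts = count_excluded(x, y)
print(counts)
```

The two conditions are counted separately, in line with the two separate warnings discussed in this thread. A case with only some y values missing is kept.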

J.D. Haltigan posted on Wednesday, June 15, 2011 - 3:35 pm

Many thanks. One thing I am still having trouble wrapping my head around is that when I conduct parallel analyses in V6, my results are exactly the same as they were in v4. The only difference is that in v6 I get the warning that data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis.

With those cases not included in the analysis how is it that the model results can still be exactly the same? Does it have something to do with the fact that I have complete data for the x-variable in the model? Bear with me if the answer is straightforward and I am just not seeing it.

Linda K. Muthen posted on Wednesday, June 15, 2011 - 5:48 pm

It is the case that for maximum likelihood estimation for continuous outcomes with no missing data, the results will be the same if the model is estimated with y and x or with y conditioned on x. It is only in this case that the results will be the same. We changed to y conditioned on x to be in line with the rest of the program and regression in general.
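The equivalence for complete, continuous data can be checked numerically outside Mplus. The following is a minimal Python sketch for a single linear regression only, assuming simulated data and a saturated joint normal model for (y, x); all variable names and parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([0.5, -0.3]) + rng.normal(scale=0.7, size=n)

# "y conditioned on x": ordinary least squares of y on x (intercept included).
design = np.column_stack([np.ones(n), x])
beta_conditional = np.linalg.lstsq(design, y, rcond=None)[0][1:]

# "y with x": ML covariance matrix of (y, x), then slopes = Sxx^{-1} Sxy.
joint = np.column_stack([y, x])
s = np.cov(joint, rowvar=False, bias=True)   # bias=True divides by n (ML)
beta_joint = np.linalg.solve(s[1:, 1:], s[1:, 0])

print(beta_conditional, beta_joint)
```

With no missing data the two sets of slopes agree up to rounding. Once some values are missing, the joint approach uses distributional assumptions about x that the conditional approach does not make, so the results need not agree.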

Suzanne Bartle- Haring posted on Wednesday, July 06, 2011 - 9:02 am

Are there sample size parameters for conducting a pattern mixture growth model? I am writing a data analysis section for a grant in which missing data that is not ignorable is expected. It is a small clinical trial with a total of 60 subjects with equal distribution in two treatment groups (n=30 in each). I know this is very small, but was wondering if testing a model with two growth classes (one with missing one without) would be possible?

Bengt O. Muthen posted on Wednesday, July 06, 2011 - 1:02 pm

That depends on how many time points you have and what the growth shape is, plus parameter values. You have to do a Monte Carlo simulation study to learn about it. 60 may not be too low even for a 2-class model, but with mixtures the answer also depends on the degree of class separation in the growth factor means. See UG chapter 12.

Eric Teman posted on Tuesday, August 16, 2011 - 9:25 pm

When using FIML in Mplus, does Mplus delete cases with missing values on exogenous observed variables by default? And does it use full information for missing values on the endogenous variables?

Linda K. Muthen posted on Wednesday, August 17, 2011 - 6:34 am

Yes. Missing data theory does not apply to observed exogenous variables. Any case with missing on one or more observed exogenous variables is eliminated from the analysis.

Eric Teman posted on Wednesday, August 17, 2011 - 5:51 pm

Does this also mean that missing data theory doesn't apply to CFAs, since there are only exogenous variables in a CFA?

Bengt O. Muthen posted on Wednesday, August 17, 2011 - 6:05 pm

There are only endogenous variables in CFA. Endo is Y, exo is X.

Eric Teman posted on Wednesday, August 17, 2011 - 6:14 pm

If missing values on covariates are ordinal, will FIML still delete the cases?

Bengt O. Muthen posted on Wednesday, August 17, 2011 - 6:19 pm

Yes. You can use multiple imputation where you specify that the variable is categorical.

Eric Teman posted on Wednesday, August 17, 2011 - 6:49 pm

And this is only since version 6.1 right?

Linda K. Muthen posted on Thursday, August 18, 2011 - 6:14 am

Multiple imputation came out in Version 6.

Eric Teman posted on Thursday, August 18, 2011 - 11:36 am

When did FIML start deleting cases with missing values on the exogenous variables? What did it do prior to that?

Bengt O. Muthen posted on Thursday, August 18, 2011 - 11:46 am

Version 6.1. See Version History on our web site to find more information about this.

You can easily revert to before v6.1 by mentioning the means or variances of the covariates. But this then makes additional model assumptions, not included in the original model. They are the same assumptions you make with multiple imputation. We made this change to be consistent throughout the program with categorical modeling, mixture modeling, and other cases.

Eric Teman posted on Thursday, August 18, 2011 - 2:38 pm

Hypothetically, let's say a dataset had missing values only on exogenous variables, i.e., the endogenous observed variables are complete. Would employing FIML be identical to using listwise deletion?

Bengt O. Muthen posted on Thursday, August 18, 2011 - 2:48 pm

Yes, at least the way I define FIML. FIML is helpful when some endogenous variables have some missing values because then missingness is allowed to be a function of some of the other, not missing, variables for the individuals with missing. For instance, in a longitudinal study, the outcome at the first time point may be observed for many persons and this may predict later missingness.

Eric Teman posted on Saturday, August 20, 2011 - 3:03 pm

On p. 458 of the version 6 Mplus manual, it says, "The ASCII files...must be created by the user." I noticed yesterday, though, that I did not need to manually create these. When did Mplus start doing this automatically?

Linda K. Muthen posted on Monday, August 22, 2011 - 3:16 pm

This is done automatically if you impute the data sets using DATA IMPUTATION but not if you impute the data sets outside of Mplus.

Eric Teman posted on Wednesday, August 24, 2011 - 8:05 pm

Hypothetically, in a Monte Carlo simulation study where a design cell contains 1,000 replications of a CFA model where multiple imputation was used (with 10 imputation data sets created per replication), would you simply average all of the parameter estimates and fit statistics (including chi-square) for that one cell across all imputations within replications?

Linda K. Muthen posted on Thursday, August 25, 2011 - 8:38 am

You can do that for the parameter estimates but not for the fit statistics. How to accumulate fit statistic information is unstudied except for ML chi-square. See the most recent Topic 9 course handout for the formula.
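For the parameter estimates within one replication, combining over imputed data sets is usually done with Rubin's rules. The sketch below is a minimal Python illustration of those standard rules, not of Mplus output; the function name `pool_rubin` and the numbers are hypothetical, and fit statistics are deliberately left out because, as noted above, they cannot simply be averaged.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one parameter over m imputed data sets with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)                       # average point estimate
    within = mean([se ** 2 for se in std_errors])  # average squared SE
    between = variance(estimates)                  # sample variance, m - 1
    total = within + (1 + 1 / m) * between         # total variance
    return pooled, math.sqrt(total)

# Hypothetical estimates and standard errors from 3 imputed data sets.
estimate, se = pool_rubin([0.50, 0.54, 0.46], [0.10, 0.10, 0.10])
print(estimate, se)
```

The pooled standard error is larger than any single-imputation standard error whenever the estimates differ across imputations, which reflects the extra uncertainty due to missing data.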

Eric Teman posted on Thursday, August 25, 2011 - 2:39 pm

Is the ML chi-square over 5 imputations, for example, output anywhere for reading into an outside statistics software package? Or will I have to calculate the ML chi-square over the number of imputations in multiple imputation?

Bengt O. Muthen posted on Thursday, August 25, 2011 - 2:58 pm

Note that the ML chi-square for each imputation - or the average of this over replications - is not a useful measure of fit but misestimates fit quite a bit. See the Topic 9 handout of 6/1/11, slides 212-216 for a study of this. The correct chi-square T_imp is printed.

Eric Teman posted on Thursday, August 25, 2011 - 3:02 pm

But is it printed in the ASCII results file? I can't find it there. I see it in the OUT files but not the ASCII files.

Eric Teman posted on Thursday, August 25, 2011 - 4:31 pm

I am using WLSMV as the estimator. Maybe that is why I'm not getting T_imp. In this case, how should I proceed with calculating the chi-square over imputations?

Bengt O. Muthen posted on Thursday, August 25, 2011 - 5:24 pm

You are hitting the research frontier - that hasn't been invented yet.

Eric Teman posted on Thursday, August 25, 2011 - 5:46 pm

Would it be reasonable/appropriate to do a Monte Carlo simulation study to see how multiple imputation works with the WLSMV estimator? Mplus is capable of this, right? We just don't know how it will act?

Eric Teman posted on Thursday, August 25, 2011 - 10:26 pm

To be more specific, I mean we don't know how the adjunct and chi-square fit statistics will behave when WLSMV is used, right? It might be beneficial for a Monte Carlo simulation to be done?

Bengt O. Muthen posted on Friday, August 26, 2011 - 6:13 am

You will probably learn something about how poorly the statistic performs, but then there is still the need for theory to suggest a better statistic.

Heather Whitcomb-Starnes posted on Tuesday, September 06, 2011 - 11:29 am

Basic question... I'm a new user and not sure which topic to post this in.

How do I get means and n missing for each variable?

I want to compare the means and (number of observations for each variable) in Mplus with the means (and n's) in SAS to make sure the data set conversion was completed accurately.

I was able to get means from the following code (I'm using version 6.11 on Linux):

Title: Checking mplus datafile to SAS datafile
Data:
File is FIdata_8-23-11_nonames.dat;
Variable:
Names are

lwlkm08 lwlks08 lwlkb08 wlk2x08
sidewalk parkdcar grass_strip bike_trail
safe_street noculdesac intrsctn altroute strght_st trees intrstng_thg
natrlsight attrctv_bldg traffic trffcspd_slow trffcspd_fast pedsgnl_cross
crossw_beeps islands cross_busyst car_sdwlk curbcut crime_high walk_unsafed
walk_unsafen straydog alleys_unsafe teens_unsafe streetlights walkbike_seen
walkstores parking_diff walkplaces walktobus hilly barriers_walk
popdens2 popdens3
singlemixed singlefamily housing;

Missing = all(-1234) ;
Analysis:
Type = basic;
Output:
sampstat;

Heather Whitcomb-Starnes posted on Tuesday, September 06, 2011 - 11:30 am

Follow-up to my post above:

I was not able to determine the n for each variable in Mplus, using the code above.

Linda K. Muthen posted on Tuesday, September 06, 2011 - 1:48 pm

TYPE=BASIC is how you would obtain the means. You can also use the SAMPSTAT option. The sample size is the same for each mean. It is shown at the beginning of the analysis summary.
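If a per-variable count of observed values is needed for a cross-check against another package, it can also be computed directly from the data file outside Mplus. This is a minimal Python sketch, assuming the data are already loaded as a numeric array and that -1234 is the missing value flag as in the input above; the function name `n_and_mean` and the toy rows are hypothetical.

```python
import numpy as np

MISSING_FLAG = -1234  # same flag as in the MISSING option above

def n_and_mean(data, flag=MISSING_FLAG):
    """Return the number of observed values and the mean for each column."""
    data = np.asarray(data, dtype=float)
    observed = data != flag
    n_observed = observed.sum(axis=0)
    masked = np.where(observed, data, np.nan)   # hide flagged values
    means = np.nanmean(masked, axis=0)
    return n_observed, means

# Hypothetical toy file with two variables and four cases.
rows = [[1, 2], [-1234, 4], [3, -1234], [5, 6]]
n_observed, means = n_and_mean(rows)
print(n_observed, means)
```

These are simple available-case counts and means, so they are the natural comparison for per-variable output from a package such as SAS.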

anouk van drunen posted on Wednesday, October 05, 2011 - 8:02 am

Hello,

I am sorry if I am not posting in the right place; I could not find an appropriate topic.
I keep getting this error message when running my input file.

*** ERROR
The length of the data field exceeds the 40-character limit for free-formatted
data. Error at record #: 1, field #: 32
*** ERROR
The number of observations is 0. Check your data and format statement.
Data file: F:\MATCHEDBYCHNR W1234_mplus.csv

I have saved the SPSS file as .csv without variable names so everything should be alright.

Furthermore, I was wondering: I am using TYPE=COMPLEX and have missing data. Is FIML used automatically?
Also, I have indirect effects, and I have found out that bootstrap cannot be used. Is there any other option to find out whether the indirect effects are significant?
(I've heard about Prodscal but am not sure how that works.)

Best,

Anouk

Linda K. Muthen posted on Wednesday, October 05, 2011 - 9:05 am

It sounds like you may have fixed format data that you are trying to read as free format. If that is not the problem, please send the relevant files and your license number to support@statmodel.com.

If you are using a version after Version 5, TYPE=MISSING is the default.

When the BOOTSTRAP option is not available, Delta method standard errors are provided.

anouk van drunen posted on Friday, October 07, 2011 - 2:19 am

Dear Linda,

Thank you for your reply.
How can I indicate that I have a fixed format in the input file?

Can you advise me on a reference for the Delta method?

Thank you,

anouk van drunen posted on Friday, October 07, 2011 - 2:58 am

Dear Linda,
Using a .dat file I have now succeeded in avoiding this error; however, a variable is now not recognized in the MODEL command:

model:
f1 by w12mep w12met
f2 by w2afp1 w2afp2

Mplus says f2 is not recognized. what can I do to solve that?

Best,

Linda K. Muthen posted on Friday, October 07, 2011 - 8:07 am

See the FORMAT option.

The Delta method standard errors are computed with MODEL INDIRECT automatically. There is a FAQ on the website if you want more information.

You need a semicolon (;) after each BY statement.

anouk van drunen posted on Friday, October 07, 2011 - 8:43 am

Thank you very much!
that helps.

I finally have output now. One, hopefully last, question:

i get this message:

THE COVARIANCE COVERAGE WAS NOT FULFILLED FOR ALL GROUPS
THE MISSING DATA EM ALGORITHM WILL NOT BE INITIATED
CHECK YOUR DATA OR LOWER THE COVARIANCE COVERAGE LIMIT.

SAMPLE STATISTICS
THE MINIMUM COVARIANCE COVERAGE WAS NOT FULFILLED FOR ALL GROUPS

1. I thought FIML was used and not EM, but that should be similar, right?
2. How could I solve the covariance coverage problem?

Thank you again for the quick replies.

Best,

Linda K. Muthen posted on Friday, October 07, 2011 - 11:04 am

1. It is.
2. The default covariance coverage is .10. You must have lower coverage than that in one group. See the COVERAGE option in the user's guide.
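Covariance coverage is the proportion of cases that have data on each variable and on each pair of variables, so low-coverage pairs can be located before the model is run. Below is a minimal Python sketch, assuming missing values are coded as NaN and using the .10 default limit mentioned above; the function name `covariance_coverage` and the toy data are hypothetical.

```python
import numpy as np

def covariance_coverage(data):
    """Proportion of cases observed for each variable and variable pair."""
    observed = (~np.isnan(np.asarray(data, dtype=float))).astype(float)
    return observed.T @ observed / observed.shape[0]

# Hypothetical toy data: 4 cases, 2 variables.
data = [[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan], [5.0, 6.0]]
coverage = covariance_coverage(data)

limit = 0.10  # default minimum coverage quoted in this thread
low_pairs = np.argwhere(np.triu(coverage < limit))
print(coverage)
print(low_pairs)
```

The diagonal gives the proportion observed for each variable and the off-diagonal entries give the proportion of cases with both variables observed. In a multiple-group analysis the same check would be run within each group.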

roofia galeshi posted on Friday, October 07, 2011 - 5:44 pm

Hello,

I am interested in using Mplus with large data sets like TIMSS and NAEP, in particular TIMSS. To do so I have a few questions:

1) Does Mplus allow for sample weight to items? If yes, how?

2) How does Mplus handle missing data in blocks? To explain in detail:

TIMSS is a collection of 12 booklets that is administered to several thousand students. Each student answers only 2 booklets; as a result, if one wants to study the whole data set, one will have blocks of missing data. I was wondering if Mplus can handle such data sets.
TIMSS 2003: 740 students respond to items in blocks 1 and 2, another 740 students respond to blocks 2 and 3, another 740 respond to blocks 3 and 4, and so on. If I stack all these items, there will be blocks of items which are missing. Can Mplus handle such data?

3) I will be using it for latent class analysis and was wondering if I could fix some of the parameters for the purpose of equating these blocks of items.

Thank you

Bengt O. Muthen posted on Saturday, October 08, 2011 - 2:04 pm

1) Mplus allows for sampling weights - see the paper (which is also on our web site):

Asparouhov, T. (2005). Sampling weights in latent variable modeling. Structural Equation Modeling, 12, 411-434.

I don't know what you mean by "sample weight to items"

2) You have missing by design, which can be handled in Mplus in two ways. One is to have as variables (columns in the data) all the variables in all 12 booklets so that all students have missing on most variables. The other is to do a multiple-group analysis where each group of students has its own set of variables (but the same number of variables).

3) Yes, you can hold parameters equal for the purpose of equating.

roofia galeshi posted on Saturday, October 08, 2011 - 2:50 pm

Thank you for your reply and sorry for the double posting.
Just to clarify, these variables are all categorical, not continuous: basically correct (1) or wrong (0) answers to mathematics questions. Can Mplus work with missing data when about 50% of the data is missing?

Thank you in advance

Bengt O. Muthen posted on Saturday, October 08, 2011 - 3:14 pm

No problem.

anouk van drunen posted on Sunday, October 09, 2011 - 5:18 am

Dear Linda,

I have output for two mediation models now.
However, the robust chi square values are not computed. (with MLR AND TYPE=COMPLEX)
Also, no standard errors are displayed, only the estimates. (I use the STANDARDIZED CINTERVAL command in OUTPUT.)

Do you know why?

Thank you,

Linda K. Muthen posted on Sunday, October 09, 2011 - 7:31 am

It sounds like the model did not converge. If you want help, please send the output and your license number to support@statmodel.com.

roofia galeshi posted on Wednesday, October 12, 2011 - 9:00 am

Hello,

Would you please direct me to the particular Mplus chapter that discusses handling blocks of missing data for categorical variables, mainly Latent Class modeling.

Thank you for your help in advance

Linda K. Muthen posted on Wednesday, October 12, 2011 - 12:17 pm

I'm not sure what you mean by blocks of missing data. If you mean missing by design, this is taken care of by the default of TYPE=MISSING.

roofia galeshi posted on Wednesday, October 12, 2011 - 6:13 pm

The missing data are not by design. The data is a national data set that has several chunks missing; I would like to learn how Mplus handles these types of data.
Thanks

Linda K. Muthen posted on Thursday, October 13, 2011 - 12:02 pm

Please see pages 7-8 of the user's guide.

Joseph E. Glass posted on Wednesday, November 30, 2011 - 8:13 am

Hello,

I am receiving the error "THE MINIMUM COVARIANCE COVERAGE WAS NOT FULFILLED FOR ALL GROUPS" when running a structural equation model using complex survey data and WLSMV (n=1,100, with 46 observed variables). There are several latent variables and a number of observed dummy variables as covariates. Everything is being regressed on a dichotomous variable.

When I run the analysis variables through type=BASIC to investigate covariance coverage, it appears that all coverage values are well above 0.9. I wonder what I should be looking for. This is not a multiple group analysis so I wonder if there are other groups that the error message would be referring to? Perhaps it is referring to the clusters in the complex survey data?

Thanks so much!

Linda K. Muthen posted on Wednesday, November 30, 2011 - 10:07 am

Please send the output and your license number to support@statmodel.com so I can see the full picture.

Caroline Fitzpatrick posted on Wednesday, December 14, 2011 - 1:17 pm

Using Mplus 6, I am getting listwise deletion for my DV, even after specifying in the syntax:

MISSING = KQEET07(999);

KQEET07 is my DV and the only variable with missing data. I am confused because I have used the same syntax with similar models. Now, instead of getting FIML, I get listwise deletion.
Any ideas?

thanks for your help!

Linda K. Muthen posted on Wednesday, December 14, 2011 - 2:43 pm

The default is not listwise deletion. If you want me to see what is happening, send the output and your license number to support@statmodel.com.

Lisa M. Yarnell posted on Thursday, December 15, 2011 - 2:09 pm

Drs. Muthen: I have seen the use of latent coefficients (latent "placeholders," if you will) in longitudinal growth models where there is planned missing data.

Can latent coefficients (or latent "placeholders") also be used in the context of multi-group modeling when not all items from a standard scale were administered to one group?

What about if the item was not administered to either group? Would we be able to use latent placeholders and have Mplus estimate what the factor loadings for those items would have been had they been administered?

Or is this an inappropriate use of the latent coefficients / latent placeholders?

Do you have an example in the Mplus manual, or do you know of an article, that has used latent placeholders in the context of multigroup modeling with planned missing data in the past?

Thanks.

Lisa M. Yarnell posted on Thursday, December 15, 2011 - 2:13 pm

More specifically, can latents be used in multi-group CFAs, rather than multi-group LGMs, when there is planned missing data?

Linda K. Muthen posted on Thursday, December 15, 2011 - 2:21 pm

If a study has planned missingness, a missing value flag should be assigned to individuals who did not take the item. Nothing more needs to be done. If no one took an item, it will not be used in the analysis. An item with all missing contributes nothing to the model.

Lisa M. Yarnell posted on Thursday, December 15, 2011 - 2:28 pm

Would Mplus estimate a CFA loading for an item that was not administered if we included it in the model and in our code?

That is, Mplus could estimate what the loading would have been, had it been administered in that group--based on the functioning of the other items in the lower-order factor on which it loads?

Lisa M. Yarnell posted on Thursday, December 15, 2011 - 2:54 pm

And if so, would I need to set something for the latent item equal to other estimates in the model, such as the error variance for that item?

Bengt O. Muthen posted on Thursday, December 15, 2011 - 6:25 pm

You cannot identify or estimate a loading for an item that was not administered to anyone in the sample because there is no sample information on this loading. You have to have some subjects who have responses on the item so that you know how this item correlates with the other items - and therefore can draw inference about how subjects who didn't take the item might have scored.

In a two-group model you can have an item that is only administered in one of the two groups, but you cannot estimate a group-specific loading for that item in the group that didn't take the item. This is for the same reason as above.

I have not heard of latent coefficients / latent placeholders, so I don't know what that is.

Lisa M. Yarnell posted on Thursday, December 15, 2011 - 6:57 pm

Thank you!

Lisa M. Yarnell posted on Sunday, December 18, 2011 - 9:09 pm

Bengt (or Linda): You mentioned that "in a two-group model you can have an item that is only administered in one of the two groups."

This is what I am doing, and the items are categorical, in a 2-group (gender) CFA model. One such item that was administered to only one group (girls only) is CBCEYEP (measuring eye problems as a somatic symptom).

When I run the analysis, I get the message: *** ERROR Categorical variable CBCEYEP contains less than 2 categories.

However, that is not true. Among girls, to whom the item was administered, all three possible categories were endorsed. I set loadings and thresholds to be equal in both groups, and freed the scaling factor in one group.

Can you tell me from this information what I am doing wrong?

Linda K. Muthen posted on Monday, December 19, 2011 - 6:00 am

Please send the input, data, output, and your license number to support@statmodel.com.

Lisa M. Yarnell posted on Monday, December 19, 2011 - 3:34 pm

Hello, thanks for your advice, Linda! I was looking at this downloadable sheet: http://www.statmodel.com/download/Different%20number%20of%20variables%20in%20different%20groups.pdf

We have extra variables in certain groups (and missing in the other group), and these variables are dependent. Fixing the residual variances to be equal to those in the other group implies Theta parameterization, does it not?

Thanks again.

Bengt O. Muthen posted on Monday, December 19, 2011 - 6:52 pm

The trick in that FAQ is only for continuous outcomes, not categorical outcomes that you have. Here is another approach you can take.

Assume as an example that you have 10 items that both males and females respond to and assume that each gender also responds to 5 additional items, but they aren't the same for the two genders. So each gender responds to 15 items. Your input should then refer to 15 items in the USEV list and your model should have any equality constraints applied only to the same 10 items that both genders respond to.

Lisa M. Yarnell posted on Monday, December 19, 2011 - 6:58 pm

Thanks, Bengt. Why wouldn't I list 20 items in the USEV list, if there were 10 common items + 5 extra items for boys + 5 extra items for girls?

If I list 15, which 15 do I list? Those administered to the first group?

Bengt O. Muthen posted on Monday, December 19, 2011 - 8:38 pm

You don't want to list 20 items because both groups would then have 5 items where nobody in the group has responses to those 5 items.

The 15 items are different items for the two groups. For males, it is the set of 15 items that males responded to, for females it is the set of items that females responded to. So you have to arrange your data that way.

Lisa M. Yarnell posted on Monday, December 19, 2011 - 9:13 pm

Bengt, I am trying to run this as a multi-group model, where parameters for males and estimates for females are estimated simultaneously.

Do I estimate this in three stages? For example, get the estimates for the model with items that females responded to, then get the estimates for the model with items that males responded to, then run the overall model with parameters set at the values derived from the first two models for the times that these are missing for a certain gender in the overall model?

I am sorry. I am so confused about this. I think the three-stage strategy may work because Linda mentioned that groups with no data for an item do not contribute to the estimates? However, if I have for example a factor with 8 indicators but two are missing for boys and two are missing for girls, and I try this three-stage strategy, can I assume that including the loadings for these items administered only to one of the two groups would have no effect on the other loadings when I add them in?

Bengt O. Muthen posted on Tuesday, December 20, 2011 - 8:32 am

What I suggested is a single analysis, not a multi-stage analysis. I suggested a simultaneous, 2-group analysis of males and females. You arrange your data so that say the first 10 columns are the common items and the next 5 columns are the items specific to each gender (so those 5 are different items for the two genders). So for instance if you have one factor f, you say in the Overall part of the model:

f by y1-y15;

You can then apply measurement invariance across gender for the first 10 items. The next 5 items contribute to the measurement of f, although they are different items for the two genders.

This is a standard type of approach when different sets of subjects take different forms of an achievement test. A similar approach is also used with multiple-cohort data.
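The data arrangement described above can be prepared outside Mplus before the file is saved. This is a minimal Python sketch with a scaled-down example (2 common items and 1 gender-specific item instead of 10 and 5); the function name `arrange_two_group`, the group codes 1 and 2, and the toy responses are all hypothetical.

```python
import numpy as np

def arrange_two_group(common_m, specific_m, common_f, specific_f):
    """Stack males and females so both groups use the same column positions.

    Common items come first; the remaining columns hold each group's own
    specific items, so those columns mean different items in each group.
    The last column is a grouping variable (1 = male, 2 = female).
    """
    males = np.column_stack([common_m, specific_m])
    females = np.column_stack([common_f, specific_f])
    if males.shape[1] != females.shape[1]:
        raise ValueError("both groups need the same number of columns")
    group = np.concatenate([np.full(len(males), 1), np.full(len(females), 2)])
    return np.column_stack([np.vstack([males, females]), group])

# Hypothetical responses: 2 males, 3 females.
common_m = [[0, 1], [1, 1]]
specific_m = [[2], [0]]               # item given only to males
common_f = [[1, 0], [0, 0], [1, 1]]
specific_f = [[1], [2], [0]]          # item given only to females
stacked = arrange_two_group(common_m, specific_m, common_f, specific_f)
print(stacked)
```

With the file arranged this way, equality constraints across groups would apply only to the columns holding the common items, as described above.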

If you are still unsure of what I am suggesting, you may want to consult with an SEM person on your campus who can sit down with you and talk you through it.

Lisa M. Yarnell posted on Thursday, January 05, 2012 - 7:25 pm

Hello, what does the following message imply about my data, and how can I fix the problem so that the model will run? I don't think one entire group is missing data for these items, so I am not sure why I am getting these messages.

WARNING: THE BIVARIATE TABLE OF VANDA_D AND SKINP_D HAS AN EMPTY CELL.

COMPUTATIONAL PROBLEMS ESTIMATING THE CORRELATION FOR VANDA_D AND SKINP_D.

Lisa M. Yarnell posted on Thursday, January 05, 2012 - 7:34 pm

P.S. I do have several items with low endorsement, such as the items below, which were mentioned in the warning above. Can items with low endorsement lead to the generation of a warning like the messages above? The data are not really missing, so it's a confusing message to receive.

VANDA_D
Category 1 0.963 315.000
Category 2 0.037 12.000

SKINP_D
Category 1 0.937 251.000
Category 2 0.063 17.000

Linda K. Muthen posted on Friday, January 06, 2012 - 11:36 am

When a bivariate table has an empty cell, this implies a correlation of one which means that only one of the variables should be used in the analysis. Variables that correlate one are not statistically distinguishable. Empty cells can occur for extreme items when sample size is small.
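Empty cells can be spotted before the analysis by cross-tabulating each pair of categorical items on the cases that have both items observed. This is a minimal Python sketch, assuming missing values are coded as NaN; the function name `bivariate_table` and the toy responses are hypothetical and only loosely mimic two low-endorsement binary items.

```python
import numpy as np

def bivariate_table(a, b):
    """Cross-tabulate two categorical items using cases observed on both."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    both = ~np.isnan(a) & ~np.isnan(b)
    a, b = a[both], b[both]
    cats_a, cats_b = np.unique(a), np.unique(b)
    return np.array(
        [[int(np.sum((a == i) & (b == j))) for j in cats_b] for i in cats_a]
    )

# Hypothetical binary items with low endorsement of category 1.
item_1 = [0, 0, 0, 1, 1, np.nan]
item_2 = [0, 1, 0, 0, 0, 1]
table = bivariate_table(item_1, item_2)
has_empty_cell = bool((table == 0).any())
print(table, has_empty_cell)
```

Here nobody endorses both items, so the (1, 1) cell is empty even though neither item has missing data problems on its own.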

Amanda Hare posted on Wednesday, January 11, 2012 - 8:28 am

Hi there-

I am trying to run what I thought was a very simple model using version 6.1, predicting wave 2 self esteem (continuous) from sex (categorical), wave 1 self esteem (continuous), and authoritative parenting (continuous). The problem is that I'm getting listwise deletion of all cases with missing on x-variables! Here are the highlights:

MISSING IS .;

USEVARIABLES ARE
Ssex
PAQaeAbD
w1SRSESs
w2SRSESs;

ANALYSIS:
Type = Missing;

MODEL:
w2SRSESs on Ssex w1SRSESs PAQaeAbD;

OUTPUT:
sampstat standardized;

Can you help?
Thanks!

Linda K. Muthen posted on Wednesday, January 11, 2012 - 11:24 am

This is because missing data theory does not apply to observed exogenous variables. To avoid this, you would need to bring all of the covariates into the model by mentioning their variances in the MODEL command. When you do this, they are treated as dependent variables and distributional assumptions are made about them.

EFried posted on Saturday, January 14, 2012 - 8:51 pm

Dear Dr Muthén!

Data set of N=800, 5 measurement points (MP), first MP has 5%, last MP 40% missings on my one continuous outcome variable. Covariates (6 time invariant, 1 time varying) also have some missings. If I run the whole growth mixture model, Mplus deletes about 50% of my subjects, which is an insane amount of information I do not want to lose:

"Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 396"

What to do?

(1) Auxiliary isn't possible in type RANDOM or MISSING if I see that correctly.

(2) I watched your videos 3-6, but the parts about missing data confused me more than they helped me ;). I read up on "Diggle Kenward selection Modeling" and "Roy's Model (Pattern Mixture Modeling)" but I don't want to write a paper on missing data and imputation. Other people must struggle with this also. Are there any guidelines I can follow on this?

(3) Chapter 11 of your wonderful manual: "Covariate missingness can be modeled if the covariates are brought into the model and distributional assumptions such as normality are made about them."
What does this mean - how do I model covariate missingness exactly?

Thank you so much!
T

Linda K. Muthen posted on Sunday, January 15, 2012 - 4:44 pm

I would choose number 3. What this means is that in regression the model is estimated conditioned on the covariates and no distributional assumptions are made about them. If you bring them into the model and treat them as dependent variables, distributional assumptions are made about them.

EFried posted on Wednesday, January 18, 2012 - 9:55 pm

Thank you! Is there an example in the v6 manual or in any of the videos for this? Or in one of the papers? I wouldn't quite know how to do this.

Also, I have 5 measurement points, and 8 time invariant and 2 time varying covariates (with 4 measurement points each).

So I probably would have to decide which ones to bring into the model as dependent variable, otherwise the model would become not identified anymore?

Thank you
Torvon

Linda K. Muthen posted on Thursday, January 19, 2012 - 6:45 am

You should bring all observed exogenous covariates into the model. If they are called x1 and x2, you say in the MODEL command

x1 x2;

EFried posted on Thursday, January 19, 2012 - 12:27 pm

So to the model ...

%OVERALL%
i s | y0@0 y1@1 y2*2 y3*3 y4*4;
i s ON x1-x5;

y0 ON x6;
y1 ON x7;
y2 ON x8;
y3 ON x9;
y4 ON x10;

...

I add the line
x1-x10;
?

Thanks

Linda K. Muthen posted on Thursday, January 19, 2012 - 6:41 pm

Yes.

Nancy Lewis posted on Tuesday, January 31, 2012 - 10:42 am

I am trying to run a mixed-effects meta-analysis using SEM, with 4 dummy coded moderator variables as fixed effects and a random effect for the intercept. Several studies are missing data on one or more moderator variables.

When I run the model with TYPE=RANDOM, I get a warning indicating that listwise deletion was done and only 11 of my 22 cases were included in the model. However, when I run the model with both the intercept and moderators as fixed, I do not get this warning and all 22 cases are included.

Why is this and what do I need to do to use FIML for the mixed-effects analysis?

Thank you for your help.

Linda K. Muthen posted on Tuesday, January 31, 2012 - 2:26 pm

You are probably running an older version of the program where with all continuous variables the y with x model was estimated instead of the y given x.

Nancy Lewis posted on Tuesday, January 31, 2012 - 2:53 pm

I am using Version 6.11.

Linda K. Muthen posted on Wednesday, February 01, 2012 - 5:49 am

Please send your outputs and license number to support@statmodel.com so I can see what is happening.

katie bee posted on Wednesday, February 08, 2012 - 10:14 am

Professor Muthen,

I am using Mplus Version 6.1 and am using ML w/monte carlo integration, and have been unable to get fit statistics. I thought I read that beginning w/version 3, this would be possible? Do you have any suggestions?

Thank you.

Linda K. Muthen posted on Wednesday, February 08, 2012 - 2:17 pm

Chi-square and related fit statistics are not available when means, variances, and covariances are not sufficient statistics for model estimation.

Anonymous posted on Friday, February 17, 2012 - 10:14 am

Hi. I’m trying to unpack the defaults in Mplus (5.21) Re: the way it “adjusts” for observed control covariates, across different estimators.
I have a model: Latent Y regressed on latent X1 and a set of observed control covariates. I use the MLR estimator. It looks like the default is to give estimates of the covariances between the observed and latent covariates. My understanding is that one doesn’t need to “call in” the covariances amongst the observed covariates, in order to make sure that the fitted regression parameters control for the other variables in the model.
If I, say, use numerical integration here instead, it looks like the default is to NOT estimate covariances between the observed and latent covariates. Is it still the case that the regression parameters are adjusted for the effects of the covariates in the model (observed or latent)?
I ask b/c I fit the same model –w/ MLR and then w/numerical integration—I get notably different estimates for the effects of my observed covariates. W/ MLR, it looks like it might be adjusting for the other covariates, whereas w/ numerical integration it does not appear to be. If I “call in” the covariances between the latent and observed covariates w/ numerical integration, it looks like the MLR model w/out numerical integration. I suppose it could also be a difference in the way the two models handle missing data (?). Thanks for any thoughts.

Linda K. Muthen posted on Friday, February 17, 2012 - 1:51 pm

Please send the two outputs and your license number to support@statmodel.com.

gibbon lab posted on Monday, March 05, 2012 - 8:49 am

Hi Linda,

In one of your old posts (above), you mentioned "For censored and categorical outcomes using weighted least squares estimation, missingness is allowed to be a function of the observed covariates but not the observed outcomes." I was just wondering if you have a reference paper for this so that I can read more details. Thanks a lot.

Kathleen Berger posted on Monday, March 05, 2012 - 11:56 am

Hi Linda,

I am new to MPLUS and have version 6.12. I am running a simple CFA with one factor and 21 ordinal outcome variables. But, I have missing data (assuming MAR). I am a little confused as I have read different things in terms of whether the program 'handles' the missing data when missing is indicated. I have gotten the program to run and get fit indices, but am worried that the estimator being used (WLSMV) isn't appropriate. Is this ok?

thanks so much!

INPUT INSTRUCTIONS

TITLE: 1 FACTOR CFA OF COGNITIVE CAPACITY SAFETY TBI
DATA: FILE IS "C:\Users\kathy\Desktop\shepherd_safety_project\
ControlFIle_cg_cc3_3_2012.dat";
VARIABLE:
NAMES ARE cc1 cc2 cc3 cc4 cc5 cc6 cc7 cc8 cc9 cc10 cc11 cc12 cc13 cc14
cc15 cc16 cc17 cc18 cc19 cc20 cc21;
CATEGORICAL ARE cc1 cc2 cc3 cc4 cc5 cc6 cc7 cc8 cc9 cc10 cc11 cc12
cc13 cc14 cc15 cc16 cc17 cc18 cc19 cc20 cc21;
MISSING are all (9);
MODEL:
F1 BY cc1 cc2 cc3 cc4 cc5 cc6 cc7 cc8 cc9 cc10 cc11 cc12 cc13 cc14
cc15 cc16 cc17 cc18 cc19 cc20 cc21;
OUTPUT:
SAMPSTAT;
STAND;
RESIDUAL;
PATTERNS;
SAVEDATA:
FILE IS COGCAP_02122012cfa.DAT;
DIFFTEST IS DERIV.DAT;
FORMAT IS F2.0;
PLOT:
TYPE IS PLOT2;

Linda K. Muthen posted on Monday, March 05, 2012 - 2:18 pm

Gibbon:

There is no paper that describes this. With WLSMV the dependent variables are looked at in pairs so missing data information cannot be gathered from all variables like in maximum likelihood.

Linda K. Muthen posted on Monday, March 05, 2012 - 2:20 pm

Kathleen:

In your case with only one factor, I would recommend using MLR. As an alternative, you can do multiple imputation to generate data sets and use WLSMV to analyze them.

naT posted on Thursday, April 19, 2012 - 4:15 pm

I am having trouble reproducing residual variances from the estimated parameters of path analysis model with missing data.

I ran path models with and without missing variables. When I manually recalculated the residual variance using estimated parameters, I could only reproduce mplus residual variances when I have full data. I took out the cases with missing variables and recalculated again but my calculation still did not match with the mplus estimate of residual variance.

Could you please help me understand what is the problem. I built the model as single indicator factor analysis model. Thank you.

Bengt O. Muthen posted on Thursday, April 19, 2012 - 9:05 pm

I don't see how you can manually calculate the residual variance - it is estimated by ML I assume? You don't say if your variables are continuous or categorical, where in the latter case residual variances are not free parameters.

naT posted on Friday, April 20, 2012 - 2:19 pm

It is estimated with ML. All variables were continuous.

I just recalculated as the variance of difference between estimated dependent variable Y' and actual Y. Y' was calculated as Y'=intercept+coef*X. X and Y are observed, so I used the parameters (coef and intercept) produced by mplus to reproduce residual variance in a spreadsheet. Please let me know if I have misunderstood. I was able to reproduce the residual variance for full data but not for missing data.

Bengt O. Muthen posted on Friday, April 20, 2012 - 8:30 pm

You are assuming that estimated variance of Y equals the sample variance of Y. This is not true for all models. If you look at your missing data run, requesting Residual in the Output command, you will probably see that the difference between estimated and observed variance is not zero.

Mai Sherif posted on Tuesday, May 08, 2012 - 8:42 am

I am fitting a latent variable model that also includes random effects. Some of the items are binary while the rest are continuous. My data includes missing values as well.
The output that I get says that the dimension of numerical integration is 1 and it is actually very fast. I was wondering how the fitting takes place with only one dimension of numerical integration although I have seven latent variables and random effects? Is there a dimension reduction technique used by MPlus?

Thanks!

Bengt O. Muthen posted on Wednesday, May 09, 2012 - 8:27 am

We would need to see the output to tell. Please send to Support.

Amanda Hare posted on Monday, May 14, 2012 - 8:41 am

Hi there-

I am running some new analyses for a revise/resubmit and have encountered a problem. When I ran the following script back in November, the results yielded an N of 184:

USEVARIABLES ARE gender1y psautm1 psautm4 psycon1 psycon4;

MISSING IS .;

ANALYSIS:
TYPE IS MEANSTRUCTURE;

MODEL:
psycon4 on psycon1;
psycon4 on psautm1;
psycon4 on gender1y;
psautm4 on psautm1;
psautm4 on psycon1;
psautm4 on gender1y;
psautm1 with psycon1;
psautm4 with psycon4;

OUTPUT:
sampstat standardized;

Today, the same script with the same data is giving me a much smaller N. Can you help?

Amanda

Linda K. Muthen posted on Monday, May 14, 2012 - 9:05 am

Please send the two outputs and your license number to support@statmodel.com.

Mai Sherif posted on Wednesday, May 16, 2012 - 10:20 am

I have 4 u's (random effects) and 3 z's (latent variables) and the output below specifies that the dimension of numerical integration is only 3. I am just wondering why the dimension is only 3 and how the other latent variables are integrated.

usevar = G1-G4 L1-L4 N1-N3;

Categorical are N1-N3;

Missing are all (-9);

Analysis:

Estimator=MLR;

Model:

u1 by G1@1 L1@1 ;
u2 by G2@1 L2@1 ;
u3 by G3@1 L3@1 ;
u4 by G4@1 L4@1 ;

G1-G4;
L1-L4;

z1 by G1-G4 N1;
z2 by L1-L4 N2;
z3 by N1@1 N2@1 N3@1;

[N1$1 N2$1 N3$1];

z1 with z2;
z1 with z3;
z2 with z3;

u1 with u2-u4 @0;
u2 with u3-u4 @0;
u3 with u4 @0;
u1-u4 with z1-z3 @0;

Number of dependent variables 11
Number of continuous latent variables 7
Integration Specifications
Type STANDARD
Number of integration points 15
Dimensions of numerical integration 3
Adaptive quadrature ON
Link LOGIT

Mai Sherif posted on Wednesday, May 16, 2012 - 10:25 am

Another question I have is about dealing with survival data. Do we have to specify that some indicators are survival indicators? Or is it sufficient to have the survival items set up as in (Muthen and Masyn, 2005), and then MPlus will automatically model the conditional probabilities of survival (hazard function) rather than just a logistic model?

Many thanks.

Linda K. Muthen posted on Wednesday, May 16, 2012 - 10:42 am

Please send the output and your license number to support@statmodel.com.

For discrete-time survival, you do not need to specify that the indicators are survival indicators. You need to arrange the data and specify the model as shown in Example 6.19.

Please limit your posts to one window.

Eric Deemer posted on Wednesday, May 16, 2012 - 11:26 am

Hello,
I would like to determine the proportion of missing data in my data set. Under the DATA MISSING command, would I just ask for DESCRIPTIVES for the variables that I named as missing? Are frequencies provided as part of the DESCRIPTIVES output?

Eric

Linda K. Muthen posted on Wednesday, May 16, 2012 - 12:01 pm

Ask for the PATTERNS option in the OUTPUT command.
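
A minimal sketch of such a run, assuming a hypothetical data file mydata.dat with variables y1-y4 and a missing value flag of -99 (none of these names come from the posts above):

```
DATA:     FILE IS mydata.dat;      ! hypothetical file name
VARIABLE: NAMES ARE y1-y4;         ! hypothetical variable names
          MISSING ARE ALL (-99);   ! assumed missing value flag
ANALYSIS: TYPE = BASIC;            ! descriptive run only, no model
OUTPUT:   PATTERNS;                ! request the missing data patterns
```

The patterns output lists which variables are observed in each pattern together with the pattern frequencies, and the covariance coverage output gives the proportion of cases present for each variable and each pair of variables.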

Eric Deemer posted on Wednesday, May 16, 2012 - 12:13 pm

Thanks, Linda!

Eric

Mai Sherif posted on Thursday, May 17, 2012 - 3:38 am

Dear Linda,

Thanks a lot!

rs posted on Thursday, June 21, 2012 - 9:59 pm

Hi Linda,

I am conducting SEM analysis using data from cross-sectional study. Missing data analysis for some variables showed data are missing not at random (MNAR). I am wondering whether MLR will be able to handle data which are missing not at random? If not, will it help if I impute missing values using SPSS for these variables before I conduct the analysis using MPLUS?

Thank you.

Bengt O. Muthen posted on Friday, June 22, 2012 - 9:11 am

You want to make a distinction between MCAR, MAR, and NMAR (=MNAR). See missing data books, or our Topic 4 teaching. Mplus does MAR by default (often called FIML) and can also do NMAR modeling. Mplus also does multiple imputation. Multiple imputation assumes MAR.

What you are probably seeing is that MCAR does not hold. MAR may still hold. There is no way of knowing if MAR or NMAR holds.

Chris Blanchard posted on Friday, June 22, 2012 - 9:49 am

Hi there,
I'm doing a latent class growth analysis across 5 time points (baseline, 3, 6, 9, and 12 months) with missing data. In reading various literature, I understand I should use the "auxiliary" function to ensure my data is MAR. So, I've pasted my syntax below to ensure it is correct because my output doesn't seem to change with or without the auxiliary function (note, the sample size is 269 patients).

USEV ARE age(0,1) educ(0,1) employ (0,1) t1_PA t2_PA t3_PA t4_PA t5_PA;

MISSING = t2_PA t3_PA t4_PA t5_PA(999);

CLASSES = c(2);

AUXILIARY = (r) age educ employ;

ANALYSIS:
TYPE = MIXTURE;
STARTS 20 2;
MITERATION = 300;

MODEL:
%OVERALL%
i s | t1_PA@0 t2_PA@1 t3_PA@2 t4_PA@3 t5_PA@4;
i-s@0;

Any help would be greatly appreciated!

Chris

Eric Teman posted on Friday, June 22, 2012 - 8:13 pm

When it says missingness is not allowed on observed exogenous variables, does this include the indicators of latent exogenous variables?

Bengt O. Muthen posted on Friday, June 22, 2012 - 8:34 pm

Chris - you want to use auxiliary = (M) ..., not (R). See Topic 4.

Linda K. Muthen posted on Saturday, June 23, 2012 - 9:24 am

Eric:

Indicators of exogenous latent variables are endogenous variables.

rick gibbon posted on Monday, June 25, 2012 - 1:39 pm

Hi Prof Muthen,

In response to one of your previous comments
"FIML is an estimator and EM is one algorithm for computing FIML estimates. Other algorithms include Quasi-Newton, Fisher Scoring, and Newton-Raphson. Mplus uses the EM algorithm for the unrestricted H1 models and the other algorithms for H0 models. "

Aren't Quasi-Newton, Fisher Scoring and Newton-Raphson mathematical methods for finding a solution of an equation? How are they related to missing data? If using these algorithms for H0 models, is it true that the missing data were not taken into account for H0 models? Thanks.

Bengt O. Muthen posted on Monday, June 25, 2012 - 3:54 pm

I think I was trying to make a distinction between estimators and algorithms because it seems like sometimes missing data handling is referred to as using the "EM approach", which mixes apples and oranges. So the answer is Yes to your first question. Any of the algorithms can be used to do ML under MAR, which is often called FIML. So the answer to your second question is No - the use of these algorithms is unrelated to whether missing data is handled or not. You have to look to the assumptions made in the estimation to know how missingness is handled.

gibbon lab posted on Monday, June 25, 2012 - 6:11 pm

Hi Linda,

In response to one of your old comments
"There is no paper that describes this. With WLSMV the dependent variables are looked at in pairs so missing data information cannot be gathered from all variables like in maximum likelihood."

So pairwise likelihood uses information from each pair of the observed endogenous variables. But for those who have only one observed endogenous variable, there are no pairs available. Will those subjects be thrown away when using pairwise likelihood? Thanks.

Linda K. Muthen posted on Monday, June 25, 2012 - 8:07 pm

Missing data theory applies only to two or more dependent variables.

Katherine Pratte posted on Thursday, July 26, 2012 - 12:05 pm

I am including auxiliary variables in a latent growth curve model. Do I need to define what variables are binary or nominal before listing them in the auxiliary option?

Linda K. Muthen posted on Friday, July 27, 2012 - 10:27 am

Auxiliary variables are treated as continuous and should not be specified to be other than that. Using a nominal variable as an auxiliary variable would not work. You may want to create a set of dummy variables.

Miles Taylor posted on Friday, July 27, 2012 - 1:08 pm

I am having some trouble with coding in upgraded Mplus. My old version of Mplus was Version 4. If I ran a continuous growth curve in 4 I had to specify the TYPE=Missing option but once I did that it would handle both missingness on my observed Y's (from attrition) and missingness on my X variables. Now in the new version (6) TYPE=Missing is the default but my model is "kicking out" anyone with missingness on any X variable. It only used to do that when I modeled a noncontinuous Y variable. Is this some problem in my code or did the default FIML change such that it automatically drops cases with missing on the X variables?

Thanks,
Miles

Linda K. Muthen posted on Friday, July 27, 2012 - 2:12 pm

There was a default change in Version 6. It is described in the Version History on the website under Version 6.1.

Lisa M. Yarnell posted on Monday, July 30, 2012 - 9:47 am

Hi Drs. Muthens,

We are experiencing a drop from N = 9303 to N = 7717 in our model with only one exogenous variable.

However, we checked missingness on the exogenous variable, and there is an incidence of missingness of only n = 347.

Is there another reason that Mplus removes cases from analysis, other than missingness on an exogenous variable, such that we are losing almost 1600 cases?

Thank you.

Linda K. Muthen posted on Monday, July 30, 2012 - 12:07 pm

Please send the output, data, and your license number to support@statmodel.com.

Melissa Simard posted on Wednesday, August 01, 2012 - 8:24 am

Hi there,

I am trying to run through the steps of factorial invariance and ultimately run from these latent constructs a growth curve over 4 time points (continuous data).

I have imputed my missing data (resulting in 20 data sets) using the latest version of Amelia and created the list.dat file (as is done in example 11.5 of the user guide). However, while I no longer have missing data, I do have some variables (mean scores) that will be ultimately included in my model that are non-normal (not seriously however).

I would like to use an appropriate estimator that will allow me to use the TYPE = imputation command to summarize results and provide me with the information I require to examine the CFI and RMSEA CI to judge my model as I go through the steps of factorial invariance.

From what I can tell with my first attempt, using the MLR estimator (not the MLM, as it seems it's not available with TYPE=imputation) I cannot access the CI for the RMSEA (as it is provided if using the ML estimator with TYPE = imputation).

My question is how robust the ML estimator is with nonnormal data and if it would be appropriate to use this (knowing I have some nonnormality) so that I can get the fit information I require using the TYPE= imputation command.

Your advice would be invaluable.
Many thanks!

Linda K. Muthen posted on Thursday, August 02, 2012 - 11:18 am

With multiple imputation, the only fit statistic that has been developed is chi-square for ML. For the others, averages are given. I would run ML and MLR and see how different the standard errors are. If they are not that different, it would indicate that your variables are not that non-normal. I would then use ML.
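
As a rough sketch of that comparison, the fragment below shows only the commands that matter for it; the VARIABLE and MODEL commands are omitted, and list.dat is the list file mentioned in the question:

```
DATA:     FILE IS list.dat;     ! file containing the names of the imputed data sets
          TYPE = IMPUTATION;    ! analyze each data set and combine the results
ANALYSIS: ESTIMATOR = ML;       ! run once with ML, then again with ESTIMATOR = MLR
```

Running the same input twice, changing only the ESTIMATOR setting, gives two sets of standard errors that can be compared side by side.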

Ebrahim Hamedi posted on Tuesday, August 21, 2012 - 6:46 pm

Hi
Is there any way to obtain listwise deletion only for usevariable list not for all the variables (listed in names ARE .....)?

Many thanks,
Ebi

Linda K. Muthen posted on Tuesday, August 21, 2012 - 7:00 pm

You should get listwise deletion for the variables on the USEVARIABLES list if one is used. If you don't think this is the case, please send the files and your license number to support@statmodel.com.

Ebrahim Hamedi posted on Tuesday, August 21, 2012 - 7:22 pm

I have 13 groups, and am testing a five item CFA (but 150 items in the data set). The sample sizes shown in the "Number of observations" section of the result are 20 to 30% less than the real sample sizes. The only explanation that I can think of is that listwise deletion has been applied to all items mentioned in the "names are" part. Can you think of any other explanation? If not, I will talk to my supervisor to send the files. Thanks.

Linda K. Muthen posted on Wednesday, August 22, 2012 - 6:33 am

Look at the warning messages that are printed for possible reasons. Check that you are reading the data correctly. You may have blanks in the data set that are not allowed with free format. Check that the number of variable names in the NAMES list is the same as the number of columns in the data set.

Ebrahim Hamedi posted on Wednesday, August 22, 2012 - 7:53 pm

Hi
Thanks. I did not know that blanks are not allowed with free format. It worked out.

Mohsen

Bogdan Voicu posted on Tuesday, September 18, 2012 - 8:09 am

Hi,

I run a TWOLEVEL model. N is 72418. MPlus6.12 drops 42258 cases due to missingness on the x-variables. I have used SPSS to check for the total number of cases with at least a missing value and it is two times lower: 20765.

If I run a model with no predictor (just the dependent variable), there is no difference in the number of dropped cases reported by MPlus as compared to the one that I compute in SPSS. The more predictors I add, the higher the loss of cases when using MPlus (as compared to the value that I compute in SPSS).

Since I am not very experienced with MPlus, it is probably something that I miss, but I have no clue what this should be. Any suggestion would be more than welcome!

Linda K. Muthen posted on Tuesday, September 18, 2012 - 10:48 am

In Mplus, a case is dropped if it has missing values on one or more predictors. To explain the differences, you would need to send the relevant files and your license number to support@statmodel.com.

Calvin D. Croy posted on Thursday, October 04, 2012 - 5:11 pm

Do Mplus 6 and 7 use FIML to calculate the means and variances produced when Type = Basic?

I read hypothetical data for 6 obs and 3 vars into Mplus:
Var1 Var2 Var3
1 2 2
2 3 3
3 4 .
4 5 .
6 7 7
7 8 8

The Mplus printout shows mean = 4.833 and variance = 4.472 for both Var2 and Var3. For Var 2, these values are the same as produced by Excel and that I calculate by hand.

However, for Var3 Excel and my hand calculations show mean = 5.000 and pop. variance = 6.500. Are the Mplus values 4.833 and 4.472 different from these because of FIML?

Thanks for clarifying how FIML does or doesn't change the sample descriptive statistics produced in Mplus.

Bengt O. Muthen posted on Friday, October 05, 2012 - 9:46 am

Yes, FIML is used for BASIC. So you draw on information from other variables.
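
One way to see where 4.833 and 4.472 come from, as a worked sketch under normal-theory ML with MAR rather than a description of the Mplus internals: in the four complete cases Var3 equals Var2 exactly, so the regression of Var3 on Var2 has intercept 0, slope 1, and zero residual variance, and the Var3 moments are carried over from Var2, which is observed for all six cases. The symbols below (subscript 2 for Var2, subscript 3 for Var3, alpha and beta for the regression intercept and slope) are notation introduced here for illustration.

```latex
\begin{align*}
\hat\mu_2 &= \frac{2+3+4+5+7+8}{6} = \frac{29}{6} \approx 4.833 \\
\hat\sigma_2^2 &= \frac{2^2+3^2+4^2+5^2+7^2+8^2}{6} - \hat\mu_2^2
               = \frac{167}{6} - \frac{841}{36} = \frac{161}{36} \approx 4.472 \\
\hat\mu_3 &= \hat\alpha + \hat\beta\,\hat\mu_2 = 0 + 1 \cdot 4.833 = 4.833 \\
\hat\sigma_3^2 &= \hat\beta^2\,\hat\sigma_2^2 + \hat\sigma_{\mathrm{res}}^2 = 1 \cdot 4.472 + 0 = 4.472
\end{align*}
```

The complete-case values for Var3, by contrast, use only the four observed scores 2, 3, 7, 8, which give a mean of 5.000 and a population variance of 26/4 = 6.500. Var1 is also perfectly linearly related to Var3 in the complete cases, so conditioning on it as well leads to the same result.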

André Krug posted on Wednesday, November 28, 2012 - 7:09 am

Hello,

I have one question:

How can Mplus 7 show me means separated by a category such as treatment?

Thank you very much.

Linda K. Muthen posted on Wednesday, November 28, 2012 - 10:53 am

You can do a TYPE=BASIC for each group separately using the USEOBSERVATIONS option. Or if you want to test parameter differences, you can do a multiple group analysis.
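
A minimal sketch of the first option, assuming a hypothetical grouping variable treat coded 0/1 and hypothetical outcomes y1-y3 in a file called mydata.dat:

```
DATA:     FILE IS mydata.dat;             ! hypothetical file name
VARIABLE: NAMES ARE treat y1-y3;          ! hypothetical variable names
          USEVARIABLES ARE y1-y3;
          USEOBSERVATIONS = treat EQ 1;   ! keep only the treatment group
ANALYSIS: TYPE = BASIC;                   ! sample statistics only
```

Changing the condition to treat EQ 0 and rerunning gives the means for the other group.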

Miriam Forbes posted on Thursday, December 06, 2012 - 6:16 pm

Hello,

I have data that is neither MAR nor missing by design - we are measuring sexual dysfunctions as part of our analysis, and people that did not engage in sexual activity were unable to answer many of the questions, so have system missing values. We are using mixture models to analyse the relationships between sexual dysfunctions, depression and anxiety disorders.

I was wondering whether using FIML would be an acceptable way to deal with these cases, or if you have any other advice?

All the best,

Miriam

Linda K. Muthen posted on Friday, December 07, 2012 - 9:42 am

FIML is probably the best you can do. It is probably better than listwise deletion.

Miriam Forbes posted on Sunday, December 09, 2012 - 6:45 pm

Thanks Linda.

Miriam Forbes posted on Tuesday, December 11, 2012 - 4:15 pm

One more question - can we use FIML at a disorder-level (i.e., total scores from separate scales), or does it have to be at an item-level?

Thanks again,

Miriam

Linda K. Muthen posted on Wednesday, December 12, 2012 - 8:49 am

You can use FIML also for sum scores.

Xiaochen Chen posted on Thursday, January 10, 2013 - 12:08 pm

Hello, Dr. Muthen,

I tried to fit a multilevel regression model with missing data on the Y variable. I want to explore whether having an outgroup friend (level 1 predictor, dichotomous) influences attitudes toward the out-group. The Y variable (attitude) is continuous and is MCAR. Here is my syntax:
Variable:
Names are ID School Friend Attitude;
USEV= School Friend Attitude;
WITHIN = Friend;
MISSING are Attitude (99);
CLUSTER = School;
Analysis:
TYPE = TWOLEVEL RANDOM MISSING;

MODEL:
%WITHIN%
sfriend | Attitude on Friend;
%BETWEEN%
Attitude sFriend;
Attitude with sfriend;

I got the warning "Data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis."

I don't know why Mplus used list-wise deletion to deal with missingness on Y variable? How can I use FIML instead?

Many thanks!

Linda K. Muthen posted on Thursday, January 10, 2013 - 1:10 pm

The missing data theory of FIML does not apply to observed exogenous variables. The model is estimated conditioned on these variables. This is not listwise deletion. FIML is being used.

Bengt O. Muthen posted on Thursday, January 10, 2013 - 2:02 pm

You can use the PATTERNS option in the OUTPUT command to check your missing data patterns on your Y variables.

Christopher R. Beasley posted on Friday, January 18, 2013 - 2:10 pm

I am preparing data for an MSEM path model that will be examined in Mplus for my dissertation. The extent of missing data is less than 5%. I am considering two options for handling the missing data.

1) Impute missing data using EM before running the model. This would allow me to retain available data for summary scores.

2) Run the model in Mplus using FIML estimation of missing data. This is a more accurate estimation but would be based on less information, because the summary scores would be missing for any case missing data on any item in the measure. The extent of missing data would also be greater because of this handling of the missing data.

What might be the advantages and disadvantages of using EM vs. FIML in this instance?

Bengt O. Muthen posted on Friday, January 18, 2013 - 3:13 pm

Imputation and FIML should give quite similar results. Typically, if you can do FIML that gives you more options for various tests. Note that Mplus does imputation. I don't see how there would be a greater extent of missing data with FIML than imputation.

Regarding the summary scores, why not use the average value of the items that are not missing. The imputed values don't carry new information anyway.
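
A sketch of how such an average could be computed inside Mplus, assuming hypothetical items item1-item4 and a hypothetical covariate x; how the MEAN function of the DEFINE command treats missing items should be checked against the DEFINE chapter of the User's Guide for your version:

```
VARIABLE: NAMES ARE item1-item4 x;        ! hypothetical variable names
          USEVARIABLES ARE x score;       ! variables created in DEFINE go last
          MISSING ARE ALL (-99);          ! assumed missing value flag
DEFINE:   score = MEAN(item1 item2 item3 item4);  ! average of the items
```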

Hicham Raïq posted on Monday, January 21, 2013 - 10:06 am

I am working on SEM with categorical variables. In the output of my results, I have this warning:
Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 105

The number of my observations is 2388.

What method can you suggest for dealing with the missing data: listwise deletion or pairwise deletion? Some authors propose maximum likelihood estimation for incomplete data. But this option doesn't work in my case because my analysis is type=complex.

Thank you for any advice on this situation.

Bengt O. Muthen posted on Monday, January 21, 2013 - 8:14 pm

Is your question about how to include the 105 cases?

Hicham Raïq posted on Thursday, January 24, 2013 - 6:51 am

Maybe I should include those cases, but is that the best method for dealing with the missing data?

Thanks

Linda K. Muthen posted on Thursday, January 24, 2013 - 12:00 pm

Missing data theory does not apply to observed exogenous covariates. If you want to include those cases in your analysis, you need to use multiple imputation. See DATA IMPUTATION in the user's guide.
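
A minimal sketch of an imputation run, assuming hypothetical variables y, x1, and x2, a missing value flag of -99, and a hypothetical file name; it does not reproduce the categorical outcomes or the TYPE=COMPLEX setup described in the question:

```
DATA:            FILE IS mydata.dat;       ! hypothetical file name
VARIABLE:        NAMES ARE y x1 x2;        ! hypothetical variable names
                 MISSING ARE ALL (-99);    ! assumed missing value flag
DATA IMPUTATION: IMPUTE = x1 x2;           ! variables whose missing values are imputed
                 NDATASETS = 20;           ! number of imputed data sets
                 SAVE = imp*.dat;          ! asterisk is replaced by the data set number
ANALYSIS:        TYPE = BASIC;             ! impute from an unrestricted model
```

The saved data sets can then be analyzed in a second run using TYPE = IMPUTATION in the DATA command.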

Yalcin Acikgoz posted on Sunday, March 31, 2013 - 12:59 pm

Dr. Muthen,

Below is from a post of yours dated December 15, 2011 - 2:21 pm:

"If a study has planned missingness, a missing value flag should be assigned to individuals who did not take the item. Nothing more needs to be done."

I am working on data with skip patterns such that there are some variables that are responded by only a subset of participants. I have two questions:

1) When using multiple imputation, can I impute such that only those who should have responded but refused are imputed? I don't want values for those who are valid skips. If I set valid skips to missing, they will be imputed. If I do not set to missing but assign them values, the program will use those values as valid responses while imputing missing data. What should I do?

2) The model I am working on includes these variables that are responded by only a sub-sample. When modeling these variables, is there anything that needs to be done? Or do I just use them just like any other variable in the model?

Thank you in advance!

Jenny L. posted on Friday, May 31, 2013 - 10:56 pm

Dear Professors,

If the missing data are not specified in the command, how would they be treated by Mplus? In my data set, my missing data were blank; I forgot to specify missing is blank in the first place, but still got an output with no error message. I'm curious how those missing data were treated.

Thank you in advance for your help.

Jenny L. posted on Friday, May 31, 2013 - 11:12 pm

I should also mention that when missing values were not specified, Mplus still seemed to use all data (i.e., the sample size was the number of all participants I had).

Linda K. Muthen posted on Saturday, June 01, 2013 - 8:37 am

If you do not declare a value as a missing value flag using the MISSING option, it is treated as a valid value. All cases will be used.

If you have a blank in the data set with fixed format and do not declare it as missing, it is treated as a zero. Blanks are not allowed with free format data. They cause the data to be misread.

Jenny L. posted on Saturday, June 01, 2013 - 9:04 am

Thank you for the clarification!

Louise Mewton posted on Tuesday, July 30, 2013 - 4:45 pm

Hi there - I am conducting a longitudinal CFA with one factor, four non-normal continuous indicators and two time points, pre-intervention and post-intervention. I have about 650 cases. I am conducting tests of measurement invariance (partial strict invariance is supported) and the aim of the study is to determine the effect of the intervention on the latent mean (so the reduction in the latent mean from pre to post, which is significant). I am able to run the analyses no problem, but the problem is that about 50% of the post-intervention data is missing from drop out. I would rather use MLR and all cases rather than perform listwise deletion, but I'm not sure what the impact of such a large amount of missing data would have. Any help would be really appreciated.

Thanks,
Louise

Bengt O. Muthen posted on Tuesday, July 30, 2013 - 5:23 pm

The Mplus default is MAR using all cases, which is obtained when requesting either ML or MLR. But you are right that 50% attrition is a lot and that means that the results depend to an uncomfortably large extent on the model assumptions, including normality. It can be particularly problematic if the missingness rate is different for the intervention groups. I assume it is not possible to try to find a random sample of those who were lost to follow-up.

Louise Mewton posted on Tuesday, July 30, 2013 - 5:42 pm

Wow! Thank you for your very prompt reply.

The intervention is done online in an open access research setting, so we have no control group just the intervention and no contact with the patient once they drop out. The indicator variables are very much left censored.

I'm gathering you'd suggest using completer only data?

Thanks again,
Louise

Bengt O. Muthen posted on Tuesday, July 30, 2013 - 6:34 pm

No, using completer only data would probably be worse. The best you can do is probably using all data via MLR (assuming MAR).

Louise Mewton posted on Tuesday, July 30, 2013 - 6:50 pm

Thank you very much.

SYoon posted on Friday, August 09, 2013 - 12:07 pm

Hi, I am using MPlus 7.1
I have missing values in independent and dependent variables. Outcome variables are continuous but mediator variables are categorical.

I wanted to handle them with FIML but then I got this error message.

TYPE IS MISSING;
ESTIMATOR IS ML;
*** ERROR
MODEL INDIRECT is not available for analysis with ALGORITHM=INTEGRATION.

So I changed the syntax like this:

TYPE IS MISSING;
ESTIMATOR IS WLSM;
*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 953

Do you have any suggestions? Thank you.

Bengt O. Muthen posted on Friday, August 09, 2013 - 1:03 pm

Because your mediator is categorical you have to pay special attention to how to treat the mediator in the modeling. Call it u and let u* be an underlying continuous latent response variable for u. The key question is if u or u* is the predictor (IV) for the distal outcome y. ML uses u which complicates matters. WLSMV uses u*. Bayes can use either. More correct causal effects are obtained as in the paper on our website:

Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus.

A simple approach is to use WLSMV and include the x variables in the model by mentioning their means or variances. This also enables Model Indirect. You can also do Multiple Imputations as a first step to handle the missingness on the x's.

SYoon posted on Friday, August 09, 2013 - 3:17 pm

Thank you so much for the specific direction. I've tried two cases according to your comment.

1)
TYPE IS MISSING;
ESTIMATOR IS WLSMV;

And then I included variances of x-variables in my original model-

ses black hisp asian white male urbanicity public;

but then, I've got this message.
WARNING: VARIABLE URBANICI MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.

Does WLSMV hold the missing at random assumption? If I didn't include variances or means of x-variables, then it just deletes missing values in x-variables?
just,
TYPE IS MISSING;
ESTIMATOR IS WLSMV;

2) I also tried to do multiple imputation of outcome variables. (but not mediators)

TYPE IS imputation;
*** ERROR in DATA command
There are fewer NOBSERVATIONS entries than groups in the analysis.

I would appreciate it if you have further suggestion.

Thank you.

Bengt O. Muthen posted on Friday, August 09, 2013 - 3:50 pm

WLSMV is not the best estimator for handling missing data. ML and Bayes are better because they are full-information estimators. But with ML you can't have a u* mediator. So why not try Bayes?

Regarding 2), you would have to send your files to Support to diagnose the problem.

SYoon posted on Friday, August 09, 2013 - 3:56 pm

I tried Bayes as well but it doesn't seem to handle indirect effect model here.

MODEL INDIRECT is not available for analysis with ESTIMATOR=BAYES.

When I use ML then I get this message.

MODEL INDIRECT is not available for analysis with ALGORITHM=INTEGRATION.

Should I stay with WLSMV in this case?

I greatly appreciate your help.

Bengt O. Muthen posted on Friday, August 09, 2013 - 6:25 pm

The fact that Model Indirect is not available should not deter you. You can express the effect yourself using Model parameter labels that you use in Model Constraint.
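
A minimal sketch of that idea, assuming a single mediator and the hypothetical variable names x, m, and y (none of these come from the original model). The labels a and b and the new parameter ind are illustrative. The simple product a*b is one common definition of the indirect effect; with a categorical mediator it may differ from the causally defined effects discussed in the paper cited earlier in the thread.

```mplus
MODEL:
  m ON x (a);        ! path from predictor to mediator, labeled a
  y ON m (b);        ! path from mediator to outcome, labeled b
  y ON x;            ! direct effect of x on y

MODEL CONSTRAINT:
  NEW(ind);          ! hypothetical name for the indirect effect
  ind = a*b;         ! estimate and s.e. are reported with the other results
```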

boydadavid posted on Monday, September 23, 2013 - 8:45 am

Hi, I have a question regarding the covariance coverage output. I see that one of my variables has a number of asterisks (******) beside it.

What does this mean?

Covariances
                 SEXF         AGE        EDUC      INCOME         SWD
             ________    ________    ________    ________    ________
SEXF            0.249
AGE            -0.158     154.638
EDUC            0.059      -5.428       2.733
INCOME       -195.182   31807.575    1612.430 ***********
SWD             0.015       0.474      -0.029      71.745       0.068
NEVERMAR       -0.007      -2.898       0.106   -1065.354      -0.021
CHRONIC         0.010       0.691       0.025    -148.704       0.005

Linda K. Muthen posted on Monday, September 23, 2013 - 2:25 pm

This means that income has a variance so large that it will not fit in the space allocated. We recommend keeping the variances of continuous variables between one and ten. You can rescale variables using the DEFINE command by dividing them by a constant so that their variances are between one and ten, for example,

y = y/10;
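
Applied to the INCOME variable in the output above, a sketch might look like the following. The divisor is an assumption, not a value taken from the thread; it should be chosen after looking at the sample variance so that the rescaled variance lands roughly between one and ten.

```mplus
DEFINE:
  income = income/1000;   ! hypothetical divisor; adjust to the data
```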

Eric Deemer posted on Thursday, September 26, 2013 - 6:57 pm

I'm trying to calculate the percentage of missing data in my data set. If I specify "missing = all(-999)", for example, is there a command I can use to determine the frequency with which the value "-999" is observed? Thanks.

Eric

Linda K. Muthen posted on Friday, September 27, 2013 - 7:41 am

The PATTERNS option of the OUTPUT command will show you the patterns of missing data.
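
A minimal sketch of an input that requests this output, assuming a hypothetical data file and variable names. With TYPE = BASIC no model is needed. The covariance coverage values reported for each variable give the proportion of data present, so one minus that value is the proportion missing.

```mplus
DATA:
  FILE = mydata.dat;        ! hypothetical file name
VARIABLE:
  NAMES = y1-y4 x1 x2;      ! hypothetical variable names
  MISSING = ALL (-999);
ANALYSIS:
  TYPE = BASIC;             ! descriptive summary only
OUTPUT:
  PATTERNS;                 ! missing data patterns and frequencies
```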

una posted on Thursday, November 07, 2013 - 6:18 am

Dear Prof. Muthen,
I am running an analysis with MLR. If I understand correctly, the default is that Mplus deletes cases with missing values on exogenous observed variables and uses full information for missing values on the endogenous variables? What are the advantages of using this approach? Do you have a reference to read more about this?
Thank you very much in advance,

Linda K. Muthen posted on Thursday, November 07, 2013 - 6:27 am

The advantages are using all available information rather than using listwise deletion. See the Little and Rubin reference in the user's guide.

milan lee posted on Friday, December 06, 2013 - 8:35 am

Hi,
I wanted to test the missing pattern of my dataset based on Little’s MCAR test (Schlomer, Bauman, & Card, 2010) using Mplus. I checked out posts on the Mplus forum and it looks like we have to use "type=mixture" to obtain Little's MCAR test and the variables have to be categorical. However, I doubt I understand it well. There has to be a general command for testing MCAR and MNAR for imputation in a general regression model (not a complex mixture model). May I have your advice on how to conduct this MCAR test with a chi-square value in Mplus? What is the command syntax for this test with continuous variables and simple regression models?
Thank you very much!

Tihomir Asparouhov posted on Friday, December 06, 2013 - 3:15 pm

The MCAR test we give for categorical variables with Mixture is not Little's MCAR test. It comes from

Fuchs (1982) Maximum Likelihood Estimation and Model Selection in Contingency Tables With Missing Data J of Amer Stat Assn.

About NMAR I would recommend reading

http://statmodel.com/download/Muthen%20et%20al%202011-Psych%20Meth-Growth1.pdf

In particular, MAR vs. NMAR testing is conditional on assumptions about the missing data mechanism. So I would say it is somewhat limited (that has nothing to do with which software you use - it comes from the fact that the MAR hypothesis is very very general - it is hard to test against any ignorable missing data mechanism).

Little's test is not available in the current version of Mplus.

milan lee posted on Friday, December 06, 2013 - 7:03 pm

Thanks a lot for your explanation, Tihomir! Very very informative and helpful!!!

Matteo posted on Wednesday, January 15, 2014 - 9:58 am

Dear MPlus developers, I'm trying to understand the exact algorithm that you use for dealing with missing data through maximum likelihood.
Reading classical papers on the topic, I thought that there exists a closed form for the maximization of the full information maximum likelihood problem with missing data only when the outcomes can be considered multivariate normal, while in all the other cases, so for example with categorical outcomes, we need iterative methods like the EM algorithm. Is this what MPlus does? Or am I wrong? Sorry for bothering you, but I didn't find this information anywhere,
Thanks in advance.

Tihomir Asparouhov posted on Wednesday, January 15, 2014 - 11:22 am

Appendix 6 in
http://statmodel.com/download/techappen.pdf
gives some information on the missing data estimation. A closed form expression is available for all the models estimated with ML. Mplus does not use the EM algorithm to deal with missing data. The general ML estimation is described in
http://statmodel.com/download/ChapmanHall06V24.pdf

Matteo posted on Thursday, January 16, 2014 - 3:51 am

Dear Tihomir, thank you very much for your answer! I had already seen Appendix 6, but I didn't find what I was looking for and also I thought it was a little bit out of date, since it starts by saying "Missing Data is allowed for in cases where all y variables are continuous and normally distributed", while I read in the general description of modelling missing data that "MPlus provides ML estimation under MCAR (missing completely at random), MAR (missing at random), and NMAR (not missing at random) for continuous, censored, binary, ordered categorical (ordinal), unordered categorical (nominal), counts, or combinations of these variable types (Little & Rubin, 2002)."
That is also what I'm mostly interested in.
I will take a look at the second reference you gave me. Thank you very much again for your answer!

Anonymous posted on Monday, January 20, 2014 - 2:52 pm

In SEM with the criterion variable having categorical indicators with missing values (the predictor variables have continuous indicators with missing values), can I use FIML?
Thank you very much in advance!!

Bengt O. Muthen posted on Monday, January 20, 2014 - 5:12 pm

Yes.

Alexis posted on Saturday, February 22, 2014 - 3:27 am

Hi,
I’m running a model with ML estimation. By default Mplus deletes cases with missing values on exogenous variables and uses full information for missing values on the endogenous variables. The result is, however, that I’m still missing 33 cases due to missings on five exogenous variables. On this forum, I read that Mplus can handle missings on x-variables if they are brought into the model as y-variables, for instance, by mentioning the variance of the variable in the model command. But I’m not sure what the effect of this is and what exactly I’m doing. Isn’t it a bit artificial? So I’m wondering what the normal/standard procedure is for handling missings. Do you follow the default and accept that you have 33 missings due to missings on x-variables? Or do you do some tricks/adjustments to make your x-variables look like y-variables so that no cases are deleted? If the latter option is the standard/best approach, could you please tell me what precisely I should add to my model command, given that I have five exogenous variables of which three are dichotomous and two are continuous? Is it just as simple as adding “X1 X2 X3 X4 X5;” into the syntax?
Thank you very much in advance.

Bengt O. Muthen posted on Saturday, February 22, 2014 - 3:34 pm

(1) Usually in multiple imputation normality is assumed for the variables with missing. This approach is also taken if you bring the x variables into the model. (2) Or, you can use multiple imputation and specify that a variable is say categorical and then an underlying continuous-normal latent response variable is assumed. Both approach (1) and (2) therefore make assumptions. Treating your binary x's as continuous-normal in approach (1) is only an approximation and taking approach (2) may also be only an approximation. So in both cases you go beyond the assumption of the model you originally specified for the y's as a function of the x's. Approach (1) is probably very often taken. You mention 33 missings, but more relevant is probably the percentage that this corresponds to - if it is small the analysis doesn't rely on assumptions as much as if it is large.

Natalie Bohlmann posted on Tuesday, February 25, 2014 - 8:25 pm

question about missing data.

I am analyzing some longitudinal data in a cross-lag model (N = 250) and have 30% of subjects missing data at T1, 26% at T2 and 38% at T3. Relatedly, 36% have data at all three time points, 34% at 2 of 3, and 30% at only 1 of 3.

Based on examining correlations within the sample, these appear to be MAR and we have included correlated variables in the analysis as auxiliary variables to help with estimation and reduce bias.

We submitted the paper and both the handling editor and 1 of the reviewers expressed concern regarding the % missing. Do you know of any references that give guidelines on acceptable levels of missingness? Or do you have a personal rule of thumb?
We have cited McArdle et al 2004 & Enders & Bandalos, 2001 on the use of FIML to address missingness. And Enders 2010 on the use of auxiliary variables. Any other suggestions? thank you very much for your ideas/suggestions.

Linda K. Muthen posted on Wednesday, February 26, 2014 - 11:34 am

I would be most concerned about how many observations are present at two of the three time points. I would also compare my results to those of listwise deletion. You can also create dummy variables for missing for times two and three and regress them on the time 1 outcome to see if y1 predicts missingness.
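
A rough sketch of the dummy-variable check, assuming hypothetical variable names y1-y3 for the outcome at the three time points and a hypothetical missing value flag of -999. The indicator names m2 and m3 are illustrative, and the exact DEFINE statements should be checked against the User's Guide.

```mplus
VARIABLE:
  NAMES = id y1 y2 y3;         ! hypothetical variable names
  USEVARIABLES = y1 m2 m3;     ! variables created in DEFINE go last
  MISSING = ALL (-999);
  CATEGORICAL = m2 m3;         ! binary missingness indicators
DEFINE:
  m2 = 0;
  IF (y2 EQ _MISSING) THEN m2 = 1;   ! 1 = missing at time 2
  m3 = 0;
  IF (y3 EQ _MISSING) THEN m3 = 1;   ! 1 = missing at time 3
MODEL:
  m2 m3 ON y1;                 ! does y1 predict missingness?
```

A clearly nonzero slope for y1 would suggest that missingness depends on the time 1 outcome, which is consistent with MAR rather than MCAR.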

I don't know of any discussion of how much missingness is too much. The Enders book is the most likely source.

Lorena Llosa posted on Tuesday, April 22, 2014 - 1:33 pm

We are conducting a randomized control trial and we are doing multilevel modeling to determine program effect. According to the What Works Clearinghouse (WWC), when dealing with missing data in our analyses we need to do so separately for the treatment and control groups. Can we use FIML separately for the treatment and control groups?

The other alternative that WWC accepts is multiple imputation but this also needs to be done separately for treatment and control groups. Is there a way to do this within Mplus?

Linda K. Muthen posted on Wednesday, April 23, 2014 - 10:48 am

You can do this using multiple group analysis.

Han-Jung Ko posted on Monday, April 28, 2014 - 4:55 pm

Dr. Muthen,
I am running a 3-step GMM model with 5-wave scale scores as my outcome variables and a few independent variables to predict the class membership (e.g., race and LGS scores).
Three participants did not indicate their race and one has a missing value on LGS. To keep them in the step 3 analysis, here is the model syntax:
Model:
%OVERALL%
i s | pl_1@0 pl_2@1 pl_3@2 pl_4@3 pl_5@4;
i WITH s;
c on AA sex LGS z_t1_c z_t1_n z_t1_e highschool somecoll college;
[AA LGS];
...
The analysis did include all 163 participants. However, the AIC, BIC, and ABIC are much larger than in the model without estimating AA and LGS. I wonder what your advice would be given this situation.
Thank you.

Linda K. Muthen posted on Tuesday, April 29, 2014 - 6:05 pm

When you bring the covariates into the model, the metric of the AIC etc. changes.

By the way, you must bring in all of the covariates not just a subset.

Han-Jung Ko posted on Tuesday, April 29, 2014 - 11:11 pm

Thank you, Dr. Linda.

When I bring in all the covariates, the model could not converge. I was wondering whether it was because of a small sample size, around 160?

Linda K. Muthen posted on Wednesday, April 30, 2014 - 6:13 am

Please send the output and your license number to support@statmodel.com.

Jiawen Chen posted on Tuesday, May 27, 2014 - 1:57 pm

Dr. Muthen,

I'm running a latent growth model in which the latent intrinsic work rewards variable at each of the six waves was specified by four items, and then intercept and slope were estimated using the six latent variables. Finally, I used the intercept and slope to predict generativity at the final wave along with some control variables.

My concern is that there are two types of missing data here: those who did not participate in a given wave (missing at random) and those who participated but did not answer intrinsic work rewards questions because they were unemployed at the time (missing not at random). I'm fine with having FIML estimate for those who missed the wave, but not comfortable estimating for those who participated the wave but were unemployed. Would it be possible for you to give me some suggestions on how to restructure my model to account for this missing not at random? Would it be appropriate to add six dummy variables, one for each wave's employment status, when predicting generativity to address the concern of unemployment? Or should I revise in some way the longitudinal CFA model in the earlier step? Thank you very much.

Bengt O. Muthen posted on Tuesday, May 27, 2014 - 6:38 pm

It's a good research question that I don't think I know the answer to. I wonder what it would be like if you use a parallel process growth model where you have one binary part of employed/unemployed and one continuous part with an intrinsic work reward score. The latter is missing when the former is in the unemployed status. Which would mean missing as a function of an observed variable so could be MAR. At least the missing would not only be a function of other intrinsic work reward scores, but directly a function of employment.

Tania Bartolo posted on Monday, June 23, 2014 - 12:16 pm

Hello,

I am planning on running path analysis models involving examination of direct and indirect effects on a sample (N = 159) with data at 3 different time points. My variables of interest are scale scores (means of multiple items from questionnaires). The sample has both item-level missingness (1 or more items of a scale missing) and scale-level missingness (entire scale missing) for both predictor and outcome variables and I am seeking advice on the best approach to deal with missingness. I had the following questions:

1) Should item-level missingness be dealt with using multiple imputation as a first step in software outside of Mplus? And should this then be followed by maximum likelihood estimation at the scale-level in Mplus? Or:

2) Should item-level and scale-level missingness be dealt with using multiple imputation outside of Mplus and the imputed "complete" dataset be used for subsequent analyses?

Any clarification would be much appreciated!

Bengt O. Muthen posted on Monday, June 23, 2014 - 5:06 pm

The optimal approach would seem to be to formulate a factor model for the item indicators for each factor and then simply use FIML (so assuming MAR). The practical problem arises from the one-factor models maybe not fitting the data well. Imputation could use a less restrictive model. But then again, the scales you mention probably are sums of items which in itself assumes a one-factor model.

Tania Bartolo posted on Tuesday, June 24, 2014 - 8:50 pm

Hi Bengt,

Thanks for your quick reply! The scales mentioned are actually subscales from questionnaires with more than one factor.

Just to clarify, are you suggesting running a full SEM instead of a path analysis model to assist with item-level missingness using FIML? I am concerned doing so would be difficult given my sample size.

Thanks!

Tania

Linda K. Muthen posted on Wednesday, June 25, 2014 - 2:24 pm

Yes. It may be more difficult but it would yield a better analysis.

RSrinivasan posted on Saturday, July 05, 2014 - 11:51 am

Hello Drs. Muthen,

I am a grad student working on a project with missing data in almost all variables. I have 5 latent variables, including 2 exogenous variables. I tried the syntax for missing data, but the error keeps asking me to add LISTWISE=ON and NOCHISQUARE in the output. I did that as well, but the error keeps coming back. Please help.
I would not want the program to delete a complete data set for a few missing data points. Here is my syntax-

Data:

FILE IS "I:\Data\XYZWV.csv" ;
LISTWISE=ON;

Variable:

Names are

x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
w1 w2 w3 w4 v1 v2 v3 v4
gender age marital edu income city pur
apppur amtpur;

Usevariables are

x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
w1 w2 w3 w4 v1 v2 v3 v4;

missing = all (-999);

analysis:
type = missing;

MODEL:

x by x1-x4;
y by y1-y4;
z by z1-z4;
w by w1-w4;
v by v1-v4;

OUTPUT: MODINDICES standardized nochisquare;

*** WARNING in ANALYSIS command
Starting with Version 5, TYPE=MISSING is the default for all analyses.
To obtain listwise deletion, use LISTWISE=ON in the DATA command.
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

Linda K. Muthen posted on Saturday, July 05, 2014 - 3:03 pm

If you want the model estimated using all available information, remove LISTWISE=ON; from the DATA command. The warning is just informing you that the default is using all available information. If you get an error when you remove LISTWISE=ON, send that output and your license number to support@statmodel.com.
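
As a sketch, the DATA command from the input above would then read as follows, with the rest of the input left as posted. Since the warning states that TYPE=MISSING is the default starting with Version 5, the ANALYSIS command could presumably be dropped as well; that part is an assumption rather than something stated in this reply.

```mplus
DATA:
  FILE IS "I:\Data\XYZWV.csv";
  ! LISTWISE=ON; has been removed so that all available data are used
```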

Hillary Schafer posted on Monday, July 14, 2014 - 8:23 am

Hello, I would like to conduct a model-based (H1) multiple imputation analysis but my model contains a latent variable interaction. I specified a Bayesian estimator but a message in the output file says that Bayesian estimation is not allowed with latent variable interaction and so the default estimator (ML) was used. However, I also see specifications for Bayesian estimation in the Summary of Analysis section. Will you please tell me whether the resulting imputed data were imputed using a Bayesian estimator or a ML estimator? Also, can I be sure that the imputation was done under my H1 model and not under H0? Thank you.

RSrinivasan posted on Monday, July 14, 2014 - 8:52 am

Thanks Dr. Muthen.

Hillary Schafer posted on Monday, July 14, 2014 - 10:20 am

My apologies - I had the H0 and H1 imputation language reversed. I see now that an H1 model was used for imputation in the model that I described. Thank you.

Kate Mulvihill posted on Tuesday, August 19, 2014 - 11:11 am

Hello, I am new to Mplus and am using version 6.

I am trying to conduct multiple imputation for specified variables prior to analyses for my MA Thesis. The problem I have encountered is that missing data that appear as 999 in my SPSS data file and in my .dat data file, which Mplus is reading, appear as asterisks in all of the imputed files. I checked, and each 999 in my original data sets appears as an asterisk in the imputed files. I conducted a cross-sectional version of the same multiple imputation and subsequent analyses last week, and did not encounter this problem. I copied and pasted the same input file for the current data sets. Could you help to guide me toward my error? My apologies for the basic question.

Linda K. Muthen posted on Tuesday, August 19, 2014 - 11:52 am

Mplus uses the asterisk as the missing data symbol when data are saved. When you read the data, you must say

MISSING = *;

Be sure to note that the variables may not be in the original order. All information about the format and variable order are given at the end of the output where the imputed data sets are saved.
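
A minimal sketch of an analysis input that reads the saved imputed data sets, assuming a hypothetical list file name and hypothetical variable names. The NAMES list should follow the variable order reported at the end of the imputation output, not the order in the original file.

```mplus
DATA:
  FILE = implist.dat;      ! hypothetical name of the file listing the imputed data sets
  TYPE = IMPUTATION;
VARIABLE:
  NAMES = y x1 x2;         ! hypothetical; use the order from the imputation output
  MISSING = *;             ! asterisk is the saved missing data symbol
MODEL:
  y ON x1 x2;
```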

Kate Mulvihill posted on Tuesday, August 19, 2014 - 12:00 pm

Hello, Linda.

Thank you for your reply.

I don't understand what you mean by "when data are saved."

Likewise, I don't understand what you mean by "when you read the data." Do you mean when I run the input file for the analysis, or when I run the multiple imputation input file? In the latter, I specified MISSING = ALL (999); under VARIABLES.

I made sure that the order of variables in the dat file, VARIABLES list, and IMPUTE VARIABLES list are in the same order.

I just re-did the entire process, starting from my SPSS file. The same asterisks appear in my imputed files. I ran some simple multiple regression analyses to see what would happen. I specified IMPUTATION as TYPE and gave the correct list name to retrieve my imputed files. I got a full set of output, even though there are still asterisks in the imputed files and I did not specify MISSING in the regression input.I got a warning saying that the CHI-2 test could not be conducted perhaps due to a large amount of missing data. This is the only warning I received despite having the asterisks in the files.

Thank you for any further clarification you can provide!

Linda K. Muthen posted on Tuesday, August 19, 2014 - 1:23 pm

Please send the relevant files and your license number to support@statmodel.com.

Eric Deemer posted on Tuesday, September 30, 2014 - 7:23 pm

I'm wondering why Mplus doesn't use cases with missing data on predictors with FIML? From what I've read, it's not possible to use cases with missing data on Y under FIML but still possible (albeit more difficult) to use cases with missing values on X but observed values on Y. Any light you can shed on this would be helpful. Thanks so much.

Eric

Linda K. Muthen posted on Wednesday, October 01, 2014 - 6:58 am

Missing data theory applies to dependent variables. Missing data theory does not apply to observed exogenous variables because the model is estimated conditioned on these variables. You can mention the variances of the observed exogenous variables in the MODEL command. This causes them to be treated as dependent variables and distributional assumptions are made about them but they will be used in the analysis.

Eric Deemer posted on Wednesday, October 01, 2014 - 8:00 am

Thanks, Linda. That helps a lot.

Best,
Eric

Briana Chang posted on Tuesday, November 11, 2014 - 10:39 am

Hello Linda and Bengt,

I'm hoping this is a simple question and I'd like to be sure about it before I proceed. I'd like to bring covariates into the model so that FIML is used to handle missing on the covariates. Can you do this for binary covariates with missing values?

Thank you.

Linda K. Muthen posted on Wednesday, November 12, 2014 - 5:51 am

Yes. Note that if you want to bring one covariate into the model, you must bring all covariates into the model. You cannot bring in just a subset.

Yoosoo posted on Saturday, January 03, 2015 - 10:55 am

Hello,

I am running a SEM with 16 endogenous observed variables and 4 latent variables using WLSMV estimator with default missing data option.

The data summary tells me that the number of missing data pattern is 29, and the reported # of observations is 48000, which is the total # of samples.

Is there a way that I can find out the number of incomplete observations that were imputed and included in the analysis?

Thank you for your helpful support as always.

Bengt O. Muthen posted on Saturday, January 03, 2015 - 5:20 pm

There is no imputation done with WLSMV. The pairwise present approach is used.

Yoosoo posted on Saturday, January 03, 2015 - 6:16 pm

Thank you for your comment Bengt. Is there a way to simply report in the output the number of incomplete observations that were found in the input data?

Linda K. Muthen posted on Sunday, January 04, 2015 - 10:36 am

You can see this in the frequencies for the missing data patterns. The total sample minus those with no missing would be the number of observations with some missing values. For each variable or pairs of variables, see the coverage values.

Djangou C posted on Sunday, January 11, 2015 - 11:34 pm

I'm doing multiple imputation with Mplus and would like to know how to compute the standard deviation of a point estimate (the mean) from the standard error provided by Mplus. Could you please give a reference? Thank you

Bengt O. Muthen posted on Monday, January 12, 2015 - 11:10 am

The missing data book by Joe Schafer gives a good account of all the relevant imputation formulas used in Mplus.
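
For orientation, the standard combining rules for multiply imputed data (Rubin's rules) can be written as below. The notation is illustrative and not taken from the thread: m is the number of imputed data sets, and \hat{Q}_i and SE_i are the estimate and standard error from data set i.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i
  && \text{(pooled point estimate)}\\
W &= \frac{1}{m}\sum_{i=1}^{m} SE_i^{2}
  && \text{(within-imputation variance)}\\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^{2}
  && \text{(between-imputation variance)}\\
T &= W + \Bigl(1+\frac{1}{m}\Bigr)B,
  \qquad SE_{\text{pooled}} = \sqrt{T}
  && \text{(total variance and pooled s.e.)}
\end{align*}
```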

Sharon Simonton posted on Friday, February 13, 2015 - 3:15 pm

I’m currently trying to run two level multilevel models for several binary outcomes using FIML estimation procedures with longitudinal complex sample data. The models are complex: the level-1 models typically having 10-20 binary IVs and the level-2 models for the intercepts having a maximum of 16 continuous IVs. Many of the IVs are completely observed.

Do I need to bring all of the x variables into the model in order to have observations having missing data for the x variables included? In a 6/22/2006 posting you note that “If only two or three of your covariates have missing data, then FIML should be fine. You should study the missing data in your covariates. Perhaps there are some with very little missing data such that you could allow the listwise deletion on those and bring the others into the model.” However, on 11/12/2014 you say that “if you want to bring one covariate into the model, you must bring all covariates in to the model. You cannot bring in just a subset.”

A small subset of the IVs account for most of my missing data. Is there a way to use the 6/2006 strategy and use listwise deletion for x variables missing small amounts of data – and not include x variables which don’t have any missing data in the model? Multiple imputation isn’t feasible for a variety of reasons. It looks as though your thinking on this may have changed – but figured it’s worth asking. Thank you in advance for your help!

Bengt O. Muthen posted on Friday, February 13, 2015 - 4:11 pm

The issue with not bringing all the covariates into the model is that you want the covariates to correlate freely (as covariates should). This may not happen unless you model it. Say that you have e.g. two covariates and missing on X1 and not missing on X2 and you bring X1 into the model (essentially making it a Y). This model may leave X1 and X2 specified as uncorrelated. If you say X1 WITH X2 then you bring X2 into the model, so you have to say X1 ON X2 to correlate them and saying ON can have consequences for the rest of the model. So it is safest to bring all the Xs into the model. I assume you have considerable missingness on that small subset of IVs so that Listwise deletion is not an option.
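
A sketch of bringing all covariates into the model, assuming one outcome y and three hypothetical covariates x1-x3. The WITH statements are written out explicitly as an assumption, to make sure the covariates correlate freely; whether they are supplied by default is not settled in this thread.

```mplus
MODEL:
  y ON x1 x2 x3;
  x1 x2 x3;          ! mentioning the variances brings all x's into the model
  x1 WITH x2 x3;     ! let the covariates correlate freely
  x2 WITH x3;
```

Once the x's are in the model, distributional assumptions are made about them, as noted elsewhere in the thread.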

Jaap Schuitema posted on Friday, February 20, 2015 - 4:58 am

I am running a complex model with many x-variables. One of those x-variables has missings. If I bring this x-variable into the model by mentioning the variance, the model does not fit any more. The problem is that this variable is now assumed to be uncorrelated with the other x-variables. So I added WITH-statements which brought in all the other x-variables into the model. And I needed to add more WITH-statements. Now I get a warning that the number of observed variables is exceeding the number of clusters in my model.

This puzzles me. If I look at the diagram view, the model seems the same (and when I do this with x-variables without missings, the chi-square and df are also the same). Why is there this large increase in observed variables, and do you know a way to deal with this problem? Is there a way to let Mplus estimate the x-variables without increasing the number of observed variables, or can I ignore this warning?

Thanks a lot for your help
Jaap

Linda K. Muthen posted on Friday, February 20, 2015 - 12:16 pm

You must bring all of the covariates into the model or none of them. You can do this by mentioning the variances. When you do this, they are treated as dependent variables in the model. The warning is to remind you that independence of observation with clustered data is at the cluster level. The impact of this on your results has not been well studied.

Jaap Schuitema posted on Wednesday, February 25, 2015 - 6:43 am

Dear Linda,

Thank you for your answer. I still have a question about bringing in all the covariates. Why does this increase the number of free parameters in the model while at the same time not affecting the number of degrees of freedom?

Bengt O. Muthen posted on Wednesday, February 25, 2015 - 1:31 pm

Because the increased number of free parameters is the same as the increased number of parameters in H1 - namely the means, variances, and covariances among the covariates.
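
As a worked count under that explanation, suppose p covariates are brought into the model (p, q_{H0}, and q_{H1} are illustrative symbols for the number of covariates and the numbers of free parameters in the H0 and H1 models):

```latex
\begin{align*}
\Delta q &= \underbrace{p}_{\text{means}}
          + \underbrace{\frac{p(p+1)}{2}}_{\text{variances and covariances}},\\
df &= (q_{H1} + \Delta q) - (q_{H0} + \Delta q) = q_{H1} - q_{H0}.
\end{align*}
```

For example, with p = 3 covariates both models gain 3 + 6 = 9 parameters, so the degrees of freedom are unchanged.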

Lisa M. Yarnell posted on Friday, March 27, 2015 - 2:31 pm

Hi Drs. Muthen, I am unclear why Mplus is not deleting cases that are missing data for all dependent variables. Notes from my output are below. BWACHGAP is dependent; all other variables are independent. 140,274 is the N in my total sample, but why is this number not decreased, given that some cases do not have data for the sole dependent variable?

SUMMARY OF ANALYSIS
Number of groups 1
Number of observations 140274
Number of dependent variables 1
Number of independent variables 11
Number of continuous latent variables 0
Observed dependent variables
Continuous BWACHGAP
Observed independent variables
CRITSKLL DIFFMETH SCH_CITY SCH_TOWN SCH_RURL SCH_MDLG SCH_LARG SCH_TITI SCH_ETH2 SCH_NSLP SCH_SMLL

Cluster variable TEACHID4
Between variables
BWACHGAP CRITSKLL DIFFMETH SCH_CITY SCH_TOWN SCH_RURL SCH_MDLG SCH_LARG SCH_TITI SCH_ETH2 SCH_NSLP SCH_SMLL

Estimator MLR
Information matrix OBSERVED . . .

SUMMARY OF DATA
Number of missing data patterns 2
Number of clusters 20127
Average cluster size 6.969

MISSING DATA PATTERNS (x = not missing)
            1  2
BWACHGAP    x
CRITSKLL    x  x
DIFFMETH    x  x
SCH_CITY    x  x
SCH_TOWN    x  x
SCH_RURL    x  x
SCH_MDLG    x  x
SCH_LARG    x  x
SCH_TITI    x  x
SCH_ETH2    x  x
SCH_NSLP    x  x
SCH_SMLL    x  x

Bengt O. Muthen posted on Friday, March 27, 2015 - 2:49 pm

Please send the output to support so we can see the full issue.

Lisa M. Yarnell posted on Friday, March 27, 2015 - 2:49 pm

Further, Drs. Muthen, I saw the note above that by modeling the variances of exogenous variables, they are treated as dependent and distributional assumptions are made about them; it was implied that this is one way to retain cases that would otherwise be dropped due to having missing data on all dependent variables.
(Please correct me if that is wrong.)
However, I have the following questions: 1) Why would an analyst want exogenous variables to be treated as dependent in the model; what consequences are there to this? 2) When I explored the result of modeling vs. not modeling the variances of my exogenous variables named above, I found that the fit indices changed drastically simply due to the explicit modeling of these variances.

Please see below. Is this drop in goodness of fit due to the improper assumptions that may be made about the distributions of these variables? Is the assumption multivariate normality? Thank you.

*Without* exogenous variables' variances explicitly modeled:
RMSEA 0.013
CFI 0.830
TLI 0.716

With exogenous variables' variances explicitly modeled:
RMSEA 0.079
CFI 0.027
TLI -0.189

Bengt O. Muthen posted on Friday, March 27, 2015 - 2:51 pm

Please send these 2 outputs to support so we can see the whole story.

Lisa M. Yarnell posted on Friday, March 27, 2015 - 2:55 pm

Thank you Bengt. What insight can you give here? I am currently using a corporate-licensed install of Mplus, and do not have the license number on hand.

Lisa M. Yarnell posted on Friday, March 27, 2015 - 2:57 pm

OK, I will send the output, but cannot request the license number until Monday, and am not sure if all employees receive that number.

Bengt O. Muthen posted on Friday, March 27, 2015 - 3:27 pm

Regarding your first question, perhaps you are bringing x's into the model by mentioning their variances. In this case you no longer have a univariate model for your BWACHGAP DV, but you have a multivariate model for BWACHGAP and all the x's. So even if you have missing on BWACHGAP, people who have non-missing data on at least one x variable are (correctly) kept in the analysis sample.

Regarding the change in fit, I cannot speculate except to say that you should make sure you let all the x's correlate freely.
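A minimal sketch of what bringing the x's into the model and letting them correlate freely might look like. The names y and x1-x3 and the single-level setup are assumptions for illustration, not the poster's actual input:

```
MODEL:
  y ON x1-x3;          ! regression of the DV on the covariates
  x1-x3;               ! mentioning the variances brings the x's into the model
  x1-x3 WITH x1-x3;    ! let all the x's correlate freely
```

With a specification along these lines the x's are treated as dependent variables with a normality assumption, which is why cases missing on y but observed on at least one x stay in the analysis sample.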

Lisa M. Yarnell posted on Monday, March 30, 2015 - 7:23 am

UG Ch11 states: "NMAR modeling is possible using ML estimation where categorical outcomes are indicators of missingness and where missingness can be predicted by continuous and categorical latent variables."

Yes, I have predicted missingness as a dichotomous outcome in such models--2 DVs are modeled: the outcome itself, and missingness. Both can be regressed on covariates, and this assumes MAR.

1) By correlating these two DVs, we can see if missingness is correlated with the predicted score in the whole sample--whether NMAR is a better assumption--right? Or is this not true, if ML estimation of missing scores (first DV) assumes MAR in the first place?

2) The above strategy (correlating these 2 DVs) works in a 1-level model but not a 2-level model. With the latter I get: "Covariances involving between-only categorical variables are not currently defined on the BETWEEN level."

I can run regressions of both DVs on the between level--but not correlate these outcomes. Does Mplus not allow for modeling covariances of dichotomous DVs on the between level?

I know mixture modeling is another option for NMAR, but given the first statement above, it seems this strategy should work: "categorical outcomes are indicators of missingness and missingness can be predicted..."

Perhaps simply not in a 2-level model where missingness is between only?

Lisa M. Yarnell posted on Monday, March 30, 2015 - 9:13 am

Drs. Muthen,
I employed the strategy of creating a latent variable on level 2 to define the residual variance for the indicator of missingness, and successfully correlated this residual variance with the residual variance of the central variable.

It seems that MAR is a plausible assumption given an estimated value of the correlation of missingness with the DV of about 0:
F1 WITH
BWACHGAP -0.003 0.424 -0.006 0.995

Is this an OK way to assess the MAR assumption? Thank you sincerely.

Bengt O. Muthen posted on Monday, March 30, 2015 - 12:55 pm

1) The missing data literature emphasizes that you cannot test whether NMAR is more suitable than MAR. I recommend the book by Craig Enders. This means that your 2-DV approach is not correct, perhaps because the information on the residual correlation that you focus on comes only from those who don't have missing on Y (the rest is handled by the bivariate normal information). For NMAR modeling you need at least 2 DVs, not counting the binary missing data indicators. For more on NMAR modeling, see for instance:

Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.

2) No longer relevant.

Pamela Medina posted on Tuesday, April 14, 2015 - 3:44 pm

Hello,

I have a dataset (n=3,000), with a large number of missing values. I was told to conduct multiple imputation using Bayesian analysis. When I try to run the imputation, the "DATA IMPUTATION" command does not turn blue. I was wondering if it is unavailable in the demo version, or if I have done something wrong in the input.

Here is my input:
TITLE: bayesian imputation test
DATA: FILE IS "E:\Dissertation\2012LAPOPComplete.dat"
VARIABLE: NAMES ARE gen cp8 cp9 cp6 cp7 cp21 it1 b10a b11 b13 b18 b21 b21a b31 b32 prot3 prot6 prot8 vote cp4 cp4a cp2 np2 cp13 np1 cp5 q10new ed etid q11 q2;
USE VARIABLES ARE gen cp8 cp9 cp6 cp7 cp21 it1 b10a b11 b13 b18 b21 b21a b31 b32 prot3 prot6 prot8 vote cp4 cp4a cp2 np2 cp13 np1 cp5 q10new ed etid q11 q2;
CATEGORICAL ARE gen cp8 cp9 cp6 cp7 cp21 it1 b10a b11 b13 b18 b21 b21a b31 b32 prot3 prot6 prot8 vote cp4 cp4a cp2 np2 cp13 np1 cp5 ed etid q11 q2;
AUXILIARY ARE ed etid q11 q2 gen;
MISSING ARE cp8 cp9 cp6 cp7 cp21 it1 b10a b11 b13 b18 b21 b21a b31 b32 prot3 prot6 prot8 vote cp4 cp4a cp2 np2 cp13 np1 cp5 q10new ed etid q11 q2 (888888 988888 999999);
DATA IMPUTATION: IMPUTE = cp8 cp9 cp6 cp7 cp21 it1 b10a b11 b13 b18 b21 b21a b31 b32 prot3 prot6 prot8 vote cp4 cp4a cp2 np2 cp13 np1 cp5 q10new ed etid q11 q2;
NDATASETS = 10;
SAVE = missimp*.dat;
ANALYSIS: TYPE = BASIC;
OUTPUT: TECH8;

Linda K. Muthen posted on Tuesday, April 14, 2015 - 6:19 pm

Commands with more than one word do not turn blue.

You can use only 6 variables in the Demo.

Irena Schneider posted on Tuesday, August 04, 2015 - 1:48 pm

Dear Linda/Bengt,

I understand that Mplus removes cases with missing on all variables to run FIML estimation. I am working with political trust survey questions: out of about 37,000 respondents, 485 were removed for having missing values on all variables. This makes the summary statistics on the Mplus basic analysis slightly off from the summary stats I have in Stata.

I'm concerned that this ends up removing reticent respondents from the data who are scared to answer the survey question. Hypothetically, just to see if I could retain the 485 cases, I added a different variable into the basic analysis in Mplus, which almost all respondents answered.

This time, Mplus ran the basic analysis without a problem-- no cases were removed, yet the summary statistics in the basic output are still off by the same amount as when Mplus removed the 485 cases. I'm confused because Mplus should have been using the same exact data and missing values as Stata this time. Any idea why this happened?

Bengt O. Muthen posted on Tuesday, August 04, 2015 - 2:25 pm

You have to make a distinction between DVs and IVs and multivariate vs univariate estimation in the following sense.

Perhaps you are saying that you have a model with DVs and IVs and where all the DVs have missing data. They get removed because they don't add information to the estimation of relations between DVs and IVs. When you say you add a variable with no missing, perhaps that is used as a DV in which case Mplus keeps everyone because it can draw on missing data theory.

Summary statistics can be computed using Type=Basic, which uses all DV and IV variables. Since no "DV ON IV" is part of this analysis, all variables will be used in the missing data analysis to compute the summary statistics.
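As a rough illustration, a Type=Basic run could be set up as follows. The file name, variable names, and missing value flag are hypothetical:

```
TITLE:    summary statistics only (sketch)
DATA:     FILE = survey.dat;            ! hypothetical file name
VARIABLE: NAMES = trust1-trust5 age;    ! hypothetical variable names
          MISSING = ALL (-999);         ! assumed missing value flag
ANALYSIS: TYPE = BASIC;
```

Because there is no MODEL command, no variable plays the role of an x in a regression, so every listed variable contributes to the missing data estimation of the summary statistics.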

Statistically, there would be no difference between Stata and Mplus, only in how you use the programs.

If this doesn't help, send relevant outputs from Mplus and Stata to support along with your license number.

Janelle Montroy posted on Thursday, August 06, 2015 - 9:00 am

Hello,
I am using Mplus 7 to perform IRT analyses on 104 dichotomous outcomes. Participants were only administered about a quarter of the items. Items were sampled in a way that creates MAR, as the missingness can be partially predicted by a categorical covariate, i.e., grade. Specifically, participants were administered a higher proportion of grade appropriate items and lower proportions of grade inappropriate items. I read in the user’s manual that a covariate can be used to model missingness of categorical DVs when one uses WLS. However, my attempts to use Auxiliary = grade(M) with WLS, WLSMV, ULSMV, and MLR have all generated the error message "Analysis with categorical variables is not available with the 'm' specifier in the AUXILIARY option." Is there another way to account for the MAR, in the context of dichotomous items and lots of (n=2317) missing data patterns?

Bengt O. Muthen posted on Thursday, August 06, 2015 - 3:00 pm

You can let grade predict the factor(s) and thereby draw on MAR. Missingness as a function of a covariate is what the manual refers to for WLSMV handling missing data.
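A sketch of this idea, in which grade enters as a covariate of the factor rather than through the AUXILIARY option. The file name and variable names are assumptions:

```
DATA:
  FILE = items.dat;                ! hypothetical file name
VARIABLE:
  NAMES = id v1-v104 grade;        ! hypothetical names
  USEVARIABLES = v1-v104 grade;
  CATEGORICAL = v1-v104;
  MISSING = .;
ANALYSIS:
  ESTIMATOR = WLSMV;
MODEL:
  f1 BY v1-v104;
  f1 ON grade;                     ! grade predicts the factor
```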

Jason Anthony posted on Friday, August 07, 2015 - 6:11 am

Interesting... So in Janelle's example above, if one were to regress the factor(s) on grade then you're saying that would adjust for the MAR design of the item sampling? Would the same approach work if one were to use MLR and ULSMV?

Jason Anthony posted on Friday, August 07, 2015 - 9:39 am

In addition to my post above, is there a way to print the IRT parameterization when one uses a covariate in the model? I tried using the D(1.0) output option, but that didn't work. thanks for all your guidance!

Bengt O. Muthen posted on Friday, August 07, 2015 - 5:17 pm

First post:

Q1: Yes, if there is no other variable than grade not included in the model that predicts the missingness.

Q2: Yes.

Bengt O. Muthen posted on Friday, August 07, 2015 - 5:19 pm

Second post:

IRT typically uses N(0,1) for the latent variable, which you won't necessarily get with a covariate. So you would have to do the re-parameterization yourself, e.g., using Model Constraint.

Jason Anthony posted on Saturday, August 08, 2015 - 2:43 pm

Sorry, I'm still struggling with how to get the IRT parameterization. The (simplified) code below results in an error message that Model Constraint does not recognize the N function. I also tried constraining the factor mean to zero and factor residual variance to one, but that does not yield the IRT parameterization of thresholds/difficulties either, presumably because it is the factor variance that needs to be constrained.

VARIABLE:
NAMES are
mplusid V1-V104 GRADE;
USEVARIABLES are
V1-V104 GRADE;
CATEGORICAL =
V1-V104;
MISSING are . ;

ANALYSIS:
TYPE = GENERAL;
! ESTIMATOR = MLR;
PROCESSORS = 2;

MODEL:
F1 BY V1* V2-V104;
F1 on GRADE;
[F1] (mean);

MODEL CONSTRAINT:
mean = N(0,1);

Bengt O. Muthen posted on Sunday, August 09, 2015 - 7:41 pm

You can try to do this using the tech report we have on our website. See the IRT page and the paper mentioned there: "A brief technical description of the formulas used in the plots of item characteristic curves and information curves."

If you can't handle these formulas, you can try to make this simple by estimating the model in a first step without Model Constraint (your statement is not allowed). Fix the residual variance of f1 at 1 (f1@1). Asking for TECH4 you get the total variance of f1. Rerun by using a grade variable scaled so that the total variance of f1 is 1.
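The first step of the simpler route could be sketched like this, reusing the variable names from the input posted above. It is only the MODEL and OUTPUT part, and an illustration rather than a tested setup:

```
MODEL:
  f1 BY v1* v2-v104;   ! all loadings free
  f1 ON grade;
  f1@1;                ! residual variance of f1 fixed at 1
OUTPUT:
  TECH4;               ! estimated means and covariances of the latent variables
```

The total variance of f1 is then read from the TECH4 section of the output.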

Jason Anthony posted on Wednesday, August 12, 2015 - 9:04 am

Thank you, Bengt, for referring me to the tech report. Reparameterizing the loadings and thresholds to discriminations and difficulties was no trouble.

Interestingly, whether or not one used a covariate to account for MAR had essentially no impact on estimates of a and b when using MLR, rs = .99 and 1.0 for as and bs, respectively. However, using a covariate to account for MAR had a major impact on estimates of b when one used WLSMV, with r among bs of only .60. Estimates of b from MLR and WLSMV were more closely aligned when one accounted for the MAR using a covariate, r = .90, than when one did not account for MAR, r = .66.

Bengt O. Muthen posted on Wednesday, August 12, 2015 - 5:03 pm

Interesting; makes sense.

Hillary Gorin posted on Friday, August 21, 2015 - 2:41 pm

How do people assess the mechanism of missingness (i.e., MCAR, MAR, NMAR) when dealing with categorical variables? Little's MCAR test does not work. Is there an alternative in Mplus?

Bengt O. Muthen posted on Friday, August 21, 2015 - 2:48 pm

Typically, it is not assessed. One cannot test MAR against NMAR for instance. ML under MAR is used as the standard - also in Mplus. See also examples in the missing data chapter 11 in the UG.

Stephanie Whitworth posted on Monday, October 12, 2015 - 5:30 pm

Hi Linda & Bengt,
I am following your recommendations for determining the best approach to dealing with NMAR data, and running MAR, pattern-mixture, and selection models. I cannot get the input for the Roy-Muthen model to run, and was hoping you could explain what the "u" variable is?

E.g.
Variable:
Names =
y0-y5 u1-u5;

Missing = *;

usev = y0-y5
d1 d2 d3 d4 d5;

classes = cu(2) cy(4);

Thank you

Bengt O. Muthen posted on Monday, October 12, 2015 - 6:12 pm

See the Muthen et al (2011) Psych Methods article on our web site. Page 22 describes the Roy method. The u variables of the runs shown on the web site can be ignored. I think they reflect missing or not at the different time points.

Stephanie Whitworth posted on Monday, October 12, 2015 - 8:46 pm

Hi Bengt,

Thank you for your response. However, when I removed the "u" variables from the input, the model didn't run.

Do I need to be coding missingness as present/absent manually?

Thank you

Sharon Simonton posted on Wednesday, October 14, 2015 - 10:14 am

I'm currently trying to run some manual R3STEP models for a binary distal outcome and am trying to use FIML estimation procedures to account for missing data. Many of my IVs are binary. I have from 200 to 3,600 observations in each latent class. When I try to run models for my binary outcomes, I receive a message that the covariance matrix for one or more of my classes cannot be inverted. I’m wondering if this is happening because of (1) the distributional assumption of multivariate normality for all of the IVs and (2) very sparse/empty cell counts when I enter 2-3 binary indicators in the model.

My emerging impression is that FIML estimation cannot handle any empty cells in the crosstabs/frequency table and covariance/design matrices (e.g. no variance in outcome within cell). FIML estimation often just won’t work when I have fewer than 5 observations in any cell, for example if I try to look at differences in a binary outcome by latent class, gender, and race (4 x 2 x 3) and one subgroup’s cell for the presence of the binary outcome is empty. It looks as though I need to have some threshold number of observations in each possible cell in order to successfully use FIML estimation procedures; FIML can’t seem to handle sparse tables. Is there a better way to do this? I suspect that multiple imputation will also be problematic. Thank you in advance for your help!

Bengt O. Muthen posted on Wednesday, October 14, 2015 - 2:36 pm

I assume you have missing data on your binary outcome(s) to give zero frequencies for certain combinations of binary covariate (IV) values. With binary covariates a singular covariance matrix problem may arise in those cases. I don't think there is a way around that problem; it is a common problem in logistic regression.

If that doesn't answer your concerns, we need more information to comment. Please send input, output, data, and license number to Support along with a clarification of:

- do you have missing on the binary outcomes or the binary IVs?

- when you talk about entering 2-3 binary indicators is that the outcomes you are talking about or the IVs?

Wen-Hsu Lin posted on Wednesday, October 14, 2015 - 6:30 pm

Hi, I have a question regarding wave nonresponse in my data (attrition). I have 8 waves and the attrition (wave 1 vs. wave 8) is almost 40%. I then ran my growth curve. My question is: Can FIML provide proper estimation? I checked the Covariance Coverage table and some numbers were at .6. Is this OK? Can I combine FIML and weights (IPTW) to adjust for attrition?
Thank you

Bengt O. Muthen posted on Wednesday, October 14, 2015 - 6:44 pm

Coverage of 0.6 is not good. FIML or any other missing data technique will have to rely too strongly on model assumptions, especially that the missing is MAR and that the variables are normally distributed.

But things are better if earlier time points have higher coverage. Assumptions play a much smaller role if the coverage is at least say 0.8.

Wen-Hsu Lin posted on Thursday, October 15, 2015 - 4:36 pm

Yes. Earlier time points have coverage over .9, which then drops to .6. I am concerned because different methods give somewhat different results. Thank you.

Bengt O. Muthen posted on Friday, October 16, 2015 - 12:05 pm

You can show this in a write up so readers can judge it. For instance, report the results when using only some of the early time points as compared to using all time points.

Simon Coulombe posted on Thursday, October 22, 2015 - 4:06 pm

Hi, I'm testing a path analysis model with two binary covariates which lead to the deletion of 5.4% of the sample.

For continuous outcome, I know that there is a way of dealing with missing values on covariates by including them explicitly into the model. Is this possible with binary covariates too?
If not, what would be the best way of dealing with these missing values?

Thank you very much.

Simon Coulombe posted on Thursday, October 22, 2015 - 6:05 pm

Sorry for my mistake. I forgot to specify that the two binary covariates have missing values (which lead to the deletion of 5.4% of the sample).

Thanks

Bengt O. Muthen posted on Thursday, October 22, 2015 - 7:05 pm

Yes, you can do the same with binary covariates, although it is a bit better and also possible using Bayes to say that they are binary.

Simon Coulombe posted on Friday, October 23, 2015 - 6:30 am

Thanks. In other words, I should simply use Estimator=Bayes instead of Estimator=ML?

Because I'm trying to test a mediation model, I should use the Model constraint command instead of the Model indirect? (If I understand well from a previous post, I would also have to divide by the SD of the DV and multiply by the SD of the IV).

Is there a reference that I could read with a more concrete example of this? (or syntax)

Thank you for everything.

Vanessa Castro posted on Friday, October 23, 2015 - 8:04 am

Hi. I am running a latent profile analysis on 13 rating variables with two-level nesting (15 scenarios within 300 people). I was able to successfully model a covariate (age group, dummy coded) with the use of algorithm=integration and integration=montecarlo statements, and by including the covariate mean explicitly in the MODEL command.

However, I am trying now to test a different covariate (implicit emotion beliefs), and when I try running the same syntax, I get the following missing data warning message:

Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 297

My syntax is:
missing = .;

usevar =
subjid
ems
SSavoid
SSleave
SMmod
ADneg
ADpos
REdet
REdis
REpos
RErum
REacc
RMsup
RMpos
RMphys;

classes = c(3);
cluster = subjid;

Analysis:
Type = Mixture Complex ;
algorithm = integration;
integration = montecarlo;
Starts = 100 10;
stiterations = 10;
k-1starts = 100 10;
processors =8(starts);

Output:
SAMPSTAT tech11;

Thank you for your help!

Bengt O. Muthen posted on Friday, October 23, 2015 - 6:18 pm

Add the Patterns option to your OUTPUT command so you can see the missing data structure.
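Applied to the OUTPUT command in the input posted above, that would be:

```
OUTPUT:
  SAMPSTAT TECH11 PATTERNS;   ! PATTERNS prints the missing data patterns
```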

If that doesn't help, send output to Support along with license number.

Bengt O. Muthen posted on Friday, October 23, 2015 - 6:23 pm

Answer to Coulombe:

If you want to include covariates in the model and say that some are binary, Bayes is the way to go. Put the binary ones on the Categorical list and for covariates x1-xp say:

x1-xp with x1-xp;

Yes on your second paragraph.

No, not yet - but it is coming.
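Until such an example is available, here is a rough sketch of how the pieces above might fit together. The names y, m, x1, and x2, the missing value flag, and the parameter labels are hypothetical, and the setup has not been checked against a real run:

```
VARIABLE:
  NAMES = y m x1 x2;          ! hypothetical names; x1 and x2 are binary covariates
  USEVARIABLES = y m x1 x2;
  CATEGORICAL = x1 x2;        ! say that the covariates are binary
  MISSING = ALL (-999);       ! assumed missing value flag
ANALYSIS:
  ESTIMATOR = BAYES;
MODEL:
  m ON x1 (a);
  m ON x2;
  y ON m (b);
  y ON x1 x2;
  x1-x2 WITH x1-x2;           ! brings the covariates into the model
MODEL CONSTRAINT:
  NEW(ind);
  ind = a*b;                  ! indirect effect of x1 on y via m
```

How a covariate on the CATEGORICAL list is treated as a predictor under Bayes affects the interpretation of a and of ind, so that part should be checked against the User's Guide before relying on the estimates.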

Simon Coulombe posted on Saturday, October 24, 2015 - 10:58 am

Thank you Dr. Muthen.

Just to make sure:
adding x1-xp with x1-xp is the way of including the covariates in the model (by correlating them all with each other)?

Finally, if I understand well, we don't use bootstrap with Bayesian analysis?

Thank you very much.

Simon

Bengt O. Muthen posted on Saturday, October 24, 2015 - 4:53 pm

Q1. Yes.

Q2. Right, that's not needed with Bayes. Bayes still allows non-normal parameter distributions and non-symmetric confidence intervals.

SABA posted on Thursday, November 26, 2015 - 8:25 am

Hi, I am running a multiple regression. 25% of my data is missing on one questionnaire because respondents refused to respond to that specific questionnaire; however, they responded to another questionnaire which is also part of the model. The data is missing at random. The analysis type is complex and these respondents are clustered in my analysis. I ran my model with both the ML and MLR estimators, and in both cases I get the following message.
*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 1621
My question is: why are these cases excluded from the analysis, and why is ML not estimating them? Thank you

Bengt O. Muthen posted on Thursday, November 26, 2015 - 6:48 pm

In regular regression you use only subjects with complete data on the x's.

If you have subjects who have missing on x's but not missing on y, you can benefit from that using missing data theory by "including the x's in the model", which you can do in Mplus by adding

x1-x5;

assuming you have 5 x's.
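Put into a fuller input for a clustered regression, this might look like the following sketch. The names y, x1-x5, and clus and the missing value flag are assumptions:

```
VARIABLE:
  NAMES = y x1-x5 clus;       ! hypothetical names
  USEVARIABLES = y x1-x5;
  CLUSTER = clus;
  MISSING = ALL (-999);       ! assumed missing value flag
ANALYSIS:
  TYPE = COMPLEX;
  ESTIMATOR = MLR;
MODEL:
  y ON x1-x5;
  x1-x5;                      ! variances mentioned, so the x's are part of the model
```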

SABA posted on Thursday, December 03, 2015 - 2:44 am

Hi, I am running a multiple regression. 25% of my data is missing on one questionnaire because respondents refused to respond to that specific questionnaire; however, they responded to another questionnaire which is also part of the model. The data is missing at random. The analysis type is complex and these respondents are clustered in my analysis. I ran my model with both the ML and MLR estimators, and in both cases I get the following message.
*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 1621
If I use estimator MAR, then I get the following message
*** ERROR in ANALYSIS command
Unrecognized setting for ESTIMATOR option:
MAR
Could you please tell me why these estimators are not estimating the missing values? And what could be the solution. Thank you

Bengt O. Muthen posted on Thursday, December 03, 2015 - 8:46 am

MAR is not an estimator. Perhaps you are thinking about ML estimation using the MAR assumption. Saying ML or MLR will estimate under the MAR assumption.

Subjects with missing data on x are typically excluded because x is not part of the model in terms of parameters estimated. But adding a normality assumption on x you can bring x into the model saying in the Model command:

x;

and this will bring up the sample size. Note, however, that regression slope estimates are only affected if subjects with missing on x have data on y.

Phil Wood posted on Monday, December 14, 2015 - 3:24 pm

Is there any way to exclude observations from analysis based on the number of missing data points across variables? Something along the order of, say, in SAS, saying If nmiss(of item1-item10)<5 then delete;
Thanks!

Bengt O. Muthen posted on Monday, December 14, 2015 - 4:31 pm

Not directly. But you can use DEFINE to do it by using

IF(y EQ _MISSING)THEN ymiss=1 ELSE ymiss=0;

and then use DEFINE to add up the ymiss.

Grace Icenogle posted on Wednesday, January 13, 2016 - 10:03 am

Dear Drs. Muthen and Asparouhov,

I am conducting an MGA (11 groups) with one outcome. It is my understanding that to fill in missing data on the outcome (i.e., FIML), I need to call out the variances of all covariates.

However, I am interested in non-linear age trends in my outcome, therefore my model contains the following interactions:
Age*Age and Age*Age*Age.

My question is do I need to call out the variances of the interactions like so:
Age AgeSq AgeCu Covar1 Covar2;

OR can they be excluded like so:
Age Covar1 Covar2;

Thank you so much for your time.

Best,
Grace

Bengt O. Muthen posted on Wednesday, January 13, 2016 - 12:31 pm

If a person has missing data on the outcome it won't help if you bring the covariates into the model as you suggest. The people you want to include are those with the outcome observed who have missing on some covariates.

You need to mention all variables including the interactions (for deeper technical reasons, this won't be technically exactly correct but approximately so).
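A single-group sketch of mentioning all covariates, including the powers of age created in DEFINE. The variable names and missing value flag are hypothetical, and the grouping part of the 11-group analysis is left out:

```
VARIABLE:
  NAMES = y age covar1 covar2;                     ! hypothetical names
  USEVARIABLES = y age covar1 covar2 agesq agecu;  ! DEFINE variables go last
  MISSING = ALL (-999);                            ! assumed missing value flag
DEFINE:
  agesq = age*age;
  agecu = age*age*age;
MODEL:
  y ON age agesq agecu covar1 covar2;
  age agesq agecu covar1 covar2;                   ! all variances mentioned
```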

Grace Icenogle posted on Wednesday, January 13, 2016 - 12:56 pm

Ah, understood. Thank you for your insight!

Grace Icenogle posted on Friday, January 15, 2016 - 12:16 pm

Dear Mplus team,

A follow-up from my last question:

When I have only one DV, how can I 'get' Mplus to include cases with missing data on the DV (but no missing data on IVs)?

Would it be appropriate to include other variables known to be correlated with the DV in the model, say, by calling out their variances and/or means? If so, is it advisable to call out the means, variances, or both?

If not, is there another approach to retain cases with missing data on the DV?

Thank you for taking the time to respond--particularly to someone who is early on in the learning process.

Kindly,
Grace

Bengt O. Muthen posted on Friday, January 15, 2016 - 5:47 pm

If you have a regression of y ON x, you can certainly bring x into the model by mentioning its mean or variance. But the slope won't be affected by data on people who have missing on y and observed on x. The slope is affected only by data on people who have observed y and missing x.

Sona Aoyagi posted on Friday, February 12, 2016 - 7:48 am

Hi,
I'd like to know about TYPE=DDROPOUT option in DATA MISSING command, which is for pattern mixture model.

1) In the user's guide, it says "For TYPE=SDROPOUT and TYPE=DDROPOUT, the number of binary indicators is one less than the number of variables in the NAMES statement because dropout cannot occur before the second time point an individual is observed."
But I actually have cases which are dropped out at the first observed point (i.e. before the second time point).
Is there any solution to build in these cases which occurred before the second time point in pattern mixture model?

2) When I ran the model (TYPE=DDROPOUT), I got the error as below.
" One or more variables have a variance of zero. Check your data and format statement."
The error occurred at d1. Could you tell me the meaning of this error message?
DATA MISSING:
NAMES = y0-y5;
BINARY = d1-d5;
TYPE = DDROPOUT;
MODEL:
i s | y0@0 y1@2 y2@6 y3@10 y4@14 y5@20 ;
i ON d1-d5;
s ON d3-d5;
s ON d1 (1);
s ON d2 (1);

3) Sorry for this beginner question: What is the difference between TYPE=MISSING and TYPE=DDROPOUT?

TIA

Bengt O. Muthen posted on Friday, February 12, 2016 - 5:15 pm

Type = Missing is for ML under the MAR assumption. Type=Ddropout is for pattern-mixture modeling of NMAR data. For a review see the paper on our website:

Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.

If you are a beginner when it comes to missing data handling I would not use pattern-mixture modeling and therefore not Ddropout.

Lin Jiang posted on Tuesday, February 16, 2016 - 11:03 am

Hi Dr. Muthen,

My model has two latent variables. The IV is a latent variable with two observed variables (1 categorical, 1 continuous); the DV is a latent variable with 4 observed variables. The categorical observed variable for the IV has 50 missing values out of 257. However, the continuous variable for the IV has 145 missing values.

I run the overall model first. The model fits. However, when I run the multi-group SEM with gender, it shows "no convergence number of iterations exceeded". I guess, the "no convergence" is caused by the large number of missing values.

I want to impute the missing values first, and then run the multi-group SEM model again. However, one of my dissertation committee members insists that I should make the multi-group model with missing values fit, and then make the model fit after imputation. I doubt that the first step is necessary: since a lot of missing values exist, it is hard to make the multi-group SEM model fit. Do you have any suggestions?

Also, my data were collected from both male and female. When I use the imputation, should I estimate the missing data separately based on different gender? How can I do that? ( I didn't attend any Mplus trainings/workshops because of the tight budget. I learned how to use it by myself.) Thank you for your help!

Bengt O. Muthen posted on Tuesday, February 16, 2016 - 6:50 pm

Send output from the no convergence run(s) to Support along with your license number.

But note that with 145 missing values out of 257 there is no missing data approach that is trustworthy because you rely too much on model assumptions and too little on data.

Lin Jiang posted on Tuesday, February 16, 2016 - 7:28 pm

Dr. Muthen,

Thank you for your reply! I am using the Mplus installed on the computers in the statistics lab at my university. Do you know where I can find the license number? Thank you.

Linda K. Muthen posted on Wednesday, February 17, 2016 - 4:59 pm

You must be the registered user of an Mplus license with a current support contract to be eligible for support. You can ask the IT person in charge of the lab if this is possible.

Aurelie Lange posted on Wednesday, March 23, 2016 - 8:40 am

Dear Dr Muthen,

We have a longitudinal study with 4 waves spread over 2 years. As we use routinely collected data from a treatment for criminal adolescents, we have a large amount of missing data: we have 50 - 70% missing data on each variable on each wave. As a consequence, our covariance coverage varies from just above .20 up to approximately .60. We are conducting growth models, as well as latent class growth models, using FIML.

I was wondering whether there is any way in which we could get an indication of the estimation bias introduced in our parameter estimates by FIML. For example, for multiple imputation Collins (2001) suggests computing a standardized bias, which is 100*(average estimate - parameter)/se, where se is the standard deviation of the estimate.
Are there any such possibilities to investigate whether the missing data is leading to biased parameter estimates when using FIML?

Thank you so much for your advice!

Aurelie

Linda K. Muthen posted on Wednesday, March 23, 2016 - 9:03 am

You could do a simulation study to see the effect of so much missing data on your results. We recommend no more than 10 to 20 percent missing based on our experience.
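A simulation of this kind could be sketched with the MONTECARLO command along the following lines. The population values, sample size, and missing data proportions are made-up numbers for a 4-wave linear growth model, not values from the study described above:

```
TITLE: Monte Carlo sketch, linear growth model with heavy missingness
MONTECARLO:
  NAMES = y1-y4;
  NOBSERVATIONS = 500;                     ! assumed sample size
  NREPS = 500;
  SEED = 4533;
  PATMISS = y1(.5) y2(.5) y3(.6) y4(.7);   ! assumed proportion missing per wave
  PATPROBS = 1;                            ! one missing data pattern specification
MODEL POPULATION:
  i s | y1@0 y2@1 y3@2 y4@3;
  [i*1 s*.5];
  i*1; s*.2; i WITH s*.1;
  y1-y4*.5;
MODEL:
  i s | y1@0 y2@1 y3@2 y4@3;
  [i*1 s*.5];
  i*1; s*.2; i WITH s*.1;
  y1-y4*.5;
```

The output of such a run lists the population value next to the average estimate over replications, which allows a comparison in the spirit of the standardized bias mentioned in the question. Note that missingness generated this way is completely at random, so the sketch speaks to loss of information rather than to violations of MAR.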

Filipa Alexandra da Costa Rico Cala posted on Monday, April 18, 2016 - 4:09 pm

Dear Linda,

I would like to perform a CFA with a variable which has some missing values. In SPSS I used multiple imputation and it generated 5 data sets. How can I analyse these 5 imputed data sets in Mplus? Can I perform multiple imputation in Mplus? Many thanks in advance for your help,

Linda K. Muthen posted on Monday, April 18, 2016 - 4:35 pm

See Example 13.13 for using imputed data sets. See Example 11.5 for imputing data in Mplus.
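
For readers who want to see what combining results across imputed data sets involves, here is a minimal sketch of Rubin's rules for a single parameter. The function name and the numbers are hypothetical. This only illustrates the arithmetic; the User's Guide examples cited above remain the reference for how Mplus itself reads and combines imputed data sets.

```python
import math
import statistics


def pool_rubin(estimates, standard_errors):
    """Pool one parameter across m imputed data sets with Rubin's rules."""
    m = len(estimates)
    pooled_estimate = statistics.fmean(estimates)
    # Within-imputation variance: average of the squared standard errors
    within_var = statistics.fmean([se ** 2 for se in standard_errors])
    # Between-imputation variance: sample variance of the m estimates
    between_var = statistics.variance(estimates)
    total_var = within_var + (1.0 + 1.0 / m) * between_var
    return pooled_estimate, math.sqrt(total_var)


# Hypothetical loading estimates and standard errors from 5 imputed data sets
estimate, se = pool_rubin([0.61, 0.64, 0.59, 0.63, 0.62],
                          [0.05, 0.05, 0.06, 0.05, 0.05])
print(round(estimate, 3), round(se, 3))
```

The pooled standard error is larger than the average standard error whenever the estimates differ across imputations, which is how the uncertainty due to missing data enters the result.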

Filipa Alexandra da Costa Rico Cala posted on Wednesday, April 20, 2016 - 6:31 am

Dear Linda,

In fact, I was analysing my missing data and found that I have less than 5% missing data, so I will not use multiple imputation. Will Mplus handle my missing data automatically (using FIML), or should I specify something in the syntax? I have already read a lot of material, but I am a little confused and think I need your advice. Many thanks in advance for your help.

Linda K. Muthen posted on Wednesday, April 20, 2016 - 6:55 am

With maximum likelihood estimation, FIML is the default. You don't need to specify anything.

Filipa Alexandra da Costa Rico Cala posted on Wednesday, April 20, 2016 - 7:10 am

Thanks very much. However, I am conducting a CFA with categorical indicators. Also with categorical indicators, can I use FIML? Once again, many thanks for your help.

Linda K. Muthen posted on Wednesday, April 20, 2016 - 10:55 am

You can use the CATEGORICAL option with maximum likelihood estimation.

Filipa Alexandra da Costa Rico Cala posted on Wednesday, April 20, 2016 - 11:59 am

Dear Linda,

Many thanks for your reply and for your help. However, I tried to use this in Mplus and it didn't work (I received an error message). Below is the syntax I wrote for a CFA with unordered categorical observed variables and maximum likelihood estimation, so that you can see whether I did something wrong.

Title: CFA DSM-IV-J;
Data: File is validacao only gamblingversao21904.dat;
Variable: NAMES are DSM1rec DSM2rec DSM3rec DSM4rec DSM5rec DSM6rec DSM7rec DSM8rec DSM9rec;
USEVARIABLES are DSM1rec DSM2rec DSM3rec DSM4rec DSM5rec DSM6rec DSM7rec DSM8rec DSM9rec;
MISSING are all (-999);
NOMINAL are all;
Analysis: TYPE IS MISSING H1

Model: F1 by DSM1rec DSM2rec DSM3rec
DSM4rec DSM5rec DSM6rec DSM7rec DSM8rec DSM9rec;

Output:
STANDARDIZED;
MODINDICES;

Is that the correct syntax for a CFA with unordered categorical variables, using maximum likelihood estimation for missing values? I ask because the software is also using the WLSMV estimator to handle categorical variables. Once again, many thanks for your help.

Linda K. Muthen posted on Wednesday, April 20, 2016 - 4:29 pm

The default is WLSMV. Add ESTIMATOR = ML; to the ANALYSIS command if you want maximum likelihood.

Filipa Alexandra da Costa Rico Cala posted on Thursday, April 21, 2016 - 1:36 am

Linda,

Thank you very much for your advice. I will change the estimator and use ML instead of WLSMV. However, I am still a bit confused about one thing: if I still use the WLSMV estimator (the default for categorical data), how will the software handle missing data? From what I have read, I didn't understand whether it's listwise or pairwise.
Once again, many thanks for your help.

Linda K. Muthen posted on Thursday, April 21, 2016 - 7:02 am

Pairwise.

Filipa Alexandra da Costa Rico Cala posted on Thursday, April 28, 2016 - 11:00 am

Dear Linda,

Many thanks for all your help. I followed your suggestion and ran the CFA of my instrument (9 items, unordered categorical variables) with both estimators: one analysis with WLSMV (the default for categorical variables) and another with ML. I obtained different results. Given that I have only 8 missing values in a sample of 750 respondents, and that the missingness pattern appears to be MAR, what do you think would be the best approach in my case? Would it be to use WLSMV and let the software handle the missing data automatically through the pairwise method? It's a CFA with 9 items, so I do not have covariates, right? Once again, thank you very much for all your help and insights.

Linda K. Muthen posted on Thursday, April 28, 2016 - 5:14 pm

Are you using the CATEGORICAL option with WLSMV and ML? You should be doing this. You should be comparing the patterns of significance, not the values of the coefficients. ML gives logistic regression and WLSMV gives probit regression. They are not on the same scale.
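
To illustrate why logit and probit coefficients are on different scales, here is a small sketch comparing the two link functions. The constant 1.702 is a common rule-of-thumb scaling, not an Mplus setting, and the helper names are hypothetical. It is only meant to show that the raw coefficients cannot be compared directly; it is not a recipe for converting WLSMV estimates into ML estimates.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (the probit link's inverse)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Standard logistic CDF (the logit link's inverse)."""
    return 1.0 / (1.0 + math.exp(-z))


def max_gap(scale):
    """Largest gap between logistic(scale * z) and normal(z) for z in [-6, 6]."""
    grid = [i / 100.0 for i in range(-600, 601)]
    return max(abs(logistic_cdf(scale * z) - normal_cdf(z)) for z in grid)


print(max_gap(1.0))    # unscaled: the two curves differ noticeably
print(max_gap(1.702))  # rescaled: the two curves nearly coincide
```

Because the two curves only line up after rescaling, the same data yield numerically different coefficients under the two links, while significance patterns are typically similar.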

Filipa Alexandra da Costa Rico Cala posted on Friday, April 29, 2016 - 1:33 am

Dear Linda,

Once again, thank you very much for your help and insights. Do you mean running the same syntax with the categorical option for both WLSMV and ML? Or do you mean running two separate inputs and then comparing the patterns of significance (and not the values of the coefficients)? I ran the syntax below for the categorical option with WLSMV and ML. Can you please tell me if it is right?

Title: CFA DSM-IV-J;
Data: File is validacao only gamblingversao21904.dat;
Variable: NAMES are DSM1rec DSM2rec DSM3rec DSM4rec DSM5rec DSM6rec DSM7rec DSM8rec DSM9rec;
USEVARIABLES are DSM1rec DSM2rec DSM3rec DSM4rec DSM5rec DSM6rec DSM7rec DSM8rec DSM9rec;
MISSING are all (-999);
NOMINAL are all;
Analysis: TYPE IS MISSING H1
ESTIMATOR IS WLSMV
Model: F1 by DSM1rec DSM2rec DSM3rec
DSM4rec DSM5rec DSM6rec DSM7rec DSM8rec DSM9rec;
Output:
STANDARDIZED;
MODINDICES;

Is this the correct way to write the syntax for using the categorical option with WLSMV and ML?
In addition, question 2) If the data were missing completely at random, do you think that the pairwise method would be fine?
Once again, many thanks for all your help

Linda K. Muthen posted on Friday, April 29, 2016 - 7:33 am

Please send the two outputs and your license number to support@statmodel.com.

Filipa Alexandra da Costa Rico Cala posted on Friday, April 29, 2016 - 7:52 am

Thank you Linda. I will send

Jiebing Wang posted on Tuesday, May 10, 2016 - 8:28 am

Hello Dr. Muthen,

I ran an MGCFA for an 11-item scale (2 groups). One group has missing values on items; the valid sample size per item ranges from 498 to 505. The output showed a sample size of 502 for that group. Why?

Thanks!
Jiebing

Bengt O. Muthen posted on Friday, May 13, 2016 - 2:10 pm

Send to Support along with your license number.

Jessica Grady posted on Tuesday, June 14, 2016 - 5:19 pm

Hello, I am running a series of multiple regression analyses. The data set contains missing values so I am using maximum likelihood to retain the full sample in models. Some of the outcome variables are non-normal, others are normally distributed. Based on my understanding, it is appropriate to use MLR (rather than ML) to estimate parameters for those models with outcome variables that are non-normally distributed. I am wondering what the best practice is: is it appropriate to use MLR for all analyses reported within the same paper, even for those with outcome variables that are normally distributed? Or should I use ML when modeling outcomes that are normally distributed and MLR when modeling outcomes that are not normally distributed? Results do differ slightly when using ML versus MLR. Thank you!

Linda K. Muthen posted on Wednesday, June 15, 2016 - 7:25 am

You should use MLR throughout.

Richard E. Zinbarg posted on Thursday, June 30, 2016 - 11:47 am

I ran a LCGA model with the LRTBOOTSTRAP option to test the difference in fit between a 3 class model and a 2 class model. The model took several days to run and when it finished the output did not contain any results but rather ended with a warning that the covariance coverage falls below the specified limit. It seems rather odd to me that it took several days to run and seemed to be fitting models with different random start values but then did not give me the results from any of the models. Does this seem odd to you too?

Tihomir Asparouhov posted on Friday, July 01, 2016 - 5:22 pm

The program terminated before the computation was completed, either because you stopped it (accidentally) or because Mplus crashed during the computation. Run it again and if it crashes again, send it to support@statmodel.com.

Richard E. Zinbarg posted on Monday, July 04, 2016 - 8:43 pm

Thanks - it ran for several days. I need to figure out how to submit it as a batch job to the version of Mplus on my university's social science computing cluster so it doesn't tie up my computer for several days again. Once I figure that out, I will re-run it and get back to you.

Bengt O. Muthen posted on Tuesday, July 05, 2016 - 6:11 pm

Ok.

Paulo Alexandre Ferreira Martins posted on Saturday, August 06, 2016 - 10:28 am

Hi!
Running with “summary data” does not allow some commands, such as “Define”.
So I wonder which option I should use to reverse an observed variable so that it is included in the output descriptives.
Thank you!

Bengt O. Muthen posted on Saturday, August 06, 2016 - 11:12 am

What do you mean by a reverse variable?

Paulo Alexandre Ferreira Martins posted on Saturday, August 06, 2016 - 11:57 am

A continuous variable measured on a 7-point Likert scale that should run in the opposite direction to the others.
I have already used this syntax following your suggestion, but without requesting the summary data option.
e.g:
Define:
SBPN13r=7-SBPN13r;
SBPN19r=7-SBPN19r;

Linda K. Muthen posted on Saturday, August 06, 2016 - 2:11 pm

You can't do this with summary data, only with individual data.
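
As a sketch of what reverse-coding does to individual data, the helper below flips a scale by subtracting each value from the sum of the scale's lowest and highest codes. The function name, the 1-to-7 coding, and the missing-value flag are assumptions for illustration. Note that for items coded 1 to 7 the constant would be 8; whether 7 is right for the variables in the DEFINE statement above depends on their coding, which the thread does not state.

```python
def reverse_code(values, low, high, missing=None):
    """Reverse a scale coded low..high, leaving missing-value flags untouched."""
    return [missing if v == missing else (low + high) - v for v in values]


# Hypothetical responses on a scale coded 1..7, with -999 as the missing flag
responses = [1, 7, -999, 4, 2]
print(reverse_code(responses, low=1, high=7, missing=-999))
```

Excluding the missing-value flag matters: subtracting it from the constant would turn a missing code into a large, valid-looking number.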

Katy Roche posted on Tuesday, September 13, 2016 - 9:47 am

I have a data set that was developed using an intentionally missing data design - I am unable to run IRT successfully due to sparse data coverage. I get these error messages for many variables:

WARNING: THE BIVARIATE TABLE OF DPES2_C AND DPES1_C HAS AN EMPTY CELL.

COMPUTATIONAL PROBLEMS ESTIMATING THE CORRELATION FOR DPES2_C AND DPES1_C.

THE MISSING DATA COVARIANCE COVERAGE FOR THIS PAIR IS ZERO.

I lowered COVERAGE to well below .10 and these errors remain.

Can you help me figure out how to address these?

Bengt O. Muthen posted on Tuesday, September 13, 2016 - 10:08 am

You may need to represent the missing data design by a multiple group analysis.

Kieran Mepham posted on Monday, September 26, 2016 - 6:25 am

I have imported data from SPSS (trialled in both CSV and tab delimited formats), n=590 and 48 variables. When I run any analysis in Mplus 7, it is indicated that there is 1 missing data pattern, while none are missing in SPSS, and none correspond to the value of missing set in Mplus (-99). Using type=basic, it is furthermore indicated that none of the variables in the dataset have missing data patterns, yet a data pattern is missing with a frequency of 590. No covariance coverage is found to be below 1.000.

I am currently working under the assumption that this will not affect my results, but I would like to know if there are specific types of errors in either the dataset or my Mplus input instructions that may have caused this?

Thanks,

Kieran Mepham

Linda K. Muthen posted on Monday, September 26, 2016 - 3:14 pm

The one missing data pattern is a pattern of no missing.

Ebrahim Hamedi posted on Sunday, October 23, 2016 - 11:02 pm

Hi
from the manual: "The default is to estimate the model under missing data theory using all available data".

1- is this another way to say Mplus's default is to use the Full Information Maximum Likelihood Method?

2- is there any occasion that this default is not used in mplus (for example, when using a certain variable type, or a certain estimator, or a certain rate of missingness)?

Many thanks for your answers.

Linda K. Muthen posted on Monday, October 24, 2016 - 6:40 am

1. For maximum likelihood estimators, yes.

2. No.

Laura Groves posted on Monday, November 14, 2016 - 6:58 am

Hello Dr. Muthen,

I'm trying to use LISTWISE=ON in the DATA command as I would like to use an MLM estimation; however, I keep getting the error message "Estimator MLM is only available when LISTWISE=ON is specified in the DATA command. Default estimator will be used". I don't understand why it isn't recognising that I've specified listwise on. Any help would be much appreciated!

Thank you,
Laura

Bengt O. Muthen posted on Monday, November 14, 2016 - 5:10 pm

Please send your output to Support along with your license number.

Sofie Wouters posted on Tuesday, November 15, 2016 - 4:46 am

We have cross-sectional data for which Little's MCAR test divided by its df is somewhat larger than 2, suggesting that our data are NMAR. However, the extent of missingness is only small (between 1 and 2% in a large dataset). Would such a model be estimated in Mplus without being biased? Or would one need auxiliary variables? Can these data be MAR despite the Little test?

Thanks in advance for the feedback!

Sofie Wouters posted on Tuesday, November 15, 2016 - 5:38 am

Taking a second look, we saw that including the mean of two variables as a third variable when calculating the Little test caused the test to be higher than 2 (removing that third variable resulted in a value lower than 2, suggesting MAR). Still, I'm interested in what the options were if the data had been MNAR :-)

Bengt O. Muthen posted on Tuesday, November 15, 2016 - 5:28 pm

Rejecting MCAR still makes it possible that MAR holds.

DavidBoyda posted on Monday, December 19, 2016 - 10:40 am

Apologies, this is probably not the correct area for this question. I am experiencing an odd anomaly in Mplus (I think). I have a variable with 42 endorsed Yeses. However, when I use this variable in a model, the univariate proportions and counts show this:

Y5
Category 1 0.895 212.615
Category 2 0.105 25.010

What's more, I can visually count 42 in SPSS. It's a binary variable. Any ideas? Is there a command in Mplus I am missing?

DavidBoyda posted on Monday, December 19, 2016 - 10:56 am

Apologies. It's the weight command.

Bengt O. Muthen posted on Monday, December 19, 2016 - 6:05 pm

Sounds like you are ok then.
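
For anyone puzzled by fractional counts like those above, here is a minimal sketch of how weighted category counts and proportions arise. The data, weights, and function name are hypothetical; the point is only that each case contributes its weight rather than 1 to the count.

```python
from collections import defaultdict


def weighted_counts(categories, weights):
    """Return {category: (weighted count, weighted proportion)}."""
    totals = defaultdict(float)
    for category, weight in zip(categories, weights):
        totals[category] += weight  # each case adds its weight, not 1
    grand_total = sum(totals.values())
    return {c: (t, t / grand_total) for c, t in totals.items()}


# Hypothetical binary item: two "no" cases and two down-weighted "yes" cases
answers = [0, 0, 1, 1]
case_weights = [1.0, 1.0, 0.5, 0.5]
print(weighted_counts(answers, case_weights))
```

A raw count of endorsed cases will therefore match the output only when all weights equal 1.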

Anders Albrechtsen posted on Wednesday, December 21, 2016 - 2:28 am

Dear Dr. Muthen,

I'm new to Mplus and I'm struggling to do a linear regression with missing data in the independent variables. Unfortunately Mplus performs listwise deletion even when "listwise=on" is not specified. Why is that? As I understand Mplus should handle missing data with FIML by default.

It happens in even the simplest models such as this example:
http://www.ats.ucla.edu/stat/mplus/seminars/Intro_Mplus_74/analyze_74.htm

When I run the same syntax on this dataset Mplus deletes 124 of 200 cases, but in the example it keeps all 200 cases.

I don't understand what I'm doing wrong?

BR,
Anders

Anders Albrechtsen posted on Wednesday, December 21, 2016 - 3:17 am

Update:
I believe I solved the problem by adding the variables with missing values in brackets:

Syntax:
title: Multiple regression with missing data
data:
file is hsbmis2.dat;
variable:
names are
id female race ses hises prog academic read write math science socst hon;
usevariables are write read female math;
Missing are all (-9999);
model:
write on female read math;
[female read math];

Bengt O. Muthen posted on Wednesday, December 21, 2016 - 6:02 pm

Regression analysis is done conditional on the covariates and therefore they cannot have missing data. FIML doesn't change this. One can however extend the model to include the covariates which you have done. All of this is explained in chapter 10 of our new book mentioned on our home page.

Anders Albrechtsen posted on Thursday, December 22, 2016 - 3:20 am

Dear Dr. Muthen,

Thank you for the clarification.

Best regards,
Anders

Elana McDermott posted on Tuesday, January 03, 2017 - 12:13 pm

I am trying to impute data for a multi-group structural equation model but am new to imputation and am having difficulty assessing the most appropriate course of action.

My model uses categorical (dependent) and latent continuous variables (independent, mediator); data are non-normal; and, in addition to the grouping variable, data are clustered (multi-level). Should I use the TYPE = BASIC TWOLEVEL; command in this situation?

In addition, a colleague mentioned predictive mean matching may be appropriate - but was uncertain whether this technique is available in Mplus. Is it, or something similar, available?

Thank you in advance.

Bengt O. Muthen posted on Tuesday, January 03, 2017 - 2:09 pm

Why do you prefer imputation to simply using ML under the standard MAR assumption?

Elana McDermott posted on Tuesday, January 03, 2017 - 2:27 pm

Thank you for the quick response. I am missing data across the indicators of my main independent variable (a latent factor) and in several covariates. By using imputation I was hoping to avoid sample size reduction due to missing on x-variables.

Bengt O. Muthen posted on Tuesday, January 03, 2017 - 5:17 pm

You don't need imputation for missing on x variables. You can "bring the x's into the model" by mentioning their means, variances, or covariances. Missingness will then be accepted on the x's and handled via FIML (we write about this in our new book).

Allison Ross posted on Thursday, January 05, 2017 - 5:55 am

The error message I'm receiving says that there is a non-missing blank at record 1 field 15. I cannot, however, locate any problem with this case. What, specifically, should I be looking for with this error message?

Thank you!

Linda K. Muthen posted on Thursday, January 05, 2017 - 6:10 am

Blanks are not allowed with free format data. Apparently your data set contains a blank at record 1 field 15.

Please send the output, data set, and your license number to support@statmodel.com if you cannot solve the problem.

Anders Albrechtsen posted on Monday, January 09, 2017 - 4:53 am

Dear Dr. Muthen,

I'm doing a multiple regression with data from a P&P survey containing missing data in both the dependent and manifest variables. Data weights have been added to correct for disproportional stratification. Data seems to be MAR.

The model is specified as follows:

Y on X1 X2 (etc.)

Estimator = MLR

Without accounting for sample weights, the model and missing data handling work quite nicely, giving a much better model fit compared to pairwise or listwise deletion. However, once sample weights are added to the model, standard errors increase significantly, resulting in an overall worse fit.

I have estimated the exact same model in SPSS using pairwise deletion with sample weights toggled on and off. The SPSS model with pairwise deletion and sample weights has slightly lower standard errors and a higher R-square (0.724 vs. 0.674) compared to the Mplus MLR model. I find this quite baffling, since MLR should at least equal pairwise deletion unless the model and missing data handling have been misspecified.

What would you recommend I do next in order to improve model fit with sample weights turned on?

Best regards,
Anders

Bengt O. Muthen posted on Monday, January 09, 2017 - 1:25 pm

I assume that Y is a latent variable measured by several indicators so that model fit is an issue. Typically pairwise deletion is different and not as good as ML(R) under the MAR assumption. Judging by SEs and R-square is not necessarily the best approach. If you have missing on x's you may want to "bring them into the model" by mentioning their variances. We discuss these missing data matters in Chapter 10 of our new book.

Anders Albrechtsen posted on Monday, January 09, 2017 - 11:51 pm

Dear Dr. Muthen,

Thanks.

Y is not a latent variable, but an overall satisfaction rating from a specific question in the questionnaire, i.e. "Overall, how satisfied are you with this train journey?"

Thus it's a simple multiple regression model:

Y on X1 X2 etc.,

where the X's are specific satisfaction ratings, e.g. satisfaction with train punctuality, cleanliness etc.

The idea is to derive the key satisfaction drivers for a key driver analysis.

I will definitely read chapter 10 once the book arrives in my mail box.

Best regards,
Anders

samah Zakaria Ahmed posted on Sunday, January 22, 2017 - 11:02 am

I have two latent class variables and 10 items (5 items for each latent class variable); all items are binary. How can I write the MODEL command to assign the items to their latent class variable?
I tried to write:
c1 by u1-u5
c2 by v1-v5
but I got a warning message.

Luke Rapa posted on Friday, February 17, 2017 - 6:49 am

I have a well-fitting measurement & structural model (RMSEA for both is .03 and CFI/TLI are .95 or higher). In both models, all individual indicators are loading significantly as expected and at the .50 level or higher. The structural model is longitudinal and includes multiple waves of data; there are some variables in the model with a high degree of missingness. I’m using MLR due to some non-normality.

When I add auxiliary variables to help address missingness, I find that the signs of certain factor loadings are reversing and paths that were previously significant no longer are so. I am using the following code to declare auxiliary variables:

AUXILIARY = (m) v1 v2 v3;

Is there a problem with this declaration of auxiliaries? If not, is there a reason that various factor loadings would change direction (reverse signs, from positive to negative loadings) and paths would become non-significant with the addition of these auxiliary variables?

Bengt O. Muthen posted on Friday, February 17, 2017 - 2:07 pm

The missing data handling is different as intended so it sounds like the missingness is strongly selective. See also our short course handout and video on this on our website under Topic 4.

Anne Black posted on Wednesday, March 15, 2017 - 6:42 am

Dear Dr. Muthen,
I am estimating a SEM with a categorical outcome, and several categorical covariates with varying amounts of missing data. I am using type=complex and am specifying grouping, weight, stratification, and cluster variables. I was hoping to bring the incomplete covariates into the model and use estimator=Bayes to handle the missing data, but this isn't an option with multiple groups analysis. Can you suggest another way to handle the missingness? Thank you!

Bengt O. Muthen posted on Wednesday, March 15, 2017 - 6:23 pm

That's a hole in what's available. As an alternative that's not optimal you could use ML and treat the covariates as continuous (normal).

Ads posted on Saturday, April 08, 2017 - 8:24 am

I was looking to use dummy coded variables as a part of the auxiliary (m) statement. Is this acceptable practice when all model DVs are continuous?

I would think this meets assumptions of saturated correlates, where all dummy auxiliary variables are correlated with each other as well as with IVs and with residuals of DVs. Correlations between continuous and dummies would be point-biserial, and the correlations among dummies themselves would be phi correlations.

I noticed one suggestion on this thread to use dummies on the auxiliary line, but another post on these forums (http://www.statmodel.com/discussion/messages/22/3457.html?1481589074) mentioned "continuous variables only" in the context of auxiliary variables. However, the original poster was talking about declaring the dummies as categorical, which I presume you wouldn't have to do if using MLR estimation and assuming the dummies function in terms of point-biserial and phi correlations?

Bengt O. Muthen posted on Monday, April 10, 2017 - 7:06 pm

You are right.
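
A small sketch of the point made in the question: the ordinary Pearson correlation computed on two 0/1 dummies is the phi coefficient from their 2x2 table, so no special declaration is needed for the arithmetic to hold. The data and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def phi(x, y):
    """Phi coefficient from the 2x2 table of two 0/1 variables."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Two hypothetical dummy-coded auxiliary variables
dummy1 = [1, 1, 1, 0, 0, 0, 1, 0]
dummy2 = [1, 1, 0, 0, 0, 1, 1, 0]
print(pearson(dummy1, dummy2), phi(dummy1, dummy2))
```

The same reasoning gives the point-biserial correlation when one of the two variables is continuous.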

Ebrahim Hamedi posted on Tuesday, August 01, 2017 - 4:23 pm

Hi
With a binary outcome, using either WLSMV or MLR, I get these warnings:

*** WARNING
Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 206
*** WARNING
Data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis. Number of cases with missing on all variables except x-variables: 80

This is basically leaving out any cases with a single missing value (=listwise). When I use "listwise=on", I get the same final number of observations, without these warnings.

So, with categorical outcomes, there is no way to use FIML instead of listwise?

thanks in advance,
Ebi

Ebrahim Hamedi posted on Tuesday, August 01, 2017 - 4:27 pm

Just wanted to add that I am using "type=twolevel".

Bengt O. Muthen posted on Tuesday, August 01, 2017 - 5:56 pm

I assume you have a single DV in which case this happens - FIML doesn't kick in this univariate response case. You can "bring the x's into the model" by mentioning their variances and "FIML" would be activated because now you have a multivariate response case.

See also chapter 10 of our new book where these issues are described.

Ebrahim Hamedi posted on Tuesday, August 01, 2017 - 8:03 pm

Problem solved. Much appreciated. The whole data set seems to be used in the analysis now. However, I still get this warning:

*** WARNING
Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 32

Just to be sure, are these cases with missing on "all" x variables?

cheers

Bengt O. Muthen posted on Wednesday, August 02, 2017 - 4:30 pm

Yes.

Amaia Calderon posted on Wednesday, October 11, 2017 - 12:51 am

Good morning!
I am running a growth mixture model with some predictors of class membership. Some of these predictors are scales composed of individual items with a lot of missing data. I would like to know:
1.- Is it better to impute the individual items that make up the scales, or the scale values themselves?
2.- Is it better to (a) do a multiple imputation of the predictor variables outside Mplus (i.e., using Stata) and then use the imputed data sets in Mplus, or (b) do the imputation in Mplus?
3.- In scenario (a), how can I combine the different Stata data sets for use in Mplus? Is there any guidance on this in the Mplus User's Guide?
4.- In scenario (b), where can I find some help with writing the code for multiple imputation of the predictors in Mplus?

Thanks a lot for your help!

Bengt O. Muthen posted on Wednesday, October 11, 2017 - 2:27 pm

1. That's a research question suitable for SEMNET.

2-4. I would do the imputations in Mplus - see the UG example 11.5.

Christoph Herde posted on Tuesday, January 09, 2018 - 5:22 pm

Dear Prof. Muthen,

I am working with a data set with a lot of missing values.

After I have done some latent variable analyses, there are some reasons why I would love to generate some manifest scores (simply average some items and use the resulting variable in my model).

How does Mplus work with missings when I define/compute a new variable?

Does Mplus first apply FIML and then compute the variable based upon a "full" data set? If not, is it possible to do this in any way?

I have checked the user manual and this forum, but was unable to find an answer to my question.

I would highly appreciate a short reply.
Thanks in advance!

Bengt O. Muthen posted on Tuesday, January 09, 2018 - 5:33 pm

Do you mean creating the average of observed variables in the DEFINE command?

Christoph Herde posted on Tuesday, January 09, 2018 - 5:39 pm

yes, indeed.
Or is there another way to get a manifest average score across items that is computed from a "complete" data set?

Bengt O. Muthen posted on Tuesday, January 09, 2018 - 5:59 pm

These Define rules are described in the V8 UG on pages 643 and 644. The mean uses those items that don't have missing. The sum deletes subjects with missing on any item (I think I got that right). FIML is not applied because Define is not part of the Model command.
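
A minimal sketch of the two behaviours as described in this reply, written outside Mplus to make the difference concrete. The function names are hypothetical, and since the reply itself is tentative about the sum rule, the User's Guide pages cited above should be treated as the authority.

```python
def mean_available(items, missing=None):
    """Average over the items that are observed; missing if none are."""
    observed = [v for v in items if v != missing]
    return sum(observed) / len(observed) if observed else missing


def sum_complete(items, missing=None):
    """Sum of the items; missing as soon as any single item is missing."""
    if any(v == missing for v in items):
        return missing
    return sum(items)


# Hypothetical respondent who skipped the second of three items
respondent = [2, None, 4]
print(mean_available(respondent), sum_complete(respondent))
```

Under these rules a mean score is retained for a respondent with partial item data, whereas a sum score is not.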

Daniel Leopold posted on Monday, January 15, 2018 - 3:25 pm

Is there any way to calculate or obtain summary statistics (i.e., descriptives, such as mean and SD) for the covariance coverage matrix provided in the Mplus output?

It's reasonable to do this by hand/Excel for small models, but I'm hoping to provide this data for multiple, large longitudinal models.

Many thanks,
Dan

Bengt O. Muthen posted on Monday, January 15, 2018 - 3:56 pm

No. We haven't had such a request.
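
Since there is no built-in option, one workaround is to recompute the coverage from the raw data outside Mplus and summarize it there. The sketch below assumes the data are available as a list of rows with a known missing-value flag; the function names are hypothetical. Coverage is taken to be the proportion of cases with both variables of a pair observed (with the diagonal giving the proportion observed on each variable).

```python
import statistics
from itertools import combinations_with_replacement


def covariance_coverage(rows, missing=None):
    """Proportion of cases with both variables observed, for every pair (i <= j)."""
    n_cases = len(rows)
    n_vars = len(rows[0])
    coverage = {}
    for i, j in combinations_with_replacement(range(n_vars), 2):
        both_observed = sum(
            1 for r in rows if r[i] != missing and r[j] != missing
        )
        coverage[(i, j)] = both_observed / n_cases
    return coverage


def summarize(coverage):
    """Minimum, mean, and SD of the coverage values."""
    values = list(coverage.values())
    return {
        "min": min(values),
        "mean": statistics.fmean(values),
        "sd": statistics.pstdev(values),
    }


# Hypothetical data: 4 cases, 3 variables, None marks a missing value
data = [[1, 2, None], [1, None, 3], [None, 2, 3], [1, 2, 3]]
print(summarize(covariance_coverage(data)))
```

For several large longitudinal models the same two functions can be looped over each model's variable list.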

Erik Ruzek posted on Wednesday, February 14, 2018 - 8:42 am

I have seen varying advice about bringing x-variables into the model for FIML estimation. The two options I've seen discussed are as follows:

1) Include the means or variances of x-variables in the MODEL statement. The implication is that doing so also introduces the covariances into the estimation. However, covariances are not reported.

2) Include the covariances, in which case the output shows both the variances and covariances. The number of parameters increases with this option.

Which of these options is the proper way to add X-variables into the model?

Thank you.

Bengt O. Muthen posted on Wednesday, February 14, 2018 - 4:26 pm

Alt 2) is what you get: if you bring all x's into the model, they are correlated by default. No x's should be uncorrelated.

Erik Ruzek posted on Wednesday, February 14, 2018 - 7:24 pm

Thank you, Bengt. Is there any advantage to specifying the covariances of the x-variables in the MODEL statement?

Bengt O. Muthen posted on Thursday, February 15, 2018 - 4:33 pm

I just think it is a good default approach. We seldom have theories about zero x-variable correlations - most theories are about y's as a function of x's. If you mis-specify and don't include an x-covariance where there should be one, you get misfit and perhaps distorted results.

Erik Ruzek posted on Friday, February 16, 2018 - 9:05 am

Bengt, I see your point. May I clarify just a bit further? Imagine I am running a regression with multiple predictors and want to use FIML to recover missing data and I do not add the covariances to the MODEL statement. If I am interested in the effect of a particular X on Y, can I no longer assume that I am holding all other Xs constant?

Bengt O. Muthen posted on Friday, February 16, 2018 - 6:10 pm

Yes, you can assume that - but it is an unusual regression model where some x's are specified to be uncorrelated.

Erik Ruzek posted on Saturday, February 17, 2018 - 8:39 am

I think one more clarification will settle this for me for good. Even if I have no missing data on my predictors, and I run the following model:

Y ON X1 X2 X3;

Would you still suggest that I add 'X1-X3 WITH x1-x3' to account for any correlation among the predictors?

Bengt O. Muthen posted on Saturday, February 17, 2018 - 11:58 am

No. When you don't say anything about ("mention") the parameters of the marginal distribution of the x's (that is, you don't say x1-x3; or [x1-x3]; or x1-x3 with x1-x3;), they are correlated as the default - it's just that these parameters are not estimated; just like regular regression.

But when you don't mention the x's, subjects with missing on any x will be deleted.
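
The case-selection rule described in this thread can be sketched as follows. This is a simplification based only on the replies and warning messages quoted here, not a description of the Mplus implementation; the function name and data layout are hypothetical.

```python
def cases_used(rows, y_idx, x_idx, x_in_model, missing=None):
    """Count cases retained under the rule described in the thread.

    x_in_model=False: x's are conditioned on, so a case needs all x's
                      observed and at least one y observed.
    x_in_model=True:  x's are treated like y's, so a case is dropped
                      only if every variable is missing.
    """
    kept = 0
    for r in rows:
        y_observed = [r[i] != missing for i in y_idx]
        x_observed = [r[i] != missing for i in x_idx]
        if x_in_model:
            keep = any(y_observed) or any(x_observed)
        else:
            keep = all(x_observed) and any(y_observed)
        kept += keep
    return kept


# Hypothetical data: column 0 is y, columns 1-2 are x's
data = [[1, 2, 3], [1, None, 3], [None, 2, 3], [None, None, None]]
print(cases_used(data, y_idx=[0], x_idx=[1, 2], x_in_model=False))
print(cases_used(data, y_idx=[0], x_idx=[1, 2], x_in_model=True))
```

Comparing the two counts for a real data set gives a quick idea of how many cases are recovered by mentioning the x's in the MODEL command.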

Xinyu Ni posted on Monday, April 02, 2018 - 12:04 pm

Hi, I am having difficulty checking missing data patterns in saved data. I use PATTERNS in the OUTPUT command to ask Mplus to report the missing data patterns for all the indicators of a latent class growth model. Currently, there are 58 patterns and several have large frequencies. I was wondering how I can inspect those particular patterns in the data. Can Mplus create a flag variable that stores the missing data pattern for each case and save it via the SAVEDATA command? Thanks

Bengt O. Muthen posted on Monday, April 02, 2018 - 3:14 pm

There is not an option to do this.
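
One possible workaround is to build the flag variable outside Mplus from the raw data. The sketch below assumes the indicators are available as a list of rows with a known missing-value flag; the function name is hypothetical. Pattern numbers are assigned in order of first appearance and will not necessarily match the numbering in the Mplus output.

```python
from collections import Counter


def pattern_flags(rows, missing=None):
    """Assign each case the id of its observed/missing pattern."""
    pattern_ids = {}
    flags = []
    for r in rows:
        key = tuple(v != missing for v in r)  # True = observed
        if key not in pattern_ids:
            pattern_ids[key] = len(pattern_ids) + 1
        flags.append(pattern_ids[key])
    return flags, Counter(flags)


# Hypothetical data: 4 cases, 2 indicators, None marks a missing value
data = [[1, 2], [None, 2], [3, 4], [None, 5]]
flags, frequencies = pattern_flags(data)
print(flags, dict(frequencies))
```

The resulting flag can be merged back into the data file as an extra column and used to select or describe the cases in the frequent patterns.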

carlo di chiacchio posted on Monday, April 23, 2018 - 6:25 am

Dear Prof. Muthen,

I ran a path model on a complex survey data set.
I specified a multivariate normal distribution for the dependent and independent variables so as to use all the information in the data (FIML).
Looking at the output, Mplus warns me that 25 cases were deleted because of missing data on all the variables except for one of the dependent variables.
Subsequently I ran a multiple group path model (boys vs girls) with the same specification as above. This time there was no warning in the output.
Since the results of the indirect effects were different comparing the overall model with the multiple group model, my question is: why in the overall model the 25 cases were deleted from the analysis and, in contrast, in the multigroup analysis all the cases were used?

Thank you so much for your help.
Carlo

Bengt O. Muthen posted on Monday, April 23, 2018 - 4:45 pm

Please send both outputs to Support along with your license number.

Mengting Li posted on Monday, May 28, 2018 - 4:58 am

Hi,
I did not find the code for the MCAR test in the user's guide. Could you please tell me how to write the code?
Thank you so much for your help!

Bengt O. Muthen posted on Monday, May 28, 2018 - 5:40 pm

We don't have a special code for this.

Evelyn Tan posted on Tuesday, May 29, 2018 - 7:27 pm

Dear Prof. Muthen,
I’m working with a dataset with some missing data (no more than 20%), and would like to report on the descriptives and frequencies of the continuous and categorical variables. I would like to use FIML to deal with the missing data.
I can see that the descriptives for the imputed continuous variables are reported in “ESTIMATED SAMPLE STATISTICS”; however, I can’t seem to get the frequencies for the imputed categorical variables (I can only find these for the original dataset under UNIVARIATE SAMPLE STATISTICS).
I’m a relatively new user of Mplus, and have tried reading the user manual and this forum, but can’t seem to find an answer. I’m hoping you might be able to point me in the right direction?
Many thanks,
Evelyn

Linda K. Muthen posted on Wednesday, May 30, 2018 - 2:30 pm

Please send your output and license number to support@statmodel.com.

Ellen Houben posted on Tuesday, June 26, 2018 - 4:28 am

Dear Dr. Muthén,

I performed SEM using ML estimation. As Mplus estimates certain missing values, I added the information of all people who started filling in the questionnaire. I got the following warnings:

*** WARNING
Data set contains cases with missing on all variables.
These cases were not included in the analysis.
Number of cases with missing on all variables: 498

This first one is logical, as it is the exact number of control variables that were missing. But I cannot make sense of the following warnings.

*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 23
*** WARNING
Data set contains cases with missing on all variables except
x-variables. These cases were not included in the analysis.
Number of cases with missing on all variables except x-variables: 182

As I wish to obtain the descriptive statistics using SPSS, I would like to know exactly what my sample is. Can you help me?

Bengt O. Muthen posted on Tuesday, June 26, 2018 - 3:20 pm

Try Type=Basic.
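
A minimal sketch of such a run, assuming a hypothetical data file and variable names (mydata.dat, y1-y3, x1-x3) and an assumed missing-value code of -999. TYPE = BASIC estimates no model; its output reports the number of observations used, and the PATTERNS option adds the missing data patterns with their frequencies, which can be compared with the sample reported in the model run:

```mplus
TITLE:    Sketch: descriptive check of the analysis sample
DATA:     FILE = mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = id y1-y3 x1-x3;      ! hypothetical variable names
          USEVARIABLES = y1-y3 x1-x3;
          MISSING = ALL (-999);        ! assumed missing-value code
ANALYSIS: TYPE = BASIC;                ! sample statistics only, no model
OUTPUT:   PATTERNS;                    ! missing data patterns and frequencies
```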

Snigdha Dutta posted on Tuesday, August 21, 2018 - 8:20 am

I wanted to clarify:
If my data is multivariate non-normal, continuous, and with missing data, I have to first check if it is MAR, MCAR, nonignorable.

Then run the model using ESTIMATOR IS FIML?

Bengt O. Muthen posted on Tuesday, August 21, 2018 - 6:02 pm

Q1: No. And it isn't possible to determine if the data are MAR or NMAR.

Q2: Just do this (which is assuming MAR, which is probably the best you can do in most cases).

Snigdha Dutta posted on Wednesday, August 22, 2018 - 3:22 am

Thank you for your reply.
In case of non-normal continuous data with missingness, is FIML preferred or MLR?

Bengt O. Muthen posted on Wednesday, August 22, 2018 - 4:44 pm

MLR.
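
A minimal sketch of what this might look like in an input file, assuming hypothetical variable names and an assumed missing-value code of -999. Note that the ESTIMATOR option has no FIML setting; the full-information treatment of missing data comes with the maximum-likelihood estimators, MLR included:

```mplus
TITLE:    Sketch: MLR with missing data on continuous outcomes
DATA:     FILE = mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = y1-y4 x1 x2;         ! hypothetical variable names
          MISSING = ALL (-999);        ! assumed missing-value code
ANALYSIS: ESTIMATOR = MLR;             ! ML with non-normality robust standard errors
MODEL:    f BY y1-y4;                  ! illustrative one-factor model
          f ON x1 x2;
```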

Shaljan Areepattamannil posted on Monday, September 24, 2018 - 10:55 am

I am running a three-level regression model. Data set contains cases with missing values. I am using FIML to handle missing data. However, I get the following warnings:

*** WARNING
Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 104633
Data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis. Number of cases with missing on all variables except x-variables: 7671

I would like to include these cases in the analysis. What should I specify in the Mplus syntax? Thank you in advance.

Bengt O. Muthen posted on Monday, September 24, 2018 - 4:49 pm

You can avoid this by including the x's in the model - mention their variances. Pros and cons of this are described in Chapter 10 of our RMA book.

Shaljan Areepattamannil posted on Thursday, September 27, 2018 - 9:51 pm

The three-level data set contains 48 level-3 clusters. When I run the three-level regression model using MLR, Mplus includes only 46 level-3 clusters. Would it be possible to know which clusters Mplus excluded from the analysis? Thank you.

Bengt O. Muthen posted on Friday, September 28, 2018 - 11:10 am

Please send your full output and data to Support along with your license number.

Pia Kreijkes posted on Friday, November 09, 2018 - 4:27 am

Is there a way to impute missing data using expectation maximisation? I couldn't find anything in the user guide or on the discussion board except for a post that is more than 10 years old. I performed MI before but now decided to use EM after all (because of several reasons, including very small proportion of missing data). I have complex data (two levels) and most variables are ordinal and quite skewed.
Many thanks

Bengt O. Muthen posted on Friday, November 09, 2018 - 1:05 pm

When you say EM I think you are referring to ML under MAR (which some call FIML). ML does not impute data but estimates model parameters (in the H0 case) or means, variances, and covariances (in the H1 case) that take into account the missing data.

Pia Kreijkes posted on Friday, November 09, 2018 - 2:26 pm

Thanks for your prompt reply. However, I actually mean the expectation maximisation algorithm to impute missing values (for instance available in SPSS) so that I can analyse a complete data set. I did not want to use FIML because of the nature of my data. My variables are measured on a 5-point Likert-type scale and responses often fall on the extreme ends. In addition, I'd like to use all my items for the imputation, not only those in the model I'm estimating at a given point, which I think would be the case with FIML. Amongst others, I want to conduct ESEM afterwards for which I am planning to use the WLSMV estimator. Hence, I want the complete data set before doing any analyses.

Tihomir Asparouhov posted on Friday, November 09, 2018 - 3:18 pm

The EM algorithm applies only to continuous items - not categorical. Mplus uses the standard Bayes estimation for imputing data. See Mplus User's Guide example 11.5 for the syntax. This can be used for imputing categorical items as well and it would be our recommendation for your situation. If you need more information on the Mplus imputation method see
http://statmodel.com/download/Imputations7.pdf
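
A minimal sketch in the style of the user's guide imputation example, with hypothetical file and variable names and an assumed missing-value code of -999. All items are listed in IMPUTE so that they all contribute to the imputation, and the ordinal items are flagged as categorical:

```mplus
TITLE:    Sketch: Bayes multiple imputation using all items
DATA:     FILE = items.dat;            ! hypothetical file name
VARIABLE: NAMES = u1-u20 x1 x2;        ! hypothetical variable names
          MISSING = ALL (-999);        ! assumed missing-value code
DATA IMPUTATION:
          IMPUTE = u1-u20 (c) x1 x2;   ! (c) treats u1-u20 as categorical
          NDATASETS = 10;              ! number of imputed data sets
          SAVE = itemsimp*.dat;        ! asterisk is replaced by the data set number
ANALYSIS: TYPE = BASIC;
```

The saved data sets can then be analysed together in a second run by adding TYPE = IMPUTATION to the DATA command and pointing FILE at the list file that is written alongside them.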

Pia Kreijkes posted on Tuesday, November 13, 2018 - 11:40 pm

Ah great that you made me aware of that! For some reason I assumed that EM is suitable for categorical data. Could I then use Bayes estimation as described in the UG and only impute a single data set rather than multiple? My issue is that I will later do some analyses in Mplus that cannot handle multiple data sets, which is why I decided not to go that route. My overall proportion of missing data is below 2% which is why I thought single imputation would be appropriate and should not consequentially underestimate SEs.

Bengt O. Muthen posted on Wednesday, November 14, 2018 - 4:49 pm

Imputation is not needed. You can use ML (FIML) - it is available also for categorical items. Or use Bayes.

Pia Kreijkes posted on Thursday, November 15, 2018 - 12:15 am

Thank you. I have some follow up questions.

1. If I use estimator=MLR and specify my data as categorical, no fit indices are provided. Is that also the case for Bayes? In that case, one option might be not to specify my data as categorical because I have 5 response categories, although the distributions are quite asymmetric.
2. In the FAQ Estimator choices with categorical outcomes, you write that “Bayes and maximum-likelihood are asymptotically equivalent when non-informative priors are used for Bayes”. I’m not very familiar with Bayesian approaches, so I’d like to know what it means to use non-informative priors. When I specify a model, is that an informative prior?
3. Is Bayes still not available for ESEM or can it be used by now?
4. Lastly, just to clarify once more, imputing only one data set using Bayes and then using the WLSMV estimator is inappropriate?

Bengt O. Muthen posted on Thursday, November 15, 2018 - 2:58 pm

1. Try Bayes - it gives PPP with categorical outcomes (see our Bayes papers about it and its low power).

2. The default of Mplus is to use non-informative priors. To quickly learn about Bayes, listen to our Short Course Topic 11 section on Bayes - it also exists in YouTube clips (see our home page about the Mplus YouTube channel).

3. Bayes is available for EFA, not for ESEM.

4. Right - because you don't get the right SEs or chi-square test of fit from only one imputation - the uncertainty in the missing data values is not fully represented.
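
Point 4 can be made concrete with the usual combining rules for multiple imputation (Rubin's rules), stated here as standard textbook formulas rather than as a description of Mplus internals. With m imputed data sets, estimates \hat{Q}_j and squared standard errors U_j:

```latex
\bar{Q} = \frac{1}{m}\sum_{j=1}^{m} \hat{Q}_j, \qquad
\bar{U} = \frac{1}{m}\sum_{j=1}^{m} U_j, \qquad
B = \frac{1}{m-1}\sum_{j=1}^{m} \bigl(\hat{Q}_j - \bar{Q}\bigr)^2, \qquad
T = \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B
```

The between-imputation variance B is the part that carries the uncertainty about the missing values. With a single imputation (m = 1) it cannot be computed, so the reported variance is only the within-imputation part and the standard errors come out too small.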

Zehua Cui posted on Wednesday, January 16, 2019 - 7:12 am

Hi Dr. Muthén,
I am a graduate student learning Mplus, could you help me with some of my confusions about mentioning variables in the model command to deal with missing data?
1) Do I need to mention latent predictor variables? Or do I need to mention their indicators?
2) For a simple moderation, do I also need to mention my moderator as well as the interaction term in the model?
3) Also, the majority of my predictor/control variables are continuous, but some of them are binary, such as gender. Can I mention them all together in the model command? Will there be a problem? I noticed that when I include gender together with the continuous variables in the model command, I get a message saying “THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS .....PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 24, W1UPS_T”
However, when I take gender out, the message goes away. I also noticed that when I add bootstrap in the analysis command, the error message goes away as well. Do you have any advice on why that is?
4) Do you have any suggestions as to what I should do when I have both categorical and continuous variables as x variables? (Below is my model command in the syntax; is it correct?)

Model:
[Gender W1COP W1RSA W1UPS_T INT];

W2_Aggr on Gender W1UPS_T W1RSA W1COP INT;

Bengt O. Muthen posted on Wednesday, January 16, 2019 - 2:06 pm

1) No on both.

2) Yes

3) Q1: Yes. Q2: No. Ignore the error message in this case.

4) Your setup is fine. For much more on this, see Chapter 10 of our RMA book.

Alice posted on Tuesday, January 22, 2019 - 6:17 am

Dear Prof. Muthen,

I wonder how a correlation is calculated between two variables in Mplus when the data missing command is applied to one of the variables?

What I am after is just the formula in order to understand the correlation results when one variable has some missing values, but the other one doesn't.

Bengt O. Muthen posted on Tuesday, January 22, 2019 - 2:37 pm

See our Short Course Topic 4 video and handout, slides 173-176 or chapter 10 of our RMA book - or the Little & Rubin missing data book.
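
For orientation, the standard normal-theory log-likelihood under MAR, as presented in texts such as Little & Rubin, can be written as below. This is a general sketch rather than a statement of exactly what a particular Mplus run computes. Each case i contributes only through the p_i variables it has observed, with y_i, \mu_i and \Sigma_i denoting the observed part of the data vector and the corresponding elements of the mean vector and covariance matrix:

```latex
\log L(\mu, \Sigma) = \sum_{i=1}^{n} \left[ -\frac{p_i}{2}\log(2\pi)
  - \frac{1}{2}\log\lvert \Sigma_i \rvert
  - \frac{1}{2}\,(y_i - \mu_i)^{\top} \Sigma_i^{-1} (y_i - \mu_i) \right],
\qquad
\hat{\rho}_{12} = \frac{\hat{\sigma}_{12}}{\sqrt{\hat{\sigma}_{11}\,\hat{\sigma}_{22}}}
```

A case observed on only one of the two variables still contributes to the mean and variance of that variable, so the estimated correlation generally differs from the one computed from complete pairs only.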

Hye Jeong Choi posted on Tuesday, March 12, 2019 - 1:43 pm

I used Proc MI to create 10 imputed data sets and ran path models using Mplus. It ran well without any warning and with 10 replications (both requested and completed). However, the output showed a "rate of missing" (e.g., 0.275). I cannot understand how I can have a rate of missingness with imputed data sets. I checked my data sets and they all looked imputed.

I also checked the Mplus output, and it showed no missing data across variables (e.g., one missing data pattern, with all variables = x (not missing), and the frequency for that pattern was my N).

Why am I getting a "rate of missing" other than 0?

Tihomir Asparouhov posted on Wednesday, March 13, 2019 - 9:09 am

Take a look at the FAQ on this topic

https://www.statmodel.com/download/Missingness%20fraction.pdf

Simon Coulombe posted on Monday, May 13, 2019 - 5:30 pm

Hi,
I'm testing a model with most of its endogenous variables being binary.
-Is WLSM or WLSMV most appropriate? Note: I can test the model with a continuous OR a binary outcome.
-How are these two estimators taking care of missing values (FIML?)
- Are these two estimators robust to non-normality of the few continuous variables included in the model?
Thank you
Simon

Bengt O. Muthen posted on Tuesday, May 14, 2019 - 4:48 pm

Use MLR if you can - it provides "FIML" and is robust to non-normality for the continuous outcomes.

Simon Coulombe posted on Thursday, May 16, 2019 - 6:27 am

I've used WLSMV so far. To what extent is it robust to missing values?
Simon

Bengt O. Muthen posted on Thursday, May 16, 2019 - 4:49 pm

Read all about it in the FAQ on our website "Estimator choices with categorical outcomes":

http://www.statmodel.com/download/EstimatorChoices.pdf

carlo di chiacchio posted on Friday, May 31, 2019 - 2:23 am

Good morning.

I'm trying to estimate a SEM model with latent variables, both dependent and independent.
Some of the observed variables reflecting the latent constructs contain missing data. I would like to manage this missingness.
Usually, when I run path models, I repeat the list of variables below the MODEL command, indicating a multivariate normal distribution. FIML is the default estimator.
Which procedure should I follow in the case of a latent variable model in order to manage missing values?

Thank you so much for your help

Bengt O. Muthen posted on Sunday, June 02, 2019 - 11:15 am

Use the same procedure.

Note, however, that it is not innocuous to bring in covariates into the model and add normality assumptions, especially not with a large proportion missing and binary covariates. See our RMA book.

Richard E. Zinbarg posted on Monday, July 15, 2019 - 12:13 pm

I have a CFA model for indicators with a small number of response options and WLSMV provides better fit than ML. I also have missing data though so am unsure how much to weigh the better fit provided by WLSMV versus the missing data handling advantage of ML. Any advice on how to weigh these two considerations? Thanks in advance for any help you can provide.

Bengt O. Muthen posted on Monday, July 15, 2019 - 5:02 pm

When you say ML here, do you mean treating the variables as continuous? I assume so (but note that ML can handle categorical DVs too).

It all depends on the distribution of the indicators - you get a bigger difference in fit if there are floor or ceiling effects.

Robert Archer posted on Tuesday, July 16, 2019 - 3:43 pm

Hello Dr. Bengt and Linda Muthen, I am attempting to complete example 11.2 of the Mplus user guide (descriptive statistics for missing data). One issue I am running into is the following error message:
*** ERROR
One or more variables have a variance of zero.
Check your data and format statement.

The output shows this is one of the binary dropout variables. Any recommendations on how to solve this? Thank you very much.

Bengt O. Muthen posted on Tuesday, July 16, 2019 - 6:26 pm

Please send your full output and the data you used to Support along with your license number.

Richard E. Zinbarg posted on Wednesday, July 17, 2019 - 11:39 am

thanks Bengt. Yes, when I said ML I meant treating the variables as continuous. It sounds like we should try designating the items as categorical but specify that ML be used for the estimator? Whenever designating items as categorical, I've always used the Mplus default of WLSMV up until now.

Bengt O. Muthen posted on Wednesday, July 17, 2019 - 5:26 pm

Q1: Yes, this is the preferred method for handling missing data (or use Bayes).

See also the estimator summary in the FAQ on our website:

Estimator choices with categorical outcomes
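
A minimal sketch of that setup, assuming hypothetical item names and an assumed missing-value code of -999. Declaring the items as categorical would otherwise lead to the WLSMV default mentioned above, so the estimator is requested explicitly:

```mplus
TITLE:    Sketch: CFA with categorical items estimated by ML
DATA:     FILE = items.dat;            ! hypothetical file name
VARIABLE: NAMES = u1-u6;               ! hypothetical item names
          CATEGORICAL = u1-u6;         ! treat the items as ordered categorical
          MISSING = ALL (-999);        ! assumed missing-value code
ANALYSIS: ESTIMATOR = MLR;             ! ML instead of the WLSMV default
MODEL:    f BY u1-u6;
```

As noted earlier in this thread, the usual chi-square based fit indices are not reported for this combination.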

Frederick Anyan posted on Friday, October 18, 2019 - 11:14 am

I understand that missing by design can safely be assumed to be MCAR. I have a data set where "Depression" is measured on six occasions and "Stress" is measured on three occasions (i.e., from the 4th to the 6th occasion). Can I use the data set to estimate
(1) Parallel process LGM for depression and stress
(2) Parallel process LGM for depression and stress with covariates (time-varying or invariant?)
(3) LGM for depression, with stress regressed as a time-varying covariate on the corresponding measurement occasions (i.e., 4th to 6th occasions)

1)
ANALYSIS:
TYPE = MISSING;
ESTIMATOR = MLR;

MODEL:
I1 S1|y1@0 y2@1 y3@2 y4@3 y5@4 y6@5;
I2 S2|s1@0 s2@1 s3@2 s4@3 s5@4 s6@5;
S2 ON I1;
S1 ON I2;

2)
ANALYSIS:
TYPE = MISSING;
ESTIMATOR = MLR;
MODEL:
I1 S1|y1@0 y2@1 y3@2 y4@3 y5@4 y6@5;
I2 S2|s1@0 s2@1 s3@2 s4@3 s5@4 s6@5;
S2 ON I1;
S1 ON I2;
I1 S1 I2 S2 ON gender educ;

y1 ON z1;
y2 ON z2;
y3 ON z3;
y4 ON z4;
y5 ON z5;
y6 ON z6;

s4 ON z4;
s5 ON z5;
s6 ON z6;

3)
ANALYSIS:
TYPE = MISSING;
ESTIMATOR = MLR;

MODEL:
I1 S1|y1@0 y2@1 y3@2 y4@3 y5@4 y6@5;

I1 S1 ON gender educ;

y4 ON s4;
y5 ON s5;
y6 ON s6;

I would be grateful for your advice. Thank you.

Bengt O. Muthen posted on Friday, October 18, 2019 - 3:36 pm

I don't think you gain anything by specifying all 6 occasions for Stress; it doesn't bring in more information from the data. Just have

i2 s2 | s4@0 s5@1 s6@2;

Also, s1 ON i2 does not correspond to the time ordering because s1 includes the change from time 1 to time 2, so before i2 has a measured counterpart.
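
Putting the two points together, a possible sketch of model (1) is shown below. The growth factor names idep, sdep, istr and sstr are hypothetical and were chosen only so that they cannot be confused with the observed stress variables s1-s6, since Mplus names are not case sensitive:

```mplus
ANALYSIS: ESTIMATOR = MLR;
MODEL:    ! depression: six occasions
          idep sdep | y1@0 y2@1 y3@2 y4@3 y5@4 y6@5;
          ! stress: only the three occasions on which it was measured
          istr sstr | s4@0 s5@1 s6@2;
          ! later stress change regressed on initial depression status
          sstr ON idep;
```

The regression of the depression slope on the stress intercept is left out because, as explained above, it does not respect the time ordering.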

Frederick Anyan posted on Saturday, October 19, 2019 - 1:34 am

Right. Thank you

Jill Rabinowitz posted on Saturday, December 28, 2019 - 3:47 pm

I have a question regarding estimating missing data in continuous covariates. Above you indicate, "You can "bring the x's into the model" by mentioning their means, variances, or covariances." It's not clear to me how you mention covariate means, variances, or covariances. Could you point me to an Mplus input or output example where this is done?

Linda K. Muthen posted on Sunday, December 29, 2019 - 10:34 am

For means, see pages 729-30 of the user's guide; for variances, see page 728; for covariances, see pages 726-27 .
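
As a rough illustration (not a reproduction of the user's guide pages cited), the three ways of mentioning covariates might look as follows in a MODEL command, with hypothetical variables y and x1-x3. Going by the earlier posts in this thread, any one of the three statements brings the x's into the model:

```mplus
MODEL:    y ON x1 x2 x3;
          [x1-x3];               ! means of the covariates
          x1-x3;                 ! variances of the covariates
          x1-x3 WITH x1-x3;      ! covariances among the covariates
```

Once mentioned, the covariates are treated as part of the model with a normality assumption, which is not innocuous, especially for binary covariates, as stressed elsewhere in this thread.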

Lisa M. Yarnell posted on Wednesday, February 12, 2020 - 10:57 am

I am interested in using the IMPUTE command, and am deciding about the options explained in V8 UG p. 576 (chapter 15). Can you clarify what exactly is "restricted" in the H0 option?

I would like to understand the distinction between restricted and unrestricted models, in the Mplus sense of these terms.

Lisa M. Yarnell posted on Wednesday, February 12, 2020 - 11:22 am

Essentially, I believe that H0 is a hypothesized model, while H1 is a null model. Is that right?

In that sense this implies, based on Mplus estimation, that, for example, covariances among exogenous variables are NOT part of the H0 restricted model, while all such parameters WOULD be part of the H1 unrestricted model, in estimation and output if requested. The estimated parameters for the H0 model would thus depend on the model itself.

Is this correct?

Tihomir Asparouhov posted on Wednesday, February 12, 2020 - 5:14 pm

The above would be correct if you replace "exogenous" with "endogenous". Exogenous variables are correlated for both H0 and H1 models. With the H0 option the data is imputed from the estimated model. It can be any model. The model must be specified in the input file by you (just like any other Mplus estimation). You might find this helpful

http://www.statmodel.com/download/Imputations7.pdf

Lisa M. Yarnell posted on Thursday, February 13, 2020 - 7:13 am

Hi Tihomir, can you clarify your statement that exogenous variables ARE correlated in the H0 restricted model?

V8 UG p. 51 reads:
"Following are the default settings for covariances/residual covariances:
- Covariances among observed independent variables are NOT part of the model. The model is estimated conditioned on the observed independent variables."

I had discussed recently with Kris Preacher that this is an aspect specific to Mplus; whereas in some other SEM programs, these covariances ARE part of such an H0 model (e.g, in LISREL).

Katy Roche posted on Thursday, February 13, 2020 - 7:27 am

We are running a model with missing data and bringing the X variances into the model. However, we should have n = 547 but only have n = 490. It is unclear why it is not including all observations.

Bengt O. Muthen posted on Thursday, February 13, 2020 - 7:46 am

Answer for Yarnell:

See our FAQ: Covariates – analysis conditioned on covariates

Lisa M. Yarnell posted on Thursday, February 13, 2020 - 9:03 am

Hello, I notice that in V8 UG ex. 11.7, the DATA IMPUTATION command is used, but the IMPUTE option is not. Why not?

It seems that imputation IS done for variables in the H0 model. So, why not use the IMPUTE option? The UG reads "When the IMPUTE option is not used, no imputation of missing data for the analysis variables is done." Does its omission mean that there were no missing data, in the original data set analyzed for the H0 model, for the variables in the H0 model?

Katy Roche posted on Thursday, February 13, 2020 - 10:54 am

For the model that is not including all 547 observations (just 490): we tried mentioning the variances in the model command (right under our regression statement), but it is still deleting our observations. We then tried mentioning the means, but it still won’t work.

Bengt O. Muthen posted on Thursday, February 13, 2020 - 2:09 pm

Answer for Roche:

We need to see the outputs - send along with data to Support with your license number.

Bengt O. Muthen posted on Thursday, February 13, 2020 - 2:19 pm

Answer for Yarnell:

The intro to ex11.7 says that the aim is to obtain "plausible values". These values are what is commonly referred to as factor scores. Unlike regular factor scores, each individual gets a set of scores (20 in this case). So no imputation of missing data on the observed variables is attempted; we are only concerned with the factors.

Lisa M. Yarnell posted on Wednesday, February 19, 2020 - 8:27 am

Hello! I tried imputing data using the IMPUTATION command, but since the values I intended to impute were y variables in the H0 model, the cases were dropped, instead of having their y value imputed. This is not what I wanted.

Why the cases were dropped is not clear to me -- I have previously seen that cases that are missing x variables can be dropped; I have not seen cases dropped based on missing their y variables.

E.g., p. 443 of the V8 UG reads: "In all models, observations with missing data on covariates are deleted because models are estimated conditional on the covariates."

How can I impute into these missing y values instead of having the cases dropped? Should I send my INP and OUT files?

Brandon Goldstein posted on Thursday, February 20, 2020 - 8:56 am

Hello,
I have a simple missing data question. I am interested in running a multiple regression, but I have missing data on some of my x variables. I noticed that when using FIML, you can add correlations among the residuals and this will allow the x variables to be estimated. Is this an acceptable approach?

Bengt O. Muthen posted on Thursday, February 20, 2020 - 4:46 pm

Answer to Yarnell:

Yes, send to Support so we can see what you are doing.

Bengt O. Muthen posted on Thursday, February 20, 2020 - 4:49 pm

Answer to Goldstein:

You can "bring the x variables into the model" by saying e.g.

x1-x5 with x1-x5;

Then subjects with missing on an x won't be deleted but FIML is used. This is not innocuous, however, especially with binary x's as we explain in our RMA book - see also our Short Course Topic 11 video and handout and also YouTube video.

Stefania Pagani posted on Tuesday, April 07, 2020 - 2:32 am

Hello,

I have a dataset with a total of 2079 participants. The outcome variable reduces this to 891, as only this number of participants had the opportunity to intervene. So the missing data are not missing at random. I have run two path analyses which include mediators and moderators, one with the whole dataset, and one where I have deleted the participants who did not have the opportunity to intervene. The estimates in the output for both datasets are quite different.

My question comes back to how Mplus is dealing with this missing data, and whether it is more appropriate for me to use the full dataset or the trimmed one to interpret my results?

Thank you in advance for your help.

Bengt O. Muthen posted on Tuesday, April 07, 2020 - 3:13 pm

It sounds like you have a lot of missing on Y but not on M. If so, you should use the full data set because M (and X) can then serve as predictors of missing on Y in the usual "FIML" way (that is, assuming MAR). We describe this in our RMA book chapter 10.
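
A minimal sketch of a full-data run along these lines, assuming a simple X -> M -> Y mediation with hypothetical variable names x, m and y and an assumed missing-value code of -999 (the moderators from the original question are left out). Because M is a dependent variable in the model, cases that are missing on Y but observed on M stay in the analysis:

```mplus
TITLE:    Sketch: mediation model using all cases
DATA:     FILE = fulldata.dat;         ! hypothetical file name
VARIABLE: NAMES = x m y;               ! hypothetical variable names
          MISSING = ALL (-999);        ! assumed missing-value code
ANALYSIS: ESTIMATOR = MLR;
MODEL:    m ON x;
          y ON m x;
MODEL INDIRECT:
          y IND m x;                   ! indirect effect of x on y via m
```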

Stefania Pagani posted on Tuesday, April 07, 2020 - 11:45 pm

Hello Bengt,

Thank you for your quick response. That is correct re a lot of data missing on Y but not M. However, the data that is missing on Y is not missing at random (NMAR) because of the nature of the measure (it is a proportion score for bystander intervention behaviour but is missing for those who did not report the opportunity to intervene). Would it therefore be the case that the trimmed dataset is better to use in this instance?

Thank you.

Bengt O. Muthen posted on Wednesday, April 08, 2020 - 4:17 pm

That's ok but then your inference is to the population of those who did report the opportunity to intervene.

Stefania Pagani posted on Thursday, April 09, 2020 - 7:00 am

That makes sense. Thank you very much for your help.

Hannah M. Loso posted on Friday, April 10, 2020 - 9:36 am

Dear Drs Muthen and Muthen,

I am having some trouble with my analyses. I am doing a nested path analysis and was getting the error "data is missing on the x-axis". To remedy this, I mentioned the variances of the variables in the model command. After I did so, my model fit changed drastically and is now pretty poor. Do you have any suggestions on how I might fix this or what might be happening? Thank you so much.

Bengt O. Muthen posted on Saturday, April 11, 2020 - 5:42 pm

We need to see the specifics of your case to say - send both outputs to Support@statmodel.com along with your license number.

Rebecca Lazarides posted on Wednesday, April 15, 2020 - 1:25 pm

Hello,
I am modelling the three-step approach (manual BCH estimation) described by Asparouhov and Muthén 2014 (Regression auxiliary model combined with latent class; section 3.2, step 2).

In step 2, the following error message is shown: "***WARNING Data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis. Number of cases with missing on all variables except x-variables: 506"

I cannot include the x-variables in the model - when addressing the variance or means of the x-variables the model does not run and the following error message is shown:

"*** ERROR
The following MODEL statements are ignored:
* Statements in the OVERALL class:
C#1 ON V1013
C#1 ON ZMATH10
C#2 ON V1013
C#2 ON ZMATH10
C#3 ON V1013
C#3 ON ZMATH10
*** ERROR
One or more MODEL statements were ignored. These statements may be
incorrect or are only supported by ALGORITHM=INTEGRATION."

Is there a way to use all data - including those cases with missing on y?

Bengt O. Muthen posted on Wednesday, April 15, 2020 - 3:49 pm

We need to see your full output - send to Support@statmodel.com along with your license number.