Mplus Discussion >> Missing on x-variables

Topics
Last Day
Last 3 Days
Last Week
Tree View

Edit Profile


Missing on x-variables

Mplus Discussion > Missing Data Modeling >

Message/Author

Jiyoung posted on Friday, June 26, 2009 - 2:10 am

I added control variables to my model. I got the following message after the addition. I wanted to use the full information. Therefore, I defined the all of the variables with missing data. It seems that the program deleted cases using listwise deletiion. With the following warning, I could not see the model estimation.

Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 94
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

What can I do to see the results of the model estimation?

Linda K. Muthen posted on Friday, June 26, 2009 - 6:35 am

The only way to avoid the listwise deletion of covariates is to bring them into the model as dependent variables. You can do this by mentioning their variances in the MODEL command. You then make distributional assumptions about them.

nina chien posted on Friday, September 04, 2009 - 1:58 pm

To avoid listwise deletion of covariates, I added the covariates into the model by mentioning their variances. This resulted in a series of WARNING statements that some of my covariates are categorical (which they are): “WARNING: VARIABLE GENDERR MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.”

So, I tried stating covariates as categorical, but this resulted in error statements that the covariates are not DV’s: “ERROR in VARIABLE command. CATEGORICAL option is used for dependent variables only. GENDERR is not a dependent variable.”

I'm very confused because the WARNING and ERROR messages are contradictory. Can I ignore the warning statements?

Also, the model where covariates were declared as continuous did not converge. Is this related to the WARNING?

Thanks so much.

Linda K. Muthen posted on Friday, September 04, 2009 - 4:46 pm

You should not put covariates on the CATEGORICAL list. The message comes about because the mean and variance of a dichotomous variable are not independent of each other. I would need to see the full output and your license number at support@statmodel.com to say if this is related to convergence.

Yvonne Terry-McElrath posted on Tuesday, April 05, 2011 - 6:50 pm

Drs. Muthen -

I have been asked by a reviewer to provide a reference for exactly what Mplus is doing when bringing covariates with missing data into the model as dependent variables by steps such as mentioning their variances in the MODEL command (and thus making distributional assumptions about them). If you have any recommendations for this, I would greatly appreciate it.

Many thanks.

Linda K. Muthen posted on Wednesday, April 06, 2011 - 10:24 am

I don't know of a reference to this exactly. When you bring the x's into the model, multivariate normality is assumed for the x's and all continuous y's. This assumption is discussed in any structural equation modeling book.

Cheung hoi Shan posted on Monday, July 16, 2012 - 6:12 pm

I ran a path analysis and one of the independent variables has missing data on several cases. Apparently listwise deletion was used. Is there any way I can avoid this? Thanks.

Linda K. Muthen posted on Tuesday, July 17, 2012 - 7:45 am

Missing data theory does not apply to observed exogenous variables. The model is estimated conditioned on them. You can bring all of the covariates into the model by mentioning their variances in the MODEL command. Then they will be treated as dependent variables and distributional assumptions will be made about them.

Cheung hoi Shan posted on Tuesday, August 14, 2012 - 7:43 pm

Thanks Linda. I have run the analysis, and realise that if I bring the exogenous variable with missing data into the model, df becomes higher. I am wondering if this would pose any issue with explaining to readers how the model was specified. How do people usually justify the use of this method?

Linda K. Muthen posted on Wednesday, August 15, 2012 - 6:19 am

Your degrees of freedom should not change if you put no restrictions on the x's. Please send the output and your license number to support@statmodel.com so I can see what you did.

radanielina-hita marie louise posted on Saturday, June 15, 2013 - 7:59 am

Dear Dr. Muthen

I am running a mediational model with two covariates (relationship status and greek membership). I have missing values on these. I specified type=missing with MLR. Then I entered the covariates in the model command and Mplus uses the whole sample. I understand that missing data theory does not apply to exogenous observed variables and when I do not include them in the model, there was no substantive changes except in one of my hypothesized effect. I want to keep all sample but I am wondering what happened here that brought the change? Can I trust the output with the observed variables included in the model

Thanks

Linda K. Muthen posted on Sunday, June 16, 2013 - 3:46 pm

When you mention the means, variances, or covariances of the exogenous variables in the MODEL command, they are treated as dependent variables and distributional assumptions are made about them. This can change the results if not all variables are continuous. See the Version 6.1 version history on the website for a description.

wahideh Achbari posted on Wednesday, January 22, 2014 - 2:19 am

Dear Dr Muthen

When I am running a one-factor CFA model with continuous data, I get the following error. "Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 2"

Previously, I have run the same model without getting the error message.

I have checked whether the error is not due to misreading the input file, but still cannot find its source.

Thank you.

Linda K. Muthen posted on Wednesday, January 22, 2014 - 6:20 am

Please send the outputs where you get the message and where you don't get the message along with your license number to support@statmodel.com.

Katharina Groß posted on Monday, February 24, 2014 - 11:34 am

Dear Mplus team,

I have several questions regarding logistic regression with missing values. My data set contains 1000 cases, 300 of them with missing values. y and x1 are both binary and have missing values; x2-x4 are continuous and completely observed. I want Mplus to use all cases, including those with missing values. In order to do that I mentioned the variance of x1 and ran the following model:

ANALYSIS:
estimator = ml;
integration = montecarlo;
MODEL:
y on x1 x2 x3 x4;
x1;

My questions are as follows:

1. When I mention the variance of x1 Mplus uses 950 of 1000 cases. If I specify one further independent variable (no matter which one), all cases are included. What I don’t understand is that the coefficients of the model vary depending on which variance I choose to mention. This observation confuses me and I am not sure which variance I should mention.

2. A more general question concerns predicting probabilites. Is it right that it is not possible to predict probabilities when there are missing values on an independent variable using ML estimation?

3. Is the given R square the McKelvey & Zavoina’s R square – and can I trust the value although I have missing values on the independent variable or does this bias the R square-computation (as I would guess)?

I would appreciate any help.

Linda K. Muthen posted on Tuesday, February 25, 2014 - 1:25 pm

1. You should include the variances of all of the covariates or none of the covariates. When you include, for example, y1 and y2, they are correlated and y3 and y4 are correlated because the model is estimated conditional on them. But there are no correlations between y1, y2 and y3, y4. The zero correlations vary depending on which variances you inlcude in the model.

2. I don't think this can be done.

3. Yes.

Hanna Esser posted on Wednesday, September 03, 2014 - 6:13 am

Dear Mplus team,

I have a question regarding missing values in my path model.

Variables x1-x6 are exogenous. Four of them are binary (gender and yes/no variables), two are continuous.
Variables y1-y7 are endogenous. y1-y6 are ordinal, y7 is binary (yes/no).
y1-y6 are also exogenous, because I compute regressions on y7.

My current sample size is 4009 and I would like to use the whole sample for my analysis. Unfortunately, there are missing data in variables x2-x6. I assume that they are MCAT. When I do the analysis without extra assumptions, there are 1031 cases with missing on x-variables which get excluded from my analysis. Most of the missing data are on the variable “income” because many refused to give this information and on the variable “Parents without qualification” (yes/no).

By reading old discussions I found out that I when I mention the variances no listwise deletion happens. This is the case in my analysis; it includes all 4009 cases when I do this, but I get the warning that my exogenous variables x1-x4 are dichotomous but declared as continuous (which is true).

What can I do to include all cases and avoid the listwise deletion of 25% of my cases?

Thank you

Bengt O. Muthen posted on Wednesday, September 03, 2014 - 3:18 pm

You can do what you did and ignore the warning.

Shiny7 posted on Monday, September 29, 2014 - 3:37 am

Dear Drs. Muthen,

I ran a multilevel model with 7 x-Variables and one continous outcome using MLR.

As known, Mplus does not include cases with missings on all x-Variables, which in my case are many, many cases.

Am I right that the only solution is to mention the variances of the x-Variables in the model command? Although the assumption that my data are multivariate normally distributed is in fact not given (!) (using MLR therefore).

The problem is furhter, that I have only 21 clusters, and if i take the variances of the x-variables into the model, the number of paramters exceeds the number of clusters which is also a known problem.

Can you please help solving that problem?

Best regards
Shiny

Linda K. Muthen posted on Monday, September 29, 2014 - 9:59 am

I can't see any other approach to this.

Shiny7 posted on Monday, September 29, 2014 - 10:22 am

Dear Mrs. Muthen,

okay, thank you so much for your immediate reply...

Shiny

Sandra Coulon posted on Monday, November 24, 2014 - 9:28 am

Basic regression with type=general is listwise deleting on x. Based on this thread and also http://www.ats.ucla.edu/stat/mplus/faq/fiml_counts.htm, I mention the predictors in my model statement as in:

model:
LWCNO24 PSS24 on age24 female studyxtx BPComp BMI24 NEIGH24 GRMnGni GxNEIGH;
[age24 female studyxtx BPComp
BMI24 CE24 NSat24 NEIGH24 GRMnGni GxNEIGH];

However, when I do that, I get the following:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.206D-20. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 65, GXNEIGH.

I thought maybe this was a degrees of freedom issue but the "number of free parameters" is 65. What is the issue here, and is there another way to get the program not to listwise delete on X?

Linda K. Muthen posted on Monday, November 24, 2014 - 9:35 am

This is likely due to one or more of your predictors being binary. The mean and variance of a binary variable are not orthogonal and this can trigger the message. Comment out the means. If the message disappears, you can put them back and ignore the message.

Sandra Coulon posted on Monday, November 24, 2014 - 9:58 am

Thank you for the quick reply! I did comment out the means output request and the message did not go away. Other thoughts?

Linda K. Muthen posted on Monday, November 24, 2014 - 11:10 am

Please send the two outputs, with and without the means, and your license number to support@statmodel.com.

Madison Aitken posted on Tuesday, February 24, 2015 - 9:30 am

I am running a logistic regression with a binary dependent variable (MLR estimation). I have both continuous and binary independent variables. Both the dependent and independent variables have missing data.

1) Should I mention the variances only for independent variables with missing data, or should I mention the variances for all independent variables?

2) Is the procedure of mentioning variances also valid for categorical (binary) independent variables?

3) Is there anything further I should do to address the missing values of Y?

Linda K. Muthen posted on Tuesday, February 24, 2015 - 9:42 am

1. You must mention all of the variances of the covariates or none of them. If you mention only some, the model is estimated with zero covariances among the ones you mentioned and the ones you did not mention.

2. Yes, for maximum likelihood estimation but not for weighted least squares estimation. Note that each variance mentioned requires one dimension of integration.

3. Missing data theory does not apply to a single dependent variable. If you bring in the covariates, they are treated as dependent variables so you have more than one. Distributional assumptions are then made about the covariates.

Madison Aitken posted on Tuesday, February 24, 2015 - 10:30 am

Thank you, Dr. Muthen, for such a quick response! I was hoping you could clarify one more thing. Here is the model portion of my input:

MODEL:
y1 on x1 x2 x3 x4;

x1;
x2;
x3;
x4;

You mentioned in #3 above that when I bring the covariates into the model, they are treated as dependent variables; thus, I would have more than one dependent variable. Does this mean that I am currently estimating missing data for both the x and y variables based on my current input? I ask this because the number of observations being used in the analysis seems to reflect this. Thank you for your help.

Linda K. Muthen posted on Tuesday, February 24, 2015 - 10:54 am

Yes, this is what is happening.

Heather Gilmartin posted on Friday, March 06, 2015 - 8:24 am

Dear Linda,

I have received a warning on a model that is measuring the relationship of a predictive composite model on a latent model and a measured outcome:

*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 21
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

You indicated in a previous post: "The only way to avoid the listwise deletion of covariates is to bring them into the model as dependent variables. You can do this by mentioning their variances in the MODEL command. You then make distributional assumptions about them."

Is the code to mention the variances in the MODEL command included in one of your workshop handouts or in the users guide? I cannot find any coding instructions.

Thanks,

Linda K. Muthen posted on Friday, March 06, 2015 - 9:12 am

If you have variables x1, x2, x3, say

x1 x2 x3;

in the MODEL command.

Madison Aitken posted on Friday, April 03, 2015 - 8:21 am

Dear Dr. Muthen,

You said in an earlier post that when you mention the variances of the covariates (in order to estimate x-side missing data), they are treated as dependent variables and distributional assumptions are made about them. Does that mean they must be normally distributed? If so, how robust would an analysis like this be to violations of normality? I am using multinomial logistic regression (y has 3 levels) and my x variables have a strong positive skew with a high frequency of zeroes, although they are not zero inflated. Is there anything else you can recommend that would be more appropriate? I am hesitant to delete cases with missing x variables as this seems to affect the means and standard deviations quite a bit. Thank you.

Bengt O. Muthen posted on Friday, April 03, 2015 - 1:02 pm

It is not clear in general how robust ML-MAR is to strong non-normality, but take a look at section 4.7 of the following paper on our website which was just published in the SEM journal:

Asparouhov, T. & Muthén B. (2014). Structural equation models and mixture models with continuous non-normal skewed distributions. Web note 19. Version 2. Download Mplus inputs and outputs used in this paper here.

So if your x variables are truly continuous variables, not discrete, and have no floor or ceiling effects, you could use the skew-t distribution for multiple imputation in a first step, followed by the logistic regression.

Diane Putnick posted on Monday, June 29, 2015 - 8:27 am

Dear Drs. Muthen:

I have run a latent class growth analysis model (3 classes) with one predictor x. This predictor had missing data points, so I estimated its variance in order to use the total sample in my analysis. However, after I added x variance in the model command, I got an error message and the c (the latent class variable) on x (predictor with missing data) command was ignored.
Attached is my syntax for the model command. Could you please let me know what’s wrong in my syntax? Thank you.

MODEL:
%Overall%
i s | y1@0 y2@4 y3@8;
i@0; s@0;
i with s @0;
x;
x with i @0; x with s@0;
c on x;

*** ERROR
The following MODEL statements are ignored:
* Statements in the OVERALL class:
C#1 ON X
C#2 ON X
*** ERROR
One or more MODEL statements were ignored. These statements may be
incorrect or are only supported by ALGORITHM=INTEGRATION.

Linda K. Muthen posted on Monday, June 29, 2015 - 11:22 am

Add ALGORITHM=INTEGRATON; to the MODEL command.

Daniel Ryan Kavish posted on Wednesday, March 16, 2016 - 12:21 am

Dear Drs. Muthen,

I am rather new to MPlus and SEM and am a graduate student. So, I apologize if I am overlooking something obvious.
I read through the forums and understand that if I include my X-variables in the model as dependent variables, then I can avoid listwise deletion of cases. My problem is that when I do this, my output produces "WITH" output that I did not request. I simply do not understand why I am getting "WITH" output that I did not request in my input. Also, I am clueless about how this impacts the model I am outlining in the input (which only has 4 WITH statements).

Additional info that might be needed: I am using WLSMV and have two categorical outcome measures in my model. Listwise=off.

Is it possible to avoid getting all of the "WITH" output that was not requested?

Thank you,

Linda K. Muthen posted on Wednesday, March 16, 2016 - 3:00 pm

You can bring the covariates into the model only with maximum likelihood estimation. So you should not do this with WLSMV.

If you get WITH statements as the default, you do not want, fix them to zero:

y1 WITH y2@0;

Daniel Ryan Kavish posted on Saturday, March 19, 2016 - 11:04 am

Dear Drs. Muthen,

My mentors have been adamant that my analysis should use WLSMV because of the ordinal/categorical nature of my outcome measures. Thus, I guess my main followup question is how can I use WLSMV for my analysis and still avoid losing cases?

Again, I apologize for my lack of experience and thank you for your guidance.

Bengt O. Muthen posted on Saturday, March 19, 2016 - 4:21 pm

Use ML or Bayes instead. Note that ML does not mean that you treat your outcome as continuous (it's a common misunderstanding). Just declare them as categorical.

Daniel Ryan Kavish posted on Saturday, March 19, 2016 - 6:07 pm

To make sure I understand, do I simply run my analysis as normal (listwise=off), but change the estimator to ML (still declaring categorical variables)?

If so, then what paper could I cite for this process?

If I am completely misunderstanding, then please help me understand a bit better.

Thanks again,

Bengt O. Muthen posted on Sunday, March 20, 2016 - 7:01 am

Q1: Yes.

Q2: There is the whole IRT (Item Response Theory) literature. Or, you can cite the Skrondal & Rabe-Hesketh book Generalized Latent Variable Modeling.

Daniel Ryan Kavish posted on Sunday, March 20, 2016 - 3:46 pm

Dear Drs. Muthen,

My apologies. I should have mentioned I was using COMPLEX analysis option due to the inclusion of survey weights (weight, stratification, cluster). It is giving me an error stating that ML is not able to be used with this type of analysis.

any other ideas on how I can avoid dropping my missing cases, while still being able to use WLSMV?

More info= Full sample is nearly 11,000 cases. Most of my variables have between 0.5-3% missing (not many missing), but three specific variables are missing 10-13% (they come from a different questionnaire-none of which are outcome measures). I did mean/mode impute and run the analysis the same way as I do with the missings included, and the results are nearly identical to the pair wise deleted output (trivial differences in a couple coefficients, significance of variables did not change). That being said, I am guessing this method is frowned upon.

Again, thank you for all of the advice. i am learning a lot through your forums, publications, and user guide.

Daniel Ryan Kavish posted on Sunday, March 20, 2016 - 3:51 pm

I should add that when I took the weights out and ran the model with ML (just to see how it would work), it still deleted the same amount of cases as was happening in WLSMV. Thus, I'm at a loss of how to proceed.

Thank you,

Linda K. Muthen posted on Sunday, March 20, 2016 - 9:49 pm

Send the relevant files and your license number to support@statmodel.com.

Vaiva Gerasimaviciute posted on Wednesday, March 23, 2016 - 1:59 pm

I am running a three-wave cross-lagged analysis with two categorical variables (yes/no). I do have various patterns of missing data. Sometimes it is missing on a first wave, sometimes on the second, first and second, second and third, etc.

1)To my understanding, by default m-plus uses FIML?

2) Should I use different method to deal with missing data instead? (multiple imputation or other?)

Thank you,
Vaiva

Linda K. Muthen posted on Wednesday, March 23, 2016 - 3:33 pm

1. With maximum likelihood estimation, FIML is used.

2. Multiple imputation and FIML are asymptotically equivalent. I would use FIML unless there is some special reason to use multiple imputation.

Vaiva Gerasimaviciute posted on Thursday, March 24, 2016 - 6:15 am

Thank you.

I think due to categorical nature of variables mplus uses WLSMV. It also asks me to use theta parameterization.

If I add Estimator=ML; and remove parameterization=theta; it gives me error
*** FATAL ERROR
THIS MODEL CAN BE DONE ONLY WITH MONTECARLO INTEGRATION.

Does this mean I need to do imputation due to the categorical nature of my main variables?

Vaiva Gerasimaviciute posted on Thursday, March 24, 2016 - 6:22 am

Can I run my model as if these were continuous variables?
That way I am allowed to use ML

Linda K. Muthen posted on Thursday, March 24, 2016 - 8:16 am

With WLSMV, pairwise present is used to handle missing data.

Regarding the error, add INTEGRATION=MONTECARLO; and ALGORITHM=INTEGRATION; to the ANALYSIS command.

You can use the CATEGORICAL option with ML.

I don't think any of this implies anything about multiple imputation.

Hye Jeong Choi posted on Thursday, August 18, 2016 - 11:27 am

Hi,

I used x1 x2 x3; under model commend.
I did not receive any warning due to missingness, but..my model fit became super strange.

e.g., CFI=0.000 TLI=-0.453,

Without FIML, model fit was great.

What's happening here?

Thank you in advance.

Bengt O. Muthen posted on Thursday, August 18, 2016 - 11:43 am

Please send outputs and license number to Support.

Minnik Findik posted on Wednesday, October 12, 2016 - 4:46 am

Hi. I am trying to run a multilevel regression model. And I have around 2000 kids missing (out of 7000), in order to get over this problem,I have included all my variables' variances. Below you can see part of the syntax:
CLUSTER = laestab;

ANALYSIS: TYPE = TWOLEVEL random;
MODEL:
%WITHIN%
emosumx ON genderC EthnicWR FamSumC
CommSumC;
genderC;
EthnicWR;
FamSumC;
CommSumC;
%BETWEEN%
emosumx ON PhaseEdu1 PhaseEdu2 Urban;
PhaseEdu1;
PhaseEdu2;
Urban;

When I run the model, I get the following error:
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.228D-16. PROBLEM INVOLVING PARAMETER 26.

THE MODEL ESTIMATION TERMINATED NORMALLY

Parameter 26 is PSI GenderC X GenderC.

So I am tiny bit confused as to what I should be doing. What is wrong?

Bengt O. Muthen posted on Wednesday, October 12, 2016 - 4:05 pm

It sounds like the GenderC variable is binary. If so, you can ignore this message.

Alissa Mahler posted on Sunday, June 11, 2017 - 4:07 pm

Hello,

I'm finding that when I use the "USEOBSERVATIONS" command, Mplus is dropping cases beyond the ones that are missing x-variables. For example, the Mplus model indicates 266 observations, and indicates that 6 cases dropped b/c they were missing an x-value, however in my SPSS dataset, I have 292 cases. I'm wondering why 20 additional cases appear to have been excluded?

Thanks for your help!

Linda K. Muthen posted on Sunday, June 11, 2017 - 4:49 pm

Please send the output, data set, and your license number to support@statmodel.com.

Milan R posted on Thursday, August 10, 2017 - 11:55 am

Hi Dr. Muthen,
I wanted to run a four-class GMM with binary predictors, which have missing data. What is the most appropriate way to handle the missingness on the X-side in my case? I learned from many posts on the forum that I could bring a predictor in the model by mentioning it out and making distributional assumptions. But since mine are categorical, I wonder if mentioning their variances in the MODEL command is the way. Or should I follow the 3-step approach using auxiliary variables?
Thanks!

Bengt O. Muthen posted on Thursday, August 10, 2017 - 4:39 pm

This is a tricky situation without a good solution. You can use WITH among the x's to bring them into the model. But, this leads to numerical integration with many dimensions so it is computationally demanding.

Milan R posted on Thursday, August 10, 2017 - 7:23 pm

Hi Dr. Muthen,
Thank you for your prompt reply! For my situation, do you have a rough estimate of the computational time needed to run this GMM with a sample size of 7000? I would like to know a reasonable length of time before giving up or suspecting a problem with my model. Thanks again.

Bengt O. Muthen posted on Friday, August 11, 2017 - 10:15 am

I couldn't say because it depends on so many factors including your specific model, the number of x variables with missing data, and your data. Ask for TECH8 and you will see the time each iteration takes.

Amanda Sim posted on Monday, November 06, 2017 - 5:42 pm

Dear Dr. Muthen,

My iv consists of a sum score of 17 yes/no items. There are 4 cases (out of 291) with missing data on one or more of the 17 items; hence, there are 4 cases with the iv missing.

In order to include these 4 cases in my analysis, I brought the iv into the model. However, when I do this, the regression coefficients get much larger (e.g. changes from 0.1 when the 4 cases are excluded to 0.4 when iv is brought into the model).

Do you think this is because multivariate normality is assumed when the iv is brought into the model (and the distribution of my iv is NOT actually normal)? I looked at the 4 cases to see if there was anything unusual about them but although they are all clustered around one end of the distribution, they do not appear to be massive outliers.

I am unsure how to interpret the different results when excluding the 4 cases vs. bringing them into the model and would appreciate your thoughts on how to proceed.

Thank you,
Amanda

Bengt O. Muthen posted on Tuesday, November 07, 2017 - 2:31 pm

Please send these 2 outputs to Support along with your license number.

Zhi Li posted on Tuesday, December 19, 2017 - 7:09 pm

Dear Dr. Muthen,
I am running a pathway analysis with 3 continuous endogenous variables regressed on 7 exogenous variables(5 continuous, 2 binary).
Based on the messages above, I know that mentioning the variance/mean of the exogenous variables in the model will help avoid list-wise deletion for the X-variables. Yet, when I do include all variances of the exogenous variables in the model (because they have to be included all together), I came across the following error message.
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION.
My questions:
(a) I am wondering if this has anything to do with the fact that three of my continuous exogenous variables did not have any missing values at all. I raised this because when I do not mention their variances, the error message disappeared.
I don't think the error message is caused by mentioning the variance of the binary variables, because even when they are not included, the error message still pops out.
(b) What is your suggestion to my current situation to avoid list-wise deletion for X? Thanks so much and I appreciate your help!

Bengt O. Muthen posted on Wednesday, December 20, 2017 - 12:31 pm

Send the output to Support along with your license number.

Sara Namazi posted on Sunday, February 18, 2018 - 12:17 pm

Dear Dr. Muthen,

I am very new to Mplus. My data-set consists of an (n) of 156. However, when I run my model, the number of observations drops to 154 and I get this warning message:

WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 2

Do I need to check my data? How can I determine how many valids exists for each variable? Do I report this in my manuscript?

Bengt O. Muthen posted on Sunday, February 18, 2018 - 2:50 pm

Check what your missing data flag is. Perhaps you made it 999 and overlooked that that was a valid x value.

The output option Patterns gives you missing data information.

Sara Namazi posted on Sunday, February 18, 2018 - 3:28 pm

Thank you, Dr Muthen. Your response is very helpful.

I noticed that when I add my covariates in my model, listwise deletion of covariates occurs. How can I avoid this problem, as I want to use all of my cases.

Bengt O. Muthen posted on Sunday, February 18, 2018 - 4:27 pm

See point 2 of the FAQ on our website:

Missing on x's

Talea Cornelius posted on Monday, August 13, 2018 - 9:25 am

Dear Drs. Muthen,

I seem to be having a similar problem with my binary covariates. I specified that the variance should be estimated, so missing observations are not removed from the sample, but left out the means:

age_* bmi_final*; !continuous
[age_* bmi_final*];

male* hispanic* black*; !binary

I still get the following error:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE
TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE
FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING
VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE
CONDITION NUMBER IS -0.185D-15. PROBLEM INVOLVING THE FOLLOWING PARAMETER:
Parameter 62, %BETWEEN%: BLACK

Bengt O. Muthen posted on Monday, August 13, 2018 - 9:43 am

This message can be ignored when binary covariates are brought into the model by mentioning their parameter(s). Note, however, that there are issues to consider - better ways to do this - when bringing binary covariates into the model as we explain in our Short Course Topic 11 video and handout on our website (YouTube or full video version), referring to Chapter 10 of our book Regression and Mediation Analysis using Mplus.

Talea Cornelius posted on Monday, August 13, 2018 - 10:10 am

Thank you so much. Should I then be mentioning both the means and variances for the binary covariates? At one point I had tried to constrain the variances (dependent on the mean of the binary variable), but that made the model fail.

Bengt O. Muthen posted on Monday, August 13, 2018 - 10:22 am

No, mentioning either the mean or the variance is fine.

Talea Cornelius posted on Monday, August 13, 2018 - 10:58 am

Thank you so much for your assistance. I have one final question. I recognize that the warning about variance is ignorable; when I explicitly state that I want covariances estimated between covariates I get a similar warning. Is this also ignorable?

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE
TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE
FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING
VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE
CONDITION NUMBER IS 0.178D-17. PROBLEM INVOLVING THE FOLLOWING PARAMETER:
Parameter 88, %BETWEEN%: BMI_FINAL WITH HISPANIC

Bengt O. Muthen posted on Monday, August 13, 2018 - 11:05 am

Yes, ignorable if at least one of those two variables is binary and is a between-level variable.

Talea Cornelius posted on Monday, August 13, 2018 - 4:16 pm

Dear Dr. Muthen,

That was indeed the case (HISPANIC is a binary variable). However, in the series of 6 models I am running (3 outcomes/2 sets of predictors) I get an error for a continuous variable:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE
TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE
FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING
VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE
CONDITION NUMBER IS 0.166D-18. PROBLEM INVOLVING THE FOLLOWING PARAMETER:
Parameter 78, %BETWEEN%: BMI_FINAL

I am not sure why this would be happening, since nothing has changed in my code other than variable names.

Bengt O. Muthen posted on Monday, August 13, 2018 - 5:18 pm

We need to see the full output - send to Mplus Support along with your license number. Include the data if you can.

Jesus Garcia posted on Thursday, February 14, 2019 - 9:34 am

Dear Drs. Muthen

I am trying to execute a causality model, regression, between observed variables Int_1 - int_5 and uso_dec1 - uso_dec5 but the results obtained are not reliable,

RMSEA (Root Mean Square Error Of Approximation)

Estimate 0.000
90 Percent C.I 0.000 0.019
Probability RMSEA <= .05 1.000

CFI/TLI

CFI 1.000
TLI 1.002

Chi-Square Test of Model Fit for the Baseline Model

Value 6522.375
Degrees of Freedom 22
P-Value 0.0000

SRMR (Standardized Root Mean Square Residual)

Value 0.009

Could you advise me how to execute my model ?. The following is my proposed model:

USEVARIABLES ARE
USO_DEC1 USO_DEC3 USO_DEC4 USO_DEC5
INT_1 INT_2 INT_3 INT_4 INT_5;

MODEL:
USO_DEC1 ON INT_1;
USO_DEC3 ON INT_3 INT_4;
USO_DEC4 ON INT_2 INT_3 INT_4 INT_5;
USO_DEC5 ON INT_3 INT_4 INT_5;

OUTPUT: STDYX; TECH1; TECH4; MODINDICES(ALL);

Thanks!

Bengt O. Muthen posted on Thursday, February 14, 2019 - 5:18 pm

This general analysis question is suitable for SEMNET. When you post there you may want to explain what you mean by "a causality model" and also give the model's chi-square.

Laurence Wright posted on Monday, September 02, 2019 - 1:58 am

Dear Drs Muthen,

I'm attempting to do SEM and consistently receive the same error warning. I have been through all my data and ensured that all missing data are coded as -99 and this is accounted for in my model.

I have read through a few of the solutions (e.g multiple imputation) suggested on this forum and on your FAQ but I am unsure if these are appropriate for my situation - as I know my colleagues have run similar models without having to do this. I was wondering if you might be able to help at all?

This is my input:

VARIABLE: Names are
Age
Gender
Ethnic
LEC_No
Childhoo
Anx_Atta
Avo_Atta
PTCI_Tot
IIP_Tot
PSS_Tot
RE_DX
AVO_DX
TH_DX
AD_DX
NSC_DX
DR_DX
PHQ_Symp;

USEVAR are
Age
Gender
Ethnic
LEC_No
RE_DX
AVO_DX
TH_DX
AD_DX
NSC_DX
DR_DX
PHQ_Symp;
Missing are all (-99);

ANALYSIS:
ESTIMATOR = MLR;

MODEL:
PTSD by RE_DX AVO_DX TH_DX;
DSO by AD_DX NSC_DX DR_DX;

PHQ_Symp on PTSD Gender Age Ethnic LEC_No;
PHQ_Symp on DSO Gender Age Ethnic LEC_No;

OUTPUT:
STDYX mod;

Laurence Wright posted on Monday, September 02, 2019 - 1:59 am

And this is the error message I receive (apologies for the double post I reached the size limit for one):

*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 6
*** WARNING
Data set contains cases with missing on all variables except
x-variables. These cases were not included in the analysis.
Number of cases with missing on all variables except x-variables: 83
2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

Thanks very much!

Laurence

Bengt O. Muthen posted on Monday, September 02, 2019 - 1:10 pm

When it can't fit in one window it should be sent to Support along with your license number.

When you get these missing data messages, you should check your data and make sure you agree with the statements so that you are certain that the data are read the way you intended.

Aurelie Lange posted on Tuesday, April 14, 2020 - 10:04 am

Dear Dr Muthen,

We are running twolevel (logistic) regression analysis with missing data on x. As far as I've understood, there are several options to avoid listwise deletion:
a. bring the variance of all x-variables into the model. But this shouldn't be done too lightly, especially with binary x's. Right?
b. do the same thing, but then use Bayes estimator (as recommended in your RMA book).
c. use multiple imputation and then run the regression analyses on the imputed data

Some questions:
1. Is there any preferred method?
2. Can I impute missing data as is done in example 11.6 of the user guide? Or should I specify a more advanced / different imputation model?
3. We want to run several regression models (using the same predictors each time, but different outcomes). Following example 11.6, should we only include the variables relevant for the specific regression in the IMPUTE = syntax? Or should we specify all variables relevant to our paper? Or is it better to do MI seperately and then run all analyses on these new imputed datasets (so that all analyses are based on exactly the same imputed data)?

Thanks for your advice!

Kind regards,
Aurelie

P.s. I very much appreciate all the help I've been receiving through the past few years on this discussion board. It has really helped me to better understand Mplus and improve my skills.

Bengt O. Muthen posted on Wednesday, April 15, 2020 - 4:02 pm

1. I don't think there are general guidelines for what is best. If you have a lot of missing data, in your case, you might want to try one non-imputation approach and one imputation approach. I like the Bayes method you mention in b, but Bayes uses probit, not logit regression and therefore might not be attractive (next Mplus version will have Bayes logit).

2-3. If you impute, I would use UG ex 11.5 and include all the different DVs that you are interested in so that you use the same imputed data sets in all subsequent analyses.

Tor Neilands posted on Friday, August 07, 2020 - 10:29 pm

Hi Bengt,

I am working with a statistician colleague who is fitting a series of logistic regression models in Mplus with clustered data (TYPE=COMPLEX). She has X variables with missing values and so plans to bring them into the model as random variables. She intends to perform backward elimination.

My understanding is that there are two options for performing backward elimination: a) the usual method, which is to drop X variables, beginning with the x-variable with the largest p-value, or b) retain all X variables, but use MODEL CONSTRAINT or an @0 in the MODEL syntax to fix each x-variable's regression coefficient to zero. My understanding is that with X variables treated as random due to missing values the latter approach would be preferred because the same means, variances, and covariances of the same X variables would be estimated in all models, yielding more comparable log-likelihoods. Do you concur? Or do you have an alternative recommendation?

Thanks very much in advance,

Tor Neilands

Bengt O. Muthen posted on Saturday, August 08, 2020 - 4:30 pm

I concur.

Tor Neilands posted on Tuesday, August 18, 2020 - 8:51 am

Thank you, Bengt.

We are experiencing very long run times for our initial multivariable model, which is probably not surprising. This leads me to a couple of follow-up questions: 1) Is it okay for us to revert x-variables without missing data back to fixed status or would that bias the results to an unacceptable degree? 2) Do you have other suggestions of things for us to try to improve the convergence speed?

Thank you as always,

Tor

Bengt O. Muthen posted on Tuesday, August 18, 2020 - 9:53 am

With ML, you probably get many dimensions of integration due to bringing x's with missing into the model.
1) Separating the x variables without missing into being exogenous covariates, you have to make sure they remain correlated with the x's with missing which may be tricky.
2) You may consider using the Mplus imputation approach in UG chapter 11 for the x's and then do Data Type=Imputation in a second step. Perhaps handle the clustering by two-level. See also Chapter 10 in our RMA book where we show advantages of Bayes in Table 10.29.

Gaye Ildeniz posted on Thursday, August 27, 2020 - 9:59 am

Hello,
I'm trying to run a two-wave study where some variables in wave2 has missing variables. I have tried using both MLR and ML and also tried specifying the means or variances in the model. missing data are in the DV1T1-DV8T1 and DV1T2-DV8T2.

MODEL:
ACE BY ace1 ace2 ace3 ace4 ace5 ace6 ace7 ace8 ace9 ace10 ace11 ace12 ace13;
DVT1 BY DV1T1 DV2T1 DV3T1 DV4T1 DV5T1 DV6T1 DV7T1 DV8T1;
DV1T1 DV2T1 DV3T1 DV4T1 DV5T1 DV6T1 DV7T1 DV8T1;
DVT2 BY DV1T2 DV2T2 DV3T2 DV4T2 DV5T2 DV6T2 DV7T2 DV8T2;
DV1T2 DV2T2 DV3T2 DV4T2 DV5T2 DV6T2 DV7T2 DV8T2;
DVT1 WITH DVT2;
DVT1 ON ACE SOCDES2;
PPROC1 ON ACE SOCDES1;
CPROC1 ON ACE SOCDES1;
PPROC2 ON ACE SOCDES2;
CPROC2 ON ACE SOCDES2;

I still get this warning:
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 226
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

I cannot figure out what I'm missing. I appreciate your help in advance.
Thank you.

Bengt O. Muthen posted on Thursday, August 27, 2020 - 4:22 pm

To diagnose this, send your data and output to Support along with your license number.

Zehua Cui posted on Sunday, September 27, 2020 - 6:25 pm

Hi Dr. Muthén,

I am running a simple analysis looking at the quadratic associations between perceived stress (x1) and some adjustment outcome (y). Control variables such as age (x2), gender (x3), income (x4) were also included. I mentioned all covariates in the model command to estimate missing data:
x1 x1(square) x2 x3 x4;

However when I do that, I always get a warning message:THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY ...THIS MAY BE DUE TO THE STARTING VALUES ...PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 44, x1(square) (or if I don't include income as a covariate, it will say that the problem parameter is x1(square) WITH age).

It looks like the problem parameter always involves x1(square).However, I don't see any offending estimates in my output. And if I don't estimate missing data by putting all covariates in the model command, there is no warning messages.

Could you please give me some guidance as to why this happens? In this case, will it be okay to ignore the warning message? Or if there is another way to estimate missing data on x variables when there is a quadratic term involved. Thank you so much in advance!

Bengt O. Muthen posted on Monday, September 28, 2020 - 5:09 pm

X and x-square are highly correlated. Instead of the simple square, use (x - xmean)^2

Zehua Cui posted on Monday, September 28, 2020 - 6:44 pm

Thank you very much for your response Dr. Muthén!

I changed my syntax to

model:
(x-xmean) (x-xmean)^2 age gender income on y;

(x-xmean) (x-xmean)^2 age gender income;

However, I am still getting the same error message. Is there anything else that I can do?

Bengt O. Muthen posted on Tuesday, September 29, 2020 - 5:02 pm

That term should not be in the Model command but in the Define command:

x-square = (x-xmean)*(x-xmean);

The Model command then uses this new x-square variable.

Zehua Cui posted on Tuesday, September 29, 2020 - 6:16 pm

Hi Dr. Muthén,

So sorry that my message was a little misleading! I actually did exactly what you mentioned. I did mean center the x variable, and then squared it using the define command. I then used the new squared x variable in the model command.

xnew xnew^2 age gender income on y;
xnew xnew^2 age gender income;

However, the error message still exists..

Are there other ways that I can try to resolve this issue?

Thank you so much!

Zehua Cui posted on Tuesday, September 29, 2020 - 6:35 pm

Hi Dr. Muthén,

I apologizes for posting messages in two posts. I just want to follow up and let you know that I just ran the model using the new x-square variable and without the gender, then the error message disappears.

With this being the case, would it be okay to ignore the error message if I include gender back into the model?

Thank you!

Bengt O. Muthen posted on Wednesday, September 30, 2020 - 3:04 pm

Then we need to see your full output to diagnose the problem - send to Support along with your license number.