
Jiyoung posted on Friday, June 26, 2009  2:10 am



I added control variables to my model and got the following message after the addition. I wanted to use full information, so I defined all of the variables with missing data. It seems that the program deleted cases using listwise deletion. With the following warning, I could not see the model estimation:

Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 94
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

What can I do to see the results of the model estimation?


The only way to avoid the listwise deletion of covariates is to bring them into the model as dependent variables. You can do this by mentioning their variances in the MODEL command. You then make distributional assumptions about them. 
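A rough way to see how much data the two treatments of covariates keep is to count usable cases before running the model. The Python sketch below is an illustration, not Mplus code: it assumes a hypothetical data layout (one dict per case, None for a missing value) and approximates the rule described above, where a model estimated conditional on the x's drops any case missing on an x-variable, while a model with the x's brought in as dependent variables keeps every case that has at least one observed analysis variable.

```python
def count_usable_cases(rows, y_vars, x_vars):
    """Return (n_conditional, n_x_as_dv) for a list of cases (None = missing).

    n_conditional: cases kept when the model is conditional on the x's
      (all x's observed and at least one y observed).
    n_x_as_dv: cases kept when the x's are treated as dependent variables
      (at least one of the y's or x's observed).
    """
    n_conditional = 0
    n_x_as_dv = 0
    for row in rows:
        x_complete = all(row.get(x) is not None for x in x_vars)
        any_y = any(row.get(y) is not None for y in y_vars)
        any_observed = any(row.get(v) is not None for v in list(y_vars) + list(x_vars))
        if x_complete and any_y:
            n_conditional += 1
        if any_observed:
            n_x_as_dv += 1
    return n_conditional, n_x_as_dv


# Hypothetical data: one outcome y and two covariates x1, x2
data = [
    {"y": 1.2, "x1": 0.5, "x2": 1.0},
    {"y": 0.7, "x1": None, "x2": 2.0},
    {"y": None, "x1": 0.3, "x2": 0.9},
    {"y": 2.1, "x1": 1.1, "x2": None},
    {"y": None, "x1": None, "x2": None},
]
print(count_usable_cases(data, ["y"], ["x1", "x2"]))
```

The last case is missing on every variable, so it is excluded under both treatments, which matches the separate "missing on all variables" message discussed later in this thread.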

nina chien posted on Friday, September 04, 2009  1:58 pm



To avoid listwise deletion of covariates, I added the covariates into the model by mentioning their variances. This resulted in a series of WARNING statements that some of my covariates are categorical (which they are): “WARNING: VARIABLE GENDERR MAY BE DICHOTOMOUS BUT DECLARED AS CONTINUOUS.” So, I tried declaring the covariates as categorical, but this resulted in error statements that the covariates are not DVs: “ERROR in VARIABLE command. CATEGORICAL option is used for dependent variables only. GENDERR is not a dependent variable.” I'm very confused because the WARNING and ERROR messages are contradictory. Can I ignore the warning statements? Also, the model where covariates were declared as continuous did not converge. Is this related to the WARNING? Thanks so much.


You should not put covariates on the CATEGORICAL list. The message comes about because the mean and variance of a dichotomous variable are not independent of each other. I would need to see the full output and your license number at support@statmodel.com to say if this is related to convergence. 
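The dependence mentioned in this answer is easy to check numerically: for a 0/1 variable with mean p, the maximum-likelihood (divide-by-n) variance is exactly p(1 - p), so the variance is fully determined once the mean is known. The Python sketch below uses a made-up 0/1 gender variable for illustration.

```python
def mean_and_variance(values):
    """Sample mean and maximum-likelihood (divide-by-n) variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


gender = [0, 1, 1, 0, 1, 1, 0, 1]  # hypothetical 0/1 coding
m, v = mean_and_variance(gender)
# For a dichotomous variable the variance equals p * (1 - p), with p the mean
print(m, v, m * (1 - m))
```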


Drs. Muthen, I have been asked by a reviewer to provide a reference for exactly what Mplus is doing when bringing covariates with missing data into the model as dependent variables by steps such as mentioning their variances in the MODEL command (and thus making distributional assumptions about them). If you have any recommendations for this, I would greatly appreciate it. Many thanks.


I don't know of a reference to this exactly. When you bring the x's into the model, multivariate normality is assumed for the x's and all continuous y's. This assumption is discussed in any structural equation modeling book. 
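For continuous variables, the assumption described here leads to the usual full-information maximum likelihood (FIML) criterion: each case contributes the multivariate normal log-density of only the variables it actually has observed. The Python sketch below computes that casewise log-likelihood for a fixed mean vector and covariance matrix. It is a textbook illustration, not Mplus's implementation; in an actual analysis the mean vector and covariance matrix would be the model-implied ones that the estimator maximizes over, and the values used here are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def mvn_logpdf(x, mu, sigma):
    """Multivariate normal log-density via a Cholesky factor."""
    n = len(x)
    L = cholesky(sigma)
    # Forward substitution: solve L z = x - mu, so z'z is the Mahalanobis distance
    z = []
    for i in range(n):
        r = x[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))
        z.append(r / L[i][i])
    quad = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)


def fiml_loglik(rows, mu, sigma):
    """Sum of casewise log-likelihoods, using only each case's observed variables."""
    total = 0.0
    for row in rows:
        obs = [i for i, value in enumerate(row) if value is not None]
        if not obs:
            continue  # a case missing on every variable contributes nothing
        x = [row[i] for i in obs]
        m = [mu[i] for i in obs]
        s = [[sigma[i][j] for j in obs] for i in obs]
        total += mvn_logpdf(x, m, s)
    return total


mu = [0.0, 0.0]                   # hypothetical means of (y, x)
sigma = [[4.0, 1.0], [1.0, 2.0]]  # hypothetical covariance matrix of (y, x)
data = [[1.0, 1.0], [1.0, None], [None, None]]
print(fiml_loglik(data, mu, sigma))
```

The second case is missing on x but still contributes the univariate density of y, which is how bringing the x's into the model avoids listwise deletion.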


I ran a path analysis and one of the independent variables has missing data on several cases. Apparently listwise deletion was used. Is there any way I can avoid this? Thanks. 


Missing data theory does not apply to observed exogenous variables. The model is estimated conditioned on them. You can bring all of the covariates into the model by mentioning their variances in the MODEL command. Then they will be treated as dependent variables and distributional assumptions will be made about them. 


Thanks Linda. I have run the analysis, and realise that if I bring the exogenous variable with missing data into the model, df becomes higher. I am wondering if this would pose any issue with explaining to readers how the model was specified. How do people usually justify the use of this method? 


Your degrees of freedom should not change if you put no restrictions on the x's. Please send the output and your license number to support@statmodel.com so I can see what you did. 
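The reason the degrees of freedom should not change can be checked by counting. The Python sketch below assumes a single-group model with continuous variables, a mean structure, p dependent variables, and q covariates whose means, variances, and covariances are all left free when they are brought into the model. Under those assumptions, bringing the x's in adds q(q + 3)/2 sample statistics and exactly q(q + 3)/2 parameters, so the two cancel.

```python
def conditional_stats(p, q):
    # Intercepts of the y's + residual (co)variances of the y's + y-on-x slopes
    return p + p * (p + 1) // 2 + p * q


def unconditional_stats(p, q):
    # Means, variances, and covariances of all p + q variables
    k = p + q
    return k * (k + 3) // 2


def free_x_parameters(q):
    # x means + x variances + x covariances, none of them restricted
    return q * (q + 3) // 2


def df_change(p, q):
    """Change in degrees of freedom when the x's are brought in unrestricted."""
    added_stats = unconditional_stats(p, q) - conditional_stats(p, q)
    return added_stats - free_x_parameters(q)


# Example: one outcome and four covariates
print(conditional_stats(1, 4), unconditional_stats(1, 4), free_x_parameters(4), df_change(1, 4))
# prints: 6 20 14 0
```

If the degrees of freedom do change, one thing worth checking is whether a restriction involving the x's was added, for example a covariance among them fixed at zero.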


Dear Dr. Muthen, I am running a mediational model with two covariates (relationship status and Greek membership). I have missing values on these. I specified TYPE=MISSING with MLR. Then I entered the covariates in the MODEL command, and Mplus uses the whole sample. I understand that missing data theory does not apply to exogenous observed variables, and when I do not include them in the model, there are no substantive changes except in one of my hypothesized effects. I want to keep the whole sample, but I am wondering what happened here that brought about the change. Can I trust the output with the observed variables included in the model? Thanks


When you mention the means, variances, or covariances of the exogenous variables in the MODEL command, they are treated as dependent variables and distributional assumptions are made about them. This can change the results if not all variables are continuous. See the Version 6.1 version history on the website for a description. 


Dear Dr. Muthen, When I run a one-factor CFA model with continuous data, I get the following error: "Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 2" Previously, I have run the same model without getting this message. I have checked that the error is not due to misreading the input file, but I still cannot find its source. Thank you.


Please send the outputs where you get the message and where you don't get the message along with your license number to support@statmodel.com. 


Dear Mplus team, I have several questions regarding logistic regression with missing values. My data set contains 1000 cases, 300 of them with missing values. y and x1 are both binary and have missing values; x2-x4 are continuous and completely observed. I want Mplus to use all cases, including those with missing values. In order to do that, I mentioned the variance of x1 and ran the following model:

ANALYSIS: estimator = ml; integration = montecarlo;
MODEL: y on x1 x2 x3 x4; x1;

My questions are as follows:
1. When I mention the variance of x1, Mplus uses 950 of 1000 cases. If I specify one further independent variable (no matter which one), all cases are included. What I don’t understand is that the coefficients of the model vary depending on which variance I choose to mention. This observation confuses me, and I am not sure which variance I should mention.
2. A more general question concerns predicting probabilities. Is it right that it is not possible to predict probabilities when there are missing values on an independent variable using ML estimation?
3. Is the given R-square the McKelvey & Zavoina R-square, and can I trust the value although I have missing values on the independent variable, or does this bias the R-square computation (as I would guess)?
I would appreciate any help.


1. You should include the variances of all of the covariates or none of the covariates. When you include, for example, x1 and x2, they are correlated, and x3 and x4 are correlated because the model is estimated conditional on them. But there are no correlations between x1, x2 and x3, x4. Which correlations are fixed at zero depends on which variances you include in the model.
2. I don't think this can be done.
3. Yes.
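The rule in point 1 can be written out pair by pair. The Python sketch below is a hypothetical helper, not an Mplus feature: given the full covariate list and the subset whose variances are mentioned, it labels each covariate pair the way this answer describes (estimated among the mentioned ones, conditioned on among the unmentioned ones, and fixed at zero across the two groups).

```python
from itertools import combinations


def covariate_pair_status(covariates, mentioned):
    """Classify each covariate pair according to which variances are mentioned."""
    mentioned = set(mentioned)
    status = {}
    for a, b in combinations(covariates, 2):
        a_in, b_in = a in mentioned, b in mentioned
        if a_in and b_in:
            status[(a, b)] = "estimated"       # both brought into the model
        elif not a_in and not b_in:
            status[(a, b)] = "conditioned on"  # both remain conditioning covariates
        else:
            status[(a, b)] = "fixed at zero"   # one mentioned, one not
    return status


pairs = covariate_pair_status(["x1", "x2", "x3", "x4"], ["x1", "x2"])
for pair, label in pairs.items():
    print(pair, label)
```

Mentioning all of the covariates, or none of them, leaves no pair in the "fixed at zero" group, which is the reason for the all-or-none advice.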

Hanna Esser posted on Wednesday, September 03, 2014  6:13 am



Dear Mplus team, I have a question regarding missing values in my path model. Variables x1-x6 are exogenous. Four of them are binary (gender and yes/no variables), two are continuous. Variables y1-y7 are endogenous. y1-y6 are ordinal, y7 is binary (yes/no). y1-y6 are also predictors, because I regress y7 on them. My current sample size is 4009, and I would like to use the whole sample for my analysis. Unfortunately, there are missing data in variables x2-x6. I assume that they are MCAR. When I do the analysis without extra assumptions, 1031 cases with missing on x-variables are excluded from my analysis. Most of the missing data are on the variable “income”, because many refused to give this information, and on the variable “Parents without qualification” (yes/no). By reading old discussions, I found out that when I mention the variances, no listwise deletion happens. This is the case in my analysis; it includes all 4009 cases when I do this, but I get the warning that my exogenous variables x1-x4 are dichotomous but declared as continuous (which is true). What can I do to include all cases and avoid the listwise deletion of 25% of my cases? Thank you


You can do what you did and ignore the warning. 

Shiny7 posted on Monday, September 29, 2014  3:37 am



Dear Drs. Muthen, I ran a multilevel model with 7 x-variables and one continuous outcome using MLR. As is known, Mplus does not include cases with missing values on x-variables, and in my case there are many, many such cases. Am I right that the only solution is to mention the variances of the x-variables in the MODEL command, even though the assumption that my data are multivariate normally distributed is in fact not met (!), which is why I use MLR? A further problem is that I have only 21 clusters, and if I bring the variances of the x-variables into the model, the number of parameters exceeds the number of clusters, which is also a known problem. Can you please help me solve this problem? Best regards Shiny


I can't see any other approach to this. 

Shiny7 posted on Monday, September 29, 2014  10:22 am



Dear Mrs. Muthen, okay, thank you so much for your immediate reply... Shiny 


Basic regression with type=general is listwise deleting on x. Based on this thread and also http://www.ats.ucla.edu/stat/mplus/faq/fiml_counts.htm, I mention the predictors in my model statement, as in:

model:
LWCNO24 PSS24 on age24 female studyxtx BPComp BMI24 NEIGH24 GRMnGni GxNEIGH;
[age24 female studyxtx BPComp BMI24 CE24 NSat24 NEIGH24 GRMnGni GxNEIGH];

However, when I do that, I get the following:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.206D-20. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 65, GXNEIGH.

I thought maybe this was a degrees of freedom issue, but the "number of free parameters" is 65. What is the issue here, and is there another way to get the program not to listwise delete on x?


This is likely due to one or more of your predictors being binary. The mean and variance of a binary variable are not orthogonal and this can trigger the message. Comment out the means. If the message disappears, you can put them back and ignore the message. 


Thank you for the quick reply! I did comment out the means output request and the message did not go away. Other thoughts? 


Please send the two outputs, with and without the means, and your license number to support@statmodel.com. 


I am running a logistic regression with a binary dependent variable (MLR estimation). I have both continuous and binary independent variables. Both the dependent and independent variables have missing data. 1) Should I mention the variances only for independent variables with missing data, or should I mention the variances for all independent variables? 2) Is the procedure of mentioning variances also valid for categorical (binary) independent variables? 3) Is there anything further I should do to address the missing values of Y? 


1. You must mention all of the variances of the covariates or none of them. If you mention only some, the model is estimated with zero covariances among the ones you mentioned and the ones you did not mention. 2. Yes, for maximum likelihood estimation but not for weighted least squares estimation. Note that each variance mentioned requires one dimension of integration. 3. Missing data theory does not apply to a single dependent variable. If you bring in the covariates, they are treated as dependent variables so you have more than one. Distributional assumptions are then made about the covariates. 


Thank you, Dr. Muthen, for such a quick response! I was hoping you could clarify one more thing. Here is the model portion of my input: MODEL: y1 on x1 x2 x3 x4; x1; x2; x3; x4; You mentioned in #3 above that when I bring the covariates into the model, they are treated as dependent variables; thus, I would have more than one dependent variable. Does this mean that I am currently estimating missing data for both the x and y variables based on my current input? I ask this because the number of observations being used in the analysis seems to reflect this. Thank you for your help. 


Yes, this is what is happening. 


Dear Linda, I have received a warning on a model that is measuring the relationship of a predictive composite model on a latent model and a measured outcome:

*** WARNING
Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 21
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

You indicated in a previous post: "The only way to avoid the listwise deletion of covariates is to bring them into the model as dependent variables. You can do this by mentioning their variances in the MODEL command. You then make distributional assumptions about them." Is the code to mention the variances in the MODEL command included in one of your workshop handouts or in the User's Guide? I cannot find any coding instructions. Thanks,


If you have variables x1, x2, and x3, say

x1 x2 x3;

in the MODEL command.
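Written out for a regression, the answer above amounts to two lines in the MODEL command: the ON statement, and a statement listing the covariates whose variances are brought in. The Python helper below is hypothetical (it only builds text and does not call Mplus); it assembles that MODEL command for a given outcome and covariate list, mentioning either all covariates or none of them, in line with the earlier advice in this thread.

```python
def model_command(outcome, covariates, bring_in_covariates=True):
    """Return Mplus MODEL command text as a plain string (illustration only)."""
    x_list = " ".join(covariates)
    lines = ["MODEL:", f"  {outcome} ON {x_list};"]
    if bring_in_covariates:
        # Listing the covariates by name refers to their variances; all of them
        # are listed so that none of their covariances is fixed at zero.
        lines.append(f"  {x_list};")
    return "\n".join(lines)


print(model_command("y", ["x1", "x2", "x3"]))
```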


Dear Dr. Muthen, You said in an earlier post that when you mention the variances of the covariates (in order to estimate x-side missing data), they are treated as dependent variables and distributional assumptions are made about them. Does that mean they must be normally distributed? If so, how robust would an analysis like this be to violations of normality? I am using multinomial logistic regression (y has 3 levels), and my x variables have a strong positive skew with a high frequency of zeroes, although they are not zero-inflated. Is there anything else you can recommend that would be more appropriate? I am hesitant to delete cases with missing x variables, as this seems to affect the means and standard deviations quite a bit. Thank you.


It is not clear in general how robust ML-MAR is to strong non-normality, but take a look at section 4.7 of the following paper on our website, which was just published in the SEM journal:

Asparouhov, T. & Muthén, B. (2014). Structural equation models and mixture models with continuous nonnormal skewed distributions. Web note 19, Version 2.

So if your x variables are truly continuous, not discrete, and have no floor or ceiling effects, you could use the skew-t distribution for multiple imputation in a first step, followed by the logistic regression.


Dear Drs. Muthen: I have run a latent class growth analysis model (3 classes) with one predictor x. This predictor had missing data points, so I estimated its variance in order to use the total sample in my analysis. However, after I added the x variance in the MODEL command, I got an error message, and the c (the latent class variable) on x (predictor with missing data) statement was ignored. Below is my syntax for the MODEL command. Could you please let me know what’s wrong in my syntax? Thank you.

MODEL:
%Overall%
i s | y1@0 y2@4 y3@8;
i@0; s@0;
i with s@0;
x;
x with i@0;
x with s@0;
c on x;

*** ERROR The following MODEL statements are ignored:
* Statements in the OVERALL class:
C#1 ON X
C#2 ON X
*** ERROR One or more MODEL statements were ignored. These statements may be incorrect or are only supported by ALGORITHM=INTEGRATION.


Add ALGORITHM=INTEGRATION; to the ANALYSIS command.


Dear Drs. Muthen, I am rather new to Mplus and SEM and am a graduate student, so I apologize if I am overlooking something obvious. I read through the forums and understand that if I include my x-variables in the model as dependent variables, then I can avoid listwise deletion of cases. My problem is that when I do this, my output produces "WITH" output that I did not request. I simply do not understand why I am getting "WITH" output that I did not request in my input. Also, I am clueless about how this impacts the model I am outlining in the input (which only has 4 WITH statements). Additional info that might be needed: I am using WLSMV and have two categorical outcome measures in my model. Listwise=off. Is it possible to avoid getting all of the "WITH" output that was not requested? Thank you,


You can bring the covariates into the model only with maximum likelihood estimation, so you should not do this with WLSMV. If you get default WITH statements that you do not want, fix them to zero, for example: y1 WITH y2@0;


Dear Drs. Muthen, My mentors have been adamant that my analysis should use WLSMV because of the ordinal/categorical nature of my outcome measures. Thus, I guess my main follow-up question is how I can use WLSMV for my analysis and still avoid losing cases. Again, I apologize for my lack of experience, and thank you for your guidance.


Use ML or Bayes instead. Note that ML does not mean that you treat your outcome as continuous (it's a common misunderstanding). Just declare them as categorical. 


To make sure I understand, do I simply run my analysis as normal (listwise=off), but change the estimator to ML (still declaring categorical variables)? If so, then what paper could I cite for this process? If I am completely misunderstanding, then please help me understand a bit better. Thanks again, 


Q1: Yes. Q2: There is the whole IRT (Item Response Theory) literature. Or, you can cite the Skrondal & Rabe-Hesketh book Generalized Latent Variable Modeling.


Dear Drs. Muthen, My apologies. I should have mentioned that I was using the COMPLEX analysis option because of the inclusion of survey weights (weight, stratification, cluster). It is giving me an error stating that ML cannot be used with this type of analysis. Any other ideas on how I can avoid dropping my cases with missing data while still being able to use WLSMV? More info: the full sample is nearly 11,000 cases. Most of my variables have between 0.5% and 3% missing (not many missing), but three specific variables are missing 10-13% (they come from a different questionnaire; none of them are outcome measures). I did mean/mode imputation and ran the analysis the same way as I do with the missing values included, and the results are nearly identical to the pairwise-deleted output (trivial differences in a couple of coefficients; the significance of variables did not change). That being said, I am guessing this method is frowned upon. Again, thank you for all of the advice. I am learning a lot through your forums, publications, and User's Guide.


I should add that when I took the weights out and ran the model with ML (just to see how it would work), it still deleted the same number of cases as with WLSMV. Thus, I'm at a loss as to how to proceed. Thank you,


Send the relevant files and your license number to support@statmodel.com. 


I am running a three-wave cross-lagged analysis with two categorical variables (yes/no). I do have various patterns of missing data: sometimes data are missing on the first wave, sometimes on the second, the first and second, the second and third, etc. 1) To my understanding, Mplus uses FIML by default? 2) Should I use a different method to deal with missing data instead (multiple imputation or other)? Thank you, Vaiva


1. With maximum likelihood estimation, FIML is used. 2. Multiple imputation and FIML are asymptotically equivalent. I would use FIML unless there is some special reason to use multiple imputation. 
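If multiple imputation is used instead, the estimates from the imputed data sets have to be combined. The usual way is Rubin's rules: average the estimates, and add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m), where m is the number of imputations. The Python sketch below applies those rules to a single parameter; the numbers are made up for illustration.

```python
import math


def pool_rubin(estimates, variances):
    """Combine one parameter across m imputations with Rubin's rules.

    estimates: the parameter estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = sum(estimates) / m                                     # pooled estimate
    within = sum(variances) / m                                    # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


est, se = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])
print(round(est, 3), round(se, 3))
```

The between-imputation term is what carries the extra uncertainty due to the missing data; FIML accounts for that uncertainty directly in a single likelihood.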


Thank you. I think that because of the categorical nature of the variables, Mplus uses WLSMV. It also asks me to use the theta parameterization. If I add Estimator=ML; and remove parameterization=theta;, it gives me the error *** FATAL ERROR THIS MODEL CAN BE DONE ONLY WITH MONTECARLO INTEGRATION. Does this mean I need to do imputation because of the categorical nature of my main variables?


Can I run my model as if these were continuous variables? That way I would be allowed to use ML.


With WLSMV, pairwise present is used to handle missing data. Regarding the error, add INTEGRATION=MONTECARLO; and ALGORITHM=INTEGRATION; to the ANALYSIS command. You can use the CATEGORICAL option with ML. I don't think any of this implies anything about multiple imputation. 
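Pairwise present means that each statistic involving two variables (for example, a correlation between two binary outcomes at different waves) is computed from the cases observed on both of those variables. The Python sketch below only counts, for made-up three-wave data, the proportion of cases available for each pair; it does not estimate the correlations themselves.

```python
from itertools import combinations_with_replacement


def pairwise_coverage(rows, names):
    """Proportion of cases observed on both variables, for every pair (None = missing)."""
    n = len(rows)
    coverage = {}
    for i, j in combinations_with_replacement(range(len(names)), 2):
        both = sum(1 for row in rows if row[i] is not None and row[j] is not None)
        coverage[(names[i], names[j])] = both / n
    return coverage


# Hypothetical yes/no item measured at three waves, with different missing patterns
waves = [
    (0, 1, 1),
    (None, 1, 0),
    (1, None, None),
    (0, 0, 1),
]
for pair, prop in pairwise_coverage(waves, ["u1", "u2", "u3"]).items():
    print(pair, prop)
```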


Hi, I used x1 x2 x3; under the MODEL command. I did not receive any warning due to missingness, but my model fit became super strange, e.g., CFI = 0.000, TLI = 0.453. Without FIML, the model fit was great. What's happening here? Thank you in advance.


Please send outputs and license number to Support. 


Hi. I am trying to run a multilevel regression model. I have around 2000 kids with missing data (out of 7000), and to get around this problem, I have included all my variables' variances. Below you can see part of the syntax:

CLUSTER = laestab;
ANALYSIS: TYPE = TWOLEVEL random;
MODEL:
%WITHIN%
emosumx ON genderC EthnicWR FamSumC CommSumC;
genderC; EthnicWR; FamSumC; CommSumC;
%BETWEEN%
emosumx ON PhaseEdu1 PhaseEdu2 Urban;
PhaseEdu1; PhaseEdu2; Urban;

When I run the model, I get the following error:

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.228D-16. PROBLEM INVOLVING PARAMETER 26.
THE MODEL ESTIMATION TERMINATED NORMALLY

Parameter 26 is PSI GenderC X GenderC. So I am a tiny bit confused as to what I should be doing. What is wrong?


It sounds like the GenderC variable is binary. If so, you can ignore this message. 


Hello, I'm finding that when I use the USEOBSERVATIONS option, Mplus drops cases beyond the ones that are missing x-variables. For example, the Mplus output indicates 266 observations and indicates that 6 cases were dropped because they were missing an x-value; however, in my SPSS dataset, I have 292 cases. I'm wondering why 20 additional cases appear to have been excluded. Thanks for your help!


Please send the output, data set, and your license number to support@statmodel.com. 

Milan R posted on Thursday, August 10, 2017  11:55 am



Hi Dr. Muthen, I wanted to run a four-class GMM with binary predictors, which have missing data. What is the most appropriate way to handle the missingness on the x-side in my case? I learned from many posts on the forum that I could bring a predictor into the model by mentioning it and making distributional assumptions. But since mine are categorical, I wonder if mentioning their variances in the MODEL command is the way. Or should I follow the 3-step approach using auxiliary variables? Thanks!


This is a tricky situation without a good solution. You can use WITH among the x's to bring them into the model. But, this leads to numerical integration with many dimensions so it is computationally demanding. 
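The computational burden grows quickly because a fixed-grid numerical integration evaluates the likelihood at every combination of grid points, and, as noted earlier in this thread, each variance mentioned requires one dimension of integration. The Python sketch below uses 15 points per dimension purely as an illustrative assumption (it is not a statement about Mplus defaults) to show the exponential growth; Monte Carlo integration, also mentioned earlier in the thread, uses random draws instead of a full grid.

```python
def grid_points(points_per_dimension, dimensions):
    """Number of points a full product grid needs for numerical integration."""
    return points_per_dimension ** dimensions


for d in range(1, 7):
    # 15 points per dimension is an illustrative assumption only
    print(d, grid_points(15, d))
```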

Milan R posted on Thursday, August 10, 2017  7:23 pm



Hi Dr. Muthen, Thank you for your prompt reply! For my situation, do you have a rough estimate of the computational time needed to run this GMM with a sample size of 7000? I would like to know a reasonable length of time before giving up or suspecting a problem with my model. Thanks again. 


I couldn't say because it depends on so many factors including your specific model, the number of x variables with missing data, and your data. Ask for TECH8 and you will see the time each iteration takes. 

Amanda Sim posted on Monday, November 06, 2017  5:42 pm



Dear Dr. Muthen, My iv consists of a sum score of 17 yes/no items. There are 4 cases (out of 291) with missing data on one or more of the 17 items; hence, there are 4 cases with the iv missing. In order to include these 4 cases in my analysis, I brought the iv into the model. However, when I do this, the regression coefficients get much larger (e.g., a coefficient changes from 0.1 when the 4 cases are excluded to 0.4 when the iv is brought into the model). Do you think this is because multivariate normality is assumed when the iv is brought into the model (and the distribution of my iv is NOT actually normal)? I looked at the 4 cases to see if there was anything unusual about them, but although they are all clustered around one end of the distribution, they do not appear to be massive outliers. I am unsure how to interpret the different results when excluding the 4 cases vs. bringing them into the model and would appreciate your thoughts on how to proceed. Thank you, Amanda


Please send these 2 outputs to Support along with your license number. 
