Mplus Discussion >> Auxiliary function of missing data

Auxiliary function of missing data

Mplus Discussion > Missing Data Modeling >

Xu, Man posted on Tuesday, August 05, 2008 - 11:06 am

Sorry if I have posted this more than once.
I just experimented with the auxiliary function in Mplus 5.1. The results are quite similar to FIML. I am not sure whether the auxiliary variables are handled in relation to the clustering effect when the data have a clustered structure. I used TYPE=COMPLEX to account for the cluster effect.
Can I have more information regarding this function please?
Thanks!

Bengt O. Muthen posted on Tuesday, August 05, 2008 - 3:12 pm

Yes, Type=Complex is in operation when you use aux(m).

Xu, Man posted on Thursday, August 07, 2008 - 9:26 am

Thanks! I didn't fully understand the technical appendix for this function, but if I understood correctly, this function implements the method from Graham (2003), right? My confusion is that the paper doesn't specify how this works for multilevel data. How does Mplus take that into account when Type=Complex is used together with aux(m)?

Graham, J. W. (2003). Adding missing-data-relevant variables to FIML-based structural equation models. Structural Equation Modeling, 10(1), 80-100.

Bengt O. Muthen posted on Thursday, August 07, 2008 - 9:43 am

Aux(m) still uses maximum-likelihood estimation, just with an extended set of variables. Type=Complex adjusts for complex sample features just like with other ML estimation - so there is no extra difficulty when aux(m) is added. Type=twolevel is another matter.
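A minimal sketch of the combination described above, assuming a single-level data file with a cluster identifier. The variable names (school, y, x1, z1, z2) and the missing-value code are hypothetical. The auxiliary variables stay off USEVARIABLES, and TYPE = COMPLEX with CLUSTER supplies the standard-error correction just as it would without AUXILIARY = (M).

```
VARIABLE:
  NAMES = school y x1 z1 z2;   ! hypothetical variable names
  USEVARIABLES = y x1;         ! analysis variables only
  CLUSTER = school;            ! cluster identifier used by TYPE = COMPLEX
  MISSING = ALL (999);         ! assumed missing-value code
  AUXILIARY = (M) z1 z2;       ! continuous missing-data correlates
ANALYSIS:
  TYPE = COMPLEX;
  ESTIMATOR = MLR;             ! the default with TYPE = COMPLEX, written out for clarity
MODEL:
  y ON x1;
```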

Xu, Man posted on Thursday, August 07, 2008 - 10:40 am

I see. Thanks for the reply! How is it another matter with Type=twolevel? (Sorry if my question is a bit "idiotic"...)

Bengt O. Muthen posted on Thursday, August 07, 2008 - 6:17 pm

Well, then you have to make sure that the "saturated correlates" approach is applied correctly at both levels; this has not been explicated in the literature. It may be straightforward, but...

Andy Ross posted on Wednesday, February 11, 2009 - 1:57 am

Hi

Following on from your last post, are there any plans to make aux(m) available for type=twolevel or type=complex?

Many thanks!

Andy Ross posted on Wednesday, February 11, 2009 - 1:59 am

Apologies: in my last post I meant type=mixture, not type=complex.

Linda K. Muthen posted on Wednesday, February 11, 2009 - 10:20 am

We have no plans to do this in the immediate future.

Guillaume Filteau posted on Tuesday, April 21, 2009 - 2:15 pm

Is it possible to estimate the variance of an auxiliary covariate?

Would that result in using all individuals for the analysis, even if they are missing the auxiliary variable?

Best,
Guillaume

Linda K. Muthen posted on Wednesday, April 22, 2009 - 9:36 am

No, only the mean and the standard error of the mean are given.

Sara May posted on Sunday, October 25, 2009 - 10:27 am

Is it possible to specify both binary variables as well as continuous variables as missing data correlates?

The following syntax doesn't work if I specify the dummy variables as categorical:

auxiliary = (m) dummy1 dummy2 continuous1 continuous2;

Thanks!

Linda K. Muthen posted on Sunday, October 25, 2009 - 11:17 am

This is for continuous variables only.

Maren Winkler posted on Monday, May 10, 2010 - 3:08 am

Hi,

I estimated an SEM and had no problems with fit, SEs, etc. Now I've added auxiliary variables (aux (m)) and got the following message:

"THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR A LATENT VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO LATENT VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO LATENT VARIABLES. CHECK THE TECH4 OUTPUT FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE ."

I have a model as follows:

F1 BY a b c;
F2 BY x y z;

RESID BY d;
RESID WITH F1@1;

d ON F1;
d@0;

F2 ON F1 RESID;

(The factor RESID is specified to allow an additional path from the residual of variable d to F2, over and above the path from F1, on which d also loads.)
The correlation of d and RESID is .906 - which is no surprise given the model.

I'm confused because the model as above worked fine without auxiliary variables.

thanks for your help!

Linda K. Muthen posted on Monday, May 10, 2010 - 8:04 am

I think you mean:

RESID WITH F1@0;
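For readers following along, here is the model from the question with that correction applied. The comments are an editorial reading of the residual-factor setup the poster describes, not part of the original exchange.

```
MODEL:
  F1 BY a b c;
  F2 BY x y z;
  RESID BY d;        ! loading of d on RESID fixed at 1 by default
  d@0;               ! d's own residual variance fixed at 0, so RESID carries it
  d ON F1;           ! d is still predicted by F1
  RESID WITH F1@0;   ! a residual should be uncorrelated with F1 (@0, not @1)
  F2 ON F1 RESID;    ! extra path from d's residual to F2
```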

Carolyn CL posted on Thursday, December 13, 2012 - 9:12 am

Hello Dr. Muthen,

I was wondering if there is a way to make the AUXILIARY = (m)x; function work when some of the dependent variables are categorical?

If I treat my variables as continuous and run the (m)x function and compare this model to a saturated correlate model, the results tend to be quite similar in terms of fit indices (CFI, RMSEA), effect sizes and significance. However I am not comfortable treating these categorical variables as continuous because of their non-normal distributions.

Would very much appreciate your suggestions.

Carolyn

Linda K. Muthen posted on Thursday, December 13, 2012 - 9:43 am

AUXILIARY (m) is available only for continuous variables.

Karen-Inge Karstoft posted on Monday, January 07, 2013 - 3:55 am

Hi,

Given that it is not possible to use aux (m) with TYPE=MIXTURE, is there any other way to account for variables predicting missingness without including them in the model?

Thanks,
Karen-Inge

Linda K. Muthen posted on Monday, January 07, 2013 - 10:58 am

You could use multiple imputation. See DATA IMPUTATION.

Chris Blanchard posted on Friday, January 11, 2013 - 11:23 am

Hi,
I'm new to Mplus and wanted to confirm the following. I have a repeated-measures design with 4 time points (steps per day at the end of CR, 3 months after CR, 6 months after CR, and 9 months after CR). In some preliminary analyses, I found that Diag_Rec (categorical), BMI, and frst_ev (categorical) were related to missingness at the follow-up assessments. I want to be sure these variables are included in the model: do I use the auxiliary option to do so in the analysis below? If not, do I have to use multiple imputation instead? Thanks!
chris

VARIABLE:
NAMES ARE id Diag_Rec BMI frst_ev t2_steps t3_steps t4_steps t5_steps;

USEV ARE t2_steps t3_steps t4_steps t5_steps;

MISSING t2_steps(999) t3_steps(999) t4_steps(999) t5_steps(999);

AUXILIARY = Diag_Rec BMI frst_ev;

CLASSES = c(2);

ANALYSIS:
ESTIMATOR = ML;
TYPE = MIXTURE;
STARTS 100 5;
MITERATION = 300;

MODEL:
%OVERALL%
i s | t2_steps@-3 t3_steps@-1 t4_steps@1 t5_steps@3;
i-s@0;

Bengt O. Muthen posted on Friday, January 11, 2013 - 4:56 pm

I assume these 3 variables don't have a substantive role in your growth model; if they do, just include them in the model. If not, either approach you mention is fine in principle. You have to specify Auxiliary Missing, not just Auxiliary (see the UG). But I think we have not yet developed Aux Missing for mixtures, so that won't work. Multiple imputation is possible, although you assume a 2-class model, so regular unrestricted (1-class) imputation is a bit off, but probably better than not using those variables.
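A minimal two-step sketch of the multiple-imputation route, reusing Chris's variable names. Assumptions: 999 is the missing-value code for every variable, 20 imputed data sets are sufficient, and the file names (chris.dat, imp*.dat) are hypothetical. The imputation model is the unrestricted one-class model mentioned in the reply, so it does not reflect the two-class structure. The variable order in the imputed files should be checked against the step 1 output before writing NAMES in step 2.

```
! Step 1: impute, letting the missingness correlates inform the imputations
DATA:
  FILE = chris.dat;                  ! hypothetical file name
DATA IMPUTATION:
  IMPUTE = t2_steps-t5_steps Diag_Rec (c) BMI frst_ev (c);
  NDATASETS = 20;
  SAVE = imp*.dat;                   ! Mplus also writes a list file, implist.dat
VARIABLE:
  NAMES ARE id Diag_Rec BMI frst_ev t2_steps t3_steps t4_steps t5_steps;
  USEVARIABLES ARE Diag_Rec BMI frst_ev t2_steps-t5_steps;
  MISSING = ALL (999);               ! assumed missing-value code
ANALYSIS:
  TYPE = BASIC;

! Step 2 (a separate input file): the growth mixture model on the imputed data
DATA:
  FILE = implist.dat;
  TYPE = IMPUTATION;
VARIABLE:
  ! assumed to match the step 1 USEVARIABLES order; confirm in the step 1 output
  NAMES ARE Diag_Rec BMI frst_ev t2_steps t3_steps t4_steps t5_steps;
  USEVARIABLES ARE t2_steps t3_steps t4_steps t5_steps;
  CLASSES = c(2);
ANALYSIS:
  TYPE = MIXTURE;
  ESTIMATOR = ML;
  STARTS = 100 5;
  MITERATIONS = 300;
MODEL:
  %OVERALL%
  i s | t2_steps@-3 t3_steps@-1 t4_steps@1 t5_steps@3;
  i-s@0;
```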

Maren Formazin posted on Wednesday, February 13, 2013 - 8:04 am

Hi,

I have data from 2326 subjects. For the model that I'm interested in, I use 7 items. There are 8 subjects with missing data on all 7 items.

If I estimate my model without the AUXILIARY (M) command, Mplus warns me that there are 8 cases with missing data on all variables and reports N = 2318.

However, if I estimate my model with the AUXILIARY (M) command with my data, Mplus warns me "1 case with complete missing data" and tells me N = 2325.

This doesn't make sense to me - there are indeed 8 subjects with complete missing data on those 7 items that constitute my model; however, all 8 participants have non-missing data on the auxiliary variables.

What can have gone wrong here?

Thanks!

Linda K. Muthen posted on Wednesday, February 13, 2013 - 3:53 pm

Please send the two outputs, data, and your license number to support@statmodel.com.

Carolyn CL posted on Tuesday, March 19, 2013 - 12:11 pm

Hello Dr. Muthen,

I am running a saturated correlate structural equation model with socio-economic status as the auxiliary variable and dummies representing poverty trajectories (3 dummies, 4th reference category excluded) as independent variables predicting weight status (3 categories: normal, overweight, obese).

When I run a basic model (N = 1230):

Weight ON d_low d_inc d_dec;

The model runs fine.

When I add the auxiliary variable (N = 2120):

Weight ON d_low d_inc d_dec;

SES WITH Weight d_low d_inc d_dec;

The coefficients of the dummy variables are comparable, but the standard errors are inflated, and one significant effect becomes nonsignificant.

When I run the full model (with additional independent and dependent variables) including the auxiliary variable SES, the model fails to converge. Increasing the number of iterations does not solve the problem. I can however run the model without the auxiliary and the model converges, but I obviously lose part of my sample.

Any idea why the full model with the auxiliary will not converge?

Carolyn

Linda K. Muthen posted on Tuesday, March 19, 2013 - 12:17 pm

I would have no idea. You would need to send the outputs and your license number to support@statmodel.com for further information.

Carolyn CL posted on Thursday, March 21, 2013 - 8:46 am

I found the problem - the model was by default allowing the dummy variables to correlate with each other once I added the auxiliary variable. Restricting the correlations to 0 allows the model to converge and provides the expected results.

Many thanks!

Michael J. Kieffer posted on Sunday, February 23, 2014 - 11:59 am

Imagine a model-building process with a dataset that has missing data and a simple model that includes:
USEVARIABLES = y x1;
AUXILIARY = (m) c1 c2;
Model: y ON x1;

Then, I want to know if c1 should be a control, so I do:
USEVARIABLES = y x1 c1;
AUXILIARY = (m) c2;
Model: y ON x1 c1;

In such a case, I feel it's important to list c1 as auxiliary in the first model, because otherwise the two models would seem to be fitted to different variance-covariance matrices (i.e., one incorporates c1 in FIML and the other doesn't). I'm not talking about comparing the goodness of fit, but just a less formal model-building process where I'm trying to decide which covariates to include based on their z-stats. For instance, in a scenario in which I have a large number of covariates to consider, I systematically try different covariates but make sure to include them either in AUXILIARY or USEVARIABLES. Or a scenario where I want to present effect sizes for the predictor in question with two different combinations of covariates.

Does this approach make sense, or am I worrying too much about leaving c1 out of both AUXILIARY and USEVARIABLES in one model and then including it in USEVARIABLES in a later model?

Michael J. Kieffer posted on Sunday, February 23, 2014 - 12:03 pm

I ask the question above because I also frequently use BOOTSTRAP, which I realize is incompatible with AUX = (m). If my concern is valid, are there any plans to make these two compatible in the future?

PS: I realize that the MODEL commands above also need a line such as x1 c1; so that these covariates are treated as outcomes and FIML works for them. I just ran out of space.
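Because BOOTSTRAP is not available with AUXILIARY = (M), one workaround is to write the saturated-correlates model (Graham, 2003) out by hand, so the auxiliaries become ordinary analysis variables. This is an editorial sketch, not something confirmed in the replies here. It assumes continuous c1 and c2, a missing-value code of 999, and 1000 bootstrap draws, and it correlates each auxiliary with the exogenous predictor, with the residual of the outcome, and with the other auxiliary.

```
VARIABLE:
  ! NAMES and DATA as in your own input (not shown)
  USEVARIABLES = y x1 c1 c2;   ! auxiliaries enter as ordinary variables
  MISSING = ALL (999);         ! assumed missing-value code
ANALYSIS:
  ESTIMATOR = ML;
  BOOTSTRAP = 1000;            ! usable here because AUXILIARY = (M) is not used
MODEL:
  y ON x1;                     ! substantive model
  x1;                          ! brings x1 into the model so FIML covers its missing values
  c1 c2 WITH x1 y;             ! auxiliaries correlate with the predictor and y's residual
  c1 WITH c2;                  ! and with each other
OUTPUT:
  CINTERVAL (BOOTSTRAP);
```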

Bengt O. Muthen posted on Monday, February 24, 2014 - 3:22 pm

I think it is difficult to deal with missing data using Auxiliary (m) at the same time as you try to decide which covariates to control for. I would do the latter first and then, if needed, the former, particularly since Aux m often does not affect the results much. You may also want to take this to a general analysis discussion list such as SEMNET.

Stafanie Chris posted on Wednesday, March 05, 2014 - 5:44 am

Hello Dr. Muthen,

I did a latent class analysis and determined that a three-class solution is most appropriate. Then, in the following step, I included three distal continuous outcomes using Lanza's method (DCON). To account for the clustered nature of my data (cases nested in classes), I used "TYPE=COMPLEX MIXTURE". However, I got a warning: "Auxiliary variables with DCATEGORICAL or DCONTINUOUS are not available with TYPE=MIXTURE COMPLEX."

What should I do to account for the hierarchical structure of my data in Mplus?

Many thanks!

Stafanie

Bengt O. Muthen posted on Wednesday, March 05, 2014 - 10:34 am

You are right that this is not implemented yet for Lanza's method. I don't have a good answer to give you.

Rimantas Vosylis posted on Tuesday, May 26, 2015 - 10:33 am

Dear Muthens,
I am running a large longitudinal CFA model on a dataset that has some missing data. When I run this model without aux variables it runs fine, but when I include (m) aux variables, I get a warning: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. <...> PROBLEM INVOLVING VARIABLE.
It doesn't tell me which variable creates the problem. It may be related to one of the dependent variables in the model, whose residual variance I have fixed to zero because it was always nonsignificant in my previous models and I kept getting warnings. This variable differs from the others in the model (it is grades received in school, while the others are subjective measures of scholastic competence), so I believe fixing its residual variance to zero is a good strategy (and it seemed to be, because I had no more warning messages). I think the problem may be caused by the covariances between this variable and the aux variables that the “aux (m)” option adds automatically: since the error variance of my variable is 0, the correlation is not estimated and produces the warning.
Do You think this can be the case? Is it possible to fix some covariances of auxiliary variables to zero?
Maybe there is some other reason? Maybe I should ignore this message because the actual model and parameters in the model look fine?
Thanks in advance!

Linda K. Muthen posted on Tuesday, May 26, 2015 - 10:56 am

Please send the relevant outputs and your license number to support@statmodel.com.

Aurelie Lange posted on Monday, December 12, 2016 - 1:49 am

Dear Dr Muthén,

I am using MLR in a type=complex model with longitudinal data. For one of the analyses, I am only interested in the last time point. Therefore, I have added the other time points (as well as some client characteristics) to AUXILIARY so that MLR can use all available information when estimating the model.
However, the parameter estimates do not change at all when including or excluding the AUXILIARY command. Hence, I am doubting whether it is working appropriately.

If auxiliary is not working well in this situation, would it be better to use multiple imputation instead? I understood that maximum likelihood and M.I. perform equally well, but if ML can't use all available information, then M.I. might be better, right?

Sincerely,
Aurelie

Bengt O. Muthen posted on Monday, December 12, 2016 - 4:31 pm

I would not use MI when you can do ML-MAR ("FIML"). For instance, you can include the second to last time point in the model and just use WITH to connect it to the last outcome - this is essentially what Aux(M) does. But if Aux (M) didn't change the estimates then this won't either - and nor would MI.
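A minimal sketch of that alternative, with hypothetical names: y4 is the last-time-point outcome, x1 and x2 are its predictors, y3 is the second-to-last time point, and clinic is the cluster variable for TYPE = COMPLEX. The y3 WITH y4 line is the connection the reply describes; the y3 ON x1 x2 line is an editorial addition that lets the auxiliary relate to the covariates as well.

```
VARIABLE:
  NAMES = clinic y3 y4 x1 x2;   ! hypothetical names
  USEVARIABLES = y3 y4 x1 x2;   ! y3 is in the model, not on AUXILIARY
  CLUSTER = clinic;
  MISSING = ALL (999);          ! assumed missing-value code
ANALYSIS:
  TYPE = COMPLEX;
  ESTIMATOR = MLR;
MODEL:
  y4 ON x1 x2;                  ! model of interest for the last time point
  y3 ON x1 x2;                  ! editorial addition: y3 also relates to the covariates
  y3 WITH y4;                   ! residual covariance connecting y3 to the outcome
```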

Rolf Gjestad posted on Wednesday, February 21, 2018 - 3:24 am

Dear Muthens

I analyze data from a national spine registry in Norway, which has a function score measured at 3 time points (0, 3 and 12 months). A linear latent growth curve model does not fit, as almost all change takes place in the first 3 months. I therefore used a contrast-coded model with the residual variances set to zero:

!AUXILIARY = (M) Age ;
!Age Gender Roker BMI ;
!OpKompl Liggdogn SmRyT1 SmBeT1 ASA TidlOpr VAST1 ;

Analysis:
Estimator = MLR ;

Model:
I S1 | ODIT1@0 ODIT2@1 ODIT3@1 ;
I S2 | ODIT1@0 ODIT2@0 ODIT3@1 ;
ODIT1@0 ; ODIT2@0 ; ODIT3@0 ;

This model with df=0 works fine for giving mean change and individual changes (variance), but without goodness-of-fit statistics. I also analyzed a latent difference score model, which of course gave equal estimates. However, when I included an auxiliary variable I got the error message:

THE MODEL ESTIMATION TERMINATED NORMALLY

WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE..... PROBLEM INVOLVING VARIABLE.

Is the problem the lack of residual variance in the observed variables?

Bengt O. Muthen posted on Wednesday, February 21, 2018 - 4:52 pm

Why do you fix the residual variances at zero?

Rolf Gjestad posted on Wednesday, February 21, 2018 - 10:09 pm

Hi again,

I did that in order to identify the model, which then makes this a difference score model. Of course I should have had more measurement points and estimated latent change, but I have only 3 measurement points, and the change is not linear at either the mean or the individual level (variance and plots in Mplus). However, I tried two models in which I pre-specified 10% and 20% of the variance in the observed variables as residual variance, and then no error message was given. Could that be a strategy here, or should I pre-specify the same variance estimate in all three variables?

Regards,
Rolf

Bengt O. Muthen posted on Thursday, February 22, 2018 - 3:28 pm

You can hold the residual variances equal across time.
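A sketch of how the equality constraint can be written, using Rolf's variable names. The parameter count is an editorial check: three observed variables give 3 means plus 6 variances and covariances, or 9 sample statistics, while I, S1 and S2 contribute 3 means, 3 variances and 3 covariances. Adding one common residual variance makes 10 parameters, so one further restriction is needed; fixing the slope covariance is shown as one possibility.

```
MODEL:
  ! growth statements for I, S1 and S2 as in the original input (not repeated here)
  ODIT1 ODIT2 ODIT3 (1);   ! residual variances held equal: one free parameter
  S1 WITH S2@0;            ! one possible extra restriction so the model is identified
```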

Rolf Gjestad posted on Thursday, February 22, 2018 - 10:34 pm

Many thanks Bengt

Yes, I have now done this. But then I had to restrict a parameter in order to identify the model. I first tried the relation between I and S2, as this relation was not significant in the starting model. However, this led to problems and standard errors were not computed. Then I restricted the covariance between the slope factors, and the model turned out OK (but I lost a statistically significant relation between change from 0-3 and 3-12 months). So this model will work.

Regards,
Rolf

Aurelie Lange posted on Monday, July 16, 2018 - 5:57 am

Dear Dr. Muthen,

I have tried adding auxiliary variables, but it doesn't seem to make any difference: all the values are exactly the same as the results for the model without auxiliary variables. Yet I have included a fair amount of theoretically relevant information as auxiliary variables. Am I doing something wrong, or is it common that results do not change?

Thanks for your reply!

Sincerely,
Aurelie

Bengt O. Muthen posted on Monday, July 16, 2018 - 2:42 pm

Are you referring to Auxiliary (M), that is, for missing? If so, send your output to Support along with your license number.

Robert Archer posted on Thursday, October 04, 2018 - 3:26 pm

Hello, I am running a longitudinal mediation model and am trying to include auxiliary variables. When I run the model I get this error message: *** ERROR in VARIABLE command
Analysis with categorical variables is not available with the 'm' specifier in the AUXILIARY option.

Is there a way around this so that I can keep my categorical variables in the model? Thank you for any information you can provide.

Bengt O. Muthen posted on Friday, October 05, 2018 - 2:44 pm

You could treat the categorical variable as continuous.

Robert Archer posted on Friday, October 05, 2018 - 4:00 pm

Hello Dr. Muthen, in order to treat the categorical variables as continuous should this be an option specified in Mplus or should I look into a process to recode the variables? Thank you for your response and information.

Bengt O. Muthen posted on Friday, October 05, 2018 - 5:33 pm

You just don't put them on the categorical list.

Robert Archer posted on Friday, October 05, 2018 - 6:07 pm

Okay, thank you very much.

Lisa van Zutphen posted on Wednesday, January 02, 2019 - 7:21 am

Hello,

I am going to run a cross-lagged panel model. FIML with auxiliary variables will be used to handle missing data. I noticed that there are various codes in parentheses (page 896):

AUXILIARY = names of auxiliary variables;
names of auxiliary variables (M);
names of auxiliary variables (R3STEP);
names of auxiliary variables (R);
names of auxiliary variables (BCH);
names of auxiliary variables (DU3STEP);
names of auxiliary variables (DCATEGORICAL);
names of auxiliary variables (DE3STEP);
names of auxiliary variables (DCONTINUOUS);
names of auxiliary variables (E);

I cannot figure out what these codes (such as (M), (E) or (BCH)) mean, or whether I should add a code in parentheses or simply list the names of the auxiliary variables without one. I hope you can explain the codes to me.

Bengt O. Muthen posted on Wednesday, January 02, 2019 - 11:47 am

See pages 615 - 618 in the Version 8 User's Guide posted on our website. The Aux (M) approach is what you want.
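A minimal two-wave cross-lagged panel sketch with AUXILIARY = (M), assuming continuous variables. The names (x1, x2, y1, y2 for the two constructs at times 1 and 2; a1, a2 as auxiliaries) and the missing-value code are hypothetical. The x1 WITH y1 line brings the time-1 variables into the model, so cases with missing time-1 data are kept under FIML rather than dropped as they would be for exogenous covariates by default.

```
VARIABLE:
  NAMES = x1 x2 y1 y2 a1 a2;   ! hypothetical names
  USEVARIABLES = x1 x2 y1 y2;
  MISSING = ALL (999);         ! assumed missing-value code
  AUXILIARY = (M) a1 a2;       ! continuous missing-data correlates
MODEL:
  x2 ON x1 y1;                 ! autoregressive and cross-lagged paths
  y2 ON y1 x1;
  x1 WITH y1;                  ! time-1 covariance; brings x1 and y1 into the model
  x2 WITH y2;                  ! time-2 residual covariance
```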

Lisa van Zutphen posted on Thursday, January 03, 2019 - 1:29 am

Thank you very much for your fast reply!

Ashley Hum posted on Wednesday, July 31, 2019 - 1:59 pm

Hello,
I am running an LCA and want to use Aux commands to do 2 things: 1) use Aux (m) to account for variables important for missingness and 2) use R3STEP to examine predictors of class membership.

Based on my attempts, it appears you cannot use Aux = variable (m) in conjunction with Aux = variable (R3STEP).

I know that with R3STEP the auxiliary variables don't affect the estimation of the classes. Are variables included on R3STEP (which may also be important for missingness) used in FIML for the part of the model that estimates the classes (in addition to the part that uses multinomial logistic regressions to examine predictors of class membership)?

Thank you very much.
Ashley

Bengt O. Muthen posted on Wednesday, July 31, 2019 - 5:38 pm

No. You would have to do 1-step if you want to enjoy the benefits of FIML.

Ting Dai posted on Friday, April 17, 2020 - 2:29 pm

Dear Drs. Muthen: if a variable is potentially both related to missing at various time points and an outcome which is influenced by the latent slope and intercept of a LGC model, can I possibly include z as both an auxiliary variable AND an endogenous variable in the model?

I tried it with the ex11.1 by adding necessary syntax:

DATA: FILE = ex11.1.dat;
VARIABLE: NAMES = x1 y1-y4 z x2;

USEVARIABLES = y1-y4 z; !Adding z in the model.

MISSING = ALL (999);
AUXILIARY = (m) z;
ANALYSIS: ESTIMATOR = ML;
MODEL: i s | y1@0 y2@1 y3@2 y4@3;

z ON i s; !Adding z in the model.

OUTPUT: TECH1;

But Mplus does not recognize z as a useable variable for the LGC model. How might I achieve it?

Thanks in advance.

Bengt O. Muthen posted on Friday, April 17, 2020 - 5:59 pm

It is sufficient to include z as a DV in the model. Don't put it on the Auxiliary list.
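Applied to the input in the question, this amounts to dropping the AUXILIARY line so that z is used only as a dependent variable in the model, where it already enters the FIML likelihood. A sketch, otherwise unchanged from the posted input:

```
DATA: FILE = ex11.1.dat;
VARIABLE: NAMES = x1 y1-y4 z x2;
  USEVARIABLES = y1-y4 z;   ! z is a model variable only; no AUXILIARY line
  MISSING = ALL (999);
ANALYSIS: ESTIMATOR = ML;
MODEL: i s | y1@0 y2@1 y3@2 y4@3;
  z ON i s;                 ! z as an outcome of the growth factors
OUTPUT: TECH1;
```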

Ting Dai posted on Thursday, April 23, 2020 - 2:56 pm

Dear Drs. Muthen:

Can a categorical or dichotomous variable be used as an auxiliary variable? In other words, can z in the following syntax be a categorical or a dichotomous variable?

AUXILIARY = (m) z;

Thanks in advance!

Bengt O. Muthen posted on Thursday, April 23, 2020 - 6:21 pm

Aux m is only for continuous variables. You would have to do the modeling yourself to mimic what Aux m does (see our Short Courses).

Simon Coulombe posted on Wednesday, June 24, 2020 - 7:31 pm

Hi,
I want to use bootstrap together with auxiliary (m) (because of missing values). It seems impossible to combine these two options. Am I right?

Also, I was wondering, if I include auxiliary variables but do not specify "(m)", how are these variables treated? Is it helping to deal with missingness in that case?
Thank you
Simon

Tihomir Asparouhov posted on Thursday, June 25, 2020 - 10:40 am

Bootstrap is currently not available in combination with auxiliary (m).

Without (m) the auxiliary variables will not be used to help with the missing data. They will just be kept in the data set if you are saving the data via the savedata command.

If the bootstrap SEs are important in your application, it is technically possible to get them, but with extra effort. You would have to generate bootstrap samples (Mplus currently won't do that for you) and then analyze them using the external Monte Carlo feature in Mplus (see User's Guide example 12.6, Step 2). SAVEDATA: RESULTS = will give the estimates from each bootstrap sample, and those can be further manipulated to get SEs or confidence limits.
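A sketch of Step 2, assuming the bootstrap samples (each drawn with replacement from the original data, with the same N) have already been generated outside Mplus and their file names listed one per line in a text file. The names bootlist.dat, bootest.dat, y, x1, c1 and c2 are hypothetical, and the input assumes AUXILIARY = (M) is accepted together with TYPE = MONTECARLO, as the reply implies.

```
DATA:
  FILE = bootlist.dat;     ! hypothetical list of bootstrap data files, one per line
  TYPE = MONTECARLO;       ! external Monte Carlo: analyze each listed file
VARIABLE:
  NAMES = y x1 c1 c2;      ! hypothetical names
  USEVARIABLES = y x1;
  MISSING = ALL (999);     ! assumed missing-value code
  AUXILIARY = (M) c1 c2;
ANALYSIS:
  ESTIMATOR = ML;
MODEL:
  y ON x1;
SAVEDATA:
  RESULTS = bootest.dat;   ! parameter estimates from every bootstrap sample
```

The standard deviation of a parameter's estimates across the saved samples serves as its bootstrap SE, and the 2.5th and 97.5th percentiles of those estimates give a percentile confidence interval.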