Mplus Discussion >> Logistic regression with survey data FIML



Paola Zaninotto posted on Thursday, February 12, 2009 - 9:25 am

I want to run a logistic regression for a subpopulation (men), using the complex survey design and full information maximum likelihood (FIML) for missing data. I wrote the commands as follows; however, cases with missing values are still dropped. Why is it not using all cases? My total sample is 3,906 and it uses only 3,600; it excludes the cases with missing values on the x-variables.

Usevariables are cesd2 bmival25 waist25 indager2;

Missing = all (-9999);
USEOBSERVATIONS= indsex2 EQ 1;
STRAT IS strata2;
CLUSTER IS psu2;
WEIGHT IS wwgt2;
Categorical is cesd2;
Define:
psu2 = (strata2*10000)+psu2;
Analysis:
Type is complex;
ESTIMATOR=MLR;
Model:
cesd2 on bmival25 waist25 indager2;

Linda K. Muthen posted on Thursday, February 12, 2009 - 1:05 pm

The model is estimated conditioned on the independent variables. Missing data theory applies to the dependent variables. This is why these cases are excluded. You can bring them into the model by mentioning their variances in the MODEL command. When you do this, they are treated as dependent variables and distributional assumptions are made about them. This may not be appropriate.
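The difference between the two likelihoods can be sketched with a toy example. This is plain Python, not Mplus, and everything in it is an assumption made for illustration: a single binary covariate x, a Bernoulli distribution for x, made-up parameter values, and six invented cases. In the conditional likelihood a case with missing x has no term at all; once x is given a distribution, the same case contributes through the marginal probability of y.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_y_given_x(y, x, a, b):
    """P(y | x) under a logistic regression with intercept a and slope b."""
    p1 = logistic(a + b * x)
    return p1 if y == 1 else 1.0 - p1

def conditional_loglik(data, a, b):
    """Log-likelihood of y given x. Cases with missing x have no term."""
    used = [(y, x) for y, x in data if x is not None]
    ll = sum(math.log(p_y_given_x(y, x, a, b)) for y, x in used)
    return ll, len(used)

def joint_loglik(data, a, b, pi):
    """Log-likelihood of (y, x), assuming x ~ Bernoulli(pi).

    When x is missing, it is summed out: P(y) = sum over x of P(y | x) P(x).
    """
    ll = 0.0
    for y, x in data:
        if x is not None:
            px = pi if x == 1 else 1.0 - pi
            ll += math.log(p_y_given_x(y, x, a, b) * px)
        else:
            ll += math.log(
                p_y_given_x(y, 1, a, b) * pi
                + p_y_given_x(y, 0, a, b) * (1.0 - pi)
            )
    return ll, len(data)

# Hypothetical (y, x) pairs; None marks a missing covariate value.
data = [(1, 1), (0, 0), (1, 0), (0, 1), (1, None), (0, None)]

ll_cond, n_cond = conditional_loglik(data, a=0.0, b=0.5)
ll_joint, n_joint = joint_loglik(data, a=0.0, b=0.5, pi=0.5)
print(n_cond, n_joint)  # 4 6
```

The price of using all six cases is the extra parameter pi and the assumption that x follows the stated distribution, which is the caveat raised above.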

Paola Zaninotto posted on Friday, February 13, 2009 - 5:20 am

When I run the same model for a continuous dependent variable, it uses the whole sample, although the independent variables have missing values. Why is that?

Linda K. Muthen posted on Friday, February 13, 2009 - 8:18 am

For continuous variables and TYPE=GENERAL, the results are the same whether the exogenous observed variables are in the model or not.

Paola Zaninotto posted on Tuesday, February 17, 2009 - 7:55 am

I ran the logistic regression, bringing the independent variables into the model by mentioning their variances. I got a message saying that I need to specify INTEGRATION=MONTECARLO, so I ran the model as follows:

SUBPOPULATION IS men2 EQ 0;
Missing = all (-9999);
STRAT IS strata2;
CLUSTER IS psu2;
WEIGHT IS wwgt2;
Categorical is cesd2;
Define:
psu2 = (strata2*10000)+psu2;
Analysis:
TYPE = COMPLEX;
ESTIMATOR=MLR;
INTEGRATION=montecarlo;
Model:

cesd2 on bmival25 waist25 indager2 livepart2 totwq5_b2 alcohol2 cigst12 limitill2 adl2;
bmival25 waist25 indager2 livepart2 totwq5_b2 alcohol2 cigst12 limitill2 adl2;

In this way it uses all available cases (results are very close to those from complete case analysis).
Is this using the EM algorithm instead of FIML?

Bengt O. Muthen posted on Tuesday, February 17, 2009 - 11:07 am

FIML is used.

Just as a teaching note, because this topic often comes up: EM is an algorithm and FIML (or ML) is an estimator. ML can be done using EM or other algorithms, such as Newton-Raphson (NR), Fisher scoring (FS), and quasi-Newton (QN); see the Mplus User's Guide. There is a tendency in some SEM writings to refer to ML estimation of an unrestricted covariance matrix (and mean vector) as "EM" because that is the typical algorithm. But that's for continuous outcomes.

So ML is used here, allowing missing data, not doing imputations of missing data, but estimating the model parameters directly.
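The estimator-versus-algorithm distinction can be seen in a small sketch. This is plain Python, not Mplus, and it does not use the model or data from this thread: it assumes the classic textbook four-cell multinomial with cell probabilities (1/2 + t/4, (1 - t)/4, (1 - t)/4, t/4) and counts 125, 18, 20, 34. The function names are hypothetical. EM and Newton-Raphson are two routes to the same ML estimate, and neither imputes any data.

```python
import math

# Textbook counts for the four-cell multinomial (an assumption for
# illustration; these are not data from this thread).
y1, y2, y3, y4 = 125, 18, 20, 34

def ml_by_em(theta=0.5, tol=1e-12, max_iter=1000):
    """ML estimate of theta via EM, splitting cell 1 into two latent cells."""
    for _ in range(max_iter):
        # E-step: expected count in the latent sub-cell with probability theta/4.
        z = y1 * theta / (2.0 + theta)
        # M-step: complete-data ML estimate given the expected count.
        new = (z + y4) / (z + y2 + y3 + y4)
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta

def ml_by_newton_raphson(theta=0.5, tol=1e-12, max_iter=100):
    """ML estimate of theta via Newton-Raphson on the observed-data log-likelihood."""
    for _ in range(max_iter):
        score = y1 / (2.0 + theta) - (y2 + y3) / (1.0 - theta) + y4 / theta
        hess = (-y1 / (2.0 + theta) ** 2
                - (y2 + y3) / (1.0 - theta) ** 2
                - y4 / theta ** 2)
        new = theta - score / hess
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta

theta_em = ml_by_em()
theta_nr = ml_by_newton_raphson()
print(round(theta_em, 4), round(theta_nr, 4))  # 0.6268 0.6268
```

Both routines maximize the same likelihood, so they agree up to numerical tolerance; they differ only in how many iterations they take to get there.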