I want to run a logistic regression for a subpopulation (men), using the complex survey design and full information maximum likelihood for the missing data. I wrote the input as follows, but cases with missing values are still dropped. Why is it not using all cases? My total sample is 3,906 and only 3,600 are used; cases with missing values on the x variables are excluded.
Usevariables are cesd2 bmival25 waist25 indager2;
Missing = all (-9999);
USEOBSERVATIONS = indsex2 EQ 1;
STRAT IS strata2;
CLUSTER IS psu2;
WEIGHT IS wwgt2;
Categorical is cesd2;
Define: psu2 = (strata2*10000)+psu2;
Analysis: Type is complex;
ESTIMATOR=MLR;
Model: cesd2 on bmival25 waist25 indager2;
The model is estimated conditioned on the independent variables. Missing data theory applies to the dependent variables. This is why these cases are excluded. You can bring them into the model by mentioning their variances in the MODEL command. When you do this, they are treated as dependent variables and distributional assumptions are made about them. This may not be appropriate.
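As a minimal sketch of what mentioning the variances looks like, assuming the outcome is cesd2 (the variable on the USEVARIABLES and CATEGORICAL lists) and the same three covariates as in the input above:

```
MODEL:
  ! regression of the categorical outcome on the covariates
  cesd2 ON bmival25 waist25 indager2;
  ! listing the covariate names on their own refers to their variances,
  ! which brings them into the model so that cases with missing
  ! covariate values are retained
  bmival25 waist25 indager2;
```

The second statement is what changes the treatment of the covariates: they are no longer conditioned on but modeled, so the distributional assumptions mentioned above now apply to them as well.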
I ran the logistic regression with the independent variables brought into the model by mentioning their variances. I got a message saying that I need to specify INTEGRATION=MONTECARLO, so I ran the model as follows:
SUBPOPULATION men2 EQ 0;
Missing = all (-9999);
STRAT IS strata2;
CLUSTER IS psu2;
WEIGHT IS wwgt2;
Categorical is cesd2;
Define: psu2 = (strata2*10000)+psu2;
Analysis: TYPE = COMPLEX;
ESTIMATOR=MLR;
INTEGRATION=montecarlo;
Model:
Just as a teaching note, because this is a topic that often comes up: EM is an algorithm, and FIML (or ML) is an estimator. ML can be carried out using EM or other algorithms, such as Newton-Raphson (NR), Fisher scoring (FS), and quasi-Newton (QN); see the Mplus User's Guide. There is a tendency in some SEM writings to refer to ML estimation of an unrestricted covariance matrix (and mean vector) as "EM" because that is the typical algorithm. But that's for continuous outcomes.
So ML is used here: it allows for missing data and estimates the model parameters directly, without imputing the missing values.