

I want to run a logistic regression for a subpopulation (men), using the complex survey design and full-information maximum likelihood for missing data. I wrote the input below, but cases with missing values are still dropped. Why is it not using all cases? My total sample is 3,906 and only 3,600 cases are used; it excludes those with missing values on the x variables.

Usevariables are cesd2 bmival25 waist25 indager2;
Missing = all (9999);
USEOBSERVATIONS = indsex2 EQ 1;
STRAT IS strata2;
CLUSTER IS psu2;
WEIGHT IS wwgt2;
Categorical is cesd2;
Define: psu2 = (strata2*10000)+psu2;
Analysis: Type is complex;
ESTIMATOR = MLR;
Model: cesd3 on bmival25 waist25 indager2;


The model is estimated conditioned on the independent variables. Missing data theory applies to the dependent variables. This is why these cases are excluded. You can bring them into the model by mentioning their variances in the MODEL command. When you do this, they are treated as dependent variables and distributional assumptions are made about them. This may not be appropriate. 
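As a minimal sketch of what "mentioning their variances" looks like, the fragment below reuses the variable names from the question. It is illustrative only: the INTEGRATION line is an assumption, included because Mplus may request numerical integration once covariates with missing values are brought into a model with a categorical outcome.

```
Analysis:
  Type is complex;
  ESTIMATOR = MLR;
  INTEGRATION = MONTECARLO;     ! assumption: may be requested by Mplus
Model:
  cesd2 on bmival25 waist25 indager2;
  bmival25 waist25 indager2;    ! naming the covariates mentions their variances
```

Listing the covariates on their own line is what makes them part of the model, so the distributional assumptions noted above now apply to them.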


When I run the same model with a continuous dependent variable, it uses the whole sample, even though the independent variables have missing values. Why is that?


For continuous variables and TYPE=GENERAL, the results are the same whether the exogenous observed variables are in the model or not. 


I ran the logistic regression, bringing the independent variables into the model by mentioning their variances. I got a message saying that I need to specify INTEGRATION=MONTECARLO, so I ran the model as follows:

SUBPOPULATION men2 EQ 0;
Missing = all (9999);
STRAT IS strata2;
CLUSTER IS psu2;
WEIGHT IS wwgt2;
Categorical is cesd2;
Define: psu2 = (strata2*10000)+psu2;
Analysis: TYPE = COMPLEX;
ESTIMATOR = MLR;
INTEGRATION = montecarlo;
Model: cesd2 on bmival25 waist25 indager2 livepart2 totwq5_b2 alcohol2 cigst12 limitill2 adl2;
bmival25 waist25 indager2 livepart2 totwq5_b2 alcohol2 cigst12 limitill2 adl2;

Specified this way, it uses all available cases (the results are very close to those from the complete-case analysis). Is this using the EM algorithm instead of FIML?


FIML is used. Just as a teaching note, because this is a topic that often comes up: EM is an algorithm, and FIML (or ML) is an estimator. ML can be done using EM or other algorithms (such as NR, FS, and QN; see the Mplus User's Guide). There is a tendency in some SEM writings to refer to ML estimation of an unrestricted covariance matrix (and mean vector) as "EM" because that is the typical algorithm, but that applies to continuous outcomes. So ML is used here: it allows missing data and estimates the model parameters directly, without imputing the missing values.
