Ryan Marek posted on Tuesday, December 11, 2012 - 9:15 am
I'm currently a graduate student who is new to this area of statistics in general. I apologize in advance if this has been addressed earlier.
I'm currently modeling some CFAs to produce latent factors at various time points. The indicators are categorical (1 = Presence; 0 = Absence). I have around 800 cases for the first time point, 600 cases for the second time point, and 300 cases for the third time point. I used the WLSMV estimator and found excellent fitting models as well as model invariance. However, I have been told that I should be using FIML to handle my missing data. I'm curious as to what I should do.
Is there a way to use FIML with WLSMV? If not, what would you suggest I do, or do you feel the analytic plan is fine as is?
Also, any recommended readings would be most helpful!
You can change the estimator to ML and you will be using FIML. With maximum likelihood and categorical dependent variables, numerical integration is required. Each factor requires one dimension of integration. A model with more than four factors can be computationally demanding.
Ryan Marek posted on Tuesday, December 11, 2012 - 2:31 pm
Thanks for the advice! I have 3 factors for each time point. When doing one dimension of integration, do I just use this syntax or do I need to specify something else?
You don't need to specify anything to obtain numerical integration in most cases. How many time points do you have?
Ryan Marek posted on Wednesday, December 12, 2012 - 7:17 am
We're taking two approaches. I am looking to model 3 latent constructs (pre-surgery, 1 month post-surgery, and 3 months post-surgery) for a longitudinal study.
For now, I believe my adviser would like to try to take a more applied approach. We have a measure we used at the first time point (pre-surgery) and we'd like to demonstrate how latent construct modeling can help clean up our post-surgical measures to show how our pre-surgical measure can adequately predict these latent constructs 1 and 3 months from surgery. In this case, we would have two time points, but we'd not really be modeling change across time.
Ryan Marek posted on Wednesday, December 12, 2012 - 7:21 am
Let me be more clear in my above message. We have 3 latent constructs for three time points in our first approach to model change across time. In our second approach, we have 3 latent constructs for two time points and want our measure to predict these latent constructs.
Your models have too many dimensions of integration to be practical using maximum likelihood. Weighted least squares does not handle your missing data properly. I would suggest using Bayes with the default of non-informative priors. This handles missing data in a full-information way like maximum likelihood.
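To make the dimension count concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a full product quadrature grid with 15 points per dimension; that figure and the helper name grid_points are illustrative assumptions, and actual software may use adaptive or Monte Carlo integration instead. With 3 factors at each of 3 time points there are 9 dimensions, and with 3 factors at each of 2 time points there are 6.

```python
def grid_points(dimensions: int, points_per_dimension: int = 15) -> int:
    """Count the nodes in a full product quadrature grid.

    Assumption: every latent factor adds one dimension, and each dimension
    uses the same number of points (15 here is an illustrative choice).
    """
    if dimensions < 0:
        raise ValueError("dimensions must be non-negative")
    if points_per_dimension < 1:
        raise ValueError("points_per_dimension must be at least 1")
    return points_per_dimension ** dimensions


if __name__ == "__main__":
    # Factor counts taken from the models described in this thread.
    scenarios = [
        ("one time point (3 factors)", 3),
        ("rule-of-thumb limit (4 factors)", 4),
        ("two time points (6 factors)", 6),
        ("three time points (9 factors)", 9),
    ]
    for label, dims in scenarios:
        print(f"{label}: {grid_points(dims):,} grid points")
```

The likelihood has to be evaluated at every grid point for every case on every iteration, so the cost grows exponentially with the number of factors. That is why Bayes is the practical route here.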
J Owens posted on Wednesday, January 02, 2013 - 9:52 am
Dear Dr. Muthen:
I am running a path analytic model with a categorical endogenous mediating variable. I am estimating nested models and need the Chi-square goodness-of-fit results to test for statistical significance using the sequential/forward constraint imposition method. As a result, I am using the WLS estimator with the theta parameterization. However, I also have missing data and would like to use FIML estimation to handle it. Is this possible? If so, what command would I use to tell Mplus to use FIML for the missing data?
FIML refers to maximum likelihood estimation, not weighted least squares. The default in Mplus is to use all available information for all estimators. If you have a lot of missing data, I suggest using maximum likelihood estimation or multiple imputation. However, you will not obtain chi-square values for these methods that can be used for testing nested models.
I am reading your response to this question and am experiencing some confusion about it. I, too, am running a path analysis with a categorical variable. My categorical variable is a dichotomous covariate in a path model with 11 other continuous variables. As I understand it, FIML is preferred for its ability to handle missing data and non-normality (I have a fair amount of missing data, skewness, and kurtosis), and yet, it is not advised for a model that has a categorical variable. It seems that WLS or WLSMV is preferred for a model with a categorical variable. Is this correct? Specifically, is it correct that FIML is not appropriate if one of my variables is dichotomous?
No, this is not correct. Both WLSMV and ML can be used with categorical outcomes. ML is preferred if there is a lot of missing data and not too many latent variables with categorical indicators. Please note that the scale of an observed exogenous variable is not an issue. All observed exogenous covariates in regression are treated as continuous whether they are binary or continuous.
Thank you so much for clearing this up! As I'm sure you know, wading around in the statistics world can be pretty overwhelming sometimes. I really appreciate your help.
Joy Thompson posted on Saturday, February 27, 2016 - 11:31 am
My model has endogenous latent variables with categorical indicators and an endogenous observed dichotomous variable. My understanding is that I cannot use ML estimation because my data are categorical, but that I should instead use either continuous/categorical variable methodology (CVM) or weighted least squares (WLS). I believe that WLS is offered in Mplus whereas CVM is not; is that correct? Also, any insights about missing data techniques that can be used with WLS or CVM would be greatly appreciated. My understanding is that FIML can be used to handle missing data that is continuous and normal, but should not be used with categorical data. What missing data technique is recommended with the WLS or CVM estimation techniques?
It is a fallacy that ML cannot be used with categorical variables. ML and the CATEGORICAL option can be used in Mplus to obtain logistic or probit regression. Missing data handling is the same as for continuous variables. Mplus also has weighted least squares estimation via WLS, WLSM, and WLSMV, which, when used with the CATEGORICAL option, provides probit regression.
I don't know what you mean by "use the assumptions under ML". Logistic regression is typically done by ML and that is how it is done in Mplus. There are really no special ML assumptions for logistic regression. The logistic regression model of course has assumptions. With missing data, ML amounts to "FIML" in this case as well. FIML is just ML under the MAR assumption, and that is applicable not only to continuous variables but to categorical variables as well.
Thanks very much! I meant whether the ML assumptions, particularly multivariate normality which only continuous variables can satisfy, would still be applicable when using ML for logistic regression. I should have been more specific. Thank you for clarifying. It sounds like ML estimation is quite robust.