Model evaluation for logistic regression
 Zach Gassoumis posted on Friday, February 07, 2014 - 1:20 pm
I am running logistic regression in Mplus using FIML. I would like to be able to generate some of the traditional model evaluation statistics that journal reviewers expect for logistic regression, to decrease the odds of my paper being rejected due to a reviewer's lack of familiarity with modelling in Mplus. To this end, I have two questions:

1) The R-square statistic in the output (presumably for Y*, per equation 15 of the first technical appendix) is helpful, but I would also like to provide the more commonly reported McFadden R-square. McFadden's pseudo R-square is a ratio of the estimated and null models' loglikelihoods, but the loglikelihood required is different from the one provided in Mplus' Model Fit section. Is there any way to obtain the loglikelihood statistic required for McFadden's R-square in the Mplus output?

2) I would also like to calculate the Hosmer & Lemeshow goodness-of-fit statistic. For a model without missing data, I could calculate it from the parameter estimates. But this is a poor approach for a model with missingness on variables that strongly predict the outcome variable. Is there any way to generate predicted probabilities (or predicted Y* values for all observations in a dataset) using the Mplus model's FIML-based estimates? Or would this be a violation of the assumptions required in an FIML modelling context?
 Bengt O. Muthen posted on Friday, February 07, 2014 - 2:22 pm
1) You can get the McFadden R-square using 2 runs, one with all the slopes fixed at zero.

2) It sounds like you have missingness on covariates - I don't know the best way to get the predicted probabilities for that case.
 Zach Gassoumis posted on Friday, February 07, 2014 - 3:14 pm
Thank you for the guidance. When I do the two runs to get McFadden's R-square, what values should I plug into the R-square equation? I tried using the Loglikelihood from the Model Fit section, but the resultant value was far from the true McFadden R-square I calculated based on running the same model in SAS: 0.014 from Mplus vs. 0.149 from SAS (all variables had complete data). Note: this was a simulation dataset, and the SAS value is in line with expectations.
 Bengt O. Muthen posted on Saturday, February 08, 2014 - 8:28 am
Show me the formula you use, the output section from Mplus where you pick up the LL values, and how you plug in the values.
 Zach Gassoumis posted on Saturday, February 08, 2014 - 9:10 am
The formula I'm using is:
R2v = 1 - LLM / LL0
(LLM = LL of the estimation model; LL0 = LL of the null model)

I've pulled the LL values from the Model Fit section of the output. For the estimation model:

MODEL FIT INFORMATION 

Number of Free Parameters 19

Loglikelihood

H0 Value -6715.375


And for the null model:

MODEL FIT INFORMATION 

Number of Free Parameters 15

Loglikelihood

H0 Value -6808.280


This results in:
R2v = 1 - (-6715.375) / (-6808.280) = 0.0136


By contrast, the SAS output of the same estimation model includes:

Model Fit Statistics

Criterion    Intercept Only    Intercept and Covariates
AIC          1251.175          1073.364
SC           1256.083          1097.903
-2 Log L     1249.175          1063.364


R2v = 1 - (1063.364/-2) / (1249.175/-2) = 0.1487


All parameter and variance estimates are equivalent across the two programs.
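
The two calculations in this post can be reproduced with a short Python sketch. The function name `mcfadden_r2` is illustrative and not part of Mplus or SAS; the inputs are the loglikelihood values quoted in this post.

```python
def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo R-square: 1 - LL_model / LL_null."""
    if ll_null == 0:
        raise ValueError("null-model loglikelihood must be nonzero")
    return 1.0 - ll_model / ll_null


# Mplus "H0 Value" loglikelihoods from the two runs quoted in this post.
r2_mplus = mcfadden_r2(-6715.375, -6808.280)

# SAS reports -2 Log L, so divide by -2 to recover each loglikelihood.
r2_sas = mcfadden_r2(1063.364 / -2, 1249.175 / -2)

print(round(r2_mplus, 3))  # 0.014
print(round(r2_sas, 3))    # 0.149
```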
 Bengt O. Muthen posted on Monday, February 10, 2014 - 10:21 am
The loglikelihood values are very different between what you report for SAS and Mplus, so something is off here. For instance, the LL for the Mplus H0 model is -6715.375 whereas the SAS number is -531.682, which is more than 10 times smaller in magnitude.

To sort this out, send your Mplus outputs and a pdf of the SAS output to Support.
 Bengt O. Muthen posted on Monday, February 10, 2014 - 11:15 am
The reason the LLs are so different is that, by mentioning the covariates' variances, your Mplus runs bring the covariates into the model in the sense of estimating their parameters. That implies that you have not one but several DVs, and therefore the LLs of the two programs are not on the same metric. Since you are doing regression, you don't want to include the covariates in the model. Don't mention the variances of the covariates and you will get agreement.
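
One way to check this against the numbers already posted, sketched under the assumption that the Mplus loglikelihood factors as log f(y, x) = log f(y | x) + log f(x): subtracting the SAS conditional loglikelihood from the Mplus value for each run should leave the same covariate term, because that term does not depend on the slopes. The variable names are illustrative; the numbers are those reported earlier in the thread.

```python
# Joint loglikelihoods from Mplus (covariates brought into the model).
ll_joint_model = -6715.375
ll_joint_null = -6808.280

# Conditional loglikelihoods from SAS (-2 Log L divided by -2).
ll_cond_model = 1063.364 / -2
ll_cond_null = 1249.175 / -2

# Implied marginal loglikelihood of the covariates from each run.
ll_x_from_model = ll_joint_model - ll_cond_model
ll_x_from_null = ll_joint_null - ll_cond_null

print(round(ll_x_from_model, 2))  # -6183.69
print(round(ll_x_from_null, 2))   # -6183.69
```

Both differences come out near -6183.69, which is consistent with a covariate term that is common to the two Mplus runs.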
 Zach Gassoumis posted on Monday, February 10, 2014 - 11:31 am
Thank you for the clarification. The example I sent you had complete data, for simplicity - I'm sorry that I neglected to mention this. Once I apply the model to a dataset with missing data, I will need to include the covariates in the model to invoke FIML. Based on your response, I take it that the LLs cannot be used to calculate McFadden's R-square for the logistic DV when the model utilizes FIML - is that correct?
 Bengt O. Muthen posted on Monday, February 10, 2014 - 3:37 pm
I don't know - that is a research question. Perhaps it is possible to do a run with only the covariates - just-identified modeling handling the missingness - and then subtract that LL value from each of the two LL's to eliminate the marginal covariate LL and thereby still consider "y | x" as in McFadden's approach.
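
A minimal sketch of the arithmetic this suggestion implies, assuming the covariates-only run yields a loglikelihood `ll_x` on the same metric as the other two runs. The function name and the value of `ll_x` are hypothetical: no covariates-only run is reported in this thread, so `ll_x` below is the value implied by the complete-data Mplus and SAS results. Whether the subtraction is valid with missing data remains the open research question.

```python
def conditional_mcfadden_r2(ll_joint_model: float,
                            ll_joint_null: float,
                            ll_x: float) -> float:
    """McFadden R-square after removing the marginal covariate LL."""
    ll_cond_model = ll_joint_model - ll_x  # approximates log f(y | x), full model
    ll_cond_null = ll_joint_null - ll_x    # approximates log f(y | x), slopes at zero
    return 1.0 - ll_cond_model / ll_cond_null


# Hypothetical covariates-only loglikelihood (implied, not from an actual run).
ll_x = -6183.693

r2 = conditional_mcfadden_r2(-6715.375, -6808.280, ll_x)
print(round(r2, 3))  # 0.149
```

With complete data this recovers the SAS-based value, as expected; it says nothing yet about how the adjustment behaves under FIML with missing covariates.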