Mplus Discussion >> Model evaluation for logistic regression

Model evaluation for logistic regression

Zach Gassoumis posted on Friday, February 07, 2014 - 1:20 pm

I am running logistic regression in Mplus using FIML. I would like to be able to generate some of the traditional model evaluation statistics that journal reviewers expect for logistic regression, to decrease the odds of my paper being rejected due to a reviewer's lack of familiarity with modelling in Mplus. To this end, I have two questions:

1) The R-square statistic in the output (presumably for Y*, per equation 15 of the first technical appendix) is helpful, but I would also like to provide the more commonly reported McFadden R-square. McFadden's pseudo R-square is a ratio of the estimated and null models' loglikelihoods, but the loglikelihood required is different from the one provided in Mplus' Model Fit section. Is there any way to obtain the loglikelihood statistic required for McFadden's R-square in the Mplus output?

2) I would also like to calculate the Hosmer & Lemeshow goodness-of-fit statistic. For a model without missing data, I could calculate it from the parameter estimates. But this is a poor approach for a model with missingness on variables that strongly predict the outcome variable. Is there any way to generate predicted probabilities (or predicted Y* values for all observations in a dataset) using the Mplus model's FIML-based estimates? Or would this be a violation of the assumptions required in an FIML modelling context?
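For the complete-data case mentioned in question 2, the calculation from parameter estimates can be sketched in Python as below. This is a minimal illustration, not Mplus functionality: the function names and the equal-size grouping are assumptions, and it does not address missing covariates. Mplus reports a threshold rather than an intercept for a binary outcome; the sketch assumes the intercept is the negative of that threshold.

```python
import math


def predicted_probabilities(intercept, slopes, rows):
    """Logistic predicted probabilities for complete covariate rows.

    Assumption: `intercept` is the negative of the Mplus threshold.
    """
    probs = []
    for row in rows:
        eta = intercept + sum(b * x for b, x in zip(slopes, row))
        probs.append(1.0 / (1.0 + math.exp(-eta)))
    return probs


def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees of freedom) for the Hosmer-Lemeshow test.

    Observations are sorted by predicted probability and split into
    roughly equal-size groups; the statistic is referred to a chi-square
    distribution with groups - 2 degrees of freedom.
    """
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(pi for pi, _ in chunk)   # expected number of events
        observed = sum(yi for _, yi in chunk)   # observed number of events
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return stat, groups - 2
```

With real data, `y` would be the observed 0/1 outcomes and `p` the output of `predicted_probabilities` using the estimates from the Mplus output.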

Bengt O. Muthen posted on Friday, February 07, 2014 - 2:22 pm

1) You can get the McFadden R-square using 2 runs, one with all the slopes fixed at zero.

2) It sounds like you have missingness on covariates - I don't know the best way to get the predicted probabilities for that case.

Zach Gassoumis posted on Friday, February 07, 2014 - 3:14 pm

Thank you for the guidance. When I do the two runs to get McFadden's R-square, what values should I plug into the R-square equation? I tried using the Loglikelihood from the Model Fit section, but the resulting value was far from the true McFadden R-square I calculated by running the same model in SAS: 0.014 from Mplus vs. 0.149 from SAS (all variables had complete data). Note: this was a simulation dataset, and the SAS value is in line with expectations.

Bengt O. Muthen posted on Saturday, February 08, 2014 - 8:28 am

Show me the formula you use, the output section from Mplus where you pick up the LL values, and how you plug in the values.

Zach Gassoumis posted on Saturday, February 08, 2014 - 9:10 am

The formula I'm using is:
R2v = 1 - LLM / LL0
(LLM = LL of the estimation model; LL0 = LL of the null model)

I've pulled the LL values from the Model Fit section of the output. For the estimation model:

MODEL FIT INFORMATION 
 
Number of Free Parameters                       19 
 
Loglikelihood 
 
          H0 Value                       -6715.375

And for the null model:

MODEL FIT INFORMATION 
 
Number of Free Parameters                       15 
 
Loglikelihood 
 
          H0 Value                       -6808.280

This results in:
R2v = 1 - (-6715.375) / (-6808.280) = 0.0136

By contrast, the SAS output of the same estimation model includes:

         Model Fit Statistics 
 
                             Intercept 
              Intercept            and 
Criterion          Only     Covariates 
 
AIC            1251.175       1073.364 
SC             1256.083       1097.903 
-2 Log L       1249.175       1063.364

R2v = 1 - (1063.364/-2) / (1249.175/-2) = 0.1487
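The two calculations above can be reproduced with a few lines of Python. This is only an arithmetic sketch using the values quoted in this post; the function name `mcfadden_r2` is an illustrative choice, not part of Mplus or SAS.

```python
def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-square: 1 - LL(model) / LL(null)."""
    return 1.0 - ll_model / ll_null


# Mplus H0 loglikelihoods from the Model Fit sections quoted above.
r2_mplus = mcfadden_r2(-6715.375, -6808.280)

# SAS reports -2 Log L, so divide by -2 to recover each loglikelihood.
r2_sas = mcfadden_r2(1063.364 / -2, 1249.175 / -2)

print(f"Mplus: {r2_mplus:.4f}")  # Mplus: 0.0136
print(f"SAS:   {r2_sas:.4f}")    # SAS:   0.1487
```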

All parameter and variance estimates are equivalent across the two programs.

Bengt O. Muthen posted on Monday, February 10, 2014 - 10:21 am

The loglikelihood values are very different between what you report for SAS and Mplus, so something is off here. For instance, the LL for the Mplus H0 model is -6715.375 whereas the SAS number is -531.682, which is more than 10 times smaller in magnitude.

To sort this out, send your Mplus outputs and a pdf of the SAS output to Support.

Bengt O. Muthen posted on Monday, February 10, 2014 - 11:15 am

The reason the LLs are so different is that by mentioning their variances your Mplus runs bring the covariates into the model in the sense of estimating their parameters. That implies that you have not one but several DVs and therefore the LLs of the two programs are not on the same metric. Since you are doing regression you don't want to include the covariates in the model. Don't mention the variances of the covariates and you will get agreement.
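One way to check this explanation against the numbers posted earlier in the thread: if the Mplus loglikelihood is the joint LL of the outcome and the covariates, then with complete data it should equal the SAS conditional LL plus a covariate term that is the same in both runs. The sketch below assumes that additive decomposition, log L(y, x) = log L(y | x) + log L(x); the variable names are illustrative.

```python
# Joint loglikelihoods reported by Mplus (covariate variances mentioned).
mplus_model, mplus_null = -6715.375, -6808.280

# Conditional loglikelihoods from SAS, recovered from -2 Log L.
sas_model, sas_null = 1063.364 / -2, 1249.175 / -2

# Implied covariate contribution, log L(x), in each run.
offset_model = mplus_model - sas_model
offset_null = mplus_null - sas_null
print(offset_model, offset_null)

# If the offsets match, the model-vs-null LL difference is the same
# in both programs even though the ratio LLM / LL0 is not.
print(mplus_model - mplus_null, sas_model - sas_null)
```

Both implied offsets come out near -6183.69, and both programs give an LL difference of about 92.9, which is consistent with the covariates adding the same constant to each Mplus loglikelihood.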

Zach Gassoumis posted on Monday, February 10, 2014 - 11:31 am

Thank you for the clarification. The example I sent you had complete data, for simplicity - I'm sorry that I neglected to mention this. Once I apply the model to a dataset with missing data, I will need to include the covariates in the model to invoke FIML. Based on your response, I take it that the LLs cannot be used to calculate McFadden's R-square for the logistic DV when the model utilizes FIML - is that correct?

Bengt O. Muthen posted on Monday, February 10, 2014 - 3:37 pm

I don't know - that is a research question. Perhaps it is possible to do a run with only the covariates - just-identified modeling handling the missingness - and then subtract that LL value from each of the two LL's to eliminate the marginal covariate LL and thereby still consider "y | x" as in McFadden's approach.
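The subtraction idea can be sketched as follows. The function name and arguments are hypothetical, and with missing data `ll_covariates` would have to come from an actual covariate-only Mplus run; whether the result is a valid McFadden R-square under FIML remains, as said above, a research question. For illustration only, the example uses the complete-data numbers from earlier in the thread, with the covariate LL inferred by subtracting the SAS loglikelihood from the Mplus one rather than taken from a real run.

```python
def conditional_mcfadden_r2(ll_model_joint, ll_null_joint, ll_covariates):
    """McFadden R-square after removing the marginal covariate LL."""
    ll_model = ll_model_joint - ll_covariates
    ll_null = ll_null_joint - ll_covariates
    return 1.0 - ll_model / ll_null


# Assumption: inferred as Mplus LL minus SAS LL for the estimated model,
# -6715.375 - (1063.364 / -2); this is NOT output from a covariate-only run.
ll_x = -6715.375 - (1063.364 / -2)

r2 = conditional_mcfadden_r2(-6715.375, -6808.280, ll_x)
print(f"{r2:.3f}")  # 0.149
```

In this complete-data example the adjusted value matches the SAS-based R-square, which is what the subtraction is meant to achieve.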

Karen S. Mitchell posted on Friday, February 07, 2020 - 7:03 am

Dr. Muthen, would you still recommend subtracting the covariate-only model LL from the estimated model LL and null model LL to calculate McFadden's R-square? I have a similar situation where I'm including the covariates in order to use maximum likelihood but want to calculate a pseudo R-square for models with nominal DVs.

Bengt O. Muthen posted on Friday, February 07, 2020 - 2:52 pm

I would do 2 runs, one with all covariate coefficients free and one with all of them fixed at zero.

Karen S. Mitchell posted on Monday, February 10, 2020 - 9:08 am

To clarify, if this is my model (DV has 3 categories):

DV#1 on IV cov1 cov2 ;
DV#2 on IV cov1 cov2 ;

IV;
cov1 ;
cov2 ;

I would compare the loglikelihood from the model above to the LL from the following model:

DV#1 on IV@0 cov1@0 cov2@0 ;

Bengt O. Muthen posted on Monday, February 10, 2020 - 5:06 pm

Right, but you would mention the X parameters in both models and I would do it this way:

IV with cov1 cov2;
cov1 with cov2 ;

Chelsea Garneau posted on Wednesday, June 03, 2020 - 8:24 am

If I am modeling indirect effects, do I need to fix those to zero also in my constrained model?

Bengt O. Muthen posted on Wednesday, June 03, 2020 - 2:41 pm

What's the context?

Chelsea Garneau posted on Friday, June 05, 2020 - 10:37 am

Here is the syntax for the constrained model.

Model:
uncertMutual by unq1r unq2r unq3r unq4r;
uncertDef by unq5r unq6r unq7r unq8r;
uncertFuture by unq9r unq10r unq11r unq12r;
uncert by uncertMutual uncertDef uncertFuture;
divorceatt by divaq1r divaq2r divaq3;
loveenough by lovenq1 lovenq2 lovenq3 lovenq4;

uncert divorceatt loveenough on MomNPart@0;
uncert divorceatt loveenough on DadNPart@0;
uncert divorceatt loveenough on Divorced@0;
uncert divorceatt loveenough on ParentsCycle@0;
Cycled on uncert@0 divorceatt@0 loveenough@0 MomNPart@0 DadNPart@0 Divorced@0 ParentsCycle@0;

Model indirect:
Cycled ind uncert MomNPart;
Cycled ind divorceatt MomNPart;
Cycled ind loveenough MomNPart;
Cycled ind uncert DadNPart;
Cycled ind divorceatt DadNPart;
Cycled ind loveenough DadNPart;
Cycled ind uncert Divorced;
Cycled ind divorceatt Divorced;
Cycled ind loveenough Divorced;
Cycled ind uncert ParentsCycle;
Cycled ind divorceatt ParentsCycle;
Cycled ind loveenough ParentsCycle;

Bengt O. Muthen posted on Monday, June 08, 2020 - 5:13 pm

I don't understand the question. I recommend watching path analysis in our Short Course Topic 1 and mediation analysis in our Short Course Topic 9.