Logistic regression models
Mplus Discussion > Categorical Data Modeling >
 Daniel posted on Tuesday, December 16, 2003 - 6:39 am
Hi, is there a way in Mplus to determine whether a model is linear in the logit for continuous predictor variables? I am following the Hosmer and Lemeshow criteria for variable selection, section 4.2, in their 2000 second-edition text.
 bmuthen posted on Tuesday, December 16, 2003 - 10:03 am
You can always plot the sample logits against x to see if they are approximately linear in x. See Muthen (1993) in the Bollen-Long book - ref on the Mplus web site.
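A minimal sketch of that check done outside Mplus, in Python. It groups cases by sorted x and computes an empirical logit per group; the function name, the number of groups, the 0.5 correction for empty cells, and the data are all illustrative assumptions, not part of Mplus or of the referenced chapter.

```python
import math

def empirical_logits(x, y, n_bins=5):
    """Group cases by sorted x and return (mean x, empirical logit) per group."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    points = []
    for b in range(n_bins):
        # The last group absorbs any leftover cases.
        chunk = pairs[b * size:] if b == n_bins - 1 else pairs[b * size:(b + 1) * size]
        n = len(chunk)
        events = sum(yi for _, yi in chunk)
        mean_x = sum(xi for xi, _ in chunk) / n
        # Adding 0.5 keeps the logit finite when a group has 0 or n events.
        logit = math.log((events + 0.5) / (n - events + 0.5))
        points.append((mean_x, logit))
    return points

# Made-up data: the event becomes more common as x increases.
x = list(range(20))
y = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1]
points = empirical_logits(x, y, n_bins=4)
for mean_x, logit in points:
    print(f"mean x = {mean_x:5.2f}   empirical logit = {logit:6.3f}")
```

If the printed logits rise or fall roughly in a straight line against the group means of x, a linear term in x is plausible; a clear bend suggests trying a transformation or a polynomial term.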
 daniel posted on Wednesday, December 17, 2003 - 4:39 am
Thank you.
 Don posted on Saturday, August 27, 2005 - 10:27 am
Hi, I have a big logistic model:
i. 1 binary response (dependent, y)
ii. 20 explanatory variables (independent; c1-c15 dichotomous, f16-f20 continuous)
iii. 5 interactions, including "dichotomous*continuous" and "dichotomous*dichotomous"
iv. 7 coefficients had to be fixed in the logistic regression
v. missing values are everywhere, coded with the "." symbol in the SAS file

When I ran it yesterday, it appeared that:
1. I couldn't use "TYPE=LOGISTIC"; the error message suggested using "TYPE=RANDOM". Can I use "TYPE = general missing h1"?
2. I couldn't use "c1xf16 | c1 XWITH f16" and "c1xc2 | c1 XWITH c2"; the error message suggested using the "DEFINE command". I tried "DEFINE: c1xf16 = c1 * f16
c1xc2 = c1 * c2;", but the error message said that "c1xf16" and "c1xc2" are unknown, so it didn't work. Could you please tell me how to create the interaction terms in the M+ command? Or is there a simple way in M+ to specify an interaction, like "c1:f16" in S+ or "c1*c2" in SAS?
3. Can I use "OUTPUT: standardized sampstat"? If I can't, does it really matter for interpreting the model estimates?
4. Does M+ offer the results of -2*loglikelihood under H0 and Ha? We need to use the likelihood ratio test for selecting significant terms, but I didn't find it in the output.
5. Does M+ have a built-in or existing function for stepwise model selection based on a specified BIC or AIC? I know S+ has this kind of function, called "stepwise".

Thanks
 Linda K. Muthen posted on Saturday, August 27, 2005 - 4:26 pm
1. TYPE=LOGISTIC; is just for univariate logistic regression. If you have categorical outcomes and are using the maximum likelihood estimator, you are estimating logistic regressions. Yes, you can use TYPE=GENERAL MISSING H1; with ESTIMATOR = ML or MLR;

2. If you define new variables, you need to put them at the end of the USEVARIABLES statement. If you did that and still get the message, send your input, data, output, and license number to support@statmodel.com.

3. Yes, you can use this in most cases. You will get sample statistics and standardized parameter estimates.

4. You get a loglikelihood value for your H0 model.

5. No.

Please send further questions of this type along with your license number to support@statmodel.com. We try to reserve Mplus Discussion for questions that do not fall under Mplus support.
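Since only the H0 loglikelihood is printed and there is no stepwise procedure, nested models can be compared by hand from the loglikelihoods of separate runs. The sketch below, in Python, shows the arithmetic for ML estimation; the loglikelihood values, parameter counts, and sample size are made-up placeholders, and the function names are illustrative.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """AIC and BIC computed from a model's loglikelihood."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return aic, bic

def lr_statistic(loglik_restricted, loglik_full):
    """Likelihood ratio statistic for two nested models estimated with ML."""
    return -2.0 * (loglik_restricted - loglik_full)

# Hypothetical values copied by hand from two outputs.
ll_restricted, p_restricted = -5210.4, 20   # model without the term being tested
ll_full, p_full = -5204.1, 21               # model with the term
n_obs = 12000

lr = lr_statistic(ll_restricted, ll_full)
df = p_full - p_restricted
print(f"LR = {lr:.2f} on {df} df (5% critical value for 1 df is 3.84)")
print("restricted AIC, BIC:", information_criteria(ll_restricted, p_restricted, n_obs))
print("full AIC, BIC:      ", information_criteria(ll_full, p_full, n_obs))
```

This plain difference applies to ESTIMATOR = ML; with MLR the difference needs a scaling correction before it is referred to a chi-square distribution.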
 Don posted on Wednesday, August 31, 2005 - 12:53 pm
Hi, Linda

I am working with a data set of fewer than 18000 subjects (18000 rows in the SAS data file), fewer than 12000 of them complete.
In the future, I am told I will deal with 300000 - 600000 subjects (rows in the SAS data file).

Could M+ handle this huge data set?
Could you please tell me the limit of rows or records for M+?

Thanks
 Linda K. Muthen posted on Wednesday, August 31, 2005 - 2:39 pm
There are no limits on the numbers of observations other than the limits imposed by your own computer.
 Marten Beckman posted on Thursday, September 15, 2005 - 5:34 am
Hi,
I have a problem with calculating propensity scores: I want to predict a binary (treatment/non-treatment) outcome to obtain the probabilities of group membership given the persons' covariate differences.
Problem: I have a hierarchical data set with some non-ignorable missing data, so Mplus seemed to be the option. But I encountered two problems which I couldn't solve with the manual alone:

1) Theoretically I think a "TYPE = LOGISTIC MISSING H1 CLUSTER"-analysis would be appropriate but it seems that this combination doesn't work.

2) I need to have the estimated probabilities to which outcome class my cases belong. [In the manual I just found an option for latent class analysis (these CPROBABILITIES)]. I couldn't find a similar option for the logistic regression.

So my question would be: Can Mplus solve these two problems? How can I do that?

Thank you very much,

Marten
 Linda K. Muthen posted on Thursday, September 15, 2005 - 8:18 am
1. I think what you want is TYPE=COMPLEX MISSING; with ESTIMATOR = ML; This will give you logistic regression when you have categorical outcomes. TYPE=LOGISTIC; is only for univariate logistic regression and is limited in which options can be used with it.

2. You would need to calculate probabilities from the logistic regression coefficients. How to do this is described in Chapter 13.
 Marten Beckman posted on Thursday, September 15, 2005 - 10:51 am
Thank you, that was very helpful already. The model seems to work (although with MLR instead of ML, but that should be fine, I guess).
For 2.), Chapter 13 did not address the problem I have. I need the individual values for predicting class membership; I can't find a command to tell Mplus to do this (or to get the odds or something).
To illustrate a bit more what I need: I am referring to the values one gets from the SPSS logistic regression with the option:
"regression -> binary regression -> save -> predicted values -> probabilities",
or in syntax language the
LOGISTIC REGRESSION y_binary
/METHOD = ENTER x_contin
/SAVE = PRED DEV

I hope that I was able to explain myself a bit better...

Thank you again!

Marten
 BMuthen posted on Thursday, September 15, 2005 - 2:42 pm
Mplus does not have an automatic way to compute the values that you want. You would need to compute them using DEFINE in a second run. Let's say that there is one predictor x and therefore one estimated slope b equal to .5 say and one intercept equal to .2 say.

DEFINE: logit = 0.2 + 0.5*x;
prob = 1 / (1 + exp (-logit));

Use the SAVEDATA command to save these values.
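The same two lines can be checked by hand outside Mplus. A minimal Python sketch using the illustrative intercept (.2) and slope (.5) from the post; the function name and the x values are assumptions for demonstration.

```python
import math

def predicted_probability(x, intercept=0.2, slope=0.5):
    """Probability implied by a one-predictor logistic regression."""
    logit = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-logit))

# Evaluate the curve at a few illustrative x values.
for x in (-2.0, 0.0, 2.0):
    print(f"x = {x:4.1f}   prob = {predicted_probability(x):.3f}")
```

Applying the function to each case's observed x gives the per-case predicted probabilities that SPSS saves with /SAVE = PRED.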
 David Bard posted on Wednesday, December 06, 2006 - 11:56 am
How does M+ calculate the StdYX estimates in logistic, proportional odds, and multinomial logit models?
 Linda K. Muthen posted on Thursday, December 07, 2006 - 9:02 am
For logistic and proportional odds (ordered categorical I assume), a latent response variable variance is used. Its residual has variance pi squared divided by three. For multinomial logistic regression, standardized coefficients are not provided.
 David Bard posted on Thursday, December 07, 2006 - 11:58 am
Hmmm, I've tried using that value but cannot replicate the M+ values. For example, my point estimate for a single latent variable predictor's logistic coefficient is .277. The variance of this latent predictor is .78. The reported Std coefficient is .245 which does = .277*sqrt(.78). But the reported StdYX is .128 which by my hand calculation does not equal .277*sqrt(.78)/(pi/sqrt(3))=.135. Am I setting up the calculation correctly?
 Bengt O. Muthen posted on Thursday, December 07, 2006 - 4:08 pm
For the DV variance you want to use the sum of the explained part and the residual variance.
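A sketch of that calculation in Python, using the coefficient (.277) and predictor variance (.78) quoted in the question and assuming this latent variable is the only predictor of the outcome. The function name is illustrative.

```python
import math

def stdyx_logit(b, var_x):
    """StdYX for a single predictor of a latent response with a logistic residual."""
    residual_var = math.pi ** 2 / 3.0      # variance of the standard logistic residual
    explained_var = b ** 2 * var_x         # variance explained by the one predictor
    sd_y = math.sqrt(explained_var + residual_var)
    return b * math.sqrt(var_x) / sd_y

value = stdyx_logit(0.277, 0.78)
print(round(value, 3))
```

Under the single-predictor assumption this gives about .134, slightly below the .135 obtained with the residual variance alone. The thread does not include enough of the output to confirm the reported .128; any other predictors of the outcome would add to the explained variance and lower the value further.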
 Manuel posted on Friday, June 27, 2008 - 12:14 pm
I am using a bivariate panel model across four time points with one dichotomous (R) and one continuous (S) variable. I assume an autoregressive process for R and use S as a time-varying covariate.

VARIABLE:
NAMES ARE R03 R04 R05 R06 S03 S04 S05;
CATEGORICAL ARE R03 R04 R05 R06;
ANALYSIS: ESTIMATOR = ML;
MODEL: R06 ON R05 S05;
R05 ON R04 S04;
R04 ON R03 S03;

!S05 WITH S04 S03;
!S04 WITH S03;

Two quick questions:

1. The model is exactly what I want, but unfortunately the only fit statistics I get are LL(H0), AIC, BIC and adj. BIC, which are more or less helpful in comparing slightly different specifications, but not enough evidence for my reviewers;-) What other options (fit indices, etc.) do I have to justify its use?

2. Regarding the last two omitted ("!") statements: It is my understanding that by default Mplus assumes all exogenous variables to be correlated (as in standard regression). That is what I also assume for all S variables, although I am not particularly interested in this correlation. Parameter estimates are the same with or without the WITH statement, but fit indices differ, so should I include or omit the WITH statement when assessing & reporting model fit?

Thank you very much!
 Linda K. Muthen posted on Friday, June 27, 2008 - 2:36 pm
1. With ML, this is all you get. You can use WLSMV and probit regression if you want to see more traditional fit statistics.

2. The means, variances, and covariances of the exogenous observed variables are not parameters in the model unless you bring them into the model as you do with your WITH statements. When you bring them in, you make distributional assumptions about them.
 yang posted on Friday, October 31, 2008 - 7:46 am
Dear Linda,

In a path analysis with binary outcomes,
y1 on x z
y2 on x z
y3 on y1 y2 z

1. is it possible to estimate indirect effects and be able to get odds ratio interpretation? Using ML gives an error message.
2. is the indirect effect of x on y3 through y1 equivalent to the product of the coefficients for y1 on x and y3 on y1?
 Linda K. Muthen posted on Friday, October 31, 2008 - 9:24 am
1. I would need to see the output and your license number at support@statmodel.com to comment on this.

2. Yes.
 fritz kern posted on Sunday, November 16, 2008 - 1:10 pm
Hello,
I've got 4 quick questions following your post from September 15, 2005.

1. Am I right that multiple predictors are simply added up?
For example logit = 0.2 + 0.5*x1 + 0.7*x2 + 0.3*x3;

2. Do I use the unstandardized or standardized coefficients for intercept and slopes? I guess unstandardized, but I'd like to make sure.

3. How can I save the prob variable from the DEFINE command? I tried various SAVEDATA subcommands I found in the manual, but obviously not the correct one.

4. Is it still true for Mplus 5.1 that there is no automatic way to compute these probabilities? And if there is an automatic way, what is it?

Thanks in advance!
 fritz posted on Sunday, November 16, 2008 - 1:35 pm
Sorry, I forgot another question: Does it matter if X1, X2 and so on are continuous or categorical? Let's say I have a model like

y_binary = intercept + x1_continuous + x2_continuous + x3_categorical

(For example x1 = ses, x2 = personality trait, x3 = sex)

Do I have to take this into account computing the logit?

I just answered question 3 from the previous post myself. I forgot the USEVARIABLES command, sorry!

Thanks again!
 Linda K. Muthen posted on Monday, November 17, 2008 - 8:44 am
1. Yes.
2. Unstandardized.
4. If probabilities are not part of the results, there is no option to compute them.

The formula is the same for continuous or binary covariates. You simply select an appropriate value of the covariate.
 fritz posted on Monday, November 17, 2008 - 11:02 am
Thanks a lot. I've got another follow up question:

I computed a logistic regression with SPSS and Mplus. The results are quite comparable. The only striking difference is the algebraic sign of the constant/intercept. In SPSS it's negative, in Mplus positive; in both programs it's about +/-9. I don't understand this.

In addition, computing logit and prob, I get a zero variance for prob. I tried the same calculation for the logit with a negative sign for the Mplus intercept. This led to probs comparable to those I got in SPSS. Do I have to multiply the intercept by -1?

Thanks again!
 Linda K. Muthen posted on Monday, November 17, 2008 - 12:14 pm
Mplus gives a threshold. SPSS gives an intercept. The sign is the only difference between the two. So if the formula requires the intercept, you need to change the sign.
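A small Python sketch of the sign change with several predictors. The threshold of 9 echoes the magnitude mentioned in the question; the slopes and covariate values are hypothetical, and the function name is illustrative.

```python
import math

def probability_from_threshold(threshold, slopes, values):
    """Convert an Mplus threshold to an intercept, then compute the probability."""
    intercept = -threshold                      # the sign is the only difference
    logit = intercept + sum(b * v for b, v in zip(slopes, values))
    return 1.0 / (1.0 + math.exp(-logit))

slopes = (0.5, 0.7, 0.3)      # hypothetical unstandardized slopes
values = (10.0, 4.0, 1.0)     # hypothetical covariate values for one case

p_correct = probability_from_threshold(9.0, slopes, values)
# Deliberately wrong: using the threshold itself as the intercept.
p_wrong_sign = 1.0 / (1.0 + math.exp(-(9.0 + sum(b * v for b, v in zip(slopes, values)))))
print(round(p_correct, 3), round(p_wrong_sign, 6))
```

With the sign left unflipped, the logit in this example is so large that the probability is essentially 1 for every plausible case, which would be consistent with the near-zero variance of prob described in the question.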
 Richard Rivera posted on Wednesday, June 17, 2009 - 3:48 pm
In M+, I conducted a multiple logistic regression analysis on a binary outcome (w/ missing data theory). I obtained a threshold for each independent variable as well as for the dependent variable.

In SPSS, I would like to create a classification table. To compute the log odds for each case, I need the model intercept.

1. Should I just use the threshold associated with the dependent variable (then multiply by -1)?

2. If not, how can I calculate one intercept for the model from the M+ thresholds?

Thanks
 Linda K. Muthen posted on Wednesday, June 17, 2009 - 4:30 pm
Yes.
 Richard Rivera posted on Thursday, June 25, 2009 - 2:43 pm
I conducted a multiple logistic regression with the default procedure (FIML to estimate parameters) for data w/ missingness. Moreover, I used MLR estimation w/ Monte Carlo numerical integration.

I would like to calculate a Satorra-Bentler chi-square difference test between the model (w/ multiple indicators) and a model with just the intercept. However, I need to know how to get the log likelihood of the intercept-only model.

Thanks
 Bengt O. Muthen posted on Thursday, June 25, 2009 - 5:27 pm
Fix all the slopes at zero.
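Once both runs are available, the scaled difference can be computed by hand from each run's loglikelihood, scaling correction factor, and number of free parameters. The Python sketch below assumes the scaled loglikelihood difference formula documented on the Mplus web site; all input numbers are made-up placeholders and the function name is illustrative.

```python
def scaled_lr_difference(l0, c0, p0, l1, c1, p1):
    """Scaled difference test from MLR loglikelihoods.

    l0, c0, p0: loglikelihood, scaling correction factor, and number of
                free parameters of the nested (slopes fixed at zero) model.
    l1, c1, p1: the same quantities for the comparison model.
    """
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)    # difference-test scaling correction
    trd = -2.0 * (l0 - l1) / cd             # scaled chi-square difference
    df = p1 - p0
    return trd, df

# Hypothetical values: intercept-only model versus a model with three slopes.
trd, df = scaled_lr_difference(l0=-210.5, c0=1.10, p0=1,
                               l1=-198.2, c1=1.25, p1=4)
print(f"scaled difference = {trd:.3f} on {df} df")
```

The result is referred to a chi-square distribution with df equal to the difference in the number of free parameters.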
 Rebeca  posted on Friday, January 06, 2012 - 2:27 pm
Hello,

I conducted a logistic regression analysis with a dichotomous moderating variable (i.e. gender). My results showed two significant interactions but I am having trouble figuring out how to interpret these findings in MPLUS.

I tried to probe the interactions similarly to how I would in SPSS (i.e., create a separate variable for boys and one for girls, create new interaction term with variable, and re-run the analyses separately for boys and girls); however, this did not seem to work as the results showed that nothing was significant, which doesn't make sense.

Also, I thought of exporting the data to SPSS so that I could probe and graph the interactions there, but I don't think that the values in the output or any other scores that I could save would allow me to do that.

I've looked through all the forums and wasn't able to find anything that was especially helpful for this particular issue. Do you have any other suggestions? Thank you.
 Bengt O. Muthen posted on Friday, January 06, 2012 - 6:01 pm
It sounds like you create an interaction variable as x1*x2, where x2 is dichotomous and you are interested in knowing what the slopes of x1 are for the different genders in the regression

y = a + b1*x1+b2*x2+b3*x1*x2+e.

If that is the case and x2 is scored 0/1, you simply use your regular regression knowledge to find that for x2=0 the x1 slope is b1 and for x2=1 the x1 slope is b1+b3.
 Rebeca  posted on Tuesday, January 10, 2012 - 9:09 am
Thank you. Just to clarify, I know that a=the constant but do I get this number from the threshold section? If so, because there are two values here do I assume that they are the two values of my dichotomous outcome variable (dropout$1;graduated or dropout$2;dropout) such that I would choose the value in the estimate column that corresponds to the value I am most interested in (i.e., dropout)?

Relatedly, I know that e=the error term but where in the mplus output would I be able to locate this value. Thank you again for all of your help, it is greatly appreciated.
 Bengt O. Muthen posted on Tuesday, January 10, 2012 - 11:06 am
With a dichotomous outcome, the a term is obtained as the negative of the threshold for this outcome. But it sounds like you have more than 2 outcome categories and have either an ordinal or nominal outcome.

You don't use e when you plot interactions - the expected y does not include that residual. Note that the equation I gave is for a continuous dependent variable, which is the logit DV in your case. Translating that to probabilities for the dichotomous outcome is described in our Topic 2 short course - see handout and video on our web site.
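A Python sketch of the two steps described above: the x1 slope for each gender, then the expected probability for a dichotomous outcome. The threshold and all coefficients are hypothetical placeholders, and the function names are illustrative.

```python
import math

# Hypothetical estimates: threshold for the outcome, then b1, b2, b3.
threshold, b1, b2, b3 = 0.8, 0.4, -0.3, 0.5
a = -threshold                     # intercept is the negative of the threshold

def x1_slope(x2):
    """Slope of x1 on the logit scale for a given value of the 0/1 moderator."""
    return b1 + b3 * x2

def expected_probability(x1, x2):
    """Expected probability; the residual e is not part of the expected logit."""
    logit = a + b1 * x1 + b2 * x2 + b3 * x1 * x2
    return 1.0 / (1.0 + math.exp(-logit))

for x2 in (0, 1):
    slope = x1_slope(x2)
    print(f"x2 = {x2}: x1 slope = {slope:.2f}, odds ratio = {math.exp(slope):.2f}")
    # Probabilities at a few illustrative x1 values, ready for plotting.
    print("   ", [round(expected_probability(x1, x2), 3) for x1 in (-1.0, 0.0, 1.0)])
```

Plotting the two sets of probabilities against x1, one line per gender, shows the interaction on the probability scale.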
 Emil Coman posted on Thursday, January 12, 2012 - 2:04 pm
I am also trying to save the probabilities using Bengt's suggestion above, and I have e.g.
DEFINE: logit = 0.267*sex - 0.011*age ;
prob = 1 / (1 + exp (-logit));
Now, what do I ask for in SAVEDATA, and how? I tried
SAVEDATA: FILE IS logistic_1.csv;
MNAMES logit prob subject ;
MSELECT = logit prob ;
but the file saved does not contain the 2 newly defined variables. Any suggestions? Thanks, Emil
 Linda K. Muthen posted on Thursday, January 12, 2012 - 3:13 pm
You would need to put the new variables, logit and prob, at the end of the USEVARIABLES list. Remove the MNAMES and MSELECT options. They are for merging two data sets.
 Rebeca  posted on Tuesday, January 17, 2012 - 8:03 am
The outcome variable only has two values that are labeled but I think that the problem might be that during the analysis, some participants have missing data for the outcome variable and it is being interpreted as being a third value. Consequently, it is assuming that this is an ordinal variable.

However, I was under the assumption that when you run an analysis in MPLUS and run it with syntax to estimate missing data, it would automatically identify the missing value code and not interpret it as an additional value.
 Linda K. Muthen posted on Tuesday, January 17, 2012 - 2:01 pm
If the missing value code is on the MISSING list for that variable, it will not be treated as a legitimate value. It sounds like you are not reading the data correctly. Please send the input, data, output, and your license number to support@statmodel.com.
 terrie andrews posted on Tuesday, April 10, 2012 - 4:53 pm
Hi,

I'm trying to get the probabilities of intent on g1 (categorical). What am I doing incorrectly?

TITLE: A intent to group TPB.
DATA: File is /Users/Terrie/Dropbox/TBQ MPLUS intent to group.dat;
VARIABLE: NAMES ARE a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13
s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 c1 c2 c3 c4 c5
c6 c7 c8 c9 c10 c11 c12 c13 c14 i1 i2 i3 i4 i5 i6 i7 i8
t1 t2 t3 t4 t5 t6 g1;
USEVARIABLES ARE a2 a4 a6 a7
s9 s10 s13 c3 c10 c11
i5 i7 g1 prob logit;
categorical are g1;
ANALYSIS: TYPE IS GENERAL; ESTIMATOR IS ML;
MODEL: attitude by a2 a4 a6 a7;
subject by s9 s10 s13;
control by c3 c10 c11;
intent by i7 i5;
intent on attitude subject control;
g1 on intent;
DEFINE: logit = -1.106 + 0.341*intent;
prob = 1 / (1 + exp (-logit));

OUTPUT: STDYX;
savedata: file is output.csv;
 Bengt O. Muthen posted on Tuesday, April 10, 2012 - 5:20 pm
You can't use a latent variable (intent) in Define - it has to be an observed variable in your data.

Instead, use TECH4 in the run where you got the estimates and get the mean and variance of intent. Then use the formula you have (I assume 1.106 is the threshold) for different points on the intent scale (say mean, 1 SD below the mean, 1 SD above the mean).
 terrie andrews posted on Tuesday, April 10, 2012 - 5:42 pm
The intent scale is continuous, 1 to 5 (e.g., 1.15, 2.74). I just realized g1 on intent is not significant :-( There are 306 in the sample: n=77 in category 1, n=229 in category 2. It appears the model did not adequately predict the binary outcome variable.
ESTIMATED MEANS
              ATTITUDE   SUBJECT   CONTROL    INTENT        G1
              ________  ________  ________  ________  ________
 1               0.000     0.000     0.000     0.000     0.000

COVARIANCE MATRIX
              ATTITUDE   SUBJECT   CONTROL    INTENT        G1
              ________  ________  ________  ________  ________
 ATTITUDE        0.243
 SUBJECT         0.017     0.157
 CONTROL         0.083     0.279     0.935
 INTENT          0.292    -0.004     0.229     0.542
 G1              0.100    -0.001     0.078     0.185     3.353

CORRELATION MATRIX
              ATTITUDE   SUBJECT   CONTROL    INTENT        G1
              ________  ________  ________  ________  ________
 ATTITUDE        1.000
 SUBJECT         0.089     1.000
 CONTROL         0.175     0.728     1.000
 INTENT          0.805    -0.014     0.322     1.000
 G1              0.110    -0.002     0.044     0.137     1.000
 terrie andrews posted on Tuesday, April 10, 2012 - 5:43 pm
oops, meant intent items on intent scale are continuous
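The calculation suggested in the reply can be sketched in Python with the numbers posted in this exchange: the mean (0.000) and variance (0.542) of INTENT from the TECH4 output, and the -1.106 and 0.341 from the DEFINE line. The sketch assumes that -1.106 is already the intercept, that is, the negative of a threshold of 1.106; if -1.106 is the threshold as printed, the intercept would be +1.106 instead. The function name is illustrative.

```python
import math

intercept, slope = -1.106, 0.341          # from the DEFINE line (sign is an assumption)
mean_intent, var_intent = 0.0, 0.542      # from the TECH4 output
sd_intent = math.sqrt(var_intent)

def prob_at(intent):
    """Probability implied by the logistic regression at a given intent value."""
    logit = intercept + slope * intent
    return 1.0 / (1.0 + math.exp(-logit))

points = {
    "1 SD below mean": mean_intent - sd_intent,
    "mean": mean_intent,
    "1 SD above mean": mean_intent + sd_intent,
}
for label, value in points.items():
    print(f"{label:16s} intent = {value:6.3f}   prob = {prob_at(value):.3f}")
```

Because intent is latent, these are probabilities at chosen points on its estimated scale rather than per-case predictions.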
 Avalon de Bruijn posted on Monday, May 21, 2012 - 1:42 am
Hi,
I am conducting a logistic regression analysis for which I would like to plot the estimated probabilities against one of the predictors (adjusted for the covariates).

As I understand it, I first need to define the estimated probabilities and save them. Here, I run into a problem. The newly defined variables (and other variables) are not saved. What am I doing wrong?

DEFINE:
logit = 4.021 + (0.077 * sex) + (0.704 * smoking) + (0.119 * age)
+ (0.019 * educ) - (0.029 * q7c) + (0.276 * peeruse) + (0.349 * permpeer)
- (0.195 * permm) + (0.157 * usem) + (0.233 * online) + (0.257 * ABIown)
+( 0.187 * q14alc) + (0.059 * comp4) - (0.010 * q14nalc);
prob = 1 / (1 + exp (-logit));

SAVEDATA: FILE IS newdata.csv;

Thank you in advance.
 Linda K. Muthen posted on Monday, May 21, 2012 - 5:43 am
You need to put the new variables at the end of the USEVARIABLES list.
 sima posted on Saturday, June 07, 2014 - 5:30 am
Hi,
I am predicting company default using variables that are time-varying.
For example:
Year  Default  Comp.  Variable 1  Variable 2
1999  0        ABC    200         10
2000  0        ABC    50          7
2001  1        ABC    20          2
1999  0        KKK    201         5
2000  1        KKK    100         5

How can I run a logit regression where 1 = defaulted in year t and 0 = did not?

Thanks
 Linda K. Muthen posted on Saturday, June 07, 2014 - 5:37 pm
You have repeated measures of default. Do you want to do a growth model? What are you trying to do?
 sima posted on Sunday, June 08, 2014 - 2:37 am
Hi Linda,

I am predicting a default probability. I have annual data from 2002-2012 for US companies (both defaulted and not); each firm has 6 variables (liquidity, profitability, etc.).

Is Logistic regression appropriate to use?
 Linda K. Muthen posted on Sunday, June 08, 2014 - 11:15 am
You can use a logistic regression at each time point.
 sima posted on Sunday, June 08, 2014 - 11:57 am
Linda,

Searching some papers, I found that researchers estimate the probability of default within, for example, 3 years' time.
Could you please advise me how I would do this with logistic regression?
 Linda K. Muthen posted on Sunday, June 08, 2014 - 1:41 pm
See Chapter 14 of the user's guide or a book on categorical data methodology.
 sima posted on Sunday, June 08, 2014 - 1:59 pm
Also, the same logit model is written differently in different papers:
(1) $P_{t-1}(Y_{i,t} = 1 \mid x_{i,t-1}) = \left(1 + \exp(-\alpha - \beta x_{i,t-1})\right)^{-1}$

$Y_{i,t}$: default in year t;
$x_{i,t-1}$: vector of explanatory variables in the previous year.

(2) $P_t(Y_{i,t+s} = 1 \mid x_{i,t}) = \left(1 + \exp(-\delta(s) x_{i,t})\right)^{-1}$

$Y_{i,t+s}$: default in year t+s;

$x_{i,t}$: vector of explanatory variables, including a constant, observed at the end of year t;

$\delta(s) x_{i,t}$: linear combination of the explanatory variables. The authors estimate a vector $\delta$ and refer to the linear combination $\delta(s) x_{i,t}$.

Comparing these two models and applying them in SPSS, I cannot understand the difference.
The first model has the constant α; the second does not, but has δ(s).

Would you be able to explain it to me.
Thanks
 Bengt O. Muthen posted on Sunday, June 08, 2014 - 4:23 pm
Your formulas came out distorted as you see. Also, you may want to ask this kind of general modeling question on SEMNET or Multilevelnet.
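One reading of the second formulation in the question (default in year t+s given covariates observed at the end of year t) is that it is an ordinary logistic regression fitted to lagged data. The Python sketch below only builds such a lagged data set from the example rows posted earlier; the function name and the tuple layout are illustrative assumptions, and the regression itself would be run in whatever software is at hand.

```python
# Firm-year rows from the example in the thread:
# (year, default, company, variable 1, variable 2)
rows = [
    (1999, 0, "ABC", 200, 10),
    (2000, 0, "ABC", 50, 7),
    (2001, 1, "ABC", 20, 2),
    (1999, 0, "KKK", 201, 5),
    (2000, 1, "KKK", 100, 5),
]

def lagged_pairs(rows, s=1):
    """Pair covariates observed in year t with the default indicator in year t+s."""
    by_key = {(firm, year): (default, v1, v2)
              for year, default, firm, v1, v2 in rows}
    pairs = []
    for (firm, year), (_, v1, v2) in sorted(by_key.items()):
        future = by_key.get((firm, year + s))
        if future is not None:               # skip firm-years with no record s years on
            pairs.append((firm, year, v1, v2, future[0]))
    return pairs

one_year = lagged_pairs(rows, s=1)
two_year = lagged_pairs(rows, s=2)
for record in one_year:
    print(record)
```

Each output record holds the firm, the covariate year, the two covariates, and the default indicator s years later; a separate regression per horizon s would then give horizon-specific coefficients, with the constant absorbed into the covariate vector.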