Cam McIntosh posted on Wednesday, October 12, 2005 - 6:48 pm
Regarding path analysis models estimated using the logit link, as well as path models involving a combination of logit links and identity links (linear regressions), I am not completely clear as to how the set of simultaneous equations would be written. I am asking for more information on this issue because I am using Mplus for some analyses, and have some very keen methodologists wanting detailed information (e.g., the form of the equations).
Coming from classical SEM and piles upon piles of textbooks and articles describing the LISREL (Joreskog-Keesling-Wiley) model in both matrix and scalar form, I am now trying to learn the Mplus "beyond SEM" mathematical framework. It is pretty refreshing although challenging. I find that I have been locked into one way of thinking for quite some time.
Anyway, I am currently working on getting a better understanding of the form of the path models with logits, and those which have both logits and linear regression coefficients. I assume that some of this information will be in the updated/upcoming technical appendix dealing with the underlying statistical theory for version 3. But for now, say we have a 3-variable path model, loosely:
Let's say x is continuous and both y1 and y2 are binary, and I am using the logit link in Mplus. I assume the error variances are fixed at pi^2/3 as with plain old logistic regression. Is there some particular way to write/express such a model in terms of logits? Perhaps there are some papers and articles on Mplus that treat this explicitly? When using logits, I am just not clear on the "formal" form of the second equation, which regresses y2 (an endogenous measured variable) on y1 (another endogenous measured variable) and x (the exogenous measured variable).
Thanks for any assistance and your patience with a former LISRELite,
bmuthen posted on Wednesday, October 12, 2005 - 8:54 pm
Let's have y be the observed binary variable and y* the continuous latent response variable behind it. One common statistical way of writing this is to specify pi_1 as P(y1=1 |x) and let the first logistic regression equation be written as
logit (pi_1) = tau1 + gamma1*x
But you can also use the latent response variable formulation that you gave where the dependent variable is y1* and e1 has its variance fixed at pi^2/3 as you say.
Same for the second equation, where you can let pi_2 be P(y2=1 | x, y1) and have
logit (pi_2) = tau2 + gamma2*x + beta*y1
or you can use the y2* latent response variable way of writing it as you have done.
Note then that in Mplus the ML-logit approach uses y1 and not y1* in the second equation. One could use y1* but that hasn't been implemented yet in Mplus with ML, only with WLSMV.
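As a rough illustration of the two equations above (not Mplus code, and not a description of its internals), the following Python sketch writes the model as a recursive pair of logistic regressions and checks that the latent response formulation of the first equation gives the same probability. The function names and the parameter values are made up for the example, and tau1 and tau2 are used with a plus sign exactly as written in the equations above.

```python
import math

def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def p_y1(x, tau1, gamma1):
    """P(y1 = 1 | x) from logit(pi_1) = tau1 + gamma1*x."""
    return expit(tau1 + gamma1 * x)

def p_y2(x, y1, tau2, gamma2, beta):
    """P(y2 = 1 | x, y1) from logit(pi_2) = tau2 + gamma2*x + beta*y1."""
    return expit(tau2 + gamma2 * x + beta * y1)

def p_y1_latent(x, tau1, gamma1):
    """Same probability via y1* = tau1 + gamma1*x + e1, with y1 = 1 if y1* > 0.

    e1 is standard logistic (variance pi^2/3), so
    P(y1* > 0) = P(e1 > -eta) = 1 - F(-eta), with F the logistic CDF.
    """
    eta = tau1 + gamma1 * x
    return 1.0 - expit(-eta)

def joint_prob(y1, y2, x, tau1, gamma1, tau2, gamma2, beta):
    """P(y1, y2 | x) = P(y1 | x) * P(y2 | x, y1), using the observed y1."""
    p1 = p_y1(x, tau1, gamma1)
    p2 = p_y2(x, y1, tau2, gamma2, beta)
    return (p1 if y1 == 1 else 1.0 - p1) * (p2 if y2 == 1 else 1.0 - p2)

# Illustrative (made-up) parameter values and a single x value.
params = dict(tau1=-0.5, gamma1=1.2, tau2=0.3, gamma2=-0.4, beta=0.8)
x0 = 0.7
total = sum(joint_prob(a, b, x0, **params) for a in (0, 1) for b in (0, 1))
print("sum of the four joint probabilities:", total)
print("logit form:", p_y1(x0, -0.5, 1.2), "latent form:", p_y1_latent(x0, -0.5, 1.2))
```

The product in joint_prob is the contribution of one observation to the likelihood when the observed y1, rather than y1*, is the predictor in the second equation.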
Hope that helps.
bmuthen posted on Wednesday, October 12, 2005 - 8:58 pm
P.S. A minor detail is that Mplus estimates the threshold rather than the intercept (in preparation for polytomous outcomes), and the two have reversed signs. So to be precise, if in the latent response variable formulation you have an intercept in the equation, the threshold that y* has to exceed to change y=0 to y=1 is zero - i.e. you can only identify either the threshold or the intercept.
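The sign reversal can be written out as follows. This is a sketch in assumed notation: a denotes the intercept, tau the threshold, and F the (symmetric) logistic distribution function of e; none of these symbols is taken from the Mplus documentation. Note that the tau1 and tau2 in the logit equations earlier in the thread enter with a plus sign, so they play the role of a here.

```latex
\begin{align*}
\text{Intercept form:}\quad & y^* = a + \gamma x + e, & y = 1 &\iff y^* > 0,\\
\text{Threshold form:}\quad & y^* = \gamma x + e,     & y = 1 &\iff y^* > \tau,\\
\text{Either way:}\quad & P(y = 1 \mid x) = P(e > \tau - \gamma x) = F(-\tau + \gamma x), & \tau &= -a .
\end{align*}
```

Only the difference between the intercept and the threshold enters the probability, which is why one of the two has to be fixed (here at zero).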
Cam McIntosh posted on Wednesday, October 12, 2005 - 11:47 pm
Many thanks, Bengt. That was exactly what I was looking for, and it clears things up. In the analyses I have been working on, the ML-logit approach in Mplus would seem to be the most theoretically appropriate, given that I am dealing with some truly categorical (i.e. qualitative) outcomes. However, I will likely be using probit links and the WLSMV estimator in some future analyses, where I will be dealing with survey response options that can reasonably be viewed as discretized versions of continuous response variables. Cam
bmuthen posted on Thursday, October 13, 2005 - 4:24 pm
I think it is important to distinguish between two different situations with continuous latent response variables. In doing so we avoid perpetuating a myth about "truly categorical" vs discretized outcomes.
The first situation is represented by the first equation we are discussing, a regular logit regression expression for the probability of the outcome as a function of x - this can always be presented in terms of continuous latent response variables because the latter model formulation is the same as the former. It doesn't matter if we are studying a truly categorical outcome (such as alive or dead) or not. Even alive/dead can be seen as having a continuous latent response variable - such as a disease process - falling below or exceeding a threshold. The necessity of outcomes having been formed by "discretizing" a continuous latent response variable is a myth.
The second situation is represented by the second equation where the binary variable y1 serves as a mediator. Here it does matter if we consider the mediator as the continuous latent response variable underlying y1 or simply the observed y1 itself - those two models fit the data differently. I would argue that, depending on the application, even with truly categorical variables such as alive/dead it might sometimes be more relevant to have the continuous latent response variable - the disease state - as mediator rather than the observed dichotomy. Both logit and probit could in principle work with either approach, although some combinations are not yet covered in Mplus.
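To see numerically why the choice of mediator matters, here is a small Python sketch under assumed settings: logistic errors in both equations, an intercept parameterization with y1 = 1 when y1* > 0, and made-up parameter values. It is a hypothetical illustration, not something Mplus estimates in this form. With the observed y1 as mediator, the log odds ratio between y2 and y1 given x equals beta at every x. With y1* as mediator, the implied log odds ratio changes with x, so the two models cannot reproduce the same data in general.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def logistic_pdf(e):
    """Density of the standard logistic distribution (variance pi^2/3)."""
    p = expit(e)
    return p * (1.0 - p)

def log_or_observed_mediator(x, tau2, gamma2, beta):
    """Log odds ratio of y2 for y1 = 1 vs y1 = 0 when observed y1 is the predictor."""
    p1 = expit(tau2 + gamma2 * x + beta * 1)
    p0 = expit(tau2 + gamma2 * x + beta * 0)
    return logit(p1) - logit(p0)

def log_or_latent_mediator(x, tau1, gamma1, tau2, gamma2, beta, n=20000, lim=30.0):
    """Same log odds ratio when y1* = tau1 + gamma1*x + e1 is the predictor.

    P(y2 = 1 | x, y1) is obtained by integrating over e1 (midpoint rule on
    [-lim, lim]), separately for the region y1* > 0 (y1 = 1) and y1* <= 0 (y1 = 0).
    """
    eta1 = tau1 + gamma1 * x
    h = 2.0 * lim / n
    num = [0.0, 0.0]  # index 0: y1 = 0, index 1: y1 = 1
    den = [0.0, 0.0]
    for i in range(n):
        e1 = -lim + (i + 0.5) * h
        y1_star = eta1 + e1
        y1 = 1 if y1_star > 0 else 0
        w = logistic_pdf(e1) * h
        num[y1] += expit(tau2 + gamma2 * x + beta * y1_star) * w
        den[y1] += w
    return logit(num[1] / den[1]) - logit(num[0] / den[0])

# Illustrative (made-up) parameter values.
tau1, gamma1, tau2, gamma2, beta = 0.0, 1.0, 0.0, 0.0, 1.0
for x in (0.0, 2.0):
    print(x,
          log_or_observed_mediator(x, tau2, gamma2, beta),
          log_or_latent_mediator(x, tau1, gamma1, tau2, gamma2, beta))
```

The numerical integration is only there to keep the sketch dependency-free; any quadrature routine would do.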