From the manual, the following input specifies a latent change (growth) curve for binary data.
DATA:     FILE IS ex6.4.dat;
VARIABLE: NAMES ARE u11-u14;
          CATEGORICAL ARE u11-u14;
MODEL:    i s | u11@0 u12@1 u13@2 u14@3;
I want to use maximum likelihood (ESTIMATOR = ML), but which analysis TYPE should I use? When I use TYPE IS MEANSTRUCTURE (as for a standard growth curve) I get different results (OK, that is fine, since one uses probit and the other uses logistic via ML).
DATA:     FILE IS ex6.4.dat;
VARIABLE: NAMES ARE u11-u14;
          CATEGORICAL ARE u11-u14;
ANALYSIS: TYPE IS MEANSTRUCTURE;
          ESTIMATOR = ML;
MODEL:    i s | u11@0 u12@1 u13@2 u14@3;
But scale parameters are no longer returned. Is this correct? I'm actually not even sure what the scale parameters mean. In all of the references I've looked at, logistic regression is described via the link function and p(Y=1) = 1/(1 + exp(-x*b)). This is all I want. Yet the intercept is set to zero and thresholds are returned (with both input specifications). I know a threshold is the value on the standard normal distribution above which a score would be predicted as Y=1, but how do thresholds fit into the standard logistic regression framework (i.e., what does knowing the threshold buy me)? And what are the scale parameters (which, it seems, are only returned for probit)?
I find the Mplus manual helpful for supplying the input for different types of models, but not why things are set up the way they are in the Mplus framework, or how to interpret the results. Any help on this would be appreciated.
You would use TYPE=MEANSTRUCTURE with both ESTIMATOR=ML and the default estimator, ESTIMATOR=WLSMV. As you mentioned, WLSMV gives probit regression coefficients while ML with the logit link gives logistic regression coefficients. Logistic regression with ML does not use scale parameters.
See Technical Appendix 1 which is on the website for a description of probit regression as implemented in Mplus with weighted least squares. There are also several references on the website for categorical data modeling which describe both probit and logistic regression.
The purpose of the Mplus User's Guide is to describe the Mplus language and give examples of how it is used. To understand the method and the interpretation of results, we suggest the literature for that method.
Given that I'm using logistic regression to model the probability of a success over time, I would like to better understand the output. The majority of Technical Appendix 1 treats the logistic model as it is normally treated, using an intercept and partial slopes that can be transformed into a probability of a success via the function p(Y=1) = 1/(1 + exp(-(b0 + b*x))).
That being said, that isn't how Mplus handles logistic regression (is there a way for it to?). Appendix 1 goes on to discuss the logistic model using a threshold instead of an intercept. I have been unable to locate the precise way the output parameters relate to the probability. I've tried to recover the probabilities on the figure that can be output, to no avail. It seems I've tried everything. Given the model from Example 6.4, can you go over how to get the probability of success given X = some value, i.e., P(Y=1 | X=x)? The relevant output is here:
Means:     I  0.000   S -0.926
Variances: I  2.856   S  0.323
For X=0, the probability of a success should be about .32.
Also, how does one specify a grouping variable for longitudinal logistic models? I've tried using GROUPING (Mplus didn't like it) and regressing I and S on group status (which Mplus also didn't like).
There is no mystery to what Mplus does here. The only difference between your formula
p(Y=1) = 1/(1 + exp(-(b0 + b*x)))
and what Mplus does is that Mplus uses a threshold t instead of the intercept,
t = - b0.
This is described in Chapter 13 of the Version 4 User's Guide available as pdf on the web site. See also eqns (27) to (32) in the Appendix 1 you mention. The reason a threshold instead of an intercept is used is that this prepares for having more than 2 categories (number of thresholds is one less than the number of categories). See also the Agresti references given.
So your formula above gives the probability of the outcome in this simple regression case.
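The equivalence of the two parameterizations is easy to verify numerically. Here is a minimal sketch; the values of b0, b1, and x are made up for illustration, not taken from any Mplus output:

```python
import numpy as np

def expit(z):
    """Inverse-logit: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values for illustration only
b0, b1, x = -0.75, 1.2, 0.5

# Intercept parameterization: p = 1 / (1 + exp(-(b0 + b1*x)))
p_intercept = expit(b0 + b1 * x)

# Mplus threshold parameterization: tau = -b0,
# so p = 1 / (1 + exp(-(b1*x - tau)))
tau = -b0
p_threshold = expit(b1 * x - tau)
```

The two probabilities are identical by construction, since subtracting the threshold is the same as adding the intercept.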
Now, ex 6.4 that you refer to is a growth modeling example where things are a bit more complex - see the literature on growth modeling with binary outcomes. Ex 6.4 uses WLSMV estimation and the probit model, but let's discuss this instead in terms of ML and logit. Consider the outcome y at time point 1, where, if centering takes place at time point 1, y is a function only of the growth intercept i. I am confused by your mention of x=0 here, since you give the i mean - if i were regressed on x, Mplus would have reported the i intercept. In any case, the marginal probability of y needs to be calculated by numerical integration over the distribution of i. Instead of this marginal probability, you can easily compute the y probability conditional on being at the mean of i (=0), which would be p(y=1 | i=0) = 1/(1 + exp(tau)), where tau is the estimated threshold for the outcome at time 1.
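That conditional computation can be sketched as follows. The threshold value tau below is hypothetical (the WLSMV/probit output from ex 6.4 would not be plugged into a logit formula directly):

```python
import numpy as np

def expit(z):
    """Inverse-logit: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

tau = 0.75   # hypothetical threshold for the time-1 outcome

# At the mean of the growth intercept (i = 0) the logit is 0 - tau,
# so the conditional probability is 1 / (1 + exp(tau)):
p_cond = expit(0.0 - tau)
```

For this particular (made-up) tau the conditional probability comes out near .32; the marginal probability, which averages over the whole distribution of i, requires numerical integration as noted above.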
Multiple-group analysis with models requiring integration is carried out using KNOWNCLASS - see the User's Guide.
ywang posted on Tuesday, September 28, 2010 - 8:02 am
Dear Drs. Muthen:
When Bayesian estimation was used for a growth model with categorical indicator variables, the model terminated normally, but the autocorrelation plot shows that the correlation is about 0.9. Does that mean the growth model does not fit the data well?
I am doing GMM with two binary indicators measured at five time points. Do you have any recommendations or examples (papers) of which results should be reported in order to describe the classes formed by the binary variables and, in particular, how to plot them? Thank you very much.
Kreuter, F. & Muthén, B. (2008). Analyzing criminal trajectory profiles: Bridging multilevel and group-based approaches using growth mixture modeling. Journal of Quantitative Criminology, 24, 1-31.
Kreuter, F. & Muthén, B. (2008). Longitudinal modeling of population heterogeneity: Methodological challenges to the analysis of empirically derived criminal trajectory profiles. In Hancock, G. R., & Samuelsen, K. M. (Eds.), Advances in latent variable mixture models, pp. 53-75. Charlotte, NC: Information Age Publishing, Inc.
I've run a binary growth curve using the ML estimator, so it's logistic. The model looks good but the covariate effects don't look like logit coefficients, meaning the magnitudes seem too large and don't match similar results in Stata (logistic regression of one of the binary indicators on the covariates). I have 3 questions. 1. How do I interpret the covariate effects on the intercept and linear slope, what are they in/decreases in exactly if not the logged odds of the intercept and slope? 2. Is there any intuitive way to interpret them in terms of the values of the Y variables themselves? Alternately, are there example papers that interpret them in some intuitive manner? 3. I know I can still generate predicted probabilities regardless, would you please direct me to the best source for this formula? Thanks, Miles
1. The DV is a latent continuous variable so the regression coefficients are regular linear regression coefficients. They can be understood in terms of the SD of the latent DV, but you can also plot the probability of the binary outcome as a function of your covariate.
2-3. See articles by Don Hedeker and Robert Gibbons and books like Fitzmaurice, Laird and Ware "Applied Longitudinal Analysis".
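As a sketch of the plotting suggestion in point 1, here is one way to trace the outcome probability as a function of a covariate, conditional on the growth-factor residuals being zero (all parameter values are made up for illustration; the marginal, population-averaged curve would additionally require integrating over the residuals):

```python
import numpy as np

def expit(z):
    """Inverse-logit: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical structural parameters (not from any output in this thread)
tau = 1.0              # outcome threshold
a_i, b_i = 0.0, 0.8    # intercept factor: E[i|x] = a_i + b_i*x
a_s, b_s = -0.3, 0.2   # slope factor:     E[s|x] = a_s + b_s*x

x = np.linspace(-2.0, 2.0, 9)
probs = {}
for t in range(4):                      # time scores 0..3, as in ex 6.4
    eta = (a_i + b_i * x) + t * (a_s + b_s * x)
    probs[t] = expit(eta - tau)         # conditional on residuals = 0
```

Plotting probs[t] against x for each time point shows how a one-unit change in x shifts the outcome probability at that occasion.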
Fantastic, thanks. And just so I am clear, you mention in 1 above that I can interpret these in terms of the SD of the growth parameters but I should use the standardized coefficients for that, correct? Miles
I'm having problems with a binary growth model using ML. I want to check the probabilities using the baseline estimated model and my output is:
Means:      I   0.000    S  3.783
Thresholds: PSYPROB6$1  12.730
Variances:  I 193.764    S  7.542    3.752
The data read in fine, as Mplus spit out the correct proportions of the observed binary indicators (at baseline, .14). However, when I use the formula above from the posting dated Thursday, April 20, 2006 - 1:56 pm to get the estimated probability for the baseline model at the intercept, I get a number that is way off the observed probability and very small, likely because the tau is so large. Any idea what is happening here? I'm just not sure if it's a math mistake or a model mistake. Thanks!
Getting the unconditional outcome probabilities involves numerical integration over the growth factors because they are random variables (have variances > 0). The logit model for the outcome conditional on the factors combined with normal factors results in this need for integration. So it is hard to compute by hand.
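For completeness, here is a sketch of that integration using the numbers from the output above, treating them as logit-scale estimates. A fine Riemann grid stands in for the adaptive quadrature Mplus uses internally. Note how the probability conditional on i = 0 is essentially zero (which is why the hand calculation looked way off), while the marginal probability lands near the observed proportion:

```python
import numpy as np

def expit(z):
    # clip to avoid overflow in exp() for extreme logits
    return 1.0 / (1.0 + np.exp(-np.clip(z, -700.0, 700.0)))

tau, var_i = 12.730, 193.764   # threshold and I variance from the output above

# Conditional on being at the mean of i (= 0): essentially zero
p_cond = expit(0.0 - tau)

# Marginal: integrate expit(i - tau) over i ~ N(0, var_i)
sd = np.sqrt(var_i)
i_grid = np.linspace(-8.0 * sd, 8.0 * sd, 4001)
density = np.exp(-0.5 * (i_grid / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
step = i_grid[1] - i_grid[0]
p_marg = float(np.sum(expit(i_grid - tau) * density) * step)
```

The large intercept variance means many individuals sit far above the threshold even though the probability at the factor mean is negligible, which reconciles the estimates with the observed proportion.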
Thanks so much for this clarification. I have been using Mplus for over a decade on a number of different models successfully (all of my calculations by hand match my Mplus results), but the information in the applied literature on binary/ordinal growth is all over the place. I have seen examples where authors interpret covariate effects on I and S in a categorical framework as logit and probit coefficients of the y outcomes (instead of increases/decreases in the continuous I and S representing change in y* over time). I have seen plotted probabilities taken (I assume) from the formula above in the Discussion board. I have also seen examples where interpretations are in terms of logged odds when the authors list the WLS estimator as used. In short, it has been very difficult to find consistent and correct examples using Mplus in the literature, more so than for any other model type I have used. I recently found a paper/chapter by Masyn, Petras, and Liu (2013) geared toward an applied audience that clearly explains the models and dispels some inconsistencies in interpretation I've seen in the literature using Mplus software.
Masyn, K., Petras, H. and Liu, W. (2013). Growth Curve Models with Categorical Outcomes. In Encyclopedia of Criminology and Criminal Justice (pp. 2013-2025). Springer.
Note also that the need for numerical integration vanishes when you use the probit link because then you have the normal-normal combination that results in normal (instead of logistic-normal). That is, normal (probit) for outcome regressed on growth factors combined with normal growth factors. That results in a univariate normal distribution function expression.
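That normal-normal closed form can be sketched as follows and checked against brute-force integration. This assumes a latent-response residual variance of 1 (an assumption of the sketch; parameterization details in a given Mplus output may differ), and all numeric values are hypothetical:

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical threshold, factor mean, and factor variance
tau, mu, psi = 0.5, 0.0, 2.0

# Closed form: y* = i + e with e ~ N(0, 1) and i ~ N(mu, psi),
# so y* ~ N(mu, psi + 1) and P(y = 1) = Phi((mu - tau) / sqrt(1 + psi))
p_closed = Phi((mu - tau) / sqrt(1.0 + psi))

# Check against numerical integration over the distribution of i
sd = sqrt(psi)
i_grid = np.linspace(mu - 8.0 * sd, mu + 8.0 * sd, 4001)
density = np.exp(-0.5 * ((i_grid - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
step = i_grid[1] - i_grid[0]
p_numeric = float(np.sum(np.array([Phi(v - tau) for v in i_grid]) * density) * step)
```

The two agree, illustrating why probit avoids the numerical integration that logit requires.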