Anonymous posted on Tuesday, November 20, 2001 - 11:13 am
Hi, I am pretty new to this field. I have a data set with the following relationship:
X -> Y -> Z
X, Y and Z are all observable binary longitudinal data. I wonder if Mplus can handle this type of model and, if it does, which part of the analysis (e.g. 'path analysis') should be performed? Many thanks for your help!
bmuthen posted on Wednesday, November 21, 2001 - 8:26 am
Yes, this can be seen as path analysis. In Mplus language you define y and z as categorical and then say:
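The reply breaks off after "then say:". Based on the X -> Y -> Z structure in the question, the intended input was presumably along these lines (a sketch with assumed variable names, not the original reply):

```
VARIABLE:   NAMES = x y z;
            CATEGORICAL = y z;
MODEL:      y ON x;     ! X -> Y
            z ON y;     ! Y -> Z
```

With outcomes declared categorical, Mplus defaults to the WLSMV estimator, so the paths are estimated as probit regressions.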
Is it possible to run a parallel process model for a continuous and a categorical variable? I haven't found any examples that combine the parallel process with the categorical approach.
My continuous variable was measured monthly over 15 months, my categorical (binary) variable at baseline, month 3, month 9 and month 15. I would like to know if one variable "drives" the other. I'm fairly new at this. Thanks!
Yes, it is possible to do that. Just put together a model where one process is continuous with 15 measures and one is categorical with four measures. You can put this together using Example 22.1 for the continuous outcome and Example 22.1D for the categorical outcomes, with or without the covariates.
Anonymous posted on Monday, December 06, 2004 - 11:28 am
How much variability is necessary for MPlus to estimate a model? I have binary outcome data and some low base rates (in terms of the presence of the outcome). I have 20 participants measured over 18 time points. When I attempt to run the LGCA, I get the following error message:
THE WEIGHT MATRIX PART OF VARIABLE T1 IS NON-INVERTIBLE. THIS MAY BE DUE TO ONE OR MORE CATEGORIES HAVING TOO FEW OBSERVATIONS. CHECK YOUR DATA AND/OR COLLAPSE THE CATEGORIES FOR THIS VARIABLE. PROBLEM INVOLVING THE REGRESSION OF T1 ON RACE. THE PROBLEM MAY BE CAUSED BY AN EMPTY CELL IN THE JOINT DISTRIBUTION.
Unfortunately, I get the same message for about half of my time points. (I have no missing data - just mostly 0's with a few 1's interspersed throughout the dataset.) Is this error accurate, or might I have an error in my program?
bmuthen posted on Monday, December 06, 2004 - 11:41 am
You mention doing LGCA (I assume you mean LCGA), that is ML estimation of a mixture model. But the error message seems to refer to weighted least squares analysis - not a mixture analysis. Please clarify or send input, output, and data to email@example.com.
Anonymous posted on Wednesday, October 12, 2005 - 12:04 am
I am working on an LGM with longitudinal binary variables. It also has time-varying covariates, which are also binary. When I added the time-varying covariates, the program gave me a warning saying that it needs Monte Carlo integration.
Is Monte Carlo integration the way to go in my situation (LGM with binary variables and time-varying covariates)?
I would imagine that without covariates, you were using the default WLSMV estimator and when you added the covariates, maximum likelihood was used. Certain models require Monte Carlo integration. If you need additional information, please send your input, data, output, and license number to firstname.lastname@example.org.
Gina Allen posted on Friday, October 06, 2006 - 11:30 am
We are working on a project in which we are trying to model five categorical indicators at 6 time points. One variable (retirement) designates a permanent transition (once retired always retired). The rest (work, marriage, etc.) can take any value at each time point. We are planning to run latent class models for each of the five variables over the 6 time points and then do a follow up latent class analysis of the cross-classification of the resulting trajectories. Is there an example we can follow or a more efficient way to program this?
I am not sure what you mean when you say "then do a follow up latent class analysis of the cross-classification of the resulting trajectories". I can imagine several approaches - here are two, both based on an LCA at each time point. One is latent transition analysis, where class membership at time t influences class membership at time t+1. There are several examples of that in the UG. The other is latent class growth analysis where the outcome is the latent class variable at each time point - that is a bit cumbersome, however. The latter is the same as doing a joint LCA of all time points, but then structuring the latent class probabilities according to an LCGA.
I am interested in modelling longitudinal poverty (a binary indicator) over 15 waves of data. I am learning about the LCGA approach, however, please could you tell me what the main differences are between this method and latent Markov analysis (of Langeheine & van de Pol)?
Jungeun Lee posted on Thursday, August 09, 2007 - 12:37 pm
I am running a latent growth mixture modeling with 4 binary variables. When number of classes >1, Mplus gives me a warning like;
ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT DISTRIBUTION OF THE CATEGORICAL VARIABLES IN THE MODEL. THE FOLLOWING PARAMETERS WERE FIXED: 3
Parameter 3 is the alpha for the slope of the first class, and probably because of that, the S.E. for the slope of the first class is 0. So I can't tell whether it is statistically significant or not. Could you let me know how problematic it is to get a warning like the one above? Could you please also let me know what I can do in a situation like this?
This seems like a problematic message. Please send your input, data, output, and license number to email@example.com.
Jungeun Lee posted on Tuesday, August 21, 2007 - 4:10 pm
I just emailed them to you. Thanks!
Emily Blood posted on Wednesday, September 24, 2008 - 7:23 am
I am fitting a latent growth curve with a repeated observed binary outcome, a latent continuous intercept, a latent continuous slope, and a logit link between the growth factors and the binary outcomes, using MLR estimation. I just want to be clear on the estimation procedure. Does the likelihood assume normality for the observed outcomes (or for the unobserved continuous variable y* behind the threshold) and use the normal density, or is the logistic density used in the likelihood? If you could let me know, that would be great. Thanks, Emily. P.S. I have read the Muthén and Asparouhov (2008) paper; thank you for sending that reference. I still have this one question, though.
The binary growth model with logit link uses regular logistic regression of each binary outcome on the growth factors. Regular logistic regression does not necessitate an underlying y* variable, but simply considers the conditional probability of the binary outcome as a function of the growth factors. However, the logistic regression model can equivalently be expressed in terms of such a y* variable that has a logistic density given the predictors (growth factors).
Emily Blood posted on Wednesday, September 24, 2008 - 11:30 am
Thank you for the response. So the observed-data likelihood in this case is expressed as Integral[f(xb) * normal density of random effects], integrated over the random effects, where f(xb) = p(xb)^y * (1-p(xb))^(1-y)? I'm basing this on the estimation section of the Muthén & Asparouhov (2008) paper and plugging in the likelihoods for this specific case. Hopefully I've understood it correctly? Thanks, Emily
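As a small numerical illustration of this likelihood, here is a Monte Carlo sketch for a 4-wave binary logit growth model. The parameter values, the independent growth factors, and the time scores 0, 1, 2, 3 are made up for illustration; Mplus itself evaluates this integral by numerical integration rather than by simulation.

```python
import math
import random

def inv_logit(x):
    """Logistic link: P(y=1 | linear predictor x)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical growth-factor distribution (illustration only):
# intercept i ~ N(mu_i, sd_i^2), slope s ~ N(mu_s, sd_s^2), independent.
mu_i, mu_s, sd_i, sd_s = -1.0, 0.5, 1.0, 0.3

def observed_data_likelihood(y, reps=50_000, seed=7):
    """L(y) = E_{i,s}[ prod_j p_j(i,s)^y_j * (1 - p_j(i,s))^(1 - y_j) ],
    approximated by averaging over simulated normal growth factors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        i = rng.gauss(mu_i, sd_i)
        s = rng.gauss(mu_s, sd_s)
        prod = 1.0
        for j, yj in enumerate(y):          # time scores 0, 1, 2, 3
            p = inv_logit(i + j * s)
            prod *= p if yj == 1 else (1.0 - p)
        total += prod
    return total / reps

print(observed_data_likelihood((0, 0, 1, 1)))
```

A sanity check on the sketch: the likelihoods of all 16 possible response patterns should sum to (approximately) one.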
I am interested in creating a factor-of-curves LGM that utilizes both binary and continuous lower-order curves to estimate the higher order factor. Would this be as simple as defining the lower-order binary curves as categorical (as described in example 6.4 of the user's guide) then creating a model similar to that described by Duncan, Duncan, and Strycker (2006)? Here is a link to their example of the factor-of-curves LGM with only continuous outcomes: http://www.ats.ucla.edu/stat/mplus/examples/ddsla/app52man.inp.txt. Thanks.
I have two questions relating to the factor-of-curves model mentioned above. Is it possible to utilize mixture modeling on the second-order slope and intercept? I would like to examine the heterogeneity in the second order curve. Also, I have missing data related to the outcome variables (smoking and health screening in a sample of older adults). What would be the most efficient way of accounting for data NMAR in the factor-of-curves model (while also utilizing selection modeling with the common I S) ?
To answer your second question, NMAR is a big topic and is not easy to carry out well; no approach is really "efficient". A first step would be to check if those dropping out have a different mean on the outcome before dropping out than others do. But even if that is so, it doesn't mean that dropout is NMAR; it could still be MAR. I think the pattern-mixture approach is probably the most accessible in terms of exploring the missingness.
You get more dimensions of integration due to categorical outcomes, so that can make for heavy computations.
Regan posted on Wednesday, February 03, 2010 - 7:14 pm
I am interested in following up on Mr. Bishop's questions. I have about 2% of my sample that are non-responders on the four indicators that make up my main independent factor variable. I have done some preliminary regression analyses on the variables in the model, and it seems that one could argue that these missing cases violate the MAR assumptions for FIML or MI. Should I drop these cases from analyses, or how do I use the pattern-mixture approach you mentioned in your response to Mr. Bishop? Thanks!
I want to compute P(yj=1|x1, x2) ; j = 1,2,3,4 for given values of x1 and x2.
I know that P(yj=1|i, s, x1, x2) = 1 - F(-i - (j-1)*s), and thus P(yj=1|x1, x2) = E[P(yj=1|i, s, x1, x2)], with the expectation taken over i and s given x1 and x2.
To compute P(yj=1|x1, x2), I simulate (using R) i and s for given values of x1 and x2 and compute P(yj=1|i, s, x1, x2) for each simulated subject. I obtain P(yj=1|x1, x2) by taking the mean of P(yj=1|i, s, x1, x2).
It looks like you are doing numerical integration by simulation. When you don't condition on the growth factors, the normal growth factors together with the logit link require numerical integration to get the probabilities. This is computed in the PLOT command, I believe. With the probit link the numerical integration is not needed - instead you have an explicit expression.
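For the probit case, the explicit expression is P(yj=1) = Phi(mu_j / sqrt(1 + var_j)), where mu_j and var_j are the mean and variance of i + (j-1)*s given the covariates. A quick check of this against the simulation approach described above, using made-up growth-factor parameters:

```python
import math
import random

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical growth-factor distribution given the covariates
# (means, SDs, and correlation are illustration-only values).
mu_i, mu_s = -0.5, 0.3
sd_i, sd_s = 1.0, 0.4
rho = 0.2

def p_marginal_closed_form(j):
    """Probit closed form: P(yj = 1) = Phi(mean_j / sqrt(1 + var_j))."""
    t = j - 1
    mean = mu_i + t * mu_s
    var = sd_i**2 + t**2 * sd_s**2 + 2 * t * rho * sd_i * sd_s
    return Phi(mean / math.sqrt(1.0 + var))

def p_marginal_simulated(j, reps=200_000, seed=1):
    """Monte Carlo: average P(yj = 1 | i, s) over simulated growth factors."""
    rng = random.Random(seed)
    t = j - 1
    total = 0.0
    for _ in range(reps):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        i = mu_i + sd_i * z1
        s = mu_s + sd_s * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        total += Phi(i + t * s)
    return total / reps

for j in (1, 2, 3, 4):
    print(j, round(p_marginal_closed_form(j), 3), round(p_marginal_simulated(j), 3))
```

The two columns should agree to within Monte Carlo error; with the logit link, only the simulation (or quadrature) route is available.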
Leslie Roos posted on Wednesday, September 09, 2015 - 10:08 am
I'm conducting a longitudinal growth model of binary data using the ML estimator and have run a basic growth model as well as a growth model with i and s regressed on a number of predictors. I have a couple of questions, as well as an error message question. Thanks for your time!
(1) What would you recommend reporting regarding model fit indices for the ML estimator?
(2) Should I compare loglikelihood values between the basic & predictor models to show that the predictor model is a better fit?
(3) Error message: "THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX." The 'problem' parameter changes each time I run the analyses. I also included Monte Carlo integration with the ML estimator after reading a previous post, but I'm not sure of its purpose (and I obtain the same error message with or without it).
(4) I've requested CINTERVALS for the log odds ratios but do not obtain any in the output -- why might this be?
(2) No, I think it is enough to see that the predictors have significant slopes.
(3-4). Send output and license number to support.
Leslie Roos posted on Wednesday, September 16, 2015 - 10:29 am
Thank you for this input -- would you be able to direct me to a resource for interpreting/reporting indices of model fit for binary latent growth models from Tech10? Because we have 5 time points in the growth model, it seems that there are multiple options for the 'bivariate fit information.'
i.e., is the Overall Bivariate Pearson Chi-square or the Overall Loglikelihood Chi-square most important, vs. the individual Bivariate Pearson Chi-square or Loglikelihood Chi-square indices from each of the 10 pairs? What are the chi-square thresholds for these indicators that would indicate good model fit?
I would look at each pair. The Pearson and likelihood chi-squares should agree - otherwise trust neither. I would use this information in a more descriptive way, as for the crime example in the paper on our website:
Muthén, B. & Asparouhov, T. (2009). Growth mixture modeling: Analysis with non-Gaussian random effects. In Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G. (eds.), Longitudinal Data Analysis, pp. 143-165. Boca Raton: Chapman & Hall/CRC Press.
Leslie Roos posted on Friday, September 25, 2015 - 12:04 pm
Thanks for your response Bengt --
I'm trying to identify which stats from the output should be reported for our basic bivariate growth model, before moving onto describing results from a model with additional covariates.
The Pearson and Loglikelihood Chi-squares from Tech10 are very similar for each pair (i.e., 25.3 & 27.0; 11.7 & 12.3; 14.0 & 14.5; 10.5 & 10.8). Would you consider these numbers to 'agree', or are there further stats that should be run here?
The 'Model Fit Information' Pearson and Loglikelihood Chi-squares agree with each other in that they are both significant. Were you referring to these or to the Tech10 information?
Regarding the paper you reference, we have looked into the crime example, but it seems somewhat different in that it concerns LCGA (vs. a basic latent growth model). I'm not clear on whether you were referring to the descriptives in Table 6.1, and what would make sense to report for our example, as we do not have iterative comparative models.
Leslie Roos posted on Wednesday, December 02, 2015 - 1:23 pm
Hi Bengt, I wanted to follow up about the question, above, and ask if this would be a more appropriate question to email vs. discussion board post? Thank you!
The last column gives z-tests so you have significant misfit for 2 of the 4 cells in that bivariate table. If other pairs also have several significant spots of misfit you may want to modify your model. For instance, use a quadratic model instead of a linear one.
Leslie Roos posted on Monday, January 11, 2016 - 4:50 pm
Thank you so much for this response. It makes sense that the 2 residual z-scores > 1.96 indicate poor fit. I ran the quadratic term you suggested in the basic model, and neither the mean nor the variance is significant.
You could try to correlate residuals for time-adjacent outcomes to improve model fit. With binary outcomes and ML this calls for adding factors that make the items correlate beyond the growth factors. With WLSMV and Bayes there is no such need.
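With ML, "adding factors" to induce a residual correlation between two time-adjacent binary outcomes might look like this sketch (the outcome names y1 and y2, growth factor names i and s, and the unit loadings are assumptions; each extra factor adds a dimension of numerical integration):

```
! method factor making y1 and y2 correlate beyond the growth factors
f12 BY y1@1 y2@1;      ! loadings fixed at 1
f12 WITH i@0 s@0;      ! uncorrelated with the growth factors
[f12@0];               ! mean fixed at zero; variance left free
```

With WLSMV or Bayes, a simple "y1 WITH y2;" residual correlation can be used instead, which is why no extra factors are needed with those estimators.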
Leslie Roos posted on Monday, January 25, 2016 - 4:47 pm
Got it -- it seems that a simple 'Si_T1 WITH Si_T2 Si_T3...' doesn't work because the indicators Si_T1, Si_T2, etc. are categorical/binary -- is there alternate syntax that we could be directed to, in line with "adding factors that make the items correlate beyond the growth factors"? Thanks!