dagmar posted on Thursday, February 07, 2002 - 5:17 pm
Is it possible to do LCA with binary and categorical indicators, as well as continuous outcomes? Mplus manual includes examples of LCA with binary indicators and continuous outcomes and of LCSA with 3-category latent class indicators. How would the syntax be combined? Thanks. Dagmar
Yes, this is possible although it would not technically be Latent Class Analysis or Latent Profile Analysis. Indicators on the CATEGORICAL list would be binary or polytomous and their thresholds would be referred to using the $ convention, while variables not on the CATEGORICAL list would be assumed to be continuous and would be referred to by their variable names.
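As a hedged sketch of how such a mixed-indicator mixture model might be set up (the variable names u1-u3 and y1-y2 and the 2-class choice are illustrative, not from this thread):

```
TITLE: mixture with categorical and continuous indicators (sketch);
VARIABLE:
  NAMES = u1-u3 y1 y2;
  CATEGORICAL = u1-u3;      ! binary/polytomous indicators
  CLASSES = c(2);
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %c#1%
  [u1$1 u2$1 u3$1];         ! thresholds via the $ convention
  [y1 y2];                  ! means of the continuous indicators
```

A polytomous indicator with k categories would have k-1 thresholds (u1$1, u1$2, ...), while the continuous variables are referenced by name only.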
dagmar posted on Sunday, February 17, 2002 - 5:09 pm
Hello, I tried writing the syntax for this nameless analysis, but can't get it to run. Some of the categorical variables are binary, others have three to five categories. I don't know whether this is the right way to deal with this. Thanks very much for your help--the error message is at the end of the syntax file. Dagmar
TITLE: Categorical and Continuous Variables; MS Survey
DATA: FILE IS "C:\My Documents\work\MS\Profile Analysis\ms.dat";
Error message:
*** WARNING in Model command
All variables are assumed to be y-variables. Check that the covariances between these and other variables are as intended.
*** ERROR in Model command
Unknown threshold value 2 for variable MSMH9
*** ERROR in Model command
Ordered thresholds 3 and 4 for class indicator PAIN4 are not increasing. Check your starting values.
*** ERROR
The following MODEL statement(s) in Class 1 are ignored: [ MSMH9$2 ]
What the error message is trying to tell you is that PAIN4 has four thresholds. You are giving starting values for three, so we are assigning zero to the fourth by default, and zero is smaller than the value assigned to threshold 3.
Regarding MSMH9, it must be binary but you are assigning two thresholds so we are ignoring the second.
It might be a good idea to run a regular TYPE=BASIC on your data to see how many thresholds each variable has according to Mplus, to make sure this is what you expect.
I am new to IRT/LCA methods, and have a question. I have data that were collected where respondents had a choice between a dichotomous (yes/no) response and 4 point likert type (degree of agreement) response.
Is this something that can be done using MPlus?
Thank you, Howard
bmuthen posted on Monday, October 21, 2002 - 2:28 pm
Do you mean that the subject chose the response format, or that the 2 response types were randomly offered to subjects? If the former, I guess one has to figure out which type of person tended to choose which format, and whether the responses of similar people measured by the different formats relate the same way or differently to other variables. I think I have seen an analogous situation in achievement testing.
Howard posted on Monday, October 21, 2002 - 5:47 pm
The former: individual respondents chose the response format. Respondents in this case are elderly nursing home residents. Those who were more cognitively impaired were more likely to use the dichotomous response format. (There may be other variables, but we have not done thorough exploration of this.) We have done some multiple group CFA on our instruments to see if the same multidimensional structure holds up. We found evidence of stability, but also some differences. However, this could only be done by recoding the dichotomous data onto the 1-4 scale (using a z-score). IRT has been recommended as a preferred approach to calibrating the response formats. It seems like we could do a 2-class LCA model using cognitive status as a predictor of class membership. Part of my confusion has to do with the fact that we want to do CFA on our multidimensional structure, then proceed to causal modeling. However, I think the LCA part and the factor structure are interdependent. For example, what if a particular item should be deleted or assigned to a different latent variable? Won't that change the associated IRT parameters?
As I write this, I wonder if it matters - once we have fit the best model, won't we optimize both the factor structure and the IRT part?
Thank you for your reply this afternoon. I appreciate your further advice.
bmuthen posted on Tuesday, October 22, 2002 - 10:15 am
I think the measurement modeling is strongly intertwined with the structural modeling in this situation. It is hard to give general advice without knowing the situation better, but I'll give you some quick thoughts. One can certainly jointly analyze the people who took the 2-point questions and the 4-point questions - without reformatting the 4-point data. And one can see if descriptively similar factor structures arise. And then relate factors to other variables. Such a 2-group analysis is, however, unusual in that both variables and people are different across groups, whereas typically only the people are different (and they are also typically obtained through random sampling, which is not the case here because people choose group membership) - so in the end we are not sure we are measuring the same thing. This would also be true for the LCA. You mention CFA, factor structure, IRT, and LCA - it is not clear to me if you think of the first 3 as different - I don't. I see LCA and CFA as complementary procedures, and both would be affected by the non-random choice of response formats, a non-random choice that may distort structural conclusions. I don't know if one can assume in this case that, conditional on cognitive impairment status, the choice between the 2 formats is random, and I am not sure how this could be incorporated into the modeling.
Howard posted on Tuesday, October 22, 2002 - 11:15 am
Well, I put a lot of ideas out there, but perhaps I should be more specific about my goals. We have collected the data in this way to allow people with limited cognitive abilities to provide as much information as they can (instead of being coded as missing). We now want to combine the data together. So my question is, can we use LCA to do that? If so, how? and what assumptions would we have to make about the multidimensional structure? Thank you.
bmuthen posted on Tuesday, October 22, 2002 - 1:44 pm
Tell me a bit more. How are you thinking about the LCA - e.g. how would the LCA classes be different from the known groups of individuals (those who chose the 2-cat format and those who chose the 4-cat format)? Are you thinking that there are 2 classes of individuals, having to do with cognitive impairment, and the tendency to choose one format over the other is related to those 2 classes?
Howard posted on Wednesday, October 23, 2002 - 10:41 am
Yes, I am thinking that there is a tendency to use one response format or the other. You are right that the class is observed: we know their choices. What is not observed is why.
One point to clarify: The choice is at the item level - some people choose 2-cat for only a small number of items, using 4-cat mostly. Others use 2-cat mostly.
bmuthen posted on Wednesday, October 23, 2002 - 6:03 pm
Let me think aloud although it may not get at what you are after. It sounds like it would be valuable to achieve a calibration of the 2 formats, so that one can translate the response to one of the formats into the other format. If not, you don't know if relationships between the response and covariates differ across people because of the format difference or because of people differences.
The problem of getting a calibration is that nobody has taken both formats. Optimally, a random subset of individuals should take both formats. I wonder if it is possible to calibrate even without this. Perhaps you have to first study (by logistic regression with a binary dependent outcome and with cognitive impairment etc as predictors) who chooses which format. And then see, for example if impairment is the key factor, how format outcomes relate for people with the same impairment, but who chose different formats. Once the responses on different formats are calibrated, you can get to the structural part.
But, my answer may be beside the point. I remember that ETS had a related situation where people could choose a reading topic and then answer questions on it. Perhaps other readers remember this, or have input to give.
Howard posted on Wednesday, October 23, 2002 - 7:25 pm
If we were to do a logit to predict response format, how would that help us calibrate the data? It sounds like we would still need to do an experimental study where people did both response formats.
If we were to plan such an experiment where a sample of people responded to both forms (in random order), the following questions arise: (1) how would we analyze the data to develop the proper calibration parameters; and (2) how would we determine the sample size requirements for such a study? In some places I have read that doing this with IRT requires very large samples, e.g., n > 1000.
It occurs to me that within each level of cognitive impairment (on a 6-point scale), there are some people who used each format. Could we use these groups for calibration? There may be some heterogeneity with respect to cognitive impairment within each level of our cognitive impairment score. However, this would give us at least a rough estimate of the parameters we are interested in.
Thank you very much for all of your advice so far.
bmuthen posted on Thursday, October 24, 2002 - 6:53 am
Your paragraph "It occurs to me..." refers to the approximate calibration I had in mind in my paragraph "The problem of...". The logistic regression would make it plausible that impairment was a key factor in the choice.
If you had an experiment with a random subsample of people responding to both forms, you would have 3 subgroups of people (one group has the 2-cat format, one has the 4-cat format, one has both formats). You could calibrate within a 3-group analysis while also doing the structural analysis. By this I mean that the random subsample has 2 indicators (one 2-cat, one 4-cat), which enables identification of a factor. Each of the groups with only one indicator would have its threshold and loading held equal to those of the corresponding indicator in the both-formats group. I don't think you need n > 1000; it depends on the number of variables. But I think I will now step aside from this topic and leave room for a consultant who can get more familiar with your particular situation.
Howard posted on Thursday, October 24, 2002 - 9:01 am
Thank you very much for all of your careful attention to these questions.
If I understand your last comment correctly, if we had a sample that did both forms, we could set the parameters as you mention. But in the absence of such a sample we need to use an 'approximate' calibration approach.
This has been very helpful - I feel much better about what questions to ask of a consultant.
Thank you, Howard
bmuthen posted on Thursday, October 24, 2002 - 9:46 am
Hi, I am new to Mplus and LCA. The problem I have at hand is that I have to assign values ranging from 0 to 100, called disability weights, to a sample of 200 individuals based on 5 continuous variables which can be considered indicator variables. Can this be handled under the umbrella of latent structure analysis, and with Mplus?
Thank you for your prompt reply. Is it possible to consider this as a case of grouping individuals into 100 different classes? What I ultimately require is a disability score for each individual, ranging from 0 to 100, based on the 5 variables that are indicators of the patient's health condition.
Perhaps the following paper can help you understand the difference between factor analysis and latent class analysis. It can be downloaded from the Mplus website under Mplus Papers.
Muthén, B. & Muthén, L. (2000). Integrating person-centered and variable-centered analyses: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research, 24, 882-891.
I was wondering if there were guidelines for the local independence assumption in LCA, in terms of how to handle indicators that may be correlated (approximately .65). I have 10 indicators I'm interested in using, but 2 of the variables are correlated. The rest of the indicators don't correlate higher than .35. It is my understanding that local independence is likely violated here. Should the indicators with the higher correlation be combined? Does Mplus have any features for handling local dependence?
I don't think that the assumption of conditional independence in LCA says anything about the correlation of the observed latent class indicators. I think that the assumption means that the model is estimated such that the residual covariances among the observed latent class indicators are zero.
Thank you for your prompt reply. I had recently received a review on a manuscript that used LCA to define subtypes of weightlifters in terms of body image disturbance, and the reviewer claimed that the correlations between observed indicator variables suggested violations of conditional independence. I will look into the residual covariances. Do you know of any useful references on this assumption in LCA?
bmuthen posted on Tuesday, November 23, 2004 - 8:19 am
Just to add to this discussion. Note that the LCA model - with its assumption of conditional (local) independence - tries to explain correlations among observed variables. With zero observed correlations there is nothing to explain. So high correlations among observed variables are not a sign that the model assumptions are violated; on the contrary. Think about how 2 variables become correlated by mixing 2 classes of uncorrelated individuals, one class low on both variables and one class high on both variables (plot it and you will see).
You may also want to model local DEpendencies among some variables within class if that makes substantive sense - and this can be done using Mplus. For a good reference, see the edited book:
Hagenaars, J.A & McCutcheon, A. (2002). Applied latent class analysis. Cambridge: Cambridge University Press.
This is very helpful indeed. How would I go about modeling conditional (local) dependencies in Mplus?
I will order the book today.
bmuthen posted on Tuesday, November 23, 2004 - 10:24 am
For continuous outcomes it is easy; just say y1 WITH y2 within a certain class. For other outcomes it is harder, but the Version 3 User's Guide has an example. Note, however, that you should have a good substantive reason for making this deviation from the standard model.
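A minimal sketch of the continuous case just described (the class label and variable names are illustrative):

```
MODEL:
  %c#1%
  y1 WITH y2;   ! within-class residual covariance (local dependence)
```

Placing the WITH statement under %c#1% frees the residual covariance in that class only; putting it under %OVERALL% would free it in all classes.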
Salma Ayis posted on Tuesday, June 06, 2006 - 3:52 am
How can I detect for local dependency among the indicators while fitting a Latent class model?
You can look at the standardized bivariate residuals in TECH10. If you have many large standardized residuals, you can increase the number of classes and see if that reduces the residuals. You can also add a factor to the model that has as indicators the latent class indicators.
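A hedged sketch of both suggestions (the indicator names u1-u10 and the single factor are illustrative; a factor with categorical indicators would typically also require ALGORITHM=INTEGRATION):

```
MODEL:
  %OVERALL%
  f BY u1-u10;   ! factor capturing within-class dependence
                 ! among the latent class indicators
OUTPUT:
  TECH10;        ! standardized bivariate residuals
```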
I would like to predict continuous outcomes (out1-out4) from a categorical latent variable (latent profile analysis, 7 continuous indicators, 5 classes). I tried the following syntax (with numerical integration), but it does not work: out1 out2 out3 out4 ON C#1-C#4;
Is there any other way to do it? I imagine I could just add the outcomes to the USEVARIABLES list without any other specifications, but then they would be used in the clustering algorithm, and I do not want these variables to influence the classification (as outcomes).
But how can I do it if I do not want the outcomes to influence the classes?
The only thing I can think of is to work from the best-fitting model without outcomes and then fix ("@") the class characteristics (means, variances) before including the outcomes. However, given that I have almost as many class indicators as outcomes, I'm afraid it may reflect badly on the model.
Otherwise, I will have to work from the saved class memberships and analyse the outcomes via MANOVA. But then I will lose the information from the posterior class probabilities...
I would not recommend either of these approaches. When you have a distal outcome, the parameters of interest are the means and how they vary across classes. I would instead use the AUXILIARY option with the e setting to test the equality of the means of the distal outcome variables across classes. See the user's guide for more information about this option.
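A hedged sketch of the AUXILIARY option with the e setting (the indicator names u1-u7 and outcome names out1-out4 are illustrative, borrowed from the question above):

```
VARIABLE:
  NAMES = u1-u7 out1-out4;
  USEVARIABLES = u1-u7;       ! only these define the classes
  AUXILIARY = out1 (e) out2 (e) out3 (e) out4 (e);
  CLASSES = c(5);
ANALYSIS:
  TYPE = MIXTURE;
```

The (e) setting produces equality tests of the auxiliary-variable means across classes without letting those variables influence class formation.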
Reading the technical appendix on this function, I saw that it can be used for any covariate. When would you suggest using this method versus the direct inclusion of covariates as predictors of class membership? In my case, I have a theoretical/practical rationale (antecedents are included as predictors of c#1-c#4 in the model, and outcomes, which arrive after, are kept out of the model so as not to influence something occurring before), but are there any other arguments to consider in this choice (other than the slower computation and harder interpretability already noted in the appendix)?
You are right, as usual. Excuse the confusion; I got carried away by the new feature.
Sara posted on Sunday, February 17, 2008 - 12:16 pm
I have 2 questions regarding the "new" function of the AUXILIARY option to test the equality of means for variables not used in the mixture model (auxiliary variables). First, I have 3 classes and used this function ("auxiliary" with "e" behind the auxiliary variables of interest) to produce "EQUALITY TESTS OF MEANS ACROSS CLASSES USING POSTERIOR PROBABILITY-BASED MULTIPLE IMPUTATIONS WITH 2 DEGREE(S) OF FREEDOM" and the corresponding class means. I can see that the omnibus test for a variable is significant. Is there a way to produce class-by-class tests of mean differences (post-hoc tests)?
Second, where can I find more information regarding exactly how these means and tests are computed using "POSTERIOR PROBABILITY-BASED MULTIPLE IMPUTATIONS"? I couldn't find anything about this in the Mplus Version 5 User's Guide.
Not in the current version - such information is obtained only if the variable is in the model (e.g. as a covariate). What would you use the class-specific variance for - are you aiming to get the standard error for each mean?
In the upcoming Mplus version 5.1 the SEs for the means will be provided.
Herb Marsh posted on Wednesday, March 12, 2008 - 2:28 am
I have conducted an LPA based on 8 y variables used to define 5 groups (arrows from ys to latent groups). I also have a set of 10 criterion variables that might be considered as auxiliary variables in MPLUS terminology. I have done separate analyses based on:
I have a latent profile analysis with 8 y variables that define 5 groups and 10 criterion variables as auxiliary variables.
I can test the equality of the criterion variable means across the five latent groups. Although I understand that these are based on pseudo-class draws (equality tests of means across latent classes using Wald chi-square based on draws from posterior probabilities - a strategy that seems to combine relating the auxiliary variables to group dichotomies and group probabilities), I am not sure how to interpret them.
The values for auxiliary variables are, I assume, means for each variable on the corresponding pseudo groups (classes in the output).
How do these relate to the corresponding path coefficients or odds ratios in analyses in which the criterion variables are treated as covariates?
Can I determine the amount of variance in one set of variables (group dichotomies or probabilities) explained by the other (auxiliary variables)? - or even the amount of variance in each grouping variable (dichotomy or probability) explained by the set of auxiliary variables?
Regarding the equality testing of means for auxiliary variables (criterion variables; I'll call them x's in the following) across classes, you are correct in your interpretation. I see this as univariate information, not informative about including all the auxiliary variables as predictors of latent class membership (c in the following). So this is just a first step for seeing which x's might be useful. This information does not directly relate to having several of the x's as predictors of c. In Mplus Version 5.1, coming out next month, we will have a second-stage approach where we do pseudo-class draws and provide output for "c ON x" multinomial regressions. This new output is relevant for choosing a set of x's.
Regarding the amount of variance explained, I don't think that is a common usage in logistic regression or multinomial logistic regression. Perhaps a better way to understand how important x's are is to plot the probability curves for c on x as can be done in Mplus graphics.
I conducted an LCA with covariates (7 indicators; 12 covariates) using complex survey data, which seems to work fine. However, when I read the saved data (from the SAVEDATA command) into Stata for the next analyses, I found that the raw class counts based on most likely latent class membership in the Mplus output differ from those in the saved data. Later I found that the proportion in Mplus (based on most likely class membership) is the same as the estimated proportion in the saved data (using the Stata survey estimation module). Now, the question is: which class counts would you recommend reporting? Would you rather not report class counts at all with survey data? It seems to me that class counts in complex survey data may be misleading. Any suggestion would be most appreciated. Thanks.
If you are not using Version 5.1, please do. If you are, please send your input, data, output and license number to firstname.lastname@example.org.
Argyris posted on Thursday, August 14, 2008 - 12:33 pm
Hello, I would like to use categorical variables, such as gender or diagnostic group, as potential class predictors in an LCA. I know I can test the equality of means for continuous variables using the AUXILIARY option. Is there an option for categorical variables? Is it robust/legitimate to use the same one? Also, is there a way to get the proportions of categorical variables per class (as in a contingency table) in the output without having to read them from the histogram? Many thanks.
You are correct. You can get them from the AUXILIARY option. The means are given for each class.
Argyris posted on Friday, August 15, 2008 - 5:28 am
Thank you. Yes, true about the means in AUXILIARY. It does not, however, seem to provide the proportions of an (auxiliary) categorical variable per latent class - say, the proportion of males in latent class 1. I seem to be able to get this from the histogram by resting the mouse on the relevant bar. I just wanted to confirm that this is right, and to find out whether there is an alternative. Sorry to persevere.
Argyris posted on Friday, August 15, 2008 - 8:51 am
Quite. These means/proportions of auxiliary categorical variables do not appear anywhere in the output, though - only in the histograms, it would seem. The means provided in the equality-of-means part of the output are the means of the numeric codings of the variables (e.g., 1.43 for a variable coded between 1 and 2; see copied output below). The query is more about confirming that what I get by resting the cursor on the bars is, indeed, the mean/proportion, given that it appears nowhere else. Sorry about the confusion. Thanks very much.
GENDER07    Mean    S.E.
Class 1     1.321   0.053
Class 2     1.454   0.055
Class 3     1.537   0.037
Class 4     1.520   0.031
When there are categorical indicators in an LCA, the output includes the LATENT CLASS ODDS RATIO RESULTS section. I understand that these odds ratios represent comparisons of each pair of classes. However, I'm not sure how to interpret the statistical significance test. The way it's calculated, it appears to be a test of whether the odds ratio differs significantly from 0. Is that correct? (In other words, one should not interpret these to mean that the odds ratio is significantly different from 1?) Is there a way to use these results to determine whether the odds ratio differs from 1?
Yes, the 3rd column of the Mplus output is always Est./S.E., so this tests against zero. To test against the more relevant value of 1 in this case, you simply consider
(Est. - 1)/S.E.
But it is more common to provide a confidence interval around the odds ratio point estimate. This interval can be requested with CINTERVAL in the Mplus OUTPUT command. As in the literature, these intervals are derived from those of the log odds and then exponentiated.
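A small numeric illustration of the two approaches above (all estimates are made-up values, not from any actual Mplus output):

```python
import math

# Hypothetical values (not from any real Mplus run):
or_est, or_se = 1.95, 0.40        # odds ratio estimate and its S.E.
logodds, logodds_se = 0.67, 0.21  # corresponding log odds and S.E.

# Mplus's Est./S.E. column tests against zero; for an odds ratio the
# relevant null value is 1, so shift the numerator:
z = (or_est - 1) / or_se

# Preferred: 95% CI built on the log-odds scale, then exponentiated
lo = math.exp(logodds - 1.96 * logodds_se)
hi = math.exp(logodds + 1.96 * logodds_se)
```

If the resulting interval (lo, hi) excludes 1, the odds ratio differs from 1 at the 5% level.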
Jinseok Kim posted on Tuesday, October 20, 2009 - 11:47 pm
I conducted an LCA with the AUXILIARY option. It ran with no problem but presented no SE values for one variable, only "*******". Does this mean that the SE of the auxiliary variable was not computed, or that it is "out of range"? In either case, is there any way to get the SE values for the variable for each class?
I am conducting a fairly simple LCA on complete cases only (LISTWISE IS ON). When I use the AUXILIARY option to compare means across classes for some auxiliary variables, the number of observations used in the model is reduced - presumably because Mplus is now using the subset of observations for which all the model variables *and* all the auxiliary variables are non-missing. Is it possible to make Mplus fit the model to the same subset of observations it would use if AUXILIARY were not specified, but still compare means for the auxiliary variables (where available)?
Before Version 6 the listwise deletion was done on the full set of variables including the auxiliary variables. This has been changed in Version 6 so that the listwise deletion is done on only the analysis variables.
K Frampton posted on Thursday, April 21, 2011 - 10:07 am
Hello. I have a 5-class LCA with 7 continuous indicators. I'm trying to decipher whether there are significant class differences in particular indicators' means (e.g., whether class 1 and class 2 are significantly different from each other on indicator 1). This seems so simple, but I'm having a hard time interpreting the output. Is this provided? If not, any guidance on how to calculate this is much appreciated! Thank you!
I would like to test a relationship using a binary variable as a dependent variable and a few continuous variables as predictors. I have read in the manual that this type of analysis is allowed. Could you please confirm this?
Also, which chapter should I read if I'd like to explore this analysis further?
I am running an LCGA model with several continuous covariates and a continuous distal outcome - Time 2 teacher report of externalizing. I am controlling for the Time 1 teacher report of externalizing in predicting the Time 2 report; however, Time 1 externalizing is not currently included as a covariate in the model.
To estimate class differences on Time 2 externalizing controlling for Time 1 externalizing, is it enough to have "Extern2 on extern1" in the model or should I also include Time 1 externalizing as a covariate of class.
To clarify, are you talking about 2 different variables - "externalizing" and "Teacher report of Externalizing"? Perhaps the former is self-reported? So that you have these 2 variables at time 1 and also these 2 variables at time 2?
Thank you for your quick response Dr. Muthen. My apologies for not being clear. I am talking about the same variable- both Time 1 and Time 2 externalizing scores are teacher report. I want to assess between class differences on teacher report of externalizing at Time 2 (Extern2) controlling for teacher report of externalizing at Time 1 (Extern1). I have 4 classes.
My syntax includes "Extern2 on Extern1*" and I allow this estimate to vary between classes. But do I also need to include Extern1 as a covariate of class membership? That is, do I also need to include:
I'm trying to save posterior probabilities and class assignments in an LCGM, but I'm getting the following error:
*** ERROR in SAVEDATA command The syntax for the FILE option has changed. Please refer to the Mplus User's Guide for available options.
This is my syntax:
SAVEDATA: FILE IS C:\Users\Jaclyn\Documents\PhD\Analysis\General delinquency\ Linear LCGA - with FIML FINAL MODELS with assignment\LCGA count ZIP (inc non-offenders FIML) FINAL\males; save = cprobabilities;
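One likely culprit (an assumption on my part, not a diagnosis confirmed in the thread) is the unquoted path containing spaces: Mplus requires quotes around file names with embedded blanks. A hedged sketch of a corrected command:

```
! Hedged sketch: quote the full path because it contains spaces
! (path abbreviated here; the .dat extension is also an assumption)
SAVEDATA:
  FILE IS "C:\Users\Jaclyn\...\males.dat";
  SAVE = CPROBABILITIES;
```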
Hi Linda - Thank you for your response. If auxiliary (e) is not possible given the multiple categorical latent variables, would you suggest setting up a series of Wald tests to see if the means at level 1 are significantly different by level-2 profile?
Hi Linda - When I tried to use the Wald tests of equality, I ran into trouble because the variables I would like to test using MODEL statements are at level 1. I am trying to test whether auxiliary variables at level 1 are significantly different across the level-2 profiles. However, because my latent profile model has 3 between and 1 within profiles, Mplus only sees one group/profile at level 1, and I cannot test the mean differences at level 1. Also, it seems that Mplus will not let me request the tests of equality on level-1 variables in the MODEL BETWEEN statements. Is there a way to get these mean differences at level 1 based on level-2 profile membership, without outputting the profile memberships or predicted probabilities? Thank you for your help, Ann-Marie
I don't think it is possible to do user-specified Wald tests based on auxiliary variables - that is, variables that are not part of the model - you have to have model parameter labels that you refer to.
I'm trying to do LCA with binary variables. I performed multiple imputation to deal with missing data.
However, the "results in probability scale" and "odds ratio results" sections do not appear in my output. I guess it is because I supplied 5 imputation files. How, then, can I treat missing data when doing LCA? I know the data file is required to contain no blanks (missing data).
You are reading your data incorrectly. You either have more names in the NAMES list than you have columns of data or you have blanks in your data set and are reading it as free format.
Chia-Yi Chiu posted on Saturday, September 07, 2013 - 1:58 pm
I am fitting binary data with LCA in the context of cognitive diagnostic models. Because the model is a reparametrization of the logistic model (not directly a logistic model), I need to impose some constraints on the parameters. When the highest interaction term in the logistic model is 3-way, the iterations converge. However, when the highest interaction term goes up to 4-way, Mplus gives the following error message:
STARTING VALUES FOR THE DEPENDENT PARAMETERS COULD NOT BE COMPUTED FROM THE STARTING VALUES FOR THE INDEPENDENT PARAMETERS. CHANGE THE STARTING VALUES FOR THE INDEPENDENT PARAMETERS.
THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES.
It looks like the starting value is an issue. However, I did not specify any starting value, but used
STARTS = 0;
I then changed the number of random starts to
STARTS = 100 10;
But it did not work either. Any suggestion is welcome. Thanks!!