Message/Author 

dagmar posted on Thursday, February 07, 2002  5:17 pm



Is it possible to do LCA with binary and categorical indicators, as well as continuous outcomes? The Mplus manual includes examples of LCA with binary indicators and continuous outcomes and of LCA with 3-category latent class indicators. How would the syntax be combined? Thanks. Dagmar 


Yes, this is possible, although it would not technically be Latent Class Analysis or Latent Profile Analysis. Indicators that are on the CATEGORICAL list would be binary or polytomous and their thresholds would be referred to using the $ convention, while variables that are not on the CATEGORICAL list would be assumed to be continuous and would be referred to by their variable names. 

dagmar posted on Sunday, February 17, 2002  5:09 pm



Hello, I tried writing the syntax for this nameless analysis, but can't get it to run. Some of the categorical variables are binary, others have three to five categories. I don't know whether this is the right way to deal with this. Thanks very much for your help; the error message is at the end of the syntax file. Dagmar

TITLE: Categorical and Continuous Variables; MS Survey
DATA: FILE IS "C:\My Documents\work\MS\Profile Analysis\ms.dat";
VARIABLE:
  NAMES ARE id msmh9 es1 uhw2 speech thinking pain1 pain4 edss marryliv cours2 female educ lwith vscat age_10 dur_10 wk_40 perinc cesd_10 hhinc50 volhalf hhwk_5 fat_5 socsup10;
  USEVARIABLES ARE msmh9 es1 uhw2 speech thinking pain1 pain4 edss marryliv cours2 female educ lwith vscat age_10 dur_10 wk_40 perinc cesd_10 hhinc50 volhalf hhwk_5 fat_5 socsup10;
  MISSING ARE ALL (999);
  CLASSES = c (1);
  CATEGORICAL = msmh9 es1 uhw2 speech thinking pain1 pain4 marryliv cours2 female educ lwith vscat;
ANALYSIS:
  TYPE IS MIXTURE MISSING;
  LOGHIGH = +15;
  LOGLOW = -15;
  UCELLSIZE = 0.01;
  ESTIMATOR IS MLR;
  LOGCRITERION = 0.0000001;
  ITERATIONS = 1000;
  CONVERGENCE = 0.000001;
  MITERATIONS = 500;
  MCONVERGENCE = 0.000001;
  MIXC = ITERATIONS;
  MCITERATIONS = 2;
  MIXU = ITERATIONS;
  MUITERATIONS = 2;
OUTPUT: SAMPSTAT RESIDUAL TECH1 TECH4 TECH5 PATTERNS;
SAVEDATA:
  FILE IS rawdatams1;
  FORMAT IS FREE;
  RECORDLENGTH = 1000;
  SAVE = CPROBABILITIES;
  FILE (SAMPLE) IS samplems1;
  FILE (RESULTS) IS resultsms1;
  FILE (TECH3) IS tech3ms1;
  FILE (TECH4) IS tech4ms1;
MODEL:
  %overall%
  %c#1%
  [msmh9$1*6 es1$1*5 uhw2$1*4 speech$1*3 thinking$1*2 pain1$1*1 pain4$1*0 marryliv$1*1 cours2$1*2 female$1*3 educ$1*4 lwith$1*5 vscat$1*6];
  [msmh9$2*7 uhw2$2*6 speech$2*5 thinking$2*4 pain4$2*3 cours2$2*2 educ$2*1 lwith$2*0];
  [speech$3*6 thinking$3*5 pain4$3*4 educ$3*3 lwith$3*2];
  [thinking$4*6 educ$4*5];
  [edss*1 age_10*1 dur_10*1 wk_40*1 perinc*1 cesd_10*1 hhinc50*1 volhalf*1 hhwk_5*1 fat_5*1 socsup10*1];
  edss age_10 dur_10 wk_40 perinc cesd_10 hhinc50 volhalf hhwk_5 fat_5 socsup10;

Error message:
*** WARNING in Model command All variables are assumed to be y-variables. Check that the covariances between these and other variables are as intended.
*** ERROR in Model command Unknown threshold value 2 for variable MSMH9
*** ERROR in Model command Ordered thresholds 3 and 4 for class indicator PAIN4 are not increasing. Check your starting values.
*** ERROR The following MODEL statement(s) in Class 1 are ignored: [ MSMH9$2 ]


What the error message is trying to tell you is that PAIN4 has four thresholds. You are giving starting values for three, so we are assigning zero to the fourth by default and zero is smaller than the value assigned to threshold 3. Regarding MSMH9, it must be binary but you are assigning two thresholds so we are ignoring the second. It might be a good idea to run a regular TYPE=BASIC on your data to see how many thresholds each variable has according to Mplus to make sure this is what you expect. 
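
The two errors discussed in this reply follow from two rules: an indicator with k observed categories has k - 1 thresholds, and the starting values for those thresholds must be increasing. Here is a minimal Python sketch of that check. The helper `check_thresholds` is hypothetical (it is not part of Mplus), and treating equal values as an error is an assumption. The PAIN4 and MSMH9 starting values are the ones from the post, with the unspecified fourth PAIN4 threshold defaulting to zero as described.

```python
def check_thresholds(n_categories, start_values, default=0.0):
    """Return a list of problems with threshold starting values.

    n_categories : number of observed categories of the indicator
    start_values : dict mapping threshold number (1-based) to starting value
    default      : value assumed for thresholds given no starting value
    """
    n_thresholds = n_categories - 1
    problems = []
    # Thresholds numbered above k - 1 do not exist for this variable.
    for t in sorted(start_values):
        if t > n_thresholds:
            problems.append(f"unknown threshold {t}")
    # Fill in unspecified thresholds, then require an increasing sequence.
    values = [start_values.get(t, default) for t in range(1, n_thresholds + 1)]
    for t in range(1, n_thresholds):
        if values[t] <= values[t - 1]:
            problems.append(f"thresholds {t} and {t + 1} are not increasing")
    return problems


# PAIN4: five categories, starting values given for thresholds 1-3 only.
pain4 = check_thresholds(5, {1: 0, 2: 3, 3: 4})
# MSMH9: binary, but a second threshold was specified.
msmh9 = check_thresholds(2, {1: 6, 2: 7})
print(pain4)
print(msmh9)
```

Running TYPE=BASIC first, as suggested, is what supplies the `n_categories` value for each variable.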


I am new to IRT/LCA methods, and have a question. I have data that were collected where respondents had a choice between a dichotomous (yes/no) response and a 4-point Likert-type (degree of agreement) response. Is this something that can be done using Mplus? Thank you, Howard 

bmuthen posted on Monday, October 21, 2002  2:28 pm



Do you mean that the subject chose the response format or that 2 response types were randomly offered to subjects? If the former, I guess one has to figure out which type of person tended to choose which format and whether the responses of similar people measured by the different formats relate the same way or differently to other variables. I think I have seen an analogous situation in achievement testing. 

Howard posted on Monday, October 21, 2002  5:47 pm



The former: individual respondents chose the response format. Respondents in this case are elderly nursing home residents. Those who were more cognitively impaired were more likely to use the dichotomous response format. (There may be other variables, but we have not done thorough exploration of this.) We have done some multiple-group CFA on our instruments to see if the same multidimensional structure holds up. We found evidence of stability, but also some differences. However, this could only be done by recoding the dichotomous data onto the 1-4 scale (using a z-score). IRT has been recommended as a preferred approach to calibrating the response formats. It seems like we could do a 2-class LCA model using cognitive status as a predictor of class membership. Part of my confusion has to do with the fact that we want to do CFA on our multidimensional structure, then proceed to causal modeling. However, I think the LCA part and the factor structure are interdependent. For example, what if a particular item should be deleted or assigned to a different latent variable? Won't that change the associated IRT parameters? As I write this, I wonder if it matters: once we have fit the best model, won't we optimize both the factor structure and the IRT part? Thank you for your reply this afternoon. I appreciate your further advice. 

bmuthen posted on Tuesday, October 22, 2002  10:15 am



I think the measurement modeling is strongly intertwined with the structural modeling in this situation. It is hard to give general advice without knowing the situation better, but I'll give you some quick thoughts. One can certainly jointly analyze the people who took the 2-point questions and the 4-point questions, without reformatting the 4-point data. And one can see if descriptively similar factor structures arise. And then relate factors to other variables. Such a 2-group analysis is, however, unusual in that both variables and people are different across groups, whereas typically only the people are different (and they are also typically obtained through random sampling, which is not the case here because people choose group membership), so at the end we are not sure we are measuring the same thing. This would also be true for the LCA. You mention CFA, factor structure, IRT, and LCA. It is not clear to me if you think of the first 3 as different; I don't. I see LCA and CFA as complementary procedures and both would be affected by the nonrandom choice of response formats, a nonrandom choice that may distort structural conclusions. I don't know if one can assume in this case that conditional on cognitive impairment status, the choice between the 2 formats is random, and I am not sure how this could be incorporated into the modeling. 

Howard posted on Tuesday, October 22, 2002  11:15 am



Well, I put a lot of ideas out there, but perhaps I should be more specific about my goals. We have collected the data in this way to allow people with limited cognitive abilities to provide as much information as they can (instead of being coded as missing). We now want to combine the data. So my question is, can we use LCA to do that? If so, how? And what assumptions would we have to make about the multidimensional structure? Thank you. 

bmuthen posted on Tuesday, October 22, 2002  1:44 pm



Tell me a bit more. How are you thinking about the LCA, e.g. how would the LCA classes be different from the known groups of individuals (those who chose the 2-cat format and those who chose the 4-cat format)? Are you thinking that there are 2 classes of individuals, having to do with cognitive impairment, and the tendency to choose one format over the other is related to those 2 classes? 

Howard posted on Wednesday, October 23, 2002  10:41 am



Yes, I am thinking that there is a tendency to use one response format or the other. You are right that the class is observed: we know their choices. What is not observed is why. One point to clarify: the choice is at the item level. Some people choose 2-cat for only a small number of items, using 4-cat mostly. Others use 2-cat mostly. 

bmuthen posted on Wednesday, October 23, 2002  6:03 pm



Let me think aloud although it may not get at what you are after. It sounds like it would be valuable to achieve a calibration of the 2 formats, so that one can translate the response to one of the formats into the other format. If not, you don't know if relationships between the response and covariates differ across people because of the format difference or because of people differences. The problem of getting a calibration is that nobody has taken both formats. Optimally, a random subset of individuals should take both formats. I wonder if it is possible to calibrate even without this. Perhaps you have to first study (by logistic regression with a binary dependent outcome and with cognitive impairment etc as predictors) who chooses which format. And then see, for example if impairment is the key factor, how format outcomes relate for people with the same impairment, but who chose different formats. Once the responses on different formats are calibrated, you can get to the structural part. But, my answer may be beside the point. I remember that ETS had a related situation where people could choose a reading topic and then answer questions on it. Perhaps other readers remember this, or have input to give. 

Howard posted on Wednesday, October 23, 2002  7:25 pm



If we were to do a logit to predict response format, how would that help us calibrate the data? It sounds like we would still need to do an experimental study where people did both response formats. If we were to plan such an experiment where we had a sample of people respond to both forms (in random order), the following questions arise: (1) how would we analyze the data to develop the proper calibration parameters; and (2) how would we determine the sample size requirements for such a study? In some places I have read that to do this with IRT requires very large samples, e.g. n > 1000. It occurs to me that within each level of cognitive impairment (on a 6-point scale), there are some people who used each format. Could we use these groups for calibration? There may be some heterogeneity with respect to cognitive impairment within each level of our cognitive impairment score. However, this would give us at least a rough estimate of the parameters we are interested in. Thank you very much for all of your advice so far. 

bmuthen posted on Thursday, October 24, 2002  6:53 am



Your paragraph "It occurs to me..." refers to the approximate calibration I had in mind in my paragraph "The problem of...". The logistic regression would make it plausible that impairment was a key factor in the choice. If you had an experiment with a random subsample of people responding to both forms, you have 3 subgroups of people (one group has the 2-cat format, 1 has the 4-cat format, 1 has both formats). You could calibrate within a 3-group analysis while you are also doing the structural analysis. By which I mean that the random subsample has 2 indicators (1 2-cat, 1 4-cat), which enables identification of a factor. Each of the groups with only one indicator sets its threshold and loading equal to the loadings and thresholds of these 2 indicators. I don't think you need n > 1000; it depends on the number of variables. But I think I will now step aside from this topic and leave room for a consultant who can get more familiar with your particular situation. 

Howard posted on Thursday, October 24, 2002  9:01 am



Thank you very much for all of your careful attention to these questions. If I understand your last comment correctly, if we had a sample that did both forms, we could set the parameters as you mention. But in the absence of such a sample we need to use an 'approximate' calibration approach. This has been very helpful. I feel much better about what questions to ask of a consultant. Thank you, Howard 

bmuthen posted on Thursday, October 24, 2002  9:46 am



Yes. Good. 


Hi, I am new to MPLUS and LCA. The problem I have at hand is that I have to assign values ranging from 0 to 100, called disability weights, to a sample of 200 individuals based on 5 continuous variables that can be considered indicator variables. Can this be handled under the umbrella of latent structure analysis and with MPLUS? Thank you 


It sounds like you may want factor analysis rather than latent class analysis. LCA is used to group individuals. Factor analysis is used to group variables. 


Thank you for your prompt reply. Is it possible to consider it as a case of grouping individuals into 100 different classes? What I ultimately require is a disability score for each individual that could range from 0 to 100 based on the 5 variables, which are indicators of the patient's health condition. Thank you 


Perhaps the following paper can help you understand the difference between factor analysis and latent class analysis. It can be downloaded from the Mplus website under Mplus papers. Muthén, B. & Muthén, L. (2000). Integrating person-centered and variable-centered analyses: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research, 24, 882-891. 


I was wondering if there were guidelines for the local dependence assumption in LCA, in terms of how to handle indicators that may be correlated (approx. = .65). I have 10 indicators I'm interested in using, but 2 of the variables are correlated. The rest of the indicators don't correlate higher than .35. It is my understanding that the local independence assumption is likely violated here. Should those that have a higher correlation be combined? Does MPLUS have any features for handling local dependence? 


I don't think that the assumption of conditional independence in LCA says anything about the correlation of the observed latent class indicators. I think that the assumption means that the model is estimated such that the residual covariances among the observed latent class indicators are zero. 


Thank you for your prompt reply. I had recently received a review on a manuscript that used LCA to define subtypes of weightlifters in terms of body image disturbance, and the reviewer claimed that the correlations between observed indicator variables suggested violations of conditional independence. I will look into the residual covariances. Do you know of any useful references on this assumption in LCA? 

bmuthen posted on Tuesday, November 23, 2004  8:19 am



Just to add to this discussion. Note that the LCA model, with its assumption of conditional (local) independence, tries to explain correlations among observed variables. With zero observed correlations there is nothing to explain. So high correlations among observed variables are not a sign that the model assumptions are violated; on the contrary. Think about how 2 variables become correlated by mixing 2 classes of uncorrelated individuals, one class low on both variables and one class high on both variables (plot it and you will see). You may also want to model local DEpendencies among some variables within class if that makes substantive sense, and this can be done using Mplus. For a good reference, see the edited book Hagenaars, J.A. & McCutcheon, A. (2002). Applied latent class analysis. Cambridge: Cambridge University Press. 
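
The "mix 2 classes of uncorrelated individuals" argument can be checked numerically. The sketch below is an illustration with made-up settings (two equally sized classes, class means 0 and 3, within-class standard deviation 1), not anything taken from a real data set. With class proportion p and mean separation d on both variables, the pooled covariance is p(1 - p)d^2, so the pooled correlation is that quantity divided by (sd^2 + p(1 - p)d^2).

```python
import math
import random


def mixture_correlation(p, mean_low, mean_high, sd=1.0):
    """Pooled correlation of two indicators that are uncorrelated within
    each of two classes. p is the proportion in the low class; both
    indicators share the class means and the within-class sd."""
    between = p * (1 - p) * (mean_high - mean_low) ** 2  # covariance due to class separation
    return between / (sd ** 2 + between)


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(1)
n = 2000
# Within each class, y1 and y2 are drawn independently of each other.
low = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
high = [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(n)]

r_low = pearson(*zip(*low))              # close to zero
r_high = pearson(*zip(*high))            # close to zero
r_pooled = pearson(*zip(*(low + high)))  # clearly positive
r_theory = mixture_correlation(0.5, 0, 3)

print(round(r_low, 3), round(r_high, 3), round(r_pooled, 3), round(r_theory, 3))
```

The within-class correlations hover around zero while the pooled correlation is close to the theoretical value of about 0.69, which is the sense in which a sizable observed correlation is something the classes explain, not a violation of the assumption.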


This is very helpful indeed. How would I go about modeling conditional (local) dependencies in Mplus? I will order the book today. 

bmuthen posted on Tuesday, November 23, 2004  10:24 am



For continuous outcomes it is easy, just say y1 with y2 within a certain class. For other outcomes it is harder but the Version 3 User's Guide has an example. Note, however, that you should have a good substantive reason for making this deviation from the standard model. 

Salma Ayis posted on Tuesday, June 06, 2006  3:52 am



How can I detect local dependency among the indicators while fitting a latent class model? 


You can look at the standardized bivariate residuals in TECH10. If you have many large standardized residuals, you can increase the number of classes and see if that reduces the residuals. You can also add a factor to the model that has as indicators the latent class indicators. 
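
To make the idea of a bivariate residual concrete: under conditional independence, the model-implied probability that two binary indicators both equal 1 is the sum over classes of the class proportion times the two class-specific item probabilities, and a residual compares the observed count for that cell with the implied count. The Python sketch below uses a simple Pearson-type residual and invented estimates and counts. It is an assumption-laden illustration and does not reproduce the exact standardization used in TECH10.

```python
def implied_joint(class_probs, p_item1, p_item2):
    """P(u1 = 1, u2 = 1) under an LCA with conditional independence.

    class_probs[k] is the proportion in class k;
    p_item1[k] and p_item2[k] are P(u = 1 | class k) for the two items."""
    return sum(pi * a * b for pi, a, b in zip(class_probs, p_item1, p_item2))


def pearson_residual(observed_count, n, expected_prob):
    """(observed - expected) / sqrt(expected) for a single cell."""
    expected = n * expected_prob
    return (observed_count - expected) / expected ** 0.5


# Hypothetical 2-class solution and a hypothetical observed cell count.
class_probs = [0.6, 0.4]
p_u1 = [0.2, 0.8]
p_u2 = [0.1, 0.7]
n = 500
observed_both = 160

p11 = implied_joint(class_probs, p_u1, p_u2)
z = pearson_residual(observed_both, n, p11)
print(round(p11, 3), round(z, 2))
```

A large residual for a pair of items means the classes do not fully account for the association between them, which is when adding a class or a within-class dependency becomes worth considering.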


Greetings, I would like to predict continuous outcomes (out1-out4) from a categorical latent variable (latent profile analysis, 7 continuous indicators, 5 classes). I tried the following syntax (with numerical integration) but it does not work: out1 out2 out3 out4 ON C#1-C#4 Is there any other way to do it? I imagine I could just add the outcomes to the usevariables list without any other specifications, but then they will be used in the clustering algorithm and I do not want these variables to influence the classification (as outcomes). Thank you in advance for your time. 


You would add the variables to the USEVARIABLES list as shown in Example 8.6. These variables will influence the classes. 


Greetings and thank you for the prompt answer. However, what if I do not want the outcomes to influence the classes? The only thing I can think of is to work from the best fitting model without outcomes and then to fix ("@") the class characteristics (means, variances) before including the outcomes. However, given the fact that I have almost as many class indicators as I have outcomes, I'm afraid it may reflect badly on the model. Otherwise, I will have to work from the saved class membership and analyse the outcomes via MANOVA. But then I will lose the information from the posterior class probabilities... Any advice? Thanks again. 


I would not recommend either of these approaches. When you have a distal outcome, the parameters of interest are the means and how they vary across classes. I would instead use the AUXILIARY option with the e setting to test the equality of the means of the distal outcome variables across classes. See the user's guide for more information about this option. 


Thank you again Linda. That is a very nice improvement from version 5! So nice that I have a new question. Reading the technical appendix on this function, I saw that it can be used for any covariate. When would you suggest the use of this method versus the direct inclusion of covariates as predictors of class membership? In my case, I have a theoretical/practical rationale (antecedents are included as predictors of c#1-c#4 in the model, and outcomes, which arrive after, are kept out of the model so as not to influence something occurring before), but are there any other arguments to consider in this choice (other than the slower computation and harder interpretability already noted in the appendix)? 


The AUXILIARY option is good to use to select covariates when you have many covariates to choose from. It is quicker than including them all in the model to assess their importance. A distal outcome is a dependent variable not a covariate. 


Thanks a lot Linda. You are right, as usual. Excuse the confusion, I got carried away by the new feature. 

Sara posted on Sunday, February 17, 2008  12:16 pm



I have 2 questions regarding the "new" function of the AUXILIARY option to test the equality of means for variables not used in the mixture model (auxiliary variables). First, I have 3 classes and used this function ("auxiliary" with "e" behind auxiliary variables of interest) to produce "EQUALITY TESTS OF MEANS ACROSS CLASSES USING POSTERIOR PROBABILITY-BASED MULTIPLE IMPUTATIONS WITH 2 DEGREE(S) OF FREEDOM" and the corresponding class means. I can see that the omnibus test for a variable is significant. Is there a way to produce class-by-class tests of mean differences (post hoc tests)? Second, where can I find more information regarding exactly how these means and tests are being computed using "POSTERIOR PROBABILITY-BASED MULTIPLE IMPUTATIONS"? I couldn't find anything regarding this in the MPLUS Manual 5. Thanks so much. 


There is not currently a way to do class by class tests. See the following link for a description of the method: http://www.statmodel.com/download/MeanTest1.pdf 


Greetings, Is it possible, with the auxiliary (e) command, to obtain estimates of class-specific variances (in addition to class-specific means and the Wald test) for the auxiliary variables? If yes, how? 


Not in the current version; such information is obtained only if the variable is in the model (e.g. as a covariate). What would you use the class-specific variance for? Are you aiming to get the standard error for each mean? 


Greetings, I was mostly trying to report the results as completely as possible. With means and variances, box plots would be possible. Would aiming to get the SEs of the means be a good idea? 


In the upcoming Mplus version 5.1 the SEs for the means will be provided. 

Herb Marsh posted on Wednesday, March 12, 2008  2:28 am



I have conducted an LPA based on 8 y variables used to define 5 groups (arrows from ys to latent groups). I also have a set of 10 criterion variables that might be considered as auxiliary variables in MPLUS terminology. I have done separate analyses based on: I have a latent profile analysis with 8 y variables that define 5 groups and 10 criterion variables as auxiliary variables. I can test the equality of the criterion variable means across the five latent groups. Although I understand that these are based on pseudo-class draws (equality test of means across latent classes using Wald chi-square based on draws from posterior probabilities; the strategy seems to be a combination of relating the auxiliary variables to group dichotomies and group probabilities), I am not sure how to interpret them. The values for auxiliary variables are, I assume, means for each variable on the corresponding pseudo groups (classes in the output). How do these relate to the corresponding path coefficients or odds ratios in analyses in which the criterion variables are treated as covariates? Can I determine the amount of variance in one set of variables (group dichotomies or probabilities) explained by the other (auxiliary variables)? Or even the amount of variance in each grouping variable (dichotomy or probability) explained by the set of auxiliary variables? 


Regarding the equality testing of means for auxiliary variables (criterion vbles; I'll call them x's in the following) across classes, you are correct in your interpretation. I see this as univariate information and not informative for including all the auxiliary variables as predictors of latent class membership (c in the following). So this is just a first step for seeing which x's might be useful. This information does not directly relate to having several of the x's as predictors of c. In Mplus Version 5.1 coming out next month we will have a second-stage approach where we do pseudo-class draws and provide output for "c on x" multinomial regressions. This new output is relevant for choosing a set of x's. Regarding the amount of variance explained, I don't think that is a common usage in logistic regression or multinomial logistic regression. Perhaps a better way to understand how important x's are is to plot the probability curves for c on x as can be done in Mplus graphics. 
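
For readers who want to see what such probability curves contain, here is a small Python sketch of the multinomial logistic calculation behind "c on x". It assumes the usual parameterization in which the last class is the reference class with intercept and slope fixed at zero. The intercepts and slopes are invented for illustration, and `class_probabilities` is a hypothetical helper, not an Mplus function.

```python
import math


def class_probabilities(x, intercepts, slopes):
    """Class probabilities from a multinomial logistic regression of c on x.

    intercepts and slopes are given for classes 1..K-1; the last class is
    the reference class with both parameters fixed at zero."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    m = max(logits)  # subtract the maximum for numerical stability
    expd = [math.exp(v - m) for v in logits]
    total = sum(expd)
    return [e / total for e in expd]


# Hypothetical estimates for a 3-class model with one covariate.
intercepts = [0.5, -1.0]
slopes = [-0.8, 0.6]

curve = {x: class_probabilities(x, intercepts, slopes) for x in (-2, -1, 0, 1, 2)}
for x, probs in curve.items():
    print(x, [round(p, 3) for p in probs])
```

Tabulating or plotting these probabilities over the observed range of x shows how strongly class membership shifts with the covariate, which is often easier to read than a variance-explained figure.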


Hi, I conducted an LCA with covariates (7 indicators; 12 covariates) using complex survey data, which seems to work fine. However, when I read the saved data (from the savedata: command) into Stata to do some next-step analyses, I found that the raw class counts based on the most likely latent class membership in the Mplus output are different from those in the saved data. Later I found that the proportion in Mplus (based on most likely class membership) is the same as the estimated proportion in the saved data (using the Stata survey estimation module). Now, the question is which class counts would you recommend reporting? Would you rather not report the class counts at all in survey data? It seems to me that the class counts in complex survey data may be kind of misleading. Any suggestion would be most appreciated. Thanks. 


If you are not using Version 5.1, please do. If you are, please send your input, data, output and license number to support@statmodel.com. 

Argyris posted on Thursday, August 14, 2008  12:33 pm



Hello, I would like to estimate categorical variables, such as gender or diagnostic group, as potential class predictors in an LCA. I know I can test the equality of means for continuous variables using the "auxiliary" option. Is there an option for categorical variables? Is it robust/legitimate to use the same? Also, is there a way to get the proportions of categorical variables per class (as in a contingency table) in the output without having to read them from the histogram? Many thanks. 


Covariates are either binary or continuous. In both cases they are treated as continuous. So you would use the AUXILIARY option the same for both. If you ask for SAMPSTAT in the OUTPUT command, you will obtain these values. 

Argyris posted on Thursday, August 14, 2008  2:34 pm



Thanks very much. SAMPSTAT seems to only provide information on those variables that are an explicit part of the model and not the auxiliary variables. But I might be missing something. 


You are correct. Then you can get them from the AUXILIARY option. The means are given for each class. 

Argyris posted on Friday, August 15, 2008  5:28 am



Thank you. Yes, true about the means in Auxiliary. It does not, however, seem to provide the proportions of an (auxiliary) categorical variable per latent class, say the proportion of males per latent class 1. I seem to be able to get this through the histogram by resting the mouse on the relevant bar. I just wanted to ensure that is right and find out whether there was an alternative. Sorry to persevere. 


The mean of a binary variable is a proportion. 
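
A short Python illustration of this point, using made-up data. With 0/1 coding the mean is directly the proportion of cases coded 1. With another coding such as 1/2, the mean has to be shifted and scaled first, so a class mean of 1.6 on a 1/2 variable corresponds to a proportion of 0.6 in the category coded 2. The helper name `proportion_from_mean` is hypothetical.

```python
def proportion_from_mean(mean, low_code=0, high_code=1):
    """Proportion of cases in the higher category of a binary variable,
    recovered from the variable's mean and its two codes."""
    return (mean - low_code) / (high_code - low_code)


# Made-up binary variable, first coded 0/1 and then recoded 1/2.
gender01 = [0, 1, 1, 0, 1]
gender12 = [g + 1 for g in gender01]

mean01 = sum(gender01) / len(gender01)
mean12 = sum(gender12) / len(gender12)

p01 = proportion_from_mean(mean01)                           # mean is already the proportion
p12 = proportion_from_mean(mean12, low_code=1, high_code=2)  # subtract the lower code
print(mean01, mean12, p01, round(p12, 3))
```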

Argyris posted on Friday, August 15, 2008  8:51 am



Quite. These means/proportions of auxiliary categorical variables do not appear anywhere in the output, though; only in the histograms, it would seem. The means provided in the equality of means part of the output are the means of the binary codings of the variables (e.g. 1.43 for a variable coded between 1 and 2; see copied output below). The query is more about confirming that what I get by resting the cursor on the bars is, indeed, the mean/proportion, given it appears nowhere else. Sorry about the confusion. Thanks very much.

GENDER07
           Mean    S.E.
Class 1    1.321   0.053
Class 2    1.454   0.055
Class 3    1.537   0.037
Class 4    1.520   0.031


You must not be using the most recent version of the program. It has means. 

Nan Zhang posted on Monday, February 23, 2009  9:45 pm



I want to save the posterior probabilities and the related syntax is as follows:
savedata: file is D:\09spring\RA(take action)\data\binaryresults; save=cprobabilities;
There is an error in the output:
*** ERROR in SAVEDATA command The syntax for the FILE option has changed. Please refer to the Mplus User's Guide for available options.
Is there any change for the savedata option or anything wrong with my command? 


This looks correct to me. Please send the input, data, output, and license number to support@statmodel.com for further help. 


When there are categorical indicators in an LCA, the output includes the LATENT CLASS ODDS RATIO RESULTS section. I understand that these odds ratios represent the comparisons of each pair of classes. However, I'm not sure how to interpret the statistical significance test. The way it's calculated, it appears to be a test of whether the odds ratio differs significantly from 0. Is that correct? (In other words, one should not interpret these to mean that the odds ratio is significantly different from 1?) Is there a way to use these results to calculate whether the odds ratio differs from 1? 


Yes, the 3rd column of the Mplus output is always Est./S.E., so this tests against zero. To test against the more relevant value of 1 in this case, you simply consider (Est. - 1)/S.E. But it is more common to provide a confidence interval around the odds ratio point estimate. This interval can be requested in the Mplus OUTPUT command by saying CINTERVAL. As in the literature, these intervals are derived from those of the log-odds and then exponentiated. 
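
Both calculations in this reply are easy to do by hand. The Python sketch below computes the z statistic against 1 from an odds ratio and its standard error, and a confidence interval obtained by exponentiating the limits of the log-odds interval. All numbers are invented for illustration, the function names are hypothetical, and 1.96 is the usual normal critical value for a 95% interval.

```python
import math


def or_z_against_one(or_est, or_se):
    """z statistic for H0: odds ratio = 1, i.e. (Est. - 1) / S.E."""
    return (or_est - 1.0) / or_se


def or_confidence_interval(log_odds, log_odds_se, z_crit=1.96):
    """Interval for the odds ratio: build the interval on the log-odds
    scale, then exponentiate both limits."""
    lower = math.exp(log_odds - z_crit * log_odds_se)
    upper = math.exp(log_odds + z_crit * log_odds_se)
    return lower, upper


# Hypothetical output values.
or_est, or_se = 2.5, 0.6
z_vs_zero = or_est / or_se                  # what the Est./S.E. column reports
z_vs_one = or_z_against_one(or_est, or_se)  # the more relevant test

log_odds, log_odds_se = math.log(2.5), 0.24
lower, upper = or_confidence_interval(log_odds, log_odds_se)
print(round(z_vs_zero, 2), round(z_vs_one, 2), round(lower, 2), round(upper, 2))
```

Because the interval is built on the log-odds scale it is asymmetric around the odds ratio, and an interval that excludes 1 leads to the same kind of conclusion as the test against 1.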

Jinseok Kim posted on Tuesday, October 20, 2009  11:47 pm



I conducted an LCA with the auxiliary option. It ran with no problem but presented no SE values for one variable, only "*******". Does this mean that the SE of the auxiliary variable is not computed, or is it something like "out of range"? In either case, is there any way that I can get the SE values for the variable for each class? 


Asterisks would mean the value is too large to fit in the space allowed. For further information, please send your output and license number to support@statmodel.com. 


I am conducting a pretty simple LCA on complete cases only (LISTWISE IS ON). When I use the AUXILIARY option to compare means across classes for some auxiliary variables, the number of observations used in the model is reduced, presumably because Mplus is now using the subset of observations for which all the model variables *and* all the auxiliary variables are nonmissing. Is it possible to make Mplus fit the model to the same subset of observations it would if AUXILIARY was not specified, but still compare means for the auxiliary variables (where available)? Many thanks. 


Before Version 6 the listwise deletion was done on the full set of variables including the auxiliary variables. This has been changed in Version 6 so that the listwise deletion is done on only the analysis variables. 

K Frampton posted on Thursday, April 21, 2011  10:07 am



Hello. I have a 5-class LCA with 7 continuous indicators. I'm trying to decipher whether there are significant class differences in particular indicators' means (e.g., whether class 1 and class 2 are significantly different from each other on indicator 1). This seems so simple, but I'm having a hard time interpreting the output. Is this provided? If not, any guidance on how to calculate this is much appreciated! Thank you! 


You can use MODEL TEST to do this. See the user's guide for further information. 
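
As a rough guide to what such a test computes, a Wald test of the equality of two class-specific means divides the squared difference by the variance of the difference, which depends on both standard errors and on the covariance between the two estimates. The Python sketch below is an illustration with invented numbers. The function name is hypothetical, and setting the covariance to zero is an assumption made only for the example; in practice it would come from the estimated covariance matrix of the parameter estimates.

```python
import math


def wald_mean_difference(m1, m2, se1, se2, cov12=0.0):
    """Wald chi-square (1 df) and p-value for H0: m1 = m2.

    cov12 is the covariance between the two mean estimates."""
    diff = m1 - m2
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * cov12
    wald = diff ** 2 / var_diff
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(wald)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


# Hypothetical class means and standard errors for one indicator.
wald, p_value = wald_mean_difference(m1=2.0, m2=3.0, se1=0.3, se2=0.4)
print(round(wald, 3), round(p_value, 4))
```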


Dear Linda, I would like to test a relationship using a binary variable as the dependent variable and a few continuous variables as the predictors. I have read in the manual that this type of analysis is allowed. Could you please confirm this? Also, which chapter should I read if I'd like to explore this analysis further? Many thanks. Pat 


You can do this using probit or logistic regression. Both examples are shown in Chapter 3. 


Hello, I am running an LCGA model with several continuous covariates and a continuous distal outcome, Time 2 Teacher report of Externalizing. I am controlling for Time 1 Teacher report of Externalizing on the Time 2 report; however, Time 1 report of externalizing is not currently included as a covariate in the model. To estimate class differences on Time 2 externalizing controlling for Time 1 externalizing, is it enough to have "Extern2 on extern1" in the model, or should I also include Time 1 externalizing as a covariate of class? Thank you for your help, Anne 


To clarify, are you talking about 2 different variables: "externalizing" and "Teacher report of Externalizing"? Perhaps the former is self-reported? So that you have these 2 variables at time 1 and also these 2 variables at time 2? 


Thank you for your quick response Dr. Muthen. My apologies for not being clear. I am talking about the same variable: both Time 1 and Time 2 externalizing scores are teacher report. I want to assess between-class differences on teacher report of externalizing at Time 2 (Extern2) controlling for teacher report of externalizing at Time 1 (Extern1). I have 4 classes. My syntax includes "Extern2 on Extern1*" and I allow this estimate to vary between classes. But do I also need to include Extern1 as a covariate of class membership? That is, do I also need to include: C#1 on Extern1; C#2 on Extern1; C#3 on Extern1; Thank you, Anne 


That depends on your substantive theory. Does latent class membership "exist" before Extern1 or can it be viewed as being influenced by Extern1? 
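
The two options can be sketched as a MODEL fragment. This is an illustration only: the growth part of the LCGA is not shown in the thread and is left as a comment, and the variable names follow the poster's wording.

```mplus
MODEL:
  %OVERALL%
  ! growth model statements for the LCGA go here (not shown in the post)
  extern2 ON extern1;
  c ON extern1;           ! keep only if Extern1 is viewed as influencing
                          ! class membership; with 4 classes this covers
                          ! c#1, c#2 and c#3
  %c#1%
  extern2 ON extern1;     ! class-specific slope
  %c#2%
  extern2 ON extern1;
  %c#3%
  extern2 ON extern1;
  %c#4%
  extern2 ON extern1;
```

Dropping the "c ON extern1;" line corresponds to the view that class membership exists before Extern1.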


Thank you so much for your response Dr. Muthen. 


Hi there, I'm trying to save posterior probabilities and class assignments in a LCGM but I'm getting the following error: *** ERROR in SAVEDATA command The syntax for the FILE option has changed. Please refer to the Mplus User's Guide for available options. This is my syntax: SAVEDATA: FILE IS C:\Users\Jaclyn\Documents\PhD\Analysis\General delinquency\ Linear LCGA  with FIML FINAL MODELS with assignment\LCGA count ZIP (inc nonoffenders FIML) FINAL\males; save = cprobabilities; Thanks, Jaclyn 


I can't see the problem. Please send the output and your license number to support@statmodel.com. 


Hello, I am running a multilevel latent profile analysis with three profiles at level 2 and only 1 profile at level 1. Students are level 1 and schools are level 2. I am trying to determine whether profile membership at level 2 distinguishes between the means (by profile) on level 1 auxiliary variables. However, when I add the auxiliary command Auxiliary = (2) X1 X2 X3; I receive the following error message: *** ERROR in VARIABLE command Auxiliary variables with 'e' or 'r' specifier are not available with TYPE=MIXTURE with more than one categorical latent variable. I currently have the within variables listed in the Use Variables section as “within”. I tried taking them out there and still received the same message. Could you please advise? 


Please excuse my typo; the auxiliary code actually reads Auxiliary = (e) x1 x2 x3; 


The error message refers to the fact that you have more than one categorical latent variable on the CLASSES list. I am not sure that AUXILIARY (e) is available with multilevel models. 


Hi Linda, Thank you for your response. If auxiliary (e) is not possible given the multiple categorical latent variables, would you suggest setting up a series of Wald tests to see if the means at level 1 are significantly different by level 2 profile? Thank you, AnnMarie 


Yes, you can use Wald tests. 


Hi Linda When I tried to use the Wald tests of equality, I am running into trouble because the variables I would like to test using model statements are at level 1. I am trying to test if auxiliary variables at level 1 are significantly different across the level 2 profiles. However, because my latent profile model has 3 between and 1 within profile, Mplus only sees one group/profile at level 1 and I cannot test the mean differences at Level 1. Also, it seems that Mplus will not let me call for the tests of equality on level 1 variables in the model between statements. Is there a way to get these mean differences at level 1 based on level 2 profile membership? Without outputting the profile memberships or predicted probabilities? Thank you for your help, AnnMarie 


I don't think it is possible to do user-specified Wald tests based on auxiliary variables (that is, variables that are not part of the model); you have to have model parameter labels that you refer to. 


Dear Linda, I would like to run an LCA with four binary and three 3-level categorical indicators in a sample of n = 60,000. Can I use the following commands for the simplest model without any covariates? categorical=tri1 tri2 tri3 bi1 bi2 bi3 bi4; classes=c(2); missing are all (999); analysis: type=mixture; STARTS = 5000 500; stiterations=50; I didn't find any examples in the UG. Thanks, Mario 


That looks fine. 
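
For reference, the poster's commands laid out as a complete input might look like the sketch below. The file name and the order of the columns are assumptions; the remaining options are taken from the post. No MODEL command is included, so the default mixture model is estimated.

```mplus
TITLE: LCA with four binary and three 3-level indicators
DATA:
  FILE = lca.dat;         ! hypothetical file name
VARIABLE:
  NAMES = tri1 tri2 tri3 bi1 bi2 bi3 bi4;   ! assumed column order
  CATEGORICAL = tri1 tri2 tri3 bi1 bi2 bi3 bi4;
  CLASSES = c (2);
  MISSING ARE ALL (999);
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 5000 500;      ! random starts, final-stage optimizations
  STITERATIONS = 50;      ! initial-stage iterations per start
```

If a MODEL command is added later, each 3-level indicator has two thresholds (for example tri1$1 and tri1$2) and each binary indicator has one (for example bi1$1).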


Dear Linda, I'm trying to do LCA with binary variables. I performed multiple imputation to deal with missing data. However, the sections "results in probability scale" and "odds ratio results" are missing from my output. I guess it is because I used 5 imputation files. How, then, can I treat missing data when trying to do LCA? I understand that the data file must not contain any blanks (missing data). I need your help. Thanks!! 


These are not available with multiple imputation. If you want them, you can use MODEL CONSTRAINT to compute them. See the user's guide. 
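
A minimal sketch of the MODEL CONSTRAINT route, assuming a logit link, a hypothetical imputation list file, hypothetical binary indicators u1-u5, and made-up labels. It shows one indicator only; the other indicators would be handled the same way.

```mplus
DATA:
  FILE = implist.dat;     ! hypothetical file listing the imputed data sets
  TYPE = IMPUTATION;
VARIABLE:
  NAMES = u1-u5;          ! hypothetical binary indicators
  CATEGORICAL = u1-u5;
  CLASSES = c (2);
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  %c#1%
  [u1$1] (t11);           ! threshold of u1 in class 1
  %c#2%
  [u1$1] (t12);           ! threshold of u1 in class 2
MODEL CONSTRAINT:
  NEW(p11 p12 or1);
  p11 = 1 / (1 + EXP(t11));   ! P(u1 = 1 | class 1)
  p12 = 1 / (1 + EXP(t12));   ! P(u1 = 1 | class 2)
  or1 = EXP(t12 - t11);       ! odds of u1 = 1, class 1 relative to class 2
```

The odds of u1 = 1 in a class equal EXP(-threshold), which is why the odds ratio for class 1 relative to class 2 is EXP(t12 - t11).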


Thanks so much, Dr. Linda. I'm trying to do LCA using a dataset with missing data. My syntax is as follows: DATA: FILE=bulelem0825.dat; LISTWISE=ON; VARIABLE: NAMES=id sex b1-b5 v1-v5; USEVARIABLES ARE b1-b5 v1-v5; CLASSES=c(2); CATEGORICAL= b1-b5 v1-v5; ANALYSIS: TYPE=MIXTURE; However, I got the error message "Categorical variable B1 contains 59 categories. This exceeds the maximum allowed of 10." I can't understand this message because b1 is a binary variable with 2 categories (0=no, 1=yes). Could you tell me what this error message means and how to fix it? I do appreciate your help. 


You are reading your data incorrectly. You either have more names in the NAMES list than you have columns of data or you have blanks in your data set and are reading it as free format. 

ChiaYi Chiu posted on Saturday, September 07, 2013  1:58 pm



I am fitting binary data with LCA in the context of cognitive diagnostic models. Because the model is a reparametrization of the logistic model (not directly a logistic model), I need to impose some constraints on the parameters. When the highest interaction term in the logistic model is 3-way, the iterations converge. However, when the highest interaction term goes up to 4-way, Mplus gives the following error message: STARTING VALUES FOR THE DEPENDENT PARAMETERS COULD NOT BE COMPUTED FROM THE STARTING VALUES FOR THE INDEPENDENT PARAMETERS. CHANGE THE STARTING VALUES FOR THE INDEPENDENT PARAMETERS. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. It looks like the starting values are the issue. However, I did not specify any starting values, but used STARTS = 0; I then changed the number of random starts to STARTS = 100 10; but that did not work either. Any suggestion is welcome. Thanks!! 


Please send the output and your license number to support@statmodel.com. 

WenHsu Lin posted on Tuesday, September 22, 2015  7:00 am



Hello, Example 8.6 indicates that latent class c predicts the distal outcome u. How do we know whether the thresholds of u, which vary across classes, are significantly different? Or, for a continuous distal outcome, whether the means are significantly different across classes? Thank you. 


You can use the Wald test of MODEL TEST for this. You label the thresholds of u in each class and use the labels in MODEL TEST. See the user's guide for further information. 

WenHsu Lin posted on Tuesday, September 22, 2015  6:01 pm



Thank you Linda 

WenHsu Lin posted on Tuesday, September 22, 2015  6:52 pm



Hi again. After reading the example, I still did not get it. It kept showing a warning. My commands are as follows: usevariables are bmi1 sk1 sk2 sk3 sk4 sk5 sk6 sk8 sk9; missing is blank; categorical are sk1 sk2 sk3 sk4 sk5 sk6 sk8 sk9 bmi1; class=c(2); weight is ipt1; analysis: iteration = 50000; estimator=mlr; type=mixture; model: %overall% model test: c#1=c#2; Is the last line correct? The example showed MODEL CONSTRAINT and symbols for manipulating parameters. In my case, there is no line for bmi1 in the %overall% part. So how do I ask Mplus to do the test? Thank you. 


You must specify the parameters, label them, and use the labels in MODEL TEST. %c#1% [bmi1$1] (p1); %c#2% [bmi1$1] (p2); MODEL TEST: 0 = p1 - p2; 

WenHsu Lin posted on Wednesday, September 23, 2015  6:05 pm



This clears it all up. Thanks a lot. 

Angelique posted on Thursday, September 22, 2016  1:26 am



Hi, I ran an LCA model. I also determined what predicts class membership. I would like to interpret the odds ratios, but since some of the predictors are categorical variables (e.g., B41A), I am having trouble interpreting the output.

LOGISTIC REGRESSION ODDS RATIO RESULTS
Categorical Latent Variables
C#1 ON
  SITE      0.863
  A1        1.508
  A2        0.552
  A5B       0.547
  B41A      0.643
  PROSOC    0.498
  PARMONIT  0.961
  AGEG      0.723
  COMMS     0.921

Could you perhaps help? 


That's interpreted the same way as for multinomial logistic regression with an observed nominal DV. See for instance UG Chapter 14 or our new book, Chapter 5. 
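
As a worked illustration using two of the odds ratios posted above: this assumes the usual multinomial logit form with the last class K as the reference class and all other predictors held constant. How B41A is coded is not stated in the thread, so "one unit" here is an assumption about its scale.

```latex
\[
\frac{P(C = 1 \mid x)}{P(C = K \mid x)} = \exp(\alpha_1 + \beta_1 x),
\qquad
\mathrm{OR} = e^{\beta_1}
\]
\[
\text{B41A: } \mathrm{OR} = 0.643
\;\Rightarrow\; \beta_1 = \ln 0.643 \approx -0.442,
\qquad (0.643 - 1) \times 100\% = -35.7\%
\]
\[
\text{A1: } \mathrm{OR} = 1.508
\;\Rightarrow\; \beta_1 = \ln 1.508 \approx 0.411,
\qquad (1.508 - 1) \times 100\% = +50.8\%
\]
```

Read this way, a one-unit increase in B41A multiplies the odds of being in class 1 rather than the reference class by 0.643 (about 36% lower odds), and a one-unit increase in A1 multiplies them by 1.508 (about 51% higher odds). If B41A is a 0/1 dummy variable, the odds ratio compares the group coded 1 with the group coded 0.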

Angelique posted on Friday, September 23, 2016  2:07 am



Thank you! 


I have two latent class variables and 10 items (5 items for each latent variable); all items are binary. How can I write the MODEL command to assign the items to their respective latent class variables? I tried to write: c1 by u1-u5; c2 by v1-v5; but I got a warning message. 
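
One possible setup, offered as a sketch rather than a confirmed answer from the thread: the BY statement defines continuous factors, so with categorical latent variables the items are instead tied to a class variable by mentioning their thresholds in that variable's class-specific MODEL part. The file name and the number of classes per variable (2 each) are assumptions.

```mplus
DATA:
  FILE = items.dat;       ! hypothetical file name
VARIABLE:
  NAMES = u1-u5 v1-v5;
  CATEGORICAL = u1-u5 v1-v5;
  CLASSES = c1 (2) c2 (2);   ! assumed: 2 classes for each variable
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  c2 ON c1;               ! relates the two latent class variables
MODEL c1:
  %c1#1%
  [u1$1-u5$1];            ! u items vary across the classes of c1
  %c1#2%
  [u1$1-u5$1];
MODEL c2:
  %c2#1%
  [v1$1-v5$1];            ! v items vary across the classes of c2
  %c2#2%
  [v1$1-v5$1];
```

With this layout the u thresholds differ only across the classes of c1 and the v thresholds only across the classes of c2.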




Hello, When attempting a mixture model with both categorical and continuous indicator variables, I receive the error below (all caps). I did not specify the model and left the default random start values. It's possible that the model is non-identified, as this is an exploratory analysis, but I'm not sure if I am doing something wrong with the input. Also, when I run separate analyses with only the categorical or only the continuous variables, the syntax runs. Any thoughts would be greatly appreciated! Thanks, Naomi THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ILL-CONDITIONED FISHER INFORMATION MATRIX. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO A NON-POSITIVE DEFINITE FISHER INFORMATION MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.116D-18. THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THIS IS OFTEN DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. CHANGE YOUR MODEL AND/OR STARTING VALUES. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 63, %C#3%: [ INC_CAT$7 ] 


Please send the output and your license number to support@statmodel.com. 
