

For the latent class model with covariates, can contrasts other than each class compared to the reference group be made? 


No, I don't think so. The two tools we have available for classes are choosing starting values so that the class we want becomes the last class (the reference class), and placing equality constraints on the class probability parameters. 


In a multinomial logistic regression with a covariate and a latent categorical variable having more than two classes, the individuals do not actually have a 1 or 0 signifying class membership; instead, they have a probability of membership for each class. Does the estimation of logit coefficients begin with actually assigning each individual to one specific class and then somehow iteratively account for the fact that the probabilities are not exactly 0 and 1? 


The Mplus multinomial regression with a latent class variable as the dependent variable assigns each individual fractionally to all classes using the "posterior probabilities" and does not force a 0/1 classification. This is done throughout the EM iterations. The first set of fractional assignments is based on the starting values, and the assignments are then iteratively improved until convergence. 
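As an illustration of the fractional assignment described above, here is a minimal Python sketch of the posterior-probability step for a single individual. It is not Mplus code; the function name `posterior_probs` and all numbers are made up for the example, and the class-specific likelihoods are simply taken as given.

```python
def posterior_probs(prior, likelihoods):
    """Return P(class k | data) for one individual.

    prior[k]       -- current class probability for class k
    likelihoods[k] -- likelihood of the individual's data under class k
    (both are hypothetical inputs; in practice they come from the
    current parameter values at each EM iteration)
    """
    joint = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical individual in a three-class model.
post = posterior_probs([0.5, 0.3, 0.2], [0.10, 0.40, 0.05])
print(post)  # fractional membership in each class, summing to 1
```

With these made-up inputs the individual belongs mostly, but not entirely, to the second class; no 0/1 assignment is ever made.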


How are nominal variables used as predictors of the latent categorical variable? (i.e. How do they need to be coded?) For example, I am using age, gender, and race as predictors. I understand the numerical and the binary, but I am not clear on the nominal. 


The nominal variables should be turned into a set of dummy variables as in regular regression. So if you have three categories in the nominal variable, you will have two dummy variables. 
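A small Python sketch of the dummy coding described above, assuming the data are prepared outside Mplus before being read in. The helper `dummy_code` and the category labels are hypothetical.

```python
def dummy_code(values, reference):
    """Turn a nominal variable into 0/1 dummies, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return {f"d_{c}": [1 if v == c else 0 for v in values] for c in categories}

# Hypothetical three-category variable: two dummies result.
race = ["a", "b", "c", "b"]
dummies = dummy_code(race, reference="c")
print(dummies)
```

The omitted category ("c" here) becomes the comparison group for both dummy coefficients.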

Dirk Temme posted on Tuesday, January 15, 2002  5:29 am



As described in Example 25.4, by modifying the mixture approach appropriately it is possible to estimate a multinomial logistic regression model where the dependent variable (represented by the latent categorical variable c) is unordered categorical (such a model can be used, for example, if one tries to explain the choice of a specific product from a set of multiple alternative products). As a result of the required "trick" we only get single estimates for the model parameters instead of class-specific estimates. At first sight it seems natural to use the observed latent class indicators u (e.g., measuring the observed choice of a specific product by individual i) as the dependent variable if one is interested in a "true" combination of the multinomial logistic regression model and the finite mixture approach. Unfortunately, it is assumed that the class indicators u are binary or ordered categorical. Therefore it seems that only the choice between two alternatives (e.g., product A and product B) can be explained in a true mixture application. Do you know of a different way to estimate a mixture multinomial logistic regression with multiple unordered categories? 

bmuthen posted on Thursday, January 17, 2002  9:17 am



Although we have not tried this, it seems possible to do what you want using an approach with several latent class variables. One latent class variable corresponds to the unordered outcome, using the training data construction that is shown in Example 25.4. Say that this has 3 categories (call the variable u). The other latent class variable is the true latent class variable. Say that this has 2 categories (call the variable c). You then create a 6-class setup:

         u=1  u=2  u=3
   c=1:   1    2    3
   c=2:   4    5    6

where columns correspond to the 3 categories of u. Here, class 3 should have its slope on x fixed at zero, while leaving the intercept free. 

Huang posted on Tuesday, February 18, 2003  12:18 pm



Dear Dr. Muthen, I have one question on the interpretation of the output and two questions on model fitting.

First, in LCA/LCR, the output of the measurement part for one of the binary indicators (manifest variables) is given as follows:

   FUTUR (w/i class 1)   value   s.e.    val/s.e.
     Category 1          0.808   0.019   42.852
     Category 2          0.192   0.019   10.168

How should this item response probability be interpreted? What is the code for "Category 1"? Is P(futur=1|class=1) equal to 0.808 or 0.192?

Second, when I fit the regression part of LCR, if some of the covariates are categorical, do I need to center them as well? For example, gender and race are both binary covariates. Which format is more appropriate in Mplus?

   variable: names are popul futur alc money sex race;
             usev are popul futur alc money centsex centrace;
             classes=c(3);
             categorical=popul futur alc money;
   define:   centsex=gender-.6;
             centrace=race-.725;
   analysis: type=MIXTURE;
   MODEL:    %OVERALL%
             c#1 c#2 on centsex centrace;

Or:

   variable: names are popul futur alc money sex race;
             usev are popul futur alc money sex race;
   MODEL:    %OVERALL%
             c#1 c#2 on sex race;

Third, how do I make the last class (the reference class) into the one I want?

MODEL: (original model, which takes class 3 as the baseline in the polytomous regression)

   %OVERALL%
   c#1 c#2 on centgrad centrace;
   %c#1%
   [popul$1*2 futur$1*2 alc$1*1 money$1*2];
   %c#2%
   [popul$1*1 futur$1*1 alc$1*2 money$1*1];
   %c#3%
   [popul$1*3 futur$1*.5 alc$1*6 money$1*.5];

If I want to use class 1 as the reference in the regression piece, will the following work? (It just switches the starting values of class 1 and class 3.)

   %OVERALL%
   c#1 c#2 on centgrad centrace;
   %c#1%
   [popul$1*3 futur$1*.5 alc$1*6 money$1*.5];
   %c#2%
   [popul$1*1 futur$1*1 alc$1*2 money$1*1];
   %c#3%
   [popul$1*2 futur$1*2 alc$1*1 money$1*2];

Thank you very much for your help and your time. Your information and suggestions will be appreciated. Have a great day! Sincerely, Huang 


Category 1 is 0. Category 2 is 1. It is not necessary to center covariates. You might want to center the continuous ones but not the binary. Use starting values to make the last class the one you want. What you suggest should work. 

Anonymous posted on Friday, July 25, 2003  1:16 pm



I'm curious about the formula used to compute the estimated class proportions when there are class predictors in the model. I tried looking this up in my categorical data analysis books, but they only gave formulas for the predicted probability P(C|x), not the estimated unconditional probability P(C). I suppose this is because in most multinomial models, c is an observed variable rather than latent. If you can point me to the formula, I would be grateful. Thank you for your attention. 

bmuthen posted on Friday, July 25, 2003  6:59 pm



Without covariates, the probabilities of c have their own parameters, designated as the [c#] intercepts in Mplus. These probabilities are the same as the values obtained from the estimated posterior probabilities from the last stage of the ML iterations, computed for each class and each individual. Each class's probability is obtained by summing the individuals' posterior probability values for that class and dividing by the sample size. You are right that with covariates, there is no designated parameter for the c probabilities. The estimated probabilities are, however, obtained in the same way via posterior probabilities as for the case without covariates. 
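The calculation described above can be sketched in a few lines of Python. The posterior probabilities here are made up; in practice they would be the final-iteration values for each individual, and `class_proportions` is a hypothetical helper name.

```python
def class_proportions(posteriors):
    """Average each class's posterior probability over individuals.

    posteriors[i][k] -- posterior probability of class k for individual i
    """
    n = len(posteriors)
    k = len(posteriors[0])
    return [sum(row[j] for row in posteriors) / n for j in range(k)]

# Hypothetical posterior probabilities for four individuals, two classes.
post = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
props = class_proportions(post)
print(props)  # estimated unconditional class proportions
```

The same averaging applies whether or not covariates are in the model, because the covariates only enter through the posterior probabilities themselves.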

Anonymous posted on Sunday, October 19, 2003  4:28 pm



I am using mixture modeling to perform a multinomial logistic regression with an unordered polytomous observed dependent variable. I am especially interested in testing whether the estimates for two predictors in the same model are the same. I can easily run the model with and without the relevant constraints. P. 371 of the manual indicates that the loglikelihood for a given model can be used to compute the likelihood-ratio chi-square for nested models (which, I believe, I have). Is there a way to get this likelihood-ratio chi-square printed in the output? 


You have to do two runs and then do a difference test using the two loglikelihood values. 


Hi, Linda. Is there any chance in future versions of getting the capacity to test constraints in a single run? That would be a great convenience. 


I will certainly add it to my list. 

Anonymous posted on Monday, October 20, 2003  9:39 am



Regarding the constraint tests for the multinomial regression: I understand that I need to do the difference test. However, it's not clear what values need to be contrasted. For BOTH models (estimates freely estimated and estimates constrained to be equal), the same H0 loglikelihood value is given, so I didn't understand your reply, which indicated I should do a difference test of the likelihood values from the two models. (No other loglikelihood value appears, e.g., one for H1.) I had thought there would be a way to obtain the likelihood-ratio chi-square for each model so that I can subtract one from the other. The information criteria (number of free parameters, AIC, BIC, and sample-corrected BIC) do differ for the 2 models. Although the models are nested, can I subtract the relevant information criteria values for a valid test of the equality constraint? 


2 times the loglikelihood difference gives you the chi-square difference. You should get a different H0 loglikelihood for each model. If you don't, please send your two outputs to support@statmodel.com so I can see what the problem is. I don't know that the difference between information criteria values for the two models is a valid test of the equality constraint. 
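A minimal Python sketch of the difference test described above, for the case discussed here where one equality constraint separates the two models (1 degree of freedom). The loglikelihood values are invented, and `lr_difference_test` is a hypothetical helper; the closed-form p-value used below only holds for 1 df.

```python
import math

def lr_difference_test(loglik_restricted, loglik_free):
    """Chi-square difference and p-value for two nested models, 1 df only."""
    chi2 = 2.0 * (loglik_free - loglik_restricted)
    if chi2 < 0:
        raise ValueError("the freer model should not have a lower loglikelihood")
    # For 1 df, the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical H0 loglikelihoods from the constrained and unconstrained runs.
chi2, p = lr_difference_test(-1234.5, -1231.0)
print(chi2, p)
```

For more than one constraint the same chi-square value applies, but the p-value would need a general chi-square distribution function with the matching degrees of freedom.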

Ken Wahl posted on Saturday, March 27, 2004  11:11 am



With a nominal variable, should it make any difference which value is used as the reference? It seems that dummy parameters can drop in and out of significance when different reference values are used, but maybe I'm not interpreting the results correctly. I'd hate to presume non-significance simply because I chose the wrong reference value. 

bmuthen posted on Saturday, March 27, 2004  11:37 am



The choice of reference category makes a difference, and it should, because you are discussing different effects with different reference categories. You can change the reference category by changing your starting values. 

Carlos posted on Thursday, April 22, 2004  8:07 am



Hi Linda, Bengt, I would like to use the results from a latent class model conducted in one sample to obtain class probabilities for (and eventually classify) people in a different sample. Do you have any utilities for doing this? Otherwise, do you have any examples I could use, using Excel or other similar tools? Sometimes we use a discriminant function to do this, but we would like to use the results from the latent class model. An example with latent class indicators and covariates would be extremely helpful. The issue gets more complicated because some of our class indicators can be ordered or unordered indicators (although we try to avoid the latter). Thanks! 


What you would do is fix all of the parameters in the model to the values obtained from the first sample. Then run the analysis using the data from the second sample and ask for CPROBABILITIES in the SAVEDATA command. 

Carlos posted on Thursday, April 22, 2004  11:20 am



That's brilliant (and it makes perfect sense)! Thanks so much. C. 


Greetings. I feel like I asked this a couple of years ago, but I can't find it. I've got a four-class model I'm happy with. I'm regressing class membership (c#1, 2, and 3) on a dichotomous treatment predictor. I want to know if the predictor significantly predicts membership in each of the classes, taken singly. That is, are treatment Ss more likely than control Ss to be in class 2? The regression parameter, of course, gives me that in relation to the reference class, class 4. But I want the backdrop to be the aggregate of the other three classes: class 2 vs. anything else. Any suggestions? Thanks. 

bmuthen posted on Wednesday, June 16, 2004  8:57 pm



You can express the probability of being in each of the classes as a function of tx by the usual multinomial logistic regression expression. And through this you can get the sum of probabilities of being in all classes but class 2. So this way you can take the ratio you want and get the point estimate. But I think the log of this probability ratio is not a simple function of the regression coefficients, but a nonlinear function of several coefficients, so to get the SE you have to use the Delta method. Correct me if I am wrong, someone. 
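The point estimate described above can be sketched in Python as follows. The intercepts and slopes are invented for a four-class model with class 4 as the reference, and the helper names are hypothetical. The sketch gives only the point estimate; the standard error would still need the Delta method and the estimated parameter covariance matrix.

```python
import math

def class_probs(intercepts, slopes, x):
    """Class probabilities from multinomial logit coefficients.

    The last class is the reference, so its logit is fixed at 0.
    """
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def odds_vs_rest(p):
    """Odds of being in a class versus all other classes combined."""
    return p / (1.0 - p)

# Hypothetical estimates for c#1, c#2, c#3 on a 0/1 treatment indicator.
intercepts = [0.2, -0.5, 0.1]
slopes = [0.4, 1.0, -0.3]

p_tx = class_probs(intercepts, slopes, 1)   # treatment group
p_ctl = class_probs(intercepts, slopes, 0)  # control group

# Odds ratio for class 2 versus the aggregate of classes 1, 3, and 4.
or_class2 = odds_vs_rest(p_tx[1]) / odds_vs_rest(p_ctl[1])
print(or_class2)
```

Note that this ratio depends on all three slopes and intercepts, not just the class 2 coefficient, which is why its standard error is not read directly off the output.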


Thanks, I'll look into it. Yes, I've got the point estimate. 

Anonymous posted on Thursday, June 24, 2004  3:35 pm



I have a gmm with a dichotomous predictor of class membership (gender). There are no females in one of my classes and the regression coefficient associating this class with gender is 70.8. I also get an error message stating that this coefficient was fixed to avoid singularity. What should I do in this case? 

bmuthen posted on Friday, June 25, 2004  9:09 am



You can report this solution. The large fixed regression coefficient simply means that the probability is 1 for being in this class when the dichotomous predictor is 1 as opposed to 0. The value 70.8 is arbitrary; any large value, say greater than 15 (it depends on the other coefficients' sizes), suffices to give probability 1. 

Anonymous posted on Monday, September 06, 2004  7:06 pm



I have a binary item (X) that I'd like to use as a predictor of class membership. However, I don't want to include it in the GGMM (for my current analysis, I don't want X to affect the latent class solution). Rather, I'd like to get the posterior probabilities of group membership for each class, estimate the weighted proportion on X for each class, and then compare them using a test for two proportions or some other appropriate test. Would this be an appropriate strategy to assess the effect of X on class membership given my need to not include X in the model? 

Anonymous posted on Monday, September 13, 2004  5:46 am



Is it possible to change the reference group for a 4 class model with covariates? The default is for class 4 to be the reference; however, I would like for the reference group to be class 1. Thanks for your help. 


Re: September 6. Because each individual has a fractional membership in every class, putting them in their most likely class only will result in estimation error. Furthermore, the standard errors are incorrect because class membership is treated as observed rather than inferred, which will distort the test of proportions. You might want to look at the paper by B. Muthen in the book edited by Kaplan, which can be downloaded from the Mplus homepage. This discusses covariates affecting class membership. This is not always undesirable. 


You can't change the reference group to be other than the last class, but you can use starting values to make sure the class you want as the reference class is the last class. 


I have 2 questions concerning latent class analysis with covariates. I have a latent class variable (attachment style, 20 indicators) with 3 latent classes. In my model, the LC variable is predicted by 3 continuous latent variables (factors). My first question: as my data are sparse (many indicators for the LC variable), the p-values of the Chi**2 tests provided in Mplus do not seem appropriate for the overall fit assessment of the model. What kind of procedure do you recommend for the fit assessment of such a model? E.g., is there a possibility to obtain a bootstrap p-value for the Chi**2 tests in Mplus (as is possible in other LCA computer programs)? The second question concerns the multinomial logistic regression of the LC variable on the covariates. I understand from the output that my predictors significantly predict my LC variable. However, Mplus does not seem to provide any kind of effect size for the predictor model (like the pseudo-R**2 that is available for multinomial regression, for example, in SPSS) that could be used to judge whether the effect is really meaningful. Do you know of any possibility to compute an effect size measure for such a model? Thank you very much in advance! 

bmuthen posted on Monday, November 01, 2004  8:04 am



Model testing against data is difficult for LCA with covariates: chi-square testing against a frequency table is not possible given the continuous LCA covariates, since that does not provide "grouped" data. I would recommend chi-square testing based on the loglikelihood difference between nested models, e.g. with or without some direct effects from the covariates onto the indicators. Or, in the case of testing k vs. k-1 classes, where parameters are on the border of their admissible space, using the LMR test in Mplus (Tech11). The LMR test is a way to avoid the heavy computations of bootstrap p-values. I think a clear way to understand the estimated effects of covariates on the latent classes is to compute predicted probabilities for the classes at given covariate values. The Mplus plot facility can be used to do this automatically; see e.g. the Version 3 User's Guide, end of Chapter 13. I think that is more clarifying than any type of R-square statistic, which I don't find as natural for non-continuous outcomes. 


Thank you very much for your reply. The LMR test seems to be helpful. But is it possible to compute bootstrap p-values for the LR and Pearson X**2 tests in Mplus for LCA models without covariates? 


Let me add to my previous question: would it be possible via Monte Carlo simulation in Mplus? Thanks, Christian 

bmuthen posted on Sunday, November 14, 2004  11:59 am



Such bootstrapping is not available in the current Mplus. I think the Mplus Monte Carlo facility can only be of limited help here. You have to draw new samples with replacement and estimate your model for each such sample. You can submit a set of such new samples to Mplus using the "external" Monte Carlo facility and thereby get summaries where you can study the distribution of your test statistics in a Monte Carlo run of those samples. 


A paper by Langeheine, Pannekoek & van de Pol (1996) about parametric bootstrapping of goodness-of-fit measures made me think of the possibility of using the Mplus MC facility, as they call the procedure they describe in that article "Monte Carlo bootstrap". In addition, the steps to be performed in this bootstrap ("assume the model is true - treat fitted proportions under the model as population proportions - draw samples from this multinomial distribution with known parameters - estimate the same model for these samples and assess G**2 (LR test) for each sample - reject model if proportion of bootstrap G**2's that are larger than original G**2 is very small." [p. 495]) sound very much like something that should be possible with an MC simulation. Could you elaborate a bit on why you think that this cannot be done in Mplus? Again, thank you very much in advance. Langeheine, Pannekoek, & van de Pol (1996). Bootstrapping goodness of fit measures in categorical data analysis. Sociological Methods & Research, 24(4), 492-516. 

bmuthen posted on Friday, November 19, 2004  11:12 am



I see, you are referring to parametric bootstrapping where you are drawing samples from an estimated model (I was thinking more in terms of nonparametric drawing with replacement from the raw data). Yes, the Mplus Monte Carlo facility can be used for drawing and analyzing such samples. You enter the estimated model as population parameters and draw samples from this across many replications. The Monte Carlo summaries give distributions for the resulting tests and estimates. But the LR test for a frequency table is not part of the summaries. It is only part of the output if you do "external" (as opposed to internal) Monte Carlo in Mplus. External is when you have generated data yourself outside Mplus and then send those data to the Mplus Monte Carlo facility for analysis and summaries. 

bmuthen posted on Friday, November 19, 2004  11:23 am



Note that to circumvent the limited output from an internal run, you can generate your data sets in a first internal run, save them, and then use them in an external run. 


Are you sure? My (internal) MC output for a model with 3 latent classes contains means, SD's and the number of successful computations for the likelihood ratio chi**2 as well as for the Pearson X**2. Or am I missing something here? Another question: How can I get the loglikelihood for the saturated latent class model in Mplus? Thanks, Christian 

bmuthen posted on Saturday, November 20, 2004  10:36 am



You are right; I forgot about this recent addition to Version 3. Regarding the saturated model logL, it is not separately reported, but you could deduce it from the LR chi-square value and the logL for H0, which are both printed, since the LR chi-square is 2 times the logL difference between H1 and H0. 
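Rearranging the relation above, LR chi-square = 2 (logL_H1 - logL_H0), gives the saturated-model loglikelihood directly. A tiny Python sketch with invented numbers (the function name is hypothetical):

```python
def saturated_loglik(loglik_h0, lr_chi2):
    """Recover the H1 (saturated) loglikelihood from printed H0 logL and LR chi-square."""
    return loglik_h0 + lr_chi2 / 2.0

# Hypothetical printed values: H0 logL = -2500.0, LR chi-square = 36.0.
logl_h1 = saturated_loglik(-2500.0, 36.0)
print(logl_h1)  # -2482.0
```

This only recovers the saturated logL exactly if the printed LR chi-square is computed over all cells of the frequency table.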


Your own program can do more than you thought! Isn't that a nice surprise...! Thanks for the information about the saturated model logL. I thought about that possibility too, but isn't it a problem for the computation you suggest that Mplus computes "corrected" chi-squares (my output tells me Mplus has deleted certain extreme values (cells) in the computation of chi-squares)? Thank you, Christian 


We are continually surprised by the unanticipated power of Mplus. I think if you add UCELLSIZE = 0; to the ANALYSIS command, you will avoid having any cells deleted. 


Linda and Bengt, thank you. You were extremely helpful. One last question: The MC outputs for my latent class models give me expected as well as observed proportions and percentiles for the fit statistics. Though I think the observed values should be interpreted, I am somewhat uncertain. Could you briefly explain the difference? Thank you, Christian 

bmuthen posted on Friday, November 26, 2004  6:31 am



The expected proportions in the first column simply indicate which percentage levels are reported, so you are right that you would be interested in the results in the observed proportion column, column two. So for instance, the value 0.05 in the first column simply indicates that it is this value that you want to get close to when looking at the second column value. And, analogously for the percentiles. 

Anonymous posted on Sunday, December 12, 2004  7:34 pm



Dear Bengt, I am running a three-class mixture model with multinomial logit regression on the class membership. I know that the last group (c#3 in my case) is the default reference group. I would like to know how to change the reference group to another group, say Class 2. Thanks in advance!

   MODEL:
   %OVERALL%
   Y1 with Y2;
   [Y1 Y2];
   c#1 c#2 ON x1 x2 x3 x4;
   %c#2%
   Y1 with Y2;
   [Y1 Y2];
   %c#3%
   Y1 with Y2;
   [Y1 Y2]; 

bmuthen posted on Sunday, December 12, 2004  7:38 pm



You have to do this by using starting values that make whatever class you want the last class. So you rerun your analysis giving some key starting values for the last class which you take from class 2 of your current solution. 

Anonymous posted on Sunday, December 12, 2004  9:54 pm



Thanks a lot for the suggestion. I have tried it and got some strange results. When there was no starting value, the correlation between two variables in one class was positive while it became negative when starting values were given. All other fit statistics and parameter estimates were exactly the same except the mentioned correlation. 


Send the two outputs to support@statmodel.com so I can see exactly what you are doing: the output with no starting values and the output where you use the ending values as starting values. 

Judith Baer posted on Tuesday, October 25, 2005  11:24 am



We are trying to determine the significance level for the covariates in an LCA with categorical variables. For example:

   c#1
     male   .963   .095   10.135
     Blk    .432   .138    3.119
     Age    .382   .051    7.433

What is the meaning of .095 for males? Thanks. 


The columns of the output are described in Chapter 17 under Summary of Analysis Results. If the value .095 is in the second column of the results, it is the standard error of the parameter estimate .963. 


We heard that Mplus version 4 would implement a bootstrap method for p-values in latent class analysis. Dayton mentioned that for sparse data the chi-square and G2 differ and should not be used for model identification. Our data have 17 binary manifest variables and the output is as follows:

   Class   Chi-square (p-value)   G2 (p-value)
   2       289355 (0.0000)        57781 (1.0000)
   3       176063 (0.0000)        44941 (1.0000)
   4        95719 (0.0000)        36958 (1.0000)
   5        90990 (0.0000)        31853 (1.0000)
   6        71474 (0.0000)        27569 (1.0000)
   7        60354 (1.0000)        24459 (1.0000)
   8        43462 (1.0000)        22227 (1.0000)

I have tried using the bootstrap option for LCA, and the chi-square and G2 p-values are the same as those provided by the standard output. Have I done it correctly just by specifying the bootstrap option? If not, how can it be done in the current version? Also, does bootstrapping help in addressing the p-value? The AIC, BIC, and adjusted BIC keep dropping up to 12 classes, and we did not go beyond 12 classes because such a large number of classes is hard to explain. We plotted the 3 indices against the number of classes, and the drop in these indices seems to be at a decreasing rate after 7 classes. Do we need to rely on information criteria alone, or should judgement other than statistical be used? Thanks. 

bmuthen posted on Wednesday, December 21, 2005  7:30 am



Yes, the forthcoming Mplus version 4 will give a bootstrap-based p-value for the likelihood-ratio test of k-1 versus k classes. In the current version of Mplus, however, bootstrapping is only for standard errors of parameter estimates and does not help in determining the number of classes. At this point, the BIC drop and the Tech11 Lo-Mendell-Rubin test can be used to help decide on the number of classes, taken together with substantive considerations. 

Kate Degnan posted on Tuesday, January 10, 2006  5:02 am



I am using the mixture modeling to perform a latent profile analysis. I have found interactions to predict the probabilities of membership in the profiles. Since the output gives logit information is it correct to assume that I would interpret/calculate the interaction in the same way I would for a regular logistic regression? 

bmuthen posted on Tuesday, January 10, 2006  8:50 am



Yes. 

Michael P. posted on Monday, March 20, 2006  1:26 pm



Dear Muthens, I have two questions regarding multinomial mixture. 1) If you have h latent classes, you can compute the probability of class membership for only h-1 classes using covariate values. Is it possible to compute this probability for the last class? I know that in a case without covariates, c#h can be computed from the given thresholds of the other classes. 2) Class membership probability uses both thresholds and slopes, but what if some slopes are not significant? Can you leave those out and compute the probability only with the significant slopes? Thanks for any response. 


See Chapter 13, where calculating probabilities from logistic regression coefficients is described. It shows how to compute probabilities for all classes. If a slope is part of the estimated model, it must be used to compute the probability whether it is significant or not. 
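For reference, the standard multinomial logistic expression can be written as below. The symbols (K classes, intercepts a_k, slopes b_k, a single covariate x) are notation chosen for this note, not taken from the User's Guide; the last class is the reference, so its intercept and slope are zero.

```latex
P(c = k \mid x) = \frac{\exp(a_k + b_k x)}{\sum_{j=1}^{K} \exp(a_j + b_j x)},
\qquad a_K = b_K = 0,
\qquad\text{so}\qquad
P(c = K \mid x) = \frac{1}{1 + \sum_{j=1}^{K-1} \exp(a_j + b_j x)} .
```

Because every class probability shares the same denominator, dropping a non-significant slope from the calculation would change the probabilities of all classes, not only the one the slope belongs to.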


I understand that starting values can be changed to control which group is the last class and that models can be run multiple times so that different classes can be used as the reference group in a multinomial logistic regression. However, in some instances all groups can be conveniently compared without running multiple models because "alternative parameterizations" are automatically provided in the output. This helpful element disappears when I mention the variances of variables in the model command. Is there any way to retain the "alternative parameterization" function so that all groups can be compared in just one output? 


I think the option is unavailable when numerical integration is involved given that such models are more computer intensive, and I don't think there is a workaround here short of rerunning with different starting values. 


Hi, I have a question about LCA. I found 4 latent classes in a model using 9 nominal variables. Each variable has 2 or 3 response choices. I want to know if it's possible with Mplus to identify, for each class, which response choices characterize the class. I want to be able to tell that people in class #1 choose category 2 on the first variable and people in class #2 choose category 1. Thank you

   variable: name are id sexe y1-y9;
             usevariables are y1-y9;
             auxiliary = id;
             classes = c(4);
             nominal = y1-y9;
             missing = . ;
   analysis: type = mixture missing; 


In your results, you will get a mean for each class for all but the last category of the nominal variable. You can turn these means into probabilities: p = 1 / (1 + exp(-L)), where L = the mean in the output. 
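A hedged Python sketch of the conversion, written for the general case. It assumes the printed means are multinomial logits relative to the last category, whose logit is fixed at 0; with two categories this reduces to the logistic formula 1 / (1 + exp(-L)). The function name and the numbers are made up.

```python
import math

def nominal_probs(means):
    """Category probabilities from the class-specific means of a nominal variable.

    means -- one value per category except the last (reference) category
    """
    logits = list(means) + [0.0]  # last category is the reference
    peak = max(logits)            # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

two_cat = nominal_probs([1.2])          # hypothetical 2-category item
three_cat = nominal_probs([0.5, -1.0])  # hypothetical 3-category item
print(two_cat, three_cat)
```

Comparing the resulting probabilities across classes shows which response category is most typical for each class.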


Thank you! I have some values at 15.000 with an S.E. of 0.000. What does that mean? Can I use them? Annie 


It means that values became large and were fixed. This should be stated in the output. Yes, you can use it. 


Hi Linda, I have another question: Can I do multinomial logistic regression on nominal variables? I tried it and the output says that I can't have nominal variables on the right-hand side of ON. Thank you 


The NOMINAL option is for dependent variables. If you have an unordered categorical variable that you want to use as a covariate, you need to create a set of dummy variables as in regular regression. If you want it to be a dependent variable, it should be on the left-hand side of ON. See Example 3.6. 

Andy Ross posted on Tuesday, March 20, 2007  6:32 am



Dear Prof. Muthen, I have calculated estimates of the effect of a covariate within EACH of the latent classes rather than for m-1 classes by comparison to a 'reference' class (i.e. standard multinomial regression coefficients). However, I am stuck as to what to call them, and am also looking for a reference for this approach. This is what I have done: I have calculated probabilities from the logistic regression coefficients as outlined in Chapter 13, for an LCA with four classes and 13 covariates, using the MODEL CONSTRAINT command to estimate these new parameters. For example, I have calculated the probabilities of membership in each of the latent classes for women, holding all other predictors at their mean, and have done the same for men. I have then calculated the ratio of these two probabilities, prob(women)/prob(men), in each of the classes, giving me a kind of odds (?!?) associated with being female by reference to being male in each of the latent classes. I also noted that I can get back to the original multinomial odds ratios by taking the ratio of the new estimate for class one (for example) and the last class. However, whilst these give me a clear indication of the effect of each covariate in each of the classes without comparison to a reference class, I have no idea what to call them. Could you please advise me? With many thanks, Andy 


It sounds like you have used the results from a multinomial logistic regression to create probabilities for both males and females and then created odds from those probabilities so that the odds represents the odds of being male versus female in a certain class conditioned on the other covariates. I don't think this has a special name. I would refer to it as an odds. 

Andy Ross posted on Tuesday, March 20, 2007  9:26 am



Excellent  many thanks for that. Do you happen to know of a good reference for this approach? Andy 


I think the Agresti book is good for categorical outcomes. 

Andy Ross posted on Wednesday, March 21, 2007  4:35 am



Many thanks again. 


Hi, I am conducting research that tries to observe changes in musical style over time (1680-1750). Each characteristic I am checking receives a 0 or 1 value per musical work, and in unclear cases an interim value is used (in no field are there more than 2 interim values). I understand I should use ordinal multinomial logistic regression, but I have not managed to get SPSS to plot it on a graph. How can that be done? Please take into account that I am a novice in statistics... Thanks 


I would not know how to do this in SPSS as I don't use that program. You should try directing your question to SPSS support. 


Are there any programs you can recommend that do allow such an option? Thanks, Yoel 


No. 


Can Mplus estimate mixed logit models? 


If by mixed you mean a mixture model, yes. If by mixed you mean a multilevel model, yes. 


I was referring to McFadden, D., Train, K., 2000. Mixed MNL models for discrete response. Journal of Applied Econometrics 15, 447–470, but I believe your answer is still yes. I was also wondering whether it is possible to run a model with exogenous latent variables [metric continuous] and an endogenous variable given by polytomous discrete choices. In other words, can Mplus run a discrete choice model with latent variables?


A multinomial regression with latent variables as covariates can be estimated in Mplus. 


I would like to compare three groups in their delinquency development (3 classes) and associated covariates. For this, I use a model as presented in Example 8.8. However, I only get 1 overall result for the multinomial logistic regression, not 3 logistic regression analyses, one for each group, as I expected (same as in the example output on this homepage). Is it possible to get one result for each subgroup? I would like to compare the regression coefficients between the groups. A second question concerns the reference class in multinomial logistic regression analysis. Despite giving the "right" starting values for the intercept and slope of the 3 classes, I do not always get the classes in the order I need for the logistic regressions. Are there further suggestions for how to deal with this issue? Thank you very much for your support in advance!


In the multinomial logistic regression of a categorical latent variable on a set of covariates, the last class is the reference class. This regression cannot vary across classes. The covariates explain the classes. When there are more than two classes, Mplus gives the results with each class as the reference class. If you have further questions, send your files and license number to support@statmodel.com. 
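
How results with one reference class relate to results with another can be sketched with a small calculation. This is an illustration in Python with hypothetical coefficient values, not a description of how Mplus computes its alternative parameterizations: the log odds of class j versus class k are obtained by subtracting class k's coefficients from class j's.

```python
def rereference(intercepts, slopes, new_ref):
    """Re-express multinomial logit coefficients against class `new_ref` (0-based).

    `intercepts` and `slopes` include the current reference class, whose
    values are 0 by construction.
    """
    a0, b0 = intercepts[new_ref], slopes[new_ref]
    return [a - a0 for a in intercepts], [b - b0 for b in slopes]

# Hypothetical 3-class estimates with class 3 as the reference (zeros last).
intercepts = [0.5, -1.0, 0.0]
slopes = [0.8, 0.3, 0.0]

# Make class 1 the reference instead.
new_a, new_b = rereference(intercepts, slopes, new_ref=0)
print(new_a)
print(new_b)
```

Only the point estimates transform this simply. The standard error of a difference such as b2 - b1 also depends on the covariance between the two estimates, so it cannot be read off the standard errors in the original parameterization.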


Dear Linda, Dear Bengt, I have a model with 2 latent continuous variables and a nominal (3 unordered categories) dependent variable. I regressed the nominal variable on the two latent variables and I've got my results. Now I want to do a multiple group analysis on the same data (I have two different groups), but when using the usual syntax for multiple group models I get this error message:

*** ERROR in ANALYSIS command
ALGORITHM = INTEGRATION is not available for multiple group analysis. Try using the KNOWNCLASS option for TYPE = MIXTURE.

This is my input file:

TITLE: modello 2
DATA: FILE IS sem_2.dat;
VARIABLE: NAMES ARE gruppo dasis1 dasis2 diff difft compeff;
nominal ARE compeff;
GROUPING IS gruppo (1 = IND 2 = DEC);
MODEL:
att_imp by dasis1* dasis2 (1);
att_esp by diff* difft (2);
dasis1 dasis2 (3);
att_imp @1;
att_esp @1;
compeff ON att_imp att_esp;
MODEL DEC:
compeff ON att_imp att_esp;
output: samp stdy stdyx ;

IT IS NOT CLEAR TO ME HOW I HAVE TO USE THE "KNOWNCLASS option for TYPE = MIXTURE." SINCE MY MODEL DOES NOT INCLUDE LATENT CATEGORICAL VARIABLES. Thanks in advance, Claudio


A mixture model with one categorical latent variable for which the classes are known is the same as a multiple group analysis. See Example 8.8. Use only the KNOWNCLASS variable. 


Dear Linda and Bengt, I would like to regress class membership on covariates in my model, but seem to be running into problems. I tried using "c on gender," but the proportions of my classes change wildly. I tried "auxiliary(r)=gender," but Mplus doesn't seem to recognize this (can you only use auxiliary(e)?). I also tried auxiliary=gender, but gender doesn't show up in the output anywhere. The manual suggests using the SAVEDATA option, but I'm unclear how this might be used. What I really want are estimates with odds ratios saying that males have twice the odds of belonging to class 1 as compared to class 2. Thanks! Mary 


If the classes change when you regress the categorical latent variable on a covariate, this indicates that there are most likely direct effects between the latent class indicators and the covariate. The following paper, which is available on the website, discusses this: Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.


Dear Linda, Thanks for the quick turnaround time on your response! I reviewed the article and then found example 7.12, which gives me the code for modeling direct effects of covariates on indicators. Thanks, Mary 

Anjali Gupta posted on Saturday, November 14, 2009  8:36 am



Hello, In my path models with continuous outcomes, the output includes a 'with' measure of correlation for the dependent variables. If my path model has one binary dependent variable, I don't get that correlation, probably due to the binary variable. Is it possible to get some parallel measure for logistic path models? I've illustrated all my models for my dissertation, and it would be nice if they all contained the same amount of information (to avoid concern).


So you are asking about a model with several DVs, one of which is binary. The residual covariance (WITH) appears by default for the WLSMV estimator because then that pertains to the underlying multivariate normal DVs (so analogous to having continuous observed DVs). With ML logit, however, there is not such a natural underlying multivariate distribution. With an extra twist, you could do ML probit and add residual covariance by defining a factor f BY DV1 DV2; f@1; [f@0]; where the second loading carries the information about the residual covariance for DV1 and DV2. This can also be used in the ML logistic context. 

Anjali Gupta posted on Saturday, November 14, 2009  9:28 am



Hello, Thank you for the quick reply. I entered all of the syntax you provided (not sure that's correct): "f BY cesd03 afdc_03; f@1; [f@0];" And it ran, but the coefficients predicting my binary DV changed quite a bit. Maybe I can 'ignore' that and still obtain the needed statistic? And I'm not sure where in the output I'd find the needed covariance. Thank you.


The loading for afdc_03 on f is the covariance estimate. If it is significant, this residual covariance should be in the model and you would have to use the new estimates for the binary DV coefficients. If the residual covariance is significant, the model without it is misspecified and should not be interpreted. Similarly, if you run this using estimator = WLSMV you can test if that WITH should be in the model and you can see how those binary DV coefficients change. Note that you want to include these residual covariances among all your DVs. 


You should also check that the new factor f does not correlate with any other variables in your model; there should be no estimated f WITH... in your output. If you have any, fix them at zero.


Hello, I am in the process of running latent growth mixture models to identify distinct trajectories of change in crime rates from 1981 to 2006 at the county level. I consider four crimes: homicide, aggravated assault, robbery, and simple assault. Each model takes between 70 and 160 hours to run. After the distinct trajectory groups are identified, I would like to explore the extent to which trajectory groups differ in terms of time-variant covariates (population change, changes in residential mobility, poverty change, etc.). What is the best way to examine the impact of these time-varying characteristics within the framework of mixture modeling? Would it be appropriate to use a percent change in population size, poverty rate, and residential mobility between 1981 and 2006 as a time-invariant covariate for explaining the change in crime rate between 1981 and 2006?


In the beginning of your message you say "time-variant covariates", which implies that the covariate changes over the time period that you consider and at each time point influences the outcome at that time point. If this is what you mean, I don't see how such a variable can be related to trajectory class membership, which is something that is stable over time. At the end you say "time-invariant covariate", so either you are then talking about another matter, or that's what you meant to begin with.


I apologize for an unclear question. The initial idea was to incorporate several “time-variant” covariates and examine the extent to which they explain the level of crime rate at each time point (overall 25 time points) for each latent class. However, I am afraid that this approach will give me too much information about each latent class, and I will be unable to draw meaningful conclusions about what makes the latent classes different. Modeling changes in each of the selected covariates (I have 10 covariates) over time and relating them to changes in crime rates across latent classes seems an even more difficult task. At the same time, the idea of substituting “time-variant” covariates with “time-invariant” ones (measured just once in time) seems insufficient for accounting for almost three decades of data on crime rate. That is why I was thinking about a way to account for change(s) in my covariates but without complicating the models too much. For example, one of the variables I will be including is the change in population over time. Is it appropriate to include the relative change in county population size between two data points (1981 and 2006) to predict the class membership and/or the rate of change in crime rate for each latent class?


I think incorporating time-varying covariates is important in studying crime rates. I don't think it will hinder your growth mixture modeling. But you should let the time-varying covariates influence the outcomes, not the latent class membership. You are right, however, that modeling with time-varying covariates involves more complex interpretations than using only time-invariant covariates, and that is true even in a regular, single-class growth model. As an example of the latter, you are asking if a change in a covariate between two time points can be specified to influence the rate of change in crime rate. So a time-varying covariate influencing the slope growth factor. That seems complex. I have worked with a model where a time-varying covariate influences the intercept (level) growth factor, but not the slope. I wonder if you could apply a piecewise model with pieces determined by when large changes in pop size occur. This can be applied also with mixtures, not having pop size influencing class membership, but focusing on the influence on the slope, which is a within-class entity.


A possibility would be to score county pop. size (p) change as the time-varying covariate x_t = p_t - p_(t-1) and let x_t influence the outcome y_t. This means that the growth factors are interpreted as the crime curve for zero change in county pop. size (zero values of the time-varying covariate). This can be used for single-class as well as mixture growth, where for mixtures the influence of x_t on y_t can be different in different classes if you want.
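
As a small illustration of the change-score coding, the following Python sketch builds x_t = p_t - p_(t-1) from a series of population counts. The population figures are made up for the example, and the first time point has no change score because there is no earlier value to subtract.

```python
def change_scores(pop):
    """Return x_t = p_t - p_(t-1) for every time point after the first."""
    return [curr - prev for prev, curr in zip(pop, pop[1:])]

# Hypothetical county population at four consecutive time points.
county_pop = [10200, 10450, 10390, 10800]

x = change_scores(county_pop)
print(x)  # [250, -60, 410]
```

A value of zero in x_t then corresponds to no population change, which is the condition under which the growth factors are interpreted.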


Hi, I am trying to fit a sequence of latent profile analyses (LPA) with covariates that predict latent class membership:
* Model 1 is the LPA without covariates (3 classes).
* Model 2 includes one continuous and two binary observed variables as covariates of the LPA (age, gender, school type).
* Model 3 adds a continuous latent factor as predictor of class membership (competence).
Here are my questions:
(1) As far as I have read in another thread, there are no global fit indices to judge the fit of Models 2 and 3 (right?). I could compare the loglikelihoods and information criteria of the models, but would that be meaningful? I don't think so, because Model 3 is much more complex than Model 2 since it involves an additional latent variable.
(2) Is it possible and meaningful to calculate a measure of explained variance for Models 2 and 3 (e.g. Nagelkerke)? If yes, how could I do that? I know that I would need the likelihood of an intercept-only model, but how do I get this?
Many thanks in advance, Johannes


1. Chi-square and related fit statistics are not available for your models. Models are nested if they contain the same set of observed dependent variables. 2. I'm not sure how you would do this.

Finbar posted on Tuesday, October 19, 2010  6:22 am



Dear Bengt and Linda, I have a multinomial logistic model with a DV containing 5 groups, including the reference group. When I look at the modification indices, the output does not tell me which of the four groups is being referred to. For example, Y# ON... Y# ON... Y# ON... Here I get significant results for three of the four outcome groups. Is there a way to find out which outcome group is being referred to? With four groups, I would imagine that they are in order of 1-4. For my regression results, the output does not give the outcome # either, but they are in order of 1-4. Thank you


The last category of the nominal variable in a multinomial logistic regression is the reference category. I think you need to shorten your variable name so the # is printed. They are printed from lowest to highest. 

Syed Noor posted on Thursday, February 03, 2011  10:29 pm



Hi, I have 2 questions regarding LCA with covariates. I have identified 3 latent classes (C) using 5 predictors (u). Now I want to run a multinomial regression to identify sociodemographic variables that are associated with class membership. 1. One of my covariates, marital status, has three categories. But when I look at the OUTPUT I am seeing only one odds ratio for C#1 vs C#3. Am I looking at the wrong output? 2. If yes, which part of the output will give me the odds ratios with 95% CI for C ON x1 x2 x3? Thanks in advance


You should create two dummy variables for marital status. Use the CINTERVAL option to obtain confidence intervals. 
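
The dummy coding can be sketched as follows. This is a generic Python illustration with made-up category labels and a hypothetical helper, `dummy_code`; in practice the two 0/1 variables would be created in the data file (or with DEFINE) before being used in C ON.

```python
def dummy_code(values, reference):
    """Return the non-reference levels and one 0/1 row per observation."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in values]
    return levels, rows

# Hypothetical three-category marital status; "single" is the reference.
marital = ["single", "married", "divorced", "married"]

levels, rows = dummy_code(marital, reference="single")
print(levels)  # ['divorced', 'married']
print(rows)    # [[0, 0], [0, 1], [1, 0], [0, 1]]
```

With two dummy variables in the regression, each class contrast has two odds ratios for marital status, one per non-reference category.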

Syed Noor posted on Friday, February 04, 2011  12:41 pm



Thank you, Linda. Thank you very much. Syed

Syed Noor posted on Saturday, February 05, 2011  5:50 pm



Hi Linda,

Without covariates:
             2 class    3 class
BIC          7182       7163
p (LMR)      0.00       0.001
entropy      0.88       0.753

With covariates:
             2 class    3 class
BIC          7141       7105
p (LMR)      0.00       0.771
entropy      0.889      0.814

My understanding is that without covariates the 3-class solution fits the data better (lower BIC), but with covariates the 2-class solution fits the data better (p-value for LMR). And I think the final decision should be made with covariates in the model. What do you suggest? Another thing: the 3-class model with covariates gives a very wide 95% CI for some covariates. I think there are some direct effects from x to u, but I am not sure how to model that. Any suggestion? Thanks in advance. Syed


It seems BIC chooses the three-class model in both cases. Direct effects are specified as u ON x;


Profs. Muthen, Can we estimate the Random Parameter Logit or Mixed Logit, as coined by Prof. K. Train, in Mplus (mine is V4.0 + mixture add-on) when the coefficients associated with covariates follow distributions other than the standard normal? E.g. we want to estimate the following model:

Y = g(alpha + beta1*X1 + beta2*X2)
Y is binary (0 and 1)
g(.) is a logit link
alpha follows standard normal
beta1 follows truncated normal
beta2 follows triangular distribution

Thanks and regards, Sanjoy


You can let the coefficients have a normal distribution, but not the other two distributions. 


Hi, I have a question regarding the output of a multinomial regression analysis, where I regressed latent class membership (3 classes) on some characteristics. By default, the last (3rd) class is the reference group, and in the output I can find the estimates and associated p-values. However, beneath that there is also the heading “ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION”, where I can find the output when using another class as the reference group. When comparing the same classes, the estimates are the same (in value, but with reversed sign); however, the S.E. is different and therefore the p-values are different. How is this possible?


Please send the output and your license number to support@statmodel.com. 

Julia Lee posted on Wednesday, April 11, 2012  1:58 pm



I am using LPA with multinomial logistic regression. Is there a reason why the magnitude of the estimates for one class is so much greater than for the others (4 classes vs one referent)? Do local maxima cause this kind of result? I appreciate your response. For example:

            Estimate      S.E.   Est./S.E.   P-Value
C#1 ON
LSF            0.115     0.436       0.265     0.791
P              1.261     1.301       0.969     0.332
RAN            0.134     0.543       0.246     0.805
PM             0.045     0.507       0.089     0.929
OL             0.939     0.478       1.964     0.049
SSRS           1.313     0.673       1.949     0.051
SWAN           0.102     0.327       0.313     0.754
SPEECE         1.035     2.897       0.357     0.721
GENDER         0.253     0.654       0.387     0.699
FRL            1.299     2.212       0.587     0.557
TIER         123.064   379.908       0.324     0.746

C#2 ON
LSF         3758.058  1649.341       2.279     0.023
PA          2112.290   823.188       2.566     0.010
RAN          228.478   124.362       1.837     0.066
PM           243.467   125.532       1.939     0.052
OL          2029.686   851.741       2.383     0.017
SSRS        1080.844   468.392       2.308     0.021
SWAN        2887.610  1144.011       2.524     0.012
SPEECE      1519.415   552.779       2.749     0.006
GENDER      1468.333   674.166       2.178     0.029
FRL         1200.846   438.840       2.736     0.006
TIER        9583.447  3806.868       2.517     0.012


The probabilities in class 2 must be much smaller than those in the reference class. I don't think this is a local solution. Did you replicate the best loglikelihood several times? 

Julia Lee posted on Wednesday, April 11, 2012  4:24 pm



Hi Linda, You are right! The probabilities in class 2 are smaller than those in the reference class. Is there a way to solve this problem? Thanks!

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS BASED ON ESTIMATED POSTERIOR PROBABILITIES

Latent Classes
1     156.71638    0.30080
2      28.44918    0.05460
3     113.43462    0.21772
4     155.98498    0.29940
5      66.41484    0.12748

Julia Lee posted on Wednesday, April 11, 2012  4:27 pm



Hi again Linda, To answer your question, the best loglikelihood (i.e., the first LL on the list) was not replicated, while the 2nd best loglikelihood was replicated 19 times. Is that the cause of the problem? This is what I used for the starts. Thanks again.

STARTS 800 40;
STITERATIONS = 40;


Try STARTS = 2000 500. If you don't replicate the best loglikelihood, you have hit a local solution. 
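
The replication check can be sketched as a small calculation on the list of final-stage loglikelihood values. This Python illustration uses hypothetical loglikelihoods and an arbitrary tolerance; it only shows the logic of counting how many random starts reached the best value.

```python
def best_ll_replications(loglikelihoods, tol=1e-3):
    """Return the best loglikelihood and how many starts reached it (within tol)."""
    best = max(loglikelihoods)
    count = sum(1 for ll in loglikelihoods if abs(ll - best) <= tol)
    return best, count

# Hypothetical final-stage loglikelihoods from five random starts.
final_lls = [-3561.204, -3563.871, -3563.871, -3563.871, -3570.112]

best, count = best_ll_replications(final_lls)
print(best, count)  # -3561.204 1
```

A count of 1, as in this made-up example, means the best value was reached by a single start only, which is the situation where more random starts are called for.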


I've calculated predicted probabilities of class membership from a multinomial logistic regression model with latent classes on binary covariates. I was wondering how I get confidence intervals to examine if the differences in probabilities between groups are significant? 


If you specify the predicted probabilities in MODEL CONSTRAINT, you will get a standard error which can be used in computing confidence intervals. You can also ask for them by using the CINTERVAL option of the OUTPUT command. 


Dear Linda, I ran a multinomial logistic regression with continuous and binary predictors and continuous mediators. I've used the MODEL CONSTRAINT option to specify the indirect effects. The output, however, produced equal estimates for each category of the outcome. Do you have any idea why? Thanks, Mario


Please send the output and your license number to support@statmodel.com. 


Dear Professors Muthén, I would like to constrain two of my regression coefficients to be equal in my multinomial logistic regression. I have managed to make the desired coefficients equal; however, they are also equal across categories and not just within categories... How should I modify the following syntax in order to have different estimates between categories, but equal estimates within categories?

nominal is jp;
MODEL:
jp ON age fHU96 fHU05 fHU08 youngA transgen statshift;
jp on lostgen1 (1) lostgen2 (1);

Thank you in advance for your help, Zsofia Ignacz


If jp is nominal, you refer to its categories by jp#1, jp#2, etc.


Dear Professor Muthen, Thank you very much for your answer! It works perfectly! Kind regards to you, Zsofia Ignacz 


I am running a multinomial logistic regression by regressing a 3-class GMM on covariates. I saw from the previous posts that the way to change the reference group is to change the starting values. So I did the following, but I got the exact same results. Did I miss anything?

c#3 as reference group:
%c#1% [i*31 s*1.5];
%c#2% [i*28 s*.1];
%c#3% [i*40 s*0];

c#2 as reference group:
%c#1% [i*31 s*1.5];
%c#2% [i*40 s*0];
%c#3% [i*28 s*.1];

c#1 as reference group:
%c#1% [i*40 s*0];
%c#2% [i*28 s*.1];
%c#3% [i*31 s*1.5];


Please send the output before you added the starting values and the output with the starting values along with your license number to support@statmodel.com. 


Hi Linda and Bengt, I see under EXAMPLE 3.6: MULTINOMIAL LOGISTIC REGRESSION in the current manual that one can use code like this (below) when one wants to give starting values or place restrictions on the parameters.

MODEL: u1#1 u1#2 ON x1 x3;

How might I adjust this code to provide starting values? My ordered dependent variable has 3 levels. I assume I would adjust it to reflect these 3 levels, as in:

MODEL: u1#1 u1#2 u1#3 ON x1 x3;

But can you give an example of what starting values to use, and where to place them in this code?


Linda and Bengt, in the post above, I should not have referred to the dependent variable as having ordered levels; it is a nominal (unordered) variable. Also, I should add: The purpose of the starting values is to make the second class the reference class. By default the reference class is the third class. But I also want the odds ratio for Class 1 vs. Class 2. What starting values should I add to the code above, and where would I put them? Thank you!


With observed nominal variables you don't change the order of categories by starting values as you do with latent categorical (latent class) variables. You can instead change the scoring of your observed categories so that the highest score corresponds to your reference class, because the highest score category is chosen as the last, reference, category.


Thank you, Bengt, for your help! 


I am looking for any advice about calculating confidence intervals for predicted probabilities of class membership from a multinomial logistic regression model. I have calculated the predicted probabilities and used the CINTERVAL option to get lower and upper estimates of the coefficients. However, I’m not sure how to use these to estimate 95 percent CIs around the predicted probabilities, e.g. do I use the lower estimate for the intercept with the lower estimate for the coefficient? I appreciate this does not relate directly to Mplus; however, I’m struggling to find an answer. Any help with getting to the right calculation or a useful reference would be really appreciated. Thank you in advance. Jen


You would need to use MODEL CONSTRAINT to get the predicted probabilities for the values of x you are interested in. You will then get standard errors you can use for confidence intervals. 
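
To show why a standard error for the probability itself is needed, here is a minimal delta-method sketch in Python. Everything numeric in it is hypothetical (the estimates, their covariance matrix, and the covariate value), and treating this as the same calculation Mplus performs for MODEL CONSTRAINT parameters is an assumption. The point is that the interval depends on the covariances among the estimates, so pairing the lower limit of the intercept with the lower limit of the slope does not give a valid interval.

```python
import math

def class_prob(theta, x, k):
    """P(class k | x) for 3 classes; theta = [a1, b1, a2, b2], class 3 is reference."""
    logits = [theta[0] + theta[1] * x, theta[2] + theta[3] * x, 0.0]
    denom = sum(math.exp(v) for v in logits)
    return math.exp(logits[k]) / denom

def delta_method_se(theta, cov, x, k, h=1e-6):
    """Approximate SE of class_prob via a numerical gradient and grad' * cov * grad."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        grad.append((class_prob(up, x, k) - class_prob(dn, x, k)) / (2 * h))
    n = len(theta)
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(n) for j in range(n))
    return math.sqrt(var)

# Hypothetical estimates and covariance matrix (order: a1, b1, a2, b2).
theta = [0.5, 0.8, -1.0, 0.3]
cov = [[0.040, -0.010, 0.005, 0.000],
       [-0.010, 0.090, 0.000, 0.004],
       [0.005, 0.000, 0.060, -0.012],
       [0.000, 0.004, -0.012, 0.100]]

p = class_prob(theta, x=1, k=0)           # probability of class 1 when x = 1
se = delta_method_se(theta, cov, x=1, k=0)
lo, hi = p - 1.96 * se, p + 1.96 * se     # symmetric 95% interval
print(f"p = {p:.3f}, se = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

A symmetric interval like this can run outside the 0 to 1 range when the probability is near a boundary; computing the interval on the logit scale and transforming back is a common alternative.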


Dear Linda, Thank you for your response. I'm finding it difficult to work out how to use the MODEL CONSTRAINT command to get the predicted probabilities. Would you be able to provide any further instructions or reference an example? Kind regards, Jen 


See Example 5.20 to see how MODEL CONSTRAINT is used. See also MODEL CONSTRAINT in the user's guide. See pages 492-497 of the user's guide to see how to compute predicted probabilities.


Hello, I conducted a multinomial logistic regression in which I regressed a Time 2 observed continuous variable on a Time 1 latent categorical variable (4 classes). I am now interested in examining a Time 1 observed continuous variable as a moderator. If this is possible, can you please provide me with some direction? XWITH and KNOWNCLASS commands don’t seem appropriate here. Thanks!


I don't understand where the multinomial logistic regression comes in. 


I'm sorry I wasn't clear. Substantively, I'm interested in knowing, for example, whether high scores on peer rejection are associated with increased odds of membership in a victim class relative to a bully class. Logistic regression results suggest yes. Now, I would like to investigate whether the association between high scores on peer rejection and the odds of being identified as a bully vs. victim depends on bullies' and victims' aggression scores. So, I would like to know whether I can examine aggression, an observed continuous variable, as a moderator. Does this make sense? Thank you.


It sounds like you are saying, using regular mediation notation:

x = peer rejection score
m = aggression score
y = victim/bully binary variable

I may have misinterpreted, because this does not jibe with your first message, which said that Time 2 was a continuous DV and Time 1 was categorical with 4 classes.


Thanks for your help, and I am very sorry for the confusion. Let me try to clarify. First, I conducted LPA at Time 1 and obtained a 4-class solution (thus the categorical latent variable I referenced in the initial post). Second, I conducted the regression as stated in the first post. I realize that x = T2 peer rejection, m = T1 aggression, and y = T1 categorical latent class (bully, victim, bully-victim, uninvolved) is temporally backwards. In view of this, I guess my first question is, could it be argued that this analysis makes conceptual sense in terms such as: high scores on T2 peer rejection are associated with increased odds of membership in one class compared to another at T1? If yes, can I examine aggression as a continuous moderator (not mediator)? If no, might it be possible to examine a continuous moderator (again, let's say Time 1 aggression) of the association between the Time 1 latent categorical variable (with 4 classes) and a distal T2 outcome (such as peer rejection)? So, in this case, let's suppose a Wald test revealed that victims scored higher than bullies on rejection at T2. Can I examine whether this association depends on bullies' and victims' aggression scores (a continuous measure), or need I dichotomize aggression and use KNOWNCLASS? Thanks very much for your continued help.


You can have a T1 latent class variable c (4 classes) and a distal T2 continuous outcome y (peer rejection), moderated by a T1 continuous aggression score z. You would do this by

Model:
%overall%
y on z;
%c#1%
y on z (b1);
...
%c#4%
y on z (b4);

You can test equality of b1-b4 using MODEL TEST.
