Anonymous posted on Thursday, May 11, 2000 - 2:56 am
I would like to estimate a structural equation mixture model, where y is the dependent and x1 - x4 are the independent latent variables. The path coefficients of the structural model should vary across latent classes. When I try to run the following specifications, class size is always half of the sample size and no model fit measures are reported. Can you help me with the model specifications?
In mixture modeling you want to give starting values for parameters that distinguish the different latent classes. This means that your class-specific statements %c#1% and %c#2% should give starting values reflecting your beliefs about how the two classes differ. I think your run gets stuck due to the non-differentiated starting values, simply dividing the sample in halves.
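A minimal sketch of what differentiated starting values could look like for a model of this kind. Only the factor names y and x1-x4 come from the question; the indicator names and every starting value below are hypothetical and would need to reflect your own beliefs about how the classes differ.

```mplus
MODEL:
  %OVERALL%
  ! measurement part (indicator names are assumed for illustration)
  y  BY y1-y3;
  x1 BY x11-x13;
  x2 BY x21-x23;
  x3 BY x31-x33;
  x4 BY x41-x43;
  ! structural part
  y ON x1 x2 x3 x4;

  %c#1%
  ! class 1: x1 and x2 assumed to matter most (hypothetical starts)
  y ON x1*0.6 x2*0.6 x3*0.1 x4*0.1;

  %c#2%
  ! class 2: x3 and x4 assumed to matter most (hypothetical starts)
  y ON x1*0.1 x2*0.1 x3*0.6 x4*0.6;
```

Mentioning the slopes in both class-specific parts also frees them across classes, which is what the question asks for.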
Anonymous posted on Wednesday, August 16, 2000 - 3:34 pm
I have your paper on Second-Generation Structural Equation Modeling (#82) and printouts of the corresponding Mplus examples (penn1-penn7) and I am now trying to use them as a model for my own data.
How did you set the starting values used in these lines in penn1: math7-math9*7 math10*13; slope with intercpt*3.1; [intercpt*42.8 slope*.6];
Here are two strategies for obtaining starting values. How the ones in the Second Generation paper were obtained has been long forgotten.
· Strategy 1
· Do a conventional one-class analysis
· Use estimated growth factor means and standard deviations as growth factor mean starting values in a multi-class model - mean plus and minus .5 standard deviation
· Strategy 2
· Estimate a multi-class model with the variances and covariances of the growth factors fixed to zero
· Use the estimated growth factor means as growth factor mean starting values for a model with growth factor variances and covariances free
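The two strategies can be sketched as follows. These are two separate input fragments, not one input file. The outcome names y1-y4, the time scores, and the one-class estimates (intercept mean 40.0 with SD 8.0, slope mean 0.6 with SD 0.4) are hypothetical, and the sketch uses the `|` growth syntax of later Mplus versions rather than the BY statements of the original examples.

```mplus
! Strategy 1, multi-class run.
! Assumed one-class estimates (hypothetical):
!   intercept factor: mean 40.0, SD 8.0;  slope factor: mean 0.6, SD 0.4
MODEL:
  %OVERALL%
  i s | y1@0 y2@1 y3@2 y4@3;
  %c#1%
  [i*36.0 s*0.4];   ! means minus .5 SD
  %c#2%
  [i*44.0 s*0.8];   ! means plus .5 SD

! Strategy 2, first run:
! growth factor variances and covariance fixed at zero
MODEL:
  %OVERALL%
  i s | y1@0 y2@1 y3@2 y4@3;
  i@0 s@0;
  i WITH s@0;
! Second run: remove the two fixing statements and supply the class
! means estimated in the first run as starting values, in the same
! way as in the Strategy 1 fragment.
```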
Peter Tice posted on Thursday, August 24, 2000 - 8:31 am
Starting values for multi-class models
Yes, I've been working on finding starting values for later identifying the best class model (based on BIC value). Thus far, I'm using the following strategy. I set the growth factor means at zero and for the individual class components provide an intercept value based on an estimated mean and +/- .5 standard deviation. In many ways I'm combining the two strategies listed in the message dated August 17th. I'd like to continue setting the growth factor means at zero, since my preference is to assume no within-class variation. Nonetheless, with this strategy I'm comfortably able to fit multi-class models including up to 4 latent classes. However, when continuing further with models including 5 or more classes, I frequently receive the message that Mplus is unable to calculate standard errors (either a result of incorrect starting values or model identification).
Below is an example of a program that successfully converges.
title: mixture model example
data: file is u:\example.dat ;
variable: names are id t1age del1 del2 del3 del4 ;
  useobservations = t1age = 12 ;
  usevariables are del1 del2 del3 del4 ;
  classes = c(4) ;
  missing are . ;
analysis: type = mixture ;
  miterations = 1000 ;
model: %overall%
  i by del1-del4 @1 ;
  s by del1 @0 del2 @3 del3 @6 del4 @13 ;
  q by del1 @0 del2 @9 del3 @36 del4 @169 ;
  [del1 @0 del2 @0 del3 @0 del4 @0] ;
  i @0 s @0 q @0 ;
savedata: file is u:class ;
  save = cprobabilities ;
bmuthen posted on Thursday, August 24, 2000 - 9:40 am
A combination strategy for starting values seems reasonable. In your message, I think you mean to say that the growth factor variances, not their means, are fixed at zero; your input reflects this. Regarding problems with calculating standard errors, that often corresponds to a non-identified model (a problematic, non-positive definite Hessian matrix). It may well be that 5 classes is too much to identify from 4 time points of data. More research needs to go into mixture identification matters such as these. At the same time, with mixtures it is harder to judge if the Hessian is problematic due to non-identification or for other reasons, in which case the problem can be avoided by using better starting values.
bmuthen posted on Thursday, August 24, 2000 - 11:04 am
Adding to my previous message, it should not be implied that there must be more variables than classes in mixture modeling. What can be identified depends on the specific mixture model and the data. For example, there are many classic examples where many classes are found with only a single outcome variable. There is often, however, lack of empirical identification where there is not enough information in the data to support a certain number of classes.
Hi. I am running several latent trajectory class models and have a general question. What syntax would I need in order to estimate a quadratic factor in one class but not the other class in a typical two-class model? I believe I heard/read that this is possible but want to make sure that I am specifying the model correctly.
When you want a quadratic growth factor in one class but not in the other, then fix the mean, variance, and covariances of the quadratic growth factor with all of the other growth factors to zero. For example, if class 2 has no quadratic growth factor,
where i is the intercept growth factor, s is the linear growth factor, and q is the quadratic growth factor.
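A minimal sketch of the class-specific statements described above, assuming a two-class quadratic growth model with growth factors named i, s, and q. The outcome names y1-y4 and the time scores are hypothetical.

```mplus
MODEL:
  %OVERALL%
  i s q | y1@0 y2@1 y3@2 y4@3;

  %c#2%
  [q@0];            ! quadratic factor mean fixed at zero
  q@0;              ! quadratic factor variance fixed at zero
  q WITH i@0 s@0;   ! covariances of q with i and s fixed at zero
```

Class 1 is left unmentioned here, so its quadratic factor keeps whatever the %OVERALL% part specifies.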
Taj Carson posted on Wednesday, January 23, 2002 - 7:28 am
I am working on an evaluation design that involves using a structural equation model with latent classes. However, I would like to measure change over time in the outcome measures, but will be using a cross-section of people measured at two different time points. Can I use MPlus given this data structure, and the fact that the data from time 1 is from different individuals than the data from time 2?
Anonymous posted on Tuesday, May 13, 2003 - 10:02 am
I have some questions regarding the modeling of latent class measurement models (LCMMs) in Mplus, in the case where the LCMM is posited as an intervening variable between a set of X variables and a series of distal outcomes (Y variables).
First, I’ve noticed that Mplus allows one to specify that the effects of X on Y are fixed or vary across latent classes. Is it the case that when the X effects vary by latent class they can be interpreted as interaction effects (X interacts with LCMM class membership) ?
Second, related to the question above, if the distal outcome (Y variable) is continuous, Mplus allows the means and variance of Y to vary by latent class. The variation in means across latent classes is straightforward, but if the LCMM is being used as a traditional intervening variable, how is the variation in variances to be interpreted ?
Third, if the LCMM (call it L) is concurrent with another intermediating outcome (call it H), both of which are allowed to have effects on a set of Y variables, is it possible to specify that the errors of L and H are correlated ?
Finally, if the distal outcome variable (call it OCAT) is an ordered categorical variable with greater than 2 categories, are the threshold terms OCAT$1, OCAT$2, OCAT$3, etc interpreted as conditional probabilities of some sort, i.e., p(outcome variable level = i versus the baseline | class membership = t) ?
bmuthen posted on Tuesday, May 13, 2003 - 10:12 pm
2) Variation in variances is a function of x predicting class (group) membership, where in each group any parameter including y variances may be group-specific - so a more general form of mediation.
3) The Latent Class model does not have errors in the conventional sense, but residuals from H could be made to have direct influence on latent class indicators beyond the latent classes.
4) Ordered polytomous outcomes are modeled using the proportional-odds model (Agresti, 1990, pp. 322-324), which says that there are parallel logit lines for the probabilities of outcome C, of C or C-1, etc., where C is the highest category.
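One way to read the thresholds, written out as a sketch. It assumes the logit parameterization with no covariate effect on the outcome within class; the notation (u for the ordinal outcome coded 0, 1, ..., C-1, and \tau_{k,t} for threshold k in class t) is introduced here for illustration only. Under these assumptions a threshold is the logit of a cumulative class-conditional probability rather than a probability itself.

```latex
P(u \le k-1 \mid c = t) = \frac{1}{1 + \exp(-\tau_{k,t})},
  \qquad k = 1, \dots, C-1,

P(u = k \mid c = t) = P(u \le k \mid c = t) - P(u \le k-1 \mid c = t),
  \qquad P(u \le -1 \mid c = t) = 0, \quad P(u \le C-1 \mid c = t) = 1 .
```

A large positive first threshold in a class therefore corresponds to a high probability of the lowest category in that class.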
Anonymous posted on Monday, December 22, 2003 - 11:50 am
Could you please describe or provide me with a reference for the EMA algorithm that is now the default algorithm for mixture SEM? In my experience I find it to be faster and equally good as EM, but I would like to know how it is working. Thanks in advance.
bmuthen posted on Monday, December 22, 2003 - 12:12 pm
No reference, but the algorithm simply switches away from EM when EM has given little change in the log likelihood for a couple of iterations, and then instead uses quasi-Newton or Fisher scoring optimization for a while.
Anonymous posted on Wednesday, April 28, 2004 - 2:07 pm
I'm interested in mixture factor analysis with binary or ordinal dvs. I find it difficult to conceptualize how one infers from binary or ordinal indicators the presence of a mixture distribution of continuous latent factors. It would be useful to see a paper that goes into some detail on this, if you would be able to provide a reference.
bmuthen posted on Thursday, April 29, 2004 - 6:14 pm
This is at the research frontier and I am not aware of a paper that details this - our own writings and formulas behind the software algorithms are not yet ready for dissemination (but soon). You should conceptualize this analogously to how you conceptualize doing the analysis for continuous outcomes. A mixture of continuous latent factors gives rise to a non-normal latent variable distribution and such a distribution can fit some data better than using a normal distribution.
gdeitz posted on Saturday, July 17, 2004 - 7:33 pm
For my dissertation, I am interested in using a finite mixture SEM approach to devising an organizational taxonomy and then comparing the fit to that of measures of established conceptual typologies. The SEM model will involve 5 predictor (latent) variables and two (observed) dependent variables. I have a model paper, but I believed the authors used a competitor's software (although it is not so stated).
Based on what I've seen of MPlus, I think it will do what I need it to do. However, in looking at the examples and training videos, I'm not entirely confident that I've seen anything that exactly addresses what I'd like to do. (Of course, being new to LCA, maybe I'm just not understanding what I'm reading? ;) ). Can you put my mind at ease before I place the order for a student license? Thanks.
Look at the following papers, which can be downloaded from the Mplus website under Mplus papers. I think they are in the general area that you are interested in.
Lubke, G. & Muthén, B. (2003). Performance of factor mixture models. Under review, Multivariate Behavioral Research.
Lubke, G. & Muthén, B. (2003). Investigating population heterogeneity with factor mixture models. Under review, Psychological Methods.
Anonymous posted on Sunday, December 12, 2004 - 7:16 am
I am conducting a mixture model. I would like to know how weights are handled in mixture models and the effects of weights on the estimated class probabilities. I cannot find the details in the manual and in the technical appendix. Thanks in advance!
bmuthen posted on Sunday, December 12, 2004 - 11:08 am
The handling of weights is described in Mplus Web Note #7 shown at
Hi, I am estimating a latent class SEM with multiple classes and I would like to set up a class in which none of the IVs affect the DV, so that the regression equation for that class has just a constant on the right-hand side. (For the remaining classes I would like the constant and the effects as well.) How do I estimate the equation intercept?
If I understand correctly, you would specify the regression equation in the %OVERALL% part of the model and then fix the regression coefficients to zero in the class for which you want only the mean of the dependent variable to be estimated.
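A minimal sketch of this setup, assuming two classes, one dependent variable y, and three independent variables x1-x3 (all names hypothetical):

```mplus
MODEL:
  %OVERALL%
  y ON x1 x2 x3;

  %c#1%
  y ON x1@0 x2@0 x3@0;   ! no effects of the IVs in this class
  [y];                   ! intercept, which here is the class mean of y

  %c#2%
  y ON x1 x2 x3;         ! class-specific slopes
  [y];                   ! class-specific intercept
```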
anonymous posted on Friday, February 17, 2006 - 4:11 am
Hi, I am intending to use SEMM, with a categorical latent class variable c as a predictor of a number of continuous latent variables f. My question relates to the measurement model part of this analysis. More specifically, how do I integrate the latent class variable into the measurement model? I understand that LCA is a measurement model in itself, so do I still need to include the categorical latent c in the measurement model (CFA) of the continuous latent variables, and if yes, how? Or am I getting this all wrong? Also, can you point me to any paper that has used SEMM?
The factor would be a distal outcome like in Example 8.6 which shows an observed variable as a distal outcome. You would just have a factor as a distal outcome and the variation of the factor means over classes are the parameters of interest. See the following paper which can be downloaded from our website:
Lubke, G. & Muthén, B. (2003). Investigating population heterogeneity with factor mixture models.
It has been published in Psych Methods with a 2005 date I believe.
katharina posted on Thursday, March 02, 2006 - 1:55 am
When estimating a factor mixture model, am I correct in assuming that Mplus by default constrains the parameters required for strict factorial invariance to be equal across classes, while furthermore letting the factor means vary across classes (with the last class receiving a factor mean of zero) by default?
I'm trying to fit a mixture-SEM. Approximately how long does it take to get a solution? As a first step I allowed only one equation (sr) to differ across classes, but I would like to let the whole structural model vary. I let the model run for more than 1 hour and didn't get a solution (on firstname.lastname@example.orgGHZ); the one-class model runs in about 5 seconds. Or is the model too complex to get a mixture solution? Are there any tricks to speed up the calculation?
VARIABLE: USEVARIABLES ARE pt1 pt2 pt4-pt6 sr1 sr2 sr4 sr5 sr7 sr9 sr10 joy1-joy7 ang1-ang6 kog1 kog3-kog6 att2 att4-att7 int2 int4-int7 loy2-loy4 loy6 loy7 per rab mind;
  CATEGORICAL ARE per rab mind;
  CLASSES=c(2);
ANALYSIS: TYPE=MIXTURE;
  ESTIMATOR IS MLR;
  ALGORITHM=INTEGRATION;
MODEL: %OVERALL%
  pt BY pt1 pt2 pt4-pt6;
  sr BY sr1 sr2 sr4 sr5 sr7 sr9 sr10 ;
  joy BY joy1-joy7;
  ang BY ang1-ang6;
  kog BY kog1 kog3-kog6;
  att BY att2 att4-att7;
  loy BY loy2 loy3 loy4 loy6 loy7;
  int BY int2 int4-int7;
  pt ON per mind rab;
  joy ON sr pt;
  ang ON sr pt;
  kog ON sr pt per mind rab;
  att ON joy ang kog;
  int ON att;
  loy ON att int;
  %c#1%
  sr ON per*0.298 rab*0.01 mind*-0.2;
  %c#2%
  sr ON per*0.2 rab*0.2 mind*-0.1;
If you take the covariates off of the CATEGORICAL list, you won't need numerical integration and things should be much faster. If this does not solve your problem, please send input, data, output, and your license number to email@example.com.
C. Sullivan posted on Tuesday, November 20, 2007 - 6:21 am
I am trying to run a structural equation model with (like ex. 7.19) a few (continuous) latent predictors and a fully endogenous latent class variable. I've been able to get the separate measurement models (2 CFAs and an LCA) to run with reasonable solutions but get a "fatal error...reciprocal interaction problem" message when I try to put the models together. Is there anything I can do to correct this problem?
Stephan posted on Thursday, November 29, 2007 - 7:41 pm
Latent Class/Latent Profile Analysis
Hello, in his paper #86 Prof. Muthén writes on p.1 "(..)data consists of different groups(..)but group membership is not observed" I investigate the collaboration between universities and commercial blue chip-companies. Let’s say 5 exogenous LV and 1 endogenous LV, all continuous. (…) F6 ON F1-F5; (…) However, beside various unobserved population variables I assume that several unis rated their relationship to the same blue chip-company. Due to confidentiality the data set has no matching variable but I believe that the sample is not independent. My question: (1) Is latent profile analysis a valuable tool to take this data set drawbacks into account? (2) If yes, and the outcome will be reasonable can I say that there are invariances between groups but I can only assume what the causes are (same collaborator, country, size, age,…)? (3) In his paper “Maryland keynote v21” Prof. Muthén refers on p.4 to Lubke/Muthén (2005) and I was wondering if this paper is also available but could not find it. It’s not on the reference list.
Any suggestions are appreciated. Many thanks in advance. -Stephan
This can be fine depending on how well separated the classes are. If they are not well-separated it may not be enough. Basically it depends on the data. You can do a Monte Carlo simulation based on the specifics of your data to see.
Hao Duong posted on Tuesday, October 07, 2008 - 9:46 pm
Dr. Muthen, When I run the model with three classes, by default the covariances (WITH) between intercept and slopes (2 slopes) are similar across classes. However, I would like to examine them freely since I expect they may be different across classes. Is this option possible in Mplus? If yes, please explain. Thank you Hao
The default is to hold them equal across classes. To relax the equality constraint, mention them in the class-specific parts of the MODEL command.
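A minimal sketch of what mentioning the covariances in the class-specific parts could look like for three classes. The growth factor names i, s1, and s2 are assumed; the growth model itself is left as a placeholder comment.

```mplus
MODEL:
  %OVERALL%
  ! growth model defining i, s1, and s2 goes here, as in your input

  %c#1%
  i WITH s1 s2;
  s1 WITH s2;

  %c#2%
  i WITH s1 s2;
  s1 WITH s2;

  %c#3%
  i WITH s1 s2;
  s1 WITH s2;
```

The same approach applies to the growth factor variances if those should also differ across classes.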
Vlad posted on Tuesday, February 02, 2010 - 4:44 am
Hello, I am following example 7.27, page 176, from the Mplus book with an additional structural equation, ans on f, within each class (I have 2 classes in my model). f is a latent variable which varies between classes. It appears that in class 2 the coefficient (ans on f) and the variance of f are not significant. Moreover, the variance of f is close to zero. Thus, I restricted the coefficient and the variance of f in this class to equal zero. However, once the model is estimated with the new restrictions, the classes switch places. In other words, class 2 in the new model (with restrictions) represents the sample that was previously classified as class 1. As a result, the restrictions (coefficient and variance of f = 0) are applied to a class that is not the one I am interested in. I have also tried to give starting values for each class but it doesn't work; the classes still switch. Do you have any suggestion for how I can test my restrictions for the particular class? Regards, V
In Mplus, I use both latent profile analysis (c=4) and a structural equation mixture model (one continuous latent variable and one categorical latent variable; all observed variables are continuous, c=4). I want to know why the number of latent classes in LPA is not equal to the number of latent classes in SEMM.
Dr Muthen, my research investigates whether the relationship between 2 antecedents and 3 performance measures (outcomes) is influenced by the adoption of certain practices by a set of firms.
I have built a measure of the adoption of these practices using 11 items. My scale measures how frequently firms implement those practices along a three-stage new product development process. The scale is meant to capture the adoption of those practices both in terms of intensity (i.e. frequency) and scope (how many phases). Since literature indicates that both intensity and scope of adoption should have relevant consequences, I wish to retain both dimensions in my model. For this reason, I am thinking about a model that allows me to identify groups of firms that display similar adoption patterns, and to study the "antecedents-outcomes" relationship within each group.
Put simply, I am thinking about a mixture model where a categorical latent variable moderates the "antecedents-outcomes" relationship (intercepts and slopes would be free to vary across classes). In this model: - the categorical latent variable would be measured by the 11 adoption items - the two antecedents would be measured by their respective scales - the three outcomes would be measured using summated scales from EFA.
I would really appreciate your opinion on my modeling approach. Does it look ok? Thanks a lot!
It seems alright to me. As a first step, you may want to do LCA of the 11 items alone and see if the latent classes make substantive sense. When doing the full model, you would let the slopes of outcomes regressed on antecedents vary across the latent classes (as well as the default class variation in their means).
Thank you Bengt. I followed your advice and ran an LCA of the 11 items. I also added a few covariates using the aux(e)/aux(r) commands as I was interested in investigating predictors of class membership.
In the full model I let slopes and means vary across the classes as you suggested. Now I would like to test for differences in slopes and means. I am assuming that a chi-square model difference test between a constrained and an unconstrained model is the way to go. In this case, I will have to take into account (2x) the difference in loglikelihood values (considering the scaling correction factor) and the difference in free parameters between the two models to obtain the chi-square statistic. Correct? Thanks!
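For reference, the scaled loglikelihood difference computation described in this post can be written as follows. The subscript 0 denotes the constrained (nested) model and 1 the unconstrained model; L is the loglikelihood, c the scaling correction factor, and p the number of free parameters. The symbol names are notation introduced here.

```latex
cd = \frac{p_0 c_0 - p_1 c_1}{p_0 - p_1},
\qquad
TRd = \frac{-2\,(L_0 - L_1)}{cd},
\qquad
df = p_1 - p_0 .
```

TRd is then referred to a chi-square distribution with df degrees of freedom.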
Thank you Linda. I will use the MODEL TEST command as you suggested. I am wondering if there's any way in which I can instruct MPLUS to hold the slopes for one equation equal across all classes and test it using the MODEL TEST command.
I have 3 equations per class, with three independent variables per equation, and a total of four classes. The independent variables are the same across the 4 classes.
I would have used an input like this:
(sx_y is the label I used for slopes, where x indicate class number and y refers to the independent variable for which I am testing equality of slopes)
However MPLUS allows only one equality sign per line.
So instead I have used this input: s1_1=s2_1; s1_1=s3_1; s1_1=s4_1; s2_1=s3_1; s2_1=s4_1; s3_1=s4_1;
Is this the right way to code it? I am interested in a joint test first, and then eventually in looking for pairwise differences (similar to an ANOVA test with post-hoc analyses). Thank you very much!
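On the coding question: equality of four slopes implies only three non-redundant restrictions, so listing all six pairwise equalities repeats information and may make the joint Wald test singular. A minimal sketch of one non-redundant coding follows. The variable names y1 and x1-x3 are hypothetical and the labels follow the sx_y scheme above; treat this as an illustration, not as a reply from the Mplus team.

```mplus
MODEL:
  %OVERALL%
  y1 ON x1 x2 x3;

  %c#1%
  y1 ON x1 (s1_1);
  %c#2%
  y1 ON x1 (s2_1);
  %c#3%
  y1 ON x1 (s3_1);
  %c#4%
  y1 ON x1 (s4_1);

MODEL TEST:
  ! joint test that the x1 slope is equal in all four classes
  0 = s1_1 - s2_1;
  0 = s1_1 - s3_1;
  0 = s1_1 - s4_1;
```

Pairwise follow-up tests would each use a single restriction in MODEL TEST, one run per pair.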
xianhuazeng posted on Friday, February 24, 2012 - 6:33 am
I am following example 7.19. Here u1-u4 are continuous variables, c is a 3-class latent variable, and I want to regress the continuous latent variable on the categorical latent variable:
TITLE: this is an example of
DATA: FILE IS ex7.19.dat;
VARIABLE: NAMES ARE u1-u8;
  CATEGORICAL = u5-u8;
  CLASSES = c (3);
ANALYSIS: TYPE = MIXTURE;
  ALGORITHM = INTEGRATION;
MODEL: %OVERALL%
  f BY u1-u4;
  f ON C;
  %c#1%
  [u5$1-u8$1];
  %c#2%
  [u5$1-u8$1];
  %c#3%
  [u5$1-u8$1];
OUTPUT: TECH7 TECH8;
Is it OK?
C cannot appear on the right-hand side of on. The effect you want is the varying of the mean of f across classes.
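A minimal sketch of how the factor mean can be made to vary across classes in this example, replacing the f ON C statement. The starting values are hypothetical. The sketch relies on the default discussed earlier in this thread, under which the factor mean in the last class is zero, so the means in classes 1 and 2 are differences from class 3.

```mplus
MODEL:
  %OVERALL%
  f BY u1-u4;

  %c#1%
  [u5$1-u8$1];
  [f*1];      ! factor mean in class 1 (hypothetical start)

  %c#2%
  [u5$1-u8$1];
  [f*-1];     ! factor mean in class 2 (hypothetical start)

  %c#3%
  [u5$1-u8$1];
  ! factor mean left at its default of zero in the last class
```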
xianhuazeng posted on Friday, February 24, 2012 - 10:09 pm
Thank you Linda. Here u1-u4 are continuous variables and c is a 3-class latent variable.
DATA: FILE IS ex7.19.dat;
VARIABLE: NAMES ARE u1-u8;
  CATEGORICAL = u5-u8;
  CLASSES = c (3);
ANALYSIS: TYPE = MIXTURE;
  ALGORITHM = INTEGRATION;
MODEL: %OVERALL%
  f BY u1-u4;
  c#1 ON f;
  c#2 ON f;
  %c#1%
  [u5$1-u8$1];
  %c#2%
  [u5$1-u8$1];
  %c#3%
  [u5$1-u8$1];
OUTPUT: TECH7 TECH8;
xianhuazeng posted on Saturday, February 25, 2012 - 12:40 am
Yes. See the ESTIMATOR option in the user's guide. There is a table that shows all combinations of analysis types and the estimators available for them.
Jon H posted on Saturday, January 17, 2015 - 10:22 am
I'm helping someone put together a structural equation mixture model, and I want to double-check the model we've been working on. This analysis has three main parts.
First Part: Running a latent class analysis. There are four observed indicators (dependent variables), all dichotomous. They've identified four latent classes using this approach. Not anything too unusual.
Second Part: Running a latent growth curve model on a different dependent variable. This is a continuous dependent variable. Nothing fancy here.
Third Part: Looking at the impact of the four latent classes (independent variable) on the slope and intercept (dependent variable)--put another way, I'd like to see the mean of the intercept and slope vary across classes. However, we want the four latent classes to be predicted from the four dichotomous indicator variables and not from the intercept, slope, or the continuous dependent variable. So this is where it gets tricky.
It seems like the best solution is some hybrid approach using some of the syntax from 7.20, but supplementing with some others from 7.1 and 10.8. I'll post another message with the syntax and some questions immediately below this.
Jon H posted on Saturday, January 17, 2015 - 10:26 am
This is the second part of the post I was referring to.
My understanding is that by specifying y1-y4 in brackets underneath %c#1%, %c#2%, %c#3%, and %c#4% that means that the latent classes will come out of y1-y4 and not anything else. But the mean of the intercept and slope will vary across all four classes.
Questions: 1) Is this a correct interpretation? 2) If this interpretation is not correct, what is this doing? 3) If this interpretation is not correct, do you have another suggestion on how to estimate the model? 4) Is there a simpler way to do this that I'm not thinking of?
The i and s means will also vary across the classes using your input. It sounds like you instead want to use a 3-step approach where the classes are determined only by your y's. See the manual 3-step procedure described in the paper on our website:
Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. Appendices with Mplus scripts are available here.
We request that postings are limited to one window.
hankyoung posted on Wednesday, April 20, 2016 - 6:44 am
Did you run a Wald test? How did you find the syntax? Please let me know about the syntax.
hankyoung posted on Wednesday, April 20, 2016 - 5:03 pm
Hi, I am running a latent class analysis on a dataset of 40,000 cases, and I have 20 dichotomous variables that I would like to include in the LCA. I have some problems. First, is this too many variables for the number of cases in the dataset? Is there a way to conduct a power analysis for LCA? Second, I have a syntax problem. I need to run a Wald test and found the syntax below on the IDRE site. I also looked at the user's guide but could not resolve it.
data: file = mult_grp_lca_con.dat;
variable: names = group a1 a2 a3;
  usevariables = a1 a2 a3;
  classes = g(2) c(2);
  knownclass = g (group=0 group=1);
analysis: type = mixture;
model: %overall%
  c on g;
%g#1.c#1% [a1*0.247] (p1); [a2*2.142 a3*-0.960];
%g#1.c#2% [a1 a2 a3];
%g#2.c#1% [a1*0.978] (p2); [a2*0.173 a3*-1.541];
%g#2.c#2% [a1 a2 a3];
model test: p1 = p2;
But I don't know whether this is right, and I don't understand what each part means. Please help me.
I am trying to run a latent class moderation of a continuous observed outcome (Lanza & Cooper, 2016) measured at baseline and 1-month follow up. Following classify-analyze methods (Lanza & Rhoades, 2013), I have conducted the LCA and added the LC variable (3 classes in this case) to my dataset. I am now trying to run logistic regressions for the moderation but am getting the following errors:
*** ERROR The following MODEL statements are ignored:
* Statements in the OVERALL class:
PAM_BL ON LATECL#1
PAM_BL ON LATECL#2
*** ERROR One or more MODEL statements were ignored. These statements may be incorrect.
*** WARNING Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 55
Questions: 1) Am I right to be using the KNOWNCLASS option here to denote a latent class variable? If so, what do I need to change in the syntax to have it regress the continuous outcome on the latent classes? 2) In my data I do have some cases with missing data on the observed variable (PAM_BL), but I have accounted for this in my syntax with the MISSING ARE ALL(-99) command.
Hi Dr. Muthen, I'm looking to utilize the same SEM model as you specified in Figure 2 of your article "Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus" (Asparouhov & Muthen, 2013) to test whether Y on X slopes significantly differ between latent classes (I have a two-class model). I'm aware that class separation is one important factor in the power to detect significant differences between classes; however, are there any articles you can suggest that would help me in estimating power to identify these differences in slopes? Thanks in advance for your time in responding.