
Daniel posted on Monday, August 25, 2003  9:16 am



I just read the paper by Gitta Lubke and Bengt about Monte Carlo studies of factor mixture models. I have two questions. First, how does one estimate the Mahalanobis distance? Second, am I to understand that there isn't much difference in model performance (in terms of parameter coverage and average posterior probability) between a model with and without class-specific variance? Is it better to allow for class-specific variance in, for example, a growth mixture model than to assume no variation, or does it not make much difference?

BMuthen posted on Tuesday, August 26, 2003  10:00 am



I think the formula for the Mahalanobis distance is in the paper. Otherwise, see a multivariate statistics textbook. It can be computed using, for example, SAS proc matrix. Regarding your second question, as far as the data for this paper are concerned, you are correct. It may not be correct for all data. Regarding class-specific variances, they can make a big difference if they are needed to fit the data. For example, when studying the development of problematic behavior over time, one often sees a normal class that shows a low mean and very little variation over time, whereas other classes may have a lot of variation.
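As a rough illustration of the textbook formula mentioned above, here is a minimal Python sketch of the Mahalanobis distance between two class mean vectors, D = sqrt((m1 - m2)' S^-1 (m1 - m2)). Which covariance matrix S the paper uses is not stated in this thread, so the matrix passed in below is an assumption, and the function names and numbers are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis(mean_1, mean_2, cov):
    """Distance between two mean vectors, scaled by the covariance matrix cov."""
    diff = [a - b for a, b in zip(mean_1, mean_2)]
    z = solve(cov, diff)  # z = cov^{-1} diff, without forming the inverse
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))

# Illustrative values only (not taken from the paper).
class_1_means = [0.0, 0.0]
class_2_means = [1.0, 1.0]
assumed_cov = [[2.0, 1.0], [1.0, 2.0]]
print(mahalanobis(class_1_means, class_2_means, assumed_cov))
```

With an identity covariance matrix the result reduces to the ordinary Euclidean distance between the two mean vectors, which is a quick sanity check.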

Daniel posted on Tuesday, August 26, 2003  10:12 am



Thank you 


I have a question on the following portion of the output file:
CATEGORICAL ARE x1-x10;
CLASSES = c(3);
ANALYSIS: TYPE IS MIXTURE;
MODEL: %OVERALL%
F by x1-x10;
SAVEDATA: FILE IS outdat.dat;
save = fscores;
*** ERROR in Analysis command
OUTPUT options FSCOEFFICIENT and FSDETERMINACY and SAVEDATA SAVE option FSCORES are not available when there are no latent variables.
F is a latent variable. What I really want to do is to estimate different factor structures across classes and save the estimated fscores plus the cprob. By the way, I encountered no problem in saving the cprob. Thank you


The default with mixture and categorical is to fix the factor variances to zero. Therefore no factor scores etc. are available. I think you need to add ALGORITHM = INTEGRATION to your ANALYSIS command. 


Linda, is there information on how the factor scores are computed and what the properties of the factor scores are (e.g., validity, preserved factor correlations)? Does it matter whether the indicators are continuous or categorical? Thanks


See Technical Appendix 11 on the website for a description of factor scores for continuous and categorical outcomes.


Linda or Bengt, could I check how a "zero class", as you used in Bengt Muthen and Tihomir Asparouhov, Item response mixture modeling: Application to tobacco dependence criteria, Addictive Behaviors, Volume 31, Issue 6, June 2006, Pages 1050-1066 (http://www.sciencedirect.com/science/article/B6VC94JW15NF6/2/fa328cda89499c0510baddb575a4512b), is tested? I guess that it's done by testing for two classes and setting the thresholds for each item to be 15 for one class? Is that right, and is there anything else to be done to test a zero class? thanks Andrew (andrew.baillie AT mq.edu.au)


You fix the thresholds at +15 for the zero class and you also fix factor means and variances at zero for this zero class. 


Bengt, thanks for your quick response (and for the excellent support!) I'm wondering about setting the factor means for a zero class to zero. I was expecting that factor scores would be standardised so that zero would be the grand average? That way the zero class "could" have a mean that was greater than the other classes? (This sounds v. unlikely with thresholds at +15) thanks again Andrew (andrew.baillie AT mq.edu.au) 


For the zero class you should think of the factor as not existing: when you fix both the mean and the variance of the factor to zero, it no longer influences anything in the model. For the remaining classes, the average of the factor means across classes is not standardized to zero. Instead, a reference class with zero factor mean is used.


Bengt & Linda, I'm wondering about the magnitude of the differences between the models in, say, your paper with Tihomir Asparouhov. Given that some of the analysis compares -2 log-likelihoods between nested models, what do you think about using the omega^2 effect size as an index of the magnitude of the differences between models? Are there any other alternatives? thanks Andrew andrew.baillie AT mq.edu.au


For nested models I like to use the LR chi-square based on 2*LL differences, when that is correct (no parameters on the border). I don't know about omega. We are also contemplating generalizing the bootstrapped LR test to models that differ not only in number of classes but also in number of random effects.
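The LR chi-square described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two log-likelihoods come from nested models with no parameters on the border, they are plain maximum-likelihood values (if they come from a robust estimator that reports a scaling correction, a plain difference may not be appropriate), and the function names and numbers are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum, exp(-x/2) * sum_k (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: start from the df = 1 case and add half-integer gamma terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return total

def lr_test(ll_restricted, ll_general, n_par_restricted, n_par_general):
    """Return (statistic, df, p-value) for two nested models."""
    stat = 2.0 * (ll_general - ll_restricted)
    df = n_par_general - n_par_restricted
    if df <= 0:
        raise ValueError("the general model must have more free parameters")
    return stat, df, chi2_sf(stat, df)

# Illustrative log-likelihoods and parameter counts (not from any real run).
stat, df, p = lr_test(-1005.0, -1000.0, 10, 12)
print(f"LR chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
```

As the reply notes, this comparison is not appropriate when parameters sit on the border of the admissible space, which is the usual situation when comparing models with different numbers of classes.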

anonymous posted on Monday, April 02, 2007  3:15 pm



Greetings, I just read the forthcoming paper (Muthén, 2006) on Latent variable hybrids: Overview of old and new models. I was wondering if input examples are available for the four cross-sectional models shown graphically in Figure 1 (mixture factor analysis, non-parametric FA, factor mixture analysis, non-parametric FMA). Thank you very much


Send your email address to support@statmodel.com. 

Alex posted on Friday, April 27, 2007  6:53 am



I am working on a latent profile analysis in which I use factor mixture models to test for conditional dependence. Following your previous suggestions, I use factor mixture models with class-specific factor variances. They generally work well. However, I also try to fit models with class-specific factor loadings (i.e., 7.27). You warned me that such models could be hard to fit (and they are). For example, among the numerous warnings I obtain, I often receive a message regarding a negative (and significant) residual factor variance within one of the classes. My question is whether factor mixture models with class-specific factor loadings could be easier to fit by fixing the overall factor variance to 1 instead of fixing the loading of the first indicator to 1, while withdrawing (or not?) class-specific estimates of the factor variance. Would it still be logical to compare the fit of factor mixture models with class-specific factor variance (unstandardized) to standardized models with class-specific loadings? Thank you very much in advance.


Yes, when the factor loading matrix is made class-specific, it is my experience that it can help to set the metric in the factor variance (@1) for each class instead of the first loading. This way, the search for the best solution is less dependent on the quality of that one item.

Alex posted on Friday, April 27, 2007  8:46 am



Thank you very much for your answer. Just to make sure I understand correctly: to "set the metric in the factor variance for each class", I only have to set the variance (f@1) in the %OVERALL% section of the model and not in each class section?


That's right. Saying that in the overall makes it hold for each class (see Tech1). 


* 10 binary observed variables, u1-u10,
* factor mixture model with 2 classes,
* one factor for each class,
* classes differ in factor means only.
What should be the statements for item thresholds and factor means under %overall%, %c#1%, and %c#2%?


Factor means and thresholds are free across classes as the default so I don't think you need to say anything about them. 


I am currently working on a dissertation project comparing the performance of Meehl's taxometric methods to latent variable methods. I have recently purchased the Mplus program to run mixture modeling methods. Every data condition I have simulated uses four continuous variables. I plan to run Latent Profile Analysis, Latent Class Factor Analysis and Factor Mixture Analysis. Is it possible to run Latent Class Factor Analysis with continuous indicator variables and is there an example in the user's guide? My Monte Carlo datasets are tab delimited .dat files that have variable names on the first line. I got an error message when I tried to run latent profile analysis "ERROR Invalid symbol in data file: "V1" at record #: 1, field #: 1." The analysis runs when I delete the variable names on the datafile. Is there a way to work around this in the format statement? I know it is possible to skip columns with fixed data formats but can I skip the first row with a free data format? Finally, I'd like the methods to determine the number of classes that best fit the data. Would I have to analyze each dataset twice, starting with c = 1 and then c = 2 or is it possible to ask the program to do both in one analysis? 


See the following paper, which is available on the website, for an example of Latent Class Factor Analysis: Muthén, B. (2006). Should substance use disorders be considered as categorical or dimensional? Addiction, 101 (Suppl. 1), 6-16. You should just delete the line with the variable names. You need to run the analysis separately for two, three, etc. classes.
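For many Monte Carlo datasets, deleting the first line by hand gets tedious. A minimal Python sketch of that step follows, assuming tab-delimited files with exactly one header line; the helper name and file names are hypothetical. It returns the variable names so they can be pasted into the NAMES statement of the input file.

```python
import os
import tempfile

def strip_header(src_path, dst_path, delimiter="\t"):
    """Copy src_path to dst_path without its first line; return the header names."""
    # newline="" keeps the original line endings of the data rows unchanged.
    with open(src_path, "r", newline="") as src:
        header = src.readline().rstrip("\r\n")
        names = [name.strip() for name in header.split(delimiter) if name.strip()]
        with open(dst_path, "w", newline="") as dst:
            for line in src:
                dst.write(line)
    return names

# Small demonstration with a throwaway file (values are made up).
with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "rep1_raw.dat")
    clean = os.path.join(tmp, "rep1.dat")
    with open(raw, "w", newline="") as f:
        f.write("V1\tV2\tV3\tV4\n0.1\t0.2\t0.3\t0.4\n1.1\t1.2\t1.3\t1.4\n")
    names = strip_header(raw, clean)
    print("NAMES ARE " + " ".join(names) + ";")
```

The same helper can be called in a loop over all replication files; the analysis itself still has to be run separately for each number of classes, as stated above.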


I have conducted CFA and LCA on a set of binary indicators. The results suggest evidence for a one-factor model or a three-factor solution with parallel profiles. I want to follow up this analysis by estimating a hybrid model. I have read the papers by Muthen (2006) and Muthen & Asparouhov (2006), but I am a bit unclear about the differences between LCFA and IRT mixture modeling. I’d be grateful if you could answer the following questions: (1) Should LCFA be estimated in favor of IRT mixture modeling when the factor is considered to have a non-normal distribution? (2) In LCFA, is it correct that the latent classes share the same dimension and therefore the factor loadings (but not thresholds) should be equal across classes? (3) Does it make conceptual sense to estimate both types of models on the same set of indicators? If so, in what circumstances?


One paper that may answer many of your questions is the forthcoming Clark and Muthen article entitled "Models and strategies for factor mixture analysis: Two examples concerning the structure underlying psychological disorders." But since it is currently not available, here are some responses to your questions: First, IRT mixture models and LCFA are not separate models; LCFA is a special case where the factor loadings and item thresholds are invariant across classes and the factor variance/covariance is zero. The only difference between the classes is the location of the classes on the factor, as indicated by the factor mean being different in each class. So, to answer question 2, the classes do share the same dimension, and both the factor loadings and the item thresholds are equal across classes.


I would argue that both the LCFA and more flexible models which relax the equality of the item thresholds and factor loadings should be applied to data, but that it should be kept in mind what each of these models implies about the underlying structure of the data. LCFA and an alternative which allows the factor variance to be estimated (this variance can be restricted to be equal across classes or allowed to differ) both have the same factor running through all classes, and the difference between classes arises from class-varying factor means and potentially factor variances. Other models which relax the equality of factor loadings and item thresholds may still have the same factor in both classes, depending on the difference in the estimated item thresholds and factor loadings when they are allowed to vary across classes. Also, both the LCFA and other models which relax the equality of item thresholds and factor loadings across classes allow for a non-normal factor.

Matt Thullen posted on Tuesday, November 10, 2009  10:14 am



Hello, in LCFA, how is the zero factor score interpreted? I'm thinking of how to represent the factor scores within and between each class for a model with 3 factors, for example in a graph or plot. Also with LCFA, I based my syntax on the examples in Clark & Muthen (recently posted) and I get warnings about having more equality labels than parameters. I have something resembling this for each of my classes: [u1-u12] (1-12). The model seems to run fine but I'm not sure whether or how I should address those warnings. thank you


The zero factor score is a reference point. It is not identified as a free parameter. I would need to see your full output and license number to understand why you get an error for the syntax you show. 


I'm fitting factor mixture models and I have a model with 4 latent classes and a single latent factor in each class. When I request class probabilities and factor scores, I get two different factor scores, one which is labeled the same way I labeled my latent factor and one with C_ as a prefix. I searched the Mplus version 6 manual but can't find any information on this second factor score. Can you direct me to some relevant documentation?


I don't know of any documentation. One is mixed over classes and the other is for the most likely class. 


Which is which? 


c_ is most likely class membership. 


Hello, I have a factor mixture model with two factors and 4 classes. I am allowing the means, thresholds, and factor loadings to vary across classes. My factor loadings, however, have only 0.00 for the SE and 999 for the p-value. Is this because I did not make any specifications for the factor variance and it is being held at 0? Can I still compare these factor loadings across classes? My code looks like this:
%overall%
y1 by u1-u21;
y2 by u22-u37;
%C#1%
y1 by u1-u21;
y2 by u22-u37;
[u1$1-u21$1];
[u1$2-u21$2];
[u22$1-u37$1];
[u22$2-u37$2];
[u22$3-u37$3];
[y1-y2];
...and so on for the other classes. Thanks so much!!


Please send the output and your license number to support@statmodel.com so I can see what the problem is. 

Artemis posted on Thursday, November 20, 2014  3:20 am



Dear Profs Muthen, I have read with great interest your paper entitled 'Item response mixture modelling: Application to tobacco dependence criteria' and I would like to generate Table 6 from your paper for my problem (i.e., response pattern classification by latent classes using factor mixture analysis). I guess factor scores and cprob will have to be saved, and they depend on which model one fits, i.e., which constraints one imposes for measurement invariance, right? In addition, I have a rather big sample in my problem, i.e., ~7000, which makes computation quite slow: each model takes 20-30 hours to run. Would it be acceptable, for all the different models I want to fit, to randomly select perhaps 1000 cases, do all comparisons, and then fit the final model to the big sample? Something like a kind of cross-validation, if I may call it that? Many thanks for all your help and time.


Q1. See the RESPONSE option on page 754 of the V7 UG. Q2. Seems reasonable. Or, get a computer with at least 8 processors. 
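The random-subsample idea from the question can be sketched in Python as follows. This is only an illustration: the function name, the seed, and the sample sizes are hypothetical, and the rows are assumed to be the lines of a data file that has already been read into memory.

```python
import random

def subsample_rows(rows, n, seed=12345):
    """Return n randomly chosen rows, keeping their original file order."""
    if n > len(rows):
        raise ValueError("cannot sample more rows than are available")
    rng = random.Random(seed)  # fixed seed so the same subsample can be redrawn
    keep = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in keep]

# Illustrative data: 7000 fake records, from which 1000 are drawn.
all_rows = [f"{case_id}\t0\t1\t1\n" for case_id in range(7000)]
subset = subsample_rows(all_rows, 1000)
print(len(subset), "rows selected for the model comparisons")
```

Fixing the seed makes the draw reproducible, so every candidate model is compared on the same 1000 cases before the chosen model is refitted to the full sample.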

Artemis posted on Friday, November 21, 2014  9:02 am



Thanks a lot for the very helpful responses. Unfortunately my PC has 8 processors, so I guess I will try the cross-validation approach. Thanks again for everything. Sincerely, Artemis


Drs. Muthen, I am running factor mixture models (FMM) at two different time points with continuous indicators (20 indicators, 5 covariates, 3 factors, and 4 classes). I'm able to get the FMMs to run successfully at each time point, but when I try to combine them into a longitudinal model, the model estimation does not terminate normally. I would also like to include a distal outcome in the model. I've tried constraining factor loadings to be equal across time and fixing factor correlations across time to be zero, but that does not help with the model estimation. I haven't been able to find any literature on longitudinal factor mixture models; I've only found cross-sectional FMMs. I was wondering if you would be able to point me towards a few useful articles or if you had any suggestions on useful constraints/parameter restrictions for longitudinal FMMs with distal outcomes. Thank you for your help. Thanks, Raghav


Don't fix factor correlations across time to zero. You can hold factor loadings equal across classes. If this doesn't work, send output to Support along with your license number. 


Dear Drs. Muthen, I am trying to conduct latent class factor analysis (LCFA) using Mplus 7.4. I am following the instructions and examples provided in Clark et al. (2013). According to Clark et al., only factor means should vary across classes in LCFA (they call it FMM1). The loadings should be invariant, and the factor covariance matrix should be 0. The results fit these specifications. However, I notice that the residual variances are also invariant across classes. Should this be the case? I am also wondering about model identification requirements. If I want all loadings to be freely estimated, I know that I must fix the mean of one class to 0 (automatically done for the last class) if there are 2 classes. Must I fix the mean of another class (e.g., to 1) if I have 3 classes? In general, must I fix the means of k-1 classes if there are a total of k classes, or is it enough to fix 1? Finally, a very basic question: the 5 observed variables in my model are all percentages, and only 2 are normally distributed. Must percentages be declared as some data type other than continuous? Thanks, 'Alim
Clark, S. L., Muthén, B., Kaprio, J., D’Onofrio, B. M., Viken, R., et al. 2013. Models and strategies for factor mixture analysis: an example concerning the structure underlying psychological disorders. Structural Equation Modeling, 20(4): 681–703.


Q1: Residual variance invariance is applied for parsimony's sake. Q2: Try fixing the mean in only one class and see if the model is identified. If not send to Support along with your license number. Q3: Not unless they are close to 0 or 1. 


Thanks Dr. Muthen, I am also trying exploratory factor mixture analysis (example 4.4). I have a few questions:
1. In the results, loadings and residual variances vary across classes. Does that mean that all parameters are non-invariant across classes?
2. If I specify that I want more than 2 factors (e.g., EFA 1 4), Mplus says: "Too many factors were requested for EFA. The maximum number of factors is set to 2." Is 2 the maximum number of factors allowed for exploratory FMM, or does it depend on something like the number of observed variables in the model?
3. I often get the error "NO CONVERGENCE. PROBLEM OCCURRED IN EXPLORATORY FACTOR ANALYSIS WITH 2 FACTOR(S)." What is the best way to handle this? Should I try increasing one or several of the following: initial stage random starts, final stage optimizations, or initial stage iterations (STARTS and/or STITERATIONS)?
thanks, 'Alim


1. Yes 2. It depends on how many variables you have. See our Topic 1 handout for regular EFA on our website. 3. Typically this is because of negative residual variances and there is not an easy fix except to have fewer factors. 


Are there any references or guides that expand on the Mplus syntax given in the Clark et al. (2013) piece on FMM, such that instructions on correctly specifying FMM2, FMM3, and FMM4 models with three or more latent classes are offered? Model identification seems to be a bit tricky when moving into the FMM2, FMM3, and FMM4 model specifications with three or more latent classes (and with four latent factors). Any guidance would be very appreciated. 


The UG has FMM models; check there first.

Todd Jensen posted on Wednesday, June 08, 2016  5:10 am



It appears that all the FMM examples in the UG only specify a two-class solution. There do not appear to be any examples with three or more classes specified for FMM.


UG ex 7.27 can be directly generalized to more than 2 classes. 


Hello, I have conducted a factor mixture model with three latent factors and two classes. Correlations between factors, as well as item residual variances, are class-specific. All other parameters are held equal. By default, factor means are set to zero in class two, but are estimated in class one. I am now unsure how to interpret the absolute levels of the factors. Is there a way to obtain absolute levels of the factors, so as to interpret them as an average score of people within a class? Thanks for your help!


No, because they refer to latent variables, factor means are always relative: one class compared to the other. The same holds for multiple groups or multiple time points. You don't need more information than that.


Professor Muthen, Hello! I have read your paper "Models and strategies for factor mixture analysis" and have some questions. Thank you for shedding light on my questions.
1. You mentioned measurement invariance as factor loading and intercept invariance, and proposed FMM1 and 2. We may naturally come up with FMM4 and 5. But why an FMM3? Why not add factor variance/covariance invariance in FMM2?
2. None of the 5 FMMs allowed cross-loadings, which gave more parsimonious models. Is it worthwhile to allow cross-loadings to get better model fit, given that the model fit of the 2f2c model was worse than that of the wrong 1f2c model in the second example?
3. I think the analysis of data with FMM is more exploratory, and simultaneously confirmatory. You investigated the data with different FMMs and wanted to get the model with the best model fit. But the model with the best fit was not correct. The alternative 2f2c model also needed adjustment due to partial intercept invariance. This process reminded me of the multiple group EFA asymptotical restriction and model adjustment process. What is your opinion?
4. Is skew-t applicable in the framework of FMM, and with the Mplus program now?


A follow-up to the reference to the FMM1 in the Clark et al. (2013) piece. In that piece, the FMM1 is noted as having no within-class variability on the factor. As such, using a 2-class model, the exemplar syntax sets the variance to 0 in the overall part of the model command while using the marker ID approach to set the metric as the default (also last class mean to zero). If I wanted to estimate all factor loadings and set the variance to 1 to identify the model (instead of 0), would this technically move out of the realm of FMM1 (i.e., no within-class variability)? Since all classes have factor variance at 1, I don't think it would, but I am wondering about the consequences of doing this.


Yes, as soon as the variance is > 0, this would move it out of that realm.


Dear Drs Muthen, I am running a factor mixture model (CFA up to 2 factors and LPA up to 5 classes) using survey weights. I have been following the steps in the Clark, Muthen et al. paper from 2013. I am able to successfully run models at level FMM1 (i.e., class-invariant item means and factor loadings, factor covariances fixed to 0, but allowing factor means to vary). However, I have had difficulty running the models from FMM2 (i.e., class-varying factor covariances) onwards. In the case of the FMM2 model for the 2-class, 2-factor model (see syntax below), I was able to get convergence by fixing a small non-significant negative residual factor variance to 0, and a non-significant factor covariance to 0, but then the model results were not meaningful in that one of the two latent classes had a membership of <5% when it would be expected to be much larger. When I move to FMM3 and 4, I consistently get error messages regarding either non-convergence or the singularity of the data, which seem to be related to the factor means. I am wondering if this would suggest that models beyond the most restrictive are not appropriate for my data? Thank you very much. Angela
MODEL:
%Overall%
F1 by x1-x6;
F2 by x7-x14;
[x1-x14] (1-14);
%c#1%
F1-F2;
F1 with F2;
[F1-F2@0];
%c#2%
F1-F2;
F1 with F2;
[F1-F2*];


In response to your post above, Angela, I might mention that I have had similar issues progressing from FMM1 through FMM4 from the Clark et al. (2013) piece. Although in my case the difficulties seemed related to the factor variances, the common thread seemed to be that the more you try to parse a modest-sized sample (you don't mention your N) into K + 1 classes, the harder FMM2 through FMM4 are to fit, as they permit too much flexibility. Hunt & Jorgensen (2011) explicitly note that "permitting too much flexibility in component models subverts the whole idea of a mixture of definite components and risks a failure of identifiability in the model and numerical problems with the fitting."


Thank you for this comment; it is interesting to hear that you also had difficulties with this progression. My sample is 1,800+ participants. I have also read the comment by Hunt & Jorgensen, but had thought that FMM2 was still a fairly restrictive model with relatively low flexibility, and was hoping it might be possible to test it in my data.


Yes, some data don't contain enough information to support the more flexible models. But if you like, you can send your FMM3 run to Support along with your license number and maybe we can see something. 


Thank you very much Dr Muthen. I will send it through. 
