
Daniel posted on Monday, August 25, 2003  9:16 am



I just read the paper by Gitta Lubke and Bengt about Monte Carlo studies of factor mixture models. I have two questions. First, how does one estimate Mahalanobis distance? Second, am I to understand that there isn't much difference in model performance (in terms of parameter coverage and average posterior probability) between a model with and without class-specific variance? Is it better to allow for class-specific variance in, for example, a growth mixture model than to assume no variation, or does it not make much difference?

BMuthen posted on Tuesday, August 26, 2003  10:00 am



I think the formula for the Mahalanobis distance is in the paper. Otherwise, see a multivariate statistics textbook. It can be computed using, for example, SAS PROC MATRIX. Regarding your second question, as far as the data for this paper are concerned, you are correct. It may not be correct for all data. Class-specific variances can make a big difference if they are needed to fit the data. For example, in studying the development of problematic behavior over time, one often sees a normal class with a low mean and very little variation over time, whereas other classes may have a lot of variation.
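For readers without SAS, the textbook formula can be sketched in a few lines of plain Python. This is only an illustration, not the code used in the paper: it assumes two class mean vectors mu1 and mu2 and a common within-class covariance matrix sigma, and computes sqrt((mu1 - mu2)' sigma^-1 (mu1 - mu2)). The function names and the example numbers are made up.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def mahalanobis(mu1, mu2, sigma):
    """Mahalanobis distance between two mean vectors (common covariance assumed)."""
    diff = [a - b for a, b in zip(mu1, mu2)]
    z = solve_linear(sigma, diff)  # z = sigma^-1 (mu1 - mu2)
    return math.sqrt(sum(d * w for d, w in zip(diff, z)))


# Hypothetical class means and covariance matrix, for illustration only.
print(mahalanobis([0.0, 0.0], [1.0, 1.0], [[2.0, 1.0], [1.0, 2.0]]))
```

Solving the linear system instead of inverting sigma explicitly keeps the sketch short; with an identity covariance matrix the result reduces to the ordinary Euclidean distance.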

Daniel posted on Tuesday, August 26, 2003  10:12 am



Thank you 


I have a question on the following portion of the output file:
CATEGORICAL ARE x1-x10;
CLASSES = c(3);
ANALYSIS: TYPE IS MIXTURE;
MODEL: %OVERALL%
F by x1-x10;
SAVEDATA: FILE IS outdat.dat;
save = fscores;
*** ERROR in Analysis command
OUTPUT options FSCOEFFICIENT and FSDETERMINACY and SAVEDATA SAVE option FSCORES are not available when there are no latent variables.
F is a latent variable. What I really want to do is to estimate different factor structures across classes and save the estimated fscores plus the cprob. By the way, I encountered no problem in saving the cprob. Thank you


The default with mixture and categorical is to fix the factor variances to zero. Therefore no factor scores etc. are available. I think you need to add ALGORITHM = INTEGRATION to your ANALYSIS command. 


Linda, is there information on how the factor scores are computed and what the properties of the factor scores are (e.g., validity, preserved factor correlations)? Does it matter whether the indicators are continuous or categorical? Thanks


See Technical Appendix 11 on the website for a description of factor scores for continuous and categorical outcomes.


Linda or Bengt, could I check how a "zero class", as used in Bengt Muthen and Tihomir Asparouhov, Item response mixture modeling: Application to tobacco dependence criteria, Addictive Behaviors, Volume 31, Issue 6, June 2006, Pages 1050-1066 (http://www.sciencedirect.com/science/article/B6VC94JW15NF6/2/fa328cda89499c0510baddb575a4512b), is tested? I guess that it's done by fitting two classes and setting the thresholds for each item to 15 for one class? Is that right, and is there anything else to be done to test a zero class? Thanks, Andrew (andrew.baillie AT mq.edu.au)


You fix the thresholds at +15 for the zero class and you also fix factor means and variances at zero for this zero class. 
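To see why a threshold of +15 effectively produces a zero class, here is a small sketch of the implied endorsement probability. It assumes a logit link, under which P(u = 1) = 1 / (1 + exp(threshold - loading * f)), and relies on the factor dropping out once its mean and variance are fixed at zero. The function name is hypothetical.

```python
import math


def endorsement_probability(threshold, loading=0.0, factor_value=0.0):
    """P(u = 1) for a binary item under a logit link (an assumption here)."""
    return 1.0 / (1.0 + math.exp(threshold - loading * factor_value))


# In the zero class the factor contributes nothing, so only the threshold matters.
p_zero_class = endorsement_probability(15.0)
print(f"P(u = 1) in the zero class: {p_zero_class:.2e}")
```

The resulting probability is below one in a million, so members of this class are in practice expected to endorse none of the items.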


Bengt, thanks for your quick response (and for the excellent support!) I'm wondering about setting the factor means for a zero class to zero. I was expecting that factor scores would be standardised so that zero would be the grand average? That way the zero class "could" have a mean that was greater than the other classes? (This sounds v. unlikely with thresholds at +15) thanks again Andrew (andrew.baillie AT mq.edu.au) 


For the zero class you should think of the factor as not existing: when you fix both the mean and the variance of the factor to zero, it no longer influences anything in the model. For the remaining classes, the average of the factor means across classes is not standardized to zero. Instead, a reference class with zero factor mean is used.


Bengt & Linda, I'm wondering about the magnitude of the differences between the models in, say, your paper with Tihomir Asparouhov. Given that some of the analysis compares -2 log-likelihoods between nested models, what do you think about using the omega^2 effect size as an index of the magnitude of the differences between models? Are there any other alternatives? Thanks, Andrew (andrew.baillie AT mq.edu.au)


For nested models I like to use the LR chi-square based on 2*LL differences, when that is correct (no parameters on the border). I don't know about omega. We are also contemplating generalizing the bootstrapped LR test to models that differ not only in number of classes but also in terms of number of random effects.
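The LR chi-square mentioned above can be computed by hand from the two log-likelihoods and parameter counts reported in the output. The sketch below is an illustration in plain Python with made-up numbers and hypothetical function names; it is valid only under the condition stated in the reply (nested models, no parameters on the border), so it does not apply to comparing models with different numbers of classes.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_test(ll_restricted, ll_general, npar_restricted, npar_general):
    """Return (statistic, df, p-value) for two nested models."""
    df = npar_general - npar_restricted
    if df <= 0:
        raise ValueError("the general model must have more free parameters")
    stat = 2.0 * (ll_general - ll_restricted)
    return stat, df, chi2_sf(stat, df)


# Hypothetical log-likelihoods and parameter counts, for illustration only.
stat, df, p = lr_test(-1000.0, -995.0, 10, 12)
print(f"LR chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
```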

anonymous posted on Monday, April 02, 2007  3:15 pm



Greetings, I just read the forthcoming paper (Muthén, 2006) on Latent variable hybrids: Overview of old and new models. I was wondering whether input examples are available for the four cross-sectional models shown graphically in Figure 1 (mixture factor analysis, non-parametric FA, factor mixture analysis, non-parametric FMA). Thank you very much


Send your email address to support@statmodel.com. 

Alex posted on Friday, April 27, 2007  6:53 am



I am working on a latent profile analysis in which I use factor mixture models to test for conditional dependence. Following your previous suggestions, I use factor mixture models with class-specific factor variance. They generally work well. However, I also try to fit models with class-specific factor loadings (i.e. 7.27). You warned me that such models could be hard to fit (and they are). For example, among the numerous warnings I obtain, I often receive messages regarding a negative (and significant) residual factor variance within one of the classes. My question is whether factor mixture models with class-specific factor loadings could be easier to fit by fixing the overall factor variance to 1 instead of fixing the loading of the first indicator to 1, while withdrawing (or not?) class-specific estimates of factor variance. Would it still be logical to compare the fit of factor mixture models with class-specific factor variance (unstandardized) to standardized models with class-specific loadings? Thank you very much in advance.


Yes, when the factor loading matrix is made class-specific, it is my experience that it can help to set the metric in the factor variance (@1) for each class instead of the first loading. This way, the search for the best solution is less dependent on the quality of that one item.

Alex posted on Friday, April 27, 2007  8:46 am



Thank you very much for your answer. Just to make sure I understand correctly: to "set the metric in the factor variance for each class", I only have to fix the variance (f@1) in the %OVERALL% section of the model and not in each class section?


That's right. Saying that in the overall makes it hold for each class (see Tech1). 


* 10 binary observed variables, u1-u10,
* factor mixture model with 2 classes,
* one factor for each class,
* classes differ in factor means only.
What should the statements for item thresholds and factor means be under %OVERALL%, %c#1%, and %c#2%?


Factor means and thresholds are free across classes as the default so I don't think you need to say anything about them. 


I am currently working on a dissertation project comparing the performance of Meehl's taxometric methods to latent variable methods. I have recently purchased the Mplus program to run mixture modeling methods. Every data condition I have simulated uses four continuous variables. I plan to run Latent Profile Analysis, Latent Class Factor Analysis and Factor Mixture Analysis. Is it possible to run Latent Class Factor Analysis with continuous indicator variables and is there an example in the user's guide? My Monte Carlo datasets are tab delimited .dat files that have variable names on the first line. I got an error message when I tried to run latent profile analysis "ERROR Invalid symbol in data file: "V1" at record #: 1, field #: 1." The analysis runs when I delete the variable names on the datafile. Is there a way to work around this in the format statement? I know it is possible to skip columns with fixed data formats but can I skip the first row with a free data format? Finally, I'd like the methods to determine the number of classes that best fit the data. Would I have to analyze each dataset twice, starting with c = 1 and then c = 2 or is it possible to ask the program to do both in one analysis? 


See the following paper, which is available on the website, for an example of Latent Class Factor Analysis: Muthén, B. (2006). Should substance use disorders be considered as categorical or dimensional? Addiction, 101 (Suppl. 1), 6-16. You should just delete the line with the variable names. You need to run the analysis separately for two, three, etc. classes.
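With many Monte Carlo data sets, deleting the header line by hand gets tedious. A minimal sketch of automating that step outside Mplus, in plain Python, is given below; the function name and the file names in the comment are hypothetical. It copies every line except the first to a new file and returns the variable names found in the header, which can then be typed into the NAMES statement of the VARIABLE command.

```python
def strip_header(src_path, dst_path):
    """Copy src_path to dst_path without its first line; return the header names."""
    with open(src_path, "r", encoding="utf-8") as src:
        # The first line holds the variable names (tab or space delimited).
        header = src.readline().split()
        with open(dst_path, "w", encoding="utf-8") as dst:
            for line in src:
                dst.write(line)
    return header


# Example call (hypothetical file names):
# names = strip_header("rep001.dat", "rep001_noheader.dat")
```

Writing to a new file rather than overwriting keeps the original simulated data intact in case the header is needed elsewhere.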


I have conducted CFA and LCA on a set of binary indicators. The results suggest evidence for a one-factor model or a three-factor solution with parallel profiles. I want to follow up this analysis by estimating a hybrid model. I have read the papers by Muthen (2006) and Muthen & Asparouhov (2006) but I am a bit unclear about the differences between LCFA and IRT mixture modeling. I'd be grateful if you could answer the following questions: (1) Should LCFA be estimated in favor of IRT mixture modeling when the factor is considered to have a non-normal distribution? (2) In LCFA, is it correct that the latent classes share the same dimension and therefore the factor loadings (but not thresholds) should be equal across classes? (3) Does it make conceptual sense to estimate both types of models on the same set of indicators? If so, in what circumstances?


One paper that may answer many of your questions is the forthcoming Clark and Muthen article entitled "Models and strategies for factor mixture analysis: Two examples concerning the structure underlying psychological disorders." But since it is currently not available, here are some responses to your questions: First, IRT mixture models and LCFA are not separate models; LCFA is a special case where the factor loadings and item thresholds are invariant across classes and the factor variance/covariance is zero. The only difference between the classes is their location on the factor, as indicated by the factor mean being different in each class. So, to answer question 2, the classes do share the same dimension, but both the factor loadings and the item thresholds are equal across classes.


I would argue that both the LCFA and more flexible models which relax the equality of the item thresholds and factor loadings should be applied to data, but that it should be kept in mind what each of these models implies about the underlying structure of the data. LCFA and an alternative which allows the factor variance to be estimated (this variance can be restricted to be equal across classes or non-invariant) both have the same factor running through all classes, and the difference between classes arises from class-varying factor means and potentially factor variances. Other models which relax the equality of factor loadings and item thresholds may still have the same factor in both classes, depending on the difference in the estimated item thresholds and factor loadings when they are allowed to vary across classes. Also, both the LCFA and other models which relax the equality of item thresholds and factor loadings across classes allow for a non-normal factor.
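The LCFA structure described above can be made concrete with a small numerical sketch. It assumes binary items with a logit link, loadings and thresholds that are the same in every class, and a factor variance of zero, so that the endorsement probability for item j in class k is 1 / (1 + exp(threshold_j - loading_j * mean_k)). All parameter values and the function name are invented for illustration.

```python
import math


def lcfa_item_probabilities(thresholds, loadings, factor_means):
    """Item endorsement probabilities per class when only factor means differ."""
    profiles = []
    for alpha in factor_means:
        # Factor variance is zero, so every member of a class sits at the mean.
        profile = [
            1.0 / (1.0 + math.exp(tau - lam * alpha))
            for tau, lam in zip(thresholds, loadings)
        ]
        profiles.append(profile)
    return profiles


# Hypothetical values: three items, three classes, last class as reference (mean 0).
thresholds = [0.5, 1.0, 1.5]
loadings = [1.0, 1.2, 0.8]
factor_means = [-1.5, 1.5, 0.0]

for k, profile in enumerate(lcfa_item_probabilities(thresholds, loadings, factor_means), 1):
    print(f"class {k}:", [round(p, 3) for p in profile])
```

With positive loadings, a class with a higher factor mean has a higher probability on every item, which is why the class profiles in an LCFA do not cross.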

Matt Thullen posted on Tuesday, November 10, 2009  10:14 am



Hello. In LCFA, how is the zero factor score interpreted? I'm thinking of how to represent the factor scores within and between each class for a model with 3 factors, for example in a graph or plot. Also with LCFA, I based my syntax on the examples in Clark & Muthen (recently posted) and I get warnings about having more equality labels than parameters. I have something resembling this for each of my classes: [u1-u12] (1-12). The model seems to run fine but I'm not sure whether or how I should address those warnings. Thank you


The zero factor score is a reference point. It is not identified as a free parameter. I would need to see your full output and license number to understand why you get an error for the syntax you show. 


I'm fitting factor mixture models and I have a model with 4 latent classes and a single latent factor in each class. When I request class probabilities and factor scores, I get two different factor scores, one which is labeled the same way I labeled my latent factor and one with C_ as a prefix. I searched the Mplus version 6 manual but can't find any information on this second factor score. Can you direct me to some relevant documentation?


I don't know of any documentation. One is mixed over classes and the other is for the most likely class. 


Which is which? 


c_ is most likely class membership. 
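One way to picture the distinction between the two saved scores is sketched below. This is only an illustration of the idea of "mixed over classes" versus "most likely class", not a description of how Mplus computes either score. It assumes that, for one person, a factor score estimate within each class and the posterior class probabilities are available; the function name and numbers are hypothetical.

```python
def mixed_and_modal_scores(class_scores, posteriors):
    """Return (mixed score, modal-class score, modal class index) for one person."""
    # Mixed: average the class-specific scores, weighted by posterior probability.
    mixed = sum(p * s for p, s in zip(posteriors, class_scores))
    # Modal: take the score from the class with the highest posterior probability.
    modal_class = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return mixed, class_scores[modal_class], modal_class


# Hypothetical person with three classes.
mixed, modal, k = mixed_and_modal_scores([-1.0, 0.5, 2.0], [0.1, 0.7, 0.2])
print(mixed, modal, k)
```

The two scores coincide only when one posterior probability is essentially 1, so they diverge most for people who are classified with little certainty.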


Hello, I have a factor mixture model with two factors and 4 classes. I am allowing the means, thresholds, and factor loadings to vary across classes. My factor loadings, however, have only 0.00 for the SE and 999 for the p-value. Is this because I did not make any specifications for the factor variance and it is being held at 0? Can I still compare these factor loadings across classes? My code looks like this:
%overall%
y1 by u1-u21;
y2 by u22-u37;
%C#1%
y1 by u1-u21;
y2 by u22-u37;
[u1$1-u21$1];
[u1$2-u21$2];
[u22$1-u37$1];
[u22$2-u37$2];
[u22$3-u37$3];
[y1-y2];
...and so on for the other classes. Thanks so much!!


Please send the output and your license number to support@statmodel.com so I can see what the problem is. 

Artemis posted on Thursday, November 20, 2014  3:20 am



Dear Profs Muthen, I have read with great interest your paper entitled 'Item response mixture modelling: Application to tobacco dependence criteria', and I would like to reproduce Table 6 from your paper for my problem (i.e., response pattern classification by latent classes using factor mixture analysis). I guess factor scores and cprob will have to be saved, and that they depend on which model one fits, i.e., which constraints one imposes for measurement invariance, right? In addition, I have a rather big sample (~7000), which makes computation quite slow: each model takes 20-30 hours to run. Would it be acceptable to select, say, 1000 cases at random, do all the model comparisons on that subsample, and then fit the final model to the full sample? Something like a kind of cross-validation, if I may call it that? Many thanks for all your help and time.


Q1. See the RESPONSE option on page 754 of the V7 UG. Q2. Seems reasonable. Or, get a computer with at least 8 processors. 
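If the random-subsample route is taken, the draw can be made reproducible outside Mplus before the data file is written. The following is a minimal sketch in plain Python, assuming the cases are held as a list of rows; the function name and seed value are arbitrary choices made for illustration.

```python
import random


def draw_subsample(rows, n, seed=20141120):
    """Return n rows drawn without replacement, kept in their original order."""
    rng = random.Random(seed)  # fixed seed so the same cases are drawn every time
    keep = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in keep]


# Hypothetical data: 7000 cases, each a list of values.
cases = [[i, i % 2] for i in range(7000)]
subsample = draw_subsample(cases, 1000)
print(len(subsample))
```

Fixing the seed means every candidate model is compared on exactly the same 1000 cases, which keeps the model comparisons consistent with each other.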

Artemis posted on Friday, November 21, 2014  9:02 am



Thanks a lot for the very helpful responses. Unfortunately my PC has 8 processors, so I guess I will try the cross-validation approach. Thanks again for everything. Sincerely, Artemis


Drs. Muthen, I am running factor mixture models (FMM) at two different time points with continuous indicators (20 indicators, 5 covariates, 3 factors, and 4 classes). I'm able to get the FMMs to run successfully at each time point, but when I try to combine them into a longitudinal model, the model estimation does not terminate normally. I would also like to include a distal outcome in the model. I've tried constraining factor loadings to be equal across time and fixing factor correlations across time to be zero, but that does not help with the model estimation. I haven't been able to find any literature on longitudinal factor mixture models; I've only found cross-sectional FMMs. I was wondering if you would be able to point me towards a few useful articles or if you had any suggestions on useful constraints/parameter restrictions for longitudinal FMMs with distal outcomes. Thank you for your help. Thanks, Raghav


Don't fix factor correlations across time to zero. You can hold factor loadings equal across classes. If this doesn't work, send output to Support along with your license number. 


Dear Drs. Muthen, I am trying to conduct latent class factor analysis (LCFA) using Mplus 7.4. I am following the instructions and examples provided in Clark et al. (2013). According to Clark et al., only factor means should vary across classes in LCFA (they call it FMM1). The loadings should be invariant, and the factor covariance matrix should be 0. The results fit these specifications. However, I notice that the residual variances are also invariant across classes. Should this be the case? I am also wondering about model identification requirements. If I want all loadings to be freely estimated, I know that I must fix the mean of one class to 0 (automatically done for the last class) if there are 2 classes. Must I fix the mean of another class (e.g. to 1) if I have 3 classes? In general, must I fix the means of k-1 classes if there are a total of k classes, or is it enough to fix 1? Finally, a very basic question: the 5 observed variables in my model are all percentages, and only 2 are normally distributed. Must percentages be declared as some data type other than continuous? Thanks, 'Alim. Reference: Clark, S. L., Muthén, B., Kaprio, J., D'Onofrio, B. M., Viken, R., et al. 2013. Models and strategies for factor mixture analysis: an example concerning the structure underlying psychological disorders. Structural Equation Modeling, 20(4): 681-703.


Q1: Residual variance invariance is applied for parsimony's sake. Q2: Try fixing the mean in only one class and see if the model is identified. If not send to Support along with your license number. Q3: Not unless they are close to 0 or 1. 


Thanks Dr. Muthen, I am also trying exploratory factor mixture analysis (example 4.4). I have a few questions: 1. In the results, loadings and residual variances vary across classes. Does that mean that all parameters are non-invariant across classes? 2. If I specify that I want more than 2 factors (e.g. EFA 1 4), Mplus says: "Too many factors were requested for EFA. The maximum number of factors is set to 2." Is 2 the maximum number of factors allowed for exploratory FMM, or does it depend on something like the number of observed variables in the model? 3. I often get the error "NO CONVERGENCE. PROBLEM OCCURRED IN EXPLORATORY FACTOR ANALYSIS WITH 2 FACTOR(S)." What is the best way to handle this? Should I try increasing one or several of the following: initial stage random starts, final stage optimizations, or initial stage iterations (STARTS and/or STITERATIONS)? Thanks, 'Alim


1. Yes 2. It depends on how many variables you have. See our Topic 1 handout for regular EFA on our website. 3. Typically this is because of negative residual variances and there is not an easy fix except to have fewer factors. 
