Daniel posted on Monday, August 25, 2003 - 9:16 am
I just read the paper by Gitta Lubke and Bengt about Monte Carlo studies of factor mixture models. I have two questions. First, how does one estimate Mahalanobis distance? Second, am I to understand that there isn't much difference in model performance (in terms of parameter coverage and average posterior probability) between a model with and without class-specific variance? Is it better to allow for class-specific variance in, for example, a growth mixture model, than to assume no variation, or does it not make much difference?
BMuthen posted on Tuesday, August 26, 2003 - 10:00 am
I think the formula for the Mahalanobis distance is in the paper. Otherwise, see a multivariate statistics textbook. It can be done by using, for example, SAS proc matrix.
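For readers who want to compute it directly, here is a minimal Python sketch of the standard textbook definition, D = sqrt((m1 - m2)' S^-1 (m1 - m2)), for two class mean vectors and a common covariance matrix. Whether this matches the exact variant used in the paper is an assumption; the function names and the example numbers are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mahalanobis_distance(mean_a, mean_b, cov):
    """Distance between two mean vectors, scaled by a common covariance matrix."""
    diff = [p - q for p, q in zip(mean_a, mean_b)]
    z = solve_linear(cov, diff)  # z = cov^-1 * diff, without forming the inverse
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


# Hypothetical two-class example with two indicators.
class1_means = [0.0, 0.0]
class2_means = [1.5, 1.0]
common_cov = [[1.0, 0.3], [0.3, 1.0]]
print(mahalanobis_distance(class1_means, class2_means, common_cov))
```

Solving the linear system instead of inverting the matrix keeps the sketch short and avoids an explicit inverse; with class-specific covariance matrices, one would have to decide which matrix (or pooled version) to pass in.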
Regarding your second question, as far as the data for this paper is concerned, you are correct. It may not be correct for all data.
Regarding class-specific variances, they can make a big difference if they are needed to fit the data. For example, studying problematic behavior development over time, one often sees a normal class which shows a low mean and very little variation over time whereas other classes may have a lot of variation.
Daniel posted on Tuesday, August 26, 2003 - 10:12 am
I have a question on the following portion of the output file:
CATEGORICAL ARE x1-x10;
CLASSES = c(3);
ANALYSIS: TYPE IS MIXTURE;
MODEL: %OVERALL%
F by x1-x10;
SAVEDATA: FILE IS outdat.dat;
save = fscores;
*** ERROR in Analysis command OUTPUT options FSCOEFFICIENT and FSDETERMINACY and SAVEDATA SAVE option FSCORES are not available when there are no latent variables.
F is a latent variable. What I really want to do is to estimate different factor structures across classes and save the estimated fscores plus the cprob. By the way, I encountered no problem in saving the cprob.
The default with mixture and categorical is to fix the factor variances to zero. Therefore no factor scores etc. are available. I think you need to add ALGORITHM = INTEGRATION to your ANALYSIS command.
Is there information on how the factor scores are computed and what are the properties of the factor scores (e.g., validity, preserved factor correlations)? Does it matter if the indicators were continuous or categorical?
thanks for your quick response (and for the excellent support!)
I'm wondering about setting the factor means for a zero class to zero. I was expecting that factor scores would be standardised so that zero would be the grand average? That way the zero class "could" have a mean that was greater than the other classes? (This sounds v. unlikely with thresholds at +15)
I'm wondering about the magnitude of the differences between the models in, say, your paper with Tihomir Asparouhov. Given that some of the analysis compares -2 log-likelihoods between nested models, what do you think about using the omega^2 effect size as an index of the magnitude of the differences between models? Are there any other alternatives?
For nested models I like to use LR chi-square based on 2*LL diffs - when that is correct (no parameters on the border). I don't know about omega. We are also contemplating generalizing the bootstrapped LR test to models that differ not only in number of classes but also in terms of number of random effects.
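As a rough illustration of the LR chi-square based on 2*LL differences, the Python sketch below computes the statistic, its degrees of freedom, and a p-value from two loglikelihoods and parameter counts. It assumes ordinary (unscaled) ML loglikelihoods and, as noted above, no parameters on the border; the function names and the numbers are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x < 0 or df < 1 or int(df) != df:
        raise ValueError("need x >= 0 and a positive integer df")
    y = x / 2.0
    # Start from the closed forms for df = 2 (exponential) or df = 1 (erfc),
    # then apply Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_difference_test(ll_restricted, ll_general, n_params_restricted, n_params_general):
    """Likelihood-ratio test of a restricted model nested in a more general one."""
    stat = 2.0 * (ll_general - ll_restricted)
    df = n_params_general - n_params_restricted
    return stat, df, chi2_sf(stat, df)


# Hypothetical loglikelihoods and free-parameter counts for two nested models.
stat, df, p_value = lr_difference_test(-1005.0, -1000.0, 18, 20)
print(stat, df, p_value)
```

Loglikelihoods from robust estimators would need a scaling correction that this sketch does not include, and comparisons that differ in the number of classes fall outside its assumptions.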
anonymous posted on Monday, April 02, 2007 - 3:15 pm
I just read the forthcoming paper (Muthén, 2006) on Latent variable hybrids: Overview of old and new models.
I was wondering if input examples for the four cross-sectional models shown graphically in Figure 1 were available (mixture factor analysis, non-parametric FA, factor mixture analysis, non-parametric FMA).
I am working on a latent profile analysis in which I use factor mixture models to test for conditional dependence. Following your previous suggestions, I use factor mixture models with class-specific factor variance. They generally work well. However, I also try to fit models with class-specific factor loadings (i.e. 7.27). You warned me that such models could be hard to fit (and they are). For example, among the numerous warnings that I obtain, I often receive messages regarding a negative (and significant) residual factor variance within one of the classes. My question is whether factor mixture models with class-specific factor loadings could be easier to fit by fixing the overall factor variance to 1 instead of fixing the loading of the first indicator to 1, while withdrawing (or not?) class-specific estimates of factor variance. Would it still be logical to compare the fit of factor mixture models with class-specific factor variance (unstandardized) to standardized models with class-specific loadings?
Yes, when the factor loading matrix is made class-specific, it is my experience that it can help to set the metric in the factor variance (@1) for each class instead of the first loading. This way, the search for the best solution is less dependent on the quality of that one item.
Thank you very much for your answer. Just to make sure I understand correctly, to "set the metric in the factor variance for each class", I only have to set the variance (f@1) in the %overall% section of the model and not in each class section.
* 10 binary observed variables, u1-u10,
* factor mixture model with 2 classes,
* one factor for each class,
* classes differ in factor means only.
What should be the statements for item thresholds and factor means under %overall%, %c#1%, and %c#2%?
I am currently working on a dissertation project comparing the performance of Meehl's taxometric methods to latent variable methods. I have recently purchased the Mplus program to run mixture modeling methods. Every data condition I have simulated uses four continuous variables. I plan to run Latent Profile Analysis, Latent Class Factor Analysis and Factor Mixture Analysis. Is it possible to run Latent Class Factor Analysis with continuous indicator variables and is there an example in the user's guide?
My Monte Carlo datasets are tab-delimited .dat files that have variable names on the first line. I got an error message when I tried to run latent profile analysis: "ERROR Invalid symbol in data file: "V1" at record #: 1, field #: 1." The analysis runs when I delete the variable names from the data file. Is there a way to work around this in the format statement? I know it is possible to skip columns with fixed data formats, but can I skip the first row with a free data format? Finally, I'd like the methods to determine the number of classes that best fit the data. Would I have to analyze each dataset twice, starting with c = 1 and then c = 2, or is it possible to ask the program to do both in one analysis?
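Since the analysis runs once the variable names are deleted, one workaround outside Mplus is to strip the first line from each generated file before analysis. The Python sketch below does that for a single file and returns the names it removed, so they can be listed in the VARIABLE command instead. The function name and the file names in the usage comment are hypothetical.

```python
from pathlib import Path


def strip_header(src_path, dst_path, delimiter="\t"):
    """Copy a delimited data file without its first line; return the names found there."""
    lines = Path(src_path).read_text().splitlines()
    if not lines:
        raise ValueError("data file is empty")
    names = lines[0].split(delimiter)
    # Write the remaining records unchanged, one per line.
    Path(dst_path).write_text("\n".join(lines[1:]) + "\n")
    return names


# Hypothetical usage over a folder of simulated replications:
# for src in sorted(Path("sim_data").glob("rep*.dat")):
#     strip_header(src, src.with_name(src.stem + "_nohdr.dat"))
```

Writing to a new file rather than overwriting keeps the original simulated data intact in case the header is needed later.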
I have conducted CFA and LCA on a set of binary indicators. The results suggest evidence for a one-factor model or a three-factor solution with parallel profiles. I want to follow up this analysis by estimating a hybrid model. I have read the papers by Muthen (2006) and Muthen & Asparouhov (2006) but I am a bit unclear about the differences between LCFA and IRT mixture modeling. I’d be grateful if you could answer the following questions:
(1) Should LCFA be estimated in favor of IRT mixture modeling when the factor is considered to have a non-normal distribution?
(2) In LCFA, is it correct that the latent classes share the same dimension and therefore the factor loadings (but not thresholds) should be equal across classes?
(3) Does it make conceptual sense to estimate both types of models on the same set of indicators? If so, in what circumstances?
One paper that may answer many of your questions is the forthcoming Clark and Muthen article entitled "Models and strategies for factor mixture analysis: Two examples concerning the structure underlying psychological disorders."
But since it is currently not available, here are some responses to your questions:
First, IRT mixture models and LCFA are not separate models; rather, LCFA is a special case where the factor loadings and item thresholds are invariant across classes and the factor variance/covariance is zero. The only difference between the classes is the location of the classes on the factor, as indicated by the factor mean being different in each class. So, to answer question 2, the classes do share the same dimension, but both the factor loadings and the item thresholds are equal across classes.
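To make these restrictions concrete, the Python sketch below computes class-specific endorsement probabilities for binary items when loadings and thresholds are shared across classes and the factor variance is zero, so each class sits at a single point (its factor mean) on the factor. The logit link and every numeric value are assumptions chosen for illustration, not estimates from any data set.

```python
import math


def item_probability(threshold, loading, factor_mean):
    """P(u = 1 | class) under a logit link when the factor has zero within-class variance."""
    return 1.0 / (1.0 + math.exp(threshold - loading * factor_mean))


def class_profile(thresholds, loadings, factor_mean):
    """Endorsement probabilities for all items in a class located at factor_mean."""
    return [item_probability(t, l, factor_mean) for t, l in zip(thresholds, loadings)]


# Hypothetical invariant measurement parameters for three binary items.
thresholds = [-0.5, 0.0, 1.0]
loadings = [1.0, 1.5, 0.8]

# Hypothetical class locations; one class mean is fixed at zero as the reference.
factor_means = {"class 1": -1.0, "class 2": 1.2, "class 3": 0.0}

for label, mean in factor_means.items():
    print(label, [round(p, 3) for p in class_profile(thresholds, loadings, mean)])
```

Because only the factor mean changes from class to class, the classes' item profiles are all generated by the same loadings and thresholds, which is what sharing the same dimension amounts to here.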
I would argue that both the LCFA and more flexible models which relax the equality of the item thresholds and factor loadings should be applied to data, but that it should be kept in mind what each of these models implies about the underlying structure of the data. LCFA and an alternative which allows the factor variance to be estimated (this variance can be restricted to be equal across classes or non-invariant) both have the same factor running through all classes, and the difference between classes arises due to having class-varying factor means and potentially factor variances. Other models which relax the equality of factor loadings and item thresholds may still have the same factor in both classes, depending on the difference in the estimated item thresholds and factor loadings when they are allowed to vary across classes.
Also, both the LCFA and other models which relax the equality of item thresholds and factor loadings across classes allow for a non-normal factor.
Matt Thullen posted on Tuesday, November 10, 2009 - 10:14 am
In LCFA, how is the zero factor score interpreted? I'm thinking of how to represent the factor scores within and between each class for a model with 3 factors...like in a graph or plot.
Also with LCFA, I based my syntax on the examples in Clark & Muthen (recently posted) and I get warnings about having more equality labels than parameters. I have something resembling this for each of my classes: [u1-u12] (1-12). The model seems to run fine but I'm not sure whether or how I should address those warnings.
I'm fitting factor mixture models and I have a model with 4 latent classes and a single latent factor in each class. When I request class probabilities and factor scores, I get two different factor scores, one which is labeled the same way I labeled my latent factor and one with C_ as a prefix. I searched the Mplus version 6 manual but can't find any information on this second factor score. Can you direct me to some relevant documentation?
Hello, I have a factor mixture model with two factors and 4 classes. I am allowing the means, thresholds, and factor loadings to vary across classes. My factor loadings, however, have only 0.00 for the SE and 999 for the p-value. Is this because I did not make any specifications for the factor variance and it is being held at 0? Can I still compare these factor loadings across classes? My code looks like this:
%overall%
y1 by u1-21;
y2 by u22-u37;
%C#1%
y1 by u1-21;
y2 by u22-u37;
[u1$1-u21$1];
[u1$2-u21$2];
[u22$1-u37$1];
[u22$2-u37$2];
[u22$3-u37$3];
[y1-y2];
...and so on for the other classes.
Please send the output and your license number to email@example.com so I can see what the problem is.
Artemis posted on Thursday, November 20, 2014 - 3:20 am
Dear Profs Muthen,
I have read with great interest your paper entitled 'Item response mixture modelling: Application to tobacco dependence criteria' and I would like to generate Table 6 from your paper for my own problem (i.e. response pattern classification by latent classes using factor mixture analysis). I guess factor scores and cprob will have to be saved, and that they depend on which model one fits, i.e. which constraints one imposes for measurement invariance, right? In addition, my sample is rather big (~7000), which makes computation quite slow: each model takes 20-30 hours to run. Would it be acceptable to randomly select, say, 1000 cases, do all comparisons between the different models on that subsample, and then fit the final model to the big sample? Something like a kind of cross-validation, if I may call it that? Many thanks for all your help and time on this.
I am running factor mixture models (FMM) at two different time points with continuous indicators (20 indicators, 5 covariates, 3 factors, and 4 classes). I'm able to get the FMMs to run successfully at each time point but when I try to combine them into a longitudinal model, the model estimation does not terminate normally. I would also like to include a distal outcome in the model.
I've tried constraining factor loadings to be equal across time and fixing factor correlations across time to be zero, but that does not help with the model estimation. I haven't been able to find any literature on longitudinal factor mixture models; I've only found cross-sectional FMMs. I was wondering if you would be able to point me towards a few useful articles or if you had any suggestions on useful constraints/parameter restrictions for longitudinal FMMs with distal outcomes. Thank you for your help.
I am trying to conduct latent class factor analysis (LCFA) using Mplus 7.4. I am following the instructions and examples provided in Clark et al (2013). According to Clark et al, only factor means should vary across classes in LCFA (they call it FMM-1). The loadings should be invariant, and the factor covariance matrix should be 0. The results fit these specifications. However, I notice that the residual variances are also invariant across classes. Should this be the case?
I am also wondering about model identification requirements. If I want all loadings to be freely estimated, I know that I must fix the mean of one class to 0 (automatically done for the last class) if there are 2 classes. Must I fix the mean of another class (e.g. to 1) if I have 3 classes? In general, must I fix the means of k-1 classes if there are a total of k classes, or is it enough to fix 1?
Finally, a very basic question: the 5 observed variables in my model are all percentages, and only 2 are normally distributed. Must percentages be declared as some data type other than continuous?
Clark, S. L., Muthén, B., Kaprio, J., D’Onofrio, B. M., Viken, R., et al. 2013. Models and strategies for factor mixture analysis: an example concerning the structure underlying psychological disorders. Structural Equation Modeling, 20(4): 681–703.
I am also trying exploratory factor mixture analysis (example 4.4). I have a few questions:
1. In the results, loadings and residual variances vary across classes. Does that mean that all parameters are non-invariant across classes?
2. If I specify that I want more than 2 factors (eg EFA 1 4), Mplus says: "Too many factors were requested for EFA. The maximum number of factors is set to 2." Is 2 the max number of factors allowed for Exploratory FMM, or does it depend on something like number of observed variables in the model?
3. I often get the error "NO CONVERGENCE. PROBLEM OCCURRED IN EXPLORATORY FACTOR ANALYSIS WITH 2 FACTOR(S)." What is the best way to handle this? Should I try increasing one or several of the following: initial stage random starts, final stage optimizations, or initial stage iterations (STARTS and/or STITERATIONS)?
Are there any references or guides that expand on the Mplus syntax given in the Clark et al. (2013) piece on FMM, such that instructions on correctly specifying FMM-2, FMM-3, and FMM-4 models with three or more latent classes are offered?
Model identification seems to be a bit tricky when moving into the FMM-2, FMM-3, and FMM-4 model specifications with three or more latent classes (and with four latent factors).
I have conducted a factor mixture model with three latent factors and two classes. Correlations between factors, as well as item residual variances are class specific. All other parameters are held equal.
By default, factor means are set to zero in class two, but are estimated in class one. I am now unsure of how to interpret the absolute levels of the factors? Is there a way to obtain absolute levels of the factors to interpret them as an average score of people within a class?
No, because they refer to latent variables, factor means are always relative - one class compared to the other. Same for multiple groups or multiple time points. You don't need more information than that.
Hello! I have read your paper "Models and strategies for factor mixture analysis" and have some questions. Thank you for shedding light on them.
1. You defined measurement invariance as factor loading and intercept invariance and proposed FMM1 and 2. We may naturally come up with FMM4 and 5. But why an FMM3? Why not add factor variance/covariance invariance in FMM2?
2. None of the 5 FMMs allowed cross-loadings, which gave more parsimonious models. Is it worthwhile to allow cross-loadings to get better model fit, given that the model fit of the 2f2c model was worse than that of the wrong 1f2c model in the second example?
3. I think the analysis of data with FMM is more exploratory, and simultaneously confirmatory. You investigated the data with different FMMs and wanted to get the model with the best model fit. But the model with the best fit was not correct. The alternative 2f2c model also needed adjustment due to partial intercept invariance. This process reminded me of the multiple group EFA asymptotical restriction and model adjustment process. What is your opinion?
4. Is skew-t applicable in the framework of FMM, and with Mplus program now?
A follow-up to the reference to the FMM-1 in the Clark et al (2013) piece. In that piece, the FMM-1 is noted as having no within-class variability on the factor. As such, using a 2-class model, the exemplar syntax sets the variance to 0 in the overall part of the model command while using the marker ID approach to set the metric as the default (also last class mean to zero). If I wanted to estimate all factor loadings and set the variance to 1 to identify the model (instead of 0), would this technically move out of the realm of FMM-1 (i.e., no within-class variability)? Since all classes have factor variance at 1, I don't think it would, but I am wondering about the consequences of doing this.
Dear Drs Muthen, I am running a factor mixture model (CFA up to 2 factors and LPA up to 5 classes) using survey weights. I have been following the steps in the Clark, Muthen et al. (2013) paper. I am able to successfully run models at level FMM-1 (i.e., class-invariant item means and factor loadings, factor covariances fixed to 0, but allowing factor means to vary). However, I have had difficulty running the models from FMM-2 (i.e., class-varying factor covariances) onwards. In the case of the FMM-2 model for the 2-class, 2-factor model (see syntax below), I was able to get convergence by fixing a small non-significant negative residual factor variance to 0, and a non-significant factor covariance to 0, but then the model results were not meaningful in that one of the two latent classes had a membership of <5% when it would be expected to be much larger. When I move to FMM-3 and 4, I consistently get error messages regarding either non-convergence or the singularity of the data, which seem to be related to the factor means. I am wondering if this would suggest that models beyond the most restrictive are not appropriate for my data? Thank you very much. Angela
%Overall% F1 by x1-x6; F2 by x7-x14; [x1-x14] (1-14);
In response to your above post, Angela, I might mention I have had similar issues progressing from FMM-1 through FMM-4 from the Clark et al. (2013) piece. Although in my case difficulties seemed related to the factor variances, the common thread seemed to be that the more you try to parse a modest-sized sample (you don't mention your N) into K + 1 classes, the harder FMM-2 thru 4 are to fit as they permit too much flexibility. Hunt & Jorgensen (2011) explicitly note that "permitting too much flexibility in component models subverts the whole idea of a mixture of definite components and risks a failure of identifiability in the model and numerical problems with the fitting."
Thank you for this comment - it is interesting to hear that you also had difficulties with this progression. My sample is 1,800+ participants. I also have read the comment by Hunt & Jorgensen, but had thought that FMM-2 was still a fairly restrictive model with relatively low flexibility, and was hoping it might be possible to test it in my data.
Yes, some data don't contain enough information to support the more flexible models. But if you like, you can send your FMM-3 run to Support along with your license number and maybe we can see something.