Mplus Discussion >> Data distribution assumptions with version 2.0



Peter Tice posted on Tuesday, March 20, 2001 - 11:02 am

What data distribution assumptions does Mplus version 2.0 make in estimating growth mixture models? I understand that version 1.04 assumes that the data used in estimating the mixture models are normally distributed. Is the same assumption made in version 2.0, or is there more flexibility in estimating growth mixture models with non-normal data?

Thanks,

Linda K. Muthen posted on Tuesday, March 20, 2001 - 5:18 pm

The data distribution assumptions have not changed between Version 1 and Version 2. The assumption for the continuous y variables is that they are normally distributed within class given x. The marginal distributions of y can be quite non-normal. The latent class indicators are categorical so assumptions of normality are not relevant.
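
To make the distinction between within-class and marginal distributions concrete, here is a sketch of a mixture density in generic notation. It is an illustration, not part of the original reply, and the symbols are assumptions introduced here: $K$ classes, class probabilities $\pi_k$ (written without covariate effects on class membership), and class-specific means $\mu_k(x)$ and covariance matrices $\Sigma_k$.

```latex
f(y \mid x) = \sum_{k=1}^{K} \pi_k \, \phi\bigl(y;\, \mu_k(x),\, \Sigma_k\bigr),
\qquad \sum_{k=1}^{K} \pi_k = 1
```

Each component $\phi$ is a normal density, which is the within-class assumption. The weighted sum over classes can be skewed or multimodal, which is why the marginal distribution of y need not be normal.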

Anonymous posted on Tuesday, March 27, 2001 - 1:15 pm

Can Mplus deal with unbalanced longitudinal data for mixture modeling?

Linda K. Muthen posted on Monday, April 02, 2001 - 10:22 am

If the outcomes are continuous, Mplus can deal with unbalanced longitudinal data, that is, data with missing values for some individuals at some timepoints.

Emily Durbin posted on Friday, April 20, 2001 - 12:35 pm

For LC models using categorical data across multiple time points, must observations with missing data at any time point be completely eliminated from the LC analysis?

Are there any alternative methods for estimating the missing status of a subject on the categorical DV using data from the other time points?

Thanks!

Linda K. Muthen posted on Friday, April 20, 2001 - 5:29 pm

In Version 2 of Mplus, missing data are allowed on the categorical latent class indicators by using TYPE=MIXTURE MISSING.
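
A minimal sketch of an input that requests this, offered as an illustration rather than as the poster's own setup. The file name, variable names, and missing-value code are hypothetical, and depending on the Mplus version a MODEL command with starting values may also be needed.

```mplus
TITLE:    LCA with missing data on the indicators (sketch)
DATA:     FILE = lca.dat;           ! hypothetical file name
VARIABLE: NAMES = u1-u4;            ! hypothetical indicator names
          CATEGORICAL = u1-u4;
          CLASSES = c(2);
          MISSING = ALL (-9);       ! assumed missing-value code
ANALYSIS: TYPE = MIXTURE MISSING;   ! option named in the reply
```

Subjects with a missing value on some indicators would then contribute whatever they have observed instead of being dropped.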

Anonymous posted on Wednesday, April 17, 2002 - 3:42 pm

My analyses suggest that when time-varying covariates are included in a growth mixture model, listwise deletion is applied even though TYPE=MIXTURE MISSING is specified.

Linda K. Muthen posted on Wednesday, April 17, 2002 - 4:21 pm

With TYPE=MIXTURE MISSING, listwise deletion is applied to both time-varying and time-invariant x variables, so any subject with a missing value on one or more x variables is eliminated from the analysis.

Anonymous posted on Tuesday, April 06, 2004 - 12:02 pm

A question about data distribution assumptions in Version 3: You indicate on the web site that mixture models will now be estimable with binary, ordinal, nominal, and count data as well as continuous data. What distribution is assumed for the latent factors when working with the discrete scales? Are mixtures of normals still used at the level of the factors, or some other continuous distribution, or is a point mass approach taken in which the factors have means but no variances (similar to what is in GLLAMM)? Thanks in advance for your clarification (guidance to references would also be appreciated if available).

bmuthen posted on Tuesday, April 06, 2004 - 1:07 pm

Mplus allows you to take either of the two approaches that you mention, a mixture of normals, or the non-parametric mass point approach. These two approaches are discussed in examples in chapter 7 of the Version 3 User's Guide. The User's Guide refers to work by Aitkin (1999) in Biometrics for the non-parametric approach. The following paper discusses the factor mixture approach with non-zero factor variances (see the Mplus home page for a pdf):

Lubke, G. & Muthén, B. (2003). Performance of factor mixture models. Under review, Multivariate Behavioral Research.

Anonymous posted on Tuesday, April 06, 2004 - 5:27 pm

I have a few questions about the data requirements for GMM.
1. What assumptions are made about the time structure of the data? Is it the same as conventional growth curve models where the Y variable can be unequally spaced, yet in the complete data individuals must be measured at the same time points?
2. Is it likewise assumed that time-varying covariates have the same distribution across individuals?
3. In version 3, is there still listwise deletion of data with missing values on the covariates?
Thanks!

Linda K. Muthen posted on Tuesday, April 06, 2004 - 6:02 pm

1. You can have a model where the time scores are parameters in the model or where the time scores are treated as data. In the first case, all individuals are measured at the same times. In the second case, individuals can be measured at different times.

2. No, time-varying covariates can also be measured at different times for different individuals.

3. Yes, but if you treat the x-variables as y-variables by mentioning their variances in the MODEL command, cases with missing values on them will not be deleted. The only difference is that the model is no longer estimated conditional on the x's; they become part of the model.
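
As an illustration of point 3 (a sketch, not the poster's own input), mentioning the variance of a covariate in the MODEL command could look like this. The file name, variable names, missing-value code, and linear time scores are all assumptions.

```mplus
TITLE:    GMM with a covariate brought into the model (sketch)
DATA:     FILE = gmm.dat;            ! hypothetical file name
VARIABLE: NAMES = y1-y4 x;           ! hypothetical variable names
          MISSING = ALL (-99);       ! assumed missing-value code
          CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE MISSING;
MODEL:    %OVERALL%
          ! linear growth with fixed time scores
          i s | y1@0 y2@1 y3@2 y4@3;
          i s ON x;
          ! mentioning the variance of x treats it as a y-variable
          x;
```

Because x is then part of the model, distributional assumptions are made about it as well, which is the trade-off for retaining cases with missing values on x.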

Ted Fong posted on Tuesday, March 18, 2014 - 3:02 am

Dear Dr. Muthen,

I have performed a factor mixture analysis on 9 continuous items and found that a two-class (high vs. low), two-factor mixture model fits the data.

However, a reviewer questions the validity of the two classes because they may arise from violations of assumptions, such as heavy skewness and kurtosis. He wants to know how badly behaved the data are, since this may in turn give rise to the mixture components.

I have checked that the skewness and kurtosis of the 9 variables are slightly negative (-1 to 0). Six of the 9 variables appear to have a bimodal distribution. I have run TECH13, which rejects the multivariate skewness and kurtosis of the two-class, two-factor model.

I understand that the overall distribution of the variables can be non-normal in mixture analysis. I am not sure whether my data behave well or badly in the reviewer's view. Overall, he seems to question the use of mixture analysis for finding distinct, meaningful subgroups versus its alternative, indirect use for approximating some unknown distribution.

I have read the papers written by Bauer and your 2003 paper on substantive and statistical checking of mixture models, but I don't know how to proceed or how to respond to the reviewer.

Do you have any views on this matter? I would be grateful for any comments.

Bengt O. Muthen posted on Tuesday, March 18, 2014 - 1:58 pm

You can use BIC to argue on statistical grounds that a two-class factor mixture model is better than a regular 1-class factor model (or a two-class LCA).
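
For reference, BIC in its usual form is sketched below. The symbols are assumptions introduced for illustration: $\log L$ is the maximized log-likelihood, $p$ the number of free parameters, and $n$ the sample size.

```latex
\mathrm{BIC} = -2 \log L + p \ln n
```

Lower values indicate a better balance of fit and parsimony, so the statistical argument compares the BIC of the two-class factor mixture model with that of each competing model fitted to the same data.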

You can use substantive theory to argue on subject-matter grounds that your two classes are useful and of substantive interest by relating the class membership to other variables as I argued in my 2003 response. You can use 3-step methods for that: R3STEP to look at antecedents (predictors of the latent class variable) and DCAT/DCON to look at consequences (distal outcomes; predictive validity).

I think you want to make both arguments in a mixture analysis.
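
A sketch of how the antecedent check might be requested, offered as an illustration and not as the poster's own input. The file name, the variable names `age` and `dist`, and the split of the nine items across the two factors are hypothetical.

```mplus
TITLE:    Two-class, two-factor mixture with an antecedent (sketch)
DATA:     FILE = fmm.dat;            ! hypothetical file name
VARIABLE: NAMES = y1-y9 age dist;    ! hypothetical variable names
          USEVARIABLES = y1-y9;
          CLASSES = c(2);
          ! age as a predictor of class membership
          AUXILIARY = age (R3STEP);
          ! for a distal outcome, a separate run could use:
          ! AUXILIARY = dist (DCON);
ANALYSIS: TYPE = MIXTURE;
MODEL:    %OVERALL%
          f1 BY y1-y5;               ! assumed item-to-factor split
          f2 BY y6-y9;
```

The auxiliary variables stay out of the measurement model, so the classes are formed from y1-y9 alone and then related to the external variables.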

Bengt O. Muthen posted on Tuesday, March 18, 2014 - 1:59 pm

If you send me an email we can discuss this further.

M.O. posted on Tuesday, January 06, 2015 - 6:56 pm

Dear Dr. Muthen,

I have a question related to Dr. Fong's.

I am trying to find an appropriate analysis method for a dataset that has a bimodal distribution (N=888, 44 items, 5-point scale).

I have tried latent class analysis (LCA), exploratory mixture factor analysis (EMFA), and, for comparison, exploratory factor analysis (EFA). Variables were treated as categorical.

In terms of BIC, LCA had the largest value (BIC=83000), while EMFA and EFA were similar (BIC=72000 and 73000, respectively).

Based on BIC, EMFA seems the best. However, when I examined the data distribution of each class produced by the EMFA, I found that one of the classes still has a bimodal distribution.

Now, my worry is a violation of the distributional assumption for the factor analysis part of EMFA. Should I refrain from using EMFA?

Thank you,

Bengt O. Muthen posted on Wednesday, January 07, 2015 - 5:37 pm

EMFA still has the best BIC. It may be hard to find, but it seems that an even better model is possible. I assume that using one more class for EMFA gives a worse BIC, but perhaps that is because you add too many parameters in that extra class such as class-specific loadings.

M.O. posted on Friday, January 09, 2015 - 8:38 pm

Dear Dr. Muthen,

Thank you very much for your insight. I could not figure out how to try class-invariant models in the EMFA framework. I will try LCA-CFA.

M.O. posted on Friday, January 09, 2015 - 8:51 pm

Dear Dr. Muthen,

I would like to conduct LCA-CFA while allowing factor means to vary across classes. I found Example 7.17 to be similar to what I would like to try, but I am not sure how to apply that example in a case with 2 factors and 3 classes. Is the following script correct? The output appeared OK, but I would like to make sure.

---------
TITLE: LCA-CFA
DATA:
  FILE = 'test.dat';

VARIABLE:
  NAMES = u1-u10;
  CATEGORICAL = u1-u10;
  CLASSES = C(3);

ANALYSIS:
  TYPE IS MIXTURE;

MODEL:
  %OVERALL%
  f1 by u1-u5;
  f2 by u6-u10;
  %c#1%
  [f1*1];
  [f2*1];
----------------------

Thank you again,

Bengt O. Muthen posted on Saturday, January 10, 2015 - 2:14 pm

Looks ok, except you want to have

%c#1%
[f1*1];
[f2*1];

%c#2%
[f1*1];
[f2*1];

You get the same model if you also mention these variances for class 3.

M.O. posted on Sunday, January 11, 2015 - 6:28 pm

Dear Dr. Muthen,
Thanks a lot. The results appear better; there is no bimodal distribution in any class.

I am sorry to bother you. Would you kindly check my code again? This time, I would like to try a model where factor loadings, factor variances, and thresholds of indicators are class specific, as in Ex 7.27. I get warning messages when I run it, but I am not sure whether they are due to errors in my code.

----
TITLE: Factor mixture model
DATA: FILE = 'sample.dat';

VARIABLE:
  NAMES = u1-u10;
  CATEGORICAL = u1-u10;
  CLASSES = C(3);

ANALYSIS:
  TYPE IS MIXTURE;
  ALGORITHM=INTEGRATION;

MODEL:
  %OVERALL%
  f1 by u1-u5;
  f2 by u6-u10;
  [f1@0];
  [f2@0];

  %c#1%
  ! class-specific loadings, factor variances, and thresholds
  f1 by u1@1 u2-u5;
  f2 by u6@1 u7-u10;
  f1;
  f2;
  [u1$1-u5$1];
  [u6$1-u10$1];

  %c#2%
  f1 by u1@1 u2-u5;
  f2 by u6@1 u7-u10;
  f1;
  f2;
  [u1$1-u5$1];
  [u6$1-u10$1];
----
Thanks again,

Linda K. Muthen posted on Monday, January 12, 2015 - 6:08 am

Please send the output and your license number to support@statmodel.com.

M.O. posted on Sunday, January 18, 2015 - 10:12 pm

Dear Dr. Muthen,
I would like to conduct LCA-CFA while allowing factor means to vary across classes. Thanks to your kind instructions (Jan 10, 2015 on this thread), I was able to run the program, and I found that the results are almost the same as those obtained without the class-specific statements. To clarify, here are the scripts (2-class, 1-factor model).

When I tested them with several datasets, (A) and (B) often gave the same results, but (B) sometimes gave very different results with a warning message (pasted below). Is it OK to use (A)?

---(A)without class-specific statement---
MODEL:
%OVERALL%
f by u1-u5;

---(B)with class-specific statement---
MODEL:
%OVERALL%
f by u1-u5;
%c#1%
[f*1];
%c#2%
[f*1];
----------------------------------------------
Warning message for (B)
ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE
INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY BECAUSE THE
MODEL IS NOT IDENTIFIED, OR BECAUSE OF EMPTY CELLS IN THE JOINT
DISTRIBUTION OF THE CATEGORICAL VARIABLES IN THE MODEL.
----------------------------------------------

Thank you very much again,

Linda K. Muthen posted on Monday, January 19, 2015 - 7:17 am

Factor means must be zero in one class for the model to be identified. They cannot be free across all classes.
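
One way specification (B) could be written to respect this constraint, as a sketch and not the poster's own code: the factor mean is free in class 1 and fixed at zero in class 2. Treating the last class as the reference class is an assumption made here.

```mplus
MODEL:
  %OVERALL%
  f BY u1-u5;
  %c#1%
  [f*1];     ! factor mean free in class 1, starting value 1
  %c#2%
  [f@0];     ! factor mean fixed at zero in the reference class
```

With only one of the two factor means free, the class difference in factor means is expressed relative to the reference class.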

M.O. posted on Saturday, January 24, 2015 - 11:57 pm

Dear Dr. Muthen,

I would like to conduct a mixture factor analysis in an exploratory setting. Is it possible to conduct LCA-EFA while holding factor loadings equal across classes? I suspect that I could add a model specification to Example 4.4, but it is not clear to me how.

Just to clarify, here is example 4.4 in the Mplus manual.

-------
TITLE: this is an example of an exploratory
factor mixture analysis with continuous
latent class indicators
DATA: FILE = ex4.4.dat;
VARIABLE: NAMES = y1-y8;
CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE EFA 1 2;
---------

Thank you very much,

Bengt O. Muthen posted on Sunday, January 25, 2015 - 1:22 pm

You can't do that in the EFA setting, but you can do it using the CFA counterpart in ex 7.27. Your CFA can impose the minimal EFA-like restrictions, plus hold loadings equal across classes.

M.O. posted on Sunday, January 25, 2015 - 4:23 pm

Dear Dr. Muthen,

Thank you very much for your reply.

I wonder if there are any recommended ways to explore the number of factors (and the items belonging to each factor) in the context of LCA-CFA.

I have seen some studies that conducted a preliminary EFA to explore the factor structure and then used it for model specification in LCA-CFA. However, in our case the data have bimodal distributions, and I am not sure how much I can rely on the EFA results. I am also thinking of trying EFA in CFA in combination with LCA, but I am not sure about the right way of specifying the anchor for each factor.

Thank you again for your insight.

Bengt O. Muthen posted on Monday, January 26, 2015 - 8:18 am

EFA-within-CFA is described in our Topic 1 handout on our website, slides 134-146.

M.O. posted on Monday, January 26, 2015 - 5:55 pm

Thank you very much for your advice. I read the slides.

I am sorry that I did not express my question clearly. My question was more about how to explore the number of factors (and the items belonging to each factor) in the context of mixture factor analysis (LCA-CFA).

When using EFA within the CFA framework, we need to start with EFA to determine anchor items. However, since our data have bimodal distributions, the EFA solution might not be correct. So there is a dilemma: the EFA result is not reliable because of the bimodal distribution, and the LCA result is not reliable because of the violation of local independence. Then, to conduct mixture factor analysis, we need to make some assumption about the factor structure.

I suspect that there is no single "correct" way of dealing with this problem, but I wondered whether you have recommendations about this issue.

Bengt O. Muthen posted on Tuesday, January 27, 2015 - 8:57 am

I think the way you started is the way to go: Mixture EFA (which allows loadings to differ across classes). With, say, 2 classes, the bimodality can be accounted for. The strategy is to look at BIC for a series of models:

1 class, 1 factor
1c, 2f
etc
2c, 1f
2c, 2f
etc
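
A sketch of one run in such a series, modeled on Example 4.4 quoted earlier in the thread and offered as an illustration only. The file and item names are hypothetical, and the CATEGORICAL statement assumes ordinal items.

```mplus
TITLE:    Mixture EFA, 2 classes, 1-2 factors (sketch)
DATA:     FILE = items.dat;          ! hypothetical file name
VARIABLE: NAMES = u1-u10;            ! hypothetical item names
          CATEGORICAL = u1-u10;
          CLASSES = c(2);            ! rerun with c(1), c(3), ...
ANALYSIS: TYPE = MIXTURE EFA 1 2;    ! 1- and 2-factor solutions
```

Each run gives the BIC for every requested number of factors at a fixed number of classes, so repeating it with a different CLASSES setting fills in the rest of the list.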

M.O. posted on Tuesday, January 27, 2015 - 7:09 pm

Dear Dr. Muthen,

Thanks a lot, I feel much more comfortable about this now. One last question: are you aware of any papers that would be relevant to cite?

Bengt O. Muthen posted on Wednesday, January 28, 2015 - 4:39 pm

See Papers, Factor Mixture Analysis on our website.