Skewed variables for latent class/pro...
Mplus Discussion > Latent Variable Mixture Modeling >
 Xu, Man posted on Wednesday, May 12, 2010 - 6:49 am
I have a bit of a problem in conducting some latent class/profile analysis. I have a few scales derived from factor scores from CFA models. I would like to use these scales as indicators for latent profile analysis, but I am not sure whether it is good to proceed with such skewed data (most values fall in the low range, with very few in the mid and high ranges). I thought about categorising these scales and then conducting a latent class analysis, but this would result in loss of information due to truncation. Another method I am thinking about is to transform the factor scores nonlinearly (e.g., arctan) so that their shapes resemble a normal distribution. But I am not sure whether the interpretation of the results from the latent profile analysis changes if this is done to the factor scores.

If it is fine to transform the CFA-derived factor scores, I was wondering whether, in Mplus, I could transform the latent variables (to correct the non-normal shape) and fit a latent profile model using the transformed variables as indicators, all in one step?

Please give some advice and suggestions.

Thanks very much!
 Bengt O. Muthen posted on Thursday, May 13, 2010 - 7:56 am
If you expect a latent class (mixture) model underlying your data it is natural for you to see non-normal outcomes; that's what the mixture can explain.
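As a rough illustration of this point (not taken from the thread), the skewness of a finite mixture of normals can be worked out in closed form. The sketch below assumes a hypothetical two-class population with 85% of cases in a low class and 15% in a high class; the class proportions, means, and standard deviations are made up for the example.

```python
import math


def mixture_moments(weights, means, sds):
    """Return (mean, variance, skewness) of a finite mixture of normals."""
    m = sum(w * mu for w, mu in zip(weights, means))
    # Variance = within-class variance + between-class spread of the means.
    var = sum(w * (sd ** 2 + (mu - m) ** 2)
              for w, mu, sd in zip(weights, means, sds))
    # Third central moment; each normal component is symmetric around its
    # own mean, so only the mean shifts (mu - m) contribute.
    third = sum(w * ((mu - m) ** 3 + 3 * (mu - m) * sd ** 2)
                for w, mu, sd in zip(weights, means, sds))
    return m, var, third / var ** 1.5


# A single normal class is symmetric.
one_class = mixture_moments([1.0], [0.0], [1.0])

# Hypothetical values: large low class, small high class, equal SDs.
two_class = mixture_moments([0.85, 0.15], [0.0, 3.0], [1.0, 1.0])

print("one-class skewness:", one_class[2])
print("two-class skewness:", round(two_class[2], 3))  # positive, roughly 0.77
```

Each class is normal here, yet the pooled distribution is clearly right-skewed, which is the sense in which skewness in the observed data need not be a problem for a mixture model.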

But perhaps you are concerned about skewed factor scores because the measurements do not cover the full range of the factors well. This might suggest that you should do the mixture modeling on the original outcomes.

I would not first derive factor scores, then transform/discretize them, and then do mixture modeling - you don't know what you have at the end of such a long path.
 Jennifer Buckley posted on Monday, November 22, 2010 - 5:44 am
I have six variables that are measured on a 0-10 scale, which I want to use with other binary variables, in a latent class model.

Ideally, I would like to treat the six variables as continuous. However, at least two of the variables are very negatively skewed (piled up at the top of the scale), e.g. over 92% of cases are between 6 and 10 and 30-40% are at 10.

From reading previous posts (which I don't seem to be able to find again) I believe that this is problematic. I'm therefore seeking advice regarding the options in this situation.

I have thought about collapsing categories to create a categorical variable; however, it is theoretically difficult to do this.

Alternatively, I understand that one option is to treat them as censored variables. However, I am unsure of the assumptions involved in this approach.

Any help with this is greatly appreciated.

Kind regards,

Jen Buckley
 Linda K. Muthen posted on Monday, November 22, 2010 - 8:04 am
Can you give an example of the question and answer format for the scale?
 Jennifer Buckley posted on Monday, November 22, 2010 - 9:18 am
Hi, thank you for the quick response.

The variables all come from the same question block, which has the following opening:

"For each of the tasks I read out please tell me on a score of 0-10 how much responsibility you think governments should have. 0 means it should not be governments’ responsibility at all and 10 means it should be entirely governments’ responsibility".

This is then followed by a list of tasks, such as providing health care for the sick.

A showcard was also used with the question, showing 0-10 in a line with the following statements at 0 "should not be governments' responsibility at all" and 10 "should be entirely governments' responsibility".
 Bengt O. Muthen posted on Monday, November 22, 2010 - 5:33 pm
You could treat the variables that have strong ceiling effects as censored-normal. It may not make a big difference compared to treating them as regular continuous-normal variables. Typically you have to have quite a strong ceiling effect to see a difference.
 Jennifer Buckley posted on Monday, November 29, 2010 - 4:15 am
Following your suggestion, I've run models with censored variables. However, I've not come across censored variables before and I've encountered some problems.

1) I’m finding it difficult to understand if they are appropriate in this case. If I understand correctly, censoring implies that the measurement tool is not able to measure the full range of responses. However, the measurement scale used implies that 10 is the maximum: individuals believe the government is entirely responsible. Therefore, it seems strange to imply the full range is not measured, and it is difficult to interpret class means of 14 or 20.

2) With the variables treated as continuous, a four-class model was optimal. However, with the censored variables, the three- and four-class models might not be identified. (The following message appears: “ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY DUE TO THE MODEL IS NOT IDENTIFIED, OR DUE TO A LARGE OR A SMALL PARAMETER ON THE LOGIT SCALE. THE FOLLOWING PARAMETERS WERE FIXED: 29 28 35”. The parameters are NU(P) for one out of two censored variables.)

I appreciate that these are not directly issues with Mplus, however any guidance or references in relation to the use of censored variables in this case and steps for getting models identified would be greatly appreciated.

Thank you for your time, Jen Buckley
 Linda K. Muthen posted on Monday, November 29, 2010 - 9:50 am
1. Even though 10 is the highest category allowed, not all who answer 10 may have the same opinion. There may be censoring.

2. Large intercepts in some classes mean that everyone is at the maximum.
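A small numerical sketch may help with both points. It assumes the usual censored-from-above setup, where the observed score is min(Y*, 10) and the underlying Y* is normal within a class; the latent standard deviation of 3 is a hypothetical value chosen only for illustration, and the 0 floor of the scale is ignored.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def censored_above(mu, sigma, ceiling):
    """For Y = min(Y*, ceiling) with Y* ~ N(mu, sigma^2), return
    (share of cases at the ceiling, expected observed score)."""
    z = (ceiling - mu) / sigma
    p_ceiling = 1.0 - normal_cdf(z)
    # E[Y] = E[Y* ; Y* < ceiling] + ceiling * P(Y* >= ceiling)
    expected = mu * normal_cdf(z) - sigma * normal_pdf(z) + ceiling * p_ceiling
    return p_ceiling, expected


# Latent class means of 7, 14 and 20 with an assumed latent SD of 3.
for latent_mean in (7.0, 14.0, 20.0):
    share, observed_mean = censored_above(latent_mean, 3.0, 10.0)
    print(latent_mean, round(share, 3), round(observed_mean, 2))
```

Under these assumptions, a latent class mean of 14 or 20 does not predict observed scores above 10. It says that nearly everyone in that class is expected to answer 10, and the expected observed score stays at or below the ceiling.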
 Jennifer Buckley posted on Wednesday, December 01, 2010 - 2:28 am
Thanks for answering my questions. Hopefully, you might be able to help me with the following?

Can you explain, or provide a reference referring to, the technical problems associated with treating variables with strong ceiling effects as continuous-normal?

In relation to point 2, I now understand the reason for the message, but not how problematic this is: do I need to do something to the model, or is it ok to proceed?

Thanks again,

Jen Buckley
 Bengt O. Muthen posted on Wednesday, December 01, 2010 - 10:43 am
The problem relates to the fact that the assumption of a normally distributed residual for a DV with a strong ceiling effect cannot hold at the ceiling value - only negative residual values can occur. The regression slope is typically underestimated relative to the uncensored underlying DV slope.
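The attenuation can be seen in a small simulation. This is only a sketch under assumed values (intercept 9, slope 1.5, residual SD 1, ceiling 10, standard normal predictor), none of which come from the thread; it compares the least-squares slope for the underlying DV with the slope obtained when the censored DV is treated as continuous-normal.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, intercept=9.0, slope=1.5, ceiling=10.0, seed=1):
    """Return (slope for latent DV, slope for censored DV, share at ceiling)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Underlying (uncensored) DV with a normal residual.
    y_latent = [intercept + slope * xi + rng.gauss(0.0, 1.0) for xi in x]
    # Observed DV: values above the ceiling are recorded at the ceiling.
    y_observed = [min(v, ceiling) for v in y_latent]
    share = sum(v >= ceiling for v in y_latent) / n
    return ols_slope(x, y_latent), ols_slope(x, y_observed), share


latent_slope, observed_slope, share_at_ceiling = simulate()
print("share at ceiling:", round(share_at_ceiling, 2))
print("slope, underlying DV:", round(latent_slope, 2))
print("slope, censored DV treated as normal:", round(observed_slope, 2))
```

With roughly 30% of cases at the ceiling in this setup, the slope for the censored DV comes out clearly smaller than the slope for the underlying DV, in line with the statement above.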

See Maddala's 1983 book on "Limited-dependent and qualitative variables in econometrics" and perhaps also the classic paper by Tobin. See Reference section in our Topic 2 handout.

Regarding point 2: It is ok to proceed; there is no problem.
 Jennifer Buckley posted on Thursday, December 02, 2010 - 1:25 am
Thank you, that's very helpful.