Xu, Man posted on Wednesday, May 12, 2010 - 12:49 pm
I have a bit of a problem in conducting some latent class/profile analysis. I have a few scales derived from factor scores from CFA models. I would like to use these scales as indicators for latent profile analysis, but I am not sure if it is good to proceed with such skewed data (most values are in the low range, with very few in the mid and high range). I thought about categorising these scales and then conducting a latent class analysis, but this would result in loss of information due to truncation. Another method I am thinking about is to transform the factor scores nonlinearly (e.g., arctan) so that their distributions resemble a normal distribution. But I am not sure if the interpretation of the results from the latent profile analysis changes if this is done to the factor scores.
If it is fine to transform the CFA-derived factor scores, I was wondering whether in Mplus I could, in one go, transform the latent variables (to correct the non-normal shape) and fit a latent profile model using the transformed variables as indicators?
If you expect a latent class (mixture) model underlying your data it is natural for you to see non-normal outcomes; that's what the mixture can explain.
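As a rough illustration of this point, the sketch below simulates a two-class population in which the outcome is normal within each class and then compares the skewness of the pooled sample with that of a single class. The class proportions (85%/15%), means, and standard deviations are made-up values chosen only for illustration; they do not come from the data discussed here.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: third central moment divided by the cubed SD."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical population: 85% in a "low" class, 15% in a "high" class.
# Within each class the outcome is normally distributed.
mixture_sample = [
    rng.gauss(1.0, 1.0) if rng.random() < 0.85 else rng.gauss(5.0, 1.0)
    for _ in range(20000)
]

# For comparison: a sample drawn from the low class alone.
single_class_sample = [rng.gauss(1.0, 1.0) for _ in range(20000)]

mixture_skew = skewness(mixture_sample)
single_skew = skewness(single_class_sample)

print(f"skewness, pooled mixture: {mixture_skew:.2f}")
print(f"skewness, single class:   {single_skew:.2f}")
```

The pooled sample is clearly right-skewed even though each class is normal, while the single-class sample is close to symmetric. Skewness in the pooled outcome is therefore not by itself evidence against a mixture model with normal within-class distributions.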
But perhaps you are concerned about skewed factor scores because the measurements do not cover the full range of the factors well. This might suggest that you should do the mixture modeling on the original outcomes.
I would not first derive factor scores, then transform/discretize them, and then do mixture modeling - you don't know what you have at the end of such a long path.
The variables all come from the same question block, which has the following opening:
"For each of the tasks I read out please tell me on a score of 0-10 how much responsibility you think governments should have. 0 means it should not be governments' responsibility at all and 10 means it should be entirely governments' responsibility".
This is then followed by a list of tasks, such as providing health care for the sick.
A showcard was also used with the question, showing 0-10 in a line with the following statements at 0 "should not be governments' responsibility at all" and 10 "should be entirely governments' responsibility".
You could treat the variables that have strong ceiling effects as censored-normal. It may not make a big difference compared to treating them as regular continuous-normal variables. Typically you have to have quite a strong ceiling effect to see a difference.
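To make the censored-normal idea concrete, here is a minimal sketch. It assumes the usual formulation of censoring from above, in which the recorded score is a latent normal variable capped at the scale maximum, so that means estimated under this treatment describe the latent variable and can lie above the ceiling. The function names and the example values (latent mean 14, latent SD 3, ceiling 10) are hypothetical and not taken from any model in this thread.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_observed_mean(latent_mean, latent_sd, ceiling):
    """E[min(Y*, ceiling)] for a latent Y* ~ Normal(latent_mean, latent_sd**2)."""
    z = (ceiling - latent_mean) / latent_sd
    # Contribution from latent values below the ceiling.
    below = latent_mean * normal_cdf(z) - latent_sd * normal_pdf(z)
    # Contribution from latent values recorded at the ceiling.
    at_ceiling = ceiling * (1.0 - normal_cdf(z))
    return below + at_ceiling


def share_at_ceiling(latent_mean, latent_sd, ceiling):
    """Probability that the latent value is at or above the ceiling."""
    z = (ceiling - latent_mean) / latent_sd
    return 1.0 - normal_cdf(z)


# Hypothetical class: latent mean 14, latent SD 3, scale maximum 10.
obs_mean = expected_observed_mean(14.0, 3.0, 10.0)
at_top = share_at_ceiling(14.0, 3.0, 10.0)

print(f"expected observed mean: {obs_mean:.2f}")
print(f"share recorded at 10:   {at_top:.2f}")
```

Under these assumptions, a latent mean well above 10 corresponds to an observed mean that stays below 10, with most of the class recorded at the top score.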
Following your suggestion, I've run models with censored variables. However, I've not come across censored variables before and I've encountered some problems.
1) I’m finding it difficult to understand if they are appropriate in this case. If I understand correctly, censoring implies that the measurement tool is not able to measure the full range of responses. However, the measurement scale used implies that 10 is the maximum: the individual believes the government is entirely responsible. Therefore, it seems strange to imply that the full range is not measured, and it is difficult to interpret class means of 14 or 20.
2) With the variables as continuous, a four-class model was optimal. However, with the censored variables, the three- and four-class models might not be identified. (The following message appears: “ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY DUE TO THE MODEL IS NOT IDENTIFIED, OR DUE TO A LARGE OR A SMALL PARAMETER ON THE LOGIT SCALE. THE FOLLOWING PARAMETERS WERE FIXED: 29 28 35”. The parameters are NU(P) for one of the two censored variables.)
I appreciate that these are not directly issues with Mplus; however, any guidance or references on the use of censored variables in this case, and on steps for getting the models identified, would be greatly appreciated.
The problem relates to the fact that the assumption of a normally distributed residual for a DV with a strong ceiling effect cannot hold at the ceiling value - only negative residual values can occur. The regression slope is typically underestimated relative to the uncensored underlying DV slope.
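The underestimated slope can be seen in a small simulation. The sketch below generates a latent DV from a linear regression, caps it at a ceiling of 10, and fits ordinary least squares to both versions. The intercept, slope, residual SD, and sample size are arbitrary illustrative choices, and `ols_slope` is a helper written for this example rather than a library function.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(1)  # fixed seed so the run is reproducible
CEILING = 10.0
TRUE_SLOPE = 2.0  # assumed slope for the uncensored latent DV

xs = [rng.gauss(0.0, 1.0) for _ in range(10000)]

# Latent (uncensored) DV with a normally distributed residual.
latent = [8.0 + TRUE_SLOPE * x + rng.gauss(0.0, 1.0) for x in xs]

# Observed DV: every latent value above the ceiling is recorded as 10.
observed = [min(y, CEILING) for y in latent]

slope_latent = ols_slope(xs, latent)
slope_observed = ols_slope(xs, observed)

print(f"slope, uncensored DV: {slope_latent:.2f}")
print(f"slope, censored DV:   {slope_observed:.2f}")
```

Treating the capped scores as an ordinary continuous-normal DV gives a flatter slope than the one used to generate the latent DV, which is the bias the censored-normal treatment is meant to address.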
See Maddala's 1983 book "Limited-Dependent and Qualitative Variables in Econometrics" and perhaps also the classic paper by Tobin. See the Reference section in our Topic 2 handout.
Regarding point 2: It is ok to proceed; there is no problem.