Message/Author 

Xu, Man posted on Wednesday, May 12, 2010  6:49 am



I have a bit of a problem in conducting a latent class/profile analysis. I have a few scales derived from factor scores from CFA models. I would like to use these scales as indicators for a latent profile analysis, but I am not sure whether it is good to proceed with such skewed data (most data located in the low range, very few in the mid and high range). I thought about categorising these scales and then conducting a latent class analysis, but this would result in a loss of information due to truncation. Another method I am considering is to transform the factor scores nonlinearly (e.g. arctan) so that they have shapes that resemble a normal distribution. But I am not sure whether the interpretation of the results from the latent profile analysis changes if this is done to the factor scores. If it is fine to transform the CFA-derived factor scores, I was wondering whether in Mplus I could, in one go, transform (correct the nonnormal shape of) the latent variables and fit a latent profile model using the transformed variables as indicators? Please give some advice and suggestions. Thanks very much!


If you expect a latent class (mixture) model underlying your data, it is natural for you to see nonnormal outcomes; that is what the mixture can explain. But perhaps you are concerned about skewed factor scores because the measurements do not cover the full range of the factors well. This might suggest that you should do the mixture modeling on the original outcomes. I would not first derive factor scores, then transform/discretize them, and then do mixture modeling; you don't know what you have at the end of such a long path.
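To make the transformation idea concrete, here is a minimal pure-Python sketch on made-up data (the exponential "factor scores" are simulated, not real CFA output): a compressing transform such as arctan does pull in a long right tail and reduce positive skew.

```python
import math
import random

def skewness(xs):
    """Sample skewness: third standardized central moment."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    s3 = sum((x - m) ** 3 for x in xs) / n
    return s3 / s2 ** 1.5

random.seed(1)
# Hypothetical "factor scores": most mass at the low end, a long right
# tail -- simulated here as exponential draws (strong positive skew).
scores = [random.expovariate(1.0) for _ in range(10_000)]

# arctan is concave on the bulk of the data, so it compresses the
# right tail and reduces the skewness.
transformed = [math.atan(x) for x in scores]

print(f"skewness before: {skewness(scores):.2f}")
print(f"skewness after:  {skewness(transformed):.2f}")
```

Note, though, that per the reply above, removing the skewness this way may also remove the very feature the mixture is supposed to explain.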


I have six variables that are measured on a 0-10 scale, which I want to use, together with other binary variables, in a latent class model. Ideally, I would like to treat the six variables as continuous. However, at least two of the variables are very skewed, e.g. over 92% of cases are between 6 and 10 and 30-40% are at 10. From reading previous posts (which I don't seem to be able to find again), I believe that this is problematic. I'm therefore seeking advice regarding the options in this situation. I have thought about collapsing categories to create a categorical variable; however, it is theoretically difficult to do this. Alternatively, I understand that one option is to treat them as censored variables. However, I am unsure of the assumptions involved in this approach. Any help with this is greatly appreciated. Kind regards, Jen Buckley


Can you give an example of the question and answer format for the scale? 


Hi, thank you for the quick response. The variables all come from the same question block, which has the following opening: "For each of the tasks I read out please tell me on a score of 0-10 how much responsibility you think governments should have. 0 means it should not be governments' responsibility at all and 10 means it should be entirely governments' responsibility". This is then followed by a list of tasks, such as providing health care for the sick. A showcard was also used with the question, showing 0-10 in a line with the following statements: at 0, "should not be governments' responsibility at all", and at 10, "should be entirely governments' responsibility".


You could treat the variables that have strong ceiling effects as censored-normal. It may not make a big difference compared to treating them as regular continuous-normal variables. Typically you have to have quite a strong ceiling effect to see a difference.


Following your suggestion, I've run models with censored variables. However, I've not come across censored variables before and I've encountered some problems. 1) I'm finding it difficult to understand whether they are appropriate in this case. If I understand right, censoring implies that the measurement tool is not able to measure the full range of responses. However, the measurement scale used implies that 10 is the maximum: individuals believe the government is entirely responsible. Therefore, it seems strange to imply the full range is not measured, and difficult to interpret class means of 14 or 20. 2) With the variables as continuous, a four-class model was optimal. However, with the censored variables, three- and four-class models might not be identified. (The following message appears: "ONE OR MORE PARAMETERS WERE FIXED TO AVOID SINGULARITY OF THE INFORMATION MATRIX. THE SINGULARITY IS MOST LIKELY DUE TO THE MODEL IS NOT IDENTIFIED, OR DUE TO A LARGE OR A SMALL PARAMETER ON THE LOGIT SCALE. THE FOLLOWING PARAMETERS WERE FIXED: 29 28 35". The parameters are NU(P) for one of the two censored variables.) I appreciate that these are not directly issues with Mplus; however, any guidance or references on the use of censored variables in this case, and on steps for getting the models identified, would be greatly appreciated. Thank you for your time, Jen Buckley


1. Even though 10 is the highest category allowed, not all who answer 10 may have the same opinion. There may be censoring. 2. Large intercepts in some classes mean that everyone in those classes is at the maximum.


Thanks for answering my questions. Hopefully you might be able to help me with the following. Can you explain, or provide a reference on, the technical problems associated with treating variables with strong ceiling effects as continuous-normal? In relation to point 2, I now understand the reason for the message, but not how problematic this is: do I need to do something to the model, or is it ok to proceed? Thanks again, Jen Buckley


The problem relates to the fact that the assumption of a normally distributed residual for a DV with a strong ceiling effect cannot hold at the ceiling value: only negative residual values can occur there. The regression slope is typically underestimated relative to the slope for the uncensored underlying DV. See Maddala's 1983 book "Limited-dependent and qualitative variables in econometrics," and perhaps also the classic paper by Tobin. See the Reference section in our Topic 2 handout. Regarding point 2: it is ok to proceed; there is no problem.
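The slope attenuation Maddala and Tobin describe can be seen in a small simulation (all numbers here are made up, not from the poster's data): OLS on a ceiling-censored outcome yields a noticeably flatter slope than OLS on the uncensored underlying variable.

```python
import random

random.seed(2)
n = 20_000
true_slope = 1.0
ceiling = 1.0

xs = [random.gauss(0.0, 1.0) for _ in range(n)]
# Underlying (uncensored) outcome with normally distributed residuals.
y_star = [true_slope * x + random.gauss(0.0, 1.0) for x in xs]
# Observed outcome: everything above the ceiling is recorded AT the ceiling,
# so only negative residuals are possible there.
y_obs = [min(y, ceiling) for y in y_star]

def ols_slope(xs, ys):
    """Simple-regression OLS slope: cov(x, y) / var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

print(f"slope on uncensored y*: {ols_slope(xs, y_star):.2f}")  # close to 1.0
print(f"slope on censored y:    {ols_slope(xs, y_obs):.2f}")   # noticeably smaller
```

A censored-normal (tobit-type) model recovers the underlying slope by modeling the censoring mechanism instead of ignoring it.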


Thank you, that's very helpful. 


Hi, is it correct that I do not need to transform skewed variables before conducting an LPA (using MLR)? Many thanks


Yes, the skewness is part of what is expected in mixtures and part of what determines the classes. 
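A quick simulated sketch of why mixtures imply skewness (the class weights and means below are arbitrary): each latent class is exactly normal, yet the pooled sample is clearly right-skewed.

```python
import random

def skewness(xs):
    """Sample skewness: third standardized central moment."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    s3 = sum((x - m) ** 3 for x in xs) / n
    return s3 / s2 ** 1.5

random.seed(3)
# Two latent classes: a large class centered at 0 and a small one at 3.
# Within each class the outcome is normal; the skewness in the pooled
# data is exactly the feature the mixture model explains.
pooled = [random.gauss(0, 1) if random.random() < 0.85 else random.gauss(3, 1)
          for _ in range(20_000)]

print(f"pooled skewness: {skewness(pooled):.2f}")  # clearly positive
```

Transforming such data toward normality before an LPA would blur the separation between the two components.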


Thank you very much for confirming. Do you know of any published work that I could reference supporting this? 


I can only give a general background for this, such as the McLachlan & Peel reference we give in the UG ref list. 


Hello, I've conducted a latent profile analysis with observed variables measured on 5-point Likert scales that are somewhat negatively skewed (skewness between -.7 and -1.02). My assumption is that using the latent profile approach to determine the number of classes is correct for this situation, but I am concerned that a latent class analysis may be better suited to these outcome measures. Is it correct to use an LPA in this scenario? Thanks.


I should ask what your thoughts are, not if it is "correct".... 


Because LPA is a mixture model, you would expect your outcomes to be nonnormally distributed, with a certain amount of skewness. I would only turn to viewing the outcomes as categorical (using LCA) if you have strong floor or ceiling effects.


Thank you for your input. 


Hi, Mplus team, I am estimating a model in which one variable (cost) is extremely skewed. I realize that mixture modeling is designed to capture skewed outcomes, and that by transforming the data I could potentially lose information on classes that are present. I compared model results using the untransformed and transformed cost variable and observed nontrivial differences in the results. Few differences were observed in the 2-class model, but in the 3-class model the percentage distribution of profiles and the mean values of cost were different. For instance, keeping cost on the original metric leads to the separation of one class (2% of the sample) with extremely high cost (not surprising). In the model with log-transformed cost I instead observed two classes (17% and 9%) with similarly high cost but different with respect to most other indicators. Some potential solutions that I have in mind are: 1. Trim some extreme outliers; 2. Rescale cost (e.g. by dividing it by 1000) to bring the metric closer to the rest of the variables and facilitate model convergence. I'd like to avoid recoding cost to categorical, unless you see this as an absolutely recommended approach. Also, under what circumstances would you recommend freeing the variances of cost across classes? And would you recommend sticking to the Bayesian estimator given the skewness of the data, or would MLR handle this just fine? Thank you as always!


One key issue is whether your cost variable has floor or ceiling effects, so that this is the major cause of the skewness. If not, perhaps the log transformation is sufficient. You can also try the skew-t approach of the paper on our website: Asparouhov, T. & Muthén, B. (2015). Structural equation models and mixture models with continuous nonnormal skewed distributions. Structural Equation Modeling: A Multidisciplinary Journal, DOI: 10.1080/10705511.2014.947375.
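On the two options in the question, a small simulated check (hypothetical lognormal "cost" data, not the poster's) shows the key difference: a linear rescaling such as dividing by 1000 leaves skewness unchanged, while a log transform changes the shape itself.

```python
import math
import random

def skewness(xs):
    """Sample skewness: third standardized central moment."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    s3 = sum((x - m) ** 3 for x in xs) / n
    return s3 / s2 ** 1.5

random.seed(4)
# Hypothetical cost variable, simulated as lognormal (heavily right-skewed).
cost = [math.exp(random.gauss(0.0, 1.0)) for _ in range(20_000)]

rescaled = [c / 1000 for c in cost]    # changes only the metric
logged = [math.log(c) for c in cost]   # changes the shape of the distribution

print(f"skew(cost):      {skewness(cost):.2f}")
print(f"skew(cost/1000): {skewness(rescaled):.2f}")  # same as skew(cost)
print(f"skew(log cost):  {skewness(logged):.2f}")    # near 0 by construction
```

So rescaling can ease convergence without altering the class structure that the skewness drives, whereas the log transform can change which classes emerge, as observed in the 3-class comparison above.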
