

Variable Type for latent class analysis 



I'm conducting a latent class analysis with scores from an 8-domain instrument representing different areas of functioning/problems. The domains are rated on a scale from 0-30, but in increments of 10 (i.e., you receive a score of 0, 10, 20, or 30 depending on severity). What would be the best way to represent the scores in the LCA? When I model the scales as continuous variables from 0-30, I get a solution indicating 4 classes that make conceptual sense for the population. However, if I convert the scales into a 4-point Likert scale (i.e., 0-3) and run them as continuous variables, I only get a 3-class solution, and the relation of covariates to class membership makes less conceptual sense. The final alternative I'm considering is to run them as 4-level categorical variables (though I haven't tried that yet). Any thoughts as to the most appropriate method given the above description? Thank you, Christian 


I would not treat a variable that cannot take on values between 0 and 10, 10 and 20, etc. as continuous. If you rescale the variable by dividing by ten and then treat it as continuous, I would be surprised if that changes the results. If the variable has floor or ceiling effects, I would treat it as categorical. 
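
To illustrate why dividing by ten should not change anything when the variable is treated as continuous, here is a minimal Python sketch. It assumes a normal within-class distribution; the scores and the mean and standard deviation are made up for illustration and are not from the post above. Rescaling the data and the parameters together shifts the log-likelihood by the constant n*log(10), which is the same for every model fitted to the same data, so comparisons between class solutions should in principle be unaffected.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log-density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


# Hypothetical domain scores on the 0, 10, 20, 30 scale (made up).
scores = [0, 10, 20, 30, 10, 0, 30, 20]

# Arbitrary illustrative parameters for a single class (assumption).
mean, sd = 15.0, 11.0

# Log-likelihood on the original 0-30 scale.
ll_original = sum(normal_logpdf(x, mean, sd) for x in scores)

# Log-likelihood after dividing data and parameters by ten (0-3 scale).
ll_rescaled = sum(normal_logpdf(x / 10, mean / 10, sd / 10) for x in scores)

# The two differ only by a constant that does not depend on the model.
shift = len(scores) * math.log(10)
print(ll_rescaled - ll_original, shift)
```

The printed difference and the constant agree up to floating-point error.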


Hello, I would like to run a latent profile analysis and eventually a latent transition analysis. My data are 4 independent variables from a screening questionnaire, each with a possible range from 0 to 10 (n = 345, clustered in 51 groups). Two of the variables use the full range of values but two do not (ranging 0-8 and 0-7). The data are positively skewed with many zeroes. I am considering the following options (although I would be grateful for other suggestions): 1) Use the data as continuous variables in a latent profile analysis. In this case, is the normality of the variables important? 2) Treat the variables as count data (when I do so, the estimated means for my classes do not match the estimated means provided in the graph of the classes across the variables, and some means in the output are actually negative, which should not be possible). 3) Dichotomize the data into at-risk vs. not at-risk based on the screening questionnaire scoring criteria (my sample size is relatively small, so I am concerned about using more than two categories due to possible sparseness). What would the best approach be? I would be very appreciative of any input as I am quite new to both Mplus and mixture models. Thank you in advance. Madison 


I would treat the variables as categorical, which can deal with a piling up of zeroes. There is a limit of ten categories, so you will need to collapse categories on the two variables that range 0-10. 
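
As a rough sketch of the collapsing step, a variable scored 0-10 has eleven categories, so one pair has to be merged to get down to ten. The Python below merges the top two categories (9 and 10). That choice is an assumption made for illustration, as is the function name `collapse_top`; the reply above does not say which categories to combine, and the example scores are made up.

```python
MAX_CATEGORIES = 10  # the category limit mentioned in the reply


def collapse_top(scores, max_categories=MAX_CATEGORIES):
    """Recode integer scores starting at 0 so that at most
    max_categories distinct values remain, by folding every score
    above the highest allowed code into that code."""
    top = max_categories - 1  # highest code kept, here 9
    return [min(s, top) for s in scores]


# Hypothetical raw scores on the 0-10 scale (made up).
raw = [0, 0, 3, 9, 10, 7, 10]
collapsed = collapse_top(raw)
print(collapsed)
```

With skewed data and many zeroes, the upper categories are likely to be the sparsest, which is why merging at the top seems a reasonable default here. Checking the observed frequencies first would show whether a different pair is a better candidate.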


