

Hi! I am performing a latent profile analysis (LPA) of 15 items selected from 3 different questionnaires. The items use Likert scales with 4, 5, and 6 response options, respectively. How does Mplus handle items with different scales? I am concerned that, with different scales for different items, the means and variances that make up the class response patterns would differ across classes not only because of genuine between-class differences in response patterns but also because of the different scale ranges. If this is indeed a problem the user needs to address, how would you suggest normalising/standardising each item's data before inputting it into Mplus? Or does Mplus do some form of item normalisation/standardisation for the user? Most of my item data show a skewed distribution (in the univariate, total-sample sense), so I am not sure how I would normalise them without changing their overall shape, which (I assume) I want to retain. Cheers!
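On the shape concern: a z-score (subtract the item mean, divide by its SD) is a linear transformation, so it changes location and spread but not shape, and skewness is unchanged. Transformations that do change shape, such as log or rank-based transformations, are a different matter. A minimal Python sketch that checks this on simulated right-skewed 1 to 5 responses (the data and the helper names `sample_skewness` and `z_scores` are illustrative assumptions, not from the thread):

```python
import math
import random

def sample_skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5 (divisor n)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def z_scores(xs):
    """Linear standardization: subtract the mean, divide by the SD."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]

# Simulated, deliberately right-skewed answers to a 1-5 Likert item
# (illustrative data only).
random.seed(1)
item = random.choices([1, 2, 3, 4, 5], weights=[50, 25, 12, 8, 5], k=500)

z = z_scores(item)
# Both values agree up to floating-point rounding.
print(sample_skewness(item), sample_skewness(z))
```

Standardizing therefore would not remove the skewness; as the reply notes, treating the items as ordinal is the way to model it directly.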


If you are not comparing LPA results across the 3 questionnaires, I don't see that the differences in scales matter. But you are right that having different scales makes it hard to compare LPA results. Yes, defining the variables is something the user has to do. You can treat the variables as categorical (ordinal) to deal with the skewness (perhaps there are also floor/ceiling effects, which would be handled that way).


Thanks for the quick reply, Bengt. Just to clarify, this LPA model contains 15 items derived from three different questionnaires. For this set of items, I am not comparing LPA models (i.e., 2-class vs. 3-class, or 2-class with constraints vs. 2-class unconstrained). I have accepted, based on theory, that a 2-class model is the best fit. So, do I not need to standardise the variables? What I will be doing is comparing this 2-class LPA model to a completely different 2-class LPA model (based on items from a single scale). It turns out that the two classes in both models classify essentially the same cases and relate to the same theoretical latent variable. In comparing the two LPA models (the three-questionnaire model vs. the one-questionnaire model), I will only compare their distributions of posterior probabilities within classes. It is a test of the measurement precision of the classes in each model, using different indicators. So, can you confirm that I don't need to normalise the items in the 2-class LPA model built from 3 questionnaires? Cheers!
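For a comparison of posterior-probability distributions like the one described here, two common summaries of classification precision are the relative entropy, 1 - sum_i sum_k(-p_ik ln p_ik) / (n ln K), and the average posterior probability within each most-likely class. A minimal Python sketch computing both from an n-by-K matrix of posteriors; the example posteriors are hypothetical, and whether these particular summaries suit the poster's test is an assumption:

```python
import math

def relative_entropy(post):
    """Relative entropy 1 - sum(-p ln p) / (n ln K) of a classification.

    post: one row per person, each row holding the K posterior class
    probabilities (summing to 1). Values near 1 mean sharp classification.
    """
    n, k = len(post), len(post[0])
    total = 0.0
    for row in post:
        for p in row:
            if p > 0:  # treat 0 * ln(0) as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))

def avg_posterior_by_assigned_class(post):
    """Mean posterior probability of the assigned class, per most-likely class."""
    k = len(post[0])
    sums, counts = [0.0] * k, [0] * k
    for row in post:
        c = max(range(k), key=lambda j: row[j])  # most likely class
        sums[c] += row[c]
        counts[c] += 1
    return [s / m if m else float("nan") for s, m in zip(sums, counts)]

# Hypothetical posteriors for five people under a 2-class model.
post = [[0.95, 0.05], [0.90, 0.10], [0.20, 0.80], [0.02, 0.98], [0.60, 0.40]]
print(relative_entropy(post))
print(avg_posterior_by_assigned_class(post))
```

Running the same functions on the posteriors saved from each of the two models gives directly comparable numbers, since both are 2-class solutions.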


I don't think standardization or normalization is helpful here; I would not do that.


OK, thanks. One additional thought, though. If I make all scales equal in range by just multiplying the item scores (so that they all range from 1 to 7 in my case), is that OK? Does that make sense? An item whose scale was 1 to 4 would be multiplied by 7/4, etc.


That should be OK, but it will not change the class formation.
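Why rescaling cannot change the classes: in an LPA with normal indicators, replacing an item x by a·x + b (a > 0) is matched exactly by rescaling that item's class means to a·μ + b and its SD to a·σ. Every class density for that item is then divided by the same factor a, so the posterior class probabilities are identical (the log-likelihood only shifts by a constant). Note also that multiplying a 1 to 4 item by 7/4 gives a range of 1.75 to 7; mapping it onto exactly 1 to 7 needs 2x - 1. A minimal Python sketch checking both points, assuming hypothetical parameter values (two classes, three items, item SDs held equal across classes); the helper names are illustrative:

```python
import math

def normal_pdf(x, mean, sd):
    """Normal density N(mean, sd**2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posteriors(y, priors, means, sds):
    """Posterior class probabilities for one response vector y under an LPA
    with locally independent normal items (class-specific means,
    item SDs equal across classes)."""
    joint = []
    for k, prior in enumerate(priors):
        lik = prior
        for j, yj in enumerate(y):
            lik *= normal_pdf(yj, means[k][j], sds[j])
        joint.append(lik)
    total = sum(joint)
    return [v / total for v in joint]

def rescale(x, lo, hi, new_lo, new_hi):
    """Linear map sending [lo, hi] onto [new_lo, new_hi]."""
    return new_lo + (x - lo) * (new_hi - new_lo) / (hi - lo)

# Hypothetical 2-class parameters for three items scored 1-4, 1-5 and 1-6.
priors = [0.6, 0.4]
means = [[1.8, 2.0, 2.5], [3.2, 3.9, 4.8]]
sds = [0.7, 0.9, 1.1]
y = [3, 2, 4]

before = posteriors(y, priors, means, sds)

# Put the first item (scored 1-4) on a 1-7 range: x -> 2x - 1, so a = 2.
a = 2.0
y_new = [rescale(y[0], 1, 4, 1, 7)] + y[1:]
means_new = [[rescale(m[0], 1, 4, 1, 7)] + m[1:] for m in means]
sds_new = [a * sds[0]] + sds[1:]

after = posteriors(y_new, priors, means_new, sds_new)
print(before)
print(after)                  # same values up to floating-point rounding
print(1 * 7 / 4, 4 * 7 / 4)   # multiplying by 7/4 gives 1.75 and 7.0
```

In an actual analysis the means and SDs would be re-estimated rather than transformed by hand, but the estimates would move in exactly this way, which is why the class formation stays the same.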


Related question: I am doing an LPA with 14 outcomes and 5 covariates. I am running both R3STEP and a regular LCA without R3STEP (i.e., with the covariates included in the MODEL statement). The R3STEP model runs with no problems. The latter model (regular LCA with covariates) never replicates the best log-likelihood (LL). I am convinced this is because one of the covariates is on a massively different scale from the other variables, and that this makes estimation of the class posterior probabilities more cumbersome (which wouldn't be the case with R3STEP). This variable is birthweight in grams. The other covariates are the usual demographics, including sex, ethnicity, etc. The indicators are all continuous, on either 1 to 5 or 1 to 9 scales. Do you think I need to transform birthweight in grams, or would this not matter? Looking at the list of LL values, they replicate quite a bit until the best LL value right at the end. I'm puzzled.


Q1: Try it. Perhaps you also have direct effects. 


So z-scoring the birthweight-in-grams variable did the trick (the best LL replicated many times). I'm wondering, though, whether there is a 'short' answer as to why putting this variable on a scale more like that of the other covariates (sex, ethnicity, age in years) achieves the desired outcome. My thinking is along the lines that, because the original variable was on such a massively different metric, a global solution would almost never be reached (i.e., the grams variable is in essence its own local solution). Is this reasoning in the ballpark at all?


It has to do with numerical precision, which can be higher when all variables have variances of similar size.
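One generic way to see the precision point is the condition number of the covariates' cross-product matrix. With birthweight in grams next to a 0/1 covariate and age in years, it is huge; z-scoring birthweight shrinks it by orders of magnitude. A minimal Python/NumPy sketch with simulated covariates (the means, SDs, sample size, and helper names are assumptions, and ethnicity is omitted for brevity). It only illustrates ill-conditioning and is not a description of how Mplus computes its estimates:

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 1000

# Simulated covariates (hypothetical values, for illustration only).
sex = rng.integers(0, 2, size=n).astype(float)    # 0/1
age = rng.normal(10.0, 2.0, size=n)               # years
bw_grams = rng.normal(3400.0, 500.0, size=n)      # grams, variance ~250,000

def design(*cols):
    """Covariate matrix with an intercept column."""
    return np.column_stack([np.ones(n), *cols])

def cond_of_crossproduct(X):
    """2-norm condition number of X'X / n; large values mean that small
    rounding errors can be greatly amplified in computations using it."""
    return np.linalg.cond(X.T @ X / len(X))

# Same z-scoring as in the thread: subtract the mean, divide by the SD.
bw_z = (bw_grams - bw_grams.mean()) / bw_grams.std()

raw = cond_of_crossproduct(design(sex, age, bw_grams))
zed = cond_of_crossproduct(design(sex, age, bw_z))
print(f"condition number, grams:    {raw:.3g}")
print(f"condition number, z-scored: {zed:.3g}")
```

Dividing grams by 1000 (kilograms) would also bring the variance down to a size comparable with the other covariates; z-scoring is simply the option that was tried here.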
