Mplus Discussion >> Categorical vs. binary variables

Categorical vs. binary variables

rgm smeets posted on Wednesday, October 10, 2018 - 12:15 pm

I am running an LCA with categorical variables that have more than two categories.
Because binary variables seem easier for me to interpret, I am wondering whether I could recode these multi-category
variables into binary (dummy) variables. Does this influence how the subgroups are formed, or does it make no difference?

As an example, I could create
five age categories within a single age variable and enter it into Mplus as a categorical variable with five categories
(i.e. category 1 = 0-19 yrs, category 2 = 20-39 yrs, and so on).
Alternatively, I could enter five binary variables, one for each age category (i.e. age_variable1 (1 = 0-19 yrs,
0 = else), age_variable2 (1 = 20-39 yrs, 0 = else), and so on).
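
To make the two codings concrete, here is a minimal Python sketch of the data preparation only (not Mplus syntax). Only the first two age bands (0-19 and 20-39) come from the example above; the remaining cut points and the function names are assumptions for illustration.

```python
# Upper bounds of the first four age bands. Only 19 and 39 come from the
# example above; 59 and 79 are hypothetical. Older ages fall in category 5.
AGE_UPPER_BOUNDS = [19, 39, 59, 79]


def age_category(age):
    """Return the 1-based age category (1..5) for an age in years."""
    for category, upper in enumerate(AGE_UPPER_BOUNDS, start=1):
        if age <= upper:
            return category
    return len(AGE_UPPER_BOUNDS) + 1


def age_dummies(age):
    """Return five 0/1 indicators, one per age category."""
    category = age_category(age)
    return [1 if category == k else 0 for k in range(1, 6)]


ages = [4, 25, 41, 63, 88]  # made-up ages
single_variable = [age_category(a) for a in ages]   # one 5-category indicator
dummy_variables = [age_dummies(a) for a in ages]    # five binary indicators

for age, cat, dummies in zip(ages, single_variable, dummy_variables):
    print(age, cat, dummies)
```

Note that the five dummies always sum to 1 for every person, so they are dependent on each other by construction. That is one plausible reason the two codings need not behave the same way in a latent class model.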

Bengt O. Muthen posted on Wednesday, October 10, 2018 - 5:07 pm

Yes, it makes a difference in the class formation if you work with a dichotomized version. You can check the sensitivity to this by running the analysis both ways.

rgm smeets posted on Thursday, October 25, 2018 - 7:49 am

For the past few weeks, I have been running an LCA with categorical and continuous indicators. I encountered some warnings/messages in my output. I searched this forum for the implications of these warnings and concluded that I should not worry about them (as the system deals with the situation by, for instance, fixing parameters).

To be 100% sure that the warnings are not a problem, I would like to ask whether I am interpreting them correctly.

Warning 1: logit thresholds of some indicators are set at extreme values (-15 and 15) --> this is not a problem; it just means that in some classes there is a very high or very low probability of endorsing the item.

Warning 2: parameters are fixed to avoid singularity of the information matrix (due to model non-identification or empty cells) --> this is also not a problem. It just tells me that the system fixes parameters to prevent singularity. I can imagine, though, where the singularity comes from, as I have closely related indicators.

Warning 3: of the X cells in the latent class indicator table, X were deleted in the calculation of chi-square due to extreme values --> this is also not a problem. I should only be careful in interpreting the chi-square, as it might be unreliable. The rest of the results are fine.

Thanks a lot!

Bengt O. Muthen posted on Thursday, October 25, 2018 - 6:19 pm

W1: Correct.

W2: A non-identification message could be a problem if not related to empty cells.

W3: Right.
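
As a rough illustration of what the extreme thresholds in Warning 1 mean, the sketch below converts a logit threshold into a class-specific item probability. It assumes a binary indicator with no covariates and the usual logit parameterization, in which the probability of the higher category is 1 / (1 + exp(threshold)); the function name is hypothetical.

```python
import math


def prob_higher_category(threshold):
    """P(u = 1 | class) for a binary item, given its logit threshold.

    Assumes P(u = 1 | class) = 1 / (1 + exp(threshold)), i.e. a large
    positive threshold means the higher category is almost never chosen.
    """
    return 1.0 / (1.0 + math.exp(threshold))


for tau in (-15.0, 0.0, 15.0):
    p = prob_higher_category(tau)
    print(f"threshold {tau:+.0f}: P(u = 1 | class) = {p:.7f}")
```

Under this assumption, a threshold fixed at 15 or -15 corresponds to a probability that is essentially 0 or 1 in that class.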

rgm smeets posted on Friday, October 26, 2018 - 12:06 am

Thank you for your reply. With regard to warning 2, is there any way I can check whether the warning comes from non-identification or from empty cells? Can I see the information matrix somewhere? By the way, I increased the number of random starts several times and did not get any other messages about non-convergence or non-identification.

Bengt O. Muthen posted on Friday, October 26, 2018 - 3:26 pm

The output shows which parameter is flagged in the non-identification message. If you increase the number of starts, you may get a better logL value (check this), and thus a different solution that doesn't have the problem.

rgm smeets posted on Saturday, October 27, 2018 - 10:33 am

Thank you for the answer. I am still unsure whether this warning is due to model non-identification or to empty cells. I increased the number of starts several times (from 100 20 to 200 40 to 400 80 and so on) and each time the best logL was replicated (although a small part of the runs did not converge). But the warning about the fixing of parameters remains. Should I still worry about this warning if the best logL gets replicated several times? The frustrating thing is that the classes we find are clinically relevant.

Bengt O. Muthen posted on Sunday, October 28, 2018 - 2:08 pm

When you do these runs with different numbers of starts, the important thing is not only whether the best logL is replicated within a run but also whether these best logL values vary across the runs. You of course want to use only the run that has the best of these best logL values.

If the warning remains in the best run, you can send your full output to Support along with your license number so we can understand what's going on.
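
The comparison across runs with different numbers of starts can be sketched in Python as follows. All logL values here are made up for illustration, and the tolerance for treating two values as "the same" is an assumption; in practice the values would be read from the final-stage logL list in each output.

```python
def summarize_run(final_logls, tol=1e-3):
    """Return (best logL, number of final-stage solutions within tol of it)."""
    best = max(final_logls)
    replications = sum(1 for value in final_logls if best - value <= tol)
    return best, replications


# Hypothetical final-stage logL values, keyed by the STARTS setting used.
runs = {
    "100 20": [-5231.42, -5231.42, -5240.10, -5263.77],
    "200 40": [-5229.87, -5229.87, -5229.87, -5231.42],
    "400 80": [-5229.87, -5229.87, -5231.42, -5229.87],
}

summaries = {starts: summarize_run(values) for starts, values in runs.items()}
overall_best = max(best for best, _ in summaries.values())

for starts, (best, replications) in summaries.items():
    if overall_best - best <= 1e-3:
        note = "best overall"
    else:
        note = "replicated, but a worse local maximum"
    print(f"STARTS = {starts}: best logL {best:.2f}, "
          f"replicated {replications}x, {note}")
```

In this made-up example the run with 100 20 starts replicates its own best logL, yet that value is worse than the one found with more starts, which is why replication within a single run is not enough.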

rgm smeets posted on Tuesday, October 30, 2018 - 8:18 am

Thank you for your help again. I have now made my model simpler (by leaving out some variables and merging some categories within variables). This makes the warning about the singularity disappear but gives me the same clinically relevant classes. I also see that the (same) best logL is replicated across several increases in the number of starts (from 200 40 to 400 80) without any warnings about unconverged runs. When I move to a 5-class model, however, I see that the best logL is replicated in just over half of the runs and some runs do not converge. Is this a sign that I should stop increasing the number of classes?

Bengt O. Muthen posted on Tuesday, October 30, 2018 - 5:09 pm

Yes, it could be a sign of that. But check with BIC.
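
For reference, a BIC comparison across class solutions can be sketched as below, using the standard definition BIC = -2 logL + p ln(n), where p is the number of free parameters and n the sample size. The logL values, parameter counts, and sample size are hypothetical; Mplus prints BIC in the output, so this only illustrates the comparison.

```python
import math


def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion: -2 logL + p * ln(n). Lower is better."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)


# Hypothetical results: number of classes -> (best logL, free parameters).
fits = {
    3: (-5290.4, 26),
    4: (-5229.9, 35),
    5: (-5221.3, 44),
}
n = 1000  # hypothetical sample size

bics = {k: bic(logl, p, n) for k, (logl, p) in fits.items()}
best_k = min(bics, key=bics.get)

for k in sorted(bics):
    print(f"{k} classes: BIC = {bics[k]:.1f}")
print(f"Lowest BIC: {best_k} classes")
```

With these made-up numbers, the small logL gain from four to five classes does not offset the penalty for the extra parameters, so the 4-class model has the lowest BIC.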