

Modeling unreliability of a categoric... 



Hello, I am interested in testing the predictive ability of a categorical diagnosis (i.e., 0 = doesn't have disorder, 1 = has disorder). I would like to incorporate the unreliability of the diagnosis (e.g., interrater reliability) into my model. How would I attempt such a thing? Thanks, Jim 


Almost forgot, let's say that my estimate of the diagnostic category's reliability is in the form of kappa (because that's probably what's available in the literature). Jim 

bmuthen posted on Wednesday, July 20, 2005  1:34 pm



That's a tricky one. You could try to work with a binary latent class variable c that has a fallible binary indicator u, as is done in "Hidden Markov" modeling (see UG ex 8.12). This means that when c = 1 as opposed to 2, u is not necessarily 1, but is 1 with probability, say, 0.8, where 0.8 reflects the amount of reliability. By fixing these probabilities (which means fixing threshold values, as seen in 8.12), you can predict from c instead of from u. I have not tried this approach, however.
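As a rough illustration of what "fixing threshold values" involves, here is a minimal Python sketch that converts a chosen conditional probability into a logit threshold. It assumes the usual logit parameterization in which P(u = 1 | c) = 1 / (1 + exp(threshold)); the function names are made up for this example, and the 0.2 probability for the other class is a hypothetical value that was not given above.

```python
import math

def threshold_from_probability(p):
    """Return the logit threshold tau such that 1 / (1 + exp(tau)) == p.

    Assumes a logit parameterization of the indicator; check this against
    the software's documentation before fixing values in a real model.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log((1.0 - p) / p)

def probability_from_threshold(tau):
    """Inverse of threshold_from_probability."""
    return 1.0 / (1.0 + math.exp(tau))

# Class with the disorder: u = 1 with probability 0.8 (value from the post).
tau_pos = threshold_from_probability(0.8)

# Class without the disorder: u = 1 with probability 0.2 (hypothetical value).
tau_neg = threshold_from_probability(0.2)

print(round(tau_pos, 3), round(tau_neg, 3))
```

Under that assumption, a probability above 0.5 corresponds to a negative threshold and a probability below 0.5 to a positive one.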


Hi Bengt, This is going to clearly show my ignorance of categorical modeling, but I was concerned that this method could potentially increase the error in the latent diagnosis variable (instead of removing the unreliability from the observed diagnosis) because cases could be sorted incorrectly. For example, if there is a conditional probability of .8 for individuals who are positive on the observed variable to be positive on the latent variable, then 20% of individuals who are positive on the observed variable will not be assigned as positive on the latent variable. However, if these 20% of individuals are chosen at random, then it is possible that many individuals who are true positives on the latent variable will be assigned to the negative latent class because the sorting is random (and this would increase error). Now, this logic doesn't feel right to me, and I would think the procedure you are recommending is less faulty than how I am describing it; I just don't know the correct logic of how the procedure would work. If you could explain how this procedure would overcome my hypothetical problem, I would greatly appreciate it.

bmuthen posted on Friday, November 04, 2005  2:03 pm



The conditional probability of 0.8 that I mentioned is the probability of being positive on the observed variable given the latent variable, not the other way around as you state it. I don't understand what you mean by the 20% chosen at random. It is true that the observed variable, even when its unreliability is acknowledged this way, will not precisely capture the latent variable. You do better if you have many observed indicators of the latent variable, as you do in repeated measures modeling such as latent class or latent transition (Hidden Markov) modeling. This is a matter of precision in estimating the latent variable "scores" (to use factor score estimation terms).
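To make the direction of the conditioning concrete, here is a minimal Python sketch that uses Bayes' rule to turn P(u | c) into P(c | u), and shows how the posterior sharpens with more indicators. The prevalence of 0.3 and the 0.2 probability of a positive indicator in the negative class are hypothetical values chosen for illustration, and the indicators are assumed to be conditionally independent given the class.

```python
def posterior_positive(prevalence, p_pos_given_pos, p_pos_given_neg, observations):
    """P(latent class is positive | observed 0/1 indicators), via Bayes' rule.

    Assumes the indicators are conditionally independent given the class.
    """
    like_pos = 1.0  # likelihood of the observations if the class is positive
    like_neg = 1.0  # likelihood of the observations if the class is negative
    for u in observations:
        like_pos *= p_pos_given_pos if u == 1 else 1.0 - p_pos_given_pos
        like_neg *= p_pos_given_neg if u == 1 else 1.0 - p_pos_given_neg
    numerator = prevalence * like_pos
    return numerator / (numerator + (1.0 - prevalence) * like_neg)

# Hypothetical inputs: prevalence 0.3, P(u=1 | positive) = 0.8,
# P(u=1 | negative) = 0.2.
one_indicator = posterior_positive(0.3, 0.8, 0.2, [1])
three_indicators = posterior_positive(0.3, 0.8, 0.2, [1, 1, 1])

print(round(one_indicator, 3))     # 0.632
print(round(three_indicators, 3))  # 0.965
```

With these made-up numbers, a single positive diagnosis leaves considerable uncertainty about the latent class even though P(u = 1 | c) is 0.8, while three agreeing indicators pin it down much more tightly.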


