Modeling unreliability of a categoric...
 Jim Prisciandaro posted on Tuesday, July 19, 2005 - 8:35 pm
Hello,

I am interested in testing the predictive ability of a categorical diagnosis (i.e., 0 = doesn't have disorder, 1 = has disorder). I would like to incorporate the unreliability of the diagnosis (e.g., inter-rater reliability) into my model. How would I attempt such a thing?

Thanks,
Jim
 Jim Prisciandaro posted on Tuesday, July 19, 2005 - 9:38 pm
Almost forgot, let's say that my estimate of the diagnostic category's reliability is in the form of kappa (because that's probably what's available in the literature).

Jim
 bmuthen posted on Wednesday, July 20, 2005 - 7:34 am
That's a tricky one. You could try to work with a binary latent class variable c that has a fallible binary indicator u, as is done in "Hidden Markov" modeling (see UG ex 8.12). This means that when c = 1 as opposed to 2, u is not necessarily 1, but is 1 with probability, say, 0.8, where 0.8 reflects the amount of reliability. By fixing these probabilities (which means fixing threshold values as seen in 8.12), you can predict from c instead of from u. I have not tried this approach, however.
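To make the "fixing threshold values" step concrete, here is a minimal sketch of converting a conditional probability into a logit threshold. It assumes the parameterization P(u = 1 | c) = 1 / (1 + exp(tau)), so that tau = -logit(p); check this against the output of your own run before relying on it. The function names are hypothetical, and the value 0.2 for the second class is an illustrative assumption (only 0.8 is mentioned above).

```python
import math


def threshold_from_probability(p):
    """Return tau such that P(u = 1 | c) = 1 / (1 + exp(tau)).

    Assumed parameterization: the threshold is the negative logit of p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return -math.log(p / (1.0 - p))


def probability_from_threshold(tau):
    """Inverse mapping: recover P(u = 1 | c) from a threshold."""
    return 1.0 / (1.0 + math.exp(tau))


# P(u = 1 | c = 1) = 0.8, as in the post above.
tau_class1 = threshold_from_probability(0.8)
# P(u = 1 | c = 2) = 0.2 is an assumed value, for illustration only.
tau_class2 = threshold_from_probability(0.2)

print(round(tau_class1, 3), round(tau_class2, 3))
```

Under these assumptions the two thresholds are about -1.386 and 1.386, and these are the values one would fix rather than estimate.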
 Jim Prisciandaro posted on Friday, November 04, 2005 - 6:05 am
Hi Bengt,

This is going to clearly show my ignorance of categorical modeling, but I was concerned that this method could increase the error in the latent diagnosis variable (instead of removing the unreliability from the observed diagnosis) because cases could be sorted incorrectly. For example, if there is a conditional probability of .8 for individuals who are positive on the observed variable to be positive on the latent variable, then 20% of individuals who are positive on the observed variable will not be assigned to be positive on the latent variable. However, if these 20% of individuals are chosen at random, then it is possible that many individuals who are true positives on the latent variable will be assigned to be negative on the latent class because the sorting is random (and this would increase error).

Now, this logic doesn't feel right to me, and I would think the procedure that you are recommending would be less faulty than how I am describing it; I just don't know the correct logic of how the procedure would work. If you could explain how this procedure would overcome my hypothetical problem, I would greatly appreciate it.
 bmuthen posted on Friday, November 04, 2005 - 8:03 am
The conditional probability of 0.8 that I mentioned is the probability of being positive on the observed variable given the latent, not the other way around as you state it. I don't understand what you mean by the 20% chosen at random. It is true that the observed variable, even when acknowledging its unreliability this way, will not precisely capture the latent variable. You do better if you have many observed indicators of the latent variable - as you do in repeated measures modeling such as latent class or latent transition (Hidden Markov) modeling. This is a matter of precision in estimating the latent variable "scores" (to use factor score estimation terms).
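The distinction between P(u | c) and P(c | u), and the gain from having several indicators, can be illustrated with Bayes' rule. This is only a sketch under stated assumptions: the prevalence of 0.3, the false-positive probability of 0.2, and the conditional independence of the indicators given c are all hypothetical choices made for illustration, and the function name is made up.

```python
def posterior_positive(prior, sens, false_pos, observations):
    """P(c = positive | u_1, ..., u_k) by Bayes' rule.

    Assumes the binary indicators are conditionally independent given c.
    prior     : P(c = positive)
    sens      : P(u = 1 | c = positive)
    false_pos : P(u = 1 | c = negative)
    """
    joint_pos = prior
    joint_neg = 1.0 - prior
    for u in observations:
        joint_pos *= sens if u == 1 else (1.0 - sens)
        joint_neg *= false_pos if u == 1 else (1.0 - false_pos)
    return joint_pos / (joint_pos + joint_neg)


PRIOR = 0.3      # assumed prevalence of the latent positive class
SENS = 0.8       # P(u = 1 | c = positive), the 0.8 discussed above
FALSE_POS = 0.2  # assumed P(u = 1 | c = negative)

one_indicator = posterior_positive(PRIOR, SENS, FALSE_POS, [1])
three_indicators = posterior_positive(PRIOR, SENS, FALSE_POS, [1, 1, 1])

print(round(one_indicator, 3), round(three_indicators, 3))
```

With these assumed numbers, a single positive indicator gives a posterior of about 0.63 (not 0.8), while three positive indicators give about 0.96. No individuals are reassigned at random; each person gets a posterior class probability, and it becomes more precise as indicators are added.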