The Logic of Second Opinions
In research, a judgment is called objective when everyone agrees on it. A multiple-choice test, for example, is called objective because anyone who applies the scoring key correctly will assign exactly the same scores as anyone else who does so.
Many judgments, however, fall somewhere between perfectly objective and entirely subjective. One way to find out just how objective they are is to get a second opinion.
Many years ago I supervised an extensive program of classroom observation for a study of the effects of class size. To ensure that the observers were classifying what they saw consistently enough, they were occasionally assigned to observe in pairs. One of their routines was to watch each of a number of children for a few minutes and, at fifteen-second intervals, record whether or not that child was "on task" (educationese for doing his or her work).
So if two observers watch the same children at the same time, how often should their ratings agree? Eighty per cent of the time? Ninety per cent? As is usual in research and statistics, the answer is that it depends.
What it depends on is the likelihood of agreeing by accident. For example, if each of two observers classifies children as on task 90% of the time, you'd expect the two of them to agree 82% of the time just by accident. Why? Even if their judgments are entirely unrelated, you'd still expect the second observer to rate a child on task on 90% of the occasions on which the first observer rated that child on task, and off task on 10% of the occasions on which the first observer rated that child off task. Since the first observer rates children on task on 90% of occasions and off task on the other 10%, chance agreement is 90% of 90% plus 10% of 10%, or 81% + 1% = 82%.
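The same arithmetic works when the two observers have different on-task rates. Here is a minimal Python sketch, assuming the two observers' judgments are independent of each other; `chance_agreement` is a hypothetical helper name, not part of any library.

```python
def chance_agreement(p1: float, p2: float) -> float:
    """Expected agreement between two independent yes/no raters.

    p1, p2: proportion of occasions on which each observer rates
    a child "on task" (an assumption: ratings are independent).
    """
    both_on = p1 * p2               # both say "on task" by coincidence
    both_off = (1 - p1) * (1 - p2)  # both say "off task" by coincidence
    return both_on + both_off


print(round(chance_agreement(0.90, 0.90), 3))  # 0.82
print(round(chance_agreement(0.95, 0.95), 4))  # 0.905
```

The second call gives the 90.5% chance agreement for observers who each rate children on task 95% of the time (90.25% + 0.25%).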
Given that expected rate of agreement, if the observers made 100 observations, they would have to agree on at least 87% of them before you could conclude, by the lowest commonly used standard of statistical significance, that their agreement was more than accidental. The standard I prefer would require agreement on at least 91% of the observations.
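The text doesn't say which test or significance levels produced the 87% and 91% figures. The sketch below assumes a one-tailed normal approximation to the binomial, with α = 0.10 standing in for the lowest commonly used standard and α = 0.01 for the stricter one. Under those assumptions it reproduces both figures. `min_agreements` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def min_agreements(p_on: float, n: int, alpha: float) -> int:
    """Smallest number of agreements out of n that beats chance.

    Assumes both observers rate "on task" with the same proportion p_on
    and uses a one-tailed normal approximation to the binomial.
    """
    pe = p_on ** 2 + (1 - p_on) ** 2     # chance agreement rate
    mean = n * pe                        # agreements expected by chance
    sd = math.sqrt(n * pe * (1 - pe))    # binomial standard deviation
    z = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value
    cutoff = mean + z * sd
    return math.floor(cutoff) + 1        # first whole count above the cutoff


# Assumed significance levels: 0.10 (lower standard), 0.01 (stricter one)
for alpha in (0.10, 0.01):
    print(alpha, min_agreements(0.90, 100, alpha))
# 0.1 87
# 0.01 91
```

With `p_on=0.95` the same two levels give 95 and 98, matching the figures for observers who each rate children on task 95% of the time.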
If each observer rated children on task 95% of the time, you would expect them to agree 90.5% of the time by accident alone. To do significantly better than that, they would have to agree at least 95% of the time by the lower standard and at least 98% of the time by the higher one. Many people who do observational research seem unaware of this kind of analysis, so their reports of 90% agreement often aren't informative. At least, though, they calculate the agreement between the two sets of judgments. In many fields, the objectivity of second opinions isn't assessed at all.
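To check a reported agreement figure directly, you can ask how likely that much agreement would be by chance. This sketch swaps in an exact one-tailed binomial test rather than a normal approximation (an assumption, not necessarily the method behind the figures above) and treats the 100 paired ratings as independent trials. `agreement_p_value` is a hypothetical helper name.

```python
import math


def agreement_p_value(agreements: int, n: int, p1: float, p2: float) -> float:
    """Exact one-tailed binomial p-value for agreement beyond chance.

    Probability of at least `agreements` matches in n paired ratings if
    the two observers' judgments were independent (an assumption).
    """
    pe = p1 * p2 + (1 - p1) * (1 - p2)  # chance agreement rate
    return sum(
        math.comb(n, k) * pe ** k * (1 - pe) ** (n - k)
        for k in range(agreements, n + 1)
    )


# A reported 90% agreement (90 of 100) when each observer says "on task" 95% of the time
p = agreement_p_value(90, 100, 0.95, 0.95)
print(f"p = {p:.2f}")
```

Here 90 agreements is slightly below the 90.5 expected by chance, so the p-value is large and the reported agreement says nothing about objectivity. The same call with 98 agreements gives a p-value well under 0.05.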