Last week I mentioned that ratings often are weighted accidentally. For example, suppose that a professor gives two tests in a course, each of which is to count for 50% of the final mark. The first test has a mean of 65% and a standard deviation of 8%, while the second has a mean of 65% and a standard deviation of 16%. The problem with these statistics is that two students can do equally well but end up with different final marks.
The first student finishes one standard deviation above the mean on the first test and right at the mean on the second. That is, her marks are 73 and 65, and her final mark is (73 + 65)/2, or 69. The second student finishes at the mean on the first test and one standard deviation above the mean on the second. That is, her marks are 65 and 81, and her final mark is (65 + 81)/2, or 73. So, even though each student finished at the mean on one test and one standard deviation above the mean on the other, one ended up with a higher mark than the other.
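The arithmetic above can be checked with a short sketch (the means, standard deviations, and marks are the article's own numbers):

```python
# Two tests, each meant to count for 50% of the final mark.
test1 = {"mean": 65, "sd": 8}
test2 = {"mean": 65, "sd": 16}

# Student A: one SD above the mean on test 1, at the mean on test 2.
student_a = (test1["mean"] + test1["sd"], test2["mean"])   # (73, 65)
# Student B: at the mean on test 1, one SD above the mean on test 2.
student_b = (test1["mean"], test2["mean"] + test2["sd"])   # (65, 81)

final_a = sum(student_a) / 2   # 69.0
final_b = sum(student_b) / 2   # 73.0
# Identical relative performance, yet different final marks:
# the more variable second test has silently been given more weight.
```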
Whenever two variables of differing variability are combined, the more variable one will have the greater effect on the result. Standardization of the ratings, as described last week, will remove this accidental weighting. In the example, each student would have a z-score of 0 (zero) on one test and +1 on the other. Each therefore would have an average z-score of 0.5 across the two tests. The professor might decide that the tests should have a mean of 65 and a standard deviation of 12, so she'd assign each student a mark of 65 + (0.5 × 12) = 65 + 6 = 71. Of course, the professor is assuming, or perhaps even knows, that results on the two tests are correlated and therefore measuring the same thing. If they aren't, a further analysis of the two tests is desirable.
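The standardize-then-rescale procedure can be sketched as follows (the function names are illustrative; the statistics are from the example):

```python
def standardize(mark, mean, sd):
    """Convert a raw mark to a z-score."""
    return (mark - mean) / sd

def rescale(z, new_mean, new_sd):
    """Map an average z-score onto a chosen mark scale."""
    return new_mean + z * new_sd

# Average z-scores for the two students in the example.
z_a = (standardize(73, 65, 8) + standardize(65, 65, 16)) / 2   # (1 + 0)/2 = 0.5
z_b = (standardize(65, 65, 8) + standardize(81, 65, 16)) / 2   # (0 + 1)/2 = 0.5

# Professor's chosen scale: mean 65, SD 12 — both students now get 71.
mark_a = rescale(z_a, 65, 12)   # 71.0
mark_b = rescale(z_b, 65, 12)   # 71.0
```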
The marks are now fairer, which, in psychometric terms, means they are more reliable. In other words, they're simply more accurate measures of ability in the course (reliability is discussed in more detail in the article on testing).
Of course, this problem becomes more important the more items you combine. The greater the disparity in variation, the less reliable the ratings will be, and the more helpful, and necessary, standardization becomes.
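The same idea extends to any number of items: standardize each column of scores before averaging, so that no item is accidentally weighted by its spread. A minimal sketch, using invented illustrative numbers (not data from the article):

```python
import statistics

# Hypothetical ratings: each row is one person, each column one item.
# The columns deliberately have very different spreads.
ratings = [
    [70, 65, 90],
    [60, 65, 50],
    [68, 81, 70],
    [62, 49, 30],
]

def standardize_columns(rows):
    """Replace each column's raw scores with z-scores, so every
    item carries equal weight when the columns are averaged."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [
        [(x - m) / s for x, m, s in zip(row, means, sds)]
        for row in rows
    ]

standardized = standardize_columns(ratings)
# After standardization every column has mean 0 and SD 1,
# so a simple average of each row gives an unweighted composite.
composites = [statistics.mean(row) for row in standardized]
```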
Next week we'll look at some fine points of standardization.
Eliminating Accidental Weighting © 1999, John FitzGerald