Last week I mentioned that standard scores have several advantages for the decisionmaker. The first is that they literally allow you to compare apples with oranges.
For example, let's suppose that you are evaluating employees with three measures: sales, customer approval ratings, and supervisors' ratings. Sales are calculated in dollars, customer approval ratings have a maximum of 10, and supervisors' ratings have a maximum of 40. If you don't convert these ratings to standard scores, you are pretty well restricted to combining these measures by comparing each employee's rank on them. That is too crude a comparison, because ranks vary in significance according to how close an employee's performance is to the average.
In a normal distribution (which is the type of distribution you're usually dealing with) differences in ranks near the average represent less of a difference in ability than differences at the extremes of the distribution. In a group of 100 people, for example, the difference between the first and the fifth-ranked people is usually greater than the difference between the fifty-first and fifty-fifth ranked.
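To see why equal gaps in rank do not mean equal gaps in performance, here is a minimal sketch that estimates the z-score you would expect at a given rank in a normally distributed group of 100. The helper name `expected_z` and the rule for turning a rank into a percentile are assumptions made for illustration, not part of the original discussion.

```python
from statistics import NormalDist

def expected_z(rank, n):
    """Approximate z-score for a rank (1 = best) in a group of n.

    Assumes a normal distribution and places each rank at the
    percentile (n - rank + 0.5) / n, which is one common convention.
    """
    percentile = (n - rank + 0.5) / n
    return NormalDist().inv_cdf(percentile)

# Gap between the 1st and 5th ranked people, in standard deviations
top_gap = expected_z(1, 100) - expected_z(5, 100)

# Gap between the 51st and 55th ranked people, in standard deviations
middle_gap = expected_z(51, 100) - expected_z(55, 100)

print(f"1st vs 5th:   {top_gap:.2f} standard deviations")
print(f"51st vs 55th: {middle_gap:.2f} standard deviations")
```

Under these assumptions the gap at the top of the group comes out several times larger than the gap in the middle, even though both pairs are four ranks apart.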
The fairest way to combine these measures involves converting them to standard scores before combining them. Let's suppose the mean score on the customer rating was 6, and that the standard deviation was 1.5 (you can calculate these figures easily with spreadsheet or database software; statistical software will calculate the standard scores themselves). To convert a customer rating to a standard score, you first subtract the mean score from it. So if an employee's customer rating is 9 you subtract 6 from it to get a difference of 3. You then divide this difference by the standard deviation, which is 1.5. Three divided by 1.5 is 2, which is the standard score, also known as the z-score. A z-score of 2 simply means that the employee finished two standard deviations above the mean. A z-score of –2 means that he or she finished two standard deviations below the mean, and so on.
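The arithmetic above can be sketched in a few lines. The function name `z_score` is a hypothetical label chosen for this example; the mean of 6 and standard deviation of 1.5 are the figures from the customer rating example.

```python
def z_score(raw, mean, sd):
    """Convert a raw score to a standard score (z-score).

    Subtract the mean, then divide by the standard deviation.
    """
    return (raw - mean) / sd

# The customer rating example: rating 9, mean 6, standard deviation 1.5
above = z_score(9, 6, 1.5)

# A rating of 3 falls the same distance below the mean
below = z_score(3, 6, 1.5)

print(above)  # 2.0
print(below)  # -2.0
```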
After converting these measures to z-scores you then add them up and you have a fair rating for each employee. The rating assumes that all three measures are of equal importance. People often weight different parts of a rating system (that is, assign different numbers of points to each section, with the more important sections getting the most points), but I advise you not to do this. The problem is that the ratings tend to reflect the higher-weighted sections but not the lower-weighted ones. This is explained in more detail in a post on this site's blog. If you want different criteria to have different degrees of importance, you can apply the criteria in stages: for example, you could select employees who score well on the most important section and then rate them by their performance on less important sections. We'll have a further look at accidental weighting next week and get a broader view of how standardization can eliminate it.
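Here is a rough sketch of the whole procedure: standardize each measure, then add the z-scores for each employee. All of the figures below are invented for illustration, the names `standardize` and `composite` are hypothetical, and using the population standard deviation (`pstdev`) rather than the sample version is an assumption.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return the z-score of every value in the list."""
    mean = fmean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / sd for v in values]

# Made-up scores for four employees, listed in the same order in each measure
sales = [50000, 62000, 48000, 75000]  # dollars
customer = [6, 9, 5, 7]               # maximum of 10
supervisor = [30, 28, 35, 25]         # maximum of 40

# Each measure is put on the same scale before it is combined
z_sales = standardize(sales)
z_customer = standardize(customer)
z_supervisor = standardize(supervisor)

# Equal importance: simply add the three z-scores for each employee
composite = [sum(zs) for zs in zip(z_sales, z_customer, z_supervisor)]

for number, score in enumerate(composite, start=1):
    print(f"Employee {number}: {score:+.2f}")
```

Because every measure now has a mean of 0 and a standard deviation of 1, the dollar figures no longer swamp the 10-point and 40-point ratings when the three are added.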
For an example of how transformation to standard scores can make data comparable, click here.
How to Compare Apples and Oranges © 1999, John FitzGerald