Research, evaluation, analysis

Scaling
The praises of scaling have often been sung on this site, but I haven't provided much of a description of it. The basic idea of scaling is that different measures may actually measure the same thing. For example, you don't expect to be taller in centimetres than you are in feet and inches, nor do you expect the distance between Toronto and Montreal to be different if you travel it in kilometres than if you travel it in miles.

This identity of measures often occurs in educational testing, and usually you want it to occur. For example, if you're writing a test in mathematics, you want every item on the test to assess mathematical ability. That's not as easy a result to arrange as you might think. Many things can go wrong.

For example, items may be too difficult or too easy. An item that no one gets right is obviously not measuring mathematical ability. Similarly, an item that everyone gets right is of no use in distinguishing anyone's mathematical ability from anyone else's, so it too is not a measure of mathematical ability.
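As a rough illustration of this check, the sketch below computes the proportion of people who answered each item correctly and flags the items that nobody or everybody got right. The function names, the cut-offs and the response data are all hypothetical, invented for the example.

```python
def item_difficulty(responses):
    """Proportion of test-takers answering each item correctly.

    `responses` holds one row per person; each row is a list of
    0/1 item scores (1 = correct, 0 = incorrect).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_people for i in range(n_items)]


def flag_uninformative(difficulties, low=0.0, high=1.0):
    """Indices of items answered correctly by nobody (<= low) or everybody (>= high)."""
    return [i for i, p in enumerate(difficulties) if p <= low or p >= high]


# Hypothetical data: 5 people, 4 items
responses = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 1, 1],
]

difficulties = item_difficulty(responses)
flagged = flag_uninformative(difficulties)
print(difficulties)  # [1.0, 0.0, 0.6, 0.6]
print(flagged)       # [0, 1]
```

In practice you would probably tighten the cut-offs (say, 0.1 and 0.9), since an item that almost everyone gets right distinguishes very few people from each other.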

One scaling technique which deals with this problem is an extension of reliability analysis, which is discussed in the article on testing. Each item is correlated with the score over all the other items (item-whole correlation), and a reliability coefficient is calculated which estimates how well the two halves of the test agree (split-half analysis). If the reliability coefficient is too low, the item-whole correlations will often point out the items which are causing the problem. The best solution is to rewrite the bad items.
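The two calculations described above can be sketched in a few lines. This is only a minimal illustration, with some assumptions of my own: the halves are the odd-numbered and even-numbered items, the half-test correlation is stepped up with the Spearman-Brown formula (a common convention, but not the only way to get a split-half coefficient), and the scores are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_rest_correlations(scores):
    """Correlate each item with the total score over all the other items."""
    n_items = len(scores[0])
    result = []
    for i in range(n_items):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]
        result.append(pearson(item, rest))
    return result


def split_half_reliability(scores):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    first_half = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    r = pearson(first_half, second_half)
    return 2 * r / (1 + r)


# Hypothetical scores: 6 people, 4 items; the last item is deliberately out of step
scores = [
    [1, 2, 1, 4],
    [2, 2, 3, 1],
    [3, 3, 3, 5],
    [4, 4, 3, 2],
    [5, 4, 5, 3],
    [5, 5, 4, 1],
]

item_rest = item_rest_correlations(scores)
reliability = split_half_reliability(scores)
print([round(r, 2) for r in item_rest])
print(round(reliability, 2))
```

With these made-up scores the first three items correlate positively with the rest of the test and the fourth correlates negatively, which is the kind of pattern that singles out an item for rewriting.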

Often you will want a test to contain homogeneous scales which nevertheless differ from each other. For example, a personality test may give ratings on several personality traits. For the separate ratings to be useful, the ratings on the items measuring each trait have to be unrelated to the ratings on the items measuring every other trait. One way to see if that is true is to use principal components analysis or factor analysis. These correlational techniques will tell you quickly whether the scales are independent. Often they are not, which would mean in the example that the separate trait scores were meaningless. People would tend to get the same ratings on every supposed trait, so knowing one trait score would tell you nothing that the others did not.
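A full principal components or factor analysis is a job for a statistical package, but the central idea can be sketched by hand. The example below is a simplification under stated assumptions: it looks only at the first principal component of the item correlation matrix, finds its eigenvalue by power iteration, and reports the share of total variance it accounts for. The ratings are hypothetical, with the first two items supposedly measuring one trait and the last two another.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(scores):
    """Item-by-item correlation matrix; rows of `scores` are people."""
    columns = list(zip(*scores))
    return [[pearson(a, b) for b in columns] for a in columns]


def largest_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a correlation matrix by power iteration.

    The uneven starting vector makes it unlikely, though not impossible,
    that the start is orthogonal to the leading eigenvector.
    """
    n = len(matrix)
    v = [1.0 + i for i in range(n)]
    norm = sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return eigenvalue


# Hypothetical ratings: items 0-1 are meant to measure trait A, items 2-3 trait B
scores = [
    [1, 2, 1, 1],
    [2, 2, 2, 3],
    [3, 3, 3, 3],
    [4, 4, 3, 4],
    [5, 4, 5, 4],
    [5, 5, 4, 5],
]

corr = correlation_matrix(scores)
# The eigenvalues of a correlation matrix sum to the number of items
share = largest_eigenvalue(corr) / len(corr)
print(round(share, 2))
```

If the two traits really were independent, the first component could account for only about half the variance of these four items. Here it accounts for nearly all of it, which suggests the items are all measuring one thing and the two trait scores are redundant.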

Scaling can be used on many types of data. It should usually be used to construct indices, for example, although it rarely is. I have used it on formulas for determining need for capital expenditure and need for additional budget.

Scaling is especially useful in opinion surveying, but again it is rarely used. A properly designed questionnaire can easily be scaled to determine what general attitudes, if any, underlie the responses to individual items. Another advantage of scaling in opinion surveying is that it helps reduce the effect of the error rate problem: the more significance tests you perform, the more likely it is that at least one of them will turn up a difference which is due to chance alone. Once you have identified items which are measuring the same attitude, you can calculate a single score for the whole group of items. Instead of making multiple significance tests (to compare regions, for example), you make one. The probability of detecting a spurious difference does not become dangerously inflated as it does when you perform multiple tests. And because you're making fewer tests and testing general conceptions, the report is a whole lot easier to understand.
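To give a sense of the size of the error rate problem, the sketch below computes the probability of at least one spurious significant result across a set of tests. It assumes the tests are independent, which tests on related questionnaire items usually are not, so treat the figures as a rough guide. The function names and the ratings are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive in n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def scale_score(item_ratings):
    """Single score for a group of items measuring the same attitude (their mean)."""
    return sum(item_ratings) / len(item_ratings)


# Ten separate item-by-item tests at the .05 level versus one test on the scale score
print(round(familywise_error_rate(0.05, 10), 3))  # 0.401
print(round(familywise_error_rate(0.05, 1), 3))   # 0.05

# Hypothetical respondent's ratings on three items that scale together
print(scale_score([4, 5, 3]))  # 4.0
```

Ten tests at the .05 level carry roughly a 40 per cent chance of at least one spurious difference, while a single test on the scale score keeps that chance at 5 per cent.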

[Note: Some people would argue that I am talking about indexes here rather than scales, which in some branches of psychometrics are a sub-class of indexes. However, the term index invites confusion with measures such as the Consumer Price Index, so I have used scale for all measures of the type described, which usually function well as measures of quantity.]
Scaling © 1999, John FitzGerald