Last week we looked at a couple of less common ways of transforming data to make them more amenable to analysis. Transformations like these often help you find relationships that had been obscured before.

A couple of more common data transformations are percentiles (described in the article on testing) and standard scores (defined in the article about comparing apples and oranges). Data are transformed to percentiles or standard scores to make them comparable, and of course when you make data easier to compare you're likely to notice things you would have missed before.

For an example we'll look at a couple of stock market indices. If you draw a graph of the closing prices of the Dow Jones Industrial Average and the Nasdaq Composite index over the first quarter of 2000, it will look like the graph you can reach by clicking here. That graph isn't very informative, since the values of the two indices are so different that the graph cannot adequately represent the variation in both. However, if you transform the two sets of data to standard scores (also known as z-scores), you get a graph like the one you can see by clicking here.
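As a rough sketch of the transformation, a standard score is a value's distance from the mean of its series, measured in standard deviations. The function name `standard_scores` and the two short series below are assumptions made up for illustration; they are not the actual closing prices. The sketch uses the population standard deviation, and the sample version would serve just as well for graphing.

```python
from statistics import mean, pstdev

def standard_scores(values):
    """Return z-scores: each value's distance from the mean, in standard deviations."""
    m = mean(values)
    sd = pstdev(values)  # population SD; statistics.stdev gives the sample version
    if sd == 0:
        raise ValueError("standard scores are undefined when all values are equal")
    return [(v - m) / sd for v in values]

# Made-up closing values on very different scales (illustration only, not real index data).
large_index = [11000.0, 10800.0, 10500.0, 10900.0, 11100.0]
small_index = [4100.0, 4400.0, 4700.0, 4500.0, 4300.0]

large_z = standard_scores(large_index)
small_z = standard_scores(small_index)

# Both transformed series now have mean 0 and standard deviation 1,
# so they can share one vertical axis.
for day, (a, b) in enumerate(zip(large_z, small_z), start=1):
    print(f"day {day}: {a:+.2f} {b:+.2f}")
```

Because both series end up on the same scale, the variation in each is visible when they are plotted together.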

This second graph is considerably more informative than the first. For one thing, it now seems that the two indices are negatively correlated: when one index goes up, the other tends to go down. The correlation coefficient turns out to be -.68, which is moderately negative; squaring it shows that each index accounts for 46% of the variance in the other. For another thing, we see that the current values of both the Nasdaq and the Dow Jones Industrial are close to their averages for the quarter (a set of standard scores always has a mean of zero, so a score near zero is near the average).
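The arithmetic behind those two figures can be sketched as follows. The function name `pearson_r` and the small test series are assumptions for illustration only; the one number taken from the article is the coefficient of -.68.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)

# Made-up series that move in exactly opposite directions (illustration only).
perfectly_opposed = pearson_r([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])

# The coefficient reported in the article, and the share of variance it implies.
r = -0.68
shared_variance = r ** 2  # 0.4624, which rounds to 46%
print(f"r = {r}, r squared = {shared_variance:.2f}")
```

The coefficient comes out the same whether it is computed from the raw values or from the standard scores, since standardizing only shifts and rescales each series.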

Neither of these facts could have been inferred from the graph of untransformed scores. Uncovering facts like these is what analysis is about, rather than, as many seem to believe, the compiling of ream upon ream of irrelevant numbers.

For an explanation of why the correlation was only moderately negative, click here.

Making the Data Talk © 2000, John FitzGerald