Inflated Correlations

Most statistical methods were devised for data about individual cases, but they are often used with aggregated data. For example, instead of correlating students' test scores with their parents' incomes, researchers will correlate the average test score at each school with the average income of parents at that school.

As anyone who has done this type of work will tell you, you get a much higher correlation coefficient with the aggregated data than with the individual data. This happens because of a straightforward statistical phenomenon.

A test score, or any other score, consists of an effect plus error. In our example the effect is that of parents' income, and the error is randomly distributed with a mean of zero. When you aggregate the data you are in effect drawing a sample of students at each school, and you get the usual effect of sampling: you reduce the error. The positive and negative errors cancel each other, and the average error moves closer to zero. The effect does not shrink in the same way, because schools differ in the average income of their students' parents.
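The cancellation of error can be seen in a small simulation. What follows is only a sketch under invented assumptions: the number of schools, the number of students per school, the size of the income effect, and the size of the error are all made up for illustration, and income is assumed to vary between schools as well as within them. Each score is built as effect plus error, and the correlation is then computed twice, once across students and once across school averages.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_schools=200, n_students=100, effect=0.5, error_sd=2.0, seed=1):
    """Return (individual-level r, school-level r) for simulated data.

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    incomes, scores = [], []                # one entry per student
    school_incomes, school_scores = [], []  # one entry per school
    for _ in range(n_schools):
        # Assumption: schools differ in the typical income of their families.
        school_level = rng.gauss(0, 1)
        inc = [school_level + rng.gauss(0, 1) for _ in range(n_students)]
        # Score = effect of income + random error with a mean of zero.
        sc = [effect * x + rng.gauss(0, error_sd) for x in inc]
        incomes.extend(inc)
        scores.extend(sc)
        # Averaging within a school cancels most of the error.
        school_incomes.append(mean(inc))
        school_scores.append(mean(sc))
    return pearson(incomes, scores), pearson(school_incomes, school_scores)


r_individual, r_school = simulate()
print(f"individual-level r: {r_individual:.2f}")
print(f"school-level r:     {r_school:.2f}")
```

With these invented settings the individual-level correlation should come out at roughly 0.33 and the school-level correlation at roughly 0.93, although the underlying relation between income and score is identical in both analyses. Only the amount of error left in each data point has changed.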

In other words, you are getting a purer measure of the effect. That sounds desirable, and if you are interested in predicting the performance of the entire student body it is in fact desirable. If, on the other hand, you really are interested in using income to predict individual performance, it is not desirable. For example, much of both popular and professional belief about the success of poor students in school is based on the stronger correlations between income and academic achievement produced by aggregated analysis. Low income is considered to be a much greater handicap than it in fact is, and this probably tends to distract attention from other important causes of learning problems in poor children.

Inflated Correlations © 2001, John FitzGerald