Beware of Big Samples

Last week we looked at some of the benefits of small samples; this week we'll look at some of the drawbacks of large ones. One that is often mentioned is the increased chance of non-sampling error. That simply means that as the sample size increases, so does the chance that someone will make mistakes with the data, while the chance of detecting those mistakes shrinks.

That may sound like a petty objection, but it is a common problem with big data sets. For example, it is not uncommon for databases to contain data whose accuracy cannot be determined, simply because the resources needed to check them are not available. Relying on reports of these data requires a leap of faith.

Another common problem is the tremendous statistical power of big samples. That is, with big samples you almost always find statistically significant relationships and differences, even when they are very small.

For example, let's suppose that you want to compare the percentages of men and women in samples from two different groups. Suppose each group has 100 members, group A is composed of equal numbers of men and women, and the criterion for significance is that the probability of the difference happening by chance must be less than .01. Then one sex must constitute at least 68% of group B for you to conclude that there is a significant difference.

If there are 1,000 people in each group, and group A again has equal numbers of men and women, then group B need only be about 56% men or women for you to conclude there's a difference. That difference may still be useful to you, but let's look at what happens when each group has 10,000 members. Now group B need only be about 52% men or women for the statistical test to declare the difference statistically significant.
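The thresholds above can be checked with a short calculation. The sketch below assumes a two-sided, pooled two-proportion z-test; the article does not say which test was used, so that choice is an assumption, as is the function name min_significant_count. It searches for the smallest majority in group B that differs significantly from an evenly split group A of the same size.

```python
from math import sqrt
from statistics import NormalDist


def min_significant_count(n, alpha=0.01):
    """Smallest count k (out of n) of one sex in group B that differs
    significantly from a 50/50 group A of the same size n.

    Assumption: two-sided pooled two-proportion z-test (not stated in the article).
    Returns None if no count reaches significance.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    p_a = 0.5                                     # group A is evenly split
    for k in range(n // 2 + 1, n + 1):
        p_b = k / n
        pooled = (p_a + p_b) / 2                  # equal group sizes
        se = sqrt(pooled * (1 - pooled) * 2 / n)  # standard error of the difference
        if (p_b - p_a) / se > z_crit:
            return k
    return None


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        k = min_significant_count(n)
        print(f"n = {n:>6}: group B needs about {100 * k / n:.1f}% of one sex")
```

Under those assumptions the thresholds come out at 68% for groups of 100, and at roughly 56% and 52% for groups of 1,000 and 10,000, in line with the figures quoted above. A different test (for example, one with a continuity correction) would shift them slightly.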

As last week's article implied, the important issue in research is not whether a difference or relationship is statistically significant but whether it has any practical value. For example, if you were comparing the numbers of men and women in order to find a group to aim your male-oriented advertising at, you probably wouldn't be interested in choosing group B over group A just because 52% of its members are men, regardless of what the statistical test says. You want to find a group with a lot of men. The best approach is to choose a sample size that gives you a reasonable chance of detecting a difference of the size you're interested in, and no larger.
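One way to put that advice into practice is a conventional power calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions with equal group sizes. The function name sample_size_per_group, the 80% power target and the 70% figure in the example are illustrative assumptions, not values from this article.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_a, p_b, alpha=0.01, power=0.80):
    """Approximate number of people needed in each group to detect a
    difference between proportions p_a and p_b.

    Assumptions: two-sided z-test, equal group sizes, normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance criterion
    z_beta = NormalDist().inv_cdf(power)           # desired chance of detection
    pooled = (p_a + p_b) / 2
    numerator = (
        z_alpha * sqrt(2 * pooled * (1 - pooled))
        + z_beta * sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))
    ) ** 2
    return ceil(numerator / (p_a - p_b) ** 2)


if __name__ == "__main__":
    # Hypothetical target: you only care about groups that are at least 70% men.
    print(sample_size_per_group(0.50, 0.70))
    # A 52% group, by contrast, takes a far larger sample to detect.
    print(sample_size_per_group(0.50, 0.52))
```

The point of the comparison is the contrast between the two results: a difference big enough to matter can be detected with a modest sample, while a sample large enough to detect a 52% split reliably will also flag differences too small to act on.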

Beware of Big Samples (c) 1999, John FitzGerald