One of the chief reasons society has concentrated on information management rather than information mastery is that most people haven't been trained in the simple skills and concepts which give us mastery over information. One of these concepts is statistical significance. It is truly a simple concept, but most people have not been taught about it. It is also one of the most important concepts in data analysis, so the general lack of acquaintance with it is hampering our ability to get useful information out of all this electronic technology we've surrounded ourselves with. So here I'm going to take a stab at providing a simple explanation in everyday language. If the stab misses, please let me know of any suggestions you have for getting it nearer the mark.

I should note first, however, that some researchers prefer alternatives to significance testing. If you understand significance testing, though, you'll probably also understand their approaches when you run across them.

Statistical significance is founded on estimating how likely an event is. For example, if you tossed a coin 100 times and got 51 heads and 49 tails, you wouldn't be likely to think the coin was crooked. You expect some variation from 50%, after all; in data analysis this variation is called random error. If you got 55 heads, though, you might wonder about your coin. A statistical test would work out the probability of getting 55 heads with an honest coin, and you would then decide if the probability was low enough to worry about.

How do you work out that probability? If you had only tossed the coin twice, the procedure would be pretty obvious. Okay, it would be obvious if most people were trained in the mathematics necessary to work it out, but most aren't. The principle is simple, though. If you toss a fair coin twice, the probability of throwing a head on the first toss is 50%. After the 50% of occasions on which you throw a head, the probability of getting another head on the second toss will again be 50%. So you expect a second head on half of the 50% of second tosses which follow a head on the first toss. The probability of two consecutive heads is therefore 50% of 50%, or 25%.

The probability of a combination of independent events (events, like coin tosses, whose outcomes don't affect each other) is always the product of the probabilities of the individual events. For two tosses of the coin, then, you end up with a 25% probability of two heads, a 25% probability of two tails, and consequently a 50% probability of getting one head and one tail.
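The two-toss arithmetic can be checked by listing every possible outcome. This is just a sketch of the reasoning above; the function name is my own, and it assumes the tosses are independent.

```python
from fractions import Fraction
from itertools import product

def two_toss_probabilities(p_heads=Fraction(1, 2)):
    """Probability of 0, 1 and 2 heads in two independent tosses."""
    probs = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
    # Walk through the four possible outcomes: HH, HT, TH, TT.
    for first, second in product("HT", repeat=2):
        p_first = p_heads if first == "H" else 1 - p_heads
        p_second = p_heads if second == "H" else 1 - p_heads
        heads = (first == "H") + (second == "H")
        # Independent events: multiply the individual probabilities.
        probs[heads] += p_first * p_second
    return probs

for heads, prob in two_toss_probabilities().items():
    print(heads, "heads:", prob)
```

One head and one tail comes out at 50% because two of the four outcomes (head then tail, tail then head) produce it.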

But what about 100 tosses? Mathematicians have devised a number of formulas for estimating the probabilities of observing different numbers of events over large numbers of trials. You could estimate the probability of getting 55 heads in 100 tosses of the coin with a formula called the chi-square test, or with a formula for estimating what is called the standard error of a proportion (one of the reasons statistics is often seen as daunting is that statisticians have given their elegant ideas some quite inelegant names).

If you work out the standard error of the proportion, you can then consult a table which will tell you that the probability of getting 55 or more heads in 100 tosses of the coin is about 16%. So what do you do with that information?
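Here is a minimal sketch of that calculation, with the table lookup replaced by the standard normal curve. The function names are mine. I've also included the exact binomial sum for comparison, because the standard-error method is an approximation.

```python
import math

def upper_tail_normal(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def approx_prob_at_least(heads, tosses, p=0.5):
    """Normal approximation based on the standard error of a proportion."""
    standard_error = math.sqrt(p * (1 - p) / tosses)
    z = (heads / tosses - p) / standard_error
    return upper_tail_normal(z)

def exact_prob_at_least(heads, tosses, p=0.5):
    """Exact binomial probability of `heads` or more heads."""
    return sum(
        math.comb(tosses, k) * p**k * (1 - p) ** (tosses - k)
        for k in range(heads, tosses + 1)
    )

print(round(approx_prob_at_least(55, 100), 3))  # about 0.159
print(round(exact_prob_at_least(55, 100), 3))   # about 0.184
```

The standard error here is 5 percentage points, so 55% heads sits exactly one standard error above 50%. The exact sum is a little higher than the approximation because the approximation used here makes no continuity correction.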

Researchers usually demand that the probability of a result be lower than a criterion probability before they conclude that the result is unlikely. Probably the most common criterion is 5%, although lower criteria are common. If your criterion is that the probability should be less than 5%, then your 55 heads just aren't unlikely enough to justify a suspicion your coin is crooked. You'd need to toss at least 60 heads or 60 tails to meet that standard. But why would you use 5%? That's a good question.
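As a rough sketch, the comparison against a criterion might look like this. I'm assuming a two-sided test (too many heads or too many tails both count) and the same normal approximation as before; the function names are hypothetical.

```python
import math

def two_sided_probability(heads, tosses, p=0.5):
    """Approximate probability of a departure from p at least this large,
    in either direction, using the standard error of a proportion."""
    standard_error = math.sqrt(p * (1 - p) / tosses)
    z = abs(heads / tosses - p) / standard_error
    # erfc(z / sqrt(2)) is the combined area of both tails beyond +/- z.
    return math.erfc(z / math.sqrt(2))

def is_unlikely(heads, tosses, criterion=0.05):
    """True when the result is rarer than the chosen criterion."""
    return two_sided_probability(heads, tosses) < criterion

for heads in (55, 59, 60):
    print(heads, round(two_sided_probability(heads, 100), 3),
          is_unlikely(heads, 100))
```

Under these assumptions 59 heads falls short of the 5% criterion and 60 heads meets it. An exact binomial calculation can put the boundary a toss or so away.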

Any such standard has to be justified functionally. That is, it should help you make effective decisions. In my experience, most research studies use such large samples and make so many comparisons that the 5% standard detects far too many unimportant or even spurious differences.

You should assess the size of the difference that different significance criteria will let your test detect. For example, as sample size increases, some very small differences become significant, if only statistically. On the other hand, if you're working with small amounts of data there are occasions on which you can justify a criterion of 10% or more.

As many people have observed, statistical significance has no inherent relationship to practical significance. When you're working with big samples, almost everything becomes statistically significant. For example, if you flip a coin 10,000 times (tip: get a grant and hire people to do this for you) and use a significance criterion of 5%, you need only about 51% heads or tails (5,098 or more out of 10,000) to consider the result unlikely.
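To see how the detectable difference shrinks as the sample grows, you can work out the smallest share of heads that meets a given criterion. This is a sketch using the normal approximation and a two-sided criterion; the function name is my own.

```python
import math
from statistics import NormalDist

def smallest_significant_share(tosses, criterion=0.05, p=0.5):
    """Smallest share of heads (above p) that a two-sided test at the
    given criterion would treat as unlikely for a fair coin."""
    # Critical z value, e.g. about 1.96 for a 5% two-sided criterion.
    z_critical = NormalDist().inv_cdf(1 - criterion / 2)
    standard_error = math.sqrt(p * (1 - p) / tosses)
    return p + z_critical * standard_error

for tosses in (100, 1_000, 10_000, 1_000_000):
    share = smallest_significant_share(tosses)
    print(f"{tosses:>9} tosses: {share:.2%} heads")
```

For 100 tosses the share is close to 60%; for 10,000 tosses it is just under 51%. Passing a looser criterion such as `criterion=0.10` lowers the share needed, which is the trade-off you're assessing when you choose a criterion for a small sample.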

The concept of statistical significance has of course been extended to calculations other than simple percentages or frequencies. We can estimate the probability of differences in averages, for example. If you gave 100 men and 100 women an IQ test, and the women had an average IQ of 110 while the men had an average IQ of 105, should you conclude that the women were smarter than the men? A statistical test would tell you how likely a difference that large would be if only random error were at work, and you could then decide if that probability was small enough to justify concluding that the women did perform better than the men.
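Here is a rough sketch of one such test. It rests on assumptions that are not part of the example: I've supposed both groups' scores have a standard deviation of 15 (a common scaling for IQ tests), and that samples of 100 are large enough for the normal curve to apply. The function name is hypothetical.

```python
import math

def difference_in_means_probability(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Two-sided probability of a difference in averages at least this
    large arising from random error alone (large-sample z-test)."""
    # Standard error of the difference between two independent averages.
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = abs(mean_a - mean_b) / standard_error
    return math.erfc(z / math.sqrt(2))

# ASSUMPTION: a standard deviation of 15 in both groups.
p = difference_in_means_probability(110, 105, 15, 15, 100, 100)
print(round(p, 3))
```

With those assumed standard deviations the probability comes out a little under 2%, which would meet a 5% criterion. A larger spread in scores or smaller groups would raise it.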

Another important area in which statistical significance can be estimated is correlation. Statistics called correlation coefficients estimate how closely two sets of measurements vary together. For example, you can estimate the closeness of the relationship between height and weight.

Again, many people use questionably low standards for evaluating these coefficients. I have read many articles and seen many presentations in which great attention has been paid to some pretty feeble correlations. If you know the correlation coefficient you can estimate the strength of the relationship very easily. Just square the coefficient. Correlation coefficients range from -1 to +1 (the negative values denote inverse relationships). A value of 1 implies a perfect direct relationship, a value of -1 a perfect inverse relationship, and a value of 0 no linear relationship at all. The square of a correlation coefficient of .20 is .20 × .20 = .04. What that square means is that the relation between the two measurements accounts for only 4% of the variation in either measure; that is, it increases your ability to predict one measure from the other by only 4%. For example, if the correlation between height and weight is .20, you have improved your ability to predict a person's weight by 4% if you know his or her height (you usually would work the relationship out separately for each sex, for obvious reasons). If the correlation were .30, you would have increased your ability by 9%, and if the correlation were .70 you would have increased it by 49%.
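The squaring is easy to do by hand, but a short sketch makes the pattern plain. I've also included the usual (Pearson) formula for the coefficient itself, on the assumption that this is the coefficient meant here; the function names are mine.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (2 or more)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)

def share_of_variation(r):
    """Square of the coefficient: the share of variation accounted for."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient lies between -1 and +1")
    return r * r

for r in (0.20, 0.30, 0.70, -0.70):
    print(f"r = {r:+.2f} -> {share_of_variation(r):.0%}")
```

Note that a coefficient of -.70 gives the same 49% as +.70. The sign tells you the direction of the relationship, and the square tells you its strength.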

In summary, to say that a difference is statistically significant is to say that it's unlikely to have happened by accident. The practical implication of statistical significance for information management is that it enables you to decide what parts of the information you're managing are important. For example, if you have collected ten pieces of information which are all highly correlated, you wouldn't want to consider them independently in decisionmaking. Instead you'd do something like work out a combined score. For a further discussion of this aspect of decisionmaking you can consult the articles on fat-free research and plethoratology.

On significance and insignificance © 1999, John FitzGerald
