In the article about averages I mentioned that statistical analysis is concerned with two mathematical characteristics of samples: averages and variability (variability is also known as variation or dispersion). Last week we looked at averages, so this week we'll look at variability.
A simple measure of variability is the range, which is simply the difference between the largest and smallest numbers in a set. For example, if the lowest mark a teacher gives on a test is 60 and the highest 90, the range is 30. If another teacher gives a lowest mark of 40 and a highest mark of 80, the range is 40, and so on. The range is not all that useful – for example, on a test a single poor student in a class of good ones can grossly exaggerate the variability of scores. Statisticians and researchers prefer to work with measures which assess variation from the mean, which, as you will recall from last week, is the arithmetic average of a set of numbers.
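To make the arithmetic concrete, here is a minimal Python sketch of the range. The function name `value_range` and all of the marks other than the lowest and highest quoted above are invented for illustration; the last two lines show how a single low mark in an otherwise strong class inflates the range.

```python
def value_range(scores):
    """Return the difference between the largest and smallest score."""
    return max(scores) - min(scores)


# The two teachers from the text. Only the extremes (60/90 and 40/80)
# come from the article; the middle marks are made up.
teacher_a = [60, 75, 90]
teacher_b = [40, 65, 80]
print(value_range(teacher_a))  # 30
print(value_range(teacher_b))  # 40

# Hypothetical class of strong students, with and without one poor mark.
good_class = [85, 88, 90, 86, 87, 89, 91, 88, 90]
with_one_poor_mark = good_class + [35]
print(value_range(good_class))          # 6
print(value_range(with_one_poor_mark))  # 56
```

Nine of the ten marks in the second class sit within 6 points of each other, yet the range jumps to 56 because of one score.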
The particular measure they prefer to use is the standard deviation. This measure is commonly provided by spreadsheet and database software. It is the square root of the variance, which is obtained by subtracting the mean from every score in a set, squaring the differences, adding them up, then dividing the sum by the total number of scores minus 1. (If your scores are the entire population rather than a sample from it, you can just divide by the total number. For a sample, dividing by the total number minus 1 gives a better estimate of the population variance.)
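The steps just described can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the five scores are invented, and the comparison with `statistics.stdev` from Python's standard library is there purely as a cross-check.

```python
import math
import statistics


def sample_variance(scores):
    """Variance computed by dividing by the number of scores minus 1."""
    n = len(scores)
    mean = sum(scores) / n
    # Subtract the mean from every score and square each difference.
    squared_diffs = [(x - mean) ** 2 for x in scores]
    # Add the squared differences up and divide by n - 1.
    return sum(squared_diffs) / (n - 1)


def sample_std_dev(scores):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(scores))


scores = [60, 70, 75, 80, 90]  # hypothetical test marks, mean 75
print(sample_variance(scores))           # 125.0
print(round(sample_std_dev(scores), 2))  # 11.18

# Cross-check against the standard library, which also divides by n - 1.
print(math.isclose(sample_std_dev(scores), statistics.stdev(scores)))  # True
```

If you wanted the divide-by-the-total-number version instead, `statistics.pstdev` is the standard library's equivalent.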
Well, no doubt you could think of simpler ways to assess variability, but the standard deviation has a few outstanding qualities which make it more desirable than other measures. One is that if the set of numbers, or distribution, we are analyzing has a few characteristics which distributions frequently have (an approximately normal, bell-shaped form is the main one), the standard deviation can be used to estimate the accuracy of the sample mean as an estimate of the population mean. This estimate is usually phrased as a confidence interval. For example, a 95% confidence interval for a mean is a range calculated in such a way that, if we drew many samples and calculated the range for each, about 95% of those ranges would contain the population mean. For a given sample size, the smaller the standard deviation, the narrower this range is.
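As a rough sketch of how the standard deviation feeds into a confidence interval, the Python below divides the standard deviation by the square root of the sample size and multiplies by a critical value. Several things here are assumptions rather than anything stated above: the function name, the two made-up samples, and the use of the normal distribution for the critical value, which is reasonable only for fairly large samples (for small samples a t distribution is the usual choice).

```python
import math
import statistics


def mean_confidence_interval(scores, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(scores)
    mean = statistics.mean(scores)
    # Standard error of the mean: standard deviation over the square root of n.
    std_err = statistics.stdev(scores) / math.sqrt(n)
    # Critical value from the standard normal distribution (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Two hypothetical samples of 30 marks with the same mean (75)
# but very different standard deviations.
tight = [74, 75, 76] * 10
spread = [60, 75, 90] * 10

tight_low, tight_high = mean_confidence_interval(tight)
spread_low, spread_high = mean_confidence_interval(spread)

print(tight_high - tight_low)    # width of the interval for the tight sample
print(spread_high - spread_low)  # a much wider interval for the spread sample
```

Both intervals are centred on 75, but the sample with the smaller standard deviation gives the narrower one.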
Another advantage of the standard deviation is that it can be used to convert scores calculated on different scales to scores on a standard scale (these scores are unsurprisingly known as standard scores). A standard score is simply the number of standard deviations by which a score lies above or below the mean. Besides equating scores on different scales, a standard score also gives you a good idea of its relative size or importance. For example, a standard score of 1.00 will be higher than about 84% of all scores in a normal distribution. Converting data to standard scores therefore gives you important evaluative information. Since spreadsheet and database software allow you to calculate the standard deviation, they also allow you to calculate standard scores, which have several advantages for the decision maker. In another article we look at these advantages and see how you can calculate and interpret standard scores.
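In the meantime, here is a minimal Python sketch of the conversion, assuming the usual definition of a standard score as the score minus the mean, divided by the standard deviation. The function name and the marks are invented for illustration; the last line checks the 84% figure against the normal distribution in Python's standard library.

```python
import statistics


def standard_scores(scores):
    """Convert raw scores to standard scores: (score - mean) / standard deviation."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(x - mean) / sd for x in scores]


marks = [60, 70, 75, 80, 90]  # hypothetical test marks
z = standard_scores(marks)
print([round(value, 2) for value in z])

# Standard scores always have a mean of 0 and a standard deviation of 1,
# whatever scale the original marks were on.
print(round(statistics.mean(z), 6), round(statistics.stdev(z), 6))

# Share of a normal distribution that falls below a standard score of 1.00.
print(round(statistics.NormalDist().cdf(1.0), 4))  # 0.8413
```

Because every set of standard scores ends up on the same scale, a mark from one test can be compared directly with a mark from another.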