Some of the other articles on this site may lead you to believe that I have a bee in my bonnet about opinion polls. I can't deny that such an opinion is reasonable, but in fact my doubts about opinion polls have more to do with how poll results are reported than with polling itself. A properly conducted poll is valuable, but rarely are we given enough information about poll results to be able to figure out if the poll was valuable or not.
For example, on Thursday Reuters and MSNBC published the results of one of a series of "tracking" polls of people who said they would likely vote in the forthcoming American presidential election. One "finding" was reported as follows: "Bush led among men by 47-41 percent." Since the poll had a total sample of 1,200 men and women, anyone who has done significance testing of survey results would immediately suspect that this "lead" is illusory. In fact, unless over 75% of the people surveyed were men, a highly unlikely possibility, this difference cannot be statistically significant.
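For readers who want to check the arithmetic, here is a rough sketch of one way to do it. The article did not say which test applies, so the choices here are my assumptions: a two-sided test at the 5% level (critical value 1.96), and the usual formula for the variance of the gap between two shares measured in the same sample. The function names are purely illustrative.

```python
import math


def lead_z_score(p_a, p_b, n):
    """z statistic for the gap between two shares taken from one sample of n.

    Assumes both shares come from the same respondents, so the variance of
    the gap is (p_a + p_b - (p_a - p_b)**2) / n.
    """
    diff = p_a - p_b
    variance = (p_a + p_b - diff ** 2) / n
    return diff / math.sqrt(variance)


def min_sample_for_significance(p_a, p_b, z_crit=1.96):
    """Smallest n at which the gap reaches z_crit (assumed 5% two-sided level)."""
    diff = p_a - p_b
    return math.ceil((p_a + p_b - diff ** 2) * (z_crit / diff) ** 2)


TOTAL_SAMPLE = 1200  # men and women combined, as reported

needed_men = min_sample_for_significance(0.47, 0.41)
share_needed = needed_men / TOTAL_SAMPLE

# Hypothetical split: suppose half of the respondents were men.
z_if_half_men = lead_z_score(0.47, 0.41, TOTAL_SAMPLE // 2)

print(f"Men needed for a significant 47-41 lead: {needed_men}")
print(f"Share of the total sample: {share_needed:.0%}")
print(f"z score if 600 respondents were men: {z_if_half_men:.2f}")
```

Under these assumptions the 47-41 gap only becomes significant with about 936 men, roughly 78% of the 1,200 respondents, which is in line with the "over 75%" figure above. With an even split between men and women, the z score falls well short of 1.96.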
The article, however, neither reported the results of a significance test nor provided the numbers of men and women so that we could work it out on our own. While the idea of performing a significance test on a news item may seem comical, a significance test is one of those procedures which are necessary to enable us to distinguish information from plain old meaningless data. In fact, by not applying a significance test the reporter seems to have missed an obvious headline, since in previous polls the difference between Bush and Gore in support from men was clearly statistically significant.
As the statistically trained will have noticed, I have glossed over some fine points of the analysis (for example, in practice the best procedure would be to start by comparing the difference between men and women in support for each candidate). Nevertheless, the conclusion and the general point are valid. Unanalyzed data are just numbers. Differences between them may just be accidental: we certainly wouldn't expect Bush's and Gore's percentages to be exactly equal in every poll, even if there were no difference in the population from which the samples were drawn. For numbers to justify a conclusion, the likelihood of coming up with them by chance must be assessed. If that is not done, the numbers are uninformation.
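The point about accidental differences can be illustrated with a small simulation. This is only a sketch under invented assumptions: a population in which the two candidates are exactly tied at 44% each (the remainder undecided or supporting someone else), and 600 men in each poll. None of these figures come from the Reuters/MSNBC report.

```python
import random


def simulate_lead(n, p_a, p_b, rng):
    """Poll n respondents once; return candidate A's lead as a proportion."""
    votes_a = votes_b = 0
    for _ in range(n):
        u = rng.random()
        if u < p_a:
            votes_a += 1
        elif u < p_a + p_b:
            votes_b += 1
        # otherwise: undecided or another candidate
    return (votes_a - votes_b) / n


def chance_of_gap(n, p_a, p_b, gap, trials=2000, seed=0):
    """Fraction of simulated polls in which either candidate leads by >= gap."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(abs(simulate_lead(n, p_a, p_b, rng)) >= gap for _ in range(trials))
    return hits / trials


# Hypothetical tied population: 44% each, polled 600 men at a time.
share = chance_of_gap(n=600, p_a=0.44, p_b=0.44, gap=0.06)
print(f"Polls showing a lead of 6 points or more by chance alone: {share:.1%}")
```

Even with no real difference in the population, a noticeable fraction of such polls, somewhere around a tenth, should show one candidate "leading" by six points or more. That is the kind of chance result a significance test is meant to flag.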