I published this article in my newsletter in 1994, and re-posted it online in 1999 because the advice was still good. It's still good in 2007, unfortunately.
The personal computer allows anyone to do pretty well anything. Fed up with your accountant? Buy accounting software. Tired of paying big money to prepare copy for printing? Buy desktop publishing software. Want to collect lots and lots of information? Buy database software.
On the whole, I think the autonomy conferred by the personal computer is one of the greatest developments since the plastic snow shovel. In the interests of encouraging it, I offer here some advice to the novice designer of databases about what distinguishes a database from a set of records.
1. A database provides an immediate, clear benefit. If you're in business, a database should reduce your costs or increase your revenues. If your organization is non-profit, a database should either reduce your costs, enable you to provide more service for the same cost, or enable you to provide more effective service immediately.
In other words, setting up a database on spec – that is, because it might be useful – or because it's the modern, now, à gogo thing to do, is unlikely to be a good use of your organization's time.
2. A database is complete. Databases with incomplete information are not uncommon, particularly when the data for the database are collected by questionnaire. If the questionnaire is confusing, or if it asks for huge amounts of information, or if it asks for information which respondents are reluctant to reveal, a lot of the questionnaires will not come back, and many of those that do come back will not be filled out completely.
Whatever the reason for missing data, an incomplete database is a catastrophe waiting to happen. Data don't usually go missing randomly, and the non-missing data will therefore not be representative of the missing.
For example, if you're constructing a database of the success of women and men in your organization, but the men are less likely to return questionnaires about their success, your database will probably exaggerate women's success.
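The arithmetic behind that example can be sketched in a few lines of Python. Every number here is invented for illustration: 50 genuinely successful women and 50 genuinely successful men, with women returning 80% of questionnaires and men only 40%. The function name `observed_share` is likewise hypothetical.

```python
from fractions import Fraction

def observed_share(true_counts, response_rates, group):
    """Share of the recorded successes that belongs to `group`
    once non-response has thinned out each group's records."""
    recorded = {g: true_counts[g] * response_rates[g] for g in true_counts}
    return recorded[group] / sum(recorded.values())

# Assumed figures, not real data.
true_successes = {"women": 50, "men": 50}
response_rates = {"women": Fraction(8, 10), "men": Fraction(4, 10)}

true_share = Fraction(true_successes["women"], sum(true_successes.values()))
seen_share = observed_share(true_successes, response_rates, "women")

print(f"true share: {float(true_share):.2f}, share in database: {float(seen_share):.2f}")
# prints "true share: 0.50, share in database: 0.67"
```

Under these assumed response rates, women account for half of the real successes but two thirds of the successes the database records.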
3. A database is accurate. Verification of the accuracy of the data in a database often consists only of checking that the data entered in the database are identical to the figures obtained in data collection.
If the data are incorrect to begin with, checking them against the figures you collected will not make them correct. Even if your database contains only opinion data, you are better off if you assess the validity of the opinions – for example, by crosstabulating items to ensure that people understood them correctly.
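As a rough sketch of what crosstabulating two items might look like, suppose the questionnaire asked the same opinion twice, once worded positively and once negatively. The items, the responses and the function names below are all made up for the example. Anyone who agrees with both wordings, or disagrees with both, has probably misread one of them.

```python
from collections import Counter

# Hypothetical answers to two opposite wordings of one question:
# ("I am satisfied with the service", "I am dissatisfied with the service")
responses = [
    ("agree", "disagree"),
    ("agree", "disagree"),
    ("disagree", "agree"),
    ("agree", "agree"),        # contradictory pair
    ("disagree", "disagree"),  # contradictory pair
    ("agree", "disagree"),
]

def crosstab(pairs):
    """Count how often each combination of answers occurs."""
    return Counter(pairs)

def contradictory_fraction(pairs):
    """Fraction of respondents who gave the same answer to both wordings."""
    table = crosstab(pairs)
    contradictory = sum(n for (first, second), n in table.items() if first == second)
    return contradictory / len(pairs)

table = crosstab(responses)
for cell, count in sorted(table.items()):
    print(cell, count)
print(f"contradictory: {contradictory_fraction(responses):.0%}")
```

A large contradictory fraction suggests the wording is the problem, not the respondents, and that the item needs rewriting before its data can be trusted.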
If your database contains data that can be objectively verified, then they should be objectively verified. If it's impossible to do that, then you can't place much faith in your database. For example, if the stock prices listed in your newspaper weren't verified, then there wouldn't be much point in reading them.
If verifying information means that you have to compile a smaller database than you had planned, then compile a smaller database. A small accurate database is better than a big inaccurate one.
4. A database does not contain superfluous information. The step in database construction which is most often omitted is establishing that the data in the database are useful. Of course, you can't know beforehand that every piece of information you collect will be useful, but you can establish afterward whether you need every piece of information you collected.
To find out what data you really need you can use standard psychometric techniques. A good database is intended to help in making specific decisions, and psychometric techniques like scale construction will tell you, quickly, which types of information help in making the decision, and which don't.
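One common step in scale construction is the corrected item-total correlation: each item is correlated with the sum of the other items, and items that barely correlate with the rest become candidates for dropping. The sketch below is only an illustration of that one step. The three items, the six respondents' scores and the 0.3 cut-off are all assumptions (0.3 is a frequently quoted rule of thumb, not a law).

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def corrected_item_total(items):
    """Correlate each item with the total of all the *other* items."""
    result = {}
    for name, scores in items.items():
        others = [v for other, v in items.items() if other != name]
        rest_total = [sum(vals) for vals in zip(*others)]
        result[name] = pearson(scores, rest_total)
    return result

# Invented scores for six respondents on three candidate items.
items = {
    "item_a": [1, 2, 3, 4, 5, 6],
    "item_b": [2, 2, 4, 4, 6, 5],
    "item_c": [3, 1, 3, 1, 3, 1],
}

THRESHOLD = 0.3  # assumed cut-off for keeping an item
correlations = corrected_item_total(items)
kept = [name for name, r in correlations.items() if r >= THRESHOLD]

for name, r in correlations.items():
    print(f"{name}: {r:+.2f}")
print("keep:", kept)
```

With these made-up scores, item_c moves independently of the other two and falls below the cut-off, which is the kind of evidence that a piece of information is not earning its place in the database.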
The Real Database © John FitzGerald, 1994