The final instalment in this series.
Rating and scaling. Many databases are used to produce ratings, and decisions are made on the basis of the ratings. Capital expenditure formulas are examples of this type of rating. The ratings of special need used by some school boards to determine grants to schools are also examples.
Deriving ratings from data is a complicated business, however, and often the ratings are invalid because these complications have not been dealt with. We will look at a few of these complications now.
Incoherent data. One of the essential principles of measurement is that a measuring device measures one thing, and one thing only. A thermometer measures temperature, a barometer measures air pressure, and a hygrometer measures humidity. A measuring device which gave you a reading based on a combination of the temperature, the air pressure, and the humidity wouldn't tell you much of anything.
Similarly, all the items (fields) in a rating scale should measure the same thing. Often they don't, for a variety of reasons. The first is a problem we looked at before – data petrification. An item that does not vary cannot measure anything, so it won't measure the same thing as the other items in the scale. Again, information which varies only slightly will have the same problem.
If the lack of variation is due to premature classification, you can sometimes undo the damage. If you can't, the only thing you can do is jettison the unvarying data.
Then again, all your data may vary adequately, but measure several different things. This problem is often encountered in attitude or satisfaction surveys, but I have also seen it in databases which calculate other types of rating.
In attitude or satisfaction surveys, the problem is that responses to any single attitude item are influenced by many factors in addition to the attitude being assessed. Often the attitude will be less important than these other factors in determining the response to the question.
Capital expenditure formulas, and other non-attitude scales, can also have this type of problem. That is, any single item of information can be a measure not only of need for capital expenditure but of other factors as well. If the other factors are powerful enough, the information may not provide an effective measure of the need for capital expenditure.
In a capital expenditure formula, an additional difficulty is that the formula is intended to measure an abstract, and usually hypothetical, concept – need for capital investment. Often this will turn out not to be a single concept, but two or three. The data used to assess this concept may therefore fall into two or three groups which are not necessarily related to each other.
The result of including information like this in a formula is inaccurate ratings. Let's look at an extreme example. If for a number of cities you added together annual sales of shoes, the number of barber shops, and the floor space of furniture stores, you could claim to have a business index, but not many people would be interested in it. The three measures are so obviously unrelated that their sum can't be a measure of anything.
Although the lack of relationship between measures is usually not so apparent, it can be readily assessed using standard statistical techniques for assessing internal consistency. This type of analysis can be done quickly, and is extremely helpful. Often it will show you how to get reliable ratings by omitting only two or three variables.
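One standard statistic for internal consistency is Cronbach's alpha, which compares the variability of individual items with the variability of the total score. The sketch below computes it with nothing but the Python standard library; the four items and five respondents are purely hypothetical data for illustration.

```python
# Cronbach's alpha: a standard index of internal consistency.
# Illustrative data only: five respondents rated four hypothetical scale items.
from statistics import pvariance

items = [
    [4, 3, 3, 5, 4],   # item 1, one score per respondent
    [3, 3, 2, 5, 4],   # item 2
    [4, 2, 3, 4, 5],   # item 3
    [3, 3, 2, 5, 3],   # item 4
]

k = len(items)
totals = [sum(scores) for scores in zip(*items)]    # each respondent's total
item_var = sum(pvariance(item) for item in items)   # sum of item variances
alpha = (k / (k - 1)) * (1 - item_var / pvariance(totals))
print(round(alpha, 2))   # → 0.89
```

Values of alpha near 1 indicate that the items rise and fall together, i.e. measure the same thing; dropping an item and recomputing alpha shows you whether that item belongs in the scale.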
Scale construction with factor analysis and reliability assessment is another technique which can be extremely helpful. For example, it can tell you if the ten items on a rating scale actually consist of a group of five items which measure one thing, and another group of five items which measure another thing.
If a rating scale actually consists of two sets of items measuring two different things, it should be split into two rating scales, provided their internal consistency is adequate. The scales can be weighted equally by standardizing them, a technique which will be explained a few paragraphs from now.
Weighting of fields. Earlier we saw that fields can be weighted as well as records. In fact, arbitrary weighting of fields is probably the surest way to destroy the value of your database. If it doesn't destroy its value, it will at least make drawing sound conclusions from your database more time-consuming and arduous than it need be. Unless such weighting is done systematically with a limited range of statistical techniques, it is certain to reduce the reliability of your database.
Furthermore, you cannot avoid this problem by not assigning special weights to some fields. When fields are combined in a rating, the statistical characteristics of each item of information will actually determine a unique weight for it. For example, let's suppose five variables are entered in five fields of a database, no arbitrary weights are assigned, and decisions are based on the sum of the five fields. The problem is that some fields will have higher average values than others, and some will vary more than others. As a result, some will affect the sum more than others. In other words, they will have more weight.
For example, if one field has a mean of 50 and ranges from 40 to 60, while another has a mean of 100 and ranges from 10 to 190, a high score in the second field will have a bigger effect on the sum than a high score in the first field.
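This effect is easy to demonstrate. The sketch below uses hypothetical records with the ranges described above: field A varies between 40 and 60, field B between 10 and 190, and the ranking of the unweighted totals turns out to be decided by field B alone.

```python
# Two unweighted fields with an equal claim to importance: field A varies
# little (40-60), field B a lot (10-190).  Five hypothetical records.
field_a = [42, 50, 58, 44, 56]
field_b = [10, 100, 190, 150, 40]

totals = [a + b for a, b in zip(field_a, field_b)]

# Rank the records by their total, and by field B alone.
order_by_total = sorted(range(5), key=lambda i: totals[i])
order_by_b     = sorted(range(5), key=lambda i: field_b[i])
print(order_by_total == order_by_b)   # → True: B alone decides the ranking
```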
People sometimes try to deal with this problem by calculating indexes. The average score in a field is assigned a value of 100, and the score in that field in any particular record is assigned a score corresponding to the percentage difference from the average value. For example, if the mean score is 50, a score of 60 would be given a value of 120.
However, while indexing does eliminate the bias due to differences in average values, it does not eliminate the bias due to differences in variability. For example, if one field has a mean of 40 and ranges from 30 to 50, the highest index will be 125. Another field with a mean of 50 and a range from 10 to 90 will produce a highest index score of 180. Scoring high in the second field will be much more effective in obtaining a high rating than scoring high in the first field.
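The arithmetic of the two fields just described can be checked directly. The index formula divides each score by the field mean and multiplies by 100:

```python
# Indexing: score / field mean * 100.  The means and best scores below are
# the ones used in the text.
def index(score, mean):
    return score / mean * 100

print(index(50, 40))   # best index in the low-variability field: 125.0
print(index(90, 50))   # best index in the high-variability field: 180.0
```

Both fields now have an average index of 100, yet the second field can still pull a record's rating much further from the average than the first can.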
The best way to eliminate both these biases is to standardize the data. To standardize, you need to know the mean of each field and its standard deviation, which is a measure of variability. Spreadsheet and database software can calculate these two statistics. You then subtract the mean from the value in each field, and divide the difference by the standard deviation. The resulting value is a standard score, and it will have the same mean (zero) and the same variability (a standard deviation of 1) as any other standard score. By standardizing the fields in your database, you will eliminate two kinds of weight problem.
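The procedure just described can be sketched in a few lines; the field values below are illustrative only.

```python
# Standardizing a field: subtract the field mean from each value, then
# divide by the field's standard deviation.  Illustrative values only.
from statistics import mean, pstdev

field = [40, 45, 50, 55, 60]

m, sd = mean(field), pstdev(field)
z_scores = [(x - m) / sd for x in field]

print([round(z, 2) for z in z_scores])   # → [-1.41, -0.71, 0.0, 0.71, 1.41]
```

Because every standardized field has a mean of 0 and a standard deviation of 1, no field can dominate a sum by virtue of a larger mean or a wider spread.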
Standardizing will only end your weight problems, however, if you desist from any further arbitrary weighting. If, after standardizing all your data, field A is arbitrarily assigned twice as much weight as field B, the value of the information in the database is almost certain to be reduced. Furthermore, there is a good chance that it will be reduced to zero.
The reason is that data often refuse to behave the way you expect them to. Data that you expect to be important will often, when analyzed mathematically, turn out not to be important at all. Giving additional weight to them will destroy the validity of a formula or scale.
The assignment of weights to the data used in formulas and scales is a complicated process requiring complicated statistical analysis. Usually, it requires judicious interpretation of a factor analysis and formal psychometric assessment of various aspects of reliability. This analysis by itself is not all you have to do, either. You have to consider whether the weights you derive produce results which make sense to people familiar with the field.
The Guiding Principle. The guiding principle of database management is the same as the guiding principle for any sort of software use. That principle is that software doesn't do everything. Having word processing software on your PC does not lead you to expect that your reports will write themselves. Having a personal scheduler on your PC does not lead you to expect that your appointments will schedule themselves. Similarly, having database software does not mean that the data will clean and analyze themselves.
The most important methods for keeping a database fit are simple and can be carried out by anyone familiar with the software. If you collect only a reasonable amount of data and ensure that it is cleaned, you will have gone most of the way toward the goal of database fitness. If you take care of your missing data, and make sure that all your data varies adequately, you will have gone even farther.
The following is a list of suggestions made in this report that can be applied immediately:
when you have to enter data in large numbers of fields, enter a few fields at a time,
after entering data in your database, check a listing against the source of the data,
verify data entry,
produce summary reports of the entries in each field to find incorrect codes,
use the sort option to find incorrect codes,
crosstabulate fields and investigate any unusual results,
collect only those items of information which you know to be useful,
use an unmistakable code for missing data, and incorporate additional dummy variables in the file which record whether or not the data in the field you're interested in is valid,
to ensure adequate variability in your data, avoid classifying any data until it's in the database,
have the database calculate percentages, rather than the people providing the information,
if you want to compare two groups of equal size, it is better not to weight one of the groups, but either to discard data randomly from the larger group or to collect more data for the smaller one,
standardize data to remove bias due to differences in means and in variability, and
avoid assigning arbitrary weights to fields.