Deciding Where to Measure
Measurement is the heart of evaluation. For example, to evaluate a training program you could assess people's abilities before they enter the program and again afterward, and then compare the results. If their abilities haven't increased, then the program hasn't worked, right?
In fact, that conclusion does not necessarily follow. An important issue here is the generality of the measurement: the more general the measurement, the less chance you have of detecting the effect of training aimed at one specific skill.
A simple, if artificial, example will help illustrate this point. Let us suppose a group of students who receive low marks in French are given additional vocabulary drill, and the measure of success is the improvement, if any, in their French marks. The problem is that vocabulary is not the only thing being measured by a French test; French tests usually assess grammar and comprehension, as well, and sometimes pronunciation and oral facility. Consequently, since the test is only partially a measure of knowledge of vocabulary, the effect of the vocabulary drill will be more difficult to detect.
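A minimal numerical sketch of this dilution follows. It rests on assumptions that are ours, not the example's: the French mark is the sum of several independent skill scores of equal variance (vocabulary, grammar, comprehension, and so on), and the drill raises only the vocabulary score, by half a standard deviation. The function name `diluted_effect` is hypothetical.

```python
import math


def diluted_effect(vocab_effect: float, n_components: int) -> float:
    """Standardized gain on the total mark when only vocabulary improves.

    Hypothetical model: the mark is the sum of n_components independent
    skill scores, each with unit variance, and the drill raises the
    vocabulary score by vocab_effect of its own standard deviations.
    """
    # The total mark rises by the same raw amount as the vocabulary score...
    raw_gain = vocab_effect
    # ...but the total has SD sqrt(n_components), so the gain looks smaller.
    total_sd = math.sqrt(n_components)
    return raw_gain / total_sd


for k in (1, 2, 4):
    print(k, round(diluted_effect(0.5, k), 3))
```

Under these assumptions, a gain of half a standard deviation on a pure vocabulary test shrinks to a quarter of a standard deviation on a four-part French test.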
Of course, the goal should be to improve marks in French. The danger, though, is that a failure to improve French marks will be interpreted as a sign that the vocabulary training failed. The students may well have acquired bigger vocabularies, a legitimate academic and linguistic achievement, but the choice of measure may have made the effect impossible to detect. Their knowledge of vocabulary should also have been assessed before and after training.
If the vocabulary training was successful but French marks did not increase significantly, the problem may simply be that the sample was too small to detect the smaller effect on marks. If the sample was large enough, the trainers will probably consider expanding their remedial program beyond vocabulary.
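To see why sample size matters, here is a rough sketch using the standard normal-approximation formula for a two-sided before/after (paired) comparison, n = ((z for 1 - alpha/2 + z for the desired power) / effect) squared, where the effect is the mean change divided by the standard deviation of the changes. The effect sizes, the 5% significance level, and the 80% power are assumed values chosen for illustration, and `paired_sample_size` is a hypothetical helper; exact t-test calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist


def paired_sample_size(effect: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of students needed for a before/after comparison.

    Normal approximation for a two-sided paired test; `effect` is the mean
    before/after change divided by the SD of the changes (an assumption).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return math.ceil(((z_alpha + z_power) / effect) ** 2)


print(paired_sample_size(0.5))   # 32
print(paired_sample_size(0.25))  # 126
```

Because the required number grows with the inverse square of the effect, halving the detectable effect, as a broad French test may do to a vocabulary gain, roughly quadruples the number of students needed.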
Another possible result is that French marks increase while knowledge of vocabulary does not (the students, for example, could have supplemented their unsuccessful vocabulary training with increased study of grammar). Measuring at both levels avoids the mistaken conclusion that the vocabulary training worked.
This is an example, by the way, of a causal chain. The hypothesis is that improving French vocabulary leads to improvement in a higher-order concept, French marks. In a causal chain, you must evaluate changes at every level of analysis.
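The reasoning of the last few paragraphs can be summarized as a small decision table over the two measurements. The helper `interpret` is hypothetical and simply restates the example's logic; in practice each "improved" flag would itself come from a before/after test at that level of the chain.

```python
def interpret(vocab_improved: bool, marks_improved: bool) -> str:
    """Read both measurements of the vocabulary -> French marks chain together.

    Hypothetical helper: the messages paraphrase the vocabulary-drill example.
    """
    if vocab_improved and marks_improved:
        return "consistent with the chain: vocabulary rose and marks followed"
    if vocab_improved and not marks_improved:
        return "first link worked; check sample size or broaden the program"
    if not vocab_improved and marks_improved:
        return "marks rose for another reason; do not credit the vocabulary drill"
    return "no evidence the drill worked at either level"


print(interpret(vocab_improved=False, marks_improved=True))
```

Measuring only French marks would collapse the second and fourth rows, and the first and third, into the same observation, which is exactly the ambiguity the example warns against.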