Pitfalls of Profiles
A popular use of survey data is the construction of profiles. In constructing a profile, the characteristics of respondents are related to their responses, and these relationships are sometimes then combined in ways that are assumed to describe typical respondents. Assuming that a profile represents a typical respondent can, however, cause problems, and the results must be analyzed carefully before any such conclusion can be reached.

Let's suppose that you have conducted a marketing survey of 200 single Torontonians. Half are men and half are women. Half of the men have incomes over \$50,000 a year, as do half of the women.

You ask your sample whether they have taken a vacation trip outside Toronto in the past year. You find that 80% of the people with higher incomes say that they have taken such a trip, and only 50% of the people with lower incomes. Furthermore, 80% of the men have taken a vacation trip outside Toronto, and only 50% of the women. Can you conclude that the typical single traveller is a man with an income over \$50,000?

The first problem with this conclusion is the use of the word typical. Men with incomes over \$50,000 make up only a quarter of the sample (50 people), while 65% of the sample (130 people) had taken a vacation trip outside Toronto. Even if every one of these men had travelled, they would account for at most 50 of the 130 travellers, fewer than two in five. Men with incomes over \$50,000 are therefore not a majority of the people who have taken vacation trips, so it is difficult to consider them typical. To find out whether the typical single traveller is a man with an income over \$50,000, you would need to draw a further random sample of travellers without regard to income or sex.
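The arithmetic behind this argument can be checked in a few lines. The sketch below is a Python illustration. It assumes the sample exactly as described (four groups of 50) and uses whole-number percentages so the arithmetic stays exact. The variable names are hypothetical.

```python
# Sample as described: 200 single respondents, half men and half women,
# with half of each sex in the higher-income group (four groups of 50).
SAMPLE_SIZE = 200
GROUP_SIZE = SAMPLE_SIZE // 4

# Reported travel rates (whole percentages) for the two income halves.
PCT_HIGHER_INCOME = 80
PCT_LOWER_INCOME = 50

# Each income half has 100 people, so the counts add up directly.
half = SAMPLE_SIZE // 2
travellers = (PCT_HIGHER_INCOME * half + PCT_LOWER_INCOME * half) // 100

# Best case for the "typical traveller" claim: every higher-income man travelled.
max_share_higher_income_men = GROUP_SIZE / travellers

print(f"travellers: {travellers} of {SAMPLE_SIZE} ({travellers / SAMPLE_SIZE:.0%})")
print(f"largest possible share of travellers who are higher-income men: "
      f"{max_share_higher_income_men:.0%}")
```

This reports 130 travellers (65% of the sample). Even in the most favourable case, higher-income men make up about 38% of them, well short of a majority.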

You might, however, only be interested in finding out whether men with incomes over \$50,000 are a more likely market for travel than the other three groups (men with lower incomes, women with higher incomes, and women with lower incomes). Do these results allow you to conclude that they are?

No, they do not. Knowing the separate effects of your two variables does not tell you their combined effects: many different patterns across the four groups could have produced the same separate effects.
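Two hypothetical sets of travel rates for the four groups make this concrete. The Python sketch below builds both. The first set is an illustrative assumption in which income and sex simply add together. The second uses the percentages from the example that follows. Both produce exactly the separate effects reported in the survey: 80% versus 50% for men and women, and the same for higher and lower incomes.

```python
# Travel rates (whole percentages) for each (sex, income) group.
# Each group has 50 respondents, so a margin is the plain average of two cells.
ADDITIVE = {  # hypothetical: income adds the same 30 points for both sexes
    ("men", "higher"): 95, ("men", "lower"): 65,
    ("women", "higher"): 65, ("women", "lower"): 35,
}
INTERACTING = {  # percentages from the worked example in the text
    ("men", "higher"): 70, ("men", "lower"): 90,
    ("women", "higher"): 90, ("women", "lower"): 10,
}

def margins(rates):
    """Return the overall travel rate for each sex and each income group."""
    out = {}
    for sex in ("men", "women"):
        out[sex] = (rates[(sex, "higher")] + rates[(sex, "lower")]) / 2
    for income in ("higher", "lower"):
        out[income] = (rates[("men", income)] + rates[("women", income)]) / 2
    return out

print(margins(ADDITIVE))     # separate effects from the additive table
print(margins(INTERACTING))  # identical separate effects, very different cells
```

In the additive table, higher-income men really are the most likely travellers. In the other table, they are not. The margins alone cannot tell the two apart.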

For example, further analysis might find that only 70% of men with higher incomes had travelled, but that 90% of women with higher incomes and 90% of men with lower incomes had travelled. These results would produce a total sample in which 80% of the men and 80% of the people with higher incomes had travelled, yet the conclusion that men with incomes over \$50,000 are the group most likely to travel would clearly be wrong. A sounder conclusion would be that women with incomes of \$50,000 or less are much less likely to travel. Women's overall rate of 50% is the average of the rates for their two equal-sized income groups, and 90% of higher-income women travelled, so only 10% of women with incomes of \$50,000 or less would have travelled. Statistical testing would be required to confirm this conclusion, of course, and you would probably do best to draw the further random sample already described to confirm it.
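The missing figure for lower-income women follows from the constraint that women's overall rate is the average of their two equal-sized income groups. The Python sketch below works through that step, assuming the four groups of 50 described earlier. The variable names are illustrative only.

```python
# Rates (whole percentages) reported in the example.
men_higher = 70
women_higher = 90
men_lower = 90
women_overall = 50  # overall rate for women from the survey

# Women split evenly by income: women_overall = (women_higher + women_lower) / 2.
women_lower = 2 * women_overall - women_higher

rates = {
    "men, higher income": men_higher,
    "men, lower income": men_lower,
    "women, higher income": women_higher,
    "women, lower income": women_lower,
}

# List every group tied for the highest rate, rather than picking one arbitrarily.
top_rate = max(rates.values())
most_likely = [group for group, rate in rates.items() if rate == top_rate]

print(f"women with lower incomes: {women_lower}%")
print(f"most likely to travel ({top_rate}%): {'; '.join(most_likely)}")
```

Higher-income men, at 70%, do not appear among the groups most likely to travel, even though both of their margins are 80%.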

The example may seem artificial, but similar relationships arise routinely in data analysis. They are known as interaction effects: the effect of one variable (here, income) differs depending on the level of another (here, sex). In general, analysis is more productive if these effects are considered first, before the general effects of the variables being investigated.

In analyzing data, the best approach is to start with the details and then work your way towards an appreciation of general effects (or main effects, as they are known in research). The presence of an interaction effect usually means that the main effects are not particularly informative. Basing a profile solely on main effects will often lead to misleading results.
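One simple way to look at the details before the main effects in a two-by-two layout like this one is to compare the income effect within each sex. The Python sketch below computes this difference of differences on the raw percentages. It is an illustrative descriptive check chosen here, not the author's procedure. It is also not a substitute for the statistical testing the text calls for, such as a logistic regression with an income-by-sex interaction term. Both tables are hypothetical: one is additive, and the other uses the percentages from the example above.

```python
# Travel rates (whole percentages) keyed by (sex, income).
ADDITIVE = {
    ("men", "higher"): 95, ("men", "lower"): 65,
    ("women", "higher"): 65, ("women", "lower"): 35,
}
INTERACTING = {
    ("men", "higher"): 70, ("men", "lower"): 90,
    ("women", "higher"): 90, ("women", "lower"): 10,
}

def income_effect(rates, sex):
    """Percentage-point gap between higher- and lower-income members of one sex."""
    return rates[(sex, "higher")] - rates[(sex, "lower")]

def interaction_contrast(rates):
    """Income effect among men minus income effect among women.

    Zero means income shifts travel by the same number of percentage points
    for both sexes; a large value means the main effects hide the pattern.
    """
    return income_effect(rates, "men") - income_effect(rates, "women")

for name, table in (("additive", ADDITIVE), ("interacting", INTERACTING)):
    print(
        f"{name}: income effect men {income_effect(table, 'men'):+d}, "
        f"women {income_effect(table, 'women'):+d}, "
        f"contrast {interaction_contrast(table):+d}"
    )
```

For the additive table the contrast is 0. For the example table, higher income lowers men's travel by 20 points but raises women's by 80, a contrast of 100 points. That is why its main effects say little about any single group.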

Classic Profiles © 1996, John FitzGerald