Data often behave in ways which make analyzing them difficult. For example, if you want to detect differences between age groups but have classified the people in your sample into ten age categories, the number of people in some or all of the categories may be too small for a statistical test to detect differences between them.
That problem can be overcome, of course, by combining categories – or collapsing them, as the process is often described in research. That raises the question, though, of which categories to combine with which.
You do not want to combine categories arbitrarily, or in any way that might invite the suspicion that you chose the combination most convenient for your hypothesis. What I usually do is start by combining all the categories into two, a higher and a lower. For example, if I were combining age categories I would combine them into two groups, one older and one younger, of most nearly equal size. To get the most nearly equal numbers of people in each group it might be necessary to combine seven categories, say, into the younger group, but only three into the older. The important consideration is the number of people in each group, not the number of categories.
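The split described above can be found mechanically, which removes any suspicion of convenient choices. A minimal sketch, using made-up counts for ten ordered age categories: walk through every possible cut point and keep the one that leaves the two groups most nearly equal in size.

```python
# Hypothetical counts for ten ordered age categories, youngest first.
counts = [12, 18, 25, 30, 22, 15, 10, 8, 6, 4]

total = sum(counts)
best_split, best_diff = 1, total

# Try every cut point between adjacent categories and keep the one
# that minimizes the difference in group sizes.
running = 0
for i, c in enumerate(counts[:-1], start=1):
    running += c
    diff = abs(running - (total - running))
    if diff < best_diff:
        best_split, best_diff = i, diff

younger = counts[:best_split]
older = counts[best_split:]
print(best_split, sum(younger), sum(older))  # → 4 85 65
```

With these (invented) numbers, the younger group absorbs four categories and the older group six, illustrating that the groups need not contain equal numbers of categories, only nearly equal numbers of people.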
The advantage of two big groups of nearly equal size is that they maximize the power of the statistical test. If you find a statistically significant difference between them, you can then reclassify the sample into three groups of nearly equal size and perform the test again. If you keep reclassifying into progressively more groups until the significant difference disappears, you can get a pretty good understanding of the relationships of subgroups within your sample.
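The repeated testing described above can be sketched with a chi-square statistic computed over a contingency table of group membership against an outcome. This is a minimal pure-Python illustration with invented counts; in practice the statistic would be compared against the chi-square critical value for (rows − 1) × (columns − 1) degrees of freedom.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: age groups (rows) by a yes/no outcome (columns).
two_groups = [[40, 45],
              [25, 40]]
three_groups = [[28, 30],
                [22, 25],
                [15, 30]]

# First test with two groups (1 degree of freedom), then reclassify
# into three groups (2 degrees of freedom) and test again.
print(chi_square(two_groups))
print(chi_square(three_groups))
```

Each reclassification spreads the same people over more, smaller groups, which is why the significant difference eventually disappears as power falls.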