Why Use Multiple Comparison Tests?
Analysis of research data often involves comparing observed outcomes from two or more groups. For example, suppose you have an experiment that compares a control group against two or more experimental groups. Do you use t-tests for this analysis, or some other technique? This article examines the problems surrounding the statistical comparison of three or more groups in an experiment, and suggests how you can address them.
For this discussion, suppose your experiment contains outcomes whose values are continuous measures and are considered to be normally distributed. Typical outcomes might be scores on a test, length of time until some event occurs, volume, height, etc. For experiments with this type of outcome measure, it makes sense to compare group means.
In the case where there are only two means to compare, most researchers will use a Student t-test for independent groups to determine if there are statistically significant differences.
When an experiment consists of more than two groups, the analysis becomes a bit more complicated. Suppose you are comparing the means of three groups and you’re interested in knowing which means are significantly different from the other means. It may seem logical to perform t-tests for all pairs of means: (group 1 vs. group 2), (group 1 vs. group 3) and (group 2 vs. group 3). In other words, perform t-tests on all possible comparisons. However, there is a fundamental problem with this technique. The p-value associated with each t-test is determined as if only one t-test is performed per experiment. If three t-tests are performed in a single experiment, then the p-values for these tests no longer reflect the overall chance of a false positive across the whole set of tests.
As an example, suppose you flip a coin once. The chance of getting a head on any single flip is one-half (50%). However, if you flip it three times, the probability of getting at least one head increases substantially (to 87.5%). In the same way, if you perform one Student t-test at the 0.05 level of significance, you have a 0.05 chance of making an incorrect decision when there is truly no difference. (This is called a type I error: rejection of a true null hypothesis.) However, if you perform three tests, each at the 0.05 level of significance, your chance of making at least one such error increases considerably. (In this case, treating the three tests as independent, the chance rises to about 14.3%, since 1 - 0.95^3 is about 0.143.)
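The arithmetic behind both figures can be checked with a few lines of Python. This is a minimal sketch, not part of the original article: the function name `prob_at_least_one` is made up for illustration, and the calculation assumes the tests are independent, which is only an approximation for pairwise t-tests that share groups.

```python
from itertools import combinations


def prob_at_least_one(p_single: float, num_trials: int) -> float:
    """Chance that an event with probability p_single occurs at least once
    in num_trials independent trials."""
    return 1.0 - (1.0 - p_single) ** num_trials


# All pairwise comparisons among three groups.
groups = ["group 1", "group 2", "group 3"]
pairs = list(combinations(groups, 2))
print(len(pairs))  # 3

# Coin example: at least one head in three flips.
print(round(prob_at_least_one(0.5, 3), 3))  # 0.875

# Three t-tests, each at the 0.05 level (independence assumed).
print(round(prob_at_least_one(0.05, len(pairs)), 3))  # 0.143
```

With four groups there are six pairwise comparisons, and the same calculation gives a chance of roughly 26.5%, so the problem grows quickly as groups are added.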
Therefore, the solution to the multiple comparisons problem is NOT to perform all possible t-tests. Instead, you should use a two-step procedure: an Analysis of Variance (ANOVA) followed by a multiple comparison test. The first step (the ANOVA) answers the question, “Is there at least one mean that is significantly different from another mean?” If the p-value for the ANOVA is less than your chosen significance level (usually 0.05), you have evidence that at least one mean is different. If the p-value is not significant, your analysis is over, and you conclude that you have no evidence of a difference between any pair of means.
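To make the first step concrete, the following sketch computes the one-way ANOVA F statistic by hand using only the Python standard library. The data values and the function name `one_way_anova_f` are invented for illustration. In practice a statistics package would also report the p-value from the F distribution (for example, SciPy provides `scipy.stats.f_oneway`), which this sketch does not attempt.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: a control group and two experimental groups.
control = [1.0, 2.0, 3.0]
treatment_a = [2.0, 3.0, 4.0]
treatment_b = [5.0, 6.0, 7.0]

f_stat, df1, df2 = one_way_anova_f([control, treatment_a, treatment_b])
print(f"F({df1}, {df2}) = {f_stat:.2f}")  # F(2, 6) = 13.00
```

A large F value means the group means vary more than the scatter within groups would suggest. Whether it is significant depends on comparing it against the F distribution with the reported degrees of freedom.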
If the ANOVA’s p-value is significant, then you proceed to step two of the analysis to answer the question, “Which means are significantly different from which other means?” This is the multiple comparison step. This procedure compares all possible pairs of means and tells you which pairs are significantly different at some selected probability level (usually 0.05). Although this technique resembles multiple t-tests, the difference is that the probability levels are controlled to account for the multiple tests.
Most people use computer programs to perform the Analysis of Variance and multiple comparison tests. Each program usually offers several comparison tests to select from. Popular multiple comparison procedures include Newman-Keuls, Tukey, Bonferroni and others. Although these techniques differ slightly, the purpose of all of them is to control the significance level for multiple tests. Which one should you use? A good rule of thumb is to check the literature in your field to see which test is most often used in your discipline. However, there are special cases where certain tests are preferred. You should consult a professional statistician to determine which test is best for your data.
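Of the procedures named above, the Bonferroni adjustment is the simplest to show in code: each raw p-value is multiplied by the number of comparisons and capped at 1. The sketch below is illustrative only. The p-values are hypothetical, not results from any real experiment, and the function name `bonferroni_adjust` is an assumption rather than a call from any particular statistics package.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1.0."""
    num_tests = len(p_values)
    return [min(1.0, p * num_tests) for p in p_values]


# Hypothetical raw p-values from three pairwise comparisons.
raw_p = {
    "group 1 vs. group 2": 0.010,
    "group 1 vs. group 3": 0.030,
    "group 2 vs. group 3": 0.500,
}

alpha = 0.05
adjusted = bonferroni_adjust(list(raw_p.values()))
for label, p_adj in zip(raw_p, adjusted):
    verdict = "significant" if p_adj < alpha else "not significant"
    print(f"{label}: adjusted p = {p_adj:.3f} ({verdict})")
```

Note that the second comparison would pass an unadjusted 0.05 threshold (p = 0.030) but does not survive the adjustment (0.090). Bonferroni is generally more conservative than procedures such as Tukey's, which is one reason the choice of test matters.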
Although the examples used here describe an analysis that compares means, multiple comparison tests are also used for a variety of other statistical procedures – such as in comparison of proportions. Therefore, any time you are tempted to perform multiple statistical tests for three or more groups, consider using a multiple comparison technique instead.
© Copyright TexaSoft, 2007
This page was last edited: 09/28/2007