# Statistical Comparison of Two Groups

A common form of scientific experimentation is the comparison of two groups. This
comparison could be of two different treatments, of a treatment and a
control, or of measurements taken before and after an intervention. The preliminary results of experiments that are
designed to compare two groups are usually summarized as a mean or score for each
group. Once you’ve summarized the data, how do you decide whether the observed
difference between the two groups is real or just a chance difference caused by the
natural variation within the measurements? A common way to approach that question is by
performing a statistical analysis.

The two most widely used statistical techniques for comparing two groups, where the
measurements of the groups are *normally distributed*, are the **Independent Group
t-test** and the **Paired t-test**. What is the difference between these two tests, and when
should each be used?

The **Independent Group t-test** is designed to compare means between two groups
where there are different subjects in each group. Ideally, these subjects are randomly
selected from a larger population of subjects and randomly assigned to one of the two treatments.
Another way to assign subjects to two groups is to randomly assign them to one of two
treatments at the time they enter a study. This randomization is often performed in a
double-blind fashion, so that neither the subjects nor the researchers know which treatment each subject receives.
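To make the first randomization scheme concrete, here is a minimal Python sketch that shuffles a pool of subject IDs and splits it into two equal-sized treatment groups. The function name `randomize_two_groups` and its `seed` parameter are hypothetical, introduced only for illustration; they are not part of WINKS or any other package. The seed exists only so the example is reproducible, and in a real study the allocation would typically be generated and concealed by someone outside the research team.

```python
import random


def randomize_two_groups(subjects, seed=None):
    """Hypothetical helper: randomly split subjects into two treatment groups.

    If the number of subjects is odd, the second group gets the extra subject.
    """
    rng = random.Random(seed)   # independent generator; seed only for reproducibility
    shuffled = list(subjects)   # copy so the caller's sequence is not modified
    rng.shuffle(shuffled)       # random permutation of the subject pool
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject IDs 1..10
treatment_ids, control_ids = randomize_two_groups(range(1, 11), seed=42)
print("Treatment:", treatment_ids)
print("Control:  ", control_ids)
```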

Besides the normality assumption, the standard Independent Group t-test also requires
that the variances of the two groups be equal. That is, if you were to plot the
observed data from each of the two groups, the resulting bell-shaped histograms would have
approximately the same spread. Before actually performing the Independent Group t-test, a
statistical pre-test is often performed to check whether the variances can be treated as
equal. Options for the unequal variance case are discussed later.
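The text does not name a specific pre-test. One common choice is Levene's test, which is essentially a one-way analysis of variance on each observation's absolute deviation from its own group mean. The sketch below computes only the Levene statistic *W* and its degrees of freedom. It assumes you would then compare *W* with an F distribution with (k − 1, N − k) degrees of freedom, using a table or a statistics package; that step is not shown. The function name `levene_statistic` and the sample data are hypothetical.

```python
import statistics


def levene_statistic(*groups):
    """Hypothetical helper: Levene's W (mean-centred version).

    Returns (W, df1, df2). A large W relative to the F(df1, df2)
    distribution suggests the group variances are not equal.
    """
    # Absolute deviation of each observation from its own group mean
    z = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    k = len(z)                               # number of groups
    n_total = sum(len(g) for g in z)         # total number of observations
    z_means = [statistics.mean(g) for g in z]
    z_grand = statistics.mean([v for g in z for v in g])
    # Between-group and within-group sums of squares of the deviations
    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Hypothetical data: similar means, noticeably different spreads
group_1 = [12.1, 11.8, 12.4, 12.0, 12.2]
group_2 = [10.5, 14.2, 9.8, 15.1, 11.0, 13.6]
w, df1, df2 = levene_statistic(group_1, group_2)
print(f"Levene W = {w:.3f} on ({df1}, {df2}) degrees of freedom")
```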

Once the data are collected and the
assumptions of the t-test are satisfied, the means of the two groups
are compared. The calculations for the t-test may be performed by a statistical
data analysis program such as WINKS. Whether there is a statistically significant difference between the two
means is reported as a p-value: the probability of observing a difference at least as large as the one
seen if the two population means were actually equal. Typically, if the p-value is below a chosen significance level
(usually 0.05), the conclusion is that there is a difference between the two group means.
The lower the p-value, the stronger the evidence that the two group means are
different. It is the p-value that is usually reported in journal articles to support a
researcher’s hypothesis concerning the observed outcomes for the two groups.
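To show what a package such as WINKS computes behind the scenes, here is a minimal, dependency-free Python sketch of the equal-variance (pooled) Independent Group t-test. The helper `t_two_sided_p` is a hypothetical function, not a library API. It obtains the two-sided p-value by integrating the Student t density numerically with Simpson's rule, which is accurate enough for illustration but is not how production software computes it. The data are made up.

```python
import math
import statistics


def t_two_sided_p(t: float, df: float, steps: int = 10_000) -> float:
    """Hypothetical helper: two-sided p-value for Student's t (Simpson's rule)."""
    # Log of the normalizing constant of the t density with df degrees of freedom
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps                # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3             # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def pooled_t_test(a: list[float], b: list[float]) -> tuple[float, int, float]:
    """Independent Group t-test assuming equal variances. Returns (t, df, p)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))  # standard error of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / se
    df = n1 + n2 - 2
    return t, df, t_two_sided_p(t, df)


# Hypothetical measurements for two independent groups
treatment = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
control = [4.2, 4.8, 4.5, 4.9, 4.1, 4.6]
t, df, p = pooled_t_test(treatment, control)
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}")
```

A p-value below 0.05 here would lead to the conclusion, under the test's assumptions, that the two group means differ.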

The other commonly used type of t-test is the **Paired t-test**. In this case the
subjects in the two groups are the same or matched. That is, the same subjects are
observed twice, often with some intervention taking place between measures. For example,
the researcher may measure weight or cholesterol levels before and after a treatment has been
applied. One advantage of using the same subjects is that experimental variability is less
than in the independent group case, because differences between subjects drop out of the
comparison. For this test the difference between the two repeated observations is computed
for each subject, and the mean of these differences is compared with zero. If the mean
difference is sufficiently large, there is evidence that the treatment caused some change in the
observed variable. A paired t-test is performed and the observed difference between the
measurements is summarized in a p-value.
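The paired calculation described above can be sketched in a few lines of Python. The sketch computes each subject's after-minus-before difference, then tests whether the mean difference is zero with a one-sample t statistic on n − 1 degrees of freedom. `paired_t_test` and `t_two_sided_p` are hypothetical helpers, the latter being a numerical-integration stand-in for a statistics library. The cholesterol values are invented. If every difference is identical, the standard deviation is zero and the function raises `ZeroDivisionError`.

```python
import math
import statistics


def t_two_sided_p(t: float, df: float, steps: int = 10_000) -> float:
    """Hypothetical helper: two-sided p-value for Student's t (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps                # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def paired_t_test(before: list[float], after: list[float]):
    """Paired t-test. Returns (mean difference, t, df, two-sided p)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [y - x for x, y in zip(before, after)]  # after minus before, per subject
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)     # standard error of the mean difference
    t = mean_diff / se
    df = n - 1
    return mean_diff, t, df, t_two_sided_p(t, df)


# Hypothetical cholesterol levels for six subjects, before and after treatment
before = [210, 195, 230, 188, 205, 220]
after = [198, 190, 221, 185, 199, 210]
mean_diff, t, df, p = paired_t_test(before, after)
print(f"mean change = {mean_diff:.2f}, t = {t:.3f}, df = {df}, p = {p:.4f}")
```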

The benefits of performing a t-test are that it is easy to understand and generally easy
to perform. However, the fact that these tests are so widely used does not make them the
correct analysis for all comparisons. There are a few caveats you should be aware of
before performing these tests. As mentioned earlier, in the Independent Group t-test, for
example, if the variances are not equal, then a variance-stabilizing transformation or a
modification of the t-test should be used: usually **Welch’s t-test** (a
t-test for unequal variances). This version of the Independent Group t-test takes the
difference in variances into account by estimating each group's variance separately and
adjusting the degrees of freedom, which in turn adjusts the p-value. If the data for
either test are not normally distributed, then a different kind of comparison test may
be needed: a nonparametric test. In the case of Independent Groups, the
nonparametric test usually performed is the **Mann-Whitney test**. For paired data that are
not normally distributed, the **Wilcoxon signed-rank test** is usually performed. All of these
tests are available in WINKS.
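The sketch below illustrates two of the alternatives just mentioned. `welch_t_test` keeps a separate variance estimate for each group and uses the Welch-Satterthwaite approximation for the (generally non-integer) degrees of freedom. `mann_whitney_u` computes only the Mann-Whitney U statistic, directly from its definition: the number of (a, b) pairs in which the value from `a` is larger, with ties counting one half. Its p-value, which needs exact tables or a normal approximation, is left to a statistics package such as WINKS. All function names and data are hypothetical, and `t_two_sided_p` is a numerical-integration stand-in for a library routine.

```python
import math
import statistics


def t_two_sided_p(t: float, df: float, steps: int = 10_000) -> float:
    """Hypothetical helper: two-sided p-value for Student's t (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps                # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def welch_t_test(a: list[float], b: list[float]) -> tuple[float, float, float]:
    """Welch's t-test (unequal variances). Returns (t, df, two-sided p)."""
    n1, n2 = len(a), len(b)
    q1 = statistics.variance(a) / n1  # squared standard error of mean(a)
    q2 = statistics.variance(b) / n2  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(q1 + q2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df, t_two_sided_p(t, df)


def mann_whitney_u(a: list[float], b: list[float]) -> float:
    """U statistic for sample a: pairs where a's value is larger; ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical data with clearly unequal spreads
group_1 = [12.1, 11.8, 12.4, 12.0, 12.2]
group_2 = [10.5, 14.2, 9.8, 15.1, 11.0, 13.6]
t, df, p = welch_t_test(group_1, group_2)
print(f"Welch: t = {t:.3f}, df = {df:.2f}, p = {p:.4f}")
print(f"Mann-Whitney U for group_1 = {mann_whitney_u(group_1, group_2)}")
```

When the two groups have equal sizes and equal sample variances, Welch's test gives the same t and degrees of freedom as the pooled test, so little is lost by using it routinely.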

Furthermore, researchers sometimes make the mistake of performing multiple t-tests when
there are more than two groups in their research. Each additional test adds another chance
of a false positive, so the overall error rate climbs well above the nominal 0.05 level.
This destroys the meaning of the individual p-values and leads to erroneous conclusions
about the data. Instead of multiple t-tests, use a statistical approach designed for
multiple-group analysis, namely analysis of variance (ANOVA).
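A short calculation shows why multiple t-tests are a problem. With g groups there are g(g − 1)/2 pairwise comparisons. If those tests were independent, each run at α = 0.05, the chance of at least one false positive would be 1 − (1 − α)^k for k tests. The independence assumption is a simplification, since pairwise tests share data, but it conveys the size of the inflation. The sketch also computes the one-way ANOVA F statistic, which compares all group means in a single test. Its p-value, from the F distribution, is left to a statistics package. The function names and data are hypothetical.

```python
import statistics


def familywise_error(alpha: float, n_groups: int) -> float:
    """Chance of at least one false positive across all pairwise t-tests.

    Assumes (as a simplification) that the pairwise tests are independent.
    """
    k = n_groups * (n_groups - 1) // 2  # number of pairwise comparisons
    return 1 - (1 - alpha) ** k


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic. Returns (F, df_between, df_within)."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, k - 1, n - k


print(f"{familywise_error(0.05, 3):.3f}")  # 3 groups, 3 tests  -> 0.143
print(f"{familywise_error(0.05, 5):.3f}")  # 5 groups, 10 tests -> 0.401

# Hypothetical scores for three groups
f, df1, df2 = one_way_anova_f([4.1, 5.0, 4.6], [5.2, 5.9, 5.5], [6.1, 6.8, 6.3])
print(f"F = {f:.2f} on ({df1}, {df2}) degrees of freedom")
```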

The decision about what comparison test to use for a particular analysis is of vital
importance to making unbiased and correct decisions about your research results.
Professional papers are often rejected when inappropriate tests are performed on research
data. Therefore you should select your analyses with care and consult a professional
statistician if there are any doubts about what kind of analysis to use.