
WINKS Online Manual


Chapter 3  

A Review of Statistical Concepts

Introduction

This chapter discusses some of the statistical concepts used in WINKS. If you are a little rusty on statistical nomenclature or on how to interpret the results of statistical tests, this chapter will provide a review of these concepts. If you are familiar with statistical concepts and tests, you may skip most of this chapter without missing any vital information about using WINKS. Here is an outline of what this chapter contains:

Using Statistics to Analyze Information

Today's world is filled with information. The computer has enabled us to gather and create more information than you can possibly remember and understand. Computers contain information such as company sales, bank balances, opinions on products -- an almost innumerable list of numbers and figures. What can you do with it all?

Usually, you do not want to look at the "raw numbers" that have been collected. There is simply too much to comprehend. What you want is a summary of the information. You want to reduce thousands of numbers into a few explanatory numbers that will give you an idea of what is going on. For example, you could look at the daily sales figures of Mary and William's Lemonade stand (365 numbers), and get some idea of the range of sales, but wouldn't you rather just have a few numbers such as:

Total yearly sales: $12,521
Average monthly: $1,043.42
Lowest month: $543.04 
Highest month: $1,623.21

Perhaps Mary and William actually had two stands. One day they operated on the corner of PENN and BRYAN and on other days they operated at the corner of MAIN and BROAD. They want to know which location is better. Again, you could look at the raw numbers, but you probably would rather know:

Average weekly at PENN and BRYAN was $302.32
Average weekly at MAIN and BROAD was $178.29

Now you have some evidence to make a decision about which place was better.

These two examples illustrate two major aspects of statistical analysis: description and comparison. Another aspect of statistical analysis that is often used is examining the association between variables. For example, you might be interested in the relationship between the high temperature for the day and the amount of sales. You might suspect that the hotter the temperature, the more lemonade sales. Would you rather investigate this by looking at 365 temperatures and 365 sales figures, or would you rather be able to look at one or two numbers that would confirm whether or not this relationship exists? A measure of the strength of the linear relationship between two variables is called correlation. The procedure that allows you to predict the value of a dependent variable given one or more independent variables is regression.

Usually, it is impossible to gather responses from the entire population under investigation. For example, you may wish to investigate the relationship between temperature and sales for all lemonade stands in the city one summer, but do not have the time to keep records on all of them. In such a situation, a random sample is taken and statistical analyses are done on the sample in order to test certain hypotheses about the population. That is, you might randomly choose a few of the stands and analyze their records.

Three ways in which statistics are used to analyze information are description, comparison and association. The procedures in WINKS allow you to summarize information, display it graphically and perform these kinds of analyses on your data. The following sections describe the process of performing a statistical analysis and interpreting your results. Further explanation and examples are found in Chapter 4.

 

Summarizing Information with Statistics

Information comes in a variety of forms. For example, you may have a list of heights of all boys in a PE class. You may also have a count of how many have black hair, how many blond, how many brown and how many red.

There are two very different kinds of information. The first type is often called quantitative or measurement data, since the numbers used in measuring height measure a quantity (where averaging makes sense). The hair color data are often called qualitative data -- color names some quality, but it has no rank or order. Black hair does not come before red hair, etc. Although there are finer ways of describing data, quantitative and qualitative will suffice for this discussion.

 

Describing Quantitative Data

You often hear information from the news media such as . . . the average miles per gallon for a Ford is Z . . . the average height of 10-year-old girls is W inches, and so on. These statistics are all descriptive. They give us an idea of the magnitude of some measure of location, often called the central tendency of the distribution (i.e., of a collection of observations of quantitative values of interest). The arithmetic average is often used as the measure of central tendency, although there are other measures of central tendency that can be used, such as the median or mode.

The measure of central tendency does not give the whole picture. For example, if a reporter says that the average rainfall for June is 5.25 inches, you know that this number is an average and that the actual rainfall in any given June is likely to be lower or higher than this average. Suppose you were also told that the rainfall is usually somewhere between 3 inches and 8 inches. This range of likely rainfalls is called a measure of dispersion. Dispersion gives us some indication as to how close you might expect an occurrence (rainfall in June) to fall to its measure of central tendency. If the weatherman tells us that in most years June rainfall is between 3 and 8 inches, then you would tend to believe that a year with a 12-inch rainfall in June would be a rare event -- but it could happen.

The mean: When using statistics to describe a collection of quantitative observations, you often use the mean as the measure of central tendency and the standard deviation as a measure of dispersion. The mean is also known as the arithmetic average. Thus, if you have 4 numbers:

5, 3, 4 and 4

the mean is calculated by adding up the numbers to get 16, then dividing by 4. The mean for this group of numbers is 4.

The median: Another commonly used measure of central tendency is the median. This number is calculated as the "middle" number in a list of numbers, when the numbers are ranked from smallest to largest. Thus, if you rank our current data, you would get

3, 4, 4, 5

Since there are an even number of data points, the median is the average of the two middle numbers. In this case, the median is

(4 + 4) / 2

or 4 -- which happens to be the same as the mean, but the median is NOT always the same as the mean.

For example, the set of numbers

2,2,3,4,4,5,29

has mean 7 and median 4 (the middle number). Notice that the medians of these two data sets are the same, 4, but the means are different. As illustrated by the second set, the mean is more susceptible to extreme values. Sometimes the median more accurately describes the majority of the data.
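The comparison of the two data sets can be checked with a few lines of Python. This is only an illustrative sketch using Python's standard statistics module; it is not part of WINKS, and the variable names are chosen here for the example.

```python
from statistics import mean, median

first = [5, 3, 4, 4]
second = [2, 2, 3, 4, 4, 5, 29]

for name, values in (("first", first), ("second", second)):
    # median() sorts the values and, for an even count,
    # averages the two middle numbers
    print(f"{name}: mean = {mean(values)}, median = {median(values)}")
```

Both medians come out the same, while the single value 29 pulls the mean of the second set upward.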

The range: In addition to a measure of location, another statistic is often used as a measure of dispersion (to indicate the spread of the data). There are several measures of dispersion to choose from. A very commonly used statistic is the range of the data. The range is the largest number minus the smallest number. For example, in the first data set above, the range is 5 minus 3, or 2. In the second data set, it is 29 minus 2, or 27.

If you know that the mean of the data is 4 and that the range is 2, then you know that the data are "tight" around the mean. However, suppose you were told that a river bed had an average depth of 3.5 feet. Would you wade across? Maybe. Then, what if you were told that the range of depths was 14 feet? That would mean that somewhere in the river, there was a spot well over your head. Now would you walk across? You can see that a measure of dispersion is important in examining the distribution of a set of data.

 

The Standard Deviation: Another very commonly used measure of dispersion is the standard deviation, which measures the typical difference (deviation) of the numbers on a list from their mean. Technically, it is the square root of the average squared deviation. The standard deviation is especially descriptive for a normal distribution, discussed later.
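As a rough illustration, the range and the standard deviation of the two small data sets used earlier can be computed as follows. This sketch uses Python's standard statistics module rather than WINKS. The manual does not say here whether the divisor is n or n - 1, so both versions are shown: stdev divides by n - 1 (the usual choice for a sample) and pstdev divides by n. The helper name data_range is an assumption made for the example.

```python
from statistics import pstdev, stdev

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

first = [5, 3, 4, 4]
second = [2, 2, 3, 4, 4, 5, 29]

for values in (first, second):
    print(values)
    print("  range               :", data_range(values))
    print("  sample SD (n - 1)   :", round(stdev(values), 3))
    print("  population SD (n)   :", round(pstdev(values), 3))
```

Both measures are much larger for the second data set, reflecting the one extreme value.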

Percentiles and Box and Whiskers Plot: Another statistic that is used to help understand the distribution of the data is the percentile. The median is called the 50th percentile, which means that 50% of the data are below the median and 50% are above the median. In the same way the 25th percentile is that number where 25% of the data are below that number and 75% are above. The 75th percentile is similar. Fifty percent (75-25) of the data lie between the 25th and 75th percentiles. Suppose you have the following numbers (already ranked):

1,3,5,5,6,6,6,7,7,7,8,8,8,9,9,9,11,11,12,13,16

and you know the five percentiles described above (the minimum, the 25th, 50th and 75th percentiles, and the maximum) are:

1, 6, 8, 9, and 16

then you know that about 50% of the data fall between 6 and 9, and that the range is 16 - 1 = 15 and that the middle of the data (median) is 8. These five numbers are approximately what make up the Tukey five number summary (see Hoaglin, Mosteller, Tukey, 1983 for actual formulas). You can draw a picture of this information by using a box and whiskers plot as illustrated below:

The left end (whisker) of this plot represents the bottom fourth of the data, the box represents the middle 50%, and the right whisker represents the top fourth of the data. The + locates the median. The median does not have to fall in the center of the box. If the data are skewed (which means that the data are clumped somewhere other than in the middle of the range) then the median (and the box) may be off center. For example, a box plot that looks like the following:

would indicate a distribution where most of the data are clumped together at the low end of the range. This tells us that there are a lot of low numbers, and a few high numbers. Sometimes there are numbers that fall much higher or lower than would be expected. These are called outliers, and are not included in the whiskers, but appear as individual points in the plot. The plot below contains some outliers:

    
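The numbers behind a box and whiskers plot can be sketched in Python as shown below. This is an illustration only, not the WINKS calculation. It assumes one common convention for the two middle boundaries (Tukey's hinges: the medians of the lower and upper halves of the ranked data, with the overall median counted in both halves when the number of values is odd). Conventions for locating the quartiles differ between textbooks and programs, so treat the two middle boundaries it reports as one reasonable choice rather than the only possible answer. The function name five_number_summary is made up for this example.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2           # size of each half; median shared when n is odd
    lower = ordered[:half]
    upper = ordered[n - half:]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

data = [1, 3, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 11, 11, 12, 13, 16]

low, q1, mid, q3, high = five_number_summary(data)
print("five-number summary:", low, q1, mid, q3, high)
print("range:", high - low)
print("width of the box (middle half of the data):", q3 - q1)
```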

Histogram: Another graphical representation of the data is a histogram. A histogram is a bar chart in which the data are organized into groups (intervals of the continuous possible outcomes). For example the data above could be divided into 6 intervals:

Interval    Values in Interval     Number of Values
0-3         1                      1
3-6         3,5,5                  3
6-9         6,6,6,7,7,7,8,8,8      9
9-12        9,9,9,11,11            5
12-15       12,13                  2
15-18       16                     1

(Interval includes lower, but not upper, boundary)

To make a histogram, a bar is drawn for each interval with the heights of the bars representing the number of data values (or frequency) that fall into the intervals. Comparing the bars to one another gives an idea of the proportion of total observations in each interval and thus gives an overall view of the distribution of the data.
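The grouping described above can be sketched in Python. This is a minimal illustration, not the WINKS histogram procedure. It uses the same six intervals of width 3, each including its lower boundary but not its upper one, and prints a crude text histogram. The names width, edges and counts are chosen for the example.

```python
data = [1, 3, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 11, 11, 12, 13, 16]

width = 3
edges = list(range(0, 18, width))      # lower boundaries 0, 3, 6, 9, 12, 15

counts = {lo: 0 for lo in edges}
for x in data:
    lo = (x // width) * width          # x belongs to the interval [lo, lo + width)
    counts[lo] += 1

for lo in edges:
    # one asterisk per observation stands in for the height of the bar
    print(f"{lo:>2}-{lo + width:<2} | {'*' * counts[lo]} ({counts[lo]})")
```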

The shape of the distribution becomes important when you are selecting the kind of statistical analysis to use. For example, many statistical procedures (parametric procedures) expect the data to have a near normal distribution, such as illustrated by the first box plot above.  If the data are far from normal, you might need to use other kinds of statistical procedures (non-parametric procedures).

The Normal Distribution: A normal distribution has a graphical representation shown in Figure 3.1. A distribution curve, such as this, is continuous with the area under the curve equal to one. The higher the curve over a given range of x values, the more likely it is that values in that range will occur. Most people are familiar with the concept of the bell shaped curve. Notice that it is symmetrical, with the mean located at the center of the curve.

 


Figure 3.1

The bell shaped curve illustrates the distribution of a normal population. Most values are clumped at the middle of the range, with the rest trailing off into symmetric tails to both sides. The exact shape of the curve depends on the mean and standard deviation of the distribution. The mean tells where the peak (center) of the curve is located (measure of location or central tendency) and the standard deviation tells how spread out the curve is. A normal distribution with a small standard deviation will be more peaked and one with a large standard deviation will be flatter. A standard normal distribution has mean 0 and standard deviation 1.

 

Using Histograms to Examine Data Distributions 

Figure 3.2 shows a box-and-whiskers plot, a histogram and a distribution curve for three different distributions labeled a, b, and c. The top set of data (a) are near normal. The box-and-whiskers plot has about equal tails and the histogram shows most data in the middle with data trailing off symmetrically in both tails.

The next two distributions (b and c) illustrate various levels of skewness, and how the box-and-whiskers plot and histogram will look. Usually, you can't know the exact distribution of the data being investigated, but using the WINKS descriptive statistics procedure and histogram, you can get a good idea of the distribution and make decisions about what kind of statistic would be appropriate to use to describe the data or to perform a statistical test.

When the data are symmetrical (and near normal), the mean is usually chosen as the statistic to use to measure central tendency. If the data are skewed, the median is often the statistic of choice.  This is because the mean is much more sensitive to extreme values than the median. A few extreme points can pull the mean well away from the main cluster of data. If there is more than one clump of data along the range, then neither of these measures may be as descriptive as desired.

Selecting a Measure of Dispersion

If the data are near normal, the measure of dispersion that is typically used is the standard deviation. This statistic is not simple to calculate (that's why you let the computer do it). The standard deviation can be used to place intervals around the mean where you would expect a certain percentage of the data to fall. For example, in a "normal" set of data, it is known that the range of points that consists of the mean plus or minus one standard deviation contains about 68% of the entire data set. The mean plus or minus two standard deviations contains about 95% of the data. Thus, if you know that the mean of a set of data is 20, and the standard deviation is 3, you can readily predict that the vast majority of the points (about 95%) may be expected to lie between 14 and 26 (20 plus and minus 2 standard deviations). For normally distributed data, reporting the mean and standard deviation (and the sample size) is usually sufficient to describe the distribution of the data.
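The 68% and 95% figures can be checked against an exact normal distribution. The sketch below uses NormalDist from Python's standard statistics module (available in Python 3.8 and later) with the mean of 20 and standard deviation of 3 from the example. It is an illustration only and describes a theoretical normal curve, not any particular sample.

```python
from statistics import NormalDist

mean, sd = 20, 3
dist = NormalDist(mu=mean, sigma=sd)

coverage = {}
for k in (1, 2):
    low, high = mean - k * sd, mean + k * sd
    # area under the normal curve between low and high
    coverage[k] = dist.cdf(high) - dist.cdf(low)
    print(f"mean +/- {k} SD: {low} to {high} covers {coverage[k]:.1%}")
```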

In summary, you can look at a box-and-whiskers plot and a histogram of the data to determine if the data are well approximated by a normal distribution. If there is a question about what kind of test is appropriate, use this criterion: If the data are near normal, use the mean and standard deviation to describe their distribution and use parametric comparison procedures. If the data are non-normal, you may want to describe them with some other measure such as the median, the five-number summary, box-and-whiskers plot or a histogram and use non-parametric comparison procedures.


Application to WINKS

Procedures in WINKS that expect data to be quantitative include:

GRAPHS:
    All graphs except labels, frequencies and grouping variables

DESCRIPTIVE STATISTICS:
    Detailed or summary statistics on a single variable

T-TESTS, ANOVA AND NON-PARAMETRIC GROUP COMPARISONS:
    All variables except grouping variables

SIMPLE AND MULTIPLE REGRESSION AND CORRELATION:
    Dependent variable, some independent variables

SURVIVAL ANALYSIS:
    Survival rates

TIME SERIES ANALYSIS:
    Time Series Observations

QUALITY CONTROL CHARTS:
    All data except counts, labels or grouping variables


Describing Qualitative Data

Another commonly used type of data is qualitative data (sometimes called nominal, categorical or attribute data). These data typically name some attribute such as sex, eye color, pass/fail, yes/no and so forth. The data are usually not ranked. For example, blue eyes are not "greater" or "less" than brown eyes. Some categorical data may have rank, such as socioeconomic class when divided into five groups 1,2,3,4,5. However in this case, group 2 is not twice as "rich" as group 1. The numbers 1,2,3,4,5 are simply convenient labels for groups. Thus means of these data would probably not make sense, so they are treated as qualitative, or categorical data.

A single data set of qualitative or categorical data is often described in terms of frequencies. That is, the analysis describes how many observations fall into each group (category). For example, if you were collecting information on batters where L=left handed, R=right handed and E=either, and the data are:

L L R E L R R R R R E R R R L R R L E R R R R R R

then you might summarize the data with the following information:

Type             Count    Percent
Left-handed      5        20%
Right-handed     17       68%
Either           3        12%
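A frequency table like the one above can be produced with a short Python sketch. It uses Counter from the standard collections module and is only an illustration of the counting, not the WINKS crosstabulations procedure. The dictionary labels is an assumption made here to turn the codes L, R and E into readable row names.

```python
from collections import Counter

observations = "L L R E L R R R R R E R R R L R R L E R R R R R R".split()
labels = {"L": "Left-handed", "R": "Right-handed", "E": "Either"}

counts = Counter(observations)     # how many times each code occurs
total = len(observations)

for code, label in labels.items():
    share = counts[code] / total
    print(f"{label:<14}{counts[code]:>3}{share:>8.0%}")
```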

WINKS will provide the counts and percentages in the crosstabulations procedure. To graphically describe these numbers you could use a bar chart, pictograph, or a pie chart (also in the crosstabs procedure). A bar chart of this batting information from WINKS is illustrated in Figure 3.3.


Figure 3.3

Application to WINKS

WINKS procedures that expect data to be qualitative, or categorical are:

GRAPHS:
    Frequencies or grouping variables

DESCRIPTIVE STATISTICS procedure:
    Grouping variable for summary statistics to calculate statistics by group

T-TESTS, ANOVA, NON-PARAMETRIC COMPARISONS:
    Grouping variables for independent group analysis.
    Cochran's Q test (must be dichotomous)

CROSSTABULATIONS procedure:
    Frequencies and Crosstabulations analysis
    McNemar's test (must be dichotomous)
    Goodness of Fit

SURVIVAL ANALYSIS:
    Grouping variable for survival analysis and censoring variable

QUALITY CONTROL:
    p-Chart


Investigating Associations Between Variables

The WINKS procedures used to investigate linear relationships between quantitative variables are regression and correlation. Crosstabulations, or contingency tables, are used for this purpose for qualitative variables.

Describing a Linear Relationship Between Quantitative Variables

You may be interested in examining the relationship between two quantitative variables rather than just looking at one variable independent of another. For example, going back to the lemonade stand, you might be interested in examining the relationship between temperature and amount of sales. Your data might look something like this:

    Temp    Sales
    80      203
    83      210
    90      291
    78      170
    64      91
    99      378
    etc.    etc.

You might surmise from browsing through the data that hot days bring more sales (in general). However, if the data were not as obvious, you may want to calculate a number that would summarize this information, and you would probably like to draw a picture (a scattergram) to visually look at the trend.

Pearson Correlation Coefficient: If the data for temperature and sales are quantitative and approximately normally distributed and the relationship is linear, then the statistic that would tell you the strength of the linear relationship is Pearson's r (the correlation coefficient). This statistic ranges from -1 to 1. If the number is close to 1 or -1, it means that there is a strong association or correlation between the two variables. If the correlation coefficient is close to 0, it means that there is a weak association or no correlation. In the case of the data above, the correlation is 0.92. This is a high positive correlation and tells us that there is strong association or correlation between temperature and sales. As temperature goes up, so do sales. A negative correlation coefficient (for example -0.87) would imply that as the value of one variable increased, the value of the other decreased.
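To show how Pearson's r is computed, here is a minimal Python sketch written out by hand. It is not the WINKS procedure, and it uses only the six temperature and sales pairs listed in the table (the rest of the data set is elided with "etc."), so its result should not be expected to match the 0.92 quoted for the full data. The function name pearson_r is chosen for this example.

```python
from math import sqrt

# Only the six pairs shown in the table; the full data set is not listed.
temp = [80, 83, 90, 78, 64, 99]
sales = [203, 210, 291, 170, 91, 378]

def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(temp, sales)
print(f"Pearson r for the six listed days: {r:.3f}")
```

A value near 1 for these six days agrees with the impression from browsing the table that hotter days bring more sales.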

Spearman's Correlation: If the data you are using are not close to normal, the correlation coefficient you should use is Spearman's rs. This coefficient does not make the assumption that the data are normally distributed. Ranks of data, rather than data values themselves, are used to calculate Spearman's rs. Spearman's coefficient also ranges from -1 to 1, and is interpreted similarly to Pearson's coefficient. A value of 1 results from a perfect direct correlation of the ranks of the data, and a value of -1 shows perfect inverse correlation of the ranks.
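The idea of correlating ranks rather than raw values can be sketched in Python as follows. This is an illustration under stated assumptions, not the WINKS calculation: tied values are given the average of the ranks they occupy (a common convention), and Spearman's rs is taken to be the Pearson correlation of the two sets of ranks. The function names ranks, pearson_r and spearman_rs are made up for the example, and only the six listed temperature and sales pairs are used.

```python
from math import sqrt

def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman's rs computed as Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))

temp = [80, 83, 90, 78, 64, 99]
sales = [203, 210, 291, 170, 91, 378]
rs = spearman_rs(temp, sales)
print(f"Spearman rs for the six listed days: {rs:.3f}")
```

For these six days the hottest day has the highest sales, the second hottest the second highest, and so on, so the ranks agree exactly.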

Scatterplot: The correlation coefficient is usually not sufficient to tell the whole story about the relationship between two variables. Two variables may have a high degree of association but not in a linear relationship, in which case Pearson's r is not an appropriate summary. To check for linearity, it is always recommended that you examine a plot of the data, where each point is plotted on a graph having one variable on the horizontal axis, and the other variable on the vertical axis (such as temperature by sales). This is called a scatterplot. Figure 3.4 illustrates what that might look like for the lemonade stand data. The scatterplot visually shows you the relationship between the two variables. If there is a strong linear relationship, the scatter will be in an approximate straight line. The more widely scattered, the smaller the correlation coefficient will be. Also, the scatterplot helps you detect points that do not fit the general pattern. A point that falls in an area where there are no other points may indicate a data entry error or an "outlier" -- an unusual point not expected. For example, if Mary and William went on vacation in the summer, there might be a point where the day was very hot, but there were no sales -- this would show up as an unusual point on the scatterplot.


Figure 3.4

The correlation coefficient and the scatterplot combined will usually give you an indication about the strength of the linear relationship between two variables. WINKS provides a correlation calculation procedure in the Regression and Correlation procedure, which allows you to calculate a Pearson's and Spearman's correlation coefficient and to produce a scatterplot. The Descriptive Statistics and Graphs procedure also allows you to display a scatterplot.

Regression: Another procedure used to investigate the linear relationship between two quantitative variables is regression. Regression differs from correlation in that regression assumes that one of the variables is dependent on the other, while correlation treats both variables alike, with neither designated as dependent. It may be that both are influenced by other factors and correlation will tell you whether they tend to be associated with each other without assuming that one causes the other.

Regression is useful for predicting responses for the dependent variable given the value(s) of the independent variable(s) within the range of the data, provided the necessary assumptions are met. For example, if appropriate, regression would allow you to predict lemonade sales for a given temperature.

Like correlation, regression assumes that the relationship between the variables is linear. Regression procedures also assume that the values of the independent variable (X) are fixed (without error) and that, for a fixed X value, the population of Y values (values of the dependent variable) is normally distributed and that all these normal distributions have equal variances (i.e., the variation in Y for a given X is equal for all X's). You can check these assumptions using residual (i.e., difference between observed and estimated Y) plots. If the residuals plotted against an independent variable show a pattern other than a horizontal band of points randomly scattered about zero, these assumptions may be violated.
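A least squares line, a prediction and the residuals can be sketched by hand in Python. This is an illustration only, not the WINKS regression procedure, and it again uses just the six temperature and sales pairs listed earlier, so the fitted numbers describe those six days and nothing more. The function name fit_line and the choice of 85 degrees as the temperature to predict for are assumptions made for the example.

```python
temp = [80, 83, 90, 78, 64, 99]          # independent variable (X)
sales = [203, 210, 291, 170, 91, 378]    # dependent variable (Y)

def fit_line(x, y):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(temp, sales)

# Predict only inside the observed temperature range (64 to 99).
predicted_at_85 = intercept + slope * 85

# Residual = observed Y minus estimated Y for each day.
residuals = [y - (intercept + slope * x) for x, y in zip(temp, sales)]

print(f"sales = {intercept:.1f} + {slope:.2f} * temp")
print(f"predicted sales at 85 degrees: {predicted_at_85:.0f}")
for x, e in zip(temp, residuals):
    print(f"temp {x}: residual {e:+.1f}")
```

Plotting the residuals against temperature and looking for a pattern is the check described above. With only six points the plot says little, but the same steps apply to a full data set.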

A t-test is used in simple linear regression analysis to test for significance of the slope of the regression line, the line of least squares drawn through the scatterplot. This t-test checks whether the slope of the regression line is zero, a test equivalent to whether the correlation is significant. These tests tell you whether there is or is not a statistically significant linear relationship between the variables.
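The equivalence between the test of the slope and the test of the correlation can be illustrated numerically. The sketch below is not the WINKS calculation. It uses the standard textbook formulas (slope divided by its standard error, and r times the square root of n - 2 divided by the square root of 1 - r squared) on the six listed temperature and sales pairs, and shows that the two t statistics coincide. To judge significance, the statistic would be compared with a t distribution with n - 2 degrees of freedom, which is not done here.

```python
from math import sqrt

temp = [80, 83, 90, 78, 64, 99]
sales = [203, 210, 291, 170, 91, 378]

n = len(temp)
mean_x, mean_y = sum(temp) / n, sum(sales) / n
sxx = sum((x - mean_x) ** 2 for x in temp)
syy = sum((y - mean_y) ** 2 for y in sales)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temp, sales))

# t statistic for the slope: slope divided by its standard error
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(temp, sales))
se_slope = sqrt(sse / (n - 2) / sxx)
t_slope = slope / se_slope

# t statistic for the correlation coefficient
r = sxy / sqrt(sxx * syy)
t_corr = r * sqrt(n - 2) / sqrt(1 - r * r)

print(f"t from slope: {t_slope:.4f}   t from correlation: {t_corr:.4f}")
```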


Application to WINKS

Correlation and regression procedures in WINKS that expect data to be of the quantitative form are:

DESCRIPTIVE STATISTICS:
    Scatterplot

REGRESSION AND CORRELATION:
    Pearson's and Spearman's correlation coefficient
    Regression Analysis
    Correlation Matrix


Describing a Relationship Between Qualitative Variables

If you are describing two pieces of qualitative information for each observation, you report the information with a frequency table. For example, if you checked the players of a softball team for batting side preference and gender you might report:

                Male    Female    Total
Left Handed        3         2        5
Right Handed      10         7       17
Either             2         1        3
Total             15        10       25

A graph of this information could be created by using the WINKS 3 dimensional bar chart, as illustrated in Figure 3.5.

 
Figure 3.5

The 3-D graph allows you to visualize the relationship between the two variables and may help you see important patterns in the data. The WINKS Crosstabulation procedure allows you to create a table of counts between two variables and will allow you to create a 3-D bar chart of the resulting crosstabulation table. In addition to describing and displaying information about data in categories, a chi-square test procedure can be used to test whether there is a statistically significant association (see "Using Statistics to Make Comparisons") between the two variables or if they are independent of each other. A chi-square test assumes that observations are independent of one another and that each observation can be assigned to one and only one category.
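The arithmetic behind a chi-square test of association can be sketched with the softball table. This is an illustration of the usual textbook calculation (expected count = row total times column total divided by the grand total), not the WINKS procedure. Note that several expected counts in this small table fall below 5, a situation in which the chi-square approximation is generally considered unreliable, so the sketch is meant to show the steps and not to support a conclusion about these players.

```python
# Observed counts: (Male, Female) for each batting preference
observed = {
    "Left Handed": (3, 2),
    "Right Handed": (10, 7),
    "Either": (2, 1),
}

row_totals = {row: sum(counts) for row, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(row_totals.values())

chi_square = 0.0
for row, counts in observed.items():
    for j, count in enumerate(counts):
        # count expected in this cell if the two variables were independent
        expected = row_totals[row] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(col_totals) - 1)
print(f"chi-square = {chi_square:.4f} with {degrees_of_freedom} degrees of freedom")
```

The statistic is very small here because the observed counts are almost exactly what independence would predict.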

For more information about analyzing this type of relationship, see the discussion of Crosstabulation analysis later in Chapter 4.

Application to WINKS

Association procedures in WINKS that expect data to be of the qualitative (categorical) form are:

FREQUENCIES AND CROSSTABULATIONS procedure:
    Crosstabulations, Chi Square

ADVANCED ANOVA procedure (WINKS PROFESSIONAL):
    Factors (Grouping Variables) in ANOVAs

 

Using Statistics to Make Comparisons

The previous sections discuss how statistics are used to summarize information into a few descriptive numbers and to examine associations between variables. Another common use of statistics is to help you make decisions about comparisons. For example, medical research is often interested in developing new ways of treating illnesses. A researcher may want to compare the effectiveness of one medicine against another. Usually, this results in an experiment where subjects are randomly assigned to two groups. For example, one group is given medicine 1 and the other is given medicine 2. Information is collected about the effect of the medicine on each individual. The purpose of the test medicine is to relieve headaches. If medicine 1 relieved headaches in an average of 33 minutes and medicine 2 relieved headaches in an average of 32.5 minutes, is there enough evidence to say that medicine 2 is "better?" Is a half a minute "significant?" Or, is the difference merely due to chance? A comparative statistical analysis is designed to allow you to answer these kinds of questions with some idea about the strength of the evidence on which your decision is based.

A variety of statistical tests are available for making comparisons of measures of location (mean or median) depending on the type of data and the number of treatment groups being compared. A single sample t-test is used for comparing a sample average to a known or hypothesized population mean. A two sample t-test is used to compare the means of two independent groups, and analysis of variance (ANOVA) is used to determine the existence of differences among the means of two or more independent groups. If the data are paired, a paired t-test is used (actually a single sample t-test on the average difference between observations in a pair). A repeated measures ANOVA is used for repeated measures on more than two groups. Multiple comparison procedures detail comparisons of more than two means, providing information about where differences lie.

Non-parametric procedures are available for comparing groups whose distributions cannot be assumed to be normal or where assumptions of equal variances are not met. WINKS uses the Mann-Whitney procedure for two independent groups, the Kruskal Wallis for more than two independent groups, and Friedman's test for repeated measures. These non-parametric procedures use the ranks of the data values, rather than the data values themselves, and are comparisons of medians, rather than means, of the groups. If data are dichotomous repeated measures, Cochran's Q and McNemar's tests are used to compare proportions of "successes" in the groups.

Performing a Statistical Test

As noted above and throughout this tutorial, statistical tests are used in a variety of situations to test hypotheses about equality of means, equality of variances, significance of correlation or slope of the regression line, equality of proportions in categories (i.e., significance of association between categorical variables), and equality of medians.

There is a standard method for using statistics to make decisions. All of the statistical tests in WINKS can be interpreted using the method discussed here. Generally, after checking that the appropriate assumptions are met, the steps in using a statistical test to make a decision are the following:

1. State a null hypothesis (Ho) (and usually an alternative hypothesis (Ha)).

2. Perform an analysis to test the hypothesis (the statistical test).

3. Interpret the test and make a decision, using a decision criterion based on how probable the observed result would be if the null hypothesis were true.


Stating the Hypotheses

A null hypothesis (sometimes called an hypothesis of no difference) usually states just the opposite of what you hope to show or suspect is true about the population parameters in question. The reason for this is that a statistical test results in a decision to reject or fail to reject the null hypothesis and it is usually considered best to make it difficult (by requiring sufficient evidence) to reject the null hypothesis. This way, accepting a change requires a significant amount of evidence. Changes or differences that may be due simply to chance rather than treatment differences are not easily accepted.

For example, in the headache medicine example the hypotheses are set up to assume that the new medicine is no better (or worse) than the old, and evidence is required to decide otherwise. To obtain this evidence or determine a lack of it, the experiment described above is done. As usual, it is impossible to test the medicines on the entire population of potential patients, so sample groups are tested. The observations on the samples are then used to test the hypotheses about the populations.

The null hypothesis might be stated as:

Ho: There is no difference between the mean times to relief in patients using medicine 1 and those using medicine 2.

The alternative hypothesis states the conclusion you will make if there is enough evidence to reject the null hypothesis. An alternative hypothesis might be stated as:

Ha: There is a difference between mean time to relief for medicine 1 and medicine 2.

That is, the mean time to relief is different for the medicines.

This kind of alternative hypothesis is called two-sided because it allows for differences in either direction -- medicine 1 could be better OR worse than medicine 2. You could also have a one-sided hypothesis that the mean time to relief for medicine 2 is better (shorter) than that for medicine 1.
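
In symbols, the hypotheses for the headache medicine example can be sketched as follows. The notation is an assumption introduced here for illustration: mu_1 and mu_2 stand for the population mean times to relief for medicine 1 and medicine 2.

```latex
% Two-sided test: a difference in either direction counts as evidence.
H_0:\ \mu_1 = \mu_2 \qquad \text{versus} \qquad H_a:\ \mu_1 \neq \mu_2

% One-sided test: only a shorter mean time to relief for medicine 2 counts.
H_0:\ \mu_1 = \mu_2 \qquad \text{versus} \qquad H_a:\ \mu_2 < \mu_1
```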


Performing the Analysis

How do you decide if you have evidence to reject the null hypothesis, and thus be able to show evidence for the alternative? That is, how do you use statistics to decide which of the two hypotheses to choose? The statistical test is the tool you use to make this decision.

To test an hypothesis about a population parameter (e.g., the mean), a test statistic is calculated which compares the observed data (e.g., sample average) and the expected value of the population parameter when the null hypothesis is true. If the difference between observed and expected values is large, that is, if the test statistic is extreme, it is taken as evidence to suggest that the null hypothesis is not true. If the test statistic is extreme enough, the null hypothesis is rejected and the alternative hypothesis accepted. How extreme the test statistic must be to reject the null hypothesis depends on the chosen significance level (alpha-level) of the test.

Given a significance level, and having stated an alternative hypothesis, a critical region is defined. The critical region is the range of values of the test statistic that will cause you to reject the null hypothesis. See figure 3.6.

 
Figure 3.6

When the null hypothesis is true, the test statistic follows the distribution from which the test takes its name, for example, Student's t distribution (t-test), normal distribution (z-test), F distribution (F-test) or chi-square distribution (chi-square test). Extreme values of the test statistic (those far enough from the center to cause rejection of the null hypothesis) will fall in the tail(s) of the distribution curve. (Thus, tests are sometimes called one-tailed or two-tailed, depending on whether the alternative hypothesis is one-sided or two-sided.) The significance level determines the size of the rejection region (the critical region), that is, how much of the tail area is extreme enough to cause rejection. If the alternative hypothesis is one-sided and alpha is 0.05, the rejection region is the most extreme 5% of the area under the distribution curve in one tail. If the test is two-tailed, alpha is divided between the two tails and each tail's portion of the rejection region has area 0.025. Thus, if the test statistic falls in the most extreme 100 x alpha percent of the distribution (the critical region), the null hypothesis is rejected. The least extreme value of the test statistic for which rejection occurs is called the critical value.

Figure 3.6 shows a Student's t distribution with 30 degrees of freedom. For a two-tailed test with a significance level of 0.05, the critical values are 2.042 and -2.042 and the critical region is the area (values of t) "outside" these critical values, that is, to the right of 2.042 and to the left of -2.042. The area of the critical region (the shaded areas combined) is 0.05, the significance level (alpha level) of the test.

A calculated t-statistic of 1.34, which falls between the critical values, is not in the critical region, and therefore does not lead to rejection of the null hypothesis. By contrast, a calculated t-statistic of 2.91 falls in the critical region and therefore indicates rejection of the null hypothesis.

If your alternative hypothesis is one-sided, care must be taken with respect to the sign of the test statistic. The direction in which a test statistic must be extreme in order to signal rejection of the null hypothesis depends on the alternative hypothesis.
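
The critical-region rule can be sketched in a few lines of Python. This is only an illustration, not part of WINKS: the function name and the labels "two-sided", "greater" and "less" are assumptions made here, and the critical value is supplied by you (for example, from a t table).

```python
def in_critical_region(t_stat, critical_value, alternative="two-sided"):
    """Return True if t_stat falls in the critical region.

    critical_value is the positive critical value taken from a table.
    alternative is a hypothetical label for the form of Ha:
      "two-sided" : reject for large positive or large negative t
      "greater"   : reject only for large positive t
      "less"      : reject only for large negative t
    """
    if critical_value <= 0:
        raise ValueError("critical_value must be positive")
    if alternative == "two-sided":
        return abs(t_stat) >= critical_value
    if alternative == "greater":
        return t_stat >= critical_value
    if alternative == "less":
        return t_stat <= -critical_value
    raise ValueError("unknown alternative: " + alternative)


# Two-tailed test, alpha = 0.05, 30 degrees of freedom (critical value 2.042).
print(in_critical_region(1.34, 2.042))   # between the critical values
print(in_critical_region(2.91, 2.042))   # beyond the upper critical value

# One-sided test: the sign of the statistic matters.
# 1.697 is the one-tailed 0.05 critical value for 30 D.F. from a standard t table.
print(in_critical_region(2.91, 1.697, "less"))
print(in_critical_region(-2.91, 1.697, "less"))
```

In the one-sided calls, the same magnitude of t leads to different decisions depending on its sign, which is the point made in the paragraph above.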

When the calculated test statistic is compared to the distribution of the test statistic under the null hypothesis, the probability of obtaining that value (or a more extreme value) is found. This probability is called the p-value. The p-value ranges from a minimum of 0.0 to a maximum of 1.0. If the p-value associated with a test is small, there is evidence to reject the null hypothesis and accept the alternative. Many people use the value of 0.05 for the significance level to decide if they will reject the null hypothesis. That is, if the p-value for a statistical test is 0.05 or less, they reject the null hypothesis and conclude that there is evidence to support the alternative hypothesis. In WINKS, most procedures report both the value of the test statistic and the p-value. For example, you might see the following results for a t-test:

t = 2.06   D.F. = 30  p = .048


Interpreting the Test and Making a Decision

In the t-test reported above, the t-statistic was calculated to be 2.06. The degrees of freedom for the test (D.F., a value that defines the shape of the t-distribution and is related to the sample size) is 30 and the p-value is .048. A decision can be made by comparing the calculated test statistic to the critical value, which is determined by the significance level, sample size, and hypotheses, and is available from tables of the probabilities associated with the values of the distribution. A test statistic more extreme than the critical value points to rejection of the null hypothesis because it means that the observed mean (or observed difference of means) is too different from the hypothesized mean (or hypothesized difference of means).

The 0.05 alpha-level critical value for a two-tailed t-test with 30 degrees of freedom is 2.042. That is, when the two population means are equal as hypothesized (under the null), there is a 5 in 100 chance of obtaining a t-statistic greater than 2.042 or less than -2.042. Thus, if the test statistic is greater than 2.042 or less than -2.042, the null hypothesis is rejected since there is only a small chance (less than 5 in 100 when the null hypothesis is true) that you would obtain evidence as strong or stronger against the null. In this case, the t-statistic is 2.06, greater than 2.042, so the null hypothesis is rejected.

Alternatively, a decision can be reached using the p-value. A low p-value points to rejection of the null hypothesis because it indicates how unlikely it is that a test statistic as extreme as or more extreme than the one given by these data will be observed in a sample of this size from this population if the null hypothesis is true. In this case p=0.048. This means that if the population means are equal as hypothesized (under the null), there is a 48 in 1000 chance of obtaining a test statistic at least this extreme. That is, the difference between these observed averages is so far from zero that the chance of getting a difference at least this far from zero is only 48 in 1000.

Is 48 in 1000 small enough? That is, does p=0.048 indicate enough evidence against the null hypothesis to reject it? It is a researcher's judgement what significance level to use. In this case, if a significance level of 0.05 is used, the p-value of 0.048 is small enough (less than 0.05) to say there is sufficient evidence against the null hypothesis. The t-statistic associated with 0.048 (2.06) is more extreme than the t-statistic associated with 0.05 (2.042). See Figure 3.6.

Perhaps a more stringent criterion is necessary, say a significance level of 0.01. Then the p-value of 0.048 is too big (greater than 0.01) to call for rejection of the null hypothesis. The t-statistic associated with 0.048 is 2.06, which is less extreme than the t-statistic associated with 0.01. The critical value for a two-tailed t-test with alpha = 0.01, with 30 degrees of freedom, is 2.75.

Refer to the t-distribution graph in Figure 3.6 and notice the t-values marked 2.75 and -2.75 in the tails of the distribution. The p-value associated with these t-values is p=0.01.
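
If you want to check these figures without a table, the sketch below approximates the two-tailed p-value and the critical values for Student's t distribution by numerical integration, using only the Python standard library. It is an illustration under stated assumptions, not the method WINKS uses: the function names are hypothetical, and Simpson's rule and bisection are simply convenient choices.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus the Simpson's-rule area from 0 to t."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def two_tailed_p(t_stat, df):
    """Probability of a t-statistic at least as far from zero as t_stat."""
    return 2 * upper_tail(abs(t_stat), df)


def two_tailed_critical_value(alpha, df):
    """Positive t whose two-tailed p-value equals alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid   # not extreme enough yet
        else:
            hi = mid
    return (lo + hi) / 2


p = two_tailed_p(2.06, 30)
crit_05 = two_tailed_critical_value(0.05, 30)
crit_01 = two_tailed_critical_value(0.01, 30)

print("p-value:", round(p, 3))
print("critical value, alpha 0.05:", round(crit_05, 3))
print("critical value, alpha 0.01:", round(crit_01, 3))
print("reject at 0.05?", p <= 0.05)
print("reject at 0.01?", p <= 0.01)
```

The results should agree closely with the values quoted in the text (p of about .048, critical values of about 2.042 and 2.75), and the two decisions match the discussion above: reject at the 0.05 level, but not at the 0.01 level.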

Suppose a significance level of 0.05 is used. You will reject the null hypothesis and conclude that there is evidence that the alternative hypothesis is true. If this had been the result of the headache medicine experiment, then you could have rejected the null hypothesis that the medicines were equal and decided that the medicines had different mean times to relief.

Since you are comparing only two groups, you can then look at the sample means to see which is preferable, knowing the two have been found by this statistical test to be significantly different.

P-values are convenient in that you don't need a table to find the critical value to compare to the test statistic. They are also useful in that they provide more than a simple reject/fail to reject decision. Comparing the p-value to the alpha level gives a sense of the strength of the evidence against the null hypothesis. Readers learn more than whether there is or is not strong evidence against the null; they can see for themselves how strong the evidence actually is and use their own judgment in making a decision.


Interpreting Multiple Comparisons

In analysis procedures that compare three or more groups, a multiple comparison test is often performed to identify pairs of groups that are statistically different from each other at a particular significance level (alpha level). There are a number of multiple comparison procedures in existence. WINKS uses a procedure called the Newman-Keuls procedure. This procedure makes pairwise comparisons of groups (usually comparing the means or mean ranks) and specifies which comparisons are statistically different at a particular alpha level (WINKS uses 0.05).

For example, in a One-Way Analysis of Variance with four groups, the test statistic (F-test) will tell you if there is evidence that the means of the four groups are different. However, this does not tell you which group means are less than or greater than other group means -- in other words, you do not know where the differences lie. In WINKS, the multiple comparison procedures produce a graph that tells you where the differences (if any) lie. Consider the following graph produced by WINKS from a Newman-Keuls multiple comparison procedure (comparing means of 4 groups):

    Gp  Gp  Gp  Gp
     1   2   3   4
            -------
    ----
        ----

The group means are displayed in increasing order. (The mean of group 1 is smallest and that of group 4 is largest.) Any two groups underscored by the same line are not significantly different at the 0.05 level of significance. Look closely at the graph. The top line refers to the first set of groups whose means do NOT differ. In this case, the means of groups 3 and 4 are not statistically different. No other pair of group means is joined by a common line, so all other pairwise comparisons are statistically different. Thus, you can say:

• The mean for group 1 is less than the means of all other groups.

• The mean for group 2 is greater than the mean of group 1 and less than the means of groups 3 and 4.

• The means for groups 3 and 4 do not differ from each other, but both are greater than the means of groups 1 and 2.

From this information, you should be able to make decisions about your experiment.
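
The rule for reading the graph can be sketched in Python. This is an illustration only: the function name and the way the underlines are encoded (one set of group numbers per line of the graph) are assumptions made for this example, not a WINKS file format.

```python
from itertools import combinations


def significant_pairs(ordered_groups, underlines):
    """Return the pairs of groups that are NOT joined by a common underline.

    ordered_groups : group labels in order of increasing mean
    underlines     : one set of group labels per line drawn under the graph
    """
    pairs = []
    for a, b in combinations(ordered_groups, 2):
        joined = any(a in line and b in line for line in underlines)
        if not joined:
            pairs.append((a, b))   # mean of a is significantly less than mean of b
    return pairs


groups = [1, 2, 3, 4]              # displayed in increasing order of mean
underlines = [{3, 4}, {1}, {2}]    # the three lines in the graph above

different = significant_pairs(groups, underlines)
for low, high in different:
    print("Group", low, "mean is significantly less than group", high, "mean")
```

Every pair except groups 3 and 4 is reported as different, which matches the conclusions listed above.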


Choosing the Right Procedure to Use

There are many statistical tests, each designed to fit a particular type of experimental setup. WINKS contains the statistical tests most commonly performed in research. The following discussion will help you decide if your data fit one of the analysis procedures within WINKS.

Some of the information you should know about your data in order to choose the correct test is the following:

1. Is your purpose to

• describe the data

• compare groups of data to make decisions

• examine the association between variables for prediction or forecasting?

2. Are the data quantitative or qualitative (nominal/categorical)?

3. If the data are quantitative, is the distribution of the data approximately normal? Or, is the sample size large enough that the Central Limit Theorem will allow a normality assumption?

4. If you are comparing groups, are the groups independent or repeated measures?

Independent groups are two or more groups that consist of "subjects" randomly assigned to the group, such that members of any group are not related to members in any other group or to members within their own group. That is, observations are all mutually independent.

Repeated measures or paired data means that the data are collected on the same or related subjects. Observations on one subject are, however, independent of observations on any other subject. Examples would be before and after measures (before weight and after weight in a diet study, for example) on the same subject or observations over time on the same entity (blood pressure readings on a patient every 6 weeks for a year, for example).

First, find the section of this chapter that matches the type of analysis you want to perform:

• Methods of Describing Information

• Methods of Comparing Groups

• Methods of Relating Variables

Then, within that section, decide on the type of variables being used and, if appropriate, whether the data are independent or repeated measures.


Methods of Describing Information

Data are near normally distributed - Use the WINKS Descriptive Statistics and Graphs procedure, Detailed statistics on a single variable and summary statistics on a number of variables option. The most common statistics used to describe the data are the mean and standard deviation. Graphs used to describe the information are the box and whiskers plot and histogram.

Data are quantitative, but not normally distributed - Use the WINKS Descriptive Statistics procedure, Detailed statistics on a single variable option. The most common statistics used to describe the data are the median and the Tukey five-number summary. Graphs used to describe the information are the box and whiskers plot and histogram.

Data are qualitative/categorical - Use the WINKS Crosstabulations, Frequencies, Chi-Square procedure, Frequencies, Pie Chart option. The most common method of reporting is a frequency table that includes percents of total for each category. Graphs used to describe the information are the bar chart, pictograph or pie chart.

Data are quantitative, observed over time or in a sequence - Use the WINKS Descriptive Statistics and Graphs procedure's Time series plot to draw a graph of the data.

Describing a linear relationship between two normally distributed quantitative variables - Use the WINKS Regression and Correlation procedure, the Correlation option to obtain Pearson's correlation coefficient. Graph the variables in this procedure or in the WINKS Descriptive Statistics procedure, the scatterplot option.

Describing a linear relationship between two non-normally distributed quantitative variables - Use the WINKS Regression and Correlation procedure, the Correlation option to obtain Spearman's correlation coefficient. Graph the variables in this procedure or in the WINKS Descriptive Statistics procedure, the scatterplot option.

Describing an association between two categorical variables - Use the WINKS Crosstabulations, Frequencies, Chi-Square procedure, the Crosstabulation, Chi-Square option. Print out the results of the table, which will produce a percentages table. Graph the variables as a bar chart with two data fields.


Methods of Comparing Groups

COMPARING INDEPENDENT GROUPS:

Comparing two independent groups/ data are quantitative, near normal - Use the WINKS t-test and ANOVA procedure. When comparing two groups, the procedure used will be the t-test. Two versions of the t-test are reported, an equal variance version and an unequal variance version. First, determine the result of the test for equality of variance using the p-value. Then, use the appropriate t-test. The null hypothesis for the t-test is that the means of the two groups are equal. If the p-value for the t-test is low (e.g., p is less than 0.05), then there is statistical evidence to conclude that the means of the two groups are different. See Chapter 4, "Using t-tests and ANOVA Procedures" for an example of this procedure.
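
To make the two versions of the t-test concrete, here is a minimal Python sketch of how each t-statistic and its degrees of freedom are commonly computed. Several things are assumptions for illustration: the function names, the made-up times to relief, and the use of the Welch-Satterthwaite approximation for the unequal variance version (this manual does not state which formula WINKS uses).

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Equal variance version: pool the two sample variances."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def unequal_variance_t(x, y):
    """Unequal variance version (Welch-Satterthwaite approximation, assumed)."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical times to relief in minutes; not data from the manual.
medicine_1 = [32.0, 41.0, 38.0, 45.0, 36.0, 40.0]
medicine_2 = [29.0, 35.0, 31.0, 38.0, 27.0, 33.0]

t_equal, df_equal = pooled_t(medicine_1, medicine_2)
t_unequal, df_unequal = unequal_variance_t(medicine_1, medicine_2)
print("equal variance:   t =", round(t_equal, 3), " D.F. =", df_equal)
print("unequal variance: t =", round(t_unequal, 3), " D.F. =", round(df_unequal, 1))
```

When the two samples are the same size, as here, the two t-statistics coincide and only the degrees of freedom differ, so the choice between versions matters most when the group sizes and variances are both unequal.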

Comparing two independent groups/ data are quantitative but not normal - Use the Non-parametric Comparison Tests procedure, the Independent Groups option. The test performed is the Mann-Whitney test. The null hypothesis for this test is that the distributions of the two groups are the same. See Chapter 4, "Using Non-Parametric Procedures" for an example of this test.
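
The idea behind the Mann-Whitney test can be sketched as follows: count, over every pairing of one observation from each group, how often the first group's value is larger (ties count one half). The function name and the sample values are assumptions for illustration, and the sketch computes only the U statistics, not the p-value that WINKS reports.

```python
def mann_whitney_u(x, y):
    """Return (U for x, U for y), counting ties as one half."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x   # the two U values always sum to n1 * n2
    return u_x, u_y


# Hypothetical observations; not data from the manual.
group_a = [12, 15, 11, 19, 14]
group_b = [22, 17, 25, 18, 30]

u_a, u_b = mann_whitney_u(group_a, group_b)
print("U for group A:", u_a, " U for group B:", u_b)
```

If the two distributions are the same, the two U values should be similar. A very small U for one group means that almost all of its values fall below those of the other group.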

Comparing more than two independent groups/ data are quantitative, near normal - This analysis is an extension of the two independent groups test. Use the WINKS t-test and ANOVA procedure, and choose the compare independent groups option. When there are 3 to 10 groups, WINKS performs a one-way analysis of variance (ANOVA). Data for this analysis typically come from an experiment in which subjects are randomly assigned to three or more groups, where each group is then observed under some unique condition. An example would be a group of pigs that were randomly divided into a control group (regular feed), a group fed with new feed #1 and a group fed with new feed #2. The pigs were observed for 6 weeks, and the weight gain was recorded for each pig. The question to be answered is, "Is one feed superior to the others in producing weight gain in pigs?" The null hypothesis would be that all feeds produce the same mean weight gain. When this analysis is performed, an analysis of variance table is produced, which contains an F-test and reports a p-value. If the p-value for the ANOVA is small (e.g., less than 0.05), then there is evidence to conclude that a difference exists. WINKS then performs a multiple comparison procedure to help you isolate which feed is best. See Chapter 4, "Using t-tests and ANOVA Procedures" for an example of this test.
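
The F-test in the analysis of variance table compares the variation between group means with the variation within groups. The sketch below shows the standard one-way ANOVA calculation in Python. The function name and the weight gains are made up for illustration, and the sketch stops at the F-statistic and its degrees of freedom rather than reproducing the full WINKS table.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, between-groups D.F., within-groups D.F.)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([value for g in groups for value in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical 6-week weight gains (pounds) for three feeds; not real data.
control = [60.0, 58.0, 63.0, 61.0]
feed_1 = [66.0, 69.0, 64.0, 68.0]
feed_2 = [71.0, 74.0, 70.0, 73.0]

f_stat, df_b, df_w = one_way_anova_f([control, feed_1, feed_2])
print("F =", round(f_stat, 2), " D.F. =", df_b, "and", df_w)
```

A large F means the group means differ by more than the within-group scatter would explain. The p-value is then found from the F distribution with the two degrees of freedom shown.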

Comparing more than two independent groups/ data are quantitative but not normal - Use the WINKS Non-parametric Comparison Tests procedure, and choose the Independent Groups option. The null hypothesis for this analysis is that all groups have equal distributions. The non-parametric procedure performed is called the Kruskal-Wallis test. If the p-value resulting from this test is small (e.g., less than 0.05), then there is evidence to conclude that a difference exists. WINKS then performs a multiple comparison procedure to help you isolate the differences. See Chapter 4, "Using Non-Parametric Comparative Procedures" for an example of this test.

Comparing two groups/ data are qualitative-categorical - To compare two groups when the data are nominal (categorical), use the WINKS Crosstabulations, Frequencies, Chi-Square procedure, the Crosstabulation, Chi-Square option. See Chapter 4, "Using Frequency and Crosstabulation Analysis" for an example of using this procedure as a test for homogeneity.
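
The independent-groups cases described above can be summarized as a small lookup. The Python sketch below is only a memory aid: the function name, argument names and returned labels are assumptions made for this example, and it covers only the cases described in this part of the chapter.

```python
def independent_groups_test(n_groups, data_type, near_normal=None):
    """Suggest a test for comparing independent groups.

    n_groups    : number of groups being compared (2 or more)
    data_type   : "quantitative" or "categorical"
    near_normal : True/False for quantitative data; ignored for categorical
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if data_type == "categorical":
        if n_groups == 2:
            return "Crosstabulation, Chi-Square (test for homogeneity)"
        raise ValueError("case not covered in this part of the chapter")
    if data_type != "quantitative":
        raise ValueError("data_type must be 'quantitative' or 'categorical'")
    if near_normal is None:
        raise ValueError("near_normal is required for quantitative data")
    if n_groups == 2:
        return "t-test" if near_normal else "Mann-Whitney test"
    if near_normal:
        if n_groups > 10:
            # The text describes one-way ANOVA for 3 to 10 groups.
            raise ValueError("one-way ANOVA is described for 3 to 10 groups")
        return "One-way ANOVA with multiple comparisons"
    return "Kruskal-Wallis test with multiple comparisons"


print(independent_groups_test(2, "quantitative", near_normal=True))
print(independent_groups_test(3, "quantitative", near_normal=False))
print(independent_groups_test(2, "categorical"))
```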