
WINKS Online Manual


Chapter 4 Part 6

Frequency and Crosstabulation Procedures

Frequencies Analysis

In the Frequencies analysis option, WINKS "counts" the occurrence of each data value for a single variable or field and displays that information in a table.

For example, one of the fields (variables) in the EXAMPLE database is STATUS, which records socioeconomic status. Suppose you want to know how the total data set is divided among the five levels of STATUS. To perform this analysis, follow these steps:

Step 1: Open the database named EXAMPLE.

Step 2: From the Analyze menu, select “Crosstabulations, Frequencies, Chi Square,” then “Frequencies.”

Step 3: Select STATUS from the field list and click “Add”. Optionally, you can select a Group field (GROUP) -- then multiple frequency tables will be created, one for each group. Click Ok.

Step 4: WINKS will count the data in each of the five categories of STATUS and display the results in the viewer.

---------------------------------------------------------------------------
Frequency Tables C:\WINKS46P\EXAMPLE.DBF
---------------------------------------------------------------------------
Number of records in database = 50 

------ GROUP = A

Frequency Table for STATUS

                                  Cumulative   Cumulative
    STATUS   Frequency   Percent   Frequency      Percent
    -----------------------------------------------------
         2           1      9.09           1         9.09
         3           3     27.27           4        36.36
         4           1      9.09           5        45.45
         5           6     54.55          11        100.0


------ GROUP = B

Frequency Table for STATUS

                                  Cumulative   Cumulative
    STATUS   Frequency   Percent   Frequency      Percent
    -----------------------------------------------------
         1           2       6.9           2          6.9
         2           6     20.69           8        27.59
         3           2       6.9          10        34.48
         4           4     13.79          14        48.28
         5          15     51.72          29        100.0


------ GROUP = C
Etc...
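The counting that the Frequencies option performs can be sketched in Python. This is a minimal illustration, not WINKS's own code: the helper names `frequency_table` and `frequency_tables_by_group` are hypothetical, and the list `group_a` is reconstructed from the Group A counts shown above (the actual record order in EXAMPLE.DBF is not known).

```python
from collections import Counter


def frequency_table(values):
    """Return (level, frequency, percent, cumulative frequency, cumulative percent) rows."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for level in sorted(counts):  # levels in ascending order, as in the output above
        freq = counts[level]
        cumulative += freq
        rows.append((level, freq, round(100 * freq / n, 2),
                     cumulative, round(100 * cumulative / n, 2)))
    return rows


def frequency_tables_by_group(records):
    """Build one frequency table per group from (group, value) pairs."""
    groups = {}
    for group, value in records:
        groups.setdefault(group, []).append(value)
    return {group: frequency_table(values) for group, values in sorted(groups.items())}


# Hypothetical STATUS values that match the Group A counts above.
group_a = [2, 3, 3, 3, 4, 5, 5, 5, 5, 5, 5]
for row in frequency_table(group_a):
    print(row)
```

Selecting a Group field corresponds to `frequency_tables_by_group`: the records are split by group first, and a separate table is built for each one.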
 

Performing a Goodness of Fit Analysis

A goodness-of-fit test of a single population is a test to determine if the distribution of observed frequencies in the sample data closely matches the expected number of occurrences under a hypothetical distribution of the population. The data observations must be independent and each data value can be counted in one and only one category. It is also assumed that the number of observations is fixed. The hypotheses being tested are

Ho: The population distribution follows the hypothesized distribution.
Ha: The population does not follow the hypothesized distribution.

A Chi-Square statistic is calculated, and a decision can be made based on the p-value associated with that statistic. A low p-value indicates rejection of the null hypothesis; that is, a low p-value indicates that the data do not follow the hypothesized, or theoretical, distribution.

For example, the data for this test come from Zar (1974), page 46. According to a genetic theory, crossbred pea plants show a 9:3:3:1 ratio of yellow smooth, yellow wrinkled, green smooth, and green wrinkled offspring. Out of 250 plants, under the theoretical ratio (distribution) of 9:3:3:1, you would expect about

(9/16) x 250 = 140.625   yellow smooth peas
(3/16) x 250 =  46.875   yellow wrinkled peas
(3/16) x 250 =  46.875   green smooth peas
(1/16) x 250 =  15.625   green wrinkled peas

After growing 250 of these pea plants, you observe that

152 have yellow smooth peas
 39 have yellow wrinkled peas
 53 have green smooth peas
  6 have green wrinkled peas

To perform this analysis, use the following steps:

Step 1: From the Analyze menu, select “Crosstabulations, Frequencies, Chi Square” then choose the "Goodness-of-Fit" option.

Step 2: You will be prompted to enter the number of categories. In this case, enter 4 for the four categories of peas (yellow smooth, yellow wrinkled, green smooth, green wrinkled).

Step 3: A dialog box will appear allowing you to enter the observed data and either the expected values or ratios. In this example, select the check box labeled “Check to enter ratios rather than expected values.” Enter the observed values from the table above:

152, 39, 53, 6

and the ratios

9,3,3,1

Press Tab to move from cell to cell. Click Calculate; the calculated chi-square statistic (8.97 in this case) and its p-value (0.031) are displayed at the bottom of the dialog box. Click Exit and the output will be displayed in the viewer.

---------------------------------------------------------------------------
Goodness of Fit 
---------------------------------------------------------------------------

	 Obs.    Exp.
	|-------|----------|
1 	|    152|   140.625|
	|-------|----------|
2 	|     39|    46.875|
	|-------|----------|
3 	|     53|    46.875|
	|-------|----------|
4 	|      6|    15.625|
	|-------|----------|
	     250       250. 

Calculated CHI SQUARE = 8.97

Degrees of freedom = 3 Appx p = 0.031
 

At a 0.05 level of significance, this p-value indicates that there is enough evidence to reject the null hypothesis that the observed values follow the theoretical distribution. That is, the test (at the 0.05 significance level) suggests that a 9:3:3:1 ratio of yellow smooth:yellow wrinkled:green smooth:green wrinkled peas is not an appropriate distribution for the population from which these data are taken.
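The arithmetic behind the Calculate button can be sketched in Python. This is an illustrative sketch, not WINKS's own code: the helpers `goodness_of_fit` and `chi2_sf` are hypothetical names, the expected counts are the ratios scaled to the total number of observations, and the chi-square tail probability is computed from the series for the regularized incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function, which
    converges quickly for the moderate statistics in these examples.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


def goodness_of_fit(observed, ratios):
    """Chi-square goodness-of-fit test with expected counts derived from ratios."""
    n = sum(observed)
    expected = [n * r / sum(ratios) for r in ratios]
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus one
    return expected, chi_sq, df, chi2_sf(chi_sq, df)


expected, chi_sq, df, p = goodness_of_fit([152, 39, 53, 6], [9, 3, 3, 1])
print(expected)
print(f"Chi-square = {chi_sq:.2f}, df = {df}, p = {p:.3f}")
```

For the pea data the sketch gives the same expected counts and a chi-square of 8.97 with 3 degrees of freedom. Its p-value comes out at about 0.030, close to the approximate value of 0.031 that WINKS reports, and it leads to the same conclusion at the 0.05 level.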

Note: You can perform several analyses in the Goodness of Fit dialog box; all of them will be displayed in the viewer when you select Exit.


Performing a Crosstabulation Analysis (Chi-square)

Crosstabulations can be used to perform a chi-square test for independence or a chi-square test for homogeneity. A two-way table is constructed that displays the number of counts for each category. It must be possible to assume that the data observations are independent and that each data value can be counted in one and only one category. It is also assumed that the number of observations is fixed. WINKS allows you to enter data for a two-way table from the keyboard or from a database.

Entering Data from the Keyboard

When you choose to enter the two-way table from the keyboard (“Chi-Square table for Keyboard” option), WINKS will ask you the size of the table (number of rows and columns). A blank table will be presented on the screen, and you will then be prompted to enter a number in each cell of the table.

Entering Data from a Database

If you choose to enter the information from a database, you will be prompted to indicate which tables are to be calculated. Select one or more fields for the “Data field” (top right-hand list box) and one or more fields for the “By Var” field (bottom right-hand list box). For example, if you select fields A and B in the first box and C in the second box, the tables A x C and B x C will be calculated.

For all tables, you will be prompted to specify what output options you want included in the output tables:

Percents displays the row, column, and total percentages for the count in each cell of the table. The contribution to Chi-Square shows how much each cell contributes to the size of the Chi-Square statistic. This often comes in handy when you are trying to discover what caused a table to produce a low p-value (a high Chi-Square statistic).

For a test for independence, a contingency table looks at two categorical variables from a single sample of one population and tests whether the two variables are related in some way (e.g., are sex and hair color related?). The hypotheses being tested are:

Ho: The variables are independent of each other. (There is no association between them).
Ha: The variables are not independent of each other.

A Chi-Square statistic is calculated, with (r-1)(c-1) degrees of freedom where r is the number of rows and c the number of columns. A low p-value indicates rejection of the null hypothesis.

WINKS reports both the chi-square statistic and the p-value. If the expected value in one or more cells is less than 5, the chi-square test may not be valid; a warning to this effect appears on the screen when appropriate. In the case of a 2 by 2 table, Fisher's Exact Test and the chi-square with Yates' correction are also performed and their results displayed. Note: Tables as large as 15 columns by 100 rows may be created by reading data from a database. If there are more categories than this, WINKS combines the remaining categories into a group called REST. To prevent this, you might combine some groups.
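The expected counts, the per-cell contributions to chi-square, and the (r-1)(c-1) degrees of freedom described above can be sketched in Python. This is a hypothetical illustration rather than WINKS's code (the function names are invented for the example); it uses the beetle and bug counts from the example that follows and flags any expected count below 5, mirroring the warning WINKS prints.

```python
def expected_counts(table):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square_independence(table):
    """Return (chi-square, df, per-cell contributions, small-expected-count flag)."""
    expected = expected_counts(table)
    contributions = [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
                     for obs_row, exp_row in zip(table, expected)]
    chi_sq = sum(sum(row) for row in contributions)
    df = (len(table) - 1) * (len(table[0]) - 1)
    warn = any(e < 5 for row in expected for e in row)
    return chi_sq, df, contributions, warn


# Rows: upper leaf, lower leaf; columns: beetles, bugs.
chi_sq, df, contributions, warn = chi_square_independence([[12, 7], [2, 8]])
print(f"Chi-square = {chi_sq:.3f}, df = {df}")
if warn:
    print("WARNING - Some Expected values less than 5. Chi-Square may not be valid.")
for row in contributions:
    print([round(c, 3) for c in row])
```

For this table the largest contribution comes from the lower-leaf beetle cell, where 2 beetles were observed against about 4.8 expected; that same cell is the one that triggers the small-expected-value warning.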

Data for this example are observations of the number of beetles and bugs on the upper and lower sides of leaves (Zar, 1974, page 292).

2 by 2 Contingency Table Data

              Beetles   Bugs
Upper Leaf         12      7
Lower Leaf          2      8

To perform this analysis, follow these steps:

Step 1: From the Analyze menu, select “Crosstabulations, Frequencies, Chi Square” and choose "Crosstabulations, Chi-Square - From Keyboard" option.

Step 2: You will be prompted for the size of the table. When asked for the number of rows and columns, type 2, 2 and press Enter. You will then be prompted to select output options; for this example, select only Frequencies. An empty table will appear. Enter the counts for each category into the appropriate cells and choose Calculate. Preliminary results will appear on the status bar at the bottom of the screen. You can perform calculations on several tables, and all results will appear in the viewer when you select Exit.

---------------------------------------------------------------------------
Crosstabulations and Chi Square 
---------------------------------------------------------------------------
2-Way Contingency Table
 
FREQUENCY|   |    |  TOTAL
	 ------------------------
	 | 12|   7|   19
 	 ------------------------
	 |  2|   8|   10
	 ------------------------
 TOTAL 	   14   15    29
 	 48.3 51.7 100.0

WARNING - Some Expected values less than 5. Chi-Square may not be valid.

Statistic                         DF    Value   p-value
-----------------------------------------------------------------
Chi-Square                         1    4.887     0.028
Yates' Chi-Square                  1    3.312     0.069
Fisher's Exact Test (one-tail)                    0.033
                    (two-tail)                    0.050
Phi Coefficient                          .411
Cramer's V                               .411
Contingency Coefficient                  .380
Relative Risk                           3.158
Odds Ratio                              6.857
Sensitivity                              .857
Specificity                              .533

Sensitivity and Specificity calculations are based on a
table where the cells are in the following pattern:
TP FP
FN TN
T=True, F=False, P=Positive, N=Negative


Step 3: In the viewer, the calculated chi-square statistic is reported as 4.89 with a p-value of 0.028. The chi-square with Yates' correction is 3.31 with a p-value of 0.069, and Fisher's Exact Test (two-tail) has a p-value of 0.050. Because one of the cells has an expected value less than 5, WINKS warns that the chi-square analysis for these data may not be valid. Given this warning, it is best to rely on Fisher's Exact Test for making a decision.

A low p-value indicates rejection of the null hypothesis. At a 0.05 significance level, the Fisher's Exact Test p-value of 0.050 indicates, on the borderline, that there is enough evidence to reject the null hypothesis of independence of the two variables and to conclude that leaf side and type of insect are not independent. In this case it appears that beetles prefer the upper sides of leaves, while bugs are about evenly split in their preference. Judged by the Yates-corrected result (p = 0.069), the decision is also marginal.
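To see where the 2 by 2 results come from, here is a Python sketch of the standard formulas: the chi-square and Yates' statistics from the cell counts, Fisher's exact test from hypergeometric probabilities with all margins fixed, and the phi and contingency coefficients. It is an illustration, not WINKS's implementation, and the function name `two_by_two_tests` is hypothetical. The two-tail Fisher p-value uses the common convention of summing the probabilities of all tables no more likely than the observed one; for the beetle data that convention gives about 0.050, the value quoted in Step 3.

```python
import math


def two_by_two_tests(a, b, c, d):
    """Chi-square, Yates' chi-square, Fisher's exact test and association measures.

    Cell layout:
        a  b
        c  d
    """
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    denom = r1 * r2 * c1 * c2
    chi_sq = n * (a * d - b * c) ** 2 / denom
    # Yates' continuity correction, clipped at 0 when |ad - bc| < n/2
    yates = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2 / denom
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
    p_chi = math.erfc(math.sqrt(chi_sq / 2))
    p_yates = math.erfc(math.sqrt(yates / 2))

    # Fisher's exact test: hypergeometric probability of each possible value of cell a
    lo, hi = max(0, c1 - r2), min(r1, c1)
    probs = {x: math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)
             for x in range(lo, hi + 1)}
    left = sum(p for x, p in probs.items() if x <= a)
    right = sum(p for x, p in probs.items() if x >= a)
    one_tail = min(left, right)
    # Two-tail: total probability of tables no more likely than the observed one
    two_tail = sum(p for p in probs.values() if p <= probs[a] * (1 + 1e-7))

    return {
        "chi_sq": chi_sq, "p_chi": p_chi,
        "yates": yates, "p_yates": p_yates,
        "fisher_one_tail": one_tail, "fisher_two_tail": min(1.0, two_tail),
        "phi": math.sqrt(chi_sq / n),
        "contingency": math.sqrt(chi_sq / (chi_sq + n)),
    }


results = two_by_two_tests(12, 7, 2, 8)  # beetle/bug table
for name, value in results.items():
    print(f"{name:16s} {value:.3f}")
```

On the beetle data the sketch reproduces the Yates' statistic and p-value (3.312, 0.069), the one-tail Fisher value (0.033), and the phi and contingency coefficients (.411 and .380). Its uncorrected chi-square p-value comes out at about 0.027, close to the 0.028 shown in the output.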

Notes on 2x2 Table Statistics

Four new statistics have been recently added to the 2 x 2 crosstabulation option in the Crosstabulation, Frequencies and Chi Square analysis. These are Relative Risk, Odds Ratio, Specificity and Sensitivity.

RELATIVE RISK is given by the formula

       a/(a + b)
RR = ----------
       c/(c + d)

where the two by two table is

            Factor 1
               +   -
            ---------
            +|a |  b|
Factor 2     ---------
            -|c |  d|
            ---------

ODDS RATIO is calculated by

     a / b
OR = -----
     c / d

SENSITIVITY is calculated by

        a
SEN = -----
      a + c

Sensitivity is a measure of the ability to call positive those patients/subjects that have the disease/condition.

SPECIFICITY is calculated by

       d
SP = -----
     b + d

Note: Sensitivity and Specificity calculations are only available on the crosstabulations with data entered from the keyboard. Results are based on a table where the cells are in the following pattern:

    TP | FP
    -------
    FN | TN

Where T=True, F=False, P=Positive, and N=Negative.

Specificity is a measure of the ability to call negative those patients/subjects that do not have the disease or condition. For more information on these four statistics, consult any biostatistics or epidemiology text. One example is Basic Biostatistics in Medicine and Epidemiology, A. A. Rimm, Appleton-Century-Crofts, 1980.
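The four formulas translate directly into code. The Python sketch below (the function name `risk_measures` is hypothetical) applies them to the beetle and bug table, using the a, b, c, d cell layout defined above; the results agree with the Relative Risk, Odds Ratio, Sensitivity and Specificity lines in the earlier output. It does not guard against zero cells, where these ratios are undefined.

```python
def risk_measures(a, b, c, d):
    """Relative risk, odds ratio, sensitivity and specificity for the 2 x 2 layout
        a  b
        c  d
    using the formulas given in this section."""
    return {
        "relative_risk": (a / (a + b)) / (c / (c + d)),
        "odds_ratio": (a / b) / (c / d),
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
    }


measures = risk_measures(12, 7, 2, 8)  # beetle/bug table from the example above
for name, value in measures.items():
    print(f"{name:14s} {value:.3f}")
```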

Entering data from a Database

When entering crosstabulation data from a database follow these steps:

Step 1: Open a database. Data to be used may be numeric or character, but they should be categorical such as hair color, sex, etc.

Step 2: From the Analyze menu, select “Crosstabulations, Frequencies and Chi-Square,” then select “Crosstabulations - Chi Square.”

Step 3: Select which tables to calculate, and what options to display in the output.

Step 4: Results will be displayed in the viewer.


Example: Crosstabulation as a test for homogeneity

A crosstabulations analysis may also be used to perform a chi-square test for homogeneity. In this case the two-way table looks at a categorical variable in samples from two or more populations and tests whether the variable follows the same distribution in each population, that is, whether the populations are homogeneous.

The test statistic and procedure are the same as for the chi-square test for independence. The data are organized so that the populations being tested for homogeneity are represented as values (groups) of a grouping variable. The hypotheses being tested in this case are:

Ho: The populations are homogeneous.
Ha: The populations are not homogeneous.

To use the crosstabulations procedure to test for homogeneity, let the rows of the table represent the categories of the variable and the columns the different populations. (Of course, for test purposes it doesn't matter whether the rows or columns represent the populations.)

If entering the data from the keyboard, simply enter the totals for each category. If creating a database, you need a field for the categorical variable you wish to use in the test for homogeneity, and a grouping variable which identifies the population from which each observation comes. Each record represents one observation. There may be other fields in the database for other variables in these same populations and you can do separate crosstabulation analyses on them.
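When the data come from a database with one record per observation, building the two-way table amounts to counting each (category, group) pair. The Python sketch below illustrates that step; the `crosstab` helper and the field values are hypothetical, and the records are generated to match the department counts in the example that follows.

```python
from collections import Counter


def crosstab(records):
    """Count (row category, column group) pairs into a two-way table."""
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Hypothetical one-record-per-employee data: (sex, department) pairs.
records = ([("Men", "Dept 1")] * 10 + [("Men", "Dept 2")] * 45 + [("Men", "Dept 3")] * 15
           + [("Women", "Dept 1")] * 8 + [("Women", "Dept 2")] * 22 + [("Women", "Dept 3")] * 4)
rows, cols, table = crosstab(records)
print(cols)
for label, counts in zip(rows, table):
    print(label, counts)
```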

Suppose you want to check whether the ratio of men and women is the same in three different departments of a company. You obtain the following data:

           Dept 1   Dept 2   Dept 3
Men            10       45       15
Women           8       22        4

The calculated chi-square statistic in this case is 2.30 with a p-value of 0.317. A decision can be made using this p-value. A low p-value (less than the chosen significance level) is usually taken to indicate rejection of the null hypothesis. At a 0.05 significance level, the p-value of 0.317 indicates that there is not enough evidence to reject the null hypothesis of homogeneity of the three departments. That is, based on this test, you cannot conclude that the departments differ significantly with respect to the sex of employees.
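The same chi-square computation applies to the test for homogeneity. The Python sketch below, with hypothetical helper names, computes the statistic, the (r-1)(c-1) = 2 degrees of freedom, and the p-value for the department table; it reproduces the 2.30 and 0.317 quoted above. The tail probability uses the series for the regularized incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the regularized lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return 1.0 - total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi_square_table(table):
    """Chi-square test on an r x c table of counts: returns (statistic, df, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = sum((table[i][j] - rt * ct / n) ** 2 / (rt * ct / n)
                 for i, rt in enumerate(row_totals)
                 for j, ct in enumerate(col_totals))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df, chi2_sf(chi_sq, df)


# Rows: men, women; columns: Dept 1, Dept 2, Dept 3.
chi_sq, df, p = chi_square_table([[10, 45, 15], [8, 22, 4]])
print(f"Chi-square = {chi_sq:.2f}, df = {df}, p = {p:.3f}")
```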

McNemar's Test

McNemar's test is appropriate for use with paired, dichotomous data. This test is sometimes called a test for related samples or a test for the significance of changes. It is useful for comparing paired or related observations in which the response is dichotomous, that is, the response is one of only two possible outcomes. McNemar's test is the 2 by 2 version of Cochran's Q test described in the section on non-parametric tests. The test assumes that any pair of observations is independent of any other pair of observations, although clearly the observations within a pair are not independent of each other.

For example, you may wish to know if a certain advertisement has an effect on the impression consumers have of a product. You could select a group of people and check their impressions of the product before viewing the advertisement, and again after viewing the ad. You record the reactions of each person as "favorable" or "unfavorable" both before and after seeing the ad. Thus, there are four categories of before-after responses:

"favorable-favorable",
"favorable-unfavorable",
"unfavorable-favorable",
"unfavorable-unfavorable"

You want to know if there is a difference in the reactions before and after viewing the advertisement. It is possible to use McNemar's test if you have only the totals for each of the four categories, or if you have the record of each individual response. The hypotheses being tested are:

Ho: The proportions (of "favorable" and "unfavorable" reactions) in the two groups (before and after) are the same.
Ha: The proportions in the two groups are not the same.

In other words, you are testing whether there is a change in the number of people who react favorably to the product after seeing the advertisement. The test statistic used is:

Q = (B - C)^2 / (B + C)

where B and C are the number of "favorable-unfavorable" and "unfavorable-favorable" reactions. This test statistic approaches a chi-square distribution with one degree of freedom. The chi-square statistic is useful if the number of pairs of observations is at least 10.

WINKS displays both the chi-square statistic and the p-value of the test. The p-value associated with this test can be used to make a decision. A small p-value (less than the chosen significance level, e.g., 0.05) is usually taken to indicate rejection of the null hypothesis.
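The statistic above can be sketched in Python as follows. The 20 responses are hypothetical (they are not the MCNEMAR.DBF data), and the function name `mcnemar` is invented for the example. The sketch implements the uncorrected formula shown above; some texts and programs instead apply a continuity correction, (|B - C| - 1)^2 / (B + C), so results from a particular program may differ slightly.

```python
import math


def mcnemar(before, after):
    """McNemar's test for paired 0/1 responses (1 = favorable, 0 = unfavorable).

    B counts favorable-before/unfavorable-after pairs and C the reverse;
    pairs with the same response both times do not enter the statistic.
    """
    pairs = list(zip(before, after))
    b = sum(1 for x, y in pairs if x == 1 and y == 0)
    c = sum(1 for x, y in pairs if x == 0 and y == 1)
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    q = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(q / 2))  # chi-square upper tail with 1 degree of freedom
    return b, c, q, p


# Hypothetical responses for 20 people (NOT the MCNEMAR.DBF data).
before = [1] * 6 + [0] * 2 + [1] * 7 + [0] * 5
after = [0] * 6 + [1] * 2 + [1] * 7 + [0] * 5
b, c, q, p = mcnemar(before, after)
print(f"B = {b}, C = {c}, chi-square = {q:.2f}, p = {p:.3f}")
```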

For example, in the test of the effect of an advertisement, suppose 20 people participated, with each response coded 1 for "favorable" and 0 for "unfavorable". For McNemar's test, the data must be coded as 0 or 1, representing the two possible responses. To perform this analysis, follow these steps:

Step 1: Open the database named MCNEMAR.DBF. If you choose to create the database yourself, you can choose the New Database option from the File menu, then use the pre-defined database structure option named “For paired t-tests or McNemar’s Test.”

Step 2: From the Analyze menu, select “Crosstabulations, Frequencies and Chi-Square” then choose "McNemar's test."

You will be prompted to choose the two fields (groups) you wish to compare. In this case, there are only two fields.  Select BEFORE and AFTER. Then select which options you want to appear on the output table. WINKS will perform the calculations and display the results in the viewer.

The chi-square statistic in this case is 0.57, and the p-value is 0.450. The p-value is large, so the null hypothesis of equal proportions is not rejected. That is, there is not enough evidence to say that there is a difference in the reactions before and after viewing the advertisement.

Comparison of Proportions

To compare proportions select Proportion Comparison in the Crosstabs menu. You will be prompted to enter two proportions (or two observed counts) and the sample size associated with each proportion. For example, for the first proportion enter 10 observations out of 20 and for the second enter 15 observations out of 75. When you select Calculate then Exit, the following results are displayed:

Proportion(1) = 0.5     Z = 2.707
Proportion(2) = 0.2     P = 0.007 (two tail)

This tells you that the z-statistic for the comparison of these proportions is 2.707 and that the p-value associated with the test is 0.007. Thus, at the 0.05 significance level, there is a significant difference between these proportions.
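The comparison can be sketched as a two-sample z-test that uses the pooled proportion in the standard error. The exact formula WINKS uses is not stated here, so the pooled form is an assumption, but it reproduces Z = 2.707 and P = 0.007 for the example. The function name `compare_proportions` is hypothetical.

```python
import math


def compare_proportions(x1, n1, x2, n2):
    """Two-sample z-test for proportions using the pooled proportion in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tail p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_two_tail = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_two_tail


p1, p2, z, p = compare_proportions(10, 20, 15, 75)
print(f"Proportion(1) = {p1}     Z = {z:.3f}")
print(f"Proportion(2) = {p2}     P = {p:.3f} (two tail)")
```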


 
Continue to Chapter 4. Part 7. (Life Table and Survival Analysis.)  

     

