Crosstabulation Analysis (Chi-Square)
Crosstabulations can be used to perform a chi-square test for independence or a chi-square test for homogeneity. A two-way table is constructed that displays the count for each combination of categories. It must be possible to assume that the observations are independent and that each data value can be counted in one and only one category. It is also assumed that the number of observations is fixed. SDA allows you to enter data for a two-way table from the keyboard or from a data set.
You can enter data for this analysis using
 Enter from data set (data are raw counts, one record per observation)
 Enter summarized data from keyboard
 Enter from a "count" data set (data are summarized counts)
Examples of each are provided here:
Example 1: Entering Data from a Data Set
(Analyze/Crosstabs, Frequencies, ChiSquare/ Crosstabulations/ ChiSquare)
If you choose to enter the information from a data set, you will be prompted to indicate what tables are to be calculated. Select one or more fields for the “Data field” (top right-hand list box) and select one or more fields for the “By Var” field (bottom right-hand list box).
For example, open the data file SALARY.SDA (salaries of professors at a college) and produce a table of RANK by SEX.
Step 1: Select Analyze/Crosstabulations, Frequencies, ChiSquare/Crosstabulations, ChiSquare.
Step 2: For the variables to use, select Rank and Sex as shown here:
For all tables, you are prompted to specify what output options you want included in the output tables:
 Frequencies
 Total Percent
 Row Percent
 Column Percent
 Expected Values

 Chicontribution
 Residual
 Standardized Residual
 Adjusted Residual
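The cell-level options in this list can be illustrated with a short Python sketch. The formulas are the usual textbook definitions; that SDA computes each option in exactly this way is an assumption, and the function name cell_statistics is hypothetical. The counts are those of one cell (rank 4, SEX 1) of the RANK by SEX table used in this example.

```python
import math

def cell_statistics(observed, row_total, col_total, n):
    """Textbook per-cell quantities for a two-way table (assumed to match SDA's options)."""
    expected = row_total * col_total / n
    residual = observed - expected
    return {
        "expected": expected,
        "chi_contribution": residual ** 2 / expected,
        "residual": residual,
        "standardized_residual": residual / math.sqrt(expected),
        # The adjusted residual also accounts for the row and column proportions.
        "adjusted_residual": residual / math.sqrt(
            expected * (1 - row_total / n) * (1 - col_total / n)
        ),
    }

# One cell: 18 observed, row total 31, column total 67, grand total 175.
stats = cell_statistics(18, 31, 67, 175)
for name, value in stats.items():
    print(f"{name:22s} {value:7.3f}")
```

The chi contributions of all cells add up to the chi-square statistic, so a large contribution or residual points to the cells that drive a significant result.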

For this example, select the “Expected Values” option. Click OK and the following output is produced:
RANKS (rows) by SEX (columns)

FREQUENCY
EXPECTED            1         2     TOTAL

1                   7        20        27
                 10.3      16.7

2                  15        33        48
                 18.4      29.6

3                  27        42        69
                 26.4      42.6

4                  18        13        31
                 11.9      19.1

TOTAL              67       108       175
                 38.3      61.7     100.0

Statistic                    DF     Value   p-value

Chi-Square                    3     7.905     0.049
Phi Coefficient                      .213
Cramer's V                           .213
Contingency Coefficient              .208
The calculated chi-square value is 7.905 with 3 degrees of freedom. The p-value of 0.049 indicates marginal significance. In rank 4, the SEX=1 column has more observations than expected (18 observed versus 11.9 expected) and the SEX=2 column has fewer (13 observed versus 19.1 expected). Depending on how SEX and RANK are coded in the data set (for example, 1=Female and 2=Male), a pattern like this might indicate a gender bias in how professors are promoted in rank.
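The expected values and the chi-square statistic in this output can be checked by hand. The following Python sketch is not part of SDA; it applies the standard Pearson formulas to the counts shown above, and the variable names are illustrative.

```python
# Observed counts from the RANK by SEX output (rows = rank 1 to 4, columns = SEX 1 and 2).
observed = [[7, 20], [15, 33], [27, 42], [18, 13]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 1) for e in exp_row])
print(f"chi-square = {chi_square:.3f}, df = {df}")
```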
Question: What do the differences between the expected and observed counts in rank=1 indicate?
This is a test of independence. For this analysis, the contingency table looks at two categorical variables from a single sample of one population and tests whether the two variables are related in some way (e.g., are sex and rank related?). The hypotheses being tested are:
Ho: The variables are independent of each other. (There is no association between them).
Ha: The variables are not independent of each other.
If there is no evidence of an association between them (p is greater than 0.05), there is no evidence of bias. A low p-value indicates rejection of the null hypothesis and, in this case, suggests bias.
WINKS SDA reports both the chi-square statistic and the p-value. If the expected value in one or more cells is less than 5, the chi-square test may not be valid. A warning to this effect appears on the screen if appropriate. In the case of a 2 by 2 table, Fisher's Exact Test and the chi-square with Yates' correction are also performed and the results displayed. Note: Tables as large as 15 columns by 100 rows may be created by reading data from a data set. If there are more categories than this, SDA combines the remaining categories in a group called REST. To prevent this, you might combine some groups.
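The Phi Coefficient, Cramer's V, and Contingency Coefficient lines in the Example 1 output are all functions of the chi-square value and the sample size. A minimal Python sketch using the standard textbook formulas follows; whether SDA uses exactly these formulas internally is an assumption, and the function name is hypothetical.

```python
import math

def association_measures(chi_square, n, n_rows, n_cols):
    """Phi, Cramer's V, and the contingency coefficient from a chi-square value."""
    phi = math.sqrt(chi_square / n)
    cramers_v = math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))
    contingency = math.sqrt(chi_square / (chi_square + n))
    return phi, cramers_v, contingency

# Values reported for the RANK by SEX table: chi-square 7.905, 175 observations, 4 by 2 table.
phi, cramers_v, contingency = association_measures(7.905, 175, 4, 2)
print(f"Phi = {phi:.3f}, Cramer's V = {cramers_v:.3f}, Contingency = {contingency:.3f}")
```

When either dimension of the table is 2, Phi and Cramer's V coincide, which is why the output shows the same value for both.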
Example 2: Entering Data from the Keyboard
(Analyze/Crosstabs, Frequencies, ChiSquare/ Crosstabulations/ ChiSquare – From Keyboard)
Data for this example are counts of beetles and bugs observed on the upper and lower sides of leaves (Zar, 1974, page 292).
2 by 2 Contingency Table Data

              Beetles    Bugs
Upper Leaf         12       7
Lower Leaf          2       8
To perform this analysis, follow these steps:
Step 1: Select Analyze/Crosstabulations, Frequencies, ChiSquare/ Crosstabulations, ChiSquare  From Keyboard.
Step 2: You are first prompted to select output options. For this example, just select Frequencies. You are then prompted to indicate the size of the table. When asked for the number of rows and columns, type 2, 2 and press Enter. An empty table appears. Enter the count for each category into the appropriate cell, and choose Calculate. Preliminary results appear on the status bar at the bottom of the screen. You can perform calculations on several tables, and all results will appear in the viewer when you select Exit.
2-Way Contingency Table

FREQUENCY                         TOTAL

                  12        7        19

                   2        8        10

TOTAL             14       15        29
                48.3     51.7     100.0

WARNING - Some expected values less than 5. Chi-Square may not be valid.

Statistic                         DF     Value   p-value

Chi-Square                         1     4.887     0.028
Yates' Chi-Square                  1     3.312     0.069
Fisher's Exact Test (one-tail)                     0.033
                    (two-tail)                     0.050
Phi Coefficient                           .411
Cramer's V                                .411
Contingency Coefficient                   .380
Relative Risk                            3.158
Odds Ratio                               6.857   95% C.I.=(1.124,41.829)
Sensitivity                               .857
Specificity                               .533

Sensitivity, Specificity and RR calculations are based on a table where the
cells are in the following pattern:
TP FP
FN TN
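The risk and accuracy lines in this output follow from the TP/FP/FN/TN cell pattern. The Python sketch below uses the standard definitions applied to the beetle and bug counts. The confidence interval uses the normal approximation on the log odds ratio (Woolf's method); that this is the method SDA uses is an assumption, although it gives the interval shown.

```python
import math

# Cells follow the pattern given in the output: TP FP / FN TN.
tp, fp = 12, 7   # Upper Leaf: Beetles, Bugs
fn, tn = 2, 8    # Lower Leaf: Beetles, Bugs

sensitivity = tp / (tp + fn)                          # column 1: TP / (TP + FN)
specificity = tn / (tn + fp)                          # column 2: TN / (TN + FP)
relative_risk = (tp / (tp + fp)) / (fn / (fn + tn))   # row 1 risk / row 2 risk
odds_ratio = (tp * tn) / (fp * fn)

# Assumed interval method: normal approximation on the log odds ratio.
se_log_or = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"Relative Risk = {relative_risk:.3f}")
print(f"Odds Ratio    = {odds_ratio:.3f}  95% C.I.=({ci_low:.3f},{ci_high:.3f})")
print(f"Sensitivity   = {sensitivity:.3f}")
print(f"Specificity   = {specificity:.3f}")
```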
Step 3: The calculated chi-square statistic is reported as 4.89 with a p-value of 0.028. The chi-square with Yates' correction is 3.31 with a p-value of 0.069, and Fisher's Exact Test (two-tailed) has a p-value of 0.050. Because one of the cells has an expected value less than 5, SDA warns that the chi-square analysis for these data may not be valid. Given this warning, it is best to rely on Fisher's Exact Test for making a decision.
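The two chi-square values and the small-expected-value warning can be reproduced with a few lines of Python. This is a sketch of the standard formulas, not SDA's code. Yates' correction is implemented here as subtracting 0.5 from each absolute difference between observed and expected counts, and the function name is illustrative.

```python
observed = [[12, 7], [2, 8]]  # Upper Leaf / Lower Leaf by Beetles / Bugs

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(observed, expected, yates=False):
    """Pearson chi-square; with yates=True, subtract 0.5 from each |O - E| (2 by 2 tables)."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                diff = max(diff - 0.5, 0.0)
            total += diff ** 2 / e
    return total

pearson = chi_square(observed, expected)
yates = chi_square(observed, expected, yates=True)
small_cells = [e for row in expected for e in row if e < 5]

print(f"Pearson = {pearson:.3f}")
print(f"Yates   = {yates:.3f}")
print(f"Expected values below 5: {[round(e, 2) for e in small_cells]}")
```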
A low p-value indicates rejection of the null hypothesis. At the 0.05 significance level, the Fisher's Exact Test p-value of 0.050 is borderline: it suggests there is enough evidence to reject the null hypothesis of independence and to conclude that leaf side and type of insect are not independent. In this case it appears that beetles prefer the upper sides of leaves, while bugs are about evenly split. Based on the Yates result (p = 0.069), the decision is marginal.
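Fisher's Exact Test can be computed directly from the hypergeometric distribution. The Python sketch below (Python 3.8 or later, for math.comb) makes two assumptions that are not confirmed by the text: the one-tailed value is the smaller tail sum, and the two-tailed value sums every table whose probability is no larger than that of the observed table. The function name is hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """One- and two-tailed Fisher's exact p-values for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # One tail: tables at least as extreme as the observed one, in one direction.
    upper = sum(prob(x) for x in range(a, hi + 1))
    lower = sum(prob(x) for x in range(lo, a + 1))
    one_tail = min(upper, lower)
    # Two tails: every table no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    two_tail = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return one_tail, two_tail

one_tail, two_tail = fisher_exact_2x2(12, 7, 2, 8)
print(f"one-tail = {one_tail:.3f}, two-tail = {two_tail:.3f}")
```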
Example 3: Entering Data from Count Data Set
(Analyze/Crosstabs, Frequencies, ChiSquare/ Crosstabulations/ ChiSquare – from count data)
The following data are from a classic study from 1909 reported by Karl Pearson that observed the association between drinking and criminal behavior.
Step 1: Open CROSSTAB_COUNTS.SAV and select Analyze, Crosstabs, Frequencies, ChiSquare, Crosstab/ChiSquare (From count data).
Step 2: Select CRIME as the row variable, DRINKER as the column and COUNT as count. Click Ok.
Step 3: From the Options menu select Frequency and Standardized Residual. Click Ok. The following (partial) output is displayed (similar to Example 1.)
CRIME(C) (rows) by DRINKER(N) (columns)

FREQUENCY        YES        NO     TOTAL

ARSON             50        43        93

RAPE              88        62       150

VIOLENCE         155       110       265

STEALING         379       300       679

COINING           18        14        32

FRAUD             63       144       207

TOTAL            753       673      1426
                52.8      47.2     100.0
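A "count" data set holds one record per cell rather than one record per observation. The Python sketch below shows how such records collapse into the two-way table above. The record layout (CRIME, DRINKER, COUNT) mirrors the variables selected in Step 2, but the actual contents of CROSSTAB_COUNTS.SAV are assumed to match the displayed table.

```python
from collections import defaultdict

# Records laid out like a "count" data set: one row per cell (CRIME, DRINKER, COUNT).
records = [
    ("ARSON", "YES", 50), ("ARSON", "NO", 43),
    ("RAPE", "YES", 88), ("RAPE", "NO", 62),
    ("VIOLENCE", "YES", 155), ("VIOLENCE", "NO", 110),
    ("STEALING", "YES", 379), ("STEALING", "NO", 300),
    ("COINING", "YES", 18), ("COINING", "NO", 14),
    ("FRAUD", "YES", 63), ("FRAUD", "NO", 144),
]

# Accumulate counts into table[row category][column category].
table = defaultdict(lambda: defaultdict(int))
for crime, drinker, count in records:
    table[crime][drinker] += count

col_totals = defaultdict(int)
for crime, row in table.items():
    for drinker, count in row.items():
        col_totals[drinker] += count
    print(f"{crime:10s} {row['YES']:5d} {row['NO']:5d} {sum(row.values()):6d}")

grand_total = sum(col_totals.values())
print(f"{'TOTAL':10s} {col_totals['YES']:5d} {col_totals['NO']:5d} {grand_total:6d}")
```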
Typical hypotheses tested include:
Test of independence: Ho: There is no association between the two variables.
or Test of homogeneity: Ho: The distribution of each category is the same across populations.
Statistic                       DF     Value   p-value

Chi-Square                       5    49.731    <0.001
Likelihood Ratio Chi-Square      5    50.517    <0.001
Phi Coefficient                         .187
Cramer's V                              .187
Contingency Coefficient                 .184
Since p<=0.05, the null hypothesis (of independence or homogeneity)
is rejected and multiple comparisons are performed.
continues...
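The Pearson and likelihood ratio chi-square values, along with the standardized residuals requested in Step 3, can be sketched in Python as follows. The formulas are the standard ones (likelihood ratio = 2 * sum of O * ln(O / E), standardized residual = (O - E) / sqrt(E)); that SDA defines the standardized residual this way is an assumption.

```python
import math

crimes = ["ARSON", "RAPE", "VIOLENCE", "STEALING", "COINING", "FRAUD"]
observed = [[50, 43], [88, 62], [155, 110], [379, 300], [18, 14], [63, 144]]  # YES, NO

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

pearson = 0.0
likelihood_ratio = 0.0
std_residuals = {}
for crime, obs_row, r in zip(crimes, observed, row_totals):
    std_residuals[crime] = []
    for o, c in zip(obs_row, col_totals):
        e = r * c / n                                 # expected count under Ho
        pearson += (o - e) ** 2 / e
        likelihood_ratio += 2 * o * math.log(o / e)   # all observed counts here are > 0
        std_residuals[crime].append((o - e) / math.sqrt(e))

print(f"Chi-Square = {pearson:.3f}, Likelihood Ratio Chi-Square = {likelihood_ratio:.3f}")
for crime, (yes, no) in std_residuals.items():
    print(f"{crime:10s} {yes:7.2f} {no:7.2f}")
```

The largest standardized residuals belong to the FRAUD row, which contributes most of the chi-square value.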
Click the Graph option at the top of the screen to display the graph grouped by Drinker within Crime.
End of tutorial
For more information, including an explanation of the options, go to the next tutorial.