Crosstabulation Analysis (Chi-square)
Crosstabulations can be used to perform a chi-square test for independence or a chi-square test for homogeneity. A two-way table is constructed that displays the count for each combination of categories. The data observations must be independent, and each data value must be counted in one and only one category. It is also assumed that the number of observations is fixed. SDA allows you to enter data for a two-way table from the keyboard or from a data set.
You can enter data for this analysis using
- Enter from data set (data are raw counts, one record per observation)
- Enter summarized data from keyboard
- Enter from a "count" data set (data are summarized counts)
Examples of each are provided here:
Example 1: Entering Data from a Data Set
(Analyze/Crosstabs, Frequencies, Chi-Square/ Crosstabulations/ Chi-Square)
If you choose to enter the information from a data set, you will be prompted to indicate what tables are to be calculated. Select one or more fields for the “Data field” (top right hand list box) and select one or more fields for the “By Var” field (bottom right hand side list box).
For example, open the data file SALARY.SDA (salaries of professors at a college) and produce a table of RANK by SEX.
Step 1: Select Analyze/Crosstabulations, Frequencies, Chi-Square/Crosstabulations, Chi-Square.
Step 2: For the variables to use, select Rank and Sex as shown here:
For all tables, you are prompted to specify what output options you want included in the output tables:
- Frequencies
- Total Percent
- Row Percent
- Column Percent
- Expected Values
- Chi-contribution
- Residual
- Standardized Residual
- Adjusted Residual
For this example, select the “Expected Values” option. Click OK and the following output is produced:
RANKS(rows) by SEX (columns)
FREQUENCY|
EXPECTED |     1|     2| TOTAL
------------------------
        1|     7|    20|    27
         |  10.3|  16.7|
------------------------
        2|    15|    33|    48
         |  18.4|  29.6|
------------------------
        3|    27|    42|    69
         |  26.4|  42.6|
------------------------
        4|    18|    13|    31
         |  11.9|  19.1|
------------------------
TOTAL         67    108    175
            38.3   61.7  100.0
Statistic                    DF    Value   p-value
--------------------------------------------------
Chi-Square                    3    7.905     0.049
Phi Coefficient                     .213
Cramer's V                          .213
Contingency Coefficient             .208
The calculated Chi-Square value is 7.905 with 3 degrees of freedom. The p-value of 0.049 indicates marginal significance at the 0.05 level. Assuming the SEX code is 1=Female and 2=Male, the table shows that in the highest rank (4) the observed count for females is above the expected count (18 observed versus 11.9 expected) and the observed count for males is below it (13 observed versus 19.1 expected). Departures like this between observed and expected counts might indicate that sex plays a role in how professors are promoted in rank.
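The expected values and summary statistics in this output can be checked by hand. The following is a minimal Python sketch, not SDA's own code: the helper name `chi_square_table` is hypothetical, and the formulas are the standard textbook ones (expected count = row total × column total / grand total), which are assumed to match what SDA computes.

```python
import math

def chi_square_table(observed):
    """Return expected counts, Pearson chi-square, df and association measures."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    k = min(len(row_totals), len(col_totals)) - 1
    return {
        "expected": expected,
        "chi2": chi2,
        "df": df,
        "phi": math.sqrt(chi2 / n),
        "cramers_v": math.sqrt(chi2 / (n * k)),
        "contingency": math.sqrt(chi2 / (chi2 + n)),
    }

# RANK (rows 1-4) by SEX (columns 1-2), counts copied from the output above
rank_by_sex = [[7, 20], [15, 33], [27, 42], [18, 13]]
result = chi_square_table(rank_by_sex)
print(round(result["chi2"], 3), result["df"])
print([round(e, 1) for e in result["expected"][3]])  # expected counts for rank 4
```

The sketch stops at the test statistic. Turning it into a p-value requires the chi-square distribution with the given degrees of freedom, which is left to the program's output here.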
Question: What do the differences between expected and observed counts in rank=1 indicate?
This is a test of independence. For this analysis, the contingency table looks at two categorical variables from a single sample of one population and tests whether the two variables are related in some way (e.g., are sex and rank related?). The hypotheses being tested are:
Ho: The variables are independent of each other. (There is no association between them).
Ha: The variables are not independent of each other.
If there is no evidence of an association between them (the p-value is greater than 0.05), there is no evidence of bias. A low p-value indicates rejection of the null hypothesis and, in this case, suggests bias.
WINKS SDA reports both the chi-square statistic and the p-value. If the expected value in one or more cells is less than 5, the chi-square test may not be valid. A warning to this effect appears on the screen if appropriate. In the case of a 2 by 2 table, Fisher's Exact Test and the chi-square with Yates' correction are also performed and results displayed. Note: Tables as large as 15 columns by 100 rows may be created by reading data from a data set. If there are more categories than this, SDA combines remaining categories in a group called REST. To prevent this, you might combine some groups.
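To make the difference between a raw data set (one record per observation) and summarized counts concrete, the following Python sketch expands the RANK by SEX counts from Example 1 into individual records, tallies them back into a table, and then looks for cells with an expected value below 5. It only illustrates the idea and is not how SDA is implemented; the variable names are hypothetical.

```python
from collections import Counter

# Summarized counts from the RANK by SEX table: (rank, sex) -> count
counts = {(1, 1): 7, (1, 2): 20, (2, 1): 15, (2, 2): 33,
          (3, 1): 27, (3, 2): 42, (4, 1): 18, (4, 2): 13}

# Rebuild a raw data set with one (rank, sex) record per observation
raw_records = [cell for cell, k in counts.items() for _ in range(k)]

# Tallying the raw records recovers the summarized counts
tally = Counter(raw_records)

row_totals = Counter()
col_totals = Counter()
for (rank, sex), k in tally.items():
    row_totals[rank] += k
    col_totals[sex] += k
n = sum(tally.values())

# Cells whose expected count (row total * column total / n) falls below 5
small_cells = [
    (rank, sex)
    for rank in row_totals
    for sex in col_totals
    if row_totals[rank] * col_totals[sex] / n < 5
]
print(len(raw_records), dict(tally) == counts, small_cells)
```

For this table the list of small cells is empty, which is consistent with the Example 1 output showing no warning.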
Example 2: Entering Data from the Keyboard
(Analyze/Crosstabs, Frequencies, Chi-Square/ Crosstabulations/ Chi-Square – From Keyboard)
Data for this example are observations of the number of beetles and bugs on the upper and lower sides of leaves (Zar, 1974, page 292).
2 by 2 Contingency Table Data
              Beetles    Bugs
Upper Leaf         12       7
Lower Leaf          2       8
To perform this analysis, follow these steps:
Step 1: Select Analyze/Crosstabulations, Frequencies, Chi-Square/ Crosstabulations, Chi-Square - From Keyboard.
Step 2: You are first prompted to select output options. For this example, just select Frequencies. You are then prompted to indicate the size of the table. When asked for the number of rows and columns, type 2, 2 and press Enter. An empty table appears. Enter counts for each category into the appropriate cell, and choose Calculate. Preliminary results appear on the status bar at the bottom of the screen. You can perform calculations on several tables, and all results will appear in the viewer when you select Exit.
2-Way Contingency Table
FREQUENCY|
         |      |      | TOTAL
------------------------
         |    12|     7|    19
------------------------
         |     2|     8|    10
------------------------
TOTAL         14     15     29
            48.3   51.7  100.0
WARNING - Some Expected values less than 5. Chi-Square may not be valid.
Statistic                    DF    Value   p-value
--------------------------------------------------
Chi-Square                    1    4.887     0.028
Yates' Chi-Square             1    3.312     0.069
Fisher's Exact Test (one-tail)               0.033
                    (two-tail)               0.050
Phi Coefficient                     .411
Cramer's V                          .411
Contingency Coefficient             .380
Relative Risk                      3.158
Odds Ratio                         6.857   95% C.I.=(1.124,41.829)
Sensitivity                         .857
Specificity                         .533
Sensitivity, Specificity and RR calculations are based on a table where the
cells are in the following pattern:
TP FP
FN TN
Step 3: The calculated chi-square statistic is reported as 4.89 with a p-value of 0.028. The chi-square with Yates' correction is 3.31 with a p-value of 0.069, and Fisher's Exact Test (two-tailed) has a p-value of 0.050. Because one of the cells produces an expected value less than 5, SDA warns that the chi-square analysis for these data may not be valid. Given this warning, it is best to rely on Fisher's Exact Test for making a decision.
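For readers who want to see where these 2 by 2 statistics come from, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not SDA's implementation: the function name `two_by_two_tests` is hypothetical, the one-tail value is taken as the smaller of the two tail probabilities, and the two-tail value sums the probabilities of all tables (with the same margins) that are no more likely than the observed one.

```python
import math

def two_by_two_tests(a, b, c, d):
    """Pearson and Yates chi-square plus Fisher's exact p-values for [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    denom = r1 * r2 * c1 * c2
    pearson = n * (a * d - b * c) ** 2 / denom
    # Yates' correction shrinks |ad - bc| by n/2 (never below zero)
    yates = n * max(abs(a * d - b * c) - n / 2, 0) ** 2 / denom

    # Hypergeometric probability of each table with the same margins,
    # indexed by the value x of the top-left cell
    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    probs = {x: prob(x) for x in range(lo, hi + 1)}
    upper = sum(p for x, p in probs.items() if x >= a)
    lower = sum(p for x, p in probs.items() if x <= a)
    one_tail = min(upper, lower)
    # Two-tail: total probability of tables no more likely than the observed one
    cutoff = probs[a] * (1 + 1e-9)  # small tolerance for floating-point ties
    two_tail = sum(p for p in probs.values() if p <= cutoff)
    return pearson, yates, one_tail, two_tail

# Beetles/bugs table: upper leaf (12, 7), lower leaf (2, 8)
pearson, yates, one_tail, two_tail = two_by_two_tests(12, 7, 2, 8)
print(round(pearson, 3), round(yates, 3), round(one_tail, 3), round(two_tail, 3))
```

Fisher's test enumerates exact table probabilities instead of relying on the chi-square approximation, which is why it remains usable when expected counts are small.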
A low p-value indicates rejection of the null hypothesis. At the 0.05 significance level, the Fisher's Exact Test two-tailed p-value of 0.050 is borderline evidence for rejecting the null hypothesis of independence and concluding that leaf side and type of insect are not independent. In this case it appears that beetles prefer the upper sides of leaves (12 upper versus 2 lower), while bugs are split about evenly (7 upper versus 8 lower). The Yates-corrected chi-square (p-value of 0.069) does not reach significance at the 0.05 level, so the decision is marginal.
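The relative risk, odds ratio, sensitivity and specificity lines in the output above follow from the TP/FP/FN/TN cell pattern. The Python sketch below shows one way to obtain them; the function name `risk_measures` is hypothetical, and the confidence interval uses the common log-odds (Woolf) formula with z = 1.96, which is an assumption about the method rather than something the output states.

```python
import math

def risk_measures(tp, fp, fn, tn, z=1.96):
    """Measures for a 2 by 2 table laid out as [[TP, FP], [FN, TN]]."""
    # Relative risk compares the first-column proportion in row 1 with row 2
    relative_risk = (tp / (tp + fp)) / (fn / (fn + tn))
    odds_ratio = (tp * tn) / (fp * fn)
    # Woolf (log) interval: ln(OR) +/- z * sqrt(1/a + 1/b + 1/c + 1/d)
    se_log_or = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    ci = (odds_ratio * math.exp(-z * se_log_or),
          odds_ratio * math.exp(z * se_log_or))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return relative_risk, odds_ratio, ci, sensitivity, specificity

# Beetles/bugs table read as TP=12, FP=7, FN=2, TN=8
rr, odds, ci, sens, spec = risk_measures(12, 7, 2, 8)
print(round(rr, 3), round(odds, 3), [round(x, 3) for x in ci],
      round(sens, 3), round(spec, 3))
```

The sketch divides by cell counts, so a table containing a zero cell would need separate handling.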
Example 3: Entering Data from a Count Data Set
(Analyze/Crosstabs, Frequencies, Chi-Square/ Crosstabulations/ Chi-Square – from count data)
The following data are from a classic study from 1909 reported by Karl Pearson that observed the association between drinking and criminal behavior.
Step 1: Open CROSSTAB_COUNTS.SAV and select Analyze, Crosstabs, Frequencies, Chi-Square, Crosstab/Chi-Square (From count data).
Step 2: Select CRIME as the row variable, DRINKER as the column variable, and COUNT as the count variable. Click Ok.
Step 3: From the Options menu select Frequency and Standardized Residual. Click Ok. The following (partial) output is displayed (similar to Example 1):
CRIME(C)(rows) by DRINKER(N) (columns)
FREQUENCY|   YES|    NO| TOTAL
------------------------
    ARSON|    50|    43|    93
------------------------
     RAPE|    88|    62|   150
------------------------
 VIOLENCE|   155|   110|   265
------------------------
 STEALING|   379|   300|   679
------------------------
  COINING|    18|    14|    32
------------------------
    FRAUD|    63|   144|   207
------------------------
TOTAL        753    673   1426
            52.8   47.2  100.0
Typical hypotheses tested include:
Test of independence: Ho: There is no association between the two variables.
or Test of homogeneity: Ho: The distribution across categories is the same in each population.
Statistic                    DF    Value   p-value
--------------------------------------------------
Chi-Square                    5   49.731    <0.001
Likelihood Ratio Chi-Square   5   50.517    <0.001
Phi Coefficient                     .187
Cramer's V                          .187
Contingency Coefficient             .184
Since p<=0.05, the null hypothesis (of independence or homogeneity)
is rejected and multiple comparisons are performed.
continues...
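Residuals help show which cells drive a significant chi-square. The Python sketch below computes them for the CRIME by DRINKER counts shown above. It uses the common textbook definitions, which are an assumption here and may differ from what SDA labels "Standardized Residual" and "Adjusted Residual": standardized = (observed - expected) / sqrt(expected), and adjusted additionally divides by sqrt((1 - row proportion)(1 - column proportion)).

```python
import math

crimes = ["ARSON", "RAPE", "VIOLENCE", "STEALING", "COINING", "FRAUD"]
drinker = ["YES", "NO"]
observed = [[50, 43], [88, 62], [155, 110], [379, 300], [18, 14], [63, 144]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

standardized = {}
adjusted = {}
for i, crime in enumerate(crimes):
    for j, answer in enumerate(drinker):
        e = row_totals[i] * col_totals[j] / n  # expected count under Ho
        diff = observed[i][j] - e
        standardized[crime, answer] = diff / math.sqrt(e)
        adjusted[crime, answer] = diff / math.sqrt(
            e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))

# The squared standardized residuals add up to the Pearson chi-square
chi2 = sum(r ** 2 for r in standardized.values())
largest = max(standardized, key=lambda cell: abs(standardized[cell]))
print(round(chi2, 3), largest, round(standardized[largest], 2))
```

In this table the FRAUD row has by far the largest residuals, so it accounts for most of the departure from independence.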
Click the Graph option at the top of the screen to display the graph grouped by Drinker within Crime.
End of tutorial
For more information, including an explanation of the options, go to the next tutorial.