Crosstabulations can be used to perform a chi-square test for independence or a chi-square test for homogeneity. A two-way table is constructed that displays the number of counts for each category. It must be possible to assume that the data observations are independent and that each data value can be counted in one and only one category. It is also assumed that the number of observations is fixed. SDA allows you to enter data for a two-way table from the keyboard or from a data set.
Example 1: Entering Data from a Data Set (Analyze/Crosstabs, Frequencies, Chi-Square/ Crosstabulations/ Chi-Square)
If you choose to enter the information from a data set, you will be prompted to indicate which tables are to be calculated. Select one or more fields for the “Data field” (top right-hand list box) and select one or more fields for the “By Var” field (bottom right-hand list box). For example, open the data file SALARY.SDA (salaries of professors at a college) and produce a table of RANK by SEX.
Step 1: Select Analyze/Crosstabulations, Frequencies, Chi-Square/Crosstabulations, Chi-Square.
Step 2: For the variables to use, select Rank and Sex. For all tables, you are prompted to specify which output options you want included in the output tables:
· Frequencies only
· Include Expected Values
· Include Expected Values and Percents
· Include Expected Values, Chi-Contribution and Percents
· Include Percents
· Include Expected Values and Chi-Contribution
For this example, select the “Include Expected Values” option. Click OK and the following output is produced:
FREQUENCY|
EXPECTED |       1|       2|   TOTAL
------------------------------------
        1|       7|      20|      27
         |    10.3|    16.7|
------------------------------------
        2|      15|      33|      48
         |    18.4|    29.6|
------------------------------------
        3|      27|      42|      69
         |    26.4|    42.6|
------------------------------------
        4|      18|      13|      31
         |    11.9|    19.1|
------------------------------------
TOTAL           67      108      175
              38.3     61.7    100.0

Statistic                  DF    Value    p-value
-------------------------------------------------
Chi-Square                  3    7.905      0.049
Phi Coefficient                   .213
Cramer's V                        .213
Contingency Coefficient           .208
The calculated Chi-Square value is 7.905 with 3 degrees of freedom. The p-value of 0.049 indicates marginal significance. Assuming the SEX code is 1=Female and 2=Male, you can see that in the highest rank (4) there were more females than expected (18 observed versus 11.9 expected) and fewer males (13 observed versus 19.1 expected). Differences like these between observed and expected counts might indicate a gender bias in how professors are promoted in rank.
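The arithmetic behind the output above can be sketched in a few lines of Python. This is only an illustration of the standard Pearson chi-square formulas, not SDA's own code; the function names are hypothetical, and the closed-form p-value used here applies only to 3 degrees of freedom, so it may differ in the last printed digit from the value SDA reports.

```python
import math

# Observed counts from the RANK by SEX table (rows = RANK 1-4, columns = SEX 1-2).
observed = [
    [7, 20],
    [15, 33],
    [27, 42],
    [18, 13],
]

def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_p_value_df3(stat):
    """Upper-tail probability of a chi-square variable with exactly 3 df (closed form)."""
    return math.erfc(math.sqrt(stat / 2)) + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2)

stat, df = chi_square(observed)
n = sum(map(sum, observed))
# Cramer's V divides by the smaller of (rows - 1) and (columns - 1).
cramers_v = math.sqrt(stat / (n * min(len(observed) - 1, len(observed[0]) - 1)))
contingency_coeff = math.sqrt(stat / (stat + n))

print(f"Chi-Square = {stat:.3f}, DF = {df}, p = {chi_square_p_value_df3(stat):.3f}")
print(f"Cramer's V = {cramers_v:.3f}, Contingency Coefficient = {contingency_coeff:.3f}")
```

Comparing each entry of `expected_counts(observed)` with the observed count in the same cell is one way to see which cells contribute most to the statistic.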
Question: What do the differences between expected and observed counts in rank=1 indicate?
This is a test of independence. For this analysis, the contingency table looks at two categorical variables from a single sample of one population and tests whether the two variables are related in some way (e.g., are sex and rank related?). The hypotheses being tested are:
Ho: The variables are independent of each other. (There is no association between them.)
Ha: The variables are not independent. (There is an association between them.)
If there is no association between them (p is greater than 0.05), there is no evidence of bias. A low p-value indicates rejection of the null hypothesis and, in this case, implies bias.
SDA reports both the chi-square statistic and the p-value. If the expected value in one or more cells is less than 5, the chi-square test may not be valid. A warning to this effect appears on the screen if appropriate. In the case of a 2 by 2 table, Fisher's Exact Test and the chi-square with Yates' correction are also performed and results displayed. Note: Tables as large as 15 columns by 100 rows may be created by reading data from a data set. If there are more categories than this, SDA combines remaining categories in a group called REST. To prevent this, you might combine some groups.
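The check behind the small-expected-value warning can be sketched as follows. This is a minimal illustration of the "expected value less than 5" rule stated above, not SDA's implementation; the function name and the `threshold` parameter are hypothetical.

```python
def cells_with_low_expected(table, threshold=5.0):
    """Return (row index, column index, expected count) for each cell below threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    low = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n  # expected count under independence
            if e < threshold:
                low.append((i, j, e))
    return low

# RANK by SEX counts from Example 1
rank_by_sex = [[7, 20], [15, 33], [27, 42], [18, 13]]

if cells_with_low_expected(rank_by_sex):
    print("Some expected values are less than 5; the chi-square test may not be valid.")
else:
    print("All expected values are at least 5.")
```

When the function returns a non-empty list, combining sparse categories before running the test is one common remedy.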
Example 2: Entering Data from the Keyboard (Analyze/Crosstabs, Frequencies, Chi-Square/ Crosstabulations/ Chi-Square – From Keyboard)
Data for this example are observations of the number of beetles and bugs on the upper and lower sides of leaves (Zar, 1974, page 292).
2 by 2 Contingency Table Data
To perform this analysis, follow these steps:
Step 1: Select Analyze/Crosstabulations, Frequencies, Chi-Square/Crosstabulations, Chi-Square - From Keyboard.
Step 2: You are first prompted to select output options. For this example, just select Frequencies. You are then prompted to indicate the size of the table. When asked for the number of rows and columns, type 2, 2 and press Enter. An empty table appears. Enter counts for each category into the appropriate cell, and choose Calculate. Preliminary results appear on the status bar at the bottom of the screen. You can perform calculations on several tables, and all results will appear in the viewer when you select Exit.
2-Way Contingency Table

FREQUENCY|        |        |   TOTAL
------------------------------------
         |      12|       7|      19
------------------------------------
         |       2|       8|      10
------------------------------------
TOTAL           14       15       29
              48.3     51.7    100.0

WARNING - Some Expected values less than 5. Chi-Square may not be valid.

Statistic                        DF    Value    p-value
-------------------------------------------------------
Chi-Square                        1    4.887      0.028
Yates' Chi-Square                 1    3.312      0.069
Fisher's Exact Test (one-tail)                    0.033
                    (two-tail)                    0.050
Phi Coefficient                         .411
Cramer's V                              .411
Contingency Coefficient                 .380
Relative Risk                          3.158
Odds Ratio                             6.857    95% C.I.=(1.124,41.829)
Sensitivity                             .857
Specificity                             .533

Sensitivity, Specificity and RR calculations are based on a table where the cells are in the following pattern:
TP  FP
FN  TN
T=True, F=False, P=Positive, N=Negative
Step 3: The calculated chi-square statistic is reported as 4.89 with a p-value of 0.028. The chi-square with Yates' correction is 3.31 with a p-value of 0.069, and Fisher's Exact Test (two-tailed) has a p-value of 0.050. Because one of the cells produces an expected value less than 5, SDA gives a warning that the chi-square analysis for these data may not be valid. Given this warning, it is best to rely on Fisher's Exact Test for making a decision.
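The three tests reported for the 2 by 2 table can be reproduced approximately with the sketch below. It uses the textbook shortcut formula for the chi-square, the usual Yates continuity correction, and the hypergeometric definition of Fisher's Exact Test with the "sum of probabilities no larger than the observed one" two-tail rule. Whether SDA uses exactly these conventions is an assumption, the function names are hypothetical, and the closed-form p-values may differ from SDA's printed values in the third decimal.

```python
from math import comb, erfc, sqrt

# Counts entered in the 2 by 2 table (first row, then second row).
a, b = 12, 7
c, d = 2, 8

def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2x2 table via the shortcut formula; optionally Yates-corrected."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)  # continuity correction
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))

def fisher_exact(a, b, c, d):
    """One-tail and two-tail Fisher p-values from the hypergeometric distribution."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (c + d)), min(row1, col1)
    # Unnormalised integer weights, so the comparisons below are exact.
    weight = {k: comb(row1, k) * comb(n - row1, col1 - k) for k in range(lo, hi + 1)}
    total = comb(n, col1)
    upper = sum(w for k, w in weight.items() if k >= a)
    lower = sum(w for k, w in weight.items() if k <= a)
    one_tail = min(upper, lower) / total
    two_tail = sum(w for w in weight.values() if w <= weight[a]) / total
    return one_tail, two_tail

chi2 = chi_square_2x2(a, b, c, d)
chi2_yates = chi_square_2x2(a, b, c, d, yates=True)
one_tail, two_tail = fisher_exact(a, b, c, d)

print(f"Chi-Square        = {chi2:.3f}, p = {p_value_df1(chi2):.3f}")
print(f"Yates' Chi-Square = {chi2_yates:.3f}, p = {p_value_df1(chi2_yates):.3f}")
print(f"Fisher's Exact Test: one-tail = {one_tail:.3f}, two-tail = {two_tail:.3f}")
```

Keeping the hypergeometric weights as integers avoids floating-point ties when deciding which tables are "as extreme" as the observed one.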
A low p-value indicates rejection of the null hypothesis. At a 0.05 significance level, the Fisher's Exact Test p-value of 0.050 gives borderline evidence for rejecting the null hypothesis of independence of the two variables and concluding that leaf side and type of insect are not independent. In this case it appears that beetles prefer the upper sides of leaves, while bugs are about evenly split in their preference. Based on the Yates results (p-value of 0.069), this decision is marginal.
Notes on 2x2 Table Statistics
Several other specialized statistics are provided for 2x2 tables. They do not apply to the previous example, but are typically used in a medical setting where the variables are often exposure to a disease (or some stimulus) versus the observation of some outcome, for example smoking vs cancer or exposure to a chemical vs pulmonary disease. When your data are in that setting, the risk statistics given by Relative Risk, Odds Ratio, Sensitivity and Specificity may be useful.
Note: Sensitivity and Specificity calculations are only available on the crosstabulations with data entered from the keyboard. Results are based on a table where the cells are in the following pattern:
TP | FP
FN | TN
Where T=True, F=False, P=Positive, and N=Negative.
Specificity is a measure of the ability to correctly call negative those patients/subjects who do not have the disease or condition. For more information on these four statistics, reference any biostatistics or epidemiology text. One example is Basic Biostatistics in Medicine and Epidemiology, A. A. Rimm, Appleton-Century-Crofts, 1980.
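As a rough illustration of how these four statistics follow from the TP/FP/FN/TN layout, the sketch below reuses the counts from the keyboard example purely to check the arithmetic against the printed output. The row-wise form of the relative risk and the log-based (Woolf) confidence interval are assumptions chosen because they reproduce the printed values; they are not confirmed to be SDA's definitions.

```python
from math import exp, log, sqrt

# Cell layout assumed by the text:  TP FP
#                                   FN TN
tp, fp = 12, 7
fn, tn = 2, 8

# Share of subjects with the condition that are called positive.
sensitivity = tp / (tp + fn)
# Share of subjects without the condition that are called negative.
specificity = tn / (tn + fp)

# Assumed row-wise relative risk: first-row proportion over second-row proportion.
relative_risk = (tp / (tp + fp)) / (fn / (fn + tn))

# Cross-product (odds) ratio.
odds_ratio = (tp * tn) / (fp * fn)

# Approximate 95% confidence interval for the odds ratio on the log scale.
se_log_or = sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
ci_low = exp(log(odds_ratio) - 1.96 * se_log_or)
ci_high = exp(log(odds_ratio) + 1.96 * se_log_or)

print(f"Sensitivity   = {sensitivity:.3f}")
print(f"Specificity   = {specificity:.3f}")
print(f"Relative Risk = {relative_risk:.3f}")
print(f"Odds Ratio    = {odds_ratio:.3f}  95% C.I.=({ci_low:.3f},{ci_high:.3f})")
```

Because all four statistics depend on which cell is treated as TP, the rows and columns must be entered in the stated pattern for the results to be meaningful.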
Example 3: Crosstabulation – Homogeneity Hypothesis (Analyze/Crosstabs, Frequencies, Chi-Square/ Crosstabulations/ Chi-Square)
A crosstabulations analysis may also be used to perform a chi-square test for homogeneity. In this case the two-way table looks at variables in samples from two populations and tests whether the variables follow the same distribution for both populations, that is, whether the populations are homogeneous.
The test statistic and procedure are the same as for the chi-square test for independence. The data are organized so that the populations being tested for homogeneity are represented as values (groups) of a grouping variable. The hypotheses being tested in this case are:
Ho: The populations are homogeneous.
Ha: The populations are not homogeneous.
To use the crosstabulations procedure to test for homogeneity, let the rows of the table represent the categories of the variable and the columns the different populations. (Of course, for test purposes it doesn't matter whether the rows or columns represent the populations.) If entering the data from the keyboard, simply enter the totals for each category.
If creating a data set, you need a field for the categorical variable you wish to use in the test for homogeneity, and a grouping variable which identifies the population from which each observation comes. Each record represents one observation. There may be other fields in the data set for other variables in these same populations and you can do separate crosstabulation analyses on them.
Suppose you want to check whether the ratio of men and women is the same in three different departments of a company. You obtain the following data:
The calculated chi-square statistic in this case is 2.30 with a p-value of 0.317. A decision can be made by comparing this p-value with the chosen significance level: a p-value below that level is usually taken to indicate rejection of the null hypothesis. At a 0.05 significance level, the p-value of 0.317 indicates that there is not enough evidence to reject the null hypothesis of homogeneity of the three departments. That is, you cannot conclude that the departments are significantly different with respect to sex of employees based on this test.
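The data set layout described above (one record per observation, with a categorical field and a grouping field) can be tabulated into a two-way table as sketched below. The records here are hypothetical and are NOT the department data from this example; the function names are also illustrative. The last lines check only the reported result: with three departments and two sexes the test has (3-1)(2-1) = 2 degrees of freedom, for which the chi-square upper-tail probability has the closed form exp(-x/2).

```python
from collections import Counter
from math import exp

# Hypothetical records: (sex, department). Not the data from the example.
records = (
    [("M", "A")] * 20 + [("F", "A")] * 10
    + [("M", "B")] * 15 + [("F", "B")] * 15
    + [("M", "C")] * 18 + [("F", "C")] * 12
)

def crosstab(records):
    """Count records into a table: rows = categories, columns = populations."""
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - r * c / n) ** 2 / (r * c / n)
        for i, r in enumerate(row_totals)
        for j, c in enumerate(col_totals)
    )

def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return exp(-stat / 2)

rows, cols, table = crosstab(records)
stat = chi_square(table)
print(rows, cols, table)
print(f"Hypothetical data: Chi-Square = {stat:.3f}, p = {p_value_df2(stat):.3f}")

# Check of the reported result: statistic 2.30 with 2 degrees of freedom.
print(f"Reported statistic 2.30 -> p = {p_value_df2(2.30):.3f}")
```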
© Copyright TexaSoft, 2007