# Pearson's Correlation Coefficient

This is one in a series of tutorials using examples from WINKS SDA.

Definition: Measures the strength of the linear relationship between two variables.

Assumptions: Both variables (often called X and Y) are interval/ratio and approximately normally distributed, and their joint distribution is bivariate normal.

Characteristics: The sample Pearson correlation coefficient is usually denoted r (the corresponding population parameter is denoted by the Greek letter rho) and takes values from -1.0 to 1.0, where -1.0 is a perfect negative (inverse) correlation, 0.0 is no linear correlation, and 1.0 is a perfect positive correlation.

Related statistics: R2 (called the coefficient of determination, or r squared) can be interpreted as the proportion of the variance in Y that is explained by its linear relationship with X.
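As a rough illustration of how r and R2 relate, the following Python sketch computes both directly from the usual sums-of-squares definition. The function name `pearson_r` and the five data points are hypothetical and chosen only for illustration; they are not taken from WINKS.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
r_squared = r ** 2
print(f"r = {r:.4f}, R-squared = {r_squared:.4f}")
# prints: r = 0.7746, R-squared = 0.6000
```

For these toy values, about 60% of the variance in y is accounted for by the linear relationship with x.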

Tests: The statistical significance of r is tested using a t-test. The hypotheses for this test are:

H0: rho = 0
Ha: rho <> 0

A low p-value for this test (less than 0.05, for example) means that there is evidence to reject the null hypothesis in favor of the alternative hypothesis; that is, there is a statistically significant linear relationship between the two variables.

Note: This test is equivalent to the test of no slope in the simple linear regression procedure.
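The equivalence can be checked numerically. The sketch below, which assumes the standard formulas t = r * sqrt(n - 2) / sqrt(1 - r^2) for the correlation test and t = slope / SE(slope) for the regression test, computes both statistics on hypothetical toy data; the function name `correlation_and_slope_t` is illustrative only.

```python
import math

def correlation_and_slope_t(x, y):
    """Return (t from r, t from the regression slope) for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    # t statistic for H0: rho = 0, with n - 2 degrees of freedom
    r = sxy / math.sqrt(sxx * syy)
    t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

    # t statistic for H0: slope = 0 in the least-squares fit of y on x
    slope = sxy / sxx
    sse = syy - slope * sxy  # residual sum of squares
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_slope = slope / se_slope
    return t_r, t_slope

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_r, t_slope = correlation_and_slope_t(x, y)
print(f"t from r = {t_r:.4f}, t from slope = {t_slope:.4f}")
# prints: t from r = 2.1213, t from slope = 2.1213
```

Both statistics have n - 2 degrees of freedom, so they lead to the same p-value.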

Location in WINKS: Pearson's correlation coefficient is found in the following locations:

1. Regression and Correlation - The Correlation procedure produces both Pearson and Spearman Correlation coefficients. The t-test for statistical significance of r is calculated. R2 is also reported.

2. Regression and Correlation - The Simple linear regression reports the Pearson correlation coefficient and the t-test. R2 is also reported.

3. Regression and Correlation - The Correlation Matrix procedure produces a matrix of correlations for a number of pairs of variables at a time, and includes the p-value for the test of significance of r.

Graphs: An important part of interpreting r is to observe a scatterplot of the data. Scatterplots are available from the Graphs option, as a part of Simple Linear Regression and in the Graphical Correlation Matrix option in Regression and Correlation.

Example: Use the Correlation procedure to calculate r for the two variables HP (horsepower) and WEIGHT in the WINKS "CAR" database. The results from WINKS (in part) are:

Variables used : HP and WEIGHT

Number of cases used: 38

Pearson's r (Correlations Coefficient) = 0.9172

R-Square = 0.8413

Test of hypothesis to determine significance of relationship:

H(null): Slope = 0 or H(null): r = 0 (Pearson's)

t = 13.81425 with 36 d.f. p < 0.001

(A low p-value implies that the slope does not = 0.)

Spearman's Rank Correlation Coefficient = 0.9071

(Spearman's) t = 12.93361 with 36 d.f. p < 0.001

A scatterplot of this data shows the positive correlation: cars with higher horsepower tend to weigh more.

An example of writing up these results:

Narrative: "An evaluation was made of the linear relationship between horsepower and vehicle weight using Pearson's correlation."

Results: "An analysis using Pearson's correlation coefficient indicates a statistically significant linear relationship between horsepower and vehicle weight, r(36) = 0.92, p < 0.001. For these data, the mean (SD) for horsepower is 101.7 (26.4) and for weight 2.86 (0.71)."
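The degrees of freedom, t statistic, and R-Square in the WINKS output can be reproduced approximately from the reported r and the number of cases. The short Python sketch below assumes the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2); because it starts from r rounded to four decimals, the result matches the WINKS value of 13.81425 only to about two decimal places.

```python
import math

r = 0.9172  # Pearson's r as reported in the WINKS output (rounded)
n = 38      # number of cases used

df = n - 2                                     # degrees of freedom
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)  # t statistic for H0: rho = 0
r_squared = r ** 2                             # coefficient of determination
print(f"df = {df}, t = {t:.2f}, R-squared = {r_squared:.4f}")
# prints: df = 36, t = 13.81, R-squared = 0.8413
```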

Warning: There is a temptation to infer cause and effect when observing a correlation. However, the ability to assign causality depends on the creation of an experiment specifically designed to provide this kind of inference.

Related topics: Spearman's Correlation Coefficient is the non-parametric counterpart to r. See also simple linear regression, multiple regression, and polynomial regression.

Exercise - Correlation

At the beginning of an introductory engineering course, 10 students were given a pre-test to determine their initial mathematical ability. The following table lists each student's pre-test score and final grade in the class:

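| Student Number | Pre-Test | Course Grade |
|---|---|---|
| 1 | 45 | 92 |
| 2 | 23 | 86 |
| 3 | 50 | 97 |
| 4 | 46 | 95 |
| 5 | 33 | 87 |
| 6 | 21 | 76 |
| 7 | 13 | 72 |
| 8 | 30 | 84 |
| 9 | 34 | 85 |
| 10 | 50 | 98 |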

1. Calculate Pearson's Correlation Coefficient (r) on this data.

r =

2. What statistical test is used to determine if this value of r is statistically significant?

3. Is the correlation seen in this data statistically significant? Why?

4. Display a scatterplot of the data. Does the data appear linearly correlated? Do there seem to be any outlier values?

5. Suppose an 11th student were added to the data, with a pre-test score of 40 and a Course Grade of 70. How would this affect r?

