Correlation Coefficient Formula - TRUNG TÂM GIA SƯ TÂM TÀI ĐỨC

Correlation Coefficient

The correlation coefficient is a statistical measure of the strength and direction of the relationship between two variables, for example between the predicted and actual values obtained in a statistical experiment. Its value indicates how closely the two sets of values move together.

The correlation coefficient always lies between -1 and +1. A positive value means the two variables tend to increase together; a negative value means one variable tends to decrease as the other increases. A value of zero means there is no linear relationship between the two variables.

The covariance of two variables divided by the product of their standard deviations gives Pearson’s correlation coefficient. It is usually represented by ρ (rho).

$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

Here, cov is the covariance. σX is the standard deviation of X, and σY is the standard deviation of Y. The given equation for the correlation coefficient can be expressed in terms of means and expectations:

$$\rho(X, Y) = \frac{E[(X - \mu_x)(Y - \mu_y)]}{\sigma_X \sigma_Y}$$

μx and μy are the mean of X and the mean of Y, respectively. E is the expectation.

Assumptions of Karl Pearson’s Correlation Coefficient

The assumptions and requirements for calculating Pearson’s correlation coefficient are as follows:

1. The data set which is to be correlated should approximate the normal distribution. If the data is normally distributed, then the data points tend to lie closer to the mean.

2. ‘Homoscedastic’ comes from Greek words meaning ‘same’ and ‘able to disperse’. Homoscedasticity means ‘equal variances’: for all the values of the independent variable, the variance of the error term is the same. Suppose the errors are smaller for a certain set of values of the independent variable and larger for another set of values; then, homoscedasticity is violated. It can be checked visually through a scatter plot. The data is said to be homoscedastic if the points are spread evenly around the line of best fit along its whole length.

3. The data should follow a linear relationship. If the data points fall roughly along a straight line on the scatter plot, then the data satisfies the condition of linearity.

4. The variables which can take any value in an interval are continuous variables. The data set must contain continuous variables to compute the Pearson correlation coefficient. If one of the data sets is ordinal, then Spearman’s rank correlation is an appropriate measure.

5. The data points must be in pairs which are termed paired observations. There exists a dependent variable for every observation of the independent variable.

6. There should be no outliers in the data. If outliers are present, they can skew the correlation coefficient and make it unrepresentative. A point is commonly considered an outlier if it lies more than 3.29 standard deviations above or below the mean. Outliers can often be spotted visually on a scatter plot.
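The 3.29 standard deviation rule for outliers can be sketched in Python as follows. This is only an illustration: the function name `zscore_outliers`, the use of the population standard deviation, and the sample data are assumptions, not part of the original article.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.29):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation (an assumption of this sketch)
    if s == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - m) / s > threshold]


# Hypothetical data: nineteen values at 10 and one extreme value
data = [10.0] * 19 + [100.0]
print(zscore_outliers(data))
```

With very small data sets no point can reach a z-score of 3.29, so a rule like this is only meaningful for reasonably large samples.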

Pearson Correlation Coefficient Formula

The linear correlation coefficient defines the degree of relation between two variables and is denoted by “r”. It is also called a cross-correlation coefficient, as it predicts the relation between two quantities. Now, let us proceed to a statistical way of calculating the correlation coefficient.

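If x and y are the two variables under discussion, then the correlation coefficient can be calculated using the formula

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Here,

n = Number of pairs of values

∑x = Sum of the values of the first variable

∑y = Sum of the values of the second variable

∑xy = Sum of the products of the paired first and second values

∑x² = Sum of the squares of the first values

∑y² = Sum of the squares of the second values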
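A minimal Python sketch of this calculation, assuming the usual computational form r = (n∑xy − ∑x∑y) / √([n∑x² − (∑x)²][n∑y² − (∑y)²]). The function name `pearson_r` and the sample lists are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from the raw sums n, Σx, Σy, Σxy, Σx², Σy²."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator


# Hypothetical data, not taken from the article
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746
```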

How to Find the Correlation Coefficient

Correlation is used almost everywhere in statistics. Correlation illustrates the relationship between two or more variables. It is expressed in the form of a number that is known as the correlation coefficient. There are mainly three cases to distinguish:

Positive Correlation
Negative Correlation
Zero Correlation or No Correlation

Positive Correlation

The value of one variable increases with an increase in another variable. This indicates that both variables move in the same direction, so the correlation coefficient is positive. It equals +1 only when the relationship is perfectly linear.

Negative Correlation

When there is a decrease in the values of one variable with an increase in the values of another variable, in that case, the correlation coefficient would be negative.

Zero Correlation or No Correlation

There is one more situation, when there is no specific relation between two variables. In that case, the correlation coefficient is zero or close to zero.

Correlation Coefficient Properties

The correlation coefficient is all about establishing relationships between two variables. Some properties of the correlation coefficient are as follows:

1) The correlation coefficient does not depend on the units in which the two variables are measured.

2) The sign of the correlation coefficient is always the same as the sign of the covariance.

3) The numerical value of the correlation coefficient is a real number between -1 and +1, inclusive.

4) A negative value of the coefficient means that the correlation is negative. As ‘r’ approaches -1, the negative relationship becomes stronger.

When ‘r’ approaches +1, the relationship is strong and positive. If the correlation is exactly +1, the relationship is perfectly positive.

5) A weak correlation is signalled when the coefficient of correlation approaches zero.

6) A correlation coefficient is only as reliable as the data behind it. With survey data, for example, we cannot say whether the participants answered truthfully or not.

The coefficient of correlation is not affected when we interchange the two variables.

7) The coefficient of correlation is a pure number without any units. It is not affected when we add the same number to all the values of one variable, and it is not affected when we multiply all the values of one variable by the same positive number. In other words, ‘r’ is invariant to changes of origin and scale.
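These invariance properties can be checked numerically. The sketch below uses a hypothetical helper `pearson_r` (deviation form) and made-up data; it also shows that multiplying one variable by a negative number flips the sign of r, which is why the property is stated for positive multipliers only.

```python
from math import sqrt, isclose


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


# Hypothetical data, not taken from the article
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

base = pearson_r(xs, ys)
shifted = pearson_r([x + 100 for x in xs], ys)  # add a constant to x
scaled = pearson_r([3 * x for x in xs], ys)     # multiply x by a positive number
swapped = pearson_r(ys, xs)                     # interchange the variables
flipped = pearson_r([-2 * x for x in xs], ys)   # multiply x by a negative number

print(isclose(base, shifted), isclose(base, scaled), isclose(base, swapped))
print(isclose(flipped, -base))
```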

8) We use correlation for measuring association, but association does not imply causation. When two variables are correlated, it is possible that a third variable is influencing both of them.

What Is the Correlation Coefficient Formula?

The correlation coefficient is a statistical concept. It measures the relationship between two variables, for example between the predicted and actual values obtained at the end of a statistical experiment. The correlation coefficient formula calculates the strength of that relationship, and the result indicates how closely the predicted and actual values agree.

Pearson Correlation Coefficient Formula:

1. Population Correlation Coefficient

The formula for the Pearson correlation coefficient for a population of size N (written as ρ_{X, Y}) is given as:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

where cov(X, Y) is the covariance of X and Y, σ_X is the standard deviation of X and σ_Y is the standard deviation of Y.

Given X and Y are two random variables.

2. Sample Correlation Coefficient

The formula for the Pearson correlation coefficient for a sample of size n (written as r_xy) is given as:

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

where n is the sample size, x_i and y_i are the i^th sample points and x̄ and ȳ are the sample means for the random variables X and Y respectively.

Given X and Y are two random variables.

3. Linear Correlation Coefficient

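It uses Pearson’s correlation coefficient to determine the strength of the linear relationship between two variables. Its value lies between -1 and 1. It is given as:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

where n is the sample size, x_i and y_i are the i^th sample points and x̄ and ȳ are the sample means for the random variables x and y respectively.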

The sign of r indicates the direction of the linear relationship between the variables, and its magnitude indicates the strength.

If r is near 1, then the two variables have a strong positive linear relationship.
If r is near 0, then the two variables have no linear relation.
If r is near -1, then the two variables have a strong negative linear relationship.

Let us see the applications of the correlation coefficient formula in the following section.

Examples using Correlation Coefficient Formula

Example 1. Given the following population data, find the Pearson correlation coefficient between x and y. (Take 1/√7 as 0.378)

x	600	800	1000
y	1200	1000	2000

Solution:

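To simplify the calculation, we divide both x and y by 100, which does not change the correlation coefficient. This gives x = 6, 8, 10 and y = 12, 10, 20, so x̄ = 8 and ȳ = 14.

Using the correlation coefficient formula,

∑(x – x̄)(y – ȳ) = (–2)(–2) + (0)(–4) + (2)(6) = 16

∑(x – x̄)² = 4 + 0 + 4 = 8

∑(y – ȳ)² = 4 + 16 + 36 = 56

Pearson correlation coefficient for population = 16/√(8 × 56) = 16/(8√7) = 2 × (1/√7) = 2 × 0.378 = 0.756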

Answer: Pearson correlation coefficient = 0.756
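As a cross-check, the same example can be computed in Python from the population covariance and population standard deviations. The helper name `population_correlation` is an assumption of this sketch; `statistics.mean` and `statistics.pstdev` come from the standard library. The sketch also shows that dividing the data by 100 leaves the result unchanged.

```python
from statistics import mean, pstdev

x = [600, 800, 1000]
y = [1200, 1000, 2000]


def population_correlation(xs, ys):
    """rho = cov(X, Y) / (sigma_X * sigma_Y), using population (divide-by-N) formulas."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


rho = population_correlation(x, y)
rho_scaled = population_correlation([v / 100 for v in x], [v / 100 for v in y])
print(round(rho, 3), round(rho_scaled, 3))  # 0.756 0.756
```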

Example 2. A survey was conducted in your city. Given is the following sample data containing a person’s age and their corresponding income. Find out whether income tends to increase with age using the correlation coefficient formula. (Use 1/√181 as 0.074 and 1/√209 as 0.07)

Age	25	30	36	43
Income	30000	44000	52000	70000

Solution:

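To simplify the calculation, we divide y by 1000, giving y = 30, 44, 52, 70. Then x̄ = 33.5 and ȳ = 49.

∑(x – x̄)(y – ȳ) = 161.5 + 17.5 + 7.5 + 199.5 = 386

∑(x – x̄)² = 72.25 + 12.25 + 6.25 + 90.25 = 181

∑(y – ȳ)² = 361 + 25 + 9 + 441 = 836

Pearson correlation coefficient for sample = 386/√(181 × 836) = 193/(√181 × √209) ≈ 0.992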

Answer: Yes, with the increase in age a person’s income increases as well, since the Pearson correlation coefficient between age and income is very close to 1.
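The age and income example can be reproduced with a short Python sketch of the deviation form of the sample formula, r = ∑(x − x̄)(y − ȳ) / √(∑(x − x̄)² ∑(y − ȳ)²). The function name `sample_correlation` is hypothetical. The incomes are used unscaled here, since rescaling does not change r.

```python
from math import sqrt

ages = [25, 30, 36, 43]
incomes = [30000, 44000, 52000, 70000]


def sample_correlation(xs, ys):
    """Sample Pearson r from sums of squared deviations and cross-products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # Σ(x - x̄)(y - ȳ)
    ss_x = sum((x - x_bar) ** 2 for x in xs)                     # Σ(x - x̄)²
    ss_y = sum((y - y_bar) ** 2 for y in ys)                     # Σ(y - ȳ)²
    return sp / sqrt(ss_x * ss_y)


r = sample_correlation(ages, incomes)
print(round(r, 3))  # 0.992
```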

Example 3: Calculate the Correlation coefficient of given data.

x	41	42	43	44	45
y	3.2	3.3	3.4	3.5	3.6

Solution:

Here n = 5

Let us find ∑x , ∑y, ∑xy, ∑x ², ∑y²

x	y	xy	x²	y²
41	3.2	131.2	1681	10.24
42	3.3	138.6	1764	10.89
43	3.4	146.2	1849	11.56
44	3.5	154	1936	12.25
45	3.6	162	2025	12.96
∑x = 215	∑y = 17	∑xy = 732	∑x² = 9255	∑y² = 57.9

X values:

∑x = 215

∑x² = 9255

x̄ = 43

∑(x – x̄)² = SS_x = 10

Y values:

∑y = 17

∑y² = 57.9

ȳ = 3.4

∑(y – ȳ)² = SS_y = 0.1

X and Y combined:

N = 5

∑((x – x̄)(y – ȳ)) = 1

∑xy = 732

R calculation:

r = ∑((x – x̄)(y – ȳ))/√((SS_x)(SS_y))

r = 1/√((10)(0.1)) = 1

Since r = 1, this indicates a perfect positive linear relationship between x and y.

Examples on Correlation Coefficient

Example 1: Calculate the correlation coefficient of the given data.

x	50	51	52	53	54
y	3.1	3.2	3.3	3.4	3.5

Solution:

Here, n = 5

x	50	51	52	53	54
y	3.1	3.2	3.3	3.4	3.5
xy	155	163.2	171.6	180.2	189
x²	2500	2601	2704	2809	2916
y²	9.61	10.24	10.89	11.56	12.25

∑x = 260

∑y = 16.5

∑xy = 859

∑x² = 13530

∑y² = 54.55

By substituting all the values in the formula, we get r = (5 × 859 – 260 × 16.5)/√((5 × 13530 – 260²)(5 × 54.55 – 16.5²)) = 5/√(50 × 0.5) = 1. This shows a perfect positive correlation.

Example 2: Calculate the correlation coefficient of the given data.

x	12	15	18	21	27
y	2	4	6	8	12

Solution:

Here, n = 5

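x	12	15	18	21	27
y	2	4	6	8	12
xy	24	60	108	168	324
x²	144	225	324	441	729
y²	4	16	36	64	144

∑x = 93

∑y = 32

∑xy = 684

∑x² = 1863

∑y² = 264

Now, substitute all the values in the formula:

r = (5 × 684 – 93 × 32)/√((5 × 1863 – 93²)(5 × 264 – 32²)) = 444/√(666 × 296) = 444/444

We have r = 1

The points lie exactly on the straight line y = (2/3)x – 6, so the correlation is perfect and positive.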

Cramer’s V Correlation

Cramer’s V is analogous to the Pearson correlation coefficient, but for categorical data. The Pearson correlation coefficient measures the linear correlation between two continuous variables, whereas Cramer’s V measures the association between two categorical variables in contingency tables larger than 2 x 2. It varies between 0 and 1: 0 indicates no association between the variables, whereas 1 indicates a very strong association.

Cramer’s V

.25 or higher – Very strong relationship
.15 to .25 – Strong relationship
.11 to .15 – Moderate relationship
.06 to .10 – Weak relationship
.01 to .05 – No or negligible relationship
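The article does not give the formula for Cramer’s V. Its standard definition is V = √(χ² / (n · min(r − 1, c − 1))), where χ² is the chi-square statistic of an r × c contingency table with n observations in total. A minimal Python sketch under that definition follows; the function name `cramers_v` and the table are hypothetical, and every row and column total is assumed to be non-zero.

```python
from math import sqrt


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1  # min(rows - 1, columns - 1)
    return sqrt(chi2 / (n * k))


# Hypothetical 3 x 3 table in which each row category maps to exactly one column
perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
print(cramers_v(perfect))
```

A table whose rows are proportional to each other (no association) gives a value of 0.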

Other types of correlation are as follows:

1] Concordance Correlation Coefficient
It measures how well bivariate pairs of observations agree with a “gold standard” measurement.

2] Intraclass Correlation
It measures the reliability of the data that are collected as groups.

3] Kendall’s Tau
It is a non-parametric measure of relationships between the columns of ranked data.

4] Moran’s I
It measures the overall spatial autocorrelation of the data set.

5] Partial Correlation
It measures the strength of a relationship between two variables while controlling for the effect of one or more other variables.

6] Phi Coefficient
It measures the association between two binary variables.

7] Point Biserial Correlation: It is a special case of Pearson’s correlation coefficient. It measures the relationship between two variables:
a] One continuous variable.
b] One naturally binary variable.

8] Spearman Rank Correlation
It is the nonparametric version of the Pearson correlation coefficient.

9] Zero-Order Correlation
It indicates that nothing has been controlled for or “partialled out” in an experiment.

Frequently Asked Questions

What do you mean by correlation coefficient?

The correlation coefficient is a statistical concept used to measure how strong a relationship is between two variables.

Give the formula for Pearson’s correlation coefficient.

Pearson’s correlation coefficient is given by ρ(X, Y) = cov(X, Y) / (σX σY).

What is the range of the correlation coefficient?

The value of the correlation coefficient lies between -1 and +1.

What do you mean by zero correlation?

A zero correlation denotes that the correlation statistic does not indicate a relationship between the two variables.

What does a correlation coefficient -1 refer to?

A correlation coefficient -1 refers to a perfect negative correlation.

What do you mean by positive correlation?

In positive correlation, the value of one variable increases linearly with an increase in another variable. This denotes a similar relation between both variables.

What does a correlation coefficient 1 refer to?

A correlation coefficient 1 refers to a perfect positive correlation.

How to determine the correlation coefficient?

First, find the covariance of the variables. Then divide the covariance by the product of the standard deviations of the variables. The result gives the correlation coefficient.

How to check whether a correlation is positive or negative?

To check whether a correlation is positive or negative, we have to check the correlation coefficient value. If the value of the correlation coefficient is greater than zero, then it is a positive correlation. If the value is less than zero, then it is a negative correlation. If the value of the correlation coefficient is zero, it shows a zero correlation.

What Is Correlation Coefficient Formula in Statistics?

The correlation coefficient formula measures the strength and direction of the relationship between two variables in a dataset. For example, it can indicate how closely predicted values agree with actual values.

How To Use Correlation Coefficient Formula?

We can use the correlation coefficient formula to calculate the Pearson product-moment correlation as follows:

Step 1: Determine the covariance of the two given variables.
Step 2: Calculate the standard deviation of each variable.
Step 3: Divide the covariance by the product of the standard deviations of two variables.
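The three steps above can be sketched in Python as follows. This is only an illustration: the function name `correlation_by_steps` and the data are hypothetical, and the sample (n − 1) versions of the covariance and standard deviation are assumed. The population versions give the same r because the divisors cancel.

```python
from statistics import mean, stdev


def correlation_by_steps(xs, ys):
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # Step 1: sample covariance of the two variables
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    # Step 2: sample standard deviation of each variable
    sd_x, sd_y = stdev(xs), stdev(ys)
    # Step 3: divide the covariance by the product of the standard deviations
    return cov / (sd_x * sd_y)


# Hypothetical data, not taken from the article
xs = [2, 4, 6, 8]
ys = [1, 3, 2, 5]
r = correlation_by_steps(xs, ys)
print(round(r, 3))  # 0.832
```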

What Is n in the Correlation Coefficient Formula?

In the correlation coefficient formula, n refers to the sample size, that is, the number of paired observations.

What Are the Applications of Correlation Coefficient Formula?

Given below are the most important applications of the correlation coefficient formula:

The correlation coefficient formula helps in the analysis of data by quantifying the degree to which two variables are linearly related.
It is used in financial analysis to determine the relationship between data sets in business, and in that way it supports decision-making.
It helps decision-making in various fields by showing the strength of the relationship between two different variables.
