Central Limit Theorem Formula

The Central Limit Theorem for Sample Means (Averages)

Suppose X is a random variable with a distribution that may be known or unknown (it can be any distribution). Using a subscript that matches the random variable, suppose:

μx= the mean of X
σx= the standard deviation of X

If you draw random samples of size n, then as n increases, the random variable X̄, which consists of sample means, tends to be normally distributed and

X̄ ∼ N(μx, σx/√n).

The central limit theorem for sample means says that if you keep drawing larger and larger samples (such as rolling one, two, five, and finally, ten dice) and calculating their means, the sample means form their own normal distribution (the sampling distribution). The normal distribution has the same mean as the original distribution and a variance that equals the original variance divided by n, the sample size. The variable n is the number of values that are averaged together, not the number of times the experiment is done.
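The dice illustration can be checked with a short simulation. The sketch below is only an illustration under assumed settings: the function name `sample_means`, the number of trials, and the seed are choices made here, not part of the original text. It averages n fair-die rolls many times and compares the mean and variance of those averages with 3.5 and (35/12)/n.

```python
import random
import statistics

# A single fair die has mean 3.5 and variance 35/12.
DIE_MEAN = 3.5
DIE_VAR = 35 / 12


def sample_means(n, trials=20_000, seed=0):
    """Return `trials` sample means, each the average of n fair-die rolls."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.randint(1, 6) for _ in range(n))
        for _ in range(trials)
    ]


for n in (1, 2, 5, 10):
    means = sample_means(n)
    print(
        n,
        round(statistics.fmean(means), 3),      # stays close to 3.5
        round(statistics.pvariance(means), 3),  # close to DIE_VAR / n
        round(DIE_VAR / n, 3),                  # theoretical variance of the mean
    )
```

Here n is the number of dice averaged in each sample, while `trials` is the number of times the experiment is repeated, which matches the distinction drawn in the paragraph above.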

To put it more formally, if you draw random samples of size n, the distribution of the random variable X̄, which consists of sample means, is called the sampling distribution of the mean. The sampling distribution of the mean approaches a normal distribution as n, the sample size, increases.

The random variable X̄ has a different z-score associated with it from that of the random variable X. The mean x̄ is the value of X̄ in one sample.

z = (x̄ − μx) / (σx/√n)

μx is the average of both X and X̄.

σx̄ = σx/√n = standard deviation of X̄ and is called the standard error of the mean.

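To find probabilities for means on the calculator, follow these steps.

2nd DISTR

2:normalcdf

normalcdf(lower value of the area, upper value of the area, mean, standard deviation / √(sample size))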

where:

mean is the mean of the original distribution
standard deviation is the standard deviation of the original distribution
sample size =n

Example 7.2.1

An unknown distribution has a mean of 90 and a standard deviation of 15. Samples of size n=25 are drawn randomly from the population.

a. Find the probability that the sample mean is between 85 and 92.
b. Find the value that is two standard deviations above the expected value, 90, of the sample mean.

Answer

Let X= one value from the original unknown population. The probability question asks you to find a probability for the sample mean.

Let X̄ = the mean of a sample of size 25. Since μx = 90, σx = 15, and n = 25, X̄ ∼ N(90, 15/√25).

Find P(85 < x̄ < 92). Draw a graph.

P(85 < x̄ < 92) = 0.6997

The probability that the sample mean is between 85 and 92 is 0.6997.

normalcdf(lower value, upper value, mean, standard error of the mean)

The parameter list is abbreviated (lower value, upper value, μ, σ/√n).

normalcdf(85, 92, 90, 15/√25) = 0.6997

To find the value that is two standard deviations above the expected value 90, use the formula:

value = μx + (number of standard deviations)(σx/√n)

value = 90 + 2(15/√25) = 96

The value that is two standard deviations above the expected value is 96.

The standard error of the mean is σx/√n = 15/√25 = 3.

Recall that the standard error of the mean is a description of how far (on average) the sample mean will be from the population mean in repeated simple random samples of size n.
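For readers without a graphing calculator, the same two results can be reproduced in Python. This is a minimal sketch that assumes the standard library's `statistics.NormalDist` as a stand-in for normalcdf; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 90, 15, 25          # values from Example 7.2.1
standard_error = sigma / sqrt(n)   # 15 / 5 = 3
x_bar = NormalDist(mu, standard_error)

# P(85 < sample mean < 92), the analogue of normalcdf(85, 92, 90, 3)
prob = x_bar.cdf(92) - x_bar.cdf(85)

# Value two standard errors above the expected value
two_above = mu + 2 * standard_error

print(round(prob, 4), two_above)   # 0.6997 96.0
```

Note that the distribution is built with the standard error (3), not the population standard deviation (15), because the question is about the sample mean rather than a single observation.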

Exercise 7.2.1

An unknown distribution has a mean of 45 and a standard deviation of eight. Samples of size n = 30 are drawn randomly from the population. Find the probability that the sample mean is between 42 and 50.

Answer

P(42 < x̄ < 50) = normalcdf(42, 50, 45, 8/√30) = 0.9797

Example 7.2.2

The length of time, in hours, it takes an “over 40” group of people to play one soccer match is normally distributed with a mean of two hours and a standard deviation of 0.5 hours. A sample of size n=50 is drawn randomly from the population. Find the probability that the sample mean is between 1.8 hours and 2.3 hours.

Answer

Let X= the time, in hours, it takes to play one soccer match.

The probability question asks you to find a probability for the sample mean time, in hours, it takes to play one soccer match.

Let X̄ = the mean time, in hours, it takes to play one soccer match.

If μx = _________, σx = __________, and n = ___________, then X̄ ∼ N(______, ______) by the central limit theorem for means.

μx = 2, σx = 0.5, n = 50, and X̄ ∼ N(2, 0.5/√50)

P(1.8 < x̄ < 2.3) = normalcdf(1.8, 2.3, 2, 0.5/√50) = 0.9977

The probability that the mean time is between 1.8 hours and 2.3 hours is 0.9977.

Exercise 7.2.2

The length of time taken on the SAT for a group of students is normally distributed with a mean of 2.5 hours and a standard deviation of 0.25 hours. A sample size of n=60 is drawn randomly from the population. Find the probability that the sample mean is between two hours and three hours.

Answer

P(2 < x̄ < 3) = normalcdf(2, 3, 2.5, 0.25/√60) ≈ 1

Calculator Skills

To find percentiles for means on the calculator, follow these steps.

2nd DISTR
3:invNorm

k = invNorm(area to the left of k, mean, standard deviation / √(sample size))

where:

k = the kth percentile
mean is the mean of the original distribution
standard deviation is the standard deviation of the original distribution
sample size = n

Example 7.2.3

In a recent study reported Oct. 29, 2012 on the Flurry Blog, the mean age of tablet users is 34 years. Suppose the standard deviation is 15 years. Take a sample of size n=100.

a. What are the mean and standard deviation for the sample mean ages of tablet users?
b. What does the distribution look like?
c. Find the probability that the sample mean age is more than 30 years (the reported mean age of tablet users in this particular study).
d. Find the 95th percentile for the sample mean age (to one decimal place).

Answer

a. Since the sample mean tends to target the population mean, we have μx̄ = μ = 34. The standard deviation of the sample mean is given by:

σx̄ = σ/√n = 15/√100 = 1.5

b. The central limit theorem states that for large sample sizes (n), the sampling distribution will be approximately normal.

c. The probability that the sample mean age is more than 30 is given by:

P(x̄ > 30) = normalcdf(30, E99, 34, 1.5) = 0.9962

d. Let k = the 95th percentile.

k = invNorm(0.95, 34, 15/√100) = 36.5
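The tail probability and the percentile in this example can also be computed in Python. The sketch below assumes `statistics.NormalDist` from the standard library, with `inv_cdf` playing the role of invNorm; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 34, 15, 100          # values from Example 7.2.3
standard_error = sigma / sqrt(n)    # 15 / 10 = 1.5
x_bar = NormalDist(mu, standard_error)

# c. Upper-tail area: P(sample mean > 30)
p_more_than_30 = 1 - x_bar.cdf(30)

# d. 95th percentile, the analogue of invNorm(0.95, 34, 1.5)
k_95 = x_bar.inv_cdf(0.95)

print(standard_error, round(p_more_than_30, 4), round(k_95, 1))  # 1.5 0.9962 36.5
```

Subtracting the cumulative probability from 1 avoids the need for a very large upper bound such as E99.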

Exercise 7.2.3

In an article on Flurry Blog, a gaming marketing gap for men between the ages of 30 and 40 is identified. You are researching a startup game targeted at the 35-year-old demographic. Your idea is to develop a strategy game that can be played by men from their late 20s through their late 30s. Based on the article’s data, industry research shows that the average strategy player is 28 years old with a standard deviation of 4.8 years. You take a sample of 100 randomly selected gamers. If your target market is 29- to 35-year-olds, should you continue with your development strategy?

Answer

You need to determine the probability that the mean age of a sample of 100 strategy gamers is between 29 and 35 years.

P(29 < x̄ < 35) = normalcdf(29, 35, 28, 4.8/√100) = 0.0186

You can conclude there is approximately a 1.9% chance that your game will be played by men whose mean age is between 29 and 35.

Example 7.2.4

The mean number of minutes for app engagement by a tablet user is 8.2 minutes. Suppose the standard deviation is one minute. Take a sample of 60.

a. What are the mean and standard deviation for the sample mean number of minutes of app engagement by a tablet user?
b. What is the standard error of the mean?
c. Find the 90th percentile for the sample mean time for app engagement for a tablet user. Interpret this value in a complete sentence.
d. Find the probability that the sample mean is between eight minutes and 8.5 minutes.

Answer

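a. μx̄ = μ = 8.2 and σx̄ = σ/√n = 1/√60 = 0.13

b. The standard error of the mean is σx̄ = 0.13. This allows us to calculate the probability of sample means of a particular distance from the mean, in repeated samples of size 60.

c. Let k = the 90th percentile.

k = invNorm(0.90, 8.2, 1/√60) = 8.37

This value indicates that 90 percent of the average app engagement times for tablet users are less than 8.37 minutes.

d. P(8 < x̄ < 8.5) = normalcdf(8, 8.5, 8.2, 1/√60) = 0.9293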

Exercise 7.2.4

Cans of a cola beverage claim to contain 16 ounces. The amounts in a sample are measured and the statistics are n=34, x¯=16.01 ounces. If the cans are filled so that μ=16.00 ounces (as labeled) and σ=0.143 ounces, find the probability that a sample of 34 cans will have an average amount greater than 16.01 ounces. Do the results suggest that cans are filled with an amount greater than 16 ounces?

Answer

P(x̄ > 16.01) = normalcdf(16.01, E99, 16, 0.143/√34) = 0.3417

There is a 34.17% probability that the average amount in a sample of 34 cans is greater than 16.01 ounces when μ = 16.00 ounces. A sample mean of 16.01 ounces is therefore not unusual, so these results alone do not suggest that the cans are filled with more than 16 ounces. A consumer has no strong evidence of receiving extra cola, and a manufacturer has no strong evidence that the bottling process is outside acceptable limits.

Summary

In a population whose distribution may be known or unknown, if the size (n) of samples is sufficiently large, the distribution of the sample means will be approximately normal. The mean of the sample means will equal the population mean. The standard deviation of the distribution of the sample means, called the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size (n).

Formula Review

The Central Limit Theorem for Sample Means: X̄ ∼ N(μx, σx/√n)

Central Limit Theorem for Sample Means z-score and standard error of the mean: z = (x̄ − μx) / (σx/√n)

Standard Error of the Mean (Standard Deviation of X̄): σx/√n

What is the central limit theorem?

The central limit theorem relies on the concept of a sampling distribution, which is the probability distribution of a statistic for a large number of samples taken from a population.

Imagining an experiment may help you to understand sampling distributions:

Suppose that you draw a random sample from a population and calculate a statistic for the sample, such as the mean.
Now you draw another random sample of the same size, and again calculate the mean.
You repeat this process many times, and end up with a large number of means, one for each sample.

The distribution of the sample means is an example of a sampling distribution.

The central limit theorem says that the sampling distribution of the mean will be approximately normally distributed, as long as the sample size is large enough. Regardless of whether the population has a normal, Poisson, binomial, or any other distribution with finite variance, the sampling distribution of the mean will approach a normal distribution.

A normal distribution is a symmetrical, bell-shaped distribution, with increasingly fewer observations the further from the center of the distribution.

Fortunately, you don’t need to actually repeatedly sample a population to know the shape of the sampling distribution. The parameters of the sampling distribution of the mean are determined by the parameters of the population:

The mean of the sampling distribution is the mean of the population.

The standard deviation of the sampling distribution is the standard deviation of the population divided by the square root of the sample size.

We can describe the sampling distribution of the mean using this notation:

X̄ ~ N(µ, σ/√n)

Where:

X̄ is the sampling distribution of the sample means
~ means “follows the distribution”
N is the normal distribution
µ is the mean of the population
σ is the standard deviation of the population
n is the sample size

Central Limit Theorem Proof

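Let us suppose we have X₁, X₂, …, X_n independent and identically distributed random variables with mean μ = 0 and variance σ² = 1, and assume that their moment generating function exists near t = 0.

Let M(t) be the moment generating function of each X_i. Then:

M(0) = 1

M'(0) = E[X_i] = μ = 0

M''(0) = E[X_i²] = σ² = 1

Let Z_n = (X₁ + X₂ + … + X_n) / √n. Because the X_i are independent, the moment generating function of Z_n is [M(t/√n)]ⁿ. Expanding M in a Taylor series around 0 gives

M(t/√n) = 1 + t²/(2n) + o(1/n)

so that

[M(t/√n)]ⁿ = [1 + t²/(2n) + o(1/n)]ⁿ → e^(t²/2) as n → ∞.

This is the moment generating function of a standard normal distribution, thus proving the central limit theorem.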
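The limit behind this argument can be checked numerically. The sketch below uses one assumed distribution that is not in the original text: each X_i is +1 or −1 with probability 1/2, so it has mean 0, variance 1, and moment generating function cosh(t). The moment generating function of (X₁ + … + X_n)/√n is then cosh(t/√n) raised to the power n, which should approach e^(t²/2), the moment generating function of the standard normal distribution.

```python
import math


def mgf_of_scaled_sum(t, n):
    """MGF of (X_1 + ... + X_n) / sqrt(n) when each X_i is +1 or -1
    with probability 1/2 (assumed example distribution, MGF cosh(t))."""
    return math.cosh(t / math.sqrt(n)) ** n


t = 1.0
target = math.exp(t ** 2 / 2)  # MGF of the standard normal at t

for n in (1, 10, 100, 10_000):
    print(n, mgf_of_scaled_sum(t, n), target)
```

The printed values move toward the target as n grows, which is what the proof asserts for any distribution meeting its assumptions; a single distribution is of course only a spot check, not a proof.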

Sample size and the central limit theorem

The sample size (n) is the number of observations drawn from the population for each sample. The sample size is the same for all samples.

The sample size affects the sampling distribution of the mean in two ways.

1. Sample size and normality

The larger the sample size, the more closely the sampling distribution will follow a normal distribution.

When the sample size is small, the sampling distribution of the mean is sometimes non-normal. That’s because the central limit theorem only holds true when the sample size is “sufficiently large.”

By convention, we consider a sample size of 30 to be “sufficiently large.”

When n < 30, the central limit theorem doesn’t apply. The sampling distribution will follow a similar distribution to the population. Therefore, the sampling distribution will only be normal if the population is normal.
When n ≥ 30, the central limit theorem applies. The sampling distribution will approximately follow a normal distribution.

2. Sample size and standard deviations

The sample size affects the standard deviation of the sampling distribution. Standard deviation is a measure of the variability or spread of the distribution (i.e., how wide or narrow it is).

When n is low, the standard deviation is high. There’s a lot of spread in the samples’ means because they aren’t precise estimates of the population’s mean.
When n is high, the standard deviation is low. There’s not much spread in the samples’ means because they’re precise estimates of the population’s mean.
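The effect of n on the spread can be tabulated directly from σ/√n. This is a small sketch with an assumed population standard deviation of 6 (an illustrative value) and a hypothetical helper named `standard_error`.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / sqrt(n)


sigma = 6  # assumed population standard deviation, for illustration only

for n in (5, 30, 50, 100, 400):
    print(n, round(standard_error(sigma, n), 3))
```

Because n sits under a square root, quadrupling the sample size only halves the spread, for example from n = 100 to n = 400.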

Example 1: In a study, it was reported that the mean age of mobile users is 30 years and the standard deviation is 12. Taking a sample size of 100, what are the mean and standard deviation for the sample mean ages of mobile users?

Solution: Since the sample mean will tend to the population mean, the mean is 30.

Standard deviation of the sample mean = σ / √n = 12 / √100 = 1.2

Answer: Mean = 30, Standard deviation = 1.2

Example 2: An unknown distribution has a mean of 80 and a standard deviation of 24. If samples of size 36 are randomly drawn from this population, then using the central limit theorem find the value that is two standard deviations (of the sample mean) above the expected value.

Solution: We know that the mean of the sample means equals the mean of the population. Thus, mean = 80.

Standard deviation of the sample mean = σ / √n = 24 / √36 = 4

Thus the value is 80 + 2(4) = 88.

Answer: The value that is two standard deviations above the expected value is 88.

Example 3: Suppose the mean age of people living in a town is 45 years and the standard deviation is 10. What will be the mean and variance of the sample mean age for sample sizes 20 and 49?

Solution: When n = 20, the central limit theorem cannot be applied as the sample size needs to be greater than or equal to 30.

When n = 49, the sample mean will be 45.

Standard deviation of the sample mean = σ / √n = 10 / √49 ≈ 1.43

Variance of the sample mean = σ² / n = 100 / 49 ≈ 2.04

Answer: For n = 20, the central limit theorem cannot be applied. For n = 49, Mean = 45, Variance ≈ 2.04

Conditions of the central limit theorem

The central limit theorem states that the sampling distribution of the mean will always follow a normal distribution under the following conditions:

The sample size is sufficiently large. This condition is usually met if the sample size is n ≥ 30.
The samples are independent and identically distributed (i.i.d.) random variables. This condition is usually met if the sampling is random.
The population’s distribution has finite variance. Central limit theorem doesn’t apply to distributions with infinite variance, such as the Cauchy distribution. Most distributions have finite variance.
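The finite-variance condition can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the original text: it compares sample means drawn from a uniform distribution (finite variance) with sample means drawn from a standard Cauchy distribution, and it measures spread with the interquartile range because the variance of Cauchy sample means is not defined. The helper names, trial count, and seed are arbitrary choices.

```python
import math
import random
import statistics


def draw_uniform(rng):
    """One draw from the uniform distribution on [0, 1)."""
    return rng.random()


def draw_cauchy(rng):
    """One draw from the standard Cauchy distribution (inverse-CDF method)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_sample_means(draw, n, trials=2000, seed=1):
    """Interquartile range of `trials` sample means, each from n draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)
    ]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


for n in (1, 10, 100):
    print(
        n,
        round(iqr_of_sample_means(draw_uniform, n), 3),  # shrinks as n grows
        round(iqr_of_sample_means(draw_cauchy, n), 3),   # stays near 2
    )
```

For the uniform population the spread of the sample means shrinks roughly like 1/√n, while for the Cauchy population averaging more observations does not narrow the distribution at all.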

Central limit theorem examples

Applying the central limit theorem to real distributions may help you to better understand how it works.

Continuous distribution

Suppose that you’re interested in the age that people retire in the United States. The population is all retired Americans, and the distribution of the population might look something like this:

Age at retirement follows a left-skewed distribution. Most people retire within about five years of the mean retirement age of 65 years. However, there’s a “long tail” of people who retire much younger, such as at 50 or even 40 years old. The population has a standard deviation of 6 years.

Imagine that you take a small sample of the population. You randomly select five retirees and ask them what age they retired.

Example: Central limit theorem; sample of n = 5

The mean of the sample is an estimate of the population mean. It might not be a very precise estimate, since the sample size is only 5.

Example: Central limit theorem; mean of a small sample

mean = (68 + 73 + 70 + 62 + 63) / 5

mean = 67.2 years

Suppose that you repeat this procedure 10 times, taking samples of five retirees, and calculating the mean of each sample. This is a sampling distribution of the mean.

Example: Central limit theorem; means of 10 small samples

60.8	57.8	62.2	68.6	67.4	67.8	68.3	65.6	66.5	62.1

If you repeat the procedure many more times, a histogram of the sample means will look something like this:

Although this sampling distribution is more normally distributed than the population, it still has a bit of a left skew.

Notice also that the spread of the sampling distribution is less than the spread of the population.

The central limit theorem says that the sampling distribution of the mean will always follow a normal distribution when the sample size is sufficiently large. This sampling distribution of the mean isn’t normally distributed because its sample size isn’t sufficiently large.

Now, imagine that you take a large sample of the population. You randomly select 50 retirees and ask them what age they retired.

Example: Central limit theorem; sample of n = 50

73	49	62	68	72	71	65	60	69	61
62	75	66	63	66	68	76	68	54	74
68	60	72	63	57	64	65	59	72	52
52	72	69	62	68	64	60	65	53	69
59	68	67	71	69	70	52	62	64	68

The mean of the sample is an estimate of the population mean. It’s a precise estimate, because the sample size is large.

Example: Central limit theorem; mean of a large sample

mean = 64.8 years

Again, you can repeat this procedure many more times, taking samples of fifty retirees, and calculating the mean of each sample:

In the histogram, you can see that this sampling distribution is normally distributed, as predicted by the central limit theorem.

The standard deviation of this sampling distribution is 0.85 years, which is less than the spread of the small sample sampling distribution, and much less than the spread of the population. If you were to increase the sample size further, the spread would decrease even more.

We can use the central limit theorem formula to describe the sampling distribution:

µ = 65
σ = 6
n = 50

X̄ ~ N(µ, σ/√n)
X̄ ~ N(65, 6/√50)
X̄ ~ N(65, 0.85)
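The figures quoted for the n = 50 sample can be reproduced from the listed ages. The following sketch simply averages the 50 values shown earlier and computes σ/√n with the stated population standard deviation of 6 years; the variable names are illustrative.

```python
from math import sqrt
from statistics import fmean

# The 50 retirement ages from the n = 50 sample above
ages = [
    73, 49, 62, 68, 72, 71, 65, 60, 69, 61,
    62, 75, 66, 63, 66, 68, 76, 68, 54, 74,
    68, 60, 72, 63, 57, 64, 65, 59, 72, 52,
    52, 72, 69, 62, 68, 64, 60, 65, 53, 69,
    59, 68, 67, 71, 69, 70, 52, 62, 64, 68,
]

sigma = 6                                  # population standard deviation (years)
sample_mean = fmean(ages)                  # estimate of the population mean
standard_error = sigma / sqrt(len(ages))   # spread of the sampling distribution

print(round(sample_mean, 1), round(standard_error, 2))  # 64.8 0.85
```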

Example: A set of samples has been collected from a larger population and the sample mean values are 12.8, 10.9, 11.4, 14.2, 12.5, 13.6, 15, 9, 12.6. Find the population mean.

Solution: The given sample mean values are 12.8, 10.9, 11.4, 14.2, 12.5, 13.6, 15, 9, 12.6.

The population mean is estimated as the average of the above sample mean values:

Population mean ≈ (12.8 + 10.9 + 11.4 + 14.2 + 12.5 + 13.6 + 15 + 9 + 12.6) / 9 = 112 / 9 ≈ 12.44

Discrete distribution

Approximately 10% of people are left-handed. If we assign a value of 1 to left-handedness and a value of 0 to right-handedness, the probability distribution of left-handedness for the population of all humans looks like this:

The population mean is the proportion of people who are left-handed (0.1). The population standard deviation is 0.3.

Imagine that you take a random sample of five people and ask them whether they’re left-handed.

Example: Central limit theorem; sample of n = 5

The mean of the sample is an estimate of the population mean. It might not be a very precise estimate, since the sample size is only 5.

Example: Central limit theorem; mean of a small sample

mean = (0 + 0 + 0 + 1 + 0) / 5

mean = 0.2

Imagine you repeat this process 10 times, randomly sampling five people and calculating the mean of the sample. This is a sampling distribution of the mean.

Example: Central limit theorem; means of 10 small samples

0.4

0.2

0.4

If you repeat this process many more times, the distribution will look something like this:

The sampling distribution isn’t normally distributed because the sample size isn’t sufficiently large for the central limit theorem to apply.

As the sample size increases, the sampling distribution looks increasingly similar to a normal distribution, and the spread decreases:

The sampling distribution of the mean for samples with n = 30 approaches normality. When the sample size is increased further to n = 100, the sampling distribution follows a normal distribution.

We can use the central limit theorem formula to describe the sampling distribution for n = 100.

µ = 0.1
σ = 0.3
n = 100

X̄ ~ N(0.1, 0.3/√100)
X̄ ~ N(0.1, 0.03)
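The left-handedness example lends itself to simulation. The sketch below is illustrative only: the function names, the 5,000 repetitions, and the seed are assumptions made here. It draws samples of 0s and 1s with a 10% chance of a 1, then reports the mean, the standard deviation, the theoretical 0.3/√n, and the skewness of the sample means for n = 5, 30, and 100.

```python
import random
import statistics

P_LEFT = 0.1  # population proportion of left-handed people


def simulate_means(n, trials=5000, seed=42):
    """Sample means of n Bernoulli(P_LEFT) draws, repeated `trials` times."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < P_LEFT for _ in range(n)) / n for _ in range(trials)
    ]


def skewness(values):
    """Population skewness: the mean cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


for n in (5, 30, 100):
    means = simulate_means(n)
    print(
        n,
        round(statistics.fmean(means), 3),   # close to 0.1
        round(statistics.pstdev(means), 3),  # close to 0.3 / sqrt(n)
        round(0.3 / n ** 0.5, 3),
        round(skewness(means), 2),           # moves toward 0 as n grows
    )
```

The skewness falls as n increases, which is the numerical counterpart of the histograms looking more and more like a normal curve.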

Frequently asked questions about the central limit theorem

What is a normal distribution?

In a normal distribution, data are symmetrically distributed with no skew. Most values cluster around a central region, with values tapering off as they go further away from the center.

The measures of central tendency (mean, mode, and median) are exactly the same in a normal distribution.

What are the three types of skewness?

The three types of skewness are:

Right skew (also called positive skew). A right-skewed distribution is longer on the right side of its peak than on its left.
Left skew (also called negative skew). A left-skewed distribution is longer on the left side of its peak than on its right.
Zero skew. It is symmetrical and its left and right sides are mirror images.

Why are samples used in research?

Samples are used to make inferences about populations. Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.

What is Central Limit Theorem?

The central limit theorem establishes that if large samples are drawn from a population and their sums are taken, then the sums approximately follow a normal distribution. Furthermore, by the law of large numbers, the sample mean converges to the population mean as the sample size grows. The central limit theorem is often abbreviated as CLT.

What is the Central Limit Theorem in Statistics?

The central limit theorem in statistics states that, irrespective of the shape of the population distribution, the sampling distribution of the sample means approximates a normal distribution when the sample size is greater than or equal to 30.

What is the Central Limit Theorem Formula?

The central limit theorem gives a formula for the mean and the standard deviation of the sample means when the population mean and standard deviation are known. This is given as follows:

Mean of the sample means = Population mean = μ
Standard deviation of the sample means = (Population standard deviation) / √n = σ / √n

How Do You Use the Central Limit Theorem?

The following steps can be applied to find a certain probability using the central limit theorem:

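Check that the sample size is greater than or equal to 30, so that the central limit theorem can be applied.
Find the mean of the sample means, μ, and their standard deviation, σ / √n.
Compute z = (x̄ − μ) / (σ / √n) and find the corresponding probability using the normal distribution table.
Using this value, various probabilities can be calculated: P(X̄ > x), P(X̄ < x), and P(a < X̄ < b).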
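The z-score step can be sketched in Python, with `statistics.NormalDist()` standing in for the normal distribution table. The helper name `z_score` and the cut-off values 76 and 88 are assumptions chosen for illustration; the population values (mean 80, standard deviation 24, n = 36) are borrowed from Example 2 earlier in this article.

```python
from math import sqrt
from statistics import NormalDist


def z_score(x_bar, mu, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n)) for a sample mean x_bar."""
    return (x_bar - mu) / (sigma / sqrt(n))


standard_normal = NormalDist()   # mean 0, standard deviation 1

mu, sigma, n = 80, 24, 36        # population values from Example 2

z_hi = z_score(88, mu, sigma, n)   # (88 - 80) / 4 = 2.0
z_lo = z_score(76, mu, sigma, n)   # (76 - 80) / 4 = -1.0

p_less = standard_normal.cdf(z_hi)                                # P(sample mean < 88)
p_more = 1 - standard_normal.cdf(z_hi)                            # P(sample mean > 88)
p_between = standard_normal.cdf(z_hi) - standard_normal.cdf(z_lo) # P(76 < sample mean < 88)

print(z_hi, round(p_less, 4), round(p_more, 4), round(p_between, 4))
```

Converting to a z-score first mirrors the table-based workflow; passing μ and σ/√n straight to `NormalDist` would give the same probabilities.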

What is the Purpose of the Central Limit Theorem?

The central limit theorem helps to approximate the characteristics of a population in cases where it is difficult to gather data about each observation of the population.

Why is the Central Limit Theorem Important for Statistical Inference?

The central limit theorem helps to make important inferences about the population from a sample. It can be used to determine if two samples were drawn from the same population as well as to check if the sample was drawn from a certain population.

Do Confidence Intervals Rely on the Central Limit Theorem?

Confidence intervals do not strictly rely on the central limit theorem. However, the central limit theorem helps to construct confidence intervals, because it allows the sampling distribution of the mean to be approximated by a normal distribution.

Does the Central Limit Theorem Apply to a Non-Normal Distribution?

The central limit theorem can be applied to a sample that has been taken from any type of distribution. It says that the arithmetic means of sufficiently large samples will follow a normal distribution.

Central Limit Theorem Definition

The central limit theorem states that, irrespective of a random variable’s distribution, if large enough samples are drawn from the population then the sampling distribution of the mean for that random variable will approximate a normal distribution. This fact holds true for sample sizes that are greater than or equal to 30. In other words, as the samples get larger, the graph of the sample means starts looking like a normal distribution.

Central Limit Theorem Application

The central limit theorem is widely used in scenarios where the characteristics of the population have to be identified but analyzing the complete population is difficult. Other applications of the central limit theorem are listed below:

In data science, the central limit theorem is used to make accurate assumptions of the population in order to build a robust statistical model.
In applied machine learning, the CLT helps to make inferences about the model performance.
In statistical hypothesis testing the central limit theorem is used to check if the given sample belongs to a designated population.

Important Notes on Central Limit Theorem

The central limit theorem states that if the size of different samples is large enough then the sampling distribution of the means will approximate a normal distribution.
The mean of the sample means will be the same as the population mean according to the CLT.
Using the central limit theorem, the standard deviation of the sample means is given by (Population standard deviation) / √n.
The formula for the z score is z = (x̄ − μ) / (σ / √n).
