Lesson 10: Central Limit Theorem
Saturday, January 13, 2024
In last lesson, we learned how to use normal distribution to solve problems. In this lesson, we will learn the Central Limit Theorem.
First, load the packages we will use in this lesson.
The Central Limit Theorem (CLT) states that the distribution of the average of a large number of independent, identically distributed (iid) variables will be approximately normal, regardless of the underlying distribution.
We use an example of chi-square distribution to show CLT.
We can see that the distributions of the three samples are chi-square distributions which are right-skewed. They are not normal distribution.
Let’s see what happens after we repeat it 10000 times with sample size of 10.
# Step 4, Repeat the experiment 10000 times, with sample size of 30, calculate the average of each sample
tibble(
experiment = 1:10000,
mean = replicate(10000, mean(rchisq(10, df = 3)))
) |>
ggplot(aes(x = mean, y = ..density..)) +
geom_histogram(bins = 30, color = "white", fill = "steelblue") +
geom_density(color = "purple", size = 1)
We can see that the distribution of the average of each sample is normal distribution.
Let’s see what happens if we repeat it 10000 times with the sample size of 5.
We can see that the distribution of the average of each sample is more like normal distribution as the sample size increases. Normally, we say that the sample size is large enough when it is greater than 30.
If we use different distributions, the distribution of the average of each sample (sampling distribution) will be normal distribution as well.
Thank you!