Lesson 13: Proportion Z test

Monday, January 15, 2024

In this lesson, we will learn the proportion Z test.

First, load the packages we will use in this lesson.

**Proportion Z-test**

The proportion Z test is a hypothesis test for a population proportion. It applies to categorical data, where each observation either falls in the category of interest or does not.

- Let’s take a look at the distribution of the variable `gender` in the `BEPS` dataset of `carData`.

```
gender n percent
female 812 0.532459
male 713 0.467541
```
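The code that produced this table is not shown. As a minimal sketch in base R, the same summary can be rebuilt from the counts printed above. The names `gender_counts`, `gender_props`, and `gender_table` are illustrative, not taken from the lesson.

```r
# Counts copied from the table above. With carData installed, they are
# assumed to match table(carData::BEPS$gender).
gender_counts <- c(female = 812, male = 713)

# Share of each category in the sample.
gender_props <- prop.table(gender_counts)

# Combine counts and shares into one small summary table.
gender_table <- data.frame(
  gender  = names(gender_counts),
  n       = as.integer(gender_counts),
  percent = as.numeric(gender_props)
)
print(gender_table)
```

The total of the two counts, 1525, is the sample size used in the tests that follow.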

Hypothesis test for one proportion

The null hypothesis (\(H_0\)) is that the population proportion of females is equal to \(0.5\).

The alternative hypothesis (\(H_1\)) is that the population proportion of females is not equal to \(0.5\).

\[\begin{aligned}
Z &= {(p_1 - p_0)\over se }
\\
\\
&={(p_1 - p_0)\over {\sigma\over \sqrt n} }
\\
\\
&= {(p_1 - p_0)\over \sqrt {p_0 (1- p_0) \over n} }
\end{aligned}\]

Here \(Z\) is the test statistic, \(p_1\) is the sample proportion, \(p_0\) is the hypothesized proportion, and \(n\) is the sample size. \(se\) is the standard error of the sample proportion, that is, the standard deviation of its sampling distribution under the null hypothesis. \(\sigma = \sqrt{p_0 (1 - p_0)}\) is the standard deviation of a single 0/1 observation under the null hypothesis.

Examples
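The formula can be evaluated by hand for the `gender` counts. This is a sketch of the plain Z statistic without any continuity correction, so its square will not match a continuity-corrected `X-squared` value exactly. The variable names are illustrative.

```r
x  <- 812    # number of females in the sample
n  <- 1525   # sample size
p0 <- 0.5    # hypothesized proportion under H0

p1 <- x / n                      # sample proportion
se <- sqrt(p0 * (1 - p0) / n)    # standard error under H0
z  <- (p1 - p0) / se             # test statistic

# Two-sided p-value from the standard normal distribution.
p_value <- 2 * pnorm(-abs(z))

cat("z =", round(z, 4), " p-value =", round(p_value, 5), "\n")
```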

```
1-sample proportions test with continuity correction
data: 812 out of 1525, null probability 0.5
X-squared = 6.2977, df = 1, p-value = 0.01209
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5070389 0.5577139
sample estimates:
p
0.532459
```

In `prop.test()`, the first argument is the number of females, the second is the sample size, the third is the hypothesized proportion, and the fourth is the alternative hypothesis. By default the alternative is two-sided (the population proportion is not equal to the hypothesized proportion) and the confidence level is 0.95, which corresponds to a significance level of 0.05. The output reports `X-squared` rather than Z: with one degree of freedom, the chi-squared statistic is the square of the Z statistic, here computed with a continuity correction.

At the 0.05 significance level, we reject the null hypothesis because the p-value (0.01209) is less than 0.05. Consistently, the 95% confidence interval does not contain the hypothesized proportion 0.5.
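The call behind the output above is not shown. The output format matches base R's `prop.test()`, so a call consistent with the argument description would plausibly look like the sketch below. The object name `res` is illustrative.

```r
# One-sample test of H0: p = 0.5 against a two-sided alternative.
# correct = TRUE (the default) applies the continuity correction.
res <- prop.test(x = 812, n = 1525, p = 0.5, alternative = "two.sided")
print(res)

# Individual pieces of the result can be pulled out by name.
res$statistic   # X-squared
res$p.value     # p-value
res$conf.int    # 95 percent confidence interval

# Decision at the 0.05 significance level.
res$p.value < 0.05
```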

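- We can also work at the \(0.01\) significance level. Adding the argument `conf.level = 0.99` changes the reported confidence interval to 99 percent; the test statistic and the p-value stay the same.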

```
1-sample proportions test with continuity correction
data: 812 out of 1525, null probability 0.5
X-squared = 6.2977, df = 1, p-value = 0.01209
alternative hypothesis: true p is not equal to 0.5
99 percent confidence interval:
0.4991511 0.5654829
sample estimates:
p
0.532459
```

At the 0.01 significance level, we fail to reject the null hypothesis because the p-value (0.01209) is greater than 0.01. Consistently, the 99% confidence interval contains the hypothesized proportion 0.5.
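As a sketch, assuming the outputs come from `prop.test()`, the two confidence levels can be compared side by side to confirm that only the interval changes. The names `res95`, `res99`, and `alpha` are illustrative.

```r
res95 <- prop.test(812, 1525, p = 0.5)                     # default conf.level = 0.95
res99 <- prop.test(812, 1525, p = 0.5, conf.level = 0.99)  # wider interval

# The p-value does not depend on conf.level.
all.equal(res95$p.value, res99$p.value)

# The 99 percent interval is wider than the 95 percent one.
res95$conf.int
res99$conf.int

# The decision comes from comparing the p-value with the chosen alpha.
alpha <- 0.01
res99$p.value < alpha
```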

We can also change the alternative hypothesis to “less” or “greater” by adding the argument `alternative = "less"` or `alternative = "greater"`. Since the sample proportion is greater than the hypothesized proportion, we use the alternative hypothesis “greater” here.

```
1-sample proportions test with continuity correction
data: 812 out of 1525, null probability 0.5
X-squared = 6.2977, df = 1, p-value = 0.006045
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval:
0.5110761 1.0000000
sample estimates:
p
0.532459
```

At the 0.05 significance level, we reject the null hypothesis because the p-value (0.006045) is less than 0.05. Consistently, the one-sided 95% confidence interval, which runs from 0.5110761 up to 1, does not contain the hypothesized proportion 0.5.
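A one-sided call consistent with this output might look like the following sketch, again assuming `prop.test()`. The object name `res_greater` is illustrative.

```r
# H0: p = 0.5 against H1: p > 0.5.
res_greater <- prop.test(812, 1525, p = 0.5, alternative = "greater")
print(res_greater)

# The one-sided interval has a lower bound only; the upper bound is 1.
res_greater$conf.int

# For comparison, the two-sided p-value on the same data.
res_two_sided <- prop.test(812, 1525, p = 0.5)
c(one_sided = res_greater$p.value, two_sided = res_two_sided$p.value)
```

Because the sample proportion lies above 0.5, the one-sided p-value here is half the two-sided one.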

We can also change the hypothesized proportion to 0.6 by adding the argument `p = 0.6`. The null hypothesis is then that the population proportion of females is equal to \(0.6\).

```
1-sample proportions test with continuity correction
data: 812 out of 1525, null probability 0.6
X-squared = 28.706, df = 1, p-value = 8.426e-08
alternative hypothesis: true p is not equal to 0.6
95 percent confidence interval:
0.5070389 0.5577139
sample estimates:
p
0.532459
```

- At the 0.05 significance level, we reject the null hypothesis because the p-value (8.426e-08) is less than 0.05. Consistently, the 95% confidence interval does not contain the hypothesized proportion 0.6.
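A sketch of the corresponding call, assuming `prop.test()`, shows that changing `p` moves the test statistic and p-value but leaves the confidence interval for the proportion unchanged. The names `res_06` and `res_05` are illustrative.

```r
res_06 <- prop.test(812, 1525, p = 0.6)   # H0: p = 0.6
res_05 <- prop.test(812, 1525, p = 0.5)   # H0: p = 0.5, for comparison

# Statistic and p-value depend on the hypothesized proportion.
c(p_0.6 = unname(res_06$statistic), p_0.5 = unname(res_05$statistic))
c(p_0.6 = res_06$p.value, p_0.5 = res_05$p.value)

# The confidence interval is built around the sample proportion,
# so it is the same in both calls.
res_06$conf.int
res_05$conf.int
```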

**Two-proportion Z test**

The two-proportion Z test is a hypothesis test for the difference between two population proportions. It is also used with categorical data, this time measured in two independent groups.

- Hypotheses:
  - Null Hypothesis (\(H_0\)): The proportions in the two groups are equal.
  - Alternative Hypothesis (\(H_1\)): The proportions in the two groups are not equal.

\[\begin{aligned}
Z &= {(p_1 - p_2)\over \sqrt{p(1-p)({1\over n_1} + {1\over n_2})}}
\end{aligned}\]

- Here Z is the test statistic, \(p_1\) is the sample proportion of group 1, \(p_2\) is the sample proportion of group 2, \(p\) is the pooled proportion, \(n_1\) is the sample size of group 1, and \(n_2\) is the sample size of group 2.

\[\begin{aligned}
p ={ n_1 p_1 + n_2 p_2 \over n_1 + n_2}
\end{aligned}\]
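The two formulas can be evaluated by hand. The sketch below uses the counts from this lesson's example (812 females out of 1525 in `BEPS`, and 3120 out of 6000 in the second sample) and computes the plain Z statistic without a continuity correction. The variable names are illustrative.

```r
x1 <- 812;  n1 <- 1525   # group 1: successes and sample size
x2 <- 3120; n2 <- 6000   # group 2: successes and sample size

p1 <- x1 / n1            # sample proportion, group 1
p2 <- x2 / n2            # sample proportion, group 2

# Pooled proportion: the estimate of the common proportion under H0.
p_pool <- (n1 * p1 + n2 * p2) / (n1 + n2)

# Standard error of the difference under H0, then the test statistic.
se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z  <- (p1 - p2) / se

# Two-sided p-value from the standard normal distribution.
p_value <- 2 * pnorm(-abs(z))

cat("pooled p =", round(p_pool, 4), " z =", round(z, 4),
    " p-value =", round(p_value, 4), "\n")
```

The square of this Z should agree with the `X-squared` value that `prop.test()` reports when it is called with `correct = FALSE`.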

Examples

Let’s say we have a second sample, taken from voting data in the USA, with 3120 females and 2880 males, so the total sample size for the USA is 6000. We test whether the proportion of females differs between the two samples.

```
2-sample test for equality of proportions with continuity correction
data: c(812, 3120) out of c(1525, 6000)
X-squared = 0.70741, df = 1, p-value = 0.4003
alternative hypothesis: two.sided
95 percent confidence interval:
-0.01600389 0.04092192
sample estimates:
prop 1 prop 2
0.532459 0.520000
```

The first argument is a vector with the number of successes (females) in group 1 and group 2, and the second argument is a vector with the sample sizes of group 1 and group 2. By default the alternative hypothesis is “two.sided” and the confidence level is 0.95, which corresponds to a significance level of 0.05.

At the 0.05 significance level, we fail to reject the null hypothesis because the p-value (0.4003) is greater than 0.05. Consistently, the 95% confidence interval for the difference between the two proportions contains 0.
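The call behind this output is not shown. Assuming it is `prop.test()` with vector arguments, as the `data:` line of the output suggests, a matching sketch would be the following. The object name `res_two` is illustrative.

```r
# Two-sample test of H0: p1 = p2, with the default two-sided alternative
# and the default continuity correction.
res_two <- prop.test(x = c(812, 3120), n = c(1525, 6000))
print(res_two)

# Confidence interval for the difference p1 - p2.
ci <- res_two$conf.int

# The interval contains 0 exactly when its bounds have opposite signs.
ci[1] < 0 && ci[2] > 0

# Decision at the 0.05 significance level.
res_two$p.value < 0.05
```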

In this lesson, we learned the one-proportion and two-proportion Z test.

The one-proportion Z test is a hypothesis test for one population proportion. It is used to test categorical data.

The two-proportion Z test is a hypothesis test for the difference between two population proportions.

In the next lesson, we will learn the t-test.

Thank you!