Week 19: Lab Practice Exercises
Let’s use the British Election Study (BES) data which we used in Homework 1.
We want to understand the Conservatives’ vote share in 2019. To do so, first choose a continuous main independent variable that you think would help to explain this dependent variable. State a hypothesis and a credible mechanism to link X to Y.
\[Con19 = \beta_0 + \beta_1 Turnout19 + \varepsilon\]
H1: The higher the turnout in 2019, the higher the Conservative Party’s vote share.
Choose two control variables that you think would help to explain the dependent variable. State a hypothesis and a credible mechanism to link X to Y.
\[Con19 = \beta_0 + \beta_1 Turnout19 + \beta_2 Lab17 + \beta_3 Electorate19 + \varepsilon\]
H2: The higher the Labour Party’s vote share in 2017, the higher the Conservative Party’s vote share in 2019.
H3: The larger the electorate in 2019, the lower the Conservative Party’s vote share.
Call:
lm(formula = Con19 ~ Turnout19 + Lab17 + Electorate19, data = bes)
Residuals:
Min 1Q Median 3Q Max
-45.323 -5.672 2.635 9.593 22.852
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.887e+01 8.933e+00 8.828 < 2e-16 ***
Turnout19 -3.946e-01 1.126e-01 -3.505 0.000489 ***
Lab17 -6.380e-01 3.544e-02 -18.004 < 2e-16 ***
Electorate19 2.514e-04 5.788e-05 4.344 1.63e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.96 on 626 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.4026, Adjusted R-squared: 0.3998
F-statistic: 140.6 on 3 and 626 DF, p-value: < 2.2e-16
Call:
lm(formula = Con19 ~ Turnout19 + Lab17 + Electorate19, data = filter(mutate(drop_na(select(bes,
Con19, Turnout19, Lab17, Electorate19)), cooksd = cooks.distance(mod1)),
cooksd < cutoff))
Residuals:
Min 1Q Median 3Q Max
-36.728 -5.085 2.449 7.898 18.618
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.071e+02 8.015e+00 13.360 < 2e-16 ***
Turnout19 -5.520e-01 9.739e-02 -5.668 2.26e-08 ***
Lab17 -7.507e-01 3.085e-02 -24.333 < 2e-16 ***
Electorate19 1.013e-04 5.614e-05 1.804 0.0717 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.61 on 587 degrees of freedom
Multiple R-squared: 0.5523, Adjusted R-squared: 0.55
F-statistic: 241.4 on 3 and 587 DF, p-value: < 2.2e-16
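The nested call above drops influential observations using Cook's distance from the first model and then refits. A minimal base-R sketch of the same steps follows. It runs on `bes_sim`, a made-up stand-in for the BES data, and it assumes the common 4/n rule of thumb for `cutoff`, since the handout does not show how `cutoff` was defined.

```r
# Synthetic stand-in for the BES data; all values are invented for illustration.
set.seed(19)
n <- 200
bes_sim <- data.frame(
  Turnout19    = rnorm(n, mean = 67, sd = 5),
  Lab17        = rnorm(n, mean = 40, sd = 15),
  Electorate19 = rnorm(n, mean = 73000, sd = 8000)
)
bes_sim$Con19 <- 80 - 0.4 * bes_sim$Turnout19 - 0.6 * bes_sim$Lab17 +
  rnorm(n, sd = 10)
bes_sim$Con19[1:3] <- c(5, 95, 2)  # plant a few unusual constituencies

# Fit on complete cases so the Cook's distances line up with the rows of d.
vars <- c("Con19", "Turnout19", "Lab17", "Electorate19")
d    <- na.omit(bes_sim[, vars])
mod1 <- lm(Con19 ~ Turnout19 + Lab17 + Electorate19, data = d)

d$cooksd <- cooks.distance(mod1)
cutoff   <- 4 / nrow(d)            # assumed rule of thumb, not taken from the lab

# Refit without the observations flagged as influential.
mod2 <- lm(Con19 ~ Turnout19 + Lab17 + Electorate19,
           data = d[d$cooksd < cutoff, ])

c(n_full = nobs(mod1), n_trimmed = nobs(mod2))
```

Replacing `cutoff` with a fixed number such as 0.02 gives a more lenient filter that removes fewer rows.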
| | Model 1 | Model 2 |
|---|---|---|
| (Intercept) | 78.87 *** | 107.08 *** |
| | (8.93) | (8.02) |
| Turnout19 | -0.39 *** | -0.55 *** |
| | (0.11) | (0.10) |
| Lab17 | -0.64 *** | -0.75 *** |
| | (0.04) | (0.03) |
| Electorate19 | 0.00 *** | 0.00 |
| | (0.00) | (0.00) |
| N | 630 | 591 |
| R2 | 0.40 | 0.55 |

Standard errors in parentheses. *** p < 0.001; ** p < 0.01; * p < 0.05.
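The Electorate19 coefficient shows as 0.00 in the table only because it is measured per individual elector: the Model 1 estimate of 2.514e-04 corresponds to about 0.25 units of Con19 per 1,000 electors. The sketch below uses invented data (the names `electorate` and `con` are hypothetical) to show that rescaling a predictor rescales its coefficient without changing the fit.

```r
# Invented data: electorate sizes and a vote share that depends weakly on them.
set.seed(1)
n          <- 100
electorate <- rnorm(n, mean = 73000, sd = 8000)
con        <- 30 + 0.00025 * electorate + rnorm(n, sd = 5)

per_elector  <- lm(con ~ electorate)            # slope per single elector
per_thousand <- lm(con ~ I(electorate / 1000))  # slope per 1,000 electors

coef(per_elector)[2]    # tiny number, rounds to 0.00 at two decimals
coef(per_thousand)[2]   # same effect, 1,000 times larger
```

Both models have identical fitted values and R-squared; only the unit of the slope differs.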
Call:
lm(formula = Con19 ~ Turnout19 + Lab17 + Electorate19, data = filter(mutate(drop_na(select(bes,
Con19, Turnout19, Lab17, Electorate19)), cooksd = cooks.distance(mod1)),
cooksd < 0.02))
Residuals:
Min 1Q Median 3Q Max
-42.448 -5.303 2.615 9.010 21.123
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.120e+01 8.630e+00 10.567 < 2e-16 ***
Turnout19 -4.231e-01 1.076e-01 -3.931 9.41e-05 ***
Lab17 -6.838e-01 3.391e-02 -20.165 < 2e-16 ***
Electorate19 1.419e-04 5.646e-05 2.514 0.0122 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.24 on 619 degrees of freedom
Multiple R-squared: 0.4514, Adjusted R-squared: 0.4487
F-statistic: 169.7 on 3 and 619 DF, p-value: < 2.2e-16
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | 78.87 *** | 107.08 *** | 91.20 *** |
| | (8.93) | (8.02) | (8.63) |
| Turnout19 | -0.39 *** | -0.55 *** | -0.42 *** |
| | (0.11) | (0.10) | (0.11) |
| Lab17 | -0.64 *** | -0.75 *** | -0.68 *** |
| | (0.04) | (0.03) | (0.03) |
| Electorate19 | 0.00 *** | 0.00 | 0.00 * |
| | (0.00) | (0.00) | (0.00) |
| N | 630 | 591 | 623 |
| R2 | 0.40 | 0.55 | 0.45 |

Standard errors in parentheses. *** p < 0.001; ** p < 0.01; * p < 0.05.
Turnout19 Lab17 Electorate19
Turnout19 1.0000000 -0.56793983 0.18821553
Lab17 -0.5679398 1.00000000 -0.09995258
Electorate19 0.1882155 -0.09995258 1.00000000
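One way to read this correlation matrix as a multicollinearity check is through variance inflation factors, which equal the diagonal of the inverse of the predictors' correlation matrix. The sketch below re-enters the printed values by hand, so it assumes the correlations were computed on the same observations as the regression.

```r
# Correlation matrix of the three predictors, copied from the output above.
preds <- c("Turnout19", "Lab17", "Electorate19")
R <- matrix(
  c( 1.0000000, -0.56793983,  0.18821553,
    -0.5679398,  1.00000000, -0.09995258,
     0.1882155, -0.09995258,  1.00000000),
  nrow = 3, byrow = TRUE, dimnames = list(preds, preds)
)

# VIF of each predictor = corresponding diagonal element of the inverse matrix.
vif <- diag(solve(R))
round(vif, 2)
```

All three values come out well below 2, so even the strongest correlation here (about -0.57 between Turnout19 and Lab17) does not suggest a serious collinearity problem.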
\[ y = \beta_0 + \beta_1x_1 + \beta_2x_2\]
Here \(x_2\) is a dummy variable. When \(x_2 = 0\), the equation becomes:
\[ y = \beta_0 + \beta_1x_1\]
And when \(x_2 = 1\), the equation becomes:
\[\begin{align*} y &= \beta_0 + \beta_1x_1 + \beta_2 \\ \\ &= (\beta_0 + \beta_2) + \beta_1x_1 \end{align*}\]
Call:
lm(formula = Con19 ~ Turnout19 + ConPPCsex19, data = bes)
Residuals:
Min 1Q Median 3Q Max
-40.339 -11.541 1.715 12.248 39.568
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.1505 7.6170 -0.939 0.348
Turnout19 0.7855 0.1120 7.011 6.12e-12 ***
ConPPCsex19 -5.6565 1.3743 -4.116 4.37e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.88 on 628 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.1018, Adjusted R-squared: 0.09899
F-statistic: 35.61 on 2 and 628 DF, p-value: 2.249e-15
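Plugging the rounded estimates from this output into the two cases gives a pair of parallel lines, with \(x_1\) as Turnout19 and \(x_2\) as ConPPCsex19. This is a worked substitution only; which group is coded 0 and which is coded 1 is not shown in the output.
\[\begin{align*} x_2 = 0: \quad y &= -7.15 + 0.79x_1 \\ \\ x_2 = 1: \quad y &= (-7.15 - 5.66) + 0.79x_1 \\ \\ &= -12.81 + 0.79x_1 \end{align*}\]
The slope is the same in both groups; only the intercept shifts, by \(\beta_2 = -5.66\).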
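library(ggeffects)  # ggeffect() and its plot() method
library(ggplot2)    # labs()
library(ggpubr)     # stat_regline_equation()

# Predicted Con19 across turnout, one line per value of ConPPCsex19,
# with the fitted equation printed for each line
mod3 |>
  ggeffect(c("Turnout19", "ConPPCsex19")) |>
  plot(show_ci = FALSE) +
  stat_regline_equation(
    formula = y ~ x,
    label.x.npc = "left",
    label.y.npc = "top",
    show.legend = FALSE
  ) +
  labs(
    title = "Turnout19 and ConPPCsex19 on Con19",
    x = "Turnout in 2019",
    y = "Conservative vote share in 2019"
  )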
\[ y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2\]
Here \(x_2\) is again a dummy variable. When \(x_2 = 0\), the equation becomes:
\[ y = \beta_0 + \beta_1x_1\]
And when \(x_2 = 1\), the equation becomes:
\[\begin{align*} y &= \beta_0 + \beta_1x_1 + \beta_2 + \beta_3x_1 \\ \\ &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)x_1 \end{align*}\]
Call:
lm(formula = Con19 ~ Turnout19 * ConPPCsex19, data = bes)
Residuals:
Min 1Q Median 3Q Max
-40.046 -11.493 1.893 12.159 39.786
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.3393 9.5699 -0.453 0.650
Turnout19 0.7439 0.1410 5.276 1.82e-07 ***
ConPPCsex19 -13.2244 15.6404 -0.846 0.398
Turnout19:ConPPCsex19 0.1129 0.2324 0.486 0.627
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.89 on 627 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.1022, Adjusted R-squared: 0.09789
F-statistic: 23.79 on 3 and 627 DF, p-value: 1.367e-14
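Substituting the rounded estimates from this output shows how the interaction changes both the intercept and the slope, again with \(x_1\) as Turnout19 and \(x_2\) as ConPPCsex19:
\[\begin{align*} x_2 = 0: \quad y &= -4.34 + 0.74x_1 \\ \\ x_2 = 1: \quad y &= (-4.34 - 13.22) + (0.74 + 0.11)x_1 \\ \\ &= -17.56 + 0.86x_1 \end{align*}\]
The two slopes differ by \(\beta_3 = 0.11\), but with a p-value of 0.627 this difference cannot be distinguished from zero, so the lines are statistically parallel.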
Call:
lm(formula = Con19 ~ Turnout19 * Lab17, data = bes)
Residuals:
Min 1Q Median 3Q Max
-55.578 -4.436 3.682 8.604 18.449
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 201.654501 19.347048 10.423 < 2e-16 ***
Turnout19 -1.902698 0.275354 -6.910 1.20e-11 ***
Lab17 -2.992372 0.376425 -7.949 8.76e-15 ***
Turnout19:Lab17 0.034659 0.005514 6.286 6.12e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.76 on 626 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.4212, Adjusted R-squared: 0.4184
F-statistic: 151.8 on 3 and 626 DF, p-value: < 2.2e-16
\[ y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2\]
With \(x_1\) as Turnout19 and \(x_2\) as Lab17, when \(x_2 = 24.14\) the equation becomes:
\[\begin{align*} y &= \beta_0 + \beta_1x_1 + 24.14\beta_2 + 24.14\beta_3x_1 \\ \\ &= (\beta_0 + 24.14\beta_2) + (\beta_1 + 24.14\beta_3)x_1 \\ \\ &= (201.65 + 24.14 \times (-2.992)) + (-1.903 + 24.14 \times 0.0347)x_1 \\ \\ &\approx 129.42 - 1.07x_1 \end{align*}\]
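The same substitution can be checked numerically, and repeated for other Lab17 values. The sketch below copies the coefficients by hand from the `Turnout19 * Lab17` output; the helper `conditional_line()` is a hypothetical name introduced here, not part of the lab code.

```r
# Coefficients copied from the Turnout19 * Lab17 model output.
b <- c(intercept = 201.654501, turnout = -1.902698,
       lab17 = -2.992372, interaction = 0.034659)

# Intercept and Turnout19 slope of the Con19 line when Lab17 is held at `lab17`.
conditional_line <- function(lab17) {
  c(intercept = unname(b["intercept"] + b["lab17"] * lab17),
    slope     = unname(b["turnout"] + b["interaction"] * lab17))
}

line_24 <- conditional_line(24.14)
round(line_24, 2)

# Lab17 value at which the Turnout19 slope crosses zero.
zero_slope_at <- unname(-b["turnout"] / b["interaction"])
round(zero_slope_at, 1)
```

Because the interaction term is positive, the negative turnout slope shrinks as Lab17 rises and would reach zero at a Lab17 value of roughly 55.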