Lecture 5: Data Manipulation with tidyverse ~ Part 1
March 27, 2026
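Markdown essentials:
# controls heading levels; a new paragraph requires a full blank line
**bold**, *italic*, `inline code`; triple backticks create a code block
$...$ for inline math; $$...$$ for centered display math; LaTeX syntax
| and - draw tables; > starts a block quote; [^1] marks a footnote
Lecture outline:
The idea behind |> and %>% (about 10 minutes)
count(): the most intuitive frequency count (about 10 minutes)
filter() and arrange() (about 20 minutes)
select() and mutate() (about 20 minutes)
summarise() and .by = (about 15 minutes)
Tip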
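This lecture is the turning point of the whole course: the move from "writing code in R" to "doing data analysis with R". The tidyverse provides a consistent, intuitive grammar that makes data processing read almost like natural speech.
Ecosystem overview and core ideas
The tidyverse is not a single package but a collection of R packages that share one design philosophy: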
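Loading the whole collection takes a single call. The lines below are a minimal sketch, assuming the tidyverse package is already installed; tidyverse_packages() is used only to list the member packages and the variable name member_packages is chosen for illustration.

```r
# Attach the core tidyverse packages (dplyr, ggplot2, tidyr, readr, ...)
library(tidyverse)

# List every package that belongs to the tidyverse, core or not
member_packages <- tidyverse_packages()
head(member_packages)
```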
After loading, you will see:
── Attaching core tidyverse packages ──────────────────────
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
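| Package | Main purpose |
|---|---|
| readr | Reading CSV, TSV, and other text data |
| dplyr | Data manipulation (filtering, transforming, summarising) |
| tidyr | Data reshaping (wide ↔ long) |
| ggplot2 | Data visualization |
| stringr | String processing |
| lubridate | Date and time handling |
| purrr | Functional programming |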
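At the core of the tidyverse is the concept of tidy data, proposed by Hadley Wickham:
The three principles of tidy data: each variable forms a column, each observation forms a row, and each value occupies a single cell.
Common problems in untidy data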
# A tibble: 3 × 4
产品 `2019` `2020` `2021`
<chr> <dbl> <dbl> <dbl>
1 手机 100 120 150
2 电脑 80 90 110
3 平板 60 70 85
# A tibble: 3 × 3
患者ID 姓名_年龄 诊断结果
<int> <chr> <chr>
1 1 张三_25 高血压
2 2 李四_32 糖尿病
3 3 王五_28 感冒
# A tibble: 3 × 4
学生ID 语文成绩 数学成绩 英语成绩
<int> <dbl> <dbl> <dbl>
1 1 85 90 88
2 2 78 88 82
3 3 92 85 89
# A tibble: 5 × 5
员工姓名 工号 部门 部门人数 部门经理
<chr> <chr> <chr> <dbl> <chr>
1 张三 E001 销售部 NA <NA>
2 李四 E002 技术部 NA <NA>
3 王五 E003 销售部 NA <NA>
4 销售部 <NA> <NA> 15 赵经理
5 技术部 <NA> <NA> 25 钱经理
# A tibble: 3 × 5
指标 第一季度_实际 第一季度_预算 第二季度_实际 第二季度_预算
<chr> <dbl> <dbl> <dbl> <dbl>
1 销售额 100 90 110 105
2 利润 20 18 25 22
3 客户数 50 55 60 58
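The first table above stores years as column headers (产品 means "product"). One possible repair is tidyr's pivot_longer(). The sketch below is an assumption about how such a fix could look, not the lecture's own code: it rebuilds that table with English labels, and the names wide, long, year, and value are chosen for illustration.

```r
library(tibble)
library(tidyr)

# Stand-in for the first untidy table, with English labels
wide <- tibble(
  product = c("phone", "computer", "tablet"),
  `2019`  = c(100, 80, 60),
  `2020`  = c(120, 90, 70),
  `2021`  = c(150, 110, 85)
)

# Move the year headers into a `year` column: one row per product and year
long <- wide |>
  pivot_longer(cols = -product, names_to = "year", values_to = "value")

long
```

After reshaping, year is an ordinary variable that can be filtered, grouped, or plotted like any other column.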
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Ad…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Tor…
$ bill_len <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, …
$ bill_dep <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, …
$ flipper_len <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186, 180,…
$ body_mass <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250, …
$ sex <fct> male, female, female, NA, female, male, female, male, NA, …
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
species island bill_len bill_dep flipper_len body_mass sex year
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 female 2007
3 Adelie Torgersen 40.3 18.0 195 3250 female 2007
4 Adelie Torgersen NA NA NA NA <NA> 2007
5 Adelie Torgersen 36.7 19.3 193 3450 female 2007
6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
The same task, written two ways:
base R: keep the Adelie penguins and compute their mean body mass
[1] 3701
[1] 3701
tidyverse: the same task
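Both styles can be sketched on a small stand-in data frame built from a few rows printed earlier. The names mini and mean_mass are assumptions for illustration; if your R installation provides penguins with these column names, substituting it for mini should give the 3701 shown above, whereas the stand-in gives 3650.

```r
library(dplyr)

# Stand-in data: a handful of rows copied from the printed penguins table
mini <- data.frame(
  species   = c("Adelie", "Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"),
  body_mass = c(3750, 3800, NA, 3400, 4500, 5700)
)

# base R: subset with a logical index, then average the column
adelie <- mini[mini$species == "Adelie", ]
base_result <- round(mean(adelie$body_mass, na.rm = TRUE))

# tidyverse: the same steps as a left-to-right pipeline
tidy_result <- mini |>
  filter(species == "Adelie") |>
  summarise(mean_mass = round(mean(body_mass, na.rm = TRUE))) |>
  pull(mean_mass)

c(base = base_result, tidy = tidy_result)
```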
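Important
tidyverse code reads like a plain-language description: "take the penguin data, keep the Adelie species, compute the mean body mass". What each step does is clear at a glance. That is its core value: readability.
The tidyverse uses tibbles in place of base R's data.frame. A tibble prints more helpfully: it shows the number of rows and columns and the type of each column, and it truncates automatically when the output would exceed the screen: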
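The difference is easy to check by converting a data frame with as_tibble(). This is a minimal sketch on a two-row stand-in; the names df and tb are chosen for illustration.

```r
library(tibble)

# A plain base R data frame
df <- data.frame(
  species   = c("Adelie", "Gentoo"),
  body_mass = c(3750L, 4500L)
)

# The same data as a tibble
tb <- as_tibble(df)

class(df)  # a plain data.frame
class(tb)  # a tibble is still a data.frame, with extra classes on top
tb         # the printout starts with the dimensions and lists column types
```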
species island bill_len bill_dep flipper_len body_mass sex year
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 female 2007
3 Adelie Torgersen 40.3 18.0 195 3250 female 2007
4 Adelie Torgersen NA NA NA NA <NA> 2007
5 Adelie Torgersen 36.7 19.3 193 3450 female 2007
6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
7 Adelie Torgersen 38.9 17.8 181 3625 female 2007
8 Adelie Torgersen 39.2 19.6 195 4675 male 2007
9 Adelie Torgersen 34.1 18.1 193 3475 <NA> 2007
10 Adelie Torgersen 42.0 20.2 190 4250 <NA> 2007
11 Adelie Torgersen 37.8 17.1 186 3300 <NA> 2007
12 Adelie Torgersen 37.8 17.3 180 3700 <NA> 2007
13 Adelie Torgersen 41.1 17.6 182 3200 female 2007
14 Adelie Torgersen 38.6 21.2 191 3800 male 2007
15 Adelie Torgersen 34.6 21.1 198 4400 male 2007
16 Adelie Torgersen 36.6 17.8 185 3700 female 2007
17 Adelie Torgersen 38.7 19.0 195 3450 female 2007
18 Adelie Torgersen 42.5 20.7 197 4500 male 2007
19 Adelie Torgersen 34.4 18.4 184 3325 female 2007
20 Adelie Torgersen 46.0 21.5 194 4200 male 2007
21 Adelie Biscoe 37.8 18.3 174 3400 female 2007
22 Adelie Biscoe 37.7 18.7 180 3600 male 2007
23 Adelie Biscoe 35.9 19.2 189 3800 female 2007
24 Adelie Biscoe 38.2 18.1 185 3950 male 2007
25 Adelie Biscoe 38.8 17.2 180 3800 male 2007
26 Adelie Biscoe 35.3 18.9 187 3800 female 2007
27 Adelie Biscoe 40.6 18.6 183 3550 male 2007
28 Adelie Biscoe 40.5 17.9 187 3200 female 2007
29 Adelie Biscoe 37.9 18.6 172 3150 female 2007
30 Adelie Biscoe 40.5 18.9 180 3950 male 2007
31 Adelie Dream 39.5 16.7 178 3250 female 2007
32 Adelie Dream 37.2 18.1 178 3900 male 2007
33 Adelie Dream 39.5 17.8 188 3300 female 2007
34 Adelie Dream 40.9 18.9 184 3900 male 2007
35 Adelie Dream 36.4 17.0 195 3325 female 2007
36 Adelie Dream 39.2 21.1 196 4150 male 2007
37 Adelie Dream 38.8 20.0 190 3950 male 2007
38 Adelie Dream 42.2 18.5 180 3550 female 2007
39 Adelie Dream 37.6 19.3 181 3300 female 2007
40 Adelie Dream 39.8 19.1 184 4650 male 2007
41 Adelie Dream 36.5 18.0 182 3150 female 2007
42 Adelie Dream 40.8 18.4 195 3900 male 2007
43 Adelie Dream 36.0 18.5 186 3100 female 2007
44 Adelie Dream 44.1 19.7 196 4400 male 2007
45 Adelie Dream 37.0 16.9 185 3000 female 2007
46 Adelie Dream 39.6 18.8 190 4600 male 2007
47 Adelie Dream 41.1 19.0 182 3425 male 2007
48 Adelie Dream 37.5 18.9 179 2975 <NA> 2007
49 Adelie Dream 36.0 17.9 190 3450 female 2007
50 Adelie Dream 42.3 21.2 191 4150 male 2007
51 Adelie Biscoe 39.6 17.7 186 3500 female 2008
52 Adelie Biscoe 40.1 18.9 188 4300 male 2008
53 Adelie Biscoe 35.0 17.9 190 3450 female 2008
54 Adelie Biscoe 42.0 19.5 200 4050 male 2008
55 Adelie Biscoe 34.5 18.1 187 2900 female 2008
56 Adelie Biscoe 41.4 18.6 191 3700 male 2008
57 Adelie Biscoe 39.0 17.5 186 3550 female 2008
58 Adelie Biscoe 40.6 18.8 193 3800 male 2008
59 Adelie Biscoe 36.5 16.6 181 2850 female 2008
60 Adelie Biscoe 37.6 19.1 194 3750 male 2008
61 Adelie Biscoe 35.7 16.9 185 3150 female 2008
62 Adelie Biscoe 41.3 21.1 195 4400 male 2008
63 Adelie Biscoe 37.6 17.0 185 3600 female 2008
64 Adelie Biscoe 41.1 18.2 192 4050 male 2008
65 Adelie Biscoe 36.4 17.1 184 2850 female 2008
66 Adelie Biscoe 41.6 18.0 192 3950 male 2008
67 Adelie Biscoe 35.5 16.2 195 3350 female 2008
68 Adelie Biscoe 41.1 19.1 188 4100 male 2008
69 Adelie Torgersen 35.9 16.6 190 3050 female 2008
70 Adelie Torgersen 41.8 19.4 198 4450 male 2008
71 Adelie Torgersen 33.5 19.0 190 3600 female 2008
72 Adelie Torgersen 39.7 18.4 190 3900 male 2008
73 Adelie Torgersen 39.6 17.2 196 3550 female 2008
74 Adelie Torgersen 45.8 18.9 197 4150 male 2008
75 Adelie Torgersen 35.5 17.5 190 3700 female 2008
76 Adelie Torgersen 42.8 18.5 195 4250 male 2008
77 Adelie Torgersen 40.9 16.8 191 3700 female 2008
78 Adelie Torgersen 37.2 19.4 184 3900 male 2008
79 Adelie Torgersen 36.2 16.1 187 3550 female 2008
80 Adelie Torgersen 42.1 19.1 195 4000 male 2008
81 Adelie Torgersen 34.6 17.2 189 3200 female 2008
82 Adelie Torgersen 42.9 17.6 196 4700 male 2008
83 Adelie Torgersen 36.7 18.8 187 3800 female 2008
84 Adelie Torgersen 35.1 19.4 193 4200 male 2008
85 Adelie Dream 37.3 17.8 191 3350 female 2008
86 Adelie Dream 41.3 20.3 194 3550 male 2008
87 Adelie Dream 36.3 19.5 190 3800 male 2008
88 Adelie Dream 36.9 18.6 189 3500 female 2008
89 Adelie Dream 38.3 19.2 189 3950 male 2008
90 Adelie Dream 38.9 18.8 190 3600 female 2008
91 Adelie Dream 35.7 18.0 202 3550 female 2008
92 Adelie Dream 41.1 18.1 205 4300 male 2008
93 Adelie Dream 34.0 17.1 185 3400 female 2008
94 Adelie Dream 39.6 18.1 186 4450 male 2008
95 Adelie Dream 36.2 17.3 187 3300 female 2008
96 Adelie Dream 40.8 18.9 208 4300 male 2008
97 Adelie Dream 38.1 18.6 190 3700 female 2008
98 Adelie Dream 40.3 18.5 196 4350 male 2008
99 Adelie Dream 33.1 16.1 178 2900 female 2008
100 Adelie Dream 43.2 18.5 192 4100 male 2008
101 Adelie Biscoe 35.0 17.9 192 3725 female 2009
102 Adelie Biscoe 41.0 20.0 203 4725 male 2009
103 Adelie Biscoe 37.7 16.0 183 3075 female 2009
104 Adelie Biscoe 37.8 20.0 190 4250 male 2009
105 Adelie Biscoe 37.9 18.6 193 2925 female 2009
106 Adelie Biscoe 39.7 18.9 184 3550 male 2009
107 Adelie Biscoe 38.6 17.2 199 3750 female 2009
108 Adelie Biscoe 38.2 20.0 190 3900 male 2009
109 Adelie Biscoe 38.1 17.0 181 3175 female 2009
110 Adelie Biscoe 43.2 19.0 197 4775 male 2009
111 Adelie Biscoe 38.1 16.5 198 3825 female 2009
112 Adelie Biscoe 45.6 20.3 191 4600 male 2009
113 Adelie Biscoe 39.7 17.7 193 3200 female 2009
114 Adelie Biscoe 42.2 19.5 197 4275 male 2009
115 Adelie Biscoe 39.6 20.7 191 3900 female 2009
116 Adelie Biscoe 42.7 18.3 196 4075 male 2009
117 Adelie Torgersen 38.6 17.0 188 2900 female 2009
118 Adelie Torgersen 37.3 20.5 199 3775 male 2009
119 Adelie Torgersen 35.7 17.0 189 3350 female 2009
120 Adelie Torgersen 41.1 18.6 189 3325 male 2009
121 Adelie Torgersen 36.2 17.2 187 3150 female 2009
122 Adelie Torgersen 37.7 19.8 198 3500 male 2009
123 Adelie Torgersen 40.2 17.0 176 3450 female 2009
124 Adelie Torgersen 41.4 18.5 202 3875 male 2009
125 Adelie Torgersen 35.2 15.9 186 3050 female 2009
126 Adelie Torgersen 40.6 19.0 199 4000 male 2009
127 Adelie Torgersen 38.8 17.6 191 3275 female 2009
128 Adelie Torgersen 41.5 18.3 195 4300 male 2009
129 Adelie Torgersen 39.0 17.1 191 3050 female 2009
130 Adelie Torgersen 44.1 18.0 210 4000 male 2009
131 Adelie Torgersen 38.5 17.9 190 3325 female 2009
132 Adelie Torgersen 43.1 19.2 197 3500 male 2009
133 Adelie Dream 36.8 18.5 193 3500 female 2009
134 Adelie Dream 37.5 18.5 199 4475 male 2009
135 Adelie Dream 38.1 17.6 187 3425 female 2009
136 Adelie Dream 41.1 17.5 190 3900 male 2009
137 Adelie Dream 35.6 17.5 191 3175 female 2009
138 Adelie Dream 40.2 20.1 200 3975 male 2009
139 Adelie Dream 37.0 16.5 185 3400 female 2009
140 Adelie Dream 39.7 17.9 193 4250 male 2009
141 Adelie Dream 40.2 17.1 193 3400 female 2009
142 Adelie Dream 40.6 17.2 187 3475 male 2009
143 Adelie Dream 32.1 15.5 188 3050 female 2009
144 Adelie Dream 40.7 17.0 190 3725 male 2009
145 Adelie Dream 37.3 16.8 192 3000 female 2009
146 Adelie Dream 39.0 18.7 185 3650 male 2009
147 Adelie Dream 39.2 18.6 190 4250 male 2009
148 Adelie Dream 36.6 18.4 184 3475 female 2009
149 Adelie Dream 36.0 17.8 195 3450 female 2009
150 Adelie Dream 37.8 18.1 193 3750 male 2009
151 Adelie Dream 36.0 17.1 187 3700 female 2009
152 Adelie Dream 41.5 18.5 201 4000 male 2009
153 Gentoo Biscoe 46.1 13.2 211 4500 female 2007
154 Gentoo Biscoe 50.0 16.3 230 5700 male 2007
155 Gentoo Biscoe 48.7 14.1 210 4450 female 2007
156 Gentoo Biscoe 50.0 15.2 218 5700 male 2007
157 Gentoo Biscoe 47.6 14.5 215 5400 male 2007
158 Gentoo Biscoe 46.5 13.5 210 4550 female 2007
159 Gentoo Biscoe 45.4 14.6 211 4800 female 2007
160 Gentoo Biscoe 46.7 15.3 219 5200 male 2007
161 Gentoo Biscoe 43.3 13.4 209 4400 female 2007
162 Gentoo Biscoe 46.8 15.4 215 5150 male 2007
163 Gentoo Biscoe 40.9 13.7 214 4650 female 2007
164 Gentoo Biscoe 49.0 16.1 216 5550 male 2007
165 Gentoo Biscoe 45.5 13.7 214 4650 female 2007
166 Gentoo Biscoe 48.4 14.6 213 5850 male 2007
167 Gentoo Biscoe 45.8 14.6 210 4200 female 2007
168 Gentoo Biscoe 49.3 15.7 217 5850 male 2007
169 Gentoo Biscoe 42.0 13.5 210 4150 female 2007
170 Gentoo Biscoe 49.2 15.2 221 6300 male 2007
171 Gentoo Biscoe 46.2 14.5 209 4800 female 2007
172 Gentoo Biscoe 48.7 15.1 222 5350 male 2007
173 Gentoo Biscoe 50.2 14.3 218 5700 male 2007
174 Gentoo Biscoe 45.1 14.5 215 5000 female 2007
175 Gentoo Biscoe 46.5 14.5 213 4400 female 2007
176 Gentoo Biscoe 46.3 15.8 215 5050 male 2007
177 Gentoo Biscoe 42.9 13.1 215 5000 female 2007
178 Gentoo Biscoe 46.1 15.1 215 5100 male 2007
179 Gentoo Biscoe 44.5 14.3 216 4100 <NA> 2007
180 Gentoo Biscoe 47.8 15.0 215 5650 male 2007
181 Gentoo Biscoe 48.2 14.3 210 4600 female 2007
182 Gentoo Biscoe 50.0 15.3 220 5550 male 2007
183 Gentoo Biscoe 47.3 15.3 222 5250 male 2007
184 Gentoo Biscoe 42.8 14.2 209 4700 female 2007
185 Gentoo Biscoe 45.1 14.5 207 5050 female 2007
186 Gentoo Biscoe 59.6 17.0 230 6050 male 2007
187 Gentoo Biscoe 49.1 14.8 220 5150 female 2008
188 Gentoo Biscoe 48.4 16.3 220 5400 male 2008
189 Gentoo Biscoe 42.6 13.7 213 4950 female 2008
190 Gentoo Biscoe 44.4 17.3 219 5250 male 2008
191 Gentoo Biscoe 44.0 13.6 208 4350 female 2008
192 Gentoo Biscoe 48.7 15.7 208 5350 male 2008
193 Gentoo Biscoe 42.7 13.7 208 3950 female 2008
194 Gentoo Biscoe 49.6 16.0 225 5700 male 2008
195 Gentoo Biscoe 45.3 13.7 210 4300 female 2008
196 Gentoo Biscoe 49.6 15.0 216 4750 male 2008
197 Gentoo Biscoe 50.5 15.9 222 5550 male 2008
198 Gentoo Biscoe 43.6 13.9 217 4900 female 2008
199 Gentoo Biscoe 45.5 13.9 210 4200 female 2008
200 Gentoo Biscoe 50.5 15.9 225 5400 male 2008
201 Gentoo Biscoe 44.9 13.3 213 5100 female 2008
202 Gentoo Biscoe 45.2 15.8 215 5300 male 2008
203 Gentoo Biscoe 46.6 14.2 210 4850 female 2008
204 Gentoo Biscoe 48.5 14.1 220 5300 male 2008
205 Gentoo Biscoe 45.1 14.4 210 4400 female 2008
206 Gentoo Biscoe 50.1 15.0 225 5000 male 2008
207 Gentoo Biscoe 46.5 14.4 217 4900 female 2008
208 Gentoo Biscoe 45.0 15.4 220 5050 male 2008
209 Gentoo Biscoe 43.8 13.9 208 4300 female 2008
210 Gentoo Biscoe 45.5 15.0 220 5000 male 2008
211 Gentoo Biscoe 43.2 14.5 208 4450 female 2008
212 Gentoo Biscoe 50.4 15.3 224 5550 male 2008
213 Gentoo Biscoe 45.3 13.8 208 4200 female 2008
214 Gentoo Biscoe 46.2 14.9 221 5300 male 2008
215 Gentoo Biscoe 45.7 13.9 214 4400 female 2008
216 Gentoo Biscoe 54.3 15.7 231 5650 male 2008
217 Gentoo Biscoe 45.8 14.2 219 4700 female 2008
218 Gentoo Biscoe 49.8 16.8 230 5700 male 2008
219 Gentoo Biscoe 46.2 14.4 214 4650 <NA> 2008
220 Gentoo Biscoe 49.5 16.2 229 5800 male 2008
221 Gentoo Biscoe 43.5 14.2 220 4700 female 2008
222 Gentoo Biscoe 50.7 15.0 223 5550 male 2008
223 Gentoo Biscoe 47.7 15.0 216 4750 female 2008
224 Gentoo Biscoe 46.4 15.6 221 5000 male 2008
225 Gentoo Biscoe 48.2 15.6 221 5100 male 2008
226 Gentoo Biscoe 46.5 14.8 217 5200 female 2008
227 Gentoo Biscoe 46.4 15.0 216 4700 female 2008
228 Gentoo Biscoe 48.6 16.0 230 5800 male 2008
229 Gentoo Biscoe 47.5 14.2 209 4600 female 2008
230 Gentoo Biscoe 51.1 16.3 220 6000 male 2008
231 Gentoo Biscoe 45.2 13.8 215 4750 female 2008
232 Gentoo Biscoe 45.2 16.4 223 5950 male 2008
233 Gentoo Biscoe 49.1 14.5 212 4625 female 2009
234 Gentoo Biscoe 52.5 15.6 221 5450 male 2009
235 Gentoo Biscoe 47.4 14.6 212 4725 female 2009
236 Gentoo Biscoe 50.0 15.9 224 5350 male 2009
237 Gentoo Biscoe 44.9 13.8 212 4750 female 2009
238 Gentoo Biscoe 50.8 17.3 228 5600 male 2009
239 Gentoo Biscoe 43.4 14.4 218 4600 female 2009
240 Gentoo Biscoe 51.3 14.2 218 5300 male 2009
241 Gentoo Biscoe 47.5 14.0 212 4875 female 2009
242 Gentoo Biscoe 52.1 17.0 230 5550 male 2009
243 Gentoo Biscoe 47.5 15.0 218 4950 female 2009
244 Gentoo Biscoe 52.2 17.1 228 5400 male 2009
245 Gentoo Biscoe 45.5 14.5 212 4750 female 2009
246 Gentoo Biscoe 49.5 16.1 224 5650 male 2009
247 Gentoo Biscoe 44.5 14.7 214 4850 female 2009
248 Gentoo Biscoe 50.8 15.7 226 5200 male 2009
249 Gentoo Biscoe 49.4 15.8 216 4925 male 2009
250 Gentoo Biscoe 46.9 14.6 222 4875 female 2009
251 Gentoo Biscoe 48.4 14.4 203 4625 female 2009
252 Gentoo Biscoe 51.1 16.5 225 5250 male 2009
253 Gentoo Biscoe 48.5 15.0 219 4850 female 2009
254 Gentoo Biscoe 55.9 17.0 228 5600 male 2009
255 Gentoo Biscoe 47.2 15.5 215 4975 female 2009
256 Gentoo Biscoe 49.1 15.0 228 5500 male 2009
257 Gentoo Biscoe 47.3 13.8 216 4725 <NA> 2009
258 Gentoo Biscoe 46.8 16.1 215 5500 male 2009
259 Gentoo Biscoe 41.7 14.7 210 4700 female 2009
260 Gentoo Biscoe 53.4 15.8 219 5500 male 2009
261 Gentoo Biscoe 43.3 14.0 208 4575 female 2009
262 Gentoo Biscoe 48.1 15.1 209 5500 male 2009
263 Gentoo Biscoe 50.5 15.2 216 5000 female 2009
264 Gentoo Biscoe 49.8 15.9 229 5950 male 2009
265 Gentoo Biscoe 43.5 15.2 213 4650 female 2009
266 Gentoo Biscoe 51.5 16.3 230 5500 male 2009
267 Gentoo Biscoe 46.2 14.1 217 4375 female 2009
268 Gentoo Biscoe 55.1 16.0 230 5850 male 2009
269 Gentoo Biscoe 44.5 15.7 217 4875 <NA> 2009
270 Gentoo Biscoe 48.8 16.2 222 6000 male 2009
271 Gentoo Biscoe 47.2 13.7 214 4925 female 2009
272 Gentoo Biscoe NA NA NA NA <NA> 2009
273 Gentoo Biscoe 46.8 14.3 215 4850 female 2009
274 Gentoo Biscoe 50.4 15.7 222 5750 male 2009
275 Gentoo Biscoe 45.2 14.8 212 5200 female 2009
276 Gentoo Biscoe 49.9 16.1 213 5400 male 2009
277 Chinstrap Dream 46.5 17.9 192 3500 female 2007
278 Chinstrap Dream 50.0 19.5 196 3900 male 2007
279 Chinstrap Dream 51.3 19.2 193 3650 male 2007
280 Chinstrap Dream 45.4 18.7 188 3525 female 2007
281 Chinstrap Dream 52.7 19.8 197 3725 male 2007
282 Chinstrap Dream 45.2 17.8 198 3950 female 2007
283 Chinstrap Dream 46.1 18.2 178 3250 female 2007
284 Chinstrap Dream 51.3 18.2 197 3750 male 2007
285 Chinstrap Dream 46.0 18.9 195 4150 female 2007
286 Chinstrap Dream 51.3 19.9 198 3700 male 2007
287 Chinstrap Dream 46.6 17.8 193 3800 female 2007
288 Chinstrap Dream 51.7 20.3 194 3775 male 2007
289 Chinstrap Dream 47.0 17.3 185 3700 female 2007
290 Chinstrap Dream 52.0 18.1 201 4050 male 2007
291 Chinstrap Dream 45.9 17.1 190 3575 female 2007
292 Chinstrap Dream 50.5 19.6 201 4050 male 2007
293 Chinstrap Dream 50.3 20.0 197 3300 male 2007
294 Chinstrap Dream 58.0 17.8 181 3700 female 2007
295 Chinstrap Dream 46.4 18.6 190 3450 female 2007
296 Chinstrap Dream 49.2 18.2 195 4400 male 2007
297 Chinstrap Dream 42.4 17.3 181 3600 female 2007
298 Chinstrap Dream 48.5 17.5 191 3400 male 2007
299 Chinstrap Dream 43.2 16.6 187 2900 female 2007
300 Chinstrap Dream 50.6 19.4 193 3800 male 2007
301 Chinstrap Dream 46.7 17.9 195 3300 female 2007
302 Chinstrap Dream 52.0 19.0 197 4150 male 2007
303 Chinstrap Dream 50.5 18.4 200 3400 female 2008
304 Chinstrap Dream 49.5 19.0 200 3800 male 2008
305 Chinstrap Dream 46.4 17.8 191 3700 female 2008
306 Chinstrap Dream 52.8 20.0 205 4550 male 2008
307 Chinstrap Dream 40.9 16.6 187 3200 female 2008
308 Chinstrap Dream 54.2 20.8 201 4300 male 2008
309 Chinstrap Dream 42.5 16.7 187 3350 female 2008
310 Chinstrap Dream 51.0 18.8 203 4100 male 2008
311 Chinstrap Dream 49.7 18.6 195 3600 male 2008
312 Chinstrap Dream 47.5 16.8 199 3900 female 2008
313 Chinstrap Dream 47.6 18.3 195 3850 female 2008
314 Chinstrap Dream 52.0 20.7 210 4800 male 2008
315 Chinstrap Dream 46.9 16.6 192 2700 female 2008
316 Chinstrap Dream 53.5 19.9 205 4500 male 2008
317 Chinstrap Dream 49.0 19.5 210 3950 male 2008
318 Chinstrap Dream 46.2 17.5 187 3650 female 2008
319 Chinstrap Dream 50.9 19.1 196 3550 male 2008
320 Chinstrap Dream 45.5 17.0 196 3500 female 2008
321 Chinstrap Dream 50.9 17.9 196 3675 female 2009
322 Chinstrap Dream 50.8 18.5 201 4450 male 2009
323 Chinstrap Dream 50.1 17.9 190 3400 female 2009
324 Chinstrap Dream 49.0 19.6 212 4300 male 2009
325 Chinstrap Dream 51.5 18.7 187 3250 male 2009
326 Chinstrap Dream 49.8 17.3 198 3675 female 2009
327 Chinstrap Dream 48.1 16.4 199 3325 female 2009
328 Chinstrap Dream 51.4 19.0 201 3950 male 2009
329 Chinstrap Dream 45.7 17.3 193 3600 female 2009
330 Chinstrap Dream 50.7 19.7 203 4050 male 2009
331 Chinstrap Dream 42.5 17.3 187 3350 female 2009
332 Chinstrap Dream 52.2 18.8 197 3450 male 2009
333 Chinstrap Dream 45.2 16.6 191 3250 female 2009
334 Chinstrap Dream 49.3 19.9 203 4050 male 2009
335 Chinstrap Dream 50.2 18.8 202 3800 male 2009
336 Chinstrap Dream 45.6 19.4 194 3525 female 2009
337 Chinstrap Dream 51.9 19.5 206 3950 male 2009
338 Chinstrap Dream 46.8 16.5 189 3650 female 2009
339 Chinstrap Dream 45.7 17.0 195 3650 female 2009
340 Chinstrap Dream 55.8 19.8 207 4000 male 2009
341 Chinstrap Dream 43.5 18.1 202 3400 female 2009
342 Chinstrap Dream 49.6 18.2 193 3775 male 2009
343 Chinstrap Dream 50.8 19.0 210 4100 male 2009
344 Chinstrap Dream 50.2 18.7 198 3775 female 2009
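Note
penguins originates from the palmerpenguins package and is the built-in dataset used in this lecture: 344 rows and 8 columns recording body measurements of three penguin species on three Antarctic islands. The short column names shown here (bill_len, body_mass, and so on) match the copy bundled with R 4.5 and later, not the longer names used in the palmerpenguins package itself.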
The idea behind |> and %>%
Suppose we want to perform three operations on the penguin data:
Nested style (written inside out, hard to read):
[1] 5076
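Warning
This code has to be read from the innermost call outward, so the logical order is the exact reverse of the written order. Imagine how hard it would be to read with ten steps.
The pipe operator |> (native since R 4.1) or %>% (from the magrittr package) means:
take the result on the left and pass it as the first argument to the function on the right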
平均体重
1 5076
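Tip
How to read it: take penguins, then keep Gentoo, then drop missing values, then compute the mean.
The reading order of the code matches the execution order of the operations exactly. That is precisely the value of the pipe.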
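The nested and piped styles can be compared side by side. The sketch below is a reconstruction under assumptions, not the lecture's own code: it uses a stand-in data frame (mini) with a few rows copied from the printed table and an English column name (mean_mass) where the output above uses 平均体重 ("mean body mass"). On the stand-in the mean is 5100, not the 5076 obtained from the full data.

```r
library(dplyr)

# Stand-in data: a few rows copied from the printed penguins table
mini <- data.frame(
  species   = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  body_mass = c(3750, 3800, 4500, 5700, NA)
)

# Nested: must be read from the innermost call outward
nested <- summarise(
  filter(filter(mini, species == "Gentoo"), !is.na(body_mass)),
  mean_mass = round(mean(body_mass))
)

# Piped: each line is one step, in the order it runs
piped <- mini |>
  filter(species == "Gentoo") |>
  filter(!is.na(body_mass)) |>
  summarise(mean_mass = round(mean(body_mass)))

all.equal(nested, piped)
```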
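Differences between |> and %>%
| Feature | \|> | %>% |
|---|---|---|
| Origin | Native in R 4.1+ | magrittr package |
| Requires loading a package | No | Yes (included in tidyverse) |
| Placeholder | _ (R 4.2+) | . |
| Speed | Slightly faster | Slightly slower |
| Recommendation | ✅ Recommended for new code | ✅ Compatible with legacy code |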
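Note
The two are interchangeable in the vast majority of cases. This course uses |> throughout. Keyboard shortcut: Ctrl + Shift + M (in RStudio it inserts %>% by default and can be switched to |> in the settings).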
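The placeholder row of the table is where the two pipes differ most visibly. The sketch below assumes R 4.2 or later and an installed magrittr; the built-in mtcars data and the model formula are chosen only for illustration.

```r
library(magrittr)

# Native pipe: `_` marks where the left-hand side goes,
# and it must be passed to a named argument
fit_native <- mtcars |> lm(mpg ~ wt, data = _)

# magrittr pipe: `.` plays the same role
fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)

# Both calls fit the same model
all.equal(coef(fit_native), coef(fit_magrittr))
```

A placeholder is needed here because lm() takes the data as its second argument, so the default "first argument" rule would not work.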
species island 平均体重 样本量
1 Adelie Dream 3688 56
2 Adelie Torgersen 3706 51
3 Adelie Biscoe 3710 44
4 Chinstrap Dream 3733 68
5 Gentoo Biscoe 5076 123
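A table of this shape (one row per species and island, sorted by mean body mass) can be produced by a longer pipeline. The sketch below is a guess at the structure, not the lecture's code. It assumes dplyr 1.1.0 or later for the .by argument, uses a stand-in data frame (mini), and uses English column names mean_mass and n where the output above shows 平均体重 ("mean body mass") and 样本量 ("sample size").

```r
library(dplyr)

# Stand-in data: a few rows copied from the printed penguins table
mini <- data.frame(
  species   = c("Adelie", "Adelie", "Adelie", "Adelie",
                "Gentoo", "Gentoo", "Gentoo"),
  island    = c("Torgersen", "Torgersen", "Torgersen", "Biscoe",
                "Biscoe", "Biscoe", "Biscoe"),
  body_mass = c(3750, 3800, NA, 3400, 4500, 5700, NA)
)

result <- mini |>
  filter(!is.na(body_mass)) |>            # drop rows without a weight
  summarise(
    mean_mass = round(mean(body_mass)),   # group mean
    n         = n(),                      # group size after dropping NAs
    .by = c(species, island)              # one row per species-island pair
  ) |>
  arrange(mean_mass)                      # lightest group first

result
```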
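count(): quick frequency counts
The first step in exploring how data are distributed
count(): the most intuitive frequency count
count() is usually the first function to reach for when getting to know a new dataset: it tells you how many rows fall into each category.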
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Ad…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Tor…
$ bill_len <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, …
$ bill_dep <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, …
$ flipper_len <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186, 180,…
$ body_mass <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250, …
$ sex <fct> male, female, female, NA, female, male, female, male, NA, …
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
species n
1 Adelie 152
2 Chinstrap 68
3 Gentoo 124
Note
count(x) is equivalent to group_by(x) |> summarise(n = n()), but more concise. n is the default name of the count column and can be changed with the name = argument.
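The equivalence can be checked directly. This is a minimal sketch on a stand-in data frame; the names mini, by_count, by_summarise, renamed, and n_penguins are chosen for illustration.

```r
library(dplyr)

# Stand-in data: species labels only
mini <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo")
)

# Short form
by_count <- mini |> count(species)

# Long form that count() abbreviates
by_summarise <- mini |>
  group_by(species) |>
  summarise(n = n())

# Same counts either way
identical(by_count$n, by_summarise$n)

# Custom name for the count column
renamed <- mini |> count(species, name = "n_penguins")
renamed
```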
add_count()
species island n
1 Gentoo Biscoe 124
2 Chinstrap Dream 68
3 Adelie Dream 56
4 Adelie Torgersen 52
5 Adelie Biscoe 44
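Tip
count() collapses the data to one row per group; add_count() keeps every row and only adds a new column. The two suit different situations, and the comparison is taken further in the later discussion of summarise().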
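The contrast shows up in the row counts. The sketch below uses a stand-in data frame (mini); sort = TRUE orders the groups from largest to smallest, which is the ordering seen in the two-variable table above. The names counted and with_n are chosen for illustration.

```r
library(dplyr)

# Stand-in data: species and island for seven birds
mini <- data.frame(
  species = c("Adelie", "Adelie", "Adelie",
              "Gentoo", "Gentoo", "Gentoo", "Chinstrap"),
  island  = c("Torgersen", "Torgersen", "Biscoe",
              "Biscoe", "Biscoe", "Biscoe", "Dream")
)

# count(): one row per species-island combination, largest group first
counted <- mini |> count(species, island, sort = TRUE)

# add_count(): all seven rows stay, with the species total attached to each
with_n <- mini |> add_count(species)

nrow(counted)  # fewer rows than the input
nrow(with_n)   # same number of rows as the input
```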
filter() and arrange()
filter(): select rows by condition
species island bill_len bill_dep flipper_len body_mass sex year
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 female 2007
3 Adelie Torgersen 40.3 18.0 195 3250 female 2007
4 Adelie Torgersen NA NA NA NA <NA> 2007
5 Adelie Torgersen 36.7 19.3 193 3450 female 2007
6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
7 Adelie Torgersen 38.9 17.8 181 3625 female 2007
8 Adelie Torgersen 39.2 19.6 195 4675 male 2007
9 Adelie Torgersen 34.1 18.1 193 3475 <NA> 2007
10 Adelie Torgersen 42.0 20.2 190 4250 <NA> 2007
11 Adelie Torgersen 37.8 17.1 186 3300 <NA> 2007
12 Adelie Torgersen 37.8 17.3 180 3700 <NA> 2007
13 Adelie Torgersen 41.1 17.6 182 3200 female 2007
14 Adelie Torgersen 38.6 21.2 191 3800 male 2007
15 Adelie Torgersen 34.6 21.1 198 4400 male 2007
16 Adelie Torgersen 36.6 17.8 185 3700 female 2007
17 Adelie Torgersen 38.7 19.0 195 3450 female 2007
18 Adelie Torgersen 42.5 20.7 197 4500 male 2007
19 Adelie Torgersen 34.4 18.4 184 3325 female 2007
20 Adelie Torgersen 46.0 21.5 194 4200 male 2007
21 Adelie Biscoe 37.8 18.3 174 3400 female 2007
22 Adelie Biscoe 37.7 18.7 180 3600 male 2007
23 Adelie Biscoe 35.9 19.2 189 3800 female 2007
24 Adelie Biscoe 38.2 18.1 185 3950 male 2007
25 Adelie Biscoe 38.8 17.2 180 3800 male 2007
26 Adelie Biscoe 35.3 18.9 187 3800 female 2007
27 Adelie Biscoe 40.6 18.6 183 3550 male 2007
28 Adelie Biscoe 40.5 17.9 187 3200 female 2007
29 Adelie Biscoe 37.9 18.6 172 3150 female 2007
30 Adelie Biscoe 40.5 18.9 180 3950 male 2007
31 Adelie Dream 39.5 16.7 178 3250 female 2007
32 Adelie Dream 37.2 18.1 178 3900 male 2007
33 Adelie Dream 39.5 17.8 188 3300 female 2007
34 Adelie Dream 40.9 18.9 184 3900 male 2007
35 Adelie Dream 36.4 17.0 195 3325 female 2007
36 Adelie Dream 39.2 21.1 196 4150 male 2007
37 Adelie Dream 38.8 20.0 190 3950 male 2007
38 Adelie Dream 42.2 18.5 180 3550 female 2007
39 Adelie Dream 37.6 19.3 181 3300 female 2007
40 Adelie Dream 39.8 19.1 184 4650 male 2007
41 Adelie Dream 36.5 18.0 182 3150 female 2007
42 Adelie Dream 40.8 18.4 195 3900 male 2007
43 Adelie Dream 36.0 18.5 186 3100 female 2007
44 Adelie Dream 44.1 19.7 196 4400 male 2007
45 Adelie Dream 37.0 16.9 185 3000 female 2007
46 Adelie Dream 39.6 18.8 190 4600 male 2007
47 Adelie Dream 41.1 19.0 182 3425 male 2007
48 Adelie Dream 37.5 18.9 179 2975 <NA> 2007
49 Adelie Dream 36.0 17.9 190 3450 female 2007
50 Adelie Dream 42.3 21.2 191 4150 male 2007
51 Adelie Biscoe 39.6 17.7 186 3500 female 2008
52 Adelie Biscoe 40.1 18.9 188 4300 male 2008
53 Adelie Biscoe 35.0 17.9 190 3450 female 2008
54 Adelie Biscoe 42.0 19.5 200 4050 male 2008
55 Adelie Biscoe 34.5 18.1 187 2900 female 2008
56 Adelie Biscoe 41.4 18.6 191 3700 male 2008
57 Adelie Biscoe 39.0 17.5 186 3550 female 2008
58 Adelie Biscoe 40.6 18.8 193 3800 male 2008
59 Adelie Biscoe 36.5 16.6 181 2850 female 2008
60 Adelie Biscoe 37.6 19.1 194 3750 male 2008
61 Adelie Biscoe 35.7 16.9 185 3150 female 2008
62 Adelie Biscoe 41.3 21.1 195 4400 male 2008
63 Adelie Biscoe 37.6 17.0 185 3600 female 2008
64 Adelie Biscoe 41.1 18.2 192 4050 male 2008
65 Adelie Biscoe 36.4 17.1 184 2850 female 2008
66 Adelie Biscoe 41.6 18.0 192 3950 male 2008
67 Adelie Biscoe 35.5 16.2 195 3350 female 2008
68 Adelie Biscoe 41.1 19.1 188 4100 male 2008
69 Adelie Torgersen 35.9 16.6 190 3050 female 2008
70 Adelie Torgersen 41.8 19.4 198 4450 male 2008
71 Adelie Torgersen 33.5 19.0 190 3600 female 2008
72 Adelie Torgersen 39.7 18.4 190 3900 male 2008
73 Adelie Torgersen 39.6 17.2 196 3550 female 2008
74 Adelie Torgersen 45.8 18.9 197 4150 male 2008
75 Adelie Torgersen 35.5 17.5 190 3700 female 2008
76 Adelie Torgersen 42.8 18.5 195 4250 male 2008
77 Adelie Torgersen 40.9 16.8 191 3700 female 2008
78 Adelie Torgersen 37.2 19.4 184 3900 male 2008
79 Adelie Torgersen 36.2 16.1 187 3550 female 2008
80 Adelie Torgersen 42.1 19.1 195 4000 male 2008
81 Adelie Torgersen 34.6 17.2 189 3200 female 2008
82 Adelie Torgersen 42.9 17.6 196 4700 male 2008
83 Adelie Torgersen 36.7 18.8 187 3800 female 2008
84 Adelie Torgersen 35.1 19.4 193 4200 male 2008
85 Adelie Dream 37.3 17.8 191 3350 female 2008
86 Adelie Dream 41.3 20.3 194 3550 male 2008
87 Adelie Dream 36.3 19.5 190 3800 male 2008
88 Adelie Dream 36.9 18.6 189 3500 female 2008
89 Adelie Dream 38.3 19.2 189 3950 male 2008
90 Adelie Dream 38.9 18.8 190 3600 female 2008
91 Adelie Dream 35.7 18.0 202 3550 female 2008
92 Adelie Dream 41.1 18.1 205 4300 male 2008
93 Adelie Dream 34.0 17.1 185 3400 female 2008
94 Adelie Dream 39.6 18.1 186 4450 male 2008
95 Adelie Dream 36.2 17.3 187 3300 female 2008
96 Adelie Dream 40.8 18.9 208 4300 male 2008
97 Adelie Dream 38.1 18.6 190 3700 female 2008
98 Adelie Dream 40.3 18.5 196 4350 male 2008
99 Adelie Dream 33.1 16.1 178 2900 female 2008
100 Adelie Dream 43.2 18.5 192 4100 male 2008
101 Adelie Biscoe 35.0 17.9 192 3725 female 2009
102 Adelie Biscoe 41.0 20.0 203 4725 male 2009
103 Adelie Biscoe 37.7 16.0 183 3075 female 2009
104 Adelie Biscoe 37.8 20.0 190 4250 male 2009
105 Adelie Biscoe 37.9 18.6 193 2925 female 2009
106 Adelie Biscoe 39.7 18.9 184 3550 male 2009
107 Adelie Biscoe 38.6 17.2 199 3750 female 2009
108 Adelie Biscoe 38.2 20.0 190 3900 male 2009
109 Adelie Biscoe 38.1 17.0 181 3175 female 2009
110 Adelie Biscoe 43.2 19.0 197 4775 male 2009
111 Adelie Biscoe 38.1 16.5 198 3825 female 2009
112 Adelie Biscoe 45.6 20.3 191 4600 male 2009
113 Adelie Biscoe 39.7 17.7 193 3200 female 2009
114 Adelie Biscoe 42.2 19.5 197 4275 male 2009
115 Adelie Biscoe 39.6 20.7 191 3900 female 2009
116 Adelie Biscoe 42.7 18.3 196 4075 male 2009
117 Adelie Torgersen 38.6 17.0 188 2900 female 2009
118 Adelie Torgersen 37.3 20.5 199 3775 male 2009
119 Adelie Torgersen 35.7 17.0 189 3350 female 2009
120 Adelie Torgersen 41.1 18.6 189 3325 male 2009
121 Adelie Torgersen 36.2 17.2 187 3150 female 2009
122 Adelie Torgersen 37.7 19.8 198 3500 male 2009
123 Adelie Torgersen 40.2 17.0 176 3450 female 2009
124 Adelie Torgersen 41.4 18.5 202 3875 male 2009
125 Adelie Torgersen 35.2 15.9 186 3050 female 2009
126 Adelie Torgersen 40.6 19.0 199 4000 male 2009
127 Adelie Torgersen 38.8 17.6 191 3275 female 2009
128 Adelie Torgersen 41.5 18.3 195 4300 male 2009
129 Adelie Torgersen 39.0 17.1 191 3050 female 2009
130 Adelie Torgersen 44.1 18.0 210 4000 male 2009
131 Adelie Torgersen 38.5 17.9 190 3325 female 2009
132 Adelie Torgersen 43.1 19.2 197 3500 male 2009
133 Adelie Dream 36.8 18.5 193 3500 female 2009
134 Adelie Dream 37.5 18.5 199 4475 male 2009
135 Adelie Dream 38.1 17.6 187 3425 female 2009
136 Adelie Dream 41.1 17.5 190 3900 male 2009
137 Adelie Dream 35.6 17.5 191 3175 female 2009
138 Adelie Dream 40.2 20.1 200 3975 male 2009
139 Adelie Dream 37.0 16.5 185 3400 female 2009
140 Adelie Dream 39.7 17.9 193 4250 male 2009
141 Adelie Dream 40.2 17.1 193 3400 female 2009
142 Adelie Dream 40.6 17.2 187 3475 male 2009
143 Adelie Dream 32.1 15.5 188 3050 female 2009
144 Adelie Dream 40.7 17.0 190 3725 male 2009
145 Adelie Dream 37.3 16.8 192 3000 female 2009
146 Adelie Dream 39.0 18.7 185 3650 male 2009
147 Adelie Dream 39.2 18.6 190 4250 male 2009
148 Adelie Dream 36.6 18.4 184 3475 female 2009
149 Adelie Dream 36.0 17.8 195 3450 female 2009
150 Adelie Dream 37.8 18.1 193 3750 male 2009
151 Adelie Dream 36.0 17.1 187 3700 female 2009
152 Adelie Dream 41.5 18.5 201 4000 male 2009
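The 152 rows above are exactly the Adelie penguins, which is what an equality test on species would return. The sketch below shows that pattern on a stand-in data frame (mini), along with one behaviour worth knowing: filter() drops rows where the condition evaluates to NA. The names adelie and heavy and the 3760 threshold are chosen for illustration.

```r
library(dplyr)

# Stand-in data: one row has a missing body mass
mini <- data.frame(
  species   = c("Adelie", "Adelie", "Adelie", "Gentoo"),
  body_mass = c(3750, 3800, NA, 4500)
)

# Equality test: keeps every Adelie row, including the one with NA mass
adelie <- mini |> filter(species == "Adelie")

# Comparison on a column with NA: the NA row is silently dropped
heavy <- mini |> filter(body_mass > 3760)

nrow(adelie)
nrow(heavy)
```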
Common conditions in filter()
species island bill_len bill_dep flipper_len body_mass sex year
1 Gentoo Biscoe 50.0 16.3 230 5700 male 2007
2 Gentoo Biscoe 50.0 15.2 218 5700 male 2007
3 Gentoo Biscoe 47.6 14.5 215 5400 male 2007
4 Gentoo Biscoe 46.7 15.3 219 5200 male 2007
5 Gentoo Biscoe 46.8 15.4 215 5150 male 2007
6 Gentoo Biscoe 49.0 16.1 216 5550 male 2007
7 Gentoo Biscoe 48.4 14.6 213 5850 male 2007
8 Gentoo Biscoe 49.3 15.7 217 5850 male 2007
9 Gentoo Biscoe 49.2 15.2 221 6300 male 2007
10 Gentoo Biscoe 48.7 15.1 222 5350 male 2007
11 Gentoo Biscoe 50.2 14.3 218 5700 male 2007
12 Gentoo Biscoe 46.3 15.8 215 5050 male 2007
13 Gentoo Biscoe 46.1 15.1 215 5100 male 2007
14 Gentoo Biscoe 47.8 15.0 215 5650 male 2007
15 Gentoo Biscoe 50.0 15.3 220 5550 male 2007
16 Gentoo Biscoe 47.3 15.3 222 5250 male 2007
17 Gentoo Biscoe 45.1 14.5 207 5050 female 2007
18 Gentoo Biscoe 59.6 17.0 230 6050 male 2007
19 Gentoo Biscoe 49.1 14.8 220 5150 female 2008
20 Gentoo Biscoe 48.4 16.3 220 5400 male 2008
21 Gentoo Biscoe 44.4 17.3 219 5250 male 2008
22 Gentoo Biscoe 48.7 15.7 208 5350 male 2008
23 Gentoo Biscoe 49.6 16.0 225 5700 male 2008
24 Gentoo Biscoe 50.5 15.9 222 5550 male 2008
25 Gentoo Biscoe 50.5 15.9 225 5400 male 2008
26 Gentoo Biscoe 44.9 13.3 213 5100 female 2008
27 Gentoo Biscoe 45.2 15.8 215 5300 male 2008
28 Gentoo Biscoe 48.5 14.1 220 5300 male 2008
29 Gentoo Biscoe 45.0 15.4 220 5050 male 2008
30 Gentoo Biscoe 50.4 15.3 224 5550 male 2008
31 Gentoo Biscoe 46.2 14.9 221 5300 male 2008
32 Gentoo Biscoe 54.3 15.7 231 5650 male 2008
33 Gentoo Biscoe 49.8 16.8 230 5700 male 2008
34 Gentoo Biscoe 49.5 16.2 229 5800 male 2008
35 Gentoo Biscoe 50.7 15.0 223 5550 male 2008
36 Gentoo Biscoe 48.2 15.6 221 5100 male 2008
37 Gentoo Biscoe 46.5 14.8 217 5200 female 2008
38 Gentoo Biscoe 48.6 16.0 230 5800 male 2008
39 Gentoo Biscoe 51.1 16.3 220 6000 male 2008
40 Gentoo Biscoe 45.2 16.4 223 5950 male 2008
41 Gentoo Biscoe 52.5 15.6 221 5450 male 2009
42 Gentoo Biscoe 50.0 15.9 224 5350 male 2009
43 Gentoo Biscoe 50.8 17.3 228 5600 male 2009
44 Gentoo Biscoe 51.3 14.2 218 5300 male 2009
45 Gentoo Biscoe 52.1 17.0 230 5550 male 2009
46 Gentoo Biscoe 52.2 17.1 228 5400 male 2009
47 Gentoo Biscoe 49.5 16.1 224 5650 male 2009
48 Gentoo Biscoe 50.8 15.7 226 5200 male 2009
49 Gentoo Biscoe 51.1 16.5 225 5250 male 2009
50 Gentoo Biscoe 55.9 17.0 228 5600 male 2009
51 Gentoo Biscoe 49.1 15.0 228 5500 male 2009
52 Gentoo Biscoe 46.8 16.1 215 5500 male 2009
53 Gentoo Biscoe 53.4 15.8 219 5500 male 2009
54 Gentoo Biscoe 48.1 15.1 209 5500 male 2009
55 Gentoo Biscoe 49.8 15.9 229 5950 male 2009
56 Gentoo Biscoe 51.5 16.3 230 5500 male 2009
57 Gentoo Biscoe 55.1 16.0 230 5850 male 2009
58 Gentoo Biscoe 48.8 16.2 222 6000 male 2009
59 Gentoo Biscoe 50.4 15.7 222 5750 male 2009
60 Gentoo Biscoe 45.2 14.8 212 5200 female 2009
61 Gentoo Biscoe 49.9 16.1 213 5400 male 2009
species island bill_len bill_dep flipper_len body_mass sex year
1 Chinstrap Dream 51.3 19.2 193 3650 male 2007
2 Chinstrap Dream 52.7 19.8 197 3725 male 2007
3 Chinstrap Dream 51.3 18.2 197 3750 male 2007
4 Chinstrap Dream 51.3 19.9 198 3700 male 2007
5 Chinstrap Dream 51.7 20.3 194 3775 male 2007
6 Chinstrap Dream 52.0 18.1 201 4050 male 2007
7 Chinstrap Dream 50.5 19.6 201 4050 male 2007
8 Chinstrap Dream 50.3 20.0 197 3300 male 2007
9 Chinstrap Dream 58.0 17.8 181 3700 female 2007
10 Chinstrap Dream 50.6 19.4 193 3800 male 2007
11 Chinstrap Dream 52.0 19.0 197 4150 male 2007
12 Chinstrap Dream 50.5 18.4 200 3400 female 2008
13 Chinstrap Dream 52.8 20.0 205 4550 male 2008
14 Chinstrap Dream 54.2 20.8 201 4300 male 2008
15 Chinstrap Dream 51.0 18.8 203 4100 male 2008
16 Chinstrap Dream 52.0 20.7 210 4800 male 2008
17 Chinstrap Dream 53.5 19.9 205 4500 male 2008
18 Chinstrap Dream 50.9 19.1 196 3550 male 2008
19 Chinstrap Dream 50.9 17.9 196 3675 female 2009
20 Chinstrap Dream 50.8 18.5 201 4450 male 2009
21 Chinstrap Dream 50.1 17.9 190 3400 female 2009
22 Chinstrap Dream 51.5 18.7 187 3250 male 2009
23 Chinstrap Dream 51.4 19.0 201 3950 male 2009
24 Chinstrap Dream 50.7 19.7 203 4050 male 2009
25 Chinstrap Dream 52.2 18.8 197 3450 male 2009
26 Chinstrap Dream 50.2 18.8 202 3800 male 2009
27 Chinstrap Dream 51.9 19.5 206 3950 male 2009
28 Chinstrap Dream 55.8 19.8 207 4000 male 2009
29 Chinstrap Dream 50.8 19.0 210 4100 male 2009
30 Chinstrap Dream 50.2 18.7 198 3775 female 2009
Advanced filter() usage
species n
1 Adelie 152
2 Gentoo 124
species island bill_len bill_dep flipper_len body_mass sex year
1 Adelie Torgersen 40.3 18.0 195 3250 female 2007
2 Adelie Torgersen 42.0 20.2 190 4250 <NA> 2007
3 Adelie Torgersen 41.1 17.6 182 3200 female 2007
4 Adelie Torgersen 42.5 20.7 197 4500 male 2007
5 Adelie Biscoe 40.6 18.6 183 3550 male 2007
6 Adelie Biscoe 40.5 17.9 187 3200 female 2007
7 Adelie Biscoe 40.5 18.9 180 3950 male 2007
8 Adelie Dream 40.9 18.9 184 3900 male 2007
9 Adelie Dream 42.2 18.5 180 3550 female 2007
10 Adelie Dream 40.8 18.4 195 3900 male 2007
11 Adelie Dream 44.1 19.7 196 4400 male 2007
12 Adelie Dream 41.1 19.0 182 3425 male 2007
13 Adelie Dream 42.3 21.2 191 4150 male 2007
14 Adelie Biscoe 40.1 18.9 188 4300 male 2008
15 Adelie Biscoe 42.0 19.5 200 4050 male 2008
16 Adelie Biscoe 41.4 18.6 191 3700 male 2008
17 Adelie Biscoe 40.6 18.8 193 3800 male 2008
18 Adelie Biscoe 41.3 21.1 195 4400 male 2008
19 Adelie Biscoe 41.1 18.2 192 4050 male 2008
20 Adelie Biscoe 41.6 18.0 192 3950 male 2008
21 Adelie Biscoe 41.1 19.1 188 4100 male 2008
22 Adelie Torgersen 41.8 19.4 198 4450 male 2008
23 Adelie Torgersen 42.8 18.5 195 4250 male 2008
24 Adelie Torgersen 40.9 16.8 191 3700 female 2008
25 Adelie Torgersen 42.1 19.1 195 4000 male 2008
26 Adelie Torgersen 42.9 17.6 196 4700 male 2008
27 Adelie Dream 41.3 20.3 194 3550 male 2008
28 Adelie Dream 41.1 18.1 205 4300 male 2008
29 Adelie Dream 40.8 18.9 208 4300 male 2008
30 Adelie Dream 40.3 18.5 196 4350 male 2008
31 Adelie Dream 43.2 18.5 192 4100 male 2008
32 Adelie Biscoe 41.0 20.0 203 4725 male 2009
33 Adelie Biscoe 43.2 19.0 197 4775 male 2009
34 Adelie Biscoe 42.2 19.5 197 4275 male 2009
35 Adelie Biscoe 42.7 18.3 196 4075 male 2009
36 Adelie Torgersen 41.1 18.6 189 3325 male 2009
37 Adelie Torgersen 40.2 17.0 176 3450 female 2009
38 Adelie Torgersen 41.4 18.5 202 3875 male 2009
39 Adelie Torgersen 40.6 19.0 199 4000 male 2009
40 Adelie Torgersen 41.5 18.3 195 4300 male 2009
41 Adelie Torgersen 44.1 18.0 210 4000 male 2009
42 Adelie Torgersen 43.1 19.2 197 3500 male 2009
43 Adelie Dream 41.1 17.5 190 3900 male 2009
44 Adelie Dream 40.2 20.1 200 3975 male 2009
45 Adelie Dream 40.2 17.1 193 3400 female 2009
46 Adelie Dream 40.6 17.2 187 3475 male 2009
47 Adelie Dream 40.7 17.0 190 3725 male 2009
48 Adelie Dream 41.5 18.5 201 4000 male 2009
49 Gentoo Biscoe 43.3 13.4 209 4400 female 2007
50 Gentoo Biscoe 40.9 13.7 214 4650 female 2007
51 Gentoo Biscoe 42.0 13.5 210 4150 female 2007
52 Gentoo Biscoe 42.9 13.1 215 5000 female 2007
53 Gentoo Biscoe 44.5 14.3 216 4100 <NA> 2007
54 Gentoo Biscoe 42.8 14.2 209 4700 female 2007
55 Gentoo Biscoe 42.6 13.7 213 4950 female 2008
56 Gentoo Biscoe 44.4 17.3 219 5250 male 2008
57 Gentoo Biscoe 44.0 13.6 208 4350 female 2008
58 Gentoo Biscoe 42.7 13.7 208 3950 female 2008
59 Gentoo Biscoe 43.6 13.9 217 4900 female 2008
60 Gentoo Biscoe 44.9 13.3 213 5100 female 2008
61 Gentoo Biscoe 45.0 15.4 220 5050 male 2008
62 Gentoo Biscoe 43.8 13.9 208 4300 female 2008
63 Gentoo Biscoe 43.2 14.5 208 4450 female 2008
64 Gentoo Biscoe 43.5 14.2 220 4700 female 2008
65 Gentoo Biscoe 44.9 13.8 212 4750 female 2009
66 Gentoo Biscoe 43.4 14.4 218 4600 female 2009
67 Gentoo Biscoe 44.5 14.7 214 4850 female 2009
68 Gentoo Biscoe 41.7 14.7 210 4700 female 2009
69 Gentoo Biscoe 43.3 14.0 208 4575 female 2009
70 Gentoo Biscoe 43.5 15.2 213 4650 female 2009
71 Gentoo Biscoe 44.5 15.7 217 4875 <NA> 2009
72 Chinstrap Dream 42.4 17.3 181 3600 female 2007
73 Chinstrap Dream 43.2 16.6 187 2900 female 2007
74 Chinstrap Dream 40.9 16.6 187 3200 female 2008
75 Chinstrap Dream 42.5 16.7 187 3350 female 2008
76 Chinstrap Dream 42.5 17.3 187 3350 female 2009
77 Chinstrap Dream 43.5 18.1 202 3400 female 2009
species island bill_len bill_dep flipper_len body_mass sex year
1 Adelie Torgersen NA NA NA NA <NA> 2007
2 Adelie Torgersen 34.1 18.1 193 3475 <NA> 2007
3 Adelie Torgersen 42.0 20.2 190 4250 <NA> 2007
4 Adelie Torgersen 37.8 17.1 186 3300 <NA> 2007
5 Adelie Torgersen 37.8 17.3 180 3700 <NA> 2007
6 Adelie Dream 37.5 18.9 179 2975 <NA> 2007
7 Gentoo Biscoe 44.5 14.3 216 4100 <NA> 2007
8 Gentoo Biscoe 46.2 14.4 214 4650 <NA> 2008
9 Gentoo Biscoe 47.3 13.8 216 4725 <NA> 2009
10 Gentoo Biscoe 44.5 15.7 217 4875 <NA> 2009
11 Gentoo Biscoe NA NA NA NA <NA> 2009
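The three outputs above look like what the filter() helpers listed in this lecture would produce: a membership test with %in%, a range test with between(), and a missing-value test with is.na(). The following is a minimal sketch rather than the lecture's original code. It assumes a small hand-typed excerpt named peng (a hypothetical name; the rows are copied from the outputs above) so that it runs without the full penguins data.

```r
library(dplyr)

# Assumption: `peng` is a hand-typed excerpt, not the full dataset.
# Swap `peng` for `penguins` to work with all 344 rows.
peng <- data.frame(
  species  = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  bill_len = c(39.1, 40.3, NA, 46.2, 44.5, 42.4),
  sex      = c("male", "female", NA, "male", NA, "female")
)

# %in%: keep rows whose species matches any value in the vector
two_species <- peng |>
  filter(species %in% c("Adelie", "Gentoo")) |>
  count(species)

# between(): inclusive range, same as bill_len >= 40 & bill_len <= 45
mid_bills <- peng |> filter(between(bill_len, 40, 45))

# is.na(): `sex == NA` is never TRUE, so missing values need is.na()
no_sex <- peng |> filter(is.na(sex))
```

filter() keeps only rows where the condition is TRUE, so the row with a missing bill_len drops out of mid_bills without any extra handling.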
arrange(): sorting rows
species body_mass
1 Chinstrap 2700
2 Adelie 2850
3 Adelie 2850
4 Adelie 2900
5 Adelie 2900
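The output above lists the lightest penguins first, which is arrange()'s default ascending order. A short sketch of ascending, descending, and multi-column sorting follows. The four-row peng excerpt is hand-typed for illustration (a hypothetical stand-in for penguins), with values taken from outputs in this lecture.

```r
library(dplyr)

# Assumption: tiny hand-typed excerpt; use `penguins` for the real data.
peng <- data.frame(
  species   = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  body_mass = c(3750, NA, 5300, 3650)
)

# Ascending (default): lightest first
lightest <- peng |> arrange(body_mass)

# desc() flips the order for that one column: heaviest first
heaviest <- peng |> arrange(desc(body_mass))

# Several keys: sort by species, then by descending mass within each species
by_group <- peng |> arrange(species, desc(body_mass))
```

In both directions arrange() places missing values at the end, so a row with NA never shows up as the lightest or the heaviest.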
slice(): selecting rows by position
species island bill_len bill_dep flipper_len body_mass sex year
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 female 2007
3 Adelie Torgersen 40.3 18.0 195 3250 female 2007
4 Adelie Torgersen NA NA NA NA <NA> 2007
5 Adelie Torgersen 36.7 19.3 193 3450 female 2007
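Tip
slice_max() / slice_min() / slice_head() / slice_tail() / slice_sample() are a handy family of row-selection functions. They are especially powerful combined with the by = grouping argument (these functions spell it by, without the leading dot).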
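As a rough illustration of the tip, the sketch below picks the heaviest and lightest penguin per species with by =. It assumes dplyr 1.1.0 or later (where by = is available) and a hand-typed excerpt named peng, which is a hypothetical stand-in for the full penguins data.

```r
library(dplyr)

# Assumption: hand-typed excerpt; replace `peng` with `penguins` for real use.
peng <- data.frame(
  species   = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  body_mass = c(3750, 3800, 3250, 5300, 5200, 3650)
)

# First two rows by position, regardless of content
first_two <- peng |> slice_head(n = 2)

# One heaviest row per species: the grouping lasts for this call only
heaviest_each <- peng |> slice_max(body_mass, n = 1, by = species)

# One lightest row per species
lightest_each <- peng |> slice_min(body_mass, n = 1, by = species)
```

With the default with_ties = TRUE, slice_max() and slice_min() can return more than n rows per group when values tie.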
select() and mutate()
select(): choosing columns
species island body_mass
1 Adelie Torgersen 3750
2 Adelie Torgersen 3800
3 Adelie Torgersen 3250
4 Adelie Torgersen NA
Helper functions for select()
species bill_len bill_dep
1 Adelie 39.1 18.7
2 Adelie 39.5 17.4
3 Adelie 40.3 18.0
4 Adelie NA NA
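The output above keeps species plus every column whose name starts with "bill", which is what starts_with() does. Here is a small sketch of that helper alongside a few others from the same tidyselect family. The two-row peng data frame is hand-typed for illustration and only mimics the column names of penguins.

```r
library(dplyr)

# Assumption: two hand-typed rows with the same column names as `penguins`.
peng <- data.frame(
  species     = c("Adelie", "Gentoo"),
  island      = c("Torgersen", "Biscoe"),
  bill_len    = c(39.1, 46.2),
  bill_dep    = c(18.7, 14.9),
  flipper_len = c(181L, 221L),
  body_mass   = c(3750L, 5300L)
)

# Columns whose name starts with a prefix
bills <- peng |> select(species, starts_with("bill"))

# Columns whose name ends with a suffix
lens <- peng |> select(ends_with("_len"))

# Columns chosen by type rather than by name
nums <- peng |> select(where(is.numeric))

# A minus sign drops a column and keeps the rest
no_island <- peng |> select(-island)
```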
rename(): renaming columns (物种 = species, 岛屿 = island, 体重_克 = body mass in grams)
物种 岛屿 体重_克
1 Adelie Torgersen 3750
2 Adelie Torgersen 3800
3 Adelie Torgersen 3250
4 Adelie Torgersen NA
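rename() takes pairs written as new_name = old_name. The sketch below uses an English name (mass_g, a hypothetical choice) instead of the Chinese names in the output above; in a UTF-8 session the Chinese names are written the same way. The peng data frame is a hand-typed excerpt, not the full dataset.

```r
library(dplyr)

# Assumption: hand-typed excerpt of the three columns shown above.
peng <- data.frame(
  species   = c("Adelie", "Adelie"),
  island    = c("Torgersen", "Torgersen"),
  body_mass = c(3750L, 3800L)
)

# new_name = old_name; all other columns are kept unchanged
renamed <- peng |> rename(mass_g = body_mass)

# rename_with() applies a function to the column names
upper <- peng |> rename_with(toupper)

# select() can rename while it selects
picked <- peng |> select(species, mass_g = body_mass)
```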
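mutate(): creating or modifying columns (体重_千克 = body mass in kg, 嘴峰长宽比 = bill length-to-depth ratio, 大型企鹅 = large penguin)
species 体重_千克 嘴峰长宽比 大型企鹅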
1 Adelie 3.75 2.09 FALSE
2 Adelie 3.80 2.27 FALSE
3 Adelie 3.25 2.24 FALSE
4 Adelie NA NA NA
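The three new columns in the output above can be approximated as follows. This is a sketch under assumptions: the column names are English stand-ins, the 4200 g cutoff for "large" is a guess for illustration, and peng holds the first four rows of the data typed in by hand.

```r
library(dplyr)

# Assumption: first four rows of the data, typed in by hand.
peng <- data.frame(
  species   = c("Adelie", "Adelie", "Adelie", "Adelie"),
  bill_len  = c(39.1, 39.5, 40.3, NA),
  bill_dep  = c(18.7, 17.4, 18.0, NA),
  body_mass = c(3750, 3800, 3250, NA)
)

out <- peng |>
  mutate(
    mass_kg    = body_mass / 1000,               # unit conversion
    bill_ratio = round(bill_len / bill_dep, 2),  # derived from two columns
    is_large   = body_mass > 4200                # logical flag; cutoff is hypothetical
  )
```

Any calculation that touches a missing value yields NA, which is why the fourth row is NA in all three new columns.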
mutate() with conditional logic (体型 = body size, 小型 = small)
species body_mass 体型
1 Adelie 3750 小型
2 Adelie 3800 小型
3 Adelie 3250 小型
4 Adelie NA <NA>
5 Adelie 3450 小型
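A column like the one above can come from if_else() for a two-way split or from case_when() for several branches. The sketch below shows both. The cutoffs (4200, 3500, 4500) and the English labels are assumptions for illustration, and peng is a hand-typed excerpt.

```r
library(dplyr)

# Assumption: hand-typed excerpt with one missing body mass.
peng <- data.frame(
  species   = c("Adelie", "Adelie", "Adelie", "Gentoo"),
  body_mass = c(3750, 3250, NA, 5300)
)

out <- peng |>
  mutate(
    # Two outcomes: if_else(condition, value_if_true, value_if_false)
    size2 = if_else(body_mass > 4200, "large", "small"),
    # Several outcomes: conditions are checked top to bottom, first TRUE wins
    size3 = case_when(
      body_mass < 3500  ~ "small",
      body_mass < 4500  ~ "medium",
      body_mass >= 4500 ~ "large"
    )
  )
```

Rows where no condition is TRUE, such as a missing body_mass, get NA in both columns, matching the <NA> in the output above.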
Advanced mutate() usage
species n
1 ADELIE 152
2 CHINSTRAP 68
3 GENTOO 124
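Tip
across(), introduced in dplyr 1.0, is a powerful tool. ~ round(.x, 1) is the formula shorthand for an anonymous function, in which .x stands for the current column. The call is equivalent to running round(column, 1) on every column that matches the selection.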
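To make the across() pattern concrete, here is a small sketch that rounds every double column and upper-cases every character column in a single mutate(). The ratio column and the two-row peng excerpt are assumptions added for the example.

```r
library(dplyr)

# Assumption: two hand-typed rows; `ratio` is added only to have something to round.
peng <- data.frame(
  species  = c("Adelie", "Gentoo"),
  bill_len = c(39.1, 46.2),
  bill_dep = c(18.7, 14.9)
)

out <- peng |>
  mutate(ratio = bill_len / bill_dep) |>
  mutate(
    # first argument: which columns; second: what to do with each one
    across(where(is.double), ~ round(.x, 1)),
    # a bare function name works when no extra arguments are needed
    across(where(is.character), toupper)
  )
```

The base R lambda \(x) round(x, 1) can be used in place of ~ round(.x, 1); both forms are accepted by across().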
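summarise() and .by=
summarise(): summary statistics (columns: 样本量 = sample size, 平均体重 = mean body mass, 体重标准差 = SD of body mass, 最大体重 = maximum body mass, 缺失数量 = missing count)
[1] 4202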
[1] 802
[1] 6300
[1] 2
样本量 平均体重 体重标准差 最大体重 缺失数量
1 344 4202 802 6300 2
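The one-row table above combines several statistics in a single summarise() call. A minimal sketch of the same idea follows, with English column names and a five-value hand-typed peng excerpt (both are assumptions, so the numbers differ from the output above).

```r
library(dplyr)

# Assumption: five hand-typed body masses, one of them missing.
peng <- data.frame(body_mass = c(3750, 3800, 3250, NA, 5300))

stats <- peng |>
  summarise(
    n         = n(),                          # rows, including the missing one
    mean_mass = mean(body_mass, na.rm = TRUE),
    sd_mass   = sd(body_mass, na.rm = TRUE),
    max_mass  = max(body_mass, na.rm = TRUE),
    n_missing = sum(is.na(body_mass))         # TRUE counts as 1
  )
```

Without na.rm = TRUE, a single missing value would turn the mean, standard deviation, and maximum into NA.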
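.by=: a newer syntax for grouped operations
dplyr 1.1.0 introduced the .by= argument, which sets the grouping directly inside summarise() or mutate(), with no need for group_by() and ungroup(). In the outputs, 样本量 is the sample size, 平均体重 the mean body mass, and 平均嘴长 the mean bill length: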
species 样本量 平均体重 平均嘴长
1 Adelie 152 3701 38.8
2 Gentoo 124 5076 47.5
3 Chinstrap 68 3733 48.8
species sex 样本量 平均体重
1 Adelie male 73 4043
2 Adelie female 73 3369
3 Gentoo female 58 4680
4 Gentoo male 61 5485
5 Chinstrap female 34 3527
6 Chinstrap male 34 3939
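Tip
.by= is more concise than group_by(), and the grouping applies only to that one call: the result comes back ungrouped, so no hidden grouping state is left behind to cause surprises later. This course recommends .by= throughout.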
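The difference between the two spellings can be checked directly. The sketch below assumes dplyr 1.1.0 or later and a hand-typed excerpt named peng (a hypothetical stand-in for penguins); is_grouped_df() is used only to inspect the result.

```r
library(dplyr)

# Assumption: hand-typed excerpt; swap in `penguins` for the full data.
peng <- data.frame(
  species   = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  body_mass = c(3750, 3800, 5300, 5200, 3650)
)

# .by=: per-call grouping; groups keep their order of first appearance
with_by <- peng |>
  summarise(n = n(), mean_mass = mean(body_mass), .by = species)

# group_by(): same numbers, but groups come back sorted
with_group <- peng |>
  group_by(species) |>
  summarise(n = n(), mean_mass = mean(body_mass))

# The "hidden state": mutate() after group_by() stays grouped
still_grouped <- peng |> group_by(species) |> mutate(m = mean(body_mass))
not_grouped   <- peng |> mutate(m = mean(body_mass), .by = species)

is_grouped_df(still_grouped)  # TRUE
is_grouped_df(not_grouped)    # FALSE
```

The first-appearance ordering explains why the grouped outputs above list Adelie, Gentoo, Chinstrap rather than alphabetical order.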
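.by= with mutate(): within-group calculations (组内均值 = group mean, 与均值之差 = difference from the group mean, 组内排名 = rank within group)
species body_mass 组内均值 与均值之差 组内排名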
1 Adelie 3750 3701 49.3 66.5
2 Adelie 3800 3701 99.3 59.5
3 Adelie 3250 3701 -450.7 125.5
4 Adelie NA 3701 NA 152.0
5 Adelie 3450 3701 -250.7 103.5
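Tip
The key difference between .by= with mutate() and .by= with summarise(): the former keeps every row and only adds columns of group-level statistics; the latter collapses the rows so that each group is left with a single summary row.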
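The contrast in the tip can be seen by running both verbs on the same data. In this sketch the English column names are stand-ins, rank(desc(body_mass)) is a guess at how the tied ranks in the output above were produced, and peng is a hand-typed excerpt.

```r
library(dplyr)

# Assumption: hand-typed excerpt with two species.
peng <- data.frame(
  species   = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"),
  body_mass = c(3750, 3800, 3250, 5300, 5200)
)

# mutate(.by=): every row survives and gains group-level columns
per_row <- peng |>
  mutate(
    group_mean    = mean(body_mass, na.rm = TRUE),
    diff          = body_mass - group_mean,
    rank_in_group = rank(desc(body_mass)),  # 1 = heaviest in its species
    .by = species
  )

# summarise(.by=): one row per group
per_group <- peng |>
  summarise(group_mean = mean(body_mass, na.rm = TRUE), .by = species)
```

per_row has the same five rows as peng, while per_group has two rows, one for each species.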
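# From raw data to a summary report in one pipeline
# (drop_na() comes from tidyr; this assumes library(tidyverse) is loaded)
penguins |>
  # drop rows missing body mass or sex, so mean() needs no na.rm
  drop_na(body_mass, sex) |>
  mutate(
    嘴峰长宽比 = round(bill_len / bill_dep, 2),            # bill length-to-depth ratio
    体重等级 = if_else(body_mass > 4200, "大型", "小型")   # "大型" = large, "小型" = small
  ) |>
  summarise(
    样本量 = n(),                                          # sample size
    平均体重 = round(mean(body_mass), 0),                  # mean body mass (g)
    平均长宽比 = round(mean(嘴峰长宽比), 2),               # mean bill ratio
    大型比例 = round(mean(体重等级 == "大型") * 100, 1),   # share of large penguins (%)
    .by = c(species, sex)
  ) |>
  arrange(species, sex)
species sex 样本量 平均体重 平均长宽比 大型比例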
1 Adelie female 73 3369 2.12 0.0
2 Adelie male 73 4043 2.12 32.9
3 Chinstrap female 34 3527 2.65 0.0
4 Chinstrap male 34 3939 2.66 20.6
5 Gentoo female 58 4680 3.20 91.4
6 Gentoo male 61 5485 3.15 100.0
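What the tidyverse is: a collection of packages that share one design philosophy; at its core is tidy data, where each column is a variable and each row is an observation; readable code is its central value
The pipe operator: |> passes the result on its left to the function on its right; code reads left to right and top to bottom, in the same order the operations run; keyboard shortcut Ctrl+Shift+M (in RStudio)
count(): the shortest way to tabulate frequencies; count(x) returns the number of rows in each category; add_count() adds the count as a column while keeping every original row; sort = TRUE orders the result by descending count
Row operations: filter() keeps rows that meet a condition and works with %in%, between(), and is.na(); arrange() sorts rows, with desc() for descending order; drop_na() drops rows with missing values; slice_max() and its siblings pick the rows with extreme values
Column operations: select() picks columns, with helpers such as starts_with(); mutate() creates new columns, using if_else() for two-way conditions and case_when() for multi-way branches; across() applies one operation to many columns
Summaries and grouping: summarise() collapses rows into summary rows; .by= groups inside the call itself, with no group_by() / ungroup(); mutate(.by=) adds group-level statistics as columns without reducing the row count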
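Using the penguins data, keep the penguins on Biscoe island that weigh more than 4500 g, sort them by body mass in descending order, and keep only four columns: species, island, body mass, and sex
Use mutate() to create new columns: (a) convert body mass from grams to pounds (1 g = 0.00220462 lb); (b) use case_when() to classify bill length as "short" (< 40 mm), "medium" (40 to 50 mm), or "long" (> 50 mm)
(… .by=), compute each group's sample size and the mean and standard deviation of body mass, rounded to one decimal place
For each species, find the 3 penguins with the highest bill length-to-depth ratio (bill_len / bill_dep) and output a table with species, island, sex, and the ratio (hint: use slice_max(.., by = species))
Use mutate(.by=) to compute each penguin's percentile rank of body mass within its species and sex (hint: use percent_rank()), then find the individuals in the top 10% of each group
Lecture 6: tidyverse data manipulation (Part 2)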
tidyr: reshaping data with pivot_longer() and pivot_wider()
left_join(), inner_join(), anti_join()
Core stringr functions
drop_na(), fill(), replace_na()
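Tip
Lecture 6 extends this one. After both lectures you should be able to handle about 80% of the data-cleaning tasks that come up in day-to-day analysis, however messy the data.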
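Lecture 5: tidyverse data manipulation (Part 1)
"Tidy data is the starting point of analysis; the tidyverse makes the work of tidying data tidy as well."
Data Mining and the R Language | Lecture 5: tidyverse data manipulation ~ 1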