Data Mining and the R Language

Lecture 5: Data Manipulation with tidyverse ~ Part 1

March 27, 2026

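Recap of the previous lecture

  • What Markdown is: a lightweight plain-text markup language; content is separated from formatting; write once, publish to many formats
  • Headings and paragraphs: the number of # signs sets the heading level; a new paragraph requires a completely blank line
  • Emphasis and code: **bold**, *italic*, `inline code`; triple backticks create a code block
  • Math: $...$ for inline formulas; $$...$$ for centered display formulas; LaTeX syntax
  • Tables and quotes: | and - draw tables; > starts a block quote; [^1] adds a footnote
  • R Markdown: a superset of Markdown; code chunks embed executable R code; inline code inserts computed results automatically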

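In this lecture

  • Part 1: What is tidyverse? Ecosystem overview and core ideas (about 10 minutes)
  • Part 2: The pipe operators: the idea behind |> and %>% (about 10 minutes)
  • Part 3: count(): the most direct way to tabulate frequencies (about 10 minutes)
  • Part 4: Row operations: filter() and arrange() (about 20 minutes)
  • Part 5: Column operations: select() and mutate() (about 20 minutes)
  • Part 6: Summarising and grouping: summarise() and .by = (about 15 minutes)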

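Tip

This lecture is the turning point of the course: from "writing code in R" to "doing data analysis in R". tidyverse provides a consistent, intuitive grammar that makes data manipulation read almost like plain speech.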

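Part 1 What is tidyverse?

Ecosystem overview and core ideas

1.1 tidyverse is a "collection of packages"

tidyverse is not one monolithic package but a collection of R packages that share a design philosophy; the tidyverse package itself simply installs and loads them together: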

install.packages("tidyverse")  # install once, get every member package
library(tidyverse)             # load once, attach all core packages

After loading, you will see:

── Attaching core tidyverse packages ──────────────────────
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
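✔ purrr     1.0.2

Package    Main use
readr      Reading CSV, TSV and other text data
dplyr      Data manipulation (filtering, transforming, summarising)
tidyr      Reshaping data (wide ↔ long)
ggplot2    Data visualization
stringr    String handling
lubridate  Dates and times
purrr      Functional programming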

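1.2 Tidy data: three principles

At the core of tidyverse is the concept of tidy data, proposed by Hadley Wickham:

The three principles of tidy data

  1. Each column is a variable
  2. Each row is an observation
  3. Each cell is a single value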

Common problems in untidy data

  • Column names are data (for example, years used as column names)
# A tibble: 3 × 4
  产品  `2019` `2020` `2021`
  <chr>  <dbl>  <dbl>  <dbl>
1 手机     100    120    150
2 电脑      80     90    110
3 平板      60     70     85
  • One column stores several variables
# A tibble: 3 × 3
  患者ID 姓名_年龄 诊断结果
   <int> <chr>     <chr>   
1      1 张三_25   高血压  
2      2 李四_32   糖尿病  
3      3 王五_28   感冒    
  • One variable is spread across several columns
# A tibble: 3 × 4
  学生ID 语文成绩 数学成绩 英语成绩
   <int>    <dbl>    <dbl>    <dbl>
1      1       85       90       88
2      2       78       88       82
3      3       92       85       89
  • One table stores several kinds of observational unit
# A tibble: 5 × 5
  员工姓名 工号  部门   部门人数 部门经理
  <chr>    <chr> <chr>     <dbl> <chr>   
1 张三     E001  销售部       NA <NA>    
2 李四     E002  技术部       NA <NA>    
3 王五     E003  销售部       NA <NA>    
4 销售部   <NA>  <NA>         15 赵经理  
5 技术部   <NA>  <NA>         25 钱经理  
  • Column names encode more than one variable (here, quarter and actual vs. budget)
# A tibble: 3 × 5
  指标   第一季度_实际 第一季度_预算 第二季度_实际 第二季度_预算
  <chr>          <dbl>         <dbl>         <dbl>         <dbl>
1 销售额           100            90           110           105
2 利润              20            18            25            22
3 客户数            50            55            60            58
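Two of the problems listed above have direct tidyr remedies. The sketch below is illustrative only: the English column names and values are hypothetical stand-ins that mirror the first two example tables, and it assumes tidyr 1.3.0 or later for separate_wider_delim().

```r
library(tibble)
library(tidyr)

# Problem: column names are data (years). Hypothetical stand-in for the first table.
sales_wide <- tribble(
  ~product, ~`2019`, ~`2020`, ~`2021`,
  "phone",      100,     120,     150,
  "laptop",      80,      90,     110,
  "tablet",      60,      70,      85
)

# Move the year labels into one `year` column: one row per product and year
sales_long <- sales_wide |>
  pivot_longer(
    cols            = -product,
    names_to        = "year",
    values_to       = "sales",
    names_transform = list(year = as.integer)
  )

# Problem: one column stores two variables (name and age)
patients <- tribble(
  ~patient_id, ~name_age,
  1L,          "Zhang_25",
  2L,          "Li_32",
  3L,          "Wang_28"
)

# Split on "_" into two columns; age is still text after the split
patients_tidy <- patients |>
  separate_wider_delim(name_age, delim = "_", names = c("name", "age"))
patients_tidy$age <- as.integer(patients_tidy$age)

sales_long
patients_tidy
```

After reshaping, every column holds exactly one variable, which is the form that filter(), summarise() and the other verbs in this lecture expect.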
# penguins is a textbook example of tidy data
glimpse(penguins)
Rows: 344
Columns: 8
$ species     <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Ad…
$ island      <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Tor…
$ bill_len    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, …
$ bill_dep    <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, …
$ flipper_len <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186, 180,…
$ body_mass   <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250, …
$ sex         <fct> male, female, female, NA, female, male, female, male, NA, …
$ year        <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
head(penguins)
  species    island bill_len bill_dep flipper_len body_mass    sex year
1  Adelie Torgersen     39.1     18.7         181      3750   male 2007
2  Adelie Torgersen     39.5     17.4         186      3800 female 2007
3  Adelie Torgersen     40.3     18.0         195      3250 female 2007
4  Adelie Torgersen       NA       NA          NA        NA   <NA> 2007
5  Adelie Torgersen     36.7     19.3         193      3450 female 2007
6  Adelie Torgersen     39.3     20.6         190      3650   male 2007

1.3 tidyverse versus base R

One task, two ways to write it:

base R: keep the Adelie penguins and compute their mean body mass

adelie <- penguins[penguins$species == "Adelie",]
mean(adelie$body_mass, na.rm = TRUE)
[1] 3701
# Or, as a single expression:
mean(penguins[penguins$species == "Adelie",]$body_mass, na.rm = TRUE)
[1] 3701

tidyverse: the same task

penguins |>
  filter(species == "Adelie") |>
  drop_na(body_mass) |> 
  summarise(均值 = mean(body_mass))
  均值
1 3701

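Important

tidyverse code reads like a plain-language description: "take the penguin data, keep the Adelie species, drop missing body masses, compute the mean body mass". What each step does is clear at a glance. That is its core value: readability.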
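To make the comparison concrete, here is a minimal sketch showing that the base R and tidyverse routes give the same number, and that drop_na() followed by mean() matches mean(na.rm = TRUE). The data frame `mini` is a hypothetical four-row stand-in that only borrows the column names of penguins, so the code runs without the real dataset.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical mini data set with the same column names as penguins
mini <- tibble(
  species   = c("Adelie", "Adelie", "Adelie", "Gentoo"),
  body_mass = c(3750L, 3800L, NA, 5000L)
)

# tidyverse, dropping incomplete rows first (as on the slide)
with_drop_na <- mini |>
  filter(species == "Adelie") |>
  drop_na(body_mass) |>
  summarise(mean_mass = mean(body_mass))

# tidyverse, letting mean() skip the missing values instead
with_na_rm <- mini |>
  filter(species == "Adelie") |>
  summarise(mean_mass = mean(body_mass, na.rm = TRUE))

# base R
base_r <- mean(mini$body_mass[mini$species == "Adelie"], na.rm = TRUE)

c(with_drop_na$mean_mass, with_na_rm$mean_mass, base_r)
```

Which of the two tidyverse variants to prefer is a matter of taste: drop_na() makes the removal an explicit step in the pipeline, while na.rm = TRUE keeps the row count unchanged for any later steps.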

1.4 tibble: the tidyverse data frame

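tidyverse uses the tibble in place of base R's data.frame: it prints more readably, shows the dimensions and column types, and truncates output that does not fit on screen. For contrast, the penguins object printed below is a plain data.frame, so all 344 rows are listed; as_tibble(penguins) would give the compact tibble display.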

penguins
      species    island bill_len bill_dep flipper_len body_mass    sex year
1      Adelie Torgersen     39.1     18.7         181      3750   male 2007
2      Adelie Torgersen     39.5     17.4         186      3800 female 2007
3      Adelie Torgersen     40.3     18.0         195      3250 female 2007
4      Adelie Torgersen       NA       NA          NA        NA   <NA> 2007
5      Adelie Torgersen     36.7     19.3         193      3450 female 2007
6      Adelie Torgersen     39.3     20.6         190      3650   male 2007
7      Adelie Torgersen     38.9     17.8         181      3625 female 2007
8      Adelie Torgersen     39.2     19.6         195      4675   male 2007
9      Adelie Torgersen     34.1     18.1         193      3475   <NA> 2007
10     Adelie Torgersen     42.0     20.2         190      4250   <NA> 2007
11     Adelie Torgersen     37.8     17.1         186      3300   <NA> 2007
12     Adelie Torgersen     37.8     17.3         180      3700   <NA> 2007
13     Adelie Torgersen     41.1     17.6         182      3200 female 2007
14     Adelie Torgersen     38.6     21.2         191      3800   male 2007
15     Adelie Torgersen     34.6     21.1         198      4400   male 2007
16     Adelie Torgersen     36.6     17.8         185      3700 female 2007
17     Adelie Torgersen     38.7     19.0         195      3450 female 2007
18     Adelie Torgersen     42.5     20.7         197      4500   male 2007
19     Adelie Torgersen     34.4     18.4         184      3325 female 2007
20     Adelie Torgersen     46.0     21.5         194      4200   male 2007
21     Adelie    Biscoe     37.8     18.3         174      3400 female 2007
22     Adelie    Biscoe     37.7     18.7         180      3600   male 2007
23     Adelie    Biscoe     35.9     19.2         189      3800 female 2007
24     Adelie    Biscoe     38.2     18.1         185      3950   male 2007
25     Adelie    Biscoe     38.8     17.2         180      3800   male 2007
26     Adelie    Biscoe     35.3     18.9         187      3800 female 2007
27     Adelie    Biscoe     40.6     18.6         183      3550   male 2007
28     Adelie    Biscoe     40.5     17.9         187      3200 female 2007
29     Adelie    Biscoe     37.9     18.6         172      3150 female 2007
30     Adelie    Biscoe     40.5     18.9         180      3950   male 2007
31     Adelie     Dream     39.5     16.7         178      3250 female 2007
32     Adelie     Dream     37.2     18.1         178      3900   male 2007
33     Adelie     Dream     39.5     17.8         188      3300 female 2007
34     Adelie     Dream     40.9     18.9         184      3900   male 2007
35     Adelie     Dream     36.4     17.0         195      3325 female 2007
36     Adelie     Dream     39.2     21.1         196      4150   male 2007
37     Adelie     Dream     38.8     20.0         190      3950   male 2007
38     Adelie     Dream     42.2     18.5         180      3550 female 2007
39     Adelie     Dream     37.6     19.3         181      3300 female 2007
40     Adelie     Dream     39.8     19.1         184      4650   male 2007
41     Adelie     Dream     36.5     18.0         182      3150 female 2007
42     Adelie     Dream     40.8     18.4         195      3900   male 2007
43     Adelie     Dream     36.0     18.5         186      3100 female 2007
44     Adelie     Dream     44.1     19.7         196      4400   male 2007
45     Adelie     Dream     37.0     16.9         185      3000 female 2007
46     Adelie     Dream     39.6     18.8         190      4600   male 2007
47     Adelie     Dream     41.1     19.0         182      3425   male 2007
48     Adelie     Dream     37.5     18.9         179      2975   <NA> 2007
49     Adelie     Dream     36.0     17.9         190      3450 female 2007
50     Adelie     Dream     42.3     21.2         191      4150   male 2007
51     Adelie    Biscoe     39.6     17.7         186      3500 female 2008
52     Adelie    Biscoe     40.1     18.9         188      4300   male 2008
53     Adelie    Biscoe     35.0     17.9         190      3450 female 2008
54     Adelie    Biscoe     42.0     19.5         200      4050   male 2008
55     Adelie    Biscoe     34.5     18.1         187      2900 female 2008
56     Adelie    Biscoe     41.4     18.6         191      3700   male 2008
57     Adelie    Biscoe     39.0     17.5         186      3550 female 2008
58     Adelie    Biscoe     40.6     18.8         193      3800   male 2008
59     Adelie    Biscoe     36.5     16.6         181      2850 female 2008
60     Adelie    Biscoe     37.6     19.1         194      3750   male 2008
61     Adelie    Biscoe     35.7     16.9         185      3150 female 2008
62     Adelie    Biscoe     41.3     21.1         195      4400   male 2008
63     Adelie    Biscoe     37.6     17.0         185      3600 female 2008
64     Adelie    Biscoe     41.1     18.2         192      4050   male 2008
65     Adelie    Biscoe     36.4     17.1         184      2850 female 2008
66     Adelie    Biscoe     41.6     18.0         192      3950   male 2008
67     Adelie    Biscoe     35.5     16.2         195      3350 female 2008
68     Adelie    Biscoe     41.1     19.1         188      4100   male 2008
69     Adelie Torgersen     35.9     16.6         190      3050 female 2008
70     Adelie Torgersen     41.8     19.4         198      4450   male 2008
71     Adelie Torgersen     33.5     19.0         190      3600 female 2008
72     Adelie Torgersen     39.7     18.4         190      3900   male 2008
73     Adelie Torgersen     39.6     17.2         196      3550 female 2008
74     Adelie Torgersen     45.8     18.9         197      4150   male 2008
75     Adelie Torgersen     35.5     17.5         190      3700 female 2008
76     Adelie Torgersen     42.8     18.5         195      4250   male 2008
77     Adelie Torgersen     40.9     16.8         191      3700 female 2008
78     Adelie Torgersen     37.2     19.4         184      3900   male 2008
79     Adelie Torgersen     36.2     16.1         187      3550 female 2008
80     Adelie Torgersen     42.1     19.1         195      4000   male 2008
81     Adelie Torgersen     34.6     17.2         189      3200 female 2008
82     Adelie Torgersen     42.9     17.6         196      4700   male 2008
83     Adelie Torgersen     36.7     18.8         187      3800 female 2008
84     Adelie Torgersen     35.1     19.4         193      4200   male 2008
85     Adelie     Dream     37.3     17.8         191      3350 female 2008
86     Adelie     Dream     41.3     20.3         194      3550   male 2008
87     Adelie     Dream     36.3     19.5         190      3800   male 2008
88     Adelie     Dream     36.9     18.6         189      3500 female 2008
89     Adelie     Dream     38.3     19.2         189      3950   male 2008
90     Adelie     Dream     38.9     18.8         190      3600 female 2008
91     Adelie     Dream     35.7     18.0         202      3550 female 2008
92     Adelie     Dream     41.1     18.1         205      4300   male 2008
93     Adelie     Dream     34.0     17.1         185      3400 female 2008
94     Adelie     Dream     39.6     18.1         186      4450   male 2008
95     Adelie     Dream     36.2     17.3         187      3300 female 2008
96     Adelie     Dream     40.8     18.9         208      4300   male 2008
97     Adelie     Dream     38.1     18.6         190      3700 female 2008
98     Adelie     Dream     40.3     18.5         196      4350   male 2008
99     Adelie     Dream     33.1     16.1         178      2900 female 2008
100    Adelie     Dream     43.2     18.5         192      4100   male 2008
101    Adelie    Biscoe     35.0     17.9         192      3725 female 2009
102    Adelie    Biscoe     41.0     20.0         203      4725   male 2009
103    Adelie    Biscoe     37.7     16.0         183      3075 female 2009
104    Adelie    Biscoe     37.8     20.0         190      4250   male 2009
105    Adelie    Biscoe     37.9     18.6         193      2925 female 2009
106    Adelie    Biscoe     39.7     18.9         184      3550   male 2009
107    Adelie    Biscoe     38.6     17.2         199      3750 female 2009
108    Adelie    Biscoe     38.2     20.0         190      3900   male 2009
109    Adelie    Biscoe     38.1     17.0         181      3175 female 2009
110    Adelie    Biscoe     43.2     19.0         197      4775   male 2009
111    Adelie    Biscoe     38.1     16.5         198      3825 female 2009
112    Adelie    Biscoe     45.6     20.3         191      4600   male 2009
113    Adelie    Biscoe     39.7     17.7         193      3200 female 2009
114    Adelie    Biscoe     42.2     19.5         197      4275   male 2009
115    Adelie    Biscoe     39.6     20.7         191      3900 female 2009
116    Adelie    Biscoe     42.7     18.3         196      4075   male 2009
117    Adelie Torgersen     38.6     17.0         188      2900 female 2009
118    Adelie Torgersen     37.3     20.5         199      3775   male 2009
119    Adelie Torgersen     35.7     17.0         189      3350 female 2009
120    Adelie Torgersen     41.1     18.6         189      3325   male 2009
121    Adelie Torgersen     36.2     17.2         187      3150 female 2009
122    Adelie Torgersen     37.7     19.8         198      3500   male 2009
123    Adelie Torgersen     40.2     17.0         176      3450 female 2009
124    Adelie Torgersen     41.4     18.5         202      3875   male 2009
125    Adelie Torgersen     35.2     15.9         186      3050 female 2009
126    Adelie Torgersen     40.6     19.0         199      4000   male 2009
127    Adelie Torgersen     38.8     17.6         191      3275 female 2009
128    Adelie Torgersen     41.5     18.3         195      4300   male 2009
129    Adelie Torgersen     39.0     17.1         191      3050 female 2009
130    Adelie Torgersen     44.1     18.0         210      4000   male 2009
131    Adelie Torgersen     38.5     17.9         190      3325 female 2009
132    Adelie Torgersen     43.1     19.2         197      3500   male 2009
133    Adelie     Dream     36.8     18.5         193      3500 female 2009
134    Adelie     Dream     37.5     18.5         199      4475   male 2009
135    Adelie     Dream     38.1     17.6         187      3425 female 2009
136    Adelie     Dream     41.1     17.5         190      3900   male 2009
137    Adelie     Dream     35.6     17.5         191      3175 female 2009
138    Adelie     Dream     40.2     20.1         200      3975   male 2009
139    Adelie     Dream     37.0     16.5         185      3400 female 2009
140    Adelie     Dream     39.7     17.9         193      4250   male 2009
141    Adelie     Dream     40.2     17.1         193      3400 female 2009
142    Adelie     Dream     40.6     17.2         187      3475   male 2009
143    Adelie     Dream     32.1     15.5         188      3050 female 2009
144    Adelie     Dream     40.7     17.0         190      3725   male 2009
145    Adelie     Dream     37.3     16.8         192      3000 female 2009
146    Adelie     Dream     39.0     18.7         185      3650   male 2009
147    Adelie     Dream     39.2     18.6         190      4250   male 2009
148    Adelie     Dream     36.6     18.4         184      3475 female 2009
149    Adelie     Dream     36.0     17.8         195      3450 female 2009
150    Adelie     Dream     37.8     18.1         193      3750   male 2009
151    Adelie     Dream     36.0     17.1         187      3700 female 2009
152    Adelie     Dream     41.5     18.5         201      4000   male 2009
153    Gentoo    Biscoe     46.1     13.2         211      4500 female 2007
154    Gentoo    Biscoe     50.0     16.3         230      5700   male 2007
155    Gentoo    Biscoe     48.7     14.1         210      4450 female 2007
156    Gentoo    Biscoe     50.0     15.2         218      5700   male 2007
157    Gentoo    Biscoe     47.6     14.5         215      5400   male 2007
158    Gentoo    Biscoe     46.5     13.5         210      4550 female 2007
159    Gentoo    Biscoe     45.4     14.6         211      4800 female 2007
160    Gentoo    Biscoe     46.7     15.3         219      5200   male 2007
161    Gentoo    Biscoe     43.3     13.4         209      4400 female 2007
162    Gentoo    Biscoe     46.8     15.4         215      5150   male 2007
163    Gentoo    Biscoe     40.9     13.7         214      4650 female 2007
164    Gentoo    Biscoe     49.0     16.1         216      5550   male 2007
165    Gentoo    Biscoe     45.5     13.7         214      4650 female 2007
166    Gentoo    Biscoe     48.4     14.6         213      5850   male 2007
167    Gentoo    Biscoe     45.8     14.6         210      4200 female 2007
168    Gentoo    Biscoe     49.3     15.7         217      5850   male 2007
169    Gentoo    Biscoe     42.0     13.5         210      4150 female 2007
170    Gentoo    Biscoe     49.2     15.2         221      6300   male 2007
171    Gentoo    Biscoe     46.2     14.5         209      4800 female 2007
172    Gentoo    Biscoe     48.7     15.1         222      5350   male 2007
173    Gentoo    Biscoe     50.2     14.3         218      5700   male 2007
174    Gentoo    Biscoe     45.1     14.5         215      5000 female 2007
175    Gentoo    Biscoe     46.5     14.5         213      4400 female 2007
176    Gentoo    Biscoe     46.3     15.8         215      5050   male 2007
177    Gentoo    Biscoe     42.9     13.1         215      5000 female 2007
178    Gentoo    Biscoe     46.1     15.1         215      5100   male 2007
179    Gentoo    Biscoe     44.5     14.3         216      4100   <NA> 2007
180    Gentoo    Biscoe     47.8     15.0         215      5650   male 2007
181    Gentoo    Biscoe     48.2     14.3         210      4600 female 2007
182    Gentoo    Biscoe     50.0     15.3         220      5550   male 2007
183    Gentoo    Biscoe     47.3     15.3         222      5250   male 2007
184    Gentoo    Biscoe     42.8     14.2         209      4700 female 2007
185    Gentoo    Biscoe     45.1     14.5         207      5050 female 2007
186    Gentoo    Biscoe     59.6     17.0         230      6050   male 2007
187    Gentoo    Biscoe     49.1     14.8         220      5150 female 2008
188    Gentoo    Biscoe     48.4     16.3         220      5400   male 2008
189    Gentoo    Biscoe     42.6     13.7         213      4950 female 2008
190    Gentoo    Biscoe     44.4     17.3         219      5250   male 2008
191    Gentoo    Biscoe     44.0     13.6         208      4350 female 2008
192    Gentoo    Biscoe     48.7     15.7         208      5350   male 2008
193    Gentoo    Biscoe     42.7     13.7         208      3950 female 2008
194    Gentoo    Biscoe     49.6     16.0         225      5700   male 2008
195    Gentoo    Biscoe     45.3     13.7         210      4300 female 2008
196    Gentoo    Biscoe     49.6     15.0         216      4750   male 2008
197    Gentoo    Biscoe     50.5     15.9         222      5550   male 2008
198    Gentoo    Biscoe     43.6     13.9         217      4900 female 2008
199    Gentoo    Biscoe     45.5     13.9         210      4200 female 2008
200    Gentoo    Biscoe     50.5     15.9         225      5400   male 2008
201    Gentoo    Biscoe     44.9     13.3         213      5100 female 2008
202    Gentoo    Biscoe     45.2     15.8         215      5300   male 2008
203    Gentoo    Biscoe     46.6     14.2         210      4850 female 2008
204    Gentoo    Biscoe     48.5     14.1         220      5300   male 2008
205    Gentoo    Biscoe     45.1     14.4         210      4400 female 2008
206    Gentoo    Biscoe     50.1     15.0         225      5000   male 2008
207    Gentoo    Biscoe     46.5     14.4         217      4900 female 2008
208    Gentoo    Biscoe     45.0     15.4         220      5050   male 2008
209    Gentoo    Biscoe     43.8     13.9         208      4300 female 2008
210    Gentoo    Biscoe     45.5     15.0         220      5000   male 2008
211    Gentoo    Biscoe     43.2     14.5         208      4450 female 2008
212    Gentoo    Biscoe     50.4     15.3         224      5550   male 2008
213    Gentoo    Biscoe     45.3     13.8         208      4200 female 2008
214    Gentoo    Biscoe     46.2     14.9         221      5300   male 2008
215    Gentoo    Biscoe     45.7     13.9         214      4400 female 2008
216    Gentoo    Biscoe     54.3     15.7         231      5650   male 2008
217    Gentoo    Biscoe     45.8     14.2         219      4700 female 2008
218    Gentoo    Biscoe     49.8     16.8         230      5700   male 2008
219    Gentoo    Biscoe     46.2     14.4         214      4650   <NA> 2008
220    Gentoo    Biscoe     49.5     16.2         229      5800   male 2008
221    Gentoo    Biscoe     43.5     14.2         220      4700 female 2008
222    Gentoo    Biscoe     50.7     15.0         223      5550   male 2008
223    Gentoo    Biscoe     47.7     15.0         216      4750 female 2008
224    Gentoo    Biscoe     46.4     15.6         221      5000   male 2008
225    Gentoo    Biscoe     48.2     15.6         221      5100   male 2008
226    Gentoo    Biscoe     46.5     14.8         217      5200 female 2008
227    Gentoo    Biscoe     46.4     15.0         216      4700 female 2008
228    Gentoo    Biscoe     48.6     16.0         230      5800   male 2008
229    Gentoo    Biscoe     47.5     14.2         209      4600 female 2008
230    Gentoo    Biscoe     51.1     16.3         220      6000   male 2008
231    Gentoo    Biscoe     45.2     13.8         215      4750 female 2008
232    Gentoo    Biscoe     45.2     16.4         223      5950   male 2008
233    Gentoo    Biscoe     49.1     14.5         212      4625 female 2009
234    Gentoo    Biscoe     52.5     15.6         221      5450   male 2009
235    Gentoo    Biscoe     47.4     14.6         212      4725 female 2009
236    Gentoo    Biscoe     50.0     15.9         224      5350   male 2009
237    Gentoo    Biscoe     44.9     13.8         212      4750 female 2009
238    Gentoo    Biscoe     50.8     17.3         228      5600   male 2009
239    Gentoo    Biscoe     43.4     14.4         218      4600 female 2009
240    Gentoo    Biscoe     51.3     14.2         218      5300   male 2009
241    Gentoo    Biscoe     47.5     14.0         212      4875 female 2009
242    Gentoo    Biscoe     52.1     17.0         230      5550   male 2009
243    Gentoo    Biscoe     47.5     15.0         218      4950 female 2009
244    Gentoo    Biscoe     52.2     17.1         228      5400   male 2009
245    Gentoo    Biscoe     45.5     14.5         212      4750 female 2009
246    Gentoo    Biscoe     49.5     16.1         224      5650   male 2009
247    Gentoo    Biscoe     44.5     14.7         214      4850 female 2009
248    Gentoo    Biscoe     50.8     15.7         226      5200   male 2009
249    Gentoo    Biscoe     49.4     15.8         216      4925   male 2009
250    Gentoo    Biscoe     46.9     14.6         222      4875 female 2009
251    Gentoo    Biscoe     48.4     14.4         203      4625 female 2009
252    Gentoo    Biscoe     51.1     16.5         225      5250   male 2009
253    Gentoo    Biscoe     48.5     15.0         219      4850 female 2009
254    Gentoo    Biscoe     55.9     17.0         228      5600   male 2009
255    Gentoo    Biscoe     47.2     15.5         215      4975 female 2009
256    Gentoo    Biscoe     49.1     15.0         228      5500   male 2009
257    Gentoo    Biscoe     47.3     13.8         216      4725   <NA> 2009
258    Gentoo    Biscoe     46.8     16.1         215      5500   male 2009
259    Gentoo    Biscoe     41.7     14.7         210      4700 female 2009
260    Gentoo    Biscoe     53.4     15.8         219      5500   male 2009
261    Gentoo    Biscoe     43.3     14.0         208      4575 female 2009
262    Gentoo    Biscoe     48.1     15.1         209      5500   male 2009
263    Gentoo    Biscoe     50.5     15.2         216      5000 female 2009
264    Gentoo    Biscoe     49.8     15.9         229      5950   male 2009
265    Gentoo    Biscoe     43.5     15.2         213      4650 female 2009
266    Gentoo    Biscoe     51.5     16.3         230      5500   male 2009
267    Gentoo    Biscoe     46.2     14.1         217      4375 female 2009
268    Gentoo    Biscoe     55.1     16.0         230      5850   male 2009
269    Gentoo    Biscoe     44.5     15.7         217      4875   <NA> 2009
270    Gentoo    Biscoe     48.8     16.2         222      6000   male 2009
271    Gentoo    Biscoe     47.2     13.7         214      4925 female 2009
272    Gentoo    Biscoe       NA       NA          NA        NA   <NA> 2009
273    Gentoo    Biscoe     46.8     14.3         215      4850 female 2009
274    Gentoo    Biscoe     50.4     15.7         222      5750   male 2009
275    Gentoo    Biscoe     45.2     14.8         212      5200 female 2009
276    Gentoo    Biscoe     49.9     16.1         213      5400   male 2009
277 Chinstrap     Dream     46.5     17.9         192      3500 female 2007
278 Chinstrap     Dream     50.0     19.5         196      3900   male 2007
279 Chinstrap     Dream     51.3     19.2         193      3650   male 2007
280 Chinstrap     Dream     45.4     18.7         188      3525 female 2007
281 Chinstrap     Dream     52.7     19.8         197      3725   male 2007
282 Chinstrap     Dream     45.2     17.8         198      3950 female 2007
283 Chinstrap     Dream     46.1     18.2         178      3250 female 2007
284 Chinstrap     Dream     51.3     18.2         197      3750   male 2007
285 Chinstrap     Dream     46.0     18.9         195      4150 female 2007
286 Chinstrap     Dream     51.3     19.9         198      3700   male 2007
287 Chinstrap     Dream     46.6     17.8         193      3800 female 2007
288 Chinstrap     Dream     51.7     20.3         194      3775   male 2007
289 Chinstrap     Dream     47.0     17.3         185      3700 female 2007
290 Chinstrap     Dream     52.0     18.1         201      4050   male 2007
291 Chinstrap     Dream     45.9     17.1         190      3575 female 2007
292 Chinstrap     Dream     50.5     19.6         201      4050   male 2007
293 Chinstrap     Dream     50.3     20.0         197      3300   male 2007
294 Chinstrap     Dream     58.0     17.8         181      3700 female 2007
295 Chinstrap     Dream     46.4     18.6         190      3450 female 2007
296 Chinstrap     Dream     49.2     18.2         195      4400   male 2007
297 Chinstrap     Dream     42.4     17.3         181      3600 female 2007
298 Chinstrap     Dream     48.5     17.5         191      3400   male 2007
299 Chinstrap     Dream     43.2     16.6         187      2900 female 2007
300 Chinstrap     Dream     50.6     19.4         193      3800   male 2007
301 Chinstrap     Dream     46.7     17.9         195      3300 female 2007
302 Chinstrap     Dream     52.0     19.0         197      4150   male 2007
303 Chinstrap     Dream     50.5     18.4         200      3400 female 2008
304 Chinstrap     Dream     49.5     19.0         200      3800   male 2008
305 Chinstrap     Dream     46.4     17.8         191      3700 female 2008
306 Chinstrap     Dream     52.8     20.0         205      4550   male 2008
307 Chinstrap     Dream     40.9     16.6         187      3200 female 2008
308 Chinstrap     Dream     54.2     20.8         201      4300   male 2008
309 Chinstrap     Dream     42.5     16.7         187      3350 female 2008
310 Chinstrap     Dream     51.0     18.8         203      4100   male 2008
311 Chinstrap     Dream     49.7     18.6         195      3600   male 2008
312 Chinstrap     Dream     47.5     16.8         199      3900 female 2008
313 Chinstrap     Dream     47.6     18.3         195      3850 female 2008
314 Chinstrap     Dream     52.0     20.7         210      4800   male 2008
315 Chinstrap     Dream     46.9     16.6         192      2700 female 2008
316 Chinstrap     Dream     53.5     19.9         205      4500   male 2008
317 Chinstrap     Dream     49.0     19.5         210      3950   male 2008
318 Chinstrap     Dream     46.2     17.5         187      3650 female 2008
319 Chinstrap     Dream     50.9     19.1         196      3550   male 2008
320 Chinstrap     Dream     45.5     17.0         196      3500 female 2008
321 Chinstrap     Dream     50.9     17.9         196      3675 female 2009
322 Chinstrap     Dream     50.8     18.5         201      4450   male 2009
323 Chinstrap     Dream     50.1     17.9         190      3400 female 2009
324 Chinstrap     Dream     49.0     19.6         212      4300   male 2009
325 Chinstrap     Dream     51.5     18.7         187      3250   male 2009
326 Chinstrap     Dream     49.8     17.3         198      3675 female 2009
327 Chinstrap     Dream     48.1     16.4         199      3325 female 2009
328 Chinstrap     Dream     51.4     19.0         201      3950   male 2009
329 Chinstrap     Dream     45.7     17.3         193      3600 female 2009
330 Chinstrap     Dream     50.7     19.7         203      4050   male 2009
331 Chinstrap     Dream     42.5     17.3         187      3350 female 2009
332 Chinstrap     Dream     52.2     18.8         197      3450   male 2009
333 Chinstrap     Dream     45.2     16.6         191      3250 female 2009
334 Chinstrap     Dream     49.3     19.9         203      4050   male 2009
335 Chinstrap     Dream     50.2     18.8         202      3800   male 2009
336 Chinstrap     Dream     45.6     19.4         194      3525 female 2009
337 Chinstrap     Dream     51.9     19.5         206      3950   male 2009
338 Chinstrap     Dream     46.8     16.5         189      3650 female 2009
339 Chinstrap     Dream     45.7     17.0         195      3650 female 2009
340 Chinstrap     Dream     55.8     19.8         207      4000   male 2009
341 Chinstrap     Dream     43.5     18.1         202      3400 female 2009
342 Chinstrap     Dream     49.6     18.2         193      3775   male 2009
343 Chinstrap     Dream     50.8     19.0         210      4100   male 2009
344 Chinstrap     Dream     50.2     18.7         198      3775 female 2009
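The long listing above is what a plain data.frame prints. The following sketch shows how converting to a tibble changes this; it uses iris, which ships with every R version, as a stand-in, on the assumption that the same as_tibble() call applies to penguins.

```r
library(tibble)

# iris is a plain data.frame with 150 rows; printing it would list every row
class(iris)

# Converting to a tibble changes the class, and therefore how it prints
iris_tbl <- as_tibble(iris)
class(iris_tbl)

# A tibble prints a header with its dimensions and the column types,
# followed by only the first rows
print(iris_tbl)
```

The data themselves are unchanged by the conversion; only the class, and with it the print method, differs.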

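Note

penguins is the built-in dataset used throughout this lecture. With the column names shown above (bill_len, body_mass, ...), it is the version shipped in R's datasets package since R 4.5.0, which was adapted from the palmerpenguins package. It has 344 rows and 8 columns and records body measurements for three penguin species on three Antarctic islands.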

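Part 2 The pipe operators

The idea behind |> and %>%

2.1 Code without a pipe

Suppose we want to run three steps on the penguin data:

  1. Keep the Gentoo penguins
  2. Remove missing body masses
  3. Compute the mean body mass

Nested version (reads inside-out, hard to follow)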

mean(
  subset(
    penguins[penguins$species == "Gentoo", ],
    !is.na(body_mass)
  )$body_mass
)
[1] 5076

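Warning

This code has to be read from the innermost call outward, so the order of the logic is the exact reverse of the order of the code. Imagine how hard it would be to read with ten steps...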

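2.2 The idea of the pipe: left to right, one step at a time

The pipe operator |> (native since R 4.1) or %>% (from the magrittr package) means:

The result on the left is passed as the first argument to the function on the right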

# The same three steps, written with the pipe:
penguins |>
  filter(species == "Gentoo") |>
  drop_na(body_mass) |>
  summarise(平均体重 = mean(body_mass))
  平均体重
1     5076
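The "first argument" rule can be checked with base R alone. This small sketch uses a made-up numeric vector and needs only R 4.1 or later; nothing in it comes from the penguins data.

```r
# x |> f(y) is rewritten by the parser into f(x, y)
x <- c(4, 9, 16)

nested <- sum(sqrt(x))          # read inside-out
piped  <- x |> sqrt() |> sum()  # read left to right

identical(nested, piped)

# Extra arguments stay where they are; the left-hand side becomes the first one
rounded <- c(3.14159, 2.71828) |> round(2)
all.equal(rounded, round(c(3.14159, 2.71828), 2))

# The rewrite can be seen directly by quoting a piped expression
identical(quote(x |> sqrt()), quote(sqrt(x)))
```

Because the native pipe is resolved when the code is parsed, it adds no function call of its own at run time.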

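Tip

How to read it: take penguins, then keep Gentoo, then drop the missing values, then compute the mean.

The order in which the code is read matches the order in which the operations run. That is exactly the value of the pipe.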

2.3 Differences between |> and %>%

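# |> is the native pipe in R 4.1+ (recommended; no package needs to be loaded)
penguins |> filter(species == "Adelie")

# %>% is the magrittr pipe (attached by tidyverse; it has been around longer)
penguins %>% filter(species == "Adelie")

Feature                  |>                         %>%
Source                   Native, R 4.1+             magrittr package
Needs a package loaded?  No                         Yes (included in tidyverse)
Placeholder              _ (R 4.2+)                 .
Speed                    Slightly faster            Slightly slower
Recommendation           ✅ Preferred for new code  ✅ Compatible with old code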

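Note

The two are interchangeable in almost all everyday code. This course uses |> throughout. Keyboard shortcut: Ctrl + Shift + M (in RStudio, the code options let you switch the shortcut to insert |>).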
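One case where the two pipes are not written identically is the placeholder, needed when the piped value must go somewhere other than the first argument. A minimal sketch, assuming R 4.2 or later and an installed magrittr, with the built-in mtcars data as a stand-in:

```r
library(magrittr)

# lm() takes the formula first, so the data must go to the `data` argument.
# Native pipe: the `_` placeholder must be passed to a named argument (R 4.2+)
fit_native <- mtcars |> lm(mpg ~ wt, data = _)

# magrittr pipe: the `.` placeholder
fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)

# Both calls fit the same model
all.equal(coef(fit_native), coef(fit_magrittr))
```

With dplyr verbs the data frame is always the first argument, which is why the placeholder rarely appears in this lecture.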

2.4 The pipe in practice

# Chaining several steps: mean body mass of each species on each island
penguins |>
  drop_na(body_mass) |>
  summarise(
    平均体重 = round(mean(body_mass), 0),
    样本量   = n(),
    .by  = c(species, island)
  ) |>
  arrange(species, 平均体重)
    species    island 平均体重 样本量
1    Adelie     Dream     3688     56
2    Adelie Torgersen     3706     51
3    Adelie    Biscoe     3710     44
4 Chinstrap     Dream     3733     68
5    Gentoo    Biscoe     5076    123

The same computation written as nested calls:

arrange(
  summarise(
    drop_na(penguins, body_mass),
    平均体重 = round(mean(body_mass), 0),
    样本量   = n(),
    .by = c(species, island)
  ),
  species, 平均体重
)

And written step by step with intermediate variables:

# Step 1: drop rows with a missing body mass
penguins_clean <- drop_na(penguins, body_mass)

# Step 2: summarise by species and island
penguins_summary <- summarise(
  penguins_clean,
  平均体重 = round(mean(body_mass), 0),
  样本量   = n(),
  .by = c(species, island)
)

# Step 3: sort
penguins_result <- arrange(penguins_summary, species, 平均体重)

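Part 3 count(): quick frequency counts

The first step in exploring how the data are distributed

3.1 count(): the most direct frequency count

count() is often the first thing to run on a new dataset: it tells you how many rows fall into each category.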

# First, take a quick look at the structure of the data
glimpse(penguins)
Rows: 344
Columns: 8
$ species     <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Ad…
$ island      <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Tor…
$ bill_len    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, …
$ bill_dep    <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, …
$ flipper_len <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186, 180,…
$ body_mass   <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250, …
$ sex         <fct> male, female, female, NA, female, male, female, male, NA, …
$ year        <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

# Simplest usage: count the rows for each species
penguins |> count(species)
    species   n
1    Adelie 152
2 Chinstrap  68
3    Gentoo 124

# sort = TRUE: order by count, descending
penguins |> count(species, sort = TRUE)
    species   n
1    Adelie 152
2    Gentoo 124
3 Chinstrap  68

Note

count(x) is equivalent to group_by(x) |> summarise(n = n()), only shorter. The count column is named n by default; use the name = argument to change it.
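
The equivalence stated above can be confirmed on a small example. This is a sketch on a hypothetical toy table (`pets` and its column `kind` are made up for illustration), assuming dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data, not the penguins dataset
pets <- tibble(kind = c("cat", "dog", "cat", "bird", "dog", "cat"))

short <- pets |> count(kind)
long  <- pets |> group_by(kind) |> summarise(n = n())

# Same groups, same counts
print(identical(short$kind, long$kind))   # TRUE
print(identical(short$n, long$n))         # TRUE

# name = renames the count column
renamed <- pets |> count(kind, name = "total")
print(names(renamed))                     # "kind" "total"
```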

3.2 Counting several variables and add_count()

# Count combinations of several variables
penguins |>
  count(species, island, sort = TRUE)
    species    island   n
1    Gentoo    Biscoe 124
2 Chinstrap     Dream  68
3    Adelie     Dream  56
4    Adelie Torgersen  52
5    Adelie    Biscoe  44

# add_count(): keeps every original row and only adds a count column
penguins |>
  add_count(species, name = "物种总数") |>
  select(species, island, 物种总数) |>
  head(5)
  species    island 物种总数
1  Adelie Torgersen      152
2  Adelie Torgersen      152
3  Adelie Torgersen      152
4  Adelie Torgersen      152
5  Adelie Torgersen      152

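Tip

count() collapses the data to one row per group; add_count() keeps every row and only adds a new column. They suit different situations, and the comparison comes up again with summarise().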
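
One situation where keeping every row matters is filtering by group size. The following sketch, on a hypothetical `sightings` table (all names and values are made up), keeps only the observations that belong to groups seen at least twice; it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data
sightings <- tibble(
  species = c("A", "A", "A", "B", "C", "C"),
  mass    = c(10, 12, 11, 30, 20, 22)
)

# add_count() attaches the group size to every row,
# so the individual observations survive the filter.
frequent <- sightings |>
  add_count(species, name = "n_species") |>
  filter(n_species >= 2)

print(frequent)
```

With count() the individual mass values would already be gone by the time the filter runs, which is why add_count() is the better fit here.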

Part 4 Row operations

filter() and arrange()

4.1 filter(): keep rows that match a condition

# Basic syntax
filter(data, condition1, condition2, ...)
# Or with the pipe
data |> filter(condition1, condition2)

# Single condition
penguins |> filter(species == "Adelie")
    species    island bill_len bill_dep flipper_len body_mass    sex year
1    Adelie Torgersen     39.1     18.7         181      3750   male 2007
2    Adelie Torgersen     39.5     17.4         186      3800 female 2007
3    Adelie Torgersen     40.3     18.0         195      3250 female 2007
4    Adelie Torgersen       NA       NA          NA        NA   <NA> 2007
5    Adelie Torgersen     36.7     19.3         193      3450 female 2007
6    Adelie Torgersen     39.3     20.6         190      3650   male 2007
7    Adelie Torgersen     38.9     17.8         181      3625 female 2007
8    Adelie Torgersen     39.2     19.6         195      4675   male 2007
9    Adelie Torgersen     34.1     18.1         193      3475   <NA> 2007
10   Adelie Torgersen     42.0     20.2         190      4250   <NA> 2007
11   Adelie Torgersen     37.8     17.1         186      3300   <NA> 2007
12   Adelie Torgersen     37.8     17.3         180      3700   <NA> 2007
13   Adelie Torgersen     41.1     17.6         182      3200 female 2007
14   Adelie Torgersen     38.6     21.2         191      3800   male 2007
15   Adelie Torgersen     34.6     21.1         198      4400   male 2007
16   Adelie Torgersen     36.6     17.8         185      3700 female 2007
17   Adelie Torgersen     38.7     19.0         195      3450 female 2007
18   Adelie Torgersen     42.5     20.7         197      4500   male 2007
19   Adelie Torgersen     34.4     18.4         184      3325 female 2007
20   Adelie Torgersen     46.0     21.5         194      4200   male 2007
21   Adelie    Biscoe     37.8     18.3         174      3400 female 2007
22   Adelie    Biscoe     37.7     18.7         180      3600   male 2007
23   Adelie    Biscoe     35.9     19.2         189      3800 female 2007
24   Adelie    Biscoe     38.2     18.1         185      3950   male 2007
25   Adelie    Biscoe     38.8     17.2         180      3800   male 2007
26   Adelie    Biscoe     35.3     18.9         187      3800 female 2007
27   Adelie    Biscoe     40.6     18.6         183      3550   male 2007
28   Adelie    Biscoe     40.5     17.9         187      3200 female 2007
29   Adelie    Biscoe     37.9     18.6         172      3150 female 2007
30   Adelie    Biscoe     40.5     18.9         180      3950   male 2007
31   Adelie     Dream     39.5     16.7         178      3250 female 2007
32   Adelie     Dream     37.2     18.1         178      3900   male 2007
33   Adelie     Dream     39.5     17.8         188      3300 female 2007
34   Adelie     Dream     40.9     18.9         184      3900   male 2007
35   Adelie     Dream     36.4     17.0         195      3325 female 2007
36   Adelie     Dream     39.2     21.1         196      4150   male 2007
37   Adelie     Dream     38.8     20.0         190      3950   male 2007
38   Adelie     Dream     42.2     18.5         180      3550 female 2007
39   Adelie     Dream     37.6     19.3         181      3300 female 2007
40   Adelie     Dream     39.8     19.1         184      4650   male 2007
41   Adelie     Dream     36.5     18.0         182      3150 female 2007
42   Adelie     Dream     40.8     18.4         195      3900   male 2007
43   Adelie     Dream     36.0     18.5         186      3100 female 2007
44   Adelie     Dream     44.1     19.7         196      4400   male 2007
45   Adelie     Dream     37.0     16.9         185      3000 female 2007
46   Adelie     Dream     39.6     18.8         190      4600   male 2007
47   Adelie     Dream     41.1     19.0         182      3425   male 2007
48   Adelie     Dream     37.5     18.9         179      2975   <NA> 2007
49   Adelie     Dream     36.0     17.9         190      3450 female 2007
50   Adelie     Dream     42.3     21.2         191      4150   male 2007
51   Adelie    Biscoe     39.6     17.7         186      3500 female 2008
52   Adelie    Biscoe     40.1     18.9         188      4300   male 2008
53   Adelie    Biscoe     35.0     17.9         190      3450 female 2008
54   Adelie    Biscoe     42.0     19.5         200      4050   male 2008
55   Adelie    Biscoe     34.5     18.1         187      2900 female 2008
56   Adelie    Biscoe     41.4     18.6         191      3700   male 2008
57   Adelie    Biscoe     39.0     17.5         186      3550 female 2008
58   Adelie    Biscoe     40.6     18.8         193      3800   male 2008
59   Adelie    Biscoe     36.5     16.6         181      2850 female 2008
60   Adelie    Biscoe     37.6     19.1         194      3750   male 2008
61   Adelie    Biscoe     35.7     16.9         185      3150 female 2008
62   Adelie    Biscoe     41.3     21.1         195      4400   male 2008
63   Adelie    Biscoe     37.6     17.0         185      3600 female 2008
64   Adelie    Biscoe     41.1     18.2         192      4050   male 2008
65   Adelie    Biscoe     36.4     17.1         184      2850 female 2008
66   Adelie    Biscoe     41.6     18.0         192      3950   male 2008
67   Adelie    Biscoe     35.5     16.2         195      3350 female 2008
68   Adelie    Biscoe     41.1     19.1         188      4100   male 2008
69   Adelie Torgersen     35.9     16.6         190      3050 female 2008
70   Adelie Torgersen     41.8     19.4         198      4450   male 2008
71   Adelie Torgersen     33.5     19.0         190      3600 female 2008
72   Adelie Torgersen     39.7     18.4         190      3900   male 2008
73   Adelie Torgersen     39.6     17.2         196      3550 female 2008
74   Adelie Torgersen     45.8     18.9         197      4150   male 2008
75   Adelie Torgersen     35.5     17.5         190      3700 female 2008
76   Adelie Torgersen     42.8     18.5         195      4250   male 2008
77   Adelie Torgersen     40.9     16.8         191      3700 female 2008
78   Adelie Torgersen     37.2     19.4         184      3900   male 2008
79   Adelie Torgersen     36.2     16.1         187      3550 female 2008
80   Adelie Torgersen     42.1     19.1         195      4000   male 2008
81   Adelie Torgersen     34.6     17.2         189      3200 female 2008
82   Adelie Torgersen     42.9     17.6         196      4700   male 2008
83   Adelie Torgersen     36.7     18.8         187      3800 female 2008
84   Adelie Torgersen     35.1     19.4         193      4200   male 2008
85   Adelie     Dream     37.3     17.8         191      3350 female 2008
86   Adelie     Dream     41.3     20.3         194      3550   male 2008
87   Adelie     Dream     36.3     19.5         190      3800   male 2008
88   Adelie     Dream     36.9     18.6         189      3500 female 2008
89   Adelie     Dream     38.3     19.2         189      3950   male 2008
90   Adelie     Dream     38.9     18.8         190      3600 female 2008
91   Adelie     Dream     35.7     18.0         202      3550 female 2008
92   Adelie     Dream     41.1     18.1         205      4300   male 2008
93   Adelie     Dream     34.0     17.1         185      3400 female 2008
94   Adelie     Dream     39.6     18.1         186      4450   male 2008
95   Adelie     Dream     36.2     17.3         187      3300 female 2008
96   Adelie     Dream     40.8     18.9         208      4300   male 2008
97   Adelie     Dream     38.1     18.6         190      3700 female 2008
98   Adelie     Dream     40.3     18.5         196      4350   male 2008
99   Adelie     Dream     33.1     16.1         178      2900 female 2008
100  Adelie     Dream     43.2     18.5         192      4100   male 2008
101  Adelie    Biscoe     35.0     17.9         192      3725 female 2009
102  Adelie    Biscoe     41.0     20.0         203      4725   male 2009
103  Adelie    Biscoe     37.7     16.0         183      3075 female 2009
104  Adelie    Biscoe     37.8     20.0         190      4250   male 2009
105  Adelie    Biscoe     37.9     18.6         193      2925 female 2009
106  Adelie    Biscoe     39.7     18.9         184      3550   male 2009
107  Adelie    Biscoe     38.6     17.2         199      3750 female 2009
108  Adelie    Biscoe     38.2     20.0         190      3900   male 2009
109  Adelie    Biscoe     38.1     17.0         181      3175 female 2009
110  Adelie    Biscoe     43.2     19.0         197      4775   male 2009
111  Adelie    Biscoe     38.1     16.5         198      3825 female 2009
112  Adelie    Biscoe     45.6     20.3         191      4600   male 2009
113  Adelie    Biscoe     39.7     17.7         193      3200 female 2009
114  Adelie    Biscoe     42.2     19.5         197      4275   male 2009
115  Adelie    Biscoe     39.6     20.7         191      3900 female 2009
116  Adelie    Biscoe     42.7     18.3         196      4075   male 2009
117  Adelie Torgersen     38.6     17.0         188      2900 female 2009
118  Adelie Torgersen     37.3     20.5         199      3775   male 2009
119  Adelie Torgersen     35.7     17.0         189      3350 female 2009
120  Adelie Torgersen     41.1     18.6         189      3325   male 2009
121  Adelie Torgersen     36.2     17.2         187      3150 female 2009
122  Adelie Torgersen     37.7     19.8         198      3500   male 2009
123  Adelie Torgersen     40.2     17.0         176      3450 female 2009
124  Adelie Torgersen     41.4     18.5         202      3875   male 2009
125  Adelie Torgersen     35.2     15.9         186      3050 female 2009
126  Adelie Torgersen     40.6     19.0         199      4000   male 2009
127  Adelie Torgersen     38.8     17.6         191      3275 female 2009
128  Adelie Torgersen     41.5     18.3         195      4300   male 2009
129  Adelie Torgersen     39.0     17.1         191      3050 female 2009
130  Adelie Torgersen     44.1     18.0         210      4000   male 2009
131  Adelie Torgersen     38.5     17.9         190      3325 female 2009
132  Adelie Torgersen     43.1     19.2         197      3500   male 2009
133  Adelie     Dream     36.8     18.5         193      3500 female 2009
134  Adelie     Dream     37.5     18.5         199      4475   male 2009
135  Adelie     Dream     38.1     17.6         187      3425 female 2009
136  Adelie     Dream     41.1     17.5         190      3900   male 2009
137  Adelie     Dream     35.6     17.5         191      3175 female 2009
138  Adelie     Dream     40.2     20.1         200      3975   male 2009
139  Adelie     Dream     37.0     16.5         185      3400 female 2009
140  Adelie     Dream     39.7     17.9         193      4250   male 2009
141  Adelie     Dream     40.2     17.1         193      3400 female 2009
142  Adelie     Dream     40.6     17.2         187      3475   male 2009
143  Adelie     Dream     32.1     15.5         188      3050 female 2009
144  Adelie     Dream     40.7     17.0         190      3725   male 2009
145  Adelie     Dream     37.3     16.8         192      3000 female 2009
146  Adelie     Dream     39.0     18.7         185      3650   male 2009
147  Adelie     Dream     39.2     18.6         190      4250   male 2009
148  Adelie     Dream     36.6     18.4         184      3475 female 2009
149  Adelie     Dream     36.0     17.8         195      3450 female 2009
150  Adelie     Dream     37.8     18.1         193      3750   male 2009
151  Adelie     Dream     36.0     17.1         187      3700 female 2009
152  Adelie     Dream     41.5     18.5         201      4000   male 2009

4.2 Common filter() conditions

# Numeric comparison
penguins |>
  filter(body_mass > 5000)
   species island bill_len bill_dep flipper_len body_mass    sex year
1   Gentoo Biscoe     50.0     16.3         230      5700   male 2007
2   Gentoo Biscoe     50.0     15.2         218      5700   male 2007
3   Gentoo Biscoe     47.6     14.5         215      5400   male 2007
4   Gentoo Biscoe     46.7     15.3         219      5200   male 2007
5   Gentoo Biscoe     46.8     15.4         215      5150   male 2007
6   Gentoo Biscoe     49.0     16.1         216      5550   male 2007
7   Gentoo Biscoe     48.4     14.6         213      5850   male 2007
8   Gentoo Biscoe     49.3     15.7         217      5850   male 2007
9   Gentoo Biscoe     49.2     15.2         221      6300   male 2007
10  Gentoo Biscoe     48.7     15.1         222      5350   male 2007
11  Gentoo Biscoe     50.2     14.3         218      5700   male 2007
12  Gentoo Biscoe     46.3     15.8         215      5050   male 2007
13  Gentoo Biscoe     46.1     15.1         215      5100   male 2007
14  Gentoo Biscoe     47.8     15.0         215      5650   male 2007
15  Gentoo Biscoe     50.0     15.3         220      5550   male 2007
16  Gentoo Biscoe     47.3     15.3         222      5250   male 2007
17  Gentoo Biscoe     45.1     14.5         207      5050 female 2007
18  Gentoo Biscoe     59.6     17.0         230      6050   male 2007
19  Gentoo Biscoe     49.1     14.8         220      5150 female 2008
20  Gentoo Biscoe     48.4     16.3         220      5400   male 2008
21  Gentoo Biscoe     44.4     17.3         219      5250   male 2008
22  Gentoo Biscoe     48.7     15.7         208      5350   male 2008
23  Gentoo Biscoe     49.6     16.0         225      5700   male 2008
24  Gentoo Biscoe     50.5     15.9         222      5550   male 2008
25  Gentoo Biscoe     50.5     15.9         225      5400   male 2008
26  Gentoo Biscoe     44.9     13.3         213      5100 female 2008
27  Gentoo Biscoe     45.2     15.8         215      5300   male 2008
28  Gentoo Biscoe     48.5     14.1         220      5300   male 2008
29  Gentoo Biscoe     45.0     15.4         220      5050   male 2008
30  Gentoo Biscoe     50.4     15.3         224      5550   male 2008
31  Gentoo Biscoe     46.2     14.9         221      5300   male 2008
32  Gentoo Biscoe     54.3     15.7         231      5650   male 2008
33  Gentoo Biscoe     49.8     16.8         230      5700   male 2008
34  Gentoo Biscoe     49.5     16.2         229      5800   male 2008
35  Gentoo Biscoe     50.7     15.0         223      5550   male 2008
36  Gentoo Biscoe     48.2     15.6         221      5100   male 2008
37  Gentoo Biscoe     46.5     14.8         217      5200 female 2008
38  Gentoo Biscoe     48.6     16.0         230      5800   male 2008
39  Gentoo Biscoe     51.1     16.3         220      6000   male 2008
40  Gentoo Biscoe     45.2     16.4         223      5950   male 2008
41  Gentoo Biscoe     52.5     15.6         221      5450   male 2009
42  Gentoo Biscoe     50.0     15.9         224      5350   male 2009
43  Gentoo Biscoe     50.8     17.3         228      5600   male 2009
44  Gentoo Biscoe     51.3     14.2         218      5300   male 2009
45  Gentoo Biscoe     52.1     17.0         230      5550   male 2009
46  Gentoo Biscoe     52.2     17.1         228      5400   male 2009
47  Gentoo Biscoe     49.5     16.1         224      5650   male 2009
48  Gentoo Biscoe     50.8     15.7         226      5200   male 2009
49  Gentoo Biscoe     51.1     16.5         225      5250   male 2009
50  Gentoo Biscoe     55.9     17.0         228      5600   male 2009
51  Gentoo Biscoe     49.1     15.0         228      5500   male 2009
52  Gentoo Biscoe     46.8     16.1         215      5500   male 2009
53  Gentoo Biscoe     53.4     15.8         219      5500   male 2009
54  Gentoo Biscoe     48.1     15.1         209      5500   male 2009
55  Gentoo Biscoe     49.8     15.9         229      5950   male 2009
56  Gentoo Biscoe     51.5     16.3         230      5500   male 2009
57  Gentoo Biscoe     55.1     16.0         230      5850   male 2009
58  Gentoo Biscoe     48.8     16.2         222      6000   male 2009
59  Gentoo Biscoe     50.4     15.7         222      5750   male 2009
60  Gentoo Biscoe     45.2     14.8         212      5200 female 2009
61  Gentoo Biscoe     49.9     16.1         213      5400   male 2009

# Several conditions combined with AND (a comma or &)
penguins |>
  filter(species == "Chinstrap", bill_len > 50)
     species island bill_len bill_dep flipper_len body_mass    sex year
1  Chinstrap  Dream     51.3     19.2         193      3650   male 2007
2  Chinstrap  Dream     52.7     19.8         197      3725   male 2007
3  Chinstrap  Dream     51.3     18.2         197      3750   male 2007
4  Chinstrap  Dream     51.3     19.9         198      3700   male 2007
5  Chinstrap  Dream     51.7     20.3         194      3775   male 2007
6  Chinstrap  Dream     52.0     18.1         201      4050   male 2007
7  Chinstrap  Dream     50.5     19.6         201      4050   male 2007
8  Chinstrap  Dream     50.3     20.0         197      3300   male 2007
9  Chinstrap  Dream     58.0     17.8         181      3700 female 2007
10 Chinstrap  Dream     50.6     19.4         193      3800   male 2007
11 Chinstrap  Dream     52.0     19.0         197      4150   male 2007
12 Chinstrap  Dream     50.5     18.4         200      3400 female 2008
13 Chinstrap  Dream     52.8     20.0         205      4550   male 2008
14 Chinstrap  Dream     54.2     20.8         201      4300   male 2008
15 Chinstrap  Dream     51.0     18.8         203      4100   male 2008
16 Chinstrap  Dream     52.0     20.7         210      4800   male 2008
17 Chinstrap  Dream     53.5     19.9         205      4500   male 2008
18 Chinstrap  Dream     50.9     19.1         196      3550   male 2008
19 Chinstrap  Dream     50.9     17.9         196      3675 female 2009
20 Chinstrap  Dream     50.8     18.5         201      4450   male 2009
21 Chinstrap  Dream     50.1     17.9         190      3400 female 2009
22 Chinstrap  Dream     51.5     18.7         187      3250   male 2009
23 Chinstrap  Dream     51.4     19.0         201      3950   male 2009
24 Chinstrap  Dream     50.7     19.7         203      4050   male 2009
25 Chinstrap  Dream     52.2     18.8         197      3450   male 2009
26 Chinstrap  Dream     50.2     18.8         202      3800   male 2009
27 Chinstrap  Dream     51.9     19.5         206      3950   male 2009
28 Chinstrap  Dream     55.8     19.8         207      4000   male 2009
29 Chinstrap  Dream     50.8     19.0         210      4100   male 2009
30 Chinstrap  Dream     50.2     18.7         198      3775 female 2009

# Several conditions combined with OR (|)
penguins |>
  filter(island == "Biscoe" | island == "Dream") |>
  count(island)
  island   n
1 Biscoe 168
2  Dream 124

4.3 More filter() techniques

# %in%: match any of several values
penguins |>
  filter(species %in% c("Adelie", "Gentoo")) |>
  count(species)
  species   n
1  Adelie 152
2  Gentoo 124

# between(): keep values in a range (both ends inclusive)
penguins |>
  filter(between(bill_len, 40, 45))
     species    island bill_len bill_dep flipper_len body_mass    sex year
1     Adelie Torgersen     40.3     18.0         195      3250 female 2007
2     Adelie Torgersen     42.0     20.2         190      4250   <NA> 2007
3     Adelie Torgersen     41.1     17.6         182      3200 female 2007
4     Adelie Torgersen     42.5     20.7         197      4500   male 2007
5     Adelie    Biscoe     40.6     18.6         183      3550   male 2007
6     Adelie    Biscoe     40.5     17.9         187      3200 female 2007
7     Adelie    Biscoe     40.5     18.9         180      3950   male 2007
8     Adelie     Dream     40.9     18.9         184      3900   male 2007
9     Adelie     Dream     42.2     18.5         180      3550 female 2007
10    Adelie     Dream     40.8     18.4         195      3900   male 2007
11    Adelie     Dream     44.1     19.7         196      4400   male 2007
12    Adelie     Dream     41.1     19.0         182      3425   male 2007
13    Adelie     Dream     42.3     21.2         191      4150   male 2007
14    Adelie    Biscoe     40.1     18.9         188      4300   male 2008
15    Adelie    Biscoe     42.0     19.5         200      4050   male 2008
16    Adelie    Biscoe     41.4     18.6         191      3700   male 2008
17    Adelie    Biscoe     40.6     18.8         193      3800   male 2008
18    Adelie    Biscoe     41.3     21.1         195      4400   male 2008
19    Adelie    Biscoe     41.1     18.2         192      4050   male 2008
20    Adelie    Biscoe     41.6     18.0         192      3950   male 2008
21    Adelie    Biscoe     41.1     19.1         188      4100   male 2008
22    Adelie Torgersen     41.8     19.4         198      4450   male 2008
23    Adelie Torgersen     42.8     18.5         195      4250   male 2008
24    Adelie Torgersen     40.9     16.8         191      3700 female 2008
25    Adelie Torgersen     42.1     19.1         195      4000   male 2008
26    Adelie Torgersen     42.9     17.6         196      4700   male 2008
27    Adelie     Dream     41.3     20.3         194      3550   male 2008
28    Adelie     Dream     41.1     18.1         205      4300   male 2008
29    Adelie     Dream     40.8     18.9         208      4300   male 2008
30    Adelie     Dream     40.3     18.5         196      4350   male 2008
31    Adelie     Dream     43.2     18.5         192      4100   male 2008
32    Adelie    Biscoe     41.0     20.0         203      4725   male 2009
33    Adelie    Biscoe     43.2     19.0         197      4775   male 2009
34    Adelie    Biscoe     42.2     19.5         197      4275   male 2009
35    Adelie    Biscoe     42.7     18.3         196      4075   male 2009
36    Adelie Torgersen     41.1     18.6         189      3325   male 2009
37    Adelie Torgersen     40.2     17.0         176      3450 female 2009
38    Adelie Torgersen     41.4     18.5         202      3875   male 2009
39    Adelie Torgersen     40.6     19.0         199      4000   male 2009
40    Adelie Torgersen     41.5     18.3         195      4300   male 2009
41    Adelie Torgersen     44.1     18.0         210      4000   male 2009
42    Adelie Torgersen     43.1     19.2         197      3500   male 2009
43    Adelie     Dream     41.1     17.5         190      3900   male 2009
44    Adelie     Dream     40.2     20.1         200      3975   male 2009
45    Adelie     Dream     40.2     17.1         193      3400 female 2009
46    Adelie     Dream     40.6     17.2         187      3475   male 2009
47    Adelie     Dream     40.7     17.0         190      3725   male 2009
48    Adelie     Dream     41.5     18.5         201      4000   male 2009
49    Gentoo    Biscoe     43.3     13.4         209      4400 female 2007
50    Gentoo    Biscoe     40.9     13.7         214      4650 female 2007
51    Gentoo    Biscoe     42.0     13.5         210      4150 female 2007
52    Gentoo    Biscoe     42.9     13.1         215      5000 female 2007
53    Gentoo    Biscoe     44.5     14.3         216      4100   <NA> 2007
54    Gentoo    Biscoe     42.8     14.2         209      4700 female 2007
55    Gentoo    Biscoe     42.6     13.7         213      4950 female 2008
56    Gentoo    Biscoe     44.4     17.3         219      5250   male 2008
57    Gentoo    Biscoe     44.0     13.6         208      4350 female 2008
58    Gentoo    Biscoe     42.7     13.7         208      3950 female 2008
59    Gentoo    Biscoe     43.6     13.9         217      4900 female 2008
60    Gentoo    Biscoe     44.9     13.3         213      5100 female 2008
61    Gentoo    Biscoe     45.0     15.4         220      5050   male 2008
62    Gentoo    Biscoe     43.8     13.9         208      4300 female 2008
63    Gentoo    Biscoe     43.2     14.5         208      4450 female 2008
64    Gentoo    Biscoe     43.5     14.2         220      4700 female 2008
65    Gentoo    Biscoe     44.9     13.8         212      4750 female 2009
66    Gentoo    Biscoe     43.4     14.4         218      4600 female 2009
67    Gentoo    Biscoe     44.5     14.7         214      4850 female 2009
68    Gentoo    Biscoe     41.7     14.7         210      4700 female 2009
69    Gentoo    Biscoe     43.3     14.0         208      4575 female 2009
70    Gentoo    Biscoe     43.5     15.2         213      4650 female 2009
71    Gentoo    Biscoe     44.5     15.7         217      4875   <NA> 2009
72 Chinstrap     Dream     42.4     17.3         181      3600 female 2007
73 Chinstrap     Dream     43.2     16.6         187      2900 female 2007
74 Chinstrap     Dream     40.9     16.6         187      3200 female 2008
75 Chinstrap     Dream     42.5     16.7         187      3350 female 2008
76 Chinstrap     Dream     42.5     17.3         187      3350 female 2009
77 Chinstrap     Dream     43.5     18.1         202      3400 female 2009

# Keep rows with a missing value (use !is.na(sex) for the non-missing rows)
penguins |>
  filter(is.na(sex))
   species    island bill_len bill_dep flipper_len body_mass  sex year
1   Adelie Torgersen       NA       NA          NA        NA <NA> 2007
2   Adelie Torgersen     34.1     18.1         193      3475 <NA> 2007
3   Adelie Torgersen     42.0     20.2         190      4250 <NA> 2007
4   Adelie Torgersen     37.8     17.1         186      3300 <NA> 2007
5   Adelie Torgersen     37.8     17.3         180      3700 <NA> 2007
6   Adelie     Dream     37.5     18.9         179      2975 <NA> 2007
7   Gentoo    Biscoe     44.5     14.3         216      4100 <NA> 2007
8   Gentoo    Biscoe     46.2     14.4         214      4650 <NA> 2008
9   Gentoo    Biscoe     47.3     13.8         216      4725 <NA> 2009
10  Gentoo    Biscoe     44.5     15.7         217      4875 <NA> 2009
11  Gentoo    Biscoe       NA       NA          NA        NA <NA> 2009
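
A detail worth keeping in mind with all of the conditions above: filter() keeps only rows where the condition is TRUE, so a row whose condition evaluates to NA is dropped silently. The sketch below shows this on a hypothetical four-row table (`birds` and its values are made up), assuming dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data with one missing mass
birds <- tibble(
  id   = 1:4,
  mass = c(3000, NA, 5200, 4100)
)

# NA > 4000 is NA, not TRUE, so row 2 disappears
heavy <- birds |> filter(mass > 4000)

# To keep the unknown rows, ask for them explicitly
heavy_or_unknown <- birds |> filter(mass > 4000 | is.na(mass))

print(heavy$id)              # 3 4
print(heavy_or_unknown$id)   # 2 3 4
```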

4.4 arrange(): sort rows

# Ascending order (the default)
penguins |>
  select(species, body_mass) |>
  arrange(body_mass) |>
  head(5)
    species body_mass
1 Chinstrap      2700
2    Adelie      2850
3    Adelie      2850
4    Adelie      2900
5    Adelie      2900

# Descending order: wrap the column in desc()
penguins |>
  select(species, island, body_mass) |>
  arrange(desc(body_mass)) |>
  head(5)
  species island body_mass
1  Gentoo Biscoe      6300
2  Gentoo Biscoe      6050
3  Gentoo Biscoe      6000
4  Gentoo Biscoe      6000
5  Gentoo Biscoe      5950

# Sort by several columns: species ascending, then body mass descending
penguins |>
  select(species, body_mass) |>
  arrange(species, desc(body_mass)) |>
  head(8)
  species body_mass
1  Adelie      4775
2  Adelie      4725
3  Adelie      4700
4  Adelie      4675
5  Adelie      4650
6  Adelie      4600
7  Adelie      4600
8  Adelie      4500
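
None of the outputs above show a missing value, even though body_mass contains two, because arrange() always places NA at the end, in ascending and descending order alike, and head() cuts the result off before them. A minimal sketch on a hypothetical `scores` table (names and values are made up), assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data with one missing score
scores <- tibble(
  name  = c("a", "b", "c"),
  score = c(2, NA, 5)
)

ascending  <- scores |> arrange(score)
descending <- scores |> arrange(desc(score))

print(ascending$name)    # "a" "c" "b"  (NA last)
print(descending$name)   # "c" "a" "b"  (NA still last)
```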

4.5 slice(): select rows by position

# Take rows 1 to 5
penguins |> slice(1:5)
  species    island bill_len bill_dep flipper_len body_mass    sex year
1  Adelie Torgersen     39.1     18.7         181      3750   male 2007
2  Adelie Torgersen     39.5     17.4         186      3800 female 2007
3  Adelie Torgersen     40.3     18.0         195      3250 female 2007
4  Adelie Torgersen       NA       NA          NA        NA   <NA> 2007
5  Adelie Torgersen     36.7     19.3         193      3450 female 2007

# The 2 heaviest penguins in each species
penguins |>
  select(species, island, body_mass) |>
  drop_na(body_mass) |>
  slice_max(body_mass, n = 2, by = species)
    species island body_mass
1    Adelie Biscoe      4775
2    Adelie Biscoe      4725
3    Gentoo Biscoe      6300
4    Gentoo Biscoe      6050
5 Chinstrap  Dream      4800
6 Chinstrap  Dream      4550

Tip

slice_max(), slice_min(), slice_head(), slice_tail() and slice_sample() form a family of row-selection helpers; they are especially useful together with the by = grouping argument.
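
One behaviour of slice_max() and slice_min() that can surprise: tied values are all kept by default, so asking for n = 1 may return more than one row per group. The sketch below uses a hypothetical `weights` table (made up for illustration) and assumes dplyr 1.1.0 or later for the by = argument.

```r
library(dplyr)

# Hypothetical toy data: group "x" has two rows tied for the maximum
weights <- tibble(
  group = c("x", "x", "x", "y", "y"),
  w     = c(5, 5, 3, 7, 2)
)

# Default with_ties = TRUE keeps both tied rows in group "x"
top_with_ties <- weights |> slice_max(w, n = 1, by = group)

# with_ties = FALSE returns exactly n rows per group
top_strict <- weights |> slice_max(w, n = 1, by = group, with_ties = FALSE)

print(nrow(top_with_ties))   # 3
print(nrow(top_strict))      # 2
```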

Part 5 Column operations

select() and mutate()

5.1 select(): choose columns

# Select columns by name
penguins |>
  select(species, island, body_mass) |>
  head(4)
  species    island body_mass
1  Adelie Torgersen      3750
2  Adelie Torgersen      3800
3  Adelie Torgersen      3250
4  Adelie Torgersen        NA

# Exclude columns with -
penguins |>
  select(-year, -island) |>
  head(4)
  species bill_len bill_dep flipper_len body_mass    sex
1  Adelie     39.1     18.7         181      3750   male
2  Adelie     39.5     17.4         186      3800 female
3  Adelie     40.3     18.0         195      3250 female
4  Adelie       NA       NA          NA        NA   <NA>

# Select a run of adjacent columns with :
penguins |>
  select(species, bill_len:body_mass) |>
  head(4)
  species bill_len bill_dep flipper_len body_mass
1  Adelie     39.1     18.7         181      3750
2  Adelie     39.5     17.4         186      3800
3  Adelie     40.3     18.0         195      3250
4  Adelie       NA       NA          NA        NA

5.2 Helper functions for select()

# starts_with(): columns whose name starts with a given string
penguins |>
  select(species, starts_with("bill")) |>
  head(4)
  species bill_len bill_dep
1  Adelie     39.1     18.7
2  Adelie     39.5     17.4
3  Adelie     40.3     18.0
4  Adelie       NA       NA

# ends_with() and contains() work the same way
penguins |>
  select(species, ends_with("mass")) |>
  head(4)
  species body_mass
1  Adelie      3750
2  Adelie      3800
3  Adelie      3250
4  Adelie        NA

# where(): select columns by type
penguins |>
  select(where(is.numeric)) |>
  head(3)
  bill_len bill_dep flipper_len body_mass year
1     39.1     18.7         181      3750 2007
2     39.5     17.4         186      3800 2007
3     40.3     18.0         195      3250 2007
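
The helpers can also be combined with & and negated with !. The sketch below illustrates this on a hypothetical four-column table (`df` and its columns are made up, loosely modelled on the penguin column names), assuming dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(
  id       = 1:2,
  bill_len = c(39.1, 39.5),
  bill_dep = c(18.7, 17.4),
  note     = c("a", "b")
)

by_substring  <- df |> select(contains("_"))             # names containing "_"
numeric_no_id <- df |> select(where(is.numeric) & !id)   # numeric columns except id
non_numeric   <- df |> select(!where(is.numeric))        # everything that is not numeric

print(names(by_substring))    # "bill_len" "bill_dep"
print(names(numeric_no_id))   # "bill_len" "bill_dep"
print(names(non_numeric))     # "note"
```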

5.3 rename(): rename columns

# rename(new_name = old_name)
penguins |>
  rename(
    物种     = species,
    岛屿     = island,
    体重_克  = body_mass
  ) |>
  select(物种, 岛屿, 体重_克) |>
  head(4)
    物种      岛屿 体重_克
1 Adelie Torgersen    3750
2 Adelie Torgersen    3800
3 Adelie Torgersen    3250
4 Adelie Torgersen      NA

# rename_with(): rename many columns at once by applying a function
penguins |>
  rename_with(toupper, starts_with("bill")) |>
  select(species, starts_with("BILL")) |>
  head(3)
  species BILL_LEN BILL_DEP
1  Adelie     39.1     18.7
2  Adelie     39.5     17.4
3  Adelie     40.3     18.0

5.4 mutate(): create or modify columns

# Basic syntax
data |> mutate(new_column = expression)

# Create a new column: convert body mass from grams to kilograms
penguins |>
  mutate(体重_千克 = body_mass / 1000) |>
  select(species, body_mass, 体重_千克) |>
  head(4)
  species body_mass 体重_千克
1  Adelie      3750      3.75
2  Adelie      3800      3.80
3  Adelie      3250      3.25
4  Adelie        NA        NA

# Create several columns at once; a later column may refer to one created earlier in the same call
penguins |>
  mutate(
    体重_千克    = body_mass / 1000,
    嘴峰长宽比   = bill_len / bill_dep,
    大型企鹅     = body_mass > 4500
  ) |>
  select(species, 体重_千克, 嘴峰长宽比, 大型企鹅) |>
  head(4)
  species 体重_千克 嘴峰长宽比 大型企鹅
1  Adelie      3.75       2.09    FALSE
2  Adelie      3.80       2.27    FALSE
3  Adelie      3.25       2.24    FALSE
4  Adelie        NA         NA       NA
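
To make the "later columns can use earlier ones" rule concrete, here is a minimal sketch on hypothetical toy data (`toy`, `mass_kg` and `is_large` are illustrative names, not part of the lecture data):

```r
library(dplyr)

# Hypothetical toy data with one missing value
toy <- tibble(body_mass = c(3750, 4800, NA))

result <- toy |>
  mutate(
    mass_kg  = body_mass / 1000,   # created first
    is_large = mass_kg > 4.5       # refers to mass_kg, defined one line above
  )

result
```

The order of the arguments matters: swapping the two lines would fail, because `mass_kg` would not exist yet when `is_large` is evaluated.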

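5.5 mutate() with conditional logic

▶️ View code
# if_else(): a simple two-way condition; stricter than ifelse() because both
# branches must have compatible types. Labels: "大型" = large, "小型" = small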
penguins |>
  mutate(
    体型 = if_else(body_mass > 4200, "大型", "小型")
  ) |>
  select(species, body_mass, 体型) |>
  head(5)
  species body_mass 体型
1  Adelie      3750 小型
2  Adelie      3800 小型
3  Adelie      3250 小型
4  Adelie        NA <NA>
5  Adelie      3450 小型
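▶️ View code
# case_when(): several condition branches (the tidyverse alternative to nested ifelse())
# Labels: "重型" = heavy, "中型" = medium, "轻型" = light, "超轻" = ultra-light
# Note: a missing body_mass matches no condition and falls through to .default,
# so the 2 penguins with NA body_mass are counted under "超轻" below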
penguins |>
  mutate(
    体重等级 = case_when(
      body_mass >= 5000 ~ "重型",
      body_mass >= 4000 ~ "中型",
      body_mass >= 3000 ~ "轻型",
      .default            = "超轻"
    )
  ) |>
  count(体重等级)
  体重等级   n
1     中型 110
2     超轻  11
3     轻型 156
4     重型  67
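
Because a missing value matches none of the conditions, it lands in `.default`. The sketch below, on hypothetical toy data with English labels chosen for illustration, contrasts that fall-through with one possible way of keeping missing values missing by testing `is.na()` first:

```r
library(dplyr)

# Hypothetical toy data with one missing body mass
toy <- tibble(body_mass = c(5200, 4100, 3300, 2800, NA))

labelled <- toy |>
  mutate(
    # NA >= 5000 evaluates to NA, which case_when() treats as "no match",
    # so the missing row ends up in .default
    fallthrough = case_when(
      body_mass >= 5000 ~ "heavy",
      body_mass >= 4000 ~ "medium",
      body_mass >= 3000 ~ "light",
      .default = "ultra-light"
    ),
    # Testing is.na() first keeps the missing row missing
    explicit_na = case_when(
      is.na(body_mass)  ~ NA_character_,
      body_mass >= 5000 ~ "heavy",
      body_mass >= 4000 ~ "medium",
      body_mass >= 3000 ~ "light",
      .default = "ultra-light"
    )
  )

labelled
```

Conditions are checked from top to bottom and the first match wins, which is why the `is.na()` branch has to come first.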

5.6 Advanced uses of mutate()

▶️ View code
# Modify an existing column (overwrites the original)
penguins |>
  mutate(
    species = str_to_upper(species)   # convert species names to upper case
  ) |>
  count(species)
    species   n
1    ADELIE 152
2 CHINSTRAP  68
3    GENTOO 124
▶️ View code
# across(): apply the same operation to several columns at once
penguins |>
  mutate(
    across(where(is.numeric), ~ round(.x, 1))
  ) |>
  select(bill_len, bill_dep, flipper_len) |>
  head(4)
  bill_len bill_dep flipper_len
1     39.1     18.7         181
2     39.5     17.4         186
3     40.3     18.0         195
4       NA       NA          NA

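Tip

across() was introduced in dplyr 1.0.0 and is one of its most useful tools. ~ round(.x, 1) is a formula-style anonymous function in which .x stands for the current column; the call is equivalent to running round(column, 1) on every column that satisfies the selection. From R 4.1 onward the same function can also be written as \(x) round(x, 1). In this data set the measurements already have at most one decimal place, so the output above looks unchanged.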
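
Since the penguins measurements are already rounded, the effect of `across()` is easier to see on values with more decimals. A minimal sketch on hypothetical toy data follows; the `_r1` suffix is an arbitrary illustrative choice, while `.names` is an existing argument of `across()` that writes results to new columns instead of overwriting:

```r
library(dplyr)

# Hypothetical toy data with two decimal places
toy <- tibble(
  species  = c("Adelie", "Gentoo"),
  bill_len = c(39.13, 46.18),
  bill_dep = c(18.74, 13.26)
)

# Overwrite every numeric column with its rounded version
rounded <- toy |>
  mutate(across(where(is.numeric), \(x) round(x, 1)))

# Keep the originals and add rounded copies named <column>_r1
with_copies <- toy |>
  mutate(across(starts_with("bill"), \(x) round(x, 1), .names = "{.col}_r1"))

rounded
names(with_copies)
```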

Part 6 Summarising and grouping

summarise() and .by=

6.1 summarise(): summary statistics

▶️ View code
mean(penguins$body_mass, na.rm = TRUE)
[1] 4202
▶️ View code
sd(penguins$body_mass, na.rm = TRUE)
[1] 802
▶️ View code
max(penguins$body_mass, na.rm = TRUE)
[1] 6300
▶️ View code
sum(is.na(penguins$body_mass))
[1] 2
▶️ View code
# Compute summary statistics over the whole data set
penguins |>
  summarise(
    样本量     = n(),                            # sample size
    平均体重   = mean(body_mass, na.rm = TRUE),  # mean body mass
    体重标准差 = sd(body_mass, na.rm = TRUE),    # standard deviation of body mass
    最大体重   = max(body_mass, na.rm = TRUE),   # maximum body mass
    缺失数量   = sum(is.na(body_mass))           # number of missing values
  )
  样本量 平均体重 体重标准差 最大体重 缺失数量
1    344     4202        802     6300        2

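Note

summarise() returns a new data frame (a tibble when the input is a tibble) in which the original rows are collapsed into summary rows. Commonly used summary functions: n(), mean(), sd(), median(), min(), max(), sum(), n_distinct().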
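
Two of the listed functions, `median()` and `n_distinct()`, have not been shown yet. A minimal sketch on hypothetical toy data (the values are made up for illustration):

```r
library(dplyr)

# Hypothetical toy data with one missing body mass
toy <- tibble(
  island    = c("Biscoe", "Biscoe", "Dream", "Torgersen", "Dream"),
  body_mass = c(4500, 5000, 3700, NA, 3900)
)

overall <- toy |>
  summarise(
    n_rows      = n(),                             # number of rows, NA included
    n_islands   = n_distinct(island),              # number of distinct islands
    median_mass = median(body_mass, na.rm = TRUE), # NA removed before computing
    naive_mean  = mean(body_mass)                  # without na.rm the result is NA
  )

overall
```

The last column shows why `na.rm = TRUE` appears in almost every summary in this lecture: a single missing value is enough to make the whole summary missing.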

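6.2 .by=: the newer syntax for grouped operations

dplyr 1.1.0 introduced the .by= argument, which specifies the grouping directly inside summarise() or mutate(), with no need for group_by() and ungroup().

▶️ View code
# .by= groups directly inside summarise()
penguins |>
  summarise(
    样本量   = n(),                                      # sample size
    平均体重 = round(mean(body_mass, na.rm = TRUE), 0),  # mean body mass
    平均嘴长 = round(mean(bill_len, na.rm = TRUE), 1),   # mean bill length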
    .by = species
  )
    species 样本量 平均体重 平均嘴长
1    Adelie    152     3701     38.8
2    Gentoo    124     5076     47.5
3 Chinstrap     68     3733     48.8
▶️ View code
# Grouping by several variables: pass the columns with c()
penguins |>
  drop_na(sex) |>
  summarise(
    样本量   = n(),
    平均体重 = round(mean(body_mass, na.rm = TRUE), 0),
    .by = c(species, sex)
  )
    species    sex 样本量 平均体重
1    Adelie   male     73     4043
2    Adelie female     73     3369
3    Gentoo female     58     4680
4    Gentoo   male     61     5485
5 Chinstrap female     34     3527
6 Chinstrap   male     34     3939

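Tip

.by= is more concise than group_by(), and the grouping is dropped automatically once the operation finishes, so no "hidden state" is left behind to cause surprises later. One visible difference: .by= keeps the groups in order of first appearance (Adelie, Gentoo, Chinstrap in the output above), whereas group_by() sorts them by the grouping key. This course recommends .by= throughout.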
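
The comparison with `group_by()` can be checked directly. The following sketch uses hypothetical toy data; `is_grouped_df()` is the dplyr helper that reports whether a data frame still carries grouping:

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  species   = c("Adelie", "Gentoo", "Adelie", "Chinstrap", "Gentoo"),
  body_mass = c(3700, 5000, 3900, 3800, 5200)
)

# The same group means, written two ways
via_by <- toy |>
  summarise(mean_mass = mean(body_mass), .by = species)
via_group_by <- toy |>
  group_by(species) |>
  summarise(mean_mass = mean(body_mass))

via_by$species        # groups in order of first appearance
via_group_by$species  # groups sorted by the grouping key

# With mutate(), group_by() leaves the result grouped until ungroup() is called
kept_grouping <- toy |> group_by(species) |> mutate(m = mean(body_mass))
no_grouping   <- toy |> mutate(m = mean(body_mass), .by = species)
is_grouped_df(kept_grouping)
is_grouped_df(no_grouping)
```

The lingering grouping in `kept_grouping` is the "hidden state" mentioned above: any later `mutate()` or `summarise()` on it would silently operate per species.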

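6.3 .by= with mutate(): within-group calculations

▶️ View code
# .by= also works in mutate(): it adds group-level statistics to the original data
# (the number of rows is unchanged; only new columns are added)
# Note: rank() averages tied values and ranks NA last, as the output shows
penguins |>
  mutate(
    组内均值     = mean(body_mass, na.rm = TRUE),  # group mean
    与均值之差   = body_mass - 组内均值,  # difference from the group mean
    组内排名     = rank(desc(body_mass)),  # rank within group, heaviest first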
    .by = species
  ) |>
  select(species, body_mass, 组内均值, 与均值之差, 组内排名) |>
  filter(species == "Adelie") |>
  head(5)
  species body_mass 组内均值 与均值之差 组内排名
1  Adelie      3750     3701       49.3     66.5
2  Adelie      3800     3701       99.3     59.5
3  Adelie      3250     3701     -450.7    125.5
4  Adelie        NA     3701         NA    152.0
5  Adelie      3450     3701     -250.7    103.5
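
The fractional ranks (66.5, 59.5) and the rank of 152 given to the missing value come from the defaults of base `rank()`. As a possible alternative, dplyr's `min_rank()` gives tied values the smallest shared rank and leaves missing values as NA. A minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data: one tie and one missing value
toy <- tibble(body_mass = c(3800, 3800, 4000, NA))

ranked <- toy |>
  mutate(
    base_rank  = rank(desc(body_mass)),     # ties averaged, NA ranked last
    dplyr_rank = min_rank(desc(body_mass))  # ties share the lowest rank, NA stays NA
  )

ranked
```

Which behaviour is preferable depends on the analysis; the point is only that the choice of ranking function determines how ties and missing values show up in the result.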

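Tip

The key difference between .by= with mutate() and .by= with summarise(): the former keeps every row and only adds group-level columns; the latter collapses the rows so that each group is left with a single summary row.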
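
The difference in row counts can be verified on a small case. A minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data: two Adelie rows, one Gentoo row
toy <- tibble(
  species   = c("Adelie", "Adelie", "Gentoo"),
  body_mass = c(3700, 3900, 5000)
)

# mutate(): every row is kept and receives its group's mean
added <- toy |>
  mutate(group_mean = mean(body_mass), .by = species)

# summarise(): one row per group
collapsed <- toy |>
  summarise(group_mean = mean(body_mass), .by = species)

nrow(added)
nrow(collapsed)
```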

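6.4 Putting it all together: a complete data analysis pipeline

▶️ View code
# From raw data to a summary report in one pipeline
penguins |>
  drop_na(body_mass, sex) |>
  mutate(
    嘴峰长宽比 = round(bill_len / bill_dep, 2),  # bill length-to-depth ratio
    体重等级   = if_else(body_mass > 4200, "大型", "小型")  # size class: large / small
  ) |>
  summarise(
    样本量       = n(),  # sample size
    平均体重     = round(mean(body_mass), 0),  # mean body mass
    平均长宽比   = round(mean(嘴峰长宽比), 2),  # mean bill ratio
    大型比例     = round(mean(体重等级 == "大型") * 100, 1),  # percentage classed as large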
    .by = c(species, sex)
  ) |>
  arrange(species, sex)
    species    sex 样本量 平均体重 平均长宽比 大型比例
1    Adelie female     73     3369       2.12      0.0
2    Adelie   male     73     4043       2.12     32.9
3 Chinstrap female     34     3527       2.65      0.0
4 Chinstrap   male     34     3939       2.66     20.6
5    Gentoo female     58     4680       3.20     91.4
6    Gentoo   male     61     5485       3.15    100.0

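Summary of this lecture

  • What tidyverse is: a collection of packages that share one design philosophy; its core is tidy data (each column is a variable, each row is an observation); readable code is its main value

  • Pipe operator: |> passes the result on its left to the function on its right; code reads left to right and top to bottom, in the same order as the operations; keyboard shortcut Ctrl+Shift+M

  • count(): the shortest way to tabulate frequencies; count(x) returns the count of each category; add_count() keeps the original rows; sort = TRUE sorts in descending order

  • Row operations: filter() keeps rows that meet a condition and works with %in%, between(), and is.na(); arrange() sorts, with desc() for descending order; drop_na() removes rows with missing values; slice_max() and related functions pick the rows with extreme values

  • Column operations: select() picks columns, with helpers such as starts_with(); mutate() creates new columns, if_else() handles two-way conditions, case_when() handles multiple branches; across() applies an operation to several columns at once

  • Summarising and grouping: summarise() collapses rows into summary rows; .by= groups inside the function, with no need for group_by() / ungroup(); mutate(.by=) adds group-level columns without collapsing rows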

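Exercises

Basic exercises (required)

  1. Using the penguins data, keep the penguins on Biscoe island that weigh more than 4500 g, sort them by body mass in descending order, and keep only four columns: species, island, body mass, and sex
  2. Use mutate() to create new columns: (a) convert body mass from grams to pounds (1 g = 0.00220462 lb); (b) use case_when() to classify bill length as "short" (< 40 mm), "medium" (40–50 mm), or "long" (> 50 mm)
  3. Group by species and island (using .by=) and compute the sample size, mean body mass, and standard deviation for each group, rounded to one decimal place

Advanced challenges (optional)

  1. For each species, find the 3 penguins with the highest bill length-to-depth ratio (bill_len / bill_dep) and output a table containing species, island, sex, and the ratio (hint: use slice_max(.., by = species); note that slice_max() takes by, without a leading dot)
  2. Use mutate(.by=) to compute each penguin's percentile rank of body mass within the same species and sex (hint: use percent_rank()), then find the individuals in the top 10% of each group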

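Next lecture

Lecture 6: tidyverse data manipulation (Part 2)

  • tidyr: reshaping data with pivot_longer() and pivot_wider()
  • Joining tables: left_join(), inner_join(), anti_join()
  • String processing: core stringr functions
  • Handling missing values: drop_na(), fill(), replace_na()
  • Case study: the complete workflow from "messy" data to "tidy" data

Tip

Lecture 6 continues this one. After both lectures you will be able to handle about 80% of the data-cleaning tasks in everyday data analysis, however "messy" the data.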

Thank you!

Lecture 5: tidyverse data manipulation (Part 1)


"Tidy data is the starting point of analysis; tidyverse makes the act of tidying data tidy as well."