Essential Tools for Researchers

R
packages
Author

Reddy Lee

Published

Sunday, July 30, 2023

Modified

Thursday, December 12, 2024


First, let's learn the core R toolkit every researcher should know: tidyverse [@tidyverse-2].

1 Setup

Code
library(tidyverse) # load the core tidyverse packages (dplyr, tidyr, readr, ggplot2, etc.)

2 Basic operations

2.1 Selecting existing variables (columns): select

2.2 Filtering observations (rows): filter

2.3 Creating new variables: mutate
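To make the column/row distinction concrete, here is a minimal sketch on a hypothetical three-row tibble (`toy` and its values are made up for illustration). `select()` changes which columns you keep, `filter()` changes which rows you keep, and `mutate()` adds or replaces a column.

```r
library(dplyr)

# Hypothetical toy data: 3 rows (observations) x 3 columns (variables)
toy <- tibble(
  id     = c(1, 2, 3),
  group  = c("a", "b", "a"),
  height = c(172, 150, 188)  # centimetres
)

cols_only <- toy |> select(id, height)               # 2 columns, all 3 rows
rows_only <- toy |> filter(group == "a")             # 2 rows, all 3 columns
with_new  <- toy |> mutate(height_m = height / 100)  # adds a 4th column

dim(cols_only)  # 3 2
dim(rows_only)  # 2 3
dim(with_new)   # 3 4
```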

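Code
data() # list the datasets available in R and loaded packages; starwars ships with dplyr

starwars %>% 
  head() # show the first 6 rows; pass a number, e.g. head(10), to show a different count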
# A tibble: 6 × 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
2 C-3PO        167    75 <NA>       gold       yellow         112   none  mascu…
3 R2-D2         96    32 <NA>       white, bl… red             33   none  mascu…
4 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
5 Leia Org…    150    49 brown      light      brown           19   fema… femin…
6 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>
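Code
starwars %>% 
  select(gender, mass, height, species) %>%   # the editor suggests column names as you type, which speeds up input
  filter(species == "Human") %>% 
  na.omit() %>%  # drop rows with NA in any of the selected columns
  mutate(height = height / 100, # reusing an existing name overwrites that column (cm to m)
         BMI = mass / height^2) %>%  # a new name creates a new column
  summarise(Average_BMI = mean(BMI), .by = gender) # since dplyr 1.1.0, .by groups a single operation without group_by()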
# A tibble: 2 × 2
  gender    Average_BMI
  <chr>           <dbl>
1 masculine        25.7
2 feminine         20.8
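The `.by` argument and `group_by()` compute the same group statistics, but they differ in two details worth knowing: `.by` returns groups in order of first appearance and leaves the result ungrouped, while `group_by()` sorts the group keys. A minimal sketch on a hypothetical tibble (`df` and its values are illustrative only), assuming dplyr 1.1.0 or later:

```r
library(dplyr)

# Hypothetical toy data; note that "m" appears before "f"
df <- tibble(
  gender = c("m", "f", "m", "f"),
  bmi    = c(26, 21, 24, 19)
)

# Per-operation grouping with .by: groups kept in order of first appearance
by_arg <- df |> summarise(avg = mean(bmi), .by = gender)

# Classic grouping with group_by(): group keys are sorted
by_grp <- df |> group_by(gender) |> summarise(avg = mean(bmi))

by_arg$gender  # "m" "f"
by_grp$gender  # "f" "m"
```

This ordering rule is why the output above lists masculine before feminine.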

2.4 Tutorial video

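3 Categorizing values: case_when()

case_when() assigns each value to a category according to a sequence of conditions, which are checked in order.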

3.1 Import sample data

Code
gradebook <- read_csv("https://raw.githubusercontent.com/equitable-equations/youtube/main/Mutating%20data%20frames%20with%20case_when/gradebook.csv")

view(gradebook)

3.2 Demo

Code
gradebook %>% 
  mutate(grade = case_when(score >= 90 ~ "A", # mutate() creates a new column; a score of 90 or above is an A
                           score >= 80 ~ "B", # no need to add score < 90: conditions are checked in order and the first match wins
                           score >= 70 ~ "C",
                           score >= 60 ~ "D",
                           .default = "F")) # catch-all: every remaining score gets an F
# A tibble: 34 × 3
   name       score grade
   <chr>      <dbl> <chr>
 1 student 1     80 B    
 2 student 2     66 D    
 3 student 3     72 C    
 4 student 4     75 C    
 5 student 5     74 C    
 6 student 6     71 C    
 7 student 7     77 C    
 8 student 8     49 F    
 9 student 9     66 D    
10 student 10    84 B    
# ℹ 24 more rows
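Because `case_when()` stops at the first condition that is `TRUE`, the order of the conditions is what makes the grading scheme work. A minimal sketch with a hypothetical score vector, showing the same conditions in the right and the wrong order (assumes dplyr 1.1.0 or later for `.default`):

```r
library(dplyr)

scores <- c(95, 85, 72, 40)  # hypothetical scores

# Checked top to bottom: "scores >= 80" only sees values that failed "scores >= 90"
right <- case_when(
  scores >= 90 ~ "A",
  scores >= 80 ~ "B",
  .default = "F"
)

# Same conditions in the wrong order: "scores >= 80" now also captures the 95
wrong <- case_when(
  scores >= 80 ~ "B",
  scores >= 90 ~ "A",
  .default = "F"
)

right  # "A" "B" "F" "F"
wrong  # "B" "B" "F" "F"
```

So list the most restrictive condition first; the catch-all `.default` always goes last.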
Code
starwars |> 
  select(name, species, height, contains("color")) |> 
  filter(species == "Human",
         height < 200,
         eye_color  %in% c("blue", "brown")) |>
  head()
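In `filter()`, conditions separated by commas are combined with AND, so the block above keeps rows that satisfy all three conditions at once. A minimal sketch on a hypothetical tibble (`people` and its values are illustrative) comparing the comma form with an explicit `&`:

```r
library(dplyr)

# Hypothetical toy data
people <- tibble(
  name      = c("p1", "p2", "p3", "p4"),
  height    = c(172, 210, 165, 180),
  eye_color = c("blue", "blue", "green", "brown")
)

# Comma-separated conditions are combined with AND (&)
a <- people |> filter(height < 200, eye_color %in% c("blue", "brown"))
b <- people |> filter(height < 200 & eye_color %in% c("blue", "brown"))

a$name  # "p1" "p4"
```

For OR logic, use `|` explicitly, or `%in%` when testing one column against several values.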
# A tibble: 6 × 6
  name               species height hair_color  skin_color eye_color
  <chr>              <chr>    <int> <chr>       <chr>      <chr>    
1 Luke Skywalker     Human      172 blond       fair       blue     
2 Leia Organa        Human      150 brown       light      brown    
3 Owen Lars          Human      178 brown, grey light      blue     
4 Beru Whitesun Lars Human      165 brown       light      blue     
5 Biggs Darklighter  Human      183 black       light      brown    
6 Anakin Skywalker   Human      188 blond       fair       blue     
Code
starwars |> 
  select(name, species, height, mass) |> 
  mutate(height = height / 100,
         BMI = mass / height^2) |>
  head()
# A tibble: 6 × 5
  name           species height  mass   BMI
  <chr>          <chr>    <dbl> <dbl> <dbl>
1 Luke Skywalker Human     1.72    77  26.0
2 C-3PO          Droid     1.67    75  26.9
3 R2-D2          Droid     0.96    32  34.7
4 Darth Vader    Human     2.02   136  33.3
5 Leia Organa    Human     1.5     49  21.8
6 Owen Lars      Human     1.78   120  37.9
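The BMI column is correct only because `mutate()` evaluates its arguments in order: by the time `BMI = mass / height^2` runs, `height` has already been converted from centimetres to metres. A minimal sketch on a hypothetical two-row tibble (values chosen to match Luke Skywalker and Leia Organa above):

```r
library(dplyr)

toy <- tibble(height = c(172, 150), mass = c(77, 49))  # hypothetical, cm and kg

# Later expressions inside one mutate() see earlier results,
# so BMI uses height in metres, not centimetres
bmi <- toy |> mutate(height = height / 100, BMI = mass / height^2)

round(bmi$BMI, 1)  # 26.0 21.8
```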
Code
msleep |> 
  select(genus, order, sleep_total) |> 
  arrange(-sleep_total) |>
  head()
# A tibble: 6 × 3
  genus      order           sleep_total
  <chr>      <chr>                 <dbl>
1 Myotis     Chiroptera             19.9
2 Eptesicus  Chiroptera             19.7
3 Lutreolina Didelphimorphia        19.4
4 Priodontes Cingulata              18.1
5 Didelphis  Didelphimorphia        18  
6 Dasypus    Cingulata              17.4
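`arrange(-sleep_total)` sorts in descending order by negating the column, which only works for numeric columns. `desc()` does the same job and also works for text. A minimal sketch on a hypothetical tibble:

```r
library(dplyr)

toy <- tibble(genus = c("x", "y", "z"), sleep_total = c(12, 19.9, 8))  # hypothetical

neg     <- toy |> arrange(-sleep_total)        # works because sleep_total is numeric
by_desc <- toy |> arrange(desc(sleep_total))   # same order, also valid for text columns

neg$genus  # "y" "x" "z"

# desc() reverses text order too; -genus would fail on a character column
toy |> arrange(desc(genus)) |> pull(genus)  # "z" "y" "x"
```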
Code
starwars |> 
  select(name, species, height, contains("color")) |>
  mutate(species = recode(species, 
                          "Droid" = "Robot")) |> 
  head(10)
# A tibble: 10 × 6
   name               species height hair_color    skin_color  eye_color
   <chr>              <chr>    <int> <chr>         <chr>       <chr>    
 1 Luke Skywalker     Human      172 blond         fair        blue     
 2 C-3PO              Robot      167 <NA>          gold        yellow   
 3 R2-D2              Robot       96 <NA>          white, blue red      
 4 Darth Vader        Human      202 none          white       yellow   
 5 Leia Organa        Human      150 brown         light       brown    
 6 Owen Lars          Human      178 brown, grey   light       blue     
 7 Beru Whitesun Lars Human      165 brown         light       blue     
 8 R5-D4              Robot       97 <NA>          white, red  red      
 9 Biggs Darklighter  Human      183 black         light       brown    
10 Obi-Wan Kenobi     Human      182 auburn, white fair        blue-gray
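`recode()` still works, but dplyr 1.1.0 introduced `case_match()` as the recommended replacement for the superseded `recode()`. Note that the syntax flips from `old = new` to `old ~ new`. A minimal sketch with a hypothetical species vector, assuming dplyr 1.1.0 or later:

```r
library(dplyr)

species <- c("Human", "Droid", NA, "Droid")  # hypothetical values

old_way <- recode(species, "Droid" = "Robot")
new_way <- case_match(species,
                      "Droid" ~ "Robot",
                      .default = species)  # keep every other value, including NA, unchanged

new_way  # "Human" "Robot" NA "Robot"
```

Without `.default = species`, `case_match()` would turn every unmatched value into `NA`, unlike `recode()`, which leaves unmatched values alone.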
Code
starwars |> 
  select(sex, height, mass) |> 
  filter(sex == "male" | sex == "female") |>
  mutate(height = height / 100) |>
  drop_na() |>
  group_by(sex) |>
  summarise(mean_height = mean(height),
            mean_mass = mean(mass))
# A tibble: 2 × 3
  sex    mean_height mean_mass
  <chr>        <dbl>     <dbl>
1 female        1.72      54.7
2 male          1.78      80.2
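The block above calls `drop_na()` before averaging because `mean()` returns `NA` as soon as one input is missing. A minimal sketch of the alternatives on a hypothetical tibble, using `filter(!is.na(mass))` in place of `tidyr::drop_na()` so that only dplyr is needed:

```r
library(dplyr)

toy <- tibble(sex = c("m", "m", "f"), mass = c(80, NA, 55))  # hypothetical

naive   <- toy |> summarise(mean_mass = mean(mass), .by = sex)
dropped <- toy |> filter(!is.na(mass)) |> summarise(mean_mass = mean(mass), .by = sex)
skipped <- toy |> summarise(mean_mass = mean(mass, na.rm = TRUE), .by = sex)

naive$mean_mass    # NA 55
dropped$mean_mass  # 80 55
skipped$mean_mass  # 80 55
```

The two fixes diverge once several columns are summarised. `drop_na()` removes a whole row if any selected column is missing, so `mean_height` above is computed only over characters whose mass is also known, whereas `na.rm = TRUE` skips missing values column by column.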

3.3 Tutorial video
