[R read error] Fix: Can't bind data because some arguments have the same name

Recently, while reading a data file, I ran into the error in the title.

library(dplyr)  # provides %>%, select() and starts_with()
args <- "RT_10-VS-RT_0"  # file name prefix; args[1] is used below
all <- read.delim(paste0(args[1], ".xls"), header = TRUE, check.names = FALSE)
dat <- all %>% dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)

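This happens because select() cannot handle a data frame with duplicated column names. The error is raised even when the duplicated columns are not among those being selected.

The following script checks for duplicated column names: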

# Check for duplicated column names
> tibble::enframe(names(all)) %>% count(value) %>% filter(n > 1)
# A tibble: 1 x 2
  value          n
  <chr>      <int>
1 Protein_ID     2

So there are two columns named Protein_ID.
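The same check can also be done in base R, and the names can be made unique in place if you want to keep using read.delim. This is only a minimal sketch of one possible alternative, not the approach this post takes; the toy data frame and its column names are hypothetical stand-ins for `all`.

```r
# Toy data frame standing in for `all`; values and column names are hypothetical.
df <- data.frame(1:2, c(0.5, 2.0), c("P1", "P2"))
names(df) <- c("Protein_ID", "Ratio_A", "Protein_ID")

# Names that occur more than once
dup_names <- unique(names(df)[duplicated(names(df))])
print(dup_names)

# make.unique() keeps the first copy as-is and appends .1, .2, ... to later copies
names(df) <- make.unique(names(df))
print(names(df))
```

After renaming, every column name is distinct, so name-based selection should no longer be ambiguous. Whether the first or the second Protein_ID column is the one you actually want still has to be checked against the data.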

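How to fix it? One option is to read the file with readr instead, which deduplicates repeated column names while parsing (in the readr version used here, the second copy is renamed Protein_ID_1, as the warning below shows).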

all <- readr::read_delim(paste0(args[1], ".xls"), delim = "\t") %>%
  dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)

Parsed with column specification:
cols(
  .default = col_character(),
  No. = col_double(),
  Mass = col_double(),
  Protein_Coverage = col_double(),
  `Mean_Ratio_RT_10_118/RT_0_117` = col_double(),
  `Tremble Identity` = col_double(),
  `Tremble E-value` = col_double()
)
See spec(...) for full column specifications.
Warning: 29 parsing failures.
 row                           col expected actual                file
1001 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
1001 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
1410 Mean_Ratio_RT_10_118/RT_0_117 a double    n/a 'RT_10-VS-RT_0.xls'
1871 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
1871 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
.... ............................. ........ ...... ...................
See problems(...) for more details.

Warning message:
Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14]

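The warnings list the rows and columns where parsing failed (the values did not match the guessed type, col_double) and also report the duplicated column Protein_ID. To get rid of the long "Parsed with column specification" message, either specify the column types explicitly when reading, or pass col_types = cols(), which accepts the guessed types without printing them.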

all <- readr::read_delim(paste0(args[1], ".xls"), delim = "\t", col_types = readr::cols()) %>%
  dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)

Warning: 29 parsing failures.
 row                           col expected actual                file
1001 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
1001 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
1410 Mean_Ratio_RT_10_118/RT_0_117 a double    n/a 'RT_10-VS-RT_0.xls'
1871 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
1871 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
.... ............................. ........ ...... ...................
See problems(...) for more details.

Warning message:
Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14]

The warnings are still shown, and it is best to keep them.
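The other option mentioned above, specifying the column types explicitly, might look like the sketch below. It assumes the parsing failures come only from placeholder strings such as "-" and "n/a" in numeric columns, which is what the warning output suggests but which should be verified on the real file. The miniature file and its values are hypothetical; only the column names Mass and Tremble Identity are taken from the output above.

```r
library(readr)

# Hypothetical miniature of the real file: tab-separated, with "-" and "n/a"
# used as missing-value placeholders in numeric columns.
tmp <- tempfile(fileext = ".xls")
writeLines(c(
  "Protein_ID\tMass\tTremble Identity",
  "P1\t10.5\t98.2",
  "P2\t20.1\t-",
  "P3\tn/a\t87.0"
), tmp)

dat <- read_delim(
  tmp,
  delim = "\t",
  na = c("", "NA", "-", "n/a"),   # treat the placeholders as missing values
  col_types = cols(
    .default = col_character(),   # everything else is read as text
    Mass = col_double(),
    `Tremble Identity` = col_double()
  )
)
print(dat)
```

Declaring the placeholders as NA addresses the cause of those particular failures rather than hiding the warning, so any warning that still appears points at a value that is unexpected for some other reason.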

Ref: https://github.com/tidyverse/readr/issues/954

Original post: https://www.cnblogs.com/jessepeng/p/12452211.html