• [R read error] Fix: Can't bind data because some arguments have the same name


    Recently, while reading a data file, I ran into the error in the title.

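    library(dplyr)                            # provides %>% and select()
    args <- commandArgs(trailingOnly = TRUE)  # assumed: defined earlier in the original script
    args[1] <- "RT_10-VS-RT_0"
    # check.names = FALSE keeps the column names exactly as they appear in the file
    all <- read.delim(paste0(args[1], ".xls"), header = TRUE, check.names = FALSE)
    dat <- all %>% dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)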
    

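    This happens because select() cannot work on a data frame that has duplicated column names; the error is raised even when the duplicated column is not among those selected. The duplicates survive the read because check.names = FALSE stops read.delim from making the names unique.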

    The following snippet lists the duplicated column names:

    # Check for duplicated column names
    > tibble::enframe(names(all)) %>% count(value) %>% filter(n > 1)
    # A tibble: 1 x 2
      value          n
      <chr>      <int>
    1 Protein_ID     2
    

    It turns out there are two columns named Protein_ID.
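
    The same check can be sketched in base R without any tidyverse packages. The toy data frame below is hypothetical and only stands in for the real file; duplicated() and %in% are standard base functions.

```r
# Toy data frame with a duplicated column name (hypothetical data,
# standing in for the real table read with check.names = FALSE).
df <- data.frame(Protein_ID = c("P1", "P2"),
                 Ratio      = c(1.2, 0.8),
                 Protein_ID = c("P1", "P2"),
                 check.names = FALSE)

# Names that occur more than once.
dup_names <- unique(names(df)[duplicated(names(df))])
print(dup_names)

# Positions of every column that carries a duplicated name.
dup_pos <- which(names(df) %in% dup_names)
print(dup_pos)
```

    Knowing the positions makes it easy to compare the two columns before deciding which one to keep.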

    How do we fix it? One option is to read the file with readr instead, which renames duplicated columns while parsing.

    # "\t" makes the tab delimiter explicit instead of embedding a literal tab character
    all <- readr::read_delim(paste0(args[1], ".xls"), delim = "\t") %>%
      dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)
    
    Parsed with column specification:
    cols(
      .default = col_character(),
      No. = col_double(),
      Mass = col_double(),
      Protein_Coverage = col_double(),
      `Mean_Ratio_RT_10_118/RT_0_117` = col_double(),
      `Tremble Identity` = col_double(),
      `Tremble E-value` = col_double()
    )
    See spec(...) for full column specifications.
    Warning: 29 parsing failures.
     row                           col expected actual                file
    1001 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
    1001 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
    1410 Mean_Ratio_RT_10_118/RT_0_117 a double    n/a 'RT_10-VS-RT_0.xls'
    1871 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
    1871 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
    .... ............................. ........ ...... ...................
    See problems(...) for more details.
    
    Warning message:
    Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14]
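
    The renaming reported in the warning above can also be reproduced in base R if you would rather keep read.delim. This is a minimal sketch on a hypothetical toy data frame, not what readr does internally: make.unique() appends a suffix to repeated names, and sep = "_" is chosen here only so the result resembles the name shown in the warning.

```r
# Toy data frame with a duplicated column name (hypothetical data).
df <- data.frame(Protein_ID = c("P1", "P2"),
                 Ratio      = c(1.2, 0.8),
                 Protein_ID = c("P1", "P2"),
                 check.names = FALSE)

# Rename only the repeated names; the first occurrence stays unchanged.
names(df) <- make.unique(names(df), sep = "_")
print(names(df))
```

    Once the names are unique, select() has nothing to complain about. Repaired names may differ between readr versions, so it is worth checking names() on the result before selecting columns by name.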
    

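    The warnings also list the rows and columns that failed to parse (using the guessed type, col_double), and they flag the duplicated column Protein_ID. How do we get rid of the long "Parsed with column specification" message? We can either specify the column types when reading, or pass col_types = cols() to keep the default guessing without the message.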

    # cols() is qualified with readr:: because readr is not attached with library()
    all <- readr::read_delim(paste0(args[1], ".xls"), delim = "\t", col_types = readr::cols()) %>%
      dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)
    
    Warning: 29 parsing failures.
     row                           col expected actual                file
    1001 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
    1001 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
    1410 Mean_Ratio_RT_10_118/RT_0_117 a double    n/a 'RT_10-VS-RT_0.xls'
    1871 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
    1871 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
    .... ............................. ........ ...... ...................
    See problems(...) for more details.
    
    Warning message:
    Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14] 
    

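    The warnings are still shown, and it is best to keep them, since they point to the rows that failed to parse and to the renamed column.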
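
    If you would rather address the cause of the parsing failures than just read past them, one possible approach is to declare the column types explicitly and tell readr which placeholder strings mean "missing". The sketch below uses a small hypothetical tab-separated file; the column names and the choice of "-" and "n/a" as NA markers are assumptions taken from the warning output above, not from the full data.

```r
library(readr)

# Write a tiny tab-separated file (hypothetical stand-in for the real data).
tmp <- tempfile(fileext = ".tsv")
writeLines(c("Protein_ID\tMean_Ratio\tTremble Identity",
             "P1\t1.5\t98.2",
             "P2\tn/a\t-"), tmp)

# Explicit column types suppress the column specification message;
# the na argument turns "-" and "n/a" into NA instead of parsing failures.
dat <- read_delim(tmp, delim = "\t",
                  col_types = cols(Protein_ID = col_character(),
                                   .default   = col_double()),
                  na = c("", "NA", "-", "n/a"))
print(dat)
```

    Whether "-" and "n/a" really mean missing values in your table is a judgment call, so check with problems() first before treating them as NA.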

    Ref: https://github.com/tidyverse/readr/issues/954

  • Original article: https://www.cnblogs.com/jessepeng/p/12452211.html