• The advantages and disadvantages of short- and long-read metagenomics to infer bacterial and eukaryotic community composition



    Abstract

    Background: The first step in understanding ecological community diversity and dynamics is quantifying community membership. An increasingly common method for doing so is metagenomics. Because of the rapidly increasing popularity of this approach, a large number of computational tools and pipelines are available for analysing metagenomic data. However, the majority of these tools have been designed and benchmarked using highly accurate short-read data (e.g. Illumina), with few studies benchmarking classification accuracy for long, error-prone reads (PacBio or Oxford Nanopore). In addition, few tools have been benchmarked for non-microbial communities.

    Results: Here we use simulated error-prone Oxford Nanopore and high-accuracy Illumina read sets to systematically investigate the effects of sequence length and taxon type on classification accuracy for metagenomic data from both microbial and non-microbial communities. We show that, in general, classification accuracy is far lower for non-microbial communities, even at low taxonomic resolution (e.g. family rather than genus).

    Conclusions: We then show that for two popular taxonomic classifiers, long error-prone reads can significantly increase classification accuracy, and that this effect is most pronounced for non-microbial communities. This work provides insight into the expected accuracy of metagenomic analyses for different taxonomic groups, and establishes the point at which read length becomes more important than error rate for assigning the correct taxon.
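
    The abstract compares classification accuracy at different taxonomic resolutions (family versus genus). As a rough illustration of what such a comparison might involve, the sketch below scores simulated reads against their known lineages at a chosen rank. It is not the authors' pipeline: the function name `accuracy_at_rank`, the lineage dictionaries, and the four toy reads are all assumptions made up for this example, and counting unclassified reads as incorrect is only one possible convention.

```python
from typing import Dict, List, Optional

# A lineage maps a rank name to a taxon name (None = unclassified at that rank).
# Hypothetical structure for illustration; not taken from the study.
Lineage = Dict[str, Optional[str]]


def accuracy_at_rank(truth: List[Lineage], predicted: List[Lineage], rank: str) -> float:
    """Fraction of reads whose predicted taxon equals the true taxon at `rank`.

    Assumption: reads left unclassified at `rank` count as incorrect.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must describe the same reads")
    if not truth:
        return 0.0
    correct = 0
    for true_lineage, pred_lineage in zip(truth, predicted):
        call = pred_lineage.get(rank)
        if call is not None and call == true_lineage.get(rank):
            correct += 1
    return correct / len(truth)


# Toy, made-up data: four simulated reads from the same source genome.
TRUE_LINEAGE = {"family": "Enterobacteriaceae", "genus": "Escherichia"}
truth = [TRUE_LINEAGE] * 4
predicted = [
    {"family": "Enterobacteriaceae", "genus": "Escherichia"},  # correct at both ranks
    {"family": "Enterobacteriaceae", "genus": "Shigella"},     # right family, wrong genus
    {"family": "Enterobacteriaceae", "genus": None},           # classified to family only
    {},                                                        # unclassified read
]

genus_accuracy = accuracy_at_rank(truth, predicted, "genus")
family_accuracy = accuracy_at_rank(truth, predicted, "family")
print(f"genus accuracy:  {genus_accuracy:.2f}")
print(f"family accuracy: {family_accuracy:.2f}")
```

    In this toy data set, accuracy is higher at family level than at genus level because a read placed in the right family but the wrong genus still counts as correct at the coarser rank.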


  • Original article: https://www.cnblogs.com/wangprince2017/p/13756456.html
Copyright © 2020-2023  润新知