Non Hybrid Long Read Consensus Using Local De Bruijn Graph Assembly

Abstract

While second-generation sequencing led to a vast increase in sequenced data, the shorter reads that came with it made assembly a much harder task and, for some regions, impossible with short-read data alone. This changed again with the advent of third-generation long-read sequencers. The length of long reads allows much better resolution of repetitive regions; their high error rate, however, is a major challenge. Using the data successfully requires removing most of the sequencing errors. The first correction methods were hybrid: they used low-noise second-generation data to correct third-generation data. This approach has issues when repeats make it unclear where to place the short reads, and also because second-generation sequencers fail to sequence some regions that third-generation sequencers can handle. Non-hybrid methods appeared later. We present a new method for non-hybrid long-read error correction based on De Bruijn graph assembly of short windows of long reads, followed by combination of these corrected windows into corrected long reads. Our experiments show that this method yields better correction than other state-of-the-art non-hybrid correction approaches.
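
To make the core idea concrete, here is a minimal Python sketch of a consensus for one short window built with a local De Bruijn graph. It is not the authors' implementation. The abstract does not say how the graph is traversed, so the function names, the values of `k` and `min_count`, and the greedy heaviest-edge walk are all assumptions made for illustration. The later step of joining corrected windows into full corrected reads is not shown.

```python
from collections import Counter, defaultdict


def kmer_counts(fragments, k):
    """Count every k-mer occurring in the window fragments."""
    counts = Counter()
    for frag in fragments:
        for i in range(len(frag) - k + 1):
            counts[frag[i:i + k]] += 1
    return counts


def build_graph(counts, min_count):
    """Build a De Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    k-mers seen fewer than min_count times are treated as likely
    sequencing errors and dropped (an assumed, simplistic filter).
    """
    graph = defaultdict(dict)
    for kmer, count in counts.items():
        if count >= min_count:
            graph[kmer[:-1]][kmer[1:]] = count
    return graph


def window_consensus(fragments, k=4, min_count=2):
    """Greedy consensus for one window (hypothetical helper).

    Starts at the most common leading (k-1)-mer and repeatedly follows
    the heaviest outgoing edge until a dead end, a revisited node, or
    the length of the longest input fragment is reached.
    """
    starts = Counter(f[:k - 1] for f in fragments if len(f) >= k)
    if not starts:
        return ""
    graph = build_graph(kmer_counts(fragments, k), min_count)
    node = starts.most_common(1)[0][0]
    seq, seen = node, {node}
    max_len = max(len(f) for f in fragments)
    while node in graph and len(seq) < max_len:
        nxt = max(graph[node], key=graph[node].get)
        if nxt in seen:  # avoid looping forever on a cycle
            break
        seq += nxt[-1]
        seen.add(nxt)
        node = nxt
    return seq


# Four noisy copies of the same window: one substitution, one deletion.
fragments = [
    "ACGTTGTAAGGCT",  # substitution (C -> T)
    "ACGTTGCAAGGCT",  # error-free
    "ACGTTGCAAGCT",   # deletion (one G missing)
    "ACGTTGCAAGGCT",  # error-free
]
print(window_consensus(fragments))
```

The sketch relies on errors being random: an erroneous k-mer tends to appear in only one fragment, while correct k-mers recur across fragments and so dominate the graph. Real long-read data has far higher error rates and repeated k-mers within a window, which a greedy walk like this one would not handle reliably.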



Original article: https://www.cnblogs.com/wangprince2017/p/13756448.html