• Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data



    1. Wen-Biao Jiao¹
    2. Gonzalo Garcia Accinelli²
    3. Benjamin Hartwig¹
    4. Christiane Kiefer¹
    5. David Baker²
    6. Edouard Severing¹
    7. Eva-Maria Willing¹
    8. Mathieu Piednoel¹
    9. Stefan Woetzel¹
    10. Eva Madrid-Herrero¹
    11. Bruno Huettel³
    12. Ulrike Hümann¹
    13. Richard Reinhard³
    14. Marcus A. Koch⁴
    15. Daniel Swan²
    16. Bernardo Clavijo²
    17. George Coupland¹ and
    18. Korbinian Schneeberger¹

    Author Affiliations

    1. Department of Plant Developmental Biology, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany;
    2. Earlham Institute, Norwich Research Park, Norwich NR4 7UH, United Kingdom;
    3. Max Planck-Genome-center Cologne, 50829 Cologne, Germany;
    4. Department of Biodiversity and Plant Systematics, Centre for Organismal Studies (COS) Heidelberg, Heidelberg University, 69120 Heidelberg, Germany

    Corresponding author: schneeberger@mpipz.mpg.de

    Abstract

    Long-read sequencing can overcome the weaknesses of short reads in the assembly of eukaryotic genomes; however, at present additional scaffolding is needed to achieve chromosome-level assemblies.

    We generated Pacific Biosciences (PacBio) long-read data of the genomes of three relatives of the model plant Arabidopsis thaliana and assembled all three genomes into only a few hundred contigs.

    To improve the contiguities of these assemblies, we generated BioNano Genomics optical mapping and Dovetail Genomics chromosome conformation capture data for genome scaffolding.

    Despite their technical differences, optical mapping and chromosome conformation capture performed similarly and doubled N50 values.
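
    For readers unfamiliar with the metric, N50 is the length L such that sequences of length L or longer contain at least half of the total assembled bases. The sketch below computes it for made-up lengths; the function name `n50` and all numbers are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the sequence at which the running total,
    counted from the longest sequence downwards, first reaches at
    least half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), not data from the paper.
contigs = [40, 30, 20, 10]
# The same sequence joined into two scaffolds (40+20 and 30+10).
scaffolds = [60, 40]

contig_n50 = n50(contigs)
scaffold_n50 = n50(scaffolds)
print(contig_n50, scaffold_n50)
```

    In this toy case the total length is unchanged, but joining contigs into scaffolds raises N50 from 30 to 60, which is the kind of doubling the abstract describes.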

    After improving both integration methods, assembly contiguity reached chromosome-arm-levels.

    We rigorously assessed the quality of contigs and scaffolds using Illumina mate-pair libraries and genetic map information.

    This showed that PacBio assemblies have high sequence accuracy but can contain several misassemblies, which join unlinked regions of the genome.
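
    One simple way to picture how a genetic map can expose such misjoins is to check whether the markers placed on a single contig come from more than one linkage group. The following is only a sketch of that idea under stated assumptions, not the authors' procedure: the function `flag_candidate_misjoins`, the `min_markers` threshold, and the marker table are all hypothetical.

```python
from collections import Counter, defaultdict


def flag_candidate_misjoins(marker_hits, min_markers=2):
    """Flag contigs whose markers fall on more than one linkage group.

    marker_hits: iterable of (contig_name, linkage_group) pairs, one per
                 genetic marker located on a contig (hypothetical input).
    min_markers: a linkage group only counts as supported on a contig if
                 at least this many markers agree, so that a single
                 mis-mapped marker does not trigger a flag.
    Returns a dict mapping each flagged contig to its sorted list of
    supported linkage groups.
    """
    groups_per_contig = defaultdict(Counter)
    for contig, linkage_group in marker_hits:
        groups_per_contig[contig][linkage_group] += 1

    flagged = {}
    for contig, counts in groups_per_contig.items():
        supported = sorted(g for g, n in counts.items() if n >= min_markers)
        if len(supported) > 1:
            flagged[contig] = supported
    return flagged


# Hypothetical marker placements, not data from the paper.
hits = [
    ("ctg1", "LG1"), ("ctg1", "LG1"), ("ctg1", "LG1"),
    ("ctg2", "LG2"), ("ctg2", "LG2"), ("ctg2", "LG5"), ("ctg2", "LG5"),
    ("ctg3", "LG3"), ("ctg3", "LG3"), ("ctg3", "LG4"),
]
candidates = flag_candidate_misjoins(hits)
print(candidates)
```

    Here only `ctg2` is flagged, because two linkage groups are each supported by two markers; `ctg3` has a single stray marker on `LG4`, which falls below the threshold.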

    Most, but not all, of these misjoints were removed during the integration of the optical mapping and chromosome conformation capture data.

    Even though none of the centromeres were fully assembled, the scaffolds revealed large parts of some centromeric regions, even including some of the heterochromatic regions, which are not present in gold standard reference sequences.


  • Original source: https://www.cnblogs.com/wangprince2017/p/13756103.html