Genome assembly
Supplementary Text and Figures (1,578 KB)
Genome size 2.5 Gb, contig N50 34 kb, scaffold N50 1.6 Mb.
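For reference, the N50 figures quoted above can be reproduced from a list of contig or scaffold lengths. The sketch below is a minimal illustration using the common definition (the length of the sequence at which the running total, counted from the longest sequence downward, first reaches half of the total assembly length). The function name `n50` and the toy lengths are assumptions for illustration, not values from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half the total.
        if running >= half_total:
            return length


# Toy contig lengths in bp, made up for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

In practice the lengths would be read from the assembly FASTA; contig N50 and scaffold N50 differ only in which set of sequences is passed in.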
Libraries: short-insert paired-end (180, 300 and 500 bp) and large-insert mate-pair (2, 5 and 10 kb).
All libraries were sequenced at 2 × 100 bp on an Illumina HiSeq 2000 platform.
Three BAC libraries with average insert sizes of 160 kb, 152 kb and 100 kb, respectively, were sequenced at both ends using the Sanger sequencing method.
Genome assembly, scaffolding and gap-closing.
All sequences were assembled using the SOAPdenovo package [12]. A de Bruijn graph was built using a K-mer size of 63. After removing tips, merging bubbles and concatenating the tiny repeats, contigs were built from the simplified de Bruijn graph. Paired-end short reads were then aligned back onto the contigs to construct the linkage relationship for contigs. Scaffolds were assembled based on these paired-end links and gaps in the scaffolds were filled by GapCloser. BAC-ends were mapped to the assembly using BWA-SW software [12]. Further scaffolding was then conducted, based on links between BAC-ends.
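The contig-building step described above can be illustrated with a toy de Bruijn graph. This is only a sketch of the general idea, not SOAPdenovo's implementation: it uses k = 4 instead of 63, ignores reverse complements and sequencing errors, and performs no tip removal or bubble merging. The function names and the toy reads are assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph


def extend_unambiguous(graph, start):
    """Walk from `start` while the path has no branching, spelling out a contig."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1

    contig, node, visited = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        # Stop at a node with several incoming edges or at a cycle.
        if indegree[nxt] != 1 or nxt in visited:
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Toy overlapping reads, made up for illustration only.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
print(extend_unambiguous(graph, "ATG"))
```

Real assemblers simplify the graph (tips, bubbles, small repeats) before emitting contigs, then use paired-end and BAC-end links for scaffolding, as the excerpt describes.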
Wild soybean pan-genome elucidates genetic diversity and important agronomic traits
Supplementary Text and Figures (10,233 KB)
De novo assembly.
First, we generated a 17-mer depth distribution of short-insert paired-end reads using Meryl [50] and applied GCE [51] to estimate the genome sizes of individual G. soja accessions. Reads were preprocessed with the ALLPATHS-LG [52] error-correction module to remove base-calling errors. We also used ErrorCorrection in the SOAPdenovo [11] package to connect paired-end reads from the 180-bp library and generate longer sequences for assembly. Reads from the 180-bp and 500-bp libraries were used for contig building, and all paired-end libraries were used to provide links for scaffold construction. GapCloser (v1.12) from the SOAPdenovo [11] package was used to fill gaps within the assembled scaffolds using all paired-end reads. Finally, scaffold sequences that aligned to bacterial genomes with identity ≥95% and e-value ≤1e-5 were filtered out.
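The k-mer based genome size estimate mentioned above rests on a simple relation: genome size ≈ total number of k-mers / depth at the main peak of the k-mer depth distribution. The sketch below shows only that basic relation. It is not how GCE works internally (GCE also models heterozygosity and error k-mers), and the `min_depth` cutoff for discarding low-depth error k-mers, the function name and the toy histogram are all assumptions for illustration.

```python
def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing-error k-mers and
    ignored (a hypothetical cutoff chosen for this sketch).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)             # main coverage peak
    total_kmers = sum(d * c for d, c in usable.items())  # k-mer occurrences
    return total_kmers / peak_depth, peak_depth


# Toy 17-mer histogram, made up for illustration only.
toy_histogram = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated genome size = {size:.0f} bp")
```

With real data the histogram would come from a k-mer counter's output, and the choice of cutoff and peak should be checked by plotting the distribution.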
Papers evaluating genome assembly software
1. Bao, S., et al. (2011). "Evaluation of next-generation sequencing software in mapping and assembly." J Hum Genet.
2. Vezzi, F., et al. (2012). "Reevaluating assembly evaluations with feature response curves: GAGE and assemblathons." PLoS One 7(12): e52210.
3. Salzberg, S. L., et al. (2012). "GAGE: A critical evaluation of genome assemblies and assembly algorithms." Genome Res 22(3): 557-567.
4. Zhang, W., et al. (2011). "A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies." PLoS One 6(3): e17915.
5. Narzisi, G. and B. Mishra (2011). "Comparing de novo genome assembly: the long and short of it." PLoS One 6(4): e19175.
6. Lin, Y., et al. (2011). "Comparative Studies of de novo Assembly Tools for Next-generation Sequencing Technologies." Bioinformatics.
7. Finotello, F., et al. (2011). "Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data." Brief Bioinform.
8. Earl, D. A., et al. (2011). "Assemblathon 1: A competitive assessment of de novo short read assembly methods." Genome Res.
Third-generation (long-read) assembly
Assembly and diploid architecture of an individual human genome via single-molecule technologies
Falcon is described in great detail, with detailed configuration files and methods.
Long-read sequence assembly of the gorilla genome
Assembled with Falcon; I could not find the specific configuration parameters.
A comparison of multiple assembly tools, including Falcon.