• Installing bioinformatics software: bowtie2


    So it is time to work through the documentation. Official manual, version 2.2.9: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml


    Bowtie 2 is well suited to aligning reads of about 50 bp up to 1,000 bp against a long reference sequence. Bowtie 2 indexes the genome with an FM Index
    (based on the Burrows-Wheeler Transform, or BWT). Output is in SAM format. It has been integrated into many tools, such as
    TopHat (a fast splice junction mapper for RNA-seq reads),
    Cufflinks (transcriptome assembly and isoform quantitation from RNA-seq reads),
    Crossbow (a cloud-enabled tool for analyzing resequencing data),
    Myrna (a cloud-enabled tool for aligning RNA-seq reads and measuring differential gene expression).

    Differences between Bowtie 1 and Bowtie 2:
    1. Bowtie 2's command-line arguments and genome index format are both different from Bowtie 1's.
    2. Bowtie 2 supports gapped alignment. The number of gaps and the gap lengths are not restricted, except by way of the configurable scoring scheme.
    3. Bowtie 2 supports local alignment (some characters at the ends of the read may be omitted, i.e. trimmed) as well as end-to-end alignment (every character of the read participates).
    4. Bowtie 2 has no upper limit on read length, whereas Bowtie 1 is limited to about 1000 bp.
    5. Bowtie 2 allows alignments to overlap ambiguous characters (e.g. `N`s) in the reference. Bowtie 1 does not.
    6. Bowtie 2 cannot align colorspace reads.
    7. Bowtie 2's paired-end alignment is more flexible: when no paired-end alignment is found for a pair, it tries to find unpaired alignments for each mate.
    8. Bowtie 2 reports a spectrum of mapping qualities, whereas Bowtie 1 reports either 0 or a high value.

    MUMmer: aligning two very large sequences (e.g. two genomes)
    NUCmer, BLAT/BLAST, Bowtie 2: sensitive alignment to a relatively short reference sequence (e.g. a bacterial genome)

    Installing bowtie2: download bowtie2-2.2.9-linux-x86_64.zip, unzip it, and add the extracted directory to the PATH environment variable.
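The install step can be sketched as a few shell commands. This is only a sketch: it assumes the zip file sits in the current directory and unpacks into a directory named bowtie2-2.2.9, and BT2_HOME is simply a convenient variable name for that directory. Adjust the path if your layout differs.

```shell
# Assumption: the archive unpacks into ./bowtie2-2.2.9
BT2_HOME="$PWD/bowtie2-2.2.9"

# Unpack only if the archive is present and has not been extracted yet
if [ -f bowtie2-2.2.9-linux-x86_64.zip ] && [ ! -d "$BT2_HOME" ]; then
    unzip -q bowtie2-2.2.9-linux-x86_64.zip
fi

# Put the bowtie2 executables on the search path for this shell session
PATH="$BT2_HOME:$PATH"
export BT2_HOME PATH

# Sanity check: print the version line if the binary can be found
if command -v bowtie2 >/dev/null 2>&1; then
    bowtie2 --version | head -n 1
fi
```

To make the change permanent, the two PATH lines can be appended to a shell startup file such as ~/.bashrc.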


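    Scores: higher = more similar
    --ma: match bonus
    --mp: mismatch penalty
    --np: penalty for having an N in either the read or the reference
    --rdg: affine read gap penalty
    --rfg: affine reference gap penalty
    End-to-end example: by default, a mismatch at a high-quality position costs -6, and a read gap of length 2 costs -11 (-5 for the gap open, plus -3 for each of the two extensions). If these are the only two differences in a 50 bp read, the total score is -(6 + 11) = -17. The best possible score is therefore 0, which means the read and the reference are identical.
    Default minimum score threshold: in end-to-end mode, -0.6 + -0.6 * L, where L is the read length (-30.6 for a 50 bp read); in local mode, 20 + 8.0 * ln(L).
    Paired-end reads: the score of a paired-end alignment is the sum of the scores of the two mates.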
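The scoring arithmetic above can be checked with a small shell sketch. The function names are made up for illustration. The penalties are the Bowtie 2 end-to-end defaults (mismatch 6 at a high-quality position, read gap open 5, gap extension 3), and the threshold uses the default end-to-end --score-min function, -0.6 + -0.6 * L.

```shell
# Default end-to-end penalties
MISMATCH=6     # mismatch at a high-quality position
GAP_OPEN=5     # opening a read gap
GAP_EXTEND=3   # each gap position

# Score of a read with $1 mismatches and one read gap of length $2
alignment_score() {
    echo $(( -( $1 * MISMATCH + GAP_OPEN + $2 * GAP_EXTEND ) ))
}

# Default minimum score in end-to-end mode for a read of length $1
min_score_end_to_end() {
    awk -v len="$1" 'BEGIN { printf "%.1f\n", -0.6 + -0.6 * len }'
}

alignment_score 1 2        # one mismatch, one length-2 gap: prints -17
min_score_end_to_end 50    # threshold for a 50 bp read: prints -30.6
```

Since -17 is above -30.6, the alignment from the example would count as valid under the default threshold.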

    Mapping quality:higher=more unique

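    bowtie2 aligns each mate of a pair separately, so if the two alignments do not match the paired-end expectations (for example, the relative orientation or the distance between the mates is wrong), the pair is said to align discordantly. Discordant alignments are useful when studying structural variants.
    --ff, --fr, --rf: expected relative orientation of the mates
    -I and -X: expected range of inter-mate distances (minimum and maximum fragment length)
    --no-discordant: do not look for discordant alignments
    --no-mixed: in mixed mode, when bowtie2 cannot find a paired-end alignment for a pair, it looks for unpaired alignments for each mate. Turning this off makes bowtie2 slightly faster.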
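As a rough illustration of the -I/-X constraint, the sketch below checks whether a fragment length falls inside the expected range. The helper name and the coordinates are hypothetical. The limits are the Bowtie 2 defaults (-I 0, -X 500), and the fragment is measured from the leftmost base of one mate to the rightmost base of the other, mates included.

```shell
# Bowtie 2 defaults: -I 0 (minimum) and -X 500 (maximum fragment length)
MIN_FRAG=0
MAX_FRAG=500

# Succeeds if the fragment spanning reference positions $1..$2 (inclusive)
# has a length within [MIN_FRAG, MAX_FRAG]
fragment_in_range() {
    frag_len=$(( $2 - $1 + 1 ))
    [ "$frag_len" -ge "$MIN_FRAG" ] && [ "$frag_len" -le "$MAX_FRAG" ]
}

# Hypothetical pair: mate 1 starts at 1001, mate 2 ends at 1300 (300 bp fragment)
if fragment_in_range 1001 1300; then
    echo "distance is within -I/-X"
else
    echo "distance is outside -I/-X"
fi

# A stricter run could pass the same expectations to bowtie2 (only printed here)
echo "bowtie2 --fr -I $MIN_FRAG -X $MAX_FRAG --no-discordant --no-mixed -x lambda_virus -1 reads_1.fq -2 reads_2.fq -S eg2.sam"
```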

    The SAM format has flags and optional fields that describe paired-end properties.
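The paired-end information in the FLAG column is a bit field, so it can be decoded with shell arithmetic. A minimal sketch follows; the function name and the short labels are made up here, while the bit values (1, 2, 4, 8, 16, 32, 64, 128) come from the SAM specification.

```shell
# Print the names of the pairing-related bits set in a SAM FLAG value
decode_flag() {
    flag=$1
    names=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair; do
        bit=${pair%%:*}     # part before the colon: the bit value
        name=${pair#*:}     # part after the colon: the label
        if [ $(( flag & bit )) -ne 0 ]; then
            names="$names $name"
        fi
    done
    echo "${names# }"       # drop the leading space
}

decode_flag 99     # prints: paired proper_pair mate_reverse first_in_pair
decode_flag 147    # prints: paired proper_pair reverse second_in_pair
```

Flags 99 and 147 are the usual combination for a concordantly aligned pair whose first mate is on the forward strand.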

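    Mates can overlap, contain, or dovetail each other:


    contain: the alignment of one mate lies entirely within the alignment of the other

    dovetail: one mate alignment extends past the beginning of the other, so that the wrong mate begins upstream

    --dovetail: treat dovetailed pairs as concordant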
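The three cases can be told apart by comparing alignment coordinates. The sketch below is a simplified interval test and not Bowtie 2's actual implementation. It assumes the --fr orientation, with mate 1 on the forward strand and mate 2 on the reverse strand, and takes 1-based inclusive start and end positions. The function name and the sample coordinates are hypothetical.

```shell
# Arguments: start and end of mate 1, then start and end of mate 2
classify_pair() {
    s1=$1; e1=$2; s2=$3; e2=$4
    if [ "$s2" -lt "$s1" ]; then
        echo "dovetail"     # mate 2 begins upstream of mate 1
    elif [ "$e2" -le "$e1" ] || [ "$s2" -eq "$s1" ]; then
        echo "contain"      # one alignment lies inside the other
    elif [ "$s2" -le "$e1" ]; then
        echo "overlap"      # the alignments share some positions
    else
        echo "separate"     # a gap lies between the mates
    fi
}

classify_pair 100 149 300 349   # prints: separate
classify_pair 100 149 130 179   # prints: overlap
classify_pair 100 199 120 169   # prints: contain
classify_pair 120 169 100 149   # prints: dovetail
```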

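    Reporting modes:
    default mode: search for multiple alignments, report the best one
    -D and -R: larger values make bowtie2 slower but increase the chance of finding the best alignment (relevant for reads with multiple alignments)
    -k mode: search for one or more alignments, report each
    -k N: search for at most N alignments, reported in descending order by alignment score
    -a mode: search for and report all alignments. This can be very slow for large genomes.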
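One way to see the effect of the reporting mode is to count how many alignment records each read received in the output. The sketch below does this with awk. The function name and the file name eg1.k3.sam are hypothetical; such a file could come from a run like `bowtie2 -k 3 -x lambda_virus -U reads_1.fq -S eg1.k3.sam`.

```shell
# Count SAM alignment records per read name.
# Header lines start with @; records with FLAG bit 4 set are unaligned and skipped.
alignments_per_read() {
    awk -F '\t' '!/^@/ && int($2 / 4) % 2 == 0 { n[$1]++ }
                 END { for (r in n) print r "\t" n[r] }' "$@" | sort
}

# Only runs if the (hypothetical) output file exists
if [ -f eg1.k3.sam ]; then
    alignments_per_read eg1.k3.sam | head
fi
```

In default mode every aligned read should show a count of 1, while -k N allows up to N records per read.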


    To rapidly narrow the number of possible alignments that must be considered, Bowtie 2 begins by
    extracting substrings ("seeds") from the read and its reverse complement and aligning them in an
    ungapped fashion with the help of the FM Index.
    -L: seed length
    -i: interval between extracted seeds
    -N: number of mismatches permitted per seed

    --n-ceil: upper limit on the number of ambiguous characters (usually N) a read may contain for its alignment to be valid
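Seed extraction can be pictured with a short sketch: substrings of length -L are taken every -i positions along the read. This illustrates the idea only and does not reproduce Bowtie 2's code. It prints forward-strand seeds only (Bowtie 2 also takes seeds from the reverse complement), it assumes a seed is used only when it fits entirely inside the read, and the read string is an arbitrary 30 bp example.

```shell
# Arguments: read sequence, seed length (-L), seed interval (-i, integer >= 1)
# Output: 0-based offset and seed, one per line
extract_seeds() {
    awk -v rd="$1" -v seed_len="$2" -v interval="$3" 'BEGIN {
        for (off = 0; off + seed_len <= length(rd); off += interval)
            print off "\t" substr(rd, off + 1, seed_len)
    }'
}

# 30 bp read, seed length 10, interval 6: four seeds at offsets 0, 6, 12, 18
extract_seeds TAGCTACGCTCTACGCTATCATGCATAAAC 10 6
```

A shorter interval or a shorter seed yields more seeds, which raises sensitivity at the cost of speed.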

    Presets: setting many options at once

    --very-sensitive: equivalent to -D 20 -R 3 -N 0 -L 20 -i S,1,0.50 (see the manual for the other presets)
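The -i value in a preset is a function of the read length rather than a fixed number: S,1,0.50 stands for f(x) = 1 + 0.50 * sqrt(x), and results below 1 are raised to 1. The sketch below evaluates such a function. The helper name is made up, only the S (square root) form is handled, and how Bowtie 2 rounds the result to a whole number of positions is not shown.

```shell
# Evaluate "S,<const>,<coeff>" for a read length: const + coeff * sqrt(length)
seed_interval() {
    awk -v spec="$1" -v len="$2" 'BEGIN {
        split(spec, p, ",")              # p[1] = "S", p[2] = const, p[3] = coeff
        v = p[2] + p[3] * sqrt(len)
        if (v < 1) v = 1                 # the interval is never below 1
        printf "%.2f\n", v
    }'
}

seed_interval S,1,0.50 50   # --very-sensitive, 50 bp read: prints 4.54
seed_interval S,1,1.15 50   # default (--sensitive), 50 bp read: prints 9.13
```

The smaller interval in --very-sensitive means roughly twice as many seeds per read as the default for a read of this length.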

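    bowtie2 still writes a SAM record for a read that was filtered out, but it does not attempt to align it. The YF:Z SAM optional field gives the reason for filtering (if several reasons apply, only one is reported).
    YF:Z:LN: the read length is less than or equal to the number of seed mismatches (-N)
    YF:Z:NS: the number of Ns in the read exceeds the ceiling set with --n-ceil
    YF:Z:SC: given the read length and the match bonus (--ma), the read cannot reach the minimum score set with --score-min
    YF:Z:QC: the read was marked as failing quality control and --qc-filter was specified. This only applies to Illumina QSEQ input, where the last field of the read's record contains 1.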
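To see how many reads were filtered and why, the YF:Z values can be tallied straight from the SAM output. A minimal sketch follows; the function name is made up, and optional fields are assumed to start at column 12, after the eleven mandatory SAM columns.

```shell
# Tally the YF:Z filter reasons found in SAM records (header lines are skipped)
count_filter_reasons() {
    awk -F '\t' '!/^@/ {
        for (i = 12; i <= NF; i++)
            if ($i ~ /^YF:Z:/) n[substr($i, 6)]++   # keep the code after "YF:Z:"
    } END { for (r in n) print r "\t" n[r] }' "$@" | sort
}

# Only runs if the SAM file from the example run exists
if [ -f eg1.sam ]; then
    count_filter_reasons eg1.sam
fi
```

No output means that no read carried a YF:Z field, i.e. nothing was filtered.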

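    The small index (32-bit numbers, for references shorter than about 4 billion nucleotides, files ending in .bt2) or the large index (64-bit numbers, files ending in .bt2l) is selected automatically, so there is no need to worry about it.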
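Although the choice is automatic, it is easy to check which kind of index is on disk by looking at the file extension. The sketch below uses a hypothetical helper that looks for the first index file of a given basename; checking the small index first is an ordering chosen for this sketch.

```shell
# Report which index variant exists for basename $1: small, large or none
index_kind() {
    if [ -f "$1.1.bt2" ]; then
        echo small
    elif [ -f "$1.1.bt2l" ]; then
        echo large
    else
        echo none
    fi
}

index_kind lambda_virus
```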
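    Performance tuning:
    -p: number of threads
    -o/--offrate: when running bowtie2-build, set this lower than the default. This produces a larger index, which speeds up the -k and -a modes (where many alignments are reported per read). If the machine is short on memory, set it higher instead to reduce memory use.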
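The offrate trade-off is easier to see in numbers: bowtie2-build marks one Burrows-Wheeler row out of every 2^offrate with its reference position, and the default offrate is 5. The sketch below prints the spacing for a few values. The function name, the index name lambda_virus_dense and the file names are hypothetical, and the two commands at the end are only printed, not run.

```shell
# Number of rows per marked row for a given offrate
rows_per_mark() {
    echo $(( 1 << $1 ))
}

for o in 3 4 5 6; do
    echo "offrate $o: one marked row per $(rows_per_mark "$o") rows"
done

# Hypothetical commands: a denser index, then a multithreaded -k run against it
echo "bowtie2-build -o 4 lambda_virus.fa lambda_virus_dense"
echo "bowtie2 -p 4 -k 5 -x lambda_virus_dense -U reads_1.fq -S eg1.k5.sam"
```

Lowering the offrate by one doubles the number of marked rows, which is why the index grows while position lookups get faster.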


    Step 1: build the index


    bowtie2-build /home/pxy7896/Downloads/bowtie2/example/reference/lambda_virus.fa lambda_virus






    Step 2: align reads

    bowtie2 -x lambda_virus -U /home/pxy7896/Downloads/bowtie2/example/reads/reads_1.fq -S eg1.sam


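    -x: basename of the index. bowtie2 looks for the index files in the current directory first, then in the directory named by the BOWTIE2_INDEXES environment variable.

    -U: file(s) containing the unpaired reads to align. Separate multiple files with commas.

    -S: name of the output SAM file.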
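The lookup order for -x can be mimicked in a few lines, which helps when bowtie2 reports that it cannot find an index. This is only a sketch of the order described above, not bowtie2's own code: the function name is made up, and it checks only for the first small-index file (<basename>.1.bt2).

```shell
# Print the path prefix under which the index $1 is found, or fail
resolve_index() {
    if [ -f "./$1.1.bt2" ]; then
        echo "./$1"                                   # current directory first
    elif [ -n "${BOWTIE2_INDEXES:-}" ] && [ -f "$BOWTIE2_INDEXES/$1.1.bt2" ]; then
        echo "$BOWTIE2_INDEXES/$1"                    # then BOWTIE2_INDEXES
    else
        echo "index $1 not found" >&2
        return 1
    fi
}

resolve_index lambda_virus || true
```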


    The alignments are written to eg1.sam. To look at them: head eg1.sam


    SAM format: (haven't read yet)


    Step 3: align paired-end reads

    bowtie2 -x lambda_virus -1 /home/pxy7896/Downloads/bowtie2/example/reads/reads_1.fq -2 /home/pxy7896/Downloads/bowtie2/example/reads/reads_2.fq -S eg2.sam


    Step 4: long reads local alignment

    bowtie2 --local -x lambda_virus -U /home/pxy7896/Downloads/bowtie2/example/reads/longreads.fq -S eg3.sam



    Step 5: convert SAM to BAM (the binary form of SAM)

    samtools view -b /home/pxy7896/Desktop/eg/eg2.sam -o eg2.bam

    Step 6: sort the BAM file into eg2.sorted (the result is compressed, so it is well suited to storage)

    samtools sort eg2.bam -o eg2.sorted

    Step 7: generate variant calls in VCF format

    It turned out that bcftools (for calling variants and manipulating VCF and BCF files) was not installed:

    sudo apt install bcftools


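    samtools mpileup -u -f $BT2_HOME/example/reference/lambda_virus.fa eg2.sorted | bcftools call -m -v -O b -o eg2.raw.bcf
    (bcftools view only converts between formats; bcftools call does the actual variant calling, and -O b writes BCF.)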








