• [samtools] Converting between SAM and BAM formats, extracting unmapped reads, and converting to FASTQ


    Converting a SAM file to a BAM file

    First, check whether the SAM file has a header by running the Unix command

    head test.sam

    The first lines printed should start with an "@" sign, which marks a header line. If no lines start with "@", the header information is most likely missing.

    If the header information is absent from the SAM file, use the command below, where reference.fa is the reference FASTA file used to map the reads:

    samtools view -bT reference.fa test.sam > test.bam

    If the header information is available:

    samtools view -bS test.sam > test.bam
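
    The two cases above can be combined into one small script. This is only a sketch: the function names has_sam_header and sam_to_bam are made up for illustration, and the check looks only at the first line of the file.

```shell
#!/bin/sh
# Hypothetical helpers (the names are illustrative, not part of samtools).

# Succeeds (exit status 0) if the first line of the SAM file is a header line.
has_sam_header() {
    head -n 1 "$1" | grep -q '^@'
}

# Convert a SAM file to BAM, choosing the command based on the header check.
# Usage: sam_to_bam input.sam output.bam reference.fa
sam_to_bam() {
    sam=$1
    bam=$2
    ref=$3
    if has_sam_header "$sam"; then
        samtools view -bS "$sam" > "$bam"
    else
        # No header: let samtools build one from the reference FASTA.
        samtools view -bT "$ref" "$sam" > "$bam"
    fi
}
```

    For example, sam_to_bam test.sam test.bam reference.fa runs the -bS form when test.sam starts with an "@" line and the -bT form otherwise.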

    Sorting a BAM file

    samtools sort test.bam -o test_sorted.bam

    Creating a BAM index file

    samtools index test_sorted.bam test_sorted.bai

    Converting a BAM file to a SAM file

    Note: remember to use -h to ensure the converted SAM file contains the header information. Generally, I suggest storing only sorted BAM files as they use much less disk space and are faster to process.

    samtools view -h NA06984.chrom16.ILLUMINA.bwa.CEU.low_coverage.20100517.bam > NA06984.chrom16.ILLUMINA.bwa.CEU.low_coverage.20100517.sam

    Simple stats

    samtools flagstat NA06984.chrom16.ILLUMINA.bwa.CEU.low_coverage.20100517.bam

    10182494 in total
    0 QC failure
    223627 duplicates
    9861117 mapped (96.84%)
    10095646 paired in sequencing
    5049066 read1
    5046580 read2
    8174084 properly paired (80.97%)
    9452892 with itself and mate mapped
    321377 singletons (3.18%)
    215316 with mate mapped to a different chr
    126768 with mate mapped to a different chr (mapQ>=5)

    For more statistics of SAM or BAM files, have a look at the SAMStat program.
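
    The percentages in the flagstat output can be recomputed from the raw counts as a sanity check. The sketch below uses a made-up helper called pct. Judging from the numbers above, the mapped percentage is taken over the total, while the properly paired and singleton percentages are taken over the reads paired in sequencing.

```shell
# Hypothetical helper: percentage of n out of d, with two decimals.
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", 100 * n / d }'
}

pct 9861117 10182494   # mapped / in total
pct 8174084 10095646   # properly paired / paired in sequencing
pct 321377  10095646   # singletons / paired in sequencing
```

    The three calls print 96.84, 80.97 and 3.18, matching the output shown.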

    Interpreting the BAM flags

    (A tool that explains the meaning of FLAG values is also available: https://www.plob.org/2012/02/04/1697.html)

    Here are some common BAM flags:

    163: 10100011 in binary

    147: 10010011 in binary

    99: 1100011 in binary

    83: 1010011 in binary

    Interpretation of 10100011 (reading the binary from right to left, i.e. starting with the least significant bit):

    1 the read is paired in sequencing, no matter whether it is mapped in a pair
    1 the read is mapped in a proper pair (depends on the protocol, normally inferred during alignment)
    0 the query sequence itself is unmapped
    0 the mate is unmapped
    0 strand of the query (0 for forward; 1 for reverse strand)
    1 strand of the mate
    0 the read is the first read in a pair
    1 the read is the second read in a pair

    163 second read of a pair on the positive strand with negative strand mate

    147 second read of a pair on the negative strand with positive strand mate

    99 first read of a pair on the forward strand with negative strand mate

    83 first read of a pair on the reverse strand with positive strand mate
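
    The same decoding can be done with shell arithmetic. The helper below is a sketch with a made-up name, decode_flag, and it covers only the eight bits listed above (the SAM format defines further bits that are not discussed here).

```shell
# Hypothetical helper: print the value and meaning of every bit set in a flag.
decode_flag() {
    flag=$1
    while read -r bit desc; do
        # A bit is set when the bitwise AND with its value is non-zero.
        if [ $(( flag & bit )) -ne 0 ]; then
            printf '%s\t%s\n' "$bit" "$desc"
        fi
    done <<'EOF'
1 read is paired in sequencing
2 read is mapped in a proper pair
4 read itself is unmapped
8 mate is unmapped
16 read is on the reverse strand
32 mate is on the reverse strand
64 first read in the pair
128 second read in the pair
EOF
}
```

    Running decode_flag 163 prints the lines for bits 1, 2, 32 and 128: a properly paired second read on the forward strand whose mate is on the reverse strand.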

    Extracting only the first read from paired end BAM files

    samtools view -h -f 0x0040 test.bam > test_first_pair.sam

    0x0040 is hexadecimal for 64 (i.e. 16 * 4), which is 1000000 in binary and corresponds to the first read in a pair.

    Filtering out unmapped reads in BAM files

    samtools view -h -F 4 blah.bam > blah_only_mapped.sam
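
    To see what -f and -F do, the filtering can be imitated on a plain-text SAM file with awk. This is an illustration only, not a replacement for samtools: the helper name sam_filter is made up, and it handles a single flag bit per option, given in decimal.

```shell
# Hypothetical helper: imitate `samtools view -h -f REQ -F EXCL` on SAM text.
# REQ and EXCL must each be a single power of two in decimal (e.g. 64 or 4);
# pass 0 to skip that test.
# Usage: sam_filter REQ EXCL file.sam
sam_filter() {
    awk -F '\t' -v req="$1" -v excl="$2" '
        function bit_set(flag, bit) { return int(flag / bit) % 2 == 1 }
        /^@/ { print; next }                       # keep header lines, like -h
        req  > 0 && !bit_set($2, req)  { next }    # -f: the bit must be set
        excl > 0 &&  bit_set($2, excl) { next }    # -F: the bit must be clear
        { print }
    ' "$3"
}
```

    sam_filter 64 0 test.sam keeps the header plus first-in-pair reads (like -f 0x0040), and sam_filter 0 4 test.sam keeps the header plus mapped reads (like -F 4).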

    Creating FASTQ files from a BAM file

    I found this great tool at http://www.hudsonalpha.org/gsl/software/bam2fastq.php

    For example, to extract ONLY unaligned reads from a BAM file:

    bam2fastq -o blah_unaligned.fastq --no-aligned blah.bam

    To extract ONLY aligned reads from a bam file:

    bam2fastq -o blah_aligned.fastq --no-unaligned blah.bam
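
    As an alternative to bam2fastq, recent samtools releases ship a fastq subcommand that accepts the same -f and -F flag filters as samtools view. The wrappers below are a sketch under that assumption (samtools 1.3 or later); the function names are made up for illustration.

```shell
# Hypothetical wrappers around `samtools fastq` (assumes samtools 1.3 or later).

# Write only unaligned reads (flag bit 4 set) to a FASTQ file.
# Usage: bam_unaligned_to_fastq in.bam out.fastq
bam_unaligned_to_fastq() {
    samtools fastq -f 4 "$1" > "$2"
}

# Write only aligned reads (flag bit 4 clear) to a FASTQ file.
# 0x904 = 0x4 | 0x100 | 0x800: it also skips secondary and supplementary
# alignments, so that each read is written at most once.
# Usage: bam_aligned_to_fastq in.bam out.fastq
bam_aligned_to_fastq() {
    samtools fastq -F 0x904 "$1" > "$2"
}
```

    For paired-end data, the -1 and -2 options of samtools fastq write the two reads of each pair to separate files.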

  • Original article: https://www.cnblogs.com/xiaofeiIDO/p/6424649.html