1、Goal of mapping
1)We want to assign each read to the gene it was derived from.
2)The result of the mapping is used to build a summary of the counts per gene: the count table.
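
To make the idea of a count table concrete, here is a minimal sketch in Python. The gene coordinates and read positions are invented for illustration, and the rule "a read counts for a gene only if it lies fully inside it" is a simplifying assumption; real counting tools handle strand, exons and ambiguous reads in more detail.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}

# Hypothetical mapped reads: (chromosome, start, end) of each alignment.
READS = [
    ("chr1", 120, 170),
    ("chr1", 450, 500),
    ("chr1", 600, 650),  # lies between the two genes, so it is not counted
    ("chr1", 900, 950),
]


def assign_gene(read, genes):
    """Return the name of the gene that fully contains the read, or None."""
    chrom, start, end = read
    for name, (g_chrom, g_start, g_end) in genes.items():
        if chrom == g_chrom and g_start <= start and end <= g_end:
            return name
    return None


def count_table(reads, genes):
    """Count how many reads were assigned to each gene."""
    counts = Counter({name: 0 for name in genes})
    for read in reads:
        gene = assign_gene(read, genes)
        if gene is not None:
            counts[gene] += 1
    return dict(counts)


print(count_table(READS, GENES))
```

Each row of the resulting table is one gene and its read count; with several samples, each sample would contribute one column.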
2、Different scenarios in RNA-seq
1)Reference genome sequence available
2)No reference genome sequence available
De novo assembly of the reads (Trinity transcriptome construction)
Map the reads to the assembly (RSEM mapper)
Extract count table (note: no removal of polyA is required. Computationally expensive!)
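3、Reads mapped to a reference genome
1、Key points during alignment
1)The reference is a single haplotype, whereas the sample carries a mixture of alleles; this leads to mismatches.
By comparison, polyploid individuals have a higher probability of mismatches during alignment.
2)Reads contain sequencing errors
The sequencer makes mistakes when calling bases, so the reads themselves contain base errors.
3)Reads are derived from mRNA, while the genome is DNA, so a read that spans an exon-exon junction has to be split across the intron (spliced alignment).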
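
Because the reads come from mRNA and the reference is genomic DNA, a read that crosses an exon-exon junction is reported with a gap in its alignment. In the SAM format this gap appears as an `N` operation in the CIGAR string. The sketch below is a small illustration in plain Python (the function names are my own, not from any library) that detects such spliced alignments.

```python
import re

# One CIGAR operation is a length followed by an operation letter (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def intron_lengths(cigar):
    """Lengths of the skipped reference regions (op 'N'), i.e. the introns a read spans."""
    return [length for length, op in parse_cigar(cigar) if op == "N"]


def is_spliced(cigar):
    """True if the alignment skips at least one region of the reference."""
    return bool(intron_lengths(cigar))


for cigar in ["100M", "50M1200N50M"]:
    print(cigar, "spliced" if is_spliced(cigar) else "unspliced", intron_lengths(cigar))
```

A mapper that cannot produce `N` operations would have to report junction reads as unmapped or badly mismatched, which is why splice-aware mappers are used for RNA-seq against a genome.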
4、Visualize SAM or BAM
The outcome of the alignment is a SAM or BAM file, which you can visualize in Galaxy or with a stand-alone viewer such as GenomeView or IGV.
Galaxy https://www.galaxyproject.org/ web-based
GenomeView stand-alone
IGV stand-alone
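
Before opening a file in a viewer it can help to know what one SAM record contains. SAM is tab-separated text with 11 mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). The sketch below parses one alignment line in plain Python; the example record is made up, and for real files a dedicated library would be the better choice.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """A subset of the 11 mandatory SAM fields."""
    qname: str  # read name
    flag: int   # bitwise flag
    rname: str  # reference sequence name
    pos: int    # 1-based leftmost mapping position
    mapq: int   # mapping quality
    cigar: str  # alignment description
    seq: str    # read sequence


def parse_sam_line(line):
    """Parse one alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        seq=fields[9],
    )


# Made-up record for illustration only.
example = "read1\t16\tchr1\t1000\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
record = parse_sam_line(example)
strand = "reverse" if record.flag & 0x10 else "forward"
print(record.rname, record.pos, record.cigar, strand)
```

The position and reference name are what a viewer uses to place the read on the genome; the flag bit 0x10 tells it on which strand to draw the read.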
5、Mapping QC
RSeQC http://rseqc.sourceforge.net/ After checking the mapping visually, compute further metrics with RSeQC
BAMQC http://qualimap.bioinfo.cipf.es/ mainly useful for DNA-seq
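
One of the simplest mapping QC metrics is the fraction of reads that mapped. As a rough illustration (this is not how RSeQC or Qualimap are implemented), the sketch below derives it from SAM FLAG values, using the flag bits defined in the SAM specification: 0x4 for unmapped, 0x100 for secondary and 0x800 for supplementary alignments. The list of flags is invented.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_rate(flags):
    """Fraction of primary records that are mapped.

    Secondary and supplementary alignments are skipped so that each read
    is counted once.
    """
    primary = [f for f in flags if not f & (SECONDARY | SUPPLEMENTARY)]
    if not primary:
        return 0.0
    mapped = sum(1 for f in primary if not f & UNMAPPED)
    return mapped / len(primary)


# Invented FLAG values: three mapped reads, one unmapped, one secondary alignment.
flags = [0, 16, 4, 256, 0]
print(f"mapping rate: {mapping_rate(flags):.2f}")
```

A low mapping rate is usually a sign to look again at the reference, the read trimming, or possible contamination before moving on to counting.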
Exercise: http://wiki.bits.vib.be/index.php/RNA-Seq_analysis_for_differential_expression#Mapping_processed_data