1. Generate the .fai and .dict files for the reference:
java -Xmx50g -jar picard.jar CreateSequenceDictionary R=/home/dklv/GW/software/bwa.kit-0.7.15_x64-linux/index_gencode_hg38/GRCh38.primary_assembly.genome.fa O=/home/dklv/GW/software/bwa.kit-0.7.15_x64-linux/index_gencode_hg38/GRCh38.primary_assembly.genome.dict
samtools faidx GRCh38.primary_assembly.genome.fa
Reference for the above: https://gatkforums.broadinstitute.org/gatk/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference
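The two commands above should leave a .dict file (extension replaced) and a .fa.fai file (suffix appended) next to the FASTA, which is the naming the referenced GATK article describes. A minimal sketch for checking that both exist before running GATK; the function names are hypothetical helpers, not part of picard, samtools, or GATK:

```shell
# Hypothetical helpers: derive the index names expected next to a reference FASTA.

dict_path_for() {
    # ref.fa -> ref.dict (the FASTA extension is replaced)
    printf '%s.dict\n' "${1%.*}"
}

fai_path_for() {
    # ref.fa -> ref.fa.fai (the suffix is appended)
    printf '%s.fai\n' "$1"
}

check_reference_indexes() {
    # Return 0 if both index files exist and are non-empty, 1 otherwise.
    local fa=$1 missing=0 f
    for f in "$(dict_path_for "$fa")" "$(fai_path_for "$fa")"; do
        if [ ! -s "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_reference_indexes GRCh38.primary_assembly.genome.fa && echo "reference is ready"
```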
2. Modify the read group (RG) information in the BAM file:
java -Xmx50g -jar picard.jar AddOrReplaceReadGroups ID=B1 PL=illumina LB=LaneX PU=NONE SM=B1 INPUT=$tbam OUTPUT=B1.bam
This step can be skipped if the RG information is already set correctly at mapping time.
About the RG ID: This tag identifies which read group each read belongs to, so each read group's ID must be unique. It is referenced both in the read group definition line in the file header (starting with @RG) and in the RG:Z tag for each read record. (Reference: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671?id=11015)
So make sure the RG ID is set correctly!
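To set the read group at mapping time instead, bwa mem accepts a read group header line through its -R option, with the fields separated by a literal backslash-t. A minimal sketch, assuming the same field values as the AddOrReplaceReadGroups command above; rg_line is a hypothetical helper, and the FASTQ file names in the usage comment are placeholders:

```shell
# Hypothetical helper: build an @RG header line in the form bwa mem -R expects
# (fields separated by a literal backslash-t, which bwa converts to tabs).

rg_line() {
    # Arguments: ID SM PL LB PU
    printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\\tLB:%s\\tPU:%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Usage (placeholder FASTQ names):
#   bwa mem -R "$(rg_line B1 B1 illumina LaneX NONE)" \
#       GRCh38.primary_assembly.genome.fa reads_1.fq.gz reads_2.fq.gz \
#       | samtools sort -o B1.bam -
```

With the read group written by the aligner, the AddOrReplaceReadGroups pass should no longer be needed.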
3. Run Mutect2:
./gatk --java-options "-Xmx40g" Mutect2 -R $fa -I $tbam -tumor $tbam --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter -O 1_PC.vcf.gz
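As far as I know, the -tumor argument is meant to hold the sample name stored in the SM field of the read group (B1 in step 2) rather than a file path, and newer GATK 4 releases treat the argument as optional, so check the Mutect2 documentation for your version. A minimal sketch for listing the SM values recorded in a BAM header; sample_names_from_header is a hypothetical helper that only parses header text from stdin:

```shell
# Hypothetical helper: read SAM header text on stdin and print the unique
# SM (sample name) values found on @RG lines.

sample_names_from_header() {
    awk -F'\t' '
        /^@RG/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) print substr($i, 4)
        }' | sort -u
}

# Usage:
#   samtools view -H "$tbam" | sample_names_from_header
```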
References:
https://gatk.broadinstitute.org/hc/en-us/articles/360037224712--Tool-Documentation-Index (index of all GATK tools; click Mutect2 there to open that tool's tutorial)
https://gatk.broadinstitute.org/hc/en-us/articles/360035531132 (the Mutect2 tutorial)
https://gatk.broadinstitute.org/hc/en-us/articles/360035889791?id=11136#2 (also a Mutect2 tutorial; although it is marked deprecated, I find it more detailed than the tutorial above.)
See also the section "Create a sites-only PoN with CreateSomaticPanelOfNormals" at https://gatk.broadinstitute.org/hc/en-us/articles/360035889791?id=11136#2 .