• 【转录组】使用hisat2+stringtie组合进行转录组定量分析


    转载于:https://www.ivistang.com/articles/312

    准备软件和数据

    安装miniconda

    wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    sh Miniconda3-latest-Linux-x86_64.sh
    source ~/.bashrc

    安装相关软件

    ###安装aspera加速下载
    conda install -c hcc aspera-cli -y
    ###安装samtools
    conda install -c bioconda samtools openssl=1.0 -y
    ###安装sratoolkit hisat2 stringtie
    conda install -c bioconda sra-tools hisat2 stringtie -y
    ###下载trimmomatic并解压
    wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.39.zip
    unzip Trimmomatic-0.39.zip

    注:如果调用samtools出现如下报错:samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
    问题出在openssl的版本,通过conda install openssl=1.0 -y 来确保安装openssl的1.0.2版本

    下载sra数据并转换成fastq

    使用prefetch命令进行下载,命令形如prefetch SRRXXXXXX SRRXXXXX。
    下载完成后,可以用tree来查看下载结果。
    使用fasterq-dump将sra文件转换成fastq:

    fasterq-dump -e 60 -S *.sra
     

    转录组定量

    所有sra数据进行trimmomatic修剪

    双端PE

    for i in *.sra
    do
    java -jar /data/software/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 60 ${i}_1.fastq ${i}_2.fastq ${i}_clean_1.fastq ${i}_unpaired_1.fastq ${i}_clean_2.fastq ${i}_unpaired_2.fastq ILLUMINACLIP:/data/software/Trimmomatic-0.39/adapters/TruSeq3-PE.fa:2:30:10:2:keepBothReads LEADING:3 TRAILING:3 MINLEN:36
    done

    单端SE

    for i in *.sra
    do
    java -jar /data/software/Trimmomatic-0.39/trimmomatic-0.39.jar SE -threads 60 ${i}.fastq  ${i}_clean.fastq ILLUMINACLIP:/data/software/Trimmomatic-0.39/adapters/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
    done

    Hisat2+stringtie代码举例

    构建索引Index

    hisat2-build Ghirsutum_527_v2.0.fa genome

    双端PE

    ref=Ghirsutum_527_v2.0.fa
    gff=Ghirsutum_527_v2.1.gene.gff3
    index=genome
    label=Ghir
    out=SRR5178068
    fq1="SRR5178068.sra_clean_1.fastq,SRR5186513.sra_clean_1.fastq"
    fq2="SRR5178068.sra_clean_2.fastq,SRR5186513.sra_clean_2.fastq"
    hisat2 --dta -p 60 -1 ${fq1} -2 ${fq2} -x ${index} -S ${out}.sam
    samtools view -@ 60 -b -S ${out}.sam > ${out}.bam
    samtools sort -@ 60 -m 4G -o ${out}.sorted.bam ${out}.bam
    stringtie -p 60 -G ${gff} -B -e -l ${label} -o ${out}/${out}.gtf ${out}.sorted.bam

    单端SE

    ref=G.arboreum_BGI-A2_assembly.fa
    gff=G.arboreum_BGI-A2_v1.0_transcripts.gff3
    index=genome
    label=Garb
    out=SRR617065
    fq="SRR617065.sra_clean.fastq,SRR617067.sra_clean.fastq"
    hisat2 --dta -p 60 -U ${fq} -x ${index} -S ${out}.sam
    samtools view -@ 60 -b -S ${out}.sam > ${out}.bam
    samtools sort -@ 60 -m 4G -o ${out}.sorted.bam ${out}.bam
    stringtie -p 60 -G ${gff} -B -e -l ${label} -o ${out}/${out}.gtf ${out}.sorted.bam注
     
    注:最后输出文件包含我想要的表达量gtf文件(包括FPKM,TPM)以及几个.ctab中间文件。我用了60个线程,注意修改代码中的线程数。
  • 相关阅读:
    软工人日常
    11.5
    11.4
    11.3
    11.2阅读笔记
    11.1阅读笔记
    10.31 异常
    10.30动手动脑
    10.29
    10.28
  • 原文地址:https://www.cnblogs.com/triple-y/p/14246809.html
Copyright © 2020-2023  润新知