• [2] Protein identification software: Comet


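    1. Introduction

    Official site: http://comet-ms.sourceforge.net/

    • First developed in 1993 and still actively updated; free and open source
    • Runs on Windows and Linux
    • Multithreaded, with support for several input and output formats: input spectrum files (mzXML, mzML, mgf, or ms2/cms2); output files such as .pep.xml/.pin.xml/.sqt/.out

    Running it: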

    comet.exe input.mzXML
    comet.exe input.mzML
    comet.exe input.mgf
    comet.exe input.ms2
    comet.exe *.ms2   # multiple input files are supported
    

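    Other tools that integrate Comet:

    2. Download and installation

    GUI version: download setup.exe. User guide: http://comet-ms.sourceforge.net/CometUI/CometUI-User-Guide.pdf
    Linux version: https://sourceforge.net/projects/comet-ms/files/

    As with the other posts in this series, only the Linux version is tried here.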

    unzip  comet_2019015.zip
    

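    3. Usage

    Running Comet is simple: call the executable, followed by the parameter file and the spectrum file(s).

    The parameter file is documented in detail on the official site under "Search parameters". To cover instruments with different MS1 and MS2 mass accuracy, three example parameter files are provided:
    ●  comet.params.low-low: low-resolution MS1 and MS2, e.g. ion trap
    ●  comet.params.high-low: high-resolution MS1 and low-resolution MS2, e.g. Velos-Orbitrap
    ●  comet.params.high-high: high-resolution MS1 and MS2, e.g. Q Exactive or Q-Tof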
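
    The three files differ mainly in how tight the mass tolerances are. As a rough illustration of what a relative tolerance means, the sketch below converts a ppm tolerance (such as peptide_mass_tolerance = 20 with peptide_mass_units = 2, the values in the high-resolution example in this post) into an absolute mass window. The helper name ppm_window and the mass of 2000.0 are made up for illustration; this is not Comet's internal code.

```python
def ppm_window(mass, tol_ppm):
    """Return (low, high) bounds of a symmetric ppm window around `mass`."""
    delta = mass * tol_ppm / 1e6  # ppm = parts per million of the mass itself
    return mass - delta, mass + delta


# Hypothetical peptide mass of 2000.0 with a 20 ppm tolerance.
low, high = ppm_window(2000.0, 20.0)
print(f"{low:.4f} .. {high:.4f}")
```

    At 20 ppm the window for a mass of 2000 is only 0.04 wide on each side, which is why such a setting only makes sense for high-resolution MS1 data.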

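    Taking a high-resolution instrument as an example, most of the parameters below can be left at their defaults, apart from the database setting: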

    # comet_version 2019.01 rev. 0
    # Comet MS/MS search engine parameters file.
    # Everything following the '#' symbol is treated as a comment.
    
    database_name = /some/path/db.fasta
    decoy_search = 0                       # 0=no (default), 1=concatenated search, 2=separate search
    peff_format = 0                        # 0=no (normal fasta, default), 1=PEFF PSI-MOD, 2=PEFF Unimod
    peff_obo =                             # path to PSI Mod or Unimod OBO file
    
    num_threads = 0                        # 0=poll CPU to set num threads; else specify num threads directly (max 128)
    
    #
    # masses
    #
    peptide_mass_tolerance = 20.00
    peptide_mass_units = 2                 # 0=amu, 1=mmu, 2=ppm
    mass_type_parent = 1                   # 0=average masses, 1=monoisotopic masses
    mass_type_fragment = 1                 # 0=average masses, 1=monoisotopic masses
    precursor_tolerance_type = 1           # 0=MH+ (default), 1=precursor m/z; only valid for amu/mmu tolerances
    isotope_error = 3                      # 0=off, 1=0/1 (C13 error), 2=0/1/2, 3=0/1/2/3, 4=-8/-4/0/4/8 (for +4/+8 labeling)
    
    #
    # search enzyme
    #
    search_enzyme_number = 1               # choose from list at end of this params file
    search_enzyme2_number = 0              # second enzyme; set to 0 if no second enzyme
    num_enzyme_termini = 2                 # 1 (semi-digested), 2 (fully digested, default), 8 C-term unspecific , 9 N-term unspecific
    allowed_missed_cleavage = 2            # maximum value is 5; for enzyme search
    
    #
    # Up to 9 variable modifications are supported
    # format:  <mass> <residues> <0=variable/else binary> <max_mods_per_peptide> <term_distance> <n/c-term> <required> <neutral_loss>
    #     e.g. 79.966331 STY 0 3 -1 0 0 97.976896
    #
    variable_mod01 = 15.9949 M 0 3 -1 0 0 0.0
    variable_mod02 = 0.0 X 0 3 -1 0 0 0.0
    variable_mod03 = 0.0 X 0 3 -1 0 0 0.0
    variable_mod04 = 0.0 X 0 3 -1 0 0 0.0
    variable_mod05 = 0.0 X 0 3 -1 0 0 0.0
    variable_mod06 = 0.0 X 0 3 -1 0 0 0.0
    variable_mod07 = 0.0 X 0 3 -1 0 0 0.0
    variable_mod08 = 0.0 X 0 3 -1 0 0 0.0
    variable_mod09 = 0.0 X 0 3 -1 0 0 0.0
    max_variable_mods_in_peptide = 5
    require_variable_mod = 0
    
    #
    # fragment ions
    #
    # ion trap ms/ms:  1.0005 tolerance, 0.4 offset (mono masses), theoretical_fragment_ions = 1
    # high res ms/ms:    0.02 tolerance, 0.0 offset (mono masses), theoretical_fragment_ions = 0, spectrum_batch_size = 10000
    #
    fragment_bin_tol = 0.02                # binning to use on fragment ions
    fragment_bin_offset = 0.0              # offset position to start the binning (0.0 to 1.0)
    theoretical_fragment_ions = 0          # 0=use flanking peaks, 1=M peak only
    use_A_ions = 0
    use_B_ions = 1
    use_C_ions = 0
    use_X_ions = 0
    use_Y_ions = 1
    use_Z_ions = 0
    use_NL_ions = 0                        # 0=no, 1=yes to consider NH3/H2O neutral loss peaks
    
    #
    # output
    #
    output_sqtstream = 0                   # 0=no, 1=yes  write sqt to standard output
    output_sqtfile = 0                     # 0=no, 1=yes  write sqt file
    output_txtfile = 0                     # 0=no, 1=yes  write tab-delimited txt file
    output_pepxmlfile = 1                  # 0=no, 1=yes  write pep.xml file
    output_percolatorfile = 0              # 0=no, 1=yes  write Percolator tab-delimited input file
    print_expect_score = 1                 # 0=no, 1=yes to replace Sp with expect in out & sqt
    num_output_lines = 5                   # num peptide results to show
    show_fragment_ions = 0                 # 0=no, 1=yes for out files only
    
    sample_enzyme_number = 1               # Sample enzyme which is possibly different than the one applied to the search.
                                           # Used to calculate NTT & NMC in pepXML output (default=1 for trypsin).
    
    #
    # mzXML parameters
    #
    scan_range = 0 0                       # start and end scan range to search; either entry can be set independently
    precursor_charge = 0 0                 # precursor charge range to analyze; does not override any existing charge; 0 as 1st entry ignores parameter
    override_charge = 0                    # 0=no, 1=override precursor charge states, 2=ignore precursor charges outside precursor_charge range, 3=see online
    ms_level = 2                           # MS level to analyze, valid are levels 2 (default) or 3
    activation_method = ALL                # activation method; used if activation method set; allowed ALL, CID, ECD, ETD, ETD+SA, PQD, HCD, IRMPD
    
    #
    # misc parameters
    #
    digest_mass_range = 600.0 5000.0       # MH+ peptide mass range to analyze
    peptide_length_range = 5 63            # minimum and maximum peptide length to analyze (default 1 63; max length 63)
    num_results = 100                      # number of search hits to store internally
    max_duplicate_proteins = 20            # maximum number of protein names to report for each peptide identification; -1 reports all duplicates
    skip_researching = 1                   # for '.out' file output only, 0=search everything again (default), 1=don't search if .out exists
    max_fragment_charge = 3                # set maximum fragment charge state to analyze (allowed max 5)
    max_precursor_charge = 6               # set maximum precursor charge state to analyze (allowed max 9)
    nucleotide_reading_frame = 0           # 0=proteinDB, 1-6, 7=forward three, 8=reverse three, 9=all six
    clip_nterm_methionine = 0              # 0=leave sequences as-is; 1=also consider sequence w/o N-term methionine
    spectrum_batch_size = 15000            # max. # of spectra to search at a time; 0 to search the entire scan range in one loop
    decoy_prefix = DECOY_                  # decoy entries are denoted by this string which is pre-pended to each protein accession
    equal_I_and_L = 1                      # 0=treat I and L as different; 1=treat I and L as same
    output_suffix =                        # add a suffix to output base names i.e. suffix "-C" generates base-C.pep.xml from base.mzXML input
    mass_offsets =                         # one or more mass offsets to search (values substracted from deconvoluted precursor mass)
    precursor_NL_ions =                    # one or more precursor neutral loss masses, will be added to xcorr analysis
    
    #
    # spectral processing
    #
    minimum_peaks = 10                     # required minimum number of peaks in spectrum to search (default 10)
    minimum_intensity = 0                  # minimum intensity value to read in
    remove_precursor_peak = 0              # 0=no, 1=yes, 2=all charge reduced precursor peaks (for ETD), 3=phosphate neutral loss peaks
    remove_precursor_tolerance = 1.5       # +- Da tolerance for precursor removal
    clear_mz_range = 0.0 0.0               # for iTRAQ/TMT type data; will clear out all peaks in the specified m/z range
    
    #
    # additional modifications
    #
    
    add_Cterm_peptide = 0.0
    add_Nterm_peptide = 0.0
    add_Cterm_protein = 0.0
    add_Nterm_protein = 0.0
    
    add_G_glycine = 0.0000                 # added to G - avg.  57.0513, mono.  57.02146
    add_A_alanine = 0.0000                 # added to A - avg.  71.0779, mono.  71.03711
    add_S_serine = 0.0000                  # added to S - avg.  87.0773, mono.  87.03203
    add_P_proline = 0.0000                 # added to P - avg.  97.1152, mono.  97.05276
    add_V_valine = 0.0000                  # added to V - avg.  99.1311, mono.  99.06841
    add_T_threonine = 0.0000               # added to T - avg. 101.1038, mono. 101.04768
    add_C_cysteine = 57.021464             # added to C - avg. 103.1429, mono. 103.00918
    add_L_leucine = 0.0000                 # added to L - avg. 113.1576, mono. 113.08406
    add_I_isoleucine = 0.0000              # added to I - avg. 113.1576, mono. 113.08406
    add_N_asparagine = 0.0000              # added to N - avg. 114.1026, mono. 114.04293
    add_D_aspartic_acid = 0.0000           # added to D - avg. 115.0874, mono. 115.02694
    add_Q_glutamine = 0.0000               # added to Q - avg. 128.1292, mono. 128.05858
    add_K_lysine = 0.0000                  # added to K - avg. 128.1723, mono. 128.09496
    add_E_glutamic_acid = 0.0000           # added to E - avg. 129.1140, mono. 129.04259
    add_M_methionine = 0.0000              # added to M - avg. 131.1961, mono. 131.04048
    add_O_ornithine = 0.0000               # added to O - avg. 132.1610, mono  132.08988
    add_H_histidine = 0.0000               # added to H - avg. 137.1393, mono. 137.05891
    add_F_phenylalanine = 0.0000           # added to F - avg. 147.1739, mono. 147.06841
    add_U_selenocysteine = 0.0000          # added to U - avg. 150.0379, mono. 150.95363
    add_R_arginine = 0.0000                # added to R - avg. 156.1857, mono. 156.10111
    add_Y_tyrosine = 0.0000                # added to Y - avg. 163.0633, mono. 163.06333
    add_W_tryptophan = 0.0000              # added to W - avg. 186.0793, mono. 186.07931
    add_B_user_amino_acid = 0.0000         # added to B - avg.   0.0000, mono.   0.00000
    add_J_user_amino_acid = 0.0000         # added to J - avg.   0.0000, mono.   0.00000
    add_X_user_amino_acid = 0.0000         # added to X - avg.   0.0000, mono.   0.00000
    add_Z_user_amino_acid = 0.0000         # added to Z - avg.   0.0000, mono.   0.00000
    
    #
    # COMET_ENZYME_INFO _must_ be at the end of this parameters file
    #
    [COMET_ENZYME_INFO]
    0.  No_enzyme              0      -           -
    1.  Trypsin                1      KR          P
    2.  Trypsin/P              1      KR          -
    3.  Lys_C                  1      K           P
    4.  Lys_N                  0      K           -
    5.  Arg_C                  1      R           P
    6.  Asp_N                  0      D           -
    7.  CNBr                   1      M           -
    8.  Glu_C                  1      DE          P
    9.  PepsinA                1      FL          P
    10. Chymotrypsin           1      FWYL        P
    

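    In most cases you only need to set the database (database_name), the number of threads (num_threads), and the specific enzyme (search_enzyme_number = 1). For peptidomics, use a non-specific search instead: search_enzyme_number = 0.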
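
    When many searches share the same template, these few settings can be changed programmatically. The following is a minimal sketch, assuming the params file uses plain `name = value  # comment` lines as shown above; the function name set_params and the sample values are hypothetical, and Comet itself provides no such helper.

```python
import re


def set_params(text, updates):
    """Return params text with selected `name = value` lines rewritten.

    Trailing comments are kept; all other lines pass through unchanged.
    """
    out = []
    for line in text.splitlines():
        m = re.match(r"^(\w+)\s*=\s*([^#]*?)(\s*#.*)?$", line)
        if m and m.group(1) in updates:
            comment = m.group(3) or ""
            line = f"{m.group(1)} = {updates[m.group(1)]}{comment}"
        out.append(line)
    return "\n".join(out) + "\n"


# A few lines in the same layout as the params file above.
sample = (
    "database_name = /some/path/db.fasta\n"
    "num_threads = 0                        # 0=poll CPU to set num threads\n"
    "search_enzyme_number = 1               # choose from list at end of this params file\n"
)
updated = set_params(sample, {"num_threads": 8, "search_enzyme_number": 0})
print(updated)
```

    For a real file, read it with open(...).read(), pass the text through set_params, and write the result to a new params file so that the original template stays untouched.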

    Run command:

    comet.2019015.linux.exe -P./comet.params.high-high test_1.mzML
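
    The same call can be scripted, for example to loop over several runs. This is a small sketch that only builds the argument list in the form used above (executable, -P followed directly by the params file, then the spectrum files); the function names are made up, and actually running the search requires the Comet binary to be reachable from the working directory or PATH.

```python
import subprocess


def comet_command(executable, params_file, spectrum_files):
    """Build the argument list: executable, -P<params>, then spectrum files."""
    return [executable, f"-P{params_file}", *spectrum_files]


def run_comet(executable, params_file, spectrum_files):
    """Run the search; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(comet_command(executable, params_file, spectrum_files), check=True)


cmd = comet_command("comet.2019015.linux.exe", "./comet.params.high-high", ["test_1.mzML"])
print(" ".join(cmd))
```

    Passing the arguments as a list avoids shell quoting problems when file names contain spaces.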
    

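    Spectrum files can be in mzXML, mzML, mgf, or ms2/cms2 format; raw files (.raw) from Orbitrap high-resolution instruments have to be converted first. For converting raw mass spectrometry data on Linux, see the post "[ThermoRawFileParser] Converting mass spectrometry raw files to mgf" (setting the -f option to 1 yields mzML).

    4. Results

    The run produces files such as test_1.pep.xml, test_1.pin, and test_1.txt (the .txt and .pin files are written only when output_txtfile and output_percolatorfile are set to 1 in the params file). The txt file is the main one to look at, as it holds the identification results.
    First line: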

    CometVersion 2019.01 rev. 5     test_1       07/28/2020, 02:12:23 PM  /path/to/database/test.fasta
    


    Result column headers:

          1 scan
          2 num
          3 charge
          4 exp_neutral_mass
          5 calc_neutral_mass
          6 e-value
          7 xcorr
          8 delta_cn
          9 sp_score
         10 ions_matched
         11 ions_total
         12 plain_peptide
         13 modified_peptide
         14 prev_aa
         15 next_aa
         16 protein
         17 protein_count
         18 modifications
    

    The results usually need further post-processing, depending on your needs.
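
    As one possible starting point for such post-processing, the sketch below reads the tab-delimited txt output and keeps the top-ranked hit per spectrum below an e-value cutoff. It assumes the layout described above (one header line with version information, then a line of column names, then one row per hit) and that num is the hit rank within a scan. The cutoff of 0.01 is arbitrary and is not a substitute for proper FDR control, and the example rows are invented and show only some of the 18 columns.

```python
import csv
import io


def read_comet_txt(handle, max_evalue=0.01):
    """Yield top-ranked rows (num == 1) whose e-value is at or below max_evalue."""
    next(handle)  # skip the first line (Comet version, base name, date, database)
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if int(row["num"]) == 1 and float(row["e-value"]) <= max_evalue:
            yield row


# Invented example data in the assumed layout.
example = (
    "CometVersion 2019.01 rev. 5\ttest_1\t07/28/2020, 02:12:23 PM\t/path/to/database/test.fasta\n"
    "scan\tnum\tcharge\te-value\txcorr\tplain_peptide\tprotein\n"
    "101\t1\t2\t1.2E-05\t3.41\tPEPTIDEK\tprotA\n"
    "101\t2\t2\t4.0E+00\t0.95\tPEPTLDEK\tprotB\n"
    "102\t1\t3\t2.5E+00\t1.10\tAAAAK\tDECOY_protC\n"
)
hits = list(read_comet_txt(io.StringIO(example)))
print([(h["scan"], h["plain_peptide"]) for h in hits])
```

    For a real result file, replace the StringIO object with open("test_1.txt", newline="").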


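    Summary of the proteomics identification and quantification software series:
    [1] Protein identification software: X!Tandem
    [2] Protein identification software: Comet
    [3] Protein identification software: Mascot
    [4] Proteomics identification software: MSGFPlus
    [5] Proteomics identification and quantification software: PD
    [6] Proteomics identification and quantification software: MaxQuant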

  • Original post: https://www.cnblogs.com/jessepeng/p/13577637.html