• Multiple sequence alignment Benchmark Data set



    1. Summary of sequence alignment benchmark data sets: http://www.drive5.com/bench/

    This is a collection of multiple alignment benchmarks in a uniform
    format that is convenient for further analysis. All files are in
    FASTA format, with upper-case letters used to indicate aligned
    columns.
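    Because every benchmark file is plain FASTA and letter case carries meaning, a small reader is enough to get started. The sketch below (Python, standard library only) parses FASTA text and lists the columns that contain at least one upper-case residue. Treating a column as aligned when any of its residues is upper-case is an assumption about how the case flag is applied, and the function names read_fasta and aligned_columns are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (label, sequence) tuples."""
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def aligned_columns(records):
    """Return indices of columns holding at least one upper-case residue.

    Assumption: upper case anywhere in a column marks the column as aligned;
    gap characters such as '-' or '.' are never upper-case, so they are ignored.
    """
    seqs = [seq for _, seq in records]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    return [col for col in range(length) if any(s[col].isupper() for s in seqs)]


example = """>seq1
ACDef-GH
>seq2
ACD-fgGH
"""
recs = read_fasta(example)
print(aligned_columns(recs))  # [0, 1, 2, 6, 7]
```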

    See References below for original sources of benchmark data.

    Benchmarks are:

    --------------------------1---------------------------

    bali2dna
    BALIBASE v2, reverse-translated to DNA

    bali2dnaf
    bali2dna, with frame-shifts induced by random insertions of one
    or two nucleotides into the middle 50% of exactly one sequence
    in each set.

    bali3
    BALIBASE v3.

    bali3pdb
    BALIS, the structural subset of BALIBASE v3.

    bali3pdbm
    MU-BALIS, i.e. BALIS re-aligned by MUSTANG.

    ---------------------------2--------------------------

    ox
    OXBENCH.

    oxm
    MU-OXBENCH, i.e. OXBENCH re-aligned by MUSTANG.

    oxx
    OXBENCH-X, i.e. the extended set in OXBENCH.

    ---------------------------3--------------------------

    prefab4
    PREFAB v4.

    prefab4ref
    PREFAB-R, i.e. the pairwise reference alignments in PREFAB v4.

    prefab4refm
    MU-PREFAB-R, i.e. PREFAB-R re-aligned by MUSTANG.

    ---------------------------4--------------------------

    sabre
    Consistent multiple alignments constructed from SABMARK v1.65.

    sabrem
    MU-SABRE, i.e. SABRE re-aligned by MUSTANG.

    -----------------------------------------------------
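    To make the bali2dna and bali2dnaf descriptions concrete, here is a rough sketch of both transformations. The one-codon-per-amino-acid table, the hypothetical function names reverse_translate and insert_frameshift, and the choice to carry upper/lower case through reverse translation are assumptions for illustration only; the procedure actually used to build these benchmarks is not described here.

```python
import random

# Assumption: one arbitrary but valid codon per amino acid. The codons used
# to build bali2dna are not documented here.
CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "TCT", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def reverse_translate(aligned_protein):
    """Turn an aligned protein row into an aligned DNA row.

    Each residue becomes a codon (lower-case residues give lower-case codons)
    and each gap becomes three gaps, so column structure is preserved.
    """
    out = []
    for ch in aligned_protein:
        if ch in "-.":
            out.append(ch * 3)
        else:
            codon = CODON[ch.upper()]
            out.append(codon if ch.isupper() else codon.lower())
    return "".join(out)


def insert_frameshift(seqs, rng):
    """Insert 1 or 2 random nucleotides into the middle 50% of one sequence.

    seqs is a list of unaligned DNA strings. Returns a new list plus the index
    of the sequence that was modified; the input list is left unchanged.
    """
    idx = rng.randrange(len(seqs))
    s = seqs[idx]
    lo, hi = len(s) // 4, (3 * len(s)) // 4
    pos = rng.randint(lo, hi)  # insertion point inside the middle half
    extra = "".join(rng.choice("ACGT") for _ in range(rng.choice((1, 2))))
    shifted = list(seqs)
    shifted[idx] = s[:pos] + extra + s[pos:]
    return shifted, idx


dna = reverse_translate("MKv-W")
print(dna)  # ATGAAAgtg---TGG
```

    Inserting one or two bases (never three) guarantees that the reading frame downstream of the insertion point is broken, which is what makes bali2dnaf harder than bali2dna.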

    Directory structure under each benchmark is:

    in/
    Input sequences.

    ref/
    Reference alignments. Upper-case regions indicate conservative
    regions that are intended for use in assessment. Lower-case regions
    should not be used.

    info/
    Contains ids.txt (list of set identifiers that are filenames in ref/
    and in/), nrseqs.txt (number of sequences in each set), and
    pctids.txt (%id in conservative regions in each set).
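    Given that layout, one hedged way to walk a benchmark is to read info/ids.txt and open the matching files in in/ and ref/. The sketch assumes ids.txt holds whitespace-separated identifiers that are used verbatim as file names (no extension); the function name load_benchmark is hypothetical.

```python
from pathlib import Path


def load_benchmark(root):
    """Yield (set_id, input_fasta_text, reference_fasta_text) for each set.

    Assumes root contains in/, ref/ and info/ids.txt, and that each id in
    ids.txt is the exact file name in both in/ and ref/.
    """
    root = Path(root)
    ids = (root / "info" / "ids.txt").read_text().split()
    for set_id in ids:
        yield (
            set_id,
            (root / "in" / set_id).read_text(),
            (root / "ref" / set_id).read_text(),
        )
```

    For example, `for set_id, inp, ref in load_benchmark("bali3"): ...` would visit every set, assuming the archive was unpacked into a directory named bali3. Using a generator keeps memory use flat even for large benchmarks such as PREFAB.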

    Download page for qscore: http://www.drive5.com/bench/bench.tar.gz

    qscore is a quality-scoring program that compares two multiple sequence alignments: an alignment to be evaluated (the "test" alignment) and a second alignment that is believed to be correct (the "reference" alignment). The program outputs the following scores:
    - The PREFAB Q score (also known as the BAliBASE SPS score or the developer score).
    - The Modeler score.
    - The Cline et al. shift score.
    - The BAliBASE TC (total column) score.
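    The first and last of these scores can be sketched in a few lines. The code below is not qscore's implementation; it follows the common definitions, where Q is the fraction of residue pairs aligned in the reference that the test alignment also aligns, and TC is the fraction of reference columns reproduced in full. By default only upper-case (conservative) reference columns are scored, in line with the note under ref/. Sequence labels are assumed to be unique and identical in both alignments, '-' and '.' are assumed to be the gap characters, and all function names are hypothetical.

```python
from itertools import combinations

GAP_CHARS = set("-.")  # assumption: '-' and '.' mark gaps


def column_residues(records):
    """Per column, return (residues, is_core).

    residues lists (label, residue_number) for every non-gap character;
    is_core is True if any character in the column is upper-case.
    """
    next_index = {label: 0 for label, _ in records}
    length = len(records[0][1])
    columns = []
    for col in range(length):
        residues, is_core = [], False
        for label, seq in records:
            ch = seq[col]
            if ch in GAP_CHARS:
                continue
            residues.append((label, next_index[label]))
            next_index[label] += 1
            is_core = is_core or ch.isupper()
        columns.append((residues, is_core))
    return columns


def aligned_pairs(columns):
    """All residue pairs that share a column, each pair in a fixed order."""
    pairs = set()
    for residues in columns:
        pairs.update(combinations(sorted(residues), 2))
    return pairs


def q_and_tc(test_records, ref_records, core_only=True):
    """Return (Q, TC) for a test alignment scored against a reference."""
    ref_cols = [res for res, core in column_residues(ref_records)
                if core or not core_only]
    test_cols = [res for res, _ in column_residues(test_records)]

    # Q: share of reference residue pairs that the test also aligns.
    ref_pairs = aligned_pairs(ref_cols)
    q = len(ref_pairs & aligned_pairs(test_cols)) / len(ref_pairs) if ref_pairs else 0.0

    # TC: a reference column counts if all its residues sit in one test column.
    test_col_of = {r: i for i, res in enumerate(test_cols) for r in res}
    scored = [res for res in ref_cols if len(res) > 1]
    correct = sum(len({test_col_of[r] for r in res}) == 1 for res in scored)
    tc = correct / len(scored) if scored else 0.0
    return q, tc


ref = [("a", "AC-D"), ("b", "ACED")]
test = [("a", "A-CD"), ("b", "ACED")]
print(q_and_tc(test, ref))  # Q = 2/3, TC = 2/3 for this toy example
```

    In the toy example the test alignment shifts residue C of sequence a by one column, which breaks one of the three reference pairs (lowering Q) and one of the three scored reference columns (lowering TC).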


    BAliBASE benchmark database: http://www.lbgi.fr/balibase/


    References
    ----------

    Thompson JD, Koehl P, Ripp R, Poch O (2005) BAliBASE 3.0: latest
    developments of the multiple sequence alignment benchmark. Proteins
    61: 127-136.

    Bahr A, Thompson JD, Thierry JC, Poch O (2001) BAliBASE (Benchmark
    Alignment dataBASE): enhancements for repeats, transmembrane
    sequences and circular permutations. Nucleic Acids Res 29: 323-326.

    Thompson JD, Plewniak F, Poch O (1999) BAliBASE: a benchmark
    alignment database for the evaluation of multiple alignment programs.
    Bioinformatics 15: 87-88.

    Van Walle I, Lasters I, Wyns L (2005) SABmark--a benchmark for
    sequence alignment that covers the entire known fold space.
    Bioinformatics 21: 1267-1268.

    Raghava GP, Searle SM, Audley PC, Barber JD, Barton GJ (2003)
    OXBench: a benchmark for evaluation of protein multiple sequence
    alignment accuracy. BMC Bioinformatics 4: 47.

    Edgar RC (2004) MUSCLE: multiple sequence alignment with high
    accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.

  • Original article: https://www.cnblogs.com/tsingke/p/5709360.html