• String Algorithm


    KMP

    char  a b a b a b c a
    index 0 1 2 3 4 5 6 7
    value 0 0 1 2 3 4 0 1

    "slash"

    proper prefix: s, sl, sla, slas

    proper suffix: h, sh, ash, lash

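    Partial match table (also called the failure function or the next array):

    Each index stands for a substring of the pattern: the characters at positions [0, index].

    value = length of the longest string that is both a "proper prefix" and a "proper suffix" of that substring

    Example:
    At index = 3 the substring is "abab". Proper prefixes: a, ab, aba; proper suffixes: b, ab, bab.

    value = length of "ab" = 2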
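
    The definition can be checked by brute force. The sketch below is not from the original post: naive_partial_match_table is a hypothetical helper that compares every proper prefix with the suffix of the same length, longest first. It is cubic in the pattern length, so it is only useful for verifying small tables by hand.

```c
#include <string.h>

/* Hypothetical helper: fills table[0..m-1] directly from the definition.
 * table[i] = length of the longest proper prefix of pat[0..i] that is
 * also a proper suffix of pat[0..i]. O(m^3), meant for checking only. */
void naive_partial_match_table(const char *pat, int *table)
{
    int m = (int)strlen(pat);
    for (int i = 0; i < m; i++) {
        table[i] = 0;
        /* Try every proper length k < i + 1, longest first. */
        for (int k = i; k > 0; k--) {
            /* Prefix pat[0..k-1] against suffix pat[i-k+1..i]. */
            if (memcmp(pat, pat + (i - k + 1), (size_t)k) == 0) {
                table[i] = k;
                break;
            }
        }
    }
}
```

    For the pattern "abababca" this fills the table with 0 0 1 2 3 4 0 1, the same values as the value row at the top of this section.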

    Implementation:

    void computeLPSArray(char *pat, int M, int *lps)
    {
        // length of the previous longest prefix suffix
        int len = 0;

        lps[0] = 0; // lps[0] is always 0

        // the loop calculates lps[i] for i = 1 to M-1
        int i = 1;
        while (i < M)
        {
            if (pat[i] == pat[len])
            {
                len++;
                lps[i] = len;
                i++;
            }
            else // (pat[i] != pat[len])
            {
                // This is tricky. Consider the example
                // AAACAAAA and i = 7. The idea is similar
                // to the search step.
                if (len != 0)
                {
                    len = lps[len-1];

                    // Also, note that we do not increment
                    // i here
                }
                else // if (len == 0)
                {
                    lps[i] = 0;
                    i++;
                }
            }
        }
    }
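
    The fallback step len = lps[len-1] walks through every shorter prefix that is also a suffix, longest first. The sketch below makes that chain visible; build_lps and border_chain are hypothetical names, and build_lps is assumed to produce the same table as computeLPSArray. It assumes a non-empty pattern and 0 <= i < strlen(pat).

```c
#include <string.h>

/* Hypothetical helper: same table as computeLPSArray, written compactly. */
static void build_lps(const char *pat, int m, int *lps)
{
    int len = 0;
    lps[0] = 0;
    for (int i = 1; i < m; i++) {
        while (len > 0 && pat[i] != pat[len])
            len = lps[len - 1];      /* fall back to the next shorter border */
        if (pat[i] == pat[len])
            len++;
        lps[i] = len;
    }
}

/* Writes every length k > 0 such that the first k characters of
 * pat[0..i] equal its last k characters, longest first, and returns
 * how many were written. out needs room for i entries. */
int border_chain(const char *pat, int i, int *out)
{
    int m = (int)strlen(pat);
    int lps[m];                      /* C99 variable-length array */
    int count = 0;

    build_lps(pat, m, lps);
    for (int len = lps[i]; len > 0; len = lps[len - 1])
        out[count++] = len;
    return count;
}
```

    For "AAACAAAA" and i = 6 the chain is 3, 2, 1. At i = 7 the table construction first tries to extend length 3 (pat[3] is 'C', a mismatch), falls back to length 2 (pat[2] is 'A', a match) and stores lps[7] = 3.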

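    Shifting the pattern:

    After a mismatch that follows partial_match_length matched characters, if table[partial_match_length] > 1, the pattern may skip ahead by partial_match_length - table[partial_match_length - 1] characters.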
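
    The shift arithmetic can be written as a one-line function. This is a sketch under two assumptions that go beyond the text: pattern_shift is a hypothetical name, and the subtraction is applied for every partial_match_length > 0, with a shift of one character when nothing matched at all.

```c
/* Hypothetical helper: how far the pattern slides after a mismatch that
 * follows partial_match_length matched characters. table is the partial
 * match table of the pattern. */
int pattern_shift(const int *table, int partial_match_length)
{
    if (partial_match_length == 0)
        return 1;   /* nothing matched: move on by one character */
    return partial_match_length - table[partial_match_length - 1];
}
```

    With the table 0 0 1 2 3 4 0 1 for "abababca", a mismatch after 5 matched characters gives 5 - table[4] = 5 - 3 = 2, so the pattern slides two positions and the last three matched characters are reused.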

    KMP search implementation:

    #include <stdio.h>
    #include <string.h>

    void KMPSearch(char *pat, char *txt)
    {
        int M = strlen(pat);
        int N = strlen(txt);

        // create lps[] that will hold the longest prefix suffix
        // values for pattern
        int lps[M];

        // Preprocess the pattern (calculate lps[] array)
        computeLPSArray(pat, M, lps);

        int i = 0;  // index for txt[]
        int j = 0;  // index for pat[]
        while (i < N)
        {
            if (pat[j] == txt[i])
            {
                j++;
                i++;
            }

            if (j == M)
            {
                printf("Found pattern at index %d\n", i-j);
                j = lps[j-1];
            }

            // mismatch after j matches
            else if (i < N && pat[j] != txt[i])
            {
                // Do not match lps[0..lps[j-1]] characters,
                // they will match anyway
                if (j != 0)
                    j = lps[j-1];
                else
                    i = i+1;
            }
        }
    }
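
    KMPSearch prints each match, which is awkward to reuse. The variant below is a sketch, not the original author's code: kmp_find_all is a hypothetical name, it builds the table inline, and it stores match positions in a caller-supplied array instead of printing them.

```c
#include <string.h>

/* Hypothetical variant of KMPSearch: stores match positions in out
 * (at most max_out of them) and returns the total number of matches. */
int kmp_find_all(const char *txt, const char *pat, int *out, int max_out)
{
    int n = (int)strlen(txt);
    int m = (int)strlen(pat);
    if (m == 0 || m > n)
        return 0;

    int lps[m];                       /* C99 variable-length array */
    lps[0] = 0;
    for (int i = 1, len = 0; i < m; i++) {
        while (len > 0 && pat[i] != pat[len])
            len = lps[len - 1];
        if (pat[i] == pat[len])
            len++;
        lps[i] = len;
    }

    int count = 0;
    for (int i = 0, j = 0; i < n; i++) {
        while (j > 0 && txt[i] != pat[j])
            j = lps[j - 1];           /* reuse the longest border */
        if (txt[i] == pat[j])
            j++;
        if (j == m) {                 /* full match ending at txt[i] */
            if (count < max_out)
                out[count] = i - m + 1;
            count++;
            j = lps[j - 1];           /* keep going; allows overlaps */
        }
    }
    return count;
}
```

    Searching "ABABDABACDABABCABAB" for "ABABCABAB" reports one match at index 10. Searching "aaaa" for "aa" reports the overlapping matches 0, 1 and 2, because j is reset to lps[j-1] rather than to 0 after each hit.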

    A compact implementation:

    // answer() is supplied by the caller and receives the start index
    // of each match; this prototype is an assumed signature.
    void answer(int pos);

    // f must have room for strlen(P) + 1 ints.
    void preparation(char *P, int *f) {
        int m = strlen(P);
        f[0] = f[1] = 0;
        for (int i = 1; i < m; i++) {
            int j = f[i];
            while (j && P[i] != P[j])
                j = f[j];
            f[i + 1] = (P[i] == P[j]) ? j + 1 : 0;
        }
    }

    void KMP(char *T, char *P, int *f) {
        int n = strlen(T), m = strlen(P);
        preparation(P, f);
        int j = 0;
        for (int i = 0; i < n; i++) {
            // compare the text with P[j], not P[i]
            while (j && P[j] != T[i]) j = f[j];
            if (P[j] == T[i]) j++;
            if (j == m) answer(i - m + 1);
        }
    }
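
    The compact version indexes its table differently from computeLPSArray: f has one more entry, and f[i] describes the prefix of length i, so f[i + 1] equals lps[i]. The sketch below checks that relation for a given pattern; build_failure, build_lps_table and same_table_shifted are hypothetical names used only for this illustration.

```c
#include <string.h>

/* Failure function in the compact style: f has m + 1 entries and
 * f[i] describes the prefix of length i. Assumes m >= 1. */
static void build_failure(const char *P, int m, int *f)
{
    f[0] = f[1] = 0;
    for (int i = 1; i < m; i++) {
        int j = f[i];
        while (j && P[i] != P[j])
            j = f[j];
        f[i + 1] = (P[i] == P[j]) ? j + 1 : 0;
    }
}

/* Table in the computeLPSArray style: lps has m entries and
 * lps[i] describes the prefix ending at index i. Assumes m >= 1. */
static void build_lps_table(const char *P, int m, int *lps)
{
    int len = 0;
    lps[0] = 0;
    for (int i = 1; i < m; i++) {
        while (len > 0 && P[i] != P[len])
            len = lps[len - 1];
        if (P[i] == P[len])
            len++;
        lps[i] = len;
    }
}

/* Returns 1 when f[i + 1] == lps[i] for every index i, else 0. */
int same_table_shifted(const char *P)
{
    int m = (int)strlen(P);
    if (m == 0)
        return 1;                    /* nothing to compare */

    int f[m + 1], lps[m];            /* C99 variable-length arrays */
    build_failure(P, m, f);
    build_lps_table(P, m, lps);
    for (int i = 0; i < m; i++)
        if (f[i + 1] != lps[i])
            return 0;
    return 1;
}
```

    This is why the compact search falls back with j = f[j] where KMPSearch uses j = lps[j-1]: both read the same value.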

    References:

    http://jakeboxer.com/blog/2009/12/13/the-knuth-morris-pratt-algorithm-in-my-own-words/

    http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm

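    Aho-Corasick (AC) automaton

    Background: it is based on a finite-state automaton.

    The many names for KMP's partial match table, such as failure pointer and failure function, come from the AC automaton.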
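
    To show where the failure pointer comes from, here is a minimal sketch of an AC automaton. It is not taken from the original post or its references. All names (AcAutomaton, ac_init, ac_add, ac_build, ac_count) are hypothetical, and it assumes non-empty patterns, input limited to 'a'..'z', and a small fixed node limit.

```c
#include <string.h>

#define AC_ALPHA 26          /* assumption: input uses only 'a'..'z' */
#define AC_MAX_NODES 256     /* assumption: small fixed-size trie */

typedef struct {
    int next[AC_MAX_NODES][AC_ALPHA]; /* trie edges, later full transitions */
    int fail[AC_MAX_NODES];           /* failure pointer of each state */
    int hits[AC_MAX_NODES];           /* patterns ending at this state or
                                         anywhere on its failure chain */
    int size;                         /* number of states in use */
} AcAutomaton;

void ac_init(AcAutomaton *ac)
{
    memset(ac, 0, sizeof *ac);
    ac->size = 1;                     /* state 0 is the root */
}

/* Adds one pattern to the trie; call before ac_build.
 * Returns 0 if the node limit would be exceeded, 1 otherwise. */
int ac_add(AcAutomaton *ac, const char *pat)
{
    int s = 0;
    for (; *pat; pat++) {
        int c = *pat - 'a';
        if (ac->next[s][c] == 0) {
            if (ac->size >= AC_MAX_NODES)
                return 0;
            ac->next[s][c] = ac->size++;
        }
        s = ac->next[s][c];
    }
    ac->hits[s]++;
    return 1;
}

/* Breadth-first pass that sets the failure pointers and fills in the
 * missing edges, so scanning never has to loop over fail[]. */
void ac_build(AcAutomaton *ac)
{
    int queue[AC_MAX_NODES];
    int head = 0, tail = 0;

    for (int c = 0; c < AC_ALPHA; c++)
        if (ac->next[0][c])
            queue[tail++] = ac->next[0][c];   /* depth 1: fail = root */

    while (head < tail) {
        int s = queue[head++];
        ac->hits[s] += ac->hits[ac->fail[s]]; /* inherit shorter matches */
        for (int c = 0; c < AC_ALPHA; c++) {
            int t = ac->next[s][c];
            if (t) {
                ac->fail[t] = ac->next[ac->fail[s]][c];
                queue[tail++] = t;
            } else {
                ac->next[s][c] = ac->next[ac->fail[s]][c];
            }
        }
    }
}

/* Counts pattern occurrences in text, overlaps included. */
int ac_count(const AcAutomaton *ac, const char *text)
{
    int s = 0, total = 0;
    for (; *text; text++) {
        s = ac->next[s][*text - 'a'];
        total += ac->hits[s];
    }
    return total;
}
```

    With the patterns "he", "she", "his" and "hers", scanning "ushers" counts three occurrences: "she", "he" and "hers". With a single pattern the trie is a chain, and fail[] holds the same values as the failure function f in the compact KMP code.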

    References: http://www.cs.uku.fi/~kilpelai/BSA05/lectures/slides04.pdf

    http://www.cnblogs.com/en-heng/p/5247903.html

  • Original article: https://www.cnblogs.com/autoria/p/6013366.html