• Thoroughly Understanding the KMP Algorithm (Data Structures)


    How can we speed up the naive string-search algorithm? With KMP. There are other algorithms as well, which will be covered later.
      
     

    Knuth–Morris–Pratt string search algorithm

    Start at the left-hand end of the string, string[0], and try to match the pattern, working to the right.
    At each step we test whether string[i] == pattern[j].
     
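    Given a search pattern, pre-build a table next[] that says where to reset j after a mismatch. Entry next[j] is the one used when the mismatch happens just after pattern[j], i.e. at pattern position j+1.

    If the match fails at pattern position j > 0, keep i the same and reset j to next[j-1]. If it fails at j = 0, advance i by one.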
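    The search loop just described can be sketched in Python as follows. This is a minimal sketch that assumes a non-empty pattern and a table that has already been built. The name `kmp_search` is hypothetical. The table for "ABABAC" is copied from the worked example later in this post. With this table, a mismatch at pattern position j > 0 resets j to next[j-1] while i stays put.

```python
def kmp_search(text, pattern, nxt):
    """Return the start index of every occurrence of pattern in text.

    nxt[j] is the length of the longest prefix of pattern[0..j] that is
    also a suffix of it, excluding pattern[0..j] itself (next[j] above).
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    matches = []
    i, j = 0, 0                      # i indexes text, j indexes pattern
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):    # a full match ends at text[i-1]
                matches.append(i - j)
                j = nxt[j - 1]       # keep going, so overlapping matches are found
        elif j > 0:
            j = nxt[j - 1]           # mismatch: keep i, fall back in the pattern
        else:
            i += 1                   # mismatch at pattern[0]: move on in the text
    return matches


# Table for "ABABAC", taken from the worked example.
NEXT_ABABAC = [0, 0, 1, 2, 3, 0]
print(kmp_search("ABABABAC", "ABABAC", NEXT_ABABAC))   # [2]
```

    Note that i never moves backwards: every step either advances i or shrinks j. This is where the speed-up over the naive search comes from.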

     

     

    How to build the table

    Everything below is about how to build that table.
     

    Construct a table showing where to reset j to

    1. If string[i] != pattern[0], simply move on to i+1 in the string; j stays 0.
    2. If string[i] != pattern[1], we leave i the same and reset j = 0.

      pattern matched so far = 1
      string = ... 1100000

    3. If string[i] != pattern[2], we leave i the same and change j, but we need to consider repeats within pattern[0] .. pattern[1].

      pattern matched so far = 11
      string = ... 11100000
      i stays the same, j goes from 2 back to 1

      pattern matched so far = 10
      string = ... 10100000
      i stays the same, j goes from 2 back to 0

    4. If string[i] != pattern[j], we leave i the same and change j, but we need to consider repeats within pattern[0] .. pattern[j-1].
    Given a certain pattern, construct a table showing where to reset j to.
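    Cases 1 to 4 boil down to one rule. When j > 0, the new j is the length of the longest part of the already-matched text pattern[0] .. pattern[j-1] that is both a prefix and a suffix of it. The sketch below uses a deliberately slow brute-force search that mirrors the reasoning above. The helper names `longest_border` and `after_mismatch` are hypothetical.

```python
def longest_border(matched):
    """Length of the longest prefix of `matched` that is also a suffix,
    not counting `matched` itself (brute force)."""
    for k in range(len(matched) - 1, 0, -1):
        if matched[:k] == matched[-k:]:
            return k
    return 0


def after_mismatch(pattern, i, j):
    """New (i, j) after string[i] != pattern[j]."""
    if j == 0:
        return i + 1, 0                        # case 1: move on in the string
    return i, longest_border(pattern[:j])      # cases 2-4: keep i, reset j


# Case 3: "11" (or "10") matched, then pattern[2] mismatches at string[5].
print(after_mismatch("110", 5, 2))   # (5, 1): j goes from 2 back to 1
print(after_mismatch("100", 5, 2))   # (5, 0): j goes from 2 back to 0
```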

     

     

    Construct a table of next[j]

    For each j, figure out: 
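    next[j] = length of the longest prefix of "pattern[0] .. pattern[j-1]" that matches a suffix of "pattern[1] .. pattern[j]"
    next[j] = "the length of the longest matching substring"
    That is:
    1. the prefix must include pattern[0]
    2. the suffix must include pattern[j]
    3. the prefix and suffix must be different, i.e. neither may be the whole of pattern[0] .. pattern[j] (they may overlap)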
    Example for pattern "ABABAC":
     


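    When pattern[j+1] is compared with s[k] and they do not match:

    set j' = next[j] and compare pattern[j'] with s[k]. In effect, j' slides into the place that j+1 used to occupy.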

    j                              0      1      2      3      4       5
    substring 0 to j               A      AB     ABA    ABAB   ABABA   ABABAC
    longest prefix-suffix match    none   none   A      AB     ABA     none
    next[j]                        0      0      1      2      3       0

    Note (j = 0): a single character has no prefix and suffix that are different, so next[0] = 0 for every pattern.
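    As a cross-check, the table can be recomputed straight from the definition by trying every allowed length n, from longest to shortest. This is a brute-force sketch, roughly cubic in the pattern length, so it is only useful for checking small examples. The name `next_by_definition` is hypothetical.

```python
def next_by_definition(pattern):
    """next[j] = largest n such that pattern[0..n-1] equals the suffix
    pattern[j-n+1..j], with n <= j so the prefix and suffix differ."""
    table = []
    for j in range(len(pattern)):
        n = j                                   # longest allowed length
        while n > 0 and pattern[:n] != pattern[j - n + 1 : j + 1]:
            n -= 1                              # try the next shorter length
        table.append(n)
    return table


print(next_by_definition("ABABAC"))   # [0, 0, 1, 2, 3, 0]
```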
             


    Given j, let n = next[j] 
    "pattern[0] .. pattern[n-1]" = "pattern[j-(n-1)] .. pattern[j]"

    "pattern[0] .. pattern[next[j]-1]" = "pattern[j-(next[j]-1)] .. pattern[j]"

    e.g. for "ABABAC", j = 4, n = 3:

    "pattern[0] .. pattern[2]" = "pattern[2] .. pattern[4]"
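    The relation "pattern[0] .. pattern[n-1]" = "pattern[j-(n-1)] .. pattern[j]" can be checked mechanically for every j. The small sketch below uses a hypothetical helper `check_identity`, with the table values copied from the "ABABAC" example.

```python
def check_identity(pattern, nxt):
    """For each j, confirm pattern[0..n-1] == pattern[j-(n-1)..j] with
    n = nxt[j], and return the (j, shared prefix) pairs."""
    pairs = []
    for j, n in enumerate(nxt):
        prefix = pattern[0:n]
        suffix = pattern[j - (n - 1) : j + 1]   # empty slice when n == 0
        if prefix != suffix:
            raise ValueError(f"identity fails at j={j}: {prefix!r} vs {suffix!r}")
        pairs.append((j, prefix))
    return pairs


for j, prefix in check_identity("ABABAC", [0, 0, 1, 2, 3, 0]):
    print(j, repr(prefix))
```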

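    If the match fails at pattern position j+1 (i.e. pattern[j+1] != string[i]), keep i the same and reset the pattern position to n = next[j].
    We have then already matched pattern[0] .. pattern[n-1], because pattern[0] .. pattern[n-1] = pattern[j-(n-1)] .. pattern[j], and that suffix was just matched against the string.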

    e.g. we have matched ABABA so far.
    If the next character fails, we say we have matched ABA so far and check whether the next character matches.
    That is, keep i the same and just reset j to 3 (precisely the length of the longest prefix-suffix match).
    Then, if the match after ABA fails too, by the same rule we say we have matched A so far, reset j to 1, and try again from there.
    In other words, KMP starts by trying the longest prefix-suffix, and if that fails it works down through the shorter ones until none are left.
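    The "work down to the shorter ones" chain is just repeated table lookups. In the sketch below, `fallback_chain` is a hypothetical helper. It lists the pattern positions tried, in order, when every comparison after `matched` characters keeps failing.

```python
def fallback_chain(nxt, matched):
    """Successive values of j after `matched` characters have matched and
    each following comparison fails."""
    chain = []
    j = matched
    while j > 0:
        j = nxt[j - 1]      # next shorter prefix-suffix of pattern[0..j-1]
        chain.append(j)
    return chain


# "ABABA" matched (5 characters): try ABA (3), then A (1), then nothing (0).
print(fallback_chain([0, 0, 1, 2, 3, 0], 5))   # [3, 1, 0]
```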

     

    Algorithm to construct table of next[j]

    Do this once, when the pattern comes in.
    pattern[0] ... pattern[m-1] 
    Here, i and j both index the pattern.
    In other words, the pattern is being compared against itself.

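    int m = pattern.length;      // pattern[0] ... pattern[m-1], assumes m >= 1
    int[] next = new int[m];
    next[0] = 0;

    int i = 1;   // position whose next[] value is being computed
    int j = 0;   // length of the prefix-suffix matched so far

    while ( i < m )
    {
      // on the first step i = 1, j = 0
      if ( pattern[j] == pattern[i] )
      {
        next[i] = j + 1;   // it's next[i], not next[j]
        i++;
        j++;
      }
      else   // pattern[j] != pattern[i]
      {
        if ( j > 0 )
        {
          // e.g. [0],[1],[2] === [4],[5],[6], but now [3] != [7]
          // maybe there is a shorter prefix-suffix we can shift right to, though
          j = next[j-1];   // next[j] is what position j+1 uses (treat this as a rule);
                           // also, prefix-suffix matching only applies to [0] .. [j-1],
                           // since pattern[j] itself did not match (draw a picture to see)
        }
        else   // j == 0
        {
          // pattern[0] does not match position i of the "text" (here the pattern itself):
          // i moves right by one, the pattern shifts right by one, and j stays 0
          next[i] = 0;
          i++;
          j = 0;   // redundant, just to make it clear what we are looping with
        }
      }
    }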
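    A runnable Python translation of the construction above, as a sketch. The name `build_next` is hypothetical. The logic is the same two-index loop: i walks the pattern, and j is the length of the prefix-suffix currently matched. An empty pattern simply yields an empty table.

```python
def build_next(pattern):
    """next[] table built by comparing the pattern against itself."""
    m = len(pattern)
    nxt = [0] * m           # next[0] = 0 for every pattern
    i, j = 1, 0
    while i < m:
        if pattern[j] == pattern[i]:
            nxt[i] = j + 1  # record at i, not at j
            i += 1
            j += 1
        elif j > 0:
            j = nxt[j - 1]  # fall back to a shorter prefix-suffix; i stays
        else:
            nxt[i] = 0      # no prefix-suffix ends at i
            i += 1
    return nxt


print(build_next("ABABAC"))   # [0, 0, 1, 2, 3, 0]
```

    Each pass either advances i or strictly shrinks j, and j can grow by at most one per step of i, so the loop runs in time linear in the pattern length.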
    
    
    
     
     
     

     

     
  • Original article: https://www.cnblogs.com/yakun/p/3588636.html
Copyright © 2020-2023  润新知