• Regular Number: String Matching with the Shift-And Algorithm


    Using a regular expression to define a numeric string is very common. Such an expression generally takes the following form:
    (0|9|7) (5|6) (2) (4|5)
    The regular expression above matches 4 digits: the first is one of 0, 9 and 7; the second is one of 5 and 6; the third is 2; and the fourth is one of 4 and 5. It matches 0525, but it does not match 9634.
    Now, given a regular expression of the form above and a long string of digits, find all substrings of the long string that the regular expression matches.
    Input
    The input contains one set of test data. The first line is a positive integer N (1 ≤ N ≤ 1000), the number of positions in the regular expression. In the next N lines, the first integer of the i-th line is ai (1 ≤ ai ≤ 10), meaning that the i-th position of the regular expression has ai digits to choose from. It is followed by those ai numeric characters. The last line is a numeric string whose length is at most 5 * 10^6.
    Output
    Output every substring that the regular expression matches, one substring per line.
    Sample Input
    4
    3 0 9 7
    2 5 7
    2 2 5
    2 4 5
    09755420524
    Sample Output
    9755
    7554
    0524

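    Shift-And is suited to cases where the pattern t[] is short. Because it relies on bit operations, it is generally more than twice as fast as the KMP algorithm.

    D records which prefixes of the pattern currently match. To use the Shift-And algorithm, an auxiliary table B is needed. B is a dictionary: each key is a character of the problem's alphabet, and each value is an n-bit unsigned integer that records the positions in the pattern T at which that character occurs.

    D[j] indicates whether T[0..j] is a suffix of S[0..i]. D[j] therefore becomes 1 only if D[j-1] was 1 after the previous character and S[i] == T[j]. In addition, the lowest bit is set to 1 before the AND with B[S[i]], so that a match whose first character is the current one can start.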
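
    As a minimal sketch of this update rule, assuming the pattern has at most 64 positions so that D fits in a single 64-bit word, one step could be written as below. The name shift_and_step is illustrative and does not come from the original post; the static_assert lines trace the pattern T = "ab" over the text S = "aab", with B['a'] = 1 (bit 0) and B['b'] = 2 (bit 1).

```cpp
#include <cstdint>

// One Shift-And update, assuming the pattern has at most 64 positions.
// prev: state after S[0..i-1]; bit j set means T[0..j] is a suffix of S[0..i-1].
// mask: B[S[i]]; bit j set means S[i] is accepted at pattern position j.
constexpr std::uint64_t shift_and_step(std::uint64_t prev, std::uint64_t mask) {
    // Shift every partial match one position forward, start a new candidate
    // at bit 0, then keep only the positions that accept the current character.
    return ((prev << 1) | 1u) & mask;
}

// Trace for T = "ab", S = "aab".
static_assert(shift_and_step(0, 1) == 1, "after 'a': prefix \"a\" matches");
static_assert(shift_and_step(1, 1) == 1, "after 'aa': still only prefix \"a\"");
static_assert(shift_and_step(1, 2) == 2, "after 'aab': bit 1 set, full match");
```

    Bit 1 is the highest pattern position here, so the last line corresponds to reporting a match that ends at the third character.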

    2, Shift-And implementation
    The Shift-And matching loop (it appears in full in the program at the end of this post):
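
    The matching process might be sketched as follows. This is an illustration under stated assumptions, not the author's original listing: the names Mask, MAX_PATTERN and shift_and_find are hypothetical, the table B is assumed to have one entry per byte value (256 entries), and std::bitset stands in for the n-bit integer so that patterns longer than a machine word also work.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t MAX_PATTERN = 1009;  // assumed capacity, as in the full program
using Mask = std::bitset<MAX_PATTERN>;

// Returns the index in text at which each match of an n-position pattern ends.
// B[c] must have bit j set when character c is allowed at pattern position j.
std::vector<std::size_t> shift_and_find(const std::vector<Mask>& B,
                                        std::size_t n,
                                        const std::string& text) {
    assert(B.size() == 256);               // one mask per possible byte value
    assert(n >= 1 && n <= MAX_PATTERN);
    std::vector<std::size_t> ends;
    Mask D;                                // all bits start at zero
    for (std::size_t i = 0; i < text.size(); ++i) {
        D <<= 1;                           // extend every partial match by one position
        D.set(0);                          // a new match may start at text[i]
        D &= B[static_cast<unsigned char>(text[i])];  // drop positions rejecting text[i]
        if (D[n - 1]) ends.push_back(i);   // the whole pattern matched, ending at i
    }
    return ends;
}
```

    For the sample input, with bit i set in B[c] for every digit c allowed at position i, calling shift_and_find(B, 4, "09755420524") yields the end indices 4, 5 and 10, which correspond to the substrings 9755, 7554 and 0524.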

     


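    Because a bitwise operation updates all bits of a machine word in parallel, each iteration of the matching loop takes constant time when the pattern fits in one word, so the loop runs in O(m) for a text of length m. When the pattern is longer than a machine word, as with the 1009-bit bitset in the program below, each iteration costs O(n/w) word operations for word size w, giving O(m·n/w) overall.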

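    3, The auxiliary table B
    How to obtain the auxiliary table B was not covered above. It is simple: just record the positions at which each character occurs in the pattern T.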
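
    For an ordinary string pattern, building B could look like the sketch below. It assumes the pattern has at most 64 characters so that each entry fits in one 64-bit word; the name build_table is illustrative and not part of the original post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Builds the auxiliary table: B[c] has bit j set exactly when pattern[j] == c.
// Assumes pattern.size() <= 64 so that one 64-bit word holds every position.
std::array<std::uint64_t, 256> build_table(const std::string& pattern) {
    assert(pattern.size() <= 64);
    std::array<std::uint64_t, 256> B{};    // value-initialised to all zeros
    for (std::size_t j = 0; j < pattern.size(); ++j) {
        B[static_cast<unsigned char>(pattern[j])] |= std::uint64_t{1} << j;
    }
    return B;
}
```

    In the Regular Number problem each pattern position accepts several digits rather than one character, so the program below instead sets bit i in B[d] for every digit d listed for position i.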

     
    #include <bitset>
    #include <cstdio>
    #include <cstring>

    using namespace std;

    const int MAXN = 5000000 + 9;  // maximum text length plus some slack

    // B[d]: bit i is set if digit d is allowed at position i of the expression.
    // D:    bit j is set if positions 0..j match a suffix of the text read so far.
    bitset<1009> B[256], D;
    char str[MAXN];

    int main()
    {
        int n, tmp, t;
        if (scanf("%d", &n) != 1) return 0;
        for (int i = 0; i < n; i++)
        {
            scanf("%d", &tmp);
            while (tmp--)
            {
                scanf("%d", &t);
                B[t].set(i);
            }
        }
        // scanf("%s") skips the pending newline; gets() is unsafe and was removed in C++14.
        scanf("%s", str);
        int l = strlen(str);
        for (int i = 0; i < l; i++)
        {
            // Shift in a new candidate at bit 0, then keep only positions that accept str[i].
            D = (D << 1).set(0) & B[str[i] - '0'];
            if (D[n - 1])
            {
                // Temporarily terminate the string so that puts() prints exactly n characters.
                char ch = str[i + 1];
                str[i + 1] = '\0';
                puts(str + i - n + 1);
                str[i + 1] = ch;
            }
        }
        return 0;
    }
  • Original post: https://www.cnblogs.com/joeylee97/p/7373330.html