• Oulipo (poj3461


    Oulipo
    Time Limit: 1000MS   Memory Limit: 65536K
    Total Submissions: 29759   Accepted: 11986

    Description

    The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

    Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

    Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

    So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A''B''C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

    Input

    The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

    • One line with the word W, a string over {'A''B''C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
    • One line with the text T, a string over {'A''B''C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

    Output

    For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

    Sample Input

    3
    BAPC
    BAPC
    AZA
    AZAZAZA
    VERDI
    AVERDXIVYERDIAN

    Sample Output

    1
    3
    0

    Source

    题意:两个字符串,在第二个字符串中可以找到多少个含有第一个字符串的区间……
    kmp能干嘛,next是干嘛的。
    按照正常思维一个个查的话,效率太低,于是伟大的前辈们就创造出了针对如何快速查找字符串的高大上的kmp。
     
    理解:    因为每次要查找的时候都会对字符串从头开始查找,但是如果我们已经知道在查找字符串当前位置的时候,有部分前缀和后缀是相等的,但是当前位置又不相等,
    那下次查找的时候就不用从头开始找啦,因为前缀和后缀相等,所以前缀和后缀相等的长度就不用查询了,就从已经查找到的位置减去前缀和后缀相等的长度开始查询就行了。
    那么问题又转化成了怎么求字符串当前位置前缀和后缀相等的长度。用数组next存储,next[j] = k,意味着:p[0, k-1] = p[j-k, j-1].
     
     
    因此KMP算法的关键在于求算next[]数组的值,即求算模式串每个位置处的最长后缀与前缀相同的长度, 而求算next[]数组的值有两种思路,第一种思路是用递推的思想去求算,还有一种就是直接去求解。

    1.按照递推的思想:

       根据定义next[0]=-1,假设next[j]=k, 即P[0...k-1]==P[j-k,j-1]

       1)若P[j]==P[k],则有P[0..k]==P[j-k,j],很显然,next[j+1]=next[j]+1=k+1;

       2)若P[j]!=P[k],则可以把其看做模式匹配的问题,即匹配失败的时候,k值如何移动,显然k=next[k]。

       因此可以这样去实现:

     1 void getNext(char *p,int *next)
     2 {
     3     int j,k;
     4     next[0]=-1;
     5     j=0;
     6     k=-1;
     7     while(j<strlen(p)-1)
     8     {
     9         if(k==-1||p[j]==p[k])    //匹配的情况下,p[j]==p[k]
    10         {
    11             j++;
    12             k++;
    13             next[j]=k;
    14         }
    15         else                   //p[j]!=p[k]
    16             k=next[k];
    17     }
    18 }
    View Code

    问题: next应该到数组长度……

    优化一点:

     1 void getNext(char *p,int *next)
     2 {
     3     int j,k;
     4     int lastIndex=strlen(p)-1;
     5     next[0]=-1;
     6     j=0;
     7     k=-1;
     8     while(j < lastIndex)
     9     {
    10         if(k==-1||p[j]==p[k])    //匹配的情况下,p[j]==p[k]
    11         {
    12             ++j;
    13             ++k;
    14              
    15             //若p[j]==p[k],则需要修正
    16             if(p[j]==p[k]) 
    17                 next[j]=next[k];
    18             else
    19                 next[j]=k;
    20         }
    21         else                   //p[j]!=p[k]
    22             k=next[k];
    23     }
    24 }
    View Code

    2.直接求解方法

     1 void getNext(char *p,int *next)
     2 {
     3     int i,j,temp;
     4     for(i=0;i<strlen(p);i++)
     5     {
     6         if(i==0)
     7         {
     8             next[i]=-1;     //next[0]=-1
     9         }
    10         else if(i==1) 
    11         {
    12             next[i]=0;      //next[1]=0
    13         }
    14         else
    15         {
    16             temp=i-1;
    17             for(j=temp;j>0;j--)
    18             {
    19                 if(equals(p,i,j))
    20                 {
    21                     next[i]=j;   //找到最大的k值
    22                     break;
    23                 }
    24             }
    25             if(j==0)
    26                 next[i]=0;
    27         }
    28     }
    29 }
    30 
    31 bool equals(char *p,int i,int j)     //判断p[0...j-1]与p[i-j...i-1]是否相等  
    32 {
    33     int k=0;
    34     int s=i-j;
    35     for(;k<=j-1&&s<=i-1;k++,s++)
    36     {
    37         if(p[k]!=p[s])
    38             return false;
    39     }
    40     return true;
    41 }
    View Code

    本题代码:

     1 #include<iostream>
     2 #include<cstdio>
     3 #include<cstring>
     4 
     5 using namespace std;
     6 
     7 char w[10008], t[1000008];
     8 int next[10008];
     9 
    10 void getNext()
    11 {
    12     int j, k;
    13     int i = strlen(w);
    14     j = 0;
    15     k = -1;
    16     next[0] = -1;
    17 
    18     while(j < i)
    19     {
    20         if(k == -1 || w[j] == w[k])
    21         {
    22             j++;
    23             k++;
    24             next[j] = k;
    25         }
    26         else
    27             k = next[k];
    28     }
    29 }
    30 
    31 int main()
    32 {
    33     int c;
    34     scanf("%d", &c);
    35     while(c--)
    36     {
    37         memset(next, 0, sizeof(next));
    38         scanf("%s", w);
    39         scanf("%s", t);
    40         getNext();
    41         int i, j, l1, l2;
    42         l1 = strlen(w);
    43         l2 = strlen(t);
    44         i = j = 0;
    45         int sum = 0;
    46         while(i < l1 && j < l2)
    47         {
    48             if(w[i] == t[j] || i == -1)
    49             {
    50                 i++;
    51                 j++;
    52             }
    53             else
    54                 i = next[i];            
    55 
    56             if(i == l1)
    57             {
    58                 sum++;
    59                 i = next[i];
    60             }
    61         }
    62         printf("%d
    ", sum);
    63     }
    64     return 0;
    65 }
  • 相关阅读:
    shell脚本编程-重定向
    web安全-剖析_基础构架剖析
    shell脚本编程-函数
    shell脚本编程-循环
    web安全-入侵基础
    shell脚本编程-检查和测试
    shell脚本编程-特殊字符
    shell脚本编程-计算方式
    python例子-抓取网站IP列表
    linux问题-APR not Found
  • 原文地址:https://www.cnblogs.com/Tinamei/p/4798918.html
Copyright © 2020-2023  润新知