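The longest common substring problem is to find the longest string that is a substring of two or more given strings. The characters of a substring must occupy consecutive positions in each string.
For example, for ABAB, BABA, and ABBA, a longest common substring is AB (BA is another one of the same length).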
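To make the definition concrete, here is a minimal brute-force sketch for any number of strings. It is not the method this article develops; the function name longest_common_substring_all is a hypothetical helper chosen for illustration. It tries every substring of the first string, longest first, and uses strstr to test whether the candidate occurs in all the others.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original article).
   Returns a malloc'ed copy of the first longest substring of strs[0] that
   occurs in every one of the count strings, or "" if they share nothing.
   Returns NULL if count is 0 or allocation fails. The caller frees it. */
char *longest_common_substring_all(const char *const strs[], size_t count)
{
    if (count == 0)
        return NULL;

    const char *first = strs[0];
    size_t n = strlen(first);
    char *cand = malloc(n + 1);
    if (cand == NULL)
        return NULL;

    /* Try lengths from longest to shortest, so the first hit is a longest one. */
    for (size_t len = n; len > 0; --len)
    {
        for (size_t start = 0; start + len <= n; ++start)
        {
            memcpy(cand, first + start, len);
            cand[len] = '\0';

            /* Count how many of the remaining strings contain the candidate. */
            size_t k = 1;
            while (k < count && strstr(strs[k], cand) != NULL)
                ++k;
            if (k == count)
                return cand;   /* found in every string */
        }
    }

    cand[0] = '\0';            /* no common character at all */
    return cand;
}
```

Called with the three strings ABAB, BABA, and ABBA, this sketch should return AB, because candidates are taken from the first string in left-to-right order. It is far slower than a table-based approach and is only meant to illustrate what "common substring" means.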
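We can state the problem as follows:
Given two strings S and T, where S has length m and T has length n, find the longest common substring of S and T.
Suppose S = "ABAB" and T = "BABA". We can build the matrix below, with the characters of S across the top and those of T down the side. Each cell holds the length of the common run of characters that ends at that pair of positions, so the largest cell gives the length of the longest common substring: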
  |   | A | B | A | B |
  | 0 | 0 | 0 | 0 | 0 |
B | 0 | 0 | 1 | 0 | 1 |
A | 0 | 1 | 0 | 2 | 0 |
B | 0 | 0 | 2 | 0 | 3 |
A | 0 | 1 | 0 | 3 | 0 |
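The table above can be reproduced with a short sketch. A cell is one more than its upper-left neighbour when the two characters match, and zero otherwise. The names fill_lcs_table and print_lcs_table are hypothetical helpers, and the flat row-major int buffer supplied by the caller is an assumption made to keep the example small.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fills table, a caller-provided row-major array of
   (strlen(t) + 1) * (strlen(s) + 1) ints. Rows follow t, columns follow s,
   matching the layout of the table in the text. */
void fill_lcs_table(const char *s, const char *t, int *table)
{
    size_t cols = strlen(s) + 1;
    size_t rows = strlen(t) + 1;

    for (size_t r = 0; r < rows; ++r)
    {
        for (size_t c = 0; c < cols; ++c)
        {
            if (r == 0 || c == 0)
                table[r * cols + c] = 0;   /* padding row and column */
            else if (t[r - 1] == s[c - 1])
                table[r * cols + c] = table[(r - 1) * cols + (c - 1)] + 1;
            else
                table[r * cols + c] = 0;   /* a mismatch breaks the run */
        }
    }
}

/* Hypothetical helper: prints the filled table with its row and column labels. */
void print_lcs_table(const char *s, const char *t, const int *table)
{
    size_t cols = strlen(s) + 1;
    size_t rows = strlen(t) + 1;

    printf("  |   |");
    for (size_t c = 1; c < cols; ++c)
        printf(" %c |", s[c - 1]);
    printf("\n");

    for (size_t r = 0; r < rows; ++r)
    {
        printf("%c |", r == 0 ? ' ' : t[r - 1]);
        for (size_t c = 0; c < cols; ++c)
            printf(" %d |", table[r * cols + c]);
        printf("\n");
    }
}
```

For S = "ABAB" and T = "BABA", declare int table[25], call fill_lcs_table("ABAB", "BABA", table), then print_lcs_table with the same arguments. The two cells holding 3 correspond to the two common substrings of length 3, ABA and BAB.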
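The m × n array array[m][n] is extended to array[m+1][n+1]. The extra row and column of zeros make initialization easier and remove boundary checks from the loop. With 0-based indices i into S and j into T, each cell is filled by one of two rules: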
- if( S[i] == T[j] ) array[i+1][j+1] = array[i][j] + 1
- if( S[i] != T[j] ) array[i+1][j+1] = 0
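When only the length is needed, the two rules above can be applied while keeping a single row of the table, since each cell depends only on the row before it. The following is a minimal sketch under that assumption. The name lcs_length and the (size_t)-1 error value are hypothetical choices made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the longest common substring of s and t,
   using one row of strlen(t) + 1 cells instead of the full table.
   Returns (size_t)-1 if allocation fails. */
size_t lcs_length(const char *s, const char *t)
{
    size_t m = strlen(s);
    size_t n = strlen(t);
    size_t best = 0;

    /* row[j + 1] plays the role of array[i + 1][j + 1]; row[0] stays 0. */
    size_t *row = calloc(n + 1, sizeof *row);
    if (row == NULL)
        return (size_t)-1;

    for (size_t i = 0; i < m; ++i)
    {
        /* Walk j downwards so that row[j] still holds array[i][j],
           the value from the previous row, when row[j + 1] is written. */
        for (size_t j = n; j-- > 0; )
        {
            if (s[i] == t[j])
            {
                row[j + 1] = row[j] + 1;
                if (row[j + 1] > best)
                    best = row[j + 1];
            }
            else
            {
                row[j + 1] = 0;
            }
        }
    }

    free(row);
    return best;
}
```

For S = "ABAB" and T = "BABA" this returns 3, the largest value in the matrix above. The trade-off is that the position of the match is no longer recoverable from the table, so a caller that needs the substring itself would also have to record where the maximum occurred.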
The source code follows:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

/* Returns a newly allocated copy of the longest common substring of strA
   and strB, or NULL if memory allocation fails. The caller must free() it. */
char *LongestCommonSubstring(const char *strA, const char *strB)
{
    const size_t LengthA = strlen(strA);
    const size_t LengthB = strlen(strB);
    size_t LCSLength = 0;   /* length of the best match found so far */
    size_t EndA = 0;        /* index in strA one past the end of that match */
    size_t i, j;

    /* Matrix[i + 1][j + 1] is the length of the common run ending at
       strA[i] and strB[j]. Row 0 and column 0 stay zero, so the loop needs
       no boundary checks. The table lives on the heap (zero-filled by
       calloc) so that long inputs cannot overflow the stack. */
    size_t (*Matrix)[LengthB + 1] =
        calloc(LengthA + 1, sizeof(size_t[LengthB + 1]));
    if (Matrix == NULL)
        return NULL;

    for (i = 0; i < LengthA; ++i)
    {
        for (j = 0; j < LengthB; ++j)
        {
            if (strA[i] == strB[j])
            {
                Matrix[i + 1][j + 1] = Matrix[i][j] + 1;
                if (Matrix[i + 1][j + 1] > LCSLength)
                {
                    LCSLength = Matrix[i + 1][j + 1];
                    EndA = i + 1;
                }
            }
            /* On a mismatch the cell keeps the zero written by calloc. */
        }
    }
    free(Matrix);

    /* Copy strA[EndA - LCSLength .. EndA - 1] into a new string. */
    char *LCS = malloc(LCSLength + 1);
    if (LCS == NULL)
        return NULL;
    memcpy(LCS, strA + EndA - LCSLength, LCSLength);
    LCS[LCSLength] = '\0';

    return LCS;
}

int main(void)
{
    const char *strA = "abab";
    const char *strB = "baba";
    char *LCS = LongestCommonSubstring(strA, strB);
    if (LCS == NULL)
    {
        fprintf(stderr, "Out of memory\n");
        return 1;
    }
    printf("Longest Common Substring is %s\n", LCS);
    free(LCS);
    return 0;
}