.NET Profanity Filtering Algorithm (Repost)

Source: xingd.net - cnblogs (博客园)
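　　In my own tests, however, RegEx was roughly twice as fast. I was still not satisfied: our site uses profanity filtering heavily, and it had already started to affect performance, so after some thought I wrote my own algorithm. I tested it on my own machine using the bad-word list from the original article, a string of length 0x19c (412 characters), and 1000 iterations. Plain text search took 1933.47 ms, RegEx took 1216.719 ms, and my algorithm took only 244.125 ms.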
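
For readers who want to reproduce this kind of comparison, here is a minimal sketch of the two baselines and a timing helper. The original article's implementations are not shown in this post, so both baselines are assumptions: `ReplaceEach` stands in for "plain text search" and `BuildRegex` for the RegEx approach. The names `BaselineFilters`, `ReplaceEach`, `BuildRegex`, and `TimeMs` are made up for illustration, and the sketch assumes a non-empty word list.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; none of these names come from the original post.
public static class BaselineFilters
{
    // Assumed baseline 1: plain text search, one Replace pass per bad word.
    public static string ReplaceEach(string text, string[] badWords)
    {
        foreach (string word in badWords)
        {
            if (word.Length > 0)
            {
                text = text.Replace(word, "***");
            }
        }
        return text;
    }

    // Assumed baseline 2: a single alternation pattern.
    // Regex.Escape keeps any metacharacters in the words literal.
    public static Regex BuildRegex(string[] badWords)
    {
        string pattern = string.Join(
            "|", badWords.Where(w => w.Length > 0).Select(Regex.Escape));
        return new Regex(pattern, RegexOptions.Compiled);
    }

    // Runs the action the given number of times and returns elapsed milliseconds.
    public static double TimeMs(int iterations, Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

A call such as `BaselineFilters.TimeMs(1000, () => BaselineFilters.ReplaceEach(text, words))` then gives a number comparable in spirit to the figures above. Absolute timings depend on the machine, the runtime, and the word list.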

　　The core of the algorithm is shown in the following code:
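// Requires: using System; using System.Collections; using System.Collections.Generic;
private static Dictionary<string, object> dic = new Dictionary<string, object>();
// One bit per possible char value (0..char.MaxValue), hence the + 1
private static BitArray fastcheck = new BitArray(char.MaxValue + 1);
// Length of the longest bad word; bounds the lookup window during filtering
private static int maxlength = 0;

static void Prepare()
{
    // LoadBadWords is a hypothetical helper: the original only says "read from file"
    string[] badwords = LoadBadWords();
    foreach (string word in badwords)
    {
        // Skip empty entries (word[0] would throw) and duplicates
        if (word.Length > 0 && !dic.ContainsKey(word))
        {
            dic.Add(word, null);
            maxlength = Math.Max(maxlength, word.Length);
            // Mark the first character so non-candidates can be skipped quickly
            fastcheck[word[0]] = true;
        }
    }
}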

　　To use it:
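// Requires: using System.Text;
// Assumed setup: target is the input string and sb starts as a copy of it
StringBuilder sb = new StringBuilder(target);
int offset = 0; // drift between positions in sb and target after replacements
int index = 0;
while (index < target.Length)
{
    // Fast path: skip characters that cannot start any bad word
    if (!fastcheck[target[index]])
    {
        while (index < target.Length - 1 && !fastcheck[target[++index]]) ;
    }
    // Try candidate lengths 1..limit; the shortest matching word wins
    int limit = Math.Min(maxlength, target.Length - index);
    for (int j = 1; j <= limit; j++)
    {
        string sub = target.Substring(index, j);
        if (dic.ContainsKey(sub))
        {
            // Replace exactly the matched span at its shifted position in sb
            sb.Replace(sub, "***", index + offset, j);
            offset += 3 - j;  // "***" has length 3, the match had length j
            index += j - 1;   // the index++ below moves just past the match
            break;
        }
    }
    index++;
}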
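
The two fragments above can be combined into one self-contained class. The following is only a sketch under stated assumptions, not the author's final code: the class name `BadWordFilter`, the constructor taking the word list directly instead of reading a file, the `mask` parameter, and the use of `HashSet<string>` in place of the dictionary are all choices made here for illustration. It builds the output by appending to a fresh `StringBuilder`, which avoids having to track how replacements shift positions.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

// Hypothetical wrapper class; the name and signatures are not from the original post.
public sealed class BadWordFilter
{
    private readonly HashSet<string> words = new HashSet<string>();
    // One bit per UTF-16 code unit: true if some bad word starts with that character.
    private readonly BitArray fastCheck = new BitArray(char.MaxValue + 1);
    // Length of the longest bad word; bounds the lookup window.
    private int maxLength;

    public BadWordFilter(IEnumerable<string> badWords)
    {
        foreach (string word in badWords)
        {
            // Skip null or empty entries and duplicates.
            if (string.IsNullOrEmpty(word) || !words.Add(word))
            {
                continue;
            }
            maxLength = Math.Max(maxLength, word.Length);
            fastCheck[word[0]] = true;
        }
    }

    public string Filter(string target, string mask = "***")
    {
        StringBuilder sb = new StringBuilder(target.Length);
        int index = 0;
        while (index < target.Length)
        {
            int matched = 0;
            // Only characters that start some bad word need a set lookup.
            if (fastCheck[target[index]])
            {
                int limit = Math.Min(maxLength, target.Length - index);
                for (int len = 1; len <= limit; len++)
                {
                    if (words.Contains(target.Substring(index, len)))
                    {
                        matched = len; // shortest match wins, as in the loop above
                        break;
                    }
                }
            }
            if (matched > 0)
            {
                sb.Append(mask);
                index += matched;
            }
            else
            {
                sb.Append(target[index]);
                index++;
            }
        }
        return sb.ToString();
    }
}
```

With the word list `{ "bad", "ugly" }`, calling `Filter("a bad and ugly day")` returns `a *** and *** day`. Matching is case-sensitive and exact, as in the original fragments.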


Original URL: https://www.cnblogs.com/ami/p/906453.html