• BlogEngine.NET Search


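    BlogEngine's search is surprisingly capable (or maybe I'm just easily impressed). From an article an expert wrote a while ago, I learned that it also supports OpenSearch.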

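    [Figure A: the browser's search-engine menu before the blog is opened. Figure B: the same menu with the blog open.]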

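    In figure A the blog has not been opened, so the browser's search-engine menu lacks the Add "Name of the blog" option that appears in figure B.
    The only difference in B is that the page contains one extra tag: <link href="http://localhost:52457/BlogEngine.NET/opensearch.axd" title="Name of the blog" rel="search" type="application/opensearchdescription+xml">
    Once "Name of the blog" has been added, you can select it as the search engine, and searches are then sent to our blog's own search page.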
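    What the browser does with such a search description can be sketched roughly as follows. OpenSearch description documents carry a URL template with a {searchTerms} placeholder; the template string below is an assumption based on the search.aspx URL seen later in this post, not something read from opensearch.axd, and OpenSearchSketch and BuildSearchUrl are made-up names.

```csharp
using System;

public static class OpenSearchSketch
{
    // Assumed template; the real one is whatever opensearch.axd returns.
    public const string Template =
        "http://localhost:52457/BlogEngine.NET/search.aspx?q={searchTerms}";

    // Replace the OpenSearch {searchTerms} placeholder with the URL-encoded query.
    public static string BuildSearchUrl(string template, string query)
    {
        return template.Replace("{searchTerms}", Uri.EscapeDataString(query));
    }

    public static void Main()
    {
        // Prints http://localhost:52457/BlogEngine.NET/search.aspx?q=blog%20engine
        Console.WriteLine(BuildSearchUrl(Template, "blog engine"));
    }
}
```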


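    Now let's look at how BlogEngine actually performs a search.
    Debugging the usual way: enter some text and click search. The browser navigates to http://localhost:52457/BlogEngine.NET/search.aspx?q=1, where the value of q is the search text.
    Stepping through: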

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);

        rep.ItemDataBound += new RepeaterItemEventHandler(rep_ItemDataBound);

        var term = Request.QueryString["q"];
        if (!Utils.StringIsNullOrWhitespace(term))
        {
            bool includeComments = (Request.QueryString["comment"] == "true");

            // HTML-encode the term before echoing it back in the title and headline.
            var encodedTerm = Server.HtmlEncode(term);
            Page.Title = Server.HtmlEncode(Resources.labels.searchResultsFor) + " '" + encodedTerm + "'";
            h1Headline.InnerHtml = Resources.labels.searchResultsFor + " '" + encodedTerm + "'";

            // A term that parses as an absolute URL goes to SearchByApml;
            // anything else is a keyword search.
            Uri url;
            if (!Uri.TryCreate(term, UriKind.Absolute, out url))
            {
                List<IPublishable> list = Search.Hits(term, includeComments);
                BindSearchResult(list);
            }
            else
            {
                SearchByApml(url);
            }
        }
        else
        {
            Page.Title = Resources.labels.search;
            h1Headline.InnerHtml = Resources.labels.search;
        }
    }
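    The branch on Uri.TryCreate in OnLoad can be tried in isolation. This is only a minimal sketch: TermDispatchSketch and Classify are hypothetical names, and the two return strings stand in for the calls to SearchByApml and Search.Hits.

```csharp
using System;

public static class TermDispatchSketch
{
    // Mirrors the check in OnLoad: an absolute URL takes the APML path,
    // anything else is treated as a keyword search.
    public static string Classify(string term)
    {
        Uri url;
        return Uri.TryCreate(term, UriKind.Absolute, out url) ? "apml" : "keyword";
    }

    public static void Main()
    {
        Console.WriteLine(Classify("blogengine"));                  // keyword
        Console.WriteLine(Classify("http://example.com/apml.xml")); // apml
    }
}
```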

    Note the line List<IPublishable> list = Search.Hits(term, includeComments); let's follow it:

    /// <summary>
    /// Searches all the posts and returns a ranked result set.
    /// </summary>
    /// <param name="searchTerm">The term to search for</param>
    /// <param name="includeComments">True to include a post's comments and their authors in search</param>
    /// <returns>A list of IPublishable.</returns>
    public static List<IPublishable> Hits(string searchTerm, bool includeComments)
    {
        lock (SyncRoot)
        {
            var results = BuildResultSet(searchTerm, includeComments);
            var items = results.ConvertAll(ResultToPost);
            results.Clear();
            OnSearcing(searchTerm);
            return items;
        }
    }

    It searches all the content and returns a ranked result set. The real work is clearly done in BuildResultSet, so we keep following:

    /// <summary>
    /// Builds the results set and ranks it.
    /// </summary>
    /// <param name="searchTerm">
    /// The search Term.
    /// </param>
    /// <param name="includeComments">
    /// The include Comments.
    /// </param>
    private static List<Result> BuildResultSet(string searchTerm, bool includeComments)
    {
        var results = new List<Result>();
        var term = CleanContent(searchTerm.ToLowerInvariant().Trim(), false);

        // Split on spaces and join the terms into one alternation pattern, e.g. (foo|bar).
        var terms = term.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var regex = string.Format(CultureInfo.InvariantCulture, "({0})", string.Join("|", terms));

        foreach (var entry in Catalog)
        {
            var result = new Result();
            if (!(entry.Item is Comment))
            {
                // A title match counts 20, a content match 1, a description match 2.
                var titleMatches = Regex.Matches(entry.Title, regex).Count;
                result.Rank = titleMatches * 20;

                var postMatches = Regex.Matches(entry.Content, regex).Count;
                result.Rank += postMatches;

                var descriptionMatches = Regex.Matches(entry.Item.Description, regex).Count;
                result.Rank += descriptionMatches * 2;
            }
            else if (includeComments)
            {
                var commentMatches = Regex.Matches(entry.Content + entry.Title, regex).Count;
                result.Rank += commentMatches;
            }

            // Only entries with at least one match make it into the result set.
            if (result.Rank > 0)
            {
                result.Item = entry.Item;
                results.Add(result);
            }
        }

        results.Sort();
        return results;
    }
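    The scoring in BuildResultSet can be reproduced outside BlogEngine. The sketch below keeps the same pattern construction and the same 20/1/2 weights; RankSketch, BuildPattern and Rank are hypothetical names, the CleanContent step is left out, and the sample text is assumed to be lower-cased already.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RankSketch
{
    // Lower-case the query, split it on spaces, and join the terms with "|".
    public static string BuildPattern(string searchTerm)
    {
        var terms = searchTerm.ToLowerInvariant().Trim()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return "(" + string.Join("|", terms) + ")";
    }

    // Same weights as the listing: title x20, content x1, description x2.
    public static int Rank(string title, string content, string description, string pattern)
    {
        return Regex.Matches(title, pattern).Count * 20
             + Regex.Matches(content, pattern).Count
             + Regex.Matches(description, pattern).Count * 2;
    }

    public static void Main()
    {
        var pattern = BuildPattern("Search");
        Console.WriteLine(pattern); // (search)

        // One title match (20) plus two content matches (2) gives 22.
        Console.WriteLine(Rank("blogengine search",
                               "search the blog with blogengine search",
                               "",
                               pattern));
    }
}
```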

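    Setting aside for now what Catalog is, all the matching here serves to compute result.Rank: the more matches, the higher the rank and the earlier the item sorts. A title match counts 20, a description match 2, and a content match 1; comments are only scored when includeComments is true. Results with a rank above 0 are added to the List<Result>, which is then sorted with Sort(). No comparer is passed, so the default one is used, and that calls the CompareTo that BlogEngine implements on Result: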

    /// <summary>
    /// Compares the current object with another object of the same type.
    /// </summary>
    /// <param name="other">
    /// An object to compare with this object.
    /// </param>
    /// <returns>
    /// A 32-bit signed integer that indicates the relative order of the objects being compared.
    /// Less than zero: this object is less than the other parameter.
    /// Zero: this object is equal to other.
    /// Greater than zero: this object is greater than other.
    /// </returns>
    public int CompareTo(Result other)
    {
        // The operands are reversed, so higher ranks sort first.
        return other.Rank.CompareTo(this.Rank);
    }
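    The effect of comparing other.Rank against this.Rank is that List<T>.Sort() orders results from highest to lowest rank. A minimal sketch, with RankedHit as a hypothetical stand-in for BlogEngine's Result class:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Result: a title plus its rank.
public class RankedHit : IComparable<RankedHit>
{
    public string Title;
    public int Rank;

    // Reversed comparison, as in the listing: a higher rank sorts earlier.
    public int CompareTo(RankedHit other)
    {
        return other.Rank.CompareTo(Rank);
    }
}

public static class SortSketch
{
    // Sort() without arguments uses the default comparer, which calls CompareTo.
    public static List<string> SortedTitles(List<RankedHit> hits)
    {
        hits.Sort();
        return hits.ConvertAll(h => h.Title);
    }

    public static void Main()
    {
        var hits = new List<RankedHit>
        {
            new RankedHit { Title = "a", Rank = 3 },
            new RankedHit { Title = "b", Rank = 41 },
            new RankedHit { Title = "c", Rank = 20 }
        };

        // Prints b, c, a
        Console.WriteLine(string.Join(", ", SortedTitles(hits).ToArray()));
    }
}
```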

    BuildResultSet finally returns the sorted List<Result>. So what is Catalog? It is the collection that gets searched, a Collection<Entry>. Here is the structure of Entry:

    /// <summary>
    /// A search optimized post object cleansed from HTML and stop words.
    /// </summary>
    internal struct Entry
    {
        #region Constants and Fields

        /// <summary>
        /// The content of the post cleansed for stop words and HTML
        /// </summary>
        internal string Content;

        /// <summary>
        /// The post object reference
        /// </summary>
        internal IPublishable Item;

        /// <summary>
        /// The title of the post cleansed for stop words
        /// </summary>
        internal string Title;

        #endregion
    }

    Looking back at the matching in BuildResultSet, everything now makes sense. We know this collection is what gets searched, but how is it built?

    /// <summary>
    /// Initializes static members of the <see cref="Search"/> class.
    /// </summary>
    static Search()
    {
        BuildCatalog();
        Post.Saved += Post_Saved;
        Page.Saved += Page_Saved;
        BlogSettings.Changed += delegate { BuildCatalog(); };
        Post.CommentAdded += Post_CommentAdded;
        Post.CommentRemoved += delegate { BuildCatalog(); };
        Comment.Approved += Post_CommentAdded;
    }
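    The pattern in this static constructor, building the catalog once and then refreshing it from event handlers, can be sketched on its own. Article and CatalogSketch are hypothetical types, not BlogEngine classes, and the sketch always rebuilds the whole catalog, which is an assumption rather than a claim about what Post_Saved does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical content type that raises a static event when saved.
public class Article
{
    public static event EventHandler<EventArgs> Saved;
    public static readonly List<Article> All = new List<Article>();
    public string Title;

    public void Save()
    {
        if (!All.Contains(this)) All.Add(this);
        var handler = Saved;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public static class CatalogSketch
{
    private static readonly List<string> Catalog = new List<string>();

    // Runs once, on first use of the class: build the catalog, then subscribe.
    static CatalogSketch()
    {
        Rebuild();
        Article.Saved += delegate { Rebuild(); };
    }

    public static int Count { get { return Catalog.Count; } }

    private static void Rebuild()
    {
        Catalog.Clear();
        foreach (var article in Article.All)
            Catalog.Add(article.Title.ToLowerInvariant());
    }

    public static void Main()
    {
        int before = Count;                        // triggers the static constructor
        new Article { Title = "Hello" }.Save();    // raises Saved, catalog is rebuilt
        Console.WriteLine(Count - before);         // 1
    }
}
```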

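    The static constructor calls BuildCatalog to build the search collection and also subscribes to events on Post, Page and so on, so that the catalog is updated whenever they change. This also shows that the catalog holds several kinds of objects; what they have in common is that they all implement the IPublishable interface.
    At this point we have the search terms and the collection to search, so returning the matching items follows naturally.
    This search reminds me of Lucene.Net, which also works with ranking weights. Lucene's tokenization is far more advanced, though: here each space-separated keyword is only matched as a whole, so "ABC" finds only content containing "ABC", not content that contains just "A", "B" or "C".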











  • Original article: https://www.cnblogs.com/whosedream/p/2259823.html