• [ lucene FAQ ] How are search results sorted? Does sorting differ for fields of different types (for example, int)?


    Lucene question index:

    Lucene Frequently Asked Questions

    The API documentation tells us:

    The fields used to determine sort order must be carefully chosen. Documents must contain a single term in such a field, and the value of the term should indicate the document's relative position in a given sort order. The field must be indexed, but should not be tokenized, and does not need to be stored (unless you happen to want it back with the rest of your document data). In other words:

    document.add(new Field("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));

    In short, a field used for sorting must be indexed but must not be tokenized.

     

     

    Passing one extra parameter, a Sort, to the usual search method is all it takes to get sorted results.

    • TopFieldDocs search(Query query, Filter filter, int n, Sort sort)
              Search implementation with arbitrary sorting.

    • Sort(SortField field)
              Sorts by the criteria in the given SortField.

    • SortField(String field, int type)
              Creates a sort by terms in the given field with the type of term values explicitly given.

     

    Code example:

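           // Sort on field "f", interpreting its term values as integers
           SortField sortF = new SortField("f", SortField.INT);
           Sort sort = new Sort(sortF);
           // No filter (null); return the top 10 hits in sorted order
           TopFieldDocs docs = searcher.search(query, null, 10, sort);
           ScoreDoc[] docs2 = docs.scoreDocs;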

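    Suppose the index holds 5 documents whose f field values are -2, 0, 1, 5 and 10. Running the code above (SortField.INT) returns them in the order -2, 0, 1, 5, 10.

    Switching to SortField.STRING returns -2, 0, 1, 10, 5 instead, because the values are compared as text, character by character, so "10" sorts before "5".

    The experiment shows that choosing the right SortField type matters, especially for fields that hold numbers (or dates).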
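    The two orderings can be reproduced without Lucene at all. The sketch below is only an illustration in plain Java, not Lucene code: the class name SortTypeDemo and its two helper methods are made up for this example, and it assumes that a string sort over these ASCII values behaves like String.compareTo while an int sort compares the parsed numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class: mimics the two orderings on the raw term text.
public class SortTypeDemo {

    // Numeric ordering: parse each term to an int before comparing.
    static List<String> sortAsInt(List<String> terms) {
        List<String> copy = new ArrayList<>(terms);
        Comparator<String> byValue =
                (a, b) -> Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
        copy.sort(byValue);
        return copy;
    }

    // Text ordering: compare the terms character by character (String.compareTo).
    static List<String> sortAsString(List<String> terms) {
        List<String> copy = new ArrayList<>(terms);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // The five field values from the example, in arbitrary order
        List<String> terms = Arrays.asList("10", "5", "-2", "1", "0");
        System.out.println(sortAsInt(terms));    // [-2, 0, 1, 5, 10]
        System.out.println(sortAsString(terms)); // [-2, 0, 1, 10, 5]
    }
}
```

    In the text ordering, "10" lands before "5" because the first characters '1' and '5' decide the comparison, which is the same mismatch seen with SortField.STRING above.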

    P.S.

    Sort field types:

    Field Summary
    static int BYTE
              Sort using term values as encoded Bytes.
    static int CUSTOM
              Sort using a custom Comparator.
    static int DOC
              Sort by document number (index order).
    static int DOUBLE
              Sort using term values as encoded Doubles.
    static SortField FIELD_DOC
              Represents sorting by document number (index order).
    static SortField FIELD_SCORE
              Represents sorting by document score (relevancy).
    static int FLOAT
              Sort using term values as encoded Floats.
    static int INT
              Sort using term values as encoded Integers.
    static int LONG
              Sort using term values as encoded Longs.
    static int SCORE
              Sort by document score (relevancy).
    static int SHORT
              Sort using term values as encoded Shorts.
    static int STRING
              Sort using term values as Strings.
    static int STRING_VAL
              Sort using term values as Strings, but comparing by value (using String.compareTo) for all comparisons.

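    Note:

    As mentioned above, sorting as int and sorting as String give different results. Efficiency is a second concern: with large data sets, sort numeric fields with an appropriate numeric type instead of sorting everything as String.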

  • Original article: https://www.cnblogs.com/huangfox/p/1851188.html