Lucene issues roundup:
From the API documentation we learn:
The fields used to determine sort order must be carefully chosen. Documents must contain a single term in such a field, and the value of the term should indicate the document's relative position in a given sort order. The field must be indexed, but should not be tokenized, and does not need to be stored (unless you happen to want it back with the rest of your document data). In other words:
document.add(new Field("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));
In short, a field used for sorting must be indexed but must not be tokenized (analyzed).
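To make the single-term requirement concrete, here is a toy model in plain Java. It does not use Lucene: `tokenize` and `notAnalyzed` are hypothetical stand-ins for an analyzed field and a NOT_ANALYZED field, and `sortKey` simply refuses to produce a key unless exactly one term is present. The Javadoc only states the requirement; what Lucene itself does with a multi-term sort field is not modeled here.

```java
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical, not Lucene code): a sort key is only well defined
// when a document holds exactly one term in the sort field.
class SingleTermSortKey {

    // Hypothetical stand-in for an analyzer: splits text on whitespace,
    // roughly what a tokenized (analyzed) field would index.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Hypothetical stand-in for NOT_ANALYZED: the whole value is one term.
    static List<String> notAnalyzed(String text) {
        return List.of(text);
    }

    // Returns the document's sort key, rejecting fields with zero or several terms.
    static String sortKey(List<String> terms) {
        if (terms.size() != 1) {
            throw new IllegalArgumentException(
                    "sort field must contain exactly one term, found " + terms.size());
        }
        return terms.get(0);
    }

    public static void main(String[] args) {
        String city = "New York";
        System.out.println(sortKey(notAnalyzed(city))); // New York
        try {
            sortKey(tokenize(city)); // two terms: "New", "York"
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```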
To sort the results, pass one extra argument (a Sort) to the usual search method:
- TopFieldDocs search(Query query, Filter filter, int n, Sort sort): Search implementation with arbitrary sorting.
- Sort(SortField field): Sorts by the criteria in the given SortField.
- SortField(String field, int type): Creates a sort by terms in the given field with the type of term values explicitly given.
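A rough way to picture how the four arguments of `search(query, filter, n, sort)` cooperate is the toy model below. It is plain Java, not Lucene: `ToySearch`, `Doc` and the predicate/comparator parameters are hypothetical stand-ins for `Query`, `Filter` and `Sort`, and real matching and scoring are left out. A `null` filter means no filtering.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy model (hypothetical, plain Java, not Lucene): how the four arguments of
// search(query, filter, n, sort) cooperate.
class ToySearch {

    static final class Doc {
        final String title;
        final int f; // numeric sort field, like "f" in the example
        Doc(String title, int f) { this.title = title; this.f = f; }
        @Override public String toString() { return title + "(f=" + f + ")"; }
    }

    // Keeps docs matching the query and the optional filter (null = no filter),
    // orders them with the sort comparator, and returns at most n of them.
    static <D> List<D> search(List<D> docs, Predicate<D> query, Predicate<D> filter,
                              int n, Comparator<D> sort) {
        Predicate<D> effectiveFilter = (filter == null) ? d -> true : filter;
        return docs.stream()
                .filter(query)
                .filter(effectiveFilter)
                .sorted(sort)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("a", 10), new Doc("b", -2), new Doc("c", 5),
                new Doc("d", 0), new Doc("e", 1));
        // Match everything, no filter, top 3, ascending by f.
        List<Doc> top = search(docs, d -> true, null, 3, Comparator.comparingInt((Doc d) -> d.f));
        System.out.println(top); // [b(f=-2), d(f=0), e(f=1)]
    }
}
```

The `n` argument only caps how many hits come back; the order among them is decided entirely by the sort comparator.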
Code example:
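// Sort by field "f", comparing its terms as integers
SortField sortF = new SortField("f", SortField.INT);
Sort sort = new Sort(sortF);
// searcher is an open IndexSearcher, query any Query; no filter (null), top 10 hits in sort order
TopFieldDocs docs = searcher.search(query, null, 10, sort);
ScoreDoc[] scoreDocs = docs.scoreDocs;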
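Suppose the index holds 5 documents whose f values are -2, 0, 1, 5 and 10. Sorting with SortField.INT as above returns them in the order -2, 0, 1, 5, 10.
Switching to SortField.STRING returns -2, 0, 1, 10, 5: the terms are then compared as text, character by character, so "10" sorts before "5".
The experiment shows that the SortField type matters, especially when sorting numeric (or date) fields.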
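The two orders can be reproduced without Lucene. The sketch below sorts the same five term texts once by integer value, which is the idea behind SortField.INT, and once with `String.compareTo`. `String.compareTo` is the comparison the SortField table documents for STRING_VAL; for plain ASCII digits like these it yields the same order the experiment observed with SortField.STRING. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java reproduction (not Lucene) of the two orders from the experiment:
// the same five term values sorted numerically and as strings.
class IntVsStringOrder {

    // Sorts the term texts by their integer value (the idea behind SortField.INT).
    static List<String> sortAsInt(List<String> terms) {
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(Comparator.comparingInt(Integer::parseInt));
        return sorted;
    }

    // Sorts the term texts character by character with String.compareTo.
    static List<String> sortAsString(List<String> terms) {
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("10", "-2", "5", "0", "1");
        System.out.println(sortAsInt(terms));    // [-2, 0, 1, 5, 10]
        System.out.println(sortAsString(terms)); // [-2, 0, 1, 10, 5]
    }
}
```

In the string order, "1" precedes "10" because it is a prefix of it, and "10" precedes "5" because the character '1' is smaller than '5'.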
PS:
Sort field types (SortField constants):
Field Summary

| Modifier and Type | Field | Description |
|---|---|---|
| static int | BYTE | Sort using term values as encoded Bytes. |
| static int | CUSTOM | Sort using a custom Comparator. |
| static int | DOC | Sort by document number (index order). |
| static int | DOUBLE | Sort using term values as encoded Doubles. |
| static SortField | FIELD_DOC | Represents sorting by document number (index order). |
| static SortField | FIELD_SCORE | Represents sorting by document score (relevancy). |
| static int | FLOAT | Sort using term values as encoded Floats. |
| static int | INT | Sort using term values as encoded Integers. |
| static int | LONG | Sort using term values as encoded Longs. |
| static int | SCORE | Sort by document score (relevancy). |
| static int | SHORT | Sort using term values as encoded Shorts. |
| static int | STRING | Sort using term values as Strings. |
| static int | STRING_VAL | Sort using term values as Strings, but comparing by value (using String.compareTo) for all comparisons. |
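Two entries in the table, SCORE and DOC, do not look at field terms at all. The toy model below illustrates them in plain Java; `Hit`, `BY_SCORE` and `BY_DOC` are hypothetical names. Following the full SortField Javadoc, SCORE puts higher scores first and DOC puts lower document numbers first. Breaking score ties by document number is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model (hypothetical, not Lucene code) of two sort types from the table:
// SCORE (higher relevance first) and DOC (lower document number first).
class ScoreVsDocOrder {

    static final class Hit {
        final int doc;      // document number, i.e. position in index order
        final float score;  // relevance score
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return "doc" + doc; }
    }

    // SCORE: highest score first; ties fall back to lower document number
    // (the tie-break is an assumption of this toy model).
    static final Comparator<Hit> BY_SCORE =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                      .thenComparingInt(h -> h.doc);

    // DOC: plain index order, lowest document number first.
    static final Comparator<Hit> BY_DOC = Comparator.comparingInt(h -> h.doc);

    // Returns a sorted copy; the input list is left unchanged.
    static List<Hit> sorted(List<Hit> hits, Comparator<Hit> order) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(3, 0.9f), new Hit(1, 0.4f), new Hit(2, 0.9f));
        System.out.println(sorted(hits, BY_SCORE)); // [doc2, doc3, doc1]
        System.out.println(sorted(hits, BY_DOC));   // [doc1, doc2, doc3]
    }
}
```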
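Note:
As mentioned above, INT and STRING produce different sort orders. Efficiency is the other concern: with large amounts of data, sort numeric fields with a suitable numeric type instead of using STRING for everything.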
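The efficiency point can be pictured with the sketch below. It illustrates the idea only; it is not Lucene's implementation and not a benchmark. Each document's numeric term is parsed once into a primitive `int[]` indexed by document number, and document numbers are then ordered by comparing ints rather than comparing term strings on every comparison. `NumericKeyCache` and its methods are hypothetical names.

```java
import java.util.Arrays;

// Illustration only (plain Java, not Lucene internals): parse each document's
// numeric term once into a primitive int[], then order document numbers by
// comparing ints instead of comparing term strings on every comparison.
class NumericKeyCache {

    // One int per document number, built once from the term texts.
    static int[] buildCache(String[] termPerDoc) {
        int[] cache = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            cache[doc] = Integer.parseInt(termPerDoc[doc]);
        }
        return cache;
    }

    // Returns document numbers ordered by their cached int value (ascending).
    static int[] sortDocs(int[] cache) {
        Integer[] docs = new Integer[cache.length];
        for (int i = 0; i < docs.length; i++) docs[i] = i;
        Arrays.sort(docs, (a, b) -> Integer.compare(cache[a], cache[b]));
        return Arrays.stream(docs).mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        String[] terms = {"10", "-2", "5", "0", "1"}; // terms of docs 0..4
        int[] cache = buildCache(terms);
        System.out.println(Arrays.toString(sortDocs(cache))); // [1, 3, 4, 2, 0]
    }
}
```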