• Extending Lucene search with a custom field-based sort


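    I recently had to change our product search so that results are ordered first by whether the product is in stock, and then by sales volume. Stock is stored as an integer quantity, so simply sorting by stock in descending order does not meet the requirement: it would rank products by how many units are left rather than by whether any are left. Lucene has no built-in support for this kind of business rule, but it can be added by extending its sorting with a custom comparator.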
     
     
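    import java.io.IOException;

    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.search.FieldComparator;
    import org.apache.lucene.search.FieldComparatorSource;
    import org.apache.lucene.util.Bits;

    // Sorts on a numeric stock field, treating every positive quantity as "in stock" (1) and everything else as 0.
    public class EmptyStockComparatorSource extends FieldComparatorSource {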
        @Override
        public FieldComparator<?> newComparator(String fieldname, int numHits, int sortPos, boolean reversed)
                throws IOException {
            return new LongComparator(numHits, fieldname, 0L);
        }
    
        public static class LongComparator extends FieldComparator.NumericComparator<Long> {
            private final long[] values;
            private long bottom;
            private long topValue;
    
            /**
             * Creates a new comparator based on {@link Long#compare} for {@code numHits}.
             * When a document has no value for the field, {@code missingValue} is substituted.
             */
            public LongComparator(int numHits, String field, Long missingValue) {
                super(field, missingValue);
                values = new long[numHits];
            }
    
            @Override
            protected void doSetNextReader(LeafReaderContext context) throws IOException {
                currentReaderValues = getNumericDocValues(context, field);
                if (missingValue != null) {
                    docsWithField = getDocsWithValue(context, field);
                    // optimization to remove unneeded checks on the bit interface:
                    if (docsWithField instanceof Bits.MatchAllBits) {
                        docsWithField = null;
                    }
                } else {
                    docsWithField = null;
                }
            }
    
            @Override
            public int compare(int slot1, int slot2) {
                return Long.compare(values[slot1], values[slot2]);
            }
    
            @Override
            public int compareBottom(int doc) {
                // TODO: there are sneaky non-branch ways to compute
                // -1/+1/0 sign
                long v2 = currentReaderValues.get(doc);
                // Test for v2 == 0 to save Bits.get method call for
                // the common case (doc has value and value is non-zero):
                if (docsWithField != null && v2 == 0 && !docsWithField.get(doc)) {
                    v2 = missingValue;
                }

                // Collapse to the same 0/1 flag that copy() stores, so that this agrees
                // with compare(bottomSlot, slot) as the comparator contract requires.
                return Long.compare(bottom, v2 > 0L ? 1L : 0L);
            }
    
            @Override
            public void copy(int slot, int doc) {
                long v2 = currentReaderValues.get(doc);
                // Test for v2 == 0 to save Bits.get method call for
                // the common case (doc has value and value is non-zero):
                if (docsWithField != null && v2 == 0 && !docsWithField.get(doc)) {
                    v2 = missingValue;
                }
    
                values[slot] = v2 > 0L ? 1L : 0L;
            }
    
            @Override
            public void setBottom(final int bottom) {
                this.bottom = values[bottom];
            }
    
            @Override
            public void setTopValue(Long value) {
                topValue = value;
            }
    
            @Override
            public Long value(int slot) {
                return Long.valueOf(values[slot]);
            }
    
            @Override
            public int compareTop(int doc) {
                long docValue = currentReaderValues.get(doc);
                // Test for docValue == 0 to save Bits.get method call for
                // the common case (doc has value and value is non-zero):
                if (docsWithField != null && docValue == 0 && !docsWithField.get(doc)) {
                    docValue = missingValue;
                }
                // topValue normally comes from value(slot), which is already a 0/1 flag,
                // so compare it against this document's flag rather than its raw stock.
                return Long.compare(topValue, docValue > 0L ? 1L : 0L);
            }
        }
    }
     
     
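    LongComparator is copied straight from the Lucene source and needs only small changes. The most important one is in copy(int slot, int doc): while copying the value to be compared, every positive stock quantity is stored as 1 and everything else as 0, so the sort produces the order we want.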
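
    The effect of that 0/1 mapping can be checked without Lucene. The following is only a minimal sketch in plain Java: the Product class, the StockFirstOrdering class and the sample data are made up for illustration and are not part of the original code. It orders a small list the way the two sort fields are meant to behave, in-stock products first and then higher sales volume first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for an indexed product; not part of the original code. */
final class Product {
    final int docId;
    final long stock;
    final int salesVolume;

    Product(int docId, long stock, int salesVolume) {
        this.docId = docId;
        this.stock = stock;
        this.salesVolume = salesVolume;
    }
}

final class StockFirstOrdering {
    /** Collapses a stock quantity to 1 (in stock) or 0 (out of stock), as copy() does. */
    static long stockFlag(long stock) {
        return stock > 0L ? 1L : 0L;
    }

    /** In-stock products first, then higher sales volume first. */
    static final Comparator<Product> IN_STOCK_THEN_SALES =
            Comparator.comparingLong((Product p) -> stockFlag(p.stock)).reversed()
                    .thenComparing(Comparator.comparingInt((Product p) -> p.salesVolume).reversed());

    /** Made-up sample data: (docId, stock, salesVolume). */
    static List<Product> sample() {
        return Arrays.asList(
                new Product(1, 0L, 500),
                new Product(2, 98391L, 10),
                new Product(3, 3L, 12),
                new Product(4, 0L, 1));
    }

    /** Returns the doc ids of the given products in the desired result order. */
    static List<Integer> sortedDocIds(List<Product> products) {
        List<Product> copy = new ArrayList<>(products);
        copy.sort(IN_STOCK_THEN_SALES);
        List<Integer> ids = new ArrayList<>();
        for (Product p : copy) {
            ids.add(p.docId);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Prints [3, 2, 1, 4]: docs 3 and 2 are in stock and ordered by sales,
        // docs 1 and 4 are out of stock and come last despite doc 1's high sales.
        System.out.println(sortedDocIds(sample()));
    }
}
```

    Sorting the same data by raw stock in descending order would put doc 2 (stock 98391) ahead of doc 3 (stock 3) even though doc 3 sells better, which is exactly the behaviour the custom comparator is meant to avoid.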
     
    The test case we used:
     
    Directory directory1 = FSDirectory.open(Paths.get(
            "/Users/xxx/develop/tools/solr-5.5.0/server/solr/product/data/index"));
    DirectoryReader directoryReader1 = DirectoryReader.open(directory1);
    IndexSearcher searcher1 = new IndexSearcher(directoryReader1);
    Sort sort1 = new Sort(new SortField("psfixstock", new EmptyStockComparatorSource(), true),
            new SortField("salesVolume", SortField.Type.INT, true));

    TopFieldDocs topDocs1 = searcher1.search(new TermQuery(new Term("gender_text", "女士")), 10, sort1);
    for (ScoreDoc scoreDoc : topDocs1.scoreDocs) {
        int doc = scoreDoc.doc;
        Document document = searcher1.doc(doc);
        System.out.println(String.format("docId=%s, psfixstock=%s, salesVolumn=%s", doc, document.get("psfixstock"), document.get("salesVolume")));
    }
     
     
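    To sort with it, the comparator source has to be added to the Sort object. Running the test, however, failed with an error saying that the docvalues type of the field is wrong: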
     
    Exception in thread "main" java.lang.IllegalStateException: unexpected docvalues type NONE for field 'psfixstock' (expected=NUMERIC). Use UninvertingReader or index with docvalues.
        at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
        at org.apache.lucene.index.DocValues.getNumeric(DocValues.java:227)
        at org.apache.lucene.search.FieldComparator$NumericComparator.getNumericDocValues(FieldComparator.java:167)
        at com.zp.solr.handler.component.EmptyStockComparatorSource$LongComparator.doSetNextReader(EmptyStockComparatorSource.java:36)
        at org.apache.lucene.search.SimpleFieldComparator.getLeafComparator(SimpleFieldComparator.java:36)
        at org.apache.lucene.search.FieldValueHitQueue.getComparators(FieldValueHitQueue.java:183)
        at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector.getLeafCollector(TopFieldCollector.java:164)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:812)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:535)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:744)
        at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:729)
        at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:671)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:577)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:627)
        at com.zp.solr.handler.component.EmptyStockSortingTest.main(EmptyStockSortingTest.java:57)
    
     
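    After some digging (reference: http://qindongliang.iteye.com/blog/2297280), the cause turned out to be that the field we sort on did not have docValues enabled. In Solr, a field used for this kind of custom sorting needs docValues="true" in the schema, followed by a reindex (here using full-import):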
     
       
    <field name="psfixstock" type="tint" indexed="true" stored="true" multiValued="false" docValues="true" />
     
     
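    The index has to be rebuilt before this works. Note that Solr and Elasticsearch take different approaches here: Solr defaults to docValues=false, while ES does the opposite. Indexing with doc values has some impact on performance, so use it with care.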
     
    In plain Lucene, a numeric doc-values field (NumericDocValuesField) has to be added to each document at index time; otherwise the error above occurs:
     
    document.add(new NumericDocValuesField("stock", stock));
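
    To make it clearer what the comparator reads from such a field, here is a toy model of a numeric doc-values column in plain Java. It is only an illustration under simplifying assumptions, not how Lucene stores doc values: ToyNumericColumn and its methods are hypothetical names. The point it shows is the one the comparator code relies on: a raw lookup yields 0 for a document without a value, and a separate bit set is needed to tell "no value" apart from a real 0 before substituting missingValue.

```java
import java.util.BitSet;

/** Toy model of a numeric per-document column; an illustration only, not Lucene's implementation. */
final class ToyNumericColumn {
    private final long[] values;        // one slot per document id
    private final BitSet docsWithField; // marks documents that actually have a value

    ToyNumericColumn(int maxDoc) {
        this.values = new long[maxDoc];
        this.docsWithField = new BitSet(maxDoc);
    }

    /** Records a value for the given document id. */
    void set(int doc, long value) {
        values[doc] = value;
        docsWithField.set(doc);
    }

    /** Raw lookup: documents without a value read as 0. */
    long get(int doc) {
        return values[doc];
    }

    /** Lookup that substitutes missingValue, using the same check as the comparator above. */
    long getOrMissing(int doc, long missingValue) {
        long v = values[doc];
        // Only consult the bit set when the raw value is 0, the one ambiguous case.
        if (v == 0 && !docsWithField.get(doc)) {
            v = missingValue;
        }
        return v;
    }

    public static void main(String[] args) {
        ToyNumericColumn stock = new ToyNumericColumn(3);
        stock.set(0, 42L); // doc 0 has stock 42
        stock.set(1, 0L);  // doc 1 has an explicit stock of 0
        // doc 2 has no value at all
        System.out.println(stock.getOrMissing(0, -1L)); // 42
        System.out.println(stock.getOrMissing(1, -1L)); // 0
        System.out.println(stock.getOrMissing(2, -1L)); // -1
    }
}
```

    In EmptyStockComparatorSource the missing value is 0L, so a document without a stock value ends up with flag 0 and is treated as out of stock.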
    
     
     
    The final results are now sorted the way we need:
     
    docId=2629, psfixstock=98391, salesVolumn=4685
    docId=305, psfixstock=991, salesVolumn=14
    docId=16762, psfixstock=3, salesVolumn=12
    docId=22350, psfixstock=993, salesVolumn=10
    docId=29021, psfixstock=11076, salesVolumn=10
    docId=3635, psfixstock=61, salesVolumn=6
    docId=4111, psfixstock=1104, salesVolumn=5
    docId=10608, psfixstock=4395, salesVolumn=5
    docId=4874, psfixstock=4975, salesVolumn=4
    docId=4911, psfixstock=6, salesVolumn=4
    docId=15071, psfixstock=998, salesVolumn=4
    docId=4837, psfixstock=9, salesVolumn=3
    docId=4860, psfixstock=1002, salesVolumn=3
    docId=3749, psfixstock=2240, salesVolumn=2
    docId=4109, psfixstock=1493, salesVolumn=2
    docId=15068, psfixstock=1000, salesVolumn=2
    docId=25901, psfixstock=11110, salesVolumn=2
    docId=3688, psfixstock=21, salesVolumn=1
    docId=4912, psfixstock=17, salesVolumn=1
    docId=5035, psfixstock=2, salesVolumn=1
    docId=11835, psfixstock=8, salesVolumn=1
    docId=12044, psfixstock=1, salesVolumn=1
    docId=13508, psfixstock=2, salesVolumn=1
    docId=20019, psfixstock=1, salesVolumn=1
    docId=20884, psfixstock=100000, salesVolumn=1
    docId=22620, psfixstock=1, salesVolumn=1
    docId=24128, psfixstock=1, salesVolumn=1
    docId=0, psfixstock=2, salesVolumn=0
    docId=9, psfixstock=1, salesVolumn=0
    docId=11, psfixstock=4, salesVolumn=0
    docId=15, psfixstock=3, salesVolumn=0
    docId=20, psfixstock=4, salesVolumn=0
    docId=23, psfixstock=2, salesVolumn=0
    docId=24, psfixstock=5, salesVolumn=0
    docId=25, psfixstock=7, salesVolumn=0
    docId=35, psfixstock=2, salesVolumn=0
    docId=53, psfixstock=2, salesVolumn=0
     
     
     
     
  • Original post: https://www.cnblogs.com/mmaa/p/5789862.html