• Learn Lucene with me step by step (18): examples of index-time join and query-time join in Lucene


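    Anyone familiar with SQL knows that a query can use a join: objects that are related to one another are queried together so that multi-dimensional data can be assembled. This is a convenient way to query, and it resembles getting things done through connections in real life: to accomplish something, we first go to an acquaintance, then through that acquaintance reach someone else, and in the end find the person we wanted to contact.

    It has a bit of an "everything in the world is connected" feel to it.

    Lucene's join package provides both index-time join and query-time join.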

    Index-time join

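    Roughly speaking, index-time join supports joining at search time, with the joined documents indexed as a single document block through IndexWriter.addDocuments(). It is handy for normalized content such as XML documents or database tables. For the multi-table relationships found in databases in particular, the rows tied together by the relating column have to be indexed together as one block for the join to work.

    With index-time join, the documents in the index are divided into parent documents (the last document of each index block) and child documents (all documents other than the parents). Because Lucene does not record any information about document blocks, we have to supply a Filter that identifies the parent documents.

    At search time, we use ToParentBlockJoinQuery to remap/join the matches of a child query up into the parent document space.

    If we only care about the parent documents that match, any collector can be used to collect them; if we also want the child documents that matched for each parent, we need to search with ToParentBlockJoinCollector. Once the search is done, ToParentBlockJoinCollector.getTopGroups() returns the matching TopGroups.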
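
    To make the parent/child layout more concrete, here is a minimal sketch in plain Java with no Lucene dependency. It only models the idea described above: parents are marked in a bit set, each parent is the last document of its block, and a matching child is remapped to the next marked document at or after its own position. The class name BlockJoinSketch and the method toParents are made up for illustration; this is not how the Lucene classes are actually called.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class BlockJoinSketch {

    /**
     * Maps every matching child doc id to its parent doc id.
     * Assumes each block is stored as [child, child, ..., parent],
     * so the parent is the next set bit at or after the child's id.
     */
    static List<Integer> toParents(BitSet parentBits, int[] matchingChildren) {
        List<Integer> parents = new ArrayList<>();
        for (int child : matchingChildren) {
            int parent = parentBits.nextSetBit(child);
            // Skip children without a parent and avoid reporting a parent twice.
            if (parent >= 0 && !parents.contains(parent)) {
                parents.add(parent);
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        // Two blocks in index order: [child 0, child 1, parent 2] and [child 3, parent 4].
        BitSet parentBits = new BitSet();
        parentBits.set(2);
        parentBits.set(4);

        // Suppose the child query matched documents 1 and 3.
        System.out.println(toParents(parentBits, new int[] {1, 3})); // prints [2, 4]
    }
}
```

    The bit set plays the role of the Filter mentioned above: without it there would be no way to tell where one block ends and the next begins.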

    Query-time joins

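    Query-time join is based on indexed terms and is implemented in two passes:

    • The first pass collects all terms of fromField from the documents that match fromQuery.
    • The second pass returns every document whose toField contains one of the terms collected in the first pass.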
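
    The two passes can be sketched in plain Java as follows. This is a simplified model under my own assumptions, not Lucene code: documents are plain maps with one value per field, fromQuery is a predicate, and the names TwoPassJoinSketch, join, authorId and writtenBy are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class TwoPassJoinSketch {

    /** Returns the positions of the documents selected by a two-pass term join. */
    static List<Integer> join(List<Map<String, String>> docs,
                              Predicate<Map<String, String>> fromQuery,
                              String fromField, String toField) {
        // Pass 1: collect the fromField values of every document matching fromQuery.
        Set<String> terms = new HashSet<>();
        for (Map<String, String> doc : docs) {
            if (fromQuery.test(doc) && doc.containsKey(fromField)) {
                terms.add(doc.get(fromField));
            }
        }
        // Pass 2: keep every document whose toField value is among the collected terms.
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            String value = docs.get(i).get(toField);
            if (value != null && terms.contains(value)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("authorId", "a1", "name", "alice"),
            Map.of("bookId", "b1", "writtenBy", "a1"),
            Map.of("bookId", "b2", "writtenBy", "a2"),
            Map.of("bookId", "b3", "writtenBy", "a1"));

        // Join from the author named "alice" to the books written by her.
        List<Integer> hits = join(docs, d -> "alice".equals(d.get("name")), "authorId", "writtenBy");
        System.out.println(hits); // prints [1, 3]
    }
}
```

    Note that the document matched in the first pass is not part of the result unless its own toField also carries one of the collected terms.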

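    Query-time join takes the following input parameters:

    • fromField: the name of the from field, that is, the field to join from;
    • fromQuery: the query that selects the from-side documents, usually the user's query;
    • multipleValuesPerDocument: whether fromField holds more than one value per document;
    • scoreMode: defines how scores are carried over to the other side of the join. If scoring does not matter, set it to ScoreMode.None; scoring is then skipped, which is faster and uses less memory;
    • toField: the name of the to field, that is, the field to join to.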
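
    To give a feel for what "carrying scores over" can mean, here is a rough plain-Java model. It rests on an assumption of mine: when several from-side documents share the same join term, their scores are combined per term according to the score mode, and the to-side documents matching that term receive the combined value. The enum Mode is a hypothetical stand-in, not Lucene's ScoreMode, and the real implementation is more involved.

```java
import java.util.HashMap;
import java.util.Map;

public class ScoreModeSketch {

    /** Hypothetical stand-in for Lucene's ScoreMode; not the library enum. */
    enum Mode { NONE, MAX, TOTAL, AVG }

    /** Combines from-side scores per join term; terms[i] pairs with scores[i]. */
    static Map<String, Float> aggregate(String[] terms, float[] scores, Mode mode) {
        Map<String, Float> result = new HashMap<>();
        if (mode == Mode.NONE) {
            // No scores are tracked at all, which is why this mode is the cheapest.
            return result;
        }
        Map<String, Float> sum = new HashMap<>();
        Map<String, Float> max = new HashMap<>();
        Map<String, Integer> count = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            sum.merge(terms[i], scores[i], Float::sum);
            max.merge(terms[i], scores[i], Float::max);
            count.merge(terms[i], 1, Integer::sum);
        }
        for (String term : sum.keySet()) {
            float value;
            switch (mode) {
                case MAX: value = max.get(term); break;
                case TOTAL: value = sum.get(term); break;
                default: value = sum.get(term) / count.get(term); // AVG
            }
            result.put(term, value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two from-side documents share the term "4"; one has the term "1".
        String[] terms = {"4", "4", "1"};
        float[] scores = {1.0f, 3.0f, 2.0f};
        System.out.println(aggregate(terms, scores, Mode.MAX).get("4"));   // prints 3.0
        System.out.println(aggregate(terms, scores, Mode.TOTAL).get("4")); // prints 4.0
        System.out.println(aggregate(terms, scores, Mode.AVG).get("4"));   // prints 2.0
    }
}
```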

    A query-time join is typically implemented along these lines:

       String fromField = "from"; // Name of the from field
       boolean multipleValuesPerDocument = false; // Set to true only when your fromField has multiple values per document in your index
       String toField = "to"; // Name of the to field
       ScoreMode scoreMode = ScoreMode.Max; // Defines how the scores are translated into the other side of the join.
       Query fromQuery = new TermQuery(new Term("content", searchTerm)); // Query executed to collect from values to join to the to values
     
       Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
       TopDocs topDocs = toSearcher.search(joinQuery, 10); // Note: toSearcher can be the same as the fromSearcher
       // Render topDocs...

    Query example

    Here we mock up seven documents (numbered 0 to 6). The sample code is as follows:

    package com.lucene.index.test;
    
    import java.nio.file.Paths;
    
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.SortedDocValuesField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.join.JoinUtil;
    import org.apache.lucene.search.join.ScoreMode;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.BytesRef;
    import org.junit.Test;
    
    public class TestJoin {
    	@Test
    	public void testSimple() throws Exception {
    	    final String idField = "id";
    	    final String toField = "productId";
    
    	    Directory dir = FSDirectory.open(Paths.get("index"));
    	    Analyzer analyzer = new StandardAnalyzer();
    	    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    	    config.setOpenMode(OpenMode.CREATE);
    	    IndexWriter w = new IndexWriter(dir, config);
    
    	    // 0
    	    Document doc = new Document();
    	    doc.add(new TextField("description", "random text", Field.Store.YES));
    	    doc.add(new TextField("name", "name1", Field.Store.YES));
    	    doc.add(new TextField(idField, "1", Field.Store.YES));
    	    doc.add(new SortedDocValuesField(idField, new BytesRef("1")));
    	    
    	    w.addDocument(doc);
    
    	    // 1
    	    Document doc1 = new Document();
    	    doc1.add(new TextField("price", "10.0", Field.Store.YES));
    	    doc1.add(new TextField(idField, "2", Field.Store.YES));
    	    doc1.add(new SortedDocValuesField(idField, new BytesRef("2")));
    	    doc1.add(new TextField(toField, "1", Field.Store.YES));
    	    doc1.add(new SortedDocValuesField(toField, new BytesRef("1")));
    	    
    	    w.addDocument(doc1);
    
    	    // 2
    	    Document doc2 = new Document();
    	    doc2.add(new TextField("price", "20.0", Field.Store.YES));
    	    doc2.add(new TextField(idField, "3", Field.Store.YES));
    	    doc2.add(new SortedDocValuesField(idField, new BytesRef("3")));
    	    doc2.add(new TextField(toField, "1", Field.Store.YES));
    	    doc2.add(new SortedDocValuesField(toField, new BytesRef("1")));
    	    
    	    w.addDocument(doc2);
    
    	    // 3
    	    Document doc3 = new Document();
    	    doc3.add(new TextField("description", "more random text", Field.Store.YES));
    	    doc3.add(new TextField("name", "name2", Field.Store.YES));
    	    doc3.add(new TextField(idField, "4", Field.Store.YES));
    	    doc3.add(new SortedDocValuesField(idField, new BytesRef("4")));
    	    
    	    w.addDocument(doc3);
    	    
    
    	    // 4
    	    Document doc4 = new Document();
    	    doc4.add(new TextField("price", "10.0", Field.Store.YES));
    	    doc4.add(new TextField(idField, "5", Field.Store.YES));
    	    doc4.add(new SortedDocValuesField(idField, new BytesRef("5")));
    	    doc4.add(new TextField(toField, "4", Field.Store.YES));
    	    doc4.add(new SortedDocValuesField(toField, new BytesRef("4")));
    	    w.addDocument(doc4);
    
    	    // 5
    	    Document doc5 = new Document();
    	    doc5.add(new TextField("price", "20.0", Field.Store.YES));
    	    doc5.add(new TextField(idField, "6", Field.Store.YES));
    	    doc5.add(new SortedDocValuesField(idField, new BytesRef("6")));
    	    doc5.add(new TextField(toField, "4", Field.Store.YES));
    	    doc5.add(new SortedDocValuesField(toField, new BytesRef("4")));
    	    w.addDocument(doc5);
    	    
    	    //6
    	    Document doc6 = new Document();
    	    doc6.add(new TextField(toField, "4", Field.Store.YES));
    	    doc6.add(new SortedDocValuesField(toField, new BytesRef("4")));
    	    w.addDocument(doc6);
    	    w.commit();
    	    w.close();
    	    IndexReader reader = DirectoryReader.open(dir);
    	    IndexSearcher indexSearcher = new IndexSearcher(reader);
    	    
    
    	    // Search for product: name2 -> id 4 -> documents whose productId is 4
    	    Query joinQuery = JoinUtil.createJoinQuery(idField, false, toField, new TermQuery(new Term("name", "name2")), indexSearcher, ScoreMode.None);
    	    System.out.println(joinQuery);
    	    TopDocs result = indexSearcher.search(joinQuery, 10);
    	    System.out.println("Matching documents: " + result.totalHits);
    	    

    	    // name1 -> id 1 -> documents whose productId is 1
    	    joinQuery = JoinUtil.createJoinQuery(idField, false, toField, new TermQuery(new Term("name", "name1")), indexSearcher, ScoreMode.None);
    	    result = indexSearcher.search(joinQuery, 10);
    	    System.out.println("Matching documents: " + result.totalHits);
    	    // Search for offer: id 5 -> productId 4 -> the document whose id is 4
    	    joinQuery = JoinUtil.createJoinQuery(toField, false, idField, new TermQuery(new Term("id", "5")), indexSearcher, ScoreMode.None);
    	    result = indexSearcher.search(joinQuery, 10);
    	    System.out.println("Matching documents: " + result.totalHits);

    	    indexSearcher.getIndexReader().close();
    	    dir.close();
    	  }

    }


    Apart from the line printed for joinQuery, the program outputs the following:

    Matching documents: 3
    Matching documents: 2
    Matching documents: 1
    

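    Take the first query as an example:

    The search first uses the condition name=name2 to find the document doc3. The join then reads the fromField (id) of doc3, whose value is 4, so the condition becomes productId:4, and every document whose productId is 4 is returned: doc4, doc5 and doc6. The result is therefore exactly 3, as expected.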

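    "Learn Lucene with me step by step" is a summary of my recent work on Lucene indexing. If you have questions, contact me on QQ: 891922381. I have also created a QQ group: 106570134 (lucene, solr, netty, hadoop), where we can discuss together. I aim to post one article a day and hope you keep following; there will be pleasant surprises.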


    
  • Original article: https://www.cnblogs.com/blfbuaa/p/6890546.html