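Lucene is a full-text search engine toolkit from Apache. It stores collected data in an index and later retrieves results from that index according to the query conditions. The index can be kept in memory or on disk.
This post is my study notes, based mainly on the following blog post: https://blog.csdn.net/bskfnvjtlyzmv867/article/details/80914156
The main workflow is: collect the data, convert it into index documents, and store those documents in the index. At query time, the index is searched and the matching data is returned.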
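To make the collect, index, and query steps concrete, here is a minimal sketch of an inverted index in plain Java. It is not how Lucene is implemented internally; the class name `TinyInvertedIndex` and its methods are made up for this illustration, and the "analysis" step is just lower-casing and splitting on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index for illustration only; NOT Lucene's implementation.
class TinyInvertedIndex {
    // term -> sorted ids of the documents whose title contains that term
    private final Map<String, TreeSet<Long>> postings = new HashMap<>();
    // id -> original title, so a hit can be turned back into data
    private final Map<Long, String> stored = new HashMap<>();

    // "Analysis": lower-case the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Indexing: for every term in the title, remember which document it came from.
    void add(long id, String title) {
        stored.put(id, title);
        for (String term : tokenize(title)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Searching: look up the first term of the keyword directly,
    // instead of scanning every document.
    List<Long> search(String keyword) {
        List<String> terms = tokenize(keyword);
        if (terms.isEmpty()) {
            return Collections.emptyList();
        }
        TreeSet<Long> ids = postings.get(terms.get(0));
        return ids == null ? Collections.<Long>emptyList() : new ArrayList<Long>(ids);
    }

    // Returns the stored title for a document id, or null if the id is unknown.
    String title(long id) {
        return stored.get(id);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1L, "Red phone case");
        index.add(2L, "Blue laptop bag");
        index.add(3L, "Phone charger");
        for (long id : index.search("phone")) {
            System.out.println(id + " " + index.title(id));
        }
    }
}
```

The point of the structure is that a query becomes a map lookup on the term rather than a scan over all stored records. Lucene adds analyzers, scoring, and persistent segment files on top of the same basic idea.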
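The example below stores the data from the Product table in the index and then queries it through the index. For the jar dependencies, see the original blog post; the Lucene version I used is 4.7.
Create the entity class Product: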
public class Product {
    private Long id;
    private String title;
    private String sellPoint;

    // getters and setters omitted
}
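Next, convert a Product entity into a Lucene Document and write it to the index. The Product data would normally be loaded from the database and then passed to this method; the database query logic is omitted here.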
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import entity.Product;

public class ProductRepository {

    public void createIndex(Product product) {
        // StringField is indexed as a single token (exact match);
        // TextField is run through the analyzer for full-text search.
        Field id = new StringField("id", product.getId().toString(), Field.Store.YES);
        Field title = new TextField("title", product.getTitle(), Field.Store.YES);
        Field sellPoint = new TextField("sellPoint", product.getSellPoint(), Field.Store.YES);

        Document document = new Document();
        document.add(id);
        document.add(title);
        document.add(sellPoint);

        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47, analyzer);

        // Location of the index on disk.
        Path path = Paths.get("D:/develop/workspace/slem_compass/data");

        // try-with-resources closes the writer and the directory even if addDocument fails.
        try (Directory directory = FSDirectory.open(path.toFile());
             IndexWriter indexWriter = new IndexWriter(directory, config)) {
            indexWriter.addDocument(document);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
In the code above, Path is the location of the index on disk; here I put it in a folder on the D: drive.
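Since the same hard-coded path is needed both for writing and for searching, one option is to resolve it in a single place. The sketch below is only a suggestion: the class `IndexLocation` and the system property name `lucene.index.dir` are hypothetical names chosen for this example, not anything defined by Lucene or by the original post.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves the on-disk index directory in one place.
class IndexLocation {
    // Assumed property name, used only in this example.
    static final String PROPERTY = "lucene.index.dir";

    // Returns the directory named by the system property, or the fallback
    // when the property is unset or blank. Creates the directory if missing.
    static Path resolve(String fallback) {
        String configured = System.getProperty(PROPERTY);
        boolean useFallback = configured == null || configured.trim().isEmpty();
        String location = useFallback ? fallback : configured.trim();
        Path dir = Paths.get(location).toAbsolutePath().normalize();
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot create index directory " + dir, e);
        }
        return dir;
    }

    public static void main(String[] args) {
        // Run with -Dlucene.index.dir=... to override the fallback "data" folder.
        System.out.println(resolve("data"));
    }
}
```

With Lucene 4.7, the result could then be passed on as `FSDirectory.open(IndexLocation.resolve("data").toFile())`, so the indexing and the search code cannot drift apart.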
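So how do we query data from the index? I wrote a Servlet: the user submits a search keyword, the servlet reads it from the request, and then searches the index with it. A main method or a test class would work just as well.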
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

@WebServlet("/search")
public class SearchServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    public SearchServlet() {
        super();
    }

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        request.setCharacterEncoding("utf-8");

        // Use the same analyzer as at index time, and search the "title" field by default.
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        QueryParser parser = new QueryParser(Version.LUCENE_47, "title", analyzer);

        String title = request.getParameter("title");
        System.out.println("");
        System.out.println("title: " + title);

        // Skip the search when no keyword was submitted; parsing null would fail.
        if (title != null && !title.trim().isEmpty()) {
            Path path = Paths.get("D:/develop/workspace/slem_compass/data");
            // try-with-resources closes the reader and the directory even if the search fails.
            try (Directory directory = FSDirectory.open(path.toFile());
                 IndexReader reader = DirectoryReader.open(directory)) {
                Query query = parser.parse(title);
                IndexSearcher indexSearcher = new IndexSearcher(reader);

                // Fetch the 10 best-scoring hits.
                TopDocs topDocs = indexSearcher.search(query, 10);
                ScoreDoc[] scoreDocs = topDocs.scoreDocs;
                for (ScoreDoc scoreDoc : scoreDocs) {
                    int docID = scoreDoc.doc;
                    Document doc = indexSearcher.doc(docID);
                    System.out.println(doc.get("id") + " " + doc.get("title") + " "
                            + doc.get("sellPoint"));
                }
                System.out.println("");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        response.setContentType("text/html;charset=utf-8");
        response.getWriter().append("Served at: ").append(request.getContextPath());
    }

    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        doGet(request, response);
    }
}
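The query side likewise opens the index on the D: drive and then searches it with the submitted keyword.
That completes both storing data in the index and querying it. Index querying is a large topic; the demo above is only a simple example that shows the general principle, and I will add more on querying later.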