• Lucene实战之初体验


    前言

         最早做非结构化数据搜索时用的还是lucene.net,一直说在学习java的同时把lucene这块搞一搞,这拖了2年多了,终于开始搞这块了。

    开发环境

        idea2016、lucene6.0、jdk1.8

    使用lucene准备条件

    1、pom.xml

    2、测试数据。  我从博客园首页拿了几条数据,直接写了个伪Dao返回一个List<Blog>

    public List<Blog> listBlogs(){
            List<Blog> list = new ArrayList<Blog>();
    
            Blog b1=new Blog(6538488,"风过无痕-唐","你真的了解volatile吗,关于volatile的那些事","http://www.cnblogs.com/tangyanbo/p/6538488.html");
            Blog b2=new Blog(6541312,"布鲁克石","TypeScript设计模式之职责链、状态","http://www.cnblogs.com/brookshi/p/6541312.html");
            Blog b7=new Blog(6547215,"嵘么么","轻松理解JavaScript闭包","http://www.cnblogs.com/rongmm/p/6547215.html");
            Blog b3=new Blog(6541220,"一线码农","使用Task的一些知识优化了一下同事的多线程协作取消的一串代码","http://www.cnblogs.com/huangxincheng/p/6541220.html");
    
            Blog b4=new Blog(6516387,"xybaby","并发与同步、信号量与管程、生产者消费者问题","http://www.cnblogs.com/xybaby/p/6516387.html");
            Blog b5=new Blog(6532713,"王福朋","深入理解 JavaScript 异步系列(4)—— Generator","http://www.cnblogs.com/wangfupeng1988/p/6532713.html");
            Blog b6=new Blog(6517788,"欢醉","入坑系列之HAProxy负载均衡","http://www.cnblogs.com/zhangs1986/p/6517788.html");
    
            list.add(b1);
            list.add(b2);
            list.add(b3);
            list.add(b4);
            list.add(b5);
            list.add(b6);
            list.add(b7);
    
            return list;
        }
    

    IndexSearcher搜索

    1、搜索体验,基于servlet开发,直接接受路径参数key

    2、先写入索引

    public class IndexWriterServlet extends HttpServlet {
    
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
            String index_path="d:\indexDir";
            Directory dir=null;
            Analyzer analyzer=null;
            IndexWriterConfig config=null;
            IndexWriter indexWriter=null;
            try {
                dir =new SimpleFSDirectory(Paths.get(index_path));
    
                analyzer=new StandardAnalyzer();
                config =new IndexWriterConfig(analyzer);
    
                indexWriter =new IndexWriter(dir,config);
                BlogDao blogDao=new BlogDao();
                List<Blog> list = blogDao.listBlogs();
    
                for (Blog blog:list){
    
                    Document document=new Document();
                    document.add(new Field("id",blog.getId().toString(),TextField.TYPE_STORED));
                    document.add(new Field("author",blog.getAuthor(),TextField.TYPE_STORED));
                    document.add(new Field("title",blog.getTitle(),TextField.TYPE_STORED));
                    document.add(new Field("url",blog.getUrl(),TextField.TYPE_STORED));
    
                    indexWriter.addDocument(document);
                }
                indexWriter.commit();
                System.out.println("====索引创建完成====");
            }catch (Exception ex){
                ex.printStackTrace();
            }finally {
                if(indexWriter !=null){
                    indexWriter.close();
                }
            }
        }
    
        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
            doGet(req,resp);
        }
    }
    

    2、查询索引数据

    public class IndexReaderServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
            Directory dir =null;
            DirectoryReader reader =null;
            IndexSearcher searcher=null;
            String index_path="d:\indexDir";
            try{
                //dir =new RAMDirectory();
                dir =new SimpleFSDirectory(Paths.get(index_path));
                reader = DirectoryReader.open(dir);
                searcher=new IndexSearcher(reader);
    
                String key = req.getParameter("key");
                Query query = new TermQuery(new Term("title",key));
    
                TopDocs rs=searcher.search(query,100);
                resp.setContentType("text/html");
                resp.setCharacterEncoding("UTF-8");
                PrintWriter out = resp.getWriter();
    
                StringBuilder strHtml =new StringBuilder();
                for (int i = 0; i < rs.scoreDocs.length; i++) {
                    Document firstHit = searcher.doc(rs.scoreDocs[i].doc);
                    strHtml.append("title:"+firstHit.getField("title").stringValue()+"<br>");
                    strHtml.append("author:"+firstHit.getField("author").stringValue()+"<br>");
                    strHtml.append("<a href='"+firstHit.getField("url").stringValue()+"'>"+ firstHit.getField("url").stringValue()+"</a><br>");
                    strHtml.append("====================================================<br>");
                }
    
    
                out.write(strHtml.toString());
    
            }catch (Exception ex){
                ex.printStackTrace();
            }finally {
                reader.close();
            }
        }
    
        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
            doGet(req,resp);
        }
    }
    

      

    总结

         读写索引用到的几个关键类:IndexWriter、IndexSearcher、Directory、DirectoryReader、Document、TermQuery、Field

  • 相关阅读:
    回溯法 | 子集树:装载问题
    再谈01背包 | 使用【跳跃点集合】代替【求解矩阵】解题
    线性代数 | 行列式的计算
    慢,也是一种态度
    回溯法 | 图的m着色问题
    回溯法 | 旅行商问题(TSP问题)
    回溯法 | n皇后问题
    基数排序 RadixSort
    03.父工程pom、整合测试、SpringBootApplication注解
    02.自定义banner、全局配置文件、@Value获取自定义配置、@ConfigurationProperties、profiles配置
  • 原文地址:https://www.cnblogs.com/sword-successful/p/6547965.html
Copyright © 2020-2023  润新知