• ElasticSearch: How to Retrieve All Data in a Single Query, Based on Scroll


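    By default, an Elasticsearch query returns only 10 hits. Setting from and size gives you pagination (see Appendix 3), but only while from + size <= 10,000: index.max_result_window defaults to 10,000, and from + size must not exceed it. To fetch every result you therefore need Scroll, which retrieves the data a batch at a time over multiple requests.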
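
    The from + size limit above is simple arithmetic, and a small sketch makes the cut-off concrete. The class and method names below (ResultWindow, isReachable, reachablePages) are hypothetical helpers written for illustration, not part of any Elasticsearch API, and the page size of 1000 is just an example value.

```java
// Illustrative helper only; none of these names come from Elasticsearch.
class ResultWindow {
    // Default value of index.max_result_window, as stated in the text above.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** True when a page starting at 'from' with 'size' hits fits inside the window. */
    public static boolean isReachable(int from, int size, int maxResultWindow) {
        return Math.addExact(from, size) <= maxResultWindow;
    }

    /** Number of full pages of 'size' hits that from + size paging can reach. */
    public static int reachablePages(int size, int maxResultWindow) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        return maxResultWindow / size;
    }

    public static void main(String[] args) {
        int size = 1000; // example page size
        System.out.println(isReachable(9000, size, DEFAULT_MAX_RESULT_WINDOW));  // true
        System.out.println(isReachable(10000, size, DEFAULT_MAX_RESULT_WINDOW)); // false
        System.out.println(reachablePages(size, DEFAULT_MAX_RESULT_WINDOW));     // 10
    }
}
```

    With a page size of 1000, only the first 10 pages (10,000 hits) are reachable this way; anything beyond that has to come from Scroll.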

    Please credit the source when reposting: https://www.cnblogs.com/NaughtyCat/p/how-to-search-all-results-once-in-es.html

    • Scroll is comparable to a cursor in a traditional database. The code fragment is as follows:
        // Open the scroll: each batch returns at most SEARCH_HITS_SIZE hits
        SearchResponse scrollResp = client.prepareSearch(availableIndices)
                .setTypes(type)
                .setScroll(new TimeValue(60000)) // keep the search context alive for 1 minute
                .setQuery(boolQueryBuilder)
                .setSize(SEARCH_HITS_SIZE)
                .get();
        // Scroll until no hits are returned
        do {
            for (SearchHit hit : scrollResp.getHits().getHits()) {
                tmpJsonList.add((JSONObject) JSONValue.parse(hit.getSourceAsString()));
            }
            jsonList.addAll(tmpJsonList);
            tmpJsonList.clear();
            // Fetch the next batch with the latest scroll ID and renew the keep-alive
            scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                    .setScroll(new TimeValue(60000))
                    .execute().actionGet();
        } while (scrollResp.getHits().getHits().length != 0);

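    The time passed to setScroll() is the timeout for handling one batch of setSize() hits, that is, the longest that processing a single page may take before the next scroll request arrives. The code above sets it to 1 minute (search for "Scroll context" for details). With large data sets the timeout has to be longer: for my roughly 2 billion documents it had to be at least 3 minutes, otherwise an exception is thrown: ElasticSearch: SearchContextMissingException[No search context found for id

    Every response carries a scroll ID, available from scrollResp.getScrollId() (shown in a screenshot in the original post). Always pass the most recently received ID to the next scroll request.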
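
    To make the timeout behaviour described above concrete, here is a toy in-memory model of a scroll cursor with a keep-alive. It is a sketch under loud assumptions: ToyScroll, next and drain are hypothetical names, time is simulated with plain counters, and nothing here is Elasticsearch code or reflects how Elasticsearch manages search contexts internally. It only shows why a batch that takes longer to process than the keep-alive causes the next request to fail.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of a scroll cursor; NOT Elasticsearch code. */
class ToyScroll {
    private final List<Integer> docs;
    private final int batchSize;
    private int position = 0;
    private long expiresAtMillis;

    ToyScroll(List<Integer> docs, int batchSize, long keepAliveMillis, long nowMillis) {
        this.docs = docs;
        this.batchSize = batchSize;
        this.expiresAtMillis = nowMillis + keepAliveMillis;
    }

    /** Returns the next batch and renews the keep-alive; throws if the context has expired. */
    List<Integer> next(long keepAliveMillis, long nowMillis) {
        if (nowMillis > expiresAtMillis) {
            throw new IllegalStateException("No search context found");
        }
        int end = Math.min(position + batchSize, docs.size());
        List<Integer> batch = new ArrayList<>(docs.subList(position, end));
        position = end;
        expiresAtMillis = nowMillis + keepAliveMillis;
        return batch;
    }

    /**
     * Drains the cursor, pretending each batch takes processingMillis to handle.
     * Returns the number of documents read.
     */
    public static int drain(List<Integer> docs, int batchSize,
                            long keepAliveMillis, long processingMillis) {
        long now = 0; // simulated clock
        ToyScroll scroll = new ToyScroll(docs, batchSize, keepAliveMillis, now);
        int total = 0;
        List<Integer> batch = scroll.next(keepAliveMillis, now);
        while (!batch.isEmpty()) {
            total += batch.size();
            now += processingMillis; // simulated time spent on this batch
            batch = scroll.next(keepAliveMillis, now);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            docs.add(i);
        }
        // Processing (30 s) fits inside the keep-alive (60 s): all 25 documents are read.
        System.out.println(drain(docs, 10, 60_000, 30_000));
        try {
            // Processing (90 s) outlasts the keep-alive (60 s): the context is gone.
            drain(docs, 10, 60_000, 90_000);
        } catch (IllegalStateException e) {
            System.out.println("expired: " + e.getMessage());
        }
    }
}
```

    In this model, as in the real loop, the fix is either a longer keep-alive or a smaller batch that can be processed in time.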

    • The code fragment for reading in a loop with from + size is as follows (author: CoderBaby)
        int index = 0;
        do {
            tmpJsonList.clear();
            // Advance one page per iteration; the request fails once
            // from + size exceeds index.max_result_window
            srb.setFrom(Math.multiplyExact(index, SEARCH_HITS_SIZE));
            index++;
            MultiSearchResponse.Item[] items = sr.get().getResponses();
            for (MultiSearchResponse.Item item : items) {
                SearchResponse response = item.getResponse();
                for (SearchHit hit : response.getHits().getHits()) {
                    tmpJsonList.add((JSONObject) JSONValue.parse(hit.getSourceAsString()));
                }
            }
            jsonList.addAll(tmpJsonList);
        } while (tmpJsonList.size() > 0); // stop when a page comes back empty

    Here SEARCH_HITS_SIZE = 1000 and srb is a compound query combining multiple conditions; the preceding setup code is as follows:

        // Every collected condition must match
        queryBuilders.forEach(query -> boolQueryBuilder.must(query));

        MultiSearchRequestBuilder sr = client.prepareMultiSearch();
        SearchRequestBuilder srb = client.prepareSearch()
                .setTypes(type)
                .setIndices(availableIndices)
                .setQuery(boolQueryBuilder)
                .setSize(SEARCH_HITS_SIZE);
        sr.add(srb);

    The query conditions are built as follows (choose term, range, match, etc. from QueryBuilders as needed):

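        // Exact match on the L7 protocol field when a value is supplied
        if (!StringUtil.isEmpty(l7p)) {
            queryBuilders.add(QueryBuilders.termQuery(Event.FIELD_L7P, l7p));
        }

        // Only a start time is given: open-ended range from startTime onwards
        if (!StringUtil.isEmpty(startTime) && StringUtil.isEmpty(endTime)) {
            queryBuilders.add(QueryBuilders.rangeQuery(Event.FIELD_START_TIME).from(startTime));
        }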

    Appendix:

    1) Using scroll in Java: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html

    2) Scroll: https://www.elastic.co/guide/en/elasticsearch/reference/5.1/search-request-scroll.html

    3) From and size: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-from-size

