• Handling the Elasticsearch max_result_window query problem


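    We need to export a table of Hindi articles. The export rules are:

      1. Include all Hindi articles (every color, every status)

      2. Read count greater than 300

      3. Sort by read-to-recommend ratio and take the top 3000 articles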


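    Notes:

      1. Article information and the read/recommend counts live in two separate Elasticsearch instances

      2. There are 300,000+ Hindi articles in total (fewer than 400,000)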


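    Approach:

      Fetch 500 article uuids at a time from Topic-Es, then query UserLog-Es for the read and recommend counts of those 500 uuids. Put the articles with more than 300 reads into a List and export it to Excel.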
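      The approach above can be sketched in plain Java. This is only an illustration under assumptions: the `Stats` class, the `lookup` function (standing in for the UserLog-Es query), and the definition of the ratio as reads divided by recommends are all hypothetical and not taken from the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class TopArticles {
    // Hypothetical holder for the counters stored in UserLog-Es.
    static class Stats {
        final String uuid;
        final long reads;
        final long recommends;

        Stats(String uuid, long reads, long recommends) {
            this.uuid = uuid;
            this.reads = reads;
            this.recommends = recommends;
        }

        // Assumed definition: reads / recommends, with 0 when nothing was recommended.
        double ratio() {
            return recommends == 0 ? 0.0 : (double) reads / recommends;
        }
    }

    // Walks the uuids in batches, keeps articles with reads > minReads,
    // sorts by ratio (highest first) and returns at most `limit` entries.
    static List<Stats> select(List<String> uuids, int batchSize, long minReads, int limit,
                              Function<List<String>, List<Stats>> lookup) {
        List<Stats> kept = new ArrayList<>();
        for (int start = 0; start < uuids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, uuids.size());
            for (Stats s : lookup.apply(uuids.subList(start, end))) {
                if (s.reads > minReads) {
                    kept.add(s);
                }
            }
        }
        kept.sort((a, b) -> Double.compare(b.ratio(), a.ratio()));
        return new ArrayList<>(kept.subList(0, Math.min(limit, kept.size())));
    }

    public static void main(String[] args) {
        // In-memory stand-in for UserLog-Es.
        Map<String, Stats> store = new HashMap<>();
        store.put("a", new Stats("a", 1000, 2000));
        store.put("b", new Stats("b", 200, 100));
        store.put("c", new Stats("c", 900, 1000));

        List<Stats> top = select(Arrays.asList("a", "b", "c"), 500, 300, 3000, batch -> {
            List<Stats> out = new ArrayList<>();
            for (String uuid : batch) {
                out.add(store.get(uuid));
            }
            return out;
        });
        for (Stats s : top) {
            System.out.println(s.uuid + " ratio=" + s.ratio());
        }
    }
}
```

      In the real export, batchSize would be 500, minReads 300 and limit 3000, matching the rules listed earlier.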


    Problems:

      1. QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. 

        Failed to execute phase [dfs], all shards failed; shardFailures {[aPdAdh6fTlOzXsE7-rJ71Q][holga_index][0]: RemoteTransportException[[node-01][10.25.167.4:9300][indices:data/read/search[phase/dfs]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }{[aPdAdh6fTlOzXsE7-rJ71Q][holga_index][1]: RemoteTransportException[[node-01][10.25.167.4:9300][indices:data/read/search[phase/dfs]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }{[aPdAdh6fTlOzXsE7-rJ71Q][holga_index][2]: RemoteTransportException[[node-01][10.25.167.4:9300][indices:data/read/search[phase/dfs]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }

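      Repeated testing shows that this error is fully reproducible: any from/size query whose from + size exceeds 10,000 (the index.max_result_window limit named in the message) triggers it. The code in use was: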

    searchRequestBuilder.setQuery(query).addSort(SortBuilders.fieldSort("add_time").order(SortOrder.DESC)).setFrom(index).setSize(100);
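      The arithmetic behind the limit is easy to check. The following minimal sketch uses a hypothetical helper, `exceeds`, which is not part of the Elasticsearch client; it only mirrors the rule quoted in the error message (from + size must be less than or equal to 10000).

```java
class ResultWindow {
    // Limit taken from the error message above.
    static final int MAX_RESULT_WINDOW = 10_000;

    // True when a from/size request would be rejected under the given limit.
    static boolean exceeds(int from, int size, int maxResultWindow) {
        return (long) from + size > maxResultWindow;
    }

    public static void main(String[] args) {
        int size = 100;
        // Page through in steps of `size` and report the first offset that is rejected.
        for (int from = 0; ; from += size) {
            if (exceeds(from, size, MAX_RESULT_WINDOW)) {
                System.out.println("first rejected request: from=" + from
                        + ", from + size=" + (from + size));
                break;
            }
        }
    }
}
```

      With a page size of 100, from=9900 is the last request that passes (9900 + 100 = 10000), and from=10000 produces the 10100 seen in the exception.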

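      The fix is to use scroll instead of from/size paging (the argument to setScroll is how long the search context is kept alive between requests):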

    searchRequestBuilder.setQuery(query).addSort(SortBuilders.fieldSort("add_time").order(SortOrder.DESC)).setSize(500).setScroll(new TimeValue(total));
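      A scroll query returns one batch per request and is repeated until a batch comes back empty. The sketch below models only that loop shape with an in-memory stand-in; `InMemoryScroll` and `drain` are hypothetical names and do not belong to the Elasticsearch client. With the real client, the first batch would come from the search response and later batches from follow-up scroll requests that pass along the scroll id.

```java
import java.util.ArrayList;
import java.util.List;

class ScrollSketch {
    // Hypothetical stand-in for a scroll context: each call hands back
    // the next batch, and an empty batch signals that nothing is left.
    static class InMemoryScroll {
        private final List<String> docs;
        private final int size;
        private int position = 0;

        InMemoryScroll(List<String> docs, int size) {
            this.docs = docs;
            this.size = size;
        }

        List<String> nextBatch() {
            int end = Math.min(position + size, docs.size());
            List<String> batch = new ArrayList<>(docs.subList(position, end));
            position = end;
            return batch;
        }
    }

    // Keeps requesting batches until an empty one is returned.
    static List<String> drain(InMemoryScroll scroll) {
        List<String> all = new ArrayList<>();
        List<String> batch = scroll.nextBatch();
        while (!batch.isEmpty()) {
            all.addAll(batch);
            batch = scroll.nextBatch();
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1250; i++) {
            docs.add("uuid-" + i);
        }
        List<String> all = drain(new InMemoryScroll(docs, 500));
        System.out.println("fetched " + all.size() + " uuids");
    }
}
```

      Unlike from/size paging, no offset is sent with each request, so the max_result_window check never comes into play.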

      2. The supplied data appears to be in the Office 2007+ XML. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)

    Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
        at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:152)
        at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:140)
        at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:302)
        at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:87)
        at com.mkit.export.main.ExportExcel.write2File(ExportExcel.java:86)
        at com.mkit.export.main.ExportExcel.main(ExportExcel.java:35)

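      This happens because the Excel file being read is an .xlsx file (Office 2007 format), but it was opened with HSSF, which only supports the Office 2003 .xls format. The HSSF code below is valid only for .xls input; an .xlsx file must be opened with XSSFWorkbook instead.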

     FileInputStream fs = new FileInputStream("d://aa.xls");   // Office 2003 (.xls) file
     POIFSFileSystem ps = new POIFSFileSystem(fs);
     HSSFWorkbook wb = new HSSFWorkbook(ps);             // HSSFWorkbook for Office 2003 (.xls); XSSFWorkbook for Office 2007+ (.xlsx)
     HSSFSheet sheet = wb.getSheetAt(0);                // get the first sheet; a workbook may contain several
     int lastRowNum = sheet.getLastRowNum();
     System.out.println("Last row index: " + lastRowNum);
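      File extensions can be misleading, so one way to avoid this mismatch is to look at the first bytes of the file before choosing HSSF or XSSF. The sketch below is a standalone illustration, not the original author's code: `ExcelFormatSniffer` is a hypothetical class that checks for the OLE2 signature (used by .xls) and the ZIP signature (used by .xlsx, which is a ZIP container). A ZIP signature alone does not prove the file is a spreadsheet.

```java
class ExcelFormatSniffer {
    enum Format { OLE2_XLS, OOXML_XLSX, UNKNOWN }

    // OLE2 compound document signature (Office 97-2003 binary files).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature (Office 2007+ files are ZIP archives).
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a file by the first bytes of its content.
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2_XLS;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.OOXML_XLSX;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(detect(zipHeader));
    }
}
```

      POI also ships WorkbookFactory.create, which inspects the input and returns the matching Workbook implementation, so code written against the Workbook and Sheet interfaces can handle both formats.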
  • Original article: https://www.cnblogs.com/0xcafedaddy/p/6547267.html