• Elasticsearch 2.2+: enabling compression with index.codec: best_compression


    The official description, from https://www.elastic.co/guide/en/elasticsearch/reference/2.2/index-modules.html#_static_index_settings:

    index.codec: The default value compresses stored data with LZ4 compression, but this can be set to best_compression, which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.

    Note: in 2.1 and earlier this is an experimental feature; it is only stable from 2.2 onward.

    Now you can also enable better compression on the cold nodes by setting index.codec: best_compression in their config/elasticsearch.yml file in order to be able to archive more data with the same amount of disk space.

    Quoted from: https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
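
    The quote above sets the codec node-wide in elasticsearch.yml. Because index.codec is listed among the static index settings, it can also be supplied in the settings of an individual index when that index is created. Below is a minimal Python sketch of such a request. It assumes a node reachable at http://localhost:9200 and a hypothetical index name logs-archive; neither comes from the sources quoted here. The request is only built and printed, not sent.

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name):
    """Build (but do not send) a PUT request that creates an index
    whose stored fields use the DEFLATE-based best_compression codec."""
    body = {"settings": {"index": {"codec": "best_compression"}}}
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed values for illustration only: local node, hypothetical index name.
req = build_create_index_request("http://localhost:9200", "logs-archive")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually create the index, pass the request to urllib.request.urlopen(req).
```

    Since the setting is static, it has to be in place when the index is created (or changed while the index is closed); it cannot be flipped on a live, open index.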

    The data below is taken from: https://www.elastic.co/blog/elasticsearch-storage-the-true-story-2.0

    The test methodology hasn’t changed so you can check out the old blog post or the README in the Github repo for the details. 

    Test configurations (the same for both data files):

    Test  _all      String fields
    1     enabled   analyzed and not_analyzed
    2     disabled  analyzed and not_analyzed
    3     disabled  not_analyzed
    3b    disabled  not_analyzed, except for 'message' field which is retained and analyzed
    4     disabled  not_analyzed, except for 'agent' field which is analyzed

    Structured data file. Original file size: 67644119

    Test  Index size w/ LZ4  Index size w/ DEFLATE  Expansion ratio w/ LZ4  Expansion ratio w/ DEFLATE  Impact of DEFLATE
    1     63047579           53131592               0.932                   0.785                       -0.157
    2     48271433           38327106               0.713                   0.566                       -0.206
    3     38920800           29014796               0.575                   0.428                       -0.254
    3b    65382872           49532858               0.966                   0.732                       -0.242
    4     43083702           32063602               0.636                   0.474                       -0.255

    Semi-structured data file. Original file size: 75037027

    Test  Index size w/ LZ4  Index size w/ DEFLATE  Expansion ratio w/ LZ4  Expansion ratio w/ DEFLATE  Impact of DEFLATE
    1     100478376          82132782               1.339                   1.094                       -0.182
    2     75238480           56911638               1.002                   0.758                       -0.243
    3     71866672           53553561               0.957                   0.713                       -0.254
    3b    104638750          83824398               1.394                   1.117                       -0.198
    4     72925624           54603882               0.971                   0.727                       -0.251

    With the standard LZ4-based compression, the indexed data size to raw data size ratio ranged from 0.575 to 1.394. After enabling DEFLATE-based compression using the best_compression index.codec option, the indexed data size to raw data size ratio range came down to 0.429 to 1.117. Enabling the best_compression option resulted in a 15.7% to 25.6% reduction in indexed data size depending on the test parameters. 
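
    The ratios in this paragraph can be reproduced from the raw sizes. Below is a small Python sketch using test 3 of the structured data file. It assumes that "expansion ratio" means index size divided by original file size, and that "Impact of DEFLATE" is the relative change of the DEFLATE index size against the LZ4 one; both definitions are inferred from the numbers rather than stated in the blog post.

```python
def expansion_ratio(index_size, raw_size):
    """Index size relative to the original file size (assumed definition)."""
    return index_size / raw_size


def deflate_impact(lz4_size, deflate_size):
    """Relative size change when switching from LZ4 to DEFLATE (assumed definition)."""
    return deflate_size / lz4_size - 1.0


# Sizes for test 3 of the structured data file, copied from the table above.
RAW_SIZE = 67644119
LZ4_SIZE = 38920800
DEFLATE_SIZE = 29014796

ratio_lz4 = expansion_ratio(LZ4_SIZE, RAW_SIZE)
ratio_deflate = expansion_ratio(DEFLATE_SIZE, RAW_SIZE)
impact = deflate_impact(LZ4_SIZE, DEFLATE_SIZE)

print(f"expansion ratio w/ LZ4:     {ratio_lz4:.4f}")
print(f"expansion ratio w/ DEFLATE: {ratio_deflate:.4f}")
print(f"impact of DEFLATE:          {impact:.4f}")
```

    Under these assumptions the computed values agree with the published figures to within 0.001; the remaining differences in the third decimal place appear to come from rounding versus truncation.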

    As you can see, the ratio of index size to raw data size can vary greatly based on your mapping configuration, what fields you decide to create/retain, and the characteristics of the data set itself. We encourage you to run similar tests yourself to determine what the data compression/expansion factor is for your data set and application requirements.

    Conclusion

    There were many amazing features added to Elasticsearch 2.0 worth considering. As we’ve discussed, two of these new features in particular can reduce the hardware footprint required for an Elasticsearch cluster by 15-25% or more: 1) the addition of a best_compression option and 2) enabling doc_values by default. This allows us to get to compression ratios between 0.429 and 1.117.

  • Original article: https://www.cnblogs.com/bonelee/p/6269582.html