• HBase tuning parameters


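    When running HBase, beyond pre-splitting regions and designing row keys carefully, it is also common to tune some of the system's built-in parameters.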

    1. Heap memory tuning

    hbase-site.xml

     <!-- Total size of all memstores on a RegionServer. Exceeding it triggers a flush to disk. The default is 40% of the heap,
            and a RegionServer-level flush blocks client updates. -->  
        <property>  
            <name>hbase.regionserver.global.memstore.size</name>  
            <value></value>  
            <description>Maximum size of all memstores in a region server before  
                new  
                updates are blocked and flushes are forced. Defaults to 40% of heap (0.4).  
                Updates are blocked and flushes are forced until size of all  
                memstores  
                in a region server hits  
                hbase.regionserver.global.memstore.size.lower.limit.  
                The default value in this configuration has been intentionally left  
                empty in order to  
                honor the old hbase.regionserver.global.memstore.upperLimit property if  
                present.  
            </description>  
        </property>  
    
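        <!-- Think of this as a safety valve. Under a very heavy write load, writes can arrive faster than flushes can drain them,
            and the memstores must not grow past a safe bound. In that case writes are blocked until the memstores shrink back to a
            manageable size, which by default is heap size * 0.4 * 0.95. In other words, once a RegionServer-level flush is triggered,
            client writes stay blocked until the total memstore size on that RegionServer falls to heap size * 0.4 * 0.95. -->  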
        <property>  
            <name>hbase.regionserver.global.memstore.size.lower.limit</name>  
            <value></value>  
            <description>Maximum size of all memstores in a region server before  
                flushes are forced.  
                Defaults to 95% of hbase.regionserver.global.memstore.size (0.95).  
                A 100% value for this value causes the minimum possible flushing to  
                occur when updates are  
                blocked due to memstore limiting.  
                The default value in this configuration has been intentionally left  
                empty in order to  
                honor the old hbase.regionserver.global.memstore.lowerLimit property if  
                present.  
            </description>  
        </property>

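    Bigger is not better for this parameter. The larger the memstore limit, the more data has to be flushed once the blocking threshold is reached before the total drops back to the recovery level (heap size * 0.4 * 0.95), so writes stay blocked for longer.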
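    As a rough illustration of this arithmetic, the Python sketch below computes the blocking threshold, the recovery level, and the amount of data that must be flushed between the two. The function name and the heap sizes are assumptions made up for the example; 0.4 and 0.95 are the defaults quoted in the property descriptions.

```python
GIB = 1024 ** 3


def memstore_thresholds(heap_bytes, global_size=0.4, lower_limit=0.95):
    """Return (blocking threshold, recovery level, bytes to flush), all in bytes.

    global_size mirrors hbase.regionserver.global.memstore.size and
    lower_limit mirrors hbase.regionserver.global.memstore.size.lower.limit.
    """
    upper = heap_bytes * global_size      # updates are blocked at this size
    lower = upper * lower_limit           # updates resume at this size
    return upper, lower, upper - lower


if __name__ == "__main__":
    # Hypothetical heap sizes, chosen only to show how the flush volume scales.
    for heap_gib in (8, 16, 32):
        upper, lower, to_flush = memstore_thresholds(heap_gib * GIB)
        print(f"heap={heap_gib} GiB  block at {upper / GIB:.2f} GiB  "
              f"resume at {lower / GIB:.2f} GiB  flush {to_flush / GIB:.2f} GiB")
```

    Doubling the heap (or the memstore fraction) doubles the volume that has to be flushed before writes resume, which is why the blocking window grows with the limit.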

    To change the heap size itself, edit hbase-env.sh.

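    2. Raise the DataNode's limit on concurrent data-transfer threads

    hdfs-site.xml

    <!-- HBase usually works on a large number of files at the same time. Depending on cluster size and workload, set this to 4096 or higher. -->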
    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>4096</value>
      <description>
            Specifies the maximum number of threads to use for transferring data
            in and out of the DN.
      </description>
    </property>

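    3. Lengthen the wait time for high-latency data operations

    hdfs-site.xml

    <!-- If an operation has very high latency, the socket needs to wait longer. Consider raising this value (default 60000 ms) so the socket does not time out. As the description below states, this property applies to image transfer. -->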
    <property>
      <name>dfs.image.transfer.timeout</name>
      <value>60000</value>
      <description>
            Socket timeout for image transfer in milliseconds. This timeout and the related
            dfs.image.transfer.bandwidthPerSec parameter should be configured such
            that normal image transfer can complete successfully.
            This timeout prevents client hangs when the sender fails during
            image transfer. This is socket timeout during image transfer.
      </description>
    </property>

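    4. Improve write efficiency (by enabling compression)

    mapred-site.xml (mapreduce.map.output.compress is shown as false below; set it to true to turn compression on)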

    <property>
      <name>mapreduce.map.output.compress</name>
      <value>false</value>
      <description>Should the outputs of the maps be compressed before being
                   sent across the network. Uses SequenceFile compression.
      </description>
    </property>
    
    <property>
      <name>mapreduce.map.output.compress.codec</name>
      <value>org.apache.hadoop.io.compress.DefaultCodec</value>
      <description>If the map outputs are compressed, how should they be 
                   compressed?
      </description>
    </property>

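    5. Set the number of RPC handlers

    hbase-site.xml

    <!-- Number of RPC listener instances started on a RegionServer, i.e. the number of threads available to handle I/O requests.
            Increase it when there are many clients or many read/write requests. -->  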
        <property>  
            <name>hbase.regionserver.handler.count</name>  
            <value>30</value>  
            <description>Count of RPC Listener instances spun up on RegionServers.  
                Same property is used by the Master for count of master handlers.  
            </description>  
        </property> 

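    6. Tune the HStore file size

    hbase-site.xml

    <!-- Maximum HStoreFile size. When a column family of a region grows past this size, the region is split.
            If you run MapReduce jobs over HBase, consider lowering this value: each region maps to one map task, so an oversized region makes its map task run too long. -->  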
        <property>  
            <name>hbase.hregion.max.filesize</name>  
            <value>10737418240</value>  
            <description>  
                Maximum HStoreFile size. If any one of a column families' HStoreFiles has  
                grown to exceed this value, the hosting HRegion is split in two.  
            </description>  
        </property> 
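
    To see how this setting affects MapReduce parallelism, the Python sketch below estimates the minimum number of regions, and hence map tasks, for a table of a given size. It is a simplification that treats the table as a single column family with every region grown to the maximum size, so real tables will normally have more regions than this. The function name and the 1 TiB table size are assumptions for the example.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def estimated_map_tasks(table_bytes, max_region_bytes):
    """Lower bound on the region count (one map task per region).

    Uses integer ceiling division: ceil(table_bytes / max_region_bytes).
    """
    if max_region_bytes <= 0:
        raise ValueError("max_region_bytes must be positive")
    return -(-table_bytes // max_region_bytes)


if __name__ == "__main__":
    table = 1 * TIB  # hypothetical table size
    for max_gib in (10, 5, 1):
        tasks = estimated_map_tasks(table, max_gib * GIB)
        print(f"hbase.hregion.max.filesize={max_gib} GiB -> at least {tasks} map tasks")
```

    A smaller maximum yields more, shorter map tasks over the same data, at the cost of more regions for the cluster to manage.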

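    7. Increase the read and write buffers

    hbase-site.xml

     <!-- Size of the client write buffer, i.e. how much the client batches before submitting to the server. The buffer takes memory on both the client and the server. A larger buffer means fewer RPCs but more memory. -->  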
        <property>  
            <name>hbase.client.write.buffer</name>  
            <value>2097152</value>  
            <description>Default size of the HTable client write buffer in bytes.  
                A bigger buffer takes more memory -- on both the client and server  
                side since server instantiates the passed write buffer to process  
                it -- but a larger buffer size reduces the number of RPCs made.  
                For an estimate of server-side memory-used, evaluate  
                hbase.client.write.buffer * hbase.regionserver.handler.count  
            </description>  
        </property>
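
    The description above gives a formula for the server-side memory estimate: hbase.client.write.buffer * hbase.regionserver.handler.count. The Python sketch below simply evaluates it. The function name is an assumption; the 2 MiB buffer and 30 handlers are the values shown in this article.

```python
def server_write_buffer_bytes(write_buffer_bytes, handler_count):
    """Estimate from the property description: write buffer * handler count."""
    return write_buffer_bytes * handler_count


if __name__ == "__main__":
    # Values shown in this article: 2 MiB buffer, 30 handlers.
    estimate = server_write_buffer_bytes(2097152, 30)
    print(f"{estimate} bytes = {estimate / 1024 ** 2:.0f} MiB")
```

    Because the two settings multiply, raising both the buffer size and the handler count at once increases the estimate much faster than raising either alone.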
    <!-- Number of rows the client caches per fetch during a scan. A small value means more RPCs; a large value uses more memory. -->  
        <property>  
            <name>hbase.client.scanner.caching</name>  
            <value>2147483647</value>  
            <description>Number of rows that we try to fetch when calling next  
                on a scanner if it is not served from (local, client) memory. This  
                configuration  
                works together with hbase.client.scanner.max.result.size to try and use  
                the  
                network efficiently. The default value is Integer.MAX_VALUE so  
                that  
                the network will fill the chunk size defined by  
                hbase.client.scanner.max.result.size  
                rather than be limited by a particular number of rows since the size of  
                rows varies  
                table to table. If you know ahead of time that you will not require more  
                than a certain  
                number of rows from a scan, this configuration should be set to that row  
                limit via  
                Scan#setCaching. Higher caching values will enable faster scanners but will eat up  
                more  
                memory and some calls of next may take longer and longer times when the  
                cache is empty.  
                Do not set this value such that the time between invocations is greater  
                than the scanner  
                timeout; i.e. hbase.client.scanner.timeout.period  
            </description>  
        </property> 
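
    The trade-off between RPC count and memory can be sketched with simple arithmetic. The Python function below approximates the number of fetch round trips for a scan as the row count divided by the caching value, rounded up. It is a deliberate simplification: it ignores hbase.client.scanner.max.result.size, which can end a batch earlier. The function name and the row count are assumptions for the example.

```python
def scan_rpc_count(total_rows, caching):
    """Approximate number of fetch RPCs for a scan: ceil(total_rows / caching)."""
    if caching <= 0:
        raise ValueError("caching must be positive")
    return -(-total_rows // caching)


if __name__ == "__main__":
    rows = 1_000_000  # hypothetical number of rows returned by the scan
    for caching in (1, 100, 1000):
        print(f"caching={caching:>4} -> about {scan_rpc_count(rows, caching)} RPCs")
```

    Each tenfold increase in caching cuts the round trips by roughly ten, while the client has to hold that many more rows in memory per fetch.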
  • Original article: https://www.cnblogs.com/yangxusun9/p/12546048.html