用户操作HBase产生的数据并不是立即同步到HDFS,为了保证读写效率,而是先存在每个Region(存储水平切分后的所有列族的数据)对应的MemStore中,到达一定时机才会刷写到HDFS。
1 当某个memstroe的大小达到了hbase.hregion.memstore.flush.size(默认值128M),其所在region的所有memstore都会刷写(阻塞写)。
<!-- 单个region里memstore的缓存大小,超过那么整个HRegion就会flush,默认128M --> <property> <name>hbase.hregion.memstore.flush.size</name> <value>134217728</value> <description> Memstore will be flushed to disk if size of the memstore exceeds this number of bytes. Value is checked by a thread that runs every hbase.server.thread.wakefrequency. </description> </property>
2 当region server 中memstore 的总大小达到堆内存的百分之多少时,region 会按照其所有 memstore 的大小顺序(由大到小)依次进行刷写。直到 region server中所有 memstore 的总大小减小到上述值以下。(阻塞写)
<!-- regionServer的全局memstore的大小,超过该大小会触发flush到磁盘的操作,默认是堆大小的40%,而且regionserver级别的 flush会阻塞客户端读写 --> <property> <name>hbase.regionserver.global.memstore.size</name> <value></value> <description>Maximum size of all memstores in a region server before new updates are blocked and flushes are forced. Defaults to 40% of heap (0.4). Updates are blocked and flushes are forced until size of all memstores in a region server hits hbase.regionserver.global.memstore.size.lower.limit. The default value in this configuration has been intentionally left emtpy in order to honor the old hbase.regionserver.global.memstore.upperLimit property if present. </description> </property>
当阻塞刷写到上个参数的0.95倍时,客户端可以继续写
<!--可以理解为一个安全的设置,有时候集群的“写负载”非常高,写入量一直超过flush的量,这时,我们就希望memstore不要超过一定的安全设置。 在这种情况下,写操作就要被阻塞一直到memstore恢复到一个“可管理”的大小, 这个大小就是默认值是堆大小 * 0.4 * 0.95,也就是当regionserver级别 的flush操作发送后,会阻塞客户端写,一直阻塞到整个regionserver级别的memstore的大小为 堆大小 * 0.4 *0.95为止 --> <property> <name>hbase.regionserver.global.memstore.size.lower.limit</name> <value></value> <description>Maximum size of all memstores in a region server before flushes are forced. Defaults to 95% of hbase.regionserver.global.memstore.size (0.95). A 100% value for this value causes the minimum possible flushing to occur when updates are blocked due to memstore limiting. The default value in this configuration has been intentionally left emtpy in order to honor the old hbase.regionserver.global.memstore.lowerLimit property if present. </description> </property>
3.到达自动刷写的时间,也会触发memstoreflush。自动刷新的时间间隔由该属性进行配置hbase.regionserver.optionalcacheflushinterval(默认1小时)。
<!-- 内存中的文件在自动刷新之前能够存活的最长时间,默认是1h --> <property> <name>hbase.regionserver.optionalcacheflushinterval</name> <value>3600000</value> <description> Maximum amount of time an edit lives in memory before being automatically flushed. Default 1 hour. Set it to 0 to disable automatic flushing. </description> </property>