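[Repost] The meaning and tuning of HBase parameters

While testing I found that understanding what each of these parameters means is very important, and that tuning them can improve performance, so I recommend reading carefully what each property means.

Thanks to the original author for compiling this; reposted here purely as study notes.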
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///tmp/hbase-${user.name}/hbase</value>
    <description>The directory shared by region servers and into
    which HBase persists. The URL should be 'fully-qualified'
    to include the filesystem scheme. For example, to specify the
    HDFS directory '/hbase' where the HDFS instance's namenode is
    running at namenode.example.org on port 9000, set this value to:
    hdfs://namenode.example.org:9000/hbase. By default HBase writes
    into /tmp. Change this configuration else all data will be lost
    on machine restart.
    </description>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
    <description>The port the HBase Master should bind to.</description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
    <description>The mode the cluster will be in. Possible values are
    false for standalone mode and true for distributed mode. If
    false, startup will run all HBase and ZooKeeper daemons together
    in the one JVM.
    </description>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/tmp/hbase-${user.name}</value>
    <description>Temporary directory on the local filesystem.
    Change this setting to point to a location more permanent
    than '/tmp' (The '/tmp' directory is often cleared on
    machine restart).
    </description>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
    <description>The port for the HBase Master web UI.
    Set to -1 if you do not want a UI instance run.
    </description>
  </property>
  <property>
    <name>hbase.master.info.bindAddress</name>
    <value>0.0.0.0</value>
    <description>The bind address for the HBase Master web UI
    </description>
  </property>
  <property>
    <name>hbase.client.write.buffer</name>
    <value>2097152</value>
    <description>Default size of the HTable client write buffer in bytes.
    A bigger buffer takes more memory -- on both the client and server
    side since server instantiates the passed write buffer to process
    it -- but a larger buffer size reduces the number of RPCs made.
    For an estimate of server-side memory-used, evaluate
    hbase.client.write.buffer * hbase.regionserver.handler.count
    </description>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>60020</value>
    <description>The port the HBase RegionServer binds to.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>60030</value>
    <description>The port for the HBase RegionServer web UI.
    Set to -1 if you do not want the RegionServer UI to run.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.info.port.auto</name>
    <value>false</value>
    <description>Whether or not the Master or RegionServer
    UI should search for a port to bind to. Enables automatic port
    search if hbase.regionserver.info.port is already in use.
    Useful for testing, turned off by default.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.info.bindAddress</name>
    <value>0.0.0.0</value>
    <description>The address for the HBase RegionServer web UI
    </description>
  </property>
  <property>
    <name>hbase.regionserver.class</name>
    <value>org.apache.hadoop.hbase.ipc.HRegionInterface</value>
    <description>The RegionServer interface to use.
    Used by the client opening proxy to remote region server.
    </description>
  </property>
  <property>
    <name>hbase.client.pause</name>
    <value>1000</value>
    <description>General client pause value. Used mostly as value to wait
    before running a retry of a failed get, region lookup, etc.</description>
  </property>
  <property>
    <name>hbase.client.retries.number</name>
    <value>10</value>
    <description>Maximum retries. Used as maximum for all retryable
    operations such as fetching of the root region from root region
    server, getting a cell's value, starting a row update, etc.
    Default: 10.
    </description>
  </property>
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>1</value>
    <description>Number of rows that will be fetched when calling next
    on a scanner if it is not served from (local, client) memory. Higher
    caching values will enable faster scanners but will eat up more memory
    and some calls of next may take longer and longer times when the cache is empty.
    Do not set this value such that the time between invocations is greater
    than the scanner timeout; i.e. hbase.regionserver.lease.period
    </description>
  </property>
  <property>
    <name>hbase.client.keyvalue.maxsize</name>
    <value>10485760</value>
    <description>Specifies the combined maximum allowed size of a KeyValue
    instance. This is to set an upper boundary for a single entry saved in a
    storage file. Since they cannot be split it helps avoiding that a region
    cannot be split any further because the data is too large. It seems wise
    to set this to a fraction of the maximum region size. Setting it to zero
    or less disables the check.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>60000</value>
    <description>HRegion server lease period in milliseconds. Default is
    60 seconds. Clients must report in within this period else they are
    considered dead.</description>
  </property>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>10</value>
    <description>Count of RPC Server instances spun up on RegionServers.
    Same property is used by the Master for count of master handlers.
    Default is 10.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.msginterval</name>
    <value>3000</value>
    <description>Interval between messages from the RegionServer to Master
    in milliseconds.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.flushlogentries</name>
    <value>1</value>
    <description>Sync the HLog to HDFS when it has accumulated this many
    entries. Default 1. Value is checked on every HLog.hflush
    </description>
  </property>
  <property>
    <name>hbase.regionserver.optionallogflushinterval</name>
    <value>1000</value>
    <description>Sync the HLog to the HDFS after this interval if it has not
    accumulated enough entries to trigger a sync. Default 1 second. Units:
    milliseconds.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.regionSplitLimit</name>
    <value>2147483647</value>
    <description>Limit for the number of regions after which no more region
    splitting should take place. This is not a hard limit for the number of
    regions but acts as a guideline for the regionserver to stop splitting after
    a certain limit. Default is set to MAX_INT; i.e. do not block splitting.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.logroll.period</name>
    <value>3600000</value>
    <description>Period at which we will roll the commit log regardless
    of how many edits it has.</description>
  </property>
  <property>
    <name>hbase.regionserver.hlog.reader.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader</value>
    <description>The HLog file reader implementation.</description>
  </property>
  <property>
    <name>hbase.regionserver.hlog.writer.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter</value>
    <description>The HLog file writer implementation.</description>
  </property>
  <property>
    <name>hbase.regionserver.thread.splitcompactcheckfrequency</name>
    <value>20000</value>
    <description>How often a region server runs the split/compaction check.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.nbreservationblocks</name>
    <value>4</value>
    <description>The number of reservoir blocks of memory released on
    OOME so we can clean up properly before server shutdown.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.dns.interface</name>
    <value>default</value>
    <description>The name of the Network Interface from which a ZooKeeper server
    should report its IP address.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.dns.nameserver</name>
    <value>default</value>
    <description>The host name or IP address of the name server (DNS)
    which a ZooKeeper server should use to determine the host name used by the
    master for communication and display purposes.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>default</value>
    <description>The name of the Network Interface from which a region server
    should report its IP address.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.dns.nameserver</name>
    <value>default</value>
    <description>The host name or IP address of the name server (DNS)
    which a region server should use to determine the host name used by the
    master for communication and display purposes.
    </description>
  </property>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>default</value>
    <description>The name of the Network Interface from which a master
    should report its IP address.
    </description>
  </property>
  <property>
    <name>hbase.master.dns.nameserver</name>
    <value>default</value>
    <description>The host name or IP address of the name server (DNS)
    which a master should use to determine the host name used
    for communication and display purposes.
    </description>
  </property>
  <property>
    <name>hbase.balancer.period</name>
    <value>300000</value>
    <description>Period at which the region balancer runs in the Master.
    </description>
  </property>
  <property>
    <name>hbase.master.logcleaner.ttl</name>
    <value>600000</value>
    <description>Maximum time a HLog can stay in the .oldlogdir directory,
    after which it will be cleaned by a Master thread.
    </description>
  </property>
  <property>
    <name>hbase.master.logcleaner.plugins</name>
    <value>org.apache.hadoop.hbase.master.TimeToLiveLogCleaner</value>
    <description>A comma-separated list of LogCleanerDelegate invoked by
    the LogsCleaner service. These WAL/HLog cleaners are called in order,
    so put the HLog cleaner that prunes the most HLog files in front. To
    implement your own LogCleanerDelegate, just put it in HBase's classpath
    and add the fully qualified class name here. Always add the above
    default log cleaners in the list.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.4</value>
    <description>Maximum size of all memstores in a region server before new
    updates are blocked and flushes are forced. Defaults to 40% of heap.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.lowerLimit</name>
    <value>0.35</value>
    <description>When memstores are being forced to flush to make room in
    memory, keep flushing until we hit this mark. Defaults to 35% of heap.
    This value equal to hbase.regionserver.global.memstore.upperLimit causes
    the minimum possible flushing to occur when updates are blocked due to
    memstore limiting.
    </description>
  </property>
  <property>
    <name>hbase.server.thread.wakefrequency</name>
    <value>10000</value>
    <description>Time to sleep in between searches for work (in milliseconds).
    Used as sleep interval by service threads such as log roller.
    </description>
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>67108864</value>
    <description>
    Memstore will be flushed to disk if size of the memstore
    exceeds this number of bytes. Value is checked by a thread that runs
    every hbase.server.thread.wakefrequency.
    </description>
  </property>
  <property>
    <name>hbase.hregion.preclose.flush.size</name>
    <value>5242880</value>
    <description>
    If the memstores in a region are this size or larger when we go
    to close, run a "pre-flush" to clear out memstores before we put up
    the region closed flag and take the region offline. On close,
    a flush is run under the close flag to empty memory. During
    this time the region is offline and we are not taking on any writes.
    If the memstore content is large, this flush could take a long time to
    complete. The preflush is meant to clean out the bulk of the memstore
    before putting up the close flag and taking the region offline so the
    flush that runs under the close flag has little to do.
    </description>
  </property>
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>2</value>
    <description>
    Block updates if memstore has hbase.hregion.memstore.block.multiplier
    times hbase.hregion.memstore.flush.size bytes. Useful for preventing
    runaway memstore during spikes in update traffic. Without an
    upper-bound, memstore fills such that when it flushes the
    resultant flush files take a long time to compact or split, or
    worse, we OOME.
    </description>
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>268435456</value>
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles has
    grown to exceed this value, the hosting HRegion is split in two.
    Default: 256M.
    </description>
  </property>
  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>3</value>
    <description>
    If more than this number of HStoreFiles in any one HStore
    (one HStoreFile is written per flush of memstore) then a compaction
    is run to rewrite all HStoreFiles as one. Larger numbers
    put off compaction but when it runs, it takes longer to complete.
    </description>
  </property>
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>7</value>
    <description>
    If more than this number of StoreFiles in any one Store
    (one StoreFile is written per flush of MemStore) then updates are
    blocked for this HRegion until a compaction is completed, or
    until hbase.hstore.blockingWaitTime has been exceeded.
    </description>
  </property>
  <property>
    <name>hbase.hstore.blockingWaitTime</name>
    <value>90000</value>
    <description>
    The time an HRegion will block updates for after hitting the StoreFile
    limit defined by hbase.hstore.blockingStoreFiles.
    After this time has elapsed, the HRegion will stop blocking updates even
    if a compaction has not been completed. Default: 90 seconds.
    </description>
  </property>
  <property>
    <name>hbase.hstore.compaction.max</name>
    <value>10</value>
    <description>Max number of HStoreFiles to compact per 'minor' compaction.
    </description>
  </property>
  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>86400000</value>
    <description>The time (in milliseconds) between 'major' compactions of all
    HStoreFiles in a region. Default: 1 day.
    Set to 0 to disable automated major compactions.
    </description>
  </property>
  <property>
    <name>hbase.mapreduce.hfileoutputformat.blocksize</name>
    <value>65536</value>
    <description>The mapreduce HFileOutputFormat writes storefiles/hfiles.
    This is the minimum hfile blocksize to emit. Usually in hbase, writing
    hfiles, the blocksize is gotten from the table schema (HColumnDescriptor)
    but in the mapreduce outputformat context, we don't have access to the
    schema so get blocksize from Configuration. The smaller you make
    the blocksize, the bigger your index and the less you fetch on a
    random-access. Set the blocksize down if you have small cells and want
    faster random-access of individual cells.
    </description>
  </property>
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.2</value>
    <description>
    Percentage of maximum heap (-Xmx setting) to allocate to block cache
    used by HFile/StoreFile. Default of 0.2 means allocate 20%.
    Set to 0 to disable.
    </description>
  </property>
  <property>
    <name>hbase.hash.type</name>
    <value>murmur</value>
    <description>The hashing algorithm for use in HashFunction. Two values are
    supported now: murmur (MurmurHash) and jenkins (JenkinsHash).
    Used by bloom filters.
    </description>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>180000</value>
    <description>ZooKeeper session timeout.
    HBase passes this to the zk quorum as suggested maximum time for a
    session. See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
    "The client sends a requested timeout, the server responds with the
    timeout that it can give the client. "
    In milliseconds.
    </description>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
    <description>Root ZNode for HBase in ZooKeeper. All of HBase's ZooKeeper
    files that are configured with a relative path will go under this node.
    By default, all of HBase's ZooKeeper file paths are configured with a
    relative path, so they will all go under this directory unless changed.
    </description>
  </property>
  <property>
    <name>zookeeper.znode.rootserver</name>
    <value>root-region-server</value>
    <description>Path to ZNode holding root region location. This is written by
    the master and read by clients and region servers. If a relative path is
    given, the parent folder will be ${zookeeper.znode.parent}. By default,
    this means the root location is stored at /hbase/root-region-server.
    </description>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum.
    For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed modes
    of operation. For a fully-distributed setup, this should be set to a full
    list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
    this is the list of servers which we will start/stop ZooKeeper on.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.peerport</name>
    <value>2888</value>
    <description>Port used by ZooKeeper peers to talk to each other.
    See http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
    for more information.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.leaderport</name>
    <value>3888</value>
    <description>Port used by ZooKeeper for leader election.
    See http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
    for more information.
    </description>
  </property>

  <property>
    <name>hbase.zookeeper.property.initLimit</name>
    <value>10</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The number of ticks that the initial synchronization phase can take.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.syncLimit</name>
    <value>5</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The number of ticks that can pass between sending a request and getting an
    acknowledgment.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>${hbase.tmp.dir}/zookeeper</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The directory where the snapshot is stored.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The port at which the clients will connect.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.maxClientCnxns</name>
    <value>30</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    Limit on number of concurrent connections (at the socket level) that a
    single client, identified by IP address, may make to a single member of
    the ZooKeeper ensemble. Set high to avoid zk connection issues running
    standalone and pseudo-distributed.
    </description>
  </property>

  <property>
    <name>hbase.rest.port</name>
    <value>8080</value>
    <description>The port for the HBase REST server.</description>
  </property>
  <property>
    <name>hbase.rest.readonly</name>
    <value>false</value>
    <description>
    Defines the mode the REST server will be started in. Possible values are:
    false: All HTTP methods are permitted - GET/PUT/POST/DELETE.
    true: Only the GET method is permitted.
    </description>
  </property>
</configuration>
Reposted from: http://blog.csdn.net/macyang/article/details/6211141
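Besides the article above on HBase configuration, here are a few other recommended articles:

HBase parameter configuration and optimization (http://blog.csdn.net/huoyunshen88/article/details/9169077), reproduced in full below:

The Performance Tuning chapter of the official HBase Book is not indexed by configuration property, which makes quick lookup difficult. I have therefore reorganized the original text around the configuration properties and added some of my own understanding. Corrections are welcome.

Configuration tuning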

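zookeeper.session.timeout
Default: 3 minutes (180000 ms)
Description: The session timeout between a RegionServer and ZooKeeper. Once it expires, ZooKeeper removes the RegionServer from the list of live RegionServers; when the HMaster is notified of the removal, it rebalances the regions that server was hosting so that the surviving RegionServers take them over.
Tuning:
This timeout determines how promptly a RegionServer failover happens. Setting it to 1 minute or lower shortens the part of the failover time that is spent waiting for the timeout.
Note, however, that for some online applications the time from a RegionServer going down to its recovery is already short (network blips, crashes, and similar faults where operations staff can step in quickly), and lowering the timeout can then do more harm than good. As soon as the RegionServer is formally removed from the cluster, the HMaster starts rebalancing (other RegionServers recover from the WAL recorded by the failed machine). Once the failed RegionServer is brought back by manual intervention, that rebalance turns out to have been pointless; it leaves the load uneven and puts extra burden on the RegionServers, especially in deployments where regions are assigned to fixed servers.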
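
As a rough illustration of the "1 minute" suggestion, an override in hbase-site.xml might look like the sketch below. The value 60000 is only the example figure from the text, not a recommendation; whether a shorter timeout helps depends on how quickly a failed RegionServer usually comes back in your environment.

```xml
<configuration>
  <property>
    <name>zookeeper.session.timeout</name>
    <!-- 60000 ms = 1 minute; illustrative value, the default is 180000 -->
    <value>60000</value>
  </property>
</configuration>
```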

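hbase.zookeeper.quorum
Default: localhost
Description: The ZooKeeper deployment that HBase depends on.
Tuning:
The more ZooKeeper nodes you deploy, the higher the reliability, but deploy an odd number of them: ZooKeeper needs a majority of nodes to elect a leader, so an even-sized ensemble tolerates no more failures than the next smaller odd-sized one. Ideally give each ZooKeeper node 1G of memory and a dedicated disk to ensure good performance. hbase.zookeeper.property.dataDir changes the path where ZooKeeper stores its data.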
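
A minimal sketch of a three-node quorum in hbase-site.xml, assuming the example host names used in the property description earlier in this post. The dataDir path is hypothetical and merely stands for a directory on a dedicated disk.

```xml
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <!-- three nodes: an odd-sized ensemble that survives one node failure -->
    <value>host1.mydomain.com,host2.mydomain.com,host3.mydomain.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <!-- hypothetical path on a dedicated disk; the default is ${hbase.tmp.dir}/zookeeper -->
    <value>/data/zookeeper</value>
  </property>
</configuration>
```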

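hbase.regionserver.handler.count
Default: 10
Description: The number of I/O threads a RegionServer uses to handle requests.
Tuning:
Tuning this parameter is closely tied to memory.
Fewer I/O threads suit Big PUT scenarios, where a single request consumes a lot of memory (large single PUTs and scans configured with a large cache both count as Big PUT), or RegionServers whose memory is tight.
More I/O threads suit scenarios where each request consumes little memory and the TPS requirement is very high. When setting this value, use memory monitoring as the main reference.
Note that if the server hosts few regions and most requests land on a single region, the read/write locks taken by flushes, which are triggered as the memstore fills up quickly, drag down overall TPS; more I/O threads are not always better.
During load tests, turn on RPC-level logging (see "Enabling RPC-level logging") so that you can monitor the memory consumption of each request together with GC behavior, then settle on a reasonable thread count from the results of several test runs.
One case study is "Hadoop and HBase Optimization for Read Intensive Search Applications", whose author set the number of I/O threads to 100 on SSD machines; treat it as a reference only.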
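
For a workload of many small requests, the override could be sketched as follows. The value 100 is simply the figure from the case study mentioned above (on SSD machines) and is an assumption here; the right number should come from your own load tests and memory monitoring.

```xml
<configuration>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <!-- illustrative: 100 handlers as in the cited SSD case study; default is 10 -->
    <value>100</value>
  </property>
</configuration>
```

As the hbase.client.write.buffer description earlier in this post notes, server-side memory use can be estimated as hbase.client.write.buffer * hbase.regionserver.handler.count, so with the default 2097152-byte buffer, 100 handlers correspond to roughly 200 MB.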

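hbase.hregion.max.filesize
Default: 256M
Description: The maximum storage size of a single region on the current RegionServer. When a region exceeds this value, it is automatically split into smaller regions.
Tuning:
Small regions are friendly to split and compaction: splitting a region or compacting the storefiles of a small region is fast and uses little memory. The downside is that splits and compactions happen very often.
In particular, a large number of small regions splitting and compacting non-stop makes cluster response time fluctuate widely; too many regions are not only a management burden but can even trigger some HBase bugs.
As a rule of thumb, anything under 512M counts as a small region.

Large regions are poorly suited to frequent splits and compactions, because each compact or split causes a long pause that hits the application's read/write performance hard. In addition, a large region means large storefiles, so compaction also puts pressure on memory.
Large regions still have their place. If your application has a period of low traffic, running compact and split during that period lets both complete smoothly while keeping read/write performance steady the rest of the time.

Since split and compaction affect performance this much, can they be eliminated?
Compaction cannot be avoided, but split can be switched from automatic to manual.
Raising this parameter to a value that is hard to reach, such as 100G, indirectly disables automatic splitting (the RegionServer will not split a region that has not reached 100G).
Then use the RegionSplitter tool to split manually whenever a split is needed.
Manual splitting is far more flexible and stable than automatic splitting, while the added management cost is small; it is recommended for online real-time systems.

On the memory side, small regions leave a lot of freedom in choosing the memstore size. With large regions the memstore can be neither too large nor too small: too large and the application's IO wait rises during flushes; too small and the resulting excess of store files hurts read performance.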
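
The "100G" trick for switching to manual splits could be written as the sketch below. The property takes a byte count, so 100G is spelled out as 100 * 1024^3 = 107374182400; the figure itself is just the example from the text.

```xml
<configuration>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <!-- 100 * 1024^3 bytes = 100G; large enough that automatic splits effectively never trigger -->
    <value>107374182400</value>
  </property>
</configuration>
```

With a setting like this, regions have to be split by hand (for example with RegionSplitter, as described above), so it only makes sense if someone owns that task.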

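hbase.regionserver.global.memstore.upperLimit/lowerLimit
Default: 0.4/0.35
upperLimit description: hbase.hregion.memstore.flush.size makes a region flush all of its memstores once their combined size exceeds the configured value. The RegionServer handles flushes asynchronously, adding requests to a queue in a producer-consumer pattern. This raises a problem: when the queue cannot be drained fast enough and a large backlog builds up, memory can climb steeply, in the worst case triggering an OOM.
upperLimit guards against excessive memory use: when the memstores of all regions on a RegionServer together reach 40% of the heap, HBase forcibly blocks all updates and flushes those regions to release the memory held by the memstores.
lowerLimit description: Same idea as upperLimit, except that when the memstores of all regions reach 35% of the heap, not all memstores are flushed. Instead HBase picks the region whose memstores use the most memory and flushes just that one; write updates are still blocked during this time. lowerLimit acts as a remedial step before a forced flush of all regions degrades performance. In the log it shows up as "** Flush thread woke up with memory above low water."
Tuning: This is a heap protection parameter, and the default already suits most scenarios.
Changing it affects both reads and writes. If heavy write pressure often pushes memstore usage past this threshold, shrink the read cache hfile.block.cache.size and raise the threshold; if the heap has plenty of headroom, raise the threshold without changing the read cache size.
If the threshold is not exceeded even under high pressure, lower it moderately and load test again, making sure it is not triggered too often; then, if plenty of heap headroom remains, raise hfile.block.cache.size to improve read performance.
Another possibility is that hbase.hregion.memstore.flush.size is unchanged but the RegionServer hosts too many regions; keep in mind that the number of regions directly affects memory use.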
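
For the write-heavy case described above (threshold hit often, read cache traded for memstore), one possible shape of the override is sketched here. All three numbers are illustrative assumptions, chosen only so that the memstore share goes up while the combined memstore plus block cache share stays at 60% of the heap.

```xml
<configuration>
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <!-- illustrative: raised from the default 0.4 -->
    <value>0.45</value>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.lowerLimit</name>
    <!-- illustrative: kept 0.05 below upperLimit, as in the defaults -->
    <value>0.4</value>
  </property>
  <property>
    <name>hfile.block.cache.size</name>
    <!-- illustrative: read cache reduced from the default 0.2 to make room -->
    <value>0.15</value>
  </property>
</configuration>
```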

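hfile.block.cache.size
Default: 0.2
Description: The fraction of the heap used by the storefile read cache; 0.2 means 20%. This value directly affects read performance.
Tuning: For reads, bigger is better. If writes are far fewer than reads, 0.4-0.5 is fine. If reads and writes are roughly balanced, use around 0.3. If writes outnumber reads, just keep the default. When setting this value, also consider hbase.regionserver.global.memstore.upperLimit, the maximum fraction of the heap used by memstores: one parameter affects reads, the other writes. If the two add up to more than 80-90%, there is a risk of OOM, so set them with care.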
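
A sketch for the "reads and writes roughly balanced" case, using the 0.3 figure from the text. It assumes hbase.regionserver.global.memstore.upperLimit is left at its default of 0.4, so the two together take 0.3 + 0.4 = 0.7 of the heap, below the 80-90% range the text warns about.

```xml
<configuration>
  <property>
    <name>hfile.block.cache.size</name>
    <!-- illustrative: 30% of heap for the read cache; default is 0.2 -->
    <value>0.3</value>
  </property>
</configuration>
```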

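hbase.hstore.blockingStoreFiles
Default: 7
Description: At flush time, when a Store (Column Family) in a region holds more than 7 storefiles, all write requests to the region are blocked while a compaction runs to reduce the number of storefiles.
Tuning: Blocking writes seriously hurts the response time of the RegionServer, but too many storefiles also hurt read performance. In practice, to get smoother response times you can set the value so high that it is effectively unlimited. If you can tolerate large peaks and troughs in response time, keep the default or adjust it to your own workload.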
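
The "effectively unlimited" setting could be sketched as below. The specific number is an arbitrary large value chosen for illustration, not one taken from the original article; any count that a Store will never reach in practice has the same effect.

```xml
<configuration>
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <!-- illustrative large value standing in for "unlimited"; default is 7 -->
    <value>1000000</value>
  </property>
</configuration>
```

The trade-off named in the text still applies: writes are no longer blocked, but reads may slow down if storefiles pile up faster than compaction can merge them.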

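hbase.hregion.memstore.block.multiplier
Default: 2
Description: When the memstores of a region grow beyond twice hbase.hregion.memstore.flush.size, all requests to that region are blocked and a flush is run to release memory.
Although we configure the total memstore size for a region, say 64M, imagine that at 63.9M I Put 200M of data: the memstore instantly balloons to several times the expected hbase.hregion.memstore.flush.size. This parameter blocks all requests once the memstore grows past 2x hbase.hregion.memstore.flush.size, to keep the risk from escalating further.
Tuning: The default is quite sound. If you expect that your normal workload (exceptions aside) has no write bursts, or that the write volume is controllable, keep the default. If under normal conditions your write volume regularly surges to several times the usual level, increase this multiplier and adjust other parameters, such as hfile.block.cache.size and hbase.regionserver.global.memstore.upperLimit/lowerLimit, to reserve more memory and keep the HBase server from hitting OOM.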
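
To make the arithmetic concrete, here is a sketch for a bursty write workload. The multiplier of 4 is an assumed example value; with the default flush size of 67108864 bytes (64M), updates to a region would be blocked once its memstore reaches 4 * 67108864 = 268435456 bytes (256M), instead of 128M with the default multiplier of 2.

```xml
<configuration>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <!-- default value, repeated here only to show the arithmetic: 64M -->
    <value>67108864</value>
  </property>
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <!-- illustrative: block updates at 4 x 64M = 256M per region; default is 2 -->
    <value>4</value>
  </property>
</configuration>
```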

hbase.hregion.memstore.mslab.enabled
Default: true
Description: Reduces Full GCs caused by memory fragmentation, improving overall performance.
Tuning: See http://kenwublog.com/avoid-full-gc-in-hbase-using-arena-allocation

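hbase.client.scanner.caching
Default: 1
Description: The number of rows fetched by one call to next on a scanner.
Tuning: Fewer RPCs is one way to make HBase more efficient; in theory, the more data fetched at once, the fewer the RPCs and the higher the efficiency. Memory, however, is the main obstacle. Pick a suitable size, to avoid fetching so much at once that it takes up too much memory and starves other programs, or causes errors such as timeouts (this timeout is governed by hbase.regionserver.lease.period).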
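
A client-side sketch of raising the scanner cache. The value 100 is an assumption picked for illustration; a suitable number depends on row size and on how much memory the client can spare.

```xml
<configuration>
  <property>
    <name>hbase.client.scanner.caching</name>
    <!-- illustrative: fetch 100 rows per RPC instead of the default 1 -->
    <value>100</value>
  </property>
</configuration>
```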

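hbase.regionserver.lease.period
Default: 60000
Description: The lease period a client holds on an HRegion server, i.e. the timeout threshold.
Tuning:
This works together with hbase.client.scanner.caching. If memory is plentiful but processing a large fetched batch takes a long time, the client may exceed this threshold; set a suitably longer period so that the client is not considered dead.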
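
If clients spend a long time processing each fetched batch, the lease could be lengthened along these lines. The value 120000 is an assumed example; the point is only that the time between two scanner calls must stay below this period.

```xml
<configuration>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <!-- illustrative: 120000 ms = 2 minutes; default is 60000 -->
    <value>120000</value>
  </property>
</configuration>
```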
Original article: https://www.cnblogs.com/hello-kelly/p/4505064.html
