HBase BlockCache
1. Cache reads and writes
Call chain:
hmaster.handleCreateTable -> HRegion.createHRegion -> HRegion.initialize -> initializeRegionInternals -> instantiateHStore
-> Store.Store -> new CacheConfig(conf, family) -> CacheConfig.instantiateBlockCache -> new LruBlockCache
Constructor parameters:
Java code
/**
 * Configurable constructor. Use this constructor if not using defaults.
 * @param maxSize maximum size of this cache, in bytes
 * @param blockSize expected average size of blocks, in bytes
 * @param evictionThread whether to run evictions in a bg thread or not
 * @param mapInitialSize initial size of backing ConcurrentHashMap
 * @param mapLoadFactor initial load factor of backing ConcurrentHashMap
 * @param mapConcurrencyLevel initial concurrency factor for backing CHM
 * @param minFactor percentage of total size that eviction will evict until
 * @param acceptableFactor percentage of total size that triggers eviction
 * @param singleFactor percentage of total size for single-access blocks
 * @param multiFactor percentage of total size for multiple-access blocks
 * @param memoryFactor percentage of total size for in-memory blocks
 */
public LruBlockCache(long maxSize, long blockSize, boolean evictionThread,
    int mapInitialSize, float mapLoadFactor, int mapConcurrencyLevel,
    float minFactor, float acceptableFactor,
    float singleFactor, float multiFactor, float memoryFactor)
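Besides applying the default parameters, new LruBlockCache also creates the evictionThread, which then waits until it is woken, and a StatisticsThread that logs cache statistics periodically.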

When HFileReaderV2.readBlock runs, it first checks whether the block cache is enabled. If it is, readBlock tries to serve the block from the cache:
Java code
// Check cache for block. If found return.
if (cacheConf.isBlockCacheEnabled()) {
  // Try and get the block from the block cache. If the useLock variable is true then this
  // is the second time through the loop and it should not be counted as a block cache miss.
  HFileBlock cachedBlock = (HFileBlock)
      cacheConf.getBlockCache().getBlock(cacheKey, cacheBlock, useLock);
  if (cachedBlock != null) {
    BlockCategory blockCategory =
        cachedBlock.getBlockType().getCategory();

    getSchemaMetrics().updateOnCacheHit(blockCategory, isCompaction);

    if (cachedBlock.getBlockType() == BlockType.DATA) {
      HFile.dataBlockReadCnt.incrementAndGet();
    }

    validateBlockType(cachedBlock, expectedBlockType);

    // Validate encoding type for encoded blocks. We include encoding
    // type in the cache key, and we expect it to match on a cache hit.
    if (cachedBlock.getBlockType() == BlockType.ENCODED_DATA &&
        cachedBlock.getDataBlockEncoding() !=
        dataBlockEncoder.getEncodingInCache()) {
      throw new IOException("Cached block under key " + cacheKey + " " +
          "has wrong encoding: " + cachedBlock.getDataBlockEncoding() +
          " (expected: " + dataBlockEncoder.getEncodingInCache() + ")");
    }
    return cachedBlock;
  }
  // Carry on, please load.
}
The getBlock method updates some statistics and, more importantly, promotes the block from BlockPriority.SINGLE to BlockPriority.MULTI (through the cb.access call):
Java code
public Cacheable getBlock(BlockCacheKey cacheKey, boolean caching, boolean repeat) {
  CachedBlock cb = map.get(cacheKey);
  if (cb == null) {
    if (!repeat) stats.miss(caching);
    return null;
  }
  stats.hit(caching);
  cb.access(count.incrementAndGet());
  return cb.getBuffer();
}
———————
If this is the first read of the block (a cache miss), the block is added to the cache after it is loaded:
Java code
// Cache the block if necessary
if (cacheBlock && cacheConf.shouldCacheBlockOnRead(
    hfileBlock.getBlockType().getCategory())) {
  cacheConf.getBlockCache().cacheBlock(cacheKey, hfileBlock,
      cacheConf.isInMemory());
}
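Taken together, the read path follows a read-through pattern: look in the cache, load on a miss, and optionally cache the result. The sketch below shows only that control flow. ReadThroughSketch, its String keys, and the loader function are assumptions made for illustration and do not correspond to HBase classes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through cache; it mirrors only the control flow of readBlock.
final class ReadThroughSketch {
    private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();

    /**
     * Returns the cached block if present. Otherwise loads it and,
     * when cacheBlock is true, stores it for later reads.
     */
    byte[] readBlock(String key, boolean cacheBlock, Function<String, byte[]> loader) {
        byte[] cached = cache.get(key);
        if (cached != null) {
            return cached;               // cache hit
        }
        byte[] loaded = loader.apply(key); // cache miss: read from storage
        if (cacheBlock) {
            cache.put(key, loaded);
        }
        return loaded;
    }

    public static void main(String[] args) {
        ReadThroughSketch reader = new ReadThroughSketch();
        int[] loads = {0};
        Function<String, byte[]> loader = k -> { loads[0]++; return new byte[] {42}; };
        reader.readBlock("block-1", true, loader);
        reader.readBlock("block-1", true, loader);
        System.out.println(loads[0]); // 1: the second read was served from the cache
    }
}
```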
2. LRU eviction

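Writing to the cache simply puts the block into a ConcurrentHashMap and updates the metrics. The cache then checks if(newSize > acceptableSize() && !evictionInProgress). acceptableSize is derived from the values given at initialization, (long)Math.floor(this.maxSize * this.acceptableFactor), where acceptableFactor is a configurable fraction: "hbase.lru.blockcache.acceptable.factor" (0.85f). In other words, the check asks whether the total size exceeds this threshold. If it does and no eviction is already in progress, eviction runs.
Java code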
/**
 * Cache the block with the specified name and buffer.
 * <p>
 * It is assumed this will NEVER be called on an already cached block. If
 * that is done, an exception will be thrown.
 * @param cacheKey block's cache key
 * @param buf block buffer
 * @param inMemory if block is in-memory
 */
public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
  CachedBlock cb = map.get(cacheKey);
  if (cb != null) {
    throw new RuntimeException("Cached an already cached block");
  }
  cb = new CachedBlock(cacheKey, buf, count.incrementAndGet(), inMemory);
  long newSize = updateSizeMetrics(cb, false);
  map.put(cacheKey, cb);
  elements.incrementAndGet();
  if (newSize > acceptableSize() && !evictionInProgress) {
    runEviction();
  }
}
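As a small worked example of the trigger condition, the sketch below reuses the formula from the text with an assumed maxSize of 1000 bytes and the default acceptable factor of 0.85f. EvictionTriggerSketch and its method names are hypothetical; only the arithmetic comes from the code above.

```java
// Hypothetical helper that reproduces only the threshold arithmetic.
final class EvictionTriggerSketch {
    /** Same formula as acceptableSize() in the text. */
    static long acceptableSize(long maxSize, float acceptableFactor) {
        return (long) Math.floor(maxSize * acceptableFactor);
    }

    /** Eviction starts only above the threshold and when none is already running. */
    static boolean shouldEvict(long newSize, long maxSize, float acceptableFactor,
                               boolean evictionInProgress) {
        return newSize > acceptableSize(maxSize, acceptableFactor) && !evictionInProgress;
    }

    public static void main(String[] args) {
        long maxSize = 1000L; // assumed value, chosen for easy arithmetic
        System.out.println(acceptableSize(maxSize, 0.85f));          // 850
        System.out.println(shouldEvict(900L, maxSize, 0.85f, false)); // true
        System.out.println(shouldEvict(900L, maxSize, 0.85f, true));  // false
    }
}
```

Note that the cache can temporarily exceed the acceptable size: the block is inserted first, and the check happens afterwards.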
The evict method works as follows:
1. Compute the current total size and the number of bytes to free. minSize = (long)Math.floor(this.maxSize * this.minFactor), where minFactor is configurable through "hbase.lru.blockcache.min.factor" (0.75f):
Java code
long currentSize = this.size.get();
long bytesToFree = currentSize - minSize();
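2. Initialize three BlockBuckets (bucketSingle, bucketMulti, bucketMemory), then iterate over the map and add each block to the queue of its bucket (MinMaxPriorityQueue.expectedSize(initialSize).create();). Each queue is ordered from most recently to least recently accessed, based on the access sequence number that cb.access records.
The three types differ as follows:
    SINGLE: blocks that have been read only once
    MULTI: blocks that have been read more than once
    MEMORY: blocks of column families with IN_MEMORY set to true
Java code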
// Instantiate priority buckets
BlockBucket bucketSingle = new BlockBucket(bytesToFree, blockSize,
    singleSize());
BlockBucket bucketMulti = new BlockBucket(bytesToFree, blockSize,
    multiSize());
BlockBucket bucketMemory = new BlockBucket(bytesToFree, blockSize,
    memorySize());
By default, the cache size is split among the three BlockBuckets in these proportions:
static final float DEFAULT_SINGLE_FACTOR = 0.25f;
static final float DEFAULT_MULTI_FACTOR = 0.50f;
static final float DEFAULT_MEMORY_FACTOR = 0.25f;
Java code
private long singleSize() {
  return (long)Math.floor(this.maxSize * this.singleFactor * this.minFactor);
}
private long multiSize() {
  return (long)Math.floor(this.maxSize * this.multiFactor * this.minFactor);
}
private long memorySize() {
  return (long)Math.floor(this.maxSize * this.memoryFactor * this.minFactor);
}
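To make the numbers concrete, here is a worked sketch using the default factors and an assumed maxSize of 1000 bytes with an assumed current size of 900 bytes. BucketSizeSketch is a hypothetical helper; the formulas are the ones shown above.

```java
// Hypothetical helper reproducing the size formulas from the text.
final class BucketSizeSketch {
    /** Target size that eviction shrinks the cache down to. */
    static long minSize(long maxSize, float minFactor) {
        return (long) Math.floor(maxSize * minFactor);
    }

    /** Share of the post-eviction cache reserved for one bucket. */
    static long bucketSize(long maxSize, float bucketFactor, float minFactor) {
        return (long) Math.floor(maxSize * bucketFactor * minFactor);
    }

    public static void main(String[] args) {
        long maxSize = 1000L;      // assumed value
        long currentSize = 900L;   // assumed value
        float minFactor = 0.75f;

        long bytesToFree = currentSize - minSize(maxSize, minFactor);
        System.out.println(bytesToFree);                           // 150
        System.out.println(bucketSize(maxSize, 0.25f, minFactor)); // 187 (single)
        System.out.println(bucketSize(maxSize, 0.50f, minFactor)); // 375 (multi)
        System.out.println(bucketSize(maxSize, 0.25f, minFactor)); // 187 (memory)
    }
}
```

Because each bucket size is floored separately, the three shares add up to 749 here, one byte less than minSize.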
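The three BlockBuckets are then added to a priority queue ordered by totalSize - bucketSize (the bucket's overflow). For each bucket, the number of bytes to free is computed and free is called:
Java code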
PriorityQueue<BlockBucket> bucketQueue =
    new PriorityQueue<BlockBucket>(3);

bucketQueue.add(bucketSingle);
bucketQueue.add(bucketMulti);
bucketQueue.add(bucketMemory);

int remainingBuckets = 3;
long bytesFreed = 0;

BlockBucket bucket;
while ((bucket = bucketQueue.poll()) != null) {
  long overflow = bucket.overflow();
  if (overflow > 0) {
    long bucketBytesToFree = Math.min(overflow,
        (bytesToFree - bytesFreed) / remainingBuckets);
    bytesFreed += bucket.free(bucketBytesToFree);
  }
  remainingBuckets--;
}
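The loop can be traced with a simplified model. In this sketch, Bucket is a hypothetical stand-in for BlockBucket that tracks only byte counts, and its free method releases exactly the requested amount. The real free evicts whole blocks, so it may free somewhat more. The sketch also assumes the queue polls the bucket with the smallest overflow first.

```java
import java.util.PriorityQueue;

// Simplified model of the bucket loop; Bucket is not the HBase BlockBucket.
final class BucketEvictionSketch {
    static final class Bucket implements Comparable<Bucket> {
        long totalSize;         // bytes currently held by this bucket
        final long bucketSize;  // bytes this bucket is entitled to

        Bucket(long totalSize, long bucketSize) {
            this.totalSize = totalSize;
            this.bucketSize = bucketSize;
        }

        long overflow() { return totalSize - bucketSize; }

        // Simplification: frees exactly toFree bytes instead of whole blocks.
        long free(long toFree) {
            long freed = Math.min(toFree, totalSize);
            totalSize -= freed;
            return freed;
        }

        @Override
        public int compareTo(Bucket other) {
            return Long.compare(overflow(), other.overflow());
        }
    }

    static long evict(long bytesToFree, Bucket... buckets) {
        PriorityQueue<Bucket> queue = new PriorityQueue<>();
        for (Bucket b : buckets) {
            queue.add(b);
        }
        int remainingBuckets = buckets.length;
        long bytesFreed = 0;
        Bucket bucket;
        while ((bucket = queue.poll()) != null) {
            long overflow = bucket.overflow();
            if (overflow > 0) {
                // Never take more than the overflow or an equal share of what is left.
                long share = Math.min(overflow, (bytesToFree - bytesFreed) / remainingBuckets);
                bytesFreed += bucket.free(share);
            }
            remainingBuckets--;
        }
        return bytesFreed;
    }

    public static void main(String[] args) {
        Bucket single = new Bucket(400, 187);
        Bucket multi = new Bucket(400, 375);
        Bucket memory = new Bucket(100, 187);
        System.out.println(evict(150, single, multi, memory)); // 150
        System.out.println(multi.totalSize);  // 375: gave up only its 25-byte overflow
        System.out.println(single.totalSize); // 275: covered the remaining 125 bytes
    }
}
```

Buckets that stay within their share are left untouched, so a burst of single-access blocks is paid for by the SINGLE bucket rather than by MULTI or MEMORY.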
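The free method takes blocks from the queue one at a time. Because the queue is ordered from most to least recently accessed, polling from the tail yields the least recently used blocks first. Each block is removed from the map and the metrics are updated.
Java code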
public long free(long toFree) {
  CachedBlock cb;
  long freedBytes = 0;
  while ((cb = queue.pollLast()) != null) {
    freedBytes += evictBlock(cb);
    if (freedBytes >= toFree) {
      return freedBytes;
    }
  }
  return freedBytes;
}

protected long evictBlock(CachedBlock block) {
  map.remove(block.getCacheKey());
  updateSizeMetrics(block, true);
  elements.decrementAndGet();
  stats.evicted();
  return block.heapSize();
}
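The eviction order inside one bucket can be illustrated with a sorted set. The sketch below swaps in java.util.TreeSet for the MinMaxPriorityQueue used above so that it needs no external library. FreeOrderSketch and Block are hypothetical names, and the sketch assumes that access times are unique.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Illustrates tail-first eviction; TreeSet replaces MinMaxPriorityQueue here.
final class FreeOrderSketch {
    static final class Block {
        final String key;
        final long accessTime; // logical clock value of the last access
        final long heapSize;

        Block(String key, long accessTime, long heapSize) {
            this.key = key;
            this.accessTime = accessTime;
            this.heapSize = heapSize;
        }
    }

    /** Queue ordered from most recently to least recently accessed. */
    static TreeSet<Block> newQueue() {
        return new TreeSet<>(Comparator.comparingLong((Block b) -> b.accessTime).reversed());
    }

    /** Evicts from the tail until at least toFree bytes are released. */
    static List<String> free(TreeSet<Block> queue, long toFree) {
        List<String> evictedKeys = new ArrayList<>();
        long freedBytes = 0;
        Block block;
        while ((block = queue.pollLast()) != null) {
            evictedKeys.add(block.key);
            freedBytes += block.heapSize;
            if (freedBytes >= toFree) {
                break;
            }
        }
        return evictedKeys;
    }

    public static void main(String[] args) {
        TreeSet<Block> queue = newQueue();
        queue.add(new Block("a", 1L, 100L));
        queue.add(new Block("b", 5L, 100L));
        queue.add(new Block("c", 3L, 100L));
        System.out.println(free(queue, 150L)); // [a, c]
    }
}
```

Since whole blocks are evicted, the example frees 200 bytes for a 150-byte request. The most recently accessed block, b, survives.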
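3. The distinguishing feature of HBase's LruBlockCache is that it applies different policies to blocks depending on how many times they have been accessed. This keeps one-off reads (for example, a Scan) from constantly churning the cache, which helps read performance.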
Original article: https://www.cnblogs.com/cl1024cl/p/6205202.html