Merging Empty Chunks in MongoDB

Chunk maintenance

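As we know, MongoDB runs an autoSplitter process that splits a chunk once it grows too large. There is also a balancer process that moves chunks around so that they are distributed evenly across the shards. As data grows, chunks get split and may be migrated to other shards.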

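But what happens when we delete data? Some chunks can end up nearly or completely empty, and if we delete a lot of data, a significant number of chunks will be empty. This is a particular problem for sharded collections with a TTL index.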

Potential problems

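One potential problem is that with a large number of empty chunks, data distribution across the sharded cluster becomes unbalanced. The balancer keeps the number of chunks on each shard even, but it does not account for empty chunks. So you can end up with a cluster that looks balanced but is not: some shards hold far more data than others.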
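
To make the imbalance concrete, here is a minimal sketch in plain JavaScript. The chunk list, shard names and byte sizes are invented illustration data, and summarizeByShard is a hypothetical helper, not something MongoDB provides. It only shows how equal chunk counts can hide unequal data volumes.

```javascript
// Hypothetical chunk list: shard names and byte sizes are invented for illustration.
const sampleChunks = [
  { shard: "shardA", size: 64 * 1024 * 1024 },
  { shard: "shardA", size: 48 * 1024 * 1024 },
  { shard: "shardB", size: 0 },
  { shard: "shardB", size: 0 },
];

// Tally the chunk count and the total bytes held by each shard.
function summarizeByShard(chunks) {
  const summary = {};
  for (const chunk of chunks) {
    if (!summary[chunk.shard]) {
      summary[chunk.shard] = { chunks: 0, bytes: 0 };
    }
    summary[chunk.shard].chunks += 1;
    summary[chunk.shard].bytes += chunk.size;
  }
  return summary;
}

const summary = summarizeByShard(sampleChunks);
// Both shards own two chunks, yet in this sample shardB holds no data at all.
console.log(JSON.stringify(summary));
```

A balancer that looks only at the chunk count would consider this sample cluster perfectly even.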

To fix this, we first need to find out which chunks are empty.

Finding empty chunks

Suppose a collection is sharded on org_id, and suppose its current chunk ranges are:

minKey -> 1, 1 -> 5, 5 -> 10, 10 -> 15, 15 -> 20, ...

We can use the dataSize command to check the size of a chunk.

For example, to check how many documents the third chunk holds, we can run:

db.runCommand({ dataSize: "mydatabase.clients", keyPattern: { org_id: 1 }, min: { org_id: 5 }, max: { org_id: 10 } })

It returns something like the following:

{
    "size" : 0,
    "numObjects" : 0,
    "millis" : 30,
    "ok" : 1,
    "operationTime" : Timestamp(1641829163, 2),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1641829163, 3),
        "signature" : {
            "hash" : BinData(0,"LbBPsTEahzG/v7I6oe7iyvLr/pU="),
            "keyId" : NumberLong("7016744225173049401")
        }
    }
}

If size is 0, we know the chunk is empty. We can then consider merging it with the chunk after it (range 10 → 15) or the chunk before it (range 1 → 5).
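
The check above can be wrapped in a small helper. This is only a sketch: buildDataSizeCommand and isChunkEmpty are hypothetical names, and the code assumes the result has the shape shown in the sample output (numeric size and numObjects, ok equal to 1). The database argument is anything that exposes runCommand(), such as db in the mongo shell.

```javascript
// Build the dataSize command document for one chunk range.
// The command name must be the first key of the document.
function buildDataSizeCommand(ns, keyPattern, min, max) {
  return { dataSize: ns, keyPattern: keyPattern, min: min, max: max };
}

// Return true when the dataSize result reports no data in the range.
// `database` is assumed to expose runCommand(), like `db` in the mongo shell.
function isChunkEmpty(database, ns, keyPattern, min, max) {
  const result = database.runCommand(buildDataSizeCommand(ns, keyPattern, min, max));
  if (result.ok !== 1) {
    throw new Error("dataSize failed: " + JSON.stringify(result));
  }
  // Number() also copes with shells that wrap integers in a numeric type.
  return Number(result.size) === 0 && Number(result.numObjects) === 0;
}
```

In the shell, the third chunk from the example could then be checked with isChunkEmpty(db.getSiblingDB("mydatabase"), "mydatabase.clients", { org_id: 1 }, { org_id: 5 }, { org_id: 10 }).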

Merging chunks

Suppose we merge it with the chunk that follows:

db.adminCommand({ mergeChunks: "mydatabase.clients", bounds: [ { org_id: 5 }, { org_id: 15 } ] })

The chunk ranges then become:

minKey -> 1, 1 -> 5, 5 -> 15, 15 -> 20, ...

If the two chunks we want to merge are not on the same shard, we first have to run moveChunk so that both sit on the same shard.
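
The move-then-merge order can be sketched as a function that only builds the command documents and does not run anything. planMerge is a hypothetical helper, and the chunk shape { min, max, shard } is an assumption for illustration. The sketch moves the empty chunk rather than its neighbor, since an empty chunk has no documents to copy.

```javascript
// Given an empty chunk and an adjacent neighbor (assumed shape: { min, max, shard }),
// return the admin command documents to run, in order.
function planMerge(ns, emptyChunk, neighbor) {
  const emptyIsLower = JSON.stringify(emptyChunk.max) === JSON.stringify(neighbor.min);
  const neighborIsLower = JSON.stringify(neighbor.max) === JSON.stringify(emptyChunk.min);
  if (!emptyIsLower && !neighborIsLower) {
    throw new Error("chunks are not contiguous");
  }
  const lower = emptyIsLower ? emptyChunk : neighbor;
  const upper = emptyIsLower ? neighbor : emptyChunk;

  const commands = [];
  if (emptyChunk.shard !== neighbor.shard) {
    // Bring the empty chunk onto the neighbor's shard before merging.
    commands.push({
      moveChunk: ns,
      bounds: [emptyChunk.min, emptyChunk.max],
      to: neighbor.shard,
    });
  }
  // The merged range runs from the lower chunk's min to the upper chunk's max.
  commands.push({ mergeChunks: ns, bounds: [lower.min, upper.max] });
  return commands;
}
```

Each returned document could then be passed to db.adminCommand() in the shell, one at a time and in order.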

Merging all the empty chunks

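Following the logic above, we can iterate over all the chunks in shard key order and check the size of each. When we find an empty chunk, we merge it with the chunk before it; if the two are not on the same shard, we first move them together. The following script prints all the commands needed: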

var mergeChunkInfo = function (ns) {
    // Assumes config.chunks documents carry an "ns" field, as in MongoDB versions before 5.0;
    // later versions identify the collection by UUID instead.
    var chunks = db.getSiblingDB("config").chunks.find({ "ns": ns }).sort({ min: 1 }).noCursorTimeout();

    // some counters for overall stats at the end
    var totalChunks = 0;
    var totalMerges = 0;
    var totalMoves = 0;
    var previousChunk = {};
    var previousChunkInfo = {};
    var ChunkJustChanged = false;

    chunks.forEach(
        function printChunkInfo(currentChunk) {
            var db1 = db.getSiblingDB(currentChunk.ns.split(".")[0]);
            var key = db.getSiblingDB("config").collections.findOne({ _id: currentChunk.ns }).key;
            db1.getMongo().setReadPref("secondary");
            var currentChunkInfo = db1.runCommand({ dataSize: currentChunk.ns, keyPattern: key, min: currentChunk.min, max: currentChunk.max, estimate: true });
            totalChunks++;

            // if the current chunk is empty and the chunk before it was not merged in the previous
            // iteration (or was the first chunk) we have candidates for merging
            if (currentChunkInfo.size == 0 && !ChunkJustChanged) {
                // if the chunks are contiguous
                if (JSON.stringify(previousChunk.max) == JSON.stringify(currentChunk.min)) {
                    // if they belong to the same shard, merge with the previous chunk
                    if (previousChunk.shard.toString() == currentChunk.shard.toString()) {
                        // mergeChunks has to run against the admin database, hence db.adminCommand
                        print('db.adminCommand( { mergeChunks: "' + currentChunk.ns.toString() + '",' + ' bounds: [ ' + JSON.stringify(previousChunk.min) + ',' + JSON.stringify(currentChunk.max) + ' ] });');
                        // after a merge or move, we don't consider the current chunk for the next iteration. We skip to the next chunk.
                        ChunkJustChanged = true;
                        totalMerges++;
                    }
                    // if they are contiguous but on different shards, both chunks need to be on the
                    // same shard before merging, so move the current one and don't merge for now
                    else {
                        print('db.adminCommand( { moveChunk: "' + currentChunk.ns.toString() + '",' + ' bounds: [ ' + JSON.stringify(currentChunk.min) + ',' + JSON.stringify(currentChunk.max) + ' ], to: "' + previousChunk.shard.toString() + '" });');
                        // after a merge or move, we don't consider the current chunk for the next iteration. We skip to the next chunk.
                        ChunkJustChanged = true;
                        totalMoves++;
                    }
                } else {
                    // chunks are not contiguous (this shouldn't happen unless this is the first iteration)
                    previousChunk = currentChunk;
                    previousChunkInfo = currentChunkInfo;
                    ChunkJustChanged = false;
                }
            } else {
                // if the current chunk is not empty or we already operated with the previous chunk
                // let's continue with the next chunk pair
                previousChunk = currentChunk;
                previousChunkInfo = currentChunkInfo;
                ChunkJustChanged = false;
            }
        }
    );

    print("***********Summary Chunk Information***********");
    print("Total Chunks: " + totalChunks);
    print("Total Move Commands to Run: " + totalMoves);
    print("Total Merge Commands to Run: " + totalMerges);
};

It can be run from the mongo shell:

mergeChunkInfo("mydb.mycollection")

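The script generates all the commands needed to merge the chunks. Because it skips the chunk right after each merge or move, running the generated commands roughly halves the number of empty chunks; repeating the process gradually eliminates them.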

Final thoughts

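Many people are aware of the problems caused by jumbo chunks; we have now seen that empty chunks can also cause trouble in some scenarios. When performing operations that modify chunks (such as merging them), it is best to stop the balancer first so the operations do not conflict. Do not forget to re-enable the balancer once you are done.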
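
One way to make sure the balancer is always turned back on is to wrap the maintenance work in a helper. The following is a sketch under assumptions: withBalancerStopped is a hypothetical name, and the shell argument is anything exposing stopBalancer() and startBalancer(), such as the sh helper object in the mongo shell.

```javascript
// Stop the balancer, run the given work, and restart the balancer afterwards.
// `shell` is assumed to expose stopBalancer() and startBalancer(), like `sh`.
function withBalancerStopped(shell, work) {
  shell.stopBalancer();
  try {
    return work();
  } finally {
    // Runs even if `work` throws, so the balancer is not left disabled.
    shell.startBalancer();
  }
}
```

In the shell this could be called as withBalancerStopped(sh, function () { /* run the generated merge and move commands here */ }).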

Original article: https://www.cnblogs.com/abclife/p/15968077.html