• mongodb的sharding(分片)实验


    其实sharding也不是什么很新的数据库功能,更不是mongodb独有的,类似的oracle中的功能叫做分区表(partition table),目前多数互联网公司的数据库应该都使用了类似的技术,口头交流中也有人喜欢叫“分库分表”。其功能是,当一个集合(或者说oracle中的表)变的很大之后,可以把这个集合的数据分到2个数据库(即shards)中,这样不会导致数据库变得大到难以维护,同时自动实现读写和负载均衡,这样在集合持续变大的过程中,只需要增大数据库的个数即可。

    mongodb的sharding需要好几个服务,配置服务器(config server),路由进程(route process),片数据库(shards)。config server用来保存sharding的配置信息,在整个sharding中是单点,一般会有多个备份;route process,mongodb中叫mongos,负责与client连接并将数据分布到各个shards上;shards就是存储分片之后的数据。当然,从数据的安全和运行的业务连续性角度来看,shards和route process也应该有多个备份。

    实验中,使用1个config server,1个mongos,2个shards,为了启动进程加快,试验中基本都使用了--nojournal选项。

    首先建立好存储数据文件的目录,并启动config server和mongos,分别监听10000和10100端口

    XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/config
    XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/shard1
    XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/shard2
    XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/config_log
    XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/shard1_log
    XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/shard2_log
    XXXXX@XXXXX-asus:/var/lib$ sudo mongod --dbpath /var/lib/mongodb_sharding/config --logpath /var/lib/mongodb_sharding/config_log/mongodb.log --port 10000 --nojournal --fork
    forked process: 8430
    all output going to: /var/lib/mongodb_sharding/config_log/mongodb.log
    log file [/var/lib/mongodb_sharding/config_log/mongodb.log] exists; copied to temporary file [/var/lib/mongodb_sharding/config_log/mongodb.log.2014-03-21T08-03-50]
    child process started successfully, parent exiting
    XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/mongos_log
    XXXXX@XXXXX-asus:/var/lib$ sudo mongos --bind_ip localhost --port 10100 --logpath /var/lib/mongodb_sharding/mongos_log/mongos.log --configdb localhost:10000 --fork
    forked process: 8709
    all output going to: /var/lib/mongodb_sharding/mongos_log/mongos.log
    log file [/var/lib/mongodb_sharding/mongos_log/mongos.log] exists; copied to temporary file [/var/lib/mongodb_sharding/mongos_log/mongos.log.2014-03-21T08-12-03]
    child process started successfully, parent exiting

    然后启动2个shard,分别监听20000和30000端口。

    XXXXX@XXXXX-asus:/var/lib$ sudo mongod --dbpath /var/lib/mongodb_sharding/shard1 --logpath /var/lib/mongodb_sharding/shard1_log/mongodb.log --port 20000 --nojournal --fork
    forked process: 8834
    all output going to: /var/lib/mongodb_sharding/shard1_log/mongodb.log
    log file [/var/lib/mongodb_sharding/shard1_log/mongodb.log] exists; copied to temporary file [/var/lib/mongodb_sharding/shard1_log/mongodb.log.2014-03-21T08-15-43]
    child process started successfully, parent exiting
    XXXXX@XXXXX-asus:/var/lib$ sudo mongod --dbpath /var/lib/mongodb_sharding/shard2 --logpath /var/lib/mongodb_sharding/shard2_log/mongodb.log --port 30000 --nojournal --fork
    forked process: 8846
    all output going to: /var/lib/mongodb_sharding/shard2_log/mongodb.log
    child process started successfully, parent exiting
    XXXXX@XXXXX-asus:/var/lib$ 

    然后用admin用户登陆mongos,并将2个shard加入到整个分片体系中,

    XXXXX@XXXXX-asus:/var/lib$ mongo localhost:10100
    MongoDB shell version: 2.2.4
    connecting to: localhost:10100/test
    mongos> use admin
    switched to db admin
    mongos> db.printShardingStatus()
    --- Sharding Status --- 
      sharding version: { "_id" : 1, "version" : 3 }
      shards:
      databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    
    mongos> db.runCommand({addshard:"localhost:20000", allowLocal:true})
    { "shardAdded" : "shard0000", "ok" : 1 }
    mongos> db.runCommand({addshard:"localhost:30000", allowLocal:true})
    { "shardAdded" : "shard0001", "ok" : 1 }
    mongos> db.printShardingStatus()
    --- Sharding Status --- 
      sharding version: { "_id" : 1, "version" : 3 }
      shards:
        {  "_id" : "shard0000",  "host" : "localhost:20000" }
        {  "_id" : "shard0001",  "host" : "localhost:30000" }
      databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    
    mongos> 

    一般情况下,如果要做sharding的话,会对某个表的“_id”这个列来做片键(shard key),因为这个是每个集合都有的元素。当然,也可以用其他的列来做片键,不过要对这个列新建索引,否则对集合做shardcollection的时候会报错。本实验中仍然是用"_id"来做片键,实验用的数据库是test_sharding,集合是test_sharding.test。

    mongos> use admin
    switched to db admin
    mongos> db.runCommand({"enablesharding":"test_sharding"})
    { "ok" : 1 }
    mongos> db.runCommand({"shardcollection":"test_sharding.test","key":{"_id":1}})
    { "collectionsharded" : "test_sharding.test", "ok" : 1 }
    mongos> 

    在没有insert任何数据的情况下,test集合的统计信息如下

    mongos> use test_sharding
    switched to db test_sharding
    mongos> db.test.stats()
    {
        "sharded" : true,
        "ns" : "test_sharding.test",
        "count" : 0,
        "numExtents" : 1,
        "size" : 0,
        "storageSize" : 8192,
        "totalIndexSize" : 8176,
        "indexSizes" : {
            "_id_" : 8176
        },
        "avgObjSize" : 0,
        "nindexes" : 1,
        "nchunks" : 1,
        "shards" : {
            "shard0000" : {
                "ns" : "test_sharding.test",
                "count" : 0,
                "size" : 0,
                "storageSize" : 8192,
                "numExtents" : 1,
                "nindexes" : 1,
                "lastExtentSize" : 8192,
                "paddingFactor" : 1,
                "systemFlags" : 1,
                "userFlags" : 0,
                "totalIndexSize" : 8176,
                "indexSizes" : {
                    "_id_" : 8176
                },
                "ok" : 1
            }
        },
        "ok" : 1
    }
    mongos> 

    可以看到sharding已经被启用,但是此时只有一个shard在使用中,然后用脚本插入很多条数据,我是用ruby

    #!/usr/bin/env ruby
    # 20140321, sharding_test.rb
    ###
    # test sharding
    ###
    
    
    require "rubygems"
    require "mongo"
    
    class MongoConnection
      def initialize(host, port)
        @mongoconn = Mongo::MongoClient.new(host, port)
        @db = @mongoconn.db("test_sharding")
      end
    
      def test()
        @coll = @db.collection("test")
    
        @coll.insert({"num"=>0})
        (1..600000).each { |i| @coll.insert({ num:i, num_org:i, create_at:Time.now, txt:"The quick brown fox jumps over the lazy dog." } ) }
        puts @coll.find({num:5000}).explain
    
      end
    
    end
    
    mongo_conn = MongoConnection.new("localhost", 10100)
    
    puts "======TEST WITHOUT INDEX========================"
    mongo_conn.test()

    插入600000条数据之后,集合的统计信息发生了明显的变化

    mongos> db.test.stats()
    {
        "sharded" : true,
        "ns" : "test_sharding.test",
        "count" : 600001,
        "numExtents" : 18,
        "size" : 72000032,
        "storageSize" : 116883456,
        "totalIndexSize" : 19491584,
        "indexSizes" : {
            "_id_" : 19491584
        },
        "avgObjSize" : 119.99985333357778,
        "nindexes" : 1,
        "nchunks" : 6,
        "shards" : {
            "shard0000" : {
                "ns" : "test_sharding.test",
                "count" : 319678,
                "size" : 38361360,
                "avgObjSize" : 120,
                "storageSize" : 58441728,
                "numExtents" : 9,
                "nindexes" : 1,
                "lastExtentSize" : 20643840,
                "paddingFactor" : 1,
                "systemFlags" : 1,
                "userFlags" : 0,
                "totalIndexSize" : 10383520,
                "indexSizes" : {
                    "_id_" : 10383520
                },
                "ok" : 1
            },
            "shard0001" : {
                "ns" : "test_sharding.test",
                "count" : 280323,
                "size" : 33638672,
                "avgObjSize" : 119.99968607641898,
                "storageSize" : 58441728,
                "numExtents" : 9,
                "nindexes" : 1,
                "lastExtentSize" : 20643840,
                "paddingFactor" : 1,
                "systemFlags" : 1,
                "userFlags" : 0,
                "totalIndexSize" : 9108064,
                "indexSizes" : {
                    "_id_" : 9108064
                },
                "ok" : 1
            }
        },
        "ok" : 1
    }
    mongos> 

    可见,2个shard上都被分配到了数据,分配的还算近似均匀。试验中,如果数据少的话,基本看不到sharding的效果,数据都放在shard0000上,但是数据多了之后,sharding的效果就有了。引用别人博客上的一段话,不知道对不对,来自这里(http://blog.sina.com.cn/s/blog_502c8cc40100pdiv.html

    为了测试Sharding的balance效果,我陆续插入了大约200M的数据,插入过程中使用db.stats() 查询数据分布情况。发现在数据量较小,30M以下时,所有trunk都存储在了shard0000上,但继续插入后,数据开始平均分布,并且mongos会对多个shard之间的数据进行rebalance 。在插入数据达到200M,刚插入结束时,shard0000上大约有135M数据,而shard0001上大约有65M数据,但过一段时间之后,shard0000上的数据量减少到了115M,shard0001上的数据量达到了85M。

    实验中发现,mongodb对shard的选择有某种记忆性,比如,写入10000条数据,最后第10000条数据是写在shard0001上的,此时用remove删除所有数据,再次写数据的时候,发现mongodb是从shard0001开始写,而不是shard0000。还有一个比较恶心的地方,mongodb似乎不允许更改shard key,即时是空集合也不行,至少我没有找到相关的方法。

    insert数据之后可以发现sharding status发生变化

    mongos> db.printShardingStatus()
    --- Sharding Status --- 
      sharding version: { "_id" : 1, "version" : 3 }
      shards:
        {  "_id" : "shard0000",  "host" : "localhost:20000" }
        {  "_id" : "shard0001",  "host" : "localhost:30000" }
      databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0000" }
        {  "_id" : "test_sharding",  "partitioned" : true,  "primary" : "shard0000" }
            test_sharding.test chunks:
                    shard0000    3
                    shard0001    3
                { "_id" : { $minKey : 1 } } -->> { "_id" : ObjectId("532c3064cf7c7c2f53000001") } on : shard0000 Timestamp(5000, 0) 
                { "_id" : ObjectId("532c3064cf7c7c2f53000001") } -->> { "_id" : ObjectId("532c306acf7c7c2f530012d9") } on : shard0001 Timestamp(5000, 1) 
                { "_id" : ObjectId("532c306acf7c7c2f530012d9") } -->> { "_id" : ObjectId("532c310acf7c7c2f530290e0") } on : shard0000 Timestamp(4000, 1) 
                { "_id" : ObjectId("532c310acf7c7c2f530290e0") } -->> { "_id" : ObjectId("532c31a1cf7c7c2f5304f397") } on : shard0000 Timestamp(3000, 2) 
                { "_id" : ObjectId("532c31a1cf7c7c2f5304f397") } -->> { "_id" : ObjectId("532c3234cf7c7c2f53075090") } on : shard0001 Timestamp(4000, 2) 
                { "_id" : ObjectId("532c3234cf7c7c2f53075090") } -->> { "_id" : { $maxKey : 1 } } on : shard0001 Timestamp(4000, 3) 
    
    mongos> 

    此时,还可以再增加一个shard,操作和之前类似

    XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/shard3
    XXXXX@XXXXX-asus:/var/lib$ sudo mkdir -p mongodb_sharding/shard3_log
    XXXXX@XXXXX-asus:/var/lib$ sudo mongod --dbpath /var/lib/mongodb_sharding/shard3 --logpath /var/lib/mongodb_sharding/shard3_log/mongodb.log --port 40000 --nojournal --fork
    forked process: 12525
    all output going to: /var/lib/mongodb_sharding/shard3_log/mongodb.log
    child process started successfully, parent exiting
    XXXXX@XXXXX-asus:/var/lib$ mongo localhost:10100
    MongoDB shell version: 2.2.4
    connecting to: localhost:10100/test
    mongos> use admin
    switched to db admin
    mongos> db.runCommand({addshard:"localhost:40000", allowLocal:true})
    { "shardAdded" : "shard0002", "ok" : 1 }
    mongos> 

    加入新的shard之后,数据会自动飘移到新的shard上,可以查看集合的统计信息

    mongos> db.test.stats()
    {
        "sharded" : true,
        "ns" : "test_sharding.test",
        "count" : 600001,
        "numExtents" : 22,
        "size" : 72000032,
        "storageSize" : 117579776,
        "totalIndexSize" : 19499760,
        "indexSizes" : {
            "_id_" : 19499760
        },
        "avgObjSize" : 119.99985333357778,
        "nindexes" : 1,
        "nchunks" : 6,
        "shards" : {
            "shard0000" : {
                "ns" : "test_sharding.test",
                "count" : 319678,
                "size" : 38361360,
                "avgObjSize" : 120,
                "storageSize" : 58441728,
                "numExtents" : 9,
                "nindexes" : 1,
                "lastExtentSize" : 20643840,
                "paddingFactor" : 1,
                "systemFlags" : 1,
                "userFlags" : 0,
                "totalIndexSize" : 10383520,
                "indexSizes" : {
                    "_id_" : 10383520
                },
                "ok" : 1
            },
            "shard0001" : {
                "ns" : "test_sharding.test",
                "count" : 275499,
                "size" : 33059880,
                "avgObjSize" : 120,
                "storageSize" : 58441728,
                "numExtents" : 9,
                "nindexes" : 1,
                "lastExtentSize" : 20643840,
                "paddingFactor" : 1,
                "systemFlags" : 1,
                "userFlags" : 0,
                "totalIndexSize" : 8952720,
                "indexSizes" : {
                    "_id_" : 8952720
                },
                "ok" : 1
            },
            "shard0002" : {
                "ns" : "test_sharding.test",
                "count" : 4824,
                "size" : 578792,
                "avgObjSize" : 119.98175787728026,
                "storageSize" : 696320,
                "numExtents" : 4,
                "nindexes" : 1,
                "lastExtentSize" : 524288,
                "paddingFactor" : 1,
                "systemFlags" : 1,
                "userFlags" : 0,
                "totalIndexSize" : 163520,
                "indexSizes" : {
                    "_id_" : 163520
                },
                "ok" : 1
            }
        },
        "ok" : 1
    }
    mongos> 

    用admin账户可以看新的sharding status

    mongos> db.printShardingStatus()
    --- Sharding Status --- 
      sharding version: { "_id" : 1, "version" : 3 }
      shards:
        {  "_id" : "shard0000",  "host" : "localhost:20000" }
        {  "_id" : "shard0001",  "host" : "localhost:30000" }
        {  "_id" : "shard0002",  "host" : "localhost:40000" }
      databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0000" }
        {  "_id" : "test_sharding",  "partitioned" : true,  "primary" : "shard0000" }
            test_sharding.test chunks:
                    shard0002    2
                    shard0000    2
                    shard0001    2
                { "_id" : { $minKey : 1 } } -->> { "_id" : ObjectId("532c3064cf7c7c2f53000001") } on : shard0002 Timestamp(6000, 0) 
                { "_id" : ObjectId("532c3064cf7c7c2f53000001") } -->> { "_id" : ObjectId("532c306acf7c7c2f530012d9") } on : shard0002 Timestamp(7000, 0) 
                { "_id" : ObjectId("532c306acf7c7c2f530012d9") } -->> { "_id" : ObjectId("532c310acf7c7c2f530290e0") } on : shard0000 Timestamp(6000, 1) 
                { "_id" : ObjectId("532c310acf7c7c2f530290e0") } -->> { "_id" : ObjectId("532c31a1cf7c7c2f5304f397") } on : shard0000 Timestamp(3000, 2) 
                { "_id" : ObjectId("532c31a1cf7c7c2f5304f397") } -->> { "_id" : ObjectId("532c3234cf7c7c2f53075090") } on : shard0001 Timestamp(7000, 1) 
                { "_id" : ObjectId("532c3234cf7c7c2f53075090") } -->> { "_id" : { $maxKey : 1 } } on : shard0001 Timestamp(4000, 3) 
    
    mongos> 

    可以看到,新增的shard没有很好的去平均所有的数据,而是只有很少的一部分数据飘移到了新的shard中,当然,这可能也是因为数据不足够多造成的。

  • 相关阅读:
    吴裕雄--天生自然python学习笔记:python 用 Tesseract 识别验证码
    吴裕雄--天生自然python学习笔记:python安装配置tesseract-ocr-setup-3.05.00dev.exe
    吴裕雄--天生自然python学习笔记:python 用 Open CV通过人脸识别进行登录
    吴裕雄--天生自然python学习笔记:python 用 Open CV抓取摄像头视频图像
    HDU4278 Faulty Odometerd
    最大流 总结
    HDU1411 欧拉四面体
    HDU3336 Count the string
    HDU1711
    HDU2203 亲和串
  • 原文地址:https://www.cnblogs.com/valleylord/p/3616072.html
Copyright © 2020-2023  润新知