• Using mongo-java-driver to fully copy 100,000+ MongoDB documents from A to B: speeding up the migration, and the pitfalls I hit


    First, you should have a rough understanding of MongoDB's database, collection, and document.
    A MongoDB collection corresponds to a table in SQL, and a document corresponds to a row.

    Dependency you will need:

    <!-- MongoDB Java driver (not a JDBC driver) -->
    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongo-java-driver</artifactId>
        <version>3.4.3</version>
    </dependency>
    

    Next, you need to know how to use the API to read Documents.

    With these basics in place, let me describe the pitfalls I ran into.

    Paginated queries get slower and slower!

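    First, 100,000 documents clearly cannot be fetched in one go and stored in a List, or memory would blow up, so I decided to paginate. skip and limit are exactly what pagination needs, and the code looks like this: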

    private List<Document> page(MongoCollection<Document> collection, int count, int pageSize) {
        List<Document> result = new ArrayList<>();
        long beginTime = System.currentTimeMillis();
        FindIterable<Document> documents = collection.find().skip(count).limit(pageSize);
        try (MongoCursor<Document> cursor = documents.iterator()) {
            while (cursor.hasNext()) {
                result.add(cursor.next());
            }
        }
        long duration = System.currentTimeMillis()-beginTime;
        log.info("It takes {} ms to page from {} - {}", duration, count, count + result.size() - 1);
        return result;
    }
    

    The calling code is as follows:

    long total = collection.count();
    log.info("The collection {} contains {} documents.", collectionName, total);
    
    int count = 0; // number of documents already processed in this collection
    boolean hasMore = true;
    while (hasMore) {
        List<Document> documents = page(collection, count, 500);
    
        // ... process the returned Document list, e.g. insert it into the new database.
    
        hasMore = documents.size() == 500;
        count += documents.size();
    }
    
    However, this approach has a problem. Here is the log:
    
    The collection test_big_data contains 100002 documents.
    It takes 594 ms to page from 0 - 499
    It takes 554 ms to page from 500 - 999
    It takes 549 ms to page from 1000 - 1499
    It takes 565 ms to page from 1500 - 1999
    It takes 561 ms to page from 2000 - 2499
    It takes 583 ms to page from 2500 - 2999
    It takes 583 ms to page from 3000 - 3499
    It takes 596 ms to page from 3500 - 3999
    It takes 595 ms to page from 4000 - 4499
    It takes 615 ms to page from 4500 - 4999
    It takes 614 ms to page from 5000 - 5499
    It takes 632 ms to page from 5500 - 5999
    It takes 653 ms to page from 6000 - 6499
    It takes 653 ms to page from 6500 - 6999
    It takes 645 ms to page from 7000 - 7499
    It takes 669 ms to page from 7500 - 7999
    It takes 685 ms to page from 8000 - 8499
    It takes 671 ms to page from 8500 - 8999
    It takes 695 ms to page from 9000 - 9499
    It takes 706 ms to page from 9500 - 9999
    It takes 692 ms to page from 10000 - 10499
    It takes 719 ms to page from 10500 - 10999
    It takes 709 ms to page from 11000 - 11499
    It takes 722 ms to page from 11500 - 11999
    It takes 739 ms to page from 12000 - 12499
    It takes 749 ms to page from 12500 - 12999
    It takes 768 ms to page from 13000 - 13499
    It takes 755 ms to page from 13500 - 13999
    It takes 770 ms to page from 14000 - 14499
    It takes 795 ms to page from 14500 - 14999
    It takes 797 ms to page from 15000 - 15499
    It takes 836 ms to page from 15500 - 15999
    It takes 809 ms to page from 16000 - 16499
    It takes 831 ms to page from 16500 - 16999
    It takes 843 ms to page from 17000 - 17499
    It takes 875 ms to page from 17500 - 17999
    It takes 910 ms to page from 18000 - 18499
    It takes 872 ms to page from 18500 - 18999
    It takes 937 ms to page from 19000 - 19499
    It takes 898 ms to page from 19500 - 19999
    It takes 913 ms to page from 20000 - 20499
    It takes 926 ms to page from 20500 - 20999
    It takes 966 ms to page from 21000 - 21499
    It takes 970 ms to page from 21500 - 21999
    It takes 957 ms to page from 22000 - 22499
    It takes 989 ms to page from 22500 - 22999
    It takes 1009 ms to page from 23000 - 23499
    It takes 1011 ms to page from 23500 - 23999
    It takes 1031 ms to page from 24000 - 24499
    It takes 1038 ms to page from 24500 - 24999
    It takes 1066 ms to page from 25000 - 25499
    It takes 1068 ms to page from 25500 - 25999
    It takes 1085 ms to page from 26000 - 26499
    It takes 1123 ms to page from 26500 - 26999
    It takes 1111 ms to page from 27000 - 27499
    It takes 1109 ms to page from 27500 - 27999
    It takes 1159 ms to page from 28000 - 28499
    It takes 1134 ms to page from 28500 - 28999
    It takes 1144 ms to page from 29000 - 29499
    It takes 1152 ms to page from 29500 - 29999
    It takes 1165 ms to page from 30000 - 30499
    It takes 1179 ms to page from 30500 - 30999
    It takes 1216 ms to page from 31000 - 31499
    It takes 1247 ms to page from 31500 - 31999
    It takes 1230 ms to page from 32000 - 32499
    It takes 1250 ms to page from 32500 - 32999
    It takes 1283 ms to page from 33000 - 33499
    It takes 1264 ms to page from 33500 - 33999
    It takes 1301 ms to page from 34000 - 34499
    It takes 1251 ms to page from 34500 - 34999
    It takes 1297 ms to page from 35000 - 35499
    It takes 1316 ms to page from 35500 - 35999
    It takes 1327 ms to page from 36000 - 36499
    It takes 1348 ms to page from 36500 - 36999
    It takes 1359 ms to page from 37000 - 37499
    It takes 1343 ms to page from 37500 - 37999
    It takes 1363 ms to page from 38000 - 38499
    It takes 1402 ms to page from 38500 - 38999
    It takes 1351 ms to page from 39000 - 39499
    It takes 1410 ms to page from 39500 - 39999
    It takes 1407 ms to page from 40000 - 40499
    It takes 1400 ms to page from 40500 - 40999
    It takes 1426 ms to page from 41000 - 41499
    It takes 1405 ms to page from 41500 - 41999
    It takes 1443 ms to page from 42000 - 42499
    It takes 1474 ms to page from 42500 - 42999
    It takes 1459 ms to page from 43000 - 43499
    It takes 1446 ms to page from 43500 - 43999
    It takes 1519 ms to page from 44000 - 44499
    It takes 1537 ms to page from 44500 - 44999
    It takes 1579 ms to page from 45000 - 45499
    It takes 1506 ms to page from 45500 - 45999
    It takes 1563 ms to page from 46000 - 46499
    It takes 1572 ms to page from 46500 - 46999
    It takes 1602 ms to page from 47000 - 47499
    It takes 1623 ms to page from 47500 - 47999
    It takes 1639 ms to page from 48000 - 48499
    It takes 1633 ms to page from 48500 - 48999
    It takes 1613 ms to page from 49000 - 49499
    It takes 1661 ms to page from 49500 - 49999
    It takes 1641 ms to page from 50000 - 50499
    It takes 1677 ms to page from 50500 - 50999
    It takes 1635 ms to page from 51000 - 51499
    It takes 1729 ms to page from 51500 - 51999
    It takes 1741 ms to page from 52000 - 52499
    It takes 1700 ms to page from 52500 - 52999
    It takes 1747 ms to page from 53000 - 53499
    It takes 1703 ms to page from 53500 - 53999
    It takes 1736 ms to page from 54000 - 54499
    It takes 1725 ms to page from 54500 - 54999
    It takes 1766 ms to page from 55000 - 55499
    It takes 1849 ms to page from 55500 - 55999
    It takes 1837 ms to page from 56000 - 56499
    It takes 1836 ms to page from 56500 - 56999
    It takes 1817 ms to page from 57000 - 57499
    It takes 1845 ms to page from 57500 - 57999
    It takes 1870 ms to page from 58000 - 58499
    It takes 1857 ms to page from 58500 - 58999
    It takes 1920 ms to page from 59000 - 59499
    It takes 1884 ms to page from 59500 - 59999
    It takes 1874 ms to page from 60000 - 60499
    It takes 1876 ms to page from 60500 - 60999
    It takes 1895 ms to page from 61000 - 61499
    It takes 1958 ms to page from 61500 - 61999
    It takes 1917 ms to page from 62000 - 62499
    It takes 1914 ms to page from 62500 - 62999
    It takes 1890 ms to page from 63000 - 63499
    It takes 1943 ms to page from 63500 - 63999
    It takes 1956 ms to page from 64000 - 64499
    It takes 2021 ms to page from 64500 - 64999
    It takes 1984 ms to page from 65000 - 65499
    It takes 1972 ms to page from 65500 - 65999
    It takes 1992 ms to page from 66000 - 66499
    It takes 1959 ms to page from 66500 - 66999
    It takes 1997 ms to page from 67000 - 67499
    It takes 2084 ms to page from 67500 - 67999
    It takes 2148 ms to page from 68000 - 68499
    It takes 2159 ms to page from 68500 - 68999
    It takes 2185 ms to page from 69000 - 69499
    It takes 2171 ms to page from 69500 - 69999
    It takes 2053 ms to page from 70000 - 70499
    It takes 2109 ms to page from 70500 - 70999
    It takes 2380 ms to page from 71000 - 71499
    It takes 2126 ms to page from 71500 - 71999
    It takes 2183 ms to page from 72000 - 72499
    It takes 2186 ms to page from 72500 - 72999
    It takes 2215 ms to page from 73000 - 73499
    It takes 2160 ms to page from 73500 - 73999
    It takes 2259 ms to page from 74000 - 74499
    It takes 2178 ms to page from 74500 - 74999
    It takes 2231 ms to page from 75000 - 75499
    It takes 2273 ms to page from 75500 - 75999
    It takes 2259 ms to page from 76000 - 76499
    It takes 2323 ms to page from 76500 - 76999
    It takes 2293 ms to page from 77000 - 77499
    It takes 2302 ms to page from 77500 - 77999
    It takes 2274 ms to page from 78000 - 78499
    It takes 2379 ms to page from 78500 - 78999
    It takes 2358 ms to page from 79000 - 79499
    It takes 2384 ms to page from 79500 - 79999
    It takes 2290 ms to page from 80000 - 80499
    It takes 2324 ms to page from 80500 - 80999
    It takes 2416 ms to page from 81000 - 81499
    It takes 2650 ms to page from 81500 - 81999
    It takes 2545 ms to page from 82000 - 82499
    It takes 2468 ms to page from 82500 - 82999
    It takes 2388 ms to page from 83000 - 83499
    It takes 2468 ms to page from 83500 - 83999
    It takes 2565 ms to page from 84000 - 84499
    It takes 2492 ms to page from 84500 - 84999
    It takes 2554 ms to page from 85000 - 85499
    It takes 2520 ms to page from 85500 - 85999
    It takes 2523 ms to page from 86000 - 86499
    It takes 2585 ms to page from 86500 - 86999
    It takes 2540 ms to page from 87000 - 87499
    It takes 2555 ms to page from 87500 - 87999
    It takes 2592 ms to page from 88000 - 88499
    It takes 2585 ms to page from 88500 - 88999
    It takes 2647 ms to page from 89000 - 89499
    It takes 2536 ms to page from 89500 - 89999
    It takes 2519 ms to page from 90000 - 90499
    It takes 2582 ms to page from 90500 - 90999
    It takes 2519 ms to page from 91000 - 91499
    It takes 2567 ms to page from 91500 - 91999
    It takes 2582 ms to page from 92000 - 92499
    It takes 2568 ms to page from 92500 - 92999
    It takes 2734 ms to page from 93000 - 93499
    It takes 2736 ms to page from 93500 - 93999
    It takes 2648 ms to page from 94000 - 94499
    It takes 2850 ms to page from 94500 - 94999
    It takes 2664 ms to page from 95000 - 95499
    It takes 2714 ms to page from 95500 - 95999
    It takes 2653 ms to page from 96000 - 96499
    It takes 2696 ms to page from 96500 - 96999
    It takes 2768 ms to page from 97000 - 97499
    It takes 2755 ms to page from 97500 - 97999
    It takes 2776 ms to page from 98000 - 98499
    It takes 2767 ms to page from 98500 - 98999
    It takes 2888 ms to page from 99000 - 99499
    It takes 2814 ms to page from 99500 - 99999
    It takes 2366 ms to page from 100000 - 100001
    

    The log shows that each page takes longer than the one before it, instead of every page taking roughly the same time as I had expected.
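    The steady growth is consistent with how skip works: the MongoDB manual notes that skip requires the server to scan from the beginning of the result set before it starts returning results, so it gets slower as the offset grows. The back-of-the-envelope Java sketch below is a hypothetical SkipCostModel, not driver code; it assumes each query costs one unit per document walked past plus one per document returned, and counts that work for the paging loop above versus a single pass.

```java
// Simplified cost model (an assumption, not a measurement): a skip/limit query is
// charged one unit per document the server walks past plus one per document it
// returns. Network latency, indexes and caching are ignored.
final class SkipCostModel {
    private SkipCostModel() {}

    // Mirrors the article's paging loop: request pages until one comes back short.
    static long walkedWithSkipLimit(long total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long walked = 0;
        long count = 0;                  // documents returned so far = next skip value
        boolean hasMore = true;
        while (hasMore) {
            long returned = Math.max(0, Math.min(pageSize, total - count));
            walked += count + returned;  // skipped documents + returned documents
            hasMore = returned == pageSize;
            count += returned;
        }
        return walked;
    }

    // A single cursor over the whole collection walks each document once.
    static long walkedWithSingleCursor(long total) {
        return total;
    }

    public static void main(String[] args) {
        // prints "10150002 vs 100002"
        System.out.println(walkedWithSkipLimit(100_002, 500) + " vs " + walkedWithSingleCursor(100_002));
    }
}
```

    Under this model the per-page cost rises linearly with the offset, which is the trend visible in the log, so the total work for the paged scan grows quadratically: about 101 times the single pass for 100,002 documents in pages of 500.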

    Create only one cursor

    So I tried a different way of querying:

    long total = collection.count();
    log.info("The collection {} contains {} documents.", collectionName, total);
    /*
     * Retrieve all documents:
     * 1. Get the iterable FindIterable<Document>
     * 2. Get the cursor MongoCursor<Document>
     * 3. Walk through the retrieved documents with the cursor
     */
    int count = 0;
    FindIterable<Document> documents = collection.find();
    try (MongoCursor<Document> cursor = documents.iterator()) {
        List<Document> list = new ArrayList<>();
        long begin = System.currentTimeMillis();
        while (cursor.hasNext()) {
            list.add(cursor.next());
            count++;
            if (count % 100 == 0 || total - count == 0) {
                long duration = System.currentTimeMillis() - begin;
                log.info("It takes {} ms to current {}", duration, count);
                // Consume the data here!
                list.clear();
                begin = System.currentTimeMillis();
            }
        }
    }
    
    Log output (partial):
    
    The collection test_big_data contains 100002 documents.
    It takes 0 ms to current 100
    It takes 1545 ms to current 200
    It takes 0 ms to current 300
    It takes 0 ms to current 400
    It takes 0 ms to current 500
    It takes 0 ms to current 600
    It takes 0 ms to current 700
    It takes 0 ms to current 800
    It takes 0 ms to current 900
    It takes 0 ms to current 1000
    It takes 0 ms to current 1100
    It takes 0 ms to current 1200
    It takes 0 ms to current 1300
    It takes 0 ms to current 1400
    It takes 1497 ms to current 1500
    It takes 0 ms to current 1600
    It takes 0 ms to current 1700
    It takes 0 ms to current 1800
    It takes 0 ms to current 1900
    It takes 0 ms to current 2000
    It takes 0 ms to current 2100
    It takes 0 ms to current 2200
    It takes 0 ms to current 2300
    It takes 0 ms to current 2400
    It takes 0 ms to current 2500
    It takes 0 ms to current 2600
    It takes 0 ms to current 2700
    It takes 0 ms to current 2800
    It takes 1486 ms to current 2900
    It takes 0 ms to current 3000
    It takes 0 ms to current 3100
    It takes 0 ms to current 3200
    It takes 0 ms to current 3300
    It takes 0 ms to current 3400
    It takes 0 ms to current 3500
    It takes 0 ms to current 3600
    It takes 0 ms to current 3700
    It takes 0 ms to current 3800
    It takes 0 ms to current 3900
    It takes 0 ms to current 4000
    It takes 0 ms to current 4100
    It takes 0 ms to current 4200
    It takes 1488 ms to current 4300
    It takes 0 ms to current 4400
    It takes 0 ms to current 4500
    It takes 0 ms to current 4600
    It takes 0 ms to current 4700
    It takes 0 ms to current 4800
    It takes 0 ms to current 4900
    It takes 0 ms to current 5000
    It takes 0 ms to current 5100
    It takes 0 ms to current 5200
    It takes 0 ms to current 5300
    It takes 0 ms to current 5400
    It takes 0 ms to current 5500
    It takes 0 ms to current 5600
    It takes 1503 ms to current 5700
    It takes 0 ms to current 5800
    It takes 0 ms to current 5900
    It takes 0 ms to current 6000
    It takes 0 ms to current 6100
    It takes 0 ms to current 6200
    It takes 0 ms to current 6300
    It takes 0 ms to current 6400
    It takes 0 ms to current 6500
    It takes 0 ms to current 6600
    It takes 0 ms to current 6700
    It takes 0 ms to current 6800
    It takes 0 ms to current 6900
    It takes 0 ms to current 7000
    It takes 1475 ms to current 7100
    
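    The log shows that roughly every 1,300 to 1,400 Documents the cursor makes a network round trip to load the next batch into memory, while reading documents that are already in memory takes almost no time.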
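    This is consistent with MongoDB's documented batching: a find returns its first batch (101 documents by default) when the cursor is opened, and each later batch is fetched with a getMore only once the local buffer is empty. The toy model below does not use the driver and is not its implementation; SimulatedCursor and its batch sizes are hypothetical, chosen only to show why most next() calls cost nothing while an occasional call blocks on a round trip.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Toy stand-in for a server-side cursor (hypothetical, not the driver's code):
// the first batch arrives when the cursor is opened, and later batches are
// fetched only when the local buffer has been drained.
final class SimulatedCursor implements Iterator<Integer> {
    private final int total;       // documents in the simulated collection
    private final int batchSize;   // documents per follow-up fetch (the "getMore")
    private int fetched;           // documents transferred to the client so far
    private int consumed;          // documents handed to the caller so far
    private int roundTrips;        // simulated network requests

    SimulatedCursor(int total, int firstBatchSize, int batchSize) {
        if (total < 0 || firstBatchSize < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        this.total = total;
        this.batchSize = batchSize;
        fetch(firstBatchSize);     // opening the cursor returns the first batch
    }

    private void fetch(int n) {
        fetched = Math.min(total, fetched + n);
        roundTrips++;
    }

    @Override
    public boolean hasNext() {
        if (consumed < fetched) {
            return true;           // served from the local buffer, no I/O
        }
        if (fetched < total) {
            fetch(batchSize);      // buffer drained: one more round trip
            return true;
        }
        return false;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return consumed++;
    }

    int roundTrips() {
        return roundTrips;
    }
}
```

    With 100,002 simulated documents, a first batch of 101 and follow-up batches of 1,400, a full scan makes 73 round trips, and every next() in between is served from memory.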

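    Of course, the interval depends mainly on the size of your individual Documents and on the total buffer size, so your results will likely differ from mine.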
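    To make the dependence on document size concrete, here is a rough estimate. It assumes, following MongoDB's documentation on cursor batches, that a getMore batch has no default document count and is limited only by the 16 MiB message size. BatchEstimate and docsPerBatch are hypothetical names, and the article does not report its document sizes, so any inputs are made up.

```java
// Back-of-the-envelope estimate (assumption): a getMore batch is capped at about
// 16 MiB, so documents per batch is roughly the cap divided by the average size.
final class BatchEstimate {
    static final long BATCH_CAP_BYTES = 16L * 1024 * 1024; // assumed 16 MiB limit

    private BatchEstimate() {}

    static long docsPerBatch(long avgDocBytes) {
        if (avgDocBytes <= 0) {
            throw new IllegalArgumentException("avgDocBytes must be positive");
        }
        // A batch always carries at least one document.
        return Math.max(1, BATCH_CAP_BYTES / avgDocBytes);
    }
}
```

    For example, hypothetical 1 KiB documents would give about 16,384 documents per batch, and 12 KiB documents about 1,365. If you want smaller or more predictable batches, FindIterable also offers batchSize(int).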

    Conclusion

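    The takeaway: with mongo-java-driver, when you need to scan an entire collection, paging with many cursors (skip/limit) is less efficient than reading the whole collection through a single cursor.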
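    As a minimal sketch of applying this, the single-cursor loop can be factored into a helper that works on any java.util.Iterator (the driver's MongoCursor<Document> extends Iterator<Document>). Chunker and forEachChunk are hypothetical names, and the sketch uses only the standard library so it runs without a MongoDB server. Unlike the loop above, it flushes the last partial chunk after the loop instead of checking total - count == 0, so it does not depend on count() matching the number of documents the cursor actually returns. In a real migration, the sink is where each chunk would be written to collection B, for example with MongoCollection's insertMany.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of mongo-java-driver): drains a single iterator,
// e.g. a MongoCursor<Document>, and passes its elements to `sink` in chunks.
final class Chunker {
    private Chunker() {}

    // Returns the number of chunks handed to the sink.
    static <T> int forEachChunk(Iterator<T> it, int chunkSize, Consumer<List<T>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<T> buffer = new ArrayList<>(chunkSize);
        int chunks = 0;
        while (it.hasNext()) {
            buffer.add(it.next());
            if (buffer.size() == chunkSize) {
                sink.accept(new ArrayList<>(buffer)); // a copy, so the sink may keep it
                buffer.clear();
                chunks++;
            }
        }
        // Flush the tail unconditionally instead of comparing against count().
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer));
            chunks++;
        }
        return chunks;
    }
}
```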

  • Original article: https://www.cnblogs.com/kendoziyu/p/16258102.html
Copyright © 2020-2023  润新知