• 使用DataX实现mysql数据迁移


    DataX 是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL、Oracle、SqlServer、Postgre、HDFS、Hive、ADS、HBase、TableStore(OTS)、MaxCompute(ODPS)、DRDS 等各种异构数据源之间高效的数据同步功能

    github地址 https://github.com/alibaba/DataX

    在github上找到 Quick Start 部分

    https://github.com/alibaba/DataX/blob/master/userGuid.md

    1.安装

      JDK 1.8.0_151

      Python 2.7.18

      下载后解压

    2.启动

      进入bin目录

    python datax.py {YOUR_JOB.json}

    3.简单输出

    (1)创建作业的配置文件(json格式)

      stream.json

    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "streamreader",
              "parameter": {
                "sliceRecordCount": 2,
                "column": [
                  {
                    "type": "long",
                    "value": "2"
                  },
                  {
                    "type": "string",
                    "value": "hello,world-DataX"
                  }
                ]
              }
            },
            "writer": {
              "name": "streamwriter",
              "parameter": {
                "encoding": "UTF-8",
                "print": true
              }
            }
          }
        ],
        "setting": {
          "speed": {
            "channel": 3
           }
        }
      }
    }

    (2)启动DataX

    python datax.py ./stream.json

    输出

    ……
    2021-02-04 16:58:42.891 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
    2    hello,world-DataX
    2    hello,world-DataX
    2021-02-04 16:58:42.893 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] attemptCount[1] is started
    2    hello,world-DataX
    2    hello,world-DataX
    2    hello,world-DataX
    2    hello,world-DataX
    2021-02-04 16:58:42.993 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[102]ms
    2021-02-04 16:58:42.993 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] is successed, used[105]ms
    2021-02-04 16:58:42.993 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] is successed, used[101]ms
    2021-02-04 16:58:42.994 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
    2021-02-04 16:58:52.887 [job-0] INFO  StandAloneJobContainerCommunicator - Total 6 records, 108 bytes | Speed 10B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
    2021-02-04 16:58:52.888 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
    2021-02-04 16:58:52.889 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
    2021-02-04 16:58:52.889 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
    2021-02-04 16:58:52.889 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
    2021-02-04 16:58:52.891 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: D:datadataxhook
    2021-02-04 16:58:52.893 [job-0] INFO  JobContainer - 
         [total cpu info] => 
            averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
            -1.00%                         | -1.00%                         | -1.00%
                            
    
         [total gc info] => 
             NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
             PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
             PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
    
    2021-02-04 16:58:52.894 [job-0] INFO  JobContainer - PerfTrace not enable!
    2021-02-04 16:58:52.894 [job-0] INFO  StandAloneJobContainerCommunicator - Total 6 records, 108 bytes | Speed 10B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
    2021-02-04 16:58:52.896 [job-0] INFO  JobContainer - 
    任务启动时刻                    : 2021-02-04 16:58:42
    任务结束时刻                    : 2021-02-04 16:58:52
    任务总计耗时                    :                 10s
    任务平均流量                    :               10B/s
    记录写入速度                    :              0rec/s
    读出记录总数                    :                   6
    读写失败总数                    :                   0

    4.获取mysql数据本地打印

    mysqltest.json

    {
        "job": {
            "setting": {
                "speed": {
                     "channel":1
                }
            },
            "content": [
                {
                    "reader": {
                        "name": "mysqlreader",
                        "parameter": {
                            "username": "root",
                            "password": "root",
                            "connection": [
                                {
                                    "querySql": [
                                        "select id,name from sys_user limit 3;"
                                    ],
                                    "jdbcUrl": [
                                        "jdbc:mysql://localhost:3306/dream"
                                    ]
                                }
                            ]
                        }
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {
                            "print": true,
                            "encoding": "UTF-8"
                        }
                    }
                }
            ]
        }
    }

    输出

    ……
    2021-02-04 17:00:24.237 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select id,name from sys_user limit 3;
    ] jdbcUrl:[jdbc:mysql://localhost:3306/dream?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
    33    abc
    1    admin
    25    caocao
    2021-02-04 17:00:24.291 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[102]ms
    2021-02-04 17:00:24.291 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
    2021-02-04 17:00:34.187 [job-0] INFO  StandAloneJobContainerCommunicator - Total 3 records, 19 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
    2021-02-04 17:00:34.188 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
    2021-02-04 17:00:34.189 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
    2021-02-04 17:00:34.189 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do post work.
    2021-02-04 17:00:34.189 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
    2021-02-04 17:00:34.191 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: D:datadataxhook
    2021-02-04 17:00:34.194 [job-0] INFO  JobContainer - 
         [total cpu info] => 
            averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
            -1.00%                         | -1.00%                         | -1.00%
                            
    
         [total gc info] => 
             NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
             PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
             PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
    
    2021-02-04 17:00:34.194 [job-0] INFO  JobContainer - PerfTrace not enable!
    2021-02-04 17:00:34.195 [job-0] INFO  StandAloneJobContainerCommunicator - Total 3 records, 19 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
    2021-02-04 17:00:34.196 [job-0] INFO  JobContainer - 
    任务启动时刻                    : 2021-02-04 17:00:23
    任务结束时刻                    : 2021-02-04 17:00:34
    任务总计耗时                    :                 10s
    任务平均流量                    :                1B/s
    记录写入速度                    :              0rec/s
    读出记录总数                    :                   3
    读写失败总数                    :                   0

    5.mysql 不同库表数据迁移

    migration.json

    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "mysqlreader",
                        "parameter": {
                            "column": ["id","name"],
                            "connection": [
                                {
                                  "jdbcUrl": [ "jdbc:mysql://127.0.0.1:3306/dream" ],
                                    "table": ["sys_user"]
                                }
                            ],
                            "password": "root",
                            "username": "root",
                            "where": "id>29"
                        }
                    },
                    "writer": {
                        "name": "mysqlwriter",
                        "parameter": {
                            "column": ["id","name"],
                            "connection": [
                              {
                                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/dream1",
                                "table": ["user"]
                              }
                            ],
                            "password": "root",
                            "username": "root",
                            "preSql": [],
                            "session": [],
                            "writeMode": "insert"
                        }
                    }
                }
            ],
            "setting": {
                "speed": {
                    "channel": "1"
                }
            }
        }
    }

    输出

    ……
    2021-02-04 17:10:17.061 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4 records, 30 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
    2021-02-04 17:10:17.061 [job-0] INFO  JobContainer - 
    任务启动时刻                    : 2021-02-04 17:10:06
    任务结束时刻                    : 2021-02-04 17:10:17
    任务总计耗时                    :                 10s
    任务平均流量                    :                3B/s
    记录写入速度                    :              0rec/s
    读出记录总数                    :                   4
    读写失败总数                    :                   0

    查看 dream1的 user数据已经插入

    其他参数可以查看github上mysqlreader和mysqlwriter下的doc

  • 相关阅读:
    尝试加载 Oracle 客户端库时引发 BadImageFormatException。如果在安装 32 位 Oracle 客户端组件的情况下以 64 位模式运行 已解决!
    iis 无法在Web服务器上启动调试。打开的URL的IIS辅助进程当前没有运行
    aspx页面,Page_Load 无人进入,解决
    Ajax后台传数组参数,接收不到报错!
    FusionCharts和highcharts 饼图区别!
    redis
    Hibernate不同数据库的连接及SQL方言
    Kafka
    Zookeeper
    BaseDao+万能方法 , HibernateDaoSupport
  • 原文地址:https://www.cnblogs.com/baby123/p/14372147.html
Copyright © 2020-2023  润新知