• Sqoop -- Free-form Query Imports: the role of the $CONDITIONS keyword


    Sqoop is a tool for transferring and converting data between the HDFS file system and relational databases such as MySQL.

    When importing from MySQL into HDFS, you can control which data is extracted with --table, --columns, and --where. Sqoop also supports extracting data with a free-form SQL statement (a free-form query); in that case you pass the SQL with --query. When you use --query:

    1. You must specify the target directory with --target-dir.

    2. You must include the $CONDITIONS keyword in the query.

    3. Optionally, you can split the extraction with --split-by (the result is partitioned into several output files, similar to MapReduce partitioning); a minimal template combining these options is sketched below.
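
    Putting the three requirements together, a minimal --query import looks roughly like this (a template sketch, not from the original post; replace the placeholder host, database, table, and column names with your own):

    sqoop import \
      --connect jdbc:mysql://<host>:3306/<database> \
      --username <user> -P \
      --target-dir /some/hdfs/dir \
      --delete-target-dir \
      --query 'select <columns> from <table> where <your filter> and $CONDITIONS' \
      --split-by <numeric column> \
      --m 2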

    The question we focus on here is: what does the $CONDITIONS keyword actually do?

    1. Written directly in the query, $CONDITIONS looks like an empty condition.

    2. But in the execution log we can see that it gets replaced with (1 = 0).

    sqoop import   --connect jdbc:mysql://server74:3306/Server74   --username root  --password 123456  --target-dir /sqoopout2  --m 1 --delete-target-dir \
    --query 'select id,name,deg from emp where id>1202 and $CONDITIONS'
    [root@server72 sqoop]# sqoop import   --connect jdbc:mysql://server74:3306/Server74   --username root  --password 123456  --target-dir /sqoopout2  
    --m 1 --delete-target-dir  --query 'select id,name,deg from emp where id>1202 and $CONDITIONS'
    Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    17/11/10 13:42:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    17/11/10 13:42:14 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    17/11/10 13:42:16 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    17/11/10 13:42:16 INFO tool.CodeGenTool: Beginning code generation
    17/11/10 13:42:18 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and  (1 = 0)
    17/11/10 13:42:18 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and  (1 = 0)
    17/11/10 13:42:18 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and  (1 = 0)
    17/11/10 13:42:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
    Note: /tmp/sqoop-root/compile/ac7745794cf5f0bf5859e7e8369a8c5f/QueryResult.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    17/11/10 13:42:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ac7745794cf5f0bf5859e7e8369a8c5f/QueryResult.jar
    17/11/10 13:42:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/11/10 13:42:41 INFO tool.ImportTool: Destination directory /sqoopout2 deleted.
    17/11/10 13:42:41 INFO mapreduce.ImportJobBase: Beginning query import.
    17/11/10 13:42:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    17/11/10 13:42:41 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    17/11/10 13:42:43 INFO client.RMProxy: Connecting to ResourceManager at server71/192.168.32.71:8032
    17/11/10 13:42:58 INFO db.DBInputFormat: Using read commited transaction isolation
    17/11/10 13:42:58 INFO mapreduce.JobSubmitter: number of splits:1
    17/11/10 13:43:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510279795921_0011
    17/11/10 13:43:03 INFO impl.YarnClientImpl: Submitted application application_1510279795921_0011
    17/11/10 13:43:04 INFO mapreduce.Job: The url to track the job: http://server71:8088/proxy/application_1510279795921_0011/
    17/11/10 13:43:04 INFO mapreduce.Job: Running job: job_1510279795921_0011
    17/11/10 13:44:01 INFO mapreduce.Job: Job job_1510279795921_0011 running in uber mode : false
    17/11/10 13:44:01 INFO mapreduce.Job:  map 0% reduce 0%
    17/11/10 13:44:58 INFO mapreduce.Job:  map 100% reduce 0%
    17/11/10 13:45:00 INFO mapreduce.Job: Job job_1510279795921_0011 completed successfully
    17/11/10 13:45:01 INFO mapreduce.Job: Counters: 30
        File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=124473
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=87
            HDFS: Number of bytes written=61
            HDFS: Number of read operations=4
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters
            Launched map tasks=1
            Other local map tasks=1
            Total time spent by all maps in occupied slots (ms)=45099
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=45099
            Total vcore-milliseconds taken by all map tasks=45099
            Total megabyte-milliseconds taken by all map tasks=46181376
        Map-Reduce Framework
            Map input records=3
            Map output records=3
            Input split bytes=87
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=370
            CPU time spent (ms)=6380
            Physical memory (bytes) snapshot=106733568
            Virtual memory (bytes) snapshot=842854400
            Total committed heap usage (bytes)=16982016
        File Input Format Counters
            Bytes Read=0
        File Output Format Counters
            Bytes Written=61
    17/11/10 13:45:01 INFO mapreduce.ImportJobBase: Transferred 61 bytes in 139.3429 seconds (0.4378 bytes/sec)
    17/11/10 13:45:01 INFO mapreduce.ImportJobBase: Retrieved 3 records.

    Checking the output, the rows with id greater than 1202 were extracted as expected:
    [root@server72 sqoop]# hdfs dfs -cat /sqoopout2/part-m-00000
    17/11/10 13:48:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    1203,khalil,php dev
    1204,prasanth,php dev
    1205,kranthi,admin

    From the run above we can tell one thing: $CONDITIONS is not expanded by the Linux shell (the single quotes prevent that); Sqoop itself substitutes a value for it at run time, and here it became (1 = 0), which makes the executed SQL look rather strange.
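
    As for why (1 = 0) specifically: those three "Executing SQL statement" lines appear right after "Beginning code generation", and a reasonable reading is that Sqoop only needs the result set's column names and types at that stage, so it substitutes an always-false condition to run the query without fetching any rows. The effect is roughly the same as running this by hand:

    -- Returns zero rows, but the statement is still parsed and executed,
    -- so the JDBC driver can report the column metadata (id, name, deg).
    select id, name, deg from emp where id > 1202 and (1 = 0);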

    Now let's look into what $CONDITIONS really is, starting with the official documentation.

    If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop. Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.

    In other words: when the import runs in parallel, each map task executes its own copy of the query, and Sqoop partitions the result by substituting a unique bounding condition for $CONDITIONS in each copy. That is why the query must contain the $CONDITIONS token, and why a splitting column must be chosen with --split-by.

    For example:

    $ sqoop import \
      --query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
      --split-by a.id --target-dir /user/foo/joinresults
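
    One practical aside (from the Sqoop user guide, not the original post): the single quotes are what stop the shell from expanding $CONDITIONS. If you wrap the query in double quotes instead, escape the dollar sign so that Sqoop, not the shell, gets to substitute it:

    $ sqoop import \
      --query "SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE \$CONDITIONS" \
      --split-by a.id --target-dir /user/foo/joinresults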

    This may be hard to grasp in the abstract, so let's change a few options and compare the job logs.

    sqoop import   --connect jdbc:mysql://server74:3306/Server74   --username root  --password 123456  --target-dir /sqoopout2 \
          --m 2 --delete-target-dir  --query 'select id,name,deg from emp where id>1202 and $CONDITIONS' \
          --split-by id

    As the documentation requires, I added --split-by id and set the number of map tasks to 2 with --m 2.

    [root@server72 sqoop]# sqoop import   --connect jdbc:mysql://server74:3306/Server74   --username root 
    --password 123456 --target-dir /sqoopout2 --m 2 --delete-target-dir --query 'select id,name,deg from emp where id>1202 and $CONDITIONS' --split-by id
    Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    17/11/10 13:50:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    17/11/10 13:50:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    17/11/10 13:50:28 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    17/11/10 13:50:28 INFO tool.CodeGenTool: Beginning code generation
    17/11/10 13:50:30 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and (1 = 0)
    17/11/10 13:50:31 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and (1 = 0)
    17/11/10 13:50:31 INFO manager.SqlManager: Executing SQL statement: select id,name,deg from emp where id>1202 and (1 = 0)
    17/11/10 13:50:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
    Note: /tmp/sqoop-root/compile/1024341fa58082466565e5bd648cb10e/QueryResult.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    17/11/10 13:50:43 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/1024341fa58082466565e5bd648cb10e/QueryResult.jar
    17/11/10 13:50:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/11/10 13:50:55 INFO tool.ImportTool: Destination directory /sqoopout2 deleted.
    17/11/10 13:50:55 INFO mapreduce.ImportJobBase: Beginning query import.
    17/11/10 13:50:55 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    17/11/10 13:50:55 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    17/11/10 13:50:56 INFO client.RMProxy: Connecting to ResourceManager at server71/192.168.32.71:8032
    17/11/10 13:51:12 INFO db.DBInputFormat: Using read commited transaction isolation
    17/11/10 13:51:12 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM (select id,name,deg from emp where id>1202 and (1 = 1) ) AS t1
    17/11/10 13:51:12 INFO mapreduce.JobSubmitter: number of splits:3
    17/11/10 13:51:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1510279795921_0012
    17/11/10 13:51:18 INFO impl.YarnClientImpl: Submitted application application_1510279795921_0012
    17/11/10 13:51:19 INFO mapreduce.Job: The url to track the job: http://server71:8088/proxy/application_1510279795921_0012/
    17/11/10 13:51:19 INFO mapreduce.Job: Running job: job_1510279795921_0012
    17/11/10 13:52:19 INFO mapreduce.Job: Job job_1510279795921_0012 running in uber mode : false
    17/11/10 13:52:19 INFO mapreduce.Job:  map 0% reduce 0%
    17/11/10 13:53:23 INFO mapreduce.Job:  map 33% reduce 0%
    17/11/10 13:54:19 INFO mapreduce.Job:  map 67% reduce 0%
    17/11/10 13:54:20 INFO mapreduce.Job:  map 100% reduce 0%
    17/11/10 13:54:24 INFO mapreduce.Job: Job job_1510279795921_0012 completed successfully
    17/11/10 13:54:25 INFO mapreduce.Job: Counters: 31
        File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=374526
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=301
            HDFS: Number of bytes written=61
            HDFS: Number of read operations=12
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=6
        Job Counters
            Killed map tasks=2
            Launched map tasks=5
            Other local map tasks=5
            Total time spent by all maps in occupied slots (ms)=349539
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=349539
            Total vcore-milliseconds taken by all map tasks=349539
            Total megabyte-milliseconds taken by all map tasks=357927936
        Map-Reduce Framework
            Map input records=3
            Map output records=3
            Input split bytes=301
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=3013
            CPU time spent (ms)=21550
            Physical memory (bytes) snapshot=321351680
            Virtual memory (bytes) snapshot=2528706560
            Total committed heap usage (bytes)=52994048
        File Input Format Counters
            Bytes Read=0
        File Output Format Counters
            Bytes Written=61
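
    The key difference from the first run shows up in two log lines: the BoundingValsQuery, where $CONDITIONS has been replaced with the always-true (1 = 1) so that Sqoop can compute MIN(id) and MAX(id) over the query, and "number of splits:3" (three splits, even though --m 2 was requested). Each map task then runs its own copy of the query with $CONDITIONS replaced by that task's bounding condition on the --split-by column, roughly of the form sketched below (the boundaries are placeholders; the real values depend on how Sqoop divides the MIN/MAX range):

    -- Illustrative only: one substituted query per split.
    select id,name,deg from emp where id>1202 and ( id >= <lo> ) AND ( id < <hi> )
    -- ...and the split covering the top of the range gets a closed upper bound:
    select id,name,deg from emp where id>1202 and ( id >= <lo> ) AND ( id <= <max> )

    The output is the same 3 records as before (61 bytes written to HDFS); the difference is only in how the work was divided among the map tasks.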
    
    