• Sqoop import into HDFS


    1. Note: when copying a command from Windows straight into Linux, check characters such as -- (they often get pasted as full-width dashes and must be fixed).

    sqoop-list-databases --connect jdbc:mysql://122.206.79.212:3306/ --username root -P 
    

      

    First take a look at what databases exist. Oddly, only the databases that show up in this query can be imported from.

    2. Import into HDFS

    sqoop import  --connect jdbc:mysql://122.206.79.212:3306/dating --username root --password 123456 --table t_rec_top --driver com.mysql.jdbc.Driver 
    

    This names the database, port, username, password, and table; the --driver option does not actually need to be added. Since no HDFS destination is specified, Sqoop writes to a default location.

    The log below shows that the job runs map tasks only, with no reduce phase:

    Warning: /home/hxsyl/Spark_Relvant/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /home/hxsyl/Spark_Relvant/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    17/03/15 11:05:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    17/03/15 11:05:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    17/03/15 11:05:12 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
    17/03/15 11:05:12 INFO manager.SqlManager: Using default fetchSize of 1000
    17/03/15 11:05:12 INFO tool.CodeGenTool: Beginning code generation
    17/03/15 11:05:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t_rec_top AS t WHERE 1=0
    17/03/15 11:05:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t_rec_top AS t WHERE 1=0
    17/03/15 11:05:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hxsyl/Spark_Relvant/hadoop-2.6.4/share/hadoop/mapreduce
    Note: /tmp/sqoop-hxsyl/compile/ddeeb02cdbd25cddc2662317b89c80f1/t_rec_top.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    17/03/15 11:05:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hxsyl/compile/ddeeb02cdbd25cddc2662317b89c80f1/t_rec_top.jar
    17/03/15 11:05:18 INFO mapreduce.ImportJobBase: Beginning import of t_rec_top
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hxsyl/Spark_Relvant/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hxsyl/Spark_Relvant/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    17/03/15 11:05:19 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    17/03/15 11:05:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM t_rec_top AS t WHERE 1=0
    17/03/15 11:05:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    17/03/15 11:05:21 INFO client.RMProxy: Connecting to ResourceManager at CentOSMaster/192.168.58.180:8032
    17/03/15 11:05:28 INFO db.DBInputFormat: Using read commited transaction isolation
    17/03/15 11:05:28 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM t_rec_top
    17/03/15 11:05:28 INFO mapreduce.JobSubmitter: number of splits:1
    17/03/15 11:05:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1489547007191_0001
    17/03/15 11:05:30 INFO impl.YarnClientImpl: Submitted application application_1489547007191_0001
    17/03/15 11:05:31 INFO mapreduce.Job: The url to track the job: http://CentOSMaster:8088/proxy/application_1489547007191_0001/
    17/03/15 11:05:31 INFO mapreduce.Job: Running job: job_1489547007191_0001
    17/03/15 11:05:48 INFO mapreduce.Job: Job job_1489547007191_0001 running in uber mode : false
    17/03/15 11:05:48 INFO mapreduce.Job:  map 0% reduce 0%
    17/03/15 11:06:06 INFO mapreduce.Job:  map 100% reduce 0%
    17/03/15 11:06:07 INFO mapreduce.Job: Job job_1489547007191_0001 completed successfully
    17/03/15 11:06:07 INFO mapreduce.Job: Counters: 30
    	File System Counters
    		FILE: Number of bytes read=0
    		FILE: Number of bytes written=127058
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    		HDFS: Number of bytes read=99
    		HDFS: Number of bytes written=21
    		HDFS: Number of read operations=4
    		HDFS: Number of large read operations=0
    		HDFS: Number of write operations=2
    	Job Counters 
    		Launched map tasks=1
    		Other local map tasks=1
    		Total time spent by all maps in occupied slots (ms)=13150
    		Total time spent by all reduces in occupied slots (ms)=0
    		Total time spent by all map tasks (ms)=13150
    		Total vcore-milliseconds taken by all map tasks=13150
    		Total megabyte-milliseconds taken by all map tasks=13465600
    	Map-Reduce Framework
    		Map input records=1
    		Map output records=1
    		Input split bytes=99
    		Spilled Records=0
    		Failed Shuffles=0
    		Merged Map outputs=0
    		GC time elapsed (ms)=183
    		CPU time spent (ms)=1200
    		Physical memory (bytes) snapshot=107761664
    		Virtual memory (bytes) snapshot=2069635072
    		Total committed heap usage (bytes)=30474240
    	File Input Format Counters 
    		Bytes Read=0
    	File Output Format Counters 
    		Bytes Written=21
    17/03/15 11:06:07 INFO mapreduce.ImportJobBase: Transferred 21 bytes in 46.7701 seconds (0.449 bytes/sec)
    17/03/15 11:06:07 INFO mapreduce.ImportJobBase: Retrieved 1 records.
    

      

    Sqoop creates a directory under /user/<username> on HDFS; the t_rec_top directory inside it holds our data, though without a header row. The file names ending in -m show that the job finished with the map tasks alone.

    wc00 holds the word count of a configuration file (apparently yarn-site.xml):

    "AS	1
    "License");	1
    ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.	1
    (the	1
    -->	3
    2.0	1
    <!--	3
    </configuration>	1
    </description>	1
    </property>	15
    <?xml	1
    <configuration>	1
    <description>Amount	1
    <description>List	1
    <description>Number	1
    <description>The	7
    <description>Where	1
    <description>Whether	1
    <description>fair-scheduler	1
    <description>the	1
    <name>yarn.log-aggregation-enable</name>	1
    <name>yarn.nodemanager.aux-services</name>	1
    <name>yarn.nodemanager.local-dirs</name>	1
    <name>yarn.nodemanager.remote-app-log-dir</name>	1
    <name>yarn.nodemanager.resource.cpu-vcores</name>	1
    <name>yarn.nodemanager.resource.memory-mb</name>	1
    <name>yarn.resourcemanager.address</name>	1
    <name>yarn.resourcemanager.admin.address</name>	1
    <name>yarn.resourcemanager.hostname</name>	1
    <name>yarn.resourcemanager.resource-tracker.address</name>	1
    <name>yarn.resourcemanager.scheduler.address</name>	1
    <name>yarn.resourcemanager.scheduler.class</name>	1
    <name>yarn.resourcemanager.webapp.address</name>	1
    <name>yarn.resourcemanager.webapp.https.address</name>	1
    <name>yarn.scheduler.fair.allocation.file</name>	1
    <property>	15
    <value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>	1
    <value>${yarn.resourcemanager.hostname}:8030</value>	1
    <value>${yarn.resourcemanager.hostname}:8031</value>	1
    <value>${yarn.resourcemanager.hostname}:8032</value>	1
    <value>${yarn.resourcemanager.hostname}:8033</value>	1
    <value>${yarn.resourcemanager.hostname}:8088</value>	1
    <value>${yarn.resourcemanager.hostname}:8090</value>	1
    <value>/home/hxsyl/Spark_Relvant/yarn/local</value>	1
    <value>/tmp/logs</value>	1
    <value>12</value>	1
    <value>30720</value>	1
    <value>CentOSMaster</value>	1
    <value>mapreduce_shuffle</value>	1
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>	1
    <value>true</value>	1
    ANY	1
    An	1
    Apache	1
    BASIS,	1
    CONDITIONS	1
    CPU	1
    Configs	1
    IS"	1
    Individual	1
    KIND,	1
    LICENSE	1
    License	3
    License,	1
    License.	2
    Licensed	1
    MB,	1
    Manager	1
    OF	1
    OR	1
    RM	3
    RM.</description>	2
    Resource	1
    See	2
    Site	1
    Unless	1
    Version	1
    WARRANTIES	1
    WITHOUT	1
    YARN	1
    You	1
    a	1
    a-zA-Z0-9_	1
    accompanying	1
    adddress	1
    address	4
    admin	1
    aggregate	1
    aggregation</description>	1
    agreed	1
    allocated	2
    an	1
    and	2
    applicable	1
    application's	1
    application.</description>	2
    applications	1
    as	1
    at	1
    be	4
    by	1
    called	1
    can	3
    class	1
    compliance	1
    conf	1
    configuration	1
    contain	1
    container_${contid},	1
    containers'	1
    containers.</description>	2
    copy	1
    cores	1
    directories	1
    directories,	1
    directory	1
    distributed	2
    either	1
    enable	1
    except	1
    express	1
    file	2
    file.	1
    files	1
    for	3
    found	1
    governing	1
    hostname	1
    http	1
    http://www.apache.org/licenses/LICENSE-2.0	1
    https	1
    implied.	1
    in	4
    in.	1
    in:	1
    interface	1
    interface.</description>	2
    is	1
    language	1
    law	1
    limitations	1
    localized	2
    location</description>	1
    log	1
    logs	1
    manager	1
    may	2
    memory,	1
    name	1
    not	2
    numbers</description>	1
    obtain	1
    of	11
    on	1
    only	1
    or	2
    permissions	1
    physical	1
    properties	1
    required	1
    resource	1
    scheduler	1
    scheduler.</description>	1
    service	1
    should	1
    software	1
    specific	2
    start	1
    store	1
    subdirectories	1
    that	2
    the	15
    this	1
    this.	1
    to	5
    to.</description>	1
    under	3
    use	2
    valid	1
    version="1.0"?>	1
    web	2
    will	2
    with	2
    work	1
    writing,	1
    you	1
    

      

    --target-dir /path sets the HDFS path to import into; -m (short for --num-mappers) sets the number of map tasks.

    Opening the file on HDFS shows that the default field delimiter is a comma. --fields-terminated-by ' ' overrides the delimiter Sqoop uses when writing the records into HDFS (for parsing already-delimited input there is the separate --input-fields-terminated-by option).

    --columns 'id,account,income' imports only the listed columns.

    --where "id > 2 and id < 9" imports only the rows that match the condition.

    To pull from multiple tables or run an arbitrary query, use --query "select * from t_detail where id > 5 and $CONDITIONS"; the $CONDITIONS placeholder is mandatory.

    If -m is greater than 1, Sqoop has to know how to divide the records among the mappers, so specify a split column with --split-by t_detail.id; Sqoop replaces $CONDITIONS in each mapper's copy of the query with a range predicate derived from that column, and each mapper reads its own slice of the data.

    Single quotes around the query are recommended; inside double quotes, $CONDITIONS must be escaped. Options starting with -- are the full-name forms, while - introduces the short forms.

    The biggest difference between double and single quotes is that double quotes still expand variables, while single quotes treat everything inside as plain characters with no special meaning. The example below illustrates this: suppose you define a variable name=VBird, and you now want to use it to define myname so that it displays "VBird its me". How do you do that?

    [root@linux ~]# name=VBird 
    [root@linux ~]# echo $name 
    VBird 
    [root@linux ~]# myname="$name its me" 
    [root@linux ~]# echo $myname 
    VBird its me 
    [root@linux ~]# myname='$name its me' 
    [root@linux ~]# echo $myname 
    $name its me 

    See? With single quotes, $name loses its variable content and is displayed as plain characters. Be especially careful about this!

• Original post: https://www.cnblogs.com/hxsyl/p/6553033.html