在学习sqoop job之前,最好先学习一下sqoop命令的导入导出
sqoop 使用 import 将 mysql 中数据导入到 hive
sqoop 使用 import 将 mysql 中数据导入到 hdfs
sqoop 使用 export 将 hive 中数据导出到 mysql
sqoop job
sqoop job 可将一些参数配置以及命令语句保存起来,方便调用。
接下来实现一个从mysql导入到hive的任务
- mysql建表,表名为 sqoop_job
CREATE TABLE `sqoop_job` ( `id` int(11) DEFAULT NULL, `name` varchar(255) DEFAULT NULL, `jobname` varchar(255) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1
- 给sqoop_job插入测试数据
insert into sqoop_job values(1,"name1","jobname1"); insert into sqoop_job values(2,"name2","jobname2"); insert into sqoop_job values(3,"name3","jobname3");
- 将mysql表结构同步到hive
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/sqooptest --username root --password 123qwe --table sqoop_job
--hive-table sqoop_job --fields-terminated-by , - 创建一个导入任务的sqoop job
sqoop job --create sqoopimport1 -- import --connect jdbc:mysql://localhost:3306/sqooptest --username root -password 123qwe --table sqoop_job
--hive-import --hive-table sqoop_job --fields-terminated-by ',' -m 1创建成功后可使用命令查看当前job列表
sqoop job -list
sqoop还支持查看已创建任务的参数配置
使用命令 sqoop job --show jobname
EFdeMacBook-Pro:sbin FengZhen$ sqoop job --show sqoopimport1 Job: sqoopimport1 Tool: import Options: ---------------------------- verbose = false db.connect.string = jdbc:mysql://localhost:3306/sqooptest codegen.output.delimiters.escape = 0 codegen.output.delimiters.enclose.required = false codegen.input.delimiters.field = 0 hbase.create.table = false db.require.password = true hdfs.append.dir = false db.table = sqoop_job codegen.input.delimiters.escape = 0 import.fetch.size = null accumulo.create.table = false codegen.input.delimiters.enclose.required = false db.username = root reset.onemapper = false codegen.output.delimiters.record = 10 import.max.inline.lob.size = 16777216 hbase.bulk.load.enabled = false hcatalog.create.table = false db.clear.staging.table = false codegen.input.delimiters.record = 0 enable.compression = false hive.overwrite.table = false hive.import = true codegen.input.delimiters.enclose = 0 hive.table.name = sqoop_job accumulo.batch.size = 10240000 hive.drop.delims = false codegen.output.delimiters.enclose = 0 hdfs.delete-target.dir = false codegen.output.dir = . codegen.auto.compile.dir = true relaxed.isolation = false mapreduce.num.mappers = 1 accumulo.max.latency = 5000 import.direct.split.size = 0 codegen.output.delimiters.field = 44 export.new.update = UpdateOnly incremental.mode = None hdfs.file.format = TextFile codegen.compile.dir = /tmp/sqoop-FengZhen/compile/546e29b092f451585b5c8547b3e9985e direct.import = false hive.fail.table.exists = false db.batch = false
- 执行job
sqoop job --exec sqoopimport1
执行成功后可查看hive中表的数据
hive> select * from sqoop_job; OK 1 name1 jobname1 2 name2 jobname2 3 name3 jobname3 Time taken: 1.618 seconds, Fetched: 3 row(s)
Done.