• Sqoop修改sqoop元信息实现job的增量导入


    最简单方式是按主键增量导入:http://blog.csdn.net/ggz631047367/article/details/50185319

    以下方法只做存档

    需求:redis缓存的数据隔段时间往MySQL中写入一次。如果按照job的增量导入,比如上次redis向mysql导入数据时间为8:00,下一次导入时间为9:00,8:20sqoop进行增量导入,导入的时*~8:20的数据,sqoop把job中的最后导入时间设置为8:20。当9:00redis向mysql再次导入数据,9:20sqoop会将mysql中8:20~9:20数据导入到hive中,那么8:00~8:20的数据没有导入到hive中。
    1、批量创建sqoop job:

    #!/bin/bash
    set -e
    
    #create sqoop jobs
    ret=$(cat tables|wc -l)  #tables中存放所有表名
    i=1
    
    while [ $i -le $ret ]
    do
            row=$(sed -n "$i,0p" tables)
            sqoop job --create "incre_"$row -- import --connect  jdbc:mysql://10.9.1.19:3306/db  --table $row --username root --password root -m 1  --hive-import --incremental lastmodified --check-column dtTime  --last-value '2015-12-02 10:59:56.0'
    #此处last--value可以随意指定
            i=`expr $i + 1`
    done
    sqoop job --list

    2、批量删除job

    #!/bin/bash
    set -e
    
    #create sqoop jobs
    ret=$(cat tables|wc -l)
    i=1
    while [ $i -le $ret ]
    do
            row=$(sed -n "$i,0p" tables)
            sqoop job --delete "incre_"$row
            i=`expr $i + 1`
    done
    sqoop job --list

    3、查询最后导入到hive中数据的dtTime值

    自动增量导入,实现按照某个字段在hive中最后的值开始的导入,而不是按saved job中指定的值
    
    #获取所有tables文件中指定的表的最后dtTime值,并修改sqoop  job的元数据,最后执行job
    #!/bin/bash
    set -e
    #set -x
    
    source ~/.bash_profile
    SQOOP_METASTORE=/usr/local/sqoop-1.4.6/metastore/meta.db.script
    ret=$(cat tables|wc -l)
    i=1
    while [ $i -le $ret ]
    do
            row=$(sed -n "$i,0p" tables)
            JOB_NAME="incre_"$row
            lastedate=$(hive -e "select max(dtTime) from $row")
            echo "lastest time:"$lastedate>>/data/incre.log.d/$JOB_NAME.log
            #将时间中的.0替换为.1
            newdate=$(echo $lastedate|sed 's/.0/.1/g')
            echo "beginning time:"$newdate>>/data/incre.log.d/$JOB_NAME.log
            #替换sqoop的元数据
            #cat $SQOOP_METASTORE|grep "'$JOB_NAME','incremental.last.value'"|sed "s/20.*-.*.0/$ret/g"
            sed -i "s/'$JOB_NAME','incremental.last.value','.*','SqoopOptions'/'$JOB_NAME','incremental.last.value','$newdate','SqoopOptions'/g" $SQOOP_METASTORE
            cat $SQOOP_METASTORE|grep "'$JOB_NAME','incremental.last.value'">>/data/incre.log.d/$JOB_NAME.log
            sqoop job --exec $JOB_NAME
            i=`expr $i + 1`
    done
  • 相关阅读:
    Anagram
    HDU 1205 吃糖果(鸽巢原理)
    Codeforces 1243D 0-1 MST(补图的连通图数量)
    Codeforces 1243C Tile Painting(素数)
    Codeforces 1243B2 Character Swap (Hard Version)
    Codeforces 1243B1 Character Swap (Easy Version)
    Codeforces 1243A Maximum Square
    Codeforces 1272E Nearest Opposite Parity(BFS)
    Codeforces 1272D Remove One Element
    Codeforces 1272C Yet Another Broken Keyboard
  • 原文地址:https://www.cnblogs.com/ggzone/p/10121215.html
Copyright © 2020-2023  润新知