• Sqoop parameter configuration to prevent inconsistent data exports


    Source of the problem

    The official Sqoop documentation puts it this way:

    Since Sqoop breaks down export process into multiple transactions, it is possible that a failed export job may result in partial data being committed to the database.
    This can further lead to subsequent jobs failing due to insert collisions in some cases, or lead to duplicated data in others.
    You can overcome this problem by specifying a staging table via the --staging-table option which acts as an auxiliary table that is used to stage exported data.
    The staged data is finally moved to the destination table in a single transaction.

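    In short: a Sqoop export runs as multiple transactions, so a failed job can leave part of the data already committed to the target table.
    A rerun may then fail on insert collisions or write duplicate rows.
    With --staging-table, the exported data is first written to an auxiliary staging table and is moved to the destination table in a single transaction only at the end.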

    Solution

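    # The staging table must already exist and have the same structure as the target table.
    # --clear-staging-table empties it before the export starts.
    # --input-null-string '\\N' assumes the Hive data uses the default NULL marker \N.
    sqoop export \
      --connect jdbc:mysql://192.168.137.10:3306/user_behavior \
      --username root \
      --password 123456 \
      --table app_cource_study_report \
      --columns watch_video_cnt,complete_video_cnt,dt \
      --fields-terminated-by " " \
      --export-dir "/user/hive/warehouse/tmp.db/app_cource_study_analysis_${day}" \
      --staging-table app_cource_study_report_tmp \
      --clear-staging-table \
      --input-null-string '\\N'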
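
    Sqoop does not create the table named by --staging-table, so it has to exist before the export runs. A minimal sketch of one way to prepare it, assuming a MySQL target and the mysql command-line client; the helper name staging_ddl is hypothetical and not part of Sqoop:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the MySQL DDL that clones the target table's
# structure into a staging table (CREATE TABLE ... LIKE copies columns and indexes).
staging_ddl() {
  local target="$1" staging="$2"
  printf 'CREATE TABLE IF NOT EXISTS `%s` LIKE `%s`;' "$staging" "$target"
}

# Usage, assuming the same host, database, and user as in the export command.
# -p makes the mysql client prompt for the password.
# mysql -h 192.168.137.10 -P 3306 -u root -p user_behavior \
#   -e "$(staging_ddl app_cource_study_report app_cource_study_report_tmp)"
```

    Run it once before the first export; IF NOT EXISTS makes repeated runs harmless.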
  • Original article: https://www.cnblogs.com/yangxusun9/p/13022535.html