• Sqoop import from MySQL to Hive table: GC overhead limit exceeded


    1. Scenario description

    When I used Sqoop to import a MySQL table into Hive, I got the following error:

    19/07/12 15:41:35 WARN hcat.SqoopHCatUtilities: The Sqoop job can fail if types are not  assignment compatible
    19/07/12 15:41:35 WARN hcat.SqoopHCatUtilities: The HCatalog field submername has type string.  Expected = varchar based on database column type : VARCHAR
    19/07/12 15:41:35 WARN hcat.SqoopHCatUtilities: The Sqoop job can fail if types are not  assignment compatible
    19/07/12 15:41:35 INFO mapreduce.DataDrivenImportJob: Configuring mapper for HCatalog import job
    19/07/12 15:41:35 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    19/07/12 15:41:35 INFO client.RMProxy: Connecting to ResourceManager at hadoop-namenode01/192.168.1.101:8032
    19/07/12 15:41:35 WARN conf.HiveConf: HiveConf of name hive.server2.webui.host.port does not exist
    19/07/12 15:41:35 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
    19/07/12 15:42:52 INFO db.DBInputFormat: Using read commited transaction isolation
    19/07/12 15:42:52 INFO mapreduce.JobSubmitter: number of splits:1
    19/07/12 15:42:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1562229385371_50086
    19/07/12 15:42:53 INFO impl.YarnClientImpl: Submitted application application_1562229385371_50086
    19/07/12 15:42:53 INFO mapreduce.Job: The url to track the job: http://hadoop-namenode01:8088/proxy/application_1562229385371_50086/
    19/07/12 15:42:53 INFO mapreduce.Job: Running job: job_1562229385371_50086
    19/07/12 15:43:38 INFO hive.metastore: Closed a connection to metastore, current connections: 1
    19/07/12 15:43:48 INFO mapreduce.Job: Job job_1562229385371_50086 running in uber mode : false
    19/07/12 15:43:48 INFO mapreduce.Job:  map 0% reduce 0%
    19/07/12 15:48:31 INFO mapreduce.Job: Task Id : attempt_1562229385371_50086_m_000000_0, Status : FAILED
    Error: GC overhead limit exceeded

    Why does the Sqoop import throw this exception?
    By default the MySQL JDBC driver (not Sqoop itself) fetches the entire result set in one shot and tries to hold it all in memory. For a large table this exhausts the map task's heap and the job fails with the GC error. To avoid it, tell the driver to return the data in batches: appending "?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true" to the JDBC connection string makes the driver fetch 10000 rows per batch (see Solution 1 below).

    The script I use for the import is as follows:

    file: sqoop_order_detail.sh

    #!/bin/bash

    /home/lenmom/sqoop-1.4.7/bin/sqoop import \
    --connect jdbc:mysql://lenmom-mysql:3306/inventory \
    --username root --password root \
    --driver com.mysql.jdbc.Driver \
    --table order_detail \
    --hcatalog-database orc \
    --hcatalog-table order_detail \
    --hcatalog-partition-keys pt_log_d \
    --hcatalog-partition-values 20190709 \
    --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
    -m 1

    The target MySQL table has 10 billion records.

    2. Solution

    2.1 Solution 1

    Modify the MySQL JDBC URL to make the driver stream data in batches by appending the following parameters:

    ?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true

    The defaultFetchSize value can be adjusted to suit your situation. In my case, the whole script is:

    #!/bin/bash

    /home/lenmom/sqoop-1.4.7/bin/sqoop import \
    --connect jdbc:mysql://lenmom-mysql:3306/inventory?dontTrackOpenResources=true\&defaultFetchSize=10000\&useCursorFetch=true\&useUnicode=yes\&characterEncoding=utf8 \
    --username root --password root \
    --driver com.mysql.jdbc.Driver \
    --table order_detail \
    --hcatalog-database orc \
    --hcatalog-table order_detail \
    --hcatalog-partition-keys pt_log_d \
    --hcatalog-partition-values 20190709 \
    --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
    -m 1

    Note that each & in the URL must be escaped (\&) in the shell script; alternatively, wrap the whole JDBC URL in double quotes instead of escaping:

    #!/bin/bash

    /home/lenmom/sqoop-1.4.7/bin/sqoop import \
    --connect "jdbc:mysql://lenmom-mysql:3306/inventory?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true&useUnicode=yes&characterEncoding=utf8" \
    --username root --password root \
    --driver com.mysql.jdbc.Driver \
    --table order_detail \
    --hcatalog-database orc \
    --hcatalog-table order_detail \
    --hcatalog-partition-keys pt_log_d \
    --hcatalog-partition-values 20190709 \
    --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
    -m 1

    2.2 Solution 2

    sqoop import -Dmapreduce.map.memory.mb=6000 -Dmapreduce.map.java.opts=-Xmx1600m -Dmapreduce.task.io.sort.mb=4800

    The above parameters need to be tuned according to the data volume for a successful Sqoop pull; a combined example is sketched below.
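
    For reference, here is a minimal sketch of how these options could be combined with the same import used above. The memory values are assumptions to be tuned for your cluster (keep -Xmx below mapreduce.map.memory.mb), and the generic -D options must come before the Sqoop-specific arguments:

    #!/bin/bash

    # Sketch only: same import as before, with a larger map container and heap.
    # The values (6000 MB container, 4800 MB heap) are assumptions, not
    # recommendations; tune them to your cluster's capacity.
    /home/lenmom/sqoop-1.4.7/bin/sqoop import \
    -Dmapreduce.map.memory.mb=6000 \
    -Dmapreduce.map.java.opts=-Xmx4800m \
    --connect "jdbc:mysql://lenmom-mysql:3306/inventory?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true&useUnicode=yes&characterEncoding=utf8" \
    --username root --password root \
    --driver com.mysql.jdbc.Driver \
    --table order_detail \
    --hcatalog-database orc \
    --hcatalog-table order_detail \
    --hcatalog-partition-keys pt_log_d \
    --hcatalog-partition-values 20190709 \
    --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
    -m 1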

    2.3 Solution 3

    Increase the number of mappers (the default is 4; it should not be greater than the number of DataNodes). For a saved Sqoop job, the mapper count can be overridden at execution time:

    sqoop job --exec lenmom-job -- --num-mappers 8;
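
    If you run the import directly instead of through a saved job, the same approach is sketched below. A parallel import needs a split column: Sqoop uses the table's primary key by default, and the --split-by id shown here is an assumption that order_detail has a numeric id key; replace it with a column that suits your table.

    #!/bin/bash

    # Sketch only: parallel import with 8 mappers.
    # --split-by id assumes a numeric primary-key column named id exists;
    # pick an evenly distributed column for your own table.
    /home/lenmom/sqoop-1.4.7/bin/sqoop import \
    --connect "jdbc:mysql://lenmom-mysql:3306/inventory?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true&useUnicode=yes&characterEncoding=utf8" \
    --username root --password root \
    --driver com.mysql.jdbc.Driver \
    --table order_detail \
    --split-by id \
    --hcatalog-database orc \
    --hcatalog-table order_detail \
    --hcatalog-partition-keys pt_log_d \
    --hcatalog-partition-values 20190709 \
    --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
    --num-mappers 8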

    reference:

    https://stackoverflow.com/questions/26484873/cloudera-settings-sqoop-import-gives-java-heap-space-error-and-gc-overhead-limit

  • Original post: https://www.cnblogs.com/lenmom/p/11176756.html