• Hive: Running Simple Queries Without MapReduce


    Suppose you want to query a few columns of a table. By default, Hive launches a MapReduce job to do this, as shown below:

    hive> SELECT id, money FROM m limit 10;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Cannot run job locally: Input Size (= 235105473) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)
    Starting Job = job_1384246387966_0229, Tracking URL = http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0229/
    Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job -kill job_1384246387966_0229
    hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2013-11-13 11:35:16,167 Stage-1 map = 0%,  reduce = 0%
    2013-11-13 11:35:21,327 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.26 sec
    2013-11-13 11:35:22,377 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.26 sec
    MapReduce Total cumulative CPU time: 1 seconds 260 msec
    Ended Job = job_1384246387966_0229
    MapReduce Jobs Launched:
    Job 0: Map: 1   Cumulative CPU: 1.26sec   HDFS Read: 8388865 HDFS Write: 60 SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 260 msec
    OK
    1       122
    1       185
    1       231
    1       292
    1       316
    1       329
    1       355
    1       356
    1       362
    1       364
    Time taken: 16.802 seconds, Fetched: 10 row(s)

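      Launching a MapReduce job carries real overhead. In the run above, returning just 10 rows took 16.802 seconds. Starting with Hive 0.10.0, simple statements that need no aggregation, such as SELECT <col> FROM <table> LIMIT n, can skip the MapReduce job entirely and read the data directly through a Fetch task. This can be enabled in any of the following ways: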
      Method 1: set the property inside the Hive session:

    hive> set hive.fetch.task.conversion=more;
    hive> SELECT id, money FROM m limit 10;
    OK
    1       122
    1       185
    1       231
    1       292
    1       316
    1       329
    1       355
    1       356
    1       362
    1       364
    Time taken: 0.138 seconds, Fetched: 10 row(s)

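    The statement set hive.fetch.task.conversion=more; enables the Fetch task. As a result, the simple column query above no longer launches a MapReduce job and returns the same 10 rows in 0.138 seconds instead of 16.802 seconds.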
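    Why does this query qualify? The property's own description, reproduced in the hive-site.xml snippet under Method 3, lists the conditions. The following is a minimal Python sketch that models those documented rules as a plain function. It is only an illustration of the description text, not Hive's actual planner code. The QueryShape fields and the can_fetch name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryShape:
    """Hypothetical summary of a query's features; not a Hive API."""
    single_source: bool = True                 # reads exactly one table
    has_subquery: bool = False
    has_aggregation: bool = False              # GROUP BY, COUNT(), SUM(), ...
    has_distinct: bool = False
    has_lateral_view: bool = False
    has_join: bool = False
    select_star: bool = False                  # SELECT * rather than a column list
    filters_partition_cols_only: bool = True   # WHERE (if any) uses only partition columns
    uses_tablesample: bool = False
    uses_virtual_columns: bool = False


def can_fetch(q: QueryShape, conversion: str) -> bool:
    """Model of the rules in the hive.fetch.task.conversion description."""
    # Restrictions shared by both modes: single source, no subquery,
    # no aggregation/DISTINCT (these incur a reduce sink), no lateral view, no join.
    if (not q.single_source or q.has_subquery or q.has_aggregation
            or q.has_distinct or q.has_lateral_view or q.has_join):
        return False
    if conversion == "minimal":
        # SELECT *, filters on partition columns, LIMIT only.
        return (q.select_star and q.filters_partition_cols_only
                and not q.uses_tablesample and not q.uses_virtual_columns)
    if conversion == "more":
        # Any SELECT list, any filter, LIMIT, plus TABLESAMPLE and virtual columns.
        return True
    raise ValueError(f"unknown conversion mode: {conversion!r}")


# The article's query: SELECT id, money FROM m LIMIT 10 (a column list, not SELECT *)
article_query = QueryShape()
print(can_fetch(article_query, "minimal"))  # False
print(can_fetch(article_query, "more"))     # True
```

    Under minimal, only SELECT * (optionally filtered on partition columns) qualifies. Under more, the article's explicit column list qualifies too, which is why setting more removes the MapReduce job here.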


      Method 2: pass the setting on the command line when starting the Hive CLI:

    bin/hive --hiveconf hive.fetch.task.conversion=more

      Method 3:
    Both methods above enable the Fetch task only temporarily, for the current session. To enable it permanently, add the following property to ${HIVE_HOME}/conf/hive-site.xml:

    <property>
      <name>hive.fetch.task.conversion</name>
      <value>more</value>
      <description>
        Some select queries can be converted to single FETCH task
        minimizing latency. Currently the query should be single
        sourced not having any subquery and should not have
        any aggregations or distincts (which incurs RS),
        lateral views and joins.
        1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
        2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
      </description>
    </property>

    With this property in hive-site.xml, the Fetch task stays enabled permanently. Give it a try!
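    Methods 1 and 2 override the file for a single session, so it can help to check what hive-site.xml itself says. The following is a minimal Python sketch. It assumes the usual Hadoop-style layout: a <configuration> root containing <property> elements, each with <name> and <value>. read_hive_property is a hypothetical helper. It reports only the file's value and cannot see overrides applied with set or --hiveconf.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Inline sample mirroring the Method 3 snippet; in practice read the real file.
SAMPLE = """<configuration>
  <property>
    <name>hive.fetch.task.conversion</name>
    <value>more</value>
  </property>
</configuration>"""


def read_hive_property(xml_text: str, name: str) -> Optional[str]:
    """Hypothetical helper: return the <value> of property `name`, or None if absent.

    Only reflects the file contents; session overrides (set ..., --hiveconf)
    are not visible here.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


print(read_hive_property(SAMPLE, "hive.fetch.task.conversion"))  # prints: more
```

    To check an actual installation, pass the file contents instead. For example, read the path produced by os.path.expandvars("${HIVE_HOME}/conf/hive-site.xml") with pathlib.Path(...).read_text().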

  • Original article: https://www.cnblogs.com/mthoutai/p/6751585.html