假设你想查询某个表的某一列。Hive默认是会启用MapReduce Job来完毕这个任务,例如以下:
01 |
hive>
SELECT id, money FROM m limit 10 ; |
02 |
Total
MapReduce jobs = 1 |
03 |
Launching
Job 1 out
of 1 |
04 |
Number
of reduce tasks is set to 0 since
there's no reduce operator |
05 |
Cannot
run job locally: Input Size (= 235105473 )
is larger than |
06 |
hive.exec.mode.local.auto.inputbytes.max
(= 134217728 ) |
07 |
Starting
Job = job_1384246387966_0229, Tracking URL = |
11 |
Kill
Command = /home/q/hadoop- 2.2 . 0 /bin/hadoop
job |
12 |
-kill
job_1384246387966_0229 |
13 |
hadoop
job information for Stage- 1 :
number of mappers: 1 ; |
15 |
2013 - 11 - 13 11 : 35 : 16 , 167 Stage- 1 map
= 0 %,
reduce = 0 % |
16 |
2013 - 11 - 13 11 : 35 : 21 , 327 Stage- 1 map
= 100 %,
reduce = 0 %, |
17 |
Cumulative
CPU 1.26 sec |
18 |
2013 - 11 - 13 11 : 35 : 22 , 377 Stage- 1 map
= 100 %,
reduce = 0 %, |
19 |
Cumulative
CPU 1.26 sec |
20 |
MapReduce
Total cumulative CPU time: 1 seconds 260 msec |
21 |
Ended
Job = job_1384246387966_0229 |
22 |
MapReduce
Jobs Launched: |
23 |
Job 0 :
Map: 1 Cumulative
CPU: 1.26 sec |
24 |
HDFS
Read: 8388865 HDFS
Write: 60 SUCCESS |
25 |
Total
MapReduce CPU Time Spent: 1 seconds 260 msec |
37 |
Time
taken: 16.802 seconds,
Fetched: 10 row(s) |
我们都知道,启用MapReduce Job是会消耗系统开销的。对于这个问题。从Hive0.10.0版本号開始,对于简单的不须要聚合的类似SELECT <col> from <table> LIMIT n语句,不须要起MapReduce job,直接通过Fetch task获取数据,能够通过以下几种方法实现:
方法一:
01 |
hive>
set hive.fetch.task.conversion=more; |
02 |
hive>
SELECT id, money FROM m limit 10 ; |
14 |
Time
taken: 0.138 seconds,
Fetched: 10 row(s) |
上面 set hive.fetch.task.conversion=more;开启了Fetch任务,所以对于上述简单的列查询不在启用MapReduce job!
方法二:
1 |
bin/hive
--hiveconf hive.fetch.task.conversion=more |
方法三:
上面的两种方法都能够开启了Fetch任务,可是都是暂时起作用的;假设你想一直启用这个功能。能够在${HIVE_HOME}/conf/hive-site.xml里面增加下面配置:
02 |
<name>hive.fetch.task.conversion</name> |
05 |
Some
select queries can be converted to single FETCH task |
06 |
minimizing
latency.Currently the query should be single |
07 |
sourced
not having any subquery and should not have |
08 |
any
aggregations or distincts (which incurrs RS), |
09 |
lateral
views and joins. |
10 |
1 .
minimal : SELECT STAR, FILTER on partition columns, LIMIT only |
11 |
2 .
more : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns) |
这样就能够长期启用Fetch任务了,非常不错吧。也赶紧去试试吧!