060 Hive Tuning (Hive itself, SQL, MapReduce)

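1. Optimizing Hive itself

　　-》Split large tables into smaller tables
　　　　-》Filter out the columns that are not needed
　　　　-》Store the data in separate tables, grouped by column

　　-》External tables and partitioned tables
　　　　-》External table: dropping it removes only the metadata, not the data files
　　　　　　　　　　Several users can define separate external tables over the same data files
　　　　-》Partitioned table: in Hive, databases, tables, and partitions are all directories
　　　　　　　　　　A filter on the partition column reads only the matching directories, which speeds up retrieval
　　　　　　-》Created manually (static partitions)
　　　　　　-》Dynamic partitions
　　　　-》External table + partitioned table (the two can be combined)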
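Because a partition is just a directory, partition pruning amounts to choosing which directories to read. The following is a minimal Python sketch of that idea, not Hive's own code. The warehouse root is assumed to be the usual default, and the database `sales_db`, table `orders`, and partition column `dt` are hypothetical names.

```python
from posixpath import join

WAREHOUSE = "/user/hive/warehouse"  # assumed warehouse root


def partition_dir(db, table, **partition):
    """Build the directory that holds one partition of a managed table."""
    parts = [f"{col}={val}" for col, val in partition.items()]
    return join(WAREHOUSE, f"{db}.db", table, *parts)


def prune(partition_dirs, column, value):
    """Keep only the directories a filter on the partition column must scan."""
    wanted = f"{column}={value}"
    return [d for d in partition_dirs if wanted in d.split("/")]


# Hypothetical database, table, and partition column.
all_dirs = [
    partition_dir("sales_db", "orders", dt=day)
    for day in ("2016-11-16", "2016-11-17", "2016-11-18")
]
scanned = prune(all_dirs, "dt", "2016-11-18")
print(scanned)
```

With three daily partitions, a filter on one day touches one directory instead of three.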

　　-》Data storage
　　　　-》Storage format: columnar storage

　　　　-》Compression

2. Optimizing SQL
　　　　-》Filter first, then join
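Why filtering first helps can be shown with a toy join. This Python sketch uses a naive nested-loop join over hypothetical `orders` and `users` rows; it only illustrates the row counts involved and is not how Hive executes a join.

```python
def inner_join(left, right, key):
    """Naive inner join; returns the joined rows and the number of row pairs compared."""
    rows, compared = [], 0
    for lrow in left:
        for rrow in right:
            compared += 1
            if lrow[key] == rrow[key]:
                rows.append({**lrow, **rrow})
    return rows, compared


# Hypothetical data: 20 orders spread over two days, 5 users.
orders = [
    {"user_id": i % 5, "amount": i, "dt": "2016-11-17" if i % 2 else "2016-11-18"}
    for i in range(20)
]
users = [{"user_id": i, "name": f"user{i}"} for i in range(5)]

# Join first, filter afterwards: every order takes part in the join.
joined, cost_join_first = inner_join(orders, users, "user_id")
late = [row for row in joined if row["dt"] == "2016-11-18"]

# Filter first, join afterwards: only the matching orders take part.
recent = [row for row in orders if row["dt"] == "2016-11-18"]
early, cost_filter_first = inner_join(recent, users, "user_id")

print(cost_join_first, cost_filter_first, early == late)
```

Both orders of operation return the same rows, but the filtered input halves the work in this example because half of the orders are discarded before the join.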

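3. Optimizing MapReduce

　　-》Parallel execution

　　　　When job1 and job2 do not depend on each other they can run at the same time; job3, which needs both results, runs afterwards
　　　　hive.exec.parallel=true
　　　　hive.exec.parallel.thread.number=8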
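The scheduling idea behind parallel execution can be sketched in Python. This is an illustration under the assumption that job3 depends on job1 and job2, which do not depend on each other; it is not Hive's scheduler. The `threads` parameter only mirrors the role of hive.exec.parallel.thread.number.

```python
from concurrent.futures import ThreadPoolExecutor


def waves(deps):
    """Group jobs into waves; every job in a wave has all its dependencies finished."""
    done, order = set(), []
    while len(done) < len(deps):
        ready = sorted(
            job for job, needs in deps.items()
            if job not in done and set(needs) <= done
        )
        if not ready:
            raise ValueError("dependency cycle")
        order.append(ready)
        done.update(ready)
    return order


def run(deps, threads=8):
    """Run each wave on a thread pool; independent jobs share a wave."""
    log = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for wave in waves(deps):
            # Stand-in for real work: each job just reports completion.
            log.append(list(pool.map(lambda job: f"{job} done", wave)))
    return log


# Assumed dependency graph for the three jobs.
deps = {"job1": [], "job2": [], "job3": ["job1", "job2"]}
plan = waves(deps)
log = run(deps)
print(plan)
```

Here job1 and job2 land in the first wave and job3 in the second, so the pipeline takes two rounds instead of three.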

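　　-》JVM reuse
　　　　mapreduce.job.jvm.numtasks=$number

　　　　Starting and shutting down a JVM for every task consumes a lot of resources, so one JVM is reused for up to $number tasks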
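A rough way to see the saving is to count JVM starts. The Python sketch below assumes the tasks run one after another in a single slot, and that a value of 1 means no reuse while -1 means unlimited reuse; the function name `jvm_launches` is made up for illustration.

```python
import math


def jvm_launches(tasks, numtasks):
    """Count JVM starts for `tasks` sequential tasks in one slot.

    `numtasks` plays the role of mapreduce.job.jvm.numtasks:
    1 means no reuse, -1 means a JVM is reused without limit.
    """
    if tasks == 0:
        return 0
    if numtasks == -1:
        return 1
    if numtasks < 1:
        raise ValueError("numtasks must be -1 or a positive integer")
    return math.ceil(tasks / numtasks)


print(jvm_launches(100, 1), jvm_launches(100, 10), jvm_launches(100, -1))
```

Under these assumptions, reusing each JVM for 10 tasks cuts 100 JVM starts down to 10.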

　　-》Speculative execution
　　　　mapreduce.map.speculative=true
　　　　mapreduce.reduce.speculative=true
　　　　hive.mapred.reduce.tasks.speculative.execution=true

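　　-》Number of map and reduce tasks
　　　　-》Number of maps: hard to set directly; it follows from the input splits
　　　　-》HDFS block size: dfs.blocksize=128M
　　　　　　Split size: bounded by minsize/maxsize
　　　　　　mapreduce.input.fileinputformat.split.minsize
　　　　　　mapreduce.input.fileinputformat.split.maxsize

　　　　　　-》Typical production scenarios
　　　　　　　　-》Few, large files (for example 100 files of 200M each): maps process the data block by block by default
　　　　　　　　-》Many small files (for example 400 files of 40M each): the number of maps is tuned through the split size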
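How block size and the min/max split sizes turn into a map count can be sketched in Python. The sketch models the split rule of the plain FileInputFormat, split size = max(minsize, min(maxsize, block size)), with its 10% slack on the last split, and assumes splits never span files. Hive may use a combining input format that packs small files together, which this sketch does not model.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than the split size


def split_size(block, min_size=1, max_size=2**63 - 1):
    """Split size = max(minsize, min(maxsize, block size))."""
    return max(min_size, min(max_size, block))


def splits_for_file(length, size):
    """Count the splits (and so the map tasks) produced for one file."""
    count, remaining = 0, length
    while remaining / size > SPLIT_SLOP:
        count += 1
        remaining -= size
    if remaining > 0:
        count += 1
    return count


default = split_size(128 * MB)
# The two scenarios above, read as 100 files of 200M and 400 files of 40M.
large_files = 100 * splits_for_file(200 * MB, default)
small_files = 400 * splits_for_file(40 * MB, default)
# Lowering maxsize below the block size yields more, smaller splits.
halved = splits_for_file(200 * MB, split_size(128 * MB, max_size=64 * MB))
print(large_files, small_files, halved)
```

Under these assumptions each 200M file yields two maps with the default settings, each 40M file yields one, and capping the split size at 64M raises a 200M file to four maps.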

　　-》Number of reduces
　　　　0.95 or 1.75 * number of nodes * maximum number of containers per node
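The rule of thumb is simple arithmetic. In this Python sketch the cluster size of 10 nodes with 8 containers each is made up for illustration, and the results are rounded down.

```python
def reducer_counts(nodes, containers_per_node):
    """Return the (0.95x, 1.75x) reducer suggestions for a cluster."""
    slots = nodes * containers_per_node
    # Integer arithmetic avoids floating-point rounding surprises.
    low = slots * 95 // 100    # all reduces can start in a single wave
    high = slots * 175 // 100  # faster nodes run a second wave
    return low, high


# Hypothetical cluster: 10 nodes, 8 containers per node.
low, high = reducer_counts(10, 8)
print(low, high)
```

The lower factor lets every reduce start at once; the higher factor trades extra task overhead for better load balancing.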

　　-》Local mode: run the whole job on the current node
　　　　<property>
　　　　　　<name>hive.exec.mode.local.auto</name>
　　　　　　<value>true</value>
　　　　　　<description> Let Hive determine whether to run in local mode automatically </description>
　　　　</property>
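　　　　Conditions:
　　　　　　1. The job's input data size must not exceed the configured limit
　　　　　　　　hive.exec.mode.local.auto.inputbytes.max=134217728 (128M by default)
　　　　　　2. The number of map tasks in the job
　　　　　　　　at most 4
　　　　　　3. The number of reduce tasks
　　　　　　　　at most 1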
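The three conditions combine into one check. This Python sketch follows the limits as listed above (128M of input, 4 map tasks, 1 reduce task); the function name `runs_locally` and its parameters are made up for illustration, and the exact boundary behaviour in Hive may differ.

```python
MB = 1024 * 1024


def runs_locally(input_bytes, map_tasks, reduce_tasks,
                 max_bytes=128 * MB, max_maps=4):
    """Return True when a job meets all three local-mode conditions."""
    return (
        input_bytes <= max_bytes   # condition 1: small input
        and map_tasks <= max_maps  # condition 2: few map tasks
        and reduce_tasks <= 1      # condition 3: at most one reduce task
    )


print(runs_locally(50 * MB, 2, 1))   # small job
print(runs_locally(500 * MB, 2, 1))  # input too large
```

A job that fails any one of the conditions is submitted to the cluster as usual.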


Original article: https://www.cnblogs.com/juncaoit/p/6077512.html