Pig 和 Hive 的表连接 Join

Pig：

1. 内连接：

join_table = join table1 by col1, join table2 by col2

2. 外连接：

左外连接：join_table = join table1 by col1 left outer, join table2 by col2 其中outer 可省略

右外连接：join_table = join table1 by col1 right outer, join table2 by col2 其中outer 可省略

全外连接：join_table = join table1 by col1 full outer, join table2 by col2

3. 自连接

不能直接进行： join table1 by col1, join table1 by col1

必须将数据加载两次，在进行内连接

4. 分片-复制连接 replicated join

join_table = join big_table by col1 , join small_table by col2 using 'replicated'

这样small_table 会加载到内存中复制到各个节点，再与big_table 连接。

replicated join 只支持内连接和左外连接

5. 倾斜数据连接 skew join

join_table = join table1 by col1, join table2 by col2 using 'skewed'

先对table2的键值进行抽样，判断键值的分布情况，然后给各个Reducer分配键值范围，可能同一个键值被分配到好几个reducer中

6. 排好顺序的数据连接 merge join

join_table = join table1 by col1, join table2 by col2 using 'merge'

7. 笛卡尔积

cross table1, table2

Hive：

1. 内连接，可连接多个，注意小表在前，大表在后

select * from table1 a join table2 b on a.col1=b.col2

2. 外连接

左：select * from table1 a left outer join table2 b on a.col1=b.col2

右：select * from table1 a right outer join table2 b on a.col1=b.col2

全：select * from table1 a full outer join table2 b on a.col1=b.col2

3. 左半开连接

select a.col1 , a.col2 , a.col3 from table1 a left semi join table2 b on a.col1=b.col2

实现与内连接相同功能，但是比内连接高效。

4.笛卡尔积

select * from table1 join table2

5.map-side join

设置了 hive.auto.convert.join=true, hive.mapjoin.smalltable.filesize=2500, hive.optimize.bucketmapJoin=true 等参数会自动开启

相关阅读:
美化的滚动条
网站系统开发参考网址
正则表达式获取URL参数
类实例及实例化对象对象引用
C# 静态方法静态属性调用静态方法
C# 静态方法调用非静态方法
winform 窗体间传值
从数据库中读出数据并输出
数据库链接字符串
DbHelper

原文地址：https://www.cnblogs.com/leeeee/p/7276133.html