1 Overview
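Hash join shares one property with merge join: both require an equality (equijoin) condition. When the join condition cannot use an index, or when large inputs are joined, nested loops join and merge join may not perform well. In those cases a hash join is worth considering.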
2 Basic Algorithm
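A hash join runs in two phases: build and probe. In the build phase, one input is chosen as the build set. The join column(s) of each of its rows are hashed, and the rows are stored in an in-memory hash table (the build hash table). In the probe phase, the other input (the probe set) is read row by row. The join column(s) of each row are hashed with the same function, the matching bucket of the build hash table is looked up, and every build row whose join key equals the probe row's key is returned together with that probe row.
Pseudocode: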
for each row R1 in the build table
    begin
        calculate hash value on R1 join key(s)
        insert R1 into the appropriate hash bucket
    end
for each row R2 in the probe table
    begin
        calculate hash value on R2 join key(s)
        for each row R1 in the corresponding hash bucket
            if R1 joins with R2
                return (R1, R2)
    end
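The pseudocode above can be sketched as a small in-memory Python function. This is a minimal illustration, not SQL Server's implementation: it assumes the whole build input fits in memory (no spilling to disk), and the names `hash_join`, `build_key`, `probe_key` and `num_buckets` are hypothetical. Rows are placed into buckets with Python's built-in `hash()`; because different keys can land in the same bucket, the probe loop still compares the actual join keys, which corresponds to the "if R1 joins with R2" step.

```python
from typing import Callable, Hashable, Iterable, Iterator, List, Tuple, TypeVar

R1 = TypeVar("R1")
R2 = TypeVar("R2")


def hash_join(
    build: Iterable[R1],
    probe: Iterable[R2],
    build_key: Callable[[R1], Hashable],
    probe_key: Callable[[R2], Hashable],
    num_buckets: int = 64,
) -> Iterator[Tuple[R1, R2]]:
    """Hypothetical helper: in-memory equijoin following the build/probe pseudocode.

    Assumes the entire build input fits in memory.
    """
    # Build phase: hash each build row's join key and append the row to its bucket.
    buckets: List[List[R1]] = [[] for _ in range(num_buckets)]
    for r1 in build:
        buckets[hash(build_key(r1)) % num_buckets].append(r1)

    # Probe phase: hash each probe row's join key with the same function,
    # then scan only the corresponding bucket.
    for r2 in probe:
        key = probe_key(r2)
        for r1 in buckets[hash(key) % num_buckets]:
            # Different keys can share a bucket, so compare the real join keys
            # before returning a match ("if R1 joins with R2").
            if build_key(r1) == key:
                yield (r1, r2)


t1 = [(0, "a"), (2, "b"), (4, "c")]
t2 = [(2, "x"), (4, "y"), (4, "z"), (6, "w")]
print(list(hash_join(t1, t2, lambda r: r[0], lambda r: r[0])))
# [((2, 'b'), (2, 'x')), ((4, 'c'), (4, 'y')), ((4, 'c'), (4, 'z'))]
```

Output rows follow the order of the probe input, since each probe row is emitted as soon as its bucket has been scanned.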
3 Example
Test data:
create table T1 (a int, b int, x char(200))
create table T2 (a int, b int, x char(200))
create table T3 (a int, b int, x char(200))

set nocount on
declare @i int

set @i = 0
while @i < 1000
begin
    insert T1 values (@i * 2, @i * 5, @i)
    set @i = @i + 1
end

set @i = 0
while @i < 10000
begin
    insert T2 values (@i * 3, @i * 7, @i)
    set @i = @i + 1
end

set @i = 0
while @i < 100000
begin
    insert T3 values (@i * 5, @i * 11, @i)
    set @i = @i + 1
end
Query executed:
SET STATISTICS PROFILE ON

select *
from (T1 inner join T2 on T1.a = T2.a)
    inner join T3 on T1.b = T3.a
option (hash join)
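As a cross-check on this example, the sketch below rebuilds the test data in Python and runs both joins of the query with a dict-based hash join. It is an illustration under stated assumptions: column `x` is dropped, the join order and the choice of build input (the smaller side each time) are fixed by hand rather than taken from the SQL Server plan, and the helper `hash_join` is hypothetical. Because T1.a is always even and T2.a is always a multiple of 3, the first join matches only on multiples of 6 up to 1998 (334 values), and each of those T1 rows finds exactly one T3 row on T1.b = T3.a, so the query should return 334 rows.

```python
from collections import defaultdict

# Rebuild the test data from the T-SQL loops as (a, b) tuples.
# Column x is omitted because the query never joins on it (assumption for brevity).
T1 = [(i * 2, i * 5) for i in range(1000)]
T2 = [(i * 3, i * 7) for i in range(10000)]
T3 = [(i * 5, i * 11) for i in range(100000)]


def hash_join(build, probe, build_key, probe_key):
    """Hypothetical helper: a Python dict plays the role of the build hash table."""
    table = defaultdict(list)
    for r in build:                                # build phase
        table[build_key(r)].append(r)
    for p in probe:                                # probe phase
        for r in table.get(probe_key(p), []):      # dict lookup checks key equality
            yield r + p                            # concatenate the matching rows


# T1 inner join T2 on T1.a = T2.a, building on the smaller input (T1), chosen by hand.
t1_t2 = list(hash_join(T1, T2, lambda r: r[0], lambda r: r[0]))
# ... inner join T3 on T1.b = T3.a; T1.b is column 1 of the combined row.
result = list(hash_join(t1_t2, T3, lambda r: r[1], lambda r: r[0]))

print(len(t1_t2), len(result))  # 334 334
```

Each row of `result` has the layout (T1.a, T1.b, T2.a, T2.b, T3.a, T3.b).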
Execution result: