Joins https://docs.oracle.com/database/121/TGSQL/tgsql_join.htm#TGSQL242
tidb/index_lookup_hash_join.go at master · pingcap/tidb https://github.com/pingcap/tidb/blob/master/executor/index_lookup_hash_join.go
Hash Join: Basic Steps
The optimizer uses the smaller data source to build a hash table on the join key in memory, and then scans the larger table to find the joined rows.
The basic steps are as follows:
-
The database performs a full scan of the smaller data set, called the build table, and then applies a hash function to the join key in each row to build a hash table in the PGA.
In pseudocode, the algorithm might look as follows:
FOR small_table_row IN (SELECT * FROM small_table) LOOP slot_number := HASH(small_table_row.join_key); INSERT_HASH_TABLE(slot_number,small_table_row); END LOOP;
-
The database probes the second data set, called the probe table, using whichever access mechanism has the lowest cost.
Typically, the database performs a full scan of both the smaller and larger data set. The algorithm in pseudocode might look as follows:
FOR large_table_row IN (SELECT * FROM large_table) LOOP slot_number := HASH(large_table_row.join_key); small_table_row = LOOKUP_HASH_TABLE(slot_number,large_table_row.join_key); IF small_table_row FOUND THEN output small_table_row + large_table_row; END IF; END LOOP;
For each row retrieved from the larger data set, the database does the following:
-
Applies the same hash function to the join column or columns to calculate the number of the relevant slot in the hash table.
For example, to probe the hash table for department ID
30
, the database applies the hash function to30
, which generates the hash value4
. -
Probes the hash table to determine whether rows exists in the slot.
If no rows exist, then the database processes the next row in the larger data set. If rows exist, then the database proceeds to the next step.
-
Checks the join column or columns for a match. If a match occurs, then the database either reports the rows or passes them to the next step in the plan, and then processes the next row in the larger data set.
If multiple rows exist in the hash table slot, the database walks through the linked list of rows, checking each one. For example, if department
30
hashes to slot4
, then the database checks each row until it finds30
.
-
Example 9-4 Hash Joins
An application queries the oe.orders
and oe.order_items
tables, joining on the order_id
column.
SELECT o.customer_id, l.unit_price * l.quantity FROM orders o, order_items l WHERE l.order_id = o.order_id;
The execution plan is as follows:
-------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 665 | 13300 | 8 (25)| |* 1 | HASH JOIN | | 665 | 13300 | 8 (25)| | 2 | TABLE ACCESS FULL | ORDERS | 105 | 840 | 4 (25)| | 3 | TABLE ACCESS FULL | ORDER_ITEMS | 665 | 7980 | 4 (25)| -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("L"."ORDER_ID"="O"."ORDER_ID")
Because the orders
table is small relative to the order_items
table, which is 6 times larger, the database hashes orders
. In a hash join, the data set for the build table always appears first in the list of operations (Step 2). In Step 3, the database performs a full scan of the larger order_items
later, probing the hash table for each row.
How Hash Joins Work When the Hash Table Does Not Fit in the PGA
The database must use a different technique when the hash table does not fit entirely in the PGA. In this case, the database uses a temporary space to hold portions (called partitions) of the hash table, and sometimes portions of the larger table that probes the hash table.
The basic process is as follows:
-
The database performs a full scan of the smaller data set, and then builds an array of hash buckets in both the PGA and on disk.
When the PGA hash area fills up, the database finds the largest partition within the hash table and writes it to temporary space on disk. The database stores any new row that belongs to this on-disk partition on disk, and all other rows in the PGA. Thus, part of the hash table is in memory and part of it on disk.
-
The database takes a first pass at reading the other data set.
For each row, the database does the following:
-
Applies the same hash function to the join column or columns to calculate the number of the relevant hash bucket.
-
Probes the hash table to determine whether rows exist in the bucket in memory.
If the hashed value points to a row in memory, then the database completes the join and returns the row. If the value points to a hash partition on disk, however, then the database stores this row in the temporary tablespace, using the same partitioning scheme used for the original data set.
-
-
The database reads each on-disk temporary partition one by one
-
The database joins each partition row to the row in the corresponding on-disk temporary partition.