【STL】关联容器 — hashtable

C++ 11哈希表已被列入标准列。hashtable这是hash_set、hash_map、hash_multiset、hash_multimap的底层机制。即这四种容器中都包括一个hashtable。

解决碰撞问题的办法有很多，线性探測、二次探測、开链等等。SGI STL的hashtable採用的开链方法，每一个hash table中的元素用vector承载，每一个元素称为桶（bucket），一个桶指向一个存储了实际元素的链表（list），链表节点（node）结构例如以下：

template <class Value>
struct __hashtable_node
{
  __hashtable_node* next;
  Value val;     // 存储实际值
};

再来看看hash table的迭代器定义：

template <class Value, class Key, class HashFcn,
          class ExtractKey, class EqualKey, class Alloc>
struct __hashtable_iterator {         // 迭代器
  typedef hashtable<Value, Key, HashFcn, ExtractKey, EqualKey, Alloc>
          hashtable;
  ....
 
  typedef __hashtable_node<Value> node;
 
  // 定义迭代器对应类型
  typedef forward_iterator_tag iterator_category;     // 前向迭代器
  typedef Value value_type;
  typedef ptrdiff_t difference_type;
  typedef size_t size_type;
  typedef Value& reference;
  typedef Value* pointer;
 
  node* cur;      // 迭代器眼下所指节点
  hashtable* ht;  // 和hashtable之间的纽带
 
  __hashtable_iterator(node* n, hashtable* tab) : cur(n), ht(tab) {}
  __hashtable_iterator() {}
  reference operator*() const { return cur->val; }
  pointer operator->() const { return &(operator*()); }
  iterator& operator++();
  iterator operator++(int);
  bool operator==(const iterator& it) const { return cur == it.cur; }
  bool operator!=(const iterator& it) const { return cur != it.cur; }
};

hash table的迭代器不能后退，这里关注迭代器的自增操作。代码例如以下：

template <class V, class K, class HF, class ExK, class EqK, class A>
__hashtable_iterator<V, K, HF, ExK, EqK, A>&
__hashtable_iterator<V, K, HF, ExK, EqK, A>::operator++() // 注意类模板成员函数的定义
{
  const node* old = cur;
  cur = cur->next;  // 移动到下一个node
  if (!cur) {       // 到了list结尾
    size_type bucket = ht->bkt_num(old->val);     // 依据节点值定位旧节点所在桶号
    while (!cur && ++bucket < ht->buckets.size()) // 计算下一个可用桶号
      cur = ht->buckets[bucket];    // 找到，另cur指向新桶的第一个node
  }
  return *this;
}

hashtable数据结构内容非常多，这里仅仅列出少量代码：

template <class Value, class Key, class HashFcn,
          class ExtractKey, class EqualKey,
          class Alloc>
class hashtable {   // hash table数据结构
public:
  typedef Key key_type;
  typedef Value value_type;
  typedef HashFcn hasher;          // 散列函数类型
  typedef EqualKey key_equal;
 
  typedef size_t            size_type;
  typedef ptrdiff_t         difference_type;
  ....
 
private:
  hasher hash;          // 散列函数
  key_equal equals;     // 推断键值是否相等
  ExtractKey get_key;   // 从节点取出键值
 
  typedef __hashtable_node<Value> node;
  typedef simple_alloc<node, Alloc> node_allocator; // 空间配置器
 
  vector<node*,Alloc> buckets;  // 桶的集合，能够看出一个桶实值上是一个node*
  size_type num_elements;      // node个数
  ....
}

SGI STL将hash table的大小，也就是vector的大小设计为28个质数，并存放在一个数组中：

static const int __stl_num_primes = 28; // 28个质数
static const unsigned long __stl_prime_list[__stl_num_primes] =
{
  53,         97,         193,       389,       769,
  1543,       3079,       6151,      12289,     24593,
  49157,      98317,      196613,    393241,    786433,
  1572869,    3145739,    6291469,   12582917,  25165843,
  50331653,   100663319,  201326611, 402653189, 805306457, 
  1610612741, 3221225473, 4294967291
};

当vector容量不足时，会以两倍的容量进行扩充。

以下介绍插入操作。以insert_unique为例：

// 插入新元素。键值不能反复
  pair<iterator, bool> insert_unique(const value_type& obj)
  {
    resize(num_elements + 1);           // 推断vector是否须要扩充
    return insert_unique_noresize(obj); // 直接插入obj
  }

insert操作大致分两步：第一步是扩充（假设须要的话），第二步是插入。

resize代码例如以下：

template <class V, class K, class HF, class Ex, class Eq, class A>
void hashtable<V, K, HF, Ex, Eq, A>::resize(size_type num_elements_hint)  // 推断是否须要扩充vector
{
  const size_type old_n = buckets.size();
  if (num_elements_hint > old_n)
  {  // 元素个数大于vector容量，则须要扩充vector
    const size_type n = next_size(num_elements_hint);
    if (n > old_n)
    {
      vector<node*, A> tmp(n, (node*) 0); // 建立一个暂时的vector作为转移目的地
      for (size_type bucket = 0; bucket < old_n; ++bucket)
      {  // 一个桶一个桶进行转移
        node* first = buckets[bucket];
        while (first)
        {   // 一个节点一个节点进行转移
            size_type new_bucket = bkt_num(first->val, n);  // 散列过程，对n取模
            buckets[bucket] = first->next;
            first->next = tmp[new_bucket];  // 这一句和下一句表示从链表前端插入
            tmp[new_bucket] = first;
            first = buckets[bucket];        // first指向旧vector的下一个node
        }
        buckets.swap(tmp);  // 两个vector的内容互换。使buckets彻底改变
      }
    }
  }
}

上述代码基本思路就是：先扩充，再移动。最后交换。

扩充利用next_size函数。next_size的作用就是从质数表中选取最接近而且不小于num_elements_hint的质数并返回，利用这个较大值开辟一个新vector。
移动实质上就是指针的移动。又一次对每一个节点进行散列，然后从前链入到新的vector中。
交换过程就是上面代码红色部分。这里使用了vector内部的swap成员函数，将*this和tmp的内容进行了互换。这是copy-and-swap技术。《Effective C++》条款11有说明这个技术。扩充完vector后，就能够顺利插入须要插入的元素了。

insert_unique_noresize代码例如以下：

template <class V, class K, class HF, class Ex, class Eq, class A>
pair<typename hashtable<V, K, HF, Ex, Eq, A>::iterator, bool>                 // 注意，返回一个pair
hashtable<V, K, HF, Ex, Eq, A>::insert_unique_noresize(const value_type& obj) // 直接插入节点，无需扩充
{
  const size_type n = bkt_num(obj); // 对obj进行散列，然后模上vector大小，从而确定桶号
  node* first = buckets[n];         // first指向相应桶的第一个node
 
  for (node* cur = first; cur; cur = cur->next) 
    if (equals(get_key(cur->val), get_key(obj)))  // 遇到同样node。则直接返回这个node
      return pair<iterator, bool>(iterator(cur, this), false);
 
  // 没有遇到同样node，则在list开头插入
  node* tmp = new_node(obj);
  tmp->next = first;
  buckets[n] = tmp;
  ++num_elements;
  return pair<iterator, bool>(iterator(tmp, this), true);
}

这里也是将新节点插入list的开头，具体过程已在凝视中说明。

參考：

《STL源代码剖析》 P253.

相关阅读:
winform制作自定义控件(入门)
VB2012读取xml
VB生成xml
通宵疯狂积累VB.NET基础知识
 【转】vs2010下创建webservice
React Native 快速入门之认识Props和State
mac下 mysql / nginx 问题总汇
 Oracle例外定义
 mac os x 把reids nignx mongodb做成随机启动吧
 Oracle 记录
原文地址：https://www.cnblogs.com/gcczhongduan/p/4752510.html