[Java Basics] How HashMap Works

HashMap

Hash table based implementation of the Map interface. This
implementation provides all of the optional map operations, and permits
null values and the null key. (The HashMap class is roughly equivalent
to Hashtable, except that it is unsynchronized and permits nulls.)
This class makes no guarantees as to the order of the map; in particular,
it does not guarantee that the order will remain constant over time.

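In short, the official documentation says that HashMap is an implementation of the Map interface, permits null keys and null values, is not synchronized, does not keep entries in insertion order, and does not guarantee that the order stays the same over time.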
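The null-handling part of that description is easy to check. The following is a minimal sketch; the class name `NullKeyDemo` and its `build` method are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    // Builds a small map that uses null both as a key and as a value.
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key"); // a single null key is allowed
        map.put("k", null);                  // null values are allowed too
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.get(null));        // value for null key
        System.out.println(map.containsKey("k")); // true
        System.out.println(map.size());           // 2
    }
}
```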

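Under the hood, HashMap is an array of buckets, and each bucket holds a linked list of entries (since Java 8, a long list can be converted into a red-black tree).

Constructors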

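HashMap provides four constructors:
- HashMap(int initialCapacity, float loadFactor): constructs an empty HashMap with the specified initial capacity and load factor.
- HashMap(int initialCapacity): constructs an empty HashMap with the specified initial capacity and the default load factor of 0.75.
- HashMap(): constructs an empty HashMap with the default initial capacity of 16 and the default load factor of 0.75.
- HashMap(Map<? extends K, ? extends V> m): constructs a new HashMap containing the same mappings as the given map, with the default load factor of 0.75 and an initial capacity large enough to hold them.

Initial capacity and load factor are the two parameters that most affect HashMap performance.
- Initial capacity: the number of buckets in the hash table at the time it is created.
- Load factor: a measure of how full the hash table is allowed to get before its capacity is automatically increased.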
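To make the interaction between the two parameters concrete, here is a rough sketch that computes the resize threshold as capacity * load factor and rounds a requested capacity up to the next power of two, which is what HashMap does with the value passed to its constructor. The class `CapacityMath` and its methods are hypothetical helpers written for this example, not JDK APIs.

```java
class CapacityMath {
    // Smallest power of two >= requested.
    // Assumption for this sketch: requested is between 1 and 2^30.
    static int roundUpToPowerOfTwo(int requested) {
        int highest = Integer.highestOneBit(requested);
        return (highest == requested) ? requested : highest << 1;
    }

    // Number of entries the table can hold before it is resized.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));    // 12
        System.out.println(roundUpToPowerOfTwo(17)); // 32
        System.out.println(roundUpToPowerOfTwo(16)); // 16
    }
}
```

With the defaults (capacity 16, load factor 0.75), the 13th distinct key is the one that pushes the size past the threshold.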

Implementation of put()

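put() roughly works as follows:
1. Hash the key's hashCode(), then compute the bucket index from the result.
2. If the bucket is empty (no collision), place the new node directly in it.
3. If there is a collision, append the node to the bucket's linked list.
4. If collisions make the list too long (length greater than or equal to TREEIFY_THRESHOLD, which is 8 by default), convert the list into a red-black tree.
5. If a node with the same key already exists, replace its old value (this keeps keys unique).
6. If the map is too full (the number of entries exceeds load factor * current capacity), resize.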

    public V put(K key, V value) {
        // hash the key's hashCode() via hash()
        return putVal(hash(key), key, value, false, true);
    }
    
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // create the table if it is null or empty
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // compute the index; if that bucket is empty, put a new node there
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // the first node has the same hash and an equal key
            if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // the bucket holds a tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // the bucket holds a linked list
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // treeify once the list length reaches TREEIFY_THRESHOLD
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // the key already exists: replace the value and return the old one
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // resize once size exceeds threshold (load factor * current capacity)
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
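From the caller's side, the two return paths of putVal show up as the return value of put(): null for a new key, the previous value for an existing one. A small sketch, where `PutDemo` and `putTwice` are names invented for this example:

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    // Puts the same key twice and returns what the second put() returned.
    static Integer putTwice(Map<String, Integer> map, String key, int first, int second) {
        map.put(key, first);         // new key: a node is added and put() returns null
        return map.put(key, second); // existing key: value replaced, old value returned
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(putTwice(map, "a", 1, 2)); // 1
        System.out.println(map.get("a"));             // 2
        System.out.println(map.size());               // 1
    }
}
```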

Implementation of get()

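get() roughly works as follows:
1. If the first node in the bucket matches, it is a direct hit.
2. Otherwise (there was a collision), find the matching entry via key.equals(k):
  in a tree, search the tree using key.equals(k);
  in a linked list, walk the list using key.equals(k).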

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
    
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // only search when the table is non-empty and the target bucket has a node; otherwise return null
        if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {
            // direct hit on the first node
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                // look up in the tree
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // look up in the linked list
                do {
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }
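One practical consequence of get() returning null when no node is found: a null result does not tell you whether the key is absent or mapped to null. A minimal sketch of telling the cases apart with containsKey(); `GetDemo` and `describe` are hypothetical names used only here.

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {
    // Describes the state of a key: "absent", "null value", or "present".
    static String describe(Map<String, String> map, String key) {
        if (!map.containsKey(key)) {
            return "absent";
        }
        return map.get(key) == null ? "null value" : "present";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", "1");
        map.put("b", null);
        System.out.println(describe(map, "a")); // present
        System.out.println(describe(map, "b")); // null value
        System.out.println(describe(map, "c")); // absent
    }
}
```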

Implementation of hash()

    static final int hash(Object key) {
        int h;
        // spread the key's hashCode: XOR the high 16 bits into the low 16 bits
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

This function leaves the high 16 bits unchanged and XORs the high 16 bits into the low 16 bits.
The bucket index is then computed like this:

    tab[i = (n - 1) & hash] // uses & instead of %, which works because n is a power of two
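The effect of the spreading step is easiest to see with two hash codes that differ only in their high 16 bits. The sketch below copies the two expressions into standalone helpers; `HashSpreadDemo`, `spread`, and `indexFor` are names chosen for this example.

```java
class HashSpreadDemo {
    // Same spreading step as HashMap.hash(), applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10000; // differs from b only in the high 16 bits
        int b = 0x20000;
        // Without spreading, both land in bucket 0 of a 16-slot table.
        System.out.println(indexFor(a, 16));         // 0
        System.out.println(indexFor(b, 16));         // 0
        // With spreading, the high bits influence the index.
        System.out.println(indexFor(spread(a), 16)); // 1
        System.out.println(indexFor(spread(b), 16)); // 2
    }
}
```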

Implementation of resize()

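When a put pushes the size past the threshold, the table is resized. Because the capacity always grows by a power of two (the length doubles), each element either stays at its original index or moves to its original index plus the old capacity.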
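The "stay or move by the old capacity" rule follows from the index mask gaining exactly one bit when the table doubles. A small sketch of that rule, with `ResizeSplitDemo` and `newIndex` as hypothetical names for this example:

```java
class ResizeSplitDemo {
    // Index of a hash after the table grows from oldCap to 2 * oldCap.
    // oldCap is assumed to be a power of two.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = (oldCap - 1) & hash;
        // The one extra bit covered by the new mask decides whether the node moves.
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // 5 and 21 share bucket 5 in a 16-slot table.
        System.out.println(newIndex(5, oldCap));  // 5  (stays)
        System.out.println(newIndex(21, oldCap)); // 21 (moves by oldCap)
    }
}
```

The result always matches computing `(2 * oldCap - 1) & hash` directly, so a resize only has to test one bit per node to split a bucket in two.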

Interview Questions
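1. What are the characteristics of HashMap?
  
  It implements the Map interface and stores key-value pairs. It accepts null keys and null values, it is not synchronized, and it stores its mappings as Entry objects.
2. Do you know how HashMap works?
  
  It is based on hashing, and objects are stored and retrieved through put and get. When we pass a key and a value to put, it takes the key's hashCode, computes a hash from it to locate the bucket, and stores the entry there. HashMap grows automatically depending on how full it is: once the number of entries exceeds load factor * current capacity, the capacity doubles. When we pass a key to get, it takes the key's hashCode, computes the hash to locate the bucket, and then calls equals() to find the matching key-value pair. When collisions occur, HashMap chains the colliding elements in a linked list. In Java 8, once the number of nodes in one bucket exceeds a limit (8 by default), the linked list is replaced with a red-black tree to speed up lookups.
3. Do you know how get and put work? What are equals() and hashCode() for?
  
  The key's hashCode() is hashed and the bucket index is computed as (n - 1) & hash. When there is a collision, key.equals() is used to find the matching key-value pair in the linked list or tree.
4. Do you know how hash is implemented? Why is it implemented this way?
```
     (h = key.hashCode()) ^ (h >>> 16)
```
  This ensures that both the high and low bits take part in the index computation even when the table length n is small, without adding much overhead.
5. What happens when the size of the HashMap exceeds the capacity defined by the load factor?
  
  Once the threshold is exceeded, the HashMap is resized to a table twice the original length and the existing entries are redistributed into the new buckets (Java 8 reuses each node's stored hash for this instead of recomputing it).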
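For question 3, a short sketch can show why a key class needs both methods: hashCode() picks the bucket and equals() picks the node inside it. `KeyContractDemo`, `Point`, and `BrokenPoint` are hypothetical types written for this example. No output is asserted for the broken lookup, since it depends on identity hash codes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class KeyContractDemo {
    // Key type that overrides both equals() and hashCode().
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    // Same equals(), but hashCode() is inherited from Object, so two equal
    // instances will normally hash to different values.
    static final class BrokenPoint {
        final int x, y;
        BrokenPoint(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof BrokenPoint)) return false;
            BrokenPoint p = (BrokenPoint) o;
            return x == p.x && y == p.y;
        }
    }

    public static void main(String[] args) {
        Map<Point, String> good = new HashMap<>();
        good.put(new Point(1, 2), "a");
        System.out.println(good.get(new Point(1, 2))); // a

        Map<BrokenPoint, String> bad = new HashMap<>();
        bad.put(new BrokenPoint(1, 2), "a");
        // Usually prints null: the lookup key hashes differently, so the stored
        // node is never compared with equals().
        System.out.println(bad.get(new BrokenPoint(1, 2)));
    }
}
```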

Original post: https://www.cnblogs.com/lebo0425/p/6504301.html