HashMap 基本接口和功能分析

带着问题来阅读此文：
1)，HashMap内部存储格式是什么样式的？
2)，HashMap扩容策略是什么？
3)，HashMap增删改查是如何实现的？
4)，为什么使用HashMap，优劣势？

1,HashMap注释解读

注意：以下分析和代码，是基于Android aosp4.4 源码中的以下java类（openjdk?），可能跟java-jdk中的部分逻辑看起来不太一样
libcore/luni/src/main/java/java/util/HashMap.java

官方注释：

/**
 * HashMap is an implementation of {@link Map}. All optional operations are supported.
 *
 * <p>All elements are permitted as keys or values, including null.
 *
 * <p>Note that the iteration order for HashMap is non-deterministic. If you want
 * deterministic iteration, use {@link LinkedHashMap}.
 *
 * <p>Note: the implementation of {@code HashMap} is not synchronized.
 * If one thread of several threads accessing an instance modifies the map
 * structurally, access to the map needs to be synchronized. A structural
 * modification is an operation that adds or removes an entry. Changes in
 * the value of an entry are not structural changes.
 *
 * <p>The {@code Iterator} created by calling the {@code iterator} method
 * may throw a {@code ConcurrentModificationException} if the map is structurally
 * changed while an iterator is used to iterate over the elements. Only the
 * {@code remove} method that is provided by the iterator allows for removal of
 * elements during iteration. It is not possible to guarantee that this
 * mechanism works in all cases of unsynchronized concurrent modification. It
 * should only be used for debugging purposes.
 *
 * @param <K> the type of keys maintained by this map
 * @param <V> the type of mapped values
 */

翻译：
1，允许所有元素作为键或值，包括null。
2，请注意，HashMap的迭代顺序是不确定的。如果要进行确定性迭代，请使用{@link LinkedHashMap}。
解读：是无序的，iteration的最后一个，可能不是最后插入的那一个
3，非线程同步的
4，map.entrySet().iterator()可能导致ConcurrentModificationException，需要在使用map增删改查的地方，保证没有多线程调用，或加线程同步锁

2，HashMap构造函数和重要的常量

/**
 * Max capacity for a HashMap. Must be a power of two >= MINIMUM_CAPACITY.
 */
private static final int MAXIMUM_CAPACITY = 1 << 30;
private static final int MINIMUM_CAPACITY = 4;
private static final Entry[] EMPTY_TABLE
        = new HashMapEntry[MINIMUM_CAPACITY >>> 1];

private transient int threshold;
// 构造
public HashMap() {
    table = (HashMapEntry<K, V>[]) EMPTY_TABLE;
    threshold = -1; // Forces first put invocation to replace EMPTY_TABLE
}

// 静态内部类HashMapEntry，链表格式数据结构
static class HashMapEntry<K, V> implements Entry<K, V> {
    final K key;
    V value;
    final int hash;
    HashMapEntry<K, V> next;
}

3，增加和更新数据，HashMap.put方法是HashMap的核心

这里进行了HashMap的插入或替换原有value值，扩容的工作。

@Override public V put(K key, V value) {
    if (key == null) {
        // 专门为key为null的数据，准备了一个内部对象entryForNullKey
        // 可能是null的hash值没办法计算
        return putValueForNullKey(value);
    }

    // 获取key对应的hash int值, 通过secondaryHash计算的数值，不唯一
    int hash = secondaryHash(key);
    // HashMapEntry为链表格式数据结构, table为数组类型：HashMapEntry[]
    // 数组中每个item是HashMapEntry链表
    HashMapEntry<K, V>[] tab = table;
    // hash值与当前数组大小-1  取按位与，获取到的数据的值
    // 这个数值，会落在0  到 tab.length - 1的某一位上，应该是为了均匀分布到
    // 注意length大小为2的n次方，length-1时，即二进制形式高位变为0，其余位变为1
    // 这里计算的index的值，取的是hash的低length-1位的二进制数值
    // 注意这个地方的计算方式，在doubleCapacity()扩容逻辑里需要了解
    int index = hash & (tab.length - 1);
    // 获取table数组 index位置上的对象HashMapEntry， 遍历当前位置的链表深度
    // 查找是否有相同key的HashMapEntry，如果存在此对象，替换value值即可。
    // 疑问，为什么只从这一个index里查找，扩容后，应该落不到这个index里吧？
    // 等扩容后，会落在新的index里，待会儿需要确认下扩容是不是重新计算了index值
    for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
        // 判断到e.hash != hash 时，就不再判断equals，短路后面的判断逻辑，提高效率？
        // 两个不同的key，通过secondaryHash计算出来的数值一样，所以需要&&运算
        if (e.hash == hash && key.equals(e.key)) {
            // 找到了相同的key，替换value即可！插入数据完成。
            preModify(e);
            V oldValue = e.value;
            e.value = value;
            return oldValue;
        }
    }
    // 到这里，说明没有找到已存储的对象
    // No entry for (non-null) key is present; create one
    // 虽然不支持线程同步，但是这里还是尽可能用此方式来检查线程同步问题。
    modCount++;
    // size++即当前数据总数+1， 然后检查是否需要扩容
    // 为什么要扩容？可能是为了减少链表的深度，加快查询效率
    if (size++ > threshold) {
        // 扩容，下面进行详解，猜测是对threshold赋值，对table数组扩容，并且对之前的数据进行复制。
        tab = doubleCapacity();
        // 重新根据当前数组的length，计算下标index。
        // 注意，如果不重新计算index，用之前的size计算出的index会有问题！
        // 下次put相同的key时，找的index下标可能不一致，导致脏数据存在！
        index = hash & (tab.length - 1);
    }
    // 将数据加入数组下标index对应的链表中。
    addNewEntry(key, value, hash, index);
    // 没找到对应的数据，所以返回null.插入数据完成。
    return null;
}

addNewEntry方法j简介：

    // 初始化一个新的HashMapEntry对象
    // 并将此对象的next指向table[index]，然后将table[index]指向此对象
    void addNewEntry(K key, V value, int hash, int index) {
        table[index] = new HashMapEntry<K, V>(key, value, hash, table[index]);
    }

    //HashMaoENtry构造方法
    HashMapEntry(K key, V value, int hash, HashMapEntry<K, V> next) {
        this.key = key;
        this.value = value;
        this.hash = hash;
        // 此对象变成了链表的头，并将next指向链表之前的头
        this.next = next;
    }

secondaryHash数值计算的疑问：
1)，不同的key是否通过此方法计算后，得出相同的数值，根据链表的存储结构猜测是这样的
2)，同一个key值，在不同的size时，hash数值跟size-1进行逻辑与&运算后，如何保证落在同一个链表里？应该是扩容时，重新计算了index。

    // Doug Lea's supplemental secondaryHash function (non-inlined).
    // Replace with Collections.secondaryHash when the VM is fast enough (http://b/8290590).
    static int secondaryHash(Object key) {
        int hash = key.hashCode();
        hash ^= (hash >>> 20) ^ (hash >>> 12);
        hash ^= (hash >>> 7) ^ (hash >>> 4);
        return hash;
    }

扩容方法，做了以下工作：
1，重新申请更大的数组
2，赋值新的threshold临界值
3，将旧数据复制到新数组

    private HashMapEntry<K, V>[] doubleCapacity() {
        HashMapEntry<K, V>[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            // 如果超过了最大值，则不进行扩容，若有新的数据，则链表深度进行增加
            return oldTable;
        }
        int newCapacity = oldCapacity * 2;//两倍扩容
        // new 出来一个新的数组，并给threshold赋值
        // threshold是newCapacity*3/4，是指达到3/4容量就进行扩容吗？？
        HashMapEntry<K, V>[] newTable = makeTable(newCapacity);
        if (size == 0) {
            // 当前没有数据插入，返回新数组即可，当构造方法传入HashMap集合时，可能触发此逻辑
            return newTable;
        }
        // 下面应该是重新计算hash值对应的index，以及拷贝原来数据吧？？
        for (int j = 0; j < oldCapacity; j++) {
            /*
             * Rehash the bucket using the minimum number of field writes.
             * This is the most subtle and delicate code in the class.
             * 翻译：使用最少的字段写入次数重新哈希存储桶。
             * 这是类里最微妙的代码，确实是！第一二三遍看不懂～～～
             */
            HashMapEntry<K, V> e = oldTable[j];
            if (e == null) {
                continue;
            }
            // 这是啥！跟之前index = hash & (tab.length - 1)看起来有点像！难道是亲兄弟？？
            // tab.length=2的n次方，即100000……
            // index与highBit的差别：
            // index舍弃了第length高位1，取的剩余length-1位的值
            // highBit只取第length高位的值，剩余的舍弃。这哥俩真是两个极端！
            int highBit = e.hash & oldCapacity;
            HashMapEntry<K, V> broken = null;
            // 下面一行代码，应该就是数据拷贝的地方，这里的下标为什么这样计算j | highBit？
            // 跟前面put函数里计算下标的结果一样吗？
            // 如果不一样的话，会出现插入已存在的key值时，找不到正确index和链表的错误啊？
            // 进行以下方程式确认算法是否一致：
            // j=hash&(oldCapacity-1) ,
            // j | highBit=(hash & (oldCapacity-1)) | (e.hash & oldCapacity)
            // 注意oldCapacity为2的n次方，oldCapacity-1，在二进制时，是高位变为0，其余位变为1
            //            = (hash最高位为0，保留低位)) | (hash保留高位的数值，其余位数为0);
            //            = hash & oldCapacity二进制位数这样多的1;
            //            = hash &(oldCapacity*2-1);
            //            = hash & (newCapacity-1); 
            // 因此此处oldCapacity是2的n次方，所以此等式是成立的
            // 所以此处的运算跟put函数里计算下标的结果是一样的！
            // 此处费了这么大劲儿的好处是什么？比着后面的这个计算方式，少了-1的运算吗？？？
            // 没有明白！比后面的好到哪里去了？改天写个test试下效率！hash & (newCapacity -1)
            newTable[j | highBit] = e;
            // 以下逻辑未理清
            for (HashMapEntry<K, V> n = e.next; n != null; e = n, n = n.next) {
                int nextHighBit = n.hash & oldCapacity;
                if (nextHighBit != highBit) {
                    if (broken == null)
                        newTable[j | nextHighBit] = n;
                    else
                        broken.next = n;
                    broken = e;
                    highBit = nextHighBit;
                }
            }
            if (broken != null)
                broken.next = null;
        }
        // 至此扩容结果
        return newTable;
    }

4，删除逻辑，看完上面的逻辑，这里就比较简单了

    public V remove(Object key) {
        if (key == null) {
            return removeNullKey();
        }
        //计算secondaryHash
        int hash = secondaryHash(key);
        HashMapEntry<K, V>[] tab = table;
        //计算存储的下标位置
        int index = hash & (tab.length - 1);
        // 查找链表是否存在，若存在，进行删除操作。
        for (HashMapEntry<K, V> e = tab[index], prev = null;
             e != null; prev = e, e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                if (prev == null) {
                    tab[index] = e.next;
                } else {
                    prev.next = e.next;
                }
                modCount++;
                size--;
                postRemove(e);
                // 删除结束
                return e.value;
            }
        }
        // 没有找到此key对应的数据，删除失败
        return null;
    }

5，查，跟删除逻辑类似

    public V get(Object key) {
        if (key == null) {
            HashMapEntry<K, V> e = entryForNullKey;
            return e == null ? null : e.value;
        }

        // Doug Lea's supplemental secondaryHash function (inlined).
        // Replace with Collections.secondaryHash when the VM is fast enough (http://b/8290590).
        // 这里为什么不用secondaryHash方法，而是把方法里的逻辑摘出来写到这里？为了提高查询效率？
        int hash = key.hashCode();
        hash ^= (hash >>> 20) ^ (hash >>> 12);
        hash ^= (hash >>> 7) ^ (hash >>> 4);

        HashMapEntry<K, V>[] tab = table;
        for (HashMapEntry<K, V> e = tab[hash & (tab.length - 1)];
             e != null; e = e.next) {
            K eKey = e.key;
            if (eKey == key || (e.hash == hash && key.equals(eKey))) {
                return e.value;
            }
        }
        return null;
    }

读完以上逻辑，回答开篇的问题:
1)，HashMap内部存储格式是什么样式的？
答：HashMapEntry[]数组的形式，每个数组里的item-HashMapEntry，是链表的数据结构
2)，HashMap扩容策略是什么？
答：size达到当前数组大小的3/4时开始扩容，每次扩容2倍大小
3)，HashMap增删改查是如何实现的？
如以上代码示例。
4)，为什么使用HashMap，优劣势？
增：插入效率一般，比数组慢很多，比链表快
删：比数组快很多，跟链表一致
改：跟数组和链表都一致
查：在数据落到数组上的位置比较均匀的条件下，比链表快很多
若数据都落在同一个index时，hashmap退化为链表。
应该是比没有排序的数组效率也快很多。
总结后发现，HashMap是插入效率一般，但是查找效率极高的数据结构。

本文地址：https://blog.csdn.net/Railshiqian/article/details/107348836