How Java Collection Classes Are Implemented

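1. The collections package

The collections package is among the most frequently used packages in Java. It consists mainly of implementations of two families of interfaces: Collection and Map.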

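For each Collection implementation, focus on the following points:

1) What data structure is it built on?

2) How do creation, insertion, removal, lookup, iteration, membership tests, and sorting work, and what are the strengths and weaknesses of each?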

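1.1. Collection

A Collection stores a group of individual objects (as opposed to key-value pairs). The two sub-interfaces covered here are List and Set.

1.1.1. List

A List allows duplicate elements. Its main implementations are ArrayList, LinkedList, Vector, and Stack.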

ArrayList

1) The constructor shows that ArrayList uses an array as its underlying data structure.

public ArrayList(int initialCapacity) {
    super();
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal Capacity: " +
                                           initialCapacity);
    this.elementData = new Object[initialCapacity];
}

When no capacity is given, the default initialCapacity in the JDK source quoted here is 10.

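2) Adding elements

Because the elements live in a fixed-size array, the array eventually fills up. When that happens, ArrayList grows it by allocating a larger array and copying the elements over.

public void ensureCapacity(int minCapacity) { // minCapacity: the minimum capacity needed right now
    modCount++;
    int oldCapacity = elementData.length; // current capacity of the backing array
    if (minCapacity > oldCapacity) {
        Object oldData[] = elementData;
        int newCapacity = (oldCapacity * 3)/2 + 1; // grow to roughly 1.5x the old capacity, plus 1
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity); // allocate the larger array and copy
    }
}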
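To make the growth rule concrete, here is a small sketch that replays the same arithmetic outside of ArrayList. The class and method names (GrowthSketch, nextCapacity, capacitiesUntil) are made up for illustration and are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthSketch {
    // Same arithmetic as the ensureCapacity listing: roughly 1.5x, plus 1.
    static int nextCapacity(int oldCapacity) {
        return (oldCapacity * 3) / 2 + 1;
    }

    // Capacities the backing array passes through until it can hold `elements` items.
    static List<Integer> capacitiesUntil(int initialCapacity, int elements) {
        List<Integer> steps = new ArrayList<>();
        int capacity = initialCapacity;
        steps.add(capacity);
        while (capacity < elements) {
            capacity = nextCapacity(capacity);
            steps.add(capacity);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(capacitiesUntil(10, 80)); // [10, 16, 25, 38, 58, 88]
    }
}
```

Starting from the default capacity of 10, storing 80 elements therefore costs five reallocations, each of which copies the whole array.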

Arrays.copyOf may be unfamiliar. Its implementation is:

public static <T,U> T[] copyOf(U[] original, int newLength, Class<? extends T[]> newType) {
    T[] copy = ((Object)newType == (Object)Object[].class)
        ? (T[]) new Object[newLength]
        : (T[]) Array.newInstance(newType.getComponentType(), newLength); // create a new array of the requested type (Object[] in ArrayList's case)
    System.arraycopy(original, 0, copy, 0,
                     Math.min(original.length, newLength)); // bulk-copy the existing elements
    return copy;
}

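There is a second way to add an element: add(int index, E element) inserts it at a given position.

public void add(int index, E element) {
    if (index > size || index < 0)
        throw new IndexOutOfBoundsException(
            "Index: "+index+", Size: "+size);
    ensureCapacity(size+1);  // Increments modCount!!
    System.arraycopy(elementData, index, elementData, index + 1,
                     size - index);
    elementData[index] = element;
    size++;
}

The index must lie between 0 and size, inclusive (index == size appends). Because ArrayList is array-backed, every element from the insertion point onward has to be shifted one slot to the right to make room for the new element, so this method performs one more array copy than add(E e).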
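The shifting step can be tried in isolation on a plain int array. This is only a sketch: InsertShiftSketch and insertAt are hypothetical names, and the caller is assumed to have left at least one free slot at the end of the array.

```java
import java.util.Arrays;

class InsertShiftSketch {
    // Inserts value at index into the first `size` slots of data.
    // Assumes data.length > size, i.e. capacity was already ensured.
    static void insertAt(int[] data, int size, int index, int value) {
        if (index > size || index < 0)
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        // Shift elements [index, size) one slot to the right.
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40, 0}; // 4 elements, capacity 5
        insertAt(data, 4, 1, 15);
        System.out.println(Arrays.toString(data)); // [10, 15, 20, 30, 40]
    }
}
```

The closer the index is to 0, the more elements System.arraycopy has to move, which is why inserting at the front of a large ArrayList is the worst case.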

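There is also a method for replacing an element: set(int index, E element) overwrites the object at the given position and returns the old one.

public E set(int index, E element) {
    RangeCheck(index);
    E oldValue = (E) elementData[index];
    elementData[index] = element;
    return oldValue;
}

Curiously, the range check tests only the upper bound, not the lower bound. A negative index still fails, because the array access that immediately follows throws ArrayIndexOutOfBoundsException.

private void RangeCheck(int index) {
    if (index >= size)
        throw new IndexOutOfBoundsException(
            "Index: "+index+", Size: "+size);
}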

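3) Removing elements

remove(int index) removes the element at a given position.

public E remove(int index) {
    RangeCheck(index); // validate the index
    modCount++;
    E oldValue = (E) elementData[index];
    int numMoved = size - index - 1; // number of elements that must shift left
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear the stale slot so the GC can reclaim the object
    return oldValue;
}

remove(int index) mirrors add(int index, E element): one copies part of the array to close a gap, the other to open one.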

remove(Object o) removes a given object.

public boolean remove(Object o) {
    if (o == null) {
        for (int index = 0; index < size; index++)
            if (elementData[index] == null) {
                fastRemove(index);
                return true;
            }
    } else {
        for (int index = 0; index < size; index++)
            if (o.equals(elementData[index])) {
                fastRemove(index);
                return true;
            }
    }
    return false;
}

Removing by object treats null as a special case. If the argument is null, the method scans the array comparing each element with == null; otherwise it scans using equals. In both cases it calls fastRemove on the first match and returns true, so only the first occurrence is removed.

private void fastRemove(int index) {
    modCount++;
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // Let gc do its work
}

fastRemove is a stripped-down remove(int index): it skips the range check and does not return the removed element.
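A quick way to observe the "first match only" behaviour and the null handling is to call remove(Object) on a real java.util.ArrayList. The wrapper class RemoveFirstMatchSketch and its removeOnce helper are hypothetical names used only for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveFirstMatchSketch {
    // Copies the source list and removes the target from the copy once.
    static List<String> removeOnce(List<String> source, String target) {
        List<String> copy = new ArrayList<>(source);
        copy.remove(target); // remove(Object): removes at most one element, the first equal one
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(removeOnce(Arrays.asList("a", "b", "a"), "a"));         // [b, a]
        System.out.println(removeOnce(Arrays.asList("x", null, "y", null), null)); // [x, y, null]
    }
}
```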

removeRange(int fromIndex, int toIndex) works along the same lines and is not covered here.

4) Getting a single element

get(int index) takes the position of the element in the array.

public E get(int index) {
    RangeCheck(index);
    return (E) elementData[index];
}

indexOf(Object o) returns the position of a given object, or -1 if it is not in the list.

public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

In fact, remove(Object o) could be rewritten in terms of indexOf:

public boolean remove(Object o) {
    int index = indexOf(o);
    if (index >= 0) {
        fastRemove(index);
        return true;
    } else
        return false;
}

With indexOf available, contains(Object o) is trivial.

public boolean contains(Object o) {
    return indexOf(o) >= 0;
}

5) Iteration

iterator() is implemented by AbstractList, ArrayList's parent class. Calling it creates an instance of the inner class Itr (class Itr implements Iterator<E>). The methods to focus on are hasNext and next.

public boolean hasNext() {
    return cursor != size();
}

hasNext compares the cursor position with the number of elements currently in the list.

public E next() {
    checkForComodification();
    try {
        E next = get(cursor);
        lastRet = cursor++;
        return next;
    } catch (IndexOutOfBoundsException e) {
        checkForComodification();
        throw new NoSuchElementException();
    }
}

next relies on checkForComodification:

final void checkForComodification() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
}

Every call to next compares the current modCount with the value the iterator recorded when it was created (expectedModCount). If they differ, the list has been structurally modified in the meantime, and ConcurrentModificationException is thrown.

If they match, next calls get, which may throw IndexOutOfBoundsException. After catching it, next checks modCount again (checkForComodification): if the counts differ it throws ConcurrentModificationException,

otherwise it throws NoSuchElementException.
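The fail-fast check is easy to trigger with a real java.util.ArrayList. The sketch below contrasts removing through the list (which changes modCount behind the iterator's back) with removing through the iterator (which keeps the two counters in sync). FailFastSketch and its method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class FailFastSketch {
    // Removes the first element through the list itself while iterating.
    // Returns true if the iterator detected the modification.
    static boolean removeViaListFails() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer value : list) {
                if (value == 1)
                    list.remove(value); // remove(Object): bumps modCount, the iterator is not told
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes even numbers through the iterator, which updates its expected count.
    static List<Integer> removeEvensViaIterator() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0)
                it.remove();
        }
        return list;
    }
}
```

The check is a best-effort bug detector rather than a guarantee: it only fires on the next call to next, so a modification made just before iteration would have ended anyway can go unnoticed.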

LinkedList

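LinkedList is built on a doubly linked list. Inside LinkedList, the Entry class represents an element of the collection.

private static class Entry<E> {
    E element;
    Entry<E> next;
    Entry<E> previous;

    Entry(E element, Entry<E> next, Entry<E> previous) {
        this.element = element;
        this.next = next;
        this.previous = previous;
    }
}

The value is stored in element, previous points to the preceding entry, and next points to the following one. Linking otherwise independent Entry objects through previous and next forms the list; because each entry is linked in both directions, it is called a doubly linked list.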

1) Creating a LinkedList

private transient Entry<E> header = new Entry<E>(null, null, null);
private transient int size = 0;

/**
 * Constructs an empty list.
 */
public LinkedList() {
    header.next = header.previous = header;
}

This creates a single Entry object, header, whose previous and next both point to itself, forming a closed ring.

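2) Adding elements

add(E e) simply calls addBefore(e, header).

private Entry<E> addBefore(E e, Entry<E> entry) {
    Entry<E> newEntry = new Entry<E>(e, entry, entry.previous);
    newEntry.previous.next = newEntry;
    newEntry.next.previous = newEntry;
    size++;
    modCount++;
    return newEntry;
}

This part is a little convoluted. When called from add(E e), entry is header, so the new Entry is created with next pointing to header and previous pointing to header.previous. header.previous always points to the last element, here the one that was last before the insertion.

The next field of the preceding entry (header.previous, the old last element) is then pointed at the new entry.

Finally, the previous field of the new entry's successor, which is header, is pointed at the new entry. This preserves the invariant that header.previous always points to the last element.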

3) Removing elements

remove(Object o)

public boolean remove(Object o) {
    if (o == null) {
        for (Entry<E> e = header.next; e != header; e = e.next) {
            if (e.element == null) {
                remove(e);
                return true;
            }
        }
    } else {
        for (Entry<E> e = header.next; e != header; e = e.next) {
            if (o.equals(e.element)) {
                remove(e);
                return true;
            }
        }
    }
    return false;
}

The method first walks the list to find the matching Entry and then calls remove(Entry e).

private E remove(Entry<E> e) {
    if (e == header)
        throw new NoSuchElementException();

    E result = e.element;
    e.previous.next = e.next;
    e.next.previous = e.previous;
    e.next = e.previous = null;
    e.element = null;
    size--;
    modCount++;
    return result;
}

Unlinking a given Entry e is straightforward: point the next of e's predecessor at e's successor (e.previous.next = e.next), and point the previous of e's successor at e's predecessor (e.next.previous = e.previous).

Then e's element, next, and previous are set to null so that the removed entry no longer references anything, which gives the GC the chance to reclaim it.
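The add and remove steps can be put together in a stripped-down list with a sentinel header. This is a sketch of the technique under simplifying assumptions (no modCount, no index access); RingListSketch and its Node class are illustrative names, not the JDK's.

```java
import java.util.ArrayList;
import java.util.List;

class RingListSketch<E> {
    private static class Node<E> {
        E element;
        Node<E> next;
        Node<E> previous;

        Node(E element, Node<E> next, Node<E> previous) {
            this.element = element;
            this.next = next;
            this.previous = previous;
        }
    }

    // Sentinel node: header.next is the first element, header.previous the last.
    private final Node<E> header = new Node<>(null, null, null);
    private int size = 0;

    RingListSketch() {
        header.next = header.previous = header;
    }

    void add(E e) {
        Node<E> node = new Node<>(e, header, header.previous); // insert before header = append
        node.previous.next = node;
        node.next.previous = node;
        size++;
    }

    boolean remove(Object o) {
        for (Node<E> n = header.next; n != header; n = n.next) {
            if (o == null ? n.element == null : o.equals(n.element)) {
                n.previous.next = n.next;     // bypass n in the forward direction
                n.next.previous = n.previous; // bypass n in the backward direction
                n.next = n.previous = null;
                n.element = null;
                size--;
                return true;
            }
        }
        return false;
    }

    int size() {
        return size;
    }

    // Walks the ring forward from the header.
    List<E> toList() {
        List<E> out = new ArrayList<>();
        for (Node<E> n = header.next; n != header; n = n.next)
            out.add(n.element);
        return out;
    }

    // Walks the ring backward from the header.
    List<E> toReversedList() {
        List<E> out = new ArrayList<>();
        for (Node<E> n = header.previous; n != header; n = n.previous)
            out.add(n.element);
        return out;
    }
}
```

Thanks to the sentinel, neither add nor remove needs a special case for an empty list or for the first and last elements.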

4) Getting the element at a given position

get(int index) is implemented by entry(int index).

private Entry<E> entry(int index) {
    if (index < 0 || index >= size)
        throw new IndexOutOfBoundsException("Index: " + index + ", Size: "
                                            + size);
    Entry<E> e = header;
    if (index < (size >> 1)) {
        for (int i = 0; i <= index; i++)
            e = e.next;
    } else {
        for (int i = size; i > index; i--)
            e = e.previous;
    }
    return e;
}

There is a small optimization here: if index < size/2 the list is walked forward from the head, otherwise backward from the tail, so at most about half of the entries are visited.

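5) Iteration

The class structure deserves a short explanation. The Iterator interface defines just three methods: hasNext, next, and remove.

The ListIterator interface extends Iterator and adds add, hasPrevious, previous, set, previousIndex, and nextIndex.

Itr is an inner class of AbstractList that implements Iterator; it is what ArrayList's iterator() returns.

ListItr is another inner class of AbstractList; it extends Itr and implements ListIterator, adding backward traversal on top of get(int index).

Here is ListItr's previous method:

public E previous() {
    checkForComodification();
    try {
        int i = cursor - 1;
        E previous = get(i);
        lastRet = cursor = i;
        return previous;
    } catch (IndexOutOfBoundsException e) {
        checkForComodification();
        throw new NoSuchElementException();
    }
}

LinkedList obtains its iterator through listIterator(), and it overrides listIterator(int index) with its own ListItr class that follows the next and previous links directly instead of calling get(i). Either way, a LinkedList can be traversed both forward and backward.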

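6) Other methods

offer, peek, poll, pop, push.

offer is similar to add (offerFirst, offerLast).

peek is similar to getFirst, but returns null instead of throwing when the list is empty (peekFirst, peekLast).

poll is similar to removeFirst, but likewise returns null when the list is empty (pollFirst, pollLast).

pop is equivalent to removeFirst.

push is equivalent to addFirst.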
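The following sketch exercises these methods on a real java.util.LinkedList to show which end of the list each one works on. DequeMethodsSketch and its method names are hypothetical.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

class DequeMethodsSketch {
    // push and pop both work on the head of the list, giving LIFO order.
    static String lifoOrder() {
        LinkedList<String> list = new LinkedList<>();
        list.push("a");
        list.push("b");
        list.push("c");
        return list.pop() + list.pop() + list.pop();
    }

    // offer appends at the tail and poll removes from the head, giving FIFO order.
    static String fifoOrder() {
        LinkedList<String> list = new LinkedList<>();
        list.offer("a");
        list.offer("b");
        list.offer("c");
        return list.poll() + list.poll() + list.poll();
    }

    // On an empty list, peek and poll return null while getFirst throws.
    static boolean emptyListBehaviour() {
        LinkedList<String> empty = new LinkedList<>();
        boolean getFirstThrew = false;
        try {
            empty.getFirst();
        } catch (NoSuchElementException e) {
            getFirstThrew = true;
        }
        return empty.peek() == null && empty.poll() == null && getFirstThrew;
    }

    public static void main(String[] args) {
        System.out.println(lifoOrder()); // cba
        System.out.println(fifoOrder()); // abc
    }
}
```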

Vector

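Like ArrayList, Vector is array-backed.

Vector is essentially a thread-safe ArrayList built on synchronized, so many of its methods look just like ArrayList's with the synchronized keyword added.

Apart from that, the growth policy differs slightly.

private void ensureCapacityHelper(int minCapacity) {
    int oldCapacity = elementData.length;
    if (minCapacity > oldCapacity) {
        Object[] oldData = elementData;
        int newCapacity = (capacityIncrement > 0) ?
            (oldCapacity + capacityIncrement) : (oldCapacity * 2);
        if (newCapacity < minCapacity) {
            newCapacity = minCapacity;
        }
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
}

If a positive capacityIncrement was specified, Vector grows by that amount; otherwise it simply doubles its capacity, whereas ArrayList grows by about 1.5x. Why the two classes differ on this point is left as an open question.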
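Vector exposes its current capacity through capacity(), so both growth modes can be observed directly. The helper below is a sketch; VectorGrowthSketch and capacityAfterOverflow are made-up names.

```java
import java.util.Vector;

class VectorGrowthSketch {
    // Capacity after adding one element more than the initial capacity can hold.
    static int capacityAfterOverflow(int initialCapacity, int capacityIncrement) {
        Vector<Integer> v = new Vector<>(initialCapacity, capacityIncrement);
        for (int i = 0; i <= initialCapacity; i++)
            v.add(i);
        return v.capacity();
    }

    public static void main(String[] args) {
        System.out.println(capacityAfterOverflow(10, 0)); // 20: capacity doubled
        System.out.println(capacityAfterOverflow(10, 5)); // 15: grown by the increment
    }
}
```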

Stack

Stack extends Vector and implements LIFO stack operations. Its main methods are push, pop, and peek.

public E push(E item) {
    addElement(item);

    return item;
}

public synchronized E pop() {
    E obj;
    int len = size();

    obj = peek();
    removeElementAt(len - 1);

    return obj;
}

public synchronized E peek() {
    int len = size();

    if (len == 0)
        throw new EmptyStackException();
    return elementAt(len - 1);
}

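ArrayList is array-backed: get is fast, but inserting or removing anywhere other than the end requires shifting elements, which is comparatively slow. It is not thread-safe.

LinkedList is a doubly linked list: inserting or removing an entry shifts nothing and only updates the neighbours' previous and next references, which is cheap once the entry has been located. get has to walk the list from the head or the tail, which is comparatively slow. It is not thread-safe either.

Vector is a thread-safe counterpart of ArrayList, and Stack extends Vector with stack operations.

In practice we usually choose ArrayList over Vector and handle thread safety externally.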
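One common way to handle thread safety externally is to wrap an ArrayList with Collections.synchronizedList. The sketch below assumes that approach; the original text does not say which mechanism it has in mind, and ExternalSyncSketch and concurrentAdds are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ExternalSyncSketch {
    // Several threads append to one ArrayList through a synchronized wrapper.
    // Returns the final size of the shared list.
    static int concurrentAdds(int threads, int addsPerThread) {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<Integer>());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++)
                    shared.add(i);
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.size();
    }
}
```

Note that the wrapper only makes individual calls atomic; iterating over the list still requires the caller to synchronize on the wrapper for the duration of the loop.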

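1.1.2. Set

A Set does not allow duplicate elements. Its main implementations are HashSet and TreeSet.

(Read the Map section first.)

HashSet

HashSet is implemented on top of HashMap: the elements of the set are stored as the keys of the map.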
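The idea of a map-backed set fits in a few lines. This is a minimal sketch of the technique, not the JDK class itself; MapBackedSetSketch is a made-up name and only four operations are shown.

```java
import java.util.HashMap;
import java.util.Map;

class MapBackedSetSketch<E> {
    // Shared dummy value: only the keys of the map matter.
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<>();

    // Returns true if the element was not already present.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    // Returns true if the element was present and has been removed.
    boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    int size() {
        return map.size();
    }
}
```

Uniqueness comes for free: putting a key that already exists just overwrites the dummy value, and the non-null return value of put tells add that nothing new was inserted.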

TreeSet

TreeSet is implemented on top of TreeMap.

1.2. Map

A Map stores key-value pairs. Its main implementations are HashMap and TreeMap.

HashMap

1) Creation

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
    table = new Entry[DEFAULT_INITIAL_CAPACITY];
    init();
}

By default loadFactor is 0.75 and the constructor creates an Entry array of size 16 (2 to the 4th power), so threshold is 16 * 0.75 = 12.

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;

    this.loadFactor = loadFactor;
    threshold = (int)(capacity * loadFactor);
    table = new Entry[capacity];
    init();
}

With an explicit initialCapacity and loadFactor, capacity becomes the smallest power of two that is greater than or equal to initialCapacity. capacity is the size of the Entry array.
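The rounding loop is small enough to try on its own. In this sketch, PowerOfTwoSketch and roundUp are illustrative names, and the input is assumed to be at most 1 << 30, as the constructor guarantees by clamping to MAXIMUM_CAPACITY.

```java
class PowerOfTwoSketch {
    // Same loop as the constructor: smallest power of two >= initialCapacity.
    // Assumes initialCapacity <= 1 << 30 so the shift cannot overflow.
    static int roundUp(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUp(10)); // 16
        System.out.println(roundUp(16)); // 16
        System.out.println(roundUp(17)); // 32
    }
}
```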

2) Adding entries

put(K key, V value)

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i); // the key is not present yet
    return null;
}

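indexFor maps the hash to a bucket index. Because the table length is always a power of two, h & (length-1) is equivalent to taking h modulo length, but cheaper.

static int indexFor(int h, int length) {
    return h & (length-1);
}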
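The equivalence between the bit mask and a modulo operation holds only for power-of-two lengths, which the sketch below checks against Math.floorMod. IndexForSketch and maskMatchesFloorMod are hypothetical names, and the tested hash range is an arbitrary choice.

```java
class IndexForSketch {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Compares the mask with Math.floorMod for a range of hashes, including negative ones.
    static boolean maskMatchesFloorMod(int length) {
        for (int h = -1000; h <= 1000; h++) {
            if (indexFor(h, length) != Math.floorMod(h, length))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(maskMatchesFloorMod(16)); // true
        System.out.println(maskMatchesFloorMod(10)); // false: 10 is not a power of two
    }
}
```

This is why the constructor rounds the requested capacity up to a power of two: with any other length the mask would leave some buckets permanently unused.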

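When the key is not present, its bucket may already hold other entries (a collision). That case is left to addEntry.

void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}

addEntry does nothing special for collisions: it always inserts the new key-value pair as an Entry at the head of the bucket's linked list.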
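A collision is easy to produce with two short strings whose hashCode values coincide. The sketch below uses "Aa" and "BB" (both hash to 2112) with a real java.util.HashMap; CollisionSketch and its methods are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionSketch {
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    // Stores two keys with identical hash codes in one map.
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(sameHash("Aa", "BB")); // true
        System.out.println(build());              // both entries are kept
    }
}
```

Both keys end up in the same bucket, and lookups tell them apart with equals, which is why equal hash codes cost performance but never correctness.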

When size reaches threshold, the table has to grow. Resizing is a relatively expensive process: every entry in the current Entry array must be re-indexed and placed into a new, larger array, and finally threshold is recomputed.

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable);
    table = newTable;
    threshold = (int)(newCapacity * loadFactor);
}

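void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity); // recompute the bucket index from the cached hash
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}

This code is also fairly involved; interested readers should step through it.

If we know in advance that many key-value pairs will be stored but still create the map with the default no-argument constructor, it will go through many avoidable resizes. In that case it is better to use the public HashMap(int initialCapacity) constructor (or public HashMap(int initialCapacity, float loadFactor)).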
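To get a feel for how many resizes pre-sizing saves, the bookkeeping from the listings above can be simulated without building a map at all. This is a sketch: PresizeSketch, countResizes, and capacityFor are made-up names, the load factor is fixed at the default 0.75, and the expected / 0.75 + 1 sizing rule is a common rule of thumb rather than something taken from the JDK source.

```java
class PresizeSketch {
    // Simulates the addEntry/resize bookkeeping: the table doubles whenever
    // the size before an insertion has already reached the threshold.
    static int countResizes(int initialCapacity, int entries) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        int threshold = (int) (capacity * 0.75f);
        int resizes = 0;
        for (int size = 0; size < entries; size++) {
            if (size >= threshold) {
                capacity *= 2;
                threshold = (int) (capacity * 0.75f);
                resizes++;
            }
        }
        return resizes;
    }

    // Rule of thumb (an assumption of this sketch): expected / loadFactor + 1.
    static int capacityFor(int expectedEntries) {
        return (int) (expectedEntries / 0.75f) + 1;
    }

    public static void main(String[] args) {
        System.out.println(countResizes(16, 1000));                // 7
        System.out.println(countResizes(capacityFor(1000), 1000)); // 0
    }
}
```

With the default capacity of 16, inserting 1000 entries goes through seven resizes, each of which re-links every entry stored so far; requesting the capacity up front avoids all of them.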

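3) Retrieval

get(Object key)

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

For a null key, the lookup goes straight to the first bucket of the array and follows the next links looking for an Entry whose key is null. If one is found its value is returned; otherwise null is returned.

For a non-null key, the key is hashed and the hash is reduced to a bucket index. The Entry at that position is fetched and its next links are followed, looking for an Entry with the same hash and an equal key. If one is found its value is returned; otherwise null is returned.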

4) Removal

remove(Object key)

The actual removal is done by removeEntryForKey:

final Entry<K,V> removeEntryForKey(Object key) {
    int hash = (key == null) ? 0 : hash(key.hashCode());
    int i = indexFor(hash, table.length);
    Entry<K,V> prev = table[i];
    Entry<K,V> e = prev;

    while (e != null) {
        Entry<K,V> next = e.next;
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k)))) {
            modCount++;
            size--;
            if (prev == e)
                table[i] = next;
            else
                prev.next = next;
            e.recordRemoval(this);
            return e;
        }
        prev = e;
        e = next;
    }

    return e;
}

remove works much like get: it first locates the bucket for the key and then walks the chain to find the key and unlink its entry. Collisions are resolved with a singly linked list.

5) Membership test

containsKey(Object key)

public boolean containsKey(Object key) {
    return getEntry(key) != null;
}

/**
 * Returns the entry associated with the specified key in the
 * HashMap.  Returns null if the HashMap contains no mapping
 * for the key.
 */
final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

getEntry is similar to get: getEntry returns the Entry object, whereas get returns the Entry's value.

HashMap reference:

http://www.cnblogs.com/huangfox/archive/2012/07/06/2579614.html

TreeMap

TreeMap is a Map implementation that keeps its keys sorted; a custom Comparator can be supplied.

1) Creation

public TreeMap(Comparator<? super K> comparator) {
    this.comparator = comparator;
}

2) Adding entries

put(K key, V value)

public V put(K key, V value) {
    Entry<K,V> t = root;
    if (t == null) {
        // TBD:
        // 5045147: (coll) Adding null to an empty TreeSet should
        // throw NullPointerException
        //
        // compare(key, key); // type check
        root = new Entry<K,V>(key, value, null);
        size = 1;
        modCount++;
        return null;
    }
    int cmp;
    Entry<K,V> parent;
    // split comparator and comparable paths
    Comparator<? super K> cpr = comparator;
    if (cpr != null) {
        do {
            parent = t;
            cmp = cpr.compare(key, t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else
                return t.setValue(value);
        } while (t != null);
    }
    else {
        if (key == null)
            throw new NullPointerException();
        Comparable<? super K> k = (Comparable<? super K>) key;
        do {
            parent = t;
            cmp = k.compareTo(t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else
                return t.setValue(value);
        } while (t != null);
    }
    Entry<K,V> e = new Entry<K,V>(key, value, parent);
    if (cmp < 0)
        parent.left = e;
    else
        parent.right = e;
    fixAfterInsertion(e);
    size++;
    modCount++;
    return null;
}

put first checks whether root is null. If so, it creates a new Entry and makes it the root.

If root is not null, put checks whether a Comparator was supplied. If so, it descends the red-black tree from the root, using each comparison result to choose the left or the right subtree (smaller keys to the left, larger to the right).

If an equal key is found, its value is replaced and put returns immediately.

If the descent ends without finding an equal key, a new node is attached as the left or right child of the last node visited, according to the last comparison result (again smaller left, larger right). fixAfterInsertion then rebalances the tree.

If no Comparator was supplied (comparator == null), the key itself is cast to Comparable (Comparable<? super K> k = (Comparable<? super K>) key;) and used for the comparisons, so it must be non-null and implement Comparable. The rest of the procedure is the same as above.
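The three behaviours described above (ordering by Comparator or by natural order, value replacement on an equal key, and rejection of null keys when no Comparator is present) can be observed on a real java.util.TreeMap. TreeMapOrderingSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class TreeMapOrderingSketch {
    // Inserts keys out of order and returns them in the map's iteration order.
    static List<String> keysInOrder(Comparator<String> comparator) {
        TreeMap<String, Integer> map = new TreeMap<>(comparator); // null means natural ordering
        map.put("b", 2);
        map.put("c", 3);
        map.put("a", 1);
        return new ArrayList<>(map.keySet());
    }

    // Putting an existing key replaces the value and returns the old one.
    static Integer replaceValue() {
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("a", 1);
        return map.put("a", 2);
    }

    // Without a Comparator, a null key cannot be compared and is rejected.
    static boolean nullKeyRejected() {
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("a", 1);
        try {
            map.put(null, 0);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(keysInOrder(null));                                // [a, b, c]
        System.out.println(keysInOrder(Comparator.<String>reverseOrder()));   // [c, b, a]
    }
}
```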

3) Retrieval

get(Object key) delegates to getEntry and returns the value of the entry it finds, or null if there is none.

final Entry<K,V> getEntry(Object key) {
    // Offload comparator-based version for sake of performance
    if (comparator != null)
        return getEntryUsingComparator(key);
    if (key == null)
        throw new NullPointerException();
    Comparable<? super K> k = (Comparable<? super K>) key;
    Entry<K,V> p = root;
    while (p != null) {
        int cmp = k.compareTo(p.key);
        if (cmp < 0)
            p = p.left;
        else if (cmp > 0)
            p = p.right;
        else
            return p;
    }
    return null;
}

4) Removal

remove(Object key)

public V remove(Object key) {
    Entry<K,V> p = getEntry(key);
    if (p == null)
        return null;

    V oldValue = p.value;
    deleteEntry(p);
    return oldValue;
}

remove first looks up the entry with getEntry. If it is not null, the entry is deleted from the red-black tree and the affected nodes are rebalanced.

That process is fairly involved; see a reference on red-black trees for the details.


Original article: https://www.cnblogs.com/hlongch/p/5742981.html