3.1.4 无序链表中的顺序查找
符号表中使用的数据结构的一个简单选择是链表,每个结点存储一个键值对,如以下代码所示。get()的实现即为遍历链表,用equals()方法比较需被查找的键和每个节点中的键。如果匹配成功我们就返回null。put()的实现也是遍历链表,用equals()方法比较需被查找的键。如果匹配成功我们就用第二个参数指定的值更新和改键现关联的值,否则我们就用给定的键值对创建一个新的节点并将其插入到链表的开头。这种方法也被称为顺序查找:在查找中我们一个一个地顺序遍历符号表中的所有键并使用equals()方法来寻找与被查找的键匹配的键。
算法(SequentialSearchST)用链表实现了符号表的基本API,这里我们将size()、keys()和即时型的delete()方法留作联系。
1 /************************************************************************* 2 * Compilation: javac SequentialSearchST.java 3 * Execution: java SequentialSearchST 4 * Dependencies: StdIn.java StdOut.java 5 * Data files: http://algs4.cs.princeton.edu/31elementary/tinyST.txt 6 * 7 * Symbol table implementation with sequential search in an 8 * unordered linked list of key-value pairs. 9 * 10 * % more tinyST.txt 11 * S E A R C H E X A M P L E 12 * 13 * % java SequentialSearchST < tiny.txt 14 * L 11 15 * P 10 16 * M 9 17 * X 7 18 * H 5 19 * C 4 20 * R 3 21 * A 8 22 * E 12 23 * S 0 24 * 25 *************************************************************************/ 26 27 /** 28 * The <tt>SequentialSearchST</tt> class represents an (unordered) 29 * symbol table of generic key-value pairs. 30 * It supports the usual <em>put</em>, <em>get</em>, <em>contains</em>, 31 * <em>delete</em>, <em>size</em>, and <em>is-empty</em> methods. 32 * It also provides a <em>keys</em> method for iterating over all of the keys. 33 * A symbol table implements the <em>associative array</em> abstraction: 34 * when associating a value with a key that is already in the symbol table, 35 * the convention is to replace the old value with the new value. 36 * The class also uses the convention that values cannot be <tt>null</tt>. Setting the 37 * value associated with a key to <tt>null</tt> is equivalent to deleting the key 38 * from the symbol table. 39 * <p> 40 * This implementation uses a singly-linked list and sequential search. 41 * It relies on the <tt>equals()</tt> method to test whether two keys 42 * are equal. It does not call either the <tt>compareTo()</tt> or 43 * <tt>hashCode()</tt> method. 44 * The <em>put</em> and <em>delete</em> operations take linear time; the 45 * <em>get</em> and <em>contains</em> operations takes linear time in the worst case. 46 * The <em>size</em>, and <em>is-empty</em> operations take constant time. 47 * Construction takes constant time. 48 * <p> 49 * For additional documentation, see <a href="http://algs4.cs.princeton.edu/31elementary">Section 3.1</a> of 50 * <i>Algorithms, 4th Edition</i> by Robert Sedgewick and Kevin Wayne. 51 * 52 * @author Robert Sedgewick 53 * @author Kevin Wayne 54 */ 55 public class SequentialSearchST<Key, Value> { 56 private int N; // number of key-value pairs 57 private Node first; // the linked list of key-value pairs 58 59 // a helper linked list data type 60 private class Node { 61 private Key key; 62 private Value val; 63 private Node next; 64 65 public Node(Key key, Value val, Node next) { 66 this.key = key; 67 this.val = val; 68 this.next = next; 69 } 70 } 71 72 /** 73 * Initializes an empty symbol table. 74 */ 75 public SequentialSearchST() { 76 } 77 78 /** 79 * Returns the number of key-value pairs in this symbol table. 80 * @return the number of key-value pairs in this symbol table 81 */ 82 public int size() { 83 return N; 84 } 85 86 /** 87 * Is this symbol table empty? 88 * @return <tt>true</tt> if this symbol table is empty and <tt>false</tt> otherwise 89 */ 90 public boolean isEmpty() { 91 return size() == 0; 92 } 93 94 /** 95 * Does this symbol table contain the given key? 96 * @param key the key 97 * @return <tt>true</tt> if this symbol table contains <tt>key</tt> and 98 * <tt>false</tt> otherwise 99 */ 100 public boolean contains(Key key) { 101 return get(key) != null; 102 } 103 104 /** 105 * Returns the value associated with the given key. 106 * @param key the key 107 * @return the value associated with the given key if the key is in the symbol table 108 * and <tt>null</tt> if the key is not in the symbol table 109 */ 110 public Value get(Key key) { 111 for (Node x = first; x != null; x = x.next) { 112 if (key.equals(x.key)) return x.val; 113 } 114 return null; 115 } 116 117 /** 118 * Inserts the key-value pair into the symbol table, overwriting the old value 119 * with the new value if the key is already in the symbol table. 120 * If the value is <tt>null</tt>, this effectively deletes the key from the symbol table. 121 * @param key the key 122 * @param val the value 123 */ 124 public void put(Key key, Value val) { 125 if (val == null) { delete(key); return; } 126 for (Node x = first; x != null; x = x.next) 127 if (key.equals(x.key)) { x.val = val; return; } 128 first = new Node(key, val, first); 129 N++; 130 } 131 132 /** 133 * Removes the key and associated value from the symbol table 134 * (if the key is in the symbol table). 135 * @param key the key 136 */ 137 public void delete(Key key) { 138 first = delete(first, key); 139 } 140 141 // delete key in linked list beginning at Node x 142 // warning: function call stack too large if table is large 143 private Node delete(Node x, Key key) { 144 if (x == null) return null; 145 if (key.equals(x.key)) { N--; return x.next; } 146 x.next = delete(x.next, key); 147 return x; 148 } 149 150 151 /** 152 * Returns all keys in the symbol table as an <tt>Iterable</tt>. 153 * To iterate over all of the keys in the symbol table named <tt>st</tt>, 154 * use the foreach notation: <tt>for (Key key : st.keys())</tt>. 155 * @return all keys in the sybol table as an <tt>Iterable</tt> 156 */ 157 public Iterable<Key> keys() { 158 Queue<Key> queue = new Queue<Key>(); 159 for (Node x = first; x != null; x = x.next) 160 queue.enqueue(x.key); 161 return queue; 162 } 163 164 165 /** 166 * Unit tests the <tt>SequentialSearchST</tt> data type. 167 */ 168 public static void main(String[] args) { 169 SequentialSearchST<String, Integer> st = new SequentialSearchST<String, Integer>(); 170 for (int i = 0; !StdIn.isEmpty(); i++) { 171 String key = StdIn.readString(); 172 st.put(key, i); 173 } 174 for (String s : st.keys()) 175 StdOut.println(s + " " + st.get(s)); 176 } 177 }
这种基于链表的实现能够用于和我们的用例类似的、需要大型符号表的应用?我们已经说过,分析符号表算法比分析排序算法更加困难,因为不同的用例所进行的操作序列各不相同。对于FrequencyCounter,最常见的情形是虽然查找和插入的使用模式是不可预测的,但他妈的使用肯定不是随机的。因此我们主要研究最坏情况下的性能。为了方便,我们使用命令中表示一次成功的查找来命中表示一次失败的查找。使用基于链表的符号表的索引引用例的轨迹如下图。
符号表的实现是i用了一个私有内部Node类来在链表中保存键和值。get()的实现会顺序地搜索链表查找给定的键(找到则返回相关联的值)。put()的实现也会顺序地搜索链表查找给定的键,如果找到则更新相关联的值,否则它会用给定的键值对创建一个新的结点并将其插入到链表的开头。size()、keys()和即时型的delete()方法的实现自行练习。
命题A。在含有N对键值的基于(无序)链表的符号表中,未命中的查找和插入操作都需要N次比较。未命中的查找和插入都需要N次比较。命中的查找在最坏情况下需要N次比较。特别地,向一个空表中插入N个不用的键余姚~N^2 / 2次比较。
证明。在表中查找一个不存在的键时,我们会将表中的每个键和给定的键比较。因为不允许出现重复的键,每次插入操作之前我们都需要这样查找一遍。
推论想空表中插入N个不同的键需要~N^2 / 2次比较。