Problem statement:
Download the text file here.
The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting x_i denote the ith number of the file, the kth median m_k is defined as the median of the numbers x_1,…,x_k. (So, if k is odd, then m_k is the ((k+1)/2)th smallest number among x_1,…,x_k; if k is even, then m_k is the (k/2)th smallest number among x_1,…,x_k.)
In the box below you should type the sum of these 10000 medians, modulo 10000 (i.e., only the last 4 digits). That is, you should compute (m_1+m_2+m_3+⋯+m_10000) mod 10000.
OPTIONAL EXERCISE: Compare the performance achieved by heap-based and search-tree-based implementations of the algorithm.
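Roughly speaking, there is a file containing the ten thousand integers from 1 to 10000 in unsorted order. We read them in one at a time, and after each read we find the median of all the numbers read so far. We then add up all of these medians and output the sum modulo 10000.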
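To make the definition concrete, here is a minimal brute-force sketch that follows it literally: it re-sorts everything seen so far after each arrival and picks the element at 0-based index (k-1)/2, which is the ((k+1)/2)th smallest for odd k and the (k/2)th smallest for even k. The function name median_sum_bruteforce is a hypothetical helper, not something from the assignment; the caller is assumed to apply the final "mod 10000" step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper: sum of the running medians of `stream`,
// computed by re-sorting the prefix after every new number.
long long median_sum_bruteforce(const std::vector<int>& stream) {
    std::vector<int> seen;
    long long sum = 0;
    for (int x : stream) {
        seen.push_back(x);
        std::sort(seen.begin(), seen.end());
        const std::size_t k = seen.size();
        // k odd : ((k+1)/2)-th smallest -> 0-based index (k-1)/2
        // k even: (k/2)-th smallest     -> 0-based index (k-1)/2 (integer division)
        const std::size_t idx = (k - 1) / 2;
        assert(idx < k);
        sum += seen[idx];
    }
    return sum;
}
```

For the stream 5, 2, 8, 1 the running medians are 5, 2, 5, 2. A slow version like this is mainly useful for cross-checking a faster implementation on small inputs.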
The data in the file looks roughly like this:
...
6195
2303
5685
1354
4292
7600
6447
4479
9046
7293
5147
1260
1386
6193
4135
3611
8583
...
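Approach:
This problem can of course be solved by brute force: sort the array after every number is read and record the median. Clearly there should be a better way, though, and there is: using the heap data structure lowers the time complexity considerably. The idea is as follows:
- Create two heaps, a max-heap and a min-heap (in a max-heap every parent is greater than its children; in a min-heap it is the other way round);
- After reading each number, compare it with the root of the max-heap: if it is smaller than that root, insert it into the max-heap; otherwise insert it into the min-heap. As a result, the max-heap always holds the smaller numbers and the min-heap the larger ones, so every number in the max-heap is no greater than every number in the min-heap;
- This alone does not guarantee that the median is one of the two roots, because the two heaps may differ greatly in size. So after each number has been read and inserted into the appropriate heap, we check the sizes of the two heaps and rebalance them (the median is one of the two roots only when the sizes differ by at most 1);
- To rebalance: if the sizes differ by more than 1, pop the root of the larger heap and insert it into the smaller heap;
- Finally, compute the median. The root of the min-heap is greater than the root of the max-heap, so if the min-heap is larger than the max-heap by 1, the median is the root of the min-heap; if the two sizes are equal, or the max-heap is larger than the min-heap by 1, the median is the root of the max-heap.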
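The steps above can be sketched compactly with std::priority_queue from the C++ standard library in place of a hand-written heap. This is only an illustration under that assumption: the class name MedianTracker and its members add, median, low_ and high_ are hypothetical, and it checks for an empty max-heap explicitly rather than relying on any special starting values.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical sketch of the two-heap scheme described above.
class MedianTracker {
public:
    void add(int x) {
        // Route x: smaller than the max-heap root -> max-heap, else min-heap.
        if (!low_.empty() && x < low_.top()) {
            low_.push(x);
        } else {
            high_.push(x);
        }
        // Rebalance: move one root across if the sizes differ by more than 1.
        if (low_.size() > high_.size() + 1) {
            high_.push(low_.top());
            low_.pop();
        } else if (high_.size() > low_.size() + 1) {
            low_.push(high_.top());
            high_.pop();
        }
    }

    // Median as defined in the problem; requires at least one add().
    int median() const {
        assert(!low_.empty() || !high_.empty());
        if (high_.size() == low_.size() + 1) {
            return high_.top();  // min-heap is larger by one
        }
        return low_.top();       // equal sizes, or max-heap larger by one
    }

private:
    std::priority_queue<int> low_;  // max-heap holding the smaller half
    std::priority_queue<int, std::vector<int>, std::greater<int>> high_;  // min-heap holding the larger half
};
```

Calling add for each number in arrival order and accumulating median() after each call gives the sum of the running medians; the last four digits are then sum % 10000.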
Implementation:
Following the approach above, I implemented it in C++. The code is as follows:

#include <cstdlib>   // atoi, system
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <limits>
using namespace std;

// Array-based binary heap; acts as a min-heap or a max-heap depending on is_min.
class MinMaxHeap {
public:
    MinMaxHeap(bool is_min = true);
    ~MinMaxHeap();
    int Top();
    int Size();
    void Insert(int num);
    void Pop();
private:
    void swap(int index1, int index2);
    int size;
    int *element;
    bool is_min;
};

MinMaxHeap::MinMaxHeap(bool is_min) {
    // for this problem, 5010 is just fine
    this->element = new int[5010];
    this->size = 0;
    this->is_min = is_min;
}

MinMaxHeap::~MinMaxHeap() {
    delete[] this->element;
}

// The caller must make sure the heap is not empty.
int MinMaxHeap::Top() {
    return this->element[0];
}

int MinMaxHeap::Size() {
    return this->size;
}

// The position of each element (the numbers are array indices):
//   level 0:        0
//   level 1:    1       2
//   level 2:  3   4   5   6
// The parent of index i is (i - 1) / 2; its children are 2i + 1 and 2i + 2.
void MinMaxHeap::Insert(int num) {
    int pos = size;
    element[size++] = num;
    // Sift the new element up until the heap property holds again.
    if (is_min) {
        while (pos > 0) {
            int parent = (pos - 1) >> 1; // same as (pos - 1) / 2
            if (element[parent] <= element[pos]) {
                break;
            }
            swap(parent, pos);
            pos = parent;
        }
    } else {
        while (pos > 0) {
            int parent = (pos - 1) >> 1; // same as (pos - 1) / 2
            if (element[parent] >= element[pos]) {
                break;
            }
            swap(parent, pos);
            pos = parent;
        }
    }
}

// Removes the root. The caller must make sure the heap is not empty.
void MinMaxHeap::Pop() {
    // Move the last element to the root, then sift it down.
    element[0] = element[--size];
    int pos = 0;
    if (is_min) {
        while (pos < (size >> 1)) { // if pos >= (size / 2), then element[pos] must be a leaf
            int left_child = pos * 2 + 1;
            int right_child = left_child + 1;
            int smallest_child;
            if (right_child < size && element[left_child] > element[right_child]) {
                smallest_child = right_child;
            } else {
                smallest_child = left_child;
            }
            if (element[pos] < element[smallest_child]) {
                break;
            }
            swap(pos, smallest_child);
            pos = smallest_child;
        }
    } else {
        while (pos < (size >> 1)) { // if pos >= (size / 2), then element[pos] must be a leaf
            int left_child = pos * 2 + 1;
            int right_child = left_child + 1;
            int biggest_child;
            if (right_child < size && element[left_child] < element[right_child]) {
                biggest_child = right_child;
            } else {
                biggest_child = left_child;
            }
            if (element[pos] > element[biggest_child]) {
                break;
            }
            swap(pos, biggest_child);
            pos = biggest_child;
        }
    }
}

void MinMaxHeap::swap(int index1, int index2) {
    int tmp = element[index1];
    element[index1] = element[index2];
    element[index2] = tmp;
}

int main() {
    ifstream fin;
    fin.open("Median.txt");
    if (!fin) {
        cerr << "cannot open Median.txt" << endl;
        return 1;
    }

    MinMaxHeap MinHeap(true);
    MinMaxHeap MaxHeap(false);

    // Sentinels: seed each heap with one extreme value so that Top() is always
    // valid. Each heap gets exactly one, so the size comparisons below are
    // unaffected, and a sentinel is never the root when that root is used as
    // the median.
    MinHeap.Insert(numeric_limits<int>::max());
    MaxHeap.Insert(numeric_limits<int>::min());

    int input, sum = 0, min_top, max_top;
    string tmp;
    while (getline(fin, tmp)) {
        input = atoi(tmp.c_str());
        max_top = MaxHeap.Top();
        if (input < max_top) {
            MaxHeap.Insert(input);
        } else {
            MinHeap.Insert(input);
        }

        // balance
        if (MaxHeap.Size() > MinHeap.Size() + 1) {
            max_top = MaxHeap.Top();
            MaxHeap.Pop();
            MinHeap.Insert(max_top);
        }
        if (MinHeap.Size() > MaxHeap.Size() + 1) {
            min_top = MinHeap.Top();
            MinHeap.Pop();
            MaxHeap.Insert(min_top);
        }

        // find the median
        if (MinHeap.Size() == MaxHeap.Size() + 1) {
            sum += MinHeap.Top();
        } else {
            sum += MaxHeap.Top();
        }
    }
    cout << sum % 10000 << endl;
    fin.close();
    system("pause"); // Windows-only: keeps the console window open
    return 0;
}
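With this approach the computation becomes much more efficient: each heap insertion or removal costs O(log k), so processing all n numbers takes O(n log n) in total, compared with roughly O(n^2 log n) when the whole array is re-sorted after every read.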
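For the optional exercise in the problem statement, one possible search-tree-based variant is sketched below. It assumes std::multiset (typically a balanced binary search tree) as the tree and keeps an iterator on the element at 0-based position (k-1)/2, moving it by at most one step per insertion. The class name TreeMedianTracker and its members are hypothetical, and this is a sketch for comparison rather than a tuned implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical search-tree-based counterpart to the two-heap scheme.
// Note: copying an instance would leave mid_ pointing into the original tree.
class TreeMedianTracker {
public:
    void add(int x) {
        tree_.insert(x);  // existing iterators stay valid; equal keys go after existing ones
        const std::size_t k = tree_.size();
        if (k == 1) {
            mid_ = tree_.begin();
            return;
        }
        if (x < *mid_) {
            // x landed before mid_, so mid_'s position grew by one.
            // The target position (k-1)/2 stays put only when k is odd.
            if (k % 2 == 0) {
                --mid_;
            }
        } else {
            // x landed after mid_; the target moves right only when k is odd.
            if (k % 2 == 1) {
                ++mid_;
            }
        }
    }

    // Median as defined in the problem; requires at least one add().
    int median() const {
        assert(!tree_.empty());
        return *mid_;
    }

private:
    std::multiset<int> tree_;
    std::multiset<int>::iterator mid_;
};
```

Each insertion here also costs O(log k), so timing this against the heap version on the same input is one way to carry out the comparison the exercise asks for.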