问题描述为: 一个无序的数列,每个数有其对应的权重,权重为非负整数,代表数列中的数字出现的次数。要求找出这一无序数列中的中位数。
1. 直接解法,先对该数列和权重排序。然后找出累计权重为中位数的数字。 时间复杂度为排序的 O(nlog(n)+n)
2 import numpy as np 3 4 def weighted_median(data, weights): 5 """ 6 Args: 7 data (list or numpy.array): data 8 weights (list or numpy.array): weights 9 """ 10 data, weights = np.array(data).squeeze(), np.array(weights).squeeze() 11 s_data, s_weights = map(np.array, zip(*sorted(zip(data, weights)))) 12 midpoint = 0.5 * sum(s_weights) 13 if any(weights > midpoint): 14 w_median = (data[weights == np.max(weights)])[0] 15 else: 16 cs_weights = np.cumsum(s_weights) 17 idx = np.where(cs_weights <= midpoint)[0][-1] 18 if cs_weights[idx] == midpoint: 19 w_median = np.mean(s_data[idx:idx+2]) 20 else: 21 w_median = s_data[idx+1] 22 return w_median 23 24 def test_weighted_median(): 25 print("hello, world") 26 data = [ 27 [7, 1, 2, 4, 10], 28 [7, 1, 2, 4, 10], 29 [7, 1, 2, 4, 10, 15], 30 [1, 2, 4, 7, 10, 15], 31 [0, 10, 20, 30], 32 [1, 2, 3, 4, 5], 33 [30, 40, 50, 60, 35], 34 [2, 0.6, 1.3, 0.3, 0.3, 1.7, 0.7, 1.7, 0.4], 35 ] 36 weights = [ 37 [1, 1/3, 1/3, 1/3, 1], 38 [1, 1, 1, 1, 1], 39 [1, 1/3, 1/3, 1/3, 1, 1], 40 [1/3, 1/3, 1/3, 1, 1, 1], 41 [30, 191, 9, 0], 42 [10, 1, 1, 1, 9], 43 [1, 3, 5, 4, 2], 44 [2, 2, 0, 1, 2, 2, 1, 6, 0], 45 ] 46 answers = [7, 4, 8.5, 8.5, 10, 2.5, 50, 1.7] 47 for datum, weight, answer in zip(data, weights, answers): 48 assert(weighted_median(datum, weight) == answer) 49 50 if __name__ == "__main__": 51 test_weighted_median()
2. 按照快速排序的思路,先找到一个数字,然后 按照该数字将数列划分成左右两段,根据左右两段的权重之和,递归调用左半侧或者右半侧数列。