[LeetCode in Python] 692 (M) Top K Frequent Words

Problem:

https://leetcode-cn.com/problems/top-k-frequent-words/

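Given a non-empty list of words, return the k most frequent words.
The answer should be sorted by frequency from highest to lowest. If two words have the same frequency, the word that comes first alphabetically is listed first.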

Example 1:

Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation: "i" and "love" are the two most frequent words, each appearing 2 times.
Note that "i" comes before "love" in alphabetical order.

Example 2:

Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
appearing 4, 3, 2 and 1 times respectively.

Note:

Assume k is always valid: 1 ≤ k ≤ number of unique words.
The input words consist of lowercase letters only.

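Approach (min-heap)
- Python ships a min-heap implementation in the heapq module.
- heapq provides a top-k function: heapq.nlargest(n, iterable, key=None).
- The key function may return a tuple, which gives a multi-level comparison: elements are compared by the first field, then by the second on ties.
- nlargest() alone cannot give descending frequency and ascending word order at the same time, because it reverses the order of every field in the key.
- The trick is to negate the frequency and switch to nsmallest().
- Note: relying on heapq is a shortcut. Strictly speaking, you would implement nsmallest() yourself to show what the problem is really testing.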
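The difference between the two key choices is easier to see side by side. The snippet below is a small illustration on the words from Example 1, not part of the original solution; the variable names are only for this demo, and it asks for three words so that a tie shows up in both results.

```python
import heapq
from collections import Counter

words = ["i", "love", "leetcode", "i", "love", "coding"]
freq = Counter(words)  # i: 2, love: 2, leetcode: 1, coding: 1

# nlargest on (freq, word): ties come out in descending word order,
# which is the wrong tie-break for this problem.
by_nlargest = heapq.nlargest(3, freq, key=lambda w: (freq[w], w))

# nsmallest on (-freq, word): highest frequency first,
# ties in ascending (alphabetical) word order.
by_nsmallest = heapq.nsmallest(3, freq, key=lambda w: (-freq[w], w))

print(by_nlargest)   # ['love', 'i', 'leetcode']
print(by_nsmallest)  # ['i', 'love', 'coding']
```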
Code (min-heap)
```
import heapq
from typing import List


class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        # Count how often each word occurs.
        freq_dict = {}
        for w in words:
            freq_dict[w] = freq_dict.get(w, 0) + 1

        # Take the k smallest keys under (-freq, word):
        # highest frequency first, ties broken alphabetically.
        return heapq.nsmallest(k, freq_dict, key=lambda w: (-freq_dict[w], w))
```
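For readers who want to avoid the ready-made nsmallest(), here is one possible sketch of the same idea with a heap that never holds more than k entries. It still uses the heapq push/pop primitives, so it is only one reading of "implement it yourself". The names `_Entry` and `top_k_frequent` are hypothetical and do not come from the original post.

```python
import heapq
from collections import Counter
from typing import List


class _Entry:
    """Heap entry ordered so that the worst candidate sits at the heap top."""

    def __init__(self, freq: int, word: str):
        self.freq = freq
        self.word = word

    def __lt__(self, other: "_Entry") -> bool:
        # Lower frequency is worse; on a tie, the alphabetically later word is worse.
        if self.freq != other.freq:
            return self.freq < other.freq
        return self.word > other.word


def top_k_frequent(words: List[str], k: int) -> List[str]:
    freq = Counter(words)
    heap: List[_Entry] = []
    for word, count in freq.items():
        entry = _Entry(count, word)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        else:
            # Push the new entry, then drop the worst of the k + 1 candidates.
            heapq.heappushpop(heap, entry)
    # Popping yields worst-to-best, so reverse to get best-first order.
    result = [heapq.heappop(heap).word for _ in range(len(heap))]
    return result[::-1]
```

With n distinct words this keeps the heap at size k, so the heap work is roughly O(n log k) instead of sorting all n words.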
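Approach (bucket sort)
- Bucket sort gives clear, concise code here and is worth recommending.
- First count the frequency of each word.
- Then build the buckets: a dictionary with key = frequency and value = list of words with that frequency.
- Track the maximum frequency max_f while building the buckets.
- Loop from max_f down to 1, sorting the words in each bucket and appending them to the result list.
- Remember to return only the first k entries of the result.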
Code (bucket sort)
```
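from collections import defaultdict
from typing import List


class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        # Count how often each word occurs.
        freq_dict = defaultdict(int)
        for w in words:
            freq_dict[w] += 1

        # Build the buckets: frequency -> list of words with that frequency.
        bucket_dict = defaultdict(list)
        max_f = 0
        for w, f in freq_dict.items():
            bucket_dict[f].append(w)
            max_f = max(max_f, f)

        # Walk the buckets from the highest frequency down to 1.
        res = []
        for f in range(max_f, 0, -1):
            if f in bucket_dict:
                # Words in one bucket tie on frequency, so sort them alphabetically.
                res += sorted(bucket_dict[f])
                if len(res) >= k:
                    break
        return res[:k]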
```
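As a variation on the same idea, the buckets can also be a plain list indexed by frequency, since no word can occur more than len(words) times. The sketch below is an assumption-level rewrite rather than the author's code; the function name `top_k_frequent_buckets` is hypothetical. It runs both examples from the problem statement.

```python
from collections import Counter
from typing import List


def top_k_frequent_buckets(words: List[str], k: int) -> List[str]:
    freq = Counter(words)
    # buckets[f] holds every word that occurs exactly f times (index 0 stays empty).
    buckets: List[List[str]] = [[] for _ in range(len(words) + 1)]
    for word, count in freq.items():
        buckets[count].append(word)

    result: List[str] = []
    for count in range(len(words), 0, -1):
        # Words in the same bucket tie on frequency, so order them alphabetically.
        result.extend(sorted(buckets[count]))
        if len(result) >= k:
            break
    return result[:k]


print(top_k_frequent_buckets(["i", "love", "leetcode", "i", "love", "coding"], 2))
# ['i', 'love']
print(top_k_frequent_buckets(
    ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], 4))
# ['the', 'is', 'sunny', 'day']
```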
Original post: https://www.cnblogs.com/journeyonmyway/p/12543887.html
