• Simple word-frequency counting in Python (source code)


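    Requirement: count how often each word appears in the text of Walden.

    Approach: first, read the file and split it on whitespace to get a list of words. Second, loop over the list and strip the punctuation from the start and end of each element. Third, remove the duplicate elements and build a dictionary that maps each distinct word to its count. Finally, sort the entries by count in descending order.

    Source code: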

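    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Author: qinjiaxi
    import string

    # Raw string, so the backslashes in the Windows path are not treated as escape sequences
    path = r"C:\Users\Administrator\Desktop\walden.txt"
    with open(path, 'r') as test:
        # Earlier attempt, kept for reference: it prints a line for every occurrence, not for every distinct word
        # words = test.read().split()
        # print(words)
        # for word in words:
        #     print('{}-{} times'.format(word, words.count(word)))
        # Strip leading and trailing punctuation from each word and convert it to lowercase
        words = [raw_word.strip(string.punctuation).lower() for raw_word in test.read().split()]
        words_index = set(words)  # remove duplicates
        # Dict comprehension: each key is a word, each value is how many times it appears in the file
        counts_dict = {index: words.count(index) for index in words_index}
    # Sort the words by their counts (the dictionary values), in descending order
    for word in sorted(counts_dict, key=lambda x: counts_dict[x], reverse=True):
        print('{}--{} times'.format(word, counts_dict[word]))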
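    The listing above calls words.count() once for every distinct word, so it rescans the whole list each time. As an alternative sketch (not the author's method), the same steps can be done in a single pass with collections.Counter from the standard library. The function name count_words and the sample sentence are made up for illustration, and unlike the listing above this version also drops tokens that consist only of punctuation.

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # Strip leading/trailing punctuation from each whitespace-separated token.
    stripped = (token.strip(string.punctuation).lower() for token in text.split())
    # Drop tokens that were pure punctuation and are now empty strings.
    return Counter(word for word in stripped if word)


# Stand-in text for illustration only; it is not taken from walden.txt.
sample = "The pond. The woods, the pond!"
counts = count_words(sample)

# most_common() yields (word, count) pairs sorted by count, highest first.
for word, times in counts.most_common():
    print('{}--{} times'.format(word, times))
```

    To run it on the real file, the text read from walden.txt can be passed to count_words in place of the sample string.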
  • Original article: https://www.cnblogs.com/qinlangsky/p/9475219.html