• English word frequency statistics


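    1. Preprocessing for word frequency statistics
    2. Download the lyrics of an English song or an English article
    3. Replace all separators such as ,.?!’: with spaces
    4. Convert all uppercase letters to lowercase
    5. Generate the word list
    6. Generate the word frequency counts
    7. Sort
    8. Exclude grammatical words: pronouns, articles, conjunctions
    9. Output the top 10 words by frequency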

    Code:

    # -*- coding:utf-8 -*-
    
    song = '''
    Nobody ever knows
    Nobody ever sees
    I left my soul
    Back then no I'm too weak
    Most nights I pray for you to come home
    Praying to the lord
    Praying for my soul
    Now please don't go
    Most nights I hardly sleep when I'm alone
    Now please don't go oh no
    I think of you whenever I'm alone
    So please don't go

    Cause I don't ever wanna know
    Don't ever want to see things change
    Cause when I'm living on my own
    I wanna take it back and start again
    Most nights I pray for you to come home
    I'm praying to the lord
    I'm praying for my soul
    Now please don't go
    Most nights I hardly sleep
    When I'm alone
    Now please don't go oh no
    I think of you whenever I'm alone
    So please don't go
    I sent so many messages
    You don't reply
    Gotta feel around what am I missing babe
    Singing now oh oh oh
    I need you now I need your love oh
    Now please don't go
    I said most nights I hardly sleep
    When I'm alone
    Now please don't go oh no
    I think of you whenever I'm alone
    So please don't go
    So please don't go
    So please don't go
    Oh no
    I think of you whenever I'm alone
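    So please don't go '''

    # Replace every separator character with a space
    symbol = list(''',.?!’:"“”-%$''')
    for i in symbol:
        song = song.replace(i, ' ')

    # Convert to lowercase and split into a word list
    song = song.lower()
    split = song.split()

    # Count whole words; song.count(i) would also match substrings (e.g. "no" inside "now")
    word = {}
    for i in split:
        word[i] = split.count(i)

    # Grammatical words (pronouns, articles, conjunctions, ...) to exclude
    words = ''' a an the in on to at and of is was are were i he she you your they us their our it or for be too do no that s so as but it's '''
    prep = words.split()
    for i in prep:
        # Remove the word if it is in the dictionary
        if i in word:
            del word[i]

    # Sort by count, descending, and print the top 10
    word = sorted(word.items(), key=lambda item: item[1], reverse=True)
    for i in range(10):
        print(word[i])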
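
The same steps can also be sketched more compactly with the standard library. This is only an illustrative variant, not the original author's method: the function name top_words, the short STOP_WORDS set, and the sample sentence are all assumptions made for the example. It tokenizes with a regular expression (so contractions such as "don't" stay whole) and counts with collections.Counter.

```python
import re
from collections import Counter

# Hypothetical stop-word set, for illustration only; extend it as needed.
STOP_WORDS = {"a", "an", "the", "to", "i", "you", "for", "so", "no", "of", "and"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    # Lowercase, then keep runs of letters with an optional internal
    # apostrophe, so punctuation is dropped and "don't" is one token.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Count whole tokens only, skipping the stop words.
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


# Hypothetical sample text for the example.
sample = "Now please don't go. So please don't go, oh no!"
print(top_words(sample, n=3))
```

Counter.most_common does the sorting and the top-N cut in one call, and counting tokens rather than substrings avoids matching "no" inside "now".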
  • Original article: https://www.cnblogs.com/verson/p/8629082.html