• A complete English word-frequency example using file input


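    Download a full-length English novel and analyse its word frequencies.

    1. Read in the string to analyse

    2. Split the text and extract the words

    3. Count the words in a dictionary

    4. Exclude grammatical (function) words

    5. Sort

    6. Print the top 20

    7. Briefly explain the output.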
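    Steps 1 and 2 can be sketched roughly as below. This is only an illustration, not the code used in this post: the helper name `extract_words` and the regular expression are assumptions, chosen so that apostrophes inside words such as "it's" survive while other punctuation is dropped.

```python
import re

def extract_words(text):
    """Lower-case the text and return its words.

    Hypothetical helper: a word is a run of letters, optionally
    followed by apostrophe-joined parts (e.g. "it's", "we've").
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

# Small inline sample; a real run would read the text from a file instead.
sample = "It's been a long day without you my friend\nDamn, who knew?"
words = extract_words(sample)
print(len(words), "words extracted")
```

    Matching words with a pattern, instead of replacing each punctuation mark by hand, means line breaks and commas need no special handling.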

    song='''It's been a long day without you my friend
    
    And I'll tell you all about it when I see you again
    
    We've come a long way from where we began
    
    Oh I'll tell you all about it when I see you again
    
    When I see you again
    
    Damn, who knew all the planes we flew?
    
    Good things we've been through
    
    That I'll be standing right here
    
    Talking to you about another path
    
    I know we loved to hit the road and laugh
    
    But something told me that it wouldn't last
    
    Had to switch up look at things different see the bigger picture
    
    Those were the days hard work forever pays
    
    Now I see you win the better place
    
    How could we not talk about family when family's all that we got?
    
    Everything I went through you were standing there by my side
    
    And now you gonna be with me for the last ride
    
    It's been a long day without you my friend
    
    And I'll tell you all about it when I see you again
    
    We've come a long way from where we began
    
    Oh I'll tell you all about it when I see you again
    
    When I see you again
    
    First you both go out your way?
    
    And the vibe is feeling strong and what's small turn to a friendship
    
    Turn into a bond and that bond will never be broken and the love will never get lost
    
    And when brotherhood come first then the line
    
    Will never be crossed established it on our own
    
    When that line had to be drawn and that line is what?
    
    We reach so remember me when I'm gone
    
    How could we not talk about family when family's all that we got?
    
    Everything I went through you were standing there by my side
    
    And now you gonna be with me for the last ride
    
    So let the light guide your way
    
    Hold every memory as you go
    
    And every road you take will always lead you home
    
    Hoo?
    
    It's been a long day without you my friend?
    
    And I'll tell you all about it when I see you again
    
    We've come a long way from where we began
    
    Oh I'll tell you all about it when I see you again
    
    When I see you again
    
    When I see you again, see you again
    
    When I see you again'''
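    # Read the text to analyse from a file; this replaces the sample string above
    with open('ph.txt','r') as a:
        song=a.read()

    song=song.lower()# normalise: convert upper case to lower case
    for i in '?,':
        song=song.replace(i,' ')# replace '?' and ',' with a space
    words=song.split()# split on any whitespace, including line breaks
    exp={'the','to','and','were','i','we','you','be','a'}# set of words excluded from the count

    dic={}
    key=set(words)-exp# set of keys
    for y in key:
        dic[y]=words.count(y)# count the occurrences of each word

    yz=list(dic.items())# list of (word, count) tuples
    yz.sort(key= lambda x:x[1],reverse=True)# sort the list by count, descending

    for item in yz[:20]:# print the 20 most frequent words (fewer if the text is short)
        print(item)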
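    As an alternative for steps 3 to 6, the counting and ranking could be done with `collections.Counter` from the standard library. The sketch below is a variation and not the method used above; the function name `top_words` and the tiny sample input are made up for illustration.

```python
from collections import Counter

def top_words(words, stop_words, n=20):
    """Return up to n (word, count) pairs, most frequent first.

    Hypothetical helper: words found in stop_words are skipped.
    """
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

# Tiny sample; in practice `words` would be the full word list from the text.
sample_words = "when i see you again see you again".split()
stop_words = {"i", "you"}
for word, count in top_words(sample_words, stop_words, n=3):
    print(word, count)
```

    `Counter` tallies every word in a single pass, whereas calling `words.count(y)` for each distinct word rescans the whole list every time, which becomes slow on a novel-length text.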

     

  • Original article: https://www.cnblogs.com/gdlyzx/p/7602852.html