• Comprehensive exercise: word frequency counting


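    Download the lyrics of an English song, or an English article

    Replace all separators such as ,.?!’: with spaces

    Convert all uppercase letters to lowercase

    Generate the word list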

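    # Step 1: read the file, replace separators with spaces, print each word.
    with open('news.txt', 'r') as f:
        news = f.read()
    sep = ''',.'!"?:'''
    for c in sep:
        news = news.replace(c, ' ')
    # Lowercase and split once, after every separator has been replaced.
    wordList = news.lower().split()

    for w in wordList:
        print(w)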
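
The repeated `replace` calls above can also be written as a single pass with `str.translate`. The following is only a minimal sketch, not the post's own method: the helper name `split_words` and the sample sentence are made up for illustration, and the curly apostrophe ’ from the step list is added to the separators as an assumption.

```python
# Hypothetical helper; split_words is an illustrative name, not from the post.
SEP = ''',.'!"?:’'''  # separators from the exercise, plus the curly apostrophe

# Build a translation table that maps every separator character to a space.
TABLE = str.maketrans(SEP, ' ' * len(SEP))


def split_words(text):
    """Replace separators with spaces, lowercase, and split on whitespace."""
    return text.translate(TABLE).lower().split()


if __name__ == '__main__':
    print(split_words("Hello, world! It's fine: isn’t it?"))
```

Note that treating apostrophes as separators splits contractions such as "it's" into two tokens ("it" and "s"), which matches the behaviour of the original loop.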

    # Step 2: count how often each distinct word occurs.
    with open('news.txt', 'r') as f:
        news = f.read()
    sep = ''',.'!"?:'''
    for c in sep:
        news = news.replace(c, ' ')
    wordList = news.lower().split()
    wordDict = {}
    wordSet = set(wordList)  # distinct words only
    for w in wordSet:
        wordDict[w] = wordList.count(w)
    for w in wordDict:
        print(w, wordDict[w])
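
Calling `wordList.count(w)` once per distinct word rescans the whole list each time, which gets slow for long texts. As an alternative sketch (not the approach used in the post), the standard library's `collections.Counter` tallies every word in a single pass. The function name `count_words` and the sample list are illustrative only.

```python
from collections import Counter


def count_words(word_list):
    """Return a dict mapping each word to its number of occurrences."""
    return dict(Counter(word_list))


if __name__ == '__main__':
    sample = ['let', 'it', 'go', 'let', 'it', 'go', 'can', 't']
    counts = count_words(sample)
    for w in counts:
        print(w, counts[w])
```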

    # Step 3: same count, but skip words listed in the exclude set.
    with open('news.txt', 'r') as f:
        news = f.read()
    sep = ''',.'!"?:'''
    exclude = {'be', 'i', 'so', 'over', 'hearing'}
    for c in sep:
        news = news.replace(c, ' ')
    wordList = news.lower().split()
    wordDict = {}
    wordSet = set(wordList) - exclude  # set difference drops excluded words
    for w in wordSet:
        wordDict[w] = wordList.count(w)
    for w in wordDict:
        print(w, wordDict[w])

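    # Step 4: sort by frequency and print the 20 most common words.
    with open('news.txt', 'r') as f:
        news = f.read()
    sep = ''',.'!"?:'''
    exclude = {'be', 'i', 'so', 'over', 'hearing'}
    for c in sep:
        news = news.replace(c, ' ')
    wordList = news.lower().split()
    wordDict = {}
    wordSet = set(wordList) - exclude
    for w in wordSet:
        wordDict[w] = wordList.count(w)

    # Sort the (word, count) pairs by count, highest first.
    dic = sorted(wordDict.items(), key=lambda d: d[1], reverse=True)
    print(dic)
    # Slicing avoids an IndexError when there are fewer than 20 distinct words.
    for item in dic[:20]:
        print(item)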
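
Because `wordSet` is a set of strings, words with equal counts can come out of `sorted` in a different order from one run to the next. One possible way to make the top-20 listing reproducible is to sort by descending count and then alphabetically. This is a sketch under that assumption; the helper name `top_words` and the sample counts are hypothetical.

```python
def top_words(word_dict, n=20):
    """Return up to n (word, count) pairs, highest count first, ties in A-Z order."""
    # Negating the count sorts it descending while the word stays ascending.
    ranked = sorted(word_dict.items(), key=lambda d: (-d[1], d[0]))
    return ranked[:n]


if __name__ == '__main__':
    sample = {'go': 3, 'let': 3, 'it': 5, 'cold': 1}
    for word, count in top_words(sample, 3):
        print(word, count)
```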

    # Print the raw text of the input file for reference.
    with open('news.txt', 'r') as f:
        text = f.read()
    print(text)

  • Original article: https://www.cnblogs.com/dean666/p/8653994.html