• 爬虫大作业


    1.选一个自己感兴趣的主题。

    2.用python 编写爬虫程序,从网络上爬取相关主题的数据。

    3.对爬了的数据进行文本分析,生成词云。

    4.对文本分析结果进行解释说明。

    5.写一篇完整的博客,描述上述实现过程、遇到的问题及解决办法、数据分析思想及结论。

    6.最后提交爬取的全部数据、爬虫及数据分析源代码。

           此次实验中,通过爬取网易云音乐-遥远的她网页中的热门评论来做词频分析
    爬取热门评论并保存至txt文本的代码:

    # -*- coding : UTF-8 -*-
    # -*- author : Kamchuen -*-
    # -*- file : project1 -*-
    # -*- time: 2018/4/28 -*-
    # -*- description: 歌曲热门评论获取-*-
    # -*- songname: 遥远的她-*-
    
    import requests
    import json
    
    def getcomments(musicid):
        url = 'http://music.163.com/weapi/v1/resource/comments/R_SO_4_191232?csrf_token=5594eaee83614ea8ca9017d85cd9d1b3'.format(musicid)
        payload = {
            'params': '5KX6Y3k92jAA0++OK5aeNVeABXbc97L0PMviuChCa4cr1OW3wHCWDSLEtRyQwXdMC2+BbLjG3qA8N9gLOgL5WPBSl2qyIv++7YRnEn3BxO2hq6opl5B3Jgbi/I/PPkxneLfgYuIo71F0VxqAVaoJlMefBej5M/fZ5eO3YkvOKWaAHbtJYL+g17z8XskUfcHm',
            'encSecKey': '5baedada4b87cc31a46b38d0a3893895da30fe3bddaf8a24020ce17fa004f2e646cb15e109c4786e55eca0e88dbd6dc905aa55fd6cd7f4de1e625937d986dd4e1217a19e4f4a2f0e00504763308148ab3215ae25a5ac99e65ba8e46f2f8a734182fc20c5a7281f1d127a513a954b17d66a75f4a06f0044fc6c783dd356b017a7'
        }
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
            'Referer': 'http://music.163.com/song?id={}'.format(musicid),
            'Host': 'music.163.com',
            'Origin': 'http://music.163.com'
        }
    
        response = requests.post(url=url, headers=headers, data=payload)
        data = json.loads(response.text)
        hotcomments = []
        for hotcomment in data['hotComments']:
            item = {
                'nickname': hotcomment['user']['nickname'],
                'content': hotcomment['content']
            }
            hotcomments.append(item)
    
        # 返回热门评论
        return [content['content'] for content in hotcomments]
    
    if __name__ == '__main__':
        hot = getcomments(191232)
        print(hot)
        file = open('hot.txt','w')
        for hotword in hot:
               file.write(hotword+'
    ')
        file.close()

    打开txt文本生成词云的代码:

    # -*- coding : UTF-8 -*-
    # -*- author : Kamchuen -*-
    # -*- file : wordcloudTest1 -*-
    # -*- time: 2018/4/28 -*-
    # -*- description: 词云测试-*-
    from os import path
    from PIL import Image
    import numpy as np
    import matplotlib.pyplot as plt
    
    from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
    
    d = path.dirname(__file__)
    
    
    text = open(path.join(d, 'hot.txt')).read()
    
    
    alice_coloring = np.array(Image.open(path.join(d, "奥特曼.jpg")))
    stopwords = set(STOPWORDS)
    stopwords.add("said")
    
    wc = WordCloud(background_color="white", max_words=2000, mask=alice_coloring,
                   stopwords=stopwords, max_font_size=85, font_path="STHUPO.TTF",random_state=50)
    
    wc.generate(text)
    
    image_colors = ImageColorGenerator(alice_coloring)
    
    # show
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.figure()
    plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
    plt.axis("off")
    plt.figure()
    plt.imshow(alice_coloring, cmap=plt.cm.gray, interpolation="bilinear")
    plt.axis("off")
    plt.show()

    结果:

  • 相关阅读:
    爬虫1:get请求的翻页及思考
    Selenium终极自动化测试环境搭建(一) Selenium+Eclipse+Junit+TestNG
    理解RESTful架构(转)
    vue-devtools(vue 2.0)手动安装与使用 ? 如何处理Vue.js is detected on this page ?
    小程序开放更多连接能力
    vscode 开发工具
    video 在微信中,安卓全屏和ios上如何内联播放?H5同层播放器相关接入规范
    微信小程序<web-view>嵌入网页后,小程序如何和网页交互传值?
    SpringCloud(三) 微服务架构-微服务稳定性
    SpringCloud(二) 微服务架构-服务注册与发现
  • 原文地址:https://www.cnblogs.com/LauSir139/p/8970094.html
Copyright © 2020-2023  润新知