• [Python] Analyzing my own blog https://www.cnblogs.com/xiandedanteng/p/?page=XX to see how many posts I published each month


    To run the program below you need Beautiful Soup and requests installed. For installation instructions, see: https://www.cnblogs.com/xiandedanteng/p/8668492.html

    # Analyze my blog https://www.cnblogs.com/xiandedanteng/p/?page=XX and count posts per month
    import re

    import requests
    from bs4 import BeautifulSoup

    user_agent = 'Mozilla/4.0 (compatible;MEIE 5.5;windows NT)'
    headers = {'User-Agent': user_agent}

    dic = {}  # maps "YYYY-MM" to the number of posts in that month

    # Pre-fill every month from 2013-08 through 2019-11 with 0,
    # so months without any posts still appear in the output
    for year in range(2013, 2020):
        first = 8 if year == 2013 else 1
        last = 11 if year == 2019 else 12
        for month in range(first, last + 1):
            dic["{}-{:02d}".format(year, month)] = 0

    # Walk through list pages 1..89
    for i in range(1, 90):
        html = requests.get('http://www.cnblogs.com/xiandedanteng/p/?page=' + str(i), headers=headers)
        # html.text is already decoded to str, so from_encoding would be ignored
        soup = BeautifulSoup(html.text, 'html.parser')

        for descDiv in soup.find_all(class_="postDesc2"):
            rawInfo = descDiv.text  # text of the div with class="postDesc2"
            match = re.search(r'\d{4}-\d{2}', rawInfo)  # find the year-month with a regex
            if match is None:
                continue  # no date in this description; skip it
            yearMonth = match.group()

            # Add one to this month's count, creating the entry if it is missing
            dic[yearMonth] = dic.get(yearMonth, 0) + 1

    # Print the dictionary
    for item in dic.items():
        print(item)

    The result:

    ('2013-08', 28)
    ('2013-09', 43)
    ('2013-10', 14)
    ('2013-11', 15)
    ('2013-12', 4)
    ('2014-01', 8)
    ('2014-02', 5)
    ('2014-03', 3)
    ('2014-04', 14)
    ('2014-05', 14)
    ('2014-06', 1)
    ('2014-07', 26)
    ('2014-08', 15)
    ('2014-09', 2)
    ('2014-10', 7)
    ('2014-11', 12)
    ('2014-12', 22)
    ('2015-01', 14)
    ('2015-02', 4)
    ('2015-03', 0)
    ('2015-04', 6)
    ('2015-05', 4)
    ('2015-06', 5)
    ('2015-07', 10)
    ('2015-08', 7)
    ('2015-09', 0)
    ('2015-10', 0)
    ('2015-11', 1)
    ('2015-12', 2)
    ('2016-01', 0)
    ('2016-02', 9)
    ('2016-03', 15)
    ('2016-04', 0)
    ('2016-05', 1)
    ('2016-06', 1)
    ('2016-07', 17)
    ('2016-08', 12)
    ('2016-09', 0)
    ('2016-10', 1)
    ('2016-11', 0)
    ('2016-12', 0)
    ('2017-01', 20)
    ('2017-02', 3)
    ('2017-03', 2)
    ('2017-04', 1)
    ('2017-05', 1)
    ('2017-06', 21)
    ('2017-07', 9)
    ('2017-08', 38)
    ('2017-09', 80)
    ('2017-10', 5)
    ('2017-11', 32)
    ('2017-12', 21)
    ('2018-01', 7)
    ('2018-02', 0)
    ('2018-03', 19)
    ('2018-04', 56)
    ('2018-05', 45)
    ('2018-06', 2)
    ('2018-07', 2)
    ('2018-08', 0)
    ('2018-09', 0)
    ('2018-10', 0)
    ('2018-11', 0)
    ('2018-12', 0)
    ('2019-01', 0)
    ('2019-02', 0)
    ('2019-03', 37)
    ('2019-04', 1)
    ('2019-05', 2)
    ('2019-06', 0)
    ('2019-07', 1)
    ('2019-08', 18)
    ('2019-09', 42)
    ('2019-10', 66)
    ('2019-11', 17)
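    The counting logic in the script can be checked offline, without fetching any pages. The sketch below is an illustrative reworking, not the author's code: `month_keys` and `count_by_month` are hypothetical helper names, and the sample description strings are made up, since the exact text inside a `postDesc2` div is assumed here. It uses only the standard library, and `month_keys` walks from a start month to an end month instead of writing one loop per year.

```python
import re

# Matches a "YYYY-MM" prefix such as the one in "2019-11-04"
YEAR_MONTH = re.compile(r'\d{4}-\d{2}')


def month_keys(start_year, start_month, end_year, end_month):
    """Return every "YYYY-MM" key from start to end, inclusive, in order."""
    keys = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        keys.append("{}-{:02d}".format(year, month))
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return keys


def count_by_month(desc_texts, keys):
    """Tally posts per month; months in `keys` with no posts stay at 0."""
    counts = dict.fromkeys(keys, 0)
    for text in desc_texts:
        match = YEAR_MONTH.search(text)
        if match:  # skip descriptions that contain no date
            month = match.group()
            counts[month] = counts.get(month, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample strings; the real postDesc2 text format is an assumption.
    samples = [
        "posted @ 2019-11-01 08:30 xiandedanteng",
        "posted @ 2019-11-03 21:10 xiandedanteng",
        "posted @ 2019-09-15 10:00 xiandedanteng",
    ]
    for item in count_by_month(samples, month_keys(2019, 9, 2019, 11)).items():
        print(item)
```

    Run as a script, this prints ('2019-09', 1), ('2019-10', 0) and ('2019-11', 2). Calling `month_keys(2013, 8, 2019, 11)` yields the same 76 keys the script pre-fills, in the same order.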

    Copy this text into Notepad++, remove the parentheses, and save it as a CSV file. Then open the file in Excel to generate a chart.
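    The manual Notepad++ step can also be skipped by writing the dictionary straight to CSV with Python's standard `csv` module. This is a minimal sketch, not part of the original project: `write_month_counts` is a hypothetical helper, and the header names `month` and `posts` are assumptions.

```python
import csv


def write_month_counts(path, counts):
    """Write a header row plus one "YYYY-MM,count" row per month."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["month", "posts"])  # assumed header names
        for month, count in counts.items():
            writer.writerow([month, count])
```

    In the script above, calling `write_month_counts("posts_per_month.csv", dic)` after the loop (the file name is only an example) would produce a two-column file that Excel can open and chart directly.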

    Project download: https://files.cnblogs.com/files/xiandedanteng/6.everyMonthMyblog20191104.rar

    --END-- 2019-11-04 09:06:52

  • Original article: https://www.cnblogs.com/heyang78/p/11790284.html