• Python data scraping (combined into one project with the nationwide epidemic map)


    1. Task

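    This task builds on the previous epidemic map. The goal is to scrape data from a website in real time and use it to update both the database and the map page. I originally planned to write the crawler in Java, but I didn't know how. According to material I found online, the Java approach requires writing an entity class, writing the code that fetches the HTML page, and then connecting to the database. It comes down to three steps: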

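    (1) Set up the URL, URLConnection, and BufferedReader.

    (2) Define a regular expression and use it to parse the data read from the stream.

    (3) Store the matching data in a list and in the database.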

     Reference blog: https://blog.csdn.net/wjf_1997/article/details/78245702
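
    For comparison, the same three steps can be sketched in Python using only the standard library. This is a minimal sketch, not the linked blog's Java code. `fetch_lines` and `extract_matches` are hypothetical helper names, and the regex you pass in depends entirely on the target page's HTML.

```python
import re
import urllib.request


def fetch_lines(url, encoding="utf-8", timeout=30):
    """Step (1): open the URL and read the response line by line.

    Roughly what URL + URLConnection + BufferedReader do in Java.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Iterating an HTTP response yields raw byte lines
        return [line.decode(encoding, errors="replace") for line in resp]


def extract_matches(lines, pattern):
    """Steps (2) and (3): apply a regex to each line and collect matches in a list."""
    regex = re.compile(pattern)
    results = []
    for line in lines:
        # findall returns the captured group for each match when the pattern has one group
        results.extend(regex.findall(line))
    return results
```

    For example, `extract_matches(['<td>Wuhan</td><td>100</td>'], r'<td>(.*?)</td>')` returns `['Wuhan', '100']`. Storing the list in a database would still be a separate step.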

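    However, this approach is slow to develop and also requires writing a DAO layer. I went with Python instead, because it is easier to work with and simpler overall.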

    Without further ado, here is the code:

    import json

    import pymysql
    import requests
    
    
    def create():
        # Connect to the database; keyword arguments work across PyMySQL versions
        db = pymysql.connect(host="localhost", user="root", password="0000",
                             database="grabdata_test", charset="utf8")
        try:
            cursor = db.cursor()
            # Drop any existing table so each run starts from a clean slate
            cursor.execute("DROP TABLE IF EXISTS info")

            sql = """CREATE TABLE info (
                    Id INT PRIMARY KEY AUTO_INCREMENT,
                    Date VARCHAR(255),
                    Province VARCHAR(255),
                    City VARCHAR(255),
                    Confirmed_num VARCHAR(255),
                    Yisi_num VARCHAR(255),
                    Cured_num VARCHAR(255),
                    Dead_num VARCHAR(255),
                    Code VARCHAR(255))"""
            cursor.execute(sql)
        finally:
            db.close()
    
    
    def insert(value):
        db = pymysql.connect(host="localhost", user="root", password="0000",
                             database="grabdata_test", charset="utf8")
        # Parameterized query: PyMySQL escapes the value bound to each %s
        sql = ("INSERT INTO info(Date, Province, City, Confirmed_num, Yisi_num, "
               "Cured_num, Dead_num, Code) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")
        try:
            cursor = db.cursor()
            cursor.execute(sql, value)
            db.commit()
            print('Insert succeeded')
        except pymysql.MySQLError:
            db.rollback()
            print('Insert failed')
        finally:
            db.close()
    
    
    create()  # create the table

    url = 'https://raw.githubusercontent.com/BlankerL/DXY-2019-nCoV-Data/master/json/DXYArea.json'
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    versionInfo = response.text
    # print(versionInfo)  # print the scraped data

    # json.load() reads from a file object; json.loads() parses a string already in memory
    jsonData = json.loads(versionInfo)

    date = '2020-3-10'  # crawl date, hardcoded; every inserted row gets this value

    # Walk the province records and insert one row per city
    for province in jsonData['results']:
        if province['countryName'] != '中国':  # keep only records for China
            continue
        provinceShortName = province['provinceName']
        if provinceShortName == '待明确地区':  # skip the "location to be confirmed" bucket
            continue

        # Treat a missing or null city list as empty
        for city in province.get('cities') or []:
            insert((date, provinceShortName, city['cityName'],
                    city['confirmedCount'], city['suspectedCount'],
                    city['curedCount'], city['deadCount'], city['locationId']))
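
    The loop at the end of the script mixes JSON traversal with database writes. A minimal sketch that moves the traversal into a pure function, so it can be checked without MySQL or network access. `flatten_dxy_area` is a hypothetical name. The field names are simply the ones the script already reads from DXYArea.json, and the sketch assumes that layout.

```python
import json


def flatten_dxy_area(json_text, date, country="中国", skip_province="待明确地区"):
    """Turn DXYArea.json text into row tuples for the `info` table.

    Assumes a top-level "results" list whose entries carry countryName,
    provinceName and an optional "cities" list (the fields the script reads).
    """
    data = json.loads(json_text)  # loads() parses a str; load() would read a file object
    rows = []
    for province in data["results"]:
        if province.get("countryName") != country:
            continue
        name = province["provinceName"]
        if name == skip_province:
            continue
        # A missing or null city list contributes no rows
        for city in province.get("cities") or []:
            rows.append((date, name, city["cityName"], city["confirmedCount"],
                         city["suspectedCount"], city["curedCount"],
                         city["deadCount"], city["locationId"]))
    return rows
```

    In the script, this could replace the loop with `for row in flatten_dxy_area(versionInfo, date): insert(row)`.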

    Result:

    The data has been imported into the database.

     The epidemic map page running with the new data:
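
    The `insert()` function opens and closes a MySQL connection for every city row, which is likely the slowest part when several hundred rows arrive at once. Below is a minimal sketch of a batched alternative. It assumes that any DB-API 2.0 connection, PyMySQL's included, is passed in. `COLUMNS`, `build_insert_sql` and `insert_rows` are hypothetical names. `executemany` sends all rows inside one transaction.

```python
# Column names from the `info` table created above
COLUMNS = ("Date", "Province", "City", "Confirmed_num", "Yisi_num",
           "Cured_num", "Dead_num", "Code")


def build_insert_sql(table, columns, placeholder="%s"):
    """Build a parameterized INSERT; PyMySQL uses %s, sqlite3 uses ?.

    Table and column names are interpolated directly, so they must be trusted constants.
    """
    marks = ", ".join([placeholder] * len(columns))
    return f"INSERT INTO {table}({', '.join(columns)}) VALUES ({marks})"


def insert_rows(conn, sql, rows):
    """Insert all rows over one connection and commit once; roll back on any error."""
    cur = conn.cursor()
    try:
        cur.executemany(sql, rows)
        conn.commit()
        return len(rows)
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

    With PyMySQL this would be called as `insert_rows(pymysql.connect(...), build_insert_sql("info", COLUMNS), rows)`. With sqlite3 you would pass `placeholder="?"` instead.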

  • Original article: https://www.cnblogs.com/hang-hang/p/12465491.html