1. Web page analysis
Inspect the request headers and look at the query parameters of the request URL:
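- name=disease_h5 identifies the dataset to fetch
- callback=jQuery341021795676971428168_1580642523637 and _=1580642523638 are the JSONP callback name and a cache-busting value; both are built from the current timestamp in milliseconds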
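To make the two parameters concrete, here is a minimal sketch that splits such a URL into its query parameters and decodes the timestamp. The full URL is an assumed reconstruction (the `&` separators are not shown in the captured parameters), so treat it as an illustration rather than the exact request.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# Assumption: the parameters listed above are joined with '&' in the real request.
url = (
    "https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5"
    "&callback=jQuery341021795676971428168_1580642523637"
    "&_=1580642523638"
)

# parse_qs returns a list per key; keep the first value of each.
params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

# The '_' value is a Unix timestamp in milliseconds; drop the milliseconds.
ts_ms = int(params["_"])
requested_at = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc)

print(params["name"])
print(requested_at.isoformat())
```

Only `name` is needed to get plain JSON back, which is why the scraper below leaves out the other two parameters.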
2. Data preparation
Import the modules:
import time
import json
import requests
from datetime import datetime
import pandas as pd
import numpy as np
Fetch the data:
def catch_data():
    url = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5'
    response = requests.get(url=url).json()
    # The 'data' field holds a JSON string; decode it and return the resulting dict
    data = json.loads(response['data'])
    return data
data = catch_data()
data.keys()
dict_keys(['chinaTotal', 'chinaAdd', 'lastUpdateTime', 'areaTree', 'chinaDayList', 'chinaDayAddList'])
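The reason `catch_data` calls `json.loads` after `.json()` is that the `data` field arrives as a JSON string nested inside the outer JSON object. The sketch below reproduces that double decoding offline; the `sample` payload and the `parse_payload` helper are hypothetical and hand-made for illustration, not a captured response.

```python
import json

def parse_payload(payload):
    """Decode the JSON string stored under the 'data' key of the outer dict."""
    return json.loads(payload["data"])

# Hypothetical payload: the inner object is serialized to a string,
# mimicking the shape that catch_data() has to unwrap.
inner = {"chinaTotal": {"confirm": 17238}, "lastUpdateTime": "<timestamp string>"}
sample = {"data": json.dumps(inner)}

print(type(sample["data"]).__name__)   # the raw field is a str, not a dict
decoded = parse_payload(sample)
print(list(decoded.keys()))
```

Keeping the decoding step in its own function makes it possible to test the parsing without touching the network.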
Process the data:
# The dataset contains: national totals, national new cases, last update time, detailed breakdown, daily data, daily new cases
lastUpdateTime = data['lastUpdateTime']
chinaTotal = data['chinaTotal']
chinaAdd = data['chinaAdd']
print(chinaTotal)
print(chinaAdd)
{'confirm': 17238, 'suspect': 21558, 'dead': 361, 'heal': 475}
{'confirm': 2858, 'suspect': 2014, 'dead': 57, 'heal': 147}
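Since pandas is already imported, one possible next step is to put the two summary dicts side by side in a small DataFrame. This is only a sketch: the dicts are copied from the printed output above, and the column names `total` and `added` are my own choice.

```python
import pandas as pd

# Values copied from the printed output above.
chinaTotal = {'confirm': 17238, 'suspect': 21558, 'dead': 361, 'heal': 475}
chinaAdd = {'confirm': 2858, 'suspect': 2014, 'dead': 57, 'heal': 147}

# A dict of dicts becomes one column per outer key and one row per inner key.
summary = pd.DataFrame({'total': chinaTotal, 'added': chinaAdd})

print(summary)
```

The row index is the metric name (`confirm`, `suspect`, `dead`, `heal`), so a single value can be read with `summary.loc['confirm', 'total']`.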