• Fetching weather data with Python


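    I have been learning web scraping with Python recently. Some sites require the request data to be submitted in encrypted form, so I am recording the scraping process here for my own study and reference.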

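    1. Target site

    • The China Air Quality Online Monitoring and Analysis Platform (中国空气质量在线监测分析平台) covers PM2.5 and weather data for 367 cities nationwide. The monitored items include AQI, PM2.5, PM10, SO2, NO2, O3, CO, temperature, humidity, wind scale, wind direction, and satellite cloud imagery.
    • URL: https://www.aqistudy.cn/html/city_detail.php?v=1.10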

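    2. Analyzing the page

    Open the URL and press F12 to open the browser's developer tools, then click the query button. The page sends a POST request whose form data is encrypted.

    Searching the scripts for the name of the submitted parameter leads to the code that builds the encrypted value,

    which is where the key parameters can be located.


    That JS file contains only part of the encryption code, however. A global search shows that the rest
    is produced by an eval call; deobfuscating that call yields the remaining JS source.


    Repeated testing showed that the function names and keys in the JS file that submits the request change dynamically. Their source turned out to be
    a script referenced from the main page.

    Combining the two pieces of JS gives the complete encryption and decryption code. Regular expressions then extract the encryption function name, the submitted parameter name, and the decryption function name.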
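The extraction step can be sketched in isolation. This is only an illustration: `SAMPLE_JS` and every identifier inside it (`pQx8Encode`, `hA4Nse2cT`, `dX7Decode`) are made-up stand-ins for the dynamically generated script, and the patterns assume the script has roughly this shape.

```python
import re

# Hypothetical stand-in for the dynamically generated script. The real
# function and parameter names differ and change between page loads.
SAMPLE_JS = """
function getServerData(method, object, callback, period) {
    var param = pQx8Encode(method, object);
    $.ajax({
        url: '../apinew/aqistudyapi.php',
        data: { hA4Nse2cT: param },
        type: "post",
        success: function (data) {
            data = dX7Decode(data);
            callback(data);
        }
    });
}
"""

def extract_names(js_source):
    """Return (encrypt function, POST parameter, decode function) names."""
    # The literal "(" after the function name must be escaped in the pattern.
    encrypt_func = re.search(r'var param = (\w+)\(', js_source).group(1)
    # The POST parameter is the key whose value is the encrypted `param`.
    post_param = re.search(r'data:\s*\{\s*(\w+)\s*:\s*param', js_source).group(1)
    # The decode call is taken as the last "data = name(" assignment.
    decode_func = re.findall(r'data = (\w+)\(', js_source)[-1]
    return encrypt_func, post_param, decode_func

names = extract_names(SAMPLE_JS)
print(names)
```

Capturing with `\w+` rather than `.*` keeps each match to a bare identifier, which is the form needed when passing the name to a JS runtime.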

    3. Implementation

    """
    ===================================
        -*- coding:utf-8 -*-
        Author     :GadyPu
        E_mail     :Gadypy@gmail.com
        Time       :2020/8/18 0010 PM 01:31
        FileName   :go_aqi.py
    ===================================
    """
    import re
    import json
    import execjs
    import requests
    import warnings
    from lxml import etree
    warnings.filterwarnings('ignore')
    
    class GetWeather(object):
        def __init__(self):
            self.url = 'https://www.aqistudy.cn/html/city_detail.php?v=1.10'
            self.api = 'https://www.aqistudy.cn'
            self.headers = {
                'Host': 'www.aqistudy.cn',
                'Origin': 'https://www.aqistudy.cn',
                'Referer': 'https://www.aqistudy.cn/html/city_detail.php?v=1.10',
                'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
                'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; ZTE BA520 Build/MRA58K; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/55.0.2883.77 Mobile Safari/537.36',
            }
            self.req_url = 'https://www.aqistudy.cn/apinew/aqistudyapi.php'
            self.js_code = None
            self.js_ctx = None
            self.encrypt_func = None
            self.encrypt_param = None
            self.decode_func = None
    
        def init_js_code(self):
            try:
                response = requests.get(url = self.url, headers = self.headers, verify = False)
                html = etree.HTML(response.text)
                js_path = html.xpath('/html/body/script[2]/@src')[0]
                #print(self.api + js_path[2:])
    
                response = requests.get(url = self.api + js_path[2:], headers = self.headers, verify = False)
                #print(response.text)
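                # Name of the function that encrypts the request parameters.
                # The "(" after the name has to be escaped to match literally.
                pat_f = r'var param = ([\w$]+)\('
                self.encrypt_func = re.findall(pat_f, response.text)[0]
                #print(self.encrypt_func)
                # Name of the parameter submitted in the POST body
                pat_p = r'(?<={).+(?=})'
                self.encrypt_param = re.search(pat_p, response.text)[0].split(':')[0].strip()
                #print(self.encrypt_param)
                # Name of the decode function (the last "data = name(" assignment)
                pat_d = r'data = ([\w$]+)\('
                self.decode_func = re.findall(pat_d, response.text)[-1]
                #print(self.decode_func)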
    
                # Combine the local JS (gogo.js) with the downloaded script
                with open('gogo.js', 'r', encoding = 'utf-8') as fp:
                    self.js_code = fp.read()
                self.js_code += response.text
                self.js_ctx = execjs.compile(self.js_code)
            except Exception as e:
                print(e)
    
        def get_weather_data(self, method, obj):
            param = {
                self.encrypt_param: self.js_ctx.call(self.encrypt_func, method, obj)
            }
            print(param)
            response = requests.post(url = self.req_url, headers = self.headers, data = param, verify = False)
            return self.js_ctx.call(self.decode_func, response.text)
    
        def run(self, method, obj, months: list):
            self.init_js_code()
            for mon in months:
                # Only August is processed here; remove this check to cover every month.
                if mon[0] == '2020-08-01':
                    obj.update({"startTime":f"{mon[0]} 00:00:00"})
                    obj.update({"endTime":f"{mon[1]} 00:00:00"})
                    js_data = self.get_weather_data(method, obj)
                    # with open('urumqi_weather_2020.json', 'a', encoding = 'utf-8') as wf:
                    #     wf.write(js_data + '\n')
                    js_data = json.loads(js_data)['result']['data']['rows']
                    max_per_month = max(js_data, key = lambda x: float(x['temp']))['temp']
                    min_per_month = min(js_data, key = lambda x: float(x['temp']))['temp']
                    ave_per_month = sum([float(i['temp']) for i in js_data])
                    print([max_per_month, min_per_month, round(ave_per_month / len(js_data), 1)])
    
    
    if __name__ == '__main__':
    
        obj = {"city":"乌鲁木齐","type":"DAY","startTime":"2020-08-01 00:00:00","endTime":"2020-08-14 00:00:00"}
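        # 'GETCITYWEATHER' returns weather data such as temperature, humidity, and wind force
        # 'GETDETAIL' returns pm2.5, co, so2, ...
        method = "GETCITYWEATHER"
        # Last day of each month, January to August 2020 (a leap year, so February has 29 days); August stops at the 18th
        ll = map(lambda m, d: "2020" + '-' + "%02d"%m + '-' + "%02d"%d, range(1, 9), [31, 29, 31, 30, 31, 30, 31, 18])
        # (first day, last day) pairs, one per month
        months = map(lambda x, y: ("2020" + '-' + "%02d"%x + '-' + "01", y), range(1, 9), ll)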
        d = GetWeather()
        d.run(method, obj, months)
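The month ranges in the main block are built from a hand-written list of month lengths. As a possible alternative, the standard library can compute them. The helper below is a sketch; `month_ranges` and its `last_day` argument are names introduced here for illustration and are not part of the original script.

```python
import calendar
from datetime import date

def month_ranges(year, first_month, last_month, last_day=None):
    """Return one (start, end) pair of ISO date strings per month.

    If last_day is given, the final month ends on that day instead of
    on its last calendar day.
    """
    ranges = []
    for month in range(first_month, last_month + 1):
        # monthrange returns (weekday of the 1st, number of days in the month)
        days_in_month = calendar.monthrange(year, month)[1]
        end_day = days_in_month
        if month == last_month and last_day is not None:
            end_day = min(last_day, days_in_month)
        start = date(year, month, 1).isoformat()
        end = date(year, month, end_day).isoformat()
        ranges.append((start, end))
    return ranges

ranges = month_ranges(2020, 1, 8, last_day=18)
for start, end in ranges:
    print(start, end)
```

Because `calendar.monthrange` accounts for leap years, February 2020 ends on the 29th without any special handling.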

    4. Result
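For each month it processes, the script prints a list holding the highest, lowest, and mean temperature. The summary step can be sketched on its own as follows. The rows are made-up sample values, not data from the site, and `summarize_temps` is a hypothetical helper. The only assumption carried over from the script is that each row has a `temp` field holding a numeric string.

```python
def summarize_temps(rows):
    """Return (max, min, mean rounded to 1 decimal) of the 'temp' values.

    Rows with a missing or empty 'temp' are skipped. Returns None when
    no usable value is left.
    """
    temps = [float(row['temp']) for row in rows
             if row.get('temp') not in (None, '')]
    if not temps:
        return None
    return max(temps), min(temps), round(sum(temps) / len(temps), 1)

# Made-up sample rows for illustration only.
sample_rows = [{'temp': '24'}, {'temp': '31.5'}, {'temp': '27'}, {'temp': ''}]
summary = summarize_temps(sample_rows)
print(summary)
```

Skipping empty values is a defensive choice made here. The original script converts every row directly, so it would raise an error on a blank reading.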

    5. References

    https://www.cnblogs.com/bobo-zhang/p/11243138.html

  • Original article: https://www.cnblogs.com/GadyPu/p/13524486.html