• Scrapy Learning 12: Randomly Changing the User-Agent with a DownloaderMiddleware


    Randomly replace the User-Agent in the request headers
    Use an open-source GitHub project to switch and manage User-Agent values dynamically:
    https://github.com/hellysmile/fake-useragent
     
    fake-useragent maintains the User-Agent strings for different browsers:
    https://fake-useragent.herokuapp.com/browsers/0.1.8
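    The library keeps a pool of User-Agent strings grouped by browser. Each browser name is exposed as an attribute, and `random` picks from any browser. The sketch below imitates that attribute lookup using only the standard library. The class `UserAgentPool` and its UA strings are hypothetical placeholders, not the data fake-useragent actually serves. The example also shows why the middleware can choose the type with `getattr` and a configurable name.

```python
import random


class UserAgentPool:
    """Hypothetical stand-in for fake_useragent.UserAgent (illustration only)."""

    # Placeholder strings; the real library ships a much larger list
    _browsers = {
        "chrome": ["Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0"],
        "firefox": ["Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Firefox/121.0"],
        "safari": ["Mozilla/5.0 (Macintosh; Intel Mac OS X 14_2) Safari/605.1.15"],
    }

    def __getattr__(self, name):
        # Only called for names not found normally, e.g. pool.chrome
        if name == "random":
            browser = random.choice(list(self._browsers))
            return random.choice(self._browsers[browser])
        if name in self._browsers:
            return random.choice(self._browsers[name])
        raise AttributeError(name)


pool = UserAgentPool()
print(pool.chrome)               # a Chrome string from the pool
print(pool.random)               # a string from any browser
print(getattr(pool, "firefox"))  # same lookup style the middleware uses
```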
    middlewares.py
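    from fake_useragent import UserAgent

    class RandomUserAgentMiddlware(object):
        """Downloader middleware that puts a random User-Agent on each request."""

        def __init__(self, crawler):
            super(RandomUserAgentMiddlware, self).__init__()
            self.ua = UserAgent()
            # Which UserAgent attribute to read: "random", "chrome", "firefox", ...
            self.ua_type = crawler.settings.get("RANDOM_UA_TYPE", "random")

        @classmethod
        def from_crawler(cls, crawler):
            # Scrapy calls this factory with the running crawler
            return cls(crawler)

        def process_request(self, request, spider):
            def get_ua():
                # e.g. self.ua.random or self.ua.chrome, chosen by RANDOM_UA_TYPE
                return getattr(self.ua, self.ua_type)

            # setdefault only fills the header if nothing has set it yet,
            # which is why the built-in UserAgentMiddleware is disabled in settings
            request.headers.setdefault('User-Agent', get_ua())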
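    You can check what `process_request` does without starting a crawl. The sketch below drives the same logic with minimal test doubles. `StubUserAgent`, `StubCrawler`, `StubRequest` and `UserAgentMiddlewareSketch` are hypothetical names. A plain dict plays the role of Scrapy's `Headers`, and the UA source is injected so no network access is needed. The example shows what `setdefault` implies: a request that already carries a User-Agent keeps it, and only requests without one get a value from the pool.

```python
class StubUserAgent:
    """Hypothetical fixed-value stand-in for fake_useragent.UserAgent."""
    random = "Mozilla/5.0 (stub random)"
    chrome = "Mozilla/5.0 (stub chrome)"


class StubCrawler:
    """Hypothetical crawler; a plain dict supplies settings.get(name, default)."""
    def __init__(self, settings):
        self.settings = dict(settings)


class StubRequest:
    """Hypothetical request; a plain dict stands in for scrapy.http.Headers."""
    def __init__(self, headers=None):
        self.headers = dict(headers or {})


class UserAgentMiddlewareSketch:
    """Same logic as RandomUserAgentMiddlware, with the UA source injected."""

    def __init__(self, crawler, ua=None):
        self.ua = ua if ua is not None else StubUserAgent()
        self.ua_type = crawler.settings.get("RANDOM_UA_TYPE", "random")

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_request(self, request, spider):
        # Only fills the header when the request does not already carry one
        request.headers.setdefault("User-Agent", getattr(self.ua, self.ua_type))


mw = UserAgentMiddlewareSketch.from_crawler(StubCrawler({"RANDOM_UA_TYPE": "chrome"}))

fresh = StubRequest()
mw.process_request(fresh, spider=None)
print(fresh.headers["User-Agent"])   # Mozilla/5.0 (stub chrome)

preset = StubRequest({"User-Agent": "MyBot/1.0"})
mw.process_request(preset, spider=None)
print(preset.headers["User-Agent"])  # MyBot/1.0, kept by setdefault
```

    To override every request's User-Agent, including ones a spider sets explicitly, you would assign `request.headers['User-Agent'] = ...` instead. That is a design choice and not what the original middleware does.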
    In settings.py, set the downloader middleware priorities and disable Scrapy's built-in UserAgentMiddleware:
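    DOWNLOADER_MIDDLEWARES = {
        'ArticleSpider.middlewares.JSPageMiddleware': 1,
        # Lower numbers have their process_request called earlier
        'ArticleSpider.middlewares.RandomUserAgentMiddlware': 543,
        # None disables Scrapy's built-in UserAgentMiddleware
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    }
    # Which fake_useragent attribute to use: "random", "chrome", "firefox", ...
    RANDOM_UA_TYPE = "random"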
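    Scrapy merges `DOWNLOADER_MIDDLEWARES` into its built-in `DOWNLOADER_MIDDLEWARES_BASE`, drops entries whose value is `None`, and sorts the rest by number. `process_request` runs in ascending order and `process_response` in descending order. The sketch below imitates that merge with an abridged base dict containing only three built-ins at their default priorities. It shows where 543 lands and that `None` removes the stock `UserAgentMiddleware`. `ordered_middlewares` is a hypothetical helper, not a Scrapy function.

```python
# Abridged excerpt of Scrapy's DOWNLOADER_MIDDLEWARES_BASE (not the full list)
BASE = {
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware": 400,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
}

DOWNLOADER_MIDDLEWARES = {
    "ArticleSpider.middlewares.JSPageMiddleware": 1,
    "ArticleSpider.middlewares.RandomUserAgentMiddlware": 543,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}


def ordered_middlewares(base, custom):
    """Hypothetical helper: merge, drop None entries, sort by priority."""
    merged = {**base, **custom}
    enabled = {path: prio for path, prio in merged.items() if prio is not None}
    return sorted(enabled, key=enabled.get)


request_order = ordered_middlewares(BASE, DOWNLOADER_MIDDLEWARES)
for path in request_order:  # order in which process_request is called
    print(path.rsplit(".", 1)[-1])
# process_response is called in the reverse order
response_order = list(reversed(request_order))
```

    If the built-in middleware were left enabled at 500, it would run before 543 and set the header first. The custom middleware's `setdefault` call would then leave that header unchanged.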
     
  • Original article: https://www.cnblogs.com/cq146637/p/9072377.html