• Using a proxy


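    What a downloader middleware is for: it intercepts outgoing requests, so it can change the IP address a request appears to come from (by routing the request through a proxy).

    Steps:

      1. Write a custom downloader middleware class

        a) inherit from object

        b) override the process_request(self, request, spider) method

      2. Enable the downloader middleware in the settings file (settings.py)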

      middlewares.py

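    # -*- coding: utf-8 -*-

    # Define your downloader middlewares here.
    #
    # See documentation in:
    # https://doc.scrapy.org/en/latest/topics/downloader-middleware.html


    class middleadd(object):

        def process_request(self, request, spider):
            # Route this request through an HTTP proxy. Scrapy's built-in
            # HttpProxyMiddleware reads request.meta["proxy"]; give it a full
            # URL including the scheme, as in Scrapy's docs (http://host:port).
            request.meta["proxy"] = "http://157.65.31.220:3128"
            # Returning None lets Scrapy continue processing the request.
            return None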
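A variation on the middleware above, sketched under assumptions: instead of sending every request through the same proxy, it picks one at random per request. `RandomProxyMiddleware` and `PROXY_POOL` are hypothetical names, and the second address is a placeholder, not a working proxy. It relies only on the documented behavior that Scrapy's built-in HttpProxyMiddleware (enabled by default) honors `request.meta["proxy"]`.

```python
import random


class RandomProxyMiddleware(object):
    """Hypothetical downloader middleware: pick a random proxy per request."""

    # Illustrative pool; replace with proxies you have actually verified.
    PROXY_POOL = [
        "http://157.65.31.220:3128",  # the proxy used in this article
        "http://127.0.0.1:8888",      # hypothetical placeholder entry
    ]

    def process_request(self, request, spider):
        # HttpProxyMiddleware later reads this key and sends the request
        # through the chosen proxy.
        request.meta["proxy"] = random.choice(self.PROXY_POOL)
        # None means "keep going": Scrapy continues with the other
        # middlewares and then downloads the request.
        return None
```

To try it, register this class in DOWNLOADER_MIDDLEWARES in place of middleadd. Spreading requests over several proxies makes it less likely that a single address gets rate-limited, at the cost of depending on every proxy in the pool being alive.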

    Enable the middleware in settings.py:
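A minimal sketch of that setting. `myproject` is a hypothetical project name; substitute your own Scrapy project's package. The priority 543 is the value Scrapy's generated project template suggests. Downloader middlewares with lower numbers have their `process_request` called earlier, so this runs before the built-in HttpProxyMiddleware (priority 750), which then applies the proxy set in `request.meta`.

```python
# settings.py (excerpt)
# "myproject" is a hypothetical package name: use your own project's name.
DOWNLOADER_MIDDLEWARES = {
    # Runs before HttpProxyMiddleware (750), so meta["proxy"] is already
    # set when that middleware looks for it.
    "myproject.middlewares.middleadd": 543,
}
```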

    spiders/midtest.py

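    import scrapy


    class MidtestSpider(scrapy.Spider):
        name = 'midtest'
        # allowed_domains = ['www.baidu.com']
        # Searching Baidu for "ip" shows the IP address the request came from,
        # so the saved page indicates whether the proxy was actually used.
        start_urls = ["https://www.baidu.com/s?wd=ip"]

        def parse(self, response):
            # Save the page; the with-block closes the file when done.
            with open("record.html", "w", encoding="utf-8") as fp:
                fp.write(response.text)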

    Free proxies can be obtained from www.goubanjia.com
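Free proxy lists are usually copied as plain "ip:port" lines, while the value placed in `request.meta["proxy"]` should be a full URL with a scheme. The helper below is a hypothetical sketch (the name `to_proxy_urls` and the input format are assumptions) that turns such lines into proxy URLs and skips malformed entries. It only handles IPv4 addresses and does not check whether a proxy is actually reachable.

```python
import ipaddress


def to_proxy_urls(lines, scheme="http"):
    """Convert raw "ip:port" lines into proxy URLs, dropping bad entries.

    Hypothetical helper: the "ip:port" input format is an assumption about
    how the copied list looks; adapt the parsing to your actual source.
    """
    urls = []
    for line in lines:
        host, sep, port = line.strip().rpartition(":")
        if not sep or not port.isdecimal():
            continue  # no ":port" part, or the port is not a number
        if not 0 < int(port) < 65536:
            continue  # port outside the valid TCP range
        try:
            ipaddress.IPv4Address(host)  # raises ValueError if not IPv4
        except ValueError:
            continue
        urls.append(f"{scheme}://{host}:{port}")
    return urls


print(to_proxy_urls(["157.65.31.220:3128", "bad-line", "10.0.0.1:99999"]))
# ['http://157.65.31.220:3128']
```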

  • Original article: https://www.cnblogs.com/cjj-zyj/p/10144106.html