• Scrapy: Setting a Proxy


    I. From: http://www.sharejs.com/codes/Python/8309

    1. Create a new file "middlewares.py" inside the Scrapy project:

    # Importing the base64 library because we'll need it ONLY if the proxy we are going to use requires authentication
    import base64

    # Start your middleware class
    class ProxyMiddleware(object):
        # overwrite process_request
        def process_request(self, request, spider):
            # Set the location of the proxy
            request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

            # Use the following lines if your proxy requires authentication
            proxy_user_pass = "USERNAME:PASSWORD"
            # set up basic authentication for the proxy
            encoded_user_pass = base64.encodestring(proxy_user_pass)
            request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

    2. Add the following to the project settings file (./project_name/settings.py):

    DOWNLOADER_MIDDLEWARES = {
        'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110,
        'project_name.middlewares.ProxyMiddleware': 100,
    }

    With just these two steps, requests now go through the proxy. Let's test it ^_^

    from scrapy.spider import BaseSpider
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.http import Request

    class TestSpider(CrawlSpider):
        name = "test"
        domain_name = "whatismyip.com"
        # The following url is subject to change, you can get the last updated one from here:
        # http://www.whatismyip.com/faq/automation.asp
        start_urls = ["http://xujian.info"]

        def parse(self, response):
            open('test.html', 'wb').write(response.body)
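    To double-check that a request really went out through the proxy, a variant of the spider can also log the proxy recorded on the request. This is a minimal sketch reusing the same old contrib-style API as above; the spider name "proxycheck" and the log format are only illustrative:

    from scrapy.contrib.spiders import CrawlSpider

    class ProxyCheckSpider(CrawlSpider):
        name = "proxycheck"
        start_urls = ["http://xujian.info"]

        def parse(self, response):
            # response.meta mirrors the meta of the request that produced this
            # response, so the 'proxy' value set by ProxyMiddleware shows up here
            self.log("Fetched %s via proxy %s" % (response.url, response.meta.get('proxy')))
            with open('test.html', 'wb') as f:
                f.write(response.body)

    Run it with "scrapy crawl proxycheck" and compare the IP reported in test.html with your own.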

    II. From: http://blog.csdn.net/haipengdai/article/details/50972983

    http://stackoverflow.com/questions/4710483/scrapy-and-proxies

    Add a middlewares.py file in the directory alongside settings.py:

    import base64

    class ProxyMiddleware(object):
        # overwrite process request
        def process_request(self, request, spider):
            # Set the location of the proxy
            request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

            # Use the following lines if your proxy requires authentication
            proxy_user_pass = "USERNAME:PASSWORD"
            # setup basic authentication for the proxy
            encoded_user_pass = base64.b64encode(proxy_user_pass)
            request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

    Many answers online use base64.encodestring to encode proxy_user_pass, but there is a case where it fails: when the username is too long, an error occurs, so the b64encode encoding is recommended instead.
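    The difference is easy to see in isolation: encodestring wraps its output with a newline every 76 characters plus a trailing newline, which corrupts the Proxy-Authorization header value once USERNAME:PASSWORD grows long, whereas b64encode returns one unbroken token. A minimal sketch (Python 2, matching the snippets above; the credentials are made up):

    import base64

    # A deliberately long, made-up credential string; its base64 form exceeds
    # the 76-character line length that encodestring wraps at
    proxy_user_pass = "a_very_long_proxy_username_for_demo_purposes:an_equally_long_password"

    wrapped = base64.encodestring(proxy_user_pass)  # contains '\n' characters
    clean = base64.b64encode(proxy_user_pass)       # one unbroken base64 token

    print(repr(wrapped))
    print(repr(clean))

    Avoiding those embedded newlines is exactly why the snippet above uses b64encode.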

    Then, in settings.py, enable it in DOWNLOADER_MIDDLEWARES by adding projectname.middlewares.ProxyMiddleware: 1, as in the snippet below.
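    Concretely, that settings.py entry might look like the following — a minimal sketch, assuming the project package is named projectname as in the sentence above:

    DOWNLOADER_MIDDLEWARES = {
        # Our proxy middleware; a lower order number means its process_request runs earlier
        'projectname.middlewares.ProxyMiddleware': 1,
    }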

• Original article: https://www.cnblogs.com/v-BigdoG-v/p/7443623.html