• Scrapy: Setting a Proxy


    I. From: http://www.sharejs.com/codes/Python/8309

    1. Create a new file "middlewares.py" inside the Scrapy project:

    # Importing the base64 library because we'll need it ONLY if the proxy we are going to use requires authentication
    import base64

    # Start your middleware class
    class ProxyMiddleware(object):
        # overwrite process_request
        def process_request(self, request, spider):
            # Set the location of the proxy
            request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

            # Use the following lines if your proxy requires authentication
            proxy_user_pass = "USERNAME:PASSWORD"
            # set up basic authentication for the proxy
            encoded_user_pass = base64.encodestring(proxy_user_pass)
            request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

    2. Add the following to the project settings file (./project_name/settings.py):

    DOWNLOADER_MIDDLEWARES = {
        'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110,
        'project_name.middlewares.ProxyMiddleware': 100,
    }

    With just these two steps, requests now go through the proxy. Let's test it ^_^

    from scrapy.spider import BaseSpider
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.http import Request

    class TestSpider(CrawlSpider):
        name = "test"
        domain_name = "whatismyip.com"
        # The following url is subject to change, you can get the last updated one from here:
        # http://www.whatismyip.com/faq/automation.asp
        start_urls = ["http://xujian.info"]

        def parse(self, response):
            open('test.html', 'wb').write(response.body)
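    To double-check that a request really went out through the proxy, a variant of the spider can also log the proxy recorded on the request. This is a minimal sketch reusing the same old contrib-style API as above; the spider name "proxycheck" and the log format are only illustrative:

    from scrapy.contrib.spiders import CrawlSpider

    class ProxyCheckSpider(CrawlSpider):
        name = "proxycheck"
        start_urls = ["http://xujian.info"]

        def parse(self, response):
            # response.meta mirrors the meta of the request that produced this
            # response, so the 'proxy' value set by ProxyMiddleware shows up here
            self.log("Fetched %s via proxy %s" % (response.url, response.meta.get('proxy')))
            with open('test.html', 'wb') as f:
                f.write(response.body)

    Run it with "scrapy crawl proxycheck" and compare the IP reported in test.html with your own.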

    II. From: http://blog.csdn.net/haipengdai/article/details/50972983

    http://stackoverflow.com/questions/4710483/scrapy-and-proxies

    Add a middlewares.py file in the directory alongside settings.py:

    import base64

    class ProxyMiddleware(object):
        # overwrite process request
        def process_request(self, request, spider):
            # Set the location of the proxy
            request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

            # Use the following lines if your proxy requires authentication
            proxy_user_pass = "USERNAME:PASSWORD"
            # setup basic authentication for the proxy
            encoded_user_pass = base64.b64encode(proxy_user_pass)
            request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

    Many answers online use base64.encodestring to encode proxy_user_pass, but there is a case where it fails: when the username is too long, an error occurs, so the b64encode encoding is recommended instead.
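    The difference is easy to see in isolation: encodestring wraps its output with a newline every 76 characters plus a trailing newline, which corrupts the Proxy-Authorization header value once USERNAME:PASSWORD grows long, whereas b64encode returns one unbroken token. A minimal sketch (Python 2, matching the snippets above; the credentials are made up):

    import base64

    # A deliberately long, made-up credential string; its base64 form exceeds
    # the 76-character line length that encodestring wraps at
    proxy_user_pass = "a_very_long_proxy_username_for_demo_purposes:an_equally_long_password"

    wrapped = base64.encodestring(proxy_user_pass)  # contains '\n' characters
    clean = base64.b64encode(proxy_user_pass)       # one unbroken base64 token

    print(repr(wrapped))
    print(repr(clean))

    Avoiding those embedded newlines is exactly why the snippet above uses b64encode.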

    Then, in settings.py, enable it in DOWNLOADER_MIDDLEWARES by adding projectname.middlewares.ProxyMiddleware: 1, as in the snippet below.
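    Concretely, that settings.py entry might look like the following — a minimal sketch, assuming the project package is named projectname as in the sentence above:

    DOWNLOADER_MIDDLEWARES = {
        # Our proxy middleware; a lower order number means its process_request runs earlier
        'projectname.middlewares.ProxyMiddleware': 1,
    }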

• Original article: https://www.cnblogs.com/v-BigdoG-v/p/7443623.html