http://www.jb51.net/article/65513.htm
http://blog.csdn.net/yueguanghaidao/article/details/25246867
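Today a colleague wanted to test the page statistics feature of a WAF, which requires simulating requests from many IPs to many domains. In other words, the source IP address of each request has to be changed. Doing this with the socket library means using raw sockets, which is quite tedious. Fortunately we have scapy, which makes it easy.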
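To give a feel for why the raw-socket route is tedious, here is a minimal sketch (not part of the original post) that builds only the 20-byte IPv4 header with a forged source address using the standard library. The helper names `checksum` and `build_ip_header` are hypothetical. A real sender would still need the TCP header, the TCP pseudo-header checksum, and a raw socket opened with root privileges.

```python
import socket
import struct


def checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071): one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ip_header(src: str, dst: str, payload_len: int) -> bytes:
    """Build a 20-byte IPv4 header (no options) carrying TCP, with a forged src."""
    fields = [
        (4 << 4) | 5,           # version 4, header length 5 * 4 bytes
        0,                      # type of service
        20 + payload_len,       # total length
        0,                      # identification
        0,                      # flags and fragment offset
        64,                     # TTL
        socket.IPPROTO_TCP,     # protocol
        0,                      # checksum placeholder
        socket.inet_aton(src),  # forged source address
        socket.inet_aton(dst),  # destination address
    ]
    layout = "!BBHHHBBH4s4s"
    fields[7] = checksum(struct.pack(layout, *fields))
    return struct.pack(layout, *fields)
```

With scapy, all of this collapses into `IP(src=..., dst=...)`, and the checksums are computed when the packet is built.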
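In the script below, DOMAIN is a randomly generated list of domain names, and SOURCE is a randomly generated list of source IP addresses.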
    #!/usr/bin/env python
    # -*- encoding: UTF-8 -*-
    # Sending forged packets with scapy requires root privileges.
    import random
    import string
    from threading import Thread

    from scapy.all import IP, TCP, send

    USER_AGENTS = (  # items used for picking random HTTP User-Agent header value
        "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7_0; en-US) AppleWebKit/534.21 (KHTML, like Gecko) Chrome/11.0.678.0 Safari/534.21",
        "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.2) Gecko/20020508 Netscape6/6.1",
        "Mozilla/5.0 (X11;U; Linux i686; en-GB; rv:1.9.1) Gecko/20090624 Ubuntu/9.04 (jaunty) Firefox/3.5",
        "Opera/9.80 (X11; U; Linux i686; en-US; rv:1.9.2.3) Presto/2.2.15 Version/10.10",
    )
    TOP_DOMAIN = ('com', 'org', 'net', 'gov', 'edu', 'mil', 'info', 'name', 'biz')
    # 100 random domains: "www." + one or two random labels + a random TLD
    DOMAIN = ["www.%s.%s" % (
        '.'.join(''.join(random.sample(string.ascii_lowercase, random.randint(2, 6)))
                 for _ in range(random.randint(1, 2))),
        random.choice(TOP_DOMAIN))
        for _ in range(100)
    ]
    # 100 random source IPv4 addresses
    SOURCE = ['.'.join(str(random.randint(1, 254)) for _ in range(4)) for _ in range(100)]


    class Scan(Thread):
        # HTTP header lines must end with CRLF; a blank line ends the request
        HTTPSTR = 'GET / HTTP/1.0\r\nHost: %s\r\nUser-Agent: %s\r\n\r\n'

        def run(self):
            for _ in range(100):
                domain = random.choice(DOMAIN)
                http = self.HTTPSTR % (domain, random.choice(USER_AGENTS))
                try:
                    # src is the forged source address; dst is resolved through DNS
                    request = IP(src=random.choice(SOURCE), dst=domain) / TCP(dport=80) / http
                    # request = IP(dst=domain) / TCP(dport=80) / http
                    send(request)
                except Exception:
                    # e.g. the random domain could not be resolved; skip it
                    pass


    task = []
    for x in range(10):
        t = Scan()
        task.append(t)
    for t in task:
        t.start()
    for t in task:
        t.join()
    print('all task done!')
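This leads to a problem, however: because the domains are randomly generated, sending a request first triggers a DNS lookup, which will most likely fail. There are two ways to solve this:
1. Add all of the domains to the local hosts file; the IP can be the server's address.
2. Because the hosts file does not support wildcards, you can use a DNS proxy instead, or write a small tool of your own that resolves names however you like. Here is one: http://code.google.com/p/marlon-tools/source/browse/tools/dnsproxy/dnsproxy.py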
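As an illustration of the second approach, the following is a minimal sketch of a "resolve everything to one address" DNS responder using only the standard library. It is not the dnsproxy.py tool linked above. The function names `build_answer` and `serve`, the listening address, and the port are all assumptions made for this example. It answers every query with a single A record regardless of the query type, which is enough for lookups of random test domains but is not a complete DNS implementation.

```python
import socket
import struct


def build_answer(query: bytes, ip: str, ttl: int = 60) -> bytes:
    """Answer the first question in a DNS query with one A record for `ip`."""
    txn_id = query[:2]
    # The question name starts at offset 12: length-prefixed labels ending in 0.
    pos = 12
    while query[pos] != 0:
        pos += query[pos] + 1
    question = query[12:pos + 5]  # name + terminating 0 + QTYPE + QCLASS
    # Flags 0x8180: response, recursion desired and available, no error.
    header = txn_id + struct.pack("!HHHHH", 0x8180, 1, 1, 0, 0)
    answer = (
        b"\xc0\x0c"                           # compression pointer to the name at offset 12
        + struct.pack("!HHIH", 1, 1, ttl, 4)  # TYPE A, CLASS IN, TTL, RDLENGTH
        + socket.inet_aton(ip)                # the address every name resolves to
    )
    return header + question + answer


def serve(ip: str, host: str = "127.0.0.1", port: int = 5353) -> None:
    """Reply to every UDP DNS query on (host, port) with `ip`. Runs forever."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        query, addr = sock.recvfrom(512)
        sock.sendto(build_answer(query, ip), addr)
```

Calling `serve("192.168.1.10")` (a placeholder for the address of the server behind the WAF) and pointing the test machine's resolver at it would make every random domain resolve to that server. Binding to the standard DNS port 53 requires root privileges.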