Crawler whitelisting is especially useful when scanning: forge a search-engine crawler's User-Agent and you can bypass detection, since some sites whitelist known bots such as Baiduspider and Googlebot.

Here is a sample script I wrote myself (is there a tool that supports this directly???):
```python
# coding: utf-8
import requests

headers = {
    # The Baiduspider UA also works against targets that whitelist Baidu:
    # 'User-Agent': "Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)"
    'User-Agent': "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
}

domain = "http://XXX.com/"

# Brute-force paths from a wordlist while pretending to be Googlebot.
with open("dicc.txt") as f:
    for line in f:
        path = line.strip()
        url = domain + path
        res = requests.get(url=url, headers=headers)
        status = res.status_code
        print("url:{} status:{}".format(url, status))
        # print("response: ", res.text)
        # break
```
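One step the script above skips is confirming that the target actually treats crawler user-agents differently at all. Below is a minimal sketch of such a check, not from the original post: it requests the same URL twice, once with an ordinary browser UA and once with the Googlebot UA, and prints the status code and body length of each. The domain `http://XXX.com/` is the same placeholder used above, and the browser UA string is just an illustrative choice.

```python
# coding: utf-8
# Sketch (illustrative, with an assumed placeholder target): compare how a
# target responds to a normal browser UA versus a spoofed Googlebot UA.
import requests

CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

def probe(url):
    # Send the same request with each UA and print status and body length,
    # so differences in how the target treats the two are easy to spot.
    for label, ua in (("browser", BROWSER_UA), ("crawler", CRAWLER_UA)):
        try:
            res = requests.get(url, headers={"User-Agent": ua}, timeout=5)
            print("{:>8}: status={} length={}".format(
                label, res.status_code, len(res.content)))
        except requests.RequestException as exc:
            print("{:>8}: request failed: {}".format(label, exc))

if __name__ == "__main__":
    probe("http://XXX.com/")  # placeholder domain, as in the script above
```

If the two responses differ (say, 403 for the browser UA but 200 for the crawler UA), the whitelist bypass is worth using on that target; if they are identical, the spoofed UA buys you nothing there.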