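The synchronous HTTP library requests is very handy for testing and for simple crawlers. I am covering it today because I used it without really knowing it and paid dearly for that, so read the documentation carefully: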
http://docs.python-requests.org/zh_CN/latest/user/quickstart.html
1. The simplest and most common usage
GET request
import requests

response = requests.get('http://httpbin.org/get')
print(response.text)
# Output
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  "origin": "xx.xx.xx.xx",
  "url": "http://httpbin.org/get"
}
POST request
import requests

form = {'name': 'happy_codes'}
response = requests.post('http://httpbin.org/post', data=form)
print(response.text)
# Output: the dict arrives as form data
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "name": "happy_codes"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "16",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  "json": null,
  "origin": "xx.xx.xx.xx",
  "url": "http://httpbin.org/post"
}
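To see what data= puts on the wire without sending anything, the request can be prepared locally. This is a minimal sketch, assuming requests is installed; it relies on requests.Request(...).prepare(), and the json= variant is shown only for contrast and is not part of the example above.

```python
import json
import requests

form = {'name': 'happy_codes'}
url = 'http://httpbin.org/post'

# Build the requests locally; nothing is sent over the network.
as_form = requests.Request('POST', url, data=form).prepare()
as_json = requests.Request('POST', url, json=form).prepare()

# data= with a dict is URL-encoded like an HTML form.
print(as_form.headers['Content-Type'])    # application/x-www-form-urlencoded
print(as_form.body)                       # name=happy_codes
print(as_form.headers['Content-Length'])  # 16

# json= serializes the same dict as a JSON document instead.
print(as_json.headers['Content-Type'])    # application/json
print(json.loads(as_json.body))
```

The Content-Length of 16 matches the header that httpbin echoed back in the output above.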
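2. Adding a User-Agent, cookies, and a proxy
The cookies parameter accepts a dict or a CookieJar object. A raw cookie string is not accepted there, but it can be sent as the Cookie request header.
proxies = {'http': 'http://127.0.0.1', 'https': 'http://127.0.0.1'}
The keys say which proxy to use for the http and https schemes. If a key does not match the scheme of the request (for example 'http:' with a stray colon), that entry is silently ignored and the request does not go through the proxy, so double-check this mapping.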
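The key-matching rule can be checked offline. The sketch below assumes requests is installed and uses select_proxy, a small helper in requests.utils that looks up the proxy for a URL in a given mapping; the port 8080 is a made-up value for illustration. A real request may additionally pick up proxies from environment variables, which this helper does not consider.

```python
from requests.utils import select_proxy

url = 'http://httpbin.org/get'

# Hypothetical local proxy; the port 8080 is an assumption.
good = {'http': 'http://127.0.0.1:8080', 'https': 'http://127.0.0.1:8080'}
# Same idea, but the key carries a stray colon and matches no scheme.
bad = {'http:': 'http://127.0.0.1:8080'}

print(select_proxy(url, good))  # http://127.0.0.1:8080
print(select_proxy(url, bad))   # None
```

No exception is raised for the malformed key, which is why the mistake is easy to miss.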
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                         "(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"}
cookies = {"STM": "1545720205", 'haha': '123'}
response = requests.get('http://httpbin.org/get',
                        headers=headers,
                        cookies=cookies,
                        proxies={'http': 'http://125.123.122.10:42207',
                                 'https': 'http://125.123.122.10:42207'})
print(response.text)
# Output
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Cookie": "STM=1545720205; haha=123",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
  },
  "origin": "125.123.122.10:42207",
  "url": "http://httpbin.org/get"
}
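The different ways of supplying cookies can be compared without a network call. This is a rough sketch, assuming requests is installed; it prepares three requests locally and inspects the resulting Cookie header. The helper cookie_pairs is a hypothetical name introduced only for this comparison.

```python
import requests
from requests.cookies import RequestsCookieJar

url = 'http://httpbin.org/get'

# 1) Cookies as a plain dict.
from_dict = requests.Request(
    'GET', url, cookies={'STM': '1545720205', 'haha': '123'}).prepare()

# 2) Cookies as a CookieJar.
jar = RequestsCookieJar()
jar.set('STM', '1545720205')
jar.set('haha', '123')
from_jar = requests.Request('GET', url, cookies=jar).prepare()

# 3) A raw cookie string, sent through the Cookie header instead of cookies=.
from_header = requests.Request(
    'GET', url, headers={'Cookie': 'STM=1545720205; haha=123'}).prepare()


def cookie_pairs(prepared):
    """Split the Cookie header into a set of 'name=value' strings."""
    return set(prepared.headers['Cookie'].split('; '))


for prepared in (from_dict, from_jar, from_header):
    print(sorted(cookie_pairs(prepared)))
```

All three requests end up carrying the same two cookies; only the way they are supplied differs.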
Every argument you can pass is listed in the docstring of requests.request, and GET and POST accept the same ones:
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the body of the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
        the server's TLS certificate, or a string, in which case it must be a path
        to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'https://httpbin.org/get')
      <Response [200]>
    """
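One caveat about the docstring quoted above: it describes params as being sent in the body, but requests encodes params into the URL query string, while data goes into the body. The sketch below, assuming requests is installed, shows the difference on locally prepared requests; the page key is a made-up example.

```python
import requests

# params= ends up in the URL query string; the body stays empty.
with_params = requests.Request(
    'GET', 'http://httpbin.org/get', params={'page': '2'}).prepare()
print(with_params.url)   # http://httpbin.org/get?page=2
print(with_params.body)  # None

# data= leaves the URL alone and is encoded into the request body.
with_data = requests.Request(
    'POST', 'http://httpbin.org/post', data={'page': '2'}).prepare()
print(with_data.url)     # http://httpbin.org/post
print(with_data.body)    # page=2
```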
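3. Using the Session class
The Session class maintains a session so that multiple requests share the same cookies, headers, and proxies.
headers -> a dict-like mapping; update it with session.headers.update(headers)
cookies -> a CookieJar; update it with session.cookies.set(key, value)
proxies -> a dict; update it by direct assignment, session.proxies = proxies
Send requests with session.get().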
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                         "(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"}
proxy = '59.61.38.48:34719'

# The requests.Session class
session = requests.Session()
session.headers.update(headers)
session.cookies.set('STM', '1231214')
session.cookies.set('S', '123123')
proxies = {
    'http': 'http://%s' % proxy,
    'https': 'http://%s' % proxy
}
session.proxies = proxies
print(session.get('http://httpbin.org/get').text)
# Output
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Cache-Control": "max-age=259200",
    "Connection": "close",
    "Cookie": "S=123123; STM=1231214",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
  },
  "origin": "59.61.38.48",
  "url": "http://httpbin.org/get"
}
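How session-level settings combine with per-request ones can be inspected without a proxy or a network call. This is a minimal sketch, assuming requests is installed; it uses session.prepare_request to merge the two, and the User-Agent string my-crawler/0.1 is a hypothetical value.

```python
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'my-crawler/0.1'})  # hypothetical UA
session.cookies.set('STM', '1231214')

# Per-request values are merged with the session's values;
# on a conflicting header the per-request value takes precedence.
req = requests.Request('GET', 'http://httpbin.org/get',
                       headers={'Accept': 'application/json'},
                       cookies={'S': '123123'})
prepared = session.prepare_request(req)

print(prepared.headers['User-Agent'])  # my-crawler/0.1
print(prepared.headers['Accept'])      # application/json
print(sorted(prepared.headers['Cookie'].split('; ')))
```

Cookies passed for a single request are sent together with the session cookies for that request only; they are not stored in session.cookies.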