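• Python modules: requests (Part 1)


    requests is not part of the Python standard library, so it must be installed before use (for example with pip install requests).

    • Sending requests

    HTTP request methods include GET, POST, PUT, DELETE, HEAD, and OPTIONS.

    requests provides one function for each method: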

    >>> import requests
    >>> r = requests.get("http://httpbin.org/get")  # send a GET request
    >>> r = requests.post("http://httpbin.org/post")  # send a POST request
    >>> r = requests.put("http://httpbin.org/put")  # send a PUT request
    >>> r = requests.delete("http://httpbin.org/delete")  # send a DELETE request
    >>> r = requests.head("http://httpbin.org/get")  # send a HEAD request
    >>> r = requests.options("http://httpbin.org/get")  # send an OPTIONS request
    
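    • Passing URL parameters

    The params argument encodes the values it receives and appends them to the URL as a query string.

    The value passed to params is usually a dict:

    >>> import requests
    >>> payload = {"word":"test","page":11}
    >>> r = requests.get("http://httpbin.org/get", params=payload)
    >>> print(r.url)  # print the final URL stored on the response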
    http://httpbin.org/get?word=test&page=11
    

    Dict values can also be lists:

    >>> payload = {"word":"test","page":[1,2,3]}
    >>> r = requests.get("http://httpbin.org/get", params=payload)
    >>> print(r.url)
    http://httpbin.org/get?word=test&page=1&page=2&page=3
    

    Keys whose value is None are not added to the URL's query string:

    >>> payload = {"word":"test","page":None}
    >>> r = requests.get("http://httpbin.org/get", params=payload)
    >>> print(r.url)
    http://httpbin.org/get?word=test
    

    params can also be given directly as a string:

    >>> payload = "word=test&page=11"
    >>> r = requests.get("http://httpbin.org/get", params=payload)
    >>> print(r.url)
    http://httpbin.org/get?word=test&page=11
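The query-string rules above can be checked without touching the network by preparing a request instead of sending it. The following is a minimal sketch: build_url is a hypothetical helper name, and it relies on requests.Request(...).prepare(), which assembles the final URL locally.

```python
import requests


def build_url(base, params):
    """Return the URL requests would use for a GET, without sending anything."""
    # build_url is a hypothetical helper name used only for this illustration.
    prepared = requests.Request("GET", base, params=params).prepare()
    return prepared.url


base = "http://httpbin.org/get"
print(build_url(base, {"word": "test", "page": 11}))         # dict of scalars
print(build_url(base, {"word": "test", "page": [1, 2, 3]}))  # a list value repeats the key
print(build_url(base, {"word": "test", "page": None}))       # a None value is dropped
print(build_url(base, "word=test&page=11"))                  # a string is used as-is
```

Preparing rather than sending is only a convenience for experimenting; requests.get(..., params=...) applies the same encoding.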
    
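    • Response content

    requests can read the content of the server's response:

    >>> r = requests.get("https://www.cnblogs.com/")
    >>> r.text  # page source, decoded to str
    '''page source appears here'''
    >>> r.encoding  # encoding requests uses to decode r.text
    'utf-8'
    >>> r.encoding = 'GBK'  # tell requests to decode r.text as GBK from now on
    >>> r.encoding  # the attribute now reports the new encoding
    'GBK'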
    
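    • Binary response content

    For non-text responses, requests can also give access to the response body as bytes:

    >>> r = requests.get("http://p1.ifengimg.com/a/2018_06/75880eeacd0823d_size11_w230_h152.jpg")
    >>> r.content
    '''image data as bytes appears here'''
    >>> r.text
    '''unreadable garbage, because image bytes are not text'''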
    

    r.content also works for text responses, but the result is of type bytes:

    >>> r = requests.get("https://www.cnblogs.com/")
    >>> r.text
    '''page source as str appears here'''
    >>> r.content
    '''page source as bytes appears here'''
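To make the relationship between r.content, r.text, and r.encoding concrete, here is a small offline sketch. It builds a Response by hand and fills the private _content attribute, which is an assumption made purely so the example runs without a network; real code would get the response from requests.get().

```python
import requests

# Build a Response by hand so the example runs offline. Setting the private
# _content attribute is a demo-only shortcut, not something real code should do.
r = requests.Response()
r._content = "你好".encode("gbk")   # raw bytes, as a GBK-encoded page would send them

r.encoding = "utf-8"               # wrong guess: these bytes are not valid UTF-8
wrong = r.text                     # undecodable bytes are replaced, no exception

r.encoding = "gbk"                 # correct encoding
right = r.text                     # r.text is decoded again with the new encoding

print(type(r.content).__name__, type(right).__name__)
print(right == "你好", wrong == "你好")
```

r.content never changes; only the str produced by r.text depends on r.encoding.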
    
    • JSON response content

    requests also has a built-in JSON decoder for handling JSON data:

    >>> import requests
    >>> r = requests.get("https://github.com/timeline.json")
    >>> r.json()
    {'message': 'Hello there, wayfaring stranger. If you’re reading this then you probably didn’t see our blog post a couple of years back announcing that this API would go away: http://git.io/17AROg Fear not, you should be able to get what you need from the shiny new Events API instead.', 'documentation_url': 'https://developer.github.com/v3/activity/events/#list-public-events'}
    >>> r.status_code
    410
    >>> r.raise_for_status  # without parentheses this only displays the bound method; nothing is checked
    <bound method Response.raise_for_status of <Response [410]>>
    

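    If JSON decoding fails, r.json() raises an exception: on Python 2 this is ValueError: No JSON object could be decoded, and on Python 3 it is a JSONDecodeError, which is a subclass of ValueError.

    A successful call to r.json() does not mean the request succeeded, however. Some servers return a JSON object in a failed response (for example, error details with an HTTP 500), and that JSON is decoded and returned as well.

    To check whether a request succeeded, inspect r.status_code or call r.raise_for_status().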
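The order of checks described above (status first, then JSON) can be sketched as follows. decode_json and fake_response are hypothetical helpers; fake_response sets the private _content attribute only so that the example runs offline.

```python
import requests


def decode_json(response):
    """Return parsed JSON, or None if the status is an error or the body is not JSON."""
    # decode_json is a hypothetical helper, not part of requests.
    try:
        response.raise_for_status()   # raises HTTPError for 4XX/5XX
        return response.json()        # raises a ValueError subclass on bad JSON
    except (requests.exceptions.HTTPError, ValueError):
        return None


def fake_response(status, body):
    """Build an offline Response; _content is private and set here only for the demo."""
    r = requests.Response()
    r.status_code = status
    r._content = body
    return r


gone = fake_response(410, b'{"message": "gone"}')
print(gone.json())          # an error response can still carry valid JSON
print(decode_json(gone))    # the helper rejects it because of the status
print(decode_json(fake_response(200, b'{"ok": true}')))
print(decode_json(fake_response(200, b"<html>not json</html>")))
```

Catching ValueError keeps the helper independent of which JSON exception class the installed requests version raises.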

    • Raw response content

    requests can also expose the raw socket response from the server:

    >>> import requests
    >>> r = requests.get("http://httpbin.org/get", stream=True)
    >>> r.raw
    <urllib3.response.HTTPResponse object at 0x000001B93F230518>
    >>> r.raw.read(300)
    b', \n    "Accept-Encoding": "gzip, deflate", \n    "Connection": "close", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.18.4"\n  }, \n  "origin": "110.90.39.155", \n  "url": "http://httpbin.org/get"\n}\n'
    

    To use it, set stream=True in the initial request, then access r.raw and read from it with r.raw.read().
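For most streaming use cases, Response.iter_content is a more convenient alternative to calling r.raw.read() directly. The sketch below is illustrative only: read_in_chunks is a hypothetical helper, and the Response is built by hand with an in-memory raw object as a stand-in for requests.get(url, stream=True).

```python
import io

import requests


def read_in_chunks(response, chunk_size=8):
    """Collect a streamed body chunk by chunk via Response.iter_content."""
    # read_in_chunks is a hypothetical helper; chunk_size=8 is tiny on purpose.
    return [chunk for chunk in response.iter_content(chunk_size=chunk_size)]


# Offline stand-in for requests.get(url, stream=True): a Response whose raw
# attribute is an in-memory file object. This is an assumption of the sketch.
r = requests.Response()
r.raw = io.BytesIO(b"0123456789abcdefghij")

chunks = read_in_chunks(r)
print(chunks)
```

With a real streamed response, each chunk would typically be written to a file as it arrives instead of being collected in a list.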

    • Custom request headers

    Custom HTTP headers are passed as a dict through the headers argument:

    >>> headers = {"user-agent":"Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"}
    >>> r = requests.get("http://httpbin.org/get",headers=headers)
    

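    Custom headers have lower priority than some more specific sources of information, for example:

    1. Authorization headers set with headers= are overridden if credentials are configured in .netrc, and those .netrc credentials are in turn overridden by the auth= parameter.
    2. Authorization headers are removed if the request is redirected to a different host.
    3. Proxy-Authorization headers are overridden by proxy credentials provided in the URL.
    4. Content-Length headers are overwritten whenever requests can determine the length of the content.

    requests does not change its behavior based on which custom headers are specified.

    The headers are simply passed along in the final request.

    All header values must be a string, a bytestring, or unicode.

    Passing unicode header values is permitted, but it is not recommended.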
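The rule about header value types can be tried out locally by preparing a request rather than sending it. This is a sketch under stated assumptions: the X-Retry header name is made up for the illustration, and the InvalidHeader behavior shown is that of reasonably recent requests releases.

```python
import requests

headers = {"user-agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"}

# Prepare the request locally instead of sending it, so nothing leaves the machine.
prepared = requests.Request("GET", "http://httpbin.org/get", headers=headers).prepare()
print(prepared.headers["User-Agent"])   # header names are matched case-insensitively

# A non-string header value (here an int under the made-up name X-Retry)
# is rejected when the request is prepared.
try:
    requests.Request("GET", "http://httpbin.org/get", headers={"X-Retry": 3}).prepare()
    rejected = False
except requests.exceptions.InvalidHeader:
    rejected = True
print(rejected)
```

Converting such values with str() before passing them avoids the error.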

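    • POST requests

    To send POST data to a site, for example when logging in, use requests to send a POST request that carries the data.

    Pass the data through the data argument of requests.post.

    The data is usually a dict, although a plain string is also accepted:

    >>> payload = "test"  # POST data as a string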
    >>> r = requests.post("http://httpbin.org/post", data=payload)
    >>> print(r.text)
    {
      "args": {},
      "data": "test",
      "files": {},
      "form": {},
      "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "close",
        "Content-Length": "4",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.18.4"
      },
      "json": null,
      "origin": "110.90.39.155",
      "url": "http://httpbin.org/post"
    }
    
    >>> payload = {'username':'test','password':'test1234'}  # POST data as a dict
    >>> r = requests.post("http://httpbin.org/post", data=payload)
    >>> print(r.text)
    {
      "args": {},
      "data": "",
      "files": {},
      "form": {
        "password": "test1234",
        "username": "test"
      },
      "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "close",
        "Content-Length": "31",
        "Content-Type": "application/x-www-form-urlencoded",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.18.4"
      },
      "json": null,
      "origin": "110.90.39.155",
      "url": "http://httpbin.org/post"
    }
    
    >>> payload = {'username':['test','test123'],'password':'test1234'}  # POST data as a dict with a list value
    >>> r = requests.post("http://httpbin.org/post", data=payload)
    >>> print(r.text)
    {
      "args": {},
      "data": "",
      "files": {},
      "form": {
        "password": "test1234",
        "username": [
          "test",
          "test123"
        ]
      },
      "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "close",
        "Content-Length": "48",
        "Content-Type": "application/x-www-form-urlencoded",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.18.4"
      },
      "json": null,
      "origin": "110.90.39.155",
      "url": "http://httpbin.org/post"
    }
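The three responses above differ because requests encodes the data argument differently for strings and dicts. A minimal offline sketch, assuming a hypothetical helper named prepare_post that prepares the request without sending it:

```python
import requests

url = "http://httpbin.org/post"


def prepare_post(data):
    """Prepare (but do not send) a POST so the encoded body can be inspected."""
    # prepare_post is a hypothetical helper used only for this illustration.
    return requests.Request("POST", url, data=data).prepare()


text = prepare_post("test")
form = prepare_post({"username": "test", "password": "test1234"})
multi = prepare_post({"username": ["test", "test123"], "password": "test1234"})

# A string is sent as-is and no Content-Type is added.
print(text.body, text.headers.get("Content-Type"))
# A dict is form-encoded and labelled as such.
print(form.body, form.headers["Content-Type"], form.headers["Content-Length"])
# A list value repeats the key in the encoded body.
print(multi.body, multi.headers["Content-Length"])
```

The Content-Length values computed here (31 and 48) match the ones echoed back by httpbin in the transcripts above.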
    
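    • Sending files with POST

    A POST request can send binary files as well as form data, using the files argument (a dict that maps a field name to an open binary file):

    >>> import requests
    >>> files = {'file': open('python.txt', 'rb')}  # open in binary mode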
    >>> r = requests.post('http://httpbin.org/post', files=files)
    >>> print(r.text)
    {
      "args": {}, 
      "data": "", 
      "files": {
        "file": "Python\n"  # contents of the file
      }, 
      "form": {}, 
      "headers": {
        "Accept": "*/*", 
        "Accept-Encoding": "gzip, deflate", 
        "Connection": "close", 
        "Content-Length": "153", 
        "Content-Type": "multipart/form-data; boundary=03080f2f96834a78b2d509d2741ff17a", 
        "Host": "httpbin.org", 
        "User-Agent": "python-requests/2.9.1"
      }, 
      "json": null, 
      "origin": "110.90.39.155", 
      "url": "http://httpbin.org/post"
    }
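The upload example above leaves the file open. One way to avoid that is a with block, sketched below under a few assumptions: prepare_upload is a hypothetical helper, the request is only prepared (not sent) so the example runs offline, and a throwaway python.txt is created in a temporary directory.

```python
import os
import tempfile

import requests


def prepare_upload(path, url="http://httpbin.org/post"):
    """Prepare a multipart upload of the file at path, closing the file afterwards."""
    # prepare_upload is a hypothetical helper; the form field name "file"
    # follows the example above.
    with open(path, "rb") as fh:                      # binary mode, closed on exit
        files = {"file": (os.path.basename(path), fh)}
        # prepare() reads the file, so it must run while fh is still open.
        return requests.Request("POST", url, files=files).prepare()


# Create a throwaway file so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "python.txt")
with open(path, "wb") as out:
    out.write(b"Python\n")

prepared = prepare_upload(path)
print(prepared.headers["Content-Type"].split(";")[0])
```

When actually sending, the same pattern applies: call requests.post(url, files=files) inside the with block.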
    
    • Response status codes

    The status code of a response can be checked:

    >>> r = requests.get('http://httpbin.org/get')
    >>> r.status_code
    200
    >>> r.status_code == requests.codes.ok  # check whether the status code is 200
    True
    

    If a request fails with an error status (a 4XX client error or a 5XX server error), raise_for_status() can be called to raise an exception:

    >>> r = requests.get('http://httpbin.org/status/404')
    >>> r.status_code
    404
    >>> r.raise_for_status
    <bound method Response.raise_for_status of <Response [404]>>
    >>> r.raise_for_status()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\hp\AppData\Roaming\Python\Python36\site-packages\requests\models.py", line 935, in raise_for_status
        raise HTTPError(http_error_msg, response=self)
    requests.exceptions.HTTPError: 404 Client Error: NOT FOUND for url: http://httpbin.org/status/404
    

    If the status code is not an error (for example 200), raise_for_status() returns None.
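In practice raise_for_status() is usually wrapped in try/except. The sketch below shows one way to do that; check and fake_response are hypothetical helpers, and the responses are built by hand so that the example runs offline.

```python
import requests


def check(response):
    """Return None on success, or the HTTPError message for a 4XX/5XX response."""
    # check is a hypothetical helper name.
    try:
        response.raise_for_status()   # note the parentheses: this calls the method
    except requests.exceptions.HTTPError as exc:
        return str(exc)
    return None


def fake_response(status, reason, url):
    """Offline stand-in for a real response (an assumption of this sketch)."""
    r = requests.Response()
    r.status_code, r.reason, r.url = status, reason, url
    return r


print(check(fake_response(200, "OK", "http://httpbin.org/get")))
print(check(fake_response(404, "NOT FOUND", "http://httpbin.org/status/404")))
```

The caught exception also carries the response itself as exc.response, which is useful for logging the body of a failed request.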

    • Response headers

    The headers of a response are available as r.headers:

    >>> r = requests.get('http://httpbin.org/get')
    >>> r.headers
    {
    	'Connection': 'keep-alive', 
    	'Server': 'meinheld/0.6.1', 
    	'Date': 'Sun, 04 Feb 2018 10:27:03 GMT', 
    	'Content-Type': 'application/json', 
    	'Access-Control-Allow-Origin': '*', 
    	'Access-Control-Allow-Credentials': 'true', 
    	'X-Powered-By': 'Flask', 
    	'X-Processed-Time': '0.000623941421509', 
    	'Content-Length': '266', 
    	'Via': '1.1 vegur'
    }
    

    Individual response headers, such as Content-Type and X-Powered-By, can be read like this:

    >>> r = requests.get('http://httpbin.org/get')
    >>> r.headers.get("Content-Type")
    'application/json'
    >>> r.headers["Content-Type"]
    'application/json'
    >>> r.headers.get("X-Powered-By")
    'Flask'
    >>> r.headers["X-Powered-By"]
    'Flask'
    

    This is ordinary dict-style lookup: each value is retrieved by its key.
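One difference from a plain dict is that r.headers is a CaseInsensitiveDict, so the capitalization of a header name does not matter. A small offline sketch, using a hand-built instance and a made-up X-Missing header to show the behavior for absent keys:

```python
from requests.structures import CaseInsensitiveDict

# r.headers is a CaseInsensitiveDict; build one by hand to try it offline.
headers = CaseInsensitiveDict({"Content-Type": "application/json", "X-Powered-By": "Flask"})

print(headers["content-type"])      # lookup ignores case
print(headers.get("X-POWERED-BY"))
print(headers.get("X-Missing"))     # .get() returns None for absent headers
print("X-Missing" in headers)       # X-Missing is a made-up name for the demo
```

Using .get() rather than indexing avoids a KeyError when a server omits a header.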

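    • Cookies

    If a response contains cookies, they can be accessed quickly:

    >>> r = requests.get("http://httpbin.org/get")
    >>> r.cookies['example_cookie_name']  # illustrative: assumes the server set a cookie with this name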
    'example_cookie_value'
    

    To send cookies to a site, use the cookies argument:

    >>> cookies = {'username':'test','password':'test1234'}
    >>> r = requests.get('http://httpbin.org/cookies',cookies=cookies)
    >>> print(r.text)
    {
      "cookies": {
        "password": "test1234",
        "username": "test"
      }
    }
    

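    Cookies are returned in a RequestsCookieJar, which behaves like a dict but offers a more complete interface, suitable for use across multiple domains or paths. A cookie jar can also be passed in to requests: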

    >>> jar = requests.cookies.RequestsCookieJar()
    >>> jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
    >>> jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
    >>> url = 'http://httpbin.org/cookies'
    >>> r = requests.get(url, cookies=jar)
    >>> r.text
    '{"cookies": {"tasty_cookie": "yum"}}'
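The path filtering shown above can be reproduced offline by preparing the request and looking at the Cookie header that would be sent. This is a sketch that assumes only calls from requests itself (Request.prepare and RequestsCookieJar.get_dict):

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")

# Prepare the request locally to see which cookies would be attached.
prepared = requests.Request("GET", "http://httpbin.org/cookies", cookies=jar).prepare()
print(prepared.headers.get("Cookie"))

# The jar can also be filtered directly by domain and path.
print(jar.get_dict(domain="httpbin.org", path="/elsewhere"))
```

Only tasty_cookie matches the /cookies path, which is why gross_cookie never reaches the server in the transcript above.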
    
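    • Redirection and history

    By default, requests follows redirects for every method except HEAD.

    The history attribute of the response object can be used to trace redirects; it is a list of the Response objects created along the way, ordered from oldest to most recent: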

    >>> r = requests.get('https://www.baidu.com/test.php')
    >>> r.status_code
    200
    >>> r.url
    'http://www.baidu.com/forbiddenip/forbidden.html'
    >>> r.history
    [<Response [302]>]
    

    For GET, POST, PUT, OPTIONS, PATCH, and DELETE requests, redirect handling can be disabled with the allow_redirects parameter:

    >>> r = requests.get('https://www.baidu.com/test.php',allow_redirects=False)
    >>> r.status_code
    302
    >>> r.url
    'https://www.baidu.com/test.php'
    >>> r.history
    []
    

    For HEAD requests, redirect handling can be enabled by passing allow_redirects=True.
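To see every hop of a redirected request at once, r.history and the final response can be combined. A minimal sketch, in which redirect_chain and fake_response are hypothetical helpers and the responses are built by hand to mirror the baidu.com transcript above:

```python
import requests


def redirect_chain(response):
    """List (status_code, url) for every hop, oldest first, ending with the final response."""
    # redirect_chain is a hypothetical helper name.
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def fake_response(status, url, history=()):
    """Offline stand-in for a real response (an assumption of this sketch)."""
    r = requests.Response()
    r.status_code, r.url, r.history = status, url, list(history)
    return r


first = fake_response(302, "https://www.baidu.com/test.php")
final = fake_response(200, "http://www.baidu.com/forbiddenip/forbidden.html", [first])
print(redirect_chain(final))
```

With allow_redirects=False the history list is empty, so the chain contains only the 302 response itself.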

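    • Timeouts

    requests stops waiting for a response after the number of seconds given by the timeout parameter.

    Without a timeout, the program may hang indefinitely.

    >>> requests.get('https://www.baidu.com', timeout=0.01)
    '''long traceback omitted'''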
    requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='www.baidu.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x000001B93F2C3D68>, 'Connection to www.baidu.com timed out. (connect timeout=0.01)'))
    

    timeout can also be a (connect, read) tuple, which sets the connect and read timeouts separately:

    >>> requests.get('https://www.baidu.com', timeout=(5,1))
    

    Here the connect timeout is 5 seconds and the read timeout is 1 second.

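    • Errors and exceptions

    On a network problem (for example a DNS failure or a refused connection), Requests raises a ConnectionError exception.

    If an HTTP request returns an unsuccessful status code, Response.raise_for_status() raises an HTTPError exception.

    If a request times out, a Timeout exception is raised.

    If a request exceeds the configured maximum number of redirects, a TooManyRedirects exception is raised.

    All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.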
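Because these exceptions share a common base class, the order in which they are tested matters. The sketch below illustrates one possible ordering; classify and its labels are hypothetical and chosen only for this example.

```python
import requests

exc = requests.exceptions


def classify(error):
    """Map an exception to a short label.

    Timeout is tested before ConnectionError because ConnectTimeout
    inherits from both.
    """
    # classify and its labels are hypothetical, chosen for this illustration.
    if isinstance(error, exc.Timeout):
        return "timeout"
    if isinstance(error, exc.TooManyRedirects):
        return "too many redirects"
    if isinstance(error, exc.HTTPError):
        return "bad status"
    if isinstance(error, exc.ConnectionError):
        return "connection error"
    if isinstance(error, exc.RequestException):
        return "other requests error"
    return "not a requests error"


for error in (exc.ConnectTimeout(), exc.ReadTimeout(), exc.ConnectionError(),
              exc.HTTPError(), exc.TooManyRedirects(), exc.InvalidURL(), KeyError()):
    print(type(error).__name__, "->", classify(error))
```

In a real program the same ordering applies to except clauses: list the more specific exceptions first and requests.exceptions.RequestException last.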

  • Original article: https://www.cnblogs.com/sch01ar/p/8413446.html