• 5. Web scraping: the requests library, advanced usage


    0. File upload

    import requests
    
    # Open the file in binary mode; the with block closes it once the upload is done
    with open('favicon.ico', 'rb') as f:
        files = {'file': f}
        response = requests.post("http://httpbin.org/post", files=files)
    print(response.text)
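The files dictionary also accepts a tuple per field, which lets the upload carry an explicit file name and content type. A minimal sketch, assuming an in-memory stand-in for the file and an illustrative name report.txt (neither is from the original); the request is only prepared, not sent, so the multipart body can be inspected offline:

```python
import io

import requests

# In-memory stand-in for a real file, so the sketch needs no file on disk.
fake_file = io.BytesIO(b"hello upload")

# (file name, file object, content type); all three values are illustrative.
files = {"file": ("report.txt", fake_file, "text/plain")}

# Prepare the request without sending it, to see what would go on the wire.
req = requests.Request("POST", "http://httpbin.org/post", files=files)
prepared = req.prepare()

content_type = prepared.headers["Content-Type"]
body = prepared.body

print(content_type)
print(len(body), "bytes in the multipart body")
```

Passing the same files dictionary to requests.post() would send this body to the server.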

    1. Getting cookies

    import requests
    
    response = requests.get("https://www.baidu.com")
    print(response.cookies)
    for key, value in response.cookies.items():
        print(key + '=' + value)
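response.cookies is a RequestsCookieJar, which behaves much like a dictionary. The sketch below builds a jar by hand (the cookie names and values are illustrative, not taken from a real response) to show the same iteration offline, plus conversion to a plain dict:

```python
import requests
from requests.cookies import RequestsCookieJar

# Hand-built jar standing in for response.cookies; values are made up.
jar = RequestsCookieJar()
jar.set("number", "123456789", domain="httpbin.org", path="/")
jar.set("lang", "en", domain="httpbin.org", path="/")

# Same loop as with a real response: items() yields (name, value) pairs.
pairs = [key + "=" + value for key, value in jar.items()]

# Convert to a plain dict when the domain and path details are not needed.
as_dict = requests.utils.dict_from_cookiejar(jar)

print(pairs)
print(as_dict)
```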

    2. Session persistence

    import requests
    
    requests.get('http://httpbin.org/cookies/set/number/123456789')
    response = requests.get('http://httpbin.org/cookies')
    print(response.text)

    * Requesting http://httpbin.org/cookies/set/number/123456789 sets a cookie (number=123456789) for httpbin.org.

    Output:

    {
      "cookies": {}
    }
    

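    Empty! The two get calls are independent of each other, much like opening the two pages in two different browsers, so the cookie set by the first request is not sent with the second. Try a Session() instead: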

    import requests
    
    s = requests.Session()
    s.get('http://httpbin.org/cookies/set/number/123456789')
    response = s.get('http://httpbin.org/cookies')
    print(response.text)

    Output:

    {
      "cookies": {
        "number": "123456789"
      }
    }
    

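    * With Session() the session state is kept across requests, which is what a simulated login needs (it is like opening different pages of the same site in one browser).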
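Besides cookies, a Session can carry default headers that are merged into every request it prepares. A rough sketch, assuming a made-up User-Agent string and a cookie set by hand rather than by the server; the request is prepared but not sent:

```python
import requests

session = requests.Session()

# Defaults applied to every request made through this session.
session.headers.update({"User-Agent": "demo-crawler/0.1"})  # illustrative value

# Set by hand here; normally the server sets it via a Set-Cookie header.
# With no domain given, the cookie is offered to any host.
session.cookies.set("number", "123456789")

# prepare_request() merges the session state into the outgoing request.
req = requests.Request("GET", "http://httpbin.org/cookies")
prepared = session.prepare_request(req)

cookie_header = prepared.headers["Cookie"]
user_agent = prepared.headers["User-Agent"]

print(cookie_header)
print(user_agent)
```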

    3. Certificate verification

    import requests
    
    response = requests.get('https://www.12306.cn')
    print(response.status_code)
    # An SSLError here means that certificate verification failed
    #######################
    # Skip verification and suppress the resulting warning
    import requests
    from requests.packages import urllib3
    
    urllib3.disable_warnings()
    response = requests.get('https://www.12306.cn', verify=False)
    print(response.status_code)
    #######################
    # Specify a local certificate to use as the client certificate
    import requests
    
    response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
    print(response.status_code)
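Instead of switching verification off, verify can also point at a CA bundle file. A minimal sketch, assuming the bundle shipped with the certifi package (a dependency of requests) stands in for whatever CA file applies to the target site:

```python
import os

import certifi
import requests

session = requests.Session()

# Path to a CA bundle; certifi's bundle is used here only as a stand-in.
# For a site signed by a private CA, this would be the path to that CA's file.
session.verify = certifi.where()

bundle_exists = os.path.isfile(session.verify)
print(session.verify, bundle_exists)

# Every request made through this session is then verified against that bundle:
# response = session.get('https://www.12306.cn')
```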

    4. Proxy settings

    # Without username and password
    import requests
    
    proxies = {
      "http": "http://127.0.0.1:9743",
      "https": "http://127.0.0.1:9743",  # a plain HTTP proxy is normally addressed with http:// for both schemes
    }
    
    response = requests.get("https://www.taobao.com", proxies=proxies)
    print(response.status_code)
    
    ##############################
    
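    # With username and password
    import requests
    
    proxies = {
        "http": "http://user:password@127.0.0.1:9743/",
        # The target URL below is https, so an "https" entry is needed for the proxy to be used
        "https": "http://user:password@127.0.0.1:9743/",
    }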
    response = requests.get("https://www.taobao.com", proxies=proxies)
    print(response.status_code)
    
    ##############################
    
    # The proxy does not support HTTP but does support SOCKS
    #pip3 install 'requests[socks]'
    import requests
    
    proxies = {
        'http': 'socks5://127.0.0.1:9742',
        'https': 'socks5://127.0.0.1:9742'
    }
    response = requests.get("https://www.taobao.com", proxies=proxies)
    print(response.status_code)
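The keys of the proxies dictionary are matched against the scheme (and optionally the host) of the requested URL. The sketch below uses select_proxy from requests.utils to show which entry would apply, without sending anything; the per-host key is an illustrative addition, not part of the original examples:

```python
from requests.utils import select_proxy

proxies = {
    # Applies to every http:// URL.
    "http": "http://127.0.0.1:9743",
    # Per-host entry (illustrative): applies only to https://www.taobao.com.
    "https://www.taobao.com": "socks5://127.0.0.1:9742",
}

for_http = select_proxy("http://httpbin.org/get", proxies)
for_taobao = select_proxy("https://www.taobao.com", proxies)
# No "https" key and no matching host key: this request would go direct.
for_other_https = select_proxy("https://www.baidu.com", proxies)

print(for_http)
print(for_taobao)
print(for_other_https)
```

A proxies dictionary with only an "http" key therefore has no effect on https:// URLs.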

    5. Timeout settings

    import requests
    from requests.exceptions import ReadTimeout
    try:
        response = requests.get("http://httpbin.org/get", timeout = 0.5)
        print(response.status_code)
    except ReadTimeout:
        print('Timeout')

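    * timeout = (5, 30): 5 is the connect timeout and 30 is the read timeout, both in seconds

    * timeout = 35: the single value is applied to the connect timeout and to the read timeout separately; it is not the sum of the two

    * timeout = None, or not setting it at all, means waiting indefinitely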
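To see the two phases separately, the sketch below points requests at a local socket that completes the TCP connection but never replies, so only the read phase can time out. The local server and the timeout values are assumptions made for the demonstration:

```python
import socket

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout, Timeout

# A listening socket that never answers: the connection is established
# by the operating system, but no HTTP response is ever written.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

try:
    # 2 seconds to connect, 0.5 seconds to wait for the response
    session.get(f"http://127.0.0.1:{port}/", timeout=(2, 0.5))
    outcome = "ok"
except ConnectTimeout:
    outcome = "connect timeout"
except ReadTimeout:
    outcome = "read timeout"
finally:
    session.close()
    server.close()

print(outcome)
```

Both ConnectTimeout and ReadTimeout derive from requests.exceptions.Timeout, so catching Timeout covers either case.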

    6. Authentication

    import requests
    from requests.auth import HTTPBasicAuth
    
    r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
    # It can also be written more simply, as below (a tuple is handled with HTTPBasicAuth by default; this address is not actually reachable, of course)
    #r = requests.get('http://120.27.34.24:9001', auth=('user', '123'))
    print(r.status_code)
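Since that address cannot be reached, the effect of auth can be checked without sending anything. A small sketch that prepares both forms of the request and compares the Authorization header each one produces:

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "http://120.27.34.24:9001"

# Prepare (but do not send) the request with each way of passing credentials.
explicit = requests.Request("GET", url, auth=HTTPBasicAuth("user", "123")).prepare()
shorthand = requests.Request("GET", url, auth=("user", "123")).prepare()

# Basic auth is "Basic " followed by base64("user:password").
expected = "Basic " + base64.b64encode(b"user:123").decode("ascii")

print(explicit.headers["Authorization"])
print(shorthand.headers["Authorization"] == expected)
```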

    7. Exception handling

    import requests
    from requests import ReadTimeout, ConnectionError, RequestException
    try:
        response = requests.get("http://httpbin.org/get", timeout = 0.5)
        print(response.status_code)
    except ReadTimeout:
        print('Timeout')
    except ConnectionError:
        print('Connection error')
    except RequestException:
        print('Error')

    * For the full list of exceptions, see the API section of the official requests documentation.
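The order of the except clauses matters because the exceptions form a hierarchy, with RequestException at the root. A sketch of that hierarchy, using a hypothetical classify() helper and a hand-built Response object (a stand-in for a real 404 reply) to trigger HTTPError through raise_for_status():

```python
import requests
from requests.exceptions import (
    ConnectionError,
    ConnectTimeout,
    HTTPError,
    ReadTimeout,
    RequestException,
    Timeout,
)


def classify(exc):
    """Hypothetical helper: map an exception to a label, most specific first."""
    if isinstance(exc, Timeout):
        return "timeout"
    if isinstance(exc, ConnectionError):
        return "connection error"
    if isinstance(exc, HTTPError):
        return "http error"
    if isinstance(exc, RequestException):
        return "other requests error"
    return "unrelated"


# Hand-built response standing in for a real 404 reply.
response = requests.Response()
response.status_code = 404
response.url = "http://httpbin.org/status/404"

try:
    response.raise_for_status()  # raises HTTPError for 4xx and 5xx codes
    label = "ok"
except RequestException as exc:
    label = classify(exc)

print(label)
print(classify(ReadTimeout()), classify(ConnectTimeout()))
```

ConnectTimeout inherits from both ConnectionError and Timeout, which is why the Timeout check comes first in this sketch.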

    8. Prepared Request

    * In urllib, a request can be represented as a data structure: all of its parameters are carried by a single Request object.

    * In requests, a Prepared Request does the same job.

    from requests import Request,Session
    url = "..."
    data = {'...':'...'}
    headers = {'User-Agent':'...'}
    s = Session()
    req = Request('POST',url,data = data,headers = headers)
    prepped = s.prepare_request(req)
    r = s.send(prepped)
    print(r.text)

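    * Here we import Request and build a Request object from the url, data, and headers arguments. Calling the Session's prepare_request() method converts it into a PreparedRequest object, which is then sent with the send() method.

    * With a Request object, each request can be treated as an independent object, which is very convenient for queue-based scheduling.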
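The queue-scheduling idea can be sketched as follows. The URL, the page parameter, and the User-Agent value are illustrative assumptions; the requests are only prepared and queued here, and the send step is left as a comment so the sketch runs offline:

```python
from collections import deque

from requests import Request, Session

session = Session()
queue = deque()

# Build the requests up front and store them as independent objects.
for page in (1, 2, 3):
    req = Request(
        "GET",
        "http://httpbin.org/get",  # illustrative URL
        params={"page": page},
        headers={"User-Agent": "demo-crawler/0.1"},  # illustrative value
    )
    queue.append(session.prepare_request(req))

urls = [prepared.url for prepared in queue]
first = queue[0]

print(urls)
print(first.method, first.headers["User-Agent"])

# A scheduler would later take items off the queue and send them, for example:
# response = session.send(queue.popleft(), timeout=10)
```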

  • Original post: https://www.cnblogs.com/DC0307/p/10679932.html