• The Python Standard Library: http module reading notes (1)


    Official documentation: https://docs.python.org/3.5/library/http.html

    In short, HTTP client programming is generally done with the urllib.request library, which is mainly for opening URLs "in a complex world": authentication, redirections, cookies and more.

    1. urllib.request: Extensible library for opening URLs

      The how-to guide is detailed and explains everything alongside code: HOWTO Fetch Internet Resources Using The urllib Package

    Functions provided by this module:

    urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

    urllib.request.install_opener(opener)

    urllib.request.build_opener([handler...])

    urllib.request.pathname2url(path)

    urllib.request.url2pathname(path)

    urllib.request.getproxies()
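As a rough illustration of the functions listed above, the sketch below exercises the ones that do not need network access. The file path is a made-up example, and the exact result of pathname2url() depends on the platform, so no specific output is claimed here.

```python
import urllib.request

# build_opener() returns an OpenerDirector preloaded with the default handlers.
opener = urllib.request.build_opener()

# getproxies() returns a dict mapping scheme names to proxy URLs,
# taken from the environment (it may well be empty).
proxies = urllib.request.getproxies()

# pathname2url() percent-encodes a local path for use in a URL and
# url2pathname() reverses it. The path below is hypothetical.
local_path = "docs/my notes.txt"
url_path = urllib.request.pathname2url(local_path)
round_trip = urllib.request.url2pathname(url_path)

print(type(opener).__name__)
print(proxies)
print(url_path)
print(round_trip)
```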

    Classes provided by this module:

    class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

    class urllib.request.OpenerDirector

    class urllib.request.BaseHandler

    class urllib.request.HTTPDefaultErrorHandler

    class urllib.request.HTTPRedirectHandler

    class urllib.request.HTTPCookieProcessor(cookiejar=None)

    class urllib.request.ProxyHandler(proxies=None)

    class urllib.request.HTTPPasswordMgr

    There are many more; they are not all listed here.

    1.2 Request Objects

    The following methods describe Request's public interface, so all of them may be overridden in subclasses. Request also defines several public attributes that clients can use to inspect the parsed request.

    Request.full_url  Request.type  Request.host  Request.origin_req_host  # without the port number

    Request.selector  Request.data  Request.unverifiable  Request.method

    Request.get_method()  Request.add_header(key, val)  Request.add_unredirected_header(key, header)  Request.has_header(header)  Request.remove_header(header)

    Request.get_full_url()  Request.set_proxy(host, type)  Request.get_header(header_name, default=None)  Request.header_items()
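To make the attributes above more concrete, here is a minimal sketch that builds Request objects and inspects how the URL is parsed. The URL is a placeholder chosen for illustration; nothing is sent over the network.

```python
import urllib.request

# Hypothetical URL, used only to show how a Request is parsed.
req = urllib.request.Request("http://www.example.com:8080/path?q=1")

print(req.full_url)         # the URL passed to the constructor
print(req.type)             # URI scheme: 'http'
print(req.host)             # authority, including the port: 'www.example.com:8080'
print(req.origin_req_host)  # host without the port: 'www.example.com'
print(req.selector)         # path and query: '/path?q=1'

# With no data the method defaults to GET; attaching data makes it POST,
# unless an explicit method= is given.
print(req.get_method())       # 'GET'
post_req = urllib.request.Request(req.full_url, data=b"payload")
print(post_req.get_method())  # 'POST'
put_req = urllib.request.Request(req.full_url, data=b"payload", method="PUT")
print(put_req.get_method())   # 'PUT'
```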

    1.3 OpenerDirector Objects

    It has the following methods:

    OpenerDirector.add_handler(handler)

    OpenerDirector.open(url, data=None[, timeout])

    OpenerDirector.error(proto, *args)
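One way to see how add_handler() and open() cooperate is to register a handler for an invented URL scheme. In the sketch below, the `demo` scheme and the DemoHandler class are hypothetical names made up for this example; the mechanism it relies on (OpenerDirector dispatching to a handler method named `<scheme>_open`) is how urllib.request locates protocol handlers.

```python
import email.message
import io
import urllib.request
import urllib.response


class DemoHandler(urllib.request.BaseHandler):
    """Hypothetical handler for a made-up 'demo' URL scheme."""

    def demo_open(self, req):
        # OpenerDirector finds this method through its name: <scheme>_open.
        body = ("you asked for " + req.full_url).encode("utf-8")
        headers = email.message.Message()
        headers["Content-Type"] = "text/plain; charset=utf-8"
        return urllib.response.addinfourl(io.BytesIO(body), headers, req.full_url)


opener = urllib.request.OpenerDirector()  # starts with no handlers at all
opener.add_handler(DemoHandler())

with opener.open("demo://example/resource") as response:
    text = response.read().decode("utf-8")

print(text)
```

Because the opener is created empty rather than through build_opener(), only the `demo` scheme is understood; that keeps the example offline and makes the dispatch easy to follow.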

    1.4  BaseHandler Objects

    1.5 HTTPRedirectHandler Objects

    1.6 HTTPCookieProcessor Objects

    It has only one attribute, HTTPCookieProcessor.cookiejar: the http.cookiejar.CookieJar in which all cookies are stored.
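A small sketch of how the cookiejar attribute might be used: create a CookieJar, hand it to the processor, and keep a reference so the cookies can be inspected later. The commented-out URL is a placeholder; the code as written makes no request.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)
opener = urllib.request.build_opener(processor)

# The processor exposes the jar it was given through its only attribute.
print(processor.cookiejar is jar)  # True
# No request has been made yet, so the jar holds no cookies.
print(len(jar))                    # 0

# After a call such as opener.open("http://www.example.com/") (placeholder
# URL, needs network access), cookies set by the server could be listed:
for cookie in jar:
    print(cookie.name, cookie.value)
```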

    1.x There are too many other classes to cover; consult the official documentation when you need them.

    Examples

    Open a URL and read data:

    >>> import urllib.request
    >>> with urllib.request.urlopen('http://www.python.org/') as f:
    ...     print(f.read(300))
    ...
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
    xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
    <title>Python Programming '

    Note: the data read from the response returned by urlopen is a bytes object, so decode it to get text:

    >>> with urllib.request.urlopen('http://www.python.org/') as f:
    ...     print(f.read(100).decode('utf-8'))
    ...
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtm

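The examples above hard-code 'utf-8'. A slightly more defensive variant, sketched below, asks the response headers for the declared charset and falls back to UTF-8 only when none is given. A data: URL stands in for a real page here so that the sketch runs without network access; that substitution is an assumption of this example, not something the original notes use.

```python
import urllib.request

# A data: URL stands in for a real page so the example needs no network.
url = "data:text/plain;charset=utf-8,caf%C3%A9"

with urllib.request.urlopen(url) as f:
    raw = f.read()  # bytes
    # Charset declared in the Content-Type header, if any.
    charset = f.headers.get_content_charset() or "utf-8"
    text = raw.decode(charset)  # str

print(type(raw).__name__, type(text).__name__, text)
```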
    Send a data stream to the stdin of a CGI script:

    >>> import urllib.request
    >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
    ...                       data=b'This data is passed to stdin of the CGI')
    >>> with urllib.request.urlopen(req) as f:
    ...     print(f.read().decode('utf-8'))
    ...
    Got Data: "This data is passed to stdin of the CGI"

    The CGI script on the other end receives the data through stdin:

    #!/usr/bin/env python
    import sys
    data = sys.stdin.read()
    print('Content-type: text/plain\n\nGot Data: "%s"' % data)

    Use of Basic HTTP Authentication:

    import urllib.request
    # Create an OpenerDirector with support for Basic HTTP Authentication...
    auth_handler = urllib.request.HTTPBasicAuthHandler()
    auth_handler.add_password(realm='PDQ Application',
                              uri='https://mahler:8092/site-updates.py',
                              user='klem',
                              passwd='kadidd!ehopper')
    opener = urllib.request.build_opener(auth_handler)
    # ...and install it globally so it can be used with urlopen.
    urllib.request.install_opener(opener)
    urllib.request.urlopen('http://www.example.com/login.html')
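Note that credentials are only sent for URLs that match the registered uri. The sketch below shows that lookup in isolation, without contacting any server. It swaps in HTTPPasswordMgrWithDefaultRealm (a real urllib.request class, but not the one the example above relies on) so that realm=None acts as a catch-all; the host names are the same placeholders as above.

```python
import urllib.request

# HTTPPasswordMgrWithDefaultRealm lets realm=None match any realm.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(realm=None,
                          uri='https://mahler:8092/',
                          user='klem',
                          passwd='kadidd!ehopper')
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)

# Credentials are matched by URI prefix, whatever realm the server names.
found = password_mgr.find_user_password('PDQ Application',
                                        'https://mahler:8092/site-updates.py')
# A different host has no stored credentials.
missing = password_mgr.find_user_password('PDQ Application',
                                          'http://www.example.com/login.html')
print(found)
print(missing)
```

Using opener.open(...) directly instead of install_opener() keeps the credentials scoped to this opener rather than changing the behaviour of every later urlopen() call.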

    Adding HTTP headers:

    import urllib.request
    req = urllib.request.Request('http://www.example.com/')
    req.add_header('Referer', 'http://www.python.org/')
    # Customize the default User-Agent header value:
    req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
    r = urllib.request.urlopen(req)

    OpenerDirector automatically adds a User-Agent header to every Request. To change this:

    import urllib.request
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    opener.open('http://www.example.com/')

    Also, remember that a few standard headers (Content-Length, Content-Type and Host) are added when the Request is passed to urlopen() (or OpenerDirector.open()).
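One detail worth knowing when reading headers back from a Request: add_header() normalises the name with str.capitalize(), so lookups have to use that spelling. The following sketch shows this without sending anything; the header values are illustrative.

```python
import urllib.request

req = urllib.request.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
req.add_header('User-Agent', 'urllib-example/0.1')

# Header names are normalised with str.capitalize(), so 'User-Agent'
# is stored (and must be looked up) as 'User-agent'.
print(req.has_header('User-agent'))        # True
print(req.has_header('User-Agent'))        # False
print(req.get_header('Referer'))           # http://www.python.org/
print(req.get_header('X-missing', 'n/a'))  # falls back to the default: n/a

req.remove_header('Referer')
print(req.header_items())  # [('User-agent', 'urllib-example/0.1')]
```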

    GET:

    >>> import urllib.request
    >>> import urllib.parse
    >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
    >>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
    >>> with urllib.request.urlopen(url) as f:
    ...     print(f.read().decode('utf-8'))

    POST:

    >>> import urllib.request
    >>> import urllib.parse
    >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
    >>> data = data.encode('ascii')
    >>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
    ...     print(f.read().decode('utf-8'))
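The difference between the GET and POST examples can be checked offline by building the two Request objects and inspecting them. This is only a sketch: www.example.com is a placeholder host, and nothing is actually sent.

```python
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
print(params)  # spam=1&eggs=2&bacon=0

# GET: the encoded parameters go into the URL's query string.
get_req = urllib.request.Request('http://www.example.com/cgi-bin/query?' + params)
print(get_req.get_method(), get_req.selector)

# POST: the same parameters travel in the request body, which must be bytes.
post_req = urllib.request.Request('http://www.example.com/cgi-bin/query',
                                  data=params.encode('ascii'))
print(post_req.get_method(), post_req.data)
```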

    The following example uses an explicitly specified HTTP proxy, overriding environment settings:

    >>> import urllib.request
    >>> proxies = {'http': 'http://proxy.example.com:8080/'}
    >>> opener = urllib.request.FancyURLopener(proxies)
    >>> with opener.open("http://www.python.org") as f:
    ...     f.read().decode('utf-8')

    The following example uses no proxies at all, overriding environment settings:

    >>> import urllib.request
    >>> opener = urllib.request.FancyURLopener({})
    >>> with opener.open("http://www.python.org/") as f:
    ...     f.read().decode('utf-8')
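FancyURLopener has been deprecated since Python 3.3, so on a current interpreter the two proxy examples above may not be usable as written. A possible equivalent, sketched here with ProxyHandler and build_opener(), uses the same placeholder proxy address; the actual open() call is left commented out because it needs network access.

```python
import urllib.request

# Explicit HTTP proxy, overriding environment settings.
proxy_handler = urllib.request.ProxyHandler(
    {'http': 'http://proxy.example.com:8080/'})
proxy_opener = urllib.request.build_opener(proxy_handler)

# No proxies at all: an empty mapping disables proxy autodetection.
no_proxy_handler = urllib.request.ProxyHandler({})
direct_opener = urllib.request.build_opener(no_proxy_handler)

print(proxy_handler.proxies)
print(no_proxy_handler.proxies)

# Either opener would then be used like this (requires network access):
# with direct_opener.open('http://www.python.org/') as f:
#     print(f.read().decode('utf-8'))
```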



  • Original article: https://www.cnblogs.com/miaoxiaonao/p/8631561.html