The Beauty of Python [From Novice to Expert]: urlparse Source Code Analysis

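urlparse is the module for parsing URLs. A URL has the following format: protocol://hostname[:port]/path[;parameters][?query][#fragment]. The ;parameters part carries special parameters and is rarely used, at least I have hardly ever run into it. A few example links: http://en.wikipedia.org/wiki/Robotics;Notes and http://en.wikipedia.org/wiki/Awesome;_I_Fuckin%27_Shot_That!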

1: urlparse quick start

urlparse(url, scheme='', allow_fragments=True): parses <scheme>://<netloc>/<path>;<params>?<query>#<fragment> into a 6-tuple: (scheme, netloc, path, params, query, fragment). The return value is an instance of a tuple subclass that also defines attributes such as netloc. urlunparse is the inverse operation.
```
from urlparse import urlparse, urlunparse

url = "http://www.test.com/search?key=python"
parse = urlparse(url)
print parse         # ('http', 'www.test.com', '/search', '', 'key=python', '')
                    # (Python 2.6+ prints this as ParseResult(scheme='http', ...))
print parse.netloc  # www.test.com
url2 = urlunparse(parse)
print url2          # http://www.test.com/search?key=python
```
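urlsplit(url, scheme='', allow_fragments=True): parses <scheme>://<netloc>/<path>?<query>#<fragment> into a 5-tuple: (scheme, netloc, path, query, fragment). urlunsplit is the inverse operation. It is very similar to urlparse, just without the rarely used params component; internally, urlparse is implemented by calling urlsplit. If the URL has no [;parameters] part, I recommend urlsplit: it is more explicit and more concise.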
```
from urlparse import urlsplit, urlunsplit

url = "http://www.test.com/search?key=python"
parse = urlsplit(url)
print parse         # ('http', 'www.test.com', '/search', 'key=python', '')
print parse.netloc  # www.test.com
url2 = urlunsplit(parse)
print url2          # http://www.test.com/search?key=python
```
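To see where the two functions actually differ, here is a small sketch with a URL that does carry a ;parameters part. The URL is made up for illustration, and the try/except import is my own addition so the snippet also runs on Python 3, where the module lives in urllib.parse.

```python
try:
    from urlparse import urlparse, urlsplit      # Python 2
except ImportError:
    from urllib.parse import urlparse, urlsplit  # Python 3 (assumed fallback)

# Made-up URL with a ";type=a" params part on the last path segment.
url = "http://www.test.com/search;type=a?key=python#top"

parsed = urlparse(url)   # 6 components, params split out
split = urlsplit(url)    # 5 components, params left inside the path

print(parsed.path)       # path with the ";params" part removed
print(parsed.params)     # the params component on its own
print(split.path)        # urlsplit keeps ";type=a" attached to the path
```

With urlsplit the ;type=a piece simply stays in path, so it is only worth reaching for urlparse when you need params as a separate field.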
2: Source code analysis

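The objects returned by the two functions above are tuples, yet they also have attributes and methods of their own. That is because the result classes inherit from tuple. The code is as follows: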
```
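class BaseResult(tuple):
    __slots__ = ()

    @property
    def scheme(self):
        return self[0]

    @property
    def netloc(self):
        return self[1]

    # ... other component properties omitted in this excerpt ...

    @property
    def username(self):
        # The user name is whatever precedes "@" in netloc, minus ":password".
        netloc = self.netloc
        if "@" in netloc:
            userinfo = netloc.split("@", 1)[0]
            if ":" in userinfo:
                userinfo = userinfo.split(":", 1)[0]
            return userinfo
        return None

    # ... remaining properties omitted in this excerpt ...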

class SplitResult(BaseResult):

    __slots__ = ()

    def __new__(cls, scheme, netloc, path, query, fragment):
        return BaseResult.__new__(
            cls, (scheme, netloc, path, query, fragment))

    def geturl(self):
        return urlunsplit(self)


class ParseResult(BaseResult):

    __slots__ = ()

    def __new__(cls, scheme, netloc, path, params, query, fragment):
        return BaseResult.__new__(
            cls, (scheme, netloc, path, params, query, fragment))

    @property
    def params(self):
        return self[3]

    def geturl(self):
        return urlunparse(self)
```
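SplitResult is what urlsplit returns and ParseResult is what urlparse returns; as you can see, the main difference is still whether there is a params component. The code also shows how to extend a built-in data structure: tuple accepts any sequence as its argument, not only a tuple like the ones above, and __new__ has to return the object it builds. We can write our own extended tuple that accepts a list.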
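A minimal sketch of such an extended tuple, following the same pattern as SplitResult. The class name Point and its x and y properties are hypothetical, chosen only for illustration.

```python
class Point(tuple):
    """Hypothetical 2-D point stored as a tuple (illustration only)."""

    def __new__(cls, coords):
        # tuple.__new__ accepts any iterable, so a list works here too.
        # __new__ must return the newly built instance.
        return tuple.__new__(cls, coords)

    @property
    def x(self):
        return self[0]

    @property
    def y(self):
        return self[1]


p = Point([3, 4])   # built from a list, not a tuple
print(p == (3, 4))  # still compares like an ordinary tuple
print(p.x)
print(p.y)
```

Because tuples are immutable, the contents have to be fixed in __new__; by the time __init__ runs it is too late to change them.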

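Note how BaseResult uses __slots__. Declaring __slots__ stops Python from allocating a __dict__ for each instance, and it is that __dict__ which makes it easy to attach arbitrary attributes to an object. BaseResult sets __slots__ to an empty tuple precisely so that the returned objects carry no __dict__ and cannot pick up arbitrary attributes. A tuple subclass that does not declare __slots__, such as a custom one of our own, behaves differently.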
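The effect of __slots__ is easy to check by hand. The sketch below uses two hypothetical tuple subclasses, WithSlots and WithoutSlots, that differ only in whether they declare an empty __slots__.

```python
class WithSlots(tuple):
    __slots__ = ()      # no per-instance __dict__ is allocated


class WithoutSlots(tuple):
    pass                # instances get a __dict__ as usual


a = WithSlots((1, 2))
b = WithoutSlots((1, 2))

b.note = "extra"        # accepted: stored in b.__dict__
print(b.note)

try:
    a.note = "extra"    # rejected: there is no __dict__ to hold it
except AttributeError as exc:
    print("AttributeError: %s" % exc)

print(hasattr(a, "__dict__"))   # False
print(hasattr(b, "__dict__"))   # True
```

Besides blocking stray attributes, skipping the __dict__ also keeps each result object as small as a plain tuple.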

Let's take a look at BaseResult,

3: Miscellaneous

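urljoin(base, url, allow_fragments=True): builds a full URL by combining a base URL with another URL. I remember writing this by hand in a project (embarrassing), when a ready-made version was sitting right here.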
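A few quick urljoin calls, as a sketch. The base URL and the relative links are made up, and the import fallback is only there so the snippet also runs on Python 3.

```python
try:
    from urlparse import urljoin      # Python 2
except ImportError:
    from urllib.parse import urljoin  # Python 3 (assumed fallback)

base = "http://www.test.com/a/b/index.html"  # made-up base URL

sibling = urljoin(base, "page.html")         # relative to the base's directory
parent = urljoin(base, "../c/page.html")     # ".." climbs one directory
rooted = urljoin(base, "/root.html")         # leading "/" restarts at the host
absolute = urljoin(base, "http://other.example.com/x")  # absolute URL wins

for joined in (sibling, parent, rooted, absolute):
    print(joined)
```

This is the same resolution a browser applies to an href on a page, which is why it is handy in crawlers.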

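urldefrag(url): strips the fragment from the URL, that is, it removes the part after the "#". It returns a pair: the URL without the fragment, and the fragment itself.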
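A short urldefrag sketch with a made-up URL; the import fallback is again my own addition for Python 3.

```python
try:
    from urlparse import urldefrag      # Python 2
except ImportError:
    from urllib.parse import urldefrag  # Python 3 (assumed fallback)

# The result unpacks into (url_without_fragment, fragment).
clean_url, fragment = urldefrag("http://www.test.com/search?key=python#section-2")

print(clean_url)
print(fragment)
```

Dropping the fragment is useful when deduplicating links, since two URLs that differ only after the "#" point at the same resource.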

_splitnetloc(url, start=0): an internal helper that extracts the netloc from the URL, scanning from position start.

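One thing worth pointing out: the whole urlparse module matches nothing with regular expressions. It parses purely by scanning and slicing the string sequentially, which makes it well worth reading.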
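To give a feel for this regex-free style, here is a hand-written sketch that pulls the netloc out of a URL using only str.find and slicing. It is my own illustration in the spirit of _splitnetloc, not the module's actual code, and the name split_netloc is hypothetical.

```python
def split_netloc(url, start=0):
    """Return (netloc, rest), scanning url from index `start`.

    The netloc ends at the first "/", "?" or "#", or at the end of the string.
    """
    end = len(url)
    for delimiter in "/?#":
        pos = url.find(delimiter, start)
        if pos >= 0:
            end = min(end, pos)   # keep the earliest delimiter found
    return url[start:end], url[end:]


url = "http://www.test.com:8080/search?key=python"
start = url.find("://") + 3       # skip past "scheme://"
netloc, rest = split_netloc(url, start)

print(netloc)
print(rest)
```

A couple of find calls and slices cover the whole job, which is the kind of code you will see throughout the module.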
Original article: https://www.cnblogs.com/riasky/p/3429149.html
