The Python tldextract module

Install the latest release from PyPI:
```
pip install tldextract
```
Or install the latest development version:
```
pip install -e 'git+https://github.com/john-kurkowski/tldextract.git#egg=tldextract'
```
Command-line usage (separate multiple URLs with spaces):
```
tldextract http://forums.bbc.co.uk
# forums bbc co.uk
```
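To make the output above concrete, here is a minimal sketch of the idea behind suffix-based splitting. It is not tldextract's implementation: the `split_host` helper and the tiny `SAMPLE_SUFFIXES` set are hypothetical names introduced for illustration, and the real Public Suffix List contains thousands of entries plus wildcard and exception rules that this sketch ignores.

```python
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for the Public Suffix List (illustration only).
SAMPLE_SUFFIXES = {"com", "org", "uk", "co.uk"}


def split_host(url, suffixes=SAMPLE_SUFFIXES):
    """Split a full URL's host into (subdomain, domain, suffix).

    Expects a URL with a scheme, e.g. 'http://forums.bbc.co.uk'.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so 'co.uk' wins over 'uk'.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                # The whole host is itself a suffix.
                return "", "", candidate
            return ".".join(labels[:i - 1]), labels[i - 1], candidate
    # No known suffix: treat the last label as the domain.
    return ".".join(labels[:-1]), labels[-1], ""


print(split_host("http://forums.bbc.co.uk"))  # ('forums', 'bbc', 'co.uk')
```

Trying the longest candidate first is what lets a multi-label suffix such as `co.uk` be recognised, which a naive "split on the last dot" approach would get wrong.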
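When the module runs for the first time, it updates its suffix list with a live HTTP request. The updated suffix set is cached indefinitely in /path/to/tldextract/.tld_set. (Arguably, runtime bootstrapping like this should not be the default behavior, for example on production systems, but I want you to have the latest suffixes, especially when I have not kept this code up to date.) To avoid this fetch, or to control where the cache is stored, use your own extract callable, either by setting the TLDEXTRACT_CACHE environment variable or by setting the cache_file path when initializing TLDExtract.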
```
import tldextract

# extract callable that falls back to the included TLD snapshot, no live HTTP fetching
no_fetch_extract = tldextract.TLDExtract(suffix_list_urls=None)
no_fetch_extract('http://www.google.com')

# extract callable that reads/writes the updated TLD set to a different path
custom_cache_extract = tldextract.TLDExtract(cache_file='/path/to/your/cache/file')
custom_cache_extract('http://www.google.com')

# extract callable that doesn't use caching
no_cache_extract = tldextract.TLDExtract(cache_file=False)
no_cache_extract('http://www.google.com')
```
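If you want to keep the suffix definitions up to date (they do not change often), occasionally delete the cache file, or run the update command:
```
tldextract --update
```
or:
```
env TLDEXTRACT_CACHE="~/tldextract.cache" tldextract --update
```
It is also recommended to delete the cache file after upgrading this library.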
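Deleting the cache file can be scripted, for example as a post-upgrade step. The sketch below uses only the standard library. The `clear_suffix_cache` helper and its `default_path` parameter are hypothetical, and it assumes that TLDEXTRACT_CACHE, when set, names the cache file; the default location depends on your installation, so it is passed in explicitly rather than guessed.

```python
import os
from pathlib import Path


def clear_suffix_cache(default_path):
    """Delete the suffix cache file; return True if a file was removed.

    Assumption: the TLDEXTRACT_CACHE environment variable, when set,
    points at the cache file. Otherwise `default_path` is used.
    """
    raw = os.environ.get("TLDEXTRACT_CACHE", default_path)
    # Expand '~' ourselves, since a quoted "~/..." is not expanded by the shell.
    cache = Path(raw).expanduser()
    try:
        cache.unlink()
        return True
    except FileNotFoundError:
        # Nothing cached yet, so there is nothing to delete.
        return False
```

After the file is removed, the next extraction that needs the suffix list would have to fetch or rebuild it, so this is best run when network access is available.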

Advanced usage

Specifying your own URL or file for the suffix list data

You can specify your own input data in place of the default Mozilla Public Suffix List:
```
extract = tldextract.TLDExtract(
    suffix_list_urls=["http://foo.bar.baz"],
    # Recommended: Specify your own cache file, to minimize ambiguities about where
    # tldextract is getting its data, or cached data, from.
    cache_file='/path/to/your/cache/file')
```
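The snippet above fetches from the URL you specified the first time the suffix list needs to be downloaded (that is, when cache_file does not exist). If you want to use input data from your local filesystem, use the file:// protocol: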
```
extract = tldextract.TLDExtract(
    suffix_list_urls=["file:///absolute/path/to/your/local/suffix/list/file"],
    cache_file='/path/to/your/cache/file')
```
Use an absolute path in the suffix_list_urls keyword argument; os.path is your friend here.
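One way to build such a URL from a relative or `~`-prefixed path is with the standard library's pathlib, which plays the same role as os.path here. This is a small sketch; the `suffix_list_file_url` helper and the `suffixes.dat` file name are hypothetical.

```python
from pathlib import Path


def suffix_list_file_url(path):
    """Turn any local path into an absolute file:// URL."""
    # resolve() makes the path absolute; as_uri() requires an absolute path
    # and takes care of the leading slashes and percent-encoding.
    return Path(path).expanduser().resolve().as_uri()


# Hypothetical file name, resolved relative to the current working directory.
print(suffix_list_file_url("suffixes.dat"))
```

The resulting string can then be placed in the suffix_list_urls list.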
Original article: https://www.cnblogs.com/ltn26/p/11082648.html