While parsing an XML file, I needed to parse URLs. The code is as follows:
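from urllib.parse import urlparse
# Split the URL into its components (scheme, netloc, path, ...)
result = urlparse("http://sports.sohu.com/20041115/b222992554.shtml")
print(result)
# Take the first dot-separated label of the hostname ("sports" here)
url_lb = result.hostname.strip().split('.')[0]
print(url_lb)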
The output is:
D:\installed\Anaconda3\python.exe E:/文本分类——3/delete.py
ParseResult(scheme='http', netloc='sports.sohu.com', path='/20041115/b222992554.shtml', params='', query='', fragment='')
sports
Process finished with exit code 0
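
One caveat with the snippet above: `urlparse` sets `hostname` to `None` when the URL has no network location (for example a relative link), so calling `.strip()` on it would raise an `AttributeError`. A minimal sketch of a more defensive version follows; the helper name `first_host_label` is made up for illustration and is not part of `urllib`.

```python
from typing import Optional
from urllib.parse import urlparse


def first_host_label(url: str) -> Optional[str]:
    """Return the first dot-separated label of the URL's hostname, or None.

    Hypothetical helper for illustration only.
    """
    hostname = urlparse(url).hostname  # None when the URL has no network location
    if not hostname:
        return None
    return hostname.split(".")[0]


print(first_host_label("http://sports.sohu.com/20041115/b222992554.shtml"))  # sports
print(first_host_label("/20041115/b222992554.shtml"))  # None
```

Returning `None` instead of raising lets the caller decide how to treat entries in the XML file that lack a full URL.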