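The BeautifulSoup library is a library for parsing, traversing, and maintaining HTML and XML documents.
① Installing the BeautifulSoup library:
At the cmd command prompt, enter: pip install beautifulsoup4
② Importing the BeautifulSoup library: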
from bs4 import BeautifulSoup
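The BeautifulSoup library is also called beautifulsoup4 or bs4: beautifulsoup4 is the package name used with pip, and bs4 is the module name used in import statements.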
③ Checking that Beautiful Soup is installed correctly, and using it to parse a web page:
The core code for the whole parsing process:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<p>data</p>', 'html.parser')
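To run the same check as a script rather than at the interactive prompt, a minimal sketch like the following can be used. It prints the installed version through the `bs4.__version__` attribute and parses the same `<p>data</p>` fragment. The function name `check_bs4` is hypothetical, chosen only for this example.

```python
import bs4
from bs4 import BeautifulSoup


def check_bs4():
    """Parse a tiny fragment and return the text inside its <p> tag."""
    soup = BeautifulSoup("<p>data</p>", "html.parser")
    return soup.p.string


if __name__ == "__main__":
    # Reaching this point means the import succeeded, so bs4 is installed.
    print("bs4 version:", bs4.__version__)
    print("parsed text:", check_bs4())  # parsed text: data
```

If the import fails with ModuleNotFoundError, the package is not installed in the Python interpreter being used.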
④ The four parsers available to BeautifulSoup:
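The four parsers usually listed are `html.parser` (bundled with Python), `lxml` (HTML parsing via the lxml package), `xml` (XML parsing via lxml), and `html5lib`. Only `html.parser` works without extra installs; the others need `pip install lxml` or `pip install html5lib`. The sketch below passes each name as the second argument to `BeautifulSoup` and reports which ones are usable on the current machine; when a parser's package is missing, BeautifulSoup raises `bs4.FeatureNotFound`. The helper name `available_parsers` is hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Parser names accepted as BeautifulSoup's second argument.
PARSERS = ["html.parser", "lxml", "xml", "html5lib"]


def available_parsers(markup="<p>data</p>"):
    """Return {parser_name: text of the first <p>}, or None for a parser
    whose package is not installed."""
    results = {}
    for name in PARSERS:
        try:
            soup = BeautifulSoup(markup, name)
            results[name] = soup.p.string
        except FeatureNotFound:
            # Raised when, e.g., lxml or html5lib is not installed.
            results[name] = None
    return results


for parser, text in available_parsers().items():
    print(f"{parser:12} -> {text if text is not None else 'not installed'}")
```

Whichever parser is chosen, the resulting soup object is navigated the same way; the parsers differ mainly in speed and in how they repair malformed markup.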
⑤ Basic elements of the BeautifulSoup class and how to use them:
At the DOS command prompt:
C:\Users\Administrator>python
>>> import requests
>>> r = requests.get("http://python123.io/ws/demo.html")
>>> r.text
>>> demo = r.text
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(demo, "html.parser")
>>> print(soup.prettify())
>>> soup.title
>>> tag = soup.a
>>> tag
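Because the session above depends on fetching a page over the network, the sketch below uses a small inline HTML string instead. Its content is made up for illustration and is not the real demo page. It shows the basic elements: a Tag (`soup.a`), its Name (`.name`), its Attributes (`.attrs`, a dict), and the NavigableString inside it (`.string`). When several tags share a name, `soup.a` returns only the first one.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical HTML used only for illustration.
html = """
<html><head><title>Demo page</title></head>
<body>
<p class="title"><b>Basic elements</b></p>
<p class="course">Links:
<a href="http://example.com/one" class="py1" id="link1">First</a> and
<a href="http://example.com/two" class="py2" id="link2">Second</a>.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

tag = soup.a                     # Tag: the FIRST <a> in the document
print(type(tag) is Tag)          # True
print(tag.name)                  # Name: a
print(tag.parent.name)           # name of the enclosing tag: p
# Attributes as a dict; 'class' is multi-valued, so its value is a list:
print(tag.attrs)  # {'href': 'http://example.com/one', 'class': ['py1'], 'id': 'link1'}
print(tag["href"])               # a single attribute value
print(tag.string)                # NavigableString: First
print(isinstance(tag.string, NavigableString))  # True
print(soup.title.string)         # Demo page
```

To reach the second link, use `soup.find_all("a")[1]` rather than `soup.a`.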
Using Comment:
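A Comment is a special kind of NavigableString that holds the text of an HTML comment (`<!-- ... -->`). `.string` returns the comment text without the `<!--` and `-->` markers, so it looks just like ordinary text; checking the type is how the two are told apart. A minimal sketch, with illustrative markup and a hypothetical helper `is_comment`:

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

markup = "<b><!--This is a comment--></b><p>This is not a comment</p>"
soup = BeautifulSoup(markup, "html.parser")


def is_comment(node):
    """True if node is an HTML comment rather than ordinary text."""
    # Comment subclasses NavigableString, so test for Comment specifically.
    return isinstance(node, Comment)


print(soup.b.string)                   # This is a comment
print(type(soup.b.string).__name__)    # Comment
print(soup.p.string)                   # This is not a comment
print(type(soup.p.string).__name__)    # NavigableString
print(is_comment(soup.b.string), is_comment(soup.p.string))  # True False
```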
⑥ Traversing HTML content with the bs4 library
Downward traversal of the tag tree:
Iterate over child nodes ==>
for child in soup.body.children:
    print(child)
Iterate over descendant nodes ==>
for child in soup.body.descendants:
    print(child)
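The difference between `.children` (direct children only) and `.descendants` (every node below, recursively, depth-first) is easiest to see on a tiny document. In the sketch below the markup is made up and written without whitespace between tags, so no newline text nodes appear; in real pages, whitespace between tags also shows up as NavigableString nodes.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup("<body><p>Hi <b>there</b></p><p>Bye</p></body>", "html.parser")

# .children: only the direct children of <body>, i.e. the two <p> tags.
children = [child.name for child in soup.body.children]

# .descendants: every node below <body>; tags are shown by name,
# text nodes by their string content.
descendants = [
    node.name if isinstance(node, Tag) else str(node)
    for node in soup.body.descendants
]

print(children)     # ['p', 'p']
print(descendants)  # ['p', 'Hi ', 'b', 'there', 'p', 'Bye']
```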
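Upward traversal of the tag tree:
Attribute    Description
.parent      The node's parent tag
.parents     An iterator over the node's ancestor tags, used to loop over all ancestor nodes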
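A small sketch of both attributes on made-up markup. `.parents` walks upward until it reaches the BeautifulSoup object itself, whose `.name` is `'[document]'`; that object's own `.parent` is None, which ends the iteration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<html><body><p><a href='#'>link</a></p></body></html>", "html.parser"
)

# .parent: the single enclosing tag.
print(soup.a.parent.name)  # p

# .parents: every ancestor, nearest first, ending at the soup object.
ancestors = [parent.name for parent in soup.a.parents]
print(ancestors)  # ['p', 'body', 'html', '[document]']
```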
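Parallel (sibling) traversal of the tag tree:
Parallel traversal happens only between nodes that share the same parent node.
1) Iterate over following siblings
for sibling in soup.a.next_siblings:
    print(sibling)
2) Iterate over preceding siblings
for sibling in soup.a.previous_siblings:
    print(sibling)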
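Siblings include text nodes, not only tags, which often surprises readers. The sketch below, on made-up markup, shows that `next_sibling` of the first `<a>` is the text between the links. It also uses `find_next_sibling("a")` to jump straight to the next tag with a given name.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup(
    "<p>Links: <a id='one'>1</a> and <a id='two'>2</a> end</p>", "html.parser"
)
first = soup.a  # the first <a>

# Tags shown by id, text nodes by their string content.
following = [s["id"] if isinstance(s, Tag) else str(s) for s in first.next_siblings]
preceding = [str(s) for s in first.previous_siblings]

print(following)  # [' and ', 'two', ' end']
print(preceding)  # ['Links: ']
print(repr(str(first.next_sibling)))           # ' and ' (a text node, not the second <a>)
print(first.find_next_sibling("a")["id"])      # two
```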