Python Web Crawling and Information Extraction (2): The BeautifulSoup Library

BeautifulSoup is a library for parsing, traversing, and maintaining HTML or XML documents.

① Installing BeautifulSoup:

At the cmd command prompt, run: pip install beautifulsoup4

② Importing BeautifulSoup:
```
from bs4 import BeautifulSoup
```
The BeautifulSoup library is also known as beautifulsoup4 or bs4.

③ Checking that Beautiful Soup is installed, and using it to parse a web page:

The main code for the whole parsing process:
```
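# The package name is bs4 (not bf4).
from bs4 import BeautifulSoup
# Parse a small HTML fragment with Python's built-in parser.
soup = BeautifulSoup('<p>data</p>', 'html.parser')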
```
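As a quick check that the install worked, the snippet above can be extended to print what was parsed. This is only a minimal sketch; the variable name `first_paragraph` is made up for illustration.

```python
from bs4 import BeautifulSoup

# Parse a tiny HTML fragment with the parser bundled with Python.
soup = BeautifulSoup("<p>data</p>", "html.parser")

# If the import and the parse both succeed, the library is installed.
first_paragraph = soup.p          # the first <p> tag in the document
print(first_paragraph)            # <p>data</p>
print(first_paragraph.string)     # data
```

If the import fails with ModuleNotFoundError, the package is not installed in the Python environment being used.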
④ The four BeautifulSoup parsers:
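The post does not spell out the four parsers in text, so the names below are taken from the Beautiful Soup documentation rather than from this article: "html.parser", "lxml", "xml", and "html5lib". Only "html.parser" ships with Python; the others are assumed to need a separate install (lxml or html5lib). A rough sketch that checks which ones are usable on the current machine:

```python
from bs4 import BeautifulSoup, FeatureNotFound

markup = "<p>data</p>"

# Parser names accepted as the second argument of BeautifulSoup().
# "html.parser" ships with Python; "lxml" and "xml" need `pip install lxml`,
# and "html5lib" needs `pip install html5lib`.
parser_names = ["html.parser", "lxml", "xml", "html5lib"]

available = []
for name in parser_names:
    try:
        BeautifulSoup(markup, name)
        available.append(name)
    except FeatureNotFound:
        # Raised by bs4 when the requested parser is not installed.
        print(f"{name}: not installed")

print("usable parsers:", available)
```

The list printed at the end depends on what is installed, so it will differ from machine to machine.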

⑤ Basic elements of the BeautifulSoup class and how to use them:

At the DOS command prompt:

C:\Users\Administrator>python

>>> import requests

>>> r = requests.get("http://python123.io/ws/demo.html")

>>> r.text

>>> demo = r.text

>>> from bs4 import BeautifulSoup

>>> soup = BeautifulSoup(demo, "html.parser")

>>> print(soup.prettify())

>>> soup.title

>>> tag = soup.a

>>> tag
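The session above stops at fetching a tag. Here is a small sketch of what can be read from it afterwards: the tag's name, its attributes, and its string content. To keep it runnable without network access, `demo` here is a hypothetical stand-in for the downloaded page, and the example.com links are made up.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so no network access is needed.
demo = (
    '<html><head><title>Demo page</title></head>'
    '<body><p class="title"><b>Heading</b></p>'
    '<p class="course">Courses: '
    '<a href="http://example.com/1" class="py1" id="link1">Basic Python</a> and '
    '<a href="http://example.com/2" class="py2" id="link2">Advanced Python</a>.</p>'
    '</body></html>'
)
soup = BeautifulSoup(demo, "html.parser")

tag = soup.a                       # first <a> tag in the document
print(tag.name)                    # a
print(tag.attrs)                   # dict of the tag's attributes
print(tag.attrs["href"])           # http://example.com/1
print(tag.string)                  # Basic Python
print(type(tag.string).__name__)   # NavigableString
print(tag.parent.name)             # p
print(soup.title.string)           # Demo page
```

Note that `soup.a` only returns the first matching tag, and that the `class` attribute comes back as a list because an HTML element can carry several classes.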

Using Comment:
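No example survives here in the post, so the following is only a sketch of one way to work with Comment objects. The markup is made up. The point it tries to show is that `.string` returns a comment's text without the `<!-- -->` markers, so a type check is needed to tell a comment from ordinary text.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Made-up markup: one tag holds a comment, the other holds ordinary text.
newsoup = BeautifulSoup(
    "<b><!--This is a comment--></b><p>This is not a comment</p>",
    "html.parser",
)

comment_text = newsoup.b.string
plain_text = newsoup.p.string

print(comment_text)                        # This is a comment
print(isinstance(comment_text, Comment))   # True
print(isinstance(plain_text, Comment))     # False
```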

⑥ Traversing HTML content with the bs4 library

Downward traversal of the tag tree:

Iterate over direct children ==> for child in soup.body.children:

       print(child)

Iterate over all descendants ==> for child in soup.body.descendants:

        print(child)
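To make the difference between the two loops concrete, here is a small sketch on made-up markup: `.children` yields only the direct children, while `.descendants` walks the whole subtree, text nodes included.

```python
from bs4 import BeautifulSoup

# Made-up markup: <body> has two direct children, <p> and <a>.
html = "<body><p><b>bold</b> text</p><a>link</a></body>"
soup = BeautifulSoup(html, "html.parser")

children = list(soup.body.children)
descendants = list(soup.body.descendants)

# Text nodes have no tag name, so .name is None for them.
child_names = [node.name for node in children]
descendant_names = [node.name for node in descendants]

print(child_names)        # ['p', 'a']
print(descendant_names)   # ['p', 'b', None, None, 'a', None]
```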

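Upward traversal of the tag tree:

Attribute .parent      Description    The node's parent tag

Attribute .parents     Description    An iterator over the node's ancestor tags, used to loop over ancestors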
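A short sketch of both attributes on made-up markup. One thing worth knowing: the last item yielded by `.parents` is the BeautifulSoup object itself, whose name is the special string `[document]`.

```python
from bs4 import BeautifulSoup

# Made-up markup with a simple nesting chain: html > body > p > a.
html = "<html><body><p><a>link</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# .parent gives the single enclosing tag.
print(soup.a.parent.name)   # p

# .parents walks upward until it reaches the document root.
ancestor_names = [parent.name for parent in soup.a.parents]
print(ancestor_names)       # ['p', 'body', 'html', '[document]']
```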

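Sideways traversal of the tag tree:

Sideways traversal happens between nodes that share the same parent node.

1) Iterate over the following siblings

for sibling in soup.a.next_siblings:

      print(sibling)

2) Iterate over the preceding siblings

for sibling in soup.a.previous_siblings:

      print(sibling)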
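One easy thing to miss with the sibling loops is that siblings are not always tags: the text between tags counts as a sibling node too. A sketch on made-up markup, with a type check added to keep only the tags:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up markup: text and <a> tags all share the same parent <p>.
html = "<p>Start <a id='first'>one</a> and <a id='second'>two</a> end</p>"
soup = BeautifulSoup(html, "html.parser")

first_link = soup.a
following = list(first_link.next_siblings)
preceding = list(first_link.previous_siblings)

print(following)   # [' and ', <a id="second">two</a>, ' end']
print(preceding)   # ['Start ']

# Keep only the siblings that are tags, skipping plain text nodes.
following_tags = [node for node in following if isinstance(node, Tag)]
print([t["id"] for t in following_tags])   # ['second']
```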
Original article: https://www.cnblogs.com/jianqiao123/p/11176124.html