Using the BeautifulSoup Library for Web Crawling

Extracting information from an HTML page with BeautifulSoup

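#!/usr/bin/python3
import requests
from bs4 import BeautifulSoup

url = 'http://python123.io/ws/demo.html'
r = requests.get(url, timeout=10)
# Stop with an exception on a 4xx/5xx response instead of parsing an error page
r.raise_for_status()
print('Request succeeded')

demo = r.text
# Parse the page with Python's built-in HTML parser
soup = BeautifulSoup(demo, 'html.parser')
# prettify() re-indents the document, one tag or text node per line
print(soup.prettify())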
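To try the parser without network access, BeautifulSoup also accepts an HTML string directly. The sketch below assumes a small made-up page: SAMPLE_HTML and the helper parse_page are hypothetical and are not the contents of the real demo.html.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page (not the real demo.html)
SAMPLE_HTML = (
    "<html><head><title>Demo page</title></head>"
    "<body><p class='title'><b>Intro</b></p></body></html>"
)

def parse_page(html_text):
    """Hypothetical helper: parse an HTML string with Python's built-in parser."""
    return BeautifulSoup(html_text, "html.parser")

soup = parse_page(SAMPLE_HTML)
pretty = soup.prettify()
# One tag or text node per line, indented by nesting depth
print(pretty)
print(soup.title.string)  # Demo page
```

Replacing SAMPLE_HTML with r.text gives the same flow as the script above.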

Basic attributes of BeautifulSoup objects and their tags

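#!/usr/bin/python3
import requests
from bs4 import BeautifulSoup

url = 'http://python123.io/ws/demo.html'
r = requests.get(url, timeout=10)
r.raise_for_status()
print('Request succeeded')

demo = r.text
soup = BeautifulSoup(demo, 'html.parser')
tag_title = soup.title        # the first <title> tag in the document
print(tag_title)
tag_a_attrs = soup.a.attrs    # dict of the first <a> tag's attributes
print(tag_a_attrs)
print(soup.p.string)          # text of the first <p> tag (None if it has more than one child)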
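The same attributes can be checked offline. A minimal sketch, assuming a hypothetical one-paragraph snippet; it also shows the case where .string returns None because a tag has more than one child.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not the real demo page
SAMPLE_HTML = (
    "<p class='course'>Courses: "
    "<a href='http://example.com/1' class='py1' id='link1'>Basic</a></p>"
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
a = soup.a

print(a.name)           # tag name: a
print(a.attrs)          # all attributes as a dict; 'class' is multi-valued, so it maps to a list
print(a["href"])        # a single attribute, read like a dict lookup
print(a.get("title"))   # .get() returns None instead of raising KeyError for a missing attribute
print(a.string)         # the tag's only text child: Basic
print(soup.p.string)    # None: <p> has two children (a text node and <a>)
```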

Downward traversal of the tag tree

#!/usr/bin/python3
import requests
from bs4 import BeautifulSoup

url = 'http://python123.io/ws/demo.html'
r = requests.get(url, timeout=10)
r.raise_for_status()
print('Request succeeded')

demo = r.text
soup = BeautifulSoup(demo, 'html.parser')

print(soup.prettify())
print('divider'.center(80, '-'))

# Iterate over the direct children of <body> (whitespace text nodes such as '\n' included)
for child in soup.body.children:
    print(child)

# Iterate over all descendants of <body>: children, grandchildren, and so on
for descendant in soup.body.descendants:
    print(descendant)
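Because .children includes whitespace text nodes, the loops above print blank lines between tags. A sketch, assuming a hypothetical snippet with newlines between tags, that counts the direct children and keeps only element nodes by testing isinstance(node, Tag):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical snippet with newlines between the tags
SAMPLE_HTML = "<body>\n<p><b>Intro</b></p>\n<p>Text <a href='#'>link</a></p>\n</body>"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
body = soup.body

# .children yields only direct children, text nodes included
children = list(body.children)
child_tags = [c.name for c in body.children if isinstance(c, Tag)]
# .descendants walks the whole subtree depth-first, so it also reaches <b> and <a>
descendant_tags = [d.name for d in body.descendants if isinstance(d, Tag)]

print(len(children), child_tags)  # 5 ['p', 'p']
print(descendant_tags)            # ['p', 'b', 'p', 'a']
```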

Upward traversal of the tag tree

Get the parent (the enclosing tag) of the title tag:

print(soup.title.parent)
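A short sketch of where .parent leads at the top of the tree, assuming a hypothetical minimal page:

```python
from bs4 import BeautifulSoup

# Hypothetical minimal page
SAMPLE_HTML = "<html><head><title>Demo page</title></head><body></body></html>"
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

print(soup.title.parent.name)  # head: the tag that directly encloses <title>
print(soup.html.parent.name)   # [document]: the top-level tag's parent is the BeautifulSoup object
print(soup.parent)             # None: the BeautifulSoup object itself has no parent
```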

# Sibling traversal: the next sibling of the first a tag (this may be a text node rather than a tag)
print(soup.a.next_sibling)
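The next sibling is often a text node, such as the words between two links, rather than the next tag. A sketch under that assumption; next_tag_sibling is a hypothetical helper, and bs4's own find_next_sibling() does the same job.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical snippet: two links separated by plain text
SAMPLE_HTML = "<p><a id='link1'>Basic</a> and <a id='link2'>Advanced</a>.</p>"
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

first_a = soup.a
sib = first_a.next_sibling
# The immediate sibling is the text ' and ', not the second <a>
print(isinstance(sib, NavigableString), repr(str(sib)))

def next_tag_sibling(tag):
    """Hypothetical helper: skip text nodes and return the next sibling Tag, or None."""
    node = tag.next_sibling
    while node is not None and not isinstance(node, Tag):
        node = node.next_sibling
    return node

print(next_tag_sibling(first_a)["id"])       # link2
print(first_a.find_next_sibling("a")["id"])  # link2, via bs4's built-in search
```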

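Iterate over all following and preceding siblings of the a tag

# Iterate over the siblings that follow the first a tag
for sibling in soup.a.next_siblings:
    print(sibling)
# Iterate over the siblings that precede the first a tag (nearest first)
for sibling in soup.a.previous_siblings:
    print(sibling)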
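The two generators walk in opposite directions: next_siblings moves forward in document order, while previous_siblings starts from the nearest preceding node and moves backward. A sketch with a hypothetical paragraph; label is a hypothetical helper that shows tags by name and text nodes by their stripped text.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical paragraph: text and links interleaved
SAMPLE_HTML = "<p>Intro <a id='link1'>A</a> mid <a id='link2'>B</a> end</p>"
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
second_a = soup.find(id="link2")

def label(node):
    """Hypothetical helper: tags by name, text nodes by their stripped text."""
    return node.name if isinstance(node, Tag) else node.strip()

after = [label(s) for s in second_a.next_siblings]       # forward, document order
before = [label(s) for s in second_a.previous_siblings]  # backward, nearest first
print(after)   # ['end']
print(before)  # ['mid', 'a', 'Intro']
```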

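Note on upward traversal: the BeautifulSoup object (soup) is the root of the tree, and its own .parent is None. Code that climbs the tree one .parent at a time must therefore check for None before reading attributes such as .name.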
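A minimal sketch of that check, assuming a hypothetical nested snippet: climb (a hypothetical helper) follows .parent until it reaches None. The built-in .parents generator stops at the same point, which the last line compares.

```python
from bs4 import BeautifulSoup

# Hypothetical nested snippet
SAMPLE_HTML = "<html><body><p><a href='#'>link</a></p></body></html>"
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

def climb(tag):
    """Hypothetical helper: names of all enclosing elements, nearest first."""
    names = []
    node = tag.parent
    # soup.parent is None, so this check is what ends the walk at the root
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names

print(climb(soup.a))                     # ends with '[document]', the BeautifulSoup object
print([p.name for p in soup.a.parents])  # .parents yields the same chain
```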

Original article: https://www.cnblogs.com/liberate20/p/10778322.html