Basic Elements of the BeautifulSoup Library

The BeautifulSoup library

<html>
    <body>
        <p class='title'></p>
    </body>
</html>

BeautifulSoup is a library for parsing, traversing, and maintaining the "tag tree" of a document such as the one above.
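
To make the idea of a tag tree concrete, here is a minimal sketch that parses the small document shown above and walks it. The variable names (`markup`, `tag_names`, `parent_of_p`) are illustrative only and do not come from the original notes.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# The same small document used at the top of this post.
markup = """<html>
    <body>
        <p class='title'></p>
    </body>
</html>"""

soup = BeautifulSoup(markup, "html.parser")

# Walk every node below the root and keep only real tags
# (the whitespace between tags is stored as string nodes).
tag_names = [node.name for node in soup.descendants if isinstance(node, Tag)]

# Move upward in the tree: the parent of <p> is <body>.
parent_of_p = soup.p.parent.name

print(tag_names)
print(parent_of_p)
```

Filtering with `isinstance(node, Tag)` is one simple way to skip the whitespace strings; other traversal styles would work just as well.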

Understanding tags

<p class='title'></p>
<!-- a pair of angle-bracket tags, plus attributes -->

Importing the BeautifulSoup library

from bs4 import BeautifulSoup

import bs4

Constructing a BeautifulSoup object to parse HTML

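from bs4 import BeautifulSoup
# Parse markup held in a string
soup1 = BeautifulSoup("<html>data</html>", "html.parser")
# Parse markup from a file; the with-block closes the file afterwards
with open("D://demo.html") as f:
    soup2 = BeautifulSoup(f, "html.parser")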

A BeautifulSoup object corresponds to the entire content of one HTML/XML document.
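
As a rough illustration of "one object, one whole document", the sketch below builds a soup from a short string and then reads the complete document back out of it. The markup and the variable names are made up for this example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body><p>data</p></body></html>", "html.parser")

# The soup object itself is the root of the tree; bs4 gives it a special name.
root_name = soup.name

# Converting the soup back to a string yields the whole document.
whole_doc = str(soup)

# get_text() gathers all text in the document, with the markup stripped.
all_text = soup.get_text()

print(root_name)
print(whole_doc)
print(all_text)
```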

Four parsers

Parser	Usage	Requirement
Python's built-in HTML parser (used through bs4)	BeautifulSoup(mk,'html.parser')	Install the bs4 library
lxml's HTML parser	BeautifulSoup(mk,'lxml')	pip install lxml
lxml's XML parser	BeautifulSoup(mk,'xml')	pip install lxml
html5lib's parser	BeautifulSoup(mk,'html5lib')	pip install html5lib
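
Since three of the four parsers depend on optional packages, one possible approach is to try them in order of preference and fall back to `html.parser`. The helper `make_soup` below is hypothetical (it is not part of bs4), and the preference order is just an assumption for the sketch.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Hypothetical helper: return (soup, parser_name) using the first parser that is installed."""
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            # bs4 raises FeatureNotFound when the requested parser is not installed.
            continue
    raise RuntimeError("none of the requested parsers is available")

soup, used_parser = make_soup("<html><body><p>data</p></body></html>")
p_text = soup.find("p").string

print(used_parser, p_text)
```

Keep in mind that the parsers can build different trees from malformed or partial markup, so a fallback like this is safest with complete, well-formed documents.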

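Five basic elements

Element	Description
Tag	A tag, opened with <> and closed with </>
Name	The tag's name, accessed as <tag>.name
Attributes	The tag's attributes, organized as a dict, accessed as <tag>.attrs
NavigableString	The non-attribute string inside a tag, accessed as <tag>.string
Comment	The comment part of the string inside a tag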
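
The five elements can be seen side by side in a short offline sketch. The markup here is invented for illustration; it is not the demo page used later in this post.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag, NavigableString, Comment

markup = '<p class="title" id="t1">Hello</p><b><!--a note--></b>'
soup = BeautifulSoup(markup, "html.parser")

tag = soup.p                # Tag
name = tag.name             # Name
attrs = tag.attrs           # Attributes: a dict (class is stored as a list)
text = tag.string           # NavigableString
comment = soup.b.string     # Comment

print(type(tag).__name__, name, attrs)
print(type(text).__name__, str(text))
print(type(comment).__name__, str(comment))
```

Note that a Comment is itself a kind of NavigableString, and its value carries no comment markers, so checking the type is the reliable way to tell ordinary text and comments apart.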

Demo: extracting information from a page

from bs4 import BeautifulSoup
import requests

html = requests.get('http://python123.io/ws/demo.html').text
soup = BeautifulSoup(html, 'html.parser')
tag = soup.a  # the first <a> tag
name = tag.name  # 'a', the tag's name
parentName = soup.a.parent.name  # name of the parent node
attr = tag.attrs  # the tag's attributes, as a dict
attr['class']  # look up a single attribute of the tag
type(attr)  # dict
tag.string  # the text between the opening and closing tags
newsoup = BeautifulSoup('<b><!--This is a comment--></b><p>This is not a comment</p>', 'html.parser')
type(newsoup.b.string)  # Comment
type(newsoup.p.string)  # NavigableString (ordinary text)
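
One detail worth keeping in mind when reading `.string`: it only returns a value when the tag's content is unambiguous. If a tag has more than one child, `.string` is `None`, and `get_text()` is the usual way to collect the text instead. A small sketch with made-up markup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>one <b>two</b> three</p>", "html.parser")

# <p> has three children (text, <b>, text), so .string cannot pick one.
p_string = soup.p.string

# <b> has exactly one child, a string, so .string returns it.
b_string = soup.p.b.string

# get_text() joins all the text found beneath the tag.
p_text = soup.p.get_text()

print(p_string, b_string, p_text)
```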

Original article: https://www.cnblogs.com/mengxiaoleng/p/11581407.html