1. Installation
Download: https://pypi.python.org/pypi/beautifulsoup4/4.5.3
Install with pip (the package name is beautifulsoup4):
pip install beautifulsoup4
Collecting beautifulsoup4
Downloading beautifulsoup4-4.5.3-py3-none-any.whl (85kB)
100% |████████████████████████████████| 92kB 460kB/s
Installing collected packages: beautifulsoup4
Successfully installed beautifulsoup4-4.5.3
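To check that the installation succeeded, run from bs4 import BeautifulSoup in Python. If no ImportError is raised, the package is installed.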
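A slightly more thorough check also prints the installed version and parses a tiny document end to end. This is a minimal sketch that assumes bs4 is importable in the current interpreter and uses Python's built-in html.parser so that no extra parser package is needed. The printed version string depends on your install.

```python
# Minimal install check: this import raises ImportError if bs4 is missing
import bs4
from bs4 import BeautifulSoup

# The package exposes its version string, e.g. 4.5.3 for the wheel installed above
print(bs4.__version__)

# Parse a tiny document with the built-in html.parser to confirm parsing works
soup = BeautifulSoup("<p>ok</p>", "html.parser")
print(soup.p.string)
```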
2. Example
from bs4 import BeautifulSoup
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
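# Parse the string; naming the parser explicitly avoids the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, "html.parser")
# Alternatively, parse a local file (index.html must exist in the working directory):
# with open('index.html', encoding='utf-8') as f:
#     soup = BeautifulSoup(f, "html.parser")
print(soup.prettify())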
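BeautifulSoup accepts either a string or an open file handle. The file route assumes an index.html already exists, so the following self-contained sketch writes shortened, illustrative markup to a temporary file first and checks that both routes build the same tree. The temporary file and the shortened markup are assumptions made only for this demonstration.

```python
import os
import tempfile
from bs4 import BeautifulSoup

# Shortened version of the sample document (illustrative)
html = "<html><head><title>The Dormouse's story</title></head><body><p>...</p></body></html>"

# 1) Parse directly from a string
soup_from_string = BeautifulSoup(html, "html.parser")

# 2) Parse from a file object; write the markup to a temporary file first
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, encoding="utf-8") as tmp:
    tmp.write(html)
    path = tmp.name

try:
    # A context manager closes the file, unlike a bare open('index.html')
    with open(path, encoding="utf-8") as f:
        soup_from_file = BeautifulSoup(f, "html.parser")
finally:
    os.remove(path)

# Both routes should serialize to identical markup
print(soup_from_file.title.string)
print(str(soup_from_string) == str(soup_from_file))
```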
3. Introduction to Beautiful Soup
Beautiful Soup converts a complex HTML document into a tree in which every node is a Python object. All of these objects fall into four types:
- Tag
- NavigableString
- BeautifulSoup
- Comment
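To see the four types side by side, the sketch below inspects one object of each kind in a trimmed copy of the sample document (the trimming is for brevity only). It relies on two facts about bs4's class hierarchy: BeautifulSoup is a subclass of Tag, and Comment is a subclass of NavigableString. So when dispatching with isinstance, test the more specific class first.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag, NavigableString, Comment

# Trimmed copy of the sample document (illustrative)
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")  # BeautifulSoup: the whole document

title_tag = soup.title            # Tag: an HTML element
title_text = soup.title.string    # NavigableString: the text inside a tag
comment = soup.a.string           # Comment: the <!-- Elsie --> comment inside <a>

for label, obj in [("soup", soup), ("title", title_tag),
                   ("title text", title_text), ("a text", comment)]:
    print(label, type(obj).__name__)
```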
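Tag: a Tag corresponds to a tag in the original HTML. The examples below reuse the soup built in section 2. Attribute-style access such as soup.p returns only the first matching tag.
print(soup.title)             # <title>The Dormouse's story</title>
print(soup.head)
print(soup.a)                 # first <a> in the document
print(soup.p)                 # first <p> in the document
print(soup.name)              # the BeautifulSoup object itself is named [document]
print(soup.head.name)         # head
print(soup.p.attrs)           # all attributes as a dict; class is returned as a list
print(soup.p.get('class'))    # ['title']
soup.p['class'] = "newClass"  # attributes can be modified in place
print(soup.p)
print(soup.p.string)          # The Dormouse's story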
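Two details of the accessors used above are easy to trip over. First, .string only returns a value when a tag has a single child: it follows single-child chains downward, which is why soup.p.string reaches the text inside <b>, and it returns None when the tag has several children. Second, .get() returns None for a missing attribute, where [] raises KeyError. The sketch below uses shortened, illustrative markup. get_text() is shown as the usual alternative when a tag has several children.

```python
from bs4 import BeautifulSoup

# Single-child chain: <p> -> <b> -> text, so .string descends to the text
one = BeautifulSoup("<p class='title'><b>The Dormouse's story</b></p>", "html.parser")
print(one.p.string)

# Mixed children (text, <a>, text): .string is ambiguous, so it returns None
many = BeautifulSoup("<p class='story'>Once upon a time <a id='link2'>Lacie</a> and more.</p>",
                     "html.parser")
print(many.p.string)
# get_text() concatenates all descendant strings instead
print(many.p.get_text())

# class is multi-valued, so it comes back as a list
print(one.p['class'])
# .get() returns None for a missing attribute; one.p['id'] would raise KeyError
print(one.p.get('id'))
```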