Traversing the Document Tree
1. Direct children

(1) .contents

A tag's .contents attribute returns the tag's children as a list, so a specific element can be fetched by list index.

>>> print(soup.head.contents)

[<title>The Dormouse's story</title>]

>>> print(soup.head.contents[0])

<title>The Dormouse's story</title>

(2) .children

.children returns an iterator over the tag's direct children rather than a list. To get at the contents, loop over it with for.

>>> print(soup.head.children)

<list_iterator object at 0x02E29A10>

>>> for child in soup.head.children:

...     print(child)

<title>The Dormouse's story</title>
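
The difference between the two attributes can be seen in a small self-contained sketch. The HTML string below is a hypothetical sample, not the document used in this article, and the built-in "html.parser" backend is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document (an assumption, not the article's own data).
html = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p>One</p><p>Two</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")
body = soup.body

# .contents is a real list: it supports len() and indexing.
count = len(body.contents)
first = body.contents[0]

# .children iterates over the same direct children without building a list.
names = [child.name for child in body.children]

print(count)   # 2
print(first)   # <p>One</p>
print(names)   # ['p', 'p']
```

Use .contents when an index or a length is needed, and .children when a single pass over the children is enough.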

2. All descendants

.descendants

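The .contents and .children attributes only cover a tag's direct children. The .descendants attribute walks recursively through all of a tag's descendants: its children, their children, and so on.

Like .children, it has to be iterated to get at the contents.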

for child in soup.descendants:

    print(child)

Running this prints every node in the document.
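
As a rough illustration of how .descendants differs from .children, the sketch below counts both for a `<head>` tag. The HTML string is a hypothetical sample and "html.parser" is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document (an assumption, not the article's own data).
html = "<html><head><title>The Dormouse's story</title></head></html>"
soup = BeautifulSoup(html, "html.parser")
head = soup.head

direct = list(head.children)      # only the <title> tag
nested = list(head.descendants)   # the <title> tag and the string inside it

print(len(direct))   # 1
print(len(nested))   # 2
print(nested[1])     # The Dormouse's story
```

The string inside `<title>` is a descendant of `<head>` but not one of its direct children.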

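3. Node content

Key point: the .string attribute

If a tag has exactly one child and that child is a NavigableString, .string returns that child.

If a tag's only child is another tag, .string can also be used. It returns the same result as calling .string on that child.

Put simply: if a tag contains no other tags, .string returns its text. If a tag contains exactly one tag, .string also returns the innermost text.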
```
>>> soup.head.string
"The Dormouse's story"
>>> soup.title.string
"The Dormouse's story"
>>> # If a tag has several children, .string cannot tell which child's
>>> # content to return, so the result is None.
>>> print(soup.html.string)
None
```
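
The two cases can be compared side by side in a minimal sketch. The HTML string is a hypothetical sample and "html.parser" is assumed. get_text() is shown as one possible fallback when .string is None.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document (an assumption, not the article's own data).
html = "<div><p><b>only text</b></p><p>first<i>second</i></p></div>"
soup = BeautifulSoup(html, "html.parser")

single, mixed = soup.find_all("p")

# <p> has one child <b>, which has one string child, so .string resolves.
single_string = single.string

# <p> has two children (a string and an <i> tag), so .string is None.
mixed_string = mixed.string

# get_text() concatenates all the strings under the tag.
mixed_text = mixed.get_text()

print(single_string)   # only text
print(mixed_string)    # None
print(mixed_text)      # firstsecond
```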
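4. Multiple strings

Key points: the .strings attribute (not to be confused with .string) and the .stripped_strings attribute

.strings

.strings yields every string under a tag. It has to be iterated to see the contents.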
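```
>>> for string in soup.strings:
...     print(repr(string))
```
.stripped_strings

The strings produced this way may contain a lot of whitespace. .stripped_strings skips whitespace-only strings and strips leading and trailing whitespace from the rest.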

5. Parent node

Key point: the .parent attribute
```
p = soup.p
print(p.parent.name)
# body

content = soup.head.title.string
print(content.parent.name)
# title
```
.parent returns the node's parent.

6. All parents

Key point: the .parents attribute

An element's .parents attribute walks up the tree and yields each of the element's ancestors in turn.
```
>>> content = soup.head.title.string
>>> content
"The Dormouse's story"
>>> for parent in content.parents:
...     print(parent.name)
...
title
head
html
[document]
>>> content.parents  # a generator
<generator object parents at 0x02E465D0>
```
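
Because .parents is a generator, the ancestor chain can be collected into a list in one step. The sketch below assumes a hypothetical HTML sample and the "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document (an assumption, not the article's own data).
html = "<html><head><title>The Dormouse's story</title></head></html>"
soup = BeautifulSoup(html, "html.parser")

text = soup.title.string

# Collect the name of every ancestor, from the nearest one outward.
chain = [parent.name for parent in text.parents]

print(chain)   # ['title', 'head', 'html', '[document]']
```

The last entry, [document], is the BeautifulSoup object itself, which sits at the top of the tree.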
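7. Sibling nodes

Key points: the .next_sibling and .previous_sibling attributes

Siblings are nodes at the same level as the current node. .next_sibling returns the node's next sibling,

and .previous_sibling returns the one before it. If there is no such node, the result is None.

Note: in real documents, a tag's .next_sibling and .previous_sibling are usually strings or whitespace,

because whitespace and line breaks between tags also count as nodes, so the result may be just whitespace or a newline.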
```
print(soup.p.next_sibling)
#       this is actually whitespace
print(soup.p.previous_sibling)
# None if there is no previous sibling; if whitespace precedes the <p>
# in the source, that whitespace string is returned instead
print(soup.p.next_sibling.next_sibling)
#<p class="story">Once upon a time there were three little sisters; and their names were
#<a class="sister" href="http://example.com/elsie" id="link1"></a>,
#<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
#<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
#and they lived at the bottom of a well.</p>
# the sibling after the next sibling is the first visible node
```
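
When the whitespace nodes get in the way, one option is find_next_sibling(), which returns the next sibling that is a tag. The sketch below assumes a hypothetical HTML sample and the "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document (an assumption, not the article's own data).
html = "<body>\n<p>first</p>\n<p>second</p>\n</body>"
soup = BeautifulSoup(html, "html.parser")

first = soup.p

# The raw neighbours are the newline strings between the tags.
raw_next = first.next_sibling
raw_prev = first.previous_sibling

# find_next_sibling("p") skips text nodes and returns the next <p> tag.
tag_next = first.find_next_sibling("p")

print(repr(raw_next))   # '\n'
print(repr(raw_prev))   # '\n'
print(tag_next)         # <p>second</p>
```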
8. All siblings

Key points: the .next_siblings and .previous_siblings attributes
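
These two attributes are generators over all the siblings after or before a node. A minimal sketch, assuming a hypothetical HTML sample with no whitespace between tags and the "html.parser" backend:

```python
from bs4 import BeautifulSoup

# Hypothetical sample document (an assumption, not the article's own data).
html = "<p><a>one</a><a>two</a><a>three</a></p>"
soup = BeautifulSoup(html, "html.parser")

first = soup.a
last = soup.find_all("a")[-1]

# .next_siblings yields the following siblings in document order.
following = [sib.get_text() for sib in first.next_siblings]

# .previous_siblings yields the preceding siblings, nearest one first.
preceding = [sib.get_text() for sib in last.previous_siblings]

print(following)   # ['two', 'three']
print(preceding)   # ['two', 'one']
```

In a document with whitespace between tags, both generators would also yield those whitespace strings.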
Original article: https://www.cnblogs.com/themost/p/6682013.html