• BeautifulSoup学习笔记


    bs4 中 BeautifulSoup 常用命令:  BeautifulSoup,prettify,find,find_all,get,get_text

    示例:

    scenery.html如下:

     1 <html lang="en">
     2 <head>
     3     <meta charset="UTF-8">
     4         <title>武汉旅游景点</title>
     5     <meta name="description" content="武汉旅游景点 精简版" />
     6     <meta name="authot" content="hstking" >
     7 </head>
     8 <body>
     9     <div id="content">
    10         <div class="title">
    11             <h3>武汉景点</h3>
    12         </div>
    13         <ul class="table">
    14             <li>景点<a>门票价格</a></li>
    15         </ul>
    16         <ul class="content">
    17             <li nu="1">东湖 <a class="price">60</a></li>
    18             <li nu="2">磨山 <a class="price">60</a></li>
    19             <li nu="3">欢乐谷 <a class="price">108</a></li>
    20             <li nu="4">海洋世界 <a class="price">150</a></li>
    21             <li nu="5">水上乐园 <a class="price">150</a></li>
    22         </ul>
    23     </div>
    24 </body>
    25 </html>
    View Code

    操作代码如下:

     1 from bs4 import BeautifulSoup
     2 soup = BeautifulSoup(open('scenery.html',encoding='utf-8'),'lxml')  #常用lxml解析
     3 print("soup类型:	",end="");print(type(soup))
     4 
     5 l = soup.find('li')    #find()可以指定属性attrs,只返回一个结果
     6 #l = soup.find('li',attrs={'nu':'3'})
     7 l_all = soup.find_all('li') #find_all()返回所有结果
     8 #l_all = soup.find_all('li',attrs={'nu':'3'})
     9 print("find()的内容:	",end="");print(l)
    10 print("find()返回类型:	",end="");print(type(l))
    11 print("find_all()的内容:	",end="");print(l_all)
    12 print("find_all()返回类型:	",end="");print(type(l_all))    #find_all()返回类型可隐式变换为list
    13 print("find_all()中第3个元素的值:	 ",end="");print(l_all[2])
    14 print("find_all()中元素类型:	",end="");print(type(l_all[2]))
    15 print("find()可获取下级标签的内容:	",end="");print(l.find('a'))
    16 print("get()可获取属性的值:	",end="");print(l.get('nu'))
    17 print("get()返回类型:	",end="");print(type(l.get('nu')))
    18 print("get_text()可获取标签包含的字符串:	",end="");print(l.a.get_text())
    19 print("get_text()返回类型:	",end="");print(type(l.a.get_text()))
    View Code

    结果如下:

  • 相关阅读:
    CSS3--5.颜色属性
    CSS3---4.伪元素选择器
    CSS3---3.相对父元素的伪类
    CSS3---2.兄弟选择器(准确来说叫弟弟选择器,只能向下选)
    CSS3---简介与现状
    CSS3---1.属性选择器
    HTML5---22.LocalStorage的应用
    HTML5---21.SessionStorage的应用
    HTML5---19.地理定位的接口使用
    一首新裤子歌曲
  • 原文地址:https://www.cnblogs.com/ShadowCharle/p/10731959.html
Copyright © 2020-2023  润新知