• Using the children and descendants attributes of the BeautifulSoup library


    Example site: http://www.pythonscraping.com/pages/page3.html

    The relevant part of the page source:

    <table id="giftList">
    <tr><th>
    Item Title
    </th><th>
    Description
    </th><th>
    Cost
    </th><th>
    Image
    </th></tr>
    
    <tr id="gift1" class="gift"><td>
    Vegetable Basket
    </td><td>
    This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
    <span class="excitingNote">Now with super-colorful bell peppers!</span>
    </td><td>
    $15.00
    </td><td>
    <img src="../img/gifts/img1.jpg">
    </td></tr>
    
    <tr id="gift2" class="gift"><td>
    Russian Nesting Dolls
    </td><td>
    Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
    </td><td>
    $10,000.52
    </td><td>
    <img src="../img/gifts/img2.jpg">
    </td></tr>
    
    <tr id="gift3" class="gift"><td>
    Fish Painting
    </td><td>
    If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
    </td><td>
    $10,005.00
    </td><td>
    <img src="../img/gifts/img3.jpg">
    </td></tr>
    
    <tr id="gift4" class="gift"><td>
    Dead Parrot
    </td><td>
    This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
    </td><td>
    $0.50
    </td><td>
    <img src="../img/gifts/img4.jpg">
    </td></tr>
    
    <tr id="gift5" class="gift"><td>
    Mystery Box
    </td><td>
    If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
    </td><td>
    $1.50
    </td><td>
    <img src="../img/gifts/img6.jpg">
    </td></tr>
    </table>
    

     1. Using the children attribute

     

    # -*- coding: utf-8 -*-
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    # Fetch the example page and parse it with the lxml parser
    html = urlopen("http://www.pythonscraping.com/pages/page3.html")
    bsObj = BeautifulSoup(html, "lxml")

    # .children yields only the direct children of the table
    for child in bsObj.find("table", {"id": "giftList"}).children:
        print(child)
    

     

     Output:

    <tr><th>
    Item Title
    </th><th>
    Description
    </th><th>
    Cost
    </th><th>
    Image
    </th></tr>
    
    
    <tr class="gift" id="gift1"><td>
    Vegetable Basket
    </td><td>
    This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
    <span class="excitingNote">Now with super-colorful bell peppers!</span>
    </td><td>
    $15.00
    </td><td>
    <img src="../img/gifts/img1.jpg"/>
    </td></tr>
    
    
    <tr class="gift" id="gift2"><td>
    Russian Nesting Dolls
    </td><td>
    Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
    </td><td>
    $10,000.52
    </td><td>
    <img src="../img/gifts/img2.jpg"/>
    </td></tr>
    
    
    <tr class="gift" id="gift3"><td>
    Fish Painting
    </td><td>
    If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
    </td><td>
    $10,005.00
    </td><td>
    <img src="../img/gifts/img3.jpg"/>
    </td></tr>
    
    
    <tr class="gift" id="gift4"><td>
    Dead Parrot
    </td><td>
    This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
    </td><td>
    $0.50
    </td><td>
    <img src="../img/gifts/img4.jpg"/>
    </td></tr>
    
    
    <tr class="gift" id="gift5"><td>
    Mystery Box
    </td><td>
    If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
    </td><td>
    $1.50
    </td><td>
    <img src="../img/gifts/img6.jpg"/>
    </td></tr>
    

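     Reading the name literally:

    children refers to the nodes directly below the parent, one level down. It is an attribute, not a method, so it is written without parentheses. In this program the children of the table are its tr tags, plus the newline text nodes between them, which is why blank lines appear in the output.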
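
A minimal sketch of the same idea without a network request. It assumes a trimmed-down copy of the giftList table held in a string (the `html` value here is illustrative, not the live page) and uses Python's built-in `html.parser` instead of lxml. It shows that `.children` yields whitespace text nodes as well as tags, and one way to keep only the tags.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical, trimmed-down copy of the giftList table (not the live page).
html = (
    '<table id="giftList">\n'
    '<tr><th>Item Title</th><th>Cost</th></tr>\n'
    '<tr id="gift1" class="gift"><td>Vegetable Basket</td><td>$15.00</td></tr>\n'
    '</table>'
)

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "giftList"})

# .children is an attribute that yields only the direct children.
children = list(table.children)

# The newlines between the rows are text nodes, and they are children too.
text_nodes = [c for c in children if isinstance(c, NavigableString)]

# Keep only element nodes to get the rows themselves.
rows = [c for c in children if isinstance(c, Tag)]

for row in rows:
    print(row.name, row.get("id"))
```

Filtering with `isinstance(c, Tag)` is one option; skipping children whose text is empty after `strip()` would work as well for this table.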

    Experiment: does replacing children with tr give the same result?

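    # -*- coding: utf-8 -*-
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    # Fetch the example page and parse it with the lxml parser
    html = urlopen("http://www.pythonscraping.com/pages/page3.html")
    bsObj = BeautifulSoup(html, "lxml")

    # .tr returns the first tr tag; iterating a tag walks its direct children
    for child in bsObj.find("table", {"id": "giftList"}).tr:
        print(child)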
    

     Output:

    <th>
    Item Title
    </th>
    <th>
    Description
    </th>
    <th>
    Cost
    </th>
    <th>
    Image
    </th>
    

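     Analysis: the results differ. children lists every direct child of the table, whereas .tr returns only the first tr tag. Iterating over that tag walks its own direct children, so the loop prints the th cells of the header row instead of the table rows.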
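
The difference can be checked on a small inline table. This is a sketch under the same assumptions as before: the `html` string is an illustrative stand-in for the real page, and `html.parser` replaces lxml. It also shows `find_all("tr", recursive=False)`, which is one way to collect every row that is a direct child of the table.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down copy of the giftList table (not the live page).
html = (
    '<table id="giftList">\n'
    '<tr><th>Item Title</th><th>Cost</th></tr>\n'
    '<tr id="gift1" class="gift"><td>Vegetable Basket</td><td>$15.00</td></tr>\n'
    '</table>'
)

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "giftList"})

# table.tr is shorthand for table.find("tr"): the first tr only.
first_row = table.tr

# Iterating a tag walks its direct children, here the two th cells.
cells = list(first_row)
print([cell.get_text() for cell in cells])

# To get every row that sits directly under the table, search one level deep.
all_rows = table.find_all("tr", recursive=False)
print(len(all_rows))
```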

    2. Using the descendants attribute

    # -*- coding: utf-8 -*-
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    # Fetch the example page and parse it with the lxml parser
    html = urlopen("http://www.pythonscraping.com/pages/page3.html")
    bsObj = BeautifulSoup(html, "lxml")

    # .descendants yields every node under the table, at any depth
    for child in bsObj.find("table", {"id": "giftList"}).descendants:
        print(child)
    

     Output:

    <tr><th>
    Item Title
    </th><th>
    Description
    </th><th>
    Cost
    </th><th>
    Image
    </th></tr>
    <th>
    Item Title
    </th>
    
    Item Title
    
    <th>
    Description
    </th>
    
    Description
    
    <th>
    Cost
    </th>
    
    Cost
    
    <th>
    Image
    </th>
    
    Image
    
    
    
    <tr class="gift" id="gift1"><td>
    Vegetable Basket
    </td><td>
    This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
    <span class="excitingNote">Now with super-colorful bell peppers!</span>
    </td><td>
    $15.00
    </td><td>
    <img src="../img/gifts/img1.jpg"/>
    </td></tr>
    <td>
    Vegetable Basket
    </td>
    
    Vegetable Basket
    
    <td>
    This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
    <span class="excitingNote">Now with super-colorful bell peppers!</span>
    </td>
    
    This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
    
    <span class="excitingNote">Now with super-colorful bell peppers!</span>
    Now with super-colorful bell peppers!
    
    
    <td>
    $15.00
    </td>
    
    $15.00
    
    <td>
    <img src="../img/gifts/img1.jpg"/>
    </td>
    
    
    <img src="../img/gifts/img1.jpg"/>
    
    
    
    
    <tr class="gift" id="gift2"><td>
    Russian Nesting Dolls
    </td><td>
    Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
    </td><td>
    $10,000.52
    </td><td>
    <img src="../img/gifts/img2.jpg"/>
    </td></tr>
    <td>
    Russian Nesting Dolls
    </td>
    
    Russian Nesting Dolls
    
    <td>
    Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
    </td>
    
    Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 
    <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
    8 entire dolls per set! Octuple the presents!
    
    
    <td>
    $10,000.52
    </td>
    
    $10,000.52
    
    <td>
    <img src="../img/gifts/img2.jpg"/>
    </td>
    
    
    <img src="../img/gifts/img2.jpg"/>
    
    
    
    
    <tr class="gift" id="gift3"><td>
    Fish Painting
    </td><td>
    If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
    </td><td>
    $10,005.00
    </td><td>
    <img src="../img/gifts/img3.jpg"/>
    </td></tr>
    <td>
    Fish Painting
    </td>
    
    Fish Painting
    
    <td>
    If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
    </td>
    
    If something seems fishy about this painting, it's because it's a fish! 
    <span class="excitingNote">Also hand-painted by trained monkeys!</span>
    Also hand-painted by trained monkeys!
    
    
    <td>
    $10,005.00
    </td>
    
    $10,005.00
    
    <td>
    <img src="../img/gifts/img3.jpg"/>
    </td>
    
    
    <img src="../img/gifts/img3.jpg"/>
    
    
    
    
    <tr class="gift" id="gift4"><td>
    Dead Parrot
    </td><td>
    This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
    </td><td>
    $0.50
    </td><td>
    <img src="../img/gifts/img4.jpg"/>
    </td></tr>
    <td>
    Dead Parrot
    </td>
    
    Dead Parrot
    
    <td>
    This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
    </td>
    
    This is an ex-parrot! 
    <span class="excitingNote">Or maybe he's only resting?</span>
    Or maybe he's only resting?
    
    
    <td>
    $0.50
    </td>
    
    $0.50
    
    <td>
    <img src="../img/gifts/img4.jpg"/>
    </td>
    
    
    <img src="../img/gifts/img4.jpg"/>
    
    
    
    
    <tr class="gift" id="gift5"><td>
    Mystery Box
    </td><td>
    If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
    </td><td>
    $1.50
    </td><td>
    <img src="../img/gifts/img6.jpg"/>
    </td></tr>
    <td>
    Mystery Box
    </td>
    
    Mystery Box
    
    <td>
    If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
    </td>
    
    If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. 
    <span class="excitingNote">Keep your friends guessing!</span>
    Keep your friends guessing!
    
    
    <td>
    $1.50
    </td>
    
    $1.50
    
    <td>
    <img src="../img/gifts/img6.jpg"/>
    </td>
    
    
    <img src="../img/gifts/img6.jpg"/>
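
The output above repeats each piece of content several times because `.descendants` visits a tag, then everything nested inside it, in document order. A minimal sketch that makes this easier to see, assuming a single illustrative row in an inline string (not the live page) and `html.parser` instead of lxml:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical one-row table modelled on the gift4 row (not the live page).
html = (
    '<table id="giftList">'
    '<tr id="gift4" class="gift">'
    '<td>This is an ex-parrot! '
    '<span class="excitingNote">Or maybe he\'s only resting?</span></td>'
    '<td>$0.50</td>'
    '</tr>'
    '</table>'
)

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "giftList"})

# Direct children only: one level below the table.
direct = [c.name for c in table.children if isinstance(c, Tag)]

# All nested tags, at any depth, in document order.
nested = [d.name for d in table.descendants if isinstance(d, Tag)]

# Text nodes are descendants as well.
texts = [str(d) for d in table.descendants if not isinstance(d, Tag)]

print(direct)
print(nested)
print(texts)
```

Here `direct` contains only the tr, while `nested` also reaches the td and span tags beneath it.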
    
  • Original post: https://www.cnblogs.com/chensimin1990/p/6725803.html