BeautifulSoup库children(),descendants()方法的使用
示例网站:http://www.pythonscraping.com/pages/page3.html
网站内容:
网站部分重要源代码:
<table id="giftList"> <tr><th> Item Title </th><th> Description </th><th> Cost </th><th> Image </th></tr> <tr id="gift1" class="gift"><td> Vegetable Basket </td><td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span> </td><td> $15.00 </td><td> <img src="../img/gifts/img1.jpg"> </td></tr> <tr id="gift2" class="gift"><td> Russian Nesting Dolls </td><td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span> </td><td> $10,000.52 </td><td> <img src="../img/gifts/img2.jpg"> </td></tr> <tr id="gift3" class="gift"><td> Fish Painting </td><td> If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span> </td><td> $10,005.00 </td><td> <img src="../img/gifts/img3.jpg"> </td></tr> <tr id="gift4" class="gift"><td> Dead Parrot </td><td> This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span> </td><td> $0.50 </td><td> <img src="../img/gifts/img4.jpg"> </td></tr> <tr id="gift5" class="gift"><td> Mystery Box </td><td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span> </td><td> $1.50 </td><td> <img src="../img/gifts/img6.jpg"> </td></tr> </table>
1.children()方法的使用
# -*- coding: utf-8 -*- from urllib.request import urlopen from bs4 import BeautifulSoup html = urlopen("http://www.pythonscraping.com/pages/page3.html") bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).children: print(child)
运行得到的结果为:
<tr><th> Item Title </th><th> Description </th><th> Cost </th><th> Image </th></tr> <tr class="gift" id="gift1"><td> Vegetable Basket </td><td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span> </td><td> $15.00 </td><td> <img src="../img/gifts/img1.jpg"/> </td></tr> <tr class="gift" id="gift2"><td> Russian Nesting Dolls </td><td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span> </td><td> $10,000.52 </td><td> <img src="../img/gifts/img2.jpg"/> </td></tr> <tr class="gift" id="gift3"><td> Fish Painting </td><td> If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span> </td><td> $10,005.00 </td><td> <img src="../img/gifts/img3.jpg"/> </td></tr> <tr class="gift" id="gift4"><td> Dead Parrot </td><td> This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span> </td><td> $0.50 </td><td> <img src="../img/gifts/img4.jpg"/> </td></tr> <tr class="gift" id="gift5"><td> Mystery Box </td><td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span> </td><td> $1.50 </td><td> <img src="../img/gifts/img6.jpg"/> </td></tr>
根据文章中的字面意思来分析:
children()方法指代的是与parent离得最近(也就是下一个)标签,程序中的children指代的是tr这个标签。
实验:将children用tr替换掉会得到与以上相同的结果吗?
# -*- coding: utf-8 -*- from urllib.request import urlopen from bs4 import BeautifulSoup html = urlopen("http://www.pythonscraping.com/pages/page3.html") bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).tr: print(child)
运行结果为:
<th> Item Title </th> <th> Description </th> <th> Cost </th> <th> Image </th>
对以上实验结果进行分析得到:children可以列出所有的子类,而直接指定标签,则不行。
2.descendants()方法的使用
# -*- coding: utf-8 -*- from urllib.request import urlopen from bs4 import BeautifulSoup html = urlopen("http://www.pythonscraping.com/pages/page3.html") bsObj = BeautifulSoup(html,"lxml") for child in bsObj.find("table",{"id":"giftList"}).descendants: print(child)
运行结果为:
<tr><th> Item Title </th><th> Description </th><th> Cost </th><th> Image </th></tr> <th> Item Title </th> Item Title <th> Description </th> Description <th> Cost </th> Cost <th> Image </th> Image <tr class="gift" id="gift1"><td> Vegetable Basket </td><td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span> </td><td> $15.00 </td><td> <img src="../img/gifts/img1.jpg"/> </td></tr> <td> Vegetable Basket </td> Vegetable Basket <td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span> </td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span> Now with super-colorful bell peppers! <td> $15.00 </td> $15.00 <td> <img src="../img/gifts/img1.jpg"/> </td> <img src="../img/gifts/img1.jpg"/> <tr class="gift" id="gift2"><td> Russian Nesting Dolls </td><td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span> </td><td> $10,000.52 </td><td> <img src="../img/gifts/img2.jpg"/> </td></tr> <td> Russian Nesting Dolls </td> Russian Nesting Dolls <td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span> </td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span> 8 entire dolls per set! Octuple the presents! <td> $10,000.52 </td> $10,000.52 <td> <img src="../img/gifts/img2.jpg"/> </td> <img src="../img/gifts/img2.jpg"/> <tr class="gift" id="gift3"><td> Fish Painting </td><td> If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span> </td><td> $10,005.00 </td><td> <img src="../img/gifts/img3.jpg"/> </td></tr> <td> Fish Painting </td> Fish Painting <td> If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span> </td> If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span> Also hand-painted by trained monkeys! <td> $10,005.00 </td> $10,005.00 <td> <img src="../img/gifts/img3.jpg"/> </td> <img src="../img/gifts/img3.jpg"/> <tr class="gift" id="gift4"><td> Dead Parrot </td><td> This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span> </td><td> $0.50 </td><td> <img src="../img/gifts/img4.jpg"/> </td></tr> <td> Dead Parrot </td> Dead Parrot <td> This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span> </td> This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span> Or maybe he's only resting? <td> $0.50 </td> $0.50 <td> <img src="../img/gifts/img4.jpg"/> </td> <img src="../img/gifts/img4.jpg"/> <tr class="gift" id="gift5"><td> Mystery Box </td><td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span> </td><td> $1.50 </td><td> <img src="../img/gifts/img6.jpg"/> </td></tr> <td> Mystery Box </td> Mystery Box <td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span> </td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span> Keep your friends guessing! <td> $1.50 </td> $1.50 <td> <img src="../img/gifts/img6.jpg"/> </td> <img src="../img/gifts/img6.jpg"/>