Learning BeautifulSoup

1. Download and installation

http://www.crummy.com/software/BeautifulSoup

(1) python setup.py build

(2) python setup.py install

(3) A small example
```
from bs4 import BeautifulSoup
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
The Dormouse's story

Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.

...
"""
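# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html_doc, "html.parser")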

print(soup.prettify())
```
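Beyond pretty-printing, the same kind of `soup` object can be queried directly. A minimal sketch, assuming the built-in `html.parser` backend and a shortened copy of the document above:

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Text of the <title> element.
title = soup.title.string

# Map each link's id attribute to its href attribute.
links = {a["id"]: a["href"] for a in soup.find_all("a", class_="sister")}

print(title)
print(links)
```

`find_all` returns every matching tag, and a tag's attributes are read with dictionary-style indexing.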
2. Usage example
```
import re
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def parserCategary(urlString):
    """Crawl urlString and map each category name to its absolute URL."""
    html_doc = urllib.request.urlopen(urlString).read()
    soup = BeautifulSoup(html_doc, "html.parser")
    lis = soup.find_all('li', {'class': 'category-item'})
    result = {}
    r = re.compile(r'/category/([a-zA-Z_]+)\?')
    for item in lis:
        # The first child of each <li> is expected to be the category link.
        href = item.contents[0]['href']
        names = r.findall(href)
        if names:
            # urljoin resolves a relative href against the page URL.
            result[names[0]] = urljoin(urlString, href)
    return result


if __name__ == '__main__':
    urlString = 'https://play.google.com/store'
    result = parserCategary(urlString)
    for item in result.items():
        print(item)
```
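To see what the regular expression and `item.contents[0]['href']` pick out without touching the network, here is a sketch on a hand-written fragment. The markup and the `sample` and `categories` names are hypothetical, modeled only on the `category-item` class the function looks for; the real Play Store page may be structured differently.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: each <li class="category-item"> starts with its <a> link.
sample = (
    '<ul>'
    '<li class="category-item"><a href="/store/apps/category/GAME?feature=nav">Games</a></li>'
    '<li class="category-item"><a href="/store/apps/category/BOOKS_AND_REFERENCE?feature=nav">Books</a></li>'
    '<li class="other"><a href="/store/about">About</a></li>'
    '</ul>'
)

soup = BeautifulSoup(sample, "html.parser")
pattern = re.compile(r'/category/([a-zA-Z_]+)\?')

categories = {}
for li in soup.find_all('li', {'class': 'category-item'}):
    href = li.contents[0]['href']  # first child of the <li> is the <a> tag
    match = pattern.search(href)
    if match:
        categories[match.group(1)] = href

print(categories)
```

Note that `contents[0]` is the link only because nothing, not even whitespace, precedes the `<a>` inside the `<li>`; with pretty-printed HTML the first child would be a text node.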
A small problem:

When I parsed a passage of text, I got a node tree like this:

[u'This can place a load on the CPU. You may feel slow indeed in the Android OS.', We also exhibit the versions of other OS. Downloads : <a href="https://www.google.com/url?q=http://wizapply.com/mp2mark/&sa=D&usg=AFQjCNFp2xblQHypa_Z4hvs_VRlXdqgUKw" target="_blank">http://wizapply.com/mp2mark/</a>+ Mobile GPU demonstration [GP2Mark] Search "GP2Mark" in the Google Play!We sell the "manual" and "MP2Mark C source code". If you are interested in is not please contact us.If there is a request, we will receive improvement, etc.tag bench,demo,tegra,multi,core,intel,amd,arm,snapdragon,samsung,exynos,cortex,ndk]

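This list really has only two items: the first is the unicode string at the start, and the second is a tree. But because the tree contains commas, a list that should have two items misleadingly shows up with many items...

Frustrating.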
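One way to check how many items such a list really holds is to ask for its length and the type of each element instead of reading the printed form. A minimal sketch, assuming the list came from something like a tag's `.contents` (the notes do not say); the markup here is made up for illustration:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Made-up fragment: one text node followed by one <p> whose text contains commas.
html = (
    '<div>This can place a load on the CPU.'
    '<p>tag bench,demo,tegra <a href="http://example.com/">link</a></p>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
children = soup.div.contents

# len() counts elements; commas inside a tag's text do not add any.
count = len(children)

# Label each child by type so strings and subtrees are easy to tell apart.
kinds = ["text" if isinstance(c, NavigableString) else "tag" for c in children]

print(count, kinds)
```

Commas inside a tag's text are part of that one element, so they only affect how the list looks when printed.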

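When crawling a site with DFS, two things must be clear up front: what is the termination condition, and what problem does each step handle?

I spent quite a while writing the DFS today and ran into several problems:

(1) I had not worked out what the termination condition was.

(2) I had not defined the task of each step.

(3) Although the search is depth-first, within a single function call you still have to go through all the direct children one after another, breadth-wise.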
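The three points above can be sketched as a small depth-first crawler. This is an illustration, not the code these notes refer to: `crawl`, `get_links`, `max_depth`, and the in-memory `SITE` link table are all hypothetical names, and a real crawler would fetch and parse pages inside `get_links`.

```python
def crawl(url, get_links, max_depth, visited=None, depth=0):
    """Depth-first traversal returning URLs in the order they are visited."""
    if visited is None:
        visited = []
    # Termination conditions: page already seen, or depth limit exceeded.
    if url in visited or depth > max_depth:
        return visited
    # Task of each step: record (process) the current page.
    visited.append(url)
    # Go through the direct children one after another, descending into each.
    for child in get_links(url):
        crawl(child, get_links, max_depth, visited, depth + 1)
    return visited


# Hypothetical link table standing in for real pages.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],
    "/b": ["/a/1"],
    "/a/1": [],
}

order = crawl("/", lambda u: SITE.get(u, []), max_depth=2)
print(order)
```

The `visited` check is what stops the `/a` to `/` back-link from looping forever; for a large site a set would make that lookup cheaper than a list.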

num-pagination-control

num-pagination-content

input : id="reviewUseAjaxUrl"
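
These look like class names and an element id noted for later lookup. A sketch of how such elements could be located with BeautifulSoup, on hypothetical markup (the real page structure is not given in these notes, and the `value` attribute is invented):

```python
from bs4 import BeautifulSoup

# Hypothetical markup that reuses the names noted above.
html = (
    '<div class="num-pagination-control">1 2 3</div>'
    '<div class="num-pagination-content">reviews</div>'
    '<input id="reviewUseAjaxUrl" type="hidden" value="/hypothetical/ajax/url">'
)
soup = BeautifulSoup(html, "html.parser")

# class_ has a trailing underscore because "class" is a Python keyword.
control = soup.find(class_="num-pagination-control")
content = soup.find(class_="num-pagination-content")
ajax_input = soup.find("input", id="reviewUseAjaxUrl")

print(control.get_text(), content.get_text(), ajax_input["value"])
```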
Original article: https://www.cnblogs.com/jilichuan/p/3107690.html