Python第一个爬虫学习

Python第一个爬虫学习
在网上查看大神的关于Python爬虫的文章，代码如下：
```
#coding=utf-8
import urllib
import re

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    reg = r'src="(.+?.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = re.findall(imgre,html)
    x = 0
    for imgurl in imglist:
        urllib.urlretrieve(imgurl,'%s.jpg' % x)
        x+=1

html = getHtml("http://tieba.baidu.com/p/2460150866")
print getImg(html)
```
以下则是在运行上述代码过程中遇到的相关问题，以及解决方式，虽然不怎么高级，但是也算是一种学习思路吧。

问题1：在Python3.2的环境下，未运行时，代码会报错：

解决1：将
```
print getImg(html)
```
修改为
```
print (getImg(html))
```
问题2：代码执行后，报如下错误：

解决2：度娘进行搜索，才发现3.2不兼容2.0的，于是进入官方文档查找最新的调用方式，对这三行进行以下修改，修改前：
```
import urllib 
page = urllib.urlopen(url) 
urllib.urlretrieve(imgurl,'%s.jpg' % x)
```
修改后：
```
import urllib.request
page = urllib.request.urlopen(url)
urllib.request.urlretrieve(imgurl,'%s.jpg' % x)
```
问题3：运行代码，提示以下错误：

C:Pythonpython.exe D:/selenium/getjpgTest.py
Traceback (most recent call last):
File "D:/selenium/getjpgTest.py", line 20, in <module>
print (getImg(html))
File "D:/selenium/getjpgTest.py", line 13, in getImg
imglist = re.findall(imgre,html)
File "C:Pythonlib e.py", line 213, in findall
return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object

Process finished with exit code 1
解决3：百度之后，很容易得到答案，加上下面一句代码即可解决：
```
html=html.decode('utf-8')
```
最终得到以下代码：
```
#coding=utf-8
import urllib.request
import re

def getHtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    reg = r'src="(.+?.jpg)" pic_ext'
    imgre = re.compile(reg)
    html = html.decode('utf-8')
    imglist = re.findall(imgre,html)
    x = 0
    for imgurl in imglist:
        urllib.request.urlretrieve(imgurl,'%s.jpg' % x)
        x+=1

html = getHtml("http://tieba.baidu.com/p/2460150866")
print (getImg(html))
```
执行结果如下：

参考文章：1、http://www.cnblogs.com/fnng/p/3576154.html

　　　　　2、http://blog.csdn.net/lxh199603/article/details/53192883
相关阅读:
Neko's loop HDU-6444（网络赛1007）
Parameters
SETLOCAL
RD / RMDIR Command
devenv 命令用法
 Cannot determine the location of the VS Common Tools folder.
'DEVENV' is not recognized as an internal or external command,
How to change Visual Studio default environment setting
error signing assembly unknown error
What is the Xcopy Command?:
原文地址：https://www.cnblogs.com/biyuting/p/8552440.html