python 爬虫重复下载二次请求

在写爬虫的时候，难免会遇到报错，比如 4XX ，5XX，有些可能是网络的原因，或者一些其他的原因，这个时候我们希望程序去做第二次下载，

有一种很low的解决方案，比如是用 try except

try:
    -------
except:
    try:
        --------
    except:
        try:
            ------
        except:
            try:
                ------
            except:
                try:
                    ------
                except:
                    try:
                        ------
                    except:
                        ------

有没有看起来更舒服的写法呢？

我们可以用递归实现这个过程

代码如下

request_urls = [
"https://www.baidu.com/",
"https://www.baidu.com/",
"https://www.baidu.com/",
"https://www.ba111111idu.com/",
"https://www.baidu.com/",
"https://www.baidu.com/",
]

def down_load(url,request_max=3):
    print "正在请求的URL是：",url
    result_html = ""
    result_status_code = ""
    try:
        result = session.get(url=url)
        result_html = result.content
        result_status_code = result.status_code
        print result_status_code
    except Exception as e:
        print e
        if request_max >0:
            if result_status_code != 200:
                return down_load(url,request_max-1)
    return result_html

for url in request_urls:
    down_load(url=url,request_max=13)

输出结果：

C:Python27python.exe C:/Users/xuchunlin/PycharmProjects/A9_25/auction/test.py
正在请求的URL是： https://www.baidu.com/
200
正在请求的URL是： https://www.baidu.com/
200
正在请求的URL是： https://www.baidu.com/
200
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6208>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6438>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA65F8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6828>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6A90>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA62E8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6D30>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6DD8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B682B0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68080>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B685C0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B687F0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68A20>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68C50>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是： https://www.baidu.com/
200
正在请求的URL是： https://www.baidu.com/
200

Process finished with exit code 0

相关阅读:
一片非常有趣的文章三分钟读懂TT猫分布式、微服务和集群之路
 mysql+mycat搭建稳定高可用集群，负载均衡，主备复制，读写分离
 mycat读写分离+垂直切分+水平切分+er分片+全局表测试
 LVS Nginx HAProxy 优缺点
 nginx map配置根据请求头不同分配流量到不同后端服务
 Javamail发送邮件
 java发送html模板的高逼格邮件
 学习openresty时，nginx的一个坑
 mysql数据导出golang实现
 mysql支持原生json使用说明
原文地址：https://www.cnblogs.com/xuchunlin/p/8565952.html

python 爬虫 重复下载 二次请求

python 爬虫重复下载二次请求