python-web: downloading every xkcd comic

Downloading every xkcd comic

# downloads every single xkcd comic

import os

import bs4
import requests

url = 'http://xkcd.com'  # starting URL
os.makedirs('xkcd', exist_ok=True)  # store comics in ./xkcd
while not url.endswith('#'):
    # Download the page.
    print('downloading page %s...'%url)
    res = requests.get(url)
    res.raise_for_status()

    # Name the parser explicitly; without it bs4 picks one and emits a warning.
    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    # Find the URL of the comic image.
    comicElem = soup.select('#comic img')
    if not comicElem:
        print('could not find comic image')
    else:
        # The src attribute has no scheme, so prepend one.
        comicUrl = 'http:' + comicElem[0].get('src')
        # Download the image.
        print('downloading image %s...' % comicUrl)
        res = requests.get(comicUrl)
        res.raise_for_status()

        # Save the image to ./xkcd, writing it in chunks of up to 100000 bytes.
        imagePath = os.path.join('xkcd', os.path.basename(comicUrl))
        with open(imagePath, 'wb') as imageFile:
            for chunk in res.iter_content(100000):
                imageFile.write(chunk)


    # Get the URL of the "Prev" button; the loop ends once it ends with '#'.
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prevLink.get('href')
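
The script builds both URLs by string concatenation ('http:' plus the image src, and 'http://xkcd.com' plus the prev href). A slightly more defensive variant could resolve them against the current page with urllib.parse.urljoin from the standard library. The sketch below is only an illustration: the helper names resolve_image_url and next_page_url and the example.png path are made up for this example, and it assumes the first comic's prev link is '#', as the loop condition above implies.

```python
from urllib.parse import urljoin


def resolve_image_url(page_url, src):
    """Turn an <img> src (absolute, protocol-relative, or relative) into a full URL."""
    return urljoin(page_url, src)


def next_page_url(page_url, prev_href):
    """Return the full URL of the previous comic, or None when there is none."""
    # Assumption: the first comic's "Prev" link is '#', so treat it as "stop".
    if not prev_href or prev_href == '#':
        return None
    return urljoin(page_url, prev_href)


if __name__ == '__main__':
    page = 'http://xkcd.com/1234/'
    # Hypothetical image path, used only to show the resolution step.
    print(resolve_image_url(page, '//imgs.xkcd.com/comics/example.png'))
    print(next_page_url(page, '/1233/'))
    print(next_page_url('http://xkcd.com/1/', '#'))
```

With helpers like these, the loop could run while the URL is not None instead of testing for a trailing '#', and a missing prev link would stop the crawl rather than raise an IndexError.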


Original article: https://www.cnblogs.com/liu-wang/p/8997434.html