• Crawling website images concurrently


    The target: images on a photo website.

    Enter one of the categories (portraits) through "https://photo.fengniao.com/#p=4".
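    The script in section 1 requests this URL directly, so two URL details are worth checking. First, the part after `#` is a fragment: it is handled in the browser and is never part of the HTTP request, so the server only sees the URL before `#`. Second, thumbnail links scraped from the page may be relative and need to be resolved against the base URL. A minimal check with the standard library's `urllib.parse` follows; the href `/hypothetical_pic.html` is a made-up example, not taken from the site.

```python
from urllib.parse import urldefrag, urljoin

BASE = 'https://photo.fengniao.com/#p=4'

# Split off the fragment: only the part before '#' is sent in the HTTP request.
requested, fragment = urldefrag(BASE)
print(requested)  # https://photo.fengniao.com/
print(fragment)   # p=4

# Hypothetical relative href, as a thumbnail link might appear in the page.
href = '/hypothetical_pic.html'

# Naive string concatenation produces a double slash for root-relative links...
naive = 'https://photo.fengniao.com/' + href
print(naive)      # https://photo.fengniao.com//hypothetical_pic.html

# ...while urljoin resolves the link against the base URL correctly.
joined = urljoin(requested, href)
print(joined)     # https://photo.fengniao.com/hypothetical_pic.html
```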

    The page shows dozens of thumbnails, each with a link to its own page; clicking a thumbnail opens the full-size image.

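    We want the full-size image behind each thumbnail. Visiting the links and saving the images serially, one after another, is very slow, because each request spends most of its time waiting on the network.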
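    Why threads help here can be shown without touching the network. In this sketch `fake_download` is a hypothetical stand-in for an HTTP request that just sleeps to simulate latency; like blocking socket I/O, `time.sleep` releases the GIL, so waiting threads overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(i):
    """Hypothetical stand-in for an HTTP request: sleeps to simulate latency."""
    time.sleep(0.1)
    return 'image-{}'.format(i)


def serial(n):
    # One request at a time: total time is roughly n * latency.
    return [fake_download(i) for i in range(n)]


def threaded(n, workers=10):
    # Waits overlap across threads: total time is roughly
    # ceil(n / workers) * latency. Results keep the input order.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fake_download, range(n)))


if __name__ == '__main__':
    for fn in (serial, threaded):
        start = time.perf_counter()
        result = fn(10)
        print(fn.__name__, len(result), '{:.2f}s'.format(time.perf_counter() - start))
```

    Both functions return the same list; only the elapsed time differs. With real downloads the gain also depends on the server and on bandwidth, not only on the thread count.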

    1. Fetching the images with multiple threads

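    import os
    from concurrent.futures import ThreadPoolExecutor
    from functools import partial
    from urllib.parse import urljoin

    import requests
    from lxml import etree


    def get_paths(path, regex, code):
        """
        :param path: page URL
        :param regex: XPath expression used to parse the page
        :param code: text encoding of the page
        :return: list of results matched by the XPath expression ([] on failure)
        """
        resp = requests.get(path, timeout=10)
        if resp.status_code == 200:
            resp.encoding = code  # decode the body with the given encoding
            select = etree.HTML(resp.text)
            return select.xpath(regex)
        return []  # return an empty list instead of None so callers can iterate safely


    def save_pic(path, pic_name, directory):
        """
        :param path: image URL
        :param pic_name: file name (without extension) to save the image as
        :param directory: directory to save the image in
        """
        resp = requests.get(path, stream=True, timeout=10)
        if resp.status_code == 200:
            with open(os.path.join(directory, '{}.jpg'.format(pic_name)), 'wb') as f:
                # with stream=True, write the body in chunks instead of loading it all at once
                for chunk in resp.iter_content(chunk_size=8192):
                    f.write(chunk)


    if __name__ == '__main__':
        # Note: the '#p=4' fragment is not sent in the HTTP request
        paths = get_paths('https://photo.fengniao.com/#p=4', '//a[@class="pic"]/@href', 'utf-8')
        paths = [urljoin('https://photo.fengniao.com/', p) for p in paths]  # make the links absolute

        # Collect the URLs of all full-size images
        p = partial(get_paths, regex='//img[@class="picBig"]/@src', code='utf-8')  # freeze the XPath and encoding
        with ThreadPoolExecutor() as executor:
            res = executor.map(p, paths)
        big_paths = [i[0] for i in res if i]  # first match on each page; skip pages that failed

        # Save the images
        os.makedirs('fn_pics', exist_ok=True)  # make sure the target directory exists
        p = partial(save_pic, directory='fn_pics')  # freeze the target directory
        with ThreadPoolExecutor() as executor:
            res = executor.map(p, big_paths, range(len(big_paths)))
        list(res)  # consume the iterator so any exception raised in a worker is re-raised here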
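    The `__main__` block relies on two standard-library behaviors: `functools.partial` freezing a keyword argument, and `Executor.map` zipping several iterables and re-raising a worker's exception only when its result is read. A small offline sketch, where `save` is a hypothetical stand-in for `save_pic` that returns the target path instead of downloading, and the URLs are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def save(path, name, directory):
    """Hypothetical stand-in for save_pic: returns the target path, downloads nothing."""
    if not path:
        raise ValueError('empty URL')
    return '{}/{}.jpg'.format(directory, name)


# partial freezes the keyword argument, leaving (path, name) as positional parameters.
save_to_dir = partial(save, directory='fn_pics')

urls = ['u0', 'u1', 'u2']  # placeholder image URLs
with ThreadPoolExecutor() as executor:
    # map zips its iterables: save_to_dir('u0', 0), save_to_dir('u1', 1), ...
    # Results come back in input order, regardless of which task finishes first.
    saved = list(executor.map(save_to_dir, urls, range(len(urls))))
print(saved)  # ['fn_pics/0.jpg', 'fn_pics/1.jpg', 'fn_pics/2.jpg']

# A worker's exception is re-raised only when its result is retrieved,
# which is why the results iterator has to be consumed.
with ThreadPoolExecutor() as executor:
    results = executor.map(save_to_dir, ['u0', ''], range(2))
error = None
try:
    list(results)
except ValueError as exc:
    error = str(exc)
print('worker failed:', error)  # worker failed: empty URL
```

    If the iterator were never consumed, a failed download would pass silently; `list(res)` in the script is what surfaces it.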
    
  • Original article: https://www.cnblogs.com/guxh/p/10351655.html