Scraping Tuchong: example URL https://wangxu.tuchong.com/23892889/

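# coding=utf-8
import os
import re

import requests
from fake_useragent import UserAgent
from lxml import etree

# Capture the user subdomain, e.g. "wangxu" in https://wangxu.tuchong.com/...
# The dots are escaped; unescaped, the lazy first group matches only "w".
pattern = r'https://(.+?)\.(.*)\.com'
# url = 'https://wangxu.tuchong.com/23892889/'
url = input("Enter a Tuchong gallery URL: ")
headers = {
    'User-Agent': UserAgent().chrome
}
response = requests.get(url, headers=headers)
e = etree.HTML(response.text)
img_path = '//article//img/@src'
img_urls = e.xpath(img_path)
# print(img_urls)

# Create the output directory once, named after the user subdomain
name = re.search(pattern, url).group(1)
save_dir = 'tuchong_{}'.format(name)
os.makedirs(save_dir, exist_ok=True)

for num, img_url in enumerate(img_urls, start=1):
    # Fetch each image once and write the bytes already downloaded,
    # instead of requesting it a second time with urlretrieve
    response = requests.get(img_url, headers=headers)
    with open(os.path.join(save_dir, 'img_{}.png'.format(num)), 'wb') as f:
        f.write(response.content)
    print("Image {} downloaded".format(num))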
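
The script above derives the folder name from the gallery URL with a regular expression. As an alternative, here is a minimal sketch that reads the same value from the parsed hostname using only the standard library. The helper name user_from_url is hypothetical and not part of the original script, and it assumes the user name is always the first label of the host.

```python
from urllib.parse import urlsplit


def user_from_url(url):
    """Return the first host label, e.g. 'wangxu' for wangxu.tuchong.com."""
    host = urlsplit(url).hostname
    if not host:
        # urlsplit gives no hostname for input without a scheme and host
        raise ValueError("not an absolute URL: {!r}".format(url))
    return host.split('.')[0]


print(user_from_url('https://wangxu.tuchong.com/23892889/'))  # wangxu
```

Parsing the URL avoids regex escaping mistakes and raises a clear error when the input is not an absolute URL.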

2020-07-15

Original article: https://www.cnblogs.com/hany-postq473111315/p/13306056.html
