Capturing long (full-page) web screenshots with python+selenium+PIL+glob+numpy

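I. Requirements

Some web pages have so much content that a single screenshot cannot show all of it, so the page has to be scrolled and captured as one long image.

II. Approach

The solution uses selenium + PIL + glob + numpy.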

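1. PIL (Python Imaging Library) is available today through Pillow, a third-party Python library that is the de facto standard for image processing.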
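As a small illustration of what Pillow can do for this task, the sketch below stacks two in-memory images vertically with `Image.new` and `Image.paste`. The sizes, colors and variable names are made up for the example; the implementation later in this post stitches through NumPy arrays instead.

```python
from PIL import Image

# Two solid-color images built in memory (sizes and colors are arbitrary).
top = Image.new("RGB", (4, 2), (255, 0, 0))
bottom = Image.new("RGB", (4, 3), (0, 0, 255))

# A canvas tall enough for both, with the second image pasted below the first.
canvas = Image.new("RGB", (top.width, top.height + bottom.height))
canvas.paste(top, (0, 0))
canvas.paste(bottom, (0, top.height))

print(canvas.size)
```

Pasting onto a preallocated canvas avoids building intermediate arrays, which may matter for very long pages.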

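2. glob is a module in Python's standard library for finding file paths that match a given pattern. Patterns use three wildcards: *, ? and [].

* matches zero or more characters;

? matches exactly one character;

[] matches one character from the given range, e.g. [0-9].

glob.glob returns a list of all matching file paths. Its required argument is the pattern, which may be an absolute or a relative path.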
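The three wildcards can be tried out with a throwaway folder. This is a minimal sketch: the file names and the `match` helper are invented for the demonstration and only mimic the `tmp*.png` naming used later in this post.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a few empty files to match against (names are illustrative).
    for name in ("tmp.png", "tmp_0.png", "tmp_12.png", "bug.png", "tmp_a.txt"):
        open(os.path.join(folder, name), "w").close()

    def match(pattern):
        """Return the sorted base names of the files matching the pattern."""
        paths = glob.glob(os.path.join(folder, pattern))
        return sorted(os.path.basename(p) for p in paths)

    star = match("tmp*.png")           # * : zero or more characters
    question = match("tmp_?.png")      # ? : exactly one character
    bracket = match("tmp_[0-9]*.png")  # []: one character from a range

print(star, question, bracket)
```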

3. NumPy (short for Numerical Python) is a Python package consisting of a multidimensional array object and a collection of routines for processing arrays.

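(1) NumPy operations

With NumPy, developers can perform the following:

Arithmetic and logical operations on arrays.
Fourier transforms and routines for shape manipulation.
Linear algebra operations. NumPy has built-in functions for linear algebra and random number generation.

(2) NumPy is usually used together with SciPy (Scientific Python) and Matplotlib (a plotting library). This combination is widely used as a replacement for MATLAB, a popular technical computing platform. The Python-based alternative, however, is now seen as a more modern and complete programming language.

(3) NumPy is open source, which is an added advantage.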
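For the screenshot task, the relevant NumPy feature is joining arrays along the row axis. The sketch below uses two small made-up arrays in place of real screenshots; an RGB image is assumed to be an array of shape (height, width, 3).

```python
import numpy as np

# Two stand-in "screenshots": height x width x 3 channels, 8-bit values.
first = np.zeros((2, 4, 3), dtype=np.uint8)       # 2 rows, all black
second = np.full((3, 4, 3), 255, dtype=np.uint8)  # 3 rows, all white

# Joining along axis 0 stacks the second image below the first.
stitched = np.concatenate([first, second], axis=0)

print(stitched.shape)
```

The widths and channel counts must agree for the join to succeed, which holds when every screenshot comes from the same browser window.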

III. Implementation

import glob
import os
import time
import numpy
from PIL import Image

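class ScreenLongShot(object):
    def __init__(self,driver,url,js):
        # driver: a selenium WebDriver; this class never calls driver.get(),
        #         so the page is expected to be open already
        # js: JavaScript prefix of the scrolling element, so that js + 'Height'
        #     and js + 'Top' name its scroll properties
        #     (for example 'document.documentElement.scroll', an assumed value)
        self.driver = driver
        self.pageurl = url
        self.img = './pictures/bug.png'
        self.js = js
    def get_height(self):
        # Get the height of the Chrome window and the height of the page.
        # Note: get_window_size() reports the outer window, which can be
        # taller than the visible page area when the browser is not headless.
        chrome_height = self.driver.get_window_size()['height']
        page_height = self.driver.execute_script('return ' + self.js + 'Height')
        return chrome_height,page_height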
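    def screen_long_shot(self):
        try:
            # Implicit wait of up to 10 s when locating elements
            self.driver.implicitly_wait(10)
            # Make sure the output folder exists before saving screenshots
            os.makedirs('./pictures', exist_ok=True)
            chrome_height,page_height = self.get_height()
            temp_img = './pictures/tmp.png'
            self.driver.save_screenshot(temp_img)
            # Integer-divide the page height by the window height to decide
            # how many times to scroll
            if page_height > chrome_height:
                n = page_height // chrome_height
                # Load the first screenshot as an array (height x width x channels)
                base_mat = numpy.asarray(Image.open(temp_img))
                for i in range(n):
                    # Scroll down one window height, then take a screenshot
                    self.driver.execute_script(f"{self.js+'Top'}={chrome_height * (i + 1)};")
                    time.sleep(5)
                    self.driver.save_screenshot(f'./pictures/tmp_{i}.png')
                    mat = numpy.asarray(Image.open(f'./pictures/tmp_{i}.png'))
                    # Stitch: append the new screenshot below the previous ones
                    base_mat = numpy.append(base_mat,mat,axis=0)
                Image.fromarray(base_mat).save(self.img)
            else:
                # The page fits in one window, so the first screenshot is the result
                os.replace(temp_img, self.img)
        except Exception as e:
            print(e)
        finally:
            # Delete the intermediate tmp*.png screenshots under ./pictures in the
            # current working directory (os.getcwd() returns the current path)
            for i in glob.glob(os.path.join(os.getcwd(),'pictures','tmp*.png')):
                os.remove(i)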
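    def close_chrome(self):
        # Quit the browser and end the WebDriver session
        self.driver.quit()


def main(driver,url,js):
    # Build the helper and capture the page; errors are printed, not raised
    s = ScreenLongShot(driver,url,js)
    try:
        s.screen_long_shot()
    except Exception as e:
        print(e)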
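One caveat of scrolling by whole window heights: a browser cannot scroll past the bottom of the page, so the last screenshot can repeat rows that were already captured. The sketch below is one possible way to plan the scroll positions so that each frame contributes only new rows. The function `plan_scroll` is hypothetical and not part of the code above, and it assumes that one CSS pixel equals one screenshot pixel.

```python
def plan_scroll(page_height, viewport_height):
    """Return (scroll_top, new_rows) pairs that cover the page without overlap.

    For each pair, scroll to scroll_top, take a screenshot, and keep only the
    bottom new_rows rows of that frame.
    """
    if page_height <= 0 or viewport_height <= 0:
        raise ValueError("heights must be positive")
    # A page shorter than the viewport still yields one full frame.
    page_height = max(page_height, viewport_height)
    max_scroll = page_height - viewport_height
    plan = []
    covered = 0  # number of page rows captured so far
    while covered < page_height:
        # The browser clamps scrolling at max_scroll.
        scroll_top = min(covered, max_scroll)
        new_rows = min(viewport_height, page_height - covered)
        plan.append((scroll_top, new_rows))
        covered += new_rows
    return plan


print(plan_scroll(2500, 1000))
```

With NumPy arrays, keeping the bottom rows of a frame could be written as `mat[-new_rows:]` before appending it to the stitched image.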

Original article: https://www.cnblogs.com/lxmtx/p/16490383.html