1. Install PIL
Press Win+R to open cmd, then enter: pip install PIL
Error message:
Could not find a version that satisfies the requirement PIL (from versions: )
No matching distribution found for PIL
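Cause: no installable package named PIL is published on PyPI. The maintained fork, Pillow, is installed with pip install Pillow and is still imported as PIL.
The workaround used here was to install wheel first. After downloading the version that matches your Python, open cmd (Win+R) and make sure pip is installed (search online for pip installation instructions), then run pip install wheel. After that, open PyCharm and PIL can be used directly.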
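To confirm from Python that the install worked, one option is to check whether the module can be found at all. This is a minimal sketch using only the standard library; the helper name is_importable is made up for illustration.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Pillow is installed under the name "Pillow" but imported as "PIL".
    for name in ("PIL", "wheel"):
        status = "found" if is_importable(name) else "missing"
        print(f"{name}: {status}")
```

If PIL is reported as missing inside PyCharm but found in cmd, the two are probably using different Python interpreters.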
2. Install tesseract-ocr
GitHub: https://github.com/tesseract-ocr/tesseract
Windows:
The latest installer can be downloaded here: tesseract-ocr-setup-3.05.01.exe and tesseract-ocr-setup-4.00.00dev.exe (experimental).
pip install pytesseract
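Problem encountered and how to fix it:
FileNotFoundError: [WinError 2] The system cannot find the file specified
Solutions:
Method 1 [recommended]: add the directory that contains tesseract.exe to the PATH environment variable,
for example D:\Tesseract-OCR. The default install path is C:\Program Files (x86)\Tesseract-OCR
Note: for the new environment variable to take effect, close and reopen the cmd window, or restart PyCharm or whichever IDE you use.
Method 2: edit pytesseract.py and set the install path of tesseract.exe explicitly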
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
# Raw string, so that "\t" in "\tesseract.exe" is not read as a tab character
tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
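Both methods can also be checked or applied from the test script itself, which avoids editing the installed pytesseract.py. The sketch below first looks for tesseract on PATH with shutil.which, then falls back to the default install path quoted above, and finally assigns the result to pytesseract.pytesseract.tesseract_cmd. The helper names find_tesseract and configure_pytesseract are hypothetical, and the default path is an assumption that may not match your machine.

```python
import os
import shutil

# Default install location quoted above; adjust if Tesseract lives elsewhere.
DEFAULT_TESSERACT = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallbacks=(DEFAULT_TESSERACT,)):
    """Return a usable tesseract path, or None if nothing is found."""
    on_path = shutil.which("tesseract")  # method 1: resolved through PATH
    if on_path:
        return on_path
    for candidate in fallbacks:  # method 2: explicit install path
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract():
    """Point pytesseract at the located executable and return the path used."""
    path = find_tesseract()
    if path is None:
        raise FileNotFoundError("tesseract executable not found")
    # Imported here so find_tesseract() works even without pytesseract.
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling configure_pytesseract() once at the top of the test script, before image_to_string, should be enough.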
3. Automated testing in PyCharm: logging in with a captcha. The code is as follows:
# -*- coding: utf-8 -*-
# -*- author: hjd -*-
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
import unittest
from PIL import Image
from PIL import ImageEnhance
import pytesseract
from selenium.webdriver.common.keys import Keys

# Quick test of the Image class
# im = Image.open(r'D:\Pictures\Camera Roll\xuanku_chahua.jpg')
# w, h = im.size
# print(w, h)

# Open the login page
driver = webdriver.Firefox()
url = "URL of the login page"
driver.get(url)
driver.maximize_window()

# Bring up the captcha: clear the captcha input field
sleep(2)
driver.switch_to.default_content()
driver.find_element(By.CSS_SELECTOR, "input#captcha").clear()

# Capture the captcha from the current page
sleep(2)
driver.save_screenshot(r"E:\aa.png")
img = driver.find_element(By.ID, "codeimage")  # locate the captcha image
location = img.location  # x, y coordinates of the captcha
size = img.size  # width and height of the captcha
# Crop box (left, upper, right, lower) of the captcha region
coderange = (int(location['x']), int(location['y']),
             int(location['x'] + size['width']),
             int(location['y'] + size['height']))
i = Image.open(r"E:\aa.png")  # open the screenshot
frame4 = i.crop(coderange)  # crop the captcha region out of the screenshot
frame4.save(r"E:\frame4.png")
i2 = Image.open(r"E:\frame4.png")
# Convert to grayscale. PIL modes include 1, L, P, RGB, RGBA, CMYK, YCbCr, I, F; L is grayscale
imgry = i2.convert('L')
contrast = ImageEnhance.Contrast(imgry)  # contrast enhancer
i3 = contrast.enhance(3.0)  # 3.0 is the contrast factor (1.0 leaves the image unchanged)
i3.save(r"E:\image_code.png")
i4 = Image.open(r"E:\image_code.png")
# Recognize the captcha with image_to_string; strip the trailing whitespace it returns
text = pytesseract.image_to_string(i4).strip()
print(text)

# Log in as admin
sleep(5)
driver.switch_to.default_content()
driver.find_element(By.ID, 'user_name').send_keys('username')
driver.find_element(By.ID, 'password').send_keys('password')
driver.find_element(By.CSS_SELECTOR, "input#captcha").send_keys(text)
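The crop box in the script above assumes that one page pixel equals one screenshot pixel. On a display with scaling enabled this may not hold, and the crop can miss the captcha. A minimal sketch of the same calculation with an optional scale factor follows. The helper crop_box and its scale parameter are hypothetical; one possible source for the factor is driver.execute_script("return window.devicePixelRatio"), but whether that matches the screenshot is an assumption to verify on your setup.

```python
def crop_box(location, size, scale=1.0):
    """Return (left, upper, right, lower) in screenshot pixels."""
    left = int(location["x"] * scale)
    upper = int(location["y"] * scale)
    right = int((location["x"] + size["width"]) * scale)
    lower = int((location["y"] + size["height"]) * scale)
    return (left, upper, right, lower)


if __name__ == "__main__":
    # Made-up element geometry, in the dict shape Selenium returns.
    location = {"x": 100, "y": 40}
    size = {"width": 80, "height": 30}
    print(crop_box(location, size))       # (100, 40, 180, 70)
    print(crop_box(location, size, 1.5))  # (150, 60, 270, 105)
```

The returned tuple can be passed straight to Image.crop in place of coderange.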
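Keep the complexity of the captcha image in mind: if the captcha contains interference lines or noise, this code will not be able to recognize it.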
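For light speckle noise, one commonly tried preprocessing step is to binarize the grayscale image and drop dark pixels that stand alone. The sketch below shows the idea on a plain list of pixel rows so that it runs without any image library. The function names and the threshold of 140 are arbitrary choices for illustration, not part of the original script, and the code assumes a non-empty rectangular grid.

```python
def binarize(pixels, threshold=140):
    """Map each grayscale value (0-255) to 0 (ink) or 255 (background)."""
    return [[0 if p < threshold else 255 for p in row] for row in pixels]


def remove_isolated(pixels):
    """Clear dark pixels that have no dark pixel among their 8 neighbours."""
    height, width = len(pixels), len(pixels[0])
    cleaned = [row[:] for row in pixels]
    for y in range(height):
        for x in range(width):
            if pixels[y][x] != 0:
                continue
            has_dark_neighbour = any(
                pixels[ny][nx] == 0
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_dark_neighbour:
                cleaned[y][x] = 255
    return cleaned


if __name__ == "__main__":
    gray = [
        [250, 250, 250, 250, 250],
        [250,  20, 250, 250, 250],  # a lone dark speck
        [250, 250, 250,  30,  40],  # part of a connected stroke
        [250, 250, 250,  35, 250],
    ]
    for row in remove_isolated(binarize(gray)):
        print(row)
```

Interference lines are connected strokes, so this neighbour test leaves them in place; it only addresses isolated specks. With Pillow, the same threshold can be applied to a grayscale image through imgry.point(lambda p: 0 if p < 140 else 255).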