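Introduction to pytesseract
1. Python-tesseract is a standalone wrapper package for Google's Tesseract-OCR engine;
2. Python-tesseract recognizes the text in an image file and returns the recognized text as its return value;
3. By default, Python-tesseract supports TIFF and BMP images; other formats such as JPEG, GIF, and PNG are supported only after PIL is installed.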
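Installing pytesseract
1. Python-tesseract supports Python 2.5 and later (this applies to older releases; current releases require Python 3);
2. Python-tesseract needs PIL (Python Imaging Library) to support more image formats. PIL is distributed today as the Pillow package, so `pip install PIL` is not needed and a single command is enough:
pip install pillow
3. Python-tesseract needs the tesseract-ocr engine itself: on Windows, install Tesseract-OCR 4.00 and add its install directory to the PATH environment variable;
4. Install pytesseract:
pip install pytesseract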
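Once these steps are done, a quick check can confirm that Python is able to locate the Tesseract executable. The following is a minimal sketch rather than part of the original instructions: the helper names `find_tesseract` and `check_setup` are hypothetical, and the Windows path in `CANDIDATES` is only an assumed install location. `pytesseract.pytesseract.tesseract_cmd` and `pytesseract.get_tesseract_version()` are provided by pytesseract.

```python
import os
import shutil

# Assumed fallback location; adjust to wherever Tesseract-OCR was installed.
CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATES):
    """Return the path of the tesseract executable, or None if not found."""
    # Look on PATH first (this succeeds when the environment variable is set).
    found = shutil.which("tesseract")
    if found:
        return found
    # Otherwise fall back to the assumed install locations.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def check_setup():
    """Point pytesseract at the executable and return the Tesseract version."""
    import pytesseract  # imported lazily so find_tesseract() works without it

    path = find_tesseract()
    if path is None:
        raise RuntimeError("tesseract executable not found; check PATH")
    pytesseract.pytesseract.tesseract_cmd = path
    return pytesseract.get_tesseract_version()
```

Calling `print(check_setup())` should print the installed Tesseract version if both the engine and pytesseract are set up correctly; a `RuntimeError` suggests the PATH configuration in step 3 needs another look.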
Using pytesseract
Usage steps
> try:
>     import Image
> except ImportError:
>     from PIL import Image
> import pytesseract
>
> print(pytesseract.image_to_string(Image.open('test.png')))
> print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
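The `lang` argument in the second call selects the language data that Tesseract uses. Tesseract also accepts several language codes joined with `+`, for example `eng+fra`. The sketch below illustrates this under the assumption that the corresponding language data files are installed; the helper names `build_lang` and `ocr_image` are hypothetical and not part of pytesseract.

```python
def build_lang(codes):
    """Join Tesseract language codes into the form expected by `lang`."""
    cleaned = [c.strip() for c in codes if c and c.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_image(path, codes=("eng",)):
    """Run OCR on an image file using the given language codes."""
    # Imported lazily so build_lang() can be used without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=build_lang(codes))
```

For instance, `ocr_image('test-european.jpg', codes=('eng', 'fra'))` would pass `lang='eng+fra'` to `image_to_string`.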
Recognizing a CAPTCHA (verification code)
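import pytesseract
from PIL import Image
import requests


def Vercode(max_retries=5):
    """Download the CAPTCHA image and return the recognized text."""
    url = "http://www.xxxx"
    # The HTTP header name is "User-Agent", not "user_agent".
    header = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}
    # Retry in a bounded loop instead of recursing without limit.
    for _ in range(max_retries):
        r = requests.get(url, headers=header, timeout=5)
        with open('vcode.jpg', 'wb') as pic:
            pic.write(r.content)
        im = pytesseract.image_to_string(Image.open('vcode.jpg'))
        # Remove all whitespace, including any trailing newline.
        im = ''.join(im.split())
        if im != '':
            return im
    # Nothing was recognized after max_retries attempts.
    return ''


print(Vercode())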
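OCR output for CAPTCHA images often contains stray punctuation or line breaks, so removing spaces alone may not be enough. The sketch below shows one possible cleanup step, assuming the CAPTCHA consists only of ASCII letters and digits; the name `clean_captcha_text` and the optional length check are illustrative and not part of pytesseract.

```python
import re


def clean_captcha_text(raw, expected_length=None):
    """Keep only ASCII letters and digits; return '' if the length check fails."""
    text = re.sub(r"[^A-Za-z0-9]", "", raw)
    # An unexpected length usually means the OCR result is unreliable.
    if expected_length is not None and len(text) != expected_length:
        return ""
    return text
```

The result of `pytesseract.image_to_string(...)` could be passed through this function before deciding whether to retry the download.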