PaddleHub
https://github.com/PaddlePaddle/PaddleHub
An awesome toolkit of pre-trained models based on PaddlePaddle (260+ models covering Image, Text, Audio, and Video, with easy inference and serving deployment).
Provides rich, high-quality, ready-to-use pre-trained models
No deep learning background required
Covers four major categories: image, text, audio, and video
Open source and free
Introduction
- PaddleHub aims to provide developers with rich, high-quality, and directly usable pre-trained models.
- No deep learning background is needed: you can start using AI models quickly and benefit from the artificial intelligence era.
- Covers four major categories (Image, Text, Audio, and Video) and supports one-click prediction, easy service deployment, and transfer learning.
- All models are open source and free to download and use, including in offline scenarios.
Each model serves a specific scenario
https://www.paddlepaddle.org.cn/hub
Install the two libraries
!pip install --upgrade paddlepaddle -i https://mirror.baidu.com/pypi/simple
!pip install --upgrade paddlehub -i https://mirror.baidu.com/pypi/simple
Example
A few lines of code are enough to use a model.
The following uses the Chinese word segmentation module (lac), assuming the two libraries above are already installed:
import paddlehub as hub

lac = hub.Module(name="lac")
# The sample sentence means "The weather is nice today."
test_text = ["今天是个好天气。"]
results = lac.cut(text=test_text, use_gpu=False, batch_size=1, return_tag=True)
print(results)
#{'word': ['今天', '是', '个', '好天气', '。'], 'tag': ['TIME', 'v', 'q', 'n', 'w']}
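The result in the output comment above holds two parallel lists, one of words and one of part-of-speech tags. As a minimal sketch of how they line up, the snippet below pairs each word with its tag. It works on a dict copied from that comment rather than on a live model call, and `pair_words_with_tags` is a hypothetical helper, not part of PaddleHub. If your PaddleHub version wraps the dict in a list, pass in one element at a time.

```python
# Sample copied from the output comment above; the dict shape is assumed
# to match what lac.cut produces for a single sentence.
sample = {
    "word": ["今天", "是", "个", "好天气", "。"],
    "tag": ["TIME", "v", "q", "n", "w"],
}


def pair_words_with_tags(result):
    """Return (word, tag) tuples from one result dict (hypothetical helper)."""
    words, tags = result["word"], result["tag"]
    if len(words) != len(tags):
        raise ValueError("word and tag lists differ in length")
    return list(zip(words, tags))


pairs = pair_words_with_tags(sample)
for word, tag in pairs:
    print(f"{word}\t{tag}")
```

Pairing the lists this way makes it easy to filter by tag, for example keeping only the tokens tagged `n`.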
Model library (modelbase)
https://www.paddlepaddle.org.cn/modelbase
chinese_ocr_db_crnn_mobile
https://www.paddlepaddle.org.cn/hubdetail?name=chinese_ocr_db_crnn_mobile&en_category=TextRecognition
An OCR model that supports Chinese.
It can be used in three ways:
Command-line prediction
API call
import paddlehub as hub
import cv2
ocr = hub.Module(name="chinese_ocr_db_crnn_mobile")
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
Service deployment
Start PaddleHub Serving
Run the start command:
$ hub serving start -m chinese_ocr_db_crnn_mobile
Send a prediction request
Once the server is running, the following few lines send a prediction request and retrieve the result:
import requests
import json
import cv2
import base64

def cv2_to_base64(image):
    # Encode the image as JPEG, then base64-encode the resulting bytes
    data = cv2.imencode('.jpg', image)[1]
    # tobytes() replaces the deprecated ndarray.tostring()
    return base64.b64encode(data.tobytes()).decode('utf8')

# Send the HTTP request
data = {'images': [cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/chinese_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Print the prediction results
print(r.json()["results"])
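The request body above is a JSON object whose `images` key holds a list of base64 strings. The sketch below isolates that step using only the standard library, so the payload shape can be checked without a running server or OpenCV. `build_ocr_payload` is a hypothetical helper, and the placeholder bytes stand in for real JPEG data such as the output of `cv2.imencode`.

```python
import base64
import json


def build_ocr_payload(jpeg_blobs):
    """Build the JSON request body from already-encoded image bytes.

    Hypothetical helper: mirrors the {'images': [...]} shape used above.
    """
    images = [base64.b64encode(blob).decode("utf8") for blob in jpeg_blobs]
    return json.dumps({"images": images})


# Placeholder bytes, not a real image; used only to show the round trip.
fake_jpeg = b"\xff\xd8\xff\xe0 placeholder bytes"
body = build_ocr_payload([fake_jpeg])

# Decoding the payload recovers the original bytes unchanged.
decoded = base64.b64decode(json.loads(body)["images"][0])
print(decoded == fake_jpeg)
```

Base64 is needed here because JSON cannot carry raw binary data; the string is decoded back to bytes on the receiving side.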
DEMO
https://github.com/fanqingsong/code_snippet/blob/master/machine_learning/paddle/ocr.py
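Extracting digits from a CAPTCHA image
import paddlehub as hub
import cv2

ocr = hub.Module(name="chinese_ocr_db_crnn_mobile")
result = ocr.recognize_text(images=[cv2.imread('./test2.png')])
print(result)
The printed output follows. The extracted digits are the 'text' value ('6067') in the last line.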
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0310 14:03:05.659210 11502 default_variables.cpp:429] Fail to open /proc/self/io: No such file or directory [2]
/root/.pyenv/versions/3.6.8/lib/python3.6/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
[2021-03-10 14:03:14,697] [ WARNING] - The _initialize method in HubModule will soon be deprecated, you can use the __init__() to handle the initialization of the object
W0310 14:03:14.708413 11502 analysis_predictor.cc:1145] Deprecated. Please use CreatePredictor instead.
[2021-03-10 14:03:15,063] [ WARNING] - The _initialize method in HubModule will soon be deprecated, you can use the __init__() to handle the initialization of the object
[{'save_path': '', 'data': [{'text': '6067', 'confidence': 0.8805994987487793, 'text_box_position': [[9, 2], [52, 2], [52, 16], [9, 16]]}]}]
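The final line of the output is a list with one entry per input image, each holding a `data` list of recognized text items. A minimal sketch of pulling out just the recognized strings follows. It uses the result copied from the log above instead of calling the model; `extract_texts` and its `min_confidence` threshold of 0.5 are illustrative assumptions, not part of PaddleHub.

```python
# Result copied verbatim from the last line of the log above.
result = [{'save_path': '', 'data': [{'text': '6067',
                                      'confidence': 0.8805994987487793,
                                      'text_box_position': [[9, 2], [52, 2], [52, 16], [9, 16]]}]}]


def extract_texts(result, min_confidence=0.5):
    """Collect recognized strings whose confidence meets the threshold.

    Hypothetical helper; the default threshold is an arbitrary choice.
    """
    texts = []
    for image_result in result:          # one entry per input image
        for item in image_result["data"]:  # one entry per detected text box
            if item["confidence"] >= min_confidence:
                texts.append(item["text"])
    return texts


texts = extract_texts(result)
print(texts)  # ['6067']
```

Raising `min_confidence` discards uncertain reads, which can be useful for CAPTCHA-style images where a wrong digit is worse than no answer.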