• Text Detection and Recognition: A Code Structure Walkthrough


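    Preface:

    I recently studied some of the basics behind OCR, including object detection and natural language processing.

    Conveniently, the Digital China competition has a related task: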

    https://www.datafountain.cn/competitions/334/details/rule

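    So I decided to try it hands-on. In practice I found that I was not familiar with how the data labels are processed or with the overall detection and recognition pipeline, and building everything from scratch was still quite hard.

    Fortunately, others had already open-sourced some baselines to refer to, covering both detection and recognition, and they are a real help in truly understanding OCR.

    1) The first baseline: AdvancedEAST + CRNN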
    https://github.com/Tianxiaomo/Cultural_Inheritance-Recognizing_Chinese_Calligraphy_in_Multiple_Scenarios

    2) A newer baseline: EAST + ocr_densenet

    https://github.com/DataFountainCode/huawei_code_share

    There is also the original open-source EAST code and the AdvancedEAST code:

    https://github.com/argman/EAST

    https://github.com/huoyijie/AdvancedEAST

    CRNN source code:

    https://github.com/bgshih/crnn

    densenet and similar projects are also good learning resources:

    https://github.com/yinchangchang/ocr_densenet

    PART1: EAST

    First, a walkthrough of the EAST code as a whole.
    Training sample format:

    img_1.jpg

    img_1.txt

    img_2.jpg

    img_2.txt

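    (This layout can be produced with convert_to_txt.py from the second baseline.)

    In other words, the training set consists of images plus, for each image, an annotation file giving the four corner coordinates of each text region and its text.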
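To make the annotation format concrete, here is a minimal sketch of parsing one line of an img_N.txt file. It assumes the ICDAR 2015-style layout `x1,y1,x2,y2,x3,y3,x4,y4,text`; the post only says "four coordinates and text", so the exact layout written by convert_to_txt.py is an assumption, and `parse_annotation_line` is a hypothetical helper name.

```python
def parse_annotation_line(line):
    """Parse 'x1,y1,x2,y2,x3,y3,x4,y4,text' into (corner points, text).

    Assumes an ICDAR 2015-style comma-separated line (an assumption,
    not something the original post specifies).
    """
    # Drop the trailing newline and a possible UTF-8 byte order mark.
    parts = line.strip().lstrip("\ufeff").split(",")
    if len(parts) < 9:
        raise ValueError("expected 8 coordinates plus text: %r" % line)
    coords = [float(p) for p in parts[:8]]
    # Group the flat list into four (x, y) corner points.
    points = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    # The transcription may itself contain commas, so rejoin the rest.
    text = ",".join(parts[8:])
    return points, text
```

Rejoining everything after the eighth field keeps labels that contain commas intact instead of silently truncating them.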

    python multigpu_train.py --gpu_list=0 --input_size=512 --batch_size_per_gpu=14 --checkpoint_path=/tmp/east_icdar2015_resnet_v1_50_rbox/ \
    --text_scale=512 --training_data_path=/data/ocr/icdar2015/ --geometry=RBOX --learning_rate=0.0001 --num_readers=24 \
    --pretrained_model_path=/tmp/resnet_v1_50.ckpt

    Once training is finished, you can run the test:
    python eval.py --test_data_path=./tmp/test_image/ --gpu_list=0 --checkpoint_path=./tmp/east_icdar2015_resnet_v1_50_rbox/ --output_dir=./tmp/output/
    This loads the trained model and runs it on the test images.


    Bug fixes:
    1. lanms fails to compile. Changing python3-config to python-config in lanms/Makefile lets make succeed. The two lines to change (shown as in the original Makefile) are:

    CXXFLAGS = -I include -std=c++11 -O3 $(shell python3-config --cflags)
    LDFLAGS = $(shell python3-config --ldflags)

    2. The test run fails when producing output:

    Traceback (most recent call last):
      File "eval.py", line 194, in <module>
        tf.app.run()
      File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
        _sys.exit(main(argv))
      File "eval.py", line 160, in main
        boxes, timer = detect(score_map=score, geo_map=geometry, timer=timer)
      File "eval.py", line 98, in detect
        boxes = lanms.merge_quadrangle_n9(boxes.astype('float32'), nms_thres)
      File "/work/ocr/EAST/lanms/__init__.py", line 12, in merge_quadrangle_n9
        from .adaptor import merge_quadrangle_n9 as nms_impl
    ImportError: dynamic module does not define module export function (PyInit_adaptor)

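    nms_locality.nms_locality() is a Python implementation. It is much slower than the C++ code, but if you just want to test, you can use it; the two methods should give the same result.

    When I change lanms.merge_quadrangle_n9() in eval.py to nms_locality.nms_locality(), there is no error.

    In short, the call into the C++ implementation is what fails, so use the Python implementation directly; it is only somewhat slower and the results are the same. (A missing PyInit_adaptor symbol usually means the extension was compiled against Python 2 but is being imported from Python 3, which would be consistent with the python-config change in item 1 and the python3.6 path in the traceback.)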
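Rather than editing eval.py by hand each time, the swap described above can be expressed as a fallback. This is only a sketch: `merge_boxes`, `broken_cpp_nms`, and `python_nms` are hypothetical names, and the two stand-in functions exist only so the example runs without EAST installed.

```python
def merge_boxes(boxes, nms_thres, fast_impl, slow_impl):
    """Try the compiled NMS first; fall back to the Python one on ImportError."""
    try:
        return fast_impl(boxes, nms_thres)
    except ImportError:
        # The compiled adaptor could not be loaded, so use the slower path.
        return slow_impl(boxes, nms_thres)


# Hypothetical stand-ins for the two real implementations.
def broken_cpp_nms(boxes, nms_thres):
    raise ImportError("dynamic module does not define module export function")


def python_nms(boxes, nms_thres):
    return list(boxes)
```

In eval.py one would presumably pass lanms.merge_quadrangle_n9 as `fast_impl` and nms_locality.nms_locality as `slow_impl`. Catching ImportError at call time matters here because, per the traceback, the adaptor is imported inside merge_quadrangle_n9 rather than when lanms itself is imported.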

    PART2: CRNN

    Reference source code: https://github.com/bai-shang/OCR_TF_CRNN_CTC

    Training procedure:

    1) Convert the data so that each image is paired with its label

    For example: image_list.txt

    90kDICT32px/1/2/373_coley_14845.jpg coley
    90kDICT32px/17/5/176_Nevadans_51437.jpg nevadans

    Note: make sure that the images can be read from the path you specified, such as:

    path/to/90kDICT32px/1/2/373_coley_14845.jpg
    path/to/90kDICT32px/17/5/176_Nevadans_51437.jpg
    .......
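The note above can be checked before conversion with a short script. This is a sketch under the assumption that each annotation line is `relative/path label` separated by whitespace, as in the example; `parse_image_list_line` and `find_unreadable` are hypothetical helpers, not part of OCR_TF_CRNN_CTC.

```python
import os


def parse_image_list_line(line, image_dir):
    """Split 'relative/path label' and resolve the path against image_dir."""
    # maxsplit=1 keeps any further whitespace inside the label.
    rel_path, label = line.strip().split(None, 1)
    return os.path.join(image_dir, rel_path), label


def find_unreadable(lines, image_dir):
    """Return the resolved paths that do not exist on disk."""
    missing = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        path, _ = parse_image_list_line(line, image_dir)
        if not os.path.isfile(path):
            missing.append(path)
    return missing
```

Running `find_unreadable` over the annotation file first gives a list of bad paths up front, which is easier to act on than a failure partway through the tfrecord conversion.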


    Convert to tfrecord from the command line:

    python tools/create_crnn_ctc_tfrecord.py \
    --image_dir ./data/ --anno_file ./data/train.txt --data_dir ./tfrecords/ \
    --validation_split_fraction 0.1
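To illustrate what a split fraction of 0.1 means, the sketch below holds out roughly 10% of the annotation lines for validation. The tool's own splitting logic is not shown in this post, so this is not its actual code; `split_annotations` and its `seed` parameter are hypothetical.

```python
import random


def split_annotations(lines, validation_split_fraction=0.1, seed=0):
    """Shuffle annotation lines and hold out a fraction for validation."""
    lines = list(lines)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(lines)
    n_val = int(round(len(lines) * validation_split_fraction))
    return lines[n_val:], lines[:n_val]
```

With 10 annotation lines and the default fraction, this yields 9 training lines and 1 validation line.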

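    Problems:

    1) First bug: TypeError: None has type NoneType, but expected one of: int, long

    This happens when a label contains characters that are not in the dictionary, i.e. the dictionary is incomplete. The fix is to add a dedicated code for out-of-dictionary characters to the dictionary: "<undefined>": 6736

    and, in the original code, to fall back to that code on lookup: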

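    import json

    def _string_to_int(label):
        # Convert a string label to a list of ints using the char map.
        with open(FLAGS.char_map_json_file, 'r') as f:
            char_map_dict = json.load(f)

        int_list = []
        for c in label:
            # Characters missing from the map fall back to the new class 6736.
            int_list.append(char_map_dict.get(c, 6736))
        return int_list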
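Before mapping everything unknown to one catch-all class, it can help to see which characters are actually missing from the dictionary. The following is a small sketch; `find_missing_chars` is a hypothetical helper, and `char_map` stands for the dict loaded from the char map JSON file.

```python
def find_missing_chars(labels, char_map):
    """Return the sorted characters used in labels but absent from char_map."""
    missing = set()
    for label in labels:
        for ch in label:
            if ch not in char_map:
                missing.add(ch)
    return sorted(missing)
```

If the list is short, adding those characters to the dictionary may be preferable to the fallback class, since every character sent to 6736 becomes indistinguishable to the recognizer.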

    2) Python 2 runs into many encoding problems, so switching to Python 3 is recommended:

    import tensorflow as tf

    def _bytes_feature(value):
        # BytesList only accepts bytes, so encode text to UTF-8 first.
        if type(value) is str:
            value = value.encode('utf-8')
        # BytesList expects a list, so wrap a single value.
        if not isinstance(value, list):
            value = [value]
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))
    

      

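    When debugging, print intermediate results step by step to pin down the cause:

    try:
        print(tf.train.Feature(int64_list=tf.train.Int64List(value=value)))
    except Exception:
        # Building the feature failed, so show the raw value that caused it.
        print(value)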
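The same kind of check can be done without TensorFlow by looking for entries that are not integers, which is what the "None has type NoneType" error points at. This is a sketch only; `find_bad_entries` is a hypothetical helper.

```python
def find_bad_entries(values):
    """Return (index, value) pairs for entries that are not plain ints."""
    return [
        (i, v)
        for i, v in enumerate(values)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(v, bool) or not isinstance(v, int)
    ]
```

Called on the encoded label, it reports the position of each offending entry, which maps directly back to the character in the label that had no dictionary code.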

  • Original post: https://www.cnblogs.com/Allen-rg/p/10420553.html