• A brief summary of papers on arbitrary-orientation scene text detection


    Arbitrary-orientation scene text detection

    Summary of paper ideas
    Observation: the innovations that add a new dedicated branch stand out most
    Scene text detection


    Segmentation-based detection methods


    spcnet (mask_rcnn + tcm + rescore)
    psenet (progressive scale expansion)
    mask text spotter (adds a segmentation branch)
    craft
    incepText

    Regression-based detection methods:


    r2cnn (classification branch, axis-aligned branch, inclined branch)
    rrpn (rotated RPN)
    textboxes (ssd)
    textboxes++
    sstd (predecessor that tcm improves on)
    rtn
    ctpn (fine-scale proposals)

    Hybrid segmentation-and-regression methods:


    spcnet
    Uses mask_rcnn for instance segmentation; accuracy is improved by the new TCM module (which produces a global semantic segmentation map) and by re-scoring, where each instance mask is projected onto the global semantic map to compute its score
    pixel-anchor (deeplabv3 + ssd):
    The segmentation part detects medium and large targets; ssd detects the small ones
    east (deeplabv3)
    af-rpn
    Each sliding point inside the text core region directly predicts the offsets from itself to the text-box vertices
    (uses OHEM)


    In the official FPN training setup the heads share parameters across pyramid levels; this has little effect on results, the explanation being that the feature pyramid makes the different levels learn features at the same semantic level
    After obtaining proposals from every pyramid level, FPN pools them all together and runs a single NMS
    Every FPN pyramid level uses the same anchor scale, because the different feature maps already correspond to objects of different sizes
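The note above says FPN pools the proposals from all levels and runs one NMS pass over them. A minimal sketch of that merge-then-NMS step (box layout, level names, and the IoU threshold are illustrative, not FPN's exact configuration):

```python
# Sketch: proposals from all FPN levels are concatenated, then filtered
# with a single greedy NMS pass. Boxes are [x1, y1, x2, y2].

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, thresh=0.5):
    """Keep highest-scoring boxes; drop any box overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

# Proposals from different pyramid levels are simply concatenated first.
level_boxes = {
    "P3": [[0, 0, 10, 10]],
    "P4": [[1, 1, 11, 11]],    # heavy overlap with the P3 box
    "P5": [[50, 50, 80, 80]],
}
level_scores = {"P3": [0.9], "P4": [0.8], "P5": [0.7]}
all_boxes = [b for lv in level_boxes for b in level_boxes[lv]]
all_scores = [s for lv in level_scores for s in level_scores[lv]]
kept = nms(all_boxes, all_scores, thresh=0.5)   # the P4 duplicate is dropped
```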


    *********************** the parenthesized text after each paper name marks its highlight ********************


    hybrid:---------------------------------------------------------------
    1.af-rpn (af)
    anchor-free
    Directly predicts the offsets from a center point to the four box vertices,
    avoiding the usual situation where, to achieve high recall, anchors of various scales and shapes must be designed to cover the scale and shape variability of objects
    scale-friendly
    FPN detects small, medium and large targets separately (implementation details differ from the original FPN)
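The anchor-free idea above can be sketched as a pure encode/decode of vertex offsets: a point inside the text core region regresses the offsets to the four quad vertices, so no anchor scales or shapes need to be enumerated. Function names are illustrative, not AF-RPN's actual API:

```python
# Sketch of the anchor-free regression target in AF-RPN: a sliding point
# inside the text core region predicts offsets to the four quad vertices.

def quad_to_offsets(point, quad):
    """Regression target: offsets from the sliding point to each vertex."""
    px, py = point
    return [(vx - px, vy - py) for (vx, vy) in quad]

def offsets_to_quad(point, offsets):
    """Decode a predicted quad by adding the offsets back to the point."""
    px, py = point
    return [(px + dx, py + dy) for (dx, dy) in offsets]

point = (5.0, 4.0)                         # a point in the text core region
quad = [(0, 0), (10, 0), (10, 8), (0, 8)]  # ground-truth vertices, clockwise
offsets = quad_to_offsets(point, quad)     # what the network would regress
decoded = offsets_to_quad(point, offsets)  # reconstructs the original quad
```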


    2.inceptext (inceptext)
    Overall: fpn + inception_module + deformable_conv + deformable PSROI pooling
    inception-text
    Like Inception, uses three kernel sizes (1*1, 3*3, 5*5) to detect targets at small, medium and large scales;
    also adds deformable convolution to adapt the receptive field, focusing detection on the text so it is less constrained by orientation; two fused feature maps add multi-scale information.
    deformable psroi pooling
    (focuses detection on the text, less constrained by orientation)
    Adds offsets to concentrate on the text itself; tends to learn the context surrounding the text
    Each image is randomly cropped and scaled to have a short edge of {640, 800, 960, 1120}.
    The anchor scales are {2, 4, 8, 16}, and the ratios are {0.2, 0.5, 2, 5}.
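The listed scales and ratios expand combinatorially into concrete anchor shapes. A sketch of that expansion, assuming ratio = h / w, a base stride of 16 px, and area kept fixed per scale (all three conventions are assumptions, not stated in the paper summary):

```python
# Sketch: expand anchor scales {2,4,8,16} and ratios {0.2,0.5,2,5} into
# (w, h) shapes. Assumptions: ratio = h / w, base stride 16 px, and the
# area is held constant within each scale.

BASE = 16

def make_anchors(scales, ratios, base=BASE):
    anchors = []
    for s in scales:
        area = (s * base) ** 2          # fixed area for this scale
        for r in ratios:
            w = (area / r) ** 0.5       # solve w from area and h/w ratio
            h = w * r
            anchors.append((round(w, 1), round(h, 1)))
    return anchors

anchors = make_anchors([2, 4, 8, 16], [0.2, 0.5, 2, 5])
# 4 scales x 4 ratios = 16 anchor shapes per position; the extreme ratios
# 0.2 and 5 give the long thin shapes that suit text lines.
```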

    3.rtn (no particular highlight)
    Multi-scale features, plus ctpn-style vertical boxes, plus regression-only prediction
    hierarchical convolutional
    Obtains stronger semantic features by fusing ResNet stages 4 and 5
    vertical proposal mechanism
    Uses ctpn to obtain vertical boxes, in order to drop the proposal classification step

    4.fots (improves on east)
    simultaneous detection and recognition, sharing computation and visual information
    contributions:
    (1) end-to-end trainable by sharing convolutional features; detects and recognizes simultaneously
    (2) ROIRotate: extracts the oriented text regions from the convolutional feature maps
    loss = pixel-wise classification loss + IOU loss + angle loss
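The three loss terms above can be sketched per pixel, following the EAST-style geometry loss that FOTS inherits: cross-entropy for the text/non-text score, -log IoU over the four distances to the box edges, and 1 - cos over the angle difference. A minimal sketch, with the distance layout assumed as (top, bottom, left, right):

```python
# Sketch of the three detection loss terms: pixel-wise classification,
# IoU loss over edge distances, and angle loss (EAST-style geometry).
import math

def cls_loss(p, y):
    """Binary cross-entropy for one pixel of the text/non-text score map."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def iou_loss(pred, gt):
    """pred/gt: distances (top, bottom, left, right) from a pixel to box edges."""
    t_p, b_p, l_p, r_p = pred
    t_g, b_g, l_g, r_g = gt
    area_p = (t_p + b_p) * (l_p + r_p)
    area_g = (t_g + b_g) * (l_g + r_g)
    h_i = min(t_p, t_g) + min(b_p, b_g)     # intersection height
    w_i = min(l_p, l_g) + min(r_p, r_g)     # intersection width
    inter = h_i * w_i
    return -math.log(inter / (area_p + area_g - inter))

def angle_loss(theta_pred, theta_gt):
    """1 - cos of the angle difference; zero when angles agree."""
    return 1.0 - math.cos(theta_pred - theta_gt)

# A perfect geometry prediction makes both geometry terms vanish.
zero_geo = iou_loss((2, 2, 4, 4), (2, 2, 4, 4))
zero_ang = angle_loss(0.3, 0.3)
```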

    5.pixel-anchor
    combines FPN and ASPP as an encoder-decoder structure on the segmentation side
    adaptive SSD (adds an adaptive predictor layer, ADL) at the anchor level (shares features with the segmentation branch)
    to better handle large variance in size and aspect ratio (proposes long anchors and anchor-density adjustment)
    the segmentation heat map from the pixel module is fed into the anchor module, acting as an attention mechanism
    all boxes from the pixel level and the anchor level are gathered and a cascaded NMS is applied


    regression:---------------------------------------------------------------
    1.ctpn
    detecting text in fine-scale proposals
    generates vertical proposals
    recurrent connectionist text proposals
    connects the vertical proposals
    side-refinement
    refines the predicted text-line boundary using the anchors at the left and right edges
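The connection step above can be sketched as a greedy chain: fixed-width vertical proposals are linked left-to-right when the horizontal gap is small and the vertical overlap is high, then each chain is merged into one text-line box. The thresholds are illustrative, not CTPN's exact values:

```python
# Sketch of CTPN's text-line construction: chain vertical proposals,
# then merge each chain into a single line box [x1, y1, x2, y2].

def v_overlap(a, b):
    """Vertical IoU of two boxes (overlap of their y-extents)."""
    inter = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[3] - a[1]) + (b[3] - b[1]) - inter
    return inter / union

def link_proposals(boxes, max_gap=20, min_v_iou=0.7):
    """Greedily chain x-sorted proposals into lines, merging each chain."""
    boxes = sorted(boxes, key=lambda b: b[0])
    chains, cur = [], [boxes[0]]
    for b in boxes[1:]:
        prev = cur[-1]
        if b[0] - prev[2] <= max_gap and v_overlap(prev, b) >= min_v_iou:
            cur.append(b)               # continue the current text line
        else:
            chains.append(cur)
            cur = [b]                   # start a new text line
    chains.append(cur)
    return [[c[0][0], min(b[1] for b in c), c[-1][2], max(b[3] for b in c)]
            for c in chains]

props = [[0, 10, 16, 30], [18, 11, 34, 31], [36, 10, 52, 30],
         [200, 10, 216, 30]]            # three adjacent strips + one far away
text_lines = link_proposals(props)
```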
    2.textboxes
    Uses ssd for scene text detection (multi-scale)
    3.textboxes++
    Its random-crop data augmentation is worth borrowing
    4.r2cnn (inclined box)
    three ROI poolings with different pooled sizes
    anchor scales (4, 8, 16, 32)
    the axis-aligned and inclined boxes are predicted jointly, with the former containing the latter
    inclined NMS
    computes convolutional feature maps on an image pyramid (not the main point)
    augments ICDAR 2015:
    images are rotated at the following angles: (-90, -75, -60, -45, -30, -15, 0, 15, 30, 45, 60, 75, 90)
    r2cnn's ablation experiment is worth borrowing
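The rotation augmentation above boils down to rotating the ground-truth box corners around the image center by each angle in the list (the image itself is rotated the same way). A minimal point-rotation sketch:

```python
# Sketch of the rotation augmentation: rotate quad vertices around the
# image center by each angle in R2CNN's list.
import math

ANGLES = [-90, -75, -60, -45, -30, -15, 0, 15, 30, 45, 60, 75, 90]

def rotate_points(points, angle_deg, center):
    """Rotate (x, y) points counter-clockwise around center by angle_deg."""
    a = math.radians(angle_deg)
    cx, cy = center
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        out.append((cx + dx * math.cos(a) - dy * math.sin(a),
                    cy + dx * math.sin(a) + dy * math.cos(a)))
    return out

quad = [(4, 2), (8, 2), (8, 4), (4, 4)]
rotated = rotate_points(quad, 90, center=(6, 3))
# one augmented copy of the annotation per angle in ANGLES
augmented = {ang: rotate_points(quad, ang, (6, 3)) for ang in ANGLES}
```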
    5.rrpn
    rrpn
    r-anchors (54 = 3*3*6), generating inclined proposals (representation: x, y, h, w, θ)
    RROI pooling
    skew NMS
    an image-rotation strategy during data augmentation
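The 54 = 3*3*6 r-anchor count above comes from enumerating 3 scales, 3 aspect ratios, and 6 angles per sliding position. A sketch of that enumeration; the concrete scale/ratio/angle values below are re-stated from memory of the RRPN paper, so treat them as assumptions:

```python
# Sketch of RRPN's r-anchor enumeration: 3 scales x 3 ratios x 6 angles
# = 54 rotated anchors (x, y, h, w, theta) at each sliding position.
import math

SCALES = [8, 16, 32]
RATIOS = [2, 5, 8]      # height : width = 1 : ratio (long thin text shapes)
ANGLES = [-math.pi / 6, 0, math.pi / 6, math.pi / 3,
          math.pi / 2, 2 * math.pi / 3]

def r_anchors(x, y, scales=SCALES, ratios=RATIOS, angles=ANGLES):
    """Enumerate rotated anchors (x, y, h, w, theta) at one position."""
    out = []
    for s in scales:
        for r in ratios:
            h, w = s, s * r             # long side along w, per the 1:r ratio
            for a in angles:
                out.append((x, y, h, w, a))
    return out

anchors = r_anchors(100, 100)           # 54 rotated anchors at (100, 100)
```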


    segmentation ------------------------------------------------------
    1.text-attention
    trains a CNN with more informative supervised information:
    text region mask, character labels and binary text/non-text information

    text region regression is trained using an additional sub-network
    that includes two deconvolutional layers
    2.sstd (text attention)
    text attention module
    the attention map indicates rough text regions and is further
    encoded into the AIFs.
    hierarchical inception module
    capture richer context information by using multi-scale receptive fields
    3.mask text spotter
    precise text detection and recognition are acquired via semantic segmentation
    (1)end-to-end trainable model for text spotting
    (2)various shapes
    (3)via semantic segmentation
    (4)sota performances in both detection and text spotting
    4.east
    directly predicts words or text lines of arbitrary orientation, as quadrilaterals, on the full image
    (1) only two stages: FCN (pvanet and u-net) + NMS
    (2) flexible geometric shapes
    (3) both accuracy and speed
    5.craft
    (not considered worth borrowing here)

  • Original source: https://www.cnblogs.com/ywheunji/p/12334925.html