• Implementation details of FPN in faster_rcnn: a code walkthrough


    Code adapted from: https://github.com/DetectionTeamUCAS/FPN_Tensorflow

    The focus is on how predictions are made from the multi-level outputs of the FPN pyramid.

    The FPN pyramid is inserted after the feature-map extraction stage of faster_rcnn and before the RPN.

    The relevant code is shown below.

    Tracing the code structure down to the FPN part:

    train.py (line 46: the build_whole_detection_network function)

        build_whole_network (line 372: the build_whole_detection_network function)

        Following the code comments, the pipeline has seven steps:

        1. build base network

        2. build rpn

        3. generate_anchors
        4. postprocess rpn proposals, e.g. decode, clip, NMS (first round of box processing)
        5. build Fast-RCNN (5. roipooling; 6. inference on rois to obtain fc; 7. cls and reg)
        6. postprocess_fastrcnn (final box processing)
        The FPN part lives in build base network; the plist it returns is the set of pyramid feature maps.
            Tracing build_base_network step by step leads back to the original function resnet_base (the FPN operations live here; see the code below):
    def resnet_base(img_batch, scope_name, is_training=True):
        '''
        this code is derived from light-head rcnn.
        https://github.com/zengarden/light_head_rcnn

        It is convenient to freeze blocks. So we adapt this mode.
        '''
        if scope_name == 'resnet_v1_50':
            middle_num_units = 6
        elif scope_name == 'resnet_v1_101':
            middle_num_units = 23
        else:
            raise NotImplementedError('We only support resnet_v1_50 or resnet_v1_101. Check your network name....yjr')

        blocks = [resnet_v1_block('block1', base_depth=64, num_units=3, stride=2),
                  resnet_v1_block('block2', base_depth=128, num_units=4, stride=2),
                  resnet_v1_block('block3', base_depth=256, num_units=middle_num_units, stride=2),
                  resnet_v1_block('block4', base_depth=512, num_units=3, stride=1)]
        # when using fpn, the stride list is [1, 2, 2]

        with slim.arg_scope(resnet_arg_scope(is_training=False)):
            with tf.variable_scope(scope_name, scope_name):
                # Do the first few layers manually, because 'SAME' padding can behave inconsistently
                # for images of different sizes: sometimes 0, sometimes 1
                net = resnet_utils.conv2d_same(
                    img_batch, 64, 7, stride=2, scope='conv1')
                net = tf.pad(net, [[0, 0], [1, 1], [1, 1], [0, 0]])
                net = slim.max_pool2d(
                    net, [3, 3], stride=2, padding='VALID', scope='pool1')

        not_freezed = [False] * cfgs.FIXED_BLOCKS + (4 - cfgs.FIXED_BLOCKS) * [True]
        # FIXED_BLOCKS can be 1~3

        with slim.arg_scope(resnet_arg_scope(is_training=(is_training and not_freezed[0]))):
            C2, end_points_C2 = resnet_v1.resnet_v1(net,
                                                    blocks[0:1],
                                                    global_pool=False,
                                                    include_root_block=False,
                                                    scope=scope_name)

        # C2 = tf.Print(C2, [tf.shape(C2)], summarize=10, message='C2_shape')
        add_heatmap(C2, name='Layer2/C2_heat')

        with slim.arg_scope(resnet_arg_scope(is_training=(is_training and not_freezed[1]))):
            C3, end_points_C3 = resnet_v1.resnet_v1(C2,
                                                    blocks[1:2],
                                                    global_pool=False,
                                                    include_root_block=False,
                                                    scope=scope_name)

        # C3 = tf.Print(C3, [tf.shape(C3)], summarize=10, message='C3_shape')
        add_heatmap(C3, name='Layer3/C3_heat')

        with slim.arg_scope(resnet_arg_scope(is_training=(is_training and not_freezed[2]))):
            C4, end_points_C4 = resnet_v1.resnet_v1(C3,
                                                    blocks[2:3],
                                                    global_pool=False,
                                                    include_root_block=False,
                                                    scope=scope_name)

        # C4 = tf.Print(C4, [tf.shape(C4)], summarize=10, message='C4_shape')
        add_heatmap(C4, name='Layer4/C4_heat')

        with slim.arg_scope(resnet_arg_scope(is_training=is_training)):
            C5, end_points_C5 = resnet_v1.resnet_v1(C4,
                                                    blocks[3:4],
                                                    global_pool=False,
                                                    include_root_block=False,
                                                    scope=scope_name)
        # C5 = tf.Print(C5, [tf.shape(C5)], summarize=10, message='C5_shape')
        add_heatmap(C5, name='Layer5/C5_heat')

        feature_dict = {'C2': end_points_C2['{}/block1/unit_2/bottleneck_v1'.format(scope_name)],
                        'C3': end_points_C3['{}/block2/unit_3/bottleneck_v1'.format(scope_name)],
                        'C4': end_points_C4['{}/block3/unit_{}/bottleneck_v1'.format(scope_name, middle_num_units - 1)],
                        'C5': end_points_C5['{}/block4/unit_3/bottleneck_v1'.format(scope_name)],
                        # 'C5': end_points_C5['{}/block4'.format(scope_name)],
                        }

        # feature_dict = {'C2': C2,
        #                 'C3': C3,
        #                 'C4': C4,
        #                 'C5': C5}

        pyramid_dict = {}
        with tf.variable_scope('build_pyramid'):
            with slim.arg_scope([slim.conv2d], weights_regularizer=slim.l2_regularizer(cfgs.WEIGHT_DECAY),
                                activation_fn=None, normalizer_fn=None):

                P5 = slim.conv2d(C5,
                                 num_outputs=256,
                                 kernel_size=[1, 1],
                                 stride=1, scope='build_P5')
                if "P6" in cfgs.LEVLES:
                    P6 = slim.max_pool2d(P5, kernel_size=[1, 1], stride=2, scope='build_P6')
                    pyramid_dict['P6'] = P6

                pyramid_dict['P5'] = P5

                for level in range(4, 1, -1):  # build [P4, P3, P2]

                    pyramid_dict['P%d' % level] = fusion_two_layer(C_i=feature_dict["C%d" % level],
                                                                   P_j=pyramid_dict["P%d" % (level + 1)],
                                                                   scope='build_P%d' % level)
                for level in range(4, 1, -1):
                    pyramid_dict['P%d' % level] = slim.conv2d(pyramid_dict['P%d' % level],
                                                              num_outputs=256, kernel_size=[3, 3], padding="SAME",
                                                              stride=1, scope="fuse_P%d" % level)
        for level in range(5, 1, -1):
            add_heatmap(pyramid_dict['P%d' % level], name='Layer%d/P%d_heat' % (level, level))

        # return [P2, P3, P4, P5, P6]
        print("we are in Pyramid::-======>>>>")
        print(cfgs.LEVLES)
        print("base_anchor_size are: ", cfgs.BASE_ANCHOR_SIZE_LIST)
        print(20 * "__")
        return [pyramid_dict[level_name] for level_name in cfgs.LEVLES]
        # return pyramid_dict  # return the dict and get each level by key, but ensure the levels are consistent
        # return a list rather than a dict, since dict ordering is not guaranteed

    Note the structure of the original feature maps C2, C3, C4, C5, and of the pyramid maps P5, P4, P3, P2: together with P6 this forms a five-level feature pyramid.

    The operations are summarized below (the diagram from the original post is not reproduced here):

    The pyramid comprises five levels in total (P2, P3, P4, P5, P6).

    P5 = conv2d(C5)                 (each pyramid level is built by adding the 2x upsampling of the level above to the 1x1 conv of the lateral feature map)

    P6 = max_pool(P5)

    The core fusion step appears in the following code:

    P4 = C4 + P5

    P3 = C3 + P4

    P2 = C2 + P3

    (here "+" is shorthand: the 1x1 conv of the lateral map C_i is added to the 2x upsampling of the coarser pyramid map P_j)

            for level in range(4, 1, -1):  # build [P4, P3, P2]

                pyramid_dict['P%d' % level] = fusion_two_layer(C_i=feature_dict["C%d" % level],
                                                               P_j=pyramid_dict["P%d" % (level+1)],
                                                               scope='build_P%d' % level)
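The body of fusion_two_layer is not shown above. A minimal NumPy sketch of what such a fusion step computes (assuming nearest-neighbor 2x upsampling and a 1x1 lateral convolution, as in the FPN paper; the function and argument names here mirror the repo but the implementation is illustrative only):

```python
import numpy as np

def fusion_two_layer(C_i, P_j, W_lat):
    """Sketch of one top-down fusion step: P_i = lateral(C_i) + upsample(P_j).

    C_i  : (H, W, c) bottom-up feature map at level i
    P_j  : (H//2, W//2, 256) pyramid map from the coarser level j = i + 1
    W_lat: (c, 256) weights of the 1x1 lateral conv (bias omitted)
    """
    # a 1x1 conv is a per-pixel matrix multiply over the channel axis
    lateral = C_i @ W_lat                                # (H, W, 256)
    # nearest-neighbor 2x upsampling of the coarser map
    upsampled = P_j.repeat(2, axis=0).repeat(2, axis=1)  # (H, W, 256)
    return lateral + upsampled

# toy shapes: C4 is 8x8x64, P5 is 4x4x256
rng = np.random.default_rng(0)
C4 = rng.normal(size=(8, 8, 64))
P5 = rng.normal(size=(4, 4, 256))
W = rng.normal(size=(64, 256))
P4 = fusion_two_layer(C4, P5, W)
print(P4.shape)  # (8, 8, 256)
```

Note that in the actual code each fused map is further smoothed by a 3x3 conv (the fuse_P%d scope) to reduce the aliasing effect of upsampling.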

    The final P_LIST covers the levels LEVLES = ['P2', 'P3', 'P4', 'P5', 'P6'].

    Anchors of a different size are assigned to each pyramid level.

      Instead, we assign anchors of a single scale to each level. Formally, we define the anchors to have areas of {32², 64², 128², 256², 512²} pixels on {P2, P3, P4, P5, P6} respectively.

    As in [29] we also use anchors of multiple aspect ratios {1:2, 1:1, 2:1} at each level. So in total there are 15 anchors over the pyramid.
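The quoted scheme (one scale per level, three aspect ratios, 15 base anchors in total) can be sketched as follows; the helper name is mine, not from the repo:

```python
import numpy as np

def base_anchors(scale, ratios=(0.5, 1.0, 2.0)):
    """Generate the base (w, h) anchor shapes for one pyramid level.

    Each anchor keeps an area of scale**2; ratio is interpreted as h / w.
    """
    anchors = []
    for r in ratios:
        w = scale / np.sqrt(r)
        h = scale * np.sqrt(r)
        anchors.append((w, h))
    return anchors

# one scale per level, as in the FPN paper: P2..P6 -> 32..512
levels = ['P2', 'P3', 'P4', 'P5', 'P6']
scales = [32, 64, 128, 256, 512]
total = 0
for level, scale in zip(levels, scales):
    for w, h in base_anchors(scale):
        assert abs(w * h - scale ** 2) < 1e-6  # area is preserved across ratios
        total += 1
print(total)  # 15 anchors over the pyramid
```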

    After RoIs have been obtained from all pyramid feature maps, the next step is to map each RoI back to the feature map of the appropriate level to extract its features. RoIs of different sizes go to different levels, with larger RoIs assigned to deeper feature maps, according to the formula from the FPN paper

        k = floor(k0 + log2(sqrt(w*h) / 224)),  with k0 = 4,

    which determines the level; the RoI is then sent to roi pooling to unify the feature-map size.
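The FPN paper's level-assignment rule, k = floor(k0 + log2(sqrt(w*h)/224)) with k0 = 4, can be sketched as a small function (the clamping bounds assume RoI features are pooled from P2 through P5):

```python
import math

def roi_level(w, h, k0=4, min_level=2, max_level=5):
    """Map an RoI of size w x h to a pyramid level, per the FPN paper:
    k = floor(k0 + log2(sqrt(w*h) / 224)), clamped to [min_level, max_level].

    224 is the canonical ImageNet pre-training size, so a 224x224 RoI
    lands on level k0 = 4 (i.e. P4).
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224))
    return max(min_level, min(max_level, k))

print(roi_level(224, 224))  # 4: a 224x224 RoI maps to P4
print(roi_level(112, 112))  # 3: half the side length -> one level shallower
print(roi_level(20, 20))    # 2: small RoIs are clamped to P2
```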

    After roipooling, the processing is essentially the same as before.

    Originally a single feature map was used to predict and produce proposals, which after post-processing were fed into fully connected layers for cls and reg.

    With FPN, multiple maps predict, producing more proposals at different scales (and hence more robust ones), which after post-processing are likewise fed into fully connected layers for cls and reg.
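The per-level predictions are pooled before the Fast-RCNN head. A hedged sketch of this pooling step (function and variable names are illustrative, not from the repo; NMS is omitted for brevity):

```python
import numpy as np

def merge_level_proposals(boxes_per_level, scores_per_level, top_k=1000):
    """Sketch: pool proposals from every pyramid level, then keep the
    top_k highest-scoring ones overall (NMS omitted for brevity)."""
    boxes = np.concatenate(boxes_per_level, axis=0)
    scores = np.concatenate(scores_per_level, axis=0)
    order = np.argsort(-scores)[:top_k]  # indices sorted by descending score
    return boxes[order], scores[order]

# toy proposals from three levels, with 30 / 20 / 10 boxes respectively
rng = np.random.default_rng(0)
boxes = [rng.uniform(0, 100, size=(n, 4)) for n in (30, 20, 10)]
scores = [rng.uniform(size=n) for n in (30, 20, 10)]
kept_boxes, kept_scores = merge_level_proposals(boxes, scores, top_k=5)
print(kept_boxes.shape)  # (5, 4)
```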

     


          

  • Original article: https://www.cnblogs.com/ywheunji/p/11408595.html