• #track#: Faster R-CNN in MXNet (2): the proposal Op and ground-truth handling


    I suddenly lost track of how R-CNN actually operates, so I spent some time reading the code. The unofficial MXNet implementation is quite compact, nice. The focus here is the end2end mode, and a few new things turned up:

    proposal Op

    The proposal Op's backward pass does nothing of substance.

    1. The substantive work happens one step earlier, where the point-wise loss is built (more precisely, in the ground-truth assignment mechanism invoked inside that step's loss). As one would expect, assigning a ground truth to every point is a key step;
    
    2. The IoU-related operations discussed earlier take place in the proposal Op, but they are pure aggregation, with nothing for a backward pass to act on. This does raise a new question, though: the aggregation mechanism ought to stay consistent with the mechanism of the previous step;
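    As a reminder of what the IoU bookkeeping involved in that aggregation looks like, here is a plain-numpy sketch (my own helper for illustration, not the repo's; it follows the same +1 pixel convention as the repo's box code):

```python
import numpy as np

def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between (N, 4) and (M, 4) arrays of [x1, y1, x2, y2] boxes."""
    ious = np.zeros((boxes_a.shape[0], boxes_b.shape[0]), dtype=np.float64)
    for i, a in enumerate(boxes_a):
        for j, b in enumerate(boxes_b):
            # intersection rectangle (the +1 convention counts pixels inclusively)
            ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
            ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
            iw = max(ix2 - ix1 + 1.0, 0.0)
            ih = max(iy2 - iy1 + 1.0, 0.0)
            inter = iw * ih
            area_a = (a[2] - a[0] + 1.0) * (a[3] - a[1] + 1.0)
            area_b = (b[2] - b[0] + 1.0) * (b[3] - b[1] + 1.0)
            ious[i, j] = inter / (area_a + area_b - inter)
    return ious
```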
    

    ground truth

    First, put the relevant code on the board:

    group = mx.symbol.Custom(rois=rois, gt_boxes=gt_boxes_reshape, op_type='proposal_target',
                                 num_classes=num_classes, batch_images=config.TRAIN.BATCH_IMAGES,
                                 batch_rois=config.TRAIN.BATCH_ROIS, fg_fraction=config.TRAIN.FG_FRACTION)
    rois = group[0]
    label = group[1]
    bbox_target = group[2]      # regression targets; w/h offsets are in log space
    bbox_weight = group[3]
    
    # Fast R-CNN
    pool5 = mx.symbol.ROIPooling(
        name='roi_pool5', data=relu5_3, rois=rois, pooled_size=(7, 7), spatial_scale=1.0 / config.RCNN_FEAT_STRIDE)
    # group 6
    flatten = mx.symbol.Flatten(data=pool5, name="flatten")
    fc6 = mx.symbol.FullyConnected(data=flatten, num_hidden=4096, name="fc6")
    relu6 = mx.symbol.Activation(data=fc6, act_type="relu", name="relu6")
    drop6 = mx.symbol.Dropout(data=relu6, p=0.5, name="drop6")
    # group 7
    fc7 = mx.symbol.FullyConnected(data=drop6, num_hidden=4096, name="fc7")
    relu7 = mx.symbol.Activation(data=fc7, act_type="relu", name="relu7")
    drop7 = mx.symbol.Dropout(data=relu7, p=0.5, name="drop7")
    # classification
    cls_score = mx.symbol.FullyConnected(name='cls_score', data=drop7, num_hidden=num_classes)
    cls_prob = mx.symbol.SoftmaxOutput(name='cls_prob', data=cls_score, label=label, normalization='batch')
    # bounding box regression
    bbox_pred = mx.symbol.FullyConnected(name='bbox_pred', data=drop7, num_hidden=num_classes * 4)
    bbox_loss_ = bbox_weight * mx.symbol.smooth_l1(name='bbox_loss_', scalar=1.0, data=(bbox_pred - bbox_target))
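    The effect of the bbox_weight factor in that last line is easy to see with a small numpy re-implementation of the smooth-L1 term (my own sketch for illustration; mx.symbol.smooth_l1 with scalar=sigma computes this element-wise form):

```python
import numpy as np

def smooth_l1(x, sigma=1.0):
    # element-wise smooth-L1: quadratic near zero, linear beyond 1/sigma^2
    abs_x = np.abs(x)
    thresh = 1.0 / sigma ** 2
    return np.where(abs_x < thresh,
                    0.5 * (sigma * x) ** 2,
                    abs_x - 0.5 * thresh)

# bbox_weight zeroes the loss of every coordinate that does not belong to a
# sampled foreground ROI's own class slot
diff = np.array([0.5, 2.0, -3.0, 0.1])       # bbox_pred - bbox_target
weight = np.array([1.0, 1.0, 0.0, 0.0])      # last two coordinates masked out
loss = weight * smooth_l1(diff)
```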
    

    proposal_target Op

    I hadn't noticed before that another Op follows the proposal Op. It addresses this question:

    How should samples be supplied to the subsequent (second-round) regression and classification heads?

     
    On reflection this question really is central: for the two stages to mesh tightly, the second stage must be fed the actual predictions of the first, yet the labels attached to those predictions must still be trustworthy. Concretely, this is done by selecting predicted boxes (pred_bbx) whose overlap with the ground truth is suitable.
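    Under the usual Fast R-CNN thresholds, that selection can be sketched as follows (a simplified stand-in for sample_rois; the function name and default thresholds here are mine):

```python
import numpy as np

def split_fg_bg(max_overlaps, fg_thresh=0.5, bg_thresh_hi=0.5, bg_thresh_lo=0.0):
    """Given each ROI's best IoU with any ground-truth box, pick foreground
    and background candidates the way sample_rois does before subsampling."""
    fg_idx = np.where(max_overlaps >= fg_thresh)[0]
    bg_idx = np.where((max_overlaps < bg_thresh_hi) &
                      (max_overlaps >= bg_thresh_lo))[0]
    return fg_idx, bg_idx
```

    The real sample_rois then randomly subsamples these two pools so that a fixed fraction (config.TRAIN.FG_FRACTION) of the batch is foreground.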

    bbox_pred

    After ROIPooling, what is the input to the bbox_pred regression?

     # bounding box regression (see the code above)
     bbox_pred = mx.symbol.FullyConnected(name='bbox_pred', data=drop7, num_hidden=num_classes * 4)
     bbox_loss_ = bbox_weight * mx.symbol.smooth_l1(name='bbox_loss_', scalar=1.0, data=(bbox_pred - bbox_target))
    

    Judging from the code, the regression input comes from after ROIPooling. If ROIPooling merely outputs a feature map of a fixed size, that seems problematic (it is fine for classification): the regression predicts a global quantity, so how can its input be purely local? We need to trace where the bbox_target fed into the loss comes from.
    On inspection, the second regression again predicts only a residual of the previous prediction, computed from the drop7 features. The path is:
    proposal_target —> sample_rois —> bbox_transform (nonlinear_transform)

    import numpy as np

    def nonlinear_transform(ex_rois, gt_rois):
        assert ex_rois.shape[0] == gt_rois.shape[0], 'inconsistent rois number'
        ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
        ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
        ex_ctr_x = ex_rois[:, 0] + 0.5 * (ex_widths - 1.0)
        ex_ctr_y = ex_rois[:, 1] + 0.5 * (ex_heights - 1.0)
        gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
        gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
        gt_ctr_x = gt_rois[:, 0] + 0.5 * (gt_widths - 1.0)
        gt_ctr_y = gt_rois[:, 1] + 0.5 * (gt_heights - 1.0)
    
        targets_dx = (gt_ctr_x - ex_ctr_x) / (ex_widths + 1e-14)
        targets_dy = (gt_ctr_y - ex_ctr_y) / (ex_heights + 1e-14)
        targets_dw = np.log(gt_widths / ex_widths)
        targets_dh = np.log(gt_heights / ex_heights)
    
        targets = np.vstack(
                        (targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
        return targets
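    A quick worked example confirms the direction of the encoding (the function is repeated here so the snippet runs on its own):

```python
import numpy as np

# standalone copy of nonlinear_transform from above
def nonlinear_transform(ex_rois, gt_rois):
    ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_ctr_x = ex_rois[:, 0] + 0.5 * (ex_widths - 1.0)
    ex_ctr_y = ex_rois[:, 1] + 0.5 * (ex_heights - 1.0)
    gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_ctr_x = gt_rois[:, 0] + 0.5 * (gt_widths - 1.0)
    gt_ctr_y = gt_rois[:, 1] + 0.5 * (gt_heights - 1.0)
    targets_dx = (gt_ctr_x - ex_ctr_x) / (ex_widths + 1e-14)
    targets_dy = (gt_ctr_y - ex_ctr_y) / (ex_heights + 1e-14)
    targets_dw = np.log(gt_widths / ex_widths)
    targets_dh = np.log(gt_heights / ex_heights)
    return np.vstack((targets_dx, targets_dy, targets_dw, targets_dh)).transpose()

# a 10x10 ROI whose matching gt box has its center shifted by 5 px
# and is twice as wide and tall:
ex = np.array([[0.0, 0.0, 9.0, 9.0]])
gt = np.array([[0.0, 0.0, 19.0, 19.0]])
t = nonlinear_transform(ex, gt)
# dx = dy = 0.5 (center offset in units of ROI size), dw = dh = log 2
```

    So the targets are offsets normalized by the reference box, not absolute coordinates, which is why a local feature input is sufficient.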
    

    bbox_target

    Two variables appeared above: bbox_target and bbox_weight.
    Time to look at the operation that assigns ground truth point by point.
    Path:
    train_end2end —> AnchorLoader (which creates the iterator)

    # rcnn/core/loader.py
    ...
    if config.TRAIN.END2END:
        self.data_name = ['data', 'im_info', 'gt_boxes']
    else:
        self.data_name = ['data']
    self.label_name = ['label', 'bbox_target', 'bbox_weight']  # matches bbox_target above
    ...
    
    ###### gt_boxes is read from the dataset; now for label and bbox_target ######
    
    label = assign_anchor(feat_shape, label['gt_boxes'], data['im_info'],
                          self.feat_stride, self.anchor_scales,
                          self.anchor_ratios, self.allowed_border)
    
    ##############################################
    ##########  tracing assign_anchor  ###########
    ##############################################
    if gt_boxes.size > 0:
        ...
        if not config.TRAIN.RPN_CLOBBER_POSITIVES:
            # assign bg labels first so that positive labels can clobber them
            labels[max_overlaps < config.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
    
        # fg label: for each gt, the anchor with highest overlap
        labels[gt_argmax_overlaps] = 1
    
        # fg label: above threshold IoU
        labels[max_overlaps >= config.TRAIN.RPN_POSITIVE_OVERLAP] = 1
    
        if config.TRAIN.RPN_CLOBBER_POSITIVES:
            # assign bg labels last so that negatives clobber positives
            labels[max_overlaps < config.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
    else:
        labels[:] = 0
    ...
    
    if gt_boxes.size > 0:
        bbox_targets[:] = bbox_transform(anchors, gt_boxes[argmax_overlaps, :4])
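    Condensed into plain numpy, the labeling rules above amount to the following (a sketch with hard-coded thresholds standing in for config.TRAIN.RPN_*_OVERLAP; a label of -1 marks anchors excluded from the RPN loss):

```python
import numpy as np

def assign_labels(overlaps, pos_thresh=0.7, neg_thresh=0.3):
    """overlaps: (num_anchors, num_gt) IoU matrix.
    Returns per-anchor labels: 1 = fg, 0 = bg, -1 = ignored."""
    labels = np.full(overlaps.shape[0], -1, dtype=np.int64)
    max_overlaps = overlaps.max(axis=1)     # best gt IoU for each anchor
    gt_argmax = overlaps.argmax(axis=0)     # best anchor for each gt
    labels[max_overlaps < neg_thresh] = 0   # background first ...
    labels[gt_argmax] = 1                   # ... so positives can clobber it
    labels[max_overlaps >= pos_thresh] = 1
    return labels
```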
    

    Note: v_anchor below denotes an anchor after refinement by the network.

    1. label is now fairly clear: each ground truth is first mapped to one v_anchor, and then every v_anchor that clears the IoU threshold is also set;
    2. bbox_target, however, is puzzling: every v_anchor is forcibly assigned a ground truth. Won't the v_anchors that contain no object hurt the system's convergence? A plausible guess is that gt_boxes has already absorbed the object-free anchors by the time it is supplied, but I found no evidence for that. This led me to bbox_weight, where the relevant evidence turned up.

    First, recall where bbox_weight appears (refer to the code above):

     # bounding box regression 
     bbox_pred = mx.symbol.FullyConnected(name='bbox_pred', data=drop7, num_hidden=num_classes * 4)
     bbox_loss_ = bbox_weight * mx.symbol.smooth_l1(name='bbox_loss_', scalar=1.0, data=(bbox_pred - bbox_target))
    

    Now let's see how it is produced
    (path:
    proposal_target —> sample_rois —> expand_bbox_regression_targets):

    def expand_bbox_regression_targets(bbox_targets_data, num_classes):
        """ 
        expand from 5 to 4 * num_classes; only the right class has non-zero bbox regression targets
        :param bbox_targets_data: [k * 5]
        :param num_classes: number of classes
        :return: bbox target processed [k * 4 num_classes]
        bbox_weights ! only foreground boxes have bbox regression computation!
        """
        classes = bbox_targets_data[:, 0]
        bbox_targets = np.zeros((classes.size, 4 * num_classes), dtype=np.float32)
        bbox_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
        indexes = np.where(classes > 0)[0]  # >  :  exclude background
        for index in indexes:
            cls = classes[index]
            start = int(4 * cls)
            end = start + 4 
            bbox_targets[index, start:end] = bbox_targets_data[index, 1:]   #  why expanding? -> fullyConnected output (4*num_class)
            bbox_weights[index, start:end] = config.TRAIN.BBOX_WEIGHTS
        return bbox_targets, bbox_weights
    

    The docstring already spells it out: only boxes that contain an object (class label > 0) receive non-zero regression weights.
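    A tiny run makes the masking concrete (a standalone copy of the function above, with the config.TRAIN.BBOX_WEIGHTS dependency replaced by an explicit argument):

```python
import numpy as np

def expand_targets(bbox_targets_data, num_classes, weights=(1.0, 1.0, 1.0, 1.0)):
    # standalone copy of expand_bbox_regression_targets; `weights` stands in
    # for config.TRAIN.BBOX_WEIGHTS
    classes = bbox_targets_data[:, 0]
    bbox_targets = np.zeros((classes.size, 4 * num_classes), dtype=np.float32)
    bbox_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    for index in np.where(classes > 0)[0]:     # background rows stay all-zero
        start = int(4 * classes[index])
        bbox_targets[index, start:start + 4] = bbox_targets_data[index, 1:]
        bbox_weights[index, start:start + 4] = weights
    return bbox_targets, bbox_weights

# two ROIs with 3 classes total: one background, one of class 2
data = np.array([[0, 0.1, 0.2, 0.3, 0.4],    # background: target and weight stay 0
                 [2, 0.5, 0.6, 0.7, 0.8]])   # class 2: fills slot [8:12]
targets, weights = expand_targets(data, num_classes=3)
```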

    Conclusion

    A short summary.

    1. The network performs two regression predictions, and each actually predicts only an offset: the first is measured relative to the v_anchor (the anchor refined by the first prediction), the second relative to the ROI, and both are compared against gt_boxes. In each prediction, the shift of the center coordinates is itself part of what is predicted; the anchor's center is not simply carried through the first refinement unchanged (even though the anchors are generated pixel-wise).
    2. bbox_weight masks out the penalty for producing background boxes.
    3. The proposal Op only aggregates and has no backward pass; proposal_target supplies the samples for training the subsequent stages, and those samples include label = 0 examples.
• Original post: https://www.cnblogs.com/chenyliang/p/6780110.html