• 『Computer Vision』Mask-RCNN_Anchor Generation


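    GitHub: Mask_RCNN
    『Computer Vision』Mask-RCNN_Paper Study
    『Computer Vision』Mask-RCNN_Project Documentation Translation
    『Computer Vision』Mask-RCNN_Inference Network Part 1: Overview
    『Computer Vision』Mask-RCNN_Inference Network Part 2: The ResNet101-Based FPN Shared Network
    『Computer Vision』Mask-RCNN_Inference Network Part 3: RPN Anchor Processing and Proposal Generation
    『Computer Vision』Mask-RCNN_Inference Network Part 4: Coupling FPN with ROIAlign
    『Computer Vision』Mask-RCNN_Inference Network Part 5: Refining Detection Results
    『Computer Vision』Mask-RCNN_Inference Network Part 6: Mask Generation
    『Computer Vision』Mask-RCNN_Inference Network Finale: Running Inference with the detect Method
    『Computer Vision』Mask-RCNN_Anchor Generation
    『Computer Vision』Mask-RCNN_Training Network Part 1: Datasets and the Dataset Class
    『Computer Vision』Mask-RCNN_Training Network Part 2: The train Network Structure & Loss Functions
    『Computer Vision』Mask-RCNN_Training Network Part 3: Training the Model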

    1. Comparison with SSD anchors

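    Mask_RCNN's anchors are essentially the same as SSD's (see 『TensorFlow』SSD Source Code Study Part 3: Anchor Generation):

    The number of center points equals the number of pixels in the feature map (with an anchor stride of 1)

    Boxes are generated around the center points

    The final box coordinates are normalized by the input image size, i.e. they are relative coordinates in which the image spans [0, 1]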

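    The R-CNN family usually works on a single shared feature map. However, once Mask_RCNN adopted the FPN structure, it uses multiple feature levels just as SSD does, so the two anchor-generation algorithms are practically identical. Only the generation strategy differs slightly:

    In SSD, each feature level has its own set of aspect ratios; in Mask_RCNN, all levels use exactly the same ratios (anchor_ratios)

    SSD generates len(ratios) + 2 boxes per center point on each level; Mask_RCNN always generates 3 boxes per center point (one scale per level times 3 ratios)

    SSD offsets each center by half a step (0.5) from the feature pixel; Mask_RCNN places each center directly at the feature pixel position (pixel index times the stride)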
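    To make the last difference concrete, here is a one-dimensional sketch of where the centers land on a feature map 4 cells wide with a stride of 16. The helper names ssd_centers and mask_rcnn_centers are hypothetical. The SSD variant assumes the 0.5 offset described above and omits any normalization by image size. The Mask_RCNN variant follows the shifts computation in generate_anchors.

```python
import numpy as np

def ssd_centers(feat_size, step, offset=0.5):
    # SSD-style: each center sits half a step inside its cell, (i + 0.5) * step
    return (np.arange(feat_size) + offset) * step

def mask_rcnn_centers(feat_size, feature_stride, anchor_stride=1):
    # Mask_RCNN-style: centers sit on the feature pixel itself, i * feature_stride,
    # sampled every anchor_stride feature pixels
    return np.arange(0, feat_size, anchor_stride) * feature_stride

print(ssd_centers(4, 16))        # [ 8. 24. 40. 56.]
print(mask_rcnn_centers(4, 16))  # [ 0 16 32 48]
```

    With the Mask_RCNN rule the first center lies exactly on the image's top-left corner. As a result, the anchors in the first row and column extend past the image border.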

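    The basic generation method, however, is the same for both (with the ratio defined as width/height):

    • h is divided by anchor_ratios**0.5
    • w is multiplied by anchor_ratios**0.5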
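    The two rules can be checked numerically. This small sketch uses a hypothetical function name, anchor_hw, and assumes the ratio means width/height, as in the generate_anchors docstring. It shows that the ratio changes the anchor's shape while the area stays equal to scale².

```python
import numpy as np

def anchor_hw(scale, ratios):
    """Heights and widths of the anchors for one reference size (ratio = width / height)."""
    ratios = np.asarray(ratios, dtype=float)
    heights = scale / np.sqrt(ratios)  # h shrinks as the ratio grows
    widths = scale * np.sqrt(ratios)   # w grows as the ratio grows
    return heights, widths

h, w = anchor_hw(32, [0.5, 1, 2])
# w / h reproduces the ratios, and w * h stays 32 ** 2 for every ratio
print(np.round(h, 2), np.round(w, 2))
```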

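    The initial h and w are the given reference size, so the receptive field is effectively controlled by two parameters per level: anchor_ratios and the reference size. For SSD: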

    anchor_sizes=[(21., 45.),
                  (45., 99.),
                  (99., 153.),
                  (153., 207.),
                  (207., 261.),
                  (261., 315.)]
    anchor_ratios=[[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3], [2, .5, 3, 1./3], [2, .5], [2, .5]]
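    Plugging the SSD settings above into the "len(ratios) + 2 boxes per center" rule gives the box count for each level. The feature-map sizes [38, 19, 10, 5, 3, 1] are an assumption (the usual SSD300 layout) and do not appear in this post. The two extra boxes correspond to the two sizes in each tuple of anchor_sizes.

```python
# SSD settings quoted above
anchor_sizes = [(21., 45.), (45., 99.), (99., 153.),
                (153., 207.), (207., 261.), (261., 315.)]
anchor_ratios = [[2, .5], [2, .5, 3, 1./3], [2, .5, 3, 1./3],
                 [2, .5, 3, 1./3], [2, .5], [2, .5]]
# Assumption: SSD300 feature-map side lengths, not given in this post
feat_sizes = [38, 19, 10, 5, 3, 1]

# Boxes per center: one per size in the tuple plus one per ratio
per_location = [len(s) + len(r) for s, r in zip(anchor_sizes, anchor_ratios)]
total = sum(n * f * f for n, f in zip(per_location, feat_sizes))
print(per_location, total)
```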

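    For Mask_RCNN (the reference size is the same for h and w):

    self.config.BACKBONE_STRIDES  = [4, 8, 16, 32, 64]      # downsampling factor of each feature level, used to compute the center points
    self.config.RPN_ANCHOR_RATIOS = [0.5, 1, 2]             # anchor aspect ratios (width/height), shared by all levels
    self.config.RPN_ANCHOR_SCALES = [32, 64, 128, 256, 512] # anchor size in pixels (the "receptive field"), one per feature level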
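    Here is a quick sketch of what these settings imply for the anchor count. It assumes a 1024×1024 input (a hypothetical size chosen for round numbers) and RPN_ANCHOR_STRIDE = 1, the value passed in get_anchors. Each level has one scale and three ratios, so every center contributes 3 anchors.

```python
import math

# Mask_RCNN settings quoted above
BACKBONE_STRIDES = [4, 8, 16, 32, 64]
RPN_ANCHOR_RATIOS = [0.5, 1, 2]
RPN_ANCHOR_STRIDE = 1

# Hypothetical input size, chosen only for round numbers
image_h, image_w = 1024, 1024

per_level = []
for stride in BACKBONE_STRIDES:
    # Feature-map size of this level (same ceil rule as compute_backbone_shapes)
    fh, fw = math.ceil(image_h / stride), math.ceil(image_w / stride)
    # Centers are sampled every RPN_ANCHOR_STRIDE feature pixels
    n_centers = math.ceil(fh / RPN_ANCHOR_STRIDE) * math.ceil(fw / RPN_ANCHOR_STRIDE)
    # One scale per level times three ratios -> 3 anchors per center
    per_level.append(n_centers * len(RPN_ANCHOR_RATIOS))

print(per_level, sum(per_level))
```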

    2. Anchor generation

    The entry point for anchor generation is the get_anchors method in model.py. It takes image_shape, which only needs to contain [h, w] but may also be [h, w, c]:

        def get_anchors(self, image_shape):
            """Returns anchor pyramid for the given image size."""
            # [N, (height, width)]
            backbone_shapes = compute_backbone_shapes(self.config, image_shape)
            # Cache anchors and reuse if image shape is the same
            if not hasattr(self, "_anchor_cache"):
                self._anchor_cache = {}
            if not tuple(image_shape) in self._anchor_cache:
                # Generate Anchors: [anchor_count, (y1, x1, y2, x2)]
                a = utils.generate_pyramid_anchors(
                    self.config.RPN_ANCHOR_SCALES,  # (32, 64, 128, 256, 512)
                    self.config.RPN_ANCHOR_RATIOS,  # [0.5, 1, 2]
                    backbone_shapes,                # with shape [N, (height, width)]
                    self.config.BACKBONE_STRIDES,   # [4, 8, 16, 32, 64]
                    self.config.RPN_ANCHOR_STRIDE)  # 1
                # Keep a copy of the latest anchors in pixel coordinates because
                # it's used in inspect_model notebooks.
                # TODO: Remove this after the notebook are refactored to not use it
                self.anchors = a
                # Normalize coordinates
                self._anchor_cache[tuple(image_shape)] = utils.norm_boxes(a, image_shape[:2])
            return self._anchor_cache[tuple(image_shape)]
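    The caching in get_anchors boils down to a dict keyed by tuple(image_shape). The class below is a hypothetical stand-in, not Mask_RCNN code. It replaces the real anchor generation with a call counter to show two consequences: a NumPy array and a list with the same values hit the same cache entry, while [h, w] and [h, w, c] are cached separately.

```python
import numpy as np

class AnchorCacheDemo:
    """Hypothetical stand-in for the caching logic in get_anchors (not Mask_RCNN code)."""

    def __init__(self):
        self.calls = 0            # how many times the "expensive" generator ran
        self._anchor_cache = {}

    def _generate(self, image_shape):
        self.calls += 1
        return np.zeros((3, 4), dtype=np.float32)  # placeholder instead of real anchors

    def get_anchors(self, image_shape):
        key = tuple(image_shape)  # lists and arrays are unhashable; tuples can be dict keys
        if key not in self._anchor_cache:
            self._anchor_cache[key] = self._generate(image_shape)
        return self._anchor_cache[key]


demo = AnchorCacheDemo()
a1 = demo.get_anchors(np.array([1024, 1024, 3]))
a2 = demo.get_anchors([1024, 1024, 3])  # same values as a list: cache hit
print(a1 is a2, demo.calls)             # True 1
```

    Calling demo.get_anchors([1024, 1024]) afterwards would run the generator a second time, because (1024, 1024) is a different key from (1024, 1024, 3).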
    

    It calls compute_backbone_shapes to compute the shape of each feature level:

    def compute_backbone_shapes(config, image_shape):
        """Computes the width and height of each stage of the backbone network.
    
        Returns:
            [N, (height, width)]. Where N is the number of stages
        """
        if callable(config.BACKBONE):
            return config.COMPUTE_BACKBONE_SHAPE(image_shape)
    
        # Currently supports ResNet only
        assert config.BACKBONE in ["resnet50", "resnet101"]
        return np.array(
            [[int(math.ceil(image_shape[0] / stride)),
                int(math.ceil(image_shape[1] / stride))]
                for stride in config.BACKBONE_STRIDES])  # [4, 8, 16, 32, 64]
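    compute_backbone_shapes divides the image size by each stride and rounds up. A standalone version without the config object makes the rounding visible. The name backbone_shapes is hypothetical, and the 600×1000 size is only an illustrative input.

```python
import math
import numpy as np

def backbone_shapes(image_shape, strides=(4, 8, 16, 32, 64)):
    """Same ceil-division rule as compute_backbone_shapes, without a config object."""
    return np.array([[math.ceil(image_shape[0] / s), math.ceil(image_shape[1] / s)]
                     for s in strides])

# 600 / 16 = 37.5 and 1000 / 16 = 62.5 are rounded up to 38 and 63
print(backbone_shapes((600, 1000)))
```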
    

    It then calls utils.generate_pyramid_anchors to generate all the anchors:

    def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                                 anchor_stride):
        """Generate anchors at different levels of a feature pyramid. Each scale
        is associated with a level of the pyramid, but each ratio is used in
        all levels of the pyramid.
    
        Returns:
        anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
            with the same order of the given scales. So, anchors of scale[0] come
            first, then anchors of scale[1], and so on.
        """
        # Anchors
        # [anchor_count, (y1, x1, y2, x2)]
        anchors = []
        for i in range(len(scales)):
            anchors.append(generate_anchors(scales[i],
                                            ratios,
                                            feature_shapes[i],
                                            feature_strides[i],
                                            anchor_stride))
        # [anchor_count, (y1, x1, y2, x2)]
        return np.concatenate(anchors, axis=0)
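    Because generate_pyramid_anchors concatenates the levels in the order of the scales, you can recover each level's row range from the feature shapes alone. level_slices below is a hypothetical helper. It assumes 3 ratios per center and counts centers the same way np.arange(0, size, anchor_stride) does. The shapes are those of a 1024×1024 input.

```python
def level_slices(feature_shapes, num_ratios=3, anchor_stride=1):
    """Hypothetical helper: (start, end) rows of each level in the concatenated anchors."""
    slices, start = [], 0
    for h, w in feature_shapes:
        # Number of centers matches len(np.arange(0, h, anchor_stride)) * len(...)
        n = len(range(0, h, anchor_stride)) * len(range(0, w, anchor_stride)) * num_ratios
        slices.append((start, start + n))
        start += n
    return slices

shapes = [[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]
print(level_slices(shapes))
```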
    

    utils.generate_pyramid_anchors calls utils.generate_anchors to generate the anchors of each level (this step relies heavily on meshgrid; see 『Numpy』np.meshgrid for an introduction):

    def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
        """
        scales: 1D array of anchor sizes in pixels. Example: [32, 64, 128]
        ratios: 1D array of anchor ratios of width/height. Example: [0.5, 1, 2]
        shape: [height, width] spatial shape of the feature map over which
                to generate anchors.
        feature_stride: Stride of the feature map relative to the image in pixels.
        anchor_stride: Stride of anchors on the feature map. For example, if the
            value is 2 then generate anchors for every other feature map pixel.
        """
        # Get all combinations of scales and ratios
        scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
        scales = scales.flatten()
        ratios = ratios.flatten()
    
        # Enumerate heights and widths from scales and ratios
        heights = scales / np.sqrt(ratios)
        widths = scales * np.sqrt(ratios)
    
        # Enumerate shifts in feature space
        shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
        shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
        shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
    
        # Enumerate combinations of shifts, widths, and heights
        box_widths, box_centers_x = np.meshgrid(widths, shifts_x)    # (n, 3) (n, 3)
        box_heights, box_centers_y = np.meshgrid(heights, shifts_y)  # (n, 3) (n, 3)
    
        # Reshape to get a list of (y, x) and a list of (h, w)
        # (n, 3, 2) -> (3n, 2)
        box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
        box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])
    
        # Convert to corner coordinates (y1, x1, y2, x2)
        boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                                box_centers + 0.5 * box_sizes], axis=1)
        # Box coordinates are in pixels of the input image, [N, (y1, x1, y2, x2)]
        return boxes
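    To see the output layout, the sketch below runs a compact copy of generate_anchors on a tiny 2×2 feature map (stride 16, scale 32, the three default ratios). The rows come out center by center in row-major order, with three ratios per center. The anchors around the center (0, 0) reach into negative coordinates.

```python
import numpy as np

def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
    # Compact copy of the function above, used only for this experiment
    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    scales, ratios = scales.flatten(), ratios.flatten()
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
    box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])
    return np.concatenate([box_centers - 0.5 * box_sizes,
                           box_centers + 0.5 * box_sizes], axis=1)

# 2x2 feature map, stride 16, one scale (32), three ratios -> 4 centers x 3 = 12 boxes
boxes = generate_anchors(32, [0.5, 1, 2], [2, 2], 16, 1)
print(boxes.shape)   # (12, 4)
print(boxes[3:6])    # the three anchors around center (y=0, x=16)
```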
    

    [Figure: simulated distribution of the anchor center points on one feature level]

    Finally, back in get_anchors, utils.norm_boxes converts the anchor coordinates to normalized coordinates:

    def norm_boxes(boxes, shape):
        """Converts boxes from pixel coordinates to normalized coordinates.
        boxes: [N, (y1, x1, y2, x2)] in pixel coordinates
        shape: [..., (height, width)] in pixels
    
        Note: In pixel coordinates (y2, x2) is outside the box. But in normalized
        coordinates it's inside the box.
    
        Returns:
            [N, (y1, x1, y2, x2)] in normalized coordinates
        """
        h, w = shape
        scale = np.array([h - 1, w - 1, h - 1, w - 1])
        shift = np.array([0, 0, 1, 1])
        return np.divide((boxes - shift), scale).astype(np.float32)
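    norm_boxes maps pixel boxes to normalized ones and treats (y2, x2) as exclusive in pixel space. The sketch below pairs it with an inverse written for illustration (denorm_boxes_sketch is a hypothetical name). It checks that a full-image box maps to [0, 0, 1, 1] and that the round trip recovers the integer pixel boxes.

```python
import numpy as np

def norm_boxes(boxes, shape):
    # Same as the function above
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.divide((boxes - shift), scale).astype(np.float32)

def denorm_boxes_sketch(boxes, shape):
    # Hypothetical inverse: scale back up, add the 1 back to (y2, x2), round to ints
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.around(np.multiply(boxes, scale) + shift).astype(np.int32)

pixel_boxes = np.array([[0, 0, 1024, 1024], [10, 20, 110, 220]])
normed = norm_boxes(pixel_boxes, (1024, 1024))
print(normed[0])                                   # full-image box -> [0. 0. 1. 1.]
print(denorm_boxes_sketch(normed, (1024, 1024)))   # back to the pixel boxes
```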
    

    The final result is the anchors in normalized coordinates, with shape [anchor_count, (y1, x1, y2, x2)]

  • Original post: https://www.cnblogs.com/hellcat/p/9854736.html