• 『TensorFlow』SSD Source Code Study, Part 6: Label Preparation


    Forked project repository: SSD

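    1. Generating the Input Labels

    After preprocessing, the image, class labels and ground-truth boxes are still in a fairly raw form and cannot be fed directly to the loss function as targets (the SSD forward pass only needs the image; the processing here exists to satisfy the loss computation). For a single image (a 3-D CHW tensor), the loss function needs labels in the following format: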

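    gclasses:          the ground-truth class assigned to each anchor box

             A list with one entry per SSD feature layer (length f); each element is a Tensor of shape: rows of anchor centers in that layer × columns × anchor boxes per center

    gscores:           the IOU between each anchor box and its matched ground-truth box; gclasses records the class of that same ground-truth box

                A list with one entry per SSD feature layer (length f); each element is a Tensor of shape: rows of anchor centers in that layer × columns × anchor boxes per center

    glocalisations:  the position correction of each anchor box relative to its matched ground-truth box; since there are 4 coordinates, there is one extra dimension

             A list with one entry per SSD feature layer (length f); each element is a Tensor of shape: rows of anchor centers in that layer × columns × anchor boxes per center × 4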

     To compute these labels, the function is called as follows (train_ssd_network.py):

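                # f tensors of (m, m, k), f tensors of (m, m, k, 4) in xywh order, f tensors of (m, m, k); f is the number of feature layers SSD extracts from
                # contents: class ids 0-20, encoded box offsets used by the loss, IOU values
                gclasses, glocalisations, gscores = \
                    ssd_net.bboxes_encode(glabels, gbboxes, ssd_anchors)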
    

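    The inputs are all outputs of functions covered in earlier parts of this series (train_ssd_network.py):

    ssd_anchors = ssd_net.anchors(ssd_shape)  # call the class method to create the anchor boxes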
                
    # Pre-processing image, labels and bboxes.
    # 'CHW' (n,) (n, 4)
    image, glabels, gbboxes = \
            image_preprocessing_fn(image, glabels, gbboxes,
                                   out_shape=ssd_shape,  # (300,300)
                                   data_format=DATA_FORMAT)  # 'NCHW'
    

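    With that in place, let's look at how the function is implemented. The work is split by SSD feature layer: it first creates three lists, then computes the three Tensors for each feature layer, and finally appends each one to its list (ssd_common.py):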

    def tf_ssd_bboxes_encode(labels,
                             bboxes,
                             anchors,
                             num_classes,
                             no_annotation_label,
                             ignore_threshold=0.5,
                             prior_scaling=(0.1, 0.1, 0.2, 0.2),
                             dtype=tf.float32,
                             scope='ssd_bboxes_encode'):
        with tf.name_scope(scope):
            target_labels = []
            target_localizations = []
            target_scores = []
            # anchors_layer: (y, x, h, w)
            for i, anchors_layer in enumerate(anchors):
                with tf.name_scope('bboxes_encode_block_%i' % i):
                    # (m, m, k), xywh (m, m, k, 4), (m, m, k)
                    t_labels, t_loc, t_scores = \
                        tf_ssd_bboxes_encode_layer(labels, bboxes, anchors_layer,
                                                   num_classes, no_annotation_label,
                                                   ignore_threshold,
                                                   prior_scaling, dtype)
                    target_labels.append(t_labels)
                    target_localizations.append(t_loc)
                    target_scores.append(t_scores)
            return target_labels, target_localizations, target_scores
    

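    The per-layer processing is the key part (ssd_common.py). It shows clearly why normalizing all box dimensions is convenient: anchor boxes from any layer can be compared directly with the ground-truth boxes, since they are all relative positions in the range 0~1:

    # To aid understanding: m is the number of rows/columns of anchor centers in this layer, k the number of boxes per center, n the number of objects in the image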
    def tf_ssd_bboxes_encode_layer(labels,         # (n,)
                                   bboxes,         # (n, 4)
                                   anchors_layer,  # y(m, m, 1), x(m, m, 1), h(k,), w(k,)
                                   num_classes,
                                   no_annotation_label,
                                   ignore_threshold=0.5,
                                   prior_scaling=(0.1, 0.1, 0.2, 0.2),
                                   dtype=tf.float32):
        """Encode groundtruth labels and bounding boxes using SSD anchors from
        one layer.
    
        Arguments:
          labels: 1D Tensor(int64) containing groundtruth labels;
          bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
          anchors_layer: Numpy array with layer anchors;
          matching_threshold: Threshold for positive match with groundtruth bboxes;
          prior_scaling: Scaling of encoded coordinates.
    
        Return:
          (target_labels, target_localizations, target_scores): Target Tensors.
        """
        # Anchors coordinates and volume.
        yref, xref, href, wref = anchors_layer  # y(m, m, 1), x(m, m, 1), h(k,), w(k,)
        ymin = yref - href / 2.  # (m, m, k)
        xmin = xref - wref / 2.
        ymax = yref + href / 2.
        xmax = xref + wref / 2.
        vol_anchors = (xmax - xmin) * (ymax - ymin)  # anchor box area, (m, m, k)
    
        # Initialize tensors...
        # All tensors below have shape (m, m, k): the grid of anchor centers times the k anchors per center
        shape = (yref.shape[0], yref.shape[1], href.size)  # (m, m, k)
        feat_labels = tf.zeros(shape, dtype=tf.int64)  # (m, m, k)
        feat_scores = tf.zeros(shape, dtype=dtype)
    
        feat_ymin = tf.zeros(shape, dtype=dtype)
        feat_xmin = tf.zeros(shape, dtype=dtype)
        feat_ymax = tf.ones(shape, dtype=dtype)
        feat_xmax = tf.ones(shape, dtype=dtype)
    
        def jaccard_with_anchors(bbox):
            """Compute jaccard score between a box and the anchors.
            """
            int_ymin = tf.maximum(ymin, bbox[0])  # (m, m, k)
            int_xmin = tf.maximum(xmin, bbox[1])
            int_ymax = tf.minimum(ymax, bbox[2])
            int_xmax = tf.minimum(xmax, bbox[3])
            h = tf.maximum(int_ymax - int_ymin, 0.)
            w = tf.maximum(int_xmax - int_xmin, 0.)
            # Volumes.
            # Relate the anchor boxes to the bbox.
            inter_vol = h * w  # intersection area
            union_vol = vol_anchors - inter_vol \
                + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])  # union area
            jaccard = tf.div(inter_vol, union_vol)  # intersection / union, i.e. IOU
            return jaccard  # (m, m, k)
    
        def condition(i, feat_labels, feat_scores,
                      feat_ymin, feat_xmin, feat_ymax, feat_xmax):
            """Condition: check label index.
            """
            r = tf.less(i, tf.shape(labels))
            return r[0]  # tf.shape(labels) is a 1-D tensor, so r is 1-D as well; take its only element
    
        def body(i, feat_labels, feat_scores,
                 feat_ymin, feat_xmin, feat_ymax, feat_xmax):
            """Body: update feature labels, scores and bboxes.
            Follow the original SSD paper for that purpose:
              - assign values when jaccard > 0.5;
              - only update if beat the score of other bboxes.
            """
            # Jaccard score.
            label = labels[i]  # label of the i-th object in the current image
            bbox = bboxes[i]   # ground-truth bbox of the i-th object in the current image
            jaccard = jaccard_with_anchors(bbox)  # IOU between this object's bbox and this layer's anchor grid, (m, m, k)
            # Mask: check threshold + scores + no annotations + num_classes.
            mask = tf.greater(jaccard, feat_scores)  # boolean mask, True where the IOU beats the best score so far, (m, m, k)
            # mask = tf.logical_and(mask, tf.greater(jaccard, matching_threshold))
            mask = tf.logical_and(mask, feat_scores > -0.5)
            mask = tf.logical_and(mask, label < num_classes)  # not sure why this is needed; label should always be below the class count
            imask = tf.cast(mask, tf.int64)  # integer mask
            fmask = tf.cast(mask, dtype)     # float mask
    
            # Update values using mask.
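            # Ensure feat_labels stores the label of the best-scoring object at each position, and feat_scores stores that score
            # (m, m, k) × current class scalar + (1 - (m, m, k)) × (m, m, k)
            # Update the labels: imask is True only where the current object scores higher than earlier objects; other positions are unchanged
            feat_labels = imask * label + (1 - imask) * feat_labels
            # Update the scores: where mask is True use this object's IOU, otherwise keep the old value
            feat_scores = tf.where(mask, jaccard, feat_scores)
    
            # The four tensors below store the ground-truth box coordinates for the assigned label
            # (m, m, k) × current box coordinate scalar + (1 - (m, m, k)) × (m, m, k)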
            feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
            feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
            feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
            feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax
    
            return [i+1, feat_labels, feat_scores,
                    feat_ymin, feat_xmin, feat_ymax, feat_xmax]
        # Main loop definition.
        # Loop over every object in the current image
        i = 0
        (i,
         feat_labels, feat_scores,
         feat_ymin, feat_xmin,
         feat_ymax, feat_xmax) = tf.while_loop(condition, body,
                                               [i,
                                                feat_labels, feat_scores,
                                                feat_ymin, feat_xmin,
                                                feat_ymax, feat_xmax])
        # Transform to center / size.
        # Here y, x, h, w are properties of the ground-truth box assigned to each position
        feat_cy = (feat_ymax + feat_ymin) / 2.
        feat_cx = (feat_xmax + feat_xmin) / 2.
        feat_h = feat_ymax - feat_ymin
        feat_w = feat_xmax - feat_xmin
    
        # Encode features.
        # prior_scaling: [0.1, 0.1, 0.2, 0.2]; the purpose of the scaling is unclear
        # ((m, m, k) - (m, m, 1)) / (k,) * 10
        # Offset of the ground-truth box center from the anchor center, in units of the anchor's h/w
        feat_cy = (feat_cy - yref) / href / prior_scaling[0]
        feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
        # log((m, m, k) / (k,)) * 5
        # Ground-truth height/width divided by anchor height/width, then the logarithm
        feat_h = tf.log(feat_h / href) / prior_scaling[2]
        feat_w = tf.log(feat_w / wref) / prior_scaling[3]
        # Use SSD ordering: x / y / w / h instead of ours.(m, m, k, 4)
        feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)  # axis=-1 adds a new last dimension, hence the 4
    
        return feat_labels, feat_localizations, feat_scores
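
To make the matching logic in the listing above concrete, here is a minimal NumPy sketch of the same idea. It is not the repository's TensorFlow code: the helper names `layer_anchor_corners` and `match_layer` and the toy 2×2 layer are assumptions made for illustration, and the extra `feat_scores > -0.5` and `label < num_classes` checks are left out.

```python
import numpy as np


def layer_anchor_corners(m, sizes):
    """Corner coordinates for a hypothetical m x m layer.

    `sizes` is a list of (h, w) pairs in relative units (illustrative only).
    """
    centers = (np.arange(m, dtype=np.float32) + 0.5) / m
    yref, xref = np.meshgrid(centers, centers, indexing="ij")
    yref = yref[..., None]  # (m, m, 1)
    xref = xref[..., None]  # (m, m, 1)
    href = np.array([s[0] for s in sizes], dtype=np.float32)  # (k,)
    wref = np.array([s[1] for s in sizes], dtype=np.float32)  # (k,)
    # Broadcasting (m, m, 1) against (k,) yields (m, m, k).
    return (yref - href / 2, xref - wref / 2,
            yref + href / 2, xref + wref / 2)


def jaccard_with_anchors(corners, bbox):
    """IOU between one (ymin, xmin, ymax, xmax) box and every anchor."""
    ymin, xmin, ymax, xmax = corners
    inter_h = np.maximum(np.minimum(ymax, bbox[2]) - np.maximum(ymin, bbox[0]), 0.0)
    inter_w = np.maximum(np.minimum(xmax, bbox[3]) - np.maximum(xmin, bbox[1]), 0.0)
    inter = inter_h * inter_w
    union = ((ymax - ymin) * (xmax - xmin) - inter
             + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1]))
    return inter / union  # (m, m, k)


def match_layer(corners, labels, bboxes):
    """Keep, for every anchor, the object with the highest IOU seen so far."""
    shape = corners[0].shape
    feat_labels = np.zeros(shape, dtype=np.int64)
    feat_scores = np.zeros(shape, dtype=np.float32)
    for label, bbox in zip(labels, bboxes):  # plays the role of tf.while_loop
        jaccard = jaccard_with_anchors(corners, bbox)
        mask = jaccard > feat_scores
        feat_labels = np.where(mask, label, feat_labels)
        feat_scores = np.where(mask, jaccard, feat_scores)
    return feat_labels, feat_scores


corners = layer_anchor_corners(2, [(0.5, 0.5), (0.25, 1.0)])
labels = [7, 12]
bboxes = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.0, 1.0, 1.0)]
feat_labels, feat_scores = match_layer(corners, labels, bboxes)
print(feat_labels[..., 0])
print(feat_scores[..., 0])
```

Anchors that overlap no object keep label 0 and score 0, which is how background positions end up in the targets.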
    

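    As the last few lines show, feat_localizations records the position correction. It does not store the raw difference between anchor box and ground-truth box; instead it stores the values in the format the loss function expects. The purpose of the prior_scaling step is not clear to me. The values match the box "variances" (0.1, 0.1, 0.2, 0.2) used in the original Caffe SSD implementation, so presumably it rescales the regression targets. Either way it does not appear to hurt the loss function: the loss is still at its best when the anchor box equals the ground-truth box.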
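
The encoding can be checked by hand on a single box. The sketch below mirrors the formulas at the end of tf_ssd_bboxes_encode_layer in plain Python; `encode_box` and `decode_box` are hypothetical helper names written for this illustration (the decoder is simply the algebraic inverse, not a function taken from the repository).

```python
import math

PRIOR_SCALING = (0.1, 0.1, 0.2, 0.2)


def encode_box(gt, anchor, prior_scaling=PRIOR_SCALING):
    """gt: (ymin, xmin, ymax, xmax); anchor: (cy, cx, h, w); all relative."""
    ymin, xmin, ymax, xmax = gt
    acy, acx, ah, aw = anchor
    cy, cx = (ymin + ymax) / 2.0, (xmin + xmax) / 2.0
    h, w = ymax - ymin, xmax - xmin
    t_cy = (cy - acy) / ah / prior_scaling[0]
    t_cx = (cx - acx) / aw / prior_scaling[1]
    t_h = math.log(h / ah) / prior_scaling[2]
    t_w = math.log(w / aw) / prior_scaling[3]
    return (t_cx, t_cy, t_w, t_h)  # SSD ordering: x, y, w, h


def decode_box(target, anchor, prior_scaling=PRIOR_SCALING):
    """Inverse of encode_box; returns (ymin, xmin, ymax, xmax)."""
    t_cx, t_cy, t_w, t_h = target
    acy, acx, ah, aw = anchor
    cy = t_cy * prior_scaling[0] * ah + acy
    cx = t_cx * prior_scaling[1] * aw + acx
    h = ah * math.exp(t_h * prior_scaling[2])
    w = aw * math.exp(t_w * prior_scaling[3])
    return (cy - h / 2.0, cx - w / 2.0, cy + h / 2.0, cx + w / 2.0)


anchor = (0.5, 0.5, 0.2, 0.2)
same_box = (0.4, 0.4, 0.6, 0.6)          # coincides with the anchor
shifted_box = (0.35, 0.45, 0.65, 0.75)   # larger and shifted to the right

zero_target = encode_box(same_box, anchor)
target = encode_box(shifted_box, anchor)
restored = decode_box(target, anchor)
print(zero_target)
print(target)
print(restored)
```

A ground-truth box that coincides with its anchor encodes to (approximately) all zeros, and dividing by 0.1 and 0.2 multiplies the center offsets by 10 and the log size ratios by 5.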

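    2. Assembling Batches

    Creating the batch data queue

    Up to this point, all our data is for a single image and needs to be assembled into Tensors with a batch dimension. There is a small complication: our data mostly takes the form of lists of Tensors, so adding the batch dimension takes a small trick (tf_utils.py):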

    def reshape_list(l, shape=None):
        """Reshape list of (list): 1D to 2D or the other way around.
    
        Args:
          l: List or List of list.
          shape: 1D or 2D shape.
        Return
          Reshaped list.
        """
        r = []
        if shape is None:
            # Flatten everything.
            for a in l:
                if isinstance(a, (list, tuple)):
                    r = r + list(a)
                else:
                    r.append(a)
        else:
            # Reshape to list of list.
            i = 0
            for s in shape:
                if s == 1:
                    r.append(l[i])
                else:
                    r.append(l[i:i+s])
                i += s
        return r
    

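    This function converts between list1: [Tensor11, [Tensor21, Tensor22, ……], [Tensor31, Tensor32, ……], ……] and list2: [Tensor1, Tensor2, ……]. All it needs is a record of the length of each sub-list in list1, with a single Tensor counted as 1 (train_ssd_network.py):

                batch_shape = [1] + [len(ssd_anchors)] * 3  # (1, f, f, f), f = number of feature layers
    
                # Training batches and queue.
                r = tf.train.batch(  # image, anchor classes, encoded box offsets, scores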
                    tf_utils.reshape_list([image, gclasses, glocalisations, gscores]),
                    batch_size=FLAGS.batch_size,  # 32
                    num_threads=FLAGS.num_preprocessing_threads,
                    capacity=5 * FLAGS.batch_size)
    
                b_image, b_gclasses, b_glocalisations, b_gscores = \
                    tf_utils.reshape_list(r, batch_shape)
    
                # Intermediate queueing: unique batch computation pipeline for all
                # GPUs running the training.
                batch_queue = slim.prefetch_queue.prefetch_queue(
                    tf_utils.reshape_list([b_image, b_gclasses, b_glocalisations, b_gscores]),
                    capacity=2 * deploy_config.num_clones)
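
The flatten-and-regroup round trip is easy to verify without TensorFlow. The sketch below copies reshape_list from the listing above and uses strings as stand-ins for Tensors; the three-layer setup and the placeholder names are assumptions chosen to keep the output short.

```python
def reshape_list(l, shape=None):
    """Same logic as tf_utils.reshape_list in the listing above."""
    r = []
    if shape is None:
        # Flatten one level of nesting.
        for a in l:
            if isinstance(a, (list, tuple)):
                r = r + list(a)
            else:
                r.append(a)
    else:
        # Regroup a flat list according to `shape`.
        i = 0
        for s in shape:
            if s == 1:
                r.append(l[i])
            else:
                r.append(l[i:i + s])
            i += s
    return r


# Strings stand in for Tensors; three feature layers instead of six.
image = "image"
gclasses = ["c0", "c1", "c2"]
glocalisations = ["l0", "l1", "l2"]
gscores = ["s0", "s1", "s2"]

flat = reshape_list([image, gclasses, glocalisations, gscores])
batch_shape = [1] + [len(gclasses)] * 3
nested = reshape_list(flat, batch_shape)
print(flat)
print(nested)
```

Note that a group of length 1 comes back as the bare element rather than a one-element list, which is why the image is recovered as a single Tensor.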
    

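    Because tf.train.batch takes its input as [Tensor1, Tensor2, ……], the function above is first used to flatten the input, turning the single-image label data into batched label data, and the result is then converted back (in effect, list1 is flattened to list2, every Tensor in it gains a batch dimension, and it is converted back to the list1 format). Finally a queue is built from the batched Tensors. This is more trouble than necessary, though: doing it as below also runs without error and skips the back-and-forth regrouping of Tensors……

                batch_shape = [1] + [len(ssd_anchors)] * 3 # (1, f, f, f), f = number of feature layers
                r = tf.train.batch(  # image, anchor classes, encoded box offsets, scores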
                    tf_utils.reshape_list([image, gclasses, glocalisations, gscores]),
                    batch_size=FLAGS.batch_size,  # 32
                    num_threads=FLAGS.num_preprocessing_threads,
                    capacity=5 * FLAGS.batch_size)
    
                # Intermediate queueing: unique batch computation pipeline for all
                # GPUs running the training.
                batch_queue = slim.prefetch_queue.prefetch_queue(
                    r,                                # <----- the input format does not actually need adjusting
                    capacity=2 * deploy_config.num_clones)
    

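    Fetching data from the batch queue

                # Dequeue batch.
                b_image, b_gclasses, b_glocalisations, b_gscores = \
                    tf_utils.reshape_list(batch_queue.dequeue(), batch_shape)  # regroup the list
    

    After dequeuing, just regroup the list. The data now has the following format (taking vgg_300 as an example):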

    <tf.Tensor 'batch:0' shape=(32, 3, 300, 300) dtype=float32>

    [<tf.Tensor 'batch:1' shape=(32, 38, 38, 4) dtype=int64>,
    <tf.Tensor 'batch:2' shape=(32, 19, 19, 6) dtype=int64>,
    <tf.Tensor 'batch:3' shape=(32, 10, 10, 6) dtype=int64>,
    <tf.Tensor 'batch:4' shape=(32, 5, 5, 6) dtype=int64>,
    <tf.Tensor 'batch:5' shape=(32, 3, 3, 4) dtype=int64>,
    <tf.Tensor 'batch:6' shape=(32, 1, 1, 4) dtype=int64>]

    [<tf.Tensor 'batch:7' shape=(32, 38, 38, 4, 4) dtype=float32>,
    <tf.Tensor 'batch:8' shape=(32, 19, 19, 6, 4) dtype=float32>,
    <tf.Tensor 'batch:9' shape=(32, 10, 10, 6, 4) dtype=float32>,
    <tf.Tensor 'batch:10' shape=(32, 5, 5, 6, 4) dtype=float32>,
    <tf.Tensor 'batch:11' shape=(32, 3, 3, 4, 4) dtype=float32>,
    <tf.Tensor 'batch:12' shape=(32, 1, 1, 4, 4) dtype=float32>]

    [<tf.Tensor 'batch:13' shape=(32, 38, 38, 4) dtype=float32>,
    <tf.Tensor 'batch:14' shape=(32, 19, 19, 6) dtype=float32>,
    <tf.Tensor 'batch:15' shape=(32, 10, 10, 6) dtype=float32>,
    <tf.Tensor 'batch:16' shape=(32, 5, 5, 6) dtype=float32>,
    <tf.Tensor 'batch:17' shape=(32, 3, 3, 4) dtype=float32>,
    <tf.Tensor 'batch:18' shape=(32, 1, 1, 4) dtype=float32>]
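
As a quick sanity check on the shapes listed above, the sketch below rebuilds them from the per-layer grid sizes and anchor counts. The two constant lists are read off the vgg_300 output above; the function names are hypothetical helpers for this illustration.

```python
# Grid sizes and anchors per center, read off the vgg_300 shapes above.
FEAT_SHAPES = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
ANCHORS_PER_CELL = [4, 6, 6, 6, 4, 4]


def batched_label_shapes(batch_size,
                         feat_shapes=FEAT_SHAPES,
                         anchors_per_cell=ANCHORS_PER_CELL):
    """Expected shapes of the class/score tensors and the localisation tensors."""
    class_shapes = [(batch_size, rows, cols, k)
                    for (rows, cols), k in zip(feat_shapes, anchors_per_cell)]
    loc_shapes = [shape + (4,) for shape in class_shapes]
    return class_shapes, loc_shapes


def total_anchors(feat_shapes=FEAT_SHAPES, anchors_per_cell=ANCHORS_PER_CELL):
    """Number of anchor boxes per image across all feature layers."""
    return sum(rows * cols * k
               for (rows, cols), k in zip(feat_shapes, anchors_per_cell))


class_shapes, loc_shapes = batched_label_shapes(32)
n_anchors = total_anchors()
print(class_shapes)
print(loc_shapes)
print(n_anchors)
```

The scores share the class shapes, and the total across the six layers is the number of anchor boxes each image is matched against.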

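     The data format now meets the requirements of both the loss function and the network input, so we can simply run:

                # Construct SSD network.
                # This instance method wraps the previously defined ssd_arg_scope function (two of its parameters can be changed)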
                arg_scope = ssd_net.arg_scope(weight_decay=FLAGS.weight_decay,
                                              data_format=DATA_FORMAT)
                with slim.arg_scope(arg_scope):
                    # predictions: (BS, H, W, 4, 21)
                    # localisations: (BS, H, W, 4, 4)
                    # logits: (BS, H, W, 4, 21)
                    predictions, localisations, logits, end_points = \
                        ssd_net.net(b_image, is_training=True)
    
                # Add loss function.
                ssd_net.losses(logits, localisations,
                               b_gclasses, b_glocalisations, b_gscores,
                               match_threshold=FLAGS.match_threshold,  # .5
                               negative_ratio=FLAGS.negative_ratio,  # 3
                               alpha=FLAGS.loss_alpha,  # 1
                               label_smoothing=FLAGS.label_smoothing)  # .0
    

    The forward-pass function returns the relevant nodes, while the loss function adds its values to the loss collection.

  • Original article: https://www.cnblogs.com/hellcat/p/9355609.html