• Object Detection Algorithms: YOLO-V1 Training Code Explained


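    The YOLO-V1 network consists of 24 convolutional layers followed by 2 fully connected layers. The input is 448×448×3 and the output has dimensions S×S×(B×5+C), where S is the number of grid cells along each side of the image, B is the number of bounding boxes each grid cell predicts, and C is the number of classes.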
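    As a quick check of the output size, here is a minimal sketch that evaluates S×S×(B×5+C). The defaults S=7, B=2, C=20 are the values the code later in this post uses; the function name is only illustrative.

```python
def yolo_v1_output_shape(S=7, B=2, C=20):
    """Return the (S, S, B*5 + C) shape of the YOLO-V1 prediction tensor."""
    return (S, S, B * 5 + C)


shape = yolo_v1_output_shape()
flat_length = shape[0] * shape[1] * shape[2]  # length of the flattened output
print(shape, flat_length)  # (7, 7, 30) 1470
```

    The flattened length 1470 is the size of the last fully connected layer (7 * 7 * 30).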

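    YOLO-V1 divides an image into S×S grid cells. If the center of an object falls inside a grid cell, that cell is responsible for predicting the object. Each grid cell predicts B bounding boxes, and each bounding box predicts a confidence value. This confidence carries two pieces of information: how confident the model is that the bounding box contains an object, and how accurate the predicted bounding box is.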

    $$\Pr(\text{Object}) \times \mathrm{IoU}_{\text{pred}}^{\text{truth}}$$

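    If an object falls in a grid cell, the first term of the formula is 1, otherwise it is 0. The second term is the IOU between the bounding box and the ground-truth box. (Confidence is defined per bounding box, and depends on whether the cell contains the center of an object.)

    In YOLO-V1 each grid cell has two bounding boxes, and each bounding box has 5 predicted values: x, y, w, h, and confidence. Each grid cell also predicts C conditional class probabilities, that is, the probability that the object belongs to a given class, given that the cell contains an object. (x, y) is the offset of the bounding box center relative to the bounds of the grid cell, normalized to the range (0, 1). w and h are the predicted width and height relative to the whole image, also normalized to (0, 1). c is the confidence that a given bounding box contains an object. The class-specific confidence of each box is computed as follows: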

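    $$\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \mathrm{IoU}_{\text{pred}}^{\text{truth}} = \Pr(\text{Class}_i) \times \mathrm{IoU}_{\text{pred}}^{\text{truth}}$$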
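    The product above can be sketched in NumPy for a single grid cell. The function name and the numbers are made up for illustration; only the multiplication itself comes from the formula.

```python
import numpy as np


def class_scores(class_probs, box_confidences):
    """Class-specific confidence for one grid cell.

    class_probs:     (C,) conditional probabilities Pr(Class_i | Object)
    box_confidences: (B,) per-box confidence Pr(Object) * IoU
    Returns a (B, C) array with one score per box and class.
    """
    class_probs = np.asarray(class_probs, dtype=np.float32)
    box_confidences = np.asarray(box_confidences, dtype=np.float32)
    # Broadcast (B, 1) * (1, C) -> (B, C)
    return box_confidences[:, None] * class_probs[None, :]


probs = [0.5, 0.25, 0.25]  # illustrative values with C = 3
confs = [0.8, 0.4]         # illustrative values with B = 2
scores = class_scores(probs, confs)
print(scores.shape)  # (2, 3)
```

    Each row holds the scores of one bounding box; the first box here scores 0.8 × 0.5 = 0.4 for the first class.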

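    The following explains how the predicted x, y are normalized to the range 0-1 as offsets relative to the corresponding grid cell, and how w, h are normalized to 0-1 using the image width and height. Each cell predicts B vectors of the form (x, y, w, h, confidence). Assume the image is divided into S×S grid cells with S=7, and that the image has width w_i and height h_i.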

    The figure below, borrowed from an explanation I found very clear, illustrates the setup:

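    1. (x, y) is the offset of the bbox center relative to its grid cell. Take the blue cell in the figure above, located at (x_col = 1, y_row = 4), and suppose its predicted output is the red bbox with center (x_c, y_c). The final predicted (x, y) is normalized and expresses the offset relative to that cell:

    $$x = \frac{x_c}{w_i} S - x_{col}, \qquad y = \frac{y_c}{h_i} S - y_{row}$$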

    2. (w, h) is the size of the bbox as a fraction of the whole image. If the predicted bbox has width w_b and height h_b, then:

    $$w = \frac{w_b}{w_i}, \qquad h = \frac{h_b}{h_i}$$
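    The two normalization steps can be sketched as a pair of helper functions. The names encode_box and decode_box and the example box are hypothetical; the arithmetic follows the formulas for (x, y) and (w, h) given above.

```python
def encode_box(xc, yc, wb, hb, img_w, img_h, S=7):
    """Turn a pixel-space box (center xc, yc; size wb, hb) into YOLO-V1 targets."""
    gx = xc * S / img_w           # center position measured in grid cells
    gy = yc * S / img_h
    col = min(int(gx), S - 1)     # x_col, clamped so xc == img_w stays in range
    row = min(int(gy), S - 1)     # y_row
    x, y = gx - col, gy - row     # offsets inside the cell
    w, h = wb / img_w, hb / img_h  # size as a fraction of the image
    return col, row, x, y, w, h


def decode_box(col, row, x, y, w, h, img_w, img_h, S=7):
    """Invert encode_box and return (xc, yc, wb, hb) in pixels."""
    xc = (x + col) / S * img_w
    yc = (y + row) / S * img_h
    return xc, yc, w * img_w, h * img_h


# A 112x224 box centered at (100, 300) in a 448x448 image
col, row, x, y, w, h = encode_box(100, 300, 112, 224, 448, 448)
print(col, row, x, y, w, h)  # 1 4 0.5625 0.6875 0.25 0.5
```

    With this example the box center lands in the cell (x_col = 1, y_row = 4), the same cell used in the description above.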

    Parameters needed by YOLO-V1


    # The listings in this post assume TensorFlow 1.x and NumPy:
    #   import numpy as np
    #   import tensorflow as tf
    def __init__(self):
        self.classes = ["aeroplane", "bicycle", "bird", "boat", "bottle",
                        "bus", "car", "cat", "chair", "cow", "diningtable",
                        "dog", "horse", "motorbike", "person", "pottedplant",
                        "sheep", "sofa", "train", "tvmonitor"]
        # Grid offsets used when computing coordinates, shape (7, 7, 2):
        # x_offset holds each cell's column index, y_offset its row index
        self.x_offset = np.transpose(np.reshape(np.array([np.arange(7)] * 7 * 2, dtype=np.float32), [2, 7, 7]), [1, 2, 0])
        self.y_offset = np.transpose(self.x_offset, [1, 0, 2])
        # Input image size
        self.img_size = (448, 448)
        # IOU threshold
        self.iou_threshold = 0.5
        self.batch_size = 45
        # Weights used when computing the loss
        self.class_scale = 2.0
        self.object_scale = 1.0
        self.noobject_scale = 1.0
        self.coord_scale = 5.0
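    To see what the two offset arrays contain, the following sketch rebuilds them outside the class and inspects one cell. The constants S and B are introduced here only for readability; the construction is the same as in __init__.

```python
import numpy as np

S, B = 7, 2  # grid size and boxes per cell

# Same construction as in __init__: every row of the (B, S, S) array is 0..S-1
x_offset = np.transpose(
    np.reshape(np.array([np.arange(S)] * S * B, dtype=np.float32), [B, S, S]),
    [1, 2, 0])
y_offset = np.transpose(x_offset, [1, 0, 2])

print(x_offset.shape)  # (7, 7, 2)
print(x_offset[3, 5])  # [5. 5.]  -> column index of the cell, once per box
print(y_offset[3, 5])  # [3. 3.]  -> row index of the cell, once per box
```

    Indexing is [row, column, box], so adding x_offset and y_offset to the per-cell predictions turns cell-relative offsets into grid coordinates.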

    Start of the network section


    def _build_net(self):
        x = tf.placeholder(tf.float32, [None, 448, 448, 3])
        with tf.variable_scope('yolo'):
            net = self.conv_layer(x, 64, 7, 2, 'conv_2')
            net = self.max_pool_layer(net, 2, 2)
            net = self.conv_layer(net, 192, 3, 1, 'conv_4')
            net = self.max_pool_layer(net, 2, 2)
            net = self.conv_layer(net, 128, 1, 1, 'conv_6')
            net = self.conv_layer(net, 256, 3, 1, 'conv_7')
            net = self.conv_layer(net, 256, 1, 1, 'conv_8')
            net = self.conv_layer(net, 512, 3, 1, 'conv_9')
            net = self.max_pool_layer(net, 2, 2)
            net = self.conv_layer(net, 256, 1, 1, 'conv_11')
            net = self.conv_layer(net, 512, 3, 1, 'conv_12')
            net = self.conv_layer(net, 256, 1, 1, 'conv_13')
            net = self.conv_layer(net, 512, 3, 1, 'conv_14')
            net = self.conv_layer(net, 256, 1, 1, 'conv_15')
            net = self.conv_layer(net, 512, 3, 1, 'conv_16')
            net = self.conv_layer(net, 256, 1, 1, 'conv_17')
            net = self.conv_layer(net, 512, 3, 1, 'conv_18')
            net = self.conv_layer(net, 512, 1, 1, 'conv_19')
            net = self.conv_layer(net, 1024, 3, 1, 'conv_20')
            net = self.max_pool_layer(net, 2, 2)
            net = self.conv_layer(net, 512, 1, 1, 'conv_22')
            net = self.conv_layer(net, 1024, 3, 1, 'conv_23')
            net = self.conv_layer(net, 512, 1, 1, 'conv_24')
            net = self.conv_layer(net, 1024, 3, 1, 'conv_25')
            net = self.conv_layer(net, 1024, 3, 1, 'conv_26')
            net = self.conv_layer(net, 1024, 3, 2, 'conv_28')
            net = self.conv_layer(net, 1024, 3, 1, 'conv_29')
            net = self.conv_layer(net, 1024, 3, 1, 'conv_30')
            net = self.flatten_layer(net)
            net = self.dense_layer(net, 512, activation=self.Leaky_Relu, scope='fc_33')
            net = self.dense_layer(net, 4096, activation=self.Leaky_Relu, scope='fc_34')
            net = self.dense_layer(net, 7 * 7 * 30, scope='fc_36')
        return net
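    One way to check that this stack really reduces a 448×448 input to a 7×7 feature map is to trace the spatial size layer by layer. The sketch below assumes the padding behaviour of the helper layers in this post: convolutions pad by kernel_size // 2 on each side and then use VALID padding, and max pooling uses SAME padding. The helper names conv_out and pool_out are hypothetical.

```python
import math


def conv_out(n, kernel, stride):
    """Output size of a conv that pads by kernel // 2 and then uses VALID padding."""
    pad = kernel // 2
    return (n + 2 * pad - kernel) // stride + 1


def pool_out(n, stride):
    """Output size of a max pool with SAME padding."""
    return math.ceil(n / stride)


# (type, kernel, stride), in the order used by _build_net
layers = [("conv", 7, 2), ("pool", 2, 2), ("conv", 3, 1), ("pool", 2, 2)]
layers += [("conv", 1, 1), ("conv", 3, 1)] * 2 + [("pool", 2, 2)]
layers += [("conv", 1, 1), ("conv", 3, 1)] * 5 + [("pool", 2, 2)]
layers += [("conv", 1, 1), ("conv", 3, 1)] * 2
layers += [("conv", 3, 1), ("conv", 3, 2), ("conv", 3, 1), ("conv", 3, 1)]

size = 448
for kind, kernel, stride in layers:
    size = conv_out(size, kernel, stride) if kind == "conv" else pool_out(size, stride)

print(size)  # 7
```

    The six size reductions (two stride-2 convolutions and four pooling layers) give 448 / 2^6 = 7, so the flatten layer sees a 7×7×1024 tensor.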

    Layers used by the network

    # Leaky ReLU activation
    def Leaky_Relu(self, x):
        return tf.maximum(x * 0.1, x)

    # Convolutional layer
    def conv_layer(self, x, filter, kernel_size, stride, scope):
        channel = x.get_shape().as_list()[-1]
        weight = tf.Variable(tf.truncated_normal(shape=[kernel_size, kernel_size, channel, filter], stddev=0.1),
                             name="weights")
        bias = tf.Variable(tf.zeros([filter, ]), name="biases")
        # Pad explicitly by kernel_size // 2 on each side, then convolve with VALID padding
        pad_size = kernel_size // 2
        x = tf.pad(x, paddings=[[0, 0], [pad_size, pad_size], [pad_size, pad_size], [0, 0]])

        conv = tf.nn.conv2d(x, weight, strides=[1, stride, stride, 1], padding="VALID", name=scope)
        output = self.Leaky_Relu(tf.nn.bias_add(conv, bias))
        return output

    # Max pooling layer
    def max_pool_layer(self, x, pool_size, stride):
        return tf.nn.max_pool(x, [1, pool_size, pool_size, 1], strides=[1, stride, stride, 1], padding="SAME")

    # Fully connected layer
    def dense_layer(self, x, filter, activation=None, scope=None):
        channel = x.get_shape().as_list()[-1]
        weight = tf.Variable(tf.truncated_normal(shape=[channel, filter], stddev=0.1), name="weights")
        bias = tf.Variable(tf.zeros([filter, ]), name="biases")
        output = tf.nn.xw_plus_b(x, weight, bias, name=scope)
        if activation:
            output = activation(output)
        return output

    # Flatten layer: move channels first (NHWC -> NCHW), then flatten
    def flatten_layer(self, x):
        x = tf.transpose(x, [0, 3, 1, 2])
        shape = x.get_shape().as_list()[1:]
        nums = np.prod(shape)  # np.product is deprecated in favor of np.prod
        return tf.reshape(x, [-1, nums])
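    The activation and the flatten step are easy to try outside TensorFlow. This NumPy sketch mirrors Leaky_Relu and flatten_layer under the assumption of an NHWC input; the function names are illustrative and not part of the original class.

```python
import numpy as np


def leaky_relu(x, alpha=0.1):
    """Leaky ReLU: keep positive values, scale negative values by alpha."""
    return np.maximum(alpha * x, x)


def flatten_channels_first(x):
    """(N, H, W, C) -> (N, C*H*W), moving channels first as flatten_layer does."""
    x = np.transpose(x, (0, 3, 1, 2))
    return x.reshape(x.shape[0], -1)


activated = leaky_relu(np.array([-2.0, 0.0, 3.0]))  # values: -0.2, 0.0, 3.0

feat = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)  # N=2, H=2, W=2, C=3
flat = flatten_channels_first(feat)
print(flat.shape)   # (2, 12)
print(flat[0, :4])  # [0 3 6 9] -> all four positions of channel 0 come first
```

    Because of the transpose, the flattened vector groups values by channel rather than by spatial position, which matters if pretrained weights stored in channel-first order are loaded into the first dense layer.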

    End of the network section


    Loss function section

    The YOLO-V1 loss function:
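    As defined in the YOLO-V1 paper, the loss sums five terms over the S×S grid cells and the B boxes per cell. Here 1_{ij}^{obj} is 1 when box j of cell i is responsible for an object, 1_{ij}^{noobj} is its complement, 1_i^{obj} is 1 when cell i contains an object, and hatted symbols are predictions:

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
           + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
      \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

    The paper sets λ_coord = 5 and λ_noobj = 0.5. The code in this post uses its own weights: coord_scale = 5.0, noobject_scale = 1.0, object_scale = 1.0, and an extra class_scale = 2.0 on the classification term.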

     

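    (1) The class prediction is penalized only when a grid cell contains an object.

    (2) The box coordinate prediction is penalized only when a bounding box is responsible for a ground-truth box. A bounding box is responsible for a ground-truth box when its IOU with that box is the largest among all boxes in the same grid cell.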
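    Rule (2) amounts to building a per-box mask. The following NumPy sketch shows one way to do it, assuming the IOU of every predicted box with its cell's ground-truth box is already known; the function name and the sample values are made up.

```python
import numpy as np


def responsible_mask(iou, response):
    """Mark the box with the highest IOU in every cell that contains an object.

    iou:      (S, S, B) IOU of each predicted box with the cell's ground truth
    response: (S, S, 1) 1 where the cell contains an object center, else 0
    Returns an (S, S, B) float mask.
    """
    best = np.max(iou, axis=-1, keepdims=True)
    # Cells without an object are zeroed out by response
    return (iou >= best).astype(np.float32) * response


iou = np.zeros((7, 7, 2), dtype=np.float32)
response = np.zeros((7, 7, 1), dtype=np.float32)
iou[4, 1] = [0.3, 0.6]   # only cell (row 4, col 1) holds an object
response[4, 1, 0] = 1.0

mask = responsible_mask(iou, response)
print(mask[4, 1])  # [0. 1.] -> the second box is responsible
print(mask.sum())  # 1.0
```

    Note that with the >= comparison an exact tie would mark both boxes of a cell as responsible; with real-valued predictions this rarely matters.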

    Why does the formula take the square root of w and h?


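    The black box is the predicted bounding box; the red and green boxes are ground-truth boxes. Without the square root on w and h, the bounding box incurs the same localization loss against both ground-truth boxes. In terms of area, however, the black box is 25 times the green box, while the red box is only 81/25 times the black box. The size mismatch between the black and green boxes is much larger, so the two cases should not receive the same loss. Taking the square root of w and h brings the loss closer to this intuition.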
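    The effect can be checked with a few lines of arithmetic. The side lengths below (green 1, black 5, red 9) are an assumption chosen to match the area ratios quoted above (25 and 81/25) for square boxes; they are not taken from the original figure.

```python
import math


def size_error(pred, truth, use_sqrt):
    """Squared error on one box dimension, with or without the square root."""
    if use_sqrt:
        return (math.sqrt(pred) - math.sqrt(truth)) ** 2
    return (pred - truth) ** 2


black, green, red = 5.0, 1.0, 9.0  # assumed side lengths

plain_small = size_error(black, green, use_sqrt=False)
plain_large = size_error(black, red, use_sqrt=False)
sqrt_small = size_error(black, green, use_sqrt=True)
sqrt_large = size_error(black, red, use_sqrt=True)

print(plain_small, plain_large)  # 16.0 16.0
print(round(sqrt_small, 3), round(sqrt_large, 3))
```

    Without the square root both errors are 16. With it, the error against the small green box (about 1.53) is clearly larger than the error against the large red box (about 0.58), so the same absolute deviation costs more on small boxes.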

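    Function for computing IOU

    def calc_iou(self, bboxes1, bboxes2):
        # loss_layer passes boxes as (x_center, y_center, w, h), so convert them
        # to corners (xmin, ymin, xmax, ymax) first. tf ops are used because the
        # inputs are tensors in the loss graph.
        boxes1 = tf.stack([bboxes1[..., 0] - bboxes1[..., 2] / 2.0,
                           bboxes1[..., 1] - bboxes1[..., 3] / 2.0,
                           bboxes1[..., 0] + bboxes1[..., 2] / 2.0,
                           bboxes1[..., 1] + bboxes1[..., 3] / 2.0], axis=-1)
        boxes2 = tf.stack([bboxes2[..., 0] - bboxes2[..., 2] / 2.0,
                           bboxes2[..., 1] - bboxes2[..., 3] / 2.0,
                           bboxes2[..., 0] + bboxes2[..., 2] / 2.0,
                           bboxes2[..., 1] + bboxes2[..., 3] / 2.0], axis=-1)

        # Intersection: its top-left corner is the max of the two top-left corners,
        # its bottom-right corner is the min of the two bottom-right corners
        int_xmin = tf.maximum(boxes1[..., 0], boxes2[..., 0])
        int_ymin = tf.maximum(boxes1[..., 1], boxes2[..., 1])
        int_xmax = tf.minimum(boxes1[..., 2], boxes2[..., 2])
        int_ymax = tf.minimum(boxes1[..., 3], boxes2[..., 3])

        # Width and height of the intersection: if the boxes do not overlap the raw
        # difference is negative, so clamp it at 0
        int_w = tf.maximum(int_xmax - int_xmin, 0.)
        int_h = tf.maximum(int_ymax - int_ymin, 0.)

        # IOU = intersection / union
        int_vol = int_w * int_h                    # intersection area
        vol1 = bboxes1[..., 2] * bboxes1[..., 3]   # area of bboxes1
        vol2 = bboxes2[..., 2] * bboxes2[..., 3]   # area of bboxes2
        union = tf.maximum(vol1 + vol2 - int_vol, 1e-10)  # avoid division by zero
        return tf.clip_by_value(int_vol / union, 0.0, 1.0)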
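    For experimenting without a TensorFlow session, here is a NumPy sketch of the intersection-over-union computation for boxes given as (x_center, y_center, w, h), which is the layout loss_layer builds before it calls calc_iou. The function name and the sample boxes are illustrative.

```python
import numpy as np


def iou_center_format(boxes1, boxes2):
    """IOU of boxes stored as (x_center, y_center, w, h) in the last axis."""
    b1 = np.asarray(boxes1, dtype=np.float64)
    b2 = np.asarray(boxes2, dtype=np.float64)

    def corners(b):
        # (xmin, ymin, xmax, ymax)
        return (b[..., 0] - b[..., 2] / 2, b[..., 1] - b[..., 3] / 2,
                b[..., 0] + b[..., 2] / 2, b[..., 1] + b[..., 3] / 2)

    x1min, y1min, x1max, y1max = corners(b1)
    x2min, y2min, x2max, y2max = corners(b2)

    # Clamp at 0 so that non-overlapping boxes give an empty intersection
    inter_w = np.maximum(np.minimum(x1max, x2max) - np.maximum(x1min, x2min), 0.0)
    inter_h = np.maximum(np.minimum(y1max, y2max) - np.maximum(y1min, y2min), 0.0)
    inter = inter_w * inter_h

    union = b1[..., 2] * b1[..., 3] + b2[..., 2] * b2[..., 3] - inter
    return inter / np.maximum(union, 1e-10)


a = [0.5, 0.5, 0.4, 0.4]  # spans x 0.3..0.7, y 0.3..0.7
b = [0.6, 0.5, 0.4, 0.4]  # same size, shifted right by 0.1
c = [0.1, 0.1, 0.1, 0.1]  # does not touch a

iou_ab = iou_center_format(a, b)  # 0.12 / (0.16 + 0.16 - 0.12), about 0.6
iou_ac = iou_center_format(a, c)  # 0.0
```

    The function broadcasts over leading axes, so it also accepts arrays of shape (batch, 7, 7, 2, 4).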

    def loss_layer(self, predicts, labels, scope='loss_layer'):
        # labels has shape (batch_size, 7, 7, 25): the first 5 values are the box
        # info (c, x, y, w, h), the remaining 20 are the classes
        with tf.variable_scope(scope):
            # Predictions: the network output has shape (batch_size, 1470)
            # Class probabilities, 20 per cell
            predict_classes = tf.reshape(
                predicts[:, :7 * 7 * 20],
                [self.batch_size, 7, 7, 20])
            # Confidences, 2 per cell
            predict_confidence = tf.reshape(
                predicts[:, 7 * 7 * 20:7 * 7 * 20 + 7 * 7 * 2],
                [self.batch_size, 7, 7, 2])
            # Bounding boxes, 2 * 4 per cell
            predict_boxes = tf.reshape(
                predicts[:, 7 * 7 * 20 + 7 * 7 * 2:],
                [self.batch_size, 7, 7, 2, 4])

            # Ground truth
            # shape (45, 7, 7, 1)
            # response is 1 for cells that contain an object and 0 otherwise.
            # "Contains an object" means the cell contains the object's center, not
            # just part of the object, so only the cell holding the center is 1.
            response = tf.reshape(
                labels[..., 0],
                [self.batch_size, 7, 7, 1])
            # shape (45, 7, 7, 1, 4)
            boxes = tf.reshape(
                labels[..., 1:5],
                [self.batch_size, 7, 7, 1, 4])
            # shape (45, 7, 7, 2, 4); the four box values are scaled to 0~1.
            # __init__ stores the image size as self.img_size
            boxes = tf.tile(
                boxes, [1, 1, 1, 2, 1]) / self.img_size[0]
            # shape (45, 7, 7, 20)
            classes = labels[..., 5:]

            # self.x_offset has shape (7, 7, 2); add a batch axis to get
            # (1, 7, 7, 2) so that it can be tiled to (45, 7, 7, 2)
            x_offset = tf.tile(
                tf.reshape(self.x_offset, [1, 7, 7, 2]),
                [self.batch_size, 1, 1, 1])
            # shape (45, 7, 7, 2)
            y_offset = tf.transpose(x_offset, (0, 2, 1, 3))

            # shape (45, 7, 7, 2, 4) -> (x, y, w, h) relative to the whole image
            predict_boxes_tran = tf.stack(
                [(predict_boxes[..., 0] + x_offset) / 7,
                 (predict_boxes[..., 1] + y_offset) / 7,
                 tf.square(predict_boxes[..., 2]),
                 tf.square(predict_boxes[..., 3])], axis=-1)

            # IOU between predicted and ground-truth boxes, shape (45, 7, 7, 2)
            iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)

            # shape (45, 7, 7, 1)
            # During training, if a cell does contain an object, only the box with
            # the largest IOU is responsible for it; the other box is treated as
            # containing no object
            object_mask = tf.reduce_max(iou_predict_truth, axis=3, keep_dims=True)
            # object_mask shape (45, 7, 7, 2)
            object_mask = tf.cast(
                (iou_predict_truth >= object_mask), tf.float32) * response

            # No-object mask, shape (45, 7, 7, 2):
            # 1 for boxes not responsible for any object, 0 for responsible boxes
            noobject_probs = tf.ones_like(
                object_mask, dtype=tf.float32) - object_mask

            # shape (45, 7, 7, 2, 4): bring the ground-truth boxes into the same form
            # as the raw predictions. xy is relative to the cell's top-left corner,
            # wh is the square root of the normalized size, all in the range 0~1
            boxes_tran = tf.stack(
                [boxes[..., 0] * 7 - x_offset,
                 boxes[..., 1] * 7 - y_offset,
                 tf.sqrt(boxes[..., 2]),
                 tf.sqrt(boxes[..., 3])], axis=-1)

            # class_loss, delta shape (45, 7, 7, 20)
            class_delta = response * (predict_classes - classes)
            class_loss = tf.reduce_mean(
                tf.reduce_sum(tf.square(class_delta), axis=[1, 2, 3]),
                name='class_loss') * self.class_scale

            # object_loss: target confidence = iou * p(object), where p(object) is 1 or 0
            object_delta = object_mask * (predict_confidence - iou_predict_truth)
            object_loss = tf.reduce_mean(
                tf.reduce_sum(tf.square(object_delta), axis=[1, 2, 3]),
                name='object_loss') * self.object_scale

            # noobject_loss: p(object) is 0, so the target confidence is 0
            noobject_delta = noobject_probs * predict_confidence
            noobject_loss = tf.reduce_mean(
                tf.reduce_sum(tf.square(noobject_delta), axis=[1, 2, 3]),
                name='noobject_loss') * self.noobject_scale

            # coord_loss: only the responsible boxes contribute
            coord_mask = tf.expand_dims(object_mask, 4)
            boxes_delta = coord_mask * (predict_boxes - boxes_tran)
            coord_loss = tf.reduce_mean(
                tf.reduce_sum(tf.square(boxes_delta), axis=[1, 2, 3, 4]),
                name='coord_loss') * self.coord_scale

            return class_loss + object_loss + noobject_loss + coord_loss
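    The first step of loss_layer, slicing the flat 1470-value network output into classes, confidences and boxes, can be reproduced in NumPy to check the shapes. This is a sketch with a dummy all-zero prediction; split_predictions is a hypothetical helper, and the slice order (classes, then confidences, then boxes) follows the code above.

```python
import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per cell, classes


def split_predictions(predicts):
    """Split (N, S*S*(C + B*5)) into classes, confidences and boxes."""
    n = predicts.shape[0]
    idx1 = S * S * C            # end of the class block
    idx2 = idx1 + S * S * B     # end of the confidence block
    classes = predicts[:, :idx1].reshape(n, S, S, C)
    confidence = predicts[:, idx1:idx2].reshape(n, S, S, B)
    boxes = predicts[:, idx2:].reshape(n, S, S, B, 4)
    return classes, confidence, boxes


dummy = np.zeros((3, S * S * (C + B * 5)), dtype=np.float32)
classes, confidence, boxes = split_predictions(dummy)
print(classes.shape, confidence.shape, boxes.shape)
# (3, 7, 7, 20) (3, 7, 7, 2) (3, 7, 7, 2, 4)
```

    The three blocks have 980, 98 and 392 values per image, which add up to 1470.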

    End of the loss function section


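    Drawbacks of YOLO-V1
    1. Each grid cell has only 2 bounding boxes, so objects with unusual aspect ratios (that is, ratios not covered by the training data) are detected poorly.

    2. The image is divided into only 7×7 grid cells, so detection degrades when two objects are very close together.

    3. Each grid cell ends up with a single class, so missed detections are likely (an object is not detected at all), e.g. when two objects share the same center point.

    4. Small objects in the image are detected poorly.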




     

  • Original article: https://www.cnblogs.com/cucwwb/p/12791857.html