Faster-RCNN TensorFlow Source Code Reading Notes

Source code: https://github.com/endernewton/tf-faster-rcnn

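I came across a blog post that analyzes this source code concisely and thoroughly. I doubt I could write a better one, but I will still briefly write down my own walkthrough and my impressions after reading the code. https://blog.csdn.net/u012457308/article/details/79566195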

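The code has three main parts: input processing, network construction, and loss computation. The hardest part is the many reshape operations. It is easy to lose track of the tensor layout if you are not careful, so make sure you work through them.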

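The previous note briefly introduced Faster-RCNN; this one covers my notes on reading its TensorFlow source code. After downloading the project, the main code is in the lib folder. The sections below go through the contents of that folder one by one.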

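1. Data loading

The datasets folder mainly contains the data loading code. It is not covered here; the next article will explain how to build a dataset for this code base.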

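2. The nets folder contains the feature extraction networks (mobilenet, resnet_v1, vgg_16), i.e. the part before the RPN. The second part is the RPN, which further processes the extracted features to produce proposals. The third part is the classification and bounding-box regression network. The RPN structure is defined in network.py.

Main routine of the network construction: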

  def _build_network(self, is_training=True):
    # select initializers
    if cfg.TRAIN.TRUNCATED:
      initializer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01)
      initializer_bbox = tf.truncated_normal_initializer(mean=0.0, stddev=0.001)
    else:
      initializer = tf.random_normal_initializer(mean=0.0, stddev=0.01)
      initializer_bbox = tf.random_normal_initializer(mean=0.0, stddev=0.001)

    # run the feature extraction network to get the initial feature map
    net_conv = self._image_to_head(is_training)
    with tf.variable_scope(self._scope, self._scope):
      # build the anchors for the image
      self._anchor_component()
      # region proposal network: produces the proposal coordinates
      rois = self._region_proposal(net_conv, is_training, initializer)
      # region of interest pooling
      if cfg.POOLING_MODE == 'crop':
        # RoI-pool the proposals so that they all have the same size
        pool5 = self._crop_pool_layer(net_conv, rois, "pool5")
      else:
        raise NotImplementedError

    fc7 = self._head_to_tail(pool5, is_training)
    with tf.variable_scope(self._scope, self._scope):
      # region classification: feed into the Fast-RCNN head, which
      # classifies each sample and regresses its bounding box
      cls_prob, bbox_pred = self._region_classification(fc7, is_training,
                                                        initializer, initializer_bbox)

    self._score_summaries.update(self._predictions)

    return rois, cls_prob, bbox_pred

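Let's go through the code step by step.

1. Feature extraction network. There are three options (the paper appears to use VGG-16). This part simply feeds the input through the network and takes the output of one of the convolutional layers; there is nothing special about it.

2. The RPN. This is the key part, and also the biggest difference between Faster-RCNN and the earlier methods.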

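Anchor generation: 9 anchors are generated around each feature map location, covering 3 areas and 3 aspect ratios. The code is in generate_anchors.py and is easy to follow. It starts from the base box (0, 0, 15, 15), computes the base anchor coordinates for the three aspect ratios (0.5, 1, 2), and then multiplies each of them by the scales (8, 16, 32) to obtain the coordinates of the 9 anchors. Shifting these 9 anchors across the feature map then gives the anchor coordinates around every feature location.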
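The anchor construction described above can be sketched in NumPy as follows. This is a minimal sketch written for these notes, not a copy of generate_anchors.py: the function names `base_anchors` and `shifted_anchors` and the default `feat_stride=16` are assumptions made for illustration.

```python
import numpy as np

def _whctrs(anchor):
    # Width, height and center of a box given as (x1, y1, x2, y2).
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    return w, h, anchor[0] + 0.5 * (w - 1), anchor[1] + 0.5 * (h - 1)

def _mkanchors(ws, hs, x_ctr, y_ctr):
    # Boxes of the given widths/heights, all centered on (x_ctr, y_ctr).
    ws = np.asarray(ws, dtype=np.float64)
    hs = np.asarray(hs, dtype=np.float64)
    return np.stack([x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
                     x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)], axis=1)

def base_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    # Start from the base box (0, 0, 15, 15).
    base = np.array([0, 0, base_size - 1, base_size - 1], dtype=np.float64)
    w, h, x_ctr, y_ctr = _whctrs(base)
    ratios = np.asarray(ratios, dtype=np.float64)
    scales = np.asarray(scales, dtype=np.float64)
    # Step 1: keep the area roughly fixed and vary the aspect ratio.
    ws = np.round(np.sqrt(w * h / ratios))
    hs = np.round(ws * ratios)
    ratio_anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    # Step 2: enlarge every ratio anchor by each scale.
    out = []
    for anchor in ratio_anchors:
        w, h, x_ctr, y_ctr = _whctrs(anchor)
        out.append(_mkanchors(w * scales, h * scales, x_ctr, y_ctr))
    return np.vstack(out)  # shape (9, 4)

def shifted_anchors(height, width, feat_stride=16):
    # Step 3: copy the 9 base anchors to every feature map location.
    sx, sy = np.meshgrid(np.arange(width) * feat_stride,
                         np.arange(height) * feat_stride)
    shifts = np.stack([sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()], axis=1)
    anchors = base_anchors()
    return (anchors[None, :, :] + shifts[:, None, :]).reshape(-1, 4)
```

With these defaults the first base anchor is (-84, -40, 99, 55), and a 2x3 feature map yields 2 * 3 * 9 = 54 anchors.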

Generating proposals.

Next is the main function of the RPN, which produces the object proposals:

  def _region_proposal(self, net_conv, is_training, initializer):
    # a 3x3 convolution first, then the network splits into two branches
    rpn = slim.conv2d(net_conv, cfg.RPN_CHANNELS, [3, 3], trainable=is_training, weights_initializer=initializer,
                        scope="rpn_conv/3x3")
    self._act_summaries.append(rpn)
    # first branch: predicts whether each anchor is background or an object
    rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training,
                                weights_initializer=initializer,
                                padding='VALID', activation_fn=None, scope='rpn_cls_score')
    # change it so that the score has 2 as its channel size
    rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')
    rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")
    rpn_cls_pred = tf.argmax(tf.reshape(rpn_cls_score_reshape, [-1, 2]), axis=1, name="rpn_cls_pred")
    rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")
    # second branch: predicts the box coordinates
    rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training,
                                weights_initializer=initializer,
                                padding='VALID', activation_fn=None, scope='rpn_bbox_pred')
    if is_training:
      # filter the proposals by predicted score and box coordinates,
      # then run NMS on the top N
      rois, roi_scores = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
      rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor")
      # Try to have a deterministic order for the computing graph, for reproducibility
      with tf.control_dependencies([rpn_labels]):
        rois, _ = self._proposal_target_layer(rois, roi_scores, "rpn_rois")
    else:
      if cfg.TEST.MODE == 'nms':
        rois, _ = self._proposal_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
      elif cfg.TEST.MODE == 'top':
        rois, _ = self._proposal_top_layer(rpn_cls_prob, rpn_bbox_pred, "rois")
      else:
        raise NotImplementedError

    self._predictions["rpn_cls_score"] = rpn_cls_score
    self._predictions["rpn_cls_score_reshape"] = rpn_cls_score_reshape
    self._predictions["rpn_cls_prob"] = rpn_cls_prob
    self._predictions["rpn_cls_pred"] = rpn_cls_pred
    self._predictions["rpn_bbox_pred"] = rpn_bbox_pred
    self._predictions["rois"] = rois

    return rois
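The reshape, softmax, reshape sequence on rpn_cls_score is where the tensor layout gets confusing, so here is a small NumPy sketch of how I understand it. It is an illustration under assumptions, not the repository's code: `reshape_layer`, `softmax_last_axis` and `rpn_probabilities` are names made up for this note, the batch size is assumed to be 1, and the layout is NHWC with the 2A score channels ordered as A background scores followed by A foreground scores.

```python
import numpy as np

def reshape_layer(bottom, num_dim):
    # Regroup the channels of an NHWC tensor (batch size 1) so that the
    # last axis has size num_dim; the extra channels are folded into height.
    width = bottom.shape[2]
    to_nchw = bottom.transpose(0, 3, 1, 2)             # (1, C, H, W)
    reshaped = to_nchw.reshape(1, num_dim, -1, width)  # (1, num_dim, C*H/num_dim, W)
    return reshaped.transpose(0, 2, 3, 1)              # back to channels-last

def softmax_last_axis(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def rpn_probabilities(rpn_cls_score, num_anchors):
    # (1, H, W, 2A) -> (1, A*H, W, 2): one (background, foreground) pair per anchor.
    score_2 = reshape_layer(rpn_cls_score, 2)
    prob_2 = softmax_last_axis(score_2)
    # (1, A*H, W, 2) -> (1, H, W, 2A): restore the original layout.
    return reshape_layer(prob_2, num_anchors * 2)

scores = np.random.default_rng(0).normal(size=(1, 4, 5, 18))  # H=4, W=5, A=9
prob = rpn_probabilities(scores, num_anchors=9)
```

In this layout channel a and channel a + A form the pair for anchor a, so `prob[..., :9] + prob[..., 9:]` is 1 everywhere.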

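Once the coordinates of the object proposals are available, the proposals differ in size, so RoI pooling is needed to bring the outputs to a uniform dimension. The code is as follows: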

  # bottom: output feature map of the convolutional layers
  # feat_stride: stride factor, used to recover the w and h of the original image
  # rois: coordinates of the selected RoIs (256 in the author's description)
  def _crop_pool_layer(self, bottom, rois, name):
    with tf.variable_scope(name) as scope:
      batch_ids = tf.squeeze(tf.slice(rois, [0, 0], [-1, 1], name="batch_id"), [1])
      # Get the normalized coordinates of bounding boxes
      bottom_shape = tf.shape(bottom)
      height = (tf.to_float(bottom_shape[1]) - 1.) * np.float32(self._feat_stride[0])
      width = (tf.to_float(bottom_shape[2]) - 1.) * np.float32(self._feat_stride[0])
      # divide by the image size to get relative positions
      x1 = tf.slice(rois, [0, 1], [-1, 1], name="x1") / width
      y1 = tf.slice(rois, [0, 2], [-1, 1], name="y1") / height
      x2 = tf.slice(rois, [0, 3], [-1, 1], name="x2") / width
      y2 = tf.slice(rois, [0, 4], [-1, 1], name="y2") / height
      # Won't be back-propagated to rois anyway, but to save time
      bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], axis=1))
      pre_pool_size = cfg.POOLING_SIZE * 2
      # TensorFlow's built-in op; it plays a role similar to RoI pooling
      crops = tf.image.crop_and_resize(bottom, bboxes, tf.to_int32(batch_ids), [pre_pool_size, pre_pool_size], name="crops")

    return slim.max_pool2d(crops, [2, 2], padding='SAME')
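The coordinate normalization in the first half of this function can be checked with a few lines of NumPy. This is only a sketch of the arithmetic: `normalize_rois` is a hypothetical helper name, and `feat_stride=16` and the 24x39 feature map are assumed values chosen for the example.

```python
import numpy as np

def normalize_rois(rois, feat_h, feat_w, feat_stride=16):
    # rois: (N, 5) array of [batch_id, x1, y1, x2, y2] in image pixels.
    rois = np.asarray(rois, dtype=np.float32)
    batch_ids = rois[:, 0].astype(np.int32)
    # Approximate image extent covered by the feature map.
    height = (feat_h - 1.0) * feat_stride
    width = (feat_w - 1.0) * feat_stride
    # crop_and_resize expects normalized boxes ordered as [y1, x1, y2, x2].
    boxes = np.stack([rois[:, 2] / height, rois[:, 1] / width,
                      rois[:, 4] / height, rois[:, 3] / width], axis=1)
    return batch_ids, boxes

# A 24x39 feature map covers roughly 368x608 pixels at stride 16.
ids, boxes = normalize_rois([[0, 0, 0, 608, 368],
                             [0, 16, 32, 304, 184]], feat_h=24, feat_w=39)
```

The first RoI spans the whole map and becomes (0, 0, 1, 1); the second one ends at the center, so its y2 and x2 are both 0.5. Dividing by (size - 1) * stride matches the way tf.image.crop_and_resize maps a normalized coordinate y to y * (height - 1) on the feature map.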

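After that comes the Fast-RCNN part of the network, which takes the generated proposals, classifies them, and regresses their bounding boxes. The network code: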

  def _region_classification(self, fc7, is_training, initializer, initializer_bbox):
    # classification branch: one score per class for every RoI
    cls_score = slim.fully_connected(fc7, self._num_classes,
                                       weights_initializer=initializer,
                                       trainable=is_training,
                                       activation_fn=None, scope='cls_score')
    cls_prob = self._softmax_layer(cls_score, "cls_prob")
    cls_pred = tf.argmax(cls_score, axis=1, name="cls_pred")
    # regression branch: 4 box offsets per class for every RoI
    bbox_pred = slim.fully_connected(fc7, self._num_classes * 4,
                                     weights_initializer=initializer_bbox,
                                     trainable=is_training,
                                     activation_fn=None, scope='bbox_pred')

    self._predictions["cls_score"] = cls_score
    self._predictions["cls_pred"] = cls_pred
    self._predictions["cls_prob"] = cls_prob
    self._predictions["bbox_pred"] = bbox_pred

    return cls_prob, bbox_pred
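To make the two outputs concrete, the sketch below shows one way the (N, num_classes) scores and the (N, num_classes * 4) offsets could be turned into a single box per RoI. Everything here is an assumption for illustration: `decode_detections` is a made-up name, the center/size parameterization is the simplified textbook form, and the repository's own helper may differ in pixel conventions and in how it normalizes the regression targets.

```python
import numpy as np

def decode_detections(rois, cls_score, bbox_pred):
    # rois: (N, 4) boxes as (x1, y1, x2, y2)
    # cls_score: (N, C) raw class scores
    # bbox_pred: (N, 4*C) offsets (dx, dy, dw, dh), one group of 4 per class
    rois = np.asarray(rois, dtype=np.float64)
    cls_score = np.asarray(cls_score, dtype=np.float64)
    bbox_pred = np.asarray(bbox_pred, dtype=np.float64)
    n = rois.shape[0]

    # Softmax over classes, then keep the best class of every RoI.
    e = np.exp(cls_score - cls_score.max(axis=1, keepdims=True))
    prob = e / e.sum(axis=1, keepdims=True)
    cls = prob.argmax(axis=1)

    # Pick the 4 offsets that belong to the chosen class.
    deltas = bbox_pred.reshape(n, -1, 4)[np.arange(n), cls]

    # Apply the offsets in center/size form (simplified convention).
    w = rois[:, 2] - rois[:, 0]
    h = rois[:, 3] - rois[:, 1]
    cx = rois[:, 0] + 0.5 * w + deltas[:, 0] * w
    cy = rois[:, 1] + 0.5 * h + deltas[:, 1] * h
    pw = np.exp(deltas[:, 2]) * w
    ph = np.exp(deltas[:, 3]) * h
    boxes = np.stack([cx - 0.5 * pw, cy - 0.5 * ph,
                      cx + 0.5 * pw, cy + 0.5 * ph], axis=1)
    return cls, prob[np.arange(n), cls], boxes

# One RoI, three classes; class 1 wins and shifts the box right by 0.1 * width.
deltas_in = np.zeros((1, 12))
deltas_in[0, 4:8] = [0.1, 0.0, 0.0, 0.0]
cls, score, boxes = decode_detections([[10, 20, 50, 100]],
                                      [[0.1, 2.0, 0.3]], deltas_in)
```

In this example the RoI is 40 pixels wide, so dx = 0.1 moves the box by 4 pixels and the result is (14, 20, 54, 100).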

Finally, the loss function. It consists of the RPN loss and the RCNN loss, and the whole network is trained jointly.

  def _add_losses(self, sigma_rpn=3.0):
    with tf.variable_scope('LOSS_' + self._tag) as scope:
      # RPN, class loss
      rpn_cls_score = tf.reshape(self._predictions['rpn_cls_score_reshape'], [-1, 2])
      rpn_label = tf.reshape(self._anchor_targets['rpn_labels'], [-1])
      # keep only the anchors whose label is not -1
      rpn_select = tf.where(tf.not_equal(rpn_label, -1))
      rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score, rpn_select), [-1, 2])
      rpn_label = tf.reshape(tf.gather(rpn_label, rpn_select), [-1])
      rpn_cross_entropy = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label))

      # RPN, bbox loss
      rpn_bbox_pred = self._predictions['rpn_bbox_pred']
      rpn_bbox_targets = self._anchor_targets['rpn_bbox_targets']
      rpn_bbox_inside_weights = self._anchor_targets['rpn_bbox_inside_weights']
      rpn_bbox_outside_weights = self._anchor_targets['rpn_bbox_outside_weights']
      rpn_loss_box = self._smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights,
                                          rpn_bbox_outside_weights, sigma=sigma_rpn, dim=[1, 2, 3])

      # RCNN, class loss
      cls_score = self._predictions["cls_score"]
      label = tf.reshape(self._proposal_targets["labels"], [-1])
      cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=cls_score, labels=label))

      # RCNN, bbox loss
      bbox_pred = self._predictions['bbox_pred']
      bbox_targets = self._proposal_targets['bbox_targets']
      bbox_inside_weights = self._proposal_targets['bbox_inside_weights']
      bbox_outside_weights = self._proposal_targets['bbox_outside_weights']
      loss_box = self._smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights)

      self._losses['cross_entropy'] = cross_entropy
      self._losses['loss_box'] = loss_box
      self._losses['rpn_cross_entropy'] = rpn_cross_entropy
      self._losses['rpn_loss_box'] = rpn_loss_box

      # joint loss: the four terms are simply summed
      loss = cross_entropy + loss_box + rpn_cross_entropy + rpn_loss_box
      regularization_loss = tf.add_n(tf.losses.get_regularization_losses(), 'regu')
      self._losses['total_loss'] = loss + regularization_loss

      self._event_summaries.update(self._losses)

    return loss
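Both box losses go through _smooth_l1_loss, whose body is not shown above. As a reference point, here is a NumPy sketch of the usual smooth L1 loss with a sigma parameter and inside/outside weights. It is my own reconstruction under assumptions (the name `smooth_l1_loss`, the defaults `sigma=1.0` and `axis=1`, and the final mean over the batch axis), not the repository's implementation.

```python
import numpy as np

def smooth_l1_loss(pred, target, inside_w, outside_w, sigma=1.0, axis=1):
    # Inside weights select which coordinates contribute to the loss.
    sigma_2 = sigma ** 2
    diff = inside_w * (pred - target)
    abs_diff = np.abs(diff)
    # Quadratic near zero, linear beyond the threshold 1 / sigma^2.
    per_elem = np.where(abs_diff < 1.0 / sigma_2,
                        0.5 * sigma_2 * diff ** 2,
                        abs_diff - 0.5 / sigma_2)
    # Outside weights rescale each term; sum per sample, then average.
    return np.mean(np.sum(outside_w * per_elem, axis=axis))

pred = np.array([[0.5, 2.0, 0.0, 0.0]])
target = np.zeros((1, 4))
ones = np.ones((1, 4))
loss = smooth_l1_loss(pred, target, ones, ones)
```

With sigma = 1 the error 0.5 falls in the quadratic region (0.5 * 0.25 = 0.125) and the error 2.0 in the linear region (2.0 - 0.5 = 1.5), so the loss is 1.625. A larger sigma, such as the sigma_rpn=3.0 passed for the RPN, shrinks the quadratic region to |x| < 1/9.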

Original article: https://www.cnblogs.com/the-home-of-123/p/9747963.html