    Tutorial:  Triplet Loss Layer Design for CNN

    Xiao Wang  2016.05.02

     

  The triplet loss layer can be a useful trick for further improving the accuracy of a CNN. In this post I will walk through the whole process and show the code. This tutorial is mainly based on the following blog posts:

      http://blog.csdn.net/tangwei2014/article/details/46812153 

      http://blog.csdn.net/tangwei2014/article/details/46788025

  and the paper "FaceNet: A Unified Embedding for Face Recognition and Clustering".

     

  First, let's talk about how to add the layer to Caffe and test it to check whether it works. Then we will discuss the paper and explain where the triplet loss comes from. In the current version of the Caffe framework, adding a new layer mainly consists of the following steps:

  Step 1. Add the parameter message for the new layer in ./src/caffe/proto/caffe.proto;

  Step 2. Add the declaration of the layer in ./include/caffe/***layers.hpp;

  Step 3. Add the corresponding .cpp and .cu files in ./src/caffe/layers/ that implement the functionality of the new layer;

  Step 4. Add test code for the new layer in ./src/caffe/gtest/ to test its forward and backward propagation and its computation speed.

     

      Let's do it step by step. 

  First, we add the triplet loss layer's parameters to the caffe.proto file.

  In caffe.proto you can find comments like the one at line 101, "SolverParameter next available ID: 40 (last added: momentum2)"; the file tracks the next available field ID for each parameter message this way, and we follow the same convention when adding our new message. The parameter message for the layer is:

            message RankParameter {
              optional uint32 neg_num = 1 [default = 1];
              optional uint32 pair_size = 2 [default = 1];
              optional float hard_ratio = 3;
              optional float rand_ratio = 4;
              optional float margin = 5 [default = 0.5];
            }
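
      The new message also needs to be referenced from the LayerParameter message so that a prototxt can set rank_param. A minimal sketch is shown below; the field ID 150 is only a placeholder (an assumption for this example), and you should instead use the next available layer-specific ID recorded in the comment above message LayerParameter in your copy of caffe.proto, updating that comment accordingly:

            // inside message LayerParameter in ./src/caffe/proto/caffe.proto
            // (150 is a placeholder; use the next available layer-specific ID)
            optional RankParameter rank_param = 150;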

     

        

        

  Second, we add the declaration of the triplet loss layer in ./include/caffe/TripletLoss_layers.hpp; a sketch of the header is given below.
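
      Since the header contents are not reproduced in this copy of the post, the following is a minimal sketch of what the declaration might look like, reconstructed from the .cpp implementation shown in the next step (the class name, member blobs, and method signatures follow that implementation; the exact includes, include guard, and header layout in the original repository may differ):

    #ifndef CAFFE_RANK_HARD_LOSS_LAYER_HPP_
    #define CAFFE_RANK_HARD_LOSS_LAYER_HPP_

    #include <vector>

    #include "caffe/blob.hpp"
    #include "caffe/layer.hpp"
    #include "caffe/proto/caffe.pb.h"
    #include "caffe/loss_layers.hpp"

    namespace caffe {

    // Triplet-style ranking loss with online hard/random negative mining.
    template <typename Dtype>
    class RankHardLossLayer : public LossLayer<Dtype> {
     public:
      explicit RankHardLossLayer(const LayerParameter& param)
          : LossLayer<Dtype>(param) {}
      virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
          const vector<Blob<Dtype>*>& top);
      virtual inline const char* type() const { return "RankHardLoss"; }

     protected:
      virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
          const vector<Blob<Dtype>*>& top);
      virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
          const vector<Blob<Dtype>*>& top);
      virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
          const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
      virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
          const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

      // Selects the negatives used for each anchor and stores them in mask_.
      void set_mask(const vector<Blob<Dtype>*>& bottom);

      Blob<Dtype> diff_;   // scratch blob, same shape as the input features
      Blob<Dtype> dis_;    // num x num pairwise (negative dot-product) distances
      Blob<Dtype> mask_;   // num x num mask of selected negatives
    };

    }  // namespace caffe

    #endif  // CAFFE_RANK_HARD_LOSS_LAYER_HPP_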

      

     

  Third, we implement the triplet loss layer in the .cpp and .cu files.

  First, the .cpp file:

#include <vector>

#include <algorithm>
#include <cmath>
#include <cfloat>

#include "caffe/layer.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/math_functions.hpp"
#include "caffe/vision_layers.hpp"

using std::max;
using namespace std;
using namespace cv;

namespace caffe {

// Random functor for std::random_shuffle, backed by Caffe's RNG.
int myrandom (int i) { return caffe_rng_rand()%i;}


template <typename Dtype>
void RankHardLossLayer<Dtype>::Reshape(
  const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  LossLayer<Dtype>::Reshape(bottom, top);

  diff_.ReshapeLike(*bottom[0]);
  dis_.Reshape(bottom[0]->num(), bottom[0]->num(), 1, 1);   // pairwise distances
  mask_.Reshape(bottom[0]->num(), bottom[0]->num(), 1, 1);  // selected negatives
}


// Computes the pairwise "distances" (negative inner products) and fills mask_
// with the negatives selected for each anchor: if only a few negatives violate
// the margin they are all kept, otherwise a mixture of the hardest ones and
// randomly chosen ones is used.
template <typename Dtype>
void RankHardLossLayer<Dtype>::set_mask(const vector<Blob<Dtype>*>& bottom)
{

    RankParameter rank_param = this->layer_param_.rank_param();
    int neg_num = rank_param.neg_num();
    int pair_size = rank_param.pair_size();
    float hard_ratio = rank_param.hard_ratio();
    float rand_ratio = rank_param.rand_ratio();
    float margin = rank_param.margin();

    int hard_num = neg_num * hard_ratio;
    int rand_num = neg_num * rand_ratio;

    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* label = bottom[1]->cpu_data();
    int count = bottom[0]->count();
    int num = bottom[0]->num();
    int dim = bottom[0]->count() / bottom[0]->num();
    Dtype* dis_data = dis_.mutable_cpu_data();
    Dtype* mask_data = mask_.mutable_cpu_data();

    for(int i = 0; i < num * num; i ++)
    {
        dis_data[i] = 0;
        mask_data[i] = 0;
    }

    // calculate distance: the negative inner product, so a smaller value
    // means the two features are more similar
    for(int i = 0; i < num; i ++)
    {
        for(int j = i + 1; j < num; j ++)
        {
            const Dtype* fea1 = bottom_data + i * dim;
            const Dtype* fea2 = bottom_data + j * dim;
            Dtype ts = 0;
            for(int k = 0; k < dim; k ++)
            {
              ts += (fea1[k] * fea2[k]) ;
            }
            dis_data[i * num + j] = -ts;
            dis_data[j * num + i] = -ts;
        }
    }

    // select samples: the batch is assumed to be laid out in consecutive
    // (anchor, positive) pairs, so i is an anchor and i + 1 its positive

    vector<pair<float, int> >negpairs;
    vector<int> sid1;
    vector<int> sid2;


    for(int i = 0; i < num; i += pair_size)
    {
        negpairs.clear();
        sid1.clear();
        sid2.clear();
        for(int j = 0; j < num; j ++)
        {
            if(label[j] == label[i])
                continue;
            // keep only negatives that currently violate the margin
            Dtype tloss = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[i * num + j] + Dtype(margin));
            if(tloss == 0) continue;

            negpairs.push_back(make_pair(dis_data[i * num + j], j));
        }
        if(negpairs.size() <= neg_num)
        {
            for(int j = 0; j < negpairs.size(); j ++)
            {
                int id = negpairs[j].second;
                mask_data[i * num + id] = 1;
            }
            continue;
        }
        sort(negpairs.begin(), negpairs.end());

        // sid1: the neg_num hardest negatives; sid2: the remaining ones
        for(int j = 0; j < neg_num; j ++)
        {
            sid1.push_back(negpairs[j].second);
        }
        for(int j = neg_num; j < negpairs.size(); j ++)
        {
            sid2.push_back(negpairs[j].second);
        }
        // take hard_num negatives from the hard set ...
        std::random_shuffle(sid1.begin(), sid1.end(), myrandom);
        for(int j = 0; j < min(hard_num, (int)(sid1.size()) ); j ++)
        {
            mask_data[i * num + sid1[j]] = 1;
        }
        for(int j = hard_num; j < sid1.size(); j++)
        {
            sid2.push_back(sid1[j]);
        }
        // ... and rand_num negatives at random from the rest
        std::random_shuffle(sid2.begin(), sid2.end(), myrandom);
        for(int j = 0; j < min( rand_num, (int)(sid2.size()) ); j ++)
        {
            mask_data[i * num + sid2[j]] = 1;
        }

    }


}




template <typename Dtype>
void RankHardLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {

    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* label = bottom[1]->cpu_data();
    int count = bottom[0]->count();
    int num = bottom[0]->num();
    int dim = bottom[0]->count() / bottom[0]->num();


    RankParameter rank_param = this->layer_param_.rank_param();
    int neg_num = rank_param.neg_num();      // 4
    int pair_size = rank_param.pair_size();  // 5
    float hard_ratio = rank_param.hard_ratio();
    float rand_ratio = rank_param.rand_ratio();
    float margin = rank_param.margin();
    Dtype* dis_data = dis_.mutable_cpu_data();
    Dtype* mask_data = mask_.mutable_cpu_data();

    set_mask(bottom);
    Dtype loss = 0;
    int cnt = neg_num * num / pair_size * 2;

    // hinge loss over the selected triplets: tloss1 treats sample i as the
    // anchor, tloss2 treats its positive (i + 1) as the anchor
    for(int i = 0; i < num; i += pair_size)
    {
        for(int j = 0; j < num; j++)
        {
            if(mask_data[i * num + j] == 0)
                continue;
            Dtype tloss1 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[i * num + j] + Dtype(margin));
            Dtype tloss2 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[(i + 1) * num + j] + Dtype(margin));
            loss += tloss1 + tloss2;
        }
    }

    loss = loss / cnt;
    top[0]->mutable_cpu_data()[0] = loss;
}




template <typename Dtype>
void RankHardLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {


    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* label = bottom[1]->cpu_data();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    int count = bottom[0]->count();
    int num = bottom[0]->num();
    int dim = bottom[0]->count() / bottom[0]->num();

    RankParameter rank_param = this->layer_param_.rank_param();
    int neg_num = rank_param.neg_num();
    int pair_size = rank_param.pair_size();
    float hard_ratio = rank_param.hard_ratio();
    float rand_ratio = rank_param.rand_ratio();
    float margin = rank_param.margin();

    Dtype* dis_data = dis_.mutable_cpu_data();
    Dtype* mask_data = mask_.mutable_cpu_data();

    for(int i = 0; i < count; i ++ )
        bottom_diff[i] = 0;

    int cnt = neg_num * num / pair_size * 2;

    for(int i = 0; i < num; i += pair_size)
    {
        const Dtype* fori = bottom_data + i * dim;        // anchor feature
        const Dtype* fpos = bottom_data + (i + 1) * dim;  // positive feature

        Dtype* fori_diff = bottom_diff + i * dim;
        Dtype* fpos_diff = bottom_diff + (i + 1) * dim;
        for(int j = 0; j < num; j ++)
        {
            if(mask_data[i * num + j] == 0) continue;
            Dtype tloss1 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[i * num + j] + Dtype(margin));
            Dtype tloss2 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[(i + 1) * num + j] + Dtype(margin));

            const Dtype* fneg = bottom_data + j * dim;    // negative feature
            Dtype* fneg_diff  = bottom_diff + j * dim;
            // gradients of the two hinge terms w.r.t. anchor, positive and negative
            if(tloss1 > 0)
            {
                for(int k = 0; k < dim; k ++)
                {
                    fori_diff[k] += (fneg[k] - fpos[k]); // / (pairNum * 1.0 - 2.0);
                    fpos_diff[k] += -fori[k]; // / (pairNum * 1.0 - 2.0);
                    fneg_diff[k] +=  fori[k];
                }
            }
            if(tloss2 > 0)
            {
                for(int k = 0; k < dim; k ++)
                {
                    fori_diff[k] += -fpos[k]; // / (pairNum * 1.0 - 2.0);
                    fpos_diff[k] += fneg[k]-fori[k]; // / (pairNum * 1.0 - 2.0);
                    fneg_diff[k] += fpos[k];
                }
            }

        }
    }

    for (int i = 0; i < count; i ++)
    {
        bottom_diff[i] = bottom_diff[i] / cnt;
    }

}

#ifdef CPU_ONLY
STUB_GPU(RankHardLossLayer);
#endif

INSTANTIATE_CLASS(RankHardLossLayer);
REGISTER_LAYER_CLASS(RankHardLoss);

}  // namespace caffe
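
      To summarize what the code above computes (this is a restatement of the implementation, not a formula from the paper): the "distance" stored in dis_ is the negative inner product of two feature vectors, and the batch is assumed to be laid out as consecutive (anchor, positive) pairs. For an anchor a at index i, its positive p at index i + 1, a selected negative n, and margin m, each selected negative contributes

          max(0, -a·p + a·n + m) + max(0, -a·p + p·n + m)

      to the loss, and the total is normalized by cnt = 2 * neg_num * num / pair_size.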

  And the .cu file; the GPU functions simply fall back to the CPU implementation:

#include <vector>

#include "caffe/layer.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/math_functions.hpp"
#include "caffe/vision_layers.hpp"

namespace caffe {

template <typename Dtype>
void RankHardLossLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  Forward_cpu(bottom, top);
}

template <typename Dtype>
void RankHardLossLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
   Backward_cpu(top, propagate_down, bottom);
}

INSTANTIATE_LAYER_GPU_FUNCS(RankHardLossLayer);

}  // namespace caffe

      

  Finally, we rebuild Caffe (make all, then make test and make runtest) and check that everything compiles and the new layer's tests pass.

     

    --------------------------------------------------------------------------------

                       

      Let's continue to talk about the triplet loss:

  A triplet has three components: an anchor, a positive sample (with the same identity as the anchor), and a negative sample (with a different identity). What we want to do is reduce the distance between the anchor and the positive while pushing the negative away from the anchor.

  Thus, the whole loss can be described as follows:
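
      In the notation of the FaceNet paper, with an embedding f(x) constrained to the unit hypersphere, every triplet should satisfy

          || f(x_i^a) - f(x_i^p) ||_2^2 + alpha < || f(x_i^a) - f(x_i^n) ||_2^2,

      and the loss that is minimized is

          L = sum_i [ || f(x_i^a) - f(x_i^p) ||_2^2 - || f(x_i^a) - f(x_i^n) ||_2^2 + alpha ]_+ ,

      where x_i^a, x_i^p, x_i^n are the anchor, positive, and negative of the i-th triplet and alpha is the margin.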

  Selecting triplets purely at random may lead to slow convergence of the network, so we need to find hard triplets, ones that are active and can therefore contribute to improving the model. The following section explains the approach.

      Triplet Selection:

  There are two approaches for generating triplets:

  1. Generate triplets offline every n steps, using the most recent network checkpoint and computing the argmin and argmax on a subset of the data.

  2. Generate the triplets online, by selecting the hard positive/negative exemplars from within a mini-batch.

     

  The paper uses all anchor-positive pairs in a mini-batch while still selecting the hard negatives; the all anchor-positive method was more stable and converged slightly faster at the beginning of training. A small sketch of this idea is given below.
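
      As an illustration only (this is neither the code from the FaceNet paper nor the Caffe layer above), here is a minimal PyTorch sketch of the strategy: use every anchor-positive pair in the mini-batch and, for each anchor, take the hardest in-batch negative. The function name and the margin value are assumptions made for the example:

    import torch
    import torch.nn.functional as F

    def batch_hard_negative_triplet_loss(embeddings, labels, margin=0.2):
        """embeddings: (N, D) feature matrix; labels: (N,) integer identity ids."""
        # pairwise Euclidean distances between all embeddings in the mini-batch
        dist = torch.cdist(embeddings, embeddings, p=2)
        same = labels.unsqueeze(0) == labels.unsqueeze(1)    # (N, N) same-identity mask
        eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

        pos_mask = same & ~eye                               # every anchor-positive pair
        neg_dist = dist.masked_fill(same, float('inf'))      # hide same-identity entries
        hardest_neg = neg_dist.min(dim=1).values             # hardest negative per anchor

        anchor_idx, pos_idx = pos_mask.nonzero(as_tuple=True)
        losses = F.relu(dist[anchor_idx, pos_idx] - hardest_neg[anchor_idx] + margin)
        return losses.mean() if losses.numel() > 0 else embeddings.new_zeros(())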

     

  The Caffe code is available on the GitHub page: https://github.com/wangxiao5791509/caffe-video_triplet. A typical use of the layer in a network prototxt looks like this:

      

    layer {     
        name: "loss"    
        type: "RankHardLoss"    
        rank_param{     
            neg_num: 4  
            pair_size: 2    
            hard_ratio: 0.5     
            rand_ratio: 0.5     
            margin: 1   
        }   
        bottom: "norml2"    
        bottom: "label"     
    }
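
      With pair_size: 2 the input batch is expected to be arranged as consecutive (anchor, positive) pairs; for each pair, neg_num: 4 negatives are selected, half of them hard (hard_ratio: 0.5) and half random (rand_ratio: 0.5), and violating triplets are penalized with a margin of 1.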

     

     


     

    Triplet Loss Implementation using PyTorch:

  The following documentation comes from: https://pytorch.org/docs/stable/nn.html#tripletmarginloss

  Creates a criterion that measures the triplet loss given input tensors x1, x2, x3 and a margin with a value greater than 0. This is used for measuring a relative similarity between samples. A triplet is composed of a, p, and n: anchor, positive example, and negative example, respectively. The shapes of all input tensors should be (N, D).

     

      The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al.

     

      The loss function for each sample in the mini-batch is:

     

        L(a, p, n) = max{ d(a_i, p_i) − d(a_i, n_i) + margin, 0 }

     

      where d(x_i, y_i) = ‖x_i − y_i‖_p.

     

    Parameters:
    • margin (float, optional) – Default: 1.
    • p (int, optional) – The norm degree for pairwise distance. Default: 2.
    • swap (bool, optional) – The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al. Default: False.
    • size_average (bool, optional) – By default, the losses are averaged over observations for each minibatch. However, if the field size_average is set to False, the losses are instead summed for each minibatch. Default: True.
    • reduce (bool, optional) – By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True.

     

    Shape:
    • Input: (N, D) where D is the vector dimension.
    • Output: scalar. If reduce is False, then (N).
    >>> import torch
    >>> import torch.nn as nn
    >>> triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)
    >>> input1 = torch.randn(100, 128, requires_grad=True)
    >>> input2 = torch.randn(100, 128, requires_grad=True)
    >>> input3 = torch.randn(100, 128, requires_grad=True)
    >>> output = triplet_loss(input1, input2, input3)
    >>> output.backward()
    Example:
    import torch.nn as nn

    triplet_loss = nn.TripletMarginLoss(margin=1.2, p=2)
    # compute the feature embeddings (model, data and optimizer are defined elsewhere)
    anchor = model(data[0])
    positive = model(data[1])
    negative = model(data[2])
    # compute the triplet loss and update the network
    optimizer.zero_grad()
    loss = triplet_loss(anchor, positive, negative)
    loss.backward()
    optimizer.step()
       