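Introduction to the LossLayer class
The LossLayer class is the base class of Caffe's loss layers. It does not compute any particular loss itself; it only defines properties shared by all loss layers, for example that the loss weight of the output blob defaults to 1, and that the prediction and label blobs must agree in their first dimension.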
loss_layer.cpp source code
template <typename Dtype>
void LossLayer<Dtype>::LayerSetUp(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
// LossLayers have a non-zero (1) loss by default.
if (this->layer_param_.loss_weight_size() == 0) { // a loss layer's weight defaults to 1
this->layer_param_.add_loss_weight(Dtype(1)); // not set in the layer param, so set it to 1
}
}
template <typename Dtype>
void LossLayer<Dtype>::Reshape(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
CHECK_EQ(bottom[0]->shape(0), bottom[1]->shape(0))
<< "The data and label should have the same first dimension."; // dimension 0 must match
vector<int> loss_shape(0); // Loss layers output a scalar; 0 axes.
top[0]->Reshape(loss_shape); // reshape the top blob to a scalar: 0 axes, count 1
}
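To make the two base-class rules concrete, here is a small standalone sketch. The helpers shape_count, default_loss_weight and first_dims_match are hypothetical names written for this note, not Caffe API; shape_count only imitates the way a Blob's count is the product of its dimensions, which is why the 0-axis loss_shape above produces a scalar holding exactly one value.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: element count as the product of all dimensions, like a Blob's count.
// For an empty shape (0 axes) this is the empty product, 1, i.e. a scalar.
int shape_count(const std::vector<int>& shape) {
  return std::accumulate(shape.begin(), shape.end(), 1, std::multiplies<int>());
}

// Hypothetical stand-in for LossLayer::LayerSetUp: if no loss_weight was configured,
// add a single weight of 1 so the loss contributes to the network objective.
void default_loss_weight(std::vector<float>& loss_weights) {
  if (loss_weights.empty()) {
    loss_weights.push_back(1.0f);
  }
}

// Hypothetical stand-in for the check in LossLayer::Reshape:
// predictions and labels must agree on dimension 0 (the batch size).
bool first_dims_match(const std::vector<int>& pred_shape,
                      const std::vector<int>& label_shape) {
  return !pred_shape.empty() && !label_shape.empty() &&
         pred_shape[0] == label_shape[0];
}
```

An explicitly configured weight (for example 0.5) is left untouched; only a missing weight is filled in with 1.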
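Introduction to the MultinomialLogisticLossLayer class
The MultinomialLogisticLossLayer class computes the logistic loss for single-label multi-class classification: each sample has exactly one label, and that label can take one of several classes.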
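- The first input blob holds the network's predicted probabilities, of size \(N \times C \times H \times W\), with values \(\hat{p}_{n,k} \in [0, 1]\): \(\hat{p}_{n,k}\) is the predicted probability that sample \(n\) belongs to class \(k\), and \(\forall n, \sum\limits_{k=1}^{K} \hat{p}_{n,k} = 1\).
- Here \(N\) is the number of samples and \(K = C \times H \times W\) is the total number of classes.
- The second input blob holds the labels, of size \(N \times 1 \times 1 \times 1\); each \(l_n\) is an integer in \(\{0, 1, 2, \dots, K - 1\}\) and is the true class of sample \(n\).
- In the forward pass, the loss is \(E = -\frac{1}{N}\sum\limits_{n=1}^{N}\sum\limits_{k=1}^{K} y_{n,k}\log\hat{p}_{n,k} = -\frac{1}{N}\sum\limits_{n=1}^{N}\log\hat{p}_{n,l_n}\)
- \(y_{n,k}\) is the true probability that sample \(n\) belongs to class \(k\): \(y_{n,k} = \begin{cases} 1, & k = l_n \\ 0, & k \neq l_n \end{cases}\)
- In the backward pass, the gradient w.r.t. the prediction blob is \(\frac{\partial J}{\partial \hat{p}_{n,l_n}} = \frac{\partial J}{\partial E} \cdot \frac{\partial E}{\partial \hat{p}_{n,l_n}} = -\frac{1}{N} \cdot \frac{\partial J}{\partial E} \cdot \frac{1}{\hat{p}_{n,l_n}}\); for every other class \(k \neq l_n\) the gradient is 0.
- \(J\) is the loss of the whole network.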
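A small worked example of the two formulas, using made-up numbers chosen only for illustration and indexing classes from 0 as the code does: \(N = 2\), \(K = 3\), \(\hat{p}_{1,\cdot} = (0.7, 0.2, 0.1)\) with \(l_1 = 0\), \(\hat{p}_{2,\cdot} = (0.1, 0.6, 0.3)\) with \(l_2 = 2\), and \(\frac{\partial J}{\partial E} = 1\) (natural logarithms).

```latex
\begin{aligned}
E &= -\tfrac{1}{2}\left(\log \hat{p}_{1,0} + \log \hat{p}_{2,2}\right)
   = -\tfrac{1}{2}\left(\log 0.7 + \log 0.3\right)
   \approx -\tfrac{1}{2}\left(-0.3567 - 1.2040\right) \approx 0.7803 \\
\frac{\partial J}{\partial \hat{p}_{1,0}} &= -\tfrac{1}{2} \cdot 1 \cdot \tfrac{1}{0.7} \approx -0.7143,
\qquad
\frac{\partial J}{\partial \hat{p}_{2,2}} = -\tfrac{1}{2} \cdot 1 \cdot \tfrac{1}{0.3} \approx -1.6667
\end{aligned}
```

All other entries of the gradient are 0, because only the probability of the true class appears in \(E\).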
multinomial_logistic_loss_layer.cpp source code
template <typename Dtype>
void MultinomialLogisticLossLayer<Dtype>::Reshape(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
LossLayer<Dtype>::Reshape(bottom, top); // base-class Reshape(): checks that dimension 0 of the inputs matches and makes the top blob a scalar
CHECK_EQ(bottom[1]->channels(), 1); // the label blob must have shape [N,1,1,1]
CHECK_EQ(bottom[1]->height(), 1);
CHECK_EQ(bottom[1]->width(), 1);
}
template <typename Dtype>
void MultinomialLogisticLossLayer<Dtype>::Forward_cpu(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) { // forward pass: compute the loss
const Dtype* bottom_data = bottom[0]->cpu_data(); // pointer to the prediction data
const Dtype* bottom_label = bottom[1]->cpu_data(); // pointer to the label data
int num = bottom[0]->num(); // number of samples N
int dim = bottom[0]->count() / bottom[0]->num(); // K = C*H*W, the number of classes
Dtype loss = 0;
for (int i = 0; i < num; ++i) {
int label = static_cast<int>(bottom_label[i]); // label of sample i: the sample belongs to class label
// bottom_data[i * dim + label] is sample i's predicted probability for class label; kLOG_THRESHOLD is a small floor that keeps |log(prob)| from becoming too large
Dtype prob = std::max(bottom_data[i * dim + label], Dtype(kLOG_THRESHOLD));
loss -= log(prob); // accumulate the loss
}
top[0]->mutable_cpu_data()[0] = loss / num; // output the average loss
}
template <typename Dtype>
void MultinomialLogisticLossLayer<Dtype>::Backward_cpu(
const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) {
if (propagate_down[1]) { // backpropagation to the label blob is not allowed: abort
LOG(FATAL) << this->type() << " Layer cannot backpropagate to label inputs.";
}
if (propagate_down[0]) { // the prediction blob needs a gradient
const Dtype* bottom_data = bottom[0]->cpu_data(); // pointer to the prediction data
const Dtype* bottom_label = bottom[1]->cpu_data(); // pointer to the label data
Dtype* bottom_diff = bottom[0]->mutable_cpu_diff(); // pointer to the prediction blob's gradient
int num = bottom[0]->num(); // number of samples N
int dim = bottom[0]->count() / bottom[0]->num(); // K = C*H*W, the number of classes
caffe_set(bottom[0]->count(), Dtype(0), bottom_diff); // first zero the prediction blob's gradient
const Dtype scale = - top[0]->cpu_diff()[0] / num; // coefficient -(1/N) * dJ/dE
for (int i = 0; i < num; ++i) {
int label = static_cast<int>(bottom_label[i]); // label of the sample: it belongs to class label
Dtype prob = std::max(bottom_data[i * dim + label], Dtype(kLOG_THRESHOLD)); // sample i's predicted probability for class label
bottom_diff[i * dim + label] = scale / prob; // gradient for this sample: -(1/N) * dJ/dE / prob
}
}
}
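For experimenting outside Caffe, the forward and backward passes above can be restated on plain row-major arrays. This is a sketch, not Caffe code: the function names are hypothetical, and kLogThreshold = 1e-20 is an assumed stand-in for Caffe's kLOG_THRESHOLD.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Assumed value standing in for Caffe's kLOG_THRESHOLD.
constexpr double kLogThreshold = 1e-20;

// Plain-array sketch of Forward_cpu.
// prob: N*K probabilities, one row of K per sample; labels: N integers in [0, K).
double multinomial_logistic_loss(const std::vector<double>& prob,
                                 const std::vector<int>& labels, int K) {
  const int N = static_cast<int>(labels.size());
  assert(static_cast<int>(prob.size()) == N * K);
  double loss = 0.0;
  for (int i = 0; i < N; ++i) {
    assert(labels[i] >= 0 && labels[i] < K);
    loss -= std::log(std::max(prob[i * K + labels[i]], kLogThreshold));
  }
  return loss / N;  // average over the batch
}

// Sketch of Backward_cpu; top_diff plays the role of top[0]->cpu_diff()[0] (dJ/dE).
std::vector<double> multinomial_logistic_grad(const std::vector<double>& prob,
                                              const std::vector<int>& labels, int K,
                                              double top_diff) {
  const int N = static_cast<int>(labels.size());
  std::vector<double> diff(prob.size(), 0.0);  // like caffe_set: start from zero
  const double scale = -top_diff / N;          // -(1/N) * dJ/dE
  for (int i = 0; i < N; ++i) {
    const double p = std::max(prob[i * K + labels[i]], kLogThreshold);
    diff[i * K + labels[i]] = scale / p;       // only the true class gets a gradient
  }
  return diff;
}
```

With prob = {0.7, 0.2, 0.1, 0.1, 0.6, 0.3}, labels = {0, 2}, K = 3 and top_diff = 1, the loss is about 0.7803, and the only nonzero gradients are -0.5/0.7 at index 0 and -0.5/0.3 at index 5.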
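Introduction to the SoftmaxWithLossLayer class
The SoftmaxWithLossLayer class also computes the loss for single-label multi-class classification. Mathematically it is equivalent to a SoftmaxLayer followed by a MultinomialLogisticLossLayer, but Caffe recommends SoftmaxWithLossLayer: computing both in a single layer loses less numerical precision than computing them in two separate layers, so the result, in particular the gradient, is more numerically stable.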
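- The first input blob holds the network's raw predictions, of size \(\tilde{N} \times C \times \tilde{H} \times \tilde{W}\), with values \(x_{n,k} \in (-\infty, +\infty)\). The loss uses their softmax as the probabilities: \(\hat{p}_{n,k} = \frac{e^{x_{n,k}}}{\sum\limits_{k'=1}^{K} e^{x_{n,k'}}}\).
- Below we assume the softmax is taken along axis 1 (the \(C\) axis). The size of \(C\) is then the number of classes \(K\), and the total number of samples is the outer count (outer_num_ in the code) times the inner count (inner_num_), i.e. \(N = \tilde{N} \times \tilde{H} \times \tilde{W}\).
- The second input blob holds the labels, of size \(N \times 1 \times 1 \times 1\); each \(l_n\) is an integer in \(\{0, 1, 2, \dots, K - 1\}\) and is the true class of sample \(n\).
- Caffe does not strictly require the label blob to have shape \(N \times 1 \times 1 \times 1\). It only requires that dimension 0 of the prediction and label blobs match (checked in LossLayer) and that the label blob contain exactly \(N\) values in total.
- In the forward pass, the loss is the same as in MultinomialLogisticLossLayer: \(E = -\frac{1}{N}\sum\limits_{n=1}^{N}\log\hat{p}_{n,l_n}\)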
- In the backward pass, the gradient w.r.t. the prediction blob is derived as follows.
- Derivative of the softmax output with respect to its input:
\[
\frac{\partial \hat{p}_{n,l_n}}{\partial x_{n,k}} = \left( \frac{e^{x_{n,l_n}}}{e^{x_{n,1}} + e^{x_{n,2}} + \dots + e^{x_{n,K}}} \right)'_{x_{n,k}}
= \begin{cases}
\dfrac{-e^{x_{n,l_n}} e^{x_{n,k}}}{\left(e^{x_{n,1}} + e^{x_{n,2}} + \dots + e^{x_{n,K}}\right)^2} + \dfrac{e^{x_{n,l_n}}}{e^{x_{n,1}} + e^{x_{n,2}} + \dots + e^{x_{n,K}}} = \hat{p}_{n,l_n} - \hat{p}_{n,l_n}\hat{p}_{n,l_n}, & k = l_n \\[2ex]
\dfrac{-e^{x_{n,l_n}} e^{x_{n,k}}}{\left(e^{x_{n,1}} + e^{x_{n,2}} + \dots + e^{x_{n,K}}\right)^2} = -\hat{p}_{n,l_n}\hat{p}_{n,k}, & k \neq l_n
\end{cases}
\]
- The loss written out term by term:
\[
E = -\frac{1}{N}\sum\limits_{n=1}^{N}\log\hat{p}_{n,l_n} = -\frac{1}{N}\left( \log\hat{p}_{1,l_1} + \log\hat{p}_{2,l_2} + \dots + \log\hat{p}_{N,l_N} \right)
\]
Note that \(\frac{\partial E}{\partial \hat{p}_{n,k'}}\) is nonzero only when \(k' = l_n\).
- Hence, by the chain rule:
\[
\frac{\partial E}{\partial x_{n,k}} = \sum\limits_{k'=1}^{K} \frac{\partial E}{\partial \hat{p}_{n,k'}} \frac{\partial \hat{p}_{n,k'}}{\partial x_{n,k}} = -\frac{1}{N} \cdot \frac{1}{\hat{p}_{n,l_n}} \cdot \frac{\partial \hat{p}_{n,l_n}}{\partial x_{n,k}}
= \begin{cases}
-\frac{1}{N} \cdot \frac{1}{\hat{p}_{n,l_n}} \left( \hat{p}_{n,l_n} - \hat{p}_{n,l_n}\hat{p}_{n,l_n} \right) = \frac{1}{N}\left( \hat{p}_{n,l_n} - 1 \right), & k = l_n \\
-\frac{1}{N} \cdot \frac{1}{\hat{p}_{n,l_n}} \left( -\hat{p}_{n,l_n}\hat{p}_{n,k} \right) = \frac{1}{N}\hat{p}_{n,k}, & k \neq l_n
\end{cases}
\]
- Finally: \(\frac{\partial J}{\partial x_{n,k}} = \frac{\partial J}{\partial E} \cdot \frac{\partial E}{\partial x_{n,k}}\). In other words, the gradient is the softmax output with 1 subtracted at the true class, scaled by \(\frac{1}{N}\frac{\partial J}{\partial E}\); Backward_cpu below computes exactly this, with \(N\) replaced by the normalizer chosen by the normalization mode.
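One way to sanity-check the derivation is to compare the closed-form gradient \(\frac{1}{N}(\hat{p}_{n,k} - [k = l_n])\) with a central finite difference of \(E\). The sketch below is an independent illustration, not Caffe code; all function names are hypothetical, and softmax_row subtracts the row maximum before exponentiating, a standard stability trick that does not change the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax of one row of K logits; subtracting the max keeps exp() from overflowing.
std::vector<double> softmax_row(const std::vector<double>& x) {
  const double m = *std::max_element(x.begin(), x.end());
  std::vector<double> p(x.size());
  double sum = 0.0;
  for (std::size_t k = 0; k < x.size(); ++k) {
    p[k] = std::exp(x[k] - m);
    sum += p[k];
  }
  for (double& v : p) v /= sum;
  return p;
}

// E = -(1/N) * sum_n log p_hat[n][l_n]; logits holds N rows of K values.
double softmax_loss(const std::vector<std::vector<double>>& logits,
                    const std::vector<int>& labels) {
  double loss = 0.0;
  for (std::size_t n = 0; n < logits.size(); ++n)
    loss -= std::log(softmax_row(logits[n])[labels[n]]);
  return loss / static_cast<double>(logits.size());
}

// Closed-form gradient from the derivation: dE/dx[n][k] = (p_hat[n][k] - [k == l_n]) / N.
std::vector<std::vector<double>> softmax_loss_grad(
    const std::vector<std::vector<double>>& logits, const std::vector<int>& labels) {
  const double N = static_cast<double>(logits.size());
  std::vector<std::vector<double>> g(logits.size());
  for (std::size_t n = 0; n < logits.size(); ++n) {
    g[n] = softmax_row(logits[n]);
    g[n][labels[n]] -= 1.0;
    for (double& v : g[n]) v /= N;
  }
  return g;
}

// Central finite difference of softmax_loss with respect to logits[n][k].
double numeric_grad(std::vector<std::vector<double>> logits,
                    const std::vector<int>& labels, std::size_t n, std::size_t k,
                    double h = 1e-6) {
  logits[n][k] += h;
  const double up = softmax_loss(logits, labels);
  logits[n][k] -= 2 * h;
  const double down = softmax_loss(logits, labels);
  return (up - down) / (2 * h);
}
```

Each row of the closed-form gradient sums to 0, since the softmax outputs of a row sum to 1 and exactly 1 is subtracted.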
softmax_loss_layer.cpp source code
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::LayerSetUp(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) { // layer initialization
LossLayer<Dtype>::LayerSetUp(bottom, top); // call the parent-class LayerSetUp()
LayerParameter softmax_param(this->layer_param_); // copy of the current layer's parameters
softmax_param.set_type("Softmax"); // set the layer type to "Softmax"
softmax_layer_ = LayerRegistry<Dtype>::CreateLayer(softmax_param); // create a Softmax layer from that type
softmax_bottom_vec_.clear();
softmax_bottom_vec_.push_back(bottom[0]); // the Softmax layer's input is this layer's prediction blob, so the shapes match
softmax_top_vec_.clear();
softmax_top_vec_.push_back(&prob_); // prob_ stores the Softmax layer's output
softmax_layer_->SetUp(softmax_bottom_vec_, softmax_top_vec_); // call the Softmax layer's SetUp()
has_ignore_label_ = this->layer_param_.loss_param().has_ignore_label(); // whether an ignore label is set
if (has_ignore_label_) {
ignore_label_ = this->layer_param_.loss_param().ignore_label(); // keep the ignore label in ignore_label_
}
if (!this->layer_param_.loss_param().has_normalization() &&
this->layer_param_.loss_param().has_normalize()) { // normalization (new style) is not set, but normalize (old style) is
// normalize == true selects VALID normalization; false selects BATCH_SIZE
normalization_ = this->layer_param_.loss_param().normalize() ?
LossParameter_NormalizationMode_VALID :
LossParameter_NormalizationMode_BATCH_SIZE;
} else {
normalization_ = this->layer_param_.loss_param().normalization(); // use the normalization setting
}
}
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Reshape(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) { // reshape the input/output blobs
LossLayer<Dtype>::Reshape(bottom, top); // base-class Reshape(): makes the top blob a scalar
softmax_layer_->Reshape(softmax_bottom_vec_, softmax_top_vec_); // reshape the Softmax layer's input/output blobs
// softmax_param().axis() may be positive or negative; the softmax is computed along that axis, and positions along the other axes are computed independently
// the size of axis softmax_axis_ is also the number of classes, e.g. C for images
softmax_axis_ = bottom[0]->CanonicalAxisIndex(this->layer_param_.softmax_param().axis());
outer_num_ = bottom[0]->count(0, softmax_axis_); // size of axes [0, softmax_axis_): the outer count, e.g. N for images
inner_num_ = bottom[0]->count(softmax_axis_ + 1); // size of axes [softmax_axis_+1, end): the inner count, e.g. H*W for images
CHECK_EQ(outer_num_ * inner_num_, bottom[1]->count()) // outer count times inner count must equal the total count of the label blob
<< "Number of labels must match number of predictions; "
<< "e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), "
<< "label count (number of labels) must be N*H*W, "
<< "with integer values in {0, 1, ..., C-1}.";
if (top.size() >= 2) { // a second top blob is requested
// softmax output
top[1]->ReshapeLike(*bottom[0]); // top[1] holds the output of the internal Softmax layer
}
}
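The outer/inner bookkeeping in Reshape can be reproduced with plain vectors. canonical_axis, count_range, split_around_axis and flat_index are hypothetical helpers written for this note; they only imitate the behavior of Blob::CanonicalAxisIndex and Blob::count(start, end), and flat_index uses the same offset expression as the Forward_cpu loop below.

```cpp
#include <cassert>
#include <vector>

// Maps a possibly negative axis (e.g. -1 = last axis) into [0, num_axes).
int canonical_axis(int axis, int num_axes) {
  assert(axis >= -num_axes && axis < num_axes);
  return axis < 0 ? axis + num_axes : axis;
}

// Product of shape[start, end), like Blob::count(start, end).
long long count_range(const std::vector<int>& shape, int start, int end) {
  long long c = 1;
  for (int i = start; i < end; ++i) c *= shape[i];
  return c;
}

struct SoftmaxSplit {
  long long outer;     // outer_num_: product of the axes before the softmax axis
  long long channels;  // size of the softmax axis = number of classes
  long long inner;     // inner_num_: product of the axes after the softmax axis
};

// Splits a prediction shape around the softmax axis: outer * channels * inner == count.
SoftmaxSplit split_around_axis(const std::vector<int>& shape, int axis) {
  const int n = static_cast<int>(shape.size());
  const int a = canonical_axis(axis, n);
  return {count_range(shape, 0, a), shape[a], count_range(shape, a + 1, n)};
}

// Flat offset of element (i, c, j) in a blob laid out as (outer, C, inner):
// i * dim + c * inner + j, with dim = C * inner.
long long flat_index(const SoftmaxSplit& s, long long i, long long c, long long j) {
  return i * (s.channels * s.inner) + c * s.inner + j;
}
```

For a prediction of shape (2, 5, 4, 3) with axis 1 this gives outer = 2, C = 5 and inner = 12, so the label blob must hold 24 values; with axis -1 the split becomes 40, 3 and 1.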
template <typename Dtype>
Dtype SoftmaxWithLossLayer<Dtype>::get_normalizer(
LossParameter_NormalizationMode normalization_mode, int valid_count) { // compute the normalizer from the normalization mode and the number of valid samples
Dtype normalizer; // the normalizer
switch (normalization_mode) { // normalization mode
case LossParameter_NormalizationMode_FULL:
normalizer = Dtype(outer_num_ * inner_num_); // FULL mode: outer count times inner count
break;
case LossParameter_NormalizationMode_VALID: // VALID mode: the number of valid samples
if (valid_count == -1) { // valid_count == -1 means every sample is valid, which is the same as FULL
normalizer = Dtype(outer_num_ * inner_num_);
} else {
normalizer = Dtype(valid_count);
}
break;
case LossParameter_NormalizationMode_BATCH_SIZE: // BATCH_SIZE mode: the outer count
normalizer = Dtype(outer_num_);
break;
case LossParameter_NormalizationMode_NONE: // NONE mode: 1
normalizer = Dtype(1);
break;
default:
LOG(FATAL) << "Unknown normalization mode: "
<< LossParameter_NormalizationMode_Name(normalization_mode);
}
// Some users will have no labels for some examples in order to 'turn off' a
// particular loss in a multi-task setup. The max prevents NaNs in that case.
return std::max(Dtype(1.0), normalizer); // keep the normalizer at least 1 in case there are 0 valid samples
}
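The four normalization modes can be summarized in a standalone function. NormMode and this get_normalizer are hypothetical stand-ins for LossParameter_NormalizationMode and the member function above, with outer_num and inner_num passed explicitly instead of read from the layer.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical mirror of LossParameter_NormalizationMode.
enum class NormMode { FULL, VALID, BATCH_SIZE, NONE };

// Sketch of get_normalizer; valid_count == -1 means "every position is valid".
double get_normalizer(NormMode mode, int outer_num, int inner_num, int valid_count) {
  double normalizer = 1.0;
  switch (mode) {
    case NormMode::FULL:        // every position, ignored or not
      normalizer = static_cast<double>(outer_num) * inner_num;
      break;
    case NormMode::VALID:       // only positions whose label is not ignored
      normalizer = valid_count == -1 ? static_cast<double>(outer_num) * inner_num
                                     : static_cast<double>(valid_count);
      break;
    case NormMode::BATCH_SIZE:  // the outer count only
      normalizer = outer_num;
      break;
    case NormMode::NONE:        // no normalization
      normalizer = 1.0;
      break;
  }
  // Clamp to at least 1 so an all-ignored batch gives loss 0 instead of 0/0 = NaN.
  return std::max(1.0, normalizer);
}
```

For example, with outer_num = 2, inner_num = 12 and 10 valid labels, FULL divides by 24, VALID by 10, BATCH_SIZE by 2 and NONE by 1; if all labels are ignored, VALID falls back to 1.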
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Forward_cpu(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) { // forward pass
// The forward pass computes the softmax prob values.
softmax_layer_->Forward(softmax_bottom_vec_, softmax_top_vec_); // first run the Softmax layer's forward pass
// the input blob has shape (N, C, H, W); if the softmax is along axis 1, the Softmax output prob_ also has shape (N, C, H, W),
// there are N * H * W labels, outer_num_ = N, inner_num_ = H * W
const Dtype* prob_data = prob_.cpu_data(); // Softmax output data
const Dtype* label = bottom[1]->cpu_data(); // label data
int dim = prob_.count() / outer_num_; // C * H * W
int count = 0; // number of valid samples
Dtype loss = 0;
for (int i = 0; i < outer_num_; ++i) {
for (int j = 0; j < inner_num_; j++) {
const int label_value = static_cast<int>(label[i * inner_num_ + j]); // label of the sample at position (i, j)
if (has_ignore_label_ && label_value == ignore_label_) {
continue; // an ignore label is set and this label equals it: skip
}
DCHECK_GE(label_value, 0); // check: the label is not negative
DCHECK_LT(label_value, prob_.shape(softmax_axis_)); // check: the label is less than the number of classes
// predicted probability of the sample at (i, j) for class label_value; accumulate its loss
loss -= log(std::max(prob_data[i * dim + label_value * inner_num_ + j], Dtype(FLT_MIN)));
++count; // one more valid sample
}
}
top[0]->mutable_cpu_data()[0] = loss / get_normalizer(normalization_, count); // divide by the normalizer to get the final loss
if (top.size() == 2) {
top[1]->ShareData(prob_); // expose the Softmax output as this layer's top[1]
}
}
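The loop above can be restated on plain arrays to see the ignore_label handling and the (outer, C, inner) indexing in isolation. This sketch assumes the softmax output is already computed and that normalization is VALID; the function name is hypothetical, and std::optional replaces the has_ignore_label_/ignore_label_ pair.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>
#include <optional>
#include <vector>

// prob: softmax output laid out as (outer, C, inner); label: outer * inner class indices.
// Returns the loss divided by the number of non-ignored labels (VALID mode, clamped to 1).
double softmax_loss_forward(const std::vector<float>& prob, const std::vector<int>& label,
                            int outer, int C, int inner,
                            std::optional<int> ignore_label) {
  const int dim = C * inner;
  int count = 0;
  double loss = 0.0;
  for (int i = 0; i < outer; ++i) {
    for (int j = 0; j < inner; ++j) {
      const int l = label[i * inner + j];
      if (ignore_label && l == *ignore_label) continue;  // skip ignored positions
      assert(l >= 0 && l < C);
      // FLT_MIN floor, as in the layer, keeps log() finite if the probability underflows to 0.
      loss -= std::log(std::max(prob[i * dim + l * inner + j], FLT_MIN));
      ++count;
    }
  }
  return loss / std::max(1, count);
}
```

With one image (outer = 1), three classes and two pixels (inner = 2), prob = {0.7, 0.1, 0.2, 0.6, 0.1, 0.3} stores class 0 for both pixels first, then class 1, then class 2; labels {0, 2} give a loss of about 0.7803, and ignoring label 2 leaves only \(-\log 0.7 \approx 0.3567\).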
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
if (propagate_down[1]) { // likewise, backpropagation to the label blob is not allowed
LOG(FATAL) << this->type() << " Layer cannot backpropagate to label inputs.";
}
if (propagate_down[0]) { // backpropagation to the predictions is requested
Dtype* bottom_diff = bottom[0]->mutable_cpu_diff(); // pointer to the prediction blob's gradient
const Dtype* prob_data = prob_.cpu_data(); // pointer to the Softmax output
caffe_copy(prob_.count(), prob_data, bottom_diff); //bottom_diff = prob_data
const Dtype* label = bottom[1]->cpu_data(); // pointer to the label data
int dim = prob_.count() / outer_num_; // C * H * W
int count = 0; // number of valid samples
for (int i = 0; i < outer_num_; ++i) {
for (int j = 0; j < inner_num_; ++j) {
const int label_value = static_cast<int>(label[i * inner_num_ + j]); // label of the sample at position (i, j)
if (has_ignore_label_ && label_value == ignore_label_) { // this label is ignored
for (int c = 0; c < bottom[0]->shape(softmax_axis_); ++c) { // every value along axis C
bottom_diff[i * dim + c * inner_num_ + j] = 0; // zero the gradient of every class at position (i, j)
}
} else {
bottom_diff[i * dim + label_value * inner_num_ + j] -= 1; // subtract 1 only at class label_value along axis C: bottom_diff -= 1
++count;
}
}
}
// Scale gradient
Dtype loss_weight = top[0]->cpu_diff()[0] / get_normalizer(normalization_, count); // compute the scaling coefficient
caffe_scal(prob_.count(), loss_weight, bottom_diff); //bottom_diff *= loss_weight
}
}
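The backward pass follows directly from the derivation: copy the softmax output, subtract 1 at the true class, zero ignored positions, then scale. Below is a plain-array sketch assuming VALID normalization; the function name is hypothetical, and top_diff stands for top[0]->cpu_diff()[0].

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

// prob: softmax output laid out as (outer, C, inner); label: outer * inner class indices.
// Returns dJ/dx with the same layout; the result replaces (does not add to) any old diff.
std::vector<float> softmax_loss_backward(const std::vector<float>& prob,
                                         const std::vector<int>& label,
                                         int outer, int C, int inner,
                                         std::optional<int> ignore_label,
                                         float top_diff) {
  std::vector<float> diff(prob);  // like caffe_copy: start from the softmax output
  const int dim = C * inner;
  int count = 0;
  for (int i = 0; i < outer; ++i) {
    for (int j = 0; j < inner; ++j) {
      const int l = label[i * inner + j];
      if (ignore_label && l == *ignore_label) {
        for (int c = 0; c < C; ++c) diff[i * dim + c * inner + j] = 0.0f;  // no gradient here
      } else {
        assert(l >= 0 && l < C);
        diff[i * dim + l * inner + j] -= 1.0f;  // p_hat - 1 at the true class
        ++count;
      }
    }
  }
  const float scale = top_diff / static_cast<float>(std::max(1, count));
  for (float& v : diff) v *= scale;  // like caffe_scal
  return diff;
}
```

With the same example (prob = {0.7, 0.1, 0.2, 0.6, 0.1, 0.3}, labels {0, 2}, top_diff = 1), the gradient is {-0.15, 0.05, 0.1, 0.3, 0.05, -0.35}; ignoring label 2 zeros the second pixel's column and the normalizer drops to 1.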
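Introduction to the SigmoidCrossEntropyLossLayer class
The SigmoidCrossEntropyLossLayer class computes the cross-entropy loss for multi-label binary classification: each sample may have several labels, but each label has only two classes, 0 or 1.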
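- The first input blob holds the network's raw predictions, of size \(N \times C \times H \times W\), with values \(x_n \in (-\infty, +\infty)\). The loss uses their sigmoid as the probabilities: \(\hat{p}_n = \sigma(x_n)\).
- The second input blob holds the labels, of size \(N \times C \times H \times W\), with values \(p_n \in [0, 1]\).
- Again, the code does not strictly require the prediction and label blobs to have the same shape; it only requires that their dimension 0 match and that they contain the same total number of values.
- In the forward pass, the loss is \(E = -\frac{1}{N}\sum\limits_{n}\left[p_n\log\hat{p}_n + (1 - p_n)\log(1 - \hat{p}_n)\right]\)
- The sum runs over every element \(n\) of the blob (elements carrying the ignore label are skipped), and the \(N\) in \(\frac{1}{N}\) is the normalizer, normalizer_ in the source below. With the default BATCH_SIZE normalization it equals the batch size, not the number of elements.
- Note that the code slightly rearranges the formula so that \(e^{-x}\) cannot become too large (it overflows for large negative \(x\)). Using \(1 - \sigma(x) = \sigma(x)e^{-x}\) and \(\log\sigma(x) = -\log(1 + e^{-x}) = x - \log(1 + e^{x})\):
\[
E = -\frac{1}{N}\sum\limits_{n}\left[p_n\log\sigma(x_n) + (1 - p_n)\log(1 - \sigma(x_n))\right] = -\frac{1}{N}\sum\limits_{n}\left[x_n(p_n - 1) + \log\sigma(x_n)\right]
\]
where each term is evaluated as
\[
x_n(p_n - 1) + \log\sigma(x_n) = \begin{cases} x_n(p_n - 1) - \log(1 + e^{-x_n}), & x_n \geqslant 0 \\ x_n p_n - \log(1 + e^{x_n}), & x_n < 0 \end{cases}
\]
- In the backward pass, the gradient w.r.t. the prediction blob is \(\frac{\partial J}{\partial x_n} = \frac{\partial J}{\partial E} \cdot \frac{\partial E}{\partial x_n} = \frac{1}{N} \cdot \frac{\partial J}{\partial E} \cdot (\sigma(x_n) - p_n)\)
- \(J\) is the loss of the whole network, and \(\frac{\partial J}{\partial E}\) is top[0]->cpu_diff()[0] in the code.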
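The benefit of the rearranged formula shows up at extreme inputs. The sketch below compares a direct per-element evaluation with the rearranged one and checks the gradient \(\sigma(x) - p\) numerically. The function names are hypothetical, and bce_stable uses std::log1p instead of the layer's log(1 + ...), a small accuracy choice made only for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Per-element loss -[p*log(sigma(x)) + (1-p)*log(1-sigma(x))], computed directly.
// For large |x|, sigma(x) rounds to exactly 0 or 1 and a log() argument becomes 0.
double bce_naive(double x, double p) {
  const double s = 1.0 / (1.0 + std::exp(-x));
  return -(p * std::log(s) + (1 - p) * std::log(1 - s));
}

// Same quantity via the rearranged form: -[x*(p - [x >= 0]) - log(1 + exp(-|x|))].
// exp() only ever sees -|x|, so it cannot overflow.
double bce_stable(double x, double p) {
  const double pos = x >= 0 ? 1.0 : 0.0;
  return -(x * (p - pos) - std::log1p(std::exp(-std::fabs(x))));
}

// Per-element gradient dE/dx = sigma(x) - p (before dividing by the normalizer).
double bce_grad(double x, double p) {
  return 1.0 / (1.0 + std::exp(-x)) - p;
}
```

For moderate inputs both versions agree; at x = 1000 with label 0 the direct version evaluates log(0) and returns infinity, while the rearranged one returns the correct value 1000.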
sigmoid_cross_entropy_loss_layer.cpp source code
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::LayerSetUp(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
LossLayer<Dtype>::LayerSetUp(bottom, top); // base-class LayerSetUp(): sets the loss weight of the top blob
sigmoid_bottom_vec_.clear();
sigmoid_bottom_vec_.push_back(bottom[0]); // clear, then add the predictions as the SigmoidLayer's input
sigmoid_top_vec_.clear();
sigmoid_top_vec_.push_back(sigmoid_output_.get()); // clear, then add sigmoid_output_ as the SigmoidLayer's output
sigmoid_layer_->SetUp(sigmoid_bottom_vec_, sigmoid_top_vec_); // set up the SigmoidLayer: reshape its top blob, etc.
has_ignore_label_ = this->layer_param_.loss_param().has_ignore_label(); // whether an ignore label is set
if (has_ignore_label_) {
ignore_label_ = this->layer_param_.loss_param().ignore_label(); // store the ignore label in this layer
}
if (this->layer_param_.loss_param().has_normalization()) { // if normalization is set
normalization_ = this->layer_param_.loss_param().normalization(); // store it in this layer
} else if (this->layer_param_.loss_param().has_normalize()) { // otherwise, if the old-style normalize is set
// normalize == true selects VALID normalization; false selects BATCH_SIZE
normalization_ = this->layer_param_.loss_param().normalize() ?
LossParameter_NormalizationMode_VALID :
LossParameter_NormalizationMode_BATCH_SIZE;
} else {
// default to BATCH_SIZE: SigmoidCrossEntropyLoss is the only loss layer whose default normalization is BATCH_SIZE; the others default to VALID
normalization_ = LossParameter_NormalizationMode_BATCH_SIZE;
}
}
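The two LayerSetUp functions resolve the normalization mode with the same precedence but different fallbacks. The sketch below models the loss_param fields with std::optional (empty meaning "not set in the prototxt"); NormalizationMode, LossParamView, softmax_resolve and sigmoid_resolve are hypothetical names, and the VALID fallback for SoftmaxWithLoss assumes the proto default of normalization is VALID.

```cpp
#include <cassert>
#include <optional>

// Hypothetical mirror of LossParameter.NormalizationMode.
enum class NormalizationMode { FULL, VALID, BATCH_SIZE, NONE };

// Hypothetical view of the relevant loss_param fields; std::nullopt = field not set.
struct LossParamView {
  std::optional<NormalizationMode> normalization;  // new-style field
  std::optional<bool> normalize;                   // deprecated old-style field
};

// SoftmaxWithLossLayer: the old flag is honored only when the new field is absent;
// otherwise normalization() is used, which falls back to its (assumed) default, VALID.
NormalizationMode softmax_resolve(const LossParamView& p) {
  if (!p.normalization && p.normalize) {
    return *p.normalize ? NormalizationMode::VALID : NormalizationMode::BATCH_SIZE;
  }
  return p.normalization.value_or(NormalizationMode::VALID);
}

// SigmoidCrossEntropyLossLayer: same precedence, but BATCH_SIZE when neither field is set.
NormalizationMode sigmoid_resolve(const LossParamView& p) {
  if (p.normalization) return *p.normalization;
  if (p.normalize) {
    return *p.normalize ? NormalizationMode::VALID : NormalizationMode::BATCH_SIZE;
  }
  return NormalizationMode::BATCH_SIZE;
}
```

The practical consequence: with an empty loss_param, SoftmaxWithLoss divides by the number of valid labels, while SigmoidCrossEntropyLoss divides by the batch size.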
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Reshape(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
LossLayer<Dtype>::Reshape(bottom, top); // base-class Reshape(): checks the input/output shapes
outer_num_ = bottom[0]->shape(0); // outer count N, the batch size
inner_num_ = bottom[0]->count(1); // inner count C*H*W, instance size: |output| == |target|
CHECK_EQ(bottom[0]->count(), bottom[1]->count()) <<
"SIGMOID_CROSS_ENTROPY_LOSS layer inputs must have the same count."; // the prediction and label blobs must have the same total count
sigmoid_layer_->Reshape(sigmoid_bottom_vec_, sigmoid_top_vec_); // call the SigmoidLayer's Reshape()
}
// TODO(shelhamer) loss normalization should be pulled up into LossLayer,
// instead of duplicated here and in SoftMaxWithLossLayer
template <typename Dtype>
Dtype SigmoidCrossEntropyLossLayer<Dtype>::get_normalizer( // compute the normalizer from the normalization mode and the number of valid samples
LossParameter_NormalizationMode normalization_mode, int valid_count) {
Dtype normalizer;
switch (normalization_mode) { // normalization mode
case LossParameter_NormalizationMode_FULL:
normalizer = Dtype(outer_num_ * inner_num_); // FULL mode: outer count times inner count
break;
case LossParameter_NormalizationMode_VALID: // VALID mode: the number of valid samples
if (valid_count == -1) { // valid_count == -1 means every sample is valid, which is the same as FULL
normalizer = Dtype(outer_num_ * inner_num_);
} else {
normalizer = Dtype(valid_count);
}
break;
case LossParameter_NormalizationMode_BATCH_SIZE: // BATCH_SIZE mode: the outer count
normalizer = Dtype(outer_num_);
break;
case LossParameter_NormalizationMode_NONE: // NONE mode: 1
normalizer = Dtype(1);
break;
default: // any other mode is an error
LOG(FATAL) << "Unknown normalization mode: " << LossParameter_NormalizationMode_Name(normalization_mode);
}
// Some users will have no labels for some examples in order to 'turn off' a
// particular loss in a multi-task setup. The max prevents NaNs in that case.
return std::max(Dtype(1.0), normalizer); // at least 1: some samples may have no labels, so valid_count can be 0; this prevents errors later
}
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
// The forward pass computes the sigmoid outputs.
sigmoid_bottom_vec_[0] = bottom[0]; // the predictions are the SigmoidLayer's input
sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_); // compute the SigmoidLayer's output sigmoid_top_vec_
// Compute the loss (negative log likelihood)
// Stable version of loss computation from input data
const Dtype* input_data = bottom[0]->cpu_data(); // predictions
const Dtype* target = bottom[1]->cpu_data(); // labels
int valid_count = 0; // number of valid samples
Dtype loss = 0;
for (int i = 0; i < bottom[0]->count(); ++i) {
const int target_value = static_cast<int>(target[i]); // label of element i
if (has_ignore_label_ && target_value == ignore_label_) { // an ignore label is set and this label equals it
continue; // skip it
}
// for x = input_data[i] < 0: loss -= x*p - log(1+exp(x))
// for x >= 0: loss -= x*(p-1) - log(1+exp(-x)); either way exp() is evaluated at -|x|, so exp(x) never becomes too large
loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
++valid_count; // one more valid sample
}
normalizer_ = get_normalizer(normalization_, valid_count); // compute the normalizer
top[0]->mutable_cpu_data()[0] = loss / normalizer_; // divide by the normalizer to get the final loss
}
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Backward_cpu(
const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) {
if (propagate_down[1]) { // likewise, backpropagation to the label input blob is not allowed
LOG(FATAL) << this->type() << " Layer cannot backpropagate to label inputs.";
}
if (propagate_down[0]) { // the prediction blob needs a gradient
// First, compute the diff
const int count = bottom[0]->count(); // total number of elements
const Dtype* sigmoid_output_data = sigmoid_output_->cpu_data(); // SigmoidLayer output σ(x)
const Dtype* target = bottom[1]->cpu_data(); // labels
Dtype* bottom_diff = bottom[0]->mutable_cpu_diff(); // pointer to the prediction blob's gradient
caffe_sub(count, sigmoid_output_data, target, bottom_diff); //bottom_diff = sigmoid_output_data - target
// Zero out gradient of ignored targets.
if (has_ignore_label_) { // if an ignore label is set
for (int i = 0; i < count; ++i) {
const int target_value = static_cast<int>(target[i]); // label of element i
if (target_value == ignore_label_) { // this element carries the ignore label
bottom_diff[i] = 0; // zero its gradient
}
}
}
// Scale down gradient
Dtype loss_weight = top[0]->cpu_diff()[0] / normalizer_; // scaling coefficient
caffe_scal(count, loss_weight, bottom_diff); //bottom_diff *= loss_weight
}
}
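To tie Forward_cpu and Backward_cpu together, here is a plain-array sketch of both passes. SigmoidCELoss and sigmoid_ce are hypothetical names; the normalizer is passed in rather than computed (for the default BATCH_SIZE mode it would be the batch size), the sigmoid is computed inline instead of by a separate SigmoidLayer, and std::optional replaces has_ignore_label_/ignore_label_.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

struct SigmoidCELoss {
  double loss;               // normalized loss, like top[0]->cpu_data()[0]
  std::vector<double> diff;  // gradient, like bottom[0]->cpu_diff()
};

// x: raw predictions; target: labels in [0, 1]; top_diff stands for dJ/dE.
SigmoidCELoss sigmoid_ce(const std::vector<double>& x, const std::vector<double>& target,
                         std::optional<int> ignore_label, double normalizer,
                         double top_diff) {
  assert(x.size() == target.size());
  SigmoidCELoss out{0.0, std::vector<double>(x.size(), 0.0)};
  for (std::size_t i = 0; i < x.size(); ++i) {
    // Ignored elements contribute neither loss nor gradient.
    if (ignore_label && static_cast<int>(target[i]) == *ignore_label) continue;
    const double pos = x[i] >= 0 ? 1.0 : 0.0;
    // Same branchless form as the layer: exp() is evaluated at x - 2*x*[x>=0] = -|x|.
    out.loss -= x[i] * (target[i] - pos) - std::log(1 + std::exp(x[i] - 2 * x[i] * pos));
    out.diff[i] = 1.0 / (1.0 + std::exp(-x[i])) - target[i];  // sigma(x) - p
  }
  out.loss /= normalizer;
  for (double& d : out.diff) d *= top_diff / normalizer;  // like caffe_scal
  return out;
}
```

For x = {2, -1, 0.5, 3} with labels {1, 0, 1, 0} and a normalizer of 2 (a batch of two samples with two outputs each), the loss is about 1.9814; marking the last label with an ignore value of -1 drops that element and lowers the loss to about 0.4571.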
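Summary
- Gradient accumulation applies to a layer's parameter blobs; the gradients of a layer's input (bottom) and output (top) blobs are not accumulated. Accordingly, each Backward_cpu above overwrites bottom_diff (via caffe_set, caffe_copy or caffe_sub) instead of adding to it.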
References
http://freemind.pluskid.org/machine-learning/softmax-vs-softmax-loss-numerical-stability/
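This is my first time reading the Caffe source code, and I took notes as I went, so my understanding and analysis may contain mistakes or omissions. Corrections from readers are very welcome. Thanks for your support!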