• Caffe Source Code - LossLayer Classes (Part 1)


    LossLayer class overview

    LossLayer is the base class for all loss layers in Caffe. It does not compute any loss itself; it only defines properties common to loss layers, such as a default loss weight of 1 for the top blob and the dimension match between the prediction data and the label data.

    loss_layer.cpp source

    template <typename Dtype>
    void LossLayer<Dtype>::LayerSetUp(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
      // LossLayers have a non-zero (1) loss by default.
      if (this->layer_param_.loss_weight_size() == 0) {   //loss layers use a default loss weight of 1
        this->layer_param_.add_loss_weight(Dtype(1));     //if none is set in the layer param, set it to 1
      }
    }
    template <typename Dtype>
    void LossLayer<Dtype>::Reshape(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
      CHECK_EQ(bottom[0]->shape(0), bottom[1]->shape(0))
          << "The data and label should have the same first dimension.";    //dimension 0 of the two bottom blobs must match
      vector<int> loss_shape(0);    // Loss layers output a scalar; 0 axes.
      top[0]->Reshape(loss_shape);  //reshape the top blob to a scalar: 0 axes, a single element
    }
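    To see what the 0-axis output looks like in practice, here is a minimal standalone sketch (not part of Caffe; the demo function name is made up), assuming Caffe's Blob API: a blob reshaped with an empty shape vector has 0 axes but still holds exactly one element, which is where the scalar loss is stored.

    #include <vector>
    #include "caffe/blob.hpp"

    void scalar_blob_demo() {
      caffe::Blob<float> loss_blob;
      std::vector<int> loss_shape;      // empty shape vector: 0 axes
      loss_blob.Reshape(loss_shape);    // loss_blob.num_axes() == 0
      // loss_blob.count() == 1, so index 0 is valid and holds the scalar loss
      loss_blob.mutable_cpu_data()[0] = 0.0f;
    }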
    

    MultinomialLogisticLossLayer class overview

    MultinomialLogisticLossLayer computes the logistic loss for single-label, multi-class classification: each sample carries exactly one label, but that label may take one of many classes. (A small standalone sketch of these formulas follows this list.)

    1. The first input blob holds the network's predicted probabilities, with shape \(N \times C \times H \times W\) and values \(\hat{p}_{n,k} \in [0, 1]\); \(\hat{p}_{n,k}\) is the predicted probability that sample \(n\) belongs to class \(k\), and \(\forall n, \sum\limits_{k=1}^K \hat{p}_{n,k} = 1\)
    • Here \(N\) is the number of samples and \(K = C \times H \times W\) is the total number of classes
    2. The second input blob holds the labels, with shape \(N \times 1 \times 1 \times 1\); each label is an integer \(l_n \in \{0, 1, 2, ..., K - 1\}\), the true class of sample \(n\)
    3. In the forward pass, the loss is computed as: \(E=-\frac{1}{N}\sum\limits_{n=1}^{N} \sum\limits_{k=1}^{K} y_{n,k}*\log \hat{p}_{n,k}= -\frac{1}{N}\sum\limits_{n=1}^{N} \log(\hat{p}_{n,l_n})\)
    • \(y_{n,k}\) is the true probability that sample \(n\) belongs to class \(k\): \(y_{n,k}=\left\{\begin{matrix}1 & k=l_n\\0 & k \neq l_n\end{matrix}\right.\)
    4. In the backward pass, the gradient with respect to the prediction blob is: \(\frac{\partial J}{\partial {\hat{p}_{n,l_n}}} = \frac{\partial J}{\partial E}*\frac{\partial E}{\partial {\hat{p}_{n,l_n}}}=-\frac{1}{N}*\frac{\partial J}{\partial E}*\frac{1}{\hat{p}_{n,l_n}}\)
    • \(J\) denotes the loss of the whole network
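    To make the formulas above concrete, here is a tiny standalone sketch (hypothetical values, plain C++ rather than Caffe code; kLogThreshold mimics Caffe's kLOG_THRESHOLD) that computes the forward loss and the prediction gradient for two samples and three classes, taking the top diff \(\frac{\partial J}{\partial E}\) to be 1:

    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    int main() {
      const int N = 2, K = 3;
      // predicted probabilities \hat{p}_{n,k}; each row sums to 1
      float prob[N][K] = {{0.7f, 0.2f, 0.1f}, {0.1f, 0.3f, 0.6f}};
      int label[N] = {0, 2};                  // true classes l_n
      const float kLogThreshold = 1e-20f;     // keeps |log(prob)| bounded

      float loss = 0.f;
      float diff[N][K] = {};                  // gradient w.r.t. the predictions
      for (int n = 0; n < N; ++n) {
        float p = std::max(prob[n][label[n]], kLogThreshold);
        loss -= std::log(p);                  // E contribution: -log(\hat{p}_{n,l_n})
        diff[n][label[n]] = -1.f / (N * p);   // only the (n, l_n) entry is non-zero
      }
      loss /= N;
      std::printf("loss = %f\n", loss);       // -(log 0.7 + log 0.6) / 2 ~= 0.4337
      return 0;
    }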

    multinomial_logistic_loss_layer.cpp source

    template <typename Dtype>
    void MultinomialLogisticLossLayer<Dtype>::Reshape(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
      LossLayer<Dtype>::Reshape(bottom, top);   //base class Reshape(): check that dim 0 of the bottom blobs matches, reshape the top blob to a scalar
      CHECK_EQ(bottom[1]->channels(), 1);   //check that the label blob has shape [N,1,1,1]
      CHECK_EQ(bottom[1]->height(), 1);
      CHECK_EQ(bottom[1]->width(), 1);
    }
    
    template <typename Dtype>
    void MultinomialLogisticLossLayer<Dtype>::Forward_cpu(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {  //forward pass, compute the loss
      const Dtype* bottom_data = bottom[0]->cpu_data();   //data pointer of the prediction blob
      const Dtype* bottom_label = bottom[1]->cpu_data();  //data pointer of the label blob
      int num = bottom[0]->num();                         //number of samples
      int dim = bottom[0]->count() / bottom[0]->num();    //K=C*H*W, the total number of classes
      Dtype loss = 0;
      for (int i = 0; i < num; ++i) {
        int label = static_cast<int>(bottom_label[i]);    //label of the i-th sample, i.e. the sample belongs to class label
        // bottom_data[i * dim + label] is the predicted probability of the i-th sample for class label;
        // kLOG_THRESHOLD is a small constant that keeps |log(prob)| from blowing up
        Dtype prob = std::max(bottom_data[i * dim + label], Dtype(kLOG_THRESHOLD));
        loss -= log(prob);    //accumulate the loss
      }
      top[0]->mutable_cpu_data()[0] = loss / num; //output the average loss
    }
    
    template <typename Dtype>
    void MultinomialLogisticLossLayer<Dtype>::Backward_cpu(
        const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down,
        const vector<Blob<Dtype>*>& bottom) {
      if (propagate_down[1]) {    //gradients must not be propagated to the label blob; abort if requested
        LOG(FATAL) << this->type() << " Layer cannot backpropagate to label inputs.";
      }
      if (propagate_down[0]) {    //the prediction blob needs its gradient
        const Dtype* bottom_data = bottom[0]->cpu_data();       //data pointer of the prediction blob
        const Dtype* bottom_label = bottom[1]->cpu_data();      //data pointer of the label blob
        Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();     //gradient (diff) pointer of the prediction blob
        int num = bottom[0]->num();                             //number of samples
        int dim = bottom[0]->count() / bottom[0]->num();        //K=C*H*W, the total number of classes
        caffe_set(bottom[0]->count(), Dtype(0), bottom_diff);   //first clear the prediction blob's gradient to 0
        const Dtype scale = - top[0]->cpu_diff()[0] / num;      //scale factor, i.e. -(1/N) * dJ/dE
        for (int i = 0; i < num; ++i) {
          int label = static_cast<int>(bottom_label[i]);      //label of the sample, i.e. it belongs to class label
          Dtype prob = std::max(bottom_data[i * dim + label], Dtype(kLOG_THRESHOLD)); //predicted probability of the i-th sample for class label
          bottom_diff[i * dim + label] = scale / prob;        //gradient for this sample
        }
      }
    }
    

    SoftmaxWithLossLayer class overview

    SoftmaxWithLossLayer also computes the loss for single-label, multi-class classification. Conceptually it is equivalent to SoftmaxLayer + MultinomialLogisticLossLayer, but Caffe recommends using SoftmaxWithLossLayer: computing everything in one layer is cheaper and numerically more stable than running the two layers separately.

    1. The first input blob holds the network's predictions, with shape \(\tilde{N} \times C \times \tilde{H} \times \tilde{W}\) and values \(x_{n,k} \in (-\infty, +\infty)\). The loss is computed from their softmax values, used as probabilities: \(\hat{p}_{n,k} = \frac{e^{x_{n,k}}}{\sum\limits_{k'=1}^{K} e^{x_{n,k'}}}\)
    • In what follows we assume the softmax is taken along axis 1 (dimension \(C\)), so the size of dimension \(C\) is the number of classes \(K\). The total number of samples is the outer count (outer_num_ in the code) times the inner count (inner_num_), i.e. \(N = \tilde N * \tilde H * \tilde W\)
    2. The second input blob holds the labels, with shape \(N \times 1 \times 1 \times 1\); each label is an integer \(l_n \in \{0, 1, 2, ..., K - 1\}\), the true class of sample \(n\)
    • The Caffe code does not strictly require the label blob to have shape \(N \times 1 \times 1 \times 1\); it only requires that the prediction and label blobs agree on dimension 0 (enforced in LossLayer) and that the label blob contains \(N\) elements in total
    3. In the forward pass, as in MultinomialLogisticLossLayer, the loss is: \(E=-\frac{1}{N} \sum\limits_{n=1}^N \log(\hat{p}_{n,l_n})\)
    4. In the backward pass, the gradient with respect to the prediction blob is derived as follows (a small standalone sketch follows this list):
    • \(\frac{\partial \hat p_{n,l_n}}{\partial x_{n,k}} = \left( \frac{e^{x_{n,l_n}}}{e^{x_{n,1}} + e^{x_{n,2}} + ... + e^{x_{n,K}}} \right)'_{x_{n,k}} = \left\{ \begin{array}{ll} \frac{-e^{x_{n,l_n}}*e^{x_{n,k}}}{\left( e^{x_{n,1}} + e^{x_{n,2}} + ... + e^{x_{n,K}} \right)^2} + \frac{e^{x_{n,l_n}}}{e^{x_{n,1}} + e^{x_{n,2}} + ... + e^{x_{n,K}}} = \hat p_{n,l_n} - \hat p_{n,l_n}*\hat p_{n,l_n}, & k = l_n \\ \frac{-e^{x_{n,l_n}}*e^{x_{n,k}}}{\left( e^{x_{n,1}} + e^{x_{n,2}} + ... + e^{x_{n,K}} \right)^2} = -\hat p_{n,l_n}*\hat p_{n,k}, & k \neq l_n \end{array} \right.\)

    • \(E = -\frac{1}{N}\sum\limits_{n = 1}^N \log(\hat p_{n,l_n}) = -\frac{1}{N}\left( \log \hat p_{1,l_1} + \log \hat p_{2,l_2} + ... + \log \hat p_{N,l_N} \right)\)

    • Note that \(\frac{\partial E}{\partial \hat p_{n,k'}}\) is non-zero only when \(k'=l_n\).

    • \(\frac{\partial E}{\partial x_{n,k}} = \sum\limits_{k' = 1}^K \frac{\partial E}{\partial \hat p_{n,k'}} \frac{\partial \hat p_{n,k'}}{\partial x_{n,k}} = -\frac{1}{N}*\frac{1}{\hat p_{n,l_n}}*\frac{\partial \hat p_{n,l_n}}{\partial x_{n,k}} = \left\{ \begin{array}{ll} -\frac{1}{N}*\frac{1}{\hat p_{n,l_n}}*\left( \hat p_{n,l_n} - \hat p_{n,l_n}*\hat p_{n,l_n} \right) = \frac{1}{N}\left( \hat p_{n,l_n} - 1 \right), & k = l_n \\ -\frac{1}{N}*\frac{1}{\hat p_{n,l_n}}*\left( -\hat p_{n,l_n}*\hat p_{n,k} \right) = \frac{1}{N}\hat p_{n,k}, & k \neq l_n \end{array} \right.\)

    • Finally: \(\frac{\partial J}{\partial {x_{n,k}}} = \frac{\partial J}{\partial E}*\frac{\partial E}{\partial {x_{n,k}}}\), where \(J\) is the loss of the whole network
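    The piecewise result above is exactly why Backward_cpu below can simply copy the softmax output into bottom_diff and subtract 1 at the true class. A tiny standalone sketch (hypothetical values, plain C++ rather than Caffe code) for a single sample with three logits, again taking \(\frac{\partial J}{\partial E} = 1\):

    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    int main() {
      const int K = 3, N = 1;
      float x[K] = {2.0f, 1.0f, 0.1f};   // logits x_{n,k}
      int label = 0;                     // true class l_n

      // softmax (subtracting the max keeps exp() from overflowing)
      float max_x = *std::max_element(x, x + K);
      float p[K], sum = 0.f;
      for (int k = 0; k < K; ++k) { p[k] = std::exp(x[k] - max_x); sum += p[k]; }
      for (int k = 0; k < K; ++k) p[k] /= sum;

      // dE/dx_{n,k} = (\hat{p}_{n,k} - [k == l_n]) / N
      float diff[K];
      for (int k = 0; k < K; ++k) diff[k] = (p[k] - (k == label ? 1.f : 0.f)) / N;

      std::printf("p    = %.3f %.3f %.3f\n", p[0], p[1], p[2]);
      std::printf("diff = %.3f %.3f %.3f\n", diff[0], diff[1], diff[2]);
      return 0;
    }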

    softmax_loss_layer.cpp source

    template <typename Dtype>
    void SoftmaxWithLossLayer<Dtype>::LayerSetUp(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {  //layer initialization
      LossLayer<Dtype>::LayerSetUp(bottom, top);          //call the base class function
      LayerParameter softmax_param(this->layer_param_);   //copy the current layer's parameters
      softmax_param.set_type("Softmax");    //set the layer type to "Softmax"
      softmax_layer_ = LayerRegistry<Dtype>::CreateLayer(softmax_param);  //create a Softmax layer of that type
      softmax_bottom_vec_.clear();
      softmax_bottom_vec_.push_back(bottom[0]);   //the Softmax layer uses the current layer's bottom[0] as its input
      softmax_top_vec_.clear();
      softmax_top_vec_.push_back(&prob_);         //prob_ stores the Softmax layer's output
      softmax_layer_->SetUp(softmax_bottom_vec_, softmax_top_vec_);   //call the Softmax layer's SetUp function
    
      has_ignore_label_ = this->layer_param_.loss_param().has_ignore_label(); //whether an ignore label was specified
      if (has_ignore_label_) {
        ignore_label_ = this->layer_param_.loss_param().ignore_label(); //store the ignore label from the parameter in ignore_label_
      }
      if (!this->layer_param_.loss_param().has_normalization() &&
          this->layer_param_.loss_param().has_normalize()) {    //normalization (new style) not set, but normalize (old style) set
        //normalize == true means VALID normalization, false means BATCH_SIZE normalization
        normalization_ = this->layer_param_.loss_param().normalize() ?
                         LossParameter_NormalizationMode_VALID :
                         LossParameter_NormalizationMode_BATCH_SIZE;
      } else {
        normalization_ = this->layer_param_.loss_param().normalization(); //use the setting from normalization
      }
    }
    
    template <typename Dtype>
    void SoftmaxWithLossLayer<Dtype>::Reshape(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {    //reshape the layer's input/output blobs
      LossLayer<Dtype>::Reshape(bottom, top);   //base class function: reshape the top blob to a scalar
      softmax_layer_->Reshape(softmax_bottom_vec_, softmax_top_vec_); //reshape the Softmax layer's input/output blobs
      //softmax_param().axis() may be positive or negative; the softmax is computed along that axis,
      //and data along the other dimensions are treated independently.
      //The size of dimension softmax_axis_ is also the total number of classes, e.g. C for an image blob
      softmax_axis_ = bottom[0]->CanonicalAxisIndex(this->layer_param_.softmax_param().axis());
      outer_num_ = bottom[0]->count(0, softmax_axis_);    //product of dimensions [0, softmax_axis_), the outer count, e.g. N for an image blob
      inner_num_ = bottom[0]->count(softmax_axis_ + 1);   //product of dimensions [softmax_axis_+1, end), the inner count, e.g. H*W for an image blob
      CHECK_EQ(outer_num_ * inner_num_, bottom[1]->count())   //outer count times inner count must equal the label blob's total size
          << "Number of labels must match number of predictions; "
          << "e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), "
          << "label count (number of labels) must be N*H*W, "
          << "with integer values in {0, 1, ..., C-1}.";
      if (top.size() >= 2) {              //more than one top blob
        // softmax output
        top[1]->ReshapeLike(*bottom[0]);  //top[1] serves as the output of the internally created Softmax layer
      }
    }
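    As a concrete illustration of the counts checked above (hypothetical shapes, plain C++ rather than Caffe code): for a prediction blob of shape (N, C, H, W) = (2, 5, 3, 4) with softmax_axis_ = 1, the label blob must contain 2 * 12 = 24 values, and the prediction for sample i, class c and spatial offset j lives at index i * dim + c * inner_num + j.

    #include <cstdio>

    int main() {
      int shape[4] = {2, 5, 3, 4};           // (N, C, H, W), softmax axis = 1
      int outer_num = shape[0];              // count(0, 1)     = 2  (N)
      int inner_num = shape[2] * shape[3];   // count(2)        = 12 (H*W)
      int dim       = shape[1] * inner_num;  // count()/outer   = 60 (C*H*W)
      std::printf("labels required: %d\n", outer_num * inner_num);               // 24
      std::printf("index of (i=1, c=3, j=7): %d\n", 1 * dim + 3 * inner_num + 7); // 103
      return 0;
    }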
    
    template <typename Dtype>
    Dtype SoftmaxWithLossLayer<Dtype>::get_normalizer(
        LossParameter_NormalizationMode normalization_mode, int valid_count) {  //compute the normalization factor from the normalization mode and the number of valid samples
      Dtype normalizer;               //normalization factor
      switch (normalization_mode) {   //normalization mode
        case LossParameter_NormalizationMode_FULL:
          normalizer = Dtype(outer_num_ * inner_num_);  //FULL mode: the factor is the outer count times the inner count
          break;
        case LossParameter_NormalizationMode_VALID:     //VALID mode: the factor is the number of valid samples
          if (valid_count == -1) {                      //valid_count == -1 means all samples are valid, equivalent to FULL mode
            normalizer = Dtype(outer_num_ * inner_num_);
          } else {
            normalizer = Dtype(valid_count);
          }
          break;
        case LossParameter_NormalizationMode_BATCH_SIZE:  //BATCH_SIZE mode: the factor is the outer count
          normalizer = Dtype(outer_num_);
          break;
        case LossParameter_NormalizationMode_NONE:        //NONE mode: the factor is 1
          normalizer = Dtype(1);
          break;
        default:
          LOG(FATAL) << "Unknown normalization mode: "
              << LossParameter_NormalizationMode_Name(normalization_mode);
      }
      // Some users will have no labels for some examples in order to 'turn off' a
      // particular loss in a multi-task setup. The max prevents NaNs in that case.
      return std::max(Dtype(1.0), normalizer);    //guard against a valid count of 0 by clamping to a minimum of 1
    }
    
    template <typename Dtype>
    void SoftmaxWithLossLayer<Dtype>::Forward_cpu(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {    //forward pass
      // The forward pass computes the softmax prob values.
      softmax_layer_->Forward(softmax_bottom_vec_, softmax_top_vec_); //first run the Softmax layer's forward pass
      //If the input blob has shape (N, C, H, W) and the softmax is along axis 1, the Softmax output prob_
      //also has shape (N, C, H, W), the number of labels is N * H * W, outer_num_ = N, inner_num_ = H * W
      const Dtype* prob_data = prob_.cpu_data();    //output of the Softmax layer
      const Dtype* label = bottom[1]->cpu_data();   //label data
      int dim = prob_.count() / outer_num_;         // C * H * W
      int count = 0;          //number of valid samples
      Dtype loss = 0;
      for (int i = 0; i < outer_num_; ++i) {
        for (int j = 0; j < inner_num_; j++) {
          const int label_value = static_cast<int>(label[i * inner_num_ + j]);  //label of the sample at position (i, j)
          if (has_ignore_label_ && label_value == ignore_label_) {
            continue;       //an ignore label was specified and this label matches it, so skip the sample
          }
          DCHECK_GE(label_value, 0);    //check that the label is not negative
          DCHECK_LT(label_value, prob_.shape(softmax_axis_)); //check that the label is less than the number of classes
          //fetch the predicted probability of the sample at (i, j) for class label_value and accumulate the loss
          loss -= log(std::max(prob_data[i * dim + label_value * inner_num_ + j], Dtype(FLT_MIN)));
          ++count;    //increment the valid-sample count
        }
      }
      top[0]->mutable_cpu_data()[0] = loss / get_normalizer(normalization_, count); //divide by the normalization factor to get the final loss
      if (top.size() == 2) {
        top[1]->ShareData(prob_);   //expose the Softmax layer's output as this layer's top[1]
      }
    }
    
    template <typename Dtype>
    void SoftmaxWithLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
        const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
      if (propagate_down[1]) {  //again, backpropagation to the label blob is not allowed
        LOG(FATAL) << this->type() << " Layer cannot backpropagate to label inputs.";
      }
      if (propagate_down[0]) {  //backpropagation to the prediction blob is allowed
        Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();   //gradient (diff) pointer of the prediction blob
        const Dtype* prob_data = prob_.cpu_data();            //data pointer of the Softmax layer's output
        caffe_copy(prob_.count(), prob_data, bottom_diff);    //bottom_diff = prob_data
        const Dtype* label = bottom[1]->cpu_data();           //data pointer of the label blob
        int dim = prob_.count() / outer_num_;                 // C * H * W
        int count = 0;                                        //number of valid samples
        for (int i = 0; i < outer_num_; ++i) {
          for (int j = 0; j < inner_num_; ++j) {
            const int label_value = static_cast<int>(label[i * inner_num_ + j]);  //label of the sample at position (i, j)
            if (has_ignore_label_ && label_value == ignore_label_) {    //this label should be ignored
              for (int c = 0; c < bottom[0]->shape(softmax_axis_); ++c) {   //each value along dimension C
                bottom_diff[i * dim + c * inner_num_ + j] = 0;  //zero the gradient for every class at position (i, j)
              }
            } else {
              bottom_diff[i * dim + label_value * inner_num_ + j] -= 1; //subtract 1 only at class label_value along dimension C
              ++count;
            }
          }
        }
        // Scale gradient
        Dtype loss_weight = top[0]->cpu_diff()[0] / get_normalizer(normalization_, count);    //scale factor
        caffe_scal(prob_.count(), loss_weight, bottom_diff);  //bottom_diff *= loss_weight
      }
    }
    

    SigmoidCrossEntropyLossLayer class overview

    SigmoidCrossEntropyLossLayer computes the cross-entropy loss for multi-label binary classification: each sample may carry multiple labels, but each label takes only the two classes 0 and 1.

    1. The first input blob holds the network's predictions, with shape \(N \times C \times H \times W\) and values \(x_n \in (-\infty, +\infty)\). The loss is computed from their sigmoid values, used as probabilities: \(\hat{p}_n = \sigma(x_n)\)
    2. The second input blob holds the targets, with shape \(N \times C \times H \times W\) and values \(p_n \in [0, 1]\)
    • Again, the code does not strictly require the prediction and target blobs to have the same shape; it only requires that they agree on dimension 0 and on the total element count.
    3. In the forward pass, the loss is computed as: \(E = -\frac{1}{N} \sum\limits_{n=1}^N \left[p_n \log \hat{p}_n + (1 - p_n) \log(1 - \hat{p}_n)\right]\)
    • \(N\) is normalizer_ in the source below
    • Note that the code rewrites the formula slightly so that \(e^{-x}\) cannot overflow during the computation (see the sketch after this list):
    • \(E=-\frac{1}{N} \sum\limits_{n=1}^N [p_n \log \sigma(x_n) + (1 - p_n) \log(1 - \sigma(x_n))]=-\frac{1}{N} \sum\limits_{n=1}^N [x_n (p_n-1)+\log \sigma(x_n)] =\left\{\begin{matrix} -\frac{1}{N} \sum\limits_{n=1}^N [x_n (p_n-1)-\log (1+e^{-x_n})] & x_n \geqslant 0\\ -\frac{1}{N} \sum\limits_{n=1}^N [x_n p_n-\log (1+e^{x_n})] & x_n<0 \end{matrix}\right.\)
    4. In the backward pass, the gradient with respect to the prediction blob is: \(\frac{\partial J}{\partial {x_n}} = \frac{\partial J}{\partial E}*\frac{\partial E}{\partial {x_n}}=\frac{1}{N}*\frac{\partial J}{\partial E}*(\sigma(x_n)-p_n)\)
    • \(J\) is the loss of the whole network, and \(\frac{\partial J}{\partial E}\) is top[0]->cpu_diff()[0] in the code
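    A minimal standalone sketch of the numerically stable rewrite above (plain C++ rather than Caffe code; element_loss() is a made-up helper that mirrors the expression used in Forward_cpu below):

    #include <cmath>
    #include <cstdio>

    // cross-entropy of one logit x against a target p in [0, 1]
    float element_loss(float x, float p) {
      if (x >= 0) {
        return -(x * (p - 1) - std::log(1 + std::exp(-x)));   // exp(-x) <= 1 here
      } else {
        return -(x * p - std::log(1 + std::exp(x)));          // exp(x)  <  1 here
      }
    }

    int main() {
      // both branches agree with -[p*log(sigmoid(x)) + (1-p)*log(1-sigmoid(x))]
      std::printf("%f\n", element_loss( 2.0f, 1.0f));  // ~0.1269
      std::printf("%f\n", element_loss(-3.0f, 0.0f));  // ~0.0486
      return 0;
    }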

    sigmoid_cross_entropy_loss_layer.cpp source

    template <typename Dtype>
    void SigmoidCrossEntropyLossLayer<Dtype>::LayerSetUp(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
      LossLayer<Dtype>::LayerSetUp(bottom, top);    //base class LayerSetUp(): set the top blob's loss weight
      sigmoid_bottom_vec_.clear();
      sigmoid_bottom_vec_.push_back(bottom[0]);     //clear, then push the predictions as the SigmoidLayer's input
      sigmoid_top_vec_.clear();
      sigmoid_top_vec_.push_back(sigmoid_output_.get());  //clear, then push sigmoid_output_ as the SigmoidLayer's output
      sigmoid_layer_->SetUp(sigmoid_bottom_vec_, sigmoid_top_vec_); //set up the internal SigmoidLayer and shape its output blob
    
      has_ignore_label_ = this->layer_param_.loss_param().has_ignore_label();   //whether an ignore label was specified
      if (has_ignore_label_) {
        ignore_label_ = this->layer_param_.loss_param().ignore_label();   //store the ignore label in the current layer
      }
      if (this->layer_param_.loss_param().has_normalization()) {          //normalization mode was set explicitly
        normalization_ = this->layer_param_.loss_param().normalization(); //store it in the current layer
      } else if (this->layer_param_.loss_param().has_normalize()) {       //normalize (old style) was set instead
        // normalize == true means VALID normalization, false means BATCH_SIZE normalization
        normalization_ = this->layer_param_.loss_param().normalize() ?
                         LossParameter_NormalizationMode_VALID :
                         LossParameter_NormalizationMode_BATCH_SIZE;
      } else {
        //default to BATCH_SIZE; only SigmoidCrossEntropyLoss defaults to BATCH_SIZE, the other loss layers default to VALID
        normalization_ = LossParameter_NormalizationMode_BATCH_SIZE;
      }
    }
    
    template <typename Dtype>
    void SigmoidCrossEntropyLossLayer<Dtype>::Reshape(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
      LossLayer<Dtype>::Reshape(bottom, top);   //base class Reshape(): check the input shapes and reshape the top blob
      outer_num_ = bottom[0]->shape(0);         //outer count N, the batch size
      inner_num_ = bottom[0]->count(1);         //inner count C*H*W, the instance size: |output| == |target|
      CHECK_EQ(bottom[0]->count(), bottom[1]->count()) <<
          "SIGMOID_CROSS_ENTROPY_LOSS layer inputs must have the same count.";  //check that the prediction and target blobs have the same total count
      sigmoid_layer_->Reshape(sigmoid_bottom_vec_, sigmoid_top_vec_); //call the SigmoidLayer's Reshape() to adjust its shapes
    }
    
    // TODO(shelhamer) loss normalization should be pulled up into LossLayer,
    // instead of duplicated here and in SoftMaxWithLossLayer
    template <typename Dtype>
    Dtype SigmoidCrossEntropyLossLayer<Dtype>::get_normalizer(    //compute the normalization factor from the normalization mode and the number of valid samples
        LossParameter_NormalizationMode normalization_mode, int valid_count) {
      Dtype normalizer;
      switch (normalization_mode) {   //normalization mode
        case LossParameter_NormalizationMode_FULL:
          normalizer = Dtype(outer_num_ * inner_num_);  //FULL mode: the factor is the outer count times the inner count
          break;
        case LossParameter_NormalizationMode_VALID:     //VALID mode: the factor is the number of valid samples
          if (valid_count == -1) {                      //valid_count == -1 means all samples are valid, equivalent to FULL mode
            normalizer = Dtype(outer_num_ * inner_num_);
          } else {
            normalizer = Dtype(valid_count);
          }
          break;
        case LossParameter_NormalizationMode_BATCH_SIZE:  //BATCH_SIZE mode: the factor is the outer count
          normalizer = Dtype(outer_num_);
          break;
        case LossParameter_NormalizationMode_NONE:        //NONE mode: the factor is 1
          normalizer = Dtype(1);
          break;
        default:    //any other mode is an error
          LOG(FATAL) << "Unknown normalization mode: " << LossParameter_NormalizationMode_Name(normalization_mode);
      }
      // Some users will have no labels for some examples in order to 'turn off' a
      // particular loss in a multi-task setup. The max prevents NaNs in that case.
      return std::max(Dtype(1.0), normalizer);  //clamp to a minimum of 1; some samples may have no labels, so valid_count could be 0
    }
    
    template <typename Dtype>
    void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
      // The forward pass computes the sigmoid outputs.
      sigmoid_bottom_vec_[0] = bottom[0];      //use the predictions as the SigmoidLayer's input
      sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_); //compute the SigmoidLayer's output sigmoid_top_vec_
      // Compute the loss (negative log likelihood)
      // Stable version of loss computation from input data
      const Dtype* input_data = bottom[0]->cpu_data();    //predictions
      const Dtype* target = bottom[1]->cpu_data();        //targets
      int valid_count = 0;    //number of valid samples
      Dtype loss = 0;
      for (int i = 0; i < bottom[0]->count(); ++i) {
        const int target_value = static_cast<int>(target[i]);   //target of the i-th element
        if (has_ignore_label_ && target_value == ignore_label_) {   //an ignore label was specified and this target matches it
          continue;           //skip it
        }
        //with x = input_data[i]: when x < 0, loss -= x*p - log(1+exp(x));
        //when x >= 0, loss -= x*(p-1) - log(1+exp(-x)), which keeps exp() from overflowing
        loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
            log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
        ++valid_count;    //increment the valid-sample count
      }
      normalizer_ = get_normalizer(normalization_, valid_count);  //compute the normalization factor
      top[0]->mutable_cpu_data()[0] = loss / normalizer_;         //divide by the normalization factor to get the final loss
    }
    
    template <typename Dtype>
    void SigmoidCrossEntropyLossLayer<Dtype>::Backward_cpu(
        const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down,
        const vector<Blob<Dtype>*>& bottom) {
      if (propagate_down[1]) {    //again, backpropagation to the label blob is not allowed
        LOG(FATAL) << this->type() << " Layer cannot backpropagate to label inputs.";
      }
      if (propagate_down[0]) {    //the prediction blob needs its gradient
        // First, compute the diff
        const int count = bottom[0]->count();                 //number of elements
        const Dtype* sigmoid_output_data = sigmoid_output_->cpu_data(); //the SigmoidLayer's output σ(x)
        const Dtype* target = bottom[1]->cpu_data();          //targets
        Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();   //gradient (diff) pointer of the prediction blob
        caffe_sub(count, sigmoid_output_data, target, bottom_diff); //bottom_diff = sigmoid_output_data - target
        // Zero out gradient of ignored targets.
        if (has_ignore_label_) {    //an ignore label was specified
          for (int i = 0; i < count; ++i) {
            const int target_value = static_cast<int>(target[i]); //target of the i-th element
            if (target_value == ignore_label_) {  //this target should be ignored
              bottom_diff[i] = 0;   //zero its gradient
            }
          }
        }
        // Scale down gradient
        Dtype loss_weight = top[0]->cpu_diff()[0] / normalizer_;    //scale factor
        caffe_scal(count, loss_weight, bottom_diff);  //bottom_diff *= loss_weight
      }
    }
    

    Summary

    • Gradient accumulation applies only to a layer's parameter blobs; the gradients of a layer's bottom and top blobs are not accumulated.

    References

    http://freemind.pluskid.org/machine-learning/softmax-vs-softmax-loss-numerical-stability/
    This is my first pass through the Caffe source code, and I took notes as I read, so my understanding and analysis may contain mistakes or omissions. Corrections from readers are welcome. Thank you for your support!
