• jrae Source Code Analysis (Part 2)


    This post walks through the RAECost and SoftmaxCost classes introduced in the previous post.

    SoftmaxCost

    As established earlier, given the features and labels (with the hyperparameters fixed), the SoftmaxCost class measures the cost of a given weight matrix (hidden × catSize) and produces the corresponding weight gradient. Here is the code:

    @Override
        public double valueAt(double[] x)
        {
            if( !requiresEvaluation(x) )
                return value;
            int numDataItems = Features.columns;
             
            int[] requiredRows = ArraysHelper.makeArray(0, CatSize-2);
            ClassifierTheta Theta = new ClassifierTheta(x,FeatureLength,CatSize);
            DoubleMatrix Prediction = getPredictions (Theta, Features);
             
            double MeanTerm = 1.0 / (double) numDataItems;
            double Cost = getLoss (Prediction, Labels).sum() * MeanTerm;
            double RegularisationTerm = 0.5 * Lambda * DoubleMatrixFunctions.SquaredNorm(Theta.W);
             
            DoubleMatrix Diff = Prediction.sub(Labels).muli(MeanTerm);
            DoubleMatrix Delta = Features.mmul(Diff.transpose());
         
            DoubleMatrix gradW = Delta.getColumns(requiredRows);
            DoubleMatrix gradb = ((Diff.rowSums()).getRows(requiredRows));
             
            //Regularizing. Bias does not have one.
            gradW = gradW.addi(Theta.W.mul(Lambda));
             
            Gradient = new ClassifierTheta(gradW,gradb);
            value = Cost + RegularisationTerm;
            gradient = Gradient.Theta;
            return value;
        }

    public DoubleMatrix getPredictions (ClassifierTheta Theta, DoubleMatrix Features)
    {
        int numDataItems = Features.columns;
        DoubleMatrix Input = ((Theta.W.transpose()).mmul(Features)).addColumnVector(Theta.b);
        Input = DoubleMatrix.concatVertically(Input, DoubleMatrix.zeros(1,numDataItems));
        return Activation.valueAt(Input);
    }

     This is a typical two-layer neural network (no hidden layer): it predicts labels from the features, normalizes the predictions with softmax, and then back-propagates the error to obtain the weight gradients.


    In this network the label is a one-hot column vector: the entry for the target label is 1 and all others are 0. The transfer function is softmax, so the output is a probability for each label.
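    Note the zero row that getPredictions appends: only CatSize-1 weight columns are free (requiredRows runs over 0..CatSize-2), and the last class acts as a fixed reference whose activation is pinned to zero. For intuition, here is a minimal standalone sketch of a column-wise softmax in jblas. It is an illustration only; SoftmaxSketch and softmaxColumns are my own names, not jrae's actual Activation class:

    import org.jblas.DoubleMatrix;
    import org.jblas.MatrixFunctions;

    public class SoftmaxSketch
    {
        // Column-wise softmax: each column is one sample, each row one label.
        public static DoubleMatrix softmaxColumns(DoubleMatrix input)
        {
            // Subtract each column's max before exp() for numerical stability.
            DoubleMatrix shifted = input.subRowVector(input.columnMaxs());
            DoubleMatrix exp = MatrixFunctions.exp(shifted);
            return exp.divRowVector(exp.columnSums());
        }

        public static void main(String[] args)
        {
            // Three labels (rows) x two samples (columns); the all-zero last
            // row mimics the reference row appended in getPredictions.
            DoubleMatrix input = new DoubleMatrix(new double[][] {
                { 1.0, -0.5 },
                { 2.0,  0.5 },
                { 0.0,  0.0 } });
            System.out.println(softmaxColumns(input)); // every column sums to 1
        }
    }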

    The loss is computed by getLoss. If the predicted probability of the target label is p, the cost (error function) for a single sample is the negative log-likelihood:

    $\mathrm{cost} = E(p) = -\log(p)$

    Applying the back-propagation derivation from before ($\mathrm{label}_j$ is 1 when $j$ is the target label and 0 otherwise), we get:

    $\dfrac{\partial E}{\partial w_{ij}} = \dfrac{\partial E}{\partial p_j}\cdot\dfrac{\partial p_j}{\partial net_j}\cdot x_i = -\dfrac{1}{p_j}\cdot p_j(1-p_j)\cdot x_i = (p_j-1)\,x_i = (p_j-\mathrm{label}_j)\cdot\mathrm{feature}_i$

    (The middle steps are for the target label; for the other labels the softmax cross-terms give $\partial E/\partial net_j = p_j$, which the unified form $(p_j-\mathrm{label}_j)$ also covers.)

    With Diff = Prediction.sub(Labels) playing the role of (p - label), this is exactly what the following line computes:

    DoubleMatrix Delta = Features.mmul(Diff.transpose());
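    Dimension-wise, Features is FeatureLength x numDataItems and Diff is CatSize x numDataItems, so Delta = Features * Diff^T is FeatureLength x CatSize: one matrix multiply accumulates the per-sample outer products feature * (p - label)^T over the whole batch. A small jblas sketch (the sizes are made up for illustration) confirming the equivalence:

    import org.jblas.DoubleMatrix;

    public class DeltaSketch
    {
        public static void main(String[] args)
        {
            int featureLength = 4, catSize = 3, numDataItems = 5;
            DoubleMatrix features = DoubleMatrix.randn(featureLength, numDataItems);
            DoubleMatrix diff = DoubleMatrix.randn(catSize, numDataItems); // stands in for p - label

            // Batched form, exactly as in SoftmaxCost:
            DoubleMatrix delta = features.mmul(diff.transpose());

            // Equivalent per-sample accumulation of outer products:
            DoubleMatrix acc = DoubleMatrix.zeros(featureLength, catSize);
            for (int n = 0; n < numDataItems; n++)
                acc.addi(features.getColumn(n).mmul(diff.getColumn(n).transpose()));

            System.out.println(delta.sub(acc).normmax()); // ~0 up to rounding
        }
    }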

     

    RAECost

    First, the implementation:

    @Override
        public double valueAt(double[] x)
        {
            if(!requiresEvaluation(x))
                return value;
             
            Theta Theta1 = new Theta(x,hiddenSize,visibleSize,dictionaryLength);
            FineTunableTheta Theta2 = new FineTunableTheta(x,hiddenSize,visibleSize,catSize,dictionaryLength);
            Theta2.setWe( Theta2.We.add(WeOrig) );
             
            final RAEClassificationCost classificationCost = new RAEClassificationCost(
                    catSize, AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, Theta2);
            final RAEFeatureCost featureCost = new RAEFeatureCost(
                    AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, WeOrig, Theta1);
         
            Parallel.For(DataCell,
                new Parallel.Operation<LabeledDatum<Integer,Integer>>() {
                    public void perform(int index, LabeledDatum<Integer,Integer> Data)
                    {
                        try {
                            LabeledRAETree Tree = featureCost.Compute(Data);
                            classificationCost.Compute(Data, Tree);                
                        } catch (Exception e) {
                            System.err.println(e.getMessage());
                        }
                    }
            });
             
            double costRAE = featureCost.getCost();
            double[] gradRAE = featureCost.getGradient().clone();
                 
            double costSUP = classificationCost.getCost();
            gradient = classificationCost.getGradient();
                 
            value = costRAE + costSUP;
            for(int i=0; i<gradRAE.length; i++)
                gradient[i] += gradRAE[i];
             
            System.gc();    System.gc();
            System.gc();    System.gc();
            System.gc();    System.gc();
            System.gc();    System.gc();
             
            return value;
        }

    The cost has two parts: featureCost and classificationCost. The program iterates over the samples in parallel; for each one, featureCost.Compute(Data) builds a recursive (RAE) tree while accumulating its cost and gradient, and classificationCost.Compute(Data, Tree) then computes and accumulates cost and gradient over that tree. The key classes are therefore RAEFeatureCost and RAEClassificationCost.
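    For context, jrae implements the semi-supervised recursive autoencoder of Socher et al. (EMNLP 2011), where the training objective blends reconstruction error and supervised cross-entropy error, roughly

    $J = \alpha\,E_{rec} + (1-\alpha)\,E_{ce} + \text{regularization}$

    which is presumably what the AlphaCat parameter passed to both cost classes controls; the weighting itself appears to be applied inside RAEFeatureCost and RAEClassificationCost, while valueAt simply sums the two accumulated parts.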

    In its Compute method, RAEFeatureCost calls RAEPropagation's ForwardPropagate to build the tree and then BackPropagate to compute and accumulate the gradient. The details of that algorithm are left to the next installment.
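    Since both cost classes expose a value plus a gradient through the same valueAt(double[]) interface, a finite-difference check is a handy way to validate the back-propagation before digging into it. A hypothetical sketch (the CostFunction interface below is my own minimal stand-in, not jrae's actual type):

    public class GradCheckSketch
    {
        interface CostFunction
        {
            double valueAt(double[] x); // also refreshes the cached gradient
            double[] getGradient();
        }

        // Returns the largest gap between analytic and numeric derivatives;
        // it should be tiny (~1e-6) if the back-propagation is correct.
        static double maxGradError(CostFunction f, double[] x, double eps)
        {
            f.valueAt(x);
            double[] analytic = f.getGradient().clone();
            double worst = 0;
            for (int i = 0; i < x.length; i++)
            {
                double[] plus = x.clone(), minus = x.clone();
                plus[i] += eps;
                minus[i] -= eps;
                double numeric = (f.valueAt(plus) - f.valueAt(minus)) / (2 * eps);
                worst = Math.max(worst, Math.abs(numeric - analytic[i]));
            }
            return worst;
        }
    }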

• Original (Chinese): https://www.cnblogs.com/mfrbuaa/p/5344125.html