• Tricks and Tips for Training Neural Networks


    Sec. 1: Data Augmentation

    • horizontal flipping, random crops, and color jittering (a sketch follows after this list)
    • fancy PCA: shift RGB channel intensities along their principal components, as in AlexNet
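
    A minimal NumPy sketch of random crops plus horizontal flips (the crop size and the function name are illustrative, not from the post):

    import numpy as np

    def augment(img, crop=24):
        h, w, _ = img.shape                          # img: H x W x C array
        top = np.random.randint(0, h - crop + 1)     # pick a random crop window
        left = np.random.randint(0, w - crop + 1)
        out = img[top:top + crop, left:left + crop]
        if np.random.rand() < 0.5:                   # flip horizontally half the time
            out = out[:, ::-1]
        return out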

    Sec. 2: Pre-Processing

    • zero-center and normalize
    X -= np.mean(X, axis=0)  # zero-center: subtract the per-feature mean
    X /= np.std(X, axis=0)   # normalize: divide by the per-feature standard deviation
    
    • PCA Whitening
    X -= np.mean(X, axis=0)            # zero-center the data
    cov = np.dot(X.T, X) / X.shape[0]  # data covariance matrix

    U, S, V = np.linalg.svd(cov)       # SVD of the covariance matrix
    Xrot = np.dot(X, U)                # decorrelate: rotate the data into the eigenbasis

    Xwhite = Xrot / np.sqrt(S + 1e-5)  # whiten: scale each dimension by 1/sqrt(eigenvalue); 1e-5 avoids division by zero

    

    Sec. 3: Initialization

    • All-Zero Initialization

      If every weight starts at zero, all neurons compute the same output and receive the same gradient, so they remain identical throughout training and never break symmetry.

    • Initialization with Small Random Numbers

      weights ~ 0.001 * N(0,1)
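
      A minimal sketch of this rule for one fully connected layer (fan_in and fan_out are illustrative layer sizes, not values from the post):

    import numpy as np

    fan_in, fan_out = 784, 100                      # hypothetical layer sizes
    W = 0.001 * np.random.randn(fan_in, fan_out)    # small zero-mean Gaussian weights
    b = np.zeros(fan_out)                           # biases can safely start at zero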

    • Calibrating the Variances

      the output of a randomly initialized neuron has a variance that grows with the number of inputs:

      \[
      \begin{align}
      \mathrm{Var}(X) &= E(X^2) - E(X)^2 \\
      \mathrm{Var}(s) &= \mathrm{Var}\left(\sum_{i=1}^{n} w_i x_i\right) \\
      &= \sum_{i=1}^{n} \left[ E(w_i^2 x_i^2) - E(w_i x_i)^2 \right] \\
      &= \sum_{i=1}^{n} \left[ E(w_i^2)\,E(x_i^2) - E(w_i x_i)^2 \right] \\
      &= \sum_{i=1}^{n} \left[ \mathrm{Var}(w_i)\mathrm{Var}(x_i) - 2E(w_i x_i)^2 + E(w_i^2) E(x_i)^2 + E(w_i)^2 E(x_i^2) \right] \\
      &= \sum_{i=1}^{n} \mathrm{Var}(w_i)\,\mathrm{Var}(x_i) \qquad \text{(assuming } E(w_i) = E(x_i) = 0\text{)} \\
      &= n\,\mathrm{Var}(w)\,\mathrm{Var}(x)
      \end{align}
      \]

    w = np.random.randn(n) / np.sqrt(n)  # n: the number of inputs (fan-in)
    
    • Current Recommendation

    an initialization specifically for ReLUs (He initialization):

    w = np.random.randn(n) * np.sqrt(2.0 / n)  # n: the number of inputs (fan-in)
    

    Sec. 4: During Training

    • Learning rate: when the validation error stops improving, divide the learning rate by 2 (or by 5)
    • Fine-tune pre-trained models on your own data (a PyTorch sketch follows after this table):

                              | very similar dataset                  | very different dataset
      very little data        | linear classifier on the top layer    | linear classifier from different stages
      quite a lot of data     | finetune a few layers                 | finetune a large number of layers
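
    A minimal fine-tuning sketch for the "very little data, very similar dataset" case, assuming a recent PyTorch/torchvision; the backbone choice, num_classes, and the schedule values are illustrative, not from the original post:

    import torch
    import torch.nn as nn
    from torchvision import models

    num_classes = 10                                           # hypothetical number of target classes
    model = models.resnet18(weights="IMAGENET1K_V1")           # any pretrained backbone would do
    for param in model.parameters():
        param.requires_grad = False                            # freeze the pretrained weights
    model.fc = nn.Linear(model.fc.in_features, num_classes)    # new linear classifier on top

    optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)  # halve the LR periodically

    For the "finetune a few layers" case, one would instead leave the last block(s) unfrozen and pass their parameters to the optimizer as well.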

    Sec. 5: Activation Functions

    • Sigmoid

      Cons: sigmoids saturate and kill gradients, and their outputs are not zero-centered

    • tanh

      Cons: like the sigmoid, it saturates and kills gradients (though its outputs are zero-centered)

    • Rectified Linear Unit

      Pros: computationally cheap and non-saturating in its positive half
      Cons: units can irreversibly "die" during training (the dying ReLU problem)

    • Leaky ReLU: uses a small fixed negative slope (e.g. 0.01) instead of zero for x < 0 (see the sketch after this list)

    • Parametric ReLU: the negative slope is a learnable parameter

    • Randomized ReLU: the negative slope is sampled randomly during training and fixed to its expectation at test time
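
    A minimal NumPy sketch of ReLU and Leaky ReLU (the function names and the 0.01 slope are illustrative):

    import numpy as np

    def relu(x):
        return np.maximum(0, x)                  # zero for negative inputs, identity otherwise

    def leaky_relu(x, alpha=0.01):               # alpha: small fixed negative slope
        return np.where(x > 0, x, alpha * x)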

    Sec. 6: Regularization

    • L2 regularization: heavily penalizes peaky weight vectors and prefers diffuse ones
    • L1 regularization: drives many weights to exactly zero, effectively performing feature selection
    • Max-norm constraints: enforce an absolute upper bound on the magnitude of each weight vector
    • Dropout: randomly zero out units while training; it can be interpreted as sampling a sub-network from within the full network (see the sketch after this list)
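
    A minimal sketch of inverted dropout on a layer of hidden activations (p and the function name are illustrative, not from the post):

    import numpy as np

    p = 0.5                                            # probability of keeping a unit active

    def dropout_forward(h, train=True):
        if not train:
            return h                                   # inverted dropout needs no rescaling at test time
        mask = (np.random.rand(*h.shape) < p) / p      # drop units and rescale the survivors in one step
        return h * mask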

    Sec. 7: Insights from Figures

    • Loss curve: a roughly linear decrease suggests the learning rate is too low; a loss that drops quickly but then barely improves suggests the learning rate is too high
    • Train/validation accuracy curves: a large gap between them means overfitting, so increase regularization (or collect more data); no gap means underfitting, so increase the model capacity

    Sec. 8: Ensemble

    • Same model, different initialization
    • Top models discovered during cross-validation
    • Different checkpoints of a single model
    • early fusion & late fusion (e.g. averaging the predictions of several models; see the sketch below)
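
    A minimal sketch of late fusion by averaging predicted class probabilities; predict_proba is an assumed interface on the trained models, not something from the original post:

    import numpy as np

    def ensemble_predict(models, X):
        probs = np.mean([m.predict_proba(X) for m in models], axis=0)  # average the probabilities
        return np.argmax(probs, axis=1)                                # pick the consensus class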
  • Original article: https://www.cnblogs.com/blueprintf/p/8779918.html