Reposted from: http://handong1587.github.io/deep_learning/2015/10/09/training-dnn.html
Training Deep Neural Networks
Tutorials
Popular Training Approaches of DNNs — A Quick Overview
Activation functions
Rectified linear units improve restricted boltzmann machines (ReLU)
Rectifier Nonlinearities Improve Neural Network Acoustic Models (leaky-ReLU, aka LReLU)
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (PReLU)
- keywords: PReLU, Caffe “msra” weights initialization
- arXiv: http://arxiv.org/abs/1502.01852
Empirical Evaluation of Rectified Activations in Convolutional Network (ReLU/LReLU/PReLU/RReLU)
Deep Learning with S-shaped Rectified Linear Activation Units (SReLU)
Parametric Activation Pools greatly increase performance and consistency in ConvNets
Noisy Activation Functions
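A minimal NumPy sketch of the rectifier variants listed above (ReLU, leaky ReLU, PReLU, RReLU); the slope constants and the RReLU sampling range are illustrative choices, not values taken from any one paper.

```python
import numpy as np

def relu(x):
    # max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small fixed slope for negative inputs
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # alpha is learned (one value per channel in the PReLU paper)
    return np.where(x > 0, x, alpha * x)

def rrelu(x, lower=0.125, upper=1.0 / 3, training=True, rng=np.random):
    # negative slope sampled uniformly at train time, fixed at test time
    if training:
        alpha = rng.uniform(lower, upper, size=x.shape)
    else:
        alpha = (lower + upper) / 2.0
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), leaky_relu(x), prelu(x, alpha=0.25), rrelu(x, training=False))
```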
Weights Initialization
An Explanation of Xavier Initialization
Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?
All you need is a good init
Data-dependent Initializations of Convolutional Neural Networks
What are good initial weights in a neural network?
- stackexchange: http://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
RandomOut: Using a convolutional gradient norm to win The Filter Lottery
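For reference, a rough NumPy sketch of the two initialization schemes referenced above: Xavier/Glorot initialization and the He ("msra" in Caffe) initialization introduced alongside PReLU. Fan-in/fan-out conventions differ between frameworks, so treat this as one common variant.

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=np.random):
    # Glorot & Bengio: keep activation variance roughly constant across layers,
    # uniform variant with variance 2 / (fan_in + fan_out)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def msra_init(fan_in, fan_out, rng=np.random):
    # He et al. ("msra" in Caffe): variance 2 / fan_in, derived for ReLU-like units
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W1 = xavier_init(784, 256)
W2 = msra_init(256, 10)
print(W1.std(), W2.std())
```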
Batch Normalization
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (ImageNet top-5 error: 4.82%)
- arXiv: http://arxiv.org/abs/1502.03167
- blog: https://standardfrancis.wordpress.com/2015/04/16/batch-normalization/
- notes: http://blog.csdn.net/happynear/article/details/44238541
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
- arxiv: http://arxiv.org/abs/1602.07868
- github(Lasagne): https://github.com/TimSalimans/weight_norm
- notes: http://www.erogol.com/my-notes-weight-normalization/
Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
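A minimal forward-pass sketch of batch normalization as described by Ioffe & Szegedy: normalize each feature over the mini-batch, then scale and shift with learned gamma and beta. The running-average bookkeeping used at inference time is omitted here.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch, features); normalize each feature over the mini-batch
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # learned scale and shift restore representational capacity
    return gamma * x_hat + beta

x = np.random.randn(32, 8) * 5.0 + 3.0
out = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))
```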
Loss Function
The Loss Surfaces of Multilayer Networks
Optimization Methods
On Optimization Methods for Deep Learning
On the importance of initialization and momentum in deep learning
Invariant backpropagation: how to train a transformation-invariant neural network
A practical theory for designing very deep convolutional neural network
- kaggle: https://www.kaggle.com/c/datasciencebowl/forums/t/13166/happy-lantern-festival-report-and-code/69284
- paper: https://kaggle2.blob.core.windows.net/forum-message-attachments/69182/2287/A%20practical%20theory%20for%20designing%20very%20deep%20convolutional%20neural%20networks.pdf?sv=2012-02-12&se=2015-12-05T15%3A40%3A02Z&sr=b&sp=r&sig=kfBQKduA1pDtu837Y9Iqyrp2VYItTV0HCgOeOok9E3E%3D
- slides: http://vdisk.weibo.com/s/3nFsznjLKn
Stochastic Optimization Techniques
- intro: SGD/Momentum/NAG/Adagrad/RMSProp/Adadelta/Adam/ESGD/Adasecant/vSGD/Rprop
- blog: http://colinraffel.com/wiki/stochastic_optimization_techniques
Alec Radford’s animations for optimization algorithms
- blog: http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html
Faster Asynchronous SGD (FASGD)
An overview of gradient descent optimization algorithms (★★★★★)
Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
Writing fast asynchronous SGD/AdaGrad with RcppParallel
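A compact sketch of two of the update rules surveyed above (SGD with classical momentum, and Adam); the hyperparameters are the usual defaults rather than recommendations from any single paper.

```python
import numpy as np

def sgd_momentum(w, grad, v, lr=0.01, mu=0.9):
    # classical momentum: accumulate a velocity, then step along it
    v = mu * v - lr * grad
    return w + v, v

def adam(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected first and second moment estimates of the gradient
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # t is the 1-based step count
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# toy quadratic: minimize 0.5 * ||w||^2, whose gradient is w
w = np.array([1.0, -2.0]); vel = np.zeros_like(w)
for _ in range(100):
    w, vel = sgd_momentum(w, grad=w, v=vel)
print(w)  # close to the minimizer at the origin
```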
Regularization
DisturbLabel: Regularizing CNN on the Loss Layer [University of California & MSR] (2016)
- intro: “an extremely simple algorithm which randomly replaces a part of labels as incorrect values in each iteration”
- paper: http://research.microsoft.com/en-us/um/people/jingdw/pubs/cvpr16-disturblabel.pdf
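As the intro line says, DisturbLabel corrupts a small fraction of the labels in every iteration; a minimal sketch of that corruption step (the noise rate alpha here is illustrative).

```python
import numpy as np

def disturb_label(labels, num_classes, alpha=0.1, rng=np.random):
    # with probability alpha, replace a label with one drawn uniformly at random
    labels = labels.copy()
    mask = rng.rand(len(labels)) < alpha
    labels[mask] = rng.randint(0, num_classes, size=mask.sum())
    return labels

y = np.array([3, 3, 7, 1, 0, 9, 2, 5])
print(disturb_label(y, num_classes=10, alpha=0.25))
```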
Dropout
Improving neural networks by preventing co-adaptation of feature detectors (Dropout)
Regularization of Neural Networks using DropConnect
- homepage: http://cs.nyu.edu/~wanli/dropc/
- gitxiv: http://gitxiv.com/posts/rJucpiQiDhQ7HkZoX/regularization-of-neural-networks-using-dropconnect
- github: https://github.com/iassael/torch-dropconnect
Regularizing neural networks with dropout and with DropConnect
Fast dropout training
- paper: http://jmlr.org/proceedings/papers/v28/wang13a.pdf
- github: https://github.com/sidaw/fastdropout
Dropout as data augmentation
- paper: http://arxiv.org/abs/1506.08700
- notes: https://www.evernote.com/shard/s189/sh/ef0c3302-21a4-40d7-b8b4-1c65b8ebb1c9/24ff553fcfb70a27d61ff003df75b5a9
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Improved Dropout for Shallow and Deep Learning
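A minimal sketch of (inverted) dropout as a random mask on activations, in the spirit of the Hinton et al. paper above; scaling by 1/keep_prob at training time keeps the expected activation unchanged, so nothing needs to be rescaled at test time.

```python
import numpy as np

def dropout(x, keep_prob=0.5, training=True, rng=np.random):
    # randomly zero units during training; identity at test time
    if not training:
        return x
    mask = (rng.rand(*x.shape) < keep_prob) / keep_prob
    return x * mask

h = np.random.randn(4, 6)
print(dropout(h, keep_prob=0.8))
```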
Gradient Descent
Fitting a model via closed-form equations vs. Gradient Descent vs. Stochastic Gradient Descent vs. Mini-Batch Learning. What is the difference? (Normal Equations vs. GD vs. SGD vs. MB-GD)
- blog: http://sebastianraschka.com/faq/docs/closed-form-vs-gd.html
An Introduction to Gradient Descent in Python
Train faster, generalize better: Stability of stochastic gradient descent
A Variational Analysis of Stochastic Gradient Algorithms
The vanishing gradient problem: Oh no — an obstacle to deep learning!
Gradient Descent For Machine Learning
- blog: http://machinelearningmastery.com/gradient-descent-for-machine-learning/
Revisiting Distributed Synchronous SGD
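To make the closed-form vs. GD vs. SGD vs. mini-batch distinction above concrete, a toy linear-regression comparison of the normal-equation solution against mini-batch gradient descent (learning rate and batch size are arbitrary).

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.randn(200)

# closed form (normal equations): solve X^T X w = X^T y
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# mini-batch gradient descent on the mean squared loss
w = np.zeros(3)
lr, batch = 0.1, 32
for epoch in range(50):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad

print(w_closed.round(3), w.round(3))  # both close to true_w
```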
Accelerate Training
Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices
Image Data Augmentation
DataAugmentation ver1.0: Image data augmentation tool for training image recognition algorithms
Caffe-Data-Augmentation: a Caffe branch with data augmentation support, using a configurable stochastic combination of 7 data augmentation techniques
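A rough NumPy sketch of two stochastic augmentations such tools typically combine (random crop and random horizontal flip); the crop size and flip probability are illustrative.

```python
import numpy as np

def augment(img, crop=24, flip_prob=0.5, rng=np.random):
    # img: (H, W, C) array; random crop, then random horizontal flip
    h, w, _ = img.shape
    top = rng.randint(0, h - crop + 1)
    left = rng.randint(0, w - crop + 1)
    out = img[top:top + crop, left:left + crop]
    if rng.rand() < flip_prob:
        out = out[:, ::-1]
    return out

img = np.random.rand(32, 32, 3)
print(augment(img).shape)  # (24, 24, 3)
```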
Papers
Scalable and Sustainable Deep Learning via Randomized Hashing
Tools
pastalog: Simple, realtime visualization of neural network training performance
torch-pastalog: A Torch interface for pastalog - simple, realtime visualization of neural network training performance