Training Deep Neural Networks

2023-02-22

Reposted from: http://handong1587.github.io/deep_learning/2015/10/09/training-dnn.html

Published: 09 Oct 2015 | Category: deep_learning

Tutorials

Popular Training Approaches of DNNs — A Quick Overview

https://medium.com/@asjad/popular-training-approaches-of-dnns-a-quick-overview-26ee37ad7e96#.pqyo039bb

Activation functions

Rectified linear units improve restricted boltzmann machines (ReLU)

paper: http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_NairH10.pdf

Rectifier Nonlinearities Improve Neural Network Acoustic Models (leaky-ReLU, aka LReLU)

paper: http://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (PReLU)

keywords: PReLU, Caffe “msra” weights initialization
arXiv: http://arxiv.org/abs/1502.01852

Empirical Evaluation of Rectified Activations in Convolutional Network (ReLU/LReLU/PReLU/RReLU)

arXiv: http://arxiv.org/abs/1505.00853
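
The four rectifier variants compared in this paper differ only in how they treat negative inputs. A minimal NumPy sketch of that difference (the RReLU slope range follows common implementations and is illustrative, not taken from the paper):

```python
import numpy as np

def relu(x):
    # ReLU: zero out negative inputs
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU (LReLU): small fixed slope on the negative side
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # PReLU: the negative slope alpha is a learned parameter
    return np.where(x > 0, x, alpha * x)

def rrelu(x, lower=1/8, upper=1/3, training=True, rng=np.random):
    # RReLU: slope sampled uniformly during training,
    # fixed to the midpoint of the range at test time
    if training:
        alpha = rng.uniform(lower, upper, size=x.shape)
    else:
        alpha = (lower + upper) / 2.0
    return np.where(x > 0, x, alpha * x)
```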

Deep Learning with S-shaped Rectified Linear Activation Units (SReLU)

arxiv: http://arxiv.org/abs/1512.07030

Parametric Activation Pools greatly increase performance and consistency in ConvNets

blog: http://blog.claymcleod.io/2016/02/06/Parametric-Activation-Pools-greatly-increase-performance-and-consistency-in-ConvNets/

Noisy Activation Functions

arxiv: http://arxiv.org/abs/1603.00391

Weights Initialization

An Explanation of Xavier Initialization

blog: http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization
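
As the blog post explains, Xavier initialization scales the weight variance to the layer's fan-in (and fan-out, in the Glorot & Bengio form) so activation variance is roughly preserved across layers; the “msra” variant referenced in the PReLU entry above uses fan-in only, with a factor of 2 for ReLU layers. A minimal sketch for a fully connected layer:

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=np.random):
    # Glorot/Xavier: Var(W) = 2 / (fan_in + fan_out)
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def msra_init(fan_in, fan_out, rng=np.random):
    # He/"msra" initialization for ReLU layers: Var(W) = 2 / fan_in
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = xavier_init(784, 256)
```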

Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?

arxiv: http://arxiv.org/abs/1504.08291

All you need is a good init

arxiv: http://arxiv.org/abs/1511.06422
github: https://github.com/ducha-aiki/LSUVinit

Data-dependent Initializations of Convolutional Neural Networks

arxiv: http://arxiv.org/abs/1511.06856
github: https://github.com/philkr/magic_init

What are good initial weights in a neural network?

stackexchange: http://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network

RandomOut: Using a convolutional gradient norm to win The Filter Lottery

arxiv: http://arxiv.org/abs/1602.05931

Batch Normalization

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (ImageNet top-5 error: 4.82%)

arXiv: http://arxiv.org/abs/1502.03167
blog: https://standardfrancis.wordpress.com/2015/04/16/batch-normalization/
notes: http://blog.csdn.net/happynear/article/details/44238541
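
The core of the method is a per-mini-batch transform: normalize each activation to zero mean and unit variance, then apply a learned scale gamma and shift beta. A minimal NumPy sketch of the training-time forward pass (running statistics and the backward pass are omitted):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch, features); normalize each feature over the mini-batch
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # learned scale and shift restore representational capacity
    return gamma * x_hat + beta

x = np.random.randn(32, 4)
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```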

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

arxiv: http://arxiv.org/abs/1602.07868
github(Lasagne): https://github.com/TimSalimans/weight_norm
notes: http://www.erogol.com/my-notes-weight-normalization/
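
Weight normalization instead reparameterizes each weight vector as w = g * v / ||v||, decoupling its direction from its length so both can be optimized with plain gradient descent. A minimal sketch for a dense layer (the function and variable names are illustrative):

```python
import numpy as np

def weight_norm_dense(x, v, g, b):
    # w = g * v / ||v||, computed per output unit (columns of v)
    norm = np.linalg.norm(v, axis=0, keepdims=True)
    w = g * v / norm
    return x @ w + b

x = np.random.randn(8, 16)
v = np.random.randn(16, 4)
y = weight_norm_dense(x, v, g=np.ones(4), b=np.zeros(4))
```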

Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks

arxiv: http://arxiv.org/abs/1603.01431

Loss Function

The Loss Surfaces of Multilayer Networks

arxiv: http://arxiv.org/abs/1412.0233

Optimization Methods

On Optimization Methods for Deep Learning

paper: http://www.icml-2011.org/papers/210_icmlpaper.pdf

On the importance of initialization and momentum in deep learning

paper: http://jmlr.org/proceedings/papers/v28/sutskever13.pdf

Invariant backpropagation: how to train a transformation-invariant neural network

arxiv: http://arxiv.org/abs/1502.04434
github: https://github.com/sdemyanov/ConvNet

A practical theory for designing very deep convolutional neural network

kaggle: https://www.kaggle.com/c/datasciencebowl/forums/t/13166/happy-lantern-festival-report-and-code/69284
paper: https://kaggle2.blob.core.windows.net/forum-message-attachments/69182/2287/A%20practical%20theory%20for%20designing%20very%20deep%20convolutional%20neural%20networks.pdf?sv=2012-02-12&se=2015-12-05T15%3A40%3A02Z&sr=b&sp=r&sig=kfBQKduA1pDtu837Y9Iqyrp2VYItTV0HCgOeOok9E3E%3D
slides: http://vdisk.weibo.com/s/3nFsznjLKn

Stochastic Optimization Techniques

intro: SGD/Momentum/NAG/Adagrad/RMSProp/Adadelta/Adam/ESGD/Adasecant/vSGD/Rprop
blog: http://colinraffel.com/wiki/stochastic_optimization_techniques
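
The update rules surveyed in these posts all follow the same pattern of accumulating and rescaling gradients. A minimal sketch of two of them, SGD with classical momentum and Adam with its commonly used default hyperparameters (t is the 1-based step count):

```python
import numpy as np

def sgd_momentum(w, grad, v, lr=0.01, mu=0.9):
    # classical momentum: accumulate a velocity, step along it
    v = mu * v - lr * grad
    return w + v, v

def adam(w, grad, m, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected first and second moment estimates
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(s_hat) + eps)
    return w, m, s
```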

Alec Radford’s animations for optimization algorithms

http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html

Faster Asynchronous SGD (FASGD)

arxiv: http://arxiv.org/abs/1601.04033
github: https://github.com/DoctorTeeth/fred

An overview of gradient descent optimization algorithms (★★★★★)

blog: http://sebastianruder.com/optimizing-gradient-descent/

Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters

arxiv: http://arxiv.org/abs/1602.02151

Writing fast asynchronous SGD/AdaGrad with RcppParallel

blog: http://gallery.rcpp.org/articles/rcpp-sgd/

Regularization

DisturbLabel: Regularizing CNN on the Loss Layer [University of California & MSR] (2016)

intro: “an extremely simple algorithm which randomly replaces a part of labels as incorrect values in each iteration”
paper: http://research.microsoft.com/en-us/um/people/jingdw/pubs/cvpr16-disturblabel.pdf
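
The quoted description is essentially the whole algorithm: with a small probability, each training label is replaced by one drawn uniformly at random over the classes. A minimal sketch of that label-noise step (the rate value here is illustrative, not the paper's recommended setting):

```python
import numpy as np

def disturb_labels(labels, num_classes, alpha=0.1, rng=np.random):
    # with probability alpha, replace a label with one drawn uniformly
    # at random over all classes (it may happen to stay correct)
    labels = labels.copy()
    mask = rng.rand(len(labels)) < alpha
    labels[mask] = rng.randint(0, num_classes, size=mask.sum())
    return labels

y = np.array([3, 1, 4, 1, 5, 9, 2, 6])
y_noisy = disturb_labels(y, num_classes=10)
```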

Dropout

Improving neural networks by preventing co-adaptation of feature detectors (Dropout)

arxiv: http://arxiv.org/abs/1207.0580
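
Dropout zeroes each hidden unit independently with probability p during training; the “inverted” form sketched below rescales the surviving units by 1/(1-p) so no correction is needed at test time:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=np.random):
    # inverted dropout: zero units with prob. p, rescale survivors by 1/(1-p)
    if not training or p == 0.0:
        return x
    mask = (rng.rand(*x.shape) >= p) / (1.0 - p)
    return x * mask

h = np.random.randn(4, 8)
h_train = dropout(h, p=0.5, training=True)
h_test = dropout(h, p=0.5, training=False)  # identity at test time
```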

Regularization of Neural Networks using DropConnect

homepage: http://cs.nyu.edu/~wanli/dropc/
gitxiv: http://gitxiv.com/posts/rJucpiQiDhQ7HkZoX/regularization-of-neural-networks-using-dropconnect
github: https://github.com/iassael/torch-dropconnect
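
DropConnect applies the same idea to the weights of a layer rather than to its activations: each connection is dropped independently during training. A minimal sketch for a dense layer (inference in the paper uses a Gaussian approximation; the simple rescaling here is an assumption for brevity):

```python
import numpy as np

def dropconnect_dense(x, W, b, p=0.5, training=True, rng=np.random):
    # drop individual weights (connections) rather than whole activations
    if training:
        mask = (rng.rand(*W.shape) >= p) / (1.0 - p)
        W = W * mask
    return x @ W + b
```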

Regularizing neural networks with dropout and with DropConnect

blog: http://fastml.com/regularizing-neural-networks-with-dropout-and-with-dropconnect/

Fast dropout training

paper: http://jmlr.org/proceedings/papers/v28/wang13a.pdf
github: https://github.com/sidaw/fastdropout

Dropout as data augmentation

paper: http://arxiv.org/abs/1506.08700
notes: https://www.evernote.com/shard/s189/sh/ef0c3302-21a4-40d7-b8b4-1c65b8ebb1c9/24ff553fcfb70a27d61ff003df75b5a9

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

arxiv: http://arxiv.org/abs/1512.05287
github: https://github.com/yaringal/BayesianRNN

Improved Dropout for Shallow and Deep Learning

arxiv: http://arxiv.org/abs/1602.02220

Gradient Descent

Fitting a model via closed-form equations vs. Gradient Descent vs. Stochastic Gradient Descent vs. Mini-Batch Learning. What is the difference? (Normal Equations vs. GD vs. SGD vs. MB-GD)

http://sebastianraschka.com/faq/docs/closed-form-vs-gd.html
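
For linear least squares the contrast drawn in that FAQ is easy to show concretely: the closed-form normal equations solve for the weights in one step, while mini-batch gradient descent iterates toward the same solution. A minimal sketch with synthetic data (learning rate and batch size are illustrative):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(200)

# closed form (normal equations): solve X^T X w = X^T y
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# mini-batch gradient descent on the same squared loss
w = np.zeros(3)
lr, batch = 0.1, 32
for step in range(500):
    idx = rng.choice(len(X), batch, replace=False)
    grad = 2.0 / batch * X[idx].T @ (X[idx] @ w - y[idx])
    w -= lr * grad

print(w_closed, w)  # both end up close to the true weights
```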

An Introduction to Gradient Descent in Python

blog: http://tillbergmann.com/blog/articles/python-gradient-descent.html

Train faster, generalize better: Stability of stochastic gradient descent

arxiv: http://arxiv.org/abs/1509.01240

A Variational Analysis of Stochastic Gradient Algorithms

arxiv: http://arxiv.org/abs/1602.02666

The vanishing gradient problem: Oh no — an obstacle to deep learning!

blog: https://medium.com/a-year-of-artificial-intelligence/rohan-4-the-vanishing-gradient-problem-ec68f76ffb9b#.50hu5vwa8

Gradient Descent For Machine Learning

http://machinelearningmastery.com/gradient-descent-for-machine-learning/

Revisiting Distributed Synchronous SGD

arxiv: http://arxiv.org/abs/1604.00981

Accelerate Training

Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices

arxiv: http://arxiv.org/abs/1603.07341

Image Data Augmentation

DataAugmentation ver1.0: Image data augmentation tool for training image recognition algorithms

github: https://github.com/takmin/DataAugmentation

Caffe-Data-Augmentation: a Caffe branch with data augmentation support, using a configurable stochastic combination of 7 augmentation techniques

github: https://github.com/ShaharKatz/Caffe-Data-Augmentation
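
Both tools boil down to applying randomly chosen simple transforms to each training image. A minimal NumPy sketch of that idea, using only a random horizontal flip and a random crop (the transform set and probabilities are illustrative, not those of either repository):

```python
import numpy as np

def augment(img, crop=24, rng=np.random):
    # img: (H, W, C) array
    if rng.rand() < 0.5:
        img = img[:, ::-1, :]              # random horizontal flip
    h, w, _ = img.shape
    top = rng.randint(0, h - crop + 1)     # random crop position
    left = rng.randint(0, w - crop + 1)
    return img[top:top + crop, left:left + crop, :]

batch = [augment(np.zeros((32, 32, 3))) for _ in range(8)]
```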

Papers

Scalable and Sustainable Deep Learning via Randomized Hashing

arxiv: http://arxiv.org/abs/1602.08194

Tools

pastalog: Simple, realtime visualization of neural network training performance

github: https://github.com/rewonc/pastalog

torch-pastalog: A Torch interface for pastalog - simple, realtime visualization of neural network training performance

github: https://github.com/Kaixhin/torch-pastalog
