Reposted from: http://handong1587.github.io/deep_learning/2015/10/09/training-dnn.html
Training Deep Neural Networks
Published: 09 Oct 2015 Category: deep_learning
Tutorials
Popular Training Approaches of DNNs — A Quick Overview
https://medium.com/@asjad/popular-training-approaches-of-dnns-a-quick-overview-26ee37ad7e96#.pqyo039bb
Activation functions
Rectified Linear Units Improve Restricted Boltzmann Machines (ReLU)
paper: http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_NairH10.pdf
Rectifier Nonlinearities Improve Neural Network Acoustic Models (leaky-ReLU, aka LReLU)
paper: http://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (PReLU)
keywords: PReLU, Caffe “msra” weights initialization
arXiv: http://arxiv.org/abs/1502.01852
Empirical Evaluation of Rectified Activations in Convolutional Network (ReLU/LReLU/PReLU/RReLU)
arXiv: http://arxiv.org/abs/1505.00853
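A minimal NumPy sketch of the rectifier variants compared in the paper above (ReLU, leaky ReLU, PReLU, and training-time RReLU). Function names and the slope values are illustrative choices, not taken from any of the papers:

```python
import numpy as np

def relu(x):
    # max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # small fixed slope for x < 0
    return np.where(x > 0, x, slope * x)

def prelu(x, a):
    # slope `a` is learned (one per channel in the PReLU paper)
    return np.where(x > 0, x, a * x)

def rrelu_train(x, lower=1/8., upper=1/3., rng=np.random):
    # slope sampled uniformly per activation at training time;
    # at test time an average slope is typically used instead
    a = rng.uniform(lower, upper, size=x.shape)
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), leaky_relu(x), prelu(x, a=0.25), rrelu_train(x))
```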
Deep Learning with S-shaped Rectified Linear Activation Units (SReLU)
arxiv: http://arxiv.org/abs/1512.07030
Parametric Activation Pools greatly increase performance and consistency in ConvNets
blog: http://blog.claymcleod.io/2016/02/06/Parametric-Activation-Pools-greatly-increase-performance-and-consistency-in-ConvNets/
Noisy Activation Functions
arxiv: http://arxiv.org/abs/1603.00391
Weights Initialization
An Explanation of Xavier Initialization
blog: http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization
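A rough sketch of Xavier (Glorot) initialization as explained in the blog post above, shown here in its uniform-range form for a fully connected layer (the function name and layer sizes are just for illustration):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=np.random):
    # Keep activation variance roughly constant across layers:
    # Var(W) = 2 / (fan_in + fan_out), achieved by sampling from
    # U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_uniform(784, 256)
print(W.std(), np.sqrt(2.0 / (784 + 256)))  # empirical std close to the target std
```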
Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?
arxiv: http://arxiv.org/abs/1504.08291
All you need is a good init
arxiv: http://arxiv.org/abs/1511.06422
github: https://github.com/ducha-aiki/LSUVinit
Data-dependent Initializations of Convolutional Neural Networks
arxiv: http://arxiv.org/abs/1511.06856
github: https://github.com/philkr/magic_init
What are good initial weights in a neural network?
stackexchange: http://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
RandomOut: Using a convolutional gradient norm to win The Filter Lottery
arxiv: http://arxiv.org/abs/1602.05931
Batch Normalization
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (ImageNet top-5 error: 4.82%)
arXiv: http://arxiv.org/abs/1502.03167
blog: https://standardfrancis.wordpress.com/2015/04/16/batch-normalization/
notes: http://blog.csdn.net/happynear/article/details/44238541
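A simplified sketch of the training-time batch normalization transform from the Ioffe & Szegedy paper: normalize each feature over the mini-batch, then scale and shift with learned gamma and beta. Running statistics for inference and the backward pass are omitted:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    # x: (batch, features); normalize each feature over the mini-batch
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # learned scale and shift restore representational power
    return gamma * x_hat + beta

x = np.random.randn(64, 10) * 5.0 + 3.0
y = batch_norm_train(x, gamma=np.ones(10), beta=np.zeros(10))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # roughly 0 and 1
```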
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
arxiv: http://arxiv.org/abs/1602.07868
github(Lasagne): https://github.com/TimSalimans/weight_norm
notes: http://www.erogol.com/my-notes-weight-normalization/
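The reparameterization in the weight-normalization paper decouples the direction of each weight vector from its length, w = g * v / ||v||. A minimal sketch for one dense layer, with hypothetical shapes:

```python
import numpy as np

def weight_norm_forward(x, v, g, b):
    # w = g * v / ||v||, computed per output unit (rows of v)
    norm = np.linalg.norm(v, axis=1, keepdims=True)
    w = g[:, None] * v / norm
    return x @ w.T + b

rng = np.random.RandomState(0)
x = rng.randn(4, 8)            # batch of 4, 8 input features
v = rng.randn(16, 8)           # direction parameters, 16 output units
g = np.ones(16)                # length (scale) parameters
b = np.zeros(16)
print(weight_norm_forward(x, v, g, b).shape)  # (4, 16)
```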
Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
arxiv: http://arxiv.org/abs/1603.01431
Loss Function
The Loss Surfaces of Multilayer Networks
arxiv: http://arxiv.org/abs/1412.0233
Optimization Methods
On Optimization Methods for Deep Learning
paper: http://www.icml-2011.org/papers/210_icmlpaper.pdf
On the importance of initialization and momentum in deep learning
paper: http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
Invariant backpropagation: how to train a transformation-invariant neural network
arxiv: http://arxiv.org/abs/1502.04434
github: https://github.com/sdemyanov/ConvNet
A practical theory for designing very deep convolutional neural networks
kaggle: https://www.kaggle.com/c/datasciencebowl/forums/t/13166/happy-lantern-festival-report-and-code/69284
paper: https://kaggle2.blob.core.windows.net/forum-message-attachments/69182/2287/A%20practical%20theory%20for%20designing%20very%20deep%20convolutional%20neural%20networks.pdf?sv=2012-02-12&se=2015-12-05T15%3A40%3A02Z&sr=b&sp=r&sig=kfBQKduA1pDtu837Y9Iqyrp2VYItTV0HCgOeOok9E3E%3D
slides: http://vdisk.weibo.com/s/3nFsznjLKn
Stochastic Optimization Techniques
intro: SGD/Momentum/NAG/Adagrad/RMSProp/Adadelta/Adam/ESGD/Adasecant/vSGD/Rprop
blog: http://colinraffel.com/wiki/stochastic_optimization_techniques
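Two of the update rules surveyed in the links above, SGD with classical momentum and Adam, sketched on a toy quadratic. The hyperparameters are commonly quoted defaults, not values from any single reference:

```python
import numpy as np

def sgd_momentum(w, grad, v, lr=0.01, mu=0.9):
    # classical momentum: accumulate a velocity, then step
    v = mu * v - lr * grad
    return w + v, v

def adam(w, grad, m, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # first/second moment estimates plus bias correction
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    s_hat = s / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# minimize f(w) = ||w||^2 / 2 (gradient is w) with both rules
w1 = w2 = np.array([5.0, -3.0]); v = m = s = np.zeros(2)
for t in range(1, 201):
    w1, v = sgd_momentum(w1, w1, v)
    w2, m, s = adam(w2, w2, m, s, t)
print(w1.round(4), w2.round(4))
```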
Alec Radford’s animations for optimization algorithms
http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html
Faster Asynchronous SGD (FASGD)
arxiv: http://arxiv.org/abs/1601.04033
github: https://github.com/DoctorTeeth/fred
An overview of gradient descent optimization algorithms (★★★★★)
blog: http://sebastianruder.com/optimizing-gradient-descent/
Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
arxiv: http://arxiv.org/abs/1602.02151
Writing fast asynchronous SGD/AdaGrad with RcppParallel
blog: http://gallery.rcpp.org/articles/rcpp-sgd/
Regularization
DisturbLabel: Regularizing CNN on the Loss Layer [University of California & MSR] (2016)
intro: “an extremely simple algorithm which randomly replaces a part of labels as incorrect values in each iteration”
paper: http://research.microsoft.com/en-us/um/people/jingdw/pubs/cvpr16-disturblabel.pdf
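A sketch of the DisturbLabel idea quoted above, assuming integer class labels: in each iteration a small fraction of the mini-batch labels is replaced with a label drawn uniformly from all classes. Treat the function name and the disturbance rate as illustrative, not as the authors' code:

```python
import numpy as np

def disturb_labels(labels, num_classes, alpha=0.1, rng=np.random):
    # with probability alpha, replace a label with one drawn
    # uniformly at random from all classes
    labels = labels.copy()
    mask = rng.rand(labels.shape[0]) < alpha
    labels[mask] = rng.randint(0, num_classes, size=mask.sum())
    return labels

y = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(disturb_labels(y, num_classes=10, alpha=0.2))
```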
Dropout
Improving neural networks by preventing co-adaptation of feature detectors (Dropout)
arxiv: http://arxiv.org/abs/1207.0580
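The basic dropout recipe from the Hinton et al. paper, sketched here as the commonly used "inverted dropout" variant: mask activations at training time and rescale so nothing changes at test time (the original paper instead scales weights at test time):

```python
import numpy as np

def dropout_train(x, p_drop=0.5, rng=np.random):
    # zero each activation with probability p_drop, then rescale
    # by 1 / (1 - p_drop) so the expected value is unchanged
    mask = (rng.rand(*x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask

def dropout_test(x):
    # inverted dropout needs no scaling at test time
    return x

x = np.ones((2, 6))
print(dropout_train(x, p_drop=0.5))
```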
Regularization of Neural Networks using DropConnect
homepage: http://cs.nyu.edu/~wanli/dropc/
gitxiv: http://gitxiv.com/posts/rJucpiQiDhQ7HkZoX/regularization-of-neural-networks-using-dropconnect
github: https://github.com/iassael/torch-dropconnect
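DropConnect drops individual weights rather than whole activations. A minimal training-time sketch for a dense layer (the paper's Gaussian approximation for inference is omitted; shapes and the drop rate are illustrative):

```python
import numpy as np

def dropconnect_train(x, W, b, p_drop=0.5, rng=np.random):
    # sample a binary mask over the weight matrix each forward pass
    mask = rng.rand(*W.shape) >= p_drop
    return x @ (W * mask) + b

rng = np.random.RandomState(0)
x = rng.randn(4, 8)
W, b = rng.randn(8, 16), np.zeros(16)
print(dropconnect_train(x, W, b).shape)  # (4, 16)
```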
Regularizing neural networks with dropout and with DropConnect
blog: http://fastml.com/regularizing-neural-networks-with-dropout-and-with-dropconnect/
Fast dropout training
paper: http://jmlr.org/proceedings/papers/v28/wang13a.pdf
github: https://github.com/sidaw/fastdropout
Dropout as data augmentation
paper: http://arxiv.org/abs/1506.08700
notes: https://www.evernote.com/shard/s189/sh/ef0c3302-21a4-40d7-b8b4-1c65b8ebb1c9/24ff553fcfb70a27d61ff003df75b5a9
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
arxiv: http://arxiv.org/abs/1512.05287
github: https://github.com/yaringal/BayesianRNN
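The key point of the paper above is to sample one dropout mask per sequence and reuse it at every time step, for both the inputs and the recurrent connections, rather than resampling per step. A toy sketch for a vanilla RNN cell, with hypothetical shapes and drop rate:

```python
import numpy as np

def rnn_forward_variational_dropout(xs, h0, Wx, Wh, p_drop=0.2,
                                    rng=np.random):
    # sample the input and recurrent masks ONCE per sequence ...
    mx = (rng.rand(*xs[0].shape) >= p_drop) / (1 - p_drop)
    mh = (rng.rand(*h0.shape) >= p_drop) / (1 - p_drop)
    h = h0
    hs = []
    for x in xs:
        # ... and reuse the same masks at every time step
        h = np.tanh((x * mx) @ Wx + (h * mh) @ Wh)
        hs.append(h)
    return hs

rng = np.random.RandomState(0)
xs = [rng.randn(4, 8) for _ in range(5)]   # 5 time steps, batch of 4
h0 = np.zeros((4, 16))
Wx, Wh = rng.randn(8, 16), rng.randn(16, 16)
print(len(rnn_forward_variational_dropout(xs, h0, Wx, Wh)))  # 5
```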
Improved Dropout for Shallow and Deep Learning
arxiv: http://arxiv.org/abs/1602.02220
Gradient Descent
Fitting a model via closed-form equations vs. Gradient Descent vs. Stochastic Gradient Descent vs. Mini-Batch Learning. What is the difference? (Normal Equations vs. GD vs. SGD vs. MB-GD)
http://sebastianraschka.com/faq/docs/closed-form-vs-gd.html
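To make the comparison in the post above concrete, here is linear regression solved both ways on the same synthetic data: the closed-form normal equations versus batch gradient descent. The learning rate and iteration count are arbitrary illustration values:

```python
import numpy as np

rng = np.random.RandomState(0)
X = np.c_[np.ones(100), rng.rand(100, 2)]        # bias column + 2 features
true_w = np.array([1.0, 2.0, -3.0])
y = X @ true_w + 0.01 * rng.randn(100)

# closed form: w = (X^T X)^{-1} X^T y
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# batch gradient descent on the mean squared error
w_gd = np.zeros(3)
lr = 0.1
for _ in range(5000):
    grad = X.T @ (X @ w_gd - y) / len(y)
    w_gd -= lr * grad

print(w_closed.round(3), w_gd.round(3))          # both near [1, 2, -3]
```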
An Introduction to Gradient Descent in Python
blog: http://tillbergmann.com/blog/articles/python-gradient-descent.html
Train faster, generalize better: Stability of stochastic gradient descent
arxiv: http://arxiv.org/abs/1509.01240
A Variational Analysis of Stochastic Gradient Algorithms
arxiv: http://arxiv.org/abs/1602.02666
The vanishing gradient problem: Oh no — an obstacle to deep learning!
blog: https://medium.com/a-year-of-artificial-intelligence/rohan-4-the-vanishing-gradient-problem-ec68f76ffb9b#.50hu5vwa8
Gradient Descent For Machine Learning
http://machinelearningmastery.com/gradient-descent-for-machine-learning/
Revisiting Distributed Synchronous SGD
arxiv: http://arxiv.org/abs/1604.00981
Accelerate Training
Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices
arxiv: http://arxiv.org/abs/1603.07341
Image Data Augmentation
DataAugmentation ver1.0: Image data augmentation tool for training image recognition algorithms
github: https://github.com/takmin/DataAugmentation
Caffe-Data-Augmentation: a branch of Caffe with built-in data augmentation, using a configurable stochastic combination of 7 data augmentation techniques
github: https://github.com/ShaharKatz/Caffe-Data-Augmentation
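In the same spirit as the two tools above, a toy sketch of stochastic label-preserving augmentation (random horizontal flip plus random crop) on a NumPy image batch. The transforms, sizes, and probabilities are generic examples, not the configuration of either repository:

```python
import numpy as np

def augment(img, crop=24, p_flip=0.5, rng=np.random):
    # img: (H, W, C); random horizontal flip, then random crop
    if rng.rand() < p_flip:
        img = img[:, ::-1, :]
    h, w, _ = img.shape
    top = rng.randint(0, h - crop + 1)
    left = rng.randint(0, w - crop + 1)
    return img[top:top + crop, left:left + crop, :]

batch = np.random.rand(8, 32, 32, 3)
augmented = np.stack([augment(img) for img in batch])
print(augmented.shape)  # (8, 24, 24, 3)
```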
Papers
Scalable and Sustainable Deep Learning via Randomized Hashing
arxiv: http://arxiv.org/abs/1602.08194
Tools
pastalog: Simple, realtime visualization of neural network training performance
github: https://github.com/rewonc/pastalog
torch-pastalog: A Torch interface for pastalog - simple, realtime visualization of neural network training performance
github: https://github.com/Kaixhin/torch-pastalog