Hyperparameters for Deep Learning Training

Parameter Description

Parameter	Default (common values)	Range	Synopsis/Recommendation
Number of Epochs	20	Depends on scenario	Number of times the whole dataset is passed forward and backward through the network
Batch Size	32 (32, 64, 128, 256)	Depends on scenario and hardware	Number of input images (and corresponding labels) that are transferred to device memory at once and then processed simultaneously. Default values are chosen such that a network with up to 100 classes fits onto a device with 8 GB memory. If trained on GPU, set as high as permitted by memory. See also the additional information below.
Learning Rate (λ)	0.001 (0.01, 0.001, 0.0001)	0 < λ < 1	Scales the gradient when the arguments of the loss function (the network weights) are updated; also called step size. Values that are too large can cause the algorithm to diverge; very small values require unnecessarily many steps (compare the figure Progress of Top-1 Error for Different Values of Learning Rate). The learning rate can be configured to decrease after a certain number of epochs. See also Finding a Value for the Learning Rate.
Momentum (μ)	0.9 (0.5-0.9)	0 ≤ μ < 1	Fraction of the previous update step (vector) to add to the current step. This parameter can help to attenuate fluctuations of the loss function.
Weight Prior (α)	0	0 ≤ α < 1	Regularization parameter penalizing large weights, used to prevent overfitting. Start with a low value (e.g., 0.00001) and increase it if overfitting occurs.
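
How the parameters in the table interact can be illustrated with a minimal sketch. It assumes plain SGD with momentum, an L2 weight prior that adds α·w to the gradient, and a step-wise learning rate decay; the source does not state the exact update rule, so the formulas, the function names (sgd_momentum_step, step_decay, iterations_per_epoch) and the decay settings (drop_every, factor) are hypothetical and only meant for orientation.

```python
def sgd_momentum_step(weights, grads, velocity,
                      lr=0.001, momentum=0.9, weight_prior=0.0):
    """One illustrative update step (assumed form, not taken from the source).

    v_new = momentum * v - lr * (grad + weight_prior * w)
    w_new = w + v_new
    """
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        # The weight prior penalizes large weights by adding alpha * w
        # to the gradient (L2 regularization, assumed here).
        g_reg = g + weight_prior * w
        # Keep a fraction mu of the previous step, then add the new
        # gradient step scaled by the learning rate.
        v_new = momentum * v - lr * g_reg
        new_velocity.append(v_new)
        new_weights.append(w + v_new)
    return new_weights, new_velocity


def step_decay(initial_lr, epoch, drop_every=10, factor=0.1):
    """Learning rate multiplied by `factor` after every `drop_every` epochs."""
    return initial_lr * (factor ** (epoch // drop_every))


def iterations_per_epoch(num_samples, batch_size):
    """Number of batches needed to pass the whole dataset once (ceiling division)."""
    return -(-num_samples // batch_size)


if __name__ == "__main__":
    w, v = [1.0, -2.0], [0.0, 0.0]
    for epoch in range(3):
        lr = step_decay(0.001, epoch)
        # Hypothetical gradient of the toy loss sum(w_i^2): grad_i = 2 * w_i.
        grads = [2.0 * wi for wi in w]
        w, v = sgd_momentum_step(w, grads, v, lr=lr, momentum=0.9,
                                 weight_prior=0.00001)
    print(w, iterations_per_epoch(1000, 32))
```

With a batch size of 32 and 1000 training images, one epoch corresponds to 32 update steps, so 20 epochs amount to 640 steps; a larger batch size reduces the number of steps per epoch at the cost of more device memory.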