Network parameters
# number of test iterations: test set size / test batch size
test_iter: 100
# run a test pass every test_interval training iterations
test_interval: 500
# initial learning rate
base_lr: 0.01
# momentum, used to speed up convergence: v(t+1) = momentum*v(t) - lr*grad ; w(t+1) = w(t) + v(t+1)
momentum: 0.9
# weight decay, the coefficient of the regularization (penalty) term
weight_decay: 0.0005
# learning-rate decay policy; "inv" computes base_lr * (1 + gamma * iter) ^ (-power)
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# print the loss every `display` iterations
display: 100
# maximum number of training iterations
max_iter: 10000
# save a snapshot every `snapshot` iterations
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# train on the CPU or the GPU
solver_mode: GPU
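As a quick illustration of the two formulas quoted in the comments above, the following plain-Python sketch (independent of Caffe, with example numbers only) evaluates the "inv" learning-rate policy and one momentum update step:

# A minimal sketch of the two update rules quoted in the solver comments.
base_lr, gamma, power, momentum = 0.01, 0.0001, 0.75, 0.9

def inv_lr(iteration):
    # lr_policy "inv": base_lr * (1 + gamma * iter) ^ (-power)
    return base_lr * (1 + gamma * iteration) ** (-power)

print(inv_lr(0))      # 0.01, the initial learning rate
print(inv_lr(10000))  # ~0.0059, the rate reached near max_iter

# One SGD-with-momentum step for a single weight:
# v(t+1) = momentum*v(t) - lr*grad ; w(t+1) = w(t) + v(t+1)
w, v, grad = 0.5, 0.02, 0.1
v = momentum * v - inv_lr(0) * grad
w = w + v
print(w, v)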
name: "LeNet"
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
name: the name of this layer.
type: the layer type; Data means the input comes from LevelDB or LMDB.
top/bottom: outputs / inputs. The (data, label) pair is produced as the input for classification.
include: whether the layer belongs to the training phase, the test phase, or both.
transform_param: transforms the data into a defined range. 0.00390625 is 1/256, which maps the raw pixel values into [0, 1) (see the small check after this list).
source: where the data comes from.
batch_size: the number of samples processed per iteration.
backend: LevelDB or LMDB.
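As a sanity check on the scale value, this tiny numpy sketch (not Caffe code) shows that multiplying raw uint8 pixels by 0.00390625 = 1/256 maps them into [0, 1):

import numpy as np

pixels = np.array([0, 128, 255], dtype=np.uint8)  # raw MNIST gray levels
scaled = pixels * 0.00390625                      # same factor as transform_param's scale
print(scaled)        # [0.  0.5  0.99609375]
print(1.0 / 256)     # 0.00390625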
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
lr_mult: learning-rate multiplier, applied on top of base_lr from solver.prototxt; the two lr_mult entries correspond to the layer's two learnable blobs, the weights and the biases (see the short example after this list).
num_output: the number of convolution kernels (output channels).
kernel_size: the size of the convolution kernel.
stride: the stride of the convolution, 1 by default.
pad: padding added to the borders, 0 by default.
weight_filler: weight initialization. The default is "constant" with all values 0; in practice "xavier" is commonly used, and "gaussian" is also possible.
bias_filler: bias initialization, usually "constant" with all values 0.
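With the base_lr of 0.01 from the solver above, the two lr_mult values simply give the weights and the biases different effective step sizes; a small illustration (not Caffe code):

base_lr = 0.01                       # from the solver
lr_mult_weight, lr_mult_bias = 1, 2  # first / second param block of conv1

print(base_lr * lr_mult_weight)  # 0.01 -> effective learning rate of the weights
print(base_lr * lr_mult_bias)    # 0.02 -> effective learning rate of the biases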
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
pool: the pooling method.
pad: padding added to the borders, 0 by default.
kernel_size: the size of the pooling window (the shape walkthrough after this list traces how these sizes propagate through the net).
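Putting the convolution and pooling parameters together, the sketch below traces how the spatial size of the blobs evolves, using the usual output-size formula (with these kernel and stride values the rounding convention does not matter); the 40 kernels of the second convolution are taken from the custom network defined later in this post:

def out_size(in_size, kernel, stride=1, pad=0):
    # output spatial size of a convolution or pooling layer
    return (in_size + 2 * pad - kernel) // stride + 1

s = 28                  # MNIST input: 1 x 28 x 28
s = out_size(s, 5)      # conv1, 20 kernels       -> 20 x 24 x 24
s = out_size(s, 2, 2)   # pool1, MAX 2x2 stride 2 -> 20 x 12 x 12
s = out_size(s, 5)      # conv2, 40 kernels       -> 40 x 8 x 8
s = out_size(s, 2, 2)   # pool2                   -> 40 x 4 x 4
print(s)                # 4, so the first InnerProduct layer sees 40*4*4 = 640 inputs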
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
A fully connected layer can also be viewed as a convolution layer whose kernel is exactly as large as its input, as the short numpy sketch below illustrates.
num_output: the number of outputs (equivalently, the number of such kernels).
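A minimal numpy sketch of that equivalence (sizes chosen to match the custom network defined below: a 40 x 4 x 4 input into a 100-output InnerProduct): an InnerProduct with num_output N produces the same values as N kernels that each cover the whole input:

import numpy as np

C, H, W, N = 40, 4, 4, 100          # pool2 output shape and ip1 size
x = np.random.randn(C, H, W)        # one input sample
w = np.random.randn(N, C, H, W)     # N "kernels", each as large as the input

fc = w.reshape(N, -1) @ x.reshape(-1)          # InnerProduct: flatten, then matrix-vector product
conv = np.array([(wk * x).sum() for wk in w])  # "convolution" whose kernel covers the whole input
print(np.allclose(fc, conv))                   # True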
Custom network
An earlier post on this blog, on deep convolutional neural networks with Theano, built the following network.
Here the same network is trained with Caffe.
Its main parameters are:
import network3
from network3 import Network, ConvPoolLayer, FullyConnectedLayer, SoftmaxLayer, ReLU
from theano.tensor.nnet import sigmoid

# assumes network3.py from the earlier Theano post is on the path
training_data, validation_data, test_data = network3.load_data_shared()
mini_batch_size = 10

net = Network([
    ConvPoolLayer(image_shape=(mini_batch_size, 1, 28, 28),
                  filter_shape=(20, 1, 5, 5),
                  poolsize=(2, 2), activation_fn=ReLU),
    ConvPoolLayer(image_shape=(mini_batch_size, 20, 12, 12),
                  filter_shape=(40, 20, 5, 5),
                  poolsize=(2, 2), activation_fn=ReLU),
    FullyConnectedLayer(n_in=40*4*4, n_out=100, activation_fn=sigmoid),
    SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)
net.SGD(training_data, 30, mini_batch_size, 0.1,
        validation_data, test_data)
In the Theano version the learning rate stays at 0.1 throughout; in the Caffe experiments, however, training only converged with a learning rate of 0.01, and values >= 0.02 did not converge.
There is no momentum and no weight_decay.
Removing transform_param, or changing the scale to 1.0, prevents convergence.
Gaussian initialization is used, with a different std per layer.
There are also a few changes to the network's inputs and outputs.
net: "examples/mnist/lenet_train_test2.prototxt" test_iter: 100 test_interval: 500 base_lr: 0.01 momentum: 0 weight_decay: 0 lr_policy: "inv" gamma: 0 power: 0 display: 100 max_iter: 30000 snapshot: 30000 snapshot_prefix: "examples/mnist/lenet" solver_mode: GPU
The modified net:
name: "LeNet" layer { name: "mnist" type: "Data" top: "data" top: "label" include { phase: TRAIN } transform_param { scale: 0.00390625 } data_param { source: "examples/mnist/mnist_train_lmdb" batch_size: 10 backend: LMDB } } layer { name: "mnist" type: "Data" top: "data" top: "label" include { phase: TEST } transform_param { scale: 0.00390625 } data_param { source: "examples/mnist/mnist_test_lmdb" batch_size: 10 backend: LMDB } } layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 } param { lr_mult: 1 } convolution_param { num_output: 20 kernel_size: 5 stride: 1 weight_filler { type: "gaussian" std: 0.09 } bias_filler { type: "gaussian" std: 1.0 } } } layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 40 kernel_size: 5 stride: 1 weight_filler { type: "gaussian" std: 0.06 } bias_filler { type: "gaussian" std: 1.0 } } } layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { name: "ip1" type: "InnerProduct" bottom: "pool2" top: "ip1" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 100 weight_filler { type: "gaussian" std: 0.1 } bias_filler { type: "gaussian" std: 1.0 } } } layer { name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" } layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { type: "gaussian" std: 0.33 } bias_filler { type: "gaussian" std: 1.0 } } } layer { name: "accuracy" type: "Accuracy" bottom: "ip2" bottom: "label" top: "accuracy" include { phase: TEST } } layer { name: "loss" type: "SoftmaxWithLoss" bottom: "ip2" bottom: "label" top: "loss" }
The result is close to the one obtained with Theano.
I0105 17:29:22.523669  2836 solver.cpp:317] Iteration 30000, loss = 0.00268317
I0105 17:29:22.523669  2836 solver.cpp:337] Iteration 30000, Testing net (#0)
I0105 17:29:22.648680  2836 solver.cpp:404]     Test net output #0: accuracy = 0.985
I0105 17:29:22.648680  2836 solver.cpp:404]     Test net output #1: loss = 0.0472795 (* 1 = 0.0472795 loss)