TensorFlow and Torch tips

Applying weight decay (L2, REGULARIZATION_LOSSES)

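import tensorflow as tf

# Collect every trainable variable in the graph and list them
weights = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
for w in weights:
    print(w)
# Build an L2 regularizer with scale 0.001 and apply it to the weights;
# the summed penalty is added to the REGULARIZATION_LOSSES collection
l2r = tf.contrib.layers.l2_regularizer(0.001)
tf.contrib.layers.apply_regularization(l2r, weights)
tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)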
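
To make the size of the penalty concrete, here is a minimal sketch in plain Python (no TensorFlow needed). It assumes the L2 regularizer computes scale * sum(w^2) / 2, which is how tf.nn.l2_loss is defined; the weight values and the l2_penalty helper are made up for illustration.

```python
def l2_penalty(weights, scale):
    """Return scale * sum(w**2) / 2 for a flat list of weight values."""
    return scale * sum(w * w for w in weights) / 2.0


# Hypothetical flattened weights of two small layers.
layer_a = [0.5, -1.0, 2.0]
layer_b = [3.0, -0.5]

# The per-variable penalties are summed into a single regularization loss.
total_penalty = l2_penalty(layer_a, 0.001) + l2_penalty(layer_b, 0.001)
print(total_penalty)
```

This sum is the quantity that ends up being added to the training loss as the L2 term.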

# cross-entropy loss

tf.add_to_collection('losses', cross_entropy_mean)

loss = tf.add_n(tf.get_collection('losses'), name='cross_entropy_loss')

# configure the optimizer
target_loss = target_loss + tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES), name='l2_loss')
train_step = tf.train.AdamOptimizer(learning_rate).minimize(target_loss, global_step=global_step)

Learning rate decay

global_step = tf.Variable(0, trainable=False, name='global_step')
# Multiply the learning rate by 0.96 every 10000 steps (staircase=True makes the decay stepwise)
learning_rate = tf.train.exponential_decay(opts.learning_rate, global_step, 10000, 0.96, staircase=True)
train_step = tf.train.AdamOptimizer(learning_rate).minimize(target_loss, global_step=global_step)
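
As a rough illustration of what staircase=True does, the schedule can be reproduced in plain Python. This is only a sketch of the documented formula, not TensorFlow's implementation; the base learning rate 0.01 is an assumed value standing in for opts.learning_rate.

```python
import math


def exponential_decay(base_lr, global_step, decay_steps, decay_rate, staircase=False):
    """Return base_lr * decay_rate ** (global_step / decay_steps)."""
    exponent = global_step / decay_steps
    if staircase:
        # Integer division: the rate only changes every decay_steps steps.
        exponent = math.floor(exponent)
    return base_lr * decay_rate ** exponent


base_lr = 0.01  # assumed value for illustration
for step in (0, 9999, 10000, 25000):
    print(step, exponential_decay(base_lr, step, 10000, 0.96, staircase=True))
```

With staircase=True the rate stays at its initial value for the first 10000 steps and then drops by a factor of 0.96 at each multiple of 10000.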

Differences in learning rate decay between TensorFlow and Torch

torch:  
 -- (3) learning rate decay (annealing)
   local clr = lr / (1 + state.t*lrd)

   state.t = state.t + 1

https://github.com/torch/optim/blob/master/adam.lua

tensorflow:
decayed_learning_rate = learning_rate *
                        decay_rate ^ (global_step / decay_steps)

https://www.tensorflow.org/versions/r0.11/api_docs/python/train/decaying_the_learning_rate

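In Torch the decay is applied once per batch. Suppose lrd = 0.001.

Would the TensorFlow counterpart be decay_steps = 1 and decay_rate = 1 - lrd = 0.999, so that it approximates the Torch method?

No. TensorFlow has an equivalent, tf.train.inverse_time_decay: with decay_steps = 1 and decay_rate = lrd it computes learning_rate / (1 + lrd * global_step), which is the Torch formula.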
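
The difference between the two candidates can be checked numerically. The sketch below re-implements the three formulas in plain Python (it does not call Torch or TensorFlow); lr = 0.01 is an assumed starting rate and the function names are only illustrative.

```python
def torch_lr(lr, lrd, t):
    """Torch optim: clr = lr / (1 + t * lrd)."""
    return lr / (1 + t * lrd)


def inverse_time_decay(lr, global_step, decay_steps, decay_rate):
    """Inverse time decay: lr / (1 + decay_rate * global_step / decay_steps)."""
    return lr / (1 + decay_rate * global_step / decay_steps)


def exponential_decay(lr, global_step, decay_steps, decay_rate):
    """Exponential decay: lr * decay_rate ** (global_step / decay_steps)."""
    return lr * decay_rate ** (global_step / decay_steps)


lr, lrd = 0.01, 0.001  # lr is an assumed starting rate
for t in (0, 100, 1000, 10000):
    print(t,
          torch_lr(lr, lrd, t),
          inverse_time_decay(lr, t, 1, lrd),
          exponential_decay(lr, t, 1, 1 - lrd))
```

The first two columns agree at every step, while the exponential schedule stays close only for small t and then falls off much faster.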

softmax in TensorFlow vs. LogSoftMax in Torch

tf.nn.softmax

 exp(logits) / reduce_sum(exp(logits), dim)

tf.log(tf.nn.softmax(logits)) is not the same as Torch's LogSoftMax; Torch implements LogSoftMax differently:

https://github.com/torch/nn/blob/master/lib/THNN/generic/LogSoftMax.c

http://blog.csdn.net/lanchunhui/article/details/51248184
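
As far as I can tell from the linked source, Torch subtracts the maximum logit and computes x - log(sum(exp(x))) directly, instead of taking the log of an already computed softmax. The plain-Python sketch below contrasts the two approaches; it is an illustration of the idea, not a port of either library, and the function names are made up. In TensorFlow, tf.nn.log_softmax is the closer counterpart.

```python
import math


def naive_log_softmax(logits):
    """log(softmax(x)) computed in two steps, as in tf.log(tf.nn.softmax(x))."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # math.log(0.0) raises in Python, so return -inf explicitly
    # to mimic the result of taking the log of an underflowed probability.
    return [math.log(p) if p > 0.0 else float("-inf") for p in probs]


def stable_log_softmax(logits):
    """x - logsumexp(x), with the maximum subtracted for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


print(naive_log_softmax([0.0, -1000.0]))   # second entry underflows
print(stable_log_softmax([0.0, -1000.0]))  # second entry stays finite
```

For moderate logits both functions give the same values; they differ only when a probability underflows to zero.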

saver

http://www.jianshu.com/p/8487db911d9a

Differences in Dropout between TensorFlow and Torch

torch:
Furthermore, the outputs are scaled by a factor of 1/(1-p) during training. 

tensorflow:
With probability keep_prob, outputs the input element scaled up by 1 / keep_prob, otherwise outputs 0. The scaling is so that the expected sum is unchanged.

So dropout_rate = p in Torch corresponds to keep_prob = 1 - p in TensorFlow.
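
A small plain-Python sketch of the two parameterizations follows. It only mirrors the behaviour quoted above and does not call either library; the function names are hypothetical, and both draw their random numbers the same way so that the outputs can be compared directly.

```python
import random


def torch_style_dropout(values, p, rng):
    """Zero each element with probability p; scale survivors by 1 / (1 - p)."""
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]


def tf_style_dropout(values, keep_prob, rng):
    """Keep each element with probability keep_prob; scale survivors by 1 / keep_prob."""
    drop_prob = 1.0 - keep_prob
    return [0.0 if rng.random() < drop_prob else v / keep_prob for v in values]


values = [1.0] * 8
# Same seed for both calls, so both see identical random draws.
out_torch = torch_style_dropout(values, 0.25, random.Random(0))
out_tf = tf_style_dropout(values, 0.75, random.Random(0))
print(out_torch)
print(out_tf)
```

With p = 0.25 and keep_prob = 0.75 the two calls produce the same output, which is the correspondence stated above.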

Parameter order

conv: Torch outputs*inputs*kh*kw, TF kh*kw*inputs*outputs

deconv: Torch inputs*outputs*kh*kw, TF kh*kw*outputs*inputs

Mobile & MPS: outputs*kh*kw*inputs. Note that for deconv the kh*kw kernel has to be rotated 180 degrees.
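
The reordering can be sketched with nested lists in plain Python (in practice an array library's transpose would do the same thing). The helper names are made up for illustration. Moving the first two axes to the end in reversed order covers both the conv and the deconv case listed above.

```python
def torch_to_tf_layout(weight):
    """Reorder a nested list from [a][b][kh][kw] to [kh][kw][b][a].

    For conv, a = outputs and b = inputs; for deconv, a = inputs and b = outputs.
    """
    n_a, n_b = len(weight), len(weight[0])
    kh, kw = len(weight[0][0]), len(weight[0][0][0])
    return [[[[weight[a][b][h][w] for a in range(n_a)]
              for b in range(n_b)]
             for w in range(kw)]
            for h in range(kh)]


def rotate_180(kernel):
    """Rotate a [kh][kw] kernel by 180 degrees (flip rows and columns)."""
    return [row[::-1] for row in kernel[::-1]]


# Toy conv weight: 2 outputs, 1 input, 2x3 kernel; each value encodes its index.
torch_weight = [[[[o * 1000 + i * 100 + h * 10 + w for w in range(3)]
                  for h in range(2)]
                 for i in range(1)]
                for o in range(2)]
tf_weight = torch_to_tf_layout(torch_weight)
print(tf_weight[1][2][0][1])            # element from o=1, i=0, h=1, w=2
print(rotate_180([[1, 2, 3], [4, 5, 6]]))
```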

Original article: https://www.cnblogs.com/mlj318/p/7009178.html