• Importing Slim models into TensorLayer for transfer learning


      In my previous post on transfer learning for cat-vs-dog classification with TensorFlow, I used TensorLayer's VGG16 model for image classification. So what if TensorLayer does not provide the model you want? No problem: TensorLayer can import TensorFlow's Slim models, and the example code is tutorial_inceptionV3_tfslim.
      So what exactly is Slim, and what is it good for?
    Slim (TF-Slim) is a library that makes building, training, and evaluating neural networks simple. It strips away much of the repetitive boilerplate of raw TensorFlow, so the code becomes more compact and readable. It also ships many well-known computer-vision models (VGG, AlexNet, and so on) that can be used directly or even extended in various ways. (Author's note: in short, it plays much the same role as TensorLayer.) For more background see 【Tensorflow】辅助工具篇——tensorflow slim(TF-Slim)介绍.
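      To give a feel for the boilerplate reduction TF-Slim offers, here is a minimal sketch of a conv + pooling stack written with slim; the layer names and sizes are arbitrary examples of mine, not from any particular model:

    import tensorflow as tf

    slim = tf.contrib.slim

    x = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
    # one call creates the weights, applies the convolution, adds the bias and the ReLU
    net = slim.conv2d(x, 64, [3, 3], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    # arg_scope sets shared defaults (padding, initializers, regularizers, ...) once
    with slim.arg_scope([slim.conv2d], padding='SAME'):
        net = slim.conv2d(net, 128, [3, 3], scope='conv2')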
      To do transfer learning we need the Slim model code plus the pretrained weights, both of which Google provides for download; the TF-Slim home page lists every model together with its checkpoint trained on ImageNet.

    The list also gives each model's top-1 and top-5 accuracy, and there are plenty of models to choose from.
      Now download the Inception-ResNet-v2 model code together with inception_resnet_v2_2016_08_30.tar.gz, and put the .py file and the extracted .ckpt file in the project root. Why not use the Inception V3 from the TensorLayer example? Because Inception-ResNet-v2 has higher accuracy. (Well, the real reason comes at the end.)
      We are again doing cat-vs-dog classification. If you follow the tutorial, import the model, change num_classes, feed in the training data, and train directly, you will get an error, because the shapes of the last few Logits parameters no longer match when the checkpoint is restored.
    Those last few parameters simply cannot be restored that way, and I did not find a TensorFlow API for selectively restoring parameters from a .ckpt. What to do? Luckily a friend in our chat group shared an approach, see 【Tensorflow 迁移学习】:

    The main idea: first restore all the .ckpt parameters and save them in npz format, then selectively restore the parameters from that npz, which works exactly like in the previous post.
    So the whole process has two steps:
    1. Restore the parameters and save them as npz:
      The code:

    import os
    import time
    import numpy as np
    import skimage
    import skimage.io
    import skimage.transform
    import tensorflow as tf
    import tensorlayer as tl
    from tensorlayer.layers import *
    from recordutil import *  # TFRecords helpers from the previous post
    from inception_resnet_v2 import (inception_resnet_v2_arg_scope, inception_resnet_v2)
    from scipy.misc import imread, imresize

    slim = tf.contrib.slim
    try:
        from data.imagenet_classes import *
    except Exception as e:
        raise Exception(
            "{} / download the file from: https://github.com/zsdonghao/tensorlayer/tree/master/example/data".format(e))

    n_epoch = 200
    learning_rate = 0.0001
    print_freq = 2
    batch_size = 32

    ## All TF-Slim nets can be merged into TensorLayer
    x = tf.placeholder(tf.float32, shape=[None, 299, 299, 3])
    # labels
    y_ = tf.placeholder(tf.int32, shape=[None, ], name='y_')

    net_in = tl.layers.InputLayer(x, name='input_layer')
    with slim.arg_scope(inception_resnet_v2_arg_scope()):
        network = tl.layers.SlimNetsLayer(
            prev_layer=net_in,
            slim_layer=inception_resnet_v2,
            slim_args={
                'num_classes': 1001,
                'is_training': True,
            },
            name='InceptionResnetV2'  # <-- the name should be the same with the ckpt model
        )

    sess = tf.InteractiveSession()
    network.print_params(False)
    saver = tf.train.Saver()

    tl.layers.initialize_global_variables(sess)

    # restore all weights from the slim checkpoint, then dump them to npz
    saver.restore(sess, "inception_resnet_v2.ckpt")
    print("Model Restored")
    all_params = sess.run(network.all_params)
    np.savez('inception_resnet_v2.npz', params=all_params)
    sess.close()

      Once this runs successfully, we have all 908 parameters of the model.
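      As a quick sanity check (my own addition, not part of the original workflow), the npz dump can be inspected directly; note that newer NumPy versions may need allow_pickle=True because the parameter list is stored as an object array:

    import numpy as np

    d = np.load('inception_resnet_v2.npz', allow_pickle=True)
    params = d['params']
    print(len(params))      # expect 908
    print(params[0].shape)  # shape of the first convolution kernel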
    2. Partially restore the npz parameters and train the model:
      First, modify the last layer of the model. Since this is a 2-class problem, change it as follows:

    with slim.arg_scope(inception_resnet_v2_arg_scope()):
        network = tl.layers.SlimNetsLayer(
            prev_layer=net_in,
            slim_layer=inception_resnet_v2,
            slim_args={
                'num_classes': 2,
                'is_training': True,
            },
            name='InceptionResnetV2'  # <-- the name should be the same with the ckpt model
        )

      num_classes is changed to 2, and is_training stays True.
      Next, define the session, the outputs, and the loss function:

    sess = tf.InteractiveSession()
    # saver = tf.train.Saver()
    y = network.outputs
    y_op = tf.argmax(tf.nn.softmax(y), 1)
    cost = tl.cost.cross_entropy(y, y_, name='cost')
    correct_prediction = tf.equal(tf.cast(tf.argmax(y, 1), tf.float32), tf.cast(y_, tf.float32))
    acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


      Then decide which parameters to train. We only train the parameters of the final layers; printing the parameters shows:

    [TL] param 900: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/weights:0 (5, 5, 128, 768) float32_ref
    [TL] param 901: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/beta:0 (768,) float32_ref
    [TL] param 902: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_mean:0 (768,) float32_ref
    [TL] param 903: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_variance:0 (768,) float32_ref
    [TL] param 904: InceptionResnetV2/AuxLogits/Logits/weights:0 (768, 2) float32_ref
    [TL] param 905: InceptionResnetV2/AuxLogits/Logits/biases:0 (2,) float32_ref
    [TL] param 906: InceptionResnetV2/Logits/Logits/weights:0 (1536, 2) float32_ref
    [TL] param 907: InceptionResnetV2/Logits/Logits/biases:0 (2,) float32_ref
    [TL] num of params: 56940900


      So it is enough to train from param 904 onward, and to restore the saved parameters only up to param 903.
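      To confirm that cut-off index without reading the full printout, a small sketch like the following (my own addition) lists only the classifier-head parameters together with their index in network.all_params:

    # print only the parameters belonging to the Logits / AuxLogits heads,
    # together with their position in network.all_params
    for i, p in enumerate(network.all_params):
        if 'Logits' in p.name:
            print(i, p.name, p.get_shape().as_list())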
    Below are the optimizer definition, the partial parameter restore, and the loading of the sample data:

    # define the optimizer: only the two new classifier heads are trained
    train_params = network.all_params[904:]
    print('train params:', train_params)
    train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost, var_list=train_params)

    # read the 299x299 cat/dog TFRecords written in the previous post
    img, label = read_and_decode(r"D:\001-Python\train299.tfrecords")
    # shuffle_batch randomly shuffles the input samples
    X_train, y_train = tf.train.shuffle_batch([img, label],
                                              batch_size=batch_size, capacity=200,
                                              min_after_dequeue=100)

    tl.layers.initialize_global_variables(sess)
    # restore only the first 904 parameters; the new Logits heads keep their fresh initialization
    params = tl.files.load_npz('', 'inception_resnet_v2.npz')
    params = params[0:904]
    print('number of params to restore:', len(params))
    tl.files.assign_params(sess, params=params, network=network)
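
      The read_and_decode helper above comes from the recordutil module of the previous post and is not shown here. Purely for reference, below is a hypothetical sketch of such a reader, assuming the TFRecords store raw 299x299x3 image bytes under the key 'img_raw' and an int64 'label'; the real feature names and preprocessing must match however the records were actually written:

    def read_and_decode(filename):
        # queue-based TFRecords input pipeline (TF 1.x style)
        filename_queue = tf.train.string_input_producer([filename])
        reader = tf.TFRecordReader()
        _, serialized_example = reader.read(filename_queue)
        features = tf.parse_single_example(
            serialized_example,
            features={
                'img_raw': tf.FixedLenFeature([], tf.string),   # assumed feature name
                'label': tf.FixedLenFeature([], tf.int64),      # assumed feature name
            })
        img = tf.decode_raw(features['img_raw'], tf.uint8)
        img = tf.reshape(img, [299, 299, 3])
        # scale to [0, 1]; adjust to whatever preprocessing was used when writing the records
        img = tf.cast(img, tf.float32) * (1.0 / 255)
        label = tf.cast(features['label'], tf.int32)
        return img, label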


      The training loop below is the same as in the previous post:

    # train the model
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    step = 0
    filelist = getfilelist()
    for epoch in range(n_epoch):
        start_time = time.time()
        # pull one shuffled batch of images/labels from the queue
        val, l = sess.run([X_train, y_train])
        for X_train_a, y_train_a in tl.iterate.minibatches(val, l, batch_size, shuffle=True):
            sess.run(train_op, feed_dict={x: X_train_a, y_: y_train_a})
        if epoch + 1 == 1 or (epoch + 1) % print_freq == 0:
            print("Epoch %d of %d took %fs" % (epoch + 1, n_epoch, time.time() - start_time))
            train_loss, train_acc, n_batch = 0, 0, 0
            for X_train_a, y_train_a in tl.iterate.minibatches(val, l, batch_size, shuffle=True):
                err, ac = sess.run([cost, acc], feed_dict={x: X_train_a, y_: y_train_a})
                train_loss += err
                train_acc += ac
                n_batch += 1
            print(" train loss: %f" % (train_loss / n_batch))
            print(" train acc: %f" % (train_acc / n_batch))
            # tl.files.save_npz(network.all_params, name='model_vgg_16_2.npz', sess=sess)
    coord.request_stop()
    coord.join(threads)


      Training for 200 epochs with a batch size of 20, part of the output looks like this:

    Epoch 156 of 200 took 12.568609s
    train loss: 0.382517
    train acc: 0.950000
    Epoch 158 of 200 took 12.457161s
    train loss: 0.382509
    train acc: 0.850000
    Epoch 160 of 200 took 12.385407s
    train loss: 0.320393
    train acc: 1.000000
    Epoch 162 of 200 took 12.489218s
    train loss: 0.480686
    train acc: 0.700000
    Epoch 164 of 200 took 12.388841s
    train loss: 0.329189
    train acc: 0.850000
    Epoch 166 of 200 took 12.446472s
    train loss: 0.379127
    train acc: 0.900000
    Epoch 168 of 200 took 12.888571s
    train loss: 0.365938
    train acc: 0.900000
    Epoch 170 of 200 took 12.850605s
    train loss: 0.353434
    train acc: 0.850000
    Epoch 172 of 200 took 12.855129s
    train loss: 0.315443
    train acc: 0.950000
    Epoch 174 of 200 took 12.906666s
    train loss: 0.460817
    train acc: 0.750000
    Epoch 176 of 200 took 12.830738s
    train loss: 0.421025
    train acc: 0.900000
    Epoch 178 of 200 took 12.852572s
    train loss: 0.418784
    train acc: 0.800000
    Epoch 180 of 200 took 12.951322s
    train loss: 0.316057
    train acc: 0.950000
    Epoch 182 of 200 took 12.866213s
    train loss: 0.363328
    train acc: 0.900000
    Epoch 184 of 200 took 13.012520s
    train loss: 0.379462
    train acc: 0.850000
    Epoch 186 of 200 took 12.934583s
    train loss: 0.472857
    train acc: 0.750000
    Epoch 188 of 200 took 13.038168s
    train loss: 0.236005
    train acc: 1.000000
    Epoch 190 of 200 took 13.056378s
    train loss: 0.266042
    train acc: 0.950000
    Epoch 192 of 200 took 13.016137s
    train loss: 0.255430
    train acc: 0.950000
    Epoch 194 of 200 took 13.013147s
    train loss: 0.422342
    train acc: 0.900000
    Epoch 196 of 200 took 12.980659s
    train loss: 0.353984
    train acc: 0.900000
    Epoch 198 of 200 took 13.033676s
    train loss: 0.320018
    train acc: 0.950000
    Epoch 200 of 200 took 12.945982s
    train loss: 0.288049
    train acc: 0.950000
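
      With the fine-tuned weights still in the session, a quick single-image check can be run like this (a sketch of my own; it assumes a test image test.jpg and that the resize and scaling match whatever preprocessing was used for training):

    test_img = skimage.io.imread('test.jpg')
    test_img = skimage.transform.resize(test_img, (299, 299), preserve_range=True).astype(np.float32)
    test_img = test_img / 255.0   # match the training preprocessing here
    pred = sess.run(y_op, feed_dict={x: [test_img]})
    print('predicted class index:', pred[0])   # 0 or 1 for the two classes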


      That wraps up transfer learning with Inception-ResNet-v2.
      The TensorLayer author states that SlimNetsLayer can import any Slim model. I have verified that importing Inception-ResNet-v2 and VGG16 works. Inception V3, however, trained for two or three days with the accuracy bouncing between 10% and 70% (as unstable as my mood), and I never found the cause. It wore me out, so I hope someone else will give Inception V3 a try.

  • Original post: https://www.cnblogs.com/zengfanlin/p/8970868.html