• Paper Notes [5] SRCNN


    Paper Notes [5] Learning a Deep Convolutional Network for Image Super-Resolution

    Hmm... this is the paper mentioned in the earlier note on deblocking and deringing. It is again from Xiaoou Tang and colleagues: a deep-CNN network for super-resolution, namely SRCNN. The paper gives an end-to-end model: a low-resolution image goes in and a high-resolution image comes out. It also shows that the traditional sparse-coding approach to SR can be expressed as a CNN. On top of that, the network is fairly lightweight, producing good results quickly.

    Super-resolution is actually a very classical problem. Some methods exploit the internal self-similarity of the image, while others learn the mapping from external low- and high-resolution exemplar pairs. Sparse coding (SC) is one of the representative methods for external example-based image super-resolution. The SC pipeline goes like this: extract patches and pre-process them, encode each patch with a low-resolution dictionary to get a sparse coefficient vector, then swap the codebook for a high-resolution dictionary and reconstruct the high-resolution image from those coefficients. So previous work on SC-based methods mainly focused on finding better codebooks, or on other parts of the model.

    Solving SR with a CNN requires no explicit learning of dictionaries, manifolds, or patch-space models; all of that is learned implicitly by the hidden layers. There is also almost no pre-processing or post-processing.

    SRCNN

    SRCNN consists of the following operations:

    1. patch extraction and representation
    2. non-linear mapping
    3. reconstruction

    As shown in the figure below:


    [Figure: the three-stage SRCNN pipeline of patch extraction and representation, non-linear mapping, and reconstruction]

    Patch extraction and representation is simply a convolutional layer that produces n1 feature maps. In the non-linear mapping step, we want to map each n1-dimensional vector to an n2-dimensional vector, which can be implemented with a 1×1 kernel. The paper notes: "It is possible to add more convolutional layers (whose spatial supports are 1 × 1) to increase the non-linearity. But this can significantly increase the complexity of the model, and thus demands more training data and time." That is, although more 1×1 layers would add non-linearity, they would also make the model more complex and demand more data and training time, so the paper uses only a single mapping layer. The last layer convolves back to the original number of channels c. The formulas for the three layers are:


    $F_1(\mathbf{Y}) = \max(0, W_1 * \mathbf{Y} + B_1)$
    $F_2(\mathbf{Y}) = \max(0, W_2 * F_1(\mathbf{Y}) + B_2)$
    $F(\mathbf{Y}) = W_3 * F_2(\mathbf{Y}) + B_3$

    (where $*$ denotes convolution and $\mathbf{Y}$ is the bicubic-upscaled low-resolution input)
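    To make the structure concrete, here is a minimal PyTorch sketch of the three-layer network. The 9-1-5 filter sizes and n1 = 64, n2 = 32 follow the commonly cited base setting of the paper; treat the exact numbers as illustrative rather than definitive.

    ```python
    import torch
    import torch.nn as nn

    class SRCNN(nn.Module):
        """Three-layer SRCNN: patch extraction -> non-linear mapping -> reconstruction."""
        def __init__(self, c=1, n1=64, n2=32):
            super().__init__()
            self.conv1 = nn.Conv2d(c, n1, kernel_size=9)   # patch extraction (f1 = 9)
            self.conv2 = nn.Conv2d(n1, n2, kernel_size=1)  # non-linear mapping (f2 = 1)
            self.conv3 = nn.Conv2d(n2, c, kernel_size=5)   # reconstruction (f3 = 5)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, y):
            # y is the bicubic-upscaled low-resolution image.
            f1 = self.relu(self.conv1(y))   # F1(Y) = max(0, W1 * Y + B1)
            f2 = self.relu(self.conv2(f1))  # F2(Y) = max(0, W2 * F1(Y) + B2)
            return self.conv3(f2)           # F(Y)  = W3 * F2(Y) + B3

    # No padding is used, so a 32x32 sub-image yields a 20x20 output.
    model = SRCNN()
    out = model(torch.randn(1, 1, 32, 32))
    print(out.shape)  # torch.Size([1, 1, 20, 20])
    ```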

    Relationship to Sparse-Coding based Method

    The basic idea of SC-based SR is to take patches from the low-resolution (LR) image and normalize them, project them onto an LR dictionary to obtain coefficients, and then decode back with an HR codebook. In the CNN view, the first layer effectively extracts such a codebook, with the filters playing the role of dictionary atoms; the non-linear mapping then corresponds to the sparse coding solver, since in SC the n1 coefficients have to be projected by the solver onto n2 coefficients (usually with n1 = n2 in SC); finally, the reconstruction step corresponds to synthesis with the high-resolution codebook.


    [Figure: how the sparse-coding pipeline maps onto the three SRCNN operations]

    The paper says: "Our non-linear operator can be considered as a pixel-wise fully-connected layer." Since the kernel is 1×1, it is effectively a fully-connected layer applied across channels at each pixel. In the SC-based methods not every step is optimized: "But not all operations have been considered in the optimization in the sparse-coding-based SR methods. On the contrary, in our convolutional neural network, the low-resolution dictionary, high-resolution dictionary, non-linear mapping, together with mean subtraction and averaging, are all involved in the filters to be optimized." So every step can be jointly optimized.
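    As a quick numerical check of that "pixel-wise fully-connected" reading (my own illustration, not from the paper): a 1×1 convolution gives exactly the same result as applying one linear layer to the channel vector at every pixel.

    ```python
    import torch
    import torch.nn as nn

    n1, n2 = 64, 32
    conv1x1 = nn.Conv2d(n1, n2, kernel_size=1)

    # Build a Linear layer that shares the 1x1 convolution's weights and bias.
    fc = nn.Linear(n1, n2)
    fc.weight.data = conv1x1.weight.data.view(n2, n1)
    fc.bias.data = conv1x1.bias.data

    x = torch.randn(1, n1, 8, 8)
    out_conv = conv1x1(x)
    # Apply the linear layer to the n1-dimensional vector at each spatial position.
    out_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
    print(torch.allclose(out_conv, out_fc, atol=1e-6))  # True
    ```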

    The comparison with SC can also be used to guide parameter choices, as shown below:


    [Figure: filter settings suggested by the sparse-coding analogy]

    Others

    The loss function is simply MSE. Using MSE in effect favors a high PSNR, since PSNR is a direct function of MSE (see the formula below). PSNR, however, correlates only partially with perceptual quality, so if a better differentiable loss function were available, it could replace MSE within this framework, which is something the traditional methods cannot easily do.
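    For reference, the relation is the standard one (it is not spelled out in these notes): for pixel values with peak value $MAX_I$ (255 for 8-bit images),

    $\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{MAX_I^2}{\mathrm{MSE}}\right)$

    so minimizing the MSE directly maximizes the PSNR.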

    Training used 91 images, and Set5 and Set14 were used for evaluation at different upscaling factors.

    Synthesis of the training samples: "To synthesize the low-resolution samples {Y_i}, we blur a sub-image by a proper Gaussian kernel, sub-sample it by the upscaling factor, and upscale it by the same factor via bicubic interpolation." The training patches are called sub-images in the paper because, unlike patches, they need no overlapping and averaging: "we mean these samples are treated as small 'images' rather than 'patches', in the sense that 'patches' are overlapping and require some averaging as post-processing but 'sub-images' need not." These sub-images are 32×32.
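    Below is a rough sketch of this synthesis and sub-image extraction, assuming OpenCV. The Gaussian kernel size/sigma and the crop stride are illustrative choices, not values taken from the paper.

    ```python
    import cv2
    import numpy as np

    def make_training_pair(hr_sub_image: np.ndarray, scale: int = 3):
        """Return (blurred-and-reupscaled LR input, HR target) for one sub-image."""
        h, w = hr_sub_image.shape[:2]
        # Blur with a Gaussian kernel, then sub-sample by the upscaling factor.
        blurred = cv2.GaussianBlur(hr_sub_image, (5, 5), 1.0)
        lr = blurred[::scale, ::scale]
        # Upscale back to the original size via bicubic interpolation.
        lr_up = cv2.resize(lr, (w, h), interpolation=cv2.INTER_CUBIC)
        return lr_up, hr_sub_image

    def extract_sub_images(image: np.ndarray, size: int = 32, stride: int = 14):
        """Crop 32x32 sub-images; no overlap-and-average post-processing is needed."""
        for y in range(0, image.shape[0] - size + 1, stride):
            for x in range(0, image.shape[1] - size + 1, stride):
                yield image[y:y + size, x:x + size]
    ```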

    Following [20], we only consider the luminance channel (in YCrCb color space) in our experiments, so c = 1 in the first/last layer. The two chrominance channels are bicubic upsampled only for the purpose of displaying, but not for training/testing.

    The CNN model can of course handle multiple channels; the authors say they use only the luminance channel for a fair comparison with the earlier SC-based methods. To avoid border effects, no padding is used, so the output for a 32×32 sub-image is 20×20 (with the 9-1-5 filter sizes and valid convolutions: 32 - 8 - 0 - 4 = 20).

    On the learning rate: "We empirically find that a smaller learning rate in the last layer is important for the network to converge (similar to the denoising case [12])."
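    One way to express such a layer-wise learning rate, reusing the PyTorch sketch above (the concrete values 1e-4 / 1e-5 are only an illustration of "smaller in the last layer"):

    ```python
    import torch.optim as optim

    # `model` is the SRCNN instance from the sketch above; the reconstruction
    # layer gets a smaller learning rate than the first two layers.
    optimizer = optim.SGD(
        [
            {"params": model.conv1.parameters(), "lr": 1e-4},
            {"params": model.conv2.parameters(), "lr": 1e-4},
            {"params": model.conv3.parameters(), "lr": 1e-5},  # smaller lr for the last layer
        ],
        momentum=0.9,
    )
    ```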

    Training on ImageNet gave even better results.

    On the filter number, i.e. the number of feature maps: using more feature maps improves performance, but if speed matters, fewer filters still give decent results. On the filter size, the paper says: "This suggests that a reasonably larger filter size could grasp richer structural information, which in turn lead to better results. However, the deployment speed will also decrease with a larger filter size. Therefore, the choice of the network scale should always be a trade-off between performance and speed." So larger filters make the results slightly better.


    [Table: results with different filter numbers and filter sizes]

    One comparison figure against various traditional methods is included below. It seems that once PSNR is high enough, small differences in PSNR no longer correspond exactly to differences in visual / perceptual quality, because the human visual system (HVS) is not equally sensitive to all kinds of detail and all locations, and PSNR does not reflect this.


    [Figure: visual comparison of SRCNN with traditional SR methods]

    Reference:
    Dong, Chao, Chen Change Loy, Kaiming He, and Xiaoou Tang. "Learning a Deep Convolutional Network for Image Super-Resolution." In Computer Vision – ECCV 2014, 184–99. Lecture Notes in Computer Science. Springer, Cham, 2014. https://doi.org/10.1007/978-3-319-10593-2_13.

    2018/01/24

    Everyone in the world seeks the art of long life, not realizing that long life lies right before their eyes. I have learned Wanqiu's simple method: just eat porridge, and you will reach immortality. —— Lu You (陆游)
