• Paper Notes [5] SRCNN


    Paper Notes [5] Learning a Deep Convolutional Network for Image Super-Resolution

    Hmm... this is the paper mentioned in the earlier note on deblocking and deringing. It is again from Xiaoou Tang and colleagues: a deep-CNN network for super-resolution, namely SRCNN. The paper gives an end-to-end model: a low-resolution image goes in and a high-resolution image comes out. It also shows that the traditional sparse-coding approach to SR can be expressed as a CNN. On top of that, the network is fairly lightweight, producing good results quickly.

    Super-resolution is actually a very classical problem. Some methods exploit the internal self-similarity of the image, while others learn the mapping from external low- and high-resolution exemplar pairs. Sparse coding (SC) is one of the representative methods for external example-based image super-resolution. The SC pipeline goes like this: extract patches and pre-process them, encode each patch with a low-resolution dictionary to get a sparse coefficient vector, then swap the codebook for a high-resolution dictionary and reconstruct the high-resolution image from those coefficients. So previous work on SC-based methods mainly focused on finding better codebooks, or on other parts of the model.

    Solving SR with a CNN requires no explicit learning of dictionaries, manifolds, or patch-space models; all of that is learned implicitly by the hidden layers. There is also almost no pre-processing or post-processing.

    SRCNN

    SRCNN consists of the following operations:

    1. patch extraction and representation
    2. non-linear mapping
    3. reconstruction

    As shown in the figure below:


    [Figure: the three-stage SRCNN pipeline of patch extraction and representation, non-linear mapping, and reconstruction]

    Patch extraction and representation is simply a convolutional layer that produces n1 feature maps. In the non-linear mapping step, we want to map each n1-dimensional vector to an n2-dimensional vector, which can be implemented with a 1×1 kernel. The paper notes: "It is possible to add more convolutional layers (whose spatial supports are 1 × 1) to increase the non-linearity. But this can significantly increase the complexity of the model, and thus demands more training data and time." That is, although more 1×1 layers would add non-linearity, they would also make the model more complex and demand more data and training time, so the paper uses only a single mapping layer. The last layer convolves back to the original number of channels c. The formulas for the three layers are:


    $F_1(\mathbf{Y}) = \max(0, W_1 * \mathbf{Y} + B_1)$
    $F_2(\mathbf{Y}) = \max(0, W_2 * F_1(\mathbf{Y}) + B_2)$
    $F(\mathbf{Y}) = W_3 * F_2(\mathbf{Y}) + B_3$

    (where $*$ denotes convolution and $\mathbf{Y}$ is the bicubic-upscaled low-resolution input)
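    To make the structure concrete, here is a minimal PyTorch sketch of the three-layer network. The 9-1-5 filter sizes and n1 = 64, n2 = 32 follow the commonly cited base setting of the paper; treat the exact numbers as illustrative rather than definitive.

    ```python
    import torch
    import torch.nn as nn

    class SRCNN(nn.Module):
        """Three-layer SRCNN: patch extraction -> non-linear mapping -> reconstruction."""
        def __init__(self, c=1, n1=64, n2=32):
            super().__init__()
            self.conv1 = nn.Conv2d(c, n1, kernel_size=9)   # patch extraction (f1 = 9)
            self.conv2 = nn.Conv2d(n1, n2, kernel_size=1)  # non-linear mapping (f2 = 1)
            self.conv3 = nn.Conv2d(n2, c, kernel_size=5)   # reconstruction (f3 = 5)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, y):
            # y is the bicubic-upscaled low-resolution image.
            f1 = self.relu(self.conv1(y))   # F1(Y) = max(0, W1 * Y + B1)
            f2 = self.relu(self.conv2(f1))  # F2(Y) = max(0, W2 * F1(Y) + B2)
            return self.conv3(f2)           # F(Y)  = W3 * F2(Y) + B3

    # No padding is used, so a 32x32 sub-image yields a 20x20 output.
    model = SRCNN()
    out = model(torch.randn(1, 1, 32, 32))
    print(out.shape)  # torch.Size([1, 1, 20, 20])
    ```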

    Relationship to Sparse-Coding based Method

    The basic idea of SC-based SR is to take patches from the low-resolution (LR) image and normalize them, project them onto an LR dictionary to obtain coefficients, and then decode back with an HR codebook. In the CNN view, the first layer effectively extracts such a codebook, with the filters playing the role of dictionary atoms; the non-linear mapping then corresponds to the sparse coding solver, since in SC the n1 coefficients have to be projected by the solver onto n2 coefficients (usually with n1 = n2 in SC); finally, the reconstruction step corresponds to synthesis with the high-resolution codebook.


    [Figure: how the sparse-coding pipeline maps onto the three SRCNN operations]

    The paper says: "Our non-linear operator can be considered as a pixel-wise fully-connected layer." Since the kernel is 1×1, it is effectively a fully-connected layer applied across channels at each pixel. In the SC-based methods not every step is optimized: "But not all operations have been considered in the optimization in the sparse-coding-based SR methods. On the contrary, in our convolutional neural network, the low-resolution dictionary, high-resolution dictionary, non-linear mapping, together with mean subtraction and averaging, are all involved in the filters to be optimized." So every step can be jointly optimized.
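    As a quick numerical check of that "pixel-wise fully-connected" reading (my own illustration, not from the paper): a 1×1 convolution gives exactly the same result as applying one linear layer to the channel vector at every pixel.

    ```python
    import torch
    import torch.nn as nn

    n1, n2 = 64, 32
    conv1x1 = nn.Conv2d(n1, n2, kernel_size=1)

    # Build a Linear layer that shares the 1x1 convolution's weights and bias.
    fc = nn.Linear(n1, n2)
    fc.weight.data = conv1x1.weight.data.view(n2, n1)
    fc.bias.data = conv1x1.bias.data

    x = torch.randn(1, n1, 8, 8)
    out_conv = conv1x1(x)
    # Apply the linear layer to the n1-dimensional vector at each spatial position.
    out_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
    print(torch.allclose(out_conv, out_fc, atol=1e-6))  # True
    ```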

    The comparison with SC can also be used to guide parameter choices, as shown below:


    [Figure: filter settings suggested by the sparse-coding analogy]

    Others

    The loss function is simply MSE. Using MSE in effect favors a high PSNR, since PSNR is a direct function of MSE (see the formula below). PSNR, however, correlates only partially with perceptual quality, so if a better differentiable loss function were available, it could replace MSE within this framework, which is something the traditional methods cannot easily do.
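    For reference, the relation is the standard one (it is not spelled out in these notes): for pixel values with peak value $MAX_I$ (255 for 8-bit images),

    $\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{MAX_I^2}{\mathrm{MSE}}\right)$

    so minimizing the MSE directly maximizes the PSNR.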

    Training used 91 images, and Set5 and Set14 were used for evaluation at different upscaling factors.

    Synthesis of the training samples: "To synthesize the low-resolution samples {Y_i}, we blur a sub-image by a proper Gaussian kernel, sub-sample it by the upscaling factor, and upscale it by the same factor via bicubic interpolation." The training patches are called sub-images in the paper because, unlike patches, they need no overlapping and averaging: "we mean these samples are treated as small 'images' rather than 'patches', in the sense that 'patches' are overlapping and require some averaging as post-processing but 'sub-images' need not." These sub-images are 32×32.
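    Below is a rough sketch of this synthesis and sub-image extraction, assuming OpenCV. The Gaussian kernel size/sigma and the crop stride are illustrative choices, not values taken from the paper.

    ```python
    import cv2
    import numpy as np

    def make_training_pair(hr_sub_image: np.ndarray, scale: int = 3):
        """Return (blurred-and-reupscaled LR input, HR target) for one sub-image."""
        h, w = hr_sub_image.shape[:2]
        # Blur with a Gaussian kernel, then sub-sample by the upscaling factor.
        blurred = cv2.GaussianBlur(hr_sub_image, (5, 5), 1.0)
        lr = blurred[::scale, ::scale]
        # Upscale back to the original size via bicubic interpolation.
        lr_up = cv2.resize(lr, (w, h), interpolation=cv2.INTER_CUBIC)
        return lr_up, hr_sub_image

    def extract_sub_images(image: np.ndarray, size: int = 32, stride: int = 14):
        """Crop 32x32 sub-images; no overlap-and-average post-processing is needed."""
        for y in range(0, image.shape[0] - size + 1, stride):
            for x in range(0, image.shape[1] - size + 1, stride):
                yield image[y:y + size, x:x + size]
    ```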

    Following [20], we only consider the luminance channel (in YCrCb color space) in our experiments, so c = 1 in the first/last layer. The two chrominance channels are bicubic upsampled only for the purpose of displaying, but not for training/testing.

    The CNN model can of course handle multiple channels; the authors say they use only the luminance channel for a fair comparison with the earlier SC-based methods. To avoid border effects, no padding is used, so the output for a 32×32 sub-image is 20×20 (with the 9-1-5 filter sizes and valid convolutions: 32 - 8 - 0 - 4 = 20).

    On the learning rate: "We empirically find that a smaller learning rate in the last layer is important for the network to converge (similar to the denoising case [12])."
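    One way to express such a layer-wise learning rate, reusing the PyTorch sketch above (the concrete values 1e-4 / 1e-5 are only an illustration of "smaller in the last layer"):

    ```python
    import torch.optim as optim

    # `model` is the SRCNN instance from the sketch above; the reconstruction
    # layer gets a smaller learning rate than the first two layers.
    optimizer = optim.SGD(
        [
            {"params": model.conv1.parameters(), "lr": 1e-4},
            {"params": model.conv2.parameters(), "lr": 1e-4},
            {"params": model.conv3.parameters(), "lr": 1e-5},  # smaller lr for the last layer
        ],
        momentum=0.9,
    )
    ```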

    Training on ImageNet gave even better results.

    On the filter number, i.e. the number of feature maps: using more feature maps improves performance, but if speed matters, fewer filters still give decent results. On the filter size, the paper says: "This suggests that a reasonably larger filter size could grasp richer structural information, which in turn lead to better results. However, the deployment speed will also decrease with a larger filter size. Therefore, the choice of the network scale should always be a trade-off between performance and speed." So larger filters make the results slightly better.


    [Table: results with different filter numbers and filter sizes]

    One comparison figure against various traditional methods is included below. It seems that once PSNR is high enough, small differences in PSNR no longer correspond exactly to differences in visual / perceptual quality, because the human visual system (HVS) is not equally sensitive to all kinds of detail and all locations, and PSNR does not reflect this.


    [Figure: visual comparison of SRCNN with traditional SR methods]

    Reference:
    Dong, Chao, Chen Change Loy, Kaiming He, and Xiaoou Tang. "Learning a Deep Convolutional Network for Image Super-Resolution." In Computer Vision – ECCV 2014, 184–99. Lecture Notes in Computer Science. Springer, Cham, 2014. https://doi.org/10.1007/978-3-319-10593-2_13.

    2018/01/24

    Everyone in the world seeks the art of long life, not realizing that long life lies right before their eyes. I have learned Wanqiu's simple method: just eat porridge, and you will reach immortality. —— Lu You (陆游)
