Scale-space flow for end-to-end optimized video compression (CVPR 2020)

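The paper presents a generalized warping operator that handles common failure cases better (Q: what exactly is this?).
It proposes scale-space flow, an intuitive generalization of optical flow that adds a scale parameter so the network can model uncertainty better.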

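Bilinear warping samples directly from a 2D source image using a 2-channel displacement field \((f_x,f_y)\).
The paper extends bilinear warping: a 3-channel displacement + scale field \((g_x,g_y,g_z)\) is used to trilinearly sample a 3D scale-space volume.
(Q: How is the displacement field defined and computed? How is the scale field defined and computed? How exactly does trilinear sampling work?)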

Bilinear Warping:
Given an image \(x\) of shape \(H \times W\) and a flow field \(f=(f_x,f_y)\):

\[ x' := \text{Bilinear-Warp}(x,f) \quad \text{s.t.} \quad x'[x,y]=x[x+f_x[x,y],\,y+f_y[x,y]] \]
We refer to the flow channels \(f_x,f_y \in \mathbb{R}^{H \times W}\) as the x- and y-displacement fields of the flow \(f\).
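
The warp above can be sketched in plain Python. This is a minimal illustration rather than the paper's implementation: images are nested lists indexed as `img[y][x]`, and out-of-range sampling positions are clamped to the border, which is an assumption the notes do not cover. The names `bilinear_sample` and `bilinear_warp` are made up for this sketch.

```python
import math

def bilinear_sample(img, xf, yf):
    """Sample img (a list of rows) at the continuous position (xf, yf)."""
    h, w = len(img), len(img[0])
    # Assumption: clamp positions that fall outside the image to the border.
    xf = min(max(xf, 0.0), w - 1.0)
    yf = min(max(yf, 0.0), h - 1.0)
    x0, y0 = int(math.floor(xf)), int(math.floor(yf))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = xf - x0, yf - y0  # fractional offsets inside the pixel cell
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom

def bilinear_warp(img, fx, fy):
    """x'[x, y] = x[x + f_x[x, y], y + f_y[x, y]], with bilinear interpolation."""
    h, w = len(img), len(img[0])
    return [[bilinear_sample(img, x + fx[y][x], y + fy[y][x]) for x in range(w)]
            for y in range(h)]
```

A zero flow returns the image unchanged, and a constant flow of 0.5 in x averages each pixel with its right-hand neighbour.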

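Construct a fixed-resolution scale-space volume \(X=[x,\,x*G(\sigma_0),\,x*G(2\sigma_0),\,\dots,\,x*G(2^{M-1}\sigma_0)]\).
\(x*G(\sigma)\) denotes \(x\) convolved with a Gaussian kernel of scale \(\sigma\).
\(X\) is a stack of progressively blurred versions of \(x\), with dimensions \(H \times W \times (M+1)\).
With trilinear interpolation, \(X\) can be sampled at continuous coordinates \((x,y,z)\).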
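
As a rough sketch of how such a volume could be built, the following pure-Python code blurs the image once per level. The kernel truncation at three standard deviations, the replicated borders, and the helper names are all assumptions made for illustration; the notes only specify the list of kernel sizes.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian taps, truncated at 3 sigma (an assumed choice)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    taps = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def _clamp(v, hi):
    return min(max(v, 0), hi)

def blur(img, sigma):
    """Separable Gaussian blur with replicated borders (an assumed border rule)."""
    h, w = len(img), len(img[0])
    taps = gaussian_kernel(sigma)
    r = len(taps) // 2
    # Horizontal pass, then vertical pass on the result.
    rows = [[sum(t * img[y][_clamp(x + k - r, w - 1)] for k, t in enumerate(taps))
             for x in range(w)] for y in range(h)]
    return [[sum(t * rows[_clamp(y + k - r, h - 1)][x] for k, t in enumerate(taps))
             for x in range(w)] for y in range(h)]

def scale_space_volume(img, sigma0, M):
    """X = [x, x*G(sigma0), x*G(2 sigma0), ..., x*G(2^(M-1) sigma0)]: M + 1 levels."""
    return [img] + [blur(img, sigma0 * 2 ** k) for k in range(M)]
```

Level 0 is the unblurred image, so the returned list has `M + 1` entries, indexed as `volume[z][y][x]`.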

Scale-space Warping:
Define a scale-space flow field as a 3-channel field \(g:=(g_x,g_y,g_z)\).
The scale-space warp of image \(x\) is:

\[ x' := \text{Scale-Space-Warp}(x,g) \quad \text{s.t.} \quad x'[x,y]=X[x+g_x[x,y],\,y+g_y[x,y],\,g_z[x,y]] \]
The third channel \(g_z\) is called the scale field of the scale-space flow \(g\).

We set \(M=5\) in all of our experiments.
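
One way to answer the earlier question about trilinear sampling is the sketch below. It takes a precomputed volume indexed as `volume[z][y][x]` and evaluates the scale-space warp at every pixel. Border clamping and the function names are assumptions for illustration, not details from the paper.

```python
def trilinear_sample(volume, xf, yf, zf):
    """Sample volume[z][y][x] at the continuous position (xf, yf, zf)."""
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    # Assumption: clamp all three coordinates to the valid range.
    xf = min(max(xf, 0.0), w - 1.0)
    yf = min(max(yf, 0.0), h - 1.0)
    zf = min(max(zf, 0.0), d - 1.0)
    x0, y0, z0 = int(xf), int(yf), int(zf)  # non-negative, so int() floors
    x1, y1, z1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1), min(z0 + 1, d - 1)
    ax, ay, az = xf - x0, yf - y0, zf - z0

    def bilinear(level):
        top = (1 - ax) * level[y0][x0] + ax * level[y0][x1]
        bottom = (1 - ax) * level[y1][x0] + ax * level[y1][x1]
        return (1 - ay) * top + ay * bottom

    # Bilinear lookup in the two neighbouring levels, then a blend along z.
    return (1 - az) * bilinear(volume[z0]) + az * bilinear(volume[z1])

def scale_space_warp(volume, gx, gy, gz):
    """x'[x, y] = X[x + g_x[x, y], y + g_y[x, y], g_z[x, y]]."""
    h, w = len(volume[0]), len(volume[0][0])
    return [[trilinear_sample(volume, x + gx[y][x], y + gy[y][x], gz[y][x])
             for x in range(w)] for y in range(h)]
```

With a scale field of zero everywhere the lookup stays in level 0, so the operation reduces to plain bilinear warping of the source image. Larger scale values read from blurrier levels.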

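Trilinear interpolation:
In between two levels \(i \le z < i+1\), with corresponding Gaussian kernel sizes \(\sigma_a\) and \(\sigma_b\), the interpolated result behaves approximately like a blur with effective kernel size

\[ \sigma=\sqrt{(z-i)\,\sigma_a^2 + (1-z+i)\,\sigma_b^2} \]
Read literally, this gives \(\sigma=\sigma_b\) at \(z=i\) and \(\sigma\to\sigma_a\) as \(z\to i+1\).
So, given a desired effective kernel size \(0<\sigma<2^{M-1}\sigma_0\), we can easily solve for the corresponding value of \(z\):

\[ z=i+\frac{\sigma_b^2-\sigma^2}{\sigma_b^2-\sigma_a^2} \]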
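
The two formulas above are inverses of each other, which a short round trip can confirm. The sketch follows the formulas exactly as written in these notes, so `sigma_b` is the kernel size at `z = i` and `sigma_a` the one approached at `z = i + 1`. The numeric values (level `i = 2`, kernel sizes 2 and 4) are made up for the example.

```python
import math

def effective_sigma(z, i, sigma_a, sigma_b):
    """sigma = sqrt((z - i) * sigma_a^2 + (1 - z + i) * sigma_b^2)."""
    t = z - i  # position between the two levels, in [0, 1)
    return math.sqrt(t * sigma_a ** 2 + (1 - t) * sigma_b ** 2)

def solve_z(sigma, i, sigma_a, sigma_b):
    """z = i + (sigma_b^2 - sigma^2) / (sigma_b^2 - sigma_a^2)."""
    return i + (sigma_b ** 2 - sigma ** 2) / (sigma_b ** 2 - sigma_a ** 2)

# Hypothetical values: between levels 2 and 3, with kernel sizes 2.0 and 4.0.
sigma = effective_sigma(2.25, 2, sigma_a=4.0, sigma_b=2.0)
z = solve_z(sigma, 2, sigma_a=4.0, sigma_b=2.0)
```

Here `sigma` is the square root of 7 (0.25 * 16 + 0.75 * 4 = 7), and `solve_z` recovers `z = 2.25`.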
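
Compression Model:
Given a sequence of frames \(x_0,\dots,x_N\):
1. The first frame (the I-frame) is encoded into a latent \(z_0\), which is quantized to integers \([z_0]\) and then decoded into the reconstruction \(\hat{x}_0\).
2. For each subsequent frame (a P-frame), a single neural network jointly estimates and encodes the quantized scale-space warp latents \([w_i]\), from which the scale-space flow \(g_i\) is decoded.
3. The previous reconstruction \(\hat{x}_{i-1}\) is scale-space warped to obtain an estimate \(\overline{x}_i\) of the current frame.
4. Since \(\overline{x}_i\) is imperfect, another branch encodes the residual \(r_i=x_i-\overline{x}_i\) into latents \([v_i]\) and decodes them into \(\hat{r}_i\). The final reconstruction is \(\hat{x}_i=\overline{x}_i+\hat{r}_i\).
There are three kinds of latents in total:
1. image latent \(z_0\)
2. motion compensation latents \(w_i\)
3. residual latents \(v_i\)
For each latent, a separate hyperprior is used to model the corresponding density.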
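
The four steps can be sketched as a coding loop. Everything below is a hypothetical stand-in: the four callables replace the learned networks, frames are flat lists of numbers, and the assumption that the flow coder sees both the current frame and the previous reconstruction is mine, since the notes do not list its inputs. The point is only the data flow, in particular that each P-frame is predicted from the previous reconstruction rather than from the original frame.

```python
def code_sequence(frames, code_iframe, code_flow, warp, code_residual):
    """Closed-loop I/P-frame coding with hypothetical stand-in components.

    code_iframe(x_0)          -> (z_0 latents, reconstruction of x_0)
    code_flow(x_i, prev_hat)  -> (w_i latents, decoded flow g_i)
    warp(prev_hat, g_i)       -> prediction x_bar_i
    code_residual(r_i)        -> (v_i latents, decoded residual r_hat_i)
    """
    z0, recon = code_iframe(frames[0])
    latents, recons = [z0], [recon]
    for x in frames[1:]:
        w, g = code_flow(x, recons[-1])            # step 2
        x_bar = warp(recons[-1], g)                # step 3
        r = [a - b for a, b in zip(x, x_bar)]      # step 4: residual
        v, r_hat = code_residual(r)
        recons.append([a + b for a, b in zip(x_bar, r_hat)])
        latents.append((w, v))
    return latents, recons

def toy_code(values):
    """Toy codec: rounding stands in for encode, quantize, decode."""
    q = [round(v) for v in values]
    return q, [float(v) for v in q]

latents, recons = code_sequence(
    [[0.2, 1.7], [0.6, 2.4]],
    code_iframe=toy_code,
    code_flow=lambda x, prev: (0, 0),   # toy: always "zero motion"
    warp=lambda prev, g: prev,          # toy: the warp is the identity
    code_residual=toy_code,
)
```

In this toy run the second frame is predicted as a copy of the first reconstruction, and only the rounded residual is coded on top of it.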

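Optimize the whole system for the total rate-distortion loss unrolled over \(N\) frames:

\[ \sum_{i=0}^{N-1} d(x_i,\hat{x}_i)+\lambda \Big[H(z_0)+\sum_{i=1}^{N-1}\big(H(v_i)+H(w_i)\big)\Big] \]
\(H(\cdot)\) denotes the entropy estimate of each latent, including the side information extracted by the hyperprior, and \(d\) is a distortion metric such as MSE or MS-SSIM.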
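
As a small worked example of how the terms combine, the sketch below adds up per-frame distortions and entropy estimates that are assumed to be already computed. The function names and all numbers are made up; in the real system the rates would come from the learned entropy models.

```python
def mse(x, x_hat):
    """Mean squared error between two flat lists of pixel values."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def rd_loss(distortions, rate_z0, rates_v, rates_w, lam):
    """Sum of per-frame distortions plus lam times the total rate.

    distortions: d(x_i, x_hat_i) for every frame included in the loss
    rate_z0:     entropy estimate H(z_0) of the I-frame latent
    rates_v/w:   entropy estimates H(v_i), H(w_i), one pair per P-frame
    """
    rate = rate_z0 + sum(v + w for v, w in zip(rates_v, rates_w))
    return sum(distortions) + lam * rate

# Hypothetical numbers: two frames, one of them a P-frame.
loss = rd_loss([0.5, 0.25], rate_z0=100.0, rates_v=[10.0], rates_w=[5.0], lam=0.01)
```

With these values the distortion term is 0.75 and the rate term is 0.01 * 115 = 1.15, for a total of about 1.9.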

[Figure: overview of the end-to-end optimized low-latency compression system]
Original post: https://www.cnblogs.com/hhhhhxh/p/13198645.html