Introduction:
Deconvolution; Computational costs; Strided convolutional nets; Markov patches;
1. Q: The task of texture synthesis has considerable computational costs because of the numerical deconvolution in previous work.
2. A: the authors propose to precompute a feed-forward, strided convolutional network:
This framework can:
1. capture the statistics of Markovian patches.
2. directly generate outputs of arbitrary dimensions.
3. offer a considerable advantage in computation time over previous methods.
4. Traditional complexity constraints use Markov random fields (MRFs), which characterize images by statistics of local patches of pixels.
5. Deep architectures capture appearance variations in object classes beyond the abilities of pixel-level approaches.
6. Two main classes of deep generative models:
1. full-image models, often based on specially trained auto-encoders, which have limited fidelity in details.
2. deep Markovian models, which capture the statistics of local patches and assemble them into high-resolution images.
Advantage: Markovian models have good fidelity in details.
Disadvantage:
if non-trivial global structure is to be generated, additional auxiliary guidance is required;
high computation time.
This naturally provides blending of patches, and allows reusing the complex, emergent multi-level feature representations of large, discriminatively trained neural networks such as VGG [30], repurposing them for image synthesis within a deconvolution framework.
Objective: to improve the efficiency of deep Markovian texture synthesis.
The key idea:
To precompute the inversion of the network by fitting a strided convolutional network [31,29] to the inversion process, which operates purely in a feed-forward fashion.
Despite being trained on fixed-size patches, the resulting network can generate continuous images of arbitrary size without any additional optimization or blending, yielding a high-quality texture synthesizer of a specific style with high performance.
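A minimal sketch of why this works (in PyTorch rather than the paper's Torch/Lua, with made-up layer sizes): a generator built only from convolutions and fractional-strided convolutions has no fixed-size layers, so weights trained on fixed-size patches apply unchanged to inputs of any size.

```python
import torch
import torch.nn as nn

# A toy fully convolutional decoder: no fully connected layers, so the
# same weights work for feature maps of any spatial size.
decoder = nn.Sequential(
    nn.Conv2d(512, 256, kernel_size=3, padding=1),
    nn.ReLU(),
    # fractional-strided ("deconvolution") layers, each upsampling 2x
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(128, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),
)

# Trained on fixed-size feature patches ...
print(decoder(torch.randn(1, 512, 16, 16)).shape)   # (1, 3, 64, 64)
# ... yet applicable to any size at test time, with no re-optimization.
print(decoder(torch.randn(1, 512, 80, 120)).shape)  # (1, 3, 320, 480)
```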
The model:
the framework of DCGANs is applied; the architecture is essentially equivalent to theirs.
Related work
1. Xie et al. [34] proved that a generative random field model can be derived from discriminative networks, and showed applications to unguided texture synthesis.
2. full-image methods use auto-encoders as generative nets.
DCGANs stabilized the training of GANs and showed that the generator has vector arithmetic properties: as with word embeddings, latent vectors can be manipulated to generate new content according to their 'semantics'.
Adversarial nets offer perceptual metrics that allow auto-encoders to be trained more efficiently.
3. The main conceptual difference of this work is the use of Li et al.'s [21] feature-patch statistics rather than learning Gaussian distributions of individual feature vectors, which offers benefits in reproducing textures more faithfully.
Model
Motivation:
1. As the figure shows, real data does not always comply with a Gaussian distribution (a), but instead lies on a complex nonlinear manifold (b). We adversarially learn a mapping to project contextually related patches onto that manifold.
2. Statistics-based methods match the distributions of input and target with a Gaussian model.
3. Adversarial training (GANs) can recognize such a manifold with its discriminative network, and strengthen its generative power with a projection onto the manifold.
4. Adversarial training is improved with contextually corresponding Markovian patches, so that it focuses on depictions of the same context.
Model description:
for D:
D (green blocks) learns to distinguish actual feature patches (on VGG19 layer relu3_1, purple block) from inappropriately synthesized ones.
A second comparison (the D below the pipeline) with the VGG19 encoding of the same image at the higher, more abstract layer relu5_1 can optionally be used to guide content discrimination.
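A hedged sketch of such a D (layer widths and depth are assumptions, not the paper's exact configuration): a small convolutional net over relu3_1 feature maps that emits one raw score per neural patch.

```python
import torch
import torch.nn as nn

# Sketch of D: runs on VGG19 relu3_1 feature maps (256 channels) and
# emits one raw score s_i per spatial location, i.e. per neural patch.
# No sigmoid at the end, since training uses a hinge loss (see below).
discriminator = nn.Sequential(
    nn.Conv2d(256, 128, kernel_size=5, stride=2, padding=2),
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 64, kernel_size=5, stride=2, padding=2),
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, kernel_size=3, padding=1),
)

relu3_1 = torch.randn(1, 256, 32, 32)   # encoding of a 128x128 input
scores = discriminator(relu3_1)
print(scores.shape)                     # torch.Size([1, 1, 8, 8])
```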
for G:
G encodes the input with VGG19 relu4_1 and decodes it to the pixels of the synthesized image.
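A rough sketch of G under the same caveats (channel counts assumed): relu4_1 features sit at 1/8 of the input resolution, so an ordinary convolution followed by three 2x fractional-strided convolutions recovers pixel resolution.

```python
import torch
import torch.nn as nn

# Sketch of G: decode VGG19 relu4_1 features (512 channels, 1/8 of the
# input resolution) back to pixels with an ordinary convolution and a
# cascade of three fractional-strided convolutions (2x upsampling each).
generator = nn.Sequential(
    nn.Conv2d(512, 256, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),  # synthesized pixels in [-1, 1]
)

relu4_1 = torch.randn(1, 512, 16, 16)   # encoding of a 128x128 input
print(generator(relu4_1).shape)         # torch.Size([1, 3, 128, 128])
```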
for MDANs: the deconvolutional synthesis process is driven by adversarial training.
1. D (green blocks) is trained to distinguish between "neural patches" sampled from the synthesis image and those sampled from the example image.
2. the score $(1 - s_{i})$ of each sampled patch is its texture loss.
with loss function (Eq. 1 of the paper):
$x = \arg\min_{x} E_{t}(\Phi(x), \Phi(x_{t})) + \alpha_{1} E_{c}(\Phi(x), \Phi(x_{c})) + \alpha_{2} \Upsilon(x)$
$E_{t}$ denotes the texture loss between the example texture image $x_{t}$ and the synthesized image, $E_{c}$ the content loss, and $\Upsilon(x)$ a smoothness regularizer; $\Phi$ is the VGG feature map.
We initialize $x$ with random noise for un-guided synthesis, or with a content image $x_{c}$ for guided synthesis.
with hinge loss (Eq. 2 of the paper):
$E_{t}(\Phi(x), \Phi(x_{t})) = \frac{1}{N} \sum_{i=1}^{N} \max(0, 1 - 1 \times s_{i})$
Here $s_{i}$ denotes the classification score of the $i$-th neural patch, and $N$ is the total number of sampled patches.
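In code this texture loss is a one-liner; a sketch assuming `scores` holds the raw discriminator outputs $s_{i}$:

```python
import torch

def texture_loss(scores: torch.Tensor) -> torch.Tensor:
    """Hinge texture loss E_t = (1/N) * sum_i max(0, 1 - s_i), where
    `scores` holds the raw score s_i of each sampled neural patch."""
    return torch.clamp(1.0 - scores, min=0.0).mean()

scores = torch.tensor([2.3, 0.4, -1.1])   # example patch scores s_i
print(texture_loss(scores))               # (0 + 0.6 + 2.1) / 3 = 0.9
```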
for MGANs:
1. G decodes a picture through an ordinary convolution followed by a cascade of fractional-strided convolutions (FS Conv).
Although trained with fixed-size inputs, the generator naturally extends to arbitrary-size images.
2. A Euclidean-distance loss function would yield over-smooth images.
3. Compared with GANs, this model does not operate on full images but on neural patches, which makes learning easier through the contextual correspondence between the patches.
4. The sigmoid cross-entropy loss is replaced by a hinge loss.
Experiment details
1. augment the dataset with rotations and scales.
2. sample 128-by-128 subwindows; neural patches are sampled from their relu3_1 encodings as the input of D.
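A sketch of this sampling step (patch size and stride are assumptions; torchvision's pretrained VGG19 stands in for the paper's network):

```python
import torch
import torchvision.models as models

# VGG19 up to relu3_1 (layer index 11 in torchvision's feature list).
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features[:12].eval()

image = torch.rand(1, 3, 128, 128)       # one sampled 128x128 subwindow
with torch.no_grad():
    feat = vgg(image)                    # relu3_1 encoding: (1, 256, 32, 32)

# Extract 8x8 neural patches with stride 4 as inputs for D
# (both numbers are assumptions, not the paper's exact values).
patches = feat.unfold(2, 8, 4).unfold(3, 8, 4)          # (1, 256, 7, 7, 8, 8)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, 256, 8, 8)
print(patches.shape)                     # torch.Size([49, 256, 8, 8])
```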
for Training
The training process has three main steps:
- Use MDAN to generate training images (MDAN_wrapper.lua).
- Data Augmentation (AG_wrapper.lua).
- Train MGAN (MGAN_wrapper.lua).