GAN:通过 将 样本 特征 化 以后, 告诉 模型 哪些 样本 是 黑 哪些 是 白, 模型 通过 训练 后, 理解 了 黑白 样本 的 区别, 再输入 测试 样本 时, 模型 就可以 根据 以往 的 经验 判断 是 黑 还是 白。 与 这些 分类 的 算法 不同, GAN 的 基本 原理 是, 有两 个 相生相克 的 模型 Generator 和 Discriminator,Generator 随机 生成 样本, Discriminator 将 真实 样本 标记 为 Real, 将 Generator 生成 的 样本 标记 为 Fake 进行 监督 学习, 然后 Generator 随机 生成 新的 样本, 标记 为 Real 企图 欺骗 Discriminator, Discriminator 反馈 给 Generator 判断 的 结果, Generator 得到 反馈 后 就可以 生成 更 逼真 的 样本 直到 可以 完全 欺骗 Discriminator(???实际应用中生成的数据和真实数据没有啥区别啊???无非一个是程序生成的,一个是现实世界真实的????现实应用的话,图像识别是不是可以用来伪造照片???嗯,看来就是这样!!!DNS tunnel或者DGA的话,可以直接利用GAN来生成DGA域名,骗过判别模型,因为他们和白样本数据几乎没有区别)。
GAN 之所以 可以 生成 逼真 的 样本, 很大 程度 上 依赖于 对抗 模型, 所谓 的 对抗 模型 其实 就是 Generator 和 Discriminator 结合 在一起, 前者 的 输出 作为 后者 的 输入, 然后 使用 欺骗 性的 标记, 企图 欺骗 Discriminator。
DCGAN 的 训练 过程 分为 两步: 第一步, 生成 一个 大小 为( BATCH_ SIZE, 100) 的 在- 1 ~ 1 平均 分布 的 噪声, 使用 Generator 生成 图像 样本, 然后 和 同样 大小 的 真实 MNIST 图像 样本 合并, 分别 标记 为 0 和 1, 对 Discriminator 进行 训练。 这个 过程中 Discriminator 的 trainable 状态 为 True, 训练过 程 会 更新 其 参数。
第 二步, 生成 一个 大小 为( BATCH_ SIZE, 100) 的 在- 1 ~ 1 平均 分布 的 噪声, 使用 Generator 生成 图像 样本, 标记 为 1, 欺骗 Discriminator, 这个 过程 针对 对抗 模型 进行 训练。
过程中 Discriminator 的 trainable 状态 为 False, 训练 过程 不会 更新 其 参数。 训练 完成 后 将 重新 把 Discriminator 的 trainable 状态 设为 True。
tflearn GAN:
# -*- coding: utf-8 -*- """ GAN Example Use a generative adversarial network (GAN) to generate digit images from a noise distribution. References: - Generative adversarial nets. I Goodfellow, J Pouget-Abadie, M Mirza, B Xu, D Warde-Farley, S Ozair, Y. Bengio. Advances in neural information processing systems, 2672-2680. Links: - [GAN Paper](https://arxiv.org/pdf/1406.2661.pdf). """ from __future__ import division, print_function, absolute_import import matplotlib.pyplot as plt import numpy as np import tensorflow as tf import tflearn # Data loading and preprocessing import tflearn.datasets.mnist as mnist X, Y, testX, testY = mnist.load_data() image_dim = 784 # 28*28 pixels z_dim = 200 # Noise data points total_samples = len(X) # Generator def generator(x, reuse=False): with tf.variable_scope('Generator', reuse=reuse): x = tflearn.fully_connected(x, 256, activation='relu') x = tflearn.fully_connected(x, image_dim, activation='sigmoid') return x # Discriminator def discriminator(x, reuse=False): with tf.variable_scope('Discriminator', reuse=reuse): x = tflearn.fully_connected(x, 256, activation='relu') x = tflearn.fully_connected(x, 1, activation='sigmoid') return x # Build Networks gen_input = tflearn.input_data(shape=[None, z_dim], name='input_noise') disc_input = tflearn.input_data(shape=[None, 784], name='disc_input') gen_sample = generator(gen_input) disc_real = discriminator(disc_input) disc_fake = discriminator(gen_sample, reuse=True) # Define Loss disc_loss = -tf.reduce_mean(tf.log(disc_real) + tf.log(1. - disc_fake)) gen_loss = -tf.reduce_mean(tf.log(disc_fake)) # Build Training Ops for both Generator and Discriminator. # Each network optimization should only update its own variable, thus we need # to retrieve each network variables (with get_layer_variables_by_scope) and set # 'placeholder=None' because we do not need to feed any target. gen_vars = tflearn.get_layer_variables_by_scope('Generator') gen_model = tflearn.regression(gen_sample, placeholder=None, optimizer='adam', loss=gen_loss, trainable_vars=gen_vars, batch_size=64, name='target_gen', op_name='GEN') disc_vars = tflearn.get_layer_variables_by_scope('Discriminator') disc_model = tflearn.regression(disc_real, placeholder=None, optimizer='adam', loss=disc_loss, trainable_vars=disc_vars, batch_size=64, name='target_disc', op_name='DISC') # Define GAN model, that output the generated images. gan = tflearn.DNN(gen_model) # Training # Generate noise to feed to the generator z = np.random.uniform(-1., 1., size=[total_samples, z_dim]) # Start training, feed both noise and real images. gan.fit(X_inputs={gen_input: z, disc_input: X}, Y_targets=None, n_epoch=100) # Generate images from noise, using the generator network. f, a = plt.subplots(2, 10, figsize=(10, 4)) for i in range(10): for j in range(2): # Noise input. z = np.random.uniform(-1., 1., size=[1, z_dim]) # Generate image from noise. Extend to 3 channels for matplot figure. temp = [[ii, ii, ii] for ii in list(gan.predict([z])[0])] a[j][i].imshow(np.reshape(temp, (28, 28, 3))) f.show() plt.draw() plt.waitforbuttonpress()
tflearn DCGAN:
# -*- coding: utf-8 -*- """ DCGAN Example Use a deep convolutional generative adversarial network (DCGAN) to generate digit images from a noise distribution. References: - Unsupervised representation learning with deep convolutional generative adversarial networks. A Radford, L Metz, S Chintala. arXiv:1511.06434. Links: - [DCGAN Paper](https://arxiv.org/abs/1511.06434). """ from __future__ import division, print_function, absolute_import import matplotlib.pyplot as plt import numpy as np import tensorflow as tf import tflearn # Data loading and preprocessing import tflearn.datasets.mnist as mnist X, Y, testX, testY = mnist.load_data() X = np.reshape(X, newshape=[-1, 28, 28, 1]) z_dim = 200 # Noise data points total_samples = len(X) # Generator def generator(x, reuse=False): with tf.variable_scope('Generator', reuse=reuse): x = tflearn.fully_connected(x, n_units=7 * 7 * 128) x = tflearn.batch_normalization(x) x = tf.nn.tanh(x) x = tf.reshape(x, shape=[-1, 7, 7, 128]) x = tflearn.upsample_2d(x, 2) x = tflearn.conv_2d(x, 64, 5, activation='tanh') x = tflearn.upsample_2d(x, 2) x = tflearn.conv_2d(x, 1, 5, activation='sigmoid') return x # Discriminator def discriminator(x, reuse=False): with tf.variable_scope('Discriminator', reuse=reuse): x = tflearn.conv_2d(x, 64, 5, activation='tanh') x = tflearn.avg_pool_2d(x, 2) x = tflearn.conv_2d(x, 128, 5, activation='tanh') x = tflearn.avg_pool_2d(x, 2) x = tflearn.fully_connected(x, 1024, activation='tanh') x = tflearn.fully_connected(x, 2) x = tf.nn.softmax(x) return x # Input Data gen_input = tflearn.input_data(shape=[None, z_dim], name='input_gen_noise') input_disc_noise = tflearn.input_data(shape=[None, z_dim], name='input_disc_noise') input_disc_real = tflearn.input_data(shape=[None, 28, 28, 1], name='input_disc_real') # Build Discriminator disc_fake = discriminator(generator(input_disc_noise)) disc_real = discriminator(input_disc_real, reuse=True) disc_net = tf.concat([disc_fake, disc_real], axis=0) # Build Stacked Generator/Discriminator gen_net = generator(gen_input, reuse=True) stacked_gan_net = discriminator(gen_net, reuse=True) # Build Training Ops for both Generator and Discriminator. # Each network optimization should only update its own variable, thus we need # to retrieve each network variables (with get_layer_variables_by_scope). disc_vars = tflearn.get_layer_variables_by_scope('Discriminator') # We need 2 target placeholders, for both the real and fake image target. disc_target = tflearn.multi_target_data(['target_disc_fake', 'target_disc_real'], shape=[None, 2]) disc_model = tflearn.regression(disc_net, optimizer='adam', placeholder=disc_target, loss='categorical_crossentropy', trainable_vars=disc_vars, batch_size=64, name='target_disc', op_name='DISC') gen_vars = tflearn.get_layer_variables_by_scope('Generator') gan_model = tflearn.regression(stacked_gan_net, optimizer='adam', loss='categorical_crossentropy', trainable_vars=gen_vars, batch_size=64, name='target_gen', op_name='GEN') # Define GAN model, that output the generated images. gan = tflearn.DNN(gan_model) # Training # Prepare input data to feed to the discriminator disc_noise = np.random.uniform(-1., 1., size=[total_samples, z_dim]) # Prepare target data to feed to the discriminator (0: fake image, 1: real image) y_disc_fake = np.zeros(shape=[total_samples]) y_disc_real = np.ones(shape=[total_samples]) y_disc_fake = tflearn.data_utils.to_categorical(y_disc_fake, 2) y_disc_real = tflearn.data_utils.to_categorical(y_disc_real, 2) # Prepare input data to feed to the stacked generator/discriminator gen_noise = np.random.uniform(-1., 1., size=[total_samples, z_dim]) # Prepare target data to feed to the discriminator # Generator tries to fool the discriminator, thus target is 1 (e.g. real images) y_gen = np.ones(shape=[total_samples]) y_gen = tflearn.data_utils.to_categorical(y_gen, 2) # Start training, feed both noise and real images. gan.fit(X_inputs={'input_gen_noise': gen_noise, 'input_disc_noise': disc_noise, 'input_disc_real': X}, Y_targets={'target_gen': y_gen, 'target_disc_fake': y_disc_fake, 'target_disc_real': y_disc_real}, n_epoch=10) # Create another model from the generator graph to generate some samples # for testing (re-using same session to re-use the weights learnt). gen = tflearn.DNN(gen_net, session=gan.session) f, a = plt.subplots(4, 10, figsize=(10, 4)) for i in range(10): # Noise input. z = np.random.uniform(-1., 1., size=[4, z_dim]) g = np.array(gen.predict({'input_gen_noise': z})) for j in range(4): # Generate image from noise. Extend to 3 channels for matplot figure. img = np.reshape(np.repeat(g[j][:, :, np.newaxis], 3, axis=2), newshape=(28, 28, 3)) a[j][i].imshow(img) f.show() plt.draw() plt.waitforbuttonpress()
keras DCGAN
# -*- coding: utf-8 -*- """ Train an Auxiliary Classifier Generative Adversarial Network (ACGAN) on the MNIST dataset. See https://arxiv.org/abs/1610.09585 for more details. You should start to see reasonable images after ~5 epochs, and good images by ~15 epochs. You should use a GPU, as the convolution-heavy operations are very slow on the CPU. Prefer the TensorFlow backend if you plan on iterating, as the compilation time can be a blocker using Theano. Timings: Hardware | Backend | Time / Epoch ------------------------------------------- CPU | TF | 3 hrs Titan X (maxwell) | TF | 4 min Titan X (maxwell) | TH | 7 min Consult https://github.com/lukedeo/keras-acgan for more information and example output """ from __future__ import print_function from collections import defaultdict try: import cPickle as pickle except ImportError: import pickle from PIL import Image from six.moves import range from keras.datasets import mnist from keras import layers from keras.layers import Input, Dense, Reshape, Flatten, Embedding, Dropout from keras.layers import BatchNormalization from keras.layers.advanced_activations import LeakyReLU from keras.layers.convolutional import Conv2DTranspose, Conv2D from keras.models import Sequential, Model from keras.optimizers import Adam from keras.utils.generic_utils import Progbar import numpy as np np.random.seed(1337) num_classes = 10 def build_generator(latent_size): # we will map a pair of (z, L), where z is a latent vector and L is a # label drawn from P_c, to image space (..., 28, 28, 1) cnn = Sequential() cnn.add(Dense(3 * 3 * 384, input_dim=latent_size, activation='relu')) cnn.add(Reshape((3, 3, 384))) # upsample to (7, 7, ...) cnn.add(Conv2DTranspose(192, 5, strides=1, padding='valid', activation='relu', kernel_initializer='glorot_normal')) cnn.add(BatchNormalization()) # upsample to (14, 14, ...) cnn.add(Conv2DTranspose(96, 5, strides=2, padding='same', activation='relu', kernel_initializer='glorot_normal')) cnn.add(BatchNormalization()) # upsample to (28, 28, ...) cnn.add(Conv2DTranspose(1, 5, strides=2, padding='same', activation='tanh', kernel_initializer='glorot_normal')) # this is the z space commonly referred to in GAN papers latent = Input(shape=(latent_size, )) # this will be our label image_class = Input(shape=(1,), dtype='int32') cls = Flatten()(Embedding(num_classes, latent_size, embeddings_initializer='glorot_normal')(image_class)) # hadamard product between z-space and a class conditional embedding h = layers.multiply([latent, cls]) fake_image = cnn(h) return Model([latent, image_class], fake_image) def build_discriminator(): # build a relatively standard conv net, with LeakyReLUs as suggested in # the reference paper cnn = Sequential() cnn.add(Conv2D(32, 3, padding='same', strides=2, input_shape=(28, 28, 1))) cnn.add(LeakyReLU(0.2)) cnn.add(Dropout(0.3)) cnn.add(Conv2D(64, 3, padding='same', strides=1)) cnn.add(LeakyReLU(0.2)) cnn.add(Dropout(0.3)) cnn.add(Conv2D(128, 3, padding='same', strides=2)) cnn.add(LeakyReLU(0.2)) cnn.add(Dropout(0.3)) cnn.add(Conv2D(256, 3, padding='same', strides=1)) cnn.add(LeakyReLU(0.2)) cnn.add(Dropout(0.3)) cnn.add(Flatten()) image = Input(shape=(28, 28, 1)) features = cnn(image) # first output (name=generation) is whether or not the discriminator # thinks the image that is being shown is fake, and the second output # (name=auxiliary) is the class that the discriminator thinks the image # belongs to. fake = Dense(1, activation='sigmoid', name='generation')(features) aux = Dense(num_classes, activation='softmax', name='auxiliary')(features) return Model(image, [fake, aux]) if __name__ == '__main__': # batch and latent size taken from the paper epochs = 100 batch_size = 100 latent_size = 100 # Adam parameters suggested in https://arxiv.org/abs/1511.06434 adam_lr = 0.0002 adam_beta_1 = 0.5 # build the discriminator print('Discriminator model:') discriminator = build_discriminator() discriminator.compile( optimizer=Adam(lr=adam_lr, beta_1=adam_beta_1), loss=['binary_crossentropy', 'sparse_categorical_crossentropy'] ) discriminator.summary() # build the generator generator = build_generator(latent_size) latent = Input(shape=(latent_size, )) image_class = Input(shape=(1,), dtype='int32') # get a fake image fake = generator([latent, image_class]) # we only want to be able to train generation for the combined model discriminator.trainable = False fake, aux = discriminator(fake) combined = Model([latent, image_class], [fake, aux]) print('Combined model:') combined.compile( optimizer=Adam(lr=adam_lr, beta_1=adam_beta_1), loss=['binary_crossentropy', 'sparse_categorical_crossentropy'] ) combined.summary() # get our mnist data, and force it to be of shape (..., 28, 28, 1) with # range [-1, 1] (x_train, y_train), (x_test, y_test) = mnist.load_data() x_train = (x_train.astype(np.float32) - 127.5) / 127.5 x_train = np.expand_dims(x_train, axis=-1) x_test = (x_test.astype(np.float32) - 127.5) / 127.5 x_test = np.expand_dims(x_test, axis=-1) num_train, num_test = x_train.shape[0], x_test.shape[0] train_history = defaultdict(list) test_history = defaultdict(list) for epoch in range(1, epochs + 1): print('Epoch {}/{}'.format(epoch, epochs)) num_batches = int(x_train.shape[0] / batch_size) progress_bar = Progbar(target=num_batches) # we don't want the discriminator to also maximize the classification # accuracy of the auxiliary classifier on generated images, so we # don't train discriminator to produce class labels for generated # images (see https://openreview.net/forum?id=rJXTf9Bxg). # To preserve sum of sample weights for the auxiliary classifier, # we assign sample weight of 2 to the real images. disc_sample_weight = [np.ones(2 * batch_size), np.concatenate((np.ones(batch_size) * 2, np.zeros(batch_size)))] epoch_gen_loss = [] epoch_disc_loss = [] for index in range(num_batches): # generate a new batch of noise noise = np.random.uniform(-1, 1, (batch_size, latent_size)) # get a batch of real images image_batch = x_train[index * batch_size:(index + 1) * batch_size] label_batch = y_train[index * batch_size:(index + 1) * batch_size] # sample some labels from p_c sampled_labels = np.random.randint(0, num_classes, batch_size) # generate a batch of fake images, using the generated labels as a # conditioner. We reshape the sampled labels to be # (batch_size, 1) so that we can feed them into the embedding # layer as a length one sequence generated_images = generator.predict( [noise, sampled_labels.reshape((-1, 1))], verbose=0) x = np.concatenate((image_batch, generated_images)) # use one-sided soft real/fake labels # Salimans et al., 2016 # https://arxiv.org/pdf/1606.03498.pdf (Section 3.4) soft_zero, soft_one = 0, 0.95 y = np.array([soft_one] * batch_size + [soft_zero] * batch_size) aux_y = np.concatenate((label_batch, sampled_labels), axis=0) # see if the discriminator can figure itself out... epoch_disc_loss.append(discriminator.train_on_batch( x, [y, aux_y], sample_weight=disc_sample_weight)) # make new noise. we generate 2 * batch size here such that we have # the generator optimize over an identical number of images as the # discriminator noise = np.random.uniform(-1, 1, (2 * batch_size, latent_size)) sampled_labels = np.random.randint(0, num_classes, 2 * batch_size) # we want to train the generator to trick the discriminator # For the generator, we want all the {fake, not-fake} labels to say # not-fake trick = np.ones(2 * batch_size) * soft_one epoch_gen_loss.append(combined.train_on_batch( [noise, sampled_labels.reshape((-1, 1))], [trick, sampled_labels])) progress_bar.update(index + 1) print('Testing for epoch {}:'.format(epoch)) # evaluate the testing loss here # generate a new batch of noise noise = np.random.uniform(-1, 1, (num_test, latent_size)) # sample some labels from p_c and generate images from them sampled_labels = np.random.randint(0, num_classes, num_test) generated_images = generator.predict( [noise, sampled_labels.reshape((-1, 1))], verbose=False) x = np.concatenate((x_test, generated_images)) y = np.array([1] * num_test + [0] * num_test) aux_y = np.concatenate((y_test, sampled_labels), axis=0) # see if the discriminator can figure itself out... discriminator_test_loss = discriminator.evaluate( x, [y, aux_y], verbose=False) discriminator_train_loss = np.mean(np.array(epoch_disc_loss), axis=0) # make new noise noise = np.random.uniform(-1, 1, (2 * num_test, latent_size)) sampled_labels = np.random.randint(0, num_classes, 2 * num_test) trick = np.ones(2 * num_test) generator_test_loss = combined.evaluate( [noise, sampled_labels.reshape((-1, 1))], [trick, sampled_labels], verbose=False) generator_train_loss = np.mean(np.array(epoch_gen_loss), axis=0) # generate an epoch report on performance train_history['generator'].append(generator_train_loss) train_history['discriminator'].append(discriminator_train_loss) test_history['generator'].append(generator_test_loss) test_history['discriminator'].append(discriminator_test_loss) print('{0:<22s} | {1:4s} | {2:15s} | {3:5s}'.format( 'component', *discriminator.metrics_names)) print('-' * 65) ROW_FMT = '{0:<22s} | {1:<4.2f} | {2:<15.4f} | {3:<5.4f}' print(ROW_FMT.format('generator (train)', *train_history['generator'][-1])) print(ROW_FMT.format('generator (test)', *test_history['generator'][-1])) print(ROW_FMT.format('discriminator (train)', *train_history['discriminator'][-1])) print(ROW_FMT.format('discriminator (test)', *test_history['discriminator'][-1])) # save weights every epoch generator.save_weights( 'params_generator_epoch_{0:03d}.hdf5'.format(epoch), True) discriminator.save_weights( 'params_discriminator_epoch_{0:03d}.hdf5'.format(epoch), True) # generate some digits to display num_rows = 40 noise = np.tile(np.random.uniform(-1, 1, (num_rows, latent_size)), (num_classes, 1)) sampled_labels = np.array([ [i] * num_rows for i in range(num_classes) ]).reshape(-1, 1) # get a batch to display generated_images = generator.predict( [noise, sampled_labels], verbose=0) # prepare real images sorted by class label real_labels = y_train[(epoch - 1) * num_rows * num_classes: epoch * num_rows * num_classes] indices = np.argsort(real_labels, axis=0) real_images = x_train[(epoch - 1) * num_rows * num_classes: epoch * num_rows * num_classes][indices] # display generated images, white separator, real images img = np.concatenate( (generated_images, np.repeat(np.ones_like(x_train[:1]), num_rows, axis=0), real_images)) # arrange them into a grid img = (np.concatenate([r.reshape(-1, 28) for r in np.split(img, 2 * num_classes + 1) ], axis=-1) * 127.5 + 127.5).astype(np.uint8) Image.fromarray(img).save( 'plot_epoch_{0:03d}_generated.png'.format(epoch)) with open('acgan-history.pkl', 'wb') as f: pickle.dump({'train': train_history, 'test': test_history}, f)
下文from:https://blog.csdn.net/qq_25737169/article/details/78857724
GAN(Generative adversarial nets),中文是生成对抗网络,他是一种生成式模型,也是一种无监督学习模型。其最大的特点是为深度网络提供了一种对抗训练的方式,此方式有助于解决一些普通训练方式不容易解决的问题。并且Yan lecun明确表示GAN是近几十年除了面包机最伟大的发明,并且希望是自己发明的GAN。
关于GAN的入门可以参考机器学习算法全栈工程师公众号之前的文章:GAN入门与实践,里面包含了GAN的入门介绍以及生成人脸图片的实践tensorflow代码。
1. GAN诞生背后的故事:
学术界流传,GAN创始人 Ian Goodfellow 在酒吧微醉后与同事讨论学术问题,当时灵光乍现提出了GAN初步的想法,不过当时并没有得到同事的认可,在从酒吧回去后发现女朋友已经睡了,于是自己熬夜写了代码,发现还真有效果,于是经过一番研究后,GAN就诞生了,一篇开山之作。附上一张大神照片。
Ian goodfellow
2. GAN的原理:
GAN的主要灵感来源于博弈论中零和博弈的思想,应用到深度学习神经网络上来说,就是通过生成网络G(Generator)和判别网络D(Discriminator)不断博弈,进而使G学习到数据的分布,如果用到图片生成上,则训练完成后,G可以从一段随机数中生成逼真的图像。G, D的主要功能是:
● G是一个生成式的网络,它接收一个随机的噪声z(随机数),通过这个噪声生成图像
● D是一个判别网络,判别一张图片是不是“真实的”。它的输入参数是x,x代表一张图片,输出D(x)代表x为真实图片的概率,如果为1,就代表100%是真实的图片,而输出为0,就代表不可能是真实的图片
训练过程中,生成网络G的目标就是尽量生成真实的图片去欺骗判别网络D。而D的目标就是尽量辨别出G生成的假图像和真实的图像。这样,G和D构成了一个动态的“博弈过程”,最终的平衡点即纳什均衡点.
3. GAN的特点:
● 相比较传统的模型,他存在两个不同的网络,而不是单一的网络,并且训练方式采用的是对抗训练方式
● GAN中G的梯度更新信息来自判别器D,而不是来自数据样本
4. GAN 的优点:
(以下部分摘自ian goodfellow 在Quora的问答)
● GAN是一种生成式模型,相比较其他生成模型(玻尔兹曼机和GSNs)只用到了反向传播,而不需要复杂的马尔科夫链
● 相比其他所有模型, GAN可以产生更加清晰,真实的样本
● GAN采用的是一种无监督的学习方式训练,可以被广泛用在无监督学习和半监督学习领域
● 相比于变分自编码器, GANs没有引入任何决定性偏置( deterministic bias),变分方法引入决定性偏置,因为他们优化对数似然的下界,而不是似然度本身,这看起来导致了VAEs生成的实例比GANs更模糊
● 相比VAE, GANs没有变分下界,如果鉴别器训练良好,那么生成器可以完美的学习到训练样本的分布.换句话说,GANs是渐进一致的,但是VAE是有偏差的
● GAN应用到一些场景上,比如图片风格迁移,超分辨率,图像补全,去噪,避免了损失函数设计的困难,不管三七二十一,只要有一个的基准,直接上判别器,剩下的就交给对抗训练了。
5. GAN的缺点:
● 训练GAN需要达到纳什均衡,有时候可以用梯度下降法做到,有时候做不到.我们还没有找到很好的达到纳什均衡的方法,所以训练GAN相比VAE或者PixelRNN是不稳定的,但我认为在实践中它还是比训练玻尔兹曼机稳定的多
● GAN不适合处理离散形式的数据,比如文本
● GAN存在训练不稳定、梯度消失、模式崩溃的问题(目前已解决)
模式崩溃(model collapse)原因
一般出现在GAN训练不稳定的时候,具体表现为生成出来的结果非常差,但是即使加长训练时间后也无法得到很好的改善。
具体原因可以解释如下:GAN采用的是对抗训练的方式,G的梯度更新来自D,所以G生成的好不好,得看D怎么说。具体就是G生成一个样本,交给D去评判,D会输出生成的假样本是真样本的概率(0-1),相当于告诉G生成的样本有多大的真实性,G就会根据这个反馈不断改善自己,提高D输出的概率值。但是如果某一次G生成的样本可能并不是很真实,但是D给出了正确的评价,或者是G生成的结果中一些特征得到了D的认可,这时候G就会认为我输出的正确的,那么接下来我就这样输出肯定D还会给出比较高的评价,实际上G生成的并不怎么样,但是他们两个就这样自我欺骗下去了,导致最终生成结果缺失一些信息,特征不全。
关于梯度消失的问题可以参考郑华滨的令人拍案叫绝的wassertein GAN,里面给出了详细的解释,不过多重复。
局部极小值点
鞍点
为什么GAN中的优化器不常用SGD
1. SGD容易震荡,容易使GAN训练不稳定,
2. GAN的目的是在高维非凸的参数空间中找到纳什均衡点,GAN的纳什均衡点是一个鞍点,但是SGD只会找到局部极小值,因为SGD解决的是一个寻找最小值的问题,GAN是一个博弈问题。
为什么GAN不适合处理文本数据
1. 文本数据相比较图片数据来说是离散的,因为对于文本来说,通常需要将一个词映射为一个高维的向量,最终预测的输出是一个one-hot向量,假设softmax的输出是(0.2, 0.3, 0.1,0.2,0.15,0.05)那么变为onehot是(0,1,0,0,0,0),如果softmax输出是(0.2, 0.25, 0.2, 0.1,0.15,0.1 ),one-hot仍然是(0, 1, 0, 0, 0, 0),所以对于生成器来说,G输出了不同的结果但是D给出了同样的判别结果,并不能将梯度更新信息很好的传递到G中去,所以D最终输出的判别没有意义。
2. 另外就是GAN的损失函数是JS散度,JS散度不适合衡量不想交分布之间的距离。
(WGAN虽然使用wassertein距离代替了JS散度,但是在生成文本上能力还是有限,GAN在生成文本上的应用有seq-GAN,和强化学习结合的产物)
训练GAN的一些技巧
1. 输入规范化到(-1,1)之间,最后一层的激活函数使用tanh(BEGAN除外)
2. 使用wassertein GAN的损失函数,
3. 如果有标签数据的话,尽量使用标签,也有人提出使用反转标签效果很好,另外使用标签平滑,单边标签平滑或者双边标签平滑
4. 使用mini-batch norm, 如果不用batch norm 可以使用instance norm 或者weight norm
5. 避免使用RELU和pooling层,减少稀疏梯度的可能性,可以使用leakrelu激活函数
6. 优化器尽量选择ADAM,学习率不要设置太大,初始1e-4可以参考,另外可以随着训练进行不断缩小学习率,
7. 给D的网络层增加高斯噪声,相当于是一种正则
GAN的变种
自从GAN出世后,得到了广泛研究,先后几百篇不同的GANpaper横空出世,国外有大神整理了一个GAN zoo(GAN动物园),链接如下,感兴趣的可以参考一下:
https://github.com/hindupuravinash/the-gan-zoo
GitHub上已经1200+star了,顺便附上一张GAN的成果图,可见GAN的研究火热程度:
由于GAN的变种实在太多,这里我只简单介绍几种比较常常用的成果,包括DCGAN,, WGAN, improved-WGAN,BEGAN,并附有详细的代码github链接。
GAN的广泛应用
1. GAN本身是一种生成式模型,所以在数据生成上用的是最普遍的,最常见的是图片生成,常用的有DCGAN WGAN,BEGAN,个人感觉在BEGAN的效果最好而且最简单。
2. GAN本身也是一种无监督学习的典范,因此它在无监督学习,半监督学习领域都有广泛的应用,比较好的论文有
Improved Techniques for Training GANs
Bayesian GAN(最新)
Good Semi-supervised Learning
3. 不仅在生成领域,GAN在分类领域也占有一席之地,简单来说,就是替换判别器为一个分类器,做多分类任务,而生成器仍然做生成任务,辅助分类器训练。
4. GAN可以和强化学习结合,目前一个比较好的例子就是seq-GAN
5. 目前比较有意思的应用就是GAN用在图像风格迁移,图像降噪修复,图像超分辨率了,都有比较好的结果,详见pix-2-pix GAN 和cycle GAN。但是GAN目前在视频生成上和预测上还不是很好。
6. 目前也有研究者将GAN用在对抗性攻击上,具体就是训练GAN生成对抗文本,有针对或者无针对的欺骗分类器或者检测系统等等,但是目前没有见到很典范的文章。
注:配图来自网络
参考文献:
https://www.zhihu.com/question/56171002/answer/148593584
http://www.inference.vc/instance-noise-a-trick-for-stabilising-gan-training/
https://github.com/soumith/ganhacks
https://github.com/hindupuravinash/the-gan-zoo
https://zhuanlan.zhihu.com/p/25071913