Training process

Machine learning datasets that can be used:

  1. https://www.kaggle.com/gasgallo/faces-data-new
  2. https://www.kaggle.com/gasgallo/lag-dataset

Both contain face images. I combined the two into a single folder.

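Choosing the right network for the task

The two image generation techniques you hear about most often are generative adversarial networks (GANs) and LSTM networks.

LSTMs are very slow to train, while GANs train much faster. In practice, blurry faces start to appear in less than half an hour, and the images become more realistic as training continues.

There are many GAN variants. The one I use is called a deep convolutional generative adversarial network (DCGAN). The advantage of a DCGAN is that it uses convolutional layers. Convolutional neural networks are currently among the best-performing algorithms for image classification.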

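Introduction

Generative adversarial networks were invented by a researcher named Ian Goodfellow, who introduced them in 2014.

GANs are very powerful. With the right data, network architecture, and hyperparameters, you can generate very realistic images.

In the future, more advanced versions of GANs, or other content generation algorithms, may let us do some cool things: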

  1. Generate realistic video games.
  2. Generate movies.
  3. Generate 3D designs for new technology (better cars, spaceships, and so on).

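But how does a GAN work?

A GAN is actually not one neural network but two. One of them is the generator. It takes random values as input and generates an image.

The second is the discriminator. It tries to determine whether an image is fake or real.

Training a GAN is like a competition. The generator tries to become as good as possible at fooling the discriminator. The discriminator tries to separate fake images from real ones as well as it can.

This forces both of them to improve. Ideally, at some point this leads to the following situation: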

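  1. The images produced by the generator are indistinguishable from real images to a human.
  2. The accuracy of the discriminator network reaches 50%. In other words, the discriminator cannot tell real from fake, so it has to guess every time.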
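
The competition described above is usually written as a minimax game. The formulation below is the standard one from the GAN literature rather than something taken from this article's code. Here D(x) stands for the discriminator's estimated probability that image x is real, and G(z) for the image the generator produces from random noise z.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

When the generator's distribution matches the data distribution, the best the discriminator can do is output D(x) = 1/2 for every image, which corresponds to the 50% accuracy in the list above.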

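In practice, you need to make sure everything is right (data, architecture, hyperparameters). GANs are very sensitive to small changes in hyperparameter values.

Neural network architecture

Python implementation

Importing libraries

The first step is to import all the Python libraries that are needed.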

# Import everything that is needed from the Keras library.
# All layers are imported from keras.layers; the older submodule paths
# (keras.layers.advanced_activations, keras.layers.convolutional) are not
# available in recent Keras versions.
from keras.layers import (Input, Reshape, Dropout, Dense, Flatten,
                          BatchNormalization, Activation, ZeroPadding2D,
                          LeakyReLU, UpSampling2D, Conv2D)
from keras.models import Sequential, Model, load_model
from keras.optimizers import Adam
# matplotlib will help with displaying the results
import matplotlib.pyplot as plt
# numpy for some mathematical operations
import numpy as np
# PIL for opening, resizing and saving images
from PIL import Image
# tqdm for a progress bar when loading the dataset
from tqdm import tqdm
# os library is needed for extracting filenames from the dataset folder.
import os

The FaceGenerator class

class FaceGenerator:
    # RGB images: 3 channels, grayscale: 1 channel, RGBA images: 4 channels
    def __init__(self, image_width, image_height, channels):
        self.image_width = image_width
        self.image_height = image_height
        self.channels = channels
        self.image_shape = (self.image_width, self.image_height, self.channels)

        # Amount of randomly generated numbers for the first layer of the generator.
        self.random_noise_dimension = 100

        # Arguments are the learning rate (0.0002) and beta_1 (0.5).
        # Just 10 times higher learning rate would result in generator loss being stuck at 0.
        optimizer = Adam(0.0002, 0.5)

        self.discriminator = self.build_discriminator()
        self.discriminator.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])
        self.generator = self.build_generator()

        # A placeholder for the generator input.
        random_input = Input(shape=(self.random_noise_dimension,))

        # Generator generates images from random noise.
        generated_image = self.generator(random_input)

        # For the combined model we will only train the generator
        self.discriminator.trainable = False

        # Discriminator attempts to determine if image is real or generated
        validity = self.discriminator(generated_image)

        # Combined model = generator and discriminator combined.
        # 1. Takes random noise as an input.
        # 2. Generates an image.
        # 3. Attempts to determine if image is real or generated.
        self.combined = Model(random_input, validity)
        self.combined.compile(loss="binary_crossentropy", optimizer=optimizer)

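This Python code initializes some important variables needed for training.

  • image_width, image_height = the size of the generated images, in pixels
  • channels = the number of color channels in the generated images
  • random_noise_dimension = the number of random values the generator takes as input
  • optimizer = the optimizer used for backpropagation
  • discriminator = a convolutional neural network that tries to determine whether an image is fake or real
  • generator = a convolutional neural network that generates images
  • random_input = a placeholder for the random values. We will use it to feed random values to the generator.
  • generated_image = the output of the generator
  • validity = the discriminator's output for a generated image, that is, how well the generator fooled the discriminator
  • combined = the generator and the discriminator combined into one model. The generator is not trained on its own; it is trained through the combined model. This is necessary so that the loss can be backpropagated from the discriminator's output to the generator.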

Loading the training data into the model

    def get_training_data(self, datafolder):
        print("Loading training data...")
        training_data = []

        # Finds all files in datafolder
        filenames = os.listdir(datafolder)
        for filename in tqdm(filenames):
            # Combines folder name and file name.
            path = os.path.join(datafolder, filename)
            # Opens an image as an Image object.
            image = Image.open(path)
            # Resizes to a desired size.
            # Image.LANCZOS is the same filter as the former Image.ANTIALIAS,
            # which was removed in Pillow 10.
            image = image.resize((self.image_width, self.image_height), Image.LANCZOS)
            # Creates an array of pixel values from the image.
            pixel_array = np.asarray(image)
            training_data.append(pixel_array)

        # training_data is converted to a numpy array
        training_data = np.reshape(training_data, (-1, self.image_width, self.image_height, self.channels))
        return training_data

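This function takes the name of a folder as input and returns all images in that folder as a numpy array. All images are resized to the size specified in the __init__ function.

Shape = (number of images, width, height, channels).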
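
Because get_training_data opens every entry that os.listdir returns, a stray non-image file or a sub-folder in the data folder would most likely make Image.open fail. The following is a minimal sketch of filtering the folder first. The helper name list_image_files and the extension list are assumptions for illustration and are not part of the original class.

```python
import os

# Hypothetical helper, not part of the original FaceGenerator class.
# The extension list is an assumption; adjust it to the dataset in use.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_image_files(datafolder):
    """Return sorted paths of the files in datafolder that look like images."""
    paths = []
    for filename in sorted(os.listdir(datafolder)):
        path = os.path.join(datafolder, filename)
        # Skip sub-folders and files with other extensions (e.g. .txt, .csv).
        if os.path.isfile(path) and filename.lower().endswith(IMAGE_EXTENSIONS):
            paths.append(path)
    return paths
```

The loop in get_training_data could then iterate over list_image_files(datafolder) instead of the raw os.listdir result.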

The neural networks

    def build_generator(self):
        # Generator attempts to fool discriminator by generating new images.
        model = Sequential()

        model.add(Dense(256 * 4 * 4, activation="relu", input_dim=self.random_noise_dimension))
        model.add(Reshape((4, 4, 256)))

        # Four layers of upsampling, convolution, batch normalization and activation.
        # 1. Upsampling: Input data is repeated. Default is (2,2). In that case a 4x4x256 array becomes an 8x8x256 array.
        # 2. Convolution
        # 3. Normalization normalizes outputs from convolution.
        # 4. Relu activation: f(x) = max(0,x). If x < 0, then f(x) = 0.
        model.add(UpSampling2D())
        model.add(Conv2D(256, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))

        model.add(UpSampling2D())
        model.add(Conv2D(256, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))

        model.add(UpSampling2D())
        model.add(Conv2D(128, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))

        model.add(UpSampling2D())
        model.add(Conv2D(128, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))

        # Last convolutional layer outputs as many featuremaps as channels in the final image.
        model.add(Conv2D(self.channels, kernel_size=3, padding="same"))
        # tanh maps everything to a range between -1 and 1.
        model.add(Activation("tanh"))

        # show the summary of the model architecture
        model.summary()

        # Placeholder for the random noise input
        # (named noise_input so that the built-in input() is not shadowed)
        noise_input = Input(shape=(self.random_noise_dimension,))
        # Model output
        generated_image = model(noise_input)

        # Change the model type from Sequential to Model (functional API). More at: https://keras.io/models/model/.
        return Model(noise_input, generated_image)

    def build_discriminator(self):
        # Discriminator attempts to classify real and generated images
        model = Sequential()

        model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=self.image_shape, padding="same"))
        # Leaky relu is similar to usual relu. If x < 0 then f(x) = x * alpha, otherwise f(x) = x.
        model.add(LeakyReLU(alpha=0.2))
        # Dropout blocks some connections randomly. This helps the model to generalize better.
        # 0.25 means that every connection has a 25% chance of being blocked.
        model.add(Dropout(0.25))

        model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
        # Zero padding adds additional rows and columns to the image. Those rows and columns are made of zeros.
        model.add(ZeroPadding2D(padding=((0, 1), (0, 1))))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))

        model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))

        model.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))

        model.add(Conv2D(512, kernel_size=3, strides=1, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))

        # Flatten layer flattens the output of the previous layer to a single dimension.
        model.add(Flatten())
        # Outputs a value between 0 and 1 that predicts whether image is real or generated. 0 = generated, 1 = real.
        model.add(Dense(1, activation="sigmoid"))

        model.summary()

        input_image = Input(shape=self.image_shape)
        # Model output given an image.
        validity = model(input_image)

        return Model(input_image, validity)

These two functions define the generator and the discriminator.
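
To see how the layer settings determine the image size, the spatial dimensions can be followed by hand. The sketch below does that arithmetic in plain Python, without Keras. It assumes the usual rule that a convolution with padding="same" and stride s turns a side of length n into ceil(n / s); the function names are made up for this illustration.

```python
import math


def conv_same(size, stride):
    """Spatial output size of a convolution with padding="same"."""
    return math.ceil(size / stride)


def generator_output_size(start=4, upsampling_layers=4, factor=2):
    """Each UpSampling2D layer multiplies width and height by `factor`.

    The stride-1 "same" convolutions in between leave the size unchanged.
    """
    size = start
    for _ in range(upsampling_layers):
        size *= factor
    return size


def discriminator_feature_size(size):
    """Follow one spatial dimension through the discriminator layers."""
    size = conv_same(size, 2)  # Conv2D(32, strides=2)
    size = conv_same(size, 2)  # Conv2D(64, strides=2)
    size += 1                  # ZeroPadding2D(((0,1),(0,1))) adds one row and one column
    size = conv_same(size, 2)  # Conv2D(128, strides=2)
    size = conv_same(size, 1)  # Conv2D(256, strides=1)
    size = conv_same(size, 1)  # Conv2D(512, strides=1)
    return size


side = generator_output_size()               # 4 -> 8 -> 16 -> 32 -> 64
features = discriminator_feature_size(side)  # 64 -> 32 -> 16 -> 17 -> 9 -> 9 -> 9
flattened = features * features * 512        # inputs to the final Dense(1) layer
print(side, features, flattened)             # prints: 64 9 41472
```

One consequence of this arithmetic is that the generator as written always produces 64x64 images, whatever image_width and image_height are set to. Generating a different size would presumably require changing the starting 4x4 shape or the number of upsampling layers.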

Training the neural network model

    def train(self, datafolder, epochs, batch_size, save_images_interval):
        # Get the real images
        training_data = self.get_training_data(datafolder)

        # Map all values to a range between -1 and 1.
        training_data = training_data / 127.5 - 1.

        # Two arrays of labels. Labels for real images: [1,1,1 ... 1,1,1], labels for generated images: [0,0,0 ... 0,0,0]
        labels_for_real_images = np.ones((batch_size, 1))
        labels_for_generated_images = np.zeros((batch_size, 1))

        for epoch in range(epochs):
            # Select a random batch of real images (sampled with replacement)
            indices = np.random.randint(0, training_data.shape[0], batch_size)
            real_images = training_data[indices]

            # Generate random noise for a whole batch.
            random_noise = np.random.normal(0, 1, (batch_size, self.random_noise_dimension))
            # Generate a batch of new images.
            generated_images = self.generator.predict(random_noise)

            # Train the discriminator on real images.
            discriminator_loss_real = self.discriminator.train_on_batch(real_images, labels_for_real_images)
            # Train the discriminator on generated images.
            discriminator_loss_generated = self.discriminator.train_on_batch(generated_images, labels_for_generated_images)
            # Calculate the average discriminator loss.
            discriminator_loss = 0.5 * np.add(discriminator_loss_real, discriminator_loss_generated)

            # Train the generator using the combined model. Generator tries to trick discriminator into mistaking generated images as real.
            generator_loss = self.combined.train_on_batch(random_noise, labels_for_real_images)
            print("%d [Discriminator loss: %f, acc.: %.2f%%] [Generator loss: %f]" % (epoch, discriminator_loss[0], 100 * discriminator_loss[1], generator_loss))

            if epoch % save_images_interval == 0:
                self.save_images(epoch)

        # Save the model for a later use (create the output folder if it is missing)
        os.makedirs("saved_models", exist_ok=True)
        self.generator.save("saved_models/facegenerator.h5")

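For each epoch:

  1. Randomly select a batch of real images to use in this epoch (batch_size images, sampled with replacement).
  2. Create an array of random numbers drawn from a standard normal distribution (mean 0, standard deviation 1). This will be the input to the generator. Shape = (batch_size, self.random_noise_dimension)
  3. Generate new images. The number of generated images equals the batch size.
  4. Train the discriminator to tell real images from generated ones.
  5. Calculate the average loss of the discriminator.
  6. Train the generator using the combined model.
  7. Print the loss values.
  8. If the epoch number is a multiple of save_images_interval, generate images and save them.

After training has finished:

  • Save the trained model for later use.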
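
Both training steps use the binary cross-entropy loss, and the only thing that changes is the label. The sketch below computes that loss for a single prediction in plain Python to show why training the combined model with "real" labels pushes the generator towards fooling the discriminator. It is a simplified illustration: Keras averages the loss over a whole batch, and the probabilities used here are made-up example values.

```python
import math


def binary_crossentropy(label, prediction, eps=1e-7):
    """Binary cross-entropy for one label (0 or 1) and one predicted probability."""
    p = min(max(prediction, eps), 1 - eps)  # clip to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


# Discriminator step: real images are labelled 1, generated images 0.
d_loss_real = binary_crossentropy(1, 0.9)       # fairly sure a real image is real
d_loss_generated = binary_crossentropy(0, 0.2)  # fairly sure a generated image is fake
d_loss = 0.5 * (d_loss_real + d_loss_generated)

# Generator step: generated images are labelled 1 ("real"), so the loss is
# small only when the discriminator is fooled.
g_loss_fooled = binary_crossentropy(1, 0.9)  # discriminator says "real"
g_loss_caught = binary_crossentropy(1, 0.1)  # discriminator says "generated"

# A discriminator that can only guess (0.5) gives log(2) for either label.
guess_loss = binary_crossentropy(1, 0.5)
```

A discriminator loss that settles near log(2), roughly 0.693, is therefore one sign that it has been reduced to guessing.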

Displaying the results

    def save_images(self, epoch):
        # Save 25 generated images for demonstration purposes using matplotlib.pyplot.
        rows, columns = 5, 5
        noise = np.random.normal(0, 1, (rows * columns, self.random_noise_dimension))
        generated_images = self.generator.predict(noise)

        # Map values from the tanh range (-1 to 1) to the range 0 to 1 for display.
        generated_images = 0.5 * generated_images + 0.5

        figure, axis = plt.subplots(rows, columns)
        image_count = 0
        for row in range(rows):
            for column in range(columns):
                axis[row, column].imshow(generated_images[image_count, :], cmap="spring")
                axis[row, column].axis("off")
                image_count += 1

        # Create the output folder if it is missing, then save the grid.
        os.makedirs("generated_images", exist_ok=True)
        figure.savefig("generated_images/generated_%d.png" % epoch)
        plt.close(figure)

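This function generates 25 new images from random noise and saves them as a 5x5 grid, so you can follow the generator's progress during training.

Miscellaneous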

    def generate_single_image(self, model_path, image_save_path):
        noise = np.random.normal(0, 1, (1, self.random_noise_dimension))
        model = load_model(model_path)
        generated_image = model.predict(noise)
        # Normalized (-1 to 1) pixel values to the real (0 to 255) pixel values.
        generated_image = (generated_image + 1) * 127.5
        print(generated_image)
        # Drop the batch dimension. From (1,w,h,c) to (w,h,c)
        generated_image = np.reshape(generated_image, self.image_shape)
        # PIL needs 8-bit integers for an RGB image; passing floats produces a garbled picture.
        generated_image = np.clip(generated_image, 0, 255).astype(np.uint8)

        # A uint8 array with 3 channels is interpreted as an RGB image.
        image = Image.fromarray(generated_image)
        image.save(image_save_path)


if __name__ == '__main__':
    facegenerator = FaceGenerator(64, 64, 3)
    facegenerator.train(datafolder="data", epochs=4000, batch_size=32, save_images_interval=100)
    facegenerator.generate_single_image("saved_models/facegenerator.h5", "test.png")
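
The code converts between three value ranges: 8-bit pixels (0 to 255), the tanh range the networks work in (-1 to 1), and the 0 to 1 floats used when plotting. The sketch below checks those conversions on single values in plain Python. The function names are made up for this illustration, and rounding to the nearest integer is one possible choice rather than what the original code does.

```python
def to_model_range(pixel):
    """Map a pixel value in [0, 255] to [-1, 1], as done in train()."""
    return pixel / 127.5 - 1.0


def to_pixel(value):
    """Map a tanh output in [-1, 1] back to an integer pixel in [0, 255]."""
    pixel = round((value + 1.0) * 127.5)
    return min(max(pixel, 0), 255)  # clamp in case the value is slightly out of range


def to_display_range(value):
    """Map [-1, 1] to [0, 1], as done in save_images() before plotting."""
    return 0.5 * value + 0.5


# Every 8-bit pixel value survives the round trip through the model range.
round_trip_ok = all(to_pixel(to_model_range(p)) == p for p in range(256))
```

The largest valid pixel value is 255, not 256, which is why an output of exactly 1.0 maps to 255.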

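Conclusion

Training a GAN is hard, and it feels very rewarding when you succeed.

This Python code can easily be used with other image datasets. Keep in mind that you may need to edit the network architecture and parameters, depending on the images you are trying to generate.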