Author | Li Qiujian
Editor | Carol
Cover image | downloaded by CSDN from Visual China
In recent years, GAN-based image generation has been applied ever more widely, largely because the adversarial game lets a GAN keep improving its modelling ability until it generates images that pass for real. A GAN consists of two neural networks, a generator and a discriminator: the generator tries to produce realistic samples that fool the discriminator, while the discriminator tries to tell real samples from generated ones. This adversarial game drives both networks to improve, and once a Nash equilibrium is reached the generator can produce convincingly realistic output.
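The two objectives in this game can be sketched numerically. Below is a minimal NumPy illustration (not the article's model; the logit values are made up) of the standard discriminator loss and the non-saturating generator loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical discriminator scores (logits) on one real and one generated sample
real_logit, fake_logit = 2.0, -1.0
d_real, d_fake = sigmoid(real_logit), sigmoid(fake_logit)

# Discriminator: push D(real) -> 1 and D(fake) -> 0
d_loss = -np.log(d_real) - np.log(1.0 - d_fake)

# Generator (non-saturating form): push D(fake) -> 1
g_loss = -np.log(d_fake)

print(round(float(d_loss), 4), round(float(g_loss), 4))  # 0.4402 1.3133
```

As the generator improves, `fake_logit` rises, which lowers `g_loss` and raises `d_loss`; at equilibrium the discriminator can no longer separate the two distributions.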
Image generation is the most prominent GAN application, but there are many others in computer vision, such as image inpainting, image captioning, object detection, and semantic segmentation. Applying GANs in natural language processing is also a growing research trend, for example in text modelling, dialogue generation, question answering, and machine translation. However, training GANs on NLP tasks is harder and requires more techniques, which makes it a challenging but interesting research area.
Today we will use CC-GAN to train a model that generates frontal faces from profile faces. The result after 20 training iterations is shown below:
Preparation
We use Python 3.6.5 with the following modules: tensorflow for building the network layers and training the model; numpy for matrix operations; OpenCV for reading and processing images; and the os module for local file operations such as reading the dataset.
Data preparation
Face images at different angles, used for training, are placed in the following folder as the training set, as shown below:
The test set images are shown below:
Building the model
The original GAN (see "GAN 簡介與代碼戰" for an introduction) can in theory approximate real data arbitrarily well, but it is hard to control: it does fine on small images, while large generated images can be incoherent. To generate the images we actually want, some constraints must be added to the GAN, and this is where CGAN comes in. The overall structure of the CCGAN model is as follows:
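The conditioning mechanism used by the generator below can be illustrated outside TensorFlow: a one-hot pose label is reshaped, tiled across the image's spatial dimensions, and appended as extra channels, so the network "sees" the condition at every pixel. A NumPy sketch with toy sizes (the 4x4 image and 9-way label are illustrative assumptions):

```python
import numpy as np

# Hypothetical sizes: one 4x4 RGB image and a 9-way one-hot pose label
image = np.zeros((1, 4, 4, 3), dtype=np.float32)
pose = np.zeros((1, 9), dtype=np.float32)
pose[0, 2] = 1.0  # e.g. pose class 2

# Reshape to [N, 1, 1, 9] and tile over the spatial dimensions
pose_map = np.tile(pose.reshape(1, 1, 1, 9), (1, 4, 4, 1))

# Concatenate along the channel axis: 3 image channels + 9 label channels
conditioned = np.concatenate([image, pose_map], axis=-1)
print(conditioned.shape)  # (1, 4, 4, 12)
```

This mirrors the `tf.reshape` / `tf.tile` / `tf.concat` sequence applied to `org_pose` and `trg_pose` in the generator code further down.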
1. Building the network structure:
First we define the normalization, activation, and pooling helpers. batch_norm normalizes activations across each batch to zero mean and unit variance, which prevents gradients within the same batch from cancelling each other out. instance_norm instead subtracts the mean and divides by the standard deviation for each sample individually, which can speed up training.
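The difference between the two comes down to which axes the statistics are computed over. A NumPy sketch (toy shapes, independent of the article's code):

```python
import numpy as np

x = np.random.randn(8, 16, 16, 4)  # [batch, height, width, channels]
eps = 1e-5

# Batch norm: one mean/variance per channel, pooled over batch and spatial dims
bn = (x - x.mean(axis=(0, 1, 2), keepdims=True)) \
     / np.sqrt(x.var(axis=(0, 1, 2), keepdims=True) + eps)

# Instance norm: one mean/variance per sample and channel, over spatial dims only
inorm = (x - x.mean(axis=(1, 2), keepdims=True)) \
        / np.sqrt(x.var(axis=(1, 2), keepdims=True) + eps)
```

After batch norm each channel is standardized across the whole batch; after instance norm every individual sample is standardized on its own, which is why instance norm behaves identically at batch size 1.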
import tensorflow as tf
import tensorflow.contrib as tf_contrib
import tensorflow.contrib.slim as slim

def instance_norm(x, scope='instance_norm'):
    return tf_contrib.layers.instance_norm(x, epsilon=1e-05, center=True, scale=True, scope=scope)

def batch_norm(x, scope='batch_norm'):
    return tf_contrib.layers.batch_norm(x, decay=0.9, epsilon=1e-05, center=True, scale=True, scope=scope)

def flatten(x):
    return tf.layers.flatten(x)

def lrelu(x, alpha=0.2):
    return tf.nn.leaky_relu(x, alpha)

def relu(x):
    return tf.nn.relu(x)

def global_avg_pooling(x):
    gap = tf.reduce_mean(x, axis=[1, 2], keepdims=True)
    return gap

def resblock(x_init, c, scope='resblock'):
    with tf.variable_scope(scope):
        with tf.variable_scope('res1'):
            x = slim.conv2d(x_init, c, kernel_size=[3, 3], stride=1, activation_fn=None)
            x = batch_norm(x)
            x = relu(x)
        with tf.variable_scope('res2'):
            x = slim.conv2d(x, c, kernel_size=[3, 3], stride=1, activation_fn=None)
            x = batch_norm(x)
        return x + x_init

Next, the convolution block:
def conv(x, c):
    # Parallel 5x5, 3x3 and 1x1 stride-2 convolutions, fused by a 1x1 convolution
    x1 = slim.conv2d(x, c, kernel_size=[5, 5], stride=2, padding='SAME', activation_fn=relu)
    x2 = slim.conv2d(x, c, kernel_size=[3, 3], stride=2, padding='SAME', activation_fn=relu)
    x3 = slim.conv2d(x, c, kernel_size=[1, 1], stride=2, padding='SAME', activation_fn=relu)
    out = tf.concat([x1, x2, x3], axis=3)
    out = slim.conv2d(out, c, kernel_size=[1, 1], stride=1, padding='SAME', activation_fn=None)
    return out

The generator:
def mixgenerator(x_init, c, org_pose, trg_pose):
    reuse = len([t for t in tf.global_variables() if t.name.startswith('generator')]) > 0
    with tf.variable_scope('generator', reuse=reuse):
        # Tile the source pose label over the spatial dims and append it as channels
        org_pose = tf.cast(tf.reshape(org_pose, shape=[-1, 1, 1, org_pose.shape[-1]]), tf.float32)
        org_pose = tf.tile(org_pose, [1, x_init.shape[1], x_init.shape[2], 1])
        x = tf.concat([x_init, org_pose], axis=-1)
        # Encoder: five stride-2 conv blocks, 128 -> 4
        x = conv(x, c)
        x = batch_norm(x, scope='bat_norm_1')
        x = relu(x)  # 64
        x = conv(x, c*2)
        x = batch_norm(x, scope='bat_norm_2')
        x = relu(x)  # 32
        x = conv(x, c*4)
        x = batch_norm(x, scope='bat_norm_3')
        x = relu(x)  # 16
        f_org = x
        x = conv(x, c*8)
        x = batch_norm(x, scope='bat_norm_4')
        x = relu(x)  # 8
        x = conv(x, c*8)
        x = batch_norm(x, scope='bat_norm_5')
        x = relu(x)  # 4
        # Bottleneck: six residual blocks
        for i in range(6):
            x = resblock(x, c*8, scope=str(i) + "_resblock")
        # Tile the target pose label and append it to the bottleneck features
        trg_pose = tf.cast(tf.reshape(trg_pose, shape=[-1, 1, 1, trg_pose.shape[-1]]), tf.float32)
        trg_pose = tf.tile(trg_pose, [1, x.shape[1], x.shape[2], 1])
        x = tf.concat([x, trg_pose], axis=-1)
        # Decoder: five stride-2 transposed conv blocks, 4 -> 128
        x = slim.conv2d_transpose(x, c*8, kernel_size=[3, 3], stride=2, activation_fn=None)
        x = batch_norm(x, scope='bat_norm_8')
        x = relu(x)  # 8
        x = slim.conv2d_transpose(x, c*4, kernel_size=[3, 3], stride=2, activation_fn=None)
        x = batch_norm(x, scope='bat_norm_9')
        x = relu(x)  # 16
        f_trg = x
        x = slim.conv2d_transpose(x, c*2, kernel_size=[3, 3], stride=2, activation_fn=None)
        x = batch_norm(x, scope='bat_norm_10')
        x = relu(x)  # 32
        x = slim.conv2d_transpose(x, c, kernel_size=[3, 3], stride=2, activation_fn=None)
        x = batch_norm(x, scope='bat_norm_11')
        x = relu(x)  # 64
        z = slim.conv2d_transpose(x, 3, kernel_size=[3, 3], stride=2, activation_fn=tf.nn.tanh)
        f = tf.concat([f_org, f_trg], axis=-1)
        return z, f

The discriminator and the remaining functions are defined in the same style and are omitted here.
2. Setting up the VGG network:
Building the VGG model layers:
def build(self, rgb, include_fc=False):
    """Load variables from npy to build the VGG.

    Input format: RGB image with shape [batch_size, h, w, 3], scaled to (-1, 1).
    """
    start_time = time.time()
    rgb_scaled = (rgb + 1) / 2  # [-1, 1] -> [0, 1]
    # blue, green, red = tf.split(axis=3, num_or_size_splits=3, value=rgb_scaled)
    # bgr = tf.concat(axis=3, values=[blue - VGG_MEAN[0],
    #                                 green - VGG_MEAN[1],
    #                                 red - VGG_MEAN[2]])
    self.conv1_1 = self.conv_layer(rgb_scaled, "conv1_1")
    self.conv1_2 = self.conv_layer(self.conv1_1, "conv1_2")
    self.pool1 = self.max_pool(self.conv1_2, 'pool1')

    self.conv2_1 = self.conv_layer(self.pool1, "conv2_1")
    self.conv2_2 = self.conv_layer(self.conv2_1, "conv2_2")
    self.pool2 = self.max_pool(self.conv2_2, 'pool2')

    self.conv3_1 = self.conv_layer(self.pool2, "conv3_1")
    self.conv3_2_no_activation = self.no_activation_conv_layer(self.conv3_1, "conv3_2")
    self.conv3_2 = self.conv_layer(self.conv3_1, "conv3_2")
    self.conv3_3 = self.conv_layer(self.conv3_2, "conv3_3")
    self.conv3_4 = self.conv_layer(self.conv3_3, "conv3_4")
    self.pool3 = self.max_pool(self.conv3_4, 'pool3')

    self.conv4_1 = self.conv_layer(self.pool3, "conv4_1")
    self.conv4_2 = self.conv_layer(self.conv4_1, "conv4_2")
    self.conv4_3 = self.conv_layer(self.conv4_2, "conv4_3")
    self.conv4_4_no_activation = self.no_activation_conv_layer(self.conv4_3, "conv4_4")
    self.conv4_4 = self.conv_layer(self.conv4_3, "conv4_4")
    self.pool4 = self.max_pool(self.conv4_4, 'pool4')

    self.conv5_1 = self.conv_layer(self.pool4, "conv5_1")
    self.conv5_2 = self.conv_layer(self.conv5_1, "conv5_2")
    self.conv5_3 = self.conv_layer(self.conv5_2, "conv5_3")
    self.conv5_4_no_activation = self.no_activation_conv_layer(self.conv5_3, "conv5_4")
    self.conv5_4 = self.conv_layer(self.conv5_3, "conv5_4")
    self.pool5 = self.max_pool(self.conv5_4, 'pool5')

    if include_fc:
        self.fc6 = self.fc_layer(self.pool5, "fc6")
        assert self.fc6.get_shape().as_list()[1:] == [4096]
        self.relu6 = tf.nn.relu(self.fc6)
        self.fc7 = self.fc_layer(self.relu6, "fc7")
        self.relu7 = tf.nn.relu(self.fc7)
        self.fc8 = self.fc_layer(self.relu7, "fc8")
        self.prob = tf.nn.softmax(self.fc8, name="prob")

    self.data_dict = None
    print("Finished building vgg19: %ds" % (time.time() - start_time))

The pooling, convolution, and fully connected helpers:
def avg_pool(self, bottom, name):
    return tf.nn.avg_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)

def max_pool(self, bottom, name):
    return tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)

def conv_layer(self, bottom, name):
    with tf.variable_scope(name):
        filt = self.get_conv_filter(name)
        conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME')
        conv_biases = self.get_bias(name)
        bias = tf.nn.bias_add(conv, conv_biases)
        relu = tf.nn.relu(bias)
        return relu

def no_activation_conv_layer(self, bottom, name):
    with tf.variable_scope(name):
        filt = self.get_conv_filter(name)
        conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME')
        conv_biases = self.get_bias(name)
        x = tf.nn.bias_add(conv, conv_biases)
        return x

def fc_layer(self, bottom, name):
    with tf.variable_scope(name):
        shape = bottom.get_shape().as_list()
        dim = 1
        for d in shape[1:]:
            dim *= d
        x = tf.reshape(bottom, [-1, dim])
        weights = self.get_fc_weight(name)
        biases = self.get_bias(name)
        # Fully connected layer. Note that the '+' operation automatically
        # broadcasts the biases.
        fc = tf.nn.bias_add(tf.matmul(x, weights), biases)
        return fc

def get_conv_filter(self, name):
    return tf.constant(self.data_dict[name][0], name="filter")

def get_bias(self, name):
    return tf.constant(self.data_dict[name][1], name="biases")

def get_fc_weight(self, name):
    return tf.constant(self.data_dict[name][0], name="weights")
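The fc_layer above flattens all non-batch dimensions into one before the matrix multiply. That reshape can be checked in isolation; the (2, 7, 7, 512) shape below is an illustrative assumption matching a typical VGG pool5 output:

```python
import numpy as np

x = np.zeros((2, 7, 7, 512))          # e.g. a VGG pool5 activation
dim = int(np.prod(x.shape[1:]))       # 7 * 7 * 512 = 25088
flat = x.reshape(-1, dim)
print(flat.shape)  # (2, 25088)
```

The batch dimension is preserved while height, width, and channels collapse into a single feature vector per sample, which is exactly what `tf.reshape(bottom, [-1, dim])` does.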
Training the model
To train with GPU acceleration, a working CUDA environment must be configured and the matching tensorflow-gpu build installed.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.reset_default_graph()

Dataset reading and splitting into training batches:

imagedir = './data/'
img_label_org, label_trg, img = reader.images_list(imagedir)
epoch = 800
batch_size = 10
total_sample_num = len(img_label_org)
if total_sample_num % batch_size == 0:
    n_batch = int(total_sample_num / batch_size)
else:
    n_batch = int(total_sample_num / batch_size) + 1

Initialization of the input/output placeholders, the discriminator, and the losses:
org_image = tf.placeholder(tf.float32, [None, 128, 128, 3], name='org_image')
trg_image = tf.placeholder(tf.float32, [None, 128, 128, 3], name='trg_image')
org_pose = tf.placeholder(tf.float32, [None, 9], name='org_pose')
trg_pose = tf.placeholder(tf.float32, [None, 9], name='trg_pose')

gen_trg, feat = model.mixgenerator(org_image, 32, org_pose, trg_pose)
out_trg = model.generator(feat, 32, trg_pose)

# D_ab
D_r, real_logit, real_pose = model.snpixdiscriminator(trg_image)
D_f, fake_logit, fake_pose = model.snpixdiscriminator(gen_trg)
D_f_, fake_logit_, fake_pose_ = model.snpixdiscriminator(out_trg)

# Real/fake discriminator loss
loss_pred_r = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=real_logit, labels=tf.ones_like(D_r)))
loss_pred_f = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=fake_logit_, labels=tf.zeros_like(D_f_)))
loss_d_pred = loss_pred_r + loss_pred_f

# Pose losses
loss_d_pose = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=real_pose, labels=trg_pose))
loss_g_pose_ = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=fake_pose_, labels=trg_pose))
loss_g_pose = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=fake_pose, labels=trg_pose))

# Generator losses
loss_g_pred = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=fake_logit_, labels=tf.ones_like(D_f_)))
out_pix_loss = ops.L2_loss(out_trg, trg_image)
out_pre_loss, out_feat_texture = ops.vgg_loss(out_trg, trg_image)
out_loss_texture = ops.texture_loss(out_feat_texture)
out_loss_tv = 0.0002 * tf.reduce_mean(ops.tv_loss(out_trg))
gen_pix_loss = ops.L2_loss(gen_trg, trg_image)
out_g_loss = 100*gen_pix_loss + 100*out_pix_loss + loss_g_pred + out_pre_loss + out_loss_texture + out_loss_tv + loss_g_pose_
gen_g_loss = 100*gen_pix_loss + loss_g_pose

# Discriminator loss
disc_loss = loss_d_pred + loss_d_pose

out_global_step = tf.Variable(0, trainable=False)
gen_global_step = tf.Variable(0, trainable=False)
disc_global_step = tf.Variable(0, trainable=False)

# Learning rate: constant for 500k steps, then linear decay to zero
start_decay_step = 500000
start_learning_rate = 0.0001
decay_steps = 500000
end_learning_rate = 0.0
out_lr = tf.where(tf.greater_equal(out_global_step, start_decay_step),
                  tf.train.polynomial_decay(start_learning_rate, out_global_step - start_decay_step,
                                            decay_steps, end_learning_rate, power=1.0),
                  start_learning_rate)
gen_lr = tf.where(tf.greater_equal(gen_global_step, start_decay_step),
                  tf.train.polynomial_decay(start_learning_rate, gen_global_step - start_decay_step,
                                            decay_steps, end_learning_rate, power=1.0),
                  start_learning_rate)
disc_lr = tf.where(tf.greater_equal(disc_global_step, start_decay_step),
                   tf.train.polynomial_decay(start_learning_rate, disc_global_step - start_decay_step,
                                             decay_steps, end_learning_rate, power=1.0),
                   start_learning_rate)

t_vars = tf.trainable_variables()
g_gen_vars = [var for var in t_vars if 'generator' in var.name]
g_out_vars = [var for var in t_vars if 'generator_1' in var.name]
d_vars = [var for var in t_vars if 'discriminator' in var.name]

train_gen = tf.train.AdamOptimizer(gen_lr, beta1=0.5, beta2=0.999).minimize(gen_g_loss, var_list=g_gen_vars, global_step=gen_global_step)
train_out = tf.train.AdamOptimizer(out_lr, beta1=0.5, beta2=0.999).minimize(out_g_loss, var_list=g_out_vars, global_step=out_global_step)
train_disc = tf.train.AdamOptimizer(disc_lr, beta1=0.5, beta2=0.999).minimize(disc_loss, var_list=d_vars, global_step=disc_global_step)

saver = tf.train.Saver(tf.global_variables())

Model training, image generation, and checkpoint saving:
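The learning-rate schedule above keeps the rate constant for the first 500 000 steps and then decays it linearly (polynomial decay with power 1.0) to zero over the next 500 000. A stand-alone sketch of the same schedule, assuming the article's hyperparameters:

```python
def lr_at(step, start_lr=1e-4, start_decay_step=500000, decay_steps=500000, end_lr=0.0):
    # Constant phase before decay begins
    if step < start_decay_step:
        return start_lr
    # Linear (power=1.0 polynomial) decay from start_lr to end_lr, then flat
    frac = min((step - start_decay_step) / decay_steps, 1.0)
    return (start_lr - end_lr) * (1.0 - frac) + end_lr

print(lr_at(0), lr_at(750000), lr_at(1000000))
```

At step 750 000, halfway through the decay window, the rate has dropped to half its starting value; from step 1 000 000 onward it stays at zero.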
with tf.Session(config=config) as sess:
    for d in ['/gpu:0']:
        with tf.device(d):
            # Resume from a checkpoint if one exists, otherwise initialize
            ckpt = tf.train.get_checkpoint_state('./models/')
            if ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path):
                saver.restore(sess, ckpt.model_checkpoint_path)
                print('Import models successful!')
            else:
                sess.run(tf.global_variables_initializer())
                print('Initialize successful!')
            for i in range(epoch):
                random.shuffle(img_label_org)
                random.shuffle(label_trg)
                for j in range(n_batch):
                    if j == n_batch - 1:
                        n = total_sample_num
                    else:
                        n = j * batch_size + batch_size
                    img_org_output, img_trg_output, label_org_output, label_trg_output, image_name_output = \
                        reader.images_read(img_label_org[j*batch_size:n], label_trg[j*batch_size:n], img, imagedir)
                    feeds = {org_image: img_org_output, trg_image: img_trg_output,
                             org_pose: label_org_output, trg_pose: label_trg_output}
                    if i < 400:
                        sess.run(train_disc, feed_dict=feeds)
                        sess.run(train_gen, feed_dict=feeds)
                        sess.run(train_out, feed_dict=feeds)
                    else:
                        # After epoch 400, update the discriminator less often
                        sess.run(train_gen, feed_dict=feeds)
                        sess.run(train_out, feed_dict=feeds)
                        if j % 10 == 0:
                            sess.run(train_disc, feed_dict=feeds)
                    if j % 2 == 0:
                        gen_g_loss_, out_g_loss_, disc_loss_, org_image_, gen_trg_, out_trg_, trg_image_ = sess.run(
                            [gen_g_loss, out_g_loss, disc_loss, org_image, gen_trg, out_trg, trg_image], feeds)
                        print("epoch:", i, "iter:", j, "gen_g_loss_:", gen_g_loss_,
                              "out_g_loss_:", out_g_loss_, "loss_disc:", disc_loss_)
                        for n in range(batch_size):
                            # Map images from [-1, 1] back to [0, 255] before writing
                            org_image_output = (org_image_[n] + 1) * 127.5
                            gen_trg_output = (gen_trg_[n] + 1) * 127.5
                            out_trg_output = (out_trg_[n] + 1) * 127.5
                            trg_image_output = (trg_image_[n] + 1) * 127.5
                            temp = np.concatenate([org_image_output, gen_trg_output, out_trg_output, trg_image_output], 1)
                            cv.imwrite("./record/%d_%d_%d_image.jpg" % (i, j, n), temp)
                if i % 10 == 0 or i == epoch - 1:
                    saver.save(sess, './models/wssGAN.ckpt', global_step=gen_global_step)
    print("Finish!")

Running the program produces the following results:
Result after the first training iteration:
Result after 20 iterations:
Comparing the two, the improvement is clearly visible!
Source code address:
Extraction code: kdxe
About the author:
Li Qiujian, CSDN blog expert and author of a CSDN Daren course. A master's student at China University of Mining and Technology, he has developed an Android wuxia game on TapTap, a VIP video parser, a text-style conversion tool, a writing robot, and other projects, has published several papers, and has won multiple awards in mathematics competitions.