1. Introduction to GAN

"Gan fan ren, gan fan hun, gan fan de dou shi ren shang ren." No, not that "gan": the GAN discussed in this article is the generative adversarial model proposed by Goodfellow in 2014, i.e., Generative Adversarial Nets.

So what is special about GAN? Conventional deep-learning tasks such as image classification, object detection, and semantic or instance segmentation all boil down to prediction: image classification predicts a single label, object detection predicts bounding boxes and classes, and semantic or instance segmentation predicts a class for every pixel. A GAN, by contrast, generates something new, such as an image.

The principle of GAN can be stated in one sentence: learn a generative model of a data distribution through adversarial training. GAN training is unsupervised; it captures the distribution of a dataset so that data with the same distribution can be generated from random noise.

A GAN consists of two models locked in a two-player game:
- Discriminator D: learns the boundary between real and fake, i.e., judges whether data is real.
- Generator G: learns the data distribution and generates data.

The classic GAN loss is shown below (the min-max reflects the adversarial game):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

2. Hands-on: style transfer with cycleGAN

Having seen what GAN can do, let us experience its effect first-hand. Here we take cycleGAN as an example and implement image style transfer, i.e., changing the style of an input image. In the figure below, the left image is the original, the middle one is the style image (a Van Gogh painting), and the right one is the generated image: the original rendered in Van Gogh's style. Note that the generated image preserves most of the content of the original.

2.1 Introduction to cycleGAN

cycleGAN is essentially the same as GAN: both learn the latent data distribution of a dataset. A plain GAN generates images of the learned distribution from random noise, whereas cycleGAN applies the learned distribution to a meaningful input image to generate an image in another domain. cycleGAN assumes a latent relationship between the two image domains. As is well known, the GAN mapping alone can hardly guarantee the validity of the generated image for a given input; cycleGAN therefore uses cycle consistency to enforce structural consistency between the generated image and the input image.

Looking at the structure of cycleGAN, its characteristics can be summarized as follows:
- Two-way GAN: two generators [G: X -> Y, F: Y -> X] and two discriminators [Dx, Dy]. The goal of G is to generate samples that Dy (whose positive class is domain Y) cannot distinguish from real ones; the same holds for F and Dx.
- Cycle consistency: G is the generator that produces Y and F is the generator that produces X. Cycle consistency constrains the range of what G and F may generate: an object generated by G must be mappable back through F, i.e., x -> G(x) -> F(G(x)) ≈ x.

The adversarial and cycle-consistency losses are as follows:

$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$$

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x \rVert_1] + \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(F(y)) - y \rVert_1]$$

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F)$$

2.2 Implementing cycleGAN

2.2.1 Generator

From the introduction above, there are two generators: one for the forward mapping and one for the reverse. The structure follows the paper "Perceptual Losses for Real-Time Style Transfer and Super-Resolution: Supplementary Material" and can roughly be divided into downsampling + residual blocks + upsampling, as shown in the figure below (taken from the paper). Downsampling is implemented with stride-2 convolutions, while upsampling uses nn.Upsample followed by a stride-1 convolution:

```python
import torch.nn as nn


# Residual block: two 3x3 convs with reflection padding, plus a skip connection
class ResidualBlock(nn.Module):
    def __init__(self, in_features):
        super(ResidualBlock, self).__init__()

        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(in_features, in_features, 3),
            nn.InstanceNorm2d(in_features),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(in_features, in_features, 3),
            nn.InstanceNorm2d(in_features),
        )

    def forward(self, x):
        return x + self.block(x)


class GeneratorResNet(nn.Module):
    def __init__(self, input_shape, num_residual_blocks):
        super(GeneratorResNet, self).__init__()

        channels = input_shape[0]

        # Initial convolution block
        out_features = 64
        model = [
            nn.ReflectionPad2d(channels),
            nn.Conv2d(channels, out_features, 7),
            nn.InstanceNorm2d(out_features),
            nn.ReLU(inplace=True),
        ]
        in_features = out_features

        # Downsampling
        for _ in range(2):
            out_features *= 2
            model += [
                nn.Conv2d(in_features, out_features, 3, stride=2, padding=1),
                nn.InstanceNorm2d(out_features),
                nn.ReLU(inplace=True),
            ]
            in_features = out_features

        # Residual blocks
        for _ in range(num_residual_blocks):
            model += [ResidualBlock(out_features)]

        # Upsampling
        for _ in range(2):
            out_features //= 2
            model += [
                nn.Upsample(scale_factor=2),
                nn.Conv2d(in_features, out_features, 3, stride=1, padding=1),
                nn.InstanceNorm2d(out_features),
                nn.ReLU(inplace=True),
            ]
            in_features = out_features

        # Output layer
        model += [nn.ReflectionPad2d(channels), nn.Conv2d(out_features, channels, 7), nn.Tanh()]

        self.model = nn.Sequential(*model)

    def forward(self, x):
        return self.model(x)
```

2.2.2 Discriminator

A traditional GAN discriminator outputs a single value indicating how real the input looks. A PatchGAN discriminator instead outputs N*N values, each corresponding to a receptive field of a certain size on the input image; intuitively, each value judges the realism of one (overlapping) patch cropped from the input, so the discriminator can be regarded as a fully convolutional network. PatchGAN was first proposed in pix2pix (Image-to-Image Translation with Conditional Adversarial Networks). Its advantages are fewer parameters, and it can better capture local high-frequency information.

```python
class Discriminator(nn.Module):
    def __init__(self, input_shape):
        super(Discriminator, self).__init__()

        channels, height, width = input_shape

        # Calculate output shape of image discriminator (PatchGAN)
        self.output_shape = (1, height // 2 ** 4, width // 2 ** 4)

        def discriminator_block(in_filters, out_filters, normalize=True):
            """Returns downsampling layers of each discriminator block"""
            layers = [nn.Conv2d(in_filters, out_filters, 4, stride=2, padding=1)]
            if normalize:
                layers.append(nn.InstanceNorm2d(out_filters))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *discriminator_block(channels, 64, normalize=False),
            *discriminator_block(64, 128),
            *discriminator_block(128, 256),
            *discriminator_block(256, 512),
            nn.ZeroPad2d((1, 0, 1, 0)),
            nn.Conv2d(512, 1, 4, padding=1)
        )

    def forward(self, img):
        return self.model(img)
```
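A quick way to verify the two networks is to feed them a random tensor and check the output shapes: the generator should preserve the input shape, and the discriminator should emit the PatchGAN grid recorded in output_shape. A minimal sanity-check sketch (the 256x256 input size and 9 residual blocks are assumptions for illustration, not values fixed by the code above):

```python
import torch

# Hypothetical settings: 3-channel 256x256 images, 9 residual blocks
input_shape = (3, 256, 256)
G = GeneratorResNet(input_shape, num_residual_blocks=9)
D = Discriminator(input_shape)

x = torch.randn(1, *input_shape)  # a batch with one random "image"
y = G(x)
patch = D(y)

print(y.shape)         # torch.Size([1, 3, 256, 256]) -- same shape as the input
print(patch.shape)     # torch.Size([1, 1, 16, 16])   -- N*N patch scores, N = 256 // 2**4
print(D.output_shape)  # (1, 16, 16)
```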
2.2.3 Training

Losses and model initialization:

```python
import itertools

import torch

# Losses
criterion_GAN = torch.nn.MSELoss()
criterion_cycle = torch.nn.L1Loss()
criterion_identity = torch.nn.L1Loss()

cuda = torch.cuda.is_available()
Tensor = torch.cuda.FloatTensor if cuda else torch.Tensor

# opt holds the command-line arguments of the training script
input_shape = (opt.channels, opt.img_height, opt.img_width)

# Initialize generators and discriminators
G_AB = GeneratorResNet(input_shape, opt.n_residual_blocks)
G_BA = GeneratorResNet(input_shape, opt.n_residual_blocks)
D_A = Discriminator(input_shape)
D_B = Discriminator(input_shape)
```

Optimizers and training strategy:

```python
# Optimizers
optimizer_G = torch.optim.Adam(
    itertools.chain(G_AB.parameters(), G_BA.parameters()), lr=opt.lr, betas=(opt.b1, opt.b2)
)
optimizer_D_A = torch.optim.Adam(D_A.parameters(), lr=opt.lr, betas=(opt.b1, opt.b2))
optimizer_D_B = torch.optim.Adam(D_B.parameters(), lr=opt.lr, betas=(opt.b1, opt.b2))

# Learning rate update schedulers
lr_scheduler_G = torch.optim.lr_scheduler.LambdaLR(
    optimizer_G, lr_lambda=LambdaLR(opt.n_epochs, opt.epoch, opt.decay_epoch).step
)
lr_scheduler_D_A = torch.optim.lr_scheduler.LambdaLR(
    optimizer_D_A, lr_lambda=LambdaLR(opt.n_epochs, opt.epoch, opt.decay_epoch).step
)
lr_scheduler_D_B = torch.optim.lr_scheduler.LambdaLR(
    optimizer_D_B, lr_lambda=LambdaLR(opt.n_epochs, opt.epoch, opt.decay_epoch).step
)
```

Training iteration. The training data is loaded in pairs (A, B), but it is unpaired data: A and B have no direct correspondence. A is the ordinary photo and B is the style image.

Generator training:
- GAN loss: MSE loss between the discriminator scores of the generated fake_A / fake_B and the "real" (valid) ground-truth labels.
- Cycle loss: L1 pixel-wise difference between the images recovered from fake_A / fake_B and the original A and B.
- Identity loss: L1 difference between G_BA(A) and A (and between G_AB(B) and B), encouraging each generator to leave images of its target domain unchanged.

Discriminator training:
- loss_real: MSE loss between the scores of real A / B and the "real" (valid) labels.
- loss_fake: MSE loss between the scores of the generated fake_A / fake_B and the "fake" labels.

```python
import datetime
import time

import numpy as np
from torch.autograd import Variable  # kept from the original script; plain tensors also work

prev_time = time.time()
for epoch in range(opt.epoch, opt.n_epochs):
    for i, batch in enumerate(dataloader):

        # A and B are unpaired: they come from two domains with no direct correspondence
        real_A = Variable(batch["A"].type(Tensor))
        real_B = Variable(batch["B"].type(Tensor))

        # Adversarial ground truths
        valid = Variable(Tensor(np.ones((real_A.size(0), *D_A.output_shape))), requires_grad=False)
        fake = Variable(Tensor(np.zeros((real_A.size(0), *D_A.output_shape))), requires_grad=False)

        # ------------------
        #  Train Generators
        # ------------------

        G_AB.train()
        G_BA.train()

        optimizer_G.zero_grad()

        # Identity loss
        loss_id_A = criterion_identity(G_BA(real_A), real_A)
        loss_id_B = criterion_identity(G_AB(real_B), real_B)
        loss_identity = (loss_id_A + loss_id_B) / 2

        # GAN loss
        fake_B = G_AB(real_A)
        loss_GAN_AB = criterion_GAN(D_B(fake_B), valid)
        fake_A = G_BA(real_B)
        loss_GAN_BA = criterion_GAN(D_A(fake_A), valid)
        loss_GAN = (loss_GAN_AB + loss_GAN_BA) / 2

        # Cycle loss
        recov_A = G_BA(fake_B)
        loss_cycle_A = criterion_cycle(recov_A, real_A)
        recov_B = G_AB(fake_A)
        loss_cycle_B = criterion_cycle(recov_B, real_B)
        loss_cycle = (loss_cycle_A + loss_cycle_B) / 2

        # Total loss
        loss_G = loss_GAN + opt.lambda_cyc * loss_cycle + opt.lambda_id * loss_identity

        loss_G.backward()
        optimizer_G.step()

        # -----------------------
        #  Train Discriminator A
        # -----------------------

        optimizer_D_A.zero_grad()

        # Real loss
        loss_real = criterion_GAN(D_A(real_A), valid)
        # Fake loss (on batch of previously generated samples)
        fake_A_ = fake_A_buffer.push_and_pop(fake_A)
        loss_fake = criterion_GAN(D_A(fake_A_.detach()), fake)
        # Total loss
        loss_D_A = (loss_real + loss_fake) / 2

        loss_D_A.backward()
        optimizer_D_A.step()

        # -----------------------
        #  Train Discriminator B
        # -----------------------

        optimizer_D_B.zero_grad()

        # Real loss
        loss_real = criterion_GAN(D_B(real_B), valid)
        # Fake loss (on batch of previously generated samples)
        fake_B_ = fake_B_buffer.push_and_pop(fake_B)
        loss_fake = criterion_GAN(D_B(fake_B_.detach()), fake)
        # Total loss
        loss_D_B = (loss_real + loss_fake) / 2

        loss_D_B.backward()
        optimizer_D_B.step()

        loss_D = (loss_D_A + loss_D_B) / 2

        # --------------
        #  Log Progress
        # --------------

        # Determine approximate time left
        batches_done = epoch * len(dataloader) + i
        batches_left = opt.n_epochs * len(dataloader) - batches_done
        time_left = datetime.timedelta(seconds=batches_left * (time.time() - prev_time))
        prev_time = time.time()

    # Update learning rates once per epoch
    lr_scheduler_G.step()
    lr_scheduler_D_A.step()
    lr_scheduler_D_B.step()
```
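The loop above relies on two helpers defined elsewhere in the training script: the custom LambdaLR class passed to the schedulers and the replay buffers fake_A_buffer / fake_B_buffer. A sketch of both, written to match the form they commonly take in public CycleGAN training scripts (the max_size of 50 and the linear decay schedule are assumptions taken from those implementations):

```python
import random

import torch


class LambdaLR:
    """Keeps the learning rate constant, then decays it linearly to 0
    from decay_start_epoch until n_epochs."""

    def __init__(self, n_epochs, offset, decay_start_epoch):
        assert (n_epochs - decay_start_epoch) > 0, "Decay must start before training ends!"
        self.n_epochs = n_epochs
        self.offset = offset
        self.decay_start_epoch = decay_start_epoch

    def step(self, epoch):
        return 1.0 - max(0, epoch + self.offset - self.decay_start_epoch) / (
            self.n_epochs - self.decay_start_epoch
        )


class ReplayBuffer:
    """Stores up to max_size previously generated images; with probability 0.5
    an old image is returned in place of the newest one, so the discriminator
    also sees samples produced by earlier generator states."""

    def __init__(self, max_size=50):
        self.max_size = max_size
        self.data = []

    def push_and_pop(self, batch):
        to_return = []
        for element in batch:
            element = element.unsqueeze(0)
            if len(self.data) < self.max_size:
                self.data.append(element)
                to_return.append(element)
            elif random.uniform(0, 1) > 0.5:
                i = random.randint(0, self.max_size - 1)
                to_return.append(self.data[i].clone())
                self.data[i] = element
            else:
                to_return.append(element)
        return torch.cat(to_return)


fake_A_buffer = ReplayBuffer()
fake_B_buffer = ReplayBuffer()
```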
2.2.4 Results

This article trains a Monet style transfer, shown in the figure below: the first and second rows convert Monet-style paintings into ordinary photos, while the third and fourth rows convert ordinary photos into Monet-style paintings. The next figure shows results on photos taken with a phone.

2.2.5 Other uses of cycleGAN

(figure: further cycleGAN applications)

3. Summary

This article introduced cycleGAN, one application of GAN, in detail and applied it to image style transfer. To recap:
- GAN learns the distribution of the data in order to generate new data with the same distribution.
- cycleGAN is a two-way GAN: two generators and two discriminators. To guarantee that a generated image bears a definite relationship to its input instead of being an arbitrary sample, cycle consistency is introduced to penalize the difference between A -> fake_B -> recov_A and A.
- Generator: downsampling + residual blocks + upsampling.
- Discriminator: instead of producing a single score per image, it follows the PatchGAN approach and produces N*N values, which are then averaged.
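As a closing sketch, here is how a trained forward generator might be applied to a single photo at inference time. The checkpoint path, the 256x256 size, and the [-1, 1] normalization are illustrative assumptions rather than values from this post:

```python
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical checkpoint saved by the training script above
G_AB = GeneratorResNet((3, 256, 256), num_residual_blocks=9)
G_AB.load_state_dict(torch.load("saved_models/G_AB.pth", map_location="cpu"))
G_AB.eval()

# Preprocessing assumed to match training: resize, to tensor, normalize to [-1, 1]
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

img = preprocess(Image.open("photo.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    fake_B = G_AB(img)  # photo -> Monet-style image, Tanh output in [-1, 1]

# Undo the normalization and save the result
out = (fake_B.squeeze(0) * 0.5 + 0.5).clamp(0, 1)
transforms.ToPILImage()(out).save("photo_monet.png")
```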
