
Application of InfoGAN in generating symbols

06/05/2020

Posted by: Heligate Limited Company

Category: Blog

In this blog, we describe how to generate symbols using a Deep Learning approach. The work is mainly based on InfoGAN [2], a variant of Generative Adversarial Nets (GAN) [1]. In addition to a generator, this network makes better use of the latent vector to represent characteristics of the data such as category and shape.

The blog is organized as follows: Background first reviews some basics of Generative Adversarial Nets as well as its modified version, InfoGAN. Customization explains how we adapt InfoGAN to our problem. Experiments presents some very promising results obtained on our dataset.

 

1. Background 

a. Generative Adversarial Nets

Fig. 1: Generative Adversarial Nets

Motivations & Ideas:

Generative Adversarial Nets estimate generative models via an adversarial process. The process is illustrated in Fig. 1 as the combination of

  • A generative model (Generator G): generates samples from a noise latent $z$.
  • An adversarial process (Discriminator D): examines the quality of the generated samples via a real/fake discriminator.

Consequently, the goal of the process is to obtain a good real/fake discriminator while the generator becomes good enough to deceive this discriminator with its generated samples.

Loss formulation:

The ideas resulted in the following formulation:

  • Call $x_{fake}$ the sample generated by the generator G: $x_{fake} = G(z)$
  • Call $V(D,G)$ the cost function for the discriminator $D$ and the generator $G$:
    $$V(D,G) = \mathbb{E}_{x \sim P_{data}}[\log D(x)] + \mathbb{E}_{z \sim noise}[\log(1 - D(G(z)))]$$

where the first term stands for the cost on real samples and the second for the cost on fake samples.

The training process is a minimax game between:

  • building a good discriminator, one that discriminates well between real and fake samples, and
  • building a good generator, one whose samples are good enough to deceive the discriminator:
    $$\min_G \max_D V(D,G)$$
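
To make this game concrete, below is a minimal single-step training sketch in PyTorch; the network sizes, optimizer settings, and the non-saturating generator loss are illustrative assumptions rather than details fixed by [1]:

```python
import torch
import torch.nn as nn

# Illustrative toy networks (sizes are assumptions, e.g. flattened 28x28 images).
latent_dim, data_dim = 64, 784
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                  nn.Linear(128, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(x_real):
    n = x_real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # max_D V(D,G): push D(x) -> 1 on real samples and D(G(z)) -> 0 on fakes.
    x_fake = G(torch.randn(n, latent_dim)).detach()
    loss_D = bce(D(x_real), ones) + bce(D(x_fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # min_G V(D,G): the generator tries to make D label its samples as real
    # (non-saturating variant: maximize log D(G(z)) rather than
    # minimizing log(1 - D(G(z)))).
    loss_G = bce(D(G(torch.randn(n, latent_dim))), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```

The non-saturating variant is the usual practical choice, since $\log(1 - D(G(z)))$ saturates early in training when the discriminator easily rejects the generated samples.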

b. InfoGAN 

Fig. 2: InfoGAN

Motivations & Ideas

Although GAN works quite well for generating samples, it is difficult to control the characteristics of samples generated from the latent noise $z$ with the original version of GAN. InfoGAN was introduced with the ambition of building a generative model while controlling certain characteristics of the generated samples. It therefore inherits the GAN structure, with some modifications for interpreting the representation.

Following these motivations, the latent space in InfoGAN is divided into 2 parts:

  • $z$: the conventional latent noise, as in GAN
  • $c$: latent codes in which the interpretable information is encoded
Fig. 3: Form variation of samples over the evolution of different latent codes [2]

Specifically, the latent codes $c$ target meaningful information such as:

  • discrete information, such as the category of the MNIST samples in Fig. 3a
  • continuous information, such as shape information (rotation, width) of the MNIST samples in Fig. 3c,d

A combination of all these types of information allows us to control certain attributes of the generated samples $x$.

Formulations

Mathematically, this meaningful information is modeled as follows:

  • Categorical information (for example, the MNIST digits 0-9) should follow a categorical distribution: $c_1 \sim \mathrm{Cat}(K=10, p=0.1)$
  • Shape information (rotation, width) should follow a uniform distribution, for example: $c_2, c_3 \sim \mathrm{Unif}(-1, 1)$

To take control of this meaningful information, a mutual information term $I(c, G(z,c))$ is introduced to strengthen the connection between the latent codes $c$ and the generated samples $G(z,c)$. This mutual information is integrated into the loss function via its lower bound, known as the Variational Mutual Information:

$$L_I(G,Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(z,c)}[\log Q(c|x)] + H(c)$$

where $Q(c|x)$ is a deep network approximating $P(c|x)$.

The loss function is, therefore, modified as

$$V_{InfoGAN}(D,G,Q) = V(D,G) - \lambda L_I(G,Q)$$

where $V(D,G)$ is the traditional GAN loss.

Like GAN, the training process will optimize the loss function

$$\min_{G,Q} \max_D V_{InfoGAN}(D,G,Q)$$
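
As an illustration, here is one plausible way to implement the $L_I$ term in PyTorch. The Q head, the feature dimension, and the fixed-variance Gaussian treatment of the continuous codes are our assumptions; $H(c)$ is constant with respect to $G$ and $Q$ and is therefore dropped:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_cat, n_cont = 10, 2  # e.g. MNIST: one 10-way code, two continuous codes

class QHead(nn.Module):
    """Auxiliary network Q(c|x); in practice it shares features with D."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.logits = nn.Linear(feat_dim, n_cat)  # categorical code c1
        self.mu = nn.Linear(feat_dim, n_cont)     # continuous codes c2, c3

    def forward(self, feat):
        return self.logits(feat), self.mu(feat)

def neg_L_I(q_logits, q_mu, c_cat, c_cont):
    # -E[log Q(c|x)]: cross-entropy for the categorical part and, for a
    # fixed-variance Gaussian Q, an MSE term (up to constants) for the
    # continuous part. Minimizing this maximizes the lower bound L_I.
    return F.cross_entropy(q_logits, c_cat) + F.mse_loss(q_mu, c_cont)

# Usage with the latent priors above (feature extraction is mocked here):
c_cat = torch.randint(0, n_cat, (16,))            # c1 ~ Cat(K=10, p=0.1)
c_cont = torch.empty(16, n_cont).uniform_(-1, 1)  # c2, c3 ~ Unif(-1, 1)
feat = torch.randn(16, 128)                       # stand-in for D's features of G(z, c)
loss_info = neg_L_I(*QHead()(feat), c_cat, c_cont)  # enters the total loss as +lambda * this
```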

2. Customization 

In this section, we present our customizations of InfoGAN for our problem. First, we apply a supervised approach to the categorical information instead of letting InfoGAN cluster it in an unsupervised way as in the traditional version. This means that we decide the class for each position in the latent code $c_1$. We illustrate this customization in Fig. 4 by generating 8 samples (top) of the same class as the 8 real samples at the bottom; a sketch of this idea follows the figure.

Fig. 4: Customization for supervised control over the categorical information
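
A minimal sketch of this supervised twist, as we read it (all names and sizes below are hypothetical): rather than sampling $c_1$ at random and letting InfoGAN discover clusters, we set $c_1$ to the ground-truth label of the real batch, so Q's categorical head is trained as an ordinary supervised classifier:

```python
import torch
import torch.nn.functional as F

n_classes = 3  # check, cross, maru

def build_latent(labels, z_dim=62, n_cont=8):
    """Assemble (z, c1, c_cont) with c1 fixed to the real labels."""
    n = labels.size(0)
    z = torch.randn(n, z_dim)                        # conventional noise
    c_cat = F.one_hot(labels, n_classes).float()     # supervised categorical code
    c_cont = torch.empty(n, n_cont).uniform_(-1, 1)  # continuous shape codes
    return torch.cat([z, c_cat, c_cont], dim=1), c_cat, c_cont

# Example: request one generated sample per class.
labels = torch.tensor([0, 1, 2])  # 0 = check, 1 = cross, 2 = maru
latent, c_cat, c_cont = build_latent(labels)
# Q's categorical head is trained with cross-entropy against `labels`,
# which makes the class of a generated sample directly controllable.
```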

In addition, we also reconfigure the architecture of the model to generate samples at a higher resolution. The details of the generator and the discriminator are displayed in Fig. 5.

Fig. 5: Architectures of the generator and the discriminator

3. Experiments 

The experiments are based on https://github.com/tdeboissiere/DeepLearningImplementations. As input for these experiments, we collected a total of 2000 samples from 3 classes: check (✓), cross (✗), and maru (○). More details on this dataset are displayed in Table 1.

Table 1: Experiment dataset

Category | Sample Number
Check    | 1000
Cross    | 500
Maru     | 500

For the configuration of the latent space, we use:

  • a categorical latent taken from the labels of the real samples (supervised approach)
  • an 8-dimensional continuous latent for handling more characteristics

We are most interested in the variation of the shapes of the generated samples. Therefore, we observe the variation of the shape information by varying each continuous latent over the range $[-2, 2]$. Note that the shape information is encoded in continuous codes defined on $[-1, 1]$; choosing the larger range $[-2, 2]$ checks whether InfoGAN is able to produce even more variation in the sample shapes. The sweep can be implemented as in the sketch below.
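
A sketch of such a sweep, assuming a trained generator G that takes the concatenated latent as input (names and dimensions below are our assumptions):

```python
import torch

def latent_sweep(G, z, c_cat, n_cont=8, dim=0, steps=10, lo=-2.0, hi=2.0):
    """Vary continuous code `dim` over [lo, hi] while freezing z and c_cat."""
    values = torch.linspace(lo, hi, steps)
    c_cont = torch.zeros(steps, n_cont)
    c_cont[:, dim] = values
    latent = torch.cat([z.expand(steps, -1), c_cat.expand(steps, -1), c_cont], dim=1)
    with torch.no_grad():
        return G(latent)  # one generated sample per value of the swept code

# Example: sweep the first continuous code for a fixed class and noise vector.
# samples = latent_sweep(G, z=torch.randn(1, 62), c_cat=torch.eye(3)[0:1], dim=0)
```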

Table 2: Symbol generation via InfoGAN (generated samples for each category: Check, Cross, Maru)

Discussion

In Table 2, we display the results for each class. We find phenomena similar to [2]: the rotation of the shape can be seen in the third row and a bolder shape in the fifth row. In addition, the shape is taller in the fifth row and smaller in the first row.

Unfortunately, we could not encode the color feature in these experiments, since we only have 4 colors in our dataset: red, blue, black, and gray. However, we can still see the color change in the first row, which is quite promising for this feature.

4. Conclusion

After spending time with InfoGAN, we have seen the possibility of controlling the output shape of a generative model. This capability is very important when we want to simulate target data with some specific attributes. It could help make the generated samples more realistic and more effective, especially when we do not have enough data for training.

Another idea from InfoGAN is to extract some special features from one dataset and then transfer them to another dataset. This is quite critical for the transfer problem, and InfoGAN is definitely a good approach for it.


[1] Ian J. Goodfellow et al., Generative Adversarial Nets.
[2] Xi Chen et al., InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets.