Conditional GANs
Conditional GANs [1] are a very interesting extension to the GAN framework. The architecture extends the original formulation by supplying a class label (usually a one-hot encoded vector) to both networks: the label is combined with the noise vector input to the generator and with the image input to the discriminator. An interesting follow-up is the AC-GAN model [2], which uses multi-task learning to train the discriminator not only to classify the image as real or fake, but also to predict the class label of the image through an auxiliary classifier output. (A related formulation, used in some semi-supervised GANs, instead gives a single classifier one output per class plus one extra output that denotes a fake image.)
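To make the multi-task idea concrete, here is a minimal NumPy sketch of a discriminator loss with two heads, as in AC-GAN: a source (real/fake) term and a class term, added together. The function names, batch size, and random logits are assumptions made up for illustration; this is not code from either paper, and a real model would produce the logits from images.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def source_loss(source_logits, is_real):
    # Binary cross-entropy: is the image real (1) or generated (0)?
    p = sigmoid(source_logits)
    eps = 1e-12
    return -np.mean(is_real * np.log(p + eps)
                    + (1 - is_real) * np.log(1 - p + eps))

def class_loss(class_logits, labels):
    # Categorical cross-entropy over the class labels.
    log_probs = log_softmax(class_logits)
    return -np.mean(log_probs[np.arange(len(labels)), labels])

# Stand-in discriminator outputs (hypothetical values).
rng = np.random.default_rng(0)
batch, num_classes = 4, 10
source_logits = rng.normal(size=batch)
class_logits = rng.normal(size=(batch, num_classes))
is_real = np.array([1.0, 1.0, 0.0, 0.0])   # two real, two generated images
labels = np.array([3, 7, 3, 1])            # class label for every image

# Multi-task loss: both heads are trained jointly on real and fake images.
d_loss = source_loss(source_logits, is_real) + class_loss(class_logits, labels)
```

Note that the class term is computed for generated images as well as real ones, since each generated image was produced from a known label.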
An interesting implementation detail is whether to simply concatenate the one-hot encoded vector onto the noise vector, or to first pass the label through an embedding layer, similar to how Word2Vec learns word vectors, in order to encode the label into a more meaningful representation.
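The two options can be sketched side by side in NumPy. The dimensions below (100-dimensional noise, 16-dimensional embedding) are arbitrary assumptions, and the embedding table is just a random matrix here; in a real model it would be a trainable parameter learned along with the generator.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, noise_dim, embed_dim, batch = 10, 100, 16, 4

noise = rng.normal(size=(batch, noise_dim))
labels = np.array([3, 3, 7, 0])

# Option 1: concatenate a one-hot label vector onto the noise vector.
one_hot = np.eye(num_classes)[labels]               # (batch, num_classes)
g_input_one_hot = np.concatenate([noise, one_hot], axis=1)

# Option 2: look up a dense vector per class, then concatenate.
# Assumption: this table stands in for a learned embedding layer.
embedding_table = rng.normal(size=(num_classes, embed_dim))
label_embedding = embedding_table[labels]           # (batch, embed_dim)
g_input_embedded = np.concatenate([noise, label_embedding], axis=1)

print(g_input_one_hot.shape)   # (4, 110)
print(g_input_embedded.shape)  # (4, 116)
```

With one-hot concatenation the generator input grows with the number of classes, while the embedding keeps the label part at a fixed size regardless of how many classes there are.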
The results of Conditional GANs are very impressive. They allow for much greater control over the final output from the generator. For example, instead of just asking the generator to generate digits from 0–9, the generator can be told to output only 3s. The class conditioning doesn’t directly structure the latent space the way something like InfoGAN seeks to, but it does give the end user a mechanism for controlling the generator output. This same concept could be extended to latent space manipulations such as “What would this guy look like with sunglasses on?”
The dimensionality of the class label seems to be problematic in practice, as researchers frequently split a dataset with many classes across multiple models. For example, in the AC-GAN paper [2], the 1,000 object categories of the ImageNet dataset are handled by 100 separate AC-GAN models, each trained on only 10 classes.
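The bookkeeping for such a split is simple. The sketch below assumes contiguous class IDs are grouped together, which is a choice made here for illustration and not necessarily how the paper assigned classes to models; the helper names are hypothetical.

```python
num_classes = 1000
classes_per_model = 10

# Assumption: classes 0-9 go to model 0, classes 10-19 to model 1, and so on.
class_groups = [
    list(range(start, start + classes_per_model))
    for start in range(0, num_classes, classes_per_model)
]

def model_index_for(class_id):
    # Which of the 100 models is responsible for this class?
    return class_id // classes_per_model

def local_label_for(class_id):
    # The label in 0..9 that the responsible model sees for this class.
    return class_id % classes_per_model

print(len(class_groups))                              # 100
print(model_index_for(427), local_label_for(427))     # 42 7
```

Each model then only ever conditions on a 10-way label, at the cost of training and storing 100 separate generators and discriminators.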
Despite the current limitations, Conditional GANs are still a very interesting idea. They are an intuitive solution that sheds light on the potential control available with generative modeling. Thanks for reading; hopefully this helped to inspire interest in the subject!
References
[1] Mehdi Mirza, Simon Osindero. Conditional Generative Adversarial Nets. 2014.
[2] Augustus Odena, Christopher Olah, Jonathon Shlens. Conditional Image Synthesis with Auxiliary Classifier GANs. 2016.