Generating Basketball Shoes with DCGANs
Generative Adversarial Networks [1] are a highly successful approach to generative modeling. The DCGAN architecture [2] presented by Radford et al. is appealing because it is easy to implement compared with architectures such as Progressively-Growing GANs [3] or StackGANs [4]. DCGAN closely resembles the vanilla GAN, except that the generator and discriminator are modified to use convolutional and strided convolutional layers. The full code for this experiment is available here.
In this experiment, I test whether DCGANs can generate low-resolution basketball shoes. The images (depicted below) are originally 360 x 360 but are downsampled to 45 x 45 to make GAN training more tractable.
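Because 360 divides evenly into 45 x 45 blocks of 8 x 8 pixels, the downsampling step can be done by simple block averaging. The snippet below is an illustrative sketch of that idea in NumPy; the actual project may use a library resize instead.

```python
import numpy as np

def downsample(img, factor=8):
    """Downsample an (H, W, C) image by averaging factor x factor pixel blocks."""
    h, w, c = img.shape
    # Group rows and columns into blocks of `factor`, then average each block.
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

img = np.random.rand(360, 360, 3)   # stand-in for a 360 x 360 RGB shoe image
small = downsample(img)
print(small.shape)                  # (45, 45, 3)
```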
Before discussing the architecture used and the miscellaneous details of this implementation, the results can be seen below:
When stacked in a grid like this and converted to grayscale, the shoe images above do not look too bad, but the images below are a more accurate representation of the true, and disappointing, results of the DCGAN in this experiment.
Following the paper by Radford et al. [2], I used many of the same architectural choices and training settings: an Adam optimizer with a learning rate of 0.0002 and a beta_1 parameter of 0.5, BatchNorm layers everywhere except the output of the generator and the input to the discriminator, and a nearly identical upsampling design in the generator.
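As a rough illustration of those settings, here is a minimal Keras sketch of a DCGAN-style generator: Adam(2e-4, beta_1=0.5), BatchNorm on every layer except the output, and strided transposed convolutions for upsampling. The layer widths and the 5 -> 15 -> 45 upsampling path are my own illustrative choices, not necessarily the exact ones used in this experiment.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_generator(latent_dim=100):
    return models.Sequential([
        layers.Input(shape=(latent_dim,)),
        # Project the latent vector to a small 5 x 5 feature map.
        layers.Dense(5 * 5 * 128),
        layers.Reshape((5, 5, 128)),
        layers.BatchNormalization(),
        layers.ReLU(),
        # Strided transposed convolutions upsample 5 -> 15 -> 45
        # (with padding="same", output size = input size * stride).
        layers.Conv2DTranspose(64, kernel_size=3, strides=3, padding="same"),
        layers.BatchNormalization(),
        layers.ReLU(),
        # No BatchNorm on the output layer; sigmoid keeps pixels in [0, 1].
        layers.Conv2DTranspose(3, kernel_size=3, strides=3, padding="same",
                               activation="sigmoid"),
    ])

# Optimizer settings from Radford et al. [2].
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
generator = build_generator()
print(generator.output_shape)
```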
I used the code from this repository to get started with this project: https://github.com/GANs-in-Action/gans-in-action/blob/master/chapter-4/Chapter_4_DCGAN.ipynb. My code shared on GitHub makes two additions you will need when applying this to custom datasets. First, a data-loader function that loads custom data and rescales the pixel values to [0, 1] (Radford et al. use [-1, 1] with a tanh activation on the generator's output, but I found better results with [0, 1] and a sigmoid activation). Second, the code shows how to reconfigure the generator network's upsampling process to fit your custom data dimensions. In my case, the starter code using MNIST outputs 28x28x1, and I needed to change the final output dimension to 45x45x3.
To get better output, I am currently experimenting with changes to the generator network's architecture and with the hyperparameters of the Adam optimizer. This experiment was run on an NVIDIA GTX 1060 GPU, taking about 2 minutes per 1,000 iterations. Thanks for reading!
References
[1] Generative Adversarial Networks. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. 2014.
[2] Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. 2015.
[3] Progressive Growing of GANs for Improved Quality, Stability, and Variation. Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen. 2017.
[4] StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks. Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas. 2016.