Laboratory One Research Blog

animeGM Part 2 - Anime Generative Model for Style Transfer

July 13, 2018


Following my initial exploration of Image Style Transfer with Machine Learning in part 1, I took a number of steps to improve my best models. The major focus here was tuning the hyperparameters. Next, I productionized the model as a web application, whose user interface made the model much easier to use. Finally, I used an existing implementation of VGG-based style transfer to start getting true style transfer. Thankfully, there are many implementations online. Let’s examine these improvements.

Improving the Convolutional Autoencoder

It became clear that my models needed a better implementation to break past training barriers. I had three issues to tackle:

  1. I couldn’t train with more than 2000 images
  2. Training was very slow
  3. Even the best models were very simple

Increasing the number of training images

TensorFlow’s documentation made it clear that using feed_dict to input data for training was sub-optimal. Instead, a pipeline should be built to handle the extract, transform, load (ETL) process. This allows for just-in-time delivery of data during training. Furthermore, doing the ETL in TensorFlow rather than NumPy lets the pipeline take advantage of graph optimizations. Nice.
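As a rough illustration of the just-in-time idea, here is a minimal pure-Python sketch, not TensorFlow code. It lazily extracts and transforms one batch at a time, so only the current batch is ever held in memory. In TensorFlow the equivalent stages would be built with `tf.data.Dataset` methods such as `map(..., num_parallel_calls=...)`, `batch()` and `prefetch()`. The names `batched`, `input_pipeline` and the loader `fake_load_and_preprocess` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(paths: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield chunks of at most `size` paths without reading ahead any further."""
    it = iter(paths)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def input_pipeline(paths: Iterable[str],
                   load_and_preprocess: Callable[[str], T],
                   batch_size: int,
                   workers: int = 4) -> Iterator[List[T]]:
    """Extract and transform one batch at a time (only the current batch is in memory)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path_batch in batched(paths, batch_size):
            # Load + preprocess the batch in parallel threads; map() keeps input order.
            yield list(pool.map(load_and_preprocess, path_batch))


def fake_load_and_preprocess(path: str) -> List[float]:
    """Hypothetical loader: fakes 4 raw pixel values (0-255) and scales them to [0, 1]."""
    raw = [(len(path) * k) % 256 for k in range(4)]
    return [v / 255.0 for v in raw]


# Paths come from a generator, so nothing is loaded until a batch is requested.
paths = (f"img_{i:04d}.png" for i in range(10))
for batch in input_pipeline(paths, fake_load_and_preprocess, batch_size=4):
    print(len(batch))  # prints 4, then 4, then 2
```

The contrast with feeding a fully loaded array is that memory use here scales with `batch_size`, not with the size of the dataset.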

After building an input pipeline, I found that I could easily handle 5000 images. Previously, I had to wait for the ETL process to finish before I could run any additional code. I suspect this was why I couldn’t load more than 2000 images: my computer was likely trying to load them all into memory before training started. Combined with parallelized data transformation, the pipeline greatly reduced overall training time. Additionally, the reconstructions were greatly improved!

Speeding up training

My training machine has a very low-end CPU from 2010 and a CUDA-incompatible GPU. Training was painfully slow, even with the prior improvements. It was time to get serious, so I scored myself a GeForce GTX 1050 2GB GPU. Not a high-end accelerator, but a big step up. CUDA setup was a challenge, but after getting through it and updating my models to use the GPU, training became at least 20 times faster! I had the input pipeline run on the CPU while the new GPU handled training, and I kept using the old GPU to drive my monitors.
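Running the input pipeline on the CPU while the GPU trains is a producer/consumer arrangement. The sketch below uses plain Python threads rather than TensorFlow to show the idea: a background thread keeps a small buffer of ready batches, so the training loop does not have to wait on ETL. In TensorFlow this is roughly what `Dataset.prefetch` provides, together with placing the pipeline under `tf.device('/cpu:0')`. The `prefetch` function here is a hypothetical stand-in, not that API.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def prefetch(batches: Iterable[T], buffer_size: int = 2) -> Iterator[T]:
    """Prepare batches on a background thread while the caller consumes them.

    The bounded queue caps memory: the producer blocks once `buffer_size`
    batches are waiting to be consumed.
    """
    q = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for batch in batches:
                q.put((False, batch))
        except Exception as exc:  # forward the error to the consumer
            q.put((True, exc))
            return
        q.put((True, None))  # end-of-stream marker

    # Daemon thread: it will not keep the process alive if the consumer stops early.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        done, payload = q.get()
        if done:
            if payload is not None:
                raise payload
            return
        yield payload
```

You can wrap any batch iterator with it, e.g. `for batch in prefetch(batches): train_step(batch)`, where `train_step` is a hypothetical placeholder for the GPU-side update. The bounded buffer is the key design choice: the producer can run at most `buffer_size` batches ahead, so memory stays flat.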

This hardware upgrade allowed me to quickly train far more epochs with larger batch sizes. I was even able to increase the input image size from 32 x 32 pixels to 128 x 128 pixels and switch to color images. The difference in the resulting reconstructions was night and day.
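Some quick arithmetic shows how much bigger the new inputs are. It assumes that the original 32 x 32 images were single-channel grayscale and that pixels are stored as float32. Both are assumptions, not details stated in the post.

```python
BYTES_PER_FLOAT32 = 4


def values_per_image(height: int, width: int, channels: int) -> int:
    """Number of input values the network sees for one image."""
    return height * width * channels


old = values_per_image(32, 32, 1)    # 32 x 32, assumed grayscale (1 channel)
new = values_per_image(128, 128, 3)  # 128 x 128 RGB (3 channels)

print(old, new, new // old)          # 1024 49152 48
print(new * BYTES_PER_FLOAT32)       # 196608 bytes (192 KiB) per float32 image
```

Under these assumptions each input carries 48 times as many values, which is why the faster GPU made the larger images practical.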

GPU usage

Training Loss

Training Val Loss

Improving the model itself

With the prior issues out of my way, I could focus on experimenting with better model architectures. The prior improvements let me iterate faster and train more complicated models. I got much better reconstructions with a deeper Autoencoder and more filters at each layer. I also added batch normalization after each Convolutional layer to reduce the effects of vanishing and exploding gradients. This allowed me to reduce the number of training epochs and decrease the batch size.
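For reference, here is what batch normalization computes for a convolutional layer during training. Each channel is normalized using the mean and variance taken over the whole batch and all spatial positions. It is then scaled by a learned `gamma` and shifted by a learned `beta`. This is a minimal pure-Python sketch of the forward pass only. In Keras the layer is `keras.layers.BatchNormalization()` placed after a `Conv2D`; at inference time it uses moving averages of these statistics instead of the batch statistics.

```python
import math
from typing import List, Sequence

Activations = List[List[List[float]]]  # x[n][p][c]: sample n, position p (H*W flattened), channel c


def batch_norm_conv(x: Activations, gamma: Sequence[float], beta: Sequence[float],
                    eps: float = 1e-5) -> Activations:
    """Training-mode batch norm: per-channel stats over the batch and all positions."""
    channels = len(x[0][0])
    count = len(x) * len(x[0])
    out = [[[0.0] * channels for _ in sample] for sample in x]
    for c in range(channels):
        values = [pos[c] for sample in x for pos in sample]
        mean = sum(values) / count
        var = sum((v - mean) ** 2 for v in values) / count  # biased (population) variance
        inv_std = 1.0 / math.sqrt(var + eps)
        for n, sample in enumerate(x):
            for p, pos in enumerate(sample):
                # normalize, then apply the learned scale (gamma) and shift (beta)
                out[n][p][c] = gamma[c] * (pos[c] - mean) * inv_std + beta[c]
    return out


# Batch of 2 samples, 2 positions each, 2 channels.
x = [[[1.0, 10.0], [3.0, 20.0]], [[5.0, 30.0], [7.0, 40.0]]]
y = batch_norm_conv(x, gamma=[1.0, 1.0], beta=[0.0, 0.0])
print([round(pos[0], 3) for sample in y for pos in sample])  # [-1.342, -0.447, 0.447, 1.342]
```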

It is important to note that I had to balance model complexity against batch size; otherwise the GPU would run out of memory. I’ll need to investigate how this can be handled in software.
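A back-of-the-envelope estimate makes the complexity/batch-size trade-off concrete. The sketch below assumes a hypothetical autoencoder; the layer shapes are illustrative and are not the model from this post. It counts only the float32 activations kept for backpropagation, so it is a lower bound on real GPU memory use. The 1 GiB budget is also an assumption, since the CUDA context, weights and framework overhead take part of the card's 2 GB.

```python
from typing import Sequence, Tuple

BYTES_PER_FLOAT32 = 4
Shape = Tuple[int, int, int]  # (height, width, channels) of one layer's output


def activation_bytes(batch_size: int, layer_shapes: Sequence[Shape]) -> int:
    """Lower-bound estimate: every layer's float32 output is kept for backprop.

    Ignores weights, gradients, optimizer state, batch-norm intermediates and
    cuDNN workspace, so real usage is higher.
    """
    per_sample = sum(h * w * c for h, w, c in layer_shapes)
    return batch_size * per_sample * BYTES_PER_FLOAT32


def max_batch_size(budget_bytes: int, layer_shapes: Sequence[Shape]) -> int:
    """Largest batch whose activations fit in `budget_bytes` under this estimate."""
    return budget_bytes // activation_bytes(1, layer_shapes)


# Hypothetical 128x128 convolutional autoencoder (not the actual model from this post).
SHAPES = [
    (128, 128, 64), (64, 64, 128), (32, 32, 256),   # encoder
    (64, 64, 128), (128, 128, 64), (128, 128, 3),   # decoder
]

print(activation_bytes(1, SHAPES))      # 13828096 bytes, about 13.2 MiB per image
print(max_batch_size(2 ** 30, SHAPES))  # 77 with an assumed 1 GiB budget
```

Doubling the filters in every layer roughly doubles the per-image figure and halves the batch that fits. That is the balancing act described above.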

Transfer Reconstruction - 7

Transfer Reconstruction - 8

Transfer Reconstruction - 9

Productionizing the model

After achieving OK-ish results, I took a small detour into another concern of machine learning… productionizing trained models. I heard a great podcast on this topic and was inspired to give it a shot. I found that TensorFlow.js could load Keras models into JavaScript and make inferences on the client side. This was perfect for my use case: I definitely don’t want to deal with user information OR server fees. This learning experiment needs to stay simple.

The first step was to train a model and export it from Keras. TensorFlow.js can’t use the exported model directly, so I converted it into a layers model that TensorFlow.js can ingest. The converted model was put on a Content Distribution Network for consumption. Finally, I built a React.js web application with create-react-app. This web application loads the converted model and lets users make inferences on their own device. Users can even upload their own photos. SICK.
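The conversion step is usually done with the `tensorflowjs_converter` command (`--input_format keras`). It writes a `model.json` file plus one or more binary weight shards. In the browser, the model is then loaded from the CDN URL with `tf.loadModel`, which was renamed `tf.loadLayersModel` in TensorFlow.js 1.0. Because `model.json` and its shards must be uploaded together, a small Python helper can list the files and estimate the download size from the `weightsManifest`. This sketch assumes unquantized weights and the layers-model format. Shard file names vary between converter versions, and the helper names are hypothetical.

```python
import json
from typing import List

# Bytes per element for common dtypes (assumes unquantized weights).
DTYPE_BYTES = {"float32": 4, "int32": 4, "bool": 1}


def _manifest(model_json_text: str) -> list:
    return json.loads(model_json_text)["weightsManifest"]


def files_to_upload(model_json_text: str) -> List[str]:
    """model.json plus every weight shard file it references, in manifest order."""
    return ["model.json"] + [p for group in _manifest(model_json_text) for p in group["paths"]]


def weight_bytes(model_json_text: str) -> int:
    """Total size of the weights listed in the manifest."""
    total = 0
    for group in _manifest(model_json_text):
        for spec in group["weights"]:
            count = 1
            for dim in spec["shape"]:
                count *= dim
            total += count * DTYPE_BYTES[spec["dtype"]]
    return total


# Tiny hand-written example in the layers-model format (one conv layer only).
EXAMPLE = json.dumps({
    "modelTopology": {},
    "weightsManifest": [{
        "paths": ["group1-shard1of1"],
        "weights": [
            {"name": "conv2d_1/kernel", "shape": [3, 3, 3, 32], "dtype": "float32"},
            {"name": "conv2d_1/bias", "shape": [32], "dtype": "float32"},
        ],
    }],
})

print(files_to_upload(EXAMPLE))  # ['model.json', 'group1-shard1of1']
print(weight_bytes(EXAMPLE))     # 3584
```

The weight total is also a rough measure of how much each visitor's browser downloads before the first inference, which matters on mobile.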

Try it here

Web App - initial

Web App - Mobile

Style Transfer using a pretrained network

Having pushed the boundaries of my knowledge, I decided it was time to defer to the experts. I could no longer get better reconstructions, because I had not consulted the literature on style transfer. It was a good learning experience though.

As it turns out, I was close in my methodology. Using a Convolutional model was the right choice. However, I didn’t need an Autoencoder: according to the literature, a pretrained Convolutional classification model works fine. Another mistake of mine was training on anime images and then feeding a photo through the network to get reconstructions. Instead, I needed to feed one style image and one content image into a pretrained network, then use two losses (a content loss and a style loss) to generate a style-transferred image.
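The two losses can be written down in a few lines. In the sketch below, a feature map is a C x N matrix (C channels, N = height x width positions). The content loss compares features directly. The style loss compares Gram matrices G = F Fᵀ, which capture which channels fire together regardless of where they fire. The style normalization 1/(4 C² N²) follows the original neural style transfer paper (Gatys et al.). The paper's 1/2 factor on the content loss is dropped here, which only rescales `alpha`. To stay runnable without VGG19, this toy assumes the "features" are just the pixels themselves. It runs plain gradient descent on the generated image, starting from the content image. All names and weights (`alpha`, `beta`, `lr`) are illustrative.

```python
from typing import List

Matrix = List[List[float]]  # C rows (channels) x N columns (spatial positions)


def gram(f: Matrix) -> Matrix:
    """G[i][j] = sum over positions k of F[i][k] * F[j][k] (channel co-activation)."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in f] for fi in f]


def content_loss(gen: Matrix, content: Matrix) -> float:
    """Sum of squared differences between generated and content features."""
    return sum((g - c) ** 2 for gr, cr in zip(gen, content) for g, c in zip(gr, cr))


def style_loss(gen: Matrix, style: Matrix) -> float:
    """Squared Gram-matrix difference, normalized by 4 * C^2 * N^2."""
    c, n = len(gen), len(gen[0])
    g, a = gram(gen), gram(style)
    sq = sum((x - y) ** 2 for gr, ar in zip(g, a) for x, y in zip(gr, ar))
    return sq / (4.0 * c ** 2 * n ** 2)


def total_loss(gen: Matrix, content: Matrix, style: Matrix,
               alpha: float = 1.0, beta: float = 1.0) -> float:
    return alpha * content_loss(gen, content) + beta * style_loss(gen, style)


def gradient(x: Matrix, content: Matrix, style: Matrix,
             alpha: float, beta: float) -> Matrix:
    """Analytic d(total_loss)/dx when the 'features' are the pixels themselves."""
    c, n = len(x), len(x[0])
    g, a = gram(x), gram(style)
    grad = []
    for i in range(c):
        row = []
        for k in range(n):
            # Style term: ((G - A) x)[i][k] / (C^2 N^2)
            d_style = sum((g[i][j] - a[i][j]) * x[j][k] for j in range(c)) / (c ** 2 * n ** 2)
            row.append(2.0 * alpha * (x[i][k] - content[i][k]) + beta * d_style)
        grad.append(row)
    return grad


def stylize(content: Matrix, style: Matrix, alpha: float = 1.0, beta: float = 10.0,
            lr: float = 0.1, steps: int = 200) -> Matrix:
    """Gradient descent on the generated image's pixels; no network weights change."""
    x = [row[:] for row in content]  # start from a copy of the content image
    for _ in range(steps):
        grad = gradient(x, content, style, alpha, beta)
        x = [[xv - lr * gv for xv, gv in zip(xr, gr)] for xr, gr in zip(x, grad)]
    return x


content = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]  # 2 channels x 4 positions
style = [[0.9, 0.9, 0.1, 0.1], [0.0, 1.0, 0.0, 1.0]]
result = stylize(content, style)
print(style_loss(content, style), style_loss(result, style))  # style loss goes down
```

The key point is that only the generated image is optimized; the network stays fixed. In the real method the features come from fixed VGG19 layers, typically one deeper layer for content and several layers for style.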

I found a simple implementation in Keras to try. It used the pretrained VGG19 network and three loss functions (I only used two of them to start). The resulting style transfers were much better… this is definitely the way to go!

Style Transfer

Next Steps

In the third and final part of this series, I will:

  1. Tune hyperparameters and find the right style image
  2. Tune the implementation
  3. Improve by adding Total Variation Loss
  4. Put the trained model into production
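Total variation loss, the planned third loss, penalizes differences between neighbouring pixels so the generated image comes out smoother and less noisy. Below is a minimal sketch for a single-channel image; for color images the same sum is taken over every channel. The exponent `power=1.25` is an assumption (a value some Keras style-transfer examples use). Setting `power=0.5` gives the classic isotropic form, the sum of sqrt(dx² + dy²).

```python
from typing import List


def total_variation_loss(img: List[List[float]], power: float = 1.25) -> float:
    """Penalize differences between neighbouring pixels of a grayscale image.

    For each pixel (except the last row/column) sum the squared difference to the
    pixel below and to the right, raised to `power`.
    """
    h, w = len(img), len(img[0])
    total = 0.0
    for i in range(h - 1):
        for j in range(w - 1):
            down = (img[i][j] - img[i + 1][j]) ** 2
            right = (img[i][j] - img[i][j + 1]) ** 2
            total += (down + right) ** power
    return total


flat = [[0.5] * 4 for _ in range(4)]
checker = [[float((i + j) % 2) for j in range(4)] for i in range(4)]
print(total_variation_loss(flat))                                   # 0.0
print(total_variation_loss(checker) > total_variation_loss(flat))  # True
```

Added to the content and style losses with a small weight, this term discourages the high-frequency speckle that pure style optimization tends to produce.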

Peter Chau

Written by Peter Chau, a Canadian Software Engineer building AIs, APIs, UIs, and robots.