Laboratory One Research Blog

animeGM Part 1 - Anime Generative Model for Style Transfer

June 23, 2018

Transfer Reconstruction - 6

In this first installment I will explore Image Style Transfer with Machine Learning. The goal is to generate new anime-style images from photographs. I will build a number of models, train them on anime images, and feed each one a photograph. If I succeed, the resulting reconstructed photograph will be stylized as anime.

I used the Danbooru2017 anime image dataset. This dataset is huge, so I worked with 100-2000 images from its SFW subset. Each image is 512px x 512px in color. I reduced them to 64px x 64px and grayscaled them to simplify the model.
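The preprocessing step above can be sketched roughly like this (the function name and use of Pillow are my assumptions; the post doesn't say which tools were used):

```python
import numpy as np
from PIL import Image

def preprocess(path, size=64):
    """Load one Danbooru image, grayscale it, and shrink it.

    Mirrors the preprocessing described above: 512x512 color images
    reduced to 64x64 grayscale, returned as a float array in [0, 1].
    (`preprocess` is an illustrative name, not from the post.)
    """
    img = Image.open(path).convert("L")            # "L" = 8-bit grayscale
    img = img.resize((size, size), Image.LANCZOS)  # downsample 512 -> 64
    return np.asarray(img, dtype=np.float32) / 255.0
```

Scaling to [0, 1] keeps pixel values in a range that plays nicely with the sigmoid outputs used later.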

From here, I tried 3 different models:

  • Principal Component Analysis
  • Autoencoder
  • Convolutional Autoencoder

Transfer Test Images

Principal Component Analysis

I started by using Principal Component Analysis to build a very simple baseline model. I was unable to find the optimal number of components for sklearn's PCA.
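Since the best component count is hard to pin down by hand, one common workaround is to let sklearn pick it from a target explained-variance ratio. A minimal sketch (function names are mine):

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_pca(images, var_target=0.95):
    """Fit PCA on flattened grayscale images.

    Passing a float in (0, 1) as n_components makes sklearn keep
    just enough components to explain that fraction of variance,
    a common heuristic when the best count isn't known up front.
    """
    X = images.reshape(len(images), -1)  # (n_images, 64*64)
    pca = PCA(n_components=var_target)
    pca.fit(X)
    return pca

def reconstruct(pca, image):
    """Project one image into PCA space and back out again."""
    x = image.reshape(1, -1)
    return pca.inverse_transform(pca.transform(x)).reshape(image.shape)
```

Feeding a photograph through `reconstruct` forces it through the components learned from anime images, which is where any stylization would come from.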

Transfer Test Image Reconstruction:

Transfer Reconstruction PCA

Autoencoder

Using an Autoencoder proved worse than PCA. I suspect this is because the Autoencoder was rather shallow for the size of the images, and flattening the images discards their spatial structure.
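To make the "shallow, flattened" setup concrete, here is a minimal single-hidden-layer autoencoder in plain NumPy, trained with full-batch gradient descent on squared error. This is an illustrative sketch of the architecture class, not the post's actual model:

```python
import numpy as np

def train_autoencoder(X, hidden=64, lr=0.1, epochs=200, seed=0):
    """Tiny autoencoder: sigmoid encoder, linear decoder.

    X: (n_samples, n_pixels) flattened images in [0, 1].
    Returns the learned weights and the per-epoch loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.01, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.01, (hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # forward pass: encode to `hidden` units, decode back to pixels
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
        Y = H @ W2 + b2
        err = Y - X
        losses.append(float((err ** 2).mean()))
        # backward pass: gradients of mean squared error
        dY = 2.0 * err / (n * d)
        dW2 = H.T @ dY;           db2 = dY.sum(axis=0)
        dH = (dY @ W2.T) * H * (1.0 - H)
        dW1 = X.T @ dH;           db1 = dH.sum(axis=0)
        for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
            p -= lr * g  # in-place gradient descent step
    return (W1, b1, W2, b2), losses
```

With 64x64 inputs, `X` has 4096 columns, so a single 64-unit bottleneck has to squeeze out a 64x compression with no notion of which pixels are neighbours, which matches the weak results above.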

Transfer Image Reconstruction:

Transfer Reconstruction Autoencoder

Convolutional Autoencoder

I spent a lot of time tuning the hyperparameters here. I found that reconstruction detail increased with more training images, epochs, and filters. I maxed out at 2000 training images and hit a wall.
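One plausible shape for a small convolutional autoencoder on 64x64 grayscale inputs, sketched in Keras (the post doesn't publish its architecture, so the layer sizes and `filters` knob here are my assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_conv_autoencoder(size=64, filters=32):
    """Small conv autoencoder for (size, size, 1) images in [0, 1].

    `filters` is one of the hyperparameters discussed above: more
    filters gave more reconstruction detail, at more compute cost.
    """
    inp = layers.Input(shape=(size, size, 1))
    # encoder: two stride-2 conv blocks, 64x64 -> 32x32 -> 16x16
    x = layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(filters * 2, 3, strides=2, padding="same", activation="relu")(x)
    # decoder: mirror with transposed convs back up to 64x64
    x = layers.Conv2DTranspose(filters * 2, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model
```

Unlike the flattened autoencoder, the conv layers share weights across spatial positions, which is why this family of models scales better with image size.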

Test Image Reconstructions:

Test Reconstructions - 1

Test Reconstructions - 2

Transfer Image Reconstruction:

Transfer Reconstruction - 1

Transfer Reconstruction - 2

Transfer Reconstruction - 3

Transfer Reconstruction - 4

Transfer Reconstruction - 5

Next steps

Although I haven’t succeeded yet, I was able to get some OK initial results. I definitely have some work to do before this will work.

In part 2, I’m going to:

  • Increase the number of training images by creating an input pipeline
  • Build a Variational Autoencoder to learn the latent space.
  • Use GPUs/TPUs to build deeper models
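For the first item, one way to build that input pipeline is with tf.data, streaming files from disk instead of holding the whole dataset in memory. A sketch under that assumption (part 2 may do this differently):

```python
import tensorflow as tf

def make_pipeline(paths, size=64, batch=32):
    """Stream image files -> decode -> grayscale -> resize -> batch.

    Only `batch` images are materialized at a time, which removes
    the 2000-image ceiling hit above. (Names here are illustrative.)
    """
    def load(path):
        raw = tf.io.read_file(path)
        img = tf.image.decode_image(raw, channels=1, expand_animations=False)
        img = tf.image.resize(img, (size, size)) / 255.0
        return img

    ds = tf.data.Dataset.from_tensor_slices(paths)
    ds = ds.map(load, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch).prefetch(tf.data.AUTOTUNE)
```

`prefetch` overlaps preprocessing with training, which matters once a GPU/TPU is the bottleneck.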

Peter Chau

Written by Peter Chau, a Canadian Software Engineer building AIs, APIs, UIs, and robots.

peter@labone.tech

laboratoryone

laboratory_one