ComicGAN: Text-to-Comic Generative Adversarial Network


Drawing and annotating comic illustrations, as shown in figure 1 above, is a complex and time-consuming process. No existing machine learning algorithm has been developed to create comic illustrations from descriptions of the illustrations or from the dialogue in comics. Moreover, it is not known whether a generative adversarial network (GAN) can generate original comics that correspond to the dialogue and/or descriptions. GANs are successful at producing photo-realistic images, but that success does not necessarily translate to generating flawless comics.

In this blog, we introduce ComicGAN, a novel text-to-comic pipeline based on a text-to-image GAN that synthesizes comics according to text descriptions. Trained on 1000 Dilbert comic panels and 6000 descriptions, ComicGAN produced synthetic panels that resemble original Dilbert panels, and generating illustrations from descriptions yielded clear comics that included the characters and colours specified in the descriptions.


Figure 2: Text-to-comic model pipeline.

This section discusses the methods used: extracting and creating text descriptions that correspond to comics, augmenting and modifying AttnGAN to create comics from text, and building a CNN image encoder for comics.


AttnGAN is a state-of-the-art GAN-based model that generates images from text, using images with multiple corresponding text captions as training data. It uses multiple generators and discriminators, conditioned on encoded input text and encoded images. This model was fine-tuned and augmented to improve results.

ComicGAN first enables the generation of comics with AttnGAN by introducing a method to create text descriptions from a labelled image dataset. It then attempts to further improve the generation of comics by creating a CNN image encoder designed specifically for comics.
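To make the description-generation step concrete, here is a minimal sketch of how text captions could be produced from a labelled image dataset. This is an assumed, hypothetical implementation (the templates, label fields, and function name are ours, not from the original work): each panel's labels, such as which characters appear and the background colour, are filled into simple caption templates to produce multiple descriptions per image.

```python
# Hypothetical template-based caption generation from panel labels.
# The label schema ("characters", "colour") and templates are assumptions
# for illustration, not the authors' actual description format.
TEMPLATES = [
    "{characters} in a panel with a {colour} background",
    "a comic showing {characters} on a {colour} background",
    "{characters} talking, {colour} background",
]

def describe_panel(labels):
    """Generate several caption variants for one labelled comic panel."""
    characters = " and ".join(labels["characters"])
    return [
        template.format(characters=characters, colour=labels["colour"])
        for template in TEMPLATES
    ]

# Example: one labelled panel yields multiple captions, matching AttnGAN's
# expectation of several text descriptions per training image.
captions = describe_panel({"characters": ["Dilbert", "Dogbert"],
                           "colour": "yellow"})
```

Generating several templated captions per image mirrors the multi-caption training data AttnGAN was designed for, at the cost of less varied language than human-written descriptions.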

Comic Feature Extractor CNN

In the original AttnGAN, the image feature extractor is an Inception v3 CNN trained on the ImageNet dataset for multi-class classification; the same base network (with different final dense layers) then serves as the feature extractor inside AttnGAN. Using the same transfer-learning methodology, a multi-label classifier was trained specifically on comics, on the premise that a specialised CNN could perform better feature extraction on comics and therefore yield higher-quality comics when its features are fed to the GAN.
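The key change from the ImageNet setup is the training objective: a comic panel can carry several labels at once (multiple characters, a background colour), so the multi-class softmax head is replaced by independent sigmoid outputs trained with binary cross-entropy. Below is a small numpy sketch of that multi-label head; the feature dimension, label set, and weights are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(z):
    # Independent per-label probability, unlike softmax which forces
    # exactly one class to dominate.
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent labels."""
    p = sigmoid(logits)
    eps = 1e-12  # numerical safety for log
    return float(np.mean(
        -(targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps))
    ))

# Toy example: pooled backbone features -> linear head -> 4 comic labels
# (e.g. Dilbert present, Dogbert present, yellow background, blue background).
rng = np.random.default_rng(0)
features = rng.standard_normal(2048)        # Inception-style pooled features
W = rng.standard_normal((4, 2048)) * 0.01   # one weight row per label
logits = W @ features
targets = np.array([1.0, 0.0, 1.0, 0.0])    # several labels can be 1 at once
loss = multilabel_bce(logits, targets)
```

In practice this head would sit on top of the pretrained Inception v3 backbone and be fine-tuned on the labelled comic panels; only the objective differs from the standard ImageNet setup.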


Figure 3: Generated examples comics (text descriptions used to generate the illustrations appear below)

Qualitative evaluation of figure 3 above shows that ComicGAN learned the main features of the comics and the text: coloured backgrounds matched the text descriptions, the colour gradient was almost identical to that of real Dilbert comics, and almost all characters present in the training set were recognisable.


The main purpose of this blog was to showcase ComicGAN, a text-to-image GAN to create comics in the style of Dilbert. ComicGAN was created by using automatic text description generation methods and custom comic feature extraction using CNN’s. Synthesis of comics illustrations was shown to be successful and representative of Dilbert comics. This ultimately enables producing high-quality comics which can include the characters and colour based on the descriptions provided to generate them allowing for a more fast-paced and creative workflow to create comic illustrations. Read more about the research on this subject here.