Spatial-Channel Aware Generative Adversarial Network for Text to Image Translation

Publisher

ASTU

Abstract

The Generative Adversarial Network (GAN) is one of the emerging deep learning technologies used in many image generation tasks, and text-to-image translation is one such application. The domain gap between text and images is the main challenge. Different state-of-the-art methods have been employed to mitigate this problem, but they have limitations: almost all current text-to-image approaches employ a stack of multiple generators and discriminators, which slows training, and if the first image is not synthesized effectively, subsequent generators struggle to refine it to a suitable quality. Another methodology uses a single generator and discriminator to alleviate these issues, but it conditions generation on the entire text description encoded as a single sentence vector, which discards word-level information from the text description. Image quality and semantic consistency therefore remain open problems in text-to-image translation. To address these issues, this study proposes spatial and channel-wise attention mechanisms to enhance the quality of the generated image, together with an additional constraint, the Deep Attentional Multimodal Similarity Model (DAMSM) loss, to improve the semantic coherence between the synthesized image and the written description at the word and sentence levels. Spectral normalization is introduced to stabilize the training procedure and mitigate mode collapse. Extensive qualitative and quantitative results indicate the effectiveness of the proposed model. Based on the inception score, the proposed model improves on the baseline CycleGAN from 3.70±0.045 to 4.31±0.14 and on the baseline Deep Fusion Generative Adversarial Network (DFGAN) from 3.45±0.065 to 4.31±0.14. In the average human evaluation, the proposed model improves on DFGAN by 46.66 percent. This thesis concludes that spatial and channel-wise attention mechanisms strongly influence the quality of generated images and that the DAMSM loss substantially improves the semantic consistency between text and image information.
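To make the two core ideas named in the abstract concrete, the following is a minimal PyTorch sketch of a channel-wise attention branch followed by a spatial attention branch, with spectral normalization applied to a discriminator convolution. All module names, layer sizes, and the reduction ratio here are illustrative assumptions for exposition, not the thesis's actual architecture.

```python
# Illustrative sketch only: CBAM-style channel + spatial attention and a
# spectrally normalized discriminator layer; not the thesis's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Reweights feature-map channels using pooled global statistics."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1).view(b, c))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1).view(b, c))
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale


class SpatialAttention(nn.Module):
    """Highlights informative spatial locations with a 2D attention map."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)   # average over channels
        mx, _ = x.max(dim=1, keepdim=True)  # max over channels
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn


class SpatialChannelBlock(nn.Module):
    """Applies channel attention, then spatial attention, to a feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial(self.channel(x))


# Spectral normalization constrains the Lipschitz constant of a
# discriminator layer, which is the stabilization role the abstract cites.
disc_conv = nn.utils.spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1))

if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)  # stand-in generator features
    out = SpatialChannelBlock(64)(feats)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

In this sketch the channel branch decides which feature maps matter while the spatial branch decides where they matter, which is one common way to realize the spatial-channel attention the abstract describes.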
