Spatial-Channel Aware Generative Adversarial Network for Text to Image Translation
Publisher
ASTU
Abstract
Generative Adversarial Networks (GANs) are an emerging deep learning technology used in many image generation tasks, and text-to-image translation is one of their applications. The domain gap between text and images is the main challenge. Different state-of-the-art methods have been employed to mitigate this problem, but they have limitations: almost all current text-to-image approaches stack multiple generators and discriminators, which slows training, and if the first generator does not synthesize its image well, the subsequent generators struggle to refine it to a suitable quality. Another methodology uses a single generator-discriminator pair to alleviate these issues, but it conditions generation on the entire text description encoded as a single sentence vector and therefore loses the word-level information of the description. Image quality and semantic consistency thus remain open problems in text-to-image translation.
To address these issues, this study proposes spatial and channel-wise attention mechanisms to enhance the quality of the generated images.
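As a minimal sketch of the kind of spatial and channel-wise attention just described, here is a CBAM-style block in PyTorch; the module names, reduction ratio, and kernel size are illustrative assumptions, not the thesis code.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Channel attention: reweights each feature map by a per-channel
    # gate computed from globally pooled statistics.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))     # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))      # global max pooling
        gate = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * gate                        # channel-wise reweighting

class SpatialAttention(nn.Module):
    # Spatial attention: computes a per-location gate from pooled
    # channel statistics, highlighting relevant image regions.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                      # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)      # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)       # (B, 1, H, W)
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate                        # spatial reweighting

In a generator block, the two gates would typically be applied in sequence, e.g. feats = SpatialAttention()(ChannelAttention(64)(feats)).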
The study also proposes an additional constraint, the Deep Attentional Multimodal Similarity Model (DAMSM) loss, to improve semantic coherence between the synthesized image and the written description at both the word and sentence levels.
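A DAMSM-style loss (as in AttnGAN) matches images and descriptions at the word and sentence levels. Below is a simplified sketch of its sentence-level term only, assuming batch-aligned image and sentence embeddings; the word-level term additionally attends over image regions. The function name and smoothing factor gamma3 are illustrative.

import torch
import torch.nn.functional as F

def damsm_sentence_loss(img_emb, sent_emb, gamma3=10.0):
    """Sentence-level DAMSM-style matching loss.

    img_emb:  (B, D) global image features from an image encoder.
    sent_emb: (B, D) sentence embeddings from a text encoder.
    Matching pairs share the same batch index; all other pairs in
    the batch serve as mismatched negatives.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    sent_emb = F.normalize(sent_emb, dim=-1)
    # Cosine similarity between every image and every sentence.
    scores = gamma3 * img_emb @ sent_emb.t()          # (B, B)
    labels = torch.arange(scores.size(0), device=scores.device)
    # Posterior of the matching sentence given the image, and vice versa.
    loss_i2t = F.cross_entropy(scores, labels)
    loss_t2i = F.cross_entropy(scores.t(), labels)
    return loss_i2t + loss_t2i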
Spectral normalization is introduced to stabilize the training procedure and mitigate mode collapse.
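Spectral normalization divides each weight matrix of the discriminator by an estimate of its largest singular value, bounding the discriminator's Lipschitz constant. A sketch using PyTorch's built-in utility follows; the layer sizes are illustrative, not the thesis architecture.

import torch.nn as nn
from torch.nn.utils import spectral_norm

# Wrapping a layer with spectral_norm rescales its weights by their
# largest singular value (estimated by power iteration) on every
# forward pass, which helps stabilize GAN training.
def sn_conv(in_ch, out_ch, kernel_size=4, stride=2, padding=1):
    return spectral_norm(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
    )

discriminator_stem = nn.Sequential(
    sn_conv(3, 64),                  # 64x64 -> 32x32
    nn.LeakyReLU(0.2, inplace=True),
    sn_conv(64, 128),                # 32x32 -> 16x16
    nn.LeakyReLU(0.2, inplace=True),
)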
Extensive qualitative and quantitative results indicate the effectiveness of the proposed model. In terms of inception score, the proposed model improves on the CycleGAN baseline from 3.70±0.045 to 4.31±0.14 and on the Deep Fusion Generative Adversarial Network (DFGAN) baseline from 3.45±0.065 to 4.31±0.14. In the average human evaluation, the proposed model improves on DFGAN by 46.66 percent. This thesis concludes that spatial and channel-wise attention mechanisms have a strong influence on the quality of the generated images, and that the DAMSM loss greatly improves semantic consistency between the text and image information.
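For reference, the inception score used in this comparison is conventionally defined (Salimans et al., 2016) as

\mathrm{IS} = \exp\!\left(\mathbb{E}_{x \sim p_g}\left[D_{\mathrm{KL}}\!\left(p(y \mid x)\,\|\,p(y)\right)\right]\right)

where p(y|x) is the Inception-v3 class distribution for a generated image x, p(y) is its marginal over generated samples, and higher scores indicate sharper, more diverse images.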
