Spatial-Channel Aware Generative Adversarial Network for Text to Image Translation
Publisher
ASTU
Abstract
Generative Adversarial Networks (GANs) are an emerging deep learning technology used in many image generation tasks, and text-to-image translation is one of their applications. The domain gap between text and images is the main challenge. Different state-of-the-art methods have been employed to mitigate this problem, but they have limitations: almost all current text-to-image approaches stack multiple generators and discriminators, which slows training, and if the first generator does not synthesize its image well, the subsequent generators struggle to refine it to a suitable quality. Another methodology uses a single generator-discriminator pair to alleviate these issues, but it conditions generation on the entire text description encoded as a single sentence vector and therefore loses the word-level information of the description. Image quality and semantic consistency thus remain open problems in text-to-image translation.
To address these issues, this study proposes spatial and channel-wise attention mechanisms to enhance the quality of the generated images.
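As a minimal sketch of the kind of spatial and channel-wise attention just described, here is a CBAM-style block in PyTorch; the module names, reduction ratio, and kernel size are illustrative assumptions, not the thesis code.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Channel attention: reweights each feature map by a per-channel
    # gate computed from globally pooled statistics.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))     # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))      # global max pooling
        gate = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * gate                        # channel-wise reweighting

class SpatialAttention(nn.Module):
    # Spatial attention: computes a per-location gate from pooled
    # channel statistics, highlighting relevant image regions.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                      # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)      # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)       # (B, 1, H, W)
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate                        # spatial reweighting

In a generator block, the two gates would typically be applied in sequence, e.g. feats = SpatialAttention()(ChannelAttention(64)(feats)).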
The study also proposes an additional constraint, the Deep Attentional Multimodal Similarity Model (DAMSM) loss, to improve semantic coherence between the synthesized image and the written description at both the word and sentence levels.
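A DAMSM-style loss (as in AttnGAN) matches images and descriptions at the word and sentence levels. Below is a simplified sketch of its sentence-level term only, assuming batch-aligned image and sentence embeddings; the word-level term additionally attends over image regions. The function name and smoothing factor gamma3 are illustrative.

import torch
import torch.nn.functional as F

def damsm_sentence_loss(img_emb, sent_emb, gamma3=10.0):
    """Sentence-level DAMSM-style matching loss.

    img_emb:  (B, D) global image features from an image encoder.
    sent_emb: (B, D) sentence embeddings from a text encoder.
    Matching pairs share the same batch index; all other pairs in
    the batch serve as mismatched negatives.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    sent_emb = F.normalize(sent_emb, dim=-1)
    # Cosine similarity between every image and every sentence.
    scores = gamma3 * img_emb @ sent_emb.t()          # (B, B)
    labels = torch.arange(scores.size(0), device=scores.device)
    # Posterior of the matching sentence given the image, and vice versa.
    loss_i2t = F.cross_entropy(scores, labels)
    loss_t2i = F.cross_entropy(scores.t(), labels)
    return loss_i2t + loss_t2i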
Spectral normalization is introduced to stabilize the training procedure and mitigate mode collapse.
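Spectral normalization divides each weight matrix of the discriminator by an estimate of its largest singular value, bounding the discriminator's Lipschitz constant. A sketch using PyTorch's built-in utility follows; the layer sizes are illustrative, not the thesis architecture.

import torch.nn as nn
from torch.nn.utils import spectral_norm

# Wrapping a layer with spectral_norm rescales its weights by their
# largest singular value (estimated by power iteration) on every
# forward pass, which helps stabilize GAN training.
def sn_conv(in_ch, out_ch, kernel_size=4, stride=2, padding=1):
    return spectral_norm(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
    )

discriminator_stem = nn.Sequential(
    sn_conv(3, 64),                  # 64x64 -> 32x32
    nn.LeakyReLU(0.2, inplace=True),
    sn_conv(64, 128),                # 32x32 -> 16x16
    nn.LeakyReLU(0.2, inplace=True),
)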
Extensive qualitative and quantitative results indicate the effectiveness of the proposed model. In terms of inception score, the proposed model improves on the CycleGAN baseline from 3.70±0.045 to 4.31±0.14 and on the Deep Fusion Generative Adversarial Network (DFGAN) baseline from 3.45±0.065 to 4.31±0.14. In the average human evaluation, the proposed model improves on DFGAN by 46.66 percent. This thesis concludes that spatial and channel-wise attention mechanisms have a strong influence on the quality of the generated images, and that the DAMSM loss greatly improves semantic consistency between the text and image information.
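For reference, the inception score used in this comparison is conventionally defined (Salimans et al., 2016) as

\mathrm{IS} = \exp\!\left(\mathbb{E}_{x \sim p_g}\left[D_{\mathrm{KL}}\!\left(p(y \mid x)\,\|\,p(y)\right)\right]\right)

where p(y|x) is the Inception-v3 class distribution for a generated image x, p(y) is its marginal over generated samples, and higher scores indicate sharper, more diverse images.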
