Generative Adversarial Network Based Visual Saliency Prediction with Cascaded Hierarchical Atrous Spatial Pyramid Pooling
Publisher
ASTU
Abstract
Visual saliency refers to an area of an image that attracts human attention. The Human Visual
System (HVS) can focus on specific parts of a scene, rather than the whole image. Visual
attention describes a set of cognitive procedures that choose important information and filter out
unnecessary information from cluttered visual scenes. Images are central to computer vision
because they carry a large amount of information, and humans receive 80% of their information
through vision. Processing a whole image when only part of it is needed consumes unnecessary
resources; restricting computation to the relevant pixels is computationally more efficient.
In this study, this is achieved by using a GAN with a CHASPP module. EfficientNet-B7, which
uniformly scales all dimensions of the network (depth, width, and resolution), is selected as
the feature extractor to improve feature extraction for visual saliency prediction. Several
datasets, including CAT2000, MIT1003, DUT-OMRON, and PASCAL-S, are used to demonstrate the
effectiveness of the selected models and techniques. Human attention modeling focuses on a
bottom-up approach that computes how strongly visual stimuli pop out from their surroundings.
However, different models and algorithms yield different predictions of the attended area of
an image.
In this study, we developed an effective visual saliency prediction model using a GAN with
CHASPP and additional loss terms such as edge loss and perceptual loss. The CHASPP module
achieved the best results on the same datasets across different evaluation metrics. It
improved on the baseline SalGAN+ASPP, from 3.356 ± 0.04 to 3.851 ± 0.01 (SalGAN+CHASPP+e).
We conclude that the CHASPP module, edge loss, and perceptual loss have a strong influence on
visual saliency prediction using a generative model.
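
As background for the ASPP baseline named above, the following is a minimal sketch of the general atrous spatial pyramid pooling idea: one 3x3 kernel applied at several dilation rates so that each branch sees a different spatial extent. It is not the CHASPP module of this study, whose cascaded hierarchical structure the abstract does not specify. The function names, the dilation rates (1, 2, 4), and the toy input are assumptions made for illustration; in a real network each branch has its own learned weights and the branch outputs are concatenated along the channel axis.

```python
def atrous_conv2d(image, kernel, rate):
    """Apply a 3x3 kernel with dilation `rate`, zero padding, same output size.

    As in deep learning frameworks, this is a cross-correlation (no kernel flip).
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    # Kernel taps are spaced `rate` pixels apart.
                    yy = y + (ky - 1) * rate
                    xx = x + (kx - 1) * rate
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[ky][kx] * image[yy][xx]
            out[y][x] = acc
    return out


def aspp_branches(image, kernel, rates=(1, 2, 4)):
    """Return one feature map per dilation rate (rates are an assumption)."""
    return [atrous_conv2d(image, kernel, r) for r in rates]


# Toy input: a single bright pixel in the centre of a 9x9 map.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
box = [[1.0] * 3 for _ in range(3)]
branches = aspp_branches(image, box)
```

With this toy input, the response of the rate-1 branch stays within one pixel of the centre, while the rate-4 branch responds as far away as the image corners. This illustrates how larger dilation rates widen the receptive field without adding parameters.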
