Attention-Based Image-to-Video Translation for Synthesizing Facial Expressions Using a Generative Adversarial Network


Publisher

ASTU

Abstract

Image-to-video translation seeks to generate a video clip from a single static image; unlike video prediction, its input is a static image with no temporal cues. The fundamental challenge in video generation is not only producing high-quality image sequences but also keeping the frames consistent, with no abrupt shifts. With the development of generative adversarial networks (GANs), great progress has been made in image generation tasks, one of which is facial expression synthesis. This study focuses on generating a facial expression video from a single static image. Most previous works synthesize only frontal and near-frontal faces, which is insufficient for many real-world applications. Some earlier works manually assign expression intensity values to produce consistent image sequences; however, manual annotation fails when the video is incomplete. AffineGAN, a recent study, uses an affine transformation in latent space to infer the expression intensity value automatically, but it requires extracting features of the target ground-truth image, and the generated image sequences also suffer from quality issues. To address these issues, a new model is proposed that integrates self- and channel-attention mechanisms into the generator network for higher-quality image generation, and the mean absolute error is used to train the network. This study also proposes to infer the expression intensity value automatically, without extracting features of the ground-truth images. A local dataset is prepared with frontal faces and with two additional face positions, to the left and to the right. The average content distance of the proposed solution was measured across different experiments, and the proposed solution shows improvements. In terms of SSIM and PSNR, the proposed model improves on the baseline AffineGAN from 0.808 to 0.842 and from 32.922 to 33.574, respectively.
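The abstract names two generator-side ingredients: a self-attention mechanism and a mean absolute error (L1) reconstruction loss. As a minimal sketch only, assuming an attention formulation in the usual softmax(QKᵀ)V style with a learned residual scale (the projection matrices, `gamma`, and the flattened `(positions, channels)` layout here are illustrative assumptions, not the thesis's exact architecture):

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv, gamma=0.5):
    """Minimal spatial self-attention over flattened feature maps.

    x          : (N, C) array, N spatial positions with C channels.
    Wq, Wk, Wv : (C, C) projection matrices (kept square so the
                 residual connection below type-checks).
    gamma      : residual scale (a learned scalar in attention GANs;
                 fixed here for illustration).
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv               # query/key/value projections
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))  # (N, N) attention map
    out = attn @ v                                 # aggregate over all positions
    return x + gamma * out                         # residual connection

def l1_loss(pred, target):
    # Mean absolute error, the reconstruction loss the study trains with.
    return np.abs(pred - target).mean()
```

With `gamma=0` the module reduces to the identity, which is why a learned residual scale lets training start from local features and gradually admit long-range dependencies.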
This work concludes that training the network with a mean absolute error loss and integrating self- and channel-attention into the generator network improve the quality of the generated image sequences, and that evenly distributing expression intensity values according to the number of frames improves the consistency of the generated sequences while enabling the generator to produce videos of different frame counts, with intensities remaining within the range [0, 1].
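The intensity-assignment idea in the conclusion, evenly spaced values over the frame count so any video length stays within [0, 1], can be sketched in one line (the function name is illustrative, not from the thesis):

```python
import numpy as np

def intensity_schedule(num_frames):
    """Evenly spaced expression-intensity values in [0, 1].

    Frame 0 receives intensity 0 (neutral face) and the last frame
    receives 1 (peak expression), so no manual annotation is needed
    and videos of any length remain within [0, 1].
    """
    return np.linspace(0.0, 1.0, num_frames)
```

For example, a five-frame clip would be conditioned on intensities 0, 0.25, 0.5, 0.75, 1, while an eleven-frame clip spaces the same range in steps of 0.1.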
