Disentangled Representation Based One-Shot Realistic Neural Talking Head Synthesis

dc.contributor.advisor: Getinet Yilma (PhD)
dc.contributor.author: Adugna, Abe
dc.date.accessioned: 2025-12-17T10:54:24Z
dc.date.issued: 2023-09
dc.description.abstract: This study introduces a novel deep-learning model for realistic neural talking-head synthesis. A neural talking head synthesizes a video using the target person's appearance, taken from a source image, with the output motion controlled by a driving video. The primary focus of this study is to generate a video that preserves the source image's appearance while acquiring motion information from the driving video. Prior approaches in this area depend primarily on 2D representations, such as appearance and motion, extracted from the input image. Recent work has attempted motion transfer on arbitrary objects using unsupervised techniques without prior information. However, a large pose difference between the objects in the source and driving images remains a significant challenge for current unsupervised algorithms; even the most recent methods fail to handle it with good visual quality. To address the poor visual quality of videos with large-scale pose changes, a GAN-based one-shot realistic neural talking-head model is proposed. The proposed model employs cross-modal attention to preserve identity-related information and enhance the quality of the generated images. It also uses background and warp losses to reduce noisy background motion and encourage the network to produce high-quality images. Additionally, to provide more precise and vivid visual effects, a multi-scale occlusion restoration module upsamples the low-resolution occlusion map to produce a multi-resolution occlusion map. Finally, disentangled representations are employed to facilitate animation and prevent leakage of the driving object's appearance or shape.
The experimental results show that the proposed approach improves several evaluation metrics, and the visual quality of the animated videos notably surpasses that of MRAA. The model improves on the MRAA baseline from 0.040 to 0.034 in L1, from 1.28 to 1.13 in AKD, and from 0.133 to 0.115 in AED. Experiments on the VoxCeleb1 benchmark dataset demonstrate that the proposed solution, combining cross-modal attention, background and warp losses, the multi-scale occlusion network, and disentangled representation, outperforms existing state-of-the-art methods.
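The multi-scale occlusion restoration idea described in the abstract can be sketched in a minimal form: a low-resolution occlusion map is upsampled to several resolutions, and at each scale the warped source features are blended with generated (inpainted) content wherever the map marks pixels as occluded. This is a hedged illustration only, not the thesis's actual implementation; the function names and the nearest-neighbor upsampling choice are assumptions, and a real model would operate on learned feature maps rather than raw arrays.

```python
import numpy as np

def upsample_nearest(occ, factor):
    """Nearest-neighbor upsampling of a 2D occlusion map (values in [0, 1])."""
    return np.repeat(np.repeat(occ, factor, axis=0), factor, axis=1)

def multi_scale_occlusion(occ_low, scales=(1, 2, 4)):
    """Turn one low-resolution occlusion map into a multi-resolution pyramid.

    scales are illustrative; a real module would match the generator's
    feature-map resolutions.
    """
    return [upsample_nearest(occ_low, s) for s in scales]

def occlusion_aware_blend(warped, inpainted, occ):
    """Keep warped-source content where it is visible (occ close to 1)
    and fall back to inpainted content where it is occluded (occ close to 0)."""
    return warped * occ + inpainted * (1.0 - occ)
```

For example, with a 2x2 map the pyramid contains 2x2, 4x4, and 8x8 versions, and blending all-ones "warped" content with all-zeros "inpainted" content simply reproduces the occlusion map itself.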
dc.description.sponsorship: ASTU
dc.identifier.uri: http://10.240.1.28:4000/handle/123456789/1606
dc.language.iso: en_US
dc.publisher: ASTU
dc.subject: Generative Adversarial Network, One-shot, MRAA, Disentangled Representation, 2D Representations
dc.title: Disentangled Representation Based One-Shot Realistic Neural Talking Head Synthesis
dc.type: Thesis

Files

Original bundle
Name: Adugna Abe.pdf
Size: 2.7 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text
