A Hybrid Deep Learning Approach for Color Face Generation Using Mask R-CNN and GAN
Publisher
ASTU
Abstract
Generating realistic, high-resolution color face images remains a challenging task in computer vision because it requires both a precise structural representation of facial components and fine-grained texture generation. Existing GAN-based techniques often fail to preserve semantic consistency between facial regions and to generate high-quality textures, resulting in unnatural or blurry images. To address these issues, this research presents a hybrid deep learning approach that combines Mask R-CNN for semantic facial region segmentation with a GAN-based generator for realistic face image synthesis. Mask R-CNN is used to precisely segment key facial features, including the eyes, nose, and mouth, and the resulting masks guide an attention-enhanced U-Net generator. The generator uses self-attention modules to model long-range spatial dependencies and channel attention to dynamically emphasize informative feature channels, thereby preserving both local and global image details. A multiscale PatchGAN discriminator enforces image realism at several scales, while the training process employs a combination of pixel-wise, perceptual, feature-matching, and structural similarity losses to enhance overall image realism. The proposed method is evaluated on the CelebA-HQ dataset, where it achieves a PSNR of 28.33, an SSIM of 0.9207, and an FID of 21.77, outperforming baseline models and state-of-the-art methods. In addition, the use of semantic guidance and attention mechanisms enables the model to generalize well even with small or diverse datasets, making it practical for real-world use. The results show the framework's promise for high-fidelity facial synthesis in virtual reality, digital media, and other computer vision applications.
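As a reading aid for the reported PSNR figure, the following minimal sketch shows how PSNR is commonly derived from the mean squared error between a reference image and a generated image. It is an illustration only, not the evaluation code used in this work: the function names, the 8-bit peak value of 255, and the tiny pixel patches are all assumptions made for the example.

```python
import math


def mse(reference, generated):
    """Mean squared error between two equally sized, flattened pixel sequences."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("images must be non-empty and the same size")
    return sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB.

    max_value is the largest possible pixel value; 255 assumes 8-bit images,
    which is an assumption of this sketch and not stated in the abstract.
    """
    error = mse(reference, generated)
    if error == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / error)


# Hypothetical 2x2 grayscale patches, flattened row by row (illustrative data only).
reference = [52, 55, 61, 59]
generated = [54, 55, 60, 57]
print(f"PSNR: {psnr(reference, generated):.2f} dB")
```

Higher PSNR means lower pixel-wise error, which is why it is usually reported alongside SSIM (structural similarity) and FID (distributional realism), since each captures a different aspect of image quality.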
