Identity-Preserving Face Swapping via Dual Surrogate Generative Models

1Institute of Computing Technology, Chinese Academy of Sciences, China
2AI Lab, Tencent, China
3National Cheng Kung University, Taiwan

Face-swapped images generated by our method, E4S, StyleFusion, and InfoSwap. From top to bottom and left to right: female-to-female, female-to-male, male-to-male, and male-to-female swapping conditions. Our method outperforms these high-fidelity face-swapping models in terms of identity preservation.

Abstract

In this study, we revisit the fundamental setting of face-swapping models and reveal that relying solely on implicit supervision during training makes it difficult for advanced methods to preserve the source identity. We propose a novel reverse pseudo-input generation approach that offers supplemental data for training face-swapping models, addressing the aforementioned issue. Unlike the traditional pseudo-label-based training strategy, we assume that an arbitrary real facial image can serve as the ground-truth output of the face-swapping network, and we generate the corresponding <source, target> input pair. Specifically, we employ a source-creating surrogate that alters the attributes of the real image while keeping its identity, and a target-creating surrogate that synthesizes attribute-preserved target images with different identities. Our framework, which utilizes this proxy-paired data as explicit supervision to direct the face-swapping training process, partially provides a credible and effective optimization direction that boosts the identity-preserving capability. We design explicit and implicit adaptation strategies to better approximate the explicit supervision for face swapping. Quantitative and qualitative experiments on FF++, FFHQ, and in-the-wild images show that our framework improves the performance of various face-swapping pipelines in terms of visual fidelity and identity preservation. Furthermore, we demonstrate applications of our method to re-aging, swappable attribute customization, cross-domain face swapping, and video face swapping.

Method


Comparison between different swapping training frameworks. (a) Face-swapping training with a ground-truth swapped image $y_{gt}$; (b) current swapping training with ID loss and reconstruction loss; (c) our CSCS synthesizes proxy paired data from one real image via dual creating surrogates, and adaptation is applied to obtain the required training data. Credible supervision is provided by $L_{rec}$ between $y_{out}$ and $y_{gt}$, where a real image guides the swapping. The reverse design of the dual surrogates prevents error accumulation in the multi-step synthesis process.
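
To make the data flow concrete, below is a minimal PyTorch-style sketch of the reverse pseudo-input generation described above. The module names (SourceSurrogate, TargetSurrogate) and the placeholder convolutions are our own illustrative stand-ins, not the paper's actual generators, which in practice would be pretrained attribute-editing and identity-replacing networks.

import torch
from torch import nn

class SourceSurrogate(nn.Module):
    """Hypothetical source-creating surrogate: edits the attributes
    (pose, expression, lighting) of a real face while keeping its
    identity. A single conv stands in for a pretrained generator."""
    def __init__(self):
        super().__init__()
        self.edit = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, y_real: torch.Tensor) -> torch.Tensor:
        return self.edit(y_real)

class TargetSurrogate(nn.Module):
    """Hypothetical target-creating surrogate: replaces the identity
    of a real face while keeping its attributes. Again a placeholder
    for a pretrained generator."""
    def __init__(self):
        super().__init__()
        self.reface = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, y_real: torch.Tensor) -> torch.Tensor:
        return self.reface(y_real)

def make_proxy_pair(y_real, g_src, g_tgt):
    """Reverse pseudo-input generation: treat the real image y_real as
    the ground-truth swap result and synthesize the matching
    <source, target> input pair. Both surrogates read the same real
    image, so errors do not accumulate across a synthesis chain."""
    with torch.no_grad():           # surrogates are frozen here
        x_src = g_src(y_real)       # identity of y_real, altered attributes
        x_tgt = g_tgt(y_real)       # attributes of y_real, altered identity
    return x_src, x_tgt, y_real     # inputs plus explicit ground truth

# Example: one batch of proxy training data from real images.
y_real = torch.rand(4, 3, 256, 256)
x_src, x_tgt, y_gt = make_proxy_pair(y_real, SourceSurrogate(), TargetSurrogate())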
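
The explicit supervision itself can then be sketched as one training step: the swapping network consumes the proxy pair and its output $y_{out}$ is pulled toward the real image $y_{gt}$ via a reconstruction loss. TinySwapper and the plain L1 loss are hypothetical simplifications; the actual $L_{rec}$ and swapping backbone depend on the pipeline being trained, and real pipelines typically add perceptual and identity terms on top.

import torch
import torch.nn.functional as F
from torch import nn

class TinySwapper(nn.Module):
    """Minimal stand-in for a face-swapping network F(source, target)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(6, 3, kernel_size=3, padding=1)

    def forward(self, x_src, x_tgt):
        # Channel-wise concatenation is only a toy fusion scheme.
        return self.net(torch.cat([x_src, x_tgt], dim=1))

def swap_training_step(swapper, x_src, x_tgt, y_gt, optimizer, lambda_rec=1.0):
    """One step of explicitly supervised swapping: pull the swapped
    output y_out toward the real image y_gt that the proxy pair was
    generated from. L1 stands in for L_rec here."""
    optimizer.zero_grad()
    y_out = swapper(x_src, x_tgt)
    loss = lambda_rec * F.l1_loss(y_out, y_gt)   # L_rec(y_out, y_gt)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random tensors in place of proxy pairs.
swapper = TinySwapper()
opt = torch.optim.Adam(swapper.parameters(), lr=1e-4)
x_src, x_tgt, y_gt = (torch.rand(2, 3, 256, 256) for _ in range(3))
print(swap_training_step(swapper, x_src, x_tgt, y_gt, opt))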

Image Results

Comparisons on FF++


Comparisons on More Data


Swappable Attribute Customization


Cross-domain Results


Re-aging


Video Results

BibTeX

@article{cscs,
  title={Identity-Preserving Face Swapping via Dual Surrogate Generative Models},
  author={Huang, Ziyao and Tang, Fan and Zhang, Yong and Cao, Juan and Li, Chengyu and Tang, Sheng and Li, Jintao and Lee, Tong-Yee},
  journal={ACM Transactions on Graphics (TOG)}
}