Denis Bobkov (1, 2), Vadim Titov (2), Aibek Alanov (1, 2), Dmitry Vetrov (3)

1 - HSE University
2 - Artificial Intelligence Research Institute (AIRI)
3 - Constructor University, Bremen
CVPR 2024
SFE teaser
SFE graph
StyleFeatureEditor is able to edit a real face image with the desired editing. On the left are examples of how our method works for several directions with different editing power p. On the right we display a comparison with previous approaches. LPIPS (lower is better) indicates inversion quality, while FID (lower is better) indicates editing ability. Marker size indicates the inference time of each method; larger markers mean longer inference.

Abstract

The task of manipulating real image attributes through StyleGAN inversion has been extensively researched. This process involves searching for latent variables from a well-trained StyleGAN generator that can synthesize a real image, modifying these latent variables, and then synthesizing an image with the desired edits. A balance must be struck between the quality of the reconstruction and the ability to edit. Earlier studies utilized the low-dimensional W-space for latent search, which facilitated effective editing but struggled with reconstructing intricate details. More recent research has turned to the high-dimensional feature space F, which successfully inverts the input image but loses much of the detail during editing. In this paper, we introduce StyleFeatureEditor -- a novel method that enables editing in both w-latents and F-latents. This technique not only allows for the reconstruction of finer image details but also ensures their preservation during editing. We also present a new training pipeline specifically designed to train our model to accurately edit F-latents. Our method is compared with state-of-the-art encoding approaches, demonstrating that our model excels in terms of reconstruction quality and is capable of editing even challenging out-of-domain examples.

Overview

StyleFeatureEditor edits real images using StyleGAN's high-dimensional F-space. It first inverts the original image in F-space, then edits the found latents, and finally synthesises the edited image. In our work, we introduce a novel training pipeline that enables the use of higher-dimensional F-latents, preserving far more of the original image's details during editing.
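The invert → edit → synthesize loop described above can be sketched schematically. This is a minimal, illustrative toy, not the actual SFE implementation: the function names (`invert`, `edit`, `synthesize`), the dimensions, and the `power` parameter are placeholders, and real latents come from trained networks rather than the arithmetic below.

```python
# Toy sketch of the invert -> edit -> synthesize pipeline.
# All names and shapes are illustrative assumptions, not the SFE API.

W_DIM = 8  # toy dimensionality; real StyleGAN w-latents are 512-d

def invert(image):
    """Stand-in for the inverter: map an image to (w-latents, F-latents)."""
    w = [sum(image) / len(image)] * W_DIM   # toy low-dim w-latent
    F = [p * 2.0 for p in image]            # toy high-dim F-latent
    return w, F

def edit(w, F, direction, power):
    """Shift w along an editing direction with strength `power`.
    In SFE, a trained Feature Editor updates the F-latents accordingly;
    here F is simply passed through."""
    w_edited = [wi + power * di for wi, di in zip(w, direction)]
    return w_edited, F

def synthesize(w, F):
    """Stand-in for the generator, combining both latent groups."""
    bias = sum(w) / len(w)
    return [f + bias for f in F]

image = [0.1, 0.5, 0.9, 0.3]        # toy "image"
w, F = invert(image)
smile = [1.0] * W_DIM               # placeholder editing direction
w_e, F_e = edit(w, F, smile, power=0.5)
edited = synthesize(w_e, F_e)
```

The key design point, preserved even in this toy, is that editing acts on the low-dimensional w-latents while the high-dimensional F-latents carry the fine image details through to synthesis.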

Below we present examples of how our method works on several celebrities. Our method succeeds in applying the desired editing, while leaving background and face details almost unchanged.

StyleFeatureEditor is also capable of editing complex examples. You can see a comparison with previous methods below. SFE fully applies the editing, while previous approaches fail and produce artefacts.

Our method also works well on out-of-domain samples, such as faces from MetFaces -- it applies the editing while preserving the original style.

Metrics

Numerical results on CelebA-HQ confirm the visual examples: StyleFeatureEditor outperforms the previous SOTA in terms of LPIPS and L2 by more than a factor of four, while editing metrics and runtime are comparable. To measure editing quality, we used a technique based on CelebA's attribute annotations; more details can be found in our paper (Section 4.3).
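FID, used here to score editing quality, is the Fréchet distance between Gaussian statistics of deep features of real and edited images. As a hedged illustration of the underlying formula only: the sketch below computes the closed form for diagonal covariances, a simplification of real FID, which uses full covariance matrices of Inception features and a matrix square root.

```python
import math

def fid_diagonal(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with DIAGONAL covariances:
    ||mu1 - mu2||^2 + sum(var1 + var2 - 2*sqrt(var1 * var2)).
    Simplified illustration; real FID uses full covariances."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term

# Identical distributions give distance 0; shifting the mean by 1 in
# each of two dimensions gives distance 2.
print(fid_diagonal([0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]))  # 2.0
```

Lower is better for both FID and LPIPS: FID measures how close the edited-image distribution stays to the real one, while LPIPS measures per-image perceptual reconstruction error.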

SFE Metrics

BibTeX

@InProceedings{Bobkov_2024_CVPR,
    author    = {Bobkov, Denis and Titov, Vadim and Alanov, Aibek and Vetrov, Dmitry},
    title     = {The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {9337-9346}
}