The Priors Guided Image Editing and Synthesis

ACCV 2022 tutorial

Dec 5th 2022 (9:00 AM - 1:00 PM)

Location: Orchid 3, 28F, Hotel Okura, Macau

ACCV 2022 Tutorial on "The Priors Guided Image Editing and Synthesis"

News

Our tutorial will be held on Dec 5th 2022 (9:00 AM - 1:00 PM) in Orchid 3, 28F, Hotel Okura, Macau.

Image inpainting

Image inpainting is the task of filling missing regions of an image with plausible content. Building on advances in deep learning, Pathak et al. first proposed a GAN-based network for image inpainting and achieved impressive results. DeepFill uses contextual attention, which explicitly borrows features from the surrounding image as references. Co-Modulation GAN tackles large-scale image inpainting by co-modulating conditional and stochastic style representations on top of the sophisticated StyleGAN architecture. More recently, LaMa uses Fast Fourier Convolution to encode features in the frequency domain with a global receptive field, and achieves resolution-robust inpainting. We will give a comprehensive overview of these methods.
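To illustrate why operating in the frequency domain gives a global receptive field, the following minimal sketch compares a 3x3 spatial filter with a pointwise multiplication on the 2D spectrum. It is not LaMa's Fast Fourier Convolution layer; the function names, the random weights, and the 16x16 impulse input are assumptions chosen only for illustration.

```python
import numpy as np

def spatial_filter3x3(x, kernel):
    """3x3 filter with zero padding (cross-correlation, as in deep learning layers)."""
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def spectral_pointwise(x, weights):
    """Multiply each frequency by its own weight, then return to the image domain."""
    spectrum = np.fft.rfft2(x)
    return np.fft.irfft2(spectrum * weights, s=x.shape)

rng = np.random.default_rng(0)
x = np.zeros((16, 16))
x[8, 8] = 1.0  # a single active pixel

kernel = rng.normal(size=(3, 3))
# Hypothetical per-frequency weights; rfft2 of a 16x16 input has shape (16, 9).
weights = rng.normal(size=(16, 9)) + 1j * rng.normal(size=(16, 9))

local_out = spatial_filter3x3(x, kernel)
global_out = spectral_pointwise(x, weights)

# Count how many output pixels were influenced by the single input pixel.
local_support = int(np.count_nonzero(np.abs(local_out) > 1e-12))
global_support = int(np.count_nonzero(np.abs(global_out) > 1e-12))
print(local_support, global_support)
```

The spatial filter can only spread the pixel over its 3x3 neighborhood, while a single pointwise operation on the spectrum can influence every output location, which is the property the paragraph above attributes to frequency-domain features.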

Prior-guided image inpainting

Although image inpainting has advanced significantly in recent years, recovering corrupted images with reasonable structures remains challenging. EdgeConnect, in particular, uses Canny edges to inpaint masked areas with precise structures. However, holistic structural information has not been considered for man-made scenes. In this talk, we will introduce MST, a novel method for inpainting man-made scenes. Specifically, MST learns a Sketch Tensor (ST) space that restores the edges, lines, and junctions in images, and can therefore make reliable predictions of holistic image structures. These sophisticated prior-based methods, however, usually rely on multi-stage or multi-model designs that are costly to train from scratch. ZITS was therefore proposed to incrementally incorporate the prior into a pre-trained inpainting model without retraining. ZITS can also handle high-resolution inpainting through intuitive structural upsampling and masking positional encoding. We will review these methods and discuss their pros and cons.
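As a rough illustration of how an edge prior can be fed to a structure-restoration stage, the sketch below builds a three-channel input from a masked grayscale image, a masked edge map, and the mask itself. A thresholded gradient magnitude stands in for a Canny detector, and the channel layout, function names, and threshold are assumptions rather than the exact input format of EdgeConnect, MST, or ZITS.

```python
import numpy as np

def edge_map(gray, threshold=0.2):
    """Thresholded gradient magnitude; a simple stand-in for a Canny detector."""
    gy, gx = np.gradient(gray.astype(float))
    return (np.hypot(gx, gy) > threshold).astype(float)

def build_structure_input(gray, mask, threshold=0.2):
    """Stack masked image, masked edges, and mask as channels (C, H, W).

    mask is 1 inside the hole and 0 where pixels are known.
    """
    known = 1.0 - mask
    edges = edge_map(gray, threshold)
    return np.stack([gray * known, edges * known, mask], axis=0)

# Toy image with one vertical step edge, and a square hole that cuts through it.
gray = np.zeros((8, 8))
gray[:, 4:] = 1.0
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

structure_input = build_structure_input(gray, mask)
print(structure_input.shape)
```

A structure-restoration model would then be asked to complete the edge channel inside the hole, and the completed edges would guide the image inpainting stage.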

GAN-based Image Editing

Compared with vanilla AutoEncoders (AEs), GANs achieve superior performance in various image-to-image tasks, as shown by pix2pix, CycleGAN, StarGAN, and SC-FEGAN, because the adversarial loss effectively reduces blurry artifacts and yields more perceptually pleasing results. Recently, thanks to the powerful generative capability of StyleGAN, various GAN inversion works have also obtained impressive editing results by optimizing only the latent code while keeping the GAN frozen. Encoder-based GAN inversion instead trains an encoder to project the guidance to initial latent codes for a pre-trained StyleGAN, which gives faster inference than optimization-based GAN inversion. Moreover, further work combines the optimization-based and encoder-based methods and enjoys the advantages of both. We will provide a comprehensive review of these methods.
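The idea of optimization-based inversion, updating only the latent code while the generator stays frozen, can be sketched on a toy problem. Here a fixed linear map G(z) = W z stands in for a pre-trained generator such as StyleGAN, and plain gradient descent minimizes the squared reconstruction error. The linear generator, the loss, the step size rule, and all names are assumptions made for the sake of a small runnable example.

```python
import numpy as np

def invert_latent(generator_weights, target, steps=2000):
    """Gradient descent on the latent code z; the generator weights are never updated."""
    W = generator_weights
    # Step size 1/L, where L = 2 * sigma_max(W)^2 bounds the curvature of the loss.
    lr = 1.0 / (2.0 * np.linalg.norm(W, ord=2) ** 2)
    z = np.zeros(W.shape[1])  # an encoder-based method would predict this start point
    losses = []
    for _ in range(steps):
        residual = W @ z - target           # G(z) - x
        losses.append(float(residual @ residual))
        z -= lr * 2.0 * (W.T @ residual)    # gradient of ||G(z) - x||^2 w.r.t. z
    return z, losses

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4))   # toy frozen generator G(z) = W z
W_before = W.copy()
z_true = rng.normal(size=4)
x = W @ z_true                 # the "image" to invert

z_hat, losses = invert_latent(W, x)
print(losses[0], losses[-1])
```

In this picture, an encoder-based method replaces the zero initialization with a single forward pass of a trained encoder, and a hybrid method starts from the encoder's prediction and then refines it with a few optimization steps.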

Transformer-based Image Editing

Vision Transformers (ViTs) have achieved great success in many vision tasks. With perceptual image tokenizers such as VQVAE, many ViT-based methods can also be applied to conditional image editing tasks. However, these works struggle with local editing because of the limited receptive field of the standard autoregressive (AR) attention setting. iLAT proposes a local AR ViT to solve this problem and achieves good results in both face and pose editing. In addition, ManiTrans flexibly handles entity-level text-guided image manipulation with a semantic alignment module and a powerful ViT model. We will discuss these approaches and their pros and cons.
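The receptive-field limitation of standard AR attention, and one way a local AR mask can relax it, can be sketched with boolean attention masks over a flattened token sequence. In this illustrative design, tokens in the edited region are generated autoregressively among themselves but may attend to every known token, including those that come later in the sequence. This is an assumption for illustration and not necessarily the exact mask used in iLAT.

```python
import numpy as np

def causal_mask(n):
    """Standard AR mask: query i may attend to keys j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def local_ar_mask(edit):
    """Hypothetical local AR mask.

    edit[i] is True for tokens inside the region being edited.
    Rows are queries, columns are keys; True means attention is allowed.
    """
    edit = np.asarray(edit, dtype=bool)
    n = edit.size
    allowed = np.zeros((n, n), dtype=bool)
    # Every query may attend to every known (unedited) token, in any position.
    allowed[:, ~edit] = True
    # Edited queries may also attend to edited keys at or before their own position.
    allowed |= causal_mask(n) & edit[:, None] & edit[None, :]
    return allowed

edit = [False, False, True, True, False, False]
standard = causal_mask(len(edit))
local = local_ar_mask(edit)

print(standard[2].astype(int))  # what edited token 2 sees under standard AR
print(local[2].astype(int))     # what edited token 2 sees under the local AR mask
```

Under the standard mask, an edited token cannot use known content that appears after it in the sequence. The local variant exposes that context while still preventing edited tokens from seeing edited tokens that have not been generated yet.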

Program

Sessions | Title (with slides) | Video | Speakers
9:00 - 9:10 | Opening Remarks | | Yanwei Fu
9:10 - 10:20 | The Priors Guided Image Synthesis and Editing | TBD | Yanwei Fu
10:20 - 10:50 | Image Inpainting and Editing with Structural Prior Guidance | [Youtube] [Bilibili] | Chenjie Cao
10:50 - 11:05 | Break | |
11:05 - 12:15 | Structure Guided Image Inpainting and Novel View Synthesis | TBD | Shenghua Gao
12:15 - 12:45 | Image Inpainting and Editing with Various Prior Guidance | [Youtube] [Bilibili] | Qiaole Dong

Organizers and Speakers

Yanwei Fu

Fudan University

Shenghua Gao

ShanghaiTech University

Chenjie Cao

Fudan University

Qiaole Dong

Fudan University

Contacts

Contact the Organizing Committee: yanweifu@fudan.edu.cn and gaoshh@shanghaitech.edu.cn.