Image inpainting is the task of filling missing regions of an image with plausible content. Benefiting from the development of deep learning, Pathak et al. first proposed a GAN-based network for image inpainting and achieved impressive results. DeepFill utilizes contextual attention, which explicitly borrows surrounding image features as references. Co-Modulation GAN tackles large-scale image inpainting via co-modulation of conditional and stochastic style representations on top of the sophisticated StyleGAN architecture. Recently, LaMa has used Fast Fourier Convolutions to encode features in the frequency domain with global receptive fields, achieving resolution-robust inpainting. We will give a comprehensive overview of these methods.
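As a rough illustration of the frequency-domain idea behind LaMa, the sketch below shows a minimal spectral branch in PyTorch: features are mapped to the Fourier domain, convolved there (so every output location sees the whole feature map), and mapped back. The `SpectralTransform` module, its layer choices, and its hyperparameters are simplified assumptions for illustration, not LaMa's actual implementation.

```python
import torch
import torch.nn as nn


class SpectralTransform(nn.Module):
    """A simplified sketch of the global (spectral) branch of a Fast Fourier
    Convolution: features are transformed to the frequency domain, convolved
    there (giving an image-wide receptive field), and transformed back."""

    def __init__(self, channels: int):
        super().__init__()
        # Real and imaginary parts are stacked along the channel axis,
        # so the 1x1 convolution operates on 2 * channels inputs.
        self.freq_conv = nn.Sequential(
            nn.Conv2d(channels * 2, channels * 2, kernel_size=1),
            nn.BatchNorm2d(channels * 2),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        freq = torch.fft.rfft2(x, norm="ortho")           # (B, C, H, W//2+1), complex
        freq = torch.cat([freq.real, freq.imag], dim=1)   # (B, 2C, H, W//2+1)
        freq = self.freq_conv(freq)
        real, imag = torch.chunk(freq, 2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")


if __name__ == "__main__":
    feats = torch.randn(1, 64, 32, 32)
    print(SpectralTransform(64)(feats).shape)  # torch.Size([1, 64, 32, 32])
```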
Although image inpainting has made significant advances in recent years, it remains challenging to recover corrupted images with reasonable structures. EdgeConnect, in particular, utilizes Canny edges to inpaint masked areas with precise structural results. However, it does not consider the holistic structure information needed for man-made scenes. In this talk, we will introduce MST, a novel method for inpainting man-made scenes. Specifically, MST learns a Sketch Tensor (ST) space for inpainting man-made scenes; see the sketch after this paragraph for how such structural priors typically enter an inpainting network. This space is learned to restore the edges, lines, and junctions in images, and thus enables reliable predictions of the holistic image structure. However, such sophisticated prior-based methods usually rely on multi-stage or multi-model designs, which are costly to train from scratch. ZITS was therefore proposed to incrementally incorporate structural priors into a pre-trained inpainting model without retraining. Moreover, ZITS can tackle high-resolution inpainting with intuitive structural upsampling and masking positional encoding. We will review these methods and discuss their pros and cons.
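To make structural prior guidance concrete, the following sketch shows one common way (in the spirit of EdgeConnect) to condition an inpainting network on edges: the masked image, a Canny edge map, and the hole mask are stacked as input channels. The function name and channel layout are hypothetical and only meant as an illustration.

```python
import cv2
import numpy as np
import torch


def build_edge_conditioned_input(image: np.ndarray, mask: np.ndarray) -> torch.Tensor:
    """Hypothetical EdgeConnect-style input preparation: the masked RGB image,
    a Canny edge map, and the binary hole mask are stacked as channels so the
    inpainting network can condition on structure.

    image: HxWx3 uint8 RGB image; mask: HxW uint8 (255 = hole).
    """
    gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 100, 200)                     # structural prior
    hole = (mask > 127).astype(np.float32)                # 1 inside the hole
    rgb = image.astype(np.float32) / 255.0
    rgb_masked = rgb * (1.0 - hole)[..., None]            # zero out the hole
    edges_masked = (edges / 255.0) * (1.0 - hole)         # keep edges only where known
    stacked = np.concatenate(
        [rgb_masked, edges_masked[..., None], hole[..., None]], axis=-1
    ).astype(np.float32)                                  # HxWx5
    return torch.from_numpy(stacked).permute(2, 0, 1).unsqueeze(0)  # 1x5xHxW
```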
Compared with vanilla AutoEncoders (AEs), GANs achieve superior performance in various image-to-image tasks, e.g., pix2pix, CycleGAN, StarGAN, and SC-FEGAN, because the adversarial loss effectively relieves blurry artifacts and yields more perceptually pleasant results. Recently, benefiting from the powerful generative capability of StyleGAN, various GAN inversion works have achieved impressive editing results by optimizing only the latent code of a frozen GAN model. Encoder-based GAN inversion methods instead train an encoder to project the guidance into initial latent codes for a pre-trained StyleGAN, enjoying faster inference than optimization-based approaches. Moreover, hybrid methods further combine the optimization-based and encoder-based approaches and enjoy the advantages of both. We will provide a comprehensive review of these methods.
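The optimization-based GAN inversion mentioned above can be summarized in a few lines: freeze the generator and optimize only the latent code to reconstruct the target image. The sketch below assumes a generic `generator` mapping a latent code to an image; the pixel-only loss and optimizer settings are simplifications of what inversion methods actually use (perceptual/LPIPS terms are common in practice).

```python
import torch
import torch.nn.functional as F


def invert_latent(generator: torch.nn.Module,
                  target: torch.Tensor,
                  latent_dim: int = 512,
                  steps: int = 500,
                  lr: float = 0.01) -> torch.Tensor:
    """Minimal sketch of optimization-based GAN inversion: the generator is
    frozen and only the latent code is optimized to reconstruct `target`,
    which is assumed to have the same shape as the generator output."""
    for p in generator.parameters():
        p.requires_grad_(False)                 # freeze the pre-trained GAN

    latent = torch.randn(1, latent_dim, requires_grad=True)
    optimizer = torch.optim.Adam([latent], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        recon = generator(latent)
        loss = F.mse_loss(recon, target)        # pixel loss only, for simplicity
        loss.backward()
        optimizer.step()
    return latent.detach()
```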
Vision Transformers (ViTs) have achieved great success in many vision tasks. Benefiting from perceptual image tokenizers such as VQVAE, many ViT-based methods can also be leveraged to solve various conditional image editing tasks. However, these works struggle with local editing, suffering from the limited receptive field of the standard AutoRegressive (AR) attention setting. iLAT proposes a local AR ViT to solve this problem and achieves good results in both face and pose editing. Besides, ManiTrans can flexibly handle entity-level text-guided image manipulation with a semantic alignment module and a powerful ViT model. We will discuss these approaches and their pros and cons.
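As a hypothetical illustration of how autoregressive attention can be localized to an editing region (the general idea behind local AR modeling, not iLAT's exact formulation), the sketch below builds an attention mask in which known context tokens are attended bidirectionally while tokens inside the edited region are generated causally.

```python
import torch


def local_ar_mask(edit_region: torch.Tensor) -> torch.Tensor:
    """Hypothetical locally autoregressive attention mask: tokens outside the
    editing region serve as fixed context and are always visible, while tokens
    inside the region may only attend to context and to earlier region tokens.

    edit_region: boolean tensor of shape (N,), True inside the edited region.
    Returns an (N, N) boolean mask, True = attention allowed.
    """
    n = edit_region.numel()
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    context = ~edit_region                                   # known tokens
    allowed = context.unsqueeze(0).expand(n, -1).clone()     # everyone sees context
    allowed |= causal & edit_region.unsqueeze(0)             # region tokens: causal
    return allowed


if __name__ == "__main__":
    region = torch.tensor([False, False, True, True, False, True])
    print(local_ar_mask(region).int())
```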
| Sessions | Title (with slides) | Video | Speakers |
| --- | --- | --- | --- |
| 9:00 - 9:10 | Opening Remarks | | Yanwei Fu |
| 9:10 - 10:20 | The Priors Guided Image Synthesis and Editing | TBD | Yanwei Fu |
| 10:20 - 10:50 | Image Inpainting and Editing with Structural Prior Guidance | [Youtube] [Bilibili] | Chenjie Cao |
| 10:50 - 11:05 | Break | | |
| 11:05 - 12:15 | Structure Guided Image Inpainting and Novel View Synthesis | TBD | Shenghua Gao |
| 12:15 - 12:45 | Image Inpainting and Editing with Various Prior Guidance | [Youtube] [Bilibili] | Qiaole Dong |
Contact the Organizing Committee: yanweifu@fudan.edu.cn and gaoshh@shanghaitech.edu.cn.