TFill: Image Completion via a Transformer-Based Architecture

Abstract Bridging distant context interactions is important for high-quality image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior.

The Spatially-Correlative Loss for Various Image Translation Tasks

Abstract We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation.

Pluralistic Image Completion

Abstract Most image completion methods produce only one result for each masked input, although there may be many reasonable possibilities. In this paper, we present an approach for pluralistic image completion, the task of generating multiple diverse and plausible solutions for image completion.

T2Net: Synthetic-to-Realistic Translation for Depth Estimation Tasks

Abstract Current methods for single-image depth estimation use training datasets with real image-depth pairs or stereo pairs, which are not easy to acquire. We propose a framework, trained on synthetic image-depth pairs and unpaired real images, that comprises an image translation network for enhancing realism of input images, followed by a depth prediction network.

Semantic segmentation based on aggregated features and contextual information