We propose UniCoRN to solve the conditional image repainting (CIR) task. Let's start with a brief overview of the task. A conditional image repainting model aims to repaint designated regions of an original image so as to synthesize the appearance specified by the user under a series of conditions. In this paper, we repaint original images under four cross-modality conditions: texture, color, geometry, and background. Existing CIR methods are implemented in a two-phase way. First, visual content is generated under the guidance of the input conditions. Then, the meaningless background area is replaced with a given background, so that the visual content is seamlessly composited with the input background. As a result, these models can only adjust the color tone of the repainted regions after the first generation phase, which degrades performance.

Next, we elaborate on the detailed design of our unified conditional image repainting network. We show the pipeline of our proposed UniCoRN here. The color condition is first embedded and broadcast under the guidance of the geometry condition. We then convolve it to obtain the hidden feature, which serves as the initial input to the generator. The other input conditions are fed into feature-adaptive batch normalization. There, the geometry condition, the background condition, and the hidden feature are fused by the cross-modality condition fusion module and then convolved to produce appearance parameters, alongside pattern parameters derived from the texture condition. The produced parameters modulate the hidden feature after batch normalization. Inside the cross-modality condition fusion module, the geometry condition is convolved into a gate that fuses the hidden feature and the background feature.

We also propose a multi-grained attentive similarity loss, built from two encoders, to better constrain color consistency. The image encoder is a group convolutional network that extracts features from its middle layers as outputs, while the label encoder consists of encoder units that represent attributes at different semantic levels. For each sample, we obtain features of the synthetic image and the color condition at each semantic level. We then compute the posterior probability that the color condition and the synthetic image match, and use it as our loss.

Next, we present experimental results to validate the advantages of our method and demonstrate its applications. Following previous works, we use FID, R-precision, and M score for performance evaluation, and show that UniCoRN achieves better synthesis quality, condition consistency, and compositing effect than other state-of-the-art methods. We further conduct user studies on three datasets to confirm its subjective advantages. We also demonstrate the robustness of UniCoRN by synthesizing images from modified input conditions. Note that the synthetic images vary with the color bar on the left. On the person dataset, we change people's clothing by modifying the geometry condition. On the landscape dataset, we modify the geometry condition to change the landscape layout. Thanks for watching!
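To make the pipeline walk-through above more concrete, here is a minimal sketch of the first step: embedding the color condition and broadcasting it under geometry guidance to form the generator's initial hidden feature. The function name `initial_hidden`, the tensor shapes, and the layer sizes are all illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

def initial_hidden(color_emb: torch.Tensor,
                   geometry: torch.Tensor,
                   conv: nn.Module) -> torch.Tensor:
    """Broadcast an embedded color condition over the geometry region,
    then convolve it into the generator's initial hidden feature."""
    # color_emb: (B, D); geometry: (B, 1, H, W) region mask.
    # Painting the embedding onto every pixel inside the mask broadcasts
    # (B, D, 1, 1) * (B, 1, H, W) -> (B, D, H, W).
    broadcast = color_emb[:, :, None, None] * geometry
    return conv(broadcast)

# Usage with hypothetical sizes: a 64-d color embedding on a 32x32 mask.
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
hidden = initial_hidden(torch.randn(2, 64), torch.rand(2, 1, 32, 32), conv)
```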
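The feature-adaptive batch normalization and cross-modality condition fusion steps described above can be sketched as follows. This is a rough PyTorch rendering under stated assumptions: the sigmoid form of the geometry gate, the additive combination of appearance and pattern parameters, all channel counts, and the class names are guesses made for illustration.

```python
import torch
import torch.nn as nn

class CrossModalityFusion(nn.Module):
    """Fuse the hidden feature with the background feature,
    gated by a convolved geometry condition."""
    def __init__(self, channels: int, geo_channels: int):
        super().__init__()
        # Geometry is convolved into a per-pixel soft gate in [0, 1].
        self.gate = nn.Sequential(
            nn.Conv2d(geo_channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, hidden, background, geometry):
        g = self.gate(geometry)
        # Gated blend: geometry decides where hidden vs. background dominates.
        return g * hidden + (1.0 - g) * background

class FeatureAdaptiveBN(nn.Module):
    """Normalize the hidden feature, then modulate it with appearance
    parameters (from the fused feature) and pattern parameters (from texture)."""
    def __init__(self, channels: int, geo_channels: int, tex_channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=False)  # parameter-free BN
        self.fusion = CrossModalityFusion(channels, geo_channels)
        self.appearance = nn.Conv2d(channels, 2 * channels, 3, padding=1)
        self.pattern = nn.Conv2d(tex_channels, 2 * channels, 3, padding=1)

    def forward(self, hidden, background, geometry, texture):
        fused = self.fusion(hidden, background, geometry)
        gamma_a, beta_a = self.appearance(fused).chunk(2, dim=1)
        gamma_p, beta_p = self.pattern(texture).chunk(2, dim=1)
        x = self.bn(hidden)
        # Combine appearance and pattern modulation (additive combination
        # is an assumption; the actual network may combine them differently).
        return (1 + gamma_a + gamma_p) * x + beta_a + beta_p
```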
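Finally, the multi-grained attentive similarity loss computes, at each semantic level, the posterior probability that a synthetic image matches its color condition. A common way to realize such a batch-wise matching posterior is a softmax over scaled cosine similarities, as in DAMSM-style losses; the sketch below follows that pattern. The smoothing factor `gamma`, the feature shapes, and the omission of spatial attention are all assumptions.

```python
import torch
import torch.nn.functional as F

def matching_loss(img_feats, cond_feats, gamma: float = 10.0):
    """Batch-wise matching posterior, averaged over semantic levels.

    img_feats / cond_feats: lists of (B, D) tensors, one pair per semantic
    level, from the image encoder and the label encoder respectively.
    """
    loss = 0.0
    for f_img, f_cond in zip(img_feats, cond_feats):
        f_img = F.normalize(f_img, dim=1)
        f_cond = F.normalize(f_cond, dim=1)
        sim = gamma * f_img @ f_cond.t()  # (B, B) scaled cosine similarities
        targets = torch.arange(sim.size(0), device=sim.device)
        # Softmax over the batch gives the posterior that image i matches
        # condition i; penalize mismatches in both matching directions.
        loss = loss + F.cross_entropy(sim, targets) + F.cross_entropy(sim.t(), targets)
    return loss / len(img_feats)

# Usage with hypothetical feature sizes at two semantic levels:
imgs = [torch.randn(4, 256), torch.randn(4, 512)]
conds = [torch.randn(4, 256), torch.randn(4, 512)]
print(matching_loss(imgs, conds))
```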