Detailed Information


Controllable multi-lingual visual text generation via diffusion model

Author(s)
Ku, Yunhoe
Advisor
Baek, Seungryul
Issued Date
2024-02
URI
https://scholarworks.unist.ac.kr/handle/201301/82147
http://unist.dcollection.net/common/orgView/200000744302
Abstract
This paper aims to address the limitations of visual text generation in text-to-image synthesis. Although recent efforts have applied the Stable Diffusion model to the visual text generation task, they are constrained to the specific language used during training and offer limited controllability, such as reflecting the font of the text or generating it in desired regions. Additionally, geometric transformations have been restricted, leading to failures when generating text with arbitrary geometry, and the alignment between objects and texts has been largely neglected. In this paper, we propose attaching additional modules to the Stable Diffusion model to extend its capabilities toward controllable multi-lingual visual text generation. Our approach simultaneously extends the model to support multi-lingual generation, accommodating arbitrary geometry and font styles while aligning texts and images at desired locations. Moreover, our model generalizes to new languages using only a small amount of training data for each language. Experiments are conducted on multi-lingual benchmarks, and we demonstrate state-of-the-art performance through user studies and OCR accuracies.
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Graduate School of Artificial Intelligence
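
Note on the approach described in the abstract: the thesis attaches additional modules to a frozen Stable Diffusion model so that text rendered with a chosen font, geometry, and location can condition generation. The sketch below is purely illustrative and is not the thesis's released code; the module name, tensor shapes, and the injection point are assumptions. It shows one common way such a module can be attached: a small, zero-initialized adapter that encodes a rendered glyph map and adds its features to the diffusion latent.

# Hypothetical sketch: a glyph-conditioning adapter attached to a frozen
# Stable Diffusion U-Net. All names and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class GlyphConditionAdapter(nn.Module):
    """Encodes a rendered glyph map (the target text drawn with the desired
    font and geometry at the desired region) into latent-space features that
    can be injected into a frozen diffusion U-Net."""
    def __init__(self, in_channels: int = 1, latent_channels: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            # zero-initialized projection so training starts from the
            # unmodified Stable Diffusion behaviour
            nn.Conv2d(128, latent_channels, 1),
        )
        nn.init.zeros_(self.encoder[-1].weight)
        nn.init.zeros_(self.encoder[-1].bias)

    def forward(self, glyph_map: torch.Tensor) -> torch.Tensor:
        return self.encoder(glyph_map)

# Usage sketch: the adapter output is added to the noisy latent before each
# U-Net call; only the attached modules are trained, the pretrained U-Net
# stays frozen so it keeps its generative prior.
adapter = GlyphConditionAdapter()
glyph_map = torch.rand(1, 1, 512, 512)    # rendered multi-lingual text image
noisy_latent = torch.randn(1, 4, 64, 64)  # Stable Diffusion latent at some timestep
conditioned_latent = noisy_latent + adapter(glyph_map)

Under this kind of design, extending to a new language would amount to training the attached modules on a small set of glyph maps for that script while the backbone remains frozen, which is consistent with the abstract's claim of generalizing to new languages from limited data.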

