As a cost-effective alternative, word-based zero-shot semantic segmentation (w-ZSSS) approaches have been proposed, which recognize an unseen target class using only a word vector, without any supporting image. However, the expressiveness of w-ZSSS is limited because its representation of a novel class is fixed regardless of the input. To address this limitation, we propose a Spatial and Multi-scale aware Visual Class Embedding Network (SM-VCENet) for zero-shot semantic segmentation. SM-VCENet generates a visual class embedding of an unseen class by transferring visual context knowledge from the query image, resulting in a domain-aware class representation. SM-VCENet further enriches the visual class embedding by incorporating multi-scale attention and spatial attention. Our SM-VCENet outperforms the state-of-the-art by a noticeable margin on the PASCAL and COCO test sets. We also propose a novel benchmark (PASCAL2COCO) for zero-shot semantic segmentation, which includes domain adaptation and more challenging samples.
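To make the idea of a query-conditioned visual class embedding concrete, the following is a minimal sketch, not the paper's actual architecture: it assumes a word vector and multi-scale query feature maps, applies a simple spatial attention (softmax over word-feature similarities) at each scale, averages across scales, and fuses the result with the word vector. All function names and the equal-weight fusion are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def visual_class_embedding(feature_maps, word_vec):
    """Illustrative query-conditioned class embedding (not the paper's exact method).

    feature_maps: list of per-scale query features, each shaped (C, H, W)
    word_vec:     (C,) word embedding of the unseen class
    Returns a (C,) embedding mixing the word vector with spatially
    attended query features from every scale.
    """
    per_scale = []
    for fmap in feature_maps:
        C, H, W = fmap.shape
        flat = fmap.reshape(C, H * W)        # (C, HW)
        # spatial attention: weight locations by similarity to the word vector
        attn = softmax(word_vec @ flat)      # (HW,)
        per_scale.append(flat @ attn)        # attended feature, (C,)
    # multi-scale fusion: simple average over scales (an assumption)
    visual = np.mean(per_scale, axis=0)      # (C,)
    # fuse word and visual cues with equal weight (an assumption)
    return 0.5 * (word_vec + visual)

rng = np.random.default_rng(0)
fmaps = [rng.normal(size=(16, s, s)) for s in (8, 16, 32)]
w = rng.normal(size=16)
emb = visual_class_embedding(fmaps, w)
print(emb.shape)  # prints (16,)
```

Because the attention weights depend on the query image's features, the resulting class representation changes with the input domain, unlike a constant word-vector representation.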
Publisher
Ulsan National Institute of Science and Technology (UNIST)