Aneeshan Sain
Ayan Kumar Bhunia
Yongxin Yang
Tao (Tony) Xiang
Yi-Zhe Song
SketchX, Centre for Vision, Speech and Signal Processing, University of Surrey, United Kingdom
A sketch used as an image search query is an ideal alternative to text for capturing fine-grained visual details. Prior successes on fine-grained sketch-based image retrieval (FG-SBIR) have demonstrated the importance of tackling the unique traits of sketches as opposed to photos, e.g., temporal vs. static, strokes vs. pixels, and abstract vs. pixel-perfect. In this paper, we study a further trait of sketches that has been overlooked to date: they are hierarchical in terms of levels of detail, since a person typically sketches to varying extents of detail to depict an object. This hierarchical structure is often visually distinct. We design a novel network that is capable of cultivating sketch-specific hierarchies and exploiting them to match a sketch with a photo at corresponding hierarchical levels. In particular, features from a sketch and a photo are enriched using cross-modal co-attention, coupled with hierarchical node fusion at every level, to form a better embedding space in which to conduct retrieval. Experiments on common benchmarks show that our method outperforms state-of-the-art methods by a significant margin.
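The cross-modal co-attention mentioned in the abstract can be illustrated with a generic sketch. This is not the paper's architecture: the function name `co_attend`, the dot-product affinity, the residual addition, and the feature shapes are all assumptions chosen for illustration, and the hierarchical node fusion step is not modelled here.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def co_attend(sketch_feats, photo_feats):
    """Enrich each modality with context from the other (hypothetical helper).

    sketch_feats: (n_s, d) array of sketch region features
    photo_feats:  (n_p, d) array of photo region features
    """
    # Pairwise affinity between every sketch region and every photo region.
    affinity = sketch_feats @ photo_feats.T               # (n_s, n_p)
    # Each sketch region attends over the photo regions, and vice versa.
    sketch_ctx = softmax(affinity, axis=1) @ photo_feats      # (n_s, d)
    photo_ctx = softmax(affinity.T, axis=1) @ sketch_feats    # (n_p, d)
    # Residual addition (an assumption) keeps the original features.
    return sketch_feats + sketch_ctx, photo_feats + photo_ctx

rng = np.random.default_rng(0)
sketch_regions = rng.normal(size=(5, 8))   # 5 sketch regions, 8-dim features
photo_regions = rng.normal(size=(7, 8))    # 7 photo regions, 8-dim features
enriched_sketch, enriched_photo = co_attend(sketch_regions, photo_regions)
```

Each modality keeps its own number of regions; only the content of the features changes, which is what lets the two sets be compared in a shared embedding space afterwards.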
Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval
The advantage of our method (blue) over a Siamese baseline network trained with triplet loss (red) is shown at varying levels of sketch detail (coarse, coarse++). Numbers denote the rank of the matching photo.
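To make the two quantities in the caption concrete, here is a minimal sketch of a standard triplet loss and of the rank of the matching photo in a gallery. The Euclidean distance, the margin value of 0.2, and the function names are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on Euclidean distances (margin is an assumed value)."""
    d_pos = np.linalg.norm(anchor - positive)   # sketch to matching photo
    d_neg = np.linalg.norm(anchor - negative)   # sketch to non-matching photo
    return max(0.0, d_pos - d_neg + margin)

def rank_of_match(sketch_emb, photo_embs, match_index):
    """1-based rank of the matching photo when the gallery is sorted by distance."""
    dists = np.linalg.norm(photo_embs - sketch_emb, axis=1)
    # Count photos strictly closer than the true match, then add one.
    return int(np.sum(dists < dists[match_index])) + 1

# Toy 2-D embeddings: gallery distances from the sketch are 5, 1 and 2.
sketch = np.array([0.0, 0.0])
gallery = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])

print(rank_of_match(sketch, gallery, match_index=2))   # one photo is closer, so rank 2
print(triplet_loss(sketch, gallery[1], gallery[0]))    # positive already closer: 0.0
```

A rank of 1 means the matching photo is retrieved first, so lower numbers in the figure indicate better retrieval.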
Paper and Bibtex
Citation: Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval. In BMVC 2020.
[Bibtex]
@article{sain2020cross,
title={Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval},
author={Sain, Aneeshan and Bhunia, Ayan Kumar and Yang, Yongxin and Xiang, Tao and Song, Yi-Zhe},
journal={arXiv preprint arXiv:2007.15103},
year={2020}
}