[Topic] Visual Relationship and Vision-Language Representation


Reasoning about the relationships between objects is crucial for holistic scene understanding. Beyond what object recognition and detection provide, the relationships between objects carry rich semantic information about a scene. We have developed a series of methods for visual relation detection, visual relation grounding, and object relation reasoning. In particular, we are interested in cross-modal representation learning across the visual and language domains, and in using commonsense knowledge to facilitate visual relation understanding.

Learning Cross-Modal Context Graph for Visual Grounding, Yongfei Liu, Bo Wan, Xiaodan Zhu, Xuming He. AAAI Conference on Artificial Intelligence (AAAI), 2020

Pose-aware Multi-level Feature Network for Human Object Interaction Detection, Bo Wan, Desen Zhou, Yongfei Liu, Rongjie Li, Xuming He. International Conference on Computer Vision (ICCV), 2019

SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text, Alexander Mathews, Lexing Xie, Xuming He. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018