Scene Understanding

SGTR+: End-to-end Scene Graph Generation with Transformer

Scene Graph Generation (SGG) remains a challenging visual understanding task due to its compositional property. Most previous works adopt a bottom-up, two-stage or point-based, one-stage approach, which often suffers from high time complexity or …

Relation-aware Instance Refinement for Weakly Supervised Visual Grounding

Visual grounding, which aims to build a correspondence between visual objects and their language entities, plays a key role in cross-modal scene understanding. One promising and scalable strategy for learning visual grounding is to utilize weak …

Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation

Scene graph generation is an important visual understanding task with a broad range of vision applications. Despite recent tremendous progress, it remains challenging due to the intrinsic long-tailed class distribution and large …