Cross Modal Reasoning

Learning Cross-modal Context Graph for Visual Grounding

Visual grounding is a ubiquitous building block in many vision-language tasks, yet it remains challenging due to large variations in the visual and linguistic features of grounding entities, strong context effects, and the resulting semantic ambiguities. …