Scene Graph Generation, Vision-Language

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

We introduce a new open-vocabulary scene graph generation (SGG) framework based on sequence generation. Our framework leverages vision-language pre-trained models (VLMs) by adopting an image-to-graph generation paradigm.
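
To make the idea of generating a graph as a sequence concrete, the following is a minimal sketch of how a scene graph could be flattened into text and parsed back. The `<rel> subject | predicate | object </rel>` format and the function names `graph_to_sequence` and `sequence_to_graph` are illustrative assumptions, not the serialization used by the framework. The VLM decoding step itself is omitted.

```python
import re
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical serialization: one "<rel> s | p | o </rel>" span per edge.
# These delimiters are an assumption made for illustration only.
_REL_PATTERN = re.compile(r"<rel>\s*(.*?)\s*\|\s*(.*?)\s*\|\s*(.*?)\s*</rel>")


def graph_to_sequence(triplets: List[Triplet]) -> str:
    """Flatten a list of relation triplets into a single text sequence."""
    return " ".join(f"<rel> {s} | {p} | {o} </rel>" for s, p, o in triplets)


def sequence_to_graph(text: str) -> List[Triplet]:
    """Recover triplets from generated text, skipping spans that do not match."""
    return [(s, p, o) for s, p, o in _REL_PATTERN.findall(text)]


# Entity and predicate names are free-form strings, so nothing restricts
# them to a fixed label set.
triplets = [("man", "riding", "horse"), ("horse", "standing on", "grass")]
sequence = graph_to_sequence(triplets)
recovered = sequence_to_graph(sequence)
```

Because entities and predicates are emitted as plain text rather than class indices, a format of this kind places no closed vocabulary on the output, which is what makes a sequence-generation formulation a natural fit for the open-vocabulary setting.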