Multimodal Reasoning: Datasets, Models, and even some commonsense? – Laboratoire d'Informatique de l'Université du Mans

Seminar by Peter Vickers, PhD student at the University of Sheffield

Date: 4 March 2022
Time: 11:00
Location: IC2, Salle des conseils, and online
Speaker: Peter Vickers

Multimodal Reasoning: Datasets, Models, and even some commonsense?

My PhD project investigates multimodal question answering across text, vision, and knowledge graphs. How can we pose difficult questions for AI that require complex reasoning over multiple modalities? What biases come into play when we create datasets to capture these questions? And do state-of-the-art models for answering them actually attend to the data, or do they exploit shortcuts? I attempt to answer these questions and outline how we can ask genuinely challenging questions that require deep, grounded multimodal understanding and complex reasoning.