Seminar by Peter Vickers, PhD student at the University of Sheffield
Location: IC2, Salle des conseils, and online
Speaker: Peter Vickers
Multimodal Reasoning: Datasets, Models, and even some commonsense?
My PhD project investigates multimodal question answering over text, vision, and knowledge graphs. How can we pose difficult questions for AI that demand complex reasoning over multiple modalities? What biases come into play when we create datasets to capture these questions? And do state-of-the-art models for answering them actually attend to the data, or do they exploit shortcuts? I attempt to answer these questions and outline how we can ask genuinely challenging questions that require deep, grounded multimodal understanding and complex reasoning.