Technical Program

Paper Detail

Paper: PS-2A.16
Session: Poster Session 2A
Location: Symphony/Overture
Session Time: Friday, September 7, 17:15 - 19:15
Presentation Time: Friday, September 7, 17:15 - 19:15
Presentation: Poster
Paper Title: A Large Scale Multi-Label Action Dataset for Video Understanding
Authors: Mathew Monfort, Kandan Ramakrishnan, MIT, United States; Dan Gutfreund, IBM Research and MIT-IBM Watson AI Lab, United States; Aude Oliva, MIT, United States
Abstract: The world is inherently multi-label. Even when restricted to the space of actions, multiple things and events often happen simultaneously and a single label is commonly insufficient for adequately explaining the full meaning of an event. To develop methods reaching human-level understanding of dynamical events, we need to capture the complex nature of our environment. Here, we present a multi-label extension to the Moments in Time Dataset which includes annotation of multiple actions in each video. We perform a baseline analysis and compare recognition results, class selectivity, and network robustness of a temporal relation network (TRN) trained on both single-label Moments in Time and the proposed multi-label extension.
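Illustration: the difference between single-label and multi-label annotation can be sketched in a few lines of code. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class list, the example labels, the logits, and the choice of a per-class binary cross-entropy loss are all hypothetical and are not taken from the paper. It shows a video annotated with several actions encoded as a multi-hot target, with each action scored independently rather than competing in a single softmax.

```python
import math

# Hypothetical label vocabulary for illustration only;
# the actual dataset defines its own set of action classes.
CLASSES = ["running", "jumping", "talking", "laughing"]


def multi_hot(labels, classes=CLASSES):
    """Encode a set of action labels as a 0/1 vector over the vocabulary."""
    present = set(labels)
    return [1.0 if c in present else 0.0 for c in classes]


def sigmoid(x):
    """Map a raw score (logit) to an independent per-class probability."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logits, targets):
    """Mean per-class binary cross-entropy.

    Each action is treated as its own yes/no decision, so several
    actions can be predicted as present in the same video.
    (Assumed loss; the paper's training objective may differ.)
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps))
    return total / len(targets)


# A video in which two actions occur at once (made-up example).
target = multi_hot(["running", "laughing"])

# Made-up network outputs, one logit per class.
logits = [2.0, -1.5, -2.0, 1.0]
loss = binary_cross_entropy(logits, target)
```

With a single-label scheme the same video would have to be assigned only one of the two actions, and the other would count as an error even when predicted correctly.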