Large-scale and weakly supervised learning of objects and actions

mpi-is 22 January 2013 - 22 January 2013

We first address the problem of large-scale image classification. We present and evaluate different ways of aggregating local image descriptors into a single vector and show that the Fisher kernel achieves better performance than the reference bag-of-visual-words approach for any given vector dimension. We show and interpret the importance of an appropriate vector normalization. Furthermore, we discuss how to learn from a large number of classes and images with stochastic gradient descent and show results on ImageNet10k. We then present a weakly supervised approach for learning human actions modeled as interactions between humans and objects. Our approach is human-centric: we first localize a human in the image and then determine the object relevant to the action and its spatial relation to the human. The model is learned automatically from a set of still images annotated only with the action label. Finally, we present work on learning object detectors from real-world web videos known only to contain objects of a target class. We propose a fully automatic pipeline that localizes objects in a set of videos of the class and learns a detector for it. The approach extracts candidate spatio-temporal tubes based on motion segmentation and then selects one tube per video jointly over all videos.
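The abstract does not say which vector normalization is meant. As an illustration only, the sketch below shows one scheme commonly applied to aggregated descriptors such as Fisher vectors: a signed power normalization followed by L2 normalization. The function name `power_l2_normalize` and the default exponent `alpha=0.5` are assumptions made for this example and are not taken from the talk.

```python
import math


def power_l2_normalize(vec, alpha=0.5):
    """Signed power normalization followed by L2 normalization.

    Hypothetical helper for illustration; alpha=0.5 (signed square root)
    is an assumed default, not a value stated in the abstract.
    """
    # Keep the sign of each component and raise its magnitude to alpha.
    powered = [math.copysign(abs(x) ** alpha, x) for x in vec]
    # Scale the result to unit Euclidean length.
    norm = math.sqrt(sum(x * x for x in powered))
    if norm == 0.0:
        # An all-zero vector cannot be scaled; return it unchanged.
        return powered
    return [x / norm for x in powered]


if __name__ == "__main__":
    print(power_l2_normalize([4.0, -9.0, 0.0]))
```

The power step dampens components with large magnitude, and the L2 step makes vectors comparable under a dot product regardless of how many local descriptors were aggregated.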
