Recent progress in visual recognition goes hand-in-hand with the supervised learning and large-scale training data. While the number of existing images and videos is huge, their detailed annotation is expensive and often ambiguous. In this talk we will discuss addressing these problems, focusing on weakly supervised learning methods using incomplete and noisy supervision for training. In the first part, I will discuss recognition from still images and will describe our work on weakly supervised convolutional networks for recognizing and localizing objects and human actions. The second part of the talk will focus on the learning of human actions from videos. In particular, we will consider understanding specific tasks from YouTube instruction videos and corresponding narrations. We will conclude with future challenges in and opportunities for visual recognition.
https://mediaspace.gatech.edu/media/laptev/1_1t6ezunj
- Tags
-