Tutorials
June 17, Morning Session: 8:30am - 12:30pm
Using the Graphics Processing Unit for Computer Vision
E. Scott Larsen
The Graphics Processing Unit (GPU), a built-in component of many computers today, has been used to accelerate a variety of computer vision tasks. Some of these uses are simply ports of CPU algorithms to the GPU's architecture. Others are novel algorithms in their own right: they take more native advantage of the GPU's true strengths and would not necessarily be the best choice for a general-purpose CPU.
The tutorial will broadly cover the use of GPUs in computer vision. First, it will cover the specific aspects of GPUs that directly relate to computer vision algorithms. Second, it will examine existing published computer vision algorithms that use the GPU. The final part of the tutorial will take participants through an investigation and analysis of a broad array of current computer vision algorithms, discussing which aspects of each would or would not be good candidates for the GPU, and possibly discussing a few variations on these algorithms that may make them more suitable.
Linear and Multilinear (Tensor) Methods for Vision, Graphics, and Signal Processing
Fernando De la Torre & M. Alex O. Vasilescu
Linear and multilinear methods (e.g., Principal Component Analysis, Independent Component Analysis, M-mode PCA, M-mode ICA) have been successfully applied to numerous computer vision, graphics, and signal processing tasks over the past two decades. In this tutorial, we will provide a unified framework for several novel component analysis techniques useful for modeling, clustering, and classifying high-dimensional data.
In the first part of the tutorial, we will review traditional linear techniques such as PCA, LDA, and CCA, along with several extensions (linear and non-linear) that deal with outliers, lack of training data, etc. In the second part, we will show how to generalize these methods to take advantage of the strengths of multilinear algebra, the algebra of higher-order tensors. We will discuss generalizations of the concepts of rank and orthogonality, tensor factorizations, generalizations of the linear projection operator, etc. The tutorial will cover the application of these techniques to clustering, visual tracking, signal modeling (e.g., background estimation, virtual avatars), pattern recognition (e.g., face recognition, gait recognition), and computer graphics.
June 17, Afternoon Session: 1:30pm - 5:30pm
Embedded Computer Vision and Real-Time Algorithms for Smart Cameras
Branislav Kisacanin & Mathias Kolsch
As the number of computer vision applications grows, so does the need for comprehensive coverage of the issues that algorithm developers face as they work toward real-time, embedded systems. This tutorial is a focused, vertical introduction to this topic. It will start from processor choices for real-time, embedded computer vision, then cover low-level vision algorithms, and conclude with a review of recent real-time algorithms for high-level vision.
Embedded vision systems are becoming prevalent in military and industrial applications as well as in the consumer market, particularly in games. The current trend toward multimedia processors (with integrated DSPs) and multi-core processors will accelerate this proliferation and provide the grounds for many novel computer vision applications.
Human-Centered Vision Systems
T. Huang, Alex Jaimes, & Nicu Sebe
This tutorial will take a holistic view of the research issues and applications of Human-Centered Vision Systems, focusing on three main areas: (1) multimodal interaction: visual (body, gaze, gesture) and audio (emotion) analysis; (2) image databases, indexing, and retrieval: context modeling, cultural issues, and machine learning for user-centric approaches; (3) multimedia data: conceptual analysis at different levels (feature, cognitive, and affective).
Human-computer interaction lies at the crossroads of many research areas (computer vision, multimedia, psychology, artificial intelligence, pattern recognition, etc.) and is used in a wide range of applications. In particular, we aim to develop human-centered information systems. The most important issue here is how to achieve synergy between humans and machines. The term human-centered emphasizes the fact that, although all existing vision systems were designed with human users in mind, many of them are far from user friendly. What can the scientific and engineering community do to effect a change for the better?
In this short course, we take a holistic approach to the problem of building human-centered vision systems. We aim to identify the important research issues and to ascertain potentially fruitful future research directions in relation to the issues raised above. In particular, we introduce key concepts and discuss technical approaches and open issues in the three areas listed above: multimodal interaction; image databases, indexing, and retrieval; and multimedia data.
The focus of the short course, therefore, is on technical analysis and interaction techniques formulated from the perspective of key human factors in a user-centered approach to developing Human-Centered Vision Systems.
June 18, Morning Session: 8:30am - 12:30pm
People Tracking
David Forsyth, Deva Ramanan, & Cristian Sminchisescu
This tutorial focuses on tracking people at the kinematic level, where one wishes to infer the 3D configuration of the major body segments from a single view. This is a difficult problem because body segments are small and move quickly and unpredictably, and because the inferred 3D configurations are subject to complex ambiguities.
We describe the many different approaches to three core problems. Lifting involves determining which 3D configuration corresponds to a given image configuration; ambiguities in lifting often cause major problems for standard tracking algorithms. Data association is the problem of deciding which image pixels to track. Inference involves combining what is known about human motion with what is observed; it is complicated by the high dimensionality of the spaces involved and by the fact that one may have to deal with multiple modes created by lifting ambiguities. This discussion will draw on background material from the animation community that constrains the family of available motion models.
June 18, Afternoon Session: 1:30pm - 5:30pm
Content-based Image and Video Retrieval
Theo Gevers, Arnold Smeulders & Nicu Sebe
The growing capacity of computers, the abundance of digital cameras, and the increased connectivity of the world all point to large digital multimedia archives. They include images and videos from the World Wide Web, museum objects, flowers, trademarks, and views from everyday life. The faster these archives grow, the more pressing the need for efficient access to the content of their images and videos.
In this short course, we will survey the most recent developments in image and video search engines. First, we will discuss the important step of feature extraction in detail, covering color, shape, and texture information and paying particular attention to discriminative power and invariance. Then, we will focus on indexing and genre classification as an intermediate step for sorting the data. We will also consider (interactive) ways to perform browsing and retrieval by means of information visualization and relevance feedback. Finally, we will discuss methods for localizing the retrieved objects in their images and videos.
Bruce A. Maxwell, Swarthmore College
Tutorials Chair
