2017 Looking at People ICCV Challenge - Large-scale continuous gesture recognition (round 2)
Track description
Overview
Two large multi-modal video datasets, the ChaLearn LAP RGB-D Isolated Gesture Dataset (IsoGD) and the Continuous Gesture Dataset (ConGD), have been designed for gesture recognition, see Figure 1. Both datasets include 47933 gestures from 249 gesture classes, making them the largest RGB-D gesture datasets in terms of the number of gestures and classes. ConGD is designed for gesture spotting and recognition in continuous sequences of gestures, while IsoGD is designed for gesture classification from segmented data. The main challenges of the new datasets are “user independent” and “large-scale” video-based gesture recognition.
Figure 1: Sample images taken from the depth video for a set of the considered gestures.
The competitions will be collocated with our ICCV 2017 workshop on action and gesture recognition. In this regard, the role of the competition is critical, as it provides a testbed for proposing new computer vision methodologies for the recognition of human actions and gestures.
Background & elements of interest
Owing to the limited number of training samples in previously released gesture datasets, it is hard to apply them to real applications. Therefore, we have built two large-scale gesture datasets: the ChaLearn LAP RGB-D Isolated/Continuous Gesture Datasets (IsoGD, ConGD). The focus of the challenges is “large-scale” and “user independent” learning, which means that each gesture class has more than 200 RGB and depth videos, and that training samples from the same person do not appear in the validation and testing sets. Both IsoGD and ConGD contain 249 gesture labels, and the start and end frames of each gesture in the continuous videos from the CGD dataset were obtained by manually labeled temporal segmentation. Both RGB-D datasets are the largest of their kind in the current state of the art. We believe that the challenges will stimulate research on large-scale multi-modal gesture recognition.
We have run only one challenge on this data so far, at ICPR 2016. In that previous edition, several participants joined the competition, providing different deep learning solutions for both datasets. Still, the best performance was around 60%, leaving substantial room for improvement and motivating this second challenge round. In this new stage of both challenges (tentatively running between May 2017 and August 2017), we will run a competition on these two large datasets, the largest in the state of the art for gesture recognition in terms of number of categories, including thousands of samples. In addition, we will include different evaluation metrics in order to better analyze the performance of the methods and rank participants in both competitions.
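As an illustration of the kind of segment-level evaluation typically used for continuous gesture recognition, the sketch below computes a frame-level Jaccard (intersection-over-union) score between predicted and ground-truth gesture segments of one video. The segment format and the function name are assumptions made for this sketch; it is not the official challenge scoring code.

```python
import numpy as np

def jaccard_score(gt_segments, pred_segments, num_frames):
    """Frame-level Jaccard index between ground-truth and predicted
    gesture segments of one video.

    Each segment is a (start_frame, end_frame, label) tuple, 1-indexed
    and inclusive -- an assumed format for this sketch.
    """
    labels = {label for _, _, label in gt_segments}
    score = 0.0
    for label in labels:
        gt_mask = np.zeros(num_frames, dtype=bool)
        pred_mask = np.zeros(num_frames, dtype=bool)
        for start, end, l in gt_segments:
            if l == label:
                gt_mask[start - 1:end] = True
        for start, end, l in pred_segments:
            if l == label:
                pred_mask[start - 1:end] = True
        union = np.logical_or(gt_mask, pred_mask).sum()
        if union > 0:
            score += np.logical_and(gt_mask, pred_mask).sum() / union
    return score / max(len(labels), 1)

# Example: one ground-truth gesture (frames 10-50, class 3) and a
# slightly shifted prediction of the same class.
print(jaccard_score([(10, 50, 3)], [(15, 55, 3)], num_frames=100))
```

A dataset-level score could then be obtained by averaging such per-video scores over all test sequences.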
Track information
The ConGD dataset includes 47933 RGB-D gestures in 22535 RGB-D gesture videos (about 4 GB). Each RGB-D video may represent one or more gestures, and there are 249 gesture labels performed by 21 different individuals. The database has been split into three mutually exclusive subsets:
Sets       | # of Labels | # of Gestures | # of RGB Videos | # of Depth Videos | # of Performers | Label Provided | Temporal Segmentation Provided
-----------|-------------|---------------|-----------------|-------------------|-----------------|----------------|-------------------------------
Training   | 249         | 30442         | 14314           | 14314             | 17              | Yes            | Yes
Validation | 249         | 8889          | 4179            | 4179              | 2               | No             | Yes
Testing    | 249         | 8602          | 4042            | 4042              | 2               | No             | No
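As a quick sanity check on the split statistics above, a small script along the following lines could summarize an annotation list file. The line format shown in the docstring is an assumption made for illustration; consult the dataset documentation for the actual format.

```python
from collections import Counter

def summarize_split(list_file):
    """Summarize a continuous gesture annotation list file.

    Assumes a hypothetical line format in which each line names an RGB
    video, a depth video, and a sequence of label:start-end annotations,
    e.g.:
        train/001/00001.M.avi train/001/00001.K.avi 26:1-49 126:50-120
    The actual ConGD list format may differ; adjust the parsing if so.
    """
    num_videos = 0
    labels = Counter()
    with open(list_file) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip empty or malformed lines
            num_videos += 1
            for ann in parts[2:]:
                label = int(ann.split(":")[0])
                labels[label] += 1
    total = sum(labels.values())
    print(f"videos: {num_videos}, gestures: {total}, classes: {len(labels)}")

# Usage (hypothetical file name):
# summarize_split("ConGD_train_list.txt")
```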
Main Task:
- Gesture spotting and recognition from continuous RGB and depth videos (see the baseline sketch after this list).
- Large-scale learning.
- User independent: the users in the training set do not appear in the test and validation sets.
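A minimal spot-then-classify baseline for this task, assuming frames are available as NumPy arrays and an isolated-gesture classifier with a hypothetical predict method, might look like the following sketch: split the continuous video at low-motion frames, then classify each candidate segment.

```python
import numpy as np

def segment_by_motion(frames, threshold=2.0, min_len=8):
    """Split a continuous video into candidate gesture segments by
    treating near-static frames as boundaries.  The threshold and
    minimum length are illustrative values, not tuned for ConGD."""
    diffs = [np.abs(frames[i].astype(float) - frames[i - 1]).mean()
             for i in range(1, len(frames))]
    segments, start = [], 0
    for i, d in enumerate(diffs, start=1):
        if d < threshold:          # low motion => possible gesture boundary
            if i - start >= min_len:
                segments.append((start, i))
            start = i
    if len(frames) - start >= min_len:
        segments.append((start, len(frames)))
    return segments

def recognize(frames, classifier):
    """Spot-then-classify baseline: segment first, then label each
    segment with an isolated-gesture classifier (the classifier.predict
    interface is a hypothetical placeholder)."""
    return [(start, end, classifier.predict(frames[start:end]))
            for start, end in segment_by_motion(frames)]
```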
Participants can register and participate in this track through https://competitions.codalab.org/competitions/16499.