2017 Looking at People ICCV Challenge - Large-scale continuous gesture recognition (round 2)
Track description
Overview
Two large multi-modal video datasets, the ChaLearn LAP RGB-D Isolated Gesture Dataset (IsoGD) and the Continuous Gesture Dataset (ConGD), have been designed for gesture recognition, see Figure 1. Both datasets include 47933 gestures from 249 gesture classes, making them the largest RGB-D gesture datasets in terms of the number of gestures and classes. ConGD is designed for gesture spotting and recognition in continuous sequences of gestures, while IsoGD is designed for gesture classification from segmented data. The main challenges of the new datasets are “user independent” and “large-scale” video-based gesture recognition.
Figure 1: Sample images taken from the depth video for a set of the considered gestures.
The competitions will be collocated with our ICCV 2017 workshop on action and gesture recognition. In this regard, the role of the competition is critical, as it provides a testbed for proposing new computer vision methodologies for the recognition of human actions and gestures.
Background & elements of interest
Owing to the limited number of training samples in previously released gesture datasets, it is hard to apply them to real applications. Therefore, we have built two large-scale gesture datasets: the ChaLearn LAP RGB-D Isolated/Continuous Gesture Datasets (IsoGD, ConGD). The focus of the challenges is “large-scale” and “user independent” learning, which means that each gesture class has more than 200 RGB and depth videos, and that training samples from the same person do not appear in the validation and testing sets. Both IsoGD and ConGD contain 249 gesture labels, and the start and end frames of each gesture in the continuous videos from the CGD dataset were obtained by manually labeled temporal segmentation. Both RGB-D datasets are the largest of their kind in the current state of the art. We believe that the challenges will stimulate research on large-scale multi-modal gesture recognition.
We have run only one challenge on this data so far, at ICPR 2016. In that previous edition, several participants joined the competition, providing different deep learning solutions for both datasets. Still, the best performance was around 60%, leaving substantial room for improvement and motivating this second challenge round. In this new stage of both challenges (tentatively running between May 2017 and August 2017), we will run a competition on these two large datasets, the largest in the state of the art for gesture recognition in terms of number of categories, including thousands of samples. In addition, we will include different evaluation metrics in order to better analyze the performance of the methods and rank participants in both competitions.
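As an illustration of the kind of segment-level evaluation typically used for continuous gesture recognition, the sketch below computes a frame-level Jaccard (intersection-over-union) score between predicted and ground-truth gesture segments of one video. The segment format and the function name are assumptions made for this sketch; it is not the official challenge scoring code.

```python
import numpy as np

def jaccard_score(gt_segments, pred_segments, num_frames):
    """Frame-level Jaccard index between ground-truth and predicted
    gesture segments of one video.

    Each segment is a (start_frame, end_frame, label) tuple, 1-indexed
    and inclusive -- an assumed format for this sketch.
    """
    labels = {label for _, _, label in gt_segments}
    score = 0.0
    for label in labels:
        gt_mask = np.zeros(num_frames, dtype=bool)
        pred_mask = np.zeros(num_frames, dtype=bool)
        for start, end, l in gt_segments:
            if l == label:
                gt_mask[start - 1:end] = True
        for start, end, l in pred_segments:
            if l == label:
                pred_mask[start - 1:end] = True
        union = np.logical_or(gt_mask, pred_mask).sum()
        if union > 0:
            score += np.logical_and(gt_mask, pred_mask).sum() / union
    return score / max(len(labels), 1)

# Example: one ground-truth gesture (frames 10-50, class 3) and a
# slightly shifted prediction of the same class.
print(jaccard_score([(10, 50, 3)], [(15, 55, 3)], num_frames=100))
```

A dataset-level score could then be obtained by averaging such per-video scores over all test sequences.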
Track information
The ConGD dataset includes 47933 RGB-D gestures in 22535 RGB-D gesture videos (about 4 GB). Each RGB-D video may represent one or more gestures, and there are 249 gesture labels performed by 21 different individuals. The database has been split into three mutually exclusive subsets:
Sets       | # of Labels | # of Gestures | # of RGB Videos | # of Depth Videos | # of Performers | Label Provided | Temporal Segmentation Provided
-----------|-------------|---------------|-----------------|-------------------|-----------------|----------------|-------------------------------
Training   | 249         | 30442         | 14314           | 14314             | 17              | Yes            | Yes
Validation | 249         | 8889          | 4179            | 4179              | 2               | No             | Yes
Testing    | 249         | 8602          | 4042            | 4042              | 2               | No             | No
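As a quick sanity check on the split statistics above, a small script along the following lines could summarize an annotation list file. The line format shown in the docstring is an assumption made for illustration; consult the dataset documentation for the actual format.

```python
from collections import Counter

def summarize_split(list_file):
    """Summarize a continuous gesture annotation list file.

    Assumes a hypothetical line format in which each line names an RGB
    video, a depth video, and a sequence of label:start-end annotations,
    e.g.:
        train/001/00001.M.avi train/001/00001.K.avi 26:1-49 126:50-120
    The actual ConGD list format may differ; adjust the parsing if so.
    """
    num_videos = 0
    labels = Counter()
    with open(list_file) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip empty or malformed lines
            num_videos += 1
            for ann in parts[2:]:
                label = int(ann.split(":")[0])
                labels[label] += 1
    total = sum(labels.values())
    print(f"videos: {num_videos}, gestures: {total}, classes: {len(labels)}")

# Usage (hypothetical file name):
# summarize_split("ConGD_train_list.txt")
```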
Main Task:
- Gesture spotting and recognition from continuous RGB and depth videos (see the baseline sketch after this list).
- Large-scale learning.
- User independent: the users in the training set do not appear in the test and validation sets.
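A minimal spot-then-classify baseline for this task, assuming frames are available as NumPy arrays and an isolated-gesture classifier with a hypothetical predict method, might look like the following sketch: split the continuous video at low-motion frames, then classify each candidate segment.

```python
import numpy as np

def segment_by_motion(frames, threshold=2.0, min_len=8):
    """Split a continuous video into candidate gesture segments by
    treating near-static frames as boundaries.  The threshold and
    minimum length are illustrative values, not tuned for ConGD."""
    diffs = [np.abs(frames[i].astype(float) - frames[i - 1]).mean()
             for i in range(1, len(frames))]
    segments, start = [], 0
    for i, d in enumerate(diffs, start=1):
        if d < threshold:          # low motion => possible gesture boundary
            if i - start >= min_len:
                segments.append((start, i))
            start = i
    if len(frames) - start >= min_len:
        segments.append((start, len(frames)))
    return segments

def recognize(frames, classifier):
    """Spot-then-classify baseline: segment first, then label each
    segment with an isolated-gesture classifier (the classifier.predict
    interface is a hypothetical placeholder)."""
    return [(start, end, classifier.predict(frames[start:end]))
            for start, end in segment_by_motion(frames)]
```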
Participants can register and participate in this track through https://competitions.codalab.org/competitions/16499.