2015 Looking At People CVPR Challenge - Track 1: Action Recognition (second round)
Evaluation measure:
Recognition performance is evaluated using the Jaccard Index (overlap). For each of the 11 action categories labelled in each RGB sequence s, the Jaccard Index is defined as follows:

J_(s,n) = |A_(s,n) ∩ B_(s,n)| / |A_(s,n) ∪ B_(s,n)|,

where A_(s,n) is the ground truth of action n at sequence s, and B_(s,n) is the prediction for that action at sequence s. A_(s,n) and B_(s,n) are binary vectors whose 1-valued entries denote frames in which the n-th action is being performed; the intersection and union are taken element-wise, and |·| counts the 1-valued entries.
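As a minimal sketch of this definition (assuming NumPy binary vectors; this is not the official evaluation code), the per-action Jaccard Index is:

    import numpy as np

    def jaccard_index(A, B):
        # A, B: binary vectors of equal length; 1 marks frames where the
        # n-th action occurs in the ground truth (A) or the prediction (B)
        union = np.logical_or(A, B).sum()
        return np.logical_and(A, B).sum() / union if union else 0.0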
In the case of false positives (e.g., predicting an action that is not labelled in the ground truth), the Jaccard Index will automatically be 0 for that action prediction, and that action class will still count in the mean Jaccard Index computation. In other words, n ranges over the union of the action categories appearing in the ground truth and in the predictions.
Participants are evaluated on the mean Jaccard Index over all action categories and all sequences. Action categories are independent but not mutually exclusive (in a given frame more than one action class can be active), and all categories carry equal weight when computing the mean. The participant with the highest mean Jaccard Index will be the winner.
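A hedged sketch of these rules (the function name and data layout are assumptions, not the official scoring code): the mean for one sequence can be computed from dictionaries mapping action IDs to binary frame vectors, iterating over the union of categories so that false positives score 0.

    import numpy as np

    def mean_jaccard(gt, pred, num_frames):
        # gt, pred: dicts {action_id: binary vector of length num_frames}.
        # Iterating over the union of categories means a category predicted
        # without ground truth support (a false positive) contributes 0.
        empty = np.zeros(num_frames, dtype=bool)
        scores = []
        for n in set(gt) | set(pred):
            A = gt.get(n, empty)
            B = pred.get(n, empty)
            union = np.logical_or(A, B).sum()
            scores.append(np.logical_and(A, B).sum() / union if union else 0.0)
        return float(np.mean(scores))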
An example of the calculation for a single sequence and two action categories is shown next. As shown at the top of the figure, in the case of action/interaction spotting, the ground truth annotations of different action categories can overlap (appear at the same time within the sequence). Note also that if several actors appear within the sequence at the same time, actions are labelled over the corresponding (possibly overlapping) periods of time without identifying the actors in the scene; only the 11 action/interaction categories are annotated.
This example shows the mean Jaccard Index calculation for different instances of action categories in a sequence (single red lines denote ground truth annotations and double red lines denote predictions). The top part of the image shows the ground truth annotations for the actions walk and fight at sequence s. In the centre part, the prediction for walk is evaluated, obtaining a Jaccard Index of 0.72. In the bottom part, the same procedure is applied to the action fight, obtaining a Jaccard Index of 0.46. Finally, the mean Jaccard Index is computed, obtaining a value of 0.59.
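The figure's actual frame intervals are not reproduced here, but the arithmetic can be checked with hypothetical intervals chosen to yield the same values, reusing the jaccard_index sketch above:

    import numpy as np

    def frames(start, end, num_frames):
        # binary vector for a 1-indexed, inclusive frame interval
        v = np.zeros(num_frames, dtype=bool)
        v[start - 1:end] = True
        return v

    # Hypothetical intervals, chosen only to reproduce the figure's numbers.
    j_walk = jaccard_index(frames(1, 100, 100), frames(29, 100, 100))    # 0.72
    j_fight = jaccard_index(frames(1, 100, 100), frames(55, 100, 100))   # 0.46
    print((j_walk + j_fight) / 2)                                        # 0.59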
Participants must submit their predictions in the following format: for each sequence SeqXX.zip in the data folder, participants should create a SeqXX_prediction.csv file with one line per predicted action, [ActorID, ActionID, StartFrame, EndFrame] (the same format as SeqXX_labels.csv). The CSV file must NOT end with an empty line. The prediction CSV files for all sequences are then placed in a single ZIP file and submitted to Codalab.
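A hedged sketch of assembling such a submission (the sequence names and prediction values are made up for illustration; only the standard-library zipfile module is used):

    import zipfile

    # Hypothetical predictions per sequence:
    # one [ActorID, ActionID, StartFrame, EndFrame] row per action instance.
    predictions = {
        "Seq01": [[1, 1, 102, 203], [2, 5, 250, 325]],
        "Seq02": [[1, 3, 10, 95]],
    }

    with zipfile.ZipFile("submission.zip", "w") as zf:
        for seq, rows in predictions.items():
            # Join rows without a trailing newline so the file does not
            # end with an empty line, as the format requires.
            content = "\n".join(",".join(str(v) for v in row) for row in rows)
            zf.writestr(seq + "_prediction.csv", content)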
Evaluation scripts:
The file ChalearnLAPEvaluation.py provides several evaluation methods. The first important one exports the labels of a set of samples into a ground truth folder, which is later used to compute the final overlap value. Let's assume that you use sequences 1 to 3 for validation purposes and have a folder valSamples with the files Seq01.zip to Seq03.zip as downloaded from the training data set. We can create a ground truth folder gtData using:
>> from ChalearnLAPEvaluation import exportGT_Action
>> valSamples = "./valSamples"  # folder holding Seq01.zip to Seq03.zip
>> gtData = "./gtData"          # destination ground truth folder
>> exportGT_Action(valSamples, gtData)
This method exports the label files and data files for each sample in the valSamples folder to the gtData folder. This new ground truth folder will be used by the evaluation methods.
For each sample, we need to store the action predictions in a CSV file in the same format as the labels, that is, one line per action instance with the actorID, the actionID, the initial frame and the final frame. This file must be named SeqXX_prediction.csv. To make this easy, the ActionSample class can store this information for a given sample. Following the example from the last section, we can store the predictions for a sample using:
>> from ChalearnLAPSample import ActionSample
>> actionSample = ActionSample("SeqXX.zip")
Now suppose our predictions are that actor 1 performs action 1 from frame 102 to 203, and actor 2 then performs action 5 from frame 250 to 325. To store these predictions in a folder valPredict, we can use the following code:
>> valPredict = "./valPredict"  # destination folder for the prediction file
>> actionSample.exportPredictions([[1,1,102,203], [2,5,250,325]], valPredict)
Assuming the previously defined paths and objects, to evaluate the overlap for a single labelled sample prediction, that is, a prediction for a sample from a set where labels are provided, we can use:
>> overlap=actionSample.evaluate(gtData)
Finally, to obtain the final score over all the predictions, computed in the same way as on the Codalab platform, we use:
>> from ChalearnLAPEvaluation import evalAction
>> score = evalAction(valPredict, gtData)
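Putting the pieces together, here is a hedged end-to-end validation sketch, written as a standalone script rather than console commands. The folder names and the predict_actions helper are placeholders for your own pipeline, not part of the challenge code:

    import os
    from ChalearnLAPEvaluation import exportGT_Action, evalAction
    from ChalearnLAPSample import ActionSample

    valSamples = "./valSamples"   # Seq01.zip to Seq03.zip from the training set
    gtData = "./gtData"           # ground truth exported from the labels
    valPredict = "./valPredict"   # SeqXX_prediction.csv files end up here

    # 1. Export the ground truth once.
    exportGT_Action(valSamples, gtData)

    # 2. Predict and export each sequence. predict_actions is a placeholder
    #    returning [actorID, actionID, startFrame, endFrame] lists.
    for seq in ["Seq01.zip", "Seq02.zip", "Seq03.zip"]:
        sample = ActionSample(os.path.join(valSamples, seq))
        sample.exportPredictions(predict_actions(sample), valPredict)

    # 3. Score all predictions against the ground truth, as Codalab does.
    score = evalAction(valPredict, gtData)
    print(score)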