Data-Centric Land Cover Classification Challenge
Curating, labeling, and working with large datasets requires a significant investment of energy, time, and money. Focusing on a core set of the most valuable samples, tailored to the task at hand, can effectively reduce these costs. Data-centric approaches, which prioritize data value and quality over quantity, aim to achieve exactly this. This relatively recent trend shifts the focus from improving machine learning models (model-centric machine learning) to optimizing the data used to train them, including how this data is presented during the learning process.
Challenge Summary
The data-centric land cover classification challenge, part of the Workshop on Machine Vision for Earth Observation and Environment Monitoring (MVEO) at the British Machine Vision Conference (BMVC) 2023, calls for novel data-centric approaches to selecting a core set of training samples for a semantic segmentation task.
Participants of this challenge are asked to develop a ranking strategy that assigns a score to each sample in a pool of candidate examples according to its importance for training. The resulting scores/ranking will then be used to select a core set of training samples for a pre-defined classifier, in this case a U-Net. Success is measured by training this classifier multiple times on training sets of different sizes selected according to the submitted ranking/scores (for example, on the top 1000 and the top 500 samples) and computing, for each trained model, the average Jaccard index on an undisclosed test set.
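For reference only, the sketch below shows how a class-averaged Jaccard index (mean IoU) could be computed for a pair of predicted and reference label maps. The function name and the NumPy-based implementation are illustrative assumptions and not the official evaluation code.

```python
import numpy as np

def mean_jaccard_index(pred, target, num_classes, ignore_index=None):
    """Average Jaccard index (mean IoU) over classes.

    `pred` and `target` are integer label maps of identical shape.
    Classes absent from both maps are skipped.
    """
    if ignore_index is not None:
        valid = target != ignore_index
        pred, target = pred[valid], target[valid]

    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:  # class does not appear in either map
            continue
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else 0.0
```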
Dataset
The challenge uses the Data Fusion Contest 2022 dataset. For this challenge, 90% of the training images were divided into patches of 256×256 pixels, which together form the pool of candidate training samples. The remaining 10% of the training images form the validation set. Finally, the evaluation will be carried out on the undisclosed test set. The dataset can be openly downloaded at ieee-dataport.org/competitions/data-fusion-contest-2022-dfc2022.
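For illustration, dividing a large image tile into 256×256 patches could look like the sketch below. The non-overlapping grid and the dropping of border regions smaller than one patch are assumptions made for this example and do not necessarily reflect how the official candidate pool was generated.

```python
def tile_into_patches(image, patch_size=256):
    """Yield (x, y, patch) tuples of non-overlapping patch_size x patch_size crops.

    `image` is an array shaped (height, width, channels); in this sketch,
    border regions smaller than patch_size are simply dropped.
    """
    height, width = image.shape[:2]
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            yield x, y, image[y:y + patch_size, x:x + patch_size]
```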
Resources (including pool)
Participants can find supporting resources here. In this repository, they will find:
- starter code to train the U-Net model using the pool of candidate samples
- the pool of candidate training samples (file train_coordinate_list.txt)
- the list of validation images (file val_image_list.txt)
Submission
First, participants should download the pool of candidate samples. Each row of this file represents one candidate training sample and consists of 5 columns:
< name of the city > < name of the image > < patch x coordinate > < patch y coordinate > < sample score (float from 0.0 to 1.0) >
Currently, all samples have the same score/importance (i.e., 1.0). As mentioned above, the main objective for participants is to develop a ranking system that assigns different scores, from 0.0 (low priority) to 1.0 (high priority), to the candidate samples depending on their importance to the training procedure.
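As a starting point, a re-scoring step might look like the sketch below. It assumes the candidate list is whitespace-separated, as suggested by the column description above; the random score is only a placeholder for a real data-centric importance estimate, and the function and output file names are illustrative.

```python
import random

def rescore_candidates(in_path="train_coordinate_list.txt",
                       out_path="submission.txt"):
    """Read the candidate list, assign a new score to each sample,
    and write the file back in the same five-column layout."""
    with open(in_path) as f_in, open(out_path, "w") as f_out:
        for line in f_in:
            parts = line.split()
            if len(parts) != 5:
                continue  # skip malformed lines
            city, image_name, patch_x, patch_y, _ = parts
            score = random.random()  # placeholder importance in [0.0, 1.0]
            f_out.write(f"{city} {image_name} {patch_x} {patch_y} {score:.4f}\n")
```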
The final submission must be a single file containing all candidate samples with the same 5 columns, but with the score varied according to the importance of each example. This file must be sent to data-centric-challenge-mveo@googlegroups.com with the title [MVEO Challenge Submission TEAMNAME], where TEAMNAME is the name of your team on the leaderboard (below).
Leaderboard
Submissions will be evaluated by training multiple U-Net models on training sets of different sizes (specifically 1%, 10%, and 25% of the pool) selected according to the submitted ranking/scores, and calculating the average Jaccard index on an undisclosed test set for all trained models. Note that for each training set size (1%, 10%, and 25%), 2 runs will be performed for stability and their average will be used. The final result of each submission will then be added to the leaderboard below, which is updated according to the timeline and deadlines further down the page. Please note that updates happen only at certain times, i.e. they are not triggered by a new submission, so participants will not see their newest results right away.
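For intuition only, selecting the top fraction of a scored candidate pool can be expressed as in the sketch below; this illustrates the selection principle and is not the organizers' evaluation code.

```python
def select_top_fraction(scored_samples, fraction):
    """Return the highest-scored subset of the candidate pool.

    `scored_samples` is a list of (sample, score) pairs and `fraction`
    is e.g. 0.01, 0.10, or 0.25.
    """
    ranked = sorted(scored_samples, key=lambda item: item[1], reverse=True)
    k = max(1, int(round(len(ranked) * fraction)))
    return [sample for sample, _ in ranked[:k]]
```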
Team Name | Submission date | mIoU | 1% IoU | 10% IoU | 25% IoU |
---|---|---|---|---|---|
Baseline (100%) | | 0.1222 | | | |
AI4GG | 29 October | 0.1113 | 0.0882 | 0.1233 | 0.1223 |
AI4GG | 7 November | 0.1082 | 0.0953 | 0.1123 | 0.1171 |
AI4GG | 2 October | 0.1056 | 0.0921 | 0.1125 | 0.1124 |
CodisLab_Cardiff | 5 November | 0.1032 | 0.0780 | 0.1178 | 0.1138 |
CodisLab_Cardiff | 19 October | 0.1021 | 0.0802 | 0.1099 | 0.1163 |
Baseline (Random 1%, 10%, and 25%) | | 0.1016 | 0.0952 | 0.0998 | 0.1098 |
CodisLab_Cardiff | 27 October | 0.1000 | 0.0835 | 0.1076 | 0.1088 |
NIRA | 6 November | 0.0932 | 0.0909 | 0.0901 | 0.0986 |
AI4GG | 13 October | 0.0856 | 0.0435 | 0.1019 | 0.1113 |
CodisLab_Cardiff | 15 October | 0.0745 | 0.0530 | 0.0723 | 0.0981 |
Timeline
Participants may submit multiple, intermediate solutions, which will be evaluated following the timetable below. All submitted solutions will be listed on the leaderboard. Once a solution has been submitted, assessed, and listed on the leaderboard above, it cannot be removed.
Subject | Date (all deadlines 23:59 UK Time) |
---|---|
Intermediate Deadline 1 (to submit intermediate ranking files) | Sunday, 1 October 2023 |
Intermediate Deadline 2 (to submit intermediate ranking files) | Sunday, 15 October 2023 |
Intermediate Deadline 3 (to submit intermediate ranking files) | Sunday, 29 October 2023 |
Final Submission Deadline (to submit the final ranking file) | Monday, 6 November 2023 |
Publication of Final Results | Friday, 17 November 2023 |
Workshop | Friday, 24 November 2023 |
Results, Presentation, Awards, and Prizes
The final results of this challenge will be presented during the Workshop. The authors of the first- to third-ranked teams will be invited to present their approaches at the Workshop in Aberdeen, UK, on 24 November 2023. These authors will also be invited to co-author a journal paper summarizing the outcome of this challenge, which will be submitted with open access to IEEE JSTARS.
Organizers
Keiller Nogueira, University of Stirling, UK
Ribana Roscher, Research Center Jülich and University of Bonn, Germany
Ronny Hänsch, German Aerospace Center (DLR), Germany
Terms and Conditions
Participants of this challenge acknowledge that they have read and agree to the Dataset Terms and Conditions described here: https://www.grss-ieee.org/community/technical-committees/2022-ieee-grss-data-fusion-contest/.