Data-Centric Land Cover Classification Challenge
Curating, labeling, and working with large datasets requires a significant investment of energy, time, and money. Focusing on a core set of the most valuable samples, tailored to the task at hand, can effectively reduce these costs. Data-centric approaches, which prioritize data value and quality over quantity, aim to achieve exactly this. This relatively recent trend shifts the focus from improving machine learning models (model-centric machine learning) to optimizing the data used to train them, including how this data is presented during the learning process.
Challenge Summary
The data-centric land cover classification challenge, part of the Workshop on Machine Vision for Earth Observation and Environment Monitoring (MVEO) at the British Machine Vision Conference (BMVC) 2023, calls for novel data-centric approaches to selecting a core set of training samples for a semantic segmentation task.
Participants of this challenge are asked to develop a ranking strategy that assigns a score to each sample in a pool of candidate examples according to its importance for training. The resulting scores/ranking will then be used to select a core set of training samples for a pre-defined classifier, in this case a U-Net. Success is measured by training this classifier multiple times on training sets of different sizes selected according to the submitted ranking/scores (for example, on the top 1000 and the top 500 samples) and computing, for each trained model, the average Jaccard index on an undisclosed test set.
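For reference only, the sketch below shows how a class-averaged Jaccard index (mean IoU) could be computed for a pair of predicted and reference label maps. The function name and the NumPy-based implementation are illustrative assumptions and not the official evaluation code.

```python
import numpy as np

def mean_jaccard_index(pred, target, num_classes, ignore_index=None):
    """Average Jaccard index (mean IoU) over classes.

    `pred` and `target` are integer label maps of identical shape.
    Classes absent from both maps are skipped.
    """
    if ignore_index is not None:
        valid = target != ignore_index
        pred, target = pred[valid], target[valid]

    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:  # class does not appear in either map
            continue
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else 0.0
```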
Dataset
The challenge uses the Data Fusion Contest 2022 dataset. For this challenge, 90% of the training images were divided into patches of 256×256 pixels, which together form the pool of candidate training samples. The remaining 10% of the training images form the validation set. Finally, the evaluation will be carried out on the undisclosed test set. The dataset can be openly downloaded at ieee-dataport.org/competitions/data-fusion-contest-2022-dfc2022.
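For illustration, dividing a large image tile into 256×256 patches could look like the sketch below. The non-overlapping grid and the dropping of border regions smaller than one patch are assumptions made for this example and do not necessarily reflect how the official candidate pool was generated.

```python
def tile_into_patches(image, patch_size=256):
    """Yield (x, y, patch) tuples of non-overlapping patch_size x patch_size crops.

    `image` is an array shaped (height, width, channels); in this sketch,
    border regions smaller than patch_size are simply dropped.
    """
    height, width = image.shape[:2]
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            yield x, y, image[y:y + patch_size, x:x + patch_size]
```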
Resources (including pool)
Participants can find supporting resources here. In this repository, they will find:
- starter code to train the U-Net model using the pool of candidate samples
- the pool of candidate training samples (file train_coordinate_list.txt)
- the list of validation images (file val_image_list.txt)
Submission
First, participants should download the pool of candidate samples. Each row of this file represents one candidate training sample and consists of 5 columns:
< name of the city > < name of the image > < patch x coordinate > < patch y coordinate > < sample score (float from 0.0 to 1.0) >
Currently, all samples have the same score/importance (i.e., 1.0). As mentioned above, the main objective for participants is to develop a ranking system that assigns different scores, from 0.0 (low priority) to 1.0 (high priority), to the candidate samples depending on their importance to the training procedure.
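As a starting point, a re-scoring step might look like the sketch below. It assumes the candidate list is whitespace-separated, as suggested by the column description above; the random score is only a placeholder for a real data-centric importance estimate, and the function and output file names are illustrative.

```python
import random

def rescore_candidates(in_path="train_coordinate_list.txt",
                       out_path="submission.txt"):
    """Read the candidate list, assign a new score to each sample,
    and write the file back in the same five-column layout."""
    with open(in_path) as f_in, open(out_path, "w") as f_out:
        for line in f_in:
            parts = line.split()
            if len(parts) != 5:
                continue  # skip malformed lines
            city, image_name, patch_x, patch_y, _ = parts
            score = random.random()  # placeholder importance in [0.0, 1.0]
            f_out.write(f"{city} {image_name} {patch_x} {patch_y} {score:.4f}\n")
```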
The final submission must be a single file containing all candidate samples with the same 5 columns, but with the score varied according to the importance of each example. This file must be sent to data-centric-challenge-mveo@googlegroups.com with the title [MVEO Challenge Submission TEAMNAME], where TEAMNAME is the name of your team on the leaderboard (below).
Leaderboard
Submissions will be evaluated by training multiple U-Net models on training sets of different sizes (specifically 1%, 10%, and 25% of the pool) selected according to the submitted ranking/scores, and calculating the average Jaccard index on an undisclosed test set for all trained models. Note that for each training set size (1%, 10%, and 25%), 2 runs will be performed for stability and their average will be used. The final result of each submission will then be added to the leaderboard below, which is updated according to the timeline and deadlines further down the page. Please note that updates happen only at certain times, i.e. they are not triggered by a new submission, so participants will not see their newest results right away.
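For intuition only, selecting the top fraction of a scored candidate pool can be expressed as in the sketch below; this illustrates the selection principle and is not the organizers' evaluation code.

```python
def select_top_fraction(scored_samples, fraction):
    """Return the highest-scored subset of the candidate pool.

    `scored_samples` is a list of (sample, score) pairs and `fraction`
    is e.g. 0.01, 0.10, or 0.25.
    """
    ranked = sorted(scored_samples, key=lambda item: item[1], reverse=True)
    k = max(1, int(round(len(ranked) * fraction)))
    return [sample for sample, _ in ranked[:k]]
```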
Team Name | Submission date | mIoU | 1% IoU | 10% IoU | 25% IoU |
---|---|---|---|---|---|
Baseline (100%) | | 0.1222 | | | |
AI4GG | 29 October | 0.1113 | 0.0882 | 0.1233 | 0.1223 |
AI4GG | 7 November | 0.1082 | 0.0953 | 0.1123 | 0.1171 |
AI4GG | 2 October | 0.1056 | 0.0921 | 0.1125 | 0.1124 |
CodisLab_Cardiff | 5 November | 0.1032 | 0.0780 | 0.1178 | 0.1138 |
CodisLab_Cardiff | 19 October | 0.1021 | 0.0802 | 0.1099 | 0.1163 |
Baseline (Random 1%, 10%, and 25%) | | 0.1016 | 0.0952 | 0.0998 | 0.1098 |
CodisLab_Cardiff | 27 October | 0.1000 | 0.0835 | 0.1076 | 0.1088 |
NIRA | 6 November | 0.0932 | 0.0909 | 0.0901 | 0.0986 |
AI4GG | 13 October | 0.0856 | 0.0435 | 0.1019 | 0.1113 |
CodisLab_Cardiff | 15 October | 0.0745 | 0.0530 | 0.0723 | 0.0981 |
Timeline
Participants may submit multiple, intermediate solutions, which will be evaluated following the timetable below. All submitted solutions will be listed on the leaderboard. Once a solution has been submitted, assessed, and listed on the leaderboard above, it cannot be removed.
Subject | Date (all deadlines 23:59 UK Time) |
---|---|
Intermediate Deadline 1 (to submit intermediate ranking files) | Sunday, 1 October 2023 |
Intermediate Deadline 2 (to submit intermediate ranking files) | Sunday, 15 October 2023 |
Intermediate Deadline 3 (to submit intermediate ranking files) | Sunday, 29 October 2023 |
Final Submission Deadline (to submit the final ranking file) | Monday, 6 November 2023 |
Publication of Final Results | Friday, 17 November 2023 |
Workshop | Friday, 24 November 2023 |
Results, Presentation, Awards, and Prizes
The final results of this challenge will be presented during the Workshop. The authors of the first- to third-ranked teams will be invited to present their approaches at the Workshop in Aberdeen, UK, on 24 November 2023. These authors will also be invited to co-author a journal paper summarizing the outcome of this challenge, which will be submitted with open access to IEEE JSTARS.
Organizers
Keiller Nogueira, University of Stirling, UK
Ribana Roscher, Research Center Jülich and University of Bonn, Germany
Ronny Hänsch, German Aerospace Center (DLR), Germany
Terms and Conditions
Participants of this challenge acknowledge that they have read and agree to the Dataset Terms and Conditions described here: https://www.grss-ieee.org/community/technical-committees/2022-ieee-grss-data-fusion-contest/.