M2 internship – Active learning and object detection in multimodal aerial images

01/04/2024 all-day

Offer related to the Action/Network: – — –/PhD students

Laboratory/Company: IRISA/UBS
Duration: 6 months
Contact:
Publication deadline: 2024-04-01

Context:
Detailed topic at:

This internship is motivated by issues raised in studies that rely on data collected by
airborne imagery. Automating the processing of such data with object detection methods and
supervised learning requires annotated databases. The annotation step is therefore a task of
great interest in both machine learning (ML) and computer vision (CV). Carrying it out manually
is tedious and costly in terms of time and human resources. Furthermore, in the case of
multimodal images (i.e. acquired by several sensors), annotation must be performed for each
modality.
Active Learning (AL) is related to semi-supervised machine learning: at each iteration, the
learning algorithm can interact with the user to obtain the labels of new data during the
training step. It is motivated by situations in which unlabeled data are easy to collect but
their labels are costly to obtain manually (time, money, tedious task). It stems from the idea
that we should only acquire labels that actually improve the ability of the model to make
accurate predictions. Instances that are more useful than others according to some performance
measure have to be identified to build an optimal training dataset: with well-chosen,
representative instances, fewer labels are needed to reach a performance similar to that
obtained by labeling and using all available data. This selection process has been investigated
as selective sampling [9]. The importance of an instance is related to how much information it
carries and how uncertain the trained model is about it. The selection process therefore involves
a trade-off between informativeness (ability to reduce the uncertainty of a statistical model)
and representativeness (ability to represent the whole input data space) [6].
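
As an illustration of the selection step described above, here is a minimal sketch of
uncertainty sampling in Python. It assumes the current model already provides class
probabilities for each unlabeled instance; the function names and the toy pool are
hypothetical. It covers only the informativeness side of the trade-off, since
representativeness would need an additional diversity criterion.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs, budget):
    """Return the indices of the `budget` pool instances with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Toy pool (hypothetical values): one predicted distribution per unlabeled instance.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.40, 0.35, 0.25],  # fairly uncertain
    [0.70, 0.20, 0.10],  # moderately confident
    [0.34, 0.33, 0.33],  # almost uniform, highest entropy
]

# Instances to send to the annotator at this iteration.
queried = select_most_uncertain(pool_probs, budget=2)
print(queried)
```

In a full AL loop, the queried instances would be labeled, added to the training set, and
the model retrained before the next selection round.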
In remote sensing, AL has therefore become an important approach to collect informative
data for object detection and supervised classification tasks, and to assist the annotation process.
The effectiveness of object detection models is closely tied to the quantity of annotated data
at their disposal. To reduce this dependence, AL aims to define a strategy for selecting the
most relevant data for an annotator to label, as explained by Choi et al. [5]. This typically
involves a scoring mechanism related to the model’s uncertainty about the data. Computing
these uncertainties usually requires a multi-model (ensemble) approach, and such ensemble
techniques are resource-intensive. Hence, the overarching objective of AL is to formulate a
classification function that faithfully reflects the data’s contribution to the learning process.
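
To make the multi-model idea concrete, the sketch below scores one sample by the
disagreement between ensemble members, measured as the entropy of the averaged prediction
minus the average entropy of the members. This particular measure and the toy predictions
are assumptions chosen for illustration; the posting does not specify which ensemble score
is used.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ensemble_disagreement(member_probs):
    """Entropy of the mean prediction minus the mean entropy of the members.

    The value is close to 0 when all members agree and grows when they
    contradict each other.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [
        sum(member[c] for member in member_probs) / n_members
        for c in range(n_classes)
    ]
    mean_entropy = sum(entropy(member) for member in member_probs) / n_members
    return entropy(mean_probs) - mean_entropy


# Hypothetical predictions of three models for the same detection.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

print(ensemble_disagreement(agree))
print(ensemble_disagreement(disagree))
```

Each ensemble member requires its own training and its own forward pass over the pool,
which is why such scores are costly.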

Subject:
Brust et al. [3] introduce an approach to object detection with deep learning that
incorporates AL strategies to explore unlabeled data. The authors propose and compare various
AL metrics that are suitable for most object detectors and take class imbalance into account.
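
The exact metrics of [3] are not reproduced in this posting. The sketch below only
illustrates the general idea under stated assumptions: a margin-based uncertainty per
detection, an aggregation of detection scores into one score per image (sum, average or
maximum), and a hypothetical inverse-frequency weight to account for class imbalance. All
function names and values are illustrative.

```python
def margin_score(probs):
    """Margin uncertainty of one detection: near 1 when the two best classes tie."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return (1.0 - (top1 - top2)) ** 2


def class_weights(class_counts):
    """Hypothetical inverse-frequency weights: rare classes get larger weights."""
    total = sum(class_counts)
    n_classes = len(class_counts)
    return [(total + n_classes) / (count + 1) for count in class_counts]


def image_score(detections, weights, mode="sum"):
    """Aggregate the weighted detection scores of one image into a single value."""
    scores = []
    for probs in detections:
        predicted = max(range(len(probs)), key=probs.__getitem__)
        scores.append(weights[predicted] * margin_score(probs))
    if not scores:
        return 0.0  # no detection: nothing to be uncertain about
    if mode == "sum":
        return sum(scores)
    if mode == "avg":
        return sum(scores) / len(scores)
    if mode == "max":
        return max(scores)
    raise ValueError(f"unknown aggregation mode: {mode}")


# Toy image with two detections over two classes (hypothetical values).
detections = [[0.5, 0.5], [0.9, 0.1]]
weights = class_weights([90, 10])  # class 1 is rare in the labeled set

for mode in ("sum", "avg", "max"):
    print(mode, image_score(detections, weights, mode))
```

The sum favors images with many uncertain detections, the maximum reacts to a single hard
object, and the average normalizes for the number of detections.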
To start this project, the first step involves evaluating the performance of a multimodal
object detector (like YOLOrs [10], SuperYOLO [13], YOLOFusion [7] …) with respect to these
metrics by applying them to a single modality (RGB for example). This evaluation will be
carried out under different settings, including various sizes of the initial dataset and different
adjustments of algorithm parameters. The aim is then to extend the AL strategy to the case
of multimodal images. Indeed, for a given object, the modalities do not all contribute equally
to the classification/localization tasks: one can be more informative than another.
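
How to extend the AL strategy to several modalities is the open question of the internship.
As one hypothetical starting point, the sketch below fuses per-modality uncertainty scores
of an image with a weighted average. The modality names, scores and weights are
assumptions, not part of the original proposal.

```python
def fuse_modality_scores(scores, weights=None):
    """Weighted average of the per-modality uncertainty scores of one image."""
    if weights is None:
        weights = {name: 1.0 for name in scores}  # uniform weighting by default
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(weights[name] * value for name, value in scores.items())
    return weighted_sum / total_weight


# Hypothetical uncertainty scores obtained separately on each modality.
scores = {"rgb": 0.2, "ir": 0.8}

uniform = fuse_modality_scores(scores)
ir_heavy = fuse_modality_scores(scores, {"rgb": 1.0, "ir": 3.0})
print(uniform, ir_heavy)
```

Giving a larger weight to one modality reflects the observation that, for some objects,
that modality is more informative than the other.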
Finally, the metrics proposed by Brust et al. [3] focus on classification uncertainty and
overlook localization. To estimate localization uncertainty, we can use a strategy like that
of the Gaussian YOLO approach [4, 5], which provides both classification and localization
uncertainties that can then be used with the metrics of Brust et al.
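
A minimal sketch of how the two kinds of uncertainty could be combined is given below. It
assumes a detector that, in the spirit of Gaussian YOLO, outputs one uncertainty value in
[0, 1] per box coordinate. The convex combination and its `alpha` parameter are
hypothetical choices made for illustration.

```python
def localization_uncertainty(coordinate_uncertainties):
    """Average the per-coordinate uncertainties (x, y, w, h) of one box."""
    return sum(coordinate_uncertainties) / len(coordinate_uncertainties)


def combined_uncertainty(cls_uncertainty, coordinate_uncertainties, alpha=0.5):
    """Convex combination of classification and localization uncertainty.

    `alpha` is a hypothetical trade-off: 1.0 keeps only the classification
    term, 0.0 keeps only the localization term.
    """
    loc_uncertainty = localization_uncertainty(coordinate_uncertainties)
    return alpha * cls_uncertainty + (1.0 - alpha) * loc_uncertainty


# Hypothetical detection: confident class, poorly localized width and height.
cls_uncertainty = 0.04
box_uncertainties = [0.1, 0.1, 0.3, 0.3]

print(combined_uncertainty(cls_uncertainty, box_uncertainties))
```

The combined value can then replace the classification-only score of each detection before
aggregation at the image level.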

Candidate profile:
Student in computer science and/or machine learning and/or signal & image processing and/or applied statistics

Required education and skills:
Good programming skills in Python (PyTorch knowledge appreciated), knowledge of deep learning for image analysis, and a strong interest in investigating machine learning methods.

Work address:
IRISA, UBS, Campus de Tohannic, 56000 Vannes

Attached document: 202311201649_2024_IRISA-UBS_internship_Active learning and object detection.pdf