Unbalanced Optimal transport for novelty and out-of-distribution detection

When:

25/07/2021 – 26/07/2021 all-day

Laboratory/Company: LITIS laboratory (INSA ROUEN) and IRISA Vannes (Un
Duration: 3 years
Contact: laetitia.chapel@irisa.fr
Publication deadline: 2021-07-25

Context:
To be safe, a decision system learned from data needs a mechanism that adapts its decision depending on whether there is a discrepancy between the distribution $p_{train}(X_{train}, Y_{train})$ of the training samples and the distribution $p_{test}(X_{test}, Y_{test})$ of the test samples. Under distribution shift, deep learning approaches may be overconfident and tend to treat a new input as one of the previously seen situations, which leads to mislabelling. This raises two scientific challenges: detecting out-of-distribution (OOD) samples (the test point $x_0$ is drawn from a marginal $p_{test}(x_0) \neq p_{train}(x_0)$), and recognizing that a point $x_0$ belongs to an unseen class (a new type of object appears in the scene). Moreover, because of the multimodal nature of the inputs and the varying availability of sensors, the samples may not be embedded in the same space, which can compromise the detection task. We plan to leverage optimal transport theory to design algorithms for out-of-distribution detection, with specific applications to road scenes.
Optimal transport (OT) has emerged as a powerful tool for computing distances (also known as Wasserstein or earth mover's distances) between empirical distributions of data, thanks to new computational schemes that make the transport computation tractable. It has wide applications in computer vision, statistics and imaging, and has recently been introduced in the machine learning community to efficiently solve classification or transfer learning problems. The advantage of OT is that it can compare possibly high-dimensional empirical probability measures while taking into account the geometry of the underlying metric spaces, and that it handles discrete measures. The classical optimal transport problem seeks a transport plan that preserves the total mass between two probability distributions, which requires both to have the same mass. This may be too restrictive in applications such as color or shape matching, where the distributions may have arbitrary masses and/or only a fraction of the total mass needs to be transported. The same issue arises when the datasets $X_{train}$ and/or $X_{test}$ are contaminated by outliers that we may want to discard from the transport plan: this is the unbalanced [5] or partial OT problem. Several algorithms have been devised to solve it, among them methods that solve the exact partial problem when the total mass to be transported between the two empirical distributions is given as input. More recently, the team has developed an approach to the unbalanced problem that provides the first regularization path for unbalanced OT.
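To make the unbalanced idea concrete, here is a minimal sketch of entropic unbalanced OT in which the hard marginal constraints are replaced by Kullback-Leibler penalties and the problem is solved with generalized Sinkhorn scaling iterations. This is a standard textbook variant, not the team's regularization-path algorithm mentioned above. The function name `unbalanced_sinkhorn`, the toy 1-D data, and the values of `eps` (entropic regularization) and `rho` (marginal penalty strength) are illustrative assumptions. The source point placed at 3.0 plays the role of an outlier: because moving it is expensive, the penalized formulation chooses to transport almost none of its mass.

```python
import math


def unbalanced_sinkhorn(xs, xt, a, b, eps=0.05, rho=1.0, n_iter=2000):
    """Entropic unbalanced OT between two 1-D point clouds (sketch).

    Marginal constraints are replaced by KL penalties of strength `rho`
    (generalized Sinkhorn scaling). Returns the transport plan as a list of
    rows, one row per source point.
    """
    n, m = len(xs), len(xt)
    # Gibbs kernel of the squared Euclidean ground cost: K = exp(-C / eps)
    K = [[math.exp(-((x - y) ** 2) / eps) for y in xt] for x in xs]
    u, v = [1.0] * n, [1.0] * m
    # fi < 1 softens the marginals; rho -> infinity recovers balanced Sinkhorn
    fi = rho / (rho + eps)
    for _ in range(n_iter):
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [(a[i] / Kv[i]) ** fi for i in range(n)]
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [(b[j] / Ktu[j]) ** fi for j in range(m)]
    # Transport plan: pi_ij = u_i * K_ij * v_j
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical toy data: four clean source points plus one outlier at 3.0
xs = [0.0, 0.1, 0.2, 0.3, 3.0]
xt = [0.05, 0.15, 0.25, 0.35]
a = [1 / len(xs)] * len(xs)  # uniform source weights
b = [1 / len(xt)] * len(xt)  # uniform target weights

plan = unbalanced_sinkhorn(xs, xt, a, b)
row_mass = [sum(row) for row in plan]  # mass actually sent by each source point
print([round(mass, 4) for mass in row_mass])
```

With these assumed values, the outlier should send only a tiny fraction of its 0.2 units of mass, while the clean points keep roughly their nominal 0.2 each. Increasing `rho` pushes the solution back toward balanced OT, where the outlier would have to be transported in full.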

Subject:
The objective of the thesis is to study and implement OT-based strategies for dealing with OOD samples and with datasets contaminated by outliers.
In many cases, the number of such samples is unknown and must be estimated from the data. To do so, one can rely on two-sample tests and their Wasserstein counterparts [9]; when there is a shift between $p_{train}$ and $p_{test}$, or even when the two distributions do not lie in the same space, one can instead build on Gromov-Wasserstein-based tests.
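As an illustration of a Wasserstein two-sample test, the sketch below uses the 1-D Wasserstein-1 distance as the statistic of a permutation test. It assumes 1-D samples of equal size (so that W1 reduces to comparing sorted samples) and synthetic Gaussian data standing in for training features and a shifted test batch; the function names, the number of permutations and the seeds are hypothetical choices, not the procedure of [9].

```python
import random


def wasserstein_1d(x, y):
    """W1 distance between two 1-D empirical distributions of equal size.

    In 1-D the optimal plan matches sorted samples, so W1 is the mean
    absolute difference between order statistics.
    """
    assert len(x) == len(y), "this sketch assumes equal sample sizes"
    return sum(abs(p - q) for p, q in zip(sorted(x), sorted(y))) / len(x)


def permutation_test(x, y, n_perm=500, seed=0):
    """Two-sample permutation test with W1 as the test statistic.

    Returns (statistic, p_value). A small p-value suggests that the two
    samples were not drawn from the same distribution.
    """
    rng = random.Random(seed)
    observed = wasserstein_1d(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        # Under H0 (same distribution), every relabelling of the pooled data is equally likely
        rng.shuffle(pooled)
        if wasserstein_1d(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive
    return observed, (exceed + 1) / (n_perm + 1)


data_rng = random.Random(42)
train = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
shifted_test = [data_rng.gauss(1.5, 1.0) for _ in range(100)]  # hypothetical shifted batch
stat, p = permutation_test(train, shifted_test)
print(f"W1 = {stat:.3f}, p = {p:.4f}")
```

Sorting makes each statistic cost O(n log n), which keeps the permutation loop cheap; in higher dimensions or across incomparable spaces, the statistic would have to be replaced by a full OT or Gromov-Wasserstein computation.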

More precisely, the aim is to study how the partial/unbalanced formulation of OT can be used in OOD and outlier scenarios. Integrating two-sample tests into the OT formulation as a regularization term will be considered first. In this way, we aim to estimate from the data the proportion of contaminated samples in the datasets together with the optimal transport plan, in a unified formulation, even when the two distributions live in incomparable spaces. One can also rely on the regularization path to select the "best" regularization parameter in a given context. Integrating partial-OT-based losses into deep learning approaches will serve as a playground to evaluate the proposed methods. Scalability will be an important feature of the methods to be developed.
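One simple way to see how the amount of transported mass can reveal the contamination level is sketched below. It is a hypothetical heuristic, not the unified formulation targeted by the thesis: with unit masses on 1-D points, it computes the exact partial OT cost for every transported mass k (a crude grid over the parameter, loosely in the spirit of a regularization path) and keeps the largest k whose average cost stays below an assumed tolerance `tau`. With unit masses the problem is a min-cost flow, so an integral matching is optimal, and in 1-D with squared cost that matching can be taken order-preserving, which a dynamic program exploits. The data, `tau` and the function names are illustrative assumptions.

```python
import math


def partial_ot_1d(xs, xt, k):
    """Exact partial OT cost when k unit masses are moved between 1-D points.

    Each point carries mass 1 and exactly k source points are matched to k
    target points under squared Euclidean cost. An optimal matching can be
    taken order-preserving, so a dynamic program over sorted points is exact.
    """
    xs, xt = sorted(xs), sorted(xt)
    n, m = len(xs), len(xt)
    INF = math.inf
    # dp[i][j][c]: cheapest way to form c pairs using xs[:i] and xt[:j]
    dp = [[[0.0] + [INF] * k for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for c in range(1, k + 1):
                skip = min(dp[i - 1][j][c], dp[i][j - 1][c])
                pair = dp[i - 1][j - 1][c - 1] + (xs[i - 1] - xt[j - 1]) ** 2
                dp[i][j][c] = min(skip, pair)
    return dp[n][m][k]


def estimate_contamination(xs, xt, tau=0.1):
    """Hypothetical heuristic: keep the largest transported mass k whose
    average partial-OT cost stays below `tau`; the rest is flagged."""
    n = len(xs)
    costs = {k: partial_ot_1d(xs, xt, k) for k in range(1, min(n, len(xt)) + 1)}
    k_star = max((k for k, c in costs.items() if c / k <= tau), default=0)
    return 1 - k_star / n, costs


# Hypothetical data: 8 clean source points plus 2 outliers, 10 clean targets
xs = [0.1 * i for i in range(8)] + [5.0, 6.0]
xt = [0.1 * i + 0.05 for i in range(10)]
rate, costs = estimate_contamination(xs, xt)
print(f"estimated contamination: {rate:.2f}")
```

Here every clean point can be matched at a cost of about 0.0025, whereas any ninth pair must involve an outlier, so the sketch should report a contamination rate close to 0.2 (2 of the 10 source points). On real data the tolerance `tau` would itself need a principled choice, which is the kind of parameter selection the regularization path mentioned above is meant to support.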
From an application point of view, particular attention will be paid to OOD detection for road scenes. The proposed methods will be evaluated on real-world datasets of automotive images (such as nuScenes and KITTI) or on the autonomous driving benchmark https://github.com/OATML/oatomobile, in order to build robust systems for road scene analysis. The developed methods will be compared against current state-of-the-art approaches and their applications.

Candidate profile:
Applicants are expected to hold a degree in computer science and/or machine learning and/or signal & image processing and/or applied mathematics/statistics, and to have an excellent academic profile. Good programming skills are also expected.

Required education and skills:
computer science and/or machine learning and/or signal & image processing and/or applied mathematics/statistics

Work location:
Vannes or Rouen

Attached document: 202106301333_Unbalanced optimal transport for OOD.pdf

MaDICS

Masses de Données, Informations et Connaissances en Sciences

Big Data - Data Science
