Optimal transport for novelty and out-of-distribution detection

When:
01/03/2024 – 02/03/2024 all-day

Offer related to an Action/Network: – — –/– — –

Laboratory/Company: IRISA / LITIS
Duration: 5 months
Contact: laetitia.chapel@irisa.fr
Publication deadline: 2024-03-01

Context:
For a decision-making system trained on data to be reliable, it must be able to adjust its decisions to differences between the distribution p_train(X_train, Y_train) of training samples and the distribution p_test(X_test, Y_test) of test samples. Under distribution shift, deep-learning-based approaches may be overconfident and tend to treat the given inputs as one of the previously seen situations, leading to mislabelling. This underscores the challenges of detecting out-of-distribution (OOD) samples, where a test point x_0 is marginally sampled from p_test(x_0) ≠ p_train(x_0), or of recognizing that the point x_0 belongs to an unseen class (involving, for instance, a new type of object in the scene). Additionally, given the multimodal nature of inputs and variations in sensor availability, samples may not be embedded into the same space, posing further challenges related to incomparable spaces. Our approach envisions employing optimal transport theory to develop algorithms for out-of-distribution detection, aiming for a robust optimal transport framework.

Optimal transport (OT) has become a potent tool for computing distances (a.k.a. Wasserstein or earth mover's distances) between data distributions, facilitated by new computational schemes that make transport computations tractable.
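As a minimal illustration of such a computation (the Gaussian toy data and sample sizes below are illustrative assumptions, not part of the offer), the POT toolbox can compute the exact OT cost between two empirical samples:

```python
# Minimal sketch: Wasserstein distance between two empirical samples with POT.
# The toy data and sample sizes are illustrative assumptions.
import numpy as np
import ot

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(100, 2))   # "training" samples
xt = rng.normal(1.0, 1.0, size=(120, 2))   # shifted "test" samples

a, b = ot.unif(len(xs)), ot.unif(len(xt))  # uniform weights on each sample
M = ot.dist(xs, xt)                        # squared Euclidean cost matrix
cost = ot.emd2(a, b, M)                    # exact OT cost (squared 2-Wasserstein)
print(cost)
```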

Subject:
The primary goal of the internship is to investigate the behavior of optimal transport (OT) in scenarios where distributions are tainted by outliers or out-of-distribution (OOD) samples, and to formulate a robust OT framework. Existing studies have utilized OT in such contexts, employing a straightforward rule that identifies points significantly distant from the other distribution as outliers. While approaches like the regularization path or OT profiles have been effective in selecting optimal regularization parameters, particularly using techniques like the elbow rule, they may fall short when dealing with points that are OOD but situated “between” the two distributions.
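One common instantiation of such a rule is sketched below, under the assumption that partial OT is used to leave outlier mass untransported; the mass fraction and threshold are illustrative choices, not values prescribed by the offer.

```python
# Sketch: flag candidate OOD points as those left (almost) unmatched by partial OT.
# The mass fraction `mass` and the threshold `tol` are illustrative assumptions.
import numpy as np
import ot
from ot.partial import partial_wasserstein

def flag_outliers_partial_ot(xs, xt, mass=0.9, tol=1e-8):
    a, b = ot.unif(len(xs)), ot.unif(len(xt))
    M = ot.dist(xs, xt)                       # squared Euclidean cost matrix
    G = partial_wasserstein(a, b, M, m=mass)  # transport only a fraction of the mass
    received = G.sum(axis=0)                  # mass received by each point of xt
    return received < tol                     # True for points left (nearly) unmatched
```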
Conversely, Monge-Kantorovich (MK) quantiles and ranks present an alternative. This method replaces the traditional “left-to-right” ordering of samples with a “center-outward” approach applicable in R^d.
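A rough sketch of the empirical construction is given below, assuming the usual spherical-uniform reference measure (the particular sampling scheme is an illustrative choice): the data are optimally matched to a reference sample on the unit ball, and the norm of each matched reference point serves as a rank.

```python
# Sketch: empirical center-outward (Monge-Kantorovich) ranks via discrete OT.
# The spherical-uniform reference sample is an illustrative choice of reference measure.
import numpy as np
import ot

def mk_ranks(X, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Reference sample: uniform direction times uniform radius on the unit ball.
    u = rng.standard_normal((n, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    ref = rng.uniform(size=(n, 1)) * u
    # Optimal matching between data and reference (a permutation for uniform weights).
    G = ot.emd(ot.unif(n), ot.unif(n), ot.dist(X, ref))
    matched = G.argmax(axis=1)
    # The norm of the matched reference point plays the role of a rank in [0, 1].
    return np.linalg.norm(ref[matched], axis=1)
```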

The internship’s specific objectives include: i) examining how the placement of outliers influences the OT solution, ii) developing a robust OT formulation with statistical guarantees, leveraging MK quantiles, and iii) implementing the solution in the POT toolbox.
Furthermore, the internship will explore the integration of a partial-OT-based loss into deep learning approaches as a means to evaluate the proposed methods. Ensuring scalability will be a crucial aspect of the method’s development. Additionally, investigations into adapting the approach to incomparable spaces will be undertaken.

Candidate profile:
Master’s student

Required education and skills:
Applicants are expected to hold a degree in applied mathematics/statistics and/or machine learning and to show an excellent academic profile. In addition, good programming skills are expected.

Work location:
Depending on the candidate:
– LITIS in Rouen
– IRISA in Rennes

Attached document: 202401180908_OT for OOD – madics.pdf