Label Shift Matching for Anomaly Detection and Classification in Time Series

When:

08/05/2023 – 09/05/2023 all-day

Related Action/Network: none specified

Laboratory/Company: LITIS Lab (Université & INSA Rouen Normandy)
Duration: 3 years
Contact: paul.honeine@univ-rouen.fr
Posting deadline: 2023-05-08

Context:
Keywords:
Deep learning, anomaly detection, unsupervised learning, optimal transport, domain adaptation, label shift, time series

Supervision Team:
The PhD candidate will be a member of the Machine Learning group at the LITIS Lab (University and INSA Rouen Normandy). She/he will be supervised by Fannia Pacheco, Paul Honeine, Maxime Berar, and Gilles Gasso.

Application:
Please send your CV and academic transcripts to fannia.pacheco@univ-rouen.fr and paul.honeine@univ-rouen.fr.
The deadline for application is the 8th of May 2023.

Subject:
Description

Deep learning relies on large datasets to learn decision functions for a specific task. At inference time, these decision functions are prone to inaccuracy on online data, which may be corrupted by anomalies or suffer from a distribution shift. In the most difficult setting, training data are labeled while test data are unlabeled. Under some mild assumptions, the two main families of distribution shift are covariate shift and label shift. The former is related to causal learning (predicting effects: the marginal p(x) changes while the conditional p(y|x) does not), and the latter to anti-causal learning (predicting causes: the marginal p(y) changes while the conditional p(x|y) does not). This thesis focuses on label shift for online data, since it emerges naturally in diagnosis tasks [1].
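Label shift can be estimated from the outputs of a fixed, black-box classifier, as in [1]. The following is a minimal NumPy sketch of black-box shift estimation (BBSE), assuming integer class labels, a held-out labeled source set, and predictions from any already trained classifier; the function name `bbse_weights` and the toy data are illustrative only. Because p(x|y) is shared across domains, the distribution mu of predicted labels on the target satisfies mu = C w, where C is the source joint confusion matrix and w(y) = q(y)/p(y) are the importance weights, so solving this linear system recovers w.

```python
import numpy as np


def bbse_weights(y_src, yhat_src, yhat_tgt, n_classes):
    """Estimate label-shift weights w[y] = q(y) / p(y) (BBSE-style sketch)."""
    # Joint confusion matrix on held-out labeled source data:
    # C[i, j] = P_source(prediction = i, true label = j).
    C = np.zeros((n_classes, n_classes))
    np.add.at(C, (yhat_src, y_src), 1.0)
    C /= len(y_src)
    # Distribution of predicted labels on unlabeled target data.
    mu = np.bincount(yhat_tgt, minlength=n_classes) / len(yhat_tgt)
    # Label shift implies mu = C @ w; solve by least squares.
    w, *_ = np.linalg.lstsq(C, mu, rcond=None)
    # Clipping negative weights is a practical safeguard added in this sketch.
    return np.clip(w, 0.0, None)


# Toy example: the classifier's per-class error rates are the same in both
# domains (p(x|y) unchanged), but class proportions move from 0.5/0.5 to 0.8/0.2.
y_src = np.array([0] * 50 + [1] * 50)
yhat_src = np.array([0] * 40 + [1] * 10 + [0] * 5 + [1] * 45)
yhat_tgt = np.array([0] * 66 + [1] * 34)

w = bbse_weights(y_src, yhat_src, yhat_tgt, n_classes=2)
p_src = np.bincount(y_src) / len(y_src)
q_tgt = w * p_src
print(w, q_tgt)  # w ≈ [1.6, 0.4], q_tgt ≈ [0.8, 0.2]
```

In this toy case the target predictions were built with the same per-class error rates as the source (80% and 90% correct), so the estimate recovers the target proportions (0.8, 0.2) from the source proportions (0.5, 0.5). With real data, C must be well conditioned for the solve to be reliable.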

The PhD student will take advantage of recent developments in domain adaptation, more specifically those based on Optimal Transport (OT), in order to address label shift in time series. Domain adaptation by OT [2, 3] consists of transporting the source domain feature space onto a space equivalent to the target domain space, and then learning a new feature space and decision function in which the source and target label distributions match. Two major difficulties will be addressed in this PhD thesis. First, we consider unsupervised domain adaptation, where target data are available but unlabeled. Because label shift is still present in unsupervised domain adaptation, one seeks the best matching between the source domain classes and the clusters formed in the target domain. To this end, one assumes that all the classes (although unlabeled) are present in the target domain [4, 5]. Second, we consider online domain adaptation, which consists of performing domain adaptation on the fly [6]. This means that the target domain is not available as a whole; instead, new batches of data arrive sequentially and are used to infer the adaptation, thus requiring online algorithms [7].
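Entropic OT solved with Sinkhorn iterations is one common way to compute the transport used in OT-based domain adaptation [2]. The sketch below assumes uniform sample weights, a squared Euclidean cost normalized to [0, 1], and an illustrative regularization `reg=0.1`; it computes the transport plan G and maps each source sample to the G-weighted barycenter of the target samples, after which a classifier could be trained on the mapped source samples with their original labels. The helper names are hypothetical, this is not the method to be developed in the thesis, and in practice a maintained library such as POT (`ot.da.SinkhornTransport`) would be preferable to this hand-written loop.

```python
import numpy as np


def sinkhorn(a, b, M, reg, n_iter=1000):
    """Entropic OT plan with marginals a (rows) and b (columns) for cost M."""
    K = np.exp(-M / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)             # rescale columns toward marginal b
        u = a / (K @ v)               # rescale rows toward marginal a
    return u[:, None] * K * v[None, :]


def ot_barycentric_map(Xs, Xt, reg=0.1):
    """Map source samples onto the target domain via an entropic OT plan."""
    a = np.full(len(Xs), 1.0 / len(Xs))   # uniform source weights (assumption)
    b = np.full(len(Xt), 1.0 / len(Xt))   # uniform target weights (assumption)
    M = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(axis=-1)
    M = M / M.max()                        # normalize cost to [0, 1]
    G = sinkhorn(a, b, M, reg)
    # Barycentric mapping: each source point goes to a G-weighted mean of targets.
    Xs_mapped = (G @ Xt) / G.sum(axis=1, keepdims=True)
    return Xs_mapped, G


rng = np.random.default_rng(0)
Xs = rng.normal(size=(30, 2))                          # source features
Xt = rng.normal(size=(40, 2)) + np.array([3.0, -1.0])  # shifted target features
Xs_mapped, G = ot_barycentric_map(Xs, Xt)
print(Xs_mapped.mean(axis=0), Xt.mean(axis=0))  # nearly identical means
```

With uniform weights, the mapped source cloud inherits the target mean, which the shifted Gaussian toy data illustrates. Plain Sinkhorn can underflow when `reg` is very small; log-domain variants avoid this.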

This PhD thesis aims to address the most challenging conditions in an online framework and for real-world time-series applications. Its main objectives can be divided into three parts:
i) Study and formalize label shift and its consequences on classification performance and anomaly detection in time series data.
ii) Propose a method for same-label matching in online contexts, building on recent advances in OT for domain adaptation.
iii) Create a framework that couples label matching with unsupervised learning to discover new distributions.
The proposed framework and methods will be evaluated on a variety of time series datasets for anomaly detection [8], with a focus on fault diagnosis for predictive maintenance in industrial applications [9, 10].
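Objectives ii) and iii) combine online processing with the matching of target clusters to source classes. As a toy illustration only, the hypothetical class below (`OnlineClusterMatcher` does not come from the cited works) runs sequential k-means on incoming target batches, warm-started at the source class means, then matches clusters to classes by solving an OT problem with uniform weights over K clusters and K classes. Such a problem always admits a permutation as an optimal solution, so the sketch finds it by brute force, which is only viable for a handful of classes. The cluster counts then give an online estimate of the target label proportions. It assumes each class forms one well-separated cluster in the target domain and that drift is moderate.

```python
import itertools

import numpy as np


class OnlineClusterMatcher:
    """Hypothetical sketch: online k-means on target batches + class matching."""

    def __init__(self, source_centroids):
        self.mu_src = np.asarray(source_centroids, dtype=float)  # (K, d) source class means
        self.centroids = self.mu_src.copy()  # warm start at source class means (assumption)
        self.counts = np.zeros(len(self.mu_src))

    def partial_fit(self, batch):
        # Sequential k-means: each centroid is the running mean of its points.
        for x in batch:
            k = np.argmin(((self.centroids - x) ** 2).sum(axis=1))
            self.counts[k] += 1
            self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
        return self

    def match(self):
        # Uniform-weight OT between K clusters and K classes: best permutation.
        K = len(self.mu_src)
        cost = ((self.centroids[:, None, :] - self.mu_src[None, :, :]) ** 2).sum(axis=-1)
        best = min(itertools.permutations(range(K)),
                   key=lambda p: cost[np.arange(K), list(p)].sum())
        return np.array(best)  # best[k] = source class assigned to target cluster k

    def label_proportions(self):
        # Target label proportions estimated from the matched cluster counts.
        q = np.zeros(len(self.mu_src))
        q[self.match()] = self.counts / self.counts.sum()
        return q


rng = np.random.default_rng(0)
X = np.vstack([rng.normal([1, 1], 0.3, size=(80, 2)),   # class 0, drifted
               rng.normal([6, 6], 0.3, size=(20, 2))])  # class 1, drifted
rng.shuffle(X)
m = OnlineClusterMatcher([[0.0, 0.0], [5.0, 5.0]])
for batch in np.array_split(X, 5):  # data arrive in sequential batches
    m.partial_fit(batch)
print(m.match(), m.label_proportions())  # identity matching, proportions 0.8 / 0.2
```

A realistic method would replace the brute-force matching with a proper OT solver, use learned time series features rather than raw coordinates, and handle clusters that drift far from the source means.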

References

[1] Z. Lipton, Y. Wang, and A. Smola, “Detecting and correcting for label shift with black box predictors,” ICML, 2018.
[2] N. Courty, R. Flamary, D. Tuia, and A. Rakotomamonjy, “Optimal Transport for Domain Adaptation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[3] M. Alaya, M. Berar, G. Gasso, and A. Rakotomamonjy, “Theoretical guarantees for bridging metric measure embedding and optimal transport,” Neurocomputing, 2022.
[4] A. Rakotomamonjy, R. Flamary, G. Gasso, M. E. Alaya, M. Berar, and N. Courty, “Optimal transport for conditional domain matching and label shift,” Machine Learning, 2022.
[5] A. Alaoui-Belghiti et al., “Semi-supervised optimal transport methods for detecting anomalies,” ICASSP, 2020.
[6] M. de Carvalho et al., “ACDC: Online unsupervised cross-domain adaptation,” Knowledge-Based Systems, 2022.
[7] A. Mensch and G. Peyré, “Online Sinkhorn: Optimal transport distances from sample streams,” NeurIPS, 2020.
[8] K. Choi et al., “Deep Learning for Anomaly Detection in Time-Series Data: Review, Analysis, and Guidelines,” IEEE Access, 2021.
[9] F. Pacheco et al., “Deep Ensemble-Based Classifier for Transfer Learning in Rotating Machinery Fault Diagnosis,” IEEE Access, 2022.
[10] P. Honeine, S. Mouzoun, and M. Eltabach, “Neighbor retrieval visualizer for monitoring lifting cranes,” CMMNO, 2018.

Candidate profile:
The PhD candidate must be a graduate student or hold an MSc or engineering degree in one of the following fields: computer science, data science, applied mathematics, or equivalent. She/he must have a strong background in machine learning and/or signal processing and/or computer vision. Experience in deep learning and proficient programming skills in Python are appreciated.

Required education and skills:
–

Workplace address:
Université de Rouen Normandie