Handling class imbalance in supervised classification for medical diagnostics

05/03/2022 – 06/03/2022 all-day

Offer related to the Action/Network: not specified

Laboratory/Company: LAMADE – Pôle Sciences des Données – Université P
Duration: 5-6 months
Contact: sana.mrabet@dauphine.psl.eu
Publication deadline: 2022-03-05

Context:
Classifying highly imbalanced data is a major challenge for machine learning techniques. Many solutions have been proposed, and they can be grouped into three categories: data pre-processing with under/oversampling, which creates a training sample with a new distribution of instances; active sampling, which changes the training sample during the learning process; and the Synthetic Minority Over-sampling Technique (SMOTE), which creates new synthetic instances in the minority class. The effectiveness of each approach depends on the context. In medical diagnostics, SMOTE may not be suitable if the input data contains categorical attributes, since it builds synthetic instances by interpolating between feature values. Likewise, if the imbalance ratio is high, under/oversampling can cause a loss of information in the training sample.
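
To make the resampling strategies above more concrete, here is a minimal sketch in plain Python, assuming purely numeric features and a small toy dataset. The helper names (random_undersample, random_oversample, smote_like) and the data are hypothetical, and smote_like is a simplified interpolation in the spirit of SMOTE, not a reference implementation. Active sampling is not covered here.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Group feature rows by their class label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows so every class reaches the size of the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_like(X_min, n_new, k=2, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority row and one of its k nearest minority neighbours.
    Assumption: all features are numeric."""
    if len(X_min) < 2:
        raise ValueError("need at least two minority instances to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        others = [p for j, p in enumerate(X_min) if j != i]
        # Sort the other minority rows by squared Euclidean distance to base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 4 minority rows (label 1) and 16 majority rows (label 0).
X = [[float(i), float(i % 3)] for i in range(20)]
y = [1 if i < 4 else 0 for i in range(20)]

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
X_minority = [row for row, label in zip(X, y) if label == 1]
X_synth = smote_like(X_minority, n_new=12)

print(Counter(y_under))
print(Counter(y_over))
print(len(X_synth))
```

In this sketch, undersampling discards 12 of the 16 majority rows, which illustrates the loss of information at high imbalance ratios. The interpolation step in smote_like has no obvious meaning for categorical attributes, which is why SMOTE may be a poor fit for such data. In practice, a library such as imbalanced-learn (RandomUnderSampler, RandomOverSampler, SMOTE) would likely be used instead of hand-written helpers.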

Subject:
Study and compare three approaches to handling class imbalance in medical data: data pre-processing with over/undersampling, synthetic minority over-sampling, and active sampling.

Candidate profile:
Master 2 (second-year Master's) or final year of engineering school in computer science

Required education and skills:
Good knowledge of machine learning and Python programming.
Proficiency in English and good writing skills.

Work address:
Université Paris Dauphine – PSL
Place du Maréchal de Lattre de Tassigny – 75775 PARIS Cedex 16

Attached document: 202202211348_Proposition sujet mémoire 2022.pdf