Constrained clustering: incremental and active integration

When:
21/05/2021 – 22/05/2021 all-day

Offer related to the Action/Network: – — –/– — –

Laboratory/Company: LIFO and GREYC/LS2N
Duration: 3 years
Contact: thi-bich-hanh.dao@univ-orleans.fr
Publication deadline: 2021-05-21

Context:
Clustering is an important task in data mining: it partitions data instances into groups in order to reveal the underlying structure of the data. Constrained clustering extends it by integrating prior expert knowledge in the form of constraints, with the aim of making the clustering more accurate [1]. Most constrained clustering methods require all constraints to be specified before the method is run. It is, however, crucial that the expert can interact with the clustering process and inject new information or knowledge, in the form of constraints, on a clustering result. Constraints can be pairwise must-link or cannot-link constraints, which state that two instances must or cannot be in the same cluster; constraints on the clusters, which bound their size or diameter; or operations on clusters, such as splitting a cluster or merging two clusters. The constrained clustering process therefore becomes incremental and interactive. However, these two properties are not well addressed by existing constrained clustering approaches. This thesis aims to investigate these two research directions.
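
To make the pairwise constraints concrete, the following minimal Python sketch checks which must-link and cannot-link constraints a given partition violates. The function name `violated_constraints` and the encoding of a partition as a list of cluster labels are assumptions made for illustration; they do not come from the thesis proposal.

```python
from typing import Iterable, List, Sequence, Tuple

Pair = Tuple[int, int]


def violated_constraints(
    labels: Sequence[int],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> List[Tuple[str, Pair]]:
    """Return the pairwise constraints that the partition does not satisfy.

    labels[i] is the cluster index of instance i (hypothetical encoding).
    """
    violations: List[Tuple[str, Pair]] = []
    for i, j in must_link:
        # A must-link constraint is violated when the two instances are separated.
        if labels[i] != labels[j]:
            violations.append(("must-link", (i, j)))
    for i, j in cannot_link:
        # A cannot-link constraint is violated when the two instances share a cluster.
        if labels[i] == labels[j]:
            violations.append(("cannot-link", (i, j)))
    return violations


# Toy partition of five instances into three clusters.
labels = [0, 0, 1, 1, 2]
print(violated_constraints(labels, [(0, 1), (1, 2)], [(2, 3), (0, 4)]))
# [('must-link', (1, 2)), ('cannot-link', (2, 3))]
```

In an incremental setting, the violated constraints are the ones that force the current partition to be revised.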

Subject:
The thesis will be organized into two complementary parts. The objective of the first part is to propose new clustering approaches that incrementally integrate new constraints identified as important by the expert or by measures such as [6]. Declarative approaches based on Constraint Programming (CP) or Integer Linear Programming (ILP) will be considered because of their expressiveness and their rich constraint language. At the same time, to avoid confusing the expert, the new partition should not differ too much from the previous one. This could be guaranteed using a measure of clustering similarity, which can be either statistical [8] or more explanatory [5].
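
As an illustration of a statistical similarity between two partitions, the sketch below computes the Rand index, i.e., the fraction of instance pairs on which the two partitions agree. The Rand index is chosen here only as a well-known example; it is not necessarily the measure of [8], and the function name `rand_index` is an assumption.

```python
from itertools import combinations
from typing import Sequence


def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of instance pairs treated the same way by partitions a and b."""
    if len(a) != len(b):
        raise ValueError("both partitions must label the same instances")
    if len(a) < 2:
        return 1.0  # no pair to compare
    agree = 0
    total = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        # The pair agrees if it is grouped in both partitions or separated in both.
        agree += same_a == same_b
        total += 1
    return agree / total


previous = [0, 0, 1, 1]
updated = [0, 0, 1, 2]  # the last cluster was split after a new constraint
print(rand_index(previous, updated))  # 5 of the 6 pairs agree
```

A value close to 1 indicates that the updated partition stays close to the previous one, which is the property sought when constraints are added incrementally.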
In the second part, we will consider a more user-centered and interactive clustering approach. This new paradigm stresses that users should quickly be presented with newly generated constraints that are likely to interest them (i.e., that may improve clustering quality in later iterations) and on which they give feedback. This feedback could take the form of validating or invalidating the constraints. In the context of mono-clustering, these constraints can be generated from information on an existing partition, so as to identify informative points (e.g., frontier points). We will also consider the case where a set of clusterings is available, as in collaborative and multi-paradigm clustering [2,7]. In such settings, information from the different clusterings can be used, for instance, to identify uncertain pairs or to elicit the best objective functions according to criteria to be defined. Here, a pair is more uncertain if more clusterings disagree on whether its two instances should be in the same cluster. Another approach is to exploit the feedback history to determine the most informative points [3]. Finally, to prevent contradictions while collecting user feedback, the consistency of the learned constraints must be ensured.
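
The two ideas above, pair uncertainty across a set of clusterings and consistency of the collected constraints, can be sketched as follows. Both functions are illustrative assumptions rather than the method of the thesis: `pair_uncertainty` scores a pair as 1 - |2p - 1|, where p is the fraction of clusterings that group the pair together, and `is_consistent` uses a union-find structure to detect a cannot-link constraint that contradicts the must-link constraints.

```python
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple

Pair = Tuple[int, int]


def pair_uncertainty(clusterings: Sequence[Sequence[int]]) -> Dict[Pair, float]:
    """Score each pair: 0.0 if all clusterings agree, 1.0 if they split evenly."""
    n = len(clusterings[0])
    m = len(clusterings)
    scores: Dict[Pair, float] = {}
    for i, j in combinations(range(n), 2):
        together = sum(labels[i] == labels[j] for labels in clusterings)
        p = together / m  # fraction of clusterings grouping i and j
        scores[(i, j)] = 1.0 - abs(2.0 * p - 1.0)
    return scores


def is_consistent(n: int, must_link: Iterable[Pair], cannot_link: Iterable[Pair]) -> bool:
    """True if no cannot-link pair is forced together by must-link constraints."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in must_link:
        parent[find(i)] = find(j)  # merge the two must-link components
    return all(find(i) != find(j) for i, j in cannot_link)


clusterings = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1], [0, 0, 1, 1]]
scores = pair_uncertainty(clusterings)
print(max(scores, key=scores.get))  # (1, 2): two clusterings group it, two do not
print(is_consistent(3, [(0, 1), (1, 2)], [(0, 2)]))  # False: 0 and 2 are linked transitively
```

Under these assumptions, the most uncertain pair would be the next one submitted to the user, and the consistency check would be run before accepting the answer as a new constraint.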
The proposed method will be generic and will not depend on the potential areas of application. As part of the HERELLES project, in order to validate the operability of the method, we will focus on understanding complex phenomena in our environment (soil artificialization, urbanization, construction of infrastructure, etc.) mainly via heterogeneous temporal data.

Candidate profile:
Machine Learning, Data Mining, Constraint Programming and Applied Mathematics

Required education and skills:
Master's degree or engineering school

Place of employment:
The PhD will be carried out at LIFO, University of Orléans, in collaboration with GREYC/LS2N, University of Caen Normandy.

The complete application consists of the documents below, which should be sent as a single PDF file to:
Thi-Bich-Hanh Dao (thi-bich-hanh.dao@univ-orleans.fr) LIFO, University of Orléans
Samir Loudni (samir.loudni@imt-atlantique.fr) IMT Atlantique – CNRS – LS2N
● Detailed CV
● One-page cover letter (clearly indicating the available starting date as well as relevant qualifications, experience and motivation)
● University certificates and transcripts (marks for both B.Sc. and M.Sc. degrees)
● Contact details of up to three referees
● Optionally, an English language certificate and a list of publications
● Note: all documents should be in English or in French.

Attached document: 202104230758_PhD subject Orleans 2021.pdf