Integration of a constraint extraction mechanism into a collaborative clustering process

When:
31/03/2020 – 01/04/2020 all-day

Related Action/Network: none

Laboratory/Company: ICube – Université de Strasbourg
Duration: 5-6 months
Contact: gancarski@unistra.fr
Publication deadline: 2020-03-31

Context:
Analysing satellite image time series with supervised methods requires that the thematic classes be perfectly known and defined, and that the expert be able to provide a training set that is sufficient in both size and quality. Because it is difficult to obtain enough examples for such an analysis, new clustering methods use constraints to guide the clustering process [1,3,4,5]. In particular, our team has developed SAMARAH, an innovative method for collaborative, interactive clustering under constraints [2]. It allows the expert to add constraints “on the fly” to guide the process, so that the resulting clusters are closer to the expert’s “intuition”, i.e. to potential thematic classes. The SAMARAH collaborative method developed by ICube can thus take constraints into account incrementally.
Nevertheless, it is often very difficult for the expert to decide which piece of additional information (an object to be labelled, a new constraint to apply, etc.) is the most relevant, i.e. which one will have a positive impact on the current result. Indeed, experts rely almost exclusively on a visualisation of the scene to define new constraints. Experiments show that they tend to focus on relatively large regions of the image. They also have no way of knowing a priori whether the constraints they propose are consistent with each other and relevant. Selecting new information is thus an important scientific problem, all the more so because the way this information is obtained from the expert must be optimised. If experts do not see the solution improve quickly after their input, they will soon lose confidence in the system. Paradoxically, the disruption that new information causes to the current solution should also be limited, so as not to disorient the expert. To this end, the method should actively assist the expert with advice or suggestions for new constraints [6,7].

Subject:
The objective of this internship is to study and implement mechanisms that propose potentially relevant constraints. This can be done, for example, following two approaches [1]: approaches that depend on the clustering algorithm, and approaches that are independent of it. Algorithm-dependent ideas include exploiting the differences between the results produced by the heterogeneous methods in SAMARAH, and/or developing new measures based on the inconsistency [8] and informativeness [9] measures. Algorithm-independent directions include using a complexity measure to identify points at the boundaries between clusters and using them to define constraints; such a measure could be based, for example, on minimum spanning trees. Another algorithm-independent direction is to develop new coherence-like measures [9] for time series.
To consolidate the proposals and validate them thematically, the intern will be able to draw on joint work between ICube and SERTIT. Several fields of application are envisaged, including:
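The algorithm-independent idea above can be illustrated with a small, self-contained sketch. It uses minimum-spanning-tree edges that cross cluster boundaries to identify candidate constraint pairs. The sketch also includes a simple informativeness score. The following are assumptions made for illustration only:
- Each object is a fixed-length time series (for example, one pixel's profile over acquisition dates), and objects are compared with plain Euclidean distance. A time-series distance such as DTW would be a natural substitute.
- Constraints are triples tagged "must-link" or "cannot-link".
- Informativeness is taken to be the fraction of constraints that the unconstrained clustering violates.
- The N1-style score borrows the MST-based complexity measure from the data-complexity literature.

All function names are hypothetical and are not taken from SAMARAH or FODOMUST-MULTICUBE.

```python
import math

# Hypothetical helpers for illustration; not part of SAMARAH or FODOMUST-MULTICUBE.
# Each point is a fixed-length time series compared with Euclidean distance
# (a simplifying assumption; DTW would suit real satellite time series better).


def minimum_spanning_tree(points):
    """Prim's algorithm in O(n^2); returns MST edges as (i, j, weight)."""
    n = len(points)
    edges = []
    if n == 0:
        return edges
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Update the cheapest connection of the remaining vertices via u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def boundary_edges(points, labels):
    """MST edges whose two endpoints lie in different clusters."""
    return [(i, j, w) for i, j, w in minimum_spanning_tree(points)
            if labels[i] != labels[j]]


def n1_complexity(points, labels):
    """Fraction of points touching an inter-cluster MST edge (N1-style score)."""
    touched = set()
    for i, j, _ in boundary_edges(points, labels):
        touched.update((i, j))
    return len(touched) / len(points)


def propose_pairs(points, labels, k=3):
    """Return up to k boundary pairs, shortest inter-cluster MST edges first.

    These are the pairs the current clustering separates most narrowly; the
    expert would then decide whether each one is must-link or cannot-link.
    """
    ranked = sorted(boundary_edges(points, labels), key=lambda e: e[2])
    return [(i, j) for i, j, _ in ranked[:k]]


def informativeness(constraints, labels):
    """Fraction of (i, j, kind) constraints violated by an unconstrained clustering.

    kind is assumed to be either "must-link" or "cannot-link".
    """
    if not constraints:
        return 0.0
    # A constraint is violated when "must-link" disagrees with "same cluster".
    violated = sum((kind == "must-link") != (labels[i] == labels[j])
                   for i, j, kind in constraints)
    return violated / len(constraints)


if __name__ == "__main__":
    # Toy data: two-date profiles, with point 5 lying between the two groups.
    points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 1.0), (0.55, 0.5)]
    labels = [0, 0, 0, 1, 1, 1]
    print(propose_pairs(points, labels))  # [(2, 5)]
    print(n1_complexity(points, labels))  # 2 of 6 points touch a boundary edge
```

On this toy data, the only MST edge that crosses clusters links point 2 to the in-between point 5. That pair is therefore the first candidate to show the expert. Ranking boundary edges by length is only one possible heuristic. The actual proposal and scoring mechanisms are what the internship is meant to study.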
1. Detection and monitoring of tree cuts in the Vosges mountains: the detection of clear cuts has already been the subject of previous studies. The case of selective cutting, which is much more complex, could be studied.
2. Monitoring of (re)vegetation around new infrastructure: this will involve identifying vegetation revitalisation/reinstallation classes around newly created infrastructure and then monitoring how this vegetation evolves over several years.
The proposed mechanism(s) will be integrated into the FODOMUST-MULTICUBE platform [10] dedicated to the multi-temporal analysis of remote sensing data.

Candidate profile:
Second-year Master's student in Computer Science.
Stipend: €550 per month

Required education and skills:
The candidate must have good skills in data analysis, particularly in supervised or unsupervised classification of time series. Experience in remote sensing image analysis is welcome.

Workplace address:
ICube – SDC Team
Pierre Gançarski – Thomas Lampert
Pôle API
67 400 Illkirch

Attached document: Sujet_HIATUS_ENG.pdf