MaDICS

Candidate genes prioritization using knowledge graphs and AI

Jun 23 – Jun 24 all-day

Offer linked to Action/Network: DOING/none

Laboratory/Company: University of Montpellier, LIRMM (computer science)
Duration: 3 years
Contact: pierre.larmande@ird.fr
Publication deadline: 2023-06-23

Context:
To meet the challenges of the global demand for food in a context of climate change, a better understanding of agronomically important traits, such as yield, quality, and resistance to abiotic and biotic stresses, is crucial to improve crop production capacity. Deciphering the molecular mechanisms that drive a particular trait is one of the most critical research areas in biology. However, these genotype-phenotype interactions are difficult to identify because they occur at different molecular levels in the plant and are strongly influenced by environmental factors such as climate change. For biologists, it is difficult to search for relevant information because it is often dispersed across several databases on the Internet, each with different data models, scales, or means of access. Today’s major challenges are related to the development of methods to integrate these heterogeneous data and to enrich biological knowledge. Scientists also need methods to dig into this mass of data and to highlight relevant information that identifies key genes. To this end, we developed the AgroLD [1] platform, a knowledge graph that uses Semantic Web technologies to integrate heterogeneous agronomic data from the genome to the phenome (i.e., from the set of genes to the set of phenotypes observed in a plant organism). AgroLD is actively developed. As of today, AgroLD contains more than 900 million triples resulting from the integration of around 100 datasets gathered in 33 named graphs.

Subject:
The thesis is proposed within the framework of the DIG-AI ANR project, which aims to develop machine learning methods combined with knowledge graphs such as AgroLD to study the molecular interactions driving phenotype development in crops.

Objective 1: The current challenges relate to the development of methods for the functional analysis of genes, and in particular methods for the prioritization of candidate genes. Indeed, the data integrated from databases are incomplete, heterogeneous, and insufficient to infer gene function with good accuracy. One of the first objectives of the thesis will be to develop knowledge extraction methods that extract functional information on genes from scientific documents.

Objective 2: The recent success of graph neural networks (GNNs) suggests the possibility of systematically incorporating multiple sources of information into a heterogeneous network and learning the nonlinear relationship between phenotypes and genes [2]. However, knowledge graphs like AgroLD can be complex and contain noisy information. Therefore, as proposed by [3, 4], some GNN models could reduce the influence of noisy data on the overall prediction by assigning low weights to unreliable nodes/edges. The second objective will be to develop an approach adapted to the AgroLD context by building meaningful representations from the high-dimensional and complex gene data.
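As an illustration of the down-weighting idea in Objective 2, here is a minimal sketch of one round of neighbour aggregation in which each incoming edge carries a reliability score. It is not the method of [3, 4] or of the DIG-AI project: the function name `weighted_aggregate`, the softmax normalisation, and the toy scores are assumptions made for the example, and in a trained GNN the scores would be learned rather than given.

```python
import numpy as np

def weighted_aggregate(features, edges, edge_scores):
    """Aggregate neighbour features with softmax-normalised edge weights.

    features:    array of shape (n_nodes, dim)
    edges:       list of (source, target) pairs
    edge_scores: one raw reliability score per edge (hypothetical input;
                 a real model would learn these)
    """
    features = np.asarray(features, dtype=float)
    out = np.zeros_like(features)
    for node in range(features.shape[0]):
        incoming = [(s, sc) for (s, t), sc in zip(edges, edge_scores) if t == node]
        if not incoming:
            # No incoming edge: keep the node's own representation.
            out[node] = features[node]
            continue
        sources = np.array([s for s, _ in incoming])
        scores = np.array([sc for _, sc in incoming], dtype=float)
        # Softmax over incoming edges: low scores give near-zero weights.
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[node] = weights @ features[sources]
    return out

# Toy usage: node 2 receives one trusted edge (from node 0) and one
# edge with a very low score (from node 1).
x = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
h = weighted_aggregate(x, edges=[(0, 2), (1, 2)], edge_scores=[0.0, -50.0])
```

In this toy case the representation of node 2 is dominated by node 0, because the low-scored edge from node 1 receives a weight close to zero.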

Objective 3: Finally, based on previous candidate gene studies in the biomedical field [5, 6] and because inferring gene regulatory networks (GRN) can be formulated as a link prediction problem in Graph Neural Networks (GNN) [7], the third objective will be to apply GNN models to implement candidate gene prioritization and GRN methods to answer biological questions related to adaptation of crops to drought stress and plant diseases.

References

1. Venkatesan A, Tagny Ngompe G, Hassouni NE, Chentli I, Guignon V, Jonquet C, et al. Agronomic Linked Data (AgroLD): A knowledge-based system to enable integrative biology in agronomy. PLOS ONE. 2018;13:1–17.
2. Zhang X-M, Liang L, Liu L, Tang M-J. Graph Neural Networks and Their Current Applications in Bioinformatics. Front Genet. 2021;12.
3. Neil D, Briody J, Lacoste A, Sim A, Creed P, Saffari A. Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs. ArXiv181200279 Cs Stat. 2018.
4. Li X, Saude J. Explain Graph Neural Networks to Understand Weighted Graph Features in Node Classification. ArXiv200200514 Cs. 2020.
5. Alshahrani M, Hoehndorf R. Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes. Bioinformatics. 2018;34:i901–7.
6. Chen J, Althagafi A, Hoehndorf R. Predicting candidate genes from phenotypes, functions and anatomical site of expression. Bioinformatics. 2021;37:853–60.
7. Gligorijević V, Barot M, Bonneau R. deepNF: deep network fusion for protein function prediction. Bioinformatics. 2018;34:3873–81.

Candidate profile:
Expected profile:
The candidate must hold the equivalent of a BAC+5 degree from a university or engineering school, with a specialization in applied mathematics, data science, graph theory, or machine learning. A good understanding of molecular biology and bioinformatics is a plus. Applicants are expected to have a solid background in programming (Python). The candidate must have a good command of English.

Required education and skills:

How to apply:
Applications must be sent by email before June 23rd, 2023 to pierre.larmande@ird.fr and francois.scharffe@umontpellier.fr, and must include the following documents:
1) Motivation letter
2) CV (2 pages max)
3) M1 and M2 academic transcripts
4) References, if possible

Work address:
Link to the full description: https://sites.google.com/site/larmandepierre/positions/phd-in-computational-biology-and-bioinformatics

Categories: theses

Continual/life long learning for time series prediction in environmental sciences

Jun 30 – Jul 1 all-day

Offer linked to Action/Network: none/none

Laboratory/Company: LIFAT / RFAI, Université de Tours, France
Duration: 3 years
Contact: nicolas.ragot@univ-tours.fr
Publication deadline: 2023-06-30

Context:
More details here: http://www.rfai.lifat.univ-tours.fr/phd-position-continual-life-long-learning-for-time-series-prediction-in-environmental-sciences/

The JUNON project, led by the BRGM, is funded by the Centre-Val de Loire region through the ARD program (« Ambition Recherche Développement »). The main goal of JUNON is to develop digital services based on large-scale digital twins in order to improve the monitoring, understanding and prediction of the evolution of environmental resources and phenomena, for a better management of natural resources. Digital twins will make it possible to virtually reproduce natural processes and phenomena using a combination of AI and environmental tools.
JUNON will focus on building digital twins for the quality and quantity of groundwater, as well as for emissions of greenhouse gases and pollutants with health effects, at the scale of the geographical area corresponding to the northern part of the Centre-Val de Loire region. These digital twins will rely on geological and meteorological knowledge and data (time series), as well as on physics-based models.
The project partners are: BRGM, Université d’Orléans, Université de Tours, CNRS, INRAE, and the companies ATOS and ANTEA.

Subject:
The PhD position is part of WP4 of JUNON, which focuses on predicting groundwater quantity and/or ground/air pollutants. Postdocs at the BRGM and LIFAT will respectively be in charge of collecting and arranging the data (groundwater levels at different locations) and of benchmarking predictions from mechanistic models as well as from classical AI prediction tools that integrate several sources of information, such as:
– meteorological data
– spatial information, i.e. geolocation of sensors and of the locations where predictions are to be made; topographic information such as altitude
– knowledge from mechanistic models as well as expert knowledge (impact of the attributes and variables used)
– etc.

The goal of the PhD will be, relying on these data and protocols, to work on new learning algorithms that allow these AI models to learn continuously from newly observed data arriving as a stream. The scientific challenges are clearly related to continual learning for deep learning (DL) prediction models, and especially to dealing with:
– few-shot learning in DL
– drift and anomaly detection
– the plasticity/stability dilemma
– adapting such algorithms to the models suggested by the postdocs, based on Transformers or spatio-temporal graph neural networks using heterogeneous data.

Candidate profile:
Student holding a master's degree in computer science with experience in deep learning.

To apply, send the following documents by e-mail to nicolas.ragot [at] univ-tours.fr before June 20th: a CV, a motivation letter, a short description of your experience in machine/deep learning, and references from academics.

Required education and skills:
Master's or engineering degree or equivalent in computer science (machine learning, data science) or applied mathematics

– solid experience in data analysis and machine learning (theory and practice of deep learning in Python) is required
– experience/knowledge in time series prediction and environmental science is welcome
– curiosity and ability to communicate (in English at least) and to work in collaboration with scientists from other fields
– autonomy and good organizational skills

Work address:
The RFAI group (Pattern Recognition and Image Analysis) is part of the LIFAT (EA 6300) computer science lab.
64 avenue Jean Portalis
37200 Tours, FRANCE

Attached document: 202306051525_Thèse Junon apprentissage continu.pdf

Categories: theses

Machine learning and graph-based techniques to predict long-term bacterial community structure

Jun 30 – Jul 1 all-day

Offer linked to Action/Network: none/none

Laboratory/Company: Lorraine Research Laboratory in Computer Science a
Duration: 36 months
Contact: sabeur.aridhi@loria.fr
Publication deadline: 2023-06-30

Context:
We propose a fully funded 3-year PhD position in computer science with application to biomolecule analysis. The position is funded by Lorraine Université d’Excellence (LUE) through a multidisciplinary project that involves 2 researchers in computer science and 4 researchers in microbiology. A PhD thesis in microbiology will be conducted in parallel.

Context and motivations

Bacteriocins are antimicrobial peptides of bacterial origin with very high economic potential in the agri-food sector. They are used in biopreservation/biocontrol applications to fight undesirable microorganisms in agronomy and the food industry. LIBio has recently developed a technology based on the selection of two strains of the lactic acid bacterium Carnobacterium maltaromaticum that produce anti-Listeria monocytogenes bacteriocins [9]. These strains inhibit the growth of this pathogen in cheeses when added to the manufacturing milk, producing the antimicrobial agents in the cheese matrix. These remarkable properties have led to a patent [10] that has very recently been licensed to a ferment producer. However, as with the vast majority of biopreservation technologies, the effect is at best bacteriostatic: there is little or no decay of the pathogenic bacteria, which can then persist at low concentrations in the food. The biopreservation technologies described in the literature are based on engineering approaches that do not take advantage of the properties of the microbial communities forming the microbiomes of food products. Yet microbiome engineering is among the 12 promising technologies that could transform food systems over the next decade [11]. Indeed, in the case of biopreservation, assemblies of microorganisms could yield communities that produce multiple antimicrobial agents and that can, moreover, occupy the ecological niche of the undesirable microorganism to exclude it more efficiently. However, knowledge in the field of microbial community engineering is insufficient to fully exploit this potential. Indeed, due to the complexity of microbial communities, no method is available to predict microbial community structure from knowledge of the ecological properties of microorganisms. Moreover, assembling microorganisms whose defining property is to produce antimicrobial agents is a major difficulty, because these agents can lead to the mutual exclusion of the microorganisms producing them.

Positioning

In a microbial ecosystem in which members produce antimicrobial substances such as bacteriocins, three actors can be considered: the bacteriocin-producing microorganism (P) and the microorganisms sensitive (S) and resistant (R) to this bacteriocin. It was shown experimentally that in simple ecosystems mixing three such actors, all three are able to coexist at equilibrium [12]. In these systems, S is more competitive than R because it does not pay the cost of resistance, R is more competitive than P because it does not pay the cost of bacteriocin production, and P in turn inhibits S, which is sensitive to its bacteriocin. This cyclical relationship between P, S, and R is similar to the popular game « rock paper scissors », in which no player has an advantage over the other two: each player can defeat one player and be defeated by the other. These simple experimental systems suggest that it is possible to implement engineering tools to predict the structure of complex communities based on the interactive properties of microorganisms. Thanks to the emergence of high-throughput investigation methods, it is now possible to produce interaction data between large sets of microorganisms and thus reconstruct models of microorganism interaction networks [6] [7]. Recently, Ramia et al. [4] [5] built the interaction network corresponding to 73 Carnobacterium maltaromaticum strains. As in earlier studies, the graph is sender-determined and also shows a highly nested structure [7], which means that it differs from a randomly built network with the same number of nodes and edges. The results also show that the competitive interaction network is very dense, making C. maltaromaticum a very interesting model for developing community engineering approaches that produce high-performance cocktails of antimicrobial substances to fight undesirable microorganisms. This project will use the data published in Ramia et al. [5] and will aim to bring a computer science perspective to the study of the properties of these interaction graphs.
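The cyclic P/S/R relationship described above can be illustrated with a small simulation. The sketch below uses replicator dynamics with a symmetric win/lose payoff matrix, which is an assumption chosen for illustration and not a model taken from the cited studies; the payoff values, the step size and the function names are all hypothetical.

```python
import numpy as np

# Hypothetical payoff matrix, rows/columns ordered P, S, R.
# A[i, j] = +1 if type i outcompetes type j, -1 if it loses, 0 otherwise.
A = np.array([
    [0.0, 1.0, -1.0],   # P inhibits S, loses to R (production cost)
    [-1.0, 0.0, 1.0],   # S is inhibited by P, outcompetes R (no resistance cost)
    [1.0, -1.0, 0.0],   # R outcompetes P, loses to S
])

def replicator_step(x, dt=0.01):
    """One explicit Euler step of the replicator equation on frequencies x."""
    fitness = A @ x
    mean_fitness = x @ fitness
    x_new = x + dt * x * (fitness - mean_fitness)
    # Renormalise so that the frequencies keep summing to one.
    return x_new / x_new.sum()

def simulate(x0, steps=1000, dt=0.01):
    """Return the trajectory of (P, S, R) frequencies as an array."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x]
    for _ in range(steps):
        x = replicator_step(x, dt)
        trajectory.append(x)
    return np.array(trajectory)

balanced = simulate([1 / 3, 1 / 3, 1 / 3], steps=200)
cycling = simulate([0.5, 0.3, 0.2], steps=2000)
```

Starting from equal frequencies, the three types stay at equal frequencies; starting elsewhere, the frequencies oscillate, each type rising after the type it outcompetes has become abundant. Real communities involve many more strains and measured interaction strengths, which is the setting the thesis targets.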

The originality of this project is that it will make it possible to integrate experimental variables describing the interaction properties between microorganisms into the prediction of community structure, which is not possible with existing methods.

Subject:
Objectives of the thesis

The main goal of the thesis is to use advanced machine learning and graph-based approaches in order to predict the long-term community structure in microbiological ecosystems [3] [1]. Particularly, it aims at providing approaches to deduce diversity directly from the static, inner properties of the interaction graph the entities are involved in. The practical objectives of this interdisciplinary PhD project, which will be carried out in collaboration with researchers from the Laboratoire d’Ingénierie des Biomolécules (LIBio), are as follows:

– to study existing research works on the analysis of interaction networks and long-term diversity prediction in bacteria.
– to propose machine learning and graph-based approaches in order to learn models that are able to predict diversity based on the interaction graphs. In this context, regression methods could be used to learn the relation between the interaction graph properties and the diversity.
– to study how graph embedding could help in predicting the level of development for each strain. In this context, we aim to study the impact of graph embedding methods on the prediction results. A specific embedding method could be proposed in the context of this project.
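To make the regression objective above more concrete, the following sketch computes a few static descriptors of a directed interaction graph and a Shannon diversity index from strain abundances. The choice of descriptors (density, spread of out- and in-degrees) and of the Shannon index as the diversity measure are assumptions for illustration; the offer does not specify which graph properties or which diversity measure will be used.

```python
import numpy as np

def graph_features(adjacency):
    """Static descriptors of a directed interaction graph.

    adjacency[i, j] = 1 if strain i inhibits strain j (binary, no self-loops).
    Returns [density, std of out-degrees, std of in-degrees].
    """
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    density = adjacency.sum() / (n * (n - 1))
    out_degree = adjacency.sum(axis=1)
    in_degree = adjacency.sum(axis=0)
    return np.array([density, out_degree.std(), in_degree.std()])

def shannon_diversity(abundances):
    """Shannon index H = -sum(p * ln p) over strains with non-zero abundance."""
    p = np.asarray(abundances, dtype=float)
    p = p[p > 0]
    p = p / p.sum()
    return float(-(p * np.log(p)).sum())

# Toy usage: a 3-strain cycle in which each strain inhibits exactly one other.
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
features = graph_features(cycle)
even_community = shannon_diversity([1, 1, 1, 1])
single_strain = shannon_diversity([5, 0, 0])
```

Given many (graph, observed diversity) pairs, a regression model could then be fitted from the feature vectors to the diversity values; which model and which features work best is precisely the open question of the thesis.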

Candidate profile:
Required qualification: Candidates must have a master's degree in computer science. Good programming skills in a procedural language are essential. Experience in machine learning and graph mining is desirable but not essential. A strong interest in bioinformatics would also be highly desirable.

Required education and skills:
Required qualification: Candidates must have a master's degree in computer science. Good programming skills in a procedural language are essential. Experience in machine learning and graph mining is desirable but not essential. A strong interest in bioinformatics would also be highly desirable.

Work address:
Lorraine Research Laboratory in Computer Science and its Applications (LORIA), Nancy, France

Attached document: 202305240952_PROJET-DE-THESE-LORIA-LUE.pdf

Categories: theses

Generative models for maritime mobility data

Jun 30 – Jul 1 all-day

Offer linked to Action/Network: DOING/none

Laboratory/Company: École navale (EA3634)
Duration: 36 months (+12)
Contact: cyril.ray@ecole-navale.fr
Publication deadline: 2023-06-30

Context:
The École navale is seeking a PhD student in computer science / data science. In addition to their research work, they will contribute to the training of officer-engineer cadets and master's students at the École navale.

Holding a master's degree (or equivalent) in computer science, the recruited person will be expected to take part in teaching activities and, within the laboratory, in research related to maritime information processing, artificial intelligence and, more generally, data science. The thesis will be carried out within the MoTIM research team, with the aim of contributing to the field of maritime information processing from heterogeneous sources (sensor data, signals, images, videos, geographic information, textual data) using artificial intelligence algorithms.

Subject:
The generation of pseudo-synthetic data and datasets is used for a wide range of activities, notably as test data for new tools or algorithms, for model validation, and for training AI models. More recently, the generation of synthetic data, created artificially rather than produced by real events, has taken off with the advent of generative models. Synthetic data are a form of data augmentation for which Generative Adversarial Nets (GANs) have shown promising performance on various types of data. In the maritime domain, the monitoring and analysis of mobility was accelerated by the advent of the Automatic Identification System (AIS), which makes it possible to locate equipped vessels in real time across all oceans. The data produced are spatio-temporal series affected by missing data, by integrity problems arising from sensors and/or transmission, and by malicious manipulations of various kinds such as the falsification of location, trajectory or identity. In this context, the objective of this thesis is to address the generation of synthetic data and the semantic annotation of these data. The thesis work may be organized around the following objectives in particular:

– Develop a generative model for maritime mobility data that can produce datasets.
– Assess the integration of complementary heterogeneous data, e.g. sea state.
– Address the scripting / annotation of datasets and assess the usefulness and impact of "classical" data imputation techniques in handling the variability of the designed scenarios.
– Consider classification and novelty detection simultaneously, in particular to account for falsified data.
– Assess the performance / genericity of the approach as a function of the geographic location of the data produced.

Candidate profile:
Degree: Master's (or equivalent) in computer science.

Interest in teaching.
Interest in research on maritime and naval issues.
Technical skills in information processing.
Good scientific writing skills.
Good interpersonal skills.

Required education and skills:
Skills: good knowledge of the basic tools and models of artificial intelligence (machine learning / deep learning, etc.) and of techniques for representing and processing heterogeneous (geographic) data (data correlation, time series analysis, data imputation, etc.).

Work address:
Institut de recherche de l’École navale
Lanvéoc-Poulmic / Brest

Attached document: 202306121519_FDP_2023_DFS_DDR_E5033_AER_IA.pdf

Categories: theses

PhD studentship in information extraction and knowledge representation to optimize the surveillance of plant pathogen vectors

Jun 30 – Jul 1 all-day

Offer linked to Action/Network: DOING/none

Laboratory/Company: MaIAGE/Wimmics
Duration: 36 months
Contact: claire.nedellec@inrae.fr
Publication deadline: 2023-06-30

Context:
We are seeking a highly motivated PhD candidate within the framework of the research project “Information acquisition from textual data for early insect vector surveillance in plant health”. The central aim of the thesis project is to develop ontology-based NLP methods to acquire up-to-date and relevant knowledge on epidemic diseases for improved management of the risks linked to insect vectors. The quality and relevance of the extracted information will derive both from the collected documents and from the formal representation of the knowledge base. For plant health, the biological interaction between insect vectors, pathogens, and host plants is the primary focus.
The studentship will be affiliated with the MaIAGE laboratory [1] at the INRAE research center in Jouy-en-Josas, University of Paris-Saclay (Computer Science GS), with PHIM [2] at the INRAE research center in Montpellier, and with the Wimmics group, I3S, at Univ Côte d’Azur and Inria [3].

[1] https://maiage.inrae.fr/fr/bibliome
[2] https://umr-phim.cirad.fr/en/recherche/comprendre-les-epidemies-dans-les-champs-prism/equipe-forisk
[3] https://team.inria.fr/wimmics/

Subject:
For more details, see https://maiage.inrae.fr/fr/node/2726

Candidate profile:
A successful candidate will have an MSc or equivalent in Artificial Intelligence. Knowledge of natural language processing and knowledge representation methods will be an advantage. Master's studies in a related area, e.g., biology or bioinformatics, will also be an advantage.
– High level of academic English or French, both written and spoken.
– Good programming skills in Python or Java (and preferably experience with deep learning tools).
– Capacity to work as part of a team in a multidisciplinary framework.
– Experience in research applied to the life sciences is an asset.
We offer a motivating research environment with many opportunities for in-house, national and international collaborations, and access to computing resources and state-of-the-art research equipment.

Required education and skills:
Application
——
Interested candidates should send their application file to Claire Nédellec (claire.nedellec@inrae.fr), Nicolas Sauvion (Nicolas.sauvion@inrae.fr) and Catherine Faron (faron@i3s.unice.fr).
Applications will be assessed as they are received and decisions taken on a rolling basis. The application should comprise:
– a CV (max 5 pages) with transcripts (Master), diplomas, internships
– a cover letter
– the names and contact details of two referees for reference letters

Work address:
Location:

The student will share his/her time between MaIAGE (Jouy-en-Josas) and Wimmics (Sophia Antipolis) over long periods (1 to 2 years) and will make regular visits to PHIM (Montpellier).

Salary:

2300 gross salary per month including social security package (healthcare, pensions, unemployment benefits).

Categories: theses

Uncertainty quantification for machine and deep learning techniques

Jun 30 – Jul 1 all-day

Offer linked to Action/Network: none/Doctorants

Laboratory/Company: FEMTO-ST
Duration: 3 years
Contact: noura.dridi@ens2m.fr
Publication deadline: 2023-06-30

Context:
Most real physical systems and everyday situations involve uncertainty. This is the case for medical diagnosis, weather forecasting, the evolution of the stock market, and so on. The literature distinguishes two types of uncertainty: aleatoric uncertainty, which is inherent to the data (e.g., measurement noise or natural variability of the inputs), and epistemic uncertainty, which is related to the model and due to a lack of knowledge. Measuring uncertainty is important to support the user in deciding which action to take. For example, when an anomaly is detected with a weak confidence level, another source of information (image, human intervention, etc.) should be added before planning intervention actions. More generally, quantifying prediction uncertainty makes it possible to decide whether or not to trust predictions. Indeed, incorrect overconfident predictions can be harmful and lead to erroneous decisions.

Subject:
Goal of the thesis: The goal of this thesis is to develop a robust method to evaluate the uncertainty of predictions made by machine and deep learning algorithms. Most existing work has focused on improving algorithm performance; few works deal with measuring the uncertainty related to the predictions. In particular, in this thesis we want to relax some hypotheses of the existing approaches related to the distribution of the data and the symmetry of the algorithm. This subject is challenging, with many theoretical and practical difficulties. It is multidisciplinary, involving skills in probability, statistics and data processing. The two principal goals are:
– First, to measure the impact of misestimated uncertainty on the decision.
– Second, to develop a new method to quantify uncertainty that can be applied to different types of data and without restrictive constraints on the distribution or on exchangeability.
A third part covers the generalization of the proposed method to noisy and/or missing data.
The second part includes the study of theoretical aspects (proof of convergence, complexity issues) as well as practical aspects (independence from the chosen algorithm, architecture of the NN, implementation…). Finally, a validation criterion will be defined to attest the performance of the uncertainty measure.
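For readers unfamiliar with the exchangeability constraint mentioned above: split conformal prediction is one widely used uncertainty quantification method whose coverage guarantee relies on the calibration and test points being exchangeable. The sketch below is only a reminder of that baseline, not the method the thesis will develop; the function name and the toy numbers are hypothetical.

```python
import math

def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Prediction interval around new_pred with target coverage 1 - alpha.

    cal_true / cal_pred: true values and model predictions on a held-out
    calibration set, assumed exchangeable with the new point.
    """
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[rank - 1]
    return (new_pred - q, new_pred + q)

# Toy usage with three calibration residuals of size 1, 2 and 3.
interval = split_conformal_interval([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], 10.0, alpha=0.5)
unbounded = split_conformal_interval([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], 10.0, alpha=0.1)
```

When the data drift or are otherwise not exchangeable, the ranking argument behind this interval no longer holds, which is the kind of restriction the second goal proposes to relax.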

Candidate profile:
Master's degree in applied mathematics (or equivalent). Probability, statistics.
Good Python programming skills. Experience in machine learning/deep learning.

Required education and skills:
Master's degree in applied mathematics (or equivalent: engineering school diploma)

Work address:
FEMTO-ST
15B avenue des Montboucons
25030 Besançon cedex France

Attached document: 202304261402_ThesisOfferFEMTO.pdf

Categories: theses

Usage analysis, digital library, information retrieval

Jul 1 – Jul 2 all-day

Offer linked to Action/Network: none/none

Laboratory/Company: OpenEdition / LIS
Duration: 3 years
Contact: patrice.bellot@univ-amu.fr
Publication deadline: 2023-07-01

Context:
In the context of the European project COMMONS (https://www.thecommonsproject.org), OpenEdition (https://www.openedition.org) is recruiting two PhD students to design and deploy a usage observatory that will document the practices of users of OpenEdition's four platforms, as well as those of Huma-Num and Métopes.

Subject:
The two PhD students (one in computer science/digital humanities and the other in information and communication sciences/sociology) will collaborate on the design of a survey protocol to document the uses of open science platforms. A first survey campaign will provide not only critical and reflexive feedback for improving the protocol but also a first typology of users and practices.

The PhD will be hosted by OpenEdition, in Marseille, under joint supervision with:
– LIS (an AMU/CNRS joint research unit (UMR) attached to INS2I) and its doctoral school (ED 184) for the thesis in computer science;
– ELICO (Équipe de recherche de Lyon en sciences de l’Information et de la COmmunication) and its doctoral school (485 ED EPIC) for the thesis in information and communication sciences.

Your CV, a research proposal (max. 1 page + 10 references) and a cover letter must be submitted on the CNRS portal.
Please also provide the contact details of 2 referees, whom we will contact directly if needed.

Offer in information and communication sciences (SIC): https://emploi.cnrs.fr/Offres/Doctorant/UAR2504-SIMDUM-001/Default.aspx
Offer in computer science: https://emploi.cnrs.fr/Offres/Doctorant/UAR2504-SIMDUM-002/Default.aspx

Computer science contact: Patrice Bellot (AMU CNRS LIS) patrice.bellot@univ-amu.fr; Simon Dumas Primbault (CNRS OpenEdition) simon.dumas-primbault@openedition.org

Candidate profile:
Bac+5 in computer science (master's degree in computer science or equivalent)

Required education and skills:
Machine learning
Information retrieval
Python development

Work address:
Marseille

Categories: theses

Learning for the study of high-resolution electrophysiological activity

Jul 1 – Jul 2 all-day

Offer linked to Action/Network: none/none

Laboratory/Company: LaTIM & Lab-STICC
Duration: 36 months
Contact: francois.rousseau@imt-atlantique.fr
Publication deadline: 2023-07-01

Context:
Lab

Research at IMT Atlantique involves nearly 800 people, including 290 faculty members and researchers and 300 PhD students, and focuses on digital, energy and environmental technologies. It spans all disciplines (from the physical sciences to the humanities and social sciences, by way of information and knowledge sciences) and covers all areas of information and communication science and technology.

The thesis will take place at the LaTIM laboratory (INSERM U1101), on the Brest campus, in collaboration with Lab-STICC (Brest).

Start date: October 2023
Funding: European Union (CEREBRO project)

Sujet :
Description

Project description:
The EIC Pathfinder project CEREBRO (an electric Contrast medium for computationally intensive Electroencephalographies for high REsolution BRain imaging withOut skull trepanation) aims to develop a new imaging modality for the anatomy and electrophysiological activity of the brain, which is essential for many applications, including electromagnetic dosimetry, neurostimulation, brain-computer interfaces, and the diagnosis of diseases such as cancer, epilepsy, and Parkinson’s disease.

Brain activity can be imaged with an electroencephalograph (EEG), but the shielding effect of the skull limits the spatial resolution of the recordings. A common way to overcome this problem is to implant electrodes directly under the skull (ECoG) or on the cortex. The resulting imaging is of better quality, but it is only local.

CEREBRO will design a new imaging modality based on an electromagnetic contrast medium that circumvents the shielding effect of the skull, enabling high-spatial-resolution imaging of whole-brain activity while preserving the high temporal resolution of modalities that directly image electrophysiological activity.

The information that will be made accessible to the medical community has never been extracted before and is expected to enable major breakthroughs in neuroscience and patient care.

Subject description:
This thesis aims to extend static inverse source algorithms in neuroimaging to the high-frequency regime. These extensions will rely on replacing the static “forward problem” with a dynamic one (for which the solver will be developed specifically). Static currents are replaced by oscillating currents, and the potential is replaced by the time-harmonic electromagnetic field. This is clearly an unprecedented setting for neuroimaging, but stability at high frequency is strongly expected, since the mathematical problem of high-frequency neuroimaging can be viewed as a vector counterpart of acoustic source imaging in water, for which very efficient inverse source algorithms exist.

To this end, the work will consist of implementing high-frequency inverse source algorithms used, for example, in oceanography, and applying them to neuroimaging. Unlike the static case, which is mathematically ill-posed (for general source distributions), multifrequency inverse source problems are well-posed. Imaging in the presence of the micro-rods is therefore expected not only to compensate for the SNR differences between invasive and non-invasive EEG readings, but also to reduce the ill-posedness, which will further increase accuracy.

This thesis aims to contribute to methods for solving inverse problems with deep learning techniques. The goal is to set up a variational formulation, within a deep learning framework, for estimating the electrophysiological properties of brain tissue from data, so as to jointly learn the regularization term (prior) and the solver associated with the minimization problem.
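As a rough illustration only, a variational formulation of this kind is often written as below. The notation is an assumption and does not come from the project: y denotes the measured data, A the forward operator, x the unknown tissue properties, λ a weighting parameter, and R_θ a regularizer with learnable parameters θ.

```latex
\hat{x} \;=\; \arg\min_{x} \; \tfrac{1}{2}\,\lVert A(x) - y \rVert_2^2 \;+\; \lambda\, R_\theta(x)
```

In this reading, learning the prior amounts to fitting θ, and learning the solver amounts to training the iterative scheme that approximates the minimizer.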

Profil du candidat :
The skills required for this work are in machine learning, image processing, and applied mathematics. Knowledge of computer science and programming (Python) will also be required to develop the associated algorithms.

Formation et compétences requises :
Master’s degree (M2) or equivalent in machine learning, applied mathematics, or medical data processing

Adresse d’emploi :
IMT Atlantique, Campus Brest.

Document attaché : 202304151611_2023-Cerebro_french.pdf

Categories: theses

PhD offer within the MOCKUP project: Meteorological Observation ontologies and Contextual Knowledge for final User Policies

Jul 1 – Jul 2 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratoire/Entreprise : IRIT/CRNM
Durée : 3 years
Contact : cassia.trojahn@irit.fr
Date limite de publication : 2023-07-01

Contexte :
Sustainable development is deeply embedded in public policy. One consequence is the search for technical or scientific tools to achieve this objective, which is known as sustainability science. This new field of science is highly interdisciplinary, with interaction between the social and human sciences and the natural or formal sciences. In this context, interdisciplinary data science initiatives on steering land-use planning deserve to be developed. Such steering is indeed guided by data from a variety of domains (geography, economics, environment, etc.). The viewpoints on these data are contextual and evolving.

The MOCKUP project therefore focuses on observing and steering the territory, relying on a semantic representation of incoming data and of the viewpoints on these data. Its objectives are:
• learning viewpoints according to how the data are used, in particular environmental data;
• representing viewpoints in order to define dynamic ontologies that adapt to uses and context;
• contextual reasoning over the data for decision support.

Sujet :
In this project, our hypothesis is that the appropriation of data described by an ontology, called the reference ontology, requires taking users (and uses) into account, both the target audiences and the contexts of use, in the way the ontology is presented. To this end, we propose to define the notion of viewpoint, understood as a prism: a way of presenting the ontology in part or in full, in a manner suited to given users and a given context of use.

To adapt to the context of use, we want to give the ontology a dynamic, adaptive, and contextual character, which is neglected in state-of-the-art work on ontology construction. Although ontology evolution, which gives ontologies a minimum of “dynamic” character, has been the subject of much research in the Semantic Web community, the problem addressed here is different and new: the ontology would be stable and rich enough to be adapted through new instantiations, giving rise to different views of it depending on uses and contexts.

The work will involve reformulating, simplifying, or extracting subsets of the ontology and, where needed, devising adapted presentations, in a way that suits the contexts of use. These adaptive ontologies can thus serve as the foundation for reasoning that depends on the context of use.

Profil du candidat :
Master’s degree in computer science (preferably with honors), with training in knowledge representation and Semantic Web technologies. Programming skills and good writing skills, including in English.

Formation et compétences requises :
Master’s degree in computer science (preferably with honors), with training in knowledge representation and Semantic Web technologies. Programming skills and good writing skills, including in English.

Adresse d’emploi :
The PhD student will receive an interdisciplinary doctoral grant (ADI) co-funded by the Université de Toulouse and the Occitanie region (starting October 2023, for 3 years). The thesis will be co-supervised by Cassia Trojahn (IRIT, UT2J), Christophe Baehr (CRNM), and Nathalie Aussenac-Gilles (CNRS/IRIT).

Categories: theses

Fine-grained, multimodal speech anonymization

Jul 2 – Jul 3 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratoire/Entreprise : Inria Nancy & Lille
Durée : 36 months
Contact : emmanuel.vincent@inria.fr
Date limite de publication : 2023-07-02

Contexte :
This PhD is part of the “Personal data protection” project of PEPR Cybersécurité, which aims to advance privacy preservation technology for various application sectors. It will be co-supervised by Emmanuel Vincent and Marc Tommasi. The PhD student will have the opportunity to spend time in both the Multispeech and Magnet teams, to collaborate with 9 other research teams in France and with the French data protection authority CNIL, and to contribute to the project’s overall goals including the organization of an anonymization challenge.

Sujet :
Large-scale collection, storage, and processing of speech data poses severe privacy threats [1]. Indeed, speech encapsulates a wealth of personal data (e.g., age and gender, ethnic origin, personality traits, health and socio-economic status) which can be linked to the speaker’s identity via metadata or via automatic speaker recognition. Speech data may also be used for voice spoofing using voice cloning software. With firm backing from privacy legislation such as the European General Data Protection Regulation (GDPR), several initiatives are emerging to develop and evaluate privacy preservation solutions for speech technology. These include voice anonymization methods [2], which aim to conceal the speaker’s voice identity without degrading the utility for downstream tasks, and speaker re-identification attacks [3], which aim to assess the resulting privacy guarantees, e.g., in the scope of the VoicePrivacy challenge series [4].

The first objective of this PhD is to improve the privacy-utility tradeoff by better disentangling speaker identity from other attributes, and better decorrelating the underlying dimensions. Solutions may rely on suitable generative or self-supervised models [5, 6] or on adversarial learning [7]. The resulting privacy guarantees will be evaluated via stronger attackers, e.g., taking metadata into account.

The second objective is to extend the proposed audio-only approach to multimodal speech (audio, facial video, and gestures). Solutions will exploit existing facial anonymization technology [8]. A key difficulty will be to preserve the correlations between modalities, which are essential for training multimodal voice processing systems.

Depending on the PhD student’s skills, additional directions may also be explored, e.g., evaluating the proposed anonymization solutions in the context of federated learning.

[1] A. Nautsch, A. Jimenez, A. Treiber, J. Kolberg, C. Jasserand, E. Kindt, H. Delgado, M. Todisco, M. A. Hmani, M. A. Mtibaa, A. Abdelraheem, A. Abad, F. Teixeira, M. Gomez-Barrero, D. Petrovska, N. Chollet, G. Evans, T. Schneider, J.-F. Bonastre, B. Raj, I. Trancoso, and C. Busch, “Preserving privacy in speaker and speech characterisation,” Computer Speech and Language, vol. 58, pp. 441–480, 2019.

[2] B. M. L. Srivastava, M. Maouche, M. Sahidullah, E. Vincent, A. Bellet, M. Tommasi, N. Tomashenko, X. Wang, and J. Yamagishi, “Privacy and utility of x-vector based speaker anonymization,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, to appear.

[3] B. M. L. Srivastava, N. Vauquier, M. Sahidullah, A. Bellet, M. Tommasi, and E. Vincent, “Evaluating voice conversion-based privacy protection against informed attackers,” in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2802–2806, 2020.

[4] N. Tomashenko, X. Wang, E. Vincent, J. Patino, B. M. L. Srivastava, P.-G. Noé, A. Nautsch, N. Evans, J. Yamagishi, B. O’Brien, A. Chanclu, J.-F. Bonastre, M. Todisco, and M. Maouche, “The VoicePrivacy 2020 Challenge: Results and findings,” Computer Speech and Language, vol. 74, pp. 101362, 2022.

[5] L. Girin, S. Leglaive, X. Bie, J. Diard, T. Hueber, and X. Alameda-Pineda, “Dynamical variational autoencoders: A comprehensive review,” Now Foundations and Trends, 2021.

[6] A. Baevski, H. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech representations,” in Advances in Neural Information Processing Systems, pp. 12449–12460, 2020.

[7] B. M. L. Srivastava, A. Bellet, M. Tommasi, and E. Vincent, “Privacy-preserving adversarial representation learning in ASR: Reality or illusion?” in Interspeech, pp. 3700–3704, 2019.

[8] T. Ma, D. Li, W. Wang, and J. Dong, “CFA-Net: Controllable face anonymization network with identity representation manipulation,” arXiv preprint arXiv:2105.11137, 2021.

Profil du candidat :
MSc in computer science, machine learning, or signal processing.

Formation et compétences requises :
Strong programming skills in Python/PyTorch.
Prior experience in speech and video processing will be an asset.

Apply online at: https://jobs.inria.fr/public/classic/fr/offres/2023-06410

Adresse d’emploi :
615 Rue du Jardin-Botanique, 54600 Villers-lès-Nancy

Categories: theses

CIFRE PhD : Building a Large Patients Graph Data Lake for Primary Care Medicine

Jul 7 – Jul 8 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratoire/Entreprise : Télécom SudParis / Aldebaran
Durée : 36 months
Contact : Amel.Bouzeghoub@telecom-sudparis.eu
Date limite de publication : 2023-07-07

Contexte :
This thesis is a CIFRE and a collaboration between Telecom SudParis and Aldebaran. The position will start in October 2023.

Keywords: Knowledge Graphs, Graph Database, Medical Ontology, Medical History taking

Sujet :
The aim of this PhD thesis is to build a graph-oriented data lake to store all patient health data connected to a relevant medical ontology. Patient data are acquired through the Aldebaran (https://aldebaran.care/) digital medical history-taking device, which generates a medical report for general practitioners. The logic of medical history-taking is supported by a knowledge graph database comprising up to 600 questions, 3000 responses, and 6000 relationships. As of April 1st 2023, 2500 questionnaires and their medical reports have been generated. Of note, each response of a patient to a question generates lexical annotations for the medical reports and semantic annotations supported by Conceptual Graphs (CG). CGs are the basis of the hyper-contextuality of the medical history tracking device. As they support the semantics of the response, they will also populate the patient data graph. The main objectives of this thesis project are threefold: (i) to propose a model for patients’ reported health needs, characteristics, and continuum of care using knowledge graphs and including existing standards (OMOP, OHDSI, HL7 FHIR) with a view to future interoperability and data analysis; (ii) to provide a safe and efficient means to enrich the patient file with longitudinal data acquired over successive medical consultations, where managing temporality is a key challenge; (iii) to propose a method to ensure the contextualization and conciseness of the questionnaire according to the elements already known in the patient data graph.

Profil du candidat :
For this thesis, we will consider candidates with a Master’s or Engineer’s degree who have several of the following skills:

– Fluent in written and spoken English. Some knowledge of French can be useful.
– Skills in mathematics (statistics, graphs) and computer science (algorithms, machine learning, knowledge, data modeling, symbolic artificial intelligence, Natural Language Processing)
– Mastery of Python language, Cypher, (React), and experience in software development
– Adaptability and ability to invest in the field of medical application
– Experience in a research laboratory

Formation et compétences requises :
Master’s or engineer’s degree in Computer Science with an affinity for Machine Learning.

Adresse d’emploi :
The PhD student will be co-hosted by Telecom SudParis (Palaiseau site) and Aldebaran (Paris).

Applications should be submitted by email to Amel.Bouzeghoub@telecom-sudparis.eu, Julien.Romero@telecom-sudparis.eu, and christian@aldebaran.care

They must include the following:
– A Curriculum Vitae;
– Transcripts of records of undergraduate and graduate studies;
– Link to MSc thesis and publications if applicable;
– Link to personal software repositories;
– Names of 2 or 3 references to contact (position, email).

Categories: theses

Partial differential equation discovery for spatio-temporal simulations in cells

Jul 7 – Jul 8 all-day

Offre en lien avec l’Action/le Réseau : DSChem/– — –

Laboratoire/Entreprise : Inria Lyon / AIstroSight
Durée : 36 months
Contact : thomas.guyet@inria.fr
Date limite de publication : 2023-07-07

Contexte :
A funded PhD position is available in the AIstroSight team, INRIA, Lyon, France (https://team.inria.fr/aistrosight/), starting in November 2023. Our interdisciplinary team aims at developing innovative numerical methods for the search of new drug candidates to treat brain diseases, targeting neurons as well as glial cells. We value diversity, trust, growth, equity and creativity.

Sujet :
The goal of this PhD project is to develop a data-driven partial differential equation (PDE) discovery method for complex dynamical systems such as brain cells. The algorithm will be evaluated on its ability to robustly and accurately learn cell function at the macroscopic scale from data simulated at the nanoscopic level.

Profil du candidat :
We are looking for a student with a Master’s degree who has experience in at least one of the following areas: data science, machine learning, or mathematical modeling, as well as an interest in cell biology or neuroscience. Proficiency in written and oral English is required. No knowledge of French is needed. Most importantly, we are looking for future colleagues who are eager to learn and grow, and who are driven by scientific curiosity.

Formation et compétences requises :
We are looking for a student with a Master’s degree who has experience in at least one of the following areas: data science, machine learning, or mathematical modeling, as well as an interest in cell biology or neuroscience. Proficiency in written and oral English is required. No knowledge of French is needed. Most importantly, we are looking for future colleagues who are eager to learn and grow, and who are driven by scientific curiosity.

Adresse d’emploi :
Inria Lyon, Campus de la Doua and/or HCL

Document attaché : 202306041943_PDE_Discovery_AIstroSight_2023.pdf

Categories: theses

Deep learning methods for monitoring critical structures: toward quasi-real-time solutions for forward and inverse problems

Jul 21 – Jul 22 all-day

Offre en lien avec l’Action/le Réseau : – — –/Doctorants

Laboratoire/Entreprise : CEA List
Durée : 36 months
Contact : roberto.miorelli@cea.fr
Date limite de publication : 2023-07-21

Contexte :
In this thesis, we focus on the study and application of generative methods aimed at enhancing the quality of predictions of forward and inverse ML models (i.e., metamodels) for safety-critical problems in industry.

Sujet :
The use of artificial intelligence methods (i.e., learning methods based on deep neural networks) opens up very promising prospects in nondestructive testing (NDT) for industrial applications, for example in assisting diagnosis based on nondestructive measurements. Recently, deep learning techniques have shown encouraging first results in structural health monitoring [1], in temperature monitoring by infrared imaging in nuclear fusion facilities [2, 3], and in the generation of realistic data for ultrasound imaging, among others.

Nevertheless, despite several recent successes of machine learning techniques applied to NDT, obstacles remain before these techniques are efficient and robust enough to be used reliably and systematically in highly critical NDT. This thesis addresses two main issues that play a dominant role in the performance of learning models: (i) the lack of labeled data available for training neural networks, and (ii) the impact of uncertainties, whether aleatoric or epistemic. In this context, simulation makes it possible to overcome many of the current limitations linked to the lack of labeled field data by creating large synthetic training databases that can cover all types of scenarios, including cases not yet observed. The difficulty here, however, is to handle the uncertainties inherent in experimental measurement (calibration error, measurement noise, etc.) and the uncertainties of the model itself (i.e., the errors of both the physical model used for training and the learning model, the neural network).

This thesis aims to improve the quality of AI predictions for forward models (from the observable to the measurement) and for inverse models (from the measurement to the observable). First, particular attention will be paid to designing conditional generative deep learning tools (e.g., conditional variational autoencoders, conditional UNET-type architectures, etc.) that rely on multi-fidelity data, for generating realistic data and for analyzing and optimizing NDT inspection problems. Second, strong emphasis will be placed on deep learning schemes that support uncertainty estimation (e.g., ensemble methods, Monte Carlo dropout, etc.) for the learning task at hand (i.e., regression, classification, etc.).

In this thesis, the learning tools developed will be applied in two main areas of interest: monitoring the walls of nuclear fusion facilities by infrared imaging and inspecting industrial parts by ultrasound imaging.

———————————————————————————————-
English version:
———————————————————————————————-

Title: Enhancing the monitoring capabilities for safety critical problems via deep learning-based models: toward quasi real-time forward and inverse applications

Machine learning (ML), and in particular deep learning (DL), methods are gaining the attention of the engineering, scientific, and industrial communities for enhancing data analysis capabilities, supporting human decisions, etc. In the context of nondestructive testing and evaluation (NDT&E) and structural health monitoring (SHM), the use of ML methods is gaining the attention of scholars, researchers, and experts. Indeed, the possibility of developing and deploying tailored ML strategies for detecting, classifying, and possibly providing quantitative information on anomalies (i.e., defects) in inspected structures is one of the most active fields of investigation in the community. For instance, deep learning methods (i.e., deep convolutional neural networks) have recently been applied with success in SHM based on guided wave imaging data [1], in infrared thermography applied to tokamak plasma temperature monitoring [2, 3], and in the realistic simulation of ultrasound testing images under uncertainties.

Nevertheless, despite some recent successes in applying ML schemes to NDT&E problems, there is still room for improvement in several directions. Indeed, in NDT&E one faces two main challenges: (i) the chronic lack of properly labelled experimental data (e.g., due to security and secrecy issues) and (ii) the impact of uncertainties on the measurements (e.g., experimental conditions, knowledge of the actual material properties). In this context, advanced numerical models can be used to mitigate the impact of such issues by integrating simulation results into the ML model (so-called model-driven ML), possibly coupled with experimental data too.

In this thesis, we focus on the study and application of generative methods aimed at enhancing the quality of predictions of forward and inverse ML models (i.e., metamodels). First, particular emphasis will be placed on deep learning schemes aimed at enhancing the computational performance of advanced numerical solvers. To this end, conditional generative models (e.g., cVAE, cUNET, etc.) based on multi-fidelity data will be considered for fast and reliable generation of data for understanding, analyzing, and optimizing the NDT&E scenario considered. In a second, tightly related stage, particular focus will be placed on deep learning strategies (e.g., deep ensembles, Monte Carlo dropout, etc.) that perform forward and inverse tasks while providing an uncertainty estimate for the predictions.
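To make the idea of ensemble-based uncertainty estimation concrete, here is a minimal sketch. It assumes the ensemble members are already trained and can be called as plain functions. The three toy “models” below are hypothetical stand-ins, not the networks envisaged in the thesis. The spread of the individual predictions serves as a simple uncertainty estimate.

```python
from statistics import mean, pstdev

def ensemble_predict(models, x):
    """Return (mean prediction, spread) over an ensemble of callables.

    The spread (population standard deviation of the member predictions)
    is used here as a simple proxy for predictive uncertainty.
    """
    predictions = [model(x) for model in models]
    return mean(predictions), pstdev(predictions)

# Hypothetical stand-ins for independently trained models.
toy_models = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 1.0,
    lambda x: 2.0 * x - 1.0,
]

estimate, spread = ensemble_predict(toy_models, 3.0)
```

A large spread flags inputs on which the members disagree, which is where a prediction deserves less trust.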

In the context of this thesis, data from infrared thermography and ultrasound testing will be favored, with a specific emphasis on post-processed imaging data.

Ref.:

[1] Miorelli, Roberto, et al. “Defect sizing in guided wave imaging structural health monitoring using convolutional neural networks.” NDT & E International 122 (2021): 102480.

[2] Juven et al., “Temperature Estimation in Fusion Devices using Machine Learning techniques on Infrared Specular Synthetic Data,” 2022 IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), 2022, pp. 1-5, doi: 10.1109/IVMSP54334.2022.9816270.

[3] Aumeunier, M. H., et al. “Development of inverse methods for infrared thermography in fusion devices.” Nuclear Materials and Energy 33 (2022): 101231.

Contacts : Roberto Miorelli, Ph.D.

Université Paris-Saclay, CEA, List – Département Instrumentation Numérique

roberto.miorelli@cea.fr;

Profil du candidat :
Master’s level (M2) in physics, applied mathematics, or statistics

Formation et compétences requises :
Master’s level (M2) in physics, applied mathematics, or statistics

Adresse d’emploi :
CEA Saclay

Categories: theses

Frugal and compressed learning for underwater scene interpretation

Jul 30 – Jul 31 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratoire/Entreprise : L@bISEN Yncréa Ouest, Vision-AD team and Thales B
Durée : 3 years
Contact : ayoub.karine@yncrea.fr
Date limite de publication : 2023-07-30

Contexte :
The ocean covers most of the Earth’s surface (about 71%), with an area of 361 million square kilometers. This natural resource is therefore studied with the greatest attention in order to address crucial ecological and economic issues. Scientific research in this area can be split into two broad families: the study of the ocean surface and the study of the underwater environment. This thesis topic falls within the general context of interpreting the underwater environment. Direct interpretation of this environment by a human remains a risky, costly, and difficult task, given the time and costs associated with acquisition systems and underwater missions. Consequently, new methods are needed to define technological decision-making tools for automatic exploration of the underwater environment.

Sujet :
Data on the underwater environment can be acquired with observation systems such as autonomous underwater vehicles (AUVs). These systems are equipped with vision sensors that allow them to record optical underwater videos. Automatic analysis of these video streams makes it possible to interpret the underwater space in finer detail for mapping missions, habitat studies, monitoring of underwater structures, or the search for submerged objects. In this thesis, we are interested in semantic segmentation, which aims to assign a class to each pixel in the videos. The resulting segmentation map thus represents a fine-grained classification of the different areas (substrates and objects) of the observed scene. In underwater vision, this task draws on artificial intelligence methods that have proven superior for interpreting the in-air environment, such as deep neural networks [1, 2]. Since a video is a succession of images (frames), the most direct solution for its semantic segmentation is to apply a prediction model to each of its images, using a CNN (Convolutional Neural Network) for example [3]. However, this naive solution does not perform well for segmentation, mainly because it ignores the temporal relationship between the images of the video. Unlike with static images, temporal information is of great importance in video processing: it makes it possible to model how the observed scene evolves over time. Consequently, recent work on this problem tries to predict the classes associated with the pixels of an image at time “t” using the classes assigned to the pixels of previous images (“t-1”, “t-2”, etc.).
To do so, several approaches, after semantically segmenting each image, add an aggregation module (optical flow, tracking, etc.) followed by a sequential neural network (RNN, LSTM, Transformer, etc.). Other families of methods use only a few images of the video (called keyframes) and propagate the feature maps to the other images through optical flow. Nevertheless, adapting these methods to underwater vision has shown limits in terms of robustness and also in terms of computation time. These two limits are mainly linked, respectively, to two factors: (1) the unavailability of a large labeled dataset of underwater videos, owing to the high cost of acquisition missions (expensive systems and time-consuming manual annotation of videos by experts); (2) the large number of operations and parameters used in neural approaches for underwater video segmentation. To address these two limits, this thesis topic aims to propose compressed neural architectures capable of learning from a very small volume of videos while offering high segmentation performance.

Regarding the first limit, frugal learning (Few-Shot Learning) [4, 5] will be investigated. This approach can learn from a limited number of labeled training data. It does so by accumulating prior knowledge extracted from other, larger databases (called base data). This step corresponds to representation learning. Finally, the small set of training data (called the support set) is used to build the decision function.
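As an illustration of how a small support set can define a decision function, the sketch below applies a nearest-prototype rule in the spirit of prototypical networks [4]. It assumes an embedding has already been learned on the base data. The two-dimensional vectors and the class names “sand” and “rock” are made-up toy values, not data from the project.

```python
def class_prototypes(support):
    """Average the support embeddings of each class into one prototype.

    support: dict mapping a class label to a list of embedding vectors.
    """
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)
        ]
    return prototypes

def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))

# Toy support set: two labeled embeddings per class (hypothetical values).
toy_support = {
    "sand": [[0.0, 0.0], [0.0, 2.0]],
    "rock": [[10.0, 10.0], [12.0, 10.0]],
}
toy_prototypes = class_prototypes(toy_support)
```

For per-pixel segmentation the same rule would be applied to each pixel’s feature vector. That extension is left out here for brevity.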

As for the second limit, knowledge distillation [6, 7] could be a promising approach. Its principle is to transfer the knowledge of a large neural network that performs well on a specific task, called the teacher, to a smaller network, called the student. The objective is for the student network to mimic the teacher network. In other words, the training of the student network is supervised by the teacher network. It is thus possible to obtain a neural network that performs well in semantic segmentation with a reduced size, easily embedded in autonomous underwater drones thanks to its fast inference and reduced energy consumption. The resulting method could then be used to segment the seabed accurately and in real time according to the substrates present. Finally, dynamic 3D data, such as point clouds, could be used to strengthen the semantic segmentation and produce even more accurate surveys for seabed mapping.
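One common way to let the teacher supervise the student is the classic soft-target loss: both networks’ class scores are softened with a temperature, and the student is penalized by the Kullback-Leibler divergence from the teacher’s distribution. The sketch below computes that loss for a single pixel. It is a generic illustration, not necessarily the structured variants of [6, 7], and the function names and the default temperature are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    The temperature**2 factor is the usual rescaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return temperature ** 2 * kl
```

In practice this term is typically averaged over all pixels and combined with the ordinary supervised loss on the labeled data.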

*** References:
[1] Islam, M. J., Edge, C., Xiao, Y., Luo, P., Mehtaz, M., Morse, C., & Sattar, J. “Semantic segmentation of underwater imagery : Dataset and benchmark”, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020.
[2] Chicchon, M., & Bedon, H. “Semantic Segmentation of Underwater Environments Using DeepLabv3+ and Transfer Learning”, Smart Trends in Computing and Communications (pp. 301-309). Springer, Singapore, 2022.
[3] T. Zhou, F. Porikli, D. J. Crandall, L. V. Gool and W. Wang, “A Survey on Deep Learning Technique for Video Segmentation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
[4] Snell, J., Swersky, K., & Zemel, R. “Prototypical networks for few-shot learning”, Advances in neural information processing systems, 30, 2017.
[5] Wang, K., Liew, J. H., Zou, Y., Zhou, D., & Feng, J. “PANet: Few-shot image semantic segmentation with prototype alignment”, Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 9197-9206), 2019.
[6] Liu, Y., Shu, C., Wang, J., & Shen, C. “Structured knowledge distillation for dense prediction”, IEEE transactions on pattern analysis and machine intelligence, 2020.
[7] Karine, A., Napoléon, T., Jridi, M., “Semantic Images Segmentation for autonomous driving using Self-Attention Knowledge Distillation”, 16th IEEE International Conference

Profil du candidat :
• Experience in artificial intelligence and computer vision
• Knowledge of deep learning applied to computer vision
• Interest in underwater vision

Formation et compétences requises :
Ideally, the candidate should have:
• completed a Master's or engineering degree in one of the following fields: artificial intelligence, computer vision, data science, applied mathematics;
• strong skills in algorithms and programming: Python, PyTorch, TensorFlow, Keras…

Adresse d’emploi :
20 Rue Cuirassé Bretagne, 29200 Brest

Document attaché : 202305041329_TheseVisionSousMarine_LabISEN-Thales.pdf

Categories: theses

Sea state estimation from video

Jul 30 – Jul 31 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratoire/Entreprise : Laboratoire Informatique, Image et Interaction (L3
Durée : 3 years
Contact : sylvain.marchand@univ-lr.fr
Date limite de publication : 2023-07-30

Contexte :
La Rochelle Université is acquiring a platform intended to eventually become an autonomous sailboat, able to sense its environment with sensors and make the right decisions.

Sujet :
When sailing, it is essential to be able to estimate, or even anticipate, the sea state, the direction and period of the swell, the larger waves, and the sudden variations in wind (gusts) that can be observed on the sea surface. This is usually the job of a crew member known as the "number 1". In single-handed sailing, this crew member is no longer available. Yet this information is essential for anticipating and for gaining in performance or safety, and could be taken into account by next-generation navigation units ("autopilots") equipped with artificial intelligence. Another application is the observation of waves along the coast, for example to measure or even anticipate coastal erosion. There too, this is currently done mainly through human observation. This case is simpler because the camera is fixed (which is rarely the case on a sailboat…). We propose to design a method to automatically estimate this information (sea state, swell, and possibly wind variations) from image sequences (video), possibly coming from a single sensor (monocular case, but perhaps with 360-degree vision), under energy-efficiency constraints. A first lead is the design of a new mathematical transform combining two existing transforms (Fourier and Hough), an approach introduced in the audio domain, in order to estimate the direction and the period between wavefronts. This transform will also have to be extended to the temporal dimension (for image sequences). The work will involve acquiring data and designing the estimation method(s) as well as the final hardware device ("smart sensor").
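To make the estimation target concrete, the sketch below recovers the wavelength and orientation of a single dominant plane wave in one image from the peak of its 2-D Fourier spectrum. It covers only the Fourier ingredient; the combined Fourier/Hough transform mentioned above is not reproduced here. The function name dominant_wave, the synthetic 64 x 64 test image, and the unit pixel size are assumptions made for the illustration.

```python
import numpy as np

def dominant_wave(image, pixel_size=1.0):
    """Return (wavelength, direction_deg) of the strongest plane wave.

    The direction is that of the wave vector, folded into [0, 180):
    a single frame cannot tell which way the wave is travelling.
    """
    centered = image - image.mean()          # remove the DC component
    spectrum = np.abs(np.fft.fft2(centered))
    fy = np.fft.fftfreq(image.shape[0], d=pixel_size)  # cycles per unit, rows
    fx = np.fft.fftfreq(image.shape[1], d=pixel_size)  # cycles per unit, cols
    iy, ix = np.unravel_index(np.argmax(spectrum), spectrum.shape)
    kx, ky = fx[ix], fy[iy]
    wavelength = 1.0 / np.hypot(kx, ky)
    direction = np.degrees(np.arctan2(ky, kx)) % 180.0
    return wavelength, direction

# Synthetic swell: 4 cycles along x and 3 along y over a 64 x 64 image.
y, x = np.mgrid[0:64, 0:64]
image = np.cos(2 * np.pi * (4 * x + 3 * y) / 64)
wavelength, direction = dominant_wave(image)
print(wavelength, direction)
```

Extending the analysis to an image sequence, as the subject proposes, adds the temporal frequency, from which the wave period and the travel direction could be resolved.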

Profil du candidat :
Master's degree in Computer Science or equivalent.

Formation et compétences requises :
Master's degree in Computer Science,
image / video analysis, signal and image processing;
knowledge of sailing is a plus.

Adresse d’emploi :
Laboratoire Informatique, Image et Interaction (L3i), La Rochelle Université.

Document attaché : 202305040957_Sujet_MARCHAND.pdf

Categories: theses

Learning temporally-consistent 3D mesh models of growing plants

Jul 30 – Jul 31 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratoire/Entreprise : ICube, Université de Strasbourg, CNRS
Durée : 3 years
Contact : remi.allegre@unistra.fr
Date limite de publication : 2023-07-30

Contexte :
The aim of this Doctoral thesis is to develop an approach to reconstruct 3D+t (i.e. temporally-consistent) mesh models of growing plants suitable for accurate measurements at fine scales.

Deadline for application: May 23, 2023

Sujet :
Host team: IGG (Computer Graphics and Geometry Group), ICube laboratory

Advisor: Franck Hétroy-Wheeler, Professor in Computer Science (hetroywheeler AT unistra.fr)

Co-advisor: Rémi Allègre, Associate Professor in Computer Science (remi.allegre AT unistra.fr)

Starting date: October 2023

Keywords: Computer Vision, Computer Graphics, Image Processing, Data Science

Description: This doctoral thesis position is proposed in the context of a research project with biophysicists from the University Paris Diderot and ENS Lyon. This project aims at modeling plant growth movements during leaf development and understanding the underlying physical and biological mechanisms at play. In this context, measurements of both plant movements and magnitude of local growth are required. This is currently achieved with the help of photogrammetry only at a coarse scale, considering small sets of markers painted on the leaves. A key challenge of this project is to develop an approach to reconstruct 3D+t (i.e. temporally-consistent) mesh models of growing plants suitable for accurate measurements at fine scales, which involves both high-resolution reconstruction and point-to-point correspondence issues. The goal of this thesis is to address this challenge following a three-part approach: 1) the estimation of optical and scene flows from photographs for fine-scale correspondences between time steps, 2) the combination of different acquisition modalities (photogrammetry, laser scanning and structured light scanning) for high-resolution 3D reconstruction, and 3) the definition of either fine-scale statistical geometric templates for leaves or a neural network architecture for shape interpolation. The developed models and methods will rely on recent machine learning techniques. Several datasets of photographs and 3D reconstructions of growing plants will be provided.

A detailed version of the proposal including bibliography is available at the following address:
https://seafile.unistra.fr/f/93cc5483d1514e3a9b0c/

Profil du candidat :
Desired skills:
– Computer Vision, and/or Computer Graphics or Image Processing, or Data Science
– Basic skills in machine and deep learning

Formation et compétences requises :
M2 or Engineering School degree in Computer Science

Adresse d’emploi :
Illkirch (Strasbourg area)

Categories: theses

Neuro-symbolic approach in photogrammetry for the monitoring of civil engineering structures: application to the detection and monitoring of concrete cracks.

Aug 31 – Sep 1 all-day

Offre en lien avec l’Action/le Réseau : – — –/Doctorants

Laboratoire/Entreprise : UMR 7020 Laboratoire d’Informatique & Systèmes (L
Durée : 36 months
Contact : julien.seinturier@univ-tln.fr
Date limite de publication : 2023-08-31

Contexte :
The surveillance and monitoring of civil engineering structures is a current concern in the building and public works sector (BTP). Among the structures most at risk, concrete structures such as bridges or dams are monitored by studying the cracks that may appear on their surface. Current methods for detecting and classifying concrete cracks follow three main approaches.
Expert assessment consists in having a concrete expert analyze photographs of cracks, or the cracks themselves in situ. From the data collected, the expert draws a conclusion in order to plan the necessary interventions. Manual inspection raises many constraints, such as the number of structures to monitor, which far exceeds the inspection capacity of the experts, or access to the sites themselves.
Methods based on logical inference rely on formalizing the expert knowledge involved in crack detection and classification. This knowledge can be integrated into expert systems [1] or ontologies [2], [3]. The underlying inference engines then classify cracks from the collected data. Such methods address the problem of expert availability and provide explainable results (by the very nature of logical inference). However, they are subject to the constraints of logical inference and its algorithmic complexity, which is not suited to large volumes of data.
Machine learning, in particular image-based learning, makes it possible to detect cracks from simple images. Many solutions have been proposed based on standard machine learning [4] or deep learning [5], [6]. These solutions are fast, do not require the acquisition of complex data, and achieve good detection and classification rates. The limitations of machine learning nevertheless remain, such as justifying a detection / classification or predicting how a crack will evolve.

Sujet :
The thesis aims to develop a neuro-symbolic approach to the surveillance and monitoring of civil engineering structures. The choice of automatic methods and of a framework for formalizing expert knowledge is therefore critical. Beyond simple crack detection, the approach must also be able to justify its choices and explain them to the experts (who remain the final decision-makers). A secondary challenge is to evaluate the best way of integrating the neural and symbolic approaches (for example, using the neural approach as a provider of facts for symbolic inference, or using symbolic knowledge to parameterize the neural network).

The work aims to propose an approach that automates the detection and classification of concrete cracks by integrating machine learning approaches [4]–[6] and approaches based on knowledge representation and inference [3] within a single unified framework. The construction of this framework will build on the latest research in neuro-symbolic artificial intelligence [9], [10].
First, a state of the art will be carried out on object detection and classification by machine learning, as well as on the representation of expert knowledge and inference over it. Various methods from these approaches will be implemented and tested in order to select those best suited to integration in a unified framework.
Second, the unified framework will be implemented using ontologies. The chosen formalism will rely on the Web Ontology Language in its description logic variant (OWL-DL) [11] for knowledge formalization and on the Semantic Web Rule Language (SWRL) [12] for its inference capabilities. Basing the framework on OWL will also make it possible to integrate ontological representations of time [13] and space [14], [15], which are fundamental notions in object localization and tracking.
Once methods for automatic crack detection by machine learning have been selected and the expert knowledge has been formalized, a neuro-symbolic approach for crack detection and classification will be proposed based on the state of the art [9], [10]. A first lead in this context is to populate the ontology from the results of automatic detection / classification and then to perform classical inference or querying, for example via SPARQL.
After the detection / classification pipeline based on a neuro-symbolic approach has been set up, the work will be evaluated using real data acquired by the HYDRO Series photogrammetry system deployed on identified sites.
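The first integration lead described above (neural detections used as facts for symbolic inference and querying) can be sketched in a few lines. This is a toy stand-in, assuming plain Python tuples instead of an OWL ontology, a hand-written rule instead of SWRL, and a filter function instead of SPARQL. The detector output format, the predicate names, and the 1 mm width threshold are all hypothetical and carry no expert meaning.

```python
# Hypothetical detector output: one dict per detected crack.
detections = [
    {"id": "crack1", "width_mm": 0.2, "score": 0.93},
    {"id": "crack2", "width_mm": 1.4, "score": 0.88},
    {"id": "crack3", "width_mm": 2.0, "score": 0.35},
]

def to_facts(detections, min_score=0.5):
    """Turn confident detections into (subject, predicate, object) triples."""
    facts = set()
    for det in detections:
        if det["score"] < min_score:
            continue  # low-confidence detections are not asserted as facts
        facts.add((det["id"], "type", "Crack"))
        facts.add((det["id"], "widthMm", det["width_mm"]))
    return facts

def infer(facts, wide_threshold_mm=1.0):
    """Apply one rule: Crack(x) and widthMm(x, w) and w > t => WideCrack(x)."""
    cracks = {s for s, p, o in facts if p == "type" and o == "Crack"}
    inferred = set(facts)
    for s, p, o in facts:
        if p == "widthMm" and s in cracks and o > wide_threshold_mm:
            inferred.add((s, "type", "WideCrack"))
    return inferred

def query(facts, predicate, obj):
    """Return the sorted subjects matching (?, predicate, obj)."""
    return sorted(s for s, p, o in facts if p == predicate and o == obj)

knowledge = infer(to_facts(detections))
print(query(knowledge, "type", "WideCrack"))  # ['crack2']
```

Because each inferred fact traces back to an explicit rule and to the detections that triggered it, this kind of pipeline keeps the classification step explainable to the expert.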

[1] P. Dalmagioni, M. Lazzari, R. Pellegrini, S. Paolo, and M. Emborg, “An Expert System For Managing Early Age Concrete Crack Prediction”, in Proceedings of the 9th International Workshop of the European Group for Intelligent Computing in Engineering, Darmstadt, Germany, Aug. 2002.
[2] W. T. Chen and T. A. Bria, “A Review of Ontology-Based Safety Management in Construction”, Sustainability, vol. 15, no. 1, 2023, doi: 10.3390/su15010413.
[3] S. Jung, S. Lee, and J. Yu, “Ontological Approach for Automatic Inference of Concrete Crack Cause”, Appl. Sci., vol. 11, no. 1, Dec. 2020, doi: 10.3390/app11010252.
[4] H. Kim, E. Ahn, M. Shin, and S.-H. Sim, “Crack and Noncrack Classification from Concrete Surface Images Using Machine Learning”, Struct. Health Monit., vol. 18, no. 3, pp. 725–738, Apr. 2018, doi: 10.1177/14759217187687.
[5] P. Tupe-Waghmare and R. R. Joshi, “A Scoping Review of Classification of Concrete Cracks using Deep Convolution Learning Approach”, Libr. Philos. Pract., vol. 5127, no. 1, Feb. 2021. [Online]. Available: https://digitalcommons.unl.edu/libphilprac/5127/
[6] W. R. L. da Silva and D. S. de Lucena, “Concrete Cracks Detection Based on Deep Learning Image Classification”, in Proceedings of the 18th International Conference on Experimental Mechanics ICEM18, Brussels, Belgium, Jan. 2018, vol. 2, no. 8, doi: 10.3390/ICEM18-05387.
[7] F. Menna et al., “Evaluation of vision-based localization and mapping techniques in a subsea metrology scenario”, Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci., vol. XLII-2/W10, pp. 127–134, May 2019, doi: 10.5194/isprs-archives-XLII-2-W10-127-2019.
[8] F. Menna et al., “Towards real-time underwater photogrammetry for subsea metrology applications”, in Proceedings of OCEANS 2019 conference, Marseilles, France, June 2019, pp. 1–10, doi: 10.1109/OCEANSE.2019.8867285.
[9] P. Hitzler and S. Kamruzzaman, Neuro-Symbolic Artificial Intelligence: The State of the Art, Joost Breuker, Nicola Guarino., vol. 342. Nieuwe Hemweg 6B 1013 BG, Amsterdam, Netherlands: IOS Press, 2022. [Online]. Available: https://ebooks.iospress.nl/ISBN/978-1-64368-245-7
[10] A. d’Avila Garcez et al., “Neural-Symbolic Learning and Reasoning: Contributions and Challenges”, in Proceedings of the 2015 AAAI Spring Symposium Series, Stanford University, USA, Mar. 2015. [Online]. Available: https://www.aaai.org/ocs/index.php/SSS/SSS15/paper/view/10281
[11] OWL Working Group, “OWL 2 Web Ontology Language Document Overview”, W3C Recommendation, Dec. 2012. [Online]. Available: https://www.w3.org/TR/owl2-overview/
[12] I. Horrocks, P. F. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean, “SWRL: A Semantic Web Rule Language Combining OWL and RuleML”, W3C Submission, May 2004. [Online]. Available: https://www.w3.org/Submission/SWRL/
[13] S. Cox and C. Little, “Time Ontology in OWL”, W3C Recommendation, Nov. 2022. [Online]. Available: https://www.w3.org/TR/owl-time/
[14] S. Marc-Zwecker, F. de Bertrand de Beuvron, C. Zanni-Merk, and F. Le Ber, “Qualitative spatial reasoning in RCC8 with OWL and SWRL”, in Proceedings of the 5th International Conference on Knowledge Engineering and Ontology Development KEOD2013, Vilamoura, Algarve, Portugal, Sept. 2013, pp. 214–221, doi: 10.5220/0004543702140221.
[15] Y. Wang, Q. Mengling, H. Liu, and X. Ye, “Qualitative spatial reasoning on topological relations by combining the semantic web and constraint satisfaction”, Geo-Spat. Inf. Sci., vol. 21, no. 2, pp. 80–92, Feb. 2017, doi: 10.1080/10095020.2018.1430659.
[16] Md. Safiuddin, A. B. M. A. Kaish, C.-O. Woon, and S. N. Raman, “Early-Age Cracking in Concrete: Causes, Consequences, Remedial Measures, and Recommendations”, Appl. Sci., vol. 8, no. 10, Sept. 2018, doi: 10.3390/app8101730.
[17] C. J. Larosche, “Types and causes of cracking in concrete structures”, Failure, distress and repair of concrete structures. Woodhead Publishing Limited, pp. 57–83, 2009, doi: 10.1533/9781845697037.1.57.
[18] J. F. Allen, “Maintaining knowledge about temporal intervals”, Commun. ACM, no. 11, pp. 832–843, Nov. 1983, doi: 10.1145/182.358434.
[19] J. F. Allen, “An Interval-Based Representation of Temporal Knowledge”, in Proceedings of the 7th international joint conference on Artificial intelligence IJCAI’81, Vancouver, BC, Canada, Aug. 1981, vol. 1, pp. 221–226. [Online]. Available: https://www.ijcai.org/Proceedings/81-1/Papers/045.pdf
[20] D. A. Randell, Z. Cui, and A. G. Cohn, “A spatial logic based on regions and connection”, in Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning, Cambridge, Oct. 1992, pp. 165–176.

Profil du candidat :
The candidate must have a background in computer science with solid formal foundations, notably in knowledge representation (ontologies, OWL) and machine learning (deep learning). Knowledge of information systems and computer vision will be a plus for integrating into the project and interacting with the partners.

As the thesis is funded under a CIFRE agreement, an engineering profile and an affinity for the corporate world may be an advantage.

Formation et compétences requises :
Five-year higher-education degree (Bac+5: Master 2 / engineering degree) in computer science, with course units related to artificial intelligence and information systems.

Adresse d’emploi :
The position will be split between the company IVM Technologies, located in Marseille (9th arrondissement), for 70% of the time, and the Université de Toulon (30% of the time). The distribution between workplaces may be adjusted in agreement with the candidate.

Document attaché : 202305101530_2023-THESE-IVM-LIS-UTLN-COURT.pdf

Categories: theses

Combining Knowledge graph embedding and prior knowledge based semi-supervised learning for ontology learning from large scale data.

Aug 31 – Sep 1 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratoire/Entreprise : DUKe, LS2N (Laboratory of Digital Sciences of Nan
Durée : 3 years
Contact : Fabrice.Guillet@univ-nantes.fr
Date limite de publication : 2023-08-31

Contexte :
PhD Description

Background. The popularity of ontologies and the easy access to a large number of textual resources have strongly motivated the automatic construction of ontologies using artificial intelligence techniques. Three types of construction approaches are distinguished: distributional approaches, knowledge graph-based approaches and pattern-based approaches [Xu et al., 2019, Chen et al., 2020]. In this thesis, we will focus on distributional approaches and more specifically on clustering and graph-based approaches. Generally, clustering makes it possible to handle a large amount of data. However, it faces two main difficulties: labelling the clusters and forming semantically consistent clusters relevant to the ontology domain. In our previous work, we developed a prior knowledge-driven LDA to tackle these two difficulties [Huang et al., 2021, Xu et al., 2020]. However, clustering-based approaches also suffer from the sparsity of the term representation space [Shwartz et al., 2016]. Graph-based approaches extract triples (subject, predicate, object) from texts, then align and link them to form knowledge graphs (e.g. Yago, DBpedia). They make it possible to process a large number of texts and build very large graphs, but they suffer from data heterogeneity, because the same concept can be denoted by different terms in distinct triples and the same term can have several meanings [Nguyen and Ichise, 2012], [Kertkeidkachorn and Ichise, 2018].

Sujet :
Title: Combining Knowledge graph embedding and prior knowledge based semi-supervised learning for ontology learning from large scale data.

Keywords: Ontology learning, Knowledge Graph Completion, Prior Knowledge, Clustering, Relation Prediction, Knowledge Graph Embedding, Graph Neural Network.

Laboratory: DUKe, LS2N (Laboratory of Digital Sciences of Nantes, France) and a collaboration with NII & AIST (Tokyo, Japan)

Supervisors: Mounira Harzallah and Fabrice Guillet

CNRS financial support: 2135 € (gross salary)/month and a NII financial support for the Japan internship.

Start date: 1st of October

Duration: 3 years

Requirements:

-Education Level: MSc

-Field: Computer Science, Data Science, Web Science, Computational Linguistics, Artificial Intelligence

-Candidate Profile: Knowledge of data mining/machine learning; knowledge of the Semantic Web and NLP will be strongly appreciated but is not mandatory; knowledge of programming languages, mainly Python.

-Language: English

Applications will be evaluated continuously until the position is filled. Interested candidates should submit: CV, cover letter, transcripts of records for the last three years, and names and addresses of two references. Applications should be submitted to mounira.harzallah@univ-nantes.fr and fabrice.guillet@univ-nantes.fr

PhD purpose.

The purpose of this thesis is to develop a new approach for automatic ontology construction combining semi-supervised clustering methods driven by prior knowledge (seed knowledge, local knowledge, domain knowledge, DBpedia, etc.) [Jagarlamudi et al., 2012, Xu et al., 2019, Huang et al., 2021] and knowledge graph embedding [Ebisu and Ichise, 2018]. This new approach aims to address the scientific challenges of data heterogeneity and data sparsity. By describing cluster terms through subgraphs and their vector embeddings, the problem of text sparsity can be addressed and the quality of clusters can be improved. In recent years, graph embedding has grown rapidly [Zhang et al., 2020]. It aims to automatically learn a low-dimensional feature representation for each node in a graph. Graph embeddings are used to build machine learning models for various tasks, and our goal is to exploit them to improve ontology learning. The approach to be developed in this thesis will also infer hypernym relationships between terms within each cluster. The objective of this task is threefold: 1) to evaluate the quality of the clusters, 2) to refine their description space in an iterative clustering/extraction of hypernym relations/clustering approach, and 3) to evaluate and improve the quality of the exploited knowledge graphs from which term subgraphs are extracted.
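To illustrate how seed knowledge can both drive and label clusters of embedded terms, here is a minimal sketch using seeded k-means. This is a swapped-in, simpler technique chosen for brevity; it is not the prior knowledge-driven LDA or the embedding model cited above. The 2-D vectors stand in for learned term or node embeddings, and the terms and cluster labels ("Crop", "Stress") are hypothetical.

```python
import math

def seeded_kmeans(vectors, seeds, iterations=10):
    """Cluster term vectors with one centroid initialised per labelled seed.

    vectors: {term: [float, ...]}; seeds: {cluster_label: seed_term}.
    Returns {term: cluster_label}.
    """
    centroids = {label: list(vectors[term]) for label, term in seeds.items()}
    assignment = {}
    for _ in range(iterations):
        # Assign every term to its nearest centroid.
        assignment = {
            term: min(centroids, key=lambda c: math.dist(vec, centroids[c]))
            for term, vec in vectors.items()
        }
        # Prior knowledge: a seed term always stays in its own cluster.
        for label, term in seeds.items():
            assignment[term] = label
        # Recompute each centroid as the mean of its members.
        for label in centroids:
            members = [vectors[t] for t, c in assignment.items() if c == label]
            centroids[label] = [sum(d) / len(members) for d in zip(*members)]
    return assignment

# Toy embeddings (assumption: produced by some graph embedding model).
vectors = {
    "rice": [1.0, 0.1], "wheat": [0.9, 0.2], "maize": [1.1, 0.0],
    "drought": [0.0, 1.0], "salinity": [0.1, 0.9], "heat": [0.2, 1.1],
}
seeds = {"Crop": "rice", "Stress": "drought"}
clusters = seeded_kmeans(vectors, seeds)
print(clusters)
```

Since every cluster inherits the label of its seed, the cluster labelling difficulty is handled by construction, and the seeds bias the clusters toward concepts relevant to the target domain.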

The positioning and significance of this research

Since ontologies are crucial for AI applications, many research studies address ontology learning. However, they investigate the sparsity and heterogeneity problems separately. The first originality of our research is to combine knowledge graph representation and prior-knowledge-driven clustering to solve the sparsity and heterogeneity problems simultaneously. Knowledge graphs and graph embedding deal with the sparsity problem, and prior-knowledge-driven clustering deals with the heterogeneity problem. The second originality of our research is to semantically enrich the graph embedding by integrating prior knowledge from the core ontology into the embedding process. Focusing on improving the embedding process itself, Sun et al. [2020] show that embedding-based approaches perform well when training is performed on the text corpus from which the graph is constructed. However, when this corpus is unavailable or small, the graph embedding will be based exclusively on the graph structure, which weakens the performance of these approaches. In this case, in order to semantically enrich the graph embedding input, considering the semantics of certain entities or properties of the graphs could be relevant. This enrichment could be done using a domain ontology or its core ontology.

Therefore, we would like to develop an original approach benefiting on the one hand from the power of graph embedding techniques for the clustering of entities, and on the other hand from the semantic quality of ontology in order to drive and refine the learning. A core ontology will be used as a seed knowledge model to improve the quality of graph embedding as well as for clustering.

Profil du candidat :
Requirements:

-Education Level: MSc

-Field: Computer Science, Data Science, Web Science, Computational Linguistics, Artificial Intelligence

-Candidate Profile: Knowledge of data mining/machine learning; knowledge of the Semantic Web and NLP will be strongly appreciated but is not mandatory; knowledge of programming languages, mainly Python.

-Language: English

Formation et compétences requises :
MSc in computer science with a good ranking

Adresse d’emploi :
Laboratory: DUKe, LS2N (Laboratory of Digital Sciences of Nantes, France) and a collaboration with NII & AIST (Tokyo, Japan)

Categories: theses

Uplift estimation in offer recommendation systems

Aug 31 – Sep 1 all-day

Offre en lien avec l’Action/le Réseau : – — –/– — –

Laboratoire/Entreprise : Innovation Orange (Lannion) and GREYC CNRS UMR 6072
Durée : 3 years
Contact : bruno.cremilleux@unicaen.fr
Date limite de publication : 2023-08-31

Contexte :
To apply:
submit your application at
https://orange.jobs/jobs/v3/offers/124860?lang=fr

Offer recommendation systems such as NBO (Next Best Offer) are increasingly common in companies like Orange that seek to improve their relationship with the users of their services. Customers or visitors are offered a personalized action based on their profiles and preferences. However, personalized treatment that recommends an offer on these criteria alone is not always enough to satisfy a customer. It is therefore important for companies to measure the uplift [1], that is, the difference in revenue or satisfaction between the choices the customer would have made without a recommendation and those made with one. The challenge for offer recommendation systems is thus to find algorithms that measure uplift and estimate effective recommendation policies. The choice of the uplift measure and of the model for the system's policy is key to maximizing the impact of actions. An intrinsic difficulty of uplift is that the same individual cannot both receive and not receive a treatment. Consequently, uplift cannot be measured directly for an individual but only for a group of individuals, which is known as the CATE (Conditional Average Treatment Effect). However, the CATE becomes difficult to estimate in a system where profiles change depending on the treatment to be applied [4]. Moreover, biases between the data collected under different treatments bias the CATE estimate. The literature proposes different approaches to this problem: some aim to debias the data and use a robust estimator [2], while others directly use causal approaches [5].
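The group-level nature of the CATE can be illustrated with the simplest possible estimator: within each customer segment, subtract the mean outcome of the non-recommended (control) customers from that of the recommended (treated) customers. This minimal sketch assumes the treatment was assigned at random, so it ignores exactly the biases discussed above; the segment names and the records are made up for the illustration.

```python
from collections import defaultdict

def cate_by_segment(records):
    """Estimate uplift per segment as mean(outcome | treated) - mean(outcome | control).

    records: iterable of (segment, treated: bool, outcome: number).
    Assumes randomised treatment assignment (no selection bias).
    """
    # Per segment: [sum of outcomes, count] for treated and control groups.
    sums = defaultdict(lambda: {True: [0.0, 0], False: [0.0, 0]})
    for segment, treated, outcome in records:
        cell = sums[segment][treated]
        cell[0] += outcome
        cell[1] += 1
    estimates = {}
    for segment, cells in sums.items():
        (t_sum, t_n), (c_sum, c_n) = cells[True], cells[False]
        if t_n and c_n:  # both groups are needed to form a difference
            estimates[segment] = t_sum / t_n - c_sum / c_n
    return estimates

# Hypothetical log: (segment, received a recommendation?, accepted an offer?)
records = [
    ("young", True, 1), ("young", True, 1), ("young", False, 0), ("young", False, 1),
    ("senior", True, 0), ("senior", True, 1), ("senior", False, 1), ("senior", False, 0),
]
print(cate_by_segment(records))  # {'young': 0.5, 'senior': 0.0}
```

When the treated and control groups differ systematically, this difference of means no longer estimates the causal effect, which is what motivates the debiasing and causal estimators cited above.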

Sujet :
The objective of the thesis is to propose new evaluation metrics and modeling methods for uplift in an offer recommendation system. The main challenges are evaluating uplift in a recommendation system and learning recommendation policies that optimize uplift in the presence of biased data. Particular attention will be paid to causal approaches [3, 5] and Bayesian approaches [2], known for their robustness.

[1] Sato Masahiro et al. “Uplift-based evaluation and optimization of recommenders”, Proceedings of the 13th ACM Conference on Recommender Systems, 2019.
[2] Rafla Mina et al. “A Non-Parametric Bayesian Approach for Uplift Discretization and Feature Selection”, ECML PKDD 2022.
[3] Verhelst Théo et al. “Partial counterfactual identification and uplift modeling: theoretical results and real-world assessment”, Machine Learning, 2023, p. 1-25.
[4] Qian Xufeng et al. “Intelligent Request Strategy Design in Recommender System”, Proceedings of the 28th ACM SIGKDD 2022.
[5] Bang Heejung and Robins James M. “Doubly robust estimation in missing data and causal inference models”, Biometrics, 2005.

Profil du candidat :
The desired profile is a five-year higher-education degree (Bac+5) from an engineering school, or a research Master's in statistics and/or applied mathematics and/or data science.

Formation et compétences requises :
– the doctoral candidate must have a good knowledge of statistics and mathematics.
– knowledge of machine learning is a real plus.
– programming skills are essential: proficiency in a scripting language for data analysis (Python, possibly R or Matlab).
– strong motivation, the ability to synthesize, to write up and present work well (in English), and to integrate into a team are also required.
– experience in the form of a research internship in statistics / machine learning.

Adresse d’emploi :
Innovation Orange (Lannion) and the GREYC laboratory, CNRS UMR 6072 (Caen)

Within Innovation Orange, you will join a research team at the forefront of innovation and expertise in machine learning, working on various topics such as generative models, time series processing, ethical AI, and uplift modeling. You will be part of a research ecosystem working alongside the operational units, with the aim of developing state-of-the-art algorithms and disseminating them across the group.

A promising topic that opens the way to careers in machine learning research or data science.

Dissemination of the work through contributions to the development of an open-source Python library for uplift modeling (Kuplift).

Salary: you will receive a gross annual salary of €33,848 in the first and second years and €38,480 in the third year.

To apply:
submit your application at
https://orange.jobs/jobs/v3/offers/124860?lang=fr

Categories: theses

Explainable and unbiased artificial intelligence: towards understanding and representing urban security phenomena